July 26, 2011

Online shops and robots.txt help to leak personal data


Two major data leaks occurred in Russia over the past several days. Short Message Service (SMS) text messages and personal information about people who ordered goods from Russian and Ukrainian online shops (including sex shops) were exposed for public viewing. Last week, approximately 8,000 private SMS messages sent through the online service of the Russian mobile network operator MegaFon were indexed by search engines.

The reason for this breach? Human error. The robots.txt file was removed by mistake, and a search engine browser plug-in called Yandex.Bar, the Yandex equivalent of the Google Toolbar, sent individual page URLs to the search engine for indexing.

This is an example of poor site design combined with bad luck. A site should display a page containing SMS details only to the sender, which it can verify with a session cookie, for example. In this case, the site designers assumed that a unique URL was enough for security. They were wrong. The search engine's browser plug-in transferred each unique URL directly to the search engine, and because robots.txt, the only thing blocking indexing, had been removed, the result was a flood of personal data.
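The design point above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory message store and session table (`MESSAGES`, `SESSIONS`, `create_session`, `store_message`, and `view_message` are illustrative names, not MegaFon's actual code). The idea is that a page is served only when the session cookie belongs to the message owner, so a leaked URL by itself reveals nothing.

```python
import secrets

# Hypothetical in-memory stores, for illustration only. A real site would
# use a database and its web framework's session handling.
MESSAGES = {}  # message_id -> {"owner": user_id, "text": message text}
SESSIONS = {}  # session token (the cookie value) -> user_id


def create_session(user_id):
    """Issue a random session token to be stored in the sender's cookie."""
    token = secrets.token_urlsafe(32)
    SESSIONS[token] = user_id
    return token


def store_message(owner, text):
    """Save a message under an unguessable ID, like a unique URL."""
    message_id = secrets.token_urlsafe(16)
    MESSAGES[message_id] = {"owner": owner, "text": text}
    return message_id


def view_message(message_id, session_cookie):
    """Return (status, body). Knowing the URL alone is never sufficient."""
    message = MESSAGES.get(message_id)
    if message is None:
        return 404, "Not found"
    user_id = SESSIONS.get(session_cookie)
    if user_id is None or user_id != message["owner"]:
        # A crawler or toolbar that learned the URL has no valid cookie.
        return 403, "Forbidden"
    return 200, message["text"]
```

With a check like this, a search engine that receives the unique URL from a browser plug-in would get a 403 response instead of the message text, regardless of whether robots.txt is present.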

Today we see another leak: personal information about online shoppers is listed in the results of Yandex, Google, and other major search engines.


In today's case, robots.txt was again a problem, this time because the file was present but incorrectly configured. The file did not include instructions telling crawlers not to index pages with personal data. The publicly leaked information consists of buyers' names, product prices, IP addresses, and home or delivery addresses.
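As a rough illustration of what a correctly configured file would change, the sketch below uses Python's standard `urllib.robotparser` module to test whether a well-behaved crawler may fetch a URL. The `/order_status/` path and the `shop.example` host are assumptions made for this example, not the actual webAsyst layout, and `is_crawlable` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/order_status/" is an assumed path for pages that
# contain personal data, not the real layout of any shop software.
ROBOTS_TXT = """\
User-agent: *
Disallow: /order_status/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The order page is blocked for compliant crawlers, the catalog is not.
print(is_crawlable(ROBOTS_TXT, "https://shop.example/order_status/abc123"))
print(is_crawlable(ROBOTS_TXT, "https://shop.example/catalog/"))
# With no rules at all, nothing is blocked.
print(is_crawlable("", "https://shop.example/order_status/abc123"))
```

Note that robots.txt is only a request to crawlers that choose to honor it, and the file itself is publicly readable. It reduces indexing but does not restrict access to the pages it names.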


According to Digit.ru, a company called webAsyst developed the software used to create the online shops. Company representatives explained that after a buyer purchases a product from an online shop, the shop emails the buyer a link to a purchase status page that is not password protected. Those pages were then indexed by search engines.

As a result of this leak, the Russian search engine Yandex has asked web site administrators to review information about robots.txt files and how to use them, so that this type of incident does not happen in the future. The leaked information was still visible at the time of writing.

Websense recommends protecting private customer data by encrypting it or by password protecting any web site that contains personal data, so that search engine robots cannot index the information.


Thanks to Petr Savich for help in writing this blog.

