The Privacy Guarantor's measure
The Privacy Guarantor, however, pays close attention to the problem and recently took action against the owner of the website trovanumeri.com, ordering them to stop compiling and publishing online a telephone directory built from data collected through web scraping, and imposing a fine of 60,000 euros. Under the current regulatory framework, general telephone directories may not be compiled from any source other than the single database (DBU) that holds the telephone numbers and customer identification data of all national fixed and mobile telephone operators.
The Authority's findings showed that the site owner lacked a valid legal basis for processing the data. The site gave no information on how to contact the data controller, and users had no way to have their data deleted when the dedicated form did not work. The short privacy notice published on the site did not identify its owner either, whose identity was established only after a lengthy investigation. The Guarantor therefore declared the collection, storage and publication of the personal data unlawful.
In recent years there have been numerous court appeals and requests for intervention to the Guarantor concerning the unauthorized publication of names, addresses and telephone numbers, including those of subscribers with unlisted (confidential) numbers.
Another well-known Italian case of alleged web scraping is the lawsuit Trenitalia filed in 2019 against the British company Gobright Media Ltd, maker of Trenìt, an app that lets users compare high-speed train fares. The dispute centered on the data and the license to use it: Trenitalia accused the British company of improperly using its database, accessing information such as train traffic management, ticket prices, timetables and delays. The Court of Rome first ordered Gobright to stop its web scraping activity, but later allowed it because it did not amount to a substantial misappropriation of data.
Widening the focus, other well-known cases deserve mention, involving allegedly unlawful web scraping by companies accused of violating terms of service or copyright rules.
In the dispute between LinkedIn and hiQ Labs, LinkedIn sought to stop hiQ, a data analytics company, from scraping personal information from users' public profiles on the social network. In 2019 the U.S. Ninth Circuit Court of Appeals found that scraping this data likely did not violate the Computer Fraud and Abuse Act (CFAA), because the LinkedIn profiles being scraped were public (not password-protected).
Another case that made headlines involves Clearview AI: the facial recognition company received a hefty fine for scraping millions of photos of people's faces from social media. Clearview AI processed sensitive (biometric) data without a valid legal basis.
In the trovanumeri.com case, the Guarantor reiterated some important principles: people who publish their contact details on the web do not necessarily intend to receive marketing communications or to have those details indexed and further disseminated. Collecting contact data to build lists for later marketing use is unlawful, and so is disseminating such data in the form of a list.
In setting the amount of the fine, the Authority took into account the seriousness of the violation, the large number of people whose data was published (approximately 26 million users), the duration of the violation, and the intentional nature of the site owner's conduct.
How to defend against web scraping
But is it possible to defend against web scraping?
First, websites can create restricted areas that can be accessed only after registration, as social networks do, offering different access levels for certain content. Site owners can also use anti-bot services and robots.txt files, or block the IP addresses of bots. Above all, it is very important for a site's terms of service (TOS) to explicitly prohibit the use of scraping techniques for the systematic retrieval of data and information.
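As a rough illustration of two of the technical measures mentioned above, the Python sketch below (standard library only) shows how robots.txt rules are interpreted by a crawler that chooses to honor them, and how a server might block IP addresses that send too many requests in a short time. The robots.txt content, the `/directory/` path, the `crawler_may_fetch` helper, the `IpRateLimiter` class and its thresholds are all hypothetical, chosen only for the example. Keep in mind that robots.txt is a voluntary convention: a scraper that ignores it is not technically stopped, which is why server-side measures and clear terms of service still matter.

```python
import time
from collections import defaultdict, deque
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks every crawler to stay out of /directory/
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/
"""


def crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url.

    Only well-behaved crawlers perform this check; compliance is voluntary.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


class IpRateLimiter:
    """Hypothetical sliding-window limiter that blocks IPs sending too many requests."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()               # IPs that exceeded the limit

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record a request from ip and return False if it must be refused."""
        if ip in self.blocked:
            return False
        now = time.monotonic() if now is None else now
        timestamps = self.history[ip]
        # Discard timestamps that have fallen outside the sliding window
        while timestamps and now - timestamps[0] > self.window_seconds:
            timestamps.popleft()
        timestamps.append(now)
        if len(timestamps) > self.max_requests:
            # Block permanently in this sketch; a real service would likely expire blocks
            self.blocked.add(ip)
            return False
        return True
```

For example, `crawler_may_fetch("AnyBot", "https://example.com/directory/page")` returns `False`, while `IpRateLimiter(max_requests=3, window_seconds=10)` refuses the fourth request that an IP sends within ten seconds and keeps refusing it afterwards. In practice, rate limiting and IP blocking are more often enforced at a reverse proxy, CDN, or dedicated anti-bot service than in application code.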