What is Data Scraping?

Published on
May 31, 2023

Data scraping is the process of pulling data from one source and saving it to a local device, usually as a spreadsheet. The export is typically carried out by software rather than by hand, which lets someone sift through a large data set quickly for whatever their needs may be. There are a couple of different types of data scraping, including web scraping, in which the user imports data from an HTML site, and screen scraping, in which data is extracted from old machines or devices.
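To make the web scraping case concrete, here is a minimal sketch in Python that reads an HTML table and turns it into spreadsheet-style CSV text. It uses only the standard library. The sample HTML, the `TableScraper` class, and the `scrape_table_to_csv` function are hypothetical names invented for illustration; a real scraper would first download the page and would usually need to handle messier markup.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row currently being read, or None
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page content, standing in for a downloaded HTML site.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

csv_text = scrape_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Writing `csv_text` to a file ending in `.csv` would give a file that opens in any spreadsheet program, which is the "saved to a local device" step described above.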

The Ethics of Data Scraping

Data scraping is not in and of itself a bad thing. Collecting data from one source and saving it to a local device can be done ethically, as a tool for business or individual use. Many businesses use data scraping to better understand how their product ranks against similar products, or to sift through a long list of third-party companies they might wish to work with. By running software that automatically pulls the desired data for them, they save valuable man-hours and money in what could otherwise be a laborious process.

However, some individuals do use this method of data collection for nefarious purposes. A person with malicious intent could take advantage of web scraping to collect a company’s list of email addresses and send phishing emails to them. Scrapers have also copied all of the posts and pictures from a person’s social media account in order to create a fake version of that account, often with a celebrity or influencer as the target; they then use the scam profile to dupe people with the likeness of someone they recognize. Data scraping also moves toward being a data breach when the data is pulled without the permission of its source. In 2021, LinkedIn endured a data scraping hack in which data, including email addresses, work history, and more, was pulled from the site and sold on the dark web.

Web Scraping vs. Data Mining vs. Email Parsing

Many people hear about data scraping and assume it is the same thing as data mining, but the two are distinct. The key difference is that data scraping only uses software to pull data from its initial source, whereas data mining is the analysis of a data set to reveal some sort of pattern. With data scraping, any analysis or inference is done afterward by the person who receives the scraped data, if analysis is performed at all (as we saw with the LinkedIn scrape, sometimes data is just extracted and, in that case, redistributed). Email parsing is very close to data scraping; the key difference is that parsing focuses on emails, while web scraping only counts as such if the data is pulled from an HTML site.
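The split between the two activities can be sketched in a few lines of Python. In this rough illustration, `scraped_rows` is made-up data standing in for whatever a scraper exported, and `count_by_industry` is a hypothetical helper representing a very simple analysis step. Real data mining uses far richer techniques than counting, so treat this only as a picture of where extraction ends and analysis begins.

```python
from collections import Counter

# Hypothetical rows, standing in for the raw output of a scraper.
# Scraping stops here: the data has been collected but not interpreted.
scraped_rows = [
    {"company": "Acme", "industry": "Retail"},
    {"company": "Globex", "industry": "Energy"},
    {"company": "Initech", "industry": "Software"},
    {"company": "Hooli", "industry": "Software"},
]


def count_by_industry(rows):
    """Analysis step: summarize how often each industry appears."""
    return Counter(row["industry"] for row in rows)


# Mining starts here: looking for a pattern in the collected data.
industry_counts = count_by_industry(scraped_rows)
top_industry, top_count = industry_counts.most_common(1)[0]
print(top_industry, top_count)
```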

