{"id":445,"date":"2022-10-17T17:18:25","date_gmt":"2022-10-17T17:18:25","guid":{"rendered":"https:\/\/codecrypt76.com\/?p=445"},"modified":"2022-12-08T20:34:39","modified_gmt":"2022-12-08T20:34:39","slug":"web-scraping","status":"publish","type":"post","link":"https:\/\/codecrypt76.com\/index.php\/2022\/10\/17\/web-scraping\/","title":{"rendered":"What is Web Scraping?"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"781\" height=\"439\" src=\"https:\/\/codecrypt76.com\/wp-content\/uploads\/2022\/12\/codecrypt76.com-what-is-web-scraping-web-scraping.png\" alt=\"\" class=\"wp-image-465\" srcset=\"https:\/\/codecrypt76.com\/wp-content\/uploads\/2022\/12\/codecrypt76.com-what-is-web-scraping-web-scraping.png 781w, https:\/\/codecrypt76.com\/wp-content\/uploads\/2022\/12\/codecrypt76.com-what-is-web-scraping-web-scraping-300x169.png 300w, https:\/\/codecrypt76.com\/wp-content\/uploads\/2022\/12\/codecrypt76.com-what-is-web-scraping-web-scraping-768x432.png 768w\" sizes=\"(max-width: 781px) 100vw, 781px\" \/><\/figure>\n\n\n\n<h2>What is Web Scraping?<\/h2>\n\n\n\n<p>Web scraping (or data extraction) is a technique used to collect content and data from the internet. This data is usually saved in a local file so that it can be manipulated and analyzed as needed.<\/p>\n\n\n\n<h2>Web Scraping vs Web Crawling<\/h2>\n\n\n\n<p>According to <a href=\"https:\/\/scrapeops.io\/web-scraping-playbook\/what-is-web-scraping\/\" target=\"_blank\" rel=\"noopener\">ScrapeOps<\/a>, web crawling is the process of traversing a website by following links in pages or the sitemap, often indexing the web pages as it goes. 
Google uses web crawling to index the web and provide a powerful search engine that can find content anywhere.<\/p>\n\n\n\n<p>Web scraping, on the other hand, is the process of extracting specific bits of data from a target web page and storing it for your own purposes.<\/p>\n\n\n\n<h2>Uses for Web Scraping<\/h2>\n\n\n\n<ul><li><strong>Price Intelligence<\/strong> &#8211; Price intelligence refers to monitoring a competitor\u2019s prices and responding to their changes in pricing. Retailers use price intelligence to maintain a competitive edge over their rivals.<\/li><li><strong>Market Research<\/strong> &#8211; Web data extraction plays a vital role in market research. Market researchers use the resulting data to inform their market trend analysis, research and development, competitor analysis, price analysis, and other areas of study.<\/li><li><strong>Lead Generation<\/strong> &#8211; Businesses that want to attract new customers and generate more sales need to launch effective sales and marketing campaigns. Web scraping can help companies gather the correct contact information from their target market\u2014including names, job titles, email addresses, and cellphone numbers. Then, they can reach out to these contacts and generate more leads and sales for their business.<\/li><li><strong>Business Automation<\/strong> &#8211; In some cases, you may need to extract large amounts of data from a group of websites, and to do so consistently, quickly, and in a structured way. You can use web scraping tools to automatically extract these data sets.<\/li><li><strong>Real Estate<\/strong> &#8211; You need web data extraction to generate the most up-to-date and accurate real estate listings. Web scraping is commonly used to retrieve the most updated data about properties, sale prices, monthly rental income, amenities, property agents, and other data points. 
Web-scraped data also informs property value appraisals, rental yield estimates, and real estate market trends analysis.<\/li><li><strong>News and Content Marketing<\/strong> &#8211; Businesses, political campaigns, and nonprofits that need to keep a close eye on brand sentiment, polls, and other trends often invest in web scraping tools. Content and digital marketing agencies also use web scraping tools to monitor, aggregate, and parse the most critical stories from different industries.<\/li><\/ul>\n\n\n\n<h2>Legality of Web Scraping<\/h2>\n\n\n\n<p>The legality of web scraping depends on the legal jurisdiction, i.e., laws are specific to each country and locality. Scraping publicly available information is not illegal in itself; if it were, Google could not operate as a company, because it collects data from websites all over the world.<\/p>\n\n\n\n<ul><li><strong>United States Court of Appeals for the Ninth Circuit &#8211; (LinkedIn v. HiQ Labs (2022))<\/strong> &#8211; The decision echoes the appeals court\u2019s 2019 decision, which upheld a lower court\u2019s 2017 determination in HiQ v. LinkedIn that web scraping doesn\u2019t qualify as accessing a protected computer without authorization.<\/li><li><strong>The US Supreme Court &#8211; (Van Buren v. United States (2021))<\/strong> &#8211; This case does not directly address web scraping, but it does touch upon the use of the Computer Fraud and Abuse Act (CFAA), enacted in 1986, in web scraping cases. 
A good commentary about this case can be read here.<\/li><\/ul>\n\n\n\n<p>It is a complex case that may not answer all questions related to the CFAA, but it does seem to narrow the scope of the CFAA considerably, which should serve as a deterrent to companies that rely on the CFAA to target web scraping.<\/p>\n\n\n\n<ul><li><strong>Eastern District Court of New York &#8211; (Genius Media Group Inc vs Google LLC and Lyricfind (2020))<\/strong> &#8211; Google, arguably the world\u2019s largest scraping company, had a web scraping case against them dismissed by Judge Margo Brodie. Google had repeatedly scraped lyrics from Genius to show up in their search results and the Judge dismissed the lawsuit stating<\/li><li><strong>US District Court for The District Of Columbia &#8211; (Sandvig v Sessions (2018))<\/strong> &#8211; A US District court ruling (Sandvig v Sessions) that talks directly about web scraping states:<\/li><\/ul>\n\n\n\n<blockquote class=\"wp-block-quote\"><p>scraping plausibly falls within the ambit of the First Amendment.<\/p><\/blockquote>\n\n\n\n<p>and<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p>That plaintiffs wish to scrape data from websites rather than manually record information does not change the analysis. Scraping is merely a technological advance that makes information collection easier; it is not meaningfully different from using a tape recorder instead of taking written notes, or using the panorama function on a smartphone instead of taking a series of photos from different positions.<\/p><\/blockquote>\n\n\n\n<ul><li><strong>US District Court for The District Of Columbia &#8211; (Sandvig v Sessions (2020))<\/strong> &#8211; The US District Court in Washington, DC, ruled that violating a website\u2019s terms of service isn\u2019t a crime under the Computer Fraud and Abuse Act. US District Judge John D. Bates in Sandvig v Barr (Civil Action No. 
16-1368) said,<\/li><\/ul>\n\n\n\n<blockquote class=\"wp-block-quote\"><p>Criminalizing terms-of-service violations risks turning each website into its own criminal jurisdiction and each webmaster into his own legislature. Such an arrangement, wherein each website\u2019s terms of service \u201cis a law unto itself,\u201d would raise serious problems.<\/p><\/blockquote>\n\n\n\n<ul><li><strong>US Court of Appeals for the Ninth Circuit &#8211; (Oracle vs Rimini Street (2018))<\/strong><\/li><\/ul>\n\n\n\n<blockquote class=\"wp-block-quote\"><p>\u201cTaking data using a method prohibited by the applicable terms of use\u201d \u2014 i.e., scraping \u2014 \u201cwhen the taking itself generally is permitted, does not violate\u201d the state computer crime laws.<\/p><\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote\"><p>\u201cAs EFF puts it, \u2018[n]either statute . . . applies to bare violations of a website\u2019s terms of use\u2014such as when a computer user has permission and authorization to access and use the computer or data at issue, but simply accesses or uses the information in a manner the website owner does not like.\u2019\u201d<\/p><\/blockquote>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is Web Scraping? Web scraping (or data extraction) is a technique used to collect content and data from the internet. 
This data is usually &hellip; <\/p>\n","protected":false},"author":1,"featured_media":468,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,65],"tags":[11,12,64],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/posts\/445"}],"collection":[{"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/comments?post=445"}],"version-history":[{"count":8,"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/posts\/445\/revisions"}],"predecessor-version":[{"id":471,"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/posts\/445\/revisions\/471"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/media\/468"}],"wp:attachment":[{"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/media?parent=445"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/categories?post=445"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/codecrypt76.com\/index.php\/wp-json\/wp\/v2\/tags?post=445"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}