This content originally appeared on DEV Community and was authored by conjurer
A cool term:
Cron = a job scheduler (found on Unix-like systems) that runs tasks automatically at specified times or intervals
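To make the idea concrete, here is a tiny Python sketch of interval scheduling. It isn't cron itself (cron runs on the system and reads a crontab file); it only works out when a task that repeats every 6 hours would fire next. The function name `next_runs` and the start time are made up for illustration.

```python
from datetime import datetime, timedelta

def next_runs(start, interval, count):
    """Return the next `count` run times, spaced `interval` apart, after `start`."""
    return [start + interval * (i + 1) for i in range(count)]

# Hypothetical schedule: a scraper that should run every 6 hours.
runs = next_runs(datetime(2024, 9, 6, 4, 0), timedelta(hours=6), 3)
for run in runs:
    print(run.isoformat())
```

A real scheduler would sleep until each of these times and then call the task; cron does that part for you.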
Web what?
When researching a project, we usually copy info from various sites into a diary, an Excel sheet, a doc and so on.
In doing so, we are scraping the web and extracting data manually.
Web scraping automates this.
Example
When you google, say, sneakers online, you get a list of websites with products and prices. The Shopping tab shows a more detailed record, right?
Google just scraped websites for you to show sneakers from different sites.
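Here's a rough sketch of what that automation can look like in Python, using only the standard library's `html.parser`. The HTML snippet, the class names (`product`, `name`, `price`) and the sneaker names are all invented for the example; a real shop's markup will differ, and you would fetch the page over HTTP first.

```python
from html.parser import HTMLParser

# Hypothetical page markup; a real shop's HTML will look different.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Runner X</span><span class="price">59.99</span></li>
  <li class="product"><span class="name">Court Classic</span><span class="price">74.50</span></li>
</ul>
"""

class ProductParser(HTMLParser):
    """Collects the name and price of every <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.products = []   # one dict per product found
        self._field = None   # which field the current text belongs to

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "product":
            self.products.append({})
        elif tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside a name/price span.
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

parser = ProductParser()
parser.feed(SAMPLE_HTML)
products = parser.products
for product in products:
    print(product["name"], product["price"])
```

This is the "diary" step done by code: instead of copying names and prices by hand, the parser pulls them out of the page for you.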
This technique is used by almost all big companies for their businesses, since the amount of data online keeps growing rapidly.
Web Crawler
A web crawler also fetches information, but it differs from a scraper: a crawler searches across many websites, finds the best ones and indexes them, whereas scraping is done on a single website.
Crawling is used for SEO analysis; scraping is used for gathering data.
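The difference shows up nicely in code. Below is a minimal crawler sketch in Python: it starts from one page, follows links to other sites and records every page it visits. To keep it runnable offline, a hypothetical `LINKS` dictionary stands in for real HTTP requests, and the `*.example` addresses are placeholders.

```python
from collections import deque

# Hypothetical link graph standing in for real pages and their outgoing links.
LINKS = {
    "a.example/": ["a.example/shoes", "b.example/"],
    "a.example/shoes": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}

def crawl(seed, max_pages=10):
    """Breadth-first crawl from `seed`; returns pages in the order visited."""
    visited = []
    seen = {seed}            # avoid queueing the same page twice
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # a real crawler would index the page here
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

order = crawl("a.example/")
print(order)
```

A scraper would stop at one of these pages and pull specific data out of it; the crawler's job is discovering the pages in the first place.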
Famous web scraping technologies:
Issues!
Notice that it's not a user making requests to get info from the site; it's the code you wrote! If a website detects that the requests are automated, it may quickly block the IP address.
And these checks have given rise to:
- Captchas
- Rate limiting
- Dynamic content
Goal: simulate how humans work!
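One small piece of "working like a human" is not hammering a site with back-to-back requests. The sketch below paces requests with a random pause between them, which also helps you stay under rate limits. Everything here is an assumption for illustration: `paced_fetch`, the delay range and the stand-in `fetch` function are hypothetical, and no real network call is made.

```python
import random
import time

def paced_fetch(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call `fetch` on each URL, pausing a random time between requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results

# Demo with a fake fetch and a fake sleep that only records the waits,
# so the example finishes instantly.
waits = []
pages = paced_fetch(
    ["/page1", "/page2", "/page3"],
    fetch=lambda url: f"<html>{url}</html>",
    sleep=waits.append,
)
print(len(pages), "pages fetched with", len(waits), "pauses")
```

In real use you would leave `sleep` at its default and pass a function that actually downloads the page. This does nothing about captchas or dynamic content; it only covers the pacing part.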
Bright Data automates the job. It even rotates IPs to keep the user anonymous and unblocks sites (paid version!) for the user.
Shoutout to JSM for the wonderful explanation.
Ps:
Lol!
conjurer | Sciencx (2024-09-06T04:20:12+00:00) Web scraping- Interesting!. Retrieved from https://www.scien.cx/2024/09/06/web-scraping-interesting/