This content originally appeared on Level Up Coding - Medium and was authored by Mykhailo Kushnir
Disclaimer: “TrickyCases” is a series of posts with relatively short code snippets that are useful in day-to-day ML practice. Here you can find something you might be searching Stack Overflow for a few days from now.
Plots often contain valuable data, and if you are a data geek like I am, you’ll want to take that data home. One of my recent discoveries was how easy it is to parse data drawn with the Highcharts.js library.
Before we jump to the scraping part, make sure that:
- The site you want to scrape does not provide the same data through an API. It’s always going to be easier to use a programmable interface.
- You’re not legally forbidden from scraping the site.
- You’re not putting a heavy load on the site.
The Part Where We Scrape
As promised, the scraping part is relatively simple. For practice purposes, let us take a page like this one. Scroll down to the plot named “Max, Min and Average Temperature” and you’ll see the average temperature in degrees Celsius for each month in Brussels since 2009. Now open the browser console and run the following expression:
Highcharts.charts[0].series[0].data
After some examination, you’ll see that you now have access to all the raw data behind the plot. Furthermore, with some JavaScript knowledge, you can even print the data cleanly. The trick to scraping is to automate this process with Selenium and pandas.
Below I’ll share a simplified version of such a script so you can catch the gist:
In this code, you’ll:
- Call the target site
- Wait until the Highcharts plot appears
- Parse the data with the linked JS functions
- Store it in a dataframe and save it to a local folder
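The steps above could be sketched roughly like this. Treat it as a minimal sketch under my own assumptions, not the exact script from the post: the names `READY_JS`, `EXTRACT_JS`, `flatten_series`, `scrape_chart` and `save_chart` are hypothetical, the URL and output path are whatever you pass in, and only the plain `x`, `y` and `category` values are copied out of each point, since the point objects themselves carry references back to their series and chart.

```python
# JavaScript evaluated in the page. arguments[0] is the chart index passed
# from Python through execute_script.
READY_JS = """
return typeof Highcharts !== 'undefined'
    && Highcharts.charts.length > arguments[0]
    && !!Highcharts.charts[arguments[0]];
"""

EXTRACT_JS = """
var chart = Highcharts.charts[arguments[0]];
return chart.series.map(function (series) {
  return {
    name: series.name,
    points: series.data
      .filter(function (p) { return p; })
      .map(function (p) { return {x: p.x, y: p.y, category: p.category}; })
  };
});
"""


def flatten_series(series_list):
    """Turn the nested per-series result into one flat row per point."""
    rows = []
    for series in series_list:
        for point in series["points"]:
            rows.append({
                "series": series["name"],
                "x": point["x"],
                "y": point["y"],
                # category may be missing when the axis has no categories
                "category": point.get("category"),
            })
    return rows


def scrape_chart(url, chart_index=0, timeout=30):
    """Open the page, wait for the chart, and return its points as rows."""
    # Imported here so flatten_series() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Poll until the Highcharts global exists and the chart slot is filled.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script(READY_JS, chart_index)
        )
        return flatten_series(driver.execute_script(EXTRACT_JS, chart_index))
    finally:
        driver.quit()


def save_chart(url, csv_path, chart_index=0):
    """Scrape one chart and save it as a CSV file via a pandas DataFrame."""
    import pandas as pd  # imported lazily for the same reason as Selenium

    frame = pd.DataFrame(scrape_chart(url, chart_index))
    frame.to_csv(csv_path, index=False)
    return frame
```

Calling `save_chart(page_url, "temperature.csv")` with the address of the page you are practicing on should write one row per plotted point, with the series name in its own column. Which chart index holds the plot you want depends on the page, so it is worth checking `Highcharts.charts` in the console first.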
Be aware that the code assumes you already have chromedriver installed and available on your PATH. Here are a few good tutorials on how to do it on various operating systems:
- On Ubuntu 20.04 and 18.04
- On macOS
- On WSL
- On Windows, installation is as simple as downloading the correct zip file from here and installing through the .exe file
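Before running a scraper, it can help to confirm that the driver is actually reachable. The snippet below is a small sketch of such a check using only the standard library; the helper names `find_driver`, `driver_version` and `describe_driver` are hypothetical, and it assumes the executable is called `chromedriver` and accepts the `--version` flag.

```python
import shutil
import subprocess


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable on PATH, or None."""
    return shutil.which(name)


def driver_version(path):
    """Ask the driver binary for its version string."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def describe_driver(name="chromedriver"):
    """Summarize in one line whether the driver is available."""
    path = find_driver(name)
    if path is None:
        return f"{name} was not found on PATH"
    return f"{name} found at {path}: {driver_version(path)}"
```

Printing `describe_driver()` shows either where the driver lives and which version it reports, or a note that it could not be found, which usually means the install folder still needs to be added to PATH.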
As usual, I’m open to any questions and comments. Let me know if you run into any trouble along the way and how you’ve used the script in your automation tasks!
TrickyCases #6. Scrape Highcharts plots was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
Mykhailo Kushnir | Sciencx (2022-01-06T14:48:35+00:00) TrickyCases #6. Scrape Highcharts plots. Retrieved from https://www.scien.cx/2022/01/06/trickycases-6-scrape-highcharts-plots/