This content originally appeared on Envato Tuts+ Tutorials and was authored by Jemima Abu
In this tutorial, we’ll take a look at how to use JavaScript in a browser’s dev tools to scrape data from any webpage.
If you’ve ever had to manually collate data from a webpage into a different format, like a spreadsheet or a data object, you’ll know it’s a very repetitive and tiresome process!
Luckily, most browsers include tools that allow you to manipulate any webpage as much as you’d like. These tools are called developer tools (commonly referred to as dev tools) and are usually used by web developers to debug websites. We’ll be using them in this tutorial.
1. Accessing Dev Tools
There are different ways to access dev tools depending on the browser you’re using: Chrome, Safari, Firefox or Microsoft Edge. The most common way is to right-click (or Control + click) on the webpage and select the Inspect option.
Once we have our dev tools open, the two tabs we’ll be using are the Elements tab and the Console tab.
The Elements panel shows us all the HTML elements present on a webpage and the Console panel allows us to write JavaScript code directly in the webpage.
2. Identifying the Elements
The next step is to identify which elements we want to scrape from the webpage.
For example, let’s say we wanted to get a list of tutorials written by a Tuts+ author. We’d open dev tools on the author page and identify which element we wanted to scrape by using the inspect selector tool.
3. Targeting the Element
The next step is to target the element from the Console panel using JavaScript. There are multiple ways to target elements using JavaScript and in this tutorial we’ll be using the methods querySelectorAll() and querySelector().
In the example above, we want to target all elements with the class name posts__post. We can do this by typing the following command in the Console panel:
let posts = document.querySelectorAll('.posts__post');
Now we have a variable posts that contains the elements that we want to collect data from.
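Before going further, it can help to confirm that a selector actually matches something on the page. The sketch below is one way to do that; `summarizeMatches` is a hypothetical helper name (not part of the original tutorial), and it takes the root to search as a parameter so that it can be called with `document` in the Console.

```javascript
// Hypothetical helper (not from the original tutorial): reports what a
// selector matches under a given root, such as `document`.
function summarizeMatches(root, selector) {
  // querySelectorAll() returns a NodeList of every match (empty if none).
  const all = root.querySelectorAll(selector);
  // querySelector() returns only the first match, or null if there is none.
  const first = root.querySelector(selector);
  return { count: all.length, hasMatch: first !== null };
}

// In the Console, for example:
// summarizeMatches(document, 'h1');
```

A count of 0 usually means the selector has a typo or the content hasn’t loaded yet, so it’s worth checking before collecting any data.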
4. Manipulating Elements with JavaScript
Since we’re trying to scrape data from a webpage, we need to identify what data we want to collect. In this example, let’s collect the title and description of each tutorial.
Let’s write a function that allows us to collect the title and description from each li.posts__post in our posts variable.
Going back to our webpage and inspecting the elements again, we see that the title text is contained in an h1 tag and the description text is contained in a div with the class name posts__post-teaser.
To target these elements, we’ll write the following command into the console:
let postsObj = [...posts].map(post => ({
  title: post.querySelector('h1').innerText,
  description: post.querySelector('.posts__post-teaser').innerText
}));
Let’s break down what’s happening in the above code:
- Create a new variable postsObj to store the manipulated data
- Use the spread syntax [...] to convert our posts variable from a NodeList to an array
- Use the map function to loop through the posts array and carry out the manipulation on each post
- Target the h1 and posts__post-teaser elements inside the post and store their innerText values inside the object keys title and description
- Return an object value that contains the key and value pairs defined
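One thing to watch for: if a post is missing its h1 or teaser element, querySelector() returns null and reading innerText from it throws an error. A slightly more defensive variant might look like the sketch below. The helper names `textOf` and `toPostObject` are hypothetical and not part of the original tutorial.

```javascript
// Hypothetical helper: reads the trimmed text of a child element,
// falling back to an empty string when the selector matches nothing.
function textOf(parent, selector) {
  const el = parent.querySelector(selector);
  return el ? el.innerText.trim() : '';
}

// Builds the same { title, description } shape as the map() call above.
// The selectors are passed in so the function isn't tied to one page.
function toPostObject(post, titleSelector, teaserSelector) {
  return {
    title: textOf(post, titleSelector),
    description: textOf(post, teaserSelector),
  };
}

// In the Console:
// [...posts].map(post => toPostObject(post, 'h1', '.posts__post-teaser'));
```

With this version, a post that lacks one of the elements produces an empty string for that key instead of stopping the whole map call.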
Our postsObj variable now holds an array of objects, one for each post, each with a title and a description.
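If the goal is to get this data into a spreadsheet, one option is to turn the array into CSV text. The following is a minimal sketch rather than a full CSV library; `toCsv` is a hypothetical helper name, and the sample data is made up to stand in for postsObj.

```javascript
// Hypothetical helper: turns an array of flat objects into CSV text.
// Column order is taken from the keys of the first object.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const columns = Object.keys(rows[0]);
  // Quote every field and double any embedded quotes, so commas and
  // line breaks inside a value stay in one cell.
  const quote = (value) => `"${String(value ?? '').replace(/"/g, '""')}"`;
  const header = columns.map(quote).join(',');
  const lines = rows.map((row) => columns.map((c) => quote(row[c])).join(','));
  return [header, ...lines].join('\n');
}

// Made-up sample data, standing in for postsObj:
const sample = [
  { title: 'First "post"', description: 'Short, with a comma' },
];
console.log(toCsv(sample));
```

In the Console you would call toCsv(postsObj) instead. The consoles in Chrome and Firefox also provide a copy() utility, so copy(toCsv(postsObj)) should place the text on the clipboard, ready to paste into a file.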
5. Conclusion
To recap, in order to scrape any data from a page, we:
- Access the browser dev tools
- Identify the element using the inspect tool
- Use the Console panel to target and collect data from the elements
- Store the data in a JavaScript object using the map method
Of course, manually writing JavaScript code in dev tools isn’t the only way to scrape data on a webpage and there are a lot of web scraper extensions that offer the same functionality without the need to write code.
However, this method is very useful for getting familiar with the developer tools in a browser and understanding how to manipulate data with JavaScript.
Jemima Abu | Sciencx (2022-06-08T11:21:42+00:00) How To Scrape Data From a Webpage Using Vanilla JavaScript. Retrieved from https://www.scien.cx/2022/06/08/how-to-scrape-data-from-a-webpage-using-vanilla-javascript/