How To Build Your Data Science Books Reading List Using Scrapy

Web scraping is a powerful technique for collecting data from websites, and Scrapy is a popular Python library for doing it. Alternatives include BeautifulSoup4.

In this article, I will show you how to use Scrapy to extract information about data science books from Amazon.com.

This article is a detailed, step-by-step walkthrough of how to implement Scrapy.

Photo by Markus Spiske on Unsplash

The Process:

  1. Installing Scrapy
  2. Setting up a Scrapy project
  3. Creating a Spider
  4. Extracting and saving the information needed (i.e., data science books)

The First Process:

Before you can use Scrapy, you need to have Python installed on your computer.

Install Scrapy by executing this command:

pip install scrapy

The Second Process:

Create a new Scrapy project by executing this command:

scrapy startproject data_books

A new directory called “data_books” will be created with the required files for this Scrapy project.
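
For reference, the generated project follows the standard Scrapy layout (shown here as a sketch; minor details may vary between Scrapy versions):

```
data_books/
    scrapy.cfg            # deploy configuration
    data_books/           # the project's Python module
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider/downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # your spiders live here
            __init__.py
```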

The Third Process:

Create a Scrapy spider. A spider is a class that tells Scrapy how to navigate a website and which information to extract.

In the data_books/spiders directory, create a Python file called book_bot.py (you can name the file whatever you like; Scrapy discovers spiders placed in the spiders module).

In that file, define a Python class that subclasses scrapy.Spider:

import scrapy

class BookBotSpider(scrapy.Spider):
    name = "data_books"
    start_urls = ['https://www.amazon.com/s?k=data+science+books']

The Fourth Process:

Create a parse method. Scrapy calls parse with the response downloaded for each URL in the start_urls list; it is where you navigate the page and extract the information you need.

Here’s an example of how to extract the title, price, and rating of each data science book:

import scrapy

class BookBotSpider(scrapy.Spider):
    name = "data_books"
    start_urls = ['https://www.amazon.com/s?k=data+science+books']

    def parse(self, response):
        for book in response.css('div.sg-col-4-of-12'):
            yield {
                'title': book.css('span.a-size-medium::text').get(),
                'price': book.css('span.a-price-whole::text').get(),
                'rating': book.css('span.a-icon-alt::text').get(),
            }
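
One thing to keep in mind: .get() returns None when a selector matches nothing (for example, on non-book tiles in the results grid), so incomplete items are worth filtering out afterwards. A minimal stdlib-only sketch of that post-processing step, using made-up scraped values:

```python
# Each yielded item is a plain dict; selectors that match nothing return None.
items = [
    {'title': 'Python for Data Analysis', 'price': '39.', 'rating': '4.7 out of 5 stars'},
    {'title': None, 'price': None, 'rating': None},  # e.g. a sponsored tile
]

# Keep only items where a title was actually extracted.
books = [item for item in items if item['title'] is not None]
print(len(books))  # 1
```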

Final Process:

The extracted data can be saved and exported in JSON, CSV, or XML format; all three are built-in Scrapy export formats.

To save the extracted data, run the crawl with the -o option:

scrapy crawl data_books -o databooks.csv
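
Once the crawl finishes, the exported CSV can be loaded with nothing but the standard library. A sketch, assuming the export produced title/price/rating columns like the ones yielded above (an in-memory stand-in is used here in place of the real file):

```python
import csv
import io

# Stand-in for the file Scrapy writes (use open('databooks.csv') in practice).
exported = io.StringIO(
    "title,price,rating\n"
    "Python for Data Analysis,39.,4.7 out of 5 stars\n"
)

rows = list(csv.DictReader(exported))
print(rows[0]['title'])  # Python for Data Analysis
```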

If you intend to save the extracted data to a database or to cloud storage, you can write a custom item pipeline, or build on the exporter classes in the scrapy.exporters module.
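
Note that scrapy.exporters is a module of exporter classes rather than a command; for databases, the usual route is a custom item pipeline. Here is a minimal SQLite sketch (the class name and table are my own invention; the open_spider/close_spider/process_item hooks are Scrapy's pipeline interface):

```python
import sqlite3

class SQLitePipeline:
    """Hypothetical pipeline that stores each scraped book in SQLite."""

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.conn = sqlite3.connect('databooks.db')
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS books (title TEXT, price TEXT, rating TEXT)'
        )

    def process_item(self, item, spider):
        # Called for every item the spider yields.
        self.conn.execute(
            'INSERT INTO books VALUES (?, ?, ?)',
            (item.get('title'), item.get('price'), item.get('rating')),
        )
        return item  # pipelines must return the item for later stages

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.commit()
        self.conn.close()
```

It would then be enabled via the ITEM_PIPELINES setting in settings.py, e.g. {'data_books.pipelines.SQLitePipeline': 300}.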

Web scraping is a powerful tool for extracting data.

Scrapy is efficient at extracting information from websites and saving it in various formats.

By applying the steps I outlined in this article, you can:

  1. Create a Scrapy spider.
  2. Navigate a website.
  3. Extract the information you need using Scrapy’s built-in selectors.

Web scraping is useful for collecting the data required for data science projects such as:

  1. Analyzing book prices.
  2. Analyzing customer reviews.
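
As a taste of the price analysis, the scraped a-price-whole strings come back as text (often with a trailing dot), so a small cleaning step is needed before computing statistics. A stdlib-only sketch with made-up values:

```python
# Hypothetical price strings as extracted from 'span.a-price-whole::text'.
raw_prices = ['39.', '27.', '54.']

# Strip the trailing dot and convert to numbers before analysis.
prices = [float(p.rstrip('.')) for p in raw_prices]
average = sum(prices) / len(prices)
print(average)  # 40.0
```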

In conclusion, Scrapy is a great tool to have in your data science toolkit.

Connect with me on Twitter, LinkedIn, GitHub.


How To Build Your Data Science Books Reading List Using Scrapy was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.


This content originally appeared on Level Up Coding - Medium and was authored by Ebuka (Gaus Octavio)


