Amazon product dataset

Hi, I found a dataset of Amazon products on Kaggle and decided to look for a relationship between price and star rating.

Full code:
https://github.com/victordalet/Kaggle_analysis/tree/feat/amazon_products

This content originally appeared on DEV Community and was authored by victor_dalet

I - Preparing data

To do this, I use pandas and SQLAlchemy to load the CSV file into a small SQLite database, and Plotly to display the information.

pip install pandas
pip install SQLAlchemy
pip install plotly

In the following script, I extract the data and plot three relationships:

  • price and star rating
  • price and number of ratings
  • star rating and number of ratings

import pandas as pd
import plotly.express as px
from sqlalchemy import create_engine, text


class Main:
    def __init__(self):
        self.result = None

        # Load the CSV into a local SQLite database. "replace" keeps reruns
        # from appending duplicate rows to the products table.
        self.engine = create_engine("sqlite:///my_database.db", echo=False)
        self.df = pd.read_csv("amazon_product.csv")
        self.df.to_sql("products", self.engine, index=False, if_exists="replace")
        self.connection = self.engine.connect()

        # Price vs. star rating
        self.get_data()
        self.transform_data()
        self.display_graph("Amazon Product Price vs Star Rating")
        # Price vs. number of ratings
        self.get_data_num_ratings_and_price()
        self.transform_data()
        self.display_graph("Amazon Product Price vs Number of Ratings")
        # Star rating vs. number of ratings (no price column to convert)
        self.get_data_num_ratings_and_star_rating()
        self.display_graph("Amazon Product Star Rating vs Number of Ratings")

        self.connection.close()

    def get_data(self):
        query = text(
            "SELECT product_price, product_star_rating FROM products WHERE product_price != '$0.00'"
        )
        self.result = self.connection.execute(query).fetchall()

    def get_data_num_ratings_and_price(self):
        query = text(
            "SELECT product_price, product_num_ratings FROM products WHERE product_price != '$0.00'"
        )
        self.result = self.connection.execute(query).fetchall()

    def get_data_num_ratings_and_star_rating(self):
        query = text(
            "SELECT product_star_rating, product_num_ratings FROM products WHERE product_price != '$0.00'"
        )
        # Convert each row to a plain list, matching what transform_data produces.
        self.result = [list(row) for row in self.connection.execute(query).fetchall()]

    def transform_data(self):
        # Turn price strings such as "$12.99" into floats. Commas are removed
        # in case the dataset contains values like "$1,299.00".
        for i in range(len(self.result)):
            price = self.result[i][0].replace("$", "").replace(",", "")
            self.result[i] = [float(price), self.result[i][1]]

    def display_graph(self, title):
        # Column 0 goes on the x-axis and column 1 on the y-axis.
        fig = px.scatter(self.result, x=0, y=1, title=title)
        fig.show()


Main()
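
The price conversion done in transform_data can also be isolated and checked on its own. Here is a minimal sketch, assuming prices are strings like "$12.99" and may contain commas or unparseable values. The parse_price helper and the sample rows are hypothetical and are not part of the original script or the dataset.

```python
def parse_price(raw):
    """Convert a price string such as "$1,299.00" to a float.

    Returns None when the value is missing or cannot be parsed.
    """
    if raw is None:
        return None
    cleaned = str(raw).strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


# Made-up sample rows in the [price, star_rating] shape used by the script.
rows = [["$12.99", 4.5], ["$1,299.00", 3.9], ["N/A", 4.0]]

# Parse every price, then drop the rows whose price could not be read.
parsed = [[parse_price(price), rating] for price, rating in rows]
cleaned_rows = [row for row in parsed if row[0] is not None]
print(cleaned_rows)
```

Skipping unparseable rows instead of raising keeps a single bad value from stopping the whole plot, at the cost of silently dropping data, so it may be worth counting how many rows get removed.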

II - Result

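Price and star rating

[Figure: scatter plot of price vs. star rating]

Price and number of ratings

[Figure: scatter plot of price vs. number of ratings]

Star rating and number of ratings

[Figure: scatter plot of star rating vs. number of ratings]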

III - Conclusion

The scatter plots do not show a strict relationship between price and rating, but two tendencies appear: higher-priced products tend to have lower ratings, and products with more reviews tend to have higher ratings.
The second tendency seems logical: if a product is bought a lot, it is probably popular.


victor_dalet | Sciencx (2024-08-25T17:38:26+00:00) Amazon product dataset. Retrieved from https://www.scien.cx/2024/08/25/amazon-product-dataset/
