This content originally appeared on Level Up Coding - Medium and was authored by Ulrik Thyge Pedersen
An Introduction to Low Code Data Cleaning and Analysis
Pandas is a popular library for data manipulation and analysis in Python, and with PandasAI, it's possible to add AI capabilities to Pandas.
One of the main features of pandas-ai is the run function, which allows you to ask questions about your data using natural language.
In this article, we’ll be exploring how to use PandasAI to ask questions about the Car Evaluation dataset. Specifically, we'll be asking questions about the acceptance level of the cars based on their features.
The Dataset
The Car Evaluation dataset contains information on the features of various cars and their acceptance level. The dataset contains 1728 instances with six input attributes and one target attribute, which are:
- Buying: Buying price (high, med, low, vhigh)
- Maint: Maintenance price (high, med, low, vhigh)
- Doors: Number of doors (2, 3, 4, 5more)
- Persons: Capacity in terms of persons to carry (2, 4, more)
- Lug_boot: The size of luggage boot (small, med, big)
- Safety: Estimated safety of the car (high, med, low)
- Acceptance: Acceptance level of the car (unacc, acc, good, vgood)
The acceptance level of the car is the target variable, and it has four possible values: unacc (unacceptable), acc (acceptable), good, and vgood (very good).
The dataset can be downloaded from the UCI Machine Learning Repository.
Loading the Data
Let’s start by loading the dataset into a Pandas DataFrame:
import pandas as pd
# Load the data
url='https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
df = pd.read_csv(url, names=['buying',
'maint',
'doors',
'persons',
'lug_boot',
'safety',
'acceptance'])
This will load the data from the URL and create a DataFrame with columns for buying, maintenance, number of doors, number of persons, size of the luggage boot, safety, and acceptance level.
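Before asking questions, it can help to confirm what was loaded. The following is a minimal sketch that uses a hypothetical four-row sample in the same header-less, comma-separated layout as car.data, so it runs without a network connection. The names sample, columns, and sample_df are illustrative and not part of the original article.

```python
import io

import pandas as pd

# Hypothetical four-row sample in the same header-less layout as car.data;
# point read_csv at the real URL instead to inspect the full dataset.
sample = io.StringIO(
    "vhigh,vhigh,2,2,small,low,unacc\n"
    "low,low,4,4,big,high,vgood\n"
    "med,low,5more,more,med,high,vgood\n"
    "high,med,3,4,small,med,unacc\n"
)
columns = ["buying", "maint", "doors", "persons",
           "lug_boot", "safety", "acceptance"]
sample_df = pd.read_csv(sample, names=columns)

# Check the shape and the column types, then count the target labels
print(sample_df.shape)
print(sample_df.dtypes)
acceptance_counts = sample_df["acceptance"].value_counts()
print(acceptance_counts)
```

Running the same three checks on the full DataFrame shows whether the column names line up with the data and how the acceptance labels are distributed.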
Asking Questions About the Data
Now that we have our data loaded, we can use pandas-ai.run to ask questions about it. Let's start by asking: what is the average acceptance level for cars with a high safety rating?
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
# Instantiate an LLM, in this case OpenAI
llm = OpenAI(api_token="YOUR_API_TOKEN")
pandas_ai = PandasAI(llm)
# Configure a prompt and run it
prompt = ("What is the average acceptance level "
          "for cars with a high safety rating?")
result = pandas_ai.run(df, prompt=prompt)
# output
The average acceptance level for cars with a high safety rating is 2.21
In this example, we’re using PandasAI to calculate the average acceptance level for cars with a high safety rating. We pass our DataFrame (df) as the first argument and the question as the prompt argument. Both the target (the acceptance level) and the condition (a high safety rating) are stated in the question itself; no separate filter arguments are passed. Because acceptance is stored as category labels, a numeric average such as 2.21 implies that the labels were mapped to numbers, and the output does not show which mapping was used.
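To cross-check an answer like this by hand, the same question can be written in plain Pandas. The sketch below uses a hypothetical five-row sample and an assumed ordinal encoding (unacc=1 through vgood=4). The dataset defines no numeric scale, so a different encoding would give a different average.

```python
import pandas as pd

# Hypothetical sample rows; replace with the full DataFrame loaded earlier.
df_sample = pd.DataFrame({
    "safety": ["high", "high", "high", "low", "med"],
    "acceptance": ["unacc", "acc", "vgood", "unacc", "acc"],
})

# Assumed ordinal encoding: the dataset itself defines no numeric scale.
acceptance_scale = {"unacc": 1, "acc": 2, "good": 3, "vgood": 4}

# Keep only the high-safety rows, encode the labels, and average them
high_safety = df_sample[df_sample["safety"] == "high"]
mean_acceptance = high_safety["acceptance"].map(acceptance_scale).mean()
print(round(mean_acceptance, 2))
```

Making the encoding explicit keeps the result reproducible, which a natural-language answer alone does not guarantee.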
Let’s ask another question: What is the most common acceptance level for cars with a high safety rating and a low price?
prompt = ("What is the most common acceptance level for "
          "cars with a high safety rating and a low price?")
result = pandas_ai.run(df, prompt=prompt)
# output
The most common acceptance level for cars with a
high safety rating and a low price is unacc
In this example, we’re asking PandasAI to find the most common acceptance level for cars with a high safety rating and a low price. As before, we pass our DataFrame (df) as the first argument and the question as the prompt argument. The target (the acceptance level) and both conditions (a high safety rating and a low buying price) are expressed only in the wording of the question.
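The equivalent Pandas query is short, which makes it a convenient sanity check. This is a sketch on a hypothetical five-row sample, assuming that "low price" refers to the buying column; the names df_sample, mask, and counts are illustrative.

```python
import pandas as pd

# Hypothetical sample rows; replace with the full DataFrame loaded earlier.
df_sample = pd.DataFrame({
    "buying": ["low", "low", "low", "high", "low"],
    "safety": ["high", "high", "high", "high", "low"],
    "acceptance": ["acc", "vgood", "acc", "unacc", "unacc"],
})

# Assumption: "low price" means buying == "low"
mask = (df_sample["safety"] == "high") & (df_sample["buying"] == "low")

# Count each acceptance label among the matching rows
counts = df_sample.loc[mask, "acceptance"].value_counts()
most_common = counts.idxmax()
print(counts)
print(most_common)
```

Printing the full counts alongside the top label also shows how close the runner-up is, which a one-sentence answer hides.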
Additional Questions
Here are a few more questions to ask our DataFrame. Only your imagination sets the limits!
prompt = ("What is the average price of cars with a "
          "low maintenance cost and a medium number of doors?")
result = pandas_ai.run(df, prompt=prompt)
# output
The average price of cars with a low maintenance
cost and a medium number of doors is 14100.0
prompt = ("What is the most common buying price for cars "
          "with a high safety rating and a low maintenance cost?")
result = pandas_ai.run(df, prompt=prompt)
# output
The most common buying price for cars with a high
safety rating and a low maintenance cost is vhigh
prompt = ("What is the acceptance level of cars "
          "with a high luggage capacity and a low price?")
result = pandas_ai.run(df, prompt=prompt)
# output
The acceptance level of cars with a high
luggage capacity and a low price is unacc
These are just a few examples of the types of questions you can ask using pandas-ai.run. With its natural language processing capabilities, you can ask complex questions about your data without writing the Pandas queries yourself.
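Since the answers are generated by a language model, it may be worth checking whether a question can be answered from the data at all. In this dataset the buying column holds category labels (low, med, high, vhigh), so a numeric average price is not something the data can supply directly. The sketch below uses a hypothetical sample and an illustrative helper, can_average, to test this before trusting a numeric answer.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample rows; replace with the full DataFrame loaded earlier.
df_sample = pd.DataFrame({
    "buying": ["low", "med", "high", "vhigh"],
    "maint": ["low", "low", "med", "high"],
    "doors": ["2", "3", "4", "5more"],
})


def can_average(frame, column):
    """Return True only if the column is numeric, so a mean is defined."""
    return is_numeric_dtype(frame[column])


# The buying column holds labels, so a mean price is not defined for it
buying_is_numeric = can_average(df_sample, "buying")
print(buying_is_numeric)

# A label count is the kind of summary the column does support
low_maint = df_sample["maint"] == "low"
buying_counts = df_sample.loc[low_maint, "buying"].value_counts()
print(buying_counts)
```

For categorical columns, questions about the most common value or the distribution of values map onto the data more directly than questions about averages.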
Conclusion
PandasAI is a powerful tool that adds AI capabilities to the popular Pandas library. With its ability to analyze dataframes and answer questions through natural language processing, it makes data analysis more accessible and intuitive for both beginners and experts alike.
In this article, we used PandasAI to analyze the Car Evaluation dataset and answer several questions about the data. We started by loading the dataset into a Pandas DataFrame. Then, we used pandas-ai.run to ask for the average acceptance level for cars with a high safety rating, the most common acceptance level for cars with a high safety rating and a low price, and the most common buying price for cars with a high safety rating and a low maintenance cost. Each answer came back as a single readable sentence.
PandasAI can save time and effort in data analysis tasks, and its natural language interface can help those who are not familiar with Pandas or Python. As we have shown, it can be used to ask a wide range of questions about your data, making it a valuable addition to your data analysis toolkit.
PandasAI: Conversational Dataframes was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
Ulrik Thyge Pedersen | Sciencx (2023-05-09T13:04:52+00:00) PandasAI: Conversational Dataframes. Retrieved from https://www.scien.cx/2023/05/09/pandasai-conversational-dataframes/