Basic Data Analysis on the Iris Flower Dataset (HNG 11)

This content originally appeared on DEV Community and was authored by TOLUWANII-tech

This task was part of my data analysis internship with HNG11, and completing it is required for all stage-zero interns to proceed to the next stage. The task was relatively simple: review a dataset chosen from a list of given options. The objectives were to identify initial insights from the dataset at first glance and to discover patterns, trends, or anomalies.

I chose the Iris flower dataset and performed basic exploratory data analysis using Python and its libraries in a notebook. At first glance, the data file (.data) has 150 rows of 5 comma-separated values each (5 columns). The first four values are numerical variables, while the last is a categorical variable that is immediately identifiable as the label of the dataset. However, the data file itself contains no description of any of the variables.

Accompanying the data file was a text file giving a clearer description of the variables. With this information, I imported the data into the notebook and read it into a DataFrame object using the pandas library, assigning appropriate names to the columns. In order, the columns are ‘sepal length (cm)’, ‘sepal width (cm)’, ‘petal length (cm)’, ‘petal width (cm)’, and ‘class’.
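
As a minimal sketch of this loading step: the snippet below reads a headerless, comma-delimited file into a DataFrame and assigns the column names listed above. The post does not show its own code, so the helper name `load_iris`, the file name `iris.data`, and the three in-memory sample rows (standing in for the real file) are assumptions made for illustration.

```python
import io

import pandas as pd

# Column names as listed in the post, in file order.
COLUMNS = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
    "class",
]


def load_iris(source):
    """Read a headerless, comma-delimited Iris file into a DataFrame.

    `source` can be a file path or any file-like object.
    """
    # header=None tells pandas the first row is data, not column names.
    return pd.read_csv(source, header=None, names=COLUMNS)


# Stand-in for the real file: three rows in the same layout.
# With the actual file, something like load_iris("iris.data") would be
# used instead (the path is an assumption).
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
df = load_iris(sample)
print(df.shape)
print(df.dtypes)
```

Passing `header=None` matters here: without it, pandas would treat the first flower's measurements as column names and silently drop one row from the data.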

Using appropriate methods in pandas, I found the means of the numerical variables ‘sepal length (cm)’, ‘sepal width (cm)’, ‘petal length (cm)’ and ‘petal width (cm)’ to be 5.84, 3.05, 3.76 and 1.20 respectively (to 2 d.p.). I also observed that the categorical variable ‘class’ had only three unique values, one for each of three kinds of Iris flower: ‘Iris-setosa’, ‘Iris-versicolor’ and ‘Iris-virginica’. All of this information was also pointed out in the text description file. Another observation was that each of the three classes appeared the same number of times in the dataset, which means there were 50 Iris-setosa flowers, 50 Iris-versicolor flowers and 50 Iris-virginica flowers.
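
The summary statistics described above could be computed along these lines. This is a sketch, not the code used in the original notebook: it runs on a nine-row sample (three rows per class) held in memory, so the printed means will differ from the full-dataset values of 5.84, 3.05, 3.76 and 1.20 reported in the post.

```python
import io

import pandas as pd

COLUMNS = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
    "class",
]
NUMERIC = COLUMNS[:4]

# Nine-row stand-in for the 150-row file (an assumption for illustration).
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.4,3.2,4.5,1.5,Iris-versicolor\n"
    "6.9,3.1,4.9,1.5,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "5.8,2.7,5.1,1.9,Iris-virginica\n"
    "7.1,3.0,5.9,2.1,Iris-virginica\n"
)
df = pd.read_csv(sample, header=None, names=COLUMNS)

# Mean of each numerical column, rounded to 2 decimal places.
means = df[NUMERIC].mean().round(2)

# Number of distinct labels and how often each one occurs.
n_classes = df["class"].nunique()
class_counts = df["class"].value_counts()

# The classes are balanced if every label has the same count.
balanced = class_counts.nunique() == 1

print(means)
print(class_counts)
print("balanced:", balanced)
```

On the full file, `class_counts` is where the 50/50/50 split would show up.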

With the aid of plotting and graphing tools, it was clear that an approximately linear relationship exists between petal width and petal length, as well as between petal length and sepal length. The Iris-virginica flowers had the longest petals and sepals, and the Iris-setosa flowers the shortest. This can be seen in the graph below.
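
The post does not say which plotting tools were used, so as a numeric complement to the graph, the sketch below checks the same relationships with the Pearson correlation from pandas and compares mean petal length per class. It is an illustrative stand-in rather than the original analysis, and it runs on a nine-row in-memory sample, so the exact values will differ from those on the full dataset.

```python
import io

import pandas as pd

COLUMNS = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
    "class",
]
NUMERIC = COLUMNS[:4]

# Nine-row stand-in for the 150-row file (an assumption for illustration).
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.4,3.2,4.5,1.5,Iris-versicolor\n"
    "6.9,3.1,4.9,1.5,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "5.8,2.7,5.1,1.9,Iris-virginica\n"
    "7.1,3.0,5.9,2.1,Iris-virginica\n"
)
df = pd.read_csv(sample, header=None, names=COLUMNS)

# Pairwise Pearson correlation between the numerical columns.
# Values near +1 indicate a strong positive linear relationship.
corr = df[NUMERIC].corr()
petal_corr = corr.loc["petal length (cm)", "petal width (cm)"]
petal_sepal_corr = corr.loc["petal length (cm)", "sepal length (cm)"]

# Mean petal length for each class, to compare the three species.
mean_petal_by_class = df.groupby("class")["petal length (cm)"].mean()

print(f"petal length vs petal width:  {petal_corr:.2f}")
print(f"petal length vs sepal length: {petal_sepal_corr:.2f}")
print(mean_petal_by_class)
```

A correlation coefficient only measures linear association, so it supports the scatter plot rather than replacing it: a plot can reveal clusters or curvature that a single number hides.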

There is a clear association between the sepal and petal measurements of the flowers and their class. The graph also suggests that petal length and petal width do more to distinguish the flower classes than sepal width does. This could be used to infer the class of flowers in a new, unlabeled dataset.
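
One rough way to check which measurements separate the classes is to compare the per-class value ranges of each column: a column whose ranges do not overlap between classes distinguishes them more cleanly. The sketch below does this with a hypothetical helper, `overlapping_pairs`, on a nine-row in-memory sample. It is an illustration under those assumptions, not the method used in the post, and the full 150-row file may well show some overlap that this small sample does not.

```python
import io
from itertools import combinations

import pandas as pd

COLUMNS = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
    "class",
]

# Nine-row stand-in for the 150-row file (an assumption for illustration).
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.4,3.2,4.5,1.5,Iris-versicolor\n"
    "6.9,3.1,4.9,1.5,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "5.8,2.7,5.1,1.9,Iris-virginica\n"
    "7.1,3.0,5.9,2.1,Iris-virginica\n"
)
df = pd.read_csv(sample, header=None, names=COLUMNS)


def overlapping_pairs(frame, column):
    """Return the class pairs whose [min, max] ranges for `column` overlap.

    Hypothetical helper: fewer overlapping pairs means the column
    separates the classes more cleanly.
    """
    ranges = frame.groupby("class")[column].agg(["min", "max"])
    pairs = []
    for a, b in combinations(ranges.index, 2):
        # Two closed intervals overlap when each starts before the other ends.
        if (ranges.loc[a, "min"] <= ranges.loc[b, "max"]
                and ranges.loc[b, "min"] <= ranges.loc[a, "max"]):
            pairs.append((a, b))
    return pairs


petal_length_overlap = overlapping_pairs(df, "petal length (cm)")
sepal_width_overlap = overlapping_pairs(df, "sepal width (cm)")

print("petal length overlaps:", petal_length_overlap)
print("sepal width overlaps: ", sepal_width_overlap)
```

Comparing ranges is deliberately crude, since a single outlier can create an overlap, but it gives a quick first check of the pattern the graph suggests.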


APA

TOLUWANII-tech | Sciencx (2024-06-29T22:19:02+00:00) Basic Data Analysis on the Iris Flower Dataset (HNG 11). Retrieved from https://www.scien.cx/2024/06/29/basic-data-analysis-on-the-iris-flower-dataset-hng-11/
