This content originally appeared on CodeSource.io - Quality Web & programming Tutorials and was authored by Codesource Staff
In this article, you will learn to compute the median in Pandas.
A Pandas DataFrame is a two-dimensional, labeled data structure that stores data in rows and columns, much like a rectangular grid or a spreadsheet. Pandas itself is an open-source library that is fast, powerful, and easy to use. When working with large datasets we often need to analyze, manipulate, and update the data, and the pandas library plays a leading role there.
In Pandas, we may need to compute the median of a single column or of all the columns. The median is a statistical measure: the middle value of a sorted set of numbers, or the average of the two middle values when the count is even. We could calculate it manually. Say we have a sorted list of numbers like [1, 2, 3, 4, 5, 6, 7, 8, 9]. The median of this list is 5, as it stands in the middle. But in real-world scenarios, things are rarely that simple. We may need to handle larger data that contains negative values and is not sorted, and working out the median by hand quickly becomes tedious. Luckily, we do not need to do this manually. Pandas provides a method named median() that computes the median value. In this article, we will explore this method and see how to use it. Before doing so, let’s create a simple DataFrame:
import pandas as pd
student_df = pd.DataFrame({'Age': [21, 23, 19, 22, 20],
                           'Ct_marks1': [77, 84, 90, 67, 55],
                           'Ct_marks2': [58, 74, 77, 87, 75]})
print(student_df)
# Output:
#    Age  Ct_marks1  Ct_marks2
# 0   21         77         58
# 1   23         84         74
# 2   19         90         77
# 3   22         67         87
# 4   20         55         75
Here, you can see that we have created a simple Pandas DataFrame that represents students’ ages and Ct Marks.
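To make the manual calculation mentioned earlier concrete, here is a minimal sketch in plain Python. The helper name manual_median is hypothetical and written only for illustration; it is not part of pandas. It sorts the values first, then takes the middle one (or the average of the two middle ones for an even count), which is why unsorted and negative values are not a problem.

```python
def manual_median(values):
    """Hypothetical helper: median by sorting and picking the middle."""
    ordered = sorted(values)  # the median is defined on sorted data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [21, 23, 19, 22, 20]           # same values as the Age column above
print(manual_median(ages))            # 21
print(manual_median([-3, 10, 4, 7]))  # 5.5
```

The examples that follow let pandas do this work through median() instead.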
Example One: Compute the median of a single column
import pandas as pd
student_df = pd.DataFrame({'Age': [21, 23, 19, 22, 20],
                           'Ct_marks1': [77, 84, 90, 67, 55],
                           'Ct_marks2': [58, 74, 77, 87, 75]})
age_median = student_df['Age'].median()
print(age_median)
# Output: 21.0
Here, we use the median() method to compute the median of the Age column. You can see the result in the output. In the same way, you can get the median of several columns at once: pass a list of column names, separated by commas, inside the brackets, for example student_df[['Age', 'Ct_marks1']].median().
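As a short sketch of selecting more than one column, the names go in a list inside the indexing brackets (hence the double brackets). The variable name subset_median is chosen here only for illustration.

```python
import pandas as pd

student_df = pd.DataFrame({'Age': [21, 23, 19, 22, 20],
                           'Ct_marks1': [77, 84, 90, 67, 55],
                           'Ct_marks2': [58, 74, 77, 87, 75]})

# The inner list picks the columns; median() then runs on that sub-DataFrame
subset_median = student_df[['Ct_marks1', 'Ct_marks2']].median()
print(subset_median)
# Output:
# Ct_marks1    77.0
# Ct_marks2    75.0
# dtype: float64
```

The result is a Series indexed by column name, so a single value can be read with subset_median['Ct_marks1'].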
Example Two: Compute the median of all columns
import pandas as pd
student_df = pd.DataFrame({'Age': [21, 23, 19, 22, 20],
                           'Ct_marks1': [77, 84, 90, 67, 55],
                           'Ct_marks2': [58, 74, 77, 87, 75]})
all_median = student_df.median()
print(all_median)
# Output:
# Age          21.0
# Ct_marks1    77.0
# Ct_marks2    75.0
# dtype: float64
Here, you can see that we compute the median of every column in the DataFrame. Make sure the columns contain numeric data; otherwise, pandas 2.0 and later will typically raise a TypeError, while older versions silently skipped such columns. To avoid this, select only the numeric columns before calling median(), or pass numeric_only=True.
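The following sketch shows two ways to restrict the calculation to numeric columns. The Name column and the variable names are assumptions added purely for illustration; they are not part of the DataFrame used earlier.

```python
import pandas as pd

# 'Name' is a hypothetical text column added to show the non-numeric case
mixed_df = pd.DataFrame({'Name': ['Ana', 'Ben', 'Cy', 'Dee', 'Eli'],
                         'Age': [21, 23, 19, 22, 20],
                         'Ct_marks1': [77, 84, 90, 67, 55]})

# Option 1: ask median() to ignore non-numeric columns
numeric_median = mixed_df.median(numeric_only=True)

# Option 2: keep only numeric columns first, then reduce
selected_median = mixed_df.select_dtypes(include='number').median()

print(numeric_median)
# Output:
# Age          21.0
# Ct_marks1    77.0
# dtype: float64
```

Both options give the same result here. The second is handy when the numeric subset is reused for other calculations as well.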
Finally, these are some useful approaches that you may follow to compute the median in Pandas.
Codesource Staff | Sciencx (2022-10-02T14:34:25+00:00) How to compute the median in Pandas. Retrieved from https://www.scien.cx/2022/10/02/how-to-compute-the-median-in-pandas-2/