In this article we’ll learn some of the commonly used math functions in Pandas. Let’s get started.
The abs() function:
The first one we are going to see is the abs() function. This function returns the absolute values for the elements in the data frame.
|
import pandas as pd
import numpy as np

df = pd.DataFrame()
df['A'] = np.random.randint(-100, 100, size=10)
print(df) |
|
    A
0 -57
1 -49
2  -8
3   9
4  72
5 -57
6 -72
7   3
8  80
9  13 |
There are a few negative values. Let’s apply the abs function.
|
df['A'] = df['A'].abs()
print(df) |
|
    A
0  57
1  49
2   8
3   9
4  72
5  57
6  72
7   3
8  80
9  13 |
The abs() function can also be applied to complex numbers. For a complex number a + bj, the absolute value is calculated as $\sqrt{a^{2}+b^{2}}$.
Let’s create a new column B and populate it with complex numbers.
|
df['B'] = np.random.randint(0, 10, 10) + np.random.randint(0, 10, 10) * 1j
print(df['B']) |
After applying the abs function, the results are as follows:
|
df['B'] = df['B'].abs()
print(df['B']) |
|
0     7.000000
1     3.162278
2     6.708204
3     8.944272
4    12.727922
5     9.433981
6     6.708204
7     8.246211
8     9.486833
9     8.062258
Name: B, dtype: float64 |
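Because the column above is random, the numbers are hard to check by hand. The following is a minimal sketch with hypothetical sample values (they are not from the article's data) that compares Series.abs() against the formula computed directly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen so the results are easy to verify by hand.
s = pd.Series([3 + 4j, -5 + 0j, 1 - 1j])

# Series.abs() returns the magnitude of each complex number as a float.
magnitudes = s.abs()

# The same quantity computed directly as sqrt(a^2 + b^2).
values = s.to_numpy()
manual = np.sqrt(values.real ** 2 + values.imag ** 2)

print(magnitudes)
print(manual)
```

Both approaches should agree: 3 + 4j has magnitude 5, and so does -5 + 0j.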
The clip() function:
The clip() function takes two values, a lower limit and an upper limit. Any value in the data frame above the upper limit is replaced by the upper limit, and any value below the lower limit is replaced by the lower limit.
|
df['A'] = df['A'].clip(lower=25, upper=65)
print(df['A']) |
|
0    57
1    49
2    25
3    25
4    65
5    57
6    65
7    25
8    65
9    25
Name: A, dtype: int64 |
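Either limit can also be given on its own. The sketch below uses a small hypothetical Series (not the article's random data) to show the effect of both limits together and of each limit separately.

```python
import pandas as pd

# Hypothetical sample values: one below 25, one above 65, the rest in range.
s = pd.Series([8, 49, 72, 25, 65])

# Both limits: values are forced into the range [25, 65].
both = s.clip(lower=25, upper=65)

# Only a lower limit: values below 25 are raised, large values are untouched.
lower_only = s.clip(lower=25)

# Only an upper limit: values above 65 are reduced, small values are untouched.
upper_only = s.clip(upper=65)

print(both.tolist())        # [25, 49, 65, 25, 65]
print(lower_only.tolist())  # [25, 49, 72, 25, 65]
print(upper_only.tolist())  # [8, 49, 65, 25, 65]
```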
The count() function:
This function returns the number of non-null values in each column of the data frame.
The following is our data frame:
|
    A          B
0  57   7.000000
1  49   3.162278
2  25   6.708204
3  25   8.944272
4  65  12.727922
5  57   9.433981
6  65   6.708204
7  25   8.246211
8  65   9.486833
9  25   8.062258 |
Calling df.count() on this data frame gives the following result:
|
A    10
B    10
dtype: int64 |
Since there are no null values in the data frame, the result is ten for both columns, which is the number of observations. Let’s introduce a null value in column B and apply the count function again.
|
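# Set one randomly chosen row of column B to NaN.
# .loc avoids chained assignment, and np.nan replaces the removed pd.np alias.
df.loc[np.random.randint(10), 'B'] = np.nan
print(df['B']) |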
|
0     7.000000
1     3.162278
2          NaN
3     8.944272
4    12.727922
5     9.433981
6     6.708204
7     8.246211
8     9.486833
9     8.062258
Name: B, dtype: float64 |
Now, if we apply the count function again, we get the following results:
|
A    10
B     9
dtype: int64 |
By specifying axis=1, we can get the number of non-null values for each row instead.
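As a small illustration of the two directions, here is a sketch on a hypothetical three-row data frame (the values are made up for the example) with one missing entry in column B.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with a single missing value in column B.
df = pd.DataFrame({'A': [57, 49, 25], 'B': [7.0, np.nan, 6.7]})

# Default (axis=0): non-null values per column.
per_column = df.count()

# axis=1: non-null values per row.
per_row = df.count(axis=1)

print(per_column)  # A has 3 non-null values, B has 2
print(per_row)     # rows 0 and 2 have 2 non-null values, row 1 has 1
```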
The min(), max(), mean(), median() and mode() functions:
These functions are straightforward.
|
print('Minimum:')
print(df.min(), '\n')
print('Maximum:')
print(df.max(), '\n')
print('Mean:')
print(df.mean(), '\n')
print('Median:')
print(df.median(), '\n')
print('Mode:')
print(df['A'].mode(), '\n') |
|
Minimum:
A    25.000000
B     3.162278
dtype: float64

Maximum:
A    65.000000
B    12.727922
dtype: float64

Mean:
A    45.800000
B     8.196884
dtype: float64

Median:
A    53.000000
B     8.246211
dtype: float64

Mode:
0    25
dtype: int64 |
One thing worth mentioning is that by default null values are skipped in these calculations. To include them, set the parameter skipna to False in min(), max(), mean() or median(); any column that contains a null value then yields NaN.
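The effect of skipna is easiest to see on a tiny example. The sketch below uses a hypothetical data frame (values invented for illustration) where column B has one missing entry.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame: column B contains one NaN.
df = pd.DataFrame({'A': [25, 49, 65], 'B': [3.0, np.nan, 9.0]})

# Default behaviour (skipna=True): the NaN in B is ignored,
# so the mean of B is computed from 3.0 and 9.0 only.
default_mean = df.mean()

# skipna=False: the NaN propagates, so the mean of B is NaN.
# Column A has no missing values and is unaffected.
strict_mean = df.mean(skipna=False)

print(default_mean)
print(strict_mean)
```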
The rank() function:
The rank() function returns the rank of the values in the data frame. Consider the following data frame.
|
df = pd.DataFrame()
df['A'] = np.random.randint(0, 10, 5)
print(df) |
|
   A
0  9
1  0
2  1
3  6
4  7 |
The result of calling df.rank() on the above data frame is as follows:
|
     A
0  5.0
1  1.0
2  2.0
3  3.0
4  4.0 |
By default, values are ranked in ascending order. To rank them in descending order, set the parameter ascending to False.
|
print(df.rank(ascending=False)) |
|
     A
0  1.0
1  5.0
2  4.0
3  3.0
4  2.0 |
Another important argument is method, which decides what to do in case of a tie. The options include average (the default), min, max and first. Let’s take a look at each of them.
Consider the following data frame.
|
df = pd.DataFrame()
df['A'] = np.random.randint(0, 3, 6)
print(df) |
|
   A
0  0
1  0
2  1
3  1
4  2
5  0 |
‘average’ computes the average rank of each group of tied values and assigns it to every item in the group.
|
print(df.rank(method='average')) |
|
     A
0  2.0
1  2.0
2  4.5
3  4.5
4  6.0
5  2.0 |
Let’s see how these ranks are assigned.
The smallest element here is 0. Since there are three zeros, they occupy the ranks 1, 2 and 3.
The average of these ranks is (1 + 2 + 3) / 3 = 2, so the rank 2 is assigned to all the zeros.
The next smallest element is 1. Since three elements are smaller than 1, the two ones occupy the ranks 4 and 5. The average of 4 and 5 is 4.5,
so the rank 4.5 is assigned to both ones.
The last element, 2, gets a rank of 6.
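The reasoning above can be reproduced step by step. The following is a sketch, not how pandas computes ranks internally: it first gives every element a distinct position, then averages the positions within each group of equal values, and compares the result with method='average'. The Series holds the same values as column A above.

```python
import pandas as pd

# Same values as column A in the example above.
s = pd.Series([0, 0, 1, 1, 2, 0])

# Step 1: give every element a distinct position in sorted order
# (ties are broken by order of appearance).
positions = s.rank(method='first')

# Step 2: average the positions within each group of equal values.
manual_average = positions.groupby(s).transform('mean')

# The built-in result for comparison.
builtin_average = s.rank(method='average')

print(manual_average.tolist())   # [2.0, 2.0, 4.5, 4.5, 6.0, 2.0]
print(builtin_average.tolist())  # [2.0, 2.0, 4.5, 4.5, 6.0, 2.0]
```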
Now let’s see how ‘min’ assigns the ranks.
|
print(df.rank(method='min')) |
|
     A
0  1.0
1  1.0
2  4.0
3  4.0
4  6.0
5  1.0 |
The ‘min’ method assigns the lowest rank in the group to all tied items.
Similarly, ‘max’ assigns the highest rank in the group to all tied items.
|
print(df.rank(method='max')) |
|
     A
0  3.0
1  3.0
2  5.0
3  5.0
4  6.0
5  3.0 |
With ‘first’, tied items are ranked in the order they appear in the data.
|
print(df.rank(method='first')) |
|
     A
0  1.0
1  2.0
2  4.0
3  5.0
4  6.0
5  3.0 |
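To compare the four tie-breaking methods at a glance, one option is to put them side by side in a single data frame. This is a minimal sketch that hard-codes the same six values as the example above, so it does not depend on the random draw.

```python
import pandas as pd

# Same values as column A in the tie-breaking example above.
s = pd.Series([0, 0, 1, 1, 2, 0])

# One column of ranks per tie-breaking method.
methods = ['average', 'min', 'max', 'first']
comparison = pd.DataFrame({m: s.rank(method=m) for m in methods})

# Put the original values in the first column for reference.
comparison.insert(0, 'value', s)

print(comparison)
```

Each row then shows how the same value is ranked under every method, which makes the differences between the three zeros easy to spot.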
Summary:
In this article we learned some of the commonly used math functions in Pandas.
We discussed five groups of functions. They are:
abs(): returns the absolute value of each element in the data frame.
clip(): limits the values in the data frame to the range between a lower and an upper limit.
count(): returns the number of non-null values in each column of the data frame.
min(), max(), mean(), median(), mode(): return the corresponding statistic for each column in the data frame.
rank(): returns the rank of each element in the data frame.