In some situations, the data we receive contains multiple entries for the same object.
For example:
In all of these situations, we may need to remove some rows so that there is only a single entry per user. It therefore helps to know a few Python techniques for doing this.
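Before removing anything, it can help to confirm that duplicates actually exist. The following is a minimal sketch, assuming pandas and a hypothetical `user` column with invented values; `duplicated()` and `is_unique` are standard pandas calls:

```python
import pandas as pd

# Hypothetical records: "alice" appears twice, so this is not one row per user.
df = pd.DataFrame({
    "user": ["alice", "bob", "alice", "carol"],
    "score": [10, 7, 12, 5],
})

# duplicated() marks every occurrence of a key after the first as True.
dup_mask = df["user"].duplicated()
print(dup_mask.tolist())  # [False, False, True, False]

# The frame has a single entry per user exactly when the key column is unique.
print(df["user"].is_unique)  # False

# keep=False flags every row that belongs to a repeated key, useful for inspection.
print(df[df["user"].duplicated(keep=False)])
```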
There are many ways to solve this problem; here we show two approaches.
Consider data in the following form, which holds population figures for the different states of several countries:
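The original table is not reproduced here. As a stand-in, this sketch builds a small DataFrame with the `CountryName` and `Population` columns used in the code below, plus an assumed `StateName` column; all country names, state names and figures are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data: names and populations are made up.
data = pd.DataFrame({
    "CountryName": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Gamma"],
    "StateName": ["A1", "A2", "A3", "B1", "B2", "G1"],  # assumed column
    "Population": [500, 1200, 300, 800, 650, 90],
})

print(data)
```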
Expected Output:
Keep a single row per country: the one with the highest population.
Approach 1:
Code:
# Sort so each country's most populous state comes last, then keep only that row
data.sort_values(['CountryName', 'Population'], ascending=True, inplace=True)
data.drop_duplicates(['CountryName'], keep="last", inplace=True)
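A self-contained version of Approach 1, using invented sample data as a stand-in for the original table (the `StateName` column is an assumption). This sketch assigns the results instead of using `inplace=True`, so `data` itself is left unchanged:

```python
import pandas as pd

# Hypothetical sample data (invented values).
data = pd.DataFrame({
    "CountryName": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Gamma"],
    "StateName": ["A1", "A2", "A3", "B1", "B2", "G1"],
    "Population": [500, 1200, 300, 800, 650, 90],
})

# Within each country, rows now run from smallest to largest population.
result = data.sort_values(["CountryName", "Population"], ascending=True)

# The last row of each country is therefore its most populous state.
result = result.drop_duplicates(subset=["CountryName"], keep="last")

print(result)
```

Here the rows kept are A2, B1 and G1, the most populous state of each hypothetical country. If two states tie on the maximum population, only one of them survives.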
Alternative Approach:
You can also sort the data in descending order and keep the first row of each country instead.
Code:
# Sort descending so each country's most populous state comes first, then keep only that row
data.sort_values(['CountryName', 'Population'], ascending=False, inplace=True)
data.drop_duplicates(['CountryName'], keep="first", inplace=True)
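The descending variant can be sketched the same way, again on invented sample data. Sorting both columns in descending order also reverses the order of the countries; passing a list to `ascending` (a variant not in the original) keeps countries in A-Z order while still putting the largest population first within each country:

```python
import pandas as pd

# Hypothetical sample data (invented values).
data = pd.DataFrame({
    "CountryName": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Gamma"],
    "StateName": ["A1", "A2", "A3", "B1", "B2", "G1"],
    "Population": [500, 1200, 300, 800, 650, 90],
})

# Descending on both columns: the first row of each country is its largest state,
# but the countries themselves come out in Z-A order.
desc = data.sort_values(["CountryName", "Population"], ascending=False)
top_desc = desc.drop_duplicates(subset=["CountryName"], keep="first")

# Per-column sort order: countries A-Z, population largest-first within a country.
mixed = data.sort_values(["CountryName", "Population"], ascending=[True, False])
top_mixed = mixed.drop_duplicates(subset=["CountryName"], keep="first")

# Both select the same rows; only the row order differs.
print(top_desc.sort_index().equals(top_mixed.sort_index()))  # True
print(top_mixed)
```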
Approach 2:
Code:
# get_level_values(1) returns row labels (not positions), so select them with .loc
data.loc[data.groupby('CountryName')['Population'].nlargest(1).index.get_level_values(1)]
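In Approach 2, `nlargest(1)` on the grouped column returns a Series indexed by (country, original row label), and `get_level_values(1)` extracts those row labels. Because they are labels rather than positions, `.loc` is the safe selector; `.iloc` only works by coincidence while the index is the default 0..n-1. The sketch below uses invented data with a deliberately non-default index (an assumption, standing in for data that has already been sorted or filtered) and also shows `idxmax`, a swapped-in alternative that is not part of the original post:

```python
import pandas as pd

# Hypothetical sample data with non-default row labels 10..15.
data = pd.DataFrame(
    {
        "CountryName": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Gamma"],
        "StateName": ["A1", "A2", "A3", "B1", "B2", "G1"],
        "Population": [500, 1200, 300, 800, 650, 90],
    },
    index=[10, 11, 12, 13, 14, 15],
)

# One largest value per country; the result's index is (CountryName, row label).
top = data.groupby("CountryName")["Population"].nlargest(1)
labels = top.index.get_level_values(1)

# Select by label. data.iloc[labels] would raise IndexError here,
# because 11, 13 and 15 are not valid positions in a 6-row frame.
by_nlargest = data.loc[labels]

# Alternative: idxmax returns the row label of each country's maximum directly.
by_idxmax = data.loc[data.groupby("CountryName")["Population"].idxmax()]

print(by_nlargest)
print(by_nlargest.equals(by_idxmax))  # True
```

For picking exactly one row per group, `idxmax` avoids the intermediate MultiIndex; `nlargest(n)` is the more general choice when you want the top n rows per country.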
Python, and pandas in particular, offers many one-liners like these for solving common data-cleaning problems concisely.