A dataset can contain both numerical and categorical data, and when building a machine learning model we may need to encode the categorical values as numbers. In such cases, encoding becomes important.
In Python, it is simple to encode the data using the pandas apply function.
Suppose we want to encode the categories of the Salary variable, High, Medium, and Low, as 3, 2, and 1. For this, we will write a function for the mapping and then pass that function to apply, which calls it on every value in the column.
Code:
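import pandas as pd

# Sample data: a categorical Salary column and a numerical Age column
sal = ['Low', 'High', 'Medium', 'Low', 'Medium']
age = [18, 30, 25, 20, 22]
data = pd.DataFrame()
data['Salary'] = sal
data['Age'] = age

def encode_salary(salary):
    # Map each salary category to its numeric code
    if salary == "Low":
        return 1
    if salary == "Medium":
        return 2
    if salary == "High":
        return 3
    return 4  # for junk values

# Apply the mapping function to every value in the Salary column
data['Encoded_Salary'] = data['Salary'].apply(encode_salary)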
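As an alternative sketch (not part of the original example), the same encoding can be written as a dictionary lookup with the pandas Series.map method. The dictionary name salary_codes and the extra 'Unknown' row are assumptions added for illustration. Categories missing from the dictionary become NaN, so they are filled with 4 here to mirror the junk-value fallback in encode_salary.

```python
import pandas as pd

# Same sample data as above, plus one hypothetical junk value ('Unknown')
data = pd.DataFrame({
    'Salary': ['Low', 'High', 'Medium', 'Low', 'Medium', 'Unknown'],
    'Age': [18, 30, 25, 20, 22, 40],
})

# Hypothetical lookup table from category to numeric code
salary_codes = {'Low': 1, 'Medium': 2, 'High': 3}

# map() yields NaN for categories that are not in the dictionary;
# fillna(4) assigns the junk code and astype(int) restores an integer dtype
data['Encoded_Salary'] = data['Salary'].map(salary_codes).fillna(4).astype(int)

print(data)
```

A dictionary keeps the mapping in one place, which can be easier to maintain than a chain of if statements when the number of categories grows. The apply approach is still the better fit when the encoding needs logic beyond a plain lookup.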
There is a lot more in this course, so stay tuned for more such techniques.