A dataset can contain both numerical and categorical data, and many machine learning models need the categorical values encoded as numbers before training. That is why encoding is an important preprocessing step.
In Python, encoding a column is simple with the pandas apply function.
Suppose we want to encode the categories of the Salary variable, High, Medium, and Low, as 3, 2, and 1. To do this, we write a function that performs the mapping and then pass that function to apply, which calls it on every value in the column.
import pandas as pd

# Sample data: one categorical column and one numerical column
sal = ['Low', 'High', 'Medium', 'Low', 'Medium']
age = [18, 30, 25, 20, 22]
data = pd.DataFrame()
data['Salary'] = sal
data['Age'] = age
def encode_salary(salary):
    if salary == "Low":
        return 1
    if salary == "Medium":
        return 2
    if salary == "High":
        return 3
    return 4 # for junk values
data['Encoded_Salary'] = data['Salary'].apply(encode_salary)
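As an alternative to writing a mapping function, the same encoding can usually be expressed with a dictionary and the pandas Series.map method. The snippet below is only a minimal sketch, not part of the original example: the name salary_codes is hypothetical, and it assumes that unrecognised values should get the same junk code of 4.

```python
import pandas as pd

# Same toy data as in the example above
data = pd.DataFrame({
    "Salary": ["Low", "High", "Medium", "Low", "Medium"],
    "Age": [18, 30, 25, 20, 22],
})

# Hypothetical lookup table mirroring the if-statements in the mapping function
salary_codes = {"Low": 1, "Medium": 2, "High": 3}

# map() yields NaN for values missing from the dictionary,
# so fill those with 4 (assumed junk code) and cast back to integers
data["Encoded_Salary"] = data["Salary"].map(salary_codes).fillna(4).astype(int)

print(data)
```

A dictionary keeps the mapping in one place, which can be easier to maintain when there are many categories, while a function passed to apply leaves room for more involved logic.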
There is a lot more in this course, so stay tuned for more such techniques.