Padhai Time

Introduction to Featurization and Feature Engineering

Featurization: The process of converting data in formats such as text or images into numerical vectors, because classical machine learning models can only process numeric data.

 

Feature Engineering: The process of modifying the data and using our domain knowledge to extract useful features, so that the data works well with machine learning models.

Data can be provided to us in various forms. In this article, let's discuss the main types of data and the techniques used to engineer each of them before applying any machine learning model.

1) Text Data: When the data is given in text format, it has to be converted to numeric vectors before a model can use it.
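As a rough illustration of turning text into numeric vectors, here is a minimal sketch of a bag-of-words count vector in plain Python. The article does not name a specific text technique, so bag-of-words is our own choice for the example, and the helper names `build_vocabulary` and `bag_of_words` are hypothetical. Tokenization here is a simple lowercase split on whitespace, which is an assumption and far cruder than what a real pipeline would use.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index (sorted for determinism)."""
    tokens = {tok for doc in documents for tok in doc.lower().split()}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def bag_of_words(document, vocabulary):
    """Return a count vector with one entry per vocabulary token."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(document.lower().split()).items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[tok]] = count
    return vector


docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocabulary(docs)          # columns: cat, dog, mat, on, sat, the
vectors = [bag_of_words(d, vocab) for d in docs]
print(vectors)  # [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
```

Each document becomes a fixed-length vector whose length equals the vocabulary size, which is the shape a classical model expects.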

 

2) Categorical Data: The data takes values from a finite number of groups or distinct categories. This kind of data can be converted to numeric values using techniques like:

 

  • One Hot Encoding
  • Label Encoding
  • Domain Specific ways
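The first two techniques in the list above can be sketched in a few lines of plain Python. This is only a minimal illustration: the function names `label_encode` and `one_hot_encode` are hypothetical, and ordering the categories alphabetically is our own assumption.

```python
def label_encode(values):
    """Replace each category with an integer index (alphabetical order assumed)."""
    categories = sorted(set(values))
    mapping = {cat: idx for idx, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Replace each category with a 0/1 vector that has a single 1."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


colors = ["red", "green", "blue", "green"]

labels, mapping = label_encode(colors)
print(labels)   # [2, 1, 0, 1]

rows, columns = one_hot_encode(colors)
print(columns)  # ['blue', 'green', 'red']
print(rows)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Note that label encoding implies an order between categories (blue < green < red here) that may not exist in the data, while one hot encoding avoids this at the cost of one column per category.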

 

3) Numerical Data: Data may also be given in numeric form (age, height, weight etc). Such data often still needs to be transformed, for example to bring features onto a comparable scale or to reduce skew. Common transformations are:

  

  • Normalization
  • Standardization
  • Log Transformation
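The three transformations above can be sketched as follows. We assume here that "Normalization" means min-max scaling to the range [0, 1] and that "Standardization" means subtracting the mean and dividing by the population standard deviation; both are common readings but not the only ones. The function names are hypothetical, and the log transformation uses `log(1 + x)` so that zero values are handled.

```python
import math
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values to [0, 1]; assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def log_transform(values):
    """Compress large values with log(1 + x); assumes values are non-negative."""
    return [math.log1p(v) for v in values]


ages = [20, 30, 40, 50, 60]
normalized = min_max_normalize(ages)
print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]

standardized = standardize(ages)
logged = log_transform([0, 9, 99])
```

In a real project the minimum, maximum, mean and standard deviation should be computed on the training data only and then reused for the test data.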

 

4) Time Series Data: This kind of data is always ordered with respect to time. Some examples of time series data include heart rate, the number of products sold per month in ecommerce, stock market prices etc. Various techniques of featurization are:

 

  • Date Time Features
  • Lag Features
  • Rolling Window Features
  • Expanding Window Features
  • Fourier Transformation
  • Exponential Smoothing etc.
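The first four techniques in the list above can be sketched in plain Python. This is a minimal sketch under our own assumptions: the function names are hypothetical, the window statistic is the mean, and positions without enough history are filled with `None`.

```python
from datetime import date


def date_features(d):
    """Extract simple calendar features from a date (Monday is 0)."""
    return {"year": d.year, "month": d.month, "day_of_week": d.weekday()}


def lag_feature(series, lag=1):
    """Value observed `lag` steps earlier; assumes lag >= 1."""
    return [None] * lag + series[:-lag]


def rolling_mean(series, window):
    """Mean of the last `window` values, including the current one."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out


def expanding_mean(series):
    """Mean of all values seen so far, including the current one."""
    out, total = [], 0
    for count, value in enumerate(series, start=1):
        total += value
        out.append(total / count)
    return out


sales = [10, 20, 30, 40]
print(date_features(date(2024, 1, 15)))  # {'year': 2024, 'month': 1, 'day_of_week': 0}
print(lag_feature(sales, lag=1))         # [None, 10, 20, 30]
print(rolling_mean(sales, window=2))     # [None, 15.0, 25.0, 35.0]
print(expanding_mean(sales))             # [10.0, 15.0, 20.0, 25.0]
```

When the goal is to predict the current value, the rolling and expanding features are usually computed on the lagged series, so that the value being predicted does not leak into its own features.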

 

5) Images and Videos: Data can also be provided to us as images, to solve various real-world use cases like Face Detection, Face Recognition, and classifying diseases based on X-Ray and MRI scans. Video data is a combination of image and time series data. There are various advanced techniques for encoding image and video data as numerical features, which will be discussed later in a separate article.

  

We have given a short introduction to these feature engineering techniques. We will be discussing them in more detail in upcoming articles.
