Data Science/Machine Learning Model Life Cycle
In this article, we will discuss how a machine learning project is handled end to end in real-world scenarios.
- Understanding Business Requirements: In this stage, we define and understand the problem statement. We need to know the customer or end-user for whom the problem is being solved and what value we can provide to them.
- Data Acquisition: The next important phase is acquiring the data, which is typically done through an ETL (Extract, Transform, Load) process. Data can come in various forms, such as raw log files, database tables, or files stored on Hadoop clusters. SQL is one of the most widely used languages for querying and extracting such data.
- Data Preparation: This stage covers cleaning and transforming the data. It is an important step before analysis and modeling, and involves reformatting data, correcting errors, and gathering and combining various data sets to enrich the data.
- Exploratory Data Analysis: This means slicing and dicing the data to bring out information. It includes plotting and visualizing the data, applying statistical techniques, and running hypothesis tests to extract insights and characteristics from the data.
- Modeling, Evaluation, and Interpretation: The next phase is applying various types of ML models, such as classification, regression, or deep learning models. Evaluation includes choosing the right KPI or performance metric for the model and connecting it to the business requirements. Model interpretation is important for understanding the model's behavior, identifying which features drive its predictions, and explaining the results to stakeholders who may not be familiar with machine learning theory.
- Deployment: This phase is largely software engineering work. Once the model is approved and good to go, it is deployed to production.
- Real-World Testing and Monitoring: Once the model is deployed in production, the next phase is real-world testing, for example through A/B testing. A considerable amount of time is spent checking whether the results obtained during analysis and modeling hold up in the real world and what the true business impact is.
- Operations: The next phase is operationalizing the models. This includes deciding how and when to retrain them and how to handle failures in the model deployment pipeline.
- Optimization of Models: This phase involves improving the first-cut model in terms of business impact, interpretability, latency, and so on.
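
To make the real-world testing stage a little more concrete: an A/B test often comes down to comparing a rate (say, conversions) between a control group and a group served by the new model. The following is only a minimal sketch of one common way to do that, a two-proportion z-test, using the Python standard library. The function name, the example counts, and the 0.05 significance threshold are assumptions made for illustration; they are not taken from any particular project.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and users in the control group (A).
    conv_b / n_b: conversions and users in the group served by the new model (B).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("Standard error is zero; rates are all 0 or all 1.")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
alpha = 0.05  # assumed significance threshold
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Significant difference" if p_value < alpha else "No significant difference")
```

A statistically significant result is only part of the picture. The size of the lift still has to be translated into business impact, and the z-test itself assumes reasonably large, independent samples.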