Pipeline
A pipeline is a workflow that helps streamline and automate the process of building, training and deploying machine learning models.
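As a rough illustration of the idea, a pipeline can be sketched as an ordered list of steps, each one receiving the output of the previous one. The helper names below (`run_pipeline`, `drop_missing`, `scale_to_unit`) are made up for this example and do not come from any particular library.

```python
from functools import reduce

def run_pipeline(data, steps):
    """Apply each step to the output of the previous one, in order."""
    return reduce(lambda acc, step: step(acc), steps, data)

def drop_missing(values):
    # Hypothetical cleaning step: discard entries marked as None.
    return [v for v in values if v is not None]

def scale_to_unit(values):
    # Hypothetical preprocessing step: min-max scale values into [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Toy input; the steps run left to right.
result = run_pipeline([2.0, None, 4.0, 6.0], [drop_missing, scale_to_unit])
print(result)
```

Keeping the steps in a list means they can be reordered, swapped, or re-run on new data without rewriting the surrounding code, which is the main appeal of automating a workflow this way.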
OSEMN Process
OSEMN is an acronym that stands for Obtain, Scrub, Explore, Model, Interpret. It lists the common tasks in a data scientist's work.
- Obtain: Collect data
- Scrub: Clean the data: remove duplicates, remove outliers, and fill in or remove samples with missing values. See Data cleansing
- Explore: Exploratory Data Analysis
- Model: Model Selection and Evaluation
- Interpret: The model (hopefully) provides some useful insights to the data scientist; these are analyzed and presented to the public or to stakeholders.
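The five steps above can be sketched end to end on a toy example. This is only a minimal sketch under stated assumptions: the data is made up, the outlier rule (drop scores above three times the median) is an arbitrary choice for illustration, and the "model" is a hand-written least-squares line rather than a real model-selection procedure.

```python
from statistics import mean, median

def obtain():
    # Obtain: made-up (hours_studied, exam_score) rows; None marks a missing value.
    return [(1, 52), (2, 55), (2, 55), (3, None), (4, 61), (5, 64), (6, 67), (7, 300)]

def scrub(rows):
    # Scrub: drop rows with missing values, then exact duplicates (order preserved).
    complete = [r for r in rows if None not in r]
    unique = list(dict.fromkeys(complete))
    # Crude outlier rule (an assumption): drop scores above 3x the median score.
    cutoff = 3 * median(score for _, score in unique)
    return [r for r in unique if r[1] <= cutoff]

def explore(rows):
    # Explore: a few summary statistics as a stand-in for exploratory data analysis.
    return {
        "n": len(rows),
        "mean_hours": mean(h for h, _ in rows),
        "mean_score": mean(s for _, s in rows),
    }

def model(rows):
    # Model: fit score = intercept + slope * hours by ordinary least squares.
    mx = mean(h for h, _ in rows)
    my = mean(s for _, s in rows)
    sxy = sum((h - mx) * (s - my) for h, s in rows)
    sxx = sum((h - mx) ** 2 for h, _ in rows)
    slope = sxy / sxx
    return slope, my - slope * mx

def interpret(slope, intercept):
    # Interpret: turn the fitted parameters into a statement for stakeholders.
    return (f"Each extra hour of study is associated with about {slope:.1f} more "
            f"points, starting from a baseline of about {intercept:.1f}.")

clean = scrub(obtain())
summary = explore(clean)
slope, intercept = model(clean)
print(summary)
print(interpret(slope, intercept))
```

In practice each function would be replaced by something more substantial (a database query, a proper outlier analysis, plots, cross-validated models), but the order of the steps stays the same.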