Pipeline

A pipeline is a workflow that helps streamline and automate the process of building, training and deploying machine learning models.
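As a rough illustration of the idea, a pipeline can be sketched as an ordered list of steps, each taking the output of the previous one. This is a minimal sketch using only the Python standard library; the names run_pipeline, drop_missing and scale_to_unit are hypothetical and do not come from any particular framework.

```python
from functools import reduce
from typing import Callable, List, Optional, Sequence

# A step is any function that takes a list of values and returns a new list.
Step = Callable[[list], list]


def run_pipeline(data: list, steps: Sequence[Step]) -> list:
    """Apply each step in order, feeding its output to the next step."""
    return reduce(lambda acc, step: step(acc), steps, data)


def drop_missing(values: List[Optional[float]]) -> List[float]:
    """Remove entries that are None (hypothetical cleaning step)."""
    return [v for v in values if v is not None]


def scale_to_unit(values: List[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1] (hypothetical preprocessing step)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


result = run_pipeline([2.0, None, 4.0, 6.0], [drop_missing, scale_to_unit])
print(result)
```

Because every step has the same shape, steps can be added, removed or reordered without touching the rest of the workflow, which is what makes the process easy to automate.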

OSEMN Process

OSEMN is an acronym that stands for Obtain, Scrub, Explore, Model, Interpret. It lists the common tasks in a data scientist’s work.

  • Obtain: Collect data
  • Scrub: Clean the data: remove duplicates, remove outliers, and fill or remove samples with missing values. See Data cleansing
  • Explore: Exploratory Data Analysis
  • Model: Model Selection and Evaluation
  • Interpret: The model (hopefully) provides some useful insights to the data scientist; these are analyzed and presented to the public or to stakeholders.
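
The five steps above can be sketched end to end on a toy dataset. This is only an illustration under stated assumptions: the data is hard-coded, the outlier rule (drop values more than 5 median absolute deviations from the median) is one arbitrary choice among many, and the Model step fits a single least-squares line and reports its error rather than selecting among several models. All function names are hypothetical.

```python
from statistics import mean, median


def obtain():
    """Obtain: return raw (x, y) samples. Hard-coded here; usually read from a file or API."""
    return [(1, 3), (2, 5), (2, 5), (3, 7), (4, 9), (5, None), (6, 500)]


def scrub(rows, k=5.0):
    """Scrub: drop samples with missing values, duplicates, and outliers in y."""
    complete = [(x, y) for x, y in rows if x is not None and y is not None]
    unique = list(dict.fromkeys(complete))  # removes duplicates, keeps order
    ys = [y for _, y in unique]
    center = median(ys)
    mad = median([abs(y - center) for y in ys])  # median absolute deviation
    if mad == 0:
        return unique
    # Assumed rule: keep points within k * MAD of the median.
    return [(x, y) for x, y in unique if abs(y - center) <= k * mad]


def explore(rows):
    """Explore: compute a few summary statistics."""
    return {
        "n": len(rows),
        "mean_x": mean(x for x, _ in rows),
        "mean_y": mean(y for _, y in rows),
    }


def fit_line(rows):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_squared_error(rows, slope, intercept):
    """Model (evaluation): average squared residual of the fitted line."""
    return mean([(y - (slope * x + intercept)) ** 2 for x, y in rows])


def interpret(slope, intercept):
    """Interpret: turn the fitted parameters into a plain-language statement."""
    return f"y changes by about {slope:.2f} per unit of x (baseline {intercept:.2f})."


raw = obtain()
clean = scrub(raw)
summary = explore(clean)
slope, intercept = fit_line(clean)
mse = mean_squared_error(clean, slope, intercept)
report = interpret(slope, intercept)
print(summary)
print(report)
```

In practice each function would be replaced by a much larger piece of work, but the order of the calls at the bottom mirrors the order of the letters in OSEMN.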