Learning

Release: mvp1-qr-poc-render • Chapter: ch-02-pipeline

Chapter 2 — The ML Pipeline (with Datasets)

4 items • Release mvp1-qr-poc-render

The ML Pipeline

Reading Node: ml-pipeline-intro

A machine-learning pipeline is the sequence of steps that turns raw data into a trained, evaluated, and deployed model:

  1. Data collection
  2. Data preparation (including dataset splitting)
  3. Training
  4. Evaluation
  5. Deployment

This demo focuses on the dataset split and why it matters.
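
To make the five steps concrete, here is a minimal sketch of a toy pipeline in plain Python. Everything in it is an assumption made for illustration: the function names (collect_data, prepare, train, evaluate, deploy), the synthetic data near y = 2x, and the one-parameter model are hypothetical stand-ins, not part of this course's codebase.

```python
import random

def collect_data(n=200, seed=0):
    # Step 1 (stand-in): synthetic points scattered around the line y = 2x.
    rng = random.Random(seed)
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 0.1)) for x in xs]

def prepare(data, held_out_fraction=0.2):
    # Step 2: set aside the last part of the data for evaluation.
    cut = len(data) - round(len(data) * held_out_fraction)
    return data[:cut], data[cut:]

def train(train_set):
    # Step 3: least-squares slope for the one-parameter model y = w * x.
    return sum(x * y for x, y in train_set) / sum(x * x for x, _ in train_set)

def evaluate(w, held_out):
    # Step 4: mean squared error on examples that were not used for fitting.
    return sum((w * x - y) ** 2 for x, y in held_out) / len(held_out)

def deploy(w):
    # Step 5 (stand-in): expose the fitted model as a callable.
    return lambda x: w * x

train_set, held_out = prepare(collect_data())
w = train(train_set)
mse = evaluate(w, held_out)
predict = deploy(w)
print(f"slope={w:.3f} held-out MSE={mse:.4f}")
```

The point of the sketch is the shape: each step consumes the output of the previous one, and evaluation only ever sees data that training did not.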

Datasets and splits

Reading Node: datasets-text

A dataset is a structured collection of examples used to train or evaluate a model.

Common splits

Split       Purpose
Training    Fit model parameters
Validation  Tune hyperparameters and make model-selection decisions
Test        Final, unbiased performance estimate
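
One way to produce the three splits is to shuffle the examples once and cut the shuffled list into three disjoint pieces. The sketch below assumes a 70/15/15 ratio and a hypothetical helper named split_dataset; neither comes from this chapter, and real projects often use a library routine instead.

```python
import random

def split_dataset(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)

    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_val - n_test

    # Slice into three disjoint pieces: no example lands in two splits.
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Fixing the seed matters here: if the split changed on every run, examples could drift between the training and test sets across experiments.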

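Using the test set to make repeated decisions (for example, choosing between models or tuning hyperparameters) leaks information about it into those choices, so the performance you report on it is optimistically biased.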
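
A small simulation can illustrate the effect. In the sketch below, features and labels are independent coin flips, so no rule can truly do better than 50% accuracy. The setup (200 candidate rules, a 100-example test set, 2000 fresh examples) is an assumption chosen for illustration, and the names are hypothetical.

```python
import random

rng = random.Random(0)
N_CANDIDATES = 200  # candidate rule k predicts "label == feature k"

def make_examples(n):
    # Features and label are independent coin flips: there is nothing to learn.
    return [
        ([rng.random() < 0.5 for _ in range(N_CANDIDATES)], rng.random() < 0.5)
        for _ in range(n)
    ]

def accuracy(k, examples):
    # Fraction of examples where feature k happens to match the label.
    return sum(features[k] == label for features, label in examples) / len(examples)

test_set = make_examples(100)
fresh_set = make_examples(2000)

# The mistake: pick the winning rule by looking at test-set accuracy.
best_k = max(range(N_CANDIDATES), key=lambda k: accuracy(k, test_set))
best_test_acc = accuracy(best_k, test_set)

# Honest check: score the same rule on data it was not selected on.
fresh_acc = accuracy(best_k, fresh_set)
print(f"test accuracy of selected rule: {best_test_acc:.2f}")
print(f"accuracy on fresh data:         {fresh_acc:.2f}")
```

The selected rule looks better than chance on the test set only because it was chosen for looking good there; on fresh data it falls back toward 50%. Making the selection on a validation set keeps the test set usable as a final estimate.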

Datasets — Video

Video Node: datasets-video

Datasets — Quiz

Quiz Node: datasets-quiz