Chapter 2 — The ML Pipeline (with Datasets)

4 items • Release mvp1-qr-poc-render

Reading Node: ml-pipeline-intro

The ML Pipeline

A machine-learning pipeline is the sequence of steps that transforms raw data into a trained model.

This demo focuses on the dataset split and why it matters.

Reading Node: datasets-text

A dataset is a structured collection of examples used to train or evaluate a model.

Split	Purpose
Training	Fit model parameters
Validation	Tune hyperparameters and guide model-selection decisions
Test	Final, unbiased performance estimate
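The three splits in the table can be produced with a few lines of code. The following is a minimal sketch, not a prescribed method: the function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are assumptions chosen for illustration, and the "dataset" is just a list of integers.

```python
import random

def split_dataset(examples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle a copy of the examples and cut it into three disjoint splits."""
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains becomes the test set
    return train, validation, test

# Illustrative data only: 100 integer "examples".
examples = list(range(100))
train, validation, test = split_dataset(examples)
print(len(train), len(validation), len(test))
```

Every example lands in exactly one split, so nothing the model is fitted or tuned on reappears in the test set.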

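Repeatedly using the test set to make decisions (for example, choosing between models or hyperparameter settings) leaks information about it into those choices, so the performance you report on it becomes optimistically biased.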
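A small simulation can make the effect concrete. This is a sketch under stated assumptions rather than a real experiment: the labels are pure coin flips, so no model can truly do better than 50% accuracy, and the "models" are hypothetical threshold rules invented for this example. Picking the winner by its test-set score still tends to produce a flattering number.

```python
import random

def accuracy(model, examples):
    """Fraction of examples the threshold model classifies correctly."""
    threshold, positive_above = model
    correct = 0
    for x, label in examples:
        predicted = 1 if (x > threshold) == positive_above else 0
        correct += predicted == label
    return correct / len(examples)

def noise_examples(rng, n):
    """Labels are coin flips independent of x, so true accuracy is 50%."""
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n)]

rng = random.Random(0)
test_set = noise_examples(rng, 50)
fresh_set = noise_examples(rng, 10_000)

# Hypothetical candidates: 200 random thresholds, each tried in both directions.
thresholds = [rng.random() for _ in range(200)]
candidates = [(t, direction) for t in thresholds for direction in (True, False)]

# The mistake: choosing the winner by its score on the test set.
best = max(candidates, key=lambda m: accuracy(m, test_set))
best_test_acc = accuracy(best, test_set)
fresh_acc = accuracy(best, fresh_set)

print(f"accuracy on the reused test set: {best_test_acc:.2f}")
print(f"accuracy on fresh data:          {fresh_acc:.2f}")
```

The score on the reused test set is the maximum over many noisy measurements, so it overstates what the chosen model does on data it has never influenced. Making the same selection on a validation set and touching the test set once avoids this.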

Video Node: datasets-video


Quiz Node: datasets-quiz
