Chapter 2: The ML Pipeline (with Datasets)
The ML Pipeline
A machine-learning pipeline is the sequence of steps that turns raw data into a trained, deployed model:
- Data collection
- Data preparation (including dataset splitting)
- Training
- Evaluation
- Deployment
This demo focuses on the dataset split and why it matters.
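The five stages can be sketched as one small function each. This is a toy illustration under stated assumptions: the synthetic data, the threshold classifier, and every function name (`collect`, `prepare`, `train`, `evaluate`, `deploy`) are hypothetical and chosen only to make the stage boundaries visible. There is no validation split here because nothing is tuned.

```python
import random

def collect():
    """Stage 1: gather raw (feature, label) pairs. Synthetic data here."""
    rng = random.Random(0)
    return [(x, int(x > 0.5)) for x in (rng.random() for _ in range(200))]

def prepare(raw, test_fraction=0.2):
    """Stage 2: shuffle, then split into training and held-out examples."""
    data = list(raw)
    random.Random(1).shuffle(data)
    n_test = round(len(data) * test_fraction)
    return data[n_test:], data[:n_test]

def train(train_set):
    """Stage 3: fit a one-parameter threshold classifier on training data."""
    candidates = [i / 20 for i in range(21)]

    def errors(threshold):
        return sum(int(x > threshold) != y for x, y in train_set)

    return min(candidates, key=errors)

def evaluate(threshold, test_set):
    """Stage 4: accuracy on examples the model never saw during training."""
    hits = sum(int(x > threshold) == y for x, y in test_set)
    return hits / len(test_set)

def deploy(threshold):
    """Stage 5: expose the fitted model as a plain callable."""
    return lambda x: int(x > threshold)

raw = collect()
train_set, test_set = prepare(raw)
threshold = train(train_set)
held_out_accuracy = evaluate(threshold, test_set)
model = deploy(threshold)
print("held-out accuracy:", held_out_accuracy)
```

Each stage only consumes the output of the one before it, which is what lets the split in `prepare` keep the evaluation data away from `train`.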
Datasets and splits
A dataset is a structured collection of examples used to train or evaluate a model.
Common splits
| Split | Purpose |
|---|---|
| Training | Fit model parameters |
| Validation | Tune hyperparameters and make model-selection decisions |
| Test | Final, unbiased performance estimate |
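A minimal sketch of how the three splits might be produced, using only Python's standard library. The function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are illustrative assumptions, not something this chapter prescribes.

```python
import random

def split_dataset(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle examples and cut them into train, validation, and test lists."""
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")
    shuffled = list(examples)
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before cutting matters when the raw data is ordered (by time, class, or source); each example ends up in exactly one split.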
Using the test set to make repeated decisions, such as choosing between models or tuning hyperparameters, leaks information about it into those choices and makes your reported performance too optimistic.
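The effect can be simulated without any real model. In the sketch below, every candidate "model" is a pure guesser whose true accuracy is 50%, yet picking the best of 200 candidates by test-set score yields a reported number well above that. The helper names (`coin`, `label`, `predict`, `accuracy`) and the sizes are hypothetical choices for illustration.

```python
import hashlib

def coin(key):
    """Deterministic pseudo-random bit derived from a string key."""
    return hashlib.sha256(key.encode()).digest()[0] & 1

def label(example_id):
    """True label of an example: an arbitrary coin flip."""
    return coin(f"label:{example_id}")

def predict(model_id, example_id):
    """A 'model' that guesses; its guesses are unrelated to the labels."""
    return coin(f"model{model_id}:{example_id}")

def accuracy(model_id, example_ids):
    hits = sum(predict(model_id, i) == label(i) for i in example_ids)
    return hits / len(example_ids)

test_ids = range(100)          # the test set that gets reused for selection
fresh_ids = range(100, 5100)   # data no decision was ever based on

# Misuse: choose the winner by looking at test-set accuracy.
best_model = max(range(200), key=lambda m: accuracy(m, test_ids))
reported = accuracy(best_model, test_ids)
honest = accuracy(best_model, fresh_ids)

print(f"reported on reused test set: {reported:.2f}")
print(f"same model on fresh data:    {honest:.2f}")
```

The gap between the two printed numbers is the optimism introduced by selecting on the test set; making the same selection on a validation split leaves the test estimate untouched.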