What is Machine Learning? · Page 2 of 2

The Machine Learning Pipeline

Building an ML model isn't just calling a function. It's a strict pipeline:

  1. Get Data: Collect your structured data.
  2. Preprocess: Handle missing values, scale features, encode text.
  3. Split Data: Crucial step! Separate data into Training and Testing sets.
  4. Train Model: Feed training data to the algorithm.
  5. Evaluate: Test the model on data it has never seen before (Testing set).
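
The five steps above can be sketched end to end in plain Python. This is only a minimal illustration under stated assumptions: the `raw` rows, the `MAX_HOURS` constant, and the threshold classifier are all hypothetical stand-ins chosen to keep the example small, not a recommended model. Scaling uses a fixed constant rather than a statistic computed from the data, because in practice such statistics are usually computed from the training set only.

```python
import random

# 1. Get data: hypothetical (hours_studied, passed) rows; None marks a missing value.
raw = [(1.0, 0), (2.0, 0), (None, 0), (3.0, 0), (4.0, 0),
       (6.0, 1), (7.0, 1), (None, 1), (8.0, 1), (9.0, 1)]

# 2. Preprocess: drop rows with missing values, then scale hours into [0, 1]
#    using an assumed fixed maximum of 10 hours.
MAX_HOURS = 10.0
clean = [(hours / MAX_HOURS, label) for hours, label in raw if hours is not None]

# 3. Split data: shuffle with a fixed seed, then keep 80% for training.
random.Random(0).shuffle(clean)
train_size = int(len(clean) * 0.8)
train, test = clean[:train_size], clean[train_size:]

# 4. Train model: a one-feature threshold classifier whose threshold is the
#    midpoint between the two class means in the training set.
def mean(values):
    return sum(values) / len(values)

mean_fail = mean([x for x, label in train if label == 0])
mean_pass = mean([x for x, label in train if label == 1])
threshold = (mean_fail + mean_pass) / 2

def predict(x):
    return 1 if x > threshold else 0

# 5. Evaluate: measure accuracy on the held-out testing set only.
correct = sum(predict(x) == label for x, label in test)
accuracy = correct / len(test)
print(f"threshold={threshold:.2f}, test accuracy={accuracy:.2f}")
```

Notice that the testing rows are never used in step 4; they only appear in step 5, which is what makes the accuracy number meaningful.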

Why Split Data?

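If you test a model on the exact same data it learned from, it's like giving a student a test with the exact same questions they studied. They might score perfectly by memorizing the answers without actually learning the material. Testing on data the model has never seen shows whether it really learned the pattern.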
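
To make the student analogy concrete, here is a small sketch of a "model" that only memorizes. The even/odd task and the lookup-table model are hypothetical, chosen so the effect is easy to see: the model looks perfect on the data it studied and does no better than guessing on new data.

```python
# Hypothetical toy task: the label is 1 if the number is even, else 0.
train = [(n, int(n % 2 == 0)) for n in range(0, 80)]
test = [(n, int(n % 2 == 0)) for n in range(80, 100)]

# A "model" that only memorizes: it stores every training answer verbatim.
memory = {x: label for x, label in train}

def predict(x):
    # Known input: recall the stored answer. Unknown input: fall back to 0.
    return memory.get(x, 0)

def accuracy(pairs):
    correct = sum(predict(x) == label for x, label in pairs)
    return correct / len(pairs)

train_accuracy = accuracy(train)
test_accuracy = accuracy(test)
print(f"train accuracy: {train_accuracy:.2f}")  # 1.00
print(f"test accuracy:  {test_accuracy:.2f}")   # 0.50
```

If you only looked at the training accuracy, you would think this model was perfect. The testing set reveals that it never learned the even/odd rule at all.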

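# Standard 80/20 split: the first 80% of rows train the model, the last 20% test it.
# Assumes X and y are equal-length sequences whose rows are already in random order.
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]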
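
One caveat with slicing: it takes rows in their stored order. If the data is sorted (for example, all rows of one class first), the testing set may contain only one kind of example. A common fix is to shuffle before cutting. The sketch below shows one way to do that; `shuffled_split` is a hypothetical helper name, and the tiny `X` and `y` lists are made-up data.

```python
import random

def shuffled_split(X, y, train_fraction=0.8, seed=42):
    """Shuffle row indices, then cut them into training and testing parts."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    train_size = int(len(X) * train_fraction)
    train_idx, test_idx = indices[:train_size], indices[train_size:]
    # Index X and y with the same shuffled indices so rows stay paired with labels.
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Hypothetical data sorted by label: a plain slice would test on class 1 only.
X = [[i] for i in range(10)]
y = [0] * 5 + [1] * 5

X_train, X_test, y_train, y_test = shuffled_split(X, y)
print(len(X_train), len(X_test))  # 8 2
```

If scikit-learn is available, its `train_test_split` function in `sklearn.model_selection` covers the same need and shuffles by default.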