Feature Engineering — Create Better Features · Page 2 of 2

Feature Selection: Keep Only the Good Features

Too many features cause:

  • Overfitting (model memorizes noise)
  • Slow training
  • Curse of dimensionality

Methods

1. Univariate Selection (Filter)

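Score each feature against the target with a statistical test and keep the k highest-scoring features. The example uses the chi-squared test, which expects non-negative feature values (such as counts or one-hot encodings) and a categorical target: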

from sklearn.feature_selection import SelectKBest, chi2

selector = SelectKBest(chi2, k=10)  # Keep top 10 features
X_selected = selector.fit_transform(X, y)
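
To see which columns survived, the fitted selector can be asked for a mask. The following is a minimal sketch, assuming the iris dataset as a stand-in (its features are all non-negative) and k=2 because iris has only four features; neither choice comes from the lesson itself.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Stand-in data (assumption): iris has 4 non-negative features and 3 classes.
data = load_iris()
X, y = data.data, data.target

selector = SelectKBest(chi2, k=2)  # k=2 chosen only because iris is small
X_selected = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the original columns.
mask = selector.get_support()
kept = [name for name, keep in zip(data.feature_names, mask) if keep]

print("chi2 scores:", selector.scores_.round(2))
print("kept features:", kept)
```

Higher scores mean a stronger statistical association with the target; features are scored one at a time, so interactions between features are not considered.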

2. Tree-Based Feature Importance

Tree ensembles score each feature by how much its splits reduce impurity, so a fitted model tells you which features matter:

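from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X, y)
importances = model.feature_importances_  # one score per feature; scores sum to 1
# Drop features with importance < 0.01 (assumes X is a NumPy array)
X_selected = X[:, importances >= 0.01]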
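
The importance array is easier to act on once it is paired with feature names. Here is a small sketch, assuming scikit-learn's breast cancer dataset as a stand-in and a fixed random_state for repeatability; the 0.01 cutoff is the one used above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Stand-in data (assumption): 569 samples, 30 numeric features.
data = load_breast_cancer()
X, y = data.data, data.target

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
importances = model.feature_importances_

# Rank features from most to least important and show the top five.
order = np.argsort(importances)[::-1]
for idx in order[:5]:
    print(f"{data.feature_names[idx]:<25} {importances[idx]:.3f}")

# Apply the 0.01 cutoff with a boolean mask over the columns.
mask = importances >= 0.01
X_selected = X[:, mask]
print(f"kept {int(mask.sum())} of {X.shape[1]} features")
```

Because the scores sum to 1, a fixed cutoff such as 0.01 becomes stricter as the number of features grows, so it may need adjusting for wide datasets.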

3. Recursive Feature Elimination (RFE)

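Fit a model, remove the weakest feature (smallest coefficient or importance), and repeat until the requested number of features remains: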

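from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# RFE refits the estimator after each removal until 10 features are left
rfe = RFE(LogisticRegression(), n_features_to_select=10)
X_selected = rfe.fit_transform(X, y)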
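
After fitting, RFE exposes which features it kept and the order in which the others were eliminated. The sketch below assumes the breast cancer dataset as a stand-in and adds two things not in the snippet above: feature scaling, so that coefficient magnitudes are comparable, and max_iter=1000, to give the solver room to converge.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
# Scaling is an added assumption: RFE ranks by coefficient size, which
# is only meaningful when features share a common scale.
X = StandardScaler().fit_transform(data.data)
y = data.target

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
X_selected = rfe.fit_transform(X, y)

# support_ is a boolean mask of kept features.
# ranking_ is 1 for kept features; larger values were eliminated earlier.
kept = data.feature_names[rfe.support_]
print("kept features:", list(kept))
print("ranking:", rfe.ranking_)
```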

Best Practice:

  1. Start with all features
  2. Train and note baseline performance on validation data
  3. Drop weakest feature
  4. Retrain and compare
  5. Repeat until performance drops
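
The five steps above can be sketched as a loop. This is one possible reading, not a definitive recipe: it assumes the wine dataset as a stand-in, uses random forest importances to decide which feature is "weakest", measures performance with 3-fold cross-validation, and treats a drop of more than a hypothetical `tolerance` below the baseline as "performance drops". The helper names `cv_score` and `weakest` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data (assumption): 178 samples, 13 features.
X, y = load_wine(return_X_y=True)

def cv_score(cols):
    """Mean 3-fold cross-validated accuracy using only the given columns."""
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(model, X[:, cols], y, cv=3).mean()

def weakest(cols):
    """Original column index of the least important remaining feature."""
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[:, cols], y)
    return cols[int(np.argmin(model.feature_importances_))]

tolerance = 0.01                    # hypothetical threshold for "drops"
kept = list(range(X.shape[1]))      # 1. start with all features
baseline = cv_score(kept)           # 2. note baseline performance
current = baseline

while len(kept) > 1:
    drop = weakest(kept)                            # 3. drop weakest feature
    candidate = [c for c in kept if c != drop]
    score = cv_score(candidate)                     # 4. retrain and compare
    if score < baseline - tolerance:                # 5. stop when it drops
        break
    kept, current = candidate, score

print(f"baseline={baseline:.3f} final={current:.3f} features kept={len(kept)}")
```

Scoring on cross-validation folds rather than the training data keeps the comparison honest; a final held-out test set should stay out of this loop entirely.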