
Using preprocessing to make inputs easier for ML models to consume

Example: Imagine we have a dataset with two features: "Age" and "Income" for predicting credit risk.

Before standardization:

| Age (years) | Income ($) |
| ----------- | ---------- |
| 25          | 30,000     |
| 40          | 60,000     |
| 55          | 90,000     |

After standardization:

| Age (standardized) | Income (standardized) |
| ------------------ | --------------------- |
| -1.22              | -1.22                 |
| 0.00               | 0.00                  |
| 1.22               | 1.22                  |

This standardization helps prevent the larger scale of "Income" from dominating the smaller scale of "Age" in the model's calculations.
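The values in the table above can be reproduced by hand. Here is a minimal sketch in plain Python using the population standard deviation (dividing by n), which is what StandardScaler uses:

```python
import math

def standardize(values):
    """Standardize a list of numbers: z = (x - mean) / std.

    Uses the population standard deviation (dividing by n),
    matching scikit-learn's StandardScaler.
    """
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    return [(x - mean) / std for x in values]

ages = [25, 40, 55]
incomes = [30_000, 60_000, 90_000]

print([round(z, 2) for z in standardize(ages)])     # [-1.22, 0.0, 1.22]
print([round(z, 2) for z in standardize(incomes)])  # [-1.22, 0.0, 1.22]
```

Note that both columns map to identical z-scores here because each is an evenly spaced sequence; after standardization, Age and Income contribute on the same scale.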

Suggestion: standardization is sensitive to outliers, because they inflate the mean and standard deviation used for scaling. If the data contains many outliers, prefer a scaler based on robust statistics, such as RobustScaler.


Common Standardization Classes (from scikit-learn's `sklearn.preprocessing`):

  • StandardScaler
    • Standardizes features by removing the mean and scaling to unit variance
    • Useful when features have different scales and an approximately normal distribution is assumed
  • MinMaxScaler
    • Scales features to a fixed range, usually between 0 and 1
    • Useful when you need values in a bounded range (for sparse data, MaxAbsScaler is usually preferred, since it preserves sparsity)
  • RobustScaler
    • Uses statistics that are robust to outliers (the median and interquartile range)
    • Helpful when your data contains many outliers
  • Normalizer
    • Scales individual samples (rows) to have unit norm
    • Useful for text classification or clustering, where the direction of a vector matters more than its magnitude
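As a sketch (assuming scikit-learn is installed), here is how the four classes above behave on the Age/Income table from earlier:

```python
import numpy as np
from sklearn.preprocessing import (
    StandardScaler, MinMaxScaler, RobustScaler, Normalizer,
)

X = np.array([[25, 30_000],
              [40, 60_000],
              [55, 90_000]], dtype=float)

# StandardScaler: (x - mean) / std, per column
print(StandardScaler().fit_transform(X).round(2))   # each column -> [-1.22, 0, 1.22]

# MinMaxScaler: rescale each column to [0, 1]
print(MinMaxScaler().fit_transform(X))              # each column -> [0, 0.5, 1]

# RobustScaler: (x - median) / IQR, per column
print(RobustScaler().fit_transform(X))

# Normalizer: scale each ROW to unit L2 norm
print(Normalizer().fit_transform(X))
```

Note the different axis: the first three scalers operate column-wise (per feature), while Normalizer operates row-wise (per sample).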
