
Handling Missing Data

Data Cleaning

Why Does Dirty Data Exist?

Real-world datasets are messy. Missing values arise from:

  • Data entry errors (human mistakes)
  • System failures (sensors going offline)
  • Merging datasets with different schemas
  • Survey non-responses

[Figure: the data cleaning process]

Detecting Missing Values

df.isnull()           # True where NaN
df.isnull().sum()     # count per column
df.isnull().sum() / len(df) * 100  # percentage
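As a minimal, self-contained sketch of the calls above (the small DataFrame and its values are hypothetical, invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame with deliberate gaps in each column
df = pd.DataFrame({
    "age":    [25, np.nan, 31, np.nan],
    "salary": [50000, 60000, np.nan, 55000],
    "city":   ["NYC", "LA", None, "Chicago"],
})

mask = df.isnull()                        # boolean frame, True where NaN/None
counts = df.isnull().sum()                # missing count per column
pct = df.isnull().sum() / len(df) * 100   # missing percentage per column
```

Here `counts` reports 2 missing values in `age` and 1 each in `salary` and `city`; `pct` expresses the same counts as a fraction of the 4 rows.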

Handling Missing Values

Strategy 1: Drop rows/columns

df.dropna()                   # drop rows with ANY NaN
df.dropna(subset=["salary"])  # drop only if salary is NaN
df.dropna(thresh=5)           # keep rows with at least 5 non-NaN
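The three drop variants can be compared on a hypothetical toy DataFrame (column names and values are invented for illustration); with 3 columns, `thresh=3` keeps only fully complete rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame: row 0 is complete, rows 1-3 each miss one value
df = pd.DataFrame({
    "age":    [25, np.nan, 31, 40],
    "salary": [50000, 60000, np.nan, 55000],
    "city":   ["NYC", "LA", "Boston", None],
})

any_dropped = df.dropna()                   # drops rows 1-3 (each has a NaN)
salary_kept = df.dropna(subset=["salary"])  # drops only row 2 (missing salary)
thresh_kept = df.dropna(thresh=3)           # keep rows with >= 3 non-NaN values
```

`dropna()` is the bluntest tool: here it discards 3 of 4 rows, which is why targeting specific columns with `subset` or setting a `thresh` floor is usually safer.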

Strategy 2: Fill / Impute

df.fillna(0)                                     # fill every NaN with 0 (returns a copy)
df["age"] = df["age"].fillna(df["age"].mean())   # fill with the column mean
df["city"] = df["city"].fillna("Unknown")        # fill with a constant
df.ffill()                                       # forward fill (fillna(method="ffill") is deprecated)
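Putting the fill strategies together on a hypothetical toy DataFrame (column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame with one gap per strategy
df = pd.DataFrame({
    "age":  [25.0, np.nan, 31.0, 40.0],
    "city": ["NYC", None, "LA", None],
    "temp": [20.1, np.nan, np.nan, 22.3],
})

df["age"] = df["age"].fillna(df["age"].mean())  # mean of [25, 31, 40] = 32.0
df["city"] = df["city"].fillna("Unknown")       # constant for a categorical column
df["temp"] = df["temp"].ffill()                 # carry the last observation forward
```

Note that `fillna` and `ffill` return copies, so each result is assigned back to its column; forward fill suits ordered data such as time series, where repeating the last reading is a reasonable guess.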

Best practice: Never blindly drop or fill. Always understand why data is missing.
