Page18/20
DBSCAN β Density-Based Clustering Β· Page 1 of 1
Beyond K-Means
DBSCAN β Density-Based Clustering
Problem with K-Means
K-Means assumes:
- Clusters are roughly spherical
- You know K (number of clusters) in advance
- All clusters have similar sizes
Counter-examples:
- Crescent-shaped clusters
- Clusters of different sizes
- Outliers shouldn't be in any cluster
- Unknown number of clusters
DBSCAN (Density-Based Spatial Clustering)
Core Idea
Two points are in the same cluster if they're close together and surrounded by other close points.
How it works:
- Ξ΅ (epsilon): How close is "close"? (distance threshold)
- minPts: Minimum neighbors within Ξ΅ to be a core point
- For each point:
- If >= minPts points within Ξ΅ β Core point
- If close to core point β Border point
- Otherwise β Noise/Outlier
Parameters:
- Ξ΅: Too small = all noise. Too large = one big cluster. Use k-distance graph.
- minPts: Often 2Γdimensions or 4. Balance sensitivity.
Advantages over K-Means
Pros:
- β No need to specify K (discovers automatically)
- β Finds arbitrary-shaped clusters
- β Detects outliers (noise points)
- β Robust to outliers
Cons:
- β Sensitive to Ξ΅ and minPts (must tune)
- β Slower than K-Means (O(nΒ²))
- β Struggles with varying cluster densities
- β High-dimensional curse (distances become less meaningful)
Choosing Ξ΅
Use the k-distance graph:
- For each point, calculate distance to kth nearest neighbor
- Sort distances
- Plot as line graph
- Find "elbow" where distances increase sharply
- That's your Ξ΅
DBSCAN vs K-Means
| Aspect | K-Means | DBSCAN |
|---|---|---|
| Cluster shape | Spherical | Any shape |
| Outliers | Forced into cluster | Labeled as noise |
| K required? | Yes | No |
| Scalability | Fast | Slow |
| Parameter tuning | Simple (just K) | Complex (Ξ΅, minPts) |
| Use case | Balanced, round clusters | Arbitrary shapes, unknown K |
main.py
Loading...
OUTPUT
βΆClick "Run Code" to executeβ¦