I have studied the DBSCAN clustering algorithm from both a theoretical and practical point of view.
It is a very interesting density-based algorithm: the idea is that "good" clusters are dense regions separated by regions of low or zero density.
Moreover, it does not require the user to specify the number of clusters in advance, it can identify arbitrarily shaped clusters, and it labels points that lie in low-density regions as noise (so it can detect outliers).
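
To make the density idea concrete, here is a minimal pure-Python sketch of DBSCAN. It is an illustration rather than a reference implementation: the names `dbscan`, `region_query`, `eps`, and `min_samples`, the toy points, and the convention that a point counts as its own neighbor are all assumptions made for this example. Note that the number of clusters is never passed in, and the isolated point ends up labeled as noise.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return labels

# Toy data (hypothetical): two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force neighbor search makes this sketch quadratic in the number of points; practical implementations usually rely on a spatial index instead.
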
Outline:
- Brief introduction of the algorithm
- Pros and cons
- How does the algorithm work?
- How to choose the hyper-parameters required by the algorithm?
- Visualization of the algorithm on toy datasets
- Comparison with other clustering algorithms
- Elbow method (select the right epsilon value given minimum_samples)
- Estimate the hyper-parameters without any domain knowledge
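
As a preview of the elbow method listed above, the sketch below computes the sorted k-distance curve that is usually plotted to choose epsilon: each point's distance to its k-th nearest neighbor, in ascending order. The function name `k_distances`, the toy points, and the choice `k = min_samples - 1` (which assumes `min_samples` counts the point itself) are assumptions for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbor,
    not counting the point itself."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)

# Toy data (hypothetical): two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
min_samples = 3
curve = k_distances(points, k=min_samples - 1)
print([round(d, 2) for d in curve])
```

On this toy data the curve stays flat for the points inside the two dense groups and jumps sharply for the isolated point. An epsilon value taken near that bend separates dense points from noise.
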