Davies-Bouldin Index

TL;DR – Yet another clustering evaluation metric

Davies-Bouldin index was suggested by David L. Davies and Donald W. Bouldin in “A Cluster Separation Measure” (IEEE Transactions on Pattern Analysis and Machine Intelligence. PAMI-1 (2): 224–227. doi:10.1109/TPAMI.1979.4766909, full pdf)

Just like Silhouette score, Calinski-Harabasz index and Dunn index, Davies-Bouldin index provide an internal evaluation schema. I.e. the score is based on the cluster itself and not on external knowledge such as labels.

The Silhouette score reflects how similar a point is to the cluster it is associated with. I.e .for each point with compute the average distance of the point from the points in the nearest cluster minus the average distance of the point from the points in its own cluster divided by the maximum between those distances. The overall score is the average of the score per point. The Silhouette score is bounded from -1 to 1 and higher score means more distinct clusters.

The Calinski-Harabasz index compares the variance between-clusters to the variance within each cluster. This measure is much simpler to calculate then the Silhouette score however it is not bounded. The higher the score the better the separation is.

The intuition behind Davies-Bouldin index is the ratio between the within cluster distances and the between cluster distances and computing the average overall the clusters. It is therefore relatively simple to compute, bounded – 0 to 1, lower score is better. However, since it measures the distance between clusters’ centroids it is restricted to using Euclidean distance function.

Silhouette score and Calinski-Harabasz index were previously implemented and are part of scikit-learn and I implemented Davies-Bouldin index. Hopefully this will be my first contribution to scikit-learn, my implementation is here.


6 thoughts on “Davies-Bouldin Index

  1. Hey! It is a nice implementation. I copied your function davies_bouldin_index in my source code and imported required modules but I am getting a score greater than 1. Is it possible?

  2. Hi Tom! Congrats. I found your implementation by chance only after I implemented my own function to compute the DB index. I checked mine against yours and we’re getting the same results (values greater than 1 included). Why is your implementation not merged to scikit-learn alreay on gitbub? Could you check if there is something else you need to address? I would really like to see it in scikit. 🙂

      1. I know, I downloaded the code from there. However, it’s been stalled for a while now and it would be nice to finally merge it for us…

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s