Cluster validation consists of measuring the goodness of clustering results. Before applying any clustering algorithm to a data set, the first step is to assess the clustering tendency, that is, whether the data are suitable for clustering at all. If they are, the next question is how many clusters there are. You can then perform hierarchical clustering or partitioning clustering (with a pre-specified number of clusters). Finally, you can use a number of measures, described in this part, to evaluate the goodness of the clustering results.
In this course, you will learn the following topics, with practical examples in R:
- Assessing clustering tendency using visual and statistical methods
- Determining the optimal number of clusters using the elbow method, silhouette analysis and the gap statistic
- Computing cluster validation statistics using internal measures (such as the silhouette coefficient and the Dunn index) and external measures
- Choosing the best clustering algorithm: we’ll present different measures for comparing clustering algorithms and selecting the best one
- Computing p-values for hierarchical clustering using the pvclust() R function
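
To give a feel for the workflow described above, here is a minimal sketch in base R (the stats package only). The built-in iris data, the range of k values, and the choice of k = 3 are illustrative assumptions, not part of the course material; the course itself may rely on other packages and data sets.

```r
# Minimal sketch of the validation workflow, base R only.
# Assumptions: iris as example data, k = 3 chosen for illustration.
set.seed(123)                        # k-means uses random starting centers
df <- scale(iris[, 1:4])             # standardize the four numeric columns

# Elbow method: total within-cluster sum of squares for k = 1..8
wss <- sapply(1:8, function(k) {
  kmeans(df, centers = k, nstart = 25)$tot.withinss
})
plot(1:8, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Partitioning clustering with a pre-specified number of clusters
km <- kmeans(df, centers = 3, nstart = 25)

# Hierarchical clustering, cut into the same number of groups
hc <- hclust(dist(df), method = "ward.D2")
hc_groups <- cutree(hc, k = 3)

# External validation: compare the clusters with the known species labels
table(km$cluster, iris$Species)
table(hc_groups, iris$Species)
```

The bend in the plotted curve suggests a plausible number of clusters, and the two cross-tabulations show how closely each partition matches the known labels. Internal measures such as the silhouette coefficient need additional functions (for example from the cluster package) and are left out of this sketch.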