In our previous article, we described the basic concept of fuzzy clustering and we showed how to compute fuzzy clustering. In this current article, we’ll present the fuzzy c-means clustering algorithm, which is very similar to the k-means algorithm and the aim is to minimize the objective function defined as follow:
\[
\sum\limits_{j=1}^k \sum\limits_{x_i \in C_j} u_{ij}^m (x_i - \mu_j)^2
\]
Where,
- \(u_{ij}\) is the degree to which an observation \(x_i\) belongs to a cluster \(c_j\)
- \(\mu_j\) is the center of the cluster j
- \(u_{ij}\) is the degree to which an observation \(x_i\) belongs to a cluster \(c_j\)
- \(m\) is the fuzzifier.
It can be seen that, FCM differs from k-means by using the membership values \(u_{ij}\) and the fuzzifier \(m\).
The variable \(u_{ij}^m\) is defined as follow:
\[
u_{ij}^m = \frac{1}{\sum\limits_{l=1}^k \left( \frac{| x_i - c_j |}{| x_i - c_k |}\right)^{\frac{2}{m-1}}}
\]
The degree of belonging, \(u_{ij}\), is linked inversely to the distance from x to the cluster center.
The parameter \(m\) is a real number greater than 1 (\(1.0 < m < \infty\)) and it defines the level of cluster fuzziness. Note that, a value of \(m\) close to 1 gives a cluster solution which becomes increasingly similar to the solution of hard clustering such as k-means; whereas a value of \(m\) close to infinite leads to complete fuzzyness.
Note that, a good choice is to use m = 2.0 (Hathaway and Bezdek 2001).
In fuzzy clustering the centroid of a cluster is he mean of all points, weighted by their degree of belonging to the cluster:
\[
C_j = \frac{\sum\limits_{x \in C_j} u_{ij}^m x}{\sum\limits_{x \in C_j} u_{ij}^m}
\]
Where,
- \(C_j\) is the centroid of the cluster j
- \(u_{ij}\) is the degree to which an observation \(x_i\) belongs to a cluster \(c_j\)
The algorithm of fuzzy clustering can be summarize as follow:
- Specify a number of clusters k (by the analyst)
- Assign randomly to each point coefficients for being in the clusters.
- Repeat until the maximum number of iterations (given by “maxit”) is reached, or when the algorithm has converged (that is, the coefficients’ change between two iterations is no more than \(\epsilon\), the given sensitivity threshold):
- Compute the centroid for each cluster, using the formula above.
- For each point, compute its coefficients of being in the clusters, using the formula above.
The algorithm minimizes intra-cluster variance as well, but has the same problems as k-means; the minimum is a local minimum, and the results depend on the initial choice of weights. Hence, different initializations may lead to different results.
Using a mixture of Gaussians along with the expectation-maximization algorithm is a more statistically formalized method which includes some of these ideas: partial membership in classes.
Recommended for you
This section contains best data science and self-development resources to help you on your path.
Books - Data Science
Our Books
- Practical Guide to Cluster Analysis in R by A. Kassambara (Datanovia)
- Practical Guide To Principal Component Methods in R by A. Kassambara (Datanovia)
- Machine Learning Essentials: Practical Guide in R by A. Kassambara (Datanovia)
- R Graphics Essentials for Great Data Visualization by A. Kassambara (Datanovia)
- GGPlot2 Essentials for Great Data Visualization in R by A. Kassambara (Datanovia)
- Network Analysis and Visualization in R by A. Kassambara (Datanovia)
- Practical Statistics in R for Comparing Groups: Numerical Variables by A. Kassambara (Datanovia)
- Inter-Rater Reliability Essentials: Practical Guide in R by A. Kassambara (Datanovia)
Others
- R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by Hadley Wickham & Garrett Grolemund
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurelien Géron
- Practical Statistics for Data Scientists: 50 Essential Concepts by Peter Bruce & Andrew Bruce
- Hands-On Programming with R: Write Your Own Functions And Simulations by Garrett Grolemund & Hadley Wickham
- An Introduction to Statistical Learning: with Applications in R by Gareth James et al.
- Deep Learning with R by François Chollet & J.J. Allaire
- Deep Learning with Python by François Chollet
While I believe you somewhat make a good introduction to it, this post includes typos and errors. Consider fixing it