This article describes how to compute fuzzy clustering using the function cmeans() [in the e1071 R package]. Previously, we explained what fuzzy clustering is and how to compute it using the R function fanny() [in the cluster package].
cmeans() format
The simplified format of the function cmeans() is as follows:
cmeans(x, centers, iter.max = 100, dist = "euclidean", m = 2)
- x: a data matrix where columns are variables and rows are observations
- centers: Number of clusters or initial values for cluster centers
- iter.max: Maximum number of iterations
- dist: Possible values are “euclidean” or “manhattan”
- m: A number greater than 1 giving the degree of fuzzification.
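To get a feel for the m argument, the sketch below applies the standard fuzzy c-means membership formula to a single observation. It is a base R illustration, not the code used inside cmeans(): the helper fuzzy_membership() and the distances in d are hypothetical and made up for this example.

```r
# Sketch: standard fuzzy c-means membership update for one observation.
# fuzzy_membership() is a hypothetical helper, not part of e1071.
fuzzy_membership <- function(d, m) {
  stopifnot(m > 1, all(d > 0))
  # u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1))
  power <- 2 / (m - 1)
  sapply(d, function(dj) 1 / sum((dj / d)^power))
}

# Made-up distances from one observation to 3 cluster centers
d <- c(1, 2, 4)

u_crisp   <- fuzzy_membership(d, m = 1.2)  # close to a hard assignment
u_default <- fuzzy_membership(d, m = 2)    # the cmeans() default
u_fuzzy   <- fuzzy_membership(d, m = 5)    # memberships spread out more evenly

round(rbind(u_crisp, u_default, u_fuzzy), 3)
```

Values of m close to 1 push the largest membership toward 1, while larger values of m flatten the memberships across clusters.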
The function cmeans() returns an object of class fclust which is a list containing the following components:
- centers: the final cluster centers
- size: the number of data points in each cluster of the closest hard clustering
- cluster: a vector of integers containing the indices of the clusters to which the data points are assigned in the closest hard clustering, obtained by assigning each point to the (first) class with maximal membership.
- iter: the number of iterations performed
- membership: a matrix with the membership values of the data points to the clusters
- withinerror: the value of the objective function
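As a rough illustration of how the centers and membership components relate, the sketch below computes cluster centers from a membership matrix using the standard fuzzy c-means update, in which each center is a mean of the observations weighted by the memberships raised to the power m. The data in x and the memberships in u are toy values invented for this example, and the code is a base R sketch rather than the e1071 implementation.

```r
# Sketch: fuzzy c-means center update with toy data (all values made up).
x <- matrix(c(0, 0,
              1, 0,
              9, 10,
              10, 10),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("v1", "v2")))

# Hypothetical membership matrix: rows = observations, columns = clusters
u <- matrix(c(0.9, 0.1,
              0.8, 0.2,
              0.2, 0.8,
              0.1, 0.9),
            ncol = 2, byrow = TRUE)

m <- 2  # degree of fuzzification

# Each center is the mean of the observations weighted by u^m;
# row i of t(w) %*% x is divided by the total weight of cluster i
w <- u^m
centers <- t(w) %*% x / colSums(w)
centers
```

Observations with a high membership in a cluster pull its center toward them, so the first center lands near the first two points and the second center near the last two.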
Compute fuzzy c-means clustering
set.seed(123)
# Load the data
data("USArrests")
# Subset of USArrests
ss <- sample(1:50, 20)
df <- scale(USArrests[ss,])
# Compute fuzzy clustering
library(e1071)
cm <- cmeans(df, 4)
cm
## Fuzzy c-means clustering with 4 clusters
##
## Cluster centers:
## Murder Assault UrbanPop Rape
## 1 0.857 0.338 -0.729 0.200
## 2 -0.731 -0.665 1.003 -0.333
## 3 -1.210 -1.248 -0.728 -1.153
## 4 0.629 0.970 0.501 0.865
##
## Memberships:
## 1 2 3 4
## Iowa 0.00916 0.0191 0.9658 0.00594
## Rhode Island 0.09885 0.5915 0.2050 0.10463
## Maryland 0.22786 0.0475 0.0273 0.69731
## Tennessee 0.87231 0.0286 0.0211 0.07801
## Utah 0.04446 0.8218 0.0844 0.04929
## Arizona 0.11876 0.1008 0.0399 0.74056
## Mississippi 0.62441 0.0931 0.1030 0.17952
## Wisconsin 0.03363 0.1110 0.8313 0.02403
## Virginia 0.39552 0.2570 0.1918 0.15573
## Maine 0.03433 0.0530 0.8915 0.02117
## Texas 0.24082 0.1595 0.0541 0.54557
## Louisiana 0.61799 0.0653 0.0419 0.27473
## Montana 0.13551 0.1366 0.6657 0.06215
## Michigan 0.09620 0.0371 0.0178 0.84890
## Arkansas 0.56529 0.1223 0.1805 0.13188
## New York 0.13194 0.1323 0.0416 0.69421
## Florida 0.17377 0.0749 0.0398 0.71155
## Alaska 0.38155 0.1354 0.1136 0.36947
## Hawaii 0.06662 0.7206 0.1487 0.06410
## New Jersey 0.05957 0.8009 0.0575 0.08206
##
## Closest hard clustering:
## Iowa Rhode Island Maryland Tennessee Utah
## 3 2 4 1 2
## Arizona Mississippi Wisconsin Virginia Maine
## 4 1 3 1 3
## Texas Louisiana Montana Michigan Arkansas
## 4 1 3 4 1
## New York Florida Alaska Hawaii New Jersey
## 4 4 1 2 2
##
## Available components:
## [1] "centers" "size" "cluster" "membership" "iter"
## [6] "withinerror" "call"
The different components can be extracted using the code below:
# Membership coefficient
head(cm$membership)
## 1 2 3 4
## Iowa 0.00916 0.0191 0.9658 0.00594
## Rhode Island 0.09885 0.5915 0.2050 0.10463
## Maryland 0.22786 0.0475 0.0273 0.69731
## Tennessee 0.87231 0.0286 0.0211 0.07801
## Utah 0.04446 0.8218 0.0844 0.04929
## Arizona 0.11876 0.1008 0.0399 0.74056
# Visualize using corrplot
library(corrplot)
corrplot(cm$membership, is.corr = FALSE)
# Observation groups/clusters
cm$cluster
## Iowa Rhode Island Maryland Tennessee Utah
## 3 2 4 1 2
## Arizona Mississippi Wisconsin Virginia Maine
## 4 1 3 1 3
## Texas Louisiana Montana Michigan Arkansas
## 4 1 3 4 1
## New York Florida Alaska Hawaii New Jersey
## 4 4 1 2 2
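The link between the membership matrix and the closest hard clustering can be checked by hand. The sketch below re-enters the six membership rows printed by head(cm$membership), so it runs without e1071, and assigns each state to the cluster with the (first) maximal membership. The object names membership and hard are only used for this illustration.

```r
# Membership values copied from the head(cm$membership) output
membership <- matrix(
  c(0.00916, 0.0191, 0.9658, 0.00594,
    0.09885, 0.5915, 0.2050, 0.10463,
    0.22786, 0.0475, 0.0273, 0.69731,
    0.87231, 0.0286, 0.0211, 0.07801,
    0.04446, 0.8218, 0.0844, 0.04929,
    0.11876, 0.1008, 0.0399, 0.74056),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("Iowa", "Rhode Island", "Maryland", "Tennessee", "Utah", "Arizona"),
    as.character(1:4)))

# Each row sums to 1, up to rounding of the printed values
round(rowSums(membership), 2)

# Closest hard clustering: index of the first maximal membership per row
hard <- apply(membership, 1, which.max)
hard

# Cluster sizes for these six states only
table(hard)
```

The resulting assignments for these six states agree with the corresponding entries of cm$cluster.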
Visualize clusters
library(factoextra)
fviz_cluster(list(data = df, cluster = cm$cluster),
             ellipse.type = "norm",
             ellipse.level = 0.68,
             palette = "jco",
             ggtheme = theme_minimal())