[R] Clustering Algorithms: a k-means Module

Sunday, November 14, 2021

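This post follows up on the earlier article "[Excel] Applying the k-means clustering algorithm: evaluating the level of existing suppliers as an example". I happened to come across a book on machine learning with R in the library, which was a good opportunity to learn how to implement and apply k-means in R, and to put together a k-means module for future reference.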

The example uses the iris dataset that ships with R, which is known to contain three iris species (setosa, versicolor, virginica) with 50 samples each. The code is as follows:

#clear

rm(list=ls())

#pull in data

kmean_iris<-iris

#erase species data

kmean_iris$Species<-NULL

#apply k-means with k=3, 20 random starts, at most 100 iterations each

clusters<-kmeans(kmean_iris,3,iter.max=100,nstart = 20)

#plot the clustered points along petal length and width

plot(kmean_iris[c("Petal.Length","Petal.Width")], col=clusters$cluster,pch=16, cex=1)

#plot the center of each cluster

points(clusters$centers[,c("Petal.Length","Petal.Width")],col="blue",pch=8, cex=5)

#comparison: true species counts vs. k-means cluster counts

Realdata<-as.data.frame(table(iris$Species))

kmeanresult<-as.data.frame(table(clusters$cluster))

#output

> kmeanresult

Var1 Freq

1 1 62

2 2 50

3 3 38
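The cluster numbers 1, 2 and 3 in this output are arbitrary labels: kmeans() starts from random centers, so the same groups may come back in a different order on another run. The following is a minimal sketch of one way to make the run repeatable and to compare sizes independently of label order. The seed value 123 and the names `features`, `fit` and `sorted_sizes` are illustrative assumptions, not part of the original code.

```r
# Sketch: make the k-means run repeatable (the seed value 123 is an arbitrary assumption)
set.seed(123)

# The four numeric measurements, with the Species column left out
features <- iris[, 1:4]

# Same settings as the listing above: k = 3, 20 random starts, up to 100 iterations each
fit <- kmeans(features, centers = 3, iter.max = 100, nstart = 20)

# fit$size holds the number of points per cluster; the label order is arbitrary,
# so sort the sizes before comparing them across runs
sorted_sizes <- sort(fit$size, decreasing = TRUE)
print(sorted_sizes)
```

Sorting the sizes makes two runs comparable even when the cluster labels are permuted.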

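The results show that k-means with three clusters splits the 150 samples into groups of 62, 50 and 38, rather than the true 50, 50 and 50. Note that the clustering itself uses all four measurements (sepal and petal length and width); petal length and width are only the axes chosen for the plot. A cluster of exactly 50 is consistent with one species separating cleanly, while the 62 and 38 split suggests that two species (versicolor and virginica) overlap because their measurements, including petal length and width, are similar.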
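Comparing the two count tables separately does not show which species ended up in which cluster. A cross-tabulation of species against cluster labels makes that visible. This is a hedged sketch rather than the method from the book: the seed value and the names `fit`, `confusion` and `mixed` are assumptions introduced for illustration.

```r
# Sketch: cross-tabulate the true species against the k-means cluster labels
set.seed(123)  # arbitrary seed, only so that the run can be repeated
fit <- kmeans(iris[, 1:4], centers = 3, iter.max = 100, nstart = 20)

# Rows are the true species, columns are the (arbitrarily numbered) clusters
confusion <- table(Species = iris$Species, Cluster = fit$cluster)
print(confusion)

# Species whose samples are spread over more than one cluster
mixed <- rownames(confusion)[rowSums(confusion > 0) > 1]
print(mixed)
```

Each row sums to 50, so a row with counts in more than one column identifies a species that k-means could not separate cleanly from its neighbors.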

Reference:

R語言機器學習(實用案例分析), ISBN 978-7-111-56590-1
