Difference between revisions of "K-Means"

From Rodinia
Jump to: navigation, search
(New page: K-means is a clustering algorithm used extensively in data-mining and elsewhere, important primarily for its simplicity. Many data-mining algorithms show a high degree of data parallelism....)
 
Line 1: Line 1:
 
K-means is a clustering algorithm used extensively in data-mining and elsewhere,
 
K-means is a clustering algorithm used extensively in data-mining and elsewhere,
 
important primarily for its simplicity. Many data-mining algorithms show a high degree of data parallelism.
 
important primarily for its simplicity. Many data-mining algorithms show a high degree of data parallelism.
Our GPU implementation is based on the k-means implementation of Northwestern University's [http://cucis.ece.northwestern.edu/projects/DMS/MineBench.html Minebench].
 
  
 
In k-means, a data object is comprised of several values, called features. By dividing
 
In k-means, a data object is comprised of several values, called features. By dividing
Line 12: Line 11:
 
sub-cluster respectively. The algorithm iterates until no data objects move from one
 
sub-cluster respectively. The algorithm iterates until no data objects move from one
 
sub-cluster to another.
 
sub-cluster to another.
 +
 +
Our GPU implementation is based on the k-means implementation of Northwestern University's Minebench The source code of
 +
serial/OpenMP version of k-means which is developed by Minebench team can be downloaded from [http://cucis.ece.northwestern.edu/projects/DMS/MineBench.html link1] and [www.people.virginia.edu/~sc5nf/kmeans.zip link2].

Revision as of 16:17, 1 July 2008

K-means is a clustering algorithm used extensively in data-mining and elsewhere, important primarily for its simplicity. Many data-mining algorithms show a high degree of data parallelism.

In k-means, a data object is comprised of several values, called features. By dividing a cluster of data objects into K sub-clusters, k-means represents all the data objects by the mean values or centroids of their respective sub-clusters. The initial cluster center for each sub-cluster is randomly chosen or derived from some heuristic. In each iteration, the algorithm associates each data object with its nearest center, based on some chosen distance metric. The new centroids are calculated by taking the mean of all the data objects within each sub-cluster respectively. The algorithm iterates until no data objects move from one sub-cluster to another.

Our GPU implementation is based on the k-means implementation of Northwestern University's Minebench The source code of serial/OpenMP version of k-means which is developed by Minebench team can be downloaded from link1 and [www.people.virginia.edu/~sc5nf/kmeans.zip link2].