cluster analysis - R: clustering documents


I have a document-term matrix that looks like this:

        article name product manufacturer loon verlof
  doc 1       1    1       2            1    0      0
  doc 2       1    1       1            0    0      0
  doc 3       0    0       1            1    2      1
  doc 4       0    0       0            1    1      1

In the tm package, it is possible to calculate the Hamming distance between two documents. Now I want to cluster all documents that have a Hamming distance smaller than 3 from one another. Here I would like cluster 1 to contain documents 1 and 2, and cluster 2 to contain documents 3 and 4. Is there a way to do this?

I saved your table to myData:

  myData
       article name product manufacturer loon verlof
  doc1       1    1       2            1    0      0
  doc2       1    1       1            0    0      0
  doc3       0    0       1            1    2      1
  doc4       0    0       0            1    1      1

Then I used the hamming.distance() function from the e1071 library. You can use your own distances instead (as long as they are in matrix form).

  library(e1071)
  distMat <- hamming.distance(myData)
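For anyone without e1071 installed, here is a minimal base R sketch of the same idea. It assumes the distance between two documents is simply the number of columns in which their rows differ. The helper name hammingBase is hypothetical, and the matrix values are my reading of the table above.

```r
# Example document-term matrix (values assumed from the table in the question)
myData <- matrix(
  c(1, 1, 2, 1, 0, 0,
    1, 1, 1, 0, 0, 0,
    0, 0, 1, 1, 2, 1,
    0, 0, 0, 1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    paste0("doc", 1:4),
    c("article", "name", "product", "manufacturer", "loon", "verlof")
  )
)

# Hypothetical helper: pairwise count of differing columns between rows
hammingBase <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(m[i, ] != m[j, ])
    }
  }
  d
}

distMatBase <- hammingBase(myData)
print(distMatBase)
```

The result is a symmetric matrix with zeros on the diagonal, so it can be passed to as.dist() in the same way as the e1071 output.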

This was followed by hierarchical clustering using the "complete" linkage method, to make sure that the maximum distance within a cluster can be specified later.

  dendrogram <- hclust(as.dist(distMat), method = "complete")
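To see where the tree can sensibly be cut, it can help to look at the merge heights. The sketch below hardcodes the pairwise distances for the four example documents (counted by hand as the number of differing columns, so treat the numbers as an assumption) and prints the heights at which hclust joins clusters under complete linkage.

```r
# Pairwise distances for the four example documents (assumed, counted by hand)
distMat <- matrix(
  c(0, 2, 5, 5,
    2, 0, 5, 6,
    5, 5, 0, 2,
    5, 6, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), paste0("doc", 1:4))
)

dendrogram <- hclust(as.dist(distMat), method = "complete")

# Heights at which clusters are merged, in increasing order
print(dendrogram$height)

# Which observations (negative) or earlier merges (positive) are joined at each step
print(dendrogram$merge)
```

With complete linkage, the height of a merge is the largest pairwise distance between members of the two clusters being joined. For this example the last merge happens at the largest distance in the matrix, so any cut below that height and at or above the first two merges should give the same two groups.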

Select groups according to the maximum distance between points in a group (maximum = 5):

  groups <- cutree(dendrogram, h = 5)

Finally, plot the results:

  plot(dendrogram)  # main plot
  points(c(-100, 100), c(5, 5), col = "red", type = "l", lty = 2)  # add cutting line
  rect.hclust(dendrogram, h = 5, border = c(1:length(unique(groups))) + 1)  # draw rectangles

Another way to view the cluster membership of each document is with table:

  table(groups, rownames(myData))

  groups doc1 doc2 doc3 doc4
       1    1    1    0    0
       2    0    0    1    1

So the first and second documents fall into one group, while the third and fourth fall into the other.
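If the goal is the stricter rule from the question (every pair inside a cluster closer than 3), one option is to cut the tree lower and then verify the result. This is only a sketch: maxWithin is a hypothetical helper, the cut height 2.5 is my choice, and the distance matrix is hardcoded from the example data, counted by hand.

```r
# Pairwise distances for the four example documents (assumed, counted by hand)
distMat <- matrix(
  c(0, 2, 5, 5,
    2, 0, 5, 6,
    5, 5, 0, 2,
    5, 6, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), paste0("doc", 1:4))
)

# Cut below 3 so that no cluster is formed by a merge at height 3 or more
tree <- hclust(as.dist(distMat), method = "complete")
groups <- cutree(tree, h = 2.5)

# Hypothetical helper: largest pairwise distance inside each group
maxWithin <- function(d, groups) {
  sapply(split(names(groups), groups), function(members) {
    max(d[members, members])
  })
}

print(groups)
print(maxWithin(distMat, groups))
```

Because the distances here are whole numbers, any cut height from 2 up to (but not including) 3 expresses "smaller than 3". The check with maxWithin is there because the guarantee depends on the linkage method; with other linkages a cut at a given height does not necessarily bound the within-cluster distance.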

