I have found a document that looks like this:
article name Product Manufacturer Loon Worlof Dos 1 1 1 2 1 0 Doctor 2 1 1 1 0 0 Doctor 3 0 1 1 2 1 Doctor 4 0 0 1 1 1
In the package < Code> TM , it is possible to calculate Hammang distance between 2 documents. But now I want to cluster all those documents that have at least one hamming distance compared to 3. Here I would like Cluster 1 Document 1 and 2, and this Cluster 2 document is 3 and 4. Is there any possibility of doing this?
I saved your table on myData :
myData article name Product Manufacturer Elon Vellolf Dock 1 1 1 2 1 0 Dock 2 1 1 1 0 0 0 Doc 3 0 0 1 1 2 1 Dosi 4 0 0 1 1 1
Then the hamming.distance ()
function uses the e1071
library you can use your distance (unless they are matrix Form)
Lilbury (e1071) distMat & lt; - hamming.distance (myData)
To ensure that the maximum distance within a cluster can be specified later, followed by graded clustering using the "complete" linkage method was done.
dendrogram & lt;
Select the group according to the maximum distance between the numbers in the group (max = 5)
Groups & lt; Plot the results in the end: Plot (dendrud) # Main plot point (c (-100) - Kantari (Dendrogram, H = 5)
, 100), c (5,5), col = "red", type = "l", lty = 2) # link line rect.hclust (dendrogram, h = 5, range = c (1: length (unique ( Group)) +1) # draw rectangle
Another way to view cluster membership for each document is with the table
:
table (g roups, rownames (myData)) group doc1 doc2 Doc3 doc4 1 1 Therefore, the documents of another group fall into one group while the third and the fourth - in the second group.
Comments
Post a Comment