Hi, I am trying to implement hierarchical clustering of a distance matrix. I first construct Instances from a double[][] matrix and pass the Instances to HierarchicalClusterer. I could not figure out how to retrieve the clustering output. When I get the Newick output, the nodes are not named properly for some reason, and I cannot make sense of the output.
Also, do I need to call setNumClusters()? I thought agglomerative clustering always ends with 1 cluster. When I don't set anything, it returns 2 clusters. Is it because my data is very sparse?
Here is the code and output. Could someone advise me how to get the correct output to draw dendrograms?
Code:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import weka.clusterers.HierarchicalClusterer;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;

double[][] matrix;
String[] names;
......
// create Instances
FastVector atts = new FastVector();
int numDimensions = samples.length;
int numInstances = samples.length;
List<Instance> instances = new ArrayList<Instance>();
for (int i = 0; i < numDimensions; i++) {
    Attribute current = new Attribute("Attribute" + i, i);
    // Attribute current = new Attribute(names[i], i);
    // create the empty instances once, on the first pass
    if (i == 0) {
        for (int obj = 0; obj < numInstances; obj++) {
            instances.add(new SparseInstance(numDimensions));
        }
    }
    // fill the values of attribute i for every instance
    for (int j = 0; j < numInstances; j++) {
        instances.get(j).setValue(current, matrix[i][j]);
    }
    atts.addElement(current);
}
// create new dataset
Instances newDataset = new Instances("Dataset", atts, instances.size());
// fill in data objects
for (Instance inst : instances) {
    newDataset.add(inst);
}
// -L AVERAGE selects average linkage
String[] options = new String[2];
options[0] = "-L";
options[1] = "AVERAGE";
int[] results = new int[numInstances];
HierarchicalClusterer hc = new HierarchicalClusterer();
try {
    hc.setOptions(options);
    hc.setPrintNewick(true);
    // hc.setNumClusters(10);
    hc.setDebug(true);
    hc.buildClusterer(newDataset);
    println(hc.toString());
    println(hc.numClustersTipText());
    println(hc.graph());
} catch (Exception e) {
    println(e.getMessage());
}
// here I am trying to get the order of nodes....
Instance temp;
for (int i = 0; i < numInstances; i++) {
    try {
        temp = newDataset.instance(i);
        results[i] = hc.clusterInstance(temp);
        // println("Debug:" + temp.toString());
    } catch (Exception e) {
        println(e.getLocalizedMessage());
    }
}
println("results:" + Arrays.toString(results));
The output looks something like this:
Code:
Newick:(((((((1.0:0,1.0:0):1.08978,(1.0:0,1.0:0):1.08978):0.48826,1.0:1.57805):0.54355, .....
results:[0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1 .......
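To make the setNumClusters() question concrete, here is a minimal sketch of average-linkage agglomeration in plain Java, without Weka. It is not Weka's implementation and says nothing about how HierarchicalClusterer behaves internally. The class name AverageLinkageSketch, the agglomerate method, its stopAt parameter, and the UPGMA-style branch lengths (merge distance divided by two) are all assumptions made for illustration. It only shows that merging down to 1 cluster yields a single tree, that stopping earlier leaves several subtrees, and that the leaf labels in the Newick string are whatever names are passed in.

```java
import java.util.ArrayList;
import java.util.List;

public class AverageLinkageSketch {

    // Mean pairwise distance between the members of two clusters (average linkage).
    static double averageDistance(double[][] dist, List<Integer> a, List<Integer> b) {
        double sum = 0.0;
        for (int i : a) {
            for (int j : b) {
                sum += dist[i][j];
            }
        }
        return sum / (a.size() * b.size());
    }

    // Repeatedly merges the two closest clusters until only stopAt clusters remain.
    // Returns one Newick subtree per remaining cluster (hypothetical helper, not a Weka API).
    public static List<String> agglomerate(double[][] dist, String[] names, int stopAt) {
        List<List<Integer>> members = new ArrayList<>(); // leaf indices of each cluster
        List<String> subtrees = new ArrayList<>();       // Newick text of each cluster
        List<Double> heights = new ArrayList<>();        // height at which each cluster was formed
        for (int i = 0; i < names.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            members.add(single);
            subtrees.add(names[i]);
            heights.add(0.0);
        }
        while (members.size() > Math.max(1, stopAt)) {
            // find the closest pair of clusters
            int bestA = 0;
            int bestB = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < members.size(); a++) {
                for (int b = a + 1; b < members.size(); b++) {
                    double d = averageDistance(dist, members.get(a), members.get(b));
                    if (d < best) {
                        best = d;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            // assumption: the new node sits at half the merge distance (UPGMA convention)
            double height = best / 2.0;
            String merged = "(" + subtrees.get(bestA) + ":" + (height - heights.get(bestA))
                    + "," + subtrees.get(bestB) + ":" + (height - heights.get(bestB)) + ")";
            // store the merged cluster at bestA and drop bestB (bestB > bestA)
            members.get(bestA).addAll(members.get(bestB));
            subtrees.set(bestA, merged);
            heights.set(bestA, height);
            members.remove(bestB);
            subtrees.remove(bestB);
            heights.remove(bestB);
        }
        return subtrees;
    }

    public static void main(String[] args) {
        // made-up symmetric distance matrix for three named items
        double[][] dist = {
            {0, 2, 6},
            {2, 0, 6},
            {6, 6, 0}
        };
        String[] names = {"A", "B", "C"};
        System.out.println(agglomerate(dist, names, 1).get(0) + ";"); // ((A:1.0,B:1.0):2.0,C:3.0);
        System.out.println(agglomerate(dist, names, 2));              // [(A:1.0,B:1.0), C]
    }
}
```

With stopAt set to 1 the three made-up items end up in one tree whose leaves carry the names A, B and C. With stopAt set to 2 the merging stops one step earlier and two separate subtrees are left.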