Channel: Pentaho Community Forums

HierarchicalClusterer

Hi, I am trying to implement hierarchical clustering of a distance matrix. I first construct an Instances object from a double[][] matrix and pass it to HierarchicalClusterer. I could not figure out how to retrieve the clustering output. When I get the Newick output, the nodes are not named properly for some reason, and I cannot make sense of the output.
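For comparison, here is a minimal plain-Java sketch of average-linkage (UPGMA-style) agglomerative clustering that works directly on a precomputed distance matrix and labels the Newick leaves with names[]. It is not Weka's implementation: as far as I know, HierarchicalClusterer computes distances between instances from their attribute values (EuclideanDistance by default), so a distance matrix loaded as Instances is clustered as a set of row vectors rather than used as the distances themselves. The class and method names (AverageLinkageNewick, newick) are hypothetical, the matrix is assumed symmetric with a zero diagonal, and node heights follow the UPGMA convention of half the merge distance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Weka): average-linkage clustering of a
// precomputed distance matrix, printed as a Newick tree with named leaves.
// Assumes d is symmetric, has zeros on the diagonal, and names.length == d.length >= 1.
class AverageLinkageNewick {

    // A cluster: its leaf indices, its Newick subtree and its height.
    static final class Cluster {
        final List<Integer> members;
        final String newick;
        final double height;

        Cluster(List<Integer> members, String newick, double height) {
            this.members = members;
            this.newick = newick;
            this.height = height;
        }
    }

    // Average linkage: mean distance over all pairs (i in a, j in b).
    static double averageDistance(double[][] d, Cluster a, Cluster b) {
        double sum = 0;
        for (int i : a.members) {
            for (int j : b.members) {
                sum += d[i][j];
            }
        }
        return sum / (a.members.size() * b.members.size());
    }

    static String fmt(double x) {
        return String.format(Locale.ROOT, "%.4f", x);
    }

    // Merges the two closest clusters until one is left; leaf i is labelled names[i].
    static String newick(double[][] d, String[] names) {
        List<Cluster> clusters = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(new Cluster(single, names[i], 0.0));
        }
        while (clusters.size() > 1) {
            int bestA = 0, bestB = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double dist = averageDistance(d, clusters.get(a), clusters.get(b));
                    if (dist < best) {
                        best = dist;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            Cluster left = clusters.get(bestA);
            Cluster right = clusters.get(bestB);
            double height = best / 2.0; // UPGMA convention: node height = merge distance / 2
            List<Integer> merged = new ArrayList<>(left.members);
            merged.addAll(right.members);
            // Branch length of each child = new node height minus the child's height.
            String subtree = "(" + left.newick + ":" + fmt(height - left.height) + ","
                    + right.newick + ":" + fmt(height - right.height) + ")";
            clusters.remove(bestB); // bestB > bestA, so remove it first
            clusters.remove(bestA);
            clusters.add(new Cluster(merged, subtree, height));
        }
        return clusters.get(0).newick + ";";
    }

    public static void main(String[] args) {
        double[][] d = {
            {0, 2, 6},
            {2, 0, 6},
            {6, 6, 0}
        };
        // prints (C:3.0000,(A:1.0000,B:1.0000):2.0000);
        System.out.println(newick(d, new String[] {"A", "B", "C"}));
    }
}
```

In this three-object example A and B merge first at distance 2 and C joins at distance 6. Recomputing the average over all member pairs at every step is roughly cubic in the number of objects, which is fine for small matrices but not for large ones.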

Also, do I need to call setNumClusters()? I thought agglomerative clustering always ends with a single cluster. When I don't set anything, it returns 2 clusters. Is that because my data is very sparse?
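To illustrate how a cluster count relates to the dendrogram, here is a small plain-Java sketch (hypothetical class CutAtK, not Weka code) that performs the same average-linkage merges as a full dendrogram but stops once k clusters remain, returning one cluster id per object, similar in shape to the results array filled by clusterInstance(). As far as I know, the default of HierarchicalClusterer's -N option (numClusters) is 2, which alone would explain getting two clusters when nothing is set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Weka's code): average-linkage merging on a
// precomputed distance matrix that stops once k clusters remain.
// Assumes a symmetric matrix and 1 <= k <= d.length.
class CutAtK {

    // Returns a cluster id in 0..k-1 for each of the n objects in d.
    static int[] cluster(double[][] d, int k) {
        int n = d.length;
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
        }
        // Left alone, agglomeration ends with one cluster; k only says where to stop.
        while (clusters.size() > k) {
            int bestA = 0, bestB = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double sum = 0;
                    for (int i : clusters.get(a)) {
                        for (int j : clusters.get(b)) {
                            sum += d[i][j];
                        }
                    }
                    double avg = sum / (clusters.get(a).size() * clusters.get(b).size());
                    if (avg < best) {
                        best = avg;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            // bestA < bestB, so removing bestB leaves index bestA unchanged.
            List<Integer> absorbed = clusters.remove(bestB);
            clusters.get(bestA).addAll(absorbed);
        }
        int[] labels = new int[n];
        for (int c = 0; c < clusters.size(); c++) {
            for (int i : clusters.get(c)) {
                labels[i] = c;
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Objects 0 and 1 are close to each other, as are 2 and 3.
        double[][] d = {
            {0, 1, 5, 5},
            {1, 0, 5, 5},
            {5, 5, 0, 1},
            {5, 5, 1, 0}
        };
        System.out.println(Arrays.toString(cluster(d, 2))); // [0, 0, 1, 1]
        System.out.println(Arrays.toString(cluster(d, 1))); // [0, 0, 0, 0]
    }
}
```

With k = 1 the merging runs to completion and every object ends up in a single cluster, which is the full dendrogram's root.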

Here are the code and the output. Could someone advise me how to get the correct output for drawing dendrograms?

Code:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import weka.clusterers.HierarchicalClusterer;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;

double[][] matrix;
String[] names;
......
        // Create one numeric attribute per row of the matrix.
        FastVector atts = new FastVector();
        int numDimensions = samples.length;
        int numInstances = samples.length;

        List<Instance> instances = new ArrayList<Instance>();
        for (int i = 0; i < numDimensions; i++) {
                Attribute current = new Attribute("Attribute" + i, i);
                // Attribute current = new Attribute(names[i], i);
                // On the first pass, create one empty instance per object.
                if (i == 0) {
                        for (int obj = 0; obj < numInstances; obj++) {
                                instances.add(new SparseInstance(numDimensions));
                        }
                }
                // Attribute i of instance j gets matrix[i][j], i.e. instance j is column j
                // (the same as row j for a symmetric distance matrix).
                for (int j = 0; j < numInstances; j++) {
                        instances.get(j).setValue(current, matrix[i][j]);
                }
                atts.addElement(current);
        }
        // Create the dataset and copy the instances into it.
        Instances newDataset = new Instances("Dataset", atts, instances.size());
        for (Instance inst : instances) {
                newDataset.add(inst);
        }

        // -L AVERAGE selects average linkage.
        String[] options = new String[2];
        options[0] = "-L";
        options[1] = "AVERAGE";

        int[] results = new int[numInstances];

        HierarchicalClusterer hc = new HierarchicalClusterer();
        try {
                hc.setOptions(options);
                hc.setPrintNewick(true);
                // hc.setNumClusters(10);
                hc.setDebug(true);
                hc.buildClusterer(newDataset);

                System.out.println(hc.toString());
                // numClustersTipText() only returns the option's help text;
                // numberOfClusters() returns the number of clusters actually built.
                System.out.println(hc.numberOfClusters());
                System.out.println(hc.graph());
        } catch (Exception e) {
                System.out.println(e.getMessage());
        }

        // Here I am trying to get the order of nodes...
        Instance temp;
        for (int i = 0; i < numInstances; i++) {
                try {
                        temp = newDataset.instance(i);
                        results[i] = hc.clusterInstance(temp);
                        // System.out.println("Debug:" + temp.toString());
                } catch (Exception e) {
                        System.out.println(e.getLocalizedMessage());
                }
        }

        System.out.println("results:" + Arrays.toString(results));


The output looks something like this:

Code:

Newick:(((((((1.0:0,1.0:0):1.08978,(1.0:0,1.0:0):1.08978):0.48826,1.0:1.57805):0.54355, .....
results:[0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1 .......
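The leaf labels of a Newick string, and their left-to-right order (the order a dendrogram draws them in), can be read straight off the string. Below is a minimal plain-Java sketch with a hypothetical NewickLeaves.leafLabels method; it assumes unquoted labels without Newick comments and skips branch lengths and internal-node labels. Applied to the output above, it returns 1.0 for every visible leaf, which suggests the leaves carry an attribute value rather than a name from names[].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the leaf labels of a Newick string in
// left-to-right order. Assumes unquoted labels and no [comments].
class NewickLeaves {

    static List<String> leafLabels(String newick) {
        List<String> leaves = new ArrayList<>();
        String structural = "(),:;";
        char prev = '('; // treat the start of the string like an opening parenthesis
        int i = 0;
        while (i < newick.length()) {
            char c = newick.charAt(i);
            if (c == ':') {
                // Skip the branch length that follows the colon.
                i++;
                while (i < newick.length() && "(),;".indexOf(newick.charAt(i)) < 0) {
                    i++;
                }
            } else if (structural.indexOf(c) >= 0) {
                prev = c;
                i++;
            } else {
                // Read a label up to the next structural character.
                int start = i;
                while (i < newick.length() && structural.indexOf(newick.charAt(i)) < 0) {
                    i++;
                }
                String label = newick.substring(start, i).trim();
                // After '(' or ',' the label names a leaf; after ')' it names an internal node.
                if (!label.isEmpty() && (prev == '(' || prev == ',')) {
                    leaves.add(label);
                }
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        // prints [C, A, B]
        System.out.println(leafLabels("(C:3.0000,(A:1.0000,B:1.0000):2.0000);"));
    }
}
```

Because the leaves come out left to right, the returned list is also the order in which a dendrogram drawn from that Newick string places them. The loop over clusterInstance() does not give this order: clusterInstance() only returns a cluster id for each instance.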

