Statistics::Cluto - Perl binding for CLUTO
Download CLUTO from http://glaros.dtc.umn.edu/gkhome/views/cluto.
Find the libcluto.a that matches your environment and place it in your library path (or specify its location with the LIBS option as shown below).
Then do:
    perl Makefile.PL [LIBS='-L/where/to/find/libcluto.a -lcluto']
    make
    make test
    make install
Tested with cluto-2.1.2/Darwin-i386, cluto-2.1.2/Darwin-ppc and cluto-2.1.1/Linux-i686.
    use Statistics::Cluto;
    use Data::Dumper;

    my $c = new Statistics::Cluto;

    $c->set_dense_matrix(4, 5, [
        [8, 8, 0, 3, 2],
        [2, 9, 9, 1, 4],
        [7, 6, 1, 2, 3],
        [1, 7, 8, 2, 1]
    ]);

    $c->set_options({
        rowlabels => [ 'row0', 'row1', 'row2', 'row3' ],
        collabels => [ 'col0', 'col1', 'col2', 'col3', 'col4' ],
        nclusters => 2,
        rowmodel => CLUTO_ROWMODEL_NONE,
        colmodel => CLUTO_COLMODEL_NONE,
        pretty_format => 1,
    });

    my $clusters = $c->VP_ClusterRB;
    print Dumper $clusters;

    my $cluster_features = $c->V_GetClusterFeatures;
    print Dumper $cluster_features;
This is a Perl binding for CLUTO. Please refer to sections 5.6 - 5.8 of the CLUTO manual for details of each function. Statistics::Cluto provides a corresponding method for each function described there.
The initial matrix can be set with either the set_dense_matrix or the set_sparse_matrix method.
    # loading 4x5 dense matrix
    #
    # 1 1 0 1 1
    # 1 0 0 1 0
    # 0 1 1 0 0
    # 0 0 1 0 0
    my $c = new Statistics::Cluto;
    my $nrows = 4;
    my $ncols = 5;
    my $rowval = [
        [1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 1, 0, 0]
    ];
    $c->set_dense_matrix($nrows, $ncols, $rowval);

    # loading 4x5 sparse matrix
    # (each row lists column number / value pairs; columns count from 1)
    #
    # 1 1 0 1 1
    # 1 0 0 1 0
    # 0 1 1 0 0
    # 0 0 1 0 0
    my $c = new Statistics::Cluto;
    my $nrows = 4;
    my $ncols = 5;
    my $rowval = [
        [1, 1, 2, 1, 4, 1, 5, 1],
        [1, 1, 4, 1],
        [2, 1, 3, 1],
        [3, 1]
    ];
    $c->set_sparse_matrix($nrows, $ncols, $rowval);
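Reading the sparse example against the matrix drawn in its comments, each row appears to be a flat list of (column number, value) pairs with columns counted from 1. The following is a minimal sketch of that mapping in plain Perl. The helper name dense_to_sparse_rows is hypothetical and not part of Statistics::Cluto, and the 1-based convention is inferred from the example rather than taken from the manual.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::Cluto: turn a dense
# array-of-rows into the per-row (column, value) pair lists used in the
# set_sparse_matrix example. Assumption: column numbers are 1-based.
sub dense_to_sparse_rows {
    my ($dense) = @_;
    my @sparse;
    for my $row (@$dense) {
        my @pairs;
        for my $j (0 .. $#$row) {
            next if $row->[$j] == 0;            # zero entries are omitted
            push @pairs, $j + 1, $row->[$j];    # 1-based column, then value
        }
        push @sparse, \@pairs;
    }
    return \@sparse;
}

# The matrix drawn in the comments of the examples above.
my $dense = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
];
my $sparse = dense_to_sparse_rows($dense);
```

The resulting $sparse has the same shape as the $rowval passed to set_sparse_matrix in the example.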
A sparse matrix can also be set with set_raw_sparse_matrix, using the data format described in section 3.3, Fig. 16 of the manual.
    # loading sparse matrix via set_raw_sparse_matrix()
    #
    # 1 1 0 1 1
    # 1 0 0 1 0
    # 0 1 1 0 0
    # 0 0 1 0 0
    my $c = new Statistics::Cluto;
    my $nrows = 4;
    my $ncols = 5;
    my $rowptr = [0, 4, 6, 8, 9];
    my $rowind = [0, 1, 3, 4, 0, 3, 1, 2, 2];
    my $rowval = [1, 1, 1, 1, 1, 1, 1, 1, 1];
    $c->set_raw_sparse_matrix($nrows, $ncols, $rowptr, $rowind, $rowval);
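The three arrays in the raw example follow a compressed-row layout: rowptr holds the offset at which each row starts in rowind and rowval (plus one final end offset), and rowind holds 0-based column indices. A minimal sketch of how they can be derived from a dense matrix is given below. The helper dense_to_csr is a hypothetical name, not a function of Statistics::Cluto, and the layout is read off the example values.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::Cluto: build the
# rowptr / rowind / rowval triple from a dense array-of-rows.
sub dense_to_csr {
    my ($dense) = @_;
    my (@rowptr, @rowind, @rowval);
    push @rowptr, 0;                        # first row starts at offset 0
    for my $row (@$dense) {
        for my $j (0 .. $#$row) {
            next if $row->[$j] == 0;        # only non-zero entries are stored
            push @rowind, $j;               # 0-based column index
            push @rowval, $row->[$j];
        }
        push @rowptr, scalar @rowind;       # offset where the next row starts
    }
    return (\@rowptr, \@rowind, \@rowval);
}

# The matrix drawn in the comments of the example above.
my $dense = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
];
my ($rowptr, $rowind, $rowval) = dense_to_csr($dense);
```

For this matrix the helper reproduces the rowptr, rowind and rowval arrays used in the example.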
The input parameters nrows, ncols, rowptr, rowind and rowval are set automatically when the initial matrix is loaded. All other input parameters should be set with the set_options method before calling a clustering function. See sections 5.6 - 5.8 of the CLUTO manual for the parameters each function requires.
    $c->set_options({
        rowlabels => ['row0', 'row1', 'row2', 'row3', 'row4'],
        collabels => ['col0', 'col1', 'col2', 'col3', 'col4'],
        nclusters => 2,
        nfeatures => 2,
        clfun => CLUTO_CLFUN_I2,
        treetype => CLUTO_TREE_TOP,
    });
The CLUTO API functions described in sections 5.6 to 5.8 of the manual can be called as methods of the same name, without the "CLUTO_" prefix.
For example, CLUTO_VP_ClusterDirect (section 5.6.1) is named VP_ClusterDirect in this package.
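The naming rule amounts to stripping a fixed prefix. As a small illustration, the sketch below derives the method name from the C function name; cluto_method_name is a hypothetical helper written for this example and is not provided by Statistics::Cluto.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::Cluto: map a C function
# name from the manual to the corresponding Perl method name.
sub cluto_method_name {
    my ($c_name) = @_;
    (my $method = $c_name) =~ s/^CLUTO_//;   # drop the leading prefix only
    return $method;
}

my $method = cluto_method_name('CLUTO_VP_ClusterDirect');
# With an initialized object $c, the call would then be $c->$method.
```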
Routines with a single output parameter return a single value or arrayref. Routines with multiple output parameters return a list whose elements are the output parameters, in the order in which they appear in the manual.
    # suppose $c is initialized with 5x5 sparse matrix:
    #        col0 ... col4
    # row0:  2 2 0 2 2
    # row1:  2 1 0 1 4
    # row2:  0 2 5 0 0
    # row3:  0 1 6 0 0
    # row4:  2 1 0 3 4

    $c->set_options({
        rowlabels => ['row0', 'row1', 'row2', 'row3', 'row4'],
        collabels => ['col0', 'col1', 'col2', 'col3', 'col4'],
        nclusters => 2,
        nfeatures => 2,
    });

    my $part = $c->VP_ClusterDirect;
    # $part = [
    #     '1',
    #     '1',
    #     '0',
    #     '0',
    #     '1'
    # ];

    my ($internalids, $internalwgts, $externalids, $externalwgts) = $c->V_GetClusterFeatures;
    # $internalids =
    # [
    #     '2',
    #     '0',
    #     '4',
    #     '0'
    # ]
    # $internalwgts =
    # [
    #     '1',
    #     '0',
    #     '0.598181843757629',
    #     '0.209491595625877'
    # ]
    # $externalids =
    # [
    #     '2',
    #     '4',
    #     '2',
    #     '4'
    # ]
    # $externalwgts =
    # [
    #     '0.5',
    #     '0.299090921878815',
    #     '0.5',
    #     '0.299090921878815'
    # ]
Please refer to the manual for the details of the returned data structure.
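As a reading aid for the raw output above, here is a minimal sketch that regroups it by cluster in plain Perl. It rests on two assumptions inferred from the example values and their pretty_format counterpart, not confirmed against the manual: entry i of the partition vector is the cluster id of row i, and the flat feature arrays hold nfeatures consecutive entries per cluster. The variable names are illustrative.

```perl
use strict;
use warnings;

# Values copied from the example output above.
my $part        = [1, 1, 0, 0, 1];
my $internalids = [2, 0, 4, 0];
my $nfeatures   = 2;

# Group row numbers by the cluster id stored at each row's position.
my @rows_in_cluster;
for my $row (0 .. $#$part) {
    my $id = $part->[$row];
    next if $id < 0;    # defensive: a negative id would index from the array's end
    push @{ $rows_in_cluster[$id] }, $row;
}

# Assumption: the flat array holds $nfeatures entries per cluster, in order.
my @ids_per_cluster;
my @flat = @$internalids;
push @ids_per_cluster, [ splice @flat, 0, $nfeatures ] while @flat;
```

With the example data, cluster 0 ends up with rows 2 and 3 and cluster 1 with rows 0, 1 and 4. The same slicing would apply to the weight arrays under the same assumption.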
When the pretty_format option is set to 1, results are returned as a single reference (an arrayref in the examples that follow), in a (hopefully) more comprehensible form. The meaning of the returned data should be largely self-explanatory.
    # with the same matrix and options as above...
    $c->set_options({ pretty_format => 1 });

    my $result = $c->VP_ClusterDirect;
    # $result =
    # [
    #     [
    #         { 'row' => 2, 'rowlabel' => 'row2' },
    #         { 'row' => 3, 'rowlabel' => 'row3' }
    #     ],
    #     [
    #         { 'row' => 0, 'rowlabel' => 'row0' },
    #         { 'row' => 1, 'rowlabel' => 'row1' },
    #         { 'row' => 4, 'rowlabel' => 'row4' }
    #     ]
    # ];

    $result = $c->V_GetClusterFeatures;
    # $result =
    # [
    #     [
    #         {
    #             'discriminating' => [
    #                 {
    #                     'externalwgt' => '0.5',
    #                     'collabel' => 'col2',
    #                     'externalid' => 2
    #                 },
    #                 {
    #                     'externalwgt' => '0.299090921878815',
    #                     'collabel' => 'col4',
    #                     'externalid' => 4
    #                 }
    #             ],
    #             'descriptive' => [
    #                 {
    #                     'internalid' => 2,
    #                     'internalwgt' => '1',
    #                     'collabel' => 'col2'
    #                 },
    #                 {
    #                     'internalid' => 0,
    #                     'internalwgt' => '0',
    #                     'collabel' => 'col0'
    #                 }
    #             ]
    #         },
    #         {
    #             'discriminating' => [
    #                 {
    #                     'externalwgt' => '0.5',
    #                     'collabel' => 'col2',
    #                     'externalid' => 2
    #                 },
    #                 {
    #                     'externalwgt' => '0.299090921878815',
    #                     'collabel' => 'col4',
    #                     'externalid' => 4
    #                 }
    #             ],
    #             'descriptive' => [
    #                 {
    #                     'internalid' => 4,
    #                     'internalwgt' => '0.598181843757629',
    #                     'collabel' => 'col4'
    #                 },
    #                 {
    #                     'internalid' => 0,
    #                     'internalwgt' => '0.209491595625877',
    #                     'collabel' => 'col0'
    #                 }
    #             ]
    #         }
    #     ]
    # ];
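To show how the pretty_format structure can be consumed, the sketch below walks a literal copy of the VP_ClusterDirect result printed above and builds one summary line per cluster. It does not call the library; the data is copied from the example, and the output wording is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

# Literal copy of the pretty_format VP_ClusterDirect result shown above.
my $result = [
    [
        { row => 2, rowlabel => 'row2' },
        { row => 3, rowlabel => 'row3' },
    ],
    [
        { row => 0, rowlabel => 'row0' },
        { row => 1, rowlabel => 'row1' },
        { row => 4, rowlabel => 'row4' },
    ],
];

# The position in the outer array is the cluster id; each inner element
# is a hashref describing one member row.
my @lines;
for my $id (0 .. $#$result) {
    my @labels = map { $_->{rowlabel} } @{ $result->[$id] };
    push @lines, "cluster $id: " . join(', ', @labels);
}
print "$_\n" for @lines;
```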
    use Statistics::Cluto qw(:all);

exports all constants defined in cluto.h (the list is auto-generated by h2xs). See section 5 of the CLUTO manual, or cluto.h, for details.
    CLUTO_CLFUN_CLINK CLUTO_CLFUN_CLINK_W CLUTO_CLFUN_CUT CLUTO_CLFUN_E1
    CLUTO_CLFUN_G1 CLUTO_CLFUN_G1P CLUTO_CLFUN_H1 CLUTO_CLFUN_H2
    CLUTO_CLFUN_I1 CLUTO_CLFUN_I2 CLUTO_CLFUN_MMCUT CLUTO_CLFUN_NCUT
    CLUTO_CLFUN_RCUT CLUTO_CLFUN_SLINK CLUTO_CLFUN_SLINK_W
    CLUTO_CLFUN_UPGMA CLUTO_CLFUN_UPGMA_W
    CLUTO_COLMODEL_IDF CLUTO_COLMODEL_NONE
    CLUTO_CSTYPE_BESTFIRST CLUTO_CSTYPE_LARGEFIRST CLUTO_CSTYPE_LARGESUBSPACEFIRST
    CLUTO_DBG_APROGRESS CLUTO_DBG_CCMPSTAT CLUTO_DBG_CPROGRESS
    CLUTO_DBG_MPROGRESS CLUTO_DBG_PROGRESS CLUTO_DBG_RPROGRESS
    CLUTO_GRMODEL_ASYMETRIC_DIRECT CLUTO_GRMODEL_ASYMETRIC_LINKS
    CLUTO_GRMODEL_EXACT_ASYMETRIC_DIRECT CLUTO_GRMODEL_EXACT_ASYMETRIC_LINKS
    CLUTO_GRMODEL_EXACT_SYMETRIC_DIRECT CLUTO_GRMODEL_EXACT_SYMETRIC_LINKS
    CLUTO_GRMODEL_INEXACT_ASYMETRIC_DIRECT CLUTO_GRMODEL_INEXACT_ASYMETRIC_LINKS
    CLUTO_GRMODEL_INEXACT_SYMETRIC_DIRECT CLUTO_GRMODEL_INEXACT_SYMETRIC_LINKS
    CLUTO_GRMODEL_NONE
    CLUTO_GRMODEL_SYMETRIC_DIRECT CLUTO_GRMODEL_SYMETRIC_LINKS
    CLUTO_MEM_NOREUSE CLUTO_MEM_REUSE
    CLUTO_MTYPE_HEDGE CLUTO_MTYPE_HSTAR CLUTO_MTYPE_HSTAR2
    CLUTO_OPTIMIZER_MULTILEVEL CLUTO_OPTIMIZER_SINGLELEVEL
    CLUTO_ROWMODEL_LOG CLUTO_ROWMODEL_MAXTF CLUTO_ROWMODEL_NONE CLUTO_ROWMODEL_SQRT
    CLUTO_SIM_CORRCOEF CLUTO_SIM_COSINE CLUTO_SIM_EDISTANCE CLUTO_SIM_EJACCARD
    CLUTO_SUMMTYPE_MAXCLIQUES CLUTO_SUMMTYPE_MAXITEMSETS
    CLUTO_TREE_FULL CLUTO_TREE_TOP
    CLUTO_VER_MAJOR CLUTO_VER_MINOR CLUTO_VER_SUBMINOR
http://glaros.dtc.umn.edu/gkhome/views/cluto
Ikuhiro IHARA <tsukue@gmail.com>
Copyright (C) 2007 by Ikuhiro IHARA
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.