comparePartitions.pl - Script to compare set partitions.
comparePartitions.pl [-f fileP fileQ -t ',' -h -c]
The script comparePartitions.pl computes the accuracy and precision of the set partitions stored in the files fileP and fileQ.
-f fileP fileQ
The option -f specifies the two files containing the partitions to be compared. Each line of a file is treated as one subset of the partition, with its elements stored as comma-separated values. The module Text::CSV is used to parse each line. The files must be in UTF-8 format.
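The file format described above can be illustrated with a minimal sketch. It is written in Python with the standard csv module rather than Perl and Text::CSV, so it is not the script's own code; the function name read_partition and the in-memory sample data are assumptions made for illustration only.

```python
import csv
import io


def read_partition(stream, delimiter=","):
    """Return a list of sets, one set per non-empty line of the stream."""
    reader = csv.reader(stream, delimiter=delimiter)
    # csv.reader yields an empty list for a blank line; skip those.
    return [set(row) for row in reader if row]


# Hypothetical contents of fileP, held in memory for illustration.
file_p = io.StringIO("a,b,c\nd,e,f\n")
partition_p = read_partition(file_p)

# A different delimiter, analogous to what the -t option selects.
partition_alt = read_partition(io.StringIO("a;b\nc;d\n"), delimiter=";")
```

To read an actual file in this sketch, open it with open(path, encoding="utf-8", newline="") and pass the file object as the stream.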
The two partitions must cover the same set of elements to be compared properly. Elements missing from one partition are added to it as singleton subsets. For example, if fileP and fileQ contained the lines indicated below

            fileP    fileQ
            -----    -----
    line 1  a,b,c    a,b
    line 2  d,e,f    c,d
    line 3           g,h

then the singleton sets {g} and {h} are added to partition P making it equal {{a,b,c}, {d,e,f}, {g}, {h}} and similarly, the sets {e} and {f} are added to Q making it equal {{a,b}, {c,d}, {e}, {f}, {g,h}}.
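The padding step in the example above can be sketched as follows. This is an illustrative Python version, not the script's Perl implementation; the function name pad_with_singletons is hypothetical.

```python
def pad_with_singletons(p, q):
    """Return copies of p and q extended so both cover the same elements."""
    elements_p = set().union(*p)
    elements_q = set().union(*q)
    # Elements found only in the other partition become singleton subsets.
    padded_p = [set(s) for s in p] + [{e} for e in sorted(elements_q - elements_p)]
    padded_q = [set(s) for s in q] + [{e} for e in sorted(elements_p - elements_q)]
    return padded_p, padded_q


# The partitions from the example: P from fileP, Q from fileQ.
p = [{"a", "b", "c"}, {"d", "e", "f"}]
q = [{"a", "b"}, {"c", "d"}, {"g", "h"}]
padded_p, padded_q = pad_with_singletons(p, q)
```

In this sketch the added singletons are appended after the original subsets; the order of subsets within a partition carries no meaning.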
-t ','
Use option -t to set the delimiter used in the CSV files fileP and fileQ; the default delimiter is a comma.
-c
If option -c is present, the subsets in each partition are checked to ensure they are disjoint. If they are not, an exception is thrown.
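A disjointness check of the kind the -c option describes might look like the sketch below. It is a Python illustration under the assumption that "disjoint" means no element appears in more than one subset of the same partition; the function name check_disjoint and the choice of ValueError are hypothetical and do not reflect the exception the script actually throws.

```python
def check_disjoint(partition):
    """Raise ValueError if any element appears in more than one subset."""
    seen = set()
    for subset in partition:
        overlap = seen & subset
        if overlap:
            raise ValueError(
                f"subsets are not disjoint; repeated elements: {sorted(overlap)}"
            )
        seen |= subset


# A valid partition passes silently.
check_disjoint([{"a", "b"}, {"c"}])

# An invalid one, where "b" appears twice, raises.
try:
    check_disjoint([{"a", "b"}, {"b", "c"}])
    raised = False
except ValueError:
    raised = True
```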
-h
Causes this documentation to be printed.
If there are no errors, the output is the comma separated line accuracy,precision,fileP,fileQ.
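This page does not define the two measures. As a rough illustration of the accuracy value only, the sketch below assumes accuracy means the Rand index from the Rand (1971) report listed in the references: the fraction of element pairs on which the two partitions agree (together in both, or apart in both). It is written in Python, the function name rand_index is hypothetical, and it is not the computation the script performs internally. Precision is left out because its definition is not given here.

```python
from itertools import combinations


def rand_index(p, q):
    """Fraction of element pairs treated the same way by both partitions.

    Assumes p and q are disjoint partitions of the same set of elements.
    """
    label_p = {e: i for i, s in enumerate(p) for e in s}
    label_q = {e: i for i, s in enumerate(q) for e in s}
    agree = total = 0
    for x, y in combinations(sorted(label_p), 2):
        same_p = label_p[x] == label_p[y]
        same_q = label_q[x] == label_q[y]
        agree += same_p == same_q
        total += 1
    # Convention chosen for this sketch: fewer than two elements counts as full agreement.
    return agree / total if total else 1.0


# The padded partitions from the -f example.
P = [{"a", "b", "c"}, {"d", "e", "f"}, {"g"}, {"h"}]
Q = [{"a", "b"}, {"c", "d"}, {"e"}, {"f"}, {"g", "h"}]
accuracy = rand_index(P, Q)
```

For these two partitions, 21 of the 28 element pairs agree, so the sketch yields 0.75.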
To install the module run the following commands:
    perl Makefile.PL
    make
    make test
    make install

If you are on a Windows machine, use 'nmake' rather than 'make'.
Please email bug reports or feature requests to bug-set-partitions-similarities@rt.cpan.org, or submit them through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Set-Partitions-Similarity. The author will be notified, and you can be automatically notified of progress on the bug fix or feature request.
Jeff Kubina <jeff.kubina@gmail.com>
Copyright (c) 2009 Jeff Kubina. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
accuracy, clustering, measure, metric, partitions, precision, set, similarity
Concise explanations of many cluster validity measures (including set partition measures) are available on the Cluster validity algorithms page of the Machaon Clustering and Validation Environment web site by Nadia Bolshakova.
The Wikipedia article Accuracy and precision has a good explanation of the accuracy and precision measures when applied to binary classifications.
The report Objective Criteria for the Evaluation of Clustering Methods (1971) by W.M. Rand in the Journal of the American Statistical Association provides an excellent analysis of the accuracy measure of partitions.
Math::Set::Partitions::Similarity, Text::CSV