NAME

Geo::KML::PolyMap - Generate KML/KMZ-format choropleth (shaded polygonal) maps viewable in Google Earth

SYNOPSIS

        use Geo::KML::PolyMap qw(generate_kml_file generate_kmz_file);

        # Clusters "Total Population" data for "Foobar City" in $entities into 5 bins;
        # renders using colors from $startcolor to $endcolor;
        # generates a legend; writes the KMZ output to the file handle passed as kmzfh
        generate_kmz_file(entities => $entities,
                          placename => "Foobar City",
                          data_desc => "Total Population",
                          nbins => 5,
                          kmzfh => $filehandle_for_kmz_output,
                          startcolor => "FFFF0000",
                          endcolor => "FF00FF00");

        # As above, but without a legend
        generate_kml_file(entities => $entities,
                          placename => "Foobar City",
                          data_desc => "Total Population",
                          nbins => 5,
                          kmlfh => $filehandle_for_kml_output,
                          startcolor => "FFFF0000",
                          endcolor => "FF00FF00");

REQUIRES

  • Carp

  • Archive::Zip

  • GD >=2.0

  • File::Temp

  • Statistics::Descriptive

DESCRIPTION

Geo::KML::PolyMap generates KML or KMZ-formatted maps for Google Earth. Given a set of polygonal regions and a number associated with each region (for example, city blocks and population counts on each block), Geo::KML::PolyMap generates a choropleth map showing the data value for each region as a shaded polygon. The polygons are divided into a number of bins, with the color of each bin unique. Optionally, Geo::KML::PolyMap will generate a legend along with the map file to illustrate the data ranges represented by each color.

CONFIGURATION

Geo::KML::PolyMap includes two parameters which must be configured by direct code changes.

Font Selection

To generate legend files with generate_kmz_file(), you must specify the path to a TrueType (.ttf) font file in the variable $FONT_PATH. This is clearly suboptimal and will change in a future revision.

Binning Method

The algorithm used to bin data points is also configurable. Please see the section on binning in generate_kml_file() for details.

DATA STRUCTURES

Points

A point is defined as a latitude,longitude pair. Since Google Earth uses the WGS-84 coordinate system, you probably should too.

Points are represented in Geo::KML::PolyMap as strings of the following form:

        (latitude,longitude)

So, for example, the following are legal points:

        my $pt = "(24,-12)";
        my $pt = "(23.456,-78.90)";

But the following are not:

        my $pt = "(199,140)";    # latitude is out of range
        my $pt = "24,-12";       # missing parentheses
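
The rules above can be checked before any data reaches the module. The following is a minimal sketch, not part of Geo::KML::PolyMap: parse_point is a hypothetical helper, and it assumes the usual WGS-84 ranges (latitude within [-90,90], longitude within [-180,180]) and that no whitespace is allowed inside the string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::KML::PolyMap): returns (lat, lon)
# for a well-formed, in-range point string, or an empty list otherwise.
sub parse_point {
    my ($pt) = @_;
    my $num = qr/[-+]?(?:\d+\.?\d*|\.\d+)/;

    # The string must be exactly "(number,number)".
    my ($lat, $lon) = $pt =~ /^\(($num),($num)\)$/
        or return;

    # ASSUMPTION: WGS-84 degree ranges.
    return if abs($lat) > 90 || abs($lon) > 180;
    return ($lat, $lon);
}

for my $pt ("(24,-12)", "(199,140)", "24,-12") {
    my @coords = parse_point($pt);
    printf "%-12s %s\n", $pt, @coords ? "ok" : "rejected";
}
```

Returning an empty list on failure lets callers test the result in list context, as the loop does.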

Polygons

A polygon is defined as a series of at least 4 points in the plane, counting the repeated closing point. Consecutive points are joined to form the polygon edges, and the last point must be the same as the first. The mapping results are undefined if edges in the polygon cross.

Polygons are represented in Geo::KML::PolyMap as strings containing comma-delimited lists of points, wrapped in a pair of parentheses:

        "((lat1,long1),(lat2,long2),(lat3,long3),(lat1,long1))"

The following is an example of a legal polygon:

        my $poly = "((1,2),(3,4),(5,6),(7,8),(1,2))";

The following are examples of illegal polygons:

        my $poly = "((1,2),(3,4),(5,6),(7,8))"; # last point must be the same as the first point
        my $poly = "((1,2),(3,4),(1,2))";       # not enough points; need at least 4
        my $poly = "(1,2),(3,4),(5,6),(1,2)";   # missing parentheses
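
The three rules illustrated above (outer parentheses, at least 4 points, closed ring) can be sketched as a pre-flight check. polygon_problems is a hypothetical helper, not a function exported by the module, and it compares the first and last points as strings, so "(1,2)" and "(1.0,2)" would count as different.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::KML::PolyMap): returns a list of
# problems found in a polygon string (empty list if none were found).
sub polygon_problems {
    my ($poly) = @_;
    my $num = qr/[-+]?(?:\d+\.?\d*|\.\d+)/;
    my $pt  = qr/\($num,$num\)/;

    # Outer parentheses around a comma-delimited list of points.
    return ("malformed polygon string")
        unless $poly =~ /^\($pt(?:,$pt)*\)$/;

    my @points = $poly =~ /($pt)/g;
    my @problems;
    push @problems, "need at least 4 points" if @points < 4;
    push @problems, "last point must equal first point"
        if $points[0] ne $points[-1];
    return @problems;
}

for my $poly ("((1,2),(3,4),(5,6),(7,8),(1,2))", "((1,2),(3,4),(1,2))") {
    my @problems = polygon_problems($poly);
    print "$poly: ", (@problems ? join("; ", @problems) : "ok"), "\n";
}
```

This sketch does not detect crossing edges, which the text above leaves undefined.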

Entities

Entities are the structure used by Geo::KML::PolyMap to move data into the map generation process. An entity is a very simple polygon/data pair, stored in a hashref. The polygon must be accessible from key "polygon", and the data point must be a number accessible from key "data":

        my $polygon = "((1,2),(3,4),(5,6),(1,2))";
        my $data = "10";
        my $entity = {  data => $data,
                        polygon => $polygon};

Geo::KML::PolyMap functions take references to arrays of entities:

        # Assume we have $entity1,$entity2,$entity3 defined already
        my $mapdata = [$entity1,$entity2,$entity3];
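
In practice the entity list is usually built from existing data rather than written by hand. The sketch below assumes a hypothetical %blocks hash holding a population count and an unclosed outline (an array of [lat, lon] pairs) per block; the names and values are made up, and polygon_string is an illustrative helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical input data, for illustration only.
my %blocks = (
    block_a => { population => 120, outline => [[1,2],[3,4],[5,6]] },
    block_b => { population => 45,  outline => [[5,6],[7,8],[9,10]] },
);

# Turn a list of [lat, lon] pairs into a polygon string, repeating the
# first point at the end to close the ring.
# ASSUMPTION: the outline passed in is not already closed.
sub polygon_string {
    my @ring = @_;
    push @ring, $ring[0];
    return "(" . join(",", map { "($_->[0],$_->[1])" } @ring) . ")";
}

# One entity (hashref with "data" and "polygon" keys) per block.
my $entities = [
    map { +{ data    => $blocks{$_}{population},
             polygon => polygon_string(@{ $blocks{$_}{outline} }) } }
    sort keys %blocks
];

print "$_->{data}\t$_->{polygon}\n" for @$entities;
```

The resulting $entities reference has the shape expected by the entities argument described below.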

METHODS

generate_kml_file() -- generate a KML file (map only)

Renders the data in the given entities as a KML file, written to the filehandle passed in.

Parameters are passed in as named arguments in a hash.

Example:

        generate_kml_file(entities => $entities,
                          placename => "Foobar City",
                          data_desc => "Total Population",
                          nbins => 5,
                          kmlfh => $filehandle_for_kml_output,
                          startcolor => "FFFF0000",
                          endcolor => "FF00FF00");

Mandatory arguments

  • entities

    Reference to an array of "entities", the data structure described above, used to store lists of (polygon,data) pairs.

  • placename

    A string containing a textual description (name) of the place represented by the given entities

  • data_desc

    A string describing the sort of data given in the entities

  • nbins

    The maximum number of bins into which to cluster the given data (see "Binning" below)

  • kmlfh

    Handle to an open-for-writing file into which to render the KML data.

Optional arguments

  • startcolor

    The OBGR color used for the bins with the lowest numerical value in the range provided. Defaults to FF000000. See "Colors" below.

  • endcolor

    The OBGR color used for the bins with the highest numerical value in the range provided. Defaults to FFFFFFFF.

Description

generate_kml_file renders the data provided in the given entities to a KML map suitable for display in Google Earth. To do this, it first separates the data into a user-configurable number of bins (see "Binning"), then assigns each bin a color. The bin with lowest numerical value is assigned color "startcolor" and the bin with largest value gets color "endcolor"; bins between these have their colors calculated by linear interpolation between these two values. The final KML file will have one placemark for each data bin, so that each bin can be viewed/hidden independently.

Placemark naming

Each placemark is named "[placename] Bin [n]", where placename is the parameter passed in, and n is the index of the bin which the placemark represents. Each placemark has a description "[data_desc] less than or equal to [bound]", where data_desc is passed in, and bound is the upper bound on the data values in that bin.
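
To make the naming pattern concrete, the sketch below builds the name and description strings for three made-up bin bounds. It is an illustration only: whether the module numbers bins from 0 or from 1 is not stated here, so the 1-based numbering is an assumption.

```perl
use strict;
use warnings;

my ($placename, $data_desc) = ("Foobar City", "Total Population");
my @upper_bounds = (100, 250, 900);   # made-up upper bounds, one per bin

# ASSUMPTION: bins are numbered from 1; the module may number them from 0.
my @placemarks = map {
    +{ name        => "$placename Bin " . ($_ + 1),
       description => "$data_desc less than or equal to $upper_bounds[$_]" }
} 0 .. $#upper_bounds;

print "$_->{name}: $_->{description}\n" for @placemarks;
```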

Binning

The code attempts to separate the data into nbins bins. In some degenerate cases (for example, when nbins exceeds the number of data points), there will be fewer output bins than requested, but there will never be more than nbins bins in the output map. Three binning algorithms are implemented in the code: _bin_percentile, _bin_equipartition, and _bin_kmeans. To select a different algorithm, edit the function bin_entities (a future version may add a parameter for this). The default method is _bin_kmeans. The algorithms are detailed below:

  • _bin_percentile

    This method calculates a histogram of the data values, then divides the bins equally by percentile. For example, with nbins=5, the bins will cover the percentile ranges [0,20), [20,40), [40,60), [60,80), and [80,100] of the data. This method is fast but has several drawbacks. The most serious is that the raw percentile boundaries are often not helpful in the presence of outliers.

  • _bin_equipartition

    This method calculates a histogram of the data values, then divides the histogram into nbins sections such that each bin has an (almost) equal number of data points within it. This also suffers the problem that outliers can induce highly artificial bin boundaries.

  • _bin_kmeans

    This method performs a k-means clustering on the data, with k=nbins. In theory, this should separate the data points into "natural" groupings; in practice it seems to work quite well. Its major disadvantage is that it is much more computationally intensive than the other two methods, a problem which is exacerbated when the number of data points becomes large.
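
To illustrate the idea behind the default method, here is a minimal sketch of one-dimensional k-means (Lloyd's algorithm). It is not the module's _bin_kmeans: kmeans_1d is a hypothetical function, the quantile-based choice of starting centers is an assumption, and the iteration cap of 100 is arbitrary.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sketch of 1-D k-means; returns one bin index per input value.
sub kmeans_1d {
    my ($values, $k) = @_;
    my $max_iter = 100;
    my @sorted = sort { $a <=> $b } @$values;
    my $n = @sorted;

    # ASSUMPTION: start from k values spread evenly through the sorted data.
    my @centers = map { $sorted[ int(($_ + 0.5) * $n / $k) ] } 0 .. $k - 1;
    my @assign;

    for my $iter (1 .. $max_iter) {
        # Assignment step: attach each value to its nearest center.
        my @new = map {
            my $v = $_;
            my $best = 0;
            for my $c (1 .. $#centers) {
                $best = $c
                    if abs($v - $centers[$c]) < abs($v - $centers[$best]);
            }
            $best;
        } @$values;

        last if @assign && "@new" eq "@assign";   # no change: converged
        @assign = @new;

        # Update step: move each center to the mean of its members.
        for my $c (0 .. $#centers) {
            my @members = map { $values->[$_] }
                          grep { $assign[$_] == $c } 0 .. $#$values;
            $centers[$c] = sum(@members) / scalar(@members) if @members;
        }
    }
    return @assign;
}

my @data = (1, 2, 3, 50, 52, 54, 1000);
my @bins = kmeans_1d(\@data, 3);
print "@bins\n";   # prints "0 0 0 1 1 1 2"
```

In this example the outlier 1000 ends up in a bin of its own while the two tight groups stay intact, which is the "natural grouping" behavior described above. A center that attracts no values keeps its old position, so fewer than k bins may be populated.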

Colors

Colors for this library are represented in the same OBGR format used by KML files. This format represents each color as a 32-bit hexadecimal number, with 8 bits each for opacity (transparency), blue, green, and red. Note that the ordering of values is different from usual web color specifications, which are RGB. Examples:

        FFFF0000 = pure blue
        80FF0000 = blue, 50% transparency
        00FF0000 = blue, fully transparent
        FF00FF00 = pure green
        FF0000FF = pure red
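
Since most color pickers produce web-style RGB values, a small conversion helper can avoid mix-ups in channel order. rgb_to_obgr below is a hypothetical convenience function, not part of the module; it takes an "RRGGBB" string (with or without a leading "#") and an optional opacity between 0 and 1.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a web "#RRGGBB" color plus an opacity in
# [0, 1] (default 1, fully opaque) to an 8-digit OBGR string.
sub rgb_to_obgr {
    my ($rgb, $opacity) = @_;
    $opacity = 1 unless defined $opacity;
    my ($red, $green, $blue) =
        $rgb =~ /^#?([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})$/
        or die "not an RRGGBB color: $rgb";

    # Opacity byte first, then blue, green, red.
    return uc sprintf("%02x%s%s%s",
                      int($opacity * 255 + 0.5), $blue, $green, $red);
}

print rgb_to_obgr("#0000FF"), "\n";        # prints "FFFF0000" (pure blue)
print rgb_to_obgr("#FF0000", 0.5), "\n";   # prints "800000FF" (red, 50%)
```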

Colors for each bin are constructed by linear interpolation between the optional parameters startcolor and endcolor. The interpolation is not weighted by bin values; it is just a simple interpolation along the line between start and end in RGB space.
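
The interpolation described above can be sketched as follows. interpolate_obgr is a hypothetical helper rather than the module's own code; it assumes that all four channels, including opacity, are interpolated independently and rounded to the nearest integer.

```perl
use strict;
use warnings;

# Sketch of per-channel linear interpolation between two OBGR colors.
# Returns one color string per bin, from $start (bin 0) to $end (last bin).
sub interpolate_obgr {
    my ($start, $end, $nbins) = @_;

    # Split each color into its (opacity, blue, green, red) bytes.
    my @s = map { hex($_) } $start =~ /([0-9A-Fa-f]{2})/g;
    my @e = map { hex($_) } $end   =~ /([0-9A-Fa-f]{2})/g;

    return map {
        # Fraction of the way from start to end; unweighted by bin values.
        my $t = $nbins > 1 ? $_ / ($nbins - 1) : 0;
        join "", map {
            sprintf "%02X", int($s[$_] + $t * ($e[$_] - $s[$_]) + 0.5)
        } 0 .. 3;
    } 0 .. $nbins - 1;
}

print join(" ", interpolate_obgr("FFFF0000", "FF00FF00", 5)), "\n";
# prints "FFFF0000 FFBF4000 FF808000 FF40BF00 FF00FF00"
```

With the blue-to-green endpoints used in the examples, the middle bin comes out as an equal mix of blue and green at full opacity.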

generate_kmz_file() -- generate a KMZ file (KML map + PNG legend)

Renders the data in the given entities as a KML map, generates a matching legend, and combines the two into a KMZ file written to the filehandle passed as a parameter.

Parameters are passed in as named arguments in a hash.

Example:

        generate_kmz_file(entities => $entities,
                          placename => "Foobar City",
                          data_desc => "Total Population",
                          nbins => 5,
                          kmzfh => $filehandle_for_kmz_output,
                          startcolor => "FFFF0000",
                          endcolor => "FF00FF00");

Mandatory arguments

  • entities

    Reference to an array of "entities", the data structure described above, used to store lists of (polygon,data) pairs.

  • placename

    A string containing a textual description (name) of the place represented by the given entities

  • data_desc

    A string describing the sort of data given in the entities

  • nbins

    The maximum number of bins into which to cluster the given data (see "Binning" under generate_kml_file())

  • kmzfh

    Handle to an open-for-writing file into which to write the KMZ data.

Optional arguments

  • startcolor

    The OBGR color used for the bins with the lowest numerical value in the range provided. Defaults to FF000000. See "Colors" under generate_kml_file().

  • endcolor

    The OBGR color used for the bins with the highest numerical value in the range provided. Defaults to FFFFFFFF.

Description

generate_kmz_file first generates a KML map for the given data, as described in generate_kml_file. It then generates a PNG legend containing a color swatch for each data bin and the range of values represented in each bin. The KML map and PNG legend are put together into a ZIP archive known as a KMZ file, ready for viewing in Google Earth. This KMZ file is written out to the filehandle passed in as a parameter.

Please see generate_kml_file for additional details on the rendered KML map.

This version of generate_kmz_file renders temporary data (the KML map) to a tempfile to reduce memory footprint.

TIPS

Chunk size in Archive::Zip

I've found that changing the chunk size ($ChunkSize) in the Archive::Zip module from the default 32K (32768) to around 128K (131072) can really speed up KMZ generation, especially for really big maps. To do this, change the line

        $ChunkSize=32768;

in Zip.pm to:

        $ChunkSize=131072;
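
As an alternative to editing Zip.pm, Archive::Zip documents a setChunkSize() function that changes the same setting at run time. The sketch below assumes it is called in your own script before generate_kmz_file(), and that nothing resets the value afterwards; the eval guard is only there so the snippet also runs where Archive::Zip is not installed.

```perl
use strict;
use warnings;

# Raise the Archive::Zip chunk size without modifying the installed module.
if (eval { require Archive::Zip; 1 }) {
    Archive::Zip::setChunkSize(131072);   # 128K instead of the default 32K
    printf "chunk size is now %d bytes\n", Archive::Zip::chunkSize();
}
```

This keeps the change local to one program instead of affecting every user of the installed module.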

AUTHOR

Imran Haque, ihaque@cs.stanford.edu

COPYRIGHT AND LICENSE

This module is Copyright 2007, Imran Haque, ihaque@cs.stanford.edu.

You may modify and/or redistribute this module under the same terms as Perl itself.