NAME

dpan - create a DarkPAN from directories

SYNOPSIS

        # from the command line
        prompt> dpan [-l log4perl.config] [-f config] [directory [directory2]]

        # get some help
        prompt> dpan -h
        prompt> dpan --help

        # see the version
        prompt> dpan -v
        prompt> dpan --version

        # see the configuration (exits after printing config)
        prompt> dpan -c
        prompt> dpan -c -f config

DESCRIPTION

The dpan script takes a list of directories, indexes any Perl distributions it finds, and creates the PAUSE index files from what it finds. Afterward, you should be able to point a CPAN tool at the directory and install the distributions normally.

If you don't specify any directories, it works with the current working directory.

At the end, dpan creates a modules directory in the first directory (or the current working directory) and creates the 02package.details.txt.gz and 03modlist.data.gz.

Command-line processing

-c

Dump the configuration and exit.

-f config_file

Use the named configuration file.

-l log4perl_config_file

The path to the log4perl configuration file. You can also set this with the <DPAN_LOG4PERL_FILE> environment variable or the log4perl_file configuration directive.

Configuration options

If you don't specify these values in a configuation file, dpan will use its defaults. The behavior of these options only apply to the default components. If you subclass a component or use a different component, you might see something different.

Some of the configuration options come from MyCPAN::Indexer::App::BackPAN. The source of each directive is noted.

alarm

The maximum amount of time allowed to index a distribution, in seconds.

Default: 15

Affected components: Worker

Config level: MyCPAN

author_map

dpan will use this filename to add PAUSE IDs it finds in the repository to authors/00whois.xml and authors/01mailrc.txt.gz (see also pause_id and pause_full_name). Without this file, dpan will use the value of pause_full_name for new PAUSE IDs it finds.

The format is the file are lines that have the PAUSE ID followed by the full name and email address in angle brackets:

        BUSTER Buster Bean <buster@example.com>

This option helps your DPAN work with CPAN::Mini::Webserver, which tries to pull values out of authors/00whois.xml and authors/01mailrc.txt.gz to insert into its templates.

Default: undef

Affected components: Collator

Config level: DPAN

collator_class

The Perl class to use as the Dispatcher component. See MyCPAN::Indexer::Tutorial for details on components.

Default: MyCPAN::App::DPAN::Reporter::Minimal

copy_bad_dists

If set to a true value, copy bad distributions to the named directory so you can inspect them later.

Default: 0

Affected components: Queue

Config level: MyCPAN

dispatcher_class

The Perl class to use as the Dispatcher component. See MyCPAN::Indexer::Tutorial for details on components.

Default: MyCPAN::Indexer::Dispatch::Serial

Config level: MyCPAN

dpan_dir

The directory where dpan will create the DPAN. It takes a single directory.

Default: the current working directory

Affected components: all of them

Config level: DPAN

error_report_subdir

The subdirectory under report_dir to use for error reports.

Default: error

Affected components: Reporter, Collator

Config level: MyCPAN

extra_reports_dir

You can specify another directory that contains pre-indexed reports. dpan will add these reports to its queue to create 02packages.details.txt.gz. So far, dpan doesn't check that these reports correspond to any files in the repository and could cause the checks against 02packages.details.txt.gz to fail.

You probably want to use this as a way to inject information about distributions that you skipped with skip_dists_regexes. This is also very handy for problematic distributions that resist indexing.

Default: undef

Affected components: Indexer

Config level: DPAN

fresh_start

Delete the report directory before indexing. This cleans out all previous work in report_dir, so you need to save that on your own. You can also set this with the DPAN_FRESH_START environment variable.

Default: 0

Affected components: Queue, Reporter

Config level: DPAN

i_ignore_errors_at_my_peril

dpan considers some problems to be too troublesome to continue, such as when it thinks it created incomplete index files. If you don't want dpan to stop, set i_ignore_errors_at_my_peril to a true value. However, you might get an invalid mirror.

Default: 0

Affected components: Collator

Config level: DPAN

ignore_missing_dists

dpan aborts the process if it finds distributions in the repository that don't show up in 02packages.details.txt.gz, which it think signals an incomplete index. If you want to risk an incomplete index, set this to a true value.

Default: 0

Affected components: Collator

Config level: DPAN

ignore_packages

You can tell DPAN to ignore some namespaces. The indexer may still record them, but they won't show up in 02packages.details.txt.gz. It's a space-separated list of exact package names

Default: main MY MM DB bytes DynaLoader

Affected components: Indexer

Config level: DPAN

indexer_class

The Perl class to use as the Indexer component. See MyCPAN::Indexer::Tutorial for details on components.

Default: MyCPAN::App::DPAN::Indexer

indexer_id

Give yourself a name so people who who ran dpan.

Default: Joe Example <joe@example.com>

Affected components: Reporter

Config level: MyCPAN

interface_class

The Perl class to use as the Interface component. See MyCPAN::Indexer::Tutorial for details on components.

Default: MyCPAN::Indexer::Interface::Text

Config level: MyCPAN

log_file_watch_time

The time, in seconds, to wait until Log4perl checks for an updated configuration file.

Default: 30

Affected components: all

Config level: MyCPAN

log4perl_file

The path to the log4perl configuration file. You can also set this with the <-l> switch or the DPAN_LOG4PERL_FILE environment variable.

Default: undef

Affected components: all

Config level: MyCPAN

merge_dirs

A space separated list of directories containing distributions to merge into DPAN. These will go into the directory for pause_id. The files are copied, not renamed, so it works across partitions and mount points.

Default: undef

Affected components: Queue

Config level: MyCPAN

organize_dists

Take all of the distributions dpan finds and put the into a PAUSE-like structure under authors/id/D/DP/DPAN under dpan_dir. You can change the author ID with the pause_id directive.

Default: 0

Affected components: Queue

Config level: DPAN

parallel_jobs

The number of parallel jobs to run. This only matters for dispatcher classes that can do more than one thing at a time. The default dispatcher does not use parallel jobs. You need to change the dispatcher_class to MyCPAN::Indexer::Dispatcher::Parallel (or another class that supports this).

Default: 0

Affected components: Dispatcher

Config level: MyCPAN

pause_id

The author ID to use if organize_dists is set to a true value.

Default: DPAN

Affected components: Queue, Reporter, Collator

Config level: DPAN

pause_full_name

The full name to use for the default PAUSE ID and any unclaimed ID dpan finds in the repository. This shows up in the 01mailrc.txt.gz and 00whois.xml files.

Default: "DPAN user <CENSORED>"

Affected components: Collator

Config level: DPAN

postflight_class

If defined, after dpan has finished all of its work and is ready to exit, it will try to load this class and execute its run() method. It passes run() the application object, so you have full access to everything. See MyCPAN::App::DPAN::NullPostFlight for an example. To use the application object, you need to know a little about the guts of MyCPAN::Indexer.

Default: undef

Affected components: none

Config level: DPAN

prefer_bin

If set to a true value, dpan will try to use an external binary before it results to a pure Perl option. This matters mostly for Archive::Extract. Your tar utility may have better luck than Archive::Tar.

This setting overrides the PREFER_BIN environment variable. You can also set that environment variable to change how Archive::Extract decides to extract an archive.

Default: 0

Affected components: Indexer

Config level: MyCPAN

queue_class MyCPAN::Indexer::SomeOtherQueue

The Perl class to use as the Queue component. See MyCPAN::Indexer::Tutorial for details on components.

Default: MyCPAN::Indexer::SkipQueue

Config level: MyCPAN

relative_paths_in_report

If true, supporting Reporter classes change the path to the distribution file to be relative to authors/id. Otherwise, the path in the report is the absolute path to the distribution.

Supported by MyCPAN::App::DPAN::Reporter::Minimal.

Default: true

Affected components: Reporter

Config level: DPAN

reporter_class

The Perl class to use as the Reporter component. See MyCPAN::Indexer::Tutorial for details on components.

Default: MyCPAN::App::DPAN::Reporter::Minimal

Config level: MyCPAN

report_dir

Where to store the distribution reports.

Default: a directory named indexer_reports in the current working directory

Affected components: Reporter

Config level: MyCPAN

retry_errors

Try to index a distribution even if it was previously tried and had an error. This depends on previous reports being in report_dir, so if you don't set that configuration directive, it won't matter.

Default: 1

Affected components: Indexer

Config level: DPAN

skip_dists_regexes

You can specify a list of whitespace-separated regexes for dpan to use to filter the queue of distributions to index. You probably want to use this to skip very large distributions. You can have a pre-made index report by setting extra_reports_dir.

This is only supported by MyCPAN::Indexer::SkipQueue.

Default: null

Affected components: Queue

Config level: DPAN

skip_perl

When set to a true value, skip_perl cause dpan to ignore distributions that match /^(strawberry-?)perl-/, since these can take a long time to index. You can have a pre-made index report by setting extra_reports_dir.

This is only supported by MyCPAN::Indexer::SkipQueue.

Default: 0

Affected components: Queue

Config level: DPAN

error_report_subdir

The subdirectory under report_dir to use for success reports.

Default: success

Affected components: Reporter, Collator

Config level: MyCPAN

system_id macbookpro

Give the indexing system a name, just to identify the machine. This shows up in the run_info, although the default Reporter does not include it in the report.

Default: 'an unnamed machine'

Affected components: Reporter

Config level: MyCPAN

temp_dir

Where to unpack the dists or create any temporary files.

Default: a temp directory in the current working directory

Affected components: Worker

Config level: MyCPAN

use_real_whois

When use_real_whois is set to a true value, dpan tries to update the authors/00whois.xml and authors/01mailrc.txt.gz files from a real CPAN mirror. This will overwrite the files already in the repository, although it will update the CPAN versions with PAUSE IDs dpan finds in the repository (see author_map).

The default case for most dpan options is to avoid the real CPAN, so by default dpan will create stub files for these files instead of fetching the real ones.

If you are using CPAN::Mini, you can mirror 00whois.xml by adding a line to your .minicpanrc:

        also_mirror authors/00whois.xml

Default: 0

Affected components: Collator

Config level: DPAN

worker_class

The Perl class to use as the Worker component. See MyCPAN::Indexer::Tutorial for details on components.

Default: MyCPAN::Indexer::Worker

Logging

dpan uses Log4perl if it is available. Without Log4perl, it uses a small internal logger that prints to the screen.

You can set the Log4perl levels on each of the components separately:

        log4perl.rootLogger               =    FATAL, Null

        log4perl.logger.backpan_indexer   =    DEBUG, File

        log4perl.logger.Indexer           =    DEBUG, File
        log4perl.logger.Worker            =    DEBUG, File

        log4perl.logger.Interface         =    DEBUG, File

        log4perl.logger.Dispatcher        =    DEBUG, File
        log4perl.logger.Queue             =    DEBUG, File

        log4perl.logger.Reporter          =    DEBUG, File
        log4perl.logger.Collator          =    DEBUG, File

ENVIRONMENT VARIABLES

DPAN_FRESH_START

Delete the report directory before indexing. This cleans out all previous work, so you need to save that on your own. You can also set this with the fresh_start configuration directive.

DPAN_LOG4PERL_FILE

The path to the log4perl configuration file. You can also set this with the <-l> switch of the log4perl_file configuration directive.

DPAN_LOGLEVEL

The Log4perl level for easyinit. This won't affect the logging level if you specify a log configuration file.

SEE ALSO

MyCPAN::Indexer, MyCPAN::Indexer::DPAN

SOURCE AVAILABILITY

This code is in Github:

      git://github.com/briandfoy/mycpan-indexer.git
      git://github.com/briandfoy/mycpan--app--dpan.git

AUTHOR

brian d foy, <bdfoy@cpan.org>

COPYRIGHT AND LICENSE

Copyright (c) 2008-2009, brian d foy, All Rights Reserved.

You may redistribute this under the same terms as Perl itself.