wikipedia2alvis.pl - Wikipedia XML dump to Alvis XML converter
wikipedia2alvis.pl [options] [Wikipedia XML dump file]

Options:
    --out-dir                         output directory
    --namespaces                      list of namespaces to extract
    --N-per-out-dir                   # of records per output directory
    --[no-]original                   include original document?
    --[no-]expand-templates-fully     do we try to expand templates fully?
    --[no-]dump-templates             do we dump the templates?
    --template-dump-file              the file to dump the templates to
    --[no-]convert-via-html           do we convert via HTML or directly to Alvis?
    --date                            the date of the Wikipedia dump
    --[no-]dump-category-graph        do we dump the category graph?
    --category-graph-dump-file        the file to dump the category graph to
    --category-word                   category namespace identifier
    --root-category                   root category identifier
    --template-word                   template namespace identifier
    --language                        the language of the Wikipedia dump
    --help                            brief help message
    --man                             full documentation
    --[no]warnings                    warnings output flag
--out-dir
    Sets the output directory. Default value: '.'.
--namespaces
    Sets the namespaces whose records to extract. Given as a ','-separated list. The namespace names have to be the exact identifiers. Articles are always extracted. Default value: '' (empty), i.e. articles only.
--N-per-out-dir
    Sets the # of records per output directory. Default value: 1000.
--[no-]original
    Shall the original document be included in the output? Default value: no.
--[no-]expand-templates-fully
    Do we try to expand templates fully or do we simply insert a list of the template parameter values given in the call? Default value: no.
--[no-]dump-templates
    Do we dump the templates onto disk in a loadable format? Default value: no.
--template-dump-file
    The name of the (possible) template dump file. Default value: 'Templates.storable'.
--[no-]convert-via-html
    Do we (possibly) sacrifice speed for quality by converting from Wikitext to Alvis XML via an intermediate HTML version? Default value: yes.
--language
    The language of the Wikipedia dump. Affects category and template extraction. Possible values: 'en' (English), 'fr' (French), 'sl' (Slovenian). Default value: 'en'.
--category-word
    The identifier for the category namespace. Overruled by '--language'. Default value: 'Category'.
--root-category
    The identifier for the root category of the category graph. Overruled by '--language'. Default value: 'fundamental'.
--template-word
    The identifier for the template namespace. Overruled by '--language'. Default value: 'Template'.
--date
    The date of the Wikipedia dump as YYYYMMDD. Default value: undefined (means: use current date).
--[no-]dump-category-graph
    Do we dump the category graph onto disk in a loadable format? Default value: yes.
--category-graph-dump-file
    The name of the (possible) category graph dump file. Default value: 'CategoryGraph.storable'.
--help
    Prints a brief help message and exits.
--man
    Prints the manual page and exits.
--[no]warnings
    Output (or suppress) warnings. Default value: yes.
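Since the dump date defaults to the current date when it is not given, it can be worth passing it explicitly. The following is a minimal sketch of deriving a YYYYMMDD value from a dump file name. It assumes the file follows the common "<lang>wiki-YYYYMMDD-..." naming of Wikipedia dumps; the file name and the helper is_yyyymmdd are hypothetical and not part of wikipedia2alvis.pl.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the argument is exactly eight digits.
is_yyyymmdd() {
    case "$1" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical dump file name; replace with your own.
dump_file="enwiki-20240101-pages-articles.xml"

# Pull the eight-digit date out of a "<lang>wiki-YYYYMMDD-..." file name.
# sed prints nothing if the name does not match this pattern.
dump_date=$(printf '%s\n' "$dump_file" | sed -n 's/^[a-z]*wiki-\([0-9]\{8\}\)-.*/\1/p')

if is_yyyymmdd "$dump_date"; then
    echo "--date $dump_date"    # prints: --date 20240101
else
    echo "No date found in file name; omit --date to use the current date." >&2
fi
```

The check only validates the shape of the value (eight digits), not whether it is a real calendar date.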
Converts the articles in the Wikipedia XML dump to Alvis records.
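As an illustration, a typical run might combine several of the options above. This is only a sketch: the dump file name, output directory, record count, and date are hypothetical values, and the wrapper merely prints the command if the script or the dump file is not present.

```shell
#!/bin/sh
# Hypothetical paths; adjust to your own dump file and output location.
dump_file="enwiki-pages-articles.xml"
out_dir="alvis-out"

# Assemble the command from documented options:
# 500 records per output directory, English dump, explicit dump date,
# and direct conversion (skipping the intermediate HTML step).
cmd="wikipedia2alvis.pl --out-dir $out_dir --N-per-out-dir 500 --language en --date 20240101 --no-convert-via-html $dump_file"

# Run the converter only if it is installed and the dump exists;
# otherwise just show what would be run.
# Note: $cmd is deliberately unquoted so it splits into arguments,
# which assumes the paths contain no spaces.
if command -v wikipedia2alvis.pl >/dev/null 2>&1 && [ -f "$dump_file" ]; then
    mkdir -p "$out_dir"
    $cmd
else
    echo "Would run: $cmd"
fi
```

Dropping --no-convert-via-html returns to the default conversion via HTML, which the option description presents as the slower but possibly higher-quality path.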