Changes for version 3.012

  • Interface:
  • FS methods add_ufeatures() and get_ufeatures() now store unknown language-specific features or values in the 'other' feature (and set tagset to 'mul::uposf').
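
The behaviour described in this change can be pictured with a small sketch. It is not the module's Perl implementation: the dictionary-based feature structure, the tiny KNOWN table and the Python signatures below are assumptions made purely for illustration. Only the method names, the 'other' feature and the 'mul::uposf' tagset identifier come from the changelog entry.

```python
# Hypothetical, heavily reduced table of universal features this sketch "knows".
# Maps Name -> Value -> (internal feature, internal value).
KNOWN = {
    "Number": {"Sing": ("number", "sing"), "Plur": ("number", "plur")},
    "Case": {"Nom": ("case", "nom"), "Acc": ("case", "acc")},
}

OTHER_TAGSET = "mul::uposf"  # tagset id recorded alongside the 'other' feature


def add_ufeatures(fs, pairs):
    """Merge 'Name=Value' pairs into the dict fs.

    Known pairs are mapped to internal features. Unknown feature names or
    values are kept verbatim under fs['other'], and fs['tagset'] is set so
    that the origin of the 'other' content is recorded.
    """
    for pair in pairs:
        name, _, value = pair.partition("=")
        mapped = KNOWN.get(name, {}).get(value)
        if mapped is not None:
            feature, internal_value = mapped
            fs[feature] = internal_value
        else:
            fs.setdefault("other", {})[name] = value
            fs["tagset"] = OTHER_TAGSET
    return fs


def get_ufeatures(fs):
    """Return sorted 'Name=Value' pairs, including those parked in 'other'."""
    inverse = {
        internal: f"{name}={value}"
        for name, values in KNOWN.items()
        for value, internal in values.items()
    }
    pairs = [
        inverse[(key, val)]
        for key, val in fs.items()
        if key not in ("other", "tagset") and (key, val) in inverse
    ]
    pairs += [f"{name}={value}" for name, value in fs.get("other", {}).items()]
    return sorted(pairs)
```

In this sketch a pair such as "Clusivity=In", which the reduced table does not cover, survives a round trip through add_ufeatures() and get_ufeatures() instead of being dropped.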

Modules

DZ Interset is a universal morphosyntactic feature set to which all tagsets of all corpora/languages can be mapped.
Atomic driver for a surface feature.
Implements a converter between two physical tagsets via Interset.
Definition of morphosyntactic features and their values.
A temporary envelope that provides access to the old (Interset 1.0) drivers from Interset 2.0.
Atomic driver for a surface feature.
The root class for all physical tagsets covered by DZ Interset 2.0.
Driver for the Arabic tagset of the CoNLL 2006 Shared Task.
Driver for the Arabic tagset of the CoNLL 2007 Shared Task.
Driver for the PADT 2.0 / ElixirFM Arabic positional tagset.
Driver for the Bulgarian tagset of the CoNLL 2006 Shared Task.
Driver for the Bengali tagset of the ICON 2009 and 2010 Shared Tasks, as used in the CoNLL data format.
Driver for the Catalan tagset of the CoNLL 2009 Shared Task.
Driver for the tagset of the Czech morphological analyzers Ajka and Majka (Masaryk University in Brno).
Driver for the tagset of the Czech National Corpus (Český národní korpus).
Driver for the Czech tagset of the CoNLL 2006 and 2007 Shared Tasks.
Driver for the Czech tagset of the CoNLL 2009 Shared Task.
Driver for the Czech tagset of the Multext-EAST project.
Driver for the tagset of the Prague Dependency Treebank.
Driver for the Czech tagset of the Prague Spoken Corpus (Pražský mluvený korpus).
Driver for the shortened Czech tagset of the Prague Spoken Corpus (Pražský mluvený korpus).
Common code for drivers of tagsets from files in CoNLL 2006 format.
Driver for the Danish tagset of the CoNLL 2006 Shared Task (derived from the Danish Parole tagset).
Driver for the German tagset of the CoNLL 2006 Shared Task.
Driver for the German tagset of the CoNLL 2009 Shared Task.
Driver for the German tagset of SMOR (Stuttgart Morphology).
Driver for the Stuttgart-Tübingen Tagset of German.
Driver for the Greek tagset of the CoNLL 2007 Shared Task.
Driver for the English tagset of the CoNLL 2007 Shared Task.
Driver for the English tagset of the CoNLL 2009 Shared Task.
Driver for the tagset of the Penn Treebank.
Driver for the Spanish tagset of the CoNLL 2009 Shared Task.
Driver for the Estonian tagset from the Eesti keele puudepank (Estonian Language Treebank).
Driver for the tagset of the Basque Dependency Treebank in the CoNLL format.
Driver for the tagset of the Persian Dependency Treebank (in the CoNLL-X format).
Driver for the Finnish tagset from the Turku Dependency Treebank.
Driver for the Faroese tagset provided by Bjartensen.
Driver for the tagset of the Ancient Greek Dependency Treebank in CoNLL format.
Driver for the Hebrew tagset.
Driver for the Hindi tagset of the shared tasks at ICON 2009, ICON 2010 and COLING 2012, as used in the CoNLL data format.
Driver for the Croatian tagset of the Multext-EAST v4 project.
Driver for the Upper Sorbian tagset of the tagger created by Daniil Sorokin.
Driver for the Hungarian tagset of the CoNLL 2007 Shared Task (derived from the Szeged Treebank).
Driver for the Italian tagset of the CoNLL 2007 Shared Task (derived from the ISST, Italian Syntactic-Semantic Treebank).
Driver for the Japanese tagset of the CoNLL 2006 Shared Task (derived from the TüBa J/S Verbmobil treebank).
Driver for the IPADIC tagset.
Driver for the tagset of the Latin Dependency Treebank in CoNLL format.
Driver for the positional tagset of the Index Thomisticus Treebank.
Driver for the tagset of the Index Thomisticus Treebank in CoNLL format.
Driver for the tagset of the Maltese Language Software Services (TnT tagger).
Driver for the Google Universal Part-of-Speech Tagset.
Driver for the Universal Part-of-Speech Tagset, version 2014-10-01, part of Universal Dependencies.
Driver for the Universal Part-of-Speech Tagset + Universal Features, version 2014-10-01, part of Universal Dependencies.
Common code for drivers of tagsets of the Multext-EAST project.
Driver for the CGN/Lassy/Alpino Dutch tagset.
Driver for the Dutch tagset of the CoNLL 2006 Shared Task (derived from the Alpino tagset).
Driver for a Norwegian tagset.
Driver for the tagset of the Korpus Języka Polskiego IPI PAN for Polish.
Driver for the Portuguese tagset of the CINTIL corpus (Corpus Internacional do Português).
Driver for the Portuguese tagset of the CoNLL 2006 Shared Task (derived from the Bosque / Floresta sintá(c)tica treebank).
Driver for the EAGLES-based tagset for Portuguese in Freeling.
Driver for the Romanian tagset of the Multext-EAST v4 project.
Driver for the tagset of the Romanian Dependency Treebank (RDT).
Driver for Syntagrus (Russian Dependency Treebank) tags.
Driver for the tags of the Slovak National Corpus (Slovenský národný korpus).
Driver for the Slovene tagset of the CoNLL 2006 Shared Task (derived from the Slovene Dependency Treebank).
Driver for the Slovene tagset of the Multext-EAST v4 project.
Driver for the tagset of the Swedish treebank from the CoNLL 2006 Shared Task (Talbanken / Mamba).
Driver for the Mamba tagset of Swedish (Talbanken).
Driver for the Swedish PAROLE tagset.
Driver for the Swedish tagset of the Stockholm-Umeå Corpus.
Driver for the tagset of the (Prague) Tamil Dependency Treebank (TamilTB).
Driver for the Telugu tagset of the ICON 2009 and 2010 Shared Tasks, as used in the CoNLL data format.
Driver for the Turkish tagset of the CoNLL 2007 Shared Task (derived from the METU Sabanci Treebank).
Driver for the tagset of the Uyghur Dependency Treebank.
Driver for the tagset of the Hyderabad Urdu Treebank, as used in the CoNLL data format.
Driver for the Chinese tagset of the CoNLL 2006 & 2007 Shared Tasks (derived from the Academia Sinica Treebank).
A trie-like structure for DZ Interset features and their values.
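
The list above describes drivers that map physical tagsets to a shared feature set, and a converter that goes from one tagset to another via Interset. A minimal sketch of that pivot idea follows. All function and table names are hypothetical and do not reflect the distribution's Perl API. The two toy drivers cover only three Penn Treebank tags and two Universal POS tags.

```python
# Toy "driver" tables: a physical tag is decoded into a feature structure,
# and a feature structure is encoded into a tag of the target tagset.
PENN_DECODE = {
    "NN": {"pos": "noun", "number": "sing"},
    "NNS": {"pos": "noun", "number": "plur"},
    "JJ": {"pos": "adj"},
}
UPOS_ENCODE = {"noun": "NOUN", "adj": "ADJ"}


def decode_penn(tag):
    """Decode a Penn Treebank tag into a feature dict (copy, so callers may edit it)."""
    return dict(PENN_DECODE[tag])


def encode_upos(fs):
    """Encode a feature dict as a Universal POS tag; only 'pos' is used here."""
    return UPOS_ENCODE[fs["pos"]]


def convert(tag, decode, encode):
    """Convert a tag between two tagsets using the feature dict as the pivot."""
    return encode(decode(tag))
```

With the pivot in the middle, each tagset needs only one decoder and one encoder rather than a dedicated mapping to every other tagset; for example, convert("NNS", decode_penn, encode_upos) yields "NOUN" in this sketch.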