MaltParser

English MaltParser models

The files engmalt.poly-1.7.mco and engmalt.linear-1.7.mco contain single malt configurations for parsing English text with MaltParser, version 1.6 or later. The two models differ in their classifiers: engmalt.poly-1.7.mco uses SVMs with a polynomial kernel, while engmalt.linear-1.7.mco uses linear SVMs. The linear model is much faster, the polynomial model requires less memory, and parsing accuracy is similar for the two. Both models have been trained on sections 2-21 of the Wall Street Journal section of the Penn Treebank, extended with about 4000 questions from the QuestionBank, with the trees converted to dependency trees using the Stanford Parser. The parsers expect input in CoNLL format, tagged with the Penn Treebank part-of-speech tagset, and they output Stanford typed dependencies (in the basic variety, where every dependency structure forms a tree).
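Because the basic Stanford dependencies form a tree, a parsed sentence can be sanity-checked after parsing. The following is a minimal sketch, not part of MaltParser. It assumes the output file uses the 10-column CoNLL-X layout, with HEAD in column 7 and DEPREL in column 8, and that sentences are separated by blank lines. The helper names `read_conll` and `is_tree` are hypothetical.

```python
# Hypothetical helpers for checking MaltParser output.
# Assumption: 10 tab-separated CoNLL-X columns, HEAD in column 7, DEPREL in column 8.
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[int, str, int, str]  # (id, form, head, deprel)


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of (id, form, head, deprel); blank lines end a sentence."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if sentence:  # last sentence may lack a trailing blank line
        yield sentence


def is_tree(sentence: List[Token]) -> bool:
    """True if exactly one token attaches to the artificial root 0
    and every token reaches 0 by following heads, without cycles."""
    heads = {tid: head for tid, _, head, _ in sentence}
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    for start in heads:
        seen = set()
        node = start
        while node != 0:
            if node in seen or node not in heads:  # cycle or head out of range
                return False
            seen.add(node)
            node = heads[node]
    return True
```

Typical use is `all(is_tree(s) for s in read_conll(open("outfile.conll")))`. The single-root check reflects the tree property described above; relax it if your own data allows several root attachments.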

Note: These models can be used for research purposes provided that you have a license for the Penn Treebank. If you want to use them for commercial applications, please contact the Linguistic Data Consortium to find out which conditions apply.

Running engmalt.poly and engmalt.linear

Download engmalt.poly-1.7.mco or engmalt.linear-1.7.mco into your working directory and execute the following command:

prompt> java -Xmx1024m -jar maltparser-1.7.jar -c engmalt.poly-1.7 -i infile.conll -o outfile.conll -m parse

or

prompt> java -Xmx1024m -jar maltparser-1.7.jar -c engmalt.linear-1.7 -i infile.conll -o outfile.conll -m parse

where infile.conll and outfile.conll should be replaced by the names of your input and output files.
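If you call the parser from a script, the same command line can be assembled programmatically. This is a minimal sketch that uses only the flags shown in the commands above (-Xmx, -jar, -c, -i, -o, -m parse). The helper names `malt_command` and `run_malt` are hypothetical, and the defaults for the jar name and heap size simply mirror this page.

```python
# Hypothetical wrapper around the MaltParser command line shown above.
import subprocess
from typing import List


def malt_command(model: str, infile: str, outfile: str,
                 jar: str = "maltparser-1.7.jar", heap: str = "1024m") -> List[str]:
    """Build the argument list for parsing infile with a pretrained model."""
    return [
        "java", f"-Xmx{heap}", "-jar", jar,
        "-c", model,     # configuration name, without the .mco extension
        "-i", infile,    # CoNLL input file
        "-o", outfile,   # CoNLL output file
        "-m", "parse",   # parsing mode
    ]


def run_malt(model: str, infile: str, outfile: str) -> None:
    """Run the parser; raises subprocess.CalledProcessError if java exits with an error."""
    subprocess.run(malt_command(model, infile, outfile), check=True)
```

For example, `run_malt("engmalt.linear-1.7", "infile.conll", "outfile.conll")` runs the second command above, assuming `java` is on your PATH and the jar and .mco file are in the working directory.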

For more information, see the MaltParser user guide.

Here is an example of an input sentence:

1    Pierre    _    NNP    NNP    _
2    Vinken    _    NNP    NNP    _
3    ,    _    ,    ,    _
4    61    _    CD    CD    _
5    years    _    NNS    NNS    _
6    old    _    JJ    JJ    _
7    ,    _    ,    ,    _
8    will    _    MD    MD    _
9    join    _    VB    VB    _
10    the    _    DT    DT    _
11    board    _    NN    NN    _
12    as    _    IN    IN    _
13    a    _    DT    DT    _
14    nonexecutive    _    JJ    JJ    _
15    director    _    NN    NN    _
16    Nov.    _    NNP    NNP    _
17    29    _    CD    CD    _
18    .    _    .    .    _

Note that the columns are tab-separated and that columns 4 and 5 should have the same value (the Penn Treebank part-of-speech tag).
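If your text is already tokenized and tagged with Penn Treebank tags, input in this six-column layout can be produced with a few lines of code. This is a minimal sketch; the function name `to_conll` is hypothetical, the column names (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS) follow the CoNLL-X convention, and it assumes each sentence is terminated by a blank line.

```python
# Hypothetical helper: write one tagged sentence in the input format shown above.
from typing import Iterable, Tuple


def to_conll(tagged: Iterable[Tuple[str, str]]) -> str:
    """Return one sentence as tab-separated rows (ID, FORM, LEMMA, CPOSTAG,
    POSTAG, FEATS), followed by a blank line. "_" fills the unused columns."""
    rows = []
    for i, (word, tag) in enumerate(tagged, start=1):
        # Columns 4 and 5 both carry the same Penn Treebank tag.
        rows.append("\t".join([str(i), word, "_", tag, tag, "_"]))
    return "\n".join(rows) + "\n\n"
```

Writing `to_conll(s)` for each sentence `s` to infile.conll yields a file in the layout of the example above.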