The files engmalt.poly-1.7.mco and engmalt.linear-1.7.mco contain single malt configurations for parsing English text with MaltParser, version 1.6 or later. The two models differ in that engmalt.poly-1.7.mco uses SVMs with a polynomial kernel for classification, while engmalt.linear-1.7.mco uses linear SVMs. The linear model parses much faster, the polynomial model requires less memory, and parsing accuracy is similar for the two. Both models were trained on sections 2-21 of the Wall Street Journal section of the Penn Treebank, extended with about 4000 questions from the QuestionBank and converted to dependency trees using the Stanford Parser. The parsers presuppose that the input is in CoNLL format and tagged with the Penn Treebank part-of-speech tagset, and they output Stanford typed dependencies (in the basic variety, where every dependency structure forms a tree).
Note: These models can be used for research purposes provided that you have a license for the Penn Treebank. If you want to use them for commercial applications, please contact the Linguistic Data Consortium to find out which conditions apply.
To parse with one of the models, run the corresponding command:
prompt> java -Xmx1024m -jar maltparser-1.7.jar -c engmalt.poly-1.7 -i infile.conll -o outfile.conll -m parse
prompt> java -Xmx1024m -jar maltparser-1.7.jar -c engmalt.linear-1.7 -i infile.conll -o outfile.conll -m parse
where infile.conll and outfile.conll should be replaced by the names of your input and output files.
For more information, see the MaltParser user guide.
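If you want to call the parser from a script, the command line shown above can be assembled programmatically. The following is only a minimal sketch: the function names build_parse_command and run_parser are hypothetical helpers, and the defaults simply repeat the jar name, heap size, and options from the commands above. It assumes that java is on your PATH and that the jar and the .mco file are in the current working directory.

```python
import subprocess


def build_parse_command(model, infile, outfile,
                        jar="maltparser-1.7.jar", heap="1024m"):
    """Return the MaltParser command line as a list of arguments.

    `model` is the configuration name without the .mco suffix,
    e.g. "engmalt.linear-1.7" (as in the commands shown above).
    """
    return [
        "java", "-Xmx" + heap,
        "-jar", jar,
        "-c", model,      # configuration (model) name
        "-i", infile,     # input file in CoNLL format
        "-o", outfile,    # output file
        "-m", "parse",    # processing mode
    ]


def run_parser(model, infile, outfile):
    """Hypothetical wrapper: run the parser and raise if it exits with an error."""
    subprocess.run(build_parse_command(model, infile, outfile), check=True)
```

For example, run_parser("engmalt.linear-1.7", "infile.conll", "outfile.conll") would start the same process as the second command above.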
Here is an example of an input sentence:
1	Pierre	_	NNP	NNP	_
2	Vinken	_	NNP	NNP	_
3	,	_	,	,	_
4	61	_	CD	CD	_
5	years	_	NNS	NNS	_
6	old	_	JJ	JJ	_
7	,	_	,	,	_
8	will	_	MD	MD	_
9	join	_	VB	VB	_
10	the	_	DT	DT	_
11	board	_	NN	NN	_
12	as	_	IN	IN	_
13	a	_	DT	DT	_
14	nonexecutive	_	JJ	JJ	_
15	director	_	NN	NN	_
16	Nov.	_	NNP	NNP	_
17	29	_	CD	CD	_
18	.	_	.	.	_
Note that the columns are tab-separated and that columns 4 and 5 should have the same value.
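If your text is already tokenized and tagged, an input file in this layout can be produced with a few lines of code. The sketch below is an illustration only: to_conll is a hypothetical helper, it takes (word, tag) pairs for one sentence, writes the tag into both column 4 and column 5, and fills the remaining columns with underscores as in the example above. It also assumes the usual CoNLL convention that sentences are separated by a blank line.

```python
def to_conll(tagged_tokens):
    """Format one sentence, given as (word, tag) pairs, as CoNLL input lines."""
    lines = []
    for index, (form, tag) in enumerate(tagged_tokens, start=1):
        # Columns: token number, word form, "_", tag, tag (repeated), "_"
        lines.append("\t".join([str(index), form, "_", tag, tag, "_"]))
    # A blank line terminates the sentence (assumed CoNLL convention).
    return "\n".join(lines) + "\n\n"


sentence = [("Pierre", "NNP"), ("Vinken", "NNP"), ("will", "MD"),
            ("join", "VB"), ("the", "DT"), ("board", "NN"), (".", ".")]

with open("infile.conll", "w", encoding="utf-8") as handle:
    handle.write(to_conll(sentence))
```

The tags must come from a tagger that uses the Penn Treebank tagset, since that is what the models expect.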