MaltParser is a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model. MaltParser is developed by Johan Hall, Jens Nilsson and Joakim Nivre at Växjö University and Uppsala University, Sweden.

MaltParser 1.0.0 and later releases constitute a complete reimplementation of MaltParser in Java and are distributed under an open source license. The earlier versions 0.1-0.4 were implemented in C. The Java implementation replaces the C implementation, and MaltParser 0.x is no longer supported or updated.

Inductive Dependency Parsing

MaltParser can be characterized as a data-driven parser-generator. While a traditional parser-generator constructs a parser given a grammar, a data-driven parser-generator constructs a parser given a treebank. MaltParser is an implementation of inductive dependency parsing, where the syntactic analysis of a sentence amounts to the derivation of a dependency structure, and where inductive machine learning is used to guide the parser at nondeterministic choice points (Nivre, 2006). The parsing methodology is based on three essential components:

  1. Deterministic parsing algorithms for building labeled dependency graphs (Kudo and Matsumoto, 2002; Yamada and Matsumoto, 2003; Nivre, 2003)
  2. History-based models for predicting the next parser action at nondeterministic choice points (Black et al., 1992; Magerman, 1995; Ratnaparkhi, 1997; Collins, 1999)
  3. Discriminative learning to map histories to parser actions (Kudo and Matsumoto, 2002; Yamada and Matsumoto, 2003; Nivre et al., 2004; Hall et al., 2006)
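
The interplay of the three components can be illustrated with a minimal sketch. It is not MaltParser's implementation: it assumes a simplified, unlabeled arc-standard transition system operating on the two topmost stack items, and all names (parse, make_oracle, guide) are hypothetical. The guide is the function consulted at each nondeterministic choice point. Here it is a static oracle that reads the gold-standard heads of a projective tree, which is how training instances (parser state, action) could be derived from a treebank; at parsing time a learned classifier would take its place.

```python
from typing import Callable, Dict, List, Tuple

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"

# A guide maps the current parser state (stack, buffer, heads so far) to an action.
Guide = Callable[[List[int], List[int], Dict[int, int]], str]


def parse(n_words: int, guide: Guide) -> Tuple[Dict[int, int], List[str]]:
    """Deterministic parsing loop: one pass, no backtracking.

    Tokens are numbered 1..n_words; 0 is an artificial root node.
    Returns the predicted head of every token and the action sequence.
    """
    stack: List[int] = [0]
    buffer: List[int] = list(range(1, n_words + 1))
    heads: Dict[int, int] = {}
    actions: List[str] = []
    while buffer or len(stack) > 1:
        action = guide(stack, buffer, heads)  # the nondeterministic choice point
        if action == SHIFT:
            if not buffer:
                raise ValueError("SHIFT with empty buffer (non-projective tree?)")
            stack.append(buffer.pop(0))
        elif action == LEFT_ARC:
            dependent = stack.pop(-2)     # second-topmost item gets the top as head
            heads[dependent] = stack[-1]
        elif action == RIGHT_ARC:
            dependent = stack.pop()       # topmost item gets the item below as head
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
        actions.append(action)
    return heads, actions


def make_oracle(gold: Dict[int, int]) -> Guide:
    """Build a guide that reproduces a projective gold tree (token -> head)."""
    def guide(stack: List[int], buffer: List[int], heads: Dict[int, int]) -> str:
        if len(stack) >= 2:
            below, top = stack[-2], stack[-1]
            if below != 0 and gold[below] == top:
                return LEFT_ARC
            # Attach the top only after all of its own dependents are attached.
            top_done = all(d in heads for d, h in gold.items() if h == top)
            if gold[top] == below and top_done:
                return RIGHT_ARC
        return SHIFT
    return guide


# "Economic news had little effect": heads given as token -> head, 0 = root.
gold_heads = {1: 2, 2: 3, 3: 0, 4: 5, 5: 3}
predicted, action_sequence = parse(5, make_oracle(gold_heads))
```

Each (state, action) pair visited by the oracle is a history-based training example; a discriminative learner trained on such pairs could then be passed to parse in place of the oracle.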

MaltParser 1.9.2

MaltParser implements nine deterministic parsing algorithms.

MaltParser allows users to define feature models of arbitrary complexity.
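
To give a rough idea of what a feature model is, the sketch below treats it as a list of templates, each naming a parser data structure, an offset into it, and a token attribute. This is a hypothetical, simplified analogue written for illustration; it does not reproduce MaltParser's own feature specification format, and the names (address, extract_features, the template tuples) are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Token = Dict[str, str]
# A template is (data structure, offset, token attribute).
FeatureTemplate = Tuple[str, int, str]


def address(stack: List[int], buffer: List[int],
            structure: str, offset: int) -> Optional[int]:
    """Resolve a template address to a token index, or None if out of range."""
    if structure == "stack":
        # Offset 0 is the top of the stack.
        return stack[-1 - offset] if offset < len(stack) else None
    if structure == "buffer":
        # Offset 0 is the next input token.
        return buffer[offset] if offset < len(buffer) else None
    raise ValueError(f"unknown data structure: {structure}")


def extract_features(tokens: List[Token], stack: List[int], buffer: List[int],
                     templates: List[FeatureTemplate]) -> List[str]:
    """Instantiate every template against the current parser state."""
    features = []
    for structure, offset, attribute in templates:
        index = address(stack, buffer, structure, offset)
        value = tokens[index][attribute] if index is not None else "<none>"
        features.append(f"{structure}[{offset}].{attribute}={value}")
    return features


# Index 0 is an artificial root token.
tokens = [
    {"form": "<root>", "postag": "<root>"},
    {"form": "Economic", "postag": "JJ"},
    {"form": "news", "postag": "NN"},
    {"form": "had", "postag": "VBD"},
]
templates = [("stack", 0, "postag"), ("buffer", 0, "form"), ("buffer", 5, "postag")]
features = extract_features(tokens, stack=[0, 1], buffer=[2, 3], templates=templates)
```

Under this view, making the feature model richer amounts to adding templates to the list, while the parsing algorithm and the learner stay unchanged.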

MaltParser currently includes two machine learning packages (thanks to Sofia Cassel for her work on LIBLINEAR).

MaltParser can also be turned into a phrase structure parser that recovers both continuous and discontinuous phrases with both phrase labels and grammatical functions (Hall and Nivre, 2008a; Hall and Nivre, 2008b).