Abstract
Phrase-based decoding is conceptually simple and straightforward to implement, at the cost of drastically oversimplified reordering models. Syntactically aware models make it possible to capture linguistically relevant relationships in order to improve word order, but they can be more complex to implement and optimise. In this paper, we explore a new middle ground between phrase-based and syntactically informed statistical MT, in the form of a model that supplements conventional, non-hierarchical phrase-based techniques with linguistically informed reordering based on syntactic dependency trees. The key idea is to exploit linguistically informed hierarchical structures only for those dependencies that cannot be captured within a single flat phrase. For very local dependencies we leverage the success of conventional phrase-based approaches, which provide a sequence of target-language words appropriately ordered and ready-made with any agreement morphology. Working with dependency trees rather than constituency trees allows us to take advantage of the flexibility of phrase-based systems to treat non-constituent fragments as phrases. We do impose a requirement (that the fragment be a novel sort of "dependency constituent") on what can be translated as a phrase, but this is much weaker than the requirement that phrases be traditional linguistic constituents, which has often proven too restrictive in MT systems.
Original language | English (US) |
---|---|
Pages (from-to) | 123-140 |
Number of pages | 18 |
Journal | Machine Translation |
Volume | 24 |
Issue number | 2 |
DOIs | |
State | Published - Jun 2010 |
Externally published | Yes |
Bibliographical note
Funding Information: Acknowledgements. We thank Chris Dyer and Adam Lopez for many helpful comments and discussions, and Jeremy Kahn for the use of his EDPM evaluation software (which, in turn, uses the parser developed by Eugene Charniak and Mark Johnson). This work has been supported in part by Department of Defense contract RD-02-5700 and the GALE program of the Defense Advanced Research Projects Agency, Contract No. HR0011-06-02-001. Any opinions, findings, conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of the sponsors.
Keywords
- Phrase-based translation
- Reordering
- Statistical MT
- Syntax