Open Source Machine Translation Engines?

旧巷老猫 submitted on 2019-12-02 17:42:28

This question is better asked on the Moses mailing list (moses-support@mit.edu), I think. There are lots of people there working with different types of systems, so you'll get an objective answer. Apart from that, here's my input:

  • With respect to Java: it does not matter in which language the MT system is written. No offense, but you may safely assume that even if the code were written in a language you were familiar with, it would still be too difficult to understand without deeper knowledge of MT. So what you are looking for are interfaces. Moses's XML-RPC interface works fine.
  • With respect to MT systems: look for the best results and ignore the programming language they are written in. Results are here: matrix.statmt.org. The people using your MT system are interested in the output, not in your coding preferences.
  • With respect to the whole venture: once you start offering MT output, make sure you can adapt it quickly. MT is rapidly shifting towards a pipeline process in which an MT system is the core (and not the only) component. So focus on maintainability. In the ideal case, you would be able to connect any MT system to your framework.
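
To make the interface point concrete, here is a minimal sketch of a client talking to a Moses server over XML-RPC. It assumes a mosesserver instance listening on localhost:8080 with the default /RPC2 endpoint (the function name and URL are mine, adjust them for your setup):

```python
# Minimal sketch of querying a running Moses XML-RPC server.
# Assumes mosesserver is listening on localhost:8080 at the default
# /RPC2 endpoint; change the URL for your deployment.
import xmlrpc.client

def moses_translate(text, url="http://localhost:8080/RPC2"):
    """Send one source sentence to the Moses server and return its translation."""
    proxy = xmlrpc.client.ServerProxy(url)
    # The server takes a struct with a "text" field and answers with one too.
    result = proxy.translate({"text": text})
    return result["text"]
```

Because the transport is plain XML-RPC, the same three lines work from Java, PHP, or any other language with an XML-RPC library, which is exactly why the implementation language of the MT system does not matter.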

And here's some input on your feature requests:

  • Domain-specific training: you don't need that feature. You get the best MT results by training on customer-specific data.
  • Incremental training: see Stream Based Statistical Machine Translation
  • Parallelizing the translation process: you will have to implement this yourself. Note that most MT software is purely academic and will never reach a 1.0 milestone. It helps of course if a multi-threaded server is available (Moses), but even then, you will need lots of harnessing code.
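
As a rough illustration of that harnessing code, here is a sketch that fans sentences out to a thread pool and reassembles the results in input order. The backend call is a placeholder (not a real MT invocation); in practice it would be a request to a multi-threaded server such as the one Moses provides:

```python
# Sketch of harnessing code for parallel translation: fan sentences out
# to worker threads and collect the results in the original order.
from concurrent.futures import ThreadPoolExecutor

def translate_sentence(sentence):
    # Placeholder for a real backend call (e.g. an XML-RPC request to
    # a Moses server). Here we just uppercase the input.
    return sentence.upper()

def translate_batch(sentences, workers=4):
    # pool.map preserves input order even though individual sentences
    # finish whenever their worker completes them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(translate_sentence, sentences))
```

Threads are a reasonable fit here because a real backend call is I/O-bound (waiting on the server), so the GIL is not a bottleneck.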

Hope this helps. Feel free to PM me if you have any more questions.

A lot has moved forward since then, so I thought I'd give an update on this topic and leave the previous answer in place to document the progress.

Domain-specific training: domain adaptation techniques can be useful if your data comes from various sources and you need to optimise towards a sub-domain. In our experience, there is no single solution that consistently performs best, so you need to try out as many approaches as possible and compare the results. There is a mail on the Moses mailing list that lists possible methods: http://thread.gmane.org/gmane.comp.nlp.moses.user/9742/focus=9799. The following page also gives an overview of the current research: http://www.statmt.org/survey/Topic/DomainAdaptation
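
One widely used approach in that line of research is cross-entropy-difference (Moore-Lewis) data selection: score each general-domain sentence by how much more likely it looks under an in-domain language model than under a general one, and keep the best-scoring sentences for training. The toy sketch below uses add-one-smoothed unigram models purely to stay self-contained; real setups use proper n-gram or neural LMs, and the function names here are my own:

```python
# Toy Moore-Lewis data selection: rank general-domain sentences by
# H_in(s) - H_gen(s), the difference in per-word cross-entropy under an
# in-domain vs. a general-domain LM, and keep the lowest-scoring ones.
import math
from collections import Counter

def unigram_model(corpus):
    counts = Counter(w for sent in corpus for w in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    # Add-one smoothing so unseen words get non-zero probability.
    return lambda w: (counts[w] + 1) / (total + vocab)

def cross_entropy(sentence, prob):
    words = sentence.split()
    return -sum(math.log2(prob(w)) for w in words) / len(words)

def select(general_corpus, in_domain_corpus, keep):
    p_in = unigram_model(in_domain_corpus)
    p_gen = unigram_model(general_corpus)
    # Lower score = looks more like the in-domain data.
    scored = sorted(general_corpus,
                    key=lambda s: cross_entropy(s, p_in) - cross_entropy(s, p_gen))
    return scored[:keep]
```

This is only one of the methods compared in the survey above; as said, you should benchmark several adaptation techniques on your own data rather than commit to any single one.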

Incremental training: there was an interesting talk at IWSLT 2013: http://www.iwslt2013.org/downloads/Assessing_Quick_Update_Methods_of_Statistical_Translation_Models.pdf It demonstrated that current incremental methods (1) take your system offline, so there is no real "live update" of your models, and (2) are outperformed by full re-trainings. It seems the problem has not been solved yet.

Parallelizing the translation process: the Moses server lags behind the moses-cmd binary, so if you want to use the latest features, it is better to start from moses-cmd. Also, the community has not kept its promise of never releasing a 1.0 version :-). In fact, you can find the latest release (2.1) here: http://www.statmt.org/moses/?n=Moses.Releases
