Using old training data or test sets

2 minute read


In MT research, there is a long-standing tradition of using old data sets even when newer versions are available for the same language pair and domain.

Statistical significance testing

3 minute read


In recent blog posts I have described many potential issues with MT evaluation. Surely statistical significance testing should help mitigate some of those problems? That may seem reasonable, but the truth is: it can be laughably easy to arrive at results that are statistically significant according to the most popular test in MT research, bootstrap resampling.
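For readers who have not seen the test spelled out, here is a minimal sketch of paired bootstrap resampling. It makes simplifying assumptions: the function name `paired_bootstrap` and its parameters are hypothetical, and it compares summed sentence-level scores rather than recomputing a corpus-level metric such as BLEU on every resample, which is what MT evaluations typically do.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of resampled test sets on which system A beats B.

    scores_a and scores_b are per-sentence scores for the same test
    sentences, in the same order (an assumption of this sketch).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")

    rng = random.Random(seed)
    n = len(scores_a)
    wins_a = 0

    for _ in range(n_resamples):
        # Draw n sentence indices with replacement to form a resampled test set.
        idx = [rng.randrange(n) for _ in range(n)]
        total_a = sum(scores_a[i] for i in idx)
        total_b = sum(scores_b[i] for i in idx)
        # Count a win only if A is strictly better on this resample.
        if total_a > total_b:
            wins_a += 1

    return wins_a / n_resamples
```

A win fraction of 0.95 or higher is commonly reported as significant at p < 0.05. Note that in this sketch, a system that is better by a tiny but consistent margin on every sentence wins every resample, however small the margin.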

Simulating low-resource experiments

4 minute read


Low-resource machine translation has been an active area of research for years. At a high level, what many papers on low-resource MT have in common is that they simulate low-resource scenarios.

Comparing to previous work

1 minute read


What many of the questionable practices I blog about have in common is that they are seemingly legitimized by saying