Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Using old training data or test sets

2 minute read

In MT research there is a long-standing tradition of using old data sets when newer versions are available for the same language pair and domain.

Statistical significance testing

3 minute read

In recent blog posts I have described many potential issues with MT evaluation. Surely statistical significance testing should help mitigate some of those problems? That may seem reasonable, but the truth is: it can be laughably easy to arrive at results that are statistically significant according to the most popular test in MT research, bootstrap resampling.

Simulating low-resource experiments

4 minute read

Low-resource machine translation has been an active area of research for years. On a high level, what many papers on low-resource MT have in common is that they simulate low-resource scenarios.

Comparing to previous work

1 minute read

What is common between many of the questionable practices I blog about is that they are seemingly legitimized by saying

Portfolio

Publications

Talks

Teaching