Posts by Tags

best practices

Using old training data or test sets

2 minute read

In MT research there is a long-standing tradition of using old data sets when newer versions are available for the same language pair and domain.

Statistical significance testing

3 minute read

In recent blog posts I have described many potential issues with MT evaluation. Surely statistical significance testing should help mitigate some of those problems? That may seem reasonable, but the truth is: it can be laughably easy to arrive at results that are statistically significant according to the most popular test in MT research, bootstrap resampling.

Simulating low-resource experiments

4 minute read

Low-resource machine translation has been an active area of research for years. At a high level, what many papers on low-resource MT have in common is that they simulate low-resource scenarios.

Comparing to previous work

1 minute read

What many of the questionable practices I blog about have in common is that they are seemingly legitimized by saying

bleu

Using old training data or test sets

2 minute read

In MT research there is a long-standing tradition of using old data sets when newer versions are available for the same language pair and domain.

evaluation

Using old training data or test sets

2 minute read

In MT research there is a long-standing tradition of using old data sets when newer versions are available for the same language pair and domain.

Statistical significance testing

3 minute read

In recent blog posts I have described many potential issues with MT evaluation. Surely statistical significance testing should help mitigate some of those problems? That may seem reasonable, but the truth is: it can be laughably easy to arrive at results that are statistically significant according to the most popular test in MT research, bootstrap resampling.

Simulating low-resource experiments

4 minute read

Low-resource machine translation has been an active area of research for years. At a high level, what many papers on low-resource MT have in common is that they simulate low-resource scenarios.

Comparing to previous work

1 minute read

What many of the questionable practices I blog about have in common is that they are seemingly legitimized by saying

human evaluation

low-resource

Simulating low-resource experiments

4 minute read

Low-resource machine translation has been an active area of research for years. At a high level, what many papers on low-resource MT have in common is that they simulate low-resource scenarios.

machine translation

Using old training data or test sets

2 minute read

In MT research there is a long-standing tradition of using old data sets when newer versions are available for the same language pair and domain.

Statistical significance testing

3 minute read

In recent blog posts I have described many potential issues with MT evaluation. Surely statistical significance testing should help mitigate some of those problems? That may seem reasonable, but the truth is: it can be laughably easy to arrive at results that are statistically significant according to the most popular test in MT research, bootstrap resampling.

Simulating low-resource experiments

4 minute read

Low-resource machine translation has been an active area of research for years. At a high level, what many papers on low-resource MT have in common is that they simulate low-resource scenarios.

Comparing to previous work

1 minute read

What many of the questionable practices I blog about have in common is that they are seemingly legitimized by saying

parity

previous work

Comparing to previous work

1 minute read

What many of the questionable practices I blog about have in common is that they are seemingly legitimized by saying

randomness

recommendations

reimplementation

Comparing to previous work

1 minute read

What many of the questionable practices I blog about have in common is that they are seemingly legitimized by saying

significance testing

Statistical significance testing

3 minute read

In recent blog posts I have described many potential issues with MT evaluation. Surely statistical significance testing should help mitigate some of those problems? That may seem reasonable, but the truth is: it can be laughably easy to arrive at results that are statistically significant according to the most popular test in MT research, bootstrap resampling.

variance