Evaluating machine translation systems is not as straightforward as it seems at first glance.
In MT research there is a long-standing tradition of using old data sets even when newer versions are available for the same language pair and domain.
In recent blog posts I have described many potential issues with MT evaluation. Surely statistical significance testing should help mitigate some of those problems? That may seem reasonable, but the truth is: it can be laughably easy to arrive at results that are statistically significant according to the most popular test in MT research, bootstrap resampling.
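To make the test concrete, here is a minimal sketch of paired bootstrap resampling over sentence-level scores. This is an illustrative simplification, not the exact procedure from the posts: real MT implementations (e.g. in sacrebleu) pool per-sentence n-gram statistics and recompute corpus-level BLEU on each resample, rather than summing precomputed sentence scores as done here.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Paired bootstrap resampling over per-sentence scores.

    Returns the fraction of resamples in which system A's total score
    beats system B's. A fraction of, say, >= 0.95 is commonly read as
    "A is significantly better than B" -- which is exactly the reading
    that can be misleadingly easy to obtain.
    """
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample sentence indices with replacement; the same indices
        # are used for both systems (hence "paired").
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples
```

Note that nothing in this procedure protects against the evaluation problems described in the earlier posts: if the underlying scores are computed on a flawed test set or with an unsuitable metric, the test happily reports significance anyway.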
Consider a very simple example of a table reporting BLEU scores:
Low-resource machine translation has been an active area of research for years. On a high level, what many papers on low-resource MT have in common is that they simulate low-resource scenarios.
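The typical simulation works by subsampling a small training set from a much larger parallel corpus. A minimal sketch of that setup (the function name and interface are illustrative, not from any particular paper):

```python
import random

def subsample_parallel(src_lines, tgt_lines, k, seed=0):
    """Simulate a low-resource scenario by sampling k sentence pairs
    (without replacement) from a larger parallel corpus.

    The same indices are used on both sides so that source and target
    sentences stay aligned.
    """
    assert len(src_lines) == len(tgt_lines)
    rng = random.Random(seed)
    idx = rng.sample(range(len(src_lines)), k)
    return [src_lines[i] for i in idx], [tgt_lines[i] for i in idx]
```

Whether a corpus subsampled this way behaves like genuinely low-resource data (with its noisier alignments, narrower domains, and different scripts) is precisely the kind of question such simulations tend to leave open.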
BLEU scores are ubiquitous in MT research and they usually appear in tables that look like this:
What many of the questionable practices I blog about have in common is that they are seemingly legitimized by saying