The Biggest Dictionary in the World

By now, we are all familiar with the use of machine translation to produce versions of a text in different languages.

Did you know, though, that scientists are also working on producing a massive multilingual dictionary with minimal human intervention? And when I say massive, I mean over 9,000 language varieties – in other words, all the languages in the world. The project, named PanLex, started in 2004, and there is nothing comparable to it. Google Translate, for instance, covers a meager 103 languages.

So how does PanLex work? The math is complicated, but the logic isn’t.

Example:
Let’s say you want to translate the Swahili word ‘nyumba’ (house) into Kyrgyz (a Central Asian language with about 3 million speakers). You are unlikely to find a Swahili–Kyrgyz dictionary, and if you look up ‘nyumba’ in PanLex, you’ll find that even among its more than 1 billion direct (attested) translations, there isn’t one from this Swahili word into Kyrgyz. So you ask PanLex for indirect translations via other languages. PanLex returns translations of ‘nyumba’ in other languages that, in turn, have four different Kyrgyz equivalents. Of these, three (‘башкы уяча’, ‘үй барак’, and ‘байт’) are each linked to ‘nyumba’ by only one or two intermediate translations. But the fourth, ‘үй’, is linked to it by 45 different intermediate translations. You look them over and conclude that ‘үй’ is the most credible answer.
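For the curious, the logic of that lookup can be sketched in a few lines of Python. This is only a toy illustration, not PanLex's actual implementation or API: the function names (`build_graph`, `indirect_translations`) are hypothetical, and the handful of translation pairs below are invented for the example rather than taken from PanLex. The idea is simply to collect every target-language expression reachable through one intermediate translation, then rank the candidates by how many distinct intermediates support them.

```python
from collections import defaultdict

# Invented toy data, NOT real PanLex records: each pair is one attested
# (direct) translation between two (language, expression) entries.
DIRECT = [
    (("Swahili", "nyumba"), ("English", "house")),
    (("Swahili", "nyumba"), ("French", "maison")),
    (("Swahili", "nyumba"), ("Russian", "дом")),
    (("Swahili", "nyumba"), ("English", "hut")),
    (("English", "house"), ("Kyrgyz", "үй")),
    (("French", "maison"), ("Kyrgyz", "үй")),
    (("Russian", "дом"), ("Kyrgyz", "үй")),
    (("English", "hut"), ("Kyrgyz", "үй барак")),
]


def build_graph(pairs):
    """Treat every translation as an undirected edge; there is no hub language."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def indirect_translations(graph, source, target_lang):
    """Rank target-language candidates by their number of distinct intermediates."""
    support = defaultdict(set)
    for middle in graph.get(source, ()):
        if middle[0] == target_lang:
            continue  # that would be a direct translation, not an indirect one
        for candidate in graph.get(middle, ()):
            if candidate[0] == target_lang:
                support[candidate[1]].add(middle)
    # Most-supported candidate first; ties broken alphabetically.
    return sorted(
        ((text, len(middles)) for text, middles in support.items()),
        key=lambda item: (-item[1], item[0]),
    )


graph = build_graph(DIRECT)
ranked = indirect_translations(graph, ("Swahili", "nyumba"), "Kyrgyz")
print(ranked)
```

With this toy data, ‘үй’ comes out on top with three supporting intermediates against one for ‘үй барак’, a miniature version of the 45-versus-one-or-two picture described above. A real system would presumably also weigh the quality of each source, which this sketch ignores.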


One of the beauties of this system is that it works as a semantic web, with no central hub. Most multilingual projects (including machine translation) have to rely on a pivot language (often English) in order to harvest translations between two minority languages (say, Basque and Quechua). With the PanLex model, this is not necessary; translations are validated without having to go through the English “funnel.”


I like this example of technology working for the little (language) guy.


And there is more.

PanLex’s sponsor, The Long Now Foundation, has some other interesting projects in the works, like the 10,000 Year Clock and the Rosetta Project, all centered on making us take the really long-term view of the world. If you are local to the San Francisco Bay Area, you are in luck: you can explore their events at the Blog of the Long Now.

And if you enjoyed this article, please check out the other posts in the eBay MT Language Specialists series.