Monday, November 23, 2009


Japanese blogger Chikirin writes that he would like to see automatic translations of everything on the internet to allow multinational communication, in an article that he kindly translated into English. He believes that this would facilitate world understanding and peace.
I'm more skeptical. First, I question the logistics. Sure, there's Babel Fish and Google Translate, but they're often tripped up by slang, idioms, and puns. Running this very page through services like that shows that they stumble on words like "Just" and "Kinda" (a slang spelling of "kind of," meaning "slightly"). They also can't do anything for graphics, because computers generally have comparatively poor visual recognition. (OCR can fail just because the page was tilted a mere 2 degrees.) You'd be shocked at how many pages use "navigation buttons" that consist of an image of a word, because the page designer liked it that way.
Second, communication doesn't necessarily make peace. How much worse would trolling become when nationalism is added to the mix? I still remember when the Beijing Olympics inspired nationalistic Chinese young people to post puff pieces about their favorite country and then recoil in horror when these got less-than-glowing reviews (or got outright trolled instead). How many discussions would bog down into "China sucks," "No, Japan sucks," "No, USA sucks," "No, Poland sucks," and so on until the heat death of the universe?
Third, Chikirin says that "only the important information is translated, what about the trivial?" The trivial information typically goes untranslated precisely because it is trivial. Good translation takes effort, and quite a bit of the internet isn't really worth anyone's time to translate. Human time is limited, and machine translations are at best stilted and, as I pointed out above, often just plain wrong.
It gets worse if you want to translate all the video, too, because speech recognition has a hidden problem: the computer is never quite sure what you're saying and is only making its most probable guess. Feeding those guesses into a translator compounds any possible misunderstandings.

