What's in a name? A heck of a lot, if you know what to look for, says Jack Hermensen, the CEO and co-founder of US-based Language Analysis Systems (LAS). A name can probably - though not always - tell you its owner's gender, and perhaps their culture too. In some cases, it can even identify the country of origin.
And even if you're not a law officer looking for known criminals or terrorists, there can still be difficulties from getting a name wrong. Who hasn't had a letter or email from someone with an unfamiliar foreign name and not known whether to address the reply to Mr or Mrs, for example, or which was their personal name and which their family name?
Variant spellings are common when names from languages that use other scripts are transliterated into Roman characters, Hermensen says. For example, the same Arabic name could be written quite differently in English by an Egyptian and a Saudi, while the same Korean name can be transliterated as Ryoo, Yoo, Ryo, Lyu or Lew, depending on its owner's preferences.
With a doctorate in computational linguistics and a minor in Chinese under his belt, Hermensen set up LAS in 1984 to help the US State Department manage its watch-lists of names. At that time, the authorities were still using a name-coding scheme called Soundex, which dates back to work done for the 1890 US census.
"Prior work on names had all been key-based, such as taking the first letter, removing the vowels, and then coding the remains," he says. Even within a single language this can easily give the same key for different names, which means poor precision and poor decisions. Worse, a simple typo can produce a greatly different key, allowing a suspect to slip by.
He adds that when you consider that the same name can be written differently in different languages or dialects, "keys can give outrageous numbers of false hits. The old Soundex scheme codes on consonants, not pronunciation or how the name is compounded. It also can't handle random errors or typos."
This makes it quite easy for people to confuse key-based automated name recognition systems, Hermensen says - an Arabic speaker just has to spell their name in a different dialect, say, and the key generated is quite different too. "There's a US bill demanding standardisation of Romanisation of names, but that can't work," he says.
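To make the weakness of key-based coding concrete, here is a minimal sketch of the commonly described American Soundex rules in Python. It is an illustration only: it is not LAS's software, and the function name soundex and the test names are chosen here for the example.

```python
# Simplified American Soundex, for illustration only (not LAS's software).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a four-character key: first letter plus up to three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not reset the previous code
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated consonant is coded again
    return (first + "".join(digits) + "000")[:4]


# Different names, same key: a false hit.
print(soundex("Robert"), soundex("Rupert"))   # R163 R163

# One Korean name, five spellings, three different keys: missed matches.
print([soundex(n) for n in ["Ryoo", "Yoo", "Ryo", "Lyu", "Lew"]])
# ['R000', 'Y000', 'R000', 'L000', 'L000']

# A one-letter typo in the first position changes the key entirely.
print(soundex("Smith"), soundex("Dmith"))     # S530 D530
```

Because the key always starts with the first letter, any variation or error there defeats the lookup, while the three trailing digits are coarse enough that unrelated names collide.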
He adds that it's important to remember that names are not just about crime and terrorism - there are commercial uses too. Security is a big focus, fighting money laundering and fraud. For example, the US Treasury's Office of Foreign Assets Control publishes a list of people and businesses that it bans institutions from dealing with, and not spotting a variant or mistyped name could be expensive, to say the least.
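As a rough illustration of screening a customer name against such a list while tolerating typos, the sketch below scores string similarity with Python's standard difflib module. This is a generic stand-in, not the method LAS uses; the function names, the 0.8 threshold and the names on the list are all invented for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def screen(name, watch_list, threshold=0.8):
    """Return (entry, score) pairs at or above the threshold, best first.

    The threshold is an assumption: lower values catch more variants
    but also produce more false hits.
    """
    hits = [(entry, round(similarity(name, entry), 2)) for entry in watch_list]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical list entries, not real data.
watch_list = ["John Smith", "Maria Lopez"]
print(screen("Jon Smith", watch_list))    # the mistyped name still matches "John Smith"
print(screen("Peter Brown", watch_list))  # no entry is close enough
```

Plain character similarity copes with typos but knows nothing about transliteration or name structure, which is the gap the article says culture-aware tools are meant to fill.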
Name your customers correctly
Name recognition also has uses in CRM and marketing, where it can help people understand other cultures, address customers correctly and identify prospects, Hermensen says. He adds that it's used by airlines too - some people make multiple airline bookings under slight variants of their names that humans would accept as being the same but which a computer might not, and then they only take up one of these. If the airline can recognise these multiple bookings and do 'flight firming' to cancel the surplus, as Hermensen says Continental does, it can get a significant return on its investment.
Until three years ago, LAS was purely a US government consultancy, with a software toolkit for various name recognition tasks, but it then decided to go commercial to get better funding and provide more support, he adds: "It was a case of taking our toolkit and turning it into products - we had seen the target environments and knew what those products would need to look like.
"We built an encyclopaedia explaining how names work in each culture; there were also tips on how to search. But Border Control said they'd no time for that - they just needed a fast decision. So we built a parser which identifies or suggests culture, gender, source countries, and so on, and also gives a string of possible variants.
"The name classifier is the hardest part, for instance telling Hispanic from Portuguese, or German from Dutch or Scandinavian. The hardest thing for a computer to say is 'I don't know', but we have that now in version 2 of our software."
Thus far, the range of names covered by LAS is a bit limited, but as Hermensen says, the company has focused on the cultures that its customers - mostly in the US government - have asked for. For example, the list includes Asian and Arabic languages, plus Russian and now West African too, starting with Yoruban.
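The "I don't know" answer mentioned above can be sketched as a simple abstention rule: accept the top-scoring culture only if it is both strong enough and clearly ahead of the runner-up. The sketch below is an assumption about how such a rule might look, not a description of LAS's classifier; the scores, thresholds and function name are hypothetical.

```python
def classify(scores: dict, min_score: float = 0.6, min_margin: float = 0.15) -> str:
    """Pick the best-scoring culture label, or "unknown" if the call is too close.

    scores maps a culture label to a confidence between 0 and 1; how those
    scores are produced is outside this sketch.
    """
    if not scores:
        return "unknown"
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best < min_score or best - runner_up < min_margin:
        return "unknown"  # abstain rather than guess
    return best_label


# Hypothetical scores for two names.
print(classify({"Hispanic": 0.48, "Portuguese": 0.45}))  # unknown
print(classify({"German": 0.85, "Dutch": 0.10}))         # German
```

Raising either threshold makes the classifier abstain more often, trading coverage for fewer wrong labels.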
Nom de guerre, nom de plume...
So what can you learn from a name, and isn't it easy to confuse the system by adopting a false one? More to the point, how much can we rely upon the name that someone gives us?
"There's an enormous amount you can learn about a person just from their name," Hermensen says. "Most cultures don't give away gender from the surname, but some do, for example. When you play around with this you learn about other cultures. It begins to break down stereotypes.
"It helps law officers make better searches, but it also helps them with their links to the community - using their names properly is one of the best ways to show people you understand their community."
Name order is just one example. In Korean, say, the family name comes first, which is why Kim Il Sung's son is named Kim Jong Il. The same is true in parts of central Europe such as Hungary, although this is changing now under western European influence.
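A minimal sketch of why name order matters to software: the same splitting rule gives the wrong family name unless it knows the convention in use. The culture labels, the FAMILY_FIRST set and the function name are assumptions made for this example, and real names are far less regular than this.

```python
# Assumption: only the two family-name-first conventions named in the text.
FAMILY_FIRST = {"korean", "hungarian"}


def split_name(full_name: str, culture: str) -> dict:
    """Split a full name into family and given parts using a culture hint."""
    parts = full_name.split()
    if len(parts) < 2:
        # A single token cannot be split; treat it as a given name.
        return {"family": "", "given": full_name.strip()}
    if culture.lower() in FAMILY_FIRST:
        family, given = parts[0], " ".join(parts[1:])
    else:
        family, given = parts[-1], " ".join(parts[:-1])
    return {"family": family, "given": given}


print(split_name("Kim Jong Il", "korean"))
# {'family': 'Kim', 'given': 'Jong Il'}
print(split_name("Kim Jong Il", "english"))
# {'family': 'Il', 'given': 'Kim Jong'}  (wrong: the convention was ignored)
```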
"The US tends to ask for your middle name, but other cultures may not have them," Hermensen adds. "For example, Hispanic names typically go forename - patronymic - matronymic." And he argues that false names are rarely the problem that people imagine they might be.
"People are very resistant to changing names," he says. "Intelligent people will take a false surname but tend to keep the first name, partly because you respond to hearing your name called out - and partly because everyone in the world understands how easy it is to circumvent border control anyway, except Americans."
A name alone is not enough
It might seem strange given his line of business, but he also warns against assigning too much value to a name - or, for that matter, to a biometric. His software will tell you what it can about a name's origins and meaning, and give a list of variants or alternative spellings, but it cannot prove anything.
"A name is not an identity, it's just a name," he explains. "We are saying here are other names you might want to consider - then you have to use corroborating information to assign identity.
"It's complementary to biometrics - data is organised by names, not biometrics. Biometrics aren't sophisticated enough to provide entry into a file, and more importantly, they're only good once you meet someone a second time.
"We also have data cleaning tools - they are critical. No-one says 'garbage in, garbage out' any more, but all databases still have a layer of sludge at the bottom. Our tools can pull out unrecognised names, say - we've seen a lot of names, so if we don't recognise them they're probably junk."