Examples of some homograph domain names, which use internationalized characters to visually imitate popular brand names and sites. A recent study by Farsight Security discovered over 100,000 of these homograph domain names on the web.

Editor’s Note: Welcome to my weekly column, Virtual Case Notes, in which I interview industry experts for their take on the latest cybersecurity situation. Each week I will take a look at a new case from the evolving realm of digital crime and digital forensics. For previous editions, please type “Virtual Case Notes” into the search bar at the top of the site.

What is the difference between apple.com and аррӏе.com? If you’re using an updated computer browser with a standard font, you might not see any difference. You might expect both URLs to lead to the same place: Apple’s website. The first one does (you can copy and paste it into your address bar to see for yourself). But do the same for the second one, and you’ll be met with a surprise: a warning about a little-known but long-running phishing tactic that makes it easier for cybercriminals to trick unsuspecting users.

Homograph attacks are phishing schemes that exploit the ability to register internationalized domain names (IDNs) containing non-Latin characters that look the same as Latin ones (such as certain Cyrillic or Greek letters). With this technique, attackers can rebuild the already-registered domain name of a popular brand (such as Apple or Amazon) out of lookalike characters, convincing users that they are visiting the brand’s site when in reality they are being phished by a well-disguised imposter.

It is nearly impossible to tell, but the аррӏе.com domain uses only Cyrillic characters and no Latin ones (except for the domain extension .com). The only giveaway in this example is that the Cyrillic letter “ӏ” may appear slightly shorter than the lowercase Latin “l.” Some potential homographs might show no visible difference at all, especially if the phisher replaces only one or two characters with their homoglyph counterparts.
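What the eye cannot see, software can: every lookalike letter is a distinct Unicode code point. The minimal Python sketch below uses only the standard library’s `unicodedata` module to list the code point and official Unicode name of each character in the genuine and lookalike labels. The helper name `describe` is our own, chosen for illustration, and the lookalike is spelled with escape sequences so it cannot be silently swapped for Latin letters when copied.

```python
import unicodedata

# Lookalike label written with escapes; it renders as "аррӏе".
LOOKALIKE = "\u0430\u0440\u0440\u04cf\u0435"
GENUINE = "apple"


def describe(label: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for each character in label."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in label]


if __name__ == "__main__":
    for label in (GENUINE, LOOKALIKE):
        print(label)
        for ch, cp, name in describe(label):
            print(f"  {ch}  {cp}  {name}")
    # The two strings may render identically, but they are not equal.
    print(GENUINE == LOOKALIKE)
```

Run as a script, it reports every character of the genuine label as a LATIN SMALL LETTER and every character of the lookalike as a CYRILLIC SMALL LETTER (A, ER, ER, PALOCHKA, IE), and the equality check prints False.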

A screenshot of the page "аррӏе.com," set up by Johns Hopkins University student Xudong Zheng. At first glance, the typical user would likely not see any difference between the domain name in the address bar and the domain name of Apple's real website.

A study published Jan. 17 by Farsight Security revealed that this threat is going strong: the company observed over 100,000 homograph domains targeting a set of 125 popular domains, including Facebook, Yahoo, Microsoft and the cryptocurrency exchange Poloniex. Farsight identified some of these as likely phishing sites, which asked users to log in and displayed the same logos and designs as their genuine counterparts.

“It’s all a question of what character sets are allowed in domain names,” explained Tim Helming, director of product management at DomainTools. I spoke to Helming last year about the use of counterfeit domain names made to look like popular brand names through slight tweaks such as misspellings or new domain extensions like .bike or .ninja. Homographs are different: they are harder to spot, harder to regulate, and made possible specifically by IDNs, which allow non-Latin, non-ASCII domain names so that owners can represent their site or brand in their own language.

While IDNs have an obvious practical use, cybercriminals take advantage of them by using them for social engineering—tricking users into thinking they’re visiting a safe and familiar site when in reality they may be falling into a malicious trap. According to Helming, homograph attacks have been around for over a decade, and reliable prevention solutions have been elusive, if not non-existent.

“There have been some efforts to give brand holders the first shot at the domains that correspond to their brands,” Helming said. “But the problem is, first of all, even if we don’t include the homoglyphs, there are so many possible permutations of a name that you just can’t realistically register all of them. And then, secondly, the homoglyphs just compound that.”

Helming added that, with hundreds of thousands of domain names being registered every day, any system to prevent the creation of counterfeit domains would have to be automated, yet teaching an automated system to detect homographs of thousands of possible brands would be tricky. And even if such homographs could be detected, an automated system would have little way of distinguishing a good-faith registration from a nefarious one.
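To make Helming’s point concrete, here is a toy Python sketch of automated detection, loosely modeled on the “skeleton” comparison described in Unicode Technical Standard #39: replace each known homoglyph with its Latin lookalike, then check the result against a list of protected names. It is an illustration under heavy assumptions, not a production detector. The six-entry `CONFUSABLES` table and the `PROTECTED` brand list are hypothetical stand-ins for Unicode’s full confusables data and a real brand registry.

```python
import unicodedata
from typing import Optional

# HYPOTHETICAL, deliberately tiny homoglyph table (Cyrillic -> Latin).
# A real system would load Unicode's full confusables data instead.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u04cf": "l",  # CYRILLIC SMALL LETTER PALOCHKA
}

# HYPOTHETICAL list of brand labels to protect.
PROTECTED = {"apple", "amazon", "google", "facebook"}


def skeleton(label: str) -> str:
    """Lowercase and NFC-normalize a label, then swap known homoglyphs for Latin letters."""
    label = unicodedata.normalize("NFC", label.lower())
    return "".join(CONFUSABLES.get(ch, ch) for ch in label)


def flag_lookalike(domain: str) -> Optional[str]:
    """Return the protected brand that a domain's first label imitates, or None.

    A label already spelled exactly like the brand is not flagged:
    the check looks for imitations, not the brand's own name.
    """
    label = domain.split(".")[0]
    if label.lower() in PROTECTED:
        return None
    imitated = skeleton(label)
    return imitated if imitated in PROTECTED else None


if __name__ == "__main__":
    for name in ("\u0430\u0440\u0440\u04cf\u0435.com", "apple.com", "g\u043e\u043egle.com"):
        print(name, "->", flag_lookalike(name))
```

The Cyrillic аррӏе.com is flagged as imitating “apple,” while the genuine apple.com is not. Note what the sketch cannot do: it says only that a name looks like a brand, and offers no way to tell a protest site or other legitimate use from a phishing registration, which is exactly the gap Helming describes.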

“They would need a system that said ‘Okay, there’s something that’s Google-related here, now let’s figure out if the registered information that the person types in seems to correspond to Google,’” he said. “(But) there are some legitimate ways that somebody can use a brand name; if it’s sort of a form of protest or something like that for example (…) It’s actually an extraordinarily hard problem to prove that a registration is (legitimate).”

There are ways for companies to report when they’ve identified a site misusing their brand name, through a process called the Uniform Domain-Name Dispute-Resolution Policy (UDRP), but Helming describes the process as “lengthy and cumbersome.” And while one way to help the situation might be to “tighten up that process,” he says, the main protection against phishing will be in the hands of the end user.

While homographs are harder for the average person to detect than many other phishing attempts, Helming identified one simple way users can test any suspicious link that comes their way.

“We always encourage people, if they get an email that has even a slight chance of being a phish, or if it’s an ad that they saw somewhere that looks too good to be true (…) hovering your mouse over the link is a good way to see what’s going on. And for these homoglyphs, in most browsers, when you do that it will actually show that the domain you’re going to (is an IDN),” he explained.

A screenshot of a draft page I created on Forensic Magazine's website displaying the homograph "аррӏе.com" link. With the cursor hovered over the link, a preview at the bottom of the screen displays the underlying Punycode translation of the internationalized Unicode characters.

This is because link previews often do not display non-ASCII Unicode characters and instead convert them to Punycode, which starts with "xn--" followed by what may look like a string of random Latin characters. For example, аррӏе.com appears as xn--80ak6aa92e.com in Punycode format. Hover over this link to see for yourself: аррӏе.com.
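The Unicode-to-Punycode conversion can be reproduced with Python’s built-in `punycode` codec, which implements the RFC 3492 encoding. The sketch below is simplified: it encodes each non-ASCII label and adds the `xn--` prefix, but skips the full IDNA mapping and validation rules that registries and browsers apply. The helper names `to_ascii` and `to_unicode` are our own, chosen for illustration.

```python
def to_ascii(domain: str) -> str:
    """Punycode-encode each non-ASCII label and add the "xn--" prefix.

    Simplified: omits the IDNA mapping and validation steps browsers apply.
    """
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def to_unicode(domain: str) -> str:
    """Reverse to_ascii: decode every label that carries the "xn--" prefix."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


if __name__ == "__main__":
    lookalike = "\u0430\u0440\u0440\u04cf\u0435.com"  # renders as "аррӏе.com"
    print(to_ascii(lookalike))    # xn--80ak6aa92e.com
    print(to_ascii("apple.com"))  # apple.com
```

The genuine apple.com passes through unchanged because it is already plain ASCII, while the all-Cyrillic lookalike becomes xn--80ak6aa92e.com, the same string a browser shows when it refuses to render the Unicode form.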

“I do think that ultimately the best thing here for all of us is going to be on the protection-of-end-users side of things, because the prevention of the registrations is tricky,” Helming concluded.