Researchers have built a proof-of-concept security system that can accurately sift real websites from malicious ones by looking for only two simple giveaway ‘signals’ – content obfuscation and the use of bogus online certifications.

In a paper presented at last week’s RAID 2014 security conference, a team of engineers from security startup Lastline and the University of California, Santa Barbara (UCSB) set out to analyse 149,700 web pages, compiled to offer a representative sample of what the Internet might throw at a browser.

This sample was made up of 81,000 apparently legitimate sites drawn from Alexa, plus a further 68,000 URLs drawn from the Wepawet online analyser, only a small percentage of which were likely to be malicious.

Sifting the bogus, fraudulent and plain dangerous sites from the harmless and legitimate ones would normally be a difficult proposition, largely because criminal sites go to some effort to hide from automatic crawlers by using programming elements such as JavaScript that many crawlers ignore.

It turns out that JavaScript is a good way of hiding because it imposes latency on a crawler and often requires human interaction before its purpose and intent become clear, something no computer analysis system can easily ‘see’.

Equally, while human beings can spot this sort of subterfuge on real pages, they are easily fooled by a second technique, that of borrowing official seals and certification icons (for example ‘Verified by Visa’ or ‘VeriSign Trusted’), which help a site to look authentic when it is anything but.

The team discovered that the combination of these two techniques – deliberate content obfuscation and the use or over-use of seals – was a remarkably accurate predictor of a site’s malevolence.
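
To make the idea concrete, here is a minimal Python sketch of how a crawler might check a page for the two signals. It is an illustration only and not the researchers' system: the seal list, the script patterns and every name in it (KNOWN_SEALS, OBFUSCATION_PATTERNS, SealFinder, page_signals, looks_suspicious) are assumptions made up for this example, and a real detector would need far more robust checks.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: seal names mapped to the domain a genuine seal would link to.
KNOWN_SEALS = {
    "verisign": "verisign.com",
    "verified by visa": "visa.com",
}

# Assumption: a few script patterns that often accompany obfuscated content.
OBFUSCATION_PATTERNS = [
    re.compile(r"eval\s*\(\s*unescape\s*\("),
    re.compile(r"document\.write\s*\(\s*unescape\s*\("),
    re.compile(r"(?:%[0-9a-fA-F]{2}){20,}"),  # long run of percent-encoding
]


class SealFinder(HTMLParser):
    """Collects seal images that do not link back to the certifier."""

    def __init__(self):
        super().__init__()
        self.link_stack = []        # hrefs of the enclosing <a> tags
        self.suspicious_seals = []  # seal names that failed the link check

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.link_stack.append(attrs.get("href") or "")
        elif tag == "img":
            label = " ".join([attrs.get("alt") or "", attrs.get("src") or ""]).lower()
            for name, domain in KNOWN_SEALS.items():
                if name in label and not self._links_to(domain):
                    self.suspicious_seals.append(name)

    def handle_endtag(self, tag):
        if tag == "a" and self.link_stack:
            self.link_stack.pop()

    def _links_to(self, domain):
        # True only if the innermost enclosing link points at the certifier.
        if not self.link_stack:
            return False
        host = urlparse(self.link_stack[-1]).hostname or ""
        return host == domain or host.endswith("." + domain)


def page_signals(html):
    """Return which of the two signals the page shows."""
    finder = SealFinder()
    finder.feed(html)
    obfuscated = any(p.search(html) for p in OBFUSCATION_PATTERNS)
    return {"obfuscated": obfuscated, "fake_seals": finder.suspicious_seals}


def looks_suspicious(html, require_both=False):
    """Flag a page on either signal, or only on both if require_both is set."""
    signals = page_signals(html)
    hits = [signals["obfuscated"], bool(signals["fake_seals"])]
    return all(hits) if require_both else any(hits)
```

Passing a page's HTML to looks_suspicious returns True when either signal is present; setting require_both=True insists on the combination. Which threshold gives the fewest false positives is something only real data could settle.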

“Of the 149,700 pages studied, we found that benign pages rarely exhibit these behaviours. We also uncovered hundreds of malicious pages that traditional malware detectors would have missed, including 400 rogue pharmacy websites displaying fake seals like those above,” said Lastline’s researchers.

The method produced very few false positives (i.e. sites that had these characteristics but turned out to be legitimate). One type of site that often used this approach was the online pharmacy, posing as legitimate while acting as a portal to malware, data theft and counterfeit goods.

“Ultimately, we’ve determined that content obfuscation and the use of fake seals are both very strong signals for malicious intent.”

Although this sort of advance sounds arcane, the ability to quickly and automatically detect the large number of new malicious websites that appear each day is a cornerstone of the fight against malware. Putting the team’s methodology to work could turf the criminals out from under another damp stone.

Earlier this year, Lastline reported on the related struggle of antivirus software to spot brand new malware, with detection rates across an industry sample of suites at only 50 percent on day one. A year ago the firm opened an office in London's Tech City.