Data is generally thought to be the lifeblood of 21st-century business, but the so-called “information age” is also fraught with risk. Cybercrime continues to grow rapidly, and evidence of fraud, abuse and criminal activity often resides on a multitude of electronic systems.

Most companies still use rules-based queries and analytics tools to identify fraud. These rely on an individual asking questions of the data based on what is already known, so uncovering inconsistencies takes both time and luck.

Ernst & Young’s forensic analytics team aims to provide its clients with new ways to identify, predict and reduce fraud and improve business efficiencies. By analysing both structured and unstructured data, the company claims to be able to take a proactive approach to fraud detection.

“We've got a lot of experience working with different organisations, and working with many different kinds of data, so within the realms of our own experience we tend to very quickly be able to spot when the fraud's being committed,” said Rashmi Joshi, Director of Ernst & Young’s IT forensics team.

“It's a question of looking at the data distributions. Mathematical algorithms are very effective at picking out any deviations from certain parameters in the data, so we can spot the unknown frauds as well as the known frauds.”
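As a rough illustration of flagging deviations from a distribution, the sketch below scores each value by its distance from the median. It is not Ernst & Young's method, which the article does not describe; the modified z-score, the 3.5 threshold, the function name and the invoice amounts are all assumptions chosen for the example.

```python
from statistics import median

def flag_outliers(amounts, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread to measure deviations against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, x in enumerate(amounts)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical invoice amounts; the last one sits far outside the usual range.
invoices = [120.0, 135.5, 128.0, 119.9, 131.2, 125.0, 122.4, 980.0]
suspicious = flag_outliers(invoices)
```

Because the check needs no rule naming the fraud in advance, it can surface a transaction nobody thought to query, which is the sense in which "unknown" frauds become visible.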

Commercial tools and bespoke algorithms

Ernst & Young uses a variety of commercial software tools in order to identify fraud. These include data visualisation technology from Tableau, SAS business analytics software, IBM's SPSS predictive analytics and data mining software from Megaputer.

“We're quite vendor agnostic. We don't have preferred partnerships, we just want to make sure we're using the right tool to analyse the data,” said Joshi.

In cases where a number of complex data sets need to be correlated, the company also creates its own bespoke mathematical models and algorithms to pull out anomalies and predictors of fraud.

“It's one thing having lots and lots of data. It's another thing being able to recognise what data it is you actually need.”

Joshi said that analysing numerical transactional (structured) data can be very useful, but Ernst & Young also has a service known as Fraud Triangle Analytics, which enables clients to monitor their employees' emails through analysis of unstructured data.

The Fraud Triangle is based on the research of criminologist Donald Cressey, who said that three factors must be present for fraud to occur in an environment: rationale, incentive and opportunity. Together with the FBI, Ernst & Young has developed a library of 3,000 keywords associated with these factors.

If structured data indicates that fraud has been committed within an organisation and a certain group of employees is implicated, Ernst & Young is able to analyse the language used in their emails and uncover the sentiments involved. If keywords associated with all three factors are identified, those employees are likely to be responsible for the fraud.
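The keyword approach can be sketched in a few lines. The real library of 3,000 keywords is not reproduced in this article, so the phrases below are invented placeholders, and the function names and the "all three factors present" test are assumptions for illustration rather than the firm's actual scoring.

```python
import re

# Hypothetical phrases for illustration only; not the real keyword library.
KEYWORDS = {
    "rationale": {"deserve", "everyone does it", "not hurting anyone"},
    "incentive": {"deadline", "make the number", "bonus"},
    "opportunity": {"override", "off the books", "nobody will check"},
}

def fraud_triangle_hits(emails):
    """Count keyword hits per Fraud Triangle factor across a set of emails."""
    hits = {factor: 0 for factor in KEYWORDS}
    for text in emails:
        lowered = text.lower()
        for factor, phrases in KEYWORDS.items():
            for phrase in phrases:
                # Word boundaries avoid matching inside longer words.
                pattern = r"\b" + re.escape(phrase) + r"\b"
                hits[factor] += len(re.findall(pattern, lowered))
    return hits

def covers_all_factors(hits):
    """True when at least one keyword was found for every factor."""
    return all(count > 0 for count in hits.values())

# Invented sample emails from one hypothetical employee.
emails = [
    "We have to make the number this quarter or no bonus.",
    "I can override the approval, nobody will check.",
    "Honestly I deserve it.",
]
hits = fraud_triangle_hits(emails)
flagged = covers_all_factors(hits)
```

A hit on all three factors is only a prompt for closer review. Keyword counts say nothing about context, so an analyst would still need to read the messages.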

“We have a prototype that we developed on a subset of the Enron data, and very quickly through applying this we can see who the main culprits are through their fraud score,” said Joshi.

“So that's a good example of using structured data and unstructured data together. Through the combination of both you've got a very comprehensive way of tackling fraud.”

Social media can also be a valuable source of data. Ernst & Young uses social network analytics to examine suspects' social connections in order to establish their footprint in society. This can be very useful for social services, for example, to help them understand and identify benefit cheats.

The human factor

Joshi said that computers alone cannot do the job of analytics. Human intervention is also essential to provide interpretation, because otherwise there can be a tendency to extrapolate too far, and the findings become meaningless.

She gave the example of a recent piece of statistical analysis that Ernst & Young conducted in relation to a large comparison website, in order to predict the range of prices that customers would be willing to pay when presented with different pricing comparisons.

When performing the logistic regression analysis, Joshi's team noticed that the software was converging extremely quickly and, although all the green lights were on, it was clear that something was not right.

The team decided to investigate the underlying algorithms and discovered that there were problems with the convergence, which meant that it was not an optimum model, and therefore invalid.
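The article does not say what exactly was wrong with the convergence, so the following is only one plausible kind of check, on invented toy data. It fits a one-variable logistic regression by plain gradient ascent and compares a loose stopping tolerance with a strict one. At a genuine optimum the gradient of the log-likelihood should be close to zero; a fit that reports "converged" while the gradient is still sizeable has stopped early. All names, tolerances and data values here are assumptions.

```python
import math

def mean_log_likelihood(xs, ys, b0, b1):
    """Average Bernoulli log-likelihood of a logistic model with logit b0 + b1*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = b0 + b1 * x
        total += y * z - math.log1p(math.exp(z))
    return total / len(xs)

def gradient(xs, ys, b0, b1):
    """Gradient of the mean log-likelihood with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        g0 += y - p
        g1 += (y - p) * x
    n = len(xs)
    return g0 / n, g1 / n

def fit(xs, ys, tol, step=0.2, max_iter=20000):
    """Gradient ascent that stops once the log-likelihood gain drops below tol."""
    b0 = b1 = 0.0
    ll = mean_log_likelihood(xs, ys, b0, b1)
    for iteration in range(1, max_iter + 1):
        g0, g1 = gradient(xs, ys, b0, b1)
        b0, b1 = b0 + step * g0, b1 + step * g1
        new_ll = mean_log_likelihood(xs, ys, b0, b1)
        if new_ll - ll < tol:
            break
        ll = new_ll
    return b0, b1, iteration

def gradient_norm(xs, ys, b0, b1):
    """Length of the gradient vector; near zero at a true optimum."""
    return math.hypot(*gradient(xs, ys, b0, b1))

# Invented toy data: one predictor and a binary outcome that is not perfectly separable.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 1, 0, 1]

loose = fit(xs, ys, tol=1e-3)    # "converges" quickly
tight = fit(xs, ys, tol=1e-12)   # keeps going until the gain is negligible
loose_grad = gradient_norm(xs, ys, loose[0], loose[1])
tight_grad = gradient_norm(xs, ys, tight[0], tight[1])
```

Both runs end with a "converged" status, but only the strict one leaves a near-zero gradient. Comparing `loose_grad` with `tight_grad` is the kind of look under the bonnet that a green light alone does not provide.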

“If you've got people who haven't got a strong mathematical or statistical background, they just wouldn't have spotted that, and they're making massive decisions on the basis of models that are just not optimal,” said Joshi.

“Because we're deep statisticians and we've got such strong skills in the team we were able to do that, and that's where human intervention is something we cannot do without.”

Some organisations now employ data analysts in-house, but this can be very costly and the end results are not always tangible. Joshi believes that Ernst & Young's experience working with a variety of different organisations gives it a particularly rich perspective.

“Through looking at what's going on elsewhere, through looking at benchmarking, we can really provide insight. So they may be seeing a pattern that they've not seen before and we can say we've seen this before with another client or in another industry. That's the real value and that's where we tend to get a lot of our work.”

Beyond fraud detection

Ernst & Young's clients include NHS trusts, banks, insurance companies and government departments. However, Joshi said that there are opportunities for fraud in every organisation, and most of them now have the data to be able to use these kinds of services.

Big data analysis is not restricted to fraud detection either; it can also be used to deliver insight and improve processes in other industries. Joshi herself previously worked in the health sector, working with patients and clinicians to identify the factors that relate to mortality and re-admission.

“When you look at the person's age, you look at the person's gender, you look at the stage of their tumour, you look at the thickness of the tumour; based on some variables, what's the probability of their survival going forwards 10 years? And therefore you can adequately plan for patient care.”

C-level executives can also use big data analysis to make key strategic decisions based on their projected return on investment. Using quantitative modelling, they now have a way of prioritising their spend because they can see where they are likely to incur a loss in the future.

“At the boardroom level it really helps the organisation become more efficient but understand a lot more about what's going on in their organisations,” said Joshi.

Ultimately, however, the interpretation of the output is just as important as the way in which the data is analysed. A series of numbers will mean nothing to most people in the boardroom, so this data has to be translated in a way that is understandable from a business perspective.

“The keen analyst's eye should be able to fairly and impartially drive insights from those results, and by using technology to neatly categorise the data into tables or charts, business people should be able to look at it and immediately see the trends,” concluded Joshi.