Humans could soon learn the date on which they will die through the prophecies of AI, after scientists discovered that machine learning predicts the risk of premature death more accurately than the standard prediction models developed by humans.
The researchers used machine learning algorithms to analyse demographic, biometric, clinical and lifestyle data on more than half a million people aged 40 to 69 who were recruited to the UK Biobank study, to predict which of them would die prematurely from chronic diseases. The forecasts were then compared with the results of the 'Cox regression' model based on age and gender that is typically used to make these predictions.
The team's deep learning and random forest techniques proved to be 9.4 percent and 10.1 percent respectively more accurate than the Cox regression model at identifying the 14,500 participants who had died since entering the Biobank.
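The Cox baseline the team measured against combines a background hazard with multiplicative effects of risk factors. The sketch below illustrates that proportional-hazards idea only; the coefficients are hypothetical, not estimates from the Biobank data, where a real model would fit them by partial likelihood.

```python
import math

# Hypothetical coefficients, purely for illustration -- a real Cox model
# would estimate these from cohort data, not hard-code them.
BETA_AGE = 0.09   # assumed log-hazard increase per year of age
BETA_MALE = 0.4   # assumed log-hazard increase for male sex

def relative_hazard(age_years: float, is_male: bool) -> float:
    """Hazard relative to the baseline subject: h(t|x) / h0(t) = exp(beta . x)."""
    x = BETA_AGE * age_years + BETA_MALE * (1.0 if is_male else 0.0)
    return math.exp(x)

# Under these assumed coefficients, a 69-year-old man carries a higher
# relative hazard than a 40-year-old woman.
print(relative_hazard(69, True) > relative_hazard(40, False))
```

Because the covariates enter only through `exp(beta . x)`, the model ranks individuals the same way at every point in time, which is what makes it simple but also what the machine learning approaches relax.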
Dr Stephen Weng, the study's lead researcher, says the biggest difference in his system is that it takes a less guided approach than the traditional method, which takes account of prior clinical evidence on how different factors affect the risk of death.
"It's taking a step back from such a guided human input," Dr Weng, an assistant professor of epidemiology and data science at the University of Nottingham, tells Techworld.
"You're one step removed from the process, so when the algorithms are training, you don't really have control over how it makes those predictions, and what the models that the computers are developing to fit that data look like.
"Initially, it makes the wrong predictions. It's a trial and error process. But when the algorithm finds that the data doesn't fit, it says this is not a good model, and it strikes that model off its list of potential models. As it's learning, it's optimising, and getting better and better, and the more data you have, the better it becomes. That's why in the era of big data and these large population cohorts, these types of approaches really do shine."
The random forest algorithm broke the data on factors ranging from alcohol consumption to sunscreen usage into progressively smaller subsets to understand the influence each could have on a person's risk of death, while the deep learning technique processed them through successive layers before arriving at a decision.
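The subset-splitting step can be illustrated with the basic unit a random forest aggregates: a single decision-tree split. The sketch below, on made-up data rather than Biobank variables, picks the threshold that leaves the two resulting subsets as pure as possible by Gini impurity.

```python
# Synthetic (feature value, died prematurely?) pairs -- e.g. a waist-
# circumference-like score. These numbers are invented for illustration.
samples = [(21, 0), (25, 0), (30, 0), (34, 1), (38, 1), (42, 1)]

def gini(group):
    """Gini impurity of a subset: 0 means perfectly pure."""
    if not group:
        return 0.0
    p = sum(label for _, label in group) / len(group)
    return 2 * p * (1 - p)

def best_split(data):
    """Try every threshold; keep the one with the lowest weighted impurity."""
    best = None
    for threshold, _ in data:
        left = [s for s in data if s[0] < threshold]
        right = [s for s in data if s[0] >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

threshold, impurity = best_split(samples)
print(threshold, impurity)  # the split lands cleanly between the two outcome groups
```

A real forest repeats this recursively on each subset, over many trees built from random slices of the data, and averages their votes.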
"Traditionally, using a statistical approach, we know exactly how that model's being constructed, because we build that model and we build those relationships. With these approaches, we don't," says Dr Weng. "That's the main difference between an unguided versus a human-specified approach.”
The deep learning technique flagged the more surprising signals, as it was less driven by purely numerical inputs, but the impact of individual inputs was harder to understand because the researchers could not interpret them in isolation.
This meant deep learning identified more public health-related factors, such as air pollution exposure, that are normally masked by the effect of other variables, while the random forest model emphasised more numerical inputs such as waist circumference.
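One common way researchers probe which inputs a "black box" model relies on is a permutation-style test: scramble one feature and measure how much the accuracy drops. The sketch below is a toy illustration, not the study's method; the model and data are invented, and where a real implementation would shuffle the column randomly, here it is reversed so the result is deterministic.

```python
# Toy rows: (informative feature, irrelevant feature, label). The label
# tracks feature 0 exactly; feature 1 is deterministic pseudo-noise.
data = [(i / 20, (i * 7) % 20 / 20, int(i / 20 > 0.5)) for i in range(20)]

def model(f0, f1):
    return int(f0 > 0.5)  # toy "black box" that in fact only uses feature 0

def accuracy(rows):
    return sum(model(f0, f1) == y for f0, f1, y in rows) / len(rows)

def importance(rows, col):
    """Accuracy drop when one feature column is scrambled (reversed here)."""
    column = [r[col] for r in rows][::-1]
    scrambled = [(c if col == 0 else f0, c if col == 1 else f1, y)
                 for c, (f0, f1, y) in zip(column, rows)]
    return accuracy(rows) - accuracy(scrambled)

print(importance(data, 0))  # large drop: the model depends on feature 0
print(importance(data, 1))  # zero drop: feature 1 is ignored
```

Tests like this reveal which inputs matter overall, but they still cannot untangle interacting variables, which is the limitation the researchers describe.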
Dr Weng believes these systems could play a major role in addressing the issues that cause premature death, but that great care must be taken over how they're implemented and what they will be used to decide.
"They can be very useful tools for prevention; they also could be used for commercial gain as well," he says. "My approach is to be as transparent as possible about the development of these things, hence we publish the full material and the full coding structure."
He adds that before these models are used in any real-world applications, they would need to be trained on a dataset that is representative of the public, as the Biobank participants have some atypical characteristics that affect their health outcomes, such as a high level of education.
There is also a risk that such models could be used by insurers to discriminate against people based on their health risks, but Dr Weng believes they could also have a positive effect on people who want insurance.
"Traditionally insurers use models that are very clunky and not entirely sophisticated methods, and if we look at the traditional models that are being used, they tend to over-predict risk, as in they probably decline more people than they accept," he says. "This could be the converse - that actually more people can have access, because with a more precise measurement of assessing future risk, we might actually can get more people.
"When I looked at the regression models based on an age and gender model, one of the calibration models was way over-predicting, and most insurance-based models are based on life tables that are mostly driven by age and gender. So my thinking is it might actually increase the number of individuals, but that has to be tested.
"I think insurers are risk-averse. They tend to decline more people than they offer, because they use a conservative type of estimation and certainly the way it's going for risk estimation, risk algorithms and risk calculations is that the more precise we can get it, we can actually probably put more people on coverage."