In 2014, Sports Illustrated placed a bold prediction on its front cover. Below a picture of Houston Astros outfielder George Springer swinging his bat was the promise: "Your 2017 World Series Champs".

The Astros had finished the 2014 season as one of the worst teams in the league, and many dismissed the claim as clickbait.

Image: iStock/iLexx

Three years later the prediction proved true. Springer was named the 2017 World Series Most Valuable Player (MVP). The words above him on that prophetic cover gave a hint of how it happened: "Baseball's Great Experiment".

The Astros had added a new ingredient to the tried and tested recipe of great coaching and scouting: data analytics.

On the pitch, the jocks still win the games, earn the big bucks and get the fame. However, in the back office, the nerds are inheriting the earth. They develop algorithms that help determine everything from which players to target to ticket prices.

Player recruitment

Investment in analytics is a long-term plan rather than a quick fix, as the Astros' wait for a title proved. Baseball has long been the flagbearer for the sports analytics movement, with teams using the techniques many years before the Astros won their unlikely title.

Michael Lewis brought the practice to the wider public's attention with his 2003 book Moneyball: The Art of Winning an Unfair Game. In 2011 the book was adapted into an Oscar-nominated film starring Brad Pitt.

The story broadly charts how Oakland Athletics general manager Billy Beane shook up the player recruitment strategy of the then-underperforming team, by emphasising the importance of stats ahead of scouts.

Analytics in the AFL

His approach has since been adapted across the sporting world, from the English Premier League to the Australian Football League. There, Daniel Pelchen, a performance analyst at Collingwood FC, has used it to dramatically change how the club assesses the value of players.

Every year in the Australian Football League (AFL), teams take turns drafting new recruits to their clubs from a pool of unsigned players. Pelchen wanted to review the past picks to evaluate whether the recruiters had made the right choices.

In 2000, early draft picks went on to play more games than later picks, but the gap was smaller than their draft value would have suggested. By 2008 the gap had grown. Recruitment had become more efficient, and great players late in the draft had become harder to find. The main reason for the change was that the 2004 draft was the first in which recruiters had access to statistics from the under-18 level.

"When that happened, recruiters effectively doubled the efficiency at which they were picking the best player in the right order," said Pelchen at the Sports Analytics World Series in November.

He analysed the 873 players picked in the drafts from 2001 to 2010. This cut-off left each of them at least five years to mature and fulfil their potential.

Success was measured as a weighted combination of three things: games played (70 percent), performance awards (20 percent) and selection in the All-Australian team of the season's top players (10 percent). These were totalled to create a composite success score. He then assessed which individual characteristics contributed to this score.
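The sketch below makes the weighting concrete as a minimal composite score in Python. The 70/20/10 weights come from Pelchen's method as described above. The article does not say how each component was scaled, so dividing by the best figure in the cohort is an assumption. The `Career` records are invented for illustration.

```python
from dataclasses import dataclass

# Component weights reported for Pelchen's composite success score.
WEIGHTS = {"games": 0.7, "awards": 0.2, "all_australian": 0.1}


@dataclass
class Career:
    games: int           # senior games played
    awards: int          # performance awards won
    all_australian: int  # All-Australian selections


def _scale(value: int, maximum: int) -> float:
    # Assumption: each component is scaled to 0-1 by the cohort maximum.
    return value / maximum if maximum else 0.0


def composite_scores(cohort: list[Career]) -> list[float]:
    """Weighted sum of scaled components for every player in a draft cohort."""
    max_games = max(c.games for c in cohort)
    max_awards = max(c.awards for c in cohort)
    max_aa = max(c.all_australian for c in cohort)
    return [
        WEIGHTS["games"] * _scale(c.games, max_games)
        + WEIGHTS["awards"] * _scale(c.awards, max_awards)
        + WEIGHTS["all_australian"] * _scale(c.all_australian, max_aa)
        for c in cohort
    ]


# Hypothetical three-player cohort.
cohort = [Career(150, 4, 2), Career(60, 0, 0), Career(200, 8, 4)]
print(composite_scores(cohort))
```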

There were around 150 variables per player, including body composition, physiological attributes, training background and on-field performance. His team predicted whether players were over- or undervalued at draft level.

Each player's success score was then mapped against the pick at which he was selected, to understand the return on investment.
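One simple way to express that return on investment is to fit an expected success score for each draft slot. Each player's residual from that fit is then read as over- or undervaluation. This is a sketch under stated assumptions, not Collingwood's model: the log-of-pick curve, the least-squares fit and the cohort figures are all hypothetical choices for illustration.

```python
import math


def fit_log_pick(picks: list[int], scores: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of score = a + b * ln(pick) (assumed curve shape)."""
    xs = [math.log(p) for p in picks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def valuation_residuals(picks: list[int], scores: list[float]) -> list[float]:
    """Actual minus expected score: positive means the player beat his draft slot."""
    a, b = fit_log_pick(picks, scores)
    return [y - (a + b * math.log(p)) for p, y in zip(picks, scores)]


# Hypothetical cohort: the pick-16 player did better than his slot predicts.
picks = [1, 2, 4, 8, 16]
scores = [0.9, 0.8, 0.7, 0.6, 0.75]
for pick, r in zip(picks, valuation_residuals(picks, scores)):
    print(f"pick {pick:>2}: {r:+.3f}")
```

A positive residual marks a player who was undervalued at the draft. A negative one marks a pick that did not pay off.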

Dichotomania

The difference between success and failure is more complicated than a binary classification suggests, however. Pelchen's compatriot Darren O'Shaughnessy calls the obsession with such classifications "dichotomania". This mentality often leads to incorrect conclusions.

Take pass completion rate. This stat is a better measure of a player's appetite for risk than of their skill. Equally, a bad pass can still be part of a sequence of play that leads to a goal. The result doesn't determine the value of every action that led to it.
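A toy calculation shows why completion rate can track risk appetite rather than skill. It assumes, hypothetically, that two equally skilled players have the same success probabilities for safe and risky passes. They differ only in how often they attempt the risky ones.

```python
# Hypothetical completion probabilities for one skill level, by pass difficulty.
SUCCESS_PROB = {"safe": 0.95, "risky": 0.60}


def expected_completion_rate(pass_mix: dict[str, float]) -> float:
    """pass_mix maps pass difficulty to its share of attempts (shares sum to 1)."""
    return sum(share * SUCCESS_PROB[kind] for kind, share in pass_mix.items())


# Same skill, different appetite for risk.
cautious = expected_completion_rate({"safe": 0.9, "risky": 0.1})
adventurous = expected_completion_rate({"safe": 0.5, "risky": 0.5})
print(f"cautious {cautious:.1%}, adventurous {adventurous:.1%}")
```

Under these made-up probabilities, the cautious player completes about 91.5 percent of passes and the adventurous one about 77.5 percent. The model nonetheless gives them identical skill.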

"If you were part of a sequence of play that led to a goal, then in your analytics notebook, that piece of play is gold," says O'Shaughnessy, an analytics consultant at the AFL's Hawthorn FC. "It led to a score so it must have been good.

"In this world of dichotomies, the way that a lot of elite coaches and elite players think is the way we played in a win, must have been better than the way that we played in a loss. We know that's simply not true."

Coaches make more changes after a loss than after a win, even if the loss came against a great team after an outstanding performance. Luck often plays the decisive role in a result.

The paradox of skill

Physical processes have a predictable level of natural variation. A tennis shot is classified as positive if it wins the point, even if it was a poor decision and a mishit that only landed in the corner through luck. The result alone can be uninformative. Sport has unnatural payoffs, and not everything can be controlled.

In a contest between two elite competitors, luck is more likely than small differences in skill to be the decisive factor in the result.

Michael Mauboussin, the head of global financial strategies at Credit Suisse, calls this phenomenon the “paradox of skill”.
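The paradox can be illustrated with a simple model in which each result is the skill gap plus a random luck term. Treating luck as independent normal noise of equal size for both sides is an assumption made for this sketch. It is not Mauboussin's own formulation. Under this model, the stronger side's chance of winning slides towards 50 percent as the skill gap shrinks.

```python
import math


def win_probability(skill_gap: float, luck_sd: float = 1.0) -> float:
    """P(stronger side wins) when result = skill gap + luck difference.

    Assumes each side's luck is independent Gaussian noise with standard
    deviation luck_sd, so the luck difference has sd luck_sd * sqrt(2).
    """
    z = skill_gap / (luck_sd * math.sqrt(2))
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# As the skill gap between elite opponents narrows, results approach a coin toss.
for gap in (2.0, 1.0, 0.5, 0.1, 0.0):
    print(f"skill gap {gap:.1f}: stronger side wins {win_probability(gap):.1%}")
```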

"The difference in skill plus luck gives you that scoreboard result,” says O'Shaughnessy. “You can then turn that around a bit and measure some of the sources of luck and subtract them from the scoreboard, and get the residual as a difference in skill."

Counterfactual analysis reveals how much chance goes unreported in conventional statistics.
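O'Shaughnessy's idea of subtracting measured luck from the scoreboard can be sketched for one measurable source of luck: goal-kicking conversion. The `Shot` records and their `expected_points` values stand in for a hypothetical shot-quality model. Subtracting each side's conversion luck from the scoreboard margin leaves a residual, which is read as the difference in skill.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    expected_points: float  # hypothetical shot-quality model output
    actual_points: int      # 6 for a goal, 1 for a behind, 0 for no score


def conversion_luck(shots: list[Shot]) -> float:
    """Points scored above (or below) what the shot quality predicted."""
    return sum(s.actual_points - s.expected_points for s in shots)


def skill_margin(our_shots: list[Shot], their_shots: list[Shot]) -> float:
    """Scoreboard margin with goal-kicking luck subtracted."""
    scoreboard = (sum(s.actual_points for s in our_shots)
                  - sum(s.actual_points for s in their_shots))
    luck = conversion_luck(our_shots) - conversion_luck(their_shots)
    return scoreboard - luck


# Invented match data.
ours = [Shot(3.0, 6), Shot(3.0, 6), Shot(2.0, 1)]
theirs = [Shot(4.0, 1), Shot(3.5, 6)]
print(skill_margin(ours, theirs))  # prints 0.5
```

In this made-up match, a six-point win shrinks to a half-point edge once conversion luck is removed. Other sources of luck, such as umpiring decisions or the bounce of the ball, would need their own terms.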

O'Shaughnessy uses the example of centre bounces in Australian rules football. At a centre bounce, the umpire bounces the ball high into the air for two opposing players to contest possession.

Teams employ full-time ruck coaches and develop detailed strategies for centre bounces. However, data analysis showed that a coin toss could predict the winner of each one about as well. Knowing that all their time and money made almost no difference allows them to invest their resources elsewhere.
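A check like the one on centre bounces can be framed as an exact binomial test. It compares a team's win rate against the 50 percent expected from a coin toss. The article does not say which test was used. The function below is a standard exact test, and it assumes each bounce can be scored as a clean win or loss. The season figures are invented.

```python
from math import comb


def binomial_two_sided_p(wins: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of `wins` successes in `n` trials.

    Sums the probabilities of every outcome no more likely than the one observed.
    """
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[wins]
    # Small relative tolerance so symmetric outcomes are not dropped by rounding.
    return sum(q for q in probs if q <= observed * (1 + 1e-7))


# Hypothetical season: 412 centre bounces, 214 won.
print(f"p = {binomial_two_sided_p(214, 412):.2f}")
```

A large p-value means the record is consistent with a coin toss. It does not by itself prove that the ruck work has no effect.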

Injury risk analysis

The number of days lost to injury is increasing in every major sport, and so is the cost of injuries. In the English Premier League, injuries cost each club an average of $336 million (£250 million) in salary per season, says Stephen Smith, CEO of Kitman Labs. He adds that the number of playing days lost rose by 16 percent between 2000 and 2015.

His company developed an "athlete optimisation system" to help clubs reduce the risk of injury.

Their work with the Houston Dynamo in Major League Soccer (MLS) led to several improvements:

- a 76 percent decrease in player unavailability, from 476 to 115 days
- an 88 percent reduction in strains and sprains
- a 63 percent decrease in in-game injuries, from 11 to four

The Dynamo ended the season in fourth place, a year after finishing bottom of the league.

Kitman Labs analysed data on hamstring injuries to understand the risk factors. The findings revealed that a change in muscle activation patterns resulting from an increased range of motion could raise the risk of hamstring injury.

An increase in a player's average score on the sit-and-reach hamstring stretch was associated with a 3.79 times higher rate of hamstring injuries in games. There was, however, no such correlation for hamstring injuries sustained in training. This is perhaps because players do more high-speed running in games.
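The article does not say how the 3.79 figure was calculated. A common way to express such a result is an incidence rate ratio: the injuries per hour of exposure in one group divided by the rate in another. The sketch below computes one with a standard log-scale 95 percent confidence interval. The groups (players whose sit-and-reach score rose versus those whose did not) and all counts are hypothetical.

```python
import math


def rate_ratio(injuries_a: int, hours_a: float,
               injuries_b: int, hours_b: float) -> tuple[float, float, float]:
    """Incidence rate ratio of group A vs group B with a Wald 95% CI on the log scale."""
    rr = (injuries_a / hours_a) / (injuries_b / hours_b)
    se_log = math.sqrt(1 / injuries_a + 1 / injuries_b)
    lo = math.exp(math.log(rr) - 1.96 * se_log)
    hi = math.exp(math.log(rr) + 1.96 * se_log)
    return rr, lo, hi


# Invented figures: match-play hamstring injuries and exposure hours per group.
rr, lo, hi = rate_ratio(injuries_a=6, hours_a=400, injuries_b=4, hours_b=1000)
print(f"rate ratio {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With counts this small the interval is wide, so the point estimate on its own says little about precision.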

The probability of different injuries occurring changes as a season progresses, owing to fluctuations in physical fitness and appetite for risk.

For example, Manchester United would likely be more prepared to risk an injury to a bench warmer than to a star like Paul Pogba. Pogba, for his part, is less likely to push his body to its limits during preseason than in a cup final.

There is an enormous level of granularity in injury analysis, requiring massive computation that only machines can process, but Smith believes that important decisions should be left to humans due to their domain experience and knowledge.


If coaches don't understand the black box, they can't use it to make decisions. They need to know how the predictions and risk analysis were calculated if they are to trust them.

"We look at it as an augmented model," he explains. "The computers and machine learning should be utilised to filter through the information as quickly as you possibly can and get the insights back out, but the humans are absolutely there to take that information and then to turn that into decisions and to drive that forward."

Meanwhile in the boardroom

The exact effects of analytics on sporting performance are still debated, but their benefits for business operations are uncontentious.

They are also becoming more important as the football business changes and fans become more digital, social, and mobile.

The typical revenue model is made up of three streams: match day (25 percent), broadcasting (35 percent) and commercial (40 percent). Digital is a fourth revenue stream that also boosts the other three.

Clubs are increasingly becoming media companies, driven by data to produce "sportainment".

Social media analysis reveals that players can attract more fans than their clubs. Take PSG's recent star signing, Brazilian international Neymar. He has 180 million followers across the major social media platforms, compared to the 50 million of his club.

It's safe to assume that his commercial value influenced PSG's decision to buy him for a record-breaking transfer fee of €222 million (£200 million).

Clubs can now use data on their fans to segment them by factors including location, age and sex. They can also monitor store sales, hospitality revenue from bars and cafes, and season ticket holders. This shows where and when sales are being made in relation to marketing campaigns.

They could then decide the best location for their retail stores based on the demographics, location and behaviour of their fans. They could also boost ticket sales by applying predictive analytics to past results, balancing profit margins against purchases.
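The segmentation step might look something like the minimal sketch below. It totals hypothetical retail spend by region, age band and sex. It then picks the region with the most spending as a naive candidate for a store. The `Fan` fields, age bands and figures are all assumptions for illustration; a real club would combine far more data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fan:
    region: str
    age: int
    sex: str
    retail_spend: float  # hypothetical annual spend in club stores


def age_band(age: int) -> str:
    # Hypothetical age bands.
    if age < 25:
        return "under 25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"


def spend_by_segment(fans: list[Fan]) -> dict[tuple[str, str, str], float]:
    """Total retail spend per (region, age band, sex) segment."""
    totals: dict[tuple[str, str, str], float] = defaultdict(float)
    for f in fans:
        totals[(f.region, age_band(f.age), f.sex)] += f.retail_spend
    return dict(totals)


def best_store_region(fans: list[Fan]) -> str:
    """Region with the highest total fan spend: a naive proxy for a store site."""
    by_region: dict[str, float] = defaultdict(float)
    for f in fans:
        by_region[f.region] += f.retail_spend
    return max(by_region, key=by_region.get)


fans = [Fan("North", 30, "M", 120.0), Fan("North", 34, "F", 80.0),
        Fan("East", 70, "M", 40.0), Fan("North", 19, "F", 60.0)]
print(spend_by_segment(fans))
print(best_store_region(fans))  # prints North
```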

Mixed results

Back on the pitch, the potential of analytics is still up for debate. Dynamic sports driven by team interactions are harder to analyse than those in which individuals repeat identical situations. Intangible qualities such as mentality are also difficult to quantify. But when it works, the return on investment can be huge.

In 2012 Arsenal Football Club paid a reported £2 million for American sports analytics company StatDNA. The club hoped the acquisition would transform its fortunes through bargain signings. However, the mixed records of Lucas Pérez, Mohamed Elneny, Shkodran Mustafi and Gabriel have shown that analytics isn't a panacea.

Even analytics poster boys the Houston Astros had to break the sabermetric rules to win their title, by investing in a core of young players supported by senior teammates who engendered a positive environment, even if they didn't fit the analytics ideology.

Most clubs nonetheless recognise that data can create insights that humans can’t replicate. As Moneyball's protagonist Billy Beane once put it: "The idea that I trust my eyes more than the stats, I don't buy that because I've seen magicians pull rabbits out of hats and I just know that rabbit's not in there."