Regression to the mean and its implications

A ubiquitous statistical phenomenon that affects our everyday lives, but which most of us are unaware of.

Deepak Dilipkumar
Towards Data Science


The LA Dodgers, Los Angeles’ resident baseball team, appeared on the cover of Sports Illustrated’s August 28, 2017 issue following a run of tremendous form. They had won 71% of their games and were on pace to tie the record for the most wins in a season. The cover came with the caption “Best. Team. Ever?”

The team then went on to lose 17 of their next 22 games, and would eventually lose in the World Series to the Houston Astros. This is just one example of the notorious Sports Illustrated cover jinx, an urban legend that apparently causes teams and athletes who appear on the cover to immediately experience a run of bad form.

The jinx goes all the way back to Sports Illustrated’s very first issue, featuring Eddie Mathews. His team was on a 9-game winning streak prior to his appearance on the cover, but they went on to lose their next game, and he soon picked up an injury that caused him to miss 7 more games. The jinx is not limited to baseball, however; it afflicts athletes across sports. Recent victims of the jinx include Serena Williams, Conor McGregor, Luis Suárez, Lindsey Vonn and Tom Brady.

A number of potential explanations have been offered for this. For instance, it could be a psychological effect: appearing on the cover causes the team or athlete to feel more pressure to perform, and could result in them losing focus. However, an interesting variation on the jinx appeared recently, in the November 2019 issue featuring the SF 49ers. When the story was written, the 49ers were on an 8–0 winning streak, and they went on to lose their next game. But that loss happened two days before the issue was even published, before any cover pressure could have kicked in. This indicates that there may be something more to the jinx.

In reality, the jinx is just the consequence of a very simple but far-reaching statistical phenomenon called regression to the mean. It occurs when a random variable is extreme on its first measurement but closer to the mean (hence “regressing”) on its second measurement, or vice versa. But what does this have to do with the world of sports publications?

Essentially, this says that a team or athlete’s performance at any point in time is a random variable: underlying skill is no doubt a very important factor, but performance also depends at least partly on some notion of noise or luck that we can’t predict. A team is only featured on the cover after the first measurement of this random variable (the team’s performances before the cover) has been extreme. On the second measurement, after the cover is published, the team is more likely to “revert” to the mean, and it looks like the cover appearance has jinxed them.

Galton’s original interpretation

Regression to the mean was first discovered by Sir Francis Galton, a renowned statistician, in the late 1800s. In 1886 he conducted a study investigating the relationship between the heights of children and their parents. He measured the heights of 928 adult children and the corresponding 205 parent couples, and after accounting for the height difference between men and women (multiplying every woman’s height by 1.08), he noticed something interesting. The heights of children tended to be less extreme (closer to the mean) than the average of their parents’ heights. So if the parents were unusually tall, the child was typically shorter than them (though still taller than average). If the parents were very short, the child tended to be taller than them (but still shorter than average). You can see this on the graph below, with the dotted line being less steep than the line of equality:

Source: Article by Stephen Senn in the Royal Statistical Society’s Significance magazine

It’s tempting to think of a genetic explanation for this, and this is in fact what Galton suggested. He said that a child’s genetic makeup consisted of a sort of exponential average of all their ancestors. Since distant ancestors are shared by more people, you end up moving towards the mean population height instead of having your height determined purely based on your parents’ genetics. But we now know that this isn’t true — people get their genetic makeup only from their parents. And there was another unusual effect in Galton’s experiment. Not only were children less extreme in height than parents, the reverse was also true! Children with extreme heights tended to have parents whose heights were closer to average, as we can see on the graph below. This makes genetic arguments harder to justify, and in fact the effect is purely statistical.
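We can reproduce this two-way effect in a toy simulation. The model below is an assumption for illustration (the numbers are made up, not Galton’s data): parent and child heights share a common component and each have their own independent noise, so they are partially but not perfectly correlated.

```python
import random

random.seed(4)

# Toy model (illustrative numbers, not Galton's data): parent and child
# heights share a common component, so they are only partially correlated.
MEAN, SD = 170, 7
pairs = []
for _ in range(10_000):
    shared = random.gauss(0, SD * 0.7)                   # shared component
    parent = MEAN + shared + random.gauss(0, SD * 0.7)   # plus independent noise
    child = MEAN + shared + random.gauss(0, SD * 0.7)
    pairs.append((parent, child))

# Direction 1: children of unusually tall parents are closer to the mean.
tall_parent_pairs = [(p, c) for p, c in pairs if p > MEAN + SD]
avg_tall_parent = sum(p for p, c in tall_parent_pairs) / len(tall_parent_pairs)
avg_their_child = sum(c for p, c in tall_parent_pairs) / len(tall_parent_pairs)

# Direction 2: parents of unusually tall children are ALSO closer to the mean.
tall_child_pairs = [(p, c) for p, c in pairs if c > MEAN + SD]
avg_tall_child = sum(c for p, c in tall_child_pairs) / len(tall_child_pairs)
avg_their_parent = sum(p for p, c in tall_child_pairs) / len(tall_child_pairs)

print(f"Tall parents {avg_tall_parent:.1f} -> their children {avg_their_child:.1f}")
print(f"Tall children {avg_tall_child:.1f} -> their parents {avg_their_parent:.1f}")
```

The regression is symmetric in both directions, even though nothing about a child’s height can causally influence their parents’ heights, which is exactly why a genetic explanation doesn’t work.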

Source: Article by Stephen Senn in the Royal Statistical Society’s Significance magazine

The intuition behind regression to the mean

Regression to the mean occurs when you compare any two variables that aren’t perfectly correlated. Here’s an example to understand what that means.

Let’s say we chose 100 people and had them each flip a coin 10 times. Basic probability suggests that across all 1000 flips, we would expect roughly half of them to be heads.

Now, let’s say we chose the 20 people who had the most heads in this first round, and moved them to a second round where they again had to flip a coin 10 times. Across these 200 flips, what would we expect? Would we expect these people (who are “good” at getting heads) to get more heads than tails? Not at all! Across the new set of 200 flips, we would expect roughly half of them to be heads again. The average of this group of outliers has now “regressed” back to the mean.
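This coin-flip setup is easy to simulate. Here’s a minimal sketch in plain Python (with a fixed seed so the run is reproducible) that picks the 20 “best” head-flippers from a first round and then remeasures them:

```python
import random

random.seed(0)

NUM_PEOPLE, FLIPS = 100, 10

# Round 1: everyone flips 10 coins; count heads per person.
round1 = [sum(random.random() < 0.5 for _ in range(FLIPS)) for _ in range(NUM_PEOPLE)]

# Select the 20 people with the most heads in round 1.
top20 = sorted(range(NUM_PEOPLE), key=lambda i: round1[i], reverse=True)[:20]
round1_avg = sum(round1[i] for i in top20) / 20

# Round 2: the same 20 people flip again. There is no underlying "skill",
# so their average falls back toward 5 heads out of 10.
round2 = [sum(random.random() < 0.5 for _ in range(FLIPS)) for _ in top20]
round2_avg = sum(round2) / 20

print(f"Top-20 average heads, round 1: {round1_avg:.1f}")  # well above 5
print(f"Top-20 average heads, round 2: {round2_avg:.1f}")  # close to 5
```

Because the flips are pure chance, the top group’s round-2 average lands near 5 no matter how extreme its round-1 average was.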

This was an extreme example, where the two variables being measured (number of heads in the first round and number of heads in the second round) were not correlated at all. Let’s consider the other extreme, where the two variables are perfectly correlated. Let’s say we measured the temperatures of 100 places around the world in Celsius. We then pick the top 20 hottest places, and then measure their temperature again in Fahrenheit. In this case what would we see? These places would continue to be hotter than the average Fahrenheit temperature, and there would be no regression to the mean! This case of perfect correlation between variables (given a temperature in Celsius, we know the exact temperature in Fahrenheit) is the only situation in which we would not see regression to the mean.
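The perfect-correlation case looks like this in code (the temperatures are made up for illustration). Since Fahrenheit is a deterministic, monotonic function of Celsius, the ranking of places is identical under both scales and nothing regresses:

```python
import random

random.seed(1)

# Hypothetical temperatures (Celsius) for 100 places.
celsius = [random.gauss(15, 10) for _ in range(100)]

# "Second measurement": the same quantity on a different scale.
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

# The 20 hottest places in Celsius...
top20 = sorted(range(100), key=lambda i: celsius[i], reverse=True)[:20]

# ...are exactly the 20 hottest in Fahrenheit: a perfectly correlated
# remeasurement shows no regression to the mean at all.
top20_f = sorted(range(100), key=lambda i: fahrenheit[i], reverse=True)[:20]
print(top20 == top20_f)  # True
```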

These two examples may seem very obvious, and it may not be clear what the connection is to the SI cover jinx or Galton’s heights. But that’s only because most real instances of regression to the mean involve variables that fall in neither of the two extremes we saw: they are somewhat, but not perfectly, correlated.

The heights of parents and the heights of their children are certainly not perfectly correlated. Clearly, a child’s height depends on factors apart from their parents’ height. So regression to the mean is guaranteed to occur. However, the heights are also not completely independent — due to the underlying genetics, there is likely to be some correlation. Hence the effect won’t be as extreme as the coin toss example, where the regression in fact goes all the way back to the population mean. So in this situation of partial correlation, there is partial regression to the mean.

Most cases we would see in real life are similar to this, where there’s an underlying fixed element that partially correlates the two variables, and some unknown effect or noise or luck that makes them less correlated. This is why regression to the mean is so hard to spot in real life examples.
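This skill-plus-luck structure is easy to simulate. In the sketch below (toy numbers, not real data), an observed test score is a stable “skill” plus transient luck. The top scorers on a first test regress on a second test, but only partway back toward the population mean, because their average skill genuinely is higher:

```python
import random

random.seed(5)

N = 10_000
# Toy model: observed score = stable skill + transient luck (equal variances).
skills = [random.gauss(100, 15) for _ in range(N)]
test1 = [s + random.gauss(0, 15) for s in skills]
test2 = [s + random.gauss(0, 15) for s in skills]

# Select the top 10% of scorers on the first test.
top = sorted(range(N), key=lambda i: test1[i], reverse=True)[:N // 10]

avg1 = sum(test1[i] for i in top) / len(top)
avg2 = sum(test2[i] for i in top) / len(top)

print(f"Top group, test 1: {avg1:.1f}")
print(f"Top group, test 2: {avg2:.1f}")  # lower, but still above 100
```

Unlike the coin flips, the second-test average stays above the population mean of 100: the partial correlation produces only partial regression.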

Examples in the real world

Regression to the mean isn’t just a weird statistical phenomenon that affects sportspeople. It has very real implications for real-world policy decisions. One example: prioritizing different ways to reduce traffic accidents. One method that UK law enforcement considered was to install speed cameras at intersections that had a high incidence of accidents in the recent past. After doing this, they noticed that the frequency of accidents went back down, presumably because people slowed down in the presence of the cameras. But was that the only reason? Since the intersections were chosen based on recent accidents, it’s likely that they had just been more extreme in terms of accidents than they usually were. So these intersections regressing back to the mean was at least part of the reason for the decrease in accidents. Separating the two effects is important when deciding whether to set up more speed cameras or try a different approach instead.

Another interesting example is the case of improvement scores in Massachusetts schools in 1999. Schools were set goals that aimed to improve student scores from 1999 to 2000. When they looked at the results the following year, they noticed that the worst performing schools from 1999 had met their improvement targets — but some of the best performing schools had failed! This was likely another instance of regression to the mean, with both the worst and best schools moving back towards the average, appearing to be improvements in the case of the worst schools and failures in the case of the best schools.

This actually raises a deep philosophical point about human nature and society, best summarized by Daniel Kahneman, author of “Thinking, Fast and Slow”:

“This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them.” — Dan Kahneman

Accounting for regression to the mean

Regression to the mean is hard to notice and even harder to quantify, but it’s clear that it cannot be ignored. The repercussions could be particularly dangerous in medical studies. It’s common for participants in medical trials to have to satisfy certain criteria — for instance, if you want to test the efficacy of a new drug in combating a particular illness, then the participants will probably be chosen from the pool of people who have already had the illness for a while. It’s possible that these people are currently “extreme” examples, and that they will on average regress to the mean and improve irrespective of their treatment.

Medical studies and other similar experiments commonly have control groups to account for this. The control group is given no treatment (usually in the form of a placebo), and the treatment group is actually given the new drug. Now as long as both of these groups are drawn from the same population (say, people who have had the illness for at least 2 years), regression to the mean effects will be present in both groups, and the difference between the groups will be due to the actual impact of the drug!
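Here’s a rough sketch of why the control group works, under an assumed toy model (none of the numbers come from a real trial): a patient’s measured severity is a stable baseline plus day-to-day noise, and patients enroll only if their first measurement is high.

```python
import random

random.seed(3)

# Toy model: measured severity = stable baseline + day-to-day noise.
def measure(baseline):
    return baseline + random.gauss(0, 10)

# Trial entry requires a high first measurement, so enrollees are "extreme".
population = [random.gauss(50, 10) for _ in range(20_000)]
enrolled = [b for b in population if measure(b) > 70]

random.shuffle(enrolled)
half = len(enrolled) // 2
control, treated = enrolled[:half], enrolled[half:2 * half]

TRUE_DRUG_EFFECT = -5  # assumed improvement from the hypothetical drug

# Second measurement: both groups regress toward their baselines, so BOTH
# improve on their entry scores; only the treated group also gets the drug.
control_after = [measure(b) for b in control]
treated_after = [measure(b) + TRUE_DRUG_EFFECT for b in treated]

control_mean = sum(control_after) / len(control_after)
treated_mean = sum(treated_after) / len(treated_after)

print(f"Control group, second measurement: {control_mean:.1f}")  # below 70
print(f"Treated group, second measurement: {treated_mean:.1f}")
print(f"Estimated drug effect: {treated_mean - control_mean:.1f}")
```

The control group improves even with no treatment at all (its second measurement falls well below the 70 entry threshold), but the difference between the two groups still recovers the drug’s true effect, because the regression cancels out.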

This actually raises an interesting question. At least part of the reason for the presence of a control group is to account for the placebo effect — the control group may improve without actually receiving a treatment as long as patients believe they are given one. Any improvements in the control group may actually be due to both placebo effect and regression to the mean. So to what extent is the placebo effect real? This paper talks about the importance of taking regression to the mean into account in such studies, and mentions that in certain cases, what we believe to be the placebo effect may actually just be regression to the mean. This is why some studies have 3 groups instead — group A with no treatment at all, group B with a placebo, and group C with the actual treatment. Any changes within group A are likely due to regression to the mean, the difference between group A and group B shows the placebo effect, and the difference between group B and group C is due to the drug itself!

Conclusion

Regression to the mean as a concept has been known to us for well over a hundred years, but because of its ubiquity and the subtle reasoning behind it, it’s easy to miss when it happens in real life. I talked about just a few examples, but there are plenty more, including the sophomore slump, Madden curse, plexiglas principle, manager of the month curse and many others. As we’ve seen, the consequences of ignoring it can be far-reaching. The best we can do is to remember that the effect exists, and train ourselves to spot it in the real world!
