Beware of Outliers

As we digest the run-up to the 2016 presidential election, we can expect the candidates to present exaggerated claims to promote their agendas.  Often, these claims are abetted by less-than-objective press outlets.  That’s not supposed to be the press corps’ job, obviously, but it is what it is.  How do we discern fact from exaggeration?  One way is to be on the lookout for the use of outliers to promote falsehoods.  So what exactly is an outlier?  Merriam-Webster defines it as follows:

A statistical observation that is markedly different in value from the others of the sample.

The Wolfram MathWorld website adds:

Usually, the presence of an outlier indicates some sort of problem. This can be a case which does not fit the model under study, or an error in measurement.

The simplest case of an outlier is a single data point that strays greatly from an overall trend.  An example of this is the United States jobs report from September 1983.

Credit: Bureau of Labor Statistics

In September 1983, the Bureau of Labor Statistics announced a net gain of 1.1 million new jobs.  As you can tell from the graph above, it is the only month since 1980 that has gained 1 million jobs.  And why would we care about a jobs report from three decades ago?  It is often used to promote the stimulus of the Reagan tax cuts.  When you see an outlier such as this being used to support an argument, you should be wary.  As it turned out, there is a simpler explanation for this that has nothing to do, pro or con, with Reagan’s economic policy.  See the job loss immediately preceding September 1983?  In August 1983, there was a net loss of 308,000 jobs.  This was caused by the strike of 650,000 AT&T workers who returned to work the following month.

If you eliminate the statistical noise of the striking workers from both months, you have a gain of over 300,000 jobs in August 1983 and 400,000 jobs in September 1983.  Those are still impressive numbers, and there is no need to use an outlier to exaggerate them.  However, it has to be noted that it was the monetary policy of Fed Chair Paul Volcker, rather than the fiscal policy of the Reagan administration, that was the main driver of the economy then.  Volcker pushed the Fed Funds rate as high as 19% in 1981 to choke off inflation, which caused the recession.  When the Fed eased up on interest rates, the economy rebounded quickly, which is the normal response predicted by standard economic models.  So we really can’t credit Reagan for the recovery, or blame him for the 1981-82 recession, either.  It’s highly suspect to use an outlier to support an argument; it’s even more suspect to assume that a correlation implies causation.
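The strike adjustment described above is simple arithmetic, and it can be sketched in a few lines of Python.  This is a rough check rather than an official calculation: it assumes all 650,000 strikers dropped out of the August count and returned in the September count, and it treats the reported "1.1 million" as exactly 1,100,000.  The variable names are made up for illustration.

```python
# Figures quoted in the article (September is rounded to 1.1 million).
STRIKERS = 650_000
reported = {"1983-08": -308_000, "1983-09": 1_100_000}

# Assumption: every striker was off payrolls in August and back in September,
# so the strike subtracted 650,000 from August and added 650,000 to September.
adjusted = {
    "1983-08": reported["1983-08"] + STRIKERS,
    "1983-09": reported["1983-09"] - STRIKERS,
}

for month, jobs in adjusted.items():
    print(f"{month}: reported {reported[month]:+,}, strike-adjusted {jobs:+,}")
```

Under those assumptions, the adjusted figures come out to 342,000 for August and 450,000 for September, in line with the "over 300,000" and "400,000" figures above.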

To present a proper argument, your data has to fit a model consistently.  In this case, the argument is that tax cuts alone are the dominant driver of job creation in the economy.  That argument is clearly falsified by the data above, as the 1993 tax increases were followed by a sustained period of job creation in the mid-to-late 1990s.  And that is precisely why supporters of the “tax cuts equal job creation” argument have to rely on an outlier to make their case.  It’s a false argument that relies on the fact that, unless you are a trained economist, you are not likely to be aware of what occurred in a monthly jobs report over three decades ago.  Clearly, a more sophisticated model with multiple inputs is required to predict an economy’s ability to create jobs.

When dealing with an outlier, you have to explore whether it is a measurement error and, if not, whether it can be accounted for with existing models.  If it cannot, you’ll need to determine what type of modification is required to make your model explain it.  In science, the classic case is the orbit of Mercury.  Newton’s Laws do not accurately predict this orbit.  Mercury’s perihelion precesses at a rate 43 arc seconds per century greater than Newton’s Laws predict.  Precession of planetary orbits is caused by the gravitational influence of the other planets.  The orbital precession of the planets besides Mercury is correctly predicted by Newton’s Laws.  Explaining this outlier was a key problem for astronomers in the late 1800s.

At first, astronomers attempted to analyze this outlier within the confines of the Newtonian model.  The most prominent of these solutions was the proposal that a planet, whose orbit resided inside of Mercury’s, perturbed the orbit of Mercury in a manner that explained the extra precession.  This proposed planet was dubbed Vulcan, after the Roman god of fire.  Several attempts were made to observe this planet during solar eclipses and predicted transits of the Sun with no success.  In 1909, William W. Campbell of the Lick Observatory stated no such planet existed and declared the matter closed.  At the same time, Albert Einstein was working on a new model of gravity that would accurately predict the orbit of Mercury.

Vulcan’s Forge by Diego Velázquez, 1630. Apollo pays Vulcan a visit. Instead of having a real planet named after him, Vulcan settled for one of the most famous planets in science fiction.  Credit: Museo del Prado, Madrid.

The general theory of relativity describes the motion of matter in two situations that Newton’s Laws could not: when matter is located near a large gravity well such as the Sun, or when it is moving at a velocity close to the speed of light.  In all other cases, the solutions of Newton and Einstein match.  Einstein understood that if his new theory could predict the orbit of Mercury, it would pass a key test.  On November 18, 1915, Einstein presented his successful calculation of Mercury’s orbit to the Prussian Academy of Sciences.  This outlier was finally understood, and a new theory of gravity was required to do it.  Nearly 100 years later, another outlier was discovered that could have challenged Einstein’s theory.
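As a rough check on the 43 arc seconds per century quoted earlier, the standard textbook result from general relativity for the extra perihelion advance per orbit is 6πGM / (c²a(1 − e²)).  The sketch below evaluates it for Mercury.  The orbital constants are standard reference values that do not come from this post, so treat them as assumptions, and this is not a reconstruction of Einstein’s own 1915 calculation.

```python
import math

# Standard reference values (assumptions, not taken from the article).
GM_SUN = 1.32712440018e20     # m^3/s^2, gravitational parameter of the Sun
C = 299_792_458.0             # m/s, speed of light
SEMI_MAJOR_AXIS = 5.7909e10   # m, Mercury's orbit
ECCENTRICITY = 0.2056         # Mercury's orbital eccentricity
ORBITAL_PERIOD_DAYS = 87.969  # Mercury's orbital period
DAYS_PER_CENTURY = 36_525.0   # Julian century


def relativistic_precession_arcsec_per_century():
    """Extra perihelion advance from general relativity, in arc seconds per century."""
    # Advance per orbit, in radians: 6*pi*GM / (c^2 * a * (1 - e^2)).
    per_orbit_rad = (6 * math.pi * GM_SUN) / (
        C**2 * SEMI_MAJOR_AXIS * (1 - ECCENTRICITY**2)
    )
    orbits_per_century = DAYS_PER_CENTURY / ORBITAL_PERIOD_DAYS
    # Convert the accumulated angle from radians to arc seconds.
    return math.degrees(per_orbit_rad * orbits_per_century) * 3600


excess = relativistic_precession_arcsec_per_century()
print(f"{excess:.1f} arc seconds per century")
```

With these inputs the result lands at roughly 43 arc seconds per century, which is the gap Newton’s Laws left unexplained.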

Relativity sets a velocity limit in the universe at the speed of light.  A measurement of a particle traveling faster than this would, as the orbit of Mercury did to Newton, require a modification to Einstein’s work.  In 2011, a team of physicists announced they had recorded neutrinos with a velocity faster than the speed of light.  The OPERA (Oscillation Project with Emulsion-tRacking Apparatus) team could not find any evidence of a measurement error.  Understanding the ramifications of this conclusion, OPERA asked for outside help in verifying the result.  As it turned out, a loose fiber optic cable introduced a delay in the experiment’s timing.  This delay resulted in the measurement error.  Once the cable was repaired, OPERA measured the neutrinos at their proper velocity, in accordance with Einstein’s theory.

While the OPERA situation was concluding, another outlier was beginning to gain headlines: the increase in annual sea ice in Antarctica, which seemingly contradicts the claim by climate scientists that global temperatures are on the rise.  Is it possible to reconcile this observation within the confines of a model of global warming?  What has to be understood is that this measurement is an outlier that cannot be extrapolated globally.  It only pertains to sea ice surrounding the Antarctic continent.

Glaciers on the land mass of Antarctica continue to recede, along with glaciers in mountain ranges across the globe and in the Arctic.  Clearly something interesting is happening in Antarctica, but it is regional in nature and does not overturn current climate change models.  At least, none of the arguments I’ve seen using this phenomenon to rebut global warming models have provided an alternative model that also explains why glaciers are receding on a global scale.

Outliers are found in business as well.  Most notably, carelessly taking an outlier and incorporating it as a statistical average in a forecasting model is dangerous.  Let’s take a look at the history of housing prices.

Credit: St. Louis Federal Reserve.

In the period from 2004-06, housing prices climbed over 25% per year.  This was clearly a historic outlier, and yet many assumed this was the new normal and underwrote mortgages and derivative products as such.  An example of this would be balloon mortgages, where it was assumed the homeowner could refinance the large balloon payment at the end of the note with newly acquired equity in the property as a result of rapid appreciation.  Instead, the crash in property values left these homeowners owing more than the property was worth, causing high rates of default.  Often, the use of outliers for business purposes is justified with slogans such as “this is a new era” or “the new prosperity.”  It turns out to be just another bubble.  Slogans are never enough to justify using an outlier as an average in a model.  Never be swayed by any outside noise demanding you accept an outlier as the new normal.  Intimidation in the workplace played no small role in the real estate bubble, and if you are a business major, you’ll need to prepare yourself against such a scenario.
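To see why treating an outlier as the average is dangerous in a forecast, here is a minimal sketch.  The numbers are entirely hypothetical and chosen only for illustration; they are not actual housing data.  It compares a five-year projection built on the average of three bubble years with one built on the long-run median, which is far less sensitive to a few extreme years.

```python
from statistics import mean, median

# Hypothetical annual price changes in percent (illustrative only, NOT real data).
# The last three values stand in for a bubble period.
annual_change_pct = [3.0, 4.0, 2.5, 3.5, 4.5, 3.0, 25.0, 26.0, 27.0]

bubble_rate = mean(annual_change_pct[-3:])  # treats the outlier years as the new normal
long_run_rate = median(annual_change_pct)   # resistant to a handful of extreme years


def project(price, pct_per_year, years):
    """Compound a starting price forward at a fixed annual percentage rate."""
    return price * (1 + pct_per_year / 100) ** years


start_price = 200_000  # hypothetical starting price
for label, rate in [("bubble average", bubble_rate), ("long-run median", long_run_rate)]:
    value = project(start_price, rate, 5)
    print(f"{label}: {rate:.1f}% per year -> {value:,.0f} after 5 years")
```

A loan underwritten on the first projection only works if the outlier years repeat indefinitely; the second projection shows how much of that assumed equity may never exist.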

If you are a student and have an outlier in your data set, what should you do?  Ask your teachers, to start with.  Often outliers have a very simple explanation, such as the 1983 jobs report, that will not interfere with the overall data set.  Look at the long-range history of your data.  In the case of economic bubbles, you will note a recurring pattern: the “this time is different” syndrome, followed by the eventual discovery that this time was not different.  More often than not, an outlier can be explained as an anomaly within a current working model.  And if that is not the case, you’ll need to build a new model that explains the data in a manner that predicts the outlier, but also replicates the accurate predictions of the previous model.  It’s a tall order, but that is how science progresses.
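Before you can ask about an outlier, you have to spot it.  One common rule of thumb, which this post does not itself prescribe, flags any point lying more than 1.5 times the interquartile range outside the middle half of the data.  The sketch below applies it to a hypothetical series of monthly job changes, in thousands.  Only the -308 and 1,100 values come from the 1983 reports discussed above; the rest are made up for illustration.

```python
from statistics import quantiles

# Monthly job changes in thousands. Only -308 and 1100 come from the article;
# the other values are hypothetical filler representing ordinary months.
monthly_change = [250, 300, 280, 320, -308, 1100, 310, 290, 270, 330]


def flag_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


flagged = flag_outliers(monthly_change)
print("Flagged for a closer look:", flagged)
```

Flagging a point is not the same as discarding it.  The flag only tells you where to start asking questions, such as whether a strike, a measurement error, or a genuinely new phenomenon is behind it.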

*Image on top of post is record Antarctic sea ice from 2014.  This is an outlier as ice levels around the globe recede as temperatures warm.  Credit:  NASA’s Scientific Visualization Studio/Cindy Starr.