Why is the normality of residuals “barely important at all” for the purpose of estimating the regression line?

user1205901 05/17/2015. 2 answers, 3,700 views
regression residuals assumptions

Gelman and Hill (2006) write on p. 46 that:

The regression assumption that is generally least important is that the errors are normally distributed. In fact, for the purpose of estimating the regression line (as compared to predicting individual data points), the assumption of normality is barely important at all. Thus, in contrast to many regression textbooks, we do not recommend diagnostics of the normality of regression residuals.

Gelman and Hill don't seem to explain this point any further.

Are Gelman and Hill correct? If so, then:

  1. Why "barely important at all"? Why is it neither important nor completely irrelevant?

  2. Why is the normality of residuals important when predicting individual data points?

Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

2 Answers


Glen_b 08/14/2017.

For estimation, normality isn't exactly an assumption; the main consideration is efficiency. In many cases a good linear estimator will do fine, and in that case (by Gauss-Markov) the least-squares (LS) estimate is the best of those things-that-would-be-okay, in the sense of having the smallest variance among linear unbiased estimators. (If your tails are quite heavy, or very light, it may make sense to consider something else.)
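As a rough illustration of the estimation point, here is a minimal simulation sketch. The sample size, true coefficients, and the choice of centred exponential errors are all assumptions made for the example, not anything from Gelman and Hill. It refits a least-squares line many times with strongly skewed errors and looks at where the slope estimates land.

```python
import random
import statistics


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_slopes(n=50, reps=2000, intercept=1.0, slope=2.0, seed=0):
    """Refit the line on `reps` data sets whose errors are skewed, not normal."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]  # fixed design on [0, 1]
    estimates = []
    for _ in range(reps):
        # expovariate(1.0) has mean 1 and variance 1; subtracting 1 centres it.
        y = [intercept + slope * xi + rng.expovariate(1.0) - 1.0 for xi in x]
        estimates.append(ols_fit(x, y)[1])
    return estimates


slopes = simulate_slopes()
print("mean of slope estimates:", round(statistics.fmean(slopes), 3))
print("sd of slope estimates:  ", round(statistics.stdev(slopes), 3))
```

The average of the estimates should sit close to the true slope of 2 even though the errors are far from normal. What non-normality can change is whether some other estimator would be noticeably more efficient, which this sketch does not examine.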

In the case of tests and CIs, normality is assumed, but it's usually not all that critical (again, as long as the tails are not really heavy or light, or perhaps one of each). At least in not-very-small samples, the tests and typical CIs tend to have close to their nominal properties (not too far from the claimed significance level or coverage) and to perform well (reasonable power in typical situations, or CIs not too much wider than the alternatives). As you move further from the normal case, power can become more of an issue, and large samples won't generally improve relative efficiency. So where effect sizes are such that power is middling in a test with relatively good power, it may be very poor for the tests which assume normality.

This tendency of CIs and significance levels to stay close to their nominal properties comes from several factors operating together. One of them is the tendency of linear combinations of variables to have close to a normal distribution, as long as there are lots of values involved and none of them contributes a large fraction of the total variance.
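The claim about nominal coverage can be checked with a small simulation. The sketch below makes several assumptions for illustration: n = 100, centred exponential errors, and a hard-coded approximate t quantile (the standard library has no t distribution). It counts how often the usual 95% t-interval for the slope contains the true slope.

```python
import math
import random
import statistics

T_CRIT_98DF = 1.984  # approximate 97.5% quantile of Student's t with 98 df


def slope_interval(x, y, t_crit):
    """Return the usual (lower, upper) t-interval for the OLS slope."""
    n = len(x)
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)  # residual variance over Sxx
    return slope - t_crit * std_err, slope + t_crit * std_err


def coverage(n=100, reps=2000, true_slope=2.0, seed=1):
    """Fraction of simulated data sets whose interval contains the true slope."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]  # n = 100 matches the 98-df quantile
    hits = 0
    for _ in range(reps):
        # Skewed, mean-zero errors: exponential(1) shifted down by its mean.
        y = [1.0 + true_slope * xi + rng.expovariate(1.0) - 1.0 for xi in x]
        lower, upper = slope_interval(x, y, T_CRIT_98DF)
        hits += lower <= true_slope <= upper
    return hits / reps


observed_coverage = coverage()
print("observed coverage of nominal 95% interval:", observed_coverage)
```

With these settings the observed coverage should land near 0.95. Heavier-tailed errors or much smaller samples could behave differently, so this is one case rather than a general guarantee.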

In the case of a prediction interval based on the normal assumption, however, normality is relatively more critical, since the width of the interval depends strongly on the distribution of a single value. Even there, for the most common interval size (a 95% interval), the fact that many unimodal distributions have very close to 95% of their distribution within about 2 standard deviations of the mean tends to result in reasonable performance of a normal prediction interval even when the distribution isn't normal. [This doesn't carry over quite so well to much narrower or wider intervals -- say a 50% interval or a 99.9% interval -- though.]
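To make the "about 2 standard deviations" remark concrete, the following sketch computes, from closed-form CDFs, how much probability a few unimodal distributions place within the normal-theory multiple of the standard deviation around the mean. The choice of comparison distributions (exponential, Laplace, uniform) is an assumption for illustration, and the calculation ignores uncertainty in the estimated mean and standard deviation.

```python
import math
from statistics import NormalDist


def mass_normal(k):
    """P(|X - mean| <= k * sd) for a normal distribution."""
    return 2.0 * NormalDist().cdf(k) - 1.0


def mass_exponential(k):
    """Same probability for an exponential with rate 1 (mean 1, sd 1)."""
    lower = max(0.0, 1.0 - k)
    upper = 1.0 + k
    return math.exp(-lower) - math.exp(-upper)


def mass_laplace(k):
    """Same probability for a Laplace distribution (sd = scale * sqrt(2))."""
    return 1.0 - math.exp(-k * math.sqrt(2.0))


def mass_uniform(k):
    """Same probability for a uniform on [0, 1] (sd = 1 / sqrt(12))."""
    return min(1.0, 2.0 * k / math.sqrt(12.0))


DISTRIBUTIONS = {
    "normal": mass_normal,
    "exponential": mass_exponential,
    "laplace": mass_laplace,
    "uniform": mass_uniform,
}

table = {}
for level in (0.50, 0.95, 0.999):
    # Normal-theory multiplier for a central interval at this level.
    k = NormalDist().inv_cdf(0.5 + level / 2.0)
    table[level] = {name: f(k) for name, f in DISTRIBUTIONS.items()}
    row = ", ".join(f"{name}={p:.3f}" for name, p in table[level].items())
    print(f"nominal {level:.1%} (k={k:.3f}): {row}")
```

For these examples the actual mass at the 95% level stays within a few percentage points of nominal for the exponential and Laplace cases, while the 50% and 99.9% rows drift further from their nominal values.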


zbicyclist 05/17/2015.

2: When predicting individual data points, the interval around that prediction (strictly, a prediction interval) assumes that the residuals are normally distributed.

This isn't much different from the general assumption about confidence intervals: to be valid, we need to understand the distribution, and the most common assumption is normality. For example, a standard confidence interval around a mean works because the distribution of sample means approaches normality, so we can use a z or t distribution.
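The statement that sample means approach normality can be seen in a quick simulation. This sketch assumes exponential data and uses sample skewness as a crude stand-in for "distance from normal"; both choices are illustrative only.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean of cubed standardized deviations."""
    centre = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - centre) / spread) ** 3 for v in values])


def skewness_of_sample_means(n, reps=5000, seed=2):
    """Skewness of the means of `reps` samples, each of size n, from exp(1)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)


results = {n: skewness_of_sample_means(n) for n in (1, 5, 30, 100)}
for n, value in results.items():
    print(f"n={n:>3}: skewness of sample means = {value:.2f}")
```

The skewness shrinks toward zero as n grows, which is why an interval for a mean can tolerate non-normal data. A single predicted observation gets no such averaging, so its interval leans more heavily on the assumed error distribution.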


