Gelman and Hill (2006) write on p. 46:

> The regression assumption that is generally least important is that the errors are normally distributed. In fact, for the purpose of estimating the regression line (as compared to predicting individual data points), the assumption of normality is barely important at all. Thus, in contrast to many regression textbooks, we do not recommend diagnostics of the normality of regression residuals.

Gelman and Hill don't seem to explain this point any further.

Are Gelman and Hill correct? If so, then:

1. Why "barely important at all"? Why is it neither important nor completely irrelevant?
2. Why is the normality of residuals important when predicting individual data points?

Gelman, A., & Hill, J. (2006). *Data analysis using regression and multilevel/hierarchical models*. Cambridge University Press.

Glen_b 08/14/2017.

For *estimation*, normality isn't exactly an assumption; the major consideration is efficiency. In many cases a good linear estimator will do fine, and in that case the least-squares (LS) estimate is the best of those things-that-would-be-okay: by the Gauss-Markov theorem (which needs uncorrelated errors with constant variance, but not normality) it has the smallest variance among linear unbiased estimators. (If your tails are quite heavy, or very light, it may make sense to consider something else.)
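To make the estimation point concrete, here is a minimal simulation sketch in Python (standard library only). It assumes a straight-line model y = 1 + 2x + e with a fixed, equally spaced design and n = 30, and three hypothetical error distributions: normal, a skewed centered exponential, and a heavy-tailed t with 2 degrees of freedom. Theil-Sen (the median of all pairwise slopes) is swapped in purely as one example of "something else"; the answer does not name a particular alternative. Under these assumptions the OLS slope should average close to 2 even with skewed errors, while under the heavy-tailed errors its variance should be clearly larger than Theil-Sen's.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def theil_sen_slope(x, y):
    """Median of all pairwise slopes: a robust, non-linear alternative to LS."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    return statistics.median(slopes)


# Zero-mean error generators (illustrative assumptions, not from the answer).
ERRORS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "skewed (exp - 1)": lambda rng: rng.expovariate(1.0) - 1.0,
    # t with 2 df: standard normal / sqrt(chi-square(2) / 2)
    "heavy (t, 2 df)": lambda rng: rng.gauss(0.0, 1.0)
    / (rng.gammavariate(1.0, 2.0) / 2.0) ** 0.5,
}


def simulate(draw, n=30, reps=1000, beta=2.0, seed=1):
    """Sampling distribution of both slope estimators for y = 1 + beta*x + e."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]  # fixed, equally spaced design on [0, 1]
    ols, ts = [], []
    for _ in range(reps):
        y = [1.0 + beta * xi + draw(rng) for xi in x]
        ols.append(ols_slope(x, y))
        ts.append(theil_sen_slope(x, y))
    return ols, ts


results = {name: simulate(draw) for name, draw in ERRORS.items()}
for name, (ols, ts) in results.items():
    print(f"{name:17s} OLS: mean {statistics.fmean(ols):.3f}, var {statistics.variance(ols):.3f}"
          f" | Theil-Sen: mean {statistics.fmean(ts):.3f}, var {statistics.variance(ts):.3f}")
```

The exact figures depend on the seed; the pattern (OLS centered on the true slope in all three cases, but much noisier than a robust fit when the tails are heavy) is what the sketch is meant to show.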

In the case of tests and CIs, normality is assumed, but it's usually not all that critical (again, as long as the tails are not really heavy or light, or perhaps one of each). At least in not-very-small samples, the tests and typical CIs tend to have close to their nominal properties (a significance level or coverage not too far from the claimed one) and to perform well (reasonable power in typical situations, or CIs not much wider than the alternatives). As you move further from the normal case, *power* can become more of an issue, and large samples won't generally improve the relative efficiency. So where effect sizes are such that power is middling for a test with relatively good power, power may be very poor for the tests that assume normality.
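One rough way to check the claim about nominal properties is to simulate the coverage of the usual 95% t-interval for a regression slope when the errors are not normal. The sketch below assumes n = 30, a fixed design on [0, 1], and three illustrative error distributions; the t quantile 2.048 for 28 degrees of freedom is hard-coded so that only the standard library is needed. Under these assumptions all three coverages should come out close to 0.95.

```python
import math
import random

N = 30
T_975_DF28 = 2.048  # 97.5% quantile of Student's t with N - 2 = 28 df


def slope_ci_covers(x, y, beta):
    """Does the usual normal-theory 95% CI for the OLS slope contain the true beta?"""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return abs(b1 - beta) <= T_975_DF28 * se_b1


def coverage(draw, reps=4000, beta=2.0, seed=2):
    """Fraction of simulated data sets whose slope CI covers the true beta."""
    rng = random.Random(seed)
    x = [i / (N - 1) for i in range(N)]  # fixed, equally spaced design on [0, 1]
    hits = sum(
        slope_ci_covers(x, [1.0 + beta * xi + draw(rng) for xi in x], beta)
        for _ in range(reps)
    )
    return hits / reps


# Zero-mean error distributions (hypothetical choices for illustration).
ERRORS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "skewed (exp - 1)": lambda rng: rng.expovariate(1.0) - 1.0,
    # t with 3 df: standard normal / sqrt(chi-square(3) / 3)
    "heavier tails (t, 3 df)": lambda rng: rng.gauss(0.0, 1.0)
    / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0),
}

cov = {name: coverage(draw) for name, draw in ERRORS.items()}
for name, c in cov.items():
    print(f"{name:24s} coverage of nominal 95% slope CI: {c:.3f}")
```

This only looks at coverage under the null of a correct CI; the power loss mentioned above would need a separate comparison against a more efficient procedure.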

This tendency for CIs and tests to have close to their nominal coverage and significance level comes from several factors operating together. One of them is the tendency of linear combinations of variables to have close to a normal distribution, as long as many values are involved and none of them contributes a large fraction of the total variance.
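The linear-combination argument can be quantified without simulation. The OLS slope is a weighted sum of the responses, so with independent errors its cumulants are the errors' cumulants multiplied by sums of powers of the weights. The sketch below assumes exponential errors (skewness 2, excess kurtosis 6) purely for illustration and reports the resulting skewness and excess kurtosis of the slope estimate (both are 0 for a normal distribution), along with the largest share of the variance contributed by any single observation. The designs are hypothetical.

```python
def ols_slope_weights(x):
    """OLS slope estimate = sum(w_i * y_i) with w_i = (x_i - xbar) / Sxx."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [(xi - xbar) / sxx for xi in x]


def lin_comb_shape(w, err_skew=2.0, err_exkurt=6.0):
    """Skewness and excess kurtosis of sum(w_i * e_i) for iid errors e_i.

    Uses additivity of cumulants: kappa_r(sum) = kappa_r(e) * sum(w_i ** r).
    Defaults are the skewness (2) and excess kurtosis (6) of an exponential error.
    """
    s2 = sum(wi ** 2 for wi in w)
    s3 = sum(wi ** 3 for wi in w)
    s4 = sum(wi ** 4 for wi in w)
    skew = err_skew * s3 / s2 ** 1.5
    exkurt = err_exkurt * s4 / s2 ** 2
    max_share = max(wi ** 2 for wi in w) / s2  # largest single share of the variance
    return skew, exkurt, max_share


DESIGNS = {
    "n=10, equally spaced": list(range(1, 11)),
    "n=30, equally spaced": list(range(1, 31)),
    "n=100, equally spaced": list(range(1, 101)),
    "n=30, one x far out": list(range(1, 30)) + [100],
}

shapes = {name: lin_comb_shape(ols_slope_weights(x)) for name, x in DESIGNS.items()}
for name, (skew, exkurt, share) in shapes.items():
    print(f"{name:22s} skew={skew:+.3f} excess kurtosis={exkurt:.3f} "
          f"max variance share={share:.3f}")
```

With equally spaced designs the non-normality of the slope estimate shrinks roughly like 1/n, while a single high-leverage point (one observation carrying most of the variance) keeps it far from normal even at n = 30.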

However, for a prediction interval based on the normal assumption, normality is relatively more critical, since the width of the interval depends strongly on the distribution of a *single* value. Even there, though, for the most common interval size (a 95% interval), many unimodal distributions have very close to 95% of their probability within about 2 standard deviations of the mean, which tends to give a normal prediction interval reasonable performance even when the distribution isn't normal. [This doesn't carry over quite so well to much narrower or wider intervals, say a 50% interval or a 99.9% interval, though.]
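The claim about 95% intervals can be checked exactly for a few unimodal distributions. The sketch below standardizes each distribution to mean 0 and standard deviation 1 and computes the true probability inside mean ± z·sd, where z is the normal quantile for the nominal level. It ignores the extra uncertainty from estimating the mean, the SD, and the regression line, so it describes the large-sample limit of a normal prediction interval; the choice of distributions is an assumption for illustration.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)
LOGISTIC_S = math.sqrt(3.0) / math.pi  # logistic scale that gives unit variance

# CDFs of zero-mean, unit-variance versions of a few unimodal distributions.
CDFS = {
    "normal": NormalDist().cdf,
    # Exponential(1) shifted left by its mean of 1
    "exponential (shifted)": lambda x: 0.0 if x <= -1.0 else 1.0 - math.exp(-(x + 1.0)),
    # Laplace with scale 1/sqrt(2)
    "Laplace": lambda x: (0.5 * math.exp(x * SQRT2) if x < 0
                          else 1.0 - 0.5 * math.exp(-x * SQRT2)),
    "logistic": lambda x: 1.0 / (1.0 + math.exp(-x / LOGISTIC_S)),
}


def normal_interval_coverage(cdf, level):
    """True coverage of mean +/- z*sd, where z is the normal quantile for `level`."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return cdf(z) - cdf(-z)


LEVELS = (0.50, 0.95, 0.999)
for name, cdf in CDFS.items():
    row = "  ".join(f"{lvl}: {normal_interval_coverage(cdf, lvl):.4f}" for lvl in LEVELS)
    print(f"{name:22s} {row}")
```

Under these assumptions the 95% column stays between about 0.94 and 0.95 for every distribution, while the 50% column drifts noticeably and the 99.9% column misses several times more often than the nominal 0.1%.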

zbicyclist 05/17/2015.

2: When predicting individual data points, the interval around that prediction (strictly a prediction interval rather than a confidence interval) assumes that the residuals are normally distributed.

This isn't much different from the general assumption behind confidence intervals: to be valid, we need to understand the distribution, and the most common assumption is normality. For example, a standard confidence interval around a mean works because the distribution of sample means approaches normality, so we can use a z or t distribution.
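The point about the mean can be illustrated with a short simulation: draw strongly right-skewed exponential samples (true mean 1), form the usual z-interval for the mean, and record how often it covers the true value as n grows. The sample sizes, the number of replications, and the use of the normal quantile rather than a t quantile are simplifying assumptions. Coverage should be noticeably below 95% at n = 5 and close to 95% by n = 200. A prediction interval does not improve with n in the same way, because it depends on the distribution of a single new value rather than on an average.

```python
import math
import random
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def mean_ci_coverage(n, reps=4000, seed=3):
    """Share of nominal 95% z-intervals xbar +/- Z975 * s / sqrt(n) that contain
    the true mean (1.0) when the data are exponential(1), i.e. right-skewed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((v - xbar) ** 2 for v in sample) / (n - 1))
        hits += abs(xbar - 1.0) <= Z975 * s / math.sqrt(n)
    return hits / reps


coverages = {n: mean_ci_coverage(n) for n in (5, 30, 200)}
for n, c in coverages.items():
    print(f"n = {n:3d}: coverage {c:.3f}")
```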
