Why does it say data should be normally distributed for analysis, when different tests follow their own distributions (e.g. t, Z, F)?

Kynda 05/16/2018. 2 answers, 294 views
normal-distribution t-test normality-assumption f-test z-test

Why does it say data should be normally distributed for statistical analysis when different tests follow their own distributions (e.g. t, Z, F)?

What does normality have to do with this?

2 Answers


Glen_b 05/16/2018.

Let's look at a specific example, a one sample t-test.

The t-statistic consists of a numerator and a denominator:

$$t = \frac{\bar{X}-\mu_0}{s_X/\sqrt{n}}$$

Both $\bar{X}$ and $s_X$ -- the sample mean and standard deviation -- are random quantities that depend on the (random) sample.

Because the random values in the sample we will be taking ($X_1,X_2,...,X_n$) are assumed to be independent and identically distributed as $N(\mu_0,\sigma^2)$ for some unknown $\sigma^2$, their mean $\bar{X}$ is in turn normally distributed with mean $\mu_0$ and variance $\sigma^2/n$ (these statements can be proved under the assumptions; for the mean and variance see https://en.wikipedia.org/wiki/Expected_value#Basic_properties and https://en.wikipedia.org/wiki/Variance#Basic_properties).
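The claim about $\bar{X}$ can be checked numerically. The sketch below is a minimal simulation using only the Python standard library; the values $\mu_0 = 10$, $\sigma = 2$, $n = 8$, the seed and the number of replications are arbitrary illustrative choices, not anything from the answer.

```python
import random
import statistics

def simulate_sample_means(mu0, sigma, n, reps, seed=1):
    """Return the means of `reps` independent samples of size n from N(mu0, sigma^2)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu0, sigma) for _ in range(n))
        for _ in range(reps)
    ]

# Illustrative (hypothetical) parameter values.
mu0, sigma, n = 10.0, 2.0, 8
means = simulate_sample_means(mu0, sigma, n, reps=20000)
print(statistics.fmean(means))      # should be close to mu0 = 10
print(statistics.pvariance(means))  # should be close to sigma**2 / n = 0.5
```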

So the numerator of our t-statistic is distributed as $N(0,\sigma^2/n)$. If we knew $\sigma$, we could divide the numerator by $\sigma/\sqrt{n}$ and get a test statistic with a standard normal distribution (a Z-test):

$$Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$$
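As a rough check of the known-$\sigma$ case, here is a small simulation sketch (standard-library Python; the parameter values and seed are hypothetical). If $Z$ really is standard normal, about 5% of the simulated $|Z|$ values should exceed 1.96, the 97.5% point of $N(0,1)$.

```python
import math
import random

def z_statistic(sample, mu0, sigma):
    """One-sample Z statistic; sigma is the population SD, assumed known."""
    n = len(sample)
    xbar = sum(sample) / n
    return (xbar - mu0) / (sigma / math.sqrt(n))

rng = random.Random(2)
mu0, sigma, n, reps = 10.0, 2.0, 8, 20000  # illustrative values
zs = [z_statistic([rng.gauss(mu0, sigma) for _ in range(n)], mu0, sigma)
      for _ in range(reps)]

# Fraction of |Z| beyond the standard normal two-sided 5% cutoff.
reject_rate = sum(abs(z) > 1.96 for z in zs) / reps
print(reject_rate)  # should be near 0.05
```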

However, we don't have that information, and because we don't know how variable the population is, we don't have a way to exactly work out how "unusual" a particular sample mean would be (if it were to have come from a normal distribution with the hypothesized mean).

We can, however, get an estimate of that variability (we can estimate $\sigma^2$ by $s_X^2$, the sample variance). If we do that, and use the estimate in place of the population value, the statistic becomes the t-statistic. But because the numerator and denominator are now both random (different samples will give different values for both), the statistic no longer has a normal distribution. The tendency of $1/s_X$ to be occasionally much larger than $1/\sigma$ makes the tails heavier, and it turns out that under the assumptions we made, the statistic now has a Student's t distribution with $n-1$ degrees of freedom.
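A simulation sketch of the heavier tails, under the same normal model (standard-library Python; parameter values and seed are hypothetical). Replacing $\sigma$ by $s_X$ and still using the normal cutoff 1.96 rejects noticeably more than 5% of the time at $n=8$, while the t cutoff with 7 degrees of freedom (approximately 2.365) brings the rate back near 5%.

```python
import math
import random

def t_statistic(sample, mu0):
    """One-sample t statistic: the unknown sigma is replaced by the sample SD s_X."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu0) / (s / math.sqrt(n))

rng = random.Random(3)
mu0, sigma, n, reps = 10.0, 2.0, 8, 20000  # illustrative values
ts = [t_statistic([rng.gauss(mu0, sigma) for _ in range(n)], mu0)
      for _ in range(reps)]

# Normal cutoff: too many rejections, because the t tails are heavier.
rate_normal_cutoff = sum(abs(t) > 1.96 for t in ts) / reps
# Cutoff from t with n - 1 = 7 df (approx. 2.365): close to 5%.
rate_t_cutoff = sum(abs(t) > 2.365 for t in ts) / reps
print(rate_normal_cutoff, rate_t_cutoff)
```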

[It's named for Student, the pseudonym of the person who correctly guessed the form of the distribution of the t-statistic and checked it using a simulation of sorts; Fisher later proved the guess correct.]

The t-distribution is one of a number of distributions connected to the normal distribution (in that they're what you end up with when you do certain calculations on samples from normal populations). Others include the chi-squared distribution and the F-distribution.

Similar, though slightly more complicated, explanations apply to the F-distribution, which arises in problems related to ANOVA, regression, and tests of variance when these are used on normal or conditionally normal populations (as appropriate).
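Two standard normal-theory results can be illustrated the same way: for a normal sample, $(n-1)s_X^2/\sigma^2$ has a chi-squared distribution with $n-1$ degrees of freedom, and the ratio of two independent sample variances from normal populations with a common variance has an F-distribution with $(n_1-1, n_2-1)$ degrees of freedom. This is a minimal simulation sketch (standard-library Python; sample sizes, $\sigma$ and the seed are hypothetical) that compares simulated means with the theoretical means $7$ for $\chi^2_7$ and $11/9$ for $F(7, 11)$.

```python
import random

def sample_variance(sample):
    """Sample variance with the (n - 1) denominator."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / (n - 1)

rng = random.Random(4)
sigma, n1, n2, reps = 2.0, 8, 12, 20000  # illustrative values
chi2_vals, f_vals = [], []
for _ in range(reps):
    a = [rng.gauss(0.0, sigma) for _ in range(n1)]
    b = [rng.gauss(0.0, sigma) for _ in range(n2)]
    va, vb = sample_variance(a), sample_variance(b)
    chi2_vals.append((n1 - 1) * va / sigma ** 2)  # ~ chi-squared with n1 - 1 df
    f_vals.append(va / vb)                        # ~ F(n1 - 1, n2 - 1)

chi2_mean = sum(chi2_vals) / reps  # chi-squared(7) has mean 7
f_mean = sum(f_vals) / reps        # F(7, 11) has mean 11 / 9
print(chi2_mean, f_mean)
```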

What does normality have to do with this?

The calculations that get to a t-distribution or an F-distribution respectively rely on the original values being one (or more) random sample(s) from normally distributed populations. (In the case of regression, it's the error term that's normally distributed rather than the unconditional population of responses.)

If you didn't have normal populations, the statistics would not be t- or F-distributed, respectively, but would follow some other distribution.

For example, if we do a one-sample t-test on data from an exponential distribution with $n=8$, then, when the hypothesized population mean is correct, the distribution of the t-statistic looks like this:

[Figure: histogram and kernel density plot of the t-statistic; the distribution is short-tailed on the right and very long-tailed on the left.]

The bars are a histogram and the green curve is a kernel density estimate for a one-sample t-statistic, based on drawing many samples of size $n=8$ from an exponential population.

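You might have expected the t-statistic to be skewed right (since the numerator is right-skewed), but it's clearly left-skewed in this instance. (With exponential data, samples that happen to have a small mean also tend to have a small standard deviation, so dividing by $s_X/\sqrt{n}$ stretches the left tail.) [This should reinforce to us the idea that you cannot ignore the behavior of the denominator when dealing with the t-test.]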
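A sketch that reproduces the qualitative shape in the figure (it is not the code used to make the figure, which the answer does not show). It draws samples of size 8 from an exponential population with mean 1 via Python's `random.expovariate`, computes the t-statistic against the true mean, and compares the 5% and 95% quantiles; the seed and replication count are hypothetical choices. A left-skewed distribution shows a 5% quantile much farther from zero than the 95% quantile.

```python
import math
import random
import statistics

def t_statistic(sample, mu0):
    """One-sample t statistic (sample SD in the denominator)."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu0) / (s / math.sqrt(n))

rng = random.Random(5)
n, reps, true_mean = 8, 20000, 1.0  # expovariate(1.0) has mean 1, so H0 is true
ts = [t_statistic([rng.expovariate(1.0) for _ in range(n)], true_mean)
      for _ in range(reps)]

cuts = statistics.quantiles(ts, n=20)  # 19 cut points: 5%, 10%, ..., 95%
q05, q95 = cuts[0], cuts[-1]
print(q05, q95)  # for t with 7 df these would be about -1.895 and +1.895
```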

If we used the ordinary t-tables with this statistic, our p-values would be wrong: quite wrong for either one-tailed test (too high or too low, depending on which tail we're testing against), and still somewhat wrong for a two-tailed test.
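One way to see the effect on p-values is to count how often each test rejects at a nominal 5% level when $H_0$ is true. This simulation sketch uses the same hypothetical exponential setup ($n=8$, mean 1, standard-library Python) with approximate t-table cutoffs for 7 degrees of freedom; under a genuine $t_7$ distribution all three rates would be about 0.05.

```python
import math
import random

def t_statistic(sample, mu0):
    """One-sample t statistic (sample SD in the denominator)."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu0) / (s / math.sqrt(n))

rng = random.Random(8)
n, reps, true_mean = 8, 20000, 1.0
ts = [t_statistic([rng.expovariate(1.0) for _ in range(n)], true_mean)
      for _ in range(reps)]

T_ONE_SIDED = 1.895  # approx. 95% point of t with 7 df
T_TWO_SIDED = 2.365  # approx. 97.5% point of t with 7 df

lower_rate = sum(t < -T_ONE_SIDED for t in ts) / reps  # H1: mean < mu0
upper_rate = sum(t > T_ONE_SIDED for t in ts) / reps   # H1: mean > mu0
two_sided_rate = sum(abs(t) > T_TWO_SIDED for t in ts) / reps
print(lower_rate, upper_rate, two_sided_rate)
```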

The two-sample test is considerably less affected than this with exponential data, but its statistic is still not t-distributed. The F-test for equality of variances, however, is more substantially affected.
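A rough comparison of the two tests under exponential data, as a sketch with hypothetical settings (two groups of size 8, both drawn from the same exponential distribution, so $H_0$ holds for both tests). To avoid hard-coding an F table, the two-sided 5% cutoffs for the variance ratio are estimated from simulated normal data; the pooled t-test uses the approximate 97.5% point of t with 14 degrees of freedom. The expectation from the answer is that the t-test's rejection rate stays fairly near 5% while the F-test's is far from it.

```python
import math
import random
import statistics

def mean_var(sample):
    """Sample mean and (n - 1)-denominator sample variance."""
    n = len(sample)
    m = sum(sample) / n
    return m, sum((x - m) ** 2 for x in sample) / (n - 1)

def pooled_t(a, b):
    """Two-sample pooled-variance t statistic."""
    (ma, va), (mb, vb) = mean_var(a), mean_var(b)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

def variance_ratio(a, b):
    """F statistic for equality of variances: s_a^2 / s_b^2."""
    return mean_var(a)[1] / mean_var(b)[1]

rng = random.Random(6)
n, reps = 8, 20000

def normal_sample():
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def expo_sample():
    return [rng.expovariate(1.0) for _ in range(n)]

# Empirical 2.5% and 97.5% points of the ratio under normality (stand-in for F(7, 7)).
ratios = [variance_ratio(normal_sample(), normal_sample()) for _ in range(reps)]
cuts = statistics.quantiles(ratios, n=40)
f_lo, f_hi = cuts[0], cuts[-1]

T_CRIT = 2.145  # approx. 97.5% point of t with 14 df

t_rate = sum(abs(pooled_t(expo_sample(), expo_sample())) > T_CRIT
             for _ in range(reps)) / reps
f_rate = sum(not (f_lo <= variance_ratio(expo_sample(), expo_sample()) <= f_hi)
             for _ in range(reps)) / reps
print(t_rate, f_rate)  # both would be about 0.05 if normal theory applied
```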


Tests are typically done on statistics, which are functions of data. The mean is one example where you sum N random variables and then divide by N. These functions often have their own distributions that are distinct from the underlying data.
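A small illustration of this point, as a sketch with hypothetical settings (standard-library Python, Uniform(0, 1) data, $N = 10$): individual values are spread evenly over $[0, 1]$ with variance $1/12$, while means of $N$ values have variance $1/(12N)$ and pile up near 0.5, so the statistic's distribution is clearly different from that of the data.

```python
import random
import statistics

rng = random.Random(7)
N, reps = 10, 20000  # illustrative values

data = [rng.random() for _ in range(reps)]  # individual Uniform(0, 1) values
means = [statistics.fmean(rng.random() for _ in range(N)) for _ in range(reps)]

var_data = statistics.pvariance(data)    # near 1/12
var_means = statistics.pvariance(means)  # near 1/(12 * N)

# Share of values falling in [0.4, 0.6]: 20% for the raw data, far more for the means.
frac_data = sum(0.4 <= x <= 0.6 for x in data) / reps
frac_means = sum(0.4 <= m <= 0.6 for m in means) / reps
print(var_data, var_means, frac_data, frac_means)
```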

