The distribution of sample means

The applet below illustrates how the sample mean is distributed.

The sampling distribution of the mean is the distribution of the means of samples of size n. We simulate it in the following way. First we choose an underlying distribution. Here we can sample from populations with one of three distributions: a normal distribution, a uniform distribution and a discrete (2-point) distribution that takes one of two values (-10 or 10) with equal probability. All three distributions have mean 0 and standard deviation 10. We then generate a sample, shown in yellow, from our chosen distribution. As we generate the sample, we build up a histogram (or a bar chart in the case of the 2-point distribution) of the sample values. This should approximately fit the grey area under our distribution's density function.
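If you want to experiment outside the applet, the three populations can be sketched in a few lines of Java. This is only an illustration and not the applet's own source: the class and method names are made up, and the range of the uniform distribution is not stated on this page but derived from the requirement that its standard deviation be 10.

```java
import java.util.Random;

// Hypothetical helper, not taken from the applet's source code.
// Each method draws one value from a population with mean 0 and
// standard deviation 10.
class PopulationSketch {
    static final double SIGMA = 10.0;

    // Normal distribution with mean 0 and standard deviation 10.
    static double drawNormal(Random rng) {
        return SIGMA * rng.nextGaussian();
    }

    // Uniform distribution on [-a, a). Its standard deviation is
    // a / sqrt(3), so a = 10 * sqrt(3) (derived, not given in the text).
    static double drawUniform(Random rng) {
        double a = SIGMA * Math.sqrt(3.0);
        return (2.0 * rng.nextDouble() - 1.0) * a;
    }

    // 2-point distribution: -10 or 10 with equal probability.
    static double drawTwoPoint(Random rng) {
        return rng.nextBoolean() ? SIGMA : -SIGMA;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        String[] names = {"normal", "uniform", "2-point"};
        int draws = 100_000;
        for (int d = 0; d < names.length; d++) {
            double sum = 0.0;
            double sumSq = 0.0;
            for (int i = 0; i < draws; i++) {
                double x = d == 0 ? drawNormal(rng)
                         : d == 1 ? drawUniform(rng)
                         : drawTwoPoint(rng);
                sum += x;
                sumSq += x * x;
            }
            double mean = sum / draws;
            double sd = Math.sqrt(sumSq / draws - mean * mean);
            // Both figures should come out close to 0 and 10 respectively.
            System.out.printf("%-8s mean=%.3f sd=%.3f%n", names[d], mean, sd);
        }
    }
}
```

Running main prints the observed mean and standard deviation of a large number of draws from each population, which is a quick way to check that all three really do share mean 0 and standard deviation 10.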

When our sample is complete we plot (in blue at first, then green) its mean value. This will be somewhere in the middle of the yellow histogram.

We repeat the process again and again until we have a second histogram (in green) representing the distribution of means of the samples.
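The repeated sampling described above can be sketched as follows. Again, this is a hypothetical illustration rather than the applet's code: the names are invented, the uniform population is used as the example, and the bin range and bin count are arbitrary choices.

```java
import java.util.Random;

// Hypothetical sketch, not the applet's code: repeat the sampling step
// many times and tally the sample means into histogram bins.
class SampleMeanHistogram {

    // Mean of one sample of size n from the uniform population with
    // mean 0 and standard deviation 10 (range +/- 10 * sqrt(3)).
    static double sampleMean(Random rng, int n) {
        double halfWidth = 10.0 * Math.sqrt(3.0);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (2.0 * rng.nextDouble() - 1.0) * halfWidth;
        }
        return sum / n;
    }

    // Counts how many of `repeats` sample means fall into each of `bins`
    // equal-width bins covering [lo, hi). Means outside the range are ignored.
    static int[] histogram(Random rng, int n, int repeats,
                           double lo, double hi, int bins) {
        int[] counts = new int[bins];
        double width = (hi - lo) / bins;
        for (int r = 0; r < repeats; r++) {
            double m = sampleMean(rng, n);
            int b = (int) Math.floor((m - lo) / width);
            if (b >= 0 && b < bins) {
                counts[b]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double lo = -20.0;
        double hi = 20.0;
        int bins = 20;
        int[] counts = histogram(new Random(7), 10, 5000, lo, hi, bins);
        double width = (hi - lo) / bins;
        for (int b = 0; b < bins; b++) {
            // One '#' per 25 sample means in the bin.
            StringBuilder bar = new StringBuilder();
            for (int k = 0; k < counts[b] / 25; k++) {
                bar.append('#');
            }
            System.out.printf("%6.1f | %s%n", lo + b * width, bar);
        }
    }
}
```

The text histogram printed by main plays the role of the green histogram in the applet: it is built from sample means, not from individual sample values.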

As you might expect, this distribution, the sampling distribution, is narrower than the original distribution. The narrow grey density function for this sampling distribution is also shown so that you can compare, though you should only expect a good fit if the sample size n is large or the original distribution is approximately normal. You can test to see just how large a value of n you need to get a reasonable fit.

The Central Limit Theorem says that, for large n, the sampling distribution should approximate a normal distribution with the same mean as the original distribution and with standard deviation equal to the original standard deviation divided by the square root of the sample size. The standard deviation of the sampling distribution is often called the standard error.
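As a rough check of the standard error formula, the sketch below compares the theoretical value, the population standard deviation divided by the square root of n, with the standard deviation of a large number of simulated sample means. The class and method names are hypothetical, and a normal population is assumed purely for convenience.

```java
import java.util.Random;

// Hypothetical sketch: compares the theoretical standard error with the
// standard deviation of simulated sample means.
class StandardErrorCheck {

    // Theoretical standard error: population sd divided by sqrt(n).
    static double standardError(double sigma, int n) {
        return sigma / Math.sqrt(n);
    }

    // Standard deviation of `repeats` simulated sample means, each taken
    // from a sample of size n from a normal population (mean 0, sd sigma).
    static double empiricalSd(Random rng, double sigma, int n, int repeats) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (int r = 0; r < repeats; r++) {
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                total += sigma * rng.nextGaussian();
            }
            double mean = total / n;
            sum += mean;
            sumSq += mean * mean;
        }
        double grandMean = sum / repeats;
        return Math.sqrt(sumSq / repeats - grandMean * grandMean);
    }

    public static void main(String[] args) {
        Random rng = new Random(2024);
        double sigma = 10.0; // the standard deviation used on this page
        int[] sizes = {4, 25, 100};
        for (int n : sizes) {
            System.out.printf("n=%3d  theory=%.3f  simulated=%.3f%n",
                    n, standardError(sigma, n),
                    empiricalSd(rng, sigma, n, 20_000));
        }
    }
}
```

With a standard deviation of 10, a sample size of 25 gives a standard error of 10 / 5 = 2, and the simulated figure should land close to that.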

Compare how quickly the sampling distribution for each of the three underlying population distributions converges towards a normal distribution.

The 2-point distribution is the one whose sampling distribution should be least like a normal distribution, but you should find that, even for quite small samples, the sampling distribution is approximately normal. Notice that this distribution is just a scaled and shifted version of the Bernoulli distribution with p equal to 0.5, whose mean is just the population proportion p.
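For the 2-point distribution the sampling distribution can be worked out exactly, without simulation. If k of the n draws are +10, the sample mean is 20(k/n) - 10, and k follows a binomial distribution with p equal to 0.5. The sketch below uses that fact. The names are hypothetical and this is not how the applet itself is written.

```java
// Hypothetical sketch: exact sampling distribution of the mean for the
// 2-point population, using the binomial distribution with p = 0.5.
class TwoPointExact {

    // Probabilities P(K = k), k = 0..n, for K ~ Binomial(n, 0.5),
    // built up from P(K = 0) = 0.5^n using the ratio of successive terms.
    static double[] binomialHalf(int n) {
        double[] p = new double[n + 1];
        p[0] = Math.pow(0.5, n);
        for (int k = 1; k <= n; k++) {
            p[k] = p[k - 1] * (n - k + 1) / k;
        }
        return p;
    }

    // Sample mean when k of the n draws are +10 and the rest are -10.
    // This equals 20 * (k / n) - 10, the scaled and shifted proportion.
    static double meanValue(int n, int k) {
        return 10.0 * (2.0 * k - n) / n;
    }

    // Standard deviation of the exact sampling distribution.
    // The distribution is symmetric about 0, so its mean is 0.
    static double exactSd(int n) {
        double[] p = binomialHalf(n);
        double variance = 0.0;
        for (int k = 0; k <= n; k++) {
            double m = meanValue(n, k);
            variance += p[k] * m * m;
        }
        return Math.sqrt(variance);
    }

    public static void main(String[] args) {
        int n = 8;
        double[] p = binomialHalf(n);
        for (int k = 0; k <= n; k++) {
            System.out.printf("mean=%6.2f  probability=%.4f%n",
                    meanValue(n, k), p[k]);
        }
        System.out.printf("exact sd=%.4f  10/sqrt(n)=%.4f%n",
                exactSd(n), 10.0 / Math.sqrt(n));
    }
}
```

Printing the probabilities for a small n such as 8 already shows the bell shape, and the exact standard deviation agrees with the standard error formula.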

You will need a Java plugin for this web page to work properly. If you don't have one, you can get one from Sun Microsystems. Some Java plugins, notably the ones from Microsoft but also some older plugins, won't work with the Java applet on this page.

You may freely download and use the source code for the applet. It is written in Java.

John Lamb