A Derivation of the Normal Distribution

Robert S. Wilson PhD.

Data are said to be normally distributed if their frequency histogram is approximated by a bell-shaped curve. In practice, one can often tell by looking at a histogram whether the data are normally distributed.

The bell-shaped curve is named for Carl Friedrich Gauss (1777-1855), whom many historians of mathematics consider the greatest mathematician of all time. Gauss directed the geodetic survey of the Kingdom of Hanover, and surveyors measure distances. For instance, a survey crew may measure a distance to be 135.674 m. To tell whether that is the correct distance, they would check their work by measuring it again. The second time, they might get an answer of 135.677 m. So is it 135.674 m or 135.677 m? They would have to measure it again. The next time, they might get an answer of 135.675 m. Which one is it? Each time they measure, they get a different answer. Gauss would have them measure it about 15 times, and they would get

135.674
135.677
135.675
135.675
135.676
135.672
135.675
135.674
135.676
135.675
135.676
135.674
135.675
135.676
135.675

If we make up a histogram of these data, we get the following.

[Figure: histogram of the 15 measurements, peaking at 135.675]

At this point we can see that the true value would be 135.675 m, accurate to the nearest millimeter.
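As a quick check, here is a short Python sketch (my own illustration, not part of the original article) that computes the mean of the 15 measurements:

```python
# The 15 distance measurements from the surveying example, in meters.
measurements = [
    135.674, 135.677, 135.675, 135.675, 135.676,
    135.672, 135.675, 135.674, 135.676, 135.675,
    135.676, 135.674, 135.675, 135.676, 135.675,
]

mean = sum(measurements) / len(measurements)
print(f"mean = {mean:.3f} m")  # 135.675 m, the value the histogram points to
```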

These data are approximately normally distributed. We get a normal distribution when there is a true value for the distance, but, since to err is human, each measurement is likely to miss it slightly. Even so, we are more likely to land on or near the true value than far from it: as we get farther from the true value, the chance of landing there gets smaller and smaller. We can express this by saying that the rate at which the frequencies fall off is proportional to the distance from the true value.

If this were the end of the story, the histogram would be parabolically shaped, and as you got farther and farther from the true value, the frequencies would eventually become negative, and we can't have negative frequencies. We can get the frequencies to level off as they asymptotically approach zero by further requiring that the rate at which the frequencies fall off is also proportional to the frequencies themselves. Then, as the frequencies approach zero, the slope of the histogram also approaches zero, and the curve levels off as we get into the tails. This gives us the following

Definition: Data are said to be normally distributed if the rate at which the frequencies fall off is proportional to the distance of the score from the mean, and to the frequencies themselves.

In practice, the value of the bell-shaped curve is that we can find the proportion of the scores that lie over a certain interval. In a probability distribution, this is the area under the curve over the interval: a typical calculus problem. However, in order to use calculus to find these areas, we need a formula for the curve. We can find such a formula because our definition gives us the following differential equation,

$$\frac{dy}{dx} = -k(x-\mu)\,y,$$

where $k$ is a positive constant. Note that to the right of the mean the curve will be decreasing, and to the left it will be increasing. We can separate the variables,

$$\frac{dy}{y} = -k(x-\mu)\,dx,$$

and we can integrate both sides:

$$\ln y = -\frac{k(x-\mu)^2}{2} + c.$$

Take exponentials of both sides:

$$y = Ce^{-k(x-\mu)^2/2}, \qquad C = e^c.$$
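As a sanity check on this step, one can hand the differential equation to a computer algebra system. The sketch below (my own, using SymPy; the symbol names are not from the article) recovers the same family of solutions:

```python
import sympy as sp

x = sp.symbols('x')
k, mu = sp.symbols('k mu', positive=True)
y = sp.Function('y')

# The defining differential equation: dy/dx = -k (x - mu) y
ode = sp.Eq(y(x).diff(x), -k * (x - mu) * y(x))
print(sp.dsolve(ode, y(x)))
# SymPy prints y(x) = C1*exp(k*x*(2*mu - x)/2) or an equivalent form;
# completing the square shows this is C*exp(-k*(x - mu)**2/2), with the
# extra factor exp(k*mu**2/2) absorbed into the constant.
```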

The question now becomes, what is $C$, not to mention, what is $k$? We can use the fact that the normal distribution is a probability distribution, so the total area under the curve is $1$. If $f(x)$ is a probability measure, then

$$\int_{-\infty}^{\infty} Ce^{-k(x-\mu)^2/2}\,dx = 1.$$

This is actually somewhat humorous: here is a function that does not have an elementary antiderivative, yet there is a trick for getting the total area under its curve exactly. First, let

$$u = \sqrt{\frac{k}{2}}\,(x-\mu).$$

Then

$$du = \sqrt{\frac{k}{2}}\,dx$$

and

$$dx = \sqrt{\frac{2}{k}}\,du,$$

so

$$u^2 = \frac{k(x-\mu)^2}{2}.$$

So our integral becomes

$$C\sqrt{\frac{2}{k}}\int_{-\infty}^{\infty} e^{-u^2}\,du = 1.$$

Square the integral. When we write down two factors of the integral, we can use different variables for the two integrals:

$$\left(\int_{-\infty}^{\infty} e^{-u^2}\,du\right)^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(u^2+w^2)}\,du\,dw.$$

Transfer to polar coordinates, where $r^2 = u^2 + w^2$ and $du\,dw = r\,dr\,d\theta$:

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(u^2+w^2)}\,du\,dw = \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta.$$

We now have something we can integrate. Let

$$v = -r^2,$$

$$dv = -2r\,dr.$$

So our integral becomes

$$\int_0^{2\pi}\left(-\frac{1}{2}\int_0^{-\infty} e^{v}\,dv\right)d\theta = \int_0^{2\pi}\left[-\frac{1}{2}\,e^{-r^2}\right]_0^{\infty} d\theta = \int_0^{2\pi}\frac{1}{2}\,d\theta = \pi,$$

or

$$\int_{-\infty}^{\infty} e^{-u^2}\,du = \sqrt{\pi},$$

so

$$C\sqrt{\frac{2}{k}}\,\sqrt{\pi} = 1 \qquad\text{and}\qquad C = \sqrt{\frac{k}{2\pi}}.$$
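The value $\sqrt{\pi} \approx 1.7724539$ is easy to confirm numerically. The following Python sketch (my own check, using SciPy's quadrature routine) integrates $e^{-u^2}$ over the whole real line:

```python
import math
from scipy.integrate import quad

# Numerically integrate e^(-u^2) from -infinity to infinity.
area, _ = quad(lambda u: math.exp(-u**2), -math.inf, math.inf)
print(area, math.sqrt(math.pi))  # both print about 1.7724538509
```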

Now that we have a formula for the distribution,

$$f(x) = \sqrt{\frac{k}{2\pi}}\;e^{-k(x-\mu)^2/2},$$

we can compute its expected value. Let

$$v = x - \mu,$$

so that

$$dx = dv.$$

The expected value of $v$ is

$$E(v) = \int_{-\infty}^{\infty} v\,\sqrt{\frac{k}{2\pi}}\;e^{-kv^2/2}\,dv.$$

Let

$$w = -\frac{kv^2}{2}, \qquad dw = -kv\,dv,$$

so

$$E(v) = -\frac{1}{k}\sqrt{\frac{k}{2\pi}}\int e^{w}\,dw = \left[-\frac{1}{k}\sqrt{\frac{k}{2\pi}}\;e^{-kv^2/2}\right]_{-\infty}^{\infty} = 0,$$

hence,

$$E(x) = E(v + \mu) = E(v) + \mu = \mu,$$

so the mean of the distribution is $\mu$.

We can use this to find the variance,

$$\sigma^2 = E\!\left((x-\mu)^2\right) = \int_{-\infty}^{\infty} (x-\mu)^2\,\sqrt{\frac{k}{2\pi}}\;e^{-k(x-\mu)^2/2}\,dx.$$

Let

$$v = x - \mu$$

and

$$dv = dx,$$

so that

$$\sigma^2 = \sqrt{\frac{k}{2\pi}}\int_{-\infty}^{\infty} v^2 e^{-kv^2/2}\,dv.$$

We can use integration by parts to find this. Let

$$u = v, \qquad dw = v\,e^{-kv^2/2}\,dv,$$

$$du = dv, \qquad w = -\frac{1}{k}\,e^{-kv^2/2}.$$

Then our integral becomes

$$\sigma^2 = \sqrt{\frac{k}{2\pi}}\left(\left[-\frac{v}{k}\,e^{-kv^2/2}\right]_{-\infty}^{\infty} + \frac{1}{k}\int_{-\infty}^{\infty} e^{-kv^2/2}\,dv\right).$$

The first term is zero, and in the second term, after the $1/k$, we have the area under the bell-shaped curve with mean $0$, which is $1$. We get

$$\sigma^2 = \frac{1}{k},$$

which says

$$k = \frac{1}{\sigma^2}.$$

So the formula becomes

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\;e^{-(x-\mu)^2/2\sigma^2}.$$
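With the formula in hand, a quick numerical check (again my own sketch, not from the article) confirms that it integrates to $1$ and has mean $\mu$ and variance $\sigma^2$:

```python
import math
from scipy.integrate import quad

mu, sigma = 3.0, 2.0  # arbitrary example values

def f(x):
    # The normal density just derived.
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

total, _ = quad(f, -math.inf, math.inf)
mean, _ = quad(lambda x: x * f(x), -math.inf, math.inf)
var, _ = quad(lambda x: (x - mu)**2 * f(x), -math.inf, math.inf)
print(total, mean, var)  # about 1.0, 3.0, and 4.0
```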

Now that we have the formula, we can locate the critical points of the bell-shaped curve. To find maxima and minima, solve

$$f'(x) = 0.$$

Differentiating,

$$f'(x) = -\frac{x-\mu}{\sigma^2}\cdot\frac{1}{\sigma\sqrt{2\pi}}\;e^{-(x-\mu)^2/2\sigma^2},$$

which is zero if and only if

$$x - \mu = 0,$$

or

$$x = \mu,$$

which says that the bell-shaped curve peaks out above the mean, which we suspected to be true to begin with.

For the points of inflection, solve

$$f''(x) = 0.$$

Differentiating again,

$$f''(x) = \left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right)\frac{1}{\sigma\sqrt{2\pi}}\;e^{-(x-\mu)^2/2\sigma^2},$$

which is zero if and only if

$$(x-\mu)^2 = \sigma^2,$$

or

$$x = \mu \pm \sigma.$$

This says that the points of inflection in the bell-shaped curve lie one standard deviation above and below the mean.
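Both calculations can be checked symbolically. The sketch below (my own, using SymPy; the variable names are mine) solves $f'(x) = 0$ and $f''(x) = 0$:

```python
import sympy as sp

x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# The normal density with mean mu and standard deviation sigma.
f = sp.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi))

print(sp.solve(sp.Eq(f.diff(x), 0), x))     # expect [mu]
print(sp.solve(sp.Eq(f.diff(x, 2), 0), x))  # expect [mu - sigma, mu + sigma]
```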

Another advantage of having a formula for the bell-shaped curve is that we can use integral calculus to find the areas under portions of it. Unfortunately, the formula does not have an elementary antiderivative, even if we allow the use of exponential functions. To get a method for finding the area under the bell-shaped curve, first define

$$z = \frac{x - \mu}{\sigma},$$

which is called the $z$-score. It tells us how many standard deviations our score lies from the mean.

Then

$$\sigma z = x - \mu,$$

or

$$x = \mu + \sigma z,$$

so that

$$dx = \sigma\,dz.$$

Then the area under the curve between the mean and a score $x$ becomes

$$\int_{\mu}^{x} \frac{1}{\sigma\sqrt{2\pi}}\;e^{-(t-\mu)^2/2\sigma^2}\,dt = \frac{1}{\sqrt{2\pi}}\int_{0}^{z} e^{-t^2/2}\,dt.$$

Note that

$$e^{-t^2/2} = \sum_{n=0}^{\infty} \frac{(-1)^n\,t^{2n}}{2^n\,n!},$$

and we can then integrate term by term:

$$\frac{1}{\sqrt{2\pi}}\int_{0}^{z} e^{-t^2/2}\,dt = \frac{1}{\sqrt{2\pi}}\sum_{n=0}^{\infty} \frac{(-1)^n\,z^{2n+1}}{2^n\,n!\,(2n+1)}.$$

This is the area under the bell-shaped curve between the mean and your $z$-score.
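The series is easy to sum on a computer. The following sketch (my own; the function name is made up for illustration) compares it with the same area computed from Python's built-in error function:

```python
import math

def area_mean_to_z(z, terms=60):
    """Area under the standard bell curve between the mean and z,
    summed term by term from the series above."""
    total = 0.0
    for n in range(terms):
        total += (-1)**n * z**(2*n + 1) / (2**n * math.factorial(n) * (2*n + 1))
    return total / math.sqrt(2 * math.pi)

for z in (1, 2, 3):
    series = area_mean_to_z(z)
    exact = 0.5 * math.erf(z / math.sqrt(2))  # same area via the error function
    print(z, round(series, 5), round(exact, 5))
# z=1: 0.34134, z=2: 0.47725, z=3: 0.49865.
# Doubling these gives the familiar 68%, 95%, and 99.7% figures below.
```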

It is not that unreasonable that we should need to use an infinite sum to get the area, because one needs to use infinite series to compute $e^x$ anyway.

Evaluating the series at $z = 1$, $2$, and $3$ says that approximately 68% of the scores are within one standard deviation of the mean, 95% of the scores are within two standard deviations, and 99.7% are within three standard deviations. Most scores are within three standard deviations of the mean.

The Normal Approximation of the Binomial Distribution

In the case of an experiment being repeated $n$ times, if the probability of an event is $p$, then the probability of the event occurring $k$ times is

$$_nC_k\,p^k q^{n-k},$$

where $q = 1 - p$.
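In Python this is a one-liner (a sketch of my own; `math.comb` is the standard-library binomial coefficient):

```python
from math import comb

def binom_prob(n, k, p):
    """Probability of exactly k successes in n trials: nCk * p^k * q^(n-k)."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

print(binom_prob(100, 50, 0.5))  # about 0.079589, the figure quoted below
```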

If one were to graph these probabilities, the graph would look somewhat like a bell-shaped curve. If the event happening is called a success and its not happening is called a failure, then, when the number of trials is reasonably small, this formula works fairly well for computing the probabilities of various numbers of successes. However, suppose you want to make precise the claim that a probability of 1/2 of a coin landing heads means that in a large number of tosses, say 100, heads should come up on about half of the tosses, or 50 times. Using the formula to compute the probability of getting exactly 50 heads in 100 tosses would be quite tedious. Even more tedious: after noticing that this probability of getting exactly 50 heads is rather small, you might decide to illustrate the point by computing the probability of getting between 45 and 55 heads, and the effort involved in using the binomial formula on the eleven outcomes in that event would be significant. If the binomial probabilities were approximately normally distributed, one could treat it as a bell-shaped curve problem.

For a bell-shaped curve problem one needs a mean and a standard deviation. For the binomial distribution, it is well known that

$$\mu = np$$

and

$$\sigma = \sqrt{npq}.$$

But does the normal distribution approximate the binomial distribution? There are differences. The binomial distribution is discrete and the normal distribution is continuous. This will present a challenge in seeing if the binomial distribution satisfies the differential equation which defines the normal distribution.

Recall that

$$\frac{dy}{dx} = \lim_{h\to 0}\frac{y(x+h) - y(x)}{h}.$$

The smallest value that $h$ could possibly take in this case is $1$, so we approximate the derivative by the difference $P(x+1) - P(x)$, where $P(x) = {}_nC_x\,p^x q^{n-x}$ is the probability of $x$ successes:

$$P(x+1) - P(x) = \left(\frac{(n-x)p}{(x+1)q} - 1\right)P(x) = \frac{(n-x)p - (x+1)q}{(x+1)q}\,P(x).$$

But $p + q = 1$, so the numerator simplifies, $(n-x)p - (x+1)q = np - x(p+q) - q = np - q - x$, and we get

$$P(x+1) - P(x) = \frac{np - q - x}{(x+1)q}\,P(x).$$

It is time to start approximating. $q$ is a probability, so it is a number between $0$ and $1$, and, as such, it is negligible compared to $np$ and $x$, especially when $n$ is large, so we can ignore it. If $x + 1$ is close to the mean, $np$, then $(x+1)q$ is close to $npq = \sigma^2$, and the difference becomes close to

$$P(x+1) - P(x) \approx -\frac{x - np}{npq}\,P(x) = -\frac{1}{\sigma^2}(x - \mu)\,P(x),$$

which is the defining differential equation of the normal distribution with $k = 1/\sigma^2$.

So we see that while the binomial distribution does not give us an exact solution to the differential equation, it is pretty close when   x   is close to the mean. When   x   gets farther away from the mean, the probabilities become smaller, so even if the probabilities are off by a significant factor, they are so small that the differences could be negligible compared to the probabilities of outcomes closer to the mean.
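How close, in the coin example? The sketch below (my own illustration; $P(x)$ here is the binomial probability of exactly $x$ heads) compares the exact difference $P(x+1) - P(x)$ with the approximation $-(x-\mu)P(x)/\sigma^2$ for $n = 100$, $p = 1/2$:

```python
from math import comb

n, p = 100, 0.5
q = 1 - p
mu, var = n * p, n * p * q  # mean 50, variance 25

def P(x):
    return comb(n, x) * p**x * q**(n - x)

for x in (52, 55, 60):
    exact = P(x + 1) - P(x)
    approx = -(x - mu) / var * P(x)
    print(x, exact, approx)
# The two columns agree in sign and size near the mean (within a couple of
# percent at x = 55), and both shrink rapidly out in the tails.
```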

In our example of flipping a coin 100 times and noting the number of heads, if one computes the binomial probability of exactly 50 heads, it comes out to 0.079589 (rounded to six places after the decimal point), not very much. This is because the probability of 50 heads is not that much different from the probability of 49 or 51 heads, or even the probability of 48 or 52 heads. When one spreads the probabilities out over all those possibilities, no single one of them can be very large.

If you figure the binomial probabilities for all of the outcomes between 45 and 55 heads, the total comes out to be about 0.728747. If you use the normal distribution (integrating from 44.5 to 55.5, to allow for the discreteness of the counts), the probability comes out to be about 0.728668. In both cases they round off to 0.7287 -- agreement to four significant digits, which is not bad, especially for a probability, where that much exactness is not really that meaningful. Notice that even though 45 to 55 is only around 1/10th of the domain, the probability, when we take these outcomes which are close to half heads and half tails, is over 70%, supporting the contention that you will probably get about half heads and half tails.

If, on the other hand, you try the probability of between 25 and 30 heads, the binomial probabilities give around $3.9163 \times 10^{-5}$, while the normal distribution gives around $4.7945 \times 10^{-5}$. Proportionally, this is a much more significant difference, which is explained by the fact that we are farther from the mean, but either of these numbers, and hence their difference, is negligible compared to our previous result.
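Both comparisons are easy to reproduce (my own sketch; the normal areas use the error function and the same half-unit adjustment at the endpoints described above):

```python
from math import comb, erf, sqrt

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))  # 50 and 5

def binom_range(lo, hi):
    """Exact probability of between lo and hi heads, inclusive (fair coin)."""
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2**n

def normal_range(lo, hi):
    """Area under the matching normal curve from lo - 0.5 to hi + 0.5."""
    Phi = lambda x: 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
    return Phi(hi + 0.5) - Phi(lo - 0.5)

print(binom_range(45, 55), normal_range(45, 55))  # ~0.728747 and ~0.728668
print(binom_range(25, 30), normal_range(25, 30))  # ~3.9163e-05 and ~4.7945e-05
```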

The upshot is that even though the binomial distribution is not exactly normal, it is close enough that people do use the normal approximation.