Data are said to be normally distributed if their frequency histogram is approximated by a bell shaped curve. In practice, one can often tell by looking at a histogram whether the data are approximately normally distributed.
The bell shaped curve is often called the Gaussian curve after Carl Friedrich Gauss (1777-1855), whom many mathematical historians consider to have been the greatest mathematician of all time, and who used it to describe errors of measurement. Gauss was working as a surveyor for the Kingdom of Hanover. Surveyors measure distances. For instance, a survey crew may measure a distance to be 135.674 m. To tell if that is the correct distance, they would check their work by measuring it again. The second time, they might get an answer of 135.677 m. So is it 135.674 m or 135.677 m? They would have to measure it again. The next time, they might get an answer of 135.675 m. Which one is it? Each time they measure, they get a different answer. Gauss would have them measure it about 15 times, and they would get a list of slightly different readings.
If we make up a histogram for these data, the readings cluster around a single central value.
At this point we can see that the true value would be 135.675 m, accurate to the nearest millimeter.
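To see how such a histogram takes shape, here is a minimal Python sketch that simulates 15 repeated measurements and prints a text histogram. The readings are randomly generated and are not Gauss's data; the 1 mm error size, the fixed seed and the variable names are assumptions made only for illustration.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_DISTANCE_M = 135.675  # true value quoted in the text
ERROR_SD_M = 0.001         # hypothetical measurement error of about 1 mm

# Simulate 15 repeated measurements, rounded to the nearest millimeter.
readings = [round(random.gauss(TRUE_DISTANCE_M, ERROR_SD_M), 3) for _ in range(15)]

# Tally how often each rounded reading occurs and print one bar per value.
counts = Counter(readings)
for value in sorted(counts):
    print(f"{value:.3f} m  {'#' * counts[value]}")
```

With only 15 readings the bars are ragged, but they tend to pile up near the true value and thin out on either side.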
These data are approximately normally distributed. We get a normal distribution when there is a true answer for the distance but, since to err is human, each attempt to hit it is likely to miss. Even so, we are more likely to land on or near the target than far from it. As we get farther from the true value, the chance of landing there gets smaller and smaller. We can express this by saying that the rate at which the frequencies fall off is proportional to the distance from the true value.
If this were the end of the story, the histogram would be parabolically shaped, and as you got farther and farther from the true value, the frequencies would eventually become negative, and we can't have negative frequencies. We can get the frequencies to level off as they asymptotically approach zero by further requiring that the rate at which the frequencies fall off is also proportional to the frequencies themselves. Then as the frequencies approach zero, the slope of the histogram also approaches zero, and the curve levels off as we get into the tail end of the curve. This gives us the following definition.
Definition: Data are said to be normally distributed if the rate at which the frequencies fall off is proportional to the distance of the score from the mean, and to the frequencies themselves.
In practice, the value of the bell shaped curve is that we can find the proportion of the scores which lie over a certain interval. In a probability distribution, this is the area under the curve over the interval: a typical calculus problem. However, in order to use calculus to find these areas, we need a formula for the curve. We can find such a formula because our definition gives us the following differential equation.
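Written out, with f(x) for the frequency at the score x and μ for the mean (symbols chosen here to match the formulas later in the derivation), the definition says:

```latex
\[
\frac{df}{dx} = -k\,(x - \mu)\,f(x)
\]
```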
Here k is a positive constant. Note that to the right of the mean the curve is decreasing, and to the left it is increasing. We can separate the variables
and we can integrate both sides.
Take exponentials of both sides.
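Applied to the differential equation df/dx = -k(x - μ)f, these three steps might look as follows. The name C_0 for the constant of integration is an assumption made here; the constant C in the last line is e raised to that power.

```latex
\begin{align*}
\frac{df}{f} &= -k\,(x - \mu)\,dx \\
\ln f &= -\frac{k}{2}(x - \mu)^2 + C_0 \\
f(x) &= C\,e^{-\frac{k}{2}(x - \mu)^2}, \qquad C = e^{C_0}
\end{align*}
```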
The question now becomes, what is C, not to mention, what is k? We can use the fact that the normal distribution is a probability distribution, so the total area under the curve is 1. If f(x) is a probability density, then
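Written out for the curve f(x) = C e^(-k(x - μ)²/2) that solves the differential equation, the condition is:

```latex
\[
\int_{-\infty}^{\infty} C\,e^{-\frac{k}{2}(x - \mu)^2}\,dx = 1
\]
```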
This is actually somewhat humorous: the integrand is a function whose antiderivative is not an elementary function. However, there is a trick for getting the total area under the curve. First, let
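One substitution that fits the later step v = -r² is the following. The letter u is chosen here for illustration and is not taken from the original.

```latex
\[
u = \sqrt{\tfrac{k}{2}}\,(x - \mu), \qquad du = \sqrt{\tfrac{k}{2}}\,dx
\]
```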
So our integral becomes
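Assuming the substitution u = √(k/2)(x - μ), so that dx = √(2/k) du, the condition that the total area is 1 takes this form:

```latex
\[
C\sqrt{\frac{2}{k}}\int_{-\infty}^{\infty} e^{-u^2}\,du = 1
\]
```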
Square the integral. When we write down two factors of the integral, we can use different variables for the two integrals.
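With u and w as the two integration variables (names assumed here), squaring the condition C√(2/k) ∫ e^(-u²) du = 1 gives a double integral over the whole plane:

```latex
\[
C^2\,\frac{2}{k}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(u^2 + w^2)}\,du\,dw = 1
\]
```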
Transform to polar coordinates.
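Taking u = r cos θ and w = r sin θ for the two variables of the squared integral (labels assumed here), we have u² + w² = r² and du dw = r dr dθ, so the double integral can be written as:

```latex
\[
C^2\,\frac{2}{k}\int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2}\,r\,dr\,d\theta = 1
\]
```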
We now have something we can integrate. Let
v = -r²
dv = -2r dr
So our integral becomes
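Carrying the substitution v = -r² through the polar integral, a sketch of the evaluation is as follows. As r runs from 0 to ∞, v runs from 0 to -∞, and r dr = -dv/2.

```latex
\begin{align*}
C^2\,\frac{2}{k}\int_{0}^{2\pi}\left(-\frac{1}{2}\int_{0}^{-\infty} e^{v}\,dv\right)d\theta
  &= C^2\,\frac{2}{k}\int_{0}^{2\pi}\frac{1}{2}\,d\theta
   = C^2\,\frac{2\pi}{k} = 1 \\
C &= \sqrt{\frac{k}{2\pi}}, \qquad
f(x) = \sqrt{\frac{k}{2\pi}}\,e^{-\frac{k}{2}(x - \mu)^2}
\end{align*}
```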
Now that we have a formula for the distribution, we can compute its expected value. Let
v = x - μ
dv = dx
The expected value of v is
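Using f(x) = √(k/2π) e^(-k(x - μ)²/2) with v = x - μ, the integrand is an odd function of v, so the integral vanishes:

```latex
\[
E(v) = \int_{-\infty}^{\infty} v\,\sqrt{\frac{k}{2\pi}}\,e^{-\frac{k}{2}v^2}\,dv = 0,
\qquad\text{so}\qquad E(x) = \mu .
\]
```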
We can use this to find the variance.
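Since E(v) = 0 for v = x - μ, the variance is the expected value of v². With the density written as √(k/2π) e^(-kv²/2), that is:

```latex
\[
\sigma^2 = E(v^2) = \int_{-\infty}^{\infty} v^2\,\sqrt{\frac{k}{2\pi}}\,e^{-\frac{k}{2}v^2}\,dv
\]
```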
We can use integration by parts to find this. Let
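One choice of parts for the integral ∫ v² √(k/2π) e^(-kv²/2) dv is the following. The letters p and q are labels introduced here and may differ from the original.

```latex
\begin{align*}
p &= v, & dq &= v\,\sqrt{\frac{k}{2\pi}}\,e^{-\frac{k}{2}v^2}\,dv \\
dp &= dv, & q &= -\frac{1}{k}\sqrt{\frac{k}{2\pi}}\,e^{-\frac{k}{2}v^2}
\end{align*}
```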
Then our integral becomes
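Integrating ∫ v² √(k/2π) e^(-kv²/2) dv by parts, with v as the factor that is differentiated and v √(k/2π) e^(-kv²/2) dv as the factor that is integrated, gives a boundary term plus a remaining integral:

```latex
\[
\sigma^2 = \left[-\frac{v}{k}\sqrt{\frac{k}{2\pi}}\,e^{-\frac{k}{2}v^2}\right]_{-\infty}^{\infty}
  + \frac{1}{k}\int_{-\infty}^{\infty}\sqrt{\frac{k}{2\pi}}\,e^{-\frac{k}{2}v^2}\,dv
\]
```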
The first term is zero, and in the second term, after the 1/k, we have the area under the bell shaped curve, with mean = 0, which is 1. We get
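In symbols, with σ² denoting the variance:

```latex
\[
\sigma^2 = \frac{1}{k}, \qquad\text{that is,}\qquad k = \frac{1}{\sigma^2}
\]
```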
So the formula becomes
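Substituting k = 1/σ² into f(x) = √(k/2π) e^(-k(x - μ)²/2) gives the familiar form of the normal density:

```latex
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}
\]
```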
Now that we have the formula, we can locate the critical points in the bell shaped curve. To find maxima and minima, solve
f '(x) = 0
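Differentiating the density f(x) = (1/(σ√(2π))) e^(-(x - μ)²/(2σ²)) with the chain rule gives the expression below. Note that f(x) itself is never zero.

```latex
\[
f'(x) = -\frac{x - \mu}{\sigma^2}\,f(x)
\]
```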
Which is zero if and only if
x - μ = 0
x = μ
which says that the bell shaped curve peaks at the mean, as we suspected to begin with.
For the points of inflection, solve
f ''(x) = 0
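Differentiating f '(x) = -((x - μ)/σ²) f(x) once more with the product rule gives the expression below. Again f(x) is never zero, so only the factor in parentheses matters.

```latex
\[
f''(x) = \left(\frac{(x - \mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) f(x)
\]
```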
which happens if and only if
x = μ ± σ
This says that the points of inflection in the bell shaped curve lie one standard deviation above and below the mean.
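These properties can be checked numerically. The following Python sketch, using only the standard library, integrates the density with a simple midpoint rule and estimates the second derivative by finite differences. The values of MU and SIGMA and all function names are hypothetical choices for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def midpoint_integral(g, a, b, n=20000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def second_derivative(g, x, h=1e-4):
    """Central-difference estimate of g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)

MU, SIGMA = 2.0, 0.5  # hypothetical values chosen only for illustration

def f(x):
    return normal_pdf(x, MU, SIGMA)

# Ten standard deviations either side; the tails beyond this are negligible.
lo, hi = MU - 10 * SIGMA, MU + 10 * SIGMA

area = midpoint_integral(f, lo, hi)
mean = midpoint_integral(lambda x: x * f(x), lo, hi)
variance = midpoint_integral(lambda x: (x - MU) ** 2 * f(x), lo, hi)

print(f"area     = {area:.6f}")
print(f"mean     = {mean:.6f}")
print(f"variance = {variance:.6f}")

# The curvature should change sign at mu - sigma and at mu + sigma.
for x in (MU - 1.5 * SIGMA, MU - 0.5 * SIGMA, MU + 0.5 * SIGMA, MU + 1.5 * SIGMA):
    print(f"f''({x:.2f}) = {second_derivative(f, x):+.4f}")
```

The area should come out close to 1, the mean close to MU and the variance close to SIGMA squared, while the estimated second derivative is positive outside the interval from μ - σ to μ + σ and negative inside it.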
The quantity z = (x - μ)/σ is called the z-score. It tells us how many standard deviations our score lies from the mean. Since x = μ + σz, we have
dx = σ dz
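With z = (x - μ)/σ and dx = σ dz, the area between the mean and a score x can be written as an integral in z alone. The name A(z) and the dummy variables s and t are labels introduced here for illustration.

```latex
\[
A(z) = \int_{\mu}^{x} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(s - \mu)^2}{2\sigma^2}}\,ds
     = \frac{1}{\sqrt{2\pi}}\int_{0}^{z} e^{-\frac{t^2}{2}}\,dt
\]
```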
This is the area under the bell shaped curve between the mean and your z-score.
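The integrand e^(-t²/2) has no elementary antiderivative, but it can be expanded as a power series and integrated term by term. A sketch of that standard expansion, which may differ in form from the sum the original displayed:

```latex
\[
\frac{1}{\sqrt{2\pi}}\int_{0}^{z} e^{-\frac{t^2}{2}}\,dt
  = \frac{1}{\sqrt{2\pi}}\sum_{n=0}^{\infty}\frac{(-1)^n\,z^{2n+1}}{2^n\,n!\,(2n+1)}
  = \frac{1}{\sqrt{2\pi}}\left(z - \frac{z^3}{6} + \frac{z^5}{40} - \frac{z^7}{336} + \cdots\right)
\]
```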
It is not that unreasonable that we should need to use an infinite sum to get the area, because one needs to use infinite series to compute e^x anyway.
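As a rough check on the infinite-sum idea, this Python sketch adds up a series for the area term by term and compares the result with the error function in the standard library. It assumes the series obtained by expanding e^(-t²/2) and integrating from 0 to z, whose n-th term is (-1)^n z^(2n+1) / (2^n n! (2n+1)), divided at the end by √(2π). The function names are hypothetical.

```python
import math

def area_by_series(z, terms=60):
    """Area between the mean and z, from the term-by-term integrated series."""
    total = 0.0
    for n in range(terms):
        # n-th term: (-1)^n z^(2n+1) / (2^n n! (2n+1))
        total += (-1) ** n * z ** (2 * n + 1) / (2 ** n * math.factorial(n) * (2 * n + 1))
    return total / math.sqrt(2 * math.pi)

def area_by_erf(z):
    """Same area, using the error function from the standard library."""
    return 0.5 * math.erf(z / math.sqrt(2))

for z in (0.5, 1.0, 2.0):
    print(f"z = {z}: series = {area_by_series(z):.6f}, erf = {area_by_erf(z):.6f}")
```

The series is practical for small and moderate z. For large z the alternating terms grow before they shrink, so a library routine such as math.erf is the safer choice there.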