## First Effort at Computing the Sampling Variance

The two basic things we want out of stereology are the right answer and some idea of how well the work was done.

The second of these is obtaining some measure of how well the sampling went. When sampling is performed there is always some variation between samples. This spread in the sample results makes us uncertain about the answer. So how good is the estimate? That depends on two factors. One is the method that is used and the other is the thing being sampled. The same method applied to two different objects gives two different spreads. There are statistical measures that quantify the spread. These are called measures of dispersion.

One of the most important of these is the variance. Another measure of dispersion is the standard deviation, which is the square root of the variance. There is also a statistical measure called the coefficient of variation. This is a relative measure of the dispersion: the standard deviation divided by the mean. These values talk about the sample. The measure called the coefficient of error talks about the method being used.
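As a rough illustration of how these measures relate, here is a minimal Python sketch. The function name `dispersion_summary` and the sample numbers are made up for this example. The coefficient of error is computed here as the standard error of the mean divided by the mean, which is only appropriate when the samples are independent; that assumption is exactly what the rest of this section calls into question.

```python
import math
import statistics


def dispersion_summary(values):
    """Return basic measures of dispersion for a list of sample results.

    Assumption (not from the text): the coefficient of error is taken
    as SEM / mean, which presumes independent samples.
    """
    n = len(values)
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("relative measures are undefined for a zero mean")
    variance = statistics.variance(values)  # sample variance, n - 1 denominator
    sd = math.sqrt(variance)                # standard deviation
    cv = sd / mean                          # coefficient of variation (relative spread)
    sem = sd / math.sqrt(n)                 # standard error of the mean
    ce = sem / mean                         # coefficient of error (independence assumed)
    return {"mean": mean, "variance": variance, "sd": sd, "cv": cv, "ce": ce}


if __name__ == "__main__":
    # Hypothetical counts from five samples.
    print(dispersion_summary([10, 12, 8, 11, 9]))
```

Under the independence assumption the coefficient of error is simply the coefficient of variation divided by the square root of the number of samples, so it shrinks as more samples are taken while the coefficient of variation does not.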

Possibly the first coefficient of error calculation used was the binomial distribution formula. This was used for point counting in the 1930s and right into the 1960s and possibly even today. The problem with this method was known right from the start. Quite a few rules for sampling were added in the 1930s, 1940s, and 1950s to account for the mathematics of the binomial distribution.
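The text does not spell the formula out, so the following is a sketch under the assumption that "the binomial distribution formula" means the usual binomial standard error of a proportion. If `hits` of `total_points` test points land on the phase of interest, the estimated fraction is `p = hits / total_points` and the binomial standard error is `sqrt(p * (1 - p) / total_points)`. The function name and the example counts are hypothetical.

```python
import math


def binomial_point_count_ce(hits, total_points):
    """Binomial estimate of precision for a point count.

    Assumes every point is an independent trial with the same
    probability of hitting the phase (the binomial ground rules).
    Returns (fraction, standard_error, coefficient_of_error).
    """
    if total_points <= 0 or not 0 < hits <= total_points:
        raise ValueError("need 0 < hits <= total_points")
    p = hits / total_points
    standard_error = math.sqrt(p * (1 - p) / total_points)
    coefficient_of_error = standard_error / p
    return p, standard_error, coefficient_of_error


if __name__ == "__main__":
    # Hypothetical count: 300 of 1500 points fall on the mineral of interest.
    p, se, ce = binomial_point_count_ce(300, 1500)
    print(f"fraction={p:.3f}  SE={se:.4f}  CE={ce:.4f}")
```

For the made-up count above the coefficient of error comes out at about 0.052, but that figure is only as trustworthy as the independence assumption behind it.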

Any mathematical formula rests on the assumptions, or ground rules, used to derive it. If a formula is based on the assumption that samples are taken from an infinite set, then applying the formula to a population of 100 things is likely to give you the wrong answer. If the formula is based on the notion that all samples are independent, then using the formula when doing systematic sampling is likely to give you the wrong answer. It is very important to know what ground rules must be met to make a formula provide the correct answer. None of us would be foolish enough to use F=ma and substitute a temperature into the formula. But sampling is so complex that we might apply a formula without realizing that the formula was not applicable.

Point counting in the early years was quickly applied to studies under the microscope. Although it had been developed for large scale studies of land management, by the early 1930s geologists were doing point counting under the microscope. The binomial distribution formula was latched onto as a means of describing the precision of the result.

The problem with using this formula was recognized right away. The sampling was done in a systematic manner, but the formula is very clear that the results must be independent. Geologists doing modal analysis of rocks identify minerals through the microscope by producing thin sections of the rocks and analyzing them using a polarizing microscope. The view under the microscope is of crystals in the rocks revealed in different colors. Suppose you were sampling and several points fell inside of the same crystal under the microscope. Are these independent results?
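The question can be explored with a toy simulation. The sketch below is not taken from the historical literature: it invents a one-dimensional "thin section" of unit length with four hypothetical crystals (`CRYSTALS`), then compares the spread of estimates from independent random points with the spread from a systematic row of evenly spaced points with a random start. Both are set against the variance that the binomial formula predicts.

```python
import random
import statistics

# Hypothetical layout: intervals of a unit-length traverse occupied by the
# mineral of interest. These numbers are invented for illustration.
CRYSTALS = [(0.05, 0.20), (0.35, 0.48), (0.60, 0.71), (0.83, 0.97)]
TRUE_FRACTION = sum(b - a for a, b in CRYSTALS)


def hits_phase(x):
    """True if position x falls inside any crystal of the phase."""
    return any(a <= x < b for a, b in CRYSTALS)


def independent_estimate(n, rng):
    """Fraction of n independent, uniformly random points that hit the phase."""
    return sum(hits_phase(rng.random()) for _ in range(n)) / n


def systematic_estimate(n, rng):
    """Fraction of n evenly spaced points (random start) that hit the phase."""
    step = 1.0 / n
    start = rng.random() * step
    return sum(hits_phase(start + k * step) for k in range(n)) / n


def compare(n=100, trials=2000, seed=0):
    """Empirical variances of both schemes next to the binomial prediction."""
    rng = random.Random(seed)
    indep = [independent_estimate(n, rng) for _ in range(trials)]
    syst = [systematic_estimate(n, rng) for _ in range(trials)]
    return {
        "binomial_variance": TRUE_FRACTION * (1 - TRUE_FRACTION) / n,
        "independent_variance": statistics.variance(indep),
        "systematic_variance": statistics.variance(syst),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.6f}")
```

In this toy layout the independent points behave roughly as the binomial formula predicts, while the systematic points, many of which fall in the same crystal, give a noticeably smaller spread. The size and even the direction of the mismatch depend on the structure being sampled, so the only safe conclusion is that the binomial formula does not describe systematic sampling.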

The sampling was done in a systematic manner because it was cheap to build a device that supported systematic sampling. Earlier devices relied on the Rosiwal method. Those devices were also cheap to build, but their use was tedious. The newer point counting devices produced sampling speeds of 1500 points per hour. Compare that to modern computerized systems and you see that the older mechanical systems with automatic totals were pretty darn good.

By the 1960s the question of the binomial distribution formula had been raised again. The underlying mathematics was once more brought to light, and the limitations and requirements of the formula were openly and seriously discussed in the literature.

Eventually the matter was addressed in more theoretical terms. Instead of justifying the use of a particular formula, work was done to see what was actually needed. The use of the binomial distribution formula declined as better methods were developed. These newer methods dropped the independence requirement. The results that are obtained now are in keeping with the nature of the sampling that is being done.