Archive for the ‘Coefficient of Error’ Category

Are design based stereological methods unbiased?

October 12, 2010

There is a common misunderstanding that design based methods are by definition unbiased. That is simply not true. There are those who have said to me, “It’s design based because I designed this idea. It is designed and therefore unbiased.” I think that comment alone tells you that calling a method design based does not make it unbiased. I realize that the person making the comment was unclear about the meaning of design based, but that’s the way it happened.

In a design based approach it is possible to show whether or not a method is unbiased. That doesn’t mean bias can’t creep in if the method is not implemented properly, but at least there is the hope that the biases introduced during the implementation are not overwhelming.

You might ask yourself whether bias is really all that bad. Does it really matter if a method used in stereology has bias? What does that do to the result? If the amount of bias is small relative to the value being determined, then a biased method might not be a bad one.

Suppose a method has a bias estimated to be less than 5% and the data show a 20% difference between the control and experimental groups. In that case the 5% bias is not important, and the method would be a reasonable choice if it saved work.

Unfortunately, biases are difficult to determine. Showing that the bias is less than a certain magnitude is usually impossible.

That is why design based methods that are unbiased are favored. If the method can be shown to have zero bias, then the only remaining issue is how close the implementation comes to the mathematical ideal.

First Effort at Computing the Sampling Variance

April 30, 2010

The two basic things we want out of stereology are the right answer and some idea of how well the work was done.

The second issue is obtaining some measure of how well the sampling went. When sampling is performed there is always some variation between samples. This spread of sample results gives us uncertainty about the answer. So how good is the estimate? That depends on two factors. One is the method that is used and the other is the thing being sampled. The same method applied to two different objects gives two different spreads. There are statistical measures that quantify the spread. These are said to be measures of dispersion.

One of the most important of these is the variance. Another measure of dispersion is the standard deviation. There is also a relative measure of dispersion called the coefficient of variation. These values describe the sample itself. The measure called the coefficient of error describes the method being used.
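To make the distinction concrete, here is a minimal sketch in Python (the estimates are invented numbers, purely for illustration) that computes the variance, standard deviation, and coefficient of variation of a set of repeated estimates, along with a simple coefficient of error for their mean:

    # Illustration only: the estimates below are invented numbers.
    import statistics

    estimates = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2]  # repeated estimates of the same quantity

    m = statistics.mean(estimates)
    var = statistics.variance(estimates)     # sample variance: a measure of dispersion
    sd = statistics.stdev(estimates)         # standard deviation
    cv = sd / m                              # coefficient of variation: relative spread of the sample
    ce = (sd / len(estimates) ** 0.5) / m    # a simple coefficient of error: relative precision of the mean

    print(f"mean={m:.3f}  variance={var:.3f}  SD={sd:.3f}  CV={cv:.3f}  CE={ce:.3f}")

Note that this simple form of the coefficient of error assumes the estimates are independent, which is exactly the assumption questioned below.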

Possibly the first coefficient of error calculation used was the binomial distribution formula. It was used for point counting from the 1930s right into the 1960s, and possibly is still used today. The problem with this approach was known right from the start. Quite a few sampling rules were added in the 1930s, 1940s, and 1950s to accommodate the mathematics of the binomial distribution.
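The binomial calculation treats every point as an independent hit-or-miss trial. A sketch of that classical computation, with invented point counts, looks like this:

    # Classical binomial treatment of a point count (invented numbers).
    from math import sqrt

    hits = 180      # points landing on the phase of interest
    total = 1200    # total points applied

    p = hits / total                  # estimated volume (area) fraction
    se = sqrt(p * (1 - p) / total)    # binomial standard error of a proportion
    ce = se / p                       # relative error, i.e. the coefficient of error

    print(f"fraction={p:.3f}  SE={se:.4f}  CE={ce:.3f}")

The CE printed here is only meaningful if every point really is an independent sample, which is the assumption taken up next.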

Any mathematical formula is based on the assumptions or ground rules used to derive it. If a formula assumes that samples are drawn from an infinite population, then applying it to a population of 100 things is likely to give you the wrong answer. If the formula assumes that all samples are independent, then using it for systematic sampling is likely to give you the wrong answer. It is very important to know what ground rules must be met for a formula to provide the correct answer. None of us would be foolish enough to use F=ma and substitute a temperature into the formula. But sampling is so complex that we might apply a formula without realizing that it was not applicable.
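As a small illustration of how much one ground rule can matter, here is the standard finite population correction for sampling without replacement applied to the binomial standard error; the population size, sample size, and proportion are invented:

    # Effect of the infinite-population assumption on a small population (invented numbers).
    from math import sqrt

    N = 100    # actual population size
    n = 40     # sample size, drawn without replacement
    p = 0.3    # observed proportion

    se_infinite = sqrt(p * (1 - p) / n)    # assumes an effectively infinite population
    fpc = sqrt((N - n) / (N - 1))          # finite population correction
    se_finite = se_infinite * fpc

    print(f"SE assuming an infinite population: {se_infinite:.4f}")
    print(f"SE with the finite population correction: {se_finite:.4f}")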

Point counting was quickly applied to studies under the microscope. Although the method had been developed for large scale land management studies, by the early 1930s geologists had adopted it for microscopy. The binomial distribution formula was latched onto as a means of describing the precision of the results.

The problem with using this formula was recognized right away. The sampling was done in a systematic manner, but the formula is very clear that the samples must be independent. Geologists doing modal analysis identify the minerals in a rock by producing thin sections and examining them with a polarizing microscope. The view under the microscope is of crystals revealed in different colors. Suppose you were sampling and several points fell inside the same crystal. Are those independent results?
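One way to see the problem is with a small simulation. The sketch below is not a model of any real rock: it treats a “thin section” as a one-dimensional strip of crystals of random width, each crystal either belonging to the phase of interest or not, and compares the spread of systematic point counts with the spread the binomial formula predicts. All of the functions and numbers are invented for this illustration.

    # Invented 1-D model of a thin section: crystals of random width along a line.
    import random
    from math import sqrt
    from statistics import mean, stdev

    def make_strip(length=1000.0, mean_crystal=40.0, p_phase=0.3):
        """Return a list of (start, end, is_phase) crystal segments covering the strip."""
        crystals, x = [], 0.0
        while x < length:
            w = random.expovariate(1.0 / mean_crystal)
            crystals.append((x, min(x + w, length), random.random() < p_phase))
            x += w
        return crystals

    def phase_at(crystals, pt):
        for a, b, is_phase in crystals:
            if a <= pt < b:
                return is_phase
        return False

    def systematic_estimate(crystals, length, n_points):
        """Evenly spaced points with a random start, as a mechanical stage would produce."""
        spacing = length / n_points
        start = random.uniform(0.0, spacing)
        return mean(phase_at(crystals, start + i * spacing) for i in range(n_points))

    def independent_estimate(crystals, length, n_points):
        """Independent uniform random points, which is what the binomial formula assumes."""
        return mean(phase_at(crystals, random.uniform(0.0, length)) for _ in range(n_points))

    random.seed(1)
    length, n_points = 1000.0, 50
    crystals = make_strip(length)

    sys_est = [systematic_estimate(crystals, length, n_points) for _ in range(2000)]
    ind_est = [independent_estimate(crystals, length, n_points) for _ in range(2000)]

    p = mean(ind_est)
    print(f"binomial-formula SD:             {sqrt(p * (1 - p) / n_points):.4f}")
    print(f"observed SD, independent points: {stdev(ind_est):.4f}")  # close to the formula
    print(f"observed SD, systematic points:  {stdev(sys_est):.4f}")  # typically larger here: neighbouring points share crystals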

The sampling was done in a systematic manner because it was cheap to build a device that supported systematic sampling. Earlier devices relied on the Rosiwal method; those were also cheap to build, but their use was tedious. The newer point counting devices supported sampling speeds of 1500 points per hour. Compare that to modern computerized systems and you see that the older mechanical systems with automatic totals were pretty darn good.

By the 1960s the binomial distribution formula came under discussion again. The underlying mathematics was brought to light once more, and its limitations and requirements were discussed openly in the literature.

Eventually the matter was addressed in more theoretical terms. Instead of justifying the use of a particular formula, work was done to determine what was actually required. The use of the binomial distribution formula declined as better methods were developed. These newer methods dropped the independence requirement, so the results obtained now are in keeping with the nature of the sampling that is being done.
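For readers who want to see the flavor of such a method, here is a sketch of a quadratic variance approximation for a systematic series of counts, along the lines of the Gundersen and Jensen approach; the exact constants and noise corrections differ among published refinements, and the counts below are invented, so treat this as an illustration rather than a reference implementation.

    # Sketch of a quadratic-approximation CE for a systematic series of counts.
    from math import sqrt

    def systematic_ce(counts):
        """Coefficient of error estimate that does not assume independent samples."""
        total = sum(counts)
        n = len(counts)
        A = sum(c * c for c in counts)
        B = sum(counts[i] * counts[i + 1] for i in range(n - 1))
        C = sum(counts[i] * counts[i + 2] for i in range(n - 2))
        variance = (3 * A - 4 * B + C) / 12.0   # quadratic approximation to the sampling variance
        return sqrt(max(variance, 0.0)) / total

    # Invented point counts from eight systematic sections:
    counts = [42, 55, 61, 70, 66, 58, 47, 31]
    print(f"CE = {systematic_ce(counts):.3f}")

The products of neighboring counts in this estimator are exactly where the dependence between systematic samples is taken into account, something the binomial formula has no way to express.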