A study of the distribution of the variance in small samples.

Document type:
Article
Author:
Le Roux, JM
Affiliation:
Stellenbosch University
Journal:
BIOMETRIKA
ISSN:
0006-3444
DOI:
10.1093/biomet/23.1-2.134
Publication date:
1931
Pages:
134-190
Abstract:
The main object of this investigation is to study the first four moments of D(s²), the sampling distribution of the variance, in order to obtain suitable Pearson curves to represent D(s²) for populations which can themselves be represented by Pearson curves. The manner in which D(s²) alters as the population and the sample size alter is traced out: (1) Type I, Type VI and Type IV distributions are indicated, the last being mostly associated with very leptokurtic Type IV populations; (2) a series of populations is found whose D(s²) can be approximately represented by Type III curves for all sample sizes; (3) limits on population and sample size are given within which D(s²) is approximately normal. A general procedure is outlined for fitting these curves by fixing the start at s² = 0 and using the first three moments; with this method, Type IV curves do not arise. Probability integrals of the Type I and Type VI D(s²) curves with fixed start are discussed, Type III being treated as a special case. Methods of obtaining the betas of D(s²) from the parent population constants are given, with special reference to difficulties associated with very leptokurtic curves. The methods outlined are tested on 21 experimental distributions of s².

These results seem to lead to the following conclusions: (1) unless the sampled population is extremely leptokurtic, we are not likely to be misled if, knowing only β₁ and β₂ of the sampled population, we calculate β₁ and β₂ of D(s²) by the difference formulae. If, however, the population is very leptokurtic (e.g., β₁ = 0, β₂ = 4.1; β₁ = 1.2, β₂ = 5.8), β₁ and β₂ of D(s²) should preferably be calculated from the actual higher betas of that population; (2) whether actual higher moments or difference-formula betas are used, and whether or not the population is leptokurtic, the fixed-start method seems on the whole more reliable than the four-moment fit. The former is very satisfactory for all sample sizes, giving an average P of about 0.49 in the goodness-of-fit tests; the latter sometimes leads to quite bad or even impossible fits for very small samples; (3) the Type I and Type VI fixed-start equations used to describe D(s²) seem quite adequate for samples of 10 or more, and low values of P in the χ² tests seem to be due mainly to chance fluctuations. The same curves may be used for samples as small as 5, but there the fits seem a little less satisfactory; (4) we are liable to go far wrong if we use the normal-theory variance distribution to represent D(s²) in samples from non-normal parent populations; (5) the Type III fixed-start curve appears to be applicable for many parent populations in the neighborhood of the so-called D(s²) III line.
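The comparisons in this abstract turn on the moments, and hence β₁ and β₂, of D(s²). The Python sketch below is our own illustration and does not reproduce Le Roux's procedure or difference formulae. It simulates an empirical D(s²) from a chosen parent, computes its betas, and sets them beside two reference curves: the normal-theory distribution (n s²/σ² distributed as χ² with n − 1 degrees of freedom) and a Type III (gamma) curve with its start fixed at s² = 0. Assumptions: s² uses divisor n; the exponential parent, the sample size, and all function names (`s2`, `exact_mean_var_s2`, `betas`, `normal_theory_betas`, `type_iii_fixed_start`, `simulate_s2`) are hypothetical. A gamma curve with fixed start has only two free parameters, so here it is matched on mean and variance alone, whereas the general fixed-start procedure described above uses the first three moments.

```python
import math
import random


def s2(xs):
    """Sample variance with divisor n (assumed convention; the abstract does not state it)."""
    n = len(xs)
    m = math.fsum(xs) / n
    return math.fsum((x - m) ** 2 for x in xs) / n


def exact_mean_var_s2(n, sigma2, mu4):
    """Exact mean and variance of s^2 (divisor n) from the parent's sigma^2 and mu_4."""
    mean = (n - 1) / n * sigma2
    var = (n - 1) / n ** 3 * ((n - 1) * mu4 - (n - 3) * sigma2 ** 2)
    return mean, var


def betas(values):
    """Pearson's beta1 = mu3^2 / mu2^3 and beta2 = mu4 / mu2^2 of an empirical distribution."""
    N = len(values)
    m = math.fsum(values) / N
    mu2 = math.fsum((v - m) ** 2 for v in values) / N
    mu3 = math.fsum((v - m) ** 3 for v in values) / N
    mu4 = math.fsum((v - m) ** 4 for v in values) / N
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def normal_theory_betas(n):
    """Betas of D(s^2) for a normal parent, where n*s^2/sigma^2 ~ chi-square(n - 1)."""
    nu = n - 1
    return 8 / nu, 3 + 12 / nu


def type_iii_fixed_start(mean, var):
    """Gamma (Type III) curve with start fixed at s^2 = 0, matched on mean and variance.

    Returns shape k, scale theta and the curve's own beta1 = 4/k, beta2 = 3 + 6/k.
    """
    k = mean ** 2 / var
    theta = var / mean
    return k, theta, 4 / k, 3 + 6 / k


def simulate_s2(draw, n, reps, seed=1):
    """Empirical D(s^2): s^2 from `reps` independent samples of size n drawn via draw(rng)."""
    rng = random.Random(seed)
    return [s2([draw(rng) for _ in range(n)]) for _ in range(reps)]


if __name__ == "__main__":
    n, reps = 10, 20000
    # Hypothetical leptokurtic parent: exponential with rate 1, whose central moments
    # are sigma^2 = 1, mu_3 = 2, mu_4 = 9 (so beta1 = 4, beta2 = 9).
    values = simulate_s2(lambda rng: rng.expovariate(1.0), n, reps)
    mean, var = exact_mean_var_s2(n, sigma2=1.0, mu4=9.0)
    print("exact mean, variance of s^2:", round(mean, 4), round(var, 4))
    print("simulated mean of s^2:", round(math.fsum(values) / reps, 4))
    print("simulated beta1, beta2:", [round(b, 3) for b in betas(values)])
    print("normal-theory beta1, beta2:", [round(b, 3) for b in normal_theory_betas(n)])
    k, theta, b1, b2 = type_iii_fixed_start(mean, var)
    print("Type III fixed-start beta1, beta2:", round(b1, 3), round(b2, 3))
```

For a normal parent the two reference curves coincide: matching mean and variance gives shape k = (n − 1)/2, which is exactly the χ² case, so β₁ = 8/(n − 1) and β₂ = 3 + 12/(n − 1). For the exponential parent with n = 10 and σ² = 1, the exact variance of s² is 0.666, against 0.18 for a normal parent with the same σ², which illustrates the kind of discrepancy behind conclusion (4).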