-
Authors: Bartlett, PL; Jordan, MI; McAuliffe, JD
Affiliations: University of California System; University of California Berkeley; University of California System; University of California Berkeley
Abstract: Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relations...
-
Authors: Nan, B; Lin, XH; Lisabeth, LD; Harlow, SD
Affiliations: University of Michigan System; University of Michigan; Harvard University; Harvard T.H. Chan School of Public Health; University of Michigan System; University of Michigan; University of Michigan System; University of Michigan
Abstract: A question of significant interest in female reproductive aging is to identify bleeding criteria for menopausal transition. Although various bleeding criteria, or markers, have been proposed for menopausal transition, their validity has not been adequately examined. The Tremin Trust data were collected from a long-term cohort study that followed a group of women throughout their whole reproductive life. Such data provide a unique opportunity for evaluating the utility of a bleeding criterion-b...
-
Authors: Nettleton, Dan
Affiliations: Iowa State University
-
Authors: Fan, Juanjuan; Su, Xiao-Gang; Levine, Richard A.; Nunn, Martha E.; LeBlanc, Michael
Affiliations: California State University System; San Diego State University; State University System of Florida; University of Central Florida; Boston University; Boston University; Fred Hutchinson Cancer Center
Abstract: In this article the regression tree method is extended to correlated survival data and applied to the problem of developing objective prognostic classification rules in periodontal research. The robust logrank statistic is used as the splitting statistic to measure the between-node difference in survival, while adjusting for correlation among failure times from the same patient. The partition-based survival function estimator is shown to converge to the true conditional survival function. Toot...
-
Authors: Quale, Christopher M.; Van der Laan, Mark J.; Robins, James R.
Affiliations: Novo Nordisk; University of California System; University of California Berkeley; Harvard University; Harvard T.H. Chan School of Public Health
Abstract: Estimation of the survival curve for independently right-censored bivariate failure time data is a problem that has been studied extensively over the past 20 years. In this article we propose a new class of estimators for the bivariate survivor function based on locally efficient (LE) estimation. The LE estimator takes bivariate estimators F_n and G_n of the distributions of the time variables (T_1, T_2) and the censoring variables (C_1, C_2), and maps them to the resulting estimator LE. If F_n ...
-
Authors: Liu, Yufeng; Shen, Xiaotong
Affiliations: University of North Carolina; University of North Carolina Chapel Hill; University of Minnesota System; University of Minnesota Twin Cities
Abstract: In binary classification, margin-based techniques usually deliver high performance. As a result, a multicategory problem is often treated as a sequence of binary classifications. In the absence of a dominating class, this treatment may be suboptimal and may yield poor performance, such as for support vector machines (SVMs). We propose a novel multicategory generalization of psi-learning that treats all classes simultaneously. The new generalization eliminates this potential problem while at th...
-
Authors: Lin, DY; Zeng, D
Affiliations: University of North Carolina; University of North Carolina Chapel Hill
Abstract: A haplotype is a specific sequence of nucleotides on a single chromosome. The population associations between haplotypes and disease phenotypes provide critical information about the genetic basis of complex human diseases. Standard genotyping techniques cannot distinguish the two homologous chromosomes of an individual, so only the unphased genotype (i.e., the combination of the two homologous haplotypes) is directly observable. Statistical inference about haplotype-phenotype associations bas...
-
Authors: Kong, Yong
Affiliations: National University of Singapore; National University of Singapore
Abstract: Exact distributions of run statistics are traditionally obtained using combinatorial methods, which, under certain situations, become very tedious. Run distributions of multiple object systems, although appearing frequently in applications from various fields, such as computational biology, are not commonly used, due in part to the lack of easy-to-use formulas. In this article, a method for evaluating partition functions of lattice models in the field of statistical mechanics is used to develo...
-
Authors: Mark, Steven D.; Katki, Hormuzd A.
Affiliations: University of Colorado System; University of Colorado Anschutz Medical Campus; University of Colorado Denver; National Institutes of Health (NIH) - USA; NIH National Cancer Institute (NCI); NIH National Cancer Institute- Division of Cancer Epidemiology & Genetics
Abstract: Since 1986, we have been studying a cohort of individuals from a region in China with epidemic rates of gastric cardia cancer and have conducted numerous two-stage studies to assess the association of various exposures with this cancer. Two-stage studies are a commonly used statistical design. Stage one involves observing the outcomes and accessible baseline covariate information on all cohort members, and stage two involves using the stage one observations to select a subset of the cohort for...
-
Authors: Morales, KH; Ibrahim, JG; Chen, CJ; Ryan, LM
Affiliations: University of Pennsylvania; University of Pennsylvania; University of North Carolina; University of North Carolina Chapel Hill; National Taiwan University; Harvard University; Harvard T.H. Chan School of Public Health
Abstract: An important component of quantitative risk assessment involves characterizing the dose-response relationship between an environmental exposure and adverse health outcome and then computing a benchmark dose, or the exposure level that yields a suitably low risk. This task is often complicated by model choice considerations, because risk estimates depend on the model parameters. We propose using Bayesian methods to address the problem of model selection and derive a model-averaged version of t...