-
Authors: Chen, K; Jin, ZZ
Affiliations: Hong Kong University of Science & Technology; Columbia University
Abstract: This article considers the analysis of clustered data via partial linear regression models. Adopting the idea of modeling the within-cluster correlation from the method of generalized estimating equations, a least squares type estimate of the slope parameter is obtained through piecewise local polynomial approximation of the nonparametric component. This slope estimate has several advantages: (a) It attains n^{1/2}-consistency without undersmoothing; (b) it is efficient when correct within-clus...
-
Authors: Copt, S; Victoria-Feser, MP
Affiliations: University of Sydney; University of Geneva
Abstract: Mixed linear models are used to analyze data in many settings. These models have a multivariate normal formulation in most cases. The maximum likelihood estimator (MLE) or the residual MLE (REML) is usually chosen to estimate the parameters. However, the latter are based on the strong assumption of exact multivariate normality. Welsh and Richardson have shown that these estimators are not robust to small deviations from multivariate normality. This means that in practice a small proportion of ...
-
Authors: Jiang, JM; Lahiri, R
Affiliations: University of California System; University of California Davis; University System of Maryland; University of Maryland College Park
Abstract: In this article we introduce a general methodology for producing a model-assisted empirical best predictor (EBP) of a finite population domain mean using data from a complex survey. Our method improves on the commonly used design-consistent survey estimator by using a suitable mixed model. Such a model combines information from related sources, such as census and administrative data. Unlike a purely model-based EBP, the proposed model-assisted EBP converges in probability to the customary desi...
-
Authors: Zhang, P; Wang, XG; Song, PXK
Affiliations: University of Waterloo; York University - Canada
Abstract: We introduce a novel statistical procedure for clustering categorical data based on Hamming distance (HD) vectors. The proposed method is conceptually simple and computationally straightforward, because it does not require any specific statistical models or any convergence criteria. Moreover, unlike most currently existing algorithms that compute the class membership or membership probability for every data point at each iteration, our algorithm sequentially extracts clusters from the given da...
-
Authors: Raftery, AE; Dean, N
Affiliations: University of Washington; University of Washington Seattle
Abstract: We consider the problem of variable or feature selection for model-based clustering. The problem of comparing two nested subsets of variables is recast as a model comparison problem and addressed using approximate Bayes factors. A greedy search algorithm is proposed for finding a local optimum in model space. The resulting method selects variables (or features), the number of clusters, and the clustering model simultaneously. We applied the method to several simulated and real examples and fou...
-
Authors: Li, Hongzhe; Hong, Fangxin
Affiliations: University of Pennsylvania; Salk Institute
-
Authors: Duembgen, Lutz; Freitag-Wolf, Sandra; Jongbloed, Geurt
Affiliations: University of Bern; University of Kiel; Vrije Universiteit Amsterdam
Abstract: In this article we consider three nonparametric maximum likelihood estimators based on mixed-case interval-censored data. Apart from the unrestricted estimator, we consider estimators under the assumption that the underlying distribution function of event times is concave or unimodal. Characterizations of the estimates are derived, and algorithms are proposed for their computation. The estimators are shown to be asymptotically consistent, and the benefits of additional constraints are illustra...
-
Authors: Yuan, Ming; Kendziorski, Christina
Affiliations: University System of Georgia; Georgia Institute of Technology; University of Wisconsin System; University of Wisconsin Madison
Abstract: Among the first microarray experiments were those measuring expression over time, and time course experiments remain common. Most methods to analyze time course data attempt to group genes sharing similar temporal profiles within a single biological condition. However, with time course data in multiple conditions, a main goal is to identify differential expression patterns over time. An intuitive approach to this problem would be to apply at each time point any of the many methods for identify...
-
Authors: Ferreira, Jose T. A. S.; Steel, Mark F. J.
Affiliations: University of Warwick
Abstract: We introduce a general perspective on the introduction of skewness into symmetric distributions. Through inverse probability integral transformations we provide a constructive representation of skewed distributions, where the skewing mechanism and the original symmetric distributions are specified separately. We study the effects of the skewing mechanism on, e.g., modality, tail behavior and the amount of skewness generated. The representation is used to introduce novel classes of skewed distr...
-
Authors: Griffin, JE; Steel, MFJ
Affiliations: University of Warwick
Abstract: In this article we propose a new framework for Bayesian nonparametric modeling with continuous covariates. In particular, we allow the nonparametric distribution to depend on covariates through ordering the random variables building the weights in the stick-breaking representation. We focus mostly on the class of random distributions that induces a Dirichlet process at each covariate value. We derive the correlation between distributions at different covariate values and use a point process to...