-
Authors: Thas, Olivier; De Neve, Jan; Clement, Lieven; Ottoy, Jean-Pierre
Affiliations: Ghent University; University of Wollongong
Abstract: We present a semiparametric statistical model for the probabilistic index, which can be defined as P(Y ≼ Y*), where Y and Y* are independent random response variables associated with covariate patterns X and X* respectively. A link function defines the relationship between the probabilistic index and a linear predictor. Asymptotic normality of the estimators and consistency of the covariance matrix estimator are established through semiparametric theory. The model is illustrated with several exa...
-
Authors: Fan, Jianqing; Feng, Yang; Tong, Xin
Affiliations: Princeton University; Columbia University
Abstract: For high dimensional classification, it is well known that naively performing the Fisher discriminant rule leads to poor results due to diverging spectra and accumulation of noise. Therefore, researchers proposed independence rules to circumvent the diverging spectra, and sparse independence rules to mitigate the issue of accumulation of noise. However, in biological applications, often a group of correlated genes are responsible for clinical outcomes, and the use of the covariance informati...
-
Authors: Tibshirani, Robert; Bien, Jacob; Friedman, Jerome; Hastie, Trevor; Simon, Noah; Taylor, Jonathan; Tibshirani, Ryan J.
Affiliations: Stanford University
Abstract: We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui and his colleagues have proposed SAFE rules, based on univariate inner products between each predictor and the outcome, which guarantee that a coefficient will be 0 in the solution vector. This provides a reduction in the number of variables that need to be entered into the optimization. We propose strong rules that are very simple and yet screen out far more predicto...
-
Authors: Casella, G.; Roberts, G.
-
Authors: Allen, Genevera I.; Tibshirani, Robert
Affiliations: Baylor College of Medicine; Rice University; Stanford University
Abstract: We consider the problem of large-scale inference on the row or column variables of data in the form of a matrix. Many of these data matrices are transposable, meaning that neither the row variables nor the column variables can be considered independent instances. An example of this scenario is detecting significant genes in microarrays when the samples may be dependent because of latent variables or unknown batch effects. By modelling this matrix data by using the matrix variate normal distribu...
-
Authors: Ambroise, Christophe; Matias, Catherine
Affiliations: Universite Paris Saclay; Centre National de la Recherche Scientifique (CNRS)
Abstract: Random-graph mixture models are very popular for modelling real data networks. Parameter estimation procedures usually rely on variational approximations, either combined with the expectation-maximization (EM) algorithm or with Bayesian approaches. Despite good results on synthetic data, the validity of the variational approximation is, however, not established. Moreover, these variational approaches aim at approximating the maximum likelihood or the maximum a posteriori estimators, whose beha...
-
Authors: Dellaportas, Petros; Kontoyiannis, Ioannis
Affiliations: Athens University of Economics & Business
Abstract: A general methodology is introduced for the construction and effective application of control variates to estimation problems involving data from reversible Markov chain Monte Carlo samplers. We propose the use of a specific class of functions as control variates, and we introduce a new consistent estimator for the values of the coefficients of the optimal linear combination of these functions. For a specific Markov chain Monte Carlo scenario, the form and proposed construction of the contro...
-
Authors: Lee, Stephen M. S.
Affiliations: University of Hong Kong
Abstract: We consider the general problem of constructing confidence regions for possibly multi-dimensional parameters when we have available more than one approach for the construction. These approaches may be motivated by different model assumptions, different levels of approximation, different settings of tuning parameters or different Monte Carlo algorithms. Their effectiveness is often governed by different sets of conditions which are difficult to vindicate in practice. We propose two procedur...
-
Authors: Evans, Steven N.; Matsen, Frederick A.
Affiliations: Fred Hutchinson Cancer Center; University of California System; University of California Berkeley
Abstract: It is now common to survey microbial communities by sequencing nucleic acid material extracted in bulk from a given environment. Comparative methods are needed that indicate the extent to which two communities differ given data sets of this type. UniFrac, which gives a somewhat ad hoc phylogenetics-based distance between two communities, is one of the most commonly used tools for these analyses. We provide a foundation for such methods by establishing that, if we equate a metagenomic sample ...
-
Authors: Chen, Kehui; Mueller, Hans-Georg
Affiliations: University of California System; University of California Davis
Abstract: Motivated by the conditional growth charts problem, we develop a method for conditional quantile analysis when predictors take values in a functional space. The proposed method aims at estimating conditional distribution functions under a generalized functional regression framework. This approach facilitates balancing of model flexibility and the curse of dimensionality for the infinite dimensional functional predictors. Its good performance in comparison with other methods, both for sparsel...