-
Authors: Balakrishnan, Sivaraman; Wainwright, Martin J.; Yu, Bin
Affiliations: University of California System; University of California Berkeley; Carnegie Mellon University
Abstract: The EM algorithm is a widely used tool in maximum-likelihood estimation in incomplete data problems. Existing theoretical work has focused on conditions under which the iterates or likelihood values converge, and the associated rates of convergence. Such guarantees do not distinguish whether the ultimate fixed point is a near global optimum or a bad local optimum of the sample likelihood, nor do they relate the obtained fixed point to the global optima of the idealized population likelihood (o...
-
Authors: Jin, Jiashun; Ke, Zheng Tracy; Wang, Wanjie
Affiliations: Carnegie Mellon University; University of Chicago; University of Pennsylvania
Abstract: Consider a two-class clustering problem where we observe $X_i = \ell_i \mu + Z_i$, $Z_i \overset{\text{i.i.d.}}{\sim} N(0, I_p)$, $1 \le i \le n$. The feature vector $\mu \in \mathbb{R}^p$ is unknown but is presumably sparse. The class labels $\ell_i \in \{-1, 1\}$ are also unknown and the main interest is to estimate them. We are interested in the statistical limits. In the two-dimensional phase space calibrating the rarity and strengths of useful features, we find the precise demarcation for the Region of ...
-
Authors: Choi, David
Affiliations: Carnegie Mellon University
Abstract: Performance bounds are given for exploratory co-clustering/blockmodeling of bipartite graph data, where we assume the rows and columns of the data matrix are samples from an arbitrary population. This is equivalent to assuming that the data is generated from a nonsmooth graphon. It is shown that co-clusters found by any method can be extended to the row and column populations, or equivalently that the estimated blockmodel approximates a blocked version of the generative graphon, with estimatio...
-
Authors: Mousavi, Ali; Maleki, Arian; Baraniuk, Richard G.
Affiliations: Rice University; Columbia University
Abstract: This paper studies the optimal tuning of the regularization parameter in LASSO or the threshold parameters in approximate message passing (AMP). Considering a model in which the design matrix and noise are zero-mean i.i.d. Gaussian, we propose a data-driven approach for estimating the regularization parameter of LASSO and the threshold parameters in AMP. Our estimates are consistent, that is, they converge to their asymptotically optimal values in probability as n, the number of observations, ...
-
Authors: Fallat, Shaun; Lauritzen, Steffen; Sadeghi, Kayvan; Uhler, Caroline; Wermuth, Nanny; Zwiernik, Piotr
Affiliations: University of Regina; University of Copenhagen; University of Cambridge; Massachusetts Institute of Technology (MIT); Institute of Science & Technology - Austria; Chalmers University of Technology; Johannes Gutenberg University of Mainz; Pompeu Fabra University
Abstract: We discuss properties of distributions that are multivariate totally positive of order two (MTP2) related to conditional independence. In particular, we show that any independence model generated by an MTP2 distribution is a compositional semi-graphoid which is upward-stable and singleton-transitive. In addition, we prove that any MTP2 distribution satisfying an appropriate support condition is faithful to its concentration graph. Finally, we analyze factorization properties of MTP2 distributio...
-
Authors: Lee, Young K.; Mammen, Enno; Nielsen, Jens P.; Park, Byeong U.
Affiliations: Kangwon National University; Ruprecht Karls University Heidelberg; City St Georges, University of London; Seoul National University (SNU)
Abstract: In this paper, we consider a new structural model for in-sample density forecasting. In-sample density forecasting is to estimate a structured density on a region where data are observed and then reuse the estimated structured density on some region where data are not observed. Our structural assumption is that the density is a product of one-dimensional functions with one function sitting on the scale of a transformed space of observations. The transformation involves another unknown one-dime...
-
Authors: Wang, Y. X. Rachel; Bickel, Peter J.
Affiliations: Stanford University; University of California System; University of California Berkeley
Abstract: The stochastic block model (SBM) provides a popular framework for modeling community structures in networks. However, more attention has been devoted to problems concerning estimating the latent node labels and the model parameters than the issue of choosing the number of blocks. We consider an approach based on the log likelihood ratio statistic and analyze its asymptotic properties under model misspecification. We show the limiting distribution of the statistic in the case of underfitting is...
-
Authors: Xu, Gongjun
Affiliations: University of Minnesota System; University of Minnesota Twin Cities
Abstract: Statistical latent class models are widely used in social and psychological research, yet it is often difficult to establish the identifiability of the model parameters. In this paper, we consider the identifiability issue of a family of restricted latent class models, where the restriction structures are needed to reflect pre-specified assumptions on the related assessment. We establish the identifiability results in the strict sense and specify which types of restriction structure would gi...
-
Authors: Anevski, Dragi; Gill, Richard D.; Zohren, Stefan
Affiliations: Lund University; Leiden University - Excl LUMC; Leiden University; University of Oxford
Abstract: In the context of a species sampling problem, we discuss a nonparametric maximum likelihood estimator for the underlying probability mass function. The estimator is known in the computer science literature as the high profile estimator. We prove strong consistency and derive the rates of convergence, for an extended model version of the estimator. We also study a sieved estimator for which similar consistency results are derived. Numerical computation of the sieved estimator is of great intere...
-
Authors: Bai, Shuyang; Taqqu, Murad S.
Affiliations: University System of Georgia; University of Georgia; Boston University
Abstract: For long-memory time series, inference based on resampling is of crucial importance, since the asymptotic distribution can often be non-Gaussian and is difficult to determine statistically. However, due to the strong dependence, establishing the asymptotic validity of resampling methods is nontrivial. In this paper, we derive an efficient bound for the canonical correlation between two finite blocks of a long-memory time series. We show how this bound can be applied to establish the asymptotic...