-
Authors: Dombowsky, Alexander; Dunson, David B.
Affiliations: Duke University; Duke University
Abstract: While there is an immense literature on Bayesian methods for clustering, the multiview case has received little attention. This problem focuses on obtaining distinct but statistically dependent clusterings in a common set of entities for different data types. For example, clustering patients into subgroups with subgroup membership varying according to the domain of the patient variables. A challenge is how to model the across-view dependence between the partitions of patients into subgroups. T...
-
Authors: Wei, Waverly; Ma, Xinwei; Wang, Jingshen
Affiliations: University of Southern California; University of California System; University of California San Diego
Abstract: Understanding treatment effect heterogeneity has become an increasingly popular task in various fields, as it helps design personalized advertisements in e-commerce or targeted treatment in biomedical studies. However, most of the existing work in this research area focused on either analysing observational data based on strong causal assumptions or conducting post hoc analyses of randomized controlled trial data, and there has been limited effort dedicated to the design of randomized experime...
-
Authors: Sesia, Matteo; Wang, Y. X. Rachel; Tong, Xin
Affiliations: University of Southern California; University of Southern California; University of Sydney; University of Hong Kong
Abstract: This article develops a conformal prediction method for classification tasks that can adapt to random label contamination in the calibration sample, often leading to more informative prediction sets with stronger coverage guarantees compared to existing approaches. This is obtained through a precise characterization of the coverage inflation (or deflation) suffered by standard conformal inferences in the presence of label contamination, which is then made actionable through a new calibration a...
-
Authors: Zhang, Jiawei; Yang, Yuhong; Ding, Jie
Affiliations: University of Kentucky; University of Minnesota System; University of Minnesota Twin Cities
Abstract: It is quite popular nowadays for researchers and data analysts holding different datasets to seek assistance from each other to enhance their modelling performance. We consider a scenario where different learners hold datasets with potentially distinct variables, and their observations can be aligned by a nonprivate identifier. Their collaboration faces the following difficulties: first, learners may need to keep data values or even variable names undisclosed due to, e.g. commercial interest o...
-
Authors: Craig, Erin; Pilanci, Mert; Le Menestrel, Thomas; Narasimhan, Balasubramanian; Rivas, Manuel A.; Gullaksen, Stein-Erik; Dehghannasiri, Roozbeh; Salzman, Julia; Taylor, Jonathan; Tibshirani, Robert
Affiliations: Stanford University; Stanford University; Stanford University; Stanford University; University of Bergen; Haukeland University Hospital; University of Bergen; Stanford University
Abstract: Pre-training is a powerful paradigm in machine learning to pass information across models. For example, suppose one has a modest-sized dataset of images of cats and dogs and plans to fit a deep neural network to classify them. With pre-training, we start with a neural network trained on a large corpus of images of not just cats and dogs but hundreds of classes. We fix all network weights except the top layer(s) and fine tune on our dataset. This often results in dramatically better performance...
-
Authors: Cheng, Chao; Li, Fan
Affiliations: Yale University; Yale University
Abstract: We consider assessing causal mediation in the presence of a posttreatment event (examples include noncompliance, a clinical event, or death). We identify natural mediation effects for the entire study population and for each principal stratum characterized by the joint potential values of the posttreatment event. We derive the efficient influence function for each mediation estimand, which motivates a set of multiply robust estimators for inference. The multiply robust estimators are consisten...
-
Authors: Rios, Nicholas; Lin, Dennis K. J.
Affiliations: George Mason University; Purdue University System; Purdue University
Abstract: In an Order-of-Addition (OofA) experiment, the order in which m components are added to a system influences a response. Although much research has been done on optimal OofA experiments, existing methodologies typically assume that all m! orders are possible. However, in many practical examples, there are directed constraints on the pairwise order of components, making some of the m! orders infeasible. These constraints can be represented by a directed acyclic graph (DAG). The goal of the OofA ...
-
Authors: Miao, Wang; Li, Xinyu; Zhang, Ping; Sun, Baoluo
Affiliations: Peking University; National University of Singapore
Abstract: Nonresponse arises frequently in surveys, and follow-ups are routinely made to increase the response rate. In order to monitor the follow-up process, callback data have been used in social sciences and survey studies for decades. In modern surveys, the availability of callback data is increasing because the response rate is decreasing, and follow-ups are essential to collect maximum information. Although callback data are helpful to reduce the bias in surveys, such data have not been widely us...
-
Authors: Sood, Anav; Hastie, Trevor
Affiliations: Stanford University
Abstract: We consider the problem of selecting a small subset of representative variables from a large dataset. In the computer science literature, this dimensionality reduction problem is typically formalized as column subset selection (CSS). Meanwhile, the typical statistical formalization is to find an information-maximizing set of principal variables. This paper shows that these two approaches are equivalent, and moreover, both can be viewed as maximum-likelihood estimation within a certain semi-par...
-
Authors: Yiu, Andrew; Fong, Edwin; Holmes, Chris; Rousseau, Judith
Affiliations: University of Oxford; University of Hong Kong; University of Oxford; Universite PSL; Universite Paris-Dauphine; Centre National de la Recherche Scientifique (CNRS); CNRS - National Institute for Mathematical Sciences (INSMI)
Abstract: We present a new approach to semiparametric inference using corrected posterior distributions. The method allows us to leverage the adaptivity, regularization, and predictive power of nonparametric Bayesian procedures to estimate low-dimensional functionals of interest without being restricted by the holistic Bayesian formalism. Starting from a conventional posterior on the whole data-generating distribution, we correct the marginal posterior for each functional of interest with the help of th...