-
Authors: Ahmed, Hanan; Einmahl, John H. J.; Zhou, Chen
Affiliations: Tilburg University; Erasmus University Rotterdam - Excl Erasmus MC; Erasmus University Rotterdam
Abstract: We consider extreme value analysis in a semi-supervised setting, where we observe, next to the n data on the target variable, n + m data on one or more covariates. This is called the semi-supervised model with n labeled and m unlabeled data. By exploiting the tail dependence between the target variable and the covariates, we derive estimators for the extreme value index and extreme quantiles of the target variable in this setting and establish their asymptotic behavior. Our estimators substant...
-
Authors: Lee, Seong-ho; Ma, Yanyuan; Zhao, Jiwei
Affiliations: University of Seoul; Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park; University of Wisconsin System; University of Wisconsin Madison
Abstract: In studies ranging from clinical medicine to policy research, complete data are usually available from a population P, but the quantity of interest is often sought for a related but different population Q which only has partial data. We consider the setting when both outcome Y and covariate X are available from P but only X is available from Q, under the label shift assumption; that is, the conditional distribution of X given Y is the same in the two populations. To estimate the parameter of i...
-
Authors: Duan, Yunshan; Guo, Shuai; Wang, Wenyi; Mueller, Peter
Affiliations: University of Texas System; University of Texas Austin; University of Texas System; UTMD Anderson Cancer Center
Abstract: Comparison of transcriptomic data across different conditions is of interest in many biomedical studies. In this article, we consider comparative immune cell profiling for early-onset (EO) versus late-onset (LO) colorectal cancer (CRC). EOCRC, diagnosed between ages 18-45, is a rising public health concern that needs to be urgently addressed. However, its etiology remains poorly understood. We work toward filling this gap by identifying homogeneous T cell sub-populations that show significantl...
-
Authors: Gu, Yu; Zeng, Donglin; Lin, D. Y.
Affiliations: University of Hong Kong; University of Michigan System; University of Michigan; University of North Carolina; University of North Carolina Chapel Hill; University of North Carolina School of Medicine
Abstract: In studies of chronic diseases, the health status of a subject can often be characterized by a finite number of transient disease states and an absorbing state, such as death. The times of transitions among the transient states are ascertained through periodic examinations and thus interval-censored. The time of reaching the absorbing state is known or right-censored, with the transient state at the previous instant being unobserved. In this article, we provide a general framework for analyzin...
-
Authors: Kuusela, Mikael
Affiliations: Carnegie Mellon University
-
Authors: Tian, Ye; Feng, Yang
Affiliations: Columbia University; New York University
Abstract: Most existing classification methods aim to minimize the overall misclassification error rate. However, in applications such as loan default prediction, different types of errors can have varying consequences. To address this asymmetry issue, two popular paradigms have been developed: the Neyman-Pearson (NP) paradigm and the cost-sensitive (CS) paradigm. Previous studies on the NP paradigm have primarily focused on the binary case, while the multi-class NP problem poses a greater challenge due...
-
Authors: Gao, Zhaoxing; Tsay, Ruey S.
Affiliations: University of Electronic Science & Technology of China; Zhejiang University; University of Chicago
Abstract: This paper proposes a novel dynamic forecasting method using a new supervised Principal Component Analysis (PCA) when a large number of predictors are available. The new supervised PCA provides an effective way to bridge the gap between predictors and the target variable of interest by scaling and combining the predictors and their lagged values, resulting in effective dynamic forecasting. Unlike the traditional diffusion-index approach, which does not learn the relationships between the pr...
-
Authors: Qin, Jing; Liu, Yukun; Li, Moming; Huang, Chiung-Yu
Affiliations: National Institutes of Health (NIH) - USA; NIH National Institute of Allergy & Infectious Diseases (NIAID); East China Normal University; East China Normal University; University of California System; University of California San Francisco
Abstract: Owing to its appealing distribution-free feature, conformal inference has become a popular tool for constructing prediction intervals with a desired coverage rate. In scenarios involving covariate shift, where the shift function needs to be estimated from data, many existing methods resort to data-splitting techniques. However, these approaches often lead to wider intervals and less reliable coverage rates, especially when dealing with finite sample sizes. To address these challenges, we propo...
-
Authors: Linero, Antonio R.
Affiliations: University of Texas System; University of Texas Austin
Abstract: Bayesian additive regression trees have seen increased interest in recent years due to their ability to combine machine learning techniques with principled uncertainty quantification. The Bayesian backfitting algorithm used to fit BART models, however, limits their application to a small class of models for which conditional conjugacy exists. In this article, we greatly expand the domain of applicability of BART to arbitrary generalized BART models by introducing a very simple, tuning-paramete...
-
Authors: Park, Seyoung; Lee, Eun Ryung; Kim, Hyunjin; Zhao, Hongyu
Affiliations: Yonsei University; Yonsei University; Sungkyunkwan University (SKKU); Yale University
Abstract: In high-dimensional multiple response regression problems, the large dimensionality of the coefficient matrix poses a challenge to parameter estimation. To address this challenge, low-rank matrix estimation methods have been developed to facilitate parameter estimation in the high-dimensional regime, where the number of parameters increases with sample size. Despite these methodological advances, accurately predicting multiple responses with limited target data remains a difficult task. To gai...