-
Authors: Wang, Zixiao; Feng, Yi; Liu, Lin
Affiliations: Johns Hopkins University; Johns Hopkins Bloomberg School of Public Health; Shanghai Jiao Tong University; Shanghai Jiao Tong University
-
Authors: Zhou, Yang; Xue, Lirong; Shi, Zhengyu; Wu, Libo; Fan, Jianqing
Affiliations: Fudan University; Princeton University; Fudan University; Fudan University
Abstract: Measuring timely high-resolution socioeconomic outcomes is critical for policymaking and evaluation, but hard to reliably obtain. With the help of machine learning and cheaply available data such as social media and nightlight, it is now possible to predict such indices in fine granularity. This article demonstrates an adaptive way to measure the time trend and spatial distribution of housing vitality (number of occupied houses) with the help of multiple easily accessible datasets: energy, nig...
-
Authors: Xue, Fei; Zhang, Yanqing; Zhou, Wenzhuo; Fu, Haoda; Qu, Annie
Affiliations: University of Pennsylvania; Yunnan University; University of Illinois System; University of Illinois Urbana-Champaign; Eli Lilly; University of California System; University of California Irvine
Abstract: An optimal dynamic treatment regime (DTR) consists of a sequence of decision rules in maximizing long-term benefits, which is applicable for chronic diseases such as HIV infection or cancer. In this article, we develop a novel angle-based approach to search the optimal DTR under a multicategory treatment framework for survival data. The proposed method targets to maximize the conditional survival function of patients following a DTR. In contrast to most existing approaches which are designed t...
-
Authors: Wu, Tung-Yu; Rachel Wang, Y. X.; Wong, Wing H.
Affiliations: Stanford University; University of Sydney; Stanford University
Abstract: Traditional Markov chain Monte Carlo (MCMC) algorithms are computationally intensive and do not scale well to large data. In particular, the Metropolis-Hastings (MH) algorithm requires passing over the entire dataset to evaluate the likelihood ratio in each iteration. We propose a general framework for performing MH-MCMC using mini-batches of the whole dataset and show that this gives rise to approximately a tempered stationary distribution. We prove that the algorithm preserves the modes of t...
-
Authors: Zhou, Yang; Xue, Lirong; Shi, Zhengyu; Wu, Libo; Fan, Jianqing
Affiliations: Fudan University; Fudan University; Fudan University; Princeton University
-
Authors: Barabesi, Lucio; Cerasa, Andrea; Cerioli, Andrea; Perrotta, Domenico
Affiliations: University of Siena; European Commission Joint Research Centre; EC JRC ISPRA Site; University of Parma
Abstract: Benford's law defines a probability distribution for patterns of significant digits in real numbers. When the law is expected to hold for genuine observations, deviation from it can be taken as evidence of possible data manipulation. We derive results on a transform of the significand function that provide motivation for new tests of conformance to Benford's law exploiting its sum-invariance characterization. We also study the connection between sum invariance of the first digit and the corres...
-
Authors: Hessellund, Kristian Bjorn; Xu, Ganggang; Guan, Yongtao; Waagepetersen, Rasmus
Affiliations: Aalborg University; University of Miami
Abstract: We propose a new method for analysis of multivariate point pattern data observed in a heterogeneous environment and with complex intensity functions. We suggest semiparametric models for the intensity functions that depend on an unspecified factor common to all types of points. This is for example well suited for analyzing spatial covariate effects on events such as street crime activities that occur in a complex urban environment. A multinomial conditional composite likelihood function is i...
-
Authors: Hoff, Peter
Affiliations: Duke University
Abstract: This article develops p-values for evaluating means of normal populations that make use of indirect or prior information. A p-value of this type is based on a biased frequentist hypothesis test that has optimal average power with respect to a probability distribution that encodes indirect information about the mean parameter, resulting in a smaller p-value if the indirect information is accurate. In a variety of multiparameter settings, we show how to adaptively estimate the indirect informati...
-
Authors: Betancourt, Brenda; Zanella, Giacomo; Steorts, Rebecca C.
Affiliations: State University System of Florida; University of Florida; Bocconi University; Duke University
Abstract: Traditional Bayesian random partition models assume that the size of each cluster grows linearly with the number of data points. While this is appealing for some applications, this assumption is not appropriate for other tasks such as entity resolution (ER), modeling of sparse networks, and DNA sequencing tasks. Such applications require models that yield clusters whose sizes grow sublinearly with the total number of data points (the microclustering property). Motivated by these issues, we propo...
-
Authors: Mukherjee, Somabha; Agarwal, Divyansh; Zhang, Nancy R.; Bhattacharya, Bhaswar B.
Affiliations: University of Pennsylvania
Abstract: In this article, we propose a nonparametric graphical test based on optimal matching, for assessing the equality of multiple unknown multivariate probability distributions. Our procedure pools the data from the different classes to create a graph based on the minimum non-bipartite matching, and then utilizes the number of edges connecting data points from different classes to examine the closeness between the distributions. The proposed test is exactly distribution-free (the null distribution ...