Local Rademacher complexities and oracle inequalities in risk minimization

Document type:
Article
Authors:
Koltchinskii, Vladimir
Affiliations:
University System of Georgia; Georgia Institute of Technology; University of New Mexico
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/009053606000001019
Publication date:
2006
Pages:
2593-2656
Keywords:
generalization error; convex combinations; model selection; consistency; classification; classifiers; margin bounds
Abstract:
Let $\mathcal{F}$ be a class of measurable functions $f : S \to [0, 1]$ defined on a probability space $(S, \mathcal{A}, P)$. Given a sample $(X_1, \dots, X_n)$ of i.i.d. random variables taking values in $S$ with common distribution $P$, let $P_n$ denote the empirical measure based on $(X_1, \dots, X_n)$. We study an empirical risk minimization problem $P_n f \to \min$, $f \in \mathcal{F}$. Given a solution $\hat{f}_n$ of this problem, the goal is to obtain very general upper bounds on its excess risk $\mathcal{E}_P(\hat{f}_n) := P \hat{f}_n - \inf_{f \in \mathcal{F}} P f$, expressed in terms of relevant geometric parameters of the class $\mathcal{F}$. Using concentration inequalities and other empirical processes tools, we obtain both distribution-dependent and data-dependent upper bounds on the excess risk that are of asymptotically correct order in many examples. The bounds involve localized sup-norms of empirical and Rademacher processes indexed by functions from the class. We use these bounds to develop model selection techniques in abstract risk minimization problems that can be applied to more specialized frameworks of regression and classification.
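
As a rough illustration of the quantities named in the abstract (not the paper's method), the following Python sketch runs empirical risk minimization over a small finite class and computes the excess risk of the minimizer. Everything concrete here is an assumption made for the example: the class members $f_t(x) = (x - t)^2$, the uniform distribution $P$ on $[0, 1]$, the parameter grid, and the function names. The last function estimates the sup-norm of the Rademacher process over the whole class by Monte Carlo; the paper's bounds use localized versions of this quantity, which the sketch does not attempt.

```python
import random

def loss(t, x):
    """Hypothetical class member f_t(x) = (x - t)^2; maps [0, 1] into [0, 1]."""
    return (x - t) ** 2

def empirical_risk(t, sample):
    """P_n f_t: the sample average of f_t."""
    return sum(loss(t, x) for x in sample) / len(sample)

def true_risk(t):
    """P f_t for P uniform on [0, 1]: E(X - t)^2 = 1/3 - t + t^2."""
    return 1.0 / 3.0 - t + t * t

def erm(params, sample):
    """Return the parameter whose function minimizes the empirical risk."""
    return min(params, key=lambda t: empirical_risk(t, sample))

def excess_risk(t_hat, params):
    """P f_hat minus the smallest true risk attainable within the class."""
    return true_risk(t_hat) - min(true_risk(t) for t in params)

def rademacher_sup(params, sample, n_draws, rng):
    """Monte Carlo estimate of E_eps sup_f |(1/n) sum_i eps_i f(X_i)|.

    This is the global (non-localized) sup-norm over the whole class.
    """
    n = len(sample)
    rows = [[loss(t, x) for x in sample] for t in params]
    total = 0.0
    for _ in range(n_draws):
        signs = [rng.choice((-1, 1)) for _ in range(n)]
        total += max(abs(sum(s * v for s, v in zip(signs, row))) / n
                     for row in rows)
    return total / n_draws

rng = random.Random(0)                      # fixed seed for repeatability
params = [k / 10 for k in range(11)]        # assumed finite class: t in {0, 0.1, ..., 1}
sample = [rng.random() for _ in range(200)] # i.i.d. draws from the assumed P
t_hat = erm(params, sample)
gap = excess_risk(t_hat, params)
rad = rademacher_sup(params, sample, 100, rng)
print(f"ERM parameter: {t_hat}, excess risk: {gap:.4f}, Rademacher sup-norm: {rad:.4f}")
```

Because the true risk is available in closed form for this toy distribution, the excess risk can be computed exactly and compared with the data-dependent Rademacher estimate; in practice only the latter is observable.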