
CS 229 T / STATS 231 : Statistical Learning Theory

Weiyun Ma, 2018 (88 citations)


Abstract

Here L is the expected loss, θ̂ is the minimizer of the training loss, and θ∗ is the ground-truth parameter. This result partially explains why the MLE, which has the smallest training loss, is also likely to achieve a small test error when there are enough training examples. One limitation of the above result is that it requires well-specifiedness, i.e., that the data are distributed exactly according to some ground-truth parameter θ∗ in the parameter space. We would like to prove a more general result of the following form, without assuming well-specifiedness:

L(θ̂) − L(θ∗) ≤ f(p, n),  ∀ p, n ≥ 1.
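As an illustration of the well-specified case, the following sketch estimates the excess risk L(θ̂) − L(θ∗) for the Gaussian-mean MLE by Monte Carlo. The model, loss, and parameter values here are assumptions chosen for concreteness: data are drawn from N(θ∗, 1), the loss is the negative log-likelihood (so the excess risk reduces to ½(θ̂ − θ∗)²), and the MLE θ̂ is the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical well-specified setting: data ~ N(theta_star, 1), loss is the
# negative log-likelihood. Then L(theta) - L(theta_star) = 0.5*(theta - theta_star)^2,
# and the MLE theta_hat is the sample mean.
theta_star = 2.0

def excess_risk(n, trials=2000):
    """Monte Carlo estimate of E[L(theta_hat) - L(theta_star)] for sample size n."""
    samples = rng.normal(theta_star, 1.0, size=(trials, n))
    theta_hat = samples.mean(axis=1)          # MLE for each trial
    return np.mean(0.5 * (theta_hat - theta_star) ** 2)

for n in [10, 100, 1000]:
    print(n, excess_risk(n))
```

In this toy case the expected excess risk is 1/(2n), which matches the general shape of a bound f(p, n) that shrinks as the number of training examples n grows (here the dimension p is fixed at 1).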