NeuroCOLT
workshop
on
Generalisation Bounds Less than 0.5
Windsor, 29 April - 2 May 2002
Cumberland
Lodge
"Avoiding Union Bounds"
David McAllester
In analyzing a given learning algorithm it is often tempting
to take a union bound over a large number of different measurements.
Union bounds over large sets can weaken an analysis considerably. We
give two case studies in "direct" concentration inequalities that
avoid large union bounds. The first is the concentration inequality
for the Good-Turing estimator of the missing mass and the second is a
concentration inequality for the error rate of a tabular rule. The
proof methods employed seem fairly general and some open problems will
be presented that we feel should yield to similar methods, in particular
bounds on the cross entropy of statistical natural language models.