Understanding the Generalization Benefit of Model Invariance from a Data Perspective

Year
2021
Type(s)
Conference paper
Author(s)
Sicheng Zhu, Bang An, Furong Huang
Source
Neural Information Processing Systems (NeurIPS), 2021.
Url
https://proceedings.neurips.cc/paper/2021/file/2287c6b8641dd2d21ab050eb9ff795f3-Paper.pdf
BibTeX

Machine learning models that are developed to be invariant under certain types of data transformations have shown improved generalization in practice. However, a principled understanding of why invariance benefits generalization is limited. Given a task, there is often no principled way to select “suitable” data transformations under which model invariance guarantees better generalization. This paper studies the generalization benefit of model invariance by introducing the sample cover induced by transformations, i.e., a representative subset of a dataset that can approximately recover the whole dataset using transformations. For any data transformations, we provide refined generalization bounds for invariant models based on the sample cover. We also characterize the “suitability” of a set of data transformations by the sample covering number induced by the transformations, i.e., the smallest size of its induced sample covers. We show that the generalization bounds can be tightened for “suitable” transformations that have a small sample covering number. In addition, the proposed sample covering number can be empirically evaluated and thus provides principled guidance for selecting transformations to develop model invariance for better generalization. In experiments on CIFAR-10 and a 3D dataset, we empirically evaluate the sample covering numbers of some commonly used transformations and show that a smaller sample covering number for a set of transformations (e.g., the 3D-view transformation) indicates a smaller gap between the test and training error of invariant models, which verifies our propositions.
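
To make the central notions concrete, here is a minimal sketch of the two definitions in LaTeX; the notation (S for the dataset, T for the set of transformations, ε for the cover radius) is ours and may differ from the paper's:

    \[
    \hat{S} \subseteq S \text{ is an } \epsilon\text{-sample cover of } S \text{ induced by } T
    \iff
    \forall x \in S,\ \exists\, \hat{x} \in \hat{S},\ t \in T \ \text{such that} \ \lVert x - t(\hat{x}) \rVert \le \epsilon,
    \]
    \[
    N_T(\epsilon, S) \,=\, \min \bigl\{\, \lvert \hat{S} \rvert \,:\, \hat{S} \text{ is an } \epsilon\text{-sample cover of } S \text{ induced by } T \,\bigr\}.
    \]

Intuitively, a transformation set is “suitable” when a few representatives, pushed around by T, land within ε of every training point.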

Summary of Contributions

Data-perspective understanding.   We explain the generalization benefit of model invariance by characterizing the sample-dependent properties of data transformations. We introduce the notion of a sample cover induced by data transformations and connect a small sample covering number of the transformations to a small generalization bound for invariant models. Since this general connection requires a strong assumption on the data transformations to be instructive, we relax the assumption by further assuming Lipschitzness of the model class, obtaining a refined generalization bound. To make the generalization benefit of model invariance more interpretable, we also consider a linear model class and show that the model complexity of invariant models can be much smaller than that of their unconstrained counterparts, depending on the sample-dependent properties of the data transformations.
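
As an illustrative derivation (ours, not copied from the paper) of why invariance shrinks a linear class: if a linear model f_w(x) = ⟨w, x⟩ is required to be invariant under every t ∈ T, then

    \[
    \langle w, t(x) \rangle = \langle w, x \rangle \ \ \forall t \in T,\ x
    \;\Longrightarrow\;
    w \perp \operatorname{span}\{\, t(x) - x : t \in T,\ x \,\},
    \]

so w is confined to a subspace, and the effective complexity of the invariant class shrinks as the transformations move points in more directions.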


Guidance for data transformation selection.  To guide data transformation selection in practice, we propose using the sample covering number induced by data transformations as a suitability measure that estimates the generalization benefit for invariant models. This measure is model-agnostic and applies to any data transformation. We also introduce an algorithm to estimate the sample covering number in practice.
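
The paper's exact estimation procedure is not reproduced here; since computing a minimum cover is a set-cover problem (NP-hard in general), a natural estimator is greedy. The following Python sketch is ours: the name sample_covering_number, the list-of-callables transforms interface, and the Euclidean metric are illustrative assumptions, not the paper's API.

    import numpy as np

    def sample_covering_number(X, transforms, eps):
        # Greedy upper-bound estimate of the sample covering number
        # N_T(eps, S). X is an (n, d) array holding the dataset S
        # (inputs flattened to vectors); transforms is a list of
        # callables mapping one point to a transformed point (include
        # the identity so every point can cover at least itself);
        # eps is the cover radius.
        n = len(X)
        covers = np.zeros((n, n), dtype=bool)  # covers[i, j]: some t(X[i]) is within eps of X[j]
        for i in range(n):
            orbit = np.stack([t(X[i]) for t in transforms])  # (|T|, d)
            dists = np.linalg.norm(orbit[:, None, :] - X[None, :, :], axis=-1)
            covers[i] = dists.min(axis=0) <= eps
        uncovered = np.ones(n, dtype=bool)
        size = 0
        while uncovered.any():
            # Standard greedy set cover: add the point whose orbit
            # covers the most still-uncovered points.
            best = int((covers & uncovered).sum(axis=1).argmax())
            uncovered &= ~covers[best]
            size += 1
        return size

Greedy set cover only approximates the minimum, so the returned value is an upper bound; for ranking transformation sets against one another, as in the experiments below, the relative ordering is what matters.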


Empirical evaluations.   We empirically estimate the sample covering number for several commonly used data transformations on CIFAR-10 and ShapeNet (a 3D dataset) to evaluate their potential generalization benefit. On ShapeNet, the 3D-view transformation yields a smaller empirical sample covering number than the others by a large margin, with cropping the most favorable among the remaining transformations. We then train invariant models under these data transformations, via data augmentation and invariance loss penalization, to evaluate the actual generalization benefit. The results show a clear correlation between a smaller sample covering number for a set of transformations and a better generalization benefit for invariant models.
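
For concreteness, here is a minimal PyTorch-style sketch of the invariance loss penalization scheme, under the assumption that the penalty is a squared distance between the model's outputs on original and transformed inputs; the exact penalty and weighting used in the paper's experiments may differ, and model, transform, and lam are placeholders:

    import torch.nn.functional as F

    def invariance_penalized_loss(model, x, y, transform, lam=1.0):
        # transform draws a random t from the transformation set and
        # applies it to the batch x (data augmentation).
        x_t = transform(x)
        logits, logits_t = model(x), model(x_t)
        # Classification loss on both views (plain data augmentation) ...
        ce = F.cross_entropy(logits, y) + F.cross_entropy(logits_t, y)
        # ... plus a term pushing the two outputs to agree, i.e.
        # pushing the model toward invariance under t.
        penalty = F.mse_loss(logits_t, logits)
        return ce + lam * penalty

Backpropagating this loss in a standard training loop covers both schemes at once: setting lam = 0 recovers pure data augmentation.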