https://arxiv.org/pdf/2002.05709.pdf
1. Introduction
Two types of visual representation learning
- Generative
- Generate pixels in the input space; model how the data is generated. A generative model tries to capture the underlying structure of the data so it can synthesize new data points that resemble the training data
- Pixel-level generation is computationally expensive and may not be necessary for representation learning
- Discriminative Model
- These models learn the conditional probability distribution p(y | x), which is the probability of the output y given the input x.
- Don't create new inputs; instead, they distinguish between output classes
- Self-supervised discriminative approaches often rely on heuristic pretext tasks, which can limit the generality of the learned representations
<aside>
💡 we introduce a simple framework for contrastive learning of visual representations, which we call SimCLR
</aside>
Major findings:
- Composing multiple data augmentation operations is crucial for defining effective contrastive prediction tasks
- A learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations
- Contrastive learning benefits from normalized embeddings and an appropriately tuned temperature parameter
- Larger batch sizes and longer training help contrastive learning more than supervised learning
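The nonlinear transformation in the second point is a small MLP projection head applied to the encoder output before the contrastive loss. A minimal sketch, assuming the head has the form g(h) = W2·ReLU(W1·h) with illustrative weight shapes (the function and parameter names here are my own, not from the paper's code):

```python
import numpy as np

def projection_head(h, w1, w2):
    """Nonlinear projection z = g(h) = W2 @ ReLU(W1 @ h).

    h:  (batch, d_enc) encoder outputs
    w1: (d_enc, d_hidden), w2: (d_hidden, d_proj) learnable weights
    Returns z, the embeddings fed to the contrastive loss.
    """
    return np.maximum(h @ w1, 0.0) @ w2  # one hidden layer with ReLU

# Illustrative shapes only; in training these weights would be learned.
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 8))
w2 = rng.standard_normal((8, 4))
z = projection_head(h, w1, w2)  # shape (4, 4)
```

The loss is computed on z, but the representation h (before the head) is what gets reused for downstream tasks.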
2. Method
2.1. The Contrastive Learning Framework
- SimCLR learns representations by maximizing agreement between differently augmented views of the same data example via a contrastive loss in the latent space.
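The contrastive objective is the NT-Xent (normalized temperature-scaled cross-entropy) loss: for each augmented view, its partner view of the same image is the positive, and all other 2N−2 views in the batch are negatives. A minimal NumPy sketch, assuming embeddings are arranged so that rows 2k and 2k+1 form a positive pair (the layout and function name are my own choices):

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """NT-Xent loss over 2N embeddings; rows 2k and 2k+1 are a positive pair."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize embeddings
    sim = (z @ z.T) / temperature                      # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = z.shape[0]
    pos = np.arange(n) ^ 1                             # partner index: 0<->1, 2<->3, ...
    # cross-entropy: positive similarity vs. all non-self similarities
    log_prob = sim[np.arange(n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

# Perfectly aligned pairs that are orthogonal to other pairs give a low loss.
z = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
loss = nt_xent_loss(z)
```

Normalizing before the dot product makes the similarity a cosine similarity, and the temperature controls how sharply the softmax concentrates on the hardest negatives.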
Major components
- three simple augmentations: random cropping followed by resize back to the original size, random color distortions, and random Gaussian blur
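The three augmentations can be sketched with plain NumPy; this is an illustrative stand-in (parameter names and ranges are my assumptions, not the paper's exact settings), using nearest-neighbour resizing and a separable Gaussian kernel:

```python
import numpy as np

def random_crop_resize(img, rng, min_scale=0.5):
    """Random crop, then resize back to the original size (nearest-neighbour)."""
    h, w, _ = img.shape
    s = rng.uniform(min_scale, 1.0)
    ch, cw = int(h * s), int(w * s)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    crop = img[y:y + ch, x:x + cw]
    ys = np.arange(h) * ch // h                       # map output rows to crop rows
    xs = np.arange(w) * cw // w
    return crop[ys][:, xs]

def color_distort(img, rng, strength=0.5):
    """Rough color jitter: random per-channel scaling plus a brightness shift."""
    scale = rng.uniform(1 - strength, 1 + strength, size=3)
    shift = rng.uniform(-strength, strength)
    return np.clip(img * scale + shift, 0.0, 1.0)

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur applied along rows then columns, per channel."""
    radius = int(3 * sigma)
    xs = np.arange(-radius, radius + 1)
    k = np.exp(-xs ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, out)

# Two independently augmented views of the same image form a positive pair.
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
view = gaussian_blur(color_distort(random_crop_resize(img, rng), rng))
```

In practice a framework's augmentation pipeline (e.g. torchvision transforms) would be used; the point is that each view passes through the same stochastic sequence of crop, color distortion, and blur.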