https://arxiv.org/pdf/2002.05709.pdf
1. Introduction
Two types of visual representation learning
- Generative
- Generate pixels in the input space; model how the data is generated. A generative model tries to capture the underlying structure of the data so it can synthesize new data points that resemble the training data
- Pixel-level generation is computationally expensive and may not be necessary for representation learning
- Discriminative Model
- These models learn the conditional probability distribution p(y | x), which is the probability of the output y given the input x.
- Don't create new inputs; instead, they distinguish between output classes
- Self-supervised discriminative approaches often rely on heuristic pretext tasks, which can limit the generality of the learned representations
<aside>
💡 we introduce a simple framework for contrastive learning of visual representations, which we call SimCLR
</aside>
Major findings:
- Composing multiple data augmentation operations is crucial for defining effective contrastive prediction tasks
- A learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations
- Contrastive learning benefits from normalized embeddings and an appropriately tuned temperature parameter
- Larger batch sizes and longer training help contrastive learning more than supervised learning
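The nonlinear transformation in the second point is a small MLP projection head applied to the encoder output before the contrastive loss. A minimal sketch, assuming the head has the form g(h) = W2·ReLU(W1·h) with illustrative weight shapes (the function and parameter names here are my own, not from the paper's code):

```python
import numpy as np

def projection_head(h, w1, w2):
    """Nonlinear projection z = g(h) = W2 @ ReLU(W1 @ h).

    h:  (batch, d_enc) encoder outputs
    w1: (d_enc, d_hidden), w2: (d_hidden, d_proj) learnable weights
    Returns z, the embeddings fed to the contrastive loss.
    """
    return np.maximum(h @ w1, 0.0) @ w2  # one hidden layer with ReLU

# Illustrative shapes only; in training these weights would be learned.
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 8))
w2 = rng.standard_normal((8, 4))
z = projection_head(h, w1, w2)  # shape (4, 4)
```

The loss is computed on z, but the representation h (before the head) is what gets reused for downstream tasks.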
2. Method
2.1. The Contrastive Learning Framework
- SimCLR learns representations by maximizing agreement between differently augmented views of the same data example via a contrastive loss in the latent space.
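The contrastive objective is the NT-Xent (normalized temperature-scaled cross-entropy) loss: for each augmented view, its partner view of the same image is the positive, and all other 2N−2 views in the batch are negatives. A minimal NumPy sketch, assuming embeddings are arranged so that rows 2k and 2k+1 form a positive pair (the layout and function name are my own choices):

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """NT-Xent loss over 2N embeddings; rows 2k and 2k+1 are a positive pair."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize embeddings
    sim = (z @ z.T) / temperature                      # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = z.shape[0]
    pos = np.arange(n) ^ 1                             # partner index: 0<->1, 2<->3, ...
    # cross-entropy: positive similarity vs. all non-self similarities
    log_prob = sim[np.arange(n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

# Perfectly aligned pairs that are orthogonal to other pairs give a low loss.
z = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
loss = nt_xent_loss(z)
```

Normalizing before the dot product makes the similarity a cosine similarity, and the temperature controls how sharply the softmax concentrates on the hardest negatives.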
Major components
- three simple augmentations: random cropping followed by resize back to the original size, random color distortions, and random Gaussian blur
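The three augmentations can be sketched with plain NumPy; this is an illustrative stand-in (parameter names and ranges are my assumptions, not the paper's exact settings), using nearest-neighbour resizing and a separable Gaussian kernel:

```python
import numpy as np

def random_crop_resize(img, rng, min_scale=0.5):
    """Random crop, then resize back to the original size (nearest-neighbour)."""
    h, w, _ = img.shape
    s = rng.uniform(min_scale, 1.0)
    ch, cw = int(h * s), int(w * s)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    crop = img[y:y + ch, x:x + cw]
    ys = np.arange(h) * ch // h                       # map output rows to crop rows
    xs = np.arange(w) * cw // w
    return crop[ys][:, xs]

def color_distort(img, rng, strength=0.5):
    """Rough color jitter: random per-channel scaling plus a brightness shift."""
    scale = rng.uniform(1 - strength, 1 + strength, size=3)
    shift = rng.uniform(-strength, strength)
    return np.clip(img * scale + shift, 0.0, 1.0)

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur applied along rows then columns, per channel."""
    radius = int(3 * sigma)
    xs = np.arange(-radius, radius + 1)
    k = np.exp(-xs ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, out)

# Two independently augmented views of the same image form a positive pair.
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
view = gaussian_blur(color_distort(random_crop_resize(img, rng), rng))
```

In practice a framework's augmentation pipeline (e.g. torchvision transforms) would be used; the point is that each view passes through the same stochastic sequence of crop, color distortion, and blur.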