VQGAN+CLIP — How does it work?

Early stages of training on the prompt “A high-tech outer circle with a low-tech inner filling trending on art station”
  1. What is VQGAN+CLIP?
  2. Who made VQGAN+CLIP?
  3. How does it work technically?
  4. What is VQGAN?
  5. What is CLIP?
  6. How do VQGAN and CLIP work together?
  7. What about the training data?
  8. Further reading and cool links

1. What is VQGAN+CLIP?

2. Who made VQGAN+CLIP?

3. How does it work technically?

4. What is VQGAN?

  • a type of neural network architecture
  • VQGAN = Vector Quantized Generative Adversarial Network
  • first proposed in the paper “Taming Transformers for High-Resolution Image Synthesis” by Heidelberg University (2020)
  • it combines convolutional neural networks (traditionally used for images) with Transformers (traditionally used for language)
  • it excels at generating high-resolution images
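The “Vector Quantized” part of the name refers to a discrete codebook: every vector the encoder produces gets snapped to its nearest learned codebook entry. Here is a minimal sketch of that quantization step, using a tiny hypothetical codebook with made-up values (the real VQGAN learns a much larger codebook of higher-dimensional embeddings):

```python
import numpy as np

# Hypothetical toy codebook: 4 "learned" embedding vectors of dimension 3.
# In a real VQGAN these are trained jointly with the encoder and decoder.
codebook = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
])

def quantize(z):
    """Replace an encoder output vector z with its nearest codebook entry."""
    # Squared Euclidean distance from z to every codebook vector.
    dists = ((codebook - z) ** 2).sum(axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

# An encoder output close to the first codebook entry snaps to it.
idx, z_q = quantize(np.array([0.9, 0.1, 0.05]))
print(idx, z_q)  # index 0, i.e. [1.0, 0.0, 0.0]
```

The decoder then reconstructs the image from these quantized codes, which is what lets a Transformer later model images as sequences of discrete tokens.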

5. What is CLIP?

  • a model trained to determine which caption from a set of captions best fits with a given image
  • CLIP = Contrastive Language–Image Pre-training
  • it also uses Transformers
  • proposed by OpenAI in January 2021
  • Paper: “Learning Transferable Visual Models from Natural Language Supervision”
  • Git Repository: https://github.com/openai/CLIP
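The “contrastive” idea is that CLIP embeds images and captions into the same vector space, and the best-fitting caption is simply the one whose embedding is most similar to the image’s. A sketch of that matching step, with small hypothetical embedding vectors standing in for CLIP’s real encoder outputs:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: one image and three candidate captions,
# all mapped into the same (here 4-dimensional) space. Real CLIP
# produces these with a vision encoder and a text Transformer.
image_emb = np.array([0.9, 0.1, 0.0, 0.2])
caption_embs = {
    "a photo of a dog": np.array([0.8, 0.2, 0.1, 0.1]),
    "a photo of a cat": np.array([0.1, 0.9, 0.0, 0.3]),
    "a diagram":        np.array([0.0, 0.1, 0.9, 0.1]),
}

# Score every caption against the image and pick the best match.
scores = {cap: cosine(image_emb, e) for cap, e in caption_embs.items()}
best = max(scores, key=scores.get)
print(best)  # "a photo of a dog"
```

In the actual library from the repository above, the embeddings come from `model.encode_image(...)` and `model.encode_text(...)`; the similarity-and-argmax logic is the same.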

6. How do VQGAN and CLIP work together?

7. What about the training data?

8. Further reading and cool links

Alexa Steinbrück

Alexa Steinbrück

A mix of Frontend Development, Machine Learning, Musings about Creative AI and more

More from Medium

Explaining the code of the popular text-to-image algorithm (VQGAN+CLIP in PyTorch)

The History of the Battle of Artists and Scientists against Forgeries

The New Version of GPT-3 Is Much, Much Better

DALL-E (Zero-Shot Text-to-Image Generation) -PART(1/2)