VQGAN+CLIP — How does it work?

Early stages of training on the prompt “A high-tech outer circle with a low-tech inner filling trending on art station”
  1. What is VQGAN+CLIP?
  2. Who made VQGAN+CLIP?
  3. How does it work technically?
  4. What is VQGAN?
  5. What is CLIP?
  6. How do VQGAN and CLIP work together?
  7. What about the training data?
  8. Further reading and cool links

1. What is VQGAN+CLIP?

2. Who made VQGAN+CLIP?

3. How does it work technically?

4. What is VQGAN?

  • a type of neural network architecture
  • VQGAN = Vector Quantized Generative Adversarial Network
  • was first proposed in the paper “Taming Transformers for High-Resolution Image Synthesis” by researchers at Heidelberg University (2020)
  • it combines convolutional neural networks (traditionally used for images) with Transformers (traditionally used for language)
  • it’s well suited to generating high-resolution images
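
To make the “Vector Quantized” part of the name a bit more concrete, here is a minimal sketch of a quantization step: every vector is replaced by its nearest entry in a codebook. This is not the code from the paper. The function name `quantize`, the tiny 2-dimensional codebook and the example vectors are all made up for illustration; in a real VQGAN the codebook is learned and the vectors come from a convolutional encoder.

```python
import numpy as np

def quantize(vectors, codebook):
    """Replace each row of `vectors` with its nearest row of `codebook`.

    Hypothetical helper for illustration only; a real VQGAN learns the
    codebook during training and works on much larger vectors.
    """
    # Squared Euclidean distance between every vector and every codebook entry.
    diffs = vectors[:, None, :] - codebook[None, :, :]
    distances = np.sum(diffs ** 2, axis=-1)
    # Index of the closest codebook entry for each vector.
    indices = np.argmin(distances, axis=1)
    return indices, codebook[indices]

# Made-up codebook with three entries (assumption, not real model weights).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
# Made-up "encoder output": three vectors to be quantized.
encoder_output = np.array([[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]])

indices, quantized = quantize(encoder_output, codebook)
print(indices)    # which codebook entry each vector was mapped to
print(quantized)  # the vectors after quantization
```

The list of indices is what makes the Transformer part possible: an image becomes a sequence of discrete codes, much like a sentence is a sequence of tokens.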

5. What is CLIP?

  • a model trained to determine which caption from a set of captions best fits with a given image
  • CLIP = Contrastive Language–Image Pre-training
  • it also uses Transformers
  • proposed by OpenAI in January 2021
  • Paper: “Learning Transferable Visual Models From Natural Language Supervision”
  • Git Repository: https://github.com/openai/CLIP
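
The “which caption fits best” idea can be sketched with a few lines of NumPy: compare one image embedding against several caption embeddings with cosine similarity and turn the scores into probabilities. This is only a toy under loud assumptions: the embeddings below are invented 3-dimensional vectors, the captions are hypothetical, and the temperature value is chosen for illustration. In the real model the embeddings come from CLIP's image and text encoders.

```python
import numpy as np

def caption_probabilities(image_embedding, caption_embeddings, temperature=0.07):
    """Score how well each caption embedding matches one image embedding.

    Hypothetical helper; the temperature default is an illustrative assumption.
    """
    # Normalize to unit length so the dot product is the cosine similarity.
    image = image_embedding / np.linalg.norm(image_embedding)
    captions = caption_embeddings / np.linalg.norm(
        caption_embeddings, axis=1, keepdims=True
    )
    logits = captions @ image / temperature
    # Softmax over the captions (shifted by the max for numerical stability).
    logits = logits - logits.max()
    exp = np.exp(logits)
    return exp / exp.sum()

# Invented embeddings (assumption): real ones have hundreds of dimensions.
image_embedding = np.array([0.9, 0.1, 0.2])
caption_embeddings = np.array([
    [1.0, 0.0, 0.1],   # stands in for "a photo of a dog"
    [0.0, 1.0, 0.0],   # stands in for "a photo of a cat"
    [-1.0, 0.2, 0.5],  # stands in for "a diagram of a circuit"
])
captions = ["a photo of a dog", "a photo of a cat", "a diagram of a circuit"]

probs = caption_probabilities(image_embedding, caption_embeddings)
best_caption = captions[int(np.argmax(probs))]
print(best_caption, probs)
```

The caption whose embedding points in the most similar direction to the image embedding gets the highest probability.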

6. How do VQGAN and CLIP work together?

7. What about the training data?

8. Further reading and cool links

A mix of Frontend Development, Machine Learning, Musings about Creative AI and more

Alexa Steinbrück
