Can ChatGPT do image recognition?

This is a quick experiment to test whether ChatGPT can perform a computer vision task such as image recognition.

When prompted directly with the question “Can you process images?”, ChatGPT replies promisingly:

Sounds promising…

ChatGPT has no media upload functionality, so we have to supply the image in text form. We can do this either as a base64 encoding or as an image URL.
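For readers who want to reproduce the base64 route, here is a minimal Python sketch of how an image can be turned into a data URI. The function name `to_data_uri` and the file name `cat.jpg` in the comment are placeholders chosen for illustration, not part of the original experiment.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str) -> str:
    """Wrap raw image bytes in a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Hypothetical usage with a local file (the file name is a placeholder):
#
#     with open("cat.jpg", "rb") as f:
#         prompt = "What is depicted in this image: " + to_data_uri(f.read(), "image/jpeg")
```

Note that base64 output is roughly a third larger than the input, which is one reason to use a low-res image for this kind of prompt.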

1. Supply a base64-encoded image

I base64-encoded a low-res image of a cat and asked ChatGPT:

What is depicted in this image: data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgMCAgMDA…

ChatGPT replied:

ChatGPT hallucinating the content of the image.

Conclusion: ChatGPT’s answer is completely hallucinated. Not a single visual property was detected (person instead of cat, dark background instead of bright background, etc.).

2. Supply an (obvious) base64-encoded image

Let’s try a more obvious example, a single black pixel. The base64 encoding for that is: data:image/gif;base64,R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs=

Conclusion: The answer is partly correct. The string is indeed a base64-encoded image, but it is not broken. ChatGPT also fails to infer that it is a single black pixel.
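That the string is a valid image can be checked locally by decoding it and reading the GIF header. The following is a small sketch using only the Python standard library. The function name `inspect_gif_data_uri` is made up for this example, and the code only reads the header and the first palette entry; it does not decode the pixel data itself.

```python
import base64
import struct

DATA_URI = "data:image/gif;base64,R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs="


def inspect_gif_data_uri(data_uri: str) -> dict:
    """Decode a base64 GIF data URI and read its header fields."""
    header, payload = data_uri.split(",", 1)
    if not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    raw = base64.b64decode(payload, validate=True)

    signature = raw[:6]
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")

    # Logical screen descriptor: width and height as little-endian
    # 16-bit integers, followed by a packed flags byte.
    width, height, flags = struct.unpack("<HHB", raw[6:11])

    first_color = None
    if flags & 0x80:  # a global color table is present
        # The table starts at byte 13; each entry is one RGB triple.
        first_color = tuple(raw[13:16])

    return {
        "signature": signature.decode("ascii"),
        "width": width,
        "height": height,
        "first_palette_color": first_color,
    }


info = inspect_gif_data_uri(DATA_URI)
print(info)
```

For the string above, this reports a well-formed 1×1 GIF whose first palette entry is a near-black RGB value, so the data is neither broken nor ambiguous.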

3. Supply an image URL

Conclusion: ChatGPT’s answer is completely hallucinated.

4. Supply a different image URL

Next test: Same image, but I am supplying a different link to it:

This answer is wrong, since the link is perfectly valid.

Conclusion: ChatGPT pretends to detect broken links, which makes it look as if it were making HTTP requests at runtime, which it certainly isn’t!
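For contrast, this is roughly what an actual link check involves: a real HTTP request whose outcome depends on the network. The sketch below uses Python's standard `urllib`. The function name `link_is_reachable` and the URL in the usage comment are placeholders for illustration, since the links used in the experiment are not reproduced here.

```python
import urllib.error
import urllib.request


def link_is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to `url` ends with a 2xx or 3xx status."""
    try:
        # Building the request raises ValueError for malformed URLs.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTPError (4xx/5xx responses) is a subclass of URLError.
        return False


# Hypothetical usage (placeholder URL, requires network access):
#
#     print(link_is_reachable("https://example.com/cat.jpg"))
```

Some servers reject HEAD requests, so a more robust check might fall back to GET. The point here is only that a broken link cannot be detected without a request like this one.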

5. Supply an image URL with an expressive file name “Cat.jpg”

800px-A-Cat.jpg (Credits: Florinux, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons)

Conclusion: When the image file name contains descriptive text, ChatGPT pretends to detect the image content (cat). However, the other inferred details (coat color, posture) are completely wrong and hallucinated!

This short experiment triggers a series of follow-up questions:

  • How does ChatGPT deal with URLs in general?
  • Why is ChatGPT saying that it actually has (limited) image recognition capabilities?

Alexa Steinbrück

A mix of Frontend Development, Machine Learning, Musings about Creative AI and more