Can ChatGPT do image recognition?

Jan 20, 2023

This is a quick experiment to test whether ChatGPT can perform a computer vision task such as image recognition.

When prompted directly with the question “Can you process images?”, ChatGPT replies promisingly:

ChatGPT has no media upload functionality, so we have to supply the image as text. We can do this either with a base64 encoding or with an image URL.
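For readers who want to reproduce the base64 variant, here is a minimal sketch of how an image can be turned into a data URI with Python's standard library. The helper name `to_data_uri` is my own and not part of any API; the example encodes only the six-byte GIF signature rather than a full image file.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str) -> str:
    """Return a data URI that embeds the given bytes as base64 text.

    Hypothetical helper for illustration; the name is an assumption.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Encode just the GIF signature to keep the example short.
print(to_data_uri(b"GIF89a", "image/gif"))
# prints: data:image/gif;base64,R0lGODlh
```

For a real image, the bytes would come from reading the file in binary mode, and the resulting string can be pasted into the prompt.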

1. Supply a base64-encoded image

I base64-encoded a low-res image of a cat and asked ChatGPT:

What is depicted in this image: data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgMCAgMDA…

ChatGPT replied:

ChatGPT hallucinating the content of the image.

Conclusion: ChatGPT's answer is completely hallucinated. Not a single visual property was detected (a person instead of a cat, a dark background instead of a bright one, etc.).

2. Supply an (obvious) base64-encoded image

Let’s try a more obvious example: a single black pixel. Its base64-encoded data URI is: data:image/gif;base64,R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs=

Conclusion: The answer is partly correct. The string is indeed a base64-encoded image, but it is not broken. ChatGPT also fails to infer that it is a single black pixel.
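The claim that this data URI is a valid image can be checked locally. The sketch below decodes the string and reads the fixed-position header fields of the GIF format using only the standard library. The function name `inspect_gif` is my own, and the code reads only the header and the first palette entry; it does not decode the pixel data.

```python
import base64
import struct

DATA_URI = (
    "data:image/gif;base64,"
    "R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs="
)


def inspect_gif(data_uri: str) -> dict:
    """Decode a base64 GIF data URI and report basic header fields.

    Hypothetical helper for illustration; the name is an assumption.
    """
    _, _, payload = data_uri.partition(",")
    raw = base64.b64decode(payload, validate=True)

    # Bytes 0-5: signature; bytes 6-10: width, height, packed flags.
    signature = raw[:6].decode("ascii")
    width, height, flags = struct.unpack("<HHB", raw[6:11])

    # If bit 7 of the flags is set, a global color table starts at byte 13.
    has_palette = bool(flags & 0x80)
    first_color = tuple(raw[13:16]) if has_palette else None

    return {
        "signature": signature,
        "width": width,
        "height": height,
        "first_palette_color": first_color,
        "ends_with_trailer": raw[-1] == 0x3B,  # 0x3B marks the end of a GIF
    }


info = inspect_gif(DATA_URI)
print(info)
```

The header reports a 1x1 GIF89a image whose first palette entry is (5, 4, 4), which is very close to pure black, and the data ends with the regular GIF trailer byte. So the encoding is intact.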

3. Supply an image URL

https://www.flickr.com/photos/139036569@N06/44464771991/in/photolist-2aKczzR-ZjHDzd-n5cJmU-ZD9bY5-ZD9vy1-r5FCkJ-21cvMT9-RL9BCY-213Cz8a-Qxgb3j-23cWBYU-21e7Nrp-KLEND-fvfWYV-2dcUJV3-fvvbjY-Hnkdui-fvvdLh-Dp9Zdt-27cwkkk-HPf9R1-24rXKmT-NrXWTB-Ggp2LH-LNvohM-DoQwgw-26J2uwU-ycC7vJ-xCaWPF-DgrRNJ-229dpjb-2czDaLF-jigXWX-bqBJX-p4nGVR-w1ZRiS-XMd741-QNUykc-29LUdHR-D5Ft4H-eP84CG-B76EEK-a97qAj-6r9W4o-a97qC9-2duxqcw-EaASqT-4eMh6B-Z9hAFG-8Btt6p

Conclusion: ChatGPT's answer is completely hallucinated.

4. Supply a different image URL

Next test: the same image, but this time I supply a different link to it:

https://live.staticflickr.com/1881/44464771991_3d10c7ac8e_c_d.jpg

This is a wrong answer since the link is absolutely valid.

Conclusion: ChatGPT pretends to detect broken links, which makes it look as if it were making HTTP requests at runtime. It certainly is not!
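Whether a link actually resolves to an image can only be determined by making a real HTTP request. As a rough sketch, the standard library is enough for that. The function names `looks_like_image` and `check_image_url` are my own, and the check is deliberately simple: some servers reject HEAD requests, so a `False` result is not conclusive.

```python
import urllib.error
import urllib.request


def looks_like_image(status: int, content_type: str) -> bool:
    """True for a 2xx status combined with an image/* content type."""
    return 200 <= status < 300 and content_type.lower().startswith("image/")


def check_image_url(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the URL appears to serve an image.

    Hypothetical helper for illustration; the name is an assumption.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return looks_like_image(response.status, content_type)
    except (urllib.error.URLError, TimeoutError):
        # Covers HTTP errors such as 404 as well as network failures.
        return False


# Usage (requires network access):
# check_image_url("https://live.staticflickr.com/1881/44464771991_3d10c7ac8e_c_d.jpg")
```

A language model without network access has no way to perform this step, so any statement it makes about a link being valid or broken is a guess.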

5. Supply an image URL with an expressive file name “Cat.jpg”

800px-A-Cat.jpg (Credits: Florinux, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons)

Conclusion: When the image file name contains descriptive text, ChatGPT pretends to detect the image content (a cat). However, the other inferred details (coat color, posture) are completely wrong and hallucinated!

This short experiment raises a series of follow-up questions:

How does ChatGPT deal with URLs in general?
Why does ChatGPT claim that it has (limited) image recognition capabilities?