This is a quick experiment to test if ChatGPT can do a computer vision task such as image recognition.
When prompted directly with the question “Can you process images?”, ChatGPT replies promisingly:
ChatGPT doesn’t offer any media upload functionality, so we have to supply the image in text form: either as a base64-encoded string or as an image URL.
1. Supply a base64-encoded image
I base64-encoded a low-res image of a cat and asked ChatGPT:
What is depicted in this image: data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgMCAgMDA…
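For reference, a prompt like this can be built with a few lines of Python. This is a minimal sketch using only the standard library; the `cat.jpg` path in the comment is a hypothetical stand-in for whatever local image you use:

```python
import base64


def to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Build the text-only data URI embedded in the prompt."""
    return f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")


# In the actual experiment, a real JPEG would be read here, e.g.:
# with open("cat.jpg", "rb") as f:  # hypothetical file name
#     print("What is depicted in this image: " + to_data_uri(f.read()))

# Stand-in: the four JPEG magic bytes encode to the familiar "/9j/..." prefix
print(to_data_uri(b"\xff\xd8\xff\xe0"))  # → data:image/jpeg;base64,/9j/4A==
```

Note how every JPEG's base64 encoding starts with `/9j/`, matching the prompt above.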
Conclusion: ChatGPT’s answer is completely hallucinated. Not a single visual property was detected (a person instead of a cat, a dark background instead of a bright one, etc.).
2. Supply an (obvious) base64-encoded image
Let’s try a more obvious example: a single black pixel. The base64 code for that is:
Conclusion: The answer is partly correct. The input is indeed base64-encoded, but contrary to ChatGPT’s claim it is not broken. ChatGPT also does not infer that it represents a single black pixel.
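Such a minimal test image is easy to produce without any imaging library. The following sketch assembles a valid 1×1 black PNG byte-for-byte with Python’s standard library (PNG is used here for brevity; the original experiment may have used a different format):

```python
import base64
import struct
import zlib


def chunk(ctype: bytes, data: bytes) -> bytes:
    """PNG chunk: length, type, data, CRC-32 over type+data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))


# IHDR: width=1, height=1, 8-bit depth, color type 0 (grayscale)
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
# IDAT: one scanline = filter byte 0x00 + one black pixel (0x00)
idat = zlib.compress(b"\x00\x00")
png = (b"\x89PNG\r\n\x1a\n"
       + chunk(b"IHDR", ihdr)
       + chunk(b"IDAT", idat)
       + chunk(b"IEND", b""))

data_uri = "data:image/png;base64," + base64.b64encode(png).decode("ascii")
print(data_uri)
```

The resulting string starts with `data:image/png;base64,iVBORw0K`, since `iVBORw0K` is the base64 encoding of the PNG signature bytes.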
3. Supply an image URL
Conclusion: ChatGPT’s answer is completely hallucinated.
4. Supply a different image URL
Next test: the same image, supplied via a different link:
This is a wrong answer, since the link is perfectly valid.
Conclusion: ChatGPT pretends to detect broken links. This makes it look as if ChatGPT were performing HTTP requests at runtime, which it certainly is not doing!
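Whether a link is actually broken is something we can verify ourselves. A minimal sketch using only Python’s standard library, issuing a HEAD request (the `.invalid` URL below is a placeholder; RFC 2606 reserves that TLD so it is guaranteed never to resolve):

```python
import urllib.request


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """True if the server answers a HEAD request with a non-error status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:  # DNS failure, timeout, HTTP error, ...
        return False


# Placeholder URL that can never resolve:
print(url_is_reachable("http://example.invalid/Cat.jpg"))  # False
```

Running this against the link from the experiment confirms it resolves fine; ChatGPT’s “broken link” verdict is fabricated, not observed.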
5. Supply an image URL with an expressive file name “Cat.jpg”
Conclusion: When the image file name contains descriptive text, ChatGPT pretends to detect the image content (a cat); however, other inferred details (coat color, posture) are completely wrong and hallucinated!
This short experiment triggers a series of follow-up questions:
- How does ChatGPT deal with URLs in general?
- Why does ChatGPT claim that it has (limited) image recognition capabilities?