Finetuned an open-source LLM and trained with reinforcement learning to improve the model from only being able to complete 0.1% of Wordle games to successfully completing 18% of games.
[ACCV 2024 Oral]
Vision Language Models are Blind
This paper shows the limitations of VLMs, such as GPT-4o, in extremely simple, abstract vision tasks.
Despite their high scores on multimodal benchmarks, these models often fail on very basic cases.
This research has been featured by
OpenAI,
TechCrunch, and
Ars Technica.