For the final project of my graph theory class, I trained a model to reconstruct a graph's adjacency matrix from the attention values of a small LLM. The project shows how attention-map patterns can be used to help interpret the inner workings of language models.
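As a rough sketch of the idea, a minimal edge probe over attention values might look like the following. Everything here is an illustrative assumption, not my actual project code: the attention maps are simulated (noise plus a component correlated with the true edges) in place of attention extracted from a real small LLM, and the probe is a simple logistic regression fit by gradient descent.

```python
import numpy as np

# Simulated stand-in for per-graph attention maps from a small LLM.
rng = np.random.default_rng(0)
n_nodes, n_graphs = 8, 200

def simulate_graphs(n_graphs, n):
    """Return (attention maps, adjacency matrices), flattened per graph."""
    X, Y = [], []
    for _ in range(n_graphs):
        upper = np.triu(rng.integers(0, 2, size=(n, n)), 1)
        adj = upper + upper.T                        # symmetric, no self-loops
        attn = 0.6 * adj + 0.4 * rng.random((n, n))  # attention correlates with edges
        X.append(attn.ravel())
        Y.append(adj.ravel())
    return np.asarray(X), np.asarray(Y)

X, Y = simulate_graphs(n_graphs, n_nodes)
x, y = X.ravel(), Y.ravel()

# Probe: single-feature logistic regression predicting each adjacency entry
# from the corresponding attention value, trained with gradient descent.
w, b, lr = 0.0, 0.0, 1.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= lr * np.mean((p - y) * x)
    b -= lr * np.mean(p - y)

acc = np.mean(((w * x + b) > 0) == y)
print(f"edge-reconstruction accuracy: {acc:.3f}")
```

On this synthetic data the edge signal is separable, so the probe recovers the adjacency structure; with real attention maps one would instead extract per-head attention tensors from the model and likely need a richer probe.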
[ACCV 2024 Oral]
Vision Language Models are Blind
This paper exposes the limitations of VLMs such as GPT-4o on extremely simple, abstract vision tasks.
Despite their high scores on multimodal benchmarks, these models often fail at very basic cases.
This research has been featured by OpenAI, TechCrunch, and Ars Technica.