WalkingRAG (2023–2024) was one of the first products (I'm not certain, but possibly the very first) to implement multi-modal, graph-based search: what we'd now call agentic search. Multi-modal meant it could handle images, diagrams, and at one point even video.
Below is one of the demos, in which we read and trace a path through an IKEA manual that contains no words at all, using nothing but GPT-4 and the very first vision models. What a time.
This is the story of WalkingRAG, told in tweet threads, Hugging Face articles, and videos. I'll editorialize where it helps.

We called it a demo at the time, but looking back, this was really the first proof of concept: the thing that convinced me WalkingRAG was actually possible.
A couple of weeks later, the first real, working demo of WalkingRAG.
And then the first proper demo of it as a product — with an actual UI.
The breakthrough that made me happiest came in February: the visual indexing pass — the step that reads every page — finally ran end to end, fast, on an open model.
Around the same time, we pulled off something I'd been chasing since the first release of WishfulSearch. It's less a WalkingRAG demo than pure data magic: the coolest, least sexy thing we'd done, and worth a watch.
WalkingRAG ran counter to the prevailing ideas of the time: chunking and embedding. These three articles cover the techniques behind what we did differently. Building double-sided graphs and connecting them with embeddings is still an underexplored idea, and I believe it could significantly improve current systems.
