Cloudflare's workers-oauth-provider (by Ken) is a really interesting experiment. To me, this is the first time I've seen production development driven entirely with AI and then open-sourced along with the prompts.
Let's look at what happened. If you want to follow along, you can use this thing Claude Code made for me: https://cf-commits-visualized.vercel.app/
Making the site above took an hour. Here's what I did:
Clone the repo
Use Claude Code to make a gh CLI script to get additional comment metadata into a JSON file
Use this script to run the metadata through gemini-2.5-flash (via Cloudflare AI Gateway) for some additional analysis
Try making a frontend to look at everything
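For anyone who wants to reproduce the metadata step, here is a minimal sketch. It is not the script Claude Code wrote for me: it swaps the gh CLI for plain `git log`, and the function names (`parse_git_log`, `dump_commits`) and the JSON field names are hypothetical choices for illustration.

```python
import json
import subprocess

# Control characters are used as separators because they are very
# unlikely to appear inside a commit message.
FIELD_SEP = "\x1f"   # between fields of one commit
RECORD_SEP = "\x1e"  # after each commit
# sha, author name, ISO author date, subject, body, then the record separator.
GIT_FORMAT = "%H%x1f%an%x1f%aI%x1f%s%x1f%b%x1e"


def parse_git_log(raw: str) -> list[dict]:
    """Turn the raw `git log` output into a list of plain dicts."""
    commits = []
    for record in raw.split(RECORD_SEP):
        # Strip newlines only: a bare strip() would also eat the
        # trailing field separator of a commit with an empty body.
        record = record.strip("\n")
        if not record:
            continue
        sha, author, date, subject, body = record.split(FIELD_SEP, 4)
        commits.append(
            {
                "sha": sha,
                "author": author,
                "date": date,
                "subject": subject,
                "body": body.strip(),
            }
        )
    return commits


def dump_commits(repo_dir: str, out_path: str) -> int:
    """Write every commit in repo_dir (oldest first) to a JSON file."""
    raw = subprocess.run(
        ["git", "-C", repo_dir, "log", "--reverse",
         f"--pretty=format:{GIT_FORMAT}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    commits = parse_git_log(raw)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(commits, fh, indent=2)
    return len(commits)
```

Calling `dump_commits("workers-oauth-provider", "commits.json")` on a local clone would produce the kind of file the later steps consume. Anything that only lives on GitHub, such as review comments, would still need the gh CLI or the API.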



Some interesting things in this process:
To explain this a little more, the two ways to process commits here would be:
I found I had significantly better results with the sequential approach. The tradeoff I expected was some confusion around which commit to focus on, but gemini-2.5-flash didn't seem to have that issue. If you notice anything I missed in the results, let me know!
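A rough sketch of what I mean by sequential, under one assumption: commits go to the model one at a time, oldest first, each with a few earlier commits attached as context. The `Commit` shape, the `window` parameter and the prompt wording are hypothetical and not what my script actually sends.

```python
from typing import TypedDict


class Commit(TypedDict):
    sha: str
    subject: str
    diff: str


def build_sequential_prompts(commits: list[Commit], window: int = 3) -> list[str]:
    """Build one prompt per commit, oldest first, with a short history as context."""
    prompts = []
    for i, commit in enumerate(commits):
        # Up to `window` commits that came right before this one.
        history = commits[max(0, i - window):i]
        history_lines = [f"- {c['sha'][:7]} {c['subject']}" for c in history]
        if not history_lines:
            history_lines = ["- (none, this is the first commit)"]
        prompts.append(
            "\n".join(
                [
                    "Earlier commits, for context only:",
                    *history_lines,
                    "",
                    # Naming the target explicitly is the guard against the
                    # model analyzing one of the context commits instead.
                    f"Analyze ONLY this commit: {commit['sha'][:7]} {commit['subject']}",
                    commit["diff"],
                ]
            )
        )
    return prompts
```

Each prompt would then be sent to the model in order. The sending step is left out because it depends on how the gateway is set up.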
Before my thoughts, Max's thoughts are worth a read, as are Neil's.
Looking at git-of-theseus, we see a pattern I've become very familiar with:

We have code being added in layers on top of each other, with few removals. I wouldn't be surprised if most of the core removals came from Ken (or were explicitly directed by Ken to Claude Code). From his logs it looks like this is the case.
The initial commit is interesting, since it leaves almost everything up to Claude Code. Not much is provided except the outer interface (through a user example) that we need to satisfy.
Some interesting commits:


Overall this still feels a lot like pair-coding with an eager junior. If we look at Gemini's review of the commits by level of AI-involvement, we see a graph that looks a lot like my own projects:

Write some code with AI, make some changes, make some more changes, hand control to the AI - repeat.
What's interesting is that two different general flows for coding with AI have emerged.
The first one is to start with a broad intent - some level of spec about the output, some kind of user story - to generate some code, review and repeat. This is what Ken seems to be doing here, and it requires a few things to get to production with any meaningful level of complexity:


The second, which is the one I've found myself following, is to start with very detailed specs. Often I'll have 10-20k words of specs, covering behavior, architecture choices, pitfalls, and testing methodologies, all before calling Claude Code once. I'll write something up about this later, but it comes with its own tradeoffs.
Overall this has been really interesting. It's always insightful to review another engineer's process and see what behaviors emerged. In the context of writing production code with agentic coding tools, this is likely the first example of many.

