"Just describe what you want and launch your app"—that's what the tools promise. You can build without coding. I did. I built a glycemic load calculator using AI tools. No code written.
But if you've tried this, you've probably hit the same wall: the first version comes fast, then you spend hours prompting in circles, watching the tool rebuild things you didn't ask for, unsure why it's not landing.
That's the prompt-to-product gap.
Getting a click-through prototype is fast. Shipping something you'll actually use daily—or that real users will rely on—requires product thinking.
This post is the framework I now use every time I build with AI. It's not about the tools. It's about the questions you ask before and during the build.
Who This Is For
This framework is for you if:
- You've tried vibe coding but got stuck in expensive iteration loops
- You want to ship MVPs to real users, not just create demos
- You're building for yourself, testing ideas, or learning by shipping
Before You Start
You don't need to know how to code. I don't write code—I describe what I want and steer the AI when it makes choices I don't agree with.
You don't need to pick the "right" tools. I used Claude Code, Stitch for design, and Render for hosting. You might use Cursor, Replit, or Figma. The framework works regardless.
What you do need is a problem worth solving and the patience to test what you build.
My GL calculator took about a week of evenings—not because the building was slow, but because the thinking and testing took time. The actual coding happened in a few sessions.
The 4-Step Framework
Step 1: Define the problem, not the features
Before touching any tool, I articulate the problem in plain language.
I didn't want to track glycemic load through meticulous Excel entries. I needed something fast, accurate enough, and easy to use daily on my phone—without feeling like homework.
That led to a single core user story:
> As someone who won't keep up with Excel tracking,
> I want to know if my meal is too high in glycemic load in under 30 seconds,
> so I can make better choices without the burden of daily logging.
Everything else was secondary.
Step 2: Design before your tool starts coding
Once the problem was locked, I focused on the experience before getting an AI agent to generate code.
I used a text-to-design tool (Stitch, in my case) to iterate on UX cheaply. This mattered because unconstrained AI design tends to replicate what already exists.
For a meal-tracking app, that default looks like:
- Dashboards and analytics
- Meal history timelines
- Reminders and streaks
- Profiles and social sharing
All reasonable. None necessary for my problem.
How I decided what to cut
I went back to my user story and asked one question for each feature AI suggested:
"Does this help me know if my meal is okay in under 30 seconds?"
| Feature | Serves the 30-sec check? | V1? |
|---|---|---|
| Natural language input | ✓ Yes — fastest way to describe food | ✓ |
| Instant GL feedback | ✓ Yes — the answer I need | ✓ |
| Food swap suggestion | ✓ Yes — actionable next step | ✓ |
| Meal history | ✗ Adjacent — don't need last week's lunch | ✗ |
| Streaks / gamification | ✗ Harmful — I'd quit after breaking one | ✗ |
| Dashboards | ✗ Nice to have — adds complexity | ✗ |
The filter I now use
For any feature, I ask:
- Does this serve the core user story, or is it adjacent?
- Would I use this in the first week, or is it a "maybe later"?
- Does adding this make the happy path longer?
If a feature doesn't pass all three, it's not v1.
The lesson: If you don't constrain design upfront, AI will overbuild for you. It's not wrong—it's just optimising for completeness, not your specific problem.
Step 3: Ask better questions before accepting what AI builds
Once the UX was locked, I started coding with Claude Code as my agent.
This is where I realised something important: AI doesn't just write code—it quietly answers questions you never asked.
If you don't intervene, it decides how often to call external services, how tightly your app depends on them, and what happens when things fail. These aren't technical details. They're product decisions that affect whether your app feels reliable or flaky.
You don't need to understand architecture. You need a small set of questions to ask before the AI starts scaffolding.
I now ask for a plan of execution before approving any build. Some tools like Replit have a default planning mode. If yours doesn't, just ask: "Before you start building, walk me through your plan."
Then I interrogate that plan with three question sets.
Question set 1: About dependencies
Any time the plan includes an external service—an API, an LLM call, a database lookup—I ask:
- Does the user do this action often?
- If this service fails or slows down, will it break trust?
If the answer to either is yes, I ask the agent directly how to reduce or remove that dependency.
How this played out: Glycemic load calculation happens for every meal and needs consistency. My meals are repetitive, so I provided a local list covering them, with pre-configured GI values. LLM calls only happen for foods not in the list, with caching and a graceful fallback.
The rule of thumb: reduce dependency on external services as much as possible.
Question set 2: About setup and complexity
AI often defaults to what looks "proper"—separate frontend and backend, multiple layers, clean abstractions. This is fine for teams. It's overkill for a solo builder shipping an MVP. So I ask: is there a simpler setup that still solves my problem?
How this played out: I went with Flask and templates instead of React. Not because I understood the tradeoffs deeply, but because I asked and the agent gave me a simpler option. That's the point—you don't need to know the right answer, you need to ask if a simpler answer exists.
Question set 3: About going live
If your app will be on the open internet with external services like LLMs or paid APIs, you need to think about abuse and cost before you launch. I ask: what stops someone from spamming requests, and what would a worst-case day cost me?
How this played out: I set rate limits per IP and usage limits of 4 meals per day. Not because I expected abuse, but because I didn't want to find out the hard way.
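A minimal version of that guardrail is just an in-memory counter. The limit of 4 meals per day is from the setup above; the function and storage are hypothetical (a real deployment might use Redis so counts survive restarts):

```python
# Illustrative per-IP daily usage cap, checked before any paid LLM call.
from collections import defaultdict
from datetime import date

MEALS_PER_DAY = 4
_usage: dict[tuple, int] = defaultdict(int)  # (ip, day) -> request count

def allow_request(ip: str) -> bool:
    key = (ip, date.today())
    if _usage[key] >= MEALS_PER_DAY:
        return False           # over the daily cap: reject before spending money
    _usage[key] += 1
    return True
```

A few lines like this bound your worst-case bill to something you chose, not something you discover.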
Step 4: Use it like your users will
This is where most real issues showed up—not in the build, but in the first few days of actually using the app.
I tested variations I hadn't explicitly thought about when prompting:
| What I typed | What I expected | What happened |
|---|---|---|
| "rotis" | Match "roti" from my list | ✗ No match |
| "2 eggs and toast" | Calculate both items | ✗ Only calculated eggs |
| "greek yoghurt" | Match yogurt entry | ✗ No match (expected "yogurt") |
Each failure felt obvious in hindsight. None of them came up during the build.
How I found these edge cases
I didn't test randomly. I followed three patterns:
- Plurals and variants: If "roti" works, does "rotis" work? What about "chapati" or "phulka"?
- Real input messiness: I typed meals the way I actually think about them—"eggs, toast, butter" not "2 eggs and toast with butter".
- Regional and spelling differences: "Yogurt" vs "yoghurt." "Eggplant" vs "brinjal" vs "aubergine."
How I reported issues to the agent
I didn't try to diagnose the problem or suggest fixes. I just described the failure from a user's perspective.
The agent then added plural handling across the food database.
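The shape of that fix is a small normalisation step run before the database lookup. This is an illustrative guess at what such a step looks like, not the app's actual code:

```python
# Hypothetical input normalisation: lowercase, trim, map known
# spelling/regional variants, then strip naive English plurals.
SYNONYMS = {"yoghurt": "yogurt", "brinjal": "eggplant", "aubergine": "eggplant"}

def normalise(food: str) -> str:
    word = food.strip().lower()
    word = SYNONYMS.get(word, word)
    if word.endswith("oes"):                            # "tomatoes" -> "tomato"
        word = word[:-2]
    elif word.endswith("s") and not word.endswith("ss"):  # "rotis" -> "roti"
        word = word[:-1]
    return word
```

It won't catch everything (multi-item meals like "2 eggs and toast" need separate parsing), which is exactly why real usage surfaces failures that prompting doesn't.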
The insight: AI is excellent at fixing things—after you surface failures from real usage. Your job isn't to QA every possible input. It's to use the app like a real user for a few days and report what breaks.
A note on testing with others
I also sent the app to two friends who eat differently than I do. Within an hour, they'd found inputs I'd never have thought of—foods I don't eat, portion descriptions I don't use, spelling conventions from their backgrounds.
If you can, get one or two real people to try your app before you call it done. Their failures will be different from yours.
What this experience changed for me
If you've worked in a professional product setup, this framework may look familiar.
Step 1 is essentially a lightweight PRD—focused on one outcome, not a roadmap. Step 2 maps to UX design, but led by constraints instead of completeness. Step 3 mirrors tech and dependency decisions, expressed in plain language. Step 4 reflects real-world testing rather than formal QA.
The difference is that this compresses traditional product development into something a single builder can actually use—especially when AI accelerates execution but not judgment.
The vibe-coding landscape changes constantly. Replit, Cursor, Claude Code, Stitch, v0, Lovable. I used Claude Code, Stitch, and Render. You might use Cursor and Figma.
That part doesn't matter.
All these tools turn descriptions into code. What they don't do is decide what to build, what to cut, where reliability matters, or when simplicity beats correctness.
Fast execution doesn't fix unclear thinking.
Vibe coding delivers on its promise: you can build working software without writing code. What it doesn't replace is the thinking required to ship something real.
This framework is how I built the glycemic load calculator I shared in my last post. It's not complete, and it will evolve as tools change. But it's the mental checklist I now reach for every time I build with AI.
The tools will keep getting better.
The hard part will still be deciding what deserves to exist.
Want the one-page cheatsheet?
All 4 steps, the feature filter, and the questions to ask before you build—on a single page you can reference while shipping.
Download the cheatsheet