A learning journal from building advisor.rachithasuresh.com
2am. Baby crying for 2 hours straight.
Four voices competed in my head:
Baby's needs: "Hold me. Don't put me down."
My survival: "Sleep now or you'll break."
Medical evidence: "Baby is safe. Crying won't harm them."
What I'd read online: "Could be colic? Reflux? Try massage?"
Every voice made sense. Every voice conflicted. No single "right" answer.
And then it clicked.
I'm not drowning in confusion. I'm experiencing a multi-agent system.
Four agents. Four objectives. Zero consensus.
Not in a paper. In my life. At 2am.
Competing objective functions. Agent coordination. The alignment problem. All the concepts I'd been reading about, happening live, in my living room, with a screaming infant.
I didn't write another note about it. I built it.
What I Built
An interactive game where you navigate 5 parenting scenarios while AI agents debate what you should do.
The setup
- Week 1: First night home (baby won't stop crying)
- Week 6: Growth spurt (constant feeding)
- Week 12: Advice avalanche (everyone has opinions)
- Week 16: First illness (fever, fear, conflicting advice)
- Week 24: Return to work (new competing priority)
The mechanics
- 4-5 AI agents per scenario (powered by GPT-4)
- Round 1: Agents state positions independently
- Round 2: Agents see each other's responses and debate
- You choose which agent to follow
- Stats change based on choices (Energy, Bond, Sanity)
- Agent trust scores accumulate
- Final result: Your "parenting style"
Takes 10 minutes. Works on mobile.
The game was just the vehicle. Understanding multi-agent AI was the destination.
Here's what I learned by building it.
The Multi-Agent Concepts I Implemented
1. Competing Objective Functions
The concept: Each agent optimizes for a different goal.
How I built it:
Four different system prompts = four different objectives:
Same scenario input. Different optimization targets. Four different recommendations.
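A minimal sketch of that wiring, assuming a Chat Completions-style message format. The prompt texts and agent names here are paraphrased for illustration, not the exact prompts from the game:

```python
# Hypothetical system prompts: one objective per agent.
# The real prompts in the game differ; these are illustrative.
AGENT_OBJECTIVES = {
    "Baby":     "You advocate for the baby's comfort. Maximize closeness and soothing.",
    "Survival": "You advocate for the parent's rest. Maximize sleep and recovery.",
    "Rational": "You weigh medical evidence. Maximize safe, measured decisions.",
    "Chaos":    "You channel conflicting internet advice. Surface every possibility.",
}

def build_messages(agent: str, scenario: str) -> list[dict]:
    """Same scenario input, different system prompt = different objective."""
    return [
        {"role": "system", "content": AGENT_OBJECTIVES[agent]},
        {"role": "user", "content": scenario},
    ]

scenario = "2am. The baby has been crying for two hours."
requests = {agent: build_messages(agent, scenario) for agent in AGENT_OBJECTIVES}
```

The whole trick is in the asymmetry: identical user message, four different system messages.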
What I learned:
An agent is just an LLM with a specific job.
Give it an objective and it stops being a general assistant. It becomes an advocate.
"Maximize comfort" vs "Maximize rest" creates the conflict. When both can't win, someone's objective gets deprioritized. That's the coordination problem.
Where this shows up in real AI applications:
Think about a multi-agent customer support system. A triage agent wants to classify the issue accurately. A resolution agent wants to solve it fast. A sentiment agent wants to protect the customer relationship. A cost agent wants to minimize escalations.
Same ticket. Four agents. Four objectives. The triage agent flags it as a billing dispute. The resolution agent says auto-refund. The cost agent says offer a credit instead. The sentiment agent reads three prior frustrated contacts and says escalate to a human now.
When all four can't win, someone's objective gets deprioritized. That's the coordination problem, whether you're building AI systems or standing in your living room at 2am.
2. Multi-Turn Agent Communication
The concept: Agents don't just respond in parallel. They interact with each other.
How I built it:
Two rounds of responses. Round 1: all agents respond independently (parallel API calls). Round 2: each agent sees others' Round 1 responses and reacts (sequential calls, conditioned on previous outputs).
What I learned:
This is where it started feeling like a system, not just 4 chatbots.
Round 2 responses aren't scripted. They emerge from seeing what others said. Technical implementation: pass Round 1 outputs into Round 2 prompts. That's it.
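The two-round flow can be sketched like this. `call_llm` is a stand-in for the actual GPT-4 call, stubbed here so the control flow is runnable; the prompt wording is illustrative:

```python
def call_llm(system: str, user: str) -> str:
    """Placeholder for the real API call: echoes inputs so the flow is testable."""
    return f"({system}) sees: {user}"

def debate(agents: dict[str, str], scenario: str) -> dict[str, dict[str, str]]:
    # Round 1: each agent answers independently (parallelizable API calls).
    round1 = {name: call_llm(prompt, scenario) for name, prompt in agents.items()}

    # Round 2: each agent's prompt includes everyone else's Round 1 answer.
    round2 = {}
    for name, prompt in agents.items():
        others = "\n".join(f"{n}: {r}" for n, r in round1.items() if n != name)
        round2[name] = call_llm(
            prompt,
            f"{scenario}\n\nOther agents said:\n{others}\nRespond to them.",
        )
    return {"round1": round1, "round2": round2}
```

The conditioning is just string concatenation: Round 1 outputs become part of the Round 2 input.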
But the effect is striking. Agents reference each other, counter each other, try to bridge perspectives. The debate flows naturally because agents are actually reading and responding to each other's positions.
This is exactly how it would work in that customer support system. The triage agent classifies a ticket as "billing dispute." The resolution agent sees that classification and pulls refund policy. The sentiment agent reads three prior contacts and flags frustration. The resolution agent sees the sentiment flag and shifts from "here's our policy" to "let me escalate this immediately."
Each agent's output changes based on what the others found. That's multi-turn conditioning. And it produces fundamentally different outcomes than four agents working in isolation.
3. Context-Aware Responses
The concept: Agents adapt based on system state, not just the query.
How I built it:
Each agent receives the scenario text, current stats (Energy: 30/100, Bond: 70/100, Sanity: 40/100), and week number.
What I learned:
The LLM actually adjusts based on context. When energy is 20/100, Survival Agent says: "CRITICAL. Sleep immediately." When energy is 60/100, same agent says: "Rest when you can."
Same agent, same objective, different context = different response. The stats aren't cosmetic. They change what the agent recommends.
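Mechanically, this is just state injection into the prompt. A sketch, with illustrative field names:

```python
# Inject current game state into the user prompt so the same agent,
# with the same objective, adapts its advice to context.
def context_prompt(scenario: str, stats: dict[str, int], week: int) -> str:
    return (
        f"Week {week}. "
        f"Energy: {stats['energy']}/100, Bond: {stats['bond']}/100, "
        f"Sanity: {stats['sanity']}/100.\n{scenario}"
    )

low = context_prompt("Baby won't settle.", {"energy": 20, "bond": 70, "sanity": 40}, 1)
ok  = context_prompt("Baby won't settle.", {"energy": 60, "bond": 70, "sanity": 40}, 1)
```

Identical scenario text, different stats, different prompt; the LLM does the rest.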
Think about how your own recommendations shift based on company stage. "Move fast" at pre-revenue becomes "optimize retention" post-PMF becomes "compliance review" at enterprise scale. Context changes everything.
4. Revealed Preference Tracking
The concept: Agents recommend. Humans decide. The system watches.
How I built it:
Agents provide competing recommendations. You choose which to follow. The system tracks which agents you trusted across all 5 scenarios.
Final result: "You followed Rational 65%, Survival 20%, Baby 10%, Chaos 5%."
What I learned:
This is preference tracking. The system observes which recommendations you accept and reveals your decision-making pattern. It doesn't adapt yet; that's where it would become true RLHF (Reinforcement Learning from Human Feedback), where the system learns from choices and adjusts future behavior.
But even without the reinforcement loop, the tracking is revealing. Your choices expose your weighting system, which objectives you consistently prioritize when tradeoffs are forced.
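The tracking itself is simple bookkeeping. A sketch, where the final percentages are just normalized choice counts (the sample choices are illustrative):

```python
from collections import Counter

def trust_profile(choices: list[str]) -> dict[str, float]:
    """Normalize which agent the player followed at each decision point."""
    counts = Counter(choices)
    total = len(choices)
    return {agent: counts[agent] / total for agent in counts}

# One followed agent per decision point across the scenarios (illustrative).
profile = trust_profile(["Rational", "Rational", "Survival", "Rational", "Baby"])
```

Turning this into RLHF proper would mean feeding the profile back in, e.g. reweighting or re-prompting agents based on accumulated trust, which the game deliberately doesn't do.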
5. Dynamic Multi-Agent Systems
The concept: Agents join and leave based on context changes.
How I built it:
Weeks 1–16: 4 agents (Baby, Survival, Rational, Chaos).
Week 24: Work Agent activates when you return to work. Now 5 competing objectives instead of 4.
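In code, the roster becomes a function of game state rather than a constant. A minimal sketch, assuming week number is the trigger:

```python
# The active agent set depends on context, not a fixed list.
BASE_AGENTS = ["Baby", "Survival", "Rational", "Chaos"]

def active_agents(week: int) -> list[str]:
    roster = list(BASE_AGENTS)
    if week >= 24:  # return-to-work scenario activates the Work agent
        roster.append("Work")
    return roster
```

Everything downstream (both debate rounds, the choice UI, trust tracking) iterates over `active_agents(week)`, so adding a fifth objective is one branch, not a rewrite.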
What I learned:
Context doesn't just change agent responses. It changes which agents are active.
Before work return: Baby vs You (2-way tension).
After work return: Baby vs You vs Career (3-way conflict).
The decision space fundamentally shifts. And if you've lived through a company scaling from seed to Series C, you've felt this. Every growth stage activates new stakeholders with new objectives, and decision complexity compounds.
What Building Revealed
Five-word constraints created distinct voices
I forced agents to respond in at most 5 words (8 for Chaos).
I thought this would make them feel robotic. It did the opposite. Brevity forced personality:
- Baby: "Need you right now" (vulnerable, direct)
- Survival: "Sleep now or break" (urgent, protective)
- Rational: "Try fifteen, then rest" (specific, measured)
- Chaos: "But family says! Internet says!" (scattered, overwhelming)
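One way to hold that line, assuming the cap lives in the system prompt with a hard truncation as a fallback when the model overruns (the enforcement helper is illustrative, not the game's exact code):

```python
# Word caps per agent, as described above: 5 words, 8 for Chaos.
WORD_LIMITS = {"Baby": 5, "Survival": 5, "Rational": 5, "Chaos": 8}

def constraint_clause(agent: str) -> str:
    """Appended to the agent's system prompt."""
    return f"Respond in at most {WORD_LIMITS[agent]} words."

def enforce_limit(agent: str, reply: str) -> str:
    """Hard fallback: truncate if the model ignores the cap."""
    return " ".join(reply.split()[: WORD_LIMITS[agent]])
```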
Constraints don't limit character. They reveal it.
The debate felt real
I wasn't certain agents would actually reference each other rather than generate disconnected text. They referenced. They debated. Baby heard Survival's "you need sleep" and responded "But I need YOU." Rational saw both and tried to bridge: "Both valid. Time-box trying."
Multi-turn conditioning works.
People felt the problem, not the AI
When friends tested it: "That Week 16 illness decision, I felt that."
Not: "Cool AI implementation."
The AI became invisible. The impossible choice was visceral. The best teaching isn't explanation. It's making people feel what you're trying to explain.