Case Studies

What three pods actually shipped.

We anonymize on request. We don't anonymize the work. Below is what three pods built, what they didn't, what hurt, and how the numbers moved. The clients are real. The engagements ran between mid-2024 and early 2026.

If you recognize yourself in any of these and want to compare notes off the record, book a call.

Case 01 · Proptech · Middle East

Listings published 25 times faster. Duplicates down 70%.

A Kuwait City-based property listing platform serving licensed brokers across Kuwait and KSA. Roughly 3,000 active brokers, 180,000 monthly listings, Series B in motion.

Pod
Architect + 2 senior engineers + QA

Engagement
5 months, async-first, 3 sync hours weekly

Stack
Next.js, Node, Postgres, Go service for fingerprinting

The brief

The platform had grown faster than its moderation workflow. The average time from a broker hitting submit to a listing appearing on the search page was four hours and eleven minutes, almost all of it queue time. About 8% of incoming listings were duplicates of existing units under different IDs. Leadership was preparing for a Series B, and the diligence calls were going to ask uncomfortable throughput questions.

What we shipped

The first two weeks were not code. The architect and one engineer shadowed two moderation analysts and rebuilt their mental model of the workflow. The proposal that came out of that wasn't a rewrite. It was a set of smaller bets, sequenced so each could be rolled back independently.

Image and address fingerprinting service in Go, sitting in front of the listing API. Caught most duplicates before they reached a moderator.
Moderation queue rebuilt as a state machine. Inline correction replaced full resubmits for roughly 40% of rejected listings.
Ministry of Justice and PACI checks at submit time, so ownership and broker-ID errors surfaced in seconds instead of after a four-hour wait.
Arabic and English content parity validation, which the previous flow had punted to manual review.
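A moderation queue modelled as a state machine might look roughly like the sketch below. The state and event names, the `Transition` function, and the rule that a correctable rejection loops back into review under the same listing are our assumptions for illustration, not the client's actual schema.

```go
package main

import "fmt"

// State is a listing's position in a hypothetical moderation workflow.
type State string

const (
	Submitted State = "submitted"
	InReview  State = "in_review"
	NeedsFix  State = "needs_fix" // correctable rejection: broker edits inline
	Rejected  State = "rejected"  // hard rejection: full resubmit required
	Live      State = "live"
)

// Event is something that happens to a listing.
type Event string

const (
	StartReview Event = "start_review"
	Approve     Event = "approve"
	RequestFix  Event = "request_fix"
	Reject      Event = "reject"
	SubmitFix   Event = "submit_fix"
)

// transitions lists every legal (state, event) pair and its target state.
// Anything not in this table is refused, so illegal moves fail loudly.
var transitions = map[State]map[Event]State{
	Submitted: {StartReview: InReview},
	InReview:  {Approve: Live, RequestFix: NeedsFix, Reject: Rejected},
	// An inline correction returns the same listing to review; no new ID.
	NeedsFix: {SubmitFix: InReview},
}

// Transition returns the next state, or an error if the move is not allowed.
func Transition(s State, e Event) (State, error) {
	if next, ok := transitions[s][e]; ok {
		return next, nil
	}
	return s, fmt.Errorf("illegal transition: %s --%s-->", s, e)
}

func main() {
	s := Submitted
	for _, e := range []Event{StartReview, RequestFix, SubmitFix, Approve} {
		next, err := Transition(s, e)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("%s --%s--> %s\n", s, e, next)
		s = next
	}
}
```

The point of the explicit table is that the correctable path never leaves the listing's identity, which is what lets an inline fix stand in for a full resubmit.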

The numbers

9m 40s

Average submit-to-live, down from 4h 11m

2.3%

Duplicate rate in month one, down from 8.4%

3.1×

Moderator throughput per FTE

The Series B closed two months after the pod handed off.

Where having one team paid off

The decision to fingerprint images at submit time, instead of at moderation review, was made on a Tuesday morning in a 25-minute call. The architect proposed it. The QA lead pushed back: 22% of historical rejections were photo-quality issues that wouldn't fingerprint cleanly. We adjusted the algorithm to handle crops and rotations before the call ended. In a staff-augmentation model, that round trip is a three-day async thread with a project manager in the middle, and the QA insight usually lands after the engineer has already shipped the wrong version.
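The case doesn't name the fingerprinting algorithm. As one plausible shape, here is a sketch using a simple average hash (aHash) over an 8×8 grayscale thumbnail, compared against all four 90-degree rotations of the candidate. The `Grid` type, the aHash choice, and the assumption that decoding and downscaling happen upstream are ours; this sketch handles rotation only, not crops.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Grid is an 8x8 grayscale thumbnail. Decoding, grayscale conversion and
// downscaling are assumed to have happened upstream (hypothetical step).
type Grid [8][8]uint8

// rotate90 returns the grid rotated 90 degrees clockwise.
func rotate90(g Grid) Grid {
	var r Grid
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			r[x][7-y] = g[y][x]
		}
	}
	return r
}

// aHash sets bit y*8+x when that pixel is brighter than the grid's mean.
func aHash(g Grid) uint64 {
	sum := 0
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			sum += int(g[y][x])
		}
	}
	mean := sum / 64
	var h uint64
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			if int(g[y][x]) > mean {
				h |= 1 << uint(y*8+x)
			}
		}
	}
	return h
}

// Distance returns the smallest Hamming distance between a stored hash and
// any 90-degree rotation of the candidate, so a rotated re-upload still matches.
func Distance(stored uint64, candidate Grid) int {
	best := 64
	g := candidate
	for i := 0; i < 4; i++ {
		if d := bits.OnesCount64(stored ^ aHash(g)); d < best {
			best = d
		}
		g = rotate90(g)
	}
	return best
}

func main() {
	var g Grid
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			g[y][x] = uint8(3 * (y*8 + x)) // a simple brightness gradient
		}
	}
	stored := aHash(g)
	fmt.Println(Distance(stored, rotate90(g))) // 0: the rotated copy still matches
}
```

In practice a small distance threshold, not exact equality, would flag a probable duplicate; picking that threshold is an empirical call the case doesn't give data for.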

QA shadowed the moderation team in week one alongside the engineers, not after them. By the time the new moderation queue went out behind a feature flag, QA's test cases were broker-submission errors pulled from real production rejections, not synthetic happy-path scenarios. The first dogfooding round caught one bug. Not twelve.

What we didn't do

Replace anything in the existing Next.js or Node codebase. Both were healthy. Touching them would have added migration risk without changing the throughput story. The broker mobile app rebuild the client originally asked about was descoped after week two; the math didn't support it as a Series-B-readiness lever.

Case 02 · Specialty Retail · United States

Oversells cut to 0.04% at 14× baseline traffic on Black Friday.

A 40-store US specialty retailer running Shopify Plus on top of a Rails monolith, with two distribution centers, one on each coast. During the previous Black Friday/Cyber Monday (BFCM), oversells hit 1.8% of orders.

Pod
Architect + 2 senior engineers + QA + half-time SRE for cutover

Engagement
6 months, plus a follow-on quarter for returns and transfers

Stack
Rails, Shopify Plus, Kafka, Postgres event store, Go publishers

The brief

Inventory truth was split four ways: store POS, two WMS instances for the DCs, and Shopify Plus. They reconciled overnight, which meant a sweater sold in the Boston store at 11am could still be available online at 3pm. Last Black Friday, that gap cost them oversells on 1.8% of orders, three days of support backlog, and most of January's engineering capacity, spent on refund processing. They wanted the inventory layer rebuilt before the next BFCM cycle. They did not want a year-long platform replacement.

What we shipped

The proposal was to leave every existing system alone and put an event-sourced ledger underneath them. Each store, warehouse, and channel published events into Kafka. A projection worker built a fast read model that POS and Shopify Plus could query. Nobody had to migrate. Everybody had to subscribe.

Event ledger in Go, backed by Postgres + Kafka, with replay-from-zero as a first-class operation.
POS publisher agent that batched events to handle store backhaul latency. Idempotency keys on every event to survive double-publish during cutover.
Read model with sub-second freshness for hot SKUs, eventual for the long tail.
Three feature-flagged cutover stages. Each was rollback-safe inside five minutes.
Shopify Plus integration via webhook plus a scheduled reconciliation worker as a safety net.
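The ledger's core property, idempotent application plus replay-from-zero, fits in a few lines. This is an in-memory sketch under our own assumptions: the `Event` fields, the `Projection` type, and keeping seen idempotency keys in a map are illustrative, and the Postgres and Kafka layers the real system sat on are left out entirely.

```go
package main

import "fmt"

// Event is one inventory change published by a store, DC, or channel.
// Field names are illustrative, not the client's schema.
type Event struct {
	IdempotencyKey string // unique per logical change; reused on re-publish
	SKU            string
	Location       string
	Delta          int // positive for receipts, negative for sales
}

// Projection is an in-memory read model: on-hand quantity per SKU per location.
type Projection struct {
	seen  map[string]bool
	stock map[string]map[string]int
}

func NewProjection() *Projection {
	return &Projection{seen: map[string]bool{}, stock: map[string]map[string]int{}}
}

// Apply folds one event into the read model. It skips keys it has already
// seen, so a double-publish during cutover cannot double-count, and reports
// whether the event was applied.
func (p *Projection) Apply(e Event) bool {
	if p.seen[e.IdempotencyKey] {
		return false
	}
	p.seen[e.IdempotencyKey] = true
	if p.stock[e.SKU] == nil {
		p.stock[e.SKU] = map[string]int{}
	}
	p.stock[e.SKU][e.Location] += e.Delta
	return true
}

// Available sums on-hand stock for a SKU across all locations.
func (p *Projection) Available(sku string) int {
	total := 0
	for _, q := range p.stock[sku] {
		total += q
	}
	return total
}

// Replay rebuilds a fresh projection from the full event log ("replay from zero").
func Replay(log []Event) *Projection {
	p := NewProjection()
	for _, e := range log {
		p.Apply(e)
	}
	return p
}

func main() {
	log := []Event{
		{"po-1", "SWEATER-M", "store-boston", 10},
		{"pos-boston-42", "SWEATER-M", "store-boston", -1},
		{"pos-boston-42", "SWEATER-M", "store-boston", -1}, // re-published during cutover
	}
	fmt.Println(Replay(log).Available("SWEATER-M")) // 9
}
```

Because duplicates are dropped at apply time, replaying a log that contains re-published events yields the same read model as replaying a clean one, which is what makes replay-from-zero safe to run mid-cutover.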

The numbers (measured over BFCM)

0.04%

Oversell rate, down from 1.8%

3.8s

POS-to-OMS sync p50, down from 90s

14.2×

Peak traffic vs rolling baseline. Zero paging incidents.

Support volume the week after Black Friday came in 31% below the prior year's, despite a higher order count.

Where having one team paid off

The architecture question that mattered most for BFCM was whether reads off the inventory ledger needed to be strongly consistent or could be eventually consistent. We resolved it in a 40-minute call with the architect, both engineers, the half-time SRE, and the QA lead all in the room. The deciding input came from QA: in week two, QA had spent a day with the client's call-center supervisors and learned that phone-order reps needed real-time inventory, but the website could tolerate four seconds of lag because the cart already held its own inventory reservation. The architecture came out of that meeting with two consistency tiers, not one.

Idempotency keys on every event came from a QA-designed chaos drill in week six. We landed them before the first feature flag flipped. In a vendor-staffed model, that finding usually lands in production.

What hurt

The east DC's WMS publisher ran into a packet-fragmentation issue with its VPN appliance two weeks before BFCM. We caught it in shadow-mode reconciliation, not in production, but it cost the SRE four days to root-cause it and order replacement hardware. If the cutover schedule had been any tighter, that issue would have forced a go/no-go call on the BFCM cutover.

Case 03 · Last-Mile Logistics · LATAM

Dispatch latency cut more than 10×. Same-day pickups up 18%.

A Brazilian last-mile delivery operator running São Paulo, Rio de Janeiro, and Belo Horizonte with roughly 1,200 driver-partners. Planning expansion to Curitiba and Porto Alegre in 2026.

Pod
Architect + 2 senior engineers + QA + half-time DevOps for cutover

Engagement
5 months. Re-engaged 3 months later for a cross-border scope.

Stack
PHP/MySQL legacy, new Go dispatch service, Redis Streams, OR-Tools

The brief

Dispatch decisions were taking eight to twelve seconds during the morning peak. The Android driver app froze on poor network connections on the outskirts of São Paulo. The COO had set a regional expansion target for 2026, and the engineering team's own assessment was that the existing dispatch service would not survive a doubling of driver count. We were not asked to rewrite anything. We were asked to figure out what to do.

What we shipped

Two and a half weeks of reading code and riding along with three dispatch agents and four drivers across different bairros. The recommendation we came back with was the one the client had been resisting: split dispatch from the monolith and rewrite it in Go. Not because PHP was the problem. Because the synchronous nested-loop assignment algorithm was. The rest of the platform stayed on PHP. The COO approved the scope on the condition that we ship within five months and not break the existing driver app during cutover.

Dispatch microservice in Go with an OR-Tools-backed graph assignment routine.
Driver telemetry pipeline rebuilt on Redis Streams. Sub-second position updates feed the dispatcher.
Android app: rewritten location and queue layer only. The rest of the UI was left alone.
Shadow mode for three weeks. The new dispatcher made decisions in parallel without acting on them, so the team could diff them against the legacy dispatcher's.
Three-bairro phased cutover, with per-region rollback in under five minutes.
Optimistic reassignment that revisits a decision when a driver becomes available, instead of waiting for the next assignment tick.
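At its core, dispatch is an assignment problem: match pending orders to available drivers at minimum total cost. OR-Tools ships solvers for exactly this; to keep the sketch dependency-free, it uses the classic Hungarian algorithm in plain Go instead. The cost matrix (say, estimated pickup minutes) and the `Assign` name are our assumptions, not the client's code.

```go
package main

import (
	"fmt"
	"math"
)

// Assign solves the rectangular assignment problem with the Hungarian
// algorithm in O(n^2 * m). cost[i][j] is the cost of giving order i to
// driver j; it requires len(cost) <= len(cost[0]). It returns each order's
// driver index and the total cost.
func Assign(cost [][]float64) ([]int, float64) {
	n, m := len(cost), len(cost[0])
	inf := math.Inf(1)
	u := make([]float64, n+1) // row potentials (1-indexed)
	v := make([]float64, m+1) // column potentials (1-indexed)
	p := make([]int, m+1)     // p[j]: row matched to column j (0 = none)
	way := make([]int, m+1)   // back-pointers along augmenting paths

	for i := 1; i <= n; i++ {
		p[0] = i
		j0 := 0
		minv := make([]float64, m+1)
		used := make([]bool, m+1)
		for j := range minv {
			minv[j] = inf
		}
		for {
			used[j0] = true
			i0, delta, j1 := p[j0], inf, 0
			for j := 1; j <= m; j++ {
				if used[j] {
					continue
				}
				if cur := cost[i0-1][j-1] - u[i0] - v[j]; cur < minv[j] {
					minv[j], way[j] = cur, j0
				}
				if minv[j] < delta {
					delta, j1 = minv[j], j
				}
			}
			// Shift potentials so at least one new column becomes reachable.
			for j := 0; j <= m; j++ {
				if used[j] {
					u[p[j]] += delta
					v[j] -= delta
				} else {
					minv[j] -= delta
				}
			}
			j0 = j1
			if p[j0] == 0 {
				break // reached a free driver
			}
		}
		// Walk the augmenting path back, flipping matches along the way.
		for j0 != 0 {
			j1 := way[j0]
			p[j0] = p[j1]
			j0 = j1
		}
	}

	assign := make([]int, n)
	total := 0.0
	for j := 1; j <= m; j++ {
		if p[j] != 0 {
			assign[p[j]-1] = j - 1
			total += cost[p[j]-1][j-1]
		}
	}
	return assign, total
}

func main() {
	// Rows: orders. Columns: drivers. Values: estimated pickup minutes.
	cost := [][]float64{
		{1, 2},
		{2, 100},
	}
	assign, total := Assign(cost)
	fmt.Println(assign, total) // [1 0] 4
}
```

The example shows why a joint solve matters: a greedy pass that takes each order in turn and grabs its cheapest free driver would give order 0 driver 0 and leave order 1 with a 100-minute pickup, while the assignment solve finds a 4-minute total.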

The numbers

720ms

Dispatch p50, down from 8.4s

−92%

Driver app crash rate

+18%

Same-day pickups on equivalent demand

Zero unplanned downtime during the cutover. Re-engaged for cross-border deliveries three months after handoff.

Where having one team paid off

The decision to split dispatch from the monolith, instead of trying to optimize the PHP in place, was made at the end of the two-and-a-half-week discovery period, in a 90-minute conversation. The architect, both engineers, and QA had the legacy code and the ride-along notes open in the same room. Three weeks of staff-aug back-and-forth, replaced by a single Tuesday afternoon. The client's COO joined for the last 20 minutes, heard from the same people who had been in the conversation from the start, and approved the scope without needing a separate vendor-vetting cycle.

The three-week shadow mode was QA-led, not engineer-led. Because QA had been in the architecture conversation, the rollback triggers we built into the cutover dashboards matched the actual risk surface: assignment quality versus legacy, driver-app crash rate, and p99 latency. A bolt-on QA team writes regression tests against a spec. Ours wrote shadow-mode comparators against the legacy dispatcher's real decisions.

What we left for the client

A driver-side feature for offline-first order acceptance was on the original list. We dropped it in month three because the data showed under 2% of dispatches were affected by network loss long enough to matter, and the engineering cost would have pushed the cutover into BFCM-equivalent peak. The client's own team picked it up six months later.

Patterns

What's the same across all three.

The first two weeks aren't code. They're reading the existing codebase, watching the team work, and pushing back on the original ask if the math doesn't support it. Two of the three engagements above changed scope before we wrote a line.

No platform replacements. We don't move clients off Shopify, Rails, PHP, or Next.js. We add what's missing and leave what's working. The fastest path to a result is usually a narrow new service alongside the legacy one, not a rewrite.

Cutover is the deliverable. Every engagement above had a multi-stage, rollback-safe cutover plan as part of the scope. Shadow mode where it was cheap. Feature flags where it wasn't. Rollback drills before flipping any traffic.

We say what we didn't do. Each case above lists the thing we descoped or punted. We tell clients no early, so we don't tell them sorry later.

One pod, no handoffs. Architect, engineers, and QA are in the same Slack workspace, the same review calls, and the same shadow weeks with your team. Architecture decisions get made in one short meeting instead of a three-day async thread, because the people who'd push back on a bad call are already in the room. QA writes tests against real workflows, not against a spec written by someone they've never met.

Your engagement isn't a case study yet.
Let's make it one.

A 25-minute call to scope the work. If the math doesn't support an engagement, we'll say so.


