The 2026 Engineering Stack Decision Tree

Metafic Team, May 3, 2026

Stack arguments on Twitter are entertaining. Stack arguments at month six of a product build are expensive. The first is a sport. The second is a postmortem.

Most teams pick a stack on vibes. Someone on the team is excited about a framework, someone else read a blog post, and a decision that will shape the next three years of engineering hours gets made in a Slack thread on a Wednesday afternoon. Then the team spends the next eighteen months either pretending the choice was great or quietly building escape hatches around it.

This post is the fit-based version. We have shipped on most of the stacks below, watched several get retired in production, and rescued enough projects to have an opinion about why that happens. The point of a stack decision is not to be defensible on Hacker News. The point is to be defensible six months in, when the original author has left, the requirements have changed, and the bill is real.

What follows is the decision tree, layer by layer: frontend, backend, database, deployment. We name the cases where each option is the right answer, the cases where it’s the wrong one, and what we have actually seen go sideways in production. At the end we walk a worked example and a list of stacks we would push back on if a team proposed them today.

Frontend

The frontend question is not “which framework is best”. The question is “what shape is your app, and which framework was built for that shape”. The honest answer for most product teams in 2026 is still Next.js, but Next.js is also the most expensively misused tool in this category. Let’s go through the realistic options.

Next.js. Right answer for: product teams shipping a web app that needs SEO, server-rendered marketing pages, authenticated dashboards, and a real backend story in the same codebase. The App Router and server components removed the ceremony around data fetching, and the deployment story on Vercel (or Cloudflare, or self-hosted Node) is mature. We default to it when a team needs to ship a full SaaS product with a public surface and an authenticated surface in one codebase. Wrong answer for: pure marketing sites (use Astro), internal tools where SEO and RSC do not matter (use plain React + Vite), or content-heavy publications where you do not need any client framework at all. Who’s using it in production: most consumer SaaS shipping in 2026, most YC companies, a significant chunk of mid-market dashboards. If a team is hiring for frontend, the resume pool is largest here, which matters more than people admit. When we staff a Next.js pod, this is the workload we are matching against.

Remix. Right answer for: teams who genuinely care about web fundamentals (forms, progressive enhancement, nested routing) and have a workload that maps cleanly to those primitives. Wrong answer for: most product apps in 2026. The marketing for Remix oversells its general fit. After the Shopify acquisition the project stabilised, but the feature gap with Next.js narrowed from both sides: Next added server actions, and Remix adopted pieces it had previously rejected. If your team is not already deeply opinionated about the Remix way of doing forms and loaders, the rational pick is Next.js, and you will find it easier to hire for. Real production use exists, but most of it comes from teams who picked Remix during the 2022-2024 window and have since stayed put.

SvelteKit. Right answer for: small teams (two to four engineers) who value developer experience above all else and are willing to trade ecosystem breadth for code that reads cleanly. The compiler approach pays off in bundle size and runtime simplicity. Wrong answer for: anything where you need a deep library ecosystem (auth providers, headless CMS integrations, complex form libraries, rich text editors). The ecosystem has grown, but you will still hit walls Next.js solved years ago. We see SvelteKit working well for agencies, internal tools, and small, ambitious startups. We see it failing when a team scales past about six engineers and the hiring funnel goes dry.

HTMX. Right answer for: content-heavy sites, internal tools, and server-rendered apps where every interaction maps cleanly to a server round-trip. Pair it with whatever backend you already use. HTMX is the correct answer more often than the JavaScript industrial complex wants to admit: admin panels, B2B internal dashboards, and marketing CMSes with light interactivity all ship faster and break less with HTMX than with a SPA. Wrong answer for: anything that needs offline support, complex client-side state (drag-and-drop interfaces, real-time collaborative editing, anything resembling Figma or Linear), or rich client interactions that cannot tolerate a server round-trip. We have shipped HTMX in production for Rails monoliths and Django shops, and the velocity gain over a separate SPA is real.
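
Here is what the server round-trip model looks like in practice. The server owns the state and answers each interaction with a fragment of HTML, which htmx swaps into the page. This is a minimal sketch in plain Python with no framework. The `/todos/<id>/toggle` route and the in-memory `todos` dict are hypothetical. `hx-post`, `hx-target`, and `hx-swap` are standard htmx attributes.

```python
from html import escape


def render_todo_row(todo_id: int, title: str, done: bool) -> str:
    """Render one table row. htmx swaps the returned HTML into the page,
    so there is no client-side copy of the state to keep in sync."""
    status = "done" if done else "open"
    return (
        f'<tr id="todo-{todo_id}">'
        f"<td>{escape(title)}</td>"
        f"<td>{status}</td>"
        # Clicking POSTs to the (hypothetical) route; the response replaces this row.
        f'<td><button hx-post="/todos/{todo_id}/toggle" '
        f'hx-target="#todo-{todo_id}" hx-swap="outerHTML">Toggle</button></td>'
        "</tr>"
    )


def toggle_handler(todos: dict[int, dict], todo_id: int) -> str:
    """What a POST /todos/<id>/toggle handler might do in any backend framework."""
    todo = todos[todo_id]
    todo["done"] = not todo["done"]
    return render_todo_row(todo_id, todo["title"], todo["done"])
```

Everything the button needs to know lives in the markup the server sent. That is why the model holds up for admin panels and breaks down for drag-and-drop or collaborative editing, where client state changes faster than round-trips can keep up.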

Plain React + Vite. Right answer for: internal tools, embedded apps inside a larger product, dashboards that live behind auth and do not need SEO, and any case where the Next.js opinions get in the way more than they help. Wrong answer for: anything public, anything that needs SSR, anything where you would rather not build your own routing, data fetching, and auth conventions from scratch. The trap here is that “just React” sounds simple at week one and starts to look like a worse version of Next.js by month six because you’ve been hand-rolling all the missing pieces. If you go this route, accept that you are owning that infrastructure forever. Our React pods handle both flavours: standalone Vite apps and Next.js builds.

The strong take: Next.js for most product teams. HTMX for content-heavy and internal tools. SvelteKit for small teams who treat DX as a load-bearing requirement. Remix has a narrower fit than its marketing suggests. Plain React + Vite when the Next.js abstractions actively hurt.
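
The strong take maps naturally onto an ordered set of rules. This is a minimal sketch under assumptions of our own. The `FrontendNeeds` fields are hypothetical yes/no inputs, and the team-size cutoff of four comes from the SvelteKit section. The Astro case for pure marketing sites is left out to keep it short. The first matching rule wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontendNeeds:
    """Hypothetical inputs for the frontend rules; the field names are ours."""
    public_pages: bool        # SEO-sensitive or marketing surface
    content_heavy: bool       # mostly pages and forms, light interactivity
    rich_client_state: bool   # drag-and-drop, offline, real-time collaboration
    team_size: int
    dx_is_load_bearing: bool  # team explicitly trades ecosystem breadth for DX


def pick_frontend(needs: FrontendNeeds) -> str:
    """Apply the strong take as ordered rules; the first match wins."""
    # Content-heavy sites and internal tools where every interaction
    # can be a server round-trip.
    if not needs.rich_client_state and (needs.content_heavy or not needs.public_pages):
        return "HTMX"
    # Small teams (two to four engineers) that treat DX as a requirement.
    if needs.dx_is_load_bearing and needs.team_size <= 4:
        return "SvelteKit"
    # Behind auth, no SEO, but real client-side state: Next.js opinions get in the way.
    if not needs.public_pages:
        return "React + Vite"
    # A public surface plus an authenticated app: the default.
    return "Next.js"
```

Next.js sits at the bottom deliberately. It is what is left when nothing more specific matches, which is another way of saying it is the default.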

Backend

The backend question is two questions: what does your team know, and what does the workload demand. Most stack debates pretend only the second matters. In practice both matter, and the team-skill side often wins.

Node.js (Express, Fastify, NestJS). Right answer for: full-stack TypeScript teams who want one language across the wire, and for I/O-bound workloads (a lot of waiting on databases and APIs, not a lot of CPU work). Fastify is the sane default in 2026; Express works but the ecosystem moved on; NestJS is fine if your team genuinely wants the Angular-flavoured opinions and decorators, and a tax if they don’t. Wrong answer for: CPU-heavy workloads (image processing, encoding, ML inference at scale), and any team that doesn’t already have TypeScript fluency. The common scaling pitfall: a Node service grows to do CPU work it was never designed for, the event loop blocks, and you discover this in production at 3am. The fix is to push CPU work to a worker process or a different runtime, not to add more Node instances. Our Node backend pods work primarily with Fastify, and we have rescued enough Express monoliths to be cautious about the upgrade paths.

Python (FastAPI, Django). Right answer for: anything ML-adjacent, anything data-heavy, anything where the team needs the Python ecosystem (NumPy, pandas, scikit-learn, the entire AI stack). FastAPI for service APIs. Django for full CRUD products where the admin, ORM, and conventions earn their keep. Wrong answer for: high-concurrency network services where the GIL becomes the ceiling. That holds even with the free-threaded (no-GIL) build, which arrived as experimental in 3.13 and became officially supported, though still optional, in 3.14. Django specifically is the wrong answer when you need flexibility in the data layer that the ORM doesn’t cleanly give you. Common scaling pitfall: a Python service hits CPU limits and the team tries an async rewrite, even though async only helps with time spent waiting on I/O. The real fixes are moving the hot path to Go or splitting CPU work out to a worker queue. We staff Python pods and dedicated Django teams, and the workloads rarely overlap.
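
An async rewrite does not rescue CPU-bound Python because `async def` changes how a function yields control, not how much CPU it burns. This is a minimal, standard-library-only sketch. `busy_wait` is a hypothetical stand-in for the real hot path. A heartbeat task measures the longest stretch during which the event loop could not serve anything else.

```python
import asyncio
import time


def busy_wait(seconds: float) -> None:
    """Stand-in for CPU-bound work: spins the CPU without ever yielding."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass


async def worst_loop_stall(handler, tick: float = 0.01) -> float:
    """Run `handler` while a heartbeat ticks; return the longest gap between ticks."""
    gaps: list[float] = []
    stop = asyncio.Event()

    async def heartbeat() -> None:
        last = time.perf_counter()
        while not stop.is_set():
            await asyncio.sleep(tick)
            now = time.perf_counter()
            gaps.append(now - last)
            last = now

    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(tick * 3)  # let the heartbeat start ticking
    await handler()
    stop.set()
    await hb
    return max(gaps)


async def async_but_blocking() -> None:
    # The "async rewrite": the function is async, but the CPU work still runs on the loop.
    busy_wait(0.2)


async def offloaded_to_thread() -> None:
    # Moves the work off the loop thread. Under the GIL this keeps the loop
    # responsive but adds no CPU parallelism.
    await asyncio.to_thread(busy_wait, 0.2)


if __name__ == "__main__":
    blocked = asyncio.run(worst_loop_stall(async_but_blocking))
    offloaded = asyncio.run(worst_loop_stall(offloaded_to_thread))
    print(f"async but blocking: {blocked:.3f}s stall, offloaded: {offloaded:.3f}s stall")
```

Expect the blocking version to stall the loop for roughly the full 0.2 s, while the thread version stays close to the 10 ms tick. The thread buys responsiveness only. Real CPU throughput still comes from the options in the paragraph above: a worker queue, separate processes, or moving the hot path to Go.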

Go. Right answer for: high-concurrency network services, anything where you would otherwise reach for Java or .NET but want a smaller operational surface, and any service where binary deployment and predictable resource use matter. The standard library carries most of what you need; the framework debate (gin vs chi vs echo vs none) is mostly aesthetics. Wrong answer for: rapid CRUD product development with a small team. The verbosity is real, and a typical feature takes more code than it would in Python or Rails. Common pitfall: teams pick Go because they read that it scales well, then spend the first six months reinventing the wheel because there is no Rails-equivalent batteries-included framework. Pick Go when you have a real concurrency or performance requirement, not because it’s fashionable. Our Go services pods tend to be staffed for infrastructure and high-throughput API work, not for the first version of a SaaS product.

Ruby on Rails. Right answer for: fast CRUD products with conventional patterns, B2B SaaS where the workload is “users, accounts, billing, dashboards, the occasional background job”, and teams who want to ship features at the pace Rails actually delivers. Hotwire (Turbo + Stimulus) is a serious answer for the frontend half, especially in 2026 with the upgrades that landed in Rails 8. Wrong answer for: services where Ruby’s runtime characteristics dominate (high-concurrency network services, CPU-heavy work, anything where memory footprint matters), and teams without anyone fluent in Rails conventions. The pitfall: a Rails app grows into a monolith that wasn’t decomposed when it should have been, and the team blames Rails instead of the absence of service boundaries. Done well, Rails monoliths run for a decade. Our Rails monolith pods do as much rescue work as net-new builds, which tells you something.

.NET. Right answer for: enterprise contexts where the team is already in the Microsoft ecosystem, regulated industries with strong tooling needs, and any team where C# fluency is the constraint. The runtime is genuinely fast, the tooling is excellent, and ASP.NET Core is a sane web framework. Wrong answer for: startups outside the .NET world. The hiring pool and library ecosystem outside enterprise contexts will frustrate you. Common pitfall: teams pick .NET because the original founder came from a Microsoft background, then can’t hire for it once they move past employees one through five.

The strong take: Go for high-concurrency network services. Python for ML-adjacent and data-heavy. Rails for fast CRUD products with conventional patterns. Node for full-stack TypeScript teams. .NET for enterprise. The team-skill question matters as much as the workload question, and we wish more stack debates admitted it.
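
The "team skill matters as much as workload" point can be sketched in the same way. The workload narrows the field, and team fluency picks within it. The `Workload` categories and the substitutes we list after each default are our own assumptions. The Node entry for conventional CRUD mirrors the swap made in the worked example later in this post.

```python
from enum import Enum, auto


class Workload(Enum):
    HIGH_CONCURRENCY_NETWORK = auto()
    ML_OR_DATA_HEAVY = auto()
    CONVENTIONAL_CRUD = auto()
    ENTERPRISE_MICROSOFT = auto()
    IO_BOUND_API = auto()  # Node when the team is full-stack TypeScript


# The first entry is the default from the strong take. Later entries are
# substitutes we would accept if the team is fluent in them (our assumption).
FITS: dict[Workload, list[str]] = {
    Workload.HIGH_CONCURRENCY_NETWORK: ["Go", ".NET"],
    Workload.ML_OR_DATA_HEAVY: ["Python"],
    Workload.CONVENTIONAL_CRUD: ["Rails", "Node", "Python"],
    Workload.ENTERPRISE_MICROSOFT: [".NET"],
    Workload.IO_BOUND_API: ["Node", "Go", "Python"],
}


def pick_backend(workload: Workload, team_knows: set[str]) -> str:
    """The workload narrows the options; team fluency picks among them."""
    fits = FITS[workload]
    for runtime in fits:
        if runtime in team_knows:
            return runtime
    # Nobody is fluent in any good fit: take the default and budget for hiring.
    return fits[0]
```

The ML row has no fluent-team escape hatch, because that workload leaves only one sensible answer. Fluency in Go does not get you the Python ecosystem.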

Database

Use Postgres. Then read the exceptions. The exceptions matter, but the default matters more, and most teams over-engineer this layer because someone in the room read a blog post about a niche tool.

Postgres. Right answer for: 95% of products. Transactional workloads, relational data, JSON when you need it, full-text search when you don’t want to stand up a separate service, geo queries with PostGIS, vector search with pgvector. The extension ecosystem in 2026 means Postgres is genuinely the answer to questions that used to require a separate system. Operational tooling is mature across every cloud provider. Hiring people who know it is easy. Wrong answer for: nothing on the first version of a product. There is no startup that should pick a different primary store before they hit a real, measured constraint Postgres cannot meet. The common failure mode: a team picks something else, runs into operational pain, and migrates to Postgres anyway by year two.

DynamoDB. Right answer for: key-value workloads at extreme write scale where the access patterns are known up front and will not change. Single-table design is a real discipline and pays off when the workload fits. Wrong answer for: anything relational, anything where you will need new query patterns later, and any team that doesn’t have someone fluent in DynamoDB modelling. The pitfall: a team picks Dynamo because it sounds like the AWS-native answer, then six months later they need a different access pattern and the redesign is painful. If you do not have a quantitative reason Postgres won’t serve, you do not need Dynamo.
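
"Access patterns known up front" has a concrete meaning in single-table design: the key layout is the query plan. This sketch uses a hypothetical in-memory `FakeTable` rather than boto3, along with made-up `ACCOUNT#`, `USER#`, and `INVOICE#` key conventions. It shows that the two patterns baked into the keys are cheap prefix reads on a single partition.

```python
from collections import defaultdict


class FakeTable:
    """In-memory stand-in for one DynamoDB table (not the boto3 API).
    Items are grouped by partition key (pk) and ordered by sort key (sk)."""

    def __init__(self) -> None:
        self.partitions: defaultdict[str, dict[str, dict]] = defaultdict(dict)

    def put(self, pk: str, sk: str, **attrs) -> None:
        self.partitions[pk][sk] = attrs

    def query(self, pk: str, sk_prefix: str = "") -> list[tuple[str, dict]]:
        # Same shape as a Query with: pk = :pk AND begins_with(sk, :prefix)
        items = self.partitions.get(pk, {})
        return [(sk, items[sk]) for sk in sorted(items) if sk.startswith(sk_prefix)]


# Hypothetical key conventions: every key encodes an access pattern.
def account_pk(account_id: int) -> str:
    return f"ACCOUNT#{account_id}"


def user_sk(email: str) -> str:
    return f"USER#{email}"


def invoice_sk(issued_on: str, number: int) -> str:
    # ISO dates sort lexicographically, so the prefix "INVOICE#2026-05" selects one month.
    return f"INVOICE#{issued_on}#{number:06d}"


table = FakeTable()
pk = account_pk(42)
table.put(pk, "META", name="Acme")
table.put(pk, user_sk("alice@acme.test"), role="admin")
table.put(pk, invoice_sk("2026-04-28", 7), total=1200)
table.put(pk, invoice_sk("2026-05-02", 8), total=300)

may_invoices = table.query(pk, "INVOICE#2026-05")  # access pattern 1: invoices by month
users = table.query(pk, "USER#")                    # access pattern 2: users in an account
```

A question the keys were not designed for, such as "every invoice over $1,000 across all accounts", has no prefix to read. In real DynamoDB that means either a full Scan or a new global secondary index, which is exactly the redesign the paragraph above warns about.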

ClickHouse. Right answer for: analytics workloads, event streams, anything where you are aggregating across hundreds of millions of rows and queries can be eventually consistent. Pair it with Postgres for the operational store. Wrong answer for: anything resembling OLTP, anything with frequent updates, anything where you need ACID transactions. The pitfall: teams try to use ClickHouse as a primary store and discover the update semantics aren’t what they expected.

Redis. Right answer for: cache, ephemeral session storage, lightweight queues (Sidekiq, BullMQ, or equivalent), pub/sub for in-memory fanout, rate limiting. Wrong answer for: anything you cannot lose. Treat Redis as ephemeral, even with persistence enabled, and your operational life gets easier. The pitfall: a team starts storing real data in Redis because it’s fast, then has an outage and discovers what “ephemeral” actually meant.
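
Rate limiting is a good example of data that belongs in Redis, because losing it costs nothing. Below is a minimal fixed-window limiter, written against a tiny `CounterStore` protocol. redis-py's `Redis` client provides `incr(name, amount=1)` and `expire(name, time)`, so it should fit that shape. An in-memory double lets the sketch run without a server. The key naming and the window arithmetic are our own choices.

```python
import time
from typing import Optional, Protocol


class CounterStore(Protocol):
    """The two commands the limiter needs (redis-py's Redis client has both)."""
    def incr(self, name: str, amount: int = 1) -> int: ...
    def expire(self, name: str, time: int) -> bool: ...


def allow_request(store: CounterStore, client_id: str, limit: int,
                  window_s: int, now: Optional[float] = None) -> bool:
    """Fixed-window rate limit: at most `limit` requests per `window_s` seconds."""
    now = time.time() if now is None else now
    window = int(now // window_s)            # which window this request falls in
    key = f"ratelimit:{client_id}:{window}"  # hypothetical key naming
    count = store.incr(key)
    if count == 1:
        # First hit in this window: give the key a TTL so Redis cleans it up.
        store.expire(key, window_s)
    return count <= limit


class InMemoryCounters:
    """Test double with the same two methods, for running this without Redis."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}
        self.ttls: dict[str, int] = {}

    def incr(self, name: str, amount: int = 1) -> int:
        self.counts[name] = self.counts.get(name, 0) + amount
        return self.counts[name]

    def expire(self, name: str, time: int) -> bool:
        self.ttls[name] = time
        return True
```

Two caveats apply. First, the INCR/EXPIRE pair is not atomic as written, so a production version would send both in a MULTI/EXEC transaction or a small Lua script. Second, if Redis loses these counters, clients simply get a fresh window, which is exactly the kind of loss that is safe to accept.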

SQLite. Right answer for: local-first apps, embedded use, single-machine deployments where the operational simplicity is worth more than the scaling ceiling, and any case where Litestream or LiteFS gives you the durability story you need. With LiteFS the case for SQLite in production has genuinely strengthened over the past two years. Wrong answer for: anything with meaningful concurrent write throughput from multiple processes across machines, or anything where you need real replication semantics beyond what the SQLite-on-Fly story gives you.
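
The scaling ceiling for SQLite is easy to see with the standard library alone. WAL mode lets readers proceed while a write is in flight, but SQLite still allows exactly one writer at a time. This is a minimal sketch. The `open_db` settings are common choices rather than a prescription, and Litestream/LiteFS replication is out of scope here.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_db(path: Path, timeout: float = 5.0) -> sqlite3.Connection:
    """Open a connection with settings commonly used for SQLite in production."""
    conn = sqlite3.connect(path, timeout=timeout)  # busy timeout: how long to wait for locks
    conn.execute("PRAGMA journal_mode=WAL")        # readers no longer block the writer
    # Fewer fsyncs; in WAL mode a power loss can drop the last commits but not corrupt the file.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def single_writer_demo(path: Path) -> dict[str, object]:
    setup = open_db(path)
    setup.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
    setup.close()

    writer = open_db(path)
    reader = open_db(path)
    impatient = open_db(path, timeout=0)  # a second writer that refuses to wait

    writer.execute("BEGIN IMMEDIATE")  # take the database's single write lock
    writer.execute("INSERT INTO events (body) VALUES ('pending')")

    # The reader is not blocked; it sees the last committed snapshot.
    seen_during = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    try:
        impatient.execute("BEGIN IMMEDIATE")
        second_writer_blocked = False
    except sqlite3.OperationalError:  # "database is locked"
        second_writer_blocked = True

    writer.commit()
    seen_after = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    for conn in (writer, reader, impatient):
        conn.close()
    return {"seen_during": seen_during,
            "second_writer_blocked": second_writer_blocked,
            "seen_after": seen_after}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(single_writer_demo(Path(tmp) / "app.db"))
```

That single write lock is the ceiling the paragraph describes. It works fine for one process on one box with short transactions, and it is the wrong shape for many writers spread across machines.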

The honest truth: most products need Postgres and one or two of the above. Postgres for everything transactional, Redis for cache and queue, ClickHouse if you genuinely have analytics scale. The rest is over-engineering, and we have seen enough mid-stage rewrites caused by speculative database choices to be confident in that take.

Deployment and infra

This is the layer where the bill becomes real. Pick wrong and you either pay too much, run out of platform when you grow, or end up with an operational burden that the team isn’t staffed for.

Vercel. Right answer for: Next.js apps at small to mid scale where the team wants zero ops, preview deployments are non-negotiable, and the bill is acceptable. The DX is genuinely best in class for Next.js. Wrong answer for: high-traffic apps where the bill scales faster than your revenue does, anything that needs a long-running process (sockets, queues, background jobs that don’t fit in serverless), and any case where you need to run code in regions where Vercel’s pricing is not competitive. The migration off-ramp: self-host Next.js on a Node server behind your own CDN, or move to Cloudflare. Both are real options in 2026, but neither matches the Vercel DX.

Cloudflare Pages and Workers. Right answer for: static frontends, edge-rendered apps, services where you genuinely benefit from running compute at the edge (auth checks, A/B routing, geo logic). The economics are excellent at scale. Wrong answer for: workloads with heavy server-side compute that don’t fit the Workers model, anything needing long-running connections beyond what Durable Objects support, or teams that want a traditional Node runtime. The pitfall: a team picks Workers, then needs a piece of Node-only middleware, then spends a sprint rewriting it.

Fly.io. Right answer for: full-stack apps that want global deployment, Rails and Django apps that need a real Postgres next to them, anything where you want Docker without learning Kubernetes. The Postgres story (Fly Postgres) is good enough for most production workloads. Wrong answer for: teams who need the deep IAM and service catalogue of a hyperscaler, or workloads with extreme scale where the cost story changes. The pitfall: relying on Fly Postgres without understanding the failover semantics; their docs are clear but teams skim. We default to Fly for Rails and Django builds that need real geographic distribution.

Railway. Right answer for: prototypes, internal tools, side projects that occasionally become real businesses. The DX is genuinely good, the pricing predictable at small scale. Wrong answer for: serious production traffic. Not because Railway can’t handle it, but because at that scale you usually need more control over networking and database operations than the platform exposes.

Render. Right answer for: small teams who want a Heroku-shaped platform with sane pricing and a reasonable database story. Wrong answer for: anything where you need fine-grained networking control or extreme scale. The platform is solid, but the ceiling is real.

AWS (the real one, not Amplify). Right answer for: teams with operational maturity who need the full service catalogue, regulated industries that need the compliance story, and any company past about Series B where the bill on a PaaS becomes worse than the ops cost. Wrong answer for: small teams without a dedicated platform engineer. You will spend more in engineer-hours configuring AWS than you save in compute. The off-ramp from AWS exists but is rarely taken; once a company is deep in AWS, the inertia is enormous.

GCP. Right answer for: teams already using BigQuery for analytics, teams who want Cloud Run as a simpler-than-AWS compute primitive, anything ML-adjacent where the Vertex story matters. Wrong answer for: anything where the team is more comfortable in the AWS service catalogue. GCP is a real cloud, but the network effects favour AWS for most hiring pools.

Your own Kubernetes. Right answer for: companies with a real platform team, predictable multi-cloud requirements, or genuine scale where the per-pod cost wins matter. Wrong answer for: 95% of teams who propose it. Kubernetes is a platform for building platforms; if you do not have the headcount to run a platform, you should not be running Kubernetes. The pitfall is universal: a team picks K8s for “scalability”, then spends two engineers full-time on YAML, then wonders why velocity is slow.

A worked example

A B2B SaaS for around 500 customers, mid-market, complex auth (SAML, SCIM, role-based access), audit logs that the security team will inspect, and a webhook API that customers integrate into. What stack does the decision tree produce?

Frontend: Next.js. Public marketing pages need SEO. The authenticated app has a real backend story. The team needs to ship a dashboard, a settings surface, an admin area, all of it. Hiring is easy. We are not going to be clever here.

Backend: Rails. The workload shape is conventional: users, organisations, roles, audit records, webhooks, billing. Rails was designed for this shape. The team can ship a feature a week, every week, with conventions that the security audit can actually evaluate. We could also pick Node here, and if the team is full-stack TypeScript that’s a fine swap. We are not picking Go or .NET; neither maps to the workload.

Database: Postgres. The audit log is relational. The auth model is relational. The webhook delivery records are relational. There is no part of this workload that wants a different primary store. Add Redis for caching and as the Sidekiq queue backend. That’s it.
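
"Relational" is the operative word here because the security team's question is a single join: who changed what, in which organisation, and did the customer's webhook fire? The table and column names below are hypothetical. Python's built-in `sqlite3` stands in for Postgres only so the sketch runs without a server. The schema and query should carry over to Postgres with minor type changes, for example `timestamptz` instead of ISO-8601 text.

```python
import sqlite3

SCHEMA = """
CREATE TABLE organisations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    organisation_id INTEGER NOT NULL REFERENCES organisations(id),
    email TEXT NOT NULL UNIQUE,
    role TEXT NOT NULL CHECK (role IN ('owner', 'admin', 'member'))
);
CREATE TABLE audit_events (
    id INTEGER PRIMARY KEY,
    organisation_id INTEGER NOT NULL REFERENCES organisations(id),
    actor_id INTEGER NOT NULL REFERENCES users(id),
    action TEXT NOT NULL,
    occurred_at TEXT NOT NULL
);
CREATE TABLE webhook_deliveries (
    id INTEGER PRIMARY KEY,
    audit_event_id INTEGER NOT NULL REFERENCES audit_events(id),
    endpoint_url TEXT NOT NULL,
    status_code INTEGER,
    attempted_at TEXT NOT NULL
);
"""

# The auditor's question as one query: actor, action, and delivery outcome per event.
AUDIT_TRAIL = """
SELECT u.email, u.role, e.action, e.occurred_at, d.status_code
FROM audit_events e
JOIN users u ON u.id = e.actor_id
LEFT JOIN webhook_deliveries d ON d.audit_event_id = e.id
WHERE e.organisation_id = ?
ORDER BY e.occurred_at
"""


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO organisations VALUES (1, 'Acme')")
    conn.execute("INSERT INTO users VALUES (10, 1, 'ana@acme.test', 'admin')")
    conn.execute("INSERT INTO audit_events VALUES (100, 1, 10, 'role.changed', '2026-05-01T09:00:00Z')")
    conn.execute("INSERT INTO audit_events VALUES (101, 1, 10, 'sso.configured', '2026-05-01T09:05:00Z')")
    conn.execute("INSERT INTO webhook_deliveries VALUES "
                 "(1000, 100, 'https://example.com/hooks', 200, '2026-05-01T09:00:01Z')")
    conn.commit()
    return conn
```

The foreign keys do quiet but important work here: an audit row cannot point at a user who does not exist, which is the property an auditor actually checks. In a store without relations, that invariant has to live in application code.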

Deployment: Fly.io. The team is four engineers. They do not have a platform engineer. They want preview deployments, regional databases for the EU customer who asked, and a deployment story that does not require a SOC 2 of its own. Fly maps to this. We would not pick AWS for this team at this stage; for a four-person team, the engineer-hours spent running AWS cost more than the Fly bill does at their scale. When they hit Series B and the platform team appears, AWS becomes defensible. Not before.

The stack defends itself in a sentence per layer. That’s the bar for any stack choice you want to take to a CTO or a CFO.

The anti-checklist

Stacks we would push back on in 2026, with no apology:

Meteor. The community is a fraction of what it was, the hiring pool is smaller still, and there is no workload in 2026 where Meteor is the best answer. If your team is on it, plan the migration.

AngularJS (the original, version 1.x). Still alive in a lot of legacy codebases, still being extended by teams who should be planning the rewrite. Out of support since the end of 2021. Every month you delay is a month of accumulating risk.

MongoDB as a primary store for relational data. Mongo is fine for genuinely document-shaped data. Most products that picked Mongo did so because of a 2014 blog post and have been working around the lack of joins ever since. If your data has relations, your database should know about them.

Microservices for a five-person team. A microservice architecture is a coordination tax that pays off when the coordination cost without it exceeds the operational cost with it. For a five-person team, that crossover does not exist. Build the monolith. Decompose it when the team and the codebase actually demand it.

Kubernetes for an app with one EC2 instance’s worth of traffic. If your application could run on a single server (and most can, much later than people assume), the right answer is a single server with a deployment script. Kubernetes is the wrong tax to pay for that workload.

Custom-built CMSes for marketing sites. Sanity, Contentful, even WordPress in headless mode all exist. If your marketing team is asking for a CMS, do not build them one. You will be maintaining it forever.

The pattern across the anti-checklist is the same: each of these was a defensible choice in a different decade, a different team size, or a different workload. Carrying the choice past the context it was made for is what makes it a problem.

Closing

Defending your stack to a CTO is easy if your reasoning ties to the workload. “We picked Rails because the product is conventional CRUD with audit and billing and the team can ship a feature a week” is a defence. “We picked Rails because it’s what we used at the last company” is not, even if it produces the same stack. The first survives a change of CTO; the second falls over the moment someone asks the question seriously.

The decision tree above is not a prescription. It’s a way to make the question answerable. When a team comes to us asking what a pod costs or for our take on build philosophy, the first conversation is always about workload shape and team skills, in that order. Get those right and the stack choices follow. Get them wrong and no stack will save you.

Pick your stack from fit. Defend it from the workload. Migrate it when the workload changes, not before, and not because the framework shipped a new logo.
