Every few weeks the same victory thread goes around: cancelled Pinecone, moved to pgvector on a cheap box, the bill dropped to a rounding error, nothing broke. The replies fill with people who did exactly the same. They are not wrong. They are also not describing your situation, because the move that rescues one team buries the next under a database it cannot scale or cannot keep on its own hardware. There is no best vector database. There is the one that fits the constraint you cannot move, and the loudest advice on your feed is someone else's constraint, not yours.

We shipped our own matcher on Qdrant. That was not a ruling that Qdrant is the best. It was the answer to two questions about our situation, and for a different situation the answer would have been Postgres, or Pinecone, or nothing fancier than a library. So before the table, the two questions that actually decide it.

The first is whether your data is allowed to leave your infrastructure. If a client says the documents cannot sit on someone else's cloud, you have just eliminated the entire managed-only category, Pinecone included, no matter how good it is. The second is scale and what you already run. With Postgres already in production the boring answer wins more often than people expect, and it now wins further up the curve than it used to, because the pgvectorscale extension moved the old ceiling out by an order of magnitude. You reach for a purpose-built engine when retrieval quality itself is the product, or when you are genuinely heading past a hundred million vectors. Most of the rest is detail.

Two questions pick your vector database: can the data leave the building, and how big does this get.

Here is the whole field on one screen. The rest of the article is the reasoning behind each row.

Vector databases, scored by constraint (mid-2026)

| Database | Deployment | Hybrid search | Comfortable scale | Cost shape | Reach for it when |
| --- | --- | --- | --- | --- | --- |
| Pinecone | Managed only, a few cloud regions | Yes, dense plus sparse | 100M and up, at a price | Pay per use; cheap start, steep at scale | Zero ops and the data can leave |
| Weaviate | Self-host, cloud, or both | Native, the deepest | 100M and up | Free self-host; low managed entry | Hybrid and multi-tenant are the job |
| Qdrant | Self-host or cloud | Native, plus filtered HNSW | 100M and up | Tens of dollars a month on a VPS | Speed, filtering, open, cheap at scale |
| pgvector | A Postgres extension | Not native; you compose it | Tens of M; 100M+ with pgvectorscale | Free if you already run Postgres | On Postgres and you want ACID |
| Milvus | Self-host or cloud | Yes | Billions | Cheap software, costly ops | Hundreds of millions of vectors |
| Chroma | Embedded or client-server | Basic | Small to medium | Free, runs in your process | Prototyping, working today |

Pinecone: zero ops, if the data can leave

Pinecone is the fastest way to a working production index. It is serverless, with no nodes to size, and you pay for storage and the reads and writes you actually do. Hybrid is built in: one index holds a dense and a sparse vector per record and queries them together. The edges are sharper than in the natively hybrid engines, though, because retrieval is dense-first and can miss an exact keyword match that the sparse side would have caught. For a team that wants to ship and never think about the database again, it is still hard to beat.

Two things to know going in. The base product is cloud only. There is a bring-your-own-cloud tier for enterprises that need the index inside their own account, but the control plane stays on Pinecone's side and has to stay reachable, so there is no true on-prem or air-gapped option, and a hard rule that nothing leaves your own datacenter still rules it out. And the bill scales with usage, gently at first and steeply later, with enough lock-in that moving a large index out becomes a project of its own. We did not pick it. It is still the right answer for plenty of teams.

Weaviate: the hybrid champion

If hybrid search is the product and you want it native and deep, Weaviate is the strongest option. Keyword scoring, dense vectors, and metadata filters resolve in one query, with real multi-tenant isolation for SaaS. It ships three ways: self-host the open source, run it in their cloud, or split the difference, which keeps a compliance team happy. If our matcher had been a multi-tenant product instead of an internal tool, this is the one I would have looked at first.

Qdrant: speed, filtering, and a small VPS

Qdrant is open source, fast, and the strongest at filtered search, which is why we run it. The hybrid mechanics, the filtered traversal, and the quantization are all in the flagship teardown, so I will not repeat them here. The short version: it self-hosts on a small box for tens of dollars a month, the managed cloud is there when you want it, and the filtering held up under exactly the selective conditions that quietly wreck a naive setup. For retrieval quality as the core job, on your own infrastructure, it is the one to beat.

pgvector: the boring answer that is usually right

If you already run Postgres, start with pgvector. Your vectors live next to your application data, you get one backup story and one transaction, and you do not operate a second system. It is not native hybrid and it is not the fastest at the top end, but most projects are neither huge nor hybrid-critical on day one. The old wisdom was that it fell over around ten million vectors, and that was true while the index was the memory-hungry HNSW graph that has to sit in RAM to be fast. The pgvectorscale extension changed the arithmetic: its disk-based DiskANN index with binary quantization keeps the hot part small in memory and carries Postgres comfortably into the hundreds of millions of vectors on ordinary NVMe, which is well past where most teams will ever be. The reasons to leave are now about hybrid quality and filtering behavior, not raw count.
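The arithmetic is easy to check on the back of an envelope. The sketch below assumes 1,536-dimensional float32 embeddings (an assumed width, not a figure from our system) and counts only the raw vector payload, ignoring graph links and Postgres row overhead. Treat the results as orders of magnitude, not a sizing guide.

```python
def vector_bytes(count: int, dims: int, bits_per_dim: int) -> int:
    """Raw storage for `count` vectors, ignoring index and row overhead."""
    return count * dims * bits_per_dim // 8


DIMS = 1536  # assumed embedding width, chosen only for illustration
GB = 10**9

for count in (10_000_000, 100_000_000):
    full = vector_bytes(count, DIMS, 32)    # float32: 32 bits per dimension
    binary = vector_bytes(count, DIMS, 1)   # binary quantization: 1 bit per dimension
    print(f"{count:>11,} vectors: float32 {full / GB:6.1f} GB, 1-bit {binary / GB:5.1f} GB")
```

Under those assumptions, ten million full-precision vectors are about 61 GB before the graph is counted, which is roughly where keeping everything in RAM stops being cheap. The one-bit form of the same data is under 2 GB, and that gap is the room a disk-based index has to work with.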

Milvus and Chroma: the two ends

Two more belong on the map, for the extremes. Milvus is built for the billion-vector end, with compute and storage separated so it scales sideways. Its distributed mode runs on Kubernetes, which is where the operational cost lives, but the old line that Milvus means Kubernetes is out of date: a Standalone build runs in a single Docker container, and an embedded Lite build runs in-process for prototyping, so you can grow into the heavy mode rather than starting there. Chroma is the other end: it runs inside your process and takes you from nothing to working search in an afternoon. A Rust rewrite and a hosted cloud have hardened it, but the self-hosted version still strains under heavy concurrent writes, so it stays a prototyping and small-scale tool, not the multi-tenant backbone. Neither is where most production RAG lands, but knowing they exist stops you from forcing a middle-tier tool to do an end-tier job.

Filtering is where it actually breaks

Almost every benchmark you will read measures the wrong thing. They race raw similarity over a flat dataset: one query at a time, no filters, nobody else on the box. Production never looks like that. You have many requests in flight at once, each asking for a different slice (this tenant, this date range, this document type), and the database spends most of its effort resolving those metadata filters, not computing cosine distance. The similarity math was rarely the bottleneck. Concurrent filtering is.

This is where the engines genuinely differ, and it never shows up on a leaderboard. Qdrant folds the filter into the graph traversal, so a selective filter stays cheap no matter how narrow it gets; that is the filtered-HNSW behavior from the flagship teardown, and it is the concrete reason we run it. A post-filtering design does the reverse: it fetches the nearest neighbors first, then discards the ones that miss the filter, and when the filter is narrow it can discard the whole result and have to widen and retry, which is exactly when tail latency blows up. pgvector sat awkwardly in the middle for years, until its 0.8 release added iterative index scans that made selective filters genuinely fast, which is a real part of why the Postgres answer now holds up further than it used to.
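A toy makes the widen-and-retry cost visible. This is a brute-force sketch, not any engine's implementation: an exact sort stands in for the ANN index, the points are hypothetical one-dimensional values, and the second function is plain filter-first search, which is only a stand-in for the filter-aware traversal a real engine performs inside the graph.

```python
def post_filter_search(points, query, keep, k, first_fetch=10):
    """Nearest neighbours first, filter second; widen and retry if too few survive."""
    # An exact sort stands in for the ANN index in this toy.
    ranked = sorted(points, key=lambda p: abs(p["x"] - query))
    fetch, rounds = first_fetch, 0
    while True:
        rounds += 1
        hits = [p for p in ranked[:fetch] if keep(p)]
        if len(hits) >= k or fetch >= len(ranked):
            return hits[:k], rounds, min(fetch, len(ranked))
        fetch *= 2  # too few survivors: widen the net and go again


def filter_first_search(points, query, keep, k):
    """Restrict to matching points, then take the k nearest in a single pass."""
    allowed = [p for p in points if keep(p)]
    return sorted(allowed, key=lambda p: abs(p["x"] - query))[:k]


# Toy data: 1,000 points on a line, 100 tenants, so a tenant filter keeps 1% of rows.
points = [{"id": i, "x": float(i), "tenant": i % 100} for i in range(1000)]
only_tenant_7 = lambda p: p["tenant"] == 7

hits, rounds, fetched = post_filter_search(points, 500.0, only_tenant_7, k=5)
direct = filter_first_search(points, 500.0, only_tenant_7, k=5)
print([p["id"] for p in hits], "rounds:", rounds, "fetched:", fetched)
print([p["id"] for p in direct])
```

Both paths return the same five points. The difference is the work: on this data the post-filter path goes through seven rounds and ends up pulling 640 of the 1,000 points to find five that pass a 1% filter. In a real system each of those rounds is another index search, and that is the tail latency.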

So when you are down to your last two finalists, do not benchmark the demo. Fire a few hundred concurrent queries carrying the filters your real tenants will use, and watch the 99th percentile, not the average. That is the test that predicts production, and it is the one the marketing pages will never run for you.
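A minimal harness for that test can be sketched in the standard library. The `search` callable is an assumption: wrap whatever client call your candidate database exposes. The `fake_search` below is a hypothetical stand-in that makes a few queries artificially slow so the gap between the mean and the tail is visible.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def timed(search, query):
    """Return the wall-clock seconds one call to `search` takes."""
    start = time.perf_counter()
    search(query)
    return time.perf_counter() - start


def latency_under_load(search, queries, concurrency=32):
    """Run every query with `concurrency` in flight; return (mean, p99) in seconds."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda q: timed(search, q), queries))
    mean = sum(latencies) / len(latencies)
    p99 = quantiles(latencies, n=100)[98]  # 99th of the 99 percentile cut points
    return mean, p99


def fake_search(query):
    # Hypothetical stand-in for a real client call: 2% of tenants are slow.
    time.sleep(0.02 if query["tenant"] % 50 == 0 else 0.001)


queries = [{"tenant": t, "doc_type": "invoice"} for t in range(300)]
mean, p99 = latency_under_load(fake_search, queries)
print(f"mean {mean * 1000:.1f} ms, p99 {p99 * 1000:.1f} ms")
```

The mean barely registers the slow tenants; the 99th percentile is made of them. Replace `fake_search` with the real call and `queries` with filters sampled from your actual tenants before trusting any number it prints.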

The decision, in the order that matters

Pick by your hardest constraint

| Your hardest constraint | Pick |
| --- | --- |
| Must stay on your hardware | Self-host Qdrant or Weaviate. Pinecone runs in a cloud, even its bring-your-own tier. |
| On Postgres already | pgvector. One system, one transaction; pgvectorscale carries it past a hundred million. |
| Hybrid is the whole point | Weaviate for depth and multi-tenant, Qdrant for speed and filtering. |
| Zero ops, data can leave | Pinecone. Pay it and move on. |
| Billions of vectors | Milvus. Distributed mode wants Kubernetes; Standalone is one container. |
| Just getting started | Chroma in your process. Graduate when it hurts. |
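The same rows can be written down as an ordered checklist. This is only a sketch: the `Situation` fields and the one-billion threshold are illustrative names and numbers, not a formal rule. It returns every row that applies, in the order above, so the first entry is the constraint that should win and the rest are worth reading too.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    data_must_stay_on_own_hardware: bool = False
    already_on_postgres: bool = False
    hybrid_is_the_product: bool = False
    zero_ops_and_data_can_leave: bool = False
    expected_vectors: int = 0
    just_getting_started: bool = False


# (constraint, test, pick), kept in the same order as the rows above.
ROWS = [
    ("Must stay on your hardware", lambda s: s.data_must_stay_on_own_hardware,
     "Self-host Qdrant or Weaviate"),
    ("On Postgres already", lambda s: s.already_on_postgres, "pgvector"),
    ("Hybrid is the whole point", lambda s: s.hybrid_is_the_product,
     "Weaviate or Qdrant"),
    # A residency rule overrides the wish for zero ops.
    ("Zero ops, data can leave",
     lambda s: s.zero_ops_and_data_can_leave and not s.data_must_stay_on_own_hardware,
     "Pinecone"),
    ("Billions of vectors", lambda s: s.expected_vectors >= 1_000_000_000, "Milvus"),
    ("Just getting started", lambda s: s.just_getting_started, "Chroma"),
]


def matching_rows(s: Situation) -> list[tuple[str, str]]:
    """Every row that applies, hardest constraint first."""
    return [(constraint, pick) for constraint, test, pick in ROWS if test(s)]


team = Situation(data_must_stay_on_own_hardware=True, already_on_postgres=True)
for constraint, pick in matching_rows(team):
    print(f"{constraint}: {pick}")
```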

Managed or self-host, the axis under all of it

Underneath the names is one tradeoff that repeats. Managed buys you zero operations and the fastest path to production, and charges for it in money and in lock-in. Self-host buys you control and a cheap bill at scale, and charges for it in the hours you spend running it. Neither one is the virtuous choice.

One thing skews this in 2026: agents. A human-in-the-loop app reads far more than it writes, but an agent writes constantly, updating memory, pruning context, re-indexing, and serverless pricing punishes writes hardest. So an agentic workload reaches the point where a flat self-hosted bill beats per-use billing much sooner than a read-heavy app would, often while the corpus is still modest. If you are building agents, model the write volume before you sign up for usage pricing, not after the invoice arrives.
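Modelling it takes a few lines. Every price below is a placeholder invented for the example, not any vendor's price list, and the flat figure stands in for a small VPS while ignoring the hours spent running it. Swap in the numbers from the pricing page you are actually considering.

```python
def usage_bill(reads, writes, stored_gb, per_m_reads, per_m_writes, per_gb):
    """Monthly cost under per-use pricing, in the same currency as the prices."""
    return reads / 1e6 * per_m_reads + writes / 1e6 * per_m_writes + stored_gb * per_gb


def break_even_writes(flat_bill, reads, stored_gb, per_m_reads, per_m_writes, per_gb):
    """Monthly writes at which per-use pricing costs the same as the flat bill."""
    left_for_writes = flat_bill - reads / 1e6 * per_m_reads - stored_gb * per_gb
    return max(0.0, left_for_writes / per_m_writes * 1e6)


# Placeholder prices: writes are assumed to cost twice what reads do.
PRICES = dict(per_m_reads=2.0, per_m_writes=4.0, per_gb=0.30)
FLAT_SELF_HOSTED = 40.0  # assumed monthly VPS bill, excluding ops time

read_heavy = usage_bill(reads=10e6, writes=0.5e6, stored_gb=10, **PRICES)
agentic = usage_bill(reads=10e6, writes=20e6, stored_gb=10, **PRICES)
tipping_point = break_even_writes(FLAT_SELF_HOSTED, reads=10e6, stored_gb=10, **PRICES)

print(f"read-heavy app: {read_heavy:.2f}, agent: {agentic:.2f}, flat: {FLAT_SELF_HOSTED:.2f}")
print(f"per-use stops winning at about {tipping_point:,.0f} writes a month")
```

With these made-up prices, the same 10 GB corpus costs 25 a month for the read-heavy app and 103 for the agent. Only the write volume changed, and the crossover arrives at a few million writes a month.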

A two-person team racing to a demo should probably pay Pinecone and stop thinking about it. A team with a data-residency clause and an ops engineer already on payroll should self-host Qdrant or Weaviate and keep the money. The right call is the one that spends the least of your scarcest resource, and that resource is different on every team.

What 2muchcoffee covers

We build production RAG and AI systems, and choosing the store is one of the first calls we make with a client, usually in the first conversation, because it is cheap to get right early and expensive to change late. If you are staring at this fork and not sure which constraint should win, that is the conversation to have before you write a line of code. The plain path to start it is the AI work we do.

One concrete action

Before you compare a single benchmark, write down your two hardest constraints: where the data is allowed to live, and the order of magnitude you are heading toward. Most of the table disappears once those two are on paper. The database you are left with is usually the one to start with. If two survive and you have to benchmark, benchmark them the way the filtering section said, under concurrent filtered load. Read any published chart with a cold eye: most are produced by the vendors and tuned for the one workload that flatters them. Some licenses even forbid publishing a benchmark that makes the product look slow, which is most of why honest independent numbers are so thin.

Oleg Logvin, Tech Lead at 2muchcoffee. Builds production RAG, AI pipelines, and the boring infrastructure that makes them trustworthy.