Enhance scale with vector indexes
As your data set grows from hundreds to millions of rows, fast retrieval becomes a hard requirement. Without optimization, a similarity search scans the entire table, which raises latency and hurts the user experience. A vector index reduces the work by directing the database to the most promising rows first, so queries return faster. Azure Database for PostgreSQL supports vector indexes through the pgvector and pg_diskann extensions. In a Retrieval Augmented Generation (RAG) solution, store each item's embedding in a vector column on the same row as its related fields, then index that column. That index is the vector index.
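For reference, such a table could look like the following minimal sketch. The column names and the 1536-dimension vector size are illustrative assumptions, not requirements:

-- Hypothetical RAG table: metadata and embedding live on the same row
-- (requires the vector extension; see the tip below)
CREATE TABLE company_policies (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title     text NOT NULL,
    content   text NOT NULL,
    embedding vector(1536)  -- the column you later index
);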
Tip
To use pgvector or pg_diskann on a flexible server, you must first allowlist the vector and pg_diskann extensions in the azure.extensions server parameter, found under Server parameters.
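After allowlisting, you can verify from any SQL session that the extensions are available; a quick check, assuming a connection to your flexible server:

-- Show the server's allowlisted extensions
SHOW azure.extensions;

-- Confirm both extensions can be installed in this database
SELECT name, default_version
FROM pg_available_extensions
WHERE name IN ('vector', 'pg_diskann');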
Why vector indexes matter
Speed is the name of the game. You want your queries to return results as quickly as possible. Think of an HR assistant that answers questions about policies. With 500 rows, a full scan might be fine. With 5 million rows, it isn't. Indexes trade a bit of storage and build time for less data scanned at query time, so queries stay fast as the data grows. Vector indexes quickly narrow down the search space, allowing for rapid retrieval of relevant rows.
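EXPLAIN makes the difference visible. In this sketch, the short literal vector is a placeholder; a real query vector must match the embedding column's dimension:

-- Without a vector index, the plan shows a sequential scan over every row
EXPLAIN (ANALYZE)
SELECT id, title
FROM company_policies
ORDER BY embedding <=> '[0.01, 0.02, 0.03]'::vector  -- placeholder query vector
LIMIT 5;
-- After an index is added (see the sections below), the same query should
-- show an index scan instead of a Seq Scan node.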
Index options in PostgreSQL
PostgreSQL supports several approximate nearest neighbor index types for vector search. Each has its own strengths and weaknesses. Two are provided by the pgvector extension, and a third is available via the pg_diskann extension:

IVFFlat (Inverted File with Flat Compression) - Provided by the pgvector extension. Groups vectors into many lists. At query time, it picks the closest lists and compares the query only to items in those lists. The ivfflat.probes setting controls how many lists to check per query; more probes usually improve recall but add time. If you set probes equal to the number of lists, the search checks every list (an exact search over the index) and loses the speed benefit. Begin with the defaults and adjust ivfflat.probes at query time if obvious matches are missing, as sketched below.
- Pros: fast to build, modest memory, predictable behavior.
- Cons: you choose the number of lists; large distribution changes can require a rebuild.
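A minimal tuning sketch, assuming the company_policies table from earlier (the literal vector is a placeholder):

-- Default is 1 probe; checking more lists improves recall at some cost in speed
SET ivfflat.probes = 10;

SELECT id, title
FROM company_policies
ORDER BY embedding <=> '[0.01, 0.02, 0.03]'::vector  -- placeholder query vector
LIMIT 5;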
HNSW (Hierarchical Navigable Small Worlds) - Provided by the pgvector extension. Builds a multilayer neighbor graph. Search begins in the top layer and narrows as it moves down to denser layers near the closest neighbors. Compared to IVFFlat, it can deliver better query speed at similar recall, but it uses more memory and takes longer to build. There's no training step, so you can create the index even on an empty table. The key settings are m and ef_construction when you build, and hnsw.ef_search when you query. The m parameter sets the maximum number of connections per node in each layer (default 16). The ef_construction parameter sets the size of the candidate list while the index is built (default 64). At query time, hnsw.ef_search controls the candidate list the search keeps (default 40). Larger values generally improve recall, with more memory or longer build/query time. Start with defaults, then adjust hnsw.ef_search when speed matters more and results are already solid, as sketched below.
- Pros: strong speed and recall at query time.
- Cons: higher memory and longer build time.
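Like probes, hnsw.ef_search is a session-level setting you can adjust per workload; a minimal sketch:

-- Default ef_search is 40; larger values keep more candidates per query
SET hnsw.ef_search = 100;

SELECT id, title
FROM company_policies
ORDER BY embedding <=> '[0.01, 0.02, 0.03]'::vector  -- placeholder query vector
LIMIT 5;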
DiskANN (Disk Approximate Nearest Neighbor) - Provided by the pg_diskann extension. This extension adds a separate DiskANN index access method. Keeps most of the structure on disk with a small working set in memory; designed for very large data. It offers high recall, high queries per second, and low query latency, even for tables with billions of rows. DiskANN's Azure implementation stores full vectors on SSD while compressing working sets in RAM, which trims memory use and limits SSD reads during queries. The built-in vector compression and quantization preserve accuracy as data evolves, making DiskANN a strong fit for large semantic search and RAG scenarios. The defaults aim for strong results at large scale. If searches skip good neighbors, raise diskann.l_value_is to consider more candidates, then check latency, as sketched below. If memory is tight on very large tables, create the index with product_quantized = true to reduce memory use, noting there can be a small quality trade-off.
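A minimal sketch of that query-time adjustment (the exact value is an example; measure latency after each change):

-- Consider more candidates per search to improve recall
SET diskann.l_value_is = 150;  -- example value; tune and re-measure

SELECT id, title
FROM company_policies
ORDER BY embedding <=> '[0.01, 0.02, 0.03]'::vector  -- placeholder query vector
LIMIT 5;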
Create a vector index
Creating an index in Azure Database for PostgreSQL is straightforward: first enable the relevant extension, then run the CREATE INDEX statement for the index type you want to use. Let's assume you're creating a vector index for a table named company_policies, with embeddings stored in a vector column named embedding.
Let's create some indexes:
Creating pgvector indexes:

Create the extension in PostgreSQL (you only need to enable the extension once per database):

CREATE EXTENSION IF NOT EXISTS vector;

Create an IVFFlat index:

CREATE INDEX company_policies_vec_ivf
ON company_policies USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
ANALYZE company_policies;

This statement creates an IVFFlat index on the embedding column of the company_policies table. The lists parameter specifies how many lists to create in the index. You can adjust this value based on your data size and query performance needs.

Create an HNSW index:

CREATE INDEX company_policies_vec_hnsw
ON company_policies USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

This statement creates an HNSW index on the embedding column of the company_policies table. The m parameter sets the maximum number of connections per node, and ef_construction controls the size of the candidate list during index construction.
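You can confirm the indexes exist and review their definitions with the standard pg_indexes view:

-- List the indexes defined on the table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'company_policies';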
Creating pg_diskann indexes:

Create the extension in PostgreSQL (you only need to enable the extension once per database):

CREATE EXTENSION IF NOT EXISTS pg_diskann CASCADE;

Create a DiskANN index:

CREATE INDEX company_policies_vec_diskann
ON company_policies USING diskann (embedding vector_cosine_ops);

Note how the DiskANN index is created without any extra parameters. While you can pass parameters, DiskANN is designed to work well with default settings for a wide range of use cases.
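To confirm the planner uses a vector index for similarity queries, inspect the plan; a sketch with a placeholder query vector:

-- An ORDER BY ... LIMIT query over the indexed column should use an index scan
EXPLAIN (COSTS OFF)
SELECT id, title
FROM company_policies
ORDER BY embedding <=> '[0.01, 0.02, 0.03]'::vector  -- placeholder query vector
LIMIT 5;
-- Expect an index scan node (for example, using company_policies_vec_diskann)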
Operator and distance pairing
When creating a vector index, you must pair the operator class on the column with the matching operator in your query. This match ensures that the database uses the index effectively. The most common distance metrics are:
- Cosine distance (<=>) - Measures the angle between vectors. Common for unit-normalized text embeddings. Use the vector_cosine_ops operator class.
- Euclidean/L2 distance (<->) - Measures straight-line distance between points. Use the vector_l2_ops operator class.
- Inner product (<#>) - Maximizes dot product. Works best with unit-normalized vectors. Use the vector_ip_ops operator class.
Note
Rule of thumb: Create the index with the operator class that matches your distance, and sort with the matching operator in the query. Mixing them prevents index use.
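Here's a sketch of the mismatch next to the correct pairing, reusing the cosine index from the earlier examples (the literal vectors are placeholders):

-- Mismatch: cosine index (vector_cosine_ops), but the query sorts by L2 (<->)
-- The planner can't use the cosine index and falls back to scanning the table
SELECT id, title
FROM company_policies
ORDER BY embedding <-> '[0.01, 0.02, 0.03]'::vector
LIMIT 5;

-- Match: cosine operator class paired with the cosine operator (<=>)
SELECT id, title
FROM company_policies
ORDER BY embedding <=> '[0.01, 0.02, 0.03]'::vector
LIMIT 5;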
In the company policy example, cosine distance works well for text embeddings. Think of each question and policy as arrows pointing toward topics. The question 'How many days off do I get?' points almost the same way as the Vacation policy (small angle → strong match) and points away from the Travel reimbursement policy (large angle → weak match). Here's how to create the index and query with cosine distance:
-- Create index with cosine operator class
CREATE INDEX company_policies_vec_ivf_cos
ON company_policies USING ivfflat (embedding vector_cosine_ops);
-- Query using cosine distance operator
SELECT id, title
FROM company_policies
ORDER BY embedding <=> azure_openai.create_embeddings('<embedding-deployment>',
'How many vacation days do employees get?')::vector
LIMIT 5;
Key takeaways
As your tables grow, you need a vector index so retrieval stays fast. Store each embedding in a vector column on the same row as its data and index that column using pgvector or pg_diskann. Choose IVFFlat for simple, lower-memory builds, HNSW when you want stronger in-memory query performance, and DiskANN when the data is very large or memory is tight. At query time, match the operator class to the distance used in your query so the planner can use the index. Begin with defaults and make small, measured adjustments to query-time parameters (ivfflat.probes, hnsw.ef_search, diskann.l_value_is) if obvious matches are missing or speed needs improvement. After any change, run the same queries with EXPLAIN (ANALYZE, VERBOSE, BUFFERS) and check that the plan uses the index, execution time drops, I/O goes down, and the top results still look correct.
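That verification step might look like the following sketch, wrapping the earlier cosine query:

-- Re-run the same query after each tuning change and compare the plans
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT id, title
FROM company_policies
ORDER BY embedding <=> azure_openai.create_embeddings('<embedding-deployment>',
    'How many vacation days do employees get?')::vector
LIMIT 5;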