Thursday, June 11, 2026

Design and Implement In-Database RAG Architecture with Azure SQL and Azure OpenAI

Implementing Retrieval-Augmented Generation (RAG) traditionally requires orchestrating complex external vector databases and separate middleware code. However, modern database engineering allows us to implement the entire RAG pipeline—from vector search to LLM generation—directly inside the database engine using T-SQL.

This post analyzes an enterprise-grade RAG pattern built for Adventure Works, leveraging Azure SQL Database and Azure OpenAI via Microsoft Foundry.

System Architecture Flow Diagram:


Step 1: Provision an Azure SQL Database

Login to Azure Portal and go to the Azure SQL Hub and then select Create SQL Database.



Create a Foundry project

Go to Microsoft Foundry and sign in with your Azure account.


Deploy Azure OpenAI models, two models: gpt-5.4-mini / similar (for chat completions) and text-embedding-3-small for embeddings.



Go to Resource and copy OpenAI endpoint to use in the RAG routine.


By using an Azure SQL Server system-assigned managed identity and mapping it to the

Cognitive Services OpenAI User role, the database securely authenticates with Azure OpenAI natively




Create the ProductReview table, Load the data, create Master key and embedding. Encapsulate into a RAG stored procedure.



 Native T-SQL Vector Search

Utilize native Azure SQL capabilities:

  • AI_GENERATE_EMBEDDINGS: Generates vectors directly inside T-SQL scripts.

  • DiskANN Vector Indexing: Implements an Approximate Nearest Neighbor (ANN) search index for ultra-fast, sub-millisecond semantic retrieval.

  • VECTOR_SEARCH Function: Uses cosine similarity metrics to pull the top 5 most contextually relevant product reviews.


RAG procedure is encapsulated with RAG capabilities.
1) retrieval step of RAG. Instead of using a hardcoded WHERE clause, you convert the user’s question into an embedding and use VECTOR_SEARCH to find the relevant reviews and format results as JSON.
2) combine retrieved data with a system message and user question to build the augmented prompt. This prompt is the “A” in RAG.
3) Call the Azure OpenAI endpoint and generate a response. This step is the “G” in RAG, the generation step. You send the augmented prompt to Azure OpenAI and extract the answer.


Cleanup: If you aren’t using the Azure SQL Database or the Azure OpenAI resources for any other purpose, you can clean up the resources you created in this exercise.

If you provisioned a new resource group for this lab, you can simply delete the entire resource group to remove all resources at once. In the Azure portal, navigate to your resource group.
Select Delete resource group and confirm deletion by typing the resource group name.
Select Delete to remove all resources created in this lab.

In this exercise, implemented a complete Retrieval-Augmented Generation (RAG) solution using Azure SQL Database and Azure OpenAI. Generated embeddings and created a vector index for fast retrieval, searched for semantically relevant reviews, formatted the results as context for a language model, built augmented prompts with grounding instructions, called the Azure OpenAI endpoint from T-SQL, and packaged the entire RAG pipeline into a reusable stored procedure with error handling.

Ready to build this yourself? The complete T-SQL scripts—including the schema creation, batch embedding generation loop, DiskANN vector indexing, and the final RAG stored procedure—are fully open-source and available on GitHub.

👉 View the Full Project Repository on GitHub

No comments: