Your Terraform pipeline is green. The deployment completes without errors. You grab a coffee.
Ten minutes later, you test your new Enterprise RAG application. It immediately throws a 403 Forbidden. You log into the Azure Portal, check the OpenAI Networking tab, and there it is: your Shared Private Link from AI Search is sitting in a Pending state. Nobody told Terraform to approve it. Nobody told you it even needed approving.
This is the CI/CD killer of Azure AI infrastructure, and it affects every team trying to deploy a private RAG stack in a governed environment.
In this article I'll walk through how to break the Pending deadlock programmatically, strip out static API keys with Identity Chaining, and wire up Private DNS so the whole stack survives a Zero-Trust audit.
Get the AzAPI Auto-Approve workaround free on GitHub
The Two Failure Modes of Enterprise AI Deployments
Deploying OpenAI and AI Search in a sandbox takes an afternoon. Securing them in an enterprise tenant with public_network_access_enabled = false and an active compliance framework is a completely different game. Two things will reliably break a standard Terraform deployment:
1. The Pending Deadlock
AI Search must call OpenAI to vectorize data. The azurerm provider can successfully request this Shared Private Link connection, but it cannot approve its own request. The target resource (OpenAI) must explicitly accept the inbound connection. Because the standard provider has no method to approve inbound Cognitive Services connections, the pipeline deadlocks. Someone has to manually click "Approve" in the Portal. ClickOps in a CI/CD pipeline is unacceptable.
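For context, the request side of this handshake is a standard azurerm resource, which the approval code later in this article depends on. This is a sketch: the name and request message are illustrative, and the group ID should be verified against your provider version.

```hcl
# Requests a Shared Private Link from AI Search to the OpenAI account.
# azurerm can CREATE this request, but it cannot APPROVE it on the
# OpenAI side, which is what causes the Pending deadlock.
resource "azurerm_search_shared_private_link_service" "openai_link" {
  name               = "spl-search-to-openai"
  search_service_id  = azurerm_search_service.search.id
  target_resource_id = azurerm_cognitive_account.openai.id
  subresource_name   = "openai_account" # group ID for Azure OpenAI targets
  request_message    = "RAG vectorization traffic"
}
```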
2. The Public PaaS Trap
The moment you set public_network_access_enabled = false and local_auth_enabled = false, your AI services disappear from the internet. If your VNet injection, Private DNS, and Managed Identities are not perfectly aligned in Terraform, your application silently breaks, and debugging a 403 against a private endpoint with no public access is genuinely painful.
Target Architecture
+----------------------------------------------------------------+
|                        Isolated AI VNet                        |
|                                                                |
|   +----------------------+              +--------------------+ |
|   |   Azure AI Search    |              |    Azure OpenAI    | |
|   |  (Vector Database)   |------------->| (LLM & Embeddings) | |
|   |                      | Shared Link  |                    | |
|   |  [System Identity]   |   Approved   | [Local Auth: OFF]  | |
|   +----------+-----------+              +----------+---------+ |
|              |  RBAC: Cognitive Services           |           |
|              |  OpenAI User                        |           |
|              +-------------------------------------+           |
|              v                                     v           |
|      [Private Endpoint]                 [Private Endpoint]     |
+--------------+-------------------------------------+-----------+
               v                                     v
  privatelink.search.windows.net    privatelink.openai.azure.com
Three engineering decisions make this work:
- AzAPI State Machine: dynamically reads and approves the Pending connection without manual Portal access
- Identity Chaining: AI Search authenticates to OpenAI via Managed Identity, zero static keys
- Private DNS: both services resolve to private IPs inside the VNet, invisible to the internet
Step 1: The AzAPI Auto-Approve State Machine
To fix the Pending Deadlock, we bypass the azurerm provider and talk directly to the Azure Resource Manager REST API using the azapi provider.
The challenge: Azure dynamically generates a random GUID for the incoming connection on the OpenAI side. We cannot hardcode this ID; we have to read it at runtime. This requires a two-step state machine.
First, we read all current Private Endpoint Connections on the OpenAI account:
data "azapi_resource_list" "pe_connections" {
  type                   = "Microsoft.CognitiveServices/accounts/privateEndpointConnections@2023-05-01"
  parent_id              = azurerm_cognitive_account.openai.id
  response_export_values = ["value"]

  depends_on = [
    azurerm_search_shared_private_link_service.openai_link
  ]
}
The depends_on is critical here. Without it, Terraform might query the connection list before the Shared Private Link has even been requested β returning an empty list and making the approval resource fail silently.
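For orientation, the decoded output has roughly this shape. The connection name suffix is the GUID Azure generates, and every value here is illustrative:

```json
{
  "value": [
    {
      "id": ".../privateEndpointConnections/search-link.<generated-guid>",
      "properties": {
        "privateLinkServiceConnectionState": {
          "status": "Pending",
          "description": "Shared Private Link request from AI Search"
        }
      }
    }
  ]
}
```

The filter in the next step keys off that nested status field.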
Second, we decode the response, filter for the Pending connection, and approve it:
resource "azapi_update_resource" "approve_shared_link" {
  type = "Microsoft.CognitiveServices/accounts/privateEndpointConnections@2023-05-01"

  resource_id = try(
    [for conn in jsondecode(data.azapi_resource_list.pe_connections.output).value :
      conn.id
      if conn.properties.privateLinkServiceConnectionState.status == "Pending"
    ][0],
    ""
  )

  body = jsonencode({
    properties = {
      privateLinkServiceConnectionState = {
        status      = "Approved"
        description = "Approved via Terraform AzAPI Pipeline"
      }
    }
  })
}
The try() wrapper deserves special attention. On a terraform destroy run, the Shared Private Link is deleted before this resource is evaluated. The connection list will be empty, and the for expression will return an empty array. Without try(), indexing [0] on an empty array throws a hard error and your destroy run crashes, leaving orphaned resources in Azure. With try(), the expression gracefully returns an empty string and Terraform skips the resource cleanly.
After terraform apply, the link transitions from Pending to Approved in under 30 seconds. No Portal access required.
Step 2: Identity Chaining – Killing Static API Keys
Auto-approving the link allows AI Search to reach OpenAI. But if you rely on static admin_keys to authenticate that traffic, you will fail any modern compliance audit. Keys leak. Keys get committed to Git. Keys expire at 3am on a Friday.
The enterprise standard is Identity Chaining: AI Search authenticates to OpenAI using its own cryptographic Entra ID identity, not a shared secret.
First, give AI Search a System Assigned Managed Identity and explicitly disable local authentication:
resource "azurerm_search_service" "search" {
  name                = var.search_service_name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "standard"

  public_network_access_enabled = false
  local_authentication_enabled  = false # No API keys, ever

  identity {
    type = "SystemAssigned"
  }
}
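The OpenAI account on the other side of the chain needs the mirror-image lockdown. A sketch, in which the SKU and variable names are assumptions; note that a custom subdomain is required before Entra ID authentication and Private Link will work:

```hcl
resource "azurerm_cognitive_account" "openai" {
  name                = var.openai_account_name
  resource_group_name = var.resource_group_name
  location            = var.location
  kind                = "OpenAI"
  sku_name            = "S0"

  # A custom subdomain is mandatory for Entra ID auth and Private Link.
  custom_subdomain_name = var.openai_account_name

  public_network_access_enabled = false
  local_auth_enabled            = false # no API keys on this side either

  identity {
    type = "SystemAssigned"
  }
}
```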
Then grant that identity exactly the permissions it needs on OpenAI, and nothing more:
resource "azurerm_role_assignment" "search_to_openai" {
  scope                = azurerm_cognitive_account.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_search_service.search.identity[0].principal_id
}
Cognitive Services OpenAI User is the least-privileged role for this use case: it lets AI Search submit inference requests while granting no rights to modify the OpenAI account configuration. Least-privilege RBAC is a hard requirement under ISO 27001 Annex A.8 and NIS2 Article 21.
When the AI Search instance is deleted, its Entra ID identity and all associated role assignments are automatically destroyed. No credential rotation. No secret management. No audit findings.
Step 3: Private DNS – The Last Trap
The final failure mode is DNS. When Private Endpoints are injected into a subnet, internal traffic relies on DNS overrides. If your-instance.openai.azure.com still resolves to a public IP (because you forgot to link the Private DNS Zone to your VNet), the Azure Firewall drops the traffic and you get another opaque 403.
Both services need their own Private DNS Zone linked to the VNet:
resource "azurerm_private_dns_zone" "openai_dns" {
  name                = "privatelink.openai.azure.com"
  resource_group_name = var.resource_group_name
}

resource "azurerm_private_dns_zone" "search_dns" {
  name                = "privatelink.search.windows.net"
  resource_group_name = var.resource_group_name
}

resource "azurerm_private_dns_zone_virtual_network_link" "openai_vnet_link" {
  name                  = "link-openai-vnet"
  resource_group_name   = var.resource_group_name
  private_dns_zone_name = azurerm_private_dns_zone.openai_dns.name
  virtual_network_id    = azurerm_virtual_network.vnet.id
  registration_enabled  = false
}

resource "azurerm_private_dns_zone_virtual_network_link" "search_vnet_link" {
  name                  = "link-search-vnet"
  resource_group_name   = var.resource_group_name
  private_dns_zone_name = azurerm_private_dns_zone.search_dns.name
  virtual_network_id    = azurerm_virtual_network.vnet.id
  registration_enabled  = false
}
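The zones only resolve anything once the Private Endpoints are attached to them through a private_dns_zone_group. Here is a sketch for the OpenAI side; the subnet reference and resource names are assumptions, and the Search endpoint follows the same pattern with subresource searchService and the search zone:

```hcl
resource "azurerm_private_endpoint" "openai_pe" {
  name                = "pe-openai"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = azurerm_subnet.private_endpoints.id

  private_service_connection {
    name                           = "psc-openai"
    private_connection_resource_id = azurerm_cognitive_account.openai.id
    subresource_names              = ["account"] # Cognitive Services subresource
    is_manual_connection           = false
  }

  # Writes the endpoint's private IP into the privatelink zone, so the
  # VNet resolves the service to its private address.
  private_dns_zone_group {
    name                 = "dns-openai"
    private_dns_zone_ids = [azurerm_private_dns_zone.openai_dns.id]
  }
}
```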
registration_enabled = false is intentional: automatic DNS registration conflicts with centralized Private DNS Zone management in Hub & Spoke environments. If you are running this inside an existing Hub & Spoke topology, the DNS zones should live in the Hub and be linked cross-subscription, not deployed per-workload.
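In that Hub & Spoke case, the zone resources above become data sources read from the Hub subscription, roughly like this (the provider alias, subscription variable, and resource group name are all assumptions):

```hcl
# Second azurerm provider instance pointed at the Hub subscription.
provider "azurerm" {
  alias           = "hub"
  subscription_id = var.hub_subscription_id
  features {}
}

# Read the centrally managed zone instead of creating one per workload.
data "azurerm_private_dns_zone" "openai_hub" {
  provider            = azurerm.hub
  name                = "privatelink.openai.azure.com"
  resource_group_name = var.hub_dns_resource_group
}
```

The workload's private_dns_zone_group then references data.azurerm_private_dns_zone.openai_hub.id instead of a locally created zone.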
The Result
Getting a Zero-Trust RAG stack to actually work in a governed Azure environment means navigating a Pending deadlock that the standard Terraform provider cannot solve, stripping out API keys that violate compliance frameworks, and wiring up DNS that silently breaks when you forget a single VNet link.
With the AzAPI state machine, Identity Chaining, and proper Private DNS, you end up with an AI infrastructure that:
- Deploys end-to-end without any manual Portal approval
- Uses zero static API keys: authentication is entirely identity-based
- Is completely invisible to the public internet
- Satisfies ISO 27001 Annex A.8 and NIS2 Article 21 network and identity controls
The free repository at the top covers the AzAPI automation trick and the basic networking setup. The Enterprise Blueprint packages everything in this article (automated approvals, full VNet injection, Private DNS, RBAC Identity Chaining, and hub integration) into a single tested module.