# Deploying Azure Private Clusters
By default, HyperShift guest clusters are publicly accessible through public DNS and the management cluster's default router.
For private clusters on Azure, all communication between worker nodes and the hosted control plane occurs over Azure Private Link. This guide walks through the process of configuring HyperShift for private cluster support on Azure.
> **Tech Preview in OCP 4.22:** Private self-managed Azure HostedClusters are planned as a Tech Preview feature in OpenShift Container Platform 4.22.
## Before You Begin

This guide assumes you have completed the self-managed Azure setup described in the Self-Managed Azure Overview, including:

- An OpenShift management cluster running on Azure (not AKS). The private cluster workflow uses `oc get infrastructure cluster` to discover the management cluster's Azure resource group, VNet, and other platform details; these APIs are only available on OpenShift. For AKS-based management clusters, use managed Azure HyperShift (ARO HCP) instead.
- Azure Workload Identity and OIDC issuer configuration
- Management cluster with the HyperShift operator installed (it will be reinstalled with private support)
- Azure CLI (`az`), HyperShift CLI (`hypershift`), `oc`/`kubectl`, `jq`, and `yq`
## Overview
Private endpoint access uses Azure Private Link Service (PLS) to expose the hosted control plane's internal load balancer to the guest cluster's VNet through a Private Endpoint. Worker nodes resolve the API server hostname via Private DNS zones that point to the Private Endpoint IP.
The workflow has five steps:

1. Prepare a NAT subnet in the management cluster's VNet
2. Install the HyperShift operator with private platform support
3. Create IAM resources
4. Create infrastructure
5. Create the private HostedCluster
## Step 1: Prepare the NAT Subnet
Azure Private Link Service requires a dedicated subnet for NAT IP allocation. This subnet must be in the management cluster's VNet and must have `privateLinkServiceNetworkPolicies` disabled.
> **Region Requirement:** The Private Link Service, NAT subnet, and management cluster's internal load balancer must all be in the same Azure region. The PLS is automatically created in the HostedCluster's configured location. Azure will reject PLS creation if the NAT subnet is in a different region.
First, identify the management cluster's VNet:
```shell
# Get the management cluster's infrastructure resource group
MGMT_INFRA_RG=$(oc get infrastructure cluster -o jsonpath='{.status.platformStatus.azure.resourceGroupName}')

# Find the VNet in the infrastructure resource group
MGMT_VNET_NAME=$(az network vnet list --resource-group "${MGMT_INFRA_RG}" --query "[0].name" -o tsv)
MGMT_VNET_RG="${MGMT_INFRA_RG}"
```
Create the NAT subnet:
```shell
NAT_SUBNET_NAME="pls-nat-subnet"

# Check existing address space and subnets to choose a non-overlapping CIDR
az network vnet show \
  --resource-group "${MGMT_VNET_RG}" \
  --name "${MGMT_VNET_NAME}" \
  --query '{addressSpace: addressSpace.addressPrefixes, subnets: subnets[].{name: name, prefix: addressPrefix}}' \
  -o json

az network vnet subnet create \
  --resource-group "${MGMT_VNET_RG}" \
  --vnet-name "${MGMT_VNET_NAME}" \
  --name "${NAT_SUBNET_NAME}" \
  --address-prefixes 10.1.64.0/24 \
  --disable-private-link-service-network-policies true
```
> **Choose a Non-Overlapping CIDR:** The `10.1.64.0/24` address prefix above is an example only. You must choose a CIDR range that does not overlap with any existing subnets in the management cluster's VNet. Check the VNet's address space and existing subnets before creating the NAT subnet. If the management cluster's VNet uses `10.0.0.0/16`, the NAT subnet must fall within that range (e.g., `10.0.64.0/24`) or you must first expand the VNet's address space.
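The containment requirement can be checked mechanically before calling `az network vnet subnet create`. A minimal bash sketch; the `cidr_contains` helper is hypothetical (not part of any CLI) and handles IPv4 prefixes only:

```shell
# Hypothetical helper: convert an IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_contains <outer-cidr> <inner-cidr> -> exit 0 if inner fits inside outer
cidr_contains() {
  local outer_ip="${1%/*}" outer_len="${1#*/}"
  local inner_ip="${2%/*}" inner_len="${2#*/}"
  # The inner prefix must be at least as long (i.e., no larger) as the outer one.
  (( inner_len >= outer_len )) || return 1
  local mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$outer_ip") & mask) == ($(ip_to_int "$inner_ip") & mask) ))
}

cidr_contains "10.0.0.0/16" "10.0.64.0/24" && echo "OK: subnet fits"
cidr_contains "10.0.0.0/16" "10.1.64.0/24" || echo "ERROR: subnet outside VNet"
```

This catches exactly the failure mode described above: a `10.1.64.0/24` NAT subnet against a `10.0.0.0/16` VNet fails the check.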
Get the NAT subnet resource ID for later use:
```shell
NAT_SUBNET_ID=$(az network vnet subnet show \
  --resource-group "${MGMT_VNET_RG}" \
  --vnet-name "${MGMT_VNET_NAME}" \
  --name "${NAT_SUBNET_NAME}" \
  --query id -o tsv)
```
> **Important:** The NAT subnet must be in the management cluster's VNet, not the guest VNet. This is because the Private Link Service is created alongside the management cluster's internal load balancer.
> **Note:** The `--disable-private-link-service-network-policies true` flag is required. Without it, Azure will reject PLS creation on this subnet.
## Step 2: Install HyperShift Operator with Private Platform Support
To support private clusters, the HyperShift operator must be installed with additional flags that configure Azure Private Link Service management.
You need credentials that allow the operator to manage PLS resources:
```shell
# Azure credentials file for PLS management (same format as standard Azure creds)
AZURE_PRIVATE_CREDS="/path/to/azure-private-credentials.json"

# Management cluster's infrastructure resource group
MGMT_INFRA_RG=$(oc get infrastructure cluster -o jsonpath='{.status.platformStatus.azure.resourceGroupName}')
```
Install the operator with private platform support. The private-specific flags are added in addition to the standard install flags (External DNS, pull secret, etc.):
```shell
hypershift install \
  --pull-secret ${PULL_SECRET} \
  --private-platform Azure \
  --azure-private-creds ${AZURE_PRIVATE_CREDS} \
  --azure-pls-resource-group ${MGMT_INFRA_RG}
  # ... include your standard install flags (External DNS, etc.)
```
| Flag | Description |
|---|---|
| `--private-platform Azure` | Enables Azure Private Link Service management in the operator |
| `--azure-private-creds` | Path to Azure credentials file used for PLS operations |
| `--azure-pls-resource-group` | Resource group where PLS resources will be created (the management cluster's infrastructure RG) |
Alternative authentication methods (use one of these instead of `--azure-private-creds`):
| Flag | Description |
|---|---|
| `--azure-private-secret` | Name of an existing Kubernetes secret containing Azure credentials (use with `--azure-private-secret-key` to specify the key, default: `credentials`) |
| `--azure-pls-managed-identity-client-id` | Client ID of a managed identity for PLS operations via Azure Workload Identity federation (requires `--azure-pls-subscription-id`) |
| `--azure-pls-subscription-id` | Azure subscription ID for PLS operations (required with `--azure-pls-managed-identity-client-id`) |
> **Choose One Authentication Method:** The three authentication methods (`--azure-private-creds`, `--azure-private-secret`, `--azure-pls-managed-identity-client-id`) are mutually exclusive. Use exactly one.
> **Re-install Required for Private Support:** If you already installed HyperShift without `--private-platform Azure`, you must re-run `hypershift install` with the private platform flags before creating any private clusters. The operator will not watch `AzurePrivateLinkService` CRs until configured with private platform support. You can safely re-run `hypershift install` to update the existing installation.
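As noted above, the file passed via `--azure-private-creds` uses the same layout as the standard HyperShift Azure credentials file. A sketch with placeholder values; the field names reflect the standard format, so verify them against your existing credentials file:

```json
{
  "subscriptionId": "<azure-subscription-id>",
  "tenantId": "<azure-tenant-id>",
  "clientId": "<service-principal-client-id>",
  "clientSecret": "<service-principal-client-secret>"
}
```

The service principal these credentials identify must be able to manage Private Link Service resources in the resource group passed to `--azure-pls-resource-group`.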
## Step 3: Create IAM Resources
Create workload identities for the cluster. The `create iam azure` command always creates a Control Plane Operator identity, which is used by private clusters to manage Private Endpoints and Private DNS zones in the guest subscription.
```shell
PREFIX="your-prefix"
CLUSTER_NAME="${PREFIX}-hc"
RESOURCE_GROUP_NAME="${CLUSTER_NAME}-${PREFIX}"
LOCATION="eastus"
AZURE_CREDS="/path/to/azure-credentials.json"
OIDC_ISSUER_URL="https://yourstorageaccount.blob.core.windows.net/yourstorageaccount"
WORKLOAD_IDENTITIES_FILE="./workload-identities.json"
```
```shell
hypershift create iam azure \
  --name "${CLUSTER_NAME}" \
  --infra-id "${PREFIX}" \
  --azure-creds "${AZURE_CREDS}" \
  --location "${LOCATION}" \
  --resource-group-name "${RESOURCE_GROUP_NAME}" \
  --oidc-issuer-url "${OIDC_ISSUER_URL}" \
  --output-file "${WORKLOAD_IDENTITIES_FILE}"
```
The command creates 8 workload identities, including the Control Plane Operator identity:
| Identity | Operator | Azure Role | Scopes |
|---|---|---|---|
| Control Plane Operator | CPO | Contributor (default) or Custom HCP Role | Managed RG, NSG RG, VNet RG |
This identity allows the CPO to create and manage Private Endpoints, Private DNS zones, VNet links, and DNS A records in the guest subscription.
> **Note:** The CPO identity is assigned the Contributor role by default. When using `--assign-custom-hcp-roles`, a more restrictive custom role is used instead.
## Step 4: Create Infrastructure
Create the Azure infrastructure. The `create infra azure` command creates the same resources regardless of endpoint access topology:
```shell
DNS_ZONE_RG_NAME="os4-common"
PARENT_DNS_ZONE="your-base.domain.com"
INFRA_OUTPUT_FILE="${PREFIX}-infra-output.json"
```
```shell
hypershift create infra azure \
  --azure-creds "${AZURE_CREDS}" \
  --infra-id "${PREFIX}" \
  --name "${CLUSTER_NAME}" \
  --location "${LOCATION}" \
  --base-domain "${PARENT_DNS_ZONE}" \
  --dns-zone-rg-name "${DNS_ZONE_RG_NAME}" \
  --workload-identities-file "${WORKLOAD_IDENTITIES_FILE}" \
  --assign-identity-roles \
  --output-file "${INFRA_OUTPUT_FILE}"
```
## Step 5: Create the Private HostedCluster
Read the infrastructure output to get the resource IDs created in Step 4:
```shell
MANAGED_RG_NAME=$(yq -r -p yaml '.resourceGroupName' "${INFRA_OUTPUT_FILE}")
VNET_ID=$(yq -r -p yaml '.vnetID' "${INFRA_OUTPUT_FILE}")
SUBNET_ID=$(yq -r -p yaml '.subnetID' "${INFRA_OUTPUT_FILE}")
NSG_ID=$(yq -r -p yaml '.securityGroupID' "${INFRA_OUTPUT_FILE}")
```
Create the private HostedCluster:
```shell
hypershift create cluster azure \
  --name "$CLUSTER_NAME" \
  --namespace "clusters" \
  --azure-creds ${AZURE_CREDS} \
  --location ${LOCATION} \
  --node-pool-replicas 2 \
  --base-domain ${PARENT_DNS_ZONE} \
  --pull-secret ${PULL_SECRET} \
  --generate-ssh \
  --release-image ${RELEASE_IMAGE} \
  --resource-group-name "${MANAGED_RG_NAME}" \
  --vnet-id "${VNET_ID}" \
  --subnet-id "${SUBNET_ID}" \
  --network-security-group-id "${NSG_ID}" \
  --sa-token-issuer-private-key-path "${SA_TOKEN_ISSUER_PRIVATE_KEY_PATH}" \
  --oidc-issuer-url "${OIDC_ISSUER_URL}" \
  --dns-zone-rg-name ${DNS_ZONE_RG_NAME} \
  --assign-service-principal-roles \
  --workload-identities-file ${WORKLOAD_IDENTITIES_FILE} \
  --diagnostics-storage-account-type Managed \
  --external-dns-domain ${DNS_ZONE_NAME} \
  --endpoint-access Private \
  --endpoint-access-private-nat-subnet-id "${NAT_SUBNET_ID}"
```
> **Note:** The `--endpoint-access` flag accepts three values:
>
> - `Public` (default): API server accessible via public endpoint only
> - `PublicAndPrivate`: API server accessible via both public and private endpoints
> - `Private`: API server accessible only via Private Link (private endpoint)
> **Endpoint Access Type is Immutable:** You cannot change a cluster between `Public` and non-Public (`Private` or `PublicAndPrivate`) after creation. Transitions between `PublicAndPrivate` and `Private` are allowed, but switching from `Public` to `Private` (or vice versa) requires creating a new cluster.
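The `hypershift create cluster azure` invocation above references many shell variables from earlier steps. A minimal preflight sketch; the `check_required_vars` helper is hypothetical, not part of the HyperShift CLI, and simply confirms each named variable is non-empty so a typo fails fast instead of partway through cluster creation:

```shell
# Hypothetical helper: return non-zero if any named variable is unset/empty.
check_required_vars() {
  local missing=0 v
  for v in "$@"; do
    if [ -z "${!v:-}" ]; then
      echo "ERROR: \$${v} is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage before running `hypershift create cluster azure`:
#   check_required_vars CLUSTER_NAME AZURE_CREDS LOCATION PARENT_DNS_ZONE \
#     PULL_SECRET RELEASE_IMAGE MANAGED_RG_NAME VNET_ID SUBNET_ID NSG_ID \
#     SA_TOKEN_ISSUER_PRIVATE_KEY_PATH OIDC_ISSUER_URL DNS_ZONE_RG_NAME \
#     WORKLOAD_IDENTITIES_FILE DNS_ZONE_NAME NAT_SUBNET_ID || exit 1
```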
### Additional Allowed Subscriptions

If you need to allow Private Endpoint connections from Azure subscriptions other than the guest cluster's own subscription, use the `--endpoint-access-private-additional-allowed-subscriptions` flag:

```shell
--endpoint-access-private-additional-allowed-subscriptions "sub-id-1,sub-id-2"
```
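Azure subscription IDs are GUIDs, so malformed entries in this comma-separated list can be caught before the flag is passed. A small sketch; `valid_subscription_list` is a hypothetical helper, not part of the HyperShift CLI:

```shell
# Hypothetical helper: check that each comma-separated entry is GUID-shaped.
valid_subscription_list() {
  local guid='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
  local -a entries
  local entry
  IFS=, read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    [[ "$entry" =~ $guid ]] || { echo "invalid subscription ID: $entry" >&2; return 1; }
  done
}

valid_subscription_list "11111111-2222-3333-4444-555555555555,aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee" \
  && echo "subscription list OK"
```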
## Verify Private Connectivity
After creating the cluster, monitor the Private Link Service setup progress:
```shell
# Check AzurePrivateLinkService resources
oc get azureprivatelinkservices -n clusters-${CLUSTER_NAME}

# Check detailed status and conditions
oc get azureprivatelinkservices -n clusters-${CLUSTER_NAME} -o yaml
```
The conditions should progress through these stages:
| Condition | Description |
|---|---|
| `AzureInternalLoadBalancerAvailable` | Internal load balancer has a frontend IP |
| `AzurePLSCreated` | Private Link Service created in management cluster |
| `AzurePrivateEndpointAvailable` | Private Endpoint created in guest VNet |
| `AzurePrivateDNSAvailable` | Private DNS zones and A records created |
| `AzurePrivateLinkServiceAvailable` | All components ready, private connectivity available |
Check overall cluster status:
```shell
oc get hostedcluster ${CLUSTER_NAME} -n clusters
oc wait --for=condition=Available hostedcluster/${CLUSTER_NAME} -n clusters --timeout=30m
```
## Access a Private HostedCluster
### Generate a Kubeconfig
```shell
hypershift create kubeconfig --name ${CLUSTER_NAME} --port-forward > ${CLUSTER_NAME}-kubeconfig
```
### Port-Forward Method
If you have access to the management cluster, you can port-forward to the API server:
```shell
# Port-forward the kube-apiserver service
kubectl port-forward svc/kube-apiserver -n clusters-${CLUSTER_NAME} 6443:6443 &

# Use the kubeconfig (it will connect via localhost:6443)
KUBECONFIG=${CLUSTER_NAME}-kubeconfig oc get nodes
```
### VNet-Peered Access
If you have a VM in a VNet that is peered with the guest VNet, you can access the API server, but you must first link the Private DNS zones to the peered VNet:
```shell
# Link the hypershift.local Private DNS zone to your peered VNet
PEERED_VNET_ID="/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>"

az network private-dns link vnet create \
  --resource-group "${MANAGED_RG_NAME}" \
  --zone-name "${CLUSTER_NAME}.hypershift.local" \
  --name "peered-vnet-link" \
  --virtual-network "${PEERED_VNET_ID}" \
  --registration-enabled false

# If you also need base domain resolution (for OAuth/console):
az network private-dns link vnet create \
  --resource-group "${MANAGED_RG_NAME}" \
  --zone-name "${PARENT_DNS_ZONE}" \
  --name "peered-vnet-basedomain-link" \
  --virtual-network "${PEERED_VNET_ID}" \
  --registration-enabled false

# Then access the cluster
KUBECONFIG=${CLUSTER_NAME}-kubeconfig oc get nodes
```
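The `PEERED_VNET_ID` value must follow the standard Azure resource ID layout. A small sketch that splits an ID into its components for a quick sanity check; `parse_vnet_id` is a hypothetical helper, not part of any CLI:

```shell
# Hypothetical helper: split an Azure VNet resource ID on "/" and print
# the subscription, resource group, and VNet name segments.
parse_vnet_id() {
  local IFS=/
  local -a parts
  read -r -a parts <<< "$1"
  # An ID of the form /subscriptions/<sub>/resourceGroups/<rg>/providers/
  # Microsoft.Network/virtualNetworks/<vnet> splits so that parts[0] is empty
  # (leading "/"), parts[2] is the subscription, parts[4] the resource group,
  # and parts[8] the VNet name.
  echo "subscription=${parts[2]} resourceGroup=${parts[4]} vnet=${parts[8]}"
}

parse_vnet_id "/subscriptions/0000/resourceGroups/demo-rg/providers/Microsoft.Network/virtualNetworks/demo-vnet"
# prints: subscription=0000 resourceGroup=demo-rg vnet=demo-vnet
```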
> **Private DNS Zones Are Only Linked to the Guest VNet:** The CPO only links Private DNS zones to the guest cluster's VNet. If you want to resolve the API server hostname from a peered VNet, you must manually link the Private DNS zones to that VNet as shown above. Without this step, DNS resolution will fail from the peered VNet.
## Cleanup
To delete a private HostedCluster:
```shell
hypershift destroy cluster azure \
  --name ${CLUSTER_NAME} \
  --azure-creds ${AZURE_CREDS} \
  --resource-group-name ${MANAGED_RG_NAME} \
  --dns-zone-rg-name ${DNS_ZONE_RG_NAME}
```
The deletion process automatically cleans up Private Link resources in the correct order:
1. The control plane operator removes the Private Endpoint, Private DNS zones, VNet links, and A records
2. The HyperShift operator removes the Private Link Service
> **Cleanup Order:** The dual-finalizer pattern ensures resources are deleted in the correct dependency order. The CPO finalizer runs first (removing guest-side resources), then the HO finalizer runs (removing management-side resources).
## Gotchas and Troubleshooting

### Management Cluster Requirements

- The management cluster must be an OpenShift cluster running on Azure, not AKS. Commands like `oc get infrastructure cluster` are used to discover the management cluster's Azure resource group and VNet, and these only work on OpenShift. For AKS-based management clusters, use managed Azure HyperShift (ARO HCP) instead.
- The HyperShift operator must be installed with `--private-platform Azure` before creating any private clusters. If you followed the management cluster setup guide without private flags, re-run `hypershift install` with the additional private platform flags.
### NAT Subnet

- The NAT subnet CIDR (`--address-prefixes`) must fall within the management cluster's VNet address space. If the VNet uses `10.0.0.0/16`, a NAT subnet of `10.1.64.0/24` will fail unless you first expand the VNet address space.
- The `--disable-private-link-service-network-policies true` flag is required on the NAT subnet. If omitted, Azure will reject PLS creation with an error about network policies. This error is not always obvious; if PLS creation fails, check this setting first:

  ```shell
  az network vnet subnet show \
    --resource-group "${MGMT_VNET_RG}" \
    --vnet-name "${MGMT_VNET_NAME}" \
    --name "${NAT_SUBNET_NAME}" \
    --query privateLinkServiceNetworkPolicies
  ```

  The value must be `"Disabled"`.
### Endpoint Access Immutability

- You cannot change a cluster from `Public` to `Private` (or `Private` to `Public`) after creation. The API validation rejects this transition. You can only switch between `PublicAndPrivate` and `Private`.
- If you need to change a public cluster to private, you must create a new cluster with `--endpoint-access Private` from the start.
### Cross-Subscription Scenarios

- If the management cluster and guest cluster are in different Azure subscriptions, you must include the guest subscription in the PLS auto-approval list using `--endpoint-access-private-additional-allowed-subscriptions` with the guest's subscription ID.
- The CPO workload identity must also have permissions (Contributor or custom role) in the guest subscription's resource groups to create Private Endpoints and DNS resources.
### Private DNS Resolution

- Private DNS zones are only linked to the guest cluster's VNet. If you need to access the API server from a peered VNet, you must manually link the Private DNS zones to that VNet (see VNet-Peered Access above).
- Two Private DNS zones are created:
  - `<clusterName>.hypershift.local`: synthetic internal zone with `api` and `*.apps` records
  - `<baseDomain>`: base domain zone with `api-<clusterName>` and `oauth-<clusterName>` records
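Based on that record layout, the fully qualified names that should resolve from inside the guest VNet can be sketched in shell. The helper name is illustrative, not part of any CLI, and the values are examples:

```shell
# Sketch: control-plane hostnames served by the two Private DNS zones,
# derived from the record layout above. Apps routes additionally resolve
# through the *.apps wildcard in the hypershift.local zone.
expected_private_hostnames() {
  local cluster="$1" base_domain="$2"
  printf '%s\n' \
    "api.${cluster}.hypershift.local" \
    "api-${cluster}.${base_domain}" \
    "oauth-${cluster}.${base_domain}"
}

# Example: cluster "my-cluster" with base domain "example.com"
expected_private_hostnames "my-cluster" "example.com"
```

Each printed name should resolve to the Private Endpoint IP from a worker node once `AzurePrivateDNSAvailable` is `True`.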
### External DNS Domain Must Not Match Cluster Domain

> **Azure Private DNS Zone Shadowing:** On private Azure HostedClusters, do not set `--external-dns-domain` to a value that matches or is a parent domain of `{clusterName}.{baseDomain}`. For example, if your cluster is named `my-cluster` with base domain `example.com`, do not use `--external-dns-domain my-cluster.example.com` or `--external-dns-domain example.com`.
>
> This misconfiguration cannot be corrected after cluster creation because the relevant fields (`spec.services`, `spec.dns.baseDomain`, and `metadata.name`) are all immutable. The cluster must be destroyed and recreated with a different `--external-dns-domain` value.
**Safe example:** If your cluster is `my-cluster` with base domain `example.com`, use a separate subdomain such as `--external-dns-domain custom-dns.example.com` that does not overlap with `my-cluster.example.com`.
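Since the misconfiguration is unrecoverable, it is worth checking before cluster creation. A pre-flight sketch; `shadows_cluster_domain` is a hypothetical helper, not part of the HyperShift CLI, encoding the rule stated above (the external domain must not equal the cluster domain or be one of its parent domains):

```shell
# Hypothetical pre-flight check: does <external-dns-domain> shadow
# <clusterName>.<baseDomain>? True if they are equal or the cluster
# domain is a subdomain of the external domain.
shadows_cluster_domain() {
  local external="$1" cluster_domain="$2"
  [ "$cluster_domain" = "$external" ] || [[ "$cluster_domain" == *".${external}" ]]
}

if shadows_cluster_domain "example.com" "my-cluster.example.com"; then
  echo "ERROR: external DNS domain shadows the cluster domain"
fi
shadows_cluster_domain "custom-dns.example.com" "my-cluster.example.com" \
  || echo "OK: domains do not overlap"
```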
### What Goes Wrong

Private Azure clusters use two separate routing paths:

- Management-plane router (`private-router`): an HAProxy pod in the hosted control plane namespace, fronted by an internal load balancer and exposed to the guest VNet through Azure Private Link. Worker nodes reach this router via the Private Endpoint IP. HAProxy uses SNI-based routing and only has ACLs for `.hypershift.local` hostnames (KAS, ignition, konnectivity, OAuth). Any hostname that does not match an ACL falls through to the `default_backend kube_api`, which returns KAS certificates.
- Data-plane router (`router-default`): the OpenShift ingress controller running on worker nodes, serving `*.apps.{clusterName}.{baseDomain}` hostnames with the correct wildcard ingress certificate.
When `--external-dns-domain` matches the cluster domain, the PLS controller creates a Private DNS zone named `{clusterName}.{baseDomain}`. This zone becomes authoritative for all queries under that name within the guest VNet, including `*.apps.{clusterName}.{baseDomain}`. Since the zone only has `api` and `oauth` A records pointing to the Private Endpoint IP, apps queries either:

- return NXDOMAIN (if no `*.apps` record exists in the zone), or
- resolve to the Private Endpoint IP, which routes to `private-router` (HAProxy). Because `*.apps` hostnames do not match any HAProxy SNI ACL, traffic falls through to `kube_api` and the client receives a TLS certificate mismatch (the KAS cert instead of the ingress wildcard cert).
Neither outcome is usable. The console, OAuth login, and all application routes are unreachable.
### Why the Controller Cannot Self-Heal

The controller cannot fix this by adding a `*.apps` wildcard record to the shadowing zone because:

- The Private Endpoint IP routes to the management-plane `private-router`, not the data-plane `router-default`. Adding `*.apps → PE IP` would route apps traffic to HAProxy, which does not serve those hostnames.
- The correct target (the data-plane ingress IP on worker nodes) is not available to the PLS controller. The controller runs in the control plane and has no client to the guest cluster. There is no HCP status field that reports the guest ingress IP, and the HostedCluster Controller Operator (HCCO) does not propagate it back.
When the controller detects shadowing, it sets `AzurePrivateDNSAvailable=False` with reason `BaseDomainShadowsClusterDomain` and skips zone creation entirely. This prevents the shadowing zone from being created, but the `api` and `oauth` hostnames from `--external-dns-domain` will not resolve via Private DNS. The cluster must be recreated with a non-overlapping domain.
### Condition Debugging
If the cluster gets stuck, check the `AzurePrivateLinkService` CR conditions:
```shell
oc get azureprivatelinkservices -n clusters-${CLUSTER_NAME} -o jsonpath='{.items[0].status.conditions}' | jq .
```
| Stuck Condition | Likely Cause |
|---|---|
| `AzureInternalLoadBalancerAvailable = False` | The `private-router` Service hasn't received an ILB IP yet. Check the Service status and Azure networking. |
| `AzurePLSCreated = False` | PLS creation failed. Check NAT subnet policies, credentials, and the HO operator logs. |
| `AzurePrivateEndpointAvailable = False` | PE creation failed or connection not approved. Check the PLS auto-approval list and CPO logs. |
| `AzurePrivateDNSAvailable = False` | DNS zone or record creation failed. If the reason is `BaseDomainShadowsClusterDomain`, the `--external-dns-domain` value overlaps with the cluster domain; the cluster must be recreated with a different value. See External DNS Domain Must Not Match Cluster Domain. |
## Related Documentation
- Azure Private Link Architecture - Detailed architecture reference
- Self-Managed Azure Overview - Complete self-managed Azure guide
- Create a Self-Managed Azure HostedCluster - Standard (public) cluster creation
- Azure Self-Managed Infrastructure Reference - Infrastructure details