Overview

Organizations adopting Generative AI on Azure often want two key capabilities:

  1. Pay-as-you-go (PAYG) pricing – to avoid idle infrastructure costs associated with hosting large models.
  2. Private network access – to ensure that AI services are not exposed over the public internet and meet security requirements.

Azure Databricks provides capabilities that partially satisfy both requirements, but there are important architectural constraints when trying to combine them.

This information sheet explains what is possible, what is not currently supported, and recommended architecture options.

TL

Azure Databricks can support both PAYG GenAI and private networking architectures, but they cannot currently be combined for Databricks-hosted foundation model APIs.

Organizations must decide whether cost efficiency or private endpoint enforcement is the primary architectural requirement.

PAYG foundation model APIs

  • Lowest cost
  • No hosting required
  • No private serving endpoint

Provisioned/custom model serving

  • Private endpoints supported
  • Higher cost
  • Requires provisioned infrastructure

Organizations should choose the architecture based on whether cost optimization or network isolation is the higher priority.


1. Using GenAI Models in Azure Databricks

Azure Databricks provides Foundation Model APIs that allow users to access large language models and other GenAI models directly from their Databricks workspace.

Foundational Model APIs documentation

These APIs enable:

  • Prompt-based model inference
  • Chat completion
  • Embeddings
  • Model evaluation and experimentation

The models are hosted and managed by Databricks, meaning customers do not need to provision GPU clusters or manage infrastructure.

Key Characteristics

  • Managed model hosting
  • Token-based usage billing
  • Direct integration with Databricks notebooks, jobs, and ML pipelines
  • Supports popular foundation models (LLMs and embedding models)

Supported models documentation


2. Pay-As-You-Go Pricing Model

Azure Databricks Foundation Model APIs support a pay-per-token pricing model, which provides a natural PAYG cost structure.

Pricing and usage documentation

Benefits

  • No infrastructure provisioning required
  • No idle GPU cost
  • Scales automatically based on usage
  • Costs are tied directly to inference volume

This approach is well suited for:

  • Experimental GenAI development
  • Low or variable traffic workloads
  • Internal copilots or AI assistants
  • Prototyping RAG pipelines

3. Private Networking in Azure Databricks

Azure Databricks supports private connectivity through Azure Private Link, enabling organizations to prevent public internet exposure of workspace resources.

Allows secure inbound connectivity to the Databricks workspace.

Benefits:

  • Workspace accessible only through a private network
  • Integration with enterprise VNET architectures
  • Eliminates public endpoint access

Private Link documentation


Serverless Private Connectivity (Network Connectivity Configurations)

Databricks also provides Network Connectivity Configurations (NCC) which enable serverless Databricks compute services to securely access Azure resources through private endpoints.

Serverless Private Link documentation

Examples of resources accessed privately include:

  • Azure Storage
  • Azure SQL
  • Azure Key Vault
  • Internal enterprise APIs

4. Limitation: PAYG Foundation Models and Private Endpoints

The key limitation is related to Model Serving networking capabilities.

While Azure Databricks supports private networking in many areas, Databricks-hosted pay-per-token foundation model endpoints currently do not support private endpoint access for the model serving endpoint itself.

Private connectivity for model serving is supported only for:

  • Provisioned Throughput Endpoints
  • Custom Model Serving Endpoints

Model serving documentation

This means that the Foundation Model PAYG endpoints cannot currently be placed behind private endpoints.


5. Architecture Options

Organizations must choose between two primary architecture patterns depending on their priorities.


Option A — PAYG-Optimized Architecture

Objective

Minimize infrastructure costs while still maintaining secure access to the Databricks environment.

Architecture Components

  • Azure Databricks Workspace
  • Azure Private Link for workspace access
  • Foundation Model APIs (Pay-Per-Token)
  • Databricks notebooks or applications invoking the models

Foundation model APIs documentation

Characteristics

Pros

  • True PAYG model inference
  • No GPU hosting costs
  • Simple operational model

Cons

  • Model serving endpoint itself cannot be private-endpoint restricted
  • Some outbound access to Databricks-hosted APIs is required

Best For

  • Development environments
  • R&D teams
  • Internal productivity tools
  • Low-risk workloads

Option B — Private-Endpoint-First Architecture

Objective

Ensure model serving occurs entirely within private networking boundaries.

Architecture Components

  • Azure Databricks Workspace with Private Link
  • Provisioned Throughput Model Serving
  • Custom Model Serving endpoints
  • Private endpoint connectivity for serving endpoints

Model serving architecture documentation

Characteristics

Pros

  • Model serving endpoints can be privately exposed
  • Strongest security posture
  • Meets strict enterprise networking requirements

Cons

  • Requires provisioned capacity
  • Higher cost than PAYG
  • Infrastructure management required

Best For

  • Regulated environments
  • Sensitive data processing
  • Enterprise AI production systems
  • Strict network isolation policies

6. Recommendation Framework

When choosing an architecture, consider the following decision factors:

RequirementRecommended Approach
Lowest possible costPAYG Foundation Model APIs
Fully private AI servingProvisioned throughput endpoints
Prototype GenAI applicationsPAYG
Enterprise production workloadsPrivate serving architecture
High-security environmentsPrivate endpoints + provisioned serving