Overview
Organizations adopting Generative AI on Azure often want two key capabilities:
- Pay-as-you-go (PAYG) pricing – to avoid idle infrastructure costs associated with hosting large models.
- Private network access – to ensure that AI services are not exposed over the public internet and meet security requirements.
Azure Databricks provides capabilities that partially satisfy both requirements, but there are important architectural constraints when trying to combine them.
This information sheet explains what is possible, what is not currently supported, and recommended architecture options.
TL
Azure Databricks can support both PAYG GenAI and private networking architectures, but they cannot currently be combined for Databricks-hosted foundation model APIs.
Organizations must decide whether cost efficiency or private endpoint enforcement is the primary architectural requirement.
PAYG foundation model APIs
- Lowest cost
- No hosting required
- No private serving endpoint
Provisioned/custom model serving
- Private endpoints supported
- Higher cost
- Requires provisioned infrastructure
Organizations should choose the architecture based on whether cost optimization or network isolation is the higher priority.
1. Using GenAI Models in Azure Databricks
Azure Databricks provides Foundation Model APIs that allow users to access large language models and other GenAI models directly from their Databricks workspace.
Foundational Model APIs documentation
These APIs enable:
- Prompt-based model inference
- Chat completion
- Embeddings
- Model evaluation and experimentation
The models are hosted and managed by Databricks, meaning customers do not need to provision GPU clusters or manage infrastructure.
Key Characteristics
- Managed model hosting
- Token-based usage billing
- Direct integration with Databricks notebooks, jobs, and ML pipelines
- Supports popular foundation models (LLMs and embedding models)
Supported models documentation
2. Pay-As-You-Go Pricing Model
Azure Databricks Foundation Model APIs support a pay-per-token pricing model, which provides a natural PAYG cost structure.
Pricing and usage documentation
Benefits
- No infrastructure provisioning required
- No idle GPU cost
- Scales automatically based on usage
- Costs are tied directly to inference volume
This approach is well suited for:
- Experimental GenAI development
- Low or variable traffic workloads
- Internal copilots or AI assistants
- Prototyping RAG pipelines
3. Private Networking in Azure Databricks
Azure Databricks supports private connectivity through Azure Private Link, enabling organizations to prevent public internet exposure of workspace resources.
Workspace Private Link
Allows secure inbound connectivity to the Databricks workspace.
Benefits:
- Workspace accessible only through a private network
- Integration with enterprise VNET architectures
- Eliminates public endpoint access
Serverless Private Connectivity (Network Connectivity Configurations)
Databricks also provides Network Connectivity Configurations (NCC) which enable serverless Databricks compute services to securely access Azure resources through private endpoints.
Serverless Private Link documentation
Examples of resources accessed privately include:
- Azure Storage
- Azure SQL
- Azure Key Vault
- Internal enterprise APIs
4. Limitation: PAYG Foundation Models and Private Endpoints
The key limitation is related to Model Serving networking capabilities.
While Azure Databricks supports private networking in many areas, Databricks-hosted pay-per-token foundation model endpoints currently do not support private endpoint access for the model serving endpoint itself.
Private connectivity for model serving is supported only for:
- Provisioned Throughput Endpoints
- Custom Model Serving Endpoints
This means that the Foundation Model PAYG endpoints cannot currently be placed behind private endpoints.
5. Architecture Options
Organizations must choose between two primary architecture patterns depending on their priorities.
Option A — PAYG-Optimized Architecture
Objective
Minimize infrastructure costs while still maintaining secure access to the Databricks environment.
Architecture Components
- Azure Databricks Workspace
- Azure Private Link for workspace access
- Foundation Model APIs (Pay-Per-Token)
- Databricks notebooks or applications invoking the models
Foundation model APIs documentation
Characteristics
Pros
- True PAYG model inference
- No GPU hosting costs
- Simple operational model
Cons
- Model serving endpoint itself cannot be private-endpoint restricted
- Some outbound access to Databricks-hosted APIs is required
Best For
- Development environments
- R&D teams
- Internal productivity tools
- Low-risk workloads
Option B — Private-Endpoint-First Architecture
Objective
Ensure model serving occurs entirely within private networking boundaries.
Architecture Components
- Azure Databricks Workspace with Private Link
- Provisioned Throughput Model Serving
- Custom Model Serving endpoints
- Private endpoint connectivity for serving endpoints
Model serving architecture documentation
Characteristics
Pros
- Model serving endpoints can be privately exposed
- Strongest security posture
- Meets strict enterprise networking requirements
Cons
- Requires provisioned capacity
- Higher cost than PAYG
- Infrastructure management required
Best For
- Regulated environments
- Sensitive data processing
- Enterprise AI production systems
- Strict network isolation policies
6. Recommendation Framework
When choosing an architecture, consider the following decision factors:
| Requirement | Recommended Approach |
|---|---|
| Lowest possible cost | PAYG Foundation Model APIs |
| Fully private AI serving | Provisioned throughput endpoints |
| Prototype GenAI applications | PAYG |
| Enterprise production workloads | Private serving architecture |
| High-security environments | Private endpoints + provisioned serving |