AI Platform Architecture Document

How to Use This Template

This template defines the structure for a Platform Architecture Document (PAD). The PAD is the foundational document in the template system — it describes the shared infrastructure, integrations, security, and operational capabilities of a platform upon which one or more use cases are built. This template has been pre-populated with defaults derived from the standard NeuralOps Platform and common client deployment patterns. Pre-populated content reflects what is typically true for every deployment. You should:

Review all pre-populated content and adjust for client-specific differences.

Replace all [bracketed placeholders] with project-specific content.

Remove or replace guidance text (blockquote format, lines starting with >) once the section is populated.

[Diagram: ...] placeholders indicate where diagrams should be inserted. Replace with the actual diagram and a brief caption. Recommended tools: draw.io, Mermaid, Lucidchart, or Visio.

Cross-references use Obsidian link syntax: in-document links use [[#heading-text|Heading Text]] format, and cross-document links use [[Document Name#section|Section]] format (e.g. [[PROJECT CODE – USE CASE CODE – SDD#5-architectural-impact-assessment|5 Architectural Impact Assessment]] or [[#62-network-architecture|6.2 Network Architecture]]).

This document should be created once per platform. Use cases built on this platform are documented in separate OAD and SDD documents and registered in the 4 Use Case Register.
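
The cross-reference format described above can be generated rather than typed by hand. The following is a minimal sketch: the anchor rule (lowercase, punctuation dropped, spaces turned into hyphens) is inferred only from the two examples given in this template, and the function names `heading_anchor` and `obsidian_link` are hypothetical helpers, not part of any tool.

```python
import re


def heading_anchor(heading: str) -> str:
    """Derive an anchor from heading text (rule assumed from the template's examples)."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)  # drop punctuation such as the dot in "6.2"
    return re.sub(r"\s+", "-", slug)          # runs of whitespace become single hyphens


def obsidian_link(heading: str, document: str = "") -> str:
    """Build an in-document link, or a cross-document link when `document` is given."""
    return f"[[{document}#{heading_anchor(heading)}|{heading}]]"


print(obsidian_link("6.2 Network Architecture"))
print(obsidian_link("5 Architectural Impact Assessment",
                    "PROJECT CODE – USE CASE CODE – SDD"))
```

Headings containing characters outside plain letters, digits, spaces, and hyphens may be anchored differently by the publishing toolchain, so generated links should be checked against the rendered document.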

Document Metadata

Field	Detail
Initiative code	[PROJECT CODE]
Platform title	[Platform Name]
Document type	PAD – Platform Architecture Document
Status	[Draft / Under Review / Endorsed / Approved]
Author(s)	[Author Name(s)]
Approved by	[Approving body or individual]

Document Version Control

The table below records the version history of this document:

Date	Version	Change Description	Author
DD/MM/YYYY	0.1	Initial draft created	[Author Name]

Contributors

This document was authored with input from the following key individuals:

Name	Role	Area
[Name]	Solution Lead	calab.ai (Vendor)
[Name(s)]	Solution Team	calab.ai (Vendor)
[Name(s)]	IT Rep - Security	Information Technology ([Client])
[Name(s)]	IT Rep - Architecture	Information Technology ([Client])
[Name(s)]	IT Rep - Infrastructure & Networks	Information Technology ([Client])
[Name(s)]	Business Sponsor / Process Owner	[Business Unit] ([Client])

Intended Audience

[List the target reader roles for this document and indicate which sections are most relevant to each. This helps readers quickly navigate to the content most applicable to their responsibilities.]

Role	Description	Key Sections
Architecture / Engineering	Solution architects, cloud engineers, and technical leads responsible for platform design and implementation	3 Platform Overview, 5 Integration View, 6 Infrastructure View
Security / Risk / Compliance	Information security officers, risk analysts, and compliance managers assessing platform security posture	8 Security View, 3.5 Guardrails and Compliance, 7 Information View
Infrastructure & Networks	Network engineers and infrastructure teams responsible for connectivity, firewall rules, and environment provisioning	6.2 Network Architecture, 6.1 Deployment Architecture, 6.4 Infrastructure Requirements
Business Sponsors / Process Owners	Business stakeholders sponsoring the platform initiative and overseeing use case onboarding	2 Business View, 4 Use Case Register, 6.5 Licensing and Cost Considerations
BAU Support / Operations	Operational support teams responsible for ongoing platform monitoring, incident response, and maintenance	9 Support View, 6.6 Backup and Recovery, 6.8 Failover and High Availability

Document Approval Requirements

The following table describes the approval gates required for this document:

Approval Gate	Status	Date
[Gate Name, e.g. Security Endorsement]	[Pending / Complete]	[Date]
[Gate Name, e.g. Architecture Peer Review]	[Pending / Complete]	[Date]
[Gate Name, e.g. Architecture Board Endorsement]	[Pending / Complete]	[Date]

1 Introduction

This document is the Platform Architecture Document (PAD) for the [Platform Name].

The purpose of this document is to:

Describe the foundational infrastructure and architectural patterns of the platform
Document the integration patterns and standard interfaces available to use cases
Capture the networking, security, and information governance requirements
Serve as the central reference point for IT stakeholders concerned with cyber security, risk and compliance, information classifications, integrations, and architectural patterns
Act as the parent architecture document for all Solution Design Documents that leverage this platform

The [Platform Name] is an Azure-hosted Generative AI platform built on the NeuralOps Platform — an enterprise Generative AI platform deployable from the Azure Marketplace that provides enterprise-grade agentic workflow and conversational agent capabilities. The platform consolidates AI services, data processing pipelines, and operational tooling into a single managed environment that can be extended to support multiple business use cases over time, with customisation support provided by calab.ai.

Relationship to other documents: This PAD is the foundational reference for all Solution Design Documents (SDD) and Opportunity Assessments (OAD) that target this platform. Each SDD includes an Architectural Impact Assessment (PROJECT CODE – USE CASE CODE – SDD) that references sections of this document.

2 Business View

2.1 Background

[Provide client-specific background on why this platform is being established. The standard framing below can be adapted.]

The [Platform Name] was established to provide a secure, governed, and reusable foundation for deploying Generative AI use cases within the [Client] Azure environment. The platform consolidates AI services, data processing pipelines, and operational tooling into a single managed environment that can be extended to support multiple business use cases over time.

[If applicable, describe the phased approach used to establish the platform. A typical pattern is:]

Phase 1 — Proof of Value (POV): Initial demonstration of GenAI capabilities deployed outside of the [Client] Azure environment using anonymised data.

Phase 2 — Detailed Design: Identification of platform capabilities, architecture patterns, and the approvals required to deploy within the [Client] Azure environment.

Phase 3 — Production Deployment: Deployment of the approved platform architecture and operationalisation of the first registered use case.

2.2 Objectives

The [Platform Name] aims to achieve the following objectives:

Provide a secure, compliant Azure environment for hosting Generative AI workloads behind [Client]’s private network.
Establish reusable AI services (document processing, language understanding, speech-to-text, embeddings, search) that can be shared across use cases.
Enable governed data ingestion, enrichment, and retrieval workflows through standardised platform components.
Deliver operational tooling for monitoring, logging, alerting, and cost management across all platform workloads.
Support iterative onboarding of new AI use cases with minimal incremental infrastructure provisioning.

2.3 Scope

Area	In Scope	Out of Scope
Infrastructure	The following resources are deployed as part of this platform: Azure AI Search, Azure Storage Account, Azure Key Vault, Azure Function App (Backend), Azure App Service (Web Applications x2), App Service Environment, Azure AI Document Intelligence, Azure OpenAI Service (x2: primary reasoning and global voice/audio), Azure Speech Service, Azure Content Safety, Azure Container Registry, Event Grid System Topic, Application Insights, Azure Cosmos DB, Log Analytics Workspace, Azure Monitor Dashboard, Azure Monitor Workbook.	Infrastructure managed by external integration partners (e.g. RPA component infrastructure).
Access Control and Security	User access management to ensure only authorised users can access platform resources and AI outputs. Compliance with security and governance policies.	Access management for external systems connecting to the platform (managed by respective system owners).
Environment Management	Separate non-production and production instances for development, testing, and troubleshooting without interrupting business operations.	Environment management for external integration components.
Middleware and Identity	Setup of middleware (e.g. Azure Function Apps, Azure Web Apps) using managed identity for seamless operation. Monitoring and logging usage and tokens consumed by platform services.	External middleware and identity management.
Data Security	Secure storage and transmission of information processed by the platform. Compliance with [Client]’s data use policy and Generative AI policy.	Data security management for external integration systems.
Cost Management	Reporting on Azure consumption costs associated with the platform.	External system cost components (e.g. RPA licensing).
Admin / Configuration	Access to configure AI processing parameters via prompt configurations. Output formats in human-readable document formats. Various ingestion strategies for different document types.	Use case-specific configuration options (documented in respective SDDs).

2.4 Dependencies and Constraints

#	Dependency / Constraint
DP-01	GenAI Platform Services: Dependency on Azure AI services (OpenAI, Document Intelligence, Speech, Content Safety, AI Search) for core platform processing capabilities. Platform availability and performance are dependent on the health of these services.
DP-02	Azure Services: Dependency on Azure services for middleware, storage, identity management, and cost reporting. Integration with Azure Active Directory (Entra ID) for managed identities and access control.
DP-03	Monitoring and Logging: Dependency on Azure Monitor, Log Analytics, and Application Insights for platform health monitoring, usage logging, and troubleshooting.
DP-04	External Integration Components: Integration with external systems (e.g. RPA, enterprise data platforms) follows existing development lifecycle patterns and is governed by the respective system architecture documentation. Data or files passed from external components to the platform must be reviewed to ensure appropriateness as per the 8.2 Authentication and Authorisation and 8.1 Information Classification sections of this document.
CN-01	User Training and Adoption: Ensuring users are adequately trained to use platform interfaces. Change management processes to facilitate smooth adoption of platform capabilities.
[DP/CN-##]	> [Add client-specific dependencies and constraints as needed, e.g. enterprise data platform connectivity, network migration timelines, etc.]

2.5 Assumptions

#	Assumption	Consequences if Invalid
AS-01	Data Quality and Format: Work instructions for generating AI outputs are accurate and appropriate for intended results. Data sources used for generating outputs are relevant and collectively exhaustive.	Low quality outputs from the platform.
AS-02	Security and Compliance Measures: Existing security and compliance measures are adequate and will support the platform without requiring significant changes. Any additional security measures needed can be implemented within scope and timeline.	Registration of new risks or new technical debt.
AS-03	User Adoption and Training: Users will receive adequate training and support to use platform interfaces effectively. There is a willingness among users to adopt new processes and tools provided by the platform.	Wide adoption of the platform within the organisation will be impacted.
[AS-##]	> [Add client-specific assumptions as needed.]

2.6 Risks

[Risks are assessed using [Client]’s risk management framework. Once the platform architecture is endorsed/approved, risk items are moved to the appropriate risk register for ongoing management.]

Consequence is one of the following: Level 1 (lowest), Level 2, Level 3, Level 4, Level 5 (highest)

Likelihood is one of the following: Rare, Unlikely, Possible, Likely, Almost certain

Rating is one of: Low, Medium, High, Very High, Extreme

#	Risk Description	Consequence	Impact Rating	Mitigation Controls	Likelihood
RI-01	Handling of PII data: Personally Identifiable Information (PII) processed by the platform should be accessed only by appropriate [Client] staff.	Regulatory Compliance: Failure to handle PII data correctly could result in compliance failures under applicable privacy legislation.	[Rating]	All data processed by the platform will be stored only within the [Client] Azure tenant for internal use, minimising avenues for data leakage. Access to platform endpoints will be restricted to [Client] private networks to further lock down access.	[Likelihood]
[RI-##]	> [Add client-specific risks as needed.]

2.7 Technical Debt

#	Title	Description	Source	Owner (Platform or Team)	Date Raised
DEBT-01	[Title]	[Description]	[Source]	[Owner]	[Date]

2.8 References

[List key reference documents, architecture patterns, and standards that are relevant to this platform. Include links where available. The entries below are examples — replace with client-specific references.]

#	Source	Relevance
REF-01	[Client] Cloud & Network Patterns	Home page for all Cloud & Network patterns used at [Client]
REF-02	[Client] Cloud Network Integration Patterns	Parent page for all cloud network integration patterns
REF-03	[Client] Solution Architecture Templates Guide	How-to guide for [Client] Solution Architecture Templates
REF-04	[Client] Palo Alto Firewalls in Azure	Technical documentation reference for Palo Alto firewalls within the [Client] Azure environment
REF-05	[Client] Azure Naming Conventions	Technical documentation reference for Azure naming conventions
REF-06	[Client] App Service & Function Apps + Private Endpoint & ASE Pattern	Pattern specific to Azure App Services & Function Apps with Private Endpoints and ASE
REF-07	[Client] PaaS to PaaS Pattern	Pattern specific to PaaS-to-PaaS communications (e.g. Cosmos DB & Azure App Services)
[REF-##]	> [Add additional client-specific references.]

3 Platform Overview

3.1 Platform Description

The [Platform Name] is an Azure-hosted platform that provides the shared infrastructure, AI services, and operational tooling required to deploy and operate Generative AI use cases within [Client]’s private network. The platform is built on the NeuralOps Platform, an enterprise Generative AI platform deployable from the Azure Marketplace, with customisation support provided by calab.ai. It provides enterprise-grade agentic workflow and conversational agent capabilities. The platform is composed of four logical modules:

AI Engine — Core AI reasoning and user interaction layer, providing semantic search, natural language processing, conversational interfaces (including voice via WebRTC), and AI-generated output storage. Supports multiple orchestration strategies including OpenAI Agents, LangChain, and Prompt Flow.
AI Pre-Trainer — Data preparation and enrichment layer, responsible for ingesting raw data, extracting content from documents and audio, generating embeddings (text-embedding-3-large), and indexing processed content for downstream AI consumption. Supports multiple chunking strategies (fixed size, layout, page, paragraph, HTML header, table-specific).
External Integration Layer — Interfaces with external systems (e.g. file repositories, enterprise data platforms, automation tools) to facilitate data exchange with the platform.
Operational Services Layer — Cross-cutting Azure services that enable identity management, security, monitoring, alerting, and event-driven orchestration across all platform components.
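
Of the chunking strategies listed for the AI Pre-Trainer, fixed-size chunking with overlap is the simplest to illustrate. The sketch below is an assumption-laden example rather than the platform's implementation: it splits on characters (the platform may count tokens instead), and the function name `chunk_fixed_size` and its default values are hypothetical.

```python
def chunk_fixed_size(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # distance between the start of consecutive chunks
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_fixed_size("abcdefghij", chunk_size=4, overlap=1))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap, at the cost of indexing some content twice.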

The platform comprises four application components:

Component	Technology	Purpose
Chat Web App	React 18 / TypeScript / Vite (frontend) + Flask / Python 3.11 (backend)	User-facing chat interface with streaming responses, WebRTC voice, citations, and configurable agent personas
Admin Web App	Streamlit / Python 3.11	Administration dashboard for data ingestion, configuration management, index management, and prompt configuration
Function App Backend	Azure Functions v4 (Python v2 programming model) / Python 3.11	Serverless batch processing: document ingestion, embedding generation, indexing, and AI output generation. Uses Azure Functions Python Blueprints.
Teams Extension	TypeScript	Microsoft Teams bot integration for conversational AI access within Teams (optional)

[Diagram: Platform overview showing the four logical modules — AI Engine, AI Pre-Trainer, External Integration Layer, and Operational Services Layer — with their constituent Azure resources and interconnections. Recommended: draw.io or Mermaid component diagram]

3.2 Platform Capabilities

The [Platform Name] provides the following capabilities to registered use cases:

Capability	Description	Status
Document Ingestion	Ingest and store raw documents (PDF, DOCX, DOC, TXT, HTML, XLSX, XLS, CSV, PPTX, MD, JSON, XML, RTF) and audio files (WAV) for processing.	Available
Document Intelligence	Extract structured content from documents using computer vision and OCR capabilities (Azure AI Document Intelligence).	Available
Audio Transcription	Transcribe audio files into diarised text with speaker identification and summarisation (Azure Speech Service).	Available
Content Chunking & Indexing	Break down processed content into optimised chunks using configurable strategies (fixed size overlap, layout, page, paragraph, HTML header, table-specific) and index for semantic retrieval.	Available
Semantic Search	Query indexed content using natural language with vector-based semantic search (Azure AI Search with semantic ranker).	Available
LLM Reasoning	Perform reasoning and analysis over text-based information using Azure OpenAI models (GPT-4o). Supports multiple orchestration strategies: OpenAI Agents, OpenAI Function Calling, LangChain, and Prompt Flow.	Available
Embedding Generation	Generate vector embeddings from processed content using text-embedding-3-large for semantic understanding.	Available
Conversational Interface	Web-based chat interface for users to query AI-generated insights and results. Supports streaming, citations, and conversation history.	Available
Voice Interface	WebRTC-based voice interaction using GPT-4o-mini-audio-preview and GPT-4o-mini-realtime-preview models.	Available
Configuration Management	Streamlit-based admin interface for managing AI processing configurations, prompt templates, workspace settings, and document upload.	Available
Output Generation	Generate AI-driven reports and outputs based on configurable agent chain-of-thought logic.	Available
Content Safety	Content moderation and safety filtering via Azure Content Safety service.	Available
Event-Driven Processing	Trigger automated processing workflows based on storage events (blob created/deleted) via Event Grid → Storage Queue → Function App.	Available
API Integration	Expose and consume APIs securely via API Management and private network routing.	Available
Monitoring & Alerting	Centralised monitoring, logging, alerting, and dashboarding across all platform components via Application Insights, Log Analytics, Dashboards, and Workbooks.	Available
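
The Event-Driven Processing capability above (Event Grid → Storage Queue → Function App) implies that the Function App receives a storage event and must work out which blob to process. The following sketch shows one way to interpret such a message, assuming the queue message body is a single JSON event in the Event Grid schema with `eventType` and `subject` fields. The `BlobEvent` class and `parse_blob_event` function are hypothetical names, and message decoding and queue binding are left out.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Event types named in this document, mapped to a short action label.
_EVENT_ACTIONS = {
    "Microsoft.Storage.BlobCreated": "created",
    "Microsoft.Storage.BlobDeleted": "deleted",
}
_SUBJECT_PREFIX = "/blobServices/default/containers/"


@dataclass(frozen=True)
class BlobEvent:
    action: str      # "created" or "deleted"
    container: str
    blob_name: str


def parse_blob_event(message_body: str, expected_container: str = "documents") -> Optional[BlobEvent]:
    """Return a BlobEvent for a relevant storage event, or None if it should be ignored."""
    event = json.loads(message_body)
    action = _EVENT_ACTIONS.get(event.get("eventType", ""))
    subject = event.get("subject", "")
    if action is None or not subject.startswith(_SUBJECT_PREFIX) or "/blobs/" not in subject:
        return None
    # Subject format: /blobServices/default/containers/<container>/blobs/<blob path>
    container, blob_name = subject[len(_SUBJECT_PREFIX):].split("/blobs/", 1)
    if container != expected_container:
        return None
    return BlobEvent(action, container, blob_name)


sample = json.dumps({
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/documents/blobs/reports/q1.pdf",
})
print(parse_blob_event(sample))
```

Returning `None` for unrelated events keeps the handler tolerant of messages from other containers or event types, which would otherwise end up on the poison queue after repeated failures.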

3.3 Component Architecture

3.3.1 Principal Accounts

All platform resources are deployed within the [Client] Azure environment ([Azure Region] region) under the [Platform Name] subscription. Resources are organised into environment-specific Resource Groups (DEV, PPD, PRD) within this subscription.

Account / Subscription	Purpose	Environment(s)
[Platform Name] Azure Subscription	Hosts all platform resources	DEV, PPD, PRD
[Client] Azure Active Directory (Entra ID)	Identity provider for user and service authentication	All
GitHub Organisation	Source code and infrastructure-as-code repositories	All

3.3.2 Solution Technologies

The table below describes the key technology components deployed as part of the [Platform Name]. Resource naming follows the convention {resource-prefix}-{resourceToken} where resourceToken = toLower(uniqueString(subscription().id, environmentName, location)).
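
The naming convention above can be illustrated with a short sketch. Note that the hash used here is only a stand-in: the `uniqueString` function in the convention is provided by the deployment tooling, and this example substitutes a truncated SHA-256 digest, so the tokens it produces will not match real deployed resource names. The function names and the `kv` prefix are hypothetical.

```python
import hashlib


def resource_token(subscription_id: str, environment_name: str, location: str) -> str:
    """Deterministic lowercase token from the three inputs (stand-in for uniqueString)."""
    seed = "|".join([subscription_id, environment_name, location])
    # ASSUMPTION: SHA-256 truncated to 13 characters, not the real uniqueString algorithm.
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:13].lower()


def resource_name(prefix: str, token: str) -> str:
    """Apply the {resource-prefix}-{resourceToken} convention."""
    return f"{prefix}-{token}"


dev_token = resource_token("00000000-0000-0000-0000-000000000000", "dev", "australiaeast")
print(resource_name("kv", dev_token))
```

Because the token depends on subscription, environment name, and location, the same template yields stable names on redeployment and distinct names for each environment.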

[The Impact and Source UC columns should be updated as use cases are onboarded. For a new platform, set all impacts to NEW and Source UC to the first registered use case.]

Component Group	Name	Description	Impact	Source UC
AI Engine	Azure AI Search	Provides semantic search capabilities (SKU: Standard) with semantic ranker. Indexes content and metadata for vector-based retrieval. SystemAssigned managed identity. 1 partition, 1 replica (scalable).	NEW	[UC-##]
AI Engine	Azure OpenAI Service (Primary)	Performs reasoning over text-based information using GPT-4o (v2024-11-20, Standard deployment, 30K TPM capacity). Generates embeddings using text-embedding-3-large (v1, Standard deployment, 300K TPM capacity). SKU: S0.	NEW	[UC-##]
AI Engine	Azure OpenAI Service (Global/Voice)	Provides voice and audio capabilities. Models: gpt-4o-mini-audio-preview (GlobalStandard, 3K capacity) and gpt-4o-mini-realtime-preview (GlobalStandard, 3K capacity). Deployed to a global region (e.g. eastus2) for model availability.	NEW	[UC-##]
AI Engine	Azure Cosmos DB	Stores chat conversation logs, workspace configurations, and AI-generated outputs. Kind: GlobalDocumentDB, Serverless capacity mode, Session consistency. Database: `db_conversation_history`. Containers: `conversations` (/userId), `configurations` (/workspaceId), `workspaces` (/tenantId).	NEW	[UC-##]
AI Engine	React & Flask Chat Web App	User-facing chat interface. React 18 / TypeScript / Vite frontend with Tailwind CSS and Radix UI. Flask / Python 3.11 backend. Supports streaming, WebRTC voice, citations, agent personas, and conversation history. Deployed as Docker container on App Service. `public_network_access_enabled = false`, `vnet_image_pull_enabled = true` (container images pulled via VNet).	NEW	[UC-##]
AI Pre-Trainer	Azure Storage Account	Stores indexed input data and output files; message queue processing items for batch workflows; JSON-based configuration created via the Admin App. SKU: Standard_GRS, Hot tier, StorageV2, TLS 1.2. Blob containers: `documents`, `config`. Queues: `doc-processing`, `doc-processing-poison`.	NEW	[UC-##]
AI Pre-Trainer	Azure AI Document Intelligence	Extracts information from uploaded documents using computer vision and OCR. Kind: FormRecognizer, SKU: S0.	NEW	[UC-##]
AI Pre-Trainer	Azure Speech Service	Processes and transcribes audio files into diarised text with speaker identification. Kind: SpeechServices, SKU: S0.	NEW	[UC-##]
AI Pre-Trainer	Azure Content Safety	Content moderation and safety filtering for AI-generated outputs. Kind: ContentSafety, SKU: S0.	NEW	[UC-##]
AI Pre-Trainer	Azure Computer Vision	Advanced image processing capabilities (optional — deployed conditionally when `useAdvancedImageProcessing` is enabled). Kind: ComputerVision, SKU: S1.	NEW	[UC-##]
AI Pre-Trainer	Azure Function App	Core processing logic for the AI Pre-Trainer and AI Engine. Handles document ingestion, chunking, embedding generation, indexing, AI output generation, and inter-component orchestration. Azure Functions v4, Python 3.11, Docker container deployment from ACR. `public_network_access_enabled = false`, `vnet_image_pull_enabled = true`.	NEW	[UC-##]
AI Pre-Trainer	Streamlit Admin Web App	Streamlit-based web application for management of platform configurations. Manages chunking strategies, prompt configurations, workspace settings, and uploaded documents. Python 3.11, deployed as Docker container on App Service. `public_network_access_enabled = false`, `vnet_image_pull_enabled = true`.	NEW	[UC-##]
External Integration	[Integration System Name]	> [Describe external integration systems relevant to this deployment, e.g. RPA tools, enterprise data platforms, file repositories.]	EXISTING	Platform
Operational Services	Azure AD (Entra ID)	Manages authentication and user identity across the platform. App Services have built-in AAD authentication enabled.	EXISTING	Platform
Operational Services	Log Analytics Workspace	Collects, analyses, and acts on telemetry data from Azure resources. Tracks platform health, performance, and diagnostic logs. SKU: PerGB2018, 30-day retention.	NEW	[UC-##]
Operational Services	Azure Key Vault	Securely stores and manages sensitive information such as certificates, cryptographic keys, and connection strings. SKU: Standard. Access policies configured for managed identity and deployment principal.	NEW	[UC-##]
Operational Services	Application Insights	Monitors live application logs, detects and diagnoses performance issues, and provides usage pattern analytics. Kind: web, linked to Log Analytics workspace.	NEW	[UC-##]
Operational Services	Azure Monitor Dashboard	Centralised location to visualise and share metrics, logs, and telemetry data. Includes charts for sessions, users, failures, response time, CPU, and memory.	NEW	[UC-##]
Operational Services	Azure Monitor Workbook	Interactive reports combining text, queries, and visualisations for a unified view of platform resources.	NEW	[UC-##]
Operational Services	Event Grid System Topic	Monitors Storage Account Blob Container events (BlobCreated, BlobDeleted on `documents` container) and triggers Azure Function App processing workflows via the `doc-processing` queue. Retry: 30 attempts, 1440 min TTL.	NEW	[UC-##]
Operational Services	Azure Container Registry	Stores Docker images for platform application components (frontendwebapp, adminwebapp, backendapi). SKU: Standard, admin user enabled.	NEW	[UC-##]
Operational Services	App Service Environment (ASEv3)	Dedicated, isolated hosting environment for App Services and Function Apps within the [Platform Name] VNet. Version: ASEv3. Internal Load Balancing Mode: Web, Publishing (fully internal ILB — both web traffic and deployment traffic are internal, no public-facing endpoints). Cluster settings: configurable `FrontEndSSLCipherSuiteOrder` for TLS cipher control. App Service Plans within the ASE use Isolated v2 tier SKUs (I1v2/I2v2/I3v2). Provides network-level isolation and enhanced security.	NEW	[UC-##]

[List external applications that the platform integrates with at the platform level. Replace the examples below with client-specific applications.]

Name	Description	Application Type	Comments
[RPA Tool, e.g. UiPath / Appian]	Provides automation capabilities for data collection, file mapping, and workflow triggering.	API / Desktop	> [Architecture governed by existing RPA solution documentation.]
[Enterprise Data Platform]	Provides historical and contextual data for enrichment of AI processing workflows.	API	> [Connectivity details to be confirmed.]
API Management	Exposes platform backend APIs securely. Routes external requests through the enterprise firewall.	API Gateway	—
[Source System Name(s)]	> [Describe source systems that provide data to the platform.]	[Type]	> [Integration details.]

3.4 Architecture Decision Records

The following architecture decisions have been made for this platform:

ID	Decision	Description	Rationale
PAD-ADR-01	Use of Private Endpoints	Data classifications for this platform require usage of Private Endpoints (as opposed to Service Endpoints) for all PaaS services. All Azure PaaS services are accessed exclusively via Private Endpoints with no public internet exposure. Specific PE subresources: Storage (`blob`, `queue`), Cosmos DB (`Sql`), Key Vault (`vault`), AI Search (`searchService`), OpenAI (`account`), Cognitive Services (`account`), Container Registry (`registry`), App Services (`sites`). Each PE is registered in a centrally managed Private DNS Zone for automatic DNS A-record resolution. All PaaS services also enforce network ACLs with a default deny action. See 6.2.1 Network Components for the full Private Endpoint configuration matrix.	Compliance with [Client] security patterns and data classification requirements. Private Endpoints provide full network-level isolation compared to Service Endpoints which only restrict traffic at the service level. Centralised Private DNS Zone management ensures consistent name resolution across the enterprise.
PAD-ADR-02	App Service Environment v3 (ASEv3)	The platform uses App Service Environment v3 (ASEv3) for hosting App Services and Function Apps, rather than standard App Service Plans. ASEv3 provides dedicated, isolated compute within the [Client] VNet. Configuration: Internal Load Balancing mode (Web, Publishing) ensures no public endpoints. App Service Plans use Isolated v2 tier SKUs (I1v2/I2v2/I3v2). All application components have `public_network_access_enabled = false` and `vnet_image_pull_enabled = true` to ensure container images are pulled via the VNet rather than the public internet.	Compared to standard App Service Plans, ASEv3 reduces network complexity, offers greater network control and compute isolation, and allows more flexibility over ingress and egress application traffic. Internal ILB mode ensures all traffic remains within [Client]’s private network.
PAD-ADR-03	NVA-enabled Network Design	The platform design follows [Client] cyber security guidelines requiring all resources and initiatives to be deployed behind the NVA (Network Virtual Appliance) hub. All traffic is inspected by the Palo Alto NVA.	Compliance with [Client] cyber security guidelines for network traffic inspection and control.
[PAD-ADR-##]	> [Add client-specific ADRs as needed.]
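
The Private Endpoint subresources listed in PAD-ADR-01 lend themselves to an automated coverage check during design review. The sketch below is one possible approach, not an existing platform tool: the mapping is copied from PAD-ADR-01, while the function name `missing_private_endpoints` and the shape of the `declared` input are assumptions.

```python
# Service -> Private Endpoint subresources, as listed in PAD-ADR-01.
REQUIRED_PE_SUBRESOURCES: dict[str, set[str]] = {
    "Storage": {"blob", "queue"},
    "Cosmos DB": {"Sql"},
    "Key Vault": {"vault"},
    "AI Search": {"searchService"},
    "OpenAI": {"account"},
    "Cognitive Services": {"account"},
    "Container Registry": {"registry"},
    "App Services": {"sites"},
}


def missing_private_endpoints(declared: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return the required subresources that are absent from a declared configuration."""
    gaps: dict[str, set[str]] = {}
    for service, required in REQUIRED_PE_SUBRESOURCES.items():
        missing = required - declared.get(service, set())
        if missing:
            gaps[service] = missing
    return gaps


# Example: a design that forgot the Storage queue endpoint and the Key Vault endpoint.
declared = {name: set(subs) for name, subs in REQUIRED_PE_SUBRESOURCES.items()}
declared["Storage"].discard("queue")
del declared["Key Vault"]
print(missing_private_endpoints(declared))
```

An empty result indicates that every subresource named in the decision record has a corresponding endpoint in the declared design.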

3.5 Guardrails and Compliance

[If the client has architecture guardrails or compliance standards, document adherence here. The table below shows a typical pattern — replace references with client-specific guardrail identifiers.]

Guardrail Title	Reference	Adherence / Deviation	Rationale
App Service Environments & Private Endpoints	[Client Pattern Reference]	ADHERENCE	Approved pattern. ASEv3 with Internal Load Balancing (Web, Publishing). All PaaS services accessed via Private Endpoints with centralised Private DNS Zone registration. App Services have public network access disabled and VNet image pull enabled.
PaaS to PaaS Communications (Cosmos DB, AI Services, Storage)	[Client Pattern Reference]	ADHERENCE	Managed Identity + Key Vault; VNet Integration + Private Endpoint.
Storage Accounts	[Client Pattern Reference]	ADHERENCE	Storage Account with private network communications. Sensitive data protected.
Azure Container Registry	[Client Pattern Reference]	ADHERENCE	Adherence to approved pattern.
Azure Key Vault	[Client Pattern Reference]	ADHERENCE	Adherence to approved pattern.
Staff connecting to Web Apps	[Client Pattern Reference]	ADHERENCE	Access via Zscaler Private Access, Corporate Office, or WVD.
Identity and Access Management	[Client Pattern Reference]	ADHERENCE	Web App interfaces authorised via RBAC. User authentication via Azure AD with MFA enabled.
Logging	[Client Pattern Reference]	ADHERENCE	Logs and metrics captured via Azure Monitor for all applicable resources.
Encryption	[Client Pattern Reference]	ADHERENCE	Encryption at rest and in transit are compliant.
Secret Management	[Client Pattern Reference]	ADHERENCE	Secrets managed and accessed via Azure Key Vault.
Diagnostic Settings	[Client Pattern Reference]	ADHERENCE	Enabled for all applicable resources.

3.6 Architectural Principles

[Document the architectural principles that govern the platform design. For each principle, describe how the platform architecture adheres to it. Source principles from the client’s enterprise architecture framework or standards body.]

#	Principle	Description	Platform Adherence
AP-01	[Principle Name]	> [Description of the principle]	> [How the platform design adheres to this principle]

4 Use Case Register

[This section maintains a register of all use cases (Solution Designs and Opportunity Assessments) that are built on this platform. Each entry links to the relevant documents and notes the current status. This provides a single view of everything running on the platform. This register should be updated whenever a new use case is onboarded, decommissioned, or materially changed. Maintenance responsibility lies with the platform architecture owner.]

[Note: A single use case may have multiple SDD documents if the use case requires distinct solutions (e.g. different automation approaches). List each SDD as a separate entry in the SD Document(s) column.]

#	Use Case Name	OA Document	SD Document(s)	Status	Date Onboarded	Key Platform Impacts
UC-01	[Use Case Name]	[Link to OAD or “N/A”]	[Link to SDD(s)]	[Active / In Design / Decommissioned]	[Date]	[Brief summary of platform changes required]

5 Integration View

5.1 Integration Patterns

The [Platform Name] uses the following integration patterns for communication between platform components and external systems:

REST API over HTTPS: Primary pattern for synchronous communication between platform components and external system triggers. Used between Chat Web App ↔ Function App, Admin Web App ↔ Function App, and external API consumers.
Azure Blob API: Used for file storage and retrieval operations between Function Apps and Storage Accounts. Documents are uploaded to the documents blob container for processing.
Private Link / TLS: Used for secure communication between ASE-hosted components and private endpoint-enabled services (Cosmos DB, Storage, Key Vault, AI Services, AI Search, Container Registry). Each PaaS service has a dedicated Private Endpoint deployed to the Private Links Subnet with automatic DNS A-record registration in the corresponding Private DNS Zone. Private DNS zones are managed centrally in [Client]’s shared services subscription. See 6.2.1 Network Components for the full Private Endpoint configuration matrix.
Event-Driven (Event Grid → Storage Queue): Used for asynchronous processing triggers based on storage events. Blob created/deleted events on the documents container are published via Event Grid System Topic to the doc-processing Storage Queue, which triggers Function App processing.
Managed Identity: Used for authentication between Azure PaaS components, eliminating the need for key-based authentication. All service-to-service communication uses SystemAssigned managed identities with RBAC role assignments.
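
The event-driven pattern above can be sketched as a small handler that interprets a message taken from the doc-processing queue. This is a minimal illustration, not the platform's implementation: it assumes the queue message body is plain JSON in the Event Grid event schema (depending on configuration the body may instead be base64-encoded or use the CloudEvents schema), and the function name and the `ingest` / `remove` action labels are hypothetical.

```python
import json
from typing import Optional

# Subject prefix Event Grid uses for blobs in the `documents` container.
DOCUMENTS_PREFIX = "/blobServices/default/containers/documents/blobs/"

# Hypothetical mapping from storage event type to a processing action.
ACTIONS = {
    "Microsoft.Storage.BlobCreated": "ingest",
    "Microsoft.Storage.BlobDeleted": "remove",
}


def parse_blob_event(message_body: str) -> Optional[dict]:
    """Return the action and blob path for a documents-container event, else None."""
    event = json.loads(message_body)
    subject = event.get("subject", "")
    action = ACTIONS.get(event.get("eventType", ""))
    # Ignore events for other containers or unsupported event types.
    if action is None or not subject.startswith(DOCUMENTS_PREFIX):
        return None
    return {"action": action, "blob_path": subject[len(DOCUMENTS_PREFIX):]}
```

Returning `None` for anything outside the `documents` container keeps the handler safe if the event subscription filter is ever broadened.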

[Diagram: Integration diagram showing API interactions between platform components — Function App, Admin Web App, Chat Web App, Storage Account, Azure AI Services, Cosmos DB, Azure AI Search, Key Vault, and network boundary components (PaloAlto NVA, ASE) — with protocol annotations (HTTPS/REST API, Azure Blob API, Private Link/TLS). Recommended: draw.io or Mermaid sequence/flow diagram]

5.2 Standard Interfaces

The integration interfaces for the platform are described below.

[Update the Impact column as use cases are onboarded. For a new platform, set all impacts to NEW.]

Integration Process	Description	Impact	Interfaces
Upload Source Files	External automation processes or platform users upload source data files for processing and indexing.	NEW	Via External Automation: 1. External System → ASE (Function App) \| HTTPS / REST API \| Orchestrates file upload and data processing requests; 2. Function App → Storage Account \| HTTPS / Azure Blob API \| Stores raw uploaded files for indexing. Via Admin Web App: 1. User → ASE (Admin Web App) \| HTTPS / REST API \| Authenticate and interact with web app; 2. Admin Web App → ASE (Function App) \| HTTPS / REST API \| Orchestrates file upload and data processing requests; 3. Function App → Storage Account \| HTTPS / Azure Blob API \| Stores raw uploaded files for indexing.
Generate AI Insights	The Function App processes uploaded data by calling Azure AI services for analysis and insight generation. Indexed results are stored for retrieval.	NEW	1. Function App → Azure AI Services (Document Intelligence, OpenAI) \| HTTPS / REST API \| Invokes AI services for data analysis; 2. Azure AI Services → Function App \| HTTPS \| Returns processed insights and metadata; 3. Function App → Cosmos DB \| HTTPS \| Stores AI-generated results and metadata for querying; 4. Function App → Azure AI Search \| HTTPS / REST API \| Indexes metadata for fast retrieval.
Query AI Insights	Users query AI-generated insights using the Chat Web App. Queries are routed to Azure AI Search for retrieval and Azure OpenAI for reasoning.	NEW	1. User → Chat Web App \| HTTPS / Web Interface \| User sends queries via the chat application; 2. Chat Web App → Azure AI Search \| HTTPS / REST API \| Executes queries to retrieve indexed data; 3. Chat Web App → Azure OpenAI \| HTTPS / REST API \| Sends retrieved context + query for LLM reasoning; 4. Azure OpenAI → Chat Web App \| HTTPS (streaming) \| Returns AI-generated response for display.
Secure Data Management	Application secrets and sensitive data are securely managed using Azure Key Vault. Private Link subnet enables secure communication with integrated resources.	NEW	1. App Services → Key Vault \| HTTPS \| Retrieves application secrets for secure operations; 2. Virtual Network (ASE Subnet) → Private Links Subnet \| Private Link / TLS \| Provides secure connectivity to resources like Storage and Cosmos DB.
Event-Driven Processing	Storage events trigger automated document processing workflows.	NEW	1. Storage Account (Blob) → Event Grid System Topic \| Event Subscription \| BlobCreated/BlobDeleted events on `documents` container; 2. Event Grid → Storage Queue (`doc-processing`) \| Queue Message \| Triggers Function App processing; 3. Function App → AI Services \| HTTPS / REST API \| Processes document through ingestion pipeline.
Monitoring and Logging	Azure Monitor collects telemetry and diagnostic data for platform performance tracking and troubleshooting.	NEW	1. All App Services/Function App → App Insights \| HTTPS \| Sends telemetry data for monitoring; 2. Azure Monitor Resources → Dashboards \| HTTPS \| Displays performance metrics and alerts for administrators.
Expose APIs	The App Service Environment integrates with API Management to expose backend APIs securely. External requests are routed through the enterprise firewall.	NEW	1. API Management → PaloAlto NVA \| HTTPS / TLS \| Routes secure API requests; 2. PaloAlto NVA → External Systems \| HTTPS \| Ensures secure external communication.

5.3 Middleware Components

Component	Type	Purpose
Azure Function App	Serverless Compute	Core middleware for orchestrating data processing, AI service calls, and inter-component communication. Hosted within the App Service Environment. Uses Azure Functions v4 with Python Blueprints for modular function registration.
Event Grid System Topic	Event Broker	Provides event-driven triggers for storage-based events (blob created/deleted on `documents` container), publishing to Storage Queue for Function App consumption. Retry policy: 30 attempts, 1440 min TTL.
API Management	API Gateway	Exposes platform APIs to authorised consumers. Provides rate limiting, authentication, and routing capabilities.
PaloAlto NVA	Network Firewall	Inspects and secures all traffic entering and leaving the platform VNet, including inter-VNet and external communications.

6 Infrastructure View

6.1 Deployment Architecture

The deployment architecture supports [Platform Name] component deployments across Development (DEV), Pre-Production (PPD), and Production (PRD) environments. All deployments leverage infrastructure-as-code (IaC) via Bicep templates and are managed using CI/CD pipelines for consistency.

The platform uses a tag-driven CI/CD pipeline:

Developers merge to develop branch
Semantic-release creates a Release Candidate (RC) tag (vX.Y.Z-rc.N)
RC tag triggers automatic deployment to staging environment
Playwright smoke tests run against staging
On success, a General Availability (GA) tag (vX.Y.Z) is created
GA tag triggers production deployment (with manual approval gate via GitHub Environments)
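
The tag-to-environment routing in the steps above can be sketched as follows. This is an illustrative sketch only: the function name and the returned dictionary shape are assumptions, and the real routing is expected to live in the GitHub Actions workflow triggers rather than in application code.

```python
import re
from typing import Optional

# Tag formats taken from the pipeline description: vX.Y.Z-rc.N and vX.Y.Z.
RC_TAG = re.compile(r"v\d+\.\d+\.\d+-rc\.\d+")
GA_TAG = re.compile(r"v\d+\.\d+\.\d+")


def deployment_target(tag: str) -> Optional[dict]:
    """Map a release tag to its target environment, or None if the tag is not a release tag."""
    if RC_TAG.fullmatch(tag):
        # Release Candidate tags deploy to staging automatically.
        return {"environment": "staging", "manual_approval": False}
    if GA_TAG.fullmatch(tag):
        # General Availability tags deploy to production behind an approval gate.
        return {"environment": "production", "manual_approval": True}
    return None
```

Using `fullmatch` rather than a prefix match prevents a tag such as `v1.4.0-beta.1` from being treated as a GA release.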

Three Docker images are built and pushed to Azure Container Registry:

frontendwebapp — Chat Web App
adminwebapp — Admin Web App
backendapi — Function App Backend

[Diagram: Deployment view showing three Azure Resource Groups ([Platform Code] DEV, [Platform Code] PPD, [Platform Code] PRD) each containing Core Resources, Monitoring Resources, and Networking Resources, with CI/CD pipelines triggered from GitHub Repositories and Bicep-based provisioning. Recommended: draw.io deployment diagram]

6.1.1 Deployment Principles

The platform is deployed across three distinct environments, each within its own Azure Resource Group:
1. [Platform Code] DEV Resource Group: Development environment for testing and iterative development.
2. [Platform Code] PPD Resource Group: Pre-Production environment for integration and validation.
3. [Platform Code] PRD Resource Group: Production environment for live applications and services.
Environments are fully isolated to ensure no cross-environment dependencies.
All changes are deployed via CI/CD pipelines (GitHub Actions) triggered from GitHub Repositories, ensuring repeatable and tested releases.
Tag-driven release process: RC tags deploy to staging automatically; GA tags deploy to production with manual approval.
The same Azure services (Core Resources, Monitoring Resources, Networking Resources) are deployed across all three environments to ensure a uniform architecture.
All core services (e.g. Key Vault, Storage, Cosmos DB, AI Services) are accessed securely via Private Endpoints to prevent public exposure.
Infrastructure is provisioned using Bicep templates via Azure Developer CLI (azd provision) for consistent deployments across environments.
Application components are deployed as Docker containers from Azure Container Registry to App Services in container mode.
Source code and infrastructure definitions are stored in GitHub Repositories.
Authentication to Azure uses federated credentials (OIDC) — no client secrets in CI/CD pipelines.

6.2 Network Architecture

[Diagram: Networking view showing traffic flows between staff access methods (Zscaler, Corporate Office, WVD), PaloAlto NVA firewall, and [Platform Name] VNet hosted resources in spoke VNets. Recommended: draw.io or Visio topology diagram showing hub/spoke VNets, subnets, firewall placement, and access paths]

6.2.1 Network Components

Name	Description	Reference
PaloAlto NVA (Network Virtual Appliance)	Enterprise firewall that inspects all traffic between [Client] networks and [Platform Name] resources. Routes traffic between hub and spoke VNets.	[Client Firewall Documentation]
App Service Environment v3 (ASEv3)	Dedicated, isolated hosting environment for App Services and Function Apps within the [Platform Name] VNet. Internal Load Balancing Mode: Web, Publishing (fully internal — no public-facing endpoints). Cluster settings: configurable `FrontEndSSLCipherSuiteOrder` for TLS cipher control. App Service Plans use Isolated v2 tier SKUs (I1v2/I2v2/I3v2) with optional CPU-based autoscaling.	[Client ASE Pattern Reference]
Hub VNet	Central network hub hosting the PaloAlto NVA and providing connectivity between on-premises networks, Zscaler, and spoke VNets.	[Client Cloud Platform Zone Model]
Spoke VNet ([Platform Name])	Dedicated VNet for [Platform Name] resources, peered with the Hub VNet. Contains ASE subnet and Private Links subnet. Uses custom DNS servers (region-specific) for private DNS zone resolution rather than Azure-provided DNS.	[Client Cloud Platform Zone Model]
ASE Subnet	Subnet within the [Platform Name] spoke VNet hosting the App Service Environment (Function Apps, Web Apps). Delegated to `Microsoft.Web/hostingEnvironments`. Minimum size: /27 (32 IPs). NSG associated (Azure default rules only). Route table auto-associated based on region and environment (hub route tables for prod/nonprod × Australia East/Southeast).	—
Private Links Subnet	Subnet within the [Platform Name] spoke VNet hosting Private Endpoints for PaaS services. Private endpoint network policies disabled (PE traffic bypasses subnet-level NSG rules). NSG associated (Azure default rules only).	—
Route Tables	Hub-managed route tables auto-associated to subnets based on region (Australia East/Southeast) and environment (prod/nonprod). Ensures all traffic is routed through the PaloAlto NVA.	—
SDWAN	Site-to-site connectivity between [Client] Corporate Office networks and the Azure Hub VNet.	—
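
As a quick check of the ASE Subnet sizing stated above, the sketch below computes how many addresses a subnet actually leaves for workloads. It assumes the standard Azure behaviour of reserving five addresses in every subnet. The function names and the example address ranges used with them are hypothetical.

```python
import ipaddress

# Azure reserves five addresses in every subnet (network, gateway, two DNS, broadcast).
AZURE_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Number of addresses left for resources after Azure's reserved addresses."""
    network = ipaddress.ip_network(cidr, strict=True)
    return max(network.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def meets_minimum(cidr: str, minimum_prefix: int = 27) -> bool:
    """True if the subnet is at least as large as the stated /27 minimum."""
    return ipaddress.ip_network(cidr, strict=True).prefixlen <= minimum_prefix
```

For example, `usable_addresses("10.20.0.0/27")` gives 27, which is the headroom available to ASE instances and scaling operations in a minimum-size subnet.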

Subnet Service Endpoints

Both subnets include the following default service endpoints for management plane connectivity:

Service Endpoint	Purpose
`Microsoft.AzureCosmosDB`	Cosmos DB service endpoint
`Microsoft.ContainerRegistry`	Container Registry service endpoint
`Microsoft.EventHub`	Event Hub service endpoint
`Microsoft.KeyVault`	Key Vault service endpoint
`Microsoft.ServiceBus`	Service Bus service endpoint
`Microsoft.Sql`	SQL Database service endpoint
`Microsoft.Storage`	Storage Account service endpoint
`Microsoft.Web`	App Service service endpoint

[Note: Service endpoints coexist with Private Endpoints. Service endpoints provide management plane connectivity at the subnet level, while Private Endpoints provide data plane connectivity via private IP addresses. The ADR in 3.4 Architecture Decision Records (PAD-ADR-01) confirms Private Endpoints are the primary connectivity mechanism for data plane traffic.]

Private Endpoint Configuration

All Azure PaaS services are accessed via Private Endpoints deployed to the Private Links Subnet. Each Private Endpoint is registered in a centrally managed Private DNS Zone for automatic DNS resolution. Private DNS zones are hosted in a shared services subscription and resource group, managed by [Client]’s platform team.

Resource	PE Subresource	Private DNS Zone	Notes
Azure Storage Account (blob)	`blob`	`privatelink.blob.core.windows.net`	Document and config blob containers
Azure Storage Account (queue)	`queue`	`privatelink.queue.core.windows.net`	Document processing queues
Azure Cosmos DB	`Sql`	`privatelink.documents.azure.com`	SQL API (GlobalDocumentDB)
Azure Key Vault	`vault`	`privatelink.vaultcore.azure.net`	Secrets and certificates
Azure AI Search	`searchService`	`privatelink.search.windows.net`	Semantic search indexes
Azure OpenAI Service	`account`	`privatelink.openai.azure.com`	LLM reasoning and embeddings
Azure AI Document Intelligence	`account`	`privatelink.cognitiveservices.azure.com`	Document content extraction
Azure Content Safety	`account`	`privatelink.cognitiveservices.azure.com`	Content moderation
Azure Speech Service	`account`	`privatelink.cognitiveservices.azure.com`	Audio transcription
Azure Computer Vision	`account`	`privatelink.cognitiveservices.azure.com`	Image processing (optional)
Azure Container Registry	`registry`	`privatelink.azurecr.io`	Docker image registry
App Services (Chat, Admin, Function)	`sites`	ASE DNS suffix (auto-registered)	App Services hosted in ASE use the ASE’s internal DNS suffix rather than standard `privatelink.azurewebsites.net`

[The PE Subresource column indicates the subresource_names used when creating the Private Endpoint. The Private DNS Zone column indicates the Azure Private DNS Zone where the PE’s A-record is automatically registered via a DNS Zone Group. All DNS zones follow the privatelink.{service}.{domain} naming convention and are managed centrally in [Client]’s shared services subscription.]
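
The matrix above can be expressed as a lookup that derives the expected A-record name for a given resource, which may be useful when validating DNS registration after provisioning. This is a sketch under assumptions: the service keys and the function name are hypothetical, only a subset of the rows is shown, and some services (for example Cosmos DB regional endpoints and Container Registry data endpoints) register additional records that this simple model does not cover.

```python
# (PE subresource, Private DNS Zone) per service, copied from the matrix above.
# The dictionary keys are hypothetical short labels, not Azure identifiers.
PRIVATE_ENDPOINTS = {
    "storage-blob": ("blob", "privatelink.blob.core.windows.net"),
    "storage-queue": ("queue", "privatelink.queue.core.windows.net"),
    "cosmos-db": ("Sql", "privatelink.documents.azure.com"),
    "key-vault": ("vault", "privatelink.vaultcore.azure.net"),
    "ai-search": ("searchService", "privatelink.search.windows.net"),
    "openai": ("account", "privatelink.openai.azure.com"),
    "document-intelligence": ("account", "privatelink.cognitiveservices.azure.com"),
    "container-registry": ("registry", "privatelink.azurecr.io"),
}


def private_record_fqdn(service: str, resource_name: str) -> str:
    """Return the FQDN of the A-record expected in the service's Private DNS Zone."""
    _subresource, zone = PRIVATE_ENDPOINTS[service]
    return f"{resource_name}.{zone}"
```

Keying the table by service rather than by subresource matters because several AI services share the `account` subresource but resolve through different zones.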

6.2.2 Staff Access Methods

Zscaler Private Access (Remote / [Client] Laptop)

[Client] staff members use [Client]-managed laptops with Zscaler Client Connector installed. Users authenticate to Zscaler Client Connector using Azure AD (with MFA enabled). When a user sends a connection request to platform applications hosted on [Platform Name] VNets, Zscaler Client Connector redirects the traffic (TCP or UDP) to Zscaler Private Access (ZPA) in the cloud. ZPA evaluates the request against ZPA policies to confirm the authenticated user has access to the [Platform Name] VNet hosted resource. If ZPA policies allow the connection, ZPA sends the traffic to a ZPA App Connector hosted on Azure. The App Connector forwards the traffic to the application hosted in the [Platform Name] VNets. Traffic between the ZPA App Connector and [Platform Name] VNet applications is inspected by the PaloAlto Firewall.

Pre-conditions: User connecting via a [Client]-managed laptop from outside of the Corporate Office. Zscaler Client Connector application must be running on the laptop.

[Client] Corporate Office (On-Premises)

The user is connected to the [Client] trusted network using a [Client]-managed laptop. Because the user is already on the trusted network, ZPA is not required to connect to [Platform Name] VNet hosted resources. Connections between the user and Azure are established using SDWAN. Traffic between the [Client] network and [Platform Name] VNet hosted resources in the spoke VNet is inspected by the PaloAlto Firewall. PaloAlto NVA allows access from the [Client] private network to [Platform Name] VNet hosted resources.

Pre-conditions: User connecting via a [Client]-managed laptop using Corporate Office Network Connection. PaloAlto NVA allows connectivity from [Client] offices to [Platform Name] VNet hosted resources in the spoke VNet.

Windows Virtual Desktop (WVD in Azure)

Windows Virtual Desktop provides the ability to connect to Virtual Desktops running on Azure. Microsoft manages portions of the Windows Virtual Desktop service on [Client]’s behalf and provides secure endpoints for connecting clients and session hosts. Users need access to the Windows Virtual Desktop Host Pool or to the application hosted on it.

Pre-conditions: User connecting via a Virtual Desktop hosted on Azure. PaloAlto NVA allows connectivity from Virtual Desktops address range to [Platform Name] VNet resources in the spoke VNet.

6.2.3 Network Interactions

All network interactions within the platform follow these principles:

Ingress: All user traffic enters through one of the three staff access methods (Zscaler, Corporate Office, WVD) and is inspected by the PaloAlto NVA before reaching platform resources.
Internal (PaaS-to-PaaS): Communication between Azure PaaS services uses Private Endpoints within the [Platform Name] VNet, with Managed Identity authentication. Each PaaS service enforces network ACLs with a default deny action, allowing only traffic from authorised VNet subnets and IP ranges.
Storage Account Network Rules: Default action: Deny. Allows VNet subnet access + authorised IP ranges. Bypass: Logging, Metrics, AzureServices. Private link access granted to Document Intelligence, Speech Service, and Computer Vision for direct BYOS (Bring Your Own Storage) connectivity.
Key Vault Network Rules: RBAC authorisation mode. Default action: Deny. Allows authorised subnet IDs + IP ranges. Bypass: AzureServices.
AI Services Network Rules: Each Cognitive Service enforces network ACLs with VNet rules and IP-based allow lists.
Egress: Outbound traffic to external data sources is routed through the PaloAlto NVA for inspection and policy enforcement.
Encryption: All data in transit between components is encrypted via TLS, and all data at rest is encrypted by the hosting Azure services.
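
The default-deny behaviour described for the Storage Account, Key Vault, and AI Services network rules can be illustrated with a simplified evaluation function. This is a conceptual sketch only: real Azure network rules distinguish VNet subnet rules (matched by subnet resource ID) from IP rules and also apply bypass settings, none of which are modelled here. The function name and example ranges are hypothetical.

```python
import ipaddress
from typing import Iterable


def is_allowed(source_ip: str, allowed_ranges: Iterable[str], default_action: str = "Deny") -> bool:
    """Allow a request only if its source matches an authorised range; otherwise apply the default."""
    address = ipaddress.ip_address(source_ip)
    for cidr in allowed_ranges:
        if address in ipaddress.ip_network(cidr):
            return True
    # No rule matched, so the default action decides the outcome.
    return default_action == "Allow"
```

With a default action of `Deny`, an empty allow list blocks every caller, which is why the authorised subnets and IP ranges must be in place before public access is disabled.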

6.3 Environment Strategy

All non-production environments are provisioned as duplicates of the production environment. Only dummy data is used in non-production environments. Should a requirement arise to use production data in a non-production environment for testing purposes, the request will be raised with the appropriate teams (e.g. Data Governance, Security), and the production data will be deleted immediately after the test scenarios are completed.

Environment	Purpose	Data Policy
DEV	Development and iterative testing	Dummy / synthetic data only
PPD	Pre-production integration and validation	Dummy / synthetic data only
PRD	Production live services	Production data with full security controls

6.4 Infrastructure Requirements

Requirement	Description
Azure Subscription	[Platform Name] subscription within [Client] Azure environment ([Azure Region])
Resource Groups	Three isolated resource groups (DEV, PPD, PRD)
Networking	Hub/Spoke VNet topology with PaloAlto NVA. Spoke VNet with custom DNS servers. ASE subnet (delegated to `Microsoft.Web/hostingEnvironments`, minimum /27) and Private Links subnet (PE network policies disabled). Route tables auto-associated from hub. 8 default service endpoints on all subnets.
Compute	App Service Environment v3 (ASEv3) with Internal Load Balancing (Web, Publishing). Isolated v2 tier SKUs (I1v2/I2v2/I3v2). Optional CPU-based autoscaling.
Storage	Azure Storage Accounts with Private Endpoints (Standard_GRS, Hot tier)
Database	Azure Cosmos DB (Serverless, Session consistency) with Private Endpoints
AI Services	Azure OpenAI (x2 — primary + global/voice), Document Intelligence, Speech, Content Safety, AI Search
Container Registry	Azure Container Registry (Standard SKU) with Private Endpoint
Secrets Management	Azure Key Vault (Standard SKU) with Private Endpoint
Monitoring	Log Analytics (PerGB2018, 30-day retention), Application Insights, Dashboards, Workbooks
Event Handling	Event Grid System Topic for storage event processing
Infrastructure-as-Code	Bicep templates provisioned via Azure Developer CLI (`azd`)
CI/CD	GitHub Actions pipelines for deployment automation (tag-driven: RC → staging → GA → production)
Authentication	Azure AD federated credentials (OIDC) for CI/CD — no client secrets

6.5 Licensing and Cost Considerations

[Describe any licensing or cost implications for the platform infrastructure. Use-case-specific cost analysis belongs in the OAD. The items below are standard considerations — adjust for client-specific pricing and agreements.]

Component	Cost Consideration
App Service Environment v3 (ASEv3)	ASEv3 is a premium Azure service with dedicated Isolated v2 tier compute resources (I1v2/I2v2/I3v2 SKUs), which incurs higher costs compared to shared App Service Plans. Cost is incurred regardless of workload utilisation. Internal Load Balancing mode ensures no additional public IP costs.
Azure OpenAI Services	Costs are based on usage, including the number of API calls, token consumption (TPM quotas), and model deployment types (Standard vs GlobalStandard). Costs scale with the number of use cases and processing volume.
Azure AI Search	Costs based on SKU tier (Standard), number of search units (replicas × partitions), and semantic ranker usage.
Azure Cosmos DB	Serverless billing based on consumed Request Units (RU) and storage. Cost scales with concurrent user sessions and data volume.
Azure Container Registry	Standard SKU with per-image storage and bandwidth charges.

6.6 Backup and Recovery

The [Platform Name] does not require extensive backup and recovery policies due to the transient nature of the processing workloads. Key considerations:

Aspect	Approach
Transient Processing	The platform is primarily used to generate point-in-time outputs to assist users. Data processed is transient in nature.
Source Data Regeneration	If any data persisted by the platform is lost (e.g. AI-generated reports), outputs can be regenerated by reprocessing the source data.
Automated Provisioning	In the event of Azure infrastructure resources requiring a full recovery, this can be achieved by re-deploying the platform using the automated provisioning pipelines (Bicep IaC via `azd provision`).
Data Archival and Retention	> [Data archival and retention policies to be defined during delivery phase.]

6.7 Capacity Planning

Capacity planning for the [Platform Name] is driven by the number of registered use cases, concurrent users, and data processing volume. As this is a newly established (greenfield) platform, capacity planning focuses on projected needs and scaling triggers rather than historical utilisation baselines.

Resource	Current Utilisation	Scaling Threshold	Growth Projection	Notes
ASE Compute	N/A — Greenfield	Concurrent request load exceeds single instance capacity	Scale based on number of registered use cases	Scale App Service Plan instances within the ASE. CPU-based autoscaling available: scale out when CPU > 70% (cooldown 10 min), scale in when CPU < 25% (cooldown 1 min). Isolated v2 tier SKUs (I1v2/I2v2/I3v2) provide different compute capacities.
Azure OpenAI (Primary)	N/A — Greenfield	Token-per-minute (TPM) quota exhaustion (GPT-4o: 30K TPM, Embeddings: 300K TPM)	Scale with use case processing volume	Adjust TPM quotas and model deployment regions.
Azure AI Search	N/A — Greenfield	Index size or query volume exceeds single unit (1 partition, 1 replica)	Scale with indexed data volume	Scale search units (replicas and partitions).
Cosmos DB	N/A — Greenfield	Serverless RU consumption patterns indicate provisioned throughput would be more cost-effective	Scale with concurrent user sessions	Evaluate Serverless vs Provisioned throughput mode.
Storage Accounts	N/A — Greenfield	Automatic scaling	Monitor for tier optimisation	Monitor for hot/cool/archive tier optimisation opportunities. Standard_GRS provides geo-redundancy.
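
The CPU-based autoscaling thresholds in the ASE Compute row can be sketched as a simple decision function. The 70% and 25% thresholds come from the table. The instance limits, the single-instance step, and the function name are assumptions for illustration, and the cooldown periods are deliberately not modelled.

```python
SCALE_OUT_CPU = 70  # add an instance above this CPU percentage
SCALE_IN_CPU = 25   # remove an instance below this CPU percentage


def autoscale_decision(cpu_percent: float, current_instances: int,
                       min_instances: int = 1, max_instances: int = 3) -> int:
    """Return the desired instance count for the observed average CPU."""
    if cpu_percent > SCALE_OUT_CPU and current_instances < max_instances:
        return current_instances + 1
    if cpu_percent < SCALE_IN_CPU and current_instances > min_instances:
        return current_instances - 1
    # Between the thresholds, or already at a limit: hold steady.
    return current_instances
```

The wide gap between the two thresholds acts as a dead band, which reduces the chance of the plan oscillating between instance counts under a steady load.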

6.8 Failover and High Availability

[Describe the high availability strategy for the platform. Document the redundancy topology, failover mechanisms, and any active/passive or active/active patterns. Consider: compute redundancy (ASE instance count, autoscaling), data redundancy (Cosmos DB multi-region, Storage GRS), and network redundancy (hub failover paths).]

Component	HA Strategy	Failover Mechanism	Notes
App Service Environment (ASEv3)	> [e.g. Single-region with autoscaling]	> [e.g. CPU-based autoscaling within ASE]	> [Notes]
Azure Cosmos DB	> [e.g. Serverless with session consistency]	> [e.g. Automatic failover within region]	> [Notes]
Azure Storage Account	> [e.g. GRS — geo-redundant storage]	> [e.g. Automatic failover to paired region]	> [Notes]
Azure OpenAI Service	> [e.g. Single-region Standard deployment]	> [e.g. Manual failover to secondary region]	> [Notes]
Azure AI Search	> [e.g. Single replica, single partition]	> [e.g. Scale replicas for HA]	> [Notes]
PaloAlto NVA	> [e.g. Active/Passive pair in hub VNet]	> [e.g. Automatic failover to standby NVA]	> [Notes]

6.9 Disaster Recovery

[Describe the disaster recovery strategy for the platform. Specify RPO/RTO targets, DR site architecture, failover procedures, and testing cadence. Consider: paired Azure region strategy, data replication, infrastructure rebuild capability (IaC), and communication/notification procedures.]

Aspect	Detail
RPO (Recovery Point Objective)	> [Target RPO — e.g. 24 hours. Maximum acceptable data loss.]
RTO (Recovery Time Objective)	> [Target RTO — e.g. 4 hours. Maximum acceptable downtime.]
DR Region	> [e.g. Australia Southeast (paired region for Australia East)]
Data Replication	> [e.g. Storage GRS provides automatic geo-replication. Cosmos DB single-region with backup policy.]
Infrastructure Rebuild	> [e.g. Full platform can be redeployed to DR region using Bicep IaC templates via `azd provision`.]
Application Recovery	> [e.g. Docker images in ACR can be replicated to DR region. CI/CD pipelines can target DR environment.]
Failover Procedure	> [e.g. 1. Assess outage scope. 2. Trigger IaC deployment to DR region. 3. Update DNS/routing. 4. Validate services. 5. Notify stakeholders.]
Failback Procedure	> [e.g. 1. Confirm primary region recovery. 2. Sync any data changes. 3. Redirect traffic to primary. 4. Decommission DR resources.]
DR Testing Cadence	> [e.g. Annual DR test with documented results and lessons learned.]

7 Information View

7.1 System of Record

The [Platform Name] is not a system of record. It processes copies of source data provided by upstream systems and generates AI-derived outputs for consumption by platform users. Source data remains governed by its respective system of record.

Data Object	System of Record	Copy	Impact Description
Source data files	Upstream systems ([Source System Names])	[Platform Name] (Storage Account)	Consumer — processes copies for AI analysis
AI-generated outputs	[Platform Name]	[Platform Name] (Storage Account, Cosmos DB)	Producer — generates and stores AI-derived reports and insights
Chat conversation logs	[Platform Name] (Cosmos DB)	N/A	Producer — stores user interaction history
Platform configurations	[Platform Name] (Storage Account)	N/A	Producer — stores AI processing configurations as JSON in `config` blob container
Workspace configurations	[Platform Name] (Cosmos DB)	N/A	Producer — stores workspace and tenant configurations in `workspaces` and `configurations` containers

7.2 Data Governance

[Describe how data is governed on the platform. Replace [Client] references with actual client name and policies.]

The [Platform Name] adheres to the [Client] Data Governance and Classification Standard, which specifies [Client]’s requirements for the accurate classification of data and the level of protection applied to data and its use.

Data shared with the Generative AI services (Azure OpenAI) is used only for transient processing — it is not persisted within the AI model or used for training the AI model.

7.3 Data Migration

No data migration is required for the platform. The platform ingests data from upstream systems on-demand and does not replace any existing data stores.

7.4 Privacy and Data Protection

The platform processes data that may include Personally Identifiable Information (PII). The following protections are in place:

Aspect	Description
Privacy Impact Assessment	> [Reference to PIA document or “To be conducted during delivery phase.”]
Personal Data Types	> [Types of personal data the platform may process, e.g. customer records, interaction history, correspondence, case notes.]
Compliance Obligations	> [Applicable privacy legislation, e.g. Australian Privacy Act, GDPR. [Client] Data Classification Standard.]
Data Subject Rights	Handled via existing [Client] data governance processes.
Network Isolation	All platform resources are accessible only from [Client] private networks via Private Endpoints. No public internet exposure.
Data Residency	All data is processed and stored within the [Client] Azure tenant ([Azure Region] region).
Transient Processing	Azure OpenAI processes data transiently — no data is persisted in AI models or used for model training.
Access Control	Role-based access controls ensure only authorised users can access processed data and AI outputs.
Encryption	Data is encrypted at rest and in transit across all platform components (TLS 1.2 minimum).
Retention	> [Data retention policies to be defined during delivery. Short retention periods recommended for transient AI workloads.]

8 Security View

8.1 Information Classification

[Classify information types handled by the platform using [Client]’s information classification standard. The table below is an example — replace with actual data types and classifications.]

Information Type	Classification Level	Non-Compliance and Exceptions
Source data files	[Classification Level]	> [Compliance status and security measures applied.]
AI-generated outputs	[Classification Level] (derived)	> [Compliance status and security measures applied.]
Platform configuration data	Internal	Standard protection measures applied.

8.2 Authentication and Authorisation

The platform defines the following user roles. Specific use case role mappings are documented in the respective SDDs.

Conceptual Role	Description	App Interfaces	Authentication Type	Access
Standard Users	Users who interact with the Chat Web App to query AI-generated insights and results. All data they interact with is data they are authorised to access.	Chat Web App	Azure Active Directory + MFA	Source data files, AI-generated outputs
Admin Users	Users who interact with the Admin Web App to configure AI processing parameters. They have elevated access for platform configuration management.	Chat Web App, Admin Web App	Azure Active Directory + MFA	Source data files, AI-generated outputs, Platform configurations
Technical Support	Users who interact with all components of the platform to provide support to Standard Users and Admin Users.	Chat Web App, Admin Web App, Azure Portal	Azure Active Directory + MFA	Source data files, AI-generated outputs, Platform configurations, Azure Resources
System Integrations	Automated processes (e.g. RPA components) that interact with the platform to upload source data files and trigger AI processing workflows.	Azure Function App (API)	Service Principal / Managed Identity	Source data files, AI-generated outputs

Azure Web Apps (Chat Web App, Admin Web App) have Azure AD (Entra ID) authentication enabled via the built-in App Service Authentication feature.

[Document Azure Role Assignments (RBAC) per environment as applicable. The RBAC matrix below is an example; adjust for [Client]’s IAM model.]

Azure Role Assignments

[The RBAC roles below are assigned at the Resource Group level per environment. Adjust based on [Client]’s access management framework.]

Role	Standard User			Admin User			Technical Support
	DEV	PPD	PRD	DEV	PPD	PRD	DEV	PPD	PRD
Contributor	NO	NO	NO	NO	NO	NO	[As per CR]	NO	NO
Reader	NO	NO	NO	NO	YES	YES	[Inherited]	YES	YES
Storage Blob Data Contributor	NO	NO	NO	NO	YES	YES	[Inherited]	YES	YES
Cognitive Services OpenAI Contributor	NO	YES	YES	NO	YES	YES	[Inherited]	YES	YES
Search Service Contributor	NO	YES	YES	NO	YES	YES	[Inherited]	YES	YES
Search Index Data Contributor	NO	YES	YES	NO	YES	YES	[Inherited]	YES	YES
Storage Blob Data Reader	NO	YES	YES	NO	[Inherited]	[Inherited]	[Inherited]	[Inherited]	[Inherited]
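
The matrix above can also be kept in a machine-checkable form so that access reviews compare against one source. The following is a minimal sketch only: it transcribes three rows of the example matrix verbatim, and the names `ROLE_MATRIX`, `assignment`, and `is_standing_assignment` are hypothetical, not part of the platform.

```python
# Excerpt of the example Azure Role Assignments matrix; cell values are copied verbatim.
ENVIRONMENTS = ("DEV", "PPD", "PRD")

ROLE_MATRIX = {
    "Contributor": {
        "Standard User": ("NO", "NO", "NO"),
        "Admin User": ("NO", "NO", "NO"),
        "Technical Support": ("[As per CR]", "NO", "NO"),
    },
    "Reader": {
        "Standard User": ("NO", "NO", "NO"),
        "Admin User": ("NO", "YES", "YES"),
        "Technical Support": ("[Inherited]", "YES", "YES"),
    },
    "Search Index Data Contributor": {
        "Standard User": ("NO", "YES", "YES"),
        "Admin User": ("NO", "YES", "YES"),
        "Technical Support": ("[Inherited]", "YES", "YES"),
    },
}


def assignment(role: str, persona: str, environment: str) -> str:
    """Return the raw matrix cell for a role, persona, and environment."""
    return ROLE_MATRIX[role][persona][ENVIRONMENTS.index(environment)]


def is_standing_assignment(role: str, persona: str, environment: str) -> bool:
    """True only for an unconditional YES; placeholder cells need manual review."""
    return assignment(role, persona, environment) == "YES"


print(assignment("Contributor", "Technical Support", "DEV"))
print(is_standing_assignment("Reader", "Admin User", "PRD"))
```

Cells such as `[As per CR]` and `[Inherited]` are deliberately not treated as standing access, because they depend on a Change Request or on a broader role.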

8.3 Security Controls

Security Item	Controls
Azure Role-Based Access Controls (RBAC)	Each user type requires a specific set of access controls as defined in the Azure Role Assignments table above. RBAC is enforced at the Resource Group level.
Change / Privileged Management	Changes to platform components must go through standard [Client] DevOps change management. Technical Support requests access to platform components when raising a Change Request (CR), and that access is revoked after the CR window closes. Privileged management for cloud resources follows [Client] cloud standards.
Network Controls	Azure assets (Storage, Key Vault, AI Services, Cosmos DB, AI Search, Container Registry) only accept traffic from specified VNets via Private Endpoints. All PaaS services enforce network ACLs with default deny and authorised VNet subnet + IP range allow lists. Storage Accounts grant private link access to Document Intelligence, Speech, and Computer Vision for BYOS connectivity. Key Vault uses RBAC authorisation mode with AzureServices bypass. App Services have `public_network_access_enabled = false` and `vnet_image_pull_enabled = true`. Managed Identities are used instead of key-based authentication for all communication between Azure PaaS components. Data is encrypted via TLS (1.2 minimum) between all components.
DevOps	GitHub encrypts data at rest and in transit. 2FA is supported. Azure Key Vault is used for secrets management. GitHub performs code scanning for vulnerability monitoring. CI/CD pipelines use federated credentials (OIDC) — no stored secrets.
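
On the client side, the TLS 1.2 minimum listed under Network Controls might be enforced as shown here. This is a minimal sketch using only Python's standard `ssl` module; the function name `build_tls_context` is hypothetical, and actual platform components may configure TLS at the Azure service level instead.

```python
import ssl


def build_tls_context() -> ssl.SSLContext:
    """Return a client context that rejects any protocol below TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = build_tls_context()
print(context.minimum_version.name)
```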

8.4 Auditing and Logging

Auditing and logging are addressed through adherence to the Logging guardrail (see 3.5 Guardrails and Compliance). Key aspects:

All applicable Azure resources have diagnostic settings enabled, forwarding logs and metrics to the centralised Log Analytics workspace (30-day retention).
Resources with logging enabled: Key Vault, Storage Account Blobs, App Service, Azure Function App, Cosmos DB, Container Registry, AI Services, AI Search.
Application-level logging is captured via Application Insights with OpenTelemetry instrumentation.
Dashboards and Workbooks provide consolidated views of platform health, performance, and security events.
Alert Rules are configured to trigger notifications when specified conditions are met (e.g. service health degradation, error rate thresholds).
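
One way to verify the diagnostic-settings coverage listed above is to compare it against an inventory of deployed resources. The sketch below assumes a hypothetical inventory that maps each resource type to whether a diagnostic setting exists; how that inventory is collected is out of scope here.

```python
# Resource types the document lists as having logging enabled.
LOGGED_RESOURCE_TYPES = {
    "Key Vault",
    "Storage Account Blobs",
    "App Service",
    "Azure Function App",
    "Cosmos DB",
    "Container Registry",
    "AI Services",
    "AI Search",
}


def missing_diagnostics(inventory: dict[str, bool]) -> list[str]:
    """Return the listed resource types that lack a diagnostic setting.

    `inventory` is a hypothetical mapping of resource type to a flag saying
    whether a diagnostic setting was found; absent types count as missing.
    """
    return sorted(
        resource_type
        for resource_type in LOGGED_RESOURCE_TYPES
        if not inventory.get(resource_type, False)
    )


example_inventory = {name: True for name in LOGGED_RESOURCE_TYPES}
example_inventory["Cosmos DB"] = False
print(missing_diagnostics(example_inventory))
```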

9 Support View

9.1 Service Classification

[Classify the service level for platform components (e.g. Platinum, Gold, Silver, Bronze). To be determined during delivery phase.]

9.2 Support Model

[Describe the support model: who supports what, escalation paths, etc. The table below shows a typical pattern; adjust for [Client]’s support structure.]

Team	Responsibility
Business Operations — Admin Users	Escalate requests to other support teams when required for either bugs or enhancements.
[Client] IT — Technical Support	Act as the first point of escalation for enhancements and bugs relating to the platform.
[Client] CloudOps — Technical Support	Act as the second point of escalation for when IaC or IAM changes are required to be deployed.
calab.ai — Technical Support	Act as the second point of escalation for when platform-specific bugs or enhancements require additional assistance in collaboration with the [Client] IT team.

9.3 Non-Functional Requirements

9.3.1 Scalability

The platform is designed to support multiple concurrent use cases with independent data processing pipelines.
Azure AI services can be scaled horizontally by adjusting TPM quotas (OpenAI), search replicas/partitions (AI Search), and compute instances (ASE).
ASE provides dedicated compute that can be scaled within the environment to handle increased workload. CPU-based autoscaling is available on App Service Plans within the ASE (scale out above 70% CPU, scale in below 25% CPU). Isolated v2 tier SKUs (I1v2/I2v2/I3v2) provide different compute capacities.
Cosmos DB Serverless mode automatically scales with demand; can be migrated to provisioned throughput if usage patterns warrant.
Storage Accounts scale automatically with Standard_GRS providing geo-redundancy.
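
The CPU thresholds quoted above for App Service Plan autoscaling amount to a simple decision rule. The following sketch illustrates that rule only; the instance limits and the one-instance step are assumptions, not values from this document, and real autoscale rules also use time windows and cooldown periods.

```python
SCALE_OUT_CPU_PERCENT = 70.0  # add an instance above this average CPU
SCALE_IN_CPU_PERCENT = 25.0   # remove an instance below this average CPU


def autoscale_decision(
    avg_cpu_percent: float,
    instances: int,
    min_instances: int = 1,  # assumed lower bound
    max_instances: int = 3,  # assumed upper bound
) -> int:
    """Return the target instance count for the observed average CPU."""
    if avg_cpu_percent > SCALE_OUT_CPU_PERCENT and instances < max_instances:
        return instances + 1
    if avg_cpu_percent < SCALE_IN_CPU_PERCENT and instances > min_instances:
        return instances - 1
    return instances


print(autoscale_decision(avg_cpu_percent=85.0, instances=2))
```

The gap between the two thresholds keeps the instance count stable under moderate load and avoids rapid oscillation.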

9.3.2 Maintainability

All infrastructure is provisioned via Bicep templates (IaC), enabling version-controlled and repeatable deployments via Azure Developer CLI.
Application code and infrastructure definitions are stored in GitHub with CI/CD pipelines (GitHub Actions) for automated deployment.
Tag-driven release process ensures traceability: every deployment maps to a semantic version tag.
Environment parity (DEV/PPD/PRD) ensures that changes can be tested in non-production before promotion.
Platform configurations are stored as JSON in Azure Blob Storage (config container), enabling versioning and rollback.
Python dependencies managed via Poetry for reproducible builds. Frontend dependencies managed via npm.
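
The tag-driven release process can be illustrated with a small tag classifier. This is a sketch under assumptions: the tag shapes follow the glossary examples (`v1.0.0-rc.1` for a Release Candidate deployed to staging, `v1.0.0` for General Availability deployed to production), while the function name and the exact pattern are hypothetical.

```python
import re

# Matches tags shaped like the glossary examples: v1.0.0 or v1.0.0-rc.1
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-rc\.(\d+))?$")


def deployment_target(tag: str) -> str:
    """Map a release tag to the stage the glossary associates with it."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a recognised release tag: {tag!r}")
    # Group 4 is only present for release candidates.
    return "staging" if match.group(4) is not None else "production"


print(deployment_target("v1.0.0-rc.1"))
print(deployment_target("v1.0.0"))
```

Rejecting unrecognised tags outright keeps every deployment traceable to a well-formed version.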

9.3.3 Security

All resources are deployed behind Private Endpoints with no public internet exposure. All PaaS services enforce network ACLs with default deny and authorised subnet/IP allow lists.
App Services have `public_network_access_enabled = false` and `vnet_image_pull_enabled = true` to ensure container images are pulled via the VNet.
User authentication requires Azure AD (Entra ID) with MFA enabled.
Managed Identities (SystemAssigned) are used for all service-to-service communication, eliminating stored credentials.
Secrets are managed in Azure Key Vault in RBAC authorisation mode, with access restricted to authorised identities.
Network traffic is inspected by PaloAlto NVA for all ingress, egress, and inter-VNet flows.
DevOps security is enforced via GitHub code scanning and vulnerability monitoring.
CI/CD pipelines use OIDC federated credentials — no client secrets stored in pipelines.
TLS 1.2 minimum enforced across all components. HTTPS-only access to all web applications.
Content Safety service provides content moderation for AI-generated outputs.

9.3.4 Reusability

Platform capabilities (document processing, transcription, indexing, search, LLM reasoning, content safety) are designed as shared services that can be consumed by any registered use case.
New use cases are onboarded by registering in the 4 Use Case Register and creating an SDD — without requiring new infrastructure provisioning.
Configuration-driven architecture allows behaviour to be customised per use case via JSON configuration files managed through the Admin App.
Multiple orchestration strategies (OpenAI Agents, OpenAI Function Calling, LangChain, Prompt Flow) can be selected per use case without platform changes.

Name	Layer	Description	Related Feature / Epic
Document Processing Pipeline	AI Pre-Trainer	Ingestion, chunking, embedding, and indexing pipeline configurable for multiple document types	Shared across all use cases
Conversational Interface	AI Engine	Chat Web App with streaming, citations, voice, and agent personas	Shared across all use cases
Admin Configuration	AI Pre-Trainer	Workspace and prompt configuration management via Admin Web App	Shared across all use cases
Monitoring Stack	Operational Services	App Insights, Log Analytics, Dashboards, Workbooks, Alert Rules	Shared across all use cases
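
The configuration-driven selection of an orchestration strategy per use case might look like the sketch below. The JSON keys (`use_case`, `orchestration_strategy`) and the strategy identifiers are hypothetical stand-ins for the four strategies named above; the real configuration schema is not defined in this document.

```python
import json

# Hypothetical identifiers for the four orchestration strategies named above.
SUPPORTED_STRATEGIES = {
    "openai_agents",
    "openai_function_calling",
    "langchain",
    "prompt_flow",
}


def select_strategy(config_json: str) -> str:
    """Read a use case configuration and return its orchestration strategy."""
    config = json.loads(config_json)
    strategy = config.get("orchestration_strategy")
    if strategy not in SUPPORTED_STRATEGIES:
        raise ValueError(f"unsupported orchestration strategy: {strategy!r}")
    return strategy


example_config = json.dumps(
    {"use_case": "EXAMPLE-UC", "orchestration_strategy": "langchain"}
)
print(select_strategy(example_config))
```

Because the strategy is read from configuration, switching a use case to another strategy is a configuration change and not a code or infrastructure change.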

9.3.5 Recoverability

Infrastructure can be fully rebuilt from Bicep definitions and GitHub repositories using `azd provision`.
Application containers can be rebuilt from Dockerfiles and redeployed from ACR.
AI-generated outputs can be regenerated by reprocessing source data through the platform pipelines.
No critical state is stored exclusively within the platform that cannot be reconstructed from source systems or IaC definitions.
Azure services are backed by service-level agreements (SLAs) and built-in platform redundancy for individual components.
Cosmos DB Serverless with Session consistency provides built-in resilience for conversation state.

9.4 Monitoring and Observability

Category	Tool / Service	Description
Application Monitoring	Application Insights	Captures live application logs, performance metrics, and usage patterns. OpenTelemetry instrumentation for distributed tracing.
Infrastructure Monitoring	Log Analytics Workspace	Collects telemetry from all Azure resources for health and diagnostic analysis. SKU: PerGB2018, 30-day retention.
Alerting	Alert Rules	Configured for service health, error rates, and performance thresholds with notification actions.
Dashboards	Azure Monitor Dashboard	Centralised, real-time views of platform health and performance metrics. Includes charts for sessions, users, failures, response time, CPU, and memory.
Interactive Reporting	Azure Monitor Workbook	Interactive reporting combining text, queries, and visualisations for deep-dive analysis.
Diagnostic Settings	Azure Monitor	Enabled on all applicable resources, forwarding logs and metrics to Log Analytics. Resources covered: Key Vault, Storage Blobs, App Service, Function App, Cosmos DB, Container Registry, AI Services, AI Search.
Health Checks	App Service Health Check	Health check endpoint (`/api/health`) configured on Chat Web App with automatic instance replacement on failure.
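
The `/api/health` endpoint used by App Service Health Check could derive its response along the following lines. This is a sketch, not the Chat Web App's implementation: the dependency names are hypothetical, and the only behaviour relied on is that App Service treats a status code in the 200 to 299 range as healthy.

```python
import json
from http import HTTPStatus


def health_response(dependency_status: dict[str, bool]) -> tuple[int, str]:
    """Build the status code and JSON body for a health check request.

    `dependency_status` maps a dependency name to whether it is reachable.
    """
    healthy = all(dependency_status.values())
    status = HTTPStatus.OK if healthy else HTTPStatus.SERVICE_UNAVAILABLE
    body = json.dumps(
        {
            "status": "healthy" if healthy else "unhealthy",
            "checks": dependency_status,
        }
    )
    return int(status), body


# Hypothetical dependency names, for illustration only.
code, body = health_response({"cosmos_db": True, "ai_search": True})
print(code, body)
```

Returning a non-2xx status when a dependency is down lets the platform detect the unhealthy instance and replace it.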

10 Appendix

10.1 Glossary

Acronym / Term	Full Name / Description
PAD	Platform Architecture Document
OAD	Opportunity Assessment Document
SDD	Solution Design Document
ASE	App Service Environment — a dedicated, isolated Azure hosting environment for App Services and Function Apps. Version 3 (ASEv3) is used, with Internal Load Balancing (ILB) mode for fully private traffic routing.
NVA	Network Virtual Appliance — a virtual machine (PaloAlto) that provides network security functions such as firewalling and traffic inspection.
VNet	Virtual Network — Azure’s network isolation construct for hosting cloud resources.
PaaS	Platform as a Service — Azure managed services (e.g. Cosmos DB, AI Search, Storage Accounts).
IaC	Infrastructure as Code — the practice of managing infrastructure through version-controlled code (Bicep).
RBAC	Role-Based Access Control — Azure’s authorisation model for granting granular access to resources.
MFA	Multi-Factor Authentication — requiring multiple verification factors for user authentication.
RAG	Retrieval Augmented Generation — a pattern that combines information retrieval with generative AI for grounded responses.
LLM	Large Language Model — AI models (e.g. GPT-4o) capable of natural language understanding and generation.
PII	Personally Identifiable Information — data that can be used to identify an individual.
ZPA	Zscaler Private Access — zero-trust network access service for secure remote connectivity.
WVD	Windows Virtual Desktop — Azure-hosted virtual desktop infrastructure.
TPM	Tokens Per Minute — Azure OpenAI rate limiting metric for API consumption.
GRS	Geo-Redundant Storage — Azure storage redundancy option replicating data across paired regions.
ACR	Azure Container Registry — managed Docker container registry service in Azure.
OIDC	OpenID Connect — authentication protocol used for federated identity in CI/CD pipelines.
SDWAN	Software-Defined Wide Area Network — technology for site-to-site connectivity between corporate offices and Azure.
ASP	App Service Plan, the compute tier (SKU and instance count) on which web apps run. On this platform, plans are hosted within the ASE on Isolated v2 SKUs.
ILB	Internal Load Balancer — a load balancer mode where all traffic is routed internally within a VNet with no public-facing endpoints.
BYOS	Bring Your Own Storage — a pattern where Azure AI services (Speech, Document Intelligence, Computer Vision) access a customer-managed Storage Account directly via private link.
PE	Private Endpoint — a network interface that connects privately and securely to an Azure PaaS service via Azure Private Link.
CR	Change Request — formal request to make changes to production systems or access.
RC	Release Candidate — a pre-release version tag (e.g. v1.0.0-rc.1) deployed to staging for validation.
GA	General Availability — a stable release version tag (e.g. v1.0.0) deployed to production.

10.2 Archived Designs

[If any platform designs have been superseded, archive them here for reference.]
