AI Platform Architecture Document

How to Use This Template

This template defines the structure for a Platform Architecture Document (PAD). The PAD is the foundational document in the template system — it describes the shared infrastructure, integrations, security, and operational capabilities of a platform upon which one or more use cases are built. This template has been pre-populated with defaults derived from the standard NeuralOps Platform and common client deployment patterns. Pre-populated content reflects what is typically true for every deployment. You should:

  • Review all pre-populated content and adjust for client-specific differences.
  • Replace all [bracketed placeholders] with project-specific content.
  • Remove or replace guidance text (blockquote format, lines starting with >) once the section is populated.
  • [Diagram: ...] placeholders indicate where diagrams should be inserted. Replace with the actual diagram and a brief caption. Recommended tools: draw.io, Mermaid, Lucidchart, or Visio.
  • Cross-references use Obsidian link syntax: in-document links use [[#heading-text|Heading Text]] format, and cross-document links use [[Document Name#section|Section]] format (e.g. [[PROJECT CODE – USE CASE CODE – SDD#5-architectural-impact-assessment|5 Architectural Impact Assessment]] or [[#62-network-architecture|6.2 Network Architecture]]).
  • This document should be created once per platform. Use cases built on this platform are documented in separate OAD and SDD documents and registered in the 4 Use Case Register.
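The cross-reference convention described above can be sketched as a small helper. This is a minimal illustration that assumes the anchor is derived from the heading by lowercasing it, dropping punctuation, and replacing spaces with hyphens, which is the pattern visible in the two examples in this template. The function names `heading_anchor` and `obsidian_link` are hypothetical, and whether a given Obsidian setup resolves anchors in this form should be verified.

```python
import re
from typing import Optional


def heading_anchor(heading: str) -> str:
    """Build the anchor form used in this template's examples.

    Assumed rule: lowercase, drop punctuation (e.g. the dot in "6.2"),
    and replace runs of whitespace with a single hyphen.
    """
    slug = heading.strip().lower()
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)  # remove dots and other punctuation
    return re.sub(r"\s+", "-", slug)


def obsidian_link(heading: str, document: Optional[str] = None) -> str:
    """Return an in-document link, or a cross-document link if `document` is given."""
    target = f"{document or ''}#{heading_anchor(heading)}"
    return f"[[{target}|{heading}]]"


# In-document link
print(obsidian_link("6.2 Network Architecture"))
# Cross-document link
print(obsidian_link("5 Architectural Impact Assessment",
                    "PROJECT CODE – USE CASE CODE – SDD"))
```

Both calls reproduce the example links given in the cross-reference bullet.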

Document Metadata

| Field | Detail |
| --- | --- |
| Initiative code | [PROJECT CODE] |
| Platform title | [Platform Name] |
| Document type | PAD – Platform Architecture Document |
| Status | [Draft / Under Review / Endorsed / Approved] |
| Author(s) | [Author Name(s)] |
| Approved by | [Approving body or individual] |

Document Version Control

The following table records the version history of this document:

| Date | Version | Change Description | Author |
| --- | --- | --- | --- |
| DD/MM/YYYY | 0.1 | Initial draft created | [Author Name] |

Contributors

This document was authored with input from the following key individuals:

| Name | Role | Area |
| --- | --- | --- |
| [Name] | Solution Lead | calab.ai (Vendor) |
| [Name(s)] | Solution Team | calab.ai (Vendor) |
| [Name(s)] | IT Rep - Security | Information Technology ([Client]) |
| [Name(s)] | IT Rep - Architecture | Information Technology ([Client]) |
| [Name(s)] | IT Rep - Infrastructure & Networks | Information Technology ([Client]) |
| [Name(s)] | Business Sponsor / Process Owner | [Business Unit] ([Client]) |

Intended Audience

[List the target reader roles for this document and indicate which sections are most relevant to each. This helps readers quickly navigate to the content most applicable to their responsibilities.]

| Role | Description | Key Sections |
| --- | --- | --- |
| Architecture / Engineering | Solution architects, cloud engineers, and technical leads responsible for platform design and implementation | 3 Platform Overview, 5 Integration View, 6 Infrastructure View |
| Security / Risk / Compliance | Information security officers, risk analysts, and compliance managers assessing platform security posture | 8 Security View, 3.5 Guardrails and Compliance, 7 Information View |
| Infrastructure & Networks | Network engineers and infrastructure teams responsible for connectivity, firewall rules, and environment provisioning | 6.2 Network Architecture, 6.1 Deployment Architecture, 6.4 Infrastructure Requirements |
| Business Sponsors / Process Owners | Business stakeholders sponsoring the platform initiative and overseeing use case onboarding | 2 Business View, 4 Use Case Register, 6.5 Licensing and Cost Considerations |
| BAU Support / Operations | Operational support teams responsible for ongoing platform monitoring, incident response, and maintenance | 9 Support View, 6.6 Backup and Recovery, 6.8 Failover and High Availability |

Document Approval Requirements

The following table describes the approval gates required for this document:

| Approval Gate | Status | Date |
| --- | --- | --- |
| [Gate Name, e.g. Security Endorsement] | [Pending / Complete] | [Date] |
| [Gate Name, e.g. Architecture Peer Review] | [Pending / Complete] | [Date] |
| [Gate Name, e.g. Architecture Board Endorsement] | [Pending / Complete] | [Date] |

1 Introduction

This document is the Platform Architecture Document (PAD) for the [Platform Name].

The purpose of this document is to:

  • Describe the foundational infrastructure and architectural patterns of the platform
  • Document the integration patterns and standard interfaces available to use cases
  • Capture the networking, security, and information governance requirements
  • Serve as the central reference point for IT stakeholders concerned with cyber security, risk and compliance, information classifications, integrations, and architectural patterns
  • Act as the parent architecture document for all Solution Design Documents that leverage this platform

The [Platform Name] is an Azure-hosted Generative AI platform built on the NeuralOps Platform — an enterprise Generative AI platform deployable from the Azure Marketplace that provides enterprise-grade agentic workflow and conversational agent capabilities. The platform consolidates AI services, data processing pipelines, and operational tooling into a single managed environment that can be extended to support multiple business use cases over time, with customisation support provided by calab.ai.

Relationship to other documents: This PAD is the foundational reference for all Solution Design Documents (SDD) and Opportunity Assessments (OAD) that target this platform. Each SDD includes an Architectural Impact Assessment (PROJECT CODE – USE CASE CODE – SDD) that references sections of this document.


2 Business View

2.1 Background

[Provide client-specific background on why this platform is being established. The standard framing below can be adapted.]

The [Platform Name] was established to provide a secure, governed, and reusable foundation for deploying Generative AI use cases within the [Client] Azure environment. The platform consolidates AI services, data processing pipelines, and operational tooling into a single managed environment that can be extended to support multiple business use cases over time.

[If applicable, describe the phased approach used to establish the platform. A typical pattern is:]

  • Phase 1 — Proof of Value (POV): Initial demonstration of GenAI capabilities deployed outside of the [Client] Azure environment using anonymised data.
  • Phase 2 — Detailed Design: Identification of platform capabilities, architecture patterns, and the approvals required to deploy within the [Client] Azure environment.
  • Phase 3 — Production Deployment: Deployment of the approved platform architecture and operationalisation of the first registered use case.

2.2 Objectives

The [Platform Name] aims to achieve the following objectives:

  1. Provide a secure, compliant Azure environment for hosting Generative AI workloads behind [Client]’s private network.
  2. Establish reusable AI services (document processing, language understanding, speech-to-text, embeddings, search) that can be shared across use cases.
  3. Enable governed data ingestion, enrichment, and retrieval workflows through standardised platform components.
  4. Deliver operational tooling for monitoring, logging, alerting, and cost management across all platform workloads.
  5. Support iterative onboarding of new AI use cases with minimal incremental infrastructure provisioning.

2.3 Scope

| Area | In Scope | Out of Scope |
| --- | --- | --- |
| Infrastructure | The following resources are deployed as part of this platform: Azure AI Search, Azure Storage Account, Azure Key Vault, Azure Function App (Backend), Azure App Service (Web Applications x2), App Service Environment, Azure AI Document Intelligence, Azure OpenAI Service (x2: primary reasoning and global voice/audio), Azure Speech Service, Azure Content Safety, Azure Container Registry, Event Grid System Topic, Application Insights, Azure Cosmos DB, Log Analytics Workspace, Azure Monitor Dashboard, Azure Monitor Workbook. | Infrastructure managed by external integration partners (e.g. RPA component infrastructure). |
| Access Control and Security | User access management to ensure only authorised users can access platform resources and AI outputs. Compliance with security and governance policies. | Access management for external systems connecting to the platform (managed by respective system owners). |
| Environment Management | Separate non-production and production instances for development, testing, and troubleshooting without interrupting business operations. | Environment management for external integration components. |
| Middleware and Identity | Setup of middleware (e.g. Azure Function Apps, Azure Web Apps) using managed identity for seamless operation. Monitoring and logging usage and tokens consumed by platform services. | External middleware and identity management. |
| Data Security | Secure storage and transmission of information processed by the platform. Compliance with [Client]’s data use policy and Generative AI policy. | Data security management for external integration systems. |
| Cost Management | Reporting on Azure consumption costs associated with the platform. | External system cost components (e.g. RPA licensing). |
| Admin / Configuration | Access to configure AI processing parameters via prompt configurations. Output formats in human-readable document formats. Various ingestion strategies for different document types. | Use case-specific configuration options (documented in respective SDDs). |

2.4 Dependencies and Constraints

| # | Dependency / Constraint |
| --- | --- |
| DP-01 | GenAI Platform Services: Dependency on Azure AI services (OpenAI, Document Intelligence, Speech, Content Safety, AI Search) for core platform processing capabilities. Platform availability and performance are dependent on the health of these services. |
| DP-02 | Azure Services: Dependency on Azure services for middleware, storage, identity management, and cost reporting. Integration with Azure Active Directory (Entra ID) for managed identities and access control. |
| DP-03 | Monitoring and Logging: Dependency on Azure Monitor, Log Analytics, and Application Insights for platform health monitoring, usage logging, and troubleshooting. |
| DP-04 | External Integration Components: Integration with external systems (e.g. RPA, enterprise data platforms) follows existing development lifecycle patterns and is governed by the respective system architecture documentation. Data or files passed from external components to the platform must be reviewed to ensure appropriateness as per the 8.2 Authentication and Authorisation and 8.1 Information Classification sections of this document. |
| CN-01 | User Training and Adoption: Ensuring users are adequately trained to use platform interfaces. Change management processes to facilitate smooth adoption of platform capabilities. |
| [DP/CN-##] | > [Add client-specific dependencies and constraints as needed, e.g. enterprise data platform connectivity, network migration timelines, etc.] |

2.5 Assumptions

| # | Assumption | Consequences if Invalid |
| --- | --- | --- |
| AS-01 | Data Quality and Format: Work instructions for generating AI outputs are accurate and appropriate for intended results. Data sources used for generating outputs are relevant and collectively exhaustive. | Low quality outputs from the platform. |
| AS-02 | Security and Compliance Measures: Existing security and compliance measures are adequate and will support the platform without requiring significant changes. Any additional security measures needed can be implemented within scope and timeline. | Registration of new risks or new technical debt. |
| AS-03 | User Adoption and Training: Users will receive adequate training and support to use platform interfaces effectively. There is a willingness among users to adopt new processes and tools provided by the platform. | Wide adoption of the platform within the organisation will be impacted. |
| [AS-##] | > [Add client-specific assumptions as needed.] | |

2.6 Risks

[Risks are assessed using [Client]’s risk management framework. Once the platform architecture is endorsed/approved, risk items are moved to the appropriate risk register for ongoing management.]

  • Consequence is one of the following: Level 1 (lowest), Level 2, Level 3, Level 4, Level 5 (highest)
  • Likelihood is one of the following: Rare, Unlikely, Possible, Likely, Almost certain
  • Rating is one of: Low, Medium, High, Very High, Extreme

| # | Risk Description | Consequence | Impact Rating | Mitigation Controls | Likelihood |
| --- | --- | --- | --- | --- | --- |
| RI-01 | Handling of PII data: Personally Identifiable Information (PII) processed by the platform should be accessed only by appropriate [Client] staff. | Regulatory Compliance: Failure to handle PII data correctly could result in compliance failures as per applicable privacy legislation. | [Rating] | All data processed by the platform will be stored only within the [Client] Azure tenant for internal use, ensuring no avenues of data leakage. Access to platform endpoints will be restricted to [Client] private networks to further lock down access. | [Likelihood] |
| [RI-##] | > [Add client-specific risks as needed.] | | | | |

2.7 Technical Debt

| # | Title | Description | Source | Owner (Platform or Team) | Date Raised |
| --- | --- | --- | --- | --- | --- |
| DEBT-01 | [Title] | [Description] | [Source] | [Owner] | [Date] |

2.8 References

[List key reference documents, architecture patterns, and standards that are relevant to this platform. Include links where available. The entries below are examples — replace with client-specific references.]

| # | Source | Relevance |
| --- | --- | --- |
| REF-01 | [Client] Cloud & Network Patterns | Home page for all Cloud & Network patterns used at [Client] |
| REF-02 | [Client] Cloud Network Integration Patterns | Parent page for all cloud network integration patterns |
| REF-03 | [Client] Solution Architecture Templates Guide | How-to guide for [Client] Solution Architecture Templates |
| REF-04 | [Client] PaloAlto Firewalls in Azure | Technical documentation reference for PaloAlto Firewalls within the [Client] Azure environment |
| REF-05 | [Client] Azure Naming Conventions | Technical documentation reference for Azure naming conventions |
| REF-06 | [Client] App Service & Function Apps + Private Endpoint & ASE Pattern | Pattern specific to Azure App Services & Function Apps with Private Endpoints and ASE |
| REF-07 | [Client] PaaS to PaaS Pattern | Pattern specific to PaaS-to-PaaS communications (e.g. Cosmos DB & Azure App Services) |
| [REF-##] | > [Add additional client-specific references.] | |

3 Platform Overview

3.1 Platform Description

The [Platform Name] is an Azure-hosted platform that provides the shared infrastructure, AI services, and operational tooling required to deploy and operate Generative AI use cases within [Client]’s private network. The platform is built on the NeuralOps Platform, an enterprise Generative AI platform deployable from the Azure Marketplace, with customisation support provided by calab.ai. It provides enterprise-grade agentic workflow and conversational agent capabilities. The platform is composed of four logical modules:

  1. AI Engine — Core AI reasoning and user interaction layer, providing semantic search, natural language processing, conversational interfaces (including voice via WebRTC), and AI-generated output storage. Supports multiple orchestration strategies including OpenAI Agents, LangChain, and Prompt Flow.
  2. AI Pre-Trainer — Data preparation and enrichment layer, responsible for ingesting raw data, extracting content from documents and audio, generating embeddings (text-embedding-3-large), and indexing processed content for downstream AI consumption. Supports multiple chunking strategies (fixed size, layout, page, paragraph, HTML header, table-specific).
  3. External Integration Layer — Interfaces with external systems (e.g. file repositories, enterprise data platforms, automation tools) to facilitate data exchange with the platform.
  4. Operational Services Layer — Cross-cutting Azure services that enable identity management, security, monitoring, alerting, and event-driven orchestration across all platform components.
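The fixed-size chunking strategy named under the AI Pre-Trainer can be illustrated with a minimal sketch. It assumes character-based windows with a configurable overlap; the platform's actual chunker may count tokens instead, and the function name `chunk_fixed_size` and its default sizes are hypothetical.

```python
from typing import List


def chunk_fixed_size(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of `chunk_size` characters.

    Consecutive windows share `overlap` characters so that content cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final window already reaches the end of the text
    return chunks


print(chunk_fixed_size("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap improves recall for content that straddles chunk boundaries, at the cost of more chunks to embed and index.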

The platform comprises four application components:

| Component | Technology | Purpose |
| --- | --- | --- |
| Chat Web App | React 18 / TypeScript / Vite (frontend) + Flask / Python 3.11 (backend) | User-facing chat interface with streaming responses, WebRTC voice, citations, and configurable agent personas |
| Admin Web App | Streamlit / Python 3.11 | Administration dashboard for data ingestion, configuration management, index management, and prompt configuration |
| Function App Backend | Azure Functions (Python v2 programming model) / Python 3.11 | Serverless batch processing: document ingestion, embedding generation, indexing, and AI output generation. Uses Azure Functions Python Blueprints |
| Teams Extension | TypeScript | Microsoft Teams bot integration for conversational AI access within Teams (optional) |

[Diagram: Platform overview showing the four logical modules — AI Engine, AI Pre-Trainer, External Integration Layer, and Operational Services Layer — with their constituent Azure resources and interconnections. Recommended: draw.io or Mermaid component diagram]

3.2 Platform Capabilities

The [Platform Name] provides the following capabilities to registered use cases:

| Capability | Description | Status |
| --- | --- | --- |
| Document Ingestion | Ingest and store raw documents (PDF, DOCX, DOC, TXT, HTML, XLSX, XLS, CSV, PPTX, MD, JSON, XML, RTF) and audio files (WAV) for processing. | Available |
| Document Intelligence | Extract structured content from documents using computer vision and OCR capabilities (Azure AI Document Intelligence). | Available |
| Audio Transcription | Transcribe audio files into diarised text with speaker identification and summarisation (Azure Speech Service). | Available |
| Content Chunking & Indexing | Break down processed content into optimised chunks using configurable strategies (fixed size overlap, layout, page, paragraph, HTML header, table-specific) and index for semantic retrieval. | Available |
| Semantic Search | Query indexed content using natural language with vector-based semantic search (Azure AI Search with semantic ranker). | Available |
| LLM Reasoning | Perform reasoning and analysis over text-based information using Azure OpenAI models (GPT-4o). Supports multiple orchestration strategies: OpenAI Agents, OpenAI Function Calling, LangChain, and Prompt Flow. | Available |
| Embedding Generation | Generate vector embeddings from processed content using text-embedding-3-large for semantic understanding. | Available |
| Conversational Interface | Web-based chat interface for users to query AI-generated insights and results. Supports streaming, citations, and conversation history. | Available |
| Voice Interface | WebRTC-based voice interaction using GPT-4o-mini-audio-preview and GPT-4o-mini-realtime-preview models. | Available |
| Configuration Management | Streamlit-based admin interface for managing AI processing configurations, prompt templates, workspace settings, and document upload. | Available |
| Output Generation | Generate AI-driven reports and outputs based on configurable agent chain-of-thought logic. | Available |
| Content Safety | Content moderation and safety filtering via Azure Content Safety service. | Available |
| Event-Driven Processing | Trigger automated processing workflows based on storage events (blob created/deleted) via Event Grid → Storage Queue → Function App. | Available |
| API Integration | Expose and consume APIs securely via API Management and private network routing. | Available |
| Monitoring & Alerting | Centralised monitoring, logging, alerting, and dashboarding across all platform components via Application Insights, Log Analytics, Dashboards, and Workbooks. | Available |
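To make the Event-Driven Processing capability more concrete, the sketch below shows how a queue consumer might interpret a storage event once the message body has been decoded to JSON. It assumes the Event Grid event schema for blob storage events (`eventType` and `subject` fields) and uses only the standard library. The names `BlobEvent` and `parse_blob_event`, and the action labels `index` and `remove`, are hypothetical and do not describe the platform's actual handler.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from storage event type to a processing action.
ACTIONS = {
    "Microsoft.Storage.BlobCreated": "index",
    "Microsoft.Storage.BlobDeleted": "remove",
}

SUBJECT_PREFIX = "/blobServices/default/containers/"


@dataclass(frozen=True)
class BlobEvent:
    action: str
    container: str
    blob_name: str


def parse_blob_event(message_body: str,
                     expected_container: str = "documents") -> Optional[BlobEvent]:
    """Return the action to take for a storage event, or None if it should be ignored."""
    event = json.loads(message_body)
    action = ACTIONS.get(event.get("eventType"))
    if action is None:
        return None  # not a BlobCreated / BlobDeleted event

    subject = event.get("subject", "")
    if not subject.startswith(SUBJECT_PREFIX) or "/blobs/" not in subject:
        return None  # subject does not describe a blob

    container, blob_name = subject[len(SUBJECT_PREFIX):].split("/blobs/", 1)
    if container != expected_container:
        return None  # only the documents container triggers processing
    return BlobEvent(action, container, blob_name)


sample = json.dumps({
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/documents/blobs/reports/q1.pdf",
})
print(parse_blob_event(sample))
```

Filtering on event type and container in the consumer is a defensive choice; the same filtering can also be applied on the Event Grid subscription itself.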

3.3 Component Architecture

3.3.1 Principal Accounts

All platform resources are deployed within the [Client] Azure environment ([Azure Region] region) under the [Platform Name] subscription. Resources are organised into environment-specific Resource Groups (DEV, PPD, PRD) within this subscription.

| Account / Subscription | Purpose | Environment(s) |
| --- | --- | --- |
| [Platform Name] Azure Subscription | Hosts all platform resources | DEV, PPD, PRD |
| [Client] Azure Active Directory (Entra ID) | Identity provider for user and service authentication | All |
| GitHub Organisation | Source code and infrastructure-as-code repositories | All |

3.3.2 Solution Technologies

The table below describes the key technology components deployed as part of the [Platform Name]. Resource naming follows the convention {resource-prefix}-{resourceToken} where resourceToken = toLower(uniqueString(subscription().id, environmentName, location)).
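The naming convention can be illustrated with a short sketch. The important property is that the token is deterministic: the same subscription, environment name, and location always yield the same suffix. The hash below is a stand-in chosen for illustration only and will NOT reproduce the values returned by the ARM/Bicep `uniqueString` function; the function names, the 13-character length, and the sample inputs are assumptions.

```python
import hashlib


def resource_token(subscription_id: str, environment_name: str, location: str,
                   length: int = 13) -> str:
    """Derive a deterministic lowercase token from the three inputs.

    Stand-in for toLower(uniqueString(...)); real deployments get the token
    from ARM/Bicep, and the values produced here will differ.
    """
    seed = "|".join([subscription_id, environment_name, location])
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:length].lower()


def resource_name(resource_prefix: str, token: str) -> str:
    """Apply the {resource-prefix}-{resourceToken} convention."""
    return f"{resource_prefix}-{token}"


# Hypothetical inputs for illustration
token = resource_token("00000000-0000-0000-0000-000000000000", "dev", "australiaeast")
print(resource_name("kv", token))
```

Because the environment name is part of the seed, DEV, PPD, and PRD resources receive different suffixes even when they share a subscription and region.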

[The Impact and Source UC columns should be updated as use cases are onboarded. For a new platform, set all impacts to NEW and Source UC to the first registered use case.]

| Component Group | Name | Description | Impact | Source UC |
| --- | --- | --- | --- | --- |
| AI Engine | Azure AI Search | Provides semantic search capabilities (SKU: Standard) with semantic ranker. Indexes content and metadata for vector-based retrieval. SystemAssigned managed identity. 1 partition, 1 replica (scalable). | NEW | [UC-##] |
| AI Engine | Azure OpenAI Service (Primary) | Performs reasoning over text-based information using GPT-4o (v2024-11-20, Standard deployment, 30K TPM capacity). Generates embeddings using text-embedding-3-large (v1, Standard deployment, 300K TPM capacity). SKU: S0. | NEW | [UC-##] |
| AI Engine | Azure OpenAI Service (Global/Voice) | Provides voice and audio capabilities. Models: gpt-4o-mini-audio-preview (GlobalStandard, 3K capacity) and gpt-4o-mini-realtime-preview (GlobalStandard, 3K capacity). Deployed to a global region (e.g. eastus2) for model availability. | NEW | [UC-##] |
| AI Engine | Azure Cosmos DB | Stores chat conversation logs, workspace configurations, and AI-generated outputs. Kind: GlobalDocumentDB, Serverless capacity mode, Session consistency. Database: db_conversation_history. Containers: conversations (/userId), configurations (/workspaceId), workspaces (/tenantId). | NEW | [UC-##] |
| AI Engine | React & Flask Chat Web App | User-facing chat interface. React 18 / TypeScript / Vite frontend with Tailwind CSS and Radix UI. Flask / Python 3.11 backend. Supports streaming, WebRTC voice, citations, agent personas, and conversation history. Deployed as Docker container on App Service. public_network_access_enabled = false, vnet_image_pull_enabled = true (container images pulled via VNet). | NEW | [UC-##] |
| AI Pre-Trainer | Azure Storage Account | Stores indexed input data and output files; message queue processing items for batch workflows; JSON-based configuration created via the Admin App. SKU: Standard_GRS, Hot tier, StorageV2, TLS 1.2. Blob containers: documents, config. Queues: doc-processing, doc-processing-poison. | NEW | [UC-##] |
| AI Pre-Trainer | Azure AI Document Intelligence | Extracts information from uploaded documents using computer vision and OCR. Kind: FormRecognizer, SKU: S0. | NEW | [UC-##] |
| AI Pre-Trainer | Azure Speech Service | Processes and transcribes audio files into diarised text with speaker identification. Kind: SpeechServices, SKU: S0. | NEW | [UC-##] |
| AI Pre-Trainer | Azure Content Safety | Content moderation and safety filtering for AI-generated outputs. Kind: ContentSafety, SKU: S0. | NEW | [UC-##] |
| AI Pre-Trainer | Azure Computer Vision | Advanced image processing capabilities (optional; deployed conditionally when useAdvancedImageProcessing is enabled). Kind: ComputerVision, SKU: S1. | NEW | [UC-##] |
| AI Pre-Trainer | Azure Function App | Core processing logic for the AI Pre-Trainer and AI Engine. Handles document ingestion, chunking, embedding generation, indexing, AI output generation, and inter-component orchestration. Azure Functions v4, Python 3.11, Docker container deployment from ACR. public_network_access_enabled = false, vnet_image_pull_enabled = true. | NEW | [UC-##] |
| AI Pre-Trainer | Streamlit Admin Web App | Streamlit-based web application for management of platform configurations. Manages chunking strategies, prompt configurations, workspace settings, and uploaded documents. Python 3.11, deployed as Docker container on App Service. public_network_access_enabled = false, vnet_image_pull_enabled = true. | NEW | [UC-##] |
| External Integration | [Integration System Name] | > [Describe external integration systems relevant to this deployment, e.g. RPA tools, enterprise data platforms, file repositories.] | EXISTING | Platform |
| Operational Services | Azure AD (Entra ID) | Manages authentication and user identity across the platform. App Services have built-in AAD authentication enabled. | EXISTING | Platform |
| Operational Services | Log Analytics Workspace | Collects, analyses, and acts on telemetry data from Azure resources. Tracks platform health, performance, and diagnostic logs. SKU: PerGB2018, 30-day retention. | NEW | [UC-##] |
| Operational Services | Azure Key Vault | Securely stores and manages sensitive information such as certificates, cryptographic keys, and connection strings. SKU: Standard. Access policies configured for managed identity and deployment principal. | NEW | [UC-##] |
| Operational Services | Application Insights | Monitors live application logs, detects and diagnoses performance issues, and provides usage pattern analytics. Kind: web, linked to Log Analytics workspace. | NEW | [UC-##] |
| Operational Services | Azure Monitor Dashboard | Centralised location to visualise and share metrics, logs, and telemetry data. Includes charts for sessions, users, failures, response time, CPU, and memory. | NEW | [UC-##] |
| Operational Services | Azure Monitor Workbook | Interactive reports combining text, queries, and visualisations for a unified view of platform resources. | NEW | [UC-##] |
| Operational Services | Event Grid System Topic | Monitors Storage Account Blob Container events (BlobCreated, BlobDeleted on documents container) and triggers Azure Function App processing workflows via the doc-processing queue. Retry: 30 attempts, 1440 min TTL. | NEW | [UC-##] |
| Operational Services | Azure Container Registry | Stores Docker images for platform application components (frontendwebapp, adminwebapp, backendapi). SKU: Standard, admin user enabled. | NEW | [UC-##] |
| Operational Services | App Service Environment (ASEv3) | Dedicated, isolated hosting environment for App Services and Function Apps within the [Platform Name] VNet. Version: ASEv3. Internal Load Balancing Mode: Web, Publishing (fully internal ILB: both web traffic and deployment traffic are internal, no public-facing endpoints). Cluster settings: configurable FrontEndSSLCipherSuiteOrder for TLS cipher control. App Service Plans within the ASE use Isolated v2 tier SKUs (I1v2/I2v2/I3v2). Provides network-level isolation and enhanced security. | NEW | [UC-##] |

[List external applications that the platform integrates with at the platform level. Replace the examples below with client-specific applications.]

| Name | Description | Application Type | Comments |
| --- | --- | --- | --- |
| [RPA Tool, e.g. UiPath / Appian] | Provides automation capabilities for data collection, file mapping, and workflow triggering. | API / Desktop | > [Architecture governed by existing RPA solution documentation.] |
| [Enterprise Data Platform] | Provides historical and contextual data for enrichment of AI processing workflows. | API | > [Connectivity details to be confirmed.] |
| API Management | Exposes platform backend APIs securely. Routes external requests through the enterprise firewall. | API Gateway | |
| [Source System Name(s)] | > [Describe source systems that provide data to the platform.] | [Type] | > [Integration details.] |

3.4 Architecture Decision Records

The following architecture decisions have been made for this platform:

| ID | Decision | Description | Rationale |
| --- | --- | --- | --- |
| PAD-ADR-01 | Use of Private Endpoints | Data classifications for this platform require usage of Private Endpoints (as opposed to Service Endpoints) for all PaaS services. All Azure PaaS services are accessed exclusively via Private Endpoints with no public internet exposure. Specific PE subresources: Storage (blob, queue), Cosmos DB (Sql), Key Vault (vault), AI Search (searchService), OpenAI (account), Cognitive Services (account), Container Registry (registry), App Services (sites). Each PE is registered in a centrally managed Private DNS Zone for automatic DNS A-record resolution. All PaaS services also enforce network ACLs with a default deny action. See 6.2.1 Network Components for the full Private Endpoint configuration matrix. | Compliance with [Client] security patterns and data classification requirements. Private Endpoints provide full network-level isolation compared to Service Endpoints which only restrict traffic at the service level. Centralised Private DNS Zone management ensures consistent name resolution across the enterprise. |
| PAD-ADR-02 | App Service Environment v3 (ASEv3) | The platform uses App Service Environment v3 (ASEv3) for hosting App Services and Function Apps, rather than standard App Service Plans. ASEv3 provides dedicated, isolated compute within the [Client] VNet. Configuration: Internal Load Balancing mode (Web, Publishing) ensures no public endpoints. App Service Plans use Isolated v2 tier SKUs (I1v2/I2v2/I3v2). All application components have public_network_access_enabled = false and vnet_image_pull_enabled = true to ensure container images are pulled via the VNet rather than the public internet. | ASEv3 simplifies network complexity, provides greater network control and compute isolation, and provides greater flexibility around ingress and egress application traffic compared to standard App Service Plans. Internal ILB mode ensures all traffic remains within [Client]’s private network. |
| PAD-ADR-03 | NVA-enabled Network Design | The platform design follows [Client] cyber security guidelines requiring all resources and initiatives to be deployed behind the NVA (Network Virtual Appliance) hub. All traffic is inspected by PaloAlto NVA. | Compliance with [Client] cyber security guidelines for network traffic inspection and control. |
| [PAD-ADR-##] | > [Add client-specific ADRs as needed.] | | |
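As an aid to reading PAD-ADR-01, the sketch below pairs each listed Private Endpoint subresource with the Private DNS Zone in which its A-record would typically be registered. The zone names are the Azure public-cloud defaults as commonly documented and are an assumption here; sovereign clouds and client-specific DNS designs differ, so confirm against 6.2.1 Network Components. The helper name `private_record_fqdn` and the sample resource names are hypothetical.

```python
# (service, PE subresource) -> Private DNS Zone (assumed public-cloud defaults)
PRIVATE_DNS_ZONES = {
    ("Storage", "blob"): "privatelink.blob.core.windows.net",
    ("Storage", "queue"): "privatelink.queue.core.windows.net",
    ("Cosmos DB", "Sql"): "privatelink.documents.azure.com",
    ("Key Vault", "vault"): "privatelink.vaultcore.azure.net",
    ("AI Search", "searchService"): "privatelink.search.windows.net",
    ("OpenAI", "account"): "privatelink.openai.azure.com",
    ("Cognitive Services", "account"): "privatelink.cognitiveservices.azure.com",
    ("Container Registry", "registry"): "privatelink.azurecr.io",
    ("App Services", "sites"): "privatelink.azurewebsites.net",
}


def private_record_fqdn(service: str, subresource: str, resource_name: str) -> str:
    """Return the FQDN of the A-record expected in the Private DNS Zone."""
    try:
        zone = PRIVATE_DNS_ZONES[(service, subresource)]
    except KeyError:
        raise ValueError(f"No zone configured for {service} / {subresource}") from None
    return f"{resource_name}.{zone}"


# Hypothetical resource names for illustration
print(private_record_fqdn("Storage", "blob", "stexample"))
print(private_record_fqdn("Key Vault", "vault", "kv-example"))
```

A table of this kind can be used to check that every Private Endpoint in the deployment has a matching zone linked to the VNet before cut-over.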

3.5 Guardrails and Compliance

[If the client has architecture guardrails or compliance standards, document adherence here. The table below shows a typical pattern — replace references with client-specific guardrail identifiers.]

| Guardrail Title | Reference | Adherence / Deviation | Rationale |
| --- | --- | --- | --- |
| App Service Environments & Private Endpoints | [Client Pattern Reference] | ADHERENCE | Approved pattern. ASEv3 with Internal Load Balancing (Web, Publishing). All PaaS services accessed via Private Endpoints with centralised Private DNS Zone registration. App Services have public network access disabled and VNet image pull enabled. |
| PaaS to PaaS Communications (Cosmos DB, AI Services, Storage) | [Client Pattern Reference] | ADHERENCE | Managed Identity + Key Vault; VNet Integration + Private Endpoint. |
| Storage Accounts | [Client Pattern Reference] | ADHERENCE | Storage Account with private network communications. Sensitive data protected. |
| Azure Container Registry | [Client Pattern Reference] | ADHERENCE | Adherence to approved pattern. |
| Azure Key Vault | [Client Pattern Reference] | ADHERENCE | Adherence to approved pattern. |
| Staff connecting to Web Apps | [Client Pattern Reference] | ADHERENCE | Access via Zscaler Private Access, Corporate Office, or WVD. |
| Identity and Access Management | [Client Pattern Reference] | ADHERENCE | Web App interfaces authorised via RBAC. User authentication via Azure AD with MFA enabled. |
| Logging | [Client Pattern Reference] | ADHERENCE | Logs and metrics captured via Azure Monitor for all applicable resources. |
| Encryption | [Client Pattern Reference] | ADHERENCE | Encryption at rest and in transit are compliant. |
| Secret Management | [Client Pattern Reference] | ADHERENCE | Secrets managed and accessed via Azure Key Vault. |
| Diagnostic Settings | [Client Pattern Reference] | ADHERENCE | Enabled for all applicable resources. |

3.6 Architectural Principles

[Document the architectural principles that govern the platform design. For each principle, describe how the platform architecture adheres to it. Source principles from the client’s enterprise architecture framework or standards body.]

| # | Principle | Description | Platform Adherence |
| --- | --- | --- | --- |
| AP-01 | [Principle Name] | > [Description of the principle] | > [How the platform design adheres to this principle] |

4 Use Case Register

[This section maintains a register of all use cases (Solution Designs and Opportunity Assessments) that are built on this platform. Each entry links to the relevant documents and notes the current status. This provides a single view of everything running on the platform. This register should be updated whenever a new use case is onboarded, decommissioned, or materially changed. Maintenance responsibility lies with the platform architecture owner.]

[Note: A single use case may have multiple SDD documents if the use case requires distinct solutions (e.g. different automation approaches). List each SDD as a separate entry in the SD Document(s) column.]

| # | Use Case Name | OA Document | SD Document(s) | Status | Date Onboarded | Key Platform Impacts |
| --- | --- | --- | --- | --- | --- | --- |
| UC-01 | [Use Case Name] | [Link to OAD or “N/A”] | [Link to SDD(s)] | [Active / In Design / Decommissioned] | [Date] | [Brief summary of platform changes required] |

5 Integration View

5.1 Integration Patterns

The [Platform Name] uses the following integration patterns for communication between platform components and external systems:

  • REST API over HTTPS: Primary pattern for synchronous communication between platform components and external system triggers. Used between Chat Web App ↔ Function App, Admin Web App ↔ Function App, and external API consumers.
  • Azure Blob API: Used for file storage and retrieval operations between Function Apps and Storage Accounts. Documents are uploaded to the documents blob container for processing.
  • Private Link / TLS: Used for secure communication between ASE-hosted components and private endpoint-enabled services (Cosmos DB, Storage, Key Vault, AI Services, AI Search, Container Registry). Each PaaS service has a dedicated Private Endpoint deployed to the Private Links Subnet with automatic DNS A-record registration in the corresponding Private DNS Zone. Private DNS zones are managed centrally in [Client]’s shared services subscription. See 6.2.1 Network Components for the full Private Endpoint configuration matrix.
  • Event-Driven (Event Grid → Storage Queue): Used for asynchronous processing triggers based on storage events. Blob created/deleted events on the documents container are published via Event Grid System Topic to the doc-processing Storage Queue, which triggers Function App processing.
  • Managed Identity: Used for authentication between Azure PaaS components, eliminating the need for key-based authentication. All service-to-service communication uses SystemAssigned managed identities with RBAC role assignments.
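
The event-driven pattern above can be illustrated with a minimal Python sketch of how a queue consumer might interpret a blob event. It assumes the queue message body is the JSON text of a single event in the Event Grid schema (with `eventType` and `subject` fields); the `BlobEvent` type and `parse_blob_event` helper are hypothetical names used for illustration only, not part of the platform code.

```python
import json
from dataclasses import dataclass
from typing import Optional

DOCUMENTS_CONTAINER = "documents"  # container name taken from the pattern description

# Event types named in the integration pattern, mapped to a short action label.
ACTIONS = {
    "Microsoft.Storage.BlobCreated": "created",
    "Microsoft.Storage.BlobDeleted": "deleted",
}
SUBJECT_PREFIX = "/blobServices/default/containers/"


@dataclass
class BlobEvent:  # hypothetical helper type
    action: str
    container: str
    blob_name: str


def parse_blob_event(message_body: str) -> Optional[BlobEvent]:
    """Return a BlobEvent for documents-container blob events, else None."""
    event = json.loads(message_body)
    action = ACTIONS.get(event.get("eventType", ""))
    if action is None:
        return None
    # Subject format: /blobServices/default/containers/<container>/blobs/<path>
    subject = event.get("subject", "")
    if not subject.startswith(SUBJECT_PREFIX) or "/blobs/" not in subject:
        return None
    container, blob_name = subject[len(SUBJECT_PREFIX):].split("/blobs/", 1)
    if container != DOCUMENTS_CONTAINER:
        return None
    return BlobEvent(action, container, blob_name)


# Illustrative message; the blob path is made up.
sample = json.dumps({
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/documents/blobs/reports/q1.pdf",
})
print(parse_blob_event(sample))
```

Filtering on container and event type in the consumer is a defensive choice in this sketch; the Event Grid subscription itself may already apply equivalent filters.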

[Diagram: Integration diagram showing API interactions between platform components — Function App, Admin Web App, Chat Web App, Storage Account, Azure AI Services, Cosmos DB, Azure AI Search, Key Vault, and network boundary components (PaloAlto NVA, ASE) — with protocol annotations (HTTPS/REST API, Azure Blob API, Private Link/TLS). Recommended: draw.io or Mermaid sequence/flow diagram]

5.2 Standard Interfaces

The integration interfaces for the platform are described below.

[Update the Impact column as use cases are onboarded. For a new platform, set all impacts to NEW.]

Integration ProcessDescriptionImpactInterfaces
Upload Source FilesExternal automation processes or platform users upload source data files for processing and indexing.NEWVia External Automation: 1. External System → ASE (Function App) | HTTPS / REST API | Orchestrates file upload and data processing requests; 2. Function App → Storage Account | HTTPS / Azure Blob API | Stores raw uploaded files for indexing. Via Admin Web App: 1. User → ASE (Admin Web App) | HTTPS / REST API | Authenticate and interact with web app; 2. Admin Web App → ASE (Function App) | HTTPS / REST API | Orchestrates file upload and data processing requests; 3. Function App → Storage Account | HTTPS / Azure Blob API | Stores raw uploaded files for indexing.
Generate AI InsightsThe Function App processes uploaded data by calling Azure AI services for analysis and insight generation. Indexed results are stored for retrieval.NEW1. Function App → Azure AI Services (Document Intelligence, OpenAI) | HTTPS / REST API | Invokes AI services for data analysis; 2. Azure AI Services → Function App | HTTPS | Returns processed insights and metadata; 3. Function App → Cosmos DB | HTTPS | Stores AI-generated results and metadata for querying; 4. Function App → Azure AI Search | HTTPS / REST API | Indexes metadata for fast retrieval.
Query AI InsightsUsers query AI-generated insights using the Chat Web App. Queries are routed to Azure AI Search for retrieval and Azure OpenAI for reasoning.NEW1. User → Chat Web App | HTTPS / Web Interface | User sends queries via the chat application; 2. Chat Web App → Azure AI Search | HTTPS / REST API | Executes queries to retrieve indexed data; 3. Chat Web App → Azure OpenAI | HTTPS / REST API | Sends retrieved context + query for LLM reasoning; 4. Azure OpenAI → Chat Web App | HTTPS (streaming) | Returns AI-generated response for display.
Secure Data ManagementApplication secrets and sensitive data are securely managed using Azure Key Vault. Private Link subnet enables secure communication with integrated resources.NEW1. App Services → Key Vault | HTTPS | Retrieves application secrets for secure operations; 2. Virtual Network (ASE Subnet) → Private Links Subnet | Private Link / TLS | Provides secure connectivity to resources like Storage and Cosmos DB.
Event-Driven ProcessingStorage events trigger automated document processing workflows.NEW1. Storage Account (Blob) → Event Grid System Topic | Event Subscription | BlobCreated/BlobDeleted events on documents container; 2. Event Grid → Storage Queue (doc-processing) | Queue Message | Triggers Function App processing; 3. Function App → AI Services | HTTPS / REST API | Processes document through ingestion pipeline.
Monitoring and LoggingAzure Monitor collects telemetry and diagnostic data for platform performance tracking and troubleshooting.NEW1. All App Services/Function App → App Insights | HTTPS | Sends telemetry data for monitoring; 2. Azure Monitor Resources → Dashboards | HTTPS | Displays performance metrics and alerts for administrators.
Expose APIsThe App Service Environment integrates with API Management to expose backend APIs securely. External requests are routed through the enterprise firewall.NEW1. API Management → PaloAlto NVA | HTTPS / TLS | Routes secure API requests; 2. PaloAlto NVA → External Systems | HTTPS | Ensures secure external communication.

5.3 Middleware Components

| Component | Type | Purpose |
| --- | --- | --- |
| Azure Function App | Serverless Compute | Core middleware for orchestrating data processing, AI service calls, and inter-component communication. Hosted within the App Service Environment. Uses Azure Functions v4 with Python Blueprints for modular function registration. |
| Event Grid System Topic | Event Broker | Provides event-driven triggers for storage-based events (blob created/deleted on documents container), publishing to Storage Queue for Function App consumption. Retry policy: 30 attempts, 1440 min TTL. |
| API Management | API Gateway | Exposes platform APIs to authorised consumers. Provides rate limiting, authentication, and routing capabilities. |
| PaloAlto NVA | Network Firewall | Inspects and secures all traffic entering and leaving the platform VNet, including inter-VNet and external communications. |

6 Infrastructure View

6.1 Deployment Architecture

The deployment architecture supports [Platform Name] component deployments across Development (DEV), Pre-Production (PPD), and Production (PRD) environments. All deployments leverage infrastructure-as-code (IaC) via Bicep templates and are managed using CI/CD pipelines for consistency.

The platform uses a tag-driven CI/CD pipeline:

  1. Developers merge to the develop branch
  2. Semantic-release creates a Release Candidate (RC) tag (vX.Y.Z-rc.N)
  3. The RC tag triggers automatic deployment to the staging environment
  4. Playwright smoke tests run against staging
  5. On success, a General Availability (GA) tag (vX.Y.Z) is created
  6. The GA tag triggers production deployment (with manual approval gate via GitHub Environments)
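
The tag conventions in the steps above can be sketched as a small classifier. This is an illustrative Python example only: the function name `deployment_target` and the returned labels are assumptions, and the real pipeline logic lives in the GitHub Actions workflows.

```python
import re
from typing import Optional

# Tag shapes taken from the steps above: vX.Y.Z-rc.N (RC) and vX.Y.Z (GA).
RC_TAG = re.compile(r"v\d+\.\d+\.\d+-rc\.\d+")
GA_TAG = re.compile(r"v\d+\.\d+\.\d+")


def deployment_target(tag: str) -> Optional[str]:
    """Map a Git tag to the environment it would trigger, or None if unrecognised."""
    if RC_TAG.fullmatch(tag):
        return "staging"      # deployed automatically
    if GA_TAG.fullmatch(tag):
        return "production"   # deployed after manual approval
    return None


for tag in ("v1.4.0-rc.2", "v1.4.0", "1.4.0", "v1.4"):
    print(tag, "->", deployment_target(tag))
```

Using `fullmatch` rather than `match` avoids accepting tags with trailing text, such as `v1.4.0-beta`, as GA releases.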

Three Docker images are built and pushed to Azure Container Registry:

  • frontendwebapp — Chat Web App
  • adminwebapp — Admin Web App
  • backendapi — Function App Backend

[Diagram: Deployment view showing three Azure Resource Groups ([Platform Code] DEV, [Platform Code] PPD, [Platform Code] PRD) each containing Core Resources, Monitoring Resources, and Networking Resources, with CI/CD pipelines triggered from GitHub Repositories and Bicep-based provisioning. Recommended: draw.io deployment diagram]

6.1.1 Deployment Principles

  • The platform is deployed across three distinct environments, each within its own Azure Resource Group:
    1. [Platform Code] DEV Resource Group: Development environment for testing and iterative development.
    2. [Platform Code] PPD Resource Group: Pre-Production environment for integration and validation.
    3. [Platform Code] PRD Resource Group: Production environment for live applications and services.
  • Environments are fully isolated to ensure no cross-environment dependencies.
  • All changes are deployed via CI/CD pipelines (GitHub Actions) triggered from GitHub Repositories, ensuring repeatable and tested releases.
  • Tag-driven release process: RC tags deploy to staging automatically; GA tags deploy to production with manual approval.
  • The same Azure services (Core Resources, Monitoring Resources, Networking Resources) are deployed across all three environments to ensure a uniform architecture.
  • All core services (e.g. Key Vault, Storage, Cosmos DB, AI Services) are accessed securely via Private Endpoints to prevent public exposure.
  • Infrastructure is provisioned using Bicep templates via Azure Developer CLI (azd provision) for consistent deployments across environments.
  • Application components are deployed as Docker containers from Azure Container Registry to App Services in container mode.
  • Source code and infrastructure definitions are stored in GitHub Repositories.
  • Authentication to Azure uses federated credentials (OIDC) — no client secrets in CI/CD pipelines.

6.2 Network Architecture

[Diagram: Networking view showing traffic flows between staff access methods (Zscaler, Corporate Office, WVD), PaloAlto NVA firewall, and [Platform Name] VNet hosted resources in spoke VNets. Recommended: draw.io or Visio topology diagram showing hub/spoke VNets, subnets, firewall placement, and access paths]

6.2.1 Network Components

| Name | Description | Reference |
| --- | --- | --- |
| PaloAlto NVA (Network Virtual Appliance) | Enterprise firewall that inspects all traffic between [Client] networks and [Platform Name] resources. Routes traffic between hub and spoke VNets. | [Client Firewall Documentation] |
| App Service Environment v3 (ASEv3) | Dedicated, isolated hosting environment for App Services and Function Apps within the [Platform Name] VNet. Internal Load Balancing Mode: Web, Publishing (fully internal — no public-facing endpoints). Cluster settings: configurable FrontEndSSLCipherSuiteOrder for TLS cipher control. App Service Plans use Isolated v2 tier SKUs (I1v2/I2v2/I3v2) with optional CPU-based autoscaling. | [Client ASE Pattern Reference] |
| Hub VNet | Central network hub hosting the PaloAlto NVA and providing connectivity between on-premises networks, Zscaler, and spoke VNets. | [Client Cloud Platform Zone Model] |
| Spoke VNet ([Platform Name]) | Dedicated VNet for [Platform Name] resources, peered with the Hub VNet. Contains ASE subnet and Private Links subnet. Uses custom DNS servers (region-specific) for private DNS zone resolution rather than Azure-provided DNS. | [Client Cloud Platform Zone Model] |
| ASE Subnet | Subnet within the [Platform Name] spoke VNet hosting the App Service Environment (Function Apps, Web Apps). Delegated to Microsoft.Web/hostingEnvironments. Minimum size: /27 (32 IPs). NSG associated (Azure default rules only). Route table auto-associated based on region and environment (hub route tables for prod/nonprod × Australia East/Southeast). | |
| Private Links Subnet | Subnet within the [Platform Name] spoke VNet hosting Private Endpoints for PaaS services. Private endpoint network policies disabled (PE traffic bypasses subnet-level NSG rules). NSG associated (Azure default rules only). | |
| Route Tables | Hub-managed route tables auto-associated to subnets based on region (Australia East/Southeast) and environment (prod/nonprod). Ensures all traffic is routed through the PaloAlto NVA. | |
| SDWAN | Site-to-site connectivity between [Client] Corporate Office networks and the Azure Hub VNet. | |
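
As a quick check on the ASE subnet sizing above, the following Python sketch computes how many addresses a candidate subnet actually offers. It assumes the standard Azure behaviour of reserving five addresses in every subnet; the CIDR ranges shown are hypothetical examples, not the platform's address plan.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves 5 addresses in every subnet
ASE_MINIMUM_PREFIX = 27        # minimum ASE subnet size stated in the table above


def usable_addresses(cidr: str) -> int:
    """Addresses left for workloads after Azure's reserved addresses."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AZURE_RESERVED_PER_SUBNET


def meets_ase_minimum(cidr: str) -> bool:
    """True when the subnet is /27 or larger (a smaller prefix length means more addresses)."""
    return ipaddress.ip_network(cidr).prefixlen <= ASE_MINIMUM_PREFIX


for cidr in ("10.20.0.0/28", "10.20.0.0/27", "10.20.0.0/24"):  # hypothetical ranges
    print(cidr, usable_addresses(cidr), meets_ase_minimum(cidr))
```

A /27 therefore leaves 27 assignable addresses, which is worth weighing against the expected number of App Service Plan instances when choosing a subnet size.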

Subnet Service Endpoints

Both subnets include the following default service endpoints for management plane connectivity:

| Service Endpoint | Purpose |
| --- | --- |
| Microsoft.AzureCosmosDB | Cosmos DB service endpoint |
| Microsoft.ContainerRegistry | Container Registry service endpoint |
| Microsoft.EventHub | Event Hub service endpoint |
| Microsoft.KeyVault | Key Vault service endpoint |
| Microsoft.ServiceBus | Service Bus service endpoint |
| Microsoft.Sql | SQL Database service endpoint |
| Microsoft.Storage | Storage Account service endpoint |
| Microsoft.Web | App Service service endpoint |

[Note: Service endpoints coexist with Private Endpoints. Service endpoints provide management plane connectivity at the subnet level, while Private Endpoints provide data plane connectivity via private IP addresses. The ADR in 3.4 Architecture Decision Records (PAD-ADR-01) confirms Private Endpoints are the primary connectivity mechanism for data plane traffic.]

Private Endpoint Configuration

All Azure PaaS services are accessed via Private Endpoints deployed to the Private Links Subnet. Each Private Endpoint is registered in a centrally managed Private DNS Zone for automatic DNS resolution. Private DNS zones are hosted in a shared services subscription and resource group, managed by [Client]’s platform team.

| Resource | PE Subresource | Private DNS Zone | Notes |
| --- | --- | --- | --- |
| Azure Storage Account (blob) | blob | privatelink.blob.core.windows.net | Document and config blob containers |
| Azure Storage Account (queue) | queue | privatelink.queue.core.windows.net | Document processing queues |
| Azure Cosmos DB | Sql | privatelink.documents.azure.com | SQL API (GlobalDocumentDB) |
| Azure Key Vault | vault | privatelink.vaultcore.azure.net | Secrets and certificates |
| Azure AI Search | searchService | privatelink.search.windows.net | Semantic search indexes |
| Azure OpenAI Service | account | privatelink.openai.azure.com | LLM reasoning and embeddings |
| Azure AI Document Intelligence | account | privatelink.cognitiveservices.azure.com | Document content extraction |
| Azure Content Safety | account | privatelink.cognitiveservices.azure.com | Content moderation |
| Azure Speech Service | account | privatelink.cognitiveservices.azure.com | Audio transcription |
| Azure Computer Vision | account | privatelink.cognitiveservices.azure.com | Image processing (optional) |
| Azure Container Registry | registry | privatelink.azurecr.io | Docker image registry |
| App Services (Chat, Admin, Function) | sites | ASE DNS suffix (auto-registered) | App Services hosted in ASE use the ASE’s internal DNS suffix rather than standard privatelink.azurewebsites.net |

[The PE Subresource column indicates the subresource_names used when creating the Private Endpoint. The Private DNS Zone column indicates the Azure Private DNS Zone where the PE’s A-record is automatically registered via a DNS Zone Group. All DNS zones follow the privatelink.{service}.{domain} naming convention and are managed centrally in [Client]’s shared services subscription.]
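
The configuration matrix lends itself to a simple lookup that can be used, for example, when validating that a Private Endpoint's DNS record landed in the expected zone. The Python sketch below encodes a subset of the rows above; the `record_in_expected_zone` helper and the example record names are hypothetical.

```python
from typing import Dict, Tuple

# (PE subresource, Private DNS zone) per resource, copied from the matrix above (subset).
PE_MATRIX: Dict[str, Tuple[str, str]] = {
    "Azure Storage Account (blob)": ("blob", "privatelink.blob.core.windows.net"),
    "Azure Storage Account (queue)": ("queue", "privatelink.queue.core.windows.net"),
    "Azure Cosmos DB": ("Sql", "privatelink.documents.azure.com"),
    "Azure Key Vault": ("vault", "privatelink.vaultcore.azure.net"),
    "Azure AI Search": ("searchService", "privatelink.search.windows.net"),
    "Azure OpenAI Service": ("account", "privatelink.openai.azure.com"),
    "Azure Container Registry": ("registry", "privatelink.azurecr.io"),
}


def record_in_expected_zone(resource: str, record_fqdn: str) -> bool:
    """Check that a DNS record name falls under the zone listed for the resource."""
    _, zone = PE_MATRIX[resource]
    return record_fqdn.lower().rstrip(".").endswith("." + zone)


# Example record names are made up for illustration.
print(record_in_expected_zone("Azure Key Vault", "kv-example.privatelink.vaultcore.azure.net"))
print(record_in_expected_zone("Azure Key Vault", "kv-example.vault.azure.net"))
```

Keeping the matrix as data rather than scattering zone names through scripts makes it easier to keep validation in step with the table when services are added.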

6.2.2 Staff Access Methods

Zscaler Private Access (Remote / [Client] Laptop)

[Client] staff members use [Client]-managed laptops with Zscaler Client Connector installed. Users authenticate to Zscaler Client Connector using Azure AD (with MFA enabled). When a user sends a connection request to platform applications hosted on [Platform Name] VNets, Zscaler Client Connector redirects the traffic (TCP or UDP) to Zscaler Private Access (ZPA) in the cloud. ZPA evaluates the request against ZPA policies to ensure the authenticated user has access to the [Platform Name] VNet hosted resource. If ZPA policies allow the connection, ZPA sends the traffic to a ZPA App Connector hosted on Azure. The App Connector forwards the traffic to the application hosted in the [Platform Name] VNets. Traffic between the ZPA App Connector and [Platform Name] VNet applications is inspected by the PaloAlto Firewall.

  • Pre-conditions: User connecting via a [Client]-managed laptop from outside of the Corporate Office. Zscaler Client Connector application must be running on the laptop.

[Client] Corporate Office (On-Premises)

The user is connected to the [Client] trusted network using a [Client]-managed laptop. Because the user is already on the trusted network, ZPA is not required to connect to [Platform Name] VNet hosted resources. Connections between the user and Azure are established using SDWAN. Traffic between the [Client] network and [Platform Name] VNet hosted resources in the spoke VNet is inspected by the PaloAlto Firewall. The PaloAlto NVA allows access from the [Client] private network to [Platform Name] VNet hosted resources.

  • Pre-conditions: User connecting via a [Client]-managed laptop using Corporate Office Network Connection. PaloAlto NVA allows connectivity from [Client] offices to [Platform Name] VNet hosted resources in the spoke VNet.

Windows Virtual Desktop (WVD in Azure)

Windows Virtual Desktop provides the ability to connect to Virtual Desktops running on Azure. Microsoft manages portions of the Windows Virtual Desktop service on [Client]’s behalf and provides secure endpoints for connecting clients and session hosts. Users need to have access to the Windows Virtual Desktop Host Pool or the application hosted on them.

  • Pre-conditions: User connecting via a Virtual Desktop hosted on Azure. PaloAlto NVA allows connectivity from Virtual Desktops address range to [Platform Name] VNet resources in the spoke VNet.

6.2.3 Network Interactions

All network interactions within the platform follow these principles:

  • Ingress: All user traffic enters through one of the three staff access methods (Zscaler, Corporate Office, WVD) and is inspected by the PaloAlto NVA before reaching platform resources.
  • Internal (PaaS-to-PaaS): Communication between Azure PaaS services uses Private Endpoints within the [Platform Name] VNet, with Managed Identity authentication. Each PaaS service enforces network ACLs with a default deny action, allowing only traffic from authorised VNet subnets and IP ranges.
  • Storage Account Network Rules: Default action: Deny. Allows VNet subnet access + authorised IP ranges. Bypass: Logging, Metrics, AzureServices. Private link access granted to Document Intelligence, Speech Service, and Computer Vision for direct BYOS (Bring Your Own Storage) connectivity.
  • Key Vault Network Rules: RBAC authorisation mode. Default action: Deny. Allows authorised subnet IDs + IP ranges. Bypass: AzureServices.
  • AI Services Network Rules: Each Cognitive Service enforces network ACLs with VNet rules and IP-based allow lists.
  • Egress: Outbound traffic to external data sources is routed through the PaloAlto NVA for inspection and policy enforcement.
  • Encryption: All data is encrypted in transit (via TLS between all components) and at rest.
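
The default-deny behaviour described in the network rules above can be sketched in Python as an allow-list check. This is a simplified model under stated assumptions: real Azure network rules distinguish VNet rules (subnet IDs) from IP rules, whereas here both are represented as CIDR ranges, and the ranges themselves are hypothetical.

```python
import ipaddress
from typing import Iterable

# Hypothetical allow list: one subnet range and one authorised IP range.
ALLOWED_RANGES = ("10.20.0.0/27", "203.0.113.0/29")


def is_allowed(source_ip: str, allowed_ranges: Iterable[str] = ALLOWED_RANGES) -> bool:
    """Default action is Deny: traffic passes only if it matches an allowed range."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


for ip in ("10.20.0.12", "10.20.1.12", "203.0.113.5", "198.51.100.7"):
    print(ip, "allow" if is_allowed(ip) else "deny")
```

The sketch deliberately omits the bypass settings (Logging, Metrics, AzureServices), which are evaluated by the Azure platform rather than by source address.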

6.3 Environment Strategy

All non-production environments are provisioned as duplicates of the production environment. Only dummy data is used in non-production environments. Should a requirement arise to use production data in a non-production environment for testing purposes, the request will be raised with the appropriate teams (e.g. Data Governance, Security), and the production data will be deleted immediately after the test scenarios are completed.

| Environment | Purpose | Data Policy |
| --- | --- | --- |
| DEV | Development and iterative testing | Dummy / synthetic data only |
| PPD | Pre-production integration and validation | Dummy / synthetic data only |
| PRD | Production live services | Production data with full security controls |

6.4 Infrastructure Requirements

| Requirement | Description |
| --- | --- |
| Azure Subscription | [Platform Name] subscription within [Client] Azure environment ([Azure Region]) |
| Resource Groups | Three isolated resource groups (DEV, PPD, PRD) |
| Networking | Hub/Spoke VNet topology with PaloAlto NVA. Spoke VNet with custom DNS servers. ASE subnet (delegated to Microsoft.Web/hostingEnvironments, minimum /27) and Private Links subnet (PE network policies disabled). Route tables auto-associated from hub. 8 default service endpoints on all subnets. |
| Compute | App Service Environment v3 (ASEv3) with Internal Load Balancing (Web, Publishing). Isolated v2 tier SKUs (I1v2/I2v2/I3v2). Optional CPU-based autoscaling. |
| Storage | Azure Storage Accounts with Private Endpoints (Standard_GRS, Hot tier) |
| Database | Azure Cosmos DB (Serverless, Session consistency) with Private Endpoints |
| AI Services | Azure OpenAI (x2 — primary + global/voice), Document Intelligence, Speech, Content Safety, AI Search |
| Container Registry | Azure Container Registry (Standard SKU) with Private Endpoint |
| Secrets Management | Azure Key Vault (Standard SKU) with Private Endpoint |
| Monitoring | Log Analytics (PerGB2018, 30-day retention), Application Insights, Dashboards, Workbooks |
| Event Handling | Event Grid System Topic for storage event processing |
| Infrastructure-as-Code | Bicep templates provisioned via Azure Developer CLI (azd) |
| CI/CD | GitHub Actions pipelines for deployment automation (tag-driven: RC → staging → GA → production) |
| Authentication | Azure AD federated credentials (OIDC) for CI/CD — no client secrets |

6.5 Licensing and Cost Considerations

[Describe any licensing or cost implications for the platform infrastructure. Use-case-specific cost analysis belongs in the OAD. The items below are standard considerations — adjust for client-specific pricing and agreements.]

| Component | Cost Consideration |
| --- | --- |
| App Service Environment v3 (ASEv3) | ASEv3 is a premium Azure service with dedicated Isolated v2 tier compute resources (I1v2/I2v2/I3v2 SKUs), which incurs higher costs compared to shared App Service Plans. Cost is incurred regardless of workload utilisation. Internal Load Balancing mode ensures no additional public IP costs. |
| Azure OpenAI Services | Costs are based on usage, including the number of API calls, token consumption (TPM quotas), and model deployment types (Standard vs GlobalStandard). Costs scale with the number of use cases and processing volume. |
| Azure AI Search | Costs based on SKU tier (Standard), number of search units (replicas × partitions), and semantic ranker usage. |
| Azure Cosmos DB | Serverless billing based on consumed Request Units (RU) and storage. Cost scales with concurrent user sessions and data volume. |
| Azure Container Registry | Standard SKU with per-image storage and bandwidth charges. |

6.6 Backup and Recovery

The [Platform Name] does not require extensive backup and recovery policies due to the transient nature of the processing workloads. Key considerations:

| Aspect | Approach |
| --- | --- |
| Transient Processing | The platform is primarily used to generate point-in-time outputs to assist users. Data processed is transient in nature. |
| Source Data Regeneration | If any data persisted by the platform is lost (e.g. AI-generated reports), outputs can be regenerated by reprocessing the source data. |
| Automated Provisioning | In the event of Azure infrastructure resources requiring a full recovery, this can be achieved by re-deploying the platform using the automated provisioning pipelines (Bicep IaC via azd provision). |
| Data Archival and Retention | > [Data archival and retention policies to be defined during delivery phase.] |

6.7 Capacity Planning

Capacity planning for the [Platform Name] is driven by the number of registered use cases, concurrent users, and data processing volume. As this is a newly established (greenfield) platform, capacity planning focuses on projected needs and scaling triggers rather than historical utilisation baselines.

| Resource | Current Utilisation | Scaling Threshold | Growth Projection | Notes |
| --- | --- | --- | --- | --- |
| ASE Compute | N/A — Greenfield | Concurrent request load exceeds single instance capacity | Scale based on number of registered use cases | Scale App Service Plan instances within the ASE. CPU-based autoscaling available: scale out when CPU > 70% (cooldown 10 min), scale in when CPU < 25% (cooldown 1 min). Isolated v2 tier SKUs (I1v2/I2v2/I3v2) provide different compute capacities. |
| Azure OpenAI (Primary) | N/A — Greenfield | Token-per-minute (TPM) quota exhaustion (GPT-4o: 30K TPM, Embeddings: 300K TPM) | Scale with use case processing volume | Adjust TPM quotas and model deployment regions. |
| Azure AI Search | N/A — Greenfield | Index size or query volume exceeds single unit (1 partition, 1 replica) | Scale with indexed data volume | Scale search units (replicas and partitions). |
| Cosmos DB | N/A — Greenfield | Serverless RU consumption patterns indicate provisioned throughput would be more cost-effective | Scale with concurrent user sessions | Evaluate Serverless vs Provisioned throughput mode. |
| Storage Accounts | N/A — Greenfield | Automatic scaling | Monitor for tier optimisation | Monitor for hot/cool/archive tier optimisation opportunities. Standard_GRS provides geo-redundancy. |
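
The CPU-based autoscaling thresholds quoted for ASE Compute can be expressed as a small decision function. The Python sketch below is a simplification under stated assumptions: the instance bounds, the one-instance step size, and the `decide` helper are hypothetical, and real Azure autoscale evaluates metrics over a time window rather than a single reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscaleRule:
    # Thresholds and cooldowns taken from the capacity planning table.
    add_instance_cpu: float = 70.0
    remove_instance_cpu: float = 25.0
    add_cooldown_min: int = 10
    remove_cooldown_min: int = 1


def decide(cpu_percent: float, minutes_since_last_scale: int, instances: int,
           min_instances: int = 1, max_instances: int = 3,
           rule: AutoscaleRule = AutoscaleRule()) -> int:
    """Return the new instance count for one evaluation cycle."""
    if (cpu_percent > rule.add_instance_cpu
            and minutes_since_last_scale >= rule.add_cooldown_min
            and instances < max_instances):
        return instances + 1  # add one instance
    if (cpu_percent < rule.remove_instance_cpu
            and minutes_since_last_scale >= rule.remove_cooldown_min
            and instances > min_instances):
        return instances - 1  # remove one instance
    return instances  # within thresholds, in cooldown, or at a bound


print(decide(cpu_percent=85.0, minutes_since_last_scale=12, instances=1))
print(decide(cpu_percent=85.0, minutes_since_last_scale=5, instances=1))
print(decide(cpu_percent=10.0, minutes_since_last_scale=2, instances=2))
```

The asymmetric cooldowns mean capacity is added cautiously and released quickly, which matters for cost because ASE compute is billed regardless of utilisation.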

6.8 Failover and High Availability

[Describe the high availability strategy for the platform. Document the redundancy topology, failover mechanisms, and any active/passive or active/active patterns. Consider: compute redundancy (ASE instance count, autoscaling), data redundancy (Cosmos DB multi-region, Storage GRS), and network redundancy (hub failover paths).]

| Component | HA Strategy | Failover Mechanism | Notes |
| --- | --- | --- | --- |
| App Service Environment (ASEv3) | > [e.g. Single-region with autoscaling] | > [e.g. CPU-based autoscaling within ASE] | > [Notes] |
| Azure Cosmos DB | > [e.g. Serverless with session consistency] | > [e.g. Automatic failover within region] | > [Notes] |
| Azure Storage Account | > [e.g. GRS — geo-redundant storage] | > [e.g. Automatic failover to paired region] | > [Notes] |
| Azure OpenAI Service | > [e.g. Single-region Standard deployment] | > [e.g. Manual failover to secondary region] | > [Notes] |
| Azure AI Search | > [e.g. Single replica, single partition] | > [e.g. Scale replicas for HA] | > [Notes] |
| PaloAlto NVA | > [e.g. Active/Passive pair in hub VNet] | > [e.g. Automatic failover to standby NVA] | > [Notes] |

6.9 Disaster Recovery

[Describe the disaster recovery strategy for the platform. Specify RPO/RTO targets, DR site architecture, failover procedures, and testing cadence. Consider: paired Azure region strategy, data replication, infrastructure rebuild capability (IaC), and communication/notification procedures.]

| Aspect | Detail |
| --- | --- |
| RPO (Recovery Point Objective) | > [Target RPO — e.g. 24 hours. Maximum acceptable data loss.] |
| RTO (Recovery Time Objective) | > [Target RTO — e.g. 4 hours. Maximum acceptable downtime.] |
| DR Region | > [e.g. Australia Southeast (paired region for Australia East)] |
| Data Replication | > [e.g. Storage GRS provides automatic geo-replication. Cosmos DB single-region with backup policy.] |
| Infrastructure Rebuild | > [e.g. Full platform can be redeployed to DR region using Bicep IaC templates via azd provision.] |
| Application Recovery | > [e.g. Docker images in ACR can be replicated to DR region. CI/CD pipelines can target DR environment.] |
| Failover Procedure | > [e.g. 1. Assess outage scope. 2. Trigger IaC deployment to DR region. 3. Update DNS/routing. 4. Validate services. 5. Notify stakeholders.] |
| Failback Procedure | > [e.g. 1. Confirm primary region recovery. 2. Sync any data changes. 3. Redirect traffic to primary. 4. Decommission DR resources.] |
| DR Testing Cadence | > [e.g. Annual DR test with documented results and lessons learned.] |

7 Information View

7.1 System of Record

The [Platform Name] is not a system of record. It processes copies of source data provided by upstream systems and generates AI-derived outputs for consumption by platform users. Source data remains governed by its respective system of record.

| Data Object | System of Record | Copy | Impact Description |
| --- | --- | --- | --- |
| Source data files | Upstream systems ([Source System Names]) | [Platform Name] (Storage Account) | Consumer — processes copies for AI analysis |
| AI-generated outputs | [Platform Name] | [Platform Name] (Storage Account, Cosmos DB) | Producer — generates and stores AI-derived reports and insights |
| Chat conversation logs | [Platform Name] (Cosmos DB) | N/A | Producer — stores user interaction history |
| Platform configurations | [Platform Name] (Storage Account) | N/A | Producer — stores AI processing configurations as JSON in config blob container |
| Workspace configurations | [Platform Name] (Cosmos DB) | N/A | Producer — stores workspace and tenant configurations in workspaces and configurations containers |

7.2 Data Governance

[Describe how data is governed on the platform. Replace [Client] references with actual client name and policies.]

The [Platform Name] adheres to the [Client] Data Governance and Classification Standard, which specifies [Client]’s requirements for the accurate classification of data and the level of protection applied to data and its use.

Data shared with the Generative AI services (Azure OpenAI) is used only for transient processing — it is not persisted within the AI model or used for training the AI model.

7.3 Data Migration

No data migration is required for the platform. The platform ingests data from upstream systems on-demand and does not replace any existing data stores.

7.4 Privacy and Data Protection

The platform processes data that may include Personally Identifiable Information (PII). The following protections are in place:

| Aspect | Description |
| --- | --- |
| Privacy Impact Assessment | > [Reference to PIA document or “To be conducted during delivery phase.”] |
| Personal Data Types | > [Types of personal data the platform may process, e.g. customer records, interaction history, correspondence, case notes.] |
| Compliance Obligations | > [Applicable privacy legislation, e.g. Australian Privacy Act, GDPR. [Client] Data Classification Standard.] |
| Data Subject Rights | Handled via existing [Client] data governance processes. |
| Network Isolation | All platform resources are accessible only from [Client] private networks via Private Endpoints. No public internet exposure. |
| Data Residency | All data is processed and stored within the [Client] Azure tenant ([Azure Region] region). |
| Transient Processing | Azure OpenAI processes data transiently — no data is persisted in AI models or used for model training. |
| Access Control | Role-based access controls ensure only authorised users can access processed data and AI outputs. |
| Encryption | Data is encrypted at rest and in transit across all platform components (TLS 1.2 minimum). |
| Retention | > [Data retention policies to be defined during delivery. Short retention periods recommended for transient AI workloads.] |

8 Security View

8.1 Information Classification

[Classify information types handled by the platform using [Client]’s information classification standard. The table below is an example — replace with actual data types and classifications.]

| Information Type | Classification Level | Non-Compliance and Exceptions |
| --- | --- | --- |
| Source data files | [Classification Level] | > [Compliance status and security measures applied.] |
| AI-generated outputs | [Classification Level] (derived) | > [Compliance status and security measures applied.] |
| Platform configuration data | Internal | Standard protection measures applied. |

8.2 Authentication and Authorisation

The platform defines the following user roles. Specific use case role mappings are documented in the respective SDDs.

| Conceptual Role | Description | App Interfaces | Authentication Type | Access |
| --- | --- | --- | --- | --- |
| Standard Users | Users who interact with the Chat Web App to query AI-generated insights and results. All data they interact with is data they are authorised to access. | Chat Web App | Azure Active Directory + MFA | Source data files, AI-generated outputs |
| Admin Users | Users who interact with the Admin Web App to configure AI processing parameters. They have elevated access for platform configuration management. | Chat Web App, Admin Web App | Azure Active Directory + MFA | Source data files, AI-generated outputs, Platform configurations |
| Technical Support | Users who interact with all components of the platform to provide support to Standard Users and Admin Users. | Chat Web App, Admin Web App, Azure Portal | Azure Active Directory + MFA | Source data files, AI-generated outputs, Platform configurations, Azure Resources |
| System Integrations | Automated processes (e.g. RPA components) that interact with the platform to upload source data files and trigger AI processing workflows. | Azure Function App (API) | Service Principal / Managed Identity | Source data files, AI-generated outputs |

Azure Web Apps (Chat Web App, Admin Web App) have Azure AD (Entra ID) authentication enabled via the built-in App Service Authentication feature.

[Document Azure Role Assignments (RBAC) per environment as applicable. The RBAC matrix below is an example; adjust for [Client]’s IAM model.]

Azure Role Assignments

[The RBAC roles below are assigned at the Resource Group level per environment. Adjust based on [Client]’s access management framework.]

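| Role | Standard User DEV | Standard User PPD | Standard User PRD | Admin User DEV | Admin User PPD | Admin User PRD | Technical Support DEV | Technical Support PPD | Technical Support PRD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Contributor | NO | NO | NO | NO | NO | NO | [As per CR] | NO | NO |
| Reader | NO | NO | NO | NO | YES | YES | [Inherited] | YES | YES |
| Storage Blob Data Contributor | NO | NO | NO | NO | YES | YES | [Inherited] | YES | YES |
| Cognitive Services OpenAI Contributor | NO | YES | YES | NO | YES | YES | [Inherited] | YES | YES |
| Search Service Contributor | NO | YES | YES | NO | YES | YES | [Inherited] | YES | YES |
| Search Index Data Contributor | NO | YES | YES | NO | YES | YES | [Inherited] | YES | YES |
| Storage Blob Data Reader | NO | YES | YES | NO | [Inherited] | [Inherited] | [Inherited] | [Inherited] | [Inherited] |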
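
When the matrix is adjusted for a client, it can help to keep it as data so that individual cells can be looked up or reviewed programmatically. The following is a minimal sketch, not part of the platform: it encodes three rows of the example matrix above, and the names `RBAC_MATRIX`, `assignment`, and `is_standing_grant` are hypothetical. Placeholder cells such as `[Inherited]` are kept verbatim rather than resolved.

```python
from typing import Dict, Tuple

ENVIRONMENTS = ("DEV", "PPD", "PRD")

# Placeholder value copied verbatim from the template; it is not resolved here.
INH = "[Inherited]"

# Subset of the example matrix: role -> user type -> cell values ordered as ENVIRONMENTS.
RBAC_MATRIX: Dict[str, Dict[str, Tuple[str, str, str]]] = {
    "Contributor": {
        "Standard User": ("NO", "NO", "NO"),
        "Admin User": ("NO", "NO", "NO"),
        "Technical Support": ("[As per CR]", "NO", "NO"),
    },
    "Reader": {
        "Standard User": ("NO", "NO", "NO"),
        "Admin User": ("NO", "YES", "YES"),
        "Technical Support": (INH, "YES", "YES"),
    },
    "Storage Blob Data Reader": {
        "Standard User": ("NO", "YES", "YES"),
        "Admin User": ("NO", INH, INH),
        "Technical Support": (INH, INH, INH),
    },
}


def assignment(role: str, user_type: str, environment: str) -> str:
    """Return the raw matrix cell for a role, user type, and environment."""
    index = ENVIRONMENTS.index(environment)
    return RBAC_MATRIX[role][user_type][index]


def is_standing_grant(role: str, user_type: str, environment: str) -> bool:
    """True only for an explicit YES; placeholders need manual review."""
    return assignment(role, user_type, environment) == "YES"
```

Treating only an explicit `YES` as a standing grant means inherited or CR-based access is never silently counted as permanent.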

8.3 Security Controls

| Security Item | Controls |
| --- | --- |
| Azure Role-Based Access Controls (RBAC) | Each user type requires a specific set of access controls as defined in the Azure Role Assignments table above. RBAC is enforced at the Resource Group level. |
| Change / Privileged Management | Changes to platform components must go through the standard [Client] DevOps change management process. Technical Support requests access to platform components when raising a Change Request (CR), and that access is revoked after the CR window closes. Privileged access management for cloud resources follows [Client] cloud standards. |
| Network Controls | Azure assets (Storage, Key Vault, AI Services, Cosmos DB, AI Search, Container Registry) only accept traffic from specified VNets via Private Endpoints. All PaaS services enforce network ACLs with default deny and authorised VNet subnet + IP range allow lists. Storage Accounts grant private link access to Document Intelligence, Speech, and Computer Vision for BYOS connectivity. Key Vault uses RBAC authorisation mode with AzureServices bypass. App Services have public_network_access_enabled = false and vnet_image_pull_enabled = true. Managed Identities are used instead of key-based authentication for all communication between Azure PaaS components. Data is encrypted via TLS (1.2 minimum) between all components. |
| DevOps | GitHub encrypts data at rest and in transit. 2FA is supported. Azure Key Vault is used for secrets management. GitHub performs code scanning for vulnerability monitoring. CI/CD pipelines use federated credentials (OIDC), so no secrets are stored. |

8.4 Auditing and Logging

Auditing and logging are covered by the Logging guardrail (see 3.5 Guardrails and Compliance). Key aspects:

  • All applicable Azure resources have diagnostic settings enabled, forwarding logs and metrics to the centralised Log Analytics workspace (30-day retention).
  • Resources with logging enabled: Key Vault, Storage Account Blobs, App Service, Azure Function App, Cosmos DB, Container Registry, AI Services, AI Search.
  • Application-level logging is captured via Application Insights with OpenTelemetry instrumentation.
  • Dashboards and Workbooks provide consolidated views of platform health, performance, and security events.
  • Alert Rules are configured to trigger notifications when specified conditions are met (e.g. service health degradation, error rate thresholds).
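
The resource list above lends itself to a simple coverage check during environment reviews. The sketch below is illustrative only: it assumes an inventory has already been gathered as a mapping from resource type to whether diagnostic settings are enabled, and the names `REQUIRED_DIAGNOSTICS` and `missing_diagnostics` are hypothetical.

```python
from typing import Dict, List

# Resource types this section lists as having logging enabled.
REQUIRED_DIAGNOSTICS = (
    "Key Vault",
    "Storage Account Blobs",
    "App Service",
    "Azure Function App",
    "Cosmos DB",
    "Container Registry",
    "AI Services",
    "AI Search",
)


def missing_diagnostics(enabled: Dict[str, bool]) -> List[str]:
    """Return required resource types whose diagnostic settings are absent or disabled."""
    return [name for name in REQUIRED_DIAGNOSTICS if not enabled.get(name, False)]
```

An empty result indicates that every listed resource type reports diagnostic settings; how the inventory itself is collected is outside the scope of this sketch.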

9 Support View

9.1 Service Classification

[Classify the service level for platform components (e.g. Platinum, Gold, Silver, Bronze). To be determined during delivery phase.]

9.2 Support Model

[Describe the support model: who supports what, escalation paths, etc. The table below shows a typical pattern; adjust for [Client]’s support structure.]

| Team | Responsibility |
| --- | --- |
| Business Operations – Admin Users | Escalate bugs and enhancement requests to the other support teams when required. |
| [Client] IT – Technical Support | Act as the first point of escalation for enhancements and bugs relating to the platform. |
| [Client] CloudOps – Technical Support | Act as the second point of escalation when IaC or IAM changes need to be deployed. |
| calab.ai – Technical Support | Act as the second point of escalation when platform-specific bugs or enhancements require additional assistance, working in collaboration with the [Client] IT team. |

9.3 Non-Functional Requirements

9.3.1 Scalability

  • The platform is designed to support multiple concurrent use cases with independent data processing pipelines.
  • Azure AI services can be scaled horizontally by adjusting TPM quotas (OpenAI), search replicas/partitions (AI Search), and compute instances (ASE).
  • ASE provides dedicated compute that can be scaled within the environment to handle increased workload. CPU-based autoscaling is available on App Service Plans within the ASE (scale out above 70% CPU, scale in below 25% CPU). Isolated v2 tier SKUs (I1v2/I2v2/I3v2) provide different compute capacities.
  • Cosmos DB Serverless mode automatically scales with demand; can be migrated to provisioned throughput if usage patterns warrant.
  • Storage Accounts scale automatically; the Standard_GRS SKU provides geo-redundancy.
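
The CPU-based autoscaling thresholds above can be expressed as a small decision function. This is a simplified sketch under stated assumptions: the instance limits (`min_instances`, `max_instances`) are hypothetical defaults, and real autoscale rules evaluate an average over a time window with cooldown periods rather than a single CPU sample.

```python
def next_instance_count(
    cpu_percent: float,
    instances: int,
    min_instances: int = 1,  # hypothetical lower bound
    max_instances: int = 3,  # hypothetical upper bound
) -> int:
    """Apply the 70% / 25% CPU thresholds to decide the next instance count."""
    if cpu_percent > 70 and instances < max_instances:
        return instances + 1  # add one instance under sustained load
    if cpu_percent < 25 and instances > min_instances:
        return instances - 1  # remove one instance when load is low
    return instances  # between thresholds, or already at a limit
```

The gap between the two thresholds is deliberate: it avoids repeatedly adding and removing instances when CPU usage hovers around a single value.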

9.3.2 Maintainability

  • All infrastructure is provisioned via Bicep templates (IaC), enabling version-controlled and repeatable deployments via Azure Developer CLI.
  • Application code and infrastructure definitions are stored in GitHub with CI/CD pipelines (GitHub Actions) for automated deployment.
  • Tag-driven release process ensures traceability: every deployment maps to a semantic version tag.
  • Environment parity (DEV/PPD/PRD) ensures that changes can be tested in non-production before promotion.
  • Platform configurations are stored as JSON in Azure Blob Storage (config container), enabling versioning and rollback.
  • Python dependencies managed via Poetry for reproducible builds. Frontend dependencies managed via npm.
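
The tag-driven release process can be illustrated with a small tag parser. The sketch below assumes the tag formats given in the glossary (`v1.0.0-rc.1` for a Release Candidate deployed to staging, `v1.0.0` for General Availability deployed to production); the function name `release_target` and the exact pattern are hypothetical and are not taken from the platform's pipelines.

```python
import re
from typing import Optional

# vMAJOR.MINOR.PATCH with an optional -rc.N suffix (assumed tag format).
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-rc\.(\d+))?$")


def release_target(tag: str) -> Optional[str]:
    """Map a release tag to its target, or None if the tag is not a valid version tag."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    # A tag with an rc suffix is a Release Candidate; otherwise it is a GA release.
    return "staging" if match.group(4) is not None else "production"
```

Rejecting malformed tags outright keeps every deployment traceable to a well-formed semantic version.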

9.3.3 Security

  • All resources are deployed behind Private Endpoints with no public internet exposure. All PaaS services enforce network ACLs with default deny and authorised subnet/IP allow lists.
  • App Services have public_network_access_enabled = false and vnet_image_pull_enabled = true to ensure container images are pulled via the VNet.
  • User authentication requires Azure AD (Entra ID) with MFA enabled.
  • Managed Identities (SystemAssigned) are used for all service-to-service communication, eliminating stored credentials.
  • Secrets are managed in Azure Key Vault, with access restricted to authorised identities via Azure RBAC (Key Vault uses RBAC authorisation mode, as noted in 8.3 Security Controls).
  • Network traffic is inspected by PaloAlto NVA for all ingress, egress, and inter-VNet flows.
  • DevOps security is enforced via GitHub code scanning and vulnerability monitoring.
  • CI/CD pipelines use OIDC federated credentials — no client secrets stored in pipelines.
  • TLS 1.2 minimum enforced across all components. HTTPS-only access to all web applications.
  • Content Safety service provides content moderation for AI-generated outputs.
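
Several of the controls above are simple settings that can be checked mechanically. The following sketch validates an App Service configuration, represented as a plain dictionary, against those controls. The keys `public_network_access_enabled` and `vnet_image_pull_enabled` come from the text above; `https_only` and `min_tls_version` are hypothetical key names chosen for illustration, as is the function name.

```python
from typing import Any, Dict, List


def baseline_violations(app: Dict[str, Any]) -> List[str]:
    """Return human-readable violations of the security baseline for one app."""
    violations: List[str] = []
    # Missing values are treated as non-compliant (fail closed).
    if app.get("public_network_access_enabled", True):
        violations.append("public network access must be disabled")
    if not app.get("vnet_image_pull_enabled", False):
        violations.append("container images must be pulled via the VNet")
    if not app.get("https_only", False):  # hypothetical key
        violations.append("HTTPS-only access must be enforced")
    tls = str(app.get("min_tls_version", "1.0"))  # hypothetical key
    if tuple(int(part) for part in tls.split(".")) < (1, 2):
        violations.append("minimum TLS version must be 1.2 or higher")
    return violations
```

Defaulting absent settings to the non-compliant value means an incomplete configuration is reported rather than silently passed.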

9.3.4 Reusability

  • Platform capabilities (document processing, transcription, indexing, search, LLM reasoning, content safety) are designed as shared services that can be consumed by any registered use case.
  • New use cases are onboarded by registering in the 4 Use Case Register and creating an SDD — without requiring new infrastructure provisioning.
  • Configuration-driven architecture allows behaviour to be customised per use case via JSON configuration files managed through the Admin App.
  • Multiple orchestration strategies (OpenAI Agents, OpenAI Function Calling, LangChain, Prompt Flow) can be selected per use case without platform changes.

The following components are shared across use cases:

| Name | Layer | Description | Related Feature / Epic |
| --- | --- | --- | --- |
| Document Processing Pipeline | AI Pre-Trainer | Ingestion, chunking, embedding, and indexing pipeline configurable for multiple document types | Shared across all use cases |
| Conversational Interface | AI Engine | Chat Web App with streaming, citations, voice, and agent personas | Shared across all use cases |
| Admin Configuration | AI Pre-Trainer | Workspace and prompt configuration management via Admin Web App | Shared across all use cases |
| Monitoring Stack | Operational Services | App Insights, Log Analytics, Dashboards, Workbooks, Alert Rules | Shared across all use cases |
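
To illustrate the configuration-driven approach, the sketch below loads a per-use-case JSON configuration and validates the selected orchestration strategy. The JSON keys (`use_case`, `orchestration_strategy`) and the strategy identifiers are hypothetical; the actual schema is defined by the platform's configuration files, not by this example.

```python
import json
from typing import Any, Dict

# Hypothetical identifiers for the four orchestration strategies named above.
SUPPORTED_STRATEGIES = {
    "openai_agents",
    "openai_function_calling",
    "langchain",
    "prompt_flow",
}


def load_use_case_config(raw: str) -> Dict[str, Any]:
    """Parse a use-case configuration and reject unknown orchestration strategies."""
    config = json.loads(raw)
    strategy = config.get("orchestration_strategy")
    if strategy not in SUPPORTED_STRATEGIES:
        raise ValueError(f"unsupported orchestration strategy: {strategy!r}")
    return config
```

Validating at load time means a misconfigured use case fails fast, before any request reaches an orchestration layer.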

9.3.5 Recoverability

  • Infrastructure can be fully rebuilt from Bicep definitions and GitHub repositories using azd provision.
  • Application containers can be rebuilt from Dockerfiles and redeployed from ACR.
  • AI-generated outputs can be regenerated by reprocessing source data through the platform pipelines.
  • No critical state is stored exclusively within the platform that cannot be reconstructed from source systems or IaC definitions.
  • Azure services are backed by Microsoft SLAs and provide built-in redundancy for individual components.
  • Cosmos DB Serverless with Session consistency provides built-in resilience for conversation state.

9.4 Monitoring and Observability

| Category | Tool / Service | Description |
| --- | --- | --- |
| Application Monitoring | Application Insights | Captures live application logs, performance metrics, and usage patterns. OpenTelemetry instrumentation for distributed tracing. |
| Infrastructure Monitoring | Log Analytics Workspace | Collects telemetry from all Azure resources for health and diagnostic analysis. SKU: PerGB2018, 30-day retention. |
| Alerting | Alert Rules | Configured for service health, error rates, and performance thresholds with notification actions. |
| Dashboards | Azure Monitor Dashboard | Centralised, real-time views of platform health and performance metrics. Includes charts for sessions, users, failures, response time, CPU, and memory. |
| Interactive Reporting | Azure Monitor Workbook | Interactive reporting combining text, queries, and visualisations for deep-dive analysis. |
| Diagnostic Settings | Azure Monitor | Enabled on all applicable resources, forwarding logs and metrics to Log Analytics. Resources covered: Key Vault, Storage Blobs, App Service, Function App, Cosmos DB, Container Registry, AI Services, AI Search. |
| Health Checks | App Service Health Check | Health check endpoint (/api/health) configured on Chat Web App with automatic instance replacement on failure. |
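
As an illustration of what a health check endpoint such as /api/health might return, the sketch below aggregates dependency checks into a status code and JSON body. It is framework-agnostic and hypothetical: the check names and the response shape are assumptions, not the Chat Web App's actual implementation. App Service Health Check treats responses in the 200 to 299 range as healthy, so any failing dependency is reported as 503.

```python
import json
from typing import Callable, Dict, Tuple


def health_response(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each dependency check and return (HTTP status code, JSON body)."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises is reported as failed rather than crashing the endpoint.
            results[name] = False
    healthy = all(results.values())
    body = json.dumps({"status": "healthy" if healthy else "unhealthy", "checks": results})
    return (200 if healthy else 503), body
```

A web framework route for the health path would call this function and return the status code and body unchanged.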

10 Appendix

10.1 Glossary

| Acronym / Term | Full Name / Description |
| --- | --- |
| PAD | Platform Architecture Document |
| OAD | Opportunity Assessment Document |
| SDD | Solution Design Document |
| ASE | App Service Environment: a dedicated, isolated Azure hosting environment for App Services and Function Apps. Version 3 (ASEv3) is used, with Internal Load Balancing (ILB) mode for fully private traffic routing. |
| NVA | Network Virtual Appliance: a virtual machine (PaloAlto) that provides network security functions such as firewalling and traffic inspection. |
| VNet | Virtual Network: Azure’s network isolation construct for hosting cloud resources. |
| PaaS | Platform as a Service: Azure managed services (e.g. Cosmos DB, AI Search, Storage Accounts). |
| IaC | Infrastructure as Code: the practice of managing infrastructure through version-controlled code (Bicep). |
| RBAC | Role-Based Access Control: Azure’s authorisation model for granting granular access to resources. |
| MFA | Multi-Factor Authentication: requiring multiple verification factors for user authentication. |
| RAG | Retrieval Augmented Generation: a pattern that combines information retrieval with generative AI for grounded responses. |
| LLM | Large Language Model: AI models (e.g. GPT-4o) capable of natural language understanding and generation. |
| PII | Personally Identifiable Information: data that can be used to identify an individual. |
| ZPA | Zscaler Private Access: zero-trust network access service for secure remote connectivity. |
| WVD | Windows Virtual Desktop: Azure-hosted virtual desktop infrastructure (since renamed Azure Virtual Desktop). |
| TPM | Tokens Per Minute: Azure OpenAI rate limiting metric for API consumption. |
| GRS | Geo-Redundant Storage: Azure storage redundancy option replicating data across paired regions. |
| ACR | Azure Container Registry: managed Docker container registry service in Azure. |
| OIDC | OpenID Connect: authentication protocol used for federated identity in CI/CD pipelines. |
| SDWAN | Software-Defined Wide Area Network: technology for site-to-site connectivity between corporate offices and Azure. |
| ASP | App Service Plan: the set of compute resources that hosts Azure web apps. Plans can run on shared multi-tenant infrastructure or, as on this platform, on Isolated v2 SKUs within an ASE. |
| ILB | Internal Load Balancer: a load balancer mode where all traffic is routed internally within a VNet with no public-facing endpoints. |
| BYOS | Bring Your Own Storage: a pattern where Azure AI services (Speech, Document Intelligence, Computer Vision) access a customer-managed Storage Account directly via private link. |
| PE | Private Endpoint: a network interface that connects privately and securely to an Azure PaaS service via Azure Private Link. |
| CR | Change Request: formal request to make changes to production systems or access. |
| RC | Release Candidate: a pre-release version tag (e.g. v1.0.0-rc.1) deployed to staging for validation. |
| GA | General Availability: a stable release version tag (e.g. v1.0.0) deployed to production. |

10.2 Archived Designs

[If any platform designs have been superseded, archive them here for reference.]