Virtual Banker Architecture

Overview

The Virtual Banker is a multi-layered system that provides a digital-human banking experience with a video-realistic avatar, real-time voice interaction, and an embeddable widget.

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Client Layer                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Embeddable Widget (React/TypeScript)                 │  │
│  │  - Chat UI                                             │  │
│  │  - Voice Controls                                      │  │
│  │  - Avatar View                                         │  │
│  │  - WebRTC Client                                       │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    Edge Layer                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ CDN          │  │ API Gateway  │  │ WebRTC       │     │
│  │ (Widget)     │  │ (Auth/Rate)  │  │ Gateway      │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    Core Services                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ Session      │  │ Orchestrator │  │ LLM Gateway  │     │
│  │ Service      │  │              │  │              │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ RAG Service  │  │ Tool/Action  │  │ Safety/      │     │
│  │              │  │ Service      │  │ Compliance   │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    Media Services                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ ASR Service  │  │ TTS Service  │  │ Avatar       │     │
│  │ (Streaming)  │  │ (Streaming)  │  │ Renderer     │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    Data Layer                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ PostgreSQL   │  │ Redis        │  │ Vector DB    │     │
│  │ (State)      │  │ (Cache)      │  │ (pgvector)   │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────┘

Data Flow

Voice Turn Flow

  1. User speaks → Widget captures audio via microphone
  2. Audio stream → WebRTC gateway → ASR service
  3. ASR → Transcribes to text (partial + final)
  4. Orchestrator → Sends transcript to LLM with context
  5. LLM → Generates response + tool calls + emotion tags
  6. TTS → Converts text to audio stream
  7. Avatar → Generates visemes, expressions, gestures
  8. Widget → Plays audio, displays captions, animates avatar
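
The steps above can be read as a chain of stages. The following is a minimal sketch of that chain, not the actual implementation: every type and function name (`TurnStages`, `runVoiceTurn`, `stubStages`) is hypothetical, and the stages are synchronous stubs, whereas the real ASR, LLM, and TTS services stream their results.

```typescript
// Hypothetical shapes for the data passed between stages; none come from the codebase.
interface Transcript { text: string; isFinal: boolean; }
interface LlmReply { text: string; toolCalls: string[]; emotion: string; }
interface AvatarCue { visemes: string[]; expression: string; }
interface WidgetFrame { audio: string; caption: string; cue: AvatarCue; }

// Each stage is a plain function so the wiring stays visible.
interface TurnStages {
  asr(audio: string): Transcript[];                      // step 3: partial + final results
  llm(transcript: string, context: string[]): LlmReply;  // steps 4-5
  tts(text: string): string;                             // step 6
  avatar(reply: LlmReply): AvatarCue;                    // step 7
}

function runVoiceTurn(audio: string, context: string[], stages: TurnStages): WidgetFrame | null {
  const results = stages.asr(audio);
  // Only a final transcript is forwarded to the LLM; partials would drive live captions.
  const finals = results.filter((r) => r.isFinal);
  if (finals.length === 0) return null;
  const transcript = finals[finals.length - 1].text;

  const reply = stages.llm(transcript, context);
  // Step 8: everything the widget needs to play audio, show captions, and animate.
  return {
    audio: stages.tts(reply.text),
    caption: reply.text,
    cue: stages.avatar(reply),
  };
}

// Stub stages for illustration only; strings stand in for audio buffers.
const stubStages: TurnStages = {
  asr: (audio) => [
    { text: audio.slice(0, 4), isFinal: false },
    { text: audio, isFinal: true },
  ],
  llm: (transcript) => ({ text: `You said: ${transcript}`, toolCalls: [], emotion: "neutral" }),
  tts: (text) => `audio(${text})`,
  avatar: (reply) => ({ visemes: reply.text.split(" "), expression: reply.emotion }),
};

const frame = runVoiceTurn("check my balance", [], stubStages);
```

A text turn would enter the same function after the ASR stage, passing the typed message directly as the transcript.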

Text Turn Flow

  1. User types → Widget sends text message
  2. Orchestrator → Processes the message, continuing from step 4 of the Voice Turn Flow (the audio capture and ASR steps are skipped)

Components

Backend Services

Session Service

  • Creates and manages sessions
  • Issues ephemeral tokens
  • Loads tenant configurations
  • Tracks session state
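
As an illustration of ephemeral tokens, the sketch below issues a token with a fixed lifetime and rejects it once that lifetime has passed. The `SessionStore` class, its field names, and the 60-second lifetime are assumptions made for the example. The clock and token generator are injected so the behaviour is deterministic; a real service would presumably use a cryptographically secure token source and a shared store rather than process memory.

```typescript
// Hypothetical session record; field names are illustrative, not taken from the codebase.
interface Session {
  id: string;
  tenantId: string;
  token: string;
  expiresAt: number; // epoch milliseconds
}

class SessionStore {
  private sessions: Record<string, Session> = {};
  private counter = 0;
  private ttlMs: number;
  private now: () => number;
  private newToken: () => string;

  // `newToken` is injected; production code should back it with a secure random source.
  constructor(ttlMs: number, now: () => number, newToken: () => string) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.newToken = newToken;
  }

  create(tenantId: string): Session {
    this.counter += 1;
    const session: Session = {
      id: `sess-${this.counter}`,
      tenantId,
      token: this.newToken(),
      expiresAt: this.now() + this.ttlMs,
    };
    this.sessions[session.token] = session;
    return session;
  }

  // Returns the session for a token, or null if the token is unknown or expired.
  validate(token: string): Session | null {
    if (!Object.prototype.hasOwnProperty.call(this.sessions, token)) return null;
    const session = this.sessions[token];
    if (this.now() >= session.expiresAt) {
      delete this.sessions[token]; // expired tokens are dropped on first use
      return null;
    }
    return session;
  }
}

// Usage with a fake clock so expiry can be stepped by hand.
let clock = 0;
let issued = 0;
const store = new SessionStore(60000, () => clock, () => `tok-${++issued}`);
const session = store.create("tenant-a");
```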

Conversation Orchestrator

  • Maintains conversation state machine
  • Routes messages to appropriate services
  • Handles barge-in (interruptions)
  • Synchronizes audio/video
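
One way to picture the conversation state machine and barge-in handling is a small transition table. The states and events below (`idle`, `listening`, `thinking`, `speaking`, and so on) are assumptions chosen for the sketch; the orchestrator's actual states are not listed in this document.

```typescript
// Hypothetical states and events for a single conversation.
type TurnState = "idle" | "listening" | "thinking" | "speaking";
type TurnEvent = "user_speech_start" | "final_transcript" | "response_ready" | "playback_done";

// Transition table: state -> event -> next state. Missing entries mean "ignore the event".
const transitions: Record<TurnState, Partial<Record<TurnEvent, TurnState>>> = {
  idle: { user_speech_start: "listening" },
  listening: { final_transcript: "thinking" },
  thinking: { response_ready: "speaking", user_speech_start: "listening" },
  speaking: { playback_done: "idle", user_speech_start: "listening" },
};

interface StepResult {
  state: TurnState;
  cancelPlayback: boolean; // true when pending or playing output should be stopped
}

function step(state: TurnState, event: TurnEvent): StepResult {
  const next = transitions[state][event];
  if (next === undefined) return { state, cancelPlayback: false };
  // Barge-in: the user starts talking while a response is being prepared or played.
  const bargeIn =
    event === "user_speech_start" && (state === "speaking" || state === "thinking");
  return { state: next, cancelPlayback: bargeIn };
}

// The avatar is mid-sentence and the user interrupts.
const interrupted = step("speaking", "user_speech_start");
```

Keeping the transitions in a table makes it easy to see which events are ignored in each state, for example a late `playback_done` arriving after the user has already interrupted.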

LLM Gateway

  • Multi-tenant prompt templates
  • Function/tool calling
  • Output schema enforcement
  • Model routing
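
Two of these responsibilities, prompt templating and output schema enforcement, can be sketched without any model call. The `{{placeholder}}` syntax, the `BankerReply` shape, and both function names are assumptions for illustration; the gateway's real template format and schema are not specified here.

```typescript
// Fills {{name}} placeholders from `vars`; unknown placeholders are left untouched.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : match,
  );
}

// Hypothetical reply shape: response text, an emotion tag, and requested tool calls.
interface BankerReply {
  text: string;
  emotion: string;
  toolCalls: { name: string }[];
}

// Parses raw model output and rejects anything that does not match the expected shape.
function parseReply(raw: string): BankerReply | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;
  const text = obj["text"];
  const emotion = obj["emotion"];
  const calls = obj["toolCalls"];
  if (typeof text !== "string" || typeof emotion !== "string") return null;
  if (!Array.isArray(calls)) return null;

  const toolCalls: { name: string }[] = [];
  for (const call of calls) {
    if (typeof call !== "object" || call === null) return null;
    const name = (call as Record<string, unknown>)["name"];
    if (typeof name !== "string") return null;
    toolCalls.push({ name });
  }
  return { text, emotion, toolCalls };
}

const prompt = renderPrompt("You are {{role}} at {{bank}}.", { role: "a banker", bank: "Acme" });
const reply = parseReply('{"text":"Hello","emotion":"warm","toolCalls":[{"name":"get_balance"}]}');
```

Returning `null` on any mismatch lets the caller decide whether to retry the model or fall back to a safe canned response.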

RAG Service

  • Document ingestion and embedding
  • Vector similarity search
  • Reranking
  • Citation formatting
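
The retrieval and citation steps can be illustrated in memory. This is a toy sketch with two-dimensional embeddings and hypothetical names (`Chunk`, `topK`, `formatCitations`); in a deployment backed by pgvector the similarity search would more likely run inside PostgreSQL, where the `<=>` operator computes cosine distance. Reranking is not shown.

```typescript
// Hypothetical stored chunk: source title, text, and its embedding vector.
interface Chunk {
  docTitle: string;
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Returns the k chunks most similar to the query embedding, best first.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Numbered citation list in retrieval order.
function formatCitations(chunks: Chunk[]): string {
  return chunks.map((c, i) => `[${i + 1}] ${c.docTitle}`).join("\n");
}

// Toy corpus; real embeddings would have hundreds of dimensions.
const corpus: Chunk[] = [
  { docTitle: "Fee Schedule", text: "Monthly account fees", embedding: [1, 0] },
  { docTitle: "Branch Hours", text: "Opening times", embedding: [0, 1] },
  { docTitle: "Overdraft Policy", text: "Overdraft charges", embedding: [0.9, 0.1] },
];
const hits = topK([1, 0], corpus, 2);
const citations = formatCitations(hits);
```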

Tool/Action Service

  • Tool registry and execution
  • Banking service integrations
  • Human-in-the-loop confirmations
  • Audit logging
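
A minimal sketch of how a registry, confirmation gate, and audit log might fit together follows. The `ToolRegistry` class, the tool names, and the outcome labels are hypothetical; real tools would call banking services rather than return strings.

```typescript
// Hypothetical tool definition; `run` stands in for a banking service call.
interface Tool {
  name: string;
  requiresConfirmation: boolean;
  run(args: Record<string, string>): string;
}

type Outcome = "executed" | "awaiting_confirmation" | "unknown_tool";
interface AuditEntry { tool: string; outcome: Outcome; }
interface ToolResult { outcome: Outcome; output: string | null; }

class ToolRegistry {
  private tools: Record<string, Tool> = {};
  readonly auditLog: AuditEntry[] = [];

  register(tool: Tool): void {
    this.tools[tool.name] = tool;
  }

  // Every attempt is recorded, including refused and unknown calls.
  execute(name: string, args: Record<string, string>, userConfirmed: boolean): ToolResult {
    let outcome: Outcome;
    let output: string | null = null;
    if (!Object.prototype.hasOwnProperty.call(this.tools, name)) {
      outcome = "unknown_tool";
    } else if (this.tools[name].requiresConfirmation && !userConfirmed) {
      // Human-in-the-loop: sensitive tools do not run until the user confirms.
      outcome = "awaiting_confirmation";
    } else {
      output = this.tools[name].run(args);
      outcome = "executed";
    }
    this.auditLog.push({ tool: name, outcome });
    return { outcome, output };
  }
}

const registry = new ToolRegistry();
registry.register({
  name: "get_balance",
  requiresConfirmation: false,
  run: (args) => `balance for ${args["account"]}`,
});
registry.register({
  name: "transfer_funds",
  requiresConfirmation: true,
  run: (args) => `transferred ${args["amount"]}`,
});
```

Calling `registry.execute("transfer_funds", { amount: "50" }, false)` returns `awaiting_confirmation` without running the tool; the same call with `true` executes it, and both attempts appear in `auditLog`.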

Frontend Widget

Components

  • ChatPanel: Main chat interface
  • VoiceControls: Push-to-talk, hands-free, volume
  • AvatarView: Video stream display
  • Captions: Real-time captions overlay
  • Settings: User preferences

Hooks

  • useSession: Session management
  • useConversation: Message handling
  • useWebRTC: WebRTC connection

Avatar System

Unreal Engine

  • Digital human character
  • Blendshapes for visemes/expressions
  • Animation blueprints
  • PixelStreaming for video output

Render Service

  • Controls Unreal instances
  • Manages GPU resources
  • Streams video via WebRTC

Security

  • JWT/SSO authentication
  • Ephemeral session tokens
  • PII redaction
  • Content filtering
  • Rate limiting
  • Audit trails
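
Two of these controls, PII redaction and rate limiting, are sketched below. The regular expressions are deliberately simple and would miss many real-world PII formats, and the token-bucket parameters are arbitrary; both `redactPii` and `TokenBucket` are hypothetical names, not part of the codebase.

```typescript
// Replaces email addresses and long digit runs (for example card or account numbers).
// Illustrative only: production redaction needs far broader coverage.
function redactPii(text: string): string {
  return text
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]")
    .replace(/\b\d{8,19}\b/g, "[NUMBER]");
}

// Token bucket: allows bursts up to `capacity`, refilled at `refillPerSec`.
class TokenBucket {
  private tokens: number;
  private last: number;
  private capacity: number;
  private refillPerSec: number;
  private now: () => number;

  constructor(capacity: number, refillPerSec: number, now: () => number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  // Returns true and consumes a token if the request is allowed.
  tryTake(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const redacted = redactPii("Email jane@example.com about card 4111111111111111");

// Fake clock in milliseconds so refill can be stepped by hand.
let nowMs = 0;
const bucket = new TokenBucket(2, 1, () => nowMs);
```

With a capacity of 2 and a refill rate of one token per second, two immediate requests pass, a third is rejected, and another is allowed once a second has elapsed.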

Accessibility

  • WCAG 2.1 AA compliance
  • Keyboard navigation
  • Screen reader support
  • Captions (always available)
  • Reduced motion support
  • ARIA labels

Scalability

  • Stateless services (behind load balancer)
  • Redis for session caching
  • PostgreSQL for persistent state
  • GPU cluster for avatar rendering
  • CDN for widget assets
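
The split between Redis and PostgreSQL suggests a cache-aside read path, which is also what lets any stateless instance behind the load balancer serve any session. The sketch below uses synchronous in-memory stand-ins; `SessionCache`, `SessionRepository`, and `loadSession` are hypothetical names, and real Redis and PostgreSQL clients would be asynchronous.

```typescript
interface SessionState {
  sessionId: string;
  tenantId: string;
  turnCount: number;
}

// Stand-ins for a Redis client and a PostgreSQL-backed repository.
interface SessionCache {
  get(key: string): SessionState | null;
  set(key: string, value: SessionState): void;
}
interface SessionRepository {
  load(sessionId: string): SessionState | null;
}

// Cache-aside read: try the cache, fall back to the database, then populate the cache.
function loadSession(
  id: string,
  cache: SessionCache,
  repo: SessionRepository,
): SessionState | null {
  const cached = cache.get(id);
  if (cached !== null) return cached;
  const stored = repo.load(id);
  if (stored !== null) cache.set(id, stored);
  return stored;
}

// In-memory fakes that count database reads.
let dbReads = 0;
const memory: Record<string, SessionState> = {};
const fakeCache: SessionCache = {
  get: (key) => (Object.prototype.hasOwnProperty.call(memory, key) ? memory[key] : null),
  set: (key, value) => {
    memory[key] = value;
  },
};
const fakeRepo: SessionRepository = {
  load: (sessionId) => {
    dbReads += 1;
    return sessionId === "sess-1" ? { sessionId, tenantId: "tenant-a", turnCount: 3 } : null;
  },
};

const firstRead = loadSession("sess-1", fakeCache, fakeRepo);  // reads the database
const secondRead = loadSession("sess-1", fakeCache, fakeRepo); // served from the cache
```

A real cache entry would also carry an expiry so that stale session state is eventually dropped.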