34 System Architectures

34.1 Introduction

Modern machine learning systems require sophisticated architectural patterns to handle the complexity of production deployments (Sculley et al. 2015; Zhou, Yu, and Ding 2020). These systems must address the unique challenges of deploying, scaling, and maintaining data applications in production environments.

We will explore four fundamental architectural patterns and three critical communication paradigms that form the backbone of scalable ML systems.

34.2 Architectural Patterns

Architecture patterns provide the scaffolding for organizing components, managing data flow, and ensuring reliability. Successful machine learning systems often combine multiple patterns, for example using a layered architecture for vertical organization and microservices for horizontal decomposition.

Layered Architecture

The layered architecture, also known as \(n\)-tier architecture, is the most fundamental pattern for organizing ML systems into horizontal layers with dedicated responsibilities (Fowler 2002). The architecture is particularly effective for traditional ML applications (Buschmann et al. 1996).

The presentation layer handles user interfaces and API endpoints, the business logic layer contains feature engineering and model inference logic, the data access layer manages interactions with data sources, and so on down the stack (Figure 34.1). An advantage of the layered architecture is the clear separation of concerns: teams can work independently on different layers, provided well-defined interfaces between the layers are maintained.

```mermaid
graph TD
    A[Presentation Layer<br/>Web UI, Mobile Apps] --> B[Application Layer<br/>Business Logic, APIs]
    B --> C[ML Service Layer<br/>Model Inference, Feature Engineering]
    C --> D[Data Access Layer<br/>Database, File Systems]
    D --> E[Infrastructure Layer<br/>Compute, Storage, Networking]
```

Figure 34.1: The layered (\(n\)-tier) architecture organizes an ML system into horizontal tiers with dedicated responsibilities.
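To make the layering concrete, here is a minimal Python sketch of the three upper tiers. All class and method names are illustrative, and the "model" is a stand-in computation; the point is that each layer depends only on the layer directly below it.

```python
"""Minimal sketch of a three-layer ML service; all names are illustrative."""

class DataAccessLayer:
    """Lowest layer shown here: the only component that touches storage."""
    def __init__(self):
        self._features = {"user_42": [0.1, 0.7, 0.3]}  # stand-in for a real store

    def get_features(self, entity_id: str) -> list[float]:
        return self._features.get(entity_id, [0.0, 0.0, 0.0])


class MLServiceLayer:
    """Middle layer: depends on the data access layer, never the reverse."""
    def __init__(self, dal: DataAccessLayer):
        self._dal = dal

    def predict(self, entity_id: str) -> float:
        features = self._dal.get_features(entity_id)
        return sum(features) / len(features)  # stand-in for real model inference


class PresentationLayer:
    """Top layer: exposes the system to clients, knows nothing about storage."""
    def __init__(self, ml: MLServiceLayer):
        self._ml = ml

    def handle_request(self, entity_id: str) -> dict:
        return {"entity": entity_id, "score": self._ml.predict(entity_id)}


# Wiring happens top-down; each layer sees only the layer directly below it.
app = PresentationLayer(MLServiceLayer(DataAccessLayer()))
print(app.handle_request("user_42"))
```

Because the dependencies point strictly downward, the ML service layer can be rewritten (say, swapping the averaging stand-in for a real model) without touching the presentation layer.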
Key Characteristics
- Separation of Concerns: Each layer has a specific responsibility, making the system easier to maintain and test. The ML service layer is isolated from presentation and data concerns.
- Unidirectional Dependencies: Higher layers depend on lower layers, but not vice versa. This creates a stable dependency structure.
- Scalability: Individual layers can be scaled independently based on demand patterns.
Use Cases
- Enterprise ML Applications: Where governance and clear boundaries are critical
- Batch Processing Systems: Traditional ETL pipelines with ML components
- Regulatory Environments: Where audit trails and separation of duties are required
Advantages
- Clear organizational structure
- Easy to understand and maintain
- Good for team specialization
- Supports incremental deployment
Disadvantages
- Can become rigid and slow to change
- May introduce unnecessary complexity for simple systems
- Difficult to implement changes that span multiple layers
- Performance overhead when data traverses multiple layers
Service-Oriented Architecture (SOA)
Service-oriented architecture organizes ML systems as a collection of loosely coupled services that communicate through well-defined interfaces (Erl 2005). The granularity of the individual services can vary. Services can include data preprocessing, feature engineering, model training, model serving, model monitoring, and so on. The capabilities of each service can be developed independently, using the best tools and frameworks for the specific task (Figure 34.2).
The enterprise service bus (ESB) is the middleware through which the services communicate; it has been described as the nervous system of service-oriented architectures. Built on protocols such as REST, SOAP, or gRPC, the ESB routes messages between services based on content or headers, and handles authentication and authorization, logging, and error handling.
SOA emphasizes service reuse and business capability alignment, making it particularly valuable for enterprise ML deployments (Papazoglou et al. 2007). In principle, you can replace the internals of the model inference service, for example, without impacting other services in the architecture, and a single model validation service can be reused by multiple applications across the enterprise.
graph TB subgraph "Service Bus / ESB" SB[Message Routing & Transformation] end subgraph "ML Services" MS1[Feature Engineering<br/>Service] MS2[Model Training<br/>Service] MS3[Model Inference<br/>Service] MS4[Model Validation<br/>Service] end subgraph "Business Services" BS1[Customer Service] BS2[Product Service] BS3[Order Service] end subgraph "Data Services" DS1[Data Warehouse<br/>Service] DS2[Streaming Data<br/>Service] DS3[Model Registry<br/>Service] end MS1 <--> SB MS2 <--> SB MS3 <--> SB MS4 <--> SB BS1 <--> SB BS2 <--> SB BS3 <--> SB DS1 <--> SB DS2 <--> SB DS3 <--> SB
Key Characteristics
- Service Contracts: Well-defined interfaces that specify how services interact, including data formats and communication protocols.
- Loose Coupling: Services are independently deployable and maintainable, with minimal dependencies on other services’ internal implementations.
- Service Reuse: ML models and data processing logic can be reused across multiple business applications.
Use Cases
- Multi-Application Environments: Where the same ML models serve multiple business applications
- Legacy System Integration: Wrapping existing ML models as services for broader consumption
- Cross-Functional Teams: Where different teams need to share ML capabilities
Advantages
- Reuse of ML components (develop once, use in many apps)
- Good for complex enterprise environments
- Supports gradual modernization
- Clear service boundaries
Disadvantages
- ESB can become a bottleneck
- Complex governance requirements
- Potential for service sprawl (where the number of services grows uncontrollably, leading to management and operational challenges)
- Higher operational complexity
Microservices Architecture
The microservices architecture is an extension of SOA in which the services are more granular and the communication protocols are more lightweight (typically REST APIs). As in SOA, each service operates independently and is responsible for a specific business capability or ML function (Newman 2015; Fowler and Lewis 2014). While SOA is an enterprise-wide architecture, microservices architectures are more application-specific. Services in SOA typically share data storage, whereas each microservice maintains its own. Finally, microservices can be deployed independently, while deploying a service in SOA requires more complex integration.
The granularity of the microservices is at the level of data ingestion, feature stores, model repositories, inference engines, etc. Key to the architecture is that each microservice owns its data and business logic; it can be modified without affecting the entire system (Figure 34.3).
The microservices approach has gained significant traction in application architectures in general, and in ML operations in particular, due to its scalability and deployment flexibility (Karmel 2020).
graph TB subgraph "API Gateway" AG[Load Balancing<br/>Authentication<br/>Rate Limiting] end subgraph "ML Microservices" M1[Feature Store<br/>Service] M2[Model A<br/>Inference] M3[Model B<br/>Inference] M4[A/B Testing<br/>Service] M5[Monitoring<br/>Service] M6[Data Validation<br/>Service] end subgraph "Data Layer" DB1[(Features DB)] DB2[(Model Store)] DB3[(Metrics DB)] DB4[(Logs)] end subgraph "Infrastructure" K8S[Kubernetes Cluster] MSG[Message Queue] REG[Service Registry] end AG --> M1 AG --> M2 AG --> M3 AG --> M4 M1 --> DB1 M2 --> DB2 M3 --> DB2 M4 --> M2 M4 --> M3 M5 --> DB3 M6 --> DB4 M1 -.-> MSG M2 -.-> MSG M3 -.-> MSG M4 -.-> MSG M5 -.-> MSG M1 -.-> REG M2 -.-> REG M3 -.-> REG M4 -.-> REG M5 -.-> REG
Key Characteristics
- Single Responsibility: Each microservice focuses on one ML capability (e.g., feature engineering, specific model inference, model monitoring).
- Independent Deployment: Services can be deployed, scaled, and updated independently without affecting other services.
- Decentralized Data Management: Each service manages its own data storage and state.
- Technology Diversity: Different services can use different technologies, frameworks, and programming languages.
Use Cases
- High-Scale ML Platforms: Systems serving millions of requests with different scaling requirements
- Rapid Development: Teams that need to iterate quickly on different ML components
- Cloud-Native Applications: Containerized ML workloads in Kubernetes environments
Advantages
- Independent scaling and deployment
- Technology flexibility
- Fault isolation
- Team autonomy
Disadvantages
- Increased operational complexity
- Network latency between services
- Data consistency challenges
- Monitoring and debugging complexity
Event-Driven Architecture
The event-driven architecture organizes ML systems around the generation of, detection of, and reaction to events (Hohpe and Woolf 2003). For example, a data ingestion event triggers a data processing pipeline, which in turn triggers a feature engineering pipeline, and the completion of model training triggers a deployment workflow (Figure 34.4).
The event-driven architecture pattern is particularly powerful for real-time ML systems that need to respond to business events and data changes (Chen and Zhang 2014; Akidau, Chernyak, and Lax 2018).
graph TB subgraph "Event Sources" ES1[User Interactions] ES2[IoT Sensors] ES3[Database Changes] ES4[File Uploads] ES5[API Calls] end subgraph "Event Streaming Platform" ESP[Apache Kafka<br/>Event Streams] end subgraph "Event Processors" EP1[Feature Engineering<br/>Processor] EP2[Real-time Inference<br/>Processor] EP3[Model Drift Detection<br/>Processor] EP4[Alert Generation<br/>Processor] end subgraph "Event Stores" EVS1[(Feature Store)] EVS2[(Model Predictions)] EVS3[(Monitoring Data)] EVS4[(Alerts)] end subgraph "Downstream Consumers" DC1[Mobile Apps] DC2[Web Dashboard] DC3[Email Service] DC4[Data Warehouse] end ES1 --> ESP ES2 --> ESP ES3 --> ESP ES4 --> ESP ES5 --> ESP ESP --> EP1 ESP --> EP2 ESP --> EP3 ESP --> EP4 EP1 --> EVS1 EP2 --> EVS2 EP3 --> EVS3 EP4 --> EVS4 EVS1 --> ESP EVS2 --> ESP EVS3 --> ESP EVS4 --> ESP ESP --> DC1 ESP --> DC2 ESP --> DC3 ESP --> DC4
Key Characteristics
- Event-First Design: The system is designed around events as first-class citizens, with all interactions happening through event publication and consumption.
- Loose Temporal Coupling: Producers and consumers do not need to be available at the same time, enabling asynchronous processing.
- Event Sourcing: Complete audit trail of all events, enabling replay and debugging capabilities.
Use Cases
- Real-Time Recommendation Systems: Responding to user behavior events instantly
- Fraud Detection: Processing transaction events in real-time
- IoT (Internet of Things) and Sensor Data: Processing continuous streams of sensor readings
- Model Monitoring: Responding to model performance degradation events
Advantages
- Excellent for real-time processing
- High scalability and fault tolerance
- Natural audit trail
- Loose coupling between components
Disadvantages
- Complex event schema evolution
- Difficult to implement transactions
- Debugging can be challenging
- Eventual consistency model
34.3 Communication Paradigms
Synchronous and Asynchronous Communication
The choice between synchronous and asynchronous communication patterns fundamentally impacts system performance, user experience, and architectural complexity (Tanenbaum and Van Steen 2016; Kleppmann 2017).
Synchronous Communication
```mermaid
sequenceDiagram
    participant Client
    participant API Gateway
    participant ML Service
    participant Database
    Note over Client, Database: Synchronous Communication
    Client->>+API Gateway: Request with data
    API Gateway->>+ML Service: Forward request
    ML Service->>+Database: Query features
    Database-->>-ML Service: Return features
    ML Service-->>-API Gateway: Return prediction
    API Gateway-->>-Client: Return response
    Note over Client, Database: Client waits for complete response
```
Characteristics
- Request-response pattern with immediate feedback
- Client blocks until receiving response
- Strong consistency guarantees
- Simpler error handling and debugging
ML Use Cases
- Real-Time Inference APIs: Credit scoring, fraud detection during transactions
- Interactive Applications: Chatbots, recommendation widgets
- Critical Decision Systems: Medical diagnosis, autonomous vehicle control
Trade-offs
- Advantages: Immediate results, simpler programming model, strong consistency
- Disadvantages: Poor scalability under load, cascading failures, resource blocking
Asynchronous Communication
```mermaid
sequenceDiagram
    participant Client
    participant API Gateway
    participant Message Queue
    participant ML Service
    participant Database
    participant Notification Service
    Note over Client, Notification Service: Asynchronous Communication
    Client->>+API Gateway: Submit request
    API Gateway->>Message Queue: Queue prediction job
    API Gateway-->>-Client: Return job ID
    Note over Client: Client continues other work
    Message Queue->>+ML Service: Process job
    ML Service->>+Database: Query features
    Database-->>-ML Service: Return features
    ML Service->>Notification Service: Send result
    ML Service-->>-Message Queue: Job complete
    Notification Service-->>Client: Notify result ready
```
Characteristics
- Fire-and-forget or callback-based patterns
- Client continues processing while request is handled
- Eventually consistent results
- Complex error handling and state management
ML Use Cases
- Batch Predictions: Large dataset scoring, model training
- Long-Running Inference: Complex computer vision, NLP tasks
- Background Processing: Feature engineering, model retraining
Trade-offs
- Advantages: Better scalability, fault tolerance, resource efficiency
- Disadvantages: Complex programming model, eventual consistency, difficult debugging
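A minimal sketch of the submit-and-wait pattern from the sequence diagram above, using an in-process queue and a worker thread as stand-ins for a real message broker (RabbitMQ, SQS, Kafka); all names are illustrative.

```python
"""Asynchronous job pattern with an in-process queue and worker thread."""
import queue
import threading
import time
import uuid

jobs: "queue.Queue[dict]" = queue.Queue()
results: dict = {}

def worker():
    while True:
        job = jobs.get()
        time.sleep(0.1)  # stand-in for long-running inference
        results[job["id"]] = {"score": 0.42, "status": "done"}
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(payload: dict) -> str:
    """Return immediately with a job ID; the client is free to do other work."""
    job_id = str(uuid.uuid4())
    jobs.put({"id": job_id, "payload": payload})
    return job_id

job_id = submit({"features": [1.0, 2.0]})
print("submitted", job_id)   # client continues immediately
jobs.join()                  # in a real system: poll for the result or get a callback
print("result:", results[job_id])
```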
Streaming and Batch Processing
The choice between streaming and batch processing determines how your ML system handles data flow and computation timing (Chen and Zhang 2014; Akidau, Chernyak, and Lax 2018). Modern ML systems often implement a Lambda architecture to combine both approaches (Marz and Warren 2015).
graph LR subgraph "Streaming Processing" SD[Streaming Data<br/>Kafka, Kinesis] --> SP[Stream Processor<br/>Flink, Spark Streaming] SP --> RT[Real-time Results<br/>Low Latency] SP --> FS[Feature Store<br/>Continuous Updates] end subgraph "Batch Processing" BD[Batch Data<br/>Data Lake, Warehouse] --> BP[Batch Processor<br/>Spark, Airflow] BP --> BR[Batch Results<br/>High Throughput] BP --> DW[Data Warehouse<br/>Scheduled Updates] end
Streaming Processing
Characteristics
- Continuous data processing as events arrive
- Low latency, high velocity processing
- Stateful stream processing capabilities
- Complex event processing (CEP) support
ML Applications
graph TD subgraph "Real-Time ML Pipeline" A[Live Data Stream<br/>User Events, Sensors] --> B[Feature Engineering<br/>Windowed Aggregations] B --> C[Online Inference<br/>Model Serving] C --> D[Real-Time Decisions<br/>Recommendations, Alerts] E[Model Updates<br/>Online Learning] --> C C --> F[Feedback Loop<br/>Performance Metrics] F --> E end
Use Cases
- Fraud Detection: Processing credit card transactions in real-time
- Recommendation Systems: Updating recommendations based on current user behavior
- Anomaly Detection: Monitoring system metrics and IoT sensor data
- Real-Time Personalization: Dynamic content adaptation
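Windowed aggregation is the workhorse of streaming feature engineering. The pure-Python sketch below computes a sliding-window feature (transactions per card in the last 60 seconds, a typical fraud signal) in a single process; Flink or Spark Structured Streaming provide the same windowing semantics distributed and fault-tolerant.

```python
"""Sliding-window feature over a simulated event stream."""
from collections import defaultdict, deque

WINDOW_SECONDS = 60
events_by_card = defaultdict(deque)  # card_id -> timestamps inside the window

def txn_count_last_minute(card_id: str, now: float) -> int:
    window = events_by_card[card_id]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # evict events that slid out of the window
    return len(window)

# Simulated stream of (timestamp, card_id) events.
stream = [(0.0, "c1"), (10.0, "c1"), (15.0, "c2"), (70.0, "c1")]
for ts, card in stream:
    print(card, "count in last minute:", txn_count_last_minute(card, ts))
# For c1 the counts are 1, 2, then 2 at t=70, because the t=0 event is evicted.
```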
Batch Processing
Characteristics
- Scheduled processing of accumulated data
- High throughput, higher latency
- Simpler programming model
- Better resource utilization for large data sets
ML Applications
graph TD subgraph "Batch ML Pipeline" A[Historical Data<br/>Daily/Weekly Dumps] --> B[Feature Engineering<br/>Large-Scale Aggregations] B --> C[Model Training<br/>Offline Learning] C --> D[Model Validation<br/>Backtesting] D --> E[Model Deployment<br/>Batch Inference] E --> F[Results Storage<br/>Batch Predictions] end
Use Cases
- Customer Segmentation: Monthly analysis of customer behavior patterns
- Demand Forecasting: Weekly/monthly sales predictions
- Model Training: Periodic retraining on historical data
- Reporting and Analytics: Daily/weekly ML model performance reports
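A batch job reduces to pull, score, persist. The skeleton below uses CSV files and a toy linear model as stand-ins for a warehouse and a trained model; all paths and column names are hypothetical, and a scheduler such as Airflow or cron would trigger the run.

```python
"""Skeleton of a scheduled batch-scoring job."""
import csv

def load_rows(path: str) -> list[dict]:
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def score(row: dict) -> float:
    return 0.1 * float(row["recency"]) + 0.9 * float(row["frequency"])  # toy model

def run_batch_job(in_path: str, out_path: str) -> None:
    rows = load_rows(in_path)                    # 1. pull the accumulated data
    for row in rows:                             # 2. score the whole batch at once
        row["score"] = f"{score(row):.4f}"
    with open(out_path, "w", newline="") as f:   # 3. persist results for consumers
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

# A scheduler would invoke: run_batch_job("customers_2024-06.csv", "scores.csv")
```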
Hybrid Approaches
Many production ML systems combine both patterns in a Lambda architecture (Figure 34.10).
graph TB subgraph "Lambda Architecture" A[Raw Data] --> B[Batch Layer] A --> C[Speed Layer] B --> D[Batch Views] C --> E[Real-time Views] D --> F[Serving Layer] E --> F F --> G[Query Interface] end
Push and Pull Models
The choice between push and pull communication models affects how data flows through your ML system and impacts scalability, reliability, and resource utilization (Hohpe and Woolf 2003; Richardson 2018).
Push Model (Event-Driven)
In the push model, data producers actively send data to consumers when events occur or data becomes available.
graph TD subgraph "Push Model Architecture" A[Data Source<br/>User Events] --> B[Event Publisher<br/>Kafka Producer] B --> C[Message Broker<br/>Kafka Topic] C --> D[ML Consumer 1<br/>Feature Engineering] C --> E[ML Consumer 2<br/>Real-time Inference] C --> F[ML Consumer 3<br/>Monitoring] G[Model Updates] --> H[Model Registry] H --> I[Push Notification] I --> J[Model Servers] end
Characteristics
- Immediate Delivery: Data is delivered as soon as it’s available
- Event-Driven: Consumers react to events rather than polling
- Backpressure Handling: Producers must handle cases where consumers can’t keep up
- Stateful Connections: Often requires persistent connections between producers and consumers
Use Cases
- Real-Time Feature Updates: Pushing new feature values to online feature stores
- Model Deployment: Pushing updated models to inference servers
- Alert Systems: Pushing notifications when model performance degrades
- Live Dashboards: Pushing real-time ML metrics to monitoring systems
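A minimal sketch of push-based model deployment: servers register a callback with the registry and are notified the moment a new version is published, so they never poll. All names are illustrative.

```python
"""Push sketch: the registry notifies subscribed model servers on publish."""

class ModelRegistry:
    def __init__(self):
        self._subscribers = []
        self.latest = "v1"

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, version: str):
        self.latest = version
        for notify in self._subscribers:
            notify(version)  # push to every consumer immediately

class ModelServer:
    def __init__(self, name: str):
        self.name, self.version = name, "v1"

    def on_new_model(self, version: str):
        self.version = version
        print(f"{self.name}: hot-swapped to {version}")

registry = ModelRegistry()
for server in (ModelServer("server-a"), ModelServer("server-b")):
    registry.subscribe(server.on_new_model)

registry.publish("v2")  # both servers update without ever asking
```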
Pull Model
In the pull model, consumers actively request data from producers on a schedule or when needed, typically by polling at a fixed interval.
graph TD subgraph "Pull Model Architecture" A[Data Source<br/>Feature Store] B[ML Service<br/>Inference Engine] C[Model Registry] D[Monitoring Service] B --> |Poll for features| A B --> |Check for model updates| C D --> |Poll metrics| B E[Batch Job Scheduler] --> |Trigger every hour| F[Feature Engineering] F --> |Pull raw data| G[Data Warehouse] end
Characteristics
- Consumer-Controlled: Consumers decide when to request data
- Stateless: No persistent connections required
- Polling Overhead: Regular polling can waste resources if no new data
- Eventual Consistency: Some delay between data availability and consumption
Use Cases
- Batch Feature Retrieval: Pulling features for offline model training
- Scheduled Model Updates: Periodically checking for new model versions
- Health Checks: Polling ML services for status and performance metrics
- Data Synchronization: Pulling updates from external data sources
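The same model-update scenario under the pull model: the server polls on a fixed interval and usually finds nothing new, which is the polling overhead noted above, and the interval bounds how stale the server can be. Names are again illustrative.

```python
"""Pull sketch: an inference server polls the registry on a fixed interval."""
import time

class ModelRegistry:
    def __init__(self):
        self.latest = "v1"

registry = ModelRegistry()
current_version = "v1"

def poll_once():
    global current_version
    if registry.latest != current_version:   # most polls find nothing new
        current_version = registry.latest
        print("pulled new model:", current_version)

for tick in range(3):
    if tick == 1:
        registry.latest = "v2"               # someone publishes between polls
    poll_once()
    time.sleep(0.1)                          # the poll interval bounds staleness
```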
Comparison and Trade-offs
graph LR subgraph "Push Model" A[Low Latency<br/>✓] B[Resource Efficient<br/>✓] C[Complex Error Handling<br/>✗] D[Backpressure Issues<br/>✗] end subgraph "Pull Model" E[Simple Error Handling<br/>✓] F[No Backpressure<br/>✓] G[Higher Latency<br/>✗] H[Polling Overhead<br/>✗] end
34.4 Architectural Decision Framework
When designing ML systems, consider these factors to choose appropriate patterns (Bass, Clements, and Kazman 2012; Richards and Ford 2020):
Latency Requirements
- Sub-second: Event-driven + Streaming + Push
- Seconds to minutes: Microservices + Asynchronous + Pull
- Hours to days: Layered + Batch + Pull
Scale Requirements
- High throughput: Microservices or Event-driven
- High availability: SOA or Microservices
- Global distribution: Event-driven with regional processing
Team Structure
- Small team: Layered architecture
- Multiple teams: SOA or Microservices
- DevOps maturity: Microservices + Event-driven
Common Architectural Combinations
graph TD subgraph " " A[Event-Driven Architecture<br/>+ Microservices] B[Streaming Processing<br/>+ Push Model] C[Synchronous Inference<br/>+ Asynchronous Training] end
graph TD subgraph " " D[Layered Architecture<br/>+ SOA] E[Batch Processing<br/>+ Pull Model] F[Asynchronous Processing<br/>+ Scheduled Execution] end
graph TD subgraph " " G[Microservices<br/>+ Event-Driven] H[Streaming + Batch<br/>Lambda Architecture] I[Push for Real-time<br/>Pull for Batch] end
34.5 Conclusion
Modern ML systems require careful consideration of architectural patterns and communication paradigms (Sculley et al. 2015; Paleyes, Urma, and Lawrence 2022). The choice between layered, SOA, microservices, or event-driven architectures depends on your specific requirements for scalability, team structure, and operational complexity.
Similarly, the communication patterns you choose—synchronous vs asynchronous, streaming vs batch, push vs pull—will significantly impact system performance, reliability, and user experience.
Key Takeaways
- No single architecture fits all use cases—hybrid approaches are often necessary
- Start simple and evolve—begin with layered architectures and evolve to microservices as needed
- Consider team capabilities—more complex architectures require more operational maturity
- Design for your specific requirements—latency, scale, and consistency needs drive architectural decisions
- Plan for evolution—architectures should support changing requirements over time
Further Reading
- Martin Fowler’s “Microservices” articles (Fowler and Lewis 2014)
- “Designing Data-Intensive Applications” by Martin Kleppmann (Kleppmann 2017)
- “Building Microservices” by Sam Newman (Newman 2015)
- “Streaming Systems” by Tyler Akidau, Slava Chernyak, and Reuven Lax (Akidau, Chernyak, and Lax 2018)
- “Machine Learning Engineering” by Andriy Burkov (Burkov 2020)