34  System Architectures

34.1 Introduction

Modern machine learning systems require sophisticated architectural patterns to handle the complexity of production deployments (Sculley et al. 2015; Zhou, Yu, and Ding 2020). These systems must address the unique challenges of deploying, scaling, and maintaining ML applications in production environments.

We will explore four fundamental architectural patterns and three critical communication paradigms that form the backbone of scalable ML systems.

34.2 Architectural Patterns

Architecture patterns provide the scaffolding for organizing components, managing data flow, and ensuring reliability. Successful machine learning systems often combine multiple patterns, for example using a layered architecture for vertical organization and microservices for horizontal decomposition.

Layered Architecture

The layered architecture, also known as \(n\)-tier architecture, is the most fundamental pattern for organizing ML systems into horizontal layers with dedicated responsibilities (Fowler 2002). The architecture is particularly effective for traditional ML applications (Buschmann et al. 1996).

The presentation layer handles user interfaces and API endpoints, the business logic layer contains feature engineering and model inference logic, the data access layer manages interactions with data sources, and so on (Figure 34.1). An advantage of the layered architecture is its clear separation of concerns: teams can work independently on different layers, provided well-defined interfaces between the layers are maintained.

graph TD
    A[Presentation Layer<br/>Web UI, Mobile Apps] --> B[Application Layer<br/>Business Logic, APIs]
    B --> C[ML Service Layer<br/>Model Inference, Feature Engineering]
    C --> D[Data Access Layer<br/>Database, File Systems]
    D --> E[Infrastructure Layer<br/>Compute, Storage, Networking]

Figure 34.1: Layered (\(n\)-tier) architecture.
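The layering above can be sketched in a few lines of code. This is a minimal, illustrative sketch, not a production design: each layer is a class that depends only on the layer directly below it, and all names and the stand-in "model" are hypothetical.

```python
# Minimal sketch of a layered ML service: each layer depends only on the
# layer directly below it, never the reverse. Names are illustrative.

class DataAccessLayer:
    """Manages interactions with data sources."""
    def __init__(self, feature_table):
        self._features = feature_table  # stand-in for a real database

    def get_features(self, entity_id):
        return self._features[entity_id]

class MLServiceLayer:
    """Feature engineering and model inference logic."""
    def __init__(self, data_access):
        self._data = data_access  # depends only on the layer below

    def predict(self, entity_id):
        features = self._data.get_features(entity_id)
        # Stand-in "model": the mean of the feature values.
        return sum(features) / len(features)

class PresentationLayer:
    """API endpoint facing the user."""
    def __init__(self, ml_service):
        self._ml = ml_service

    def handle_request(self, entity_id):
        return {"entity": entity_id, "score": self._ml.predict(entity_id)}

# Wire the layers bottom-up; dependencies point strictly downward.
dal = DataAccessLayer({"user-42": [1.0, 2.0, 3.0]})
api = PresentationLayer(MLServiceLayer(dal))
print(api.handle_request("user-42"))  # {'entity': 'user-42', 'score': 2.0}
```

Because the ML service layer only sees the data access interface, the in-memory dictionary could later be swapped for a database client without touching the presentation layer.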

Key Characteristics

  • Separation of Concerns: Each layer has a specific responsibility, making the system easier to maintain and test. The ML service layer is isolated from presentation and data concerns.
  • Unidirectional Dependencies: Higher layers depend on lower layers, but not vice versa. This creates a stable dependency structure.
  • Scalability: Individual layers can be scaled independently based on demand patterns.

Use Cases

  • Enterprise ML Applications: Where governance and clear boundaries are critical
  • Batch Processing Systems: Traditional ETL pipelines with ML components
  • Regulatory Environments: Where audit trails and separation of duties are required

Advantages

  • Clear organizational structure
  • Easy to understand and maintain
  • Good for team specialization
  • Supports incremental deployment

Disadvantages

  • Can become rigid and slow to change
  • May introduce unnecessary complexity for simple systems
  • Difficult to implement changes that span multiple layers
  • Performance overhead when data traverses multiple layers

Service-Oriented Architecture (SOA)

Service-oriented architecture organizes ML systems as a collection of loosely coupled services that communicate through well-defined interfaces (Erl 2005). The granularity of the individual services can vary. Services can include data preprocessing, feature engineering, model training, model serving, model monitoring, and so on. The capabilities of each service can be developed independently, using the best tools and frameworks for the specific task (Figure 34.2).

The enterprise service bus (ESB) is the middleware through which the services communicate. It has been described as the nervous system of service-oriented architectures. Built on protocols such as REST, SOAP, or gRPC, the ESB routes messages between services based on content or headers while handling cross-cutting concerns such as authentication and authorization, logging, and error handling.

SOA emphasizes service reuse and business capability alignment, making it particularly valuable for enterprise ML deployments (Papazoglou et al. 2007). In principle, you can replace the internals of the model inference service, for example, without impacting other services in the architecture, and a single model validation service can be reused across multiple applications in the enterprise.

graph TB
    subgraph "Service Bus / ESB"
        SB[Message Routing & Transformation]
    end
    
    subgraph "ML Services"
        MS1[Feature Engineering<br/>Service]
        MS2[Model Training<br/>Service]
        MS3[Model Inference<br/>Service]
        MS4[Model Validation<br/>Service]
    end
    
    subgraph "Business Services"
        BS1[Customer Service]
        BS2[Product Service]
        BS3[Order Service]
    end
    
    subgraph "Data Services"
        DS1[Data Warehouse<br/>Service]
        DS2[Streaming Data<br/>Service]
        DS3[Model Registry<br/>Service]
    end
    
    MS1 <--> SB
    MS2 <--> SB
    MS3 <--> SB
    MS4 <--> SB
    BS1 <--> SB
    BS2 <--> SB
    BS3 <--> SB
    DS1 <--> SB
    DS2 <--> SB
    DS3 <--> SB

Figure 34.2: Service-oriented architecture.
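The service contract idea is the key to the replaceability described above. The following hedged sketch, with hypothetical class and method names, shows a contract as an abstract interface: a business service depends only on the contract, so one inference implementation can be swapped for another without changing the consumer.

```python
# Illustrative sketch of a service contract: consumers depend only on the
# interface, so the inference service internals can be replaced without
# touching other services. All names are hypothetical.
from abc import ABC, abstractmethod

class InferenceContract(ABC):
    """Contract every model inference service must honor."""
    @abstractmethod
    def predict(self, features: list) -> float: ...

class LinearModelService(InferenceContract):
    def predict(self, features):
        return sum(features)  # stand-in for a linear model

class ThresholdModelService(InferenceContract):
    """Drop-in replacement honoring the same contract."""
    def predict(self, features):
        return 1.0 if sum(features) > 1.0 else 0.0

def business_service(model: InferenceContract, features):
    # The business service never sees the model internals.
    return {"decision": model.predict(features)}

print(business_service(LinearModelService(), [0.5, 0.25]))     # {'decision': 0.75}
print(business_service(ThresholdModelService(), [0.5, 0.25]))  # {'decision': 0.0}
```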

Key Characteristics

  • Service Contracts: Well-defined interfaces that specify how services interact, including data formats and communication protocols.
  • Loose Coupling: Services are independently deployable and maintainable, with minimal dependencies on other services’ internal implementations.
  • Service Reuse: ML models and data processing logic can be reused across multiple business applications.

Use Cases

  • Multi-Application Environments: Where the same ML models serve multiple business applications
  • Legacy System Integration: Wrapping existing ML models as services for broader consumption
  • Cross-Functional Teams: Where different teams need to share ML capabilities

Advantages

  • Reuse of ML components (develop once, use in many apps)
  • Good for complex enterprise environments
  • Supports gradual modernization
  • Clear service boundaries

Disadvantages

  • ESB can become a bottleneck
  • Complex governance requirements
  • Potential for service sprawl (where the number of services grows uncontrollably, leading to management and operational challenges)
  • Higher operational complexity

Microservices Architecture

The microservices architecture is an extension of SOA in which the services are more granular and the communication protocols are more lightweight (e.g., REST APIs). As in SOA, each service performs its tasks independently and is responsible for a specific business capability or ML function (Newman 2015; Fowler and Lewis 2014). Whereas SOA is an enterprise-wide architecture, microservices architectures are more application specific. Services in SOA typically share data storage, while each microservice owns its own data store. Microservices can be deployed independently, whereas service deployment in SOA requires more complex integration.

The granularity of the microservices is at the level of data ingestion, feature stores, model repositories, inference engines, etc. Key to the architecture is that each microservice owns its data and business logic; it can be modified without affecting the entire system (Figure 34.3).

The microservices approach has gained significant traction in application architectures in general, and in ML operations in particular, due to its scalability and deployment flexibility (Karmel 2020).

graph TB
    subgraph "API Gateway"
        AG[Load Balancing<br/>Authentication<br/>Rate Limiting]
    end
    
    subgraph "ML Microservices"
        M1[Feature Store<br/>Service]
        M2[Model A<br/>Inference]
        M3[Model B<br/>Inference]
        M4[A/B Testing<br/>Service]
        M5[Monitoring<br/>Service]
        M6[Data Validation<br/>Service]
    end
    
    subgraph "Data Layer"
        DB1[(Features DB)]
        DB2[(Model Store)]
        DB3[(Metrics DB)]
        DB4[(Logs)]
    end
    
    subgraph "Infrastructure"
        K8S[Kubernetes Cluster]
        MSG[Message Queue]
        REG[Service Registry]
    end
    
    AG --> M1
    AG --> M2
    AG --> M3
    AG --> M4
    
    M1 --> DB1
    M2 --> DB2
    M3 --> DB2
    M4 --> M2
    M4 --> M3
    M5 --> DB3
    M6 --> DB4
    
    M1 -.-> MSG
    M2 -.-> MSG
    M3 -.-> MSG
    M4 -.-> MSG
    M5 -.-> MSG
    
    M1 -.-> REG
    M2 -.-> REG
    M3 -.-> REG
    M4 -.-> REG
    M5 -.-> REG

Figure 34.3: Microservices architecture.
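The service registry in Figure 34.3 is what lets microservices be redeployed independently: callers look services up by name instead of hard-coding addresses. The following is a minimal in-memory sketch of that idea; all service names and endpoint URLs are hypothetical, and a real registry (e.g., one backed by a distributed store) would also handle health checks and load balancing.

```python
# Illustrative sketch of a service registry: each microservice registers
# its endpoint at startup, and callers discover services by name.
class ServiceRegistry:
    def __init__(self):
        self._services = {}

    def register(self, name, endpoint):
        self._services.setdefault(name, []).append(endpoint)

    def lookup(self, name):
        instances = self._services.get(name, [])
        if not instances:
            raise LookupError(f"no instances of {name!r} registered")
        return instances[0]  # a real registry would load-balance here

registry = ServiceRegistry()
registry.register("model-a-inference", "http://10.0.0.7:8080")
registry.register("feature-store", "http://10.0.0.9:8080")

# The A/B testing service discovers the inference service at call time,
# so Model A can be redeployed at a new address with no client change.
print(registry.lookup("model-a-inference"))  # http://10.0.0.7:8080
```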

Key Characteristics

  • Single Responsibility: Each microservice focuses on one ML capability (e.g., feature engineering, specific model inference, model monitoring).
  • Independent Deployment: Services can be deployed, scaled, and updated independently without affecting other services.
  • Decentralized Data Management: Each service manages its own data storage and state.
  • Technology Diversity: Different services can use different technologies, frameworks, and programming languages.

Use Cases

  • High-Scale ML Platforms: Systems serving millions of requests with different scaling requirements
  • Rapid Development: Teams that need to iterate quickly on different ML components
  • Cloud-Native Applications: Containerized ML workloads in Kubernetes environments

Advantages

  • Independent scaling and deployment
  • Technology flexibility
  • Fault isolation
  • Team autonomy

Disadvantages

  • Increased operational complexity
  • Network latency between services
  • Data consistency challenges
  • Monitoring and debugging complexity

Event-Driven Architecture

The event-driven architecture organizes ML systems around the generation of, detection of, and reaction to events (Hohpe and Woolf 2003). For example, a data ingestion event triggers a data processing pipeline, which in turn triggers a feature engineering pipeline, and the completion of model training triggers a deployment workflow (Figure 34.4).

The event-driven architecture pattern is particularly powerful for real-time ML systems that need to respond to business events and data changes (Chen and Zhang 2014; Akidau, Chernyak, and Lax 2018).

graph TB
    subgraph "Event Sources"
        ES1[User Interactions]
        ES2[IoT Sensors]
        ES3[Database Changes]
        ES4[File Uploads]
        ES5[API Calls]
    end
    
    subgraph "Event Streaming Platform"
        ESP[Apache Kafka<br/>Event Streams]
    end
    
    subgraph "Event Processors"
        EP1[Feature Engineering<br/>Processor]
        EP2[Real-time Inference<br/>Processor]
        EP3[Model Drift Detection<br/>Processor]
        EP4[Alert Generation<br/>Processor]
    end
    
    subgraph "Event Stores"
        EVS1[(Feature Store)]
        EVS2[(Model Predictions)]
        EVS3[(Monitoring Data)]
        EVS4[(Alerts)]
    end
    
    subgraph "Downstream Consumers"
        DC1[Mobile Apps]
        DC2[Web Dashboard]
        DC3[Email Service]
        DC4[Data Warehouse]
    end
    
    ES1 --> ESP
    ES2 --> ESP
    ES3 --> ESP
    ES4 --> ESP
    ES5 --> ESP
    
    ESP --> EP1
    ESP --> EP2
    ESP --> EP3
    ESP --> EP4
    
    EP1 --> EVS1
    EP2 --> EVS2
    EP3 --> EVS3
    EP4 --> EVS4
    
    EVS1 --> ESP
    EVS2 --> ESP
    EVS3 --> ESP
    EVS4 --> ESP
    
    ESP --> DC1
    ESP --> DC2
    ESP --> DC3
    ESP --> DC4

Figure 34.4: Event-driven architecture.
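The decoupling in Figure 34.4 can be illustrated with a toy in-memory event bus: producers publish to a topic and any number of processors subscribe, without producer and consumer ever referencing each other. This is a sketch of the pattern only (topic names are hypothetical); a production system would use a platform such as Kafka for durability and replay.

```python
# Minimal in-memory event bus: publishers and subscribers are decoupled
# through named topics. Topic and field names are illustrative.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
audit_log = []

# Two independent processors react to the same ingestion event.
bus.subscribe("data.ingested", lambda e: audit_log.append(("features", e["rows"])))
bus.subscribe("data.ingested", lambda e: audit_log.append(("drift-check", e["rows"])))

bus.publish("data.ingested", {"rows": 1000})
print(audit_log)  # [('features', 1000), ('drift-check', 1000)]
```

Adding a third processor requires only another `subscribe` call; the publisher is unchanged, which is the loose coupling the pattern promises.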

Key Characteristics

  • Event-First Design: The system is designed around events as first-class citizens, with all interactions happening through event publication and consumption.
  • Loose Temporal Coupling: Producers and consumers do not need to be available at the same time, enabling asynchronous processing.
  • Event Sourcing: Complete audit trail of all events, enabling replay and debugging capabilities.

Use Cases

  • Real-Time Recommendation Systems: Responding to user behavior events instantly
  • Fraud Detection: Processing transaction events in real-time
  • IoT (Internet of Things) and Sensor Data: Processing continuous streams of sensor readings
  • Model Monitoring: Responding to model performance degradation events

Advantages

  • Excellent for real-time processing
  • High scalability and fault tolerance
  • Natural audit trail
  • Loose coupling between components

Disadvantages

  • Complex event schema evolution
  • Difficult to implement transactions
  • Debugging can be challenging
  • Eventual consistency model

34.3 Communication Paradigms

Synchronous and Asynchronous Communication

The choice between synchronous and asynchronous communication patterns fundamentally impacts system performance, user experience, and architectural complexity (Tanenbaum and Van Steen 2016; Kleppmann 2017).

Synchronous Communication

sequenceDiagram
    participant Client
    participant API Gateway
    participant ML Service
    participant Database
    
    Note over Client, Database: Synchronous Communication
    Client->>+API Gateway: Request with data
    API Gateway->>+ML Service: Forward request
    ML Service->>+Database: Query features
    Database-->>-ML Service: Return features
    ML Service-->>-API Gateway: Return prediction
    API Gateway-->>-Client: Return response
    
    Note over Client, Database: Client waits for complete response

Figure 34.5: Synchronous communication.
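The call chain in Figure 34.5 can be sketched as ordinary blocking function calls: each caller waits for its callee, so the client's total latency is the sum of every hop. All names and the stand-in database and model are illustrative.

```python
# Sketch of the synchronous chain: each call blocks until the callee
# returns, so the client waits for the complete response.
def query_features(entity_id):
    return {"user-7": [2.0, 4.0]}[entity_id]      # stand-in database

def ml_service(entity_id):
    features = query_features(entity_id)          # blocks on the database
    return sum(features) / len(features)          # stand-in inference

def api_gateway(entity_id):
    return {"prediction": ml_service(entity_id)}  # blocks on the ML service

print(api_gateway("user-7"))  # {'prediction': 3.0}
```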

Characteristics

  • Request-response pattern with immediate feedback
  • Client blocks until receiving response
  • Strong consistency guarantees
  • Simpler error handling and debugging

ML Use Cases

  • Real-Time Inference APIs: Credit scoring, fraud detection during transactions
  • Interactive Applications: Chatbots, recommendation widgets
  • Critical Decision Systems: Medical diagnosis, autonomous vehicle control

Trade-offs

  • Advantages: Immediate results, simpler programming model, strong consistency
  • Disadvantages: Poor scalability under load, cascading failures, resource blocking

Asynchronous Communication

sequenceDiagram
    participant Client
    participant API Gateway
    participant Message Queue
    participant ML Service
    participant Database
    participant Notification Service
    
    Note over Client, Notification Service: Asynchronous Communication
    Client->>+API Gateway: Submit request
    API Gateway->>Message Queue: Queue prediction job
    API Gateway-->>-Client: Return job ID
    
    Note over Client: Client continues other work
    
    Message Queue->>+ML Service: Process job
    ML Service->>+Database: Query features
    Database-->>-ML Service: Return features
    ML Service->>Notification Service: Send result
    ML Service-->>-Message Queue: Job complete
    
    Notification Service-->>Client: Notify result ready

Figure 34.6: Asynchronous communication.
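The flow in Figure 34.6 can be sketched with a queue and a worker thread: the client submits a job, immediately receives a job ID, and collects the result later. This is a single-process sketch under illustrative names; a real system would use a message broker and a notification service instead of an in-process queue and event.

```python
# Sketch of asynchronous processing: the client enqueues a job, gets a
# job ID back immediately, and the worker delivers the result later.
import queue
import threading
import uuid

jobs, results = queue.Queue(), {}
done = threading.Event()

def worker():
    while True:
        job_id, features = jobs.get()
        if job_id is None:
            break                            # shutdown signal
        results[job_id] = sum(features)      # stand-in for model inference
        done.set()                           # stand-in for a notification

threading.Thread(target=worker, daemon=True).start()

# Client: submit the job and continue other work immediately.
job_id = str(uuid.uuid4())
jobs.put((job_id, [1.0, 2.0, 3.0]))

done.wait(timeout=5)        # in practice: a callback or push notification
print(results[job_id])      # 6.0
jobs.put((None, None))      # shut the worker down
```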

Characteristics

  • Fire-and-forget or callback-based patterns
  • Client continues processing while request is handled
  • Eventually consistent results
  • Complex error handling and state management

ML Use Cases

  • Batch Predictions: Large dataset scoring, model training
  • Long-Running Inference: Complex computer vision, NLP tasks
  • Background Processing: Feature engineering, model retraining

Trade-offs

  • Advantages: Better scalability, fault tolerance, resource efficiency
  • Disadvantages: Complex programming model, eventual consistency, difficult debugging

Streaming and Batch Processing

The choice between streaming and batch processing determines how your ML system handles data flow and computation timing (Chen and Zhang 2014; Akidau, Chernyak, and Lax 2018). Modern ML systems often implement a Lambda architecture to combine both approaches (Marz and Warren 2015).

graph LR
    subgraph "Streaming Processing"
        SD[Streaming Data<br/>Kafka, Kinesis] --> SP[Stream Processor<br/>Flink, Spark Streaming]
        SP --> RT[Real-time Results<br/>Low Latency]
        SP --> FS[Feature Store<br/>Continuous Updates]
    end
    
    subgraph "Batch Processing"
        BD[Batch Data<br/>Data Lake, Warehouse] --> BP[Batch Processor<br/>Spark, Airflow]
        BP --> BR[Batch Results<br/>High Throughput]
        BP --> DW[Data Warehouse<br/>Scheduled Updates]
    end

Figure 34.7: Batch and streaming processing.

Streaming Processing

Characteristics

  • Continuous data processing as events arrive
  • Low latency, high velocity processing
  • Stateful stream processing capabilities
  • Complex event processing (CEP) support

ML Applications

graph TD
    subgraph "Real-Time ML Pipeline"
        A[Live Data Stream<br/>User Events, Sensors] --> B[Feature Engineering<br/>Windowed Aggregations]
        B --> C[Online Inference<br/>Model Serving]
        C --> D[Real-Time Decisions<br/>Recommendations, Alerts]
        
        E[Model Updates<br/>Online Learning] --> C
        C --> F[Feedback Loop<br/>Performance Metrics]
        F --> E
    end

Figure 34.8: Real-time machine learning pipeline.

Use Cases

  • Fraud Detection: Processing credit card transactions in real-time
  • Recommendation Systems: Updating recommendations based on current user behavior
  • Anomaly Detection: Monitoring system metrics and IoT sensor data
  • Real-Time Personalization: Dynamic content adaptation

Batch Processing

Characteristics

  • Scheduled processing of accumulated data
  • High throughput, higher latency
  • Simpler programming model
  • Better resource utilization for large data sets

ML Applications

graph TD
    subgraph "Batch ML Pipeline"
        A[Historical Data<br/>Daily/Weekly Dumps] --> B[Feature Engineering<br/>Large-Scale Aggregations]
        B --> C[Model Training<br/>Offline Learning]
        C --> D[Model Validation<br/>Backtesting]
        D --> E[Model Deployment<br/>Batch Inference]
        E --> F[Results Storage<br/>Batch Predictions]
    end

Figure 34.9: Batch machine learning pipeline.
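In contrast to the streaming case, batch inference scores all accumulated records in one scheduled pass, trading latency for throughput. A minimal sketch, with hypothetical record IDs and a stand-in model:

```python
# Sketch of batch inference: score an accumulated dump of records in one
# pass and store all predictions together. Names are illustrative.
def batch_score(records, model):
    return {rid: model(feats) for rid, feats in records.items()}

nightly_dump = {"a": [1.0, 3.0], "b": [2.0, 6.0], "c": [0.0, 4.0]}
predictions = batch_score(nightly_dump, model=lambda f: sum(f) / len(f))
print(predictions)  # {'a': 2.0, 'b': 4.0, 'c': 2.0}
```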

Use Cases

  • Customer Segmentation: Monthly analysis of customer behavior patterns
  • Demand Forecasting: Weekly/monthly sales predictions
  • Model Training: Periodic retraining on historical data
  • Reporting and Analytics: Daily/weekly ML model performance reports

Hybrid Approaches

Many production ML systems combine both patterns in a Lambda architecture (Figure 34.10).

graph TB
    subgraph "Lambda Architecture"
        A[Raw Data] --> B[Batch Layer]
        A --> C[Speed Layer]
        B --> D[Batch Views]
        C --> E[Real-time Views]
        D --> F[Serving Layer]
        E --> F
        F --> G[Query Interface]
    end

Figure 34.10: Lambda architecture.
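The serving layer in Figure 34.10 merges the two views at query time: the batch view holds results precomputed up to the last batch run, and the speed view covers only the events since then. A minimal sketch with illustrative counts:

```python
# Lambda-architecture serving layer: a query merges the batch view
# (recomputed nightly) with the speed view (real-time increments).
batch_view = {"user-1": 120, "user-2": 45}   # from the batch layer
speed_view = {"user-1": 3, "user-3": 7}      # from the speed layer

def query(user):
    return batch_view.get(user, 0) + speed_view.get(user, 0)

print(query("user-1"))  # 123 (batch + real-time)
print(query("user-3"))  # 7   (seen only since the last batch run)
```

When the batch layer reruns, it absorbs the recent events and the speed view is reset, so the merged answer stays consistent over time.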

Push and Pull Models

The choice between push and pull communication models affects how data flows through your ML system and impacts scalability, reliability, and resource utilization (Hohpe and Woolf 2003; Richardson 2018).

Push Model (Event-Driven)

In the push model, data producers actively send data to consumers when events occur or data becomes available.

graph TD
    subgraph "Push Model Architecture"
        A[Data Source<br/>User Events] --> B[Event Publisher<br/>Kafka Producer]
        B --> C[Message Broker<br/>Kafka Topic]
        C --> D[ML Consumer 1<br/>Feature Engineering]
        C --> E[ML Consumer 2<br/>Real-time Inference]
        C --> F[ML Consumer 3<br/>Monitoring]
        
        G[Model Updates] --> H[Model Registry]
        H --> I[Push Notification]
        I --> J[Model Servers]
    end

Figure 34.11: Push model architecture.

Characteristics

  • Immediate Delivery: Data is delivered as soon as it’s available
  • Event-Driven: Consumers react to events rather than polling
  • Backpressure Handling: Producers must handle cases where consumers can’t keep up
  • Stateful Connections: Often requires persistent connections between producers and consumers

Use Cases

  • Real-Time Feature Updates: Pushing new feature values to online feature stores
  • Model Deployment: Pushing updated models to inference servers
  • Alert Systems: Pushing notifications when model performance degrades
  • Live Dashboards: Pushing real-time ML metrics to monitoring systems

Pull Model

In the pull model, consumers actively request data from producers on a schedule or when needed, typically by polling the producing services at regular intervals.

graph TD
    subgraph "Pull Model Architecture"
        A[Data Source<br/>Feature Store] 
        B[ML Service<br/>Inference Engine]
        C[Model Registry]
        D[Monitoring Service]
        
        B --> |Poll for features| A
        B --> |Check for model updates| C
        D --> |Poll metrics| B
        
        E[Batch Job Scheduler] --> |Trigger every hour| F[Feature Engineering]
        F --> |Pull raw data| G[Data Warehouse]
    end

Figure 34.12: Pull model architecture.

Characteristics

  • Consumer-Controlled: Consumers decide when to request data
  • Stateless: No persistent connections required
  • Polling Overhead: Regular polling can waste resources if no new data
  • Eventual Consistency: Some delay between data availability and consumption

Use Cases

  • Batch Feature Retrieval: Pulling features for offline model training
  • Scheduled Model Updates: Periodically checking for new model versions
  • Health Checks: Polling ML services for status and performance metrics
  • Data Synchronization: Pulling updates from external data sources
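The "check for model updates" arrow in Figure 34.12 can be sketched as a polling loop: the inference service periodically asks the registry for the latest version and reloads only when it changes. The class and attribute names are hypothetical, and the registry is an in-process stand-in for a remote service.

```python
# Sketch of the pull model: the consumer polls for new model versions and
# reloads only when the version changes; unchanged polls are overhead.
class ModelRegistry:
    def __init__(self):
        self.version = 1  # stand-in for a remote registry's latest version

class InferenceService:
    def __init__(self, registry):
        self._registry = registry
        self._loaded_version = None

    def poll_for_updates(self):
        latest = self._registry.version
        if latest != self._loaded_version:
            self._loaded_version = latest  # stand-in for loading weights
            return True                    # reloaded
        return False                       # nothing new: polling overhead

registry = ModelRegistry()
service = InferenceService(registry)

print(service.poll_for_updates())  # True  (first load)
print(service.poll_for_updates())  # False (wasted poll)
registry.version = 2
print(service.poll_for_updates())  # True  (new version pulled)
```

The wasted second poll illustrates the trade-off listed above: the consumer stays in control, at the cost of polling overhead and some delay between publication and pickup.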

Comparison and Trade-offs

graph LR
    subgraph "Push Model"
        A[Low Latency<br/>✓] 
        B[Resource Efficient<br/>✓]
        C[Complex Error Handling<br/>✗]
        D[Backpressure Issues<br/>✗]
    end
    
    subgraph "Pull Model"
        E[Simple Error Handling<br/>✓]
        F[No Backpressure<br/>✓]
        G[Higher Latency<br/>✗]
        H[Polling Overhead<br/>✗]
    end

Figure 34.13: Comparison of push and pull models.

34.4 Architectural Decision Framework

When designing ML systems, consider these factors to choose appropriate patterns (Bass, Clements, and Kazman 2012; Richards and Ford 2020):

Latency Requirements

  • Sub-second: Event-driven + Streaming + Push
  • Seconds to minutes: Microservices + Asynchronous + Pull
  • Hours to days: Layered + Batch + Pull

Scale Requirements

  • High throughput: Microservices or Event-driven
  • High availability: SOA or Microservices
  • Global distribution: Event-driven with regional processing

Team Structure

  • Small team: Layered architecture
  • Multiple teams: SOA or Microservices
  • DevOps maturity: Microservices + Event-driven

Common Architectural Combinations

graph TD
    subgraph " "
        A[Event-Driven Architecture<br/>+ Microservices]
        B[Streaming Processing<br/>+ Push Model]
        C[Synchronous Inference<br/>+ Asynchronous Training]
    end

Figure 34.14: Real-time recommendation system.

graph TD
    subgraph " "
        D[Layered Architecture<br/>+ SOA]
        E[Batch Processing<br/>+ Pull Model]
        F[Asynchronous Processing<br/>+ Scheduled Execution]
    end

Figure 34.15: Batch analytics platform.

graph TD
    subgraph " "
        G[Microservices<br/>+ Event-Driven]
        H[Streaming + Batch<br/>Lambda Architecture]
        I[Push for Real-time<br/>Pull for Batch]
    end

Figure 34.16: Hybrid ML platform.

34.5 Conclusion

Modern ML systems require careful consideration of architectural patterns and communication paradigms (Sculley et al. 2015; Paleyes, Urma, and Lawrence 2022). The choice between layered, SOA, microservices, or event-driven architectures depends on your specific requirements for scalability, team structure, and operational complexity.

Similarly, the communication patterns you choose—synchronous vs asynchronous, streaming vs batch, push vs pull—will significantly impact system performance, reliability, and user experience.

Key Takeaways

  1. No single architecture fits all use cases—hybrid approaches are often necessary
  2. Start simple and evolve—begin with layered architectures and evolve to microservices as needed
  3. Consider team capabilities—more complex architectures require more operational maturity
  4. Design for your specific requirements—latency, scale, and consistency needs drive architectural decisions
  5. Plan for evolution—architectures should support changing requirements over time

Further Reading