34 System Architectures

34.1 Introduction

Modern machine learning systems require sophisticated architectural patterns to handle the complexity of production deployments (Sculley et al. 2015; Zhou, Yu, and Ding 2020). These systems must address the unique challenges of deploying, scaling, and maintaining data applications in production environments.

We will explore four fundamental architectural patterns and three critical communication paradigms that form the backbone of scalable ML systems.

34.2 Architectural Patterns

Architecture patterns provide the scaffolding for organizing components, managing data flow, and ensuring reliability. Successful machine learning systems often combine multiple patterns, for example using a layered architecture for vertical organization and microservices for horizontal decomposition.

Layered Architecture

The layered architecture, also known as \(n\)-tier architecture, is the most fundamental pattern for organizing ML systems into horizontal layers with dedicated responsibilities (Fowler 2002). The architecture is particularly effective for traditional ML applications (Buschmann et al. 1996).

The presentation layer handles user interfaces and API endpoints, the business logic layer contains feature engineering and model inference logic, the data access layer manages interactions with data sources, and so on down the stack (Figure 34.1). An advantage of the layered architecture is the clear separation of concerns: teams can work independently on different layers, provided well-defined interfaces between the layers are maintained.

```mermaid
graph TD
    A[Presentation Layer<br/>Web UI, Mobile Apps] --> B[Application Layer<br/>Business Logic, APIs]
    B --> C[ML Service Layer<br/>Model Inference, Feature Engineering]
    C --> D[Data Access Layer<br/>Database, File Systems]
    D --> E[Infrastructure Layer<br/>Compute, Storage, Networking]
```

Figure 34.1: The layered (\(n\)-tier) architecture organizes an ML system into horizontal tiers with dedicated responsibilities.
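To make the layering concrete, here is a minimal Python sketch of the three upper tiers. All class and method names are illustrative, and the "model" is a stand-in computation; the point is that each layer depends only on the layer directly below it.

```python
"""Minimal sketch of a three-layer ML service; all names are illustrative."""

class DataAccessLayer:
    """Lowest layer shown here: the only component that touches storage."""
    def __init__(self):
        self._features = {"user_42": [0.1, 0.7, 0.3]}  # stand-in for a real store

    def get_features(self, entity_id: str) -> list[float]:
        return self._features.get(entity_id, [0.0, 0.0, 0.0])


class MLServiceLayer:
    """Middle layer: depends on the data access layer, never the reverse."""
    def __init__(self, dal: DataAccessLayer):
        self._dal = dal

    def predict(self, entity_id: str) -> float:
        features = self._dal.get_features(entity_id)
        return sum(features) / len(features)  # stand-in for real model inference


class PresentationLayer:
    """Top layer: exposes the system to clients, knows nothing about storage."""
    def __init__(self, ml: MLServiceLayer):
        self._ml = ml

    def handle_request(self, entity_id: str) -> dict:
        return {"entity": entity_id, "score": self._ml.predict(entity_id)}


# Wiring happens top-down; each layer sees only the layer directly below it.
app = PresentationLayer(MLServiceLayer(DataAccessLayer()))
print(app.handle_request("user_42"))
```

Because the dependencies point strictly downward, the ML service layer can be rewritten (say, swapping the averaging stand-in for a real model) without touching the presentation layer.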
Key Characteristics
- Separation of Concerns: Each layer has a specific responsibility, making the system easier to maintain and test. The ML service layer is isolated from presentation and data concerns.
- Unidirectional Dependencies: Higher layers depend on lower layers, but not vice versa. This creates a stable dependency structure.
- Scalability: Individual layers can be scaled independently based on demand patterns.
Use Cases
- Enterprise ML Applications: Where governance and clear boundaries are critical
- Batch Processing Systems: Traditional ETL pipelines with ML components
- Regulatory Environments: Where audit trails and separation of duties are required
Advantages
- Clear organizational structure
- Easy to understand and maintain
- Good for team specialization
- Supports incremental deployment
Disadvantages
- Can become rigid and slow to change
- May introduce unnecessary complexity for simple systems
- Difficult to implement changes that span multiple layers
- Performance overhead when data traverses multiple layers
Service-Oriented Architecture (SOA)
Service-oriented architecture organizes ML systems as a collection of loosely coupled services that communicate through well-defined interfaces (Erl 2005). The granularity of the individual services can vary. Services can include data preprocessing, feature engineering, model training, model serving, model monitoring, and so on. The capabilities of each service can be developed independently, using the best tools and frameworks for the specific task (Figure 34.2).
The enterprise service bus (ESB) is the middleware through which the services communicate; it has been described as the nervous system of service-oriented architectures. Built on protocols such as REST, SOAP, or gRPC, the ESB routes messages between services based on content or headers, and handles authentication and authorization, logging, and error handling.
SOA emphasizes service reuse and business capability alignment, making it particularly valuable for enterprise ML deployments (Papazoglou et al. 2007). In principle, you can replace the internals of the model inference service, for example, without impacting other services in the architecture, and a single model validation service can be reused by multiple applications across the enterprise.
graph TB subgraph "Service Bus / ESB" SB[Message Routing & Transformation] end subgraph "ML Services" MS1[Feature Engineering<br/>Service] MS2[Model Training<br/>Service] MS3[Model Inference<br/>Service] MS4[Model Validation<br/>Service] end subgraph "Business Services" BS1[Customer Service] BS2[Product Service] BS3[Order Service] end subgraph "Data Services" DS1[Data Warehouse<br/>Service] DS2[Streaming Data<br/>Service] DS3[Model Registry<br/>Service] end MS1 <--> SB MS2 <--> SB MS3 <--> SB MS4 <--> SB BS1 <--> SB BS2 <--> SB BS3 <--> SB DS1 <--> SB DS2 <--> SB DS3 <--> SB
Key Characteristics
- Service Contracts: Well-defined interfaces that specify how services interact, including data formats and communication protocols.
- Loose Coupling: Services are independently deployable and maintainable, with minimal dependencies on other services’ internal implementations.
- Service Reuse: ML models and data processing logic can be reused across multiple business applications.
Use Cases
- Multi-Application Environments: Where the same ML models serve multiple business applications
- Legacy System Integration: Wrapping existing ML models as services for broader consumption
- Cross-Functional Teams: Where different teams need to share ML capabilities
Advantages
- Reuse of ML components (develop once, use in many apps)
- Good for complex enterprise environments
- Supports gradual modernization
- Clear service boundaries
Disadvantages
- ESB can become a bottleneck
- Complex governance requirements
- Potential for service sprawl (where the number of services grows uncontrollably, leading to management and operational challenges)
- Higher operational complexity
Microservices Architecture
The microservices architecture is an extension of SOA in which the services are more granular and the communication protocols are more lightweight (typically REST APIs). As in SOA, each service operates independently and is responsible for a specific business capability or ML function (Newman 2015; Fowler and Lewis 2014). While SOA is an enterprise-wide architecture, microservices architectures are more application-specific. Services in SOA typically share data storage, whereas each microservice maintains its own. Finally, microservices can be deployed independently, while deploying a service in SOA requires more complex integration.
The granularity of the microservices is at the level of data ingestion, feature stores, model repositories, inference engines, etc. Key to the architecture is that each microservice owns its data and business logic; it can be modified without affecting the entire system (Figure 34.3).
The microservices approach has gained significant traction in application architectures in general, and in ML operations in particular, due to its scalability and deployment flexibility (Karmel 2020).
graph TB subgraph "API Gateway" AG[Load Balancing<br/>Authentication<br/>Rate Limiting] end subgraph "ML Microservices" M1[Feature Store<br/>Service] M2[Model A<br/>Inference] M3[Model B<br/>Inference] M4[A/B Testing<br/>Service] M5[Monitoring<br/>Service] M6[Data Validation<br/>Service] end subgraph "Data Layer" DB1[(Features DB)] DB2[(Model Store)] DB3[(Metrics DB)] DB4[(Logs)] end subgraph "Infrastructure" K8S[Kubernetes Cluster] MSG[Message Queue] REG[Service Registry] end AG --> M1 AG --> M2 AG --> M3 AG --> M4 M1 --> DB1 M2 --> DB2 M3 --> DB2 M4 --> M2 M4 --> M3 M5 --> DB3 M6 --> DB4 M1 -.-> MSG M2 -.-> MSG M3 -.-> MSG M4 -.-> MSG M5 -.-> MSG M1 -.-> REG M2 -.-> REG M3 -.-> REG M4 -.-> REG M5 -.-> REG
Key Characteristics
- Single Responsibility: Each microservice focuses on one ML capability (e.g., feature engineering, specific model inference, model monitoring).
- Independent Deployment: Services can be deployed, scaled, and updated independently without affecting other services.
- Decentralized Data Management: Each service manages its own data storage and state.
- Technology Diversity: Different services can use different technologies, frameworks, and programming languages.
Use Cases
- High-Scale ML Platforms: Systems serving millions of requests with different scaling requirements
- Rapid Development: Teams that need to iterate quickly on different ML components
- Cloud-Native Applications: Containerized ML workloads in Kubernetes environments
Advantages
- Independent scaling and deployment
- Technology flexibility
- Fault isolation
- Team autonomy
Disadvantages
- Increased operational complexity
- Network latency between services
- Data consistency challenges
- Monitoring and debugging complexity
Event-Driven Architecture
The event-driven architecture organizes ML systems around the generation of, detection of, and reaction to events (Hohpe and Woolf 2003). For example, a data ingestion event triggers a data processing pipeline, which in turn triggers a feature engineering pipeline, and the completion of model training triggers a deployment workflow (Figure 34.4).
The event-driven architecture pattern is particularly powerful for real-time ML systems that need to respond to business events and data changes (Chen and Zhang 2014; Akidau, Chernyak, and Lax 2018).
graph TB subgraph "Event Sources" ES1[User Interactions] ES2[IoT Sensors] ES3[Database Changes] ES4[File Uploads] ES5[API Calls] end subgraph "Event Streaming Platform" ESP[Apache Kafka<br/>Event Streams] end subgraph "Event Processors" EP1[Feature Engineering<br/>Processor] EP2[Real-time Inference<br/>Processor] EP3[Model Drift Detection<br/>Processor] EP4[Alert Generation<br/>Processor] end subgraph "Event Stores" EVS1[(Feature Store)] EVS2[(Model Predictions)] EVS3[(Monitoring Data)] EVS4[(Alerts)] end subgraph "Downstream Consumers" DC1[Mobile Apps] DC2[Web Dashboard] DC3[Email Service] DC4[Data Warehouse] end ES1 --> ESP ES2 --> ESP ES3 --> ESP ES4 --> ESP ES5 --> ESP ESP --> EP1 ESP --> EP2 ESP --> EP3 ESP --> EP4 EP1 --> EVS1 EP2 --> EVS2 EP3 --> EVS3 EP4 --> EVS4 EVS1 --> ESP EVS2 --> ESP EVS3 --> ESP EVS4 --> ESP ESP --> DC1 ESP --> DC2 ESP --> DC3 ESP --> DC4
Key Characteristics
- Event-First Design: The system is designed around events as first-class citizens, with all interactions happening through event publication and consumption.
- Loose Temporal Coupling: Producers and consumers do not need to be available at the same time, enabling asynchronous processing.
- Event Sourcing: Complete audit trail of all events, enabling replay and debugging capabilities.
Use Cases
- Real-Time Recommendation Systems: Responding to user behavior events instantly
- Fraud Detection: Processing transaction events in real-time
- IoT (Internet of Things) and Sensor Data: Processing continuous streams of sensor readings
- Model Monitoring: Responding to model performance degradation events
Advantages
- Excellent for real-time processing
- High scalability and fault tolerance
- Natural audit trail
- Loose coupling between components
Disadvantages
- Complex event schema evolution
- Difficult to implement transactions
- Debugging can be challenging
- Eventual consistency model
34.3 Communication Paradigms
Synchronous and Asynchronous Communication
The choice between synchronous and asynchronous communication patterns fundamentally impacts system performance, user experience, and architectural complexity (Tanenbaum and Van Steen 2016; Kleppmann 2017).
Synchronous Communication
```mermaid
sequenceDiagram
    participant Client
    participant API Gateway
    participant ML Service
    participant Database
    Note over Client, Database: Synchronous Communication
    Client->>+API Gateway: Request with data
    API Gateway->>+ML Service: Forward request
    ML Service->>+Database: Query features
    Database-->>-ML Service: Return features
    ML Service-->>-API Gateway: Return prediction
    API Gateway-->>-Client: Return response
    Note over Client, Database: Client waits for complete response
```
Characteristics
- Request-response pattern with immediate feedback
- Client blocks until receiving response
- Strong consistency guarantees
- Simpler error handling and debugging
ML Use Cases
- Real-Time Inference APIs: Credit scoring, fraud detection during transactions
- Interactive Applications: Chatbots, recommendation widgets
- Critical Decision Systems: Medical diagnosis, autonomous vehicle control
Trade-offs
- Advantages: Immediate results, simpler programming model, strong consistency
- Disadvantages: Poor scalability under load, cascading failures, resource blocking
Asynchronous Communication
```mermaid
sequenceDiagram
    participant Client
    participant API Gateway
    participant Message Queue
    participant ML Service
    participant Database
    participant Notification Service
    Note over Client, Notification Service: Asynchronous Communication
    Client->>+API Gateway: Submit request
    API Gateway->>Message Queue: Queue prediction job
    API Gateway-->>-Client: Return job ID
    Note over Client: Client continues other work
    Message Queue->>+ML Service: Process job
    ML Service->>+Database: Query features
    Database-->>-ML Service: Return features
    ML Service->>Notification Service: Send result
    ML Service-->>-Message Queue: Job complete
    Notification Service-->>Client: Notify result ready
```
Characteristics
- Fire-and-forget or callback-based patterns
- Client continues processing while request is handled
- Eventually consistent results
- Complex error handling and state management
ML Use Cases
- Batch Predictions: Large dataset scoring, model training
- Long-Running Inference: Complex computer vision, NLP tasks
- Background Processing: Feature engineering, model retraining
Trade-offs
- Advantages: Better scalability, fault tolerance, resource efficiency
- Disadvantages: Complex programming model, eventual consistency, difficult debugging
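A minimal sketch of the submit-and-wait pattern from the sequence diagram above, using an in-process queue and a worker thread as stand-ins for a real message broker (RabbitMQ, SQS, Kafka); all names are illustrative.

```python
"""Asynchronous job pattern with an in-process queue and worker thread."""
import queue
import threading
import time
import uuid

jobs: "queue.Queue[dict]" = queue.Queue()
results: dict = {}

def worker():
    while True:
        job = jobs.get()
        time.sleep(0.1)  # stand-in for long-running inference
        results[job["id"]] = {"score": 0.42, "status": "done"}
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(payload: dict) -> str:
    """Return immediately with a job ID; the client is free to do other work."""
    job_id = str(uuid.uuid4())
    jobs.put({"id": job_id, "payload": payload})
    return job_id

job_id = submit({"features": [1.0, 2.0]})
print("submitted", job_id)   # client continues immediately
jobs.join()                  # in a real system: poll for the result or get a callback
print("result:", results[job_id])
```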
Streaming and Batch Processing
The choice between streaming and batch processing determines how your ML system handles data flow and computation timing (Chen and Zhang 2014; Akidau, Chernyak, and Lax 2018). Modern ML systems often implement a Lambda architecture to combine both approaches (Marz and Warren 2015).
graph LR subgraph "Streaming Processing" SD[Streaming Data<br/>Kafka, Kinesis] --> SP[Stream Processor<br/>Flink, Spark Streaming] SP --> RT[Real-time Results<br/>Low Latency] SP --> FS[Feature Store<br/>Continuous Updates] end subgraph "Batch Processing" BD[Batch Data<br/>Data Lake, Warehouse] --> BP[Batch Processor<br/>Spark, Airflow] BP --> BR[Batch Results<br/>High Throughput] BP --> DW[Data Warehouse<br/>Scheduled Updates] end
Streaming Processing
Characteristics
- Continuous data processing as events arrive
- Low latency, high velocity processing
- Stateful stream processing capabilities
- Complex event processing (CEP) support
ML Applications
graph TD subgraph "Real-Time ML Pipeline" A[Live Data Stream<br/>User Events, Sensors] --> B[Feature Engineering<br/>Windowed Aggregations] B --> C[Online Inference<br/>Model Serving] C --> D[Real-Time Decisions<br/>Recommendations, Alerts] E[Model Updates<br/>Online Learning] --> C C --> F[Feedback Loop<br/>Performance Metrics] F --> E end
Use Cases
- Fraud Detection: Processing credit card transactions in real-time
- Recommendation Systems: Updating recommendations based on current user behavior
- Anomaly Detection: Monitoring system metrics and IoT sensor data
- Real-Time Personalization: Dynamic content adaptation
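Windowed aggregation is the workhorse of streaming feature engineering. The pure-Python sketch below computes a sliding-window feature (transactions per card in the last 60 seconds, a typical fraud signal) in a single process; Flink or Spark Structured Streaming provide the same windowing semantics distributed and fault-tolerant.

```python
"""Sliding-window feature over a simulated event stream."""
from collections import defaultdict, deque

WINDOW_SECONDS = 60
events_by_card = defaultdict(deque)  # card_id -> timestamps inside the window

def txn_count_last_minute(card_id: str, now: float) -> int:
    window = events_by_card[card_id]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # evict events that slid out of the window
    return len(window)

# Simulated stream of (timestamp, card_id) events.
stream = [(0.0, "c1"), (10.0, "c1"), (15.0, "c2"), (70.0, "c1")]
for ts, card in stream:
    print(card, "count in last minute:", txn_count_last_minute(card, ts))
# For c1 the counts are 1, 2, then 2 at t=70, because the t=0 event is evicted.
```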
Batch Processing
Characteristics
- Scheduled processing of accumulated data
- High throughput, higher latency
- Simpler programming model
- Better resource utilization for large data sets
ML Applications
graph TD subgraph "Batch ML Pipeline" A[Historical Data<br/>Daily/Weekly Dumps] --> B[Feature Engineering<br/>Large-Scale Aggregations] B --> C[Model Training<br/>Offline Learning] C --> D[Model Validation<br/>Backtesting] D --> E[Model Deployment<br/>Batch Inference] E --> F[Results Storage<br/>Batch Predictions] end
Use Cases
- Customer Segmentation: Monthly analysis of customer behavior patterns
- Demand Forecasting: Weekly/monthly sales predictions
- Model Training: Periodic retraining on historical data
- Reporting and Analytics: Daily/weekly ML model performance reports
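A batch job reduces to pull, score, persist. The skeleton below uses CSV files and a toy linear model as stand-ins for a warehouse and a trained model; all paths and column names are hypothetical, and a scheduler such as Airflow or cron would trigger the run.

```python
"""Skeleton of a scheduled batch-scoring job."""
import csv

def load_rows(path: str) -> list[dict]:
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def score(row: dict) -> float:
    return 0.1 * float(row["recency"]) + 0.9 * float(row["frequency"])  # toy model

def run_batch_job(in_path: str, out_path: str) -> None:
    rows = load_rows(in_path)                    # 1. pull the accumulated data
    for row in rows:                             # 2. score the whole batch at once
        row["score"] = f"{score(row):.4f}"
    with open(out_path, "w", newline="") as f:   # 3. persist results for consumers
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

# A scheduler would invoke: run_batch_job("customers_2024-06.csv", "scores.csv")
```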
Hybrid Approaches
Many production ML systems combine both patterns in a Lambda architecture (Figure 34.10).
graph TB subgraph "Lambda Architecture" A[Raw Data] --> B[Batch Layer] A --> C[Speed Layer] B --> D[Batch Views] C --> E[Real-time Views] D --> F[Serving Layer] E --> F F --> G[Query Interface] end
Push and Pull Models
The choice between push and pull communication models affects how data flows through your ML system and impacts scalability, reliability, and resource utilization (Hohpe and Woolf 2003; Richardson 2018).
Push Model (Event-Driven)
In the push model, data producers actively send data to consumers when events occur or data becomes available.
graph TD subgraph "Push Model Architecture" A[Data Source<br/>User Events] --> B[Event Publisher<br/>Kafka Producer] B --> C[Message Broker<br/>Kafka Topic] C --> D[ML Consumer 1<br/>Feature Engineering] C --> E[ML Consumer 2<br/>Real-time Inference] C --> F[ML Consumer 3<br/>Monitoring] G[Model Updates] --> H[Model Registry] H --> I[Push Notification] I --> J[Model Servers] end
Characteristics
- Immediate Delivery: Data is delivered as soon as it’s available
- Event-Driven: Consumers react to events rather than polling
- Backpressure Handling: Producers must handle cases where consumers can’t keep up
- Stateful Connections: Often requires persistent connections between producers and consumers
Use Cases
- Real-Time Feature Updates: Pushing new feature values to online feature stores
- Model Deployment: Pushing updated models to inference servers
- Alert Systems: Pushing notifications when model performance degrades
- Live Dashboards: Pushing real-time ML metrics to monitoring systems
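A minimal sketch of push-based model deployment: servers register a callback with the registry and are notified the moment a new version is published, so they never poll. All names are illustrative.

```python
"""Push sketch: the registry notifies subscribed model servers on publish."""

class ModelRegistry:
    def __init__(self):
        self._subscribers = []
        self.latest = "v1"

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, version: str):
        self.latest = version
        for notify in self._subscribers:
            notify(version)  # push to every consumer immediately

class ModelServer:
    def __init__(self, name: str):
        self.name, self.version = name, "v1"

    def on_new_model(self, version: str):
        self.version = version
        print(f"{self.name}: hot-swapped to {version}")

registry = ModelRegistry()
for server in (ModelServer("server-a"), ModelServer("server-b")):
    registry.subscribe(server.on_new_model)

registry.publish("v2")  # both servers update without ever asking
```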
Pull Model
In the pull model, consumers actively request data from producers on a schedule or when needed, typically by polling at a fixed interval.
graph TD subgraph "Pull Model Architecture" A[Data Source<br/>Feature Store] B[ML Service<br/>Inference Engine] C[Model Registry] D[Monitoring Service] B --> |Poll for features| A B --> |Check for model updates| C D --> |Poll metrics| B E[Batch Job Scheduler] --> |Trigger every hour| F[Feature Engineering] F --> |Pull raw data| G[Data Warehouse] end
Characteristics
- Consumer-Controlled: Consumers decide when to request data
- Stateless: No persistent connections required
- Polling Overhead: Regular polling can waste resources if no new data
- Eventual Consistency: Some delay between data availability and consumption
Use Cases
- Batch Feature Retrieval: Pulling features for offline model training
- Scheduled Model Updates: Periodically checking for new model versions
- Health Checks: Polling ML services for status and performance metrics
- Data Synchronization: Pulling updates from external data sources
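The same model-update scenario under the pull model: the server polls on a fixed interval and usually finds nothing new, which is the polling overhead noted above, and the interval bounds how stale the server can be. Names are again illustrative.

```python
"""Pull sketch: an inference server polls the registry on a fixed interval."""
import time

class ModelRegistry:
    def __init__(self):
        self.latest = "v1"

registry = ModelRegistry()
current_version = "v1"

def poll_once():
    global current_version
    if registry.latest != current_version:   # most polls find nothing new
        current_version = registry.latest
        print("pulled new model:", current_version)

for tick in range(3):
    if tick == 1:
        registry.latest = "v2"               # someone publishes between polls
    poll_once()
    time.sleep(0.1)                          # the poll interval bounds staleness
```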
Comparison and Trade-offs
graph LR subgraph "Push Model" A[Low Latency<br/>✓] B[Resource Efficient<br/>✓] C[Complex Error Handling<br/>✗] D[Backpressure Issues<br/>✗] end subgraph "Pull Model" E[Simple Error Handling<br/>✓] F[No Backpressure<br/>✓] G[Higher Latency<br/>✗] H[Polling Overhead<br/>✗] end
34.4 Architectural Decision Framework
When designing ML systems, consider these factors to choose appropriate patterns (Bass, Clements, and Kazman 2012; Richards and Ford 2020):
Latency Requirements
- Sub-second: Event-driven + Streaming + Push
- Seconds to minutes: Microservices + Asynchronous + Pull
- Hours to days: Layered + Batch + Pull
Scale Requirements
- High throughput: Microservices or Event-driven
- High availability: SOA or Microservices
- Global distribution: Event-driven with regional processing
Team Structure
- Small team: Layered architecture
- Multiple teams: SOA or Microservices
- DevOps maturity: Microservices + Event-driven
Common Architectural Combinations
graph TD subgraph " " A[Event-Driven Architecture<br/>+ Microservices] B[Streaming Processing<br/>+ Push Model] C[Synchronous Inference<br/>+ Asynchronous Training] end
graph TD subgraph " " D[Layered Architecture<br/>+ SOA] E[Batch Processing<br/>+ Pull Model] F[Asynchronous Processing<br/>+ Scheduled Execution] end
graph TD subgraph " " G[Microservices<br/>+ Event-Driven] H[Streaming + Batch<br/>Lambda Architecture] I[Push for Real-time<br/>Pull for Batch] end
34.5 Conclusion
Modern ML systems require careful consideration of architectural patterns and communication paradigms (Sculley et al. 2015; Paleyes, Urma, and Lawrence 2022). The choice between layered, SOA, microservices, or event-driven architectures depends on your specific requirements for scalability, team structure, and operational complexity.
Similarly, the communication patterns you choose—synchronous vs asynchronous, streaming vs batch, push vs pull—will significantly impact system performance, reliability, and user experience.
Key Takeaways
- No single architecture fits all use cases—hybrid approaches are often necessary
- Start simple and evolve—begin with layered architectures and evolve to microservices as needed
- Consider team capabilities—more complex architectures require more operational maturity
- Design for your specific requirements—latency, scale, and consistency needs drive architectural decisions
- Plan for evolution—architectures should support changing requirements over time
Further Reading
- Martin Fowler’s “Microservices” articles (Fowler and Lewis 2014)
- “Designing Data-Intensive Applications” by Martin Kleppmann (Kleppmann 2017)
- “Building Microservices” by Sam Newman (Newman 2015)
- “Streaming Systems” by Tyler Akidau, Slava Chernyak, and Reuven Lax (Akidau, Chernyak, and Lax 2018)
- “Machine Learning Engineering” by Andriy Burkov (Burkov 2020)