Lab 5 — Interactive Explainer
Deploy a model to a real-time endpoint, configure blue/green linear traffic shifting, monitor with CloudWatch alarms, and observe automatic rollback on failure.
You've trained and tuned your model in Labs 3–4. Now it's time to put it into production. This lab teaches you how to deploy a model to a SageMaker real-time endpoint and safely shift traffic from an old model to a new one using blue/green deployment with linear traffic shifting — the same pattern used by AnyCompany to update fraud detection models without downtime.
Duration: ~45 minutes • Phase: Deployment • Prerequisite: Labs 3–4 (trained model artifacts in S3)
- **Model A (blue):** production model (XGBoost 1.5-1), serving 100% of traffic initially
- **Traffic shift:** gradual linear shift, monitored by CloudWatch, with auto-rollback on alarm
- **New model (green):** better performance, serving 100% of traffic after a successful shift
The twist: You first try deploying Model B (which has errors). The CloudWatch alarm fires, traffic automatically rolls back to Model A. Then you deploy Model E (the good model) successfully.
SageMaker hosts your model on dedicated instances with auto-scaling. Invoke via API for sub-second predictions. The endpoint stays live 24/7.
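A single prediction request can be sketched with the `sagemaker-runtime` client's `invoke_endpoint` call. The endpoint name and feature payload below are placeholders, not values from the lab:

```python
# Hypothetical endpoint name for illustration; your Lab 5 endpoint name will differ.
ENDPOINT_NAME = "lab5-fraud-endpoint"

def build_invoke_args(payload: dict) -> dict:
    """Assemble keyword arguments for sagemaker-runtime invoke_endpoint."""
    return {
        "EndpointName": ENDPOINT_NAME,
        "ContentType": "text/csv",  # the built-in XGBoost container accepts CSV
        "Body": ",".join(str(v) for v in payload["features"]),
    }

def invoke(payload: dict) -> str:
    """Send one real-time prediction request (requires AWS credentials)."""
    import boto3  # imported here so the module loads without an AWS session
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(**build_invoke_args(payload))
    return response["Body"].read().decode("utf-8")
```

Because the endpoint stays live 24/7, this call returns in well under a second once the model is warm.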
Run old (blue) and new (green) models simultaneously. Gradually shift traffic from blue to green. If green fails, instantly revert to blue.
Monitor 5XX errors and model latency during deployment. If metrics breach thresholds, the alarm triggers automatic rollback — no human intervention needed.
When the alarm fires, SageMaker stops the traffic shift and routes 100% back to the original model. Zero downtime, zero data loss.
The lab walks through a complete deployment lifecycle: create endpoint → test → set alarms → shift traffic → handle failure → retry with good model.
Blue/green deployment maintains two identical production environments. The "blue" fleet runs the current model, while the "green" fleet hosts the new model. Traffic is gradually shifted from blue to green using a linear policy.
Instead of switching 100% of traffic instantly (risky), linear shifting moves traffic in increments over a defined period. SageMaker monitors health at each step.
| Time | Blue (Model A) | Green (New Model) | What Happens |
|---|---|---|---|
| T+0 | 100% | 0% | Deployment starts, green fleet provisioned |
| T+1 min | 75% | 25% | First traffic batch shifted, alarms monitored |
| T+2 min | 50% | 50% | Equal split — critical monitoring window |
| T+3 min | 25% | 75% | Majority on green, final validation |
| T+4 min | 0% | 100% | Complete — blue fleet decommissioned |
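The schedule above comes from a linear `DeploymentConfig`. A sketch of that configuration, using field names from the SageMaker `UpdateEndpoint` API (the alarm name, step size, and wait interval are assumptions for this lab):

```python
def linear_deployment_config(alarm_names, step_percent=25, wait_seconds=60):
    """Build a DeploymentConfig that shifts traffic in equal increments
    and rolls back automatically if any listed CloudWatch alarm fires."""
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "LINEAR",
                # shift 25% of capacity per step -> four steps to 100%
                "LinearStepSize": {"Type": "CAPACITY_PERCENT", "Value": step_percent},
                "WaitIntervalInSeconds": wait_seconds,  # monitor between steps
            },
            # keep the blue fleet briefly after success before decommissioning
            "TerminationWaitInSeconds": 120,
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": name} for name in alarm_names]
        },
    }
```

With 25% steps and a 60-second wait, the shift completes in roughly four minutes, matching the T+0 through T+4 timeline in the table.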
| Strategy | How It Works | Rollback Speed | Risk Level | Best For |
|---|---|---|---|---|
| All-at-once | Instant 0→100% switch | Manual (minutes) | High | Dev/test environments |
| Canary | Small % first, then all | Automatic (seconds) | Medium | Low-traffic endpoints |
| Linear (this lab) | Equal increments over time | Automatic (seconds) | Low | Production ML models |
| Blue/Green | Full parallel fleet, DNS switch | Instant (DNS) | Lowest | Mission-critical systems |
Scenario 1 (Model B, broken): intentionally deploys a model that throws 5XX errors. The CloudWatch alarm fires → automatic rollback to Model A. Demonstrates the safety net works.
Scenario 2 (Model E, good): deploys the properly trained model. No errors during the traffic shift → linear policy completes → Model E takes 100% of traffic. Uses RetainDeploymentConfig=True to reuse alarm settings.
CloudWatch metrics are the eyes and ears of your deployment. SageMaker emits endpoint metrics automatically — you just need to set alarm thresholds that trigger rollback when something goes wrong.
| Metric | Namespace | What It Measures | Alarm Threshold (Lab 5) |
|---|---|---|---|
| Invocation5XXErrors | AWS/SageMaker | Server-side errors (model crashes, OOM, bad predictions) | > 1% error rate for 1 minute |
| ModelLatency | AWS/SageMaker | Time for model to process a request (ms) | > 5000ms average for 1 minute |
| Invocation4XXErrors | AWS/SageMaker | Client-side errors (bad input format) | Monitored but no alarm |
| OverheadLatency | AWS/SageMaker | SageMaker infrastructure overhead (not model time) | Monitored but no alarm |
| CPUUtilization | /aws/sagemaker/Endpoints | Instance CPU usage during inference | Monitored for capacity planning |
| Invocations | AWS/SageMaker | Total number of requests processed | Monitored for traffic volume |
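An alarm matching the first row of the table can be sketched as arguments for the CloudWatch `PutMetricAlarm` API. Since the `Average` statistic of `Invocation5XXErrors` approximates the error fraction, a threshold of 0.01 corresponds to the lab's 1% error rate; the alarm name and exact values are assumptions:

```python
def five_xx_alarm_args(endpoint_name, variant_name="AllTraffic"):
    """Arguments for cloudwatch.put_metric_alarm: fire when the 5XX error
    rate exceeds ~1% over a single 1-minute period."""
    return {
        "AlarmName": f"{endpoint_name}-5xx",       # assumed naming convention
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocation5XXErrors",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",      # average of 0/1 outcomes = error rate
        "Period": 60,                # 1-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 0.01,           # ~1% error rate, per the lab's threshold
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }
```

A second alarm on `ModelLatency` with `Threshold=5000` (microsecond-reported latency aside, the lab states 5000ms average) would follow the same shape.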
- OK — metric within threshold → traffic shift continues
- INSUFFICIENT_DATA — not enough data points yet → shift pauses and waits for data
- ALARM — threshold breached → immediate rollback triggered
| Phase | Invocations | 5XX Errors | Latency | Outcome |
|---|---|---|---|---|
| Initial (Model A) | ~2000 requests | 0 | ~50ms avg | Baseline established |
| Shift to Model B | Traffic splitting | Errors appear ("E" in output) | Spikes | Alarm fires → rollback |
| After rollback | 100% back to A | 0 | ~50ms | Service restored |
| Shift to Model E | Traffic splitting | 0 | Decreasing | Successful deployment |
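The "E" marks in the table's error column come from a test-traffic loop. A minimal sketch, assuming a placeholder endpoint name and dummy CSV payload (not the lab's actual test data):

```python
def summarize_responses(outcomes):
    """Collapse per-request outcomes into the trace the lab prints:
    '.' for a successful invocation, 'E' for a failed one."""
    return "".join("." if ok else "E" for ok in outcomes)

def send_test_traffic(n=2000, endpoint_name="lab5-fraud-endpoint"):
    """Fire n requests at the endpoint during the shift (requires AWS creds).
    The endpoint name and feature row here are placeholders."""
    import boto3
    from botocore.exceptions import ClientError
    runtime = boto3.client("sagemaker-runtime")
    outcomes = []
    for _ in range(n):
        try:
            runtime.invoke_endpoint(
                EndpointName=endpoint_name,
                ContentType="text/csv",
                Body="0.5,1.2,3.4",  # dummy feature row
            )
            outcomes.append(True)
        except ClientError:
            outcomes.append(False)  # 5XX from the broken (Model B) variant
    return summarize_responses(outcomes)
```

During the Model B shift, the share of "E" marks roughly tracks the percentage of traffic routed to the green fleet.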
The safety net that makes blue/green deployment production-safe. When CloudWatch alarms fire during a traffic shift, SageMaker automatically reverts all traffic to the original model — no human intervention, no downtime.
| Step | What Happens | Duration |
|---|---|---|
| 1. Alarm fires | CloudWatch detects 5XX errors exceed 1% threshold for 1 minute | ~60 seconds |
| 2. Traffic reverts | SageMaker routes 100% traffic back to blue fleet (Model A) | ~10 seconds |
| 3. Green fleet removed | Failed model instances are terminated, endpoint config cleaned up | ~1–2 minutes |
| 4. Status: InService | Endpoint returns to stable state with original model serving all traffic | Immediate |
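You can watch the rollback complete by polling `describe_endpoint` until the status leaves `Updating`. A sketch with the describe call injected as a function, so the polling logic stands alone:

```python
import time

def wait_for_in_service(describe_fn, poll_seconds=30, timeout_seconds=1800):
    """Poll until the endpoint leaves 'Updating' (a deployment or rollback in
    progress). describe_fn stands in for a call like
    sagemaker_client.describe_endpoint(EndpointName=...).
    Returns the terminal status, e.g. 'InService'."""
    waited = 0
    while waited < timeout_seconds:
        status = describe_fn()["EndpointStatus"]
        if status != "Updating":
            return status
        time.sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError("endpoint did not stabilize within the timeout")
```

After a rollback, this returns `InService` with the original endpoint config still attached, confirming Model A is serving all traffic again.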
After a rollback, you fix the model and try again. The RetainDeploymentConfig=True parameter tells SageMaker to reuse the same traffic routing policy and alarm configuration from the failed attempt — no need to reconfigure everything.
Without RetainDeploymentConfig: you must re-specify the entire DeploymentConfig block — BlueGreenUpdatePolicy, traffic routing, alarm references, wait intervals. Error-prone if done manually.
With RetainDeploymentConfig=True: just provide the new EndpointConfigName. SageMaker reuses the linear policy, alarm ARNs, and wait intervals from the previous deployment. A one-line change.
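The retry call reduces to three arguments to `update_endpoint`. The endpoint and config names below are placeholders:

```python
def retry_update_args(endpoint_name, new_config_name):
    """Arguments for sagemaker.update_endpoint on the retry: only the new
    endpoint config is supplied, and RetainDeploymentConfig=True reuses the
    linear policy and alarm configuration from the failed attempt."""
    return {
        "EndpointName": endpoint_name,            # e.g. the Lab 5 endpoint
        "EndpointConfigName": new_config_name,    # config pointing at Model E
        "RetainDeploymentConfig": True,
    }
```

Compare this with the first deployment, which had to pass the full `DeploymentConfig` dictionary alongside the config name.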
5XX errors: the new model throws exceptions — incompatible input format, missing dependencies, OOM on larger inputs. The most common failure mode in this lab.
High latency: the new model is too slow — larger architecture, unoptimized inference code, or an undersized instance type. Breaches the latency SLA.
Prediction drift: not directly monitored by CloudWatch alarms in this lab, but in production you'd add custom metrics comparing prediction distributions.
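Such a custom drift metric could be published with CloudWatch's `put_metric_data`. The namespace, metric name, and distance measure below are hypothetical, not part of Lab 5:

```python
def drift_metric_data(endpoint_name, distance):
    """Arguments for cloudwatch.put_metric_data publishing a hypothetical
    prediction-distribution distance (e.g. a population stability index
    between training-time and live prediction distributions)."""
    return {
        "Namespace": "Custom/ModelQuality",  # assumed custom namespace
        "MetricData": [{
            "MetricName": "PredictionDrift",  # hypothetical metric name
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Value": distance,
            "Unit": "None",
        }],
    }
```

An alarm on this metric could then be added to the AutoRollbackConfiguration alarm list, just like the built-in 5XX and latency alarms.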
How does blue/green deployment with traffic shifting apply to AnyCompany's ML products? Each product has different risk tolerance, traffic patterns, and rollback requirements.
Real-time endpoint, zero tolerance for downtime. Missed fraud = $50K+ loss per incident.
High-traffic LLM endpoint. Latency-sensitive — users expect sub-2s responses.
Monthly batch inference. Lower risk — can validate offline before switching.
Async processing of tax forms. Throughput matters more than latency.
| Lab 5 Concept | AnyCompany Equivalent | Why It Matters |
|---|---|---|
| Model A (production) | Current fraud model (v2.3) | Serving millions of payroll transactions daily |
| Model B (broken) | Model trained on corrupted data | Would flag legitimate transactions as fraud — business impact |
| Model E (improved) | Retrained model with new fraud patterns | Catches new fraud tactics from recent months |
| Linear traffic shift | Gradual rollout across client segments | Start with low-risk clients, expand to enterprise |
| 5XX alarm | Prediction failure rate alarm | Model crashes = transactions processed without fraud check |
| Latency alarm | SLA breach alarm (<200ms required) | Payroll processing has strict time windows |
| Auto-rollback | Instant revert to proven model | Compliance requirement — cannot have unprotected window |