Module 10 - Interactive Explainer
Apply DevOps principles to ML workflows - automate testing, deployment, and versioning with CI/CD pipelines that keep AnyCompany models reliable at enterprise scale.
MLOps = ML + DEV + OPS. It applies DevOps principles (automation, monitoring, collaboration) to machine learning systems. At AnyCompany, MLOps ensures that models serving millions of payroll transactions are reliable, reproducible, and continuously improving.
Data scientists build models - attrition prediction, fraud detection, salary benchmarking. But a notebook model is not a production system.
Software engineers write production code - APIs, containers, tests. ML code needs the same rigor as any AnyCompany microservice.
Ops engineers deploy, monitor, and maintain systems. ML models degrade over time - they need operational care like any production service.
| Feature | DevOps | MLOps (Additional) |
|---|---|---|
| Code versioning | ✓ | ✓ (plus data and model versioning) |
| Compute environment | ✓ | ✓ (GPU/Trainium for training) |
| CI/CD | ✓ | ✓ (plus model validation gates) |
| Production monitoring | ✓ | ✓ (plus data drift and model decay) |
| Data provenance | | ✓ Track which data trained which model |
| Dataset management | | ✓ Version, validate, and lineage-track datasets |
| Model registry | | ✓ Catalog models with approval workflows |
| Model build pipelines | | ✓ Automated training and evaluation |
| Model deployment workflows | | ✓ Canary/linear traffic shifting with rollback |
Same code + same data = same model. No "works on my laptop" problems. AnyCompany models must produce identical results across dev, staging, and production.
Recreate any past model version exactly. Required for compliance audits: "show me the model that made this decision 6 months ago."
Handle growing data volumes and model complexity. AnyCompany adds new countries and clients continuously - pipelines must scale without manual intervention.
Full lineage: who trained what, when, with which data, who approved deployment. Non-negotiable for AnyCompany regulatory compliance across 140+ countries.
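To make auditability concrete, the sketch below answers the "show me the model from 6 months ago" question by querying the SageMaker Model Registry with boto3. A minimal sketch; the model package group name is a hypothetical example.

```python
# Hypothetical audit query: list every registered model version with its
# approval status and the S3 location of its exact artifacts - enough to
# redeploy any past version. The group name is illustrative.
import boto3

sm = boto3.client("sagemaker")

versions = sm.list_model_packages(
    ModelPackageGroupName="anycompany-fraud-detection",
    SortBy="CreationTime",
    SortOrder="Descending",
)["ModelPackageSummaryList"]

for v in versions:
    detail = sm.describe_model_package(ModelPackageName=v["ModelPackageArn"])
    print(
        detail["ModelPackageVersion"],
        detail["CreationTime"],
        detail["ModelApprovalStatus"],
        # Exact artifact location for this version - the audit trail.
        detail["InferenceSpecification"]["Containers"][0]["ModelDataUrl"],
    )
```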
Traditional CI/CD automates code from commit to production. ML CI/CD extends this to handle data pipelines, model training, evaluation gates, and model deployment - all automated.
| Stage | What Happens | Trigger | AnyCompany Example |
|---|---|---|---|
| Data | Ingest, validate, and version new data | New data arrives (scheduled or event) | Monthly payroll data refresh from HRIS |
| Code | Lint, format, static analysis on ML code | Git push to feature branch | Data scientist pushes new feature engineering code |
| Build | Build training containers, resolve dependencies | Merge to main branch | Build XGBoost container with updated preprocessing |
| Test | Unit tests, integration tests, model validation | After successful build | Verify model AUC > 0.75 on validation set |
| Deploy | Deploy model to staging, then production | Tests pass + manual approval | Canary deploy fraud model to 10% of traffic |
| Monitor | Track performance, detect drift, alert on degradation | Continuous in production | Alert if fraud detection recall drops below 90% |
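To make the Test stage concrete, here is a minimal sketch of an evaluation script that enforces the AUC gate from the table. The model and data file names are illustrative; the 0.75 threshold mirrors the example above.

```python
# Minimal CI evaluation gate: fail the build if validation AUC <= 0.75.
# Assumes a pickled sklearn-compatible model and a CSV with a "label"
# column - both file names are illustrative.
import pickle
import sys

import pandas as pd
from sklearn.metrics import roc_auc_score

THRESHOLD = 0.75

model = pickle.load(open("model.pkl", "rb"))
val = pd.read_csv("validation.csv")
X, y = val.drop(columns=["label"]), val["label"]

auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
print(f"Validation AUC: {auc:.4f} (gate: > {THRESHOLD})")

# A non-zero exit code fails the build step, which blocks the pipeline.
sys.exit(0 if auc > THRESHOLD else 1)
```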
Owner: Data Engineer. ETL pipelines, data validation, feature store ingestion. Changes here trigger data pipeline runs, not model retraining directly.
Owner: Data Scientist. Training scripts, hyperparameters, evaluation logic. Changes here trigger model build pipeline (train + evaluate + register).
Owner: MLOps Engineer. Infrastructure as code, endpoint configs, traffic shifting rules. Changes here trigger deployment pipeline only.
The ML Engineer (your role in this course) bridges all three systems. You understand data pipelines, model building, AND deployment. At AnyCompany, AutoPay Modernization team members own end-to-end ML features.
Manual testing is error-prone and does not scale. At AnyCompany, with models serving millions of transactions, automated tests catch issues before they reach production.
Test individual functions: feature engineering logic, data transformations, preprocessing steps. Fast, run on every commit. "Does this function correctly calculate tenure from hire date?"
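For example, a unit test for the tenure question above might look like this sketch (pytest; `calculate_tenure_years` is a hypothetical helper, inlined here so the test is self-contained).

```python
# Hypothetical unit test for a feature engineering helper. In a real repo,
# calculate_tenure_years would be imported from the module under test.
from datetime import date

import pytest


def calculate_tenure_years(hire_date: date, as_of: date) -> float:
    """Tenure in years - the function under test, inlined for illustration."""
    return (as_of - hire_date).days / 365.25


def test_tenure_ten_years():
    assert calculate_tenure_years(date(2014, 1, 1), date(2024, 1, 1)) == pytest.approx(10.0, abs=0.01)


def test_future_hire_date_yields_negative_tenure():
    # Edge case: a hire date in the future produces a negative value,
    # which downstream validation should reject.
    assert calculate_tenure_years(date(2030, 1, 1), date(2024, 1, 1)) < 0
```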
Test components working together: data pipeline feeds training, training produces valid model artifacts. "Does the full pipeline from S3 data to registered model work end-to-end?"
Ensure new changes do not degrade existing performance. Compare new model metrics against baseline. "Is the new fraud model at least as good as the current production version?"
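A regression test can encode exactly that comparison. In this sketch the metrics files and metric keys are illustrative; it assumes the CI job has already written the candidate model's metrics to a JSON file.

```python
# Hypothetical regression test: the candidate model must match or beat the
# production baseline. File names and metric keys are illustrative; a small
# tolerance avoids failing on noise-level differences between runs.
import json

TOLERANCE = 0.005


def load(path):
    with open(path) as f:
        return json.load(f)


def test_candidate_not_worse_than_baseline():
    baseline = load("baseline_metrics.json")    # e.g. {"auc": 0.91, "recall": 0.95}
    candidate = load("candidate_metrics.json")
    for metric in ("auc", "recall"):
        assert candidate[metric] >= baseline[metric] - TOLERANCE, (
            f"{metric} regressed: {candidate[metric]:.3f} < {baseline[metric]:.3f}"
        )
```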
Tests run in minutes, not days. Catch issues immediately after code push. AnyCompany developers get feedback before their PR is even reviewed.
Consistent, repeatable checks every time. No human error in test execution. Same tests run in dev, staging, and pre-production.
Test hundreds of scenarios automatically. Edge cases, boundary conditions, multi-country data formats. Impossible to cover manually at AnyCompany scale.
Find bugs in development, not production. A data format issue caught in CI costs $0. The same bug in production affecting payroll costs millions.
AWS provides a complete toolchain for automating ML deployments - from source control through production monitoring.
| Service | Role in Pipeline | Key Features | AnyCompany Use |
|---|---|---|---|
| AWS CodePipeline | Orchestrator - connects all stages | Manual approvals, notifications, security | Orchestrates the full model deployment workflow with approval gates |
| Git Repository | Source control for ML code | Branching, PRs, code review | CodeCommit or GitHub for training scripts, IaC, and pipeline definitions |
| AWS CodeBuild | Build and test | Scalable, logging, artifacts, AWS integration | Build training containers, run unit tests, validate data schemas |
| AWS CloudFormation | Infrastructure as Code | Templates, nested stacks, rollbacks, change sets | Deploy SageMaker endpoints, configure auto-scaling, provision resources |
| AWS CodeDeploy | Deployment automation | Blue/green, rolling, rollback, integrations | Traffic shifting for model endpoint updates with automatic rollback |
SageMaker Projects provides pre-built MLOps templates that wire together all these services automatically.
Git repository with branching strategy. Separate repos for model code and deployment code. PR-based workflow with code review.
EventBridge rules trigger pipelines on code push, new data arrival, or model registration. No manual intervention needed.
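As an illustration, the sketch below wires one such trigger with boto3: an EventBridge rule that starts a SageMaker pipeline when new data lands in an S3 bucket. The bucket, pipeline, and role names are hypothetical, and the bucket must have EventBridge notifications enabled.

```python
# Hypothetical EventBridge trigger: start the model build pipeline whenever
# new data lands in the transactions bucket. All names are illustrative.
import json

import boto3

events = boto3.client("events")

# Fire on S3 "Object Created" events for the data bucket.
events.put_rule(
    Name="anycompany-new-training-data",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["anycompany-transactions"]}},
    }),
    State="ENABLED",
)

# Target: a SageMaker model building pipeline - no manual intervention.
events.put_targets(
    Rule="anycompany-new-training-data",
    Targets=[{
        "Id": "start-model-build",
        "Arn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/fraud-build",
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgePipelineRole",
        "SageMakerPipelineParameters": {"PipelineParameterList": []},
    }],
)
```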
Automated: preprocess data, train model, evaluate metrics, register if quality gate passes. Runs on every trigger.
Automated: deploy to staging, run integration tests, manual approval, deploy to production with traffic shifting.
SageMaker Pipelines is purpose-built for ML workflow orchestration. Define your training pipeline as code, with automated quality gates and model governance built in.
Step 1: Preprocess data - Clean, encode, split. Output to SageMaker Feature Store.
Step 2: Train and tune model - XGBoost with automatic hyperparameter tuning. Output model artifacts to S3.
Step 3: Evaluate model - Calculate AUC, precision, recall on test set. Run SageMaker Clarify for bias detection.
Step 4: Quality gate - Is AUC > 0.75? If NO, pipeline fails and alerts team. If YES, continue.
Step 5: Register model - Add to SageMaker Model Registry with version, metrics, and lineage metadata.
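The quality gate in Step 4 maps onto a ConditionStep in the SageMaker Python SDK. Below is a minimal sketch of that wiring; it assumes the upstream steps (eval_step, register_step) and the evaluation report PropertyFile are defined elsewhere in the pipeline code, and that the evaluation script writes an evaluation.json containing metrics.auc.value.

```python
# Sketch of the Step 4 quality gate with the SageMaker Python SDK (v2).
# eval_step, register_step, and evaluation_report are assumed to be defined
# earlier in the pipeline code; only the gate wiring is shown here.
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThan
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import JsonGet


def build_quality_gate(eval_step, evaluation_report, register_step, threshold=0.75):
    """Return a ConditionStep: register the model if AUC > threshold, else fail."""
    auc = JsonGet(
        step_name=eval_step.name,
        property_file=evaluation_report,  # PropertyFile for evaluation.json
        json_path="metrics.auc.value",    # path inside the report
    )
    fail = FailStep(
        name="FailOnLowAUC",
        error_message="Model AUC did not clear the quality gate - alert the team.",
    )
    return ConditionStep(
        name="CheckAUCGate",
        conditions=[ConditionGreaterThan(left=auc, right=threshold)],
        if_steps=[register_step],  # Step 5: register in the Model Registry
        else_steps=[fail],         # pipeline fails and surfaces an error
    )
```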
| Pipeline | Trigger | Steps | Output |
|---|---|---|---|
| CI (Model Build) | Code push or new data | Validate repo, run tests, build containers, define pipeline, run training | Registered model in Model Registry |
| CD (Model Deploy) | New model registered | Generate deployment templates, deploy to staging, manual approval, deploy to production | Live SageMaker endpoint serving predictions |
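Once defined, the CI pipeline can also be started and inspected directly with boto3, as in this sketch (pipeline name and parameter are illustrative).

```python
# Hypothetical kickoff of the CI (model build) pipeline, plus a status check.
# The pipeline name and parameter values are illustrative.
import boto3

sm = boto3.client("sagemaker")

execution = sm.start_pipeline_execution(
    PipelineName="fraud-build",
    PipelineParameters=[
        {"Name": "InputDataUrl", "Value": "s3://anycompany-transactions/monthly/"}
    ],
)

status = sm.describe_pipeline_execution(
    PipelineExecutionArn=execution["PipelineExecutionArn"]
)["PipelineExecutionStatus"]
print(status)  # Executing | Succeeded | Failed | Stopped
```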
Each AnyCompany ML system calls for its own MLOps pipeline architecture - triggers, stages, quality gates, and deployment strategy. Three representative systems:
Fraud detection - critical real-time model. Monthly retraining on new transaction data. Zero-downtime deployments.
Attrition prediction - monthly batch scoring. Quarterly retraining. HR dashboard integration.
Q&A chatbot - continuous fine-tuning on new Q&A pairs. A/B testing of new versions. User-facing.
The fraud detection pipeline is detailed below.
| Pipeline Component | Configuration |
|---|---|
| Trigger | EventBridge rule: new data in s3://transactions/monthly/ OR code push to main branch |
| CI Pipeline | Preprocess (SageMaker Processing) → Train (XGBoost) → Evaluate (recall > 95%, precision > 80%) → Register |
| Quality Gate | Automated: AUC > 0.92, Recall > 95%. Plus SageMaker Clarify bias check on protected attributes. |
| CD Pipeline | Deploy to staging → Integration tests (100 known fraud cases) → Manual approval → Linear deploy to production |
| Deployment Strategy | Linear traffic shifting (25% steps, 1-hour bake, CloudWatch alarm auto-rollback) |
| Monitoring | Model Monitor for data drift. CloudWatch alarm if recall drops below 90%. Monthly retraining trigger. |
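The deployment strategy row corresponds to SageMaker's built-in blue/green update policy. A minimal sketch, assuming an existing endpoint, a new endpoint config, and a CloudWatch alarm on recall (all names illustrative):

```python
# Hypothetical endpoint update with linear traffic shifting: 25% capacity
# steps, a 1-hour bake per step, and automatic rollback if the recall alarm
# fires. Endpoint, config, and alarm names are illustrative.
import boto3

sm = boto3.client("sagemaker")

sm.update_endpoint(
    EndpointName="fraud-detection-prod",
    EndpointConfigName="fraud-detection-config-v42",  # config for the new model
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "LINEAR",
                "LinearStepSize": {"Type": "CAPACITY_PERCENT", "Value": 25},
                "WaitIntervalInSeconds": 3600,  # 1-hour bake between steps
            },
        },
        "AutoRollbackConfiguration": {
            # If this alarm enters ALARM during the rollout, traffic shifts
            # back to the old model automatically.
            "Alarms": [{"AlarmName": "fraud-recall-below-90"}],
        },
    },
)
```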
ML + DEV + OPS. Extends DevOps with data versioning, model registry, and training pipelines. Consistency, reproducibility, auditability.
Unit, integration, regression tests. Quality gates (AUC thresholds). Catch issues in CI, not production.
CodePipeline orchestrates. CodeBuild tests. CloudFormation deploys. CodeDeploy shifts traffic. SageMaker Projects ties it together.
ML-native workflow orchestration. Preprocess, train, evaluate, gate, register. CI builds models, CD deploys them.