Module 10 - Interactive Explainer
Apply DevOps principles to ML workflows - automate testing, deployment, and versioning with CI/CD pipelines that keep AnyCompany models reliable at enterprise scale.
MLOps = ML + DEV + OPS. It applies DevOps principles (automation, monitoring, collaboration) to machine learning systems. At AnyCompany, MLOps ensures that models serving millions of payroll transactions are reliable, reproducible, and continuously improving.
Data scientists build models - attrition prediction, fraud detection, salary benchmarking. But a notebook model is not a production system.
Software engineers write production code - APIs, containers, tests. ML code needs the same rigor as any AnyCompany microservice.
Ops engineers deploy, monitor, and maintain systems. ML models degrade over time - they need operational care like any production service.
| Feature | DevOps | MLOps (Additional) |
|---|---|---|
| Code versioning | ✓ | ✓ (plus data and model versioning) |
| Compute environment | ✓ | ✓ (GPU/Trainium for training) |
| CI/CD | ✓ | ✓ (plus model validation gates) |
| Production monitoring | ✓ | ✓ (plus data drift and model decay) |
| Data provenance | | ✓ Track which data trained which model |
| Dataset management | | ✓ Version, validate, and lineage-track datasets |
| Model registry | | ✓ Catalog models with approval workflows |
| Model build pipelines | | ✓ Automated training and evaluation |
| Model deployment workflows | | ✓ Canary/linear traffic shifting with rollback |
Same code + same data = same model. No "works on my laptop" problems. AnyCompany models must produce identical results across dev, staging, and production.
Recreate any past model version exactly. Required for compliance audits: "show me the model that made this decision 6 months ago."
Handle growing data volumes and model complexity. AnyCompany adds new countries and clients continuously - pipelines must scale without manual intervention.
Full lineage: who trained what, when, with which data, who approved deployment. Non-negotiable for AnyCompany regulatory compliance across 140+ countries.
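To make auditability concrete, the sketch below answers the "show me the model from 6 months ago" question by querying the SageMaker Model Registry with boto3. A minimal sketch; the model package group name is a hypothetical example.

```python
# Hypothetical audit query: list every registered model version with its
# approval status and the S3 location of its exact artifacts - enough to
# redeploy any past version. The group name is illustrative.
import boto3

sm = boto3.client("sagemaker")

versions = sm.list_model_packages(
    ModelPackageGroupName="anycompany-fraud-detection",
    SortBy="CreationTime",
    SortOrder="Descending",
)["ModelPackageSummaryList"]

for v in versions:
    detail = sm.describe_model_package(ModelPackageName=v["ModelPackageArn"])
    print(
        detail["ModelPackageVersion"],
        detail["CreationTime"],
        detail["ModelApprovalStatus"],
        # Exact artifact location for this version - the audit trail.
        detail["InferenceSpecification"]["Containers"][0]["ModelDataUrl"],
    )
```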
Traditional CI/CD automates code from commit to production. ML CI/CD extends this to handle data pipelines, model training, evaluation gates, and model deployment - all automated.
| Stage | What Happens | Trigger | AnyCompany Example |
|---|---|---|---|
| Data | Ingest, validate, and version new data | New data arrives (scheduled or event) | Monthly payroll data refresh from HRIS |
| Code | Lint, format, static analysis on ML code | Git push to feature branch | Data scientist pushes new feature engineering code |
| Build | Build training containers, resolve dependencies | Merge to main branch | Build XGBoost container with updated preprocessing |
| Test | Unit tests, integration tests, model validation | After successful build | Verify model AUC > 0.75 on validation set |
| Deploy | Deploy model to staging, then production | Tests pass + manual approval | Canary deploy fraud model to 10% of traffic |
| Monitor | Track performance, detect drift, alert on degradation | Continuous in production | Alert if fraud detection recall drops below 90% |
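To make the Test stage concrete, here is a minimal sketch of an evaluation script that enforces the AUC gate from the table. The model and data file names are illustrative; the 0.75 threshold mirrors the example above.

```python
# Minimal CI evaluation gate: fail the build if validation AUC <= 0.75.
# Assumes a pickled sklearn-compatible model and a CSV with a "label"
# column - both file names are illustrative.
import pickle
import sys

import pandas as pd
from sklearn.metrics import roc_auc_score

THRESHOLD = 0.75

model = pickle.load(open("model.pkl", "rb"))
val = pd.read_csv("validation.csv")
X, y = val.drop(columns=["label"]), val["label"]

auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
print(f"Validation AUC: {auc:.4f} (gate: > {THRESHOLD})")

# A non-zero exit code fails the build step, which blocks the pipeline.
sys.exit(0 if auc > THRESHOLD else 1)
```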
Owner: Data Engineer. ETL pipelines, data validation, feature store ingestion. Changes here trigger data pipeline runs, not model retraining directly.
Owner: Data Scientist. Training scripts, hyperparameters, evaluation logic. Changes here trigger model build pipeline (train + evaluate + register).
Owner: MLOps Engineer. Infrastructure as code, endpoint configs, traffic shifting rules. Changes here trigger deployment pipeline only.
The ML Engineer (your role in this course) bridges all three systems. You understand data pipelines, model building, AND deployment. At AnyCompany, AutoPay Modernization team members own end-to-end ML features.
Manual testing is error-prone and does not scale. At AnyCompany, with models serving millions of transactions, automated tests catch issues before they reach production.
Test individual functions: feature engineering logic, data transformations, preprocessing steps. Fast, run on every commit. "Does this function correctly calculate tenure from hire date?"
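For example, a unit test for the tenure question above might look like this sketch (pytest; `calculate_tenure_years` is a hypothetical helper, inlined here so the test is self-contained).

```python
# Hypothetical unit test for a feature engineering helper. In a real repo,
# calculate_tenure_years would be imported from the module under test.
from datetime import date

import pytest


def calculate_tenure_years(hire_date: date, as_of: date) -> float:
    """Tenure in years - the function under test, inlined for illustration."""
    return (as_of - hire_date).days / 365.25


def test_tenure_ten_years():
    assert calculate_tenure_years(date(2014, 1, 1), date(2024, 1, 1)) == pytest.approx(10.0, abs=0.01)


def test_future_hire_date_yields_negative_tenure():
    # Edge case: a hire date in the future produces a negative value,
    # which downstream validation should reject.
    assert calculate_tenure_years(date(2030, 1, 1), date(2024, 1, 1)) < 0
```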
Test components working together: data pipeline feeds training, training produces valid model artifacts. "Does the full pipeline from S3 data to registered model work end-to-end?"
Ensure new changes do not degrade existing performance. Compare new model metrics against baseline. "Is the new fraud model at least as good as the current production version?"
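A regression test can encode exactly that comparison. In this sketch the metrics files and metric keys are illustrative; it assumes the CI job has already written the candidate model's metrics to a JSON file.

```python
# Hypothetical regression test: the candidate model must match or beat the
# production baseline. File names and metric keys are illustrative; a small
# tolerance avoids failing on noise-level differences between runs.
import json

TOLERANCE = 0.005


def load(path):
    with open(path) as f:
        return json.load(f)


def test_candidate_not_worse_than_baseline():
    baseline = load("baseline_metrics.json")    # e.g. {"auc": 0.91, "recall": 0.95}
    candidate = load("candidate_metrics.json")
    for metric in ("auc", "recall"):
        assert candidate[metric] >= baseline[metric] - TOLERANCE, (
            f"{metric} regressed: {candidate[metric]:.3f} < {baseline[metric]:.3f}"
        )
```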
Tests run in minutes, not days. Catch issues immediately after code push. AnyCompany developers get feedback before their PR is even reviewed.
Consistent, repeatable checks every time. No human error in test execution. Same tests run in dev, staging, and pre-production.
Test hundreds of scenarios automatically. Edge cases, boundary conditions, multi-country data formats. Impossible to cover manually at AnyCompany scale.
Find bugs in development, not production. A data format issue caught in CI costs $0. The same bug in production affecting payroll costs millions.
AWS provides a complete toolchain for automating ML deployments - from source control through production monitoring.
| Service | Role in Pipeline | Key Features | AnyCompany Use |
|---|---|---|---|
| AWS CodePipeline | Orchestrator - connects all stages | Manual approvals, notifications, security | Orchestrates the full model deployment workflow with approval gates |
| Git Repository | Source control for ML code | Branching, PRs, code review | CodeCommit or GitHub for training scripts, IaC, and pipeline definitions |
| AWS CodeBuild | Build and test | Scalable, logging, artifacts, AWS integration | Build training containers, run unit tests, validate data schemas |
| AWS CloudFormation | Infrastructure as Code | Templates, nested stacks, rollbacks, change sets | Deploy SageMaker endpoints, configure auto-scaling, provision resources |
| AWS CodeDeploy | Deployment automation | Blue/green, rolling, rollback, integrations | Traffic shifting for model endpoint updates with automatic rollback |
SageMaker Projects provides pre-built MLOps templates that wire together all these services automatically.
Git repository with branching strategy. Separate repos for model code and deployment code. PR-based workflow with code review.
EventBridge rules trigger pipelines on code push, new data arrival, or model registration. No manual intervention needed.
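As an illustration, the sketch below wires one such trigger with boto3: an EventBridge rule that starts a SageMaker pipeline when new data lands in an S3 bucket. The bucket, pipeline, and role names are hypothetical, and the bucket must have EventBridge notifications enabled.

```python
# Hypothetical EventBridge trigger: start the model build pipeline whenever
# new data lands in the transactions bucket. All names are illustrative.
import json

import boto3

events = boto3.client("events")

# Fire on S3 "Object Created" events for the data bucket.
events.put_rule(
    Name="anycompany-new-training-data",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["anycompany-transactions"]}},
    }),
    State="ENABLED",
)

# Target: a SageMaker model building pipeline - no manual intervention.
events.put_targets(
    Rule="anycompany-new-training-data",
    Targets=[{
        "Id": "start-model-build",
        "Arn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/fraud-build",
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgePipelineRole",
        "SageMakerPipelineParameters": {"PipelineParameterList": []},
    }],
)
```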
Automated: preprocess data, train model, evaluate metrics, register if quality gate passes. Runs on every trigger.
Automated: deploy to staging, run integration tests, manual approval, deploy to production with traffic shifting.
SageMaker Pipelines is purpose-built for ML workflow orchestration. Define your training pipeline as code, with automated quality gates and model governance built in.
Step 1: Preprocess data - Clean, encode, split. Output to SageMaker Feature Store.
Step 2: Train and tune model - XGBoost with automatic hyperparameter tuning. Output model artifacts to S3.
Step 3: Evaluate model - Calculate AUC, precision, recall on test set. Run SageMaker Clarify for bias detection.
Step 4: Quality gate - Is AUC > 0.75? If NO, pipeline fails and alerts team. If YES, continue.
Step 5: Register model - Add to SageMaker Model Registry with version, metrics, and lineage metadata.
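The quality gate in Step 4 maps onto a ConditionStep in the SageMaker Python SDK. Below is a minimal sketch of that wiring; it assumes the upstream steps (eval_step, register_step) and the evaluation report PropertyFile are defined elsewhere in the pipeline code, and that the evaluation script writes an evaluation.json containing metrics.auc.value.

```python
# Sketch of the Step 4 quality gate with the SageMaker Python SDK (v2).
# eval_step, register_step, and evaluation_report are assumed to be defined
# earlier in the pipeline code; only the gate wiring is shown here.
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThan
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import JsonGet


def build_quality_gate(eval_step, evaluation_report, register_step, threshold=0.75):
    """Return a ConditionStep: register the model if AUC > threshold, else fail."""
    auc = JsonGet(
        step_name=eval_step.name,
        property_file=evaluation_report,  # PropertyFile for evaluation.json
        json_path="metrics.auc.value",    # path inside the report
    )
    fail = FailStep(
        name="FailOnLowAUC",
        error_message="Model AUC did not clear the quality gate - alert the team.",
    )
    return ConditionStep(
        name="CheckAUCGate",
        conditions=[ConditionGreaterThan(left=auc, right=threshold)],
        if_steps=[register_step],  # Step 5: register in the Model Registry
        else_steps=[fail],         # pipeline fails and surfaces an error
    )
```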
| Pipeline | Trigger | Steps | Output |
|---|---|---|---|
| CI (Model Build) | Code push or new data | Validate repo, run tests, build containers, define pipeline, run training | Registered model in Model Registry |
| CD (Model Deploy) | New model registered | Generate deployment templates, deploy to staging, manual approval, deploy to production | Live SageMaker endpoint serving predictions |
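Once defined, the CI pipeline can also be started and inspected directly with boto3, as in this sketch (pipeline name and parameter are illustrative).

```python
# Hypothetical kickoff of the CI (model build) pipeline, plus a status check.
# The pipeline name and parameter values are illustrative.
import boto3

sm = boto3.client("sagemaker")

execution = sm.start_pipeline_execution(
    PipelineName="fraud-build",
    PipelineParameters=[
        {"Name": "InputDataUrl", "Value": "s3://anycompany-transactions/monthly/"}
    ],
)

status = sm.describe_pipeline_execution(
    PipelineExecutionArn=execution["PipelineExecutionArn"]
)["PipelineExecutionStatus"]
print(status)  # Executing | Succeeded | Failed | Stopped
```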
Each AnyCompany ML system calls for its own MLOps pipeline architecture - triggers, stages, quality gates, and deployment strategy. Three representative systems:
Fraud detection - critical real-time model. Monthly retraining on new transaction data. Zero-downtime deployments.
Attrition prediction - monthly batch scoring. Quarterly retraining. HR dashboard integration.
Q&A chatbot - continuous fine-tuning on new Q&A pairs. A/B testing of new versions. User-facing.
The fraud detection pipeline is detailed below.
| Pipeline Component | Configuration |
|---|---|
| Trigger | EventBridge rule: new data in s3://transactions/monthly/ OR code push to main branch |
| CI Pipeline | Preprocess (SageMaker Processing) → Train (XGBoost) → Evaluate (recall > 95%, precision > 80%) → Register |
| Quality Gate | Automated: AUC > 0.92, Recall > 95%. Plus SageMaker Clarify bias check on protected attributes. |
| CD Pipeline | Deploy to staging → Integration tests (100 known fraud cases) → Manual approval → Linear deploy to production |
| Deployment Strategy | Linear traffic shifting (25% steps, 1-hour bake, CloudWatch alarm auto-rollback) |
| Monitoring | Model Monitor for data drift. CloudWatch alarm if recall drops below 90%. Monthly retraining trigger. |
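The deployment strategy row corresponds to SageMaker's built-in blue/green update policy. A minimal sketch, assuming an existing endpoint, a new endpoint config, and a CloudWatch alarm on recall (all names illustrative):

```python
# Hypothetical endpoint update with linear traffic shifting: 25% capacity
# steps, a 1-hour bake per step, and automatic rollback if the recall alarm
# fires. Endpoint, config, and alarm names are illustrative.
import boto3

sm = boto3.client("sagemaker")

sm.update_endpoint(
    EndpointName="fraud-detection-prod",
    EndpointConfigName="fraud-detection-config-v42",  # config for the new model
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "LINEAR",
                "LinearStepSize": {"Type": "CAPACITY_PERCENT", "Value": 25},
                "WaitIntervalInSeconds": 3600,  # 1-hour bake between steps
            },
        },
        "AutoRollbackConfiguration": {
            # If this alarm enters ALARM during the rollout, traffic shifts
            # back to the old model automatically.
            "Alarms": [{"AlarmName": "fraud-recall-below-90"}],
        },
    },
)
```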
ML + DEV + OPS. Extends DevOps with data versioning, model registry, and training pipelines. Consistency, reproducibility, auditability.
Unit, integration, regression tests. Quality gates (AUC thresholds). Catch issues in CI, not production.
CodePipeline orchestrates. CodeBuild tests. CloudFormation deploys. CodeDeploy shifts traffic. SageMaker Projects ties it together.
ML-native workflow orchestration. Preprocess, train, evaluate, gate, register. CI builds models, CD deploys them.