Lab 6 — Interactive Explainer
Build a 9-step ML pipeline that automates data processing, hyperparameter tuning, model evaluation, bias detection, and conditional model registration — all orchestrated as a single reproducible workflow.
This lab brings together everything from Labs 1–5 into a single automated workflow. Instead of running data processing, training, tuning, evaluation, and deployment as separate manual steps, you define them as a SageMaker Pipeline — a reproducible, auditable, end-to-end ML workflow that runs with one API call.
Duration: ~90 minutes • Phase: MLOps • Prerequisite: Understanding of Labs 1–5 concepts
Input: customer churn data (30+ engineered features) → Pipeline: Process → Tune → Eval → Clarify → Register → Output: versioned model + bias report + lineage
Key insight: The pipeline only registers the model if AUC exceeds a threshold (conditional step). This is the quality gate that prevents bad models from reaching production.
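The gate boils down to one comparison against the metrics file the evaluation step writes. A minimal sketch of that logic in plain Python (the JSON layout follows the evaluation.json property path used later in this lab; the 0.75 threshold is an assumed example, not the lab's actual value):

```python
import json

def passes_quality_gate(evaluation_json: str, threshold: float = 0.75) -> bool:
    """Return True only if the evaluated AUC exceeds the threshold."""
    metrics = json.loads(evaluation_json)
    auc = metrics["binary_classification_metrics"]["auc"]["value"]
    return auc > threshold

# Example output shaped like the ChurnEvalBestModel step's evaluation.json
report = json.dumps({"binary_classification_metrics": {"auc": {"value": 0.82}}})

print(passes_quality_gate(report))        # True  -> model gets registered
print(passes_quality_gate(report, 0.90))  # False -> run stops, nothing registered
```

In the real pipeline this comparison is declared as a ConditionStep rather than written imperatively, but the decision it encodes is exactly this.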
| # | Step Name | Type | What It Does |
|---|---|---|---|
| 1 | ChurnModelProcess | Processing | Fetches data from Feature Store, splits into train/validation/test |
| 2 | ChurnHyperParameterTuning | Tuning | XGBoost HPO with 5 hyperparameter ranges (like Lab 4) |
| 3 | ChurnEvalBestModel | Processing | Evaluates the best model, computes AUC and classification metrics |
| 4 | ChurnCreateModel | Model | Creates a SageMaker Model object from the best training job |
| 5 | ChurnModelConfigFile | Processing | Generates Clarify analysis config (bias detection settings) |
| 6 | ChurnTransform | Transform | Batch inference on test data to generate predictions |
| 7 | ClarifyProcessingStep | Processing | Runs SageMaker Clarify to detect bias in predictions |
| 8 | RegisterChurnModel | Register | Registers model in Model Registry with metrics and explainability |
| 9 | CheckAUCScoreChurnEvaluation | Condition | Gates registration: only proceeds if AUC > threshold |
Orchestration service for ML workflows. Define steps as Python objects, connect inputs/outputs, execute with one call. Tracks every run with full metadata.
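"Connect inputs/outputs" means the service infers a dependency graph and runs steps in a valid order. A toy sketch of that idea, using a subset of this lab's step names (the scheduling logic here is illustrative, not the actual service implementation):

```python
# Each step lists the upstream steps whose outputs it consumes.
deps = {
    "ChurnModelProcess": [],
    "ChurnHyperParameterTuning": ["ChurnModelProcess"],
    "ChurnEvalBestModel": ["ChurnHyperParameterTuning"],
    "CheckAUCScoreChurnEvaluation": ["ChurnEvalBestModel"],
    "RegisterChurnModel": ["CheckAUCScoreChurnEvaluation"],
}

def execution_order(deps):
    """Topologically sort steps so every step runs after its inputs exist."""
    order, done = [], set()
    def visit(step):
        if step in done:
            return
        for upstream in deps[step]:
            visit(upstream)
        done.add(step)
        order.append(step)
    for step in deps:
        visit(step)
    return order

print(execution_order(deps))
# ['ChurnModelProcess', 'ChurnHyperParameterTuning', 'ChurnEvalBestModel',
#  'CheckAUCScoreChurnEvaluation', 'RegisterChurnModel']
```

You never write this ordering yourself — declaring which step's output feeds which step's input is enough for the pipeline to derive it.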
Centralized repository for ML features. Serves both batch training (offline store) and real-time inference (online store). Ensures feature consistency.
Version control for models. Stores model artifacts, metrics, bias reports, and approval status. Enables governed promotion from staging to production.
Tracks the full provenance of a model: which data, features, training job, and pipeline run produced it. Essential for audit and compliance.
Click any step to explore what it does, or auto-play to walk through the entire 9-step pipeline.
Before the pipeline runs, you populate a Feature Store with pre-engineered features. This decouples feature engineering from model training — features are computed once and reused across multiple pipeline runs, experiments, and models.
Full historical feature data in Parquet format. Used for batch training and pipeline processing steps. Queryable via Athena SQL. This is what Lab 6 uses.
Low-latency feature lookup for real-time inference. Returns the latest feature values for a given record ID in single-digit milliseconds.
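The two stores can be pictured as one write path feeding two read patterns: the offline store keeps every historical row, while the online store keeps only the latest row per record ID. A toy in-memory model of that contract (not the Feature Store API):

```python
offline_store = []   # full history (Parquet + Athena in the real offline store)
online_store = {}    # latest row per record id (low-latency lookup)

def ingest(record_id, event_time, features):
    """A single write path keeps both stores consistent."""
    row = {"id": record_id, "event_time": event_time, **features}
    offline_store.append(row)                  # append-only history
    latest = online_store.get(record_id)
    if latest is None or event_time >= latest["event_time"]:
        online_store[record_id] = row          # keep only the newest row

ingest("cust-001", 1, {"eopenrate": 0.20})
ingest("cust-001", 2, {"eopenrate": 0.35})

print(len(offline_store))                      # 2 -> full history for training
print(online_store["cust-001"]["eopenrate"])   # 0.35 -> latest value for inference
```

Because both reads come from the same write, training and inference see identical feature definitions — the consistency property the card above describes.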
The churn prediction dataset includes 30+ engineered features from customer behavior data:
| Feature | Type | Description |
|---|---|---|
| retained | Long (target) | 1 = customer stayed, 0 = churned |
| esent | Float | Number of emails sent to customer |
| eopenrate | Float | Email open rate (engagement signal) |
| eclickrate | Float | Email click-through rate |
| avgorder | Float | Average order value |
| ordfreq | Float | Order frequency (transactions per period) |
| paperless | Long | Paperless billing enabled (1/0) |
| refill | Long | Auto-refill subscription active |
| doorstep | Long | Doorstep delivery preference |
| first_last_days_diff | Float | Days between first and last order (tenure) |
| favday_* | Long | One-hot encoded preferred shopping day |
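The favday_* columns are what one-hot encoding produces from a single categorical column. A quick sketch with pandas (the raw favday column name is a hypothetical; the lab's Feature Store already contains the encoded result):

```python
import pandas as pd

# Hypothetical raw column; Lab 6's favday_* features are its one-hot encoding.
df = pd.DataFrame({"favday": ["Monday", "Saturday", "Monday"]})

encoded = pd.get_dummies(df["favday"], prefix="favday", dtype=int)

print(sorted(encoded.columns))            # ['favday_Monday', 'favday_Saturday']
print(encoded["favday_Monday"].tolist())  # [1, 0, 1]
```

Doing this encoding once, at ingestion time, is exactly why the Feature Store stores favday_* columns rather than the raw string: every downstream model sees the same encoding.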
The Model Registry is version control for ML models. Every pipeline run that passes the quality gate produces a registered model version — complete with metrics, bias reports, and approval status.
| Artifact | Source Step | Purpose |
|---|---|---|
| model.tar.gz | ChurnHyperParameterTuning | Trained XGBoost model artifact (deployable) |
| evaluation.json | ChurnEvalBestModel | AUC, accuracy, precision, recall, F1 metrics |
| Clarify bias report | ClarifyProcessingStep | Statistical parity, disparate impact analysis |
| Explainability report | ClarifyProcessingStep | SHAP values showing feature importance |
| Inference spec | Pipeline config | Container image, instance type, input format |
| Status | Trigger | Outcome |
|---|---|---|
| PendingManualApproval | Pipeline registers model | Awaits human review |
| Approved | Reviewer approves | Ready for deployment |
| Rejected | Fails review criteria | Archived, not deployed |
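The approval workflow is a small state machine: every new version starts pending, and only an explicit reviewer decision moves it on. A simplified sketch of those transitions (the status names are the real Model Registry values; the transition rules here are a deliberately strict illustration):

```python
# SageMaker Model Registry approval statuses; transition logic is a sketch.
VALID = {
    "PendingManualApproval": {"Approved", "Rejected"},
    "Approved": set(),
    "Rejected": set(),
}

def review(current_status: str, decision: str) -> str:
    """Apply a reviewer's decision, refusing illegal transitions."""
    if decision not in VALID[current_status]:
        raise ValueError(f"cannot move {current_status} -> {decision}")
    return decision

status = "PendingManualApproval"      # every new model version starts here
status = review(status, "Approved")   # reviewer signs off
deployable = status == "Approved"     # only Approved versions are deployed
print(status, deployable)             # Approved True
```

Deployment tooling keys off the status field, so "deployable" is a property of the registry entry, not of the artifact itself.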
The CheckAUCScoreChurnEvaluation step uses a ConditionGreaterThan check:
evaluation.json → binary_classification_metrics.auc.value > threshold

Responsible AI requires understanding both what your model predicts and why. Clarify detects bias in predictions, while lineage tracking provides the full provenance of every model artifact.
The ClarifyProcessingStep runs post-training bias analysis on the model's predictions. It checks whether the model treats different groups fairly.
Pre-training bias: detects imbalances in the training data itself. Example: if 90% of "retained" customers are from one demographic, the model may learn biased patterns.
Post-training bias: measures whether the model's predictions are fair across groups. Checks disparate impact, statistical parity difference, and conditional demographic disparity.
Lineage tracking answers: "How was this model created?" — tracing from raw data through every transformation, training job, and evaluation step.
| Lineage Component | What It Tracks | Why It Matters |
|---|---|---|
| Data Source | Feature Store group, S3 paths, data version | Reproduce training with exact same data |
| Processing Job | Script version, parameters, output artifacts | Audit data transformations |
| Training Job | Algorithm, hyperparameters, instance type, duration | Understand model configuration |
| Evaluation | Metrics (AUC, F1), evaluation dataset | Compare model versions objectively |
| Bias Report | Clarify analysis results, fairness metrics | Compliance and responsible AI audit trail |
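Conceptually, lineage is a graph where every artifact records what produced it, so answering "how was this model created?" is a walk upstream. A toy sketch with hypothetical artifact names (the real graph is maintained automatically by SageMaker):

```python
# Each artifact records what produced it (a simplified lineage graph).
lineage = {
    "model-v3": {"produced_by": "training-job-41"},
    "training-job-41": {"produced_by": "processing-job-17"},
    "processing-job-17": {"produced_by": "feature-group-churn"},
    "feature-group-churn": {"produced_by": None},  # original data source
}

def trace(artifact):
    """Walk upstream until the original data source is reached."""
    chain = [artifact]
    while lineage[artifact]["produced_by"] is not None:
        artifact = lineage[artifact]["produced_by"]
        chain.append(artifact)
    return chain

print(" <- ".join(trace("model-v3")))
# model-v3 <- training-job-41 <- processing-job-17 <- feature-group-churn
```

An auditor asking "which data trained model-v3?" gets a deterministic answer from this walk — no human record-keeping required.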
How does a SageMaker Pipeline apply to AnyCompany's ML products? Each product has different pipeline complexity, retraining frequency, and governance requirements.
Monthly retraining with new fraud patterns. Strict quality gates — recall must exceed 95%.
Quarterly retraining. Clarify bias checks critical — cannot discriminate by demographics.
Weekly fine-tuning on new conversation data. A/B testing gate before full rollout.
Annual retraining with market data refresh. RMSE threshold as quality gate.
| Lab 6 Concept | AnyCompany Equivalent | Why It Matters |
|---|---|---|
| Customer churn target | Employee attrition (left_company) | Same binary classification problem structure |
| Feature Store (30+ features) | HR Feature Store (50+ features from HRIS, Comp, Perf) | Centralized features shared across attrition, engagement, and flight-risk models |
| XGBoost HPO step | XGBoost + LightGBM comparison step | Production pipelines often compare multiple algorithms |
| AUC condition gate | AUC > 0.78 AND recall > 0.70 | Multiple metrics must pass — single AUC isn't enough for HR decisions |
| Clarify bias check | Bias analysis by gender, age, ethnicity, location | Legal requirement — cannot deploy discriminatory attrition model |
| Model Registry | Versioned model catalog with approval workflow | HR leadership must approve before model influences retention decisions |
| Lineage tracking | Full audit trail for compliance (GDPR, DPDP Act) | Must prove which data trained which model for regulatory audits |
In every case, the retraining cadence is automated as scheduled pipeline.start() calls.
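The table's attrition row replaces Lab 6's single AUC gate with a combined one: AUC > 0.78 AND recall > 0.70. A sketch of how a multi-metric gate generalizes the single check (metric values here are invented examples):

```python
def passes_gates(metrics: dict, gates: dict) -> bool:
    """Every threshold must pass — one failing metric blocks registration."""
    return all(metrics[name] > floor for name, floor in gates.items())

# Thresholds from the attrition use case; metric values are illustrative.
gates = {"auc": 0.78, "recall": 0.70}

print(passes_gates({"auc": 0.81, "recall": 0.68}, gates))  # False — recall misses
print(passes_gates({"auc": 0.81, "recall": 0.74}, gates))  # True  — both pass
```

This is why a single AUC threshold isn't enough for HR decisions: a model can rank well overall (high AUC) while still missing too many true leavers (low recall).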