Lab 6 — Interactive Explainer
Build a 9-step ML pipeline that automates data processing, hyperparameter tuning, model evaluation, bias detection, and conditional model registration — all orchestrated as a single reproducible workflow.
This lab brings together everything from Labs 1–5 into a single automated workflow. Instead of running data processing, training, tuning, evaluation, and deployment as separate manual steps, you define them as a SageMaker Pipeline — a reproducible, auditable, end-to-end ML workflow that runs with one API call.
Duration: ~90 minutes • Phase: MLOps • Prerequisite: Understanding of Labs 1–5 concepts
Input: customer churn data (30+ engineered features) → Pipeline: Process → Tune → Eval → Clarify → Register → Output: versioned model + bias report + lineage
Key insight: The pipeline only registers the model if AUC exceeds a threshold (conditional step). This is the quality gate that prevents bad models from reaching production.
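The gate boils down to one comparison against the metrics file the evaluation step writes. A minimal sketch of that logic in plain Python (the JSON layout follows the evaluation.json property path used later in this lab; the 0.75 threshold is an assumed example, not the lab's actual value):

```python
import json

def passes_quality_gate(evaluation_json: str, threshold: float = 0.75) -> bool:
    """Return True only if the evaluated AUC exceeds the threshold."""
    metrics = json.loads(evaluation_json)
    auc = metrics["binary_classification_metrics"]["auc"]["value"]
    return auc > threshold

# Example output shaped like the ChurnEvalBestModel step's evaluation.json
report = json.dumps({"binary_classification_metrics": {"auc": {"value": 0.82}}})

print(passes_quality_gate(report))        # True  -> model gets registered
print(passes_quality_gate(report, 0.90))  # False -> run stops, nothing registered
```

In the real pipeline this comparison is declared as a ConditionStep rather than written imperatively, but the decision it encodes is exactly this.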
| # | Step Name | Type | What It Does |
|---|---|---|---|
| 1 | ChurnModelProcess | Processing | Fetches data from Feature Store, splits into train/validation/test |
| 2 | ChurnHyperParameterTuning | Tuning | XGBoost HPO with 5 hyperparameter ranges (like Lab 4) |
| 3 | ChurnEvalBestModel | Processing | Evaluates the best model, computes AUC and classification metrics |
| 4 | ChurnCreateModel | Model | Creates a SageMaker Model object from the best training job |
| 5 | ChurnModelConfigFile | Processing | Generates Clarify analysis config (bias detection settings) |
| 6 | ChurnTransform | Transform | Batch inference on test data to generate predictions |
| 7 | ClarifyProcessingStep | Processing | Runs SageMaker Clarify to detect bias in predictions |
| 8 | RegisterChurnModel | Register | Registers model in Model Registry with metrics and explainability |
| 9 | CheckAUCScoreChurnEvaluation | Condition | Gates registration: only proceeds if AUC > threshold |
Orchestration service for ML workflows. Define steps as Python objects, connect inputs/outputs, execute with one call. Tracks every run with full metadata.
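"Connect inputs/outputs" means the service infers a dependency graph and runs steps in a valid order. A toy sketch of that idea, using a subset of this lab's step names (the scheduling logic here is illustrative, not the actual service implementation):

```python
# Each step lists the upstream steps whose outputs it consumes.
deps = {
    "ChurnModelProcess": [],
    "ChurnHyperParameterTuning": ["ChurnModelProcess"],
    "ChurnEvalBestModel": ["ChurnHyperParameterTuning"],
    "CheckAUCScoreChurnEvaluation": ["ChurnEvalBestModel"],
    "RegisterChurnModel": ["CheckAUCScoreChurnEvaluation"],
}

def execution_order(deps):
    """Topologically sort steps so every step runs after its inputs exist."""
    order, done = [], set()
    def visit(step):
        if step in done:
            return
        for upstream in deps[step]:
            visit(upstream)
        done.add(step)
        order.append(step)
    for step in deps:
        visit(step)
    return order

print(execution_order(deps))
# ['ChurnModelProcess', 'ChurnHyperParameterTuning', 'ChurnEvalBestModel',
#  'CheckAUCScoreChurnEvaluation', 'RegisterChurnModel']
```

You never write this ordering yourself — declaring which step's output feeds which step's input is enough for the pipeline to derive it.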
Centralized repository for ML features. Serves both batch training (offline store) and real-time inference (online store). Ensures feature consistency.
Version control for models. Stores model artifacts, metrics, bias reports, and approval status. Enables governed promotion from staging to production.
Tracks the full provenance of a model: which data, features, training job, and pipeline run produced it. Essential for audit and compliance.
Click any step to explore what it does, or auto-play to walk through the entire 9-step pipeline.
Before the pipeline runs, you populate a Feature Store with pre-engineered features. This decouples feature engineering from model training — features are computed once and reused across multiple pipeline runs, experiments, and models.
Full historical feature data in Parquet format. Used for batch training and pipeline processing steps. Queryable via Athena SQL. This is what Lab 6 uses.
Low-latency feature lookup for real-time inference. Returns the latest feature values for a given record ID in single-digit milliseconds.
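The two stores can be pictured as one write path feeding two read patterns: the offline store keeps every historical row, while the online store keeps only the latest row per record ID. A toy in-memory model of that contract (not the Feature Store API):

```python
offline_store = []   # full history (Parquet + Athena in the real offline store)
online_store = {}    # latest row per record id (low-latency lookup)

def ingest(record_id, event_time, features):
    """A single write path keeps both stores consistent."""
    row = {"id": record_id, "event_time": event_time, **features}
    offline_store.append(row)                  # append-only history
    latest = online_store.get(record_id)
    if latest is None or event_time >= latest["event_time"]:
        online_store[record_id] = row          # keep only the newest row

ingest("cust-001", 1, {"eopenrate": 0.20})
ingest("cust-001", 2, {"eopenrate": 0.35})

print(len(offline_store))                      # 2 -> full history for training
print(online_store["cust-001"]["eopenrate"])   # 0.35 -> latest value for inference
```

Because both reads come from the same write, training and inference see identical feature definitions — the consistency property the card above describes.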
The churn prediction dataset includes 30+ engineered features from customer behavior data:
| Feature | Type | Description |
|---|---|---|
| retained | Long (target) | 1 = customer stayed, 0 = churned |
| esent | Float | Number of emails sent to customer |
| eopenrate | Float | Email open rate (engagement signal) |
| eclickrate | Float | Email click-through rate |
| avgorder | Float | Average order value |
| ordfreq | Float | Order frequency (transactions per period) |
| paperless | Long | Paperless billing enabled (1/0) |
| refill | Long | Auto-refill subscription active |
| doorstep | Long | Doorstep delivery preference |
| first_last_days_diff | Float | Days between first and last order (tenure) |
| favday_* | Long | One-hot encoded preferred shopping day |
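The favday_* columns are what one-hot encoding produces from a single categorical column. A quick sketch with pandas (the raw favday column name is a hypothetical; the lab's Feature Store already contains the encoded result):

```python
import pandas as pd

# Hypothetical raw column; Lab 6's favday_* features are its one-hot encoding.
df = pd.DataFrame({"favday": ["Monday", "Saturday", "Monday"]})

encoded = pd.get_dummies(df["favday"], prefix="favday", dtype=int)

print(sorted(encoded.columns))            # ['favday_Monday', 'favday_Saturday']
print(encoded["favday_Monday"].tolist())  # [1, 0, 1]
```

Doing this encoding once, at ingestion time, is exactly why the Feature Store stores favday_* columns rather than the raw string: every downstream model sees the same encoding.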
The Model Registry is version control for ML models. Every pipeline run that passes the quality gate produces a registered model version — complete with metrics, bias reports, and approval status.
| Artifact | Source Step | Purpose |
|---|---|---|
| model.tar.gz | ChurnHyperParameterTuning | Trained XGBoost model artifact (deployable) |
| evaluation.json | ChurnEvalBestModel | AUC, accuracy, precision, recall, F1 metrics |
| Clarify bias report | ClarifyProcessingStep | Statistical parity, disparate impact analysis |
| Explainability report | ClarifyProcessingStep | SHAP values showing feature importance |
| Inference spec | Pipeline config | Container image, instance type, input format |
| Status | Trigger | Outcome |
|---|---|---|
| PendingManualApproval | Pipeline registers model | Awaits human review |
| Approved | Reviewer approves | Ready for deployment |
| Rejected | Fails review criteria | Archived, not deployed |
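The approval workflow is a small state machine: every new version starts pending, and only an explicit reviewer decision moves it on. A simplified sketch of those transitions (the status names are the real Model Registry values; the transition rules here are a deliberately strict illustration):

```python
# SageMaker Model Registry approval statuses; transition logic is a sketch.
VALID = {
    "PendingManualApproval": {"Approved", "Rejected"},
    "Approved": set(),
    "Rejected": set(),
}

def review(current_status: str, decision: str) -> str:
    """Apply a reviewer's decision, refusing illegal transitions."""
    if decision not in VALID[current_status]:
        raise ValueError(f"cannot move {current_status} -> {decision}")
    return decision

status = "PendingManualApproval"      # every new model version starts here
status = review(status, "Approved")   # reviewer signs off
deployable = status == "Approved"     # only Approved versions are deployed
print(status, deployable)             # Approved True
```

Deployment tooling keys off the status field, so "deployable" is a property of the registry entry, not of the artifact itself.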
The CheckAUCScoreChurnEvaluation step uses a ConditionGreaterThan check:
evaluation.json → binary_classification_metrics.auc.value > threshold

Responsible AI requires understanding both what your model predicts and why. Clarify detects bias in predictions, while lineage tracking provides the full provenance of every model artifact.
The ClarifyProcessingStep runs post-training bias analysis on the model's predictions. It checks whether the model treats different groups fairly.
Pre-training bias: detects imbalances in the training data itself. Example: if 90% of "retained" customers are from one demographic, the model may learn biased patterns.
Post-training bias: measures whether the model's predictions are fair across groups. Checks disparate impact, statistical parity difference, and conditional demographic disparity.
Lineage tracking answers: "How was this model created?" — tracing from raw data through every transformation, training job, and evaluation step.
| Lineage Component | What It Tracks | Why It Matters |
|---|---|---|
| Data Source | Feature Store group, S3 paths, data version | Reproduce training with exact same data |
| Processing Job | Script version, parameters, output artifacts | Audit data transformations |
| Training Job | Algorithm, hyperparameters, instance type, duration | Understand model configuration |
| Evaluation | Metrics (AUC, F1), evaluation dataset | Compare model versions objectively |
| Bias Report | Clarify analysis results, fairness metrics | Compliance and responsible AI audit trail |
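Conceptually, lineage is a graph where every artifact records what produced it, so answering "how was this model created?" is a walk upstream. A toy sketch with hypothetical artifact names (the real graph is maintained automatically by SageMaker):

```python
# Each artifact records what produced it (a simplified lineage graph).
lineage = {
    "model-v3": {"produced_by": "training-job-41"},
    "training-job-41": {"produced_by": "processing-job-17"},
    "processing-job-17": {"produced_by": "feature-group-churn"},
    "feature-group-churn": {"produced_by": None},  # original data source
}

def trace(artifact):
    """Walk upstream until the original data source is reached."""
    chain = [artifact]
    while lineage[artifact]["produced_by"] is not None:
        artifact = lineage[artifact]["produced_by"]
        chain.append(artifact)
    return chain

print(" <- ".join(trace("model-v3")))
# model-v3 <- training-job-41 <- processing-job-17 <- feature-group-churn
```

An auditor asking "which data trained model-v3?" gets a deterministic answer from this walk — no human record-keeping required.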
How does a SageMaker Pipeline apply to AnyCompany's ML products? Each product has different pipeline complexity, retraining frequency, and governance requirements.
Monthly retraining with new fraud patterns. Strict quality gates — recall must exceed 95%.
Quarterly retraining. Clarify bias checks critical — cannot discriminate by demographics.
Weekly fine-tuning on new conversation data. A/B testing gate before full rollout.
Annual retraining with market data refresh. RMSE threshold as quality gate.
| Lab 6 Concept | AnyCompany Equivalent | Why It Matters |
|---|---|---|
| Customer churn target | Employee attrition (left_company) | Same binary classification problem structure |
| Feature Store (30+ features) | HR Feature Store (50+ features from HRIS, Comp, Perf) | Centralized features shared across attrition, engagement, and flight-risk models |
| XGBoost HPO step | XGBoost + LightGBM comparison step | Production pipelines often compare multiple algorithms |
| AUC condition gate | AUC > 0.78 AND recall > 0.70 | Multiple metrics must pass — single AUC isn't enough for HR decisions |
| Clarify bias check | Bias analysis by gender, age, ethnicity, location | Legal requirement — cannot deploy discriminatory attrition model |
| Model Registry | Versioned model catalog with approval workflow | HR leadership must approve before model influences retention decisions |
| Lineage tracking | Full audit trail for compliance (GDPR, DPDP Act) | Must prove which data trained which model for regulatory audits |
In every case, the retraining cadence is automated as scheduled pipeline.start() calls.
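The table's attrition row replaces Lab 6's single AUC gate with a combined one: AUC > 0.78 AND recall > 0.70. A sketch of how a multi-metric gate generalizes the single check (metric values here are invented examples):

```python
def passes_gates(metrics: dict, gates: dict) -> bool:
    """Every threshold must pass — one failing metric blocks registration."""
    return all(metrics[name] > floor for name, floor in gates.items())

# Thresholds from the attrition use case; metric values are illustrative.
gates = {"auc": 0.78, "recall": 0.70}

print(passes_gates({"auc": 0.81, "recall": 0.68}, gates))  # False — recall misses
print(passes_gates({"auc": 0.81, "recall": 0.74}, gates))  # True  — both pass
```

This is why a single AUC threshold isn't enough for HR decisions: a model can rank well overall (high AUC) while still missing too many true leavers (low recall).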