Module 9 - Interactive Explainer
Protect sensitive workforce data with IAM policies, VPC isolation, encryption, and CI/CD security - because at AnyCompany, ML models handle SSNs, salaries, and bank details for millions of workers.
Security is not optional at AnyCompany - you handle SSNs, salaries, bank details, and health benefits for millions of workers across multiple countries. Every ML resource must follow least-privilege access: grant only the permissions needed, nothing more.
WHO can do WHAT on WHICH resources. IAM users, groups, roles, and policies control every API call to SageMaker, S3, and other ML services.
WHERE traffic can flow. VPCs, security groups, NACLs, and VPC endpoints isolate ML workloads from the public internet and other accounts.
HOW code and models move to production. Static analysis, vulnerability scanning, secrets management, and deployment controls prevent supply chain attacks.
| Component | What It Is | AnyCompany ML Example |
|---|---|---|
| Users | Individual identities with long-term credentials | Each data scientist has their own IAM user for SageMaker Studio access |
| Groups | Collections of users sharing the same permissions | Data Engineers group, MLOps Engineers group, Security Engineers group |
| Roles | Identities with temporary credentials, assumed by services or users | SageMaker execution role that training jobs assume to access S3 data |
| Policies | JSON documents defining allowed/denied actions | Policy allowing s3:GetObject on ml-training-data bucket only |
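To make the Policies row concrete, here is a minimal sketch that builds the read-only policy from the table as a Python dictionary and prints it as JSON. The bucket name `ml-training-data` comes from the table; the `Sid` value and the helper name `read_only_object_policy` are made up for this illustration.

```python
import json

# Bucket name taken from the table's example; adjust to your own account.
TRAINING_BUCKET = "ml-training-data"


def read_only_object_policy(bucket):
    """Build an identity-based policy that allows s3:GetObject on one bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTrainingObjects",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # Object-level actions apply to object ARNs, hence the /* suffix.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


policy = read_only_object_policy(TRAINING_BUCKET)
print(json.dumps(policy, indent=2))
```

Anything not listed in an `Allow` statement is denied by default, so this policy grants no write, delete, or list access.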
Different ML team members need different permissions, and SageMaker services themselves need roles to access your data. Least privilege means each entity gets exactly what it needs - no more.
| Role | AWS Services Needed | AnyCompany Context |
|---|---|---|
| Data Scientist | SageMaker Studio, S3 (read training data), Athena (query) | Builds attrition/fraud models. Reads anonymized data only - no raw PII access. |
| Data Engineer | AWS Glue, S3 (read/write), EMR, Athena | Builds data pipelines. Has write access to processed data buckets but not model endpoints. |
| MLOps Engineer | SageMaker AI, CodePipeline, CodeBuild, CloudFormation, ECR, Lambda, Step Functions | Deploys models to production. Has endpoint management but not training data access. |
| Security Engineer | IAM, CloudTrail, Config, GuardDuty, KMS | Audits access patterns, manages encryption keys, reviews policies. No model or data access. |
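One way to spot-check least privilege before attaching a policy to one of these groups is to scan the document for wildcards. The helper below is a simplified sketch under that assumption, not a substitute for a real policy validator; the function name and the draft policy are hypothetical.

```python
def find_overbroad_statements(policy):
    """Return findings for Allow statements that use wildcard actions or resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        label = statement.get("Sid", f"statement {index}")
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources

        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"{label}: wildcard action {action}")
        if "*" in resources:
            findings.append(f"{label}: applies to every resource")
    return findings


# Hypothetical draft policy for the Data Scientist group.
draft_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {
            "Sid": "ScopedRead",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::anycompany-ml-training-data/*",
        },
    ],
}

for finding in find_overbroad_statements(draft_policy):
    print(finding)
```

Only the `TooBroad` statement is flagged; the scoped read statement passes because it names one action and one bucket.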
SageMaker jobs (training, processing, inference) run on your behalf as AWS-managed services, so each needs its own IAM role to access your data and resources.
The role SageMaker Studio assumes. Controls what notebooks can access: S3 buckets, ECR images, CloudWatch. Scoped to specific resources.
Role for SageMaker Processing jobs. Needs S3 read (input data) and S3 write (output). Should NOT have endpoint or model deployment permissions.
Role for training jobs. Needs S3 read (training data), S3 write (model artifacts), CloudWatch (metrics/logs). Separate from processing role.
Role for inference endpoints. Needs S3 read (model artifacts), ECR pull (container image), CloudWatch write. No training data access.
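Each of these service roles needs a trust policy that lets SageMaker assume it. The sketch below builds that trust policy and the keyword arguments for IAM's CreateRole call, one per job type. The role names are hypothetical, and the permissions policies described above would still have to be attached separately.

```python
import json

# Trust policy: allows the SageMaker service principal to assume the role.
SAGEMAKER_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Hypothetical role names, one per job type, so permissions stay separate.
JOB_ROLE_NAMES = {
    "processing": "anycompany-ml-processing-role",
    "training": "anycompany-ml-training-role",
    "inference": "anycompany-ml-inference-role",
}


def create_role_request(job_type):
    """Build keyword arguments in the shape the IAM CreateRole API expects."""
    return {
        "RoleName": JOB_ROLE_NAMES[job_type],
        "AssumeRolePolicyDocument": json.dumps(SAGEMAKER_TRUST_POLICY),
        "Description": f"Execution role for SageMaker {job_type} jobs",
    }


role_requests = [create_role_request(job_type) for job_type in JOB_ROLE_NAMES]
for request in role_requests:
    print(request["RoleName"])
```

With boto3 installed and credentials configured, each dictionary could be passed as `boto3.client("iam").create_role(**request)`.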
Training job role: Can read from s3://anycompany-ml-training-data/* and write to s3://anycompany-ml-models/*. Cannot access s3://anycompany-raw-payroll/* (raw PII).
Inference endpoint role: Can read model artifacts from s3://anycompany-ml-models/fraud-v3/*. Cannot read training data or write to any bucket.
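The two examples above can be sketched as permissions policies plus a toy evaluator that checks them. The bucket names come from the examples; the evaluator is a deliberate simplification (it only handles `Allow` statements and ignores explicit denies, conditions, and `NotAction`), so treat it as an illustration rather than a model of full IAM evaluation.

```python
from fnmatch import fnmatchcase

TRAINING_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::anycompany-ml-training-data/*"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": ["arn:aws:s3:::anycompany-ml-models/*"],
        },
    ],
}

INFERENCE_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::anycompany-ml-models/fraud-v3/*"],
        }
    ],
}


def is_allowed(policy, action, resource_arn):
    """Simplified check: allowed only if some Allow statement matches; default deny."""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        action_matches = action in statement["Action"]
        resource_matches = any(
            fnmatchcase(resource_arn, pattern) for pattern in statement["Resource"]
        )
        if action_matches and resource_matches:
            return True
    return False


raw_payroll = "arn:aws:s3:::anycompany-raw-payroll/2024/run.csv"
model_artifact = "arn:aws:s3:::anycompany-ml-models/fraud-v3/model.tar.gz"

print(is_allowed(TRAINING_ROLE_POLICY, "s3:GetObject", raw_payroll))      # False
print(is_allowed(INFERENCE_ROLE_POLICY, "s3:GetObject", model_artifact))  # True
print(is_allowed(INFERENCE_ROLE_POLICY, "s3:PutObject", model_artifact))  # False
```

Because neither policy mentions `anycompany-raw-payroll`, access to raw PII is denied by default, with no explicit `Deny` statement required.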
ML workloads at AnyCompany process sensitive data that must never traverse the public internet. VPCs, endpoints, and firewalls create layered network isolation.
| Layer | What It Controls | Scope | AnyCompany Use |
|---|---|---|---|
| VPC | Isolated virtual network for your ML resources | Account-level | All SageMaker resources run in the AnyCompany VPC - never on the public internet |
| VPC Endpoints | Private connections to AWS services (S3, SageMaker API) without internet | Service-level | Training jobs access S3 data via private endpoint - no NAT gateway needed |
| Security Groups | Stateful firewall at instance/ENI level. Allow rules only. | Instance-level | SageMaker notebook only accepts connections from AnyCompany corporate CIDR |
| Network ACLs | Stateless firewall at subnet level. Allow and deny rules. | Subnet-level | Block all inbound traffic to training subnet except from VPC endpoints |
| AWS Network Firewall | Managed firewall with deep packet inspection, IDS/IPS | VPC-level | Inspect and filter all traffic entering/leaving the ML VPC |
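As an illustration of the Security Groups row, the sketch below builds an ingress rule in the shape used by the EC2 `IpPermissions` structure and checks which connections it would admit. The corporate CIDR `10.20.0.0/16` and the helper names are assumptions made for this example.

```python
import ipaddress

# Hypothetical corporate address range; substitute the real CIDR.
CORPORATE_CIDR = "10.20.0.0/16"


def https_ingress_rule(cidr, port=443):
    """Build one allow rule; security groups have no deny rules."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": "AnyCompany corporate network"}],
    }


def permits(rule, source_ip, port):
    """Return True if the rule admits a TCP connection from source_ip to port."""
    address = ipaddress.ip_address(source_ip)
    in_range = any(
        address in ipaddress.ip_network(ip_range["CidrIp"])
        for ip_range in rule["IpRanges"]
    )
    return in_range and rule["FromPort"] <= port <= rule["ToPort"]


rule = https_ingress_rule(CORPORATE_CIDR)
print(permits(rule, "10.20.5.9", 443))    # True: inside the corporate range
print(permits(rule, "203.0.113.7", 443))  # False: outside the range
print(permits(rule, "10.20.5.9", 22))     # False: port not opened
```

Because security groups are stateful, replies to an admitted connection are allowed automatically. A network ACL would need a matching outbound rule as well.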
All ML compute (notebooks, training, endpoints) runs in private subnets. No public IP addresses. No direct internet access. AnyCompany payroll data never leaves the private network.
Interface endpoints for the SageMaker API and a gateway endpoint for S3 data access. Traffic stays on the AWS backbone and never touches the internet. Endpoint policies add another access control layer.
NAT gateway: only for outbound internet access (downloading packages, calling external APIs). Placed in a public subnet. ML data traffic should use VPC endpoints instead.
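One common way to enforce "data traffic goes through the VPC endpoint" is a bucket policy that denies any request not arriving via that endpoint. The sketch below builds such a policy. The endpoint ID is a placeholder and the helper name is hypothetical. A deny this broad also blocks console and administrative access from outside the endpoint, so it should be tested carefully before use.

```python
import json


def vpce_only_bucket_policy(bucket, vpc_endpoint_id):
    """Deny all S3 actions on the bucket unless the request uses the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # aws:SourceVpce holds the ID of the endpoint the request came through.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}},
            }
        ],
    }


# Placeholder endpoint ID for illustration only.
bucket_policy = vpce_only_bucket_policy(
    "anycompany-ml-training-data", "vpce-0123456789abcdef0"
)
print(json.dumps(bucket_policy, indent=2))
```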
AnyCompany handles PII for millions of workers across multiple countries and must comply with regulations such as GDPR, CCPA, and India's DPDP Act. Encryption at rest and in transit is mandatory.
Centralized encryption key management. Create, rotate, and audit keys. Integrates with S3, SageMaker, EBS. AnyCompany: separate keys per data classification level.
Store and rotate database credentials, API keys, tokens. Never hardcode secrets in notebooks or training scripts. Auto-rotation for compliance.
TLS certificates for encryption in transit. All SageMaker endpoints use HTTPS. Internal service-to-service communication encrypted.
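To show what "never hardcode secrets" can look like in a notebook or training script, here is a small sketch that reads database credentials from Secrets Manager at runtime. The secret name, the JSON layout of the secret, and the stand-in client are assumptions. The stand-in only mimics the `get_secret_value` call so the example runs without AWS access.

```python
import json


def load_db_credentials(secrets_client, secret_id):
    """Fetch a JSON secret and return it as a dict; nothing is written to disk or code."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class FakeSecretsClient:
    """Stand-in for a Secrets Manager client, used here for illustration only."""

    def get_secret_value(self, SecretId):
        payload = {"username": "ml_reader", "password": "example-only"}
        return {"Name": SecretId, "SecretString": json.dumps(payload)}


# Hypothetical secret name.
credentials = load_db_credentials(FakeSecretsClient(), "anycompany/ml/feature-db")
print(credentials["username"])
```

In a real environment the first argument would be `boto3.client("secretsmanager")`, and the calling role would need `secretsmanager:GetSecretValue` on that one secret only.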
| Stage | Data at Risk | Protection | AnyCompany Implementation |
|---|---|---|---|
| Storage (S3) | Training data, model artifacts | SSE-KMS encryption at rest | All ML buckets encrypted with AnyCompany-managed CMK |
| Training | Data on instance storage during training | Encrypted EBS volumes, VPC isolation | Training instances use encrypted storage, no internet access |
| Transit | Data moving between services | TLS 1.2+ for all connections | VPC endpoints ensure traffic never leaves AWS network |
| Inference | Request/response payloads (may contain PII) | HTTPS endpoints, request logging controls | Fraud scoring requests contain transaction data - encrypted end-to-end |
| Notebooks | Code, credentials, data samples | Encrypted EBS, Secrets Manager for creds | No PII in notebook outputs. Credentials via Secrets Manager, never hardcoded. |
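The Storage and Training rows map onto a handful of fields in a SageMaker CreateTrainingJob request. The sketch below collects only those security-related fields. The key alias, subnet, security group, instance type, and S3 path are placeholders, and a real request also needs a job name, algorithm specification, role ARN, input channels, and a stopping condition.

```python
def secure_training_job_settings(kms_key_id, subnet_ids, security_group_ids, output_s3_uri):
    """Return the encryption and network-isolation parts of a training job request."""
    return {
        # Model artifacts written to S3 are encrypted with the given KMS key.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri, "KmsKeyId": kms_key_id},
        # The storage volume attached to each training instance is encrypted too.
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
            "VolumeKmsKeyId": kms_key_id,
        },
        # Run the job inside private subnets of the ML VPC.
        "VpcConfig": {"Subnets": subnet_ids, "SecurityGroupIds": security_group_ids},
        # Block outbound network calls from the training container.
        "EnableNetworkIsolation": True,
        # Encrypt traffic between instances in distributed training.
        "EnableInterContainerTrafficEncryption": True,
    }


# Placeholder identifiers for illustration only.
settings = secure_training_job_settings(
    kms_key_id="alias/anycompany-ml-restricted",
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
    output_s3_uri="s3://anycompany-ml-models/fraud-v3/",
)
print(sorted(settings))
```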
ML models are software - they need the same CI/CD security as any production code. But ML adds unique risks: poisoned training data, model backdoors, and adversarial attacks. Secure every stage of the pipeline.
| Stage | Security Measure | What It Catches | AnyCompany Tool |
|---|---|---|---|
| Pre-commit | Pre-commit hooks, IDE checks | Secrets in code, PII in notebooks, formatting issues | git-secrets, detect-secrets hooks |
| Commit | Static Application Security Testing (SAST) | Code vulnerabilities, insecure patterns, SQL injection | Amazon CodeGuru Reviewer |
| Build | Software Composition Analysis (SCA) | Vulnerable dependencies, license violations | ECR image scanning, Snyk |
| Test | Dynamic and Interactive Application Security Testing (DAST + IAST) | Runtime vulnerabilities, API security issues | Dynamic testing in staging environment |
| Deploy | Penetration testing, approval gates | Exploitable endpoints, misconfigurations | Model Registry approval workflow |
| Monitor | Event monitoring, red/blue teaming | Active attacks, anomalous access patterns | GuardDuty, CloudTrail, Config rules |
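To give a feel for what the pre-commit stage catches, here is a deliberately small scanner that flags AWS access key IDs and US SSN-shaped strings. It is a toy stand-in for tools such as git-secrets or detect-secrets, not a description of how they work. The two patterns and the function name are assumptions chosen for this example.

```python
import re

# Simplified patterns; real scanners use many more rules plus entropy checks.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
sample_source = (
    'key = "AKIAIOSFODNN7EXAMPLE"\n'
    'print("training started")\n'
    'test_ssn = "123-45-6789"\n'
)

print(scan_text(sample_source))
```

A pre-commit hook built on this idea would exit with a non-zero status whenever `scan_text` returns findings, which blocks the commit.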
Define all ML infrastructure as code. Version-controlled, auditable, repeatable. No manual console changes to production ML resources.
Continuously monitor resource configurations. Alert when S3 buckets lose encryption, security groups open too wide, or VPC endpoints are removed.
Log every API call. Who accessed what data, when, from where. Essential for compliance audits and incident investigation at AnyCompany.
Manage encryption keys and secrets centrally. Auto-rotate credentials. Never store secrets in code, environment variables, or notebooks.
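The "who accessed what data, when, from where" question maps onto a few fields of a CloudTrail record. The sketch below summarizes records that touch one bucket. The sample record is invented for illustration, and it assumes S3 data events are being logged, since object-level calls such as `GetObject` are not recorded by default.

```python
def summarize_event(record):
    """Reduce a CloudTrail record to who, what, when, and where."""
    identity = record.get("userIdentity") or {}
    return {
        "who": identity.get("arn", identity.get("type", "unknown")),
        "what": record.get("eventSource", "") + ":" + record.get("eventName", ""),
        "when": record.get("eventTime"),
        "where": record.get("sourceIPAddress"),
    }


def accesses_to_bucket(records, bucket):
    """Summarize every record whose request parameters name the given bucket."""
    return [
        summarize_event(record)
        for record in records
        if (record.get("requestParameters") or {}).get("bucketName") == bucket
    ]


# Invented sample record, shaped like an S3 data event.
sample_records = [
    {
        "eventTime": "2024-01-15T09:30:00Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "GetObject",
        "sourceIPAddress": "10.20.5.9",
        "userIdentity": {
            "type": "AssumedRole",
            "arn": "arn:aws:sts::111122223333:assumed-role/anycompany-ml-training-role/job-1",
        },
        "requestParameters": {
            "bucketName": "anycompany-ml-training-data",
            "key": "2024/part-0.parquet",
        },
    },
    {
        "eventTime": "2024-01-15T09:31:00Z",
        "eventSource": "sagemaker.amazonaws.com",
        "eventName": "DescribeTrainingJob",
        "sourceIPAddress": "10.20.5.9",
        "userIdentity": {"type": "IAMUser"},
        "requestParameters": None,
    },
]

for summary in accesses_to_bucket(sample_records, "anycompany-ml-training-data"):
    print(summary)
```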
Select an AnyCompany ML workload to see the recommended security configuration across IAM, network, encryption, and monitoring.
Processes transaction data containing amounts, employee IDs, and vendor details. Real-time endpoint.
Uses employee demographics, compensation, and performance data. Monthly batch scoring.
Processes scanned W-2s and I-9s containing SSNs, addresses, and income. Highest PII sensitivity.
Conversational AI handling employee queries about salary, benefits, PTO. User-facing.
| Security Layer | Configuration |
|---|---|
| IAM | Dedicated service role with S3 read (transaction data) + CloudWatch write only. No human user access to endpoint. |
| Network | Private subnet, VPC endpoint for S3. Security group allows inbound only from payroll processing service CIDR. |
| Encryption | KMS-encrypted S3 bucket, encrypted EBS on endpoint instances, TLS 1.2+ for all API calls. |
| Monitoring | CloudTrail logging all InvokeEndpoint calls. GuardDuty for anomalous access patterns. Config rule for encryption compliance. |
| Data Protection | No PII in CloudWatch logs (mask sensitive fields). Data capture for model monitoring stored encrypted with separate key. |
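The Data Protection row's "mask sensitive fields" can be approximated with a logging filter that redacts values before they reach any handler. The sketch below covers only US SSN-shaped strings. The pattern, the replacement token, and the class name are assumptions, and a production filter would need to cover every sensitive field the endpoint sees.

```python
import logging
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_pii(text):
    """Replace SSN-shaped substrings with a fixed token."""
    return SSN_PATTERN.sub("[REDACTED-SSN]", text)


class PiiMaskingFilter(logging.Filter):
    """Mask the fully rendered log message so formatted arguments are covered too."""

    def filter(self, record):
        record.msg = mask_pii(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True


logger = logging.getLogger("fraud-endpoint")  # hypothetical logger name
logger.addFilter(PiiMaskingFilter())
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())

logger.info("Scored request for employee ssn=%s", "123-45-6789")
```

The handler emits `Scored request for employee ssn=[REDACTED-SSN]`, so the raw value never reaches the log stream.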
Users, groups, roles, policies. Least privilege. Separate service roles for training, processing, and inference.
VPC isolation, VPC endpoints, security groups, NACLs, Network Firewall. No public internet for ML data.
KMS for keys, Secrets Manager for credentials, TLS in transit, SSE-KMS at rest. Multi-country compliance.
SAST, SCA, and DAST across the pipeline stages. IaC for reproducibility. CloudTrail for audit. No secrets in code.