Module 9 - Interactive Explainer

Securing AWS ML Resources

Protect sensitive workforce data with IAM policies, VPC isolation, encryption, and CI/CD security - because at AnyCompany, ML models handle SSNs, salaries, and bank details for millions of workers.

Security | Interactive | HCM Context

IAM & Access Control for ML

Security is not optional at AnyCompany - you handle SSNs, salaries, bank details, and health benefits for millions of workers across multiple countries. Every ML resource must follow least-privilege access: grant only the permissions needed, nothing more.

Three Pillars of ML Security

Access Control

WHO can do WHAT on WHICH resources. IAM users, groups, roles, and policies control every API call to SageMaker, S3, and other ML services.

Network Configuration

WHERE traffic can flow. VPCs, security groups, NACLs, and VPC endpoints isolate ML workloads from the public internet and other accounts.

CI/CD Pipeline Security

HOW code and models move to production. Static analysis, vulnerability scanning, secrets management, and deployment controls prevent supply chain attacks.

IAM Core Components

| Component | What It Is | AnyCompany ML Example |
| --- | --- | --- |
| Users | Individual identities with long-term credentials | Each data scientist has their own IAM user for SageMaker Studio access |
| Groups | Collections of users sharing the same permissions | Data Engineers group, MLOps Engineers group, Security Engineers group |
| Roles | Temporary credentials assumed by services or users | SageMaker execution role that training jobs assume to access S3 data |
| Policies | JSON documents defining allowed/denied actions | Policy allowing s3:GetObject on ml-training-data bucket only |

At AnyCompany, PII data access is audited. Every access to payroll data, SSNs, or salary information is logged via CloudTrail. IAM policies must restrict ML training data access to only the team members who need it. A data scientist building an attrition model should NOT have access to raw SSN data.
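One way to make least privilege checkable is to scan a policy document for over-broad statements before it is attached. The sketch below is illustrative only: `find_overbroad_statements` is a hypothetical helper, not an AWS API, and it only flags literal wildcards in Allow statements. The bucket name is taken from the Policies example in the table.

```python
def _as_list(value):
    """IAM accepts a single string or a list for Action/Resource; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def find_overbroad_statements(policy: dict) -> list[str]:
    """Return one finding per wildcard action or resource in an Allow statement."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be a single object
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"statement {index}: wildcard action")
        if "*" in resources:
            findings.append(f"statement {index}: wildcard resource")
    return findings


scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::ml-training-data/*",
    }],
}
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

print(find_overbroad_statements(scoped_policy))  # []
print(find_overbroad_statements(broad_policy))
```

A check like this could run in code review for policy changes. It does not replace a managed analysis service, and it would miss broad access granted through partial wildcards such as `s3:Get*`.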

ML Team Roles & Service Roles

Different ML team members need different permissions, and SageMaker services themselves need roles to access your data. Least privilege means each entity gets exactly what it needs - no more.

Team Permission Mapping

| Role | AWS Services Needed | AnyCompany Context |
| --- | --- | --- |
| Data Scientist | SageMaker Studio, S3 (read training data), Athena (query) | Builds attrition/fraud models. Reads anonymized data only - no raw PII access. |
| Data Engineer | AWS Glue, S3 (read/write), EMR, Athena | Builds data pipelines. Has write access to processed data buckets but not model endpoints. |
| MLOps Engineer | SageMaker AI, CodePipeline, CodeBuild, CloudFormation, ECR, Lambda, Step Functions | Deploys models to production. Has endpoint management but not training data access. |
| Security Engineer | IAM, CloudTrail, Config, GuardDuty, KMS | Audits access patterns, manages encryption keys, reviews policies. No model or data access. |

SageMaker Service Roles

SageMaker jobs (training, processing, inference) run as AWS services - they need their own IAM roles to access your data and resources.

Execution Role

The role SageMaker Studio assumes. Controls what notebooks can access: S3 buckets, ECR images, CloudWatch. Scoped to specific resources.

Processing Job Role

Role for SageMaker Processing jobs. Needs S3 read (input data) and S3 write (output). Should NOT have endpoint or model deployment permissions.

Training Job Role

Role for training jobs. Needs S3 read (training data), S3 write (model artifacts), CloudWatch (metrics/logs). Separate from processing role.

Model Serving Role

Role for inference endpoints. Needs S3 read (model artifacts), ECR pull (container image), CloudWatch write. No training data access.
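Each of the four roles above needs a trust policy that lets the SageMaker service assume it, and that trust policy is separate from the permissions the role carries. A minimal sketch follows, assuming one role per job type. The role names are hypothetical. The service principal `sagemaker.amazonaws.com` and the `sts:AssumeRole` action are the standard values for SageMaker roles.

```python
import json


def sagemaker_trust_policy() -> dict:
    """Trust policy that allows the SageMaker service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


# Hypothetical role names, one per job type, so permissions never overlap.
ROLE_NAMES = ["ml-processing-role", "ml-training-role", "ml-serving-role"]


def role_definitions() -> dict:
    """Map each role name to the JSON trust document it would be created with."""
    return {name: json.dumps(sagemaker_trust_policy()) for name in ROLE_NAMES}


for name, document in role_definitions().items():
    print(name, document)
```

Each JSON string could be passed as `AssumeRolePolicyDocument` when creating the role through IAM. The permissions policies that differ between processing, training, and serving would then be attached separately.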

Least Privilege at AnyCompany

Training job role: Can read from s3://anycompany-ml-training-data/* and write to s3://anycompany-ml-models/*. Cannot access s3://anycompany-raw-payroll/* (raw PII).

Inference endpoint role: Can read model artifacts from s3://anycompany-ml-models/fraud-v3/*. Cannot read training data or write to any bucket.
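The training job rule above can be written down as a permissions policy. This is a sketch under stated assumptions, not AnyCompany's actual policy: the bucket names come from the text, while the statement IDs and the explicit Deny are illustrative choices.

```python
import json

TRAINING_DATA_BUCKET = "anycompany-ml-training-data"
MODEL_BUCKET = "anycompany-ml-models"
RAW_PAYROLL_BUCKET = "anycompany-raw-payroll"


def training_role_policy() -> dict:
    """Permissions for the training job role: read data, write models, never raw PII."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTrainingData",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{TRAINING_DATA_BUCKET}/*",
            },
            {
                "Sid": "WriteModelArtifacts",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{MODEL_BUCKET}/*",
            },
            {
                # Explicit Deny wins even if another attached policy allows access.
                "Sid": "DenyRawPayroll",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{RAW_PAYROLL_BUCKET}",
                    f"arn:aws:s3:::{RAW_PAYROLL_BUCKET}/*",
                ],
            },
        ],
    }


print(json.dumps(training_role_policy(), indent=2))
```

Access to the raw payroll bucket is already denied implicitly because nothing allows it. The explicit Deny is a guard against a broader policy being attached later. A real training role would likely also need `s3:ListBucket` on the training bucket and CloudWatch permissions, which are omitted here for brevity.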

Network Security for ML

ML workloads at AnyCompany process sensitive data that must never traverse the public internet. VPCs, endpoints, and firewalls create layered network isolation.

Defense in Depth - Network Layers

| Layer | What It Controls | Scope | AnyCompany Use |
| --- | --- | --- | --- |
| VPC | Isolated virtual network for your ML resources | Account-level isolation | All SageMaker resources run in AnyCompany VPC - never on public internet |
| VPC Endpoints | Private connections to AWS services (S3, SageMaker API) without internet | Service-level | Training jobs access S3 data via private endpoint - no NAT gateway needed |
| Security Groups | Stateful firewall at instance/ENI level. Allow rules only. | Instance-level | SageMaker notebook only accepts connections from AnyCompany corporate CIDR |
| Network ACLs | Stateless firewall at subnet level. Allow and deny rules. | Subnet-level | Block all inbound traffic to training subnet except from VPC endpoints |
| AWS Network Firewall | Managed firewall with deep packet inspection, IDS/IPS | VPC-level | Inspect and filter all traffic entering/leaving the ML VPC |
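Security groups contain allow rules only, so any source that matches no rule is dropped. The sketch below models that matching logic for the notebook example in the table. It is a simplified illustration, not how AWS evaluates rules, and the corporate CIDR is a hypothetical value because the source does not give one.

```python
import ipaddress

# Hypothetical corporate range; replace with the real CIDR.
CORPORATE_CIDRS = [ipaddress.ip_network("10.20.0.0/16")]


def is_allowed_source(source_ip: str, allowed=CORPORATE_CIDRS) -> bool:
    """Allow-list semantics: permit only if some rule matches, otherwise deny."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in allowed)


print(is_allowed_source("10.20.14.7"))   # True: inside the corporate range
print(is_allowed_source("203.0.113.9"))  # False: no rule matches
```

A network ACL would need an ordered rule list with explicit deny entries and separate inbound and outbound rules, since it is stateless.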

Network Architecture for ML

Private Subnets

All ML compute (notebooks, training, endpoints) runs in private subnets. No public IP addresses. No direct internet access. AnyCompany payroll data never leaves the private network.

VPC Endpoints

Interface endpoints for SageMaker API, S3 gateway endpoint for data access. Traffic stays on AWS backbone - never touches the internet. Endpoint policies add another access control layer.

NAT Gateway (if needed)

Only for outbound internet access (downloading packages, external APIs). Placed in public subnet. ML data traffic should use VPC endpoints instead.

AnyCompany network rule: All ML training data (payroll records, employee PII) must flow through VPC endpoints - never through NAT gateways or the internet. VPC endpoint policies restrict which S3 buckets can be accessed, adding defense beyond IAM.
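An endpoint policy that limits the S3 gateway endpoint to approved buckets might look like the sketch below. The bucket list reuses names from earlier in this module. The VPC ID, route table ID, and region in the usage lines are placeholders, and the set of allowed actions is an assumption.

```python
import json

ALLOWED_BUCKETS = ["anycompany-ml-training-data", "anycompany-ml-models"]


def s3_endpoint_policy(buckets=ALLOWED_BUCKETS) -> dict:
    """Endpoint policy: only the listed buckets are reachable through the endpoint."""
    resources = []
    for bucket in buckets:
        resources.append(f"arn:aws:s3:::{bucket}")    # bucket-level actions (list)
        resources.append(f"arn:aws:s3:::{bucket}/*")  # object-level actions (get/put)
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": resources,
        }],
    }


def gateway_endpoint_request(vpc_id: str, route_table_ids: list, region: str) -> dict:
    """Keyword arguments shaped for the EC2 create_vpc_endpoint call."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
        "PolicyDocument": json.dumps(s3_endpoint_policy()),
    }


# Placeholder identifiers for illustration only.
request = gateway_endpoint_request("vpc-0example", ["rtb-0example"], "eu-west-1")
print(json.dumps(request, indent=2))
```

With boto3, the dictionary could be passed as `boto3.client("ec2").create_vpc_endpoint(**request)`. IAM still applies on top of this, so a request must be allowed by both the role policy and the endpoint policy. Any other buckets the workload depends on would have to be added to the list.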

Encryption & Data Protection

AnyCompany handles PII for millions of workers across multiple countries and is subject to GDPR, CCPA, and the India DPDP Act. Encryption at rest and in transit is mandatory.

Encryption Strategy

AWS KMS (Key Management)

Centralized encryption key management. Create, rotate, and audit keys. Integrates with S3, SageMaker, EBS. AnyCompany: separate keys per data classification level.

AWS Secrets Manager

Store and rotate database credentials, API keys, tokens. Never hardcode secrets in notebooks or training scripts. Auto-rotation for compliance.

AWS Certificate Manager

TLS certificates for encryption in transit. All SageMaker endpoints use HTTPS. Internal service-to-service communication encrypted.
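To illustrate the "never hardcode secrets" rule, a notebook or training script can fetch credentials at runtime. The sketch below assumes the secret is stored as a JSON string. The secret name and the stub client are hypothetical and exist only so the example runs without AWS access.

```python
import json


def load_db_credentials(secrets_client, secret_id: str) -> dict:
    """Fetch a secret at runtime and parse it, so no credential lives in code."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class StubSecretsClient:
    """Offline stand-in for a Secrets Manager client, for demonstration only."""

    def get_secret_value(self, SecretId):
        payload = {"username": "ml_reader", "password": "example-only"}
        return {"SecretString": json.dumps(payload)}


credentials = load_db_credentials(StubSecretsClient(), "ml/feature-db")  # hypothetical name
print(credentials["username"])  # ml_reader
```

In a real environment the first argument would be `boto3.client("secretsmanager")`, and the calling role would need `secretsmanager:GetSecretValue` on that one secret only.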

Data Protection in ML Pipeline

| Stage | Data at Risk | Protection | AnyCompany Implementation |
| --- | --- | --- | --- |
| Storage (S3) | Training data, model artifacts | SSE-KMS encryption at rest | All ML buckets encrypted with AnyCompany-managed CMK |
| Training | Data in memory during training | Encrypted EBS volumes, VPC isolation | Training instances use encrypted storage, no internet access |
| Transit | Data moving between services | TLS 1.2+ for all connections | VPC endpoints ensure traffic never leaves AWS network |
| Inference | Request/response payloads (may contain PII) | HTTPS endpoints, request logging controls | Fraud scoring requests contain transaction data - encrypted end-to-end |
| Notebooks | Code, credentials, data samples | Encrypted EBS, Secrets Manager for creds | No PII in notebook outputs. Credentials via Secrets Manager, never hardcoded. |

Multi-country compliance: AnyCompany operates in 140+ countries. GDPR (EU) restricts transfers of personal data outside the EEA unless safeguards are in place. India DPDP Act mandates consent. US has state-level laws such as CCPA. Encryption keys must be region-specific, and data must not cross borders without proper controls.
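The Training row above maps to a handful of fields on a SageMaker training job request. The sketch below checks a request fragment for them before submission. `missing_security_settings` is a hypothetical helper, the key ARNs and resource IDs are placeholders, and the fragment omits required fields such as the algorithm and role.

```python
def missing_security_settings(request: dict) -> list[str]:
    """List the security-relevant training job fields that are unset."""
    problems = []
    if not request.get("ResourceConfig", {}).get("VolumeKmsKeyId"):
        problems.append("no KMS key set for the training volume")
    if not request.get("OutputDataConfig", {}).get("KmsKeyId"):
        problems.append("no KMS key set for model artifacts")
    vpc = request.get("VpcConfig", {})
    if not (vpc.get("Subnets") and vpc.get("SecurityGroupIds")):
        problems.append("job is not attached to a VPC")
    if not request.get("EnableNetworkIsolation"):
        problems.append("network isolation is disabled")
    if not request.get("EnableInterContainerTrafficEncryption"):
        problems.append("inter-container traffic encryption is disabled")
    return problems


# Placeholder values; only the security-relevant part of a request is shown.
example_request = {
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
        "VolumeKmsKeyId": "arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE-VOLUME-KEY",
    },
    "OutputDataConfig": {
        "S3OutputPath": "s3://anycompany-ml-models/attrition/",
        "KmsKeyId": "arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE-ARTIFACT-KEY",
    },
    "VpcConfig": {"Subnets": ["subnet-0example"], "SecurityGroupIds": ["sg-0example"]},
    "EnableNetworkIsolation": True,
    "EnableInterContainerTrafficEncryption": True,
}

print(missing_security_settings(example_request))  # []
print(missing_security_settings({}))
```

Keeping the key ARNs in the same region as the data is one way to honor the region-specific key requirement. Inter-container encryption matters mainly for distributed training and can add overhead, so whether to require it everywhere is a judgment call.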

CI/CD Pipeline Security

ML models are software - they need the same CI/CD security as any production code. But ML adds unique risks: poisoned training data, model backdoors, and adversarial attacks. Secure every stage of the pipeline.

Security at Every Pipeline Stage

| Stage | Security Measure | What It Catches | AnyCompany Tool |
| --- | --- | --- | --- |
| Pre-commit | Pre-commit hooks, IDE checks | Secrets in code, PII in notebooks, formatting issues | git-secrets, detect-secrets hooks |
| Commit | Static Application Security Testing (SAST) | Code vulnerabilities, insecure patterns, SQL injection | Amazon CodeGuru Reviewer |
| Build | Software Composition Analysis (SCA) | Vulnerable dependencies, license violations | ECR image scanning, Snyk |
| Test | DAST + IAST | Runtime vulnerabilities, API security issues | Dynamic testing in staging environment |
| Deploy | Penetration testing, approval gates | Exploitable endpoints, misconfigurations | Model Registry approval workflow |
| Monitor | Event monitoring, red/blue teaming | Active attacks, anomalous access patterns | GuardDuty, CloudTrail, Config rules |
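To show what the pre-commit stage is looking for, here is a deliberately simplified scanner. It is not the implementation of git-secrets or detect-secrets. It assumes two patterns only: AWS access key IDs and US SSN-shaped strings. The sample key is the placeholder used in AWS documentation.

```python
import re

PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


sample = 'key = "AKIAIOSFODNN7EXAMPLE"\nprint("hello")\nssn = "123-45-6789"\n'
print(scan_text(sample))  # [(1, 'aws_access_key_id'), (3, 'us_ssn')]
```

A hook would run this over staged files and exit non-zero on any finding. Production scanners add entropy checks and allow-lists to reduce false positives, which this sketch does not attempt.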

Infrastructure Security Services

AWS CloudFormation (IaC)

Define all ML infrastructure as code. Version-controlled, auditable, repeatable. No manual console changes to production ML resources.

AWS Config

Continuously monitor resource configurations. Alert when S3 buckets lose encryption, security groups open too wide, or VPC endpoints are removed.

AWS CloudTrail

Log every API call. Who accessed what data, when, from where. Essential for compliance audits and incident investigation at AnyCompany.

AWS KMS + Secrets Manager

Manage encryption keys and secrets centrally. Auto-rotate credentials. Never store secrets in code, environment variables, or notebooks.
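The kind of check an AWS Config rule performs on bucket encryption can be approximated in a few lines. This is a sketch of the idea rather than Config itself. It assumes the response shape returned by the S3 `get_bucket_encryption` call and uses a stub client so it runs offline. The bucket name is reused from earlier in this module.

```python
def bucket_uses_kms(s3_client, bucket: str) -> bool:
    """True if the bucket's default encryption rule uses SSE-KMS."""
    response = s3_client.get_bucket_encryption(Bucket=bucket)
    rules = response["ServerSideEncryptionConfiguration"]["Rules"]
    return any(
        rule.get("ApplyServerSideEncryptionByDefault", {}).get("SSEAlgorithm") == "aws:kms"
        for rule in rules
    )


class StubS3Client:
    """Offline stand-in for an S3 client; returns a fixed encryption config."""

    def __init__(self, algorithm: str):
        self._algorithm = algorithm

    def get_bucket_encryption(self, Bucket):
        rule = {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": self._algorithm}}
        return {"ServerSideEncryptionConfiguration": {"Rules": [rule]}}


print(bucket_uses_kms(StubS3Client("aws:kms"), "anycompany-ml-models"))  # True
print(bucket_uses_kms(StubS3Client("AES256"), "anycompany-ml-models"))   # False
```

With boto3, the first argument would be `boto3.client("s3")`. Running such a check continuously and alerting on drift is what the managed service adds.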

Security Planner

Each AnyCompany ML workload below calls for a security configuration spanning IAM, network, encryption, and monitoring. The recommended configuration for the Payroll Fraud Model is shown in full.

Payroll Fraud Model

Processes transaction data containing amounts, employee IDs, and vendor details. Real-time endpoint.

Attrition Prediction

Uses employee demographics, compensation, and performance data. Monthly batch scoring.

Document OCR (Tax Forms)

Processes scanned W-2s and I-9s containing SSNs, addresses, and income. Highest PII sensitivity.

AnyCompany Assist (LLM)

Conversational AI handling employee queries about salary, benefits, PTO. User-facing.

Payroll Fraud Model: Handles financial transaction data. Requires encrypted endpoints, VPC isolation, strict IAM roles, and full audit logging. Real-time endpoint must be highly available but locked down to internal services only.

| Security Layer | Configuration |
| --- | --- |
| IAM | Dedicated service role with S3 read (transaction data) + CloudWatch write only. No human user access to endpoint. |
| Network | Private subnet, VPC endpoint for S3. Security group allows inbound only from payroll processing service CIDR. |
| Encryption | KMS-encrypted S3 bucket, encrypted EBS on endpoint instances, TLS 1.2 for all API calls. |
| Monitoring | CloudTrail logging all InvokeEndpoint calls. GuardDuty for anomalous access patterns. Config rule for encryption compliance. |
| Data Protection | No PII in CloudWatch logs (mask sensitive fields). Data capture for model monitoring stored encrypted with separate key. |
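The Data Protection row calls for masking sensitive fields before they reach CloudWatch. One possible approach, sketched below, is a logging filter in the inference code that redacts values before any handler sees them. The SSN pattern is an assumed example of a sensitive field, and the logger name is hypothetical.

```python
import logging
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class MaskSsnFilter(logging.Filter):
    """Redact SSN-shaped values from a record before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so values passed as args are masked too.
        record.msg = SSN_PATTERN.sub("***-**-****", record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("fraud-endpoint")  # hypothetical logger name
logger.addFilter(MaskSsnFilter())

# The emitted line contains ***-**-**** in place of the SSN.
logger.info("scored request for employee ssn=%s", "123-45-6789")
```

Because the filter is attached to the logger, it only covers records logged through that logger. Attaching it to each handler instead would also cover records propagated from child loggers.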

Module Summary

IAM & Access

Users, groups, roles, policies. Least privilege. Separate service roles for training, processing, and inference.

Network Security

VPC isolation, VPC endpoints, security groups, NACLs, Network Firewall. No public internet for ML data.

Encryption

KMS for keys, Secrets Manager for credentials, TLS in transit, SSE-KMS at rest. Multi-country compliance.

CI/CD Security

SAST, SCA, DAST at every stage. IaC for reproducibility. CloudTrail for audit. No secrets in code.