Securing AWS ML Resources - Module 9 | AnyCompany ML Engineering

🔑 IAM & Access Control for ML

Security is not optional at AnyCompany - you handle SSNs, salaries, bank details, and health benefits for millions of workers across multiple countries. Every ML resource must follow least-privilege access: grant only the permissions needed, nothing more.

Three Pillars of ML Security

🔑

Access Control

WHO can do WHAT on WHICH resources. IAM users, groups, roles, and policies control every API call to SageMaker, S3, and other ML services.

🌐

Network Configuration

WHERE traffic can flow. VPCs, security groups, NACLs, and VPC endpoints isolate ML workloads from the public internet and other accounts.

🔄

CI/CD Pipeline Security

HOW code and models move to production. Static analysis, vulnerability scanning, secrets management, and deployment controls prevent supply chain attacks.

👤 IAM Core Components

Component	What It Is	AnyCompany ML Example
Users	Individual identities with long-term credentials	Each data scientist has their own IAM user for SageMaker Studio access
Groups	Collections of users sharing the same permissions	Data Engineers group, MLOps Engineers group, Security Engineers group
Roles	Temporary credentials assumed by services or users	SageMaker execution role that training jobs assume to access S3 data
Policies	JSON documents defining allowed/denied actions	Policy allowing s3:GetObject on ml-training-data bucket only
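
As a minimal sketch of the Policies row above, an identity-based policy that allows only s3:GetObject on one bucket could be built as follows. The bucket name ml-training-data comes from the table; the function name and the Sid value are illustrative assumptions, not AnyCompany's actual policy.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document allowing only s3:GetObject on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTrainingObjectsOnly",  # illustrative name (assumption)
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # Object-level ARN: every key in this bucket, nothing else.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


policy = read_only_bucket_policy("ml-training-data")
print(json.dumps(policy, indent=2))
```

IAM denies anything not explicitly allowed, so a principal holding only this policy cannot list the bucket, write objects, or read from any other bucket.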

⚠️

At AnyCompany, PII data access is audited. Every access to payroll data, SSNs, or salary information is logged via CloudTrail. IAM policies must restrict ML training data access to only the team members who need it. A data scientist building an attrition model should NOT have access to raw SSN data.

👥 ML Team Roles & Service Roles

Different ML team members need different permissions, and SageMaker jobs need roles of their own to access your data. Least privilege means each entity gets exactly what it needs - no more.

Team Permission Mapping

Role	AWS Services Needed	AnyCompany Context
Data Scientist	SageMaker Studio, S3 (read training data), Athena (query)	Builds attrition/fraud models. Reads anonymized data only - no raw PII access.
Data Engineer	AWS Glue, S3 (read/write), EMR, Athena	Builds data pipelines. Has write access to processed data buckets but not model endpoints.
MLOps Engineer	SageMaker AI, CodePipeline, CodeBuild, CloudFormation, ECR, Lambda, Step Functions	Deploys models to production. Can manage endpoints but has no access to training data.
Security Engineer	IAM, CloudTrail, Config, GuardDuty, KMS	Audits access patterns, manages encryption keys, reviews policies. No model or data access.

⚙️ SageMaker Service Roles

SageMaker runs training, processing, and inference jobs on your behalf, so each job type needs an IAM role that SageMaker can assume to reach your data and resources.

📓

Execution Role

The role SageMaker Studio assumes. Controls what notebooks can access: S3 buckets, ECR images, CloudWatch. Scoped to specific resources.

⚙️

Processing Job Role

Role for SageMaker Processing jobs. Needs S3 read (input data) and S3 write (output). Should NOT have endpoint or model deployment permissions.

🏋️

Training Job Role

Role for training jobs. Needs S3 read (training data), S3 write (model artifacts), CloudWatch (metrics/logs). Separate from processing role.

🚀

Model Serving Role

Role for inference endpoints. Needs S3 read (model artifacts), ECR pull (container image), CloudWatch write. No training data access.
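
A service role has two parts: a trust policy that says who may assume it, and permission policies that say what it may do. The sketch below covers only the trust side, using the SageMaker service principal. The role names are hypothetical, and the dictionaries are shaped as keyword arguments for boto3's iam.create_role so the example runs without AWS credentials.

```python
import json

# Hypothetical role names; the module does not name its roles.
JOB_ROLES = {
    "anycompany-ml-processing-role": "Assumed by SageMaker Processing jobs",
    "anycompany-ml-training-role": "Assumed by SageMaker training jobs",
    "anycompany-ml-serving-role": "Assumed by SageMaker inference endpoints",
}


def sagemaker_trust_policy() -> dict:
    """Trust policy that lets the SageMaker service assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_role_requests() -> list:
    """Build keyword arguments for one iam.create_role call per job type."""
    trust = json.dumps(sagemaker_trust_policy())
    return [
        {"RoleName": name, "AssumeRolePolicyDocument": trust, "Description": desc}
        for name, desc in JOB_ROLES.items()
    ]


role_requests = create_role_requests()
for request in role_requests:
    print(request["RoleName"])
    # With credentials configured, each dict could be passed to boto3:
    # boto3.client("iam").create_role(**request)
```

Keeping one role per job type means the permission policy attached to each can stay narrow, for example no endpoint permissions on the processing role.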

Least Privilege at AnyCompany

Training job role: Can read from s3://anycompany-ml-training-data/* and write to s3://anycompany-ml-models/*. Cannot access s3://anycompany-raw-payroll/* (raw PII).

Inference endpoint role: Can read model artifacts from s3://anycompany-ml-models/fraud-v3/*. Cannot read training data or write to any bucket.
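
The two role descriptions above can be sketched as permission policies. The bucket names and the fraud-v3 prefix come from the text; the Sid values and function names are illustrative assumptions. The sketch covers S3 only, so a working role would also need the CloudWatch (and, for serving, ECR) permissions listed earlier.

```python
TRAINING_DATA = "anycompany-ml-training-data"
MODELS = "anycompany-ml-models"
RAW_PAYROLL = "anycompany-raw-payroll"


def training_role_policy() -> dict:
    """S3 permissions for the training job role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTrainingData",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{TRAINING_DATA}/*",
            },
            {
                "Sid": "WriteModelArtifacts",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{MODELS}/*",
            },
            {
                # Explicit deny: overrides any allow that another
                # attached policy might grant on the raw PII bucket.
                "Sid": "DenyRawPayroll",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{RAW_PAYROLL}",
                    f"arn:aws:s3:::{RAW_PAYROLL}/*",
                ],
            },
        ],
    }


def inference_role_policy() -> dict:
    """S3 permissions for the fraud-v3 endpoint role: read artifacts only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadFraudModelArtifacts",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{MODELS}/fraud-v3/*",
            }
        ],
    }


training_policy = training_role_policy()
inference_policy = inference_role_policy()
print([s["Sid"] for s in training_policy["Statement"]])
```

The explicit Deny is not strictly required, because anything not allowed is already denied, but it keeps the raw payroll bucket off limits even if a broader policy is attached to the role later.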

🌐 Network Security for ML

ML workloads at AnyCompany process sensitive data that must never traverse the public internet. VPCs, endpoints, and firewalls create layered network isolation.

Defense in Depth - Network Layers

Layer	What It Controls	Scope	AnyCompany Use
VPC	Isolated virtual network for your ML resources	Account-level isolation	All SageMaker resources run in AnyCompany VPC - never on public internet
VPC Endpoints	Private connections to AWS services (S3, SageMaker API) without internet	Service-level	Training jobs access S3 data via private endpoint - no NAT gateway needed
Security Groups	Stateful firewall at instance/ENI level. Allow rules only.	Instance-level	SageMaker notebook only accepts connections from AnyCompany corporate CIDR
Network ACLs	Stateless firewall at subnet level. Allow and deny rules.	Subnet-level	Block all inbound traffic to training subnet except from VPC endpoints
AWS Network Firewall	Managed firewall with deep packet inspection, IDS/IPS	VPC-level	Inspect and filter all traffic entering/leaving the ML VPC
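
To make the Security Groups row concrete, here is a hedged sketch of an allow-only ingress rule for the notebook. The CIDR 10.20.0.0/16 is a made-up stand-in for the corporate range, and the rule is shaped like one IpPermissions entry for boto3's authorize_security_group_ingress. The source_matches helper only imitates the CIDR check locally; it is not how AWS evaluates rules.

```python
import ipaddress

CORPORATE_CIDR = "10.20.0.0/16"  # hypothetical corporate range (assumption)


def notebook_ingress_rule(cidr: str) -> dict:
    """Build one HTTPS-only ingress permission for the given CIDR."""
    ipaddress.ip_network(cidr)  # raises ValueError if the CIDR is malformed
    return {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": cidr, "Description": "Corporate network only"}],
    }


def source_matches(source_ip: str, rule: dict) -> bool:
    """Return True if source_ip falls inside any CIDR listed in the rule."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        addr in ipaddress.ip_network(r["CidrIp"]) for r in rule["IpRanges"]
    )


rule = notebook_ingress_rule(CORPORATE_CIDR)
print(source_matches("10.20.5.9", rule))    # inside the corporate range
print(source_matches("203.0.113.7", rule))  # outside the corporate range

# With credentials configured, the rule could be applied with boto3:
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId="sg-...", IpPermissions=[rule])
```

Because security groups hold allow rules only, there is nothing to write for the "block everyone else" case: traffic that matches no rule is dropped.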

Network Architecture for ML

🏠

Private Subnets

All ML compute (notebooks, training, endpoints) runs in private subnets. No public IP addresses. No direct internet access. AnyCompany payroll data never leaves the private network.

🔗

VPC Endpoints

Interface endpoints for SageMaker API, S3 gateway endpoint for data access. Traffic stays on AWS backbone - never touches the internet. Endpoint policies add another access control layer.

🌉

NAT Gateway (if needed)

Only for outbound internet access (downloading packages, external APIs). Placed in public subnet. ML data traffic should use VPC endpoints instead.

🎯

AnyCompany network rule: All ML training data (payroll records, employee PII) must flow through VPC endpoints - never through NAT gateways or the internet. VPC endpoint policies restrict which S3 buckets can be accessed, adding defense beyond IAM.
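
The network rule above combines two controls, sketched here under stated assumptions: an endpoint policy that limits which buckets are reachable through the S3 VPC endpoint, and a bucket policy that rejects requests arriving any other way. The bucket names reuse those from the least-privilege example; the endpoint ID is a placeholder.

```python
ML_BUCKETS = ["anycompany-ml-training-data", "anycompany-ml-models"]
VPCE_ID = "vpce-0123456789abcdef0"  # placeholder endpoint ID (assumption)


def bucket_arns(bucket: str) -> list:
    """Return the bucket-level and object-level ARNs for a bucket."""
    return [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]


def s3_endpoint_policy(buckets: list) -> dict:
    """Endpoint policy: only the listed buckets are reachable via the endpoint."""
    resources = []
    for bucket in buckets:
        resources.extend(bucket_arns(bucket))
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": resources,
            }
        ],
    }


def require_endpoint_bucket_policy(bucket: str, vpce_id: str) -> dict:
    """Bucket policy: deny any request that did not come through vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": bucket_arns(bucket),
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


endpoint_policy = s3_endpoint_policy(ML_BUCKETS)
bucket_policy = require_endpoint_bucket_policy(ML_BUCKETS[0], VPCE_ID)
print(len(endpoint_policy["Statement"][0]["Resource"]))
```

A deny of this kind also blocks console and administrative access that does not pass through the endpoint, so in practice it usually needs a carefully scoped exception before being applied.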

🔐 Encryption & Data Protection

AnyCompany handles PII for millions of workers across multiple countries, which brings it under privacy laws such as GDPR, CCPA, and the India DPDP Act. Encryption at rest and in transit is mandatory.

Encryption Strategy

🔑

AWS KMS (Key Management)

Centralized encryption key management. Create, rotate, and audit keys. Integrates with S3, SageMaker, EBS. AnyCompany: separate keys per data classification level.

🗝️

AWS Secrets Manager

Store and rotate database credentials, API keys, tokens. Never hardcode secrets in notebooks or training scripts. Auto-rotation for compliance.

📜

AWS Certificate Manager

TLS certificates for encryption in transit. All SageMaker endpoints use HTTPS. Internal service-to-service communication encrypted.
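
As a sketch of "never hardcode secrets", a training script can fetch credentials at run time instead. The function below expects an object that behaves like boto3.client("secretsmanager") and assumes the secret is stored as a JSON string. The secret name and the fake client are hypothetical and exist only so the example runs without AWS access.

```python
import json


def load_db_credentials(client, secret_id: str) -> dict:
    """Fetch a secret and parse it, assuming it is stored as a JSON string."""
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class FakeSecretsClient:
    """Stand-in for boto3.client('secretsmanager'), for local runs only."""

    def get_secret_value(self, SecretId: str) -> dict:
        # Dummy values; a real client would return the stored secret.
        return {
            "SecretString": json.dumps(
                {"username": "ml_reader", "password": "example-only"}
            )
        }


creds = load_db_credentials(FakeSecretsClient(), "anycompany/ml/feature-db")
print(creds["username"])  # log the username at most, never the password

# In a real job, the client would come from boto3 instead:
# import boto3
# creds = load_db_credentials(boto3.client("secretsmanager"),
#                             "anycompany/ml/feature-db")
```

Passing the client in as a parameter keeps the script testable and means rotation needs no code change: the next call simply returns the new value.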

🛡️ Data Protection in ML Pipeline

Stage	Data at Risk	Protection	AnyCompany Implementation
Storage (S3)	Training data, model artifacts	SSE-KMS encryption at rest	All ML buckets encrypted with a customer managed KMS key owned by AnyCompany
Training	Data in memory during training	Encrypted EBS volumes, VPC isolation	Training instances use encrypted storage, no internet access
Transit	Data moving between services	TLS 1.2+ for all connections	VPC endpoints ensure traffic never leaves AWS network
Inference	Request/response payloads (may contain PII)	HTTPS endpoints, request logging controls	Fraud scoring requests contain transaction data - encrypted end-to-end
Notebooks	Code, credentials, data samples	Encrypted EBS, Secrets Manager for creds	No PII in notebook outputs. Credentials via Secrets Manager, never hardcoded.
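
The Training row maps onto a handful of fields in a SageMaker CreateTrainingJob request. The fragment below is a sketch, not a complete request: it shows only the security-relevant fields, and every ARN, subnet, and security group ID is a placeholder. The missing_controls helper is a hypothetical pre-submit check, not an AWS feature.

```python
KMS_KEY = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"  # placeholder


def secure_training_fragment() -> dict:
    """Security-relevant fields of a create_training_job request."""
    return {
        "OutputDataConfig": {
            "S3OutputPath": "s3://anycompany-ml-models/",
            "KmsKeyId": KMS_KEY,  # encrypt model artifacts at rest
        },
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
            "VolumeKmsKeyId": KMS_KEY,  # encrypt the attached training volume
        },
        "VpcConfig": {
            "Subnets": ["subnet-0example"],          # private subnet placeholder
            "SecurityGroupIds": ["sg-0example"],     # placeholder
        },
        "EnableNetworkIsolation": True,  # container gets no outbound network
        "EnableInterContainerTrafficEncryption": True,
    }


def missing_controls(config: dict) -> list:
    """Return the names of expected security settings that are absent."""
    checks = {
        "output KMS key": config.get("OutputDataConfig", {}).get("KmsKeyId"),
        "volume KMS key": config.get("ResourceConfig", {}).get("VolumeKmsKeyId"),
        "VPC subnets": config.get("VpcConfig", {}).get("Subnets"),
        "network isolation": config.get("EnableNetworkIsolation"),
        "inter-container encryption": config.get(
            "EnableInterContainerTrafficEncryption"
        ),
    }
    return [name for name, value in checks.items() if not value]


fragment = secure_training_fragment()
print(missing_controls(fragment))
```

A check like this could run in the pipeline before a job is submitted, so an unencrypted or internet-connected training job fails fast instead of being found in a later audit.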

⚠️

Multi-country compliance: AnyCompany operates in 140+ countries. GDPR (EU) restricts transfers of personal data outside the EU/EEA unless approved safeguards are in place. India DPDP Act mandates consent. The US has state-level laws such as CCPA. Encryption keys must be region-specific, and data must not cross borders without proper controls.

🔄 CI/CD Pipeline Security

ML models are software - they need the same CI/CD security as any production code. But ML adds unique risks: poisoned training data, model backdoors, and adversarial attacks. Secure every stage of the pipeline.

Security at Every Pipeline Stage

Stage	Security Measure	What It Catches	AnyCompany Tool
Pre-commit	Pre-commit hooks, IDE checks	Secrets in code, PII in notebooks, formatting issues	git-secrets, detect-secrets hooks
Commit	Static Application Security Testing (SAST)	Code vulnerabilities, insecure patterns, SQL injection	Amazon CodeGuru Reviewer
Build	Software Composition Analysis (SCA)	Vulnerable dependencies, license violations	ECR image scanning, Snyk
Test	Dynamic and Interactive Application Security Testing (DAST + IAST)	Runtime vulnerabilities, API security issues	Dynamic testing in staging environment
Deploy	Penetration testing, approval gates	Exploitable endpoints, misconfigurations	Model Registry approval workflow
Monitor	Event monitoring, red/blue teaming	Active attacks, anomalous access patterns	GuardDuty, CloudTrail, Config rules
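
To illustrate what the pre-commit stage looks for, here is a toy scanner. It is a simplified stand-in for tools such as git-secrets or detect-secrets, not a replacement for them. The two patterns are assumptions chosen for illustration: AWS access key IDs (which start with AKIA or ASIA followed by 16 uppercase letters or digits) and US SSN-shaped strings.

```python
import re

# Illustrative patterns only; real scanners ship far broader rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> list:
    """Return (line_number, pattern_name) for every line that matches."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# AKIAIOSFODNN7EXAMPLE is the dummy key ID used in AWS documentation.
sample = (
    'key = "AKIAIOSFODNN7EXAMPLE"\n'
    'print("hello")\n'
    'ssn = "123-45-6789"\n'
)
findings = scan_text(sample)
print(findings)  # [(1, 'aws_access_key_id'), (3, 'us_ssn')]
```

A pre-commit hook would run a check like this over the staged files and exit non-zero when the findings list is not empty, which blocks the commit.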

🏗️ Infrastructure Security Services

📋

AWS CloudFormation (IaC)

Define all ML infrastructure as code. Version-controlled, auditable, repeatable. No manual console changes to production ML resources.

📊

AWS Config

Continuously monitor resource configurations. Alert when S3 buckets lose encryption, security groups open too wide, or VPC endpoints are removed.

👁️

AWS CloudTrail

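Log API activity across the account: who accessed what, when, and from where. Management events are recorded by default; object-level S3 reads and writes are data events that must be enabled explicitly. Essential for compliance audits and incident investigation at AnyCompany.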

🔑

AWS KMS + Secrets Manager

Manage encryption keys and secrets centrally. Auto-rotate credentials. Never store secrets in code, environment variables, or notebooks.
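
As a sketch of infrastructure as code, the snippet below generates a small CloudFormation template for an ML data bucket with default SSE-KMS encryption and public access blocked. Building it in Python is just one option (a hand-written YAML template would work equally well), and the logical ID, bucket name, and key ARN are placeholders.

```python
import json


def encrypted_bucket_template(bucket_name: str, kms_key_arn: str) -> dict:
    """CloudFormation template for one KMS-encrypted, non-public S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "MlDataBucket": {  # logical ID is illustrative
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "BucketName": bucket_name,
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {
                                "ServerSideEncryptionByDefault": {
                                    "SSEAlgorithm": "aws:kms",
                                    "KMSMasterKeyID": kms_key_arn,
                                }
                            }
                        ]
                    },
                    "PublicAccessBlockConfiguration": {
                        "BlockPublicAcls": True,
                        "BlockPublicPolicy": True,
                        "IgnorePublicAcls": True,
                        "RestrictPublicBuckets": True,
                    },
                },
            }
        },
    }


template = encrypted_bucket_template(
    "anycompany-ml-training-data",
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",  # placeholder
)
print(json.dumps(template, indent=2))
```

Because the template lives in version control, a change that removes encryption shows up in code review, and an AWS Config rule can still catch drift if someone edits the bucket by hand.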

🎮 Security Planner

Each AnyCompany ML workload below has its own recommended security configuration across IAM, network, encryption, and monitoring. The configuration for the Payroll Fraud Model follows the workload list.

🛡️

Payroll Fraud Model

Processes transaction data containing amounts, employee IDs, and vendor details. Real-time endpoint.

👤

Attrition Prediction

Uses employee demographics, compensation, and performance data. Monthly batch scoring.

📄

Document OCR (Tax Forms)

Processes scanned W-2s and I-9s containing SSNs, addresses, and income. Highest PII sensitivity.

💬

AnyCompany Assist (LLM)

Conversational AI handling employee queries about salary, benefits, PTO. User-facing.

📋 Payroll Fraud Model: Handles financial transaction data. Requires encrypted endpoints, VPC isolation, strict IAM roles, and full audit logging. Real-time endpoint must be highly available but locked down to internal services only.

Security Layer	Configuration
IAM	Dedicated service role with S3 read (transaction data) + CloudWatch write only. No human user access to endpoint.
Network	Private subnet, VPC endpoint for S3. Security group allows inbound only from payroll processing service CIDR.
Encryption	KMS-encrypted S3 bucket, encrypted EBS on endpoint instances, TLS 1.2+ for all API calls.
Monitoring	CloudTrail logging all InvokeEndpoint calls. GuardDuty for anomalous access patterns. Config rule for encryption compliance.
Data Protection	No PII in CloudWatch logs (mask sensitive fields). Data capture for model monitoring stored encrypted with separate key.
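
The Data Protection row ("mask sensitive fields") can be sketched with the standard logging module. The field names in SENSITIVE_FIELDS are hypothetical, since the module does not define the request schema, and the SSN pattern is a simple illustration, not a complete PII detector.

```python
import logging
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SENSITIVE_FIELDS = ("employee_id", "account_number")  # hypothetical field names


def mask_payload(payload: dict) -> dict:
    """Return a copy of payload that is safe to log."""
    masked = {}
    for key, value in payload.items():
        if key in SENSITIVE_FIELDS:
            masked[key] = "***"  # drop the value entirely
        elif isinstance(value, str):
            masked[key] = SSN_PATTERN.sub("***-**-****", value)
        else:
            masked[key] = value
    return masked


class RedactSsnFilter(logging.Filter):
    """Redact SSN-shaped strings from a record before handlers see it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so values passed as %-style args are covered too.
        record.msg = SSN_PATTERN.sub("***-**-****", record.getMessage())
        record.args = ()
        return True


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("fraud-endpoint")
logger.addFilter(RedactSsnFilter())

request = {"employee_id": "E123", "amount": 950.0, "memo": "SSN 123-45-6789 on file"}
safe = mask_payload(request)
logger.info("scoring request: %s", safe)
```

Masking at the point of logging keeps PII out of CloudWatch regardless of which code path writes the message. The filter is a second line of defense for strings that bypass mask_payload.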

📝 Module Summary

✅

IAM & Access

Users, groups, roles, policies. Least privilege. Separate service roles for training, processing, and inference.

✅

Network Security

VPC isolation, VPC endpoints, security groups, NACLs, Network Firewall. No public internet for ML data.

✅

Encryption

KMS for keys, Secrets Manager for credentials, TLS in transit, SSE-KMS at rest. Multi-country compliance.

✅

CI/CD Security

SAST, SCA, and DAST at their respective pipeline stages. IaC for reproducibility. CloudTrail for audit. No secrets in code.