PHI De-identification by ClerkAI
REST API and web dashboard for extracting, anonymizing, and de-anonymizing Protected Health Information (PHI) in medical text. Powered by AWS Bedrock (Claude) for accurate extraction, deployed entirely within your AWS account.
Get it on AWS Marketplace | Deployment Guide
What it does
- Extracts PHI from unstructured medical text, including names, dates, SSNs, addresses, and MRNs, across 19 standard HIPAA categories.
- Anonymizes records by replacing each PHI entity with a secure token.
- Restores original values on demand using the same token — fully reversible.
All processing happens in your AWS account. Patient data never leaves your environment.
Quick links
- Deploy from AWS Marketplace: Step-by-step CloudFormation deployment, VPC peering, SSM access, dashboard access, and HTTPS setup.
- API Reference: Every endpoint with curl examples (extract, anonymize, deanonymize, lookup, tokenize, admin, and PHI categories).
How it works
- Extraction: Claude (via AWS Bedrock) identifies PHI entities and classifies them by HIPAA category.
- Tokenization: Each entity is replaced with `PHI__CATEGORY__token`, a deterministic, opaque token.
- Storage: Tokens map to original values via one of four backends:
  - AWS KMS: encrypted in the token itself (no database needed)
  - DynamoDB Token-Based: fast key-based lookups
  - DynamoDB Record-Based: grouped per record ID
  - File: local JSON (development only)
- Deanonymization: The same backend resolves tokens back to the original PHI values.
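The tokenize, store, and restore steps above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the Claude extraction step is replaced by a hand-supplied entity list, a plain dictionary stands in for the storage backend, and the HMAC-based derivation, the 16-character hex suffix, and the names `SECRET`, `make_token`, `anonymize`, and `deanonymize` are all assumptions made for the example.

```python
import hashlib
import hmac
import re

# Hypothetical secret; a real deployment would load this from a secrets store.
SECRET = b"example-hmac-secret"

# Matches tokens shaped like PHI__CATEGORY__token (the hex suffix is an assumption).
TOKEN_RE = re.compile(r"PHI__[A-Z]+(?:_[A-Z]+)*__[0-9a-f]{16}")


def make_token(category: str, value: str) -> str:
    """Derive a deterministic, opaque token for a (category, value) pair."""
    message = f"{category}:{value}".encode()
    digest = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"PHI__{category}__{digest[:16]}"


def anonymize(text: str, entities: list, store: dict) -> str:
    """Replace each (category, value) entity in text and record the mapping."""
    # Longer values go first so a short value cannot split a longer one.
    for category, value in sorted(entities, key=lambda e: len(e[1]), reverse=True):
        token = make_token(category, value)
        store[token] = value  # stand-in for the storage backend
        text = text.replace(value, token)
    return text


def deanonymize(text: str, store: dict) -> str:
    """Swap every known token back to its original value; leave unknown ones."""
    return TOKEN_RE.sub(lambda m: store.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    store = {}
    note = "Patient John Smith, MRN 12345678, was seen today."
    entities = [("NAME", "John Smith"), ("MRN", "12345678")]
    redacted = anonymize(note, entities, store)
    print(redacted)
    print(deanonymize(redacted, store))
```

Because the token depends only on the secret, the category, and the value, the same PHI value maps to the same token across records, which is what makes value-to-token lookups possible in a keyed store.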
Storage backend choice
| Backend | When to use | Tradeoffs |
|---|---|---|
| AWS KMS | Default for production. No database needed. | Per-call cost; value→token lookup not supported |
| DynamoDB — Token | High-volume, fast lookups in both directions | Requires HMAC secret; one row per PHI entity |
| DynamoDB — Record | Granular per-record retrieval and deletion | Record ID required for all operations |
| File | Local development and testing | Not recommended for production |
See the Marketplace Guide for full configuration details.
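As a rough aid to reading the table, the decision can be written as a small function. This sketch only encodes the tradeoffs listed above; the `Needs` fields and the backend identifier strings (`"kms"`, `"dynamodb-token"`, `"dynamodb-record"`, `"file"`) are hypothetical names chosen for the example and are not the product's configuration values.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical requirements used to pick a backend."""
    production: bool = True
    reverse_lookup: bool = False   # need value -> token lookups
    per_record_ops: bool = False   # need retrieval/deletion by record ID


def choose_backend(needs: Needs) -> str:
    """Return an illustrative backend name following the table's guidance."""
    if not needs.production:
        return "file"              # local development and testing only
    if needs.per_record_ops:
        return "dynamodb-record"   # record ID required for all operations
    if needs.reverse_lookup:
        return "dynamodb-token"    # KMS does not support value -> token lookup
    return "kms"                   # production default, no database needed


if __name__ == "__main__":
    print(choose_backend(Needs()))
    print(choose_backend(Needs(reverse_lookup=True)))
```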
Compliance and security posture
- Data residency: All processing happens inside your AWS account in a private VPC subnet — no data leaves your control.
- No public IP: EC2 runs in a private subnet, accessible only via VPC peering or AWS SSM Session Manager.
- Encryption at rest: AWS KMS (RSA 4096) for PHI ciphertext, AWS-managed DynamoDB encryption for token mappings.
- API authentication: Named API keys backed by DynamoDB with optional expiry and individual revocation.
- Audit trail: All API calls are logged to CloudWatch with the calling key's label and retained for 30 days.
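The API key behavior described above (named keys, optional expiry, individual revocation) can be illustrated with a short check. This is a sketch under stated assumptions: a dictionary stands in for the DynamoDB table, keys are looked up by a SHA-256 hash of the raw key, and `ApiKeyRecord`, `key_id`, and `authenticate` are hypothetical names, not the service's actual schema or code.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


@dataclass
class ApiKeyRecord:
    """Hypothetical key record; the real table schema is not documented here."""
    label: str                              # name shown in the audit trail
    expires_at: Optional[datetime] = None   # None means the key never expires
    revoked: bool = False


def key_id(raw_key: str) -> str:
    """Look keys up by hash so raw keys are not stored (an assumption)."""
    return hashlib.sha256(raw_key.encode()).hexdigest()


def authenticate(raw_key: str, table: Dict[str, ApiKeyRecord],
                 now: Optional[datetime] = None) -> Optional[str]:
    """Return the key's label if it is valid, otherwise None."""
    now = now or datetime.now(timezone.utc)
    record = table.get(key_id(raw_key))
    if record is None or record.revoked:
        return None
    if record.expires_at is not None and now >= record.expires_at:
        return None
    return record.label


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    table = {
        key_id("key-ci"): ApiKeyRecord("ci-pipeline"),
        key_id("key-old"): ApiKeyRecord("contractor", expires_at=now - timedelta(days=1)),
    }
    print(authenticate("key-ci", table))    # label of a valid key
    print(authenticate("key-old", table))   # expired key is rejected
```

Returning the label rather than a boolean lets the caller attach it to each log entry, matching the audit-trail behavior described in the list.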
Getting started
- Subscribe and deploy — see the Marketplace Installation guide for subscription, CloudFormation parameters, VPC peering, SSM access, and dashboard access.
- Call the API — see the API Reference for endpoint-by-endpoint curl examples.
PHI De-identification by ClerkAI — Generative Technologies, Inc.