PHI De-identification by ClerkAI

REST API and web dashboard for extracting, anonymizing, and de-anonymizing Protected Health Information (PHI) in medical text. Powered by AWS Bedrock (Claude) for accurate extraction, deployed entirely within your AWS account.

Get it on AWS Marketplace | Deployment Guide


What it does

  1. Extracts PHI from unstructured medical text: names, dates, SSNs, addresses, MRNs, and other identifiers across 19 standard HIPAA categories.
  2. Anonymizes records by replacing each PHI entity with a secure token.
  3. Restores original values on demand using the same token — fully reversible.

All processing happens in your AWS account. Patient data never leaves your environment.


  • Deploy from AWS Marketplace

    Step-by-step CloudFormation deployment, VPC peering, SSM access, dashboard access, and HTTPS setup.

    → Marketplace Guide

  • API Reference

    Every endpoint with curl examples — extract, anonymize, deanonymize, lookup, tokenize, admin, and PHI categories.

    → API Endpoints


How it works

  1. Extraction: Claude (via AWS Bedrock) identifies PHI entities and classifies them by HIPAA category.
  2. Tokenization: Each entity is replaced with PHI__CATEGORY__token — a deterministic, opaque token.
  3. Storage: Tokens map to original values via one of four backends:
    • AWS KMS — encrypted in the token itself (no database needed)
    • DynamoDB Token-Based — fast key-based lookups
    • DynamoDB Record-Based — grouped per record ID
    • File — local JSON (development only)
  4. Deanonymization: Same backend resolves tokens back to original PHI values.
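
As a rough illustration of steps 2 to 4, the sketch below derives deterministic tokens in the PHI__CATEGORY__token shape with an HMAC and keeps the token-to-value mapping in a plain dictionary. The secret, the 16-character token suffix, the category labels, and the function names are all assumptions made for this example. It is not ClerkAI's implementation, and extraction (step 1) is stood in for by a hand-written entity list.

```python
from __future__ import annotations

import hashlib
import hmac
import re

# Hypothetical secret, for illustration only.
SECRET = b"demo-secret-not-for-production"
# Matches tokens shaped like PHI__CATEGORY__<16 hex chars> (an assumed format).
TOKEN_RE = re.compile(r"PHI__([A-Z_]+?)__([0-9a-f]{16})")


def make_token(category: str, value: str) -> str:
    """Derive a deterministic token: the same category and value give the same token."""
    digest = hmac.new(SECRET, f"{category}:{value}".encode(), hashlib.sha256).hexdigest()
    return f"PHI__{category}__{digest[:16]}"


def anonymize(text: str, entities: list[tuple[str, str]], store: dict[str, str]) -> str:
    """Replace each (category, value) entity with its token and record the mapping."""
    lookup = {value: make_token(category, value) for category, value in entities}
    if not lookup:
        return text
    store.update({token: value for value, token in lookup.items()})
    # One pass over the text, longest values first, so replacements never overlap.
    values = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in values))
    return pattern.sub(lambda m: lookup[m.group(0)], text)


def deanonymize(text: str, store: dict[str, str]) -> str:
    """Swap every known token back to its original value; unknown tokens stay as-is."""
    return TOKEN_RE.sub(lambda m: store.get(m.group(0), m.group(0)), text)


store: dict[str, str] = {}
note = "Jane Doe (DOB 1980-03-14) was seen on 2024-01-05."
entities = [("NAME", "Jane Doe"), ("DATE", "1980-03-14")]
masked = anonymize(note, entities, store)
restored = deanonymize(masked, store)
```

Because the token is a keyed hash of the value, anyone holding the secret can recompute the token for a known value, which is the kind of value-to-token lookup a database-backed store can offer. A token that carries its own ciphertext would need a different construction.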

Storage backend choice

Backend | When to use | Tradeoffs
AWS KMS | Default for production. No database needed. | Per-call cost; value→token lookup not supported
DynamoDB Token-Based | High-volume, fast lookups in both directions | Requires HMAC secret; one row per PHI entity
DynamoDB Record-Based | Granular per-record retrieval and deletion | Record ID required for all operations
File | Local development and testing | Not recommended for production
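
To make the record-based tradeoff concrete, here is a toy in-memory store that groups mappings under a record ID. The class and method names are hypothetical and the layout is only an assumption about how such a backend might be organized, not the product's DynamoDB schema. It shows why every lookup needs the record ID and why deleting one record's mappings is cheap.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


class RecordStore:
    """Toy stand-in for a record-grouped token store (hypothetical, in memory)."""

    def __init__(self) -> None:
        # record_id -> {token: original value}
        self._records: dict[str, dict[str, str]] = defaultdict(dict)

    def put(self, record_id: str, token: str, value: str) -> None:
        """Store one token mapping under its record."""
        self._records[record_id][token] = value

    def resolve(self, record_id: str, token: str) -> Optional[str]:
        """Look up a token; it can only be found through the record it belongs to."""
        return self._records.get(record_id, {}).get(token)

    def delete_record(self, record_id: str) -> int:
        """Drop every mapping for one record and return how many were removed."""
        return len(self._records.pop(record_id, {}))


demo = RecordStore()
demo.put("visit-1", "PHI__NAME__aaaa", "Jane Doe")
demo.put("visit-1", "PHI__DATE__bbbb", "1980-03-14")
demo.put("visit-2", "PHI__NAME__cccc", "John Roe")
removed = demo.delete_record("visit-1")
```

A token-keyed layout would instead use the token itself as the key, which removes the record ID requirement but makes per-record deletion a matter of tracking every token that belongs to the record.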

See the Marketplace Guide for full configuration details.


Compliance and security posture

  • Data residency: All processing happens inside your AWS account in a private VPC subnet — no data leaves your control.
  • No public IP: EC2 runs in a private subnet, accessible only via VPC peering or AWS SSM Session Manager.
  • Encryption at rest: AWS KMS (RSA 4096) for PHI ciphertext, AWS-managed DynamoDB encryption for token mappings.
  • API authentication: Named API keys backed by DynamoDB with optional expiry and individual revocation.
  • Audit trail: All API calls are logged to CloudWatch under the calling key's label and retained for 30 days.
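
The API key rules above (optional expiry, individual revocation) can be pictured with a small check like the one below. The field names and the `is_active` helper are assumptions chosen for this sketch; the actual DynamoDB item layout and validation logic are not described on this page.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ApiKey:
    """Hypothetical shape of a named API key record."""

    label: str                             # name that would show up in audit logs
    expires_at: Optional[datetime] = None  # None means the key never expires
    revoked: bool = False                  # set to True to revoke this key alone


def is_active(key: ApiKey, now: Optional[datetime] = None) -> bool:
    """Return True if the key is neither revoked nor past its expiry time."""
    now = now or datetime.now(timezone.utc)
    if key.revoked:
        return False
    return key.expires_at is None or now < key.expires_at


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
ci_key = ApiKey(label="ci-pipeline")
old_key = ApiKey(label="contractor", expires_at=now - timedelta(days=1))
revoked_key = ApiKey(label="laptop", revoked=True)
```

Keeping a label on each key is what lets an audit log attribute a call to a specific client without recording the key itself.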

Getting started

  1. Subscribe and deploy: see the Marketplace Guide for subscription, CloudFormation parameters, VPC peering, SSM access, and dashboard access.
  2. Call the API: see the API Reference for endpoint-by-endpoint curl examples.

PHI De-identification by ClerkAI — Generative Technologies, Inc.