# PHI Processing API
REST API for extracting, anonymizing, and de-anonymizing Protected Health Information (PHI) in medical text.
Interactive Swagger UI: http://localhost:8888/docs
## Authentication

All endpoints except `GET /health` require an API key in the `API-Key` request header, for example `API-Key: YOUR_API_KEY`.

The server accepts two key sources, checked in order:

- Master key (`API_KEY` env var): a single shared key set at startup. It cannot be revoked without restarting the server.
- Named keys (DynamoDB-backed): created via the admin API (`POST /admin/keys`). Each has a label and an optional expiry, and can be revoked individually.

Both key types authorize every endpoint equally.
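The two-step check described above can be sketched in Python. This is a minimal illustration, not the server's code: the in-memory `named_keys_by_hash` dict stands in for the DynamoDB `api_keys` table, revocation is modeled as deleting the entry, and hashing the UTF-8 key to a hex SHA-256 digest is an assumption about the stored format (section 9 only says keys are stored as SHA-256 hashes). The `(None, "env_master_key")` result mirrors what `GET /admin/keys/me` reports for the master key.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class NamedKey:
    key_id: str
    label: str
    expires_at: Optional[datetime] = None  # None means the key never expires


def sha256_hex(raw_key: str) -> str:
    # Assumption: named keys are stored as the hex SHA-256 digest of the UTF-8 key.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def authenticate(
    presented: str,
    master_key: str,
    named_keys_by_hash: Dict[str, NamedKey],
    now: Optional[datetime] = None,
) -> Optional[Tuple[Optional[str], str]]:
    """Return (key_id, label) for an accepted key, or None if it is rejected."""
    # 1. Master key (API_KEY env var), compared in constant time.
    if master_key and hmac.compare_digest(presented.encode("utf-8"), master_key.encode("utf-8")):
        return None, "env_master_key"
    # 2. Named keys, looked up by hash; the raw key itself is never stored.
    record = named_keys_by_hash.get(sha256_hex(presented))
    if record is None:  # unknown, or revoked (revocation modeled as deletion)
        return None
    now = now or datetime.now(timezone.utc)
    if record.expires_at is not None and now >= record.expires_at:
        return None  # expired keys are rejected
    return record.key_id, record.label
```

Checking the master key first matches the order stated above; a named key only needs one hash lookup, so the raw value never has to be persisted.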
## Storage Backends

Pass `storage_type` as a query parameter to the anonymize, deanonymize, tokenize, and lookup endpoints. See the Storage Backends guide for a full comparison.

| `storage_type` | Description |
|---|---|
| `DynamoDBTokenBased` | Recommended. Token → value map stored in DynamoDB. Fast lookups in both directions. |
| `DynamoDBGroupedByRecord` | All PHI for a record is stored under one Record ID, which is required for every operation. |
| `KMS` | AWS KMS encryption: the ciphertext is embedded in the token, so no database is needed. Value → token lookup is not supported. |
| `File` | Local JSON file. Development only. |

If `storage_type` is not specified, `/anonymize/phi` and `/deanonymize/phi` default to `KMS`.
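A small client-side helper can catch missing parameters before a request is sent. The sketch below is hypothetical (`build_query` is not part of the API); it encodes the rules from the parameter tables on this page: `secret_key` is required for the three HMAC-based backends, while `kms_key_id` may be omitted for `KMS` because the server falls back to `KMS_KEY_ARN`. The `record_id` required by `DynamoDBGroupedByRecord` travels in the JSON body, so it is not handled here.

```python
from typing import Optional
from urllib.parse import urlencode

HMAC_BACKENDS = {"DynamoDBTokenBased", "DynamoDBGroupedByRecord", "File"}
ALL_BACKENDS = HMAC_BACKENDS | {"KMS"}


def build_query(
    storage_type: str = "KMS",
    aws_region: str = "us-east-1",
    secret_key: Optional[str] = None,
    kms_key_id: Optional[str] = None,
) -> str:
    """Build the query string for an operation endpoint (hypothetical helper)."""
    if storage_type not in ALL_BACKENDS:
        raise ValueError(f"unknown storage_type: {storage_type!r}")
    params = {"storage_type": storage_type, "aws_region": aws_region}
    if storage_type in HMAC_BACKENDS:
        # HMAC-based backends cannot produce or resolve tokens without the secret.
        if not secret_key:
            raise ValueError(f"secret_key is required for {storage_type}")
        params["secret_key"] = secret_key
    elif kms_key_id:
        # Optional for KMS: the server falls back to the KMS_KEY_ARN env var.
        params["kms_key_id"] = kms_key_id
    return urlencode(params)
```

For example, `build_query("DynamoDBTokenBased", secret_key="YOUR_SECRET")` yields the query string used in the token-based curl examples below.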
## Endpoints

### 1. Extract PHI: `POST /extract/phi`
Identify and extract PHI entities from text. Does not anonymize or store anything.
Query parameters:

| Parameter | Default | Description |
|---|---|---|
| `model` | server default | Bedrock model ID for extraction |
| `aws_region` | `AWS_DEFAULT_REGION` env var, else `us-east-1` | AWS region |
Request:

```bash
curl -X POST "http://localhost:8888/extract/phi" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "text": "Patient John Smith, SSN 123-45-6789, DOB 01/15/1980."
  }'
```

Response:

```json
{
  "record_id": "a1b2c3d4-...",
  "extracted_phi": {
    "name": ["John Smith"],
    "ssn": ["123-45-6789"],
    "birthdate": ["01/15/1980"]
  }
}
```
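The same request can be made from Python with only the standard library. This is a minimal sketch, assuming the local server URL used in the examples; `extract_phi_request` and `extract_phi` are hypothetical helper names. Building the request separately from sending it makes the request easy to inspect or test without a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # assumption: local server from the examples


def extract_phi_request(text: str, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /extract/phi request."""
    body = json.dumps({"text": text}).encode("utf-8")
    # urllib stores header names capitalized ("Api-key"); HTTP header names
    # are case-insensitive, so the server still sees the API-Key header.
    return urllib.request.Request(
        f"{base_url}/extract/phi",
        data=body,
        headers={"Content-Type": "application/json", "API-Key": api_key},
        method="POST",
    )


def extract_phi(text: str, api_key: str, base_url: str = BASE_URL) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(extract_phi_request(text, api_key, base_url)) as resp:
        return json.load(resp)
```

With a server running, `extract_phi(note, os.environ["API_KEY"])["extracted_phi"]` would return the category → values mapping shown in the response above.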
### 2. Anonymize PHI: `POST /anonymize/phi`
Extract PHI and replace it with secure tokens in one step.
Query parameters:

| Parameter | Default | Description |
|---|---|---|
| `model` | server default | Bedrock model ID for extraction |
| `aws_region` | `us-east-1` | AWS region |
| `storage_type` | `KMS` | `KMS`, `DynamoDBTokenBased`, `DynamoDBGroupedByRecord`, or `File` |
| `kms_key_id` | `$KMS_KEY_ARN` | KMS key ARN. Falls back to the `KMS_KEY_ARN` env var; one of the two is required for `KMS`. |
| `kms_encryption_algorithm` | `RSAES_OAEP_SHA_256` | KMS encryption algorithm |
| `secret_key` | (none) | HMAC secret. Required for `File`, `DynamoDBTokenBased`, and `DynamoDBGroupedByRecord`. |
| `output_dir` | (none) | Output path for the `File` backend (ignored by other backends) |
Request body:

| Field | Required | Description |
|---|---|---|
| `text` | yes | Medical text to anonymize. Max 500,000 characters. |
| `record_id` | optional (required for `DynamoDBGroupedByRecord`) | Stable identifier. Auto-generated if omitted. |
DynamoDB token-based example (recommended):

```bash
curl -X POST "http://localhost:8888/anonymize/phi?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "text": "Patient John Smith, SSN 123-45-6789."
  }'
```

DynamoDB record-based example (Record ID required):

```bash
curl -X POST "http://localhost:8888/anonymize/phi?storage_type=DynamoDBGroupedByRecord&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "text": "Patient John Smith, SSN 123-45-6789.",
    "record_id": "patient-12345"
  }'
```

KMS example:

```bash
curl -X POST "http://localhost:8888/anonymize/phi?storage_type=KMS&aws_region=us-east-1&kms_key_id=arn:aws:kms:us-east-1:...:key/..." \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "text": "Patient John Smith, SSN 123-45-6789."
  }'
```
Response (record-based example):

```json
{
  "record_id": "patient-12345",
  "anonymized_text": "Patient PHI__NAME__aBcD1234efGh, SSN PHI__SSN__xYzW5678ijKl.",
  "success": true
}
```
### 3. Deanonymize PHI: `POST /deanonymize/phi`

Restores the original PHI values in anonymized text. Use the same `storage_type` and the same credentials (`secret_key` or `kms_key_id`) that were used during anonymization.

Query parameters: same as `/anonymize/phi`.
Request body:

| Field | Required | Description |
|---|---|---|
| `anonymized_text` | yes | Text containing PHI tokens to restore. |
| `record_id` | required for `DynamoDBGroupedByRecord` | Must match the ID used during anonymization. Ignored for `KMS` and `File`. |
KMS example (self-contained; no database lookup needed):

```bash
curl -X POST "http://localhost:8888/deanonymize/phi?storage_type=KMS&aws_region=us-east-1&kms_key_id=arn:aws:kms:us-east-1:...:key/..." \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "anonymized_text": "Patient PHI__NAME__aBcD1234efGh, SSN PHI__SSN__xYzW5678ijKl."
  }'
```

DynamoDB token-based example:

```bash
curl -X POST "http://localhost:8888/deanonymize/phi?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "anonymized_text": "Patient PHI__NAME__aBcD1234efGh, SSN PHI__SSN__xYzW5678ijKl."
  }'
```
DynamoDB record-based example (Record ID required):

```bash
curl -X POST "http://localhost:8888/deanonymize/phi?storage_type=DynamoDBGroupedByRecord&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "anonymized_text": "Patient PHI__NAME__aBcD1234efGh, SSN PHI__SSN__xYzW5678ijKl.",
    "record_id": "patient-12345"
  }'
```

Response:

```json
{
  "record_id": "patient-12345",
  "deanonymized_text": "Patient John Smith, SSN 123-45-6789.",
  "success": true
}
```
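Before calling `/deanonymize/phi`, a client may want to check which tokens a document contains, for example to skip the call when there are none or to resolve individual tokens via `/lookup/token2value`. The pattern below is an assumption inferred from the `PHI__CATEGORY__token` examples on this page, not a published format specification; `find_tokens` is a hypothetical helper.

```python
import re
from typing import List

# Assumption: categories are upper-case (NAME, SSN, PATIENT_ID) and the token
# suffix is alphanumeric, as in the examples on this page. KMS tokens embed
# ciphertext and may use a different alphabet; widen the suffix class if so.
# The lazy category match stops at the first "__", so underscores inside a
# category name (PATIENT_ID) are kept.
TOKEN_RE = re.compile(r"PHI__([A-Z][A-Z0-9_]*?)__([A-Za-z0-9]+)")


def find_tokens(text: str) -> List[str]:
    """Return every PHI token in the text, in order of appearance."""
    return [m.group(0) for m in TOKEN_RE.finditer(text)]
```

Applied to the anonymized text from the examples above, this returns the `PHI__NAME__...` and `PHI__SSN__...` tokens and leaves the trailing period out.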
### 4. Tokenize a PHI Value: `POST /tokenize/tokenize`

Generates a stable token for a single PHI value without embedding it in text. This is useful for pre-populating the value → token mapping before anonymizing documents.

`phi_type` must match one of the active categories. See the PHI Categories reference for the default list and how to customize it, or call `GET /phi-categories` for the set currently active on your server.

Not supported for `KMS`: KMS encryption is non-deterministic and has no lookup backend.

Query parameters: same as `/anonymize/phi`.
Request body:

| Field | Required | Description |
|---|---|---|
| `phi_type` | yes | PHI category; must match an active category |
| `value` | yes | The PHI value to tokenize |
| `record_id` | required for `DynamoDBGroupedByRecord` | Record group to write into |
Example:

```bash
curl -X POST "http://localhost:8888/tokenize/tokenize?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "phi_type": "name",
    "value": "John Smith"
  }'
```

Response:

```json
{
  "phi_type": "name",
  "value": "John Smith",
  "token": "PHI__NAME__vil4lVu59XgG",
  "record_id": null
}
```
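"Stable" here means deterministic: the same `phi_type`, value, and `secret_key` should always yield the same token, which is why this endpoint takes an HMAC `secret_key` and is unavailable for non-deterministic KMS encryption. The sketch below illustrates that property with HMAC-SHA256 and a base62 suffix. It is a hypothetical construction (`sketch_token` and the `"{phi_type}:{value}"` message format are invented for illustration) and will not reproduce the server's actual tokens.

```python
import hashlib
import hmac
import string

BASE62 = string.digits + string.ascii_letters


def sketch_token(phi_type: str, value: str, secret_key: str, length: int = 12) -> str:
    """Hypothetical deterministic token: HMAC-SHA256 of the value, base62-encoded."""
    message = f"{phi_type}:{value}".encode("utf-8")  # invented message format
    digest = hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).digest()
    n = int.from_bytes(digest, "big")
    chars = []
    for _ in range(length):  # take `length` base62 digits from the 256-bit MAC
        n, rem = divmod(n, 62)
        chars.append(BASE62[rem])
    return f"PHI__{phi_type.upper()}__{''.join(chars)}"
```

Because the token depends on the secret, changing `secret_key` changes every token, which is consistent with the requirement that deanonymization use the same credentials as anonymization.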
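### 5. Look Up Token for a Value: `POST /lookup/value2token`

Finds the token previously generated for a given PHI value.

Not supported for `KMS` (returns HTTP 501), because KMS encryption is non-deterministic and there is no stored mapping to search. The reverse lookup, `/lookup/token2value`, does work with `KMS`.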
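A client that lets users choose the backend may want to check compatibility before calling, rather than handling the 501 after the fact. The sketch below encodes only what this page documents (tokenize and value2token are unavailable for `KMS`; everything else works with every backend); `is_supported` is a hypothetical helper, not part of the API.

```python
from typing import Dict, FrozenSet

BACKENDS: FrozenSet[str] = frozenset(
    {"KMS", "DynamoDBTokenBased", "DynamoDBGroupedByRecord", "File"}
)

# Endpoints this page documents as unavailable for a given backend.
UNSUPPORTED: Dict[str, FrozenSet[str]] = {
    "KMS": frozenset({"/tokenize/tokenize", "/lookup/value2token"}),
}


def is_supported(endpoint: str, storage_type: str) -> bool:
    """Return True if `endpoint` is documented as working with `storage_type`."""
    if storage_type not in BACKENDS:
        raise ValueError(f"unknown storage_type: {storage_type!r}")
    return endpoint not in UNSUPPORTED.get(storage_type, frozenset())
```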
Query parameters: Same as /anonymize/phi.
Request body:

| Field | Required | Description |
|---|---|---|
| `value` | yes | The original PHI value to look up |
| `record_id` | required for `DynamoDBGroupedByRecord` | Record group to search within |
Example:

```bash
curl -X POST "http://localhost:8888/lookup/value2token?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "value": "John Smith"
  }'
```
Response:
### 6. Look Up Value for a Token: `POST /lookup/token2value`
Resolve a token back to its original PHI value. Works with all backends including KMS (which decrypts the ciphertext embedded in the token).
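A client may want to know which PHI category a token belongs to (for example, to audit or route lookups) without calling the server. Assuming the `PHI__CATEGORY__suffix` shape used on this page, where CATEGORY is the upper-cased `phi_type` (the tokenize example turns `name` into `NAME`), a hypothetical parser looks like this:

```python
from typing import Tuple


def parse_token(token: str) -> Tuple[str, str]:
    """Split 'PHI__CATEGORY__suffix' into (phi_type, suffix).

    Assumes the category is the upper-cased phi_type and contains no '__';
    anything after the second '__' is kept whole as the suffix.
    """
    prefix, sep, rest = token.partition("__")
    if prefix != "PHI" or not sep:
        raise ValueError(f"not a PHI token: {token!r}")
    category, sep, suffix = rest.partition("__")
    if not sep or not category or not suffix:
        raise ValueError(f"malformed PHI token: {token!r}")
    return category.lower(), suffix
```

Parsing is purely syntactic; only the server can map the suffix back to the original value.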
Query parameters: Same as /anonymize/phi.
Request body:

| Field | Required | Description |
|---|---|---|
| `token` | yes | The `PHI__CATEGORY__token` string to resolve |
| `record_id` | required for `DynamoDBGroupedByRecord` | Record group to search within |
Example:

```bash
curl -X POST "http://localhost:8888/lookup/token2value?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "token": "PHI__NAME__vil4lVu59XgG"
  }'
```
Response:
### 7. Health Check: `GET /health`
No authentication required.
### 8. PHI Category Management

Manage the set of PHI categories the extractor recognizes. Changes take effect immediately for all subsequent extraction requests and are persisted to `src/config/phi.json`. Category names must be lowercase, start with a letter, and contain only `a-z`, `0-9`, and `_`.
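The naming rule above can be checked client-side before calling `POST /phi-categories`, which avoids a round trip for obviously invalid names. This is an illustrative sketch (`is_valid_category_name` is hypothetical); the server performs its own validation, and the status code it returns for an invalid name is not documented here.

```python
import re

# Rule from this section: lowercase, starts with a letter, only a-z, 0-9 and _.
# Character ranges in Python's re are literal code points, so [a-z] is ASCII-only.
CATEGORY_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_category_name(name: str) -> bool:
    """Return True if `name` satisfies the documented category naming rule."""
    return CATEGORY_NAME_RE.fullmatch(name) is not None
```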
#### `GET /phi-categories`: List categories

```json
{
  "categories": [
    {"name": "name", "description": "Patient or person name"},
    {"name": "ssn", "description": "U.S. Social Security number"}
  ]
}
```
#### `POST /phi-categories`: Add a category

```bash
curl -X POST "http://localhost:8888/phi-categories" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{"name": "patient_id", "description": "Internal patient identifier"}'
```

Returns the full updated category list with status 201, or 409 if a category with that name already exists.
#### `DELETE /phi-categories/{name}`: Remove a category

Returns the updated category list, or 404 if the category is not found.
### 9. Admin: API Key Management

All `/admin/*` endpoints require a valid API key (any named key, or the `API_KEY` env var). Authorization is binary: every valid key can create, list, and revoke other keys.

Named keys are stored as SHA-256 hashes in the `api_keys` DynamoDB table (configurable via `API_KEYS_TABLE`). The raw key is shown only once, at creation, and cannot be retrieved later, so capture it immediately and store it securely.
#### `POST /admin/keys`: Create a key

```bash
curl -X POST "http://localhost:8888/admin/keys" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{"label": "data-pipeline"}'
```

The optional `expires_in_days` field (1 to 3650) creates a key that expires automatically. Returns 201 with the raw key:

```json
{
  "key_id": "f3a8...uuid",
  "label": "data-pipeline",
  "api_key": "dei_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "created_at": "2026-04-29T10:00:00",
  "expires_at": null,
  "warning": "This key will not be shown again. Store it now."
}
```
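To create a key that expires, add `expires_in_days` to the request body, for example `{"label": "data-pipeline", "expires_in_days": 90}`. The hypothetical helper below validates the documented 1 to 3650 range before sending and previews the resulting `expires_at`. It assumes the server adds whole days to `created_at` and uses the ISO 8601 format shown in the example response; both are assumptions rather than documented behavior.

```python
from datetime import datetime, timedelta
from typing import Optional


def expires_at_for(created_at: datetime, expires_in_days: Optional[int]) -> Optional[str]:
    """Hypothetical client-side preview of a key's expiry timestamp."""
    if expires_in_days is None:
        return None  # corresponds to "expires_at": null for non-expiring keys
    if not 1 <= expires_in_days <= 3650:
        raise ValueError("expires_in_days must be between 1 and 3650")
    return (created_at + timedelta(days=expires_in_days)).isoformat()
```

For a key created at `2026-04-29T10:00:00` with `expires_in_days` set to 30, this previews `2026-05-29T10:00:00`.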
#### `GET /admin/keys`: List keys

Returns metadata only, never the hash or the raw key.
#### `DELETE /admin/keys/{key_id}`: Revoke a key

Returns 200 on success, or 404 if the `key_id` is unknown.
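#### `GET /admin/keys/me`: Introspect the calling key

Returns metadata for the key used to make the request, including `key_id` and `label`. For named keys, `key_id` is the key's UUID. For the master key (`API_KEY` env var), `key_id` is `null` and `label` is `env_master_key`.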
## Environment Variables

| Variable | Description |
|---|---|
| `API_KEY` | Required. Master API key, checked on every authenticated request before named keys. |
| `KMS_KEY_ARN` | AWS KMS key ARN, used as the default when the `kms_key_id` query parameter is not passed. |
| `DYNAMODB_TOKEN_TABLE` | DynamoDB table for token-based storage (default: `phi_mappings_token_based`). Set automatically by CloudFormation. |
| `DYNAMODB_RECORD_TABLE` | DynamoDB table for record-based storage (default: `phi_mappings_record_based`). Set automatically by CloudFormation. |
| `API_KEYS_TABLE` | DynamoDB table for named API keys (default: `api_keys`). Set automatically by CloudFormation. |
| `ALLOWED_ORIGINS` | Comma-separated CORS origins (default: `http://localhost:3000,http://localhost:8888`). |
| `DEBUG` | Set to `true` to enable hot reload and force a single worker (development only). |
| `AWS_REGION` / `AWS_DEFAULT_REGION` | Default AWS region for Bedrock, KMS, and DynamoDB calls (default: `us-east-1`). |
| `LICENSE_CHECK` | Set to `skip` to bypass the Marketplace product-code check (local development only). |
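For helper scripts that run alongside the server, or a quick check of a deployment's configuration, the table above can be mirrored in a small loader. This is a hypothetical sketch, not the server's configuration code: it applies the defaults listed above, treats `DEBUG` case-insensitively, and checks `AWS_REGION` before `AWS_DEFAULT_REGION`, a precedence this page does not specify. `KMS_KEY_ARN` and `LICENSE_CHECK` are omitted for brevity.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    api_key: str
    token_table: str
    record_table: str
    api_keys_table: str
    allowed_origins: Tuple[str, ...]
    aws_region: str
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables from `env`, applying the documented defaults."""
    api_key = env.get("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY is required")
    origins = env.get("ALLOWED_ORIGINS", "http://localhost:3000,http://localhost:8888")
    return Settings(
        api_key=api_key,
        token_table=env.get("DYNAMODB_TOKEN_TABLE", "phi_mappings_token_based"),
        record_table=env.get("DYNAMODB_RECORD_TABLE", "phi_mappings_record_based"),
        api_keys_table=env.get("API_KEYS_TABLE", "api_keys"),
        # Split the comma-separated list, trimming whitespace and empty entries.
        allowed_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
        # Assumption: AWS_REGION takes precedence over AWS_DEFAULT_REGION.
        aws_region=env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION") or "us-east-1",
        debug=env.get("DEBUG", "").lower() == "true",
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.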
## Input Limits

The `text` field accepts at most 500,000 characters. Requests that exceed this are rejected with HTTP 422.
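Checking the limit client-side turns a 422 round trip into an immediate, descriptive error. A minimal sketch follows; `check_text_length` is a hypothetical helper, and it assumes the server counts Unicode code points (what Python's `len()` returns), which can differ from a byte or UTF-16 count for non-ASCII text.

```python
MAX_TEXT_CHARS = 500_000  # documented limit for the `text` field


def check_text_length(text: str) -> None:
    """Raise before sending if `text` exceeds the documented limit.

    Assumption: the server counts characters the way len() does
    (Unicode code points), not bytes or UTF-16 code units.
    """
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text has {len(text):,} characters; the API limit is {MAX_TEXT_CHARS:,}"
        )
```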
## Testing & Benchmarking

Scripts under `scripts/` for evaluating a running server:

```bash
# Sanity test every endpoint
API_KEY=your-key KMS_KEY_ARN=arn:... DYNAMODB_SECRET_KEY=secret \
  python scripts/test_api.py

# Benchmark throughput across backends, sync + async
python scripts/benchmark_api.py --samples 10
```

All scripts read credentials from environment variables (`API_KEY`, `KMS_KEY_ARN`, `DYNAMODB_SECRET_KEY`, `AWS_DEFAULT_REGION`).
## Local DynamoDB Setup

For local development without an AWS account:

```bash
# 1. Start DynamoDB Local in Docker (add -d to run it in the background,
#    otherwise run the next steps in a separate terminal)
docker run -p 8000:8000 amazon/dynamodb-local

# 2. Set the endpoint env var so boto3 connects to the local instance
export AWS_ENDPOINT_URL_DYNAMODB=http://localhost:8000
export AWS_ACCESS_KEY_ID=local AWS_SECRET_ACCESS_KEY=local

# 3. Start the API
API_KEY=your-key python -m src.api.run
```

Tables are created automatically on first use.