PHI Processing API

REST API for extracting, anonymizing, and de-anonymizing Protected Health Information (PHI) in medical text.

Interactive Swagger UI: http://localhost:8888/docs


Authentication

All endpoints (except GET /health) require an API key in the request header:

-H "API-Key: YOUR_API_KEY"

The server accepts two key sources, checked in order:

  1. Master key (API_KEY env var) — single shared key set at startup. Cannot be revoked without restarting the server.
  2. Named keys (DynamoDB-backed) — created via the admin API (POST /admin/keys). Each has a label, optional expiry, and can be revoked individually.

Both key types authorize every endpoint equally.
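The header above is all a client needs to authenticate. The following is a minimal Python sketch using only the standard library. The helper names `build_request` and `call` and the `BASE_URL` constant are illustrative assumptions; only the `API-Key` header, the JSON body, and the query-parameter convention come from this reference.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8888"  # assumption: the local server used throughout this guide


def build_request(path, api_key, params=None, body=None):
    """Build an authenticated POST request (hypothetical helper)."""
    url = BASE_URL + path
    if params:
        # Options such as storage_type travel in the query string, not the body.
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json", "API-Key": api_key},
    )


def call(path, api_key, params=None, body=None):
    """Send the request and decode the JSON reply; non-2xx raises HTTPError."""
    req = build_request(path, api_key, params, body)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `call("/extract/phi", "YOUR_API_KEY", body={"text": "Patient John Smith."})` mirrors the first curl example below. Either a named key or the master key works in the same header.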


Storage Backends

Pass storage_type as a query parameter to each operation endpoint. See the Storage Backends guide for the full comparison.

storage_type              Description
DynamoDBTokenBased        Recommended. Token -> value map stored in DynamoDB. Fast lookups in both directions.
DynamoDBGroupedByRecord   All PHI for a record stored under one Record ID. Record ID required for every operation.
KMS                       AWS KMS encryption: ciphertext embedded in the token, no database. Value→token lookup not supported.
File                      Local JSON file. Development only.

If unspecified, /anonymize/phi and /deanonymize/phi default to KMS.
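Each backend needs different credentials. The following is a small Python sketch for checking them on the client before a call. `backend_params` and `check_body` are hypothetical helper names. The rules they encode come from this reference: KMS takes kms_key_id, or falls back to the server's KMS_KEY_ARN; the other backends need secret_key; DynamoDBGroupedByRecord needs a record_id in the body.

```python
VALID_BACKENDS = ("DynamoDBTokenBased", "DynamoDBGroupedByRecord", "KMS", "File")


def backend_params(storage_type, aws_region="us-east-1", secret_key=None, kms_key_id=None):
    """Build query parameters for an operation endpoint (hypothetical helper)."""
    if storage_type not in VALID_BACKENDS:
        raise ValueError(f"unknown storage_type: {storage_type!r}")
    params = {"storage_type": storage_type, "aws_region": aws_region}
    if storage_type == "KMS":
        # Optional on the client: the server falls back to its KMS_KEY_ARN env var.
        if kms_key_id:
            params["kms_key_id"] = kms_key_id
    elif secret_key:
        # File and both DynamoDB backends derive tokens with an HMAC secret.
        params["secret_key"] = secret_key
    else:
        raise ValueError(f"{storage_type} requires secret_key")
    return params


def check_body(storage_type, body):
    """DynamoDBGroupedByRecord needs a record_id on every operation."""
    if storage_type == "DynamoDBGroupedByRecord" and not body.get("record_id"):
        raise ValueError("DynamoDBGroupedByRecord requires record_id in the request body")
```

Because deanonymization must reuse the same backend and credentials, it is convenient to build the parameter dict once and pass it to every call.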


Endpoints

1. Extract PHI — POST /extract/phi

Identify and extract PHI entities from text. Does not anonymize or store anything.

Query parameters:

Parameter    Default                                       Description
model        server default                                Bedrock model ID for extraction
aws_region   AWS_DEFAULT_REGION env var, else us-east-1   AWS region

Request:

curl -X POST "http://localhost:8888/extract/phi" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "text": "Patient John Smith, SSN 123-45-6789, DOB 01/15/1980."
  }'

Response:

{
  "record_id": "a1b2c3d4-...",
  "extracted_phi": {
    "name": ["John Smith"],
    "ssn": ["123-45-6789"],
    "birthdate": ["01/15/1980"]
  }
}
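The extracted_phi field maps each category to a list of matched strings. If you want one (category, value) pair per match, for example to feed each match to /tokenize/tokenize, a short Python sketch follows. It assumes the response shape shown above; `flatten_phi` is a hypothetical name.

```python
def flatten_phi(response):
    """Turn {"category": [values, ...]} into a flat list of (category, value) pairs."""
    pairs = []
    for category, values in response.get("extracted_phi", {}).items():
        for value in values:
            pairs.append((category, value))
    return pairs


sample = {
    "record_id": "a1b2c3d4-...",
    "extracted_phi": {
        "name": ["John Smith"],
        "ssn": ["123-45-6789"],
        "birthdate": ["01/15/1980"],
    },
}
print(flatten_phi(sample))
# [('name', 'John Smith'), ('ssn', '123-45-6789'), ('birthdate', '01/15/1980')]
```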


2. Anonymize PHI — POST /anonymize/phi

Extract PHI and replace it with secure tokens in one step.

Query parameters:

Parameter                  Default              Description
model                      server default       Bedrock model ID for extraction
aws_region                 us-east-1            AWS region
storage_type               KMS                  KMS, DynamoDBTokenBased, DynamoDBGroupedByRecord, or File
kms_key_id                 $KMS_KEY_ARN         KMS key ARN (falls back to env var). Required for KMS.
kms_encryption_algorithm   RSAES_OAEP_SHA_256   KMS algorithm
secret_key                 (none)               HMAC secret. Required for File, DynamoDBTokenBased, DynamoDBGroupedByRecord.
output_dir                 (none)               File output path (only used by File backend)

Request body:

Field       Required      Description
text        yes           Medical text to anonymize. Max 500,000 characters.
record_id   conditional   Required for DynamoDBGroupedByRecord; optional otherwise. Stable identifier. Auto-generated if omitted.

DynamoDB Token-Based example (recommended):

curl -X POST "http://localhost:8888/anonymize/phi?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "text": "Patient John Smith, SSN 123-45-6789."
  }'

DynamoDB Record-Based example (Record ID required):

curl -X POST "http://localhost:8888/anonymize/phi?storage_type=DynamoDBGroupedByRecord&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "text": "Patient John Smith, SSN 123-45-6789.",
    "record_id": "patient-12345"
  }'

KMS example:

curl -X POST "http://localhost:8888/anonymize/phi?storage_type=KMS&aws_region=us-east-1&kms_key_id=arn:aws:kms:us-east-1:...:key/..." \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "text": "Patient John Smith, SSN 123-45-6789."
  }'

Response:

{
  "record_id": "patient-12345",
  "anonymized_text": "Patient PHI__NAME__aBcD1234efGh, SSN PHI__SSN__xYzW5678ijKl.",
  "success": true
}


3. Deanonymize PHI — POST /deanonymize/phi

Restore original PHI values from anonymized text. Must use the same storage_type and credentials used during anonymization.

Query parameters: Same as /anonymize/phi.

Request body:

Field             Required      Description
anonymized_text   yes           Text containing PHI tokens to restore.
record_id         conditional   Required for DynamoDBGroupedByRecord; must match the ID used during anonymization. Ignored for KMS and File.

KMS example (self-contained — no DB lookup needed):

curl -X POST "http://localhost:8888/deanonymize/phi?storage_type=KMS&aws_region=us-east-1&kms_key_id=arn:aws:kms:us-east-1:...:key/..." \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "anonymized_text": "Patient PHI__NAME__aBcD1234efGh, SSN PHI__SSN__xYzW5678ijKl."
  }'

DynamoDB Token-Based example:

curl -X POST "http://localhost:8888/deanonymize/phi?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "anonymized_text": "Patient PHI__NAME__aBcD1234efGh, SSN PHI__SSN__xYzW5678ijKl."
  }'

DynamoDB Record-Based example (Record ID required):

curl -X POST "http://localhost:8888/deanonymize/phi?storage_type=DynamoDBGroupedByRecord&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "anonymized_text": "Patient PHI__NAME__aBcD1234efGh, SSN PHI__SSN__xYzW5678ijKl.",
    "record_id": "patient-12345"
  }'

Response:

{
  "record_id": "patient-12345",
  "deanonymized_text": "Patient John Smith, SSN 123-45-6789.",
  "success": true
}
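Deanonymization only works with the same storage_type, the same credentials, and, for DynamoDBGroupedByRecord, the same record_id. A round-trip check is a quick way to confirm a configuration. The following is a Python sketch. `post(path, body)` is a hypothetical callable that sends an authenticated POST with your backend query parameters already applied and returns (status_code, json_body). Using one callable for both steps guarantees the parameters match.

```python
def round_trip_ok(post, text, record_id=None):
    """Anonymize `text`, deanonymize the result, and report whether it matches.

    `post(path, body) -> (status, json)` is a hypothetical transport supplied by
    the caller, configured once with storage_type and credentials.
    """
    body = {"text": text}
    if record_id is not None:
        body["record_id"] = record_id
    status, anon = post("/anonymize/phi", body)
    if not 200 <= status < 300 or not anon.get("success"):
        raise RuntimeError(f"anonymize failed: HTTP {status}")
    # Echo back the record_id the server returned (required for DynamoDBGroupedByRecord).
    status, restored = post(
        "/deanonymize/phi",
        {"anonymized_text": anon["anonymized_text"], "record_id": anon["record_id"]},
    )
    if not 200 <= status < 300 or not restored.get("success"):
        raise RuntimeError(f"deanonymize failed: HTTP {status}")
    return restored["deanonymized_text"] == text
```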


4. Tokenize a PHI Value — POST /tokenize/tokenize

Generate a stable token for a single PHI value without embedding it in text. Useful for pre-populating a mapping before anonymizing documents.

phi_type must match one of the active categories. See the PHI Categories reference for the full default list and how to customize it, or call GET /phi-categories for the current active set on your server.

Not supported for KMS — KMS encryption is non-deterministic and has no lookup backend.

Query parameters: Same as /anonymize/phi.

Request body:

Field       Required      Description
phi_type    yes           PHI category; must match an active category
value       yes           The PHI value to tokenize
record_id   conditional   Required for DynamoDBGroupedByRecord: the record group to write into

Example:

curl -X POST "http://localhost:8888/tokenize/tokenize?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "phi_type": "name",
    "value": "John Smith"
  }'

Response:

{
  "phi_type": "name",
  "value": "John Smith",
  "token": "PHI__NAME__vil4lVu59XgG",
  "record_id": null
}
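Stable tokens are possible because the lookup backends use an HMAC secret: a keyed hash of the same input always gives the same output. Randomized KMS encryption does not. The following is a Python sketch of that idea. It is NOT the server's token algorithm and will not reproduce its tokens. The `phi_type:value` message layout, the base-62 suffix, and the 12-character length are assumptions chosen to resemble the examples.

```python
import hashlib
import hmac

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def sketch_token(phi_type, value, secret_key, length=12):
    """Derive a deterministic PHI__CATEGORY__suffix token (illustrative only).

    Same phi_type, value and secret -> same token; a different secret gives
    an unrelated token. This is a keyed hash, so it cannot be reversed.
    """
    message = f"{phi_type}:{value}".encode("utf-8")
    digest = hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).digest()
    # Take `length` base-62 digits from the digest so the suffix is alphanumeric.
    n = int.from_bytes(digest, "big")
    chars = []
    for _ in range(length):
        n, remainder = divmod(n, 62)
        chars.append(BASE62[remainder])
    return f"PHI__{phi_type.upper()}__{''.join(chars)}"


print(sketch_token("name", "John Smith", "YOUR_SECRET"))
```

Because the hash is one-way, a store still has to keep the token-to-value mapping for de-anonymization. That is why these backends need DynamoDB or a file.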


5. Look Up Token for a Value — POST /lookup/value2token

Find the token previously generated for a given PHI value.

Not supported for KMS — returns 501. Use token2value instead.

Query parameters: Same as /anonymize/phi.

Request body:

Field       Required      Description
value       yes           The original PHI value to look up
record_id   conditional   Required for DynamoDBGroupedByRecord: the record group to search within

Example:

curl -X POST "http://localhost:8888/lookup/value2token?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "value": "John Smith"
  }'

Response:

{
  "value": "John Smith",
  "token": "PHI__NAME__vil4lVu59XgG"
}
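A read-only "has this value been tokenized?" check can wrap this endpoint. The following is a Python sketch. `post(path, body)` is a hypothetical callable returning (status_code, json_body) with backend query parameters applied. This reference does not document the status for an unknown value, so treating 404 as "no token yet" is an ASSUMPTION; the 501 branch follows the documented KMS behavior.

```python
def find_existing_token(post, value, record_id=None):
    """Return the token already mapped to `value`, or None (hypothetical helper).

    ASSUMPTION: the server answers 404 when the value was never tokenized.
    """
    body = {"value": value}
    if record_id is not None:
        body["record_id"] = record_id
    status, data = post("/lookup/value2token", body)
    if status == 200:
        return data["token"]
    if status == 404:
        return None
    if status == 501:
        # Documented: KMS has no value -> token index.
        raise NotImplementedError("value2token is not available on the KMS backend")
    raise RuntimeError(f"value2token failed: HTTP {status}")
```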


6. Look Up Value for a Token — POST /lookup/token2value

Resolve a token back to its original PHI value. Works with all backends including KMS (which decrypts the ciphertext embedded in the token).

Query parameters: Same as /anonymize/phi.

Request body:

Field       Required      Description
token       yes           The PHI__CATEGORY__token string to resolve
record_id   conditional   Required for DynamoDBGroupedByRecord: the record group to search within

Example:

curl -X POST "http://localhost:8888/lookup/token2value?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{
    "token": "PHI__NAME__vil4lVu59XgG"
  }'

Response:

{
  "token": "PHI__NAME__vil4lVu59XgG",
  "value": "John Smith"
}
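/deanonymize/phi restores a whole text in one call. If you need to inspect individual tokens instead, you can scan the text and resolve each distinct token with this endpoint. The following is a Python sketch. The token pattern is inferred from the examples (PHI__ + uppercase category + __ + alphanumeric suffix) and is an assumption; KMS tokens embed ciphertext and may contain other characters. `post(path, body)` is a hypothetical callable returning (status_code, json_body).

```python
import re

# Assumed token shape, inferred from the examples in this reference.
TOKEN_RE = re.compile(r"PHI__[A-Z][A-Z0-9_]*?__[A-Za-z0-9]+")


def resolve_tokens(post, anonymized_text, record_id=None):
    """Map each distinct token found in the text to its original value."""
    resolved = {}
    # dict.fromkeys removes duplicates while keeping first-seen order.
    for token in dict.fromkeys(TOKEN_RE.findall(anonymized_text)):
        body = {"token": token}
        if record_id is not None:
            body["record_id"] = record_id
        status, data = post("/lookup/token2value", body)
        if status != 200:
            raise RuntimeError(f"token2value failed for {token}: HTTP {status}")
        resolved[token] = data["value"]
    return resolved
```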


7. Health Check — GET /health

No authentication required.

curl http://localhost:8888/health
{
  "status": "healthy",
  "version": "1.0.0"
}

8. PHI Category Management

Manage the set of PHI categories the extractor recognizes. Changes take effect immediately on all subsequent extraction requests and persist to src/config/phi.json. Category names must be lowercase, start with a letter, and contain only a-z, 0-9, and _.
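The naming rule can be checked on the client before calling POST /phi-categories; the server still performs its own validation. The following is a minimal Python sketch of the stated rule. `is_valid_category_name` is a hypothetical helper, and any length limit the server may enforce is not covered.

```python
import re

# Stated rule: lowercase, starts with a letter, only a-z, 0-9 and _.
CATEGORY_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_category_name(name):
    """True if `name` satisfies the documented category naming rule."""
    return CATEGORY_NAME_RE.fullmatch(name) is not None
```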

GET /phi-categories — List categories

curl "http://localhost:8888/phi-categories" -H "API-Key: YOUR_API_KEY"
{
  "categories": [
    {"name": "name", "description": "Patient or person name"},
    {"name": "ssn",  "description": "U.S. Social Security number"}
  ]
}

POST /phi-categories — Add a category

curl -X POST "http://localhost:8888/phi-categories" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{"name": "patient_id", "description": "Internal patient identifier"}'

Returns the full category list (status 201). Returns 409 if the name already exists.

DELETE /phi-categories/{name} — Remove a category

curl -X DELETE "http://localhost:8888/phi-categories/patient_id" \
  -H "API-Key: YOUR_API_KEY"

Returns the updated category list. Returns 404 if the category isn't found.


9. Admin — API Key Management

All /admin/* endpoints require a valid API key (any named key, or the API_KEY env var). Authorization is binary — every valid key can manage other keys.

Named keys are stored as SHA-256 hashes in the api_keys DynamoDB table. The raw key is shown once on creation and cannot be retrieved later — capture it immediately and store it securely.
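Storing only a hash is what makes the raw key unrecoverable: SHA-256 is one-way, so the server can verify a presented key but cannot reproduce it. The following is a Python sketch of that scheme. It is not the server's code. The random length and the hex encoding of the digest are assumptions; the `dei_` prefix mirrors the example response.

```python
import hashlib
import hmac
import secrets


def new_api_key(prefix="dei_"):
    """Generate a random key; the 24-byte random part is an assumption."""
    return prefix + secrets.token_urlsafe(24)


def hash_key(raw_key):
    """SHA-256 digest stored in place of the raw key (hex encoding assumed)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def key_matches(raw_key, stored_hash):
    """Verify a presented key with a constant-time comparison."""
    return hmac.compare_digest(hash_key(raw_key), stored_hash)
```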

POST /admin/keys — Create a key

curl -X POST "http://localhost:8888/admin/keys" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{"label": "data-pipeline"}'

Add the optional expires_in_days field (1-3650) to create a key that expires automatically. Returns 201 with the raw key:

{
  "key_id": "f3a8...uuid",
  "label": "data-pipeline",
  "api_key": "dei_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "created_at": "2026-04-29T10:00:00",
  "expires_at": null,
  "warning": "This key will not be shown again. Store it now."
}

GET /admin/keys — List keys

curl "http://localhost:8888/admin/keys" -H "API-Key: YOUR_API_KEY"

Returns metadata only — never the hash or raw key.

DELETE /admin/keys/{key_id} — Revoke a key

curl -X DELETE "http://localhost:8888/admin/keys/f3a8...uuid" -H "API-Key: YOUR_API_KEY"

Returns 200 on success, 404 if the key_id is unknown.

GET /admin/keys/me — Introspect the calling key

curl "http://localhost:8888/admin/keys/me" -H "API-Key: YOUR_API_KEY"

Returns:

{"label": "env_master_key", "key_id": null}

For named keys, key_id is the UUID. For the API_KEY env var, key_id is null and label is env_master_key.


Environment Variables

Variable                          Description
API_KEY                           Required. Master API key, accepted on every authenticated endpoint.
KMS_KEY_ARN                       AWS KMS key ARN (used as the default when the kms_key_id query parameter is not passed).
DYNAMODB_TOKEN_TABLE              DynamoDB table name for token-based storage (default: phi_mappings_token_based). CloudFormation sets this automatically.
DYNAMODB_RECORD_TABLE             DynamoDB table name for record-based storage (default: phi_mappings_record_based). CloudFormation sets this automatically.
API_KEYS_TABLE                    DynamoDB table for named API keys (default: api_keys). CloudFormation sets this automatically.
ALLOWED_ORIGINS                   Comma-separated CORS origins (default: http://localhost:3000,http://localhost:8888).
DEBUG                             Set to true to enable hot-reload and force a single worker (development only).
AWS_REGION / AWS_DEFAULT_REGION   Default AWS region for Bedrock, KMS, and DynamoDB calls (default: us-east-1).
LICENSE_CHECK                     Set to "skip" to bypass the Marketplace product-code check (local dev only).

Input Limits

  • text field: max 500,000 characters (requests exceeding this are rejected with HTTP 422)
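Longer documents have to be split before sending. The following is a Python sketch that cuts at spaces where possible so words stay intact, and falls back to a hard cut otherwise. It assumes the server counts characters the way Python's `len` does (code points). Splitting can separate context the extractor relies on, and each chunk gets its own record_id unless you pass one explicitly. `split_text` is a hypothetical helper.

```python
MAX_TEXT_CHARS = 500_000  # documented limit for the text field


def split_text(text, limit=MAX_TEXT_CHARS):
    """Split text into pieces of at most `limit` characters.

    Joining the pieces gives back the original text exactly.
    """
    chunks = []
    start = 0
    while len(text) - start > limit:
        end = start + limit
        # Prefer the last space inside the window so words are not cut in half.
        cut = text.rfind(" ", start, end)
        if cut <= start:
            cut = end  # no usable space: hard cut at the limit
        chunks.append(text[start:cut])
        start = cut
    chunks.append(text[start:])
    return chunks
```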

Testing & Benchmarking

The scripts/ directory contains helpers for evaluating a running server:

# Sanity test every endpoint
API_KEY=your-key KMS_KEY_ARN=arn:... DYNAMODB_SECRET_KEY=secret \
  python scripts/test_api.py

# Benchmark throughput across backends, sync + async
python scripts/benchmark_api.py --samples 10

All scripts read credentials from environment variables (API_KEY, KMS_KEY_ARN, DYNAMODB_SECRET_KEY, AWS_DEFAULT_REGION).


Local DynamoDB Setup

For local development without an AWS account:

# 1. Start DynamoDB Local in Docker
docker run -p 8000:8000 amazon/dynamodb-local

# 2. Set the endpoint env var so boto3 connects to local
export AWS_ENDPOINT_URL_DYNAMODB=http://localhost:8000
export AWS_ACCESS_KEY_ID=local AWS_SECRET_ACCESS_KEY=local

# 3. Start the API
API_KEY=your-key python -m src.api.run

Tables are created automatically on first use.

