Storage Backends¶

The storage backend controls how PHI token mappings are stored and retrieved. Select a backend per request with the storage_type query parameter, or set it once in the dashboard.

Recommendation¶

Use DynamoDB Token-Based for production

DynamoDB Token-Based is the recommended backend for most production deployments. It's fast in both lookup directions, scales automatically with usage, costs a fraction of a cent per operation, and doesn't require you to manage Record IDs. Pick this unless you have a specific reason not to.

At a glance¶

Backend	Production-ready	Lookup directions
DynamoDB Token-Based	Yes	Both directions
DynamoDB Record-Based	Yes	Both directions
AWS KMS	Yes	Token -> value only
File	Dev only	Both directions

DynamoDB Token-Based — Recommended¶

How it works: Each PHI value is mapped to an HMAC token. The mapping is deterministic: the same value always produces the same token for a given secret. The token is the primary key of a DynamoDB table, and the original value is stored as an attribute on that item. Lookups by token are direct key reads.
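
The determinism described above can be illustrated with Python's standard `hmac` module. This is a minimal sketch, not the service's actual token format: the use of SHA-256, the hex encoding, the `make_token` helper, and the in-memory `table` are all assumptions made for illustration.

```python
import hashlib
import hmac


def make_token(value: str, secret_key: str) -> str:
    """Return a deterministic token for a PHI value (illustrative format only)."""
    digest = hmac.new(secret_key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# In-memory stand-in for the token-keyed table: token -> original value.
table: dict[str, str] = {}

token = make_token("John Smith", "YOUR_SECRET")
table[token] = "John Smith"

# token -> value is a direct key read.
assert table[token] == "John Smith"
# value -> token needs no stored state: recompute the HMAC with the same secret.
assert make_token("John Smith", "YOUR_SECRET") == token
# A different secret yields a different token, so existing tokens stop matching.
assert make_token("John Smith", "OTHER_SECRET") != token
```

The last assertion is the reason the secret must stay the same across all operations.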

Why we recommend it¶

Bidirectional lookups: both token -> value and value -> token work without a Record ID
Fast: token-keyed reads take ~10 ms per lookup
Scales automatically: DynamoDB on-demand mode grows with your traffic, with no capacity planning
Cheap at scale: at a typical 30 PHI entities per document, ~$0.00005 per anonymize request
Persistent and durable: DynamoDB replicates across three Availability Zones automatically
No per-record state: anonymize and deanonymize are stateless from the caller's perspective, so you don't need to remember a Record ID to restore your data
Deterministic tokens: the same PHI value always produces the same token, which simplifies de-duplication and pipeline integration

What you need¶

Requirement	Notes
Secret Key	HMAC secret used for token generation. Must stay the same across all operations or tokens won't match.
AWS Region	Region where the DynamoDB table lives.

Configuration example¶

curl -X POST "http://<host>:8888/anonymize/phi?storage_type=DynamoDBTokenBased&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Patient John Smith, SSN 123-45-6789."}'

In the dashboard: Configuration -> Storage Backend -> DynamoDB — Token.
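
The same request as the curl example could be assembled from Python with the standard library. This is a sketch that only builds the request; the helper name `build_anonymize_request` and the `localhost` host are placeholders, while the path, port, query parameters, and headers are taken from the curl example above.

```python
import json
import urllib.parse
import urllib.request


def build_anonymize_request(base_url: str, api_key: str, secret_key: str,
                            text: str, aws_region: str = "us-east-1") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    # urlencode escapes special characters in the secret so the URL stays valid.
    query = urllib.parse.urlencode({
        "storage_type": "DynamoDBTokenBased",
        "aws_region": aws_region,
        "secret_key": secret_key,
    })
    return urllib.request.Request(
        url=f"{base_url}/anonymize/phi?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# "localhost" is a placeholder for <host>; substitute your own server.
request = build_anonymize_request(
    "http://localhost:8888", "YOUR_API_KEY", "YOUR_SECRET",
    "Patient John Smith, SSN 123-45-6789.",
)
print(request.full_url)
# Sending is left out on purpose; with a reachable server it would be:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```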

DynamoDB Record-Based¶

How it works: Each anonymize call writes a single DynamoDB item keyed by Record ID. All PHI mappings for that document are grouped together under that one key. Lookups and deanonymization require the Record ID.

When to choose it¶

You need granular per-record retrieval or deletion (e.g. a patient invokes the right to be forgotten and you must purge their tokens)
You already track stable Record IDs (case numbers, patient IDs) and want them as the unit of organization
Your compliance workflow requires the ability to enumerate all PHI per record in one operation

Trade-offs¶

Record ID is required for every operation — anonymize, deanonymize, lookup. Lose it and the data is unrecoverable.
Read-modify-write pattern — storing a new PHI value requires reading the existing item, appending, then writing back. This is slower than Token-Based and prone to contention under heavy concurrent writes for the same Record ID.
Per-partition throughput limits — if many requests use the same Record ID at once, DynamoDB throttling can occur.
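
The read-modify-write pattern and its contention risk can be sketched without AWS. The example below is an in-memory illustration under assumed names (`RecordStore`, `put_if_version`, `add_mapping`), not the service's storage code; the version check stands in for a conditional write, which is one common way to detect concurrent updates.

```python
class VersionConflict(Exception):
    """Raised when another writer updated the record between our read and write."""


class RecordStore:
    """In-memory stand-in for a table with one item per Record ID."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[int, dict[str, str]]] = {}

    def get(self, record_id: str) -> tuple[int, dict[str, str]]:
        version, mappings = self._items.get(record_id, (0, {}))
        return version, dict(mappings)  # copy, so callers cannot mutate the store

    def put_if_version(self, record_id: str, expected: int, mappings: dict[str, str]) -> None:
        current, _ = self._items.get(record_id, (0, {}))
        if current != expected:
            raise VersionConflict(record_id)
        self._items[record_id] = (expected + 1, dict(mappings))


def add_mapping(store: RecordStore, record_id: str, token: str, value: str,
                retries: int = 3) -> None:
    """Read the whole item, append one mapping, write it back; retry on conflict."""
    for _ in range(retries):
        version, mappings = store.get(record_id)
        mappings[token] = value
        try:
            store.put_if_version(record_id, version, mappings)
            return
        except VersionConflict:
            continue  # someone else wrote first; re-read and try again
    raise VersionConflict(f"gave up on {record_id} after {retries} attempts")


store = RecordStore()
add_mapping(store, "patient-12345", "tok-1", "John Smith")
add_mapping(store, "patient-12345", "tok-2", "123-45-6789")
print(store.get("patient-12345"))
```

Every write for the same Record ID touches the same item, so concurrent writers either retry or fail, which is the contention the trade-off list describes.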

What you need¶

Requirement	Notes
Secret Key	Same as Token-Based.
Record ID	A stable identifier (e.g. `patient-12345`, `case-2026-001`). You must pass this consistently across anonymize, deanonymize, and lookup.
AWS Region	DynamoDB region.

Configuration example¶

curl -X POST "http://<host>:8888/anonymize/phi?storage_type=DynamoDBGroupedByRecord&aws_region=us-east-1&secret_key=YOUR_SECRET" \
  -H "API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Patient John Smith...", "record_id": "patient-12345"}'

AWS KMS¶

How it works: Each PHI value is encrypted with an AWS KMS RSA-4096 key and the ciphertext is embedded directly in the token — no database is involved. To deanonymize, the token is decrypted on the fly. To anonymize, KMS encrypts each value individually.

When to choose it¶

You explicitly want no separate database for PHI mappings
You need defense-in-depth encryption at rest with hardware-backed keys
Your security review requires KMS-managed encryption keys with rotation, IAM-based access control, and CloudTrail audit on every decrypt

Trade-offs¶

Value -> token lookup is not supported: RSA encryption is non-deterministic, so the same PHI value produces a different ciphertext on every call. You can decrypt tokens but cannot search for "what's the token for John Smith?"
Per-call AWS cost: KMS charges ~$0.0001 per encrypt and decrypt. With ~30 PHI values per document, this adds ~$0.003 per anonymize request and another ~$0.003 per deanonymize. That is ~60× the ~$0.00005 per-request cost of DynamoDB Token-Based.
Slower: each PHI value triggers a network round-trip to KMS, both during anonymize and during deanonymize.
Tokens are larger: RSA ciphertext is much bigger than an HMAC token, so anonymized text is significantly longer than the original.
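
The size difference can be estimated with a few lines of arithmetic. This sketch assumes the ciphertext is base64-encoded inside the token and that HMAC tokens are hex-encoded SHA-256 digests; neither encoding is confirmed by this page. What is fixed is that an RSA-4096 ciphertext is 512 bytes, regardless of how short the PHI value is.

```python
import base64
import hashlib
import hmac

RSA_KEY_BITS = 4096
# RSA ciphertext is as long as the key modulus: 4096 bits = 512 bytes.
ciphertext_bytes = RSA_KEY_BITS // 8

# Assumed encoding: base64 of the raw ciphertext (zero bytes used as a stand-in).
kms_token_chars = len(base64.b64encode(bytes(ciphertext_bytes)))

# Assumed encoding: hex digest of an HMAC-SHA256.
hmac_token_chars = len(hmac.new(b"secret", b"John Smith", hashlib.sha256).hexdigest())

print(kms_token_chars, hmac_token_chars)  # 684 64
```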

What you need¶

Requirement	Notes
KMS Key ARN	An RSA-4096 key. The API server's IAM role needs `kms:Encrypt` and `kms:Decrypt` permissions on it. The CloudFormation template creates this automatically.
AWS Region	Region where the KMS key exists.

Configuration example¶

curl -X POST "http://<host>:8888/anonymize/phi?storage_type=KMS&aws_region=us-east-1&kms_key_id=arn:aws:kms:..." \
  -H "API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Patient John Smith..."}'

File¶

How it works: Token mappings are written to a JSON file on the API server's local disk.

When to choose it¶

Local development only. Quick sanity checks without spinning up AWS resources.

Why not in production¶

Single-instance only — file is on local disk; multiple API servers can't share it
No backup or replication — disk failure = total data loss
No concurrent-write safety — risk of corruption under load
Doesn't scale — every operation reads/writes the entire mappings file
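
The scaling problem can be seen in a small stand-in for a file-backed store. This is a sketch of the general pattern described above, not the server's code; the file name `mappings.json` and the helper `add_mapping_to_file` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def add_mapping_to_file(path: Path, token: str, value: str) -> int:
    """Add one mapping by loading and rewriting the whole file; return bytes written."""
    mappings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    mappings[token] = value
    payload = json.dumps(mappings)
    # Not atomic: a crash or a second writer at this point can corrupt the file.
    path.write_text(payload, encoding="utf-8")
    return len(payload.encode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mappings.json"
    sizes = [add_mapping_to_file(path, f"tok-{i}", f"value-{i}") for i in range(200)]
    # Each call rewrites everything stored so far, so the cost per write keeps growing.
    print(f"first write: {sizes[0]} bytes, last write: {sizes[-1]} bytes")
```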

What you need¶

Requirement	Notes
Secret Key	HMAC secret for token generation.
Output Directory	Server-side path where the JSON file is saved (optional).

Decision tree¶

Are you in production?
├─ No -> File (development only)
└─ Yes
   ├─ Do you need to search by PHI value (e.g. "what's the token for John Smith?")
   │  ├─ Yes -> DynamoDB Token-Based (recommended) or DynamoDB Record-Based
   │  └─ No, only token -> value lookup -> any backend works
   │
   ├─ Do you need to delete all PHI for a specific record at once?
   │  ├─ Yes -> DynamoDB Record-Based
   │  └─ No -> DynamoDB Token-Based (recommended)
   │
   └─ Do you require KMS-managed encryption with no separate database?
      ├─ Yes -> AWS KMS (accept the cost trade-off)
      └─ No -> DynamoDB Token-Based (recommended)
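
For teams that want to encode this choice in tooling, the tree can be written as a small function. This is one possible reading of the tree, not an official selector: the function name, the parameter names, and the order in which the three production questions are checked are assumptions. The conflict check follows from the KMS trade-off that value -> token lookup is not supported.

```python
def choose_backend(production: bool,
                   needs_value_lookup: bool = False,
                   needs_record_deletion: bool = False,
                   requires_kms_no_database: bool = False) -> str:
    """Return a backend name following one reading of the decision tree."""
    if not production:
        return "File"
    if requires_kms_no_database:
        if needs_value_lookup:
            # KMS cannot answer value -> token queries, so the two needs conflict.
            raise ValueError("AWS KMS does not support value -> token lookup")
        return "AWS KMS"
    if needs_record_deletion:
        return "DynamoDB Record-Based"
    # Default, including the case where value -> token lookup is needed.
    return "DynamoDB Token-Based"


print(choose_backend(production=True, needs_value_lookup=True))
print(choose_backend(production=True, needs_record_deletion=True))
```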

Cost¶

For the per-request and per-hour cost breakdown across all backends — including how storage choice compares to Bedrock, EC2, and other AWS costs — see the Cost Analysis.

Short version: DynamoDB Token-Based is ~$0.00005 per request; KMS is ~$0.003 per request. At scale, the cost of the KMS backend can exceed your EC2 cost. Both are still smaller than Bedrock (~$0.009/request), which remains the largest per-request cost regardless of which backend you choose.
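
The per-request figures on this page can be reproduced and scaled with simple arithmetic. The unit costs below are the approximate values quoted on this page; the monthly request volume is a hypothetical number chosen only to make the comparison concrete.

```python
PHI_VALUES_PER_DOCUMENT = 30
KMS_USD_PER_CALL = 0.0001            # ~USD per encrypt or decrypt call
DYNAMODB_USD_PER_REQUEST = 0.00005   # ~USD per anonymize request
BEDROCK_USD_PER_REQUEST = 0.009      # ~USD per request

# One KMS call per PHI value, so the per-request cost scales with entity count.
kms_usd_per_request = PHI_VALUES_PER_DOCUMENT * KMS_USD_PER_CALL
kms_vs_dynamodb = kms_usd_per_request / DYNAMODB_USD_PER_REQUEST


def monthly_usd(usd_per_request: float, requests_per_month: int) -> float:
    """Scale a per-request cost to a monthly total."""
    return usd_per_request * requests_per_month


REQUESTS_PER_MONTH = 1_000_000  # hypothetical volume, for illustration only
for name, unit_cost in [
    ("DynamoDB Token-Based", DYNAMODB_USD_PER_REQUEST),
    ("AWS KMS", kms_usd_per_request),
    ("Bedrock", BEDROCK_USD_PER_REQUEST),
]:
    print(f"{name}: ${monthly_usd(unit_cost, REQUESTS_PER_MONTH):,.2f} per month")
print(f"KMS / DynamoDB per-request ratio: {kms_vs_dynamodb:.0f}x")
```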

Switching backends¶

You can change backends at any time by passing a different storage_type query parameter — there is no migration required at the server level. However:

Data anonymized with one backend cannot be deanonymized using a different backend. The token formats and storage locations are incompatible.
If you want to migrate existing anonymized data, you must deanonymize with the old backend and re-anonymize with the new one.
We recommend choosing a backend at deployment time and sticking with it.
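
The migration rule above (deanonymize with the old backend, then re-anonymize with the new one) can be sketched with two in-memory stand-ins. Everything here is hypothetical scaffolding: `FakeTokenBackend`, its token format, and the explicit `phi_values` list replace the real service, which detects PHI itself and exposes these operations over HTTP.

```python
import hashlib
import hmac
import re


class FakeTokenBackend:
    """In-memory stand-in for one storage backend (not the real service)."""

    def __init__(self, secret_key: str, prefix: str) -> None:
        self._secret = secret_key.encode("utf-8")
        self._prefix = prefix
        self._table: dict[str, str] = {}

    def anonymize(self, text: str, phi_values: list[str]) -> str:
        for value in phi_values:
            digest = hmac.new(self._secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
            token = f"<{self._prefix}:{digest[:16]}>"
            self._table[token] = value
            text = text.replace(value, token)
        return text

    def deanonymize(self, text: str) -> str:
        # Tokens this backend has never stored are left untouched.
        return re.sub(r"<[^<>]+>", lambda m: self._table.get(m.group(0), m.group(0)), text)


def migrate(anonymized_text: str, old: FakeTokenBackend, new: FakeTokenBackend,
            phi_values: list[str]) -> str:
    """Restore the original text with the old backend, then tokenize it with the new one."""
    original = old.deanonymize(anonymized_text)
    return new.anonymize(original, phi_values)


old_backend = FakeTokenBackend("OLD_SECRET", "old")
new_backend = FakeTokenBackend("NEW_SECRET", "new")
phi = ["John Smith"]

anonymized = old_backend.anonymize("Patient John Smith.", phi)
# The new backend cannot resolve tokens it did not issue.
assert new_backend.deanonymize(anonymized) == anonymized

migrated = migrate(anonymized, old_backend, new_backend, phi)
assert new_backend.deanonymize(migrated) == "Patient John Smith."
```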

Summary¶

For nearly all production use cases, DynamoDB Token-Based is the right choice. It's the fastest, cheapest, and most flexible option. Pick another backend only if you have a specific reason: granular per-record deletion (Record-Based), or no-database encryption requirements (KMS).
