Cost Analysis
Per-hour cost breakdown for running the PHI De-identification API at a measured benchmark of 6,000 documents/hour.
Benchmark assumptions
| Parameter | Value |
|---|---|
| Throughput | 6,000 requests/hour (~100 req/min) |
| Document size | 1,000 words each |
| Words processed per hour | 6,000,000 |
| AWS region | us-east-1 |
| Bedrock model | Claude Sonnet 4 |
| EC2 instance | t3.2xlarge (the size that sustained 6k req/hour without throttling) |
Per-hour cost summary
| Cost component | Hourly cost | Notes |
|---|---|---|
| EC2 (t3.2xlarge on-demand, sized for heavy workloads) | $0.333 | $0.3328/hour in us-east-1 |
| NAT Gateway (idle + transfer) | $0.05 | $0.045/hour + minor data transfer |
| Bedrock — input tokens | $25.20 | 6,000 × ~1,400 tokens × $3/M |
| Bedrock — output tokens | $27.00 | 6,000 × ~300 tokens × $15/M |
| DynamoDB (Token-Based, recommended) | $0.30 | 6,000 × ~30 writes × $1.25/M WCU |
| CloudWatch Logs | $0.01 | Minor ingestion |
| Total (DynamoDB Token-Based) | ~$52.89/hour | |
| Total (KMS storage) | ~$70.59/hour | KMS adds ~$18/hour at this rate |
With DynamoDB storage, Bedrock is ~99% of the variable cost, and EC2 size barely moves the needle. Storage choice matters only if you pick KMS, which adds ~$18/hour.
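The summary arithmetic can be reproduced with a short script. This is a minimal sketch using the hourly figures from the table above; the split into `FIXED` and `VARIABLE` and all names are illustrative assumptions, not part of the product.

```python
# Hourly figures (USD) copied from the summary table, DynamoDB Token-Based option.
# Assumption: EC2 and NAT Gateway are treated as fixed, everything else as variable.
FIXED = {"ec2_t3_2xlarge": 0.333, "nat_gateway": 0.05}
VARIABLE = {
    "bedrock_input": 25.20,
    "bedrock_output": 27.00,
    "dynamodb_token_based": 0.30,
    "cloudwatch_logs": 0.01,
}


def hourly_total(fixed: dict, variable: dict) -> float:
    """Sum every fixed and variable component into one hourly figure."""
    return sum(fixed.values()) + sum(variable.values())


def bedrock_share(variable: dict) -> float:
    """Fraction of the variable cost that comes from Bedrock line items."""
    bedrock = sum(cost for name, cost in variable.items() if name.startswith("bedrock"))
    return bedrock / sum(variable.values())


print(f"Total: ${hourly_total(FIXED, VARIABLE):.2f}/hour")
print(f"Bedrock share of variable cost: {bedrock_share(VARIABLE):.1%}")
```

Replacing the storage entry with the ~$18/hour KMS figure lowers the Bedrock share to roughly 74%, which is why the storage backend only matters when KMS is selected.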
Detailed breakdown
1. EC2: $0.333/hour
t3.2xlarge on-demand in us-east-1: $0.3328/hour.
This is a fixed cost — you pay it whether or not the API is processing any requests. It covers:
- 8 vCPUs, 32 GB RAM
- ~$240/month if running 24/7
Cheaper options if you can tolerate lower throughput:
| Instance | Hourly | Notes (estimated) |
|---|---|---|
| t3.small | $0.021 | ~180 req/hour throughput |
| t3.medium | $0.042 | ~400 req/hour throughput |
| t3.large | $0.083 | ~800 req/hour throughput |
| t3.xlarge | $0.166 | ~2,000 req/hour throughput |
| t3.2xlarge | $0.333 | 6,000 req/hour throughput |
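To right-size against the table above, one option is to pick the cheapest instance whose throughput still covers the target rate. The sketch below is a hedged illustration: `INSTANCES` and `cheapest_instance` are hypothetical names, and every throughput figure except the t3.2xlarge benchmark is an estimate from this page.

```python
# (instance type, on-demand USD/hour, req/hour) taken from the table above.
# Throughput values other than t3.2xlarge are estimates, not measurements.
INSTANCES = [
    ("t3.small", 0.021, 180),
    ("t3.medium", 0.042, 400),
    ("t3.large", 0.083, 800),
    ("t3.xlarge", 0.166, 2_000),
    ("t3.2xlarge", 0.333, 6_000),
]


def cheapest_instance(target_req_per_hour: int):
    """Return the cheapest (name, hourly, throughput) row meeting the target, or None."""
    candidates = [row for row in INSTANCES if row[2] >= target_req_per_hour]
    if not candidates:
        return None  # no single instance in the table sustains this rate
    return min(candidates, key=lambda row: row[1])


print(cheapest_instance(500))
print(cheapest_instance(6_000))
```

A `None` result means the target exceeds the largest benchmarked instance, so the table offers no guidance for that rate.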
2. NAT Gateway: ~$0.05/hour
The CloudFormation template provisions a NAT Gateway so the private-subnet EC2 can reach AWS services (Bedrock, DynamoDB, KMS).
- Base: $0.045/hour (~$32/month)
- Data transfer: $0.045/GB processed (typically ~$0.005/hour at this request rate)
NAT Gateway is the only fixed-cost item besides EC2 that you can't avoid in private-subnet mode.
3. Bedrock: $52.20/hour (the elephant)
Pricing for Claude Sonnet 4 in us-east-1:
| Token type | Rate |
|---|---|
| Input tokens | $3.00 per 1M tokens |
| Output tokens | $15.00 per 1M tokens |
For a 1,000-word document:
- Input: ~1,400 tokens (text + prompt overhead) = $0.0042
- Output: ~300 tokens (PHI list) = $0.0045
- Total: ~$0.0087 per request
At 6,000 requests/hour:
- Input: 6,000 × $0.0042 = $25.20/hour
- Output: 6,000 × $0.0045 = $27.00/hour
- Total: $52.20/hour
Bedrock dominates everything
At 99% of variable cost, every other optimization is rounding error compared to Bedrock. To meaningfully reduce cost, you must either:
- Process fewer or smaller documents
- Use a cheaper model (see comparison below)
Bedrock model alternatives
| Model | Input ($/M) | Output ($/M) | Per request | Per hour @ 6k/hr |
|---|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | $0.044 | $261 |
| Claude Sonnet 4 (benchmark) | $3.00 | $15.00 | $0.0087 | $52 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.0029 | $17 |
Switching from Sonnet to Haiku cuts Bedrock cost by about two-thirds (~67%, from ~$52 to ~$17 per hour) but may reduce extraction accuracy. Test it on your specific use case before committing.
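The comparison can be recomputed for any token profile. The following is a minimal sketch that applies the per-million rates from the table to the benchmark's ~1,400 input and ~300 output tokens; the names `MODELS` and `hourly_cost` are illustrative assumptions.

```python
# (input USD per 1M tokens, output USD per 1M tokens) from the table above.
MODELS = {
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
}

# Benchmark profile from this page: ~1,400 input and ~300 output tokens per request.
INPUT_TOKENS, OUTPUT_TOKENS, REQ_PER_HOUR = 1_400, 300, 6_000


def hourly_cost(input_rate: float, output_rate: float) -> float:
    """Bedrock cost in USD/hour for the benchmark token profile."""
    per_request = (INPUT_TOKENS * input_rate + OUTPUT_TOKENS * output_rate) / 1_000_000
    return per_request * REQ_PER_HOUR


baseline = hourly_cost(*MODELS["Claude Sonnet 4"])
for name, rates in MODELS.items():
    cost = hourly_cost(*rates)
    print(f"{name:<18} ${cost:7.2f}/hour ({cost / baseline - 1:+.0%} vs Sonnet)")
```

Changing `INPUT_TOKENS` or `OUTPUT_TOKENS` shows how smaller documents or shorter PHI lists shift the result; the relative gap between models stays the same because their input and output rates scale by the same factor.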
4. DynamoDB: $0.30/hour (Token-Based, recommended)
For a 1,000-word document with ~30 PHI entities, each anonymize request makes ~30 DynamoDB writes.
| Backend | Cost per request | At 6,000 req/hour |
|---|---|---|
| DynamoDB Token-Based ★ | $0.00005 | $0.30/hour |
| DynamoDB Record-Based | $0.00006 | $0.36/hour |
| AWS KMS | $0.003 | $18/hour |
| File | $0 | $0/hour (dev only) |
KMS at this scale costs ~$18/hour — 60× more than DynamoDB. That's $432/day or ~$13,000/month just for KMS at full throughput. Use DynamoDB unless you have an explicit compliance requirement for KMS.
See Storage Backends for the full comparison.
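To see how the per-request storage figures scale, the sketch below multiplies them out to hourly and monthly totals. It assumes the benchmark rate of 6,000 req/hour and a 730-hour month; `COST_PER_REQUEST` and `storage_cost` are hypothetical names.

```python
REQ_PER_HOUR = 6_000
HOURS_PER_MONTH = 730  # assumption: average month, 24/7 operation

# Per-request storage cost in USD, copied from the table above.
COST_PER_REQUEST = {
    "DynamoDB Token-Based": 0.00005,
    "DynamoDB Record-Based": 0.00006,
    "AWS KMS": 0.003,
    "File": 0.0,
}


def storage_cost(backend: str, hours: float = 1.0) -> float:
    """Storage cost in USD for `hours` of operation at the benchmark rate."""
    return COST_PER_REQUEST[backend] * REQ_PER_HOUR * hours


for backend in COST_PER_REQUEST:
    hourly = storage_cost(backend)
    monthly = storage_cost(backend, HOURS_PER_MONTH)
    print(f"{backend:<22} ${hourly:6.2f}/hour ${monthly:>9,.0f}/month")
```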
5. CloudWatch Logs: ~$0.01/hour
API logs at this volume: ~50 MB/hour at ~$0.50/GB ingested ≈ $0.025/hour, carried in the summary as ~$0.01/hour for simplicity. Either figure is well under 0.1% of the total. 30-day retention adds storage charges of ~$0.03/GB-month, which is negligible.
At-rest cost (idle, no traffic)
If the API is deployed but processes zero requests:
Deployed on a t3.2xlarge instance:
| Component | Hourly | Monthly |
|---|---|---|
| EC2 t3.2xlarge | $0.333 | $240 |
| NAT Gateway | $0.045 | $32 |
| EBS storage (30 GB) | $0.003 | $2.40 |
| Idle total | ~$0.38/hour | ~$275/month |
Deployed on a t3.medium instance:
| Component | Hourly | Monthly |
|---|---|---|
| EC2 t3.medium | $0.0416 | $30 |
| NAT Gateway | $0.045 | $32 |
| EBS storage (30 GB) | $0.003 | $2.40 |
| Idle total | ~$0.0896/hour | ~$64.5/month |
This is what a customer pays regardless of usage. It's also what determines the minimum hourly fee you should charge on Marketplace.
Monthly projections
Assuming continuous 24/7 operation at the benchmark rate (4.38M requests/month):
| Cost line | Monthly (DynamoDB) | Monthly (KMS) |
|---|---|---|
| EC2 (t3.2xlarge) + NAT + EBS + Logs | $283 | $283 |
| Bedrock | $38,106 | $38,106 |
| Storage | $219 | $13,140 |
| Total | ~$38,608 | ~$51,529 |
| Per request | $0.0088 | $0.0118 |
If you only process documents during business hours (40h/week ≈ 173h/month), variable costs scale linearly with usage, but fixed costs stay the same.
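The effect of part-time operation can be sketched by holding the fixed cost constant and scaling only the variable part with active hours. This is an illustrative model, not the product's billing logic: it uses the hourly Bedrock and DynamoDB Token-Based figures from the summary and the $283 fixed line, so results may differ from the table by small rounding amounts. All names are assumptions.

```python
from typing import Optional, Tuple

FIXED_MONTHLY = 283.0            # EC2 + NAT + EBS + logs, USD/month (from the table)
VARIABLE_HOURLY = 52.20 + 0.30   # Bedrock + DynamoDB Token-Based, USD/hour at benchmark rate
REQ_PER_HOUR = 6_000


def monthly_cost(active_hours: float) -> Tuple[float, Optional[float]]:
    """Return (total USD, USD per request) for a month with `active_hours` of traffic."""
    total = FIXED_MONTHLY + VARIABLE_HOURLY * active_hours
    requests = REQ_PER_HOUR * active_hours
    # With zero traffic there is no meaningful per-request figure.
    per_request = total / requests if requests > 0 else None
    return total, per_request


for label, hours in [("24/7", 730), ("business hours", 173)]:
    total, per_request = monthly_cost(hours)
    print(f"{label:<15} ${total:>10,.2f}/month ${per_request:.4f}/request")
```

Under these assumptions the per-request cost rises only slightly when running business hours only, because the fixed share stays small at the benchmark request rate.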
Cost-per-document at different volumes
How total cost (fixed + variable) scales with usage:
| Requests/month | Variable cost | Fixed cost | Total | Per request |
|---|---|---|---|---|
| 100,000 | $875 | $283 | $1,158 | $0.0116 |
| 1,000,000 | $8,750 | $283 | $9,033 | $0.0090 |
| 4,380,000 (full capacity) | $38,325 | $283 | $38,608 | $0.0088 |
Fixed costs weigh most at low volumes: at 100k requests/month, ~24% of your cost pays for idle infrastructure. At full utilization, fixed costs are under 1% of the total.
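The relationship between volume and fixed-cost share can also be inverted to find the volume at which fixed costs drop to a chosen share. The sketch below assumes a variable cost of ~$0.00875 per request (Bedrock plus DynamoDB Token-Based, from the per-request figures on this page) and $283/month fixed; function names are illustrative.

```python
FIXED_MONTHLY = 283.0                     # USD/month, from the table above
VARIABLE_PER_REQUEST = 0.0087 + 0.00005   # Bedrock + DynamoDB Token-Based, USD/request


def blended_cost_per_request(requests: float) -> float:
    """Total monthly cost divided by request count."""
    return (FIXED_MONTHLY + requests * VARIABLE_PER_REQUEST) / requests


def fixed_share(requests: float) -> float:
    """Fraction of the monthly bill that is fixed infrastructure."""
    return FIXED_MONTHLY / (FIXED_MONTHLY + requests * VARIABLE_PER_REQUEST)


def requests_for_fixed_share(target_share: float) -> float:
    """Monthly volume at which fixed costs equal `target_share` of the total.

    Solves FIXED / (FIXED + n * v) = s for n.
    """
    return FIXED_MONTHLY * (1 - target_share) / (target_share * VARIABLE_PER_REQUEST)


for volume in (100_000, 1_000_000, 4_380_000):
    print(f"{volume:>9,} req/month: ${blended_cost_per_request(volume):.4f}/request, "
          f"fixed share {fixed_share(volume):.1%}")
print(f"Fixed share reaches 5% at ~{requests_for_fixed_share(0.05):,.0f} req/month")
```

Under these assumptions the fixed share falls to 5% at roughly 615,000 requests/month, so per-request pricing is close to flat above that volume.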
Optimization checklist
If you need to reduce per-request cost, in order of impact:
- Switch from Sonnet to Haiku: saves ~67% of Bedrock cost. Test for accuracy regression first.
- Use DynamoDB Token-Based instead of KMS: saves ~25% of total variable cost (~$18 of ~$70 per hour).
- Right-size the EC2 instance: if you don't need 6k req/hour, a smaller instance saves on fixed costs.
See the Marketplace Guide for deployment details.