PHI De-identification - Marketplace Deployment & Access Guide
This guide walks you through deploying the PHI De-identification API from AWS Marketplace, verifying the installation, connecting to the instance, and optionally enabling HTTPS.
Table of Contents
- Overview
- Prerequisites
- Deploying the Stack
- Verifying the Deployment
- Connecting via VPC Peering
- Connecting via SSM Session Manager
- Dashboard Access via SSM Port Forwarding
- Dashboard and API Access via AWS Client VPN
- HTTPS Setup
- Managing API Keys
1. Overview
The PHI De-identification API is a REST API that extracts, anonymizes, and de-anonymizes Protected Health Information (PHI) in medical text using Amazon Bedrock (Claude). Deploying this product provisions:
| Resource | Purpose |
|---|---|
| EC2 instance | Runs the API server |
| VPC + subnets | Isolated network for the instance |
| DynamoDB tables | PHI token mappings and named API keys |
| AWS KMS key | PHI encryption/decryption (RSA 4096) |
| IAM role | Access to Bedrock, DynamoDB, KMS, SSM |
| CloudWatch log group | API server logs, 30-day retention |
| NAT Gateway | Outbound access to AWS services (private mode only) |
The API is available on port 8888. The web dashboard is served at /dashboard on the same port and the Swagger UI at /docs.
2. Prerequisites
Before deploying, ensure the following are in place:
AWS account
- IAM permissions to create CloudFormation stacks, EC2 instances, VPCs, DynamoDB tables, KMS keys, and IAM roles.
- The deployment region must support Amazon Bedrock with Claude models. Verify this on the Bedrock Model access page and enable access to the Claude model you intend to use (e.g. us.anthropic.claude-sonnet-4-6).
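To check the region from the command line, the sketch below looks up the Claude inference profile with the AWS CLI. It assumes AWS CLI v2 with credentials configured; bedrock_profile_available is a hypothetical helper name. Finding the profile only shows that it exists in the region; model access itself is still granted in the Bedrock console.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the given inference profile ID is listed in the region.
# Assumes AWS CLI v2 with credentials configured.
bedrock_profile_available() {
  local region="$1" profile_id="$2" found
  found="$(aws bedrock list-inference-profiles \
    --region "$region" \
    --query "inferenceProfileSummaries[?inferenceProfileId=='${profile_id}'].inferenceProfileId" \
    --output text)" || return 1
  [ "$found" = "$profile_id" ]
}

# Usage:
#   bedrock_profile_available us-east-1 us.anthropic.claude-sonnet-4-6 && echo "profile found"
```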
EC2 key pair
Create a key pair in the target region before deploying:
AWS Console -> EC2 -> Key Pairs -> Create key pair
Alternatively, use an existing key pair in that region.
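The same key pair can be created from the CLI. This is a minimal sketch assuming AWS CLI v2; create_key_pair is a hypothetical helper that saves the private key as <name>.pem in the current directory and refuses to overwrite an existing file.

```shell
#!/usr/bin/env bash
# Sketch: create an EC2 key pair and save the private key as <name>.pem
# with owner-only permissions.
create_key_pair() {
  local name="$1" region="$2" file="$1.pem"
  if [ -e "$file" ]; then
    echo "refusing to overwrite existing $file" >&2
    return 1
  fi
  # umask 077 inside the subshell keeps the key private from the moment it is written.
  ( umask 077
    aws ec2 create-key-pair \
      --key-name "$name" \
      --region "$region" \
      --query 'KeyMaterial' \
      --output text > "$file" ) || { rm -f "$file"; return 1; }
  chmod 400 "$file"
}

# Usage:
#   create_key_pair deidentify-key us-east-1
```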
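AWS CLI + SSM plugin (for SSM-based access - recommended over SSH)
Install the AWS CLI v2 and the Session Manager plugin for the AWS CLI on the machine you will connect from.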
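A quick preflight check before using the SSM-based sections could look like this. ssm_tools_ready is a hypothetical helper; it only checks that both programs are on PATH, not that credentials are configured.

```shell
#!/usr/bin/env bash
# Sketch: report any missing tool and return non-zero if one is absent.
ssm_tools_ready() {
  local tool missing=0
  for tool in aws session-manager-plugin; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   ssm_tools_ready && echo "ready for SSM access"
```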
3. Deploying the Stack
Find the listing
Direct link: https://aws.amazon.com/marketplace/pp/prodview-zrou3ehu2ffdq
Or search manually:
- Open console.aws.amazon.com/marketplace/search
- Search for "PHI De-identification by ClerkAI"
- Click on the listing
Subscribe and launch
- Click Continue to Subscribe → review terms → Accept Terms
- Once your subscription is active, click Continue to Configuration
- Select your Region and the Fulfillment Option (CloudFormation), then click Continue to Launch
- Choose Launch CloudFormation and click Launch
- Fill in the parameters below and deploy
Parameters
| Parameter | Description |
|---|---|
| APIKey | The master API key used to authenticate all requests. Minimum 16 characters. Store it securely - it is written to SSM Parameter Store and injected at startup. |
| AmiID | The pre-built AMI ID provided in the Marketplace listing. (Don't change anything here.) |
| InstanceType | EC2 instance type. t3.small is the minimum recommended for production. |
| KeyName | Name of an existing EC2 key pair for SSH access. |
| SSHAccessCIDR | CIDR block allowed to SSH to the instance. Use a single address (x.x.x.x/32) or your peered VPC CIDR; 0.0.0.0/0 allows SSH from any address. SSM Session Manager is recommended over SSH. |
| APIAccessCIDR | CIDR block allowed to reach port 8888. Use your peered VPC CIDR (e.g. 172.31.0.0/16) or 0.0.0.0/0. |
| DynamoBillingMode | PAY_PER_REQUEST for variable workloads (default). PROVISIONED for high, steady throughput. |
| TableNameSuffix | Optional suffix appended to DynamoDB table names (e.g. -prod). Useful when running multiple stacks in the same account. |
| EnableVpcPeering | true to peer this VPC with an existing VPC so your application can reach the API. Requires PeerVpcId and PeerVpcCidr. |
| PeerVpcId | VPC ID of the VPC to peer with (e.g. vpc-0abc1234). |
| PeerVpcCidr | CIDR block of the peer VPC (e.g. 172.31.0.0/16). |
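The APIKey parameter needs at least 16 characters. One way to generate a strong value is sketched below, assuming a Unix-like shell with /dev/urandom; gen_api_key is a hypothetical helper that prints 64 hex characters.

```shell
#!/usr/bin/env bash
# Sketch: print 32 random bytes as 64 lowercase hex characters.
# od -v disables duplicate-line suppression; tr strips spaces and newlines.
gen_api_key() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Usage:
#   API_KEY="$(gen_api_key)"; echo "$API_KEY"
```

Store the generated value in your own secrets store as well, since you will need it to call the API.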
Stack outputs
After the stack reaches CREATE_COMPLETE, check the Outputs tab:
| Output | Description |
|---|---|
| APIEndpoint | The API base URL: http://<private-ip>:8888. Accessible via VPC peering or SSM port forwarding. |
| APIServerEC2InstanceIP | Private IP address of the EC2 instance. |
| KMSKeyArn | ARN of the KMS key - use this when configuring KMS storage. |
| ApiKeysTableName | DynamoDB table storing named API keys. |
| SSMAPIKeyPath | SSM Parameter Store path where the master API key is stored. |
| VPCPeeringConnectionId | Peering connection ID (only when VPC peering is enabled). |
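Stack outputs can also be read with the CLI. The sketch assumes AWS CLI v2 and the stack name you chose at launch; stack_output and instance_id_for_ip are hypothetical helpers. The second one is useful because the SSM commands in this guide take an instance ID, which the outputs above do not include.

```shell
#!/usr/bin/env bash
# Sketch: print one output value of a CloudFormation stack.
stack_output() {
  local stack="$1" key="$2" region="$3"
  aws cloudformation describe-stacks \
    --stack-name "$stack" \
    --region "$region" \
    --query "Stacks[0].Outputs[?OutputKey=='${key}'].OutputValue" \
    --output text
}

# Sketch: look up the ID of the running instance that owns a private IP.
instance_id_for_ip() {
  local ip="$1" region="$2"
  aws ec2 describe-instances \
    --region "$region" \
    --filters "Name=private-ip-address,Values=${ip}" "Name=instance-state-name,Values=running" \
    --query 'Reservations[0].Instances[0].InstanceId' \
    --output text
}

# Usage ("my-deid-stack" is an example stack name):
#   ip="$(stack_output my-deid-stack APIServerEC2InstanceIP us-east-1)"
#   instance_id_for_ip "$ip" us-east-1
```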
4. Verifying the Deployment
Once the stack is CREATE_COMPLETE, verify the API is running via SSM (see Section 6):
aws ssm start-session --target <instance-id> --region <region>
# then inside the session:
curl http://localhost:8888/health
A successful response from /health confirms that the API server is running.
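The container can take a short while to come up after CREATE_COMPLETE. A minimal retry loop for the health check, run inside the SSM session, might look like this; wait_for_health and its defaults (30 attempts, 5 seconds apart) are assumptions, not part of the product.

```shell
#!/usr/bin/env bash
# Sketch: poll the health endpoint until it answers successfully or the
# attempt budget runs out. curl -f makes HTTP errors (>= 400) count as failures.
wait_for_health() {
  local url="${1:-http://localhost:8888/health}"
  local attempts="${2:-30}" delay="${3:-5}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS --max-time 5 "$url" >/dev/null; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Skip the pause after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "no healthy response from $url after $attempts attempt(s)" >&2
  return 1
}

# Usage (inside the SSM session):
#   wait_for_health
```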
5. Connecting via VPC Peering
Use this for: your application (running in another VPC) making API calls to the de-identification service in its own VPC.
VPC peering connects two VPCs at the network level. Traffic between them stays on AWS's internal network and never touches the internet.
Step 1 - Deploy with peering enabled
Set these parameters when deploying the stack:
EnableVpcPeering: true
PeerVpcId: vpc-0abc1234 <- your application's VPC ID
PeerVpcCidr: 172.31.0.0/16 <- your application's VPC CIDR
APIAccessCIDR: 172.31.0.0/16 <- same as PeerVpcCidr
The stack automatically creates the peering connection and adds a route on the de-identification side.
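To confirm the peering connection from the CLI, query its status using the VPCPeeringConnectionId output; it should report active. peering_status is a hypothetical helper and assumes AWS CLI v2.

```shell
#!/usr/bin/env bash
# Sketch: print the status code of a VPC peering connection (e.g. "active").
peering_status() {
  local pcx_id="$1" region="$2"
  aws ec2 describe-vpc-peering-connections \
    --region "$region" \
    --vpc-peering-connection-ids "$pcx_id" \
    --query 'VpcPeeringConnections[0].Status.Code' \
    --output text
}

# Usage:
#   peering_status pcx-0123456789abcdef0 us-east-1
```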
Step 2 - Add the return route (your side)
In your application's VPC, add a route pointing back to the de-identification VPC:
- AWS Console -> VPC -> Route Tables
- Select the route table associated with your application's subnet
- Edit routes -> Add route:
  - Destination: 10.0.0.0/16 (the de-identification VPC CIDR, or whatever you set in VPCCidr)
  - Target: the peering connection ID from the stack output VPCPeeringConnectionId
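The return route can also be added with the CLI. A sketch assuming AWS CLI v2; add_return_route is a hypothetical helper, and its default destination 10.0.0.0/16 is the de-identification VPC CIDR mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: route traffic for the de-identification VPC through the peering connection.
add_return_route() {
  local route_table_id="$1" pcx_id="$2" region="$3" deid_cidr="${4:-10.0.0.0/16}"
  aws ec2 create-route \
    --region "$region" \
    --route-table-id "$route_table_id" \
    --destination-cidr-block "$deid_cidr" \
    --vpc-peering-connection-id "$pcx_id"
}

# Usage:
#   add_return_route rtb-0abc1234 pcx-0123456789abcdef0 us-east-1
```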
Step 3 - Call the API
From any resource inside your VPC:
# Health check
curl http://10.0.2.x:8888/health
# use the private IP from stack output: APIServerEC2InstanceIP
# Anonymize text
curl -X POST "http://10.0.2.x:8888/anonymize/phi?storage_type=KMS" \
-H "Content-Type: application/json" \
-H "API-Key: YOUR_API_KEY" \
-d '{"text": "Patient John Smith, DOB 01/15/1980."}'
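Hand-building the JSON body breaks as soon as the text contains a double quote or a newline. The sketch below escapes the text first and quotes the URL so the shell does not treat ? as a glob character. json_escape and anonymize_text are hypothetical helpers; json_escape covers backslashes, double quotes, newlines, carriage returns, and tabs, not every control character.

```shell
#!/usr/bin/env bash
# Sketch: escape a string for use inside a JSON string literal.
# Backslashes are doubled first so later escapes are not doubled again.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# Sketch: POST text to /anonymize/phi with KMS storage.
anonymize_text() {
  local base_url="$1" api_key="$2" text="$3"
  curl -sS -X POST "${base_url}/anonymize/phi?storage_type=KMS" \
    -H "Content-Type: application/json" \
    -H "API-Key: ${api_key}" \
    -d "{\"text\": \"$(json_escape "$text")\"}"
}

# Usage:
#   anonymize_text http://10.0.2.x:8888 "$API_KEY" 'Patient John Smith, DOB 01/15/1980.'
```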
6. Connecting via SSM Session Manager
Use this for: admin access to the instance - checking logs, managing API keys, running diagnostics. No open port 22, no key pair needed at the terminal.
Prerequisites
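The operator running these commands needs at least the following IAM permissions. SSM-SessionManagerRunShell is the default document for a plain shell session, and the second statement lets operators end or resume their own sessions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ssm:StartSession"],
      "Resource": [
        "arn:aws:ec2:<region>:<account-id>:instance/<instance-id>",
        "arn:aws:ssm:*:*:document/SSM-SessionManagerRunShell",
        "arn:aws:ssm:*:*:document/AWS-StartInteractiveCommand",
        "arn:aws:ssm:*:*:document/AWS-StartPortForwardingSession"
      ]
    },
    {"Effect": "Allow", "Action": ["ssm:TerminateSession", "ssm:ResumeSession"],
     "Resource": "arn:aws:ssm:*:*:session/${aws:userid}-*"}
  ]
}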
The instance already has AmazonSSMManagedInstanceCore via the instance role - no changes needed there.
Open a shell session
Run aws ssm start-session --target <instance-id> --region <region>. You will get an interactive shell on the instance. Useful commands:
# Check API server status
docker ps
# Tail live logs
docker logs -f deidentify-api
# Check environment (API key, KMS ARN, table names)
cat /etc/app.env
# Test the API locally
curl http://localhost:8888/health
# Create a named API key
curl -X POST http://localhost:8888/admin/keys \
-H "Content-Type: application/json" \
-H "API-Key: $(grep '^API_KEY=' /etc/app.env | cut -d= -f2-)" \
-d '{"label": "my-integration"}'
# Restart the container
/home/ubuntu/start-deidentify.sh
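For a one-shot check of the most common failure points, the commands above can be combined into a small function. diagnose_deidentify is a hypothetical helper; it assumes the container name deidentify-api used in this guide, and docker may need sudo depending on the session user.

```shell
#!/usr/bin/env bash
# Sketch: report container state and local health; print recent logs on failure.
diagnose_deidentify() {
  local name="${1:-deidentify-api}" state
  # docker inspect fails if the container does not exist.
  state="$(docker inspect -f '{{.State.Status}}' "$name" 2>/dev/null)" || state="not found"
  echo "container $name: $state"
  if curl -fsS --max-time 5 http://localhost:8888/health >/dev/null; then
    echo "health: ok"
    return 0
  fi
  echo "health: FAILED"
  docker logs --tail 20 "$name" 2>&1
  return 1
}

# Usage:
#   diagnose_deidentify
```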
7. Dashboard Access via SSM Port Forwarding
Use this for: accessing the web dashboard from your laptop without a public IP or VPN. Good for one-off admin sessions.
Step 1 - Start the tunnel
Open a terminal and run:
aws ssm start-session \
--target i-0abc1234567890 \
--region us-east-1 \
--document-name AWS-StartPortForwardingSession \
--parameters '{"portNumber":["8888"],"localPortNumber":["8888"]}'
The plugin reports that local port 8888 is open and waiting for connections.
Keep this terminal open. Closing it ends the tunnel.
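If port 8888 is already in use on your laptop, forward a different local port. start_dashboard_tunnel is a hypothetical wrapper around the same command; with a local port of 9999 you would open http://localhost:9999/dashboard instead.

```shell
#!/usr/bin/env bash
# Sketch: forward remote port 8888 to a chosen local port (default 8888).
start_dashboard_tunnel() {
  local instance_id="$1" region="$2" local_port="${3:-8888}"
  aws ssm start-session \
    --target "$instance_id" \
    --region "$region" \
    --document-name AWS-StartPortForwardingSession \
    --parameters "{\"portNumber\":[\"8888\"],\"localPortNumber\":[\"${local_port}\"]}"
}

# Usage:
#   start_dashboard_tunnel i-0abc1234567890 us-east-1 9999
```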
Step 2 - Open the dashboard
In your browser, open http://localhost:8888/dashboard
The browser connects to localhost:8888 and SSM tunnels the traffic through to the instance. Enter your API key when prompted.
Step 3 - End the session
Close the terminal running the SSM command, or press Ctrl+C.
8. Dashboard and API Access via AWS Client VPN
Use this for: persistent access for a team - developers and admins can browse the dashboard or call the API as if their laptop were inside the VPC. Better than running the SSM command every session.
Step 1 - Generate certificates
Use easy-rsa to create a server certificate and one client certificate per user (or one shared client cert for simplicity):
git clone https://github.com/OpenVPN/easy-rsa.git
cd easy-rsa/easyrsa3
./easyrsa init-pki
./easyrsa build-ca nopass
# Server cert
./easyrsa build-server-full server nopass
# Client cert (repeat for each user if needed)
./easyrsa build-client-full client1 nopass
Step 2 - Upload certificates to ACM
# Server cert
aws acm import-certificate \
--certificate fileb://pki/issued/server.crt \
--private-key fileb://pki/private/server.key \
--certificate-chain fileb://pki/ca.crt \
--region us-east-1
# Client cert
aws acm import-certificate \
--certificate fileb://pki/issued/client1.crt \
--private-key fileb://pki/private/client1.key \
--certificate-chain fileb://pki/ca.crt \
--region us-east-1
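Note the ARNs returned - you will need them in the next step. Because both certificates here are issued by the same CA, you can also skip the client import and use the server certificate ARN as the client certificate ARN.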
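To capture the ARNs directly instead of copying them from the JSON output, add --query CertificateArn. import_cert is a hypothetical helper that assumes the easy-rsa pki/ layout from Step 1 and AWS CLI v2.

```shell
#!/usr/bin/env bash
# Sketch: import pki/issued/<name>.crt with its key and CA chain; print the ARN.
import_cert() {
  local name="$1" region="$2" pki="${3:-pki}"
  aws acm import-certificate \
    --certificate "fileb://${pki}/issued/${name}.crt" \
    --private-key "fileb://${pki}/private/${name}.key" \
    --certificate-chain "fileb://${pki}/ca.crt" \
    --region "$region" \
    --query CertificateArn \
    --output text
}

# Usage (run from easy-rsa/easyrsa3):
#   server_arn="$(import_cert server us-east-1)"
#   client_arn="$(import_cert client1 us-east-1)"
```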
Step 3 - Create a Client VPN endpoint
In AWS Console -> VPC -> Client VPN Endpoints -> Create Client VPN endpoint:
| Field | Value |
|---|---|
| Client IPv4 CIDR | 192.168.100.0/22 (must not overlap your VPCs) |
| Server certificate ARN | ARN from Step 2 |
| Authentication type | Mutual authentication |
| Client certificate ARN | ARN from Step 2 |
| Enable split-tunnel | Yes (only VPC traffic goes through the VPN) |
| VPC | The de-identification VPC (DeidentifyVPC) |
| Security group | The de-identification security group |
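The same endpoint can be created with the CLI. A sketch assuming AWS CLI v2 and the ARNs from Step 2; create_vpn_endpoint is a hypothetical helper, and connection logging is disabled here only to keep the example short.

```shell
#!/usr/bin/env bash
# Sketch: create a split-tunnel Client VPN endpoint with mutual TLS
# authentication; prints the new endpoint ID.
create_vpn_endpoint() {
  local server_arn="$1" client_arn="$2" vpc_id="$3" sg_id="$4" region="$5"
  aws ec2 create-client-vpn-endpoint \
    --region "$region" \
    --client-cidr-block 192.168.100.0/22 \
    --server-certificate-arn "$server_arn" \
    --authentication-options "Type=certificate-authentication,MutualAuthentication={ClientRootCertificateChainArn=${client_arn}}" \
    --connection-log-options Enabled=false \
    --split-tunnel \
    --vpc-id "$vpc_id" \
    --security-group-ids "$sg_id" \
    --query ClientVpnEndpointId \
    --output text
}
```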
Step 4 - Associate with the subnet and add authorization
- Associations tab -> Associate with the private subnet (PrivateSubnet)
- Authorization rules tab -> Add authorization rule:
  - Destination network: 10.0.0.0/16 (de-identification VPC CIDR)
  - Allow access to all users
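Both steps can also be done from the CLI. enable_vpn_access is a hypothetical helper assuming AWS CLI v2; pass the endpoint ID from Step 3 and the private subnet ID.

```shell
#!/usr/bin/env bash
# Sketch: associate the endpoint with a subnet, then authorize all users
# to reach the de-identification VPC CIDR.
enable_vpn_access() {
  local endpoint_id="$1" subnet_id="$2" region="$3" deid_cidr="${4:-10.0.0.0/16}"
  aws ec2 associate-client-vpn-target-network \
    --region "$region" \
    --client-vpn-endpoint-id "$endpoint_id" \
    --subnet-id "$subnet_id" || return 1
  aws ec2 authorize-client-vpn-ingress \
    --region "$region" \
    --client-vpn-endpoint-id "$endpoint_id" \
    --target-network-cidr "$deid_cidr" \
    --authorize-all-groups
}
```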
Step 5 - Download the VPN configuration
Client VPN Endpoints -> select your endpoint -> Download client configuration
Open the downloaded .ovpn file in a text editor and append the client certificate and key:
<cert>
-----BEGIN CERTIFICATE-----
(paste contents of pki/issued/client1.crt)
-----END CERTIFICATE-----
</cert>
<key>
-----BEGIN PRIVATE KEY-----
(paste contents of pki/private/client1.key)
-----END PRIVATE KEY-----
</key>
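Editing the file by hand is error-prone, partly because the .crt produced by easy-rsa may begin with a human-readable certificate dump. The sketch below copies only the PEM blocks; append_client_creds is a hypothetical helper, and the paths in the usage line follow the easy-rsa layout from Step 1.

```shell
#!/usr/bin/env bash
# Sketch: append <cert> and <key> sections to an .ovpn file, copying only
# the lines from each BEGIN marker through its END marker.
append_client_creds() {
  local ovpn="$1" crt="$2" key="$3"
  {
    echo "<cert>"
    sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' "$crt"
    echo "</cert>"
    echo "<key>"
    sed -n '/-----BEGIN .*PRIVATE KEY-----/,/-----END .*PRIVATE KEY-----/p' "$key"
    echo "</key>"
  } >> "$ovpn"
}

# Usage:
#   append_client_creds client-config.ovpn pki/issued/client1.crt pki/private/client1.key
```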
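Step 6 - Connect and access
- Install AWS VPN Client (or any OpenVPN-compatible client)
- Import the .ovpn file
- Connect
Once connected, open http://<private-ip>:8888/dashboard in your browser, using the APIServerEC2InstanceIP stack output as <private-ip>.
Or call the API directly, for example: curl http://<private-ip>:8888/health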
9. HTTPS Setup
By default the API runs on HTTP port 8888. If you need HTTPS (to avoid browser security warnings, to meet compliance requirements, or for public-facing access), choose one of the three options below.
Option A - Application Load Balancer + ACM (recommended for production)
Best when: you have a domain name and want a trusted certificate managed by AWS. The ALB terminates TLS and forwards traffic to the instance on port 8888.
Prerequisites: a domain name with DNS you control.
Step 1 - Request a certificate in ACM
AWS Console -> Certificate Manager -> Request a certificate:
- Domain name: api.yourdomain.com
- Validation method: DNS (add the CNAME record ACM gives you to your DNS provider)
Wait until the certificate status is Issued.
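From the CLI, request the certificate, add the CNAME record, and then wait for validation. request_api_cert and wait_for_cert are hypothetical helpers assuming AWS CLI v2; request the certificate in the same region as the ALB, since an ALB can only use ACM certificates from its own region.

```shell
#!/usr/bin/env bash
# Sketch: request a DNS-validated certificate; prints its ARN.
request_api_cert() {
  local domain="$1" region="$2"
  aws acm request-certificate \
    --region "$region" \
    --domain-name "$domain" \
    --validation-method DNS \
    --query CertificateArn \
    --output text
}

# Sketch: block until ACM reports the certificate as issued
# (the CLI waiter gives up after its built-in polling limit).
wait_for_cert() {
  local arn="$1" region="$2"
  aws acm wait certificate-validated --region "$region" --certificate-arn "$arn"
}

# Usage:
#   arn="$(request_api_cert api.yourdomain.com us-east-1)"
#   # add the CNAME shown in the ACM console, then:
#   wait_for_cert "$arn" us-east-1
```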
Step 2 - Create an Application Load Balancer
AWS Console -> EC2 -> Load Balancers -> Create Load Balancer -> Application Load Balancer:
| Field | Value |
|---|---|
| Scheme | Internet-facing (public mode) or Internal (private mode) |
| VPC | DeidentifyVPC |
| Subnets | Select subnets in at least two Availability Zones; an ALB cannot be created with a single subnet. Use public subnets for an internet-facing ALB. |
| Security group | Create a new one allowing TCP 443 inbound from 0.0.0.0/0 |
Listeners:
- Add listener: HTTPS port 443 -> Forward to a new target group (HTTP port 8888, target: the EC2 instance)
- Select the ACM certificate from Step 1
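The target group can be created up front with a health check on /health, so the ALB only routes to the instance while the API reports healthy. A sketch assuming AWS CLI v2; create_api_target_group and the name deidentify-api-tg are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create an HTTP:8888 target group that health-checks /health,
# register the instance, and print the target group ARN.
create_api_target_group() {
  local vpc_id="$1" instance_id="$2" region="$3" tg_arn
  tg_arn="$(aws elbv2 create-target-group \
    --region "$region" \
    --name deidentify-api-tg \
    --protocol HTTP --port 8888 \
    --vpc-id "$vpc_id" \
    --target-type instance \
    --health-check-path /health \
    --query 'TargetGroups[0].TargetGroupArn' --output text)" || return 1
  aws elbv2 register-targets --region "$region" \
    --target-group-arn "$tg_arn" --targets "Id=${instance_id}" || return 1
  echo "$tg_arn"
}
```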
Step 3 - Update the instance security group
The EC2 security group currently allows port 8888 from APIAccessCIDR. Add a rule allowing traffic from the ALB's security group:
AWS Console -> EC2 -> Security Groups -> DeidentifyVPC security group -> Inbound rules -> Add rule:
- Type: Custom TCP
- Port: 8888
- Source: the ALB's security group ID
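Inbound rules like this one (and the port 443 rules in Options B and C) can be added with the CLI. allow_ingress is a hypothetical helper assuming AWS CLI v2; it treats a source starting with sg- as a security group and anything else as a CIDR.

```shell
#!/usr/bin/env bash
# Sketch: allow inbound TCP on one port from a security group or a CIDR.
allow_ingress() {
  local sg_id="$1" port="$2" source="$3" region="$4"
  case "$source" in
    sg-*) set -- --source-group "$source" ;;
    *)    set -- --cidr "$source" ;;
  esac
  aws ec2 authorize-security-group-ingress --region "$region" \
    --group-id "$sg_id" --protocol tcp --port "$port" "$@"
}

# Usage:
#   allow_ingress sg-instance123 8888 sg-alb456 us-east-1
#   allow_ingress sg-instance123 443 172.31.0.0/16 us-east-1
```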
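Step 4 - Point DNS to the ALB
In your DNS provider, create a CNAME record for api.yourdomain.com whose value is the ALB's DNS name (shown on the load balancer's details page).
Step 5 - Test
curl https://api.yourdomain.com/health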
Option B - Caddy reverse proxy on the instance
Best when: you want HTTPS without a load balancer, or you need a self-signed certificate for internal use without a domain name.
Connect to the instance
Use SSM Session Manager (see Section 6):
aws ssm start-session --target <instance-id> --region <region>
Install Caddy
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update && sudo apt install caddy
Configure Caddy
With a domain name (automatic HTTPS via Let's Encrypt):
The instance must be publicly reachable on ports 80 and 443 for this to work.
sudo tee /etc/caddy/Caddyfile > /dev/null <<'EOF'
api.yourdomain.com {
reverse_proxy localhost:8888
}
EOF
Without a domain name (self-signed certificate for internal use):
sudo tee /etc/caddy/Caddyfile > /dev/null <<'EOF'
:443 {
tls internal
reverse_proxy localhost:8888
}
EOF
tls internal issues a certificate from Caddy's own locally generated CA. Browsers will show a warning - accept it or add the Caddy root CA to your trust store.
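The two Caddyfile variants differ only in the site address and the tls line, so they can be generated from one helper and checked with caddy validate before restarting. render_caddyfile is a hypothetical helper; pass a domain for the Let's Encrypt variant or nothing for the internal-CA variant.

```shell
#!/usr/bin/env bash
# Sketch: print a Caddyfile proxying to the API on localhost:8888.
# With a domain: automatic HTTPS. Without: :443 with Caddy's internal CA.
render_caddyfile() {
  local domain="$1"
  if [ -n "$domain" ]; then
    printf '%s {\n\treverse_proxy localhost:8888\n}\n' "$domain"
  else
    printf ':443 {\n\ttls internal\n\treverse_proxy localhost:8888\n}\n'
  fi
}

# Usage:
#   render_caddyfile api.yourdomain.com | sudo tee /etc/caddy/Caddyfile > /dev/null
#   caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile
```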
Update the security group
Allow port 443 inbound on the instance security group:
AWS Console -> EC2 -> Security Groups -> Add inbound rule:
- Type: HTTPS
- Port: 443
- Source: your APIAccessCIDR
Start Caddy
sudo systemctl restart caddy
Test
# With a domain
curl https://api.yourdomain.com/health
# Self-signed (skip cert verification for testing only)
curl -k https://<instance-ip>/health
Option C - Nginx reverse proxy on the instance
Best when: your team is already familiar with Nginx and prefers it over Caddy. Requires managing TLS certificates manually (via certbot or a pre-existing cert).
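Connect to the instance
Use SSM Session Manager (see Section 6):
aws ssm start-session --target <instance-id> --region <region>
Install Nginx and Certbot
sudo apt update && sudo apt install -y nginx certbot python3-certbot-nginx
Configure Nginx
With a domain name (obtain the Let's Encrypt certificate first, as described under "Obtain a Let's Encrypt certificate" below; Nginx refuses to load a site whose certificate files do not exist yet):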
sudo tee /etc/nginx/sites-available/deidentify > /dev/null <<'EOF'
server {
listen 443 ssl;
server_name api.yourdomain.com;
ssl_certificate /etc/letsencrypt/live/api.yourdomain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/api.yourdomain.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
location / {
proxy_pass http://localhost:8888;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_read_timeout 120s;
}
}
# Redirect HTTP to HTTPS
server {
listen 80;
server_name api.yourdomain.com;
return 301 https://$host$request_uri;
}
EOF
sudo ln -s /etc/nginx/sites-available/deidentify /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default
Without a domain name (self-signed certificate for internal use):
# Generate a self-signed certificate
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /etc/ssl/private/deidentify.key \
-out /etc/ssl/certs/deidentify.crt \
-subj "/CN=deidentify"
sudo tee /etc/nginx/sites-available/deidentify > /dev/null <<'EOF'
server {
listen 443 ssl;
ssl_certificate /etc/ssl/certs/deidentify.crt;
ssl_certificate_key /etc/ssl/private/deidentify.key;
ssl_protocols TLSv1.2 TLSv1.3;
location / {
proxy_pass http://localhost:8888;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_read_timeout 120s;
}
}
EOF
sudo ln -s /etc/nginx/sites-available/deidentify /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default
Browsers will show a warning for self-signed certificates - accept it, or distribute the cert to your team's trust store.
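Obtain a Let's Encrypt certificate (domain only)
Run this before enabling the domain configuration above, while Nginx is still serving its default site. The instance must be publicly reachable on port 80:
sudo certbot certonly --nginx -d api.yourdomain.com
Certbot uses the running Nginx to answer the HTTP challenge, saves the certificate under /etc/letsencrypt/live/api.yourdomain.com/, and sets up automatic renewal.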
Update the security group
Allow port 443 inbound:
AWS Console -> EC2 -> Security Groups -> Add inbound rule:
- Type: HTTPS
- Port: 443
- Source: your APIAccessCIDR
Start Nginx
sudo nginx -t && sudo systemctl restart nginx
Test
# With a domain
curl https://api.yourdomain.com/health
# Self-signed (skip cert verification for testing only)
curl -k https://<instance-ip>/health
10. Managing API Keys
The master APIKey set during deployment authenticates all requests. You can also create named keys - useful for giving different integrations separate credentials that can be individually revoked.
Create a named key
curl -X POST http://<private-ip>:8888/admin/keys \
-H "Content-Type: application/json" \
-H "API-Key: YOUR_MASTER_KEY" \
-d '{"label": "data-pipeline"}'
The raw key is returned once - save it immediately. Example response:
{
"key_id": "f3a8...uuid",
"label": "data-pipeline",
"api_key": "dei_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"created_at": "2026-04-29T10:00:00",
"expires_at": null,
"warning": "This key will not be shown again. Store it now."
}
Add "expires_in_days": 365 to the request body for a key that auto-expires.
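A wrapper for the create call, including the optional expiry, might look like this. create_named_key is a hypothetical helper; it assumes labels contain no quotes or backslashes, since the JSON is built by plain string interpolation.

```shell
#!/usr/bin/env bash
# Sketch: create a named key, optionally with expires_in_days (whole number).
create_named_key() {
  local base_url="$1" master_key="$2" label="$3" days="$4" body
  case "$days" in
    "") body="{\"label\": \"${label}\"}" ;;
    *[!0-9]*) echo "expires_in_days must be a whole number" >&2; return 1 ;;
    *) body="{\"label\": \"${label}\", \"expires_in_days\": ${days}}" ;;
  esac
  curl -sS -X POST "${base_url}/admin/keys" \
    -H "Content-Type: application/json" \
    -H "API-Key: ${master_key}" \
    -d "$body"
}

# Usage:
#   create_named_key http://<private-ip>:8888 "$MASTER_KEY" data-pipeline 365
```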
List keys
Returns metadata only - the raw key is never stored or returned after creation.
Revoke a key
View from the dashboard
Navigate to Admin -> API Keys in the web dashboard to see all keys and their status (active, revoked, expired).
For the full API reference, see API_README.md.