PHI De-identification - Marketplace Deployment & Access Guide

This guide walks you through deploying the PHI De-identification API from AWS Marketplace, verifying the installation, connecting to the instance, and optionally enabling HTTPS.


Table of Contents

  1. Overview
  2. Prerequisites
  3. Deploying the Stack
  4. Verifying the Deployment
  5. Connecting via VPC Peering
  6. Connecting via SSM Session Manager
  7. Dashboard Access via SSM Port Forwarding
  8. Dashboard and API Access via AWS Client VPN
  9. HTTPS Setup
  10. Managing API Keys

1. Overview

The PHI De-identification API is a REST API that extracts, anonymizes, and de-anonymizes Protected Health Information (PHI) in medical text using AWS Bedrock (Claude). Deploying this product provisions:

Resource              Purpose
EC2 instance          Runs the API server
VPC + subnets         Isolated network for the instance
DynamoDB tables       PHI token mappings and named API keys
AWS KMS key           PHI encryption/decryption (RSA 4096)
IAM role              Access to Bedrock, DynamoDB, KMS, SSM
CloudWatch log group  API server logs, 30-day retention
NAT Gateway           Outbound access to AWS services (private mode only)

The API is available on port 8888. The web dashboard is served at /dashboard on the same port and the Swagger UI at /docs.


2. Prerequisites

Before deploying, ensure the following are in place:

AWS account

  • IAM permissions to create CloudFormation stacks, EC2 instances, VPCs, DynamoDB tables, KMS keys, and IAM roles.
  • The deployment region must support AWS Bedrock with Claude models. Verify this on the Bedrock model access page, and enable access to the Claude model you intend to use (e.g. us.anthropic.claude-sonnet-4-6).

EC2 key pair

Create a key pair in the target region before deploying:

AWS Console -> EC2 -> Key Pairs -> Create key pair

Alternatively, use an existing key pair.

AWS CLI + SSM plugin (for SSM-based access - recommended over SSH)


3. Deploying the Stack

Find the listing

Direct link: https://aws.amazon.com/marketplace/pp/prodview-zrou3ehu2ffdq

Or search manually:

  1. Open console.aws.amazon.com/marketplace/search
  2. Search for "PHI De-identification by ClerkAI"
  3. Click on the listing

Subscribe and launch

  1. Click Continue to Subscribe → review terms → Accept Terms
  2. Once your subscription is active, click Continue to Configuration
  3. Select your Region and the Fulfillment Option (CloudFormation), then click Continue to Launch
  4. Choose Launch CloudFormation and click Launch
  5. Fill in the parameters below and deploy

Parameters

Parameter          Description
APIKey             The master API key used to authenticate all requests. Minimum 16 characters. Store it securely - it is written to SSM Parameter Store and injected at startup.
AmiID              The pre-built AMI ID provided in the Marketplace listing. Leave this value unchanged.
InstanceType       EC2 instance type. t3.small is the minimum recommended for production.
KeyName            Name of an existing EC2 key pair for SSH access.
SSHAccessCIDR      CIDR block allowed to SSH to the instance. Use your peered VPC CIDR, a single address (x.x.x.x/32), or 0.0.0.0/0. SSM Session Manager is recommended over SSH.
APIAccessCIDR      CIDR block allowed to reach port 8888. Use your peered VPC CIDR (e.g. 172.31.0.0/16) or 0.0.0.0/0.
DynamoBillingMode  PAY_PER_REQUEST for variable workloads (default). PROVISIONED for high, steady throughput.
TableNameSuffix    Optional suffix appended to DynamoDB table names (e.g. -prod). Useful when running multiple stacks in the same account.
EnableVpcPeering   Set to true to peer this VPC with an existing VPC so your application can reach the API. Requires PeerVpcId and PeerVpcCidr.
PeerVpcId          VPC ID of the VPC to peer with (e.g. vpc-0abc1234).
PeerVpcCidr        CIDR block of the peer VPC (e.g. 172.31.0.0/16).
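
A pre-flight check can catch invalid parameter values before CloudFormation does. The following is a minimal Python sketch, not part of the product: the function name validate_stack_parameters is hypothetical, and the rules encode only what the table above states (a 16-character minimum for APIKey, CIDR-formatted access ranges, and PeerVpcId plus PeerVpcCidr when EnableVpcPeering is true). The stack itself may enforce further constraints.

```python
import ipaddress


def validate_stack_parameters(params: dict) -> list[str]:
    """Return a list of problems found in the given parameter values."""
    problems = []

    # APIKey: the guide states a minimum length of 16 characters.
    if len(params.get("APIKey", "")) < 16:
        problems.append("APIKey must be at least 16 characters")

    # Access ranges must parse as IPv4 networks (strict: no host bits set).
    for name in ("SSHAccessCIDR", "APIAccessCIDR"):
        try:
            ipaddress.IPv4Network(params.get(name, ""))
        except ValueError:
            problems.append(f"{name} is not a valid IPv4 CIDR block")

    # EnableVpcPeering=true requires both PeerVpcId and PeerVpcCidr.
    if params.get("EnableVpcPeering") == "true":
        if not params.get("PeerVpcId", "").startswith("vpc-"):
            problems.append("PeerVpcId must be set (e.g. vpc-0abc1234)")
        try:
            ipaddress.IPv4Network(params.get("PeerVpcCidr", ""))
        except ValueError:
            problems.append("PeerVpcCidr is not a valid IPv4 CIDR block")

    return problems
```

An empty list means the values passed these basic checks. A value such as 172.31.0.5/16 is rejected because it has host bits set; the network form 172.31.0.0/16 is accepted.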

Stack outputs

After the stack reaches CREATE_COMPLETE, check the Outputs tab:

Output                  Description
APIEndpoint             The API base URL: http://<private-ip>:8888. Accessible via VPC peering or SSM port forwarding.
APIServerEC2InstanceIP  Private IP address of the EC2 instance.
KMSKeyArn               ARN of the KMS key - use this when configuring KMS storage.
ApiKeysTableName        DynamoDB table storing named API keys.
SSMAPIKeyPath           SSM Parameter Store path where the master API key is stored.
VPCPeeringConnectionId  Peering connection ID (only when VPC peering is enabled).
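
The same outputs can be read programmatically. The sketch below assumes you have captured the JSON printed by aws cloudformation describe-stacks --stack-name <stack-name> --output json and want the outputs as a simple dictionary. The helper name stack_outputs is hypothetical; the Stacks / Outputs / OutputKey / OutputValue structure is the standard shape of that CLI response.

```python
import json


def stack_outputs(describe_stacks_json: str) -> dict:
    """Map OutputKey -> OutputValue for the first stack in the response."""
    response = json.loads(describe_stacks_json)
    # A stack that is still being created may not have an Outputs list yet.
    outputs = response["Stacks"][0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}
```

For example, stack_outputs(raw)["APIServerEC2InstanceIP"] yields the private IP to use in the later sections, where raw holds the CLI output.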

4. Verifying the Deployment

Once the stack is CREATE_COMPLETE, verify the API is running via SSM (see Section 6):

aws ssm start-session --target <instance-id> --region <region>

# then inside the session:
curl http://localhost:8888/health

Expected response:

{"status": "healthy", "version": "1.0.0"}
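
For automated checks, the same test can be scripted. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and it relies solely on the "status": "healthy" field shown in the expected response above. It has to run somewhere that can reach the API (on the instance, through a tunnel, or from a peered VPC).

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True when a /health response body reports status "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Fetch <base_url>/health and evaluate the response body.

    urlopen raises urllib.error.URLError when the host is unreachable and
    urllib.error.HTTPError for non-2xx responses; callers may want to
    catch both and treat them as "not healthy".
    """
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
```

Inside an SSM session, check_health("http://localhost:8888") mirrors the curl command above.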


5. Connecting via VPC Peering

Use this for: your application (running in another VPC) making API calls to the de-identification service in its own VPC.

VPC peering connects two VPCs at the network level. Traffic between them stays on AWS's internal network and never touches the internet.

Step 1 - Deploy with peering enabled

Set these parameters when deploying the stack:

EnableVpcPeering: true
PeerVpcId:        vpc-0abc1234        <- your application's VPC ID
PeerVpcCidr:      172.31.0.0/16      <- your application's VPC CIDR
APIAccessCIDR:    172.31.0.0/16      <- same as PeerVpcCidr

The stack automatically creates the peering connection and adds a route on the de-identification side.

Step 2 - Add the return route (your side)

In your application's VPC, add a route pointing back to the de-identification VPC:

  1. AWS Console -> VPC -> Route Tables
  2. Select the route table associated with your application's subnet
  3. Edit routes -> Add route:
       • Destination: 10.0.0.0/16 (the de-identification VPC CIDR - or whatever you set in VPCCidr)
       • Target: the peering connection ID from the stack output VPCPeeringConnectionId
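
VPC peering only works when the two VPC CIDR blocks do not overlap, so it is worth checking the ranges before deploying. A minimal sketch with Python's ipaddress module follows; the function name is hypothetical, and the CIDR values in the usage note are the example values from this guide.

```python
import ipaddress


def peering_cidrs_are_compatible(deidentify_cidr: str, peer_cidr: str) -> bool:
    """Return True when the two VPC CIDR blocks do not overlap."""
    deidentify_net = ipaddress.ip_network(deidentify_cidr)
    peer_net = ipaddress.ip_network(peer_cidr)
    return not deidentify_net.overlaps(peer_net)
```

With the values used above, peering_cidrs_are_compatible("10.0.0.0/16", "172.31.0.0/16") is True. If your application VPC also used a range inside 10.0.0.0/16, the check would return False and one side would need a different CIDR.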

Step 3 - Call the API

From any resource inside your VPC:

# Health check
curl http://10.0.2.x:8888/health
# use the private IP from stack output: APIServerEC2InstanceIP

# Anonymize text
curl -X POST "http://10.0.2.x:8888/anonymize/phi?storage_type=KMS" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{"text": "Patient John Smith, DOB 01/15/1980."}'
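
The anonymize call can also be made from application code. The sketch below mirrors the curl command using only the Python standard library. The function names are hypothetical, the 120-second timeout is an assumption, and the response is returned as parsed JSON without assuming its fields (see the API reference for the actual schema).

```python
import json
import urllib.parse
import urllib.request


def build_anonymize_request(base_url: str, api_key: str, text: str,
                            storage_type: str = "KMS") -> urllib.request.Request:
    """Build the POST /anonymize/phi request without sending it."""
    query = urllib.parse.urlencode({"storage_type": storage_type})
    return urllib.request.Request(
        url=f"{base_url}/anonymize/phi?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json", "API-Key": api_key},
        method="POST",
    )


def anonymize(base_url: str, api_key: str, text: str, timeout: float = 120.0):
    """Send the request and return the decoded JSON response body."""
    request = build_anonymize_request(base_url, api_key, text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Here base_url is the APIEndpoint stack output, for example anonymize(endpoint, api_key, "Patient John Smith, DOB 01/15/1980."). Splitting request construction from sending keeps the former easy to unit-test without network access.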

6. Connecting via SSM Session Manager

Use this for: admin access to the instance - checking logs, managing API keys, running diagnostics. It requires no open port 22 and no key pair on your machine.

Prerequisites

The operator running these commands needs the following IAM permission:

{
  "Effect": "Allow",
  "Action": ["ssm:StartSession"],
  "Resource": [
    "arn:aws:ec2:<region>:<account-id>:instance/<instance-id>",
    "arn:aws:ssm:*:*:document/AWS-StartInteractiveCommand",
    "arn:aws:ssm:*:*:document/AWS-StartPortForwardingSession"
  ]
}

The instance already has AmazonSSMManagedInstanceCore via the instance role - no changes needed there.
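
If you manage operator policies in code, the statement above can be generated per instance. This is a sketch only: the function name is hypothetical, the resource ARNs are copied from the statement above, and the Version / Statement wrapper is the standard IAM policy document structure.

```python
import json


def session_manager_policy(region: str, account_id: str, instance_id: str) -> dict:
    """Build an IAM policy document allowing SSM sessions to one instance."""
    statement = {
        "Effect": "Allow",
        "Action": ["ssm:StartSession"],
        "Resource": [
            f"arn:aws:ec2:{region}:{account_id}:instance/{instance_id}",
            "arn:aws:ssm:*:*:document/AWS-StartInteractiveCommand",
            "arn:aws:ssm:*:*:document/AWS-StartPortForwardingSession",
        ],
    }
    return {"Version": "2012-10-17", "Statement": [statement]}


def policy_json(region: str, account_id: str, instance_id: str) -> str:
    """Render the policy as indented JSON, ready to paste or upload."""
    return json.dumps(session_manager_policy(region, account_id, instance_id), indent=2)
```

The output of policy_json can be pasted into the IAM console's JSON editor or saved to a file for use with your own tooling.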

Open a shell session

aws ssm start-session \
  --target i-0abc1234567890 \
  --region us-east-1

You will get an interactive shell on the instance. Useful commands:

# Check API server status
docker ps

# Tail live logs
docker logs -f deidentify-api

# Check environment (API key, KMS ARN, table names)
cat /etc/app.env

# Test the API locally
curl http://localhost:8888/health

# Create a named API key
curl -X POST http://localhost:8888/admin/keys \
  -H "Content-Type: application/json" \
  -H "API-Key: $(grep API_KEY /etc/app.env | cut -d= -f2-)" \
  -d '{"label": "my-integration"}'

# Restart the container
/home/ubuntu/start-deidentify.sh

7. Dashboard Access via SSM Port Forwarding

Use this for: accessing the web dashboard from your laptop without a public IP or VPN. Good for one-off admin sessions.

Step 1 - Start the tunnel

Open a terminal and run:

aws ssm start-session \
  --target i-0abc1234567890 \
  --region us-east-1 \
  --document-name AWS-StartPortForwardingSession \
  --parameters '{"portNumber":["8888"],"localPortNumber":["8888"]}'

You will see:

Starting session with SessionId: ...
Port 8888 opened for sessionId ...
Waiting for connections...

Keep this terminal open. Closing it ends the tunnel.
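
If local port 8888 is already in use, or you want to script the tunnel, the command can be assembled programmatically. The sketch below builds the same AWS CLI invocation as an argument list; the function name is hypothetical and the defaults match the command above.

```python
import json


def port_forward_command(instance_id: str, region: str,
                         remote_port: int = 8888,
                         local_port: int = 8888) -> list[str]:
    """Build the `aws ssm start-session` argv for a port-forwarding tunnel."""
    # The CLI expects each parameter value as a list of strings.
    parameters = {
        "portNumber": [str(remote_port)],
        "localPortNumber": [str(local_port)],
    }
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--region", region,
        "--document-name", "AWS-StartPortForwardingSession",
        "--parameters", json.dumps(parameters),
    ]
```

Passing the list to subprocess.run(cmd, check=True) starts the tunnel and blocks until it is closed. With local_port=9999 the dashboard would be reachable at http://localhost:9999/dashboard instead.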

Step 2 - Open the dashboard

In your browser:

http://localhost:8888/dashboard

The browser connects to localhost:8888 and SSM tunnels the traffic through to the instance. Enter your API key when prompted.

Step 3 - End the session

Close the terminal running the SSM command, or press Ctrl+C.


8. Dashboard and API Access via AWS Client VPN

Use this for: persistent access for a team - developers and admins can browse the dashboard or call the API as if their laptop were inside the VPC. This is more convenient than starting an SSM tunnel for every session.

Step 1 - Generate certificates

Use easy-rsa to create a server certificate and one client certificate per user (or one shared client cert for simplicity):

git clone https://github.com/OpenVPN/easy-rsa.git
cd easy-rsa/easyrsa3

./easyrsa init-pki
./easyrsa build-ca nopass

# Server cert
./easyrsa build-server-full server nopass

# Client cert (repeat for each user if needed)
./easyrsa build-client-full client1 nopass

Step 2 - Upload certificates to ACM

# Server cert
aws acm import-certificate \
  --certificate fileb://pki/issued/server.crt \
  --private-key fileb://pki/private/server.key \
  --certificate-chain fileb://pki/ca.crt \
  --region us-east-1

# Client cert
aws acm import-certificate \
  --certificate fileb://pki/issued/client1.crt \
  --private-key fileb://pki/private/client1.key \
  --certificate-chain fileb://pki/ca.crt \
  --region us-east-1

Note the ARNs returned - you will need them in the next step.

Step 3 - Create a Client VPN endpoint

In AWS Console -> VPC -> Client VPN Endpoints -> Create Client VPN endpoint:

Field                   Value
Client IPv4 CIDR        192.168.100.0/22 (must not overlap your VPCs)
Server certificate ARN  ARN from Step 2
Authentication type     Mutual authentication
Client certificate ARN  ARN from Step 2
Enable split-tunnel     Yes (only VPC traffic goes through the VPN)
VPC                     The de-identification VPC (DeidentifyVPC)
Security group          The de-identification security group

Step 4 - Associate with the subnet and add authorization

  1. Associations tab -> Associate with the private subnet (PrivateSubnet)
  2. Authorization rules tab -> Add authorization rule:
       • Destination network: 10.0.0.0/16 (de-identification VPC CIDR)
       • Allow access to all users

Step 5 - Download the VPN configuration

Client VPN Endpoints -> select your endpoint -> Download client configuration

Open the downloaded .ovpn file in a text editor and append the client certificate and key:

<cert>
-----BEGIN CERTIFICATE-----
(paste contents of pki/issued/client1.crt)
-----END CERTIFICATE-----
</cert>

<key>
-----BEGIN PRIVATE KEY-----
(paste contents of pki/private/client1.key)
-----END PRIVATE KEY-----
</key>
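
Editing the .ovpn file by hand is error-prone when there are several users. The sketch below appends the <cert> and <key> sections programmatically. It assumes the certificate file may contain human-readable text before the PEM block (as easy-rsa output can), so it extracts only the first BEGIN/END block from each file. The function names are hypothetical.

```python
import re

# Matches one PEM block, e.g. CERTIFICATE, PRIVATE KEY or RSA PRIVATE KEY.
PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z ]+)-----.*?-----END \1-----", re.DOTALL)


def first_pem_block(text: str) -> str:
    """Return the first PEM block found in text, without surrounding text."""
    match = PEM_BLOCK.search(text)
    if match is None:
        raise ValueError("no PEM block found")
    return match.group(0)


def add_client_credentials(ovpn: str, cert_text: str, key_text: str) -> str:
    """Append <cert> and <key> sections to a Client VPN configuration."""
    cert = first_pem_block(cert_text)
    key = first_pem_block(key_text)
    return f"{ovpn.rstrip()}\n\n<cert>\n{cert}\n</cert>\n\n<key>\n{key}\n</key>\n"
```

Read the downloaded configuration, pki/issued/client1.crt and pki/private/client1.key as text, pass them to add_client_credentials, and write the result to a per-user .ovpn file. The output contains the private key, so handle it accordingly.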

Step 6 - Connect and access

  1. Install AWS VPN Client (or any OpenVPN-compatible client)
  2. Import the .ovpn file
  3. Connect

Once connected, open in your browser:

http://10.0.2.x:8888/dashboard
# use the private IP from stack output: APIServerEC2InstanceIP

Or call the API directly:

curl http://10.0.2.x:8888/health


9. HTTPS Setup

By default the API runs on HTTP port 8888. If you need HTTPS - to avoid browser security warnings, meet compliance requirements, or expose the API publicly - choose one of the three options below.


Option A - Application Load Balancer with an ACM certificate

Best when: you have a domain name and want a trusted certificate managed by AWS. The ALB terminates TLS and forwards traffic to the instance on port 8888.

Prerequisites: a domain name with DNS you control.

Step 1 - Request a certificate in ACM

AWS Console -> Certificate Manager -> Request a certificate:

  • Domain name: api.yourdomain.com
  • Validation method: DNS (add the CNAME record ACM gives you to your DNS provider)

Wait until the certificate status is Issued.

Step 2 - Create an Application Load Balancer

AWS Console -> EC2 -> Load Balancers -> Create Load Balancer -> Application Load Balancer:

Field           Value
Scheme          Internet-facing (public mode) or Internal (private mode)
VPC             DeidentifyVPC
Subnets         Select subnets in at least two Availability Zones, including the public subnet (an ALB requires subnets in at least two AZs)
Security group  Create a new one allowing TCP 443 inbound from 0.0.0.0/0

Listeners:

  • Add listener: HTTPS port 443 -> Forward to a new target group (HTTP port 8888, target: the EC2 instance)
  • Select the ACM certificate from Step 1

Step 3 - Update the instance security group

The EC2 security group currently allows port 8888 from APIAccessCIDR. Add a rule allowing traffic from the ALB's security group:

AWS Console -> EC2 -> Security Groups -> DeidentifyVPC security group -> Inbound rules -> Add rule:

  • Type: Custom TCP
  • Port: 8888
  • Source: the ALB's security group ID

Step 4 - Point DNS to the ALB

In your DNS provider, create a CNAME record:

api.yourdomain.com  ->  <alb-dns-name>.elb.amazonaws.com

Step 5 - Test

curl https://api.yourdomain.com/health

Option B - Caddy reverse proxy on the instance

Best when: you want HTTPS without a load balancer, or you need a self-signed certificate for internal use without a domain name.

Connect to the instance

Use SSM Session Manager (see Section 6):

aws ssm start-session --target i-0abc1234567890 --region us-east-1

Install Caddy

sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update && sudo apt install caddy

Configure Caddy

With a domain name (automatic HTTPS via Let's Encrypt):

The instance must be publicly reachable on ports 80 and 443 for this to work.

sudo tee /etc/caddy/Caddyfile > /dev/null <<'EOF'
api.yourdomain.com {
    reverse_proxy localhost:8888
}
EOF

Without a domain name (self-signed certificate for internal use):

sudo tee /etc/caddy/Caddyfile > /dev/null <<'EOF'
:443 {
    tls internal
    reverse_proxy localhost:8888
}
EOF

tls internal makes Caddy issue the certificate from its own locally generated CA rather than a public one. Browsers will show a warning - accept it or add the Caddy root CA to your trust store.

Update the security group

Allow port 443 inbound on the instance security group:

AWS Console -> EC2 -> Security Groups -> Add inbound rule:

  • Type: HTTPS
  • Port: 443
  • Source: your APIAccessCIDR

Start Caddy

sudo systemctl reload caddy
sudo systemctl enable caddy

Test

# With a domain
curl https://api.yourdomain.com/health

# Self-signed (skip cert verification for testing only)
curl -k https://<instance-ip>/health

Option C - Nginx reverse proxy on the instance

Best when: your team is already familiar with Nginx and prefers it over Caddy. Requires managing TLS certificates manually (via certbot or a pre-existing cert).

Connect to the instance

Use SSM Session Manager (see Section 6):

aws ssm start-session --target i-0abc1234567890 --region us-east-1

Install Nginx and Certbot

sudo apt update
sudo apt install -y nginx certbot python3-certbot-nginx

Configure Nginx

With a domain name (Let's Encrypt certificate):

sudo tee /etc/nginx/sites-available/deidentify > /dev/null <<'EOF'
server {
    listen 443 ssl;
    server_name api.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/api.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.yourdomain.com/privkey.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    location / {
        proxy_pass         http://localhost:8888;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 120s;
    }
}

# Redirect HTTP to HTTPS
server {
    listen 80;
    server_name api.yourdomain.com;
    return 301 https://$host$request_uri;
}
EOF

sudo ln -s /etc/nginx/sites-available/deidentify /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default

Without a domain name (self-signed certificate for internal use):

# Generate a self-signed certificate
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/ssl/private/deidentify.key \
  -out /etc/ssl/certs/deidentify.crt \
  -subj "/CN=deidentify"

sudo tee /etc/nginx/sites-available/deidentify > /dev/null <<'EOF'
server {
    listen 443 ssl;

    ssl_certificate     /etc/ssl/certs/deidentify.crt;
    ssl_certificate_key /etc/ssl/private/deidentify.key;
    ssl_protocols       TLSv1.2 TLSv1.3;

    location / {
        proxy_pass         http://localhost:8888;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_read_timeout 120s;
    }
}
EOF

sudo ln -s /etc/nginx/sites-available/deidentify /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default

Browsers will show a warning for self-signed certificates - accept it, or distribute the cert to your team's trust store.

Obtain a Let's Encrypt certificate (domain only)

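The certificate files referenced in the domain configuration do not exist until a certificate has been issued, and nginx -t fails until then. The instance must be publicly reachable on port 80 for this step: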

sudo certbot --nginx -d api.yourdomain.com

Certbot automatically edits the Nginx config to point at the new certificate and schedules automatic renewal (a systemd timer or cron job, depending on how Certbot was installed).

Update the security group

Allow port 443 inbound:

AWS Console -> EC2 -> Security Groups -> Add inbound rule:

  • Type: HTTPS
  • Port: 443
  • Source: your APIAccessCIDR

Start Nginx

sudo nginx -t                        # verify config
sudo systemctl reload nginx
sudo systemctl enable nginx

Test

# With a domain
curl https://api.yourdomain.com/health

# Self-signed (skip cert verification for testing only)
curl -k https://<instance-ip>/health
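
Outside of quick tests, clients can verify the self-signed certificate instead of skipping verification. The sketch below is one way to do that in Python, assuming you have copied /etc/ssl/certs/deidentify.crt from the instance to the client (the local path is up to you). That certificate is issued to CN=deidentify, so hostname verification will not match an IP address; the sketch therefore lets the caller disable hostname checking while keeping certificate verification on. Function names are hypothetical.

```python
import ssl
import urllib.request
from typing import Optional


def make_tls_context(ca_file: Optional[str] = None,
                     check_hostname: bool = True) -> ssl.SSLContext:
    """Create a verifying TLS context.

    ca_file: PEM file to trust (e.g. a local copy of the self-signed
             certificate). When None, the system trust store is used.
    check_hostname: set to False only when the certificate is pinned via
             ca_file and its name cannot match the address you connect to.
    """
    context = ssl.create_default_context(cafile=ca_file)
    context.check_hostname = check_hostname
    # verify_mode stays CERT_REQUIRED, so untrusted certificates still fail.
    return context


def fetch_health(url: str, context: ssl.SSLContext, timeout: float = 5.0) -> str:
    """GET the given URL over TLS using the supplied context."""
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For the self-signed setup this would be called as fetch_health("https://<instance-ip>/health", make_tls_context("deidentify.crt", check_hostname=False)). For a domain with a publicly trusted certificate, make_tls_context() with no arguments is sufficient.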

10. Managing API Keys

The master APIKey set during deployment authenticates all requests. You can also create named keys - useful for giving different integrations separate credentials that can be individually revoked.

Create a named key

curl -X POST http://<api-endpoint>:8888/admin/keys \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_MASTER_KEY" \
  -d '{"label": "data-pipeline"}'

The raw key is returned once - save it immediately. Example response:

{
  "key_id": "f3a8...uuid",
  "label": "data-pipeline",
  "api_key": "dei_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "created_at": "2026-04-29T10:00:00",
  "expires_at": null,
  "warning": "This key will not be shown again. Store it now."
}

Add "expires_in_days": 365 to the request body for a key that auto-expires.

List keys

curl http://<api-endpoint>:8888/admin/keys \
  -H "API-Key: YOUR_MASTER_KEY"

Returns metadata only - the raw key is never stored or returned after creation.

Revoke a key

curl -X DELETE http://<api-endpoint>:8888/admin/keys/<key_id> \
  -H "API-Key: YOUR_MASTER_KEY"
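
The three key-management calls above can be wrapped for use in scripts. This is a minimal standard-library sketch; the function names are hypothetical, and the endpoints, header, and body fields are the ones shown in the curl commands. Response bodies are returned as parsed JSON without assuming fields beyond the example response above.

```python
import json
import urllib.request
from typing import Optional


def create_key_request(base_url: str, master_key: str, label: str,
                       expires_in_days: Optional[int] = None) -> urllib.request.Request:
    """POST /admin/keys - create a named key, optionally auto-expiring."""
    body = {"label": label}
    if expires_in_days is not None:
        body["expires_in_days"] = expires_in_days
    return urllib.request.Request(
        url=f"{base_url}/admin/keys",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "API-Key": master_key},
        method="POST",
    )


def list_keys_request(base_url: str, master_key: str) -> urllib.request.Request:
    """GET /admin/keys - list key metadata."""
    return urllib.request.Request(
        f"{base_url}/admin/keys", headers={"API-Key": master_key}, method="GET")


def revoke_key_request(base_url: str, master_key: str, key_id: str) -> urllib.request.Request:
    """DELETE /admin/keys/<key_id> - revoke one key."""
    return urllib.request.Request(
        f"{base_url}/admin/keys/{key_id}", headers={"API-Key": master_key}, method="DELETE")


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Send a request and return the decoded JSON body (None if empty)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None
```

For example, send(create_key_request(base_url, master_key, "data-pipeline", expires_in_days=365)) returns the creation response; store its api_key value immediately, since it is not shown again.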

View from the dashboard

Navigate to Admin -> API Keys in the web dashboard to see all keys and their status (active, revoked, expired).

For the full API reference, see API_README.md.