PHI Categories
The extractor identifies 19 categories of Protected Health Information out of the box, covering all 18 HIPAA identifiers plus a generic date category for non-birthdate dates (admission, discharge, appointment, etc.).
The category list is fully customizable at runtime: you can add new categories or remove existing ones via the dashboard or the API without restarting the server. See Customizing categories below.
Default categories
| Category | Description | Example matches |
|---|---|---|
| name | Patient's full name or any part of name (first, last, doctor name, etc.) | John Smith, Dr. Jane Doe, Mr. Brown |
| birthdate | Dates of birth in any format | 01/15/1980, January 15, 1980, 1980-01-15 |
| address | Physical addresses or locations (street, city, state, zip, room numbers) | 1234 Elm St, Springfield, IL 62704, Room 302 |
| telephone | Any phone numbers (mobile, home, work) | (555) 123-4567, +1-555-123-4567 |
| fax | Fax numbers | (555) 999-8888 |
| email | Email addresses | john.smith@example.com |
| ssn | Social Security Numbers (any format, including 9 digits without hyphens) | 123-45-6789, 123456789 |
| medical_record_number | Medical record numbers or MRNs | MRN-001234, MR123456 |
| health_plan_number | Health plan or insurance policy numbers | HPB-987654321, INS-12345 |
| account_number | All account numbers (billing, bank, patient) | ACCT-456789, Patient #98765 |
| license_number | Licenses including driver's licenses and professional licenses | D123-4567-8912, MD-LIC-54321 |
| vehicle_id | Vehicle identifiers or VINs | 1HGCM82633A004352 |
| device_id | Device identifiers or serial numbers | DEV-9988776655, SN: AB12345 |
| url | All web URLs | https://patient-portal.example.com/... |
| ip_address | IP addresses (IPv4 or IPv6) | 192.168.1.25, 2001:db8::1 |
| biometric | Biometric identifiers (fingerprints, voiceprints, retinal scans) | fp_hash_5f4dcc3b... |
| photograph | References to photographs or patient images | patient_photo.jpg, image_12345.png |
| unique_id | Any other unique identifying numbers or codes | UUID-a1b2c3d4-..., CASE-2026-001 |
| date | All dates including admission, discharge, and appointment dates (NOT birthdates) | Admitted 03/15/2026, Follow-up next Tuesday |
HIPAA Safe Harbor coverage
These categories together cover all 18 identifiers required by the HIPAA Safe Harbor de-identification standard. The date category is separated from birthdate because the two have different sensitivity profiles and may require different handling in some workflows.
Listing active categories
To see which categories are currently active on your server (including any custom additions or removals):
Response:
{
  "categories": [
    {"name": "name", "description": "Patient's full name or any part of name..."},
    {"name": "ssn", "description": "Social Security Numbers..."}
  ]
}
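Once a listing response is in hand, the active category names can be pulled out of it with a few lines of Python. This is a minimal sketch that assumes only the response shape shown above; the function name `active_category_names` is hypothetical, and the request that fetches the response is deliberately left out.

```python
import json


def active_category_names(response_body: str) -> list[str]:
    """Return the category names from a listing response, in server order."""
    payload = json.loads(response_body)
    # Each entry is an object with "name" and "description" keys.
    return [entry["name"] for entry in payload["categories"]]


# Sample body copied from the response shown above (descriptions are truncated there).
sample = """
{
  "categories": [
    {"name": "name", "description": "Patient's full name or any part of name..."},
    {"name": "ssn", "description": "Social Security Numbers..."}
  ]
}
"""

print(active_category_names(sample))  # prints ['name', 'ssn']
```

Comparing this list against the default table is a quick way to confirm that a custom addition or removal has taken effect.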
Customizing categories
You can add or remove categories at runtime to tailor the extractor to your domain. Changes take effect on the next extraction request; no server restart is required.
From the dashboard
Navigate to Configuration -> PHI Categories:
- Click Add Category to define a new one (e.g. patient_id, case_number, appointment_code)
- Click the trash icon next to a category to remove it
From the API
Add a category:
curl -X POST http://<host>:8888/phi-categories \
-H "API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "patient_id",
"description": "Internal patient identifier used by the hospital EHR"
}'
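The same request can be built from Python with the standard library alone. The sketch below mirrors the curl call above (port 8888, the `/phi-categories` path, the `API-Key` header); the helper name `build_add_category_request` and the `localhost` host are placeholders chosen for illustration, not part of the product.

```python
import json
import urllib.request


def build_add_category_request(
    host: str, api_key: str, name: str, description: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that adds a PHI category."""
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:8888/phi-categories",
        data=body,
        headers={"API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_add_category_request(
    host="localhost",  # placeholder; substitute your server's host
    api_key="YOUR_API_KEY",
    name="patient_id",
    description="Internal patient identifier used by the hospital EHR",
)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the payload easy to inspect before anything reaches the server.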
Delete a category:
See API Reference -> PHI Category Management for details.
Constraints on category names
- Lowercase letters, digits, and underscores only (a-z, 0-9, _)
- Must start with a letter
- Maximum 50 characters
- Description: maximum 500 characters
Invalid examples: PatientID (uppercase), 123_id (starts with digit), my-id (hyphen).
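These rules are simple enough to check on the client before calling the API. The following is a minimal sketch of such a pre-check, assuming the limits are inclusive (exactly 50 and 500 characters are accepted); the function name `validate_category` is hypothetical, and the server's own validation remains authoritative.

```python
import re

# Starts with a lowercase letter, then lowercase letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")
MAX_NAME_LENGTH = 50          # assumed inclusive
MAX_DESCRIPTION_LENGTH = 500  # assumed inclusive


def validate_category(name: str, description: str = "") -> list[str]:
    """Return a list of problems; an empty list means the input passes."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must start with a letter and use only a-z, 0-9, _")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append(f"description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    return problems


for candidate in ["patient_id", "PatientID", "123_id", "my-id"]:
    status = "ok" if not validate_category(candidate) else "invalid"
    print(f"{candidate}: {status}")
```

Returning a list of problems rather than a single boolean lets a caller report every violation at once.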