PHI Categories

The extractor identifies 19 categories of Protected Health Information (PHI) out of the box. Together they cover all 18 HIPAA Safe Harbor identifiers; the HIPAA date identifier is split into birthdate and a generic date category for all other dates (admission, discharge, appointment, etc.).

The category list is fully customizable at runtime: you can add new categories or remove existing ones via the dashboard or the API without restarting the server. See Customizing categories below.


Default categories

| Category | Description | Example matches |
| --- | --- | --- |
| name | Patient's full name or any part of name (first, last, doctor name, etc.) | John Smith, Dr. Jane Doe, Mr. Brown |
| birthdate | Dates of birth in any format | 01/15/1980, January 15, 1980, 1980-01-15 |
| address | Physical addresses or locations (street, city, state, zip, room numbers) | 1234 Elm St, Springfield, IL 62704, Room 302 |
| telephone | Any phone numbers (mobile, home, work) | (555) 123-4567, +1-555-123-4567 |
| fax | Fax numbers | (555) 999-8888 |
| email | Email addresses | john.smith@example.com |
| ssn | Social Security Numbers (any format, including 9 digits without hyphens) | 123-45-6789, 123456789 |
| medical_record_number | Medical record numbers or MRNs | MRN-001234, MR123456 |
| health_plan_number | Health plan or insurance policy numbers | HPB-987654321, INS-12345 |
| account_number | All account numbers (billing, bank, patient) | ACCT-456789, Patient #98765 |
| license_number | Licenses including driver's licenses and professional licenses | D123-4567-8912, MD-LIC-54321 |
| vehicle_id | Vehicle identifiers or VINs | 1HGCM82633A004352 |
| device_id | Device identifiers or serial numbers | DEV-9988776655, SN: AB12345 |
| url | All web URLs | https://patient-portal.example.com/... |
| ip_address | IP addresses (IPv4 or IPv6) | 192.168.1.25, 2001:db8::1 |
| biometric | Biometric identifiers (fingerprints, voiceprints, retinal scans) | fp_hash_5f4dcc3b... |
| photograph | References to photographs or patient images | patient_photo.jpg, image_12345.png |
| unique_id | Any other unique identifying numbers or codes | UUID-a1b2c3d4-..., CASE-2026-001 |
| date | All dates including admission, discharge, and appointment dates (NOT birthdates) | Admitted 03/15/2026, Follow-up next Tuesday |

HIPAA Safe Harbor coverage

These categories together cover all 18 identifiers listed in the HIPAA Safe Harbor de-identification standard. The date category is kept separate from birthdate because the two differ in sensitivity, so a workflow can handle them differently (for example, treating birthdates more strictly than appointment dates).


Listing active categories

To see which categories are currently active on your server (including any custom additions or removals):

curl "http://<host>:8888/phi-categories" \
  -H "API-Key: YOUR_API_KEY"

Response:

{
  "categories": [
    {"name": "name", "description": "Patient's full name or any part of name..."},
    {"name": "ssn",  "description": "Social Security Numbers..."}
  ]
}
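The same check can be scripted. The following is a minimal Python sketch, assuming the response has exactly the shape shown above (a top-level "categories" list of objects with a "name" field). The function names, the timeout, and the `missing_categories` helper are illustrative and not part of the product.

```python
from __future__ import annotations

import json
import urllib.request


def parse_category_names(payload: str) -> list[str]:
    """Return the category names from a /phi-categories response body."""
    data = json.loads(payload)
    # Assumes the documented shape: {"categories": [{"name": ..., "description": ...}]}
    return [entry["name"] for entry in data.get("categories", [])]


def fetch_active_categories(base_url: str, api_key: str, timeout: float = 10.0) -> list[str]:
    """GET <base_url>/phi-categories and return the active category names."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/phi-categories",
        headers={"API-Key": api_key},
        method="GET",
    )
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_category_names(response.read().decode("utf-8"))


def missing_categories(active: list[str], required: list[str]) -> list[str]:
    """Return the required categories that are not currently active, in order."""
    active_set = set(active)
    return [name for name in required if name not in active_set]
```

A deployment script could call `fetch_active_categories("http://<host>:8888", "YOUR_API_KEY")` and pass the result to `missing_categories` to confirm that categories it relies on (such as ssn or date) have not been removed.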


Customizing categories

You can add or remove categories at runtime to tailor the extractor to your domain. Changes take effect on the next extraction request; no server restart is required.

From the dashboard

Navigate to Configuration -> PHI Categories:

  • Click Add Category to define a new one (e.g. patient_id, case_number, appointment_code)
  • Click the trash icon next to a category to remove it

From the API

Add a category:

curl -X POST "http://<host>:8888/phi-categories" \
  -H "API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "patient_id",
    "description": "Internal patient identifier used by the hospital EHR"
  }'

Delete a category:

curl -X DELETE "http://<host>:8888/phi-categories/patient_id" \
  -H "API-Key: YOUR_API_KEY"

See API Reference -> PHI Category Management for details.
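For scripted setups, the two curl calls above translate directly to Python. This is a sketch using only the standard library; the helper names are hypothetical, and because the response bodies of these endpoints are not described here, `send` simply returns the raw status code and body.

```python
import json
import urllib.parse
import urllib.request


def build_add_request(base_url: str, api_key: str, name: str, description: str) -> urllib.request.Request:
    """Prepare the POST /phi-categories request that adds a category."""
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/phi-categories",
        data=body,
        headers={"API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def build_delete_request(base_url: str, api_key: str, name: str) -> urllib.request.Request:
    """Prepare the DELETE /phi-categories/<name> request that removes a category."""
    # Percent-encode the name so an unexpected character cannot alter the path.
    quoted = urllib.parse.quote(name, safe="")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/phi-categories/{quoted}",
        headers={"API-Key": api_key},
        method="DELETE",
    )


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Send a prepared request and return (status code, decoded body)."""
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Building the request separately from sending it makes it possible to inspect or log the method, URL, and payload before anything reaches the server, for example `send(build_add_request("http://<host>:8888", "YOUR_API_KEY", "patient_id", "Internal patient identifier used by the hospital EHR"))`.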


Constraints on category names

  • Lowercase letters, digits, and underscores only (a-z, 0-9, _)
  • Must start with a letter
  • Maximum 50 characters
  • Description (separate field): maximum 500 characters

Invalid examples: PatientID (uppercase), 123_id (starts with digit), my-id (hyphen).
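These rules are easy to check on the client before calling the API. The sketch below mirrors the constraints listed above; the function name and error messages are illustrative, rejecting an empty name is an assumption (implied by "must start with a letter"), and the server remains the authority on what it accepts.

```python
import re

MAX_NAME_LENGTH = 50
MAX_DESCRIPTION_LENGTH = 500
ALLOWED_CHARS = re.compile(r"[a-z0-9_]+")


def validate_category(name: str, description: str = "") -> list:
    """Return a list of rule violations; an empty list means the input passes."""
    problems = []
    if not name:
        # Assumption: an empty name is invalid, since it cannot start with a letter.
        problems.append("name must not be empty")
    else:
        if not ALLOWED_CHARS.fullmatch(name):
            problems.append("name may contain only a-z, 0-9, and _")
        if not "a" <= name[0] <= "z":
            problems.append("name must start with a lowercase letter (a-z)")
        if len(name) > MAX_NAME_LENGTH:
            problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append(f"description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    return problems


if __name__ == "__main__":
    # The valid name from the API example plus the three invalid examples above.
    for candidate in ("patient_id", "PatientID", "123_id", "my-id"):
        print(candidate, validate_category(candidate))
```

Returning every violation at once, rather than stopping at the first, lets a caller show a complete error message for names such as PatientID that break more than one rule.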