identifying

This skill should be used when the user asks to "check for identifiability", "analyze re-identification risks", "find privacy issues related to anonymization", "check for PII exposure", or mentions "identifiability" in a privacy context. Maps to LINDDUN category I.

florianbuetow 12 1 Updated 5mo ago


Install

npx skillscat add florianbuetow/claude-code/identifying

Install via the SkillsCat registry.

SKILL.md

Identifiability Analysis (LINDDUN I)

Analyze source code for identifiability threats where individuals can be identified
from supposedly anonymous data. Combinations of quasi-identifiers (zip code, birth
date, gender) can uniquely identify individuals. Re-identification attacks on
"anonymized" data are the primary concern.

Supported Flags

Read `../../shared/schemas/flags.md` for full flag
documentation. This skill supports all cross-cutting flags.

| Flag | Identifiability-Specific Behavior |
|------|-----------------------------------|
| `--scope` | Default `changed`. Focuses on files handling user data, anonymization logic, data exports, analytics pipelines, and API responses. |
| `--depth quick` | Grep patterns only: scan for PII in logs, quasi-identifiers in exports, and missing anonymization. |
| `--depth standard` | Full code read: analyze the data fields returned by APIs and stored in databases for re-identification risk. |
| `--depth deep` | Trace data flows from collection to storage to export. Assess quasi-identifier combinations across the system. |
| `--depth expert` | Deep + re-identification risk modeling: estimate k-anonymity violations and uniqueness of attribute combinations. |
| `--severity` | Filter output. Identifiability findings range from `low` (theoretical) to `critical` (direct PII exposure). |
| `--fix` | Generate anonymization, generalization, and suppression replacements. |
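
As an illustration of the kind of replacement `--fix` describes, the sketch below suppresses direct identifiers and generalizes two quasi-identifiers. The record layout, the field names, and the helper `generalize_record` are hypothetical and chosen only for this example; the skill does not prescribe them.

```python
from datetime import date

# Hypothetical set of direct identifiers to suppress (assumption for this sketch).
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}


def generalize_record(record: dict) -> dict:
    """Return a copy with direct identifiers dropped and quasi-identifiers coarsened."""
    # Suppression: remove direct identifiers entirely.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Generalization: keep only the 3-digit zip prefix.
    if "zip_code" in out:
        out["zip_code"] = str(out["zip_code"])[:3] + "**"

    # Generalization: replace the full birth date with the birth year.
    if "birth_date" in out:
        out["birth_year"] = out.pop("birth_date").year

    return out


record = {
    "name": "Alice Example",
    "email": "alice@example.com",
    "zip_code": "90210",
    "birth_date": date(1985, 7, 14),
    "gender": "F",
}
print(generalize_record(record))
```

Whether a 3-digit zip prefix and a birth year are coarse enough depends on the population. Generalization reduces uniqueness but does not guarantee anonymity on its own.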

Framework Context

LINDDUN I -- Identifiability

Identifiability occurs when a person can be identified from data that is supposed
to be anonymous or pseudonymous. Read
`../../shared/frameworks/linddun.md` for the
full LINDDUN framework reference including re-identification attack patterns and
regulatory definitions.

Privacy Property Violated: Anonymity / Pseudonymity

STRIDE Mapping: Information Disclosure (identifiability focuses specifically
on re-identification of anonymized data rather than general data access)

Workflow

Step 1 -- Determine Scope

1. Parse the `--scope` flag (default: `changed`).
2. Resolve it to a concrete file list.
3. Filter to relevant files: data models, API handlers, data export logic, analytics pipelines, logging configuration, database schemas, and anonymization utilities.
4. Prioritize files containing: user data structures, data export endpoints, log statements with user context, report generation, and data sharing logic.
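
The filtering step can be approximated with a simple path-keyword match. This is a minimal sketch: the keyword list and the function `filter_relevant` are assumptions made for illustration, and how `changed` is resolved to a file list is not specified here, so the input list is hard-coded.

```python
# Hypothetical keywords loosely derived from the file categories named above.
RELEVANT_KEYWORDS = (
    "model", "schema", "export", "analytics",
    "log", "anonym", "handler", "report", "user",
)


def filter_relevant(paths: list[str]) -> list[str]:
    """Keep paths whose name or directory hints at user-data handling."""
    relevant = []
    for path in paths:
        lowered = path.lower()
        if any(word in lowered for word in RELEVANT_KEYWORDS):
            relevant.append(path)
    return relevant


changed = [
    "src/models/user.py",
    "src/api/export_handler.py",
    "docs/README.md",
    "src/utils/math.py",
]
print(filter_relevant(changed))
```

Substring matching is deliberately crude (for example, `log` also matches `catalog`), so a filter like this is only a first pass before reading the files.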

Step 2 -- Analyze for Identifiability Patterns

Read each scoped file and assess re-identification risk:

1. Identify direct identifiers: Find fields like name, email, phone, SSN, or national ID that should not appear in anonymous contexts.
2. Identify quasi-identifiers: Find combinations of fields (zip code, age, gender, job title) that together may uniquely identify individuals.
3. Check anonymization logic: Verify that anonymization techniques are actually applied and are sufficient (not just removing the name field).
4. Assess API responses: Check whether endpoints return more personal attributes than the consumer needs.
5. Examine logs and error messages: Look for PII appearing in log output, stack traces, or debug messages.

At `--depth deep` or `--depth expert`, model quasi-identifier combinations and
estimate uniqueness across the population.
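
One way to estimate uniqueness is to group records by their quasi-identifier values and look at the group sizes. The sketch below computes k (the size of the smallest group) and the fraction of records that are unique. The function name `k_anonymity` and the sample rows are hypothetical, and the result describes only the dataset at hand, not the wider population.

```python
from collections import Counter


def k_anonymity(rows: list[dict], quasi_identifiers: list[str]) -> tuple[int, float]:
    """Return (k, unique_fraction) for the given quasi-identifier columns.

    k is the size of the smallest equivalence class; unique_fraction is the
    share of rows whose quasi-identifier combination appears exactly once.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    # Count how many rows share each combination of quasi-identifier values.
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    k = min(classes.values())
    unique_rows = sum(1 for size in classes.values() if size == 1)
    return k, unique_rows / len(rows)


rows = [
    {"zip_code": "90210", "birth_year": 1985, "gender": "F"},
    {"zip_code": "90210", "birth_year": 1985, "gender": "F"},
    {"zip_code": "90210", "birth_year": 1990, "gender": "M"},
    {"zip_code": "10001", "birth_year": 1985, "gender": "F"},
]
k, unique_fraction = k_anonymity(rows, ["zip_code", "birth_year", "gender"])
print(k, unique_fraction)  # 1 0.5
```

A result of k = 1 means at least one record is unique on these attributes, which is the condition that makes linking against an outside dataset feasible.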

Step 3 -- Report Findings

Output findings per `../../shared/schemas/findings.md`.
Each finding needs: IDENT-NNN id, title, severity (based on directness of
identification and data sensitivity), location with snippet, description of what
enables identification, impact (re-identification harm), fix (anonymization,
generalization, or suppression), and CWE/LINDDUN references.

Analysis Checklist

- Are direct identifiers (name, email, phone, SSN) present in data exports or analytics?
- Do API responses return more user attributes than the consumer actually needs?
- Are quasi-identifiers (zip code, birth date, gender) combined in any output?
- Is anonymization actually implemented, or just assumed in comments?
- Do logs contain IP addresses, user agents, or device identifiers alongside actions?
- Can database queries return single-user results from "anonymous" tables?
- Are email addresses or phone numbers used as primary keys or foreign keys?
- Do error messages or stack traces expose personal data fields?

What to Look For

- PII in log statements: Personal data written to application logs.
  - Grep: `log\.\w+\(.*email|logger\.\w+\(.*name|console\.log\(.*phone|print\(.*ssn`
- Email or phone as primary key: Using direct identifiers as database keys.
  - Grep: `PRIMARY KEY.*email|primary_key.*email|@Column.*email.*unique|findByEmail|findByPhone`
- IP address logging: Recording IP addresses without anonymization.
  - Grep: `req\.ip|request\.remote_addr|X-Forwarded-For|ip_address|ipAddress|getRemoteAddr`
- Over-fetched API responses: SELECT * or returning full user objects.
  - Grep: `SELECT \*.*FROM.*user|\.findAll\(|\.find\(\{\}\)|res\.json\(user\)|JSON\.stringify\(user`
- Insufficient anonymization: Removing names but keeping detailed attributes.
  - Grep: `anonymize|anonymise|deidentify|de_identify|pseudonymize|mask.*data`
- Quasi-identifier combinations: Multiple demographic fields in the same record.
  - Grep: `zip_code.*birth_date|zipCode.*gender|age.*location|dateOfBirth.*address`
- User agent collection: Storing full browser fingerprint strings.
  - Grep: `user-agent|userAgent|navigator\.userAgent|req\.headers\[.user-agent.\]`
- Data exports without scrubbing: Export endpoints that dump raw user data.
  - Grep: `export.*user|download.*report|csv.*user|toCSV|toJSON.*user`
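
To show how patterns like these behave on real lines, the sketch below applies three of them with Python's `re` module. The labels, the function `scan_source`, and the sample source text are assumptions made for illustration. The skill itself only specifies the patterns, not how they are run.

```python
import re

# Three of the patterns listed above, keyed by hypothetical labels.
PATTERNS = {
    "pii_in_logs": re.compile(
        r"log\.\w+\(.*email|logger\.\w+\(.*name|console\.log\(.*phone|print\(.*ssn"
    ),
    "ip_logging": re.compile(
        r"req\.ip|request\.remote_addr|X-Forwarded-For|ip_address|ipAddress|getRemoteAddr"
    ),
    "over_fetch": re.compile(
        r"SELECT \*.*FROM.*user|\.findAll\(|\.find\(\{\}\)|res\.json\(user\)|JSON\.stringify\(user"
    ),
}


def scan_source(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, label, stripped_line) for every pattern hit in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
    return hits


sample = '''
log.info("login ok for " + user.email)
rows = db.query("SELECT * FROM users WHERE id = ?", uid)
total = a + b
'''
for hit in scan_source(sample):
    print(hit)
```

Matches are candidates, not findings: a hit on `email` inside a log call still needs a read of the surrounding code to confirm that personal data actually reaches the log.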

Regulatory Mapping

| Regulation | Provision | Relevance |
|------------|-----------|-----------|
| GDPR Recital 26 | Identifiability test | Data is personal if the subject can be identified by means reasonably likely to be used |
| GDPR Art. 4(5) | Pseudonymization definition | Pseudonymized data is still personal data |
| GDPR Art. 25 | Data protection by design | Anonymization must be effective by design |
| HIPAA Safe Harbor | 18 identifier categories | All 18 must be removed for de-identification under this method |
| CCPA 1798.140(h) | Deidentified information | Reasonably cannot be linked to a consumer |
| CCPA 1798.140(o) | Personal information | Includes information that identifies or could be linked |

Output Format

Use finding ID prefix IDENT (e.g., IDENT-001, IDENT-002).

All findings follow the schema in
`../../shared/schemas/findings.md` with:

- `references.cwe`: CWE-359 or CWE-200 as appropriate
- `references.owasp`: A02:2021 (Cryptographic Failures -- weak anonymization)
- `metadata.tool`: "identifying"
- `metadata.framework`: "linddun"
- `metadata.category`: "I"
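
A finding carrying these values might be assembled as in the sketch below. The full schema lives in `findings.md` and is not reproduced here, so the nesting, the `location` keys, and the helper `make_finding` are assumptions. Only the ID prefix and the `references` and `metadata` values come from this document.

```python
import json


def make_finding(number: int, title: str, severity: str,
                 file: str, line: int, snippet: str) -> dict:
    """Build a finding dict; the layout is an assumed shape, not the official schema."""
    return {
        "id": f"IDENT-{number:03d}",  # IDENT-NNN, zero-padded to three digits
        "title": title,
        "severity": severity,
        "location": {"file": file, "line": line, "snippet": snippet},
        "references": {"cwe": "CWE-359", "owasp": "A02:2021"},
        "metadata": {"tool": "identifying", "framework": "linddun", "category": "I"},
    }


finding = make_finding(
    1,
    "Email address written to application log",
    "high",
    "src/auth/login.py",  # hypothetical path
    42,
    'log.info("login ok for " + user.email)',
)
print(json.dumps(finding, indent=2))
```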

Summary table after all findings:

| Identifiability Pattern     | Critical | High | Medium | Low |
|-----------------------------|----------|------|--------|-----|
| Direct PII exposure         |          |      |        |     |
| PII in logs                 |          |      |        |     |
| Quasi-identifier combos     |          |      |        |     |
| Insufficient anonymization  |          |      |        |     |
| Over-fetched API responses  |          |      |        |     |
| IP / device tracking        |          |      |        |     |

Followed by: top 3 priorities, re-identification risk assessment, and overall assessment.
