Technical Memo · Anthropic Safety · June 2026

Longitudinal Behavioral Analysis:
A Missing Safety Layer

How metadata already available to Anthropic could distinguish legitimate users from bad actors — without reading a single word of content.

From: M. M. Carvalho, independent developer, Maricá, Brazil
To: Anthropic Product & Safety teams
Date: June 13, 2026
Context: Written in response to the Fable 5 export control incident

The Triggering Event

On June 13, 2026, Anthropic suspended global access to Fable 5 and Mythos 5 following a US government export control directive. The practical result: every user outside the United States — regardless of history, intent, or usage pattern — lost access simultaneously.

This memo does not address the geopolitical dimension of that decision. It addresses a specific technical gap it revealed: Anthropic's safety infrastructure has no mechanism to distinguish a user with 18 months of consistent, legitimate usage from an account created yesterday with unknown intent.

That gap is not inherent to the problem. It is a design choice — and it is fixable with data Anthropic already has.

The Current Safety Architecture

Anthropic's published defense-in-depth strategy describes four layers (rows 1–4 below). Row 5 is the additional layer this memo proposes:

Layer	What it analyzes	Time horizon
1 — Access controls	Deployment context and expected user group	Static / account-level
2 — Real-time classifiers	Individual prompt and completion content	This message
3 — Async monitoring	Session-level patterns and aggregate behavior	This session / this week
4 — Post-hoc jailbreak detection	Known adversarial patterns after the fact	Recent history
5 — Longitudinal behavioral analysis	User behavior patterns over months of history	Full account lifetime

Layers 1–4 work from a static account context or a short window: the current message, session, or week. None of them looks back far enough to understand who the user is over time. Layer 5 does not currently exist.

What Longitudinal Analysis Would Measure

The critical constraint: no content is read or stored. The analysis operates entirely on behavioral metadata — signals that are already generated as a byproduct of normal platform operation.
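To make the constraint concrete, here is one way the per-session input could be typed so that it has no place to put content. This is a minimal sketch; the `SessionMetadata` name, its fields, and the `token_mix` category counts (assumed to come from an upstream step that emits only counts) are hypothetical, not an existing Anthropic schema.

```python
from dataclasses import dataclass, fields
from datetime import datetime

@dataclass(frozen=True)
class SessionMetadata:
    """Hypothetical per-session record: behavioral metadata only, no message text."""
    account_id: str
    started_at: datetime            # session start time
    duration_s: float               # wall-clock session length, seconds
    turns: int                      # number of user turns in the session
    context_tokens: int             # peak context-window tokens consumed
    conversation_id: str            # identifies the context, so returns to it can be counted
    features_used: frozenset[str]   # e.g. frozenset({"projects", "code"})
    token_mix: dict[str, int]       # aggregate token counts per category (assumed upstream), e.g. {"code": 900}

# Guard that the schema never grows a field capable of holding content.
CONTENT_FIELDS = {"prompt", "completion", "content", "text", "messages"}
assert not CONTENT_FIELDS & {f.name for f in fields(SessionMetadata)}
```

The module-level assertion is a cheap design guard: if someone later adds a field named like a content carrier, importing the module fails.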

Signal 01

Temporal Pattern

Account age, usage frequency, and session regularity over months. Consistent use across weeks is evidence of intent that a new account cannot demonstrate.
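A minimal sketch of Signal 01, assuming only the account creation time and session start timestamps are available. The function name and the chosen features (fraction of weeks with any activity, and the coefficient of variation of gaps between sessions) are illustrative assumptions, not a specified method.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev

def temporal_features(created: datetime, sessions: list[datetime], now: datetime) -> dict[str, float]:
    """Hypothetical temporal-pattern features computed from timestamps only."""
    age_days = (now - created).total_seconds() / 86400
    total_weeks = max(1, int(age_days // 7) + 1)
    # Week index of each session, counted from account creation.
    active_weeks = {(s - created).days // 7 for s in sessions}
    active_fraction = len(active_weeks) / total_weeks
    ordered = sorted(sessions)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    # Coefficient of variation of inter-session gaps: 0.0 means perfectly regular.
    if len(gaps) >= 2 and mean(gaps) > 0:
        gap_cv = pstdev(gaps) / mean(gaps)
    else:
        gap_cv = float("nan")
    return {"age_days": age_days, "active_week_fraction": active_fraction, "gap_cv": gap_cv}
```

For an account with one session every week over ten weeks, `active_week_fraction` is 1.0 and `gap_cv` is 0.0; sporadic or bursty use pushes the first down and the second up.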

Signal 02

Session Structure

Average session duration, turns per conversation, and context-window consumption. Long, dense sessions are characteristic of development work; short, targeted sessions are more consistent with extraction.
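A minimal sketch of Signal 02. The 200,000-token `context_window` and the `dense_turns` cutoff of 10 are illustrative assumptions, not platform values; the point is that every input is a count or a duration.

```python
from statistics import mean
from typing import NamedTuple

class Session(NamedTuple):
    duration_s: float     # wall-clock length, seconds
    turns: int            # user turns in the session
    context_tokens: int   # peak context-window tokens consumed

def session_structure(sessions: list[Session], context_window: int = 200_000,
                      dense_turns: int = 10) -> dict[str, float]:
    """Summarize session shape; both keyword defaults are assumed, untuned values."""
    return {
        "mean_duration_min": mean(s.duration_s for s in sessions) / 60,
        "mean_turns": mean(s.turns for s in sessions),
        "mean_context_fill": mean(s.context_tokens / context_window for s in sessions),
        # Share of sessions that look like sustained work rather than a quick query.
        "dense_share": sum(s.turns >= dense_turns for s in sessions) / len(sessions),
    }
```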

Signal 03

Domain Fingerprint

Aggregate token-distribution statistics across sessions reveal the domain without revealing content. Code-heavy, text-heavy, and structured-data usage are distinguishable without reading a word.
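A minimal sketch of Signal 03, assuming some upstream step already reduces each session to per-category token counts (for example `{"code": 900, "prose": 100}`); how those categories are assigned is outside this sketch. "Stability" here is a hypothetical metric: the fraction of sessions whose dominant category matches the account's most common one.

```python
from collections import Counter

def dominant_domain(token_mix: dict[str, int]) -> str:
    """Category with the largest token count in one session."""
    return max(token_mix, key=token_mix.get)

def domain_fingerprint(mixes: list[dict[str, int]]) -> dict[str, object]:
    if not mixes:
        raise ValueError("need at least one session")
    # Aggregate share of each category across all sessions.
    totals: Counter = Counter()
    for mix in mixes:
        totals.update(mix)
    grand = sum(totals.values())
    shares = {k: v / grand for k, v in totals.items()}
    # Stability: how often a session's dominant category matches the modal one.
    dominants = [dominant_domain(m) for m in mixes]
    modal, count = Counter(dominants).most_common(1)[0]
    return {"shares": shares, "modal_domain": modal, "stability": count / len(mixes)}
```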

Signal 04

Iteration Behavior

How often the user returns to the same context. Iterative development has a distinct signature. One-shot information extraction does not.
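One simple way to quantify Signal 04, assuming each session carries an identifier for the context it works in (a conversation or project ID). The metric below is a hypothetical choice: the fraction of sessions, after the first, that return to a context already seen.

```python
def iteration_ratio(conversation_ids: list[str]) -> float:
    """Fraction of sessions (after the first) that revisit an already-seen context."""
    seen: set[str] = set()
    returns = 0
    for cid in conversation_ids:
        if cid in seen:
            returns += 1
        seen.add(cid)
    # The first session can never be a return, so it is excluded from the denominator.
    return returns / max(1, len(conversation_ids) - 1)
```

For the session sequence `["a", "a", "b", "a", "b"]` three of the four later sessions are returns, giving 0.75; a stream of always-new contexts scores 0.0.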

Signal 05

Feature Usage

Which platform features are used — Projects, Code, export, sharing. Builders use the platform differently from extractors.
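A minimal sketch of Signal 05 as a per-feature usage rate. The feature identifiers in `FEATURES` are hypothetical stand-ins for Projects, Code, export, and sharing; the input is only which features each session touched.

```python
from collections import Counter

FEATURES = ("projects", "code", "export", "sharing")  # hypothetical feature identifiers

def feature_usage(sessions: list[set[str]]) -> dict[str, float]:
    """Share of sessions in which each platform feature was used at least once."""
    if not sessions:
        raise ValueError("need at least one session")
    counts = Counter(f for used in sessions for f in used if f in FEATURES)
    n = len(sessions)
    return {f: counts[f] / n for f in FEATURES}
```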

Signal 06

Behavioral Consistency

Whether usage patterns are stable over time or show sudden anomalous spikes. Legitimate users tend to have consistent patterns; coordinated misuse tends not to.
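A minimal sketch of Signal 06: compare each week's activity to the account's own trailing baseline with a z-score. The 8-week window, the threshold of 3.0, and the floor on the standard deviation are illustrative assumptions, not tuned values.

```python
from statistics import mean, pstdev

def spike_weeks(weekly_counts: list[int], window: int = 8, z_threshold: float = 3.0) -> list[int]:
    """Indices of weeks whose activity far exceeds the preceding `window` weeks."""
    flagged = []
    for i in range(window, len(weekly_counts)):
        baseline = weekly_counts[i - window:i]
        mu, sigma = mean(baseline), pstdev(baseline)
        # Floor sigma at 1.0 so a perfectly flat history does not divide by zero.
        z = (weekly_counts[i] - mu) / max(sigma, 1.0)
        if z > z_threshold:
            flagged.append(i)
    return flagged
```

Because the baseline is the account's own history, the same absolute volume can be normal for one user and anomalous for another.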

How It Would Work in Practice

SCENARIO A — Same prompt, different users:
"How do buffer overflow vulnerabilities work?"

User X: Account created 3 days ago. 
        4 sessions total. 
        No consistent domain pattern.
        → Layer 5 trust score: LOW
        → Existing classifiers: ELEVATED sensitivity

User Y: Account active 14 months.
        Consistent development sessions.
        Domain fingerprint: software architecture.
        → Layer 5 trust score: HIGH
        → Existing classifiers: REDUCED sensitivity

Same prompt. Different treatment. No content read.

SCENARIO B — Fable 5 access decision:
Instead of: all users outside the United States suspended equally
With Layer 5: users with established behavioral history
             receive differentiated treatment
             while new accounts face standard restrictions
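Scenario B could reduce to a policy function like the hypothetical one below. The `Access` levels, the cutoff of tier 3 for "established", and what differentiated access means in practice are assumptions; the memo does not specify them.

```python
from enum import Enum

class Access(Enum):
    FULL = "full"
    DIFFERENTIATED = "differentiated"   # assumed: case-by-case or reduced-capability access
    RESTRICTED = "restricted"           # the standard restriction applied to everyone today

def access_decision(trust_tier: int, restriction_in_force: bool, established_tier: int = 3) -> Access:
    """Hypothetical policy: under a restriction, only established accounts are treated differently."""
    if not 1 <= trust_tier <= 4:
        raise ValueError("trust_tier must be in 1..4")
    if not restriction_in_force:
        return Access.FULL
    return Access.DIFFERENTIATED if trust_tier >= established_tier else Access.RESTRICTED
```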

What This Solves

False positives. The most visible failure of the Fable 5 launch was legitimate researchers being blocked for prompts containing words like "hello" and "cancer." A trust modifier derived from longitudinal analysis would reduce classifier sensitivity for established users, producing fewer false positives without reducing protection where it matters.
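"Without reducing protection" is checkable: on a labeled evaluation set, compare the false positive rate and the true positive rate with and without the trust modifier. The sketch below assumes such labels exist; the function name is hypothetical.

```python
def flag_rates(is_harmful: list[bool], flagged: list[bool]) -> tuple[float, float]:
    """Return (false_positive_rate, true_positive_rate) for one classifier configuration."""
    if len(is_harmful) != len(flagged):
        raise ValueError("inputs must be the same length")
    benign = [f for h, f in zip(is_harmful, flagged) if not h]
    harmful = [f for h, f in zip(is_harmful, flagged) if h]
    fpr = sum(benign) / len(benign) if benign else float("nan")
    tpr = sum(harmful) / len(harmful) if harmful else float("nan")
    return fpr, tpr
```

The goal is a lower first number with an unchanged second number: fewer benign prompts blocked, the same share of harmful ones still caught.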

Adversarial account creation. Bad actors typically use fresh accounts. Longitudinal analysis makes the platform structurally hostile to this pattern: a new account cannot fake 14 months of consistent, coherent usage history, because that history can only be accumulated in real time.

Geopolitical resilience. A differentiated access model based on behavioral history — not nationality — provides a principled, defensible basis for access decisions that does not depend entirely on jurisdiction.

User trust. A user with a long, clean behavioral record has implicitly earned a different level of trust. Treating them identically to a new account is not just technically inaccurate — it is a signal that the platform does not value the relationship.

What This Does Not Solve

This is not a complete safety solution. A sophisticated actor with patience could build behavioral history deliberately. Nation-state level operations may have resources to do so. Geopolitical access restrictions require geopolitical responses.

The claim is narrower and more defensible: longitudinal behavioral analysis is a low-cost, privacy-preserving layer that improves precision across the entire safety stack — reducing false positives for legitimate users while making the platform structurally harder for opportunistic bad actors.

Implementation cost vs. impact
The data required for this analysis already exists as a byproduct of normal platform operation. No new data collection is required. No content needs to be read or stored. The infrastructure investment is in the analysis pipeline and the trust score integration with existing classifiers, not in generating new data.

Recommendation

Add longitudinal behavioral analysis as a fifth layer in the defense-in-depth architecture. Output a per-user trust modifier that existing classifiers can consume as a weight — not as an override, but as a context signal that adjusts sensitivity thresholds based on demonstrated usage history.
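"A weight, not an override" can be made precise with a bounded threshold shift. In this minimal sketch, a positive trust modifier raises the flagging threshold (lower sensitivity) and a negative one lowers it. The cap of 0.1 and the clamp to [0.5, 0.95] are hypothetical values chosen so that trust can never switch a classifier off.

```python
def adjusted_threshold(base: float, trust_modifier: float,
                       max_shift: float = 0.1, floor: float = 0.5, ceiling: float = 0.95) -> float:
    """Shift a classifier threshold by a bounded trust modifier (all bounds are assumptions)."""
    # Cap the shift so no trust level moves the threshold by more than max_shift.
    shift = max(-max_shift, min(max_shift, trust_modifier))
    # Clamp the result so the classifier always remains active.
    return max(floor, min(ceiling, base + shift))

def should_flag(score: float, base: float, trust_modifier: float) -> bool:
    return score >= adjusted_threshold(base, trust_modifier)
```

Applied to Scenario A with a base threshold of 0.7 and a classifier score of 0.75 for the same prompt, User X (modifier -0.1) is flagged and User Y (modifier +0.1) is not, while a clearly harmful score of 0.99 is flagged for both.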

Start with a simple version: combine account age, usage consistency, and domain stability (for example, as a product of normalized scores) and bin the result into a trust tier (1–4). Apply the tier as a modifier to real-time classifier thresholds. Measure the change in false positive rate. Iterate.
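A minimal sketch of that starting formula. It assumes usage consistency and domain stability are already scores in [0, 1]; normalizing age by saturating at one year and the bin edges in `TIER_CUTS` are illustrative assumptions to be calibrated, not recommended values.

```python
from bisect import bisect_right

TIER_CUTS = (0.25, 0.5, 0.75)  # illustrative bin edges, not calibrated values

def trust_tier(age_days: float, usage_consistency: float, domain_stability: float,
               age_saturation_days: float = 365.0) -> int:
    """Multiply the three normalized signals and bin the product into tiers 1-4."""
    for name, v in (("usage_consistency", usage_consistency), ("domain_stability", domain_stability)):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    age_score = min(max(age_days, 0.0) / age_saturation_days, 1.0)
    score = age_score * usage_consistency * domain_stability
    # bisect_right counts how many cut points are <= score, giving 0..3; shift to 1..4.
    return bisect_right(TIER_CUTS, score) + 1
```

A 3-day-old account lands in tier 1 almost regardless of its other scores, and a 14-month account with consistency 0.9 and stability 0.85 lands in tier 4. The multiplicative form means any single weak signal pulls the tier down, which is the conservative choice for a first version.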

The Fable 5 incident made the cost of the current gap visible and concrete. The fix does not require new research — it requires connecting data that already exists to decisions that are already being made.