Authorship Requires Process, Not Output

Authorship cannot be determined from finished text, because the text reflects only the outcome of writing, not the process. It must instead be established by observing and recording how the content is created.

Shoko Plambeck

Head of Marketing

The question of authorship has been misframed.

Most current approaches attempt to determine who wrote a document by analyzing the document itself. This assumes that authorship is encoded in the final artifact and can be extracted through statistical or stylistic analysis. That assumption is incorrect.

Authorship is not a property of the output. It is a property of the process that produced the output.

Defining Authorship Precisely

Authorship, in operational terms, answers two questions:

Who generated this content, and under what conditions was it created?

This definition is stricter than common usage. It does not refer to stylistic resemblance or likelihood. It refers to actual origin and the circumstances of production.

A document has a single true authorship history. That history includes actions such as typing, editing, revising, and structuring. It also includes the tools and inputs used during creation.

The final text is only the endpoint of that history. It does not contain the history itself.

The Limits of Artifact-Based Inference

When systems attempt to infer authorship from completed text, they operate on incomplete information.

They observe surface-level features such as vocabulary distribution, sentence structure, and fluency patterns. These features appear in both human-written and machine-generated text.

As a result, the inference problem is underdetermined. Multiple distinct creation processes can produce indistinguishable outputs.

A human can write in a highly structured, predictable style. A language model can produce irregular or idiosyncratic phrasing. The mapping from process to output is many-to-one.
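The many-to-one point can be made concrete with a toy model. The sketch below is illustrative only: the `Event` type and the `replay` function are hypothetical names invented for this example, not part of any real tool. It shows two different creation histories that end in the same text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One hypothetical editing action (names are assumptions for this sketch)."""
    kind: str        # "type", "paste", or "delete"
    text: str = ""   # characters appended (for "type" and "paste")
    count: int = 0   # characters removed from the end (for "delete")


def replay(events):
    """Apply events in order to an empty buffer and return the final text."""
    buffer = ""
    for e in events:
        if e.kind in ("type", "paste"):
            buffer += e.text
        elif e.kind == "delete":
            buffer = buffer[:max(0, len(buffer) - e.count)]
        else:
            raise ValueError(f"unknown event kind: {e.kind}")
    return buffer


# History A: typed by hand, with a typo that is corrected along the way.
typed = [
    Event("type", "The quick"),
    Event("type", " brwn"),
    Event("delete", count=3),
    Event("type", "rown fox"),
]

# History B: the whole sentence arrives in a single paste.
pasted = [Event("paste", "The quick brown fox")]

same_output = replay(typed) == replay(pasted)
same_process = typed == pasted
```

Here `same_output` is true while `same_process` is false. Anything that sees only the result of `replay` has no way to tell the two histories apart.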

Without access to the process, authorship cannot be resolved with certainty.

Process-Based Evidence

If authorship cannot be recovered from the artifact, it must be established through observation of the creation process itself.

Process-based evidence records what actually occurs during writing. It does not attempt to reconstruct events after the fact.

This includes:

Timing patterns, such as pauses, bursts, and latency between inputs
Revision behavior, including insertions, deletions, and restructuring
Input modality, such as keyboard entry, paste actions, or assisted generation

These signals are not stylistic guesses. They are direct observations of behavior.

When captured during composition, they form a continuous record of authorship activity.
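One way to picture such a record is as an append-only list of timestamped events. The following is a minimal sketch under stated assumptions: the `InputEvent` fields, the modality labels, and the JSON Lines serialization are all hypothetical choices made for illustration, not a description of any particular product.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InputEvent:
    """A single observed action (field names are assumptions for this sketch)."""
    t_ms: int         # milliseconds since the writing session started
    modality: str     # "keyboard", "paste", or "assisted"
    action: str       # "insert" or "delete"
    position: int     # character offset where the action applies
    text: str = ""    # inserted text (empty for deletions)
    length: int = 0   # number of characters deleted


def latencies(events):
    """Gaps in milliseconds between consecutive events (pauses and bursts)."""
    return [b.t_ms - a.t_ms for a, b in zip(events, events[1:])]


def to_jsonl(events):
    """Serialize the record as one JSON object per line, in capture order."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


session = [
    InputEvent(0, "keyboard", "insert", 0, "H"),
    InputEvent(140, "keyboard", "insert", 1, "i"),
    InputEvent(2600, "keyboard", "delete", 1, length=1),
    InputEvent(2750, "paste", "insert", 1, "ello"),
]

gaps = latencies(session)   # [140, 2460, 150]
record = to_jsonl(session)
```

The long gap before the deletion and the paste that follows it are exactly the kind of detail that the finished word "Hello" does not carry.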

The Role of Behavioral Signals

Behavioral signals provide a measurable and attributable trace of creation.

They answer questions that the final text cannot:

Was the document typed continuously or assembled from pasted segments?
Did the writer revise iteratively or produce large blocks instantaneously?
What sequence of actions led to the final structure?

These signals do not rely on interpretation of language. They rely on observation of actions.
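As a rough sketch of how questions like these could be answered from recorded actions, the example below summarizes a short event list. The tuple layout, the `summarize` function, and the `burst_chars` threshold are assumptions chosen for illustration; a real system would define its own schema and thresholds.

```python
def summarize(events, burst_chars=10):
    """Summarize recorded actions without looking at the language of the text.

    Each event is a tuple (t_ms, modality, action, n_chars). This layout is
    an assumption made for this sketch.
    """
    inserted = sum(n for _, _, action, n in events if action == "insert")
    pasted = sum(
        n for _, modality, action, n in events
        if action == "insert" and modality == "paste"
    )
    deletions = sum(1 for _, _, action, _ in events if action == "delete")
    large_blocks = sum(
        1 for _, _, action, n in events
        if action == "insert" and n >= burst_chars
    )
    return {
        "pasted_fraction": pasted / inserted if inserted else 0.0,
        "deletion_events": deletions,
        "large_block_insertions": large_blocks,
    }


events = [
    (0, "keyboard", "insert", 1),
    (150, "keyboard", "insert", 1),
    (400, "keyboard", "delete", 1),
    (650, "keyboard", "insert", 1),
    (900, "keyboard", "insert", 1),
    (4000, "paste", "insert", 12),
]

summary = summarize(events)
# {'pasted_fraction': 0.75, 'deletion_events': 1, 'large_block_insertions': 1}
```

None of these values depends on vocabulary or style. They are counts of what was done, which is why they can be checked against the record rather than argued about.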

This distinction matters. Language can be imitated. Behavior is harder to fabricate without leaving detectable traces.

Analogy to Established Systems

Other domains have already solved analogous problems by shifting from artifact to process.

In distributed systems, logs record every transaction and state change. If a discrepancy arises, the system does not infer what happened from the final state. It consults the log.

In finance, audit trails track each modification to a record. The integrity of the system depends on the ability to reconstruct events, not guess them.

In cryptography, digital signatures bind a signer's identity to a specific message at the time of signing. Verification relies on that binding, not on post hoc analysis.

Authorship requires the same structure: a verifiable record of creation events.

Why This Shift Is Necessary

Any system that attempts to infer authorship from output alone faces three constraints:

It cannot access the full set of relevant information
It cannot eliminate ambiguity between different creation processes
It cannot produce verifiable evidence

These are not engineering challenges. They are limitations imposed by missing data.

Observing the process resolves these constraints. It introduces new information that is directly tied to authorship.

Implications

Reframing authorship as a process property changes the solution space.

It rules out detection based on finished text as a source of evidence. It also clarifies the requirements for any credible alternative.

A valid system must:

Capture signals during creation
Bind those signals to the resulting document
Enable independent verification of that binding

Anything less reintroduces inference and uncertainty.
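The three requirements can be sketched with standard hashing primitives. This is a simplified illustration, not a description of how any existing system works: the function names are hypothetical, and a keyed HMAC stands in for a real digital signature. Because an HMAC key is shared between signer and verifier, a production design aiming at truly independent verification would more plausibly use an asymmetric signature.

```python
import hashlib
import hmac
import json


def chain_head(events):
    """Capture: fold each event into a running SHA-256 hash chain."""
    head = b"\x00" * 32
    for e in events:
        payload = json.dumps(e, sort_keys=True).encode("utf-8")
        head = hashlib.sha256(head + payload).digest()
    return head


def bind(events, document, key):
    """Bind: tag the event chain together with the hash of the final text."""
    doc_hash = hashlib.sha256(document.encode("utf-8")).digest()
    message = chain_head(events) + doc_hash
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(events, document, key, tag):
    """Verify: recompute the tag and compare in constant time."""
    return hmac.compare_digest(bind(events, document, key), tag)


key = b"demo-key-for-illustration-only"
events = [
    {"t_ms": 0, "action": "insert", "text": "Hi"},
    {"t_ms": 900, "action": "insert", "text": " there"},
]
document = "Hi there"

tag = bind(events, document, key)
ok = verify(events, document, key, tag)
edited_doc_ok = verify(events, "Hi there!", key, tag)
edited_log_ok = verify(events[:1], document, key, tag)
```

In this sketch `ok` is true, while changing either the document or the event record makes verification fail. Chaining the events means that reordering or dropping one changes every later hash, so the record cannot be quietly trimmed after the fact.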

Conclusion

Authorship cannot be extracted from text because it does not reside there.

It exists in the sequence of actions that produced the text. That sequence must be observed, recorded, and verifiable.

The distinction is simple but decisive.

Inference attempts to guess authorship after the fact. Process-based systems record it as it happens.

Only one of these approaches can produce evidence.
