Frequently Asked Questions

What is invisible data and how is it different from data I knowingly share?

Invisible data is collected passively, without being disclosed to you at the moment of capture. It includes behavioral telemetry such as scroll velocity and cursor hesitation, device fingerprints, temporal metadata and the inferences drawn from these signals. Unlike data you enter in a form, invisible data is never presented to you for review, and in most cases you cannot correct or delete it, even after filing a formal subject access request.

Why does AI model training make data ownership more urgent than earlier data collection practices?

When data is used for ad targeting, it is queried from a warehouse and the original record persists. When data enters an AI training corpus, its influence is encoded into model weights distributed across billions of parameters. There is no mechanism to trace which weights your data shaped, and the GDPR right to erasure has no practical enforcement pathway against a deployed model. Provenance collapses at the training step, which is precisely why ownership infrastructure must exist before training begins.

How does PDAOS use zero-knowledge proofs in a training pipeline?

PDAOS uses zk-SNARK constructions to allow a training pipeline to verify that valid consent exists for each data asset without exposing the consent record itself to the model. The pipeline generates a proof that consent scope covers the intended use and that no withdrawal signal has been issued. This verification happens before ingestion and the proof does not leak identity or consent terms into the training data.

What is a data fiduciary and why does the nonprofit structure matter?

A data fiduciary holds data under legally enforceable obligations of loyalty and care running toward the data subject rather than toward commercial third parties. The nonprofit structure of Own Your Data Inc is material because it removes the conflict created by the commercial incentives that make invisible data harvesting profitable for advertising platforms. Fiduciary obligation requires that the institution's interests align with those of the people whose data it holds.

Where can I learn more about The Invisible Data and the PDAOS project?

Volume 6 of The Invisible Series, The Invisible Data, is published through theinvisible.life. The PDAOS technical documentation and the MyDataKey sovereign identity product are available at mydatakey.org. Both resources are maintained by Own Your Data Inc, a 501(c)(3) nonprofit founded by Dr. Patrick Fisher.

Personal Data Ownership in the AI Era | OwnMyData.ai

Most people assume their data is the information they type. A name in a form. A credit card number at checkout. A search query entered deliberately. That assumption is wrong by several orders of magnitude.

In Volume 6 of The Invisible Series, The Invisible Data, Dr. Patrick Fisher describes a more accurate picture: the data that matters most to platform operators, advertisers and AI developers is precisely the data users never consciously generate. It is the data that is invisible to the person it describes. And in 2026, as large-scale AI training becomes the dominant commercial rationale for data collection, that invisibility has stopped being a design inconvenience and become a structural power asymmetry.

This article unpacks what invisible data is, why AI training pipelines intensify the problem and what the Personal Data Asset Origination System (PDAOS) is built to correct. Personal data ownership is the core concept threading through all of it.

What Invisible Data Actually Means

The term comes directly from Dr. Fisher's framing in The Invisible Data. Invisible data is not data you forgot you shared. It is data that was never surfaced to you at the moment of collection, that does not appear in any dashboard you can access and that you could not correct or delete even if you knew it existed.

Three properties define it. First, it is collected passively: scroll velocity, cursor hesitation, session resumption timing, battery level, ambient audio fingerprints, accelerometer patterns on mobile devices. None of these require a form submission. Second, it is aggregated across contexts you experience as separate: your fitness app, your browser, your smart TV and your loyalty card rewards program are treated as one profile by data brokers even though you never linked them. Third, it is used to generate derivative data: inferences, scores and predictions about you that have no direct relationship to anything you did.

The Data Privacy Vocabulary (DPV), maintained as a living specification by the W3C Data Privacy Vocabularies and Controls Community Group, formalizes some of these categories under its derived and inferred personal data classes. But formal recognition of a category does not mean users can see it, correct it or withdraw it from a training corpus after the fact.

A Taxonomy of What Gets Harvested Without Your Knowledge

To make the problem concrete, it helps to organize the collection surface into distinct layers.

Behavioral Telemetry

Every interaction with a digital interface produces a telemetry stream. Click patterns, reading depth, hover duration, typing cadence and navigation paths are captured by client-side JavaScript that ships with virtually every commercial web property. Most privacy policies do not disclose this telemetry with any granularity. Platforms describe it as a service-improvement mechanism, yet Federal Trade Commission enforcement actions have documented its use as raw material for psychographic profiling.
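To make the mechanism concrete, here is a minimal Python sketch of how a handful of raw client events become behavioral features of the kind described above. The `Event` type, the event names and the feature names are hypothetical and do not correspond to any particular analytics SDK; real collectors usually run in the browser as JavaScript and sample far more densely.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Event:
    t_ms: int               # client-side timestamp in milliseconds
    kind: str               # "scroll", "hover_start", "hover_end" or "click" (hypothetical names)
    offset_px: float = 0.0  # page offset, only meaningful for "scroll" events


def derive_signals(events: list[Event]) -> dict:
    """Collapse a raw event stream into features a profiler might keep."""
    scrolls = [e for e in events if e.kind == "scroll"]
    # Scroll velocity: pixels moved per second between consecutive scroll samples.
    velocities = [
        abs(b.offset_px - a.offset_px) / ((b.t_ms - a.t_ms) / 1000)
        for a, b in zip(scrolls, scrolls[1:])
        if b.t_ms > a.t_ms
    ]
    # Hover durations: time between each hover_start and the next hover_end.
    hovers, start = [], None
    for e in events:
        if e.kind == "hover_start":
            start = e.t_ms
        elif e.kind == "hover_end" and start is not None:
            hovers.append(e.t_ms - start)
            start = None
    return {
        "mean_scroll_px_per_s": mean(velocities) if velocities else 0.0,
        "max_hover_ms": max(hovers, default=0),
        "clicks": sum(1 for e in events if e.kind == "click"),
    }


stream = [
    Event(0, "scroll", 0), Event(500, "scroll", 400), Event(1000, "scroll", 600),
    Event(1200, "hover_start"), Event(2700, "hover_end"),  # 1.5 s of cursor hesitation
    Event(2900, "click"),
]
print(derive_signals(stream))
```

None of these features required the user to submit anything; they fall out of ordinary scrolling and pointing.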

Network and Device Signals

IP geolocation, Wi-Fi network names and device fingerprints (built from the combination of browser version, installed fonts, screen resolution and GPU rendering characteristics) all contribute to an identity graph that persists across cookie resets. The Electronic Frontier Foundation's Panopticlick research demonstrated how much identifying information such fingerprints carry in practice. GDPR's definition of personal data in Article 4(1), which expressly covers online identifiers, brings fingerprints within scope, but enforcement has lagged collection by years.
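The sketch below illustrates why fingerprints survive cookie resets and how their identifying power can be measured. It hashes a canonical string of device attributes (no cookie is involved) and computes the surprisal of the result, -log2 of the share of a population that has the same fingerprint, which is the kind of "bits of identifying information" measure Panopticlick reported. The attribute values and the eight-device population are invented for illustration.

```python
import hashlib
import math
from collections import Counter


def fingerprint(attrs: dict) -> str:
    """Hash a canonical string of device attributes; no cookie is involved."""
    canonical = "|".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def identifying_bits(fp: str, population: list[str]) -> float:
    """Surprisal of a fingerprint: -log2 of the share of the population that has it."""
    share = Counter(population)[fp] / len(population)
    return -math.log2(share)


device = {"browser": "Firefox 128", "fonts": "Arial,DejaVu Sans,Noto Color Emoji",
          "screen": "2560x1440", "gpu": "ANGLE (Intel Iris Xe)"}
common = {"browser": "Chrome 126", "fonts": "Arial,Helvetica",
          "screen": "1920x1080", "gpu": "ANGLE (NVIDIA)"}
population = [fingerprint(common)] * 7 + [fingerprint(device)]
print(identifying_bits(fingerprint(device), population))  # 1 in 8 devices -> 3.0 bits
```

Clearing cookies changes none of the inputs, so the same hash reappears on the next visit.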

Relational and Social Graph Data

When a platform holds data about your contacts, your communications metadata or your co-occurrence with other users in physical spaces, it generates relational data about people who have no account on that platform. This is the shadow profile problem. Your data does not only describe you. It describes people who never consented to that platform's terms at all.
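A minimal sketch, assuming a hypothetical platform that stores address-book uploads keyed by phone number, shows how shadow profiles form: anyone who appears in an upload gets a record, and the set of uploaders who know them is itself relational data about that person, whether or not they ever signed up. The function name and data layout are illustrative.

```python
from collections import defaultdict


def build_shadow_profiles(uploads: dict[str, list[str]], accounts: set[str]) -> dict[str, list[str]]:
    """Map each non-account phone number to the account holders who uploaded it.

    uploads: uploader's phone number -> phone numbers in that uploader's address book.
    """
    known_by = defaultdict(set)
    for uploader, contacts in uploads.items():
        for contact in contacts:
            known_by[contact].add(uploader)
    # Anyone who appears in an upload but never signed up still ends up with a record.
    return {c: sorted(u) for c, u in known_by.items() if c not in accounts}


uploads = {
    "+1-555-0100": ["+1-555-0200", "+1-555-0300"],
    "+1-555-0200": ["+1-555-0300", "+1-555-0400"],
}
accounts = {"+1-555-0100", "+1-555-0200"}
print(build_shadow_profiles(uploads, accounts))
```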

Temporal and Contextual Metadata

Timestamps at the millisecond level, sequence data showing what you did before and after a specific action, and contextual signals like time of day correlated with content type create temporal signatures that are often more predictive than the content itself. Research published through arXiv on metadata-based re-identification has shown repeatedly that temporal metadata alone can de-anonymize individuals in large datasets.
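The toy example below illustrates the re-identification point. It counts how many users are singled out by the first k (hour, place) points of their own trace, loosely in the spirit of the "unicity" measure used in mobility re-identification studies. The three traces are invented; real studies work with very large datasets and random point samples, but the logic is the same: a few timestamped points can match exactly one person.

```python
def matching_users(traces: dict[str, set], observed: set) -> list[str]:
    """Users whose trace contains every observed (hour, place) point."""
    return [user for user, points in traces.items() if observed <= points]


def unicity(traces: dict[str, set], k: int) -> float:
    """Share of users singled out by k points of their own trace (first k, for determinism)."""
    unique = sum(
        1 for user, points in traces.items()
        if matching_users(traces, set(sorted(points)[:k])) == [user]
    )
    return unique / len(traces)


traces = {
    "u1": {(8, "cafe"), (12, "office_a"), (18, "gym")},
    "u2": {(8, "cafe"), (12, "office_b"), (18, "gym")},
    "u3": {(9, "cafe"), (12, "office_a"), (18, "park")},
}
print(unicity(traces, 1), unicity(traces, 2))
```

No names or content appear anywhere in the data, yet two points per user are enough to tell all three apart.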

Why the AI Training Era Intensifies the Asymmetry

The collection practices described above predate large language models and multimodal AI systems. What changes in the current moment is not what gets collected. It is what gets done with it.

AI training pipelines consume data at a scale and with a finality that earlier use cases did not. When a behavioral telemetry record is used to serve a targeted advertisement, that record is queried and then sits in a warehouse. When the same record enters a training corpus, it is encoded into model weights. The record's influence becomes distributed across billions of parameters. There is no audit trail connecting your behavioral data to the specific weights it shaped. There is no mechanism to withdraw your contribution after training completes. The GDPR right to erasure, specified under Article 17, has no practical enforcement pathway against a deployed model that was trained on your data.
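A deliberately tiny sketch of why erasure stops working at the training step: a two-parameter linear model fitted by ordinary least squares stands in for billions of parameters. Deleting the record from the `warehouse` list is one call, but the already-fitted weights still carry that record's influence until the model is retrained without it. In a model this small the influence could still be traced by hand; the sketch only shows that deleting stored data does nothing to parameters that have already absorbed it.

```python
def fit_line(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


warehouse = [(0, 0), (1, 1), (2, 2), (3, 9)]  # (3, 9) is one user's record
model = fit_line(warehouse)                   # "training": the record shapes the weights

warehouse.remove((3, 9))                      # erasure from the warehouse is one call...
print(model)                                  # ...but the trained weights are unchanged
print(fit_line(warehouse))                    # only retraining without the record removes it
```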

Dr. Fisher's analysis in The Invisible Data frames this as a provenance collapse. The chain of custody between a data subject and the downstream use of their data is severed at the point of model training. What remains is a system that benefits commercially from your data while offering you zero verifiable accounting of how it was used.

This is not a theoretical concern. The FTC's enforcement actions in recent years and the European Data Protection Board's guidance on AI training data have both flagged the gap between collection consent and training use as a material compliance problem. But compliance frameworks catch up to exploitation slowly. The technical infrastructure for provenance has to be built before the regulatory mandate arrives if it is to be meaningful.

Inferential Data: The Hidden Layer Most Frameworks Miss

Collected data is only the first layer of the asymmetry. The second layer is inferential data: conclusions drawn about you that you have never seen and cannot contest.

Inferential data includes credit risk scores derived from browsing behavior, health condition predictions derived from purchase history, political affiliation scores derived from social graph position and emotional state estimates derived from typing cadence. None of these are things you reported. All of them circulate in commercial data broker pipelines and feed back into decisions that affect your material life.

The ICO in the United Kingdom published guidance in 2023 specifically addressing inferred data under UK GDPR, acknowledging that inferences carry the same personal data status as directly collected information and trigger the same subject access rights. The practical problem is that most people do not know inferences exist, let alone how to file a subject access request for them.

AI systems make inference generation cheap and fast. A model trained on population-level behavioral data can produce inferences about a new user from a handful of interactions. The inference is generated before the user has any meaningful understanding of what the system is concluding about them. The asymmetry compounds: the platform's knowledge about you grows faster than your ability to understand or contest it.
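A minimal nearest-centroid sketch of that compounding effect: population data fixes two typing-cadence centroids, and four keystrokes from a new user are enough to assign them a label. The labels "calm" and "stressed", the feature choice and all numbers are hypothetical; the point is the mechanism, not the accuracy of any real emotion classifier.

```python
from statistics import mean


def train_centroids(labelled: list[tuple[tuple[float, float], str]]) -> dict:
    """Average the feature vectors of each label in a population dataset."""
    groups: dict[str, list] = {}
    for features, label in labelled:
        groups.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*rows)) for label, rows in groups.items()}


def cadence_features(key_times_ms: list[int], backspaces: int) -> tuple[float, float]:
    """(mean inter-key interval in ms, backspaces per keystroke) from a few keystrokes."""
    gaps = [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]
    return mean(gaps), backspaces / len(key_times_ms)


def infer(centroids: dict, features: tuple[float, float]) -> str:
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2 for a, b in zip(centroids[lbl], features)))


population = [((120, 0.02), "calm"), ((130, 0.03), "calm"),
              ((220, 0.10), "stressed"), ((240, 0.12), "stressed")]
centroids = train_centroids(population)
new_user = cadence_features([0, 210, 445, 673], backspaces=0)  # four keystrokes
print(infer(centroids, new_user))
```

The new user never sees the label, and nothing in the sketch gives them a way to contest it.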

How PDAOS Infrastructure Corrects Structural Asymmetry

The Personal Data Asset Origination System is the technical infrastructure developed through Own Your Data Inc to address the provenance collapse Dr. Fisher identifies in The Invisible Data. PDAOS operates on a core architectural principle: data must be treated as an asset with a verifiable chain of custody from the moment of origination.

An asset in the PDAOS model is not just a record. It is a record bound to a cryptographic identity, a consent receipt and a provenance anchor. The consent receipt is not a checkbox acknowledgment. It is a structured, machine-readable record conforming to the Kantara Initiative Consent Receipt Specification, extended in the PDAOS implementation to carry data-type granularity, purpose binding and downstream transfer restrictions.
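The sketch below shows what a machine-readable receipt with purpose binding might look like in Python. The core fields loosely mirror the Kantara v1.1 receipt (which uses JSON names such as `consentReceiptID`, `piiPrincipalId` and `consentTimestamp`); the simplified snake_case names and the three extension fields are assumptions standing in for the PDAOS extensions described above, not its published schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ConsentReceipt:
    # Core fields, loosely mirroring Kantara CR v1.1 (consentReceiptID, piiPrincipalId,
    # consentTimestamp, jurisdiction, policyUrl); names here are simplified.
    receipt_id: str
    principal_did: str           # data subject's DID, e.g. "did:example:alice"
    controller: str
    jurisdiction: str
    consent_timestamp: int       # Unix seconds
    policy_url: str
    # Hypothetical PDAOS-style extensions: data-type granularity, purpose binding,
    # downstream transfer restrictions.
    data_types: tuple[str, ...] = ()
    purposes: tuple[str, ...] = ()
    transfer_restrictions: tuple[str, ...] = ()

    def to_canonical_json(self) -> str:
        """Deterministic serialization, so the same receipt always hashes the same way."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))

    def permits(self, data_type: str, purpose: str) -> bool:
        """True only if both the data type and the purpose are explicitly bound."""
        return data_type in self.data_types and purpose in self.purposes


receipt = ConsentReceipt(
    receipt_id="cr-0001",
    principal_did="did:example:alice",
    controller="Example Health App",
    jurisdiction="EU",
    consent_timestamp=1767225600,
    policy_url="https://example.org/privacy",
    data_types=("step_count", "heart_rate"),
    purposes=("service_delivery", "ai_training"),
    transfer_restrictions=("no_resale", "eu_only"),
)
print(receipt.permits("heart_rate", "ai_training"))  # True
print(receipt.permits("location", "ai_training"))    # False: type not in scope
```

Making the receipt immutable and serializing it canonically is what later allows it to be hashed and signed without ambiguity.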

The provenance anchor uses content-addressed storage to create a tamper-evident record of what data existed, when it was created and under what consent terms. This anchoring happens at origination, before the data enters any processing pipeline. Downstream consumers of the data, including AI training infrastructure, receive a data asset that carries its own provenance history.
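A minimal sketch of content addressing, assuming an in-memory dictionary in place of a real content-addressed store: each record's address is the SHA-256 hash of its canonical JSON, so reading a record back and re-hashing it detects any change. The asset here carries the address of its consent receipt, which is one simple way (an assumption, not PDAOS's documented format) to make provenance travel with the data.

```python
import hashlib
import json


def content_address(record: dict) -> str:
    """Address = hash of the canonical bytes; any change to the record changes it."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


class ContentStore:
    """Toy content-addressed store; an in-memory dict stands in for real storage."""

    def __init__(self) -> None:
        self._blobs: dict[str, str] = {}

    def put(self, record: dict) -> str:
        address = content_address(record)
        self._blobs[address] = json.dumps(record, sort_keys=True)
        return address

    def get_verified(self, address: str) -> dict:
        record = json.loads(self._blobs[address])
        if content_address(record) != address:
            raise ValueError(f"tampered: content no longer matches {address}")
        return record


store = ContentStore()
# Origination: the asset is stored with its consent receipt's address and a timestamp.
receipt_addr = store.put({"receipt_id": "cr-0001", "purposes": ["ai_training"]})
asset_addr = store.put({"payload": "step_count:8412", "created": 1767225600,
                        "consent_receipt": receipt_addr})
print(store.get_verified(asset_addr)["consent_receipt"] == receipt_addr)  # True
```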

This architecture addresses the specific failure mode of the AI training era: the severing of the data subject from the downstream use. With PDAOS, the consent receipt travels with the data asset through its lifecycle. A training pipeline that ingests a PDAOS-anchored asset can verify consent scope before including the record. A data subject can query the provenance ledger to confirm what assets they originated and what consent terms applied at origination.

Consent Receipts and Cryptographic Provenance Anchoring

Consent receipts as a concept are not new. The Kantara Initiative published the Consent Receipt Specification as version 1.1 in 2017 and the work informed ISO/IEC 29184 on online privacy notices and consent. What PDAOS adds is the cryptographic binding that makes a consent receipt enforceable rather than merely documented.

In the PDAOS implementation, a consent receipt is signed by both the data subject (using their Decentralized Identifier, or DID, conforming to the W3C DID Core Specification) and the data processor at the moment of origination. The signed receipt is hashed and anchored to an append-only ledger. Neither party can alter the terms of the consent retroactively without producing a detectable fork in the provenance chain.
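The sketch below illustrates dual signing and an append-only hash chain under two loud assumptions: HMAC-SHA256 with shared secrets stands in for the asymmetric keys bound to each party's DID, and the ledger is an in-memory list. Each append commits to the previous head, so rewriting an earlier receipt forces a different head hash, which anyone holding an earlier copy of the head can detect as a fork.

```python
import hashlib
import hmac
import json


def _canonical(obj) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(key: bytes, payload: bytes) -> str:
    # Stand-in only: HMAC is symmetric. A DID-based deployment would sign with
    # each party's asymmetric key (for example Ed25519) instead.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def dual_sign(receipt: dict, subject_key: bytes, processor_key: bytes) -> dict:
    payload = _canonical(receipt)
    return {"receipt": receipt,
            "subject_sig": sign(subject_key, payload),
            "processor_sig": sign(processor_key, payload)}


def signatures_valid(signed: dict, subject_key: bytes, processor_key: bytes) -> bool:
    payload = _canonical(signed["receipt"])
    return (hmac.compare_digest(signed["subject_sig"], sign(subject_key, payload))
            and hmac.compare_digest(signed["processor_sig"], sign(processor_key, payload)))


class AppendOnlyLedger:
    """Hash chain: each entry commits to the previous head and to one signed receipt."""
    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []  # (receipt_hash, head_after_append)

    @property
    def head(self) -> str:
        return self.entries[-1][1] if self.entries else self.GENESIS

    def append(self, signed: dict) -> str:
        receipt_hash = hashlib.sha256(_canonical(signed)).hexdigest()
        new_head = hashlib.sha256((self.head + receipt_hash).encode()).hexdigest()
        self.entries.append((receipt_hash, new_head))
        return new_head

    def matches(self, signed_receipts: list[dict]) -> bool:
        """Recompute the chain from the receipts; any retroactive edit breaks it."""
        if len(signed_receipts) != len(self.entries):
            return False
        head = self.GENESIS
        for signed, (receipt_hash, recorded_head) in zip(signed_receipts, self.entries):
            rh = hashlib.sha256(_canonical(signed)).hexdigest()
            head = hashlib.sha256((head + rh).encode()).hexdigest()
            if rh != receipt_hash or head != recorded_head:
                return False
        return True


subject_key, processor_key = b"alice-secret", b"processor-secret"  # hypothetical keys
receipts = [dual_sign({"id": "cr-1", "purposes": ["ai_training"]}, subject_key, processor_key),
            dual_sign({"id": "cr-2", "purposes": ["analytics"]}, subject_key, processor_key)]
ledger = AppendOnlyLedger()
for r in receipts:
    ledger.append(r)
print(ledger.matches(receipts))  # True
```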

This matters for AI training pipelines in a specific way. When a PDAOS-anchored dataset is ingested for training, the pipeline can perform automated consent verification: checking that the purpose binding on each asset covers AI training as a declared use, that the data subject's DID has not issued a withdrawal signal and that the data type is within scope. Records that fail verification are excluded before training begins rather than after a regulatory complaint arrives.
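A minimal sketch of those three checks as a pre-ingestion filter. The `AnchoredAsset` fields, the `"ai_training"` purpose label and the set of withdrawn DIDs are hypothetical representations of the purpose binding, scope and withdrawal signal described above; in practice they would be read from verified receipts and the ledger rather than constructed by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchoredAsset:
    asset_id: str
    subject_did: str
    data_type: str
    purposes: frozenset[str]       # purpose binding from the consent receipt
    allowed_types: frozenset[str]  # data types the receipt covers


def consent_failures(asset: AnchoredAsset, withdrawn: set[str],
                     purpose: str = "ai_training") -> list[str]:
    """Return the reasons an asset may not be used; an empty list means it passes."""
    failures = []
    if purpose not in asset.purposes:
        failures.append("purpose not covered")
    if asset.subject_did in withdrawn:
        failures.append("consent withdrawn")
    if asset.data_type not in asset.allowed_types:
        failures.append("data type out of scope")
    return failures


def admit_for_training(assets, withdrawn):
    """Split a candidate corpus before training begins, keeping an exclusion log."""
    admitted, excluded = [], {}
    for asset in assets:
        failures = consent_failures(asset, withdrawn)
        if failures:
            excluded[asset.asset_id] = failures
        else:
            admitted.append(asset.asset_id)
    return admitted, excluded


assets = [
    AnchoredAsset("a1", "did:example:alice", "step_count",
                  frozenset({"ai_training"}), frozenset({"step_count"})),
    AnchoredAsset("a2", "did:example:bob", "step_count",
                  frozenset({"ai_training"}), frozenset({"step_count"})),
    AnchoredAsset("a3", "did:example:carol", "location",
                  frozenset({"analytics"}), frozenset({"step_count"})),
]
admitted, excluded = admit_for_training(assets, withdrawn={"did:example:bob"})
print(admitted)  # ['a1']
print(excluded)
```

Recording why each asset was excluded, rather than silently dropping it, leaves an audit trail for every record kept out of the corpus.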

Zero-knowledge proof techniques, specifically zk-SNARK proving systems such as Groth16 and PLONK that are already deployed in production, enable a training pipeline to verify consent status without exposing the underlying consent record to the model itself. The pipeline proves it has valid consent without the proof process leaking identity or consent terms into the training data. This is a non-trivial cryptographic property. It separates the verification function from the training function, which is the correct architectural boundary.
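One way to make "proves it has valid consent" precise is to write down the relation the SNARK proves. The formulation below is an illustrative assumption, not PDAOS's documented circuit: the ledger is assumed to anchor a hiding commitment c to each receipt (a plain hash of a low-entropy receipt could be brute-forced), and withdrawn DIDs are assumed to be collected in a set with root ρ that supports non-membership proofs. The public statement x reveals only the commitment, the purpose, the data type and the revocation root; the receipt r, the commitment randomness s and the non-membership witness stay private.

```latex
\mathcal{R} = \Big\{ (x, w) \;:\;
  x = (c,\, p,\, t,\, \rho),\;
  w = (r,\, s,\, \pi_{\notin}),\;
  c = \mathsf{Com}(r;\, s)
  \;\wedge\; p \in r.\mathsf{purposes}
  \;\wedge\; t \in r.\mathsf{dataTypes}
  \;\wedge\; \mathsf{NonMember}(\rho,\, r.\mathsf{did},\, \pi_{\notin}) = 1
\Big\}
```

The prover produces a proof from (x, w) and the pipeline runs only the verifier on (x, proof), so the receipt and the DID never reach the training side. Groth16 and PLONK both fit this prove/verify pattern; they differ mainly in setup, which is circuit-specific for Groth16 and universal for PLONK.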

MyDataKey, the product companion to this research, implements the DID and consent receipt layer described above. Users who generate their MyDataKey hold a sovereign identity that can sign consent receipts and query provenance anchors without depending on any centralized identity provider.

The Data Fiduciary Path Forward

Infrastructure alone does not resolve the power asymmetry that invisible data creates. Technical architecture sets the conditions for accountability. The legal and institutional structure that operationalizes accountability is the data fiduciary model.

A data fiduciary holds data on behalf of its subject with legally enforceable obligations of loyalty, care and confidentiality toward that subject rather than toward third-party commercial interests. The related concept of the information fiduciary was formalized in academic literature by Jack Balkin and critically examined by Lina Khan and David Pozen. The data fiduciary idea has since been adopted in policy discussions by the EDPB and explored in the Indian Digital Personal Data Protection framework. It reframes the relationship between platform and user: not a contract between parties with equal bargaining power but a fiduciary obligation running from the stronger party to the weaker one.

Own Your Data Inc operates as a 501(c)(3) nonprofit precisely because the fiduciary relationship requires structural independence from the commercial incentives that drive invisible data harvesting. The PDAOS infrastructure is designed to be operated by entities that hold data fiduciary obligations, not by advertising platforms or AI developers whose commercial interests conflict with the data subject's interests.

The path Dr. Fisher maps in The Invisible Data runs from awareness through architecture to accountability. Awareness means understanding what invisible data is and why it matters. Architecture means building PDAOS-class infrastructure that makes provenance verifiable and consent enforceable before data enters a pipeline. Accountability means adopting data fiduciary governance structures that align institutional incentives with data subject interests.

None of these steps is sufficient alone. Awareness without architecture produces policy documents that change nothing at the technical layer. Architecture without accountability produces cryptographic systems operated by entities whose interests still diverge from those of users. The framework requires all three layers operating together.

Personal data ownership in the AI era is not a privacy setting. It is a property right backed by cryptographic proof, institutional structure and legal obligation. The Invisible Data is available through theinvisible.life and PDAOS development is documented through mydatakey.org. The infrastructure exists. The question is how fast it scales relative to the systems it is designed to correct.

The Invisible Data: Why Personal Data Ownership Matters in the AI Era
