Federated Learning's Trust Assumptions and the Curious Server Problem

Quick Answer
Federated learning is not automatically private. The honest-but-curious server threat model identifies the aggregating server as an adversary that can run gradient inversion attacks on individual client updates, recovering training data without deviating from the protocol. Secure aggregation (Bonawitz et al., CCS 2017) closes the individual update inspection vulnerability. Differential privacy bounds inferential disclosure from aggregates. Neither mechanism alone is sufficient. Production FL deployments require both, with formal composition accounting, minimum cohort size enforcement and a released model policy that treats global weights as sensitive output.

Federated Learning Is Not Automatically Private

Federated learning gets described, frequently and carelessly, as a privacy-preserving machine learning technique. The framing is seductive. Raw training data stays on device. No central repository accumulates sensitive records. The server only sees model updates, not the data itself. It sounds like the privacy problem is solved by design.

It is not solved. The architecture shifts where data exposure can occur. It does not eliminate exposure.

The gap between "data never leaves the device" and "the system is private" is where most real-world FL deployments fail. Understanding that gap requires a precise look at the threat model, the attack surface of gradients and model updates, and the cryptographic protocols that can close specific vulnerabilities when applied correctly. Federated learning as a discipline is worth taking seriously. Treating it as a privacy guarantee without qualification is an engineering mistake.

This article is written for practitioners who need to reason carefully about where FL's protections begin and end, and what additional mechanisms are required to build systems that hold up under adversarial analysis.

The Honest-But-Curious Server Threat Model

In cryptography and distributed systems, the honest-but-curious adversary model describes a party that follows the protocol correctly but also observes and records everything it can, using that information to learn more than the protocol was designed to reveal. The party does not cheat in ways that break the protocol. It simply pays close attention.

In federated learning, the aggregating server is the canonical honest-but-curious party. In the standard FL loop, each participating client computes a local gradient or model update using its private data, and sends that update to the central server. The server aggregates updates from many clients and produces an improved global model. The server is supposed to learn only the aggregate, not the individual contributions. In practice, the server sees each individual update before aggregation. That is the exposure window.

An honest-but-curious server can record every gradient it receives. It can analyze those gradients using inversion techniques. It can correlate updates across rounds to reconstruct training data distributions. None of this requires the server to deviate from the stated protocol. The server is simply doing more with the information it legitimately receives than the system architect assumed it would.

The original federated learning paper by McMahan et al., published at AISTATS 2017, described the architecture and its communication efficiency properties. It framed the privacy benefits informally. Subsequent work, particularly by Zhu et al. in their 2019 NeurIPS paper "Deep Leakage from Gradients" (arXiv:1906.08935), demonstrated concretely that gradients can reconstruct training images and text with alarming fidelity. The informal privacy claim collapsed under formal analysis almost immediately.

How Gradient Leakage Actually Works

Gradient leakage exploits the fact that model gradients carry structured information about the data used to compute them. A gradient is not a random vector. It is a function of the model weights and the training batch. Given the model weights, an attacker can solve an optimization problem: find synthetic inputs whose gradients match the observed gradient. The solution to that optimization recovers an approximation of the original training data.

Zhu et al.'s attack, now referred to as DLG (Deep Leakage from Gradients), demonstrated pixel-level reconstruction of training images from gradients in fully-connected and convolutional networks. The attack works by minimizing the L2 distance between the gradient of a dummy input and the observed gradient. It converges to a close approximation of the original input in a surprising number of settings.

Subsequent refinements strengthened the attack substantially. Zhu and Blaschko's R-GAP attack (arXiv:2010.07733) showed that in certain architectures, gradient leakage can be analytically exact rather than approximate. Yin et al.'s GradInversion method (arXiv:2104.07586) scaled reconstruction to ImageNet-resolution images from batch gradients. The research trajectory is consistent: gradient leakage is not a narrow edge case. It is a structural property of how gradients encode training data.
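The simplest case of exact leakage can be worked by hand. The sketch below is not R-GAP or DLG; it assumes a single fully connected layer with a bias, a squared-error loss and a batch of one example, all chosen here for illustration. Under those assumptions each row of the weight gradient is the input scaled by the matching bias gradient, so one division recovers the input.

```python
import random


def linear_layer_gradients(W, b, x, y):
    """Gradients of 0.5 * ||W x + b - y||^2 for a single example."""
    out = [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]
    delta = [o - t for o, t in zip(out, y)]            # dL/d(output)
    grad_W = [[d * v for v in x] for d in delta]       # outer product delta * x^T
    grad_b = list(delta)                               # dL/db equals delta
    return grad_W, grad_b


def invert_single_example(grad_W, grad_b, tol=1e-12):
    """Recover x from the gradients: row i of grad_W equals grad_b[i] * x."""
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))
    if abs(grad_b[i]) < tol:
        raise ValueError("bias gradient is ~0; input not recoverable this way")
    return [g / grad_b[i] for g in grad_W[i]]


# Toy client data and model weights (hypothetical values).
rng = random.Random(0)
x_private = [rng.uniform(-1, 1) for _ in range(5)]
y_private = [rng.uniform(-1, 1) for _ in range(3)]
W = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(3)]
b = [rng.gauss(0, 1) for _ in range(3)]

# The client computes gradients; the server sees only grad_W and grad_b.
grad_W, grad_b = linear_layer_gradients(W, b, x_private, y_private)
x_recovered = invert_single_example(grad_W, grad_b)
max_error = max(abs(a - c) for a, c in zip(x_private, x_recovered))
```

Here the server needs neither the model weights nor an optimization loop. Larger batches and deeper networks break this shortcut, which is where the optimization-based attacks described above take over.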

Label inference compounds the problem. In classification tasks, an attacker who receives gradients can often infer training labels directly from gradient properties, even before attempting full input reconstruction. Zhao et al.'s iDLG work (arXiv:2001.02610) showed that for a single example trained with cross-entropy loss, the label can be read from the signs of the last-layer gradient, with no assumptions beyond access to the gradient and the model architecture.
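The sign argument behind single-example label inference is short enough to sketch. The code below assumes softmax cross-entropy and a batch of one; the function names are illustrative. The gradient of the loss with respect to the output-layer bias is softmax(logits) minus the one-hot label, so the true class is the only negative entry.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_bias_gradient(logits, label):
    """Gradient of softmax cross-entropy w.r.t. the output bias, one example."""
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


def infer_label(bias_gradient):
    """The true class is the only index with a negative gradient entry."""
    negatives = [i for i, g in enumerate(bias_gradient) if g < 0]
    if len(negatives) != 1:
        raise ValueError("expected exactly one negative entry (single example)")
    return negatives[0]


logits = [2.0, -1.0, 0.5, 0.1]   # hypothetical model outputs
inferred = [infer_label(output_bias_gradient(logits, y)) for y in range(4)]
```

With batches larger than one the per-example signs are summed, so this exact rule no longer applies and an attacker has to fall back on statistical variants.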

Text is not exempt. Phong et al. and subsequent NLP-focused work demonstrated that token-level reconstruction from gradients is feasible in transformer-based language models. Sequence-level leakage is harder but not impossible, particularly with small batch sizes or when the attacker has auxiliary knowledge about the vocabulary distribution.

Secure Aggregation Protocols and Their Limits

Secure aggregation is the cryptographic response to the honest-but-curious server problem. The goal is to allow the server to compute the sum of client updates without seeing any individual update in plaintext. Bonawitz et al. published the foundational secure aggregation protocol for FL at CCS 2017. The protocol uses pairwise masking: clients add correlated random masks to their updates such that masks cancel in the aggregate. The server receives only masked updates; summing them yields the true aggregate, but no single client's contribution can be isolated.
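The cancellation idea behind pairwise masking can be sketched in a few lines. This is a toy illustration, not the Bonawitz et al. protocol: it assumes integer (already quantized) updates, hands out pair seeds directly instead of running a key agreement, uses Python's non-cryptographic `random` module as the mask generator and ignores dropout. All names are hypothetical.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo a fixed power of two


def pairwise_mask(seed, length):
    """Expand a shared seed into a mask. NOT cryptographically secure."""
    prg = random.Random(seed)
    return [prg.randrange(MODULUS) for _ in range(length)]


def setup_pair_seeds(client_ids, rng):
    """Give every pair of clients one shared seed (stand-in for key agreement)."""
    seeds = {cid: {} for cid in client_ids}
    for a in client_ids:
        for b in client_ids:
            if a < b:
                shared = rng.getrandbits(64)
                seeds[a][b] = shared
                seeds[b][a] = shared
    return seeds


def mask_update(client_id, update, pair_seeds):
    """The lower id in each pair adds the mask, the higher id subtracts it."""
    masked = list(update)
    for other_id, seed in pair_seeds[client_id].items():
        mask = pairwise_mask(seed, len(update))
        sign = 1 if client_id < other_id else -1
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, mask)]
    return masked


updates = {0: [3, 1, 4], 1: [1, 5, 9], 2: [2, 6, 5]}   # toy quantized updates
seeds = setup_pair_seeds(sorted(updates), random.Random(42))
masked = {cid: mask_update(cid, u, seeds) for cid, u in updates.items()}

# The server sums the masked vectors; every mask appears once with each sign.
aggregate = [sum(column) % MODULUS for column in zip(*masked.values())]
```

Each masked vector looks uniformly random on its own, while the modular sum equals the sum of the original updates.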

The protocol handles client dropout through a secret-sharing scheme. If a client fails during the round, the server can reconstruct the dropped client's mask contribution without learning anything about the update itself. This is essential for practical deployment because mobile and edge clients are unreliable.
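The secret-sharing idea used for dropout recovery can be sketched with textbook Shamir sharing. The code assumes a client splits its mask seed into shares held by other clients, so that any `threshold` of them can rebuild the seed if the client disappears. The field prime, function names and parameters are illustrative and not taken from the Bonawitz et al. specification.

```python
import random

PRIME = 2 ** 61 - 1  # a Mersenne prime, used as the field modulus


def split_secret(secret, n_shares, threshold, rng):
    """Shamir sharing: evaluate a random degree-(threshold-1) polynomial."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):          # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n_shares + 1)]


def reconstruct_secret(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        num, den = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i != j:
                num = (num * (-x_j)) % PRIME
                den = (den * (x_i - x_j)) % PRIME
        inverse = pow(den, PRIME - 2, PRIME)  # modular inverse via Fermat
        secret = (secret + y_i * num * inverse) % PRIME
    return secret


mask_seed = 123456789                        # hypothetical seed of a client
shares = split_secret(mask_seed, n_shares=5, threshold=3, rng=random.Random(1))
recovered = reconstruct_secret([shares[0], shares[2], shares[4]])
```

Fewer than `threshold` shares reveal nothing about the seed in the information-theoretic sense, which is why a curious server that collects only a few shares learns nothing. A real deployment would draw the polynomial coefficients from a cryptographically secure source.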

Secure aggregation materially closes the honest-but-curious server vulnerability for individual update inspection. A server running the Bonawitz protocol cannot run DLG-style attacks against individual client gradients because it never sees them in plaintext.

The limits are real, though. Secure aggregation protects against inspection of individual updates. It does not protect against attacks on the aggregate itself. If the aggregate is computed over a small number of clients, or if one client's update is large relative to others, the server can still make inferences about individual contributions by examining how the aggregate changes across rounds. This is a subtraction attack: the server subtracts the aggregate for round N-1 from the aggregate for round N. If the two cohorts are otherwise the same and a specific client participated in only one of the two rounds, the difference approximates that client's contribution.
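The arithmetic of a subtraction attack is trivial, which is the point. The sketch below makes a deliberately strong assumption: the other clients contribute identical updates in both rounds, so the difference is exact. In real training their updates change between rounds and the attacker recovers only an approximation. All values are toy data.

```python
def sum_updates(updates):
    """What a secure-aggregation server learns: only the element-wise sum."""
    return [sum(column) for column in zip(*updates)]


# Hypothetical cohort that reports in both rounds with unchanged updates.
stable_cohort = [[0.5, -1.0], [1.5, 2.0], [-0.25, 0.75]]
# The client that participates in only one of the two rounds.
target_update = [4.0, -3.0]

aggregate_without_target = sum_updates(stable_cohort)
aggregate_with_target = sum_updates(stable_cohort + [target_update])

# Differencing the two aggregates isolates the target's contribution.
recovered = [a - b for a, b in zip(aggregate_with_target, aggregate_without_target)]
```

The larger and more variable the cohort, the more the other clients' round-to-round changes drown out the single contribution being targeted.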

Secure aggregation also does not defend against a malicious server that deviates from the protocol, sends crafted global models designed to amplify gradient leakage, or colludes with a subset of clients. The honest-but-curious model is specific. Secure aggregation is designed for that model and does not generalize to stronger adversaries without additional mechanisms.

Differential Privacy as a Complement to Federated Learning

Differential privacy provides a formal mathematical guarantee about information leakage. A mechanism satisfies (epsilon, delta)-differential privacy if the probability of any output changes by at most a factor of e^epsilon (with an additive delta term) when any single training record is added or removed. The guarantee bounds what any computationally unbounded adversary can infer about any individual.
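Written out, the definition takes the following standard form. The symbols are notation chosen here: M is the randomized mechanism, D and D' are any two datasets that differ in one record, and S is any set of possible outputs.

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```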

In the FL context, DP is applied through gradient clipping and noise addition before the client sends its update. Each client clips its gradient to a maximum L2 norm and adds Gaussian noise calibrated to the sensitivity and the target epsilon. The server aggregates these noisy clipped gradients. The composition theorem governs how privacy degrades across multiple rounds.
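The clip-then-noise step can be sketched as follows. This is a minimal illustration, not the DP-FedAvg reference implementation. It assumes the noise standard deviation is expressed as a multiple of the clipping norm (`noise_multiplier`, a name introduced here), and it leaves the mapping from that multiplier to an epsilon to a separate privacy accountant.

```python
import math
import random


def clip_to_l2_norm(update, clip_norm):
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip, then add Gaussian noise with std = noise_multiplier * clip_norm."""
    clipped = clip_to_l2_norm(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]


rng = random.Random(7)
raw_update = [3.0, 4.0]                       # toy update with L2 norm 5
clipped = clip_to_l2_norm(raw_update, clip_norm=1.0)
noisy = privatize_update(raw_update, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping is what makes the sensitivity finite: without a bound on the norm, no amount of noise yields a DP guarantee. Production code would draw noise from a cryptographically secure source rather than `random`.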

Google's DP-FedAvg implementation, described by McMahan et al. in their 2018 ICLR paper, demonstrated that DP and FL can be combined without catastrophic model quality loss at scale. The key insight is that large numbers of clients allow the noise to be small relative to the signal in the aggregate, because noise averages down while the true gradient signal does not. This is user-level DP, not example-level DP, and the distinction matters for threat modeling.
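The averaging argument is easy to check numerically. The sketch assumes each client adds independent Gaussian noise with the same standard deviation, in which case the noise in the mean of n updates has standard deviation sigma divided by the square root of n. The simulation parameters are arbitrary.

```python
import math
import random
import statistics


def noise_std_of_average(per_client_sigma, num_clients):
    """Std of the mean of num_clients independent Gaussian noise draws."""
    return per_client_sigma / math.sqrt(num_clients)


def simulate_noise_std(per_client_sigma, num_clients, trials, rng):
    """Empirical std of the averaged noise over many simulated rounds."""
    means = []
    for _ in range(trials):
        draws = [rng.gauss(0.0, per_client_sigma) for _ in range(num_clients)]
        means.append(sum(draws) / num_clients)
    return statistics.pstdev(means)


predicted = noise_std_of_average(1.0, 100)                 # 0.1 in closed form
observed = simulate_noise_std(1.0, 100, trials=2000, rng=random.Random(3))
```

Going from 100 to 10,000 clients per round cuts the noise in the average by another factor of ten, while a gradient direction shared across clients is unaffected by averaging.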

The W3C Privacy Principles document (w3.org/TR/privacy-principles/) and the NIST Privacy Framework (nist.gov/privacy-framework) both treat data minimization and formal privacy guarantees as distinct and complementary properties. FL provides one form of data minimization. DP provides a formal bound on inferential disclosure. Neither is sufficient alone for most practical threat models.

The epsilon budget question remains genuinely hard. Practical FL deployments with reasonable model utility typically operate at epsilon values of 1 to 10 per training run. Meaningful formal guarantees require epsilon below 1. The gap reflects an honest engineering tradeoff, not a theoretical failure. Anyone claiming their FL deployment is differentially private should specify the epsilon, the delta, the clipping norm, the number of rounds and the composition accounting method used.

When FL Actually Protects Data and When It Does Not

FL without secure aggregation and without DP protects against a passive external network attacker who can observe traffic. It does not protect against the aggregating server, a compromised server, or a server operator who decides to run inversion attacks on the gradients they receive. The protection is transport-layer adjacent, not cryptographic.

FL with secure aggregation protects against an honest-but-curious server inspecting individual updates. It does not protect against subtraction attacks on small cohorts, attacks on the aggregate itself or a malicious server sending adversarial global models. Secure aggregation is a meaningful upgrade over plain FL for the specific threat it targets.

FL with secure aggregation and user-level DP at a well-calibrated epsilon provides the strongest privacy properties available in the standard FL architecture. It bounds inferential disclosure from aggregates, prevents individual update inspection and tolerates client dropout. The residual risks are model-level memorization, membership inference at scale and adversarial model poisoning attacks from malicious clients.

FL does not protect against membership inference attacks on the released global model. A global model trained on sensitive data can be queried to determine whether a specific record was in the training set, using shadow model techniques described by Shokri et al. at IEEE S&P 2017. The privacy perimeter of FL ends at the global model. Anyone with query access to the model has attack surface.

At Own Your Data Inc, our technical analysis of these systems in the context of the Personal Data Asset Origination System (PDAOS) consistently finds that FL is most protective when the cohort is large (thousands of clients per round), when secure aggregation is implemented at the protocol level rather than assumed, when DP noise is added with formal composition tracking and when the global model release policy treats the model as sensitive output rather than a sanitized artifact. Systems that satisfy all four conditions are rare in production deployments as of 2026.

Architectural Recommendations for Privacy-Preserving FL

Building FL systems that hold up under adversarial analysis requires deliberate choices at each layer of the stack. The following recommendations reflect the current state of the field based on published cryptographic and systems research.

Implement Secure Aggregation at the Protocol Layer

Do not treat secure aggregation as optional. The Bonawitz et al. CCS 2017 protocol has open-source implementations maintained by Google's TensorFlow Federated team. Production deployments should use a verified implementation, not a hand-rolled variant. The dropout handling in the protocol is non-trivial to implement correctly.

Apply DP with Formal Composition Accounting

Use the Rényi DP accountant for composition across rounds rather than the basic composition theorem. The basic theorem is pessimistic. Rényi composition gives tighter bounds. The Google DP library (github.com/google/differential-privacy) provides accountants that track epsilon accurately across rounds. Set a privacy budget and enforce it as a hard training termination condition, not a guideline.
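To make the accounting concrete, here is a minimal Rényi-style accountant for the plain Gaussian mechanism, written from the standard formulas rather than taken from the Google library. It assumes every round applies Gaussian noise with the same noise multiplier to a query of sensitivity 1 and it ignores client subsampling, so it overestimates epsilon compared with a production accountant. Function names and the order grid are illustrative.

```python
import math


def gaussian_rdp_epsilon(noise_multiplier, rounds, delta, orders=None):
    """Epsilon after `rounds` Gaussian releases, via Renyi DP composition."""
    if orders is None:
        orders = [1 + k / 10.0 for k in range(1, 1000)]   # alpha in (1, 101)
    best = math.inf
    for alpha in orders:
        # RDP of one Gaussian release at order alpha is alpha / (2 sigma^2);
        # RDP composes additively across rounds.
        rdp = rounds * alpha / (2.0 * noise_multiplier ** 2)
        # Standard conversion from RDP to (epsilon, delta)-DP.
        epsilon = rdp + math.log(1.0 / delta) / (alpha - 1.0)
        best = min(best, epsilon)
    return best


def max_rounds_within_budget(noise_multiplier, delta, epsilon_budget, max_rounds):
    """Largest round count whose cumulative epsilon stays within the budget."""
    allowed = 0
    for t in range(1, max_rounds + 1):
        if gaussian_rdp_epsilon(noise_multiplier, t, delta) > epsilon_budget:
            break
        allowed = t
    return allowed


epsilon_100 = gaussian_rdp_epsilon(noise_multiplier=5.0, rounds=100, delta=1e-5)
stop_after = max_rounds_within_budget(5.0, 1e-5, epsilon_budget=8.0, max_rounds=200)
```

A training loop would call something like `max_rounds_within_budget` once up front, or recompute epsilon after every round, and stop unconditionally when the budget is reached. For real deployments a vetted accountant that also handles subsampling is the safer choice.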

Enforce Minimum Cohort Size Per Round

Even with secure aggregation, small cohorts create subtraction attack risk. Set a minimum participation threshold per aggregation round. What constitutes a safe minimum depends on the sensitivity of the data and the epsilon budget, but cohort sizes below 100 clients per round should be treated as high-risk absent additional analysis.
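A cohort floor is simple to enforce in code. The sketch below assumes the server knows which client ids reported in the round after dropouts; the threshold of 100 mirrors the figure above and is a policy choice, not a proven safe value. The exception and function names are hypothetical.

```python
MIN_COHORT_SIZE = 100  # policy value; tune to data sensitivity and epsilon budget


class CohortTooSmallError(RuntimeError):
    """Raised when a round must be discarded rather than aggregated."""


def check_cohort(reporting_client_ids, min_cohort_size=MIN_COHORT_SIZE):
    """Return the cohort size, or refuse the round if it is below the floor."""
    unique_clients = set(reporting_client_ids)   # duplicates must not count twice
    if len(unique_clients) < min_cohort_size:
        raise CohortTooSmallError(
            f"{len(unique_clients)} clients reported; "
            f"minimum is {min_cohort_size}"
        )
    return len(unique_clients)


cohort_size = check_cohort(range(150))
```

The check belongs after dropout handling, since the number of clients that actually contribute to the sum is what determines exposure, not the number that were invited.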

Treat the Global Model as Sensitive Output

Apply membership inference risk analysis to any model released externally. If the model cannot be published without membership inference risk, consider releasing model predictions under query constraints rather than model weights. Output perturbation at inference time is a weaker protection than training-time DP but is better than no protection for released models.

Audit Trust Assumptions Explicitly

Document which threat model your FL deployment is designed to defend against. Honest-but-curious server, malicious server, malicious clients and passive external attacker are distinct adversaries requiring distinct defenses. A threat model document that lists all four without specifying which are in scope is not a threat model. It is a list of possibilities.

The broader conversation about data sovereignty and personal data rights, developed extensively in The Invisible Data (Volume 6 of The Invisible Series) and instantiated in the MyDataKey protocol at mydatakey.org, treats FL not as a privacy solution but as one architectural primitive among several. Privacy engineering requires combining FL with cryptographic identity, consent receipts and data fiduciary governance to build systems where trust assumptions are explicit and auditable rather than assumed.

Federated learning is a genuine contribution to privacy-preserving machine learning. The contribution is specific and bounded. Treating it as more than it is does not protect users. It creates false confidence that is arguably worse than acknowledged ignorance.

Frequently Asked Questions

Does federated learning prevent the server from seeing my training data?
Federated learning prevents the server from receiving raw training data directly. It does not prevent the server from reconstructing training data from gradients using inversion attacks like Deep Leakage from Gradients (arXiv:1906.08935). Without secure aggregation, the server sees individual client updates in plaintext and can apply these attacks. Secure aggregation closes that specific vulnerability but does not protect against all server-side inference.
What is the difference between secure aggregation and differential privacy in federated learning?
Secure aggregation is a cryptographic protocol that prevents the server from seeing individual client updates, ensuring it only receives the aggregate sum. Differential privacy adds calibrated noise to gradients before aggregation to bound how much any individual record can influence the output. They address different attack surfaces and are complementary: secure aggregation protects against individual update inspection while DP bounds inferential disclosure from the aggregate itself.
How does a subtraction attack work against federated learning with secure aggregation?
A subtraction attack works by comparing aggregates across rounds. If a client participates in round N but not round N-1, the server can subtract the two aggregates to isolate that client's approximate contribution, even without seeing the individual update in plaintext. This risk is highest with small cohort sizes. Minimum cohort size enforcement and per-round participation randomization reduce but do not eliminate this exposure.
What epsilon value for differential privacy is considered meaningful in a federated learning deployment?
Formal DP guarantees with strong interpretability require epsilon below 1. Practical FL deployments with acceptable model utility typically operate at epsilon values between 1 and 10 per training run, reflecting a real tradeoff between privacy strength and model quality. Any deployment claiming DP protection should specify the exact epsilon and delta values, the clipping norm, the number of rounds and the composition accounting method used.
Can a released federated learning global model expose information about training participants?
Yes. A global model trained on sensitive data is vulnerable to membership inference attacks regardless of how the training process was structured. Shadow model techniques can determine whether a specific record was in the training set by querying the model's output distribution. Releasing model weights without membership inference analysis extends the attack surface beyond the training system itself. Output perturbation or query-constrained inference APIs reduce but do not eliminate this risk.