A commentary published on SuspectFile by researcher Marco A. De Felice argues that the cybersecurity field's habitual focus on identifying who carried out an attack consistently diverts attention from a more tractable question: why so much sensitive data was available to steal in the first place. The piece frames excessive data collection, centralization, and retention as a structural condition that amplifies every incident — one that attribution-first analysis leaves untouched.
The structural problem De Felice identifies
The argument centers on what De Felice calls structural fragility: organizations accumulate far more personal and clinical data than their immediate operational needs require, store it in centralized repositories, and hold it well beyond any defensible retention window. When an attacker — sophisticated or opportunistic — reaches that repository, the damage is proportional to the accumulation, not to the attacker's technical sophistication.
For healthcare settings, the parallel is direct. Electronic health records, billing systems, and patient portals routinely aggregate years of demographic, financial, and clinical history. Regulatory minimum-necessary and data-retention standards exist precisely to constrain this accumulation, but enforcement pressure on day-to-day data hygiene has historically been lighter than enforcement pressure following a disclosed breach.
Why attribution dominates the conversation anyway
Incident post-mortems, regulatory filings, and press coverage all generate immediate demand for a named threat actor or a specific CVE. That framing is satisfying and often legally useful — it supports law-enforcement referrals and insurance claims. De Felice's critique is not that attribution is worthless, but that it functions as a stopping point rather than a starting point.
When the headline reads "nation-state group exploited unpatched VPN," the implicit conclusion is that better patching or a different threat actor profile would have prevented the harm. The volume of records exposed — and the organizational decisions that produced that volume — receives less scrutiny. That asymmetry leaves the conditions for the next large-scale exposure intact.
Where this lands for independent practices
Smaller healthcare organizations often justify broad data retention on the grounds that purging records is technically complex or that they may need historical data for continuity of care. De Felice's framing suggests that the complexity argument carries a cost that rarely appears on the same ledger as the effort saved by not purging: every record kept past its useful life adds to what a single breach can expose.
Practical implications for compliance officers include:
- Retention schedule enforcement. Policies that specify destruction timelines are only as effective as the technical controls that execute them. Scheduled audits of what data classes are actually being retained, and for how long, are a prerequisite for meaningful minimization.
- Centralization architecture review. Aggregating records from multiple encounter types or locations into a single repository increases breach magnitude. Segmentation decisions made at the architecture stage limit blast radius before any threat actor appears.
- Minimum-necessary application at the system level. HIPAA's minimum-necessary standard is typically applied to disclosure workflows. Applying the same logic to what systems collect and index in the first place addresses the structural condition De Felice describes.
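As an illustration of the retention audit described in the first item, the sketch below flags records held past a per-class window. It is a minimal example under stated assumptions, not a description of any particular EHR or billing system: the `RETENTION_DAYS` values, the `Record` fields, and the `audit_retention` function are all hypothetical, and real retention periods depend on applicable law and the organization's own policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention windows per data class, in days.
# Placeholder values only; actual periods come from law and policy.
RETENTION_DAYS = {
    "billing": 7 * 365,
    "clinical_note": 10 * 365,
    "portal_log": 365,
}


@dataclass(frozen=True)
class Record:
    record_id: str
    data_class: str
    last_activity: date  # assumed to be the date the retention clock starts


def audit_retention(records, today):
    """Split records into those past their window and those with no schedule."""
    findings = {"overdue": [], "unclassified": []}
    for rec in records:
        window = RETENTION_DAYS.get(rec.data_class)
        if window is None:
            # No schedule exists for this data class, so it cannot be
            # minimized by policy; surface it for manual review.
            findings["unclassified"].append(rec.record_id)
        elif today - rec.last_activity > timedelta(days=window):
            findings["overdue"].append(rec.record_id)
    return findings


sample = [
    Record("a", "portal_log", date(2023, 6, 1)),
    Record("b", "billing", date(2020, 1, 1)),
    Record("c", "fax_archive", date(2019, 3, 15)),
]
report = audit_retention(sample, today=date(2025, 1, 1))
print(report)
```

The audit only reports; it deletes nothing. Separating "overdue" from "unclassified" reflects the point that a destruction timeline cannot be enforced on data no policy has classified.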
What this signals for the next 12 months
OCR's updated HIPAA Security Rule proposal, still working through the rulemaking process, includes provisions that would tighten technical controls on data access and logging. If finalized, it would shift some compliance pressure upstream toward configuration and architecture decisions, closer to the root-cause framing De Felice advocates.
Whether or not that rule takes its current form, the underlying argument holds: breach economics in healthcare are driven as much by data volume and architecture choices as by the capabilities of any specific adversary. Organizations that treat threat intelligence as the primary input to their security programs, while deferring data minimization work, are optimizing for the wrong variable.