I've spent more than two decades in healthcare technology — deep in clinical environments, DICOM systems, and the daily realities of how data actually moves through hospitals. I've watched AI go from theoretical to operational. And I'll be honest: I'm both genuinely excited and genuinely worried.
AI holds extraordinary promise for transforming how we deliver care. But it also risks undermining the one thing medicine cannot function without: trust. Whether AI is good for healthcare isn't an abstract question for me. It's something I work through every day.
What AI in Healthcare Is Getting Right
When AI is implemented thoughtfully, it can make a real difference — and I've seen it firsthand.
Emergency physicians once relied on human scribes to document patient encounters so they could actually focus on the patient in front of them. AI-driven documentation and triage tools take that concept further, streamlining workflows from intake through treatment planning.[1] Less clicking. More listening, examining, caring.
In those critical first minutes — heart attack or indigestion? stroke or migraine? — AI can synthesize symptoms, vitals, labs, and history to flag high-risk cases that might otherwise wait too long in an overcrowded emergency department.[2] Done right, these tools don't replace clinical judgment. They sharpen it.
Radiology has been on this path for years. Computer-aided detection in mammography acts as a second set of eyes — catching subtle findings, reducing reader fatigue, improving consistency.[3] The best systems augment expertise rather than supplant it. And in the operating room, AI-assisted surgical platforms are already demonstrating improved precision in complex procedures.[4]
The benefits are real. But every one of them rests on a single assumption that we don't examine nearly enough.
The Assumption Nobody Wants to Challenge: That the Data Is Trustworthy
Every AI benefit I just described depends entirely on the quality of the data that trained the system. Data integrity isn't a technical footnote. It's the foundation on which everything else stands. And we need to ask uncomfortable questions.
Are training datasets representative of the populations these systems will serve? Are imaging datasets complete, authentic, and diagnostically faithful? Can AI conclusions be replicated across different demographics and institutions? These aren't hypothetical concerns. Studies have already documented algorithmic bias in healthcare AI — particularly when systems train on narrow or skewed data.[5] In medicine, bias isn't an inconvenience. It's a patient safety issue.
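To make that concrete, here is a minimal sketch of one such check: comparing the demographic mix of a training dataset against the population the model will serve, using a chi-square goodness-of-fit test. The age bands and counts are hypothetical placeholders, not real data.

```python
# Minimal representativeness check: does the training set's demographic
# mix match the deployment population? (illustrative numbers only)
from scipy.stats import chisquare

# Hypothetical age-band counts in the training set (observed)
training_counts = [1200, 3400, 5100, 2300]   # 0-17, 18-44, 45-64, 65+

# Hypothetical proportions of the population the model will serve
population_props = [0.20, 0.35, 0.28, 0.17]

total = sum(training_counts)
expected = [p * total for p in population_props]

stat, p_value = chisquare(f_obs=training_counts, f_exp=expected)
if p_value < 0.01:
    print(f"Training mix diverges from target population (p={p_value:.2e})")
```

A check like this is crude, but it forces the question onto the table before a model ever reaches validation.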
The DICOM De-identification Problem: What Most AI Discussions Miss
This is where my concern becomes most specific — and most technical.
In healthcare AI research, we talk constantly about de-identified medical images. De-identification is essential: it protects patient privacy and enables data sharing for research and AI development. But having worked extensively with DICOM systems, I know something that rarely gets discussed in AI conversations: de-identification, if not carefully validated, can introduce subtle data degradation.
Pixel masking, compression artifacts, and metadata stripping — when applied imprecisely — can make small but potentially meaningful changes to imaging data.[6] These changes may not be visible to a human reviewer. But an AI model trained on that data may learn from the artifact rather than the pathology. The model may still perform. It may even perform well in validation. But it is performing on something subtly different from what a clinician would see in a real clinical workflow.
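To illustrate the kind of check that can surface this, here is a minimal sketch using pydicom: load a matched original/de-identified pair and quantify pixel drift outside the region that was deliberately blanked. The file paths and mask coordinates are hypothetical.

```python
# Quantify unintended pixel drift introduced by de-identification
# (paths and the masked banner region are hypothetical placeholders)
import numpy as np
import pydicom

original = pydicom.dcmread("study/original/slice_001.dcm").pixel_array
deidentified = pydicom.dcmread("study/deid/slice_001.dcm").pixel_array

# Cast up before subtracting so unsigned pixel values cannot wrap around
diff = original.astype(np.int32) - deidentified.astype(np.int32)

# Exclude the intentionally blanked region so that only unintended
# changes (e.g. recompression artifacts) are measured
mask = np.ones(diff.shape, dtype=bool)
mask[0:64, :] = False  # e.g. a top banner with burned-in annotations

print(f"Pixels altered outside masked region: {np.count_nonzero(diff[mask])}")
print(f"Max absolute deviation: {np.abs(diff[mask]).max()}")
```

Any nonzero count outside the masked region is exactly the kind of silent change a validation protocol should flag.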
If we cannot verify where imaging data came from, how it was handled, whether de-identification was applied consistently and validated, and whether the dataset is complete and authentic — then we are building AI systems on foundations we cannot fully inspect. That is not a responsible approach to clinical AI development.
This is why de-identification needs to be treated as a governed workflow, not a one-time pre-processing step. It requires audit trails, validation protocols, chain-of-custody documentation, and ongoing quality checks — the same operational rigor we apply to any other critical healthcare data process.
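As one illustration of that rigor, here is a minimal chain-of-custody sketch: hash each file before and after a de-identification step and append the result to an append-only log. The record layout, tool name, and file names are assumptions, not any standard.

```python
# Append-only chain-of-custody log for a de-identification step
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256(path: str) -> str:
    """Content hash used to prove a file was not altered afterward."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def log_step(input_file: str, output_file: str, tool: str, audit_log: str) -> None:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "input_sha256": sha256(input_file),
        "output_sha256": sha256(output_file),
    }
    # JSON-lines, append-only: every step leaves a verifiable record
    with open(audit_log, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Hypothetical usage after one de-identification pass:
# log_step("slice_001.dcm", "slice_001_deid.dcm",
#          tool="deid-pipeline v2.1", audit_log="audit.jsonl")
```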
From Data Integrity to Patient Trust
The data integrity problem extends beyond the research environment into clinical deployment. Increasingly, patients arrive at appointments having interacted with AI systems that gave them authoritative-sounding diagnostic narratives — generated by models trained on unknown data, with unknown bias, carrying zero clinical accountability.[9]
Clinician curiosity about AI is healthy and appropriate. Patient certainty based on unvalidated AI output is not — and it places enormous strain on time-constrained clinical encounters. The trust that makes medicine possible depends on a chain of accountability that currently has significant gaps.
What Healthcare Organizations Should Actually Do
The answer is not to stop adopting AI. The clinical benefits are real, the competitive pressure is real, and the long-term potential is significant. But responsible AI adoption in healthcare requires a specific set of operational commitments that most organizations are not yet making systematically.
Establish data provenance and governance before AI deployment
Know where your training data came from, how it was handled, and whether it is representative of your patient population. This is not optional due diligence — it is the prerequisite for trustworthy AI output.
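A minimal sketch of what that can look like operationally: before any training run, verify that every file in the dataset still matches the hash recorded in its provenance manifest. The manifest format here, a JSON map of relative path to SHA-256, is an assumption.

```python
# Verify a dataset against its provenance manifest before training
# (manifest format is a hypothetical JSON map: relative path -> sha256)
import hashlib
import json
from pathlib import Path

def verify_manifest(manifest_path: str, data_root: str) -> list[str]:
    manifest = json.loads(Path(manifest_path).read_text())
    mismatches = []
    for rel_path, expected in manifest.items():
        digest = hashlib.sha256((Path(data_root) / rel_path).read_bytes()).hexdigest()
        if digest != expected:
            mismatches.append(rel_path)
    return mismatches  # non-empty means the data drifted from its record

# e.g. verify_manifest("train_manifest.json", "datasets/ct_train/")
```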
Treat de-identification as a governed workflow, not a preprocessing checkbox
Apply validation protocols, maintain audit trails, and build quality assurance into the de-identification process itself. If you cannot verify the integrity of your de-identified imaging data, you cannot responsibly use it for AI training or research data sharing.
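For instance, a post-de-identification QA pass might confirm that direct identifiers were actually removed. Here is a minimal sketch with pydicom; the attribute list is a small illustrative subset, not the full DICOM confidentiality profile.

```python
# Check a de-identified file for residual direct identifiers
# (a small illustrative subset of attributes, not a complete profile)
import pydicom

IDENTIFYING_KEYWORDS = [
    "PatientName", "PatientID", "PatientBirthDate",
    "InstitutionName", "ReferringPhysicianName",
]

def find_residual_phi(path: str) -> list[str]:
    ds = pydicom.dcmread(path)
    residual = []
    for keyword in IDENTIFYING_KEYWORDS:
        value = getattr(ds, keyword, None)
        if value not in (None, ""):  # attribute present and non-empty
            residual.append(keyword)
    return residual  # empty list means this subset of checks passed

# e.g. failures = find_residual_phi("slice_001_deid.dcm")
```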
Build interoperability infrastructure before layering AI on top of it
AI tools that pull data from multiple systems — radiology, pathology, EHR, lab — require those systems to communicate reliably and consistently. Fragmented data environments produce fragmented AI outputs. Interoperability is an AI readiness prerequisite, not an AI add-on.
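A readiness check can start small: before an AI pipeline is allowed to join records across systems, verify that identifiers from one feed actually resolve in the others. A minimal sketch, with in-memory sets standing in for real radiology, EHR, and lab queries:

```python
# Find identifiers an AI join across systems would silently drop
# (the sets below stand in for real system exports; values are hypothetical)
def coverage_gaps(radiology_ids: set[str],
                  ehr_ids: set[str],
                  lab_ids: set[str]) -> dict[str, set[str]]:
    return {
        "missing_in_ehr": radiology_ids - ehr_ids,
        "missing_in_lab": radiology_ids - lab_ids,
    }

gaps = coverage_gaps(
    radiology_ids={"MRN001", "MRN002", "MRN003"},
    ehr_ids={"MRN001", "MRN003"},
    lab_ids={"MRN001", "MRN002"},
)
for system, missing in gaps.items():
    if missing:
        print(f"{system}: {sorted(missing)}")
```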
Define clear boundaries between clinical AI tools and consumer-facing AI
The governance requirements, validation standards, and accountability frameworks for these two contexts are fundamentally different. Organizations that treat them identically will encounter both patient safety and regulatory problems.
Preserve human accountability at every decision point
No algorithm should function as the final word in a clinical decision without a human clinician's informed review. The legal, ethical, and practical reasons for this are all converging — and they are not going away as AI becomes more capable.
Building the Infrastructure That Trustworthy AI Requires
I believe healthcare AI will ultimately be good for patients, for clinicians, and for the health of the country. But that outcome is not automatic. It requires deliberate choices about how we build, govern, and deploy these systems — starting with the data infrastructure that sits beneath every AI capability we are hoping to unlock.
The organizations that get this right will be the ones that treated data integrity, governance, and de-identification as strategic priorities — not afterthoughts. The ones that built the operational foundation before they deployed the algorithm.
That is the work worth doing now.
References
1. Sinsky C, et al. Allocation of Physician Time in Ambulatory Practice. Annals of Internal Medicine, 2016.
2. Rajkomar A, et al. Machine Learning in Medicine. New England Journal of Medicine, 2019.
3. McKinney SM, et al. International Evaluation of an AI System for Breast Cancer Screening. Nature, 2020.
4. Hashimoto DA, et al. Artificial Intelligence in Surgery. Annals of Surgery, 2018.
5. Obermeyer Z, et al. Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations. Science, 2019.
6. Clunie DA. DICOM De-identification: Privacy and Security Considerations. Journal of Digital Imaging, 2016.
7. Vosoughi S, Roy D, Aral S. The Spread of True and False News Online. Science, 2018.
8. Nightingale SJ, Farid H. AI-Synthesized Media and Misinformation. Psychological Science in the Public Interest, 2022.
9. FDA. Artificial Intelligence and Machine Learning in Software as a Medical Device (SaMD). FDA Discussion Paper.