Horror Stories in Pathology Informatics: The Case of the Phantom Neutrophil

06/19/2026

Podcast

In this episode of CIPI Connections, members of the CAP Informatics Committee, Alexis Carter, MD, FCAP, and Richard Davis, MD, MPH, BS, FCAP, discuss the mysterious case of a disappearing hypereosinophilic syndrome.

For questions, feedback, or to learn how to submit your own idea, please email informatics@cap.org.

Transcript

Dr. M. E. de Baca:
Welcome to CIPI Connections, the podcast of the College of American Pathologists Council on Informatics and Pathology Innovation, also known as CIPI. Here we connect you with the leaders and committees shaping the future of pathology. I’m Dr. M. E. de Baca, the Chair of CIPI. In this episode from our Horror Stories in Pathology Informatics series, Dr. Alexis Carter and Dr. Richard Davis will be sharing a mysterious case of a disappearing hypereosinophilic syndrome. Take it away, Dr. Carter.

Dr. Alexis Carter:
So hello everyone. Welcome to Horror Stories in Pathology Informatics, Lessons Learned from Things Gone Awry, a series to help your healthcare organization, number one, keep your patients safe, and number two, avoid painful problems. I’m your host, Dr. Alexis Carter, a pathologist and the moderator for this series. And with me today is another pathologist, Dr. Richard Davis. As a reminder, these podcasts are available from your favorite podcast service under CIPI Connections, that’s C-I-P-I Connections, and also from the College of American Pathologists website at CAP.org. Today’s episode is the Case of the Phantom Neutrophil. As always, there are a few disclaimers needed before we get started. All the situations discussed in this podcast are based on real events. So if any of our listeners think that the situation described can’t or wouldn’t happen, we can assure you that it can and did. Having said that, all information that could identify the personnel or the healthcare organization involved has been removed and replaced with fictitious characters and a fictitious location of Cabot Cove Memorial Hospital. Any locale-specific aspects, for example, regulations that are more specific than a national U.S. level, have been removed. And some details regarding the situation which do not affect the main point of the event have been changed. Finally, the person presenting the adverse event is not the person who contributed it for discussion. And all these things were done to ensure that we focus on the lessons learned so that we can help others avoid these mistakes. Of course, this podcast does not represent legal or medical advice, and the lessons learned may not account for specific barriers that may be present at your own organization. All right, so now that we’ve done our disclaimers, let’s get started. Again, here with me is Dr. Davis, who is going to present today’s healthcare software issue on the Case of the Phantom Neutrophil and the lessons learned from it.
So, Dr. Davis, let’s start with the basics of the failure that we’re going to discuss today. What happened?

Dr. Richard Davis:
Thanks, Dr. Carter, for the introduction. So the case today involves a 10-month-old male infant who had a three-day history of intermittent fever and was referred to the Cabot Cove Memorial Hospital Emergency Department by his primary care physician for evaluation of a suspected infection. He was accompanied by his mother on this particular day. So the results of the complete blood count and differential cell count showed absolute neutropenia with eosinophilia. The patient received high-dose intravenous antibiotics and was admitted to pediatric oncology with a bone marrow biopsy scheduled for the following day. The next morning, the CBC and differential showed normal numbers of neutrophils with a resolution of the eosinophilia. A repeat blood sample showed the same result. The patient was transferred to the general pediatrics floor for another night of observation. He was subsequently discharged home. All laboratory tests were normal at the follow-up clinic visit.

Dr. Alexis Carter:
Okay, so initially neutropenia and then eosinophilia. Okay, so this is really very odd since it all disappeared. So I suppose, since you’re bringing this case here to the podcast, that the suspected infection didn’t explain the transient decrease in neutrophils or the increase in eosinophils.

Dr. Richard Davis:
Uh no, not at all.

Dr. Alexis Carter:
All right, fascinating. I’m all ears. What happened?

Dr. Richard Davis:
So all manual differential counts are performed in the laboratory using a digital hematology analyzer that captures images of the cells from the peripheral blood smear slide and classifies the cells using artificial intelligence. Although the cells are pre-sorted and tagged as to cell type by the analyzer, review and verification by the medical technologist is required before the results are finalized and posted to the patient’s chart. It turns out that there had recently been a change in the formulation of the reagents used to stain the peripheral smears, which affected the stain quality. The change in staining quality caused the digital hematology analyzer to misclassify neutrophils as eosinophils.

Dr. Alexis Carter:
Alright, so that doesn’t sound good, but that still doesn’t explain how it reached the patient’s chart if the medical technologist had to review and approve the results.

Dr. Richard Davis:
True. In this case, the medical technologist reviewing the differential count had seen that some of the neutrophils had been misclassified as eosinophils and had actually reclassified all the cells tagged as eosinophils back to neutrophils. But this action did not correct the differential count. The medical technologist had intended to correct the differential count, but was pulled away to perform a time-sensitive test. Upon returning to the differential count, they had forgotten they needed to correct the differential result and instead finalized it with neutrophils being counted as eosinophils and no count for the neutrophils.

Dr. Alexis Carter:
Oh, oh boy. So no count for the neutrophils. So that explains why the patient was put on the pediatric hematology oncology unit. I bet that lab result caused all kinds of worry.

Dr. Richard Davis:
Oh, yes, it did. An infant presenting with a zero neutrophil count can be quite concerning for leukemia or some kind of bone marrow failure syndrome. So imagine the surprise. And fortunately in this case, it was a happy surprise when the counts came back normal the next morning.

Dr. Alexis Carter:
Yeah, we don’t get many happy surprises in medicine. They’re usually bad surprises. So I imagine they were happy and the parents were relieved. Um, I’m also sure that there was an investigation by the laboratory. So what happened?

Dr. Richard Davis:
So, yeah, we’ll talk a little bit about the laboratory investigation. So the stain that was used for the peripheral smears had recently been reformulated, and this caused the digital hematology analyzer artificial intelligence algorithm to misclassify neutrophils as eosinophils. Both sets of cells are granulocytes in the myeloid lineage of white blood cells, and consequently both have a lot of cytoplasmic granules and convoluted nuclei. The main difference is that the granules of the eosinophils are typically bright red, while they are purple in neutrophils.

Dr. Alexis Carter:
So, you know, so hold up. Aren’t laboratories required to do some level of revalidation after a change in reagents, including staining reagents?

Dr. Richard Davis:
Yeah, so the reformulated stain had been revalidated through visual inspection by the technologists, so no one thought anything was amiss. However, no one had checked that the slides with the reformulated stain were being analyzed the same way by the digital analyzer’s artificial intelligence algorithm. And frankly, laboratory staff had grown accustomed to relying on the accuracy of the instrument and were accustomed to only having to reclassify occasional cells and not 40% of the cells.

Dr. Alexis Carter:
Okay, so what could have been done to avoid or mitigate this problem in this case?

Dr. Richard Davis:
So, yeah, that’s a good question. The long-term solution, of course, is to ensure that the staining reagent used produces a staining quality that is adequate for use on the digital hematology analyzer. The change in the stain quality due to reformulation of the staining reagents caused the digital hematology analyzer AI algorithm to misclassify neutrophils as eosinophils. So anytime a slide is being analyzed via a digital algorithm, it is always best to validate that a reformulated stain does not cause a big change in the results.
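The validation step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 0.95 acceptance limit, and the cell labels are all hypothetical and are not taken from the episode or from any analyzer vendor. It compares the analyzer's pre-classification against technologist-confirmed labels for the same cells on slides stained with the new reagent, and lists any cell class where agreement drops below the limit.

```python
from collections import Counter

def per_class_agreement(reference, predicted):
    """For each reference cell class, the fraction the analyzer labeled the same way."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must label the same cells")
    totals = Counter(reference)
    hits = Counter(ref for ref, pred in zip(reference, predicted) if ref == pred)
    return {cell_class: hits[cell_class] / totals[cell_class] for cell_class in totals}

def classes_failing(reference, predicted, min_agreement=0.95):
    """Cell classes whose agreement falls below a (hypothetical) acceptance limit."""
    agreement = per_class_agreement(reference, predicted)
    return sorted(c for c, fraction in agreement.items() if fraction < min_agreement)

# Made-up example data: technologist-confirmed labels versus analyzer labels
# for the same 20 cells on a slide stained with the reformulated reagent.
reference = ["neutrophil"] * 10 + ["eosinophil"] * 2 + ["lymphocyte"] * 8
predicted = (["eosinophil"] * 8 + ["neutrophil"] * 2
             + ["eosinophil"] * 2 + ["lymphocyte"] * 8)

print(per_class_agreement(reference, predicted))
print(classes_failing(reference, predicted))
```

In this made-up data set only the neutrophil class fails, which is the pattern a visual-only revalidation of the stain would not reveal.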

Dr. Alexis Carter:
Yeah, so I’m remembering there’s an AI term for this phenomenon, isn’t there?

Dr. Richard Davis:
Yes. So when a change that is small in input results in a big or clinically significant change in output, it means that the model is what we call quote-unquote brittle.
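One rough way to picture brittleness is to count how many predictions flip when every input is nudged by a small amount. The sketch below is a toy, not how any real hematology analyzer works: the single "redness" feature, the cutoff of 150, and the sample values are all invented for illustration.

```python
def classify_granulocyte(redness):
    """Toy rule: call the cell an eosinophil when granule redness exceeds a cutoff.

    The 0-255 redness scale and the cutoff of 150 are hypothetical.
    """
    return "eosinophil" if redness > 150 else "neutrophil"

def flip_rate(classifier, inputs, shift):
    """Fraction of inputs whose predicted label changes after a uniform shift."""
    flipped = sum(classifier(x) != classifier(x + shift) for x in inputs)
    return flipped / len(inputs)

# Invented redness values for five neutrophils under the original stain.
neutrophil_redness = [120, 130, 142, 145, 148]

# A small shift in color, standing in for a reformulated stain.
print(flip_rate(classify_granulocyte, neutrophil_redness, shift=10))
```

A model that sits this close to its decision boundary gives very different answers after a change a human reviewer would barely notice.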

Dr. Alexis Carter:
Okay, so what are some other things that affected this case?

Dr. Richard Davis:
The laboratory staff had just grown accustomed to relying on the accuracy of the instrument such that it didn’t really register just how off the counts were. This is an example of automation bias, which is the human tendency to over-rely on automated systems despite common sense to the contrary, and to treat machine-generated suggestions as inherently more accurate than human judgment. In this case, a change in the input, that is, the visual appearance of the stained cells, caused a deterioration in performance for the AI algorithm. This is applicable not only to digital image-based models, but also to clinical decision support models. A change in the hospital drug formulary or a new test component build could result in big changes in the output for a brittle AI model.

Dr. Alexis Carter:
Okay, so we’ve talked about some of the main contributing factors for this particular case, but there’s always secondary or even sometimes tertiary factors that can contribute to issues like this. These can include, you know, kind of like inhibited communication sometimes due to poor relationships, silos between groups, you know, kind of systemic infrastructural issues, et cetera. So what kind of secondary and tertiary issues do you think contributed to this particular failure or any delays in its correction?

Dr. Richard Davis:
Yeah, so this kind of comes back to first principles a little bit and how we, you know, build and use our various information systems. So in this case, one of the contributing issues identified was the fact that the differential count could be verified with no value entered for the neutrophil count. So the medical technologist appropriately recognized that the differential needed to be corrected and had moved all of the neutrophils into the eosinophils classification, intending for this to serve as a trigger to remind them to correct the result. Unfortunately, the technologist was distracted, and when they returned to the result, the system allowed them to finalize the differential count with no values entered for the neutrophils.

Dr. Alexis Carter:
Okay, so what do you think could have been done to prevent that from happening?

Dr. Richard Davis:
Although there are certain clinical circumstances where no neutrophils being present is a plausible result, requiring the laboratory staff to always enter a numerical value for the neutrophils, even if that is zero, may give them the pause needed to reevaluate the results prior to verification.
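The safeguard described here, refusing to finalize a differential until every cell class has an explicit number, might look something like the following. This is a minimal sketch under assumptions: the list of required classes, the function name, and the message wording are hypothetical, and a real laboratory information system would have its own rule engine and a longer list of cell classes.

```python
# Hypothetical list of classes that must carry an explicit value before verification.
REQUIRED_CLASSES = ("neutrophil", "lymphocyte", "monocyte", "eosinophil", "basophil")

def verification_problems(differential):
    """Return the reasons a differential (percent per cell class) should not be finalized."""
    problems = []
    for cell_class in REQUIRED_CLASSES:
        value = differential.get(cell_class)
        if value is None:
            # A blank is not accepted; the technologist must type 0 if none were seen.
            problems.append(f"{cell_class}: no value entered (enter 0 if none were seen)")
        elif not 0 <= value <= 100:
            problems.append(f"{cell_class}: {value} is outside 0-100 percent")
    return problems

# Made-up result resembling the case: eosinophils high, neutrophils left blank.
pending = {"neutrophil": None, "lymphocyte": 45.0, "monocyte": 8.0,
           "eosinophil": 46.0, "basophil": 1.0}

for problem in verification_problems(pending):
    print("Cannot finalize:", problem)
```

The design point is that a blank and a zero are treated differently: a zero can still be released, but only after someone has deliberately typed it.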

Dr. Alexis Carter:
Yeah, that’s a good point because it is exceedingly rare when neutrophil counts are absolutely zero, even in a patient who’s got neutropenia. So let’s go on to examine the issue of workflow. So, what do you think could have been done to prevent this particular workflow issue, specifically the distraction, from causing these, you know, wrong results to go out?

Dr. Richard Davis:
Yeah, well, I think as we all know from our day-to-day life experience, it can’t be stressed enough that humans are bad at task switching, also known as multitasking. Despite our best intentions, there are a lot of research studies that show task switching increases cognitive load, decreases efficiency, and increases the risk of error. So, whenever possible, laboratory staff should complete whatever task they’re performing before trying to start something else. So do one thing at a time. Relying on human memory to pick up where we left off is just not a good workflow strategy. It’s especially important when similar samples or tasks for different patients are involved.

Dr. Alexis Carter:
I totally agree, but you stated that the technologist got pulled away to perform a time-sensitive task, which presumably was a stat or another task on a very sick patient. So, how do you think the workflow could be better managed for that particular situation?

Dr. Richard Davis:
Yeah, so there’s always going to be urgent situations that need to be handled in the laboratory. And if you only have a single technologist on that bench, then sometimes it’s a matter of training the technologists that if they get interrupted, they must start from square one when they return to an interrupted task, in spite of the fact that that may increase the amount of time they have to spend on that task. At other times, though, and especially for high-volume tests like CBCs or basic clinical chemistry tests, there may be more than one technologist such that the time-sensitive task can be given to someone who is not currently occupied with a task.

Dr. Alexis Carter:
So, Dr. Davis, let’s talk a little bit about automation bias. You and I are both involved in clinical informatics. I know I have had some really interesting experiences with automation bias. And it’s a fascinating phenomenon, right? Because you’ve got physicians and other healthcare workers who are highly trained professionals. They’re very smart individuals. And my understanding of automation bias has always been that the person is kind of accepting what the computer is telling them, even though common sense and their training would clearly indicate that whatever the computer is saying is not true. I’m wondering kind of what experiences you may have had, obviously not revealing any institutional dirty laundry, you know, with automation bias, or what kinds of things you may have done to try to prevent people from falling into the rabbit hole of automation bias.

Dr. Richard Davis:
Yeah, so I think what you’re hitting on, Dr. Carter, is there’s actually a couple of different manifestations of automation bias. As you mentioned, the bias as such is when laboratory professionals and clinicians, at least in our space, over-rely on automated systems and favor those outputs over their own critical judgment, especially when evidence is pointing somewhere else. There are two distinct forms, though, that this takes. You have errors of omission. These occur when a professional fails to notice or react to a problem because the automated system did not provide an alert. So that would be kind of related to what we had in this situation of the phantom neutrophil. So for example, a technician might overlook a significant sample abnormality, like the fact that there’s an eosinophilia and essentially a zero neutrophil count, because the analyzer doesn’t flag that. There’s also kind of what you’re referring to, which is an error of commission. So those types of errors occur when a user uncritically follows an incorrect suggestion or recommendation from an automated system, even when it conflicts with other available evidence. And some of the experience that I’ve had with this comes down to, you know, flagging issues where a clinician will get back a pathology report and there aren’t particular keywords or something in there. And so it’s almost like they’re treating the report as just a machine-generated result and not necessarily going back and checking it against what the patient’s clinical picture looks like, imaging studies, you know, whether there are microbiology results on board. And so it’s always important to kind of, you know, contextualize and check whatever laboratory report you’re getting back against the patient’s clinical condition.

Dr. Alexis Carter:
Right. I mean, I totally agree. I mean, a lot of times these errors happen when people are busy. I mean, we just talked about this technologist who was distracted, got pulled away, came back. You know, I know I myself, even though I have, you know, three meetings that day or whatever, when I’m in the middle of doing clinical sign out, even I’m like, I’m just gonna try to squeeze in 10 more minutes of, like, I’m not gonna do anything with this case until I am completely focused and quiet, right? So I totally agree, and it’s hard. It’s hard in clinical medicine. It’s hard for our clinician partners also because they’re looking at a bazillion things. They’ve got a lot of patients that they’re dealing with, and there’s all this lab data. It’s like drinking from a fire hose in some of these patients. And so it can get really difficult. I want to switch a little bit to talking specifically about artificial intelligence in laboratories. Artificial intelligence in medicine is really already here, coming through medicine like a freight train, pretty much. And it’s an incredibly powerful technology, but we have to use it responsibly. I want to talk a little bit more about brittle algorithms because, you know, there’s this famous MIT group who did this sort of famous study where they went into a bunch of images and they basically changed just, you know, several pixels within the image. So they took the image and fed it through Google’s algorithm before they altered it. And so Google took this picture of a cat or a turtle or in some cases a rifle, you know, and said it’s, you know, 99% that it’s a cat, it’s a turtle, it’s a rifle, whatever. But then this MIT group went in and they basically changed these pixels in this image to where a human looking at the image would still classify it as a cat or a turtle or a rifle, easy, right?
Because it was just these single little pixels in the background. But by making those tiny little changes, when they fed those altered images back through Google’s image analysis algorithm, the algorithm, you know, funnily enough, for the cat picture, classified it with 99% confidence as guacamole. Now, in Google’s defense, it was a calico cat. All right. But I mean, as you can imagine, there are two pieces that get brought up here. You have an analysis algorithm where there are tiny little changes in the input, like in the case of this stain, where the human looking at it could still easily tell what was a neutrophil versus an eosinophil, which is mostly color-based. Um, you know, but then this AI algorithm just can’t get it right. So we call those brittle models, right? They’re brittle, meaning it doesn’t take much to break them. There’s a lot of artificial intelligence happening in laboratories. I’m wondering if you want to talk a little bit more about some of these kinds of models and how they get used in hematology. I know we’ve also used this type of algorithm in cytology for Pap smears with some of the automated Pap smear readers.

Dr. Richard Davis:
Yeah, so that’s kind of a broad topic. Just to sort of talk a little bit about, you know, the sort of historical practice, especially in like hematology, traditionally you would use sort of impedance-based, you know, methods that would send a signal back to a computer and, you know, based off of certain characteristics, cells would be classified in a certain bucket, you know, going all the way back to like Wallace Coulter’s original Coulter counter in like the 1960s. But nowadays, especially with the rise of artificial intelligence, you’re starting to see, instead of traditional methods, AI-based methods for the classification of these signals. On the imaging aspect, you know, traditional machine learning algorithms have been around for a number of decades, especially in the field of cytology for Pap smears. And those traditional machine learning algorithms weren’t perfect, but they kind of had a stability to them. Based off of the data that they had been trained on, as well as the way that they had been designed, you had, you know, a pretty stable environment, even if it was not necessarily perfect. One of the challenges that we have with these more complex deep learning algorithms is the fact that they can be continuously learning. And so that’s an issue that the FDA and other sort of regulatory agencies are having to grapple with: you know, how do you sort of lock down the performance for these deep learning algorithms? Because sometimes they can get quite creative, and creative in a wrong sort of way. Not necessarily in the medical field, but in the legal field, we’ve seen this with the issue of hallucinations, where generative AI algorithms will produce fake or made-up legal cases, and those will actually get presented by the sort of unsuspecting or unwary legal professional in the context of a court case. And we’ve seen sort of several high-profile incidents of that in the media.
But you know, especially when it comes to medicine, I think the real key will be not just developing algorithms that can perform at a high level, but algorithms that are robust against, you know, data contamination, you know, changes in the kinds of images that they’re getting. And it’s definitely going to be a major regulatory issue as AI continues to perfuse every aspect of our day-to-day practice.

Dr. Alexis Carter:
Yeah, so I think those are really great points, and I do remember some of those things coming up. I did want to talk about, you know, so what you’re talking about is like having an AI model that generalizes well, right? That you can feed it different data and it will still come up with the same general correct interpretation, right? It may not be perfect, but it’s pointed in the right direction. But the other thing I wanted to touch on, especially because this is a podcast for the College of American Pathologists, you know, we’re laboratorians, we have to follow the Clinical Laboratory Improvement Amendments, or CLIA. And so there are a lot of places in medicine that are looking at continuously learning algorithms. These are AI algorithms that are sort of periodically or continuously retraining themselves. But as you mentioned, that can cause lots of issues, right? Because when something’s continuously retraining, you can’t validate it, or you’re having to validate it on the fly, which isn’t necessarily what we want in medicine. And actually, under CLIA law, you know, we have to validate everything before we put it into production, right? So continuously learning algorithms are not something that we can really use in the laboratory under existing federal law. So we have to, you know, train a model and then we have to validate and test it. And then we have to lock the model down and then use it. And then if we want to make an update, we have to go through that process all over again. The next thing I just wanted to talk about was the hallucinations piece. You know, we’ve seen this a lot in medicine, right? When people are trying to use AI to help them write papers, and I’ve done this myself. I’ve kind of played around with some AI platforms and just said, you know, give me the best 10 references for, I don’t know, papillary thyroid carcinoma.
And then I go look these references up, and you know, like eight out of the 10 of them don’t exist. Like it’s totally made up, which is hugely problematic. And I don’t know if you have some other examples that we’ve seen in medicine with some of these things, but yeah, the hallucinations are a little bit scary because, wow, those references looked really plausible.
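The train, validate, lock, then use sequence mentioned in this turn can be made concrete with a fingerprint check. The sketch below is one possible way to detect that a model artifact has changed since it was validated; it is an assumption for illustration, not a description of any regulatory requirement or vendor mechanism. The function names and the placeholder bytes are hypothetical; only `hashlib.sha256` is a real standard-library call.

```python
import hashlib

def fingerprint(model_bytes):
    """SHA-256 digest of the serialized model, recorded at the time of validation."""
    return hashlib.sha256(model_bytes).hexdigest()

def is_validated_version(model_bytes, validated_fingerprint):
    """True only if the model in use is byte-for-byte the one that was validated."""
    return fingerprint(model_bytes) == validated_fingerprint

# Placeholder bytes standing in for a serialized model file.
validated_model = b"model weights v1 (placeholder bytes)"
locked_fingerprint = fingerprint(validated_model)

# Any retraining changes the bytes, so the check fails until revalidation.
retrained_model = validated_model + b" plus new training"
print(is_validated_version(validated_model, locked_fingerprint))
print(is_validated_version(retrained_model, locked_fingerprint))
```

A check like this only shows that the model has not changed; it says nothing about whether the inputs, such as a reformulated stain, have.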

Dr. Richard Davis:
Yeah, no, I know at least in interacting with some colleagues, they’ve talked about using, you know, some of the sort of popular AI prompt software that’s out there to, you know, whether it be tighten a report or look for research articles. And I think this comes back to the issue of automation bias again, that while these things can be very helpful, anytime you’re using a generative AI to either help write a report or look up references or something like that, we always need to be double-checking with our own human intuition. Maybe we would kind of see that with, you know, sort of traditional search engines like Google or some of the others that exist out there, where you look at the various links that are brought up and you would click on those links and you could sort of validate the information that’s in there. So I think it’s important that as we move into this next stage where everybody’s gonna be using an AI chatbot, everybody’s gonna be using, you know, some sort of generative algorithm prompt software interface, that when you are generating something, whether it be for a report or research or whatever application it is, you’re always vetting the information that’s put in front of you and not just, you know, getting sucked into that automation bias that would prevent you from seeing the issues with hallucinatory results or, you know, sort of other problematic results.

Dr. Alexis Carter:
Yeah, so I know we’re getting towards the end of our time, but, you know, that gets into the issues of explainability and transparency, right, with AI. So yeah, so explainability is, you know, when the AI algorithm explains where it got its data. And some of the AI, you know, generative large language models kind of show you where they got their data from, which is helpful because you can click on the links and go verify it, right? So that’s a partially explainable model. And it has some elements of transparency also because the model is then showing you kind of what it was using to base its answer off of. You know, transparency is a little bit more than that. It’s also, you know, the people who developed the AI algorithm, what did they mean it for, who was it intended for, you know, what kind of population is embedded in the database, you know, all that stuff. And those are really important because of like what you’re talking about, being a human and being able to go in and figure out where the AI got its data from, because then you can kind of more quickly pick up whether or not there’s kind of a serious issue going on that means you need to turn it off. Do you have any other things you want to say about like explainability, transparency, or anything like that?

Dr. Richard Davis:
Yeah, I think it’s important because we have to also recognize that these models have some level of bias in them as well. And depending on how and what kind of data these models are trained on, that can potentially perpetuate biases. Everything from, you know, sort of technical biases that we can see within the data in terms of, you know, like classification of a certain cell type or something like that, to even affecting things like social determinants of health as well. So it’s important to kind of keep that in the back of your mind as well when dealing with these algorithms is to kind of know or be aware of the fact that there could be, you know, biased information that’s coming towards you as well.

Dr. Alexis Carter:
Okay, so all right, this has been quite educational. Let’s sum this up for the listeners. What are the main lessons learned from this incident and what do you recommend that our listeners should do to prevent this from happening in the future?

Dr. Richard Davis:
So in this case, there was an unnecessary hospitalization for an infant due to a spurious absolute neutropenia result. A change in the formulation of the staining reagent caused misclassification of neutrophils as eosinophils by the digital hematology analyzer. When using AI technology, we have to be vigilant about changed inputs that can impact model performance. Although the medical technologist initially recognized the erroneous categorization of the neutrophils, they became distracted before they could complete the result correction and were ultimately able to finalize the results despite no value being entered for the neutrophils.

Dr. Alexis Carter:
Great. So thanks so much, Dr. Davis, for presenting today. For our listeners, we hope that you have found this podcast on the case of the Phantom Neutrophil helpful. This podcast was produced by the College of American Pathologists, and the content was produced by the College of American Pathologists Informatics Committee. The Informatics Committee always welcomes questions about the podcast as well as suggestions for future podcasts. If you would like to contribute an issue that happened to you, and for instructions on how to anonymously contribute an educational issue and its lessons learned, please contact the committee at the email address listed in the show notes. Please do not send specifics on the issues to this email address. We can get those from you later. We thank you so much for listening, and we look forward to sharing our next podcast with you soon. Bye-bye.

Dr. M. E. de Baca:
Thank you, Dr. Carter and Dr. Davis, for sharing this case and the lessons you learned from things gone awry. Thank you to our listeners for being here. Please stay tuned for future horror stories in pathology informatics. And join us again for insights, updates, and the people behind the innovation. This has been CIPI Connections, where ideas meet action in pathology.
