
How to Validate AI Algorithms in Anatomic Pathology

Read the cases in order starting with Part 1. The case includes questions to support your learning as you progress through each part. In addition, the case includes a summary, key points, multiple-choice questions with answers, and a list of references. After completing at least one case (hopefully you will complete all three), please provide your feedback via a short survey.


Your laboratory would like to bring in a new FDA-approved, artificial intelligence (AI) system as a diagnostic aid to pathologists reading cervical biopsies. The system integrates with the FDA-approved digital pathology system that your practice has already implemented and validated for primary diagnosis. The system uses images from the digital pathology system and applies a machine learning algorithm to pre-classify cases as normal, benign reactive, low grade dysplasia, high grade dysplasia, carcinoma in situ, or invasive carcinoma. When a pathologist reviews a cervical biopsy on the digital pathology system, the computer-generated pre-classification is displayed in a separate window on the monitor. The final diagnosis is determined by the pathologist, who can elect to either accept the pre-classification or select an alternate diagnosis. 

The original FDA approval study for the AI system included 2000 cervical biopsies that were evenly distributed among normal, benign reactive, low grade dysplasia, high grade dysplasia, carcinoma in situ, and invasive carcinoma. The data demonstrated 95% sensitivity and 92% specificity for detecting carcinoma in situ or invasive carcinoma, and 96% sensitivity and 88% specificity for detecting dysplasia of any grade.

Question to Think About

Prior to placing this AI system into the pathologist’s workflow, what are the requirements for validation, if any?

Regardless of whether a test, device, or diagnostic aid is FDA-approved or not, it must be validated in the laboratory before being placed into clinical use.

Both CLIA and CAP require that any new test, device, or diagnostic aid undergo validation before results are reported on patient samples. However, neither CAP nor CLIA specifies details of the validation study, such as the number of specimens or limits of acceptability. These parameters are determined by the medical director. Although there is currently no specific published guidance on validation of image-based AI algorithms, the CAP published guidelines in 2013 on validating whole slide imaging (WSI), and many of the principles can be applied when validating an image analysis algorithm. The guideline statements listed below are particularly relevant when considering validation of any diagnostic digital imaging system.

  1. Validation should be appropriate for and applicable to the intended clinical use and clinical setting of the application in which WSI will be employed. Validation of WSI systems should involve specimen preparation types relevant to the intended use (eg, formalin-fixed paraffin-embedded tissue, frozen tissue, immunohistochemical stains, cytology slides, hematology blood smears).
  2. The validation process should include a sample set of at least 60 cases for one application (eg, H&E-stained sections of fixed tissue, frozen sections, cytology, hematology) that reflects the spectrum and complexity of specimen types and diagnoses likely to be encountered during routine practice.
  3. The validation study should closely emulate the real-world clinical environment in which the technology will be used.
  4. The validation study should encompass the entire WSI system.
  5. Revalidation is required whenever a significant change is made to any component of the WSI system.
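
As a rough illustration of how statements 1 and 2 might be applied to the cervical biopsy scenario, the sketch below checks a proposed validation set for the 60-case minimum and for coverage of every category the AI system can output. The function name, the category strings, and the example case mix are hypothetical; the guideline does not prescribe any code or any per-category minimum.

```python
from collections import Counter

# Output categories of the AI system as described in the case scenario.
CATEGORIES = [
    "normal",
    "benign reactive",
    "low grade dysplasia",
    "high grade dysplasia",
    "carcinoma in situ",
    "invasive carcinoma",
]

MIN_CASES = 60  # minimum per application in the CAP WSI guideline


def check_validation_set(reference_diagnoses, min_cases=MIN_CASES):
    """Return a list of problems found in a proposed validation set.

    reference_diagnoses: one reference (pathologist) diagnosis per case.
    An empty list means the set passes these two basic checks.
    """
    problems = []
    counts = Counter(reference_diagnoses)
    total = sum(counts.values())
    if total < min_cases:
        problems.append(f"only {total} cases; at least {min_cases} required")
    for category in CATEGORIES:
        if counts[category] == 0:
            problems.append(f"no cases with diagnosis '{category}'")
    return problems


# Hypothetical case mix, for illustration only.
example = (
    ["normal"] * 20
    + ["benign reactive"] * 15
    + ["low grade dysplasia"] * 12
    + ["high grade dysplasia"] * 8
    + ["carcinoma in situ"] * 5
)
print(check_validation_set(example))
```

A check like this covers only case count and category coverage. Whether the set reflects the complexity of routine practice remains a judgment for the medical director.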

Your laboratory uses the CAP guidelines on validating WSI to aid in designing a validation study for the AI system. The validation study occurs in the laboratory using 200 cervical biopsy samples that were previously diagnosed by pathologists. These cases reflect the variety of diagnoses typically encountered in the pathologist’s workflow, including the categories of normal, benign reactive, low grade dysplasia, high grade dysplasia, carcinoma in situ, and invasive carcinoma. The results of the validation study indicate that the AI system is 96% sensitive and 91% specific for detection of carcinoma in situ or invasive carcinoma when compared to evaluation by a pathologist. Additionally, the system is 95% sensitive and 87% specific for detecting dysplasia of any grade.
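
The sensitivity and specificity figures above come from collapsing the six categories into a positive group and a negative group and comparing the AI pre-classification with the pathologist's diagnosis case by case. A minimal sketch of that calculation follows, assuming the pathologist's diagnosis is treated as the reference standard. The function name and the five-case data set are hypothetical and are not taken from the validation study.

```python
def sensitivity_specificity(reference, predicted, positive):
    """Compute sensitivity and specificity for a binarized comparison.

    reference: pathologist diagnoses (treated as the reference standard)
    predicted: AI pre-classifications, in the same case order
    positive:  set of categories counted as "positive"
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        ref_pos = ref in positive
        pred_pos = pred in positive
        if ref_pos and pred_pos:
            tp += 1  # both call the case positive
        elif ref_pos:
            fn += 1  # AI missed a reference-positive case
        elif pred_pos:
            fp += 1  # AI flagged a reference-negative case
        else:
            tn += 1  # both call the case negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical five-case example, for illustration only.
reference = ["carcinoma in situ", "invasive carcinoma", "normal",
             "benign reactive", "high grade dysplasia"]
predicted = ["carcinoma in situ", "high grade dysplasia", "normal",
             "benign reactive", "carcinoma in situ"]
positive = {"carcinoma in situ", "invasive carcinoma"}

sens, spec = sensitivity_specificity(reference, predicted, positive)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The same function could be reused for dysplasia of any grade by passing a different positive set. Which categories belong in that set is a study design decision that the case does not spell out.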

These results meet or exceed the medical director’s approval criteria of 95% sensitivity and 85% specificity, and the system is placed into the clinical workflow. The pathologists report high satisfaction with the algorithm, noting its sensitivity in detecting dysplasia and carcinoma. Given this positive experience, the group looks at similar systems for other high-volume applications such as breast and prostate biopsies.
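
The comparison against the approval criteria can be sketched as below, using the point estimates and thresholds stated in the case. The Wilson score interval is an addition that the case does not require. It is included only to suggest how much uncertainty a point estimate carries when few positive cases are available, and the count of 48 of 50 is hypothetical, since the case does not report the underlying counts.

```python
import math

# Acceptance criteria and point estimates as stated in the case.
CRITERIA = {"sensitivity": 0.95, "specificity": 0.85}
RESULTS = {
    "carcinoma in situ or invasive carcinoma": {"sensitivity": 0.96, "specificity": 0.91},
    "dysplasia of any grade": {"sensitivity": 0.95, "specificity": 0.87},
}


def meets_criteria(result, criteria=CRITERIA):
    """True if every metric is at or above its acceptance threshold."""
    return all(result[name] >= threshold for name, threshold in criteria.items())


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


for target, result in RESULTS.items():
    print(target, "->", "pass" if meets_criteria(result) else "fail")

# Hypothetical count: 48 of 50 reference-positive cases flagged by the AI.
low, high = wilson_interval(48, 50)
print(f"sensitivity 48/50 = {48 / 50:.2f}, approx. 95% interval {low:.3f} to {high:.3f}")
```

Under an "at or above" rule both targets pass, although the 95% sensitivity for dysplasia only equals its threshold. Whether an interval, rather than a point estimate, should be compared with the criteria is left to the medical director.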

Question to Think About

If a pathology group chooses to use machine-learning models, should it be restricted to using only FDA-approved models? If not, should the regulatory status of the model be included in the surgical pathology report (similar to an analyte-specific reagent [ASR])?

As with any test, the laboratory is not restricted to using only FDA-approved systems. However, a statement regarding the regulatory status of the model and the performance of a validation study should be included in the surgical pathology report.

A non-FDA-approved system may be employed for clinical testing, provided that an adequate validation has been performed by the laboratory. For non-FDA-approved systems, the manufacturer may provide performance data to prospective users, but these data have not undergone successful submission and approval through the FDA. Non-FDA-approved systems may also have been evaluated by others in published studies. Medical directors should take these data into consideration when designing a validation study.

The best practice is to include a statement in the surgical pathology report indicating that a non-FDA-approved system was involved in the diagnostic process. An example for this application is listed below:

This test was developed and its performance characteristics determined by XXX Laboratories. The U.S. Food and Drug Administration has not approved or cleared this test; however, FDA clearance or approval is not currently required for clinical use.

After many months of using both FDA-approved and non-FDA-approved artificial intelligence (AI) systems for cervical, breast, and prostate biopsies, all of which were validated by the laboratory, an apparently discrepant case arises. A pathologist uses the cervical AI system to assist in signing out a case. The AI suggests a “benign reactive” diagnosis but does not identify a collection of multinucleated cells with molded nuclei and marginated chromatin, suggestive of herpes simplex virus (HSV) infection.

Question to Think About

Can the pathologist argue the model's overlay misled them?

Not necessarily, as the previously described categories of classification by the AI system only included normal, benign reactive, low grade dysplasia, high grade dysplasia, carcinoma in situ, and invasive carcinoma.

In this case, the pre-classification system correctly classified the HSV-infected cells as benign reactive rather than normal, dysplastic, or carcinoma. Thus, although distinguishing among the different reactive conditions would be ideal, it is not a designed output of the system. For any image recognition or AI system, it is important to have a clear understanding of the system’s intended use and stated output. Additionally, it is important to include a wide variety of diagnostic entities and histologic findings in the validation study to gain an understanding of the behavior of any image recognition or AI system.

Any type of image analysis or image recognition system, whether FDA-approved or non-FDA-approved, must be validated prior to clinical use. As neither CLIA nor CAP has published specific guidelines for validation studies of image recognition or AI systems, the design of the validation study, including the number and type of specimens and the acceptance criteria, is at the medical director’s discretion. One helpful document to aid in designing a validation study is the CAP guideline for validating whole slide imaging, which highlights important principles in evaluating digital image systems for clinical use. With the recent FDA approval of WSI for primary diagnosis, digital pathology and diagnostic aids such as AI-based image algorithms will be used with increasing frequency in clinical practice. For more information on the validation process, FDA approval for WSI, and the current state of image analysis, please see the included references.

  1. Which of the following is true regarding validation requirements for an FDA-approved image analysis system?
    1. the FDA publishes guidelines on laboratory validation for approved systems.
    2. the medical director can determine whether the system is suitable for clinical use based solely on the FDA approval data.
    3. validation of the system must be performed by the laboratory prior to being placed into clinical use.
    4. the manufacturer can validate the system for the laboratory.
  2. Which of the following would require revalidation of an AI system for image analysis?
    1. Adding memory to the computer running the AI system.
    2. Changing the AI system algorithm or model.
    3. Applying an update to the operating system to the computer running the AI system.
    4. Relocating the computer running the AI system.
  3. Applying the CAP recommendations for validating whole slide imaging systems to an AI system, the recommended minimum number of cases that should be included in the validation is
    1. 20
    2. 50
    3. 60
    4. 100
  4. When using a non-FDA-approved system for clinical diagnostics, best practice includes:
    1. placing a disclaimer in the pathology report stating that a non-FDA approved system was used.
    2. annual revalidation of all non-FDA approved systems.
    3. cross checking a subset of results with an FDA-approved system.
    4. allowing the vendor to perform the validation study prior to clinical use.

Aeffner F, Zarella MD, Buchbinder N, Bui MM, Goodman MR, Hartman DJ, Lujan GM, Molani MA, Parwani AV, Lillard K, Turner OC, Vemuri VN, Yuil-Valdes AG, Bowman D. Introduction to digital image analysis in whole-slide imaging: A white paper from the Digital Pathology Association. J Pathol Inform. 2019;10:9. doi: 10.4103/jpi.jpi_82_18.

Evans AJ, Bauer TW, Bui MM, Cornish TC, Duncan H, Glassy EF, Hipp J, McGee RS, Murphy D, Myers C, O’Neill DG, Parwani AV, Rampy A, Salama ME, Pantanowitz L. US Food and Drug Administration Approval of Whole Slide Imaging for Primary Diagnosis: A Key Milestone Is Reached and New Questions Are Raised. Arch Pathol Lab Med. 2018;142(11):1383-1387. doi: 10.5858/arpa.2017-0496-CP.

Pantanowitz L, Sinard JH, Henricks WH, Fatheree LA, Carter AB, Contis L, Beckwith BA, Evans AJ, Lal A, Parwani AV. Validating Whole Slide Imaging for Diagnostic Purposes in Pathology: Guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med. 2013;137(12):1710-1722. doi: 10.5858/arpa.2013-0093-CP.

2019
Brent Tan, MD, PhD
CAP Informatics Committee
Stanford University
Stanford, CA

1. c
2. b
3. c
4. a