"Rise of the machines" AI is not just for AP anymore
September 24, 2021
Laboratory medicine continues to generate data of paramount importance to medical decision making. Much of the machine learning/artificial intelligence (ML/AI) literature has centered on digital imaging in anatomic pathology; however, a recently published review outlines the various use cases where ML/AI can be leveraged in clinical pathology. While image analysis and rare event detection are also relevant in laboratory medicine, other applications include instrument automation, error detection, forecasting, result interpretation, test utilization, and genomics. The authors also cover the challenges related to machine learning in healthcare and laboratory medicine, and they discuss a proof of concept related to automated machine learning (AutoML).
- The Journal of Applied Laboratory Medicine, jfab075, https://doi.org/10.1093/jalm/jfab075
- Rashidi HH, Tran N, Albahra S, Dang LT. Machine learning in health care and laboratory medicine: General overview of supervised learning and Auto-ML. Int J Lab Hematol. 2021;43:15-22. https://doi.org/10.1111/ijlh.13537
FDA Evaluations of Medical AI Devices Show Limitations
April 13, 2021
While there are no regulatory clearances for pathology-related AI devices, this article highlights important limitations of several currently available FDA-cleared medical AI devices. The authors also provide an annotated database of the 130 medical AI devices analyzed in their article, including risk level, demographic reporting, and whether multi-site data were evaluated. The authors discuss two main limitations in the development of AI models: 1) the use of only retrospective data when developing the model; and 2) the lack of generalizability that results from insufficient variation in data sources or inclusion of only one or a few sites in developing the model. The potential for human-computer interaction to shift a model away from its intended use should be evaluated in a prospective setting, for example to ensure that a device cleared as a screening tool is not used as a primary diagnostic tool. Importantly, the use of varied demographic data is critical to ensure inclusion and evaluation of the model in diverse patient populations. The authors present a case study evaluating three models trained on three publicly available chest x-ray datasets for pneumothorax detection. Each model was trained at a single site and tested across the three datasets. The models with the highest performance were those tested at the site where they were trained, and there were significant decreases in AUC when models were evaluated on data from a different site. Recommendations include evaluating the performance of an AI device at multiple clinical sites, encouraging prospective evaluation during clinical trials, and post-market surveillance. A simple sketch of this kind of cross-site evaluation follows the reference below.
- Wu, E., Wu, K., Daneshjou, R. et al. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat Med (2021). https://doi.org/10.1038/s41591-021-01312-x
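The sketch below illustrates, in schematic form, the cross-site evaluation design used in the case study: a classifier is trained on data from one site and its AUC is then measured against every site's data, so that internal and external performance can be compared side by side. The data are synthetic, and the site names, feature dimensions, and effect sizes are invented for illustration; this is not the authors' code or any specific device's evaluation procedure.

```python
# A minimal sketch, using synthetic data, of the cross-site evaluation design
# described above: train on one "site," then report AUC against every site.
# Site names, feature dimensions, and effect sizes are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

base_coef = np.array([1.0, -0.8, 0.6, 0.4, -0.2])  # signal shared by all sites

def make_site(seed, n=2000):
    """Simulate one site: shared signal plus a site-specific perturbation."""
    rng = np.random.default_rng(seed)
    coef = base_coef + rng.normal(scale=0.6, size=5)  # site-specific shift in the signal
    X = rng.normal(size=(n, 5))
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ coef)))).astype(int)
    return X, y

sites = {"site_a": make_site(1), "site_b": make_site(2), "site_c": make_site(3)}

for train_name, (X_tr, y_tr) in sites.items():
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    for test_name, (X_te, y_te) in sites.items():
        # Note: the "internal" case here is evaluated in-sample; a real study
        # would also hold out within-site test data.
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        tag = "internal" if test_name == train_name else "external"
        print(f"trained on {train_name}, tested on {test_name} ({tag}): AUC = {auc:.3f}")
```

In practice the same loop would be run with each site's real feature matrices and labels, and external AUC values markedly lower than internal ones would signal a generalizability problem of the kind the authors describe.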
Impact of COVID-19 on AI Models
February 1, 2021
Despite the title, this isn’t a healthcare article, but it does highlight a problem in machine learning that is very important in healthcare and underlies some of the other articles listed on this page. The article presents a straightforward illustration of data drift and shift, showing how the COVID pandemic has disrupted the performance of AI models from supply chain prediction to automated classification of photographs. This performance problem occurs because the patterns in the data change from those on which the model was originally trained. Depending on the environment in which it operates, a model may be stable for years or may need frequent updating. Events that change the details of a data environment, such as process improvement projects or instrumentation changes, may contribute to degrading a model's performance. Humans typically adapt more quickly to these types of changes than AI, which may require re-training with a large data set expressing the new patterns to regain accuracy. While it is generally impractical to make models robust to generic drift or rare events like COVID, the article states that “what you can do is detect when a model goes astray, and create a process for deciding if it needs immediate retraining or decommissioning. This way it won’t silently continue making ostensibly well-informed but actually inaccurate predictions.” Thus AI models have QC and QA requirements analogous to those of more familiar laboratory tests; a simple sketch of this kind of drift check follows the reference below.
- Deshpande, P. How COVID-19 Has Infected AI Models. Domino Data Lab Blog 2020:December 22.
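As one concrete illustration of what “detecting when a model goes astray” can look like, the sketch below compares the distribution of a model's recent output scores against a baseline window from the validation period using a two-sample Kolmogorov-Smirnov test, much as a laboratory monitors QC material against established limits. The score distributions and the alert threshold are synthetic assumptions for illustration; real monitoring programs may track inputs, outputs, or (where ground truth is available) accuracy itself.

```python
# A minimal sketch of QC-style drift monitoring: compare recent model scores
# against a baseline (validation-period) distribution with a two-sample KS test.
# All data are synthetic and the alert threshold is an arbitrary assumption.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Scores observed during validation: the "in-control" reference distribution.
baseline_scores = rng.beta(2, 5, size=5000)

def check_drift(recent_scores, baseline=baseline_scores, alpha=0.01):
    """Flag potential drift when recent scores no longer match the baseline."""
    statistic, p_value = ks_2samp(recent_scores, baseline)
    return {"ks_statistic": round(float(statistic), 3),
            "p_value": round(float(p_value), 4),
            "drift_flag": p_value < alpha}

# Routine week: incoming data still resemble the validation period.
print(check_drift(rng.beta(2, 5, size=500)))
# Disrupted week: the input population has shifted, so the score distribution shifts too.
print(check_drift(rng.beta(4, 3, size=500)))
```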
New review published on artificial intelligence and machine learning in pathology
January 29, 2021
An introductory review of artificial intelligence (AI) and machine learning in pathology has been written by the Machine Learning Workgroup of the CAP Informatics Committee and published online in the Archives of Pathology and Laboratory Medicine. The article is accompanied by a supplemental glossary of AI and machine learning terms that is downloadable from the Archives site. The purpose of the review is to give practicing pathologists a working knowledge of current concepts in AI. It discusses applications of AI and machine learning in pathology, provides a detailed overview of a representative sample of AI algorithms, discusses the processes of AI model development and validation, presents key issues in local performance validation and monitoring at deployment sites, discusses performance problems and methodological challenges in applying AI safely and effectively, and addresses developing concepts in AI regulation. A general understanding of the strengths and weaknesses of AI systems, and techniques for their effective management, will be important for pathologists if they are to choose, verify, deploy, use, and monitor these types of systems in the future.
- Harrison JH, Gilbertson JR, Hanna MG et al. Introduction to artificial intelligence and machine learning for pathology. Arch Pathol Lab Med. 2021;1-27.
Generalizability and repeatability problems for high-performing AI image classification systems used with real-world data
January 23, 2021
AI systems developed using curated, high quality image sets to classify dermatologic skin lesions were previously shown to perform as well as or better than expert dermatologists on similar test sets of images. This study shows that, for a set of the best-performing convolutional neural network models that classify nevi as benign or malignant, accuracy and calibration were variable and significantly lower than on the original high quality test data when the models were used to classify real-world data from several different sites. Furthermore, the models produced inaccurate answers for non-nevus lesions, and performance was inconsistent when the same lesion was photographed several times within a visit or evaluated at several angles of rotation. The authors highlight the importance of training and testing AI models with data similar to that in the planned deployment context, and of verifying AI performance and robustness at deployment. While this study used dermatologic images, many of the issues it highlights are likely to apply to histologic images as well. These results do not diminish the potential for clinical AI. However, they do suggest that understanding the strengths and weaknesses of AI systems, and the use of best practices to develop models and verify their initial and ongoing performance, will be critical aspects of their safe and effective application.
- Young AT, Fernandez K, Pfau J et al. Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models. NPJ Digit Med. 2021;4:10.
AI performance variable and lower than expected in classification of real-world diabetic retinopathy images
January 5, 2021
Automated classification of funduscopic images was one of the first FDA-approved applications of AI in medical imaging and is arguably the most mature. Performance of these systems on defined, high quality image sets has been excellent. However, as with most AI systems, clinical trials using real-world data sets from multiple locations are rare. This study from the University of Washington evaluated seven algorithms for diabetic retinopathy screening, including one that is FDA-approved, in a clinical trial that included 24,000 patients from two locations. There was significant variability in performance between algorithms, and between locations for a given algorithm. Only one algorithm performed as well as human screeners, and most performed more poorly than expected based on previous data. Possible explanations for the performance variability include differences in imaging equipment and technique, leading to more variable image quality than in the original training and testing data, with possible systematic differences between sites. Variability in tissue preparation and imaging in pathology would be expected to produce analogous data characteristics, and therefore per-site performance verification and monitoring are likely to be critical elements of pathology AI system deployment.
- Lee AY, Yanagihara RT, Lee CS et al. Multicenter, head-to-head, real-world validation study of seven automated artificial intelligence diabetic retinopathy screening systems. Diabetes Care. 2021;epub ahead of print:dc201877.
Definitions of Artificial Intelligence and Machine Learning
Artificial intelligence (AI) is the ability of computer software to mimic human judgement. Current AI systems carry out only the very specific tasks for which they are designed, but they may integrate large amounts of input data to carry out these tasks quickly and accurately. The current excitement about AI is focused on machine learning (ML) systems, and this domain is sometimes referred to as AI/ML. AI/ML systems may be trained using defined input data sets, which may include images, to associate patterns in data with clinical contexts such as diagnoses or outcomes. Once trained, AI/ML systems are used with new data to predict diagnosis or outcome in specific cases, or to carry out other useful tasks; a minimal code illustration of this train-and-predict cycle appears after the references below. To date, systems are limited in the range of diagnoses, predictions, and tasks covered, but can be impressively accurate within their defined scope.
- Yu K-H, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nature Biomedical Engineering. 2018;2:719-731.
- Topol EJ. High-performance medicine: The convergence of human and artificial intelligence. Nature Medicine. 2019;25:44-56.
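For readers who want to see the train-and-predict cycle described above in code, the minimal example below fits a generic classifier to a bundled scikit-learn dataset and then predicts labels for held-out cases. It is a generic illustration of supervised machine learning, not a clinical tool, and the model and dataset choices are arbitrary.

```python
# A minimal, generic illustration of supervised machine learning: learn patterns
# from labeled training data, then predict labels for new, unseen cases.
# Uses scikit-learn's bundled breast-cancer dataset purely as example data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Training: the algorithm associates input patterns with known labels.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Use: once trained, the model predicts labels for cases it has never seen.
predictions = model.predict(X_test)
print(f"Held-out accuracy: {accuracy_score(y_test, predictions):.3f}")
```

The same pattern, learning patterns from labeled training data and then applying the fitted model to new cases, underlies far more complex systems, including the deep learning models used for image classification.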
Concept of Augmented Intelligence
The American Medical Association has popularized the term Augmented Intelligence to represent the use of AI/ML as a tool to enhance rather than replace human healthcare providers. The Augmented Intelligence concept is based on studies that integrate AI/ML with human experts in a synergistic workflow that achieves higher performance than either separately. In the pathology context, Augmented Intelligence brings the computational advantages of AI/ML into the clinical and laboratory setting in the form of supportive tools that can enhance pathologists’ diagnostic capabilities by, for example, suggesting regions of interest or counting elements on a slide, or providing decision support to inform clinical judgement.
How AI/ML may be used in Pathology
Pathologists who are interested in AI/ML envision a variety of tools that may provide increased efficiency and diagnostic accuracy in the pathologist’s daily diagnostic workflow. As noted above, tools for the pathologist could scan slides to count elements such as lymph node metastases, mitoses, inflammatory cells, or pathologic organisms, presenting results at sign-out and flagging examples for review. AI/ML tools could also flag regions of interest on a slide or prioritize cases based on slide content. Studies to date have shown promise for automated detection of foci of cancer and invasion, tissue/cell quantification, virtual immunohistochemistry, spatial cell mapping of disease, novel staging paradigms for some types of tumors, and workload triaging. Future systems may be able to correlate patterns across multiple inputs from the medical record, including genomics, allowing a more comprehensive prognostic statement in the pathology report.
- Pantanowitz L, Quiroga-Garza GM, Bien L et al. An artificial intelligence algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies: a blinded clinical validation and deployment study. The Lancet Digital Health. 2020;2:e407-e416.
- Colling R, Pitman H, Oien K et al. Artificial intelligence in digital pathology: a roadmap to routine use in clinical practice. J Pathol. 2019;249:143-150.
- Campanella G, Hanna MG, Geneslaw L, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019;25(8):1301–1309.
- Rashidi HH, Tran NK, Betts EV, Howell LP, Green R. Artificial intelligence and machine learning in pathology: The present landscape of supervised methods. Acad Pathol. 2019;6:2374289519873088.
- Mezheyeuski A, Bergsland CH, Backman M, et al. Multispectral imaging for quantitative and compartment-specific immune infiltrates reveals distinct immune profiles that classify lung cancer patients. J Pathol. 2018;244(4):421–431.
- Wilkes EH, Rumsby G, Woodward GM. Using Machine Learning to Aid the Interpretation of Urine Steroid Profiles. Clin Chem. 2018;64:1586-1595.
- Arnaout R. Machine Learning in Clinical Pathology: Seeing the Forest for the Trees. Clin Chem. 2018;64(11):1553–1554.
- Cabitza F, Banfi G. Machine learning in laboratory medicine: waiting for the flood?. Clin Chem Lab Med. 2018;56(4):516–524.
Ethical use of AI in Healthcare
The need for large sets of patient data to train AI/ML algorithms raises issues of patient consent, privacy, data security, and data de-identification in the production of AI/ML systems. There is also an ethical duty to review algorithms prior to implementation and verify their performance at deployment to ensure that they are safe, efficacious, and reliable. Recent experience has shown that subtle biases may be incorporated into training data and influence the performance of the resulting systems; these biases must be mitigated, and training data must reflect the diversity of the patient population that the AI/ML systems are intended to serve. An algorithm trained without best practices for representing ethnic groups, socioeconomic classes, ages, and/or sexes may generalize poorly to these patient populations in real-world settings and inadvertently exclude (or harm) these groups; a simple subgroup performance check of the kind that can surface such problems is sketched after the references below. The “black box” nature of some popular algorithms (which do not reveal the data patterns associated with particular predictions), combined with the natural proprietary orientation of system vendors, may lead to transparency problems and difficulty verifying algorithm behavior through independent review. Finally, the human resource toll of AI/ML must be considered: deskilling of the workforce through dependence on AI/ML must be mitigated, and job roles will need to be repurposed to adapt to increasing automation.
- Jackson BR, Ye Y, Crawford JM et al. The Ethics of Artificial Intelligence in Pathology and Laboratory Medicine: Principles and Practice. Academic Pathology. 2021;8:237428952199078.
- Keskinbora KH. Medical ethics considerations on artificial intelligence. J Clin Neurosci. 2019;64:277-282.
- O’Sullivan S, Nevejans N, Allen C et al. Legal, regulatory, and ethical frameworks for development of standards in artificial intelligence (AI) and autonomous robotic surgery. Int J Med Robot. 2019;15:e1968.
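One routine check that supports several of these principles is stratifying a model's validation performance by demographic subgroup before deployment, so that under-represented or poorly served groups are visible rather than hidden in an overall average. The sketch below uses synthetic data and assumed column names ("subgroup", "label", "score") purely to illustrate the idea.

```python
# A minimal sketch of a pre-deployment subgroup performance audit: report sample
# size and AUC for each demographic subgroup. The column names ("subgroup",
# "label", "score") and the synthetic data are assumptions for illustration.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

def simulate_group(name, n, signal):
    """Simulate one subgroup; 'signal' controls how well scores track labels."""
    labels = rng.integers(0, 2, size=n)
    scores = signal * labels + rng.normal(scale=1.0, size=n)
    return pd.DataFrame({"subgroup": name, "label": labels, "score": scores})

# A well-represented group and a small, poorly served group (synthetic).
validation = pd.concat([
    simulate_group("group_a", 5000, signal=2.0),
    simulate_group("group_b", 200, signal=0.5),
])

for name, grp in validation.groupby("subgroup"):
    auc = roc_auc_score(grp["label"], grp["score"])
    print(f"{name}: n = {len(grp)}, AUC = {auc:.3f}")
```

A report of this kind does not by itself remove bias, but it makes gaps in representation and performance explicit so they can be addressed before a system reaches patients.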
Regulation of Artificial Intelligence and Machine Learning
The training and use of AI/ML algorithms introduces a fundamentally new kind of data analysis into the healthcare workflow that requires an appropriate regulatory framework. By virtue of their influence on pathologists and other physicians in selection of diagnoses and treatments, the outputs of these algorithms can critically impact patient care. The data patterns identified by these systems are often not exact: there is not perfect separation of classes or predictions. Thus there are analogies with sensitivity, specificity, and predictive value of other complex tests performed by clinical laboratories. However, in machine learning the patterns in data are identified by software and often are not explicitly revealed. Biases or subtle errors may be incorporated inadvertently into machine learning systems and these must be identified and mitigated prior to deployment. Naturally occurring changes in healthcare context such as case mix changes, updated tests or sample preparation, or new therapies, may also change the input data profile and reduce the accuracy of a previously well-functioning machine learning system.
An effective and equitable regulatory framework for machine learning in healthcare will 1) define requirements based on risk, i.e., tailored to the likelihood and magnitude of possible harm from each machine learning application, 2) require best practices for system development by vendors including bias assessment and mitigation, 3) define appropriate best practices for verification of system performance at deployment sites, i.e., local laboratories, 4) define best practices for monitoring the performance of machine learning systems over time and mitigating performance problems that develop, and 5) clearly assign responsibility for problems if and when they occur.
The development of this framework is in early stages. To date, the White House has released draft guidance for regulation of artificial intelligence applications that provides a set of high-level principles to which a regulatory framework in any domain should adhere. Specific to healthcare, the FDA has released proposals for processes leading to approval or clearance of machine learning software for use as a medical device. None of these proposals yet addresses best practices for local performance verification and monitoring of machine learning systems analogous to CLIA-mandated laboratory test performance requirements. The CAP regards this omission as a gap in current regulatory planning for machine learning in healthcare and is promoting the development of a more complete regulatory framework that will include guidance, approved methods, and best practices for local laboratories in deploying machine learning tools as they become available.
- Schulz WL, Durant TJS, Krumholz HM. Validation and regulation of clinical artificial intelligence. Clin Chem 2019;65:1336-1337.
- Allen TC. Regulating artificial intelligence for a successful pathology future. Arch Pathol Lab Med 2019;143(10):1175.
- Office of Management and Budget. Guidance for regulation of artificial intelligence applications. White House Memo. 2020;Jan 7:1-15.
- FDA. Proposed regulatory framework for modifications to artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD): Discussion paper and request for feedback. 2019;1-20.
- FDA. Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan. 2021; Jan 12:1-7.
The CAP is engaged in several activities targeting AI/ML. Internally, the Informatics Committee has formed a Machine Learning Working Group focused on education and technical issues, particularly those related to verification and performance monitoring. This group is sharing its technical work with the FDA. The Information Technology Leadership Committee has formed an AI Project Team to ensure coordination and alignment of AI/ML activities across the organization and to provide reports to the Board of Governors (BOG). An AI in Anatomic Pathology Work Group, reporting to the Council on Scientific Affairs, is developing use cases for AI/ML in pathology that may evolve into proficiency testing (PT) programs.
Externally, the CAP participates in several organizations, including the Alliance for Digital Pathology, a collaborative group interested in the evolution of regulatory science as it applies to digital pathology and AI. The CAP also works with the American College of Radiology Data Science Institute, a resource for understanding how radiologists are developing and using AI systems. In addition, the CAP is the Primary Secretariat for the Integrating the Healthcare Enterprise (IHE) International's Pathology and Laboratory Medicine domain as well as DICOM Working Group 26: Pathology. These standards organizations are developing technical profiles for the incorporation of AI/ML systems into healthcare that will be available to developers of AI/ML tools and systems.