AI performance variable and lower than expected in classification of real-world diabetic retinopathy images
January 5, 2021
Automated classification of funduscopic images was one of the first FDA-approved applications of AI in medical imaging and is arguably the most mature. Performance of these systems on defined, high-quality image sets has been excellent. However, as with most AI systems, clinical trials using real-world data sets from multiple locations are rare. This study from the University of Washington evaluated seven algorithms for diabetic retinopathy screening, including one that is FDA-approved, in a clinical trial including 24,000 patients from two locations. There was significant variability in performance between algorithms, and between locations within each algorithm. Only one algorithm performed as well as human screeners, and most performed more poorly than expected based on previous data. Possible explanations for the variability include differences in imaging equipment and technique, leading to more variable image quality than in the original training and testing data, with possible systematic differences between sites. Variability in tissue preparation and imaging in pathology would be expected to produce analogous data characteristics; performance verification and monitoring per site are therefore likely to be critical elements of pathology AI system deployment.
- Lee AY, Yanagihara RT, Lee CS et al. Multicenter, head-to-head, real-world validation study of seven automated artificial intelligence diabetic retinopathy screening systems. Diabetes Care. 2021;epub ahead of print:dc201877.
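The site-stratified performance verification suggested above amounts to tallying a confusion matrix per site and deriving sensitivity and specificity from each. A minimal sketch, using a hypothetical record format (not the study's data or code):

```python
# Sketch: per-site sensitivity/specificity for a binary screening algorithm.
# Records are hypothetical (site, ground_truth, prediction) triples, labels 0/1.
from collections import defaultdict

def per_site_metrics(records):
    """Tally a confusion matrix per site, then derive sensitivity/specificity."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for site, truth, pred in records:
        c = counts[site]
        if truth and pred:
            c["tp"] += 1
        elif truth and not pred:
            c["fn"] += 1
        elif not truth and pred:
            c["fp"] += 1
        else:
            c["tn"] += 1
    metrics = {}
    for site, c in counts.items():
        pos = c["tp"] + c["fn"]  # all truly positive cases at this site
        neg = c["tn"] + c["fp"]  # all truly negative cases at this site
        metrics[site] = {
            "sensitivity": c["tp"] / pos if pos else None,
            "specificity": c["tn"] / neg if neg else None,
        }
    return metrics

# Toy usage: two sites with different error profiles.
m = per_site_metrics([
    ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 0, 1),
])
```

Comparing the resulting per-site numbers against the vendor's claimed performance is the kind of deployment-time check the study's findings argue for.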
Leadership roles, expertise, and certification in AI deployment and management
December 19, 2020
Quinn et al argue that healthcare AI systems must be developed, validated, and operated by knowledgeable experts to ensure adequate performance quality and engender public trust. They identify key conceptual, technical, and humanistic challenges related to AI for each of three roles: developer, validator, and system operator. Each role requires particular expertise to meet its challenges, as well as methods for preparing a workforce with that expertise. The authors suggest that it may be advisable to implement new certification programs for individuals in these roles to support high-quality, reliable management of AI. This approach represents both a challenge and an opportunity for professional organizations that focus on healthcare quality.
- Quinn TP, Senadeera M, Jacobs S, Coghlan S, Le V. Trust and medical AI: the challenges we face and the expertise needed to overcome them. J Am Med Inform Assoc. 2020;doi:10.1093
Combination of routine clinical lab data with radiology images improves AI prediction of the clinical outcome of COVID-19 at ER presentation, compared with either data set alone
December 16, 2020
Machine learning models were developed that, at ER presentation, predicted the probability of admission, intubation, and death over the following 30 days. Models used chest radiographs, laboratory values, or both. Laboratory values included C-reactive protein, WBC count, D-dimer, lactate, LDH, creatinine, eGFR, troponin, AST, and glucose. The hybrid model trained on both radiographs and lab data yielded substantially better predictions than pure models trained on either data set alone. Machine learning models designed for images often use deep convolutional neural networks in which an initial set of convolutional and pooling layers extracts image features, followed by standard fully connected layers that analyze those features. In this hybrid model, the lab data were combined with the image features by feeding them directly into the fully connected layers, after the initial image feature extraction layers. The potential for machine learning models to use input data that crosses clinical specialties and expertise domains raises interesting questions about their governance and management.
- Kwan YJ, Toussie D, Finkelstein M, Cedillo MA, et al. Combining initial radiographs and clinical variables improves deep learning prognostication of patients with COVID-19 from the emergency department. Radiology: Artificial Intelligence. 2020;doi:10.148
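The fusion step described in the summary, concatenating lab values with CNN-extracted image features before the fully connected layers, can be sketched in a few lines. This is an illustrative PyTorch module with arbitrary layer sizes and output heads, not the authors' architecture:

```python
import torch
import torch.nn as nn

class HybridModel(nn.Module):
    """Toy hybrid model: a small CNN extracts image features, which are
    concatenated with laboratory values before the fully connected layers.
    Layer sizes are arbitrary; this is not the published architecture."""

    def __init__(self, n_labs=10, n_outputs=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),  # grayscale radiograph in
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                    # pool to 8 image features
        )
        self.classifier = nn.Sequential(
            nn.Linear(8 + n_labs, 16),  # image features + lab values, fused
            nn.ReLU(),
            nn.Linear(16, n_outputs),   # e.g., admission, intubation, death
        )

    def forward(self, image, labs):
        x = self.features(image).flatten(1)  # (batch, 8) image features
        x = torch.cat([x, labs], dim=1)      # append lab values to features
        return self.classifier(x)

model = HybridModel(n_labs=10, n_outputs=3)
out = model(torch.randn(4, 1, 32, 32), torch.randn(4, 10))  # shape (4, 3)
```

The design choice worth noting is that the lab values bypass the convolutional stack entirely, entering only where the network begins reasoning over extracted features.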
IBM promotes standard Fact Sheets for AI products
December 9, 2020
IBM has developed a template for standardized Fact Sheets for AI products, intended to improve transparency in AI model design and performance and to enable more accurate matching of AI features with users’ needs. The Fact Sheets are designed to provide an accessible description of how a machine learning model was developed, the context for which it was designed, the extent to which it has been tested for particular types of problems, and reasonable expectations for performance. The template was modeled on the NIST standard for Supplier’s Declaration of Conformity, and regulatory agencies have recently promoted the use of conformity approaches in the regulation of software medical devices. While this template is designed for general purposes, a healthcare-specific extension could inform the design of a package insert for AI products. The ability to read and interpret such an AI Fact Sheet could be an important skill for pathologists who wish to manage AI product acquisition, deployment, and use. The IBM manuscript linked below includes in its appendix an empty Fact Sheet template and two filled examples (though neither example is from healthcare).
The US Government Accountability Office (GAO) and the National Academy of Medicine release a technology assessment on AI in health care
December 1, 2020
The GAO and NAM in November released an extensive (106 pp.) report that reviews healthcare AI potential and applications within and outside of the hospital/clinic setting. The report notes that data to date suggest strong potential for AI applications, but deployment of clinical AI remains limited. Clinical applications in which AI has shown promise include predicting health trajectories, recommending treatments, guiding surgical care, monitoring patients, and supporting population health management. Promising health care administrative applications include recording digital clinical notes, optimizing operational processes, and automating laborious tasks. Challenges include access to high quality data, biases in data, scaling tools across different institutions and patient populations, performance assessment when algorithm transparency is limited, and lack of applicable case law. Recommendations include encouraging interdisciplinary collaboration in AI development, improving data access for system developers, establishing best practices for development, implementation, and use, creating interdisciplinary education opportunities, and clarifying oversight mechanisms. Opportunities outside the clinic include health monitoring and health promotion, but there are challenges related to data privacy, data standardization/interoperability, biases in data and algorithms, payment/reimbursement, and integration into healthcare systems.
- GAO and NAM. Artificial Intelligence in Health Care: Benefits and Challenges of Technologies to Augment Patient Care. Technology Assessment GAO-21-7SP, November 2020.
Successful “styling” of H&E sections improves machine learning generalizability
November 16, 2020
Generative adversarial networks (GANs) are pairs of competing neural networks that can be used to modify images so that they match the characteristics (or “style”) of a separate set of images. This capability has been used, for example, to render photographic images in the style of particular artists such as Monet or van Gogh, using a set of images of the artists' paintings to define the characteristics to match. One of the challenges of machine learning in pathology is that tissue images from histology laboratories may have subtle differences in quality and detail that cause machine learning models trained with a data set from one location to perform poorly on data from another location, i.e., the models may fail to generalize. This work demonstrates that poor generalizability can be mitigated by using a GAN to modify a second site’s poorly-performing images to fit the “style” of the images used to train the model, recovering excellent performance. This approach may be useful in broadening a model’s applicability across sites, or in allowing a single library of proficiency testing images to be adapted to each site so that it can be used to measure and compare performance across a broad range of sites.
- Shin SJ, You SC, Jeon H et al. Style transfer strategy for developing a generalizable deep learning application in digital pathology. Comput Methods Programs Biomed. 2020;198:105815.
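A full GAN is beyond a short example, but the underlying idea of adapting one site's images to another's "style" can be illustrated with a much simpler stand-in: Reinhard-style per-channel mean/standard-deviation matching, a classical stain-normalization technique. This sketch is only an analogy for the GAN approach described above, not the authors' method:

```python
import numpy as np

def match_channel_stats(source, reference):
    """Shift and scale each channel of `source` so its mean and standard
    deviation match those of `reference`. A crude, classical analog of
    style transfer: the content stays, the color statistics change."""
    source = source.astype(np.float64)
    reference = reference.astype(np.float64)
    out = np.empty_like(source)
    for c in range(source.shape[-1]):          # loop over color channels
        s_mean, s_std = source[..., c].mean(), source[..., c].std()
        r_mean, r_std = reference[..., c].mean(), reference[..., c].std()
        scale = r_std / s_std if s_std > 0 else 1.0
        out[..., c] = (source[..., c] - s_mean) * scale + r_mean
    return out

# Toy usage: restyle a random "site B" image toward "site A" statistics.
rng = np.random.default_rng(0)
site_b = rng.uniform(0.0, 1.0, (8, 8, 3))   # stand-in for a site-B tile
site_a = rng.uniform(0.4, 0.9, (8, 8, 3))   # stand-in for site-A training data
restyled = match_channel_stats(site_b, site_a)
```

A GAN-based style transfer learns a far richer, spatially aware mapping than this global statistic matching, which is why it can recover model performance where simple normalization may not.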
New guidelines released for clinical trials incorporating AI interventions
September 12, 2020
There is consensus that rigorous clinical trials are required to establish the safety and efficacy of AI-based healthcare tools and interventions, but these new approaches require trials that are designed to address their particular features. The SPIRIT and CONSORT trial guidelines have now been extended to incorporate elements supporting AI trials. SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence) includes 15 new protocol elements that should be included in AI trial protocol descriptions, in addition to the core SPIRIT items. CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) includes 14 new items that should be reported routinely in AI clinical trial results, in addition to the core CONSORT items. These extensions to the core guidelines are expected to promote transparency and rigor in the evaluation of new AI systems and strategies.
- Editorial. Setting guidelines to report the use of AI in clinical trials. Nat Med 2020;26:1311.
- Cruz Rivera S, Liu X, Chan A et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat Med. 2020;26:1351-1363.
- Liu X, Cruz Rivera S, Moher D et al. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med. 2020;26:1364-1374.
Automated interpretation of plasma amino acid profiles
September 4, 2020
A machine learning method is reported as a proof of concept for interpreting ion-exchange chromatography profiles of plasma amino acids to diagnose inborn errors of metabolism. Training and test data consisted of 2000 cases submitted to a clinical service in one year, enriched with additional rare cases, and several ensemble decision-tree algorithms were evaluated (random forests, weighted-subspace random forests, and extreme gradient boosted trees, XGBT). Automatic classification was compared with manual classification by two independent experts. Classification of normal vs. abnormal yielded a mean area under the precision-recall curve of > 0.94. Multiclass classification for specific diseases yielded a mean F score (weighted for recall) of about 0.78. In both cases, XGBT performed slightly better than the other algorithms. Performance against very limited external data from a proficiency testing program yielded overall accuracy values of 0.8-0.9, suggesting generalizability with performance similar to the internal test data.
- Wilkes EH, Emmett E, Beltran L, Woodward GM, Carling RS. A machine learning approach for the automated interpretation of plasma amino acid profiles. Clin Chem. 2020;66:1210-1218.
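The two evaluation metrics mentioned above (area under the precision-recall curve for normal-vs-abnormal, and a weighted F score for multiclass labels) are standard and easy to compute. A generic sketch with scikit-learn on synthetic data, using a gradient-boosted-tree classifier as a stand-in for XGBT (this is not the authors' code or data):

```python
# Sketch: computing AUPRC and weighted F1 for a boosted-tree classifier
# on synthetic data. Illustrative only; not the published pipeline.
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for XGBT
from sklearn.metrics import average_precision_score, f1_score
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic "normal vs. abnormal" data set.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]          # probability of "abnormal"

auprc = average_precision_score(y_te, scores)   # precision-recall curve summary
f1 = f1_score(y_te, clf.predict(X_te), average="weighted")
print(f"AUPRC={auprc:.2f}, weighted F1={f1:.2f}")
```

AUPRC is a sensible headline metric here because inborn errors of metabolism are rare, and precision-recall summaries are more informative than ROC curves under heavy class imbalance.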
Definitions of Artificial Intelligence and Machine Learning
Artificial intelligence (AI) is the ability of computer software to mimic human judgement. Current AI systems carry out only very specific tasks for which they are designed, but they may integrate large amounts of input data to carry out these tasks quickly and accurately. The current excitement about AI is focused on machine learning (ML) systems and this domain is sometimes referred to as AI/ML. AI/ML systems may be trained using defined input data sets, which may include images, to associate patterns in data with clinical contexts such as diagnoses or outcomes. Once trained, AI/ML systems are used with new data to predict diagnosis or outcome in specific cases, or carry out other useful tasks. To date, systems are limited in the range of diagnoses, predictions, and tasks covered, but can be impressively accurate within their defined scope.
- Yu K-H, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nature Biomedical Engineering. 2018;2:719-731.
- Topol EJ. High-performance medicine: The convergence of human and artificial intelligence. Nature Medicine. 2019;25:44-56.
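The train-then-predict pattern described above can be shown concretely in a few lines. This toy example uses scikit-learn and invented two-feature data (labels and values are hypothetical, purely for illustration):

```python
# Minimal illustration of supervised ML: learn label associations from
# hand-made training examples, then classify new, unseen cases.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: [feature1, feature2] -> label (0 or 1).
X_train = [[1.0, 0.2], [0.9, 0.1], [0.2, 0.9], [0.1, 1.0]]
y_train = [0, 0, 1, 1]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Once trained, the model predicts labels for new cases.
predictions = model.predict([[0.95, 0.15], [0.15, 0.95]])
```

Real clinical systems replace the toy features with thousands of image pixels or laboratory values and the labels with diagnoses or outcomes, but the train/predict structure is the same.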
Concept of Augmented Intelligence
The American Medical Association has popularized the term Augmented Intelligence to represent the use of AI/ML as a tool to enhance rather than replace human healthcare providers. The Augmented Intelligence concept is based on studies that integrate AI/ML with human experts in a synergistic workflow that achieves higher performance than either separately. In the pathology context, Augmented Intelligence brings the computational advantages of AI/ML into the clinical and laboratory setting in the form of supportive tools that can enhance pathologists’ diagnostic capabilities by, for example, suggesting regions of interest or counting elements on a slide, or providing decision support to inform clinical judgement.
How AI/ML may be used in Pathology
Pathologists who are interested in AI/ML envision a variety of tools that may provide increased efficiency and diagnostic accuracy in the pathologist’s daily diagnostic workflow. As noted above, tools for the pathologist could scan slides to count elements such as lymph node metastases, mitoses, inflammatory cells, or pathologic organisms, presenting results at sign-out and flagging examples for review. AI/ML tools could also flag regions of interest on a slide or prioritize cases based on slide content. Studies to date have shown promise for automated detection of foci of cancer and invasion, tissue/cell quantification, virtual immunohistochemistry, spatial cell mapping of disease, novel staging paradigms for some types of tumors, and workload triaging. Future systems may be able to correlate patterns across multiple inputs from the medical record, including genomics, allowing a more comprehensive prognostic statement in the pathology report.
- Pantanowitz L, Quiroga-Garza GM, Bien L et al. An artificial intelligence algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies: a blinded clinical validation and deployment study. The Lancet Digital Health. 2020;2:e407-e416.
- Colling R, Pitman H, Oien K et al. Artificial intelligence in digital pathology: a roadmap to routine use in clinical practice. J Pathol. 2019;249:143-150.
- Campanella G, Hanna MG, Geneslaw L, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019;25(8):1301–1309.
- Rashidi HH, Tran NK, Betts EV, Howell LP, Green R. Artificial intelligence and machine learning in pathology: The present landscape of supervised methods. Acad Pathol. 2019;6:2374289519873088.
- Mezheyeuski A, Bergsland CH, Backman M, et al. Multispectral imaging for quantitative and compartment-specific immune infiltrates reveals distinct immune profiles that classify lung cancer patients. J Pathol. 2018;244(4):421–431.
- Wilkes EH, Rumsby G, Woodward GM. Using Machine Learning to Aid the Interpretation of Urine Steroid Profiles. Clin Chem. 2018;64:1586-1595.
- Arnaout R. Machine Learning in Clinical Pathology: Seeing the Forest for the Trees. Clin Chem. 2018;64(11):1553–1554.
- Cabitza F, Banfi G. Machine learning in laboratory medicine: waiting for the flood? Clin Chem Lab Med. 2018;56(4):516–524.
Ethical use of AI in Healthcare
The need for large sets of patient data to train AI/ML algorithms raises issues of patient consent, privacy, data security, and data de-identification in the production of AI/ML systems. There is also an ethical duty to review algorithms prior to implementation and verify their performance at deployment to ensure that they are safe, efficacious, and reliable. Recent experience has shown that subtle biases may be incorporated into training data and influence the performance of the resulting systems; these must be mitigated, and training data must reflect the diversity of the patient population that the AI/ML systems are intended to serve. An algorithm trained without best practices for representing ethnic groups, socioeconomic classes, ages, and/or sexes may not generalize to these patient populations in real-world settings and may inadvertently exclude (or harm) these groups. The “black box” nature of some popular algorithms (which do not reveal the data patterns associated with particular predictions), combined with the natural proprietary orientation of system vendors, may create transparency problems and make independent verification of the algorithms difficult. Finally, the human resource toll of AI/ML must be considered: deskilling of the workforce through dependence on AI/ML must be mitigated, and job roles will need to be repurposed to adapt to increasing automation.
- Keskinbora KH. Medical ethics considerations on artificial intelligence. J Clin Neurosci. 2019;64:277-282.
- O’Sullivan S, Nevejans N, Allen C et al. Legal, regulatory, and ethical frameworks for development of standards in artificial intelligence (AI) and autonomous robotic surgery. Int J Med Robot. 2019;15:e1968.
Regulation of Artificial Intelligence and Machine Learning
The training and use of AI/ML algorithms introduces a fundamentally new kind of data analysis into the healthcare workflow that requires an appropriate regulatory framework. By virtue of their influence on pathologists and other physicians in selection of diagnoses and treatments, the outputs of these algorithms can critically impact patient care. The data patterns identified by these systems are often not exact: there is not perfect separation of classes or predictions. Thus there are analogies with sensitivity, specificity, and predictive value of other complex tests performed by clinical laboratories. However, in machine learning the patterns in data are identified by software and often are not explicitly revealed. Biases or subtle errors may be incorporated inadvertently into machine learning systems and these must be identified and mitigated prior to deployment. Naturally occurring changes in healthcare context such as case mix changes, updated tests or sample preparation, or new therapies, may also change the input data profile and reduce the accuracy of a previously well-functioning machine learning system.
An effective and equitable regulatory framework for machine learning in healthcare will 1) define requirements based on risk, i.e., tailored to the likelihood and magnitude of possible harm from each machine learning application, 2) require best practices for system development by vendors including bias assessment and mitigation, 3) define appropriate best practices for verification of system performance at deployment sites, i.e., local laboratories, 4) define best practices for monitoring the performance of machine learning systems over time and mitigating performance problems that develop, and 5) clearly assign responsibility for problems if and when they occur.
The development of this framework is in early stages. To date, the White House has released draft guidance for regulation of artificial intelligence applications that provides a set of high-level principles to which a regulatory framework in any domain should adhere. Specific to healthcare, the FDA has released a proposal for processes leading to approval or clearance of machine learning software for use as a medical device. Neither of these proposals yet addresses best practices for local performance verification and monitoring of machine learning systems analogous to CLIA-mandated laboratory test performance requirements. The CAP regards this omission as a gap in current regulatory planning for machine learning in healthcare and is promoting the development of a more complete regulatory framework that will include guidance, approved methods, and best practices for local laboratories in deploying machine learning tools as they become available.
- Schulz WL, Durant TJS, Krumholz HM. Validation and regulation of clinical artificial intelligence. Clin Chem 2019;65:1336-1337.
- Allen TC. Regulating artificial intelligence for a successful pathology future. Arch Pathol Lab Med 2019;143(10):1175.
- Office of Management and Budget. Guidance for regulation of artificial intelligence applications. White House Memo. 2020;Jan 7:1-15.
- FDA. Proposed regulatory framework for modifications to artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD): Discussion paper and request for feedback. 2019;1-20.
The CAP is engaged in several activities targeting AI/ML. Internally, the Informatics Committee has formed a Machine Learning Working Group focused on education and on technical issues particularly related to verification and performance monitoring; this group is sharing its technical work with the FDA. The Information Technology Leadership Committee has formed an AI Project Team to ensure coordination and alignment of AI/ML activities across the organization and to provide reports to the Board of Governors (BOG). An AI in Anatomic Pathology Work Group, reporting to the Council on Scientific Affairs, is developing use cases for AI/ML in pathology that may evolve into proficiency testing (PT) programs.
Externally, the CAP participates in several organizations, including the Alliance for Digital Pathology, a collaborative group interested in the evolution of regulatory science as it applies to digital pathology and AI. The CAP also works with the American College of Radiology Data Science Institute, a resource for understanding how radiologists are developing and using AI systems. In addition, the CAP is the Primary Secretariat for the Integrating the Healthcare Enterprise (IHE) International Pathology and Laboratory Medicine domain, as well as DICOM Working Group 26: Pathology. These standards organizations are developing technical profiles for incorporation of AI/ML systems into healthcare that will be available to developers of AI/ML tools and systems.