A regulatory challenge for natural language processing (NLP)‐based tools such as ChatGPT to be legally used for healthcare decisions. Where are we now?
In the global debate about the use of Natural Language Processing (NLP)-based tools such as ChatGPT in healthcare decisions, the question of their use as regulatory-approved Software as a Medical Device (SaMD) has not yet been sufficiently clarified. Currently, this discussion is conducted with astonishing euphoria about countless clinical applications, covering their opportunities but also their pitfalls and potential errors in clinical use.1-5 Although the FDA and international regulatory authorities have already issued initial guidance documents for the development and approval of machine learning (ML)/artificial intelligence (AI)-based tools as SaMD, a mandatory regulatory process for NLP-based tools has not yet been fully defined. ChatGPT therefore plays a special role in these considerations.

In the United States, an FDA discussion paper from 2019 offers first ideas on how to deliver safe and effective AI-based software functionality across the total product lifecycle.6 A more recent guidance document on clinical decision support software (September 2022) further clarifies the FDA's position on what qualifies as a regulable medical device, particularly with respect to AI-driven clinical decision support tools.7 In March 2023, the FDA published guidance on its algorithm change control policy, which discusses how the agency evaluates algorithms that are periodically updated; this is particularly relevant for NLP-based tools such as ChatGPT.

In Europe, the Medical Device Regulations (EU Regulations 2017/745 and 2017/746) represent a major update to the way medical devices are regulated, and according to the MDCG 2019-11 guidance document, the gap in SaMD classification has now been closed.8 A Product Watch report of the European Commission from July 2020 provides an additional update on this topic, discussing AI, ML and statistical tools for risk estimation or decision support.9 The recently published AI Act (June 2023), a proposed European law on artificial intelligence, will have far-reaching consequences for medical device regulation in Europe in the near future.

The International Medical Device Regulators Forum (IMDRF) Software as a Medical Device Working Group, established to harmonize regulatory requirements, published a possible risk categorization framework for SaMD in 2014 and a follow-up document with more detailed information on ML/AI-based software in 2022.10 The IMDRF and FDA recommendations allow for a clearer identification of risk categories based on the 'intended use' for healthcare decisions in different medical situations or conditions (diagnosis, prognosis, prevention or treatment). Currently, ChatGPT (Figure 1) is not intended by OpenAI for clinical use, based on its terms of use (see https://openai.com/policies/terms-of-use), which state that users should review output for accuracy, that OpenAI makes no warranties with respect to its services, and that OpenAI's liability for any damages caused by their use is limited. However, if the intended use of NLP-based tools, beyond ChatGPT, for clinical purposes falls within the definition of an AI/ML-based SaMD, regulatory approval is required. In contrast to 'locked' software algorithms with fixed functions, for example a classifier for clinical decision support, an 'adaptive, continuous-learning' (non-locked) algorithm changes its behavior over time.
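To make this distinction concrete, the following minimal Python sketch contrasts a locked decision function, whose behavior is frozen at release, with an adaptive one whose decision boundary drifts as new field data arrive. The class names, thresholds, and update rule are invented for illustration only and do not reflect any approved device or any vendor's implementation.

```python
# Illustrative contrast between a "locked" and an "adaptive" SaMD function.
# All names, thresholds, and the update rule are hypothetical.
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class LockedRiskClassifier:
    """Locked algorithm: the decision rule is fixed at the time of approval."""
    threshold: float = 0.7  # frozen during regulatory review; never changes in the field

    def predict(self, risk_score: float) -> str:
        return "refer to specialist" if risk_score >= self.threshold else "routine follow-up"


@dataclass
class AdaptiveRiskClassifier:
    """Adaptive (non-locked) algorithm: the decision rule changes with new data."""
    threshold: float = 0.7
    observed_scores: list = field(default_factory=list)

    def update(self, new_scores) -> None:
        # Behavior shifts automatically as new data arrive: the property that
        # the standard device regulatory process is not designed to handle.
        self.observed_scores.extend(new_scores)
        self.threshold = mean(self.observed_scores)

    def predict(self, risk_score: float) -> str:
        return "refer to specialist" if risk_score >= self.threshold else "routine follow-up"


locked = LockedRiskClassifier()
adaptive = AdaptiveRiskClassifier()
adaptive.update([0.4, 0.5, 0.45])  # field data move the decision boundary
print(locked.predict(0.6), "|", adaptive.predict(0.6))  # same input, diverging outputs
```

In the locked case, the regulator reviews a fixed artifact; in the adaptive case, the behavior under review is itself a moving target, which is the gap discussed next.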
Because the standard medical device regulatory process is currently not designed for adaptive AI/ML technologies, additional efforts are needed from regulators, even though these algorithms have the potential to adapt and optimize software performance, partly in real time, to continuously improve patient health outcomes. Inherent changes to the algorithm are typically made and verified through a well-defined and automated process intended to improve algorithm performance based on the analysis and interpretation of new data.9 For clinical purposes, these data need to be evidence-based and scientifically proven. ChatGPT broadly follows this concept, although the situation is more complicated. ChatGPT is not actually an algorithm; the name merely refers to the user interface, while GPT-3.5-turbo and GPT-4 are the underlying models that drive it. Whether the weights of GPT-3.5-turbo and GPT-4 have been updated since their release is not publicly known (there is no official confirmation by OpenAI). It can be speculated that the underlying language model has not been updated with new training data (cf. ChatGPT's knowledge cutoff of September 2021), but the prompting, the structure of the model, and the chat component may have been updated. That these models produce different output in response to the same prompt is due to their use of sampling when generating output (see Figure 1). This fact in particular makes it difficult to simply modify ChatGPT (should OpenAI intend to do so) for regulatory approval as a SaMD in its current architecture. However, this need not apply to the architectures of other NLP-based tools. The intended use of an NLP-based SaMD to support clinic
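The non-determinism mentioned above can be illustrated with a toy sketch: a model that samples the next token from a temperature-scaled softmax distribution can return different completions for the identical prompt. The vocabulary, scores, temperature value, and function name below are invented for illustration and do not correspond to OpenAI's actual implementation.

```python
# Toy illustration of why a sampling-based language model can give different
# answers to the identical prompt: tokens are drawn from a probability
# distribution rather than chosen deterministically.
import math
import random


def sample_next_token(logits: dict, temperature: float = 0.8) -> str:
    # Softmax with temperature: lower temperature -> more deterministic output,
    # higher temperature -> more varied output.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / z for tok, v in scaled.items()}
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Hypothetical next-token scores after the prompt "The recommended dose is ..."
logits = {"10 mg": 2.1, "20 mg": 1.9, "unknown": 0.3}
print([sample_next_token(logits) for _ in range(5)])  # e.g. ['10 mg', '20 mg', '10 mg', ...]
```

Lowering the temperature (or selecting the most probable token outright) makes the output more reproducible, but reproducibility alone does not settle the questions of intended use and clinical validation raised above.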
Executive Summary
The article explores the regulatory challenges and current landscape of using Natural Language Processing (NLP)-based tools like ChatGPT in healthcare decisions. It highlights the lack of clarity in their approval as Software as a Medical Device (SaMD) and discusses the evolving regulatory frameworks in the United States and Europe. The article emphasizes the need for a mandatory regulatory process for NLP-based tools and the potential risks associated with their clinical use. It also touches upon the recent guidelines and acts that aim to address these challenges, providing a comprehensive overview of the current state and future directions in the regulation of AI-driven healthcare tools.
Key Points
- ▸ The regulatory status of NLP-based tools like ChatGPT as SaMD is not fully clarified.
- ▸ The FDA and international regulatory authorities have issued initial guidelines for ML/AI-based tools.
- ▸ ChatGPT is not intended for clinical use according to its terms of use.
- ▸ The EU's AI Act and other regulatory documents aim to address the challenges posed by AI in healthcare.
- ▸ The IMDRF provides a framework for risk categorization of SaMD based on intended use.
Merits
Comprehensive Overview
The article provides a thorough overview of the current regulatory landscape for NLP-based tools in healthcare, including key guidelines and acts from the FDA and EU.
Clear Identification of Gaps
It clearly identifies the gaps in the regulatory process for NLP-based tools and highlights the need for further clarification.
Relevance to Current Debates
The article is highly relevant to ongoing debates about the use of AI in healthcare and the need for robust regulatory frameworks.
Demerits
Lack of Depth in Analysis
While the article provides a good overview, it lacks in-depth analysis of specific regulatory challenges and potential solutions.
Limited Discussion on Ethical Implications
The article does not delve deeply into the ethical implications of using NLP-based tools in healthcare decisions.
Incomplete Coverage of Global Regulations
The focus is primarily on the US and EU, with limited discussion on regulatory frameworks in other regions.
Expert Commentary
The article effectively highlights the regulatory challenges associated with the use of NLP-based tools like ChatGPT in healthcare decisions. The lack of clarity in their approval as SaMD underscores the need for robust regulatory frameworks that can adapt to the rapidly evolving landscape of AI in healthcare. The FDA's guidelines and the EU's AI Act are steps in the right direction, but more work is needed to address the specific challenges posed by NLP-based tools. The article's emphasis on the importance of intended use in risk categorization is particularly noteworthy, as it aligns with the broader trend towards patient-centric regulatory approaches. However, the article could benefit from a more detailed analysis of the ethical implications and a broader discussion of global regulatory frameworks. Overall, the article provides a valuable contribution to the ongoing debate on the regulation of AI in healthcare and underscores the need for continued vigilance and adaptation in this rapidly evolving field.
Recommendations
- ✓ Regulatory bodies should develop specific guidelines for the approval and use of NLP-based tools in healthcare, considering their unique characteristics and potential risks.
- ✓ Further research and analysis are needed to address the ethical implications of using AI tools in healthcare decisions and to ensure their clinical validity and reliability.