Guiding Principles for Good Machine Learning Practice

October 29, 2021 | Standards

The U.S. Food and Drug Administration (FDA), Health Canada, and the United Kingdom’s Medicines and Healthcare products Regulatory Agency (MHRA) have jointly identified 10 guiding principles that can inform the development of Good Machine Learning Practice (GMLP). These guiding principles will help promote safe, effective, and high-quality medical devices that use artificial intelligence and machine learning (AI/ML).

Guiding Principles

1. Multi-Disciplinary Expertise Is Leveraged Throughout the Total Product Life Cycle: An in-depth understanding of a model's intended integration into clinical workflow, and of its desired benefits and associated patient risks, can help ensure that machine learning-enabled medical devices are safe and effective.

2. Good Software Engineering and Security Practices Are Implemented: Model design is implemented with attention to good software engineering practices, data quality assurance, data management, and robust cybersecurity practices.

3. Clinical Study Participants and Data Sets Are Representative of the Intended Patient Population: Data collection protocols should ensure that the relevant characteristics of the intended patient population, use, and measurement inputs are sufficiently represented in a sample of adequate size in the clinical study and in the training and test datasets, so that results can be reasonably generalized to the population of interest. This is important to manage any bias, promote appropriate and generalizable performance across the intended patient population, assess usability, and identify circumstances where the model may underperform.

4. Training Data Sets Are Independent of Test Sets: Training and test datasets are selected and maintained to be appropriately independent of one another.

5. Selected Reference Datasets Are Based Upon Best Available Methods: Accepted, best available methods for developing a reference dataset ensure that clinically relevant and well characterized data are collected and the limitations of the reference are understood.

6. Model Design Is Tailored to the Available Data and Reflects the Intended Use of the Device: Model design is suited to the available data and supports the active mitigation of known risks, such as overfitting, performance degradation, and security risks.

7. Focus Is Placed on the Performance of the Human-AI Team: Where the model has a “human in the loop,” human factors considerations and the human interpretability of the model outputs are addressed with emphasis on the performance of the Human-AI team, rather than just the performance of the model in isolation.

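8. Testing Demonstrates Device Performance during Clinically Relevant Conditions: Statistically sound test plans are developed and executed to generate clinically relevant device performance information independently of the training data set. Considerations include the intended patient population, key subgroups, the clinical environment, measurement inputs, and potential confounding factors.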

9. Users Are Provided Clear, Essential Information: Users are provided with clear information such as the product’s intended use and indications for use, characteristics of the data used to train and test the model, known limitations, and clinical workflow integration. Users are also made aware of device modifications and updates from real-world performance monitoring, the basis for decision-making when available, and a means to communicate product concerns to the developer.

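10. Deployed Models Are Monitored for Performance and Re-training Risks Are Managed: Deployed models have the capability to be monitored in “real world” use with a focus on maintained or improved safety and performance. When models are periodically or continually trained after deployment, appropriate controls are in place to manage risks such as overfitting, unintended bias, or degradation of the model (for example, dataset drift).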
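Principle 3 asks that relevant patient characteristics be sufficiently represented in a sample of adequate size. The Python sketch below shows one simple way a developer might screen a dataset's composition before training. The function name `representation_report`, the subgroup labels, and the thresholds (`min_count=30`, `max_share_gap=0.05`) are hypothetical placeholders chosen for illustration, not values taken from the principles; appropriate thresholds depend on the device and its intended population.

```python
from collections import Counter


def representation_report(subgroups, target_shares, min_count=30, max_share_gap=0.05):
    """Compare a dataset's subgroup mix with the intended patient population.

    subgroups: one subgroup label per record (hypothetical, e.g. an age band).
    target_shares: expected share of each subgroup in the intended population.
    Returns {subgroup: (count, observed_share, flagged)}.
    """
    counts = Counter(subgroups)
    total = len(subgroups)
    report = {}
    for group, target in target_shares.items():
        count = counts.get(group, 0)
        share = count / total if total else 0.0
        # Flag subgroups that are too small to evaluate reliably, or whose
        # share of the dataset differs noticeably from the population.
        flagged = count < min_count or abs(share - target) > max_share_gap
        report[group] = (count, share, flagged)
    return report


# Usage with made-up labels and target shares.
records = ["18-40"] * 50 + ["41-65"] * 40 + ["65+"] * 10
targets = {"18-40": 0.35, "41-65": 0.40, "65+": 0.25}
for group, (count, share, flagged) in representation_report(records, targets).items():
    print(group, count, round(share, 2), "FLAG" if flagged else "ok")
```

In this made-up example the 18-40 group is over-represented relative to its target share and the 65+ group falls below the minimum count, so both are flagged. Subgroups present in the data but absent from `target_shares` are ignored by this sketch.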
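For Principle 4, a common source of dependence between training and test sets is the same patient contributing samples to both. A minimal sketch, assuming each record carries a patient identifier, is to split at the patient level rather than the sample level. `split_by_patient` and the `(patient_id, sample)` record layout are hypothetical; grouping on another key, such as a site identifier, would follow the same pattern.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (patient_id, sample) records so no patient is in both sets."""
    # Sort before shuffling so the split is reproducible for a given seed.
    patients = sorted({patient_id for patient_id, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test


# Usage: patients p1 and p3 each contribute two images.
records = [("p1", "img_a"), ("p1", "img_b"), ("p2", "img_c"),
           ("p3", "img_d"), ("p3", "img_e")]
train, test = split_by_patient(records, test_fraction=0.34)
train_ids = {pid for pid, _ in train}
test_ids = {pid for pid, _ in test}
print(sorted(train_ids), sorted(test_ids))
```

Splitting by patient means the test set can contain a different number of samples than `test_fraction` alone would suggest, which is the expected trade-off for keeping the sets independent.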
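Principle 6 names overfitting as a known risk that model design should help mitigate. Early stopping on a held-out validation loss is one widely used mitigation; it is shown here only as an example, since the principles do not prescribe a technique. The sketch assumes the per-epoch validation losses have already been recorded, and `early_stopping_epoch`, `patience`, and `min_delta` are illustrative names. The validation data would be drawn from the training data, not the test set, so that Principle 4 still holds.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for patience-based early stopping.

    val_losses: validation loss recorded after each epoch, indexed from 0.
    Training stops once the loss has failed to improve on the best value by
    more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Patience never ran out: training used every recorded epoch.
    return best_epoch, len(val_losses) - 1


# Usage: validation loss starts rising after epoch 2, a typical overfitting sign.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
print(early_stopping_epoch(losses, patience=3))
```

Here training would stop at epoch 5, and the weights from epoch 2 (the lowest validation loss) would be the ones kept.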
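Principle 8 lists key subgroups among the testing considerations. One way to make subgroup performance visible is to report metrics per subgroup instead of a single pooled figure. This sketch assumes binary labels and predictions coded as 0 and 1; `subgroup_metrics` is a hypothetical helper, and sensitivity and specificity are illustrative choices. A real test plan would define its own clinically meaningful metrics, subgroups, and statistical methods (for example, confidence intervals).

```python
from collections import defaultdict


def subgroup_metrics(y_true, y_pred, groups):
    """Sensitivity and specificity of binary (0/1) predictions per subgroup."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        if truth == 1:
            c["tp" if pred == 1 else "fn"] += 1
        else:
            c["tn" if pred == 0 else "fp"] += 1
    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            "n": positives + negatives,
            # None when a subgroup has no positive (or no negative) cases.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results


# Usage with made-up labels for two hypothetical subgroups, "A" and "B".
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B"]
for group, m in subgroup_metrics(y_true, y_pred, groups).items():
    print(group, m)
```

Reporting `n` alongside each metric makes it obvious when a subgroup is too small for its numbers to be meaningful.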
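For Principle 10, a deployed device needs some mechanism for tracking real-world performance. The class below is a minimal sketch assuming that a confirmed ground-truth outcome eventually becomes available for each prediction. `RollingAccuracyMonitor`, its `baseline`, `window`, and `tolerance` parameters, and the use of accuracy as the monitored metric are all hypothetical choices for illustration. Monitoring the input distribution for drift is a complementary approach not shown here.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent `window` confirmed cases.

    alert() returns True when rolling accuracy falls more than `tolerance`
    below `baseline`, e.g. the accuracy measured during premarket testing.
    """

    def __init__(self, baseline, window=100, tolerance=0.05):
        self.baseline = baseline
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # oldest cases drop off automatically

    def record(self, prediction, ground_truth):
        self.outcomes.append(prediction == ground_truth)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def alert(self):
        # Wait for a full window so a few early cases cannot trigger an alert.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance


# Usage: ten correct predictions, then three errors in a window of ten.
monitor = RollingAccuracyMonitor(baseline=0.90, window=10, tolerance=0.05)
for _ in range(10):
    monitor.record(prediction=1, ground_truth=1)
print(monitor.accuracy(), monitor.alert())
for _ in range(3):
    monitor.record(prediction=1, ground_truth=0)
print(monitor.accuracy(), monitor.alert())
```

After the three errors, rolling accuracy drops to 0.7, below the 0.85 threshold, so `alert()` returns True. What happens next (investigation, rollback, or retraining under controlled conditions) is outside the scope of this sketch.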