Case Study | The Therac-25 Tragedy: Software Errors Led to Fatal Overdoses | Medical Software Course

Category: Software Failures

Tags: radiation, regulation, safety, software, testing

Entities: Anme Little, Atomic Energy of Canada Limited, FDA, Therac-25, Yale

Summary

    Introduction
    • Anme Little, a senior in biomedical engineering at Yale, introduces the module on software development failures.
    • The module includes case studies presented by Yale undergraduates, focusing on real-world software failures.
    Case Study: Therac-25
    • Therac-25 was a dual-mode radiation therapy machine involved in several incidents of radiation overdoses.
    • The software replaced mechanical safety mechanisms, leading to failures when the electron and X-ray modes were misaligned.
    • Operators experienced issues with redundant data entry, leading to software modifications that introduced errors.
    • A significant incident occurred in 1986, where a patient received an overdose due to software not detecting a mode change.
    • Six incidents were reported, with delayed manufacturer acknowledgment and inadequate safety responses.
    Regulatory and Safety Lessons
    • The FDA eventually intervened, requiring safety improvements and halting Therac-25 operations until compliance was achieved.
    • These failures highlighted the importance of documentation, testing, and responsive complaint handling in medical software.
    • Modern regulations now emphasize rigorous software testing and safety checks to prevent similar incidents.
    Actionable Takeaways
    • Ensure comprehensive testing and documentation of medical software.
    • Implement clear and responsive complaint handling processes.
    • Avoid overconfidence in software safety without rigorous validation.
    • Regularly update and review safety protocols and user interfaces.
    • Regulatory agencies should enforce strict compliance and safety standards.

    Transcript

    00:00

    Hi, my name is Anme Little and I am a senior majoring in biomedical engineering at Yale. In this module, we will present examples of what happens when software development goes wrong.

    When this class is taught at Yale, the

    00:15

    students are assigned to present these and other case studies in small groups. We follow this tradition for the online class as the case studies will be presented by the four Yale undergraduate students who worked as student assistants in the creation of this course.

    The goal of this module is to

    00:31

    give you some real-world examples of software failures. The case of the Therac-25 was particularly important in the development of the medical device regulations in place today.

    All four of the case studies demonstrate a failure to follow appropriate procedures whether in the actual process

    00:48

    of coding and testing, the adherence to the selected life cycle model, the failure to account for the real-world environment that the software will operate in, or even the consequences of the simple failure to install appropriate software updates in time.

    01:03

    Here is the outline for the four vignettes. I will be presenting the first and my peers will present the other three.

    For more information on some of these and other similar cases, see chapters 17 to 22 of the textbook.

    01:21

    The basics of the Therac-25 are covered in the first lecture video in week two. My goal for this vignette is to provide more details of the situations and decisions that led to the incidents.

    The following details about the Therac-25 come

    01:36

    from these two sources by Nancy Leveson from 1993 and 1995. As a reminder, the Therac-25 was a machine used for radiation therapy.

    Radiation therapy uses localized high energy radiation to treat cancer. The common

    01:52

    practice is to apply doses of radiation in multiple increments in order to avoid killing healthy normal tissue since normal cells can recover faster than cancerous cells. There are different forms of radiation.

    Electrons can be used to treat tumors near the skin surface while X-rays which

    02:10

    have 100 times more energy are used to treat tumors deeper in the body. Direct exposure to X-rays can cause serious harm to patients.

    The Therac-25 was a dual-mode treatment machine, meaning it had both electron and X-ray treatment capabilities. You

    02:26

    can see where there could be a potential problem here. The figure on the left shows the turntable setup which has both the X-ray and the electron treatment regions.

    Notice the flattener that must be aligned with the X-ray source in order to prevent direct exposure and the scan

    02:42

    magnet that aligns with the electron source to direct its beams. A computer adjusts the turntable position.

    This machine derived from an earlier Therac-20 model, but the Therac-25 replaced mechanical safety mechanisms with software which supposedly would

    02:58

    check the turntable settings. Failures within this software caused several incidents of radiation poisoning when the electron magnets were wrongly aligned with the X-ray, leading to severe injuries and death.
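
    The kind of turntable check described above can be sketched as a small interlock function. This is only an illustration under stated assumptions, not the Therac-25's actual code: the names Mode, TurntablePosition, REQUIRED_POSITION, and beam_permitted are hypothetical, and the mapping of modes to positions simply follows the figure described in the lecture (flattener for X-ray, scan magnet for electron).

```python
from enum import Enum


class Mode(Enum):
    ELECTRON = "electron"
    XRAY = "xray"


class TurntablePosition(Enum):
    SCAN_MAGNET = "scan_magnet"  # electron treatment region
    FLATTENER = "flattener"      # X-ray treatment region
    UNKNOWN = "unknown"          # hypothetical: turntable moving or sensor fault


# Hypothetical table: the turntable position each beam mode requires.
REQUIRED_POSITION = {
    Mode.ELECTRON: TurntablePosition.SCAN_MAGNET,
    Mode.XRAY: TurntablePosition.FLATTENER,
}


def beam_permitted(beam_mode: Mode, sensed_position: TurntablePosition) -> bool:
    """Allow the beam only if the sensed turntable position matches the mode."""
    return REQUIRED_POSITION[beam_mode] is sensed_position
```

    In this sketch an UNKNOWN position never permits the beam, so a missing or ambiguous sensor reading fails safe rather than defaulting to "go".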

    So, let's walk through an operator's interaction with the software. The

    03:16

    operator first positions the patient on the table in the treatment room. He or she manually sets the treatment field sizes and attaches the necessary accessories to the machine.

    The operator then leaves the room and controls everything from a console room.

    03:32

    The patient's info and treatment plan are entered into the software system and the computer checks to make sure the manually set values match the values typed into the console. If the values match, treatment begins.
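
    The cross-check between the manually set values and the console entries might look roughly like the following. This is a minimal sketch, assuming both sets of values arrive as dictionaries of numbers; the function names and the field names used in the example are hypothetical and do not come from the real system.

```python
def mismatched_fields(manual: dict, console: dict, tolerance: float = 0.0) -> list:
    """Return the sorted names of fields that differ between the two sources."""
    problems = []
    for name in sorted(set(manual) | set(console)):
        if name not in manual or name not in console:
            # A value present on only one side counts as a mismatch.
            problems.append(name)
        elif abs(manual[name] - console[name]) > tolerance:
            problems.append(name)
    return problems


def may_begin_treatment(manual: dict, console: dict) -> bool:
    """Treatment may begin only when every field agrees."""
    return not mismatched_fields(manual, console)


# Hypothetical field names, for illustration only.
room_settings = {"field_width_cm": 10.0, "field_length_cm": 12.0}
typed_settings = {"field_width_cm": 10.0, "field_length_cm": 15.0}
print(mismatched_fields(room_settings, typed_settings))
```

    Returning the list of mismatched fields, rather than a bare yes or no, lets the console tell the operator exactly which value to recheck.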

    Operators then began complaining that it was redundant and time-consuming to

    03:48

    reenter the data into the console system. So Atomic Energy of Canada Limited, or AECL, modified the software to copy treatment data from the manually set values.

    Here's an example of a time when things went wrong. In March 1986 at the East

    04:05

    Texas Cancer Center, a male patient came in for skin cancer electron radiation treatment on his back. He was going in for his ninth treatment.

    After sending the patient into the treatment room, the technician incorrectly set the machine on X-ray mode, but quickly changed it to electron

    04:22

    mode using the up arrow key to edit. The parameters displayed as verified on the screen, and she hit the key to begin treatment.

    The software did not detect the last-minute change and left the beam on X-ray mode, even though the computer read electron mode on the screen.
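
    The general shape of this failure, a setup routine acting on a stale copy of data that the operator has since edited, can be reproduced in a few lines. The sketch below is not the Therac-25's code and makes no claim about its internals; Console, BeamSetup, and the edit counter are all hypothetical names chosen to show one common guard: recheck for edits immediately before turning the beam on.

```python
class Console:
    """Holds what the operator sees on screen."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.edit_count = 0  # incremented on every operator edit

    def edit_mode(self, new_mode: str) -> None:
        self.mode = new_mode
        self.edit_count += 1


class BeamSetup:
    """Configures the beam from a snapshot of the console."""

    def __init__(self, console: Console) -> None:
        self.console = console
        self.configured_mode = None
        self.seen_edit_count = None

    def begin(self) -> None:
        # Snapshot the console state when setup starts.
        self.configured_mode = self.console.mode
        self.seen_edit_count = self.console.edit_count

    def fire_unchecked(self) -> str:
        # Unsafe: uses the snapshot even if the operator edited afterwards.
        return self.configured_mode

    def fire_checked(self) -> str:
        # Guarded: refuse to fire if any edit happened after the snapshot.
        if self.seen_edit_count != self.console.edit_count:
            raise RuntimeError("console edited during setup; restart setup")
        return self.configured_mode


console = Console("xray")
setup = BeamSetup(console)
setup.begin()
console.edit_mode("electron")  # quick correction while setup is in progress
print(console.mode, setup.fire_unchecked())
```

    Here the screen shows the edited mode while the unchecked path still fires with the snapshot taken earlier; the checked path raises an error instead and forces setup to run again.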

    04:38

    The machine quickly shut down and displayed malfunction 54 on the screen. There was no indication in the instruction manual as to what this malfunction indicated.

    And accustomed to the frequent stops and problems with the machine, the operator pressed the P key to proceed with treatment.
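
    One way to avoid this situation is to give every fault a plain-language description and to decide in software whether the operator is allowed to resume. The sketch below is purely illustrative: the Fault type, the fault codes, and their descriptions are invented for the example and are not the Therac-25's real codes or meanings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    code: int
    description: str
    operator_may_proceed: bool


# Hypothetical fault table; real codes and meanings would come from the
# device's hazard analysis and be documented in the operator manual.
FAULTS = {
    1: Fault(1, "Printer offline. Treatment is unaffected.", True),
    2: Fault(2, "Delivered dose disagrees with prescribed dose. "
                "Do not resume. Contact the medical physicist.", False),
}


def describe(code: int) -> str:
    """Return an operator-facing message for a fault code."""
    fault = FAULTS.get(code)
    if fault is None:
        return f"Fault {code}: unknown fault. Treatment is locked until reviewed."
    return f"Fault {code}: {fault.description}"


def proceed_allowed(code: int) -> bool:
    """Unknown or safety-critical faults never allow a one-key resume."""
    fault = FAULTS.get(code)
    return fault is not None and fault.operator_may_proceed
```

    The design choice worth noting is the default: a code missing from the table locks treatment rather than offering a proceed key.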

    04:55

    It was later revealed that the patient had received a massively concentrated overdose of up to 155 times more radiation than planned. The patient died from resulting complications five months later.

    There were a total of six reported

    05:11

    incidents involving the Therac-25. The first at Kennestone Regional Oncology Center, one at the Ontario Cancer Foundation, two at Yakima Valley Memorial Hospital, and two at East Texas Cancer Center.

    All of the incidents had similar storylines as the

    05:26

    East Texas Cancer Center incident that we just discussed. It took far too many incidents for the manufacturers to even acknowledge the possibility of error.

    After the first incident, no action was taken to investigate the machine's safety or to warn other physicians using the Therac-25.

    05:44

    After the second incident, the clinic asked AECL to include additional safety checks, but AECL did not comply. After the first Texas incident, the engineer told the hospital that it was "impossible" for the Therac-25 to overdose the patient.

    06:00

    The second East Texas incident involved the exact same operator. As before, she used the edit up key to quickly change the mode from X-ray to electron.

    Again, the malfunction 54 error popped up and the operator heard the patient moaning for help in the treatment room. The

    06:16

    patient described the feeling of fire on the side of his face where he was receiving treatment. After this incident, a physicist at the hospital began his own investigation.

    He worked with the operator and retraced all of her steps. Through lots of experimentation and button pressing,

    06:31

    they discovered that a quick data entry edit from X-ray to electron mode could recreate this malfunction sequence and overdose. Without the effort of these two individuals, it might have taken much more time and many more incidents for the Therac-25 bug to be found.

    06:48

    We can learn a lot from the failures of the manufacturer to respond to complaints. They were complacent in design and failed to accurately check safety, assuming that since prior versions were safe, that the new version was safe as well.

    They used cryptic error messages and confusing user

    07:04

    interfaces. They were overconfident in their software, failing to conduct adequate investigations following hospital incidents.

    These failures led to severe injury and death for several individuals. Because of these incidents, companies are now required to have a process for receiving

    07:20

    and responding to complaints. For the regulatory response, minimal investigation was conducted by regulatory agencies prior to 1987, allowing the machines to continue operating.

    In February 1987, the FDA

    07:36

    recommended that all Therac-25 machines be shut down and forced AECL to notify all hospitals of this. The FDA required the manufacturer to implement comprehensive safety improvements.

    And after five back-and-forth revisions, the Therac-25 was finally approved for safety and

    07:52

    implementation. Since these incidents, the FDA has taken a much more cautious approach in approving software.

    They now highly emphasize proper documentation and testing. So, in conclusion, we learned that software can fail just like hardware.

    In

    08:08

    many cases, it is hard to detect where and how it will fail because this requires very thoughtful testing. Software failure, just like hardware failure, can cause serious harm, as seen by the six reported patient incidents.

    Again, proper documentation and testing are crucial for developing medical

    08:25

    software. And finally, the failures of the Therac-25 provide guidance on what not to do, thus shaping modern medical software regulation through agencies such as the FDA.

    Thank you.