In hospitals and health systems across the country, physicians sometimes use algorithms to help them decide what type of treatment or care their patients receive. These algorithms vary from basic computations using several factors to sophisticated formulas driven by artificial intelligence that incorporate hundreds of variables. They can influence how a doctor assesses kidney function, whether a mother should give birth vaginally once she’s had a Cesarean section, and which patients could benefit from certain interventions.
In a perfect world, the computer science that powers these algorithms would give clinicians unparalleled clarity about their patients’ needs. They’d rely on their own knowledge and expertise, of course, but an algorithm would theoretically steer them away from making decisions based on anecdote, or even implicit or explicit bias.
The only problem, as we’ve learned in recent years, is that algorithms aren’t neutral arbiters of facts and data. Instead, they’re a set of instructions made by humans with their own biases and predispositions, working in a world rife with prejudice. Sometimes, they’re even developed using old or limited data.
The battle over algorithms in healthcare has come into full view since last fall. The debate only intensified in the wake of the coronavirus pandemic, which has disproportionately devastated Black and Latino communities. In October, Science published a study that found one hospital unintentionally directed more white patients than Black patients to a high-risk care management program because it used an algorithm to predict the patients’ future healthcare costs as a key indicator of personal health. Optum, the company that sells the software product, told Mashable that the hospital used the tool incorrectly.
The study’s authors found that Black patients were as sick as their white counterparts, but were expected to have lower costs in the future. The authors suspect the predicted costs for the Black patients didn’t reflect their long-term health risks but were instead linked to structural issues, like difficulty accessing healthcare and reticence to engage the healthcare system because of past experiences with discrimination.
“On the one hand, having an algorithm is sort of like the illusion of objectivity in science,” says Dr. Ezemenari M. Obasi, director of the HEALTH Research Institute at the University of Houston and a counseling psychologist who studies racial health disparities. Dr. Obasi was not involved in the Science study.
Yet without checks and balances to ensure an algorithm isn’t affecting one group more than another, whether positively or negatively, he believes algorithms are likely to replicate or worsen existing disparities.
“Otherwise, you’re creating a scientific way of justifying the unequal distribution of resources,” he says.
There’s no universal fix for this problem. A developer might be tempted to solve it with elaborate math. A doctor could try to tinker with software inputs or avoid using an algorithmic product altogether. Experts say, however, that coming up with a solution requires widespread education about the issue; new partnerships between developers, doctors, and patients; and innovative thinking about what data is collected from patients in the first place.
Checks and balances
Despite the widespread use of algorithms in healthcare, there is no central inventory of how many exist or what they’re designed to do. The Food and Drug Administration laid out a framework last year for evaluating medical software that uses artificial intelligence algorithms, and regulation is still evolving. In some cases, the proprietary code is developed by private companies and healthcare systems, which makes it difficult to study how the algorithms work. Patients typically don’t know when an algorithm is used as part of their treatment, even as it’s integrated with their electronic medical record to help advise their doctor.
One effort underway at Berkeley Institute for Data Science promises to bring much-needed accountability to the world of healthcare algorithms. Stephanie Eaneff, a health innovation fellow at the institute and at the UCSF Bakar Computational Health Institute, is leading work to develop a “playbook” of best practices for auditing clinical algorithms.
In order to reduce the risk of algorithmic bias, Eaneff says that the evaluation process should happen before a healthcare system adopts new software. The playbook will include information and resources to help a healthcare system create and maintain its own “algorithm inventory” so it knows how and when software is used to make decisions. It’ll also cover how to monitor predictions made by the algorithm over time and across patient demographics, as well as how to assess an algorithm’s performance based on what it’s being used to predict or measure.
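One audit the playbook describes is monitoring an algorithm’s predictions across patient demographics. As a minimal sketch of what that check might look like (the field names and data here are hypothetical, not drawn from Eaneff’s playbook), a health system could compare how often the software flags patients in each group:

```python
from collections import defaultdict

def flag_rates_by_group(records):
    """Return the fraction of patients flagged by the algorithm, per group.

    records: iterable of dicts with 'group' and 'flagged' keys
    (hypothetical schema for illustration).
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        flagged[r["group"]] += bool(r["flagged"])
    return {g: flagged[g] / totals[g] for g in totals}

# Toy data: a large gap in flag rates between groups would prompt review.
sample = [
    {"group": "white", "flagged": True},
    {"group": "white", "flagged": False},
    {"group": "Black", "flagged": False},
    {"group": "Black", "flagged": False},
]
print(flag_rates_by_group(sample))  # {'white': 0.5, 'Black': 0.0}
```

A disparity in these rates doesn’t by itself prove bias, but it tells an auditing team where to look next, which is why the playbook pairs this kind of monitoring with an inventory of how and when each algorithm is used.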
The guide aims to give healthcare systems helpful tools for rooting out bias, but Eaneff believes that ongoing professional education and collaboration are both critical. She says developers working in this space need more training in social sciences, bioethics, and health equity policy, as well as partnerships with bioethicists and patient and health advocates.
“Think about it upfront and prioritize it: What are we actually trying to build, for whom, and how will this be implemented, and by whom, and for which communities?” says Eaneff. “When you develop things in a silo and treat them like a math problem, that’s a problem.”
Take, for example, the pulse oximeter. The medical device measures the oxygen level present in a person’s blood. The coronavirus pandemic made the wearable more popular as average consumers looked for non-invasive ways to track key vital signs at home. Yet, as the Boston Review reported last month, the device effectively “encodes racial bias” because its sensors were originally calibrated for light skin. Pulse oximeters can be less accurate when tracking oxygen levels for patients with darker skin tones. The device itself typically uses an algorithm to make its measurements, but clinicians also use its readings as one factor in their own clinical decision-making algorithms. All the while, a doctor has no clue an algorithm may have let them and their patient down.
One of Eaneff’s collaborators is Dr. Ziad Obermeyer, lead author of the Science study published last fall. He is also a physician and associate professor of health policy and management at U.C. Berkeley. Dr. Obermeyer and his co-authors didn’t have access to the algorithm’s underlying math, but instead evaluated the dataset of a single academic hospital as it used algorithmic software to predict which patients could benefit from targeted interventions for complex health needs.
The researchers found that the Black patients were substantially less healthy than the white patients but were less frequently identified for increased help. When the researchers accounted for this difference, the percentage of Black patients who could receive those extra resources shot up from 18 percent to 47 percent. (The hospital didn’t include race when its employees used the algorithm to identify patients, and yet the process yielded unequal outcomes. The researchers used patients’ self-identified race on their medical records to categorize the results.)
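The mechanism behind this finding is a proxy-label problem: when two groups are equally sick but one generates lower healthcare costs for the same level of illness, ranking patients by predicted cost under-selects that group. A toy simulation with invented numbers (this is not Optum’s model or the study’s data) makes the effect concrete:

```python
import random

random.seed(0)

# Two equal-sized groups with identical illness burden, but group B incurs
# roughly 30% less cost at the same illness level (e.g. due to barriers to
# accessing care). Illness and cost scales are arbitrary.
patients = []
for i in range(10_000):
    group = "A" if i % 2 == 0 else "B"
    illness = random.gauss(50, 10)  # same distribution for both groups
    cost = illness * (1.0 if group == "A" else 0.7) + random.gauss(0, 2)
    patients.append((group, illness, cost))

def top_k(pts, key, k=1000):
    """Patients the program would enroll if it ranked by `key`."""
    return sorted(pts, key=key, reverse=True)[:k]

# Rank by predicted cost (the proxy) vs. by actual illness (the real need).
by_cost = top_k(patients, key=lambda p: p[2])
by_illness = top_k(patients, key=lambda p: p[1])

frac_b_cost = sum(p[0] == "B" for p in by_cost) / len(by_cost)
frac_b_illness = sum(p[0] == "B" for p in by_illness) / len(by_illness)

print(f"Group B share when ranking by cost:    {frac_b_cost:.0%}")
print(f"Group B share when ranking by illness: {frac_b_illness:.0%}")
```

Ranking by illness enrolls the two groups at roughly equal rates, while ranking by cost nearly shuts group B out, even though race never appears as an input, which mirrors how the hospital’s race-blind process still produced unequal outcomes.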
Optum, the company that sells the rules-based software product, known as Impact Pro, disputes the researchers’ findings, though it hasn’t requested a retraction or correction from Science.
“The algorithm is not racially biased,” a spokesperson for the company said in an email to Mashable. The study, the spokesperson added, mischaracterized the cost prediction algorithm based on the hospital’s use, which was “inconsistent with any recommended use of the tool.”
The algorithm’s software can identify health status and future healthcare risks based on more than 1,700 variables, not just predicted cost. However, Dr. Obermeyer says that algorithms’ performance is regularly evaluated on cost prediction accuracy, making it a key metric for hospitals and health systems, even if manufacturers say it shouldn’t be used in isolation to identify patients for certain interventions. Dr. Obermeyer says he’s found this to be the case while working with health systems and insurers following the publication of his study. A 2016 report on healthcare algorithms from the Society of Actuaries also used cost prediction to gauge the performance of several algorithms, including Impact Pro.
“I don’t view this as a story about one bad health system or one bad algorithm — this is just a broad and systematic flaw in the way we were all thinking about the problem in the health system,” Dr. Obermeyer wrote in an email.
He is hopeful that creating a detailed playbook for health systems “will mean that algorithms will get tested at these different points in the pipeline, before they start touching patients.”
The debate over healthcare algorithms — in a field where physicians are frequently white men — has prompted both reflection and defensiveness.
This summer, Dr. David Jones, a professor of the culture of medicine at Harvard University, co-authored an article in the New England Journal of Medicine about how race is used in clinical algorithms. The co-authors identified several algorithms in obstetrics, cardiology, oncology, and other specialties that factored race into their risk predictions or diagnostic test results.
At first glance, including race might seem like an effective way to make algorithms less biased. Except, as Dr. Jones and his co-authors argued: “By embedding race into the basic data and decisions of health care, these algorithms propagate race-based medicine. Many of these race-adjusted algorithms guide decisions in ways that may direct more attention or resources to white patients than to members of racial and ethnic minorities.”
Further, they wrote, when some algorithm developers try to explain why racial or ethnic differences might exist, the explanation leads to “outdated, suspect racial science or to biased data.” The co-authors said it was important to understand how race might affect health outcomes. When race shows up as linked to certain outcomes, it’s likely a proxy for something else: structural racism, education, income, and access to healthcare. Yet they cautioned against using it in predictive tools like algorithms.
“We did not come out and say these things are bad and should be stopped,” says Dr. Jones in an interview. “We said these things are likely bad and should be considered.”
Dr. Jones believes that algorithms would improve and create more equitable outcomes if they accounted for poverty, which is a significant predictor of life expectancy, and other socioeconomic factors like food insecurity, housing, and exposure to environmental toxins.
In general, doctors are known to resist abandoning techniques and tools they trust. They may not understand the complex relationship between structural racism and health outcomes. As a result, some may be reticent to think critically about algorithms and equity.
For Dr. Obasi, director of the HEALTH Research Institute at the University of Houston, it’s vital that developers and clinicians listen to patients affected by algorithms.
A patient who underreports certain aspects of their health, like mental illness, drug use, and intimate partner violence, might do so out of fear. If he can’t answer questions about his father’s medical history, it might be because he doesn’t have a personal history with him or doesn’t discuss medical challenges with him. If he can’t complete the part of the questionnaire about his mother’s health, it could be because she hasn’t had insurance for years and hasn’t seen a medical provider. A patient deemed “noncompliant” might feel uncomfortable following up on a physician’s orders after experiencing racism in that physician’s office.
Dr. Obasi wishes for algorithms that are designed with such cultural differences and lived experiences in mind.
“Anytime you’re trying to take technological advancements and translate that into practice, you need to have folks impacted by it at the table,” says Dr. Obasi. “And that requires a different level of humility.”