Machine Learning Predicts Kids at Risk of Not Getting Vaccinated
Predictive computer models could prompt physicians to talk with families who are skeptical of vaccines
Photo: George Frey/Getty Images
Growing skepticism toward vaccines has sparked a flare-up of measles outbreaks affecting New York City neighborhoods, cruise ships, international airports and even Google’s Mountain View headquarters. To help family physicians reach out to vaccine-hesitant parents, data scientists have shown how computer models can predict the likelihood that an individual child’s parents will not get him or her vaccinated.
Since 2016, the world has witnessed a resurgence in measles cases and deaths as more people choose not to vaccinate their children—a decision that is often influenced by misinformation spread online through social media platforms such as Facebook and YouTube. By identifying families at greatest risk of not getting vaccinated, computer models could enable health officials and physicians to talk with parents at the stage when they remain undecided about vaccines.
“The reason why this could be useful is that, while it’s very hard to persuade someone once they’ve made up their mind, it might be easier if we know early enough and approach them in a friendly manner explaining why it’s important that their children be vaccinated,” says Tin Oreskovic, a data scientist at IBM’s Chief Analytics Office.
Families who choose to not get the MMR (measles, mumps, rubella) vaccine may expose their neighbors and communities to the risk of serious illness and death. In 2017, there were 110,000 measles deaths worldwide. Most of these fatalities involved children under the age of five, according to the World Health Organization (WHO). Before the measles vaccine became available in 1963, measles epidemics regularly swept the globe, killing approximately 2.6 million people each year.
It’s important to ensure that at least 95 percent of the population gets immunity through two vaccine doses (or sometimes prior exposure to the virus). That 95 percent “herd immunity” threshold limits the possible spread of measles outbreaks and helps protect infants who are too young to be vaccinated as well as people who cannot be immunized because of other diseases or conditions. But many countries have seen second-dose vaccination rates fall below the herd immunity threshold, including 34 out of 53 countries in the WHO’s European region in 2017.
To help boost vaccination rates, Oreskovic initiated and coordinated a University of Chicago Data Science for Social Good project aimed at predicting the likelihood of Croatian children getting vaccinated by the end of their first-grade school year. Working with the Croatian Institute of Public Health, researchers from France, Portugal, and the United States trained machine learning algorithms on the electronic health records of 48,000 children who entered the first grade between 2011 and 2018.
After comparing the results from four machine learning models, researchers decided upon a LASSO logistic regression model that identified vaccine-hesitant families with 72-percent precision. The model pruned the large number of possible data features affecting vaccination rates down to just 25 of the most important features—something that improved the chance of the model’s predictive power holding up for other groups of children beyond those in the training datasets. (Some features that raised child risk scores included having children who sat, walked, and spoke at a later age than their peers.)
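For readers curious how a LASSO penalty prunes features, here is a minimal sketch in plain Python. It is not the project's code: the data is synthetic, the eight features, the penalty strength `lam`, the learning rate, and the 0.5 decision threshold are all assumptions chosen for illustration, and the fitting routine is a simple proximal gradient loop rather than whatever solver the researchers used. The sketch fits an L1-penalized logistic regression, lists which features keep a nonzero weight, and computes precision (the share of flagged children who truly belong to the at-risk class) on held-out rows.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, amount):
    # Proximal step for the L1 penalty: shrink toward zero, clip at zero.
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_lasso_logistic(rows, labels, lam, lr=0.5, steps=300):
    # Proximal gradient descent on mean log-loss + lam * sum(|w|).
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n  # the intercept is not penalized
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(d)]
    return w, b


def precision(y_true, y_pred):
    # Of the rows flagged as positive, what fraction really are positive?
    flagged = sum(y_pred)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / flagged if flagged else 0.0


# Synthetic stand-in data (an assumption): only features 0 and 1 carry signal,
# the other six are noise, and 10 percent of labels are flipped at random.
rng = random.Random(0)
N_FEATURES = 8


def make_child():
    x = [rng.gauss(0, 1) for _ in range(N_FEATURES)]
    y = 1 if x[0] - x[1] > 0.5 else 0
    if rng.random() < 0.1:
        y = 1 - y
    return x, y


data = [make_child() for _ in range(500)]
train, test = data[:400], data[400:]

w, b = fit_lasso_logistic([x for x, _ in train], [y for _, y in train], lam=0.1)
kept = [j for j, wj in enumerate(w) if wj != 0.0]

preds = [
    1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0
    for x, _ in test
]
test_precision = precision([y for _, y in test], preds)

print("features kept:", kept)
print("precision on held-out rows:", round(test_precision, 2))
```

The soft-thresholding step is what sets weak weights to exactly zero, which is why an L1 penalty can reduce a long list of candidate features to a short one. Raising `lam` prunes more aggressively, usually at some cost in accuracy.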
Just as importantly, the team chose the LASSO model because it presented the results for child risk scores in a way that humans could understand. Interpretability is not guaranteed with many machine learning models, but in this case it allowed both data scientists and health officials to understand and trust the LASSO model’s reasons for singling out certain families as being at higher risk of hesitating to vaccinate.
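One way to see why a sparse logistic regression is comparatively easy to read is to look at how a single risk score breaks down. The sketch below is illustrative only: the feature names, coefficient values, and intercept are hypothetical and do not come from the project. It converts each coefficient into an odds ratio and ranks the features by how much each one contributes to one child's score.

```python
import math

# Hypothetical coefficients for three binary features (assumed values,
# not the project's fitted model).
COEFFICIENTS = {
    "late_sitting": 0.30,
    "late_walking": 0.45,
    "late_speaking": 0.25,
}
INTERCEPT = -2.0  # also an assumption


def odds_ratios(coefs):
    # exp(coefficient) is the multiplicative change in the odds
    # when a feature goes from 0 to 1, holding the others fixed.
    return {name: math.exp(weight) for name, weight in coefs.items()}


def explain(child, coefs, intercept):
    # Each feature's contribution to the log-odds is coefficient * value.
    contributions = {name: coefs[name] * child.get(name, 0.0) for name in coefs}
    log_odds = intercept + sum(contributions.values())
    risk = 1.0 / (1.0 + math.exp(-log_odds))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return risk, ranked


child = {"late_sitting": 0.0, "late_walking": 1.0, "late_speaking": 1.0}
risk, ranked = explain(child, COEFFICIENTS, INTERCEPT)
ratios = odds_ratios(COEFFICIENTS)

print("risk score:", round(risk, 3))
for name, contribution in ranked:
    print(name, "adds", contribution, "to the log-odds")
```

Because the score is a sum of per-feature terms, a health official can see which recorded factors pushed a given child's score up, something that is harder to extract from many other model types.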
The project also created an “Early Warning and Monitoring System” Web dashboard that presents vaccination rates and child risk scores to public health officials and physicians at national, county, and local health clinic levels. The next project being considered will likely involve a randomized controlled trial to see whether the child risk scores help officials and physicians to intervene effectively with vaccine-hesitant families and improve vaccination rates. But that next step would likely take place no sooner than the 2020–2021 school year.
Some important issues have to be resolved before this type of predictive population analysis can be widely deployed.
This article was first posted at IEEE Spectrum on May 7, 2019.