There are many Artificial Intelligence tools related to maternal and child health,1 such as apps to detect malnutrition in children2 and algorithms to predict risks and complications during pregnancies.3
But these tools raise several questions: What are the benefits and downsides of AI compared to current approaches? What are the challenges and risks of using AI? How can we ensure that AI tools promote health equity and access rather than deepening existing divides?
I delved into some of these issues in a podcast with Amrita Mahale, the Director of Product and Innovation at ARMMAN. ARMMAN is an Indian non-profit that creates cost-effective, tech-based solutions to reduce maternal and child mortality and morbidity. Through its programs, ARMMAN has reached around 50 million women and children and 400,000 health workers in 21 states across India.4
Here is an edited transcript of the podcast with Amrita Mahale.
India has made great strides towards reducing maternal mortality, but still, every 20 minutes or so, a woman dies in childbirth.5 And for every woman who dies, many more suffer lifelong complications.
One challenge is low healthcare-seeking behavior.6 Due to patriarchal strictures and norms, neither women nor their families pay much attention to their health needs.7,8 These issues are exacerbated for women from low-income and low-education backgrounds.
Besides, community health workers are often underskilled and overworked.9 They are supposed to provide basic care for simple conditions, but that doesn’t always happen. So, women either delay seeking care, opt for private clinics (which can be expensive), or go to tertiary health facilities (which are usually overburdened) [See Glossary #1 below for more details]. Delay in accessing healthcare increases the risk of severe complications and deaths. If you have good preventive care systems, most complications can be averted or caught early and dealt with at the health system’s lower levels. That way, only the most acute cases go to secondary or tertiary facilities.
Our programs broadly fall into two buckets. One, programs that empower women with preventive care information through pregnancy and infancy so that they seek healthcare early and regularly.
Two, programs that train and support health workers to better provide healthcare and detect and manage complications early. We try to prevent health complications and ensure that when they occur, they are tackled at the health system’s lower levels to avoid overburdening tertiary facilities.
Two of our largest programs are Kilkari and Mobile Academy. Kilkari, which we run in collaboration with the Government of India, delivers weekly pre-recorded messages to women over mobile phones from the fourth month of pregnancy till the child turns one. It has reached over 47 million women to date and has 3.5 million active subscribers across 20 states of India. Mobile Academy trains frontline health workers known as ASHAs [See Glossary #2 below for more details] using phone-based training modules. Another program is mMitra, which sends automated voice calls with critical health information to around 100,000 women in the state of Maharashtra.
There have been Randomized Controlled Trials (RCTs) [See Glossary #3 below for more details] to evaluate the impact of our programs. The mMitra RCT showed a 38% increase in pregnant women who completed the prescribed doses of iron-folic acid tablets and a 22% increase in the number of children who tripled their birth weight after one year. There were also improvements in tetanus vaccine uptake, consulting a doctor for spotting or bleeding during pregnancy, and delivery in a hospital. Johns Hopkins University conducted the RCT for Kilkari about five years ago. It showed improvements in the vaccination of children, delivery in a hospital, use of contraceptives, and fathers’ knowledge regarding maternal and child health.
ARMMAN has been using AI for several years now, starting with the mMitra program.
In mMitra, we observed dwindling engagement over time, which is common for mobile health programs globally. Some women listen actively for a few weeks or months, but the program runs for 18 months, and many stop listening along the way for various reasons. We wanted them to listen to every message during the program because, say, if a pregnant woman drops out before the child is born, she will miss out on information related to immunization, exclusive breastfeeding, complementary feeding, etc.
We had some rules-based systems to avoid drop-offs. So, when a woman stopped listening, we would call her from our call center, but it would often be too late. And the bigger challenge is that we don’t have a large workforce. These resource constraints meant that we couldn’t call everyone who stopped listening. So, we wanted to identify and predict listenership patterns early on and strategically intervene to ensure higher success rates.
We realized that we have a lot of data, so why not use AI to solve this problem? We partnered with Google Research India and used restless multi-armed bandit algorithms to predict which users are likely to drop off and who among them will benefit the most from an intervention.10
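To make the idea concrete, here is a minimal sketch of how a restless-bandit-style system can ration limited call-center capacity. It is not ARMMAN's actual model: the transition probabilities are randomly invented, and it uses a one-step ("myopic") gain rather than the full Whittle-index computation a production restless multi-armed bandit would use.

```python
import numpy as np

# Each beneficiary ("arm") is in state 0 (disengaged) or 1 (engaged).
# P[arm, action, state, next_state] holds per-arm transition probabilities,
# where action 0 = no call and action 1 = a service call. In a real system
# these would be estimated from listenership data; here they are random.
rng = np.random.default_rng(0)
n_arms, budget = 100, 10          # 100 women, capacity for 10 calls

P = rng.uniform(0.2, 0.8, size=(n_arms, 2, 2, 2))
P /= P.sum(axis=3, keepdims=True)  # make each row a valid distribution
state = rng.integers(0, 2, size=n_arms)

# Myopic index: the expected gain in next-step engagement from calling now.
# (A full solution would compute Whittle indices over a longer horizon.)
idx = np.arange(n_arms)
gain = P[idx, 1, state, 1] - P[idx, 0, state, 1]

# Spend the budget on the women predicted to benefit most from a call.
chosen = np.argsort(gain)[-budget:]
```

The key point the sketch captures is the one Amrita describes: the system does not just flag likely drop-offs, it ranks women by how much an intervention is expected to change their trajectory, so a limited workforce is pointed at the calls with the highest payoff.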
Now, we are trying to do the same in Kilkari as well. It’s different from mMitra, where we enroll the women ourselves and collect demographic information. So, it’s easy to transfer insights from older cohorts to newer ones. When a woman joins, we can figure out a lot based on her socio-demographic characteristics and information from past subscribers.
In Kilkari, we have no demographic information, so all we can do is look at a woman’s listening trajectory and make predictions about her future. So, we had to make many tweaks to the AI approach to meet the needs of a national program like Kilkari. But because we’ve done it once, we know what to expect and how to design an effective AI study.
We follow an evidence-based approach to scaling innovation. So, we start with small pilot projects [See Glossary #4 below for more details] and then increase their scale before large rollouts.
We are working on broadly two kinds of AI projects: 1) Using machine learning and data science to improve our programs 2) Using generative AI and Large Language Models (LLMs) [See Glossary #5 below for more details].
We are planning a pilot where we use AI to predict the best time slot to call a woman. And this is a great example of using insights from the real world to select a use case to deploy AI.
We’ve seen in rural India that phone usage patterns are very different from urban India. In cities, we keep checking our phones, but this is not common among our target audience. They use their phones early in the morning, then get busy with domestic chores or work in the fields and check their phones again only after lunch or at the end of the day. So in a day, there are only 2-3 brief windows during which we can reach these women. And many share their phones with their husbands, who might be away at work for most of the day.
But in Kilkari, women can’t choose a time slot to receive calls. And we don’t know if they are using shared phones. So how do we know when to call them? We are thinking of using AI to optimize which time slot the woman gets a call in. We are still in the brainstorming phase, but later this year, we’ll run a pilot to see if we can use machine learning to predict the best time to call a woman, especially given the constraints of the automated calling system.
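One simple way to frame this prediction problem is to learn, per subscriber, which slot has historically had the best pickup rate. The sketch below is purely illustrative (the slot names and log data are invented, and a real pilot might use a learned model over many more features); Laplace smoothing keeps rarely tried slots from being ruled out too quickly.

```python
from collections import defaultdict

# Hypothetical call windows, matching the 2-3 brief daily windows
# described above; not Kilkari's actual scheduling slots.
SLOTS = ["early_morning", "after_lunch", "evening"]

def best_slot(call_log, alpha=1.0):
    """call_log: list of (slot, answered) pairs for one subscriber,
    where answered is 1 if she picked up and 0 otherwise.
    Returns the slot with the highest Laplace-smoothed answer rate."""
    answered = defaultdict(float)
    attempts = defaultdict(float)
    for slot, picked_up in call_log:
        attempts[slot] += 1
        answered[slot] += picked_up
    return max(SLOTS, key=lambda s: (answered[s] + alpha)
                                    / (attempts[s] + 2 * alpha))

# Example: two missed evening calls, good pickup after lunch.
log = [("early_morning", 1), ("early_morning", 0),
       ("after_lunch", 1), ("after_lunch", 1), ("evening", 0)]
```

Here `best_slot(log)` would route this subscriber's calls to the after-lunch window, which is exactly the kind of per-woman scheduling decision the pilot aims to automate.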
The other kind of AI solution we are excited about uses generative AI and LLMs. Last summer, we decided to build a learning program for Auxiliary Nurse Midwives (ANMs) [See Glossary #6 below for more details]. We had earlier developed 20 detailed protocols on high-risk factors. We train workers face-to-face regarding these protocols and provide them with digital learning materials for self-paced learning. However, ANMs would sometimes get overwhelmed because of the information overload.
So, we started a WhatsApp helpline where they could pose queries and doctors would respond. The doctors are overworked, so they would often take hours or days to respond. Eventually, ANMs stopped using the service because they were used to getting answers at the speed of a Google search.
That’s when we thought of using LLMs not to generate an answer, but to pick the most appropriate response from a list of frequently asked questions and answers in response to the ANMs’ queries. Or the LLM would generate an answer, which the doctor would verify before sending it to the ANM. But we saw that the LLM generated excellent answers! So, we thought of keeping the doctor out of the loop and having the LLM send responses directly to the ANM.
Yes, we were nervous about this problem, so we approached it with caution and responsibility. It was clear to us that we would not launch anything without validating it thoroughly and that we would evaluate the model in small, incremental steps. So, we did not just let the LLM make up answers.
We use something called retrieval-augmented generation — we force the LLM to take answers from the training manuals and clinically validated protocols we had created. In this aspect, we were privileged compared to other organizations trying to build chatbots because we did not have to create any resources from scratch. All we had to do was make them more machine-readable. Since they are learning aids for health workers, the protocols are visual — there are flowcharts, decision trees, and images. So, we had to convert those into plain text, which was slightly time-consuming.
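The retrieval-augmented generation loop described here can be sketched in a few lines. Everything below is illustrative, not ARMMAN's pipeline: the protocol snippets are invented, word overlap stands in for real embedding similarity, and the prompt would be sent to an actual LLM rather than printed.

```python
# A tiny stand-in corpus of machine-readable protocol text. In ARMMAN's
# case, this content comes from converting visual training protocols
# (flowcharts, decision trees) into plain text.
PROTOCOLS = [
    "Severe anaemia in pregnancy: refer if haemoglobin is below 7 g/dL.",
    "Hypertension protocol: check blood pressure at every antenatal visit.",
    "Exclusive breastfeeding is recommended for the first six months.",
]

def retrieve(query, corpus, k=2):
    """Rank passages by word overlap with the query (a crude stand-in
    for embedding similarity) and return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, passages):
    """Constrain the LLM to answer only from the retrieved protocols."""
    context = "\n".join(f"- {p}" for p in passages)
    return ("Answer using ONLY the protocol excerpts below.\n"
            f"{context}\nQuestion: {query}")

query = "When should I refer a woman with anaemia?"
prompt = build_prompt(query, retrieve(query, PROTOCOLS))
```

The design choice this illustrates is the one Amrita emphasizes: the model is forced to ground its answer in clinically validated source text, rather than being free to make answers up.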
We also had a lot of evaluation materials — health workers take quizzes at the beginning and end of the courses, which help us evaluate the courses’ impact on learning levels. We also have a module on ethics. These became evaluation materials for the LLM. We made sure at every step that the LLM was able to give correct answers in the quiz and match the ethical aspects with the correct answers. So, even before we began using LLMs widely, we ensured it worked well in a variety of contexts.
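That evaluation step can be sketched as a simple gated check: run the model over the existing quiz bank and require a minimum accuracy before wider use. The quiz items, the accuracy threshold, and the `model_answer` placeholder below are all hypothetical.

```python
# Hypothetical quiz bank: (question, options, correct option key).
# In practice these would be the same quizzes health workers take
# at the start and end of their courses.
QUIZ = [
    ("Minimum recommended number of antenatal visits?",
     {"a": "2", "b": "4"}, "b"),
    ("First action for heavy postpartum bleeding?",
     {"a": "refer urgently", "b": "wait and observe"}, "a"),
]

def model_answer(question, options):
    # Placeholder: a real evaluation would query the LLM here.
    return "b" if "antenatal" in question else "a"

def evaluate(quiz, threshold=0.9):
    """Score the model against the answer key; pass only if accuracy
    meets the threshold. Returns (accuracy, passed)."""
    correct = sum(model_answer(q, opts) == key for q, opts, key in quiz)
    accuracy = correct / len(quiz)
    return accuracy, accuracy >= threshold
```

Repeating this check at each incremental step, as described above, is what lets the team widen the LLM's role only once it has demonstrably passed the same tests the health workers do.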
We follow a problem-first approach and not a technology-first approach for our innovation pilots — AI as well as non-AI. We identify the core problem we have to solve for our users and how technology or AI can solve this problem more efficiently. That ensures we use AI only to create meaningful impact.
ARMMAN’s pilots also go through an ethics review. There is an interdisciplinary team that looks at the study design and preliminary results and thinks through the risks: potential harm and sources of bias.
We work with external collaborators on our AI projects, but we do not share any personally identifiable information with them. Internally also, only those who cannot do their job without this data have access to it; others don’t.
To ensure equity, we follow inclusive design principles. We do extensive user research to understand how our AI projects, especially LLMs, will be used, perceived and interpreted on the ground. We figure out who could get left out if we introduce certain technologies in our program.
For example, in the case of the chatbot we spoke about earlier, we did user research even before we developed the chatbot. We used a prototyping technique called ‘Wizard of Oz’. In this experiment, we simulate an automated experience, but a human actually controls the flow. For the chatbot, we had ANMs send their questions on WhatsApp, but instead of chatbots responding, a human at the other end replied using a set of scripts. It was not a free-flowing conversation.
We learnt early on that many ANMs cannot type. They have nursing diplomas and can read and write, but they are not comfortable typing complex messages with medical terms. So, they defaulted to sending voice messages. Our initial plan was to launch a proof of concept that only used text mode because voice is much harder to get right. But after the experiment, we realized that we couldn’t launch a product that didn’t have voice mode because many ANMs would be left out. Often, the ANMs not comfortable typing are the ones who probably need this kind of service the most. So, it wouldn’t just leave out a certain percentage of users, but also those users who would benefit the most from the service. So, we made sure we prioritized voice mode even if it delayed development.
Some of the visuals used in this blog are AI-generated on Canva.
The Boston Congress of Public Health Thought Leadership for Public Health Fellowship (BCPH Fellowship) seeks to:
It is guided by an overall vision to provide a platform, training, and support network for the next generation of public health thought leaders and public scholars to explore and grow their voice.