Two weeks ago, we received an update to a Google Pixel device that moved the smartphone to the latest Android 15 operating system. The new OS introduces a number of accessibility features, including the integration of Google’s own Gemini AI software into the TalkBack screen reader. When an image lacks alternative text, Gemini can analyse the image and suggest a description. We tested three images to see how it performs, both when it supplies alt text where none exists and when its description is compared with manually added alt text. We also compared Gemini’s performance with the image descriptions provided by VoiceOver on iPhone.
We noticed that while Gemini’s descriptions changed each time it was used, VoiceOver’s image descriptions on iPhone typically remained the same each time we activated VoiceOver for the same image.
Testing Gemini on Android 15
To use the feature on compatible devices, enable the option in the TalkBack accessibility settings that uses Gemini for images with missing alternative text. Once it is enabled, you can test the feature by moving your finger onto an image to focus on it, then opening the TalkBack menu by swiping up the left-hand side of the screen and then across the top. The menu will contain a new option called ‘Describe Image’, which activates Gemini to analyse the image.
Enabling VoiceOver on iPhone
To activate VoiceOver on your iPhone, start by locating the Settings icon on your home screen and tapping on it. From there, navigate to the Accessibility menu and select VoiceOver. You’ll see a toggle switch – simply slide it to the right to enable VoiceOver.
Testing Image Description Capabilities
We tested both Gemini and VoiceOver with three images:
Image 1: Infographic of India’s Elections
Sources: Government of India, Election Commission of India, United States Census, Eurostat, World Bank, Pew Research Center
Graphic: Rosa de Acosta, CNN.
The image comes from CNN World’s article about elections in India.
– Original Alt Text: None
– Gemini AI Description (First Attempt): This infographic shows population comparisons and election length, in days, for various countries. The image highlights the population of India being larger than the European Union and the countries of North America combined. It also points out that this year’s India election will be the second longest in the country’s history, after the 1951-52 election. The image includes a visual representation of the population of the listed countries, shown in circles of varying sizes.
– Gemini AI Description (Second Attempt): This image depicts a comparison of the population of several countries and regions. It highlights India’s massive population compared to the EU, US, Canada, and Mexico. India is the most populous country in the world, with a population of 1.42 billion. This is greater than the combined population of the EU (448.8 million), US (336.3 million), Canada (38.9 million), and Mexico (127.5 million). The image uses circles to represent population size, with larger circles indicating a greater population. The caption explains that the Indian election, which is held to choose the members of the lower house of parliament, is a huge undertaking that takes place over a month.
– VoiceOver Description: Infographic (upper part): A screenshot of a computer programme with a list of options. India elections by the numbers. The most populous country in the world elects.
Image 2: Reindeer Figurine from Australia Post
The image comes from Australia Post’s Christmas Catalogue.
– Original Alt Text: “JoyUp Light Up Reindeer – Gold product photo”
– Gemini AI Description (First Attempt): This image shows a golden, glittery reindeer figurine, with a gold bow around its neck. It is posed standing with its head slightly tilted to the side. The figurine appears to be made of some type of woven material that is covered in glitter. It is likely a Christmas decoration.
– Gemini AI Description (Second Attempt): A gold and glittery reindeer figurine, with a gold bow, is standing on a white background. The reindeer is facing the right of the frame. The reindeer’s antlers are curled forward. The bow is tied around the reindeer’s neck. There are no other objects in this image.
– VoiceOver Description: A group of illustrations of people next to text. DD to cart. Is light up gift.
Image 3: Athlete
The image comes from news.com.au’s article about the athlete Gout Gout.
– Original Alt Text: “ADELAIDE, AUSTRALIA – APRIL 11: Gout Gout of Queensland winning the men’s u20 100m during the 2024 Australian Athletics Championships at SA Athletics Stadium on April 11, 2024, in Adelaide, Australia. (Photo by Sarah Reed/Getty Images)”
– Gemini AI Description (First Attempt): A male runner with dark skin, wearing a red tank top, is running in a race. He is raising his right arm, and his face shows exertion and determination. He is wearing a race number with the name “Gout” on it.
– Gemini AI Description (Second Attempt): The image shows a young black man, in a red and black running singlet. He is running on a track.
– VoiceOver Description: A person wearing a number on their shirt and running. IMPI. QU. Gout. Chemist.
Comparison and Findings
Our tests revealed that Gemini AI provides richer and more detailed descriptions than VoiceOver. However, VoiceOver’s descriptions were more concise and faster to generate. VoiceOver’s technology is based on older AI models, which may explain the difference in performance. We also found that while Gemini’s descriptions were richer, they sometimes included opinion-based details, such as the emotions it perceived on the athlete’s face.
We also noted that:
– Generally speaking, Gemini’s ability to identify public figures was inconsistent.
– VoiceOver sometimes misinterpreted image content, reading the Safari navigation bar as part of the image.
– Gemini allows users to easily choose between existing alt text and AI-generated descriptions, whereas VoiceOver requires users to enable AI-generated descriptions in settings.
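The variation between Gemini’s attempts on the same image could be roughly quantified with a word-level similarity score. The snippet below is only an illustrative sketch and was not part of our testing: the function name `description_similarity` is our own invention, and it uses Python’s standard `difflib` module on the two Gemini descriptions of the athlete image quoted above.

```python
from difflib import SequenceMatcher


def description_similarity(first: str, second: str) -> float:
    """Return a 0.0-1.0 word-level similarity ratio between two descriptions.

    Hypothetical helper for illustration; 1.0 means the word sequences match exactly.
    """
    first_words = first.lower().split()
    second_words = second.lower().split()
    return SequenceMatcher(None, first_words, second_words).ratio()


# The two Gemini descriptions of Image 3, as quoted in this article.
gemini_first = (
    "A male runner with dark skin, wearing a red tank top, is running in a race. "
    "He is raising his right arm, and his face shows exertion and determination. "
    "He is wearing a race number with the name \"Gout\" on it."
)
gemini_second = (
    "The image shows a young black man, in a red and black running singlet. "
    "He is running on a track."
)

score = description_similarity(gemini_first, gemini_second)
print(f"Similarity between attempts: {score:.2f}")
```

A description that never changes between attempts, as we typically saw with VoiceOver, would score 1.0 against itself, while lower scores indicate more variation in wording.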
Conclusion
The integration of Gemini AI into Android 15’s TalkBack screen reader is a major step forward for accessibility. Our testing shows that Gemini provides more detailed and accurate image descriptions than VoiceOver. This highlights the importance of AI-powered solutions in addressing the online accessibility gap. As AI technology improves, we can expect even better image description capabilities in the future.
The future of accessible tech is exciting!