Researchers at the Massachusetts Institute of Technology (MIT) have created a dataset called VisText, which aims to improve the accessibility and understanding of complex charts and graphs through automatic captioning systems. The idea grew out of research by MIT's Visualization Group into what makes chart captions effective. The study revealed that users who were blind or had low vision had different preferences for caption complexity. Building on this research, the team introduced VisText, consisting of over 12,000 charts presented in different formats alongside corresponding captions.
The MIT team trained five machine-learning models on VisText using different representations of the charts: scene graphs, data tables, and images. Models trained with scene graphs matched or outperformed those trained with data tables, demonstrating the potential of scene graphs as a representation. To assess accuracy, the researchers categorised common errors made by their best models. Ethical concerns around misinformation led them to recommend that auto-captioning systems be offered as authorship tools, allowing users to edit and verify captions.
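To make the difference between these representations concrete, here is a minimal, hypothetical sketch in Python. The field names and structure are illustrative assumptions only and do not reflect VisText's actual schema; it simply contrasts a flat data table with a scene graph that also encodes a chart's visual structure.

```python
# Hypothetical illustration (not VisText's actual schema) of the same bar chart
# expressed two ways: as a flat data table and as a scene graph.

# Data-table representation: only the underlying values.
data_table = [
    {"year": 2020, "sales": 120},
    {"year": 2021, "sales": 150},
    {"year": 2022, "sales": 90},
]

# Scene-graph representation: a tree of visual elements (axes, marks, labels),
# so a captioning model can see how the values are visually encoded,
# not just what the values are.
scene_graph = {
    "type": "chart",
    "children": [
        {"type": "x-axis", "title": "Year", "ticks": [2020, 2021, 2022]},
        {"type": "y-axis", "title": "Sales", "ticks": [0, 50, 100, 150]},
        {"type": "marks", "children": [
            {"type": "bar", "x": 2020, "height": 120},
            {"type": "bar", "x": 2021, "height": 150},
            {"type": "bar", "x": 2022, "height": 90},
        ]},
    ],
}

if __name__ == "__main__":
    import json
    # Either representation can be serialised to text and fed to a
    # text-based captioning model.
    print(json.dumps(data_table))
    print(json.dumps(scene_graph, indent=2))
```

In this sketch, the scene graph carries extra context (axis titles, tick values, mark types) that a data table alone does not, which is one plausible reason scene-graph-trained models could match or beat table-trained ones.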
Future work comprises refining the models to reduce errors, expanding the dataset to include more complex charts, and gaining insight into how auto-captioning models learn. The VisText dataset represents an important advance in auto-captioning, promising increased accessibility for visually impaired individuals through machine-learning-powered systems.
For more information, please refer to the MIT News article on the VisText AI Tool.