R&D notes

Measuring digital accessibility with Autotest

Jonathan Robert Pool

Introduction

Almost everybody wants or needs to use digital technology, but roughly 20 to 25% of the world population has a durable disability. Many more have occasional disabilities, such as bad lighting that makes it hard to distinguish colors or a rough road that makes it hard to tap a display precisely.

Web accessibility, or more generally digital accessibility, comes to the rescue in these situations. It is similar to physical accessibility: just as buildings can be equipped with ramps and wide doors to accommodate wheelchair users, websites, mobile apps, email messages, and digital files can be designed and built to limit or eliminate barriers, accommodating a wide range of durable and temporary disabilities. And just as there are detailed standards for physical accessibility, there are international industry standards for digital accessibility. The latter are incorporated into law in about 20 countries.

There is an expert consensus that accessible digital products benefit not only persons with recognized disabilities, but also most other users of those products. Thus, accessibility is a closer approximation to quality in general than one might have guessed.

If you care about digital accessibility, you presumably want to measure it. You want to be able to say, "Our main competitor's home page is 30% more accessible than ours" or "The latest revisions made the app even less accessible than it was before." Measuring accessibility gives you a target to aim for and a tool for monitoring your progress.

And, if you want to measure the accessibility of many digital artifacts at frequent intervals, automating the measurement will help. Automated measurement loses some information that human testers could provide, but gains affordability, replicability, and improvability. If I use a fully automated method to measure the accessibility of your digital product, you can apply the same method to check my results, and you can create new versions of my method to make the results more valid. It also becomes feasible to produce large-scale comparisons, such as "Vaccine Website Accessibility", published by Johns Hopkins University.

Tools

Products that automatically measure digital accessibility are sold or licensed for up to hundreds of thousands of dollars. But there are also open-source, free, and nearly free automatic measuring tools available for anybody to use. Three of the prominent ones are axe-core, from Deque Systems; Equal Access, from IBM; and WAVE, from WebAIM.

These test packages are partly, but only partly, duplicative. Using them all provides more comprehensive measurement than using only one of them.

But, even though these three packages perform a combined total of 411 tests, much of accessibility lies beyond their reach. As one example, many users are annoyed by content that keeps blinking, flashing, or moving, and some users can suffer nausea or seizures from such content. None of the above tools is designed to report such motion.
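
Motion is detectable automatically, though. Here is a minimal sketch of one approach, not necessarily the one any existing tool uses: screenshot the page twice and check whether anything changed. The choice of Playwright, the example URL, and the 2-second interval are all assumptions made for illustration.

    const {chromium} = require('playwright');

    (async () => {
      const browser = await chromium.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com/');
      // Capture the page, wait, and capture it again.
      const before = await page.screenshot();
      await page.waitForTimeout(2000);
      const after = await page.screenshot();
      await browser.close();
      // Identical screenshots suggest a static page; any difference hints
      // at blinking, flashing, animation, or other motion.
      console.log(before.equals(after) ? 'No motion detected' : 'Motion or change detected');
    })();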

Autotest

Autotest is an automated testing prototype, incorporating work that began in 2018. It navigates the web and performs actions on web pages as you instruct. At any point in a browsing process, you can tell it what test or tests to perform. You can have it weight and aggregate the results of its tests to produce a total score for a page, site, or process. You give it all the instructions in advance; once it starts, it follows the instructions automatically.

You give instructions to Autotest by writing a script; the documentation explains how to do this. If you are a JavaScript developer, you can also revise and extend the existing tests and scoring rules.
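
As a taste of what a script might contain, here is a hypothetical example expressed as a JavaScript object. Every command type and property name below is an assumption made for the sake of illustration; the documentation defines the real script format.

    // A hypothetical Autotest-style script: launch a browser, open a
    // page, run a test, and score the results. Not the actual schema.
    const script = {
      what: 'Accessibility check of an example home page',
      commands: [
        {type: 'launch', which: 'chromium', what: 'start a browser'},
        {type: 'url', which: 'https://example.com/', what: 'open the page'},
        {type: 'test', which: 'axe', what: 'run one of the test packages'},
        {type: 'score', which: 'a11y', what: 'weight and aggregate the results'}
      ]
    };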

You can define a batch of web pages and apply a script to all of them. Autotest will output a report for each page, and you can collect the scores from those reports to create a comparative report.
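
As an illustration, if each page's report were a JSON file containing the page's URL and total score, a few lines of Node.js could assemble the comparison. The reports directory and the report properties used here (url and score.total) are assumptions, not Autotest's actual output format.

    const fs = require('fs');
    const path = require('path');

    // Read every per-page report and extract its URL and total score.
    const reportDir = 'reports';
    const results = fs.readdirSync(reportDir)
      .filter(name => name.endsWith('.json'))
      .map(name => {
        const report = JSON.parse(fs.readFileSync(path.join(reportDir, name), 'utf8'));
        return {page: report.url, score: report.score.total};
      })
      // Lower is better: 0 is a perfect score, with no upper limit.
      .sort((a, b) => a.score - b.score);

    console.table(results);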

Autotest can run the three test packages named above, as well as custom-made tests.

Current work on Autotest is focused on its procedure named a11y, which runs the three test packages named above plus additional tests designed to supplement them. Version 7 of a11y includes these 16 additional custom tests:

The a11y procedure takes the results of the package tests and the custom tests, applies weights and duplication discounts to them, and generates a total score. A score of 0 is perfect. There is no upper limit. The more accessibility problems the tests find, and the more serious they are deemed to be, the higher the score. Scores reflect judgments of the test authors about severity. These judgments are biased toward industry standards, conventions, simplicity, stability, predictability, and testability.
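
Here is a sketch of that arithmetic. Weighting multiplies each test's issue count by a severity weight; a duplication discount reduces the credit given to a finding that another package has already reported, so the same underlying defect is not counted at full weight twice. Every test name, weight, discount, and count below is invented for illustration; the real a11y procedure defines its own.

    // Hypothetical per-test severity weights.
    const weights = {imageAltAxe: 4, imageAltIbm: 4, linkNameWave: 3};
    // Hypothetical discounts for findings duplicated by another package.
    const duplicationDiscounts = {imageAltIbm: 0.67};
    // Hypothetical counts of issue instances found on one page.
    const counts = {imageAltAxe: 3, imageAltIbm: 3, linkNameWave: 1};

    // Sum the weighted, discounted counts into one total; higher is worse.
    let total = 0;
    for (const [test, count] of Object.entries(counts)) {
      const discount = duplicationDiscounts[test] || 0;
      total += weights[test] * count * (1 - discount);
    }
    console.log(total); // 3*4 + 3*4*0.33 + 1*3 = 18.96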

The tests in the a11y procedure make no claim to perfection. It is reasonable to consider each test failure as a warning of an accessibility risk, rather than a conclusive finding of fault. Inspection of test results reveals opportunities for improvements in tests, so the development of Autotest continues.

Autotest procedures have powered some web-page accessibility comparisons, including: