Blog: Human-centered AI cheat-sheet
On May 17, 2017, Sundar Pichai stood onstage at Google I/O and told everyone that Google was moving from being mobile-first to AI-first. Meanwhile, back at the Google offices, quite a few folks were looking around asking, “what does that actually mean?”
For us fortunate few UXers who’d been tinkering with integrating AI into early-stage product development, we immediately started to see an influx of interest from our peers. So we began scraping together internal workshops to share the tips and tricks we’d picked up along the way. Those grew into a series of talks, then articles, company-wide office hours, a mentorship program, an internal education series, and ultimately the People + AI Guidebook.
Throughout my journey as a UXer working on AI, I’ve been refining this cheat-sheet of questions-as-guidance. It’s helped me through countless consultations, crits, and jam sessions, and has continued to be my safety blanket as I’ve transitioned to Microsoft, where I lead design for applied ethics in AI and Mixed Reality. Hopefully others will find it useful, too :)
AI is uniquely suited to situations where people can collectively agree on what “good” looks like, but where it would either be infeasible to code if/then/else application logic that consistently produces good results, or impractical for people to perform the task manually.
It’s also important to maintain a healthy skepticism about where AI can actually add unique value…
People: Thrive on novelty
Machines: Thrive on memory
People: Ask questions out of curiosity
Machines: Can respond instantly
People: Excel at special cases but get distracted
Machines: Excel at repetition but never lose focus
What human needs are being addressed?
How might a probabilistic system uniquely address these needs?
Our job is to improve the lives of as many people as possible by augmenting their capabilities, so…
Let’s make … (Product or program)
For … (Who)
So they can … (Something they couldn’t do before)
The burden of proof that any part of a system can be automated should hinge on the strength of agreement, across a broad spectrum of people, about what useful outcomes look like, and on how well that agreement holds up across a diversity of users, use cases, and environments of use. Furthermore, it must be possible to compare performance between socially-constructed groups, even if those groups aren’t equally represented in the data set (a sketch of what such a comparison can look like follows the questions below).
What are we trying to predict?
What goals will drive optimization? Over what time frame?
What would a useful prediction look like?
In what contexts do we believe predictions will be most useful? Why?
Who stands to benefit most? Least?
Why do we believe that the data used for training are representative of the expected users, use cases, and contexts of use?
How will we respond when the system makes unwanted predictions that substantively harm people?
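To make the last two questions concrete: comparing performance between groups is, at minimum, an exercise in disaggregated bookkeeping. Here’s a minimal sketch in Python, using made-up group labels and prediction records, of what reporting accuracy per group can look like (with group size alongside, so small groups aren’t over-read):

```python
# A minimal sketch of disaggregated evaluation: comparing accuracy across
# socially-constructed groups even when group sizes differ.
# The records and group labels below are hypothetical placeholders.
from collections import defaultdict

# (group, prediction, actual) triples for a handful of hypothetical users
results = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0),
    ("group_b", 1, 1), ("group_b", 0, 1),  # group_b is under-represented
]

correct = defaultdict(int)
total = defaultdict(int)
for group, predicted, actual in results:
    total[group] += 1
    correct[group] += int(predicted == actual)

for group in sorted(total):
    # Report n alongside accuracy so tiny groups aren't over-interpreted.
    print(f"{group}: accuracy {correct[group] / total[group]:.2f} (n={total[group]})")
```

The point isn’t the arithmetic; it’s committing to look at the breakdown at all, especially for groups the training data under-represents.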
Machine learning is a process of teaching an AI to develop a ‘hunch’ about something. Traditional software engineering, by contrast, is about rote memorization; i.e. ‘recognize this precise scenario and do this precise thing’. But remember, if a human can’t do it, neither can an AI, so it’s important to initially focus on tasks that are grounded in some form of real — or at least theoretical — human expertise.
Therefore, when evaluating the unique capabilities of an AI, it can often be useful to frame things from the perspective of a human expert, and in particular how that expert might struggle against the hard constraints of time, attention, or memory. An AI trained on sufficiently representative examples of what useful outcomes look like can pore over mountains of data, applying the fuzzy logic it’s learned, without ever getting tired, distracted, or forgetful. The toy sketch below illustrates the contrast.
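Everything in this sketch (the photo-quality task, the features, the labels, the cutoffs) is hypothetical; it just puts rote if/then/else rules next to a model that generalizes a “hunch” from labeled examples:

```python
# Toy contrast: hand-coded rules vs. a learned "hunch".
# All feature values and labels here are made up for illustration.
from sklearn.linear_model import LogisticRegression

# Traditional software: recognize this precise scenario, do this precise thing.
def is_good_photo_rules(brightness: float, sharpness: float) -> bool:
    return brightness > 0.4 and sharpness > 0.6  # brittle hand-tuned cutoffs

# Machine learning: generalize from labeled examples of what "good" looks like.
examples = [  # [brightness, sharpness] -> 1 = good, 0 = not good
    ([0.9, 0.8], 1), ([0.7, 0.9], 1), ([0.8, 0.5], 1),
    ([0.2, 0.3], 0), ([0.5, 0.1], 0), ([0.3, 0.7], 0),
]
X = [features for features, _ in examples]
y = [label for _, label in examples]
model = LogisticRegression().fit(X, y)

# The model outputs a probability (a "hunch"), not a hard-coded verdict.
print(model.predict_proba([[0.6, 0.65]]))  # -> [[P(not good), P(good)]]
```

Note that the learned version returns a probability rather than a verdict, which is exactly the “hunch” quality the rest of this cheat-sheet is designed to interrogate.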
Describe the way a theoretical human “expert” might perform the task today.
If a human were to perform the task, what information would they need? How would they get that information?
If a human were to perform the task, what information would they consider critical to pay attention to and what would be ignorable?
Critical to pay attention to … (Examples of what people think they would pay attention to if they were performing the task)
Ignorable … (Examples of what people think they could ignore if they were performing the task)
If a human were to perform the task, what might they say or do when they weren’t confident?
If a human were to perform the task, what assumptions would you want them to make?
If a human expert were to perform the task, how would you respond to them, so that they’d improve the next time, if they … (see the sketch after this list for how these four outcomes map onto standard evaluation arithmetic)
- Made a helpful prediction about what to do (true positive)
- Made an unhelpful prediction about what to do (false positive)
- Made a helpful prediction about what not to do (true negative)
- Made an unhelpful prediction about what not to do (false negative)
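For reference, here’s how those four outcomes feed the standard evaluation arithmetic. The counts below are made-up placeholders; the point is that precision and recall weight the two kinds of unhelpful predictions differently, so each quadrant may deserve its own response:

```python
# Hypothetical counts for the four outcomes above.
tp = 40  # helpful predictions about what to do (true positives)
fp = 10  # unhelpful predictions about what to do (false positives)
tn = 35  # helpful predictions about what not to do (true negatives)
fn = 15  # unhelpful predictions about what not to do (false negatives)

precision = tp / (tp + fp)  # when the AI says "do it", how often is it right?
recall = tp / (tp + fn)     # of everything worth doing, how much did it find?

print(f"precision={precision:.2f}, recall={recall:.2f}")
# Tuning toward precision vs. recall changes which kind of mistake
# users actually encounter.
```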
When learning to perform a task, people are well-adapted to quickly evaluating and pruning away the characteristics of the task that lack practical value. Algorithms, meanwhile, will treat every feature of the data as equally important unless told otherwise. Furthermore, the more confident people feel about their abilities when interacting with a system, the more they will persevere, learn, and succeed in using that system.
What can the user do to make the AI work better for them?
How should the AI grow with the user? How do we expect the user’s behavior to change after the 10th use? 100th use? 1000th use?
How should the context of use affect the way the AI behaves?
How “wrong” can the AI be, and when?
User agency and optionality
The UX of an AI should start with the assumption that a human being will have the final say. The more an AI’s behavior is — or should be — affected by personal context, the more reference points and calibration opportunities should be offered to the user. Said another way: The role of AI shouldn’t be to find the needle in the haystack for people, but to show them how much hay it can clear so they can better see the needle themselves.
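One common pattern for keeping the final say with the human is a confidence threshold below which the system suggests rather than acts. Here’s a minimal sketch, with a hypothetical triage function and a threshold value that would need tuning per product:

```python
# Sketch of "human has the final say": below a confidence threshold,
# surface candidates instead of acting. Threshold and candidates are
# hypothetical placeholders.
from typing import List, Tuple

CONFIDENCE_THRESHOLD = 0.85  # an assumption; tuned per product in practice

def triage(candidates: List[Tuple[str, float]]) -> str:
    """candidates: (option, model confidence), highest confidence first."""
    best_option, confidence = candidates[0]
    if confidence >= CONFIDENCE_THRESHOLD:
        # High confidence: act, but keep the action reversible.
        return f"Did: {best_option} (undo available)."
    # Low confidence: clear the hay, let the user find the needle.
    shortlist = ", ".join(option for option, _ in candidates[:3])
    return f"Not sure; top suggestions: {shortlist}. Your call."

print(triage([("archive", 0.62), ("mark as read", 0.21), ("delete", 0.09)]))
```

Where that threshold sits is itself a design decision, and it should move as you learn how costly each kind of mistake is for your users.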
If a human were to perform the task, what questions might a user ask them in order to understand their goals? How might these questions change based on the user’s context? (e.g. time of day, environment, past experiences)
How might the user — or people observing the user — perceive the outputs of the AI to represent their personal expression?
How might the outputs of the AI expose people to risk if it were to make a mistake?