Keyboard case study
A keyboard has been invented that predicts whether its user is depressed. Your company has heard about this and would like to purchase one for every employee; it claims it will use the predictions to prioritize people for mental health services and for approval of time off.
The model uses sociodemographic data purchased from external data vendors in addition to data collected from employees’ keyboards. The keyboard data captures how many hours a day an employee works, how many breaks they take, and how fast they type. The external data supplies their income, household composition, race, gender, sexual orientation, and community-level variables that indicate the wealth of a neighborhood. The external data is matched to employee records on first name, last name, and address.
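That matching step is itself a source of error. A minimal sketch, using hypothetical records and field names, of why exact matching on name and address silently drops anyone whose name is spelled, accented, or transliterated differently across the two sources:

```python
# Hypothetical records; all names, addresses, and values are illustrative.
employees = [
    {"first": "Ana", "last": "Lee", "addr": "12 Oak St"},
    {"first": "José", "last": "García", "addr": "9 Elm Ave"},
]
vendor = {
    ("Ana", "Lee", "12 Oak St"): {"income": 54000},
    ("Jose", "Garcia", "9 Elm Ave"): {"income": 61000},  # vendor stripped accents
}

# Exact-key matching: an employee links only if all three fields agree verbatim.
matched, unmatched = [], []
for e in employees:
    key = (e["first"], e["last"], e["addr"])
    (matched if key in vendor else unmatched).append(e)

print([e["first"] for e in unmatched])  # José fails to link despite same address
```

Failures like this are unlikely to be random: they fall disproportionately on employees with non-Anglicized names, which is one concrete way the external data can introduce demographic bias before the model even runs.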
The model is a random forest. It was trained on data from 250 employees at a tech start-up based in San Jose, CA.
The model outputs a flag indicating “Likely Depressed” or “Likely Not Depressed”. The model is rerun once a month. If Human Resources sees that an employee is flagged as “Likely Depressed”, they will inform the employee’s direct manager and ask them to offer either time off or access to mental health services.
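A random forest produces this kind of binary flag by majority vote over its individual trees. A bare-bones sketch of that final step, with hypothetical per-tree votes standing in for real tree predictions:

```python
from collections import Counter

def forest_flag(tree_votes):
    """Majority vote over individual tree predictions (1 = depressed, 0 = not).
    A stand-in for how a random forest collapses many trees into one flag."""
    counts = Counter(tree_votes)
    return "Likely Depressed" if counts[1] > counts[0] else "Likely Not Depressed"

# Hypothetical votes from a seven-tree forest for one employee.
print(forest_flag([1, 0, 1, 1, 0, 1, 0]))  # → Likely Depressed
```

Note that the hard threshold hides how close the vote was: an employee flagged by four of seven trees gets the same label, and the same HR intervention, as one flagged by all seven.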
Your company has recently decided to purchase this model. Human Resources is hoping it will improve mental health, increase company loyalty, and thereby make employees more efficient.
Discussion questions
1. What concerns do you have about the keyboard and external data? How might this data be biased?
2. Do you have any follow-up questions for the data scientists? What other information or data would you like?
3. Are there any other ways that bias could creep into the algorithm?
4. If the biases discussed so far were to lead to an inaccurate prediction, what impacts would that have?