[PUBLISHED] The role of automation etiquette and task-criticality on performance, workload, automation reliance, and user confidence

Abstract

Previous research suggests good automation etiquette can yield positive effects on user performance, trust, automation reliance, and user confidence – especially in personified or anthropomorphized technologies. The current study examined the impact of automation etiquette and task-criticality in non-personified technology. The study used a computer-based automation task to examine good and bad automation etiquette models and different domain-based task-criticality levels (between-subjects) that contained various stages of automation (stage 2 and stage 3) and automation reliability levels (60% and 80%) (within-subjects). The study found that bad automation etiquette can increase automation bias and automation reliance, which improved performance in the most capable automation condition (stage 3; 80% reliable) but also heightened user subjective workload and decreased user self-confidence. The study also demonstrated that task-criticality can be successfully manipulated through domain and instructions. Overall, automation etiquette influenced performance and user confidence more in highly capable automation (stage 3) and in a low-criticality task domain.

[PUBLISHED] Polite AI mitigates user susceptibility to AI hallucinations

Pak, R., Rovira, E., & McLaughlin, A. C. (in press). Polite AI mitigates user susceptibility to AI hallucinations. Ergonomics. https://doi.org/10.1080/00140139.2024.2434604

Abstract

With their increased capability, AI-based chatbots have become increasingly popular tools to help users answer complex queries. However, these chatbots may hallucinate, or generate incorrect but very plausible-sounding information, more frequently than previously thought. Thus, it is crucial to examine strategies to mitigate human susceptibility to hallucinated output. In a between-subjects experiment, participants completed a difficult quiz with assistance from either a polite or neutral-toned AI chatbot, which occasionally provided hallucinated (incorrect) information. Signal detection analysis revealed that participants interacting with polite-AI showed modestly higher sensitivity in detecting hallucinations and a more conservative response bias compared to those interacting with neutral-toned AI. While the observed effect sizes were modest, even small improvements in users’ ability to detect AI hallucinations can have significant consequences, particularly in high-stakes domains or when aggregated across millions of AI interactions.
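For readers less familiar with signal detection analysis, the sensitivity and response-bias measures reported above are conventionally computed from hit and false-alarm rates as d′ and criterion c. A minimal sketch (the rates below are made up for illustration, not the study's data):

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Standard signal-detection measures.

    hit_rate: proportion of hallucinated answers correctly flagged.
    fa_rate: proportion of correct answers wrongly flagged as hallucinations.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity: higher = better detection
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # bias: positive = more conservative
    return d_prime, criterion

# Illustrative rates only:
d, c = dprime_and_criterion(0.70, 0.30)
```

In this framing, the paper's finding corresponds to polite-AI participants showing a larger d′ and a more positive (conservative) c than neutral-toned-AI participants.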

Practitioner Summary

This study examined how AI chatbot etiquette affects users’ susceptibility to AI hallucinations. Through a controlled experiment, results showed polite-AI led to modestly higher sensitivity in detecting hallucinations and a more conservative response bias. This suggests a potential design strategy that may enhance users’ critical evaluation of AI-generated content.

[PUBLISHED] Attention control measures improve the prediction of performance in navy trainees

Burgoyne, A. P., Mashburn, C. A., Tsukahara, J. S., Pak, R., Coyne, J. T., Foroughi, C., Sibley, C., Drollinger, S. M., & Engle, R. W. (in press). Attention control measures improve the prediction of performance in navy trainees. International Journal of Selection and Assessment. https://doi.org/10.1111/ijsa.12510

Abstract

Military selection tests leave room for improvement when predicting work-relevant outcomes. We tested whether measures of attention control, working memory capacity, and fluid intelligence improved the prediction of training success above and beyond composite scores used by the U.S. Military. For student air traffic controllers, commonality analyses revealed that attention control explained 9.1% (R = .30) of the unique variance in academic performance, whereas the Armed Forces Qualification Test explained 5.2% (r = .23) of the unique variance. For student naval aviators, incremental validity estimates were small and nonsignificant. For student naval flight officers, commonality analyses revealed that attention control measures explained 11.8% (R = .34) of the unique variance in aviation preflight indoctrination training performance and 4.3% (R = .21) of the unique variance in flight performance. Although these point estimates are based on relatively small samples, they provide preliminary evidence that attention control measures might improve training outcome classification accuracy in real-world samples of military personnel.

Psychology professor receives funding to examine the role of cognition in human-autonomy collaboration

Richard Pak, psychology professor and director of the Human Factors Institute at Clemson University, has received more than $560,000 from the National Science Foundation (NSF) to examine how cognitive abilities, including attention and memory, influence users’ performance with automated systems.

[PUBLISHED] Knowledge, attention, and psychomotor ability: A latent variable approach to understanding individual differences in simulated work performance

Mashburn, C. A., Burgoyne, A. P., Tsukahara, J. S., Pak, R., Coyne, J. T., Sibley, C., … & Engle, R. W. (2024). Knowledge, attention, and psychomotor ability: A latent variable approach to understanding individual differences in simulated work performance. Intelligence, 104, 101835.

Abstract: We compare the validity of personnel selection measures and novel tests of attention control for explaining individual differences in synthetic work performance, which required participants to monitor and complete multiple ongoing tasks. In Study 1, an online sample of young adults (N = 474, aged 18–35) based in the United States completed three-minute tests of attention control and two tests that primarily measure acquired knowledge, the Wonderlic and the Armed Forces Qualification Test (AFQT). Structural equation modeling revealed that acquired knowledge tests did not predict simulated work performance beyond attention control, whereas attention control did predict simulated work performance controlling for other measures. In Study 2, an in-lab sample of young adults from Georgia Tech and the greater Atlanta community (N = 321, aged 18–35) completed tests of attention control, processing speed, working memory capacity, and versions of two U.S. Military selection tests, one assessing acquired knowledge (the AFQT) and one assessing psychomotor ability (the Performance-Based Measures assessment from the Aviation Selection Test Battery). Structural equation modeling revealed that attention control fully mediated the relationship between the Performance-Based Measures and simulated work performance, but the AFQT and processing speed retained unique prediction. We also explore possible gender differences. Collectively, these results suggest that tests of attention control may be a useful supplement to existing personnel selection measures when complex cognitive tasks are the criterion variable of interest.

Human Factors and Ergonomics Society’s 2023 Journal of Cognitive Engineering and Decision Making Article Award

Our paper has been selected as the best article of 2022 in the Journal of Cognitive Engineering and Decision Making:

Textor, C., Zhang, R., *Lopez, J., *Schelble, B. G., McNeese, N. J., Freeman, G., Pak, R., Tossell, C., & de Visser, E. J. (2022). Exploring the Relationship Between Ethics and Trust in Human-AI Teaming: A mixed methods approach. Journal of Cognitive Engineering and Decision Making.

[PUBLISHED] A Theoretical Model to Explain Mixed Effects of Trust Repair Strategies in Autonomous Systems

Our recent paper has been accepted for publication.

Pak, R., & Rovira, E. (2023). A Theoretical Model to Explain Mixed Effects of Trust Repair Strategies in Human-Machine Interaction. Theoretical Issues in Ergonomics Science. https://doi.org/10.1080/1463922X.2023.2250424

An uncorrected preprint is available here.

Abstract: The topic of an autonomous system initiating trust repair has generated intense interest from researchers and has led to a stream of empirical works studying the impact of different trust repair strategies. Unfortunately, there does not seem to be a clear pattern of results or systematicity in the experimental manipulations. This is likely due to a lack of a coherent model or theoretical framework of trust repair. We present a possible theoretical model that may explain and predict how different trust repair strategies may work with different autonomous systems, in different situations, and with different people. We have adapted and applied a well-established social cognition theory that has successfully explained and predicted complex attitudinal and behavioural phenomena. The model suggests that significant variance in trust repair results may be partly due to individual differences (e.g., motivation, cognitive abilities), which have not been extensively examined in the literature, and confounded or uncontrolled study parameters (e.g., timing of trust measurement, repair frequency, workload). We hope that this theoretical model stimulates discussion toward a more theory-driven trust repair research agenda to understand basic underlying mechanisms.

[PUBLISHED] The complex relationship of AI ethics and trust in human–AI teaming: insights from advanced real-world subject matter experts

Our latest paper is published:

Lopez, J., *Textor, C., *Lancaster, C., *Schelble, B., Freeman, G., Zhang, R., McNeese, N., & Pak, R. (2023). The complex relationship of AI ethics and trust in human–AI teaming: insights from advanced real-world subject matter experts. AI and Ethics, 1-21.

Abstract: Human-autonomy teams will likely first see use within environments with ethical considerations (e.g., military, healthcare). Therefore, we must consider how to best design an ethical autonomous teammate that can promote trust within teams, an antecedent to team effectiveness. In the current study, we conducted 14 semi-structured interviews with US Air Force pilots on the topics of autonomous teammates, trust, and ethics. A thematic analysis revealed that the pilots see themselves serving a parental role alongside a developing machine teammate. As parents, the pilots would feel responsible for their machine teammate’s behavior, and its unethical actions may not lead to a loss of trust. However, once the pilots feel their teammate has matured, its unethical actions would likely lower trust. To repair that trust, the pilots would want to understand their teammate’s processing, yet they are concerned about their ability to understand a machine’s processing. Additionally, the pilots would expect their teammate to indicate that it is improving or plans to improve. The findings from this study highlight the nuanced relationship between trust and ethics, as well as a duality of infantilized teammates that cannot bear moral weight and advanced machines whose decision-making processes may be incomprehensibly complex. Future investigations should further explore this parent–child paradigm and its relation to trust development and maintenance in human-autonomy teams.

[PUBLISHED] Nature and measurement of attention control

Our latest paper is published in the Journal of Experimental Psychology: General:

Burgoyne, A. P., Tsukahara, J. S., Mashburn, C. A., Pak, R., & Engle, R. W. (2023). Nature and measurement of attention control. Journal of Experimental Psychology: General. Advance online publication. https://doi.org/10.1037/xge0001408

Abstract: Individual differences in the ability to control attention are correlated with a wide range of important outcomes, from academic achievement and job performance to health behaviors and emotion regulation. Nevertheless, the theoretical nature of attention control as a cognitive construct has been the subject of heated debate, spurred on by psychometric issues that have stymied efforts to reliably measure differences in the ability to control attention. For theory to advance, our measures must improve. We introduce three efficient, reliable, and valid tests of attention control that each take less than 3 min to administer: Stroop Squared, Flanker Squared, and Simon Squared. Two studies (online and in-lab) comprising more than 600 participants demonstrate that the three “Squared” tasks have great internal consistency (avg. = .95) and test–retest reliability across sessions (avg. r = .67). Latent variable analyses revealed that the Squared tasks loaded highly on a common factor (avg. loading = .70), which was strongly correlated with an attention control factor based on established measures (avg. r = .81). Moreover, attention control correlated strongly with fluid intelligence, working memory capacity, and processing speed and helped explain their covariation. We found that the Squared attention control tasks accounted for 75% of the variance in multitasking ability at the latent level, and that fluid intelligence, attention control, and processing speed fully accounted for individual differences in multitasking ability. Our results suggest that Stroop Squared, Flanker Squared, and Simon Squared are reliable and valid measures of attention control. The tasks are freely available online: https://osf.io/7q598/.