Filipa Correia, Samuel Gomes, Samuel Mascarenhas, Francisco Melo, Ana Paiva
In recent years, research on the embodiment of interactive social agents has focused on comparisons between robots and virtually displayed agents. Our work contributes to this line of research by comparing social robots with disembodied agents, exploring the role of embodiment in group interactions. We conducted a user study in which participants formed a team with two agents to play a Collective Risk Dilemma (CRD). Besides manipulating embodiment between subjects (physically embodied vs. disembodied), we also manipulated the agents' degree of cooperation within subjects (one agent used a prosocial strategy and the other a selfish strategy). Our results show that while trust levels were similar across the two embodiment conditions, participants identified more with the team of embodied agents. Surprisingly, when the agents were disembodied, the prosocial agent was rated more positively and the selfish agent more negatively than when they were embodied. These results suggest that embodied interactions may improve how humans relate to agents in team settings. However, if social aspects can mask selfish behaviours in a positive light, as our results suggest, a dark side of embodiment may emerge.
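For readers unfamiliar with the paradigm, below is a minimal Python sketch of a generic collective risk dilemma round of the kind referenced in the abstract. The endowment, threshold, and risk values are illustrative assumptions and are not the paper's actual settings.

```python
import random

# Illustrative parameters (assumed for this sketch, not the paper's settings).
ENDOWMENT = 20   # tokens each player starts with
THRESHOLD = 30   # total contribution the group must reach
RISK = 0.9       # probability of losing everything if the threshold is missed


def play_round(contributions):
    """Resolve a generic collective risk dilemma with the given contributions.

    If the pooled contributions reach the threshold, every player keeps
    whatever they did not contribute; otherwise the whole group loses its
    remaining tokens with probability RISK.
    """
    pooled = sum(contributions.values())
    leftovers = {player: ENDOWMENT - c for player, c in contributions.items()}
    if pooled >= THRESHOLD or random.random() > RISK:
        return leftovers                             # threshold met, or disaster avoided by luck
    return {player: 0 for player in contributions}   # collective loss


# A prosocial agent contributes generously; a selfish agent free-rides.
print(play_round({"human": 10, "prosocial_agent": 15, "selfish_agent": 2}))
```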
Start Time | End Time
---|---
07/14 15:00 UTC | 07/14 17:00 UTC
There is a lot that I liked about this paper. I commend the authors on designing and running a complex experimental protocol that evaluated autonomous robots and agents in a team game setting. The paper is well written and clearly laid out, describing the hypotheses and findings extensively and providing a good discussion of the findings. Furthermore, I liked the honest and thoughtful discussion of the negative results found in this study.

Overall, this research aims at a very interesting question related to agent embodiment in team settings. I wish the authors had more clearly developed theories and predictions for clearly identified and motivated issues surrounding these topics. Herein lies my main critique of the current paper. The authors spend almost three pages setting up topics such as embodiment and team dynamics, but do not converge on a clear theory that their experiment is trying to test. What is the relationship between the studied constructs? What effects would you predict? And, importantly, why does it matter to future HRI? In other words, what is the main research question that the authors have? Instead, the lengthy introduction and related work move straight into a list of hypotheses, which seem overreaching (in the sense that there are too many of them) and unfocused. My sense is that this experiment actually gets at some really interesting relationships between trust, group success, and embodiment, but the authors never clearly state these questions. A more focused description of the research question would make this work much stronger. As a result, the authors combine many different manipulations and measured dependent variables, which further contributes to the lack of focus. Clear experimental manipulations targeting one or two sharp research questions would be preferable. For example, I was left confused about the hypothesis that the perception of the pro-social agent will be more positive in the embodied condition. Do the authors think it will be less positive in the disembodied condition? A "compared to" phrase might have disambiguated this.

An additional issue is that the authors do not explain the mechanics of the game. This is crucial for understanding the experiment, judging it, and possibly replicating it in the future.

The sample diversity is commendable in comparison to many HRI studies. However, I found the differences between the samples in the two main conditions (embodiment) to be too large. The confounds are striking, as can be seen, for example, in the large drop-out rate among the online participants. Why not have people in the lab experience the disembodied condition? I would also have liked to know how the agents' behavior was monitored in the online condition (and, in fact, in the lab condition as well).

As a smaller point, the abstract could be much shorter and focus on the findings.

In summary, this research gets at some very interesting questions, but the research questions are not well set up, resulting in a large number of confounding manipulations and somewhat unfocused dependent variables. The authors make many good choices, but the lack of theory and clear research questions, along with a lack of detail about the actual procedure, are strong drawbacks of the currently submitted work.
This paper is overall very well written and makes a relevant contribution to the HRI community by discussing the implications of embodiment in human-agent teams in a mixed-motive context. While I am overall in favor of accepting the paper, I have some comments, mostly related to the readability and organization of arguments and to the presentation of the results.

I. Introduction
The introduction could be a bit more concise in motivating the main contributions of the paper, which are (a) the comparison of an unembodied unimodal agent with an embodied multimodal agent and (b) the interaction with two agents in a mixed-motive task. Right now, the introduction is too detailed in motivating both contributions. This partially overlaps with sections II and III, and it also makes the introduction harder to follow because one loses focus. I feel the information conveyed in Fig. 1 is rather confusing and could be conveyed more understandably in the text. If a picture were to be added for this, I would merge it with Fig. 2 to make it more apparent what is considered the environment and what modalities of interaction exist.

II. Embodiment
This section, in turn, would benefit from adding some of the details that were given in the introduction to make it more understandable. Specifically, the first part of the argument is hard to follow because ‘social embodiment’ is the only one of the six notions of embodiment that is explicitly mentioned, yet it is difficult to grasp this concept without naming and properly discussing the differences from the other embodiment definitions. I also believe this section would become clearer if it were discussed in terms of the specific embodiments chosen for the experiments later in the paper. In particular, the importance of discussing structural coupling only becomes evident on a second reading, because only then does the reader know that both robots can interact with the game interface, which is part of the environment in this scenario. Here, it would also be of great help to have Fig. 2 as a reference.

IV.A Task / IV.B Independent Variables
I think it would be helpful to explicitly restate here the mixed motive that was mentioned in the introduction. Specifically, what are the mixed objectives of each player in the game? Is there one individual who wins the game? If so, is the manipulation of the winner related to the team winning or losing, or also to the human player winning in comparison to the robots? Did participants have an incentive to win the game?

IV.D Dependent Measures
How were the two different agents differentiated in the post-game questionnaire? Were they given names or other identifiers? Were all participants asked to rate the cooperative or the defective agent first, which could introduce an ordering effect, or was the order randomized?

IV.F Sample
Bartneck et al. [0] found a difference between lab-based experiments and experiments conducted online. Even though they argue that the practical implications are small, they name the broader range of demographics found on AMT as one potential factor influencing their results. This goes well beyond the distribution of age and gender and also concerns interest in the subject, level of education, and academic background. Since the sample for the lab-based study in this paper was recruited “at the facilities of an energy company”, it is likely that the population for the lab-based study was more homogeneous than the AMT sample.
This should be further discussed in the paper.

[0] Bartneck, Christoph, et al. "Comparing the similarity of responses received from studies in Amazon’s Mechanical Turk to studies conducted online and with direct recruitment." PLoS ONE 10(4), 2015.

V. Results
Figure 3 is misleading because it suggests that all of the diagrams use the same y-axis scale, when in fact each uses a different scale. The specific scale needs to be added to each axis so that this becomes evident. It is also very difficult to read the individual captions because there is no gap between (a), (b), and (c). For Figure 4, I would advise using percentages instead of the number of participants: with the uneven number of participants per condition, it currently looks as if more people picked the pro-social agent in the embodied condition (because the bar is higher), when in fact the opposite is true.

VI. Discussion
I was wondering whether another explanation for the results could be that the two robots looked exactly the same and were thus also expected to behave the same, while people might have had fewer expectations of the two unembodied agents because their appearance did not communicate similarity. This might be an interesting consideration for a future study as well. Another consideration that came to mind is the ethical implication of a robot being able to disguise its selfish behavior simply by being embodied. Designers could potentially misuse this in the future to cover harmful behavior from a robot. I would also suggest moving the considerations for human-robot teams (currently in the conclusions) to the discussion section, because this would make the flow of the argument more readable to me.
The paper is mostly well written in terms of language but lacks many methodological clarifications. It appropriately motivates the need for further research comparing embodiment in human-robot interactions. The related work is comprehensive and the cited works are reported adequately. The discussion raised on the definition of embodiment is interesting and well researched, but I would be curious to see it extended to the anthropomorphic elements of voice.

I think a mixed-design experiment where embodiment is manipulated within subjects and the agents' degree of cooperation between subjects would also be interesting. It would not allow combining in-lab and crowdsourcing platforms, but it would further highlight embodiment effects. I am also curious about the choice of text output for communication with the disembodied agent. Would subjects' behaviour shift when they have to focus on the text? Why not choose a disembodied agent that communicates only by voice?

I did not see the scales of the questionnaires reported anywhere (only in the figures). Also, the measures deserve a bit more description than a citation, to inform the reader what is going to be compared across conditions.

The experimental setup involves a collaborative game framing a collective risk dilemma. There is something I was not able to understand from the paper: if the choice through the digital dice is predetermined, how can players decide to be either pro-social or selfish? Furthermore, more information on the verbal and non-verbal behaviour of the robots is needed, both for graduate students who wish to replicate or extend this study and to control for which robot behaviours affect the reported results. How do subjects in the disembodied condition know which of the two agents is speaking, and is it clear that there are two agents, especially when this condition is evaluated online? How did the authors control for this?

The description of the within-subjects variable is also a bit confusing. Are both agents either pro-social or selfish, or only one at a time? If one at a time, what is the other agent's behaviour? If I understand the design of the experiment correctly, the authors should run two-way repeated-measures ANOVAs indicating the existence of two between-subjects factors and one within-subjects factor. Is that correct? For instance, in subsection A of the results, the within-subjects factor is not mentioned. This should be made clear.

Minor: there are a few grammatical mistakes, and a mix of British and US English is used throughout the paper. I would suggest the authors have another look before the next version.

Overall, the paper presents an interesting problem and has potential, as long as the methodological clarifications are addressed. It appears to be theoretically sound and relevant to the conference.
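As a point of reference for the analysis question raised in the last review, a two-way mixed-design ANOVA with embodiment as a between-subjects factor and agent strategy as a within-subjects factor (the design described in the abstract) could be run as sketched below. The data, column names, and use of the pingouin package are illustrative assumptions, not the authors' actual analysis.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one rating per participant per agent strategy.
# All names and values are placeholders for illustration only.
df = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "embodiment":  ["embodied"] * 6 + ["disembodied"] * 6,
    "strategy":    ["prosocial", "selfish"] * 6,
    "rating":      [5.1, 3.2, 4.8, 2.9, 4.6, 3.5, 4.4, 4.0, 4.2, 3.8, 4.7, 4.1],
})

# Two-way mixed-design ANOVA: embodiment varies between subjects,
# agent strategy varies within subjects.
aov = pg.mixed_anova(data=df, dv="rating", within="strategy",
                     subject="participant", between="embodiment")
print(aov.round(3))
```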