A Bayesian Framework for Nash Equilibrium Inference in Human-Robot Parallel Play


Shray Bansal, Jin Xu, Ayanna Howard, Charles Isbell

Abstract

We consider shared-workspace scenarios in which humans and robots act to achieve independent goals, termed parallel play. We model these as general-sum games and construct a framework that uses the Nash equilibrium solution concept to account for the interactive effect of both agents while planning. We find multiple Pareto-optimal equilibria in these tasks. We hypothesize that people act by choosing an equilibrium based on social norms and their personalities. To enable coordination, we infer the equilibrium online using a probabilistic model that includes these two factors and use it to select the robot's action. We apply our approach to a close-proximity pick-and-place task involving a robot and a simulated human with three potential behaviors: defensive, selfish, and norm-following. We show that using a Bayesian approach to infer the equilibrium enables the robot to complete the task with fewer than half the collisions of the best baseline while also reducing task execution time. We also performed a study with human participants interacting either with other humans or with different robot agents and observed that our proposed approach performs similarly to human-human parallel play interactions. The code is available at https://github.com/shray/bayes-nash.
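The linked repository contains the authors' implementation; as a rough, hypothetical sketch of the kind of online inference the abstract describes, a Bayesian update over candidate equilibria with a Boltzmann action likelihood might look like the following. All names, values, and the two-equilibrium example are illustrative assumptions, not the paper's code.

    import numpy as np

    # Hypothetical sketch (not the authors' implementation): candidate Nash
    # equilibria are indexed 0..K-1; the belief starts from a prior combining
    # social-norm and personality factors, then is updated from observed human
    # actions via a Boltzmann (softmax) likelihood with temperature tau.

    def boltzmann_likelihood(q_values, action, tau=1.0):
        """P(action | equilibrium): softmax over the action values the human
        would have if playing toward this equilibrium."""
        logits = np.asarray(q_values, dtype=float) / tau
        probs = np.exp(logits - logits.max())  # subtract max for stability
        probs /= probs.sum()
        return probs[action]

    def update_belief(belief, observed_action, q_values_per_eq, tau=1.0):
        """One Bayesian step: belief[k] is proportional to
        belief[k] * P(observed_action | equilibrium k)."""
        likelihoods = np.array([
            boltzmann_likelihood(q, observed_action, tau)
            for q in q_values_per_eq
        ])
        posterior = belief * likelihoods
        return posterior / posterior.sum()

    # Toy example with 2 candidate equilibria and 3 discrete human actions.
    norm_prior = np.array([0.7, 0.3])    # norm-following equilibrium favored a priori
    q_values_per_eq = [[1.0, 0.2, 0.1],  # action values under equilibrium 0
                       [0.1, 0.3, 1.0]]  # action values under equilibrium 1
    belief = update_belief(norm_prior, observed_action=2,
                           q_values_per_eq=q_values_per_eq)
    print(belief)  # mass shifts toward equilibrium 1, which explains action 2

The robot would then select its own action under the most likely equilibrium (or an expectation over the belief); the sketch omits that planning step.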

Live Paper Discussion Information

Start Time: 07/15 15:00 UTC
End Time:   07/15 17:00 UTC

Virtual Conference Presentation

Paper Reviews

Review 1

In this submission, the authors present a novel model for human-robot task planning that casts the problem as a general-sum game in which multiple Nash equilibria are weighted against each other using a Bayesian formulation. The formulation includes an expert-crafted, domain-specific social norm and an agent-specific individual preference that is inferred online. The authors also present a set of three related studies in which the approach is evaluated against baselines: a simulated human study, a human-human study, and a human-robot study. Through the studies, the authors show that the developed approach leads to a good balance of safety and efficiency, reducing the number of safety stops while also lowering the time to complete the task. While this was true in the simulated human study, the developed approach actually led to more safety stops in the human-robot study. The authors present plausible reasons for these results and make suggestions for follow-on work.

The paper is placed well within the context of prior work and presents the method in a clear and concise manner. The approach appears novel and is technically sound. The ability to infer agent preferences online and leverage domain-specific norms to select from multiple equilibria is an interesting and useful idea. The experiments and analysis are interesting and informative.

The main drawback I see is that while the presented approach is designed for any number N of agents, all of the analysis is done with just two agents. It is difficult to determine whether the conclusions and claims made by the authors about the approach would actually carry over to a larger multi-agent scenario. Would the robot be able to infer the preferences of multiple people successfully and maintain safety and efficiency? Would the approach be computationally tractable in that case? (A discussion of computational complexity and real-time performance is needed; a back-of-the-envelope illustration follows this review.)

The next drawback is that the analysis/explanation of the studies is a bit lacking. There is no statistical analysis of the simulated studies (so it is hard to tell which differences are actually significant) and the descriptions are incomplete. For example, what do the error bars in the figures represent (standard deviation, SEM, etc.)? Also, how were some of the parameters set, and what were the selected values (e.g., tau in Eqs. 3 and 6)? These are critical for reproducibility.

Finally, the paper has a few typos/grammar issues:
- Section V.B cuts off mid-sentence.
- Section VI, "Metrics," starts with a run-on sentence.
- Section VIII.A contains a reference to Figure 8 that actually refers to Figure 6.
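To make the tractability question above concrete: a naive equilibrium search enumerates the joint action space, which grows exponentially in the number of agents. The following back-of-the-envelope sketch uses hypothetical numbers (the paper's actual action-space size is not stated here).

    # Hypothetical scaling illustration: with |A| actions per agent, a naive
    # equilibrium search over joint action profiles considers |A|**N options.
    actions_per_agent = 10
    for n_agents in (2, 3, 4, 5):
        joint_actions = actions_per_agent ** n_agents
        print(f"{n_agents} agents -> {joint_actions:,} joint action profiles")
    # 2 agents -> 100
    # 3 agents -> 1,000
    # 4 agents -> 10,000
    # 5 agents -> 100,000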

Review 2

***** Strengths and weaknesses:

The paper considers a relevant and topical problem, which is of interest to the conference audience. As stated in the summary of contributions, the authors identify that very few formalisms exist that utilize game theory for HRI and provide a novel approach toward this gap. The research plan is well designed: borrowing insights from human interactions (via human-human studies), designing algorithms building upon prior formalisms (i.e., game theory and Bayesian inference), and evaluating them with humans. The paper is overall well written, with good coverage of related work, description of the approach, and results.

The key weakness is the evaluation of the proposed approach: the evaluations are carried out with a small sample size, without baselines from the relevant prior art, and (although less importantly) only in simulation. The small sample size makes assessing the generalizability of observed trends difficult. Further, over the past few years, several approaches have been developed for generating robot behavior in shared workspace tasks. In the absence of evaluations against a representative baseline (see detailed comments for suggestions), it is difficult to assess the utility of game-theoretic formalisms in general and the proposed approach in particular. Please see the detailed comments and suggestions listed below.

***** Comments and suggestions:

1) (Abstract) The abstract states that the proposed approach outperforms the best baseline. This statement should be better qualified, as this is observed only in agent-agent studies, with key differences in human-agent studies. Further, no comparisons are made to baselines from the prior art.

2) (Related Work) Despite the presence of several HRI formalisms, the authors provide good coverage of related work. However, space permitting, a few highly related papers would be useful to add to this discussion:

2.1) (Game-theoretic approaches) The following paper, which formalizes the HRI problem using game theory, is highly related and currently missing from the discussion: Nikolaidis, Stefanos, et al. "Game-theoretic modeling of human adaptation in human-robot collaboration." Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, 2017.

2.2) (Theory-of-mind-based approaches) While theory-of-mind-based approaches do not explicitly compute or reason about equilibria, they reason about the influence of the human on the robot and vice versa. For instance, please see: Devin, Sandra, and Rachid Alami. "An implemented theory of mind to improve human-robot shared plans execution." 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), IEEE, 2016.

3) (Results of Human-Agent Study, Section 8) The experiments, despite their small sample size, are well designed. However, the results of the human-agent study, and their differences from the agent-agent study (Section 6), also call into question several assumptions of a formalism based on game-theoretic equilibria. For instance:
- Do human-robot interactions necessarily follow an equilibrium, especially given that both the human and the robot can adapt?
- One interpretation presented in Section 9 highlights that humans indeed adapt and modify their policy in response to that of the robot. This observation raises the question, "Can the proposed approach identify whether a stable equilibrium has been reached and, if so, correctly estimate its value?" The results indicate otherwise.

As noted above, the observed results are informative for the design of HRI algorithms as well as for understanding the utility of game-theoretic formalisms for computing robot policies. Consequently, it would be useful to include additional discussion that addresses the above questions.

4) (Relation to decision-theoretic approaches) Several decision-theoretic approaches have been developed and demonstrated to perform effectively in shared workspace tasks (for instance, see the list below). Similar to the game-theoretic approach proposed in the submission, these approaches maintain an estimate of the human's latent state (either preference or goal) and arrive at a robot policy. However, they do not require the presence of an equilibrium and can tackle larger problem spaces (e.g., continuous spaces in the case of Javdani et al.) compared to the proposed approach. Further, they can algorithmically generate spatio-temporal behavior that is typical of human interaction (e.g., wait and then go), which is absent in the implementation of the Bayes-Nash approach. To demonstrate the utility of the proposed approach, consider including a comparison to one representative decision-theoretic approach from the following:
- Chen, Min, et al. "Planning with trust for human-robot collaboration." Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, 2018.
- Unhelkar, Vaibhav V., et al. "Human-aware robotic assistant for collaborative assembly: Integrating human motion prediction with planning in time." IEEE Robotics and Automation Letters 3.3 (2018): 2394-2401.
- Javdani, Shervin, et al. "Shared autonomy via hindsight optimization for teleoperation and teaming." The International Journal of Robotics Research 37.7 (2018): 717-742.
- Cheng, Yujiao, et al. "Towards efficient human-robot collaboration with robust plan recognition and trajectory prediction." IEEE Robotics and Automation Letters 5.2 (2020): 2602-2609.

5) (Section 3) Typically, an action refers to an atomic action, which is chosen and then executed without modification. However, in the current formalism, an action corresponds to an RRT plan (which can be changed mid-execution). Please clarify whether this understanding is correct. If so, consider including a footnote mentioning that actions can be modified mid-execution.

6) (Equation 6) Does the equation only apply to timestep 0? The description following the equation ("Comparing the distance ... equilibrium performance.") was difficult to follow. Please consider rephrasing this description.

7) (Section 4, clarification question) Does the formalism assume that an equilibrium is achieved and remains constant over the task execution, and only the belief over the equilibria changes? Or does it also apply to cases in which the equilibrium has not been achieved (and is changing during the interaction)?

***** Minor comments on the clarity of the presentation:

The submission is overall well written and easy to follow. Minor suggestions and typos are listed as follows:
- (Abstract) "Bayesian" should be capitalized.
- (Introduction) Consider including a reference for the term "parallel play" from the psychology literature.
- (Section 3) The phrase "set of all actions" is ambiguous, as it refers both to a (the action profile) and A (the set of joint actions).
- (Section 4) "we take its joint" is informal. Please change to "we take its joint distribution."
- (Section 5A) In the current formalism, does N correspond to 2? If the approach is indeed general and extends beyond 2 agents, consider mentioning it explicitly in the text.
- (Figure 4) The phrase Bayes-Nash has not been defined in the text. Please note that the proposed approach is referred to as Bayes-Nash.
- (Figure 5) Similarly, the phrase Fair-Nash has not been defined in the text. I assume that it refers to the baseline "Selfish-Nash."
- (Section 6) Both the phrases Bayes-Nash and Nash-Bayes are used in the paper. Consider using only one to maintain uniformity.
- (Section 6) Typo: "We measured measured..."
- (Figure 5b) In the description of Figure 5b, it is ambiguous which human is replaced (the control, the participants, or both). Please consider rephrasing this description.
*****

Review 3

Originality: The paper presents a novel solution to interaction in a shared space. The solution was interesting, as were the results. Especially novel was using both norms and personality types to select from multiple equilibria.

Quality: The modeling efforts were justified and well reasoned. The norms and personality types were grounded in well-recognized approaches in game theory (e.g., minimax, fairness). The planning algorithm was appropriate for the problem. There are a few ways the paper could have been improved:

(1) The game-theoretic model was based on single-stage games. The problem, however, seemed more aligned with games in which the human and robot interact repeatedly. The paper should mention that in repeated play there are many more equilibria (from the folk theorem for repeated games), including equilibria in which the two agents take turns receiving their most preferred outcome. The paper should justify why only equilibria for single-stage games were considered.

(2) The three types of studies provided evidence that the solution approach has merit. However, as identified by the authors, the human-human and human-robot studies had too few participants to allow any statistical conclusions. That is unfortunate because it decreases confidence in the conclusions.

(3) I wasn't sure about some aspects of the study with real humans. Specifically, was the order in which the humans interacted with the strategies counterbalanced? If not, it is impossible to know whether the trends in the data are simply a learning effect.

Clarity: The paper is really well written and includes an excellent review of the literature. Assumptions were clear, modeling choices were clear, and limitations were clear.

Significance: The paper makes a solid contribution to human-robot interaction, expanding the state of the art.