Capturing and Adapting Traces for Character Control in Computer Role Playing Games

Jonathan Rubin and Ashwin Ram

Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304 USA
Jonathan.Rubin@parc.com, Ashwin.Ram@parc.com

Abstract. We describe an architecture, in its early stages of development, that processes user traces in the domain of computer role-playing games and utilises the resulting traces to control the behaviour of characters within the environment. Behaviour execution is handled via an online case-based planner, which dynamically adapts plans given dissimilarities between the learning and testing environments. The overall architecture is presented, and we provide an example of applying it to a 2D role-playing game environment. We conclude with the future objectives of this work in progress. Our work builds heavily on previous research in the area of learning from demonstration and online case-based planning in real-time strategy games [1].

1 Introduction

In this paper, we detail a work in progress in the area of learning from demonstration and case-based planning. We describe an architecture for performing online case-based planning within the domain of modern computer role-playing games. The overall purpose of the architecture is to control game characters based on captured user traces. During demonstration, a human expert controls a character within a virtual environment and a trace is recorded to capture their sequence of actions. The gathered user traces are combined with a real-time case-based planner, which generates similar strategies that can be used to influence the behaviour of autonomous agents within the environment.

Our work builds on previous research efforts that produced the Darmok [1, 2] and Darmok 2 [3, 4] systems. Darmok is an architecture for performing online case-based planning based on captured expert user traces. Darmok has been shown to be successful in producing coherent strategies, especially in the domain of real-time strategy (RTS) games. The work we present here differs from Darmok in that it describes an architecture for controlling computer characters in role-playing games (RPGs). While Darmok 2 could be used as a general game player, it was particularly suited to playing RTS-type games. The eventual goal of our work is to construct a system that controls one (or more) helpful non-player characters (NPCs), which are able to aid human players with their goals and objectives in the RPG domain.
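To fix intuition for what a captured trace might contain (the paper does not prescribe a concrete format, so the record layout below is our assumption), each entry can simply pair the observed world state with the action the expert chose:

```python
# A minimal sketch of a captured user trace, assuming each entry pairs a
# snapshot of the world state with the action the demonstrator chose.
# All names here are illustrative, not taken from the Komrad system.
from dataclasses import dataclass, field
from typing import Any, Dict, List

State = Dict[str, Any]  # entity attribute-value pairs

@dataclass
class TraceEntry:
    tick: int           # game time at which the action was taken
    world_state: State  # snapshot of the environment's entities
    action: str         # action selected by the human demonstrator

@dataclass
class Trace:
    entries: List[TraceEntry] = field(default_factory=list)

    def record(self, tick: int, world_state: State, action: str) -> None:
        self.entries.append(TraceEntry(tick, world_state, action))
```

Section 2.1 describes how such traces are segmented into the cases that drive the planner.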
While our work is heavily influenced by research conducted within the domain of RTS games, the modified objectives and the differences between the RTS and RPG domains lead to several important distinctions:

1. To begin with, RTS environments are adversarial, whereas RPGs need not be. While RPGs may contain adversarial scenarios (which the system is required to handle), the overall objective typically has more to do with exploration of the game space and the appropriate selection of sequences of actions.

2. RTS games require the coordination of a team of agents, typically with the objective of destroying an enemy team. RPGs, on the other hand, place a larger focus on the actions and goals of more well-defined individual characters that exist within the environment.

3. Actions within RPGs are typically instantaneous rather than durative (as in RTS games). As such, there is less focus on the parallel management of durative actions and more focus on the appropriate selection of sequences of actions and goals to pursue.

We refer to our architecture (and the system it produces) as Komrad ("Darmok" backwards), and the next section provides a high-level overview of its design.

2 Komrad Overview

Figure 1 displays a high-level overview of the current system architecture, which consists of a training phase, a real-time planner, and adaptation & repair strategies applied within a particular environment. Each of these components is described in more detail below.

2.1 Training

During the initial training phase, a user demonstrates behaviour to the system by navigating and controlling a character within the current environment. An environment is composed of a collection of entities, together with a set of actions that the user can perform in order to modify the current world state. During this initial training episode, traces are captured that record each action chosen by the user.

In addition to the set of possible actions that a user can take within the environment, a collection of goals is also specified, reflecting more sophisticated milestones achieved by the user during their interaction with the environment. At present, all goals are required to be pre-specified and known beforehand; one of the eventual objectives of our research is to remove this assumption.

The series of actions that results from recording a trace episode is processed into cases.
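To make this training step concrete, the sketch below (our illustration, not code from the paper) walks a recorded trace and emits a case each time one of the pre-specified goals is first satisfied. It builds on the Trace type from the earlier sketch and anticipates the case representation C = (W, G(E), S) defined in the next subsection; the goal-predicate format is our assumption.

```python
# Illustrative segmentation of a trace into cases. Goals are assumed to be
# pre-specified predicates over the world state, each tied to the entity it
# acts upon, matching the paper's requirement that goals are known beforehand.
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

@dataclass
class Case:
    world_state: State  # W: state at the start of the goal episode
    goal: str           # G: identifier of the achieved goal
    entity: str         # E: entity the goal acts upon
    steps: List[Any]    # S: actions (or sub-goals) that achieved the goal

GoalSpec = Tuple[str, str, Callable[[State], bool]]  # (name, entity, achieved?)

def extract_cases(trace: Trace, goals: List[GoalSpec]) -> List[Case]:
    cases: List[Case] = []
    pending = list(goals)  # goals not yet achieved in this trace
    if not trace.entries:
        return cases
    segment_start = trace.entries[0].world_state
    steps: List[str] = []
    for entry in trace.entries:
        steps.append(entry.action)
        for spec in list(pending):
            name, entity, achieved = spec
            # For simplicity, we test the state recorded with each entry; a
            # full implementation would test the state after the action runs.
            if achieved(entry.world_state):
                # Goal reached: the actions taken since the segment began
                # become S, and a fresh segment starts from this state.
                cases.append(Case(segment_start, name, entity, steps))
                pending.remove(spec)
                segment_start, steps = entry.world_state, []
    return cases
```

For simplicity this sketch produces action-only cases; hierarchical cases whose S contains sub-goals would be segmented analogously.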
Fig. 1. High-level overview of the Komrad architecture.
Cases have the following representation:

C = (W, G(E), S)

where W captures the current world state at the time the goal G(E) was achieved by the user, and S is the sequence of actions or sub-goals that led to the achievement of the goal. The G(E) notation highlights that each goal also specifies a single entity within the environment upon which it acts. Entities are described by a collection of attribute-value pairs. The cases produced become the foundation for controlling the behaviour of an autonomous character so that it reflects the style of play of the original expert whose trace was captured.

2.2 Real-Time Planner

In order to control the behaviour of an autonomous agent, the system architecture depicted in Figure 1 includes a real-time planner (top right). The planner maintains a goal stack and an action stack, where the actions currently present on the action stack are those that must be performed in order to achieve the goal at the top of the goal stack.

At the start of a planning episode, a single goal is placed onto the goal stack. During the episode the planner is continually queried for the next action it recommends. If there are currently no actions on the action stack, the goal at the top of the goal stack is decomposed into the sequence of actions and/or sub-goals required to achieve it (recall that this information was captured within a case in the case base). In order to decompose the goal at the top of the stack, the case base is searched for stored cases whose goals (G(E)) and world states (W) are similar to the current environment. Once an appropriate case has been found, the sequence of actions or sub-goals recorded in the retrieved case (S) is placed onto the appropriate stack within the planner. Goal decomposition continues until at least one action is present on the action stack, at which point the action at the top of the action stack is returned by the planner. A goal is removed from the goal stack once all the actions required to achieve it have been popped off the action stack. One limitation of the current architecture is that a goal can only be decomposed into a sequence of sub-goals or a sequence of actions, but not a mixture of the two, as mixing them could obscure the order in which actions should be performed according to the user traces. A code sketch of this query loop is given at the end of this section.

2.3 Adaptation & Repair

Once an action is retrieved from the action stack, it is ready to be performed in the current environment. However, as the current environment is likely to differ from the environment initially encountered when gathering traces, the retrieved actions may require adaptation and repair before they can be successfully executed.
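Returning to the planner of Section 2.2, the following is a minimal sketch (our construction, not the authors' implementation) of the two-stack query loop. It reuses the State and Case types from the training sketch above; retrieve_case is an assumed helper that searches the case base for a stored case whose goal G(E) and world state W are most similar to the current goal and environment, returning None when nothing suitable is found.

```python
# A minimal sketch of the planner's goal-stack / action-stack query loop.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Goal:
    name: str
    entity: str
    pending_actions: Optional[int] = None  # set once decomposed into actions
    expanded: bool = False                 # True once decomposed into sub-goals

class Planner:
    def __init__(self, retrieve_case: Callable[[Goal, State], Optional[Case]]):
        self.retrieve_case = retrieve_case
        self.goal_stack: List[Goal] = []
        self.action_stack: List[str] = []

    def push_goal(self, name: str, entity: str) -> None:
        self.goal_stack.append(Goal(name, entity))

    def next_action(self, world_state: State) -> Optional[str]:
        # Decompose goals until at least one primitive action is available.
        while not self.action_stack and self.goal_stack:
            top = self.goal_stack[-1]
            if top.expanded:
                # Every sub-goal pushed above this goal has been achieved
                # and popped, so the goal itself is now achieved.
                self.goal_stack.pop()
                continue
            case = self.retrieve_case(top, world_state)
            if case is None or not case.steps:
                self.goal_stack.pop()  # no usable case; abandon the goal
                continue
            if isinstance(case.steps[0], Goal):
                # A goal decomposes into sub-goals only or actions only,
                # never a mixture (per the architecture's restriction);
                # reversed so the first sub-goal ends up on top.
                top.expanded = True
                self.goal_stack.extend(reversed(case.steps))
            else:
                top.pending_actions = len(case.steps)
                self.action_stack.extend(reversed(case.steps))
        if not self.action_stack:
            return None  # planning episode is complete
        action = self.action_stack.pop()
        owner = self.goal_stack[-1]  # the goal these actions achieve
        owner.pending_actions -= 1
        if owner.pending_actions == 0:
            self.goal_stack.pop()    # all of its actions have been popped
        return action
```

At the start of a planning episode a single goal is pushed with push_goal; the agent then calls next_action once per decision point until it returns None.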