HLP-R Dialogue Production

The dialogue module of hlpr_dialogue_production can be used to integrate a dialogue state machine into your code. You can also run the standalone server in standalone_server.py and use the DialogueAct action (defined in this package) to send dialogue requests to the robot.

Writing Dialogue and Adding Behaviors

The dialogue production code is based on CoRDial TTS, from the USC Interaction Lab, which uses Amazon Polly for speech synthesis. You will need an AWS account in order to use Polly.

This code parses strings with simple <> tags for behaviors that should be synchronized to the speech. These tags take the form <behavior_name [arg1 arg2 … argn]> and will be synchronized to the beginning of the first word that follows the tag. Although a mechanism for adjusting the timing on a per-behavior basis is provided, this class is not recommended for cases where tight (within 300ms) synchronization is required, since lag in the ROS message system can affect the precise timing. In practice, it works well for most HRI contexts where behavior timing itself is not being studied. CoRDial TTS also automatically extracts visemes and passes them as behaviors to the dialogue system, although these are not currently handled for HLP-R.
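A minimal sketch of how such tags can be pulled out of a marked-up string (this mirrors the tag format described above; the actual parsing lives in CoRDial TTS and may differ in detail):

```python
import re

# Matches a <behavior_name [args...]> tag anywhere in the speech string.
TAG_RE = re.compile(r"<([^<>]+)>")

def split_tags(marked_text):
    """Return (plain_text, tags) where each tag is (behavior_name, args)."""
    tags = []
    for match in TAG_RE.finditer(marked_text):
        parts = match.group(1).split()
        tags.append((parts[0], parts[1:]))
    # Strip the tags out of the text and collapse the spaces they leave behind.
    plain = re.sub(r"\s+", " ", TAG_RE.sub("", marked_text)).strip()
    return plain, tags
```

The plain text would go to the speech synthesizer, while the tag list (with word timings from the synthesizer) drives the behavior controllers.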

Each behavior type takes a controller that provides an interface to an action server that executes the behavior. How the behaviors should be allocated among controllers depends on the interaction context and what conflicts you expect between degrees of freedom, but the currently-provided controllers are as follows:
  • Lookat Controller: Moves the robot’s head to look at a point
    • Behaviors:
      • lookat
    • Arguments:
      • If 1 argument: the frame to look at
      • If 4 arguments: a frame and position relative to that frame to look at
  • Gesture Controller: Moves the robot’s arm through predefined key frames
    • Behaviors:
      • wave
      • shrug
      • thinking
    • Arguments:
      • none
  • Test Controller: Plays a beep sound
    • Behaviors:
      • test
    • Arguments:
      • none
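For example, the two lookat argument forms would appear in a speech string as follows (the frame names and coordinates are placeholders; use frames that exist on your robot):

```python
# One argument: look at the named frame itself.
frame_only = "Watch my hand <lookat ee_frame> closely."

# Four arguments: a frame plus an x y z position relative to that frame.
frame_and_point = "Now look over <lookat base_link 0.5 0.0 1.0> there."
```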

You can also provide a “phrase file” to the system, which is a text file where each line is a dialogue act with behaviors in angle brackets, beginning with a unique key in square brackets:

[ex1] <shrug>This is a phrase file <lookat ee_frame> entry!

The phrase file can be used to pre-load audio files using gen_phrases.py. Run gen_phrases.py --help for usage information.
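A sketch of how a phrase-file line in this format breaks down into a key and its marked-up text (the actual loader in the package may differ in detail):

```python
import re

# "[key] marked-up text" -- key in square brackets, then the dialogue act.
LINE_RE = re.compile(r"^\[(?P<key>[^\]]+)\]\s*(?P<text>.*)$")

def parse_phrase_line(line):
    """Split a phrase-file line into (key, text); return None if malformed."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("key"), match.group("text")
```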

Running the Stand-Alone Server

Before you run the server, you will need an AWS account, and your computer must be configured to access Polly through Boto3; basic setup instructions can be found in the Boto3 credentials documentation.

  1. Check out the “catkin” branch of CoRDial and place the cordial_tts package in the src folder of your catkin workspace
  2. Make the messages and services by running catkin_make in your workspace root directory
  3. Launch the HLP-R/Poli simulation, with the Lookat waypoints server and MoveIt! enabled
  4. Launch the controllers and action server::
    roslaunch hlpr_dialogue_production all_controllers.launch
  5. Test the setup::
    rosrun hlpr_dialogue_production test_action_client.py

You should hear speech through your computer speakers and see the robot move. For more information on the options to the stand-alone action server, run rosrun hlpr_dialogue_production standalone_server.py --help.

Adding Controllers

The currently-implemented controllers can be found in controllers.py. A ControllerState provides an interface to an ActionServer: it wraps a callback function that turns the string behavior name and list of string arguments into a goal for that server, and it calls the server at the appropriate times relative to the speech audio. If you would like to implement a controller that you think many folks could use, add it to controllers.py by adding the following functions, substituting the name of your controller for [your]:

[your]_controller_cb(behavior_name, string_args)

get_[your]_controller()

You can then add it to the action server by adding it to the list starting on line 71 of standalone_server.py.
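As a hypothetical example, a "beep" controller's callback might look like the sketch below. SoundGoal here is a stand-in for whatever goal type your action server actually uses, and the get_beep_controller() body is only indicated in a comment, since the ControllerState constructor signature is defined in controllers.py rather than shown here:

```python
class SoundGoal(object):
    """Stand-in goal type for illustration; replace with your action's goal."""
    def __init__(self, sound_name):
        self.sound_name = sound_name

def beep_controller_cb(behavior_name, string_args):
    """Turn a hypothetical <beep [name]> tag into a goal for a sound server."""
    if len(string_args) > 1:
        print("beep takes at most one argument; got {}".format(string_args))
        return None  # mirrors the existing callbacks, which return None on bad args
    name = string_args[0] if string_args else "default"
    return SoundGoal(name)

# def get_beep_controller():
#     # Construct a ControllerState wired to your action server, following the
#     # pattern of the existing get_*_controller() functions in controllers.py.
#     return ControllerState(...)
```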

Warning

Make sure that neither the state name (given in all caps, by smach convention) nor the callback names clash with those of any existing controller!

Code API

Controllers

Defines functions to create behavior controllers for dialogue production.

This module contains functions that create controllers for each of the behaviors that one might want to synchronize with robot speech. Currently, this includes looking at a point in space, executing a gesture with pre-defined waypoints, and a test controller that just plays a beeping sound (and can be used to check synchronization).

hlpr_dialogue_production.controllers.gesture_controller_cb(behavior_name, string_args)[source]

Callback to create a GestureGoal from string arguments

Given a string of arguments (pulled from the behavior tags in a speech string), selects the series of named keypoints that define the gesture. These keypoints are currently defined in the Gesture Action Server (in gesture_action_server.py). To add more keypoints, see the documentation for that file.

Parameters:

behavior_name : str

The name of the behavior being handled (right now, one of “wave”, “shrug”, or “thinking”)

string_args : list of str

The arguments that were parsed from the speech string, as strings

Returns:

GestureGoal

Gesture goal containing the list of keypoints for the gesture

hlpr_dialogue_production.controllers.get_gesture_controller()[source]

Sets up the gesture controller state

Sets up the gesture controller state to connect to the gesture action server in this package. Adjusts the timing of the gesture behaviors by 1s; you may want to change this for your application.

hlpr_dialogue_production.controllers.get_lookat_controller()[source]

Sets up the lookat controller state

Sets up the lookat controller state to connect to the lookat_waypoints action server in hlpr_lookat. Does not adjust the timing of lookat behaviors.

hlpr_dialogue_production.controllers.get_test_controller()[source]

Sets up the test controller state

Sets up the test controller state to connect to the test action server in this package. Does not adjust the timing of behaviors.

hlpr_dialogue_production.controllers.lookat_controller_cb(behavior_name, string_args)[source]

Callback to create a LookatAction goal from string arguments

Given a string of arguments (pulled from the behavior tags in a speech string), determines whether the robot should look at the base of the frame or a point relative to the frame. Prints a warning message to the screen and returns None if the number of arguments is incorrect or they are of the wrong type, but does not check that the first argument is a valid frame (if it is not, the LookatWaypoints action server will return the error instead).

Parameters:

behavior_name : str

The name of the behavior being handled (right now, this will always be “lookat”)

string_args : list of str

The arguments that were parsed from the speech string, as strings

Returns:

LookatWaypointsGoal

Goal containing the one point to look at.

hlpr_dialogue_production.controllers.test_controller_cb(behavior_name, string_args)[source]

Callback to create a FibonacciGoal from string arguments

The test action server just uses the Fibonacci goal from the actionlib_tutorials package. The contents of the goal are not used.

Parameters:

behavior_name : str

The name of the behavior being handled (right now, this will always be “test”)

string_args : list of str

The arguments that were parsed from the speech string, as strings

Returns:

FibonacciGoal

Fibonacci goal with order=0; contents not used.

Main Smach Wrapper Class

class hlpr_dialogue_production.dialogue.SmachWrapper(use_tts, phrases=None, controllers=None, voice='Kimberly', debug=False)[source]

SmachWrapper to set up dialogue state machine

Given a set of controllers, this class will put together a state machine that allows simultaneous speech and behaviors. You can either run the state machine by calling standalone_start with the appropriate userdata, or you can call get_sm to get a reference to the internal state machine and use it as a state in a larger smach state machine. The input keys and outcomes of the state machine depend on the options passed to the constructor; they may include:

Input keys
  • key_or_marked_text : str
    If a phrase file is provided and use_tts is true, or debug is true, provides either a key into the phrase file or marked-up text to say with online TTS
  • key : str
    If a phrase file is provided and use_tts is false, provides a key into the phrase file.
  • marked_text : str
    If no phrase file is provided and use_tts is true, contains marked-up text to say with online TTS
  • behaviors : list of dict
    If no phrase file is provided and use_tts is false, or debug is true, directly provides the behaviors for the robot to execute. A behavior is a dict with the keys “id”, “start”, and “args”.
  • wav_file_loc : str
    If no phrase file is provided and use_tts is false, or debug is true, directly provides a path to an audio file for the robot to play
Outcomes
  • done
    Successfully played the dialogue act. This is always a potential outcome of the state machine.
  • preempted
    The state machine was preempted. This is always a potential outcome of the state machine.
  • not_found
    If not using tts, indicates that the key was not found in the phrase file
  • missing_info
    If neither using tts nor a phrase file, indicates that the robot was not provided with behaviors and an audio file in the input userdata
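For instance, when running without TTS and without a phrase file, the userdata must carry the behaviors and audio path directly. The values below are placeholders, and the units of “start” are whatever the synchronizer expects (assumed here to be an offset into the audio):

```python
# One dict per behavior: "id" names the behavior, "start" is its time offset
# relative to the speech audio, and "args" is the parsed argument list.
behaviors = [
    {"id": "lookat", "start": 0.5, "args": ["ee_frame"]},
    {"id": "wave", "start": 2.0, "args": []},
]

userdata = {
    "behaviors": behaviors,
    "wav_file_loc": "/tmp/hello.wav",  # placeholder path to pre-generated audio
}
```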

Methods

get_active_states() Returns the currently-active states
get_outcome() Returns the outcome of the state machine.
get_sm() Returns the state machine
is_running() Returns whether the state machine is currently running
preempt() Preempt the state machine.
reset() Reset the state machine
standalone_start([userdata_dict]) Runs the state machine on its own.
__init__(use_tts, phrases=None, controllers=None, voice='Kimberly', debug=False)[source]

Constructor

Creates a state machine according to the provided options.

Parameters:

use_tts : bool

Whether or not to use online TTS from Amazon Polly

phrases : str, optional

Path to the phrase file to use. Defaults to None.

controllers : list of ControllerState, optional

The set of controllers to use for speech. If not provided, the state machine will play audio but no behaviors.

voice : str, optional

Which Amazon Polly voice to use. Defaults to “Kimberly”.

debug : bool, optional

If true, start in debug mode, with no audio or behaviors. Defaults to false.

get_active_states()[source]

Returns the currently-active states

If the machine is running, returns all currently-active states. Otherwise, returns None

Returns:

list of str

The names of the currently-active states

get_outcome()[source]

Returns the outcome of the state machine.

After the state machine has run to completion, this will return the outcome.

Returns:

str

The outcome of the state machine.

get_sm()[source]

Returns the state machine

Warning

Trying to run the state machine multiple times can have unexpected results. If you need to run it multiple times in various places in your code, use the standalone_start function.

Returns:

smach.StateMachine

A reference to the state machine constructed by this object.

is_running()[source]

Returns whether the state machine is currently running

Returns:

bool

Whether or not the state machine is currently running

preempt()[source]

Preempt the state machine.

Useful in the case where the state machine is running inside an action server

reset()[source]

Reset the state machine

Resets preemption in the state machine and resets the synchronizer.

standalone_start(userdata_dict={})[source]

Runs the state machine on its own.

This function resets all states in the state machine, sets the userdata using the provided dictionary (be sure it matches the input_keys for the options you have selected), and runs the state machine.

Parameters:

userdata_dict : dict

A dictionary of values to update the userdata. Which values need to be included depends on the options you provided when creating this object.
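For example, assuming the wrapper was constructed with a phrase file and use_tts set to true (so the state machine expects the key_or_marked_text input key), the call would look roughly like this:

```python
# "ex1" is a key from the phrase file, as in the example earlier in this page.
userdata = {"key_or_marked_text": "ex1"}
# wrapper.standalone_start(userdata)   # wrapper: a SmachWrapper instance
```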