HLP-R Dialogue Production

The dialogue module of hlpr_dialogue_production can be used to integrate a dialogue state machine into your code. You can also run the standalone server in standalone_server.py and use the DialogueAct action (defined in this package) to send dialogue requests to the robot.

Writing Dialogue and Adding Behaviors

The dialogue production code is based on CoRDial TTS, from the USC Interaction Lab, which uses Amazon Polly for speech synthesis. You will need an AWS account in order to use Polly.

This code parses strings with simple <> tags for behaviors that should be synchronized to the speech. These tags take the form <behavior_name [arg1 arg2 … argn]> and will be synchronized to the beginning of the first word that follows the tag. Although a mechanism for adjusting the timing on a per-behavior basis is provided, this class is not recommended for cases where tight (within 300ms) synchronization is required, since lag in the ROS message system can affect the precise timing. In practice, it works well for most HRI contexts where behavior timing itself is not being studied. CoRDial TTS also automatically extracts visemes and passes them as behaviors to the dialogue system, although these are not currently handled for HLP-R.
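A minimal sketch of how such tags can be pulled out of a marked-up string (this mirrors the tag format described above; the actual parsing lives in CoRDial TTS and may differ in detail):

```python
import re

# Matches a <behavior_name [args...]> tag anywhere in the speech string.
TAG_RE = re.compile(r"<([^<>]+)>")

def split_tags(marked_text):
    """Return (plain_text, tags) where each tag is (behavior_name, args)."""
    tags = []
    for match in TAG_RE.finditer(marked_text):
        parts = match.group(1).split()
        tags.append((parts[0], parts[1:]))
    # Strip the tags out of the text and collapse the spaces they leave behind.
    plain = re.sub(r"\s+", " ", TAG_RE.sub("", marked_text)).strip()
    return plain, tags
```

The plain text would go to the speech synthesizer, while the tag list (with word timings from the synthesizer) drives the behavior controllers.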

Each behavior type takes a controller that provides an interface to an action server that executes the behavior. How the behaviors should be allocated among controllers depends on the interaction context and what conflicts you expect between degrees of freedom, but the currently-provided controllers are as follows:
  • Lookat Controller: Moves the robot’s head to look at a point
    • Behaviors:
      • lookat
    • Arguments:
      • If 1 argument: the frame to look at
      • If 4 arguments: a frame and position relative to that frame to look at
  • Gesture Controller: Moves the robot’s arm through predefined key frames
    • Behaviors:
      • wave
      • shrug
      • thinking
    • Arguments:
      • none
  • Test Controller: Plays a beep sound
    • Behaviors:
      • test
    • Arguments:
      • none
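For example, the two lookat argument forms would appear in a speech string as follows (the frame names and coordinates are placeholders; use frames that exist on your robot):

```python
# One argument: look at the named frame itself.
frame_only = "Watch my hand <lookat ee_frame> closely."

# Four arguments: a frame plus an x y z position relative to that frame.
frame_and_point = "Now look over <lookat base_link 0.5 0.0 1.0> there."
```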

You can also provide a “phrase file” to the system, which is a text file where each line is a dialogue act with behaviors in angle brackets, beginning with a unique key in square brackets:

[ex1] <shrug>This is a phrase file <lookat ee_frame> entry!

The phrase file can be used to pre-load audio files using gen_phrases.py. Run gen_phrases.py --help for usage information.
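A sketch of how a phrase-file line in this format breaks down into a key and its marked-up text (the actual loader in the package may differ in detail):

```python
import re

# "[key] marked-up text" -- key in square brackets, then the dialogue act.
LINE_RE = re.compile(r"^\[(?P<key>[^\]]+)\]\s*(?P<text>.*)$")

def parse_phrase_line(line):
    """Split a phrase-file line into (key, text); return None if malformed."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("key"), match.group("text")
```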

Running the Stand-Alone Server

Before you run the server, you will need an AWS account, and your computer must be configured to access Polly through Boto3; basic setup instructions can be found in the Boto3 credentials documentation.

  1. Check out the “catkin” branch of CoRDial and place the cordial_tts package in the src folder of your catkin workspace
  2. Make the messages and services by running catkin_make in your workspace root directory
  3. Launch the HLP-R/Poli simulation, with the Lookat waypoints server and MoveIt! enabled
  4. Launch the controllers and action server::
    roslaunch hlpr_dialogue_production all_controllers.launch
  5. Test the setup::
    rosrun hlpr_dialogue_production test_action_client.py

You should hear speech through your computer speakers and see the robot move. For more information on the options to the stand-alone action server, run rosrun hlpr_dialogue_production standalone_server.py --help.

Adding Controllers

The currently-implemented controllers can be found in controllers.py. A ControllerState provides an interface to an ActionServer: it wraps a callback function that turns the string behavior name and list of string arguments into a goal for that server, and it calls the server at the appropriate times relative to the speech audio. If you would like to implement a controller that you think many folks could use, add it to controllers.py by adding the following functions, substituting the name of your controller for [your]:

[your]_controller_cb(behavior_name, string_args)

get_[your]_controller()

You can then add it to the action server by adding it to the list starting on line 71 of standalone_server.py.
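As a hypothetical example, a "beep" controller's callback might look like the sketch below. SoundGoal here is a stand-in for whatever goal type your action server actually uses, and the get_beep_controller() body is only indicated in a comment, since the ControllerState constructor signature is defined in controllers.py rather than shown here:

```python
class SoundGoal(object):
    """Stand-in goal type for illustration; replace with your action's goal."""
    def __init__(self, sound_name):
        self.sound_name = sound_name

def beep_controller_cb(behavior_name, string_args):
    """Turn a hypothetical <beep [name]> tag into a goal for a sound server."""
    if len(string_args) > 1:
        print("beep takes at most one argument; got {}".format(string_args))
        return None  # mirrors the existing callbacks, which return None on bad args
    name = string_args[0] if string_args else "default"
    return SoundGoal(name)

# def get_beep_controller():
#     # Construct a ControllerState wired to your action server, following the
#     # pattern of the existing get_*_controller() functions in controllers.py.
#     return ControllerState(...)
```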

Warning

Make sure that neither the state name (given in all caps, by smach convention) nor the callback names clash with those of any existing controller!

Code API

Controllers

Defines functions to create behavior controllers for dialogue production.

This module contains functions that create controllers for each of the behaviors that one might want to synchronize with robot speech. Currently, this includes looking at a point in space, executing a gesture with pre-defined waypoints, and a test controller that just plays a beeping sound (and can be used to check synchronization).

hlpr_dialogue_production.controllers.gesture_controller_cb(behavior_name, string_args)[source]

Callback to create a GestureGoal from string arguments

Given a string of arguments (pulled from the behavior tags in a speech string), selects the series of named keypoints that define the gesture. These keypoints are currently defined in the Gesture Action Server (in gesture_action_server.py). To add more keypoints, see the documentation for that file.

Parameters:

behavior_name : str

The name of the behavior being handled (right now, one of “wave”, “shrug”, or “thinking”)

string_args : list of str

The arguments that were parsed from the speech string, as strings

Returns:

GestureGoal

Gesture goal containing the list of keypoints for the gesture

hlpr_dialogue_production.controllers.get_gesture_controller()[source]

Sets up the gesture controller state

Sets up the gesture controller state to connect to the gesture action server in this package. Adjusts the timing of the gesture behaviors by 1s; you may want to change this for your application.

hlpr_dialogue_production.controllers.get_lookat_controller()[source]

Sets up the lookat controller state

Sets up the lookat controller state to connect to the lookat_waypoints action server in hlpr_lookat. Does not adjust the timing of lookat behaviors.

hlpr_dialogue_production.controllers.get_test_controller()[source]

Sets up the test controller state

Sets up the test controller state to connect to the test action server in this package. Does not adjust the timing of behaviors.

hlpr_dialogue_production.controllers.lookat_controller_cb(behavior_name, string_args)[source]

Callback to create a LookatAction goal from string arguments

Given a string of arguments (pulled from the behavior tags in a speech string), determines whether the robot should look at the base of the frame or a point relative to the frame. Prints a warning message to the screen and returns None if the number of arguments is incorrect or they are of the wrong type, but does not check that the first argument is a valid frame (if it is not, the LookatWaypoints action server will return the error instead).

Parameters:

behavior_name : str

The name of the behavior being handled (right now, this will always be “lookat”)

string_args : list of str

The arguments that were parsed from the speech string, as strings

Returns:

LookatWaypointsGoal

Goal containing the one point to look at.

hlpr_dialogue_production.controllers.test_controller_cb(behavior_name, string_args)[source]

Callback to create a FibonacciGoal from string arguments

The test action server just uses the Fibonacci goal from the actionlib_tutorials package. The contents of the goal are not used.

Parameters:

behavior_name : str

The name of the behavior being handled (right now, this will always be “test”)

string_args : list of str

The arguments that were parsed from the speech string, as strings

Returns:

FibonacciGoal

Fibonacci goal with order=0; contents not used.

Main Smach Wrapper Class

class hlpr_dialogue_production.dialogue.SmachWrapper(use_tts, phrases=None, controllers=None, voice='Kimberly', debug=False)[source]

SmachWrapper to set up dialogue state machine

Given a set of controllers, this class will put together a state machine that allows simultaneous speech and behaviors. You can either run the state machine by calling standalone_start with the appropriate userdata, or you can call get_sm to get a reference to the internal state machine and use it as a state in a larger smach state machine. The input keys and outcomes of the state machine depend on the options passed to the constructor; they may include:

Input keys
  • key_or_marked_text : str
    If a phrase file is provided and use_tts is true, or debug is true, provides either a key into the phrase file or marked-up text to say with online TTS
  • key : str
    If a phrase file is provided and use_tts is false, provides a key into the phrase file.
  • marked_text : str
    If no phrase file is provided and use_tts is true, contains marked-up text to say with online TTS
  • behaviors : list of dict
    If no phrase file is provided and use_tts is false, or debug is true, directly provides the behaviors for the robot to execute. A behavior is a dict with the keys “id”, “start”, and “args”.
  • wav_file_loc : str
    If no phrase file is provided and use_tts is false, or debug is true, directly provides a path to an audio file for the robot to play
Outcomes
  • done
    Successfully played the dialogue act. This is always a potential outcome of the state machine.
  • preempted
    The state machine was preempted. This is always a potential outcome of the state machine.
  • not_found
    If not using tts, indicates that the key was not found in the phrase file
  • missing_info
    If neither using tts nor a phrase file, indicates that the robot was not provided with behaviors and an audio file in the input userdata
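For instance, when running without TTS and without a phrase file, the userdata must carry the behaviors and audio path directly. The values below are placeholders, and the units of “start” are whatever the synchronizer expects (assumed here to be an offset into the audio):

```python
# One dict per behavior: "id" names the behavior, "start" is its time offset
# relative to the speech audio, and "args" is the parsed argument list.
behaviors = [
    {"id": "lookat", "start": 0.5, "args": ["ee_frame"]},
    {"id": "wave", "start": 2.0, "args": []},
]

userdata = {
    "behaviors": behaviors,
    "wav_file_loc": "/tmp/hello.wav",  # placeholder path to pre-generated audio
}
```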

Methods

get_active_states() Returns the currently-active states
get_outcome() Returns the outcome of the state machine.
get_sm() Returns the state machine
is_running() Returns whether the state machine is currently running
preempt() Preempt the state machine.
reset() Reset the state machine
standalone_start([userdata_dict]) Runs the state machine on its own.
__init__(use_tts, phrases=None, controllers=None, voice='Kimberly', debug=False)[source]

Constructor

Creates a state machine according to the provided options.

Parameters:

use_tts : bool

Whether or not to use online TTS from Amazon Polly

phrases : str, optional

Path to the phrase file to use. Defaults to None.

controllers : list of ControllerState, optional

The set of controllers to use for speech. If not provided, the state machine will play audio but no behaviors.

voice : str, optional

Which Amazon Polly voice to use. Defaults to “Kimberly”.

debug : bool, optional

If true, start in debug mode, with no audio or behaviors. Defaults to false.

get_active_states()[source]

Returns the currently-active states

If the machine is running, returns all currently-active states. Otherwise, returns None

Returns:

list of str

The names of the currently-active states

get_outcome()[source]

Returns the outcome of the state machine.

After the state machine has run to completion, this will return the outcome.

Returns:

str

The outcome of the state machine.

get_sm()[source]

Returns the state machine

Warning

Trying to run the state machine multiple times can have unexpected results. If you need to run it multiple times in various places in your code, use the standalone_start function.

Returns:

smach.StateMachine

A reference to the state machine constructed by this object.

is_running()[source]

Returns whether the state machine is currently running

Returns:

bool

Whether or not the state machine is currently running

preempt()[source]

Preempt the state machine.

Useful in the case where the state machine is running inside an action server

reset()[source]

Reset the state machine

Resets preemption in the state machine and resets the synchronizer.

standalone_start(userdata_dict={})[source]

Runs the state machine on its own.

This function resets all states in the state machine, sets the userdata using the provided dictionary (be sure it matches the input_keys for the options you have selected), and runs the state machine.

Parameters:

userdata_dict : dict

A dictionary of values to update the userdata. Which values need to be included depends on the options you provided when creating this object.
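For example, assuming the wrapper was constructed with a phrase file and use_tts set to true (so the state machine expects the key_or_marked_text input key), the call would look roughly like this:

```python
# "ex1" is a key from the phrase file, as in the example earlier in this page.
userdata = {"key_or_marked_text": "ex1"}
# wrapper.standalone_start(userdata)   # wrapper: a SmachWrapper instance
```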