Joseph Scheuhammer, Anastasia Cheetham, and Allen Forsyth.
This is a technical document that describes the architecture of a pluggable audio look and feel using Swing.
Some preliminaries: First, the audio look and feel is an auxiliary look and feel in the sense that it does not replace the graphical user interface. Indeed, the visual and auditory interfaces run concurrently and cooperatively. Secondly, it does not matter to the audio interface which particular visual user interface is "in effect", be it Metal, Windows, Motif, or Macintosh. That is up to the user and/or application developer. In other words, the audio look and feel works regardless of the visual user interface.
What, precisely, does it mean to have an auditory "look" and "feel"? With respect to the "look", Swing components are presented using a sequence of speech and non-speech audio. In terms of user control (the "feel"), the components are manipulated solely via the keyboard. Note that this does not preclude their manipulation by the mouse; however, mouse control is the province of the visual look and feel. No mouse control is defined for the audio look and feel, although appropriate audio feedback is provided when mouse control alters the state of a component. Thus, the audio look and feel is more precisely described as an auditory "look" and keyboard "feel".
The architecture of the audio look and feel has two main parts. The first is a mechanism that is responsible for generating audio feedback. The ReportGenerator assembles a sequence of speech and non-speech audio appropriate to a context, and that sequence is then passed to an audio interface for presentation.
The second aspect of the system is that it is event driven. The type of sequence generated is determined by an event, and, more precisely, by the specific type of that event.
For example, suppose a user has pressed the key to uncheck a check box, and that the check box has responded accordingly. The user is notified of the new state via an auditory sequence. To the user, it will appear as if the feedback was an immediate consequence of their key stroke. While that is the ultimate cause, the connection between the key stroke and the auditory feedback is less direct. Internally, the keystroke eventually results in a change in the check box state; the check box, in turn, emits an ItemEvent. It is that event that is the immediate cause of the audio feedback. This is accomplished by making the ReportGenerator an ItemListener, and adding it as such to the check box in question. It is this event/listener combination that cues the auditory feedback.
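As a minimal sketch of this event/listener arrangement (the reporter class below is hypothetical and simply prints; the real ReportGenerator assembles an audio sequence), consider:

    import javax.swing.JCheckBox;
    import java.awt.event.ItemEvent;
    import java.awt.event.ItemListener;

    // Hypothetical reporter: it reacts to the check box's ItemEvent,
    // not to the key stroke that caused the state change.
    public class CheckBoxReporter implements ItemListener {

        public void itemStateChanged(ItemEvent e) {
            boolean checked = (e.getStateChange() == ItemEvent.SELECTED);
            // The real system would queue a speech/sound sequence here.
            System.out.println("audio report: check box is now " +
                               (checked ? "checked" : "unchecked"));
        }

        public static void main(String[] args) {
            JCheckBox box = new JCheckBox("Word wrap");
            box.addItemListener(new CheckBoxReporter());
            // Any state change -- keyboard, mouse, or programmatic -- cues the report.
            box.setSelected(true);
        }
    }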
That the feedback is event-dependent instead of key stroke-dependent has an important consequence: feedback occurs in response to the appropriate event, regardless of user input. In other words, if some internal processing causes a component to change state, and that change entails the component's emitting the appropriate event, then audio feedback is output. Users are notified of the changes even if they were not directly responsible for them. For example, navigation feedback is typically output when focus is transferred to a component, and that is generally the result of the user pressing some key. However, if focus is gained by some means other than a key stroke, the same auditory feedback will still occur.
In terms of Swing's pluggable look and feel, audio feedback is realized by creating a class that extends the basic functionality of the ReportGenerator and implements the relevant listener interfaces. A reporter/listener instance is then added to a component when its user interface is installed.
In the next section, the report generator aspect of the system is described in more detail.
To a first approximation, two kinds of feedback are generated. One is termed "navigation", and the other, "activation".
Navigation means that the user has "directed their gaze" at a specific component, and desires to know about it. They are inspecting the component, looking to see what it is and what it can do. There is no intent to alter that component; rather, they merely desire to know how it is currently configured.
On the other hand, an activation report is generated in response to some action that does alter the component. This occurs when users want to change, or manipulate, the component in question. Buttons can be pushed; check boxes checked or unchecked; and menu items selected. All these interactions with a component are grouped under the rubric of "activation".
Navigation reporting itself comes in a number of flavours. The most common of these occurs when the user navigates to the component for the first time. In this case, the feedback conveys a sense of movement, and provides a description of the component to which one has just moved.
A second kind of navigation report occurs when the users have forgotten precisely where they are, and desire a re-cap of that information. In essence, they are asking "where am I?".
Also, there is often a tool tip associated with a component, and users can request that information. In a similar vein, there may be other "extra" information associated with the component, which users can ask for at their leisure. An example of this extra information is a hot key on a menu item. An important point about these latter two kinds of information is that they are not offered spontaneously to users when they navigate to a component. The rationale is that doing so makes the feedback relatively long. Instead, the audio look and feel regards this kind of information as an aside that is available, but only upon request.
The "navigation-to" feedback sequence occurs after the user has entered a key stroke to move to a new component. The purpose of the sequence is to confirm the move and to describe where it is the user has "landed".
Note: The sequence is configured such that the sound effects occur before the speeches. The reason is so the system can be sensitive to users' expertise with the interface. As "novices", users want to hear the entire sequence. As they learn the meaning of the sound effects, those become sufficient in describing the state of affairs. As "experts", users no longer require all of the speeches and come to rely on the sound effects. Although currently not implemented, there will be both a provision to cancel the auditory feedback via a key stroke, and the ability to set a preference that only essential speeches (e.g., name of component) be output.
Here are three concrete examples of the navigation report, for JCheckBox, for JMenuItem, and for JList.
Commentary: Note that a lot of the navigation report is "missing" for menu items -- there is no role/state sound effect, nor a role or state speech. The rationale is that once in a menu, users know where they are and that they are navigating from menu item to menu item. They do not need to hear repeatedly that they have just moved to a new menu item.
There are, in fact, two versions of the "navigation-to" feedback sequence for lists. The type generated depends on whether the user has navigated to the list for the first time or is navigating among its items.
A user interface must allow for the fact that users are not always concentrating solely on the components they are interacting with. They may be composing a document and thinking predominantly about the best way to express an idea; or the phone may ring and they must deal with that distraction. Whatever the cause, there will be situations where users are not sure of what they were doing before they were distracted, and want a statement of where they are within the user interface. In order to handle this situation, the audio look and feel provides a key stroke that elicits a "where-am-I?" report.
The "where-am-I?" report is, to a first approximation, identical
to the "navigation-to" report. The main difference is the lack of the
initial movement sound effect. In addition, in some cases, a larger context is
provided wherein the relevant parent of the component is also stated. How,
exactly, a "where-am-I?" report is configured is determined by the
component at hand. As examples, here are the "where-am-I?" feedback
sequences for JCheckBox
,
JMenuItem
, and JList
Extra information is typically short cut (a.k.a. "mnemonic") keys and hot keys. If no such keystrokes are available, then nothing is reported. For components where short cut and hot keys do not make sense, some other aspect of their current state is provided. For example, the extra information for JList is a spoken list of its currently selected items. Note that this is information not provided by the "where-am-I?" report.
The tool tip report is simply the text of the component's tool tip, spoken. If there is no tool tip, then the speech "no tool tip" is spoken.
The activation report is an auditory sequence that indicates what has just been activated or manipulated. There are two kinds of activation report. The first is a general sequence that is used for a specific type of component. It is general in the sense that it is used for all components of that type. Thus, for example, there is a general activation sequence for menu items.
The second kind of activation report has the same form, but is specialized for specific kinds of activation. It is action-centric rather than component-centric. Hence, while there is a generic activation report for menu items, there are specific report sequences for menu items that are common to all applications. An example of such a menu item is that used to create new documents, namely the "New..." item in the "File" menu. The audio look and feel defines an activation report specifically for the action of such document creation. This second kind of activation report is termed the "canonical activation report".
Here are the general activation reports for JCheckBox, JMenuItem, and JList.
The manipulation of a JList involves altering its selection. As noted above with respect to navigation, the audio look and feel maintains its own notion of the current item in a JList. This tracking is done without altering the list's selection set. The "activation" of the list provides users with the ability to add or remove the current item from the selection set. The activation report provides feedback with respect to this manipulation:
The canonical activation report has the same form as the general activation report. The audio look and feel makes special provision to capture functions or tasks that are common to many applications. Examples of such functions include "new", "open", "save", "quit", "cut", "copy", and "paste".
Note that such functions are not tied to any specific user interface element. For example, although the "Cut" action is typically found in the "Edit" menu, it is also frequently associated with a tool bar button. Likewise, the cut action's canonical activation report is, strictly speaking, not tied to any specific component. Instead, there is a separate "canon" of activation reports. When a report is required by the activation of a component, it first consults the canon, and if a match is found the appropriate canonical activation report is generated. If no canonical report is to be had, it is the component's responsibility to provide an appropriate, more general, activation report.
As an example, here is the canonical activation report for "cut". Note that the activation sound effect is specific to the cutting action. In a general activation report, the sound effect would be that associated with activating the component -- a button sounds pressed, or a menu item sounds selected. Also, the spoken name is the name of the action, not the label of the component (although, it is likely that the two will be the same).
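A minimal sketch of this consult-the-canon-first logic might look as follows; the class, its entries, and the use of plain strings in place of report sequences are all invented for illustration.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical canon of activation reports for actions common to many
    // applications. Strings stand in for the actual audio report sequences.
    public class ActivationReportCanon {

        private final Map canon = new HashMap();

        public ActivationReportCanon() {
            // Invented entries, for illustration only.
            canon.put("cut", "cutting sound effect, then the spoken word \"cut\"");
            canon.put("paste", "pasting sound effect, then the spoken word \"paste\"");
        }

        // Consult the canon first; fall back to the component's general report.
        public String reportFor(String actionName, String generalReport) {
            String canonical = (String) canon.get(actionName);
            return (canonical != null) ? canonical : generalReport;
        }
    }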
Such is the structure and type of auditory feedback generated by the ReportGenerator class. By itself, nothing would be "displayed" in the auditory modality; something more is needed to cause the ReportGenerator to generate a report. The audio look and feel accomplishes this by having the ReportGenerator listen for specific kinds of events, and use those events to cue the relevant feedback sequence.
For example, users typically navigate to a component by moving keyboard focus to that component. When the component receives focus, it emits a "focus-gained" event. By listening for this event, the ReportGenerator can cue a "navigation-to" report.
Which event to listen for depends on the component itself, the kinds of events for which it is a source, and the conditions under which it emits those events. For example, there is no guarantee that a Swing component will emit a FocusEvent when, from the user's point of view, it has keyboard focus. Menu items are such components -- they are not FocusEvent sources. In Swing, to navigate among menu items, keyboard focus is maintained on the parent menu. As one moves among its menu items, the item emits a ChangeEvent (to be precise, it is the menu item's model that notifies its listeners of the change). Thus, to cue "navigation-to" reports among menu items, the menu item audio user interface listens for these ChangeEvents.
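A sketch of listening to the menu item's model is given below; the reporting is reduced to a print statement, and the check of the model's armed state is an assumption about how navigation among items could be detected.

    import javax.swing.ButtonModel;
    import javax.swing.JMenuItem;
    import javax.swing.event.ChangeEvent;
    import javax.swing.event.ChangeListener;

    // Hypothetical sketch: menu items are not FocusEvent sources, so the audio
    // UI listens to the item's ButtonModel for ChangeEvents instead.
    public class MenuItemNavigationReporter implements ChangeListener {

        public void stateChanged(ChangeEvent e) {
            ButtonModel model = (ButtonModel) e.getSource();
            if (model.isArmed()) {
                // Real system: cue the (abbreviated) navigation-to report.
                System.out.println("navigation-to report for an armed menu item");
            }
        }

        public static void install(JMenuItem item) {
            item.getModel().addChangeListener(new MenuItemNavigationReporter());
        }
    }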
It is somewhat of an art to configure a component's ReportGenerator in the appropriate manner. The component, and sometimes its model, needs to be studied to determine which events it notifies listeners of, and under what conditions it does so.
To give the reader a feel for this process, consider some of the example sequences presented above. A JCheckBox's "navigation-to" report is cued via a "focus gained" event. Its activation report, namely whether it was just checked or unchecked, is cued by an ActionEvent.
A JList that has just received focus emits a "focus gained" event, cueing the first version of its "navigation-to" sequence. The navigation of the list's items is cued either by listening for ListSelectionEvents, or by a registered keystroke/action pair. The latter is a case where the audio user interface has installed a specific key stroke as a way of navigating the list and also forcing audio feedback.
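As an illustration of such a registered keystroke/action pair (the choice of key and the body of the action are hypothetical), the audio UI might register the down-arrow key on the list:

    import javax.swing.JComponent;
    import javax.swing.JList;
    import javax.swing.KeyStroke;
    import java.awt.event.ActionEvent;
    import java.awt.event.ActionListener;
    import java.awt.event.KeyEvent;

    public class ListNavigationInstaller {

        // Hypothetical: bind a key so that navigating the list also forces
        // audio feedback for the audio UI's notion of the current item.
        public static void install(JList list) {
            list.registerKeyboardAction(new ActionListener() {
                public void actionPerformed(ActionEvent e) {
                    // Real system: advance the current item (without altering the
                    // selection set) and cue a navigation-to report for it.
                    System.out.println("navigation-to report for the next list item");
                }
            }, KeyStroke.getKeyStroke(KeyEvent.VK_DOWN, 0), JComponent.WHEN_FOCUSED);
        }
    }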
In terms of manipulation or activation reporting, when the user adds or removes list items from the JList's selection set, the JList notifies its listeners with a ListSelectionEvent.
Once the appropriate event set has been identified, the audio user interface is implemented in the following way. Two classes are defined: the AudioXxxUI and the AudioXxxListener. The "Xxx" stands for the component in question; for example, the two classes for a JCheckBox are AudioCheckBoxUI and AudioCheckBoxListener.
The AudioXxxUI extends the com.sun.java.swing.plaf.XxxUI object, and implements the requisite methods. In particular, since this is an audio look and feel, it defines no-ops for the paint() and update() methods. It also instantiates an AudioXxxListener object and installs it as a listener on the component for which the UI is intended. During the UI's uninstall, the AudioXxxUI removes the AudioXxxListener from the component.
The AudioXxxListener extends ReportGenerator, and implements the identified set of event listeners. The listener methods are a means of routing control to the appropriate report generation method, which constructs and returns an audio sequence. Upon receiving the sequence, the listener method relays it to the audio interface for presentation.
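A companion sketch of the listener side follows. The real AudioCheckBoxListener extends ReportGenerator, whose report-building methods assemble the actual speech and sound sequence; here, hypothetical print statements stand in for that so the class can be read (and compiled) on its own.

    import javax.swing.JCheckBox;
    import java.awt.event.ActionEvent;
    import java.awt.event.ActionListener;
    import java.awt.event.FocusEvent;
    import java.awt.event.FocusListener;

    // Hypothetical reporter/listener for JCheckBox.
    public class AudioCheckBoxListener implements FocusListener, ActionListener {

        // "Focus gained" cues the navigation-to report.
        public void focusGained(FocusEvent e) {
            JCheckBox box = (JCheckBox) e.getComponent();
            play("moved to check box \"" + box.getText() + "\", " +
                 (box.isSelected() ? "checked" : "not checked"));
        }

        public void focusLost(FocusEvent e) { }

        // An ActionEvent cues the activation report: just checked or unchecked.
        public void actionPerformed(ActionEvent e) {
            JCheckBox box = (JCheckBox) e.getSource();
            play("check box " + (box.isSelected() ? "checked" : "unchecked"));
        }

        private void play(String report) {
            // Real system: build a Report and hand it to the audio interface.
            System.out.println("audio report: " + report);
        }
    }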
To summarize, the audio look and feel implements a device that generates sequences of speech and non-speech audio. This device is sensitive to the events that the various Swing components emit. An instance of this device is attached to an instance of a Swing component when its UI is installed; and, as that component changes state and broadcasts those changes to its listeners, the device generates an appropriate audio report.
Note 1: Need to mention that the audio (sound effects and speeches) is accessed symbolically through a ".properties" file. Thus, the actual audio that is played is separate from the code, and is "pluggable" or "configurable". Part of the "Audio Interface" section?
Note 2: Need to say something about the "browsing within a context", or "fast search", user interface technique for components that have a set of subcomponents (menus, lists, tabbed panes, etc.). Where to put it?
The preceding describes how the audio look and feel generates reports for most Swing components. However, there is a text package within Swing that has features over and above other components. For this reason, the architecture of the audio look and feel for text is somewhat more involved. This section describes the extensions of the report generating system as it applies specifically to text.
The text package does not quite fit the navigation/activation model used for other components. Navigation to a text component is basically the same, but once a text component has focus, the idea of 'activating' it becomes inappropriate.
Once a text component has focus, the user can invoke any one of a number of text actions. These are actions which modify the text in the component, such as inserting or deleting text, or applying attributes such as bold, or italic. Swing provides these actions through a number of editor kits, collections of text actions which can be used by text components. Typically, an application will map these actions to buttons, menu items, or keystrokes. For example, the apply-bold-attribute action may be mapped to a button in a toolbar, or to the keystroke Ctrl+B.
The Audio Look and Feel follows this editor kit model. Each of the editor kits defined by Swing has an audio equivalent, which defines an AudioTextAction corresponding to each of the kit's TextActions.
To provide audio feedback for text actions, the Audio Look and Feel uses a second report generator: the TextActionReportGenerator, which is an interface implemented by the AudioTextAction, the base class for all audio text actions.
The Audio Look and Feel provides keyboard access to all of the text actions (applications will not necessarily implement keystrokes for all actions). However, only one action can be mapped to a given keystroke. Because of this, the audio action designed to provide audio feedback for a given text action cannot be mapped to the appropriate keystroke without overriding the given text action. To deal with this, the audio actions are designed to invoke their original action in order to carry out the actual action; then the audio action provides appropriate audio feedback.
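A minimal sketch of this wrap-and-delegate technique is given below. The class name, constructor arguments, and print statement are invented; the real AudioTextAction implements TextActionReportGenerator and produces a proper audio report rather than a string.

    import javax.swing.Action;
    import javax.swing.text.JTextComponent;
    import javax.swing.text.TextAction;
    import java.awt.event.ActionEvent;

    // Hypothetical audio text action: delegate to the original editor-kit
    // action, then provide audio feedback for what was just done.
    public class AudioTextActionSketch extends TextAction {

        private final Action original;   // the wrapped editor-kit action
        private final String feedback;   // what to report once the action has run

        public AudioTextActionSketch(String name, Action original, String feedback) {
            super(name);
            this.original = original;
            this.feedback = feedback;
        }

        public void actionPerformed(ActionEvent e) {
            original.actionPerformed(e);          // carry out the actual text action first
            JTextComponent target = getTextComponent(e);
            if (target != null) {
                // Real system: a TextActionReportGenerator builds the report.
                System.out.println("audio report: " + feedback);
            }
        }
    }

Mapping such an action to the keystroke of the original action (for example, Backspace for delete-previous-character) then yields both the text modification and its audio report.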
Existing Swing editor kit actions are typically actions which modify text, such as inserting new text, or applying attributes to existing text. The TextActionReportGenerator generates audio feedback for these actions.
Text actions specific to the Audio Look and Feel are akin to 'where-am-I' actions, but pertain to where the user is within the contents of the text component. They are typically requests for the Audio Look and Feel to speak portions of the text, such as the current word, or the previous sentence. The TextActionReportGenerator is used to generate this speech.
Here are three concrete examples, one for backspace, one for bold, and another for speak-current-sentence.
Swing components generate audio feedback as described above. Given such a feedback sequence, how is it actually played? This next section describes the audio interface aspect of the system.
The audio component is a term applied to those portions of the system which produce the sounds and speech used by the audio look and feel. The audio component is multi-threaded and consists of several major classes, described below: Report, SoundThing, ReportProcessor, AudioInterface, and the AudioInterface$RPManager inner class.
The audio component makes use of a number of Threads, both local and non-local to itself:
A Report is a collection of instances of class SoundThing. A SoundThing represents either a file to be played or a piece of text to be spoken and includes methods to communicate with the lower-level objects which actually handle the production of sound or speech. Associated with each SoundThing is a time to wait before producing the next sound/speech and an optional "voice" to use when speaking.
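The following hypothetical sketch mirrors that description of a SoundThing; the field and method names are invented, and printing stands in for the calls to the lower-level audio objects.

    // Hypothetical stand-in for SoundThing: either a sound file to play or a
    // phrase to speak, with a post-sound delay and an optional speaking voice.
    public class SoundThingSketch {

        private final String soundFile;   // file to play, or null if this is speech
        private final String speechText;  // text to speak, or null if this is a sound
        private final String voice;       // optional voice to use when speaking
        private final long delayMillis;   // time to wait before the next sound/speech

        public SoundThingSketch(String soundFile, String speechText,
                                String voice, long delayMillis) {
            this.soundFile = soundFile;
            this.speechText = speechText;
            this.voice = voice;
            this.delayMillis = delayMillis;
        }

        public boolean isSpeech() {
            return speechText != null;
        }

        public long getDelayMillis() {
            return delayMillis;
        }

        // The real playSound() communicates with the lower-level audio objects.
        public void playSound() {
            System.out.println(isSpeech()
                ? ((voice == null ? "speak: " : "speak (" + voice + "): ") + speechText)
                : ("play: " + soundFile));
        }
    }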
The Report class makes extensive use of the plaf properties file to enable a "level of indirection" in the creation of a Report. When a Report is created by the ReportGenerator component, the Report's constructor can utilize symbols from the properties file to represent sound files or strings to be spoken. These symbols will be replaced by the symbols' values when the constructor is executed. Since the properties file is editable, the end user will be able to tailor the look and feel to suit her own preferences, perhaps choosing different sounds (or none at all) or changing the words or phrases that the program will speak.
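The actual property keys are not documented here; the sketch below, with invented entries and class name, merely illustrates the level of indirection that a Report constructor could use.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    // Hypothetical illustration of the properties-file indirection. Example
    // entries (the key names are invented):
    //   navigate.sound = sounds/whoosh.au
    //   checkbox.checked.speech = checked
    public class ReportSymbols {

        private final Properties symbols = new Properties();

        public ReportSymbols(String propertiesFile) throws IOException {
            FileInputStream in = new FileInputStream(propertiesFile);
            try {
                symbols.load(in);
            } finally {
                in.close();
            }
        }

        // A Report constructor would resolve each symbol to its configured value,
        // so editing the file changes the sounds and phrases without touching code.
        public String resolve(String symbol) {
            return symbols.getProperty(symbol, "");
        }
    }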
There are two types of Reports: Reports which wait until all previously submitted Reports have completed and Reports which interrupt any currently running Reports. The ReportGenerator component specifies the type of Report at construction time and queues the Report for execution by invoking the doReport method of atrcuof.plaf.audio.AudioInterface.
Each Report is assigned to an instance of class ReportProcessor which is responsible for starting, stopping, and monitoring the Report's execution. By having a separate ReportProcessor assigned to each Report, as yet unimplemented features such as pausing and resuming Reports, or continuous playing of a sound file, can be added easily.
Each ReportProcessor has a Thread associated with it. The synchronized run method begins with a wait(). Class AudioInterface$RPManager assigns a Report to a ReportProcessor and then issues a notify() call to awaken the ReportProcessor which then enters a loop in which each SoundThing in the Report has its playSound() method invoked. This method in turn invokes the lower-level objects to start the sound file playing or begin speaking the word or phrase. Since both speech and sound complete asynchronously, the ReportProcessor issues another wait() call and sleeps until it is awakened by the low-level object upon completion of the sound or speech. After all the SoundThings in the Report have completed their work, the ReportProcessor again enters its top-of-loop wait().
Another significant method in ReportProcessor is cancelRunningReport. This synchronized method is invoked by AudioInterface$RPManager when an interrupt Report has been submitted for processing. cancelRunningReport sets a switch to tell run() not to start any more SoundThings once the current one completes and then invokes the stopSound() method of the currently running SoundThing.
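A much-simplified sketch of this wait()/notify() protocol follows. Strings stand in for SoundThings, a timed wait stands in for the asynchronous completion notification, and the class name is invented; it shows only the top-of-loop wait, the per-sound wait, and the cancellation switch described above.

    import java.util.List;

    // Simplified, hypothetical sketch of a report processor's thread protocol.
    public class ReportProcessorSketch implements Runnable {

        private List report;              // the assigned "Report": a list of sound names
        private boolean cancelled = false;

        // Manager side: hand over a report and wake the processor thread.
        public synchronized void assignReport(List newReport) {
            report = newReport;
            cancelled = false;
            notify();
        }

        // Manager side: stop once the currently playing sound completes.
        public synchronized void cancelRunningReport() {
            cancelled = true;
        }

        public synchronized void run() {
            try {
                while (true) {
                    while (report == null) {
                        wait();                    // top-of-loop wait for an assignment
                    }
                    for (int i = 0; i < report.size() && !cancelled; i++) {
                        System.out.println("playing: " + report.get(i));
                        wait(100);                 // real system: wait() until the sound's
                    }                              // asynchronous completion notify()s us
                    report = null;                 // finished; back to the top-of-loop wait
                }
            } catch (InterruptedException e) {
                // Thread shut down.
            }
        }
    }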
AudioInterface$RPManager is an inner class of AudioInterface that is responsible for managing the ReportProcessors. Other components communicate with it by placing requests on a queue and issuing a notify() call to AudioInterface$RPManager's event handler thread to tell it that a request is available for processing.
AudioInterface$RPManager performs its functions by operating on a set of queues: the availableProcessors, runnableProcessors, runningProcessors, and transitionProcessors queues.
The AudioInterface$RPManager's event handler thread operates in a loop in which it wait()s for notification that a request is available for processing. Upon awakening, it removes the first request from the queue and processes it. It continues processing requests until the queue is empty whereupon it again wait()s for request notification.
The following requests are processed by the AudioInterface$RPManager's event handler thread:
When an interrupting Report is submitted: each ReportProcessor on the runningProcessors queue is removed from the queue and its cancelRunningReport method is invoked. The ReportProcessor is then moved to the transitionProcessors queue. An available ReportProcessor is acquired from the availableProcessors queue and the Report is assigned to it. This ReportProcessor is placed on the runningProcessors queue and is then started. Finally, any ReportProcessors on the runnableProcessors queue are removed from that queue and added to the availableProcessors queue.
When a waiting Report is submitted: an available ReportProcessor is acquired from the availableProcessors queue and given the Report. The ReportProcessor is then placed on the runnableProcessors queue. If no ReportProcessors are running, a ReportProcessor is moved from the runnableProcessors queue to the runningProcessors queue and started.
When a ReportProcessor notifies the manager that it has finished with its Report: the ReportProcessor is moved from the transitionProcessors queue to the availableProcessors queue. If no ReportProcessors are running and the runnableProcessors queue is not empty, a ReportProcessor is moved from that queue to the runningProcessors queue and started.
The AudioInterface class has two primary responsibilities: acting as an interface between the audio component and the rest of the audio look and feel; and managing the low-level audio class instances which are responsible for the actual production of sound and speech.
The interface function of AudioInterface consists of several methods:
The low-level object management functions of AudioInterface are accessed by SoundThing instances when their playSound method is invoked by a ReportProcessor.
For a Speakable SoundThing, the AudioInterface will supply a reference to a Synthesizer object with the appropriate voice.
For a JMFPlayable, AudioInterface manages a hash table keyed on the name of the sound file to be played. The first time a file is referenced by a Report (during its constructor processing), a JMF Player object is created and brought to a state of readiness for playing the sound. Subsequent references to the hash table return the previously created Player and permit a significantly faster startup of the sound.
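A hedged sketch of that cache is shown below. It assumes JMF's Manager.createPlayer and the asynchronous Controller.realize call; the real code presumably waits for the Player to reach readiness (for example, via a ControllerListener) before its first use, a detail omitted here, and the class name is invented.

    import java.util.Hashtable;
    import javax.media.Manager;
    import javax.media.MediaLocator;
    import javax.media.Player;

    // Hypothetical sketch of the Player cache, keyed on the sound file name.
    public class PlayerCache {

        private final Hashtable players = new Hashtable();

        // Return a Player for the sound file, creating it on first reference.
        public synchronized Player getPlayer(String fileName) throws Exception {
            Player player = (Player) players.get(fileName);
            if (player == null) {
                player = Manager.createPlayer(new MediaLocator("file:" + fileName));
                player.realize();   // asynchronous; brings the player toward readiness
                players.put(fileName, player);
            }
            return player;
        }
    }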
Copyright (C) 1998, 1999 Adaptive Technology Resource Centre, University of Toronto.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.
Updated: 1999 Sep 08 JS
Web site maintained by Joseph Scheuhammer