Zdenek Mikovec
xmikovec@fel.cvut.cz
The problem of describing pictorial information and of its interpretation by blind users is discussed. A system for both the description and the interpretation of pictorial information has been designed and implemented. The user interface (UI) of the software tools that allow blind users to browse a picture is presented.
Detailed information about this project is available at http://cs.felk.cvut.cz/~xmikovec/bis.
Development of Information Technologies (IT) at the end of the 20th century brought two main
changes for the blind. The first, positive one is extensive electronic communication,
which has given blind people the ability to work and communicate with other people at nearly
the same level as sighted people. The second change, negative for the blind, is the increasingly
intensive use of graphical user interfaces (GUIs) with pictures and visual effects. Graphical
information is very difficult to present to the blind, and blind users find it very difficult
to orient themselves in a GUI.
The motivation of our work is to suppress these negative aspects of new IT and to support
the positive changes in IT development, so that the blind can use IT more
extensively.
In many cases the use of a GUI depends entirely on the perception of graphical
information in the form of pictures of various kinds. This means that blind users need
access to graphical information (at least at some level) in order to process the graphical
information presented. The solution to this problem is a set of tools that allow
blind users to browse a picture in a specific way. There are many problems related
to picture perception. The meaning of a picture is specified both by the list of objects
in the picture and by the relations among the objects. These relations are either structural
(left, right, etc.) or semantic (a person talks to another person). Understanding the
picture is then derived from the combination of both relation types.
Another problem we had to solve is the handling of picture hierarchy. In general, it
is possible to group objects and also to establish relations among these groups. The
hierarchy helps us perceive pictures at different levels of detail, so a tool that allows
blind users to browse pictures should be able to work with hierarchies. Besides the
browser, by means of which blind users perceive the picture, we need a tool for
creating picture descriptions with the properties described above that the browser
can process. The following text describes the strategy for building such a tool. In
order to automate the creation of both the picture description module and its browsing
module, a formal approach had to be devised. Such a tool is a grammar that allows us to
define the structure of a picture description.
This means that the specific goal of this project is to develop tools for quick and easy picture "reading" for the blind.
The system of tools for creating picture description and browsing the picture is called Blind Information System (BIS) and is divided into three parts (see Figure 1):
In the following text we describe the development of the user interface of the Picture Browser for Blind.
The problem is how to enable the blind to obtain the important information stored in a picture. The solution consists of two parts:
The basic idea of this methodology is to give the blind the ability to browse the
picture: not to explain what the blind person should see in the picture, but to describe
the objects in the picture and the relations between them, and to let the blind person
form his own perception of the picture.
The methodology was designed for maximum ease of use, both for the blind user browsing
the picture and for the person creating the picture description for the
blind.
Visual information consists of three parts:
These three kinds of information give us several views on the properties of objects.
To fulfill this idea we chose an object-oriented approach to picture description. That
means the entire world consists of objects with their descriptions; these objects have
their own behavior, which influences other objects.
Therefore, when describing any situation (see Figure 2), it is necessary to describe
the objects, their descriptions and their relations to other objects (see Figure 3).
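The object-oriented description above can be sketched as a small data structure. This is a minimal Python sketch, not the BIS implementation: it assumes a hypothetical `PictureObject` class in which descriptions are free-form key/value pairs, the structural relation ("is part of") is a list of sub-objects, and any other relation is stored as a (verb, target) pair.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class PictureObject:
    """One object of the described world (hypothetical class, for illustration only)."""
    name: str
    descriptions: dict[str, str] = field(default_factory=dict)  # e.g. {"color": "brown"}
    parts: list[PictureObject] = field(default_factory=list)  # structural "is part of" relation
    relations: list[tuple[str, PictureObject]] = field(default_factory=list)  # (verb, other object)

    def add_part(self, part: PictureObject) -> PictureObject:
        """Record `part` as a structural sub-object and return it for chaining."""
        self.parts.append(part)
        return part


# A fragment of the scene from Figure 2: the door is part of the house,
# and the boy is related to the door by the action "running to".
house = PictureObject("House", {"height": "9 meters"})
door = house.add_part(PictureObject("Door", {"color": "brown"}))
boy = PictureObject("Boy", {"age": "9 years"})
boy.relations.append(("running to", door))
```

Keeping the structural relation as a dedicated `parts` list mirrors the choice of the structural relation as the basic relation; other relations are kept apart so that a browser could treat them differently.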
For picture description we chose the structural relation as the basic relation (e.g. a door is
part of a house).
For the description of an object we defined the following fixed structure:
The problem with browsing a picture is the large volume of complex information that can
be browsed.
To help a blind person understand the meaning of the picture, we must give him a tool
for filtering out inessential information (objects, descriptions and relations) and thus
reduce the amount of information to be browsed.
We have developed the following two filtering methods:
When analyzing how to filter out objects and descriptions, we found that the
structure of description methodology version A was not suitable. Object descriptions were
split across several items of the structure (e.g. "running" is in the item "action",
"9 years old" is in the item "detail"), and the descriptions had no defined structure.
For easy filtering we integrated all descriptions into one universal structure and
defined basic categories of descriptions.
The basic relation is the structural relation (e.g. a door is part of a house).
The description of an object has the following dynamic structure:
Because the categories of object descriptions are defined, the browsing user can choose which description categories he wants to "see" (e.g. color, position, action). Applying this filter cuts out all other descriptions and the objects that do not match the chosen categories.
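A minimal sketch of the category filter described above. The names `Description`, `Obj` and `filter_by_categories` are hypothetical, and the sketch works on a flat list of objects; how BIS treats the sub-objects of a filtered-out object is not stated here, so hierarchy is ignored.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Description:
    category: str  # e.g. "color", "position", "action"
    value: str


@dataclass
class Obj:
    name: str
    descriptions: list[Description] = field(default_factory=list)


def filter_by_categories(objects: list[Obj], categories: list[str]) -> list[Obj]:
    """Keep only descriptions in the chosen categories; drop objects left with none."""
    chosen = set(categories)
    result = []
    for obj in objects:
        kept = [d for d in obj.descriptions if d.category in chosen]
        if kept:  # an object with no matching description is cut out entirely
            result.append(Obj(obj.name, kept))
    return result


scene = [
    Obj("Boy", [Description("age", "9 years"), Description("action", "running to Door")]),
    Obj("Tree", [Description("height", "20 meters"), Description("age", "95 years")]),
    Obj("Cap", [Description("color", "black"), Description("action", "falling down from Head")]),
]
visible = filter_by_categories(scene, ["color", "action"])
```

With the categories color and action chosen, the Tree disappears, the Boy keeps only its action and the Cap keeps both of its descriptions. The filter builds new objects instead of mutating `scene`, so the full description remains available when the user changes the filter.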
When analyzing the creation of picture descriptions we found that a picture can be
understood from very different points of view, which means different objects,
different object descriptions and different relations. In some cases (views) we found
that the structural relation (the basic relation of our descriptions) need not be
the major relation.
We therefore decided to define a separate description of the picture for each point of view.
The structure of description methodology version B was not prepared for these changes. We
had to define a new description methodology in which one picture can have several
descriptions, called views, and in which the structural relation is not the basic relation.
There is no basic relation between objects (the structural relation is defined as a
description with the category "hierarchical").
Each picture consists of several views (special objects that consist of the real objects of
the picture; see Figure 3 or Figure 4). Every picture must have at least one structural view.
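The view-based structure can be sketched as follows. The classes `Obj`, `View` and `Picture` and the `kind` labels are hypothetical; the sketch only illustrates two rules from the text: the structural relation is an ordinary description with the category "hierarchical", and a picture is rejected unless it has at least one structural view. Matching the literal string "is in X" to find children is an assumption made for brevity.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    descriptions: dict[str, str] = field(default_factory=dict)


@dataclass
class View:
    kind: str  # hypothetical labels: "structural" or "semantic"
    objects: list[Obj] = field(default_factory=list)

    def children_of(self, parent: str | None) -> list[str]:
        """Names of objects whose "hierarchical" description places them in `parent`.

        parent=None returns the top-level objects (no "hierarchical" description).
        """
        wanted = None if parent is None else f"is in {parent}"
        return [o.name for o in self.objects
                if o.descriptions.get("hierarchical") == wanted]


@dataclass
class Picture:
    title: str
    views: list[View]

    def __post_init__(self) -> None:
        # The methodology requires at least one structural view per picture.
        if not any(v.kind == "structural" for v in self.views):
            raise ValueError("a picture needs at least one structural view")


structural = View("structural", [
    Obj("House", {"height": "9 meters"}),
    Obj("Boy", {"age": "9 years"}),
    Obj("door", {"hierarchical": "is in House", "color": "brown"}),
    Obj("Head", {"hierarchical": "is in Boy"}),
])
picture = Picture("Woman and Boy", [structural, View("semantic")])
```

Because the hierarchy is just another description, a semantic view can hold the same objects without any "hierarchical" entries, and no relation type is privileged by the data structure itself.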
The description of the picture has the following dynamic structure:
Two types of view were defined:

As mentioned above, we focused on the simplicity of the UI.
The problem with blind users is that, to control an application quickly, they have to learn
all frequently used commands by heart. The commands they do not remember must be quickly
reachable.
The user interaction with the browser application is divided into three parts (see Figure 5):

All these operations are controlled by an extremely small number of commands. For example, to open and simply browse the picture description you need only 3 groups of commands (see Example of browsing picture below):
Now we will simulate browsing the picture (see Figure 2). The answers of the browsing
module (what the blind user can read) are marked with the symbol "=>". The basic
navigation keys are the arrow keys, which browse the hierarchical tree of objects much
like browsing the directory tree in MS Explorer (up and down move within the same level;
left moves one level up and right moves one level down in the tree hierarchy).
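The arrow-key navigation can be modeled as a cursor over a tree. This is a minimal sketch assuming a hypothetical `Node`/`TreeCursor` pair in which the picture title is the root and the views are its children, as in the walkthrough that follows; each key press returns the label that would be handed to the screen reader.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


class TreeCursor:
    """Hypothetical cursor; the stack holds a (siblings, index) pair per level, root first."""

    def __init__(self, root: Node) -> None:
        self.stack: list[tuple[list[Node], int]] = [([root], 0)]

    @property
    def current(self) -> Node:
        siblings, i = self.stack[-1]
        return siblings[i]

    def press(self, key: str) -> str:
        siblings, i = self.stack[-1]
        if key == "down" and i + 1 < len(siblings):    # next object on the same level
            self.stack[-1] = (siblings, i + 1)
        elif key == "up" and i > 0:                    # previous object on the same level
            self.stack[-1] = (siblings, i - 1)
        elif key == "right" and self.current.children:  # one level down
            self.stack.append((self.current.children, 0))
        elif key == "left" and len(self.stack) > 1:    # one level up
            self.stack.pop()
        return self.current.label  # text for the screen reader


picture = Node("Woman and Boy", [
    Node("structural view", [
        Node("House", [Node("window"), Node("door"), Node("roof")]),
        Node("Boy"), Node("Tree"), Node("Cap"),
    ]),
    Node("semantic view"),
])
```

Keys that have no target (e.g. up on the first object of a level) leave the cursor where it is, so the screen reader simply repeats the current object. Keeping the index of every level on the stack means that returning with the left arrow lands on the same parent object the user descended from.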
Sequence of actions performed when browsing the picture:
1. Load the picture into the browser by choosing the menu item "Load Picture". The title of the picture is displayed.
=> Woman and Boy
2. Press the right arrow key to get to the level of views. There are two views: structural and semantic.
=> structural view
3. Press the right arrow key to choose the structural view and get to the first level of objects (House, Boy, Tree, Cap).
=> House(height="9 meters")
4. Repeatedly press the down arrow key to choose the next object on the first level.
=> Boy(height="1.5 meters" age="9 years" action="running to Door")x
=> Tree(height="20 meters" age="95 years")
=> Cap(color="black" action="falling down from Head")x
The "x" at the end of the "Boy" and "Cap" lines indicates that these objects have semantic relations to other objects. To get to the related object, press Ctrl + right arrow key.
5. Press Ctrl + right arrow key to get to the related object "Head".
=> Head (hierarchical="is in Boy")
6. To analyze the structure of the House, perform the following actions:
Press the left arrow key to get to the upper level of the tree.
=> Boy (...)
Press the up arrow key to get to the "House" object.
=> House (...)
Now we can analyze the structure of "House" by pressing the right arrow key to get to the subobjects of "House" and then repeatedly pressing the down arrow key.
=> window (hierarchical="is in House")
=> door (hierarchical="is in House" color="brown")
=> window (hierarchical="is in House" group="6")
=> roof (hierarchical="is in House" color="red")
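The "x" marker and the Ctrl + right arrow jump from the walkthrough above can be sketched like this. The names `Obj`, `render` and `ctrl_right` are hypothetical; semantic relations are assumed to be stored separately from the displayed attributes, and a jump follows only the first semantic relation, since the text does not say what happens when an object has several.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    attrs: dict[str, str] = field(default_factory=dict)  # descriptions read to the user
    # Semantic relations as (verb, target) pairs, kept apart from attrs (assumption).
    semantic: list[tuple[str, Obj]] = field(default_factory=list)


def render(obj: Obj) -> str:
    """Format an object as one line for the screen reader; a trailing 'x' marks semantic links."""
    body = " ".join(f'{key}="{value}"' for key, value in obj.attrs.items())
    marker = "x" if obj.semantic else ""
    return f"{obj.name}({body}){marker}"


def ctrl_right(obj: Obj) -> Obj:
    """Jump to the target of the first semantic relation, or stay on the object."""
    return obj.semantic[0][1] if obj.semantic else obj


head = Obj("Head", {"hierarchical": "is in Boy"})
cap = Obj("Cap", {"color": "black", "action": "falling down from Head"},
          [("falling down from", head)])
```

Here `render(cap)` produces `Cap(color="black" action="falling down from Head")x`, matching the line shown in step 4, and `ctrl_right(cap)` returns the `head` object, as in step 5.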
This browser application is specially designed for blind users, so its interface is optimized for the special devices they use.
The basic interface for blind users is screen reader software,
i.e. software that reads the text information on the screen. To make it easy for the
screen reader to read the information of our browser application, we defined a special menu
structure and a special status row (see chapter 3.5, Implementation).
The browser workspace is divided into three main regions (see Figure 6).

The user interaction with our browser application was tested with several blind and
sighted users.
Learning to control the application was very quick. The users did not need any assistance
to understand how to browse a picture description.
While browsing descriptions, we identified new requirements for the application (e.g. jumping
to an object identified only by its name rather than by its position in the structure).
We found problems in communication with special interface devices (screen readers,
Braille row), especially under a graphical user interface.
In future work we will focus on implementing new functionality for faster browsing and on solving the problems with special interface devices.