System for Picture Interpretation for Blind
UI Analysis

Zdenek Mikovec
xmikovec@fel.cvut.cz

Abstract

The problem of description and interpretation of pictorial information by blind users is discussed. A system both for description and interpretation of pictorial information has been designed and implemented. The user interface (UI) of the software tools for browsing the picture by blind users is presented.

Detailed information about this project is available at http://cs.felk.cvut.cz/~xmikovec/bis.


Contents

1. Introduction
2. Problem specification
3. Solution
4. Conclusion

1. Introduction

The development of information technology (IT) at the end of the 20th century has brought two main changes for blind people. The first, positive one is extensive electronic communication, which lets blind people work and communicate with others at nearly the same level as sighted people. The second, negative one is the increasingly intensive use of graphical user interfaces (GUIs) with pictures and visual effects. Graphical information is very difficult to present to blind users, and a GUI is difficult for them to navigate.
Our motivation is to reduce these negative effects of new IT and to support the positive ones, so that blind people can use the technology more extensively.
In many cases the use of a GUI depends entirely on perceiving graphical information in the form of pictures of various kinds. Blind users therefore need access to that graphical information, at least at some level, in order to process it. One solution is a set of tools that allow blind users to browse a picture in a specific way. Picture perception raises many problems. The meaning of a picture is determined both by the list of objects in it and by the relations among those objects. These relations are either structural (left, right, etc.) or semantic (a person talks to another person). Understanding a picture is then derived from the combination of both relation types.
Another problem we had to solve is the handling of picture hierarchy. In general, objects can be grouped, and relations can also be established among these groups. The hierarchy helps us perceive pictures at different levels of detail, so a tool that allows blind users to browse pictures must be able to work with hierarchies. Besides the browser, through which blind users perceive the picture, a second tool is needed for creating picture descriptions that the browser can process. The following text describes the strategy for building such a description tool. To automate the creation of both the description module and the browsing module, a formal approach had to be devised: a grammar that defines the structure of a picture description.

The specific goal of this project is therefore to develop tools for quick and easy picture "reading" by blind users.


Figure 1: Blind Information System (BIS)

The system of tools for creating picture descriptions and browsing pictures is called the Blind Information System (BIS) and is divided into three parts (see Figure 1):

The following text describes the development of the user interface of the Picture Browser for Blind.


2. Problem specification

The problem is how to enable blind users to obtain the important information stored in a picture. The solution consists of two parts:


3. Solution

3.1. Picture description methodology

The basic idea of this methodology is to give blind users the ability to browse the picture: not to explain what they should see in it, but to describe the objects in the picture and the relations between them, and to let each user form his or her own perception of the picture.
The methodology was designed to make browsing as easy as possible for the blind user and describing the picture as easy as possible for the person who creates the description.

Visual information consists of three parts:

These three kinds of information give us several views of the properties of objects.
To realize this idea we chose an object-oriented approach to picture description: the entire world consists of objects with their descriptions, and these objects have behavior that influences other objects.
Therefore, when describing any situation (see Figure 2), it is necessary to describe the objects, their descriptions and their relations to other objects (see Figure 3).
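
As a minimal sketch of this object-oriented idea, a picture object can be modeled as a name, a set of descriptions and a list of relations to other objects. The names PictureObject, Relation and describe are hypothetical and are not taken from BIS; the sample values come from the Woman and Boy example used later in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A named link from one object to another (hypothetical structure)."""
    kind: str    # "structural" or "semantic"
    label: str   # e.g. "is part of", "running to"
    target: str  # name of the related object


@dataclass
class PictureObject:
    """An object in the picture with its descriptions and relations."""
    name: str
    descriptions: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)


def describe(obj):
    """Render one object as a single line of text, e.g. for a screen reader."""
    parts = " ".join(f'{key}="{value}"' for key, value in obj.descriptions.items())
    links = "; ".join(f"{rel.label} {rel.target}" for rel in obj.relations)
    return f"{obj.name}({parts}) -> {links}"


# Structural relation: the door is part of the house.
door = PictureObject("Door", {"color": "brown"},
                     [Relation("structural", "is part of", "House")])
# Semantic relation: the boy is running to the door.
boy = PictureObject("Boy", {"age": "9 years"},
                    [Relation("semantic", "running to", "Door")])

print(describe(door))
print(describe(boy))
```

Keeping relations as explicit records, separate from descriptions, is only one possible design; later versions of the methodology described below fold the structural relation into the descriptions themselves.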

Description methodology version A

For picture description we chose the structural relation as the basic relation (e.g. a door is part of a house).
For the description of an object we defined the following fixed structure:

Figure 2: Woman and Boy
Figure 3: Woman and Boy - Structural view

3.2. Information filtering

The problem with browsing a picture is the large volume of complex information that can be browsed.
To help a blind person understand the meaning of the picture, we must provide a tool for filtering out inessential information (objects, descriptions and relations), thus reducing the amount of information to browse.
We have developed two methods of filtering:

Object description filtering

When analyzing how to filter out objects and descriptions, we found that the structure of description methodology version A was not suitable. Object descriptions were scattered over several items of the structure (e.g. "running" is in the item "action", "9 years old" is in the item "detail"), and the descriptions themselves had no defined structure.
For easy filtering we integrated all descriptions into one universal structure and defined basic categories of descriptions.

Description methodology version B

The basic relation is the structural relation (e.g. a door is part of a house).
The description of an object has the following dynamic structure:

Because the categories of object descriptions are defined, the browsing user can choose which description categories he wants to "see" (e.g. color, position, action). Applying this filter cuts out the descriptions and objects that do not match the chosen categories.
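
The category filter can be sketched as follows. This is an illustration only: the function name filter_by_categories and the dictionary layout are assumptions, and dropping an object once none of its descriptions match is one reading of "objects that don't match", not a confirmed detail of BIS. The sample data comes from the browsing example in section 3.3.

```python
def filter_by_categories(objects, chosen):
    """Keep only descriptions whose category is in `chosen`.

    Objects left without any description are dropped (assumed behavior).
    """
    result = {}
    for name, descriptions in objects.items():
        kept = {cat: val for cat, val in descriptions.items() if cat in chosen}
        if kept:
            result[name] = kept
    return result


# Object name -> {description category: value}
objects = {
    "House": {"height": "9 meters"},
    "Boy": {"height": "1.5 meters", "age": "9 years", "action": "running to Door"},
    "Tree": {"height": "20 meters", "age": "95 years"},
    "Cap": {"color": "black", "action": "falling down from Head"},
}

# The user wants to "see" only color and action.
visible = filter_by_categories(objects, {"color", "action"})
for name, descriptions in visible.items():
    print(name, descriptions)
```

With the categories color and action chosen, House and Tree disappear because they carry only height and age, while Boy and Cap keep the matching descriptions.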

View

When analyzing how picture descriptions are created, we found that a picture can be understood from very different points of view, each involving different objects, different object descriptions and different relations. In some cases (views) the structural relation, which was the basic relation in our descriptions, does not have to be the major relation.
We therefore decided to define a separate description of the picture for each point of view. The structure of description methodology version B was not prepared for this change, so we had to define a new description methodology in which one picture can have several descriptions, called views, and in which the structural relation is not the basic relation.

Description methodology version C (last version)

There is no basic relation between objects (the structural relation is defined as a description with the category "hierarchical").
Each picture consists of several views (special objects that consist of the real objects of the picture, see Figure 3 or Figure 4). Every picture must have at least one structural view.
The description of the picture has the following dynamic structure:

Two types of view were defined:

1. structural view (see Figure 3) - focused on structural relations

2. semantic view (see Figure 4) - focused on semantic relations
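
A picture with several views, together with the rule that at least one of them must be structural, could be sketched like this. The class names Picture and View and the method validate are hypothetical, and the object lists assigned to each view are illustrative rather than copied from Figures 3 and 4.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """One point of view on the picture (hypothetical structure)."""
    kind: str                                    # "structural" or "semantic"
    objects: list = field(default_factory=list)  # names of objects in this view


@dataclass
class Picture:
    title: str
    views: list = field(default_factory=list)

    def validate(self):
        """Enforce the rule that a picture needs at least one structural view."""
        if not any(view.kind == "structural" for view in self.views):
            raise ValueError(f"picture {self.title!r} has no structural view")
        return True


picture = Picture("Woman and Boy", [
    View("structural", ["House", "Boy", "Tree", "Cap"]),
    View("semantic", ["Boy", "Cap"]),  # illustrative selection of objects
])
picture.validate()
```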

Figure 4: Woman and Boy - Semantic view

3.3. Control of browser application

Control

As mentioned above, we focused on the simplicity of the UI.
To control an application quickly, blind users have to learn all frequently used commands by heart. The commands they do not remember must be quickly reachable.

The user interaction with browser application is divided into three parts (see Figure 5):

Figure 5: Description Browser - User Interaction Diagram

All these operations are controlled by a very small number of commands. For example, to open and simply browse a picture description you need only 3 groups of commands (see "Example of browsing the picture" below):

Example of browsing the picture

We will now simulate browsing the picture (see Figure 2). The responses of the browsing module (what the blind user can read) are marked with the symbol "=>". The basic navigation keys are the arrow keys, which browse the hierarchical tree of objects much like browsing a directory tree in MS Explorer (up and down move within the same level; left and right move one level up or down in the tree hierarchy).
Sequence of actions performed when browsing the picture:

1. Load the picture into the browser by choosing the menu item "Load Picture". The title of the picture is displayed.

=> Woman and Boy

2. Press the right arrow key to move to the level of views. There are two views: structural and semantic.

=> structural view

3. Press the right arrow key to choose the structural view and move to the first level of objects (House, Boy, Tree, Cap).

=> House(height="9 meters")

4. Press the down arrow key repeatedly to move to the next objects on the first level.

=> Boy(height="1.5 meters" age="9 years" action="running to Door")x
=> Tree(height="20 meters" age="95 years")
=> Cap(color="black" action="falling down from Head")x

The "x" at the end of the "Boy" and "Cap" lines indicates that these objects have semantic relations to other objects. To get to the related object, press the Ctrl key + right arrow key.

5. Press the Ctrl key + right arrow key to get to the related object "Head".

=> Head (hierarchical="is in Boy")

6. To analyze the structure of House, perform the following actions:
Press the left arrow key to move to the upper level of the tree.

=> Boy (...)

Press the up arrow key to get to the "House" object.

=> House (...)

Now we can analyze the structure of "House" by pressing the right arrow key to reach the first subobject of "House" and then pressing the down arrow key repeatedly.

=> window (hierarchical="is in House")
=> door (hierarchical="is in House" color="brown")
=> window (hierarchical="is in House" group="6")
=> roof (hierarchical="is in House" color="red")
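
The key handling used in this walkthrough can be sketched as a cursor moving over a tree of nodes. This is not the BIS implementation: the classes Node and Cursor, the key names and the way the related object is stored are assumptions made for illustration, and the tree contains only the objects named in the example.

```python
class Node:
    """A picture, view or object in the browsing tree (hypothetical structure)."""

    def __init__(self, name, children=None, related=None):
        self.name = name
        self.children = children or []
        self.related = related  # node reached with Ctrl + right arrow, if any
        self.parent = None
        for child in self.children:
            child.parent = self


class Cursor:
    """Tracks the current node and moves it on each key press."""

    def __init__(self, root):
        self.node = root

    def press(self, key):
        parent = self.node.parent
        siblings = parent.children if parent else [self.node]
        index = siblings.index(self.node)
        if key == "down" and index + 1 < len(siblings):
            self.node = siblings[index + 1]      # next object on the same level
        elif key == "up" and index > 0:
            self.node = siblings[index - 1]      # previous object on the same level
        elif key == "right" and self.node.children:
            self.node = self.node.children[0]    # one level down
        elif key == "left" and parent:
            self.node = parent                   # one level up
        elif key == "ctrl+right" and self.node.related:
            self.node = self.node.related        # jump to the related object
        return self.node.name


head = Node("Head")
house = Node("House", [Node("window"), Node("door"), Node("window"), Node("roof")])
structural = Node("structural view",
                  [house, Node("Boy", [head]), Node("Tree"), Node("Cap", related=head)])
picture = Node("Woman and Boy", [structural, Node("semantic view")])

keys = ["right", "right", "down", "down", "down", "ctrl+right",
        "left", "up", "right", "down", "down", "down"]
cursor = Cursor(picture)
trace = [cursor.press(key) for key in keys]
print(trace)
```

Replaying the key presses from steps 2 to 6 visits the same objects in the same order as the browser responses listed above. A key that has no valid target leaves the cursor where it is, which is an assumed behavior.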

3.4. Special communication devices (Braille row, screen readers)

This browser application is designed specifically for blind users, so its interface is optimized for the special devices they use.

The basic interface for blind users is screen reader software, which reads out the text information on the screen. To make it easy for a screen reader to read the information in our browser application, we defined a special menu structure and a special status row (see section 3.5. Implementation).

3.5. Implementation

The browser workspace is divided into three main regions (see Figure 6).

Figure 6: Description Browser - Application design

4. Conclusion

The user interaction with our browser application was tested with several blind and sighted users.
Learning to control the application was very quick. The users did not need any assistance to understand how to browse a picture description.
While browsing descriptions we discovered new requirements for the application (e.g. jumping to an object identified only by its name and not by its position in the structure).
We found problems in communication with special interface devices (screen readers, Braille row), especially under a graphical user interface.

In future work we will focus on implementing new functionality for faster browsing and on solving the problems with special interface devices.
