IUI4DR: Intelligent User Interfaces for Developing Regions

Sheetal K Agarwal, Nitendra Rajput
IBM India Research Lab, 4, Block C, ISID Campus, New Delhi
{sheetaga,rnitendra}@in.ibm.com

John Canny
U C Berkeley, Soda Hall, Berkeley, CA
[email protected]

Apala Lahiri Chavan
Human Factors International, Andheri (E), Mumbai
[email protected]
ABSTRACT
Information Technology has had a significant impact on society and has touched all aspects of our lives. So far, computers and expensive devices have fueled this growth. The challenge now is to take this success of IT to its next level, where IT services can be accessed by the masses. “Masses” here means people who (a) are not yet IT literate, and/or (b) do not have the purchasing power to use current IT delivery mechanisms (the PC-centric model), and/or (c) do not find current IT solutions and services relevant to their lives or businesses. Interestingly, a huge portion of the world’s population falls into this category. To enable IT access for these masses, this workshop focuses on easy-to-use and affordable, yet powerful, user interfaces that can be used by this population.
ACCEPTED PAPERS:
1. J Pal, U Pawar, S Anikar, A Joshi, M Jain, S Gopal Thota and S Teja P. From Pilot to Practice: Creating Multiple-Input Multimedia Content for Real-World Deployment. This paper discusses techniques to redesign and re-use existing content with minimal changes for shared-user scenarios.
2. B DeRenzi, K Z Gajos, T S Parikh, N Lesh, M Mitchell and G Borriello. Opportunities for Intelligent Interfaces Aiding Healthcare in Low-Income Countries. This paper presents a study of the use of a PDA-based healthcare application by medical staff in Tanzania.
3. O Murillo and M Czerwinski. The Need for In Situ, Multidisciplinary Application Design and Development in Emerging Markets. This paper discusses how application design should draw on partnerships with multiple groups, including universities in the developing world.
4. S Nomura, G Chiba, A Honda, T Shirakawa, T Shiose, O Katai, H Kawakami and K Yamanaka. Affordable Echolocation-Based User Interfaces in Accessing Chaotic Environments. This paper explores echolocation as an alternative for guiding visually challenged people in enclosed surroundings.
5. L Steels and E Tisselli. Interfaces for Community Memories. This paper discusses the need to understand the crucial needs of future generations and communities before delivering new technologies to developing countries.
6. N Varghese, S NK and Raman RKVS. IndicDasher: A Stroke and Gesture Based Input Mechanism for Indic Scripts. This paper presents a mechanism to combine hand-stroke inputs with keyboard inputs for complex Indian-language scripts.
7. A K Singh. An Outline of a Multilingual Natural Language Text and Speech Interface for Computing Devices in the South Asian Context. The author suggests a universal speech interface for Asian languages.
8. M Aguilar. Trust-Building Self-Generated User Interfaces for Illiterate People. This paper discusses how local people can be involved in the content creation and UI design of applications meant for them.
KEYWORDS
developing countries, illiteracy, cost-effective interfaces
INTRODUCTION
Internet penetration in developing countries is still below 10%. This is partially because even cheap PCs (about 220 USD) prove to be an expensive proposition for people in emerging economies (56% of people in developing countries live on less than USD 700 per year). Furthermore, using a PC requires IT skills beyond language reading and writing, leading to a low acceptance rate. Considering the social, cultural, educational and economic diversity of developing regions, the challenge is to develop appropriate and effective interfaces and interaction techniques that will enable these users to access services that currently remain elusive to them. Some of the key areas of focus for this workshop are: (a) novel and cost-effective interfaces that reduce the cognitive load on users who usually operate in chaotic environments, (b) interfaces for semi-literate and illiterate users, (c) shared user interfaces and devices, and (d) designs tailored to account for social and cultural issues.
From Pilot to Practice: Creating Multiple-Input Multimedia Content for Real-World Deployment

Joyojeet Pal (1), Udai Pawar (2), Apurva Joshi (3), Mohit Jain (3), Sai Gopal Thota (3), Sai Teja P (3), Sukumar Anikar (4)

(1) TIER Research Group, University of California at Berkeley, 634 Soda Hall, Berkeley CA, USA 94720. [email protected]
(2) Microsoft Research India, Scientia, 196/36 2nd Main, Sadashivanagar, Bangalore, India 560080. [email protected]
(3) DA-IICT, Gandhinagar, Gujarat-382007, India. {apurva_joshi, miohit_jain, thota_gopal, sai_teja}@daiict.ac.in
(4) Azim Premji Foundation, #134, Doddakannelli, Next to Wipro Corporate Office, Sarjapur Road, Bangalore-560035, India. [email protected]
ABSTRACT
In this paper we take further the experimental work on the use of multiple-input devices for developing regions and describe the process involved in creating a ready-to-deploy multimedia CD for English as a second language in vernacular-language-medium Indian schools. We briefly explore three areas here – first, we discuss the choice of learning English as a second language for our test application, and the pedagogical process used in designing the multimedia content. Second, we describe the various interaction designs for multiple-input modalities that we have employed, and discuss the motivations behind each, as well as the outcomes in preliminary trials. Finally, we lay out the practical challenges in both design and deployment of a real-world implementation of such a system.
Author Keywords
User Interfaces, Education, Developing Regions, Computer Aided Learning, Multiple Mice.
ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.
INTRODUCTION
Following a renewed interest in interface design for low-income populations, the idea of using multiple input to enable more equitable technology use in shared-use scenarios has gained currency among researchers in educational technology [1, 3, 11]. Experimental work so far has shown major gains in engagement with educational content [1, 15, 17], and gains in basic learning for children in cases where each child is assigned his or her own mouse when sharing a computer [3]. This calls for an examination of prevalent real-world multimedia educational content to understand how multiple-input technologies can be incorporated into use in existing classrooms. However, there are serious challenges in deploying experimental "Multimouse" [1, 3] material to real-world scenarios. The incremental hardware cost of multiple mice is minimal, but the software challenges are significant. Despite the almost universal sharing of screens by multiple children in low-income areas, most prevalent learning multimedia is designed for single users. So the shift to multiple-input scenarios necessitates significant changes in interaction designs and models. These changes involve enabling existing content to be usable in multiple-input scenarios, as well as iterating designs and interactions for content specifically tailored for multiple users. Here, we describe some early results and, more importantly, the steps involved in creating deployable content for multiple-input scenarios.
RELATED WORK
In addition to well-received scholarly works on design research for consumer-level services such as financial transactions [10] and engineering aspects of technology in such scenarios [12], a number of experimental deployments with a strong design focus have attempted to introduce technology-aided communication and operations to bottleneck scenarios. These include CAM-based mobile data capture for rural coffee cooperatives [6], text-free user interface applications for illiterate and semi-literate users [5], learning English as a second language using mobile phones [4] and, in relation, a lot of work in the space of vernacular language development for developing regions.
Among such areas of interest, there has been design innovation for enabling children to better utilize computers in shared-use scenarios, of particular relevance in the resource-constrained developing world [1, 2, 3, 11]. In this paper, we follow the thread of work on the use of multiple mice on a single computer in education settings, which has its roots in earlier work in Single Display Groupware [7, 15, 16, 17, 18], looking at programs that enable co-present users to collaborate via a shared computer with a single shared display and the simultaneous use of multiple input devices.
KEY CONTRIBUTIONS
In the past decade, there has been credible research focusing on the 'case for' technology research addressing the specific needs of the developing world [4, 13], and on the importance of innovative shared computing [14] for the developing world. These works have made clear that technologies initially developed with first-world conditions in mind often do not adapt well to low-income situations, and that a lot of work is often needed to bridge the gap between showing a prototype in a first-world lab and actually deploying it in a developing country. The key contribution of this paper is to take this logical next step of 'real-world deployment'. Here we have taken an idea within the ICTD field which had demonstrated benefits in experimental scenarios, and redesigned real-world content to examine the practical applicability of such systems. In conceptualizing our design decisions, we kept in mind earlier research [2] showing that teacher and resource shortages in developing countries create 'babysitting'-type scenarios in computer classes, where access to human guidance is minimal or absent. Consequently, we have paid much attention to artificial intelligence factors in the design of children's interaction when creating several design options for real-world deployment. One such contribution here is our use of turn-taking as a machine-induced interaction, assuming the absence of human intervention in our designs.
DESIGN APPROACH
There are three broad approaches to designing interfaces for multi-mouse scenarios. The first is simply "enabling" existing multimedia, designed for use by a single user, to be used by multiple users – without any changes to the content. A second approach is "redesigning" the interactive parts of the content without changes to the narrative flow or pedagogical structure. The third approach is to design multimedia content assuming a multi-user scenario from first principles (from 'scratch'). In this paper, we describe experiences with the first two cases, which in turn we believe help make a case for redesign from 'scratch'.
Enabling Existing Material for Multimouse
In our studies of educational content, we found that a sizable fraction of software applications for children are graphics-intensive and designed using Macromedia Flash. Most such content also follows a typical narrative-interactive loop pattern – with some narrative content shown to the child, followed by a series of multiple-choice questions based on that content. Such content usually has hyperlinks and animations activated by clicks. At the simplest level, "enabling" such material for multiple inputs would mean allowing each child to have a mouse and creating a "first click prevails" scenario for all screens. Thus the typical interaction mode is like a 'racing' scenario [3] – whichever child clicks first triggers the specific action on the screen that leads to the next step in the software, and so on.
To examine ways of achieving this, we explored toolkits created by researchers [17] in the past to enable multiple mice. We found that a major constraint in each of these was their being tied to particular platforms; specifically, none of them worked with Flash. We selected the MultiPoint SDK by Microsoft [9], which, though based in .NET, could initiate Flash applications from within .NET without requiring any changes to the Flash runtime. Thus, it was technically possible to enable multiple mice on Flash-based multimedia content on a PC with .NET installed. Using the MultiPoint SDK in C#, we developed a tool which can host Flash content and enables multiple mice to interact with Flash. We created two C# applications which run simultaneously: one hosts the Flash content and the other captures the multiple mouse clicks and informs the first application. The tool can support any number of mice, adding flexibility. This tool suffers from certain glitches – specifically some overhead in some settings – but it sets the stage for refined future iterations. Using this tool and the MultiPoint APIs, the application instantly recognizes how many mice are plugged into a PC and assigns a cursor to each, following which any of the students can cause the next action to take place on the screen by being the first to click. So even with existing interactive modules, such as multiple-choice questions, this creates a competitive environment (such as the MM-R "MultiMouse-Racing" discussed in [3]) which assigns a response to the first student to click. Implementing MM-R using the tool requires no change to be made to the existing content designed for single-user scenarios. Past work has shown that this simple change can significantly increase children's engagement with such content, though for higher-order learning outcomes more attention needs to be paid to other factors such as collaboration. This leads us to the second design approach.
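Before turning to redesign, it may help to make the hosting arrangement described above more concrete. The fragment below is a rough WinForms sketch of that kind of Flash-to-C# round trip, assuming the standard Shockwave Flash ActiveX wrapper (AxShockwaveFlash) is available on the machine; the multi-mouse capture itself is stood in for by a hypothetical OnMouseClicked callback, and the variable names and file name are invented, so this illustrates the general plumbing rather than the authors' actual tool.

// Sketch only: hosts a Flash movie in a WinForms form and relays
// per-mouse clicks into it, roughly as the paper describes.
// Requires the Shockwave Flash ActiveX wrapper (AxShockwaveFlashObjects).
using System;
using System.Windows.Forms;
using AxShockwaveFlashObjects;

public class FlashHostForm : Form
{
    private readonly AxShockwaveFlash flash = new AxShockwaveFlash();

    public FlashHostForm(string swfPath)
    {
        flash.Dock = DockStyle.Fill;
        Controls.Add(flash);
        flash.FSCommand += OnFsCommand;          // Flash -> C# messages
        Load += (s, e) =>
        {
            flash.Movie = swfPath;
            // Tell the movie how many mice are attached (hypothetical variable name).
            flash.SetVariable("numMice", "5");   // C# -> Flash
        };
    }

    // Flash calls fscommand("answered", ...) when a question is resolved.
    private void OnFsCommand(object sender, _IShockwaveFlashEvents_FSCommandEvent e)
    {
        Console.WriteLine("Flash says: " + e.command + " / " + e.args);
    }

    // Hypothetical hook: the mouse-capture process would call this with the
    // id of whichever mouse clicked first ("first click prevails").
    public void OnMouseClicked(int mouseId, int x, int y)
    {
        flash.SetVariable("lastClick", mouseId + "," + x + "," + y);
    }

    [STAThread]
    public static void Main()
    {
        Application.Run(new FlashHostForm("friendly_animals.swf"));
    }
}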
Redesigning Existing Multimedia
The bulk of our work was in "re-designing" existing educational multimedia content with the aim of minimizing changes to the pedagogical design of the content. This is a logical next step to simply "enabling" multiple mice, but it still allows one to re-use existing content. Such a re-design is not an easy scenario, since most content was originally designed with a single user in mind. More importantly, the full breadth of features and interactions that a multiple-input system could offer might not be exploitable. We reviewed various interaction types typical of children's multimedia, and redesigned existing 'English as a Second Language' (ESL) material using these. The existing content CD used, known as 'Friendly Animals', was an ESL CD being used in government primary schools for instructing seventh graders. The software mainly consisted of questions based on narrative content, with three types of interactive modules: standard Multiple Choice Questions (MCQ); Fill in the Blanks (FITB), with a blank to be filled from a list of clickable choices; and Ordering Questions, in which the user has to re-arrange a jumbled set of words or phrases into the correct order by clicking and moving the words or phrases with the cursor.
The tool we developed was able to capture multi-mice clicks, distinguish clicks by different mice, and inform Flash accordingly. For communication between Flash and C#, we used fscommand, a built-in function in Flash, and SetVariable, a member function of the ActiveX wrapper class in C#. The C# application also notified the Flash runtime about the number of mice connected to the computer.
The content was redesigned with no content additions or changes to narrative flow, to maintain minimal interference with the curricular material as well as to minimize the 'time-to-deployment'.
The interactive modules were redesigned building upon earlier research on what worked well [3, 7, 15–18] to attract and engage students, such as color differentiation, animated cursors and personalized scoring to reinforce on-screen identity. We designed six types of multiple-input interactive modules; each was used following one of the narrative segments, by re-designing and replacing the existing interactive single-player modules. These were as follows.
Racing Model
As the name suggests, this is the 'fastest-finger-first' model. Any student can answer the questions, and the child who clicks the correct answer first gets rewarded in the form of stars (colored the same as their cursors) (Figure 1). Any content that tests basic concepts can use the racing model, as the questions can be answered quickly. This model is competitive in nature, and from [3] we expect that it will be engaging; however, pedagogical efficacy is not guaranteed, and competition might not be the best way to go forward.
Figure 1. A multiple choice question with the Racing Model
Turn-Taking Model
The Turn-Taking model originates from the idea of 'one player at a time', as in traditional games like carrom, ludo, etc. In this model each student gets a chance, one after another. A question is targeted to a randomly selected student and only he/she is allowed to answer it. This 'instantaneous' interaction modality is similar in one sense to a one-PC-one-user scenario – i.e., at any given point there is only one active child – yet due to the pressure of 'getting on' with the game, it is not exactly a one-PC-one-user scenario. Also, past experience with multiple mice leads us to believe that even though a mouse might not be currently 'active', the fact that a student simply has a mouse in hand and a cursor onscreen makes him/her more involved.
Ideally, any content can be implemented using the turn-taking model, where questions are targeted to all the students sharing a single PC in a round-robin fashion. This model can 'mechanically' enforce a condition that all students get an equal opportunity to learn.
To implement the turn-taking modality in our test application, we need to emphasize which specific user's turn it is to answer. This is done by changing the font color of the question and answer text on-screen according to the color of the active cursor. The active cursor is the one belonging to the student whose turn it is to answer. To prevent the other students from answering, their mouse cursors are represented as a cross and disabled (they remain visible and can move, but cannot click on anything; Figure 2). Content developed using this model gives all the students a fair chance to participate, as all of them get an equal number of questions. This also makes it different from Inkpen's work [15], which experimented with turn-taking of sorts, but directed by the users themselves toggling a single on-screen cursor between two mice.
Figure 2. The game implemented using the Turn-Taking Model
Directed MCQ Model
In the Directed MCQ model for the multi-mouse scenario, a question is followed by a few option choices. Beside each option there are colored boxes corresponding to each cursor (Figure 3). Each student has to choose one of the options by clicking on the colored box having the same color as his/her cursor. When a student clicks on an option, the option gets checked. When all of them have chosen their option, the correct answer is revealed, showing who answered correctly, and the game advances. As all the students participate at the same time, they remain engaged throughout. This model allows each student to exercise his/her choice independently.
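To make the turn-taking and directed-MCQ behaviour described above concrete, here is a small, self-contained sketch of the kind of state machine that could sit behind them: a round-robin pointer decides whose cursor is active, clicks from inactive mice are ignored, and in the directed-MCQ case the question only resolves once every mouse has registered a choice. Class and method names are illustrative and not taken from the authors' tool.

using System.Collections.Generic;
using System.Linq;

// Illustrative turn-taking / directed-MCQ logic for N mice sharing one screen.
public class SharedQuestionSession
{
    private readonly int mouseCount;
    private int activeMouse;                          // whose turn it is (turn-taking)
    private readonly Dictionary<int, int> choices = new Dictionary<int, int>();

    public SharedQuestionSession(int mouseCount) { this.mouseCount = mouseCount; }

    // Turn-taking model: only the active mouse may answer; the turn then rotates.
    public bool TryAnswerTurnTaking(int mouseId, int option, int correctOption)
    {
        if (mouseId != activeMouse) return false;     // inactive cursors are "crossed out"
        bool correct = option == correctOption;
        activeMouse = (activeMouse + 1) % mouseCount; // round-robin to the next child
        return correct;
    }

    // Directed MCQ model: every mouse must choose before the answer is revealed.
    public bool RegisterChoice(int mouseId, int option)
    {
        choices[mouseId] = option;                    // later clicks overwrite earlier ones
        return choices.Count == mouseCount;           // true => reveal and advance
    }

    public IEnumerable<int> MiceWhoAnsweredCorrectly(int correctOption) =>
        choices.Where(c => c.Value == correctOption).Select(c => c.Key);
}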
Figure 3. The game implemented using the Directed MCQ Model
Voting Model
The Voting model originates from the concept of voting, where the opinion of each individual is considered and the final decision is based on the majority, or on unanimity [3]. In this model, every student exercises his/her choice in answering a question, and a suitable action takes place depending on the option selected by the majority. It is useful in situations where varied views are possible and a decision is to be taken collectively, taking each student's opinion into account. In fact this extends to interaction settings beyond gameplay. For instance, decisions affecting the overall application, like moving to the next game, playing the same game again, exiting the game, etc., can be implemented using the voting model. In these cases, the game pauses and the voting screen appears (Figure 4).
Figure 4. Implementation of the Voting Model for decisions
The game moves forward only when each student has chosen his/her option. If a student doesn't select any option, the game doesn't proceed. To overcome this problem, a timer was introduced (Figure 4). The timer is set to a specified limit and starts as soon as the voting starts. The users are allowed to vote within that time span. If any of the students do not vote within the given time, the game resumes from the same point.
Unity Model
As the name suggests, in this model all the students need to collaborate among themselves to answer the question – collaboration is enforced. This model encourages sharing of knowledge among the students and also develops the spirit of teamwork. Content where the answer requires proper selection and arrangement of options can be comfortably implemented using this model.
Figure 5. The game implemented using the Unity Model
This model was implemented for a game which involves rearranging jumbled words to make meaningful sentences (Figure 5). The game was redesigned for the multiple-mice scenario such that each of the five jumbled words is randomly colored with one of the five cursor colors. A student can only click on the word corresponding to his/her cursor color. This brings that word into a sentence queue. To reach the next level, all the students need to collaborate amongst themselves and click on the words in a particular sequence to place them in the sentence queue, so as to form a meaningful sentence.
Split-Screen Model
The split-screen model has also been trialed in other research and found to be highly effective in increasing collaboration without losing engagement and competitiveness within a group [8]. In our implementation, the game screen was split into two halves so that two students (or two groups) can play simultaneously in their respective halves. In the game developed using this model, two random teams were formed by dividing the students (Figure 6). Each team was allotted one part of the screen, and teams can only answer the questions appearing in their part. The team which answers first gets rewarded. The game moves forward only after the questions in both halves have been answered correctly. This encourages team effort amongst the students, leading to a higher level of interaction, along with incorporating competitive incentives. Hence, it satisfies the three-fold objective of engagement, collaboration and learning.
Figure 6. The game implemented using the Split-Screen Model
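As a sketch of how the Voting model's behaviour described above – everyone votes, the majority wins, but the game must not stall forever – might be expressed, the fragment below tallies one vote per mouse and resolves either when all mice have voted or when a timeout elapses, falling back to whatever majority exists among the votes received. This is an illustration in C#, not the authors' implementation.

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative majority-vote logic with a timeout, as in the Voting model.
public class VotingRound
{
    private readonly int mouseCount;
    private readonly TimeSpan timeout;
    private readonly DateTime started = DateTime.UtcNow;
    private readonly Dictionary<int, int> votes = new Dictionary<int, int>();

    public VotingRound(int mouseCount, TimeSpan timeout)
    {
        this.mouseCount = mouseCount;
        this.timeout = timeout;
    }

    public void CastVote(int mouseId, int option) => votes[mouseId] = option;

    // The round is over when everyone has voted or the timer has run out.
    public bool IsFinished =>
        votes.Count == mouseCount || DateTime.UtcNow - started > timeout;

    // Returns the winning option, or null if nobody voted in time
    // (in which case the game simply resumes from the same point).
    public int? Decide()
    {
        if (votes.Count == 0) return null;
        return votes.GroupBy(v => v.Value)
                    .OrderByDescending(g => g.Count())
                    .First()
                    .Key;
    }
}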
OBSERVATIONS
Our usage examination here is a preliminary qualitative participant observation which helps the iterative design process and gives us insight into the right questions to ask. These observations set the stage for controlled experimentation. At the time of publication, we had observed a total of 30 children using the material over 6 groups. Each of the tests took place in a real-world setting, at a low-income corporation school which was participating in the state computer-aided-learning program. The observations were all done during school hours inside the computer lab, with 5 children per computer at the time of the test. No specific instructions related to multi-mouse use were given to the students, in order to observe how quickly they became acquainted with the new scenario.
Figure 7. Preliminary field tests with school-children
Unchanged Existing Content Enabled with Multiple Mice
We tested first the basic implementation of simply adding multiple-mice capability to existing single-user CDs. This was a particularly important test because, in terms of a real-world usage scenario, it offers the cheapest and most immediately deployable option. Trials showed that students were more engaged due to the competitive aspect, but this in turn made control of the interaction somewhat ungainly, since any child could move on to a 'next' screen. An encouraging result is that a share of the clicking was distributed among all the users, which indicated fairly wide involvement. However, during the MCQ sessions, the clicking itself was based on a speed-based competitive strategy rather than one oriented toward thinking through the options. In this strategy, children hoped to score in the game through lucky clicks, validating [3], so there was no real need to actually build content knowledge. On the whole, our observations suggest that while there is an encouraging increase in engagement, it may take some hand-holding and getting used to individual mice for children to start using them effectively.
Racing Model in Multimouse Enabled Content
The students played the game with enthusiasm, as they got rewarded for answering correctly. The games implemented using the racing model proceeded quite fast, as students rushed to answer the questions without discussing among themselves. A non-collaborative environment developed in such situations. Moreover, in one particular case, a lagging child eventually lost interest in the game and sat idle.
Turn-Taking Model in Multimouse Enabled Content
Children easily understood this model as it was similar to the one-PC-one-child scenario which they were already aware of. Since each question was targeted to a randomly selected student, all the students remained attentive, waiting for their turn. In cases where a student got stuck on a question, others helped him/her answer it, encouraging discussion among students. Children remained idle after their turns.
Directed MCQ Model in Multimouse Enabled Content
As all the students have to choose an option for each of the questions, everyone remained involved throughout the game. In a few cases, the lagging child was observed following the leading child.
Voting Model in Multimouse Enabled Content
As each individual's decision is taken into consideration, students felt responsible and participated actively. It was seen that the leading student compelled others to choose an option of his/her choice. The lagging child was forced by others to be quick.
Unity Model in Multimouse Enabled Content
In this model, the maximum amount of discussion was observed. In one case, the leading girl not only formed the complete sentence, but even recited it, so that the answer was known to all. Although the sample size is small, girls seemed to be more cooperative than boys. Since the students have to discuss in order to form the correct answer, this model consumed a lot of time, but we postulate that the discussion would lead to better learning.
Split Screen Model in Multimouse Enabled Content
Children learned to play in a team and cooperated with their teammates to win the game, as the answers could be framed only after discussion within the team. Since the game doesn't move forward until the questions on both halves of the screen are answered correctly, the first team to finish remained idle till the second team completed.
General Observations for Multimouse Enabled Content
Apart from these specific observations, in general when children were forced to wait due to the slow pace of others, they got frustrated. This was more common among boys than girls. In a few cases, the leading student forcefully tried to answer for others by taking over their mice. Moreover, a few students indulged in random clicking. These observations open up possible avenues for future developments (Table 1) for multiple-mice interactions.
TABLE 1. Interaction models and related characteristics
Racing Model – Application: increasing engagement, competition. Risks: gaming the system through rapid clicks. Future work: negative marking to discourage random clicking.
Turn-Taking Model – Application: creating equitable access. Risks: decreased engagement of non-active children; resentment towards slow movers. Future work: artificial intelligence to push questions to children performing less well than others.
Directed MCQ Model – Application: simultaneous and individual participation. Risks: disinterest among faster-finishing children. Future work: rewards based on the timing of clicking.
Voting Model – Application: encouraging consensus. Risks: free riding, contrived agreement, no decision-making if one student doesn't answer. Future work: decision-making based on received votes.
Split Screen Model – Application: balancing collaboration with engagement. Risks: complicated interface. Future work: using scores from previous rounds to form balanced teams.
Unity Model – Application: fostering group responsibility. Risks: slower interaction. Future work: setting time limits for decision-making.
CONCLUSION
We started this work under the assumption that shared computing is a likely direction for the future given the cost of technology in the developing world. This work is meant to serve as a reference for researchers looking at simultaneous shared computer use by offering ways in which such interactions can be designed, as well as discussing the pros and cons of these designs. We find in our tests that children easily adapt from one type of interaction model to another with limited or no explanation. For multiple mice to be used effectively in currently prevalent learning scenarios, no single design, but rather a combination, is likely to be used. One finding consistent through many of the trials was the importance of expanding the scope of artificial intelligence in realizing the true benefits of multi-mouse. With the increase of interest in multiple mice both within industry and in policy circles, real-world deployments of such technology are highly likely in the near future. Interaction designers are likely to play a critical role in the development of shared-screen technology going forward. This study and others like it highlight some of the key issues for researchers to iteratively discuss.
REFERENCES
1. Pawar, U.S., Pal, J. and Toyama, K. Multiple Mice for Computers in Education in Developing Countries. In Proc. IEEE/ACM ICTD 2006, pp. 64-71.
2. Pal, J., Pawar, U.S., Brewer, E.A. and Toyama, K. The Case for Multi-User Design for Computer Aided Learning in Developing Regions. In Proc. WWW 2006, pp. 781-789.
3. Pawar, U.S., Pal, J., Gupta, R. and Toyama, K. Multiple Mice for Retention Tasks in Disadvantaged Schools. In Proc. CHI 2007, ACM Press (2007), pp. 1581-1590.
4. Kam, M., Ramachandran, D. and Canny, J. MILLEE: Mobile and Immersive Learning for Literacy in Emerging Economies. CHI 2007 Workshop, 2007.
5. Medhi, I. and Toyama, K. User-Centered Design and International Development. CHI 2007 Workshop, 2007.
6. Schwartzman, Y. and Parikh, T.S. Establishing Relationships for Designing Rural Information Systems. CHI 2007 Workshop, 2007.
7. Stewart, J., Bederson, B.B. and Druin, A. Single Display Groupware: A Model for Co-present Collaboration. In Proc. CHI 1999, ACM Press (1999), pp. 286-293.
8. Field/mice project. http://groups.ischool.berkeley.edu/fieldmice/
9. The Microsoft Windows MultiPoint Software Development Kit (SDK). http://www.microsoft.com/downloads/details.aspx?FamilyID=A137998B-E8D6-4FFF-B8052798D2C6E41D&displaylang=en
10. Parikh, T., Ghosh, K. and Chavan, A. Design Studies for a Financial Management System for Micro-credit Groups in Rural India. In Proc. Conference on Universal Usability, 2003.
11. Patra, R. et al. Usage Models of Classroom Computing in Developing Regions. In Proc. IEEE/ACM ICTD 2007, December 2007.
12. Guo, S., Falaki, M.H., Oliver, E.A., Ur Rahman, S., Seth, A., Zaharia, M.A. and Keshav, S. Very Low-Cost Internet Access Using KioskNet. ACM SIGCOMM Computer Communication Review, October 2007, pp. 95-100.
13. Jhunjhunwala, A., Ramamurthi, B. and Gonsalves, T.A. The Role of Technology in Telecom Expansion in India. IEEE Communications Magazine, 36 (11), pp. 88-94, 1998.
14. Dillenbourg, P. and Traum, D. Does a Shared Screen Make a Shared Solution? In Proc. of the 1999 Conference on Computer Support for Collaborative Learning, 1999.
15. Inkpen, K., Booth, K.S., Gribble, S.D. and Klawe, M. Give and Take: Children Collaborating on One Computer. Short papers, CHI 1995, ACM Press (1995), pp. 258-259.
16. Bricker, L.J., Tanimoto, S.L., Rothenberg, A.I., Hutama, D.C. and Wong, T.H. Multiplayer Activities that Develop Mathematical Coordination. In Proc. CSCL (1995), pp. 32-39.
17. Shoemaker, G.B.D. Single Display Groupware Research in the Year 2000. TR2001-1 (2001), Simon Fraser University.
18. Benford, S. et al. Designing Storytelling Technologies to Encourage Collaboration Between Young Children. In Proc. CHI 2000, ACM Press (2000), pp. 556-563.
Opportunities for Intelligent Interfaces Aiding Healthcare in Low-Income Countries

Brian DeRenzi (†), Krzysztof Z. Gajos (†), Tapan S. Parikh (‡), Neal Lesh (*), Marc Mitchell (*), Gaetano Borriello (†)

(†) University of Washington, Box 352350, Seattle, WA 98195-2350. {bderenzi, kgajos, gaetano}@cs.washington.edu
(‡) School of Information, University of California, Berkeley, Berkeley, CA 94720-4600. [email protected]
(*) D-Tree International, 52 Whitney Tavern Road, Weston, MA 02943. [email protected], [email protected]
ABSTRACT
Child mortality is one of the most pressing health concerns – almost 10 million children die worldwide each year before reaching their fifth birthday, mostly in low-income countries. To aid overburdened and undertrained health workers, the World Health Organization (WHO) and United Nations Children’s Fund (UNICEF) have developed clinical guidelines, such as the Integrated Management of Childhood Illness (IMCI), to help with the classification and treatment of common childhood illnesses. To help with deployment, we have developed an electronic version (e-IMCI) that runs on a PDA. From July to September 2007, we ran a pilot of e-IMCI in southern Tanzania. The system guides health workers step-by-step through the treatment algorithms and automatically calculates drug doses. Our results suggest that electronic implementations of protocols such as IMCI can reduce training time and improve adherence to the protocol. They also highlight several important challenges, including varying levels of education, language and expertise, which could be most adequately addressed by implementing novel intelligent user interfaces and systems.
Author Keywords
IMCI, Tanzania, child health, medical protocols, intelligent tutoring, user experience, adaptive interfaces.
ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.
INTRODUCTION
To address the public health issues in low-income countries and aid overburdened, undertrained health workers, the World Health Organization (WHO) and other non-profits have developed medical algorithms to quickly classify and suggest treatment for major health concerns. The Integrated Management of Childhood Illness (IMCI) [17] program was developed specifically to deal with high child mortality in low-income countries, where almost 10 million children under the age of five die each year [16]. The IMCI treatment algorithm guides health workers through a set of investigations and questions that lead to a classification and recommended treatment. A multi-country evaluation of IMCI, which included a study across four districts in rural Tanzania, found that the correct use of IMCI combined with evidence-based planning leads to rapid gains in child survival rates [1]. Despite this, uptake in Tanzania has been disappointing for a number of reasons, including a lack of sufficient supervision and the cost of training health workers. Although nearly all districts in the country have started to train front-line health staff in the use of IMCI, many health workers remain inadequately trained. For a sick child attending a rural health facility in Tanzania, the chance of being seen by an IMCI-trained person is low.
To aid with the deployment of IMCI, we have developed an electronic version (e-IMCI) that runs on a PDA and guides health workers step-by-step through the treatment algorithm. We piloted the use of our software with four clinicians at a dispensary (small health facility) in rural Tanzania from July to September 2007 to gather quantitative evidence on the effect on adherence to the protocol and qualitative data from clinicians about their impressions of the software.
In this paper, we present observations from our field study of e-IMCI and present opportunities for e-IMCI to adapt to user experience, to be used for training, and to automatically generate fast, usable interfaces for a variety of platforms.
IMCI
IMCI is a multi-faceted approach to addressing children’s health in a resource-constrained environment. Originally developed by the WHO, UNICEF and other partners in 1992, the system integrates several protocols to address the most common childhood conditions. According to the WHO, 70% of childhood deaths worldwide are caused by pneumonia, diarrhoea, malaria, measles, malnutrition or a combination of these, all of which are covered by IMCI [17].
At the health facility level, IMCI is a medical algorithm currently implemented as an extensive training program and paper chart booklets used in the clinics to aid practitioners at the point of care. To make the algorithm easier to follow, IMCI divides treatment into five major symptoms: cough, diarrhea, fever, ear problems and malnutrition. Referring to a paper chart, the practitioner asks the caregiver (typically the mother or another family member) questions and performs investigations to navigate through the decision tree. The questions are usually simple, including questions about age, weight, and how long the child has had a particular symptom. The investigations include various medical tasks like taking body temperature or measuring the number of breaths per minute.
e-IMCI
To aid practitioners in navigating through the algorithm, we developed e-IMCI, which runs on Windows Mobile. The software presents one question or investigation at a time and uses the answer to determine the next step. By removing the page-turning involved with the chart books, and by automatically calculating drug doses, we believe that the electronic delivery of IMCI can be as fast as current practice.
Figure 1: The e-IMCI interface.
Figure 1 shows the e-IMCI software. Based on familiar instant messaging and chat programs, the interface presents the current question at the bottom of the screen. After a question is answered, a short version of it is presented above the next question, allowing users to review previous answers. This provides context, allowing users to review their previous answers and understand how the system arrived at the next question. Back and next buttons are supplied for correcting errors in previous answers. This interface was also inspired by the DiamondHelp system for collaborative home applications [13].
The interface for e-IMCI is based on work being done in South Africa, where a screening algorithm is being developed to help counselors in HIV clinics determine which patients need to be seen by doctors and which are healthy enough to be sent home with their drugs [8]. Preliminary results there are also encouraging.
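The flow just described – one question or investigation at a time, the answer selecting the next step, and drug doses computed automatically – maps naturally onto a small decision-tree data structure. The sketch below shows one way such a protocol step could be represented; the field names, the weight-banded dose rule and the toy question are invented for illustration and are not taken from the IMCI chart or the authors' software.

using System;
using System.Collections.Generic;

// Illustrative protocol-step model: each answer points to the next step,
// and leaf steps can carry a classification plus a weight-based dose rule.
public class ProtocolStep
{
    public string Prompt;                                    // question or investigation
    public Dictionary<string, ProtocolStep> Next =           // answer -> next step
        new Dictionary<string, ProtocolStep>();
    public string Classification;                             // set on leaf steps
    public Func<double, string> DoseForWeightKg;              // set on leaf steps

    public ProtocolStep Answer(string answer) => Next[answer];
}

public static class ProtocolDemo
{
    public static void Main()
    {
        // A toy two-step fragment, NOT real IMCI content.
        var treat = new ProtocolStep
        {
            Classification = "Example classification",
            DoseForWeightKg = kg => kg < 10 ? "half tablet" : "one tablet",
        };
        var ask = new ProtocolStep { Prompt = "Does the child have a cough?" };
        ask.Next["yes"] = treat;
        ask.Next["no"] = new ProtocolStep { Classification = "No action needed" };

        var step = ask.Answer("yes");
        Console.WriteLine(step.Classification + ": " + step.DoseForWeightKg(8.5));
    }
}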
FIELD STUDY
To pilot e-IMCI, we studied its use with real patients in a dispensary in Mtwara, Tanzania [3]. Four clinicians participated in the full study, which had four major parts. First, clinicians were interviewed to determine their level of experience with computing devices and their preconceived notions about using e-IMCI. The software was introduced at this time so clinicians could comment on how they thought it might or might not help them.
Next, the clinicians were observed following IMCI according to their current practice. The first author took notes during these sessions while a Tanzanian research colleague acted as a translator for the doctor-patient interactions (mostly in Swahili) and as a supervising clinician. The Tanzanian researcher filled out a check sheet to record which investigations were performed and which questions were asked.
After gathering this baseline data on current practices, the clinicians were observed using the e-IMCI software. Again, two researchers took notes. Finally, the clinicians were interviewed to compare their experience with their preconceived notions. They were also asked about long-term use and the potential for deployment of the system.
Current Practice in Tanzania
While the ideal case is that the practitioner follows the IMCI paper chart booklet for every patient under the age of five, we found that this is not the case. In our study we observed that many clinicians chose to work without following the paper charts.
Figure 2: e-IMCI being used by a clinician with a patient.
All four of the clinicians that we worked with mentioned the duration of the patient visit as being important. As one clinician put it, they do not like to follow the chart booklet "because it takes so long" to flip the pages and follow the algorithm. One of the more experienced clinicians stated that "experience is faster, but [we] can forget some things." The long line of patients queuing outside the room added to the need for quick visit times. We want a device to help clinicians navigate through the algorithm, but it is clear from our experience that it needs to be at least as fast as current practice, where the chart booklet is rarely referenced, as well as having some additional value that will encourage its continued use (e.g., automatically producing government-mandated monthly reports).
Clinician Response
The clinicians unanimously cited the interface as easy to use. One particularly thorough clinician enjoyed being able to review all of his previous answers and would routinely check to make sure he was entering correct data. Figure 2 shows a clinician using the system with a patient during our pilot study.
All of the clinicians cited using e-IMCI as being faster than following the chart booklet, but not quite as fast or flexible as current practice, where care is often delivered from memory instead of by explicitly following the protocol. One clinician said that, if available, she would "use a combination" of current practice and the e-IMCI software and would never need to refer to the book. We were encouraged by the positive feedback from the clinicians, but we feel that we can improve the speed and efficiency of the interface to encourage them to use the software for every patient.
Beyond the Novelty Effect
To achieve long-term use beyond the novelty stage, we propose that two things are required: efficiency and significant additional value. Based on our experience, if the software significantly increases the length of patient visits, it will be put down and only occasionally referenced, just like the chart booklet. The responses we have collected also suggest that experienced e-IMCI users will want flexibility in how to structure and order their interactions with the patient. However, the system must also deliver additional value to the clinician in order to justify the overhead of purchasing and maintaining the hardware and learning to use the system. By providing longitudinal records, summary reports and a resulting higher standard of care, the system can provide a compelling value proposition to clinicians and their supervisors.
OPPORTUNITIES FOR INTELLIGENT INTERFACES
We believe that intelligent user interface research can be used to increase the uptake and effectiveness of this software for day-to-day use, for training new clinicians, and for implementing more complex and dynamic protocols.
Adapting to User Experience
None of the five clinicians that we worked with had any previous experience with PDAs or computers, though all had used a mobile phone at some point in the past. The clinicians were able to quickly get used to the user interface and the stylus. We demonstrated the device to all clinicians while we were implementing the IMCI protocol. Two clinicians used the software during our pre-trials, where we discovered as many major bugs as possible. After finishing our programming, we trained one clinician and answered questions while she used the system. After about 10 to 20 minutes stepping through the software, she took it upon herself to demonstrate the system to her colleagues.
Figure 3: The clinicians training each other with e-IMCI.
As mentioned earlier, clinicians said they would prefer to use a combination of providing treatment from memory and using the e-IMCI software. When using memory, care is user-driven, delivered by user initiative; that is, the clinician decides what investigations to perform and in what order. When using e-IMCI, the experience is system-driven, by system initiative, meaning that the clinicians lose a certain amount of flexibility. We propose providing two interfaces to our system that will not only make the software faster to use, but also lead to improved acceptance by the clinicians. The first mode, guided mode, would work like the current interface, with a system-driven approach: the software would be the sole determinant of the order of questions. The second mode, expert mode, would be almost entirely user-driven: practitioners could choose the investigation they wish to perform, and when they feel that they have enough data, they could ask the software for a classification and treatment. The tool would revert back to guided mode for asking any subsequent questions that might be needed to make this determination. This would provide a mixed-initiative scenario, where the user is in charge but the system makes sure that the quality of care is not jeopardized.
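A minimal way to picture the guided/expert split described above is as two strategies over the same protocol model: in guided mode the system picks the next step, while in expert mode the clinician picks investigations freely and the system only interjects when it still needs answers before it can classify. The sketch below is a hypothetical outline of that mixed-initiative loop, not the actual e-IMCI code; all names are invented.

using System.Collections.Generic;
using System.Linq;

// Illustrative mixed-initiative loop: guided mode is system-driven,
// expert mode is user-driven until a classification is requested.
public enum InteractionMode { Guided, Expert }

public class ConsultationSession
{
    private readonly List<string> pendingQuestions;            // still required by the protocol
    private readonly Dictionary<string, string> answers = new Dictionary<string, string>();

    public ConsultationSession(IEnumerable<string> requiredQuestions)
    {
        pendingQuestions = new List<string>(requiredQuestions);
    }

    public void Record(string question, string answer)
    {
        answers[question] = answer;
        pendingQuestions.Remove(question);
    }

    // Guided mode: the system chooses what to ask next (null when done).
    public string NextGuidedQuestion() => pendingQuestions.FirstOrDefault();

    // Expert mode: the clinician asks in any order; when a classification is
    // requested, the system falls back to guided mode for whatever required
    // questions remain unanswered.
    public IEnumerable<string> QuestionsStillNeededForClassification() => pendingQuestions;
}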
Training Support
One of the clinicians we worked with was an IMCI instructor. He found the software "easy to learn [and] easy to use." He felt that by using the e-IMCI software instead of the paper-based protocol, people could be trained faster than with current training. If health workers were previously familiar with PDAs, he felt that he could cut training time by 50% (from the current 11-14 days). In future work, we plan to expand e-IMCI to include a training module. The Novartis Foundation for Sustainable Development, in conjunction with the WHO, has developed a computer-based training course for IMCI [11]; the course trains health workers to use the paper chart booklets. Within e-IMCI we can provide contextualized feedback and guidance while learning to use the system. Previous work has shown intelligent tutoring systems (ITS) to be effective learning tools for students [7]. There has been work on using ITS for medical learning [15], and work using this approach on mobile devices [14]. However, the context of health facilities in rural Tanzania presents a different set of challenges. Not only can language, domain expertise and cultural diversity vary widely among health workers, but the available infrastructure and user experience can present novel difficulties in introducing computing technology. In Tanzania, the official language of the people is Swahili, but the language of the government is English. As a result, all health reporting must be done in English. During an informal tour of health facilities in the Mtwara region, the first author observed health workers with a wide range of English language skills. Some were able to converse while others knew only enough to do the required reporting.
Tanzania has at least nine different levels of health professions, from maternal child health aid, which requires a 1-year course after secondary school, to full medical doctor, which requires an additional 5 years of study after completing secondary school. Over half of these professions qualify to be trained to use IMCI in their daily work. Within each of these groups there is a varying range of technical experience and expertise. All the clinicians we worked with this summer either owned or had used a mobile phone. None had any previous experience with PDAs or computers. We expect that almost every potential user will have experience with mobile phones. On the other end of the spectrum, the more technically savvy will already be familiar with email, web browsing and other basic computer tasks. Finally, Tanzania is made up of over 120 different tribes occupying 26 different regions: 21 on the mainland and 5 on Zanzibar. People often speak their tribal language as their mother tongue, in addition to Swahili, which they learn in school. This wealth of different cultures presents another set of issues. For example, on a previous trip to Tanzania doing health surveys, the first author learned that it is not socially acceptable to ask the Maasai people about the causes of death of their ancestors. These types of cultural differences are present throughout the country and must be carefully considered. Beyond the traditional training scenarios, we can imagine that training could be more personalized for the user. If it is a refresher course, analysis of the use of e-IMCI in the field could be used to influence the re-training received. Similarly, as protocols are updated (in Tanzania, IMCI is updated annually), the software could automatically generate initial and re-training scenarios.
Deploying Protocols
Our studies and those of D-Tree in South Africa have demonstrated that software interfaces for medical protocols can improve the accuracy of delivery without a dramatic decrease in speed, and offer the potential for shorter training times when compared to the paper-based versions. However, it is inefficient to spend weeks reprogramming just to deploy a new algorithm. Ideally, medical professionals would be able to design, update and deploy new protocols without any programming skills. To make the process of deploying and updating software-based versions of these protocols scalable, we believe that the design of the protocol itself, the paper "interface" and the software versions should be integrated into a single process. Recent advances in automatic user interface generation [5, 9] make this approach feasible. As protocols are developed, the software can automatically generate versions for different platforms. In extremely rural locations with no power, we may still need to use paper charts (which could also be automatically generated from an abstract model of the protocol), but in the large cities of low-income countries, it may be feasible to use a laptop or desktop machine. We anticipate that the majority of health facilities will exist somewhere between these two extremes, using mobile devices like PDAs and smart phones to deliver medical protocols. The paper charts could also be used as a backup in case devices are lost, stolen, broken or unable to be charged.
Figure 4: Protocol creation for different devices and interfaces.
Further, for each of these devices, we can provide three interfaces: a tutor mode for training, a guided mode to follow the system-driven model and an expert mode to follow the user-driven model. Software to automatically create usable interfaces for these medical protocols would be used widely. With the large number of protocols and clinical algorithms currently being used, three different skill levels and at least four different device targets, the amount of work required to deploy or update a protocol grows rapidly. Figure 4 illustrates this point, with the protocol originally being designed on a desktop machine and deployed on a variety of devices with automatically generated interfaces. An added benefit of this approach might be improved consistency among interfaces for different protocols [10], which could further reduce training time.
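The "design the protocol once, generate every interface from it" idea can be read as a simple separation between an abstract protocol description and per-platform renderers. The outline below is one hypothetical shape for that separation (a PDA renderer, a printable chart renderer, and so on); it is a sketch of the approach advocated above, not of any existing tool such as SUPPLE or Huddle, and all type names are invented.

using System.Collections.Generic;

// Illustrative split between an abstract protocol description and the
// platform-specific interfaces generated from it.
public class ProtocolDescription
{
    public string Name;
    public List<string> Steps = new List<string>();   // ordered questions/investigations
}

public interface IProtocolRenderer
{
    // Produces a deployable artifact for one target: a PDA form,
    // a desktop screen, or a printable paper chart.
    string Render(ProtocolDescription protocol);
}

public class PdaFormRenderer : IProtocolRenderer
{
    public string Render(ProtocolDescription p) =>
        "one-question-per-screen form with " + p.Steps.Count + " steps";
}

public class PaperChartRenderer : IProtocolRenderer
{
    public string Render(ProtocolDescription p) =>
        "printable chart booklet listing all " + p.Steps.Count + " steps";
}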
CONCLUSION
Applying information and communication technology in low-income countries, as opposed to developed countries, presents a unique and challenging set of constraints. As presented in this paper, the delivery of medical algorithms at the point of care in health facilities in rural Tanzania provides a huge opportunity for intelligent user interfaces. In future work we plan to continue the development and deployment of e-IMCI in Tanzania, but we also hope to go beyond IMCI to include more complex protocols. Similar projects, such as the HIV screening algorithm in South Africa, will also benefit from further research in user interface design. More generally, job aids for undertrained, over-burdened knowledge workers in low-income countries provide a rich set of opportunities for user interface research.
ACKNOWLEDGMENTS
We would like to thank D-tree International [2], Dimagi Inc. [4], the Ifakara Health Research & Development Centre [6], the clinicians in Mtwara and the Ministry of Health in Tanzania for their involvement and support of this project.
REFERENCES
1. Armstrong Schellenberg, J., Bryce, J., de Savigny, D., Lambrechts, T., Mbuya, C., Mgalula, L. and Wilczynska, K. The effect of Integrated Management of Childhood Illness on observed quality of care of under-fives in rural Tanzania. Health Policy and Planning (2004), 110.
2. D-tree International. http://www.d-tree.org/
3. DeRenzi, B., Lesh, N., Parikh, T.S., Sims, C., Mitchell, M., Maokola, W., Chemba, M., Hamisi, Y., Schellenberg, D. and Borriello, G. e-IMCI: Improving Pediatric Health Care in Low-Income Countries. In Proc. CHI 2008, ACM Press (2008), to appear.
4. Dimagi Inc. http://www.dimagi.com/
5. Gajos, K. and Weld, D.S. Supple: Automatically Generating User Interfaces. In Proc. IUI 2004, ACM Press (2004), 93-100.
6. Ifakara Health Research & Development Centre. http://www.ihrdc.org/
7. Koedinger, K.R., Anderson, J.R., Hadley, W.H. and Mark, M.A. Intelligent Tutoring Goes To School in the Big City. International Journal of Artificial Intelligence in Education (1997), 30-43.
8. Mitchell, M., Lesh, N., Crammer, H., Fraser, H., Haivas, I. and Wolf, K. Improving Care – Improving Access: The Use of Electronic Decision Support with AIDS Patients in South Africa. International Journal of Healthcare Technology and Management. In process.
9. Nichols, J., Myers, B.A., Higgins, M., Hughes, J., Harris, T.K., Rosenfeld, R. and Pignol, M. Generating Remote Control Interfaces for Complex Appliances. In Proc. UIST 2002, ACM Press (2002), 161-170.
10. Nichols, J., Rothrock, B., Chau, D.H. and Myers, B.A. Huddle: Automatically Generating Interfaces for Systems of Multiple Connected Appliances. In Proc. UIST 2006, ACM Press (2006), 279-288.
11. Novartis Foundation for Sustainable Development. ICATT – Computer-based learning program for health professionals in developing countries. (2007). http://www.novartisfoundation.org/mandant/apps/publication/detail.asp?MenuID=272&ID=614&Menu=3&Item=46.3&pub=134
12. Parikh, T.S. Designing an Architecture for Delivering Mobile Information Services to the Rural Developing World. Ph.D. Dissertation, University of Washington, 2007.
13. Rich, C., Sidner, C., Lesh, N., Garland, A., Booth, S. and Chimani, M. DiamondHelp: A Collaborative Interface Framework for Networked Home Appliances. IEEE International Conference on Distributed Computing Systems Workshops, IEEE (2005), 514-519.
14. Sharples, M., Corlett, D. and Westmancott, O. The Design and Implementation of a Mobile Learning Resource. Personal and Ubiquitous Computing, Springer London (2002), 220-234.
15. Suebnukarn, S. and Haddawy, P. A Collaborative Intelligent Tutoring System for Medical Problem-Based Learning. In Proc. IUI 2004, ACM Press (2004), 14-21.
16. UNICEF. Child deaths fall below 10 million for first time. (2007). http://www.unicef.org/media/media_40855.html
17. World Health Organization. Child and Adolescent Health and Development: Integrated Management of Childhood Illness. http://www.who.int/child-adolescenthealth/integr.htm
The Need for In Situ, Multidisciplinary Application Design and Development in Emerging Markets

Oscar Enrique Murillo
Microsoft Corporation, One Microsoft Way, Redmond, WA 98052
[email protected]

Mary P. Czerwinski
Microsoft Research, One Microsoft Way, Redmond, WA 98052
[email protected]
ABSTRACT
In this paper we describe our thoughts and experiences around the need for in situ, multidisciplinary design research in developing countries. Our position is that a partnership with educational institutions in the local regions, from a design perspective, could be quite valuable. Through ethnographic, longitudinal partnerships leveraging the educational system, we believe better design solutions for these regions would be possible.

Author Keywords
Emerging markets, design, ethnography

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

INTRODUCTION
When it comes to developing applications for users in emerging markets, one of the main concerns that designers working for companies like Microsoft have is a lack of longitudinal research and participatory design in these contexts. In fact, most application designers working in technology companies are responsible for designing tools that are intended to be leveraged by all users, anytime and anywhere. What is frustrating about designing these global solutions is that the lack of region-specific, deep cultural awareness and understanding renders many of our applications useless in these regions, especially in rural subcultures, e.g., among farmers or fishermen. Our products tend to cater to middle- to upper-class information workers and consumers. When doing research in emerging markets, the familiar metaphors and graphical user interface standards tend not to hold sway with people who have been schooled in different ways, via different standards, or not schooled at all. Middle-to-upper-class consumers and information workers, whether in New York City or Bangalore, India, are more likely to understand the metaphors of attachments and files, the desktop and folders, whereas in lower socio-economic, rural contexts this is clearly not the case. In this day and age, where cost is a priority and one of the main deterrents for some lower socio-economic class users who might adopt our technology, we believe there are other considerations that must be taken into account with regard to enabling the marginalized to be empowered by this very technology. For instance, the metaphors that most of us have learned in order to use computing technology may not be readily accessible. After all, what does the term attachment mean to a Peruvian fisherman?

OUR POSITION
Our position is that technology companies should consider extending the reach of their design staff via participatory design efforts in the regions their products will eventually be adopted in. One cost-effective way of doing this is by leveraging the educational system. Unfortunately, the use of educational systems for expanded design efforts is typically limited to computer science, not design, social science or human-computer interaction. We believe this to be a huge missed opportunity. What typically happens when technology companies partner with computer science departments in emerging regions is that we get excellent prototypes back, but mostly it is really excellent plumbing. These products and services can be perceived as unwieldy power tools that leverage user interface metaphors developed for completely different audiences than those actually being targeted in that region.

An example that actually occurred in Seattle in 2005 involved one of the authors volunteering at the Seattle Public Library, teaching basic internet usage skills to migrant workers, most of whom were from Mexico and Central America. The author started off by asking the class why they were interested in learning these applications and the internet. The most common responses were "to buy a better house", "to send their children to college", "to make more money" or "to build a better life". Many of the students didn't know what a mouse was or how to use one,
so the author started there. They had to be taught how to click on specific buttons and to follow sequences of actions using metaphors that they didn't understand. When the author finally realized he had reached the root of the problem, he decided to teach the concepts using a Spanish web portal. As it turned out, most of his class couldn't even type in Spanish. In fact, one person almost gave out their credit card information to an untrustworthy source after a web search. The students believed that the Internet was a safe haven, capable of providing them with everything they needed. Sadly, their disappointment was evident at the end of the first session. One student, a single mother of two between 25 and 30 years old, started crying because she didn't think she could ever succeed in learning the internet and felt that she was stupid. We consider this to be an instance of another lost user.

Of course, there is a huge opportunity in emerging markets. Like users in developed regions, these users have very basic needs: to stay in touch, to receive some form of entertainment, to manage their personal finances and to get basic access to information. They would also like pathways to entrepreneurship in order to better their lives. Our applications could be stripped down and simplified in comparison to the number and kind of tasks that an information worker performs. The use of simple input technologies, such as speech, would also benefit these users immensely. Kiosk-style metaphors would work better for these users; for instance, more reliance on icons and photos (e.g., a photo of a doctor or teacher) would be better understood than the classic desktop approach (4-8). What should also be emphasized are the principles of reliability, privacy and security for these users, as they are more susceptible to malicious attacks and scams (1-3, 9). This watermark needs to be easily identifiable and, most importantly, trusted.
CONCLUSION
Our bottom line is that there is ample room to do design explorations around education, healthcare, communication, coordination and finance in emerging regions. We believe these design explorations require in situ, multidisciplinary teams and longitudinal ethnographic research to guide them. We also believe a partnership with educational institutions in the local regions, from a design perspective, could be quite valuable. It is unfortunate that our community continues to design solutions to problems in these regions without the benefit of leveraging design and education. Introducing into the curriculum a requirement for computer scientists to collaborate with other disciplines would benefit them greatly and speed this process along. We believe this kind of collaboration would foster designs with greater utility and better usability, and might even have the potential to create businesses and startups in emerging regions.

ACKNOWLEDGMENTS
We thank the reviewers of this proposal for their comments.

REFERENCES
1. Kuriyan, R., Ray, I. and Toyama, K. Integrating Social Development and Financial Sustainability: The Challenges of Rural Kiosks in Kerala. Proc. 1st International Conference on ICT and Development, Berkeley, May 2006.
2. Veeraraghavan, R., Singh, G., Pitti, B., Smith, G., Meyers, B. and Toyama, K. Towards accurate measurement of computer usage in a rural kiosk. Third International Conference on Innovative Applications of Information Technology for Developing World – Asian Applied Computing Conference, Nepal, December 2005.
3. Toyama, K., Kiri, K., Menon, D., Pal, J., Sethi, S. and Srinivasan, J. PC kiosk trends in rural India. Proc. Policy Options and Models for Bridging Digital Divides, Tampere, Finland, March 2005.
4. Medhi, I., Pitti, B. and Toyama, K. Text-Free UI for Employment Search. Asian Applied Computing Conference, Nepal, December 2005.
5. Medhi, I., Prasad, A. and Toyama, K. Optimal Audio-Visual Representations for Illiterate Users. International World Wide Web Conference Committee, Canada, May 2007.
6. Medhi, I. and Kuriyan, R. Text-Free UI: Prospects for Social Inclusion. International Conference on Social Implications of Computers in Developing Countries, Brazil, May 2007.
7. Medhi, I. User-Centered Design for Development. ACM Interactions 14(4), July–August 2007, 12–14.
8. Diaz, A. E. The Challenge of Dealing with Cultural Differences in Industrial Design in Emerging Countries: Latin-American Case Studies. Proceedings Volume 10, LNCS 4559, ISBN 978-3-540-73286-0.
9. Chavan, A. L. and Ajmera, R. When in Rome… Be Yourself: A Perspective on Dealing with Cultural Dissimilarities in Ethnography. Proceedings Volume 10, LNCS 4559, ISBN 978-3-540-73286-0.
Affordable Echolocation-Based User Interfaces in Accessing Chaotic Environments ∗
Shigueo Nomura Kyoto University 606-8501 Kyoto, Japan
[email protected]
Genki Chiba Kyoto University 606-8501 Kyoto, Japan
[email protected]
Takeshi Shirakawa Kyoto University 606-8501 Kyoto, Japan
[email protected] ABSTRACT
This work investigates a novel idea of interfaces that can reduce the cost and cognitive load on the users. The proposed nonspeech audio-based interfaces take advantage of users’ echolocation ability in accessing chaotic environments without depending on help from others or literacy. This investigation experimentally evaluates whether the subjects can perceive and discriminate different shapes of spatial structures through echolocation. Twenty volunteers participated in six experiments under different conditions such as type of nonspeech sound, variable speed of moving sound, testing session accompanied by prior training or no. The subject’s task consisted of discriminating echoically among concave, plane, and convex virtual surfaces. These virtual surfaces were created by a 3D acoustic space system generating two types of nonspeech sounds (discrete and continuous) that could move at speed of 40 deg/s or 90 deg/s. The early results showed promising evidence that human echolocation skills can be enabled and taken advantage to spatial structure conceptualization tasks. Also, these results revealed that the direct perception in contrast to the associative learning of spatial structures is highly possible. Such results stand on the essential evidence that affordable echolocation-based user interfaces can emerge to support rejected groups like illiterate, visually impaired, or elderly users.
Akitoshi Honda Kyoto University 606-8501 Kyoto, Japan
[email protected] †
Takayuki Shiose Kyoto University 606-8501 Kyoto, Japan
[email protected] information about a target (spatial structure) by using transmitted and reflected sounds (echoes) as interaction media with this target. According to Kellogg [5], echolocation is not just sensing the presence of an echo. It requires the ability to interpret, evaluate, and identify that echo. Daniel Kish [12] lost his sight as an infant and taught himself to “see” with sonar by clicking his tongue and enabling echolocation ability to spatial structure conceptualization. He reflects the conviction that blind people can enable human echolocation ability and learn to see without sight. Basically, human echolocation [11] consists of transmitting sonar signals and processing echoes to determine the position, size, shape, and other features of targets. This active system can allow skills to navigate chaotic environments in the absence of light without requiring complex systems. Since echolocation can make an important role for acquiring spatial structure information, we look at the possibility to emerge a comprehensive account of echolocation. Related Works
Author Keywords
3D acoustic space, chaotic environments, direct perception, echolocation-based user interfaces, spatial structure conceptualization, virtual surfaces INTRODUCTION
Echolocation can be considered a major tool of spatial structure conceptualization employed by visually impaired persons. The visually impaired can get rich ∗Dr. Nomura is a postdoctoral fellow. †Dr. Shiose is an assistant professor.
In our previous work [9], we proposed an approach to enhance the spatial conceptualization performance of subjects in the experiments to navigate different types of virtual tracks created by a 3D acoustic space system. We verified that the proposed approach is viable and is essential to design novel aural user interfaces as supporting systems for the visually impaired. In another previous work [10], we investigated the comprehensive account of everyday listening based on nonspeech audio cues as the experience to listening to events rather than sounds.
In this work, we have investigated the possibility of enabling human echolocation, building on experimental results from our previous works. Several volunteers participated in experiments to evaluate their performance in the conceptualization (perception and discrimination) of virtual surfaces created by a 3D acoustic space system. A potential application of such echolocation-based user interfaces is pedestrian navigation. Recently, navigation services for pedestrians have entered the marketplace, and several researchers have investigated the special needs of pedestrian navigation systems [3, 6]. Specifically, Ito [4] has investigated how the events in everyday listening [1, 2] are perceived by blind and sighted pedestrians in a navigation task.

Motivation
Figure 1. Experimental setting for an eventual prototype of our echolocation-based user interface.
Unfortunately, we found that echolocation skills have been virtually ignored in interaction with computers and in traditional user interfaces. Based on the above evidence of Kish's success in enabling human echolocation skills, we believe in the realistic practicality of our work on future aural user interfaces.
Effectively, we hope to contribute to the design of future interfaces as supporting devices so that disabled, illiterate or elderly people are not unnecessarily excluded from using systems in the general public infrastructure. Table 1 shows how difficult it is for visually impaired persons, in developing and even in rich countries, to access one of the most common orientation aids (the guide dog) that does not require modification of the environment. The advantage of the proposed echolocation-based interfaces is that they offer low cost and high efficiency, because their implementation does not require high technology, adaptation to different languages, or modification of the existing environment. Figure 1 presents an experimental setting for echoically perceiving and discriminating surface shapes using an eventual prototype of the echolocation-based user interface. We suppose that the designed device can be worn on the user's head like glasses, as shown in Figure 1.

APPARATUS
Figure 2 presents an experimental scenery with the 3D acoustic space system apparatus used to create virtual surfaces. This apparatus is based on our previously conceived "sound visualization" [7] function and "perception of crossability" [13] training system. A schematic overview of the apparatus is shown in Figure 3. A computer performs digital control of an audio recorder and a sound space processor through a USB MIDI interface. The audio recorder plays the sound source as input data to the sound space processor. This processor generates nonspeech sounds with reflection, reverberation, and movement effects in the 3D virtual space.
Figure 2. Experimental scenery using the 3D acoustic space system apparatus.
According to our previous work [13], the reflection and reverberation levels were adjusted to −30 dB. The generated nonspeech sounds are (1) discrete sounds, whose source is a walking stick used by the visually impaired, and (2) continuous sounds, whose source is a fan noise captured by precise microphones [8]. The use of nonspeech sounds is based on the evidence that our ears and brains extract information from nonspeech audio cues that cannot be, or are not, displayed visually [8]. For example, knocking on objects tells us a great deal about the materials from which they are made; in such cases sound conveys more important information than sight. Also, we adopted "natural" sounds to break away from the typical use of "artificial" nonspeech sounds [8].

VIRTUAL SURFACES
A virtual surface is created in the 3D acoustic space and is defined by the geometrical components in the upper view of Figure 4. These components are measured as follows:
• The angle α represents the slope, setting up the following three shapes of virtual surfaces:
  – Concave for α ∈ {arctan(0.25), arctan(0.5)}.
  – Plane for α = 0.
  – Convex for α ∈ {arctan(−0.25), arctan(−0.5)}.
• The variable d represents the distance from the subject's head to the virtual surface axis, facing the average lobular height. We adopted d = 5 m in our experiments.
• The angle θ represents the range over which the subject's head moves from the left to the right side. This angle was set at 90 deg in our experiments.

Figure 4. Upper view of a virtual surface used as perceptual structure in the experiments (the figure labels the generated sounds, the partial surface and its axis, the angles α and θ, the distance d, and the subject).
Table 1. Relation between guide dog cost and per capita income (*).

                                        Japan ($US 33,000*)     Brazil ($US 2,700*)
Number of visually impaired persons     300,000                 170,000 (Census 2000)
Cost and time to train a guide dog      $US 40,000 (2 years)    $US 2,500 (2 years)

Figure 3. Schematic overview of the 3D acoustic space system apparatus. (The schematic shows the nonspeech sounds, an IBM Xeon computer, a UM-880 USB MIDI interface, an AR-3000 audio recorder, an RSS-10 sound space processor, a VM-7200 mixing processor, a VM-C7100 mixing console, and Ultrasone HFI-2000 headphones, connected by control and audio lines.)
Subjects
Twenty subjects (15 male and 5 female) participated in six experiments to evaluate their echolocation ability in the 3D virtual acoustic space. No participant had hearing problems. The participants were college and graduate students in their 20s and 30s. During the experiments, the subjects wore an eye mask, as shown in Figure 2.

Sessions
Basically, the experimental procedure consisted of a training session followed by a testing session.

Training Session
The objective of this session was to familiarize the subjects with the existence of the three virtual surface shapes shown in Figure 5. We prepared a set of thirty training trials. The three shapes of training surfaces were defined as follows:
• Concave surface for α = arctan(0.5).
• Plane surface for α = 0.
• Convex surface for α = arctan(−0.5).
PROCEDURE
Experiments
The subjects’ task during the experiments was to conceptualize, that is, to perceive and discriminate different shapes of the above described virtual surfaces.
The adopted nonspeech sound was discrete or continuous, and its moving speed was 40 deg/s or 90 deg/s, depending on the experiment. The subject tried ten samples for each shape of training surface in a random sequence. Since the training was supervised, a supervisor told the subject the correct surface shape after each trial. Each trial consisted of a nonspeech sound presentation of fixed duration (1 s at 90 deg/s or 2.25 s at 40 deg/s) without repetition.

Testing Session
According to each experiment, the virtual surfaces were created as follows:
• The adopted nonspeech sound was discrete or continuous.
• The moving speed of each sound was set at 40 deg/s or 90 deg/s.
• The slope, represented by the angle α in Figure 4, was set up as follows:
  – α = arctan(0.25) or arctan(0.5) for concave surfaces.
  – α = 0 for plane surfaces.
  – α = arctan(−0.25) or arctan(−0.5) for convex surfaces.
Figure 5. Shapes of virtual surfaces created in the 3D acoustic space (panels: concave, plane, convex).
Then, we prepared a set of five different virtual surfaces corresponding to the five slopes. The virtual surfaces with slopes α = arctan(0.25) and α = arctan(−0.25) represented testing trials without any prior exposure in the training session. The 3D acoustic space system created eight samples of testing surfaces for each slope, giving forty trials in this session.
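To make the testing-session design concrete, the following sketch enumerates the forty trials of one such session under the conditions described above (five slopes, eight samples per slope, one sound type and one speed per experiment). It assumes, as in the training trials, that each presentation lasts long enough for a single 90-degree sweep; the function and variable names are ours and are not part of the original system.

```python
import math
import random

# The five slopes used in the testing session (positive: concave, zero: plane, negative: convex).
SLOPES = {
    "CC+": math.atan(0.5),
    "CC-": math.atan(0.25),
    "PL":  0.0,
    "CV-": math.atan(-0.25),
    "CV+": math.atan(-0.5),
}

def presentation_time(speed_deg_per_s, sweep_deg=90):
    # One 90-degree sweep: 90/40 = 2.25 s, 90/90 = 1 s, matching the stated durations.
    return sweep_deg / speed_deg_per_s

def testing_session(sound="discrete", speed=40, samples_per_slope=8, seed=0):
    """Return a randomized list of 40 testing trials for one experiment."""
    trials = [
        {"shape": label, "alpha": alpha, "sound": sound,
         "speed_deg_per_s": speed, "duration_s": presentation_time(speed)}
        for label, alpha in SLOPES.items()
        for _ in range(samples_per_slope)
    ]
    random.Random(seed).shuffle(trials)
    return trials

if __name__ == "__main__":
    session = testing_session(sound="discrete", speed=90)  # e.g. discrete sound at 90 deg/s
    print(len(session), "trials; first trial:", session[0])
```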
The subjects' task was to tell the perceived and discriminated surface shape to the staff after each trial. At the end of this session, we asked the subjects which surface shape was the easiest structure to perceive and discriminate.

Groups
The subjects were divided into two groups of ten subjects each.

Group A
In this group, the subjects performed three experiments as follows:
• Experiment A·1, with a training session for virtual surfaces constituted by discrete sounds moving at a constant speed of 40 deg/s. In the subsequent testing session, the subjects tried to conceptualize the virtual surfaces by hearing these discrete sounds.
• Experiment A·2, with a training session for virtual surfaces constituted by continuous sounds moving at a constant speed of 40 deg/s. In the subsequent testing session, the subjects heard these continuous sounds to conceptualize the virtual surfaces.
• Experiment A·3, with a testing session for virtual surfaces constituted by discrete sounds moving at a speed of 90 deg/s. There was no training session in this case. This experiment aimed to investigate direct perception through echolocation by evaluating the influence of the absence of training on the subjects' performance.

Group B
The subjects in this group performed three experiments as follows:
• Experiment B·1: In contrast to A·1, the discrete sound moved at a speed of 90 deg/s for both training and testing sessions.
• Experiment B·2: In contrast to A·2, the continuous sound moved at a speed of 90 deg/s for both training and testing sessions.
• Experiment B·3: In contrast to A·3, the discrete sound moved at a speed of 40 deg/s.

RESULTS
Tables 2–3 present, respectively, the average performance of each subject from groups A and B on the tasks of perceiving and discriminating the testing virtual surfaces in the experiments. The experiments are denoted (A·1, A·2, A·3) or (B·1, B·2, B·3), and the subjects (S01, S02, ..., S10) or (S11, S12, ..., S20). Table 4 shows the average performance of all subjects from groups A and B on trials to perceive and discriminate each virtual surface.

Table 2. Average performance (%) of each subject from group A, categorized by experiment.

      S01   S02   S03   S04   S05   S06   S07   S08   S09   S10
A·1   80.0  77.5  75.0  62.5  60.0  60.0  80.0  50.0  45.0  65.0
A·2   67.5  65.0  72.5  67.5  65.0  70.0  52.5  57.5  37.5  70.0
A·3   70.0  75.0  60.0  57.5  67.5  55.0  62.5  57.5  55.0  72.5

Table 3. Average performance (%) of each subject from group B, categorized by experiment.

      S11   S12   S13   S14   S15   S16   S17   S18   S19   S20
B·1   72.5  57.5  80.0  65.0  62.5  85.0  67.5  47.5  45.0  72.5
B·2   77.5  67.5  80.0  72.5  67.5  65.0  60.0  82.5  52.5  67.5
B·3   70.0  65.0  62.5  67.5  75.0  70.0  72.5  82.5  55.0  70.0
Table 4. Average performance (%) of subjects from groups A and B on virtual surface conceptualization.

       Group A              Group B
       A·1   A·2   A·3      B·1   B·2   B·3
CC+    58    65    60       63    70    63
CC-    50    50    58       55    68    51
PL     75    73    70       69    78    81
CV-    50    33    38       50    34    50
CV+    95    93    91       91    98    100

Figure 6. Detailed results for CV- surface shape perception in experiments A·2 and B·2 (experiment A·2: CV as PL 61%, CV as CC 6%; experiment B·2: CV as PL 66%, CV as CC 0%).

Figure 7. Average performance (%) of all subjects on virtual surface conceptualization, categorized by groups and experiments (experiments 1–3: Group A 65.5, 62.5, 63.3; Group B 65.5, 69.3, 69.0).
The averaged results were categorized under five surface shapes. Each virtual surface is identified as follows:
• CC+: concave surface with slope α = arctan(0.5).
• CC-: concave surface with slope α = arctan(0.25).
• PL: plane surface with slope α = 0.
• CV-: convex surface with slope α = arctan(−0.25).
• CV+: convex surface with slope α = arctan(−0.5).
The graph in Figure 6 presents the detailed results corresponding to the worst performances on surface shape perception, those for CV-: 33% (experiment A·2) and 34% (experiment B·2) in Table 4. In this graph, CV as PL means that the testing surface was perceived by the subject as plane instead of convex, and CV as CC means that the subject perceived the testing surface as concave instead of convex. The graph in Figure 7 shows the average performance of all subjects from each group across the experiments. Finally, when asked which kind of surface was easiest to perceive and discriminate during the experiments, all subjects mentioned the convex shape.

DISCUSSION
In Tables 2–3, the early results show considerably high performance for several subjects on the perception and discrimination of virtual surfaces. Of the sixty results, only two average performances in Table 2 and two in Table 3 fell below 50%. These results support the claim that human beings can perform spatial structure conceptualization through echolocation.
Table 4 shows that the performance on CV- surface shape perception in experiment A·2 was 33%, while the performance on CC- surface shape perception, with the same magnitude of slope, was 50%. Likewise, the performance on CV- perception in experiment B·2 was 34%, while the corresponding CC- performance was 68%. In other words, the subjects' performance on concave surface shape perception was better than on convex, even though the subjects reported during the experiments that convex surfaces were easier to perceive than concave ones. The detailed results in Figure 6 show that the subjects had a strong tendency to consider the target as plane instead of convex in these experiments: CV as PL was 61% and CV as CC was only 6% in experiment A·2, while CV as PL was 66% and CV as CC was null in experiment B·2.

The graph in Figure 7 shows that the average performance of subjects in experiment B·3, without a training session, was 3.5% higher (from 65.5% to 69.0%) than the performance in A·1, in which the testing session followed a training session. The standard deviation of the subjects' average performances in experiment B·3 was 7.4%. Also, when the subjects participated in experiment B·2, with the sound moving faster than the speed adopted in A·2, their performance increased by 5.8% (from 62.5% to 69.3%); the corresponding standard deviation of average performances was 9.2%. Furthermore, the average performance in experiment B·1 was 2.2% higher (from 63.3% to 65.5%) than the subjects' performance in the corresponding experiment A·3, which had no training session; the standard deviation of average performances in A·3 was 7.5%. Since the corresponding standard deviations were greater than these increases, we conclude that the subjects attained similar performances across the comparisons. In particular, the similar performances of subjects in experiments A·3 and B·3 suggest that the training session did not influence their performance.
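As a quick check on these comparisons, the short sketch below recomputes the per-experiment means and sample standard deviations directly from the data in Tables 2 and 3; it reproduces, for example, the 65.5% (A·1), 69.0% (B·3) and 7.4% figures quoted above. The variable names are ours.

```python
import statistics

# Per-subject average performance (%) copied from Tables 2 and 3.
table2 = {  # group A
    "A·1": [80.0, 77.5, 75.0, 62.5, 60.0, 60.0, 80.0, 50.0, 45.0, 65.0],
    "A·2": [67.5, 65.0, 72.5, 67.5, 65.0, 70.0, 52.5, 57.5, 37.5, 70.0],
    "A·3": [70.0, 75.0, 60.0, 57.5, 67.5, 55.0, 62.5, 57.5, 55.0, 72.5],
}
table3 = {  # group B
    "B·1": [72.5, 57.5, 80.0, 65.0, 62.5, 85.0, 67.5, 47.5, 45.0, 72.5],
    "B·2": [77.5, 67.5, 80.0, 72.5, 67.5, 65.0, 60.0, 82.5, 52.5, 67.5],
    "B·3": [70.0, 65.0, 62.5, 67.5, 75.0, 70.0, 72.5, 82.5, 55.0, 70.0],
}

for exp, scores in {**table2, **table3}.items():
    mean = statistics.mean(scores)   # e.g. A·1 -> 65.5, B·3 -> 69.0
    sd = statistics.stdev(scores)    # sample standard deviation, e.g. B·3 -> 7.4
    print(f"{exp}: mean = {mean:.1f}%, sd = {sd:.1f}%")
```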
CONCLUSION
Although we are still at an early stage in investigating spatial structure conceptualization through echolocation, there is now evidence that humans can directly perceive and discriminate surface shapes by enabling echolocation skills. In this investigation, twenty subjects participated in experiments to conceptualize (perceive and discriminate) three different shapes of spatial structures represented by virtual surfaces in a 3D acoustic space. We prepared six experiments according to different conditions, such as the type of nonspeech sound, the speed of the moving sound, and the presence or absence of a training session. According to the early results, it is surprising that the average performance of subjects on virtual surface conceptualization in all the experiments was higher than 62.5%. Comparison of results between the experiments also showed that the subjects' performances were similar even under different experimental conditions. Therefore, we can conclude that the subjects enabled their echolocation ability for spatial structure conceptualization through direct perception rather than associative learning. These results encourage us to search for future strategies for improving subjects' performance on non-trivial tasks, such as discriminating between plane and convex surface shapes. Since our affordable echolocation-based interfaces do not require a complex and expensive system, we hope to contribute to the design of supporting devices for excluded groups such as illiterate, visually impaired or elderly people in developing countries. The eventual advantage of an echolocation-based interface is to give users the chance to navigate chaotic environments without high cognitive load or help from others.

ADDITIONAL AUTHORS
Osamu Katai (Dr. Katai is a professor at Kyoto University, e-mail: [email protected]), Hiroshi Kawakami (Dr. Kawakami is an associate professor at Kyoto University, e-mail: [email protected]), and Keiji Yamanaka (Dr. Yamanaka is a professor at the Federal University of Uberlândia, Santa Mônica Campus, 38400-902 Uberlândia, Brazil, e-mail: [email protected]).

REFERENCES
1. W. Buxton. Using our ears: An introduction to the use of nonspeech audio cues. In E. Farrell, editor, Extracting Meaning from Complex Data: Processing, Display, Interaction, Proceedings of the SPIE, volume 1259, pages 124–127, 1990.
2. W. W. Gaver. What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5:1–29, Dec. 1993.
3. T. Höllerer, S. Feiner, T. Terauchi, G. Rashid, and D. Hallaway. Exploring MARS: Developing indoor and outdoor user interfaces to a mobile augmented reality system. Computers and Graphics, 23(6):779–785, Dec. 1999.
4. K. Ito. Semantics of tau. Gendaishiso Journal (in Japanese), pages 178–187, 1994.
5. W. N. Kellogg. Porpoises and Sonar. The University of Chicago Press, Chicago, 1961.
6. A. Krüger, A. Butz, C. Müller, C. Stahl, R. Wasinger, K. Steinberg, and A. Dirschl. The connected user interface: Realizing a personal situated navigation service. In IUI 04 – 2004 International Conference on Intelligent User Interfaces, pages 161–168, New York, 2004. ACM Press.
7. S. Nomura, T. Shiose, H. Kawakami, O. Katai, and K. Yamanaka. A novel "sound visualization" process in virtual 3D space: The human auditory perception analysis by ecological psychology approach. In Proc. of the 8th Asia Pacific Symposium on Intelligent and Evolutionary Systems, pages 137–149, Cairns, Dec. 2004.
8. S. Nomura, M. Tsuchinaga, Y. Nojima, T. Shiose, H. Kawakami, O. Katai, and K. Yamanaka. Novel nonspeech tones for conceptualizing spatial information. Artificial Life and Robotics, 11:13–17, 2007.
9. S. Nomura, T. Utsunomiya, M. Tsuchinaga, T. Shiose, H. Kawakami, O. Katai, and K. Yamanaka. Designing an aural user interface for enhancing spatial conceptualization. In Proc. of the Second IASTED International Conference on Human-Computer Interaction, pages 205–210, Chamonix, France, Mar. 2007.
10. S. Nomura, T. Utsunomiya, M. Tsuchinaga, T. Shiose, H. Kawakami, O. Katai, and K. Yamanaka. Toward novel interfaces using non-speech sounds as events for human perception. In Proc. of the Second International Workshop on Image Media Quality and its Applications, pages 189–194, Chiba, Japan, Mar. 2007.
11. O. A. Ramos and C. Arias. Human echolocation: The ECOTEST system. Applied Acoustics, 51(4):439–445, 1997.
12. J. Roberts. A Sense of the World: How a Blind Man Became History's Greatest Traveler. Harper Collins Publishers, USA, 2006.
13. T. Shiose, K. Ito, and K. Mamada. The development of a virtual 3D acoustic environment for training 'perception of crossability'. In Proc. of the 9th International Conference on Computers Helping People with Special Needs, pages 476–483, Paris, Jul. 2004.
Interfaces for Community Memories Luc Steels VUB AI Lab - Brussels Sony Computer Science Laboratory - Paris
[email protected]
ABSTRACT
Computers and networks are now accessible to millions across the globe. But many more millions have no access at all for a multitude of reasons. A lot has to do with economics and uneven distribution of wealth. But it is also due to the interfaces and applications that are the main target of the computer industry and which are mostly geared towards literate individuals living generally speaking in abundant conditions, or towards business and manufacturing activities in strongly developed economies. This paper raises the question how we could start to address much more seriously the overwhelming majority of the human population and the human activities that do not fall in this category. There are now several initiatives, such as the One-Laptop-PerChild project, that make hardware available to greater groups, particularly children in developing nations. But we believe that it is not just going to be a matter of infrastructure alone. We need to inquire what future generations need the most and how those needs can be satisfied. The goal of this paper is not to introduce new technologies but to be a position paper. We contribute to the discussion on visions for the future and identify projects already going on that may act as successful inspirational cases. We also draw attention to social tagging as an important ingredient of the interfaces to community memories. Author Keywords
Community memory COMMUNITY AND COMMUNITY MEMORY
Current information technologies are strongly geared towards highly developed literate societies and towards individual needs (such as access to cultural goods) or business needs (manufacturing, commerce, etc.). Here we want to draw the attention to an area which is of particular importance in developing countries, although it is also relevant in highly urbanised contemporary environments. The emphasis is on community rather than
Eugenio Tisselli Sony Computer Science Laboratory - Paris
[email protected]
the individual, and on the fair organisation of the commons rather than its exploitation. A commons can be as basic as water and air, but it can also be space on the road, wood in the forest, access to public spaces, bandwidth for information transmission, cultural artefacts, etc. Activities of the commons decompose into two aspects: (1) there are processes that supply input to the commons, either natural processes like the growth of trees or human activities like the maintenance of infrastructure, the production of goods, etc., and (2) there are processes that take from the commons, for example using the water streaming down a mountain to grow crops on a particular field.

To have a sustainable commons requires first of all that a balance is maintained between input and output, which means that those who take from the commons must ensure that the processes to regenerate it are in place. Second, there are almost always conflicts between those that supply input and those that take output, as well as among those that take output if the available resources are scarce. There is often a conflict between a particular community that is managing a commons and outsiders or other communities that feel they should have the right of access to the output of the same commons. The organisation of the commons is a primary function of human groups, and if it is not done right, the suffering can be immense or the destruction of the commons can be swift. Given this importance, and given that there are many ecological and social systems under extreme threat, we propose that collective tools for managing a commons should be a key target for future information technology. The issues are just as pressing for modern, densely populated urban societies, where there is a daily fight for access to roads or for air and water quality, as they are for indigenous semi-nomadic communities trying to preserve their rain forest environment against the onslaught of logging companies. Communities whose members are illiterate, have little 'official' legal power, or have almost no access to information technologies are the ones most in need.

We call the information infrastructure needed for maintaining a commons a community memory. A community memory is a medium for recording and archiving information relevant to the commons and for diffusing this information among members or communicating it
to those threatening the commons and thus the community. All members making up the community should have unlimited access and be allowed to upload and download information. Once the information is there, it becomes possible to 'add intelligence' to the system in various ways, for example by creating maps containing information in relation to its geographic location, by explicating dependencies between information items in order to bring out trends and predict future evolutions, etc.

There have already been several moments in the past where the concept of a community memory has been discussed. Interestingly enough, the first public computerized bulletin board system, operating in Berkeley, California between 1972 and 1974, was called a Community Memory [1]. Indicative of the altruistic spirit of the times, it was free, except when it involved items that changed hands and involved money. The intended goals were stated thus:

"Our intention is to introduce COMMUNITY MEMORY into neighborhoods and communities in this area, and make it available for them to live with it, play with it, and shape its growth and development. The idea is to work with a process whereby technological tools, like computers, are used by the people themselves to shape their own lives and communities in sane and liberating ways. In this case the computer enables the creation of a communal memory bank, accessible to anyone in the community. With this, we can work on providing the information, services, skills, education, and economic strength our community needs. We have a powerful tool – a genie – at our disposal; the question is whether we can integrate it into our lives, support it, and use it to improve our own lives and survival capabilities. We invite your participation and suggestions." (Loving Grace Cybernetics, 1972; http://www.well.com/ szpak/cm/cmflyer.html)

The means available in the early seventies were of course a far cry from today's computational infrastructure, but the basic idea of these visionaries to create a 'communal memory bank' was clearly there, and so was the idea of using it for the benefit of the community. Personal computers and the emerging field of knowledge technologies led in the nineteen-eighties to a revival of the concept, although the emphasis was rather on the potential for added intelligence through recording and exchanging knowledge [6]. A decade later, Internet technologies and the widespread availability of computers made it possible to achieve many of the aspirations for community memory, and many joint efforts, from wikis to blogs, go strongly in this direction. However, we want to emphasise a key difference. The 'community memories' we envision here are intended for a real community of real individuals, not a diffuse group that flocks anonymously through the Internet and has no real stake in the management of a
commons. A Community Memory is therefore the opposite of a 'Smart Mob', defined as "people who are able to act in concert even if they don't know each other" [5]. It is also different from 'Collective Intelligence', which is essentially about gathering and diffusing opinion (for example on the popularity of books, clips or music) or collecting information from a large set of anonymous contributors (as in Wikipedia). There are currently a number of projects that develop the idea of a Community Memory in the sense intended here, and some of them have successfully been put into practice. We give just some examples and then draw some general conclusions about the requirements and practices involved, particularly regarding the required interfaces.
Case Study 1: A Community Memory for Handicapped People (Barcelona, Spain)
In the past decade, Barcelona has become one of the most bustling cities in Europe. However, the benefits of recent changes have not yet reached some of the communities that live in Barcelona, such as people with limited mobility who use a wheelchair. Throughout the streets of Barcelona, people in wheelchairs constantly find obstacles that undermine their access to both public and private spaces. Every day, they must overcome different sorts of architectural barriers, which in themselves reflect a certain degree of neglect towards them. The commons in this case are the streets and buildings which should be publicly accessible. Whereas most users take access to these spaces for granted, handicapped people must wage a persistent struggle to gain equal access to this urban space. In 2005, the Catalan artist Antoni Abad and computer scientist Eugenio Tisselli started a project called canal*ACCESSIBLE [http://www.zexe.net/BARCELONA], which centered around the creation of a Community Memory for people in wheelchairs in Barcelona. The goal was to survey the architectural barriers they encountered, classify them according to a set of categories created collectively, and locate them on a map. The aim was to make these obstacles visible to non-handicapped citizens, and also to the city government. Through the use of multimedia mobile phones, the participants of canal*ACCESSIBLE were able to take pictures of the barriers and send them directly to a web page. This immediate way of publishing content gave these people the freedom to register whichever obstacle they happened to find and immediately tag it. After only three months, more than 3,000 inaccessible places had been recorded and located on the city map. This project had the following ingredients:
• Multimedia mobile phones with capabilities for taking pictures and recording video and audio clips were used as the device available to community members.
• A simple interface was available on the device for tagging the pictures with additional information.
• The spatial location system was based on the correspondence between a city address and a pair of geographical coordinates.
• There was a central database system for uploading and downloading information through the GPRS network.
• There were additional computer-based interfaces to allow browsing and editing of information.
• All information was brought together through maps of the city, both in the Web interfaces and in printed form.

During three months, the participants of canal*ACCESSIBLE turned into active broadcasters, with the possibility of sending unlimited information. Each week, they got together in a meeting space especially set up for them at the Centre d'Art Santa Mònica, an arts centre located in the heart of Barcelona. During the meetings, they discussed different strategies for finding and publishing their images. On some occasions, they used the digitized map as a reference and organized special trips to cover unexplored areas of the city. Thus, the map became both a record that reflected their activity and a live Community Memory interface, which they used to decide on future actions. The project was widely disseminated through all types of media, ranging from the press to TV, and of course the Internet itself. This maximized the communicative potential of canal*ACCESSIBLE and gave it widespread attention. At the end of the project, several thousand maps of Barcelona with colored markers corresponding to the architectural barriers were printed and handed out to the public and the city's authorities. Although the project ended officially in March 2005, a number of participants continued to publish their images on their own account. Some of them also created a spontaneous extension of canal*ACCESSIBLE by founding an association that promotes cultural and leisure activities for disabled people. For the members of this association, the original project became a seed for further self-organization.

Figure 1. The public browsing interface of canal*ACCESSIBLE: display of an inaccessible place (in this case a truck obstructing a pedestrian area) and the location on the map where the problem occurs.
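To illustrate what a single canal*ACCESSIBLE-style contribution might look like in software, here is a minimal, hypothetical sketch of a tagged, geolocated report being added to a central database. The class, field and function names are ours, and the geocoding step is only a placeholder for the address-to-coordinates correspondence the project relied on.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Report:
    """One contribution sent from a multimedia mobile phone."""
    author: str                      # community member submitting the report
    media_file: str                  # photo, audio or video clip captured on the phone
    address: str                     # street address chosen by the user
    tags: List[str]                  # categories created collectively by the community
    coordinates: Tuple[float, float] = (0.0, 0.0)  # filled in by address lookup

def geocode(address: str) -> Tuple[float, float]:
    # Placeholder for the address-to-coordinates correspondence; a real deployment
    # would query a geocoding service or a local street database.
    return (41.38, 2.17)  # central Barcelona, for illustration only

def submit(report: Report, database: List[Report]) -> None:
    """Upload a report (e.g. over GPRS) and locate it on the shared city map."""
    report.coordinates = geocode(report.address)
    database.append(report)

community_memory: List[Report] = []
submit(Report(author="participant01", media_file="barrier.jpg",
              address="La Rambla 7, Barcelona", tags=["obstacle", "sidewalk"]),
       community_memory)
print(len(community_memory), "report(s) on the map")
```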
Case Study 2: A Community Memory for Motoboys (Sao Paulo, Brazil)
The city of Sao Paulo in Brazil is known as one of the world's biggest cities, with an estimated population of more than 17 million inhabitants. As with most capital cities in developing countries, Sao Paulo has grown quickly and chaotically, despite the implementation of urban planning in some of its areas. Motoboys, a hybrid word that combines motorcycle and boy, are messengers who dash across the streets of Sao Paulo on their motorcycles, delivering all sorts of things, from pizzas to confidential documents. Motoboys are considered both essential motors of the city's economy and a growing problem. Every day, thousands of motorcycle messengers have to literally hustle their way between cars. The lack of a special lane for motorcycles, and the pressing need to complete their deliveries rapidly, forces them to drive at full speed through the narrow space between road lanes. This practice unfortunately results in an alarmingly large number of accidents, often with fatal outcomes both for motoboys and for car drivers. Thus, an essentially conflicting situation arises in the streets of Sao Paulo between car drivers, who believe that they have the exclusive right to use the roads, and motoboys, who believe that it is their right to work in secure conditions and to earn a decent living through their jobs. So we get another classical conflict about a commons, in this case the traffic areas of a city. Motoboys have sought to organize themselves in order to fight for a better working environment, yet they have been continuously singled out by large sections of the paulista society, not only because they are accused of causing traffic accidents, but also because they have been associated (in many cases unjustifiably) with criminal activities, such as theft, kidnapping and rape. The ongoing conflict between motoboys and the citizens of Sao Paulo unfolds mainly on the scenario of a tangible commons, formed by the network of city streets and roads, but also on an intangible layer. Within a society, the daily interactions between different groups or individuals generate a mechanism for assigning reputation. Roughly considered, this mechanism rewards individuals or groups whose activities are perceived as positive for the society with a high reputation, while at the same time it punishes those whose actions are seen as detrimental to society. Equal access to this negotiated social space can also be considered a commons, albeit an intangible one. In the case of the motoboys of Sao Paulo, a mix of disinformation and a negative bias stimulated by the local media have damaged their reputation in a way that they
claim to be unfair. Motoboys also argue that they are simply workers who want better and more secure conditions in which to carry out their daily jobs, for their own good and also for that of their fellow street users. One year after canal*ACCESSIBLE, the team formed by Abad and Tisselli set up a similar project in Sao Paulo, called canal*MOTOBOY [http://www.zexe.net/SAOPAULO]. A group of motoboys armed with multimedia mobile phones were invited to publish images, audio and video clips from the city streets on the project's web page. This time, the participants were free to choose the topics they wanted to deal with, instead of having a pre-fixed goal. These topics were discussed during weekly face-to-face meetings in a space that was set up at the Centro Cultural Sao Paulo, one of the first multidisciplinary cultural centers in Brazil. In canal*MOTOBOY every participant has a personal section on the project's web page, where they can send all the multimedia information they wish, regardless of topic. However, all of this information has to be annotated using tags, which can be associated with the contents directly on the mobile phones, or by using a special web interface after the contents have been sent.

Figure 2. Web interface for the Community Memory of the motoboys. They can browse uploaded media materials and edit the text and tags associated with them.

The aggregation of all the motoboys' tags is shown on the main page of canal*MOTOBOY as a tag cloud, a list that contains the most significant tags in the annotation system. Through this linguistic interface, we can see that an emergent lexicon has evolved throughout the project's duration, including words which reflect the group's interests and concerns. Among the words used by most motoboys we find "fala", or "speak" in Portuguese, a word used to annotate the interviews that the motoboys did using their phones, "dia a dia", used to tag what they considered to be their "daily experiences", "transito" (traffic), "trabalho" (work) and, unfortunately, "acidente" (accident). Here, the tag cloud
becomes an interface that not only conveys an immediate, linguistic model of the Community Memory generated by the motoboys, but also serves as a tool for browsing through their multimedia files by keyword. At the time of writing this paper, the canal*MOTOBOY project is still going on. While the face-to-face meeting sessions now happen outside their original space and have become sporadic, most of the participants are still actively feeding content to the web page. An essential aspect of the project is that it involves an act of collective appropriation. The web page becomes the moral property of the participating motoboys, who have full access to whatever contents they publish on it. As canal*MOTOBOY becomes increasingly popular through its extensive dissemination, its participants are starting to use it as a platform for self-organization, and to facilitate dialogue with members of the government of Sao Paulo, academics and their fellow citizens in general. Some of them have participated in interviews and conferences, publicly expressing their points of view on how access to the city's streets should be regulated and on the working conditions that they desire. They also hope that, through the project's popularity, their image within the paulista society will be transformed into a more positive one.
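As a rough illustration of the tag cloud and keyword browsing described above, the sketch below aggregates the tags attached to a set of published items and lists the most frequent ones. The records and tag counts are invented for illustration; only the example tag names come from the text above.

```python
from collections import Counter

# Hypothetical published items, each annotated with tags chosen by a motoboy.
published_items = [
    {"media": "interview01.mp4", "tags": ["fala", "trabalho"]},
    {"media": "traffic_jam.jpg",  "tags": ["transito", "dia a dia"]},
    {"media": "crash_site.jpg",   "tags": ["acidente", "transito"]},
    {"media": "delivery.jpg",     "tags": ["trabalho", "dia a dia"]},
]

def tag_cloud(items, top_n=10):
    """Count tag occurrences and return the most significant tags, largest first."""
    counts = Counter(tag for item in items for tag in item["tags"])
    return counts.most_common(top_n)

def items_for_tag(items, tag):
    """Browse the multimedia files associated with a keyword, as on the project page."""
    return [item["media"] for item in items if tag in item["tags"]]

print(tag_cloud(published_items))            # e.g. [('trabalho', 2), ('transito', 2), ...]
print(items_for_tag(published_items, "transito"))
```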
Case Study 3: A Community Memory for Pygmies (Rainforest, Congo Basin)
One of the ecosystems currently under enormous stress is the rainforest of the Congo. It contains wood of great value to logging companies, but the trees may be of equally great, if not greater, value to the indigenous people who actually live and must survive in these forests, such as the Mbendjele pygmies. A tree may be important to them, for example, because it is a rich source of caterpillars that supply food, or because it stands on burial grounds and is considered sacred. So this is a classical example of a commons (the rain forest) with competing forces interested in exploiting it. Although the pygmies have managed the forest for centuries in a sustainable way and have their own internal systems for dividing the output of the commons and making sure it can regenerate, the same interests are not shared by the logging companies, who are generally after the wood without concern for sustainability or for local communities. The only force which restricts logging companies is the acquisition of a special label of the Forest Stewardship Council that attests that the wood has been harvested in a sustainable, environmentally friendly, and socially responsible way. This label may one day be obligatory and already allows a premium price for the wood. To address the question of how the pygmies, an illiterate people devoid of any prior experience with information technology, could map out knowledge about trees in the forest for themselves and for the logging companies, Jerome Lewis, an anthropologist at the London School of Economics, devised a Community Memory. The pygmies are semi-nomadic and there are only 3000 surviving members. They form a clearly pre-existing community living on the principle
of abundance [3], meaning that they share all the resources they have, so that they do not need to extract more than they need, and they restrain themselves or move to other areas when they sense that an area of forest is no longer yielding what is needed for life sustenance. The Community Memory built by Lewis and his coworkers [2] contains the following ingredients:
• Portable palm-pilot-style devices which are available to members of the community who want to participate in the recording of knowledge.
• An iconic interface with a discrimination-tree-based decision system that is used to tag trees or forest areas.
• A space localisation system based on GPS that automatically supplies information about location.
• A database system for uploading and downloading information in a fully distributed fashion, though not necessarily at the moment the information is being gathered in the forest.
• Geographical information resources, so that the information can be organised and displayed on maps, like Google maps, which are easily recognised by the community members.
• An external information system to communicate crucial information to the logging companies, which are the primary competitors for the community's commons.
This Community Memory is today fully functioning and serving its role as a medium whereby one community can bring out the value of its own commons to another group of users.

IMPLICATIONS
This section draws a number of general conclusions from these various Community Memory projects.

1. The first point is surely that technology accounts for only a small percentage of the success of a project. By success we mean that the source of the tension around which the community has galvanised is at least managed, if not resolved, and a modus vivendi has been found which is satisfactory to all partners. Unless the Community Memory is critical for communication and information tracking, it may no longer be needed and may cease operation. The biggest factor in success appears to be setting up the social organisation of the communication itself, which has to be done by social workers with strong ties to the community and special organisational skills. Face-to-face meetings between community members appear essential, and it is crucial that contributions of members to the Community Memory are recognised as coming from particular individuals. This is in contrast to the anonymity of other Internet infrastructures, which usually make it possible to hide identity, leading, as a natural consequence, to severe problems with security, mobbing, and improper appropriation of the work of others.

2. Next we observe that computers are of course present in the background as servers, but they are no longer the primary vehicle through which the system interfaces with users. Instead, we see palm-pilot-like devices, mobile phones, or enhanced cameras. They are networked (although maybe not continuously) and often have
localisation functionality through GPS. Computer-based interfaces are complementary, but they do not go much beyond simple web browsing for displaying the information that has been brought together, and editing tools for adding text. A large number of projects use text messaging (SMS). In Nigeria, for example, citizens used a text messaging application to mobilize and monitor the 2007 Presidential Elections. With an estimated 50

3. Community Memory projects often deal with persons who are not literate in the use of technology, nor even in basic reading and writing. They often envision environments involving inhabitants of remote rural locations or members of marginalised groups in urban societies. Even though these projects can also reach people who are already users of communication technologies, it cannot be assumed that this will always be the case. Therefore, a fundamental requirement for Community Memory interfaces is simplicity. An immediately graspable and easy-to-use set of tools for transmitting and managing content is central to the success of these initiatives. By simplicity, we mean that the interfaces are designed in a way that enables direct, uncluttered action. Each function must be stripped to its bare essentials, taking good care to provide just the necessary functions, but none more.

4. Social tagging has recently emerged from sharing sites like Flickr as a powerful tool for labeling and organising content in an intuitively satisfying way [7]. This idea can be, and has been, used effectively to organise the content of Community Memories, as it is easy to grasp even for non-literate people [8]. In the case of the Pygmy Community Memory, the tags used to circumscribe the value of a tree or area are fixed, and the interface for choosing them is organised as a decision tree (see the sketch after this list). The top nodes contain items like hunting, gathering, religion, farming. After having chosen gathering, the next choices might include caterpillars or yams, etc. The interface is entirely iconic. Such an interface is not difficult to build from a technological point of view, but it must obviously be designed in very close interaction with the people involved and with a very deep understanding of their culture. A bottom-up ontology of tags is possible as well. The motoboys project used tag clouds as an instrument for users to see trends in tags and to help regulate their own usage.

5. The interfaces for the post-editing of content should be kept simple, even though they usually offer a more complex set of functions and will probably be used only by the most experienced or eager participants of Community Memory projects. The basic functions of the content editing interfaces should be adding (for example, annotating an image with text or tags), modifying (changing text, tags, or the order of appearance of the contents) and deleting. They should be implemented with great care, in order to create a robust system that always considers the possibility of mistakes on the user's part.
6. It is important to have two types of interfaces. The first is used within the community; it is only accessible to members of the community and can be changed by them, so that users feel entirely safe adding information. As a community is often in conflict with others about its commons, it is also crucial that the Community Memory has additional facilities for communicating selected information to the outside world. For example, the Barcelona canal*ACCESSIBLE project produced physical maps. Once communication to another community exists, it can be enriched and start working in both directions. In the case of the Pygmy Community Memory, for example, a new mode of communication, based on radio, was added by the logging companies; it contained items of local interest such as music, but also information about upcoming logging activity. It is clear that only when such cross-community information channels have been established do conflicts between communities have a chance of being resolved.
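The decision-tree organisation of tags mentioned in point 4 can be made concrete with a small sketch. Everything below is illustrative: the category names are only the examples given above, and the data structure and function names are hypothetical, not taken from the Pygmy Community Memory system.

# Sketch: a fixed, iconic tag hierarchy navigated as a decision tree.
TAG_TREE = {
    "hunting": {},
    "gathering": {"caterpillars": {}, "yams": {}},
    "religion": {},
    "farming": {},
}

def choose(node, pick):
    """Descend one level in the tag tree; each key would be shown as an icon."""
    return node[pick]

# Example: the user first taps the "gathering" icon, then the "yams" icon.
level1 = choose(TAG_TREE, "gathering")
print(list(level1))                      # ['caterpillars', 'yams']
print("selected tag path: gathering > yams")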
CONCLUSIONS
Current computer and communications technologies make it possible to build a new class of applications which we have called Community Memories. Community Memories are intended to be a support infrastructure for real communities, that is, groups of individuals who have to manage a commons, either among themselves or in conjunction with, and often in opposition to, another group. Managing a commons means that problems in the inappropriate use of the commons can be detected and reported, that rights can be stated and communicated, and, more generally speaking, that the commons remains sustainable and is used fairly. Setting up a Community Memory uses the same technologies as are now common on the Internet, but they often need opposite solutions. Instead of hiding identity, identity should be recognizable. Computers are no longer the main interface device. Instead of complexity, interfaces should be as simple and rigid as possible; if there is any complexity, it should be hidden instead of being a selling point. Community Memories are almost always non-profit ventures and may disappear as fast as they come into existence. There are still tremendous opportunities for adding additional intelligence to currently existing systems, specifically to predict the future evolution of the commons given its current state, or to compare different future trajectories so that an intelligent choice can be made.
Acknowledgement
The writing of this paper at the Sony Computer Science Laboratory in Paris was partly supported by the ECAgents project funded by the Future and Emerging Technologies program (IST-FET) of the European Commission, as well as by the STREP TAGORA project. The information provided is the sole responsibility of the authors and does not reflect the Commission's opinion.

REFERENCES
1. Colstad, K. and Lipkin, E. (1975) Community Memory: a public information network. ACM SIGCAS Computers and Society, 6(4), p. 6-7.
2. Hopkin, M. (2007) Mark of Respect. Nature, 448, July 2007.
3. Lewis, J. (2004) From Abundance to Scarcity: Indigenous resource management and the industrial extraction of forest resources. Some issues for conservation. Centre for African Studies Seminar, Edinburgh.
4. McKibben, B. (2007) Deep Economy: The Wealth of Communities and the Durable Future. Times Books, New York.
5. Rheingold, H. (2002) Smart Mobs: The next social revolution.
6. Steels, L. (1986) From Expert Systems to Community Memories. In: Bernold, T. (ed.) Expert Systems and Knowledge Engineering. Conf. G. Duttweiler Institute, Ruschlikon, Switzerland, p. 17-29.
7. Steels, L. (2006) Collaborative tagging as distributed cognition. Pragmatics and Cognition, 14(2):275-285.
8. Steels, L. and Tisselli, E. (2008) Social Tagging in Community Memories. AAAI Spring Symposium, Stanford, CA.
IndicDasher: A Stroke and Gesture based Input mechanism for Indic scripts
Srinivas N K, CDAC Bangalore
[email protected]
Nobby Varghese, CDAC Bangalore
[email protected]
RKVS Raman, CDAC Bangalore
[email protected]

ABSTRACT
In the last two decades, considerable research has been conducted on online handwriting recognition systems, yet no usable system has evolved as far as Indic scripts are concerned. The crux of the problem lies in the large number of distinct scripts involved, their varying characteristics, and the idiosyncrasies of the user. Indic scripts comprise a large number of glyphs involving complex curves and are generally orthographic in nature. They also have complex compound characters of varying width that change shape through merging, overlapping and superimposition of the individual characters. This makes the development of handwriting recognition systems a deeper problem for Indic scripts. Dasher is an information-efficient text-entry interface, driven by natural continuous pointing gestures. We have attempted to enhance Dasher by adding gesture recognition to make the user more comfortable using Indic scripts, which have a large number of characters. IndicDasher, as we call it, aims to be a bridge between full-blown handwriting recognition systems and gesture-based input methods, to achieve maximum utility for Indic scripts.

Author Keywords
Online handwriting recognition, stroke recognition.

ACM Classification Keywords
H.5.2 [Information interfaces and presentation]: Input devices and strategies

INTRODUCTION
With the latest trends and developments in technology, the need for more sophisticated man-machine interfaces is increasing. Due to the large alphabet size of Indic scripts, interaction with the computer using the conventional keyboard has long been a bottleneck: one must remember various combinations of keys to input a character. However, with the development of pen-based devices such as the Tablet PC, PDAs, etc., handwritten input for text entry provides a more natural alternative. This solves the problem of large alphabet size and helps extend the reach of Information Technology to a larger community. Hence, handwriting recognition for text input acquires great significance in the context of Indic scripts. At the same time, the large number of characters in Indic scripts and their curvilinear nature impede the development of handwriting recognition systems for these scripts. Currently, a marriage of gesture-based input with minimal stroke recognition mechanisms seems to be a more viable solution. Dasher is one such interface, which we have modified, adding stroke recognition capabilities, to use it to input Indic scripts. In this paper we discuss the modifications we incorporated into Dasher to suit Indic scripts and give the reasons for them.
About Dasher
Dasher is an information-efficient text-entry interface, driven by natural continuous pointing gestures [1]. Dasher is free software, distributed under the same license as GNU/Linux, the GPL. Dasher is a competitive text-entry system (Figure 1) wherever a full-size keyboard cannot be used. Dasher is used to assist disabled people to interact with
computers. These users may have lost the use of one or both hands. Some of them rely on computers to bridge communication with others.
Figure 1: The Dasher interface, showing the selected characters, the most likely next character, and the cursor.

Dasher uses prediction by partial matching, in which a set of previous symbols in the uncompressed symbol stream is used to predict the next symbol in the stream. It employs continuous input by dynamically arranging characters in multiple columns, positioning the next most likely character near the user's cursor in boxes sized according to their relative probabilities. In the remaining sections, we discuss the addition of stroke recognition to Dasher to support Indic scripts.

Indic scripts
India has 10 official scripts: Devanagari, Tamil, Telugu, Kannada, Gujarati, Gurmukhi, Oriya, Bengali, Malayalam and Urdu. All of them except the Urdu script have primarily evolved from a common origin, the Brahmi script. Thus, despite noticeable differences in their visual characteristics, they are analogous in many aspects. They are defined as syllabic alphabets in which the basic unit is a syllable [2]. The basic graphic units also show a distinctive internal structure and a constituent set of graphemes.

An Indic script consists of a number of consonants and a set of vowels. In a consonant-vowel combination, the vowel part is indicated using a diacritic sign known as a maatra. For example, Hindi has 36 consonants that can be modified by any of 12 maatras, and almost every consonant can be bound to another; that leads to around 1,500 symbols. The shape of a maatra is often completely different from that of the corresponding vowel, and the shape of a consonant also changes when it combines with a vowel or with another consonant. Figure 2 shows the set of vowels and corresponding maatras in Devanagari.

Figure 2: Devanagari vowels and their corresponding maatras.

This makes developing handwriting recognition systems for Indic scripts a difficult task because of the huge inventory of characters and the associated training and learning process involved. Researchers are therefore investing their time in exploring newer methods for designing stroke-based input mechanisms for Indic scripts.

RELATED WORK
Most handheld pen devices are supplied with software to recognize handwritten characters, possibly using a special alphabet. The gesture sets range from those that are designed to be very efficient to those that emphasize ease of learning. Commercial systems are generally designed to be easy to learn. Early devices attempted, with limited success, to recognize users' natural handwriting and hence required no learning at all. Current systems include Graffiti for the Palm Pilot and Jot for Windows CE (also available on the Pilot). Researchers have also proposed more efficient alphabets such as Unistrokes, which maps simple gestures to common characters regardless of mnemonic similarity.

Solutions for input in Devanagari scripts using a physical keyboard have been around for a while, but none of them has emerged as a standard mechanism due to several usability-related concerns. HP has come up with a Gesture Keyboard, which consists of a keyboard and a stylus [3].
Conceptually, it is a stylus-sensitive keypad that supports tapping to enter the base consonants and some symbols, while it also recognizes handwritten maatras using handwriting recognition methods. The overhead here is that the user has to search the keyboard to find the consonant and then draw the maatra accordingly; this later needs to be verified on the screen, which is a tedious process.

IndicDasher
IndicDasher is an enhanced version of the existing Dasher with features that help the user use Indic scripts effectively. In Dasher, adding an appropriate configuration file, called alphabet.xml, can enable new scripts and languages. In English there is no concept of maatras, so it is easy and efficient to select letters from the incoming blocks. In Indic scripts, however, most consonants need to combine with a maatra to form a new character: the user selects a consonant from one block and a maatra from another block to form the combination. By adding gesture recognition, it becomes easier to add a maatra to a consonant. The user points to the consonant and draws the maatra on the interface (Figure 3). The possible maatras are displayed in the suggestion box, which acts as a cue to the user. The selected consonant and the drawn maatra combine to form the character, which is displayed in the edit box (Figures 4 and 5).
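As a concrete illustration of this consonant-plus-maatra combination (an illustrative sketch only, not the IndicDasher code; the mapping shown covers just a few Devanagari vowels), the composed syllable can be formed simply by appending the Unicode dependent vowel sign to the consonant:

# Sketch: combine a consonant selected in Dasher with a recognized maatra.
MAATRA_FOR_VOWEL = {     # independent vowel -> dependent vowel sign (maatra)
    "\u0906": "\u093E",  # AA -> vowel sign AA
    "\u0907": "\u093F",  # I  -> vowel sign I
    "\u0908": "\u0940",  # II -> vowel sign II
    "\u0909": "\u0941",  # U  -> vowel sign U
}

def compose(consonant, recognized_vowel=None):
    """Append the maatra for the recognized vowel to the selected consonant."""
    if recognized_vowel is None:
        return consonant                      # bare consonant, no maatra drawn
    return consonant + MAATRA_FOR_VOWEL[recognized_vowel]

ka = "\u0915"                                 # DEVANAGARI LETTER KA
print(compose(ka, "\u0907"))                  # prints the syllable ka + vowel sign i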
Figures 3-5: The IndicDasher interface, showing a selected consonant, the stroke drawn for a maatra, the suggestion box, the multistroke corresponding to a vowel, and the vowel added to the consonant.
In addition to gesture recognition, IndicDasher is integrated with a speech component. If the user selects speak mode, the characters in the edit box are read out by the speech component once the user stops writing. In spell mode, each character added to the edit box is spelled out.
System Overview
Considering the intricacies of Indic scripts, we have
designed our system as a layered framework. The primary design goal of our system is to achieve maximum flexibility and usability. For that, we have designed a user-adaptive, three-layer approach that processes the input in three stages: the basic layer processes the input at stroke level by removing noise, the middle layer extracts features from the strokes, and the third layer maps the strokes to maatras. The stages are explained briefly below, after a compact sketch of how they might fit together.
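The sketch below is purely illustrative (it is not the IndicDasher implementation; the function names, the direction-coding convention and the tiny stroke database are all hypothetical), but it shows one way the three stages, noise removal, direction-based feature extraction and nearest-match lookup, can be wired together:

# Sketch of the three-stage stroke recognition flow described in this section.
import math

# Stage 1: preprocessing -- drop repeated points to reduce pen jitter.
def preprocess(points):
    cleaned = [points[0]]
    for p in points[1:]:
        if p != cleaned[-1]:
            cleaned.append(p)
    return cleaned

# Stage 2: feature extraction -- approximate each stroke segment by one of
# eight directional flows (code 0 = +x direction, increasing by 45 degrees).
def direction_codes(points):
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        codes.append(int(round(angle / (math.pi / 4))) % 8)
    return codes

def smooth(codes):
    # Remove isolated single-sample direction changes (a simple smoothing pass).
    out = list(codes)
    for i in range(1, len(out) - 1):
        if out[i - 1] == out[i + 1] != out[i]:
            out[i] = out[i - 1]
    return out

# Stage 3: map the stroke to the nearest labelled stroke in the database,
# using edit distance between direction-code sequences.
def edit_distance(a, b):
    d = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return d[-1]

def recognize(points, stroke_db):
    feats = smooth(direction_codes(preprocess(points)))
    return min(stroke_db, key=lambda label: edit_distance(feats, stroke_db[label]))

# Example with a hypothetical two-entry stroke database.
db = {"maatra_a": [2, 2, 2, 2], "maatra_b": [0, 0, 1, 2]}
stroke = [(0, 0), (0, 1), (0, 1), (0, 2), (0, 3), (0, 4)]   # a vertical stroke
print(recognize(stroke, db))                                # -> maatra_a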
Preprocessing
The basic unit for recognition is referred to as a stroke, which is usually a segment of pen motion from the pen-down to the pen-up position. The input of the system is a sequence of strokes obtained from the movements of a mouse or a pen-based device. In the preprocessing stage, the input stroke sequence is processed to reduce the noise [4].

Feature Extraction
The input to the feature extraction phase is a set of preprocessed stroke sequences. In this phase, each basic stroke is processed to trace its directional flow. The system defines 8 basic directional flows, as shown in Figure 6.

Figure 6: The eight basic directional flows.

An intermediate representation of the stroke is formed which describes the stroke as a sequence of directional flows in units [5]. Further smoothing is applied to the direction sequence to remove possible errors of the conversion phase [6]. This phase simplifies the input data so that feature extraction rules can be written in terms of directions rather than sequences of points. The radial moment [7] of the stroke is also computed in the extraction stage. The intermediate representation stores the temporal and relative spatial information of strokes. The relative position of a stroke is its position with respect to the previous stroke in the sequence; in total, nine relative positions are defined, namely top, bottom, left, right, top-left, top-right, bottom-left, bottom-right and inside.

Recognition Process
Each handwritten symbol is represented by a set of strokes SS = {st1, st2, ..., stn}, where each stroke is a sequence of points. In the feature extraction phase, each stroke sti is transformed into an intermediate stroke-level representation, which describes the individual stroke in terms of a direction vector D = {d0, d1, ..., dn-1}, where each element di is a direction approximation of a section of the stroke sti, together with its radial moments and relative position. The stroke-level information is given to the string matching algorithm to identify the nearest matches from the stroke database. To resolve ambiguity in stroke-level recognition, the radial moments of the stroke are used.

Training
IndicDasher is provided with default training for all maatras in Indic scripts. The user can adapt the system to his or her writing style by training it further. All the unique strokes of a script are manually identified and given unique labels; the collection includes all the commonly used maatras of a script. In the training phase, stroke information is extracted for all the basic strokes of a script defined by the system. From the intermediate stroke representation of the feature extraction phase, a basic stroke database and a character database are built. Apart from the set provided by the system, the user can train new symbols for recognition. The training interface is shown in Figure 7.
Figure 7: The training interface.

Integration with Dasher
This maatra recognition module is overlaid on the Dasher interface to enable it to recognize pen input from the user. Appropriate training files are loaded according to the default input language chosen in Dasher, so Dasher is then able to recognize the maatras for that particular script. When using IndicDasher, the user focuses on a particular consonant and writes the stroke using a pen device. The stroke is recognized and appended to the consonant in focus to output the desired syllable. Thus IndicDasher not only enhances the current capabilities of Dasher when it comes to ease of use with Indic scripts, it also provides a new mechanism for specially abled users to input text into computers. IndicDasher has been successfully tested for 4 Indian scripts (Devanagari, Telugu, Kannada and Malayalam).

Conclusion and Future Work
IndicDasher extends Dasher in an intuitive manner with stroke and gesture recognition. IndicDasher combines the capabilities of Dasher with stroke recognition so as to enable fast and efficient text input, and its usability is augmented by continuous speech output. Future plans include extending IndicDasher to other languages such as Thai, Khmer, etc.

REFERENCES
1. David J. Ward, Alan F. Blackwell and David J. C. MacKay. "Dasher: a data entry interface using continuous gestures and language models". In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology (UIST), 2000.
2. Sriganesh Madhvanath, Deepu Vijayasena and Thanigai Murugan Kadiresan. "LipiTk: A Generic Toolkit for Online Handwriting Recognition". In Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR-10), 2006.
3. R. Balaji, V. Deepu, Sriganesh Madhvanath and Jayasree Prabhakaran. "Handwritten Gesture Recognition for Gesture Keyboard". In Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR-10), 2006.
4. Stephen M. Watt and Xiaofang Xie. "Prototype Pruning by Feature Extraction for Handwritten Mathematical Symbol Recognition". In Proceedings of the Maple Conference 2005, pp. 423-437.
5. Sung-Hyuk Cha, Yong-Chul Shin and Sargur N. Srihari. "Approximate Stroke Sequence String Matching Algorithm for Character Recognition and Analysis". In Proceedings of ICDAR '99, 1999.
6. Chen, S. F. and Goodman, J. "An empirical study of smoothing techniques for language modeling". Computer Speech and Language, 13:359-394, 1999.
7. Desai, M. and Cheng, H. D. "Pattern Recognition by Local Radial Moments". In Proceedings of the International Conference on Pattern Recognition, pp. 168-172, 1994.
An Outline of a Multilingual Natural Language Text and Speech Interface for Computing Devices in the South Asian Context
Anil Kumar Singh, Language Technologies Research Centre, IIIT, Hyderabad, India
[email protected]
ABSTRACT
In this paper, we present the outline of a natural language text and speech interface for computing devices. This interface is based on the idea of a Universal Speech Interface (USI), which is a universal paradigm for human-machine speech communication. The vision of USI is to have ubiquitous human-machine interactivity via speech, with the focus being on simple machines. It will have functionality somewhere between systems based on full-scale natural language interaction and normal telephone-based systems. The computing machine can be any gadget, appliance or automated service that must be communicated with; it can even be a robot. Our focus in this paper is on describing how such an interface can be extremely useful in developing regions like South Asia. We also describe an initial model for the natural language interface (NLI) and how it can be combined with existing open source speech recognition and synthesis systems to build a working interface for South Asian languages.

INTRODUCTION
Computing is becoming more and more ubiquitous, to the extent that not only people unfamiliar with computers but even those who cannot read and write are beginning to have access to some computing machines. Even if such people cannot afford to own computing machines, they may still have to interact with them, e.g., mobile phones, ATMs, computers in Internet cafes, etc. There is a requirement for an interface that allows these people to interact with computers. Such an interface, if it is based on natural language and speech processing, may be a convenient way to interact with computing devices, even for people who know how to read and write and are familiar with computers. However, real natural language (NL) is so complex that, at present, it is not possible to have an interface that can handle all the complexities of an NL. Even if it were theoretically possible, it would not be practical in the near future, because most computing devices may not have the capabilities and language resources to process real NL. Also, general-purpose speaker-independent speech recognition is very difficult. Fortunately, we do not always require an interface that can handle full-fledged NL, because most computing devices (like a printer) have very limited functionality, and interacting with them is possible with a restricted language that covers a small subset of the real NL.
We do have the theoretical and practical capabilities to process such restricted languages and also to build very accurate speech processing systems for them. In this paper, we present an outline of a proposed method of interaction with computing devices. This method assumes that only a restricted language is necessary to interact with most computing devices, including state-of-the-art robots. Therefore, just as it is possible to build any kind of graphical user interface (GUI) using only a few GUI primitives, it is possible to build any kind of NL and speech interface using a simple language model, whether rule-based or statistical. The basic idea is derived from the Universal Speech Interface [13] proposed by Rosenfeld, which in turn is inspired by the success of GUIs and the Graffiti handwriting recognition system.

MULTILINGUALISM AND THE SOUTH ASIAN CONTEXT
Building a natural language and speech interface is hard enough; building such an interface that can be easily adapted to many languages seems much more difficult. However, two facts make a solution to this problem feasible. The first is, of course, that we are focusing only on restricted languages, because restricted languages are adequate for interacting with most computing devices, as pointed out earlier. The other important fact is that we are focusing on South Asian languages. As is now widely accepted (following Emeneau [4, 5]), South Asia forms a 'linguistic area', i.e., even though there are a large number of languages in this region, belonging to several different families, they have a lot of similarities due to the phenomenon called 'convergence'. The reasons for this include long-term (millennia-long) contact, frequent migrations, changing dominance of different communities, etc. Because of convergence, it is easier to build a multilingual or cross-lingual NLP application for these languages [16] than to build a different application for each language from scratch. The similarities among most of the South Asian languages are at all linguistic levels, such as lexicon, morphology, syntax and semantics. Some other factors important in the South Asian context are: the lack of linguistically trained people, the lack of NLP researchers, the lack of financial resources, the fact that the small elite which has most of the resources does not use local languages for formal purposes, the lack of computational support (in existing operating systems and user programs) for
the South Asian languages, and the lack of the infrastructure needed to build complex and advanced NLP applications such as those involving multimodal interaction. In the light of these factors, the method that we suggest in this paper can be more practical than other, more sophisticated methods. Moreover, while the resources and infrastructure may be deficient, the need for a natural language and speech interface in this region is, for obvious reasons, greater than in the developed regions of the world.

SOME PREVIOUS WORK
A great deal of work has been done on NL and speech interfaces and dialogue systems. In this section we mention only some of the work that is most relevant to this paper. Victor Zue discussed the advances and challenges in conversational interfaces in 1997 [19], summarizing the progress in this area, discussing the issues faced by researchers in trying to build such systems, and presenting some unmet challenges. The premise on which that discussion was based was "that human language technology will play a central role in providing an interface that will dramatically change the human-machine communication paradigm from programming to conversation" (emphasis in the original). In 1999, J. Glass [6] discussed the challenges for spoken dialogue systems. Even earlier, in 1995, Laengle et al. [9] had proposed KANTRA, a natural language interface for intelligent robots. In 2000, Rosenfeld [14] described an attempt at designing and evaluating 'universal human-machine speech-based interfaces'. An extended version of this work was presented in 2001 by Rosenfeld et al. [13]. In the same year, G. Chung [3] described his work on building multi-domain speech understanding with a flexible and dynamic vocabulary, and Brad A. Myers [10] suggested how handheld devices and personal computers could be used together. In 2002, Toth et al. [18] proposed an application generator for speech interfaces to databases, and Nichols et al. [11] described their work on generating remote-control interfaces for complex appliances. In the same year, Shriver and Rosenfeld [15] presented work on keyword selection for the Universal Speech Interface project. The next year, Nichols et al. [12] described personal universal controllers for controlling complex appliances with GUIs and speech. In 2004, Tomko and Rosenfeld presented a study of user feedback about 'speech graffiti'. In the same year, Glass et al. [7] presented a framework for developing conversational user interfaces. In 2005, Tomko et al. [17] presented further work on the 'speech graffiti' project for efficient human-machine speech communication. The work described in this paper was originally started by this author in 2004. At that time we had planned to build a natural language and speech interface for a Ubiquitous Mobile Robot (UMR; see http://ltrc.iiit.ac.in/anil/projects/nli-speech/). Part of the project was implemented but, for various reasons, it was not completed. The proposed work is an extension of that work, but with a larger scope and with some modifications. The earlier work was targeted mostly at robots, whereas the present work is for any kind of computing device. The computational model for the NLI has also been revised significantly.

UNIVERSAL SPEECH INTERFACE
Since the framework being proposed in this paper for South Asian languages is closely related to the work on the Universal Speech Interface, or USI, we describe the core idea of USI in this section. USI is a universal paradigm for human-machine speech communication. The vision is to have ubiquitous human-machine interactivity via speech. The focus is on simple machines (or computing applications) which do not require the full expressive power of NLs for interaction. A USI-based system lies somewhere between systems based on full-scale natural language interaction and the current telephone-based systems. In other words, it is more regular than a natural language, but more flexible than simple hierarchical menus. It is not aimed at applications requiring truly intelligent communication. The targeted machine can be any gadget, appliance or automated service which must be communicated with. USI is inspired by the success of Graphical User Interfaces and the Graffiti handwriting recognition system. The major features of a USI-based system can be described as:

• A universal speech interface style:
  – A universal metaphor
  – Universal user primitives
  – Universal machine primitives
  – A universal display style
• Moderate and incremental user training
  – Learning the USI core should take only a few minutes
• Standardized universal phrases
• An ontology of Interaction Primitives:
  – Discovery interactions
  – Navigation interactions
  – Confirmation interactions
  – Backtracking interactions
  – Reorientation interactions
  – Error correction interactions
  – Help correction interactions

OVERVIEW OF THE PROPOSED INTERFACE
In this section we present an overview of the proposed interface, as illustrated in Figure 1. As mentioned earlier, the interface is based on the USI philosophy. The interface requires systems for speech recognition, speech synthesis and a natural language interface. For speech recognition, a system like Sphinx [8] can be used; Sphinx provides the APIs and tools for developing a speech recognition system. For speech synthesis, either Festvox [1] or Flite [2] can be used. Flite has the advantage of a low memory footprint, which makes it suitable for small or less powerful devices.

Figure 1. An outline of a natural language and speech interface for computing devices. (The natural language interface comprises the S-Model (≈ syntax and semantics), the L-Model (≈ lexicon), the D-Model (≈ domain) and the function generator; it is connected to a speech recognition system for NL input, a speech synthesis system for NL output, and the APIs for controlling the computing device, which receives voice input and produces voice output.)

We are working on the design of a natural language interface, which is being implemented in Java. The proposed natural language interface consists of three component models. The first is the L-Model, which roughly covers the lexicon and morphology, i.e., words and multi-word expressions. The second is the D-Model, which models the domain, since our interface has to be customizable for different domains; the D-Model is required because we do not want to model the complete natural language in one go. The third is the S-Model, which roughly covers the syntax and semantics of the restricted language and is connected with the other two models. Another important part of the interface is the function generator. This component connects the APIs for controlling a device (possibly provided by the operating system) with the language model (L-Model, D-Model and S-Model). It does so by using the language model to generate functions which use the device APIs to perform various tasks. Finally, there is a generator which outputs a sentence in text form when a function is called by the device; this generator also uses the language model. The models are loaded from XML files and corresponding Java objects are created. When a human user gives a spoken command, the speech recognizer converts the utterance into text form. This sentence is analyzed by the natural language interface. Based on the information in the language model,
an appropriate function (previously generated by the function generator) is called and the device performs the task. When the device has to provide some feedback or carry on a dialogue with the human user, it also calls an appropriate function. This function causes the natural language generator to output a sentence as text, which is passed on to the speech synthesizer; the output of the synthesizer is speech heard by the human user. The most important point here is that the language model and its three components need not be extremely expressive. In fact, they can be only a little more expressive than formal languages. Therefore, the syntax and semantics covered by the S-Model are fairly simple and restricted versions of the syntax and semantics of a complete natural language. In the simplest case, a sentence can be seen as a natural language representation of a computable function, where the verb represents the function and the arguments of the verb represent the arguments of the function. The D-Model, likewise, is only as complicated as the functionality of the computing device for which the interface is being built. Note that the interface will not be built from scratch for a new device: the core interface will remain the same, and only domain-specific parts will change. The L-Model can be common to all devices, so, for example, we can start with a small L-Model for the first device or application that we target and expand it as new devices are added. The D-Model may have to be rewritten for a new device if the functionality is completely different, and the S-Model will have to be changed partially for a new device.
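The verb-as-function view of a restricted language can be illustrated with a small sketch. Everything below is hypothetical: the paper's interface is implemented in Java with XML model files, whereas this is only a toy illustration of the idea, with made-up lexical entries (including one transliterated Hindi word) and a made-up device API.

# Sketch: interpreting a restricted-language command as a function call.
# L-Model: surface words (possibly from several languages) -> canonical entries.
L_MODEL = {"print": "PRINT", "chhaapo": "PRINT"}

# D-Model: the functions exposed by a particular device's API.
def print_document(name, copies=1):
    print(f"printing {copies} copy/copies of {name}")

D_MODEL = {"PRINT": print_document}

# S-Model: a single restricted pattern "VERB <object> [<number>]".
def interpret(tokens):
    verb = L_MODEL.get(tokens[0])
    if verb not in D_MODEL:
        raise ValueError("utterance is outside the restricted language")
    copies = int(tokens[2]) if len(tokens) > 2 and tokens[2].isdigit() else 1
    return D_MODEL[verb], {"name": tokens[1], "copies": copies}

# The recognized utterance (already converted to text by the speech recognizer).
func, args = interpret("print report 2".split())
func(**args)    # -> printing 2 copy/copies of report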
Since most of the major South Asian languages have a lot of similarities, the interface built for one language can be adapted to another South Asian language with comparative ease, depending on the distance between the two languages. From the L-Model point of view, there are a lot of words common to many South Asian languages, and from the S-Model point of view, the syntax of South Asian languages also has a lot of similarities, e.g. free word order, subject-object-verb (SOV) structure, suitability of dependency grammars, etc. All these facts make it feasible to build a multilingual natural language and speech interface for South Asian languages, even though there are many major languages in South Asia.

CONCLUSION
In this paper we discussed the need for a natural language and speech interface for developing regions. We also discussed some of the issues which are important in the South Asian context from the point of view of building such an interface. We proposed an interface based on the Universal Speech Interface, or USI, paradigm. The basic idea in this paradigm is that it is neither feasible nor necessary to build an interface (whether text-based or speech-based) that can handle complete natural language; it is possible to build interfaces for different devices with coverage of only the restricted languages that are adequate for interacting with most computing devices. We presented an outline of an interface based on this paradigm for South Asian languages and argued that it is feasible to build such an interface because of the similarities among South Asian languages, even though there is a lack of all kinds of resources.

REFERENCES
1. Black, A. and Taylor, P. The Festival Speech Synthesis System: System Documentation (1.1.1). Technical Report HCRC/TR-83, Human Communication Research Centre, January 1997.
2. Black, A. W. and Lenzo, K. A. Flite: a small fast run-time synthesis engine. In Proceedings of the 4th ISCA Workshop on Speech Synthesis (Scotland, August-September 2001).
3. Chung, G. Towards Multi-domain Speech Understanding with Flexible and Dynamic Vocabulary. PhD thesis, Massachusetts Institute of Technology, 2001.
4. Emeneau, M. B. India as a linguistic area. Linguistics 32:3-16 (1956).
5. Emeneau, M. B. Language and linguistic area. Essays by Murray B. Emeneau. Selected and introduced by Anwar S. Dil. Stanford University Press, 1980.
6. Glass, J. Challenges for spoken dialogue systems. In Proceedings of the 1999 IEEE ASRU Workshop (1999).
7. Glass, J., Weinstein, E., Cyphers, S., Polifroni, J., Chung, G. and Nakano, N. A framework for developing conversational user interfaces. In Proceedings of CADUI (Isle of Madeira, Portugal, 2004).
8. Huang, X., Alleva, F., Hon, H.-W., Hwang, M.-Y. and Rosenfeld, R. The SPHINX-II speech recognition system: an overview. Computer Speech and Language 7, 2 (1993), 137-148.
9. Laengle, T., Lueth, T., Stopp, E., Herzog, G. and Kamstrup, G. KANTRA - a natural language interface for intelligent robots, 1995.
10. Myers, B. A. Using handhelds and PCs together. Communications of the ACM 44, 11 (2001), 34-41.
11. Nichols, J., Myers, B., Higgins, M., Hughes, J., Harris, T. K., Rosenfeld, R. and Pignol, M. Generating remote control interfaces for complex appliances. In Proceedings of the 15th Annual Symposium on User Interface Software and Technology (UIST'02) (Paris, France, 2002), pp. 161-170.
12. Nichols, J., Myers, B. A., Higgins, M., Hughes, J., Harris, T. K., Rosenfeld, R. and Litwack, K. Personal universal controllers: Controlling complex appliances with GUIs and speech, 2003.
13. Rosenfeld, R., Olsen, D. and Rudnicky, A. Universal speech interfaces. Interactions 8, 6 (2001), 34-44.
14. Rosenfeld, R., Zhu, X., Toth, A., Shriver, S., Lenzo, K. and Black, A. Towards a universal speech interface. In Proceedings of ICSLP (Beijing, China, 2000).
15. Shriver, S. and Rosenfeld, R. Keyword selection and the universal speech interface project. In Proceedings of AVIOS (San Jose, CA, 2002).
16. Singh, A. K. and Surana, H. Using a single framework for computational modeling of linguistic similarity for solving many NLP problems. In Proceedings of the Eurolan Doctoral Consortium (Iasi, Romania, 2007).
17. Tomko, S., Harris, T. K., Toth, A., Sanders, J., Rudnicky, A. and Rosenfeld, R. Towards efficient human-machine speech communication: The speech graffiti project. ACM Transactions on Speech and Language Processing 2(1) (2005).
18. Toth, A., Harris, T. K., Sanders, J., Shriver, S. and Rosenfeld, R. Towards every-citizen's speech interface: An application generator for speech interfaces to databases. In Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP/Interspeech) (2002).
19. Zue, V. Conversational interfaces: Advances and challenges. In Proceedings of Eurospeech '97 (Rhodes, Greece, 1997), pp. KN-9-KN-18.
Trust building user generated interfaces for illiterate people
José Manuel Aguilar Alvarez, Department of Computer Science, Concordia University, Montreal, Canada
[email protected]

ABSTRACT
The introduction of technology to illiterate people faces many cultural barriers and technical constraints. At present, methodologies and process models for the efficient design of user interfaces targeted at developing regions make use of intensive analysis and add improvements over many iterations. Following a review of some of these methodologies, an alternative is presented that tackles user trust in technology and the efficient use of cultural content in the creation of user interfaces. The proposed User Interface Generator component can be used as a way to delegate a more direct role in the creation of their own content to the target communities. The result is a simplified design process that encourages capacity building, which is the ultimate goal of technologies for development. The purpose of our work is to experiment with and verify this approach by developing voice-based and iconic personalized user interfaces, along with software that generates content with aggregated geographical and cultural value from and by these communities.

Author Keywords
ICT4D, user interface, user generated content, voice feedback

ACM Classification Keywords
H.5.2 [User Interfaces]: Interaction Styles; H.1.2 [User/Machine Systems]: Human Factors

INTRODUCTION
Many of the factors for the successful design of user interfaces (UIs) that leverage the use of Information and Communication Technologies for Development (ICT4D) have been addressed. There exist theoretical and practical approaches for UI design targeted at developing regions. Most of them make use of lengthy iterative processes for understanding the communities' culture in order to gather requirements, make decisions, test, and incorporate additions after user feedback. For instance, Hypothetical User Design Scenarios (HUDS) are created to reduce the amount of field work needed when designing interfaces, by elaborating scenario scripts following a well-structured analysis of the user and their environment [4]. On the other hand, ethnographic UI design makes heavy use of empirical evidence and follows an extensive and dedicated interview process for gathering relevant information about the community and its environment [9]. Localization methods usually include observation, interviews, ethnographic analysis, and the application of cultural theories [15]. In the end, most of the design decisions in terms of content and post-deployment improvements are controlled by design experts in an iterative manner, making the process lengthy, reducing its cost-effectiveness, and leaving aside possibilities for creativity in these communities. By decoupling the way localized content is extracted, processed, and included in the UI during the design process, most of the effort made by developers and UI design experts can be transferred to the target communities, and more effort can be put into creating interfaces that encourage content creation. This more natural delegation of responsibilities can increase users' confidence, improve the level of penetration of these technologies, and create and preserve cultural identity. In this paper, a hypothetical scenario of such an approach is presented, followed by each of the aspects of generating these interfaces: a user-centric approach with personalized iconic and voice content, and the User Interface Generator as the entry point for sustainable content creation.

HYPOTHETICAL SCENARIO
Sometime in the near future, in a remote area, an Andean farmer is given a small device with multimedia capabilities. The farmer has been told the device will help with the family's daily life and could even be used to sell farm products. The farmer's family lives in a small village that has limited power and communication (the nearest telephone is located in the neighboring village, a two-hour walk from the farmer's village). Once the farmer has been taught how to turn the device on and off by a designated person who is also part of the community, the device plays a welcome message from a familiar voice: it is actually one of the village members, who has accepted to be the person in charge of coordinating device activity in the village. The farmer can verify this claim by
identifying the picture of this person on the screen. After answering a questionnaire (name, activity, and role in the community) by listening to the questions and answering directly by speaking to the device, an invitation is made to join the community network and access a set of services. After accepting and joining the community using the screen and a few input buttons, the farmer can now see a few other people who have already joined the community. These people are represented as icons in different ways that identify them (an image of a neighbor's house, a photograph of another member, or any other image that a member chooses to be identified by within the community). After querying the device for the meaning of each of the available images, an option is selected. The selection is a health service, represented by an image of a local doctor healing a person. Two options are given: either leave a message or answer some questions about the health condition of a family member. After choosing to answer a questionnaire, a person self-identified as the representative of a medical checkpoint asks for basic health information. Once the questionnaire is finished and this activity is confirmed, the device informs the farmer that an answer will be sent to him through the device as soon as possible. Afterwards, navigation returns to the main screen. The farmer selects a member and records a voice message, which is sent upon confirmation. By leaving this message, the farmer has requested information about where to sell vicuña wool. After a few minutes, a reply with a location is received from this member. Later that night, a reply from the health representative is received: the farmer has to pick up some medication from the health post the next morning. To better visualize this scenario, we created a hypothetical user-generated UI that shows the initial activity of the farmer while interacting with a mobile device (Figure 1). The initial screen shows three different communities the farmer can enroll in before interacting with the device. This content is meant to be generated automatically depending on which community networks are in range. There are three buttons common to all screens: a back or undo button, a check or confirm button, and a dialog or help button. After joining a network, the second screen shows a simplified profile page containing sequentially generated content (I for name, II for activity, III for role in the community). This content was entered by a fairly literate community member using a pre-defined script, indicating the type of item (question or form field) and recording its content. Upon selection, voice feedback is provided to explain how to edit the content, which basically means recording voice and confirming the action; the check-mark icon confirms changes in the user profile. Finally, the third screen shows other community members who have been added by the villager. By selecting a community member from this screen, a villager can activate other options to interact with this member.

Generating the interface
The user interface contains 3 basic elements:
• Voice messages, recorded according to a script. These are used to reach the user in questionnaires or other building blocks and to provide feedback through all the screen options.
• Local images, captured by a camera and transferred to the devices.
• Navigation buttons, used for basic input of actions.

Figure 1. Personalized User Interface

The content is entered by a community representative or an external service provider following general guidelines. The User Interface Generator organizes the content based on a script. All incoming user data is delivered from the devices either to a peer device or to a central workstation that acts as a gateway between the community and the outside communities.

USER-CENTRIC APPROACH FOR BUILDING TRUST
Previous efforts in designing effective UIs for developing regions include some level of personalized content. One example of this is the Simputer project in India [5], which uses a mobile telephone identification card that stores information about the user. Other projects, like One Laptop Per Child, include a geographical context in which the users (children) interact with their peers by sharing activities, mostly school work [12]. The importance of peer-to-peer assistance for generating content and facilitating the interaction of illiterate people with technology has also been studied [1]. In addition to these approaches, we raise the level of personalization in our UI, in which a member of the community interacts virtually with an existing member who has a delegated role. A role could be defined according to economic activity or skill set. This way, transactions among users and services have visible support, and a trust relationship can be established using technology as the medium.
Voice feedback
It has been observed that the presence of voice feedback, that is, the use of recorded or computer-generated audio messages that guide the user in performing a task with the device, dramatically improves usability [8, 13]. This suggests that voice messaging is naturally more accessible than written content, and that the amount of structured information and abstraction required to perform a task in the UI is lower than in one that relies only on visual elements. Indeed, cognitive structures in illiterate subjects are less organized than in literate ones [6]. Based on these observations, our approach relies heavily on human-generated voice feedback as the means for accessibility and reachability: messages can be easily generated by recorded voices, and users can gain confidence as they listen to familiar voices interacting with them. At a more advanced level, where users are literate and have computer skills, it has been observed that interfaces with explanations have the potential to build trust with their users [14]. At the moment, there is a strong constraint on making these systems responsive to voice commands or dialogs: development costs rise when voice recognition software requires customization and training. There have been efforts to create universal and flexible speech recognition systems for developing regions [10]. However, their initial accuracy may be a sensitive factor in the adoption of speech-based recognition. These systems will become more attractive as users become less apprehensive, which could be demonstrated by successful usability tests under real conditions.

GEOGRAPHICAL AND CULTURAL CONTEXT
We propose a redistribution of responsibilities, where some factors that are deeply analyzed by UI experts can be delegated to village members who would act as technological hubs between the villagers and the outside community (other villages, government, and external organizations). Among these responsibilities are the identification of user needs, as specified in the Hypothetical User Design Scenarios (HUDS) method [4], the creation and maintenance of cultural context via interviews, as used in ethnographic UI design [9], and the whole task set used for localization [15]. The instrument for delegating these requirements, which are inherently acquired by the acting content programmer (a villager or group of villagers, a government representative, or a member of any other organization interested in bringing services), is a well-structured, simple, and malleable UI generator. UI generators have been applied in industry for decades, and they are built using rule-based [11] or algorithm-optimization [2, 3] approaches. While a rule-based approach is more attractive for low-cost and low-resource devices because of its low computation requirements, its lack of adaptability makes it less maintainable, especially in remote areas. On the other hand, the more sophisticated and highly adaptable algorithm-optimization approach appears to be more attractive, despite its high computation requirements, because the target interfaces are expected to be simpler than those for literate people. However, in our specific case, both interface-rendering approaches must be evaluated after adding the voice and iconic nature of our interfaces. With either approach, our UI generator must balance the cost of UI design between expert-made UIs and automated UI generation with content entered directly by villagers through its delegates (see Figure 2 to visualize this relationship).

Figure 2. Balancing the UI content

ACCESS TO SERVICES
Building blocks
Previous work has been done on building a set of instructions using iconic representations as building blocks [7]. In our approach, we define a set of basic content blocks that a content provider will use to generate a working service interface. For example, in the case of the villager from our hypothetical scenario: a provider, namely a doctor or nurse, builds an interface for diagnosis questionnaires using voice-driven, iconic menu blocks and general messages, according to a script that follows basic guidelines which the provider believes will deliver a clear message to the villager. At the same time, villagers can leave voice messages giving feedback directly to the provider or using the community network operator as an intermediary. Figure 3 shows how these blocks can be organized and generated: the first image shows a group of services distributed symmetrically on the screen (health, financial, trading, and transportation). The next two images show how a service is accessed: with the help of voice feedback (the dialog icon at the bottom right), the villager chooses the health service (red cross sign) and then selects to answer some questions (stethoscope icon). The questions are answered using audio recording (the text content in the shaded area is hidden from the user). It is worth noting that the shaded silhouettes used to identify people interacting with the villager should be replaced with images of real participants in future models of our interface.

Figure 3. Self-generated User Interface

Scripting
Huenerfauth's hypothetical design makes use of scripts to describe the interaction between the user and the interface [5]. To transfer this task to the village, we create a basic set of words and dialogues that the community member in charge of the devices will have to record on the device before first use by the villagers. In the same way, any service provider who generates a UI through which the villagers access the provider's services must follow the procedure and create a script to record, using the building blocks and guidelines for UI generation. The interface content is enriched automatically by the community's members every time a villager makes use of the device and accesses his or her community network. The sample script below shows how a service can be added to the interface.

Sample script
service type: health
provider: name?, purpose? (feedback)
content: add option
option name: diagnose
type: questionnaire
add icon
content: add question
question?, explain question (feedback)
add icon
continue/end?
end option
end service

The script above is intended to be an instance of a service description language that will feed the UI generator. This script must be generated from a series of questions and answers that the user agent enters via iconic and voice input, based on the requirements (adding a questionnaire for health services, in this case).
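As a rough illustration of how such a service description might feed a UI generator (this is not the authors' generator; the parsing rules, field names and block structure below are all hypothetical), the script can be turned into a small set of building blocks:

# Sketch: parse a service-description script into building blocks for a UI generator.
def parse_script(text):
    service = {"type": None, "provider": None, "options": []}
    option = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("service type:"):
            service["type"] = line.split(":", 1)[1].strip()
        elif line.startswith("provider:"):
            service["provider"] = line.split(":", 1)[1].strip()
        elif line.startswith("option name:"):
            option = {"name": line.split(":", 1)[1].strip(), "blocks": []}
        elif line.startswith("type:") and option is not None:
            option["kind"] = line.split(":", 1)[1].strip()
        elif line == "add icon" and option is not None:
            option["blocks"].append({"block": "icon"})          # iconic menu block
        elif "add question" in line and option is not None:
            option["blocks"].append({"block": "question",       # voice-driven block
                                     "voice_prompt": True})
        elif line == "end option" and option is not None:
            service["options"].append(option)
            option = None
    return service

sample = """service type: health
provider: name?, purpose? (feedback)
content: add option
option name: diagnose
type: questionnaire
add icon
content: add question
question?, explain question (feedback)
add icon
continue/end?
end option
end service"""

print(parse_script(sample))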
CONCLUSION AND FUTURE WORK
An alternative approach is given to reduce the cost of designing UIs for remote regions where there is a high level of illiteracy. We can also leverage the involvement of their members by using geographical context, personalization with intensive voice feedback, and by delegating content creation to the acting parties themselves through the use of building blocks and automated interface generation. This approach can be beneficial for these remote communities because it targets usability, encourages creativity, and builds a collective image that has more possibilities to access the resources of the Information Society (e-Government, e-Learning, etc.). For validation purposes, a proof-of-concept prototype is being developed. There are also general-purpose features observed in previous studies that should be included in our model of building blocks: moving icons and, if possible, video feedback, although this depends heavily on hardware constraints.
ACKNOWLEDGMENTS
The author would like to thank Dr. Peter Grogono and Sheila Turner for their insightful comments during the development of this paper.

REFERENCES
1. Chand, A. and Kend, A. D. Jadoo: A paper user interface for users unfamiliar with computers. In CHI '06 Extended Abstracts on Human Factors in Computing Systems. ACM, 2006, 1625-1630.
2. Gajos, K., Christianson, D., Hoffmann, R., Shaked, T., Henning, K., Long, J. J., and Weld, D. S. Fast and robust interface generation for ubiquitous applications. In Proc. UbiComp 2005: Ubiquitous Computing, 7th International Conference. Springer, 2005, 37-55.
3. Gajos, K. and Weld, D. S. SUPPLE: Automatically generating user interfaces. In Proc. 9th International Conference on Intelligent User Interfaces (IUI '04). ACM, 2004, 93-100.
4. Huenerfauth, M. Design approaches for developing user-interfaces accessible to illiterate users. In Intelligent and Situation-Aware Media and Presentations Workshop, Eighteenth National Conference on Artificial Intelligence (AAAI-02), 2002.
5. Huenerfauth, M. Developing Design Recommendations for Computer Interfaces Accessible to Illiterate Users. Master's thesis, National University of Ireland, Department of Computer Science, 2002.
6. Katre, D. S. Unorganized cognitive structures for illiterate as the key factor in rural e-learning design. I-Manager's Journal of Education Technology, 2, 4 (2006), 67-71.
7. Kuicheu, N. C., Fotso, L. P., and Siewe, F. Iconic communication system by XML language (SCILX). In Proc. 2007 International Cross-disciplinary Conference on Web Accessibility (W4A). ACM, 2007, 112-115.
8. Medhi, I., Prasad, A., and Toyama, K. Optimal audio-visual representations for illiterate users of computers. In Proc. 16th International Conference on World Wide Web. ACM, 2007, 873-882.
9. Medhi, I., Sagar, A., and Toyama, K. Text-free user interfaces for illiterate and semi-literate users. In Conference on Information and Communication Technologies and Development, 2006.
10. Nedevschi, S., Patra, R. K., and Brewer, E. A. Hardware speech recognition for user interfaces in low cost, low power devices. In Proc. of the 42nd Annual Conference on Design Automation. ACM, 2005, 684-689.
11. Nichols, J., Chau, D. H., and Myers, B. A. Demonstrating the viability of automatically generated user interfaces. In Proc. SIGCHI Conference on Human Factors in Computing Systems. ACM, 2007, 1283-1292.
12. OLPC. OLPC Human Interface Guidelines / The Laptop Experience / Zoom Metaphor. http://wiki.laptop.org/go/OLPC_Human_Interface_Guidelines/The_Laptop_Experience/Zoom_Metaphor.
13. Plauché, M. and Prabaker, M. Tamil Market: A spoken dialog system for rural India. In CHI '06 Extended Abstracts on Human Factors in Computing Systems, 2006, 1619-1624.
14. Pu, P. and Chen, L. Trust building with explanation interfaces. In Proc. 11th International Conference on Intelligent User Interfaces. ACM, 2006, 93-100.
15. Smith, A., Dunckley, L., French, T., Minocha, S., and Chang, Y. A process model for developing usable cross-cultural websites. Interacting with Computers, 16, 1 (2004), 63-91.