Multimodal Human-Computer Interaction
New Interaction Techniques 22.1.2001
Roope Raisamo (rr@cs.uta.fi)

Department of Computer and Information Sciences
University of Tampere, Finland

Multimodal human-computer
interaction
A definition [Raisamo, 1999e, p. 2]:
”Multimodal interfaces combine many
simultaneous input modalities and may
present the information using synergistic
representation of many different output
modalities”

Multimodal interaction
techniques
Our definition of an interaction technique
[Raisamo, 2000]:
• An interaction technique is a way to carry out
an interactive task. It is defined in the binding,
sequencing, and functional levels, and is
based on using a set of input and output
devices or technologies.
– In a multimodal interaction technique, more than
one input or output is used for the same task.

Two views
• A Human-Centered View
– common in psychology
– often considers human input channels, i.e.,
computer output modalities, and most often vision
and hearing
– applications: a talking head, audio-visual speech
recognition, ...

• A System-Centered View
– common in computer science
– a way to make computer systems more adaptable


Multimodal human-computer interaction
[Figure: interaction information flow between the computer
(computer input modalities, ”cognition”, computer output media)
and the human (human output channels, cognition, human input
channels), including the intrinsic perception/action loop]

Senses and modalities

Sensory perception   Sense organ            Modality
Sense of sight       Eyes                   Visual
Sense of hearing     Ears                   Auditive
Sense of touch       Skin                   Tactile
Sense of smell       Nose                   Olfactory
Sense of taste       Tongue                 Gustatory
Sense of balance     Organ of equilibrium   Vestibular

[Silbernagel, 1979]

Design space for
multimodal user interfaces

                         Use of modalities
                         Sequential    Parallel
Fusion   Combined        ALTERNATE     SYNERGISTIC
         Independent     EXCLUSIVE     CONCURRENT

Levels of abstraction (under each use of modalities):
Meaning, No Meaning

[Nigay and Coutaz, 1993]
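
The design space can be read as a lookup from two choices (how the modalities are used over time, and whether their inputs are fused) to one of four named classes. The following is a minimal sketch of that reading; the enum and function names are illustrative assumptions and do not come from [Nigay and Coutaz, 1993], and the levels-of-abstraction axis is left out.

```python
from enum import Enum


class UseOfModalities(Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"


class Fusion(Enum):
    COMBINED = "combined"
    INDEPENDENT = "independent"


# Cell names follow the design space on the slide; the dictionary
# layout itself is only an illustrative assumption.
DESIGN_SPACE = {
    (Fusion.COMBINED, UseOfModalities.SEQUENTIAL): "ALTERNATE",
    (Fusion.COMBINED, UseOfModalities.PARALLEL): "SYNERGISTIC",
    (Fusion.INDEPENDENT, UseOfModalities.SEQUENTIAL): "EXCLUSIVE",
    (Fusion.INDEPENDENT, UseOfModalities.PARALLEL): "CONCURRENT",
}


def classify(fusion: Fusion, use: UseOfModalities) -> str:
    """Return the design-space class for a fusion / use-of-modalities pair."""
    return DESIGN_SPACE[(fusion, use)]
```

For example, a system that fuses speech and pointing issued at the same time falls in the SYNERGISTIC cell: classify(Fusion.COMBINED, UseOfModalities.PARALLEL).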

An architecture for
multimodal user interfaces
Adapted from [Maybury and Wahlster, 1998]:
• Input processing
– motor
– speech
– vision
– …
• Media analysis
– language
– recognition
– gesture
– …
• Interaction management
– media fusion
– discourse modeling
– plan recognition and generation
– user modeling
– presentation design
• Media design
– language
– modality
– gesture
– …
• Output generation
– graphics
– animation
– speech
– sound
– …
• Application interface

Modeling

[Nigay and Coutaz, 1993]

Put-That-There

[Bolt, 1980]
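
A command such as "put that there" only makes sense once each deictic word is tied to a pointing gesture. The sketch below shows one simple way to do this: bind each deictic word to the pointing event closest to it in time. It is not Bolt's implementation; the nearest-in-time rule, the max_gap threshold, and all names and sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    time: float  # seconds, when the word was spoken


@dataclass
class Pointing:
    target: str  # hypothetical label of whatever the user pointed at
    time: float  # seconds, when the pointing gesture occurred


DEICTIC_WORDS = {"that", "there"}


def resolve_deictics(words, pointings, max_gap=1.0):
    """Pair each deictic word with the pointing target closest in time.

    Returns a list of (word, target) tuples. The target is None for
    non-deictic words and for deictic words with no pointing event
    within max_gap seconds.
    """
    resolved = []
    for word in words:
        if word.text.lower() not in DEICTIC_WORDS:
            resolved.append((word.text, None))
            continue
        nearest = min(pointings, key=lambda p: abs(p.time - word.time), default=None)
        if nearest is not None and abs(nearest.time - word.time) <= max_gap:
            resolved.append((word.text, nearest.target))
        else:
            resolved.append((word.text, None))
    return resolved


# Hypothetical sample data: spoken words and two pointing gestures.
utterance = [Word("put", 0.0), Word("that", 0.4), Word("there", 1.6)]
gestures = [Pointing("blue square", 0.5), Pointing("upper-left corner", 1.5)]
result = resolve_deictics(utterance, gestures)
```

With this sample data, "that" is bound to the blue square and "there" to the upper-left corner, while "put" carries no pointing target.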

Potential benefits
A list by Maybury and Wahlster [1998, p. 15]:
– Efficiency
– Redundancy
– Perceptibility
– Naturalness
– Accuracy
– Synergy
– Mutual disambiguation of recognition errors
[Oviatt, 1999a]
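
Mutual disambiguation means that one recognizer's hypotheses can rule out errors in the other's. A minimal sketch of the idea follows: each recognizer returns a scored n-best list, and the pair with the highest joint score among the semantically compatible pairs is chosen. The function name, the sample hypotheses and scores, and the use of a plain product of scores (which assumes the two recognizers err independently) are illustrative assumptions, not a method taken from [Oviatt, 1999a].

```python
from itertools import product


def best_joint_interpretation(speech_nbest, gesture_nbest, compatible):
    """Pick the highest-scoring compatible (speech, gesture) pair.

    speech_nbest / gesture_nbest: lists of (hypothesis, probability).
    compatible: set of (speech_hypothesis, gesture_hypothesis) pairs
    that form a meaningful command together.
    Returns (speech, gesture, joint_score), or None if no pair is compatible.
    """
    best = None
    for (s, ps), (g, pg) in product(speech_nbest, gesture_nbest):
        if (s, g) not in compatible:
            continue
        score = ps * pg  # assumes independent recognizer errors
        if best is None or score > best[2]:
            best = (s, g, score)
    return best


# Hypothetical n-best lists: speech slightly prefers the wrong word.
speech = [("ditch", 0.6), ("ditches", 0.4)]
gesture = [("two lines", 0.7), ("one line", 0.3)]
compatible = {("ditch", "one line"), ("ditches", "two lines")}
choice = best_joint_interpretation(speech, gesture, compatible)
```

Here the top speech hypothesis "ditch" is overruled: combined with the gesture evidence, "ditches" with "two lines" has the higher joint score.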

Common misconceptions
A list by Oviatt [1999b]:
1. If you build a multimodal system, users will interact
multimodally.
2. Speech and pointing is the dominant multimodal
integration pattern.
3. Multimodal input involves simultaneous signals.
4. Speech is the primary input mode in any multimodal
system that includes it.
5. Multimodal language does not differ linguistically
from unimodal language.

Common misconceptions
6. Multimodal integration involves redundancy of
content between modes.
7. Individual error-prone recognition technologies
combine multimodally to produce even greater
unreliability.
8. All users’ multimodal commands are integrated in a
uniform way.
9. Different input modes are capable of transmitting
comparable content.
10. Enhanced efficiency is the main advantage of
multimodal systems.
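
Misconception 3 matters for system design: a fusion component cannot assume that the signals from two modes overlap in time. The sketch below classifies a pair of input intervals as simultaneous or sequential and reports the lag between them. The function name, the interval representation, and the overlap rule are assumptions made for illustration.

```python
def integration_pattern(speech, pen):
    """Classify two (start, end) intervals, given in seconds.

    Returns ("simultaneous", 0.0) if the intervals overlap in time,
    otherwise ("sequential", lag), where lag is the gap between the
    end of the earlier signal and the start of the later one.
    """
    (s_start, s_end), (p_start, p_end) = speech, pen
    if s_start <= p_end and p_start <= s_end:
        return ("simultaneous", 0.0)
    lag = max(s_start, p_start) - min(s_end, p_end)
    return ("sequential", lag)
```

A fusion component built on this would need a time window wide enough to accept sequential input, for example a pen gesture that ends one second before the speech starts.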

Two paradigms for
multimodal user interfaces
1. Computer as a tool
– multiple input modalities are used to enhance
direct manipulation behavior of the system
– the machine is a passive tool and tries to
understand the user through all different input
modalities that the system recognizes
– the user is always responsible for initiating the
operations
– follows the principles of direct manipulation
[Shneiderman, 1982; 1983]

Two paradigms for
multimodal user interfaces
2. Computer as a dialogue partner
– the multiple modalities are used to increase the
anthropomorphism in the user interface
– multimodal output is important: talking heads and
other human-like modalities
– speech recognition is a common input modality in
these systems
– can often be described as an agent-based
conversational user interface

Two hypotheses on
combining modalities
1. The combination of human output channels
effectively increases the bandwidth of the
human-machine channel.
– This has been observed in many empirical
studies of multimodal human-computer interaction
[Oviatt, 1999b].

Two hypotheses on
combining modalities
2. Adding an extra output modality requires more
neurocomputational resources and will lead
to deteriorated output quality, resulting in
reduced effective bandwidth.
– Two types of effects are usually observed:
  – a slow-down of all output processes, and
  – interference errors, because selective
    attention cannot be divided among the increased
    number of output channels.
– Two examples of this: writing while speaking, and
speaking while driving a car.

Call for research
A summary in [Raisamo, 1999e] pointed out that more
research is needed to understand the following:
– How does the brain work, and which modalities can best be used
to gain the synergy advantages that are possible with
multimodal interaction?
– When is a multimodal system preferred to a unimodal
system?
– Which modalities make up the best combination for a given
interaction task?
– Which interaction devices should be assigned to these modalities
in a given computing system?
– How should these interaction devices be used, that is, which
interaction techniques should be selected or developed for a given task?

Touch’n’Speak

[Raisamo, 1998]

• Touch’n’Speak is a multimodal user interface
framework that makes use of combined touch and
speech input and different output modalities
– Input: touch buttons, touch lists, touch gestures in area
selection (time, location, pressure), speech commands
– Output: graphical, textual, and auditory (non-speech) output,
speech feedback

• The framework was used to implement a restaurant
information system that provides information on
restaurants in Cambridge, MA, USA.

A snapshot of Touch’n’Speak

Examples
• CHI2000 Video Proceedings: The Efficiency of
Multimodal Interaction for a Map-Based Task
(8:18)
• SIGGRAPH Video Review 76, CHI’92
Technical Video Program: Multi-Modal Natural
Dialogue (10:25)
• SIGGRAPH Video Review 77, CHI’92
Technical Video Program: Combining
Gestures and Direct Manipulation (9:56)
• CHI’99 Video Proceedings: Embodiment in
Conversational Interfaces: Rea (2:08)

Homework
• Read Chapter 2 (Multimodal interaction)
in [Raisamo, 1999e].
– [Raisamo, 1999e] is available online at
http://granum.uta.fi/pdf/951-44-4702-6.pdf
– A printable version is available online at
http://www.cs.uta.fi/~rr/interact/dissertation.pdf
