Mulsemedia: State-of-the-Art, Perspectives and Challenges
GHEORGHITA GHINEA, Brunel University
CHRISTIAN TIMMERER, Alpen-Adria-Universität
WEISI LIN, Nanyang Technological University
STEPHEN R. GULLIVER, University of Reading
Mulsemedia (multiple sensorial media) captures a wide variety of research efforts and applications. This paper presents a historic perspective on mulsemedia work and reviews current developments in the area. These take place across the traditional multimedia spectrum, from virtual reality applications to computer games, as well as efforts in the arts, gastronomy and therapy, to mention a few. We also describe standardization efforts, via the MPEG-V standard, and identify future developments and exciting challenges the community needs to overcome.
Categories and Subject Descriptors: H.5.2 [Information Interfaces and Presentation]: User Interfaces - Evaluation/methodology; H.1.2 [Models and Principles]: User/Machine Systems - Human Information Processing
General Terms: Mulsemedia, multi-sensory
Additional Key Words and Phrases: Contour perception, flow visualization, perceptual theory, visual cortex, visualization
ACM Reference Format: Ghinea, G., Timmerer, C., Lin, W. and Gulliver, S.R. Mulsemedia: State-of-the-Art, Perspectives and Challenges. ACM Trans. Multimedia Computing Communications and Applications. X, Y, Article Z (XXX 201X), XX pages. DOI=10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000
1. INTRODUCTION
In 2004, on the 10th anniversary of the creation of the ACM Multimedia Special Interest Group, Larry Rowe and Ramesh Jain published a seminal paper on “Future Directions in Multimedia Research” in ACM TOMCCAP. The paper presented the results of discussions at a one-day workshop with over 30 leading researchers in the field. There was agreement that multimedia is a multidisciplinary field, applicable to a variety of domains (e.g., entertainment, education, medicine, and the creative arts). Three unifying themes were identified to unite the multimedia research field. Firstly, a multimedia system or application comprises at least two media objects that are correlated. Secondly, there is the issue of integration and adaptation, where multiple media objects should be used jointly and separately to improve application performance, and distributed multimedia applications should provide transparent delivery of dynamic content in such a way that content adapts naturally to the users’ environment. Thirdly, multimedia applications are multimodal and interactive (Rowe and Jain, 2005). Ten years on, what has changed? A lot, and yet perhaps not so much. Arguably, the three unifying themes are still very much valid today, in a world dominated by social media and a proliferation of sensor-rich (predominantly mobile) devices, where individuals are producers, broadcasters, and consumers of rich media content.
Authors' addresses: G. Ghinea, Department of Computer Science, Kingston Lane, Uxbridge, UB8 3PH, UK; Christian Timmerer, Universitätsstrasse 65-67, A-9020 Klagenfurt, Austria; Weisi Lin, School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798; Stephen Gulliver, Henley Business School, Whiteknights, Reading, RG6 6UR, UK.
Reassuringly, the accepted definition of multimedia remains that of a combination of two or more media, one of which is preferably continuous, the other usually discrete. Without doubt, most multimedia content available today is a combination of video and audio (both continuous media), with textual (discrete media) information sometimes contained therein. However, such applications primarily engage two of our human senses, those of sight and hearing, i.e. they are bi-sensorial. This situation is at odds with the fact that 60% of human communication is non-verbal and that most of us perceive the world through a combination of five senses (i.e., sight, hearing, touch, taste, and smell). As such, current multimedia experiences fail to convey the sensation, for instance, of heat and humidity, let alone the wafts of aromas that one experiences when walking through a spice market in India. As humans, we engage and learn by interacting with all of our senses; can we not do this in a digital fashion as well?
We therefore propose mulsemedia (multiple sensorial media) as a new multimedia challenge for the forthcoming 10 years. Whereas multimedia applications are usually bi-media (sometimes tri-media) and almost exclusively bi-sensorial in nature, mulsemedia applications are those that engage three (or more) of our senses.
While current technological developments have made digital mulsemedia experiences somewhat of a novelty, in the non-digital world they are anything but. The earliest that we know of took place in 1906, when artificially generated smells were combined with audiovisual content: an audience was sprayed with the scent of roses while watching a screening of the Rose Bowl football game.
In 1943, Hans Laube, who had earlier perfected a technique to extract odors from an enclosed environment, was able to reverse this process so that selected scents were emitted at specific times and for specified durations, resulting in a 35-minute ‘smell-o-drama’ movie called Mein Traum, in which 35 different odors were released to accompany the drama presentation. Building on this, audiences in 1959 viewing a documentary about Red China called Behind the Great Wall were treated to an AromaRama presentation, in which the theatre’s air-conditioning system was used to release over 30 different smells. Shortly afterwards, in 1960, Michael Todd Jr produced a competing system called Smell-O-Vision, in which aromas were released during the screening of the movie Scent of Mystery. It would be an exaggeration to say that these experiences were an unqualified success: the challenges of generating realistic scents, the tendency of odors to drift and diffuse, as well as the insufficiently understood characteristics of odor intensity all meant that, novelty factor aside, user take-up was low. The reaction of the audience to the AromaRama experience is probably best conveyed by the following extract from the review published at the time by Time magazine: “To begin with, most of the production’s 31 odors will probably seem phoney, even to the average uneducated nose. A beautiful old pine grove in Peking, for instance, smells rather like a subway rest room on disinfectant day. Besides, the odors are strong enough to give a bloodhound a headache. What is more, the smells are not always removed as rapidly as the scene requires: at one point, the audience distinctly smells grass in the middle of the Gobi desert.”
Such drawbacks did not prevent pioneering mulsemedia efforts, however. In 1962, Morton Heilig created what is now popularly dubbed the first virtual reality (VR) experience for users, even though digital virtual reality systems did not exist then. With Sensorama, he created an arcade-style device which took users on an immersive 3-D virtual reality bike ride through the streets of Brooklyn, New York. This came complete with motions and vibrations, sounds, fans and smells: the most complex mulsemedia experience devised up to that time, engaging four of our five major senses. Indeed, given that the sense of taste is intimately connected to that of smell, and that one of the aromas emitted was that of freshly baked bread from a bakery, it is not inconceivable that for some users all five major senses were engaged in their mulsemedia journey (Heilig, 1962).
Over half a century has passed since then, so where are we now on the mulsemedia landscape? To answer this question, this paper reviews developments that recent technological advances have made possible, to see how mulsemedia applications fit within the multimedia arena, and to identify challenges that the community has to overcome. Accordingly, the structure of the rest of this paper is as follows: given the importance of the human senses to mulsemedia, the next section gives an overview of them; Section 3 then details related work.
Mulsemedia needs standards to thrive and, to this end, Section 4 describes MPEG-V, a standard capable of supporting mulsemedia applications. The user is an important element of mulsemedia, and quality of experience (QoE) efforts in this respect are detailed in Section 5. Finally, research challenges and open issues are described in Section 6.
2. HUMAN SENSORIAL OVERVIEW
In this section we consider in more detail the multiple process steps required to achieve multiple sensory perception. We introduce key physiological systems, and describe how each captures and transforms information from the world so that the brain can process it. We conclude the section by considering the issue of cognitive binding, and highlight the attentive struggle between top-down and bottom-up processes.
2.1 Multiple Sensory Perception
Sensory perception relates to a human’s conscious sensory experience of the world, i.e. what a person can see, hear, smell, touch, taste, and so on. When we consider mulsemedia perception, therefore, it is critical to appreciate that multiple sensory media perception is not something that just ‘happens’. For a person to be able to understand and assimilate meaning from multiple sensory media, they must capture, interpret and combine information from numerous sensory organs, i.e. bottom-up sensing (Goldstein, 2013). Moreover, information from multiple senses must be cognitively joined and aligned, and then compared to higher-order cognitive schemata, which define task semantics, pragmatics and social norms, i.e. top-down thinking (Marois, 2005; Mayer, 2003). Although perception sometimes feels as though it just happens, it is in reality the result of a complex set of processes. Biological sensors capture physical signals from the environment and transduce them, with the exception of specific chemoreceptors, into structured electrical signals. These signals are restructured in the nervous system and transmitted to the brain. Within the brain, spatial/temporal signals are then subconsciously structured as patterns, which are attentively processed as higher-level artefacts or objects. Once structured, appreciation of meaning facilitates the validation of propositions.
Identifying whether something is true or false helps humans align bottom-up sensory input with top-down knowledge and memory, and enables us to create, and iteratively validate, complex schema models of the real world. The existence of these complex schema models allows humans to predict, and understand, the world in the context of higher pragmatic and social structures.
2.2 Capturing the Physical
There is no universal agreement as to the number of senses perceived by the human mind. In reality, the human body manages a wide range of internal and external sensory inputs, such as pain (nociception), space (proprioception), movement (kinaesthesia), time, and temperature (thermoception). As well as external senses, our bodies sense and process internal regulation (called interoceptive senses), which leads to feelings of hunger, sickness, thirst, stress or discomfort (Craig, 2003). All of these senses are internally linked within our model of the world. Despite our processing this dynamic range of internal and external senses, mulsemedia systems focus on the five traditional senses, as defined by Aristotle, i.e. visual (sight), auditory (sound), tactile/haptic (touch), olfactory (smell) and gustatory (taste). In this section we introduce the reader to each physiological system in turn.
2.2.1 Visual (Sight). In mulsemedia, sight allows assimilation of textual and visual information. Light reflected from a physical object in the visual field enters the eye through the pupil and passes through the lens, which projects an inverted image onto the retina at the back of the eye. The retina consists of approximately 127 million light-sensitive cells (120 million called rods; 7 million called cones, which can be subdivided into L-cones, M-cones and S-cones). Although cones are less light-sensitive than rods, they are responsible for capturing color within the human visual system.
When light enters the eye, it passes through seven sensory cell layers before reaching the rods and cones at the back of the eye. If cones were distributed evenly across the retina, their average distance apart would be relatively large, leading to poor spatial acuity. Accordingly, cones are concentrated in the center of the retina (in a circular area called the macula lutea). Within this area, there is a depression called the fovea, which consists almost entirely of cones, and it is through this area of high acuity, extending over just 2° of the visual field, that humans make their detailed observations of the world. The cells that process and transmit information to the brain are called the bipolar, horizontal and ganglion cells. Photoreceptors at the back of the eye (cones and rods) are activated when light is shone on them, which in turn activates bipolar cells. Visual pre-attentive segregation, and object combining, occurs primarily in the occipital lobe (at the back of the brain); however, visual information is contextualized (‘Where/How’ and ‘What’) in the parietal and temporal lobes, respectively (Schiller, 1986).
2.2.2 Auditory (Sound). In mulsemedia, the human auditory system is used heavily in the transfer of sound, speech, music and special effects. If an object vibrates, it produces a sequence of wave compressions in the air surrounding it. These fluctuations in air pressure spread away from the source of vibration at approximately 340 m/s, reducing in magnitude as the energy is dispersed. When two or more waveforms interact, they create a combined waveform that is the sum of its component parts. Sound is the sensation produced by the ear when a vibration occurs within the frequency range audible to humans (approximately 20 Hz to 20 kHz). The volume of a sound, at the source of vibration, depends on the amplitude of the sound waveform. Its pitch depends on the frequency of the compressions produced by the source of vibration. The ear is divided into three parts: the outer (external), the middle and the inner (internal) ear. The outer ear collects sound waves and focuses them along the ear canal to the eardrum. The eardrum vibrates, causing the bones of the middle ear (malleus, incus and stapes) to rock back and forth, which passes movement to the cochlea, where fluid in the inner ear is disturbed. The disturbance of fluid causes thousands of small hair cells to vibrate. The cochlea converts sound waves into electrical impulses, which are passed on to the brain via the auditory nerve. The three main auditory areas in the brain (i.e. the core area, the belt area, and the parabelt) are found in the temporal lobe. Recognition of sound and localization of sound are, however, processed separately (Yost, 1985).
2.2.3 Tactile/Haptic (Touch). In mulsemedia, tactile feedback allows us to identify several distinct types of sensation, as human skin contains a number of different sensory receptor cells that respond preferentially to various mechanical, thermal or chemical stimuli. The majority of multimedia studies involve the tactile or touch sense, which detects pressure and touch (i.e., brushing, vibration, flutter and indentation); however, human skin is also sensitive to temperature and pain. Information from the skin receptors is carried along the “touch-neuron pathway” to the somatosensory cortex, which maps the senses in the body and transmits messages about sensory information to other parts of the brain (e.g. for use in performing actions, making decisions, enjoying sensations or reflecting on them).
2.2.4 Olfactory (Smell). In mulsemedia, olfactory feedback allows researchers to monitor subconscious reactions to smell, which are often linked to task or emotional contextualization. There are 50 million primary sensory receptor cells in a small (2.5 cm²) area of the nasal passage called the olfactory region. The olfactory region is formed of cilia projecting down out of the olfactory epithelium into a layer of mucus, which helps to transfer soluble odorant molecules to the receptor neurons. The neuronal cells form axons, which penetrate the cribriform plate of bone, thus reaching the olfactory bulb of the brain. Smell messages are sent directly to the higher levels of the central nervous system, via the olfactory tract, where olfactory information is decoded and a reaction is determined. Compared to many mammals, smell ability in humans is limited. Smell is, however, important to human perception of episodic knowledge, with smells often triggering specific contextual memories. The olfactory sense is used in humans as a means of identifying resources, warning of danger (e.g. rotten food, chemical dangers, and fires), identifying mates and predators, aiding navigation, and providing sensual pleasure. Since olfactory neurons are connected directly to the brain, and can therefore unconsciously influence cognition and emotion, smell is known to trigger discomfort, sympathy or even unconscious refusal (Ayabe-Kanamura et al., 1998).
2.2.5 Gustatory (Taste). The tongue is covered in papillae, which are either i) filiform, found across the entire surface of the tongue, ii) fungiform, found on the tip and sides of the tongue, iii) foliate, found at the sides towards the back of the tongue, or iv) circumvallate, found at the central back of the tongue. All papillae, with the exception of filiform, contain taste buds. Each of the 10,000 taste buds contains between 50 and 100 taste cells. Traditionally it was believed that taste was grouped in areas relating to sour, sweet, salty and bitter (with umami not considered); however, it is now understood that all tastes (including umami) are registered by all taste buds. Electrical signals are generated in taste buds and pass along one of a number of nerves, relating to separate areas of the tongue, which link to both the thalamus (perched on top of the brain stem) and the frontal lobe.
Smell and taste are commonly considered together, as they are functionally linked. Unlike other senses, which interpret light/sound waveforms or interaction patterns and transform these into electrical signals understood in the brain, smell and taste are often termed ‘gatekeeping’ senses, i.e. sensations created as a result of interaction with molecules being assimilated into the body (Goldstein, 2013). Gatekeeper (chemoreceptor) senses are understandably linked to biological and emotional processes, e.g. to ensure automatic rejection in the case of bad food. Despite input of data via separate sensory systems (i.e., smell and taste), it is almost impossible to taste something while pinching your nose, making the experiences of smell and taste hard to separate.
2.3 Binding and Focus
Although entities and events in the world are perceived via disparate sensory modalities, as described in Section 2.2, our experience of the world is largely coherent (both spatially and temporally). The issue of how the brain integrates and aligns sensory fragments is called ‘binding’ (Damasio, 1989, p. 29), and consists of segregation and combining processes. Segregation processes (BP1) define high-level object variables within each sensory input (e.g., shape and color from the same input from millions of light-sensitive cells), while combining processes (BP2) join and synchronize object variables across different senses. Sensory processing is consistent for all humans, and researchers understand a significant amount concerning the processing and representation of sensory data (Smythies, 1994, p. 54); however, there is less understanding of how brain mechanisms construct phenomenal objects (i.e. high-level mental objects, either physical or conceptual, which act as the focus of attention).
Fig. 1. Schema of Thinking Processes (Based on Mayer, 2003; Marois, 2005; Adapted from Fadel & Lemke, 2008).
Dual process theory, which provides some interesting insights, separates cognition into two systems, i.e. intuition/experience (termed System 1) and reasoning/memory (termed System 2). System 1 combines sensory and emotional stimuli to subconsciously define spatial/temporal associations between object variables. System 2 allows humans to uniquely process conscious judgments and attitudes in the context of semantic and episodic knowledge. System 2 is slow, however, and limited in part due to its reliance on limited-capacity, serial-based memory functions (see Fig. 1).
Such limitations mean that conscious perception occurs in linear installments, with task efficiency significantly reduced if multitask switching is required (Rubinstein et al., 2001). Due to its limitations, human reasoning is significantly dependent upon existing knowledge (schema) to support simplification of the task, or to contextualize episodic information; such knowledge has also been shown to influence users' attentive selection (Yarbus, 1967).
2.4 Summary
It is clear that multiple sensory media perception is not something that just ‘happens’. Perception is a complex sequence of steps that combine bottom-up (sensory processing) and top-down (cognitive reasoning) processes, which result in the appreciation of the media information and the interpretation of its meaning in the context of existing semantic and episodic knowledge. Sensory processing is well understood. Understanding how knowledge impacts mulsemedia interpretation, perception and acceptance, however, remains an exciting area of research.
3. RELATED WORK
Mulsemedia research, while not mainstream and sheltering perhaps under more traditional research areas, has nonetheless progressed over the past 20 years. In this section, we present key work in the area. We start off by highlighting work done on mono-sensorial evaluation. Of course, most work performed so far in this respect targets audition and vision. Since the emphasis of this paper is on mulsemedia, we will not discuss in detail perception-based models for speech, audio, image, graphics and video; interested readers can refer to recent surveys of such modeling and its applications (e.g., Dermot et al., 2009; Lin and Kuo, 2011; Möller et al., 2011; Reinhard et al., 2013; Richard et al., 2013; Wu et al., 2013; You et al., 2010). Therefore, the main thrust of the first subsection below is to introduce existing research involving the other senses, namely olfaction, taction and gustation, but not in combination with one another. The next subsection then reviews research exploring the combination of two or more senses in a digital environment, i.e. mulsemedia, while Appendix A gives more details of the related basic technical approaches and computational models reported so far in the literature, although the development of mulsemedia algorithms and systems is still in its infancy.
3.1 Mono-sensorial Evaluation
There has been interesting and substantial research into the olfactory system, which enables humans to recognize and categorize different odors and determines many behavioral and social reactions. Ho and Spence (2005) investigated the differential effects of olfactory stimulation under conditions of varying task difficulty. Participants detected visually presented target digits from a stream of visually presented distractor letters in a rapid serial visual presentation task; at the same time, participants were required to discriminate tactile stimuli presented on the front or back of their torso. The results showed a significant performance improvement in the presence of peppermint odor (as compared to air) in a difficult task but not in an easy one. This demonstrated that olfactory stimulation can facilitate tactile performance. In the digital world, a pioneer in the area of olfaction is Kaye (2001), who, in his work on symbolic olfactory devices, experimented with a few prototypical designs of olfactory data display devices to illustrate the concept of computer-controlled smell output. For human beings, odor stimuli are highly associated with many processes such as emotions, attraction and mood. Monitoring and analyzing electroencephalogram (EEG) recordings of human brain activity has shown (Yazdani et al., 2012) that classification of EEG signals during the perception of odors can reveal the pleasantness of the odor with relatively high accuracy. Ghinea et al. (2010, 2011) focused on olfaction-enhanced applications.
The challenges of enhancing mulsemedia with olfaction were also discussed in that work. Taction is another important sense for mulsemedia investigation. For foundational knowledge in this area and design guidelines, readers can refer to the paper by Seungmoon and Kuchenbecker (2013). Haptic rendering (or haptic display) conveys information about virtual objects to users through the sense of touch. For haptic rendering, force-feedback display of contact interactions can be realized for both rigid and deformable virtual models. A general framework for force-feedback display of virtual environments is presented by Otaduy et al. (2013), and the issues, modelling, and assessment related to haptic aesthetics are discussed by Carbon and Jakesch (2013). It has been shown that perceiving material properties (including roughness, friction, and thermal properties) of objects through touch is generally superior to the perception of shape (Klatzky et al., 2013).
With respect to gustation, this sense is intricately linked with olfaction. However, the only work targeting gustation per se of which we are aware is that of Adrian Cheok (http://adriancheok.info). He and his team developed a taste transmitter machine for sending tastes remotely (the user places his/her tongue in a device which transforms a signal delivered over the Internet into electrical impulses to the tongue). Coupled with the group’s work in developing a machine for sending olfactory signals over a network, the ultimate aim is to build a world repository of gastronomic knowledge, presumably accessible online to users everywhere.
3.2 A Review of Mulsemedia Research and Applications
Mulsemedia research is usually inextricably linked to the development of novel and exciting applications. One of the earliest such mulsemedia VR applications is that of Cater (1992) and his team, who developed a virtual reality system to train potential fire-fighters to recognize characteristic smells commonly associated with fires. The problem being solved in this case was to familiarize potential fire-fighters with those smells that are often associated with fires, as it is often thought and argued that it is easier to recognize smells already known by a person. Moreover, in a fire-fighter's profession, being able to detect the presence of such smells could well prove invaluable.

Later on, Dinh et al. (1999) investigated the effect of tactile, olfactory and auditory sensory modalities, combined with different levels of visual information, on a user's sense of presence and memory of the details of a virtual reality experience. With respect to the olfactory sensory modality, the research study was limited when compared with the other sensory modalities considered. Moreover, the single olfactory cue used in the study did not produce any significant effect on the sense of presence, although it did on memory.

One of the benefits of integrating mulsemedia interfaces in applications is that they can overcome literacy barriers and bring the world of computing closer to categories of people who had hitherto been excluded from it. Jain (2003) was one of the earliest to make this point, when describing the potential of Experiential Computing, that is, computing based on the way humans naturally experience and interact with their environment. Based primarily on video, audio, and taction, he then describes the potential that such interfaces might have in enhancing virtual and augmented reality systems.

In related work, Bodnar, Corbett and Nekrasovski (2004) created a notification system that made use of mulsemedia data. In their work, they conducted an experimental study to compare the effect that notifications delivered via visual, audio or olfactory displays had on a user's engagement in a cognitive task. Participants were given an arithmetic task to complete, and at various intervals two types of notifications were triggered: 1) participants had to immediately stop what they were doing and record some data before returning to the completion of their task, and 2) they could ignore the notification. With this experiment, they found that while olfactory notifications were the least effective in delivering notifications to end users, they had the advantage of producing the least disruptive effect on a user's engagement in a task. It is also worth noting that in their experiment they encountered most of the problems of using smell output highlighted earlier by Kaye, with participants mostly commenting that some of the smells used were too similar to be distinguishable.
Lingering smells in the air also made it difficult to detect the presence of new smells, and the participants' lack of experience of working with olfactory data impacted their performance of the assigned task.

Brewster, McGookin and Miller (2006) used explicitly learned odor memories to evaluate the effectiveness of using olfactory data to aid multimedia content searching, browsing and retrieval in a digital photo library. To conduct this experiment, they developed an olfactory photo browsing and searching tool, which they called Olfoto. The odors were learned by getting participants to complete the explicit odor memory task of associating specific odors with their personal photographs, i.e., smell-based photo tags. Participants were also required to tag the same photographs using text-based tags. The testing phase occurred two weeks later, in which participants were asked to complete three types of exercises. Two were matching exercises that required matching photos with the smell/text tags they had previously associated with them: in one exercise multiple photos were presented with one smell/text tag, and in the other multiple smell/text tags were presented with one photograph. The third exercise involved searching through their digital photo libraries using smell or text tags after being given three key features of the photo. Despite the fact that research has shown that odor memories persist longer than word and verbal memories, the results showed that performance was lower with the smell-based tags. The lower performance may well be attributed to the possibility that odor memories linked to emotions, i.e., those implicitly learned, last longer than those explicitly learned. Thus, while in this experimental study participants learned to associate an odor memory with their photographs, the memory was probably not as profound as it would have been if the odor memory had been implicitly learned during the real-life moment when the photo was taken. Nonetheless, the findings from their study suggest that odor memories do have the potential to play a role in multimedia content searching.

In related work, the effects of olfaction on information recall in a virtual reality game environment were evaluated by Tortell et al. (2007). In this experimental study, participants engaged in game play in a virtual reality environment. The first phase of the study involved an implicit odor learning period for one group of participants, where subjects had a smell present whilst playing the virtual reality game. The other group of participants in this phase of the experiment had no smell present while they played the game.
In the second phase of the experiment, which was an information recall task about the VR environment, participants were again split into two groups. One group performed the task with the same smell that was present during the first phase of the experiment, while the second group performed the information recall task with no smell present. Participants were randomly assigned to groups in the two phases of the experiment, so that participants who completed the first phase in the presence of smell did not necessarily complete the second phase in the presence of smell, and vice versa. Results showed that the subjects who were presented with scent only during the recall phase performed by far the worst, while subjects with scent only during the VR experience performed the best. However, the general findings from the study did show that the introduction of scent in the VR environment had a positive effect on subjects' recollection of the environment.

Multimedia entertainment, such as computer games, is another area that is expected to benefit from the addition of our other sensory cues (thus becoming mulsemedia games). It is expected that these cues will heighten the sense of presence and reality and hence impact positively on user experience, e.g., by making it more engaging for users. Below, we mention some media entertainment systems that involve the use of olfactory data in one way or another.

Fragra is a visual-olfactory virtual reality game that enables players to explore the interactive relationship between olfaction and vision (Mochizuki et al., 2004). The objective of the game is to identify whether the visual cues experienced correspond to the olfactory cues presented at the same time. The game environment has a mysterious tree that bears many kinds of foods. Players can catch these food items by moving their right hand, and when they catch one of the items and move it in front of their nose, they smell something which does not necessarily correspond to the food item they are holding. Although the authors do not report on any detailed evaluation of their implemented game, they do report that in their preliminary experiment the percentage of questions answered correctly varied according to the combination of visual information and olfactory information, and conclude that there is a possibility that some foods' appearance might carry stronger information than their scents, and vice versa.

A similar interactive computer game, called the “Cooking Game,” was created by Nakamoto and his research team at the Tokyo Institute of Technology (Nakamoto et al., 2008). In earlier related work, Boyd Davis et al. (2006) used olfactory data to create an interactive digital olfactory game. However, the main objective of their experiment, “what should the designer of interactive systems know about olfactory data?”, is a question already answered by predecessors in the field.
In their work, they developed a suite of digital games in which they use olfactory data (i.e., three different scents) to engage users in game play. The users' sense of smell is the main skill needed to win the games. The findings from their work further confirm results reported by Kaye about the use of olfactory data.

Morrot et al. (2001) carried out a similar study to investigate the interaction between the vision of colors and odor determination using lexical analysis of wine experts' tasting comments. For the experiment, they simulated a wine tasting practice, where the wine tasters provide comments on the tasted wines based on the visual, olfactory and gustatory properties of the wines. A previous study (Williams et al., 1984) had actually shown that perception of the olfactory qualities of wines changes depending on whether the color of the wine is visible or hidden from the subjects, by using transparent and opaque wine glasses respectively. In the study carried out by Morrot et al., a white wine was colored artificially red and presented to wine experts to analyze, alongside the uncolored white wine and a red wine. To confirm that the colorant used had no influence on the colored wine, a pre-test experiment was carried out to verify that the white wine and its artificially colored version were perceived as the same when the color was obscured from the tasters. Their results showed that the white wine was perceived as having the odor of a red wine when colored red: all of the wine tasters that participated in the study described the artificially colored wine with terms relating to red wine qualities. The wine's color thus appears to provide significant sensory information, which misleads the subjects' ability to judge flavor; moreover, the mistake is stronger in the presence than in the absence of access to the wine color.
The Research in Augmented & Virtual Environment Systems (RAVES) research group reported a study conducted to investigate the impact of olfaction (concordant and discordant scents) on a user's sense of immersion in a virtual reality environment (Jones et al., 2004). The experimental study involved participants playing a computer game in an immersive virtual environment. The experimental conditions consisted of a control case where no scents were released while the participant played the game and two experimental cases, one involving concordant scents (e.g., emission of an ocean mist scent as the player passed the ocean and a musty scent when the player was in the fort in the immersive environment) and the other a discordant scent (e.g., smell of maple syrup throughout the game). The results from this study were not statistically significant, however.

It is of little surprise that, because of the relative novelty of the mulsemedia combinations involved, the studies reviewed so far also explore user acceptance of these new media objects. This is a theme carried forward in more recent research (Ghinea and Ademoye, 2012), which looked at user perception and acceptance of olfactory media combined with the more traditional audio and video.

Kahol et al. (2006) present strategies and algorithms to model context in haptic applications that allow users to haptically explore objects in virtual reality/augmented reality environments. The results from their study show significant improvement in accuracy and efficiency of haptic perception in augmented reality environments when compared to conventional approaches that do not model context in haptic rendering. Indeed, the use of haptics in mulsemedia VR environments has recently been the subject of other research (as in the work of Apostolopoulos et al., 2012). In related work, researchers reported on a perceptual study carried out to establish an algorithm to provide high-quality inter-media stream synchronization between haptic and audio (voice) media objects in a virtual environment (Ishibashi et al., 2004). Indeed, synchronization seems to be a common theme across mulsemedia research. Thus, recent work has explored synchronization of olfactory media with audio-visual content (Ghinea and Ademoye, 2010a), while Steinbach et al. (2012) investigated synchronization issues between different modalities and the integration of video and haptics in resource-constrained communication networks. Ghinea and Ademoye (2010b) tackled olfaction-enhanced mulsemedia by combining computer-generated smell with haptic data.

Interactive media and applications have become ubiquitous and compete for attention in our everyday life and work. As discussed by Sarter (2013), this ubiquity has led to an increasing need for effective multimodal interfacing and decisions, including information distribution across different sensory channels to ensure detection, interpretation, and handling of signals. An overview of well-known models of multimodal management was presented by Sarter. In related work, Rob et al. (2013) presented studies of multisensory (audio, tactile, etc.) integration and cross-modal spatial attention to engage more than just a single sense in complex environments. Firstly, multimodal signals can be used to reorient spatial attention under conditions in which unimodal signals may be ineffective. Secondly, multimodal signals are less likely to be masked in noisy environments. And lastly, natural links exist between specific signals and particular behavioral responses. A multimodal system should be designed to minimize any incongruence presented in different sensory modalities that relate to the same event.
We also mention that mulsemedia has great therapeutic potential. While aromatherapy, music therapy and therapies based on touch all employ primarily one human sense, the creation of multisensory rooms, which give mulsemedia experiences to individuals with special needs, ranging from learning difficulties to autism, Alzheimer's and dementia, has been reported. Accordingly, the EU Framework Project 5 MEDIATE reported research on rooms comprising visual (e.g., light, color, UV light, projections, illusions), audio (e.g., soothing music), olfactory (i.e., aromatherapy dispensers), and tactile stimuli (i.e., objects with different textures, shapes, vibration) (Gumtau, 2011). Across the Atlantic, and again for therapeutic purposes, Multisensory Systems (http://multisensorysystems.com) have developed an immersive mulsemedia system integrating 3D sound, olfaction, vibration and imagery.

Last but not least, mulsemedia applications were first created in association with the film industry. So it should come as no surprise that the arts and the creative industries continue to experiment with mulsemedia in their content and delivery mechanisms. In so doing, interactive digital experiences are no longer audio-visual creations but mulsemedia ones. The integration of haptic and olfactory capabilities in many contemporary interactive designs makes the communicative potential of mulsemedia in terms of sensory, affective, individual and creative expression even more relevant. Thus, for instance, Bamboozle theatre (http://www.bamboozletheatre.co.ok) and Oily Theatre (http://www.oilycart.org.uk) both specialize in multi-sensory performances tailored exclusively for children with autism or complex disabilities. Theatrical mulsemedia experiences are also for mainstream audiences: Disney's 4D movie experiences, featuring tactile and olfactory stimuli on top of the traditional audiovisual presentation, have been a staple for audiences for the last 30-40 years. Dynamic Motion Rides (DyMoRides) is an Austrian company that has developed a host of “complex and innovative entertainment attractions,” all involving mulsemedia, for a wide range of entertainment parks worldwide, while the well-known Lowry theatre in Manchester will be staging Nosferatu (http://www.thelowry.com/event/nosferatu), a mulsemedia theatrical event in February 2014, no less.

4. MPEG-V: A STANDARD FOR MULSEMEDIA

4.1 Context and Objectives
The initial purpose of the MPEG-V standard was to provide an architecture and associated information representations to enable interoperability between virtual worlds and the real world. This also explains the name MPEG-V, where “V” stands for virtual world; the standard was originally entitled “information exchange with virtual worlds” and later renamed to “media context and control” to broaden its scope. The actual architecture of the MPEG-V standard defines interfaces (provided in the form of XML- and binary-based representation formats) between digital content providers (including virtual worlds) and real-world devices comprising sensors and actuators. These real-world devices may offer various capabilities controlled by appropriate device commands issued by the digital content applications. Alternatively, these commands may also be used to control devices within virtual worlds. The MPEG-V standard comprises the following parts:

Part 1: Architecture. Describes the general system architecture as well as major interfaces and interoperability points.
Part 2: Control Information. Defines the means to describe the capabilities of (real-world) devices as well as to control them.
Part 3: Sensory Information. Provides the means to describe sensory effects as discussed in the next section.
Part 4: Virtual World Object Characteristics. Provides data representation formats to specify virtual objects that can be exchanged with other virtual worlds.
Part 5: Data Formats for Interaction Devices. Focuses on device interactivity and associated data formats.
Parts 6 and 7 define common data types and tools needed for the other parts as well as conformance and reference software.
4.2 Sensory Information
The main purpose of MPEG-V Part 3 (Sensory Information) is to enhance both the quality and the user experience of multimedia services by annotating existing multimedia content with additional sensory effects. The main motivation behind this work is that the consumption of multimedia content may also stimulate other human senses, going beyond hearing and seeing, including olfaction, mechanoreception, thermoception, etc. Therefore, multimedia content is annotated with so-called sensory effects that steer appropriate devices capable of rendering these effects, giving the user the sensation of being part of the particular media, which results in a worthwhile, informative user experience.

4.2.1 Concept and System Architecture. The concept and system architecture of receiving sensory effects in addition to audio/visual content is depicted in Fig. 2. The media and the corresponding sensory effect metadata (SEM) may be obtained from a Digital Versatile Disc (DVD), Blu-ray Disc (BD), or any kind of online service (i.e., download/play or streaming). The media processing engine, which can be deployed on a set-top box, DVD/BD player, or any other smart device, is responsible for playing the actual media resource and accompanying sensory effects in a synchronized way based on the user's setup in terms of both media and sensory effect rendering. Therefore, the media processing engine may adapt both the media resource and the SEM according to the capabilities of the various rendering devices.
Fig. 2. Concept and System Architecture of Sensory Information
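As an illustration only, the adaptation step performed by the media processing engine might be sketched in Python as follows. This is a minimal sketch under our own assumptions and not part of MPEG-V: the names Effect, Device and adapt, and the reduction of a device capability to a single maximum intensity, are hypothetical simplifications. The sketch drops effects that no device in the user's setup can render, clamps the remaining ones to the device limits, and orders them by media time so that they can be played in sync with the media resource.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Effect:
    """One timed sensory effect (hypothetical, simplified)."""
    kind: str          # e.g. "wind", "light", "scent"
    start_s: float     # media time at which the effect starts, in seconds
    intensity: float   # authored strength, in a device-independent unit


@dataclass(frozen=True)
class Device:
    """A rendering device and the strongest intensity it can produce."""
    kind: str
    max_intensity: float


def adapt(effects, devices):
    """Keep only renderable effects and clamp them to device limits."""
    limits = {d.kind: d.max_intensity for d in devices}
    adapted = []
    for e in sorted(effects, key=lambda e: e.start_s):
        if e.kind not in limits:
            continue  # no device in this setup can render the effect
        adapted.append(replace(e, intensity=min(e.intensity, limits[e.kind])))
    return adapted


# Authored effects (assumed values) and a user setup without a scent device.
sem = [
    Effect("wind", 12.0, 8.0),
    Effect("scent", 3.0, 1.0),
    Effect("light", 0.0, 300.0),
]
setup = [Device("wind", 5.0), Device("light", 500.0)]
adapted = adapt(sem, setup)
for e in adapted:
    print(e)
```

In this example the scent effect is discarded because the setup has no scent device, and the wind effect is reduced to what the fan can deliver; a real engine would of course consult the full capability descriptions rather than a single number.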
The MPEG-V Part 3 standard deliberately defines only the representation formats, without detailing how to create and how to consume multimedia content enriched with sensory effect metadata. This approach enables interoperability among different vendors while supporting a broad range of application domains. Possible means for creating and consuming multimedia with sensory effects, including its quality assessment, are described in Section 5. The representation formats defined within MPEG-V Part 3 are described in the following.

4.2.2 Sensory Effect Description Language (SEDL). The Sensory Effect Description Language (SEDL) is an XML Schema-based language which enables one to describe so-called sensory effects such as light, wind, fog, vibration, etc. that trigger human senses. The actual sensory effects are not part of SEDL but are defined within the Sensory Effect Vocabulary (SEV) for extensibility and flexibility, allowing each application domain to define its own sensory effects (see Section 4.2.3). A description conforming to SEDL is referred to as Sensory Effect Metadata (SEM) and may be used in any multimedia content (e.g., movies, music, Web sites, games). The SEM can steer sensory devices like fans, vibration chairs, lamps, etc. via an appropriate mediation device to enhance the user experience. That is, in addition to the audio-visual content of, for example, a movie, the user will perceive other effects, giving her/him the sensation of being part of the particular media, which should result in a worthwhile, informative user experience.

The current syntax and semantics of SEDL are specified in (Timmerer et al., 2011). However, in this paper we provide an EBNF (Extended Backus-Naur Form)-like overview of SEDL.

SEM ::= [autoExtraction] [DescriptionMetadata] (Declarations|GroupOfEffects|Effect|ReferenceEffect)+

SEM is the root element. It may contain the optional autoExtraction and DescriptionMetadata attributes followed by a sequence of Declarations, GroupOfEffects, Effect, and ReferenceEffect elements. The autoExtraction attribute is used to signal whether automatic extraction of a sensory effect from the media resource is preferable. The DescriptionMetadata attribute provides information about the SEM itself (e.g., authoring information) and aliases for classification schemes (CS) used throughout the whole description. The MPEG-7 description scheme (Manjunath et al., 2002) is used.

Declarations ::= (GroupOfEffects|Effect|Parameter)+

The Declarations element defines a set of SEDL elements (without instantiating them) for later use in a SEM via an internal reference. In particular, the Parameter may be used to define common settings used by several sensory effects, similar to variables in programming languages.

A GroupOfEffects starts with a timestamp that provides information about the point in time when this group of effects should become available for the application. This information can be used for rendering purposes and synchronization with the associated media resource. XML Streaming Instructions as defined in MPEG-21 Digital Item Adaptation (Vetro and Timmerer, 2005) have been adopted for this functionality. Furthermore, a GroupOfEffects shall contain at least two EffectDefinition elements, for which no timestamps are required as they are provided within the enclosing element.
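To make the structure described so far more concrete, the following Python sketch mirrors the grammar as plain data classes. It is an illustration under our own assumptions rather than an implementation of the standard: the class and field names simply follow the EBNF, EffectDefinition is reduced to a hypothetical kind label, autoExtraction is simplified to a Boolean, timestamps are plain numbers, and Declarations, ReferenceEffect and DescriptionMetadata are left out. The sketch enforces the two constraints stated in the text, namely that a SEM holds at least one element and that a GroupOfEffects holds at least two effect definitions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class EffectDefinition:
    """Placeholder for a single sensory effect; only a label is modeled."""
    kind: str  # hypothetical field, e.g. "wind" or "light"


@dataclass
class Effect:
    """A single effect definition with its own timestamp."""
    timestamp: float
    definition: EffectDefinition


@dataclass
class GroupOfEffects:
    """A shared timestamp followed by at least two effect definitions."""
    timestamp: float
    definitions: List[EffectDefinition]

    def __post_init__(self):
        if len(self.definitions) < 2:
            raise ValueError("GroupOfEffects needs at least two EffectDefinition")


@dataclass
class SEM:
    """Root element: an optional flag plus a non-empty element sequence."""
    elements: List[Union[GroupOfEffects, Effect]]
    auto_extraction: bool = False  # simplification of the autoExtraction attribute

    def __post_init__(self):
        if not self.elements:
            raise ValueError("SEM needs at least one element")

    def timeline(self):
        """Return (timestamp, kind) pairs sorted by timestamp, then kind."""
        pairs = []
        for el in self.elements:
            if isinstance(el, GroupOfEffects):
                # Grouped definitions inherit the timestamp of the group.
                pairs += [(el.timestamp, d.kind) for d in el.definitions]
            else:
                pairs.append((el.timestamp, el.definition.kind))
        return sorted(pairs)


sem = SEM([
    Effect(20.0, EffectDefinition("vibration")),
    GroupOfEffects(5.0, [EffectDefinition("wind"), EffectDefinition("light")]),
])
print(sem.timeline())
```

The timeline method shows why grouped definitions need no timestamps of their own: each one inherits the timestamp of the enclosing group when the description is flattened for rendering.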
Th e a ct u a l E ffectDefin ition com pr ises a ll in for m a t ion per t a in in g t o a sin gle sen sor y effect . Effect ::= timestamp EffectDefinition An E ffect is u sed t o descr ibe a sin gle effect wit h a n a ssocia t ed tim estam p. EffectDefinition::=[SupplementalInformation][activate][duration] [fade-in][fade-out][alt][priority][intensity][position] [adaptability][autoExtraction] An E ffectDefin ition m a y h a ve a S u pplem en talIn form ation elem en t for defin in g a r efer en ce r egion fr om wh ich t h e effect in for m a t ion m a y be ext r a ct ed in ca se a u t oE xt r a ct ion is en a bled. F u r t h er m or e, sever a l opt ion a l a t t r ibu t es a r e defin ed wh ich a r e defin ed a s follows: activate descr ibes wh et h er t h e effect sh a ll be a ct iva t ed; d u ration descr ibes h ow lon g t h e effect sh a ll be a ct iva t ed; fad e-in a n d fad e-ou t pr ovide m ea n s for fa din g in /ou t effect s r espect ively; alt descr ibes a n a lt er n a t ive effect iden t ified by a u n ifor m r esou r ce iden t ifier URI (e.g., in ca se t h e or igin a l effect ca n n ot be pr ocessed); priority descr ibes t h e pr ior it y of effect s wit h r espect t o ot h er effect s in t h e sa m e gr ou p of effect s; in ten sity in dica t es t h e st r en gt h of t h e effect in per cen t a ge a ccor din g t o a pr edefin ed sca le/u n it (e.g., for win d t h e Bea u for t sca le is u sed); position descr ibes t h e posit ion fr om wh er e t h e effect is expect ed t o be r eceived fr om t h e u ser ’s per spect ive (i.e., a t h r ee -dim en sion a l spa ce is defin ed in t h e st a n da r d); ad aptability a t t r ibu t es en a ble t h e descr ipt ion of t h e pr efer r ed t ype of a da pt a t ion wit h a given u pper a n d lower bou n d; au toE xtraction wit h t h e sa m e sem a n t ics a s a bove bu t on ly for a cer t a in effect . 4.2.3 S en sory E ffect Vocabu lary (S E V). 
The Sensory Effect Vocabulary (SEV) defines a clear set of actual sensory effects to be used with the Sensory Effect Description Language (SEDL) in an extensible and flexible way. That is, it can be easily extended with new effects or by derivation of existing effects thanks to the extensibility feature of XML Schema. Furthermore, the effects are defined in a way to abstract from the authors’ intention and be independent from the end user’s device setting. The sensory effect metadata elements or data types are mapped to commands that control sensory devices based on their capabilities. This mapping is usually provided by the media processing engine and deliberately not defined in this standard, i.e., it is left open for industry competition. It is important to note that there is not necessarily a one-to-one mapping between elements or data types of the sensory effect metadata and sensory device capabilities. For example, the effect of hot/cold wind may be rendered on a single device with two capabilities, i.e., a heater/air conditioner and a fan/ventilator.
Currently, the standard defines the following effects. Light, colored light, flash light for describing light effects with the intensity in terms of illumination expressed in [lux]. For the color information, a classification scheme (CS) is defined by the standard comprising a comprehensive list of common colors. Furthermore, it is possible to specify the color as RGB. The flash light effect extends the basic light effect by the frequency of the flickering in times per second. Temperature describes a temperature effect of heating/cooling with respect to the Celsius scale. Wind provides a wind effect where it is possible to define its strength with respect to the Beaufort scale. Vibration allows one to describe a vibration effect with strength specified using a Richter magnitude scale. For the water sprayer, scent, and fog effects the intensity is provided in terms of ml/h. Finally, the color correction effect defines parameters that may be used to adjust the color information in a media resource to the capabilities of end user devices. Furthermore, it is also possible to define a region of interest where the color correction shall be applied in case this is desirable (e.g., black/white movies with one additional color such as red).
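The point that effects and device capabilities need not map one-to-one (the hot/cold wind example above) can be sketched as a lookup table inside a media processing engine. Since the standard deliberately leaves this mapping open, everything below is an assumption made for illustration: the effect names, the capability names, and the `map_effect` function are hypothetical and not part of MPEG-V.

```python
# Hypothetical table: which device capabilities are needed to render an effect.
# "hot_wind" needs two capabilities, i.e., the mapping is not one-to-one.
EFFECT_TO_CAPABILITIES = {
    "wind": ["fan"],
    "temperature": ["heater"],
    "hot_wind": ["fan", "heater"],
    "light": ["lamp"],
}


def map_effect(effect, intensity, device_capabilities):
    """Return one command per required capability.

    An empty list means the effect is unknown or the available
    capabilities cannot render it completely.
    """
    required = EFFECT_TO_CAPABILITIES.get(effect, [])
    if not required or not all(cap in device_capabilities for cap in required):
        return []
    return [{"capability": cap, "intensity": intensity} for cap in required]


# A device offering both a fan and a heater renders hot wind with two commands.
print(map_effect("hot_wind", 0.6, {"fan", "heater"}))
# A fan-only device cannot render hot wind in this sketch.
print(map_effect("hot_wind", 0.6, {"fan"}))
```

A real engine would probably degrade more gracefully, for example by rendering only the wind part on a fan-only device; the all-or-nothing rule here is a simplification.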
5. QUALITY OF SERVICE, QUALITY OF EXPERIENCE, AND QUALITY OF SENSORY EXPERIENCE
5.1 Mulsemedia and Quality of Sensory Experience
New research perspectives on ambient intelligence are presented in Aarts and de Ruyter (2009), which also include sensory experiences and call for a scientific framework to capture, measure, quantify, judge, and explain the user experience. In a previous paper (de Ruyter and Aarts, 2004) the authors report on the effect that additional light effects have on users. User studies showed that light effects are appreciated by users for both audio and visual contents.
In the context of the MPEG-V standardization (Timmerer et al., 2011) some work has been published related to sensory experience that is worth mentioning here. Suk et al. (2009) introduce a new generation of media service called Single Media Multiple Devices (SMMD) which is based on Sensory Effect Metadata (SEM) as defined in MPEG-V. In particular, the SMMD media controller is described that maps sensory effects on appropriate sensory devices for the proper rendering thereof. The main focus of this work is on implementation and engineering. An earlier version puts the controller in the context of Universal Plug and Play (UPnP), thus also focusing on implementation/engineering aspects (Pyo et al., 2008). Koon et al. (2010) present a framework for 4-D broadcasting based on MPEG-V; that is, the main focus is on delivering additional representation formats in the MPEG-2 Transport Stream (M2TS) and its decoding within the home network environment, including the actual service discovery. In this context, Waltl et al.
(2013) provide an open-source end-to-end tool chain for creating and consuming multimedia content enriched with sensory effects compliant to MPEG-V based on off-the-shelf infrastructure. Note that sensory effects are not limited to stationary installations such as in home environments, as there is already research to bring sensory effects to mobile devices (Chang and O’Sullivan, 2005). Furthermore, Kim et al. (2010) introduce, among others, new location-based mobile multimedia technology using ubiquitous sensor network-based five senses content. The temporal boundaries within which olfactory data can be used to enhance multimedia applications are investigated in (Ademoye and Ghinea, 2009), concluding that olfaction ahead of multimedia content is more tolerable than olfaction behind content.
Finally, Grega et al. (2008) provide a good overview of the state-of-the-art in QoE evaluation for multimedia services with a focus on subjective evaluation methods, which leads us to related work in the area of QoE models. Most of these models focus on a single modality (i.e., audio, image, or video only) or a simple combination of two modalities (i.e., audio and video). For the combination of audio and video content one may employ the basic quality model for multimedia as described in (Hands, 2004). Another approach is known as the IQX hypothesis, formulated as an exponential function (Hoßfeld et al., 2008). In (Pereira, 2005) a triple user characterization model for video adaptation and
QoE evaluation is described that introduces at least three quality evaluation dimensions, namely sensorial (e.g., sharpness, brightness), perceptual (e.g., what/where is the content), and emotional (e.g., feeling, sensation) evaluation. Furthermore, it proposes adaptation techniques for the multimedia content and quality metrics associated to each of these layers. The focus is clearly on how an audio/visual resource is perceived, possibly taking into account certain user characteristics (e.g., handicaps) or natural environment conditions (e.g., illumination).
5.2 How to create, consume, and capture QuaSE
In this section, we present a tool chain for creating and consuming media resources annotated with sensory effects, including means to capture the Quality of Sensory Experience (QuaSE). This set of tools is one of the first complete end-to-end tool chains offering easy access from the generation of SEM descriptions to the consumption of audio/video (A/V) content accompanied by SEM descriptions in the context of the World Wide Web or local playback devices. Fig. 3 illustrates the whole tool chain, starting from the annotation tool (SEVino) on the left side. This tool receives the multimedia content for annotation with sensory effects and outputs the corresponding SEM description. These two assets can then be loaded into the simulator (SESim) located in the center of the figure or delivered via DVD, Blu-Ray, or the Internet. If the content is embedded into a Web site, the Web browser plug-in can use the SEM description to steer appropriate devices while the multimedia content plays within the Web browser. If the content is available by other means (e.g., DVD, Blu-Ray), then the stand-alone multimedia player (SEMP) can be used for enhancing the viewing experience. Note that, with the Web browser plug-in, the playback itself is performed by the Web browser. All tools are freely available under an open-source license and can be downloaded from the Web site of the Sensory Experience Lab (SELab) (http://selab.itec.aau.at).
Fig. 3 Overview of the end-to-end tool chain for creating, consuming, and capturing QuaSE.
The Sensory Effect Video Annotation (SEVino) tool allows annotating video sequences with various sensory effects (e.g., wind, vibration, light) and generating MPEG-V-compliant SEM descriptions. It is written in Java, and the Java bindings for VLC (http://www.videolan.org/vlc/, last access: March 2014) are used for the actual decoding and rendering of the A/V files. These bindings provide means for embedding the VLC player into a Java application and thus enable an application to support many different codecs (e.g., H.264, MPEG-2) and file formats (e.g., MP4, AVI).
The Sensory Effect Simulator (SESim) allows for simulating sensory effects that are contained in SEM descriptions.
The Sensory Effect Media Player (SEMP) is a DirectShow-based media player which supports the following devices for rendering sensory effects: the Philips amBX system (with two fans, a wall washer, two light-speakers, a subwoofer, and a wrist rumbler; http://www.ambx.com/, last access: March 2014), the Cyborg Gaming Lights (incl. high-power LEDs; http://www.cyborggaming.com/prod/ambx.htm, last access: March 2014), and the Vortex Activ device (comprising four slots for providing four different scents; http://www.daleair.com/vortex-activ, last access: March 2014). Note that, as the media player uses DirectShow for playback, it can handle all formats and codecs which are supported either natively by Windows or via various codec packs.
Finally, the Web browser plug-in is based on the AmbientLib, which enables arbitrary applications to enrich the user experience with sensory effects. Thus, the library can be seen as an adaptation and processing engine between the virtual description of sensory effects and real devices capable of rendering the described effects. In particular, it provides functionalities to parse SEM descriptions according to the MPEG-V standard, performs color calculation of video frames, and enables rendering of sensory effects on a variety of devices. AmbientLib provides an Application Programming Interface (API) and a Driver Interface (DI). The API enables embedding the library within any application, and the DI is used for an easy integration of external devices (e.g., those also supported by SEMP) rendering sensory effects. One such application is the Web browser, which allows the use of sensory effects with embedded video content on the World Wide Web such as YouTube.
In order to capture the Quality of Experience (QoE) enabled by mulsemedia, comprising traditional audio-visual content enriched with sensory effects, appropriate subjective quality assessments need to be conducted. Therefore, Waltl et al. (2012) provide a sensory effect dataset and test setups based on the open-source tools introduced above. The test setups are aligned with ITU-T’s recommendations for subjective quality assessments, which provide the basis to study the impact on the QoE when consuming multimedia assets annotated with sensory effects. Timmerer et al. (2012) describe the results of three subjective quality assessments in this domain based on methods defined by ITU-T P.910 and P.911, respectively. The main conclusions from these user studies are that genres such as action, sports, and also documentary benefit from additional sensory effects, while for genres like commercials and, specifically, news the additional effects are not appreciated as much. Additionally, media resources with sensory effects may successfully mask visual quality degradations of the actual video content. In the extreme case, the low-quality version of the video enhanced with sensory effects receives higher ratings (on a mean opinion score scale) than the high-quality version of the video with sensory effects. Finally, in (Rainer et al., 2012) the impact on the emotional state is investigated across different sites in Austria and Australia. The results indicate that the intensity of active emotions (e.g., interest, surprise, fun) is increased for video sequences with sensory effects compared to those without sensory effects. The results of the Austrian site also suggest that the intensity of passive emotions (e.g., worry, fear, anger) is decreased for video sequences with sensory effects (compared to those without sensory effects) but, taken together with the results from the other sites, this does not yet allow for a general conclusion on whether passive emotions are decreased or increased in their intensity.
The ultimate goal is to define a utility model which tries to estimate the QoE of multimedia content enhanced with sensory effects based on various influence factors and features. See (Le Callet et al., 2013) for a general definition of QoE. These influence factors and features result from the QoE of the actual multimedia content and the QoE contributions of the individual sensory effects and the combinations thereof. The former can be estimated based on existing models (e.g., such as those referenced in the related work section), whereas the QoE contributions of the sensory effects, both individual and in combination, require further subjective quality assessments. The results of such studies (Waltl et al., 2010; Timmerer et al., 2013) indicate a linear relationship between the number of effects and the actual QoE. Thus, the QoE of multimedia content enhanced with sensory effects is referred to as Quality of Sensory Experience (QuaSE) and can be estimated from the QoE of the audio-visual content without sensory effects ($\mathit{QoE}_{av}$) as follows:
$\mathit{QuaSE} := \mathit{QoE}_{av} \cdot \left(\delta + \sum_{i} w_i b_i\right)$
In this utility model, $w_i$ represents the weighting factor for a single sensory effect of type $i$ (i.e., with the given setup as described above, $i \in \{\text{light } (l), \text{wind } (w), \text{vibration } (v)\}$). Additional sensory effect types such as scent may be incorporated easily, e.g., as soon as appropriate devices become available. The binary variables $b_i \in \{0, 1\}$ indicate whether an effect is present for a given setup. Finally, $\delta$ is used for fine-tuning an instantiation of the model.
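The utility model above is simple enough to evaluate directly. The following sketch computes QuaSE for a given setup; the function name, the dictionary-based interface, the default of 1.0 for delta, and all numeric values are assumptions chosen for illustration. They are not fitted parameters from the cited studies.

```python
def quase(qoe_av, weights, present, delta=1.0):
    """Estimate QuaSE = QoE_av * (delta + sum_i w_i * b_i).

    qoe_av  -- QoE of the audio-visual content without sensory effects
    weights -- weighting factor w_i per effect type i
    present -- truthy value per effect type that is rendered (b_i = 1);
               effect types missing from this mapping count as b_i = 0
    delta   -- fine-tuning term; 1.0 is an assumed default under which
               content without any effects keeps its original QoE
    """
    weighted_sum = sum(
        w * (1 if present.get(effect, False) else 0)
        for effect, w in weights.items()
    )
    return qoe_av * (delta + weighted_sum)


# Made-up weights for the three effect types of the described setup.
example_weights = {"light": 0.25, "wind": 0.5, "vibration": 0.125}

# Light and wind are rendered, vibration is not.
print(quase(2.0, example_weights, {"light": True, "wind": True}))
```

Because the model is linear in the binary variables, each additional effect type contributes a fixed increment of QoE_av times w_i, which mirrors the reported linear relationship between the number of effects and the QoE.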
6. RESEARCH CHALLENGES AND OPEN ISSUES
Mulsemedia is an emerging and exciting research area that we believe will attract much effort from the related academic and industrial communities. We have pointed out the challenges and possible research work in Appendix A, following the discussion of existing basic technical approaches and computational models. In this section, we highlight R&D possibilities for the near future in order to further advance the technology, applications and services, based upon the authors’ understanding and project experience in the related fields. Technical advancement is expected to be made in, and facilitated by, effective algorithm development, substantial database building, meaningful applications and wider user acceptance.
6.1 Mulsemedia – a solution in search of a killer app?
6.1.1 Taste – the last frontier? For computational modeling of the functioning of human senses, as discussed in Section 3.1, most work has been done for audition and vision; significant recent interest has emerged in olfaction and taction; and gustation is clearly the least investigated topic so far. We expect increasing activity on gustation and related issues. One challenge that we see is that, since taste buds are located in the mouth, devices that transmit sensations of taste will necessarily be invasive; alternatively, given the close relationship between taste and smell, it will be interesting to see whether the solution ultimately adopted is to use (non-invasive) olfactory inputs to stimulate and engage gustation.
6.1.2 Attention modeling. Human attention refers to the cognitive process of selectively concentrating on one aspect of the environment while ignoring other things (Anderson, 2004). As described in Section 2, inputs from one sense or different senses compete for human attention. Attention modeling has been formulated as the allocation of processing resources in humans, with a large number of examples in the visual sense (Itti, et al. 1998, Zhang and Lin, 2013) and joint audiovisual senses (Ma, et al. 2005, You, et al. 2007). A comprehensive attention model should evaluate stimuli from all five senses, and this represents a meaningful research challenge for QoE exploration.
6.1.3 Building databases.
Appropriate databases play important roles in discovering necessary insights for modeling, model parameter determination, and model verification, as evidenced in the related existing visual and audio modeling (Dermot, et al. 2009, Lin and Kuo 2011, Möller, et al. 2011), and cross-database evaluation is essential for the generality of models (Narwaria and Lin 2012, Narwaria, et al. 2012). Only a very limited number of databases are available for odor (http://www.odour.org.uk/information.html, http://senselab.med.yale.edu/odordb/?db=5) and touch (http://brl.ee.washington.edu/HapticsArchive/exp001.html); more public databases are needed for mulsemedia (including gustation).
6.1.4 Mulsemedia and performing arts/entertainment. 4D (and 5D) theatres are a staple attraction of theme parks worldwide and have been imparting ‘novel’ mulsemedia experiences to their visitors for some years now. The challenge will be to move such experiences from the theme parks into the mainstream. To some extent this is already happening: vibrating gaming chairs with integrated subwoofers (http://www.4gamers.net/products/ps3/interactive-gaming-chair), which make users ‘feel’ the action (and the bass in the audio), are gaining in popularity and becoming more affordable. Nonetheless, in order for mulsemedia to proliferate in these domains, we need to better understand how audiences react to mulsemedia effects; this will also enable script authors to effectively integrate them in the respective story lines.
6.1.5 Mulsemedia integration, synchronization, and intensities. Effective integration of mulsemedia effects requires several questions to be answered: What mulsemedia combinations work in practice? In what doses/intensities? What synchronization requirements do new media such as olfactory and gustatory media need to satisfy in relation to their counterparts? These are all as-of-yet unanswered questions, which future research needs to target. Once these are clarified, new mulsemedia authoring tools would need to be written.
6.1.6 Wearable Mulsemedia. The miniaturization of sensors and computing devices alike has led to an increased focus on the potential of wearable technology: recently, both Google (through the Google Glass project, http://www.google.com/glass/start/) and Sony (through the SmartWig project, http://www.bbc.co.uk/news/technology-25099262) have brought to market wearable computing gadgets.
If one considers that individuals already ‘wear’ perfume and receive vibrating alerts when their smartphones are in silent mode, the potential of wearable devices to transmit mulsemedia content becomes obvious. Research will need to be done in order to understand how best to integrate such content in wearable devices, and indeed, how best to design such devices so that they can be purveyors of mulsemedia.
6.1.7 Mulsemedia and e-learning. Mulsemedia authoring tools would also come in handy for e-learning systems. E-learning systems stand to benefit from olfaction-enhanced mulsemedia applications, for instance: the online learning of certain subject matters, e.g., chemistry, may be further enhanced by the addition of the corresponding smells if it were possible to transmit odors over the Internet or, more precisely, to transmit commands to a smell-generating device to mix and emit the required scent. Such future work would of necessity need to explore in what contexts and to what extent mulsemedia improves communication. In so doing, guidelines about how exactly to use mulsemedia to achieve a more accurate knowledge transfer would need to be elaborated.
6.1.8 Mulsemedia and e-commerce.
The options to feel the texture of a shirt that one wishes to buy, to smell the fragrance one is contemplating purchasing, or to inhale the aroma of, as well as see, taste and experience the texture of, a gourmet dish before booking a table at the restaurant serving it, all have the potential of moving from the realm of possibilities to that of reality. In so doing, the touch/taste/smell barriers currently characteristic of e-commerce will be overcome.
6.1.9 User Acceptance and Experience. We started off this section by highlighting the need for a mulsemedia killer app. Whilst in the above we have detailed, among others, what we believe to be potentially interesting mulsemedia developments, we cannot make any predictions for what a killer mulsemedia app might be. One thing, however, is certain: user acceptance and, more importantly, take-up is essential for any killer app. To achieve this, future work needs to undertake mulsemedia QoE studies to better understand how mulsemedia users react to such experiences. Moreover, such efforts would also inform the development of objective mulsemedia QoE metrics.
6.2 Final thought
“Seeing is believing” is an often-quoted idiom. Perhaps not so well known is the fact that the complete idiom, as penned by its author, the 17th century English clergyman Thomas Fuller, is actually “Seeing is believing, but feeling is the truth.” We subscribe to this statement, but feel that, for mulsemedia, the idiom is (at least) three sentences too short.
REFERENCES
Aarts, E. and de Ruyter, B. 2009. New Research Perspectives on Ambient Intelligence, Journal of Ambient Intelligence and Smart Environments, 1, 1, 5–14. Ademoye, O. and Ghinea, G. 2009. Synchronization of Olfaction-Enhanced Multimedia, IEEE Transactions on Multimedia, 11, 3, 561–565. Anderson, J. R. 2004. Cognitive psychology and its implications (6th ed.). Worth Publishers. Apostolopoulos, J. G., Chou, P. A., Culbertson, B., Kalker, T., Trott, M. D. and Wee, S. 2012. The Road to Immersive Communication, Proceedings of the IEEE, 100, 4, 974–990. Ayabe-Kanamura, S., Schicker, I., Laska, M., Hudson, R., Distel, H., Koboyakawa, T., and Saito S. 1998. A Japanese-German cross-cultural study, Chemical Senses, 23, 31–38. Bodnar, A., Corbett, R. and Nekrasovski, D. 2004. AROMA: Ambient awareness through olfaction in a messaging application: Does olfactory notification make 'scents'?. In Proceedings Sixth International Conference on Multimodal Interfaces (ICMI'04), 183–190. Boyd Davis, S., Davies, G., Haddad, R. and Lai, M. 2006. Smell Me: Engaging with an Interactive Olfactory Game. In Proceedings of the Human Factors and Ergonomics Society 25th Annual Meeting, 25–40, UK. Brewster, S.A., McGookin, D.K. and Miller, C.A. 2006. Olfoto: Designing a smell-based interaction. In Proceedings CHI 2006: Conference on Human Factors in Computing Systems, 653–662. Campbell, D., Jones, E., and Glavin, M. 2009. Audio quality assessment techniques: A review, and recent developments, Signal Processing, 89, 1489–1500. Carbon, C.-C. and Jakesch, M. 2013. A Model for Haptic Aesthetic Processing and Its Implications for Design, Proceedings of the IEEE, 101, 9, 2123–2133. Cater, J.P. 1992. The Nose Have It! Letters to the Editor, Presence, 1, 4, 493–494. Chang, A. and O’Sullivan, C. 2005. Audio-Haptic Feedback in Mobile Phones, in Proceedings CHI ’05 extended abstracts on Human factors in computing systems, CHI EA ’05, ACM, New York, NY, USA, 1264–1267. Craig, A.
D. 2003. Interoception: the sense of the physiological condition of the body. Current opinion in neurobiology, 13(4), 500–505. Damasio, A. R. 1989. Time-locked multiregional retroactivation: A systems-level proposal for the neural substrates of recall and recognition, Cognition, 33, 25–62. de Ruyter, B. and Aarts, E. 2004. Ambient Intelligence: Visualizing the Future, in Proceedings of the working conference on Advanced visual interfaces (AVI ’04), ACM Press, New York, NY, USA, 203–208. DiMaggio, P. 1997. Culture and cognition. Annual Review Of Sociology, 23, 263–287. Dinh, H.Q., Walker, N., Hodges, L.F., Song, C. and Kobayashi, A. 1999. Evaluating the importance of multi-sensory input on memory and the sense of presence in virtual environments. In Proceedings - Virtual Reality Annual International Symposium, 222–228. Fadel, C., & Lemke, C. 2008. Multimodal learning through media: What the research says. San Jose, CA: CISCO Systems. Retrieved October, 21, 2010. Ghinea, G. and Ademoye, O. 2010a. Perceived Synchronization of Olfactory Multimedia, IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 40, 4, 657–663. Ghinea, G. and Ademoye, O. 2010b. A User Perspective of Olfaction-Enhanced Mulsemedia. In Proceedings of the International Conference on Management of Emergent Digital EcoSystems (MEDES '10), 277–280, Thailand, Bangkok. Ghinea, G. and Ademoye, O. 2011. Olfaction-enhanced multimedia: perspectives and challenges, Multimedia Tools and Applications, 55, 3, 601–626. Ghinea, G. and Ademoye, O. 2012. The sweet smell of success: Enhancing multimedia applications with olfaction, ACM Transactions on Multimedia Computing, Communications and Applications, 8, 1, 2. Goldstein, E. B. 2013. Sensation and perception. Cengage Learning. Gray, R., Spence, C., Ho, C. and Tan, H.Z. 2013. Efficient Multimodal Cuing of Spatial Attention, Proceedings of the IEEE, 101, 9, 2113–2121. Grega, M., Janowski, L., Leszczuk, M., Romaniak, P., and Papir, Z. 2008. Quality of Experience Evaluation for Multimedia Services - Szacowanie postrzeganej jakości usług (QoE) komunikacji multimedialnej, Przegląd Telekomunikacyjny, 81, 4, 142–153. Gumtau, S. 2011. Affordances of touch in multi-sensory embodied interface design. PhD thesis, University of Portsmouth, UK. Hands, D. 2004. A Basic Multimedia Quality Model, IEEE Transactions on Multimedia, 6, 6, 806–816. Heilig, M. L. 1962. Sensorama Simulator, United States Patent Office (3,050,870); Patented August 28, 1962. Hinterseer, P. and Steinbach, E. 2006. A psychophysically motivated compression approach for 3D haptic
data, Proc. Int. Symp. Haptic Interfaces Virtual Environ. Teleoperator Syst., 35–41. Ho, C. and Spence, C. 2005. Olfactory facilitation of dual-task performance, Neuroscience letters, 389, 1, 35–40. Ishibashi, Y., Kanbara, T., and Tasaka, S. 2004. Inter-stream synchronization between haptic media and voice in collaborative virtual environments. In Proceedings of the 12th annual ACM international conference on Multimedia, ACM, New York, NY, USA, 604–611. Hoßfeld, T., Hock, D., Tran-Gia, P., Tutschku, K., Fiedler, M. 2008. Testing the IQX Hypothesis for Exponential Interdependency between QoS and QoE of Voice Codecs iLBC and G.711, in Proceedings 18th ITC Specialist Seminar on Quality of Experience, Karlskrona, Sweden. Itti, L., Koch, C. and Niebur, E. 1998. A model of saliency-based visual attention for rapid scene analysis, IEEE Trans Patt Anal Mach Intell., 20, 11, 1254–9. ITU-T Rec. P.910, Subjective Video Quality Assessment Methods for Multimedia Applications, April 2008.
ITU-T Rec. P.911, Subjective Audiovisual Quality Assessment Methods for Multimedia Applications, December 2008. Jain, R. 2003. Experiential computing, Communications of the ACM 46, 7, 48-55. Jayant, N., Johnston, J. and Safranek, R. 1993. Signal compression based on models of human perception, Proc. IEEE, 81, 1385–1422. Jones, L., Bowers, C.A., Washburn, D., Cortes, A. and Satya, R.V. 2004, The Effect of Olfaction on Immersion into Virtual Environments, in Human Performance, Situation Awareness and Automation: Issues and Considerations for the 21st Century Lawrence Erlbaum Associates, 282—285. Kahol, K., Tripathi, P., Mcdaniel, T., Bratton, L. and Panchanathan, S. 2006. Modeling context in haptic perception, rendering, and visualization, ACM Transactions on Multimedia Computing, Communications and Applications, 2, 3, 219--240. Kammerl, J., Vittorias, I., Nitsch, V., Faerber, B., Steinbach, E., and Hirche, S. 2010. Perception-based data reduction for haptic force-feedback signals using adaptive deadbands, Presence, Teleoper. Virtual Environ., 19, 5, 450–462. Kahneman D. 2003. A perspective on judgement and choice. American Psychologist. 58, 697-720. Kaye, J.N. 2001, Symbolic Olfactory Display, Master of Science edn, Massachusetts Institute of Technology, Massachusetts, U.S.A. Available: http://www.media.mit.edu/~jofish/thesis/ Kim, H. Kwon, H.-J., and K.-S. Hong, 2010. Location Awareness-based Intelligent Multi-Agent Technology, Multimedia Syst., 16, (4-5), 275–292. Klatzky, R.L., Pawluk, D., and Peer, A. 2013. Haptic Perception of Material Properties and Implications for Applications, Proceedings of the IEEE, 101, 9, 2081-2092. Le Callet, P., Möller, S. and Perkis, A. (eds) 2013. Qualinet White Paper on Definitions of Quality of Experience.European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003),.Lausanne, Switzerland, Version 1.2. Lin, W. 2006. 
Computational Models for Just-noticeable Difference, Chapter 9 in Digital Video Image Quality and Perceptual Coding, eds. H. R. Wu and K. R. Rao, CRC Press. Lin, W. and Jay Kuo, C.-C. 2011. Perceptual Visual Quality Metrics: A Survey, J. of Visual Communication and Image Representation, 22, 4, 297—312. Liu K and Gulliver S. R. 2013. Semiotics in Building space for Working and Living in Intelligent Building: Design, Management and Operation ed.Clements-Croome D. Lu, Z., Lin, W., Yang, X., Ong, E. and Yao, S. 2005. Modeling Visual Attention's Modulatory Aftereffects on Visual Sensitivity and Quality Evaluation. IEEE Trans. Image Processing, 14, 11, 1928 – 1942. Ma, Y-F, Hua, X-S , Lu, L. and Zhang, H-J. 2005. A generic framework of user attention model and its application in video summarization. IEEE Trans. on Multimedia, 7,5, 907-919. Mayer, R. E. 2003. Elements of a science of e-learning. Journal of Educational Computing Research, 29(3), 297-313. Manjunath, B. S. Salembier, P. and Sikora., T. 2002. Introduction to MPEG-7: Multimedia Content Description Interface, John Wiley and Sons Ltd. Marois, R., & Ivanoff, J. 2005. Capacity limits of information processing in the brain. Trends in cognitive sciences, 9(6), 296305. Metzinger, T. 1995. Faster than thought. Holism, homogeneity and temporal coding. In T. Metzinger (Ed.), Conscious experience 425–461, Paderborn: Schoningh Mochizuki, A., Amada, T., Sawa, S., Takeda, T., Motoyashiki, S., Kohyama, K., Imura, M. and Chihara, K. 2004, Fragra: a visual-olfactory VR game. In Proceedings SIGGRAPH '04: ACM SIGGRAPH 2004 Sketches ACM Press, New York, NY, USA, pp. 123. Möller, S., Chan, W-Y., Côté, N., Falk, T.H., Raake, A., and Wältermann, M. 2011. Speech Quality Estimation: Models and trends, IEEE Signal Processing Magazine, 28, 6, 18—28. Morrot, G., Brochet, F. and Dubourdieu, D. 2001, The Color of Odors, Brain and Language, 79, 2, 309-320. Narwaria, M. and Lin, W. 2012. 
SVD-Based Quality Metric for Image and Video Using Machine Learning, IEEE Trans. on Systems, Man, and Cybernetics--Part B, 42(2), 347 - 364. Narwaria, M., Lin, W., McLoughlin, I., Emmanue, S. and Chia, L. T. 2012. Nonintrusive Quality Assessment of Noise Suppressed Speech with Mel-Filtered Energies and Support Vector Regression, IEEE Trans. on Audio, Speech and Language Processing, 20(4), 1217 - 1232. Nakamoto, T., Otaguro, S., Kinoshita, M., Nagahama, M., Ohinishi, K., and Ishida, T. 2008. Cooking Up an Interactive Olfactory Game Display, IEEE Computer Graphics and Applications, 28, 1, 75--78. Nothdurft, H.-C. 2000. Salience from feature contrast: additivity across dimensions. Vis. Res., 40, 10–12, 1183–1201. Otaduy, M.A., Garre, C., and Lin, M.C. 2013. Representations and Algorithms for Force-Feedback Display, Proceedings of the IEEE, 101, 9, 2068-2080. Pereira, F. 2005. A Triple User Characterization Model for Video Adaptation and Quality of Experience Evaluation, in Proceedings 7th IEEE Workshop on Multimedia Signal Processing, 1–4. Pyo, S., Joo, S., Choi, B., Kim, M., and Kim, J. 2008. A Metadata Schema Design on Representation of Sensory Effect Information for Sensible Media and its Service Framework using UPnP, in Proceedings 10th International Conference on Advanced Communication Technology, (ICACT 2008), 2, 1129 –1134. Rainer, B, Waltl, M., Cheng, E., Shujau, M., Timmerer, C., Davis, S., Burnett, I., Ritz, C. and Hellwagner, H. 2012. Investigating the Impact of Sensory Effects on the Quality of Experience and Emotional Response in Web Videos in: I. Burnett, H. Wu (Eds.), Proceedings of the 4th International Workshop on Quality of Multimedia Experience (QoMEX’12), IEEE, Yarra Valley, Australia, 278--283. Reinhard, E., Efros, A.A., Kautz, J., and Seidel, H.-P. 2013. On Visual Realism of Synthesized Imagery, Proceedings of the IEEE, 101, 9, 1998 -- 2007, 2013. Revonsuo, A. 1999. Binding and the phenomenal unity of consciousness. 
Consciousness and cognition, 8, 2, 173-185. Richard, G., Sundaram, S., and Narayanan, S. 2013. An Overview on Perceptually Motivated Audio Indexing and Classification,
Proceedings of the IEEE, 101, 9, 1939 --1954. Rowe, L.A. and Jain, R. 2005, ACM SIGMM retreat report on future directions in multimedia research, ACM Transactions on Multimedia Computing, Communications, and Applications, 1, 1, 3--13. Rubinstein, J. S., Meyer, D. E., & Evans, J. E. (2001). Executive control of cognitive processes in task switching. Journal of Experimental Psychology: Human Perception and Performance, 27(4), 763. Sarter, N. 2013. Multimodal Support for Interruption Management: Models, Empirical Findings, and Design Recommendations, Proceedings of the IEEE, 101, 9, 2105 – 2112. Schiller PH. 1986. The central visual system, Vision Res. 26 (9): 1351–1386. Seungmoon C. and Kuchenbecker, K.J. 2013. Vibrotactile Display: Perception, Technology, and Applications, Proceedings of the IEEE, 101, 9, 2093-2104. Smythies, J. R. 1994a. The walls of Plato’s cave. Aldershot: Avebury. Smythies, J. R. 1994b. Requiem for the Identity Theory. Inquiry, 37, 311–329. Stamper, R.K. 1973 Information in Business and Administrative Systems, New York, John Wiley and Sons. Steinbach, E., Hirche, S., Ernst, M., Brandi, F., Chaudhari, R., Kammerl, J., and Vittorias, I., 2012. Haptic Communications, Proceedings of the IEEE, 100, 4, 937 –956. Suk, C. B., Hyun, J. S., and Yong, L. H. , 2009. Sensory Effect Metadata for SMMD Media Service, in Proceedings of the 2009 Fourth International Conference on Internet and Web Applications and Services, IEEE Computer Society, Washington, DC, USA, 649–654. Timmerer, C, Kim S. K., Ryu, J., and Choi, B. S. 2011. ISO/IEC 23005-3 FDIS Information technology — Media context
and control — Part 3: Sensory information. Timmerer, C., Waltl, M., Rainer, B., Hellwagner, H. 2012. Assessing the Quality of Sensory Experience for Multimedia Presentations. Signal Processing: Image Communication 27, 8, 909--916. Tortell, R., Luigi, D.P., Dozois, A., Bouchard, S., Morie, J.F. and Ilan, D. 2007, The effects of scent and game play experience on memory of a virtual environment, VirtualReality, 11, 1 , 61—68. Vetro, A. and Timmerer, C. 2005. Digital item adaptation: overview of standardization and research activities, IEEE Transactions on Multimedia, special issue on MPEG-21, 7, 3, 418 --426. Waltl, M., Timmerer, C., Rainer, B., and Hellwagner, H. 2012. Sensory Effect Dataset and Test Setups, in: I. Burnett, H. Wu (Eds.), Proceedings of the 4th International Workshop on Quality of Multimedia Experience (QoMEX’12), IEEE, Yarra Valley, Australia, 115--120. Waltl, M., Rainer,B., Timmerer, C., and Hellwagner, H. 2013. An End-to-End tool Chain for Sensory Experience based on MPEG-V,, Signal Processing: Image Communication, 28, 2, 136--150. Williams, A., Langron, S., and Noble, A. 1984. Influence of appearance on the assessment of aroma in Bordeaux wines by trained assessors. Journal of the Institute of Brewing, 90, 250–253. Wu, H. R., Reibman, A. , Lin, W., Pereira, F., and Hemami S. S. 2013. Perceptual Visual Signal Compression and Transmission, Proceedings of the IEEE, 101, 9, 2025 – 2043. Yang, X., Lin, W., Lu, Z., Ong, E. and Yao, S. 2005. Just Noticeable Distortion Model and Its Applications in Video Coding. Signal Processing: Image Communication, 20, 7, 662-680. Yarbus, A. L. 1967. Eye movements during perception of complex objects. In Eye movements and vision (pp. 171-211). Springer US. Yazdani, A., Kroupi, E., Vesin, J., and Ebrahimi, T., Electroencephalogram alterations during perception of pleasant and unpleasant odors in: I. Burnett, H. 
Wu (Eds.), Proceedings of the 4th International Workshop on Quality of Multimedia Experience (QoMEX’12), IEEE, Yarra Valley, Australia., 272--277. Yoon, K., Choi, B., Lee, E.-S., and Lim, T.-B. 2010. 4-D Broadcasting with MPEG-V, in Proceedings IEEE International Workshop on Multimedia Signal Processing (MMSP), 257 –262. You, J., Reiter, U., Hannuksela, M.M., Gabbouj, M., and Perkis, A. 2010. Perceptual-based quality assessment for audio-visual services: a survey, Signal Processing: Image Communication, 25, 7, 482–501. You, J., Liu, G., Sun, L. and Li, H. 2007. A Multiple Visual Models Based Perceptive Analysis Framework for Multilevel Video Summarization. IEEE Trans. on Circuits and Systems for Video Technology, 17,3, 273 – 285. Yost, William A. and Nielsen, Donald W., 1985. Fundamentals of Hearing, Holt, Rinehart and Winston, New York. Zhang, L. M. and Lin, W. 2013. Modeling Selective Visual Attention: Techniques and Applications, John Wiley & Sons.
Online Appendix to: Mulsemedia: State-of-the-Art, Perspectives and Challenges
GHEORGHITA GHINEA, Brunel University
CHRISTIAN TIMMERER, Alpen-Adria-Universität
WEISI LIN, Nanyang Technological University
STEPHEN R. GULLIVER, University of Reading
A. IMPORTANT TECHNICAL ISSUES AND COMPUTATIONAL MODELS IN MULSEMEDIA

This appendix gives more detail on, and discussion of, the basic technical approaches and computational models for mulsemedia. Wherever possible, we also highlight the related technical challenges and directions for future exploration.

A.1 Just noticeable difference (JND) modelling

The just noticeable difference (JND) is the minimum change in the magnitude of a stimulus that can be detected by humans. In a haptic problem, the JND captures a certain error tolerance (i.e., the deadband or deadzone) in force and velocity signals below human haptic thresholds, and therefore facilitates effective and efficient data compression (Hinterseer and Steinbach 2006; Kammerl, et al. 2010), error resilience (Steinbach, et al. 2012), rendering (Steinbach, et al. 2012; Seungmoon and Kuchenbecker 2013), interaction and quality evaluation (Kammerl, et al. 2010). For instance, a signal sample that falls within the deadband need not be transmitted to the remote site, which saves both computation and bandwidth. In the 1-DoF (degree of freedom) haptic case, as in the work by Hinterseer and Steinbach (2006) and Steinbach, et al. (2012), a signal sample $s(t)$ is within the deadband if the following inequality holds:

$$|s(t) - s(t')| \le k\,|s(t')| \tag{A.1}$$

where $s(t')$ is an adjacent sample (usually the previous sample) for $s(t)$, and $k$ is a perceptual threshold parameter, simply determined by Weber's law to represent the JND (i.e., the JND is proportional to the signal magnitude) in the aforementioned work.
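The 1-DoF deadband test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the function name `deadband_filter`, the choice of the last transmitted sample as the reference sample, and the value of the threshold `k` are all hypothetical.

```python
def deadband_filter(samples, k):
    """Return the indices of the samples that would be transmitted.

    Assumption of this sketch: the reference sample is the last
    *transmitted* sample. A new sample is suppressed when
    |s - ref| <= k * |ref| (Weber-law deadband), otherwise it is sent
    and becomes the new reference.
    """
    if not samples:
        return []
    sent = [0]            # the first sample is always transmitted
    ref = samples[0]
    for i, s in enumerate(samples[1:], start=1):
        if abs(s - ref) > k * abs(ref):
            sent.append(i)
            ref = s
    return sent


# Hypothetical force samples and a hypothetical threshold of 10 %.
transmitted = deadband_filter([1.0, 1.05, 1.2, 1.25, 0.9], k=0.1)
```

Only the samples whose change exceeds the perceptual threshold are sent, which is where the bandwidth saving comes from.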
In real-world haptic systems with multiple DoF, a signal vector $\vec{s} \in \mathbb{R}^n$ is used instead of a scalar signal $s(t)$ in (A.1). A multi-DoF signal $\vec{s}(t)$ is within the deadband if:

$$\left\| D\,\big(\vec{s}(t) - \vec{s}(t')\big) \right\| \le \left\| \vec{s}(t') \right\| \tag{A.2}$$

This is an extension of (A.1) by Steinbach, et al. (2012), and the deadband matrix for n-dimensional signals is:

$$D = \mathrm{diag}\!\left(\frac{1}{k_1}, \frac{1}{k_2}, \ldots, \frac{1}{k_n}\right) \tag{A.3}$$
where $k_i$ is the JND-related threshold corresponding to the $i$-th element of $\vec{s}$. It is further known that when a human user interacts with an object at a certain velocity $\dot{x}(t)$, his/her force-feedback perception abilities are reduced; that is, the value of $k$ in (A.1) or $k_i$ in (A.3) increases with $|\dot{x}(t)|$ and therefore becomes time-varying, as in the work by Kammerl, et al. (2010):

$$k(t) = k_0 + \alpha\,|\dot{x}(t)| \tag{A.4}$$

where $k(t)$ is the velocity-adaptive JND-related threshold, $k_0$ is the baseline (constant) threshold, and $\alpha$ denotes the rate of change of $k(t)$ with respect to $|\dot{x}(t)|$.

In contrast to audiovisual development for JND (Jayant, et al. 1993; Lin 2006; Lin and Kuo 2011; Wu, et al. 2013), that for touch sensor and display devices is still in its infancy (e.g., the simple use of Weber's law, as in the existing work mentioned above), while there is a lack of similar research in olfaction. There is therefore a call for in-depth, comprehensive and systematic investigation of mulsemedia JND modelling, especially in masking and contrast sensitivity.
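The multi-DoF deadband test and the velocity-adaptive threshold can be illustrated together. The following Python sketch rests on assumptions that are not taken from the cited work: the per-dimension thresholds enter as a diagonal scaling of the sample difference, the threshold grows linearly with the absolute velocity, and all names and numeric values are hypothetical.

```python
import math


def adaptive_threshold(k0, alpha, velocity):
    """Velocity-adaptive threshold: baseline k0 plus alpha times |velocity|."""
    return k0 + alpha * abs(velocity)


def within_deadband(current, reference, thresholds):
    """Multi-DoF deadband test.

    Each component of the difference (current - reference) is divided by
    its own threshold k_i (assumed diagonal deadband matrix), and the
    norm of the scaled difference is compared with the norm of the
    reference sample.
    """
    scaled = [(c - r) / k for c, r, k in zip(current, reference, thresholds)]
    scaled_norm = math.sqrt(sum(v * v for v in scaled))
    reference_norm = math.sqrt(sum(r * r for r in reference))
    return scaled_norm <= reference_norm


# Hypothetical 2-DoF force sample; a faster hand movement raises the
# threshold, so the same change falls inside the deadband.
reference = [3.0, 4.0]
current = [3.5, 4.5]
k_slow = adaptive_threshold(k0=0.1, alpha=0.02, velocity=0.0)   # 0.1
k_fast = adaptive_threshold(k0=0.1, alpha=0.02, velocity=-5.0)  # about 0.2
slow_ok = within_deadband(current, reference, [k_slow, k_slow])
fast_ok = within_deadband(current, reference, [k_fast, k_fast])
```

With a single dimension the test reduces to the scalar Weber-law check, which is the design reason for scaling by the inverse thresholds in this sketch.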
A.2 Perception of conflicting multisensory information

For a computational model of mulsemedia, it is inevitable that discrepancies occur among different streams of sensory information, in space or in time. Some forms of information are of a lingering nature (like smell), as opposed to the transitory nature of others (such as video and audio). There has been initial investigation in the related research community into the impact of asynchrony and the need for synchronization of different media. The olfaction-enhanced multimedia study by Ghinea and Ademoye (2009, 2010a, 2011, 2012) concerns itself with associating computer-generated smell with visual and audio information; the six smell categories used were flowery, foul, fruity, burnt, resinous and spicy, together with the associated videos, as listed in Table A.1 (Ademoye and Ghinea, 2009). Subjective experiments were conducted with more than 40 participants to determine: 1) the detectable inter-media skew between olfactory and audiovisual media content; 2) the impact of delay on the user-perceived experience. As the experiments showed, inter-media skew synchronization requirements for olfaction and audiovisual content lie between -30 s (olfaction ahead of audiovisual content) and +20 s (olfaction behind audiovisual content); olfaction ahead of audiovisual content is less noticeable than the reverse case.
Furthermore, the results revealed that although participants detected the presence of synchronization errors, these did not have a significant impact on the overall perceived quality of experience of the olfaction-enhanced multimedia. Although the human perception system seems able to correct for inter-media mismatch, so that the discrepancy becomes less noticeable with time, as discussed above, some research indicates that a computational system requiring the user to adapt frequently to novel conflicting situations will perform unsatisfactorily in terms of QoE. Hence, in order to facilitate the coherent perception of an event across different sensory feedback channels, inter-media asynchrony should be systematically minimized in a teleoperation system, for instance via intelligent statistical multiplexing of audiovisual-haptic signals on the feedback communication channel (Hinterseer and Steinbach, 2006; Kammerl, et al., 2010; Seungmoon and Kuchenbecker, 2013). Since the perceptual mechanisms behind conflicting information and synchronization are still largely unknown, more exploration is called for in this field before the findings can be effectively turned into design and implementation advantages for olfaction/haptics-enhanced applications and services.

Table A.1 Associating computer-generated smell with videos (Ademoye and Ghinea, 2009)

SMELL CATEGORY | VIDEO DESCRIPTION | SMELL USED
BURNT | Documentary on bush fires in Oklahoma | Burning Wood
FLOWERY | News broadcast featuring perfume launch | Wallflower
FOUL | Documentary about rotting fruits | Rubbish Acrid
FRUITY | Cookery show on how to make a fruit cocktail | Strawberry
RESINOUS | Documentary on Spring allergies & cedar wood | Cedar Wood
SPICY | Cookery show on how to make chicken curry | Curry
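The skew window reported for olfaction and audiovisual content can be expressed as a simple check. The sketch below is illustrative only: the constant and function names are hypothetical, and the sign convention (negative skew means olfaction is presented ahead of the audiovisual content) is an assumption based on the remark that olfaction ahead is the less noticeable case.

```python
# Tolerable inter-media skew between olfaction and audiovisual content,
# in seconds, taken from the -30 s to +20 s range quoted in the text.
MAX_OLFACTION_LEAD_S = 30.0   # olfaction ahead of audiovisual content
MAX_OLFACTION_LAG_S = 20.0    # olfaction behind audiovisual content


def skew_is_tolerable(olfaction_onset_s, audiovisual_onset_s):
    """Return True if the olfaction/audiovisual skew lies in the window.

    Assumed sign convention: skew = olfaction onset - audiovisual onset,
    so a negative skew means the smell is released first.
    """
    skew = olfaction_onset_s - audiovisual_onset_s
    return -MAX_OLFACTION_LEAD_S <= skew <= MAX_OLFACTION_LAG_S


# Smell released 25 s before the video scene: inside the window.
early_ok = skew_is_tolerable(olfaction_onset_s=0.0, audiovisual_onset_s=25.0)
# Smell released 25 s after the video scene: outside the window.
late_ok = skew_is_tolerable(olfaction_onset_s=25.0, audiovisual_onset_s=0.0)
```

The asymmetric window reflects the lingering nature of smell: a scheduler could release a scent early with less perceptual cost than releasing it late.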
A.3 Multisensory integration

Mulsemedia integration needs to be performed with a view to total control and QoE. Let $\{s_i\}$ denote the perceptual effect of noisy multisensory stimuli, where $i = 1, 2, \ldots, n$ is the sensory index. Assuming the noises are independent and Gaussian distributed, one way to integrate the unbiased sensory estimates $\{\hat{s}_i\}$ is to weight them adaptively (as in the work on haptic communication by Steinbach, et al. (2012)), with $w_i$ proportional to the inverse of the variances of the respective noise distributions, $\{\sigma_i^2\}$, i.e.,

$$w_i = \frac{1}{\sigma_i^2}, \qquad \bar{w}_i = \frac{w_i}{\sum_{j=1}^{n} w_j}, \tag{A.5}$$
and the resultant additive integration is:

$$\hat{s} = \sum_{i=1}^{n} \bar{w}_i\,\hat{s}_i = \sum_{i=1}^{n} \hat{s}'_i \tag{A.6}$$

where $\bar{w}_i$ is the normalized form of $w_i$, and $\hat{s}'_i = \bar{w}_i\,\hat{s}_i$. In general, the overlapping effect among $\{s_i\}$ needs to be accounted for, so, extending the nonlinear additivity model for perception proposed by Nothdurft (2000), Eq. (A.6) becomes

$$\hat{s} = \sum_{i=1}^{n} \hat{s}'_i - \sum_{i=1}^{n} \sum_{j=i+1}^{n} c_{ij}\,\min\!\left(\hat{s}'_i, \hat{s}'_j\right) \tag{A.7}$$

where $c_{ij}$ represents the cross-sensory coupling factor from $\hat{s}'_i$ to $\hat{s}'_j$, defined in the range [0, 1] to denote minimum to maximum overlapping, so the second term on the right-hand side of (A.7) accounts for the overlapping of multisensory data. Simple examples of using (A.7) can be found in the work of Lu, et al. (2005) and Yang, et al. (2005), for visual signals. Toward immersive environments, evaluating overlap and modelling multisensory perceptual fusion, verified with large-scale data, remains a very challenging research task.
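The integration scheme of this section can be sketched numerically. The Python example below is a hedged illustration rather than the method of the cited papers: it assumes inverse-variance weights normalized to sum to one, and it assumes that the overlap between two weighted contributions is taken as the smaller of the two, scaled by a coupling factor in [0, 1], in the spirit of the nonlinear additivity models the text cites. The function name, the matrix layout of the coupling factors and all numeric values are hypothetical.

```python
def integrate(estimates, variances, coupling=None):
    """Fuse per-sense estimates with inverse-variance weighting.

    estimates: unbiased estimate from each sense
    variances: noise variance of each sense (all > 0)
    coupling:  optional n x n nested list; coupling[i][j] for i < j is the
               assumed cross-sensory coupling factor in [0, 1]
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    # Weighted contribution of each sense (normalized weight * estimate).
    contributions = [w / total * e for w, e in zip(weights, estimates)]
    fused = sum(contributions)
    if coupling is not None:
        n = len(contributions)
        for i in range(n):
            for j in range(i + 1, n):
                # Assumed overlap term: coupling * min of the two contributions.
                fused -= coupling[i][j] * min(contributions[i], contributions[j])
    return fused


# Hypothetical example: the first sense is four times more reliable.
additive = integrate([10.0, 20.0], [1.0, 4.0])
overlapped = integrate([10.0, 20.0], [1.0, 4.0], coupling=[[0.0, 0.5], [0.0, 0.0]])
```

With these hypothetical numbers the normalized weights are 0.8 and 0.2, so the purely additive estimate is 12, and a coupling factor of 0.5 removes half of the smaller contribution, giving 10.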