Proceedings of M2USIC 2006
Frontiers of ICT Research
Date: 16th – 17th November 2006
Venue: PJ Hilton Hotel, Malaysia
Organised by:
Supported by:
Ministry of Energy, Water and Communications
Main Sponsor: Selangor State Government The Institution of Engineers Malaysia
Co-Sponsors:
Motorola
Malaysian Communications and Multimedia Commission
Selangor State Government
IN CONJUNCTION WITH MMU 10TH YEAR ANNIVERSARY Brains, Knowledge, Wisdom: 10 Years and Beyond
ORGANISING COMMITTEE

Chairman of Organising Committee: Professor Chuah Hean Teik

Local Committee
Chairman: Dr. Somnuk Phon-Amnuaisuk
Co-Chairman: Dr. Fazly Salleh Abas
Secretary: Mr. Mohd Azhar Mat Zim, Ms Siti Hajar Mat Yusop
Members: Dr. Chew Kuew Wai Dr. Khine Nyunt Dr. Lim Wee Kiong Dr Mohammad Yusoff Alias Dr. Ong Boon Hong Dr. You Ah Heng Mr. Abid Abdelouahab Mrs. Amy Lim
Mr. Chin Wen Cheong Mr. Frank Wonning Chu Mrs. Goh Hui Ngo Mr. Ibrahim Yusof Mr. Khairil Imran Ghauth Mr. Khor Kok Chin Mr. Lee Kian Chin Mr. Mohd Izzuddin Mohd Tamrin
Mr. Nor Azhan Nordin Mr. R. Logeswaran Mr. Sriraam Natarajan Ms. Soon Lay Ki Ms Tan Chui Hong Mr. Tong Gee Gok Mrs. Wan Noorshahida Mohd Isa Mr. Wong Kok Seng
Technical Committee
Chairman: Associate Professor Dr. Lee Sze Wei
Co-Chairman: Dr. Mohammad Faizal Ahmad Fauzi
Secretariat: Mr. Mohd Fazrin Lokman, Ms. Rasinah Mohamad
Members: Prof. Lim Swee Cheng
Dr. Tan Kim Geok
Dr. Harold M. Thwaites
Dr. Andrew Teoh Dr. Chien Su Fong
Dr. Chuah Teong Chee Dr. Hafizal Mohamad
Mr. Leow Meng Chew Mr. Lim Tien Sze
Dr. Krishna Prasad Dr. Lee Chien Sing
Dr. Jacob Daniel Dr. Mohd Ridzuan Mokhtar
Mr. Ong Thian Song Mr. Tan Shing Chiang
Dr. Nurul Nadia Ahmad
Dr. Norliza Mohd Zaini
Ms. Tee Connie
Dr. Sim Moh Lim
Dr. Zulfadli Yusoff
Mr. Chan Yee Kit
Dr. Sithi V. Muniady
Dr. Goi Bok Min
Mr. Goh Kah Ong Michael
LIST OF INTERNATIONAL ADVISORY BOARD
1. Prof. Alan Marshall, Queen's University of Belfast, UK
2. Prof. David J. Nagel, The George Washington University, USA
3. Prof. Francis T.S. Yu, The Pennsylvania State University, USA
4. Prof. Kong J.A., Massachusetts Institute of Technology, USA
5. Prof. Norman Foo, The University of New South Wales, Australia
6. Dr. Nobuo Goto, Toyohashi University of Technology, Japan
7. Prof. Ong Chee Mun, Purdue University, USA
8. Prof. Randy Goebel, University of Alberta, Canada
9. Prof. Robert J. Simpson, UK
10. Prof. Tam Hwayaw, The Hong Kong Polytechnic University, Hong Kong
LIST OF REVIEWERS M2USIC 2006 Aarthi Chandramohan Abdul Razak Rahmat Aeni Zuhana Saidin Ah Heng You Ah-Choo Koo Ahmad Fauzi Aini Binti Hussain Ajantha Sinniah Ajay Anant Joshi Amit Saxena Ammar Wahab Mohemed Andrew Teoh Beng Jin Antonio L. J. Teixeira Asaad Abusin Asad I. Khan Ashvini Chaturvedi Aussenac-Gilles Nathalie Azizi Ab.Aziz Azman Yasin B.S. Daya Sagar Bala P. Amavasai Bernd Porr Bhawani Selvaretnam Chai Kiat Yeo Chan Huah Yong Chan Gaik Yee Charles Woo Cheah Wooi Ping Chia Chieh Thum Chien-Sing Lee Ching-Chieh Kiu Choo Kan Yeep Chor Min Tan Chua Fang Fang Chuah Teong Chee Dan Corkill Daniel Wong Daniel Lemire David Chieng Dirk Gerhard Rilling Dr. Mahamed Ridza Wahiddin Dr. Mohammad Umar Siddiqi Eisuke Kudoh Eng Kiong Wong Eric Chin Eris Chinellato
Ettikan Kandasamy Karuppiah Fairuz abdullah Fazilah Haron Haron Fazly Salleh Abas Fengyuan Ren G.S.V. Radha Krishna Rao Gareth J. F, Jones Geong Sen Poh Ghanshyam singh Gobi Vetharatnam Goh Hock Ann Hafizal Mohamad Hailiza Kamarul Haili Hakim Mellah Hanafi Atan Harold Thwaites Hend Al-Khalifa Heng Siong Lim Hew Soon Hin Ho Yean Li Ho Chin Kuan Hoon Wei Lim Hui Shin Wong Hui-Ngo Goh Hui-ngo Goh Hyungjeong Yang Ibrahim Yusof Ismail Ahmad Izuzi Marlia Jehana Jamaluddin Jerome Haerri John See Jusoh Shaidah Ka Sing Lim Kah Hoe Koay Kaharuddin Dimyati Kang Eng Thye Khong Neng Choong Khoo Bee Ee Khor Swee Eng Khor Kok Chin Kiu Ching Chieh KK Teoh Kok Swee Sim Krishna Prasad Kuan-Hoong Poo Kulathruraimayer Narayanan Kwek Lee Chung
Lau Siong Hoe Lee Kok Wah Lim Way Soong Lim Chia Sien Lim Chee Peng Ling Huo Chong Ling Siew Woei Lock Yen Low M. Senthil Arumugam M. -L. Dennis Wong Mahendra V. Chilukuri Mao De Ma Md Shohel Sayeed Md Zaini jamaluddin Mehrdad J. Gangeh Michael Goh Kah Ong Michael J. H. Chung Michael Wagner Mihaela Rodica Cistelecan Miss Laiha Mat Kiah Moesfa Soeheila Binti Mohamad Mohamad Kamal A Rahim Mohamad Yusoff Alias Mohammad Osiur Rahman Mohammed Belkhatir Mohd Haris Lye Abdullah Mohd Shamrie Sainin Nazib Nordin Ng Tian Tsong Ng. Mow Song Ng Nidal S Kamel Nirod Chandra Sahoo Nithiapidary Muthuvelu Nor Ashidi Mat Isa Norashidah Md Din Norazizah Mohd Aripin Norhashimah Morad Norshuhada Shiratuddin Othman M. S. Ahtiwash Pang Ying Han Pankaj Kumar Choudhury Paul Safonov Paul Yeoh H P Paul H. Lewis Pechin Lo Chien Pau Putra Sumari R. Azrina R. Othman
Rahmita Wirza O.K. Rahmat Ram Chakka Raphael C.-W. Phan Reethu Arokia John Richard Richard Crowder Robin Salim Rosalind Deena Kumari Rosni Abdullah Ryoichi Komiya Salina Abdul Samad Salman Yusoff Sayuthi Jaafar Lt. Col. Sean B. Anderson Ser Wah Oh Seung Teck Park Sheng Chyan Lee Shyamala Doraisamy Sim Moh Lim Simon David Scott Sithi V. Muniandy Somnuk Phon-Amnuaisuk Soon Fook Fong Sreenivasan Jayashree Su Fong Chien Subarmaniam Kannan Suhaidi Hassan Swarnappa clement sudhakar Swee Cheng Lim Sze Wei Lee Tan Shing Chiang Tanja Mitrovic Teck Meng Lim Tee Connie Thong Leng Lim Tien Sze Lim Toufik Taibi Umi Kalthum Ngah Wai Lee Kung Wan Tat Chee Wei Lee Woon Wong Kok Seng Wong Chui Yin Wong Li Pei
Wun-She Yap Ya-Ping Wong Yap Keem Siah Yau Wei Chuen Yee Kit Chan Yik Seng Yong Yip Wai Kuan Yong Yen San Yoong Choon Chang Yu-N Cheah Zainal Abdul Aziz Zainol Abidin Abdul Rashid Zhamri Che Ani Zubeir Izaruku Zulaikha Kadim
MMU INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATIONS TECHNOLOGIES 2006 (M2USIC 2006)
OPENING CEREMONY
16th November 2006 (Thursday)
KRISTAL BALLROOM 1, LEVEL 1, PETALING JAYA HILTON

AGENDA
9.00 am   Registration of participants
9.30 am   Arrival of guests and media members
10.00 am  Arrival of Guest of Honour and VIPs
10.05 am  Welcoming speech by Prof. Datuk Dr. Ghauth Jasmon, President, Multimedia University
10.15 am  Officiating speech by our Guest of Honour
10.30 am  Mock cheque and souvenir presentation
10.45 am  Refreshment and Press Conference
11.15 am  Keynote Address 1 and Keynote Address 2
12.30 pm  Lunch
16 NOVEMBER 2006 (THURSDAY)

14:00 – 15:40  Artificial Intelligence & Applications (TS1A)
TS1A-1  SVD In AHP With Embedded Genetic Algorithm And Neural Network For Mass Classification In Breast Cancer. Nur Jumaadzan Zaleha binti Mamat, Afzan binti Adam. Multimedia University, Cyberjaya, Malaysia. (p. 1)
TS1A-2  Numerical Evolutionary Optimization using G3-PCX with Multilevel Mutation. Jason Teo, Mohd. Hanafi Ahmad Hijazi. Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia. (p. 7)
TS1A-3  Comparing Between 3-Parents and 4-Parents for Differential Evolution. Nga Sing Teng, Jason Teo, Mohd Hanafi. Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia. (p. 12)
TS1A-4  Analysis and Design of Crew Reassignment Module in Bus Crew Scheduling System Using GAIA Methodology. Abdul Samad Shibghatullah, Nazri Abdullah. KUTKM, Ayer Keroh, Melaka, Malaysia. (p. 17)
TS1A-5  InfoPruned Artificial Neural Network Tree (ANNT). S. Kalaiarasi Anbananthen, G. Sainarayanan, Ali Chekima, Jason Teo. University Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia. (p. 22)
16:00 – 17:40  Artificial Intelligence & Applications (TS1B)
TS1B-1  Online Graphic Recognition using a novel Spatio-Sequential Descriptor. Noorazrin Zakaria, Jean-Marc Ogier, Josep Llados. University of La Rochelle, La Rochelle, France. (p. 28)
TS1B-2  Acceptance Probability of Lower Regular Languages and Problems using Little Resources. Michael Hartwig, Beik Zadeh Reza Mohammad. Multimedia University, Malaysia; Gaborone, Botswana. (p. 34)
TS1B-3  A Novel Intelligent Technique for Mobile Robot Navigation. S. Parasuraman, V. Ganapathy, Bijan Shrinzadeh. Monash University, Kuala Lumpur, Selangor, Malaysia. (p. 40)
TS1B-4  Dynamic Signature Identification and Forgery Detection Based on Bend of Joints of Fingers in the Signing Process Using Data Glove. Shohel Sayeed, Nidal S Kamel, Rosli Besar. Multimedia University, Melaka, Malaysia. (p. 46)
TS1B-5  A Novel Intelligent Decision Support System Based on Cognitive Processing Model of Decision-Making. Chee Siong Teh, Chwen Jen Chen. Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia. (p. 52)
14:00 – 15:40  Telecommunications (TS2A)
TS2A-1  Active Integrated Antenna for Direct Conversion Receiver. Mohamad Kamal A Rahim, M.I.M Zaki, M.H. Jamaluddin, M.H. Azmi, A. Asrokin. UTM, Skudai, Johor, Malaysia. (p. 151)
TS2A-2  Token Ring Using Ethernet (TRUE): Introduction and Performance Bounds. K. Daniel Wong, Wei Lee Woon. MUST, PJ, Selangor, Malaysia. (p. 156)
TS2A-3  Grammatical Coding for Low Cost Communication. Mohd. Moniruzzaman. International Islamic University Chittagong, Dhaka, Bangladesh. (p. 162)
TS2A-4  Precision Farming Using Wireless Sensor Network. Chun Fatt Hiew, Voon Chet Koo, Yik Seng Yong. Multimedia University, Melaka, Malaysia. (p. 166)
TS2A-5  Scaling Properties of Automatically Switched Optical Networks. Rajendran Parthiban. Monash University Malaysia, Petaling Jaya, Selangor, Malaysia. (p. 171)
16:00 – 17:40  Telecommunications (TS2B)
TS2B-1  Gain And SNR Improvement Of Four ITU Channels Using Double Pass Amplification With A MUX / DMUX Filters. Belloui Bouzid. KFUPM, HBCC, Dhahran, Saudi Arabia. (p. 177)
TS2B-2  Multi-User Detection for the Optical Code Division Multiple Access: Optical Parallel Interference Cancellation. Nagi Elfadel, Elamin Idriss, Mohamad Naufal Mohamad Saad. Universiti Teknologi PETRONAS, Tronoh, Perak, Malaysia. (p. 180)
TS2B-3  Economic Viability of Deploying All-Optical Networks. Rajendran Parthiban. Monash University Malaysia, Petaling Jaya, Selangor, Malaysia. (p. 184)
TS2B-4  Distributed Raman Amplifiers for Long-Haul OFCS. Nadir Hossain, Vivekanand Mishra, Hairul Azhar Abdul Rashid, Abbou Mohammed Fouad, Ahmed Wathik Naji, A. R. Faidz, H. Al-Mansoori, M.A. Mahdi. Multimedia University, Cyberjaya, Malaysia. (p. 188)
TS2B-5  WiMAX Deployment along North-South Expressway in Malaysia: The First Step. Wong Hui Shin, Chong Kieat Wey. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 192)
14:00 – 15:40  Multimedia (TS3A)
TS3A-1  Automatic Continuous Digit Boundary Segmentation. Syed Abdul Rahman Al-Haddad, Salina Abdul Rahman, Aini Hussein. Universiti Kebangsaan Malaysia, Bangi, Malaysia. (p. 280)
TS3A-2  Enhanced Bayesian Networks Model for Dialogue Act Recognition in Dialogue Systems. Anwar Ali Yahya, R. Mahmod, F. Ahmad, M.N Sulaiman. Universiti Putra Malaysia, Serdang, Selangor, Malaysia. (p. 284)
TS3A-3  A Controllable JPEG Image Scrambling using CRND Coefficient Scanning. Ahmad Zaidee b Abu, Fazdliana bt Samat, J. Adznan. Multimedia Tech. Cluster, Telekom Research & Development, Serdang, Selangor, Malaysia. (p. 291)
TS3A-4  Temporal Video Compression by Modified Constant and Linear Approximation. Lee Peng Teo, Hse Tzia Teng, Wee Keong Lim, Yi Fei Tan, Wooi Nee Tan. MMU, Cyberjaya, Selangor, Malaysia. (p. 297)
TS3A-5  Offline Signature Verification by Binary Image Processing. Shih Yin Ooi, Andrew Teoh Beng Jin, David Ngo Chek Ling. Multimedia University, Melaka, Malaysia. (p. 303)
16:00 – 17:40  Multimedia (TS3B)
TS3B-1  Fusion of Multiple Edge Maps For Improved Noise Resistance. Wei Lee Woon, Panos Liatsis, Kuok-Shoong Daniel Wong. Information Technology, Malaysia University of Science and Technology, Petaling Jaya, Malaysia. (p. 309)
TS3B-2  Structural Analysis on Ground Level Image of Man-Made objects Using Fuzzy Spatial Descriptor. Patrice Boursier, Norazrin Zakaria, J.M Ogier, Hong Tat Ewe, Noramiza Hashim. Laboratoire Informatique Image Interaction, Universite de La Rochelle, La Rochelle, France. (p. 315)
TS3B-3  A Review Study of Face Feature Extraction. Karmesh Mohan, Edwin Tan Teck Beng, Mohd Fadhli Mohd Nor, Pang Ying Han. Multimedia University, Bukit Beruang, Melaka, Malaysia. (p. 321)
TS3B-4  A New Approach of Multi-Image Method for 3D Model Reconstruction. Mohd Azzimir Jusoh, Nor Roziana Rosli. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 325)
TS3B-5  Feature Selection for Texture-Based Classification Scheme Using Multimodality for Liver Disease Detection. Sheng Hung Chung, Rajesvaran Logeswaran. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 331)
14:00 – 15:40  Information Security & Cryptography (TS4A)
Invited Talk: "Security Assurance Framework". Pn. Rozana Rusli (CISSP), Head of Security Assurance, National ICT Security and Emergency Response Centre (NISER).
TS4A-1  A Weight-Based Scheme For Secret Recovery With Personal Entropy. Leau Yu Beng, Mohd. Aizaini Maarof, Rabiah Ahmad. Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia. (p. 414)
TS4A-2  On a Strong Location Privacy and Forward Secrecy RFID Challenge-Response Authentication Protocol. Hun-Wook Kim, Shu-Yun Lim, Hoon-Jae Lee. Dongseo University, Busan, Republic of Korea. (p. 420)
TS4A-3  Securing VPN Using IPSEC: A Case Study. Sharipah Setapa, Noraida Kamaruddin, Gopakumar Kurup. Cyberspace Security Lab, MIMOS BHD. (p. 425)
TS4A-4  Implementation of Spam Detection on Regular and Image based Emails – A Case Study of Using Spam Corpus. Biju Issac, Valliapan Raman. Swinburne University of Technology (Sarawak Campus), Kuching, Sarawak, Malaysia. (p. 431)
16:00 – 19:00  Information Security & Cryptography (TS4B)
Invited Talk: "State of the Art and Open Problems in Cryptology". Dr. Raphael C.-W. Phan, Director of Information Security Research Laboratory, Swinburne University.
TS4B-1  A Multi-Stage Off-Line Signature Verification System by Artificial Neural Network with Individual Error Tolerance. Aklima Khanam, Md. Abdullah Al Mamun, Md. Zia Uddin. International Islamic University Chittagong, Bangladesh. (p. 437)
TS4B-2  An Off-Line Signature Identification System using Quaternary Tree. Md. Abdullah Al Mamun, Md. Zia Uddin, Aklima Khanam. International Islamic University Chittagong, Bangladesh. (p. 443)
TS4B-3  Block-based Watermarking Scheme using Inter-block Dependencies for Copyright Protection. Vik Tor Goh, Mohammad Umar Siddiqi. Multimedia University, Cyberjaya, Selangor, Malaysia; International Islamic University Malaysia, Kuala Lumpur, Malaysia. (p. 448)
TS4B-4  A Secure Approach for Data Hiding in Internet Control Message Protocol (ICMP). Ern Yu Lee, Hoon Jae Lee. Dongseo University, Pusan, Republic of Korea. (p. 454)
TS4B-5  MyKad Security: Threats & Countermeasures. Huo-Chong Ling, Raphael C.-W. Phan. Multimedia University, Cyberjaya, Selangor, Malaysia; Swinburne University of Technology (Sarawak Campus), Kuching, Sarawak, Malaysia. (p. 460)
TS4B-6  Certificateless Encryption Schemes Revisited. Wun-She Yap, Swee-Huay Heng, Bok-Min Goi. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 465)
TS4B-7  Analysis of an SVD-based Watermarking Scheme. Grace C-W Ting, Bok-Min Goi, Swee-Huay Heng. Swinburne University of Technology (Sarawak Campus), Kuching, Sarawak, Malaysia; Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 471)
17 NOVEMBER 2006 (FRIDAY)

09:00 – 10:40  Artificial Intelligence & Applications (TS1C)
TS1C-1  A New Approach to Implement an MT From Bangla to English. Md. Shahnur Azad Chowdhury, Ibrahim Bin Nurul Islam, Mohammad Mahadi Hassan. International Islamic University Chittagong, Bangladesh. (p. 58)
TS1C-2  Fuzzy Matching Of Bangla Words. Farhana Jahan, Mahmudul Faisal Al Ameen, Khondaker Abdullah-Al-Mamun. Ahsanullah University of Science and Technology, Dhaka, Bangladesh. (p. 64)
TS1C-3  Rectangle & Circle Based Geometric Characters of Bangla & English and An Efficient Relevant Method of Size Independent Character Recognition. Khondaker Abdullah-Al-Mamun, Md. Shakayet Hossain. IBAIS University, Dhaka, Bangladesh. (p. 70)
TS1C-4  Integration of Fuzzy Linear Regression and Response Surface Methodology for Intelligent Data Analysis. Hiaw San Kho, Chee Peng Lim, Abdul Aziz Zalina. Universiti Sains Malaysia, Nibong Tebal, Penang, Malaysia. (p. 77)
TS1C-5  Semantic Retrieval with Spreading Activation. Smitashree Choudhury, Somnuk Phon-Amnuaisuk. MMU, Cyberjaya, Selangor, Malaysia. (p. 83)
11:00 – 12:40  Artificial Intelligence & Applications (TS1D)
TS1D-1  Knowledge Retrieval and Ontological Question Answering: Survey, Discussion, Open Research Issues and Our Position. Goh Hui Ngo, Somnuk Phon-Amnuaisuk. MMU, Selangor, Malaysia. (p. 89)
TS1D-2  Planning in Uncertain Temporal Domain. Nor Azlinayati Abdul Manaf, Mohmmad Reza Beikzadeh. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 95)
TS1D-3  Improving Traffic Signal Control System Using Data Mining. Yeen-Kuan Wong, Wei-Lee Woon. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 99)
TS1D-4  Writer Identification and Verification: A Review. Siew Keng Chan, Yong Haur Tay, Christian Viard-Gaudin. Universiti Tunku Abdul Rahman, Petaling Jaya, Selangor Darul Ehsan, Malaysia. (p. 105)
14:30 – 16:00  Software Engineering (TS1E)
TS1E-1  Understanding Global Software Development. Badariah Solemon. UNITEN, Kajang, Malaysia. (p. 111)
TS1E-2  Implementing Software Process Assessment At Mid-Size Malaysian IT Company: The Establishment Of Artifacts. Shukor Sanim Bin Mohd Fauzi, Nuraminah Binti Ramli. Universiti Pendidikan Sultan Idris, Tg Malim, Perak, Malaysia; Universiti Teknologi Mara, Arau, Perlis, Malaysia. (p. 116)
TS1E-3  An Object-Oriented Approach towards User Interface Layout Management with Change Tracking System. Liang Ying Oh, Tong Ming Lim. Monash University Malaysia, Petaling Jaya, Selangor Darul Ehsan, Malaysia. (p. 120)
TS1E-4  Compromise Multi-Objective Fuzzy Linear Programming (CMOFLP): Modelling and Application. P. Vasant, B. Arijit, S. Sunanto, C. Kahraman, G. Dimirovski. Universiti Teknologi Petronas, Tronoh, Perak, Malaysia. (p. 126)
16:15 – 17:45  Theory & Method (TS1F)
TS1F-1  Hybrid Scheme for Glassy Ion Dynamics. Johan Sharif, Asri Ngadi. UTM Skudai, Malaysia; Swansea University, Swansea, United Kingdom. (p. 132)
TS1F-2  Optical Spectrum CDMA Double Weight Code Family using Equation-Based Code Construction. Ahmed Mohammed, Mohamad Naufal Mohamad Saad. Universiti Teknologi Petronas, Tronoh, Malaysia. (p. 136)
TS1F-3  Comparisons of the Effect of Different Discretization Method on the Results of Reducer's Accuracy. Ahmad Farhan, Jamilin Jais, Mohd. Hakim Haji Abd Hamid, Zarina Abd Rahman, Djamel Benouda. Uniten, Kajang, Selangor, Malaysia. (p. 140)
TS1F-4  A Memory-efficient Huffman Coding. Khondaker Abdullah-Al-Mamun, Mohammad Nurul Huda, M. Kaykobad, Md. Mostofa Akbar. University of Science and Technology, Dhaka, Bangladesh. (p. 145)
09:00 – 10:40  Telecommunications (TS2C)
TS2C-1  Supporting Quality of Services in Wireless LANs by EDCA Access Scheme. Frankie Chan Kok Liang, Maode Ma. Nanyang Technological University, Singapore. (p. 196)
TS2C-2  Aperture Coupled Microstrip Antenna Design Using Different Feed Width. Mohamad Kamal A Rahim, Z.W. Low, M.H. Jamaluddin, A. Asrokin, M.R. Ahmad. UTM, Skudai, Johor, Malaysia. (p. 201)
TS2C-3  Controlling Home Appliances Using GSM Mobile Communication System From Remote Distant. Md. Golam Rabiul Alam, Md. Mahabubul Hasan Masud, Md. Fakhrul Islam, Md. Monirul Islam. International Islamic University Chittagong, Bangladesh. (p. 206)
TS2C-4  Literature Study of Adaptive Modulation and Adaptive Beamforming Techniques for Mobile Transmissions. Nurul Nadia Ahmad, Mohamad Yusoff Alias, Siti Azlida Ibrahim. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 212)
11:00 – 12:40  Telecommunications (TS2D)
TS2D-1  Performance of Block Coding and Convolutional Coding Over Noisy Channel. Almas Uddin Ahmed, Hafizal Mohamad, Nor Azhar Mohd Arif. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 218)
TS2D-2  Cooperative Bandwidth Allocation Scheme. Noorsalwati Nordin, Mohamed Othman, Shamala Subramaniam. Universiti Putra Malaysia, Serdang, Selangor, Malaysia. (p. 222)
TS2D-3  Connectivity of Vehicular Ad Hoc Networks. Bhawani Selvaretnam, K. Daniel Wong. Multimedia University, Melaka, Malaysia. (p. 229)
14:30 – 16:00  Information Systems (TS2E)
TS2E-1  Multi Agent based Continuous Knowledge Audit. Subarmaniam Kannan, Emaliana Kasmuri, Peter Woods. Multimedia University, Bukit Beruang, Melaka, Malaysia. (p. 235)
TS2E-2  Semantic-based Support for Personalized Multimedia Presentation Authoring. Jayan C Kurian, Payam M Barnaghi, Michael Ian Hartley. University of Nottingham, Malaysia Campus, Selangor, Malaysia. (p. 241)
TS2E-3  Extending Conceptual ER to Multidimensional Model in Data Warehouse Design. Haw Su Cheng, Chua Sook Ling. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 247)
TS2E-4  Combining User Context Extraction and Data Fusion for Contextual Retrieval Precision Improvement. Fatimah Ahmad, Shyamala Doraisamy, Az Azrinudin Alidin. UPM, Serdang, Selangor, Malaysia. (p. 253)
16:20 – 18:00  Information Systems (TS2F)
TS2F-1  KNOWLEDGE-UI: A Knowledge-Driven User Interface Management System with Personalization Capability. Kei Fai Lai, Tong Ming Lim. Monash University Malaysia, Bandar Sunway, Selangor, Malaysia. (p. 257)
TS2F-2  Towards a framework for Bus Ticketing and Tracking System Using Interactive GIS Map. Jonathan Sidi, Syahrul N. Junaini, Goh Soon Kheah. University Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia. (p. 262)
TS2F-3  Performance Tests on Distributed Real-Time and Embedded Middleware. Boon Ping Lim, Ettikan Kandasamy Karuppiah, Boon Poah Ang, Yoke Khei Lam. Panasonic R&D Centre Malaysia Sdn Bhd, Cyberjaya, Selangor, Malaysia. (p. 268)
TS2F-4  A Framework of M-government Approach for Developing Countries: Perspective of SAARC Comprised Countries. Abdul Kadar Muhammad Masum, Md Delawer Hossain, Md. Golam Rabiul Alam. International Islamic University Chittagong, Bangladesh. (p. 274)
09:00 – 10:40  Multimedia (TS3C)
TS3C-1  Image Segmentation Using Wavelet Based Agglomerative Hierarchical Clustering Techniques. Chang Yun Fah, Ee Kok Siong. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 337)
TS3C-2  Feature Vector Selection Based On Freeman Chain Code For Shape Classification. Suzaimah Ramli, Mohd Marzuki Mustafa, Aini Hussain. UKM, Cheras, Kuala Lumpur, Malaysia. (p. 343)
TS3C-3  Weed Feature Extraction Using Gabor Wavelet and Fast Fourier Transform. Asnor Juraiza Ishak, Mohd. Marzuki Mustafa, Aini Hussain. Universiti Kebangsaan Malaysia, Bandar Baru Bangi, Selangor, Malaysia. (p. 347)
11:00 – 12:40  Multimedia (TS3D)
TS3D-1  Wavelet Packet Image Arithmetic Coding Gain Using Subspace Energy Feature Remapping. Matthew Teow, Sze Wei Lee, Ian Chai. Multimedia University, Cyberjaya, Malaysia. (p. 351)
TS3D-2  Colour Number Coding Scheme for Time-Varying Visualisation in Glassy Ion Trajectories. Muhammad Shafie Abd Latiff, Johan Mohamad Sharif, Md Asri Ngadi. UTM Skudai, Johor, Malaysia. (p. 357)
TS3D-3  Colourization Using Canny Optimization Technique. Jiin Taur Eng, Kok Swee Sim. Multimedia University, Melaka, Malaysia. (p. 362)
14:30 – 16:00  Computer Applications & Cybernetics (TS3E)
TS3E-1  CITRA: A Children's Innovative Tool for Fostering Creativity. Siew Pei Hwa. Universiti Tunku Abdul Rahman, Petaling Jaya, Selangor, Malaysia. (p. 367)
TS3E-2  An Efficient Technique of Decomposing Bangla Composite Letters. Riyad Ush Shahee, Mohammed Saiful Islam, Mohammad Shahnoor Saeed, Mohammad Mahadi Hassan, Mohammed Nizam Uddin. International Islamic University Chittagong, Bangladesh. (p. 374)
TS3E-3  Radio Frequency Identification (RFID) Application in Smart Payment System. Mohamad Izwan Bin Ayob, Danial Bin Md Nor, Mohd Helmy Bin Abd Wahab, Ayob Johari. Kolej Universiti Teknologi Tun Hussein Onn, Batu Pahat, Johor, Malaysia. (p. 379)
TS3E-4  Enhance Polyphonic Music Transcription with NMF. Sophea Seng, Somnuk Phon-Amnuaisuk. MMU, Cyberjaya, Selangor, Malaysia. (p. 384)
16:20 – 18:00  Computer Applications & Cybernetics (TS3F)
TS3F-1  RGB-H-CbCr Skin Colour Model for Human Face Detection. Nusirwan Anwar Abdul Rahman, Chong Wei Kit, John See. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 390)
TS3F-2  Scalable Pornographic Text Filtering System. Oi Mean Foong, Ahmad Izzudin Zainal Abidin, Suet Peng Yong, Harlina Mat Ali. Universiti Teknologi PETRONAS, Tronoh, Perak, Malaysia. (p. 396)
TS3F-3  Computer Based Instructional Design in Light of Constructivism Theory. Kamel H. A. R. Rahouma, Inas M. El-H. Mandour. Technical College in Riyadh, Riyadh, Saudi Arabia. (p. 402)
TS3F-4  The Role of Computer Based Instructional Media and Computer Graphics in Instructional Delivery. Kamel H. A. R. Rahouma, Inas M. El-H. Mandour. Technical College in Riyadh, Riyadh, Saudi Arabia. (p. 408)
09:00 – 10:40  Virtual Reality (TS4C)
Invited Talk: "Latest R&D trends in VR and AR in both extremes [high-end oriented and low-end oriented]". Simon Su (1), William R. Sherman (1), R. Bowen Loftin (2). (1) Desert Research Institute, Reno, USA. (2) Texas A&M University at Galveston, USA.
PANEL SESSION
11:00 – 12:40  Virtual Reality (TS4D)
TS4D-1  Invited Talk: "ECulture and Its EPossibilities: Immortalising Cultural Heritage through Digital Repository". Faridah Noor Mohd Noor. University of Malaya, Kuala Lumpur, Malaysia. (p. 497)
TS4D-2  Using Heuristic To Reduce Latency In A VR-Based Application. Yulita Hanum Iskandar, Abas Md. Said, M. Nordin Zakaria. Kolej Universiti Islam Antarabangsa Selangor; Universiti Teknologi Petronas, Tronoh, Perak Darul Ridzuan, Malaysia. (p. 480)
TS4D-3  Surface Reconstruction from the Point Clouds: A Study. Ng Kok Why. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 486)
TS4D-4  A Network Communications Architecture for Large Networked Virtual Environments. Yang-Wai Chow, Ronald Rose, Matthew Regan. Monash University, Australia. (p. 491)
TS4D-5  The Instructional Design and Development of NoviCad. Chwen Jen Chen, Chee Siong Teh. Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia. (p. 475)
14:30 – 16:00  Interdisciplinary Applications (TS4E)
TS4E-1  Understanding How User Perceptions Influence Learning Objects Use. Siong-Hoe Lau, Peter C. Woods. Multimedia University, Bukit Beruang, Melaka, Malaysia. (p. 502)
TS4E-2  Modeling of Patterns and Different Learning Styles for Pedagogical Framework Documentation. Ho Sin Ban, Ian Chai, Tan Chuie Hong. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 507)
TS4E-3  Design of Active Filter Bank for Measuring Thickness Frequency of P-Wave in Impact-Echo Testing of Concrete. Slamet Riyadi, Khairul Anuar Mohd. Nayan, Mohd. Marzuki Mustafa. Universitas Muhammadiyah Yogyakarta, Bantul, Yogyakarta, Indonesia; UKM, Malaysia. (p. 513)
TS4E-4  Embedding Flexible Manufacturing Systems (FMS) with TCP/IP. Ong Swee Sin, Ch'ng Yeow Chuan, Jonas Noto Santoso Djuanda. Multimedia University, Bukit Beruang, Melaka, Malaysia. (p. 519)
TS4E-5  Dynamic Appearance Model in SCORM-Compliant E-Learning. Ean-Teng Khor, Eng-Thiam Yeoh. Multimedia University, Cyberjaya, Selangor, Malaysia. (p. 523)
16:20 – 18:00  Hardware & Architecture (TS4F)
TS4F-1  A Low Voltage CMOS Bandgap Reference Circuit. Harikrishnan Ramiah, Tun Zainal Azni Zulkifli. Universiti Sains Malaysia, Nibong Tebal, Pulau Pinang, Malaysia. (p. 529)
TS4F-2  Effect of Short Noise and Emission Noise in Scanning Electron Microscope Image at Different Primary Energy. Sim K.S., Wee Mun Yee. Multimedia University, Cyberjaya, Malaysia. (p. 535)
TS4F-3  Auto Focus System using Mixed Dynamic Fast Fourier Transform (MDFFT) in Optical Microscope. Fang Jauw Hoo, Kok Swee Sim. Multimedia University, Melaka, Malaysia. (p. 539)
SESSION TS1A
TOPIC: ARTIFICIAL INTELLIGENCE & APPLICATIONS
SESSION CHAIRMAN: Dr. Somnuk Phon-Amnuaisuk

Time    Paper No.  Paper Title (Page No.)
2.00pm  TS1A-1  SVD In AHP With Embedded Genetic Algorithm And Neural Network For Mass Classification In Breast Cancer. Nur Jumaadzan Zaleha Mamat, Afzan binti Adam (Multimedia University, Cyberjaya, Malaysia). (p. 1)
2.20pm  TS1A-2  Numerical Evolutionary Optimization using G3-PCX with Multilevel Mutation. Mohd. Hanafi Ahmad Hijazi, Jason Teo (Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia). (p. 7)
2.40pm  TS1A-3  Comparing Between 3-Parents and 4-Parents for Differential Evolution. Nga Sing Teng, Jason Teo, Mohd. Hanafi (Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia). (p. 12)
3.00pm  TS1A-4  Analysis and Design of Crew Reassignment Module in Bus Crew Scheduling System Using GAIA Methodology. Nazri Abdullah, Abdul Samad Shibghatullah (KUTKM, Ayer Keroh, Melaka, Malaysia). (p. 17)
3.20pm  TS1A-5  InfoPruned Artificial Neural Network Tree (ANNT). S. Kalaiarasi Sonai Muthu Anbananthen, G. Sainarayanan, Ali Chekima, Jason Teo (University Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia). (p. 22)
M2USIC 2006
TS- 1A
SVD IN AHP WITH EMBEDDED GENETIC ALGORITHM AND NEURAL NETWORK FOR MASS CLASSIFICATION IN BREAST CANCER
Nur Jumaadzan Zaleha binti Mamat (1), Afzan binti Adam (2)
(1) Center of Mathematical Sciences, (2) Center of Artificial Intelligence and Intelligent Computing
Faculty of Information Technology, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor Darul Ehsan, Malaysia
[email protected] [email protected]
ABSTRACT
This paper presents a new idea for a classification and decision making system in the biomedical field, specifically for breast cancer. As early detection is crucial, an efficient and fast system that can determine whether a mass in the breast is cancerous, and if so its severity, is in dire need. Therefore, research to develop a new system to classify a mass in the breast has been started. The proposed system will have a Neural Network (NN) classifier with a Genetic Algorithm (GA) optimizer, seamlessly embedded within an SVD in AHP architecture. Weights for the criteria that determine the mass type are calculated using the GA, while the NN is used to produce pairwise relative importance comparison matrices for the mass types in terms of each criterion. The AHP then uses the SVD approach to calculate the ranking of mass types under each criterion. Lastly, all mass type weights are synthesized with the criteria weights to obtain the final decision about the type of mass.
Keywords: SVD in AHP, genetic algorithms, neural network, breast cancer
1.0 INTRODUCTION

Breast cancer is a leading fatal cancer for women in many countries, including Malaysia. In the year 2002, 30.4% of newly diagnosed cases were recorded, and the number increased to 31% in 2003, which is six times higher than the record in 1993 [1][2]. Alarmingly, it continues to rise, striking 1 in every 4 Malaysian women in 2004 [3]. These facts have driven us to select this deadly cancer as our domain. Breast cancer has four early signs: microcalcification, mass, architectural distortion and breast asymmetries. However, only data regarding mass will be used as a pilot project to test our system later on. Masses of 2 cm in diameter are palpable with regular breast self examination, while mammogram images can capture them from 5 mm in diameter. However, these images have to be determined by an expert radiologist who is familiar with breast cancer. About 10%-30% of visible abnormalities in mammogram images go undetected, due to human or technical error. These may lead to false-positive detections, which can cause unnecessary biopsies, and false-negative detections, which can be deadly. The procedures in detecting breast cancer are shown in Figure 1. Generally, there are 2 types of breast cancer: in situ and invasive. In situ starts in the milk duct and does not spread to other organs even if it grows. Invasive, on the contrary, is very aggressive and spreads to other nearby organs and destroys them as well. It is very important to detect the cancerous cells before they spread to other organs; the survival rate for the patient then increases to more than 97%. However, the time taken from taking the mammogram images to the biopsy result is 2 weeks to a month on average. Therefore, a faster computerized breast cancer diagnosing tool is crucial to save lives.

FIGURE 1: PROCEDURES IN DETECTING BREAST CANCER (flowchart: BSE / CBE / mammogram, then "any signs?", then biopsy, then "cancer?", then treatment procedures)
1.1 Singular Value Decomposition (SVD) in Analytic Hierarchy Process (AHP)

1.1.1 Analytic Hierarchy Process (AHP). The Analytic Hierarchy Process (AHP) was first introduced by T.L. Saaty in the early 70s to solve multiple criteria decision making (MCDM) problems, be they quantitative or qualitative [4]. Areas that have benefited from the AHP include education, social studies, the military, politics, economics and many more [4][5][6][7][8]. We have also seen wide usage of the AHP in the medical area in general, including breast cancer [9][10][11][12]. The three principles that make the AHP a comprehensive, logical and robust MCDM method are the analysis, decomposition and structuring of the decision problem, comparative judgements of the elements, and synthesis of the priorities of the elements. Firstly, the MCDM problem is analyzed, decomposed and structured into a hierarchy. At the top level of the hierarchy is the objective/goal of the problem. The multiple criteria that are influential in determining the overall final decision form the second level, with subcriteria as the third level and so on (if deemed necessary by the decision makers). At the bottom of the hierarchy are the decision alternatives. After all elements (criteria, subcriteria and decision alternatives) are determined, the decision makers then make comparative judgements of these elements. For example, in an MCDM problem with 3 levels (top level the objective, middle level the criteria, bottom level the decision alternatives), with respect to each criterion all decision alternatives are compared two at a time (pairwise) for their relative importance in the problem. In some cases, the decision makers will also need to make comparative judgements of all criteria with respect to the objective of the problem. All the pairwise comparisons are put into matrices, which are positive and reciprocal with value one on the diagonal entries [4]. As mentioned earlier, the AHP can also be applied to solving qualitative MCDM problems. This is done with the use of Saaty's scale of relative importance [4]. Once the pairwise comparison matrices are formed, they are synthesized to obtain the weights/priorities of all the elements. The original method proposed by Saaty to obtain the weights is the eigenvector method (EM), where the right principal eigenvector corresponding to the maximum eigenvalue of the pairwise comparison matrix is estimated. The EM method calculates the weights of all criteria (if they were not known), and the weights/relative importance of the decision alternatives in terms of each criterion. Consider a case with m criteria and n decision alternatives. Let entry a_ij in the matrix A represent the relative value of alternative i when it is considered in terms of criterion j (after applying the EM method). Therefore the best decision alternative is indicated by
max_i Σ_(j=1..m) w_j a_ij ,   for i = 1, 2, ..., n        (1)

where w_j is the weight of criterion j.
To ensure the 'unbiasedness' of the decision due to human comparative judgements, all pairwise comparison matrices are checked for their consistency. The consistency ratio (C.R.) is used for this purpose. Given an n×n matrix A, C.R. = C.I./R.I., where C.I. is the consistency index and R.I. is the random index. The C.I., also called the deviation from consistency, is given by C.I. = (λ_max − n)/(n − 1). The R.I. is the consistency index of a randomly generated reciprocal matrix from the scale 1 to 9, with reciprocals forced [4]. The C.R. is required to be less than or equal to 0.1; if not, the decision makers will need to re-judge the elements.
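As a small illustration of the eigenvector method and the consistency check described above (this is not part of the original paper; it is a minimal sketch assuming numpy, with a made-up 3×3 comparison matrix and the commonly quoted Saaty random indices):

```python
import numpy as np

# Commonly quoted Saaty random consistency indices (R.I.) for orders 1..10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]

def em_weights(A):
    """Eigenvector method: weights from the principal right eigenvector of A."""
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                  # index of the maximum eigenvalue
    w = np.abs(eigvecs[:, k].real)
    return w / w.sum(), eigvals[k].real          # normalised weights, lambda_max

def consistency_ratio(A):
    """C.R. = C.I. / R.I., with C.I. = (lambda_max - n) / (n - 1); needs n >= 3."""
    n = A.shape[0]
    _, lam_max = em_weights(A)
    ci = (lam_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n - 1]

# Example pairwise comparison matrix on Saaty's 1-9 scale (illustrative values).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
w, _ = em_weights(A)
print("weights:", w, "C.R.:", consistency_ratio(A))  # C.R. <= 0.1 means acceptable
```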
1.1.2. Singular Value Decomposition (SVD). In this paper, we choose to work with the AHP but using singular value decomposition (SVD) and the theory of low-rank approximation of a (pairwise comparison) matrix as the method for obtaining the weights of the elements, for a more justified result in the decision making process [13]. The SVD is a widely used method in many different areas such as face recognition, gene expression analysis and geophysical inversion. Some applications can be found in [15], [16] and [17]. The SVD of an n×m matrix is the decomposition of the matrix into the product of three special matrices. Consider the SVD of the pairwise comparison matrix, from Gass and Rapcsák [14].
Theorem 1.1. A positive, reciprocal n×n pairwise comparison matrix A = (a_ij) is consistent if and only if it can be written in the rank-one form

A = ( w_i / w_j ), i, j = 1, ..., n,  with w = (w_i) ∈ R^n_+ .        (2)

When used in the AHP, all the a_ij's are integers and belong to the interval [1,9] (according to Saaty's scale). To obtain the weights in SVD, the singular vectors belonging to the largest singular value of A are used, as stated in the following theorem.

Theorem 1.2. Let u and v be the left and right singular vectors of A corresponding to its largest singular value. Then the weight vector associated with the best rank-one approximation of A is given by

w_i = ( u_i + 1/v_i ) / Σ_(j=1..n) ( u_j + 1/v_j ),  i = 1, ..., n .        (3)
Readers are referred to [14] for proof of the theorems and consistency measurement of the pairwise comparison matrix.
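For readers who want to experiment, the weight extraction of (3) can be sketched in a few lines of numpy. This is only an illustration of the SVD route, not the authors' implementation, and the example matrix is made up:

```python
import numpy as np

def svd_weights(A):
    """Weights from the leading singular vectors of a pairwise comparison matrix A,
    following the rank-one structure a_ij ~ w_i/w_j of a consistent matrix.
    A sketch of the SVD-in-AHP idea; see Gass and Rapcsak [13][14] for details."""
    U, S, Vt = np.linalg.svd(A)
    u = np.abs(U[:, 0])          # left singular vector of the largest singular value
    v = np.abs(Vt[0, :])         # right singular vector of the largest singular value
    w = u + 1.0 / v              # for a consistent A, both u and 1/v are proportional to w
    return w / w.sum()

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
print(svd_weights(A))
```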
1.2 Genetic Algorithms (GA) with Neural Network (GAwNN)
GAwNN is a hybrid function combining the use of GA technology with a Backpropagation NN (BPNN) to classify a mass in a breast cancer dataset [19]. The model managed to reduce the diagnosis time as well as producing a high accuracy percentage in classifying a mass in the breast as either benign or malignant. It has the advantages of both the GA and NN techniques, where the GA actually helps the BPNN to learn faster and escape being trapped in local minima [22], [23], [24], [25]. The modules in the GAwNN prototype model are as shown in Figure 2 below [19].

FIGURE 2: MODULES IN GAwNN FRAMEWORK (clinical data regarding breast tissue from mammogram images, preprocess data, cleaned data, no. of attributes, GA model, initial weights calculated, NN model for data mining, classification result)

Ideally, setting the initial weights for the NN model will reduce the time needed for the mining process as it has a good starting point from which to begin the learning process [21]. In this NN classifier module, the Fletcher-Reeves conjugate gradient will be used as the training algorithm as it gave the best output with the shortest computation duration. It also has the smallest storage requirements of the conjugate gradient algorithms. The optimum initial weight for each layer in the NN architecture is calculated with the GA approach. The GA used in this method follows Khairuddin Omar [25], Tsoi [27] and Hush and Horne [28], and was initially used to find the best number of hidden nodes in the BPNN. To ensure that the weights generated using the sigmoid function remain small, the initial weights should not be too big and the initial input signal to the hidden or output nodes must be in the range of (−0.5, 0.5). The number of interconnected nodes in the NN architecture will be calculated and used as the number of optimum initial weights to be produced. The GA modules will each produce two output files, which contain the population generated and the optimum weights selected for each layer. Figure 3 shows the sub-modules of weight initialization in GAwNN.

FIGURE 3: SUB-MODULES IN INITIALIZING WEIGHTS FOR BPNN USING GA (two GA sub-modules initialize the weights of a BPNN with input layer x1–x9, hidden layer h1–h3 and output layer y1)

However, to incorporate GAwNN into the AHP with SVD architecture, a few amendments are needed to the original framework and modules. The changes needed are explained briefly in Section 3, the proposed system.
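To make the role of the GA in Figure 3 concrete, the sketch below evolves candidate initial weight vectors for a 9-3-1 BPNN and keeps the one with the lowest starting error. It is a hypothetical mock-up, not the GAwNN code: the fitness measure, the truncation selection, the Gaussian perturbation and the toy data are all assumptions; only the 9-3-1 layout and the small-initial-weight range follow the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 9-3-1 network matching Figure 3 (x1..x9 -> h1..h3 -> y1).
N_IN, N_HID, N_OUT = 9, 3, 1
N_WEIGHTS = N_IN * N_HID + N_HID * N_OUT        # biases omitted for brevity

def forward(weights, X):
    """Forward pass of the BPNN with sigmoid activations."""
    W1 = weights[:N_IN * N_HID].reshape(N_IN, N_HID)
    W2 = weights[N_IN * N_HID:].reshape(N_HID, N_OUT)
    h = 1.0 / (1.0 + np.exp(-X @ W1))
    return 1.0 / (1.0 + np.exp(-h @ W2))

def fitness(weights, X, y):
    """Smaller starting error = better initial weights for backpropagation."""
    return float(np.mean((forward(weights, X) - y) ** 2))

def ga_initial_weights(X, y, pop_size=20, generations=50):
    """Evolve a population of candidate initial weight vectors and return the best."""
    pop = rng.uniform(-0.5, 0.5, size=(pop_size, N_WEIGHTS))   # keep signals small
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]     # truncation selection
        children = parents + rng.normal(0.0, 0.05, size=parents.shape)  # mutation
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[int(np.argmin(scores))]

# Toy data standing in for the cleaned records (9 attributes, 1 class label).
X = rng.random((30, N_IN))
y = rng.integers(0, 2, size=(30, N_OUT)).astype(float)
print("best starting MSE:", fitness(ga_initial_weights(X, y), X, y))
```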
2.0 PROBLEM STATEMENT

As breast cancer can be very aggressive, only early detection can prevent mortality. The proposed system is intended to eliminate the unnecessary waiting time as well as to reduce the human and technical errors in diagnosing breast cancer. Furthermore, there are no expert radiologists who can confidently classify whether a mass is cancerous, much less determine its type, from mammogram images alone. In addition, existing classification systems are still in the testing and training stages. Hence, new paradigms and tools are still welcomed. Even though GAwNN gives faster results and a high accuracy percentage in classifying a mass in breast cancer [20], the model can only classify it as benign or malignant. This research aims to enhance the result by further classifying the mass record as circumscribed (CIRC), speculated or satellite (SPEC), lobulated (LOB) or ill-defined (ILL) [29]. As the AHP can incorporate any weight calculation method and has not yet been tested for classifying types of breast cancer, GAwNN will be adjusted and embedded into its architecture. Furthermore, both the AHP and GA have been widely used and accepted in many real applications in medicine, the military, portfolio selection, agriculture, fisheries and many more. For this research, data are taken from a single source, namely Dr William H. Wolberg of the University of Wisconsin Hospitals [30]. The dataset is a record of masses in the breast from mammogram images, with the values entered by an expert radiologist based on the mammogram images. A preliminary examination of the chosen dataset is compulsory before the cleaning process takes place. The dataset consists of 699 records, with 65.5% classified as benign and 34.5% as malignant. As medical data are best not tampered with [20], all records with missing values are eliminated. There are nine criteria recorded for masses in this dataset: clump thickness (clump), uniformity of cell size (ucsiz), uniformity of cell shape (ucsha), marginal adhesion (marga), single epithelial cell size (secsi), bare nuclei (baren), bland chromatin (bland), normal nucleoli (normn) and mitoses (mitos), with one class attribute that is either benign or malignant.

3.0 PROPOSED SYSTEM

Hitherto, the usage of the AHP in breast cancer studies has mostly been on deciding the best treatment for breast cancer [11][18], which means the patient has already been diagnosed with the disease. However, these treatment decisions, though the best following the diagnosis, may not be of any help if the patient was wrongly diagnosed in the first place. Therefore we propose to incorporate GAwNN into the AHP with SVD for classifying a mass into its type. The proposed hierarchical problem is shown in Figure 4.

FIGURE 4: THE ANALYTIC HIERARCHY OF IDENTIFYING AND CLASSIFYING A MASS IN BREAST (level 1: the objective, to classify a mass in the breast as either non-cancerous or, if cancerous, its type; level 2: the criteria clump, ucsiz, ucsha, marga, secsi, baren, bland, normn and mitos; level 3: the decision alternatives CIRC, SPEC, LOB and ILL)

From Figure 4, we can see that at the top level, or level 1, lies the main objective of the problem, i.e. to classify a mass in the breast into its specific type. The second level of the hierarchy comprises all the criteria that contribute to the identification of the mass, that is, the attributes of the dataset. The last/bottom level holds the alternative decisions on the classes of the mass. In order to embed GAwNN into the AHP with SVD, the hierarchical problem has to be separated into two sub-modules: finding the optimum weights in the first layer, and obtaining the pairwise comparison matrices of the decision alternatives in the second layer. Thus, the GA will be used at the second level, that is, to determine the optimum weights for all nine criteria of the mass. The chromosomes needed in the GA are represented using binary representations and the population for each generation is fixed at 10. The regeneration process will stop when the fitness value of each possible solution is in the range of (0.09, 0.12) or the GA module has generated a maximum of 200 generations. By the end of the simulation process, nine optimum weights will be selected based on their fitness values to represent the mass criteria. For level 3, NN analysis will be used as a data mining tool for knowledge discovery in the dataset, not as a classifier as in the original GAwNN prototype. This is to find the relative importance of the attributes that contribute to classifying masses into their types. As a knowledge discovery tool, the NN will be able to learn hidden patterns or relationships of attributes from the dataset. For example, if the relationship between attribute 'ucsha' and mass type 'circ' is very strong when compared to type 'spec', the pairwise comparison values for the two types will be 9 and 1/9 reciprocally (i.e. in the pairwise comparison matrix, the entry for ('circ', 'spec') is 9 and the entry for ('spec', 'circ') is 1/9). A sketch of this construction is given below.
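A minimal sketch of that mapping (the strength values are hypothetical, and the rounding and clipping rule is an assumption for illustration, not the authors' procedure) could look like this:

```python
import numpy as np

def pairwise_matrix(strengths):
    """Build a reciprocal pairwise comparison matrix from per-class relationship
    strengths for one criterion, mapping strength ratios onto Saaty's 1-9 scale."""
    n = len(strengths)
    A = np.ones((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            ratio = strengths[i] / strengths[j]
            if ratio >= 1:
                A[i, j] = np.clip(round(ratio), 1, 9)
            else:
                A[i, j] = 1.0 / np.clip(round(1.0 / ratio), 1, 9)
    return A

# Illustrative strengths of the 'ucsha' attribute for CIRC, SPEC, LOB and ILL.
strengths = np.array([0.9, 0.1, 0.4, 0.3])
print(pairwise_matrix(strengths))
```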
Therefore, the pairwise comparison matrices of the decision alternatives with respect to each criterion can be obtained. The radiologist's/doctor's pairwise judgements are not used here as we are trying to eliminate the 10-30% tendency of undetected abnormalities in mammogram readings. In addition, the values generated by the NN module are hopefully more reliable as it tries to find any hidden relationship between the attributes and the mass types. After all pairwise comparison matrices are obtained, we apply the SVD approach to obtain the weights/priorities of the decision alternatives (classes of mass) with respect to each criterion (mass attributes). The pairwise comparison matrices will also be checked for consistency. We then calculate the overall ranking of the mass classes using (1) and conclude our decision. The summary of the system is given in Figure 5.

Level 1 - Objective: to identify and classify the mass.
Level 2 - Use GA to obtain the weights/importance of the criteria (mass attributes) in identifying the mass class.
Level 3 - Use NN to obtain the pairwise comparison matrices for the decision alternatives (mass classes/types). Apply the SVD approach to obtain the weights/rankings of the mass types with respect to each criterion. Check the consistency of the pairwise comparison matrices. Use (1) to determine the overall ranking of the mass types.
FIGURE 5: THE SUMMARY OF THE PROPOSED SYSTEM IN CLASSIFYING MASS
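The final ranking step of the summary, i.e. the synthesis of equation (1), can be illustrated as follows. The criterion weights and per-criterion class weights shown are random placeholders standing in for the GA and SVD outputs:

```python
import numpy as np

CLASSES = ["CIRC", "SPEC", "LOB", "ILL"]

def rank_mass_types(criterion_weights, class_weights):
    """Equation (1): overall score of class i = sum over j of w_j * a_ij, where
    class_weights[i, j] is the SVD-derived weight of class i under criterion j
    and criterion_weights[j] is the GA-derived weight of criterion j."""
    scores = class_weights @ criterion_weights        # shape: (n_classes,)
    order = np.argsort(scores)[::-1]                  # best class first
    return [(CLASSES[i], float(scores[i])) for i in order]

# Placeholder values: 9 criterion weights and a 4x9 matrix of class weights.
rng = np.random.default_rng(1)
w_criteria = rng.random(9)
w_criteria /= w_criteria.sum()
w_classes = rng.random((4, 9))
w_classes /= w_classes.sum(axis=0, keepdims=True)     # each column sums to 1

print(rank_mass_types(w_criteria, w_classes))
```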
4.0 CONCLUSION

Based on past research and real applications to date, the many existing methods and approaches are still not enough to correctly identify the type of mass. With the wide, proven and effective application of the AHP in solving multi criteria decision making models by both academicians and practitioners, it is hoped that this approach can help in deciding the mass classes, given the many available criteria. After reviewing several classification models for classifying breast cancer [21], as well as other related materials, we have come up with the idea of combining these two techniques in our system. The new hybrid system is expected to perform better than any existing NN classifier model. As the trend of intelligent systems is geared towards the integration of several AI approaches [26], this proposed system would not only follow the trend, but hopefully lead the way in medical decision making modelling.
Only the values with the best or highest accuracy rates will be reused in our new model. Therefore, a hybrid function using GAwNN and the AHP with SVD will be further tested in this research to solve some of the stated problems.
REFERENCE [1] Ed: Lim, G.C.C., Halimah, Y., Lim, T.O., The First Report of The National Cancer Registry: Cancer Incidence in Malaysia 2002. Malaysia: National Cancer Registry, Ministry of Health Malaysia, July 2003. [2] Ed: G.C.C. Lim., Y. Halimah., Second Report of The National Cancer Registry: Cancer Incidence in Malaysia 2003. Malaysia: National Cancer Registry, Ministry of Health Malaysia, Dec 2004. [3] http://www.radiologymalaysia.org College of Radiology, Academy of Medicine of Malaysia, [2006, June 5] [4] T.L. Saaty, The Analytic Hierarchy Process, NY: McGraw-Hill, 1980. [5] T.L.Saaty, Fundamentals of decision making and priority theory with the AHP. Pittsburgh, PA: RWS Publications, 1994. [6] P.R. Drake, "Using the analytic hierarchy process in engineering education", International Journal of Engineering Education, Vol. 14, No. 3, pp 191-196, 1998. [7] S. Petrovic-Lazarevic, “Personnel selection fuzzy model”, International Transactions in Operational Research 8, pp 89-105, 2001. [8] M. Punniyamoorthy, and P.V. Ragavan, “A strategic decision model for the justification of technology selection", The International Journal of Advance Manufacturing Technology, Vol. 21, pp 72-78, 2003. [9] E.B. Sloane, “Using a decision support system for healthcare technology assessments: Applying the analytic hierarchy process to improve the quality of capital equipment procurement decision”, IEEE Engineering in Medicine and Biology, vol.23, no. 3, pp 42-55, May/June 2004. [10] E.B. Sloane, M.J. Liberatore, R.L. Nydick, W. Luo, and Q.B. Chung, “Using the analytic hierarchy process as a clinical engineering tool to facilitate an iterative, multidisciplinary, microeconomic health technology assessment”, Journal of Computers and Operations Research, vol. 30, pp 1447-1465, 2003. [11] K.J. Carter, N.P. Ritchey, F. Castro, L.P. Caccamo, E. Kessler, and B.A. Erickson, “Analysis of three decision-making methods: A breast cancer patient as a model”, Medical Decision Making, vol. 19, no. 1, pp 49-57, Jan-Mar 1999. [12] M. Hokey, A. Mitra, and S. Oswald, “Competitive benchmarking of health care quality using the analytic hierarchy process: an example from Korean cancer
clinics”, Socio-Economics Planning Sciences, vol. 31, issue 2, pp 147-159, June 1997. [13] S.I. Gass, and T. Rapcsák, “Singular value decomposition in AHP”, European Journal of Operational Research 154, pp 573-584, 2004. [14] S.I. Gass, and T. Rapcsák, “A note on synthesizing group decisions”, Decision Support Systems vol 22, pp 59-63, 1998. [15] M.A. Porter, P.J. Mucha, M.E.J. Newman, and C.M. Warmbrand, “A network analysis of committees in the U.S. House of Representatives”, Proceedings of the National Academy of Sciences, vol. 102, 2005, pp 70577067. [16] S.L.M. Freire, and T.J. Ulrych, “Application of singular value decomposition to vertical seismic plotting”, Geophysics, 58, pp 778-785, 1998. [17] P. Neocleus, SVD methods applied to wire antennae. 2003. [Online] Available: http://www.ipam.ucla.edu/publications/invla/invla_4483.p pt [2005, December 9] [18] C. Reibnitz, and M. Silva Saavedra, “Decision analysis in cancer therapy, A new way for evaluation outcomes with the analytic hierarchy process (AHP)”, Value In Health – The Journal of The International Society for Pharmacoeconomics and Outcomes Research, vol. 2, no. 1, January 1999. [19] Afzan, A., Khairuddin, O., “Mining Clinical Data: Supervised Learning with Genetic Algorithms for Mass Classification in Breast Cancer Dataset”, Proceedings of the Seminar Siswazah FTSM 2005, Bangi, Malaysia. [20] Afzan, A., Genetic Algorithm with Back propagation Neural Network for Classifying Mass in Breast, Masters thesis, Universiti Kebangsaan Malaysia, 2006. [21] Afzan, A., Khairuddin, O. 2006. “Critical analysis on various network models in predicting breast cancer”, Proceedings of the International Conference on Computer and Communication Engineering, Kuala Lumpur, Malaysia. Vol: 1, pp 37-41. [22] S.J. Marshall., & R.F. Harrison, “Optimization and Training on Feed-forward Neural Networks by Genetic Algorithms”, IEEE 2nd International Conference on Artificial Neural Network, 1991, pp 39-43. [23] Burke, H.B., Goodman, P.H., Rosen, D.B., Henson, D.E., Weinstein, J.N., Harrel, F.E.; Marks, J.R., Winchester, D.B., Bostwick, D.G. 1997. “Artificial Neural network improve the accuracy the accuracy of cancer survival prediction”, Cancer, vol. 79, pp 857-862. [24] M. McInerney, and A.P. Dhawan., “Use of Genetic Algorithms with Backpropagation in Training of Feedforward Neural Networks”, Proceeding of IEEE International Conference on Neural Network, Vol 1, No 3, 28 March-1 April 1993, pp 203-208. [25] Khairuddin Omar., Pengawalan Pemberat untuk Perambat balik menggunakan Algoritma Genetik: Hasil Ujikaji, Laporan Teknikal Perkembangan Penyelidikan PhD Thesis, Universiti Putra Malaysia, 2000. T
[26] S. Fadzilah, M.A. Azliza, “Web-Based Neurofuzzy Classification for Breast Cancer”, Proceedings of the Second International Conference on Artificial Intelligence in Engineering & Technology, Sabah, Malaysia. 2004, pp 383-387. [27] A.C. Tsoi, “Constructive Algorithms. A Course on Artificial Neural Networks”, Jointly Organised by MIMOS & Computer Centre, University of Malaya, 4-8 Jul 1994. [28] D.R. Hush., and B.G. Horne., “Progress in Supervised Neural Networks”, IEEE Signal Processing Magazine, January 1993, pp 1-4. [29] C. Ioanna, , D. Evalgelos, K. George, “Fast Detection of Masses in Computer aided Mammography”, IEEE Signal Processing Magazine, January 2000, pp 54-64. [30] UCI Machine Learning (no date), Wisconsin Breast Cancer Databases , (online) http://www.ics.uci.edu/~mlearn/MLSummary.html
M2USIC 2006
TS- 1A
Numerical Evolutionary Optimization using G3-PCX with Multilevel Mutation
Jason Teo (1), Mohd. Hanafi Ahmad Hijazi (2)
(1) Artificial Intelligence Research Centre, School of Engineering and Information Technology, Universiti Malaysia Sabah, Locked Bag No. 2073, 88999 Kota Kinabalu, Sabah, Malaysia. Tel: +6-088-320000 ext. 3104, Fax: +6-088-320348, E-mail: [email protected]
(2) Artificial Intelligence Research Centre, School of Engineering and Information Technology, Universiti Malaysia Sabah, Locked Bag No. 2073, 88999 Kota Kinabalu, Sabah, Malaysia. Tel: +6-088-320000 ext. 3137, Fax: +6-088-320348, E-mail:
[email protected] GAs, the chromosome is a vector of floating point numbers whose length is kept the same as the number of variables to be optimized in the problem, thus directly representing a trial solution to the problem. Such GAs based on real number representations are commonly referred to as realcoded genetic algorithms (RCGAs) [4, 7].
Abstract The objective of this paper is to implement a multi-pronged strategy for generating diversity using non-adaptive, adaptive as well as self-adaptive methods for controlling mutation operations in a real-coded genetic algorithm (RCGA). Currently, one of the state-of-the-art RCGAs for function optimization is called the G3-PCX algorithm. However, its performance for solving multimodal problems is known to be poor compared with its performance for unimodal problems. In G3-PCX, the main problem primarily stems from premature convergence to local rather than global optima due to lack of explorative capabilities of the algorithm. As the G3-PCX algorithm relies completely on crossover for promoting diversity, this paper proposed a multilevel mutation operator to augment the algorithm’s capability of escaping local optima. The proposed algorithm is called G3M2 (G3-PCX with Multilevel Mutation) and empirical tests on four benchmark multimodal test functions have shown highly competitive performance. In three of the four problems, G3M2 dramatically outperformed the standard G3-PCX algorithm in terms of solution quality. Thus, the multilevel combination of non-adaptive, adaptive and self-adaptive into a single paradigm is empirically shown to have beneficial effects for enhancing the effectiveness of the G3PCX algorithm for solving multimodal optimization problems in terms of solution quality.
In canonical evolutionary optimization paradigms, fixed rates and settings are typically used for parameters such as crossover, mutation and population size [1,5,16]. However, advanced evolutionary algorithms typically now incorporate some form of adaptive or self-adaptive mechanisms in order to reduce the number of evolutionary parameters that need to be set prior to running the actual optimization process. In terms eliminating crossover and mutation rates as userdefined parameters, significant progress has been made through adaptation and self-adaptation of crossover and mutation rates as well as population size [5]. In fact, it has been found to actually improve the quality of solutions resulting from the search process [1,5].
Background For function optimization problems in continuous search spaces, one of the main difficulties currently faced is that of locating high quality solutions. In other words, solvers for these problems must be able to obtain solutions with a high degree of precision [10]. This problem is particularly pertinent for continuous multimodal problems where the quality rather than computational efficiency is more important as a test of the solver’s ability to escape local optima and finding solutions near the global optimum [17]. Moreover, this difficulty is further compounded when the function involves large numbers of variables, which translates into a highly deceptive fitness landscape with very large numbers of local optima [9].
Keywords: Artificial Intelligence, Global Optimization, Self-adaptation, Adaptation, G3-PCX, Real-Coded Genetic Algorithms.
Encoding solutions based on real numbers offers the advantage of defining a large variety of specialized real-coded crossover operators that are able to take advantage of the inherent numerical characteristics of the representation. Hence, many different versions of real-coded crossover operators exist for RCGAs. In particular, the blend crossover (BLX) operator [6], the simulated binary crossover (SBX) [6], the unimodal normal distribution crossover (UNDX) operator [11], the simplex crossover (SPX) operator [8] and the parent-centric crossover (PCX) operator [3] have been studied and used
Introduction
There is currently a strong interest in the application of evolutionary algorithms for solving real-world optimization problems [1]. Consequently, a large number of recent studies in genetic algorithms (GAs) have focused on the use of real number encoding for solving continuous functions, particularly those with large numbers of variables. In these GAs, the chromosome is a vector of floating point numbers whose length is kept the same as the number of variables to be optimized in the problem, thus directly representing a trial solution to the problem. Such GAs based on real number representations are commonly referred to as real-coded genetic algorithms (RCGAs) [4, 7].
7
M2USIC 2006
TS- 1A
extensively. The common theme among these different crossover methodologies is the generation of new offspring that are primarily parent-centered, i.e. a probability distribution of offspring is defined based on some measure of distance among the parents. Further detailed information regarding real-parameter GA recombination operators can be found in [4, 7].
The mutation operator in GAs mainly serves to create random diversity in a population of solutions [15]. Hence, it is hoped that the augmentation of G3-PCX with the proposed multilevel mutation will enhance its performance on multimodal problems.
As such, a great majority of RCGAs tend to utilize crossover operators which use some form of arithmetic recombination, which in general involves the creation of a new gene i for an offspring z arising from parents x and y according to the formula z_i = α·x_i + (1 − α)·y_i for some α
G3-PCX Algorithm with Multilevel Mutation (G3M2)
The G3-PCX algorithm is one such parent-based RCGA, which uses the PCX crossover operator without any mutation operator. Although proving highly successful and very efficient for solving continuous unimodal optimization problems, it performed less desirably on the highly deceptive fitness landscapes found in large scale multimodal problems with large numbers of local optima [3]. In the next section, the G3-PCX algorithm is proposed to be augmented through a multilevel mutation operator that combines both adaptive and non-adaptive mutation. The adaptive mutation operation is activated only when the algorithm is detected to be prematurely converging to a local optimum, which is measured using the parameter ε, and is applied only to the parents. The non-adaptive mutation operation is always active since it is used on every new offspring generated, and is applied only to offspring. A flag (represented by an integer from 0 to 2), which indicates whether the adaptive (0), non-adaptive (1), or both (2) mutation operations should be applied, represents the self-adaptive portion of the algorithm. This flag is encoded into the chromosome and evolved during the course of the evolutionary optimization process. Details of the proposed G3M2 algorithm are presented below.
in [0,1]. Although new genetic material can be created, there is the disadvantage that the range of values is reduced as a result of this averaging process [5]. However, it has been reported in the recent past that RCGAs utilizing some of these parent-based crossover operators exhibit self-adaptive search properties similar to those of evolution strategies and evolutionary programming [4]. Based on these findings, it was argued that, depending on the current diversity of the population, these RCGAs self-determine whether exploitation or exploration of the search space will be carried out without requiring an external adaptive control mechanism. Consequently, the use of the mutation operator is foregone in favor of these self-adaptive crossover operators, which alone can automatically introduce arbitrary diversity in the offspring population when necessary. One of the most computationally efficient real-coded genetic algorithms (RCGAs) for solving function optimization problems is the G3-PCX algorithm, which was developed by Deb et al. [3] in 2002. In this algorithm, a parent-centric crossover operator known as PCX is used as the sole mechanism for generating population diversity as the evolutionary algorithm explores the search space of the optimization problem. It was claimed by the authors of G3-PCX that such recombination procedures are sufficient since they are able to generate arbitrary diversity in the offspring population. Hence, the G3-PCX algorithm does not utilize any mutation procedures and relies solely on the PCX crossover operator. This algorithm was shown to be highly successful in solving unimodal problems compared to other RCGAs, including well-known evolutionary algorithms such as Differential Evolution [12] and Evolution Strategies [13], as well as the commonly used classical quasi-Newton optimization method [14], where up to an order of magnitude speedup in computational efficiency was achieved in some cases.
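The generic arithmetic recombination formula given above can be illustrated with a minimal Python sketch. This is only an illustration of that formula (the helper name, the NumPy generator and the example parent vectors are our own assumptions for the example); it is not the PCX operator used by G3-PCX.

import numpy as np

def arithmetic_crossover(x, y, alpha=None, rng=np.random.default_rng()):
    # whole arithmetic recombination: z_i = alpha * x_i + (1 - alpha) * y_i
    if alpha is None:
        alpha = rng.uniform(0.0, 1.0)   # alpha drawn from [0, 1]
    return alpha * np.asarray(x) + (1.0 - alpha) * np.asarray(y)

# example: recombining two real-coded parent chromosomes
parent_x = np.array([1.0, -2.0, 0.5])
parent_y = np.array([3.0,  0.0, 2.5])
print(arithmetic_crossover(parent_x, parent_y, alpha=0.25))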
To-date, G3-PCX remains one of the most competitive stochastic solvers, where its performance in terms of convergence speed and solution accuracy remains unmatched for solving high-dimensional unimodal problems. However, it was also reported that the G3-PCX algorithm was much less successful in locating high quality solutions for multimodal problems, and the authors highlighted that a new approach which can overcome this shortcoming would be greatly beneficial [3]. As such, the primary motivation for this study is drawn directly from the challenge offered by the original authors of G3-PCX to improve the performance of the algorithm for solving complex continuous multimodal problems.

1. From the population P(t), select the base parent and µ−1 other parents randomly.
2. Crossover: Generate λ offspring from the chosen parents using the PCX crossover scheme:
   a. Calculate the mean vector g of the µ chosen parents.
   b. Select one parent x^(p) for each offspring with equal probability.
   c. If the base parent's Self-Adaptive Mutation flag = 0 or 2, then activate Adaptive Mutation:
      i. If the distance between x^(p) and g is less than ε, then apply Gaussian mutation N(0,1) to the index parent with some probability Uniform(0,1).
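The adaptive-mutation step listed above can be sketched as follows in Python. This is a minimal illustration under stated assumptions (the values of eps and sigma, the random generator and the toy parent population are hypothetical, and the PCX crossover itself is omitted); it is not the authors' implementation of G3M2.

import numpy as np

def adaptive_mutation(parents, base_flag, g, eps=1e-3, sigma=1.0,
                      rng=np.random.default_rng(0)):
    # If the parents have collapsed towards the mean vector g (distance < eps),
    # perturb them with Gaussian noise N(0, sigma^2) "with some probability
    # Uniform(0,1)", as in step c-i above.
    mutated = parents.copy()
    if base_flag in (0, 2):                       # Self-Adaptive Mutation flag
        for i, x in enumerate(mutated):
            p = rng.uniform(0.0, 1.0)             # probability drawn from U(0,1)
            if np.linalg.norm(x - g) < eps and rng.random() < p:
                mutated[i] = x + rng.normal(0.0, sigma, size=x.shape)
    return mutated

# toy usage: three 5-dimensional parents that have nearly converged
rng = np.random.default_rng(1)
parents = np.ones((3, 5)) + 1e-5 * rng.normal(size=(3, 5))
g = parents.mean(axis=0)                          # mean vector of chosen parents
print(adaptive_mutation(parents, base_flag=0, g=g))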
where ∆θ is the opening angle of the pair (Na, Nb) in the old co-ordinates, chosen so as to let Na act as the X axis of the new co-ordinates. Hence, the overlapping is calculated by projecting the segment Nb onto the new co-ordinate (X axis).

Figure 3: Projection of segment Nb on Na. (Figure labels: Na (θa), Nb (θb), ∆θb-a, xa1, xa2, xb2.)

Figure 3 above shows the overlapping of the projection of Nb [xb1, xb2] on Na [xa1, xa2], where xa1 < xa2 and xb1 < xb2. We propose 6 types of spatial relationship per axis to represent the separation or overlapping between 2 segments in that axis. The separating or overlapping distances between two segments in the x-axis are illustrated in the figure below.

We use a linear distribution for those 4 fuzzy linguistic terms. Figure 4 also shows us that the overlapping properties of ε1, ε2 and ε3 reflect those of ε4, ε5 and ε6. To begin, we describe the overlapping types called ε2 and ε5. These two classes represent a total separation of each primitive in a pair of segments that are projected onto a new axis of reference. The lower limit of these classes is when the primitives in this pair touch each other. When two segments overlap, these classes represent the relative length of the fraction of the projection (of the segment Na) that is not overlapped. For the fuzzy membership function of ε2, the projection of Na is situated on the left of Nb. This function is defined as:

ε2 = 1 − ε1                     if (xb1 ≥ xa2)
   = (xb1 − xa1) / amax         if (xb1 < xa2) ∧ (xb1 > xa1)        (5)
   = 0                          else

On the other hand, for the fuzzy membership function of ε5, the segment Nb is situated on the left of the projection of Na. Thus, its membership function is defined as:

ε5 = 1 − ε6                     if (xa1 ≥ xb2)
   = (xa2 − xb2) / amax         if (xa1 < xb2) ∧ (xa2 > xb2)        (6)
   = 0                          else

where amax = (xa2 − xa1) and bmax = (xb2 − xb1) represent the total length of segment Na and the projection length of Nb on the new x-axis that is parallel with Na. The first condition in (5) and (6) represents a total separation of those projections. The second condition shows the relative length of the segment of reference (Na) that is not overlapped. The function returns zero elsewhere.
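The projection step described above (letting Na act as the X axis and projecting Nb onto it) can be sketched in Python as follows. The function name and the toy segments are assumptions made for the example only, not part of the original system.

import numpy as np

def project_onto_segment_axis(Na, Nb):
    # Project segment Nb onto the axis defined by segment Na, so that Na
    # becomes the X axis of the new co-ordinate frame (see Figure 3).
    # Na and Nb are 2x2 arrays holding the two end points of each segment.
    Na, Nb = np.asarray(Na, float), np.asarray(Nb, float)
    origin = Na[0]
    axis = Na[1] - Na[0]
    axis = axis / np.linalg.norm(axis)        # unit vector along Na
    xa = np.sort((Na - origin) @ axis)        # [xa1, xa2], with xa1 = 0
    xb = np.sort((Nb - origin) @ axis)        # [xb1, xb2]
    return xa, xb

# toy example: Na lies on the x axis, Nb is a slanted segment beside it
xa, xb = project_onto_segment_axis([[0, 0], [4, 0]], [[2, 1], [6, 3]])
print(xa, xb)   # projected intervals used by the membership functions (5)-(6)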
317
M2USIC 2006
TS- 3B
For the fuzzy membership functions of ε3 and ε4, a linear distribution is assured by integrating a relative distance of the projections' submergence, named ε', together with the distance between the centers of the two overlapping projections. We divide the computation of the relative distance ε' into four situations of overlapping:
1. The first situation consists of a partial overlapping, where one of the extreme points of every projection stays outside the overlap. During this type of overlapping, the membership function ε' represents the ratio between the length of the overlapped projection and the length of the segment used as the axis of reference, Na. In this situation, the segment Na is situated on the left of segment Nb.
2. The second situation is described by a projected segment that submerges totally in the segment of reference, Na. This happens when the projection of Nb is smaller than Na.
3. The third situation is the opposite of the one described in the second. The projection of Nb is longer than Na.
4. The fourth situation is the complement of the first one. It represents the same ratio as explained in the first situation above, i.e. the relative un-overlapped length of segment Na, when segment Na is situated on the right of segment Nb.
For the functions ε3 and ε4, these 4 situations are represented respectively by:

ε' = (xa2 − xb1) / amax     if (xa2 > xb1) ∧ (xa2 < xb2) ∧ (xa1 < xb1)
   = 1/2                    if (xa2 > xb2) ∧ (xa1 < xb1)
   = 1/4                    if (xb2 > xa2) ∧ (xb1 < xa1)                  (7)
   = (xb2 − xa1) / amax     if (xa1 < xb2) ∧ (xa2 > xb2) ∧ (xa1 > xb1)
   = 0                      else

After considering all possibilities of overlapping, we define the 2 central membership functions, ε3 and ε4, by multiplying the relative submergence, ε', with the relative distance of the centers of the 2 overlapping projections. The relative distance of those centers is defined by:

rcentre = 2·|mb − ma| / (amax + bmax)        (8)

Hence, our middle membership functions are given by:

ε3 = ε'·(rcentre + 1)        (9)
ε4 = ε'·(1 − rcentre)        (10)

Last but not least, we introduce our last 2 marginal membership functions in order to differentiate between a limit of separation (ε2 and ε5) and a total separation within a restricted window of the descriptor (ε1 and ε6). Thus, our marginal membership functions are defined by:

ε1 = (xb1 − xa2) / DistMaxSep     if xb1 > xa2        (11)
   = 0                            else

ε6 = (xa1 − xb2) / DistMaxSep     if xa1 > xb2        (12)
   = 0                            else

where DistMaxSep = window width − bmax − amax. Similarly, we also have another 6 fuzzy classes that represent the adjacency properties of 2 segments in our new y-axis.

3.3. Reconstruction of spatial descriptor.

At the end of the previous processes, we have the adjacency properties between segment Na and segment Nb in the 2-dimensional plane, which can be obtained by synthesizing the way they overlap on the new relative x-axis (segment Na) and on its relative y-axis, normal to segment Na:

ε(ij)(Na, Nb) = εi(xa1, xa2, xb1, xb2) × εj(ya1, ya2, yb1, yb2)        (13)

Here, our spatial descriptor for each pair of segments is defined as:

Dsk,(ij)(Na, Nb) = µk(∆θ) × ε(ij)(Na, Nb)        (14)

As we have 4 classes for the opening angle µk(∆θ), 6 classes for εi(xa1, xa2, xb1, xb2) and 6 more classes for εj(ya1, ya2, yb1, yb2), the descriptor is encoded in a 4 x 6 x 6 matrix:
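A minimal Python sketch of the membership functions (5)-(12) and of the 4 x 6 x 6 descriptor of (13)-(14) is given below. The window width, the opening-angle memberships mu_k and the use of interval midpoints for the projection centres are assumptions made for the example only; this is not the authors' Matlab implementation.

import numpy as np

def x_memberships(xa1, xa2, xb1, xb2, window_width):
    # six fuzzy overlap classes along one axis, following (5)-(12)
    amax, bmax = xa2 - xa1, xb2 - xb1
    dist_max_sep = window_width - amax - bmax

    # relative submergence eps' of equation (7)
    if xa2 > xb1 and xa2 < xb2 and xa1 < xb1:
        eps_p = (xa2 - xb1) / amax
    elif xa2 > xb2 and xa1 < xb1:
        eps_p = 0.5
    elif xb2 > xa2 and xb1 < xa1:
        eps_p = 0.25
    elif xa1 < xb2 and xa2 > xb2 and xa1 > xb1:
        eps_p = (xb2 - xa1) / amax
    else:
        eps_p = 0.0

    ma, mb = (xa1 + xa2) / 2.0, (xb1 + xb2) / 2.0          # projection centres
    r_centre = 2.0 * abs(mb - ma) / (amax + bmax)          # equation (8)

    e1 = (xb1 - xa2) / dist_max_sep if xb1 > xa2 else 0.0  # (11)
    e6 = (xa1 - xb2) / dist_max_sep if xa1 > xb2 else 0.0  # (12)
    e2 = (1 - e1 if xb1 >= xa2                             # (5)
          else (xb1 - xa1) / amax if xb1 > xa1 else 0.0)
    e5 = (1 - e6 if xa1 >= xb2                             # (6)
          else (xa2 - xb2) / amax if xa2 > xb2 else 0.0)
    e3 = eps_p * (r_centre + 1.0)                          # (9)
    e4 = eps_p * (1.0 - r_centre)                          # (10)
    return np.array([e1, e2, e3, e4, e5, e6])

# descriptor (13)-(14): product of the x- and y-axis classes, weighted by
# the 4 opening-angle classes mu_k (values below are hypothetical)
eps_x = x_memberships(0.0, 4.0, 2.0, 6.0, window_width=20.0)
eps_y = x_memberships(0.0, 1.0, 0.5, 1.5, window_width=20.0)
mu_k = np.array([0.0, 0.7, 0.3, 0.0])
Ds = mu_k[:, None, None] * np.outer(eps_x, eps_y)          # 4 x 6 x 6 matrix
print(Ds.shape)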
318
M2USIC 2006
TS- 3B
(4 classes of opening angles × 6 x 6 classes of overlapping XY projections)
Figure 5: Descriptor matrix

4. Similarity matching
For L segments in the image, we also have L fuzzy descriptors. We use the Bhattacharyya distance [7] to compute the best matching between all fuzzy features from an input image and those from the database. Suppose that an input symbol is composed of line segments w = {Np | p = 1, 2, …, L} and the model to be compared consists of line segments w' = {N'q | q = 1, 2, …, L'}. The similarity distance between two line segments Np and N'q is defined as:

d(Np, N'q) = 4 − log Σ_{k=1}^{4} Σ_{i=1}^{6} Σ_{j=1}^{6} Dsk,(ij)(Np) ∗ Dsk,(ij)(N'q)        (15)

where Dsk,(ij)(Np) and Dsk,(ij)(N'q), defined in the above equation, are the statistical features associated with each line segment. Matching two shapes is to find the correspondence between their line segments by determining the nearest segment similarity. The distance between shapes is defined as below:

D = [1 + |L − L'| / (L + L')] × [ Σ_{l=1}^{min{L,L'}} dl·wl ] / [ Σ_{l=1}^{min{L,L'}} wl ]        (16)

where Σdl is the sum of segment similarity distances over the min{L, L'} matched pairs, [1 + |L − L'|/(L + L')] accounts for the penalty for the |L − L'| unmatched primitives, D is the total similarity distance and the wl are weight factors that correspond to the relative length of the reference segments. Based on the above equation, the distance between every candidate image from the database and the query image can be determined. Then, the candidate image with the minimum similarity distance D to the query image is selected as the retrieved image.

5. Experimental result and discussion
We tested the retrieval system using an image database containing 8 different buildings. The buildings are randomly chosen from some of the world's famous buildings. They are the Arc de Triomphe (Paris), the Burj Al Arab (Dubai), the Eiffel Tower (Paris), the Notre Dame Cathedral (Paris), the Opera House (Sydney), the Petronas Twin Towers (Kuala Lumpur), the Tower Bridge (London) and the Taj Mahal (Agra). For each building, there are 10 images featuring different acquisitions with varying viewpoints, background, rotation and illumination. In total, there are 80 images in the model image database. The image sizes vary from 256*256 pixels to 600*800 pixels. The reason for this variation is to ensure that the retrieval process works satisfactorily for small images as well as for large images. The algorithms for the system are implemented in Matlab and the images used are in the JPG format. It was found experimentally that for the Canny edge detector, a neighborhood size N of 9 gives the most satisfactory results compared to other values. This is the value used as the parameter for the implementation of the edge detector. As a preliminary evaluation of our retrieval system, we compared the performance of the system for two cases: the first used only the overlapping properties matrix as the feature, and the second used both the opening angle between a pair of segments and their overlapping properties as the features. To measure the performance, we use a metric known as the Normalized Modified Retrieval Rank (NMRR) specified in [9]. It incorporates recall, precision and rank information. The NMRR is defined as:

NMRR(q) = [ (1/NG(q)) Σ_{k=1}^{NG(q)} Rank(k) − 0.5 − NG(q)/2 ] / [ K + 0.5 − 0.5·NG(q) ]        (17)

where NG(q) is the number of ground truth images considered as similar to the query image, and Rank(k) is the ranking of the ground truth images by the retrieval algorithm. A rank of K+1 is assigned to each ground truth image that is not in the first K retrievals. K equals
319
M2USIC 2006
TS- 3B
min (4*NG(q), 2GTM), where GTM is the maximum of NG(q) over all queries. The NMRR is in the range [0, 1]. Smaller values represent a better retrieval performance. ANMRR is defined as the average NMRR over a range of queries:

ANMRR = (1/Q) Σ_{q=1}^{Q} NMRR(q)        (18)

6. References

[1] A. Vailaya, A. Jain, and H. J. Zhang, “On Image Classification: City vs. Landscape”, Content-Based Access of Image and Video Libraries, Proceedings. IEEE Workshop, 21 June 1998, pp 3-8.
[2] Q.Iqbal, J.K. Aggarwal , “Applying perceptual grouping to content-based image retrieval : Building Images”, IEEE International Conference on Computer Vision and Pattern Recognition, Fort Collins, Colorado, June 1999, vol. 1, pp. 42-48.
For each case, a set of 80 query images is used. For each query image, the retrieved images from the image database are arranged from rank 1 to 80 according to their degree of similarity to the query image, with rank 1 representing the most similar and rank 80 the least similar. From these ranks, the NMRR is calculated. The ANMRR value is computed from the 80 NMRR values obtained, one for each query image.

Table 1: Average NMRR
Case | ANMRR
Overlapping properties | 0.66
Opening angle and overlapping properties | 0.72

[3] Y. Li, L.G. Shapiro, “Consistent Line Clusters for Building Recognition in CBIR”, In Proceedings International Conference on Pattern Recognition, Quebec, August 2002, vol. 3, pp 952-957.

[4] W. Zhang, J. Kosecka, "Localization Based on Building Recognition", Computer Vision and Pattern Recognition, IEEE Computer Society Conference, 20-26 June 2005, Vol. 3, p 21.
[5] J.M.Ogier, “De l’image au document techniques: Problème de l’interprétation”, Habilitation a diriger des recherches ,Laboratoire Perception, Systèmes et Information, Université de Rouen,France, 2000.
The result in Table 1 shows that the use of overlapping properties gives a better retrieval result than the use of both the opening angle and the overlapping properties of the line segments. In the latter case, the addition of the opening angle between line segments implies that slight changes in the segmentation stage of the line segments will result in a different spatial descriptor matrix. The overlapping properties are not as sensitive to such changes due to the way they are constructed. Although the results obtained are preliminary, it can be inferred that the system in case one (overlapping properties) is more robust than the system in case two (overlapping properties and opening angle). One drawback of the system is its execution time. The time increases significantly as the number of line segments increases. For example, on a 2.4 GHz processor, the execution time for an image containing an average of 20 line segments is about 8 seconds. On the other hand, for 40 and 80 line segments, the execution time is around 28 and 114 seconds respectively. This limits the system to using only a small number of line segments for building description. For a more concrete result, further comparison with existing retrieval systems should also be performed in order to better evaluate the performance of the system. A larger image database should also be used, as the size of the current image database is too small.
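For reference, equations (17) and (18) can be computed with the short Python sketch below. The query ranks used in the example are hypothetical, and the sketch is not the evaluation code used by the authors (whose system was implemented in Matlab).

import numpy as np

def nmrr(ranks, K):
    # Normalized Modified Retrieval Rank for one query, equation (17).
    # `ranks` holds the rank of every ground-truth image; items not retrieved
    # within the first K positions should already be assigned rank K + 1.
    ng = len(ranks)
    avr = np.mean(ranks)
    return (avr - 0.5 - ng / 2.0) / (K + 0.5 - 0.5 * ng)

def anmrr(all_ranks, gtm):
    # Average NMRR over a set of queries, equation (18).
    scores = []
    for ranks in all_ranks:
        K = min(4 * len(ranks), 2 * gtm)
        scores.append(nmrr(ranks, K))
    return float(np.mean(scores))

# toy example with two queries, 10 ground-truth images each (GTM = 10)
queries = [np.arange(1, 11), np.array([1, 2, 3, 5, 8, 13, 21, 34, 41, 41])]
print(anmrr(queries, gtm=10))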
[6] Remco C. Veltkamp. "Shape Matching: Similarity Measures and Algorithms," In Proceedings Shape Modelling International, Genova, Italy, May 2001, pp 188-197. [7] N. Zakaria, J.M Ogier, J. Llados, “The Fuzzy Spatial Descriptor for Online Graphic Recognition: Overlapping Matrix Algorithm”, Document Analysis System 2006, pp 616-627. [8] L.Lui, R.M. Haralick, “Two Practical Issues in Canny’s Edge Detector Implementation”, International Conference on Pattern Recognition, Barcelona, Spain, September 3-7, 2000, vol. 3, pp. 680-682. [9] A. Mufit Ferman, S. Krishnamacari, A. Murat Tekalp, M. Abdel-Mottaleb, R. Mehrota, “Group-ofFrame/Picture Color Histogram Descriptors For Multimedia Applications”, In Proceedings International Conference on Image Processing, Vancouver, Canada, September 2001.
320
M2USIC 2006
TS- 3B
A Review Study of Face Feature Extraction Karmesh Mohan, Edwin Tan Teck Beng, Mohd Fadhli Mohd Nor, Ying-Han Pang
Faculty of Information Science and Technology Multimedia University Jln Ayer Keroh Lama 75450 Melaka, Malaysia
[email protected],
[email protected],
[email protected],
[email protected]

One of the examples of linear dimensionality reduction methods is Principal Component Analysis, denoted as PCA [6]. On the other hand, Locally Linear Embedding is an example of nonlinear dimensionality reduction for face feature extraction [4].
ABSTRACT
The task of face recognition has been actively researched in recent years. This paper provides a review of major human face recognition research. We first present an overview of face recognition and its applications. Then, a literature review of recent face recognition techniques is presented. In this paper we review face recognition systems based on two different methods: Principal Component Analysis and Locally Linear Embedding. Principal Component Analysis is a linear subspace analysis used to describe the face structure; on the other hand, Locally Linear Embedding is a nonlinear analysis which can describe the underlying structure of face data. The performance of these methods is evaluated on 3 different face databases: FERET, ORL and YALE B. Different settings in terms of feature dimensionality and neighborhood size are used to test the outcome and accuracy of each method.
PCA is a linear transformation that builds a new coordinate system for the data set such that the greatest variance lies on the first principal component, the second greatest variance on the second principal component, and so on. PCA retains the data characteristics that contribute most to its variance. However, PCA only manages to extract linear features of face data, and ignores the nonlinear features that may also contribute significant information about the face identity. Locally Linear Embedding (LLE) was introduced to overcome this limitation of PCA. The main assumption behind LLE is that the data set is sampled from a nonlinear manifold embedded in the high dimensional space. LLE is an unsupervised, non-iterative method, which avoids the local minima problems affecting many competing methods. In LLE, each data point is represented as a convex combination of its k nearest neighbors; the data is then mapped into a low-dimensional space while, at the same time, the convex combinations (called the embedding) are preserved as well as possible.
1. INTRODUCTION
Face recognition is a popular research area nowadays due to its ubiquity in the applications of access control, banking transactions, security monitoring and surveillance systems. However, face data coming from the real world is often difficult to understand because of its high dimensionality. How can we overcome the challenges posed by high dimensional data sets? One of the solutions is to reduce the data dimensionality by extracting only its significant and informative features.
There are two types of dimensionality reduction techniques for face feature extraction: linear and nonlinear methods.

2. PRINCIPAL COMPONENTS ANALYSIS
In the PCA method, the average face of a face image set with N dimensions {i_j ∈ ℜ^N | j = 1, …, M} is defined as ī = (1/M) Σ_{j=1}^{M} i_j. Each face image differs from the average face ī by the vector φ_j = i_j − ī.
321
M2USIC 2006
TS- 3B

A covariance matrix C = Σ_{j=1}^{M} φ_j φ_j^T is constructed. Then, eigenvectors v_k and eigenvalues λ_k of the symmetric matrix C are calculated. v_k determines the linear combination of the M difference images φ to form the Eigenfaces:

U_l = Σ_{k=1}^{M} v_{lk} φ_k ,   l = 1, ..., M        (1)

From these Eigenfaces, P (< M) are selected to be used for face feature extraction.

3. LOCALLY LINEAR EMBEDDING
Put another way, the embedding is optimized to preserve the local configurations of nearest neighbors. As we shall see, under suitable conditions, it is possible to derive such an embedding solely from the geometric properties of nearest neighbors in the high dimensional space. Indeed, the LLE algorithm operates entirely without recourse to measures of distance or relation between faraway data points. Below, the equations are explained based on the work of [5]:

In stage I, the cost function minimized is:

ε_I(w) = Σ_{i=1}^{n} || x_i − Σ_{j=1}^{k} w_j^(i) x_N(j) ||²        (3)

i.e. how well each x_i can be linearly reconstructed in terms of its neighbours x_N(1), ..., x_N(k). For one vector x_i and weights w_j^(i) that sum up to 1, this gives a contribution

ε^(i)(w) = || Σ_{j=1}^{k} w_j^(i) (x_i − x_N(j)) ||² = Σ_{j=1}^{k} Σ_{m=1}^{k} w_j^(i) w_m^(i) Q_jm^(i)        (4)

In practice, a regularization parameter r will have to be used: R = (Q + rI). Interestingly, Q^(i) can also be calculated based on just the squared Euclidean distance matrix D between all samples in X:

Q_jm^(i) = (1/2) (D_i,N(j) + D_i,N(m) − D_N(j),N(m))        (7)

In stage II, the weights w are fixed and new m-dimensional vectors y_i are sought which minimize the criterion:

ε_II(Y) = Σ_{i=1}^{n} || y_i − Σ_{j=1}^{k} w_j^(i) y_N(j) ||²        (8)

The w_j^(i)'s are stored in an n x n sparse matrix W, where
322
M2USIC 2006
TS- 3B

W_i,N(j) = w_j^(i). Re-writing eqn. 8 gives

ε_II(Y) = Σ_{i=1}^{n} Σ_{j=1}^{n} M_ij y_i^T y_j = tr(Y M Y^T)        (9)

where M is an n x n matrix found as M = (I − W)^T (I − W), and Y contains the y_i's as its columns.
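The PCA and LLE computations reviewed above can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions (random toy data, P = 20 eigenfaces, k = 5 neighbours and a fixed regularization r = 1e-3 are arbitrary choices), not the implementation used in the experiments reported below.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 64))           # 40 toy "face" vectors of dimension N = 64

# PCA / Eigenfaces, cf. equation (1)
mean_face = X.mean(axis=0)
Phi = X - mean_face                     # difference images phi_j
C = Phi.T @ Phi                         # symmetric covariance-like matrix
eigval, eigvec = np.linalg.eigh(C)      # eigenvectors v_k of C (ascending order)
U = eigvec[:, ::-1][:, :20]             # keep the P = 20 leading Eigenfaces
features_pca = Phi @ U                  # projections used as PCA features

# LLE stage I: reconstruction weights, cf. equations (4) and (7)
def lle_weights(X, k=5, r=1e-3):
    n = X.shape[0]
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]                 # k nearest neighbours
        Z = X[nbrs] - X[i]
        Q = Z @ Z.T                                      # local Gram matrix Q^(i)
        w = np.linalg.solve(Q + r * np.eye(k), np.ones(k))
        W[i, nbrs] = w / w.sum()                         # weights sum to 1
    return W

W = lle_weights(X, k=5)
M = (np.eye(len(X)) - W).T @ (np.eye(len(X)) - W)        # matrix M of eqn. (9)
print(features_pca.shape, M.shape)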
4. EXPERIMENTAL STUDY
Three different databases were used to evaluate the performances of PCA and LLE: the FERET database, the ORL database and the YALE B database. All faces in all three databases are cropped before being tested in the experiment. The FERET database contains 121 users with 8 images per user; these images have significant pose, facial expression and illumination variations. The ORL database contains 40 users with 10 images per user; the images in the ORL database have significant facial expression variation. The Yale B database contains 38 users with 28 images per user, with significant illumination variation. PCA and LLE are evaluated on the databases based on different principal component lengths, which differ by a value of 20, starting from 20 until 200.

In this project, the recognition rate is obtained with different feature lengths. The Equal Error Rate, denoted as EER, is used as the measure of recognition rate; a lower EER indicates a better feature extraction algorithm. Table 1 shows the optimal recognition results of PCA on the three databases. In the FERET dataset, a feature length of 180 obtains the lowest EER (31.25%); in the ORL database, a feature length of 20 obtains the lowest EER (11.74%); while in Yale B, a feature length of 140 obtains the lowest EER (33.43%). From Table 2 until Table 4, LLE obtains the best result (EER = 21.70%) when tested using the FERET database with a feature length of 120 and 3 nearest neighbors; in the ORL database, LLE obtains EER = 9.62% with a feature length of 140 and 4 nearest neighbors; in the Yale B dataset, LLE obtains EER = 11.32% with a feature length of 198 and 9 nearest neighbors.

TABLE 1: EER OF PCA
Database | Feature length | EER (%)
FERET | 180 | 31.25
ORL | 20 | 11.74
YALE B | 140 | 33.43

TABLE 2: EER OF LLE ON THE FERET DATABASE BASED ON DIFFERENT NEAREST NEIGHBOURS, K
Database: FERET
Nearest Neighbors | K=3 | K=4
Feature Length | 120 | 100
EER (%) | 21.70 | 22.02

TABLE 3: EER OF LLE ON THE ORL DATABASE BASED ON DIFFERENT NEAREST NEIGHBOURS, K
Database: ORL
Nearest Neighbors | K=3 | K=4 | K=5
Feature Length | 160 | 140 | 100
EER (%) | 9.68 | 9.62 | 9.72

TABLE 4: EER OF LLE ON THE YALE B DATABASE BASED ON DIFFERENT NEAREST NEIGHBOURS, K
Database: YALE B
Nearest Neighbors | K=6 | K=9 | K=11
Feature Length | 198 | 198 | 60
EER (%) | 11.77 | 11.32 | 11.69

TABLE 5: RESULT COMPARISON
Database | FERET | ORL | YALE B
Best Feature Length | 120 | 140 | 198
Best PCA EER (%) | 31.25 | 11.745 | 33.43
Best LLE EER (%) | 21.70 | 9.62 | 11.32

In addition, the result comparison in Table 5 states the best feature length, EER value and difference for both LLE and PCA. On average, the value of the best feature length is quite high. This implies that the more features are considered, the more useful information is provided for recognizing face identity. Besides that, we can also observe from Table 5 that the performance of LLE is superior to PCA for all three databases. This justifies that the nonlinear feature structure of the face, extracted by LLE, is more significant and informative compared with the linear features extracted by PCA. This implies that a robust feature descriptor
323
M2USIC 2006
TS- 3B
should not ignore the nonlinear features of the face. This is because the nonlinear feature structure is able to provide accurate and reliable information about the face identity.

5. CONCLUSION

This paper presents a review study of linear and nonlinear feature extraction techniques for face recognition. The linear method discussed in this paper is Principal Component Analysis, which is the most popular face feature extractor; on the other hand, Locally Linear Embedding is the example of a nonlinear feature descriptor discussed in this paper. According to the empirical results, the LLE EER value is lower than the PCA EER value. This clearly shows that LLE surpasses PCA as a face feature extraction method. In conclusion, a nonlinear method is more accurate for recognizing a face, as the extracted nonlinear features are more significant for face identity representation.

6. REFERENCES

[1] A.K. Jain, A. Ross, S. Prabakhar, “An Introduction to Biometric Recognition”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 14, No. 1, pp 40-19, January 2004.
[2] Alan Brooks, “Face Recognition: Eigenface and Fisherface Performance Across Pose”, http://pubweb.northwestern.edu/~acb206/ece432/FaceRecReport.html#refs.
[3] “Biometrics from Wikipedia, the free encyclopedia”, http://en.wikipedia.org/w/index.php.
[4] Dan Ventura, “Manifold Learning Examples: PCA, LLE and ISOMAP”, Master Thesis, September 2005.
[5] Dick de Ridder and Robert P.W. Duin, “Locally linear embedding for classification: a review”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Netherlands, 2002, pp 1-13.
[6] M. A. Turk and A. P. Pentland, “Face recognition using eigenfaces”, In Proc. of Computer Vision and Pattern Recognition, pages 586-591, IEEE, June 1991.
[7] Samuel Kadoury, “Face Detection using LLE”, Department of Electrical and Computer Engineering, McGill University, Montreal, Canada.
[8] S. Roweis and L. Saul, “Non-linear Dimensionality Reduction by Locally Linear Embedding”, Science, vol. 290, 2000.
[9] Xiaoguang Lu, “Image Analysis for Face Recognition”, Dept. of Computer Science & Engineering, Michigan State University, East Lansing.
[10] Ying-Han Pang, “Moment Analysis in Biometrics-based Authentication Systems”, Master Thesis, Multimedia University, Malacca, Malaysia, 2005.
324
M2USIC 2006
TS- 3B
A New Approach to Multi-Image Method for 3D Model Reconstruction Mohd Azzimir Jusoh, Noor Roziana Rosli
[email protected],
[email protected]

A comparison procedure will then be implemented on the images (of the object) based on their coordinate values, and as a result, the algorithm will come out with the list of the coordinates of the voxels that reflect the 3D model itself. The whole part here is considered to be the first phase of our Multi-Image method for 3D model development. In this paper, we only concentrate on the production of the 3D voxel coordinates, aiming to have coordinates that closely represent the original 3D model. In future, for the next phase, we will use this collection of coordinates to produce the wireframe and solid texture-mapped 3D models. Having such technology, the research can contribute better applications in the areas of animation, architecture, medicine and construction. The remainder of this paper is structured as follows: in Section 2, we highlight some of the existing multi-image methods together with their problems; each of the algorithms has its own strengths and weaknesses. In Section 3, we present our multi-image method for developing the 3D model of an object based on the least number of images. We show the result of the acceptance test that was implemented and specify the strengths and weaknesses of the algorithm. Finally, we draw our conclusion in Section 4.
ABSTRACT
Developing a three-dimensional (3D) model from images has captured researchers' attention and become an ongoing research topic in the area of computer graphics and image processing. Such algorithms basically work by producing a 3D model based on a number of two-dimensional (2D) images containing the intended object from different viewpoints. Many algorithms have been developed in this area and have successfully produced good results, but they still face some limitations and problems. In this paper, we present an automatic algorithm for reconstructing a 3D model within a short time period using five whiteprint images. At this point of time, we only concentrate on the production of the 3D voxel coordinates that will be used to reconstruct the 3D model. This method requires little user involvement and can produce a good 3D model within a short time period.
Keywords: Immersive Technology, 3D Model Reconstruction, Image Processing, Whiteprint.
1. INTRODUCTION
The method of reconstructing a 3D model based on the 2D scene in multiple images has become an ongoing research topic in the areas of computer vision and image based modeling. Having multiple images of an object from different viewpoints, the algorithm attempts to produce the 3D model of the object within a short time period. Currently, there are many algorithms involved in this area that have successfully developed 3D models. However, upon the completion of reconstructing the model, the algorithms carry some problems, such as using a large number of images, needing high-cost equipment, requiring user intervention for running some tedious procedures on the images, and the time taken to produce the 3D model [3, 4, 5]. We have the intention to produce an automatic algorithm that consumes the least number of images and eliminates user involvement. Under this new method, the user's job is just to specify the file names of the images to the algorithm. They will not be involved in defining the polygonal regions using 2D photo editing tools, assigning region correspondence, etc. Having the connection with the images, the algorithm can start to extract the important data from the images, including coordinates of the pixels, width and height of the object in the image, and so on.
2. PREVIOUS WORKS We study and analyze three multi photo methods for 3D modeling, which are the Cell Carving, Dense Surface Estimation and Semi-Automatic Region Growing. 2.1. Cell Carving The Cell Carving Algorithm is the generalization of the scene reconstruction algorithms [3] that reconstruct a 3D model based on the images of a scene (from arbitrary viewpoints) taken using omnidirectional camera. Basically, each scene is composed of points that can be projected into any camera by intersecting the image sphere with the directed line from the points to the camera's center of projection. Projecting all visible points in the scene onto the image sphere creates the images. This method of 3D model reconstruction requires the user to identify 2D polygonal regions. Using any simple segmentation tools, the user segments each image into a set of regions by selecting about 20 feature points in each image and assigning the point correspondences. The boundary of a region is defined by the outer and inner boundaries of a polygon. Each
325
M2USIC 2006
TS- 3B into each silhouette. Besides that, the algorithm only works for the images that contain the outside view of an object. It is hard for it to deal with the hidden surface of the object.
region is assigned a unique identification number, in such a way that those regions with the same identification numbers in two or more images are projected to the same object in the world. To ease the user, the researchers have built a simple photo editing application that provides polylines and intelligent scissors. In this algorithm, we can see that the users have to specify the polygonal regions by themselves. As the first burden, they have to select about 20 points on each image. If there are many images involved, then there can be hundreds of points to be selected. This can cause the user, especially a new one, to take a long time to finish the job.
3. NEW MULTI-IMAGE METHOD 3.1. Whiteprints We are familiar with the blueprint, which is conceptually a detailed plan or design documenting an architecture or engineering design. Basically, a blueprint is an image that contains white lines (representing the plan) on blue background. Nowaday, blueprints have mostly been replaced by the whiteprints, which have blue lines to represent the design on a white background. Beside whiteprints, the drawings are also being called as bluelines due to the existance of blue lines in the image. There are sometimes whereby whiteprints are incorrectly being called as blueprints. For this paper, we will be using whiteprints that contain black lines, and this type of image also be known as blacklines.
2.2. Dense Surface Estimation Under the Dense Surface Estimation [4], virtual models are reconstructed from sequences of uncalibrated images taken by a hand held camera. Based on the tracked or matched features points, the relations between the images are computed to retrieve the structure of the scene and the motion of the camera. In developing the initial reconstruction frame, two images are selected first and the initial reconstruction is refined and extended once the additional images are included. To obtain a more detailed model, the Dense Surface Estimation uses the structures and motions to constrain the correspondence search. The search attempts to match the image points along each image scan line in order to obtain a dense estimation of the surface geometry. For matching the ambiguities (restricted from projective to metric through autocalibration), the dynamic programming approach is implemented. 2.3. Semi-Automatic Region Growing (SARG) The Low-Cost Model Reconstruction system constructs a 3D model based on the 2D images taken from multiple viewpoints by using CAHV camera model [5]. A new algorithm known as the SemiAutomatic Region Growing (SARG) is developed based on the work of Niem [6] to facilitate the automatic extraction of silhouettes from images. SARG concentrates on the most relevant areas in the image to obtain fast automatic extraction of silhouettes. During the volume carving stage, the algorithm projects the volume (that represents object) into each silhouette and discards the sections of the volume that fall outside the silhouettes. As more silhouettes are added, the volume becomes closer approximation to the object. Finally, a voxel volume model is obtained. Shape from silhouette can never recover concavities [3]. This method faces some problems in dealing with the images that contains multiple, occluding objects. In this situation, it handles a hard job to project the volume that represents many objects
Figure 4: Example of whiteprint images
We propose an algorithm that will reconstruct a 3D model based on least number of images. For this method, we will be using five whiteprints images that will reflect the intentional object from five different viewpoints, which are the Top view (T), Front view (F), Right view (R), Back view (B) and Left view
326
M2USIC 2006
TS- 3B
(L). Compared to the ordinary photos, whiteprint images contain more accurate form of object inside them. Having the accurate one, it is easier for us to demonstrate the ability of the algorithm in reading the important data from the five images, comparing them in order to obtain the main voxel coordinates and finally, reconstructing the 3D model based on the coordinates.
x & z, or y & z) and assign the other, unselected value to zero (0). In doing this, the algorithm must capture the minimum or the maximum value of x, y or z for each image, so that the object representation in 3D will be placed near the origin inside the relevant plane. Table 1 shows the changes that must be done for each image. See the example in Figure 6.
3.2. Algorithm Implementation The algorithm starts the job by accepting five whiteprints images (T, F, R, B and L) as the input. In the algorithm, this is the only time that needs the user to intervene for specifying the filenames of the images. Having the images, the algorithm will first calculate the Width and Height of each image. The Width is calculated based on the total number of column (total number of pixel on x-axis) in the image, while the Height is based on the image row (total number of pixel on y-axis). Then, the algorithm begins to capture the coordinates of the pixels on the edge of the object and store them inside five different text files (T1.txt, F1.txt, R1.txt, B1.txt and L1.txt). As we know, a whiteprint image is just a 2D representation that contains x and y values. However, for the purpose of 3D model reconstruction, the algorithm stores the coordinate of a pixel in 3D form (x = current x-value, y = current y-value and z = 0). Subsequently, based on these coordinates, the algorithm extracts all the coordinates of pixels enclosed by the edge. The coordinates are then stored in different text files (T2.txt, F2.txt, R2.txt, B2.txt and L2.txt) in 3D coordinate form (x, y, z). By having these data, the algorithm will now concentrate on the object itself (Main Area which is surrounded by the edge) and ignore the unimportant white background.
View  | new x          | new y          | new z
Top   | old x - min(x) | 0              | old y - min(z)
Front | old x - min(x) | max(y) - old y | 0
Right | 0              | max(y) - old y | max(z) - old x
Back  | max(x) - old x | max(y) - old y | 0
Left  | 0              | max(y) - old y | old x - min(z)

Table 1: Calculation for obtaining the voxel coordinates.
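The conversions of Table 1 can be expressed with the following Python sketch. It is an illustration only (the original system was implemented in Microsoft Visual C++), and the minima and maxima are taken over the captured coordinates of each image, which is our reading of min(·) and max(·) in the table.

import numpy as np

def to_voxels(points_2d, view):
    # Map the (x, y) pixel coordinates of one view onto its 3D plane,
    # shifted so the object sits near the origin (Table 1).
    p = np.asarray(points_2d, dtype=float)
    x, y = p[:, 0], p[:, 1]
    out = np.zeros((len(p), 3))
    if view == 'T':                                   # top view -> xz-plane
        out[:, 0], out[:, 2] = x - x.min(), y - y.min()
    elif view == 'F':                                 # front view -> xy-plane
        out[:, 0], out[:, 1] = x - x.min(), y.max() - y
    elif view == 'R':                                 # right view -> yz-plane
        out[:, 1], out[:, 2] = y.max() - y, x.max() - x
    elif view == 'B':                                 # back view -> xy-plane
        out[:, 0], out[:, 1] = x.max() - x, y.max() - y
    elif view == 'L':                                 # left view -> yz-plane
        out[:, 1], out[:, 2] = y.max() - y, x - x.min()
    return out

pixels = [(10, 40), (12, 38), (30, 20)]               # toy 2D coordinates
print(to_voxels(pixels, 'F'))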
In Figure 6, we show an example of converting the five 2D Coordinate Images of a Mini Cooper into 3D Coordinate Images. In the following images, there are three colors of lines: red (for the x-axis), green (for the y-axis) and white (for the z-axis). All the left images are placed on the xy-plane before being placed on their respective planes.
Figure 5: Example of the input image
From here onwards, the whiteprint images will not be used anymore, as the algorithm has extracted and captured the important coordinates. The algorithm will be using the stored coordinates for the next process and we will later refer the files that store the coordinates as the 2D Coordinate Images. For the next step, the algorithm needs to convert each pixel (2D) coordinate values to the relevant voxel (3D) coordinate. Different 2D Coordinate Image will undergo different changes, as the algorithm will change two values of the image (x & y,
(T2.txt)
(3D_T.txt)
(F2.txt)
327
(3D_F.txt)
M2USIC 2006
TS- 3B the lowest value of x, y or z depends on the images used as Main Image and Compared Image. Then, the algorithm attempts to obtain the DifferentWidth, DifferentHeight or DifferentDepth and implement the values to calculate the new voxel coordinates. 1: Read Main Image, MI and Compared Image, CI 2: Find the highest x value in each row of MI, or Find the highest y value in each column of MI, or Find the highest z value in each dim of MI, or Find the lowest x value in each row of MI, or Find the lowest y value in each column of MI, or Find the lowest z value in each dim of MI. 3: Find the highest x value in each row of CI, or Find the highest y value in each column of CI, or Find the highest z value in each dim of CI, or Find the lowest x value in each row of CI, or Find the lowest y value in each column of CI, or Find the lowest z value in each dim of CI. 4: DifferentWidth = highest x MI – highest x CI, or DifferentHeight = highest y MI – highest y CI, or DifferentDepth = highest z MI – highest z CI. 5: Read all coordinates (x, y, z) of MI 6: while coordinates (x, y, z) of MI exist do 7: x = old x; or x = old x – DifferentWidth(MI,CI) 8: y = old y; or y = old y – DifferentHeight(MI,CI) 9: z = old z; or z = old z – DifferentDepth(MI,CI) 10: end while 11: save the coordinate inside MIfromCI.txt.
(R2.txt)
(3D_R.txt)
(B2.txt)
(3D_B.txt)
(L2.txt)
(3D_L.txt)
Figure 6: Converting 2D Coordinate Images of a Mini Cooper into 3D Coordinate Images.
Based on this conversion, the algorithm will place T image on the xz-plane, F and B images on the xyplane and R and L images on the yz-plane. From the conversion, five new text files are created to store the 3D voxel coordinates, including 3D_T.txt, 3D_F.txt, 3D_R.txt, 3D_B.txt and 3D_L.txt. Later, the 3D coordinate files will be referred as 3D Coordinate Images. The algorithm will start to compare once it has the 3D Coordinate Images of T, F, R, B and L. Basically, the comparison is the most important part in the method, whereby in here, the algorithm will compare the 3D coordinate images among each other. By the comparison, the 3D voxel coordinates of each image will be changed to new positions that reflect the accurate 3D model. To do this, the algorithm has to calculate some important data from the 3D coordinate images, for example the highest x, y or z values and lowest x, y or z values in each row or column. All the data will stored inside arrays. Based on the values, the algorithm may obtain DifferentWidth, DifferentHeight or Different_Depth. The difference will be counted with current (old) x, y or z value to obtain the new voxel coordinates. Figure 8 shows the pseudo code of the algorithm implementation. In the first line, Main Image (MI) refers to the image that will be compared, while Compared Image (CI) represents the image to which MI will be compared. For example, when comparing T with F, T is the MI and F will be the CI. In Line 2 and 3, the algorithm will calculate either highest or
Figure 7: Comparison Algorithm
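A simplified Python sketch of the comparison idea in Figure 7 is shown below. It aligns the two coordinate sets using a single global extent per axis, whereas the pseudo code above works row by row and column by column; the function and variable names are illustrative assumptions, not the authors' code.

import numpy as np

def align(main_pts, compared_pts, axis):
    # Shift the Main Image's voxel coordinates so that its extent matches the
    # Compared Image along one axis (0 = x -> DifferentWidth,
    # 1 = y -> DifferentHeight, 2 = z -> DifferentDepth).
    mi = np.asarray(main_pts, dtype=float)
    ci = np.asarray(compared_pts, dtype=float)
    difference = mi[:, axis].max() - ci[:, axis].max()
    shifted = mi.copy()
    shifted[:, axis] -= difference        # e.g. new y = old y - DifferentHeight
    return shifted

# toy usage: align a Top image (MI) against a Front image (CI) along y
top = np.array([[0, 9, 0], [2, 9, 3]])
front = np.array([[0, 5, 0], [2, 4, 0]])
print(align(top, front, axis=1))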
Basically, there are 16 comparisons that must be applied on the 3D T, F, R, B and L Coordinate Images (see Table 2).
328
Compare               | .txt Files     | Conversion                            | Result
T with F (MI=T, CI=F) | 3D_T, 3D_F     | x=old x; y=old y - DH(T,F); z=old z   | TfromF.txt
T with R (MI=T, CI=R) | TfromF, 3D_R   | x=old x; y=old y - DH(T,R); z=old z   | TfromR.txt
T with B (MI=T, CI=B) | TfromR, 3D_B   | x=old x; y=old y - DH(T,B); z=old z   | TfromB.txt
T with L (MI=T, CI=L) | TfromB, 3D_L   | x=old x; y=old y - DH(T,L); z=old z   | TfromL.txt
F with T (MI=F, CI=T) | 3D_F, TfromL   | x=old x; y=old y; z=old z - DD(F,T)   | FfromT.txt
F with R (MI=F, CI=R) | FfromT, 3D_R   | x=old x; y=old y; z=old z - DD(F,R)   | FfromR.txt
F with L (MI=F, CI=L) | FfromR, 3D_L   | x=old x; y=old y; z=old z - DD(F,L)   | FfromL.txt
R with T (MI=R, CI=T) | 3D_R, TfromL   | x=old x - DW(R,T); y=old y; z=old z   | RfromT.txt
R with F (MI=R, CI=F) | RfromT, FfromL | x=old x - DW(R,F); y=old y; z=old z   | RfromF.txt
R with B (MI=R, CI=B) | RfromF, 3D_B   | x=old x - DW(R,B); y=old y; z=old z   | RfromB.txt
B with T (MI=B, CI=T) | 3D_B, TfromL   | x=old x; y=old y; z=old z - DD(B,T)   | BfromT.txt
B with R (MI=B, CI=R) | BfromT, RfromB | x=old x; y=old y; z=old z - DD(B,R)   | BfromR.txt
B with L (MI=B, CI=L) | BfromR, 3D_L   | x=old x; y=old y; z=old z - DD(B,L)   | BfromL.txt
L with T (MI=L, CI=T) | 3D_L, TfromL   | x=old x - DW(L,T); y=old y; z=old z   | LfromT.txt
L with F (MI=L, CI=F) | LfromT, FfromL | x=old x - DW(L,F); y=old y; z=old z   | LfromF.txt
L with B (MI=L, CI=B) | LfromF, BfromL | x=old x - DW(L,B); y=old y; z=old z   | LfromB.txt

* MI = Main Image    * CI = Compared Image
* DW = DifferentWidth    * DH = DifferentHeight    * DD = DifferentDepth

Table 2: 16 Comparisons Done.
Finally, the algorithm will combine all the 3D coordinates from five text files, including TfromL.txt, FfromL.txt, RfromB.txt, BfromL.txt and LfromB.txt. A text file named 3D_Model.txt will store these important coordinates that represent the form of the 3D model. 3.3. Result We implement the algorithm in Microsoft Visual C++ 6.0 together with the OpenGL and CImg. Having the algorithm, we successfully run the acceptance test on a personal computer with the ability of 2.1 GHz AMD Athlon XP equipped with 512 MB of RAM. We have provided a few sets of 5 images (T, F, R, B and L) to the algorithm, and it will process each of them to produce the 3D voxel coordinates. Figure 8 shows an example for reconstructing a representation of Mini Cooper based on five whiteprint images.
Figure 8: Example: Reconstructing a Mini Cooper
From the acceptance test that has been implemented on various sets of input images, the algorithm has successfully collected their voxel coordinates and
329
M2USIC 2006
TS- 3B producing the 3D Coordinate Images. Subsequently, the algorithm can start to compare the 3D Coordinate Images one by one, as there are 16 comparisons that must be done. Upon finishing this part, the algorithm obtains the collection of 3D voxel coordinates that closely represent the form of the intentional 3D model. Basically, all the process takes less than twenty seconds to be completed on various sets of input images. We implement this algorithm with the intention to produce an automatic way for reconstructing 3D model based on least number of images in short time period without user intervention. Besides the advantages, there are still some minor problems in the algorithm that requires some modifications, including convexity and unnecessary coordinates. In future, we will enhance the algorithm to deal with such a problem and able to produce a better result in a more efficient way.
reconstructed the 3D models in less than twenty (20) seconds. The results here show that the algorithm has the ability to produce a collection of coordinates that closely represent the original 3D model within a short time period. There are still some minor problems faced while implementing the 3D model reconstruction algorithm. Basically, at this moment, the algorithm cannot deal with the convex surface and hidden part of an object. For example, the T image of a cup contains the representation of the convex part. However, from the other points of view (F, R, B and L), this convexity never exists. So, when the algorithm has finished collecting all the important voxel coordinates and reconstructed the model, we may have a 3D model of a cup without any convex part in it. Besides that, the algorithm sometimes produces a small number of unnecessary voxel coordinates in certain 3D models. This problem occurs in the voxel production (comparison) part whereby when comparing 2 images, the algorithm tends to produce such coordinates. After combining all the coordinates from the comparison, we may have the 3D model that contains the unnecessary points. We believe that by implementing a sort of filter in the algorithm, the unnecessary one can be taken out and will not be exist inside the final 3D model. At this point of time, we have successfully collected the voxel coordinates that closely represent the original 3D model, which means that we solve the first phase of the Multi-Image method for 3D model reconstruction. For future works, we will continue with the next phase, which is to produce 3D model under wireframe and solid modes. We may need to implement the concept of interpolation, if the voxel coordinates that we have are not enough to be used for reconstructing a smooth and nice 3D model.
REFERENCES [1]
[2]
[3]
[4]
4. CONCLUSION [5]
In this paper, we present a way to reconstruct a 3D model of an object based on five accurate whiteprint images containing the representation of the object from five different viewpoints. The automatic algorithm requires no user involvement, as the user just needs to run a simple external task, which is providing five whiteprint images to the algorithm. At this point of time, we only concentrate on the production of the 3D voxel coordinates, aiming to have the coordinates that closely represent the original 3D model. In future, for the next phase, we will use this collection of coordinates to produce the wireframe and solid texture-mapped 3D models. Having the input, the algorithm starts to calculate the width and height of each image and capture the coordinates of pixels on and inside the edge of the object. All the coordinates are saved in 3D format inside different text files (2D Coordinate Images). From here, the 2D Coordinate Images will be placed on the respective 3D plane near to the origin,
[6]
[7]
[8]
330
M. A. Jusoh, Q. A. Salih. “Extracting 2D Objects from Multiple Photos for 3D Model Reconstruction”, MMU International Symposium on Information and Communication Technology, Cyberjaya, Malaysia, October 2004. M. A. Jusoh, Q. A. Salih. “Comparison of MultiPhoto Methods for 3D Modeling”, International Conference on Computer Graphics, Imaging and Visualization, Penang, Malaysia, July 2004. R. Ziegler, W. Matusik, H. Pfister, and L. McMillan, “3D Reconstruction Using Labeled Image Regions”, Eurographics Symposium on Geometry Processing, 2003. M. Pollefeys, M. Vergauwen, F. Verbiest, K. Cornelis, and L. Van Gool, “From Image Sequence to 3D Models”, 3rd International Workshop on Automatic Extraction of Man-made Objects from Aerial and Space Images, June 2001, p. 403-410. C. Lyness, O.C. Marte, B. Wong, and P. Marais. “Low-Cost Model Recosntruction from Image Sequences”, AFRIGRAPH, Capetown, South Africa, 2001. E. Mortensen and W. Barret, “Intelligent Scissors for Image Composition”, SIGGRAPH98, 1998, pages 310. W. Niem. “Error Analysis fo Silhouette-Based 3D Shape Estimation from Multiple Views”, International Workshop on Synthetic - Natural Hybrid Coding and Three Dimensional Imaging, 1997. R Debevec, C. Taylor, and J. Malik, “Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach”, SIGGRAPH 96 Annual ConCerence Series, 1996.
M2USIC 2006
TS- 3B
Feature Selection for Texture-Based Classification Scheme Using Multimodality for Liver Disease Detection
Sheng-Hung Chung (1) and Rajasvaran Logeswaran (2)
Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia.
(1) [email protected], (2) [email protected]

Abstract

In this study, the use of texture analysis is proposed for the detection of abnormalities in combined imaging modalities, namely Magnetic Resonance Imaging (MRI), Computed Tomography (CT) and Ultrasound. Statistical texture analysis techniques based on First-Order Statistics (FOS) and the Spatial Grey-Level Co-Occurrence Matrix (SGLCM) are used as a fundamental approach for selecting appropriate features that may discriminate different liver textures. The FOS features employed were average grey-level, kurtosis, skewness, coefficient of variance, standard deviation and entropy, whereas the Second-Order Statistics (SOS) derived using SGLCM include entropy, correlation, homogeneity and contrast. The results indicate that the proposed texture analysis methodology is able to successfully characterise liver cyst, fatty liver and healthy liver in clinical test images using the diagnostic parameters obtained.

Keywords: Spatial Grey-Level Co-Occurrence Matrix (SGLCM), First-Order Statistics (FOS), Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Ultrasound, Second-Order Statistics

1. Introduction

Many image texture analysis techniques have been developed over the years. The common ones are the Spatial Grey-Level Co-Occurrence Matrix (SGLCM) [1], First-Order Statistics (FOS) [2], Laws Texture Energy Measures (TEM) [3], Fourier Power Spectrum (FPS) [3], Grey-Level Difference Statistics (GLDS) [4] and the Grey-Level Run Length Method (GLRLM) [5]. Liver diseases are a serious concern, as the liver is one of the most vital organs in the human body. Liver diseases can be divided into two main categories [6]:

a) Focal liver diseases, where the pathology is concentrated in a small area, while the rest of the liver volume remains normal.
b) Diffused liver diseases, where the pathology is distributed over the liver volume.

For the diagnosis of liver diseases, the patient needs to undergo a series of examinations, in which a number of imaging techniques are employed, such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Ultrasound, Positron Emission Tomography (PET) and Single Photon Emission Computed Tomography (SPECT). This study presents an approach for the selection of appropriate features extracted using SGLCM and FOS for the detection of liver diseases, namely liver cyst and fatty liver, using MRI, CT and Ultrasound images. Figure 1 shows the methodology employed in this study. Three types of imaging modalities (MRI, CT and Ultrasound) were applied with FOS and SGLCM to evaluate the features for further classification of liver textures. The statistical feature extraction experiment is applied to produce the classification of focal liver disease (liver cyst) and diffused liver disease (fatty liver) from healthy liver.
Figure 1: Feature selection based on FOS and SGLCM for classification of liver texture
331
2. Texture Analysis
Image blocks (4 x 4 pixels) are first extracted using a Region of Interest (ROI) tool from each test image. These image blocks are then fed into SGLCM and FOS to construct spatial matrices for FOS and Second-Order Statistics (SOS) feature extraction. The features extracted from FOS are average grey-level, standard deviation, kurtosis, skewness, entropy and coefficient of variance. The parameters used in the SGLCM are entropy, correlation, contrast and homogeneity. The details of the texture analysis algorithms and features used are discussed below.

2.1 First-Order Statistics (FOS)

FOS describes the grey-level distribution without considering spatial dependence. As a result, FOS can only describe the echogenity of a texture and the diffuse variation characteristics within the ROI [7]. The extracted features are described by the following equations [8]:

Average grey-level,
\mu = \sum_{i=0}^{G-1} i H_i   (1)
where H_i is the number of pixels with grey-level i in an image and G is the number of grey-levels in the image.

Standard deviation,
\sigma = \left[ \sum_{i=0}^{G-1} (i-\mu)^2 H_i \right]^{1/2}   (2)
measures the global contrast in the image.

Entropy,
s = -\sum_{i=0}^{G-1} H_i \log H_i   (3)
measures the uniformity of the histogram, a quantity widely used in image compression.

Skewness,
\gamma_1 = \frac{1}{\sigma^3} \sum_{i=0}^{G-1} (i-\mu)^3 H_i   (4)
measures the extent to which outliers favor one side of the distribution. Skewness is invariant under a linear grey-scale transformation.

Kurtosis,
\gamma_2 = \frac{1}{\sigma^4} \sum_{i=0}^{G-1} (i-\mu)^4 H_i - 3   (5)
measures the peakedness or tail prominence of the distribution. It is 0.0 for the Gaussian distribution and is invariant under a linear grey-scale transformation.

Coefficient of variance,
c_v = \frac{\sigma}{\mu}   (6)
is invariant under a change of scale. Thus, if the intensity scale has a natural zero, c_v will be a scale-invariant measure of global contrast.

2.2 Spatial Grey-Level Co-Occurrence Matrix (SGLCM)

The SGLCM aspect of texture is concerned with the spatial distribution of, and spatial dependence among, the grey-levels in a local area. This concept was first used by Julesz [9] in texture discrimination experiments. Being one of the most successful methods for texture discrimination at present [10], we chose to investigate its optimum features in the different imaging modalities. The method is based on the estimation of the second-order joint conditional probability density function in (7) [9]:

f(i, j \mid d, \theta), \quad \theta = 0°, 45°, 90°, 135°, 180°, 225°, 270° \text{ and } 315°   (7)

Each f(i, j | d, θ) is the probability of going from grey-level i to grey-level j, given that the inter-sample spacing is d and the direction is given by the angle θ. The estimated values for these probability density functions can thus be written in matrix form (8) [9]:

\phi(d, \theta) = [ f(i, j \mid d, \theta) ]   (8)

For computing these probability distribution functions, scanning of the image in four directions has been carried out in this work, with θ = 0°, 45°, 90° and 135° being sufficient [11], since the probability density matrices for the remaining directions can be computed from these four basic directions, as denoted in (9) [9]:

\phi(d, 0°) = \phi^t(d, 180°), \quad \phi(d, 45°) = \phi^t(d, 225°), \quad \phi(d, 90°) = \phi^t(d, 270°), \quad \phi(d, 135°) = \phi^t(d, 315°)   (9)

where \phi^t(d, \theta) denotes the transpose of the matrix for the inter-sample spacing d and direction θ.
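As a concrete illustration of equations (1)-(6), the following Python sketch computes the FOS features from the grey-level histogram H_i of one image block. It is a minimal sketch assuming 8-bit grey levels (G = 256), a normalised histogram and a base-2 logarithm for the entropy; the function and variable names are ours, not from the original implementation.

```python
import numpy as np

def fos_features(block, G=256):
    """First-Order Statistics of a grey-level image block (eqs. 1-6).

    `block` is a 2-D array of integer grey levels in [0, G-1].
    The histogram H is normalised so that sum(H) = 1.
    """
    H, _ = np.histogram(block, bins=G, range=(0, G))
    H = H / H.sum()                                   # normalised grey-level histogram
    i = np.arange(G)

    mu = np.sum(i * H)                                # (1) average grey-level
    sigma = np.sqrt(np.sum((i - mu) ** 2 * H))        # (2) standard deviation
    entropy = -np.sum(H[H > 0] * np.log2(H[H > 0]))   # (3) entropy
    skewness = np.sum((i - mu) ** 3 * H) / sigma**3   # (4) skewness
    kurtosis = np.sum((i - mu) ** 4 * H) / sigma**4 - 3   # (5) kurtosis
    cv = sigma / mu                                   # (6) coefficient of variance

    return {"mean": mu, "std": sigma, "entropy": entropy,
            "skewness": skewness, "kurtosis": kurtosis, "cv": cv}

# Example: a 4 x 4 ROI block, as used in this study
roi = np.random.randint(0, 256, size=(4, 4))
print(fos_features(roi))
```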
332
3. Feature Extraction and Selection

Feature extraction is the first stage of image analysis. The results obtained from this stage are used for texture discrimination or texture classification [8]. The extraction of features derived from FOS and SGLCM is important for the correct classification of the liver diseases. The selection of suitable and efficient feature properties determines the classification rate of the liver diseases in further work. Sub-results for 21 images out of the experiment with 462 clinical liver images, acquired from the medical records of the collaborating hospital, are illustrated in this section. The results for the four optimum features derived from SGLCM are shown in Tables 1-4. The ROIs denoted T1-T7 belong to liver cyst, T8-T14 to fatty liver and T15-T21 to healthy liver. The SGLCM values were derived experimentally using directions θ = 0°, 45°, 90° and 135°, and sample distance d = 1.

Table 1: Entropy results for liver classes at directions θ = 0°, 45°, 90° and 135° for MRI images (columns are the ROIs of each class)

Liver Cyst (T1-T7):
  θ = 0°:    5.802  5.487  5.477  5.339  5.174  5.477  5.884
  θ = 45°:   5.724  5.524  6.554  6.774  6.694  5.884  6.145
  θ = 90°:   6.425  5.925  6.072  6.241  6.131  6.082  6.281
  θ = 135°:  6.484  6.183  7.854  7.692  7.911  7.054  7.692
Fatty Liver (T8-T14):
  θ = 0°:    2.880  2.921  2.971  2.871  2.883  2.931  3.124
  θ = 45°:   2.869  2.993  3.325  3.692  2.960  3.118  3.145
  θ = 90°:   3.101  2.991  3.448  3.600  3.421  3.532  3.381
  θ = 135°:  3.248  3.025  3.432  3.821  3.721  3.862  3.102
Healthy Liver (T15-T21):
  θ = 0°:    0.054  0.082  0.554  0.887  0.574  0.114  0.774
  θ = 45°:   0.692  0.281  0.784  0.231  0.884  0.145  0.954
  θ = 90°:   1.054  1.082  1.054  1.607  1.177  1.484  1.074
  θ = 135°:  1.692  1.954  1.281  1.784  1.231  1.884  1.145

Table 2: Correlation results for liver classes at directions θ = 0°, 45°, 90° and 135° for MRI images (columns are the ROIs of each class)

Liver Cyst (T1-T7):
  θ = 0°:    5.962  6.127  6.493  6.384  6.128  6.773  6.237
  θ = 45°:   6.403  6.241  6.554  6.774  6.694  6.884  6.345
  θ = 90°:   6.825  6.325  6.610  6.941  6.910  6.904  6.431
  θ = 135°:  6.854  6.483  6.854  6.997  6.111  6.341  6.562
Fatty Liver (T8-T14):
  θ = 0°:    2.302  2.300  2.321  2.370  2.410  3.156  2.186
  θ = 45°:   2.924  2.811  2.703  2.718  2.843  3.481  2.916
  θ = 90°:   3.164  3.119  2.321  3.860  2.994  3.494  3.994
  θ = 135°:  4.269  4.702  4.164  4.718  4.932  4.156  4.321
Healthy Liver (T15-T21):
  θ = 0°:    0.110  0.120  0.152  0.133  0.142  0.071  0.140
  θ = 45°:   0.241  0.231  0.214  0.231  0.284  0.145  0.254
  θ = 90°:   1.314  1.312  1.224  1.167  1.437  1.224  1.284
  θ = 135°:  1.500  1.431  1.322  1.311  1.410  1.374  1.350

As observed in Tables 1 and 2, SGLCM analysis shows that entropy and correlation are useful for characterising liver cyst, fatty liver and healthy liver in MRI images, as the values obtained for each class do not overlap and have a sufficiently large interval between them. The study of feature selection was concerned with the task of identifying the discriminatory information used to classify the images [8]. For MRI liver images, entropy is in the range of 5.174-7.911 and correlation within 5.962-6.997 for liver cyst. For fatty liver, entropy is in the range of 2.487-3.862 and correlation within 2.300-4.932. Finally, for healthy liver, entropy is 0.054-1.954 and correlation 0.071-1.500. Attempts with the other SGLCM features were unsuccessful, as their ranges overlapped between two or more tissue classes and are thus not suitable for liver tissue characterisation.

For CT liver images (see Table 3), only correlation is selected for classification of liver cyst, fatty liver and healthy liver. Correlation results for the liver tissues in CT range from 4.012-4.950 for liver cyst, 3.170-3.900 for fatty liver and 2.160-2.850 for healthy liver. The other SOS measurements were also evaluated for the characterisation of CT liver images; however, the entropy, contrast and homogeneity results for CT in this study did not show obvious non-overlapping classification ranges.

Table 3: Correlation results for liver classes at directions θ = 0°, 45°, 90° and 135° for CT images (columns are the ROIs of each class)

Liver Cyst (T1-T7):
  θ = 0°:    4.054  4.143  4.012  4.236  4.531  4.215  4.222
  θ = 45°:   4.942  4.903  4.330  4.771  4.445  4.584  4.638
  θ = 90°:   4.007  4.101  4.410  4.834  4.874  4.798  4.050
  θ = 135°:  4.950  4.500  4.569  4.863  4.510  4.116  4.516
Fatty Liver (T8-T14):
  θ = 0°:    3.692  3.548  3.542  3.583  3.690  3.170  3.806
  θ = 45°:   3.631  3.690  3.696  3.921  3.633  3.338  3.118
  θ = 90°:   3.068  3.692  3.833  3.028  3.511  3.178  3.629
  θ = 135°:  3.224  3.711  3.528  3.336  3.624  3.178  3.900
Healthy Liver (T15-T21):
  θ = 0°:    2.160  2.223  2.155  2.507  2.259  2.251  2.170
  θ = 45°:   2.108  2.267  2.172  2.340  2.333  2.613  2.192
  θ = 90°:   2.203  2.260  2.233  2.352  2.286  2.450  2.784
  θ = 135°:  2.176  2.259  2.170  2.322  2.457  2.436  2.850

Among the SOS features obtained for Ultrasound images, only contrast was successful for classification of liver cyst, fatty liver and healthy liver (see Table 4). The contrast values are 3.012-4.571 for liver cyst, 1.450-2.699 for fatty liver and 0.103-1.109 for healthy liver. SGLCM feature extraction based on entropy, correlation and homogeneity was also investigated; however, there was no significant difference among the three liver tissue classes for these features. Overall, the optimum SOS features achieved in the SGLCM tests are encouraging for each imaging modality. For soft textures (healthy liver), the second-order measurement distributions change very slightly with distance, while for coarse textures (liver cyst and fatty liver), the change in the distribution is rapid [12].

Table 4: Contrast results for liver classes at directions θ = 0°, 45°, 90° and 135° for Ultrasound images (columns are the ROIs of each class)

Liver Cyst (T1-T7):
  θ = 0°:    3.054  3.143  3.012  3.236  3.531  3.215  3.222
  θ = 45°:   3.952  3.993  3.330  3.771  3.445  3.984  3.938
  θ = 90°:   4.007  4.101  3.410  3.834  3.874  3.998  4.050
  θ = 135°:  4.571  4.500  3.569  3.963  3.510  4.116  4.516
Fatty Liver (T8-T14):
  θ = 0°:    1.692  1.548  1.542  1.583  1.690  1.450  1.906
  θ = 45°:   1.631  1.690  1.696  1.921  1.633  2.338  2.118
  θ = 90°:   2.068  1.692  1.833  2.028  2.511  2.563  2.629
  θ = 135°:  2.224  1.711  2.528  2.336  2.624  2.628  2.699
Healthy Liver (T15-T21):
  θ = 0°:    0.216  0.223  0.115  0.107  0.259  0.251  1.109
  θ = 45°:   0.104  0.267  0.102  0.340  0.333  0.613  1.102
  θ = 90°:   0.103  0.260  0.103  0.352  0.286  0.450  1.084
  θ = 135°:  0.381  0.259  0.160  0.322  0.457  0.436  1.106

Figure 2: Average grey-level for the three liver classes (liver cyst, fatty liver and healthy liver) in MRI
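For the SGLCM-derived measures reported in Tables 1-4, a minimal sketch is given below. It builds the co-occurrence matrix directly with NumPy for inter-sample distance d = 1 and the four directions θ = 0°, 45°, 90° and 135°, then derives entropy, correlation, contrast and homogeneity from the normalised matrix. The paper does not reproduce the exact feature formulas used by the authors, so the standard Haralick-style definitions are assumed here.

```python
import numpy as np

OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}  # (row, col) steps for d = 1

def glcm(block, angle, levels=256):
    """Co-occurrence matrix f(i, j | d=1, theta) for one direction."""
    dr, dc = OFFSETS[angle]
    P = np.zeros((levels, levels), dtype=np.float64)
    rows, cols = block.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[block[r, c], block[r2, c2]] += 1
    return P / P.sum()                        # normalise to a probability matrix

def sos_features(P):
    """Second-order statistics from a normalised co-occurrence matrix."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    si = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sj = np.sqrt(((j - mu_j) ** 2 * P).sum())
    entropy = -(P[P > 0] * np.log2(P[P > 0])).sum()
    correlation = ((i - mu_i) * (j - mu_j) * P).sum() / (si * sj)
    contrast = ((i - j) ** 2 * P).sum()
    homogeneity = (P / (1.0 + np.abs(i - j))).sum()
    return {"entropy": entropy, "correlation": correlation,
            "contrast": contrast, "homogeneity": homogeneity}

roi = np.random.randint(0, 256, size=(4, 4))
for angle in (0, 45, 90, 135):
    print(angle, sos_features(glcm(roi, angle)))
```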
334
4. Results and Analysis

In the approach described above, feature selection of appropriate parameters based on FOS and SGLCM has been presented. For each acquired image, an ROI image block of 4 x 4 pixels is selected to produce SGLCM matrices at angles of 0°, 45°, 90° and 135° using the inter-sampling distance d = 1. From Table 7, it can be seen that entropy and correlation are consistent within the liver classes for MRI images. For contrast and homogeneity, the statistical results do not show a significant difference among the features of the different liver classes, so it is not possible to use contrast and homogeneity for the classification of MRI image textures. Thus, it is concluded that entropy and correlation are the two features potentially useful for discriminating these classes in MRI.

Table 7: Summary of Tables 1 and 2 for entropy and correlation (MRI)

Liver Classes    Entropy (Min - Max)    Correlation (Min - Max)
Liver cyst       5.174 - 7.911          5.962 - 6.997
Fatty liver      2.487 - 3.862          2.300 - 4.932
Healthy liver    0.054 - 1.954          0.071 - 1.500

From the experiment, it was found that the features obtained from FOS, specifically standard deviation, entropy, skewness, kurtosis and coefficient of variance, did not show good classification results, unlike the average grey-level. From Figure 2, it is clearly observed that the average grey-level is consistent within a liver tissue class in MRI images. The FOS parameter ranges for MRI liver images are shown in Table 5. The average grey-level results for liver cyst, fatty liver and healthy liver in CT images range from 91-143, 38-63 and 10-25, respectively (see Table 6). From this preliminary study, the average grey-level is suitable for classification of the liver classes, as observed in Figure 3.

Table 5: Average grey-level based on FOS for MRI

MRI              Average grey-level (Min - Max)
Liver cyst       92 - 146
Fatty liver      41 - 64
Healthy liver    5 - 23

Table 6: Average grey-level based on FOS for CT

CT-scans         Average grey-level (Min - Max)
Liver cyst       91 - 143
Fatty liver      38 - 63
Healthy liver    10 - 25

Figure 3: Average grey-level for the three liver classes (liver cyst, fatty liver and healthy liver) in CT

With regard to the SGLCM results obtained for CT images, only correlation is consistent within the liver classes. The optimum correlation feature ranges selected based on Table 3 are summarised in Table 8. For Ultrasound images, only contrast is consistent within the liver classes. The FOS features with obvious classification ranges are the average grey-level for MRI and CT liver images, and none for Ultrasound.

Table 8: Summary of Tables 3 and 4 for correlation (CT) and contrast (Ultrasound)

Liver Classes    Correlation, CT (Min - Max)    Contrast, Ultrasound (Min - Max)
Liver cyst       4.230 - 4.950                  3.012 - 4.571
Fatty liver      3.170 - 3.900                  1.450 - 2.699
Healthy liver    2.160 - 2.850                  0.103 - 1.109
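Since the selected features separate the three classes by non-overlapping value ranges (Tables 7 and 8), classification can be reduced to an interval check. The sketch below is illustrative only; the class ranges are taken from Table 8 and the function name is ours, not the authors'.

```python
# Hypothetical interval-based classifier using the CT correlation ranges of Table 8.
CT_CORRELATION_RANGES = {
    "liver cyst":    (4.230, 4.950),
    "fatty liver":   (3.170, 3.900),
    "healthy liver": (2.160, 2.850),
}

def classify_by_range(value, ranges=CT_CORRELATION_RANGES):
    """Return the class whose [min, max] interval contains `value`, if any."""
    for label, (lo, hi) in ranges.items():
        if lo <= value <= hi:
            return label
    return "unclassified"      # value falls outside all non-overlapping ranges

print(classify_by_range(4.5))   # -> "liver cyst"
print(classify_by_range(3.0))   # -> "unclassified"
```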
5. Future Work
This paper presents feature extraction experiments for tissue characterisation using SGLCM and FOS. Preliminary results show that FOS and SGLCM features are able to classify liver cyst and fatty liver. The successful features in this study are to be used as parameters in a neural network classification system. We believe that, through the work done and this initial evaluation of FOS and SOS features, multimodality-based systems can be integrated into the existing clinical environment and extended to evaluate more imaging modalities. Additional popular modalities include PET, SPECT and X-ray. In addition, tests on other liver tissue classes, for diseases such as cirrhosis, liver tumour and Hepatitis B viral infection, could also be explored.

6. Conclusion

This paper has shown useful features in the use of texture analysis techniques for the extraction of diagnostic information from different imaging modalities. The successful feature extracted from FOS in MRI and CT images is the average grey-level, whereas SGLCM successfully classified liver tissues using entropy and correlation (MRI), correlation (CT) and contrast (Ultrasound). It is hoped that the features derived here will lead to higher classification performance in future CAD systems, for use in the preliminary diagnosis of liver diseases.

References

[1] A. H. Mir, M. Hanmandlu and S. N. Tandon, "Texture Analysis of CT Images", IEEE Transactions on Computers, vol. 42, no. 4, 1993, pp. 501-507.
[2] S. I. Kim, K. C. Choi and D. S. Lee, "Texture classification using run difference matrix", Proceedings of Ultrasonics Symposium, vol. 2, no. 10, 1991, pp. 1097-1100.
[3] M. Michel, "Fatty liver: Nonalcoholic Fatty Liver Disease (NAFLD) and Nonalcoholic Steatohepatitis (NASH)", MedicineNet, 2005, pp. 1-14.
[4] L. A. Adams, J. F. Lymp, J. St Sauver and S. O. Sanderson, "The natural history of nonalcoholic fatty liver disease: a population-based cohort study", Gastroenterology, vol. 129, 2005, pp. 113-121.
[5] S. Kitaguchi, S. Westland and M. R. Luo, "Suitability of Texture Analysis Methods for Perceptual Texture", Proceedings, 10th Congress of the International Colour Association, vol. 10, 2005, pp. 923-926.
[6] M. Bister, J. Cornelis, Y. Taeymans and N. Langloh, "A generic labeling scheme for segmented cardiac MR images", Proceedings of Computers in Cardiology, vol. 5, no. 2, 1990, pp. 45-48.
[7] L. L. Lanzarini, A. C. Camacho, A. Badran and D. G. Armando, "Images Compression for Medical Diagnosis using Neural Networks", Processing and Neural Networks, vol. 10, no. 5, 1997, pp. 13-16.
[8] R. Haralick, "Statistical and Structural Approaches to Texture", Proceedings of the IEEE, vol. 67, no. 5, 1979, pp. 786-804.
[9] B. Julesz, "Experiments in the Visual Perception of Texture", Scientific American, vol. 232, no. 4, 1975, pp. 34-43.
[10] I. Valanis, S. G. Mougiakakou, K. S. Nikita and A. Nikita, "Computer-aided Diagnosis of CT Liver Lesions by an Ensemble of Neural Network and Statistical Classifiers", Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 3, 2004, pp. 1929-1934.
[11] E. L. Chen, P. C. Chung, C. L. Chen, H. M. Tsai and C. L. Chang, "An automatic diagnostic system for CT liver image classification", IEEE Transactions on Biomedical Engineering, vol. 45, no. 6, 1996, pp. 783-794.
[12] S. G. Mougiakakou, I. Valanis, K. S. Nikita, A. Nikita and D. Kelekis, "Characterization of CT Liver Lesions Based on Texture Features and a Multiple Neural Network Classification Scheme", Proceedings of the 25th Annual International Conference of the IEEE EMBS, vol. 2, no. 3, 2001, pp. 17-21.
336
SESSION TS3C
TOPIC: MULTIMEDIA
SESSION CHAIRMAN: Dr. Lee Chien Sing
_________________________________________________________________________________________
Time      Paper No.   Paper Title                                                              Page No.
_________________________________________________________________________________________
9.00am    TS3C-1      Image Segmentation Using Wavelet Based Agglomerative Hierarchical        337
                      Clustering Techniques
                      Chang Yun Fah, Ee Kok Siong (Multimedia University, Cyberjaya, Selangor, Malaysia)
9.20am    TS3C-2      Feature Vector Selection Based On Freeman Chain Code for Shape           343
                      Classification
                      Suzaimah Ramli, Mohd Marzuki Mustafa, Aini Hussain (UKM, Cheras Kuala Lumpur, Malaysia)
9.40am    TS3C-3      Weed Feature Extraction Using Gabor Wavelet and Fast Fourier Transform   347
                      Asnor Juraiza Ishak, Mohd Marzuki Mustafa, Aini Hussain (Universiti Kebangsaan
                      Malaysia, Bandar Baru Bangi, Selangor, Malaysia)
_________________________________________________________________________________________
M2USIC 2006
TS- 3C
Image Segmentation Using Wavelet Based Agglomerative Hierarchical Clustering Techniques
CHANG YUN FAH, EE KOK SIONG Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Selangor, MALAYSIA.
[email protected],
[email protected]
Abstract: Image segmentation is a process of finding homogeneous regions in an image. In this paper, a feature vector is constructed from the Haar wavelet and Daubechies Db2 wavelet approximation coefficients. The feature vector is then incorporated into agglomerative hierarchical clustering techniques to perform segmentation of an image. The experimental results show that the Db2 wavelet coefficient is a good indicator of image features. It also outperforms the Haar wavelet coefficient and statistical-based feature vectors. Keywords: image segmentation, wavelet transform, agglomerative hierarchical clustering, Ward linkage.
1. INTRODUCTION

Image segmentation is defined as a process of subdividing an image into a number of regions, where each region is homogeneous [1]. The purpose of image segmentation is to identify data arrays that have similar properties and match them with objects in the real world [2]. Image segmentation has been extensively applied in many fields such as image visualization, image coding, image synthesis, and pattern recognition. Extensive discussions on fuzzy clustering, K-means clustering, and segmentation using mixture distributions have been carried out. Besides, wavelet transforms also play an increasing role in segmentation tasks. For example, Salari and Ling [3] performed K-means clustering on wavelet Db4 coefficients for texture segmentation. Other examples of segmentation using wavelets are given in [4] and [5].

In this study, an image is divided into disjoint windows or sub-matrices. This is followed by performing the wavelet transform to obtain the approximation and detail coefficients. The approximation coefficients are taken as a feature vector to represent the characteristics of each window. Finally, the agglomerative hierarchical clustering (AHC) technique is used to find the clusters from the feature vectors; hence it is an alternative method for unsupervised image segmentation [6]. The agglomerative hierarchical clustering techniques allow minimum user intervention in adjusting the quality of segmentation based on their needs and applications. The desired segmentation quality can be achieved by choosing the sub-matrix size and the level of wavelet decomposition.

This paper is a continuation of our previous work in [6], where the feature vector was constructed using various statistical values such as the mean, standard deviation, minimum and maximum of the intensity values.

2. DESCRIPTION OF TECHNIQUES

Let X be the m x n matrix of an image, where m is the number of rows and n is the number of columns. The image is divided into q sub-matrices of size w x w; i.e.

f_h(0) = \{ x_{h,1}, x_{h,2}, \ldots, x_{h,w^2} \}   (1)
for h = 1, 2, ..., q, where x_{hj}, j = 1, 2, ..., w^2 are the intensity values of the h-th sub-matrix.

2.1 Wavelet Transforms

The discrete wavelet transform (DWT) has been widely used in various image applications, such as compression and feature detection [7]. Here, we propose to use it for constructing feature vectors. The DWT can be performed by convolving the original signal f_h(0) with two basic functions, namely the scaling function \varphi(x) and the wavelet function \psi(x), to produce the transform pair

a_j(h, k) = \frac{1}{2^k} \sum_x f_h(k-1)\,\varphi(x)   (2)

d_j(h, k) = \frac{1}{2^k} \sum_x f_h(k-1)\,\psi(x)   (3)

where k is the level of wavelet decomposition. The values a_j(h, k) and d_j(h, k) are called the approximation coefficients and detail coefficients, respectively.

The wavelet transforms used here are the Haar and Daubechies Db2. The Haar wavelet is one of the simplest and oldest orthonormal wavelets [8], and provides the lowest computational cost and memory requirements [11,12]. The basic functions of the Haar wavelet are

\psi(x) = \begin{cases} 1, & x \in [0, 0.5] \\ -1, & x \in [0.5, 1] \\ 0, & \text{elsewhere} \end{cases}   (4)

\varphi(x) = \begin{cases} 1, & x \in [0, 1] \\ 0, & \text{elsewhere} \end{cases}   (5)

On the other hand, the Daubechies Db2 wavelet provides a better resolution for smoothly changing time series [9]. The Db2 scaling and wavelet coefficients are given as follows:

\psi_1 = \frac{1+\sqrt{3}}{4}, \quad \psi_2 = \frac{3+\sqrt{3}}{4}, \quad \psi_3 = \frac{3-\sqrt{3}}{4}, \quad \psi_4 = \frac{1-\sqrt{3}}{4}   (6)

\varphi_1 = \psi_4, \quad \varphi_2 = -\psi_3, \quad \varphi_3 = \psi_2, \quad \varphi_4 = \psi_1   (7)

As we want to construct a feature vector representing the original signal, the approximation coefficients a_j(h, k) are used to form the feature vector:

f_h(k) = \left\{ a_1(h, k), a_2(h, k), \ldots, a_{w^2/2^k}(h, k) \right\}   (8)

Note that the size of f_h(k) is half of the size of f_h(k-1).

2.2 Hierarchical Clustering

The aim of cluster analysis is to search for the optimum number of groupings so that the observations or objects in a particular cluster are similar, but the clusters are dissimilar to each other [10]. A distance matrix D = [D_{hl}] is computed from the feature vectors, where D_{hl}, h, l = 1, 2, ..., q is the Euclidean distance between feature vector f_h and feature vector f_l. The hierarchical clustering [10] process starts with q feature vectors (from q sub-matrices) and groups the two nearest or most similar feature vectors into a cluster, thus reducing the number of clusters to q - 1. Two feature vectors, say f_h and f_l, are fused to form a cluster if D_{hl} is the smallest entry in matrix D. Then, let f_h^r be the h-th feature vector in the r-th cluster and f_l^s the l-th vector in the s-th cluster. The distance between feature vectors f_h^r and f_l^s is calculated using the AHC techniques listed in Table 1 [11]. The above process is repeated until all the feature vectors are grouped together. An objective measure for the goodness of fit of the dendrogram produced by the AHC techniques is the cophenetic correlation coefficient [11]:

\delta = \frac{\sum_{i<j} (W_{ij} - \bar{W})(Z_{ij} - \bar{Z})}{\sqrt{\sum_{i<j} (W_{ij} - \bar{W})^2 \sum_{i<j} (Z_{ij} - \bar{Z})^2}}   (9)

where W_{ij} and Z_{ij} are the distances between object i and object j in group W and group Z, respectively, and \bar{W} and \bar{Z} are the corresponding average values.
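A minimal sketch of the feature construction of Section 2 is shown below, assuming the PyWavelets package: each w x w sub-matrix is decomposed with dwt2 and its approximation coefficients are flattened into the feature vector f_h(k). The function names and the synthetic test image are illustrative assumptions, not the authors' code.

```python
import numpy as np
import pywt

def block_feature_vectors(image, w=4, wavelet="db2", level=1):
    """Split an image into w x w sub-matrices and return one feature vector per
    block, built from the level-`level` approximation coefficients (eq. 8)."""
    m, n = image.shape
    features = []
    for r in range(0, m - m % w, w):
        for c in range(0, n - n % w, w):
            block = image[r:r + w, c:c + w].astype(float)
            coeffs = block
            for _ in range(level):                     # repeated single-level 2-D DWT
                coeffs, _details = pywt.dwt2(coeffs, wavelet)
            features.append(coeffs.ravel())            # keep approximation coefficients only
    return np.array(features)

# Example with a synthetic 256 x 256 image and Db2 at level 1
img = np.random.rand(256, 256)
F = block_feature_vectors(img, w=4, wavelet="db2", level=1)
print(F.shape)    # (4096, number of approximation coefficients per block)
```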
Table 1: Distance measures for the various agglomerative hierarchical clustering techniques (n_r and n_s are the numbers of feature vectors in clusters r and s)

Single linkage:   d^{(1)}_{(rs)} = \min \, dist(f_h^r, f_l^s), \quad h = 1, 2, \ldots, n_r, \; l = 1, 2, \ldots, n_s   (10)
Complete linkage: d^{(2)}_{(rs)} = \max \, dist(f_h^r, f_l^s), \quad h = 1, 2, \ldots, n_r, \; l = 1, 2, \ldots, n_s   (11)
Average linkage:  d^{(3)}_{(rs)} = \frac{1}{n_r n_s} \sum_{h=1}^{n_r} \sum_{l=1}^{n_s} dist(f_h^r, f_l^s)   (12)
Centroid linkage: d^{(4)}_{(rs)} = dist(\bar{f}^r, \bar{f}^s), \quad \text{where } \bar{f}^r = \frac{1}{n_r} \sum_{h=1}^{n_r} f_h^r   (13)
Ward linkage:     d^{(5)}_{(rs)} = \frac{n_r n_s}{n_r + n_s}\, d^{(4)}_{(rs)}   (14)

3. RESULT AND DISCUSSION

We use the standard Lena test image with a size of 256 x 256 pixels. Lena contains typical image features such as curved edges, textures, and contrast. Three experiments are carried out to determine the optimum segmentation result, i.e. selecting the clustering technique, the sub-matrix size and the level of wavelet decomposition. Finally, the proposed method is compared with the segmentation method suggested by [6].

3.1 Selecting the clustering technique

In the first experiment, the five clustering techniques listed in Table 1 are compared. The window size is fixed at 4 x 4 and the number of segments is fixed at 4. The feature vectors are obtained using the first level of decomposition of the Haar and Db2 wavelet transforms. The segmentation results using the Haar and Db2 wavelet transforms are shown in Figure 1 and Figure 2, respectively. Single linkage has the tendency to show only one segment due to the chaining effect. The Centroid linkage is probably not appropriate due to the non-monotonic cluster tree factor. Visual observation of Figures 1 and 2 shows that Complete linkage and Average linkage segment relatively poorly with both the Haar and Db2 wavelet transforms.

Figure 1: Segmentation results using k=4, 4 segments and Haar wavelet decomposition at level 1. (i) Original Lena image, (ii) Single linkage, (iii) Complete linkage, (iv) Average linkage, (v) Centroid linkage, and (vi) Ward linkage.

Figure 2: Segmentation results using k=4, 4 segments and Db2 wavelet decomposition at level 1. (i) Original Lena image, (ii) Single linkage, (iii) Complete linkage, (iv) Average linkage, (v) Centroid linkage, and (vi) Ward linkage.

Tables 2 and 3 show the cophenetic correlation coefficients for the respective dendrograms using the Haar and Db2 wavelet decompositions. We can see from the tables that the Db2 wavelet decomposition gives more reliable cophenetic coefficients for all the clustering techniques. Furthermore, we observe that Ward linkage using the Db2 wavelet provides the most consistent clustering result. This observation agrees with the conclusion given in [6]. Thus, only Ward linkage will be discussed in the following sections.
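A sketch of the clustering and evaluation step using SciPy is given below, under the assumption that the block feature vectors from the previous step are stacked row-wise: Ward linkage builds the dendrogram, fcluster cuts it into four segments, and cophenet gives the cophenetic correlation coefficient of equation (9). Function names and the random test data are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, cophenet
from scipy.spatial.distance import pdist

def segment_blocks(features, n_segments=4, method="ward"):
    """Cluster block feature vectors with an AHC technique and report the
    cophenetic correlation coefficient of the resulting dendrogram."""
    D = pdist(features, metric="euclidean")    # pairwise Euclidean distances D_hl
    Z = linkage(features, method=method)       # agglomerative hierarchical clustering
    coph_coeff, _ = cophenet(Z, D)             # goodness of fit of the dendrogram
    labels = fcluster(Z, t=n_segments, criterion="maxclust")
    return labels, coph_coeff

features = np.random.rand(4096, 9)             # e.g. one row per 4 x 4 sub-matrix
labels, c = segment_blocks(features, n_segments=4, method="ward")
print(labels.shape, round(c, 4))
```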
Table 2: Cophenetic coefficients for segmentation results using k=4, 4 segments and Haar wavelet decomposition at levels 1, 2 and 3

Technique    Level 1    Level 2    Level 3
Single       0.1398     0.5549     0.5549
Complete     0.6565     0.6920     0.6828
Average      0.7187     0.7100     0.7058
Ward         0.6414     0.6911     0.6804

Table 3: Cophenetic coefficients for segmentation results using k=4, 4 segments and Db2 wavelet decomposition at levels 1, 2 and 3

Technique    Level 1    Level 2    Level 3
Single       0.1484     0.1331     0.1220
Complete     0.6440     0.6372     0.6553
Average      0.6908     0.6971     0.6777
Ward         0.6629     0.6672     0.6692

3.2 Selecting the sub-matrix size

The proposed AHC techniques give the flexibility to select the sub-matrix size based on the user's needs. It can be any rectangular sub-matrix, but we only focus on the square case, where 1 <= w < min(m, n). Three different sub-matrix sizes are considered, i.e. 3 x 3, 4 x 4 and 5 x 5. Note that the larger the sub-matrix size, the lower the computation time, but the more serious the blocking effect.

Figure 3 and Figure 4 show the segmentation results using the Haar wavelet and Db2 wavelet, respectively. The number of segments is fixed at 4 and the level of decomposition is 2. Clearly, the Db2 wavelet preserves the image features better and has a better resolution when the sub-matrix size varies. This is because Db2 is able to pick up details that are missed by the Haar wavelet.

Figure 3: Image segmentation using the Haar wavelet at 1st level decomposition and Ward linkage with 4 segments. (i) 3 x 3 sub-matrix, (ii) 4 x 4 sub-matrix, and (iii) 5 x 5 sub-matrix.

Figure 4: Image segmentation using the Db2 wavelet at 1st level decomposition and Ward linkage with 4 segments. (i) 3 x 3 sub-matrix, (ii) 4 x 4 sub-matrix, and (iii) 5 x 5 sub-matrix.

3.3 Selecting the level of wavelet decomposition

One of the advantages of the wavelet transform comes from its multiresolution property, as given by Equations (2) and (3). This experiment investigates the effect of the wavelet decomposition level on the segmentation result. Figure 5 shows the segmentation results using the Haar and Db2 wavelets at the 1st, 2nd and 3rd levels of wavelet decomposition. As the level of wavelet decomposition increases, the segmentation results obtained from the Haar wavelet vary. However, the Db2 wavelet gives a similar result for the different decomposition levels, although the number of approximation coefficients in the feature vector is halved at every higher level. This indicates that one may increase the computational efficiency without sacrificing the quality of segmentation by using a higher Db2 wavelet decomposition level. The objective measures using the cophenetic correlation coefficient are given in Table 3 and Table 4.
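The halving of the feature-vector length with each additional decomposition level can be seen directly with PyWavelets; this is only an illustration of the multiresolution property discussed above, not the authors' code, and uses an 8 x 8 block with the Haar wavelet for clarity.

```python
import numpy as np
import pywt

block = np.random.rand(8, 8)          # a single sub-matrix (8 x 8 for clarity)
coeffs = block
for level in (1, 2, 3):
    coeffs, _ = pywt.dwt2(coeffs, "haar")    # keep only the approximation band
    print(f"level {level}: {coeffs.size} approximation coefficients")
# With the Haar wavelet each level halves both dimensions:
# level 1: 16 coefficients, level 2: 4, level 3: 1
```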
Figure 5: Segmentation results using k=4, Ward linkage, 4 segments and various decomposition levels. (i) Haar at level 1, (ii) Haar at level 2, (iii) Haar at level 3, (iv) Db2 at level 1, (v) Db2 at level 2, (vi) Db2 at level 3.

3.4 Comparison with previous work

Chang et al. [6] suggested using AHC techniques as an image segmentation procedure. They defined the feature vector using statistical values, namely the mean, standard deviation, minimum and maximum intensity values. However, it is known that certain statistical values tend to be more sensitive to particular image features. For instance, the mean value is more sensitive to image luminance, while the variance is used to measure image contrast. Thus, a statistical value may be adequate for one image, but not for others. To overcome this problem, we considered using the approximation coefficients from wavelet transforms as the feature vector. A comparison between the statistical-based feature vector and the proposed Db2 wavelet-based feature vector has been carried out, and their optimum segmentation results are shown in Figure 6. Four images are compared, namely the Lena, Cameraman, Pepper and Mandrill images. All the images are 256 x 256 pixels. Some of the settings are fixed, such as the sub-matrix size of 4 x 4 and the AHC technique used (Ward linkage). Our study shows a significant improvement in segmentation results for all the images when the Db2 wavelet approximation coefficients are used.

Figure 6: Segmentation using Ward linkage at different numbers of segments for the previous method and the proposed method. (1) Lena image, (2) Cameraman image, (3) Pepper image, and (4) Mandrill image. (a) Original image, (b) statistical-based feature vector, and (c) Db2 wavelet-based feature vector.

4. CONCLUSION

In this study, five AHC techniques and two wavelet transforms are analyzed. We observed that Ward linkage is the most reliable clustering technique for carrying out the image segmentation task. On the other hand, we found that the approximation coefficients generated from the Db2 wavelet transform produce better segmentation results compared to the Haar wavelet and the statistical-based method. Besides, the Db2 wavelet transform results in higher efficiency when increasing the sub-matrix size or the level of wavelet decomposition, without sacrificing the quality of segmentation.
341
References

[1] Shinn-Ying Ho and Kual-Zheng Lee, "A Simple and Fast GA-SA Hybrid Image Segmentation Algorithm", GECCO, 2000, pp. 718-725.
[2] S.D. Olabarriaga and A.W.M. Smeulders, "Interaction in the Segmentation of Medical Images: A survey", Medical Image Analysis, Vol. 5, pp. 127-142, 2001.
[3] E. Salari and Z. Ling, "Texture Segmentation Using Hierarchical Wavelet Decomposition", Pattern Recognition, Vol. 28, No. 12, pp. 1819-1824, 1995.
[4] B.G. Kim, J.I. Shim and D.J. Park, "Fast Image Segmentation Based On Multi-resolution Analysis and Wavelets", Pattern Recognition Letters, Vol. 24, No. 16, pp. 2995-3006, 2003.
[5] S. Deng, S. Latifi and Emma Regentova, "Document Segmentation Using Polynomial Spline Wavelets", Pattern Recognition, Vol. 34, No. 12, pp. 2533-2545, 2001.
[6] Y.F. Chang, M.O. Rijal, N.M. Noor, T.H. Siew and Z.Y. Ng, "Exploring The Use of Agglomerative Hierarchical Clustering Techniques As An Image Segmentation Procedure", Kuala Lumpur: International Statistics Conference, December 2005.
[7] W.J. Yi, K.S. Park and J.S. Paick, "Parameterized characteristic of elliptic sperm heads using Fourier representation and wavelet transform", Proceedings of the 20th Annual International Conference of the IEEE, Vol. 2, 1998, pp. 974-977.
[8] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Second Edition, Prentice Hall, US, 2002.
[9] M. Mokji and S.A.R. Abu-Bakar, "Fingerprint matching based on directional image constructed using expanded Haar wavelet transform", Proceedings of the International Conference on Computer Graphics, Imaging and Visualization (CGIV'04), 2004, pp. 149-152.
[10] Alvin C. Rencher, Methods of Multivariate Analysis, Second Edition, Wiley-Interscience publication, US, 2002.
[11] Everitt, B.S., Graphical Techniques for Multivariate Data, London: Heinemann Educational Book Ltd., 1978.
M2USIC 2006
TS- 3C
FEATURE VECTOR SELECTION BASED ON FREEMAN CHAIN CODE FOR SHAPE CLASSIFICATION
Suzaimah Ramli, Mohd Marzuki Mustafa, Aini Hussain Dept.of Electrical, Electronic and Systems Engineering Faculty of Engineering, Universiti Kebangsaan Malaysia 43600 UKM Bangi, Malaysia. Tel:603-89216699 Fax: 603-89216146 Email:
[email protected]
ABSTRACT

This paper focuses on selecting suitable feature vectors using the well-known Freeman Chain Code (FCC) for shape classification purposes [3]. However, FCC has a limitation: the generated chain codes are unsuitable for use with a neural classifier, since the number of codes generated for each shape varies. Alternatively, a set of eight feature vectors, which is the histogram of the shape number, is derived to represent the object shape. A correlation test is performed to analyze the feature vectors and determine the optimal features amongst the derived set. The selected optimal features are fed into a neural network, a back-propagation multi-layer perceptron (BP-MLP), in order to test the efficacy of the selected feature vectors. The BP-MLP is trained using the selected features and its output is subjected to a pattern recognition task of classifying rounded and non-rounded shapes. Initial findings suggest that the extracted feature vectors of each class possess unique characteristics and can be used to discriminate between the classes.

INTRODUCTION

PR and classification are important tasks in a variety of engineering and scientific disciplines such as computer vision, artificial intelligence, medicine and remote sensing [5]. In designing a PR system, there are three main aspects involved, in particular the data acquisition and preprocessing module, which includes segmentation, feature extraction and classification. One of the issues that requires careful attention in a PR system is feature extraction and selection. A feature vector serves as a reduced representation of the original data/signal/input that helps to avoid the curse of dimensionality in a PR task. Feature selection entails selecting a subset, amongst a set of candidate features, that performs best under a classification system. This procedure can reduce not only the cost of recognition, by reducing the number of features that need to be collected, but in some cases it can also provide better classification accuracy [1]. The paper is organized as follows. The next section focuses on the system description, followed by the methodology of the work and a discussion of the algorithm to derive the HSN. Section 3 discusses the results while Section 4 concludes this paper and suggests future works.

SYSTEM DESCRIPTION

Figure 1 depicts an overview of the overall system and outlines its basic structure. It consists of the following steps: preprocessing, feature extraction and feature selection using the rules of thumb of FCC, prior to classification. The feature extraction stage covers the algorithm to derive the chain code and shape number. In this study, an image analysis preprocessing stage is performed to normalize the shapes and extract the features that are relevant for the classification process, as in Figure 1 below. Once the feature vectors are extracted from the images, they are used as input and fed to the neural network for the purpose of image classification.

Figure 1: Block diagram of a conventional object shape analysis and classification system (Digitized Image -> Preprocessing: thresholding, morphological operation, grid to get a uniform image, boundary tracing to get the edge -> Feature Extraction: FCC, Histogram of Shape Number (HSN), t-test analysis -> Classification: ANN-MLP -> Recognized Image/Shape)
343
METHODOLOGY

A. Pre-processing

The pre-processing component performs the following operations: thresholding, morphological operation, gridding to acquire a uniform image, and edge detection via a boundary tracing algorithm. Thresholding is a non-linear operation that converts a grey-scale image into a binary image, where the two levels are assigned to pixels that are below or above the specified threshold value. A morphological operation applies a structuring element to an input image, creating an output image of the same size. The most basic morphological operations are dilation and erosion. Edge detection refers to the process of identifying and locating sharp discontinuities in an image. Edges are defined as discontinuities in the image intensity due to changes in scene structure. These discontinuities originate from different scene features and can describe the information that an image of the external world contains. As such, image enhancement and smoothing attempt to make these discontinuities apparent to the detector, so that desirable edges can be extracted.

B. Feature Extraction

The preprocessed images are further analyzed to extract important features that represent the images in a compact and reduced form. In this work, we consider feature extraction based on the shape of the plastic bottles. There are three main approaches to shape representation: 1) boundary or contour-based, 2) region or area-based, and 3) skeleton-based. The identified feature vectors are the histogram of the shape number derived from the generated chain code. In the following subsections, we describe in detail the FCC algorithm, the shape number and the histogram of the shape number.

i. Freeman Chain Code

The FCC algorithm is implemented to obtain a simplified representation of the object shape. FCC is a compact way to represent the contour of an object [1]. The chain code is an ordered sequence of n links, which can be represented as {c_i, i = 1, 2, ..., n}, where c_i is a vector connecting neighboring contour pixels. The directions of c_i are normally coded with the integer values k = 0, 1, ..., K-1 in a counterclockwise direction starting from the direction of the positive x-axis. The number of directions K takes integer values of 2^(M+1), where M is a positive integer. Chain codes with K > 8 are called generalized chain codes.

ii. Shape Number

The chain code of a boundary depends on the starting point. However, the code can be normalized by a straightforward procedure. The normalized form of the chain code is called the shape number. For a chain code generated by starting in an arbitrary position, we treat it as a circular sequence of direction numbers and redefine the starting point so that the resulting sequence of numbers forms an integer of minimum magnitude. We can also normalize for rotation by using the first difference of the chain code instead of the code itself. This difference is obtained simply by counting (counterclockwise) the number of directions that separate two adjacent elements of the code. The shape number of such a boundary, based on the 4-directional code, is defined as the first difference of smallest magnitude.

iii. Histogram of the shape number

The HSN is meant to group together objects that look similar to a human observer. It is not meant for exact detection and classification tasks. The HSN is calculated from the chain code representation of a contour (refer to Figure 2 (a)-(d) for an example). The starting point for the chain coding is marked with a black circle, and the chain coding direction is clockwise. In Figure 2, (c)-(d) are the chain code and the HSN of the contour of the square. Moreover, n is even for a closed boundary, and its value limits the number of possible different shapes. The HSN is a translation- and scale-invariant shape descriptor. It can be made invariant to rotations of 90° because a 90° rotation causes only a circular shift in the FCC [2]. To achieve better rotation invariance, the normalized chain code histogram, which is the HSN, should be used.
344
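To make the chain-code processing concrete, here is a small sketch (our own illustrative code, not the authors') that computes the first difference of an 8-direction Freeman chain code, the shape number (the rotation of minimum magnitude), and the histogram of the resulting codes as an 8-bin feature vector.

```python
from collections import Counter

def first_difference(chain, K=8):
    """First difference of a circular Freeman chain code: number of direction
    steps (counterclockwise) between adjacent code elements."""
    return [(chain[(i + 1) % len(chain)] - chain[i]) % K for i in range(len(chain))]

def shape_number(chain, K=8):
    """Shape number: the circular rotation of the first difference of minimum
    magnitude, treating the sequence as an integer written in base K."""
    diff = first_difference(chain, K)
    rotations = [diff[i:] + diff[:i] for i in range(len(diff))]
    return min(rotations)      # lexicographic min of equal-length sequences

def hsn(chain, K=8):
    """Histogram of the shape number, normalised to be scale invariant."""
    sn = shape_number(chain, K)
    counts = Counter(sn)
    return [counts.get(code, 0) / len(sn) for code in range(K)]

# Chain code of a small axis-aligned square traced clockwise (8-direction code)
square = [0, 0, 0, 6, 6, 6, 4, 4, 4, 2, 2, 2]
print(hsn(square))
```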
C. Classification

Once the feature vectors are extracted from the images, they are used as input and fed to the neural network for the purpose of shape classification. Neural networks have proved to be invaluable in applications where a function-based model or parametric approach to information processing is difficult to formulate. A neural network can be summarized as a collection of units connected in some pattern to allow communication between the units [4]. The BP-MLP is a network of nodes arranged in three layers: the input, hidden and output layers. The input and output layers serve as nodes to buffer the input and output of the model respectively, and the hidden layer provides a means for input relations to be represented in the output. Before any data has been run through the network, the weights of the nodes are random, which has the effect of making the network much like a newborn's brain: developed but without knowledge [7]. All necessary parameters are set before the network is trained. The back-propagation technique is then evaluated; the classifier learns from the training set, in which every image is represented by a feature vector composed of the extracted HSN. In this work, we consider the BP-MLP only. Other variations of the NN models [6] are also being considered but are not reported here.

The t-test analysis is also done to generate valid data before we can use the data as feature vectors in the neural network. Table 1 below shows the results of the t-test analysis.

RESULTS & DISCUSSION

A collection of 100 images of various shaped objects constitutes the database used to generate the input images. All these images are divided into two groups: rounded and non-rounded objects. In this work, the extracted FCC-based feature vectors are the HSN derived from the generated chain code. To accomplish this, the object boundary or outline is first represented using the FCC technique described earlier. Chain coding provides the points in relative position to one another, independent of the coordinate system. To generate the chain code, the boundary tracing algorithm is implemented by detecting the direction of the next-in-line pixel in the clockwise direction. Displays of the feature vectors generated with the FCC method are shown in Figure 2. Figure 3 illustrates the step-by-step images of four different shapes produced during preprocessing and their generated shape number histograms obtained via the FCC-based feature extraction method. The top display, Figure 3(a), belongs to the rounded/curved image class whilst the bottom belongs to the non-rounded image class. The images used in the study are generated images produced using the Adobe Photoshop software.

Figure 3: Samples of preprocessed images (original image, binarized object boundary) and the generated HSN from FCC analysis for (a) the rounded/curved shape class and (b) the non-rounded/non-curved shape class.

A. T-test Result

The t-test assesses whether the means of two groups of shapes are statistically different from each other. In this work, from the generated HSN, we perform the t-test analysis to determine which of the HSN data sets (H1-H8) show no relationship and a significant difference between the two classes.

Table 1: Result of t-test analysis

Data set (Rounded class vs Non-rounded class)    T-test result
H1 vs H1    h = 1; ci = [-349.92, -138.02]; significance = 1.88e-005
H2 vs H2    h = 1; ci = [-214.68, -105.43]; significance = 1.47e-007
H3 vs H3    h = 1; ci = [-0.58, -0.20]; significance = 1.40e-004
H4 vs H4    h = NaN; ci = [0, 0]; significance = NaN
H5 vs H5    h = NaN; ci = [0, 0]; significance = NaN
H6 vs H6    h = NaN; ci = [0, 0]; significance = NaN
H7 vs H7    h = 1; ci = [0.12, 1.27]; significance = 0.02
H8 vs H8    h = 1; ci = [-211.76, -104.30]; significance = 1.35e-007

From Table 1, it can be seen that the data sets H1-H3 and H7-H8 are significantly different and have no relationship between the two classes. Data sets H4-H6 can be ignored since most of the data values are 0. From this test, the data sets which can be used as input to the neural network are only H1, H2, H3, H7 and H8.
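The per-bin significance test can be reproduced with SciPy's two-sample t-test; this sketch assumes two arrays holding one HSN bin (e.g. H1) for every rounded and non-rounded training image, and mirrors the h / significance values reported in Table 1. The sample values are hypothetical.

```python
import numpy as np
from scipy import stats

def hsn_bin_ttest(rounded_values, non_rounded_values, alpha=0.05):
    """Two-sample t-test on one HSN bin across the two shape classes.
    Returns h (1 if the class means differ significantly) and the p-value."""
    t_stat, p_value = stats.ttest_ind(rounded_values, non_rounded_values)
    h = int(p_value < alpha)
    return h, p_value

# Hypothetical H1 values for a handful of images from each class
rounded_h1 = np.array([0.31, 0.29, 0.33, 0.30, 0.28])
non_rounded_h1 = np.array([0.12, 0.10, 0.14, 0.11, 0.13])
print(hsn_bin_ttest(rounded_h1, non_rounded_h1))   # -> (1, small p-value)
```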
B. Classification Result

Once the feature vectors are extracted from the images, they are used as input and fed to the neural network for the purpose of shape classification. The neural network classification is done with two groups of data. For the first trial, we perform the classification with all the data sets, i.e. H1-H8. For the second trial, we only test the data sets shown to have no relationship in the t-test analysis above, i.e. H1-H3 and H7-H8. Table 2 below shows the results of the first and second trials.

Table 2: Result of NN classification

(a) First trial: classification result for all data sets (H1-H8)
Type of object    No. of objects    Classification rate (%)
Non-Round         50                86%
Round             50                82%

(b) Second trial: classification result for the H1, H2, H3, H7 and H8 data sets
Type of object    No. of objects    Classification rate (%)
Non-Round         50                96%
Round             50                90%

From Table 2, we can conclude that the second trial, which used only the feature vectors retained after the selection process, gives a better result than the first trial, in which all the feature vectors were used. These results show that feature selection is an important step after extraction, and that it has helped improve the effectiveness of the FCC-based feature vector in classifying all the shapes tested. In short, the feature selection procedure, which can be performed using statistical analysis, helps optimize the extracted feature vectors for PR purposes. This work is still in its early stages and further work involving more recent and advanced algorithms is required to enhance the results.

CONCLUSION AND FUTURE WORKS

In conclusion, this paper has presented a method to select feature vectors based on FCC and to perform shape analysis for an automatic shape classification task. Initial results suggest that the extracted feature vectors obtained via the HSN have unique characteristics and can be used as signatures to represent various object shapes, namely the rounded and non-rounded shapes. The HSN-based feature vectors, though considered simple descriptors, are able to correctly classify the two classes of shapes with more than 80% accuracy. The initial findings also suggest that the extracted feature vectors of each class possess unique characteristics and can therefore be used to discriminate between the classes. For future improvement, we will continue with larger and more complicated sets of data and try other techniques that can better classify these two classes of objects.

REFERENCES

[1] Anil K. Jain, Robert P.W. Duin, and Jianchang Mao, "Statistical Pattern Recognition: A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, January 2000.
[2] H. Freeman and A. Saghri, "Generalized chain codes for planar curves", Proceedings of the 4th International Joint Conference on Pattern Recognition, pages 701-703, Kyoto, Japan, November 7-10, 1978.
[3] Herbert Freeman, "Computer processing of line-drawing images", Computing Surveys, 6(1):57-97, March 1974.
[4] Gonzalez, R. C. and Woods, R. E., Digital Image Processing, Upper Saddle River: Prentice Hall, 2002.
[5] Jahne, B., Digital Image Processing, Fifth Revised and Extended Edition, Germany: Springer-Verlag, 2002.
[6] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company, Inc., 1994.
[7] L. Fausett, "Fundamentals of Neural Network Architectures, Algorithms and Applications", Prentice-Hall, Inc., Upper Saddle River, USA, New York, 1990.
346
M2USIC 2006
TS- 3C
Weed Feature Extraction Using Gabor Wavelet and Fast Fourier Transform
Asnor Juraiza Ishak, Mohd Marzuki Mustafa & Aini Hussain
Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia. Tel: 603-89216699 Fax: 603-89216146 Email:
[email protected]
Abstract: In the plantation field, broad and narrow weed species are typically controlled differently, with selective herbicides or with different tank mixes. Estate management and production costs can be reduced, and the environment protected and conserved, if fewer chemicals are used. As such, selective patch spraying is essential in order to reduce the amount of chemical herbicide used in agricultural practice. In order to do so, weed detection is a necessity. In this work, an image processing approach is proposed to detect and classify weeds according to their class so that the selective patch spraying strategy can be implemented. This paper mainly deals with feature vector extraction to achieve the ultimate goal of developing a system for selective patch spraying of weeds. In order to extract weed texture features, we have considered the combination of Gabor wavelet and FFT algorithms to extract a new feature vector which we term "difFFTgabor". Results show that the difFFTgabor texture features can be used in weed classification quite effectively. Keywords: Image Processing, Feature Extraction, Gabor Wavelet Transformation, Weed Classification
1. INTRODUCTION

Textures are replications, symmetries and combinations of various basic patterns or local functions, usually with some random variation [10]. Texture features of weed species have been applied in distinguishing weed species by Meyer et al. [3]. The most promising method for texture feature extraction that has emerged is the Gabor wavelet. The texture features are obtained by filtering the input image with a set of two-dimensional (2-D) Gabor filters. Each filter is characterized by a preferred orientation and a preferred spatial frequency. Typically, an image is filtered with a set of Gabor filters of different preferred orientations and spatial frequencies that appropriately cover the spatial frequency domain, and the features obtained form a feature vector field that is further used for analysis, classification, or segmentation. Gabor feature vectors can be used directly as input to a classification or segmentation operator, or they can be further transformed into new feature vectors that are then used as input. Daugman [5] indicated that Gabor wavelets resemble the receptive field profiles of simple cortex cells. Beck et al. [4] found that the Gabor transform was the best among the tested candidates, matching the results of human vision studies. Bovik et al. [1] further emphasized that 2-D Gabor filters have been shown to be particularly useful for analyzing texture images containing highly specific frequency or orientation characteristics. In Tang et al. [7], texture-based weed classification using the Gabor wavelet was adopted to classify images into broad leaf and grass categories.

Applications based on the Fast Fourier Transform (FFT), such as image processing, require high computational power, plus the ability to experiment with algorithms. The FFT makes tractable the computational intensity of processing even large images with complicated filters [10]. A digital image filter can be applied by performing a convolution of the input image and the filter's impulse response [10]. C.H. Park and H. Park [2] apply the Fast Fourier Transform (FFT), the fast algorithm for computing the DFT, to construct a directional image. Xiong et al. [3] propose an effective scheme for rotation-invariant texture classification using empirical mode decomposition and the 2D FFT.

Image processing of very close-range video images or digital photographs can be used for recognition of plant species. Image processing involves enhancing an image and extracting information or features from it. Several research efforts in weed detection using image processing have been conducted abroad [6], involving various types of weeds that are dependent on regional and climatic conditions. Therefore, in this work, the main objective is to implement the Gabor wavelet and FFT algorithms in selecting the best feature vectors for the classification of weeds in view of the local climatic conditions. Specifically, we will classify a weed as either the narrow or broad weed type using the proposed feature vectors. This paper is structured as follows.
The image preprocessing measures that have a direct influence on the feature extraction results are described in Section 2. Section 3 then describes both the Gabor filter and FFT algorithms. Results are presented and discussed in Section 4, followed by concluding remarks in Section 5.

2. IMAGE PREPROCESSING

The preprocessing operation involves pretreatment of the raw RGB image. Typically, it involves normalization to produce uniformity in terms of the image size, so resizing is required. In this work, all images were resized to 100 by 100 pixels. Meyer et al. [3] applied Excess Green (ExG) to separate the plant and soil regions in their weed species identification research. Similarly, the colour index used for segmentation in this work is called Modified Excess Green (MExG). Prior to feature extraction, the images need to be converted to the MExG level by removing the red and blue components. MExG conversion from RGB involves background segmentation that separates the weed region from the background and internal voids. The overall block diagram of the weed classification system is shown in Figure 1. Figure 2 shows some results of the image preprocessing stage involving MExG conversion.

Figure 1: Overall block diagram of the weed classification system (Digitized weed images -> Image Preprocessing: resize, modified excess green -> Feature Extraction Process: Gabor filter, Fast Fourier Transform -> Classification: threshold -> Narrow or Broad)

Figure 2: MExG preprocessing results (conversion from digitized image to MExG image) for (a) narrow and (b) broad weed types

3. FEATURE EXTRACTION

Image feature vector extraction is an area of image processing which involves using algorithms to detect and isolate various desired portions or features of a digitized image. The objective of the feature extraction process is to represent the raw image in a reduced and compact form in order to facilitate and speed up decision-making processes such as classification. In this work, the Gabor wavelet and FFT algorithms were used to extract feature vectors from the images.

3.1 Gabor Wavelet Function

The Gabor wavelet transformation enables us to obtain image representations which are locally normalized in intensity and decomposed in spatial frequency and orientation. In this work, we have employed the Gabor wavelet function used by Naghdy et al. (1996), defined as follows:

h(x, y) = \exp\left[ -\frac{x^2 + y^2}{2}\,\alpha^{2j} \right] \cdot \exp\left[ j\pi\alpha^{j} (x\cos\theta + y\sin\theta) \right]

Different choices of frequency j and orientation θ construct different sets of filters. In this work, the frequency value j has been fixed at 7, whereas the orientation θ is set to four different values: 0°, 45°, 90° and 135°. Each filter is made of a pair of filters, which are the real and imaginary parts of the complex sinusoid. These pairs are convolved separately with the MExG channel signal of the texture image. Convolution between the green channel and the Gabor wavelet filters is performed since the MExG channel is found to have the best texture quality and provides the best contrast level between plants and soil compared to the red and blue channels [7]. Fixing the frequency level at 7, the output is the modulation of the average of the convolution outputs from the real and imaginary filter masks over all convolved pixels in the green channel image, which is computed as

Output = \sqrt{R_{ave}^2 + I_{ave}^2}

where R_ave is the result of the convolution of the sample image region with the real filter mask and I_ave is the result of the convolution of the sample image region with the imaginary filter mask. This means that every complex filter pair for one frequency level is employed to capture one feature of a texture. Therefore, every image has a set of 4 Gabor filter outputs. These extracted Gabor outputs are then used as input to a fast Fourier transform to form a set of three feature vectors.
348
M2USIC 2006
TS- 3C
real and imaginary filter masks on all convolved pixels in the green channel image, which is computed as follow 2 2 Output = R ave + I ave
Rave is the result of the convolution of the sample image region with the real filter mask and I ave
where
is the result of the convolution of the sample image region with the imaginary filter mask. This equation means every complex filter pair for one frequency level is employed to capture one feature of a texture. Therefore, every image will have a set of 4 Gabor filter output. These extracted Gabor output are then used as input to a fast Fourier Transform to form a set of three feature vectors. 3.2. Fast Fourier Transform The fast Fourier Transform (FFT) is a faster version of the discrete Fourier transform. The FFT algorithm will able to do the same thing as the DFT, but in much less time and less complex. The 2D FFT essentially decomposes a discrete signal into its frequency components, and shuffles the low frequency components to the corners. Samples of weed images that have gone through the Gabor wavelet and FFT processing are as shown in figure 3. The 2D Fourier Transform is defined as follow.
Next, the difference between the two FFT values is computed and this value is what we called as difFFTgabor which we used as a feature vector. The task is repeated for three other pair combinations that is the [0° & 90°], [0° & 135°] and [45° & 90°]. As a result, for each image there are four different difFFTgabor feature vectors that can be used to represent it, namely the [0° & 45°], [0° & 90°], [0° & 135°] and [45° & 90°] orientation pairs.
4. RESULTS AND DISCUSSION To classify narrow and broad weed accurately, one needs to select and extract the best feature vectors. This means one must determine a set of feature vectors that can uniquely represent the two weed classes so that the features can be used for the classification of narrow and broad weed types. In choosing the best features, the four sets of the extracted features are used in the task to classify between narrow and broad weed. Each extracted feature set is tested for its classification accuracy. Figure 4 shows the feature extraction process to select the best feature vectors based on the FFT of the Gabor filter orientation pair sets. Raw image
F (ω1 , ω 2 ) = ∫ ∫ f ( x1 , x 2 )e − i (ω1x1 +ω2 x2 ) dx1dx2 Gabor filter image
Sets of Gabor filter [0° & 45°] [0° & 90°] [0° & 135°] [45° & 90°]
Preprocessing MExG
FFT image
Gabor filter
* FFT
(a) F135
Convolution
FFT
-
FFT
+
FFT
F45
Σ
F90
+ F0
Σ
-
+
-
Σ
(b) Figure 3. Gabor filtered image and its FFT based on 90° orientation for (a) narrow and (b) broad weed type
+
Σ
Extracted Feature Vectors (difFFTgabor)
To choose the best FFT feature extraction for each image, we need to calculate the difference of the FFT output value based on a pair of Gabor orientations. As an example, let consider the Gabor orientation pair of [0° & 45°]. We then compute FFT1 and FFT2 in which FFT1 is based on the 0° orientation whereas FFT2 is derived from the 45° orientation.
Classification via threshold check Figure 4. Process to select FFT feature vectors based on Gabor orientations
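The following Python sketch illustrates how a difFFTgabor feature could be computed for one orientation pair, following the Gabor filter pair and FFT steps described above. The filter size, the value of α, and the reduction of the 2D FFT to its mean magnitude are assumptions, since the paper does not specify them; the function names are ours.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size, alpha, j, theta):
    """Real and imaginary Gabor masks of the form used above:
    h(x, y) = exp(-alpha**(2j) (x^2 + y^2)/2) * exp(i*pi*alpha**j (x cos t + y sin t))."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-alpha ** (2 * j) * (x ** 2 + y ** 2) / 2.0)
    phase = np.pi * alpha ** j * (x * np.cos(theta) + y * np.sin(theta))
    return envelope * np.cos(phase), envelope * np.sin(phase)

def gabor_output(img, theta, alpha=0.8, j=7, size=31):
    """Modulus of the convolution with the real/imaginary filter pair."""
    real_k, imag_k = gabor_kernel(size, alpha, j, theta)
    r = fftconvolve(img, real_k, mode="same")
    i = fftconvolve(img, imag_k, mode="same")
    return np.sqrt(r ** 2 + i ** 2)

def dif_fft_gabor(img, theta1, theta2):
    """difFFTgabor feature for one orientation pair.  Reducing each 2D FFT to
    its mean magnitude is an assumption about the unspecified reduction step."""
    f1 = np.abs(np.fft.fft2(gabor_output(img, theta1))).mean()
    f2 = np.abs(np.fft.fft2(gabor_output(img, theta2))).mean()
    return f1 - f2

# Example: the [0° & 90°] pair reported as the best performer in Table 1.
# feature = dif_fft_gabor(mexg_image, 0.0, np.pi / 2)
```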
For classification purposes, we have used a simple threshold check. The threshold value was set to 0.05, and this was determined experimentally. In general, it was found that difFFTgabor for narrow weed types exceeds the threshold, whereas for broad weed types it is lower than the threshold. Three hundred images, consisting of 150 narrow and 150 broad weeds, were used to test the efficacy of the proposed feature sets. Table 1 tabulates the weed classification results using the four different feature vector sets. The results reveal that the feature vector set of the [0° & 90°] orientation pair classified the weeds almost perfectly. This implies that the difFFTgabor feature vector of the [0° & 90°] pair has unique features that enable almost perfect classification. On the contrary, the others only obtained less than 86.5% overall accuracy. These results show that the difFFTgabor feature vector using the [0° & 90°] pair is the best feature vector to represent the weed images. However, further testing with a larger database is required before a conclusive statement can be made.
Table 1: WEED CLASSIFICATION RESULTS (% classification accuracy)

Feature Vector               narrow   broad   overall
difFFTgabor [0° & 45°]         47       98     72.5
difFFTgabor [0° & 90°]         96       94     95
difFFTgabor [0° & 135°]        36       97     66.5
difFFTgabor [45° & 90°]        75       98     86.5
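A minimal sketch of the threshold rule described above, assuming the difFFTgabor value is compared directly against the experimentally chosen threshold of 0.05:

```python
def classify_weed(dif_fft_gabor_value, threshold=0.05):
    """Values above the threshold are treated as narrow weed, values below as
    broad weed; any prior normalisation of the feature is assumed."""
    return "narrow" if dif_fft_gabor_value > threshold else "broad"
```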
5. CONCLUSION

This paper has considered the combined use of the Gabor wavelet and FFT algorithms to extract a set of feature vectors, which we call difFFTgabor, and used it in the weed classification task. Preliminary results obtained in this work show that the best difFFTgabor feature vector set is the one using the [0° & 90°] orientation pair. Ongoing work is being carried out to further improve the system performance, with particular emphasis on classifier development. In addition, more images will be collected and used in testing to determine the best configuration for the intelligent weed detection and classification system.
References

[1] A.C. Bovik, M. Clark, and W. Geisler, "Multichannel texture analysis using localized spatial filters", IEEE Transactions on Pattern Analysis and Machine Intelligence, 12:55-73, 1990.
[2] C.H. Park and H. Park, "Fingerprint classification using fast Fourier transform and nonlinear discriminant analysis", Pattern Recognition, 38(4):495-503, 2005.
[3] C.Z. Xiong, J.Y. Xu, J.C. Zou, and D.X. Qi, "Texture classification based on EMD and FFT", Journal of Zhejiang University Science A, 1516-1521, 2006.
[4] G. Naghdy, J. Wang, and P. Ogunbona, "Texture Analysis using Gabor Wavelets", Proc. IS&T/SPIE Symp. Electronic Imaging, 2657:74-85, 1996.
[5] G.E. Meyer, T. Mehta, M.F. Kocher, and D.A. Mortensen, "Textural Imaging and Discriminant Analysis for Distinguishing Weeds for Spot Spraying", Transactions of the ASAE, 41(4):1189-1197, 1998.
[6] J. Beck, A. Sutter, and R. Ivry, "Spatial Frequency Channels and Perceptual Grouping in Texture Segregation", Computer Vision, Graphics, and Image Processing (CVGIP), 37, 1987.
[7] J.G. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters", Journal of the Optical Society of America A, 2(7):1160-1169, 1985.
[8] J.V. Belloch, T. Heisel, S. Christensen, and A. Roads, "Image Processing Techniques for Determination of Weeds in Cereal", Proc. International Workshop on Robotics and Automated Machinery for Bio-products (BIO-ROBOTICS'97), 195-199, 1997.
[9] L. Tang, L.F. Tian, B.L. Steward, and J.F. Reid, "Texture-based Weed Classification using Gabor Wavelets and Neural Network for Real-time Selective Herbicide Application", Transactions of the ASAE, St. Joseph, MI, No. 99-3036, 2000.
[10] M. Kenneth and A. Edward, "The FFT and GPU", Proc. Eurographics Association: Graphics Hardware, 2003.
[11] M.S. Nixon and A.S. Aguado, Feature Extraction & Image Processing, Oxford, UK: Newnes, 2002.
[12] N. Zhang and C. Chaisattapagon, "Effective criteria for weed identification in wheat fields using machine vision", Transactions of the ASAE, 38(3):965-974, 1995.
[13] N. Sebe and M. Lew, "Wavelet Based Texture Classification", Proc. International Conference on Pattern Recognition, 959-962, 2000.
SESSION TS3D
TOPIC: MULTIMEDIA
SESSION CHAIRMAN: Dr. Andrew Teoh
_________________________________________________________________________________________
Time      Paper No.   Paper Title
_________________________________________________________________________________________
11.00am   TS3D-1      Wavelet Packet Image Arithmetic Coding Gain Using Subspace Energy Feature Remapping
                      Sze Wei Lee, Ian Chai, Matthew Teow (Multimedia University, Cyberjaya, Malaysia)
11.20am   TS3D-2      Colour Number Coding Scheme for Time-Varying Visualisation in Glassy Ion Trajectories
                      Muhammad Shafie Abd Latiff, Johan Mohamad Sharif, Md Asri Ngadi (UTM Skudai, Johor, Malaysia)
11.40am   TS3D-3      Colourization Using Canny Optimization Technique
                      Kok Swee Sim, Jiin Taur Eng (Multimedia University, Melaka, Malaysia)
Wavelet Packet Image Arithmetic Coding Gain Using Subspace Energy Feature Remapping
Matthew Teow 1, Sze-Wei Lee 2, and Ian Chai 2
1 Department of Engineering, KDU College, Petaling Jaya, Malaysia
2 Faculty of Engineering, Multimedia University, Cyberjaya, Malaysia

Abstract
We develop an arithmetic coding (AC) pre-processing algorithm for discrete wavelet packet transform (DWPT) image compression using a subspace energy feature remapping (EFRM) algorithm. Compared with Huffman coding, AC is an effective entropy codec for image compression; thus, it has been adopted as the entropy codec for the new ISO image compression standard, JPEG2000. AC is optimal when the joint probability distribution of the zero-entropy of the coding data is one-dimensionally (1D) autocorrelative and structured. Thus, the ordering of the DWPT transformed image coefficients affects the AC coding performance when the image coefficients are restructured into a 1D bit stream for entropy coding, but a prefixed ordering does not consider the image spatial homogeneity and the DWPT subspace energy features. The proposed method, the wavelet packet subspace energy feature remapping (WPEFRM) algorithm, is a pre-processor for the AC codec that pre-processes the transformed coefficients by determining an optimal coefficient ordering according to the DWPT subspace energy features. To strengthen the proof, the proposed algorithm was evaluated with a set of test images. Empirical analysis shows that AC with the EFRM entropy code pre-processing algorithm has better coding performance than conventional AC.

Keywords: Image Coding, Wavelets Theory, Image Processing, Multimedia Applications
1. Introduction

The design of separable wavelet transforms for the compression of images uses one-dimensional (1D) factorisation (convolution-decimation), i.e., the discrete wavelet transform (DWT) or discrete wavelet packet transform (DWPT), which results in the compaction of a set of directional orthonormal bases in the L²(R) domain. This orthonormal basis characterises an image's local texture and spatial homogeneity in a few wavelet atoms [1]-[4]. The strength of the separable transform comes from the fact that it optimises the compression of the decomposition coefficients according to their local sub-space energy features. The decomposition coefficients are further
quantised and encoded with an arithmetic codec (AC). Our hypothesis is that the directional orthonormal wavelet basis is disjoint across subspaces, where the variations in local image contrast manifest themselves as clustering of high-magnitude coefficients that defines the image wavelet subspace energy features [5]. If an AC applied to encode such a wavelet basis uses a prefixed coefficient-ordering model [6] that does not exploit the coherency of the sub-space energy features, it generates fragmented code blocks and increases the overall theoretical entropy, which results in a high average number of bits per arithmetic symbol to encode the coefficients. In general, the operation of the DWPT is best described as "first filter different frequency bands, then cut these bands into time-intervals to study their energy variations; if the information cost does not reach the entropy bound, continue the decomposition" [3] [4] [8]. This explanation makes clear the relationship of the DWPT to the DWT basis in the search for a wavelet image representation: the DWT has a pre-determined structure of basis decompositions, whereas the DWPT is adaptive [4] [8]. Theoretically, the DWPT provides better energy compaction and good adaptation to images with oscillatory patterns and irregular spatial homogeneity [4] [8]. In this paper, an arithmetic code pre-processing algorithm called the wavelet subspace energy-feature re-mapping (EFRM) algorithm [5] is proposed to be integrated into a DWPT image coding system. The proposed algorithm is called wavelet packet subspace energy feature remapping (WPEFRM). The EFRM algorithm characterises the DWPT subspace energy features and re-maps each subspace according to its local energy feature to construct an adaptive coefficient-ordering model that converges with the energy feature locality and its spatial texture homogeneity [5]. As a result, the arithmetic code is optimised so that the entropy-coding gain improves. The proposed algorithm is lossless and has very low computational complexity. This paper is organised as follows: Section 2 gives a description of both the DWPT and EFRM algorithms. Experimental results and discussion are presented in
Section 3. Finally, this paper concludes with a summary in Section 4.
2. Description of Algorithm
The WPEFRM algorithm is schematised in Figure 1. The WPEFRM consists of (i) a discrete wavelet packet transform (DWPT) [3] [4] [8], (ii) a scalar quantiser Q [3], (iii) an EFRM entropy pre-processor [5], and (iv) an arithmetic entropy coder (AC) [3] [6].

Figure 1. The WPEFRM block diagram (encoder: f → DWPT → Q → EFRM → AC → compressed image bit stream; decoder: AC → EFRM → Q → DWPT → f̂)

2.1. Discrete Wavelet Packet Transform

i. An input image f is defined as a compactly supported, discrete sampled function in L²(R²) space with a spatial coordination of m = (m₁, m₂), where f(m) is a square function, L², bounded with a finite spatial resolution of |m|² = L².

ii. The wavelet packet decomposition of an image f defined by its sample values is obtained by 1D factorisation. Given n = (n₁, n₂), which defines any transformational subspace basis in a wavelet image representation, at a given L-level decomposition the wavelet packet d_j^p[n] is computed as

    d_j^p[n] = ⟨ f[m], ψ_j^p[m − 2^(j−L) n] ⟩,   (1)

where {ψ_j^p[m − 2^(j−L) n]}_{n∈Z} is the separable one-dimensional orthogonal basis of the wavelet packet subspace W_j^p, for m ≤ L. These discrete wavelet packets are recursively decomposed for any j ≥ L and 0 ≤ p < 2^(j−L) by

    ψ_{j+1}^{2p}[m]   = Σ_{n=−∞}^{+∞} h[n] ψ_j^p[m − 2^(j−L) n]  and
    ψ_{j+1}^{2p+1}[m] = Σ_{n=−∞}^{+∞} g[n] ψ_j^p[m − 2^(j−L) n],   (2)

where p is the subspace index and h[n] and g[n] are the low-pass and high-pass filters with D6 (Daubechies's 6th order) wavelets. Figure 2 is the plot of the D6 scaling function (low-pass filter) and mother wavelet (high-pass filter) with an iteration factor of 10.

Figure 2. The plot of the D6 scaling function (a) and mother wavelet (b)

The two-dimensional (2D) wavelet packet quad-tree bases of L²(R²) space are the separable products of (2); W_j^{p,q} corresponds to a separable space

    W_j^{p,q} = W_j^p ⊗ W_j^q,   (3)

and the separable wavelet packet is

    ψ_j^{p,q}[m] = ψ_j^p[m₁] ψ_j^q[m₂].   (4)

With (3), the first level of the wavelet packet transform thus generates the following four orthogonal subspaces:

    W_j^{p,q} = W_{j+1}^{2p,2q} ⊕ W_{j+1}^{2p+1,2q} ⊕ W_{j+1}^{2p,2q+1} ⊕ W_{j+1}^{2p+1,2q+1}.   (5)

Each quad-tree wavelet subspace is labelled by a 2^j decomposition scale and (p, q) subspace indices, where 0 ≤ p < 2^(j−L) and 0 ≤ q < 2^(j−L). Thus, the orthogonal decomposition, based on recursive splitting of the root W_L^{0,0}, is computed as

    W_L^{0,0} = ⊕_{i=1}^{I} W_{j_i}^{p_i,q_i}, for i ≤ I.   (6)

The 1D factorisation of d_j^{p,q}[n] is defined as

    d_{j+1}^{2p,2q}[n]     = d_j^{p,q} ⊗ hh[2n],
    d_{j+1}^{2p+1,2q}[n]   = d_j^{p,q} ⊗ gh[2n],
    d_{j+1}^{2p,2q+1}[n]   = d_j^{p,q} ⊗ hg[2n], and
    d_{j+1}^{2p+1,2q+1}[n] = d_j^{p,q} ⊗ gg[2n].   (7)

Then, the 2D DWPT coefficients are computed with (7) and defined as

    d_j^{p,q}[n] = ⟨ f, ψ_j^{p,q}[m − 2^j n] ⟩.   (8)
The 2D separable wavelet packet filter bank decomposition is derived according to onedimensional convolutions and sub-sampling of wavelet packet basis (1), recursive decomposition (2), and separable theorem of orthogonal basis of L2 ( R 2 ) space, (3) and (5).
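For readers who want to reproduce the decomposition, the following Python sketch performs a four-level D6 wavelet packet transform with PyWavelets and collects the subspace energies used later by EFRM. It is not the authors' implementation, and the random image is only a placeholder.

```python
import numpy as np
import pywt

# Four-level D6 wavelet packet decomposition of a greyscale image.
img = np.random.rand(512, 512)          # stand-in for a 512x512 test image
wp = pywt.WaveletPacket2D(data=img, wavelet="db6", mode="symmetric", maxlevel=4)

# Each level-4 node is one subspace W_j^{p,q}; its path records the sequence of
# approximation/detail (a/h/v/d) filters applied, cf. the hh/gh/hg/gg factorisation (7).
energies = {node.path: float(np.sum(node.data ** 2))
            for node in wp.get_level(4, order="natural")}

# The most energetic subspaces, which EFRM later re-orders for entropy coding.
print(sorted(energies, key=energies.get, reverse=True)[:5])
```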
2.2. Information Cost Function

i. The DWPT information cost function (best basis selection) of f is implemented using a level-limiter, j ≤ l, with entropy adaptation [7] [8], Η(•), defined as

    Η(•) = max{ −Σ_n d_j[n]² log d_j[n]² },   (9)

then

    ∀ {W_{2j}} : 2^j → 2^(j+1), if max Η(d_j).   (10)

The entropy criterion Η decides the level of DWPT decomposition for best basis selection. For a further understanding of information cost functions, please refer to the literature [4] [7] [8].

iii. Q(•) is defined as a scalar quantiser with a step-size of {iΔ}_{i∈R} and i as the bin index. Thus the quantisation rule for the {p, q}-th order wavelet packet d_j^{p,q}[n] is defined as

    Q(d_j^{p,q}[n]) = 0,                        if |d_j^{p,q}[n]| < Δ/2,
    Q(d_j^{p,q}[n]) = sign(d_j^{p,q}[n]) iΔ,    if (i − 1/2)Δ ≤ |d_j^{p,q}[n]| < (i + 1/2)Δ,   for i = n.   (11)

Δ_n is progressively reduced according to the R bit-rate, iteratively refined at every i-th Δ with iΔ = 2^n, from 2^(n+1) to 2^n, until iΔ = R [3] [6]. The insignificant coefficients below δ are truncated by the quantiser dead-zone, where δ is the dead-zone of Q. The value of δ is determined based on the cumulative energy (CE) of the DWPT transformed image. Figure 3 is the CE plot of a resized (128 × 128) 8-bit greyscale Goldhill image (Figure 5). The CE is defined as

    CE_n = Σ_{i=1}^{n} d_{j,i}².

Figure 3. The CE plot of the Goldhill image (CE of wavelet packet coefficients against CE of image pixels; δ marks the quantiser dead-zone)

2.3. Energy Feature Remapping

i. The first step of the EFRM algorithm [5] is to determine the DWPT energy feature E_j^{p,q}, defined as the coefficient ordering in a wavelet subspace with a maximum joint probability distribution of zero-order coefficients for a given minimum entropy,

    E_j^{p,q} = max E[ Η_min( d_0 ∈ P(d_i = 0) ) ],   (12)

which is performed with a scanning S(•),

    S_j^{p,q,k} = ∀ S_k( 2^j E^{p,q} ),  0 ≤ k ≤ 3.   (13)

The k-th scanning order for 0 ≤ k ≤ 3 is defined as a horizontal scan, a vertical scan, a left-diagonal scan, and a right-diagonal scan. The higher-energy orientation k′ can be determined from the four E_max feature scanning sequences as

    E_max^{k′} = max{ S_k( W_{2j}^{p,q} ) } = max{ Σ_{(j,p,q)} S_k( 2^(j−L) d^{p,q}[n] )² },  k = {0, 1, 2, 3}.   (14)

ii. Next, each d_j^{p,q} is re-mapped according to the E_max^{k′} orientation, where k′ indicates the new coefficient ordering for a wavelet packet subspace. The re-mapper Μ(•) is a lossless function that performs a one-to-one coefficient re-ordering, defined as

    Μ(•) = { d_j^{p,q,k′}[n′] ↔ d_j^{p,q,k}[n] },  k′ = ±k,   (15)

for n′ ≠ n, where n′ is the new spatial coordination of n after re-mapping. The re-mapped subspace orientation k′ is defined as

    d_j^{p,q,k′} = max{ ∀ E_j^k( d_{j+1}^{p,q,k} ) : 1 ≤ k ≤ 3 }.   (16)

The d_j^{p,q,k′} is the wavelet packet subspace with the new k′ coefficient ordering.
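A speculative Python sketch of the EFRM scanning step: it generates the four scan orders of a subspace block and picks the one that concentrates energy earliest in the stream. The criterion used to rank the scans is a paraphrase of (14), not the authors' exact rule, and the function names are ours.

```python
import numpy as np

def scan_orders(block):
    """The four scanning orders used by EFRM for a 2D subspace block:
    horizontal, vertical, left-diagonal and right-diagonal."""
    h = block.reshape(-1)                                   # k = 0: horizontal
    v = block.T.reshape(-1)                                 # k = 1: vertical
    offsets = range(-block.shape[0] + 1, block.shape[1])
    ld = np.concatenate([np.fliplr(block).diagonal(i) for i in offsets])  # k = 2
    rd = np.concatenate([block.diagonal(i) for i in offsets])             # k = 3
    return [h, v, ld, rd]

def efrm_remap(block):
    """Return the scan index k' and the re-ordered coefficients, favouring the
    scan that places large-magnitude coefficients early in the stream."""
    scans = scan_orders(block)
    weights = [np.sum(np.abs(s) * np.arange(s.size, 0, -1)) for s in scans]
    best = int(np.argmax(weights))
    return best, scans[best]
```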
2.4. Arithmetic Coding

i. The last step is to pack d_j^{p,q,k′}[n′] into arithmetic codes for entropy coding [6]. First, we define the wavelet packet coefficients as an independent vector of symbols, d = ∀_{2j}{d_1, d_2, ..., d_{n′}}, and represent each d with a finite, non-overlapping interval [a_n, a_n + b_n] included in [0, 1], which is the dynamic range of d in log₂. The probability of occurrence of the d sequence is defined as

    b_n = Π_{i=1}^{n} P(d_i),   (17)

where P(d_i) is the i-th probability distribution of the d sequence and {a, b} are the upper and lower bounds of the d probability distribution.

ii. Next, to determine the probability distribution of each symbol d_i, we initialise the {a, b} interval with a boundary factor of {a_0 = 0, b_0 = 1}, and the P(d) are determined as follows:

    P(d_i)     = [a_i, a_i + b_i]
    P(d_{i+1}) = [a_{i+1}, a_{i+1} + b_{i+1}]
    ...
    P(d_{i+n}) = [a_{i+n}, a_{i+n} + b_{i+n}],   (18)

which can be written as

    a_{i+1} = a_i + b_i Σ_{i=1}^{n−1} P_i  and  b_{i+1} = b_i P_n.   (19)

(19) forms the probability distribution table of the symbol vector d of an arithmetic sequence.

iii. Finally, we define c_n ∈ [a_n, a_n + b_n] as a binary prefixed code with an r_n ≤ R binary bit representation of P(d_i) for a given i-th interval, where R is the coding bit budget and r_n is defined as

    r_n = −Σ_{i=1}^{n} p(d_i) log₂ p(d_i).   (20)

In arithmetic coding, the c_n value is progressively calculated by adding a refinement bit as [a_i, a_i + b_i] is reduced to the next sub-interval [a_{i+1}, a_{i+1} + b_{i+1}], until [a_n, a_n + b_n] is reached. The resulting binary representation (compressed image bit-stream) of d_n, c_n, has an r_n bit vector of

    −⌈log₂ b_n⌉ ≤ r_n ≤ −⌈log₂ b_n⌉ + 2,   (21)

for

    log₂ b_n = Σ_{i=1}^{n} log₂ P(d_i)  and  Η(d) = E{log₂ d},   (22)

and that satisfies

    Η(d) ≤ R(d) ≤ Η(d) + 2/n.   (23)

The reconstruction of f′ is performed with arithmetic decoding [6], reverse EFRM with (15) [5], de-quantisation, and the inverse DWPT [3].
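The interval refinement of (18)-(19) can be sketched in a few lines of Python; the toy model below assumes a fixed symbol ordering and a known probability table, and omits the bit-emission and renormalisation of a production arithmetic coder.

```python
from fractions import Fraction

def arithmetic_interval(symbols, probs):
    """Narrow [a, a + b) over the symbol sequence: a_{i+1} = a_i + b_i * (sum of
    probabilities of lower-ordered symbols), b_{i+1} = b_i * P(symbol).  Exact
    fractions avoid floating-point drift in this illustration."""
    order = sorted(probs)                       # fixed symbol ordering
    a, b = Fraction(0), Fraction(1)
    for s in symbols:
        cum = sum((Fraction(probs[t]) for t in order if t < s), Fraction(0))
        a = a + b * cum                         # shift to the sub-interval start
        b = b * Fraction(probs[s])              # shrink by the symbol probability
    return a, b                                 # any value in [a, a + b) decodes

# Example with three quantised-coefficient symbols and a toy probability model.
# a, b = arithmetic_interval([0, 1, 0], {0: 0.7, 1: 0.3})
```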
3. Experimental Results and Discussion

3.1. Experiments

The WPEFRM algorithm is implemented as described in Section 2 with the D6 wavelet packet transform, and the maximum decomposition level is set to l = 4. Experiments were conducted on three 8-bit greyscale images of resolution 512 × 512: Barbara (Figure 4), an image of a lady with stripes and a checkered tablecloth (high frequency components and oscillatory texture everywhere in the image); Goldhill (Figure 5), where the facades generate segmented non-periodic patterns with a non-smooth surface and the sky provides a large smooth low frequency region; and Brickwall (Figure 6), which is composed of a mixture of periodic texture (bricks, high frequency components with structural horizontal-vertical lines) and smooth regions (wall surface, low frequency background).
The best basis geometry of each test image is presented in Figure 4, Figure 5 and Figure 6. The selection of the best wavelet packet basis is determined based on the entropy factor defined in (9). The best basis search is limited to l = 4, i.e., four levels of DWPT decomposition. The principal argument for defining a fixed decomposition level, l = 4, rather than level-adaptation is based on empirical analysis: there is no significant improvement for EFRM if the DWPT subspace spatial
resolution is smaller than 16 × 16. Thus, we defined l = 4 for all experiments, where the DWPT decomposition is set to stop at a subspace resolution of 32 × 32.
Figure 4. Left: Barbara image and right: best basis geometry, l = 4, D6
Figure 5. Left: Goldhill image and right: best basis geometry, l = 4, D6
Figure 6. Left: Brickwall image and right: best basis geometry, l = 4, D6
Figure 7. Barbara image compression results (PSNR (dB) vs. quantisation level, AC and AC+EFRM; Zone A)
Figure 8. Goldhill image compression results (PSNR (dB) vs. quantisation level, AC and AC+EFRM; Zone B)
Figure 9. Brickwall image compression results (PSNR (dB) vs. quantisation level, AC and AC+EFRM; Zone C)

3.2. Results and Discussion
The compression performance of WPEFRM is measured in terms of quantisation level (normalised from 0.10 bpp to 2.00 bpp, where bpp is bits per pixel) and reconstruction fidelity (peak signal-to-noise ratio, PSNR). The compression results for the test images Barbara, Goldhill and Brickwall, presented in Figure 7, Figure 8 and Figure 9, show that the DWPT coefficients coded by AC with EFRM have better performance than the standard AC codec, with average PSNR improvements of 0.312 dB, 0.258 dB, and 0.282 dB, respectively.
An important observation for the compression results of the test images (referring to Figure 7, Figure 8, and Figure 9) is that, at low compression bit rates, 0.0 < δ ≤ 0.3, the PSNR performance of both AC and AC+EFRM had very close average results; the PSNR difference is only 0.06 dB. However, at high quantisation bit rates (referring to Zones A, B and C), 0.4 ≤ δ ≤ 0.9, we can observe that AC+EFRM outperformed standard AC with a significant PSNR index, an average PSNR of
0.308 dB for the Barbara image, 0.252 dB for the Goldhill image and 0.278 dB for the Brickwall image.
The intuitive explanation is that at low compression bit rates, most of the transformed image coefficients have been thresholded to zero amplitude, and the sub-band orientation sensitivity caused by wavelet maxima discontinuity has been reduced, because more coefficients with zero-order correlation approach the DC level. Thus, the EFRM re-mapper does not have much impact on improving the AC gain for sub-bands that have no high orientation sensitivity. In contrast, high compression bit rates tend to preserve more wavelet maxima magnitudes for better reconstruction fidelity. This generates a high degree of signal discontinuity in the transformed sub-bands, which requires more entropy codewords and so affects the compression bit rate. That is, EFRM is able to re-map these sub-band discontinuities into a unified orientation for signal variation reduction, thus heightening the PSNR gain with fewer entropy codewords, while the wavelet maxima magnitudes are preserved with no degradation of coefficient amplitudes. The WPEFRM compressed-reconstructed image results for visual inspection are presented in Figure 10 (Q = 0.5), Figure 11 (Q = 0.6) and Figure 12 (Q = 0.5).

Figure 10. Barbara. (a) AC and (b) AC+EFRM
Figure 11. Goldhill. (a) AC and (b) AC+EFRM
Figure 12. Brickwall. (a) AC and (b) AC+EFRM

4. Conclusions

We have demonstrated the compression performance of WPEFRM on several test images. The proposed WPEFRM is a new image-coding scheme using an energy feature re-mapping entropy pre-processor for an AC in the DWPT. The EFRM proved to be able to re-map the local DWPT subspaces, reduce the subspace orientation sensitivities, and restructure them into a unidirectional correlative data stream for improving the entropy coding gain. Experimental results show that AC with EFRM outperformed standard AC by average PSNR improvements of 0.312 dB, 0.258 dB, and 0.282 dB on the test images Barbara, Goldhill and Brickwall, respectively. Thus, AC with EFRM proved to be an effective algorithm for improving standard AC entropy coding performance.

5. References
[1] Marc Antonini, Michel Barlaud, Pierre Mathieu, and Ingrid Daubechies, "Image Coding Using Wavelet Transform", IEEE Trans. Image Processing, vol. 1, pp. 205-220, April 1992.
[2] Stephane G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation", IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674-693, July 1989.
[3] K. Sayood, Introduction to Data Compression, 2nd Edition, San Francisco, California: Morgan Kaufmann Publishers, 2000.
[4] Stephane Jaffard, Yves Meyer, and Robert Ryan, Wavelets: Tools for Science and Technology, SIAM, 2001.
[5] Matthew Teow, Sze-Wei Lee, and Ian Chai, "Entropy Code Pre-Processing Using Wavelet Subspace Energy Feature Re-Mapping", Proc. MMU Inter. Sym. on Inform. and Comm. Technologies, pp. 148-151, P.J., Malaysia, October 2002.
[6] Paul G. Howard and Jeffrey Scott Vitter, "Arithmetic Coding for Data Compression", Proc. IEEE, vol. 82, pp. 857-865, June 1994.
[7] R.R. Coifman and M.V. Wickerhauser, "Entropy-based algorithms for best basis selection", IEEE Trans. Inf. Theory, vol. 38(2), pp. 713-718, 1992.
[8] M.V. Wickerhauser, "Comparison of picture compression methods: Wavelet, wavelet packet, and local cosine transform coding", in Wavelets: Theory, Algorithms and Applications, pp. 585-621, Academic Press, 1994.
Colour Number Coding Scheme for Time-Varying Visualisation in Glassy Ion Trajectories
J. M. Sharif∗†, M. S. A. Latiff† and M. A. Ngadi†
∗Department of Computer Science, Swansea University, SA2 8PP, United Kingdom
†Department of Computer System & Communication, University Technology of Malaysia, 81310, Johor, Malaysia
Email: [email protected], [email protected], [email protected]
Telephone: (+607) 5532384, Fax: (+607) 5565044
Abstract— In physics, the study of ion trajectories has relied entirely on statistical analysis of experimental and computer simulation results [23][15][33][20][10]. To help physicists identify and trigger timeline and collaborative events in an ion trajectory, we need codes that distinguish the events along the timeline. In terms of coding theory, we need a code that can represent each of the events in a timeline series. Moreover, the code itself must help in identifying and triggering the events if there is a collaborative event among the chaotic movements of ion trajectories. In particular, we propose a Colour Number Coding Scheme for depicting the time series of ion trajectories. We discuss the method of depicting the time series in relation to encoding the series of timeline events in ion trajectories. We also point out some of the advantages of this method in terms of accuracy to a human observer.
Keywords: coding theory, colour scale, visualisation, time-varying data, spatio-temporal visualisation, molecular dynamics, scientific data, hybrid scheme, visual representation

I. INTRODUCTION

Since the late 1990s, researchers in coding theory have been developing, improving, applying and generating codes. Several applications have been developed, including error-controlling systems [3], fault tolerance or fault diagnosis/monitoring [26], analysis models [21], development frameworks [24] and communication systems [5]. Some of these efforts were combined with other fields of research such as complexity theory [31], system theory [27] and test pattern generators [32]. Some researchers have borrowed techniques from coding theory to solve research problems in areas such as testing [2], cryptography [30], optical flow [14], adaptive radar [12], algorithms [38] and neural networks [37]. Other researchers have tried to enhance the performance of the codes themselves. For example, Kieffer studied the rate and performance of a sequence of codes along a sample sequence of symbols generated by a stationary ergodic information source [16], while Ishikawa used coding theory concepts to improve communication performance in hypercube multiprocessors with bus connections [13]. Some researchers improved the codes in different ways; for example, Vardy enhanced the codes through the minimum distance of the code [35]. Moreover, Garcia and Stichtenoth showed that algebraic function fields are a useful tool for improving codes by determining the number of
rational places [6]. Rains improved the codes by determining bounds through finding the minimum distance of the codes using their length [25]. For ion trajectories, to help physicists identify and trigger timeline and collaboration events, we need codes that can identify the events according to timeline-based events. From the above review, we need a code that can represent each of the events in a timeline series. Moreover, the code itself must help in identifying and triggering the events if there is collaboration among the ion trajectories. Many researchers have recently noted that problems in several research areas can be solved with the help of coding theory concepts, such as data security [30], optical flow [14], communication channels [12] and neural networks [37]. One problem often overlooked when rendering time-varying data sets based on coding theory concepts is associating a particular event with a precise moment on the timeline. This is useful not only for determining the time of an event but also for identifying the corresponding parties involved in collaboration. Very little attention has been given in the literature to timeline encoding, especially with codes. Location codes are a labelling technique that represents tetrahedral elements within a mesh. Lee et al. [17] used this technique for labelling triangular faces. A few other authors have used this idea in their work, such as Evans et al. [4], who use an array where the label of a node determines the node's location in the array. Zhou et al. [39] used this strategy for addressing the children and parents in managing multiresolution tetrahedral volume data. A similar data structure is used by Gerstner and Rumpf [8] for extracting isosurfaces at different levels of detail. Location codes have also been used in spatio-temporal database research for labelling purposes [34]. Since then, another labelling scheme has been introduced, the LPT code [1], which is an extension of the location code itself. The original idea for labelling codes comes from Gargantini [7], who introduced an effective way to represent quadtrees with her codes, called Gargantini codes. After that, the quadcode was published by Li and Loew [18] for representing geometric concepts in coded images, such as location, distance and adjacency. Designing efficient image representations and manipulations with bincodes has been proposed by Ouksel and Yaagoub
[22]. These codes represent a black rectangular sub-image in the image. The code is formed by interleaving the binary representations of the x- and y-coordinates of the sub-image and its level in the corresponding bintree. Some enhancements have been made to the bincodes themselves by Lin et al. [19]. A few more codes have been introduced for image representation, such as Sarkar's codes [28] [29], logicodes [36], restricted logicodes [36] and symbolic codes [11]. All these codes are closely related to image representation. Since there is no such code for time-varying datasets in ion trajectories, we introduce here our own code that can be used to visualise a series of timelines in ion trajectories. The paper is organized as follows. First, we discuss existing and relevant codes available in time-varying visualisation. In Section 3, we divide the explanation into two concrete situations: we highlight the issue of timeline events before collaboration takes place, and after collaboration, which we call collaborative events. In the first subsection, we enumerate various parameters that are important in perceiving the accuracy of timeline events, and we discuss the pros and cons of using those parameters in our visual representation with the same input datasets. We elaborate the issues of collaborative events in the following subsection. Finally, in Section 4 we conclude our study.
II. VISUALISING ION TRAJECTORIES

In this section, we first examine the challenging task of visualising temporal information in order to identify the series of events and collaborative events. We discuss the use of colour scales and codes in our visual representations, and present the methods for constructing and rendering composite visualisations that convey a rich collection of visual features for assisting in a visual data mining process.

A. Temporal Encoding

When using visualisation to summarise a series of events along a timeline, perhaps the most difficult task is to associate a particular event with a precise moment on the timeline. This is useful not only for determining the time of an event but also for identifying the corresponding parties involved in collaboration, although collaborative events are not included at this point. We divide our temporal encoding scheme into two major parts. The purpose of the global colour scale is to allow the viewer to determine a time frame at the global scale, and the purpose of the local colour scale is to enhance the accuracy of our scheme in differentiating different vector segments. In the next subsection, we show our implementation of the local colour scale in depicting a series of timelines in an ion trajectory.

1) Local Colour Scale: In order to correlate each vector segment with the timeline more accurately, and hence to improve the differentiation of different vector segments, we introduce a Colour Number Coding Scheme in our visualisation. Given a small set of key colours, c1, c2, ..., ck (k > 1), and a distinctive interval-colour (e.g., white, black or grey depending on the background colour), we code a group of consecutive m vectors as a k-nary number, terminated by a vector in the interval-colour. Given n as the total number of vectors, and since we always assign the interval-colour to the first vector, we need to find the smallest integer m that satisfies Equation 1:

    (m + 1) k^m ≥ n.   (1)

Figure 1 shows a quaternary colour coding scheme for ion tracks with 1000 vectors.
Fig. 1.
Quaternary Colour Coding Scheme on trajectory of sodium #169
2) Results: In this section, as an example, we present a visual study of ion trajectory. In our case, quaternary colour coding scheme is considerable suit to represent the time series in ion trajectories. Here, we shows the results in relation to determine the best value for the parameter m and k based on equation 1. For instance, when n = 1000, using two key colours, say red and green, we need in m = 7 colour digits. We have m = 5 for k = 3, m = 4 for k = 4, and m = 2 when k reaches 19. The selection of m and k needs to address the balance between a smaller number of colours or a smaller number of colour digits in each group of vectors. The former ensures more distinguishable colours in visualisation, and the latter reduces the deductive effort for determine the temporal position of each vector. 3) Remarks: In this section, we consider the selection of m and k will dealing with our local colour scale. According to Equation 1, we will shows all possibility of parameter m and k have been applied on ion #169 trajectory. Our targeting goal from this method is to improve the correlation of each vector segment with the timeline more accurately and hence totally enhanced the differentiation of different vector segments. We introduced a Colour Number Coding Scheme in our visualisation. Again based on Equation 1, when k = 2, the minimum value for parameter m is 7 for n = 1000. We can increase the value of parameter m until m reaches 1000 which is the
M2USIC 2006
TS- 3D
(a) k = 2, m = 7
(b) k = 2, m = 8
(a) k = 2, m = 7
(b) k = 3, m = 5
(c) k = 2, m = 9
(d) k = 2, m = 10
(c) k = 4, m = 4
(d) k = 5, m = 4
(c) k = 2, m = 11 (d) k = 2, m = 1000 Fig. 2.
(c) k = 6, m = 4 (d) k = 256, m = 2
Experimental images for satisfying m parameter
Fig. 3.
maximum number of vector segment. This comparison task will illustrated in Figure 2. This figure show to us that when we increase the value of m until m = 1000, its loss the accuracy of local scale timeline because its does not give any meaning to the viewers. Our next experiment is to satisfied the value of parameter k. A k will represent the total of colours that will be used. Same with the previous experiment, we can increase the colour, k up to 1000 colours, k = 1000. Compare the results that we obtained from Figure 3, those images rendered with small value of k are visually distinguishable than the images rendered with the high value of k. It is clear that as the k or m are increase then the accuracy of local colour scale will loss as well. Thus, we need a balance selection between k and m that will satisfied our local colour scale. As a result, we choose m = 4 and k = 4 for n = 1000 that we called quaternary colour coding scheme as shown in
359
Experimental images for satisfying k parameter
Figure 1.

B. Collaborative Events

The main objective of this task is to discover whether collaboration is exhibited between ions in the simulation results. As described previously, there is no well-defined description of collaborative events, although experiments have suggested the existence of collaborative phenomena [9]. Therefore we have introduced a variable, ψ, representing the probability of collaboration. Given a set of m hypothesized criteria of collaboration, we have:

    ψ = ω1 ψ1 + ω2 ψ2 + ... + ωm ψm,   (2)

where ωi is the weight of criterion i and ω1 + ω2 + ... + ωm = 1. In this work, we have considered two such criteria, namely (1) the ability of two or more ions to maintain similar orientation
and (2) the ability of two or more ions to maintain similar velocity. Given two corresponding vector segments, v_a,i and v_b,i, belonging to two different ion trajectories, we have:

    ψ1 = [ (1/2) ( 1 + (v_a,i • v_b,i) / (|v_a,i| |v_b,i|) ) ]^D1,   (3)

where D1 ≥ 0 is a de-highlighting factor. The larger D1 is, the less probable a vector is considered to be involved in collaboration. With ψ1, v_a,i and v_b,i are considered to be in collaboration if they follow a similar direction. As the velocity of an ion at a particular time is reflected in the length of the corresponding vector segment, we define ψ2 as:

    ψ2 = [ 1 − abs(|v_a,i| − |v_b,i|) / (|v_a,i| + |v_b,i|) ]^D2,   (4)

where D2 is similar to D1 for ψ1. With ψ2, v_a,i and v_b,i are considered to be in collaboration if they are of similar length. Once we have computed all those ψ ∈ [0, 1], we can highlight or de-highlight the corresponding vector segments as shown in Figure 4.

1) Results: In this section, we show a visual study of a cluster of six ions, including three sodium and three oxygen ions. Before visualisation, we make no assumption about which ion may collaborate with the others in its motion. For each ion, we compute its probability of collaboration with each of the other ions in the cluster, evaluated on the basis of ψ1 and ψ2 as stated above. Below we present a small set of visualisations generated in the visual study. The image was computed using Na #211 as the reference ion, which is displayed with a full set of its vector glyphs. All other sodium ions are shown with a translucent tube in a warm colour, namely yellow or pink, while each oxygen trajectory is enclosed in a translucent tube in a cool colour. The glyphs are coloured using the quaternary colour number coding as in Figure 1.

2) Remarks: From these visualisations, did we gain more understanding of the collaborative phenomena than what has already been understood in physics? In general, we are intrigued by the following findings: 1) we have confirmed that the collaboration in orientation is largely coincidental (Figure 4(a)); 2) collaboration in velocity may also be coincidental, or at least influenced by the equispaced motion of different ions (Figure 4(b)).
Fig. 4. The possible collaboration between Na #211 and the other five ions, when the probability was computed based on (a) ψ1 and (b) ψ2
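A minimal Python sketch of the two criteria, assuming the reconstructed forms of (3) and (4) above; the de-highlighting factors D1, D2 and the weights ω1, ω2 are free parameters in the paper, so the values here are placeholders.

```python
import numpy as np

def collaboration_probability(va, vb, d1=1.0, d2=1.0, w1=0.5, w2=0.5):
    """Combine the orientation and velocity criteria for two corresponding
    3D vector segments va and vb into a single collaboration probability."""
    cos_angle = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
    psi1 = (0.5 * (1.0 + cos_angle)) ** d1           # similar orientation, eq. (3)
    la, lb = np.linalg.norm(va), np.linalg.norm(vb)
    psi2 = (1.0 - abs(la - lb) / (la + lb)) ** d2    # similar velocity, eq. (4)
    return w1 * psi1 + w2 * psi2                     # eq. (2) with two criteria
```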
From the above work, we can say that applying coding theory to ion trajectories can indicate important activities in relation to timeline events, such as triggering collaborative events. This helps us explore the inside of the chaotic movement of ion trajectories. In future work, we intend to examine the spatio-temporal collaboration among ions and to identify possible collaborative behaviours and their spatial and temporal associations among the complex and seemingly chaotic atom movements; we have already made a few new discoveries, which will hopefully enhance our understanding of ion trajectories.
III. CONCLUSION

We have developed an effective visual representation which combines benefits from several schemes, including the use of glyphs for conveying orientation and velocity, and opacity for highlighting and de-highlighting appropriate events. Our main contributions include the introduction of the colour number coding scheme, which conveys temporal information with a high degree of certainty, and the effective deployment of visualisation in mining complex spatio-temporal data sets.

REFERENCES
[1] F. Betul Atalay and David M. Mount. Pointerless implementation of hierarchical simplicial meshes and efficient neighbor finding in arbitrary dimensions. 2004. [2] T. Berger and V. I. Levenshtein. Asymptotic efficiency of two-stage disjunctive testing. IEEE Transactions on Information Theory, 48(7):1741 – 1749, Jul 2002. [3] D. J. Costello, J. Hagwnauer, H. Imai, and S. B. Wicker. Application of error-contol coding. IEEE Transactions on Information Theory, 44(6):2531–2560, Oct 2004. [4] W. Evans, D. Kirkpatrick, and G. Townsend. Right-triangulated irregular networks. Algorithmica, 30(2):264–286, Mar 2001.
[5] A. Fujiwara and H. Nagaoka. Operational capacity and pseudoclassicality of quantum channel. IEEE Transactions on Information Theory, 44(3):1071 – 1086, May 1998. [6] A. Garcia and H. Stichtenoth. algebraic function fields over finite fields with many rational places. IEEE Transactions on Information Theory, 41(6):1548 – 1563, Nov 1995. [7] I. Gargantini. An effective way to represent quadtrees. Communications of the ACM, 25(12):905–910, Dec 1982. [8] T. Gerstner and M. Rumpf. Multiresolutional parallel isosurface extraction based on tetrahedral bisection. In Proceedings 1999 Symposium on Volume Visualization, ACM Press, 1999. [9] J. Habasaki and Y. Hiwatari. Fast and slow dynamics in single and mixed alkali silicate glasses. Journal of Non-Crystalline 320, 1-3:281– 290, 2002. [10] J. Habasaki, K.L. Ngai, and Y. Hiwatari. Time series analysis of ion dynamics in glassy ionic conductors obtained by a molecular dynamics simulation. The Journal of Chemical Physics 122, Feb 2005. [11] D. J. Hebert. Symbolic local refinement of tetrahedral grids. Journal of Symbolic Computation, 17(5):457–472, May 1994. [12] S. D. Howard and A. R. Calderbank. relationships between radar ambiguity and coding theory. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP ’05)., pages 355 – 362. IEEE, 18-23 March 2005. [13] T. Ishikawa. Hypercube multiprocessors with bus connections for improving communication performance. IEEE Transactions on Computers, 44(11):1338 – 1344, Nov 1995. [14] M.A. Jabri, Ki-Young Park, Soo-Young Lee, and T.J. Sejnowski. properties of independent components of self-motion optical flow. In 30th IEEE International Symposium on Multiple-Valued Logic, 2000. (ISMVL 2000) Proceedings., pages 355 – 362. IEEE, 23-25 May 2000. [15] P. Jund, W. Kob, and R. Jullien. Channel diffusion of sodium in a silicate glass. Physics Review B, 64, 2001. [16] J. C. Kieffer. Sample converses in source coding theory. IEEE Transactions on Information Theory, 37(2):263 – 268, Mar 1991. [17] M. Lee, L. De Floriani, and H. Samet. Constant-time neighbour finding in hierarchical tetrahedral meshes. Proceedings of the Shape Modeling International 2001 (SMI’01), pages 286–295, 2001. [18] S.-X. Li and M. H. Loew. Adjacency detection using quadcodes. Communications of the ACM, 30(7):627–631, Jul 1987. [19] S.-H. Lin, P.-H. Wu, R.-P. Tzeng, and C.-Y. Huang. An efficient ,ethod for obtaining the level of the bincode. CVGIP:Graphical Models and Image Processing, pages 773–777, 2003. [20] A. Meyer, J. Horbach, W. Kob, F. Kargl, and H. Schober. Channel formation and intermediate range order in sodium silicate melts and glasses. Journal of Physics : Condensed Matter 1, 2004. [21] O. Milenkovic. Analysis of bin model with applications in coding theory. In International Symposium on Information Theory, 2004. ISIT 2004., page 224. IEEE, Jun 27 - Jul 2 2004. [22] M. A Ouksel and A. Yaagoub. The interpolation-based bintree and encoding of binary images. CVGIP:Graphical Models and Image Processing, 54(1):75–81, 1992. [23] J. Oviedo and J. F. Sanz. Molecular dynamics simulation of (na2 o)x (sio2 )1−x glasses relation between distribution and diffusive behaviour of na atoms. Physics Review B, 58:9047, 1998. [24] D.K. Pradhan. Logic transformation and coding theory-based frameworks for boolean satisfiability. In Eighth IEEE International High-Level Design Validation and Test Workshop, 2003., pages 57 – 62. IEEE, 2003. [25] E. M. Rains. shadow bounds for self-dual codes. 
IEEE Transactions on Information Theory, 44(1):134 – 139, Jan 1998. [26] H. Ren, Z. Mi, J. Diao, and H. Zhao. A novel power system fdi scheme based on petri nets and coding theory. In 2004 International Conference on Power System Technology - POWERCON 2004, pages 108 –113. IEEE, Nov 2004. [27] J. Rosenthal. Some interesting problems in systems theory which are of fundamental importance in coding theory. In Proceedings of the 36th IEEE Conference on Decision and Control, pages 4574 – 4579 vol.5. IEEE, 10-12 Dec 1997. [28] D. Sarkar. Boolean function-based approach for encoding of binary images. Pattern Recognit. Lett., 17(8):839–848, 1996. [29] D. Sarkar. Operation on binary images encoded as minimized boolean functions. Pattern Recognit. Lett., 18:455–463, 1997. [30] Silverberg, J. Staddon, and J. L. Walker. Applications of list decoding to tracing traitors. IEEE Transactions on Information Theory, 49(5), May 2003.
[31] D. A. Spielman. models of computation in coding theory. In Thirteenth Annual IEEE Conference on Computational Complexity,, page 120. IEEE, 15-18 June 1998. [32] R. Srinivasan, S. K. Gupta, and M. A. Breuer. Novel test pattern generators for pseudoexhaustive testing. IEEE Transactions on Computers, 49(11):1228 – 1240, Nov 2000. [33] E. Sunyer, P. Jund, W. Kob, and R. Jullien. Molecular dynamics study of the diffusion of sodium in amorphous silica. Journal of Non-Crystalline, 939:307–310, 2002. [34] T. Tzouramanis, M. Vassilakopoulos, and Y. Manolopoulos. Multiversion linear quadtree for spatio-temporal data. ADBIS: East European Symposium on Advances in Databases and Information Systems, pages 279–292, 2001. [35] A. Vardy. the intractability of computing the minimum distance of the code. IEEE Transactions on Information Theory, 43(6):1757 – 1766, Nov 1997. [36] J.-G. Wu and K.-L. Chung. A new binary image representation : Logicodes. Journal of Visual Communication and Image Representation, 8(3):291–298, Sept 1997. [37] L. Ying. Fractals, neural networks, cellular automata, formal language and coding theory. In IEEE International Conference on Systems, Man and Cybernetics, pages 1663 – 1669 vol.2. 1992, 18-21 Oct. 1992. [38] J.-H. Youn and B. Bose. A topology-independent transmission scheduling in multihop packet radio networks. Global Telecommunications Conference, 2001. GLOBECOM ’01. IEEE, 3:1918 – 1922, Nov 2001. [39] Y. Zhou, B. Chen, and A. Kaufman. Multiresolution tetrahedral framework for visualizing regularvolume data. In Proceedings of Conference on Computer Graphics(VISUALIZATION 97), pages 135–142. IEEE, Oct 1997.
M2USIC 2006
TS- 3D
COLORIZATION USING CANNY OPTIMIZATION TECHNIQUE

J.T. Eng, K.S. Sim
Faculty of Engineering and Technology, Jalan Ayer Keroh Lama, 75450 Bukit Beruang, Melaka, Malaysia.
[email protected],
[email protected]
Abstract

This paper proposes a new approach to improve current colorization techniques for gray images. Due to the weaknesses and problems of human intervention in manual colorization techniques, we propose to build a simple and user-friendly colorization technique which helps to save time by using minimal color markings and yet produces satisfactory colorization results. Most colorization techniques involve image segmentation or region tracking. However, the limitations of automatic segmentation tend to degrade the quality of images with fuzzy or complex region boundaries after colorization. The Optimization colorization technique, which transfers color based on the similarity of intensities between neighboring pixels, needs a large number of color assignment inputs from the user and still cannot guarantee good results. In this paper, we propose to combine the Optimization algorithm, which has the ability to mix colors, with the Canny edge segmentation technique, which gives better control and illustration of color flow in images.

1. Introduction

Colorization is a method of adding color to a monochrome image or video using a computer. This process was invented and used by Wilson Markle in 1970 [1]. Although color images and movies are available, there are still many precious, valuable black and white images and movies that need to be colorized. Besides that, colorization is used extensively nowadays in medicine and research fields such as Magnetic Resonance Imaging (MRI), X-ray, and Computerized Tomography (CT) to enhance image visualization [2]. In most of these cases, the pseudo-coloring technique (any pixel whose gray level is above the intensity-slicing plane is coded with one color, and any pixel below the plane is coded with the other) [3] is used for these images. However, it is not suitable for natural images because it colors bright-intensity image pixels with bright colors and dark-intensity image pixels with dark colors, based on the limited color mapping. Colorization using a computer is a cost-effective method to colorize an image. One of the main limitations of colorization techniques is the requirement of color assignments from users: it would be troublesome and time consuming if a large number of color marking pixels were needed. Recently, a colorization methodology with the aid of adaptive edge detection was introduced by Huang et al. [4], and Anat et al. [5] applied optimization techniques to colorization. Hence, a new method for colorization techniques is needed to simplify the work of the users.

2. Previous Work

2.1 Segmentation and Optimization methods
Figure 1: Table images extracted from Barbara images. (a) Segmented image and (b) colorized image.
Figure 2: Table images extracted from Barbara images. (a) Color markings image and (b) colorized image by Optimization method.
In the segmentation method, the gray image is first automatically segmented. After that, the assigned color is filled into each segment. For the Optimization method, color markings are made close to the edges and colors are transferred between neighboring pixels with similar intensities [5]. Experiments on the segmentation method and the Optimization method were carried out; the results are compared in Figures 1 and 2, respectively.

In Figure 1, we use the Normalized Cuts segmentation algorithm, which can produce complete edges for segmented regions [6]. Figure 1(a) shows that the automatic segmentation cannot detect some edges when the region is small or has fuzzy boundaries. Thus, wrong colors are assigned to these segmented regions, as shown in Figure 1(b). However, it produces clear edges between regions and reduces the color scribbles needed to color the image, since one complete segment only needs a small color marking. Figure 2(a) shows that the Optimization method needs more color markings than the segmentation method. Thus, it is time consuming and troublesome for the user to make a large number of color scribbles and yet produce an unsatisfactory result. Figure 2(b) shows that the color flow between background and objects is uncontrollable without marking the colors close enough to the edge. For the Optimization algorithm, the variance of the monochromatic luminance value inside each pixel's 3x3 neighborhood is calculated. Then, the weighting function ω_rs is derived as inversely proportional to the intensity difference between pixels [5]. The equation is

    ω_rs = k e^( −(Y(r) − Y(s))² / (2σ_r²) ),   (1)

where Y(r) and Y(s) are intensities of the pixels and k is the proportional constant value. The weight is large when the difference between the two intensities Y(r) and Y(s) is small, and the color difference between pixels is based on the weight.
Therefore, Optimization algorithm takes the advantage of mixing the color for the pixels that have similar intensities and makes images look natural. However, it would become a disadvantage when two objects are having similar intensity and are overlapping, as shown in Figure 2. The flow of colors between the objects with similar intensities could not be
color mixing circles shown in Figure 3. Figure 3(b) shows that, after coloring with the Optimization method, the colors inside the circles flow out over the boundaries without control. For the overlapping parts of the circles, which have no color assigned, the Optimization algorithm nevertheless makes a good guess at the color assignment because the colors are mixed very well: red, green, blue, and black are shown clearly in the image.
In the new Canny Optimization algorithm, the gray image is first converted into a luminance and chrominance color system, the National Television System Committee (NTSC) YUV format: Y is the luminance channel carrying the gray intensity levels, while U and V are the chrominance channels carrying the color information. After the conversion, a black-and-white luminance image is obtained. Next, conventional Canny edge detection with appropriate threshold and sigma values is performed on the luminance image. Overlaying the edges onto the image gives the user a better indication of where to mark the colors; the ambiguity about appropriate color-marking positions is resolved once the edge-segmented regions are known. At least one color mark is needed within each edge-segmented region. A color can also be marked across an edge in order to merge segment regions that share the same color, or to cancel extra edges: the edges are eliminated at the pixels where color markings and edges intersect. When a gray-level color is marked, the original grayscale is preserved in that marked segment region. Finally, the chrominance channels U and V are extracted from the color-marked image.
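A minimal Python sketch of these preparation steps is given below, using scikit-image for the Canny detector and the YUV conversion. It is not the authors' implementation; the scribble-detection threshold, the red overlay color and the function name are assumptions made purely for illustration.

```python
import numpy as np
from skimage import color, feature

def prepare_for_marking(gray, marked_rgb, sigma=1.0):
    """gray: 2-D luminance image in [0, 1]; marked_rgb: the same image after
    the user has scribbled colors on it (H x W x 3, values in [0, 1])."""
    # Conventional Canny edge detection on the luminance image.
    edges = feature.canny(gray, sigma=sigma)

    # Overlay the edges on the gray image so the user can see where to mark.
    overlay = np.dstack([gray, gray, gray])
    overlay[edges] = (1.0, 0.0, 0.0)                 # edges shown in red

    # Extract the chrominance channels (U, V) from the color-marked image.
    yuv = color.rgb2yuv(marked_rgb)
    U, V = yuv[..., 1], yuv[..., 2]

    # Cancel edges wherever a color scribble crosses them, so that regions
    # meant to share one color can be merged.
    scribbled = np.any(np.abs(marked_rgb - gray[..., None]) > 1e-3, axis=-1)
    edges = edges & ~scribbled

    return edges, U, V, overlay
```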
In Figure 3(d), the colors of the circles are well controlled inside the boundaries, but the overlapping parts of the circles show some errors in color assignment after conventional Canny edge detection is added. Finally, Figure 3(f) is the result of the Canny Optimization method, which combines the color-mixing advantage of the Optimization algorithm with the color-flow control of edge segmentation. This is done by superimposing the conventional Canny edges onto the image and breaking the edges at the necessary regions, as shown in Figure 3(e). This edge-elimination step is applied when extra edges occur during edge detection. Likewise, when edges are missing between two objects whose intensities are almost the same, edges can be added to the image to control the color flow between the objects.
After the unnecessary edges are cancelled, the program overlays the remaining edges onto the luminance image so that the user can inspect and modify them: the user adds black strokes to add edges and white strokes to erase edges.
Next, the modified edge image is used as the Y channel and combined with the U and V channels previously extracted from the color-marked image, and the Optimization algorithm is run on this image. Once the image is colorized, the original luminance channel replaces the Y channel again so that the superimposed edges are not visible. Finally, an edge re-coloring step is performed before the final colorized image is produced, to minimize erroneous color assignments on the edge pixels.
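The recombination and restoration steps could look like the sketch below. It assumes a `colorize` routine implementing the Optimization algorithm of [5] is available (a hypothetical callable, not shown here), and it uses a nearest-neighbor rule for recoloring the edge pixels, which is an assumption rather than the paper's stated procedure.

```python
import numpy as np
from scipy import ndimage
from skimage import color

def assemble_and_restore(edge_y, U, V, original_y, edge_mask, colorize):
    """edge_y: luminance with Canny edges superimposed; original_y: untouched
    luminance; edge_mask: boolean mask of the remaining edge pixels."""
    # Combine the edge-modified Y channel with the marked chrominance channels
    # and run the Optimization colorization on the resulting YUV image.
    U_out, V_out = colorize(np.dstack([edge_y, U, V]))

    # Recolor the edge pixels from their nearest non-edge neighbor so that no
    # erroneous colors remain along the superimposed edges.
    _, (iy, ix) = ndimage.distance_transform_edt(edge_mask, return_indices=True)
    U_out, V_out = U_out[iy, ix], V_out[iy, ix]

    # Restore the original luminance so the edges are no longer visible.
    return color.yuv2rgb(np.dstack([original_y, U_out, V_out]))
```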
3.2 Flow Chart
Figure 4 summarizes the Canny Optimization procedure: source gray image; convert to luminance channel; Canny edge detection; overlay edges onto the gray image; color marking from the user; detect the color markings; cancel the edges that intersect with colors; modify the Canny edges; overlay the edges onto the image; extract the chrominance channels; combine to form the NTSC image; colorization using Optimization; restore the original luminance channel; recolor the edge pixels; output the colored image.
Figure 4. Canny Optimization Flow Chart
4. Results
Figure 5: House images. (a) Image after marking with colors, (b) colorized image after Optimization, (c) colorized image after conventional Canny edge detection, and (d) colorized image after the Canny Optimization method.
Figure 6: Barbara images. (a) Image after marking with colors, (b) colorized image after Optimization, (c) colorized image after conventional Canny edge detection, and (d) colorized image after the Canny Optimization method.
Figure 7: Cameraman images. (a) Image after marking with colors, (b) colorized image after Optimization, (c) colorized image after conventional Canny edge detection, and (d) colorized image after the Canny Optimization method.
Figure 8: Tank images. (a) Image after marking with colors, (b) colorized image after Optimization, (c) colorized image after conventional Canny edge detection, and (d) colorized image after the Canny Optimization method.
5. Discussion
The edge detection used on the images is Canny edge detection. In Figure 5(b), the colors are mixed up near the windows and the roof after coloring with the Optimization algorithm. In Figure 5(c), the roof of the house looks better, without the blue color bleeding, after conventional Canny edges are added. In Figure 5(d), the house looks best, without color-mixing errors, after the Canny Optimization technique is applied. Figure 6 shows the Barbara images. Figure 6(a) shows the color markings on the grayscale image. In Figure 6(b), Barbara's skin color mixes with the colors of the floor and clothes after colorization with the Optimization algorithm. In Figure 6(c), most of the color-mixing problems are solved, but some minor errors remain due to incomplete edges. In Figure 6(d), the Canny Optimization method controls most of the color mixing between the objects by using complete edges. Similarly, Figure 7 shows the Cameraman image. Figure 7(a) shows the color markings on the grayscale image. In Figure 7(b), parts of the grass field have a dull green color because of the influence of nearby objects after coloring with the Optimization algorithm. In Figure 7(c), the green of the grass field looks better, but the color of the buildings is influenced by the blue of the sky after conventional Canny edges are added. In Figure 7(d), the color problems in the grass field and the buildings are solved by the Canny Optimization method. Figure 8 shows the tank image. Figure 8(a) shows the color markings on the grayscale image. In Figure 8(b), the green of the tank flows out into the grass after coloring with the Optimization algorithm. In Figure 8(c), the color problems still occur after conventional Canny edges are added. In Figure 8(d), the color mixing between the tank and the grass is under control after the Canny Optimization method is used.
6. Conclusion
Both colorization approaches, Optimization and segmentation, have their advantages and disadvantages. The new Canny Optimization algorithm combines the advantages of the two methods and overcomes their limitations. The combination gives a better and more plausible colorized result with a minimal number of color-marking pixels. Furthermore, overlaying the edges onto the image guides the user to the best positions for color marking, so the trial-and-error time of the original Optimization method is saved. With the new technique, a promising colorized result can be obtained. In the future, it may be possible to convert this manual colorization method into an automatic one by means of object recognition together with precise edge detection.
7. References
[1] G. Burns, "Colorization", Museum of Broadcast Communications, http://www.museum.tv/archives/etv/C/htmlC/colorization/colorization.htm
[2] T. Welsh, M. Ashikhmin, and K. Mueller, "Transferring Color to Greyscale Images", ACM Transactions on Graphics 21(3), July 2002, pp. 277-280.
[3] R. C. Gonzalez and R. E. Woods, "Digital Image Processing", Addison-Wesley, 1992.
[4] Y.C. Huang, Y.S. Tung, J.C. Chen, S.W. Wang, and J.L. Wu, "An Adaptive Edge Detection Based Colorization Algorithm and Its Applications", Proc. ACM Multimedia, 2005, pp. 351-354.
[5] A. Levin, D. Lischinski, and Y. Weiss, "Colorization Using Optimization", ACM Transactions on Graphics 23(3), 2004, pp. 689-694.
[6] J. Shi and J. Malik, "Normalized Cuts and Image Segmentation", in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2000, pp. 731-737.
SESSION TS3E
TOPIC: COMPUTER APPLICATIONS & CYBERNETICS
SESSION CHAIRMAN: Prof. Lim Swee Cheng
_________________________________________________________________________________________
Time     Paper No.   Paper Title                                                                Page No.
_________________________________________________________________________________________
2.30pm   TS3E-1      CITRA: A Children's Innovative Tool for Fostering Creativity, Siew Pei Hwa (Universiti Tunku Abdul Rahman, Petaling Jaya, Selangor, Malaysia)   367
2.50pm   TS3E-2      An Efficient Technique of Decomposing Bangla Composite Letters, Mohammad Mahadi Hassan, Riyad Ush Shaheed, Mohammed Saiful Islam, Mohammad Shahnoor Saeed, Mohammed Nizam Uddin (International Islamic University Chittagong, Bangladesh)   374
3.10pm   TS3E-3      Radio Frequency Identification (RFID) Application in Smart Payment System, Danial Bin Md Nor, Mohamad Izwan Bin Ayob, Mohd Helmy Bin Abd Wahab (Kolej Universiti Teknologi Tun Hussein Onn, Batu Pahat, Johor, Malaysia)   379
3.30pm   TS3E-4      Enhance Polyphonic Music Transcription with NMF, Sophea Seng, Somnuk Phon-Amnuaisuk (MMU Cyberjaya, Selangor, Malaysia)   384
CITRA: A Children's Innovative Tool for Fostering Creativity
Siew Pei Hwa
Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman
[email protected]
Abstract
Technological applications, especially multimedia courseware, have become common in today's education, stimulating innovative approaches to teaching and learning. This paper describes an interactive multimedia courseware package called "CITRA", aimed at fostering creativity in children. Four strictly interrelated learning modules are available within each interactive multimedia courseware in the package. The product is addressed to primary-school children, who can practice thinking, reading and writing skills in a pleasant and appealing manner. The paper also describes the author's experiences in using a multimedia approach to motivate creative writing in a Standard 3 classroom. The author performed a classroom study on the potential uses of storytelling and reading activities, and electronic games, for enhancing creative writing in intermediate-grade classrooms. The author believes the experiences from this study can provide insight into issues of importance to research on multimedia learning in the classroom and to education as a whole.

Keywords: Creativity, multimedia courseware, storytelling, traditional oral narratives, thinking skills

1. Introduction
In recent years, research on the use of computers in schools has increased greatly. The notion that children learn by constructing their own knowledge is highly popular among educational theorists. Children ought to be active, not passive, in the learning process; they ought to be doing something, not merely watching. Multimedia technologies offer children the opportunity to learn "actively" by allowing them to construct knowledge as interactive multimedia documents (e.g. multimedia stories).
This study examines the role played by information and communication technologies as cognitive tools in the classroom. One objective of the study is to promote and foster creativity in children. CITRA is intended to help children be creative and to motivate them to do what they enjoy. Children can create stories by using multimedia: the Creating Story pages, built into the Mind Test Land module, serve as a multimedia creative environment that allows children to construct their own personalized stories.

2. Description of CITRA
CITRA reflects a new, multimedia approach to fostering a sense of creativity in children. It is an interactive and flexible tool, centered on the child, who gradually constructs his or her knowledge. The teaching and learning environment in CITRA includes four strictly correlated learning modules.

2.1. Storytelling World Module
The Storytelling World (Dunia Mari Bercerita) module introduces children to various kinds of traditional Malay oral narratives via a digital storytelling technique. The module incorporates a variety of media such as audio, graphics and animation in presenting the stories. Its focuses are storytelling and projecting the positive images of the stories. The module also allows children to practice and improve their comprehension and listening skills.
2.2. Enjoyable Reading World Module
The Enjoyable Reading World (Alam Baca Ria) module (Figure 1) also introduces children to
various kinds of traditional Malay oral narratives. It is developed based on the Whole Language approach so that children can learn to read more effectively. The Whole Language approach was chosen because storytelling adapts very well to it [3]. Under this approach, children are taught to read not phonetically but through the meaningfulness of words: when words are sounded out by the system and children find them meaningful, they remember the whole word better [4]. The words chosen for the stories are based on the natural language of children aged eight to nine. Besides integrating audio, graphics and animation in presenting the stories, as in the Storytelling World module, this module also encompasses text; this is the difference between the Storytelling World module and the Enjoyable Reading World module. In addition to projecting positive moral values, this module can motivate children and indirectly cultivate a reading habit, thanks to the multimedia approach and the tutoring strategies of scaffolding, self-explanation and hyperlinks provided in the module. The module adopts interactivity and perpetual navigation: children can interact with the system, for example by clicking on an active text, a button or an icon to reach definitions and further information on terms, to print a story, or to navigate to other plots of the story or other modules in the courseware.
2.3. Word Enrichment Corner Module
The difficult words or vocabulary identified in the stories of the Enjoyable Reading World module are reinforced and made meaningful to the children through text, graphics and audio in the Word Enrichment Corner (Sudut Pengayaan Kata) module. For certain words, a motion video-in-a-window presents the explanation, which helps the children understand the vocabulary better. The module is hyperlinked to the Enjoyable Reading World module, where the vocabulary occurs, and children can also retrieve a word using the Quick Search Menu provided in this module. Interactivity is an important component of the module, which allows the child to extend his or her vocabulary with the terms used in the story. The main objective of this module is to enrich the children's vocabulary.

2.4. Mind Test Land Module
Four activities that adopt the problem-solving, interactivity and perpetual-navigation approaches are built into the Mind Test Land (Taman Uji Minda) module. The activities are fun, mind-stimulating and motivating, and are related to the previous three learning modules. The four activities are the Knowledge Test, two kinds of electronic games, and the Creating Story Pages. Four different tests or quizzes are built into the Knowledge Test activity: the Comprehension Test, the Vocabulary Quiz, the Good Moral Values Appreciation Test and the Good Moral Values Application Test.

2.4.1. Knowledge Test Activity. The Comprehension Test and Vocabulary Quiz are designed and developed with the objective of testing and evaluating the children on their overall understanding of the stories and words presented in the previous three learning modules. The children can thus monitor their own achievement and performance based on the feedback acquired from the tests and quizzes. The Good Moral Values Appreciation Test (Figures 1 and 2) is designed based on the primary-school (KBSR) moral education curriculum. The curriculum incorporates sixteen key good moral values: good-natured, independent, good manners, respect for each other, love, justice or fairness, freedom or liberty, bravery, cleanliness, purity or hygiene, honesty, hard work, co-operation, simplicity, thankfulness or gratitude, rationality, and social living. Each good moral value entails a number of sub-values. However, not all the values and sub-values are integrated in every story. Therefore, before the development of CITRA commenced, the author read the selected stories critically to identify the positive and negative elements and the good moral values integrated in them. The good moral values identified in a story, and their sub-values, are explained briefly; then only the sub-values embedded in the story are put to the learner as questions.
Figure 1. A Sample Screen from the Good Moral Values Appreciation Test. (Mouse over a sub-value shown in purple to see its explanation; click a sub-value shown in red to answer a question; see Figure 2 for a sample question screen.)
2.4.2 Electronic Games. The next two activities built into the Mind Test Land module are various kinds of electronic games, such as jigsaw puzzles, sliding puzzles, memory games, maze games, hangman, and so forth. These activities allow children to practice and promote their cognitive and psychomotor skills.
2.4.3 Creating Story Pages. The Creating Story Pages (Figure 4) are built into the module with the aim of enriching the children's literacy experience and motivating a sense of creativity in children. The activity allows children to compose their own stories based on the given pictures, using either the information already presented in CITRA or other sources external to CITRA. A menu of icons suggesting the functions that can be performed is provided. The icons are generally self-explanatory, leading children to identify their correct meaning after a few trials and to understand how to use the tool (an on-line help is, however, available). Overall, working with the Creating Story Pages is an appealing and friendly task that children can perform as if it were a game. It is, however, advisable that the teacher is present for supervision.
Figure 2. A Sample Screen of a Question from the Good Moral Values Appreciation Test. (Based on the question raised, children type their answer in the space provided; the answer can then be saved or printed out for discussion with peers or teachers.)
The final test in the Knowledge Test activity is the Application Test (Figure 3). The Application Test aims to evaluate and reinforce the moral sense of the children in relation to their real-life experience. The Good Moral Values Appreciation Test and Application Test intend to furnish a cognitive understanding of human behavior, expand life experiences and develop sensitivity to the use of moral sentiments as important tools in coming to terms with human experience. Meanwhile, the children can practice and promote their affective and cognitive skills through the tests.
Figure 3. A Sample Screen from the Good Moral Values Application Test. (Based on the question, children type their answer in the space provided; the answer can then be saved or printed out for discussion with peers or teachers.)
Figure 4. A Sample Screen from a Creating Story Page
When opening a Creating Story Page, the screen displays an open exercise book with facing pages (Figure 4). The left-hand page contains the title of the story to be created, a picture, the page number, and direction icons (next page and previous page). The right-hand page (empty at the start) is the environment in which the story-creating activity is carried out, together with icons corresponding to the different functions that can be selected, such as "save", "print", "open", and "delete and rewrite". Invented stories and personal experiences interconnected with the pictures are the important tasks that can help the child become actively involved in the work.
Children at this stage enjoy flexing their brainpower and applying the skills they have already developed to acquire further knowledge. They prefer software that is able to test their growing ability to apply logic to arrive at conclusions. To dovetail with schoolwork, the software should feature a variety of activities and problem-solving challenges [2]. The creation of CITRA attempted to embed sound theoretical concepts and desired enhancements within the courseware design and to offer more support to children at the intermediate stage of literacy. The children have no major difficulties with the interface but do need help available from time to time. In addition to helping children master academic skills in school, it is important to give them opportunities that fuel their imaginations and inspire critical and creative thinking skills. The CITRA package can help here, too. The Storytelling World module and the Enjoyable Reading World module take children on a fantasy adventure, and children call on their own creative powers to puzzle their way through the adventures. At its core, storytelling is the art of using language, vocalization, and/or physical movement and gesture to reveal the elements and images of a story to a specific, live audience. Most dictionaries define a story as "a narrative account of a real or imagined event or events". Within the storytelling community, a story is more generally agreed to be a specific structure of narrative with a specific style and set of characters which includes a sense of completeness. Stories are an influential form of passing on accumulated wisdom, beliefs and values. Stories explain how things are, why they are, and a community's or an individual's role and purpose. Stories are the building blocks of knowledge, the foundation of memory and learning. Stories connect individuals with their humanness and link past, present, and future by teaching them to anticipate the possible consequences of their actions. Narrative is a fundamental aspect of the human experience. It seems that the empathic approach needed for stimulating personal growth and development may be similar to that needed to enable creativity; the combination of stimulation and receptivity to different ideas creates a fertile ground for the creative imagination [1]. Children can choose scenes and characters from home and school or from traditional oral narratives such as fairy tales. They may stay with the familiar, but may equally mix and match the objects and characters. This ability to mix the familiar and the novel, the past, present and future, enables children to use their imagination and to challenge and confront existing norms and stereotypes, to change power relations, and also to have fun and enjoy their writing.
Meanwhile, the Creating Story Pages allow the child to organize his or her knowledge, creating different "worlds" that correspond to his or her ways of looking at reality; a story can pertain to different worlds at the same time. The possibility of creating stories in the Creating Story Pages can also be exploited by teachers, who can use it in daily activities to draw the children's attention to story-writing practice. After a great deal of practice with this activity, the child is unlikely to face story-writing difficulties in the future. In brief, this module can stimulate creative writing and allows children to write humorous or thought-provoking stories in a fun environment. It is believed that creative writing can be supported and encouraged through this activity.
3. CITRA Creative Learning Environment
Children can use CITRA as an innovative tool at their own pace, working in front of the computer, supervised and guided by the teacher. The child who wishes to use the tool first types in his or her name. The screen then displays the welcome phrases and the title of the story to be presented. The child is then introduced to the main menu (Figure 5), where the four learning modules (labelled "Dunia Mari Bercerita", "Alam Baca Ria", "Sudut Pengayaan Kata" and "Taman Uji Minda") are clearly placed on the screen with relevant graphics that adopt an intuitive metaphor concept. The graphical interface of the program is clean and pleasant, and the icons relating to the different functions are shown and activated as the work proceeds.
Figure 5. A Sample Screen of the Main Menu
By the time children reach Standard 3 or 4 (ages 8 to 10), they are capable of abstract thinking and more independent learning. Besides, in the word banks children can find story hints to assist them in writing. CITRA, and especially the Creating Story Pages activity, is designed to facilitate creativity and perspective-taking through cross-fertilisation of ideas. It reflects the child's attainment level and may retain knowledge from previous uses of the courseware. Using all kinds of senses, such as sight, hearing, smell, touch and even taste, can stimulate new ways of learning. Software programs such as CITRA, which include listening, reading and writing activities, encourage children to build a sense of creativity and promote their thinking skills. One of the computer's greatest strengths is that it takes children to compelling graphic worlds which they can control. CITRA is also capable of helping a child get in touch with his or her intuition: those powerful "feelings" often lead one to the right answer to a problem. Encouraging a child to question things is an important strategy for fostering intuitive thinking, and learning to come up with novel and unusual solutions is a skill that can help children be successful in all areas of their lives. CITRA can foster creative thinking in many ways; the Mind Test Land module, for example, offers different kinds of activities and invites children to travel to a challenging world that develops creative thinking skills. After observing creative writing sessions, discussions were held in which teachers envisaged how the new technology might support creative writing in the classroom. With the help of interviews, the results indicated that children were also able to contribute to the detailed provision within the courseware. Observations of existing classrooms showed children and teachers making links between the known and the unknown, between reality and fantasy, between their worlds and the worlds of others. This was accomplished by reading, retelling and discussing traditional fairy stories, with the use of role-play to explore the feelings and interactions of characters within the stories.
4. CITRA Evaluation and Findings
For the study on the effectiveness of CITRA, observational ethnographic and quasi-experimental methods were used. Various research instruments, including questionnaires, activity books, research notes, and an interview protocol, were designed and created. The reliability of the major instrument, namely the seven sets of questionnaires, was tested; the values of Cronbach's coefficient alpha ranged from 0.82 to 0.97.
The sample of the study comprised 85 Standard 3 students from a primary school in Kajang, Selangor. Two weeks before the study commenced, the 41 children in the multimedia group were given a warming-up session to familiarize them with the computer and with CITRA. They then worked on the four learning modules at their own pace during the treatment, while the conventional group (44 children) worked with printed storybooks. The behavior of the children while working with CITRA and with the printed storybooks was also observed and noted in research notes. Once the children had completed the printed storybooks or the four learning modules of each of the six coursewares in CITRA, they were interviewed based on an interview protocol and tested using activity books containing questions about what they had learned from CITRA or from the printed storybooks, respectively. The answers collected in the activity books were then transformed into five-point Likert-type scales in different sets of questionnaires based on indicators set before the study commenced. Thereafter, the data were analysed using a parametric approach in SPSS. The results presented in this paper are based on the hypothesis that the multimedia approach is more effective in practicing creativity than the conventional approach via printed materials. The findings showed a significant difference (t = 8.263; df = 83; p < 0.01) between the two groups, namely creativity practice using the multimedia courseware CITRA (mean1 = 37.9195 ± 0.6319) compared with the conventional method via printed storybooks (mean2 = 30.2636 ± 0.6732). The results are shown in Table 1.

Table 1. Comparison of the Multimedia Group (CITRA) and the Conventional Group (Printed Storybooks) on the Effectiveness of Practicing Creativity Amongst Children

Group          Mean ± S.E.          t        p
Multimedia     37.9195 ± 0.6319     8.263    0.000**
Conventional   30.2636 ± 0.6732

t > t0.01; N = 85; df = 83; ** p < 0.01
The results in Table 1 show that t > t0.01 (8.263 > 2.756), from which it can be concluded that practicing creativity using the multimedia courseware (CITRA) was more effective than the conventional instructional method via printed storybooks. The difference is clearly shown in Figure 6.
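For readers who wish to reproduce this kind of comparison, the sketch below runs an independent-samples t-test in Python. The study's analysis was done in SPSS on the real scores; the synthetic samples here only reuse the group sizes (41 and 44) and the standard deviations implied by the reported means and standard errors, so the printed numbers are purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for the creativity scores of the two groups.
multimedia   = rng.normal(loc=37.92, scale=4.05, size=41)   # S.E. 0.6319 * sqrt(41)
conventional = rng.normal(loc=30.26, scale=4.47, size=44)   # S.E. 0.6732 * sqrt(44)

t, p = stats.ttest_ind(multimedia, conventional)            # df = 41 + 44 - 2 = 83
print(f"t = {t:.3f}, p = {p:.4f}")
```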
Figure 6 reveals that none of the ratings of the children in the conventional group fell on the value "Very Effective", compared with 4.9 percent of the children in the multimedia group. Conversely, none of the ratings of the children in the multimedia group fell on the value "Not Effective", compared with 4.5 percent in the conventional group. Meanwhile, the percentage of children in the multimedia group contributing to the value "Effective" was markedly higher (80.5 percent) than in the conventional group (22.7 percent). Most of the ratings of the conventional group fell on the value "Average" (72.7 percent), against only 14.3 percent for the multimedia group. The results indicate that digital storytelling using a multimedia approach is more effective for practicing creativity than the conventional approach through printed materials.
Figure 6. Percentage of Samples and Effectiveness of Practicing Creativity (bar chart of the percentage of each group, Multimedia versus Conventional, whose ratings fell on Not Effective, Average, Effective and Very Effective).
5. Conclusion
Educators, philosophers, psychologists, and others have long been interested in the processes of children's literacy acquisition and development. Most of those concerned with children's emerging literacy have found that literacy development starts at birth and grows rapidly through literacy learning and schooling. To learn to read and write, children have to be familiar with language and use it to communicate. Children need to develop communication skills in listening, comprehending, speaking, reading and writing; each of these skills is part of children's emerging literacy. One of the ways in which very young children learn about literacy is through parents reading to them.
Media and technology are pervasive forces in today's world. People rely on various kinds of media and technology for much of their information and entertainment. Computer technology is the latest page in the history of technology and written language, and the use of computers for composing and disseminating textual information electronically is rapidly becoming a common experience. In this new century, alongside traditional books, cassette or tape recorders and visual media such as television, computers provide new means of teaching and drilling specific reading and writing skills, and this has led to widespread interest in using computers for instruction in language arts. The author fervently hopes that CITRA will be a useful innovative tool to project the intrinsic positive moral values of traditional oral narratives, cultivate the reading habit, assist children in Malaysia in enhancing literacy skills in their own vernacular language, and promote their sense of creativity and thinking skills. The author also believes that CITRA will be used for outreach to teachers, parents, and individuals in society who are interested in new curricula and education reform.
6. Acknowledgements
This work is supported by IRPA project 04-02-020054-EA221. Thanks to the CITRA team in Universiti Kebangsaan Malaysia, especially Allahyarhamah Associate Professor Dr Norhayati Abd Mukti and Associate Professor Dr Hanapi Dollah, the teachers from S.J.K. (C) Pei Min Segari, Lumut, Perak, and the teachers and students from S.K. Convent Kajang, Selangor.
7. References
[1] Members of the National Storytelling Association, What is Storytelling? (online), 1997, http://www.seanet.com/~eldrbarry/roos/st_defn.htm [accessed 3 December 2001].
[2] Raskin, R., Back To School Goes Digital: The Computer Makes Learning Fun For Today's Kids (online), 2004, http://www.robinraskin.com/articles/tlcpdfweb.pdf [accessed 29 February 2004].
[3] Teaching Grammar through a Whole Language Approach: Conceptual Orientation (online), http://web.lwc.edu/staff/fmoore/classwork/wholean.htm [accessed 23 November 1999].
[4] Wood, M. and O’Donnell, M.P. “Directions of change in the teaching of reading”, Reading Improvement, 28(2), 1991, pp. 100-103.
An Efficient Technique of Decomposing Bangla Composite Letters Riyad – Ush – Shaheed, Mohammed Saiful Islam, Mohammad Shahnoor Saeed, Mohammad Mahadi Hassan, Mohammed Nizam Uddin Department of Computer Science & Engineering International Islamic University Chittagong E-mail:
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
Abstract
The main goal of this paper is to decompose Bangla composite letters. A composite letter is composed of multiple simple letters, and there are about 253 composite letters in Bangla [11]. These letters are difficult to understand for foreigners who are interested in learning the Bangla language. In this paper we propose an efficient technique to decompose Bangla composite letters. The technique can also help in searching and sorting Bangla words.

Keywords: Composite letter, Decompose.

1. Introduction
Bangla is a very rich language, spoken by more than 200 million people worldwide. Since the declaration of International Mother Language Day, foreigners have become interested in the Bangla language. However, Bangla is a complex language because of its large number of alphabets and composite letters, which makes it very difficult for foreigners to understand; a simplification of Bangla is therefore required. In this paper we try to simplify Bangla composite letters by creating a database for almost all of the composite letters. Section 2 describes some previous methods, Section 3 proposes a new method to decompose Bangla composite characters, and Section 4 describes the proposed algorithm.

2. Previous Work
In this section we describe some previous works on the simplification of Bangla composite letters.

2.1 Method 1
Uddin, J. et al. [1] proposed a system to decompose the Bangla composite alphabet. Their system takes the ASCII code of each composite alphabet as an English character. When a composite character is encountered, the system searches for this English alphabet in the database and fetches the simplified form of that composite alphabet.
When the program scans a composite alphabet in the input text, it converts it to its assigned English character and then fetches the simplified form of that composite alphabet from the database. For every composite alphabet there is a unique simplified English alphabet in the database. Some simplified forms of composite alphabets are given in Table 1.
Table 1: Simplified forms of composite letters
The program actually retrieves the ASCII codes of all the simplified letters, so the simplified form is converted to Bangla alphabets and concatenated with the non-composite alphabets previously scanned.
For example, in a Bangla word containing a composite letter, the program converts the composite letter into its simplified constituent letters and concatenates them with the rest of the word [1]. Conversions of some simplified words are shown in Table 2.
Table 2: Words with composite letters and their forms without composite letters

2.2 Method 2
To maintain proper sorting, this method proposes an internal representation of Bangla words in which a dummy character is placed after every character that has no modifier [11]. Moreover, it ensures that there is no dummy character between the constituent parts of a compound character. Vowel modifiers are included in the character set; they may be typed before or after a character, but in the internal representation they are always shifted to after the character. Compound characters are decomposed into their constituent components and stored accordingly [11]. Table 3 shows the internal representations of a few words, where '@' represents the dummy character.
Table 3: Internal representation of words (method 2)
This method has some limitations [14]:
1. Extra vowel modifiers previously had to be accommodated on the keyboard, which in our opinion is not needed.
2. Shifting the vowel modifiers adds extra overhead, and the keyboard interface has to be complex enough to do this job.
3. Because of the dummy characters, a large amount of disk space is consumed to store Bangla words.
2.3 Method 3
In this proposed system, the authors divide the characters into vowels, consonants and necessary symbols. Here [9], a special key is used for linking characters. Words are typed as they are spelled, and the characters in words are mapped to appropriate ASCII values. No linked character is used. The vowel modifiers are assigned 10 distinct ASCII values higher than those of the consonants. The compound characters are divided into their constituent components and saved to file. The shape of those components varies based on their relative position in the compound character [9]. All the shapes are stored in the video ROM and distinct codes are assigned to them. Internal representations of some words are shown in Table 4.
Table 4: Internal representation of words (method 3)
This method has the following restrictions [14]:
1. Due to the use of the key for linked characters, extra space is required to store Bangla words.
2. Since different codes are assigned to the different shapes of the constituent parts of a compound character, a wide range of shapes and their corresponding codes have to be maintained.
2.4 Method 4
In this method the authors assign a unique two-digit number to each Bangla character. To handle composite characters they use a special two-digit number, 99 [14]. Some representations of simple and composite words are shown in Table 5.
Table 5: Internal representation of words (method 4), e.g. 607947715773, 4372527255 and 14329932995057
This method has some limitations:
• Due to the use of two extra digits, this method takes more memory.
• Searching for a composite character in memory is difficult.
3. Proposed Method
3.1 Assigning Values to Bangla Letters
First, we assign a unique number to every letter of the Bangla alphabet, along with the vowel modifiers and the consonant modifiers. The letters and their corresponding numbers are given in Table 6; the assigned numbers fall in the ranges 0-9, 10, 12-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-52, 53-55, 56-59 and 60-73.
Table 6: Numbers assigned to the letters of the Bangla alphabet
It should be noted that one of the letters is treated as a set of two characters, and that the consonant modifiers have the same numbers as their original consonants.
3.2 Simplifying the Characters of Composite Letters and Assigning Values
Secondly, we decompose the composite letters and then assign numeric values to the characters of each composite letter, using a "." operator to separate the letters of a composite character. Some examples are given in Table 7, for instance 50.32, 38.38, 21.50, 45.44 and 40.39.60.
Table 7: Values assigned to composite letters
4. Proposed Algorithm
Step 1: Store all the Sorbarno (vowels), Benjonbarno (consonants), modifiers and their assigned values in the database.
Step 2: Store all possible composite letters in the database.
Step 3: Compare the input composite letter against the database.
Step 4: If the composite letter is valid and matches an entry in the database, print the simplified Sorbarno (vowel) and Benjonbarno (consonant) letters of the given composite letter.
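To make the lookup concrete, here is a minimal Python sketch of Steps 1 to 4 under stated assumptions: the letter codes and the two sample composite letters below are illustrative entries only, not the actual assignments of Table 6, and the real system would hold all of the roughly 253 composite letters.

```python
# Illustrative letter codes (NOT the actual Table 6 assignments).
LETTER_CODE = {"ক": 12, "ষ": 43, "ত": 26, "র": 50}

# Steps 1-2: database of composite letters and their constituent simple letters.
COMPOSITE_DB = {
    "ক্ষ": ["ক", "ষ"],
    "ত্র": ["ত", "র"],
}

def decompose(letter):
    """Steps 3-4: look the composite letter up and return its simplified
    letters together with the dotted value string (e.g. '12.43')."""
    parts = COMPOSITE_DB.get(letter)
    if parts is None:
        return None                      # not a valid composite letter
    value = ".".join(str(LETTER_CODE[p]) for p in parts)
    return parts, value

print(decompose("ক্ষ"))                  # (['ক', 'ষ'], '12.43')
```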
5. Comparative Study
As described earlier, several previous works have been done in this field, but most of them have limitations. In the proposed system we try to overcome these limitations. Our improvements are as follows:
1. Only a dummy character is used between the parts of a composite character.
2. No shifting of vowel modifiers is required.
3. No extra codes are assigned to the different shapes of the constituent parts of a compound character, so there is no need to maintain a large set of codes.
4. Only a small number of dummy characters is used, so little disk space is needed to store Bangla words.
5. Sorting and searching are very easy because a specific value is assigned to each character.

6. Conclusion
Bangla is a widely used language all over the world: more than 200 million people speak Bangla, and it is an official language of three countries (Bangladesh, India and Sierra Leone). We feel proud as Bangladeshis that in the year 2000 the UN declared 21st February, our mother language day, as International Mother Language Day. In this paper we have tried to solve the problem of Bangla composite letters. We hope the proposed techniques can help to ease the difficulties of Bangla composite letters, and the algorithm can also help to search and sort Bangla characters efficiently.
References
[1] Ahmed, Jalal Uddin, Kamal, A.H.M., and Haque, Md. Raihanul, "Bangla Composite Letters Simplification (BCLS)", Proceedings of the International Conference on Computer and Information Technology (ICCIT 2003), Jahangirnagar University, Dhaka, Bangladesh.
[2] Bishwas, Sayttandranath and Rahman, Jhorna, Adhunik Bangla Byakoron ebong Rochana.
[3] Dale, Nell and Lilly, Susan C., Pascal Plus: Data Structures, Algorithms and Advanced Programming.
[4] Decomposition technique, http://woeug.ukc.ac.uk/parallel/acronyms/hpccgloss/all.html, accessed 5 March 2005.
[5] Definition of Unicode, http://www.unicode.org/standard/whatisunicode.html, accessed 2 March 2005.
[6] Dr. Muhammed, Quazi Din and Prof. Alam, Shafiul, Amader Vhasha O Rochona.
[7] Ghosh, Jagdishchandra, Adhunik Bangla Byakoron.
[8] Lipschutz, Seymour, Theory and Practice of Data Structures.
[9] Palit, Rajesh and Sattar, Md. Abdus, "Representation of Bangla Characters in Computer Systems", Bangladesh Journal of Computer and Information Technology, vol. 7, no. 1, December 1999.
[10] Prof. Asaduzzaman, Mohammed and Habibur, Sayed Rahman, Ucchatara Bangla Byakoron O Rochona.
[11] Rahman, Md. Shahidur and Iqbal, Md. Zafar, "Bangla Sorting Algorithm: A Linguistic Approach", Proceedings of the International Conference on Computer and Information Technology, Dhaka, 18-20 December 1998, pp. 204-208.
[12] Rahman, Zillur Siddiqui, Bangla Academy Bengali-English Dictionary.
[13] Minhaz Fahim Zibran, Arif Tanvir, Rajiullah Shammi and Md. Abdus Sattar, "Computer Representation of Bangla Characters and Sorting of Bangla Words", Proc. ICCIT 2002, 27-28 December, East West University, Dhaka, Bangladesh.
[14] Khan, M. H., Haque, S. R., Uddin, M. S., Khan, R. and Islam, A. T., "An Efficient and Correct Bangla Sorting Algorithm", Proc. ICCIT 2004, Brac University, Dhaka, Bangladesh.
Radio Frequency Identification (RFID) Application in Smart Payment System Mohamad Izwan Ayob, Danial Md Nor, Mohd Helmy Abd Wahab, Ayob Johari Faculty of Electrical and Electronic Engineering Tun Hussein Onn University College of Technology 86400 Parit Raja, Batu Pahat Johor, Malaysia
[email protected],{danial,helmy,ayob}@kuittho.edu.my
Abstract
Radio Frequency Identification (RFID) is the most advanced technological method for automatic data collection. It can be integrated with any enterprise application to help reduce human error and improve the speed and accuracy of business processes. Insufficient small change may seem a small matter, but it can lead to big problems for shopkeepers as well as customers. Therefore, a smart payment system was developed by applying RFID technology. The project is essentially an electronic cash system, one of the categories of electronic payment systems. Customers register and top up their cards with the administrator, while payment for meals is made at participating shops. Microsoft Visual Basic 6.0 is used to develop the interface and software of the system. Besides the RFID tag and reader, Microsoft Office Access 2003 is used to design the system database, while networking is carried out over Ethernet. The results show that RFID can be applied to a smart payment system.
Keywords: RFID, database, payment system
1. Introduction
Radio Frequency Identification (RFID) is the latest and most advanced technological method for automatic data collection. It is a mechanism for data transfer by electromagnetic waves: it works by detecting and uniquely identifying a transponder tag with a reader, and the tag responds to the reader's radio signal with its own identity and other stored data [1]. This technology does not require line of sight and goes far beyond what bar code scans can do. The optical nature of the barcode requires labels to be seen by lasers; that line of sight between label and reader is often difficult, impractical, or even impossible to achieve in industrial environments. For a barcode to function properly, the reader must have clean, clear optics, the label must be clean and free of abrasion, and the reader and label must be properly oriented with respect to each other. In contrast with barcodes, RFID technology enables tags to be read from a greater distance, even in harsh environments [1].
Furthermore, the information imprinted on a barcode is fixed and cannot be changed. RFID tags, on the other hand, have electronic memory, similar to that in a computer or digital camera, to store information about inventory or equipment, and this information can be dynamically updated [1]. RFID systems can be differentiated by the frequency range they use. Low-frequency (30 kHz to 500 kHz) systems have short reading ranges and lower system costs; they are most commonly used in security access, asset tracking, and animal identification applications. High-frequency (850 MHz to 950 MHz and 2.4 GHz to 2.5 GHz) systems, offering long read ranges (greater than 90 feet) and high reading speeds, are used for applications such as railroad car tracking and automated toll collection [1].
2. Background
In today's modern world, fast-growing technology has helped people solve their daily problems. Insufficient small change is a small matter, but it can lead to big problems for the shopkeeper as well as the customer; RFID technology can therefore inspire the invention of a useful and practical system for daily life. The main objective of this project is to develop a smart payment system by applying RFID technology, so that the shopkeeper does not need to worry about small change. The project involves hardware and software: the hardware is the RFID equipment, consisting of tag and reader, while the software uses Microsoft Office Access 2003 to build the system database and Microsoft Visual Basic 6.0 to develop the system interface. The smart payment system is developed for use at the food court of the main campus of Kolej Universiti Teknologi Tun Hussein Onn (KUiTTHO).
3. Methodology
The system is developed using the Waterfall Life-Cycle (WLC) model. This software process model is characterized by feedback loops and is documentation-
driven. The model has six main phases: requirements, analysis, system and software design, implementation and unit testing, integration and system testing, and operation and maintenance.
3.1 Requirements
The activities during this phase include exploring the concept and eliciting the project's requirements. In this phase, the title, objective and scope of the project are identified; the literature is then reviewed and information from various sources is gathered.
3.2 Analysis
The system requirements are analysed, and the specification document and the software project management plan are drawn up. For this project, the decision on which types of hardware and software to use is based on the advantages and availability of both elements.
3.3 System and Software Design
Before starting the actual coding, it is important to understand what is going to be created and what it should look like. The requirement specifications from the previous phases are studied and the system design is prepared. System design helps in specifying hardware and system requirements and in defining the overall system architecture. The hardware comprises the RFID system, including the RFID tag and reader. The Tag-it HF-I Transponder Inlay (shown in Figure 1) was chosen because it complies with the ISO/IEC 15693 standard, with a user memory of 2k bits organized in 64 blocks, and its reading distance of up to 650 mm suits this project. Microsoft Visual Basic 6.0 is used to develop the interface and software of the system.
Figure 1: Tag-it HF-I Transponder Inlay
3.3.1 System Block Diagram
Figure 2 shows the overview of the system. Every workgroup, which includes the clients (shops) and the server (administration), must be attached to an RFID reader. The system also applies the client-server concept, in which only one database, located at the server, is accessed by both client and server. At the administration side, new registration and top-up are performed, while payment can only be executed at the client.
Figure 2: Block diagram of the system
3.4 Implementation and Unit Testing
This phase focuses on implementing what was designed in the previous phase; the two phases are closely related. The activities in this phase involve programming, debugging, unit testing, and acceptance testing. The designs are translated into code, and the right programming language is chosen with respect to the type of application. On receiving the system design documents, the work is divided into modules/units and the actual coding is started. The system is first developed as small programs called units, which are integrated in the next phase. Each unit is developed and tested for its functionality; this is known as unit testing, which mainly verifies that the modules/units meet their specifications.
3.5 Integration and System Testing
As stated in the previous phase, the system is broken down into several units or modules for each function. These units are integrated into a complete system during the integration phase and tested to check that all modules/units coordinate with each other and that the system as a whole behaves as per the specifications. In other words, the separate modules are brought together and tested as a complete system. The system is tested to ensure that the interfaces between modules work (integration testing), that the system works on the intended platform and with the expected volume of data (volume testing), and that the system does what the user requires (acceptance/beta testing). After successful testing, the system is ready to be used.
3.6 Operation and Maintenance
This phase deals with problems during deployment. Generally, problems usually
happen when the system is deployed in the real environment. The main challenge is to populate the database with real data instead of test data, and to cope with a network that goes up and down; these problems usually appear at the beginning of deployment. The system requires continuous maintenance to provide a better service, and the software must be updated regularly. Unexpected problems also arise under heavy access to the system, and normally such problems do not show up during the testing phase unless a pilot or trial test has been conducted with the public.
4. Results and Analysis
Figure 4: Successful registration process
4.1 Administration Login Page Security Test
The login page is important to prevent unauthorized users from accessing the administration system. A password is required to activate the administration system; otherwise an error pop-up message appears to alert the user to log in again (see Figure 3).
4.2 Registration Process
The registration process applies only to customers, as the RFID tag is used only by the customer; however, shop registration is also important so that the shop ID can be saved in the database. First, the customer pays the administrator a registration fee plus the minimum top-up amount. The administrator then reads a new, blank RFID tag, keys in the customer's personal information and the top-up amount as the initial balance, and clicks the "Register" button, which saves all the data to the RFID tag as well as to the database. Figure 4 shows a successful registration. To update customer information, the new information is filled in the specified fields and the "Update" button is pressed.
Figure 3: Testing with a wrong password
4.3 Top-up Process
As stated before, the top-up process can only be performed in the administration module. Figure 5 shows a top-up being executed by the administrator. First, the administrator requests the card from the customer and clicks the "Read Tag" button to read it. After receiving the money and confirming the amount to be topped up with the customer, the administrator clicks the "Select Amount" combo box to select the top-up amount from a list of pre-defined amounts (RM5, RM10, RM15, RM20, RM50 and RM100); if the amount is not listed, clicking "Others" allows the amount to be keyed in (see Figure 6). The balance on the card is updated automatically, but the administrator must click the "Top Up" button to confirm the top-up. After that, the new balance is saved on the card and updated in the database.
Figure 5: Select the top up amount from “cboTopup” combo box list
Figure 6: “Other Amount” pop up form
4.4 Payment Process
First, the shopkeeper requests the card from the customer and clicks the "Read Tag" button to read it. After calculating the meal price, the shopkeeper clicks the "Select Amount" combo box to deduct the amount from the balance on the customer's card. A list of pre-defined amounts (RM1, RM2, RM3, RM4, RM5 and RM10) is shown; if the amount is not listed, clicking "Others" allows the amount to be keyed in (see Figure 7). The new card balance is then updated automatically, but the payment is not completed until the "Pay Now" button is clicked. Figure 8 shows a successful payment.
Figure 7: "Other Amount" pop-up form
Figure 8: Successful payment process
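The core balance updates behind the top-up and payment screens can be sketched as below. This is only an illustration: the real system uses Visual Basic 6.0 with a Microsoft Access database, so the SQLite calls, table names and column names here are assumptions, and writing the new balance back to the RFID tag is a separate hardware step not shown.

```python
import sqlite3

def _get_balance(cur, tag_id):
    cur.execute("SELECT balance FROM customers WHERE tag_id = ?", (tag_id,))
    row = cur.fetchone()
    if row is None:
        raise ValueError("unregistered card")
    return row[0]

def top_up(conn, tag_id, amount):
    """Add `amount` to the card balance and log the transaction."""
    cur = conn.cursor()
    new_balance = _get_balance(cur, tag_id) + amount
    cur.execute("UPDATE customers SET balance = ? WHERE tag_id = ?",
                (new_balance, tag_id))
    cur.execute("INSERT INTO transactions (tag_id, shop_id, kind, amount, tx_date) "
                "VALUES (?, 'ADMIN', 'topup', ?, DATE('now'))", (tag_id, amount))
    conn.commit()
    return new_balance          # also written back to the RFID tag by the reader

def pay(conn, tag_id, shop_id, amount):
    """Deduct a meal payment, refusing the transaction if the balance is too low."""
    cur = conn.cursor()
    balance = _get_balance(cur, tag_id)
    if balance < amount:
        raise ValueError("insufficient balance")
    cur.execute("UPDATE customers SET balance = ? WHERE tag_id = ?",
                (balance - amount, tag_id))
    cur.execute("INSERT INTO transactions (tag_id, shop_id, kind, amount, tx_date) "
                "VALUES (?, ?, 'payment', ?, DATE('now'))", (tag_id, shop_id, amount))
    conn.commit()
    return balance - amount
```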
4.5 Transaction History
In the administration area there is a menu that enables the administrator to view transactions and customers; from here, a shopkeeper can claim payment from the administrator based on what is displayed in the shop's transactions. Figure 9 shows the selection of a Shop ID and a date to view the shop's transactions, while Figure 10 shows the selection of a Customer ID and a date to view the customer's transactions. The administrator uses the navigation tab to view either shop or customer transactions at one time. After selecting the Shop ID from the list in the data combo, the administrator must also supply a date in a valid format in the specified text box; the "View" button then displays the transactions on the selected date for that Shop ID. Figure 11 shows an example of the transactions completed by a particular shop on 27-04-2006.
Figure 9: Selecting a Shop ID and entering a date to view the transactions
Figure 10: Selecting a Customer ID and entering a date to view the transactions
Figure 11: Transaction history based on selected "ShopID" and date
Meanwhile, there are two parts on "frmView2" that display the total top-up amount and the total expenses of each customer. The two transaction types are separated for easy and fast viewing. Figure 12 shows the transactions completed by a customer on 27-04-2006, consisting of one top-up of RM111 and a payment of RM3.20 made at shop "S06041201".
Figure 12: Transaction history based on selected "CustomerID" and date
5. Discussion and Conclusion This project has described the basic design approach for a Radio Frequency Identification (RFID) application in a smart payment system developed by following the WLC. The waterfall model was chosen because the project involves both a hardware prototype and software. This methodology is widely used in the defence and aerospace industries; its phase discipline and monitoring, when practised with sensible judgement and flexibility, lead to good results with no offsetting downside, and these qualities are essential to the success of large projects.

Radio frequency identification (RFID) may still be something new in this country, since not many applications of the technology can be seen yet. However, over the past two years it has been rapidly applied in various applications in both the government and private sectors. For instance, it has been used successfully by PLUS Expressways Bhd. (PEB) in the country's automated toll system for the past several years. Nevertheless, the specification of an RFID system, especially the reading distance, must be matched to the application of the designed system. In terms of security, RFID is much better than a barcode system, and the price of an RFID tag is much cheaper than a smart chip. Compared to ordinary cash transactions, the benefit of this system, especially for scholarship-funded students, is that their expenditure can be controlled. For example, items that KUiTTHO or the faculty determines students need to buy can be purchased using RFID payment; this prevents students from lavishly spending their scholarship money on unnecessary items.

In conclusion, the four main aspects of this project are the Visual Basic programming language, Microsoft Access database design, the RFID system, and the network connection between client and server. Hopefully the prototype of this RFID Smart Payment System can be upgraded in future to overcome its current weaknesses so that it becomes a reality and benefits the users of the food court in KUiTTHO. Last but not least, based on this RFID payment system, future research can also explore improvements in other typical applications such as stock taking, inventory control, libraries and logistics.
Enhance Polyphonic Music Transcription with NMF Seng Sophea
Somnuk Phon-Amnuaisuk
Perception and Simulation of Intelligent Behaviour Research Group, Faculty of Information Technology, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor, Malaysia
[email protected]
[email protected]
Abstract This paper demonstrates the enhancement of polyphonic music transcription by employing the Non-negative Matrix Factorization (NMF) algorithm. The distinctive constraints of NMF, its non-negativity constraints, lead to good behaviour in the computation because they allow only additive, not subtractive, combinations. The system enhances the time-frequency resolution of the FFT using zero-padding. The resulting coefficients are used in the NMF algorithm in order to find the notes of the polyphonic input.

Keywords: Automatic music transcription, Fast Fourier Transform, Non-negative Matrix Factorization, Zero-padding.

1. Introduction

Music transcription is the process that maps a musical signal (i.e., a sound wave) to its equivalent musical notation format. In the early days, musicians performed music transcription manually. Manually transcribing an audio stream is a tough task even for a skilful musician, so it would be beneficial to many communities if music transcription could be performed automatically.

The problem of music transcription may be classified into two categories: transcribing monophonic music and transcribing polyphonic music. Monophonic music refers to a sound wave in which no two notes are played simultaneously. Polyphonic music, on the other hand, refers to a sound wave in which more than one note is played at the same time. Thus, monophonic music transcription transcribes a single note played from a monophonic signal, while polyphonic music transcription transcribes many notes played from a polyphonic signal. Monophonic transcription is a very common and well-understood problem. Attempts towards polyphonic music transcription are much more challenging, and there has been less success in current research. In this paper we present a polyphonic music transcription that relies on the Fourier transform and factor analysis tactics.

This paper is structured as follows. We first introduce what music transcription is in Section 1. In Section 2, we state the problems and briefly mention our approach to them. Music transcription background and related works are described in Section 3; we then present our experiments in Section 4 and their evaluation in Section 5. Finally, we conclude the main idea and future works in Section 6.

2. Problem statement

In this Section we explain the music transcription problem. In Fig. 1, a music sound is transcribed into its equivalent music notation output. Fig. 1(a) shows the monophonic transcription process and Fig. 1(b) shows the polyphonic transcription process.

Figure 1: Music transcription process (a) Monophonic, (b) Polyphonic

The transcriber performs the following tasks:
1. Analog-to-digital conversion of the sound signal.
2. Determination of the pitch, time, dynamic range, etc. of the sound signal. This is a mapping from
one representation to another representation format.
3. Transcribing the extracted properties (e.g. pitch, time, dynamics, etc.) into the equivalent conventional Western music notation format.

In this paper, the emphasis is on the second step of the process mentioned above. In the literature, there are two major approaches that researchers employ to determine the pitch and time properties of the input sound wave: data-driven processing and prediction-driven processing [1, 7]. The main distinction between these two approaches is that prediction-driven processing does not depend solely on the sound wave input but also on other sources of knowledge. In both approaches, the pitch property of the sound wave is evaluated from the frequency spectrum generated by the discrete Fourier transform, and the pitches and onset/offset times are extracted from that spectrum. This approach is common to most works. However, there are major drawbacks in the precision of the onset/offset times and in the handling of harmonics in polyphonic cases. These are still open research issues.

This report discusses the enhancement of the partialism issue using Non-negative Matrix Factorization (NMF). First, the waveform audio signal sampled in the time domain is transformed into spectrograms that represent the time-dependent spectral energies of sounds. We then compute the non-negative entries associated with the spectrograms by iterating the update rule (see Section 4, Equation (6)) until the system converges. After convergence, different bases of W represent different notes of the input polyphonic stream.

3. Background and related works

The fast Fourier transform (FFT) was introduced during the 1900s as a fast computation of complex Fourier series. The FFT reveals the frequency components of a sound mass. This technique is useful when one wants to see the different frequency components in a given sound. Most works to date [3, 4, 7, 12] employ the Fourier transform for music transcription before any other techniques.

Music transcription has been investigated since the 1970s. Piszczalski and Galler [12] presented a system that transcribed musical sounds into musical notation. The system recognized the fundamental frequencies (in the human hearing range) of a monophonic input. Though the output was not perfect, the work triggered many other music transcription research projects.

The compromise between time and frequency is always an issue in the FFT. In Darch's monophonic music transcription [3], he extracts the frequency coefficients from the audio waveform using the FFT. The system computed the FFT from a 50 ms window length, which is equivalent to a 20 Hz frequency resolution. This figure was not effective for distinguishing notes below G4 (approximate figure). Darch approached this lack of resolution with a zero-padding technique, which increased the window size tenfold, resulting in a new resolution of 2 Hz. The finer resolution enabled the system to detect notes as low as C2.

Polyphonic music transcription is a difficult problem. Klapuri [7] abstracted the domain into three levels: acoustic signal, sinusoid tracks and note representation. Sinusoid tracks are the tracks of local energy maxima in time-frequency. The system extracted the sinusoid tracks in the frequency domain; the evolution of the sinusoid tracks in time can be traced by a series of frequency transforms at successive time points. Re-synthesizing the sound from the sinusoid tracks produced a result with high perceptual fidelity to the original. Dixon [4] proposed time-frequency atoms to transcribe the notes. In this system, the local energy maxima and peaks were first located in the frequency dimension, giving a set of energies localized in time and frequency. The system then searched in the time dimension to locate the corresponding note onsets, and these onsets were followed until the power dropped below a minimum threshold, which determined the offset time of the notes.

To date, the challenge of polyphonic transcription has not been solved. A number of problems are still open research issues:
1. Adjacent semitones might be erroneously detected due to poor frequency resolution.
2. An octave error occurs when the detected frequency is the expected frequency multiplied or divided by an integer power of two, also known as doubling of frequency. The system finds it hard to decide which component is a note and which is a harmonic.
3. More than one instrument may be played at the same time. Different types of instrument have different signal shapes/phases, but they contain the same frequencies for each note.

The Fourier transform is an effective tool that presents an input signal in its sine and cosine bases. Unfortunately, the Fourier transform technique cannot, in all cases, distinguish whether the coefficients are the harmonics or the fundamentals (of the polyphonic
input). Non-negative Matrix Factorization decomposes an original input coefficient matrix into the product of two non-negative matrices: a basis matrix and a coefficient matrix. NMF has been developed from the last decade up to the present day [6, 9, 10, 13, 14].

Lee and Seung [9] proposed the NMF algorithm and exploited it in an image processing task. They also compared PCA (Principal Component Analysis), VQ (Vector Quantization) and NMF in terms of their basis functions (see [9] for a more detailed discussion). They concluded that NMF is a more practical approach to learning the parts of objects because of its non-negativity constraints, which has made NMF successful at learning facial parts and semantic topics. Similarly, Hoyer [6] proposed an image processing system using NMF but with additional sparseness constraints. The system gives explicit control over the sparseness of the basis NMF, and this improves the decomposition found for the data representation because the explicit sparseness control reduces the objective function (basis NMF) at every step taken.

Smaragdis and Brown [10] applied the NMF algorithm to music transcription. Basically, the system employed NMF on magnitude spectra. Their system showed that the components in the basis matrix correspond to individual notes, and they proposed an approach to polyphonic music transcription using NMF. Plumbley and Wang [13] also developed a method for separating musical audio into streams of individual sound sources. The system used the NMF method and a masking algorithm. By putting a binary masker on the original spectrogram, the program marks a point as 1 if it is the maximum among all the basis spectrograms at the same position in its related masker. This masker estimates the phases for the outputs by probabilistic inference. The algorithm was also inspired by the automatic music transcription proposed by Plumbley and colleagues [14].

In this paper, we use the Non-negative Matrix Factorization algorithm to decompose the input spectrogram in its time-frequency information. The following Sections describe the algorithm and results.

4. Techniques

In this Section, we describe the FFT of the Fourier transform and the NMF of the factor analysis tactics. The naive implementation of the N-point discrete Fourier transform involves calculating the scalar product of the sample with N separate basis vectors. Since each scalar product involves N multiplications and N additions, the total time is proportional to N^2 (in other words, it is an O(N^2) algorithm). However, it turns out that by rearranging these operations, one can optimize the algorithm down to O(N log N), which for large N makes a huge difference. The optimized version of the algorithm is called the fast Fourier transform, or FFT. Generally, the FFT is known as an efficient algorithm to compute the discrete Fourier transform, and it is defined as

    f_k = Σ_{n=0}^{N−1} x_n e^{−i 2π nk / N}                    (1)

where k = 0, 1, ..., N−1 and x_0, ..., x_{N−1} are complex numbers. Our program computes the spectrogram based on the sampling rate of the waveform signal. Each sampling rate requires a different FFT window size. Thus, in order to have a good resolution of time and frequency for each sampling rate, we apply the zero-padding algorithm to the window of each spectrogram taken. The frequency resolution is

    f_r = 1 / W_d                                               (2)

where f_r is the frequency resolution and W_d is the window duration. If zero-padding is employed we have

    f_r = 1 / (W_d · P_n)                                       (3)

where P_n is the number of zeros padded per window; P_n usually increases by a power of two, four and so on. In monophonic cases the Fourier transform practically works well for tracking the pitch, because the system marks the first (lowest) frequency as the fundamental frequency. Unfortunately, in the polyphonic case, even though zero-padding has been added to the program, the system still suffers from the harmonics of the simultaneously played notes.
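The effect of zero-padding on the FFT bin spacing can be sketched in a few lines of NumPy. This is an illustrative Python sketch only (the paper does not state its implementation environment here); it simply shows that padding a W_d-second window by a factor P_n reduces the spacing between FFT coefficients from 1/W_d to 1/(W_d · P_n), as in Equations (2) and (3). The sampling rate and tone below are assumptions for the example.

import numpy as np

fs = 8000                 # sampling rate (Hz), assumed for the example
wd = 0.05                 # 50 ms analysis window, as in Darch's system
pn = 10                   # zero-padding factor

n = int(fs * wd)                          # samples per window
t = np.arange(n) / fs
frame = np.sin(2 * np.pi * 65.41 * t)     # a C2 tone (65.41 Hz) inside one window

# Without zero-padding: bin spacing = 1/wd = 20 Hz
spec_plain = np.abs(np.fft.rfft(frame))
freqs_plain = np.fft.rfftfreq(n, 1 / fs)

# With zero-padding: bin spacing = 1/(wd*pn) = 2 Hz
spec_padded = np.abs(np.fft.rfft(frame, n=n * pn))
freqs_padded = np.fft.rfftfreq(n * pn, 1 / fs)

print(freqs_plain[1] - freqs_plain[0])       # 20.0
print(freqs_padded[1] - freqs_padded[0])     # 2.0
print(freqs_padded[np.argmax(spec_padded)])  # peak lands near 66 Hz, close to C2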
NMF provides a solution for decomposing given data into different components. NMF decomposes a non-negative matrix into two non-negative matrices. The generic form of NMF is

    V_{mn} ≈ (WH)_{mn} = Σ_k W_{mk} H_{kn}                      (4)

where V ∈ R^{m×n} is a non-negative matrix and r is a positive integer with r < min{m, n}, and the goal is to find non-negative matrices W ∈ R^{m×r} and H ∈ R^{r×n} so as to achieve noise removal, model reduction, or feasibility reconstruction
etc. [9]. The product WH is called a non-negative matrix factorization of V, although V is not necessarily equal to the product WH. The objective function in Equation (4) can be modified in several ways to reflect the application needs.

In order to find the product WH, we first need to define a cost function that quantifies the quality of the matrices. This may be done in two ways: by measuring a distance or a divergence [9, 10, 11]. The common way of measuring distance is to find W and H which minimize the squared difference between V and WH:

    ||V − WH||^2 = Σ_{mn} (V_{mn} − (WH)_{mn})^2                (5)

Here, we adopt the common multiplicative update rules. These rules were found to be a good compromise between speed and ease of implementation for minimizing the difference between V and WH [9, 10, 11]. They are defined as

    H_{ia}^{(r+1)} = H_{ia}^{(r)} · ((W^{(r+1)})^T V)_{ia} / ((W^{(r+1)})^T W^{(r+1)} H^{(r)})_{ia}
    W_{aj}^{(r+1)} = W_{aj}^{(r)} · (V (H^{(r)})^T)_{aj} / (W^{(r)} H^{(r)} (H^{(r)})^T)_{aj}       (6)

where the updates of W and H respectively correspond to updating a column of W and a row of H (see Equation (4)). Hence, only non-subtractive combinations are allowed during the evaluation. This is believed to be compatible with the intuitive notion of combining parts into a whole, and it is how NMF learns a part-based representation.
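A compact sketch of these multiplicative updates in Python/NumPy is shown below. This is an illustrative sketch only, not the authors' code; it assumes a magnitude spectrogram V (frequency bins by time frames) and a chosen rank r, and iterates the Euclidean-distance updates of Equation (6), in which each NMF basis (a column of W) is expected to capture the spectrum of one note.

import numpy as np

def nmf_euclidean(V, r, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF minimizing ||V - WH||^2 (Lee and Seung style)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps      # spectral bases (ideally one column per note)
    H = rng.random((r, n)) + eps      # time-varying activations of each basis

    for _ in range(n_iter):
        # Element-wise multiplicative updates keep every entry non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy example: a synthetic 3-note spectrogram built from known parts
true_W = np.abs(np.random.randn(64, 3))
true_H = np.abs(np.random.randn(3, 100))
V = true_W @ true_H
W, H = nmf_euclidean(V, r=3)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))   # small reconstruction error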
In this paper, we make use of Fourier transform spectrograms in order for the NMF to produce a better result. NMF by itself does not produce anything that directly corresponds to actual notes, and would not be usable as part of an instrument separation system without serious additional processing [6]. Therefore, we build the non-negative matrix from the Fourier transform. It is defined as

    V(n, j) = FFT([x(n, j) ... y(T − n, j)])                    (7)

Let x_{n,j} be the reading of the input signal from the time increment T over a length of n time steps. We then use the non-negative matrix factorization technique to break the signal into its components. The system computes the non-negative matrices where V(n, j) is the music signal spectrogram. This is an iterative algorithm for NMF: from non-negative initial conditions for W and H, iteration of the update rules (Eq. (6)) for non-negative V finds an approximate factorization V ≈ WH by converging to a local optimum of the objective function given in Equation (4). In Section 5 we show the results and evaluate the accuracy.

5. Experiment and evaluation

The experiments were conducted using acoustic waveforms as the input signals. These waveforms are converted into and presented as digital samples. The samples are passed to the FFT to compute the frequency coefficients with the zero-padding enhancement. Fig. 2(a) shows the input in music notation, followed by the spectrograms without and with the zero-padding algorithm in Fig. 2(b) and Fig. 2(c).

Figure 2 (a): The diatonic scale
Figure 2 (b): Spectrogram without Zero-padding
Figure 2 (c): Spectrogram with Zero-padding

The change of frequency resolution after employing the zero-padding algorithm helps the transcriber to mark the fundamental frequency easily (see the sharpness of the fundamental frequencies in Figure 2(b) and 2(c)). This improvement, however, does not solve the issue of harmonics in polyphonic music.

We resort to NMF to handle the harmonic issue. NMF decomposes the input spectrogram into time and note outputs. The experiments shown in Figures 3(a), 3(b)
and 3(c) present the output of the program. Fig. 3(a) and (c) show the original notation and the piano roll after the transcription. Note that Figure 3(c) is the interpretation of the signal in Figure 3(b).

Figure 3 (a): the real musical signal
Figure 3 (b): the actual notes after NMF analysis
Figure 3 (c): Piano-roll representation after NMF

In Figure 4, a similar experiment is carried out with another musical excerpt selected without bias. The piano rolls of the expected and actual output are displayed side by side in Fig. 4(c). Only a few of the polyphonic note signals are mis-detected; most errors are semitone and octave cases. The experimental results were positive and very supportive of the exploitation of NMF in music transcription.

Figure 4 (a): the real musical signal
Figure 4 (b): the actual notes after NMF analysis
Figure 4 (c): Piano-roll, original (L) and after NMF (R)

Justification of the transcription accuracy from real music is usually subjective, because the accuracy of the transcription depends on the type of music (e.g. tempo, range of notes, texture, etc.). To judge the accuracy of our methodology in an objective way, we created simple monophonic and polyphonic patterns of notes (in the C major scale). The experimental results are summarized in Table 1.

    Table 1: Accuracy rate of monophonic and polyphonic transcriptions
    File              Correct   Incorrect
    Real monophonic   100%      0
    Real polyphonic   79%       21%

The accuracy rate in Table 1 is based on an experiment with a total of 208 notes, including monophonic and polyphonic musical signals. In the experiments on real monophonic signal transcription, the results are much better than for the real polyphonic signals: the current results reach almost 100% for synthetic and monophonic signals, and roughly 80% for polyphonic signals. Since the speed of the notes (how fast the notes are played) contributes to the accuracy of the transcription, we present the relationship between signals at different sampling rates and tempi in Figure 5. We manually select the number of samples and the zero-padding factor
according to the tempo and sampling rate of the signal. In this figure, we maintain the frequency resolution of the FFT at around 2.69 Hz. The value 2.69 Hz is used because it is able to detect notes as low as C2. All the experiments presented in this paper are parameterized according to this resolution of 2.69 Hz (the resolution between FFT coefficients).

Figure 5: Constant resolution graphs (number of samples against note duration, for note durations of 1, 0.8, 0.5 and 0.4 s and sampling rates of 8 kHz, 11 kHz, 22 kHz and 44 kHz)
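The relationship plotted in Figure 5 follows directly from Equations (2) and (3): keeping the resolution fixed at 2.69 Hz fixes the product of window duration and padding factor, so the number of samples grows linearly with the sampling rate. A small illustrative Python check (not from the paper) of the implied frame lengths:

# For a fixed target resolution fr, the zero-padded FFT length must be
#   N_fft = fs / fr   (from fr = 1 / (Wd * Pn) and N_fft = fs * Wd * Pn)
target_fr = 2.69  # Hz, the resolution used throughout the paper

for fs in (8000, 11025, 22050, 44100):
    n_fft = round(fs / target_fr)
    print(f"{fs:>6} Hz sampling rate -> about {n_fft} samples per analysis frame")
# 8000 Hz -> ~2974 samples, ..., 44100 Hz -> ~16394 samples,
# which matches the order of magnitude on the vertical axis of Figure 5.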
6. Conclusion

We have presented an automated music transcription process and demonstrated positive results from the simple non-negativity constraints for polyphonic musical signals, and good results for the synthetic and monophonic signals. For further work, we plan to continue the study of NMF and to develop further applications to spectral data analysis. Several problems remain for future research on NMF in music transcription:
- improving NMF quality with different variants such as Least Squares NMF or Alternating Least Squares NMF;
- integration of the Fourier transform, NMF and the wavelet transform.
7. References
[1] J. P. Bello and M. Sandler, "Blackboard System and Top-down Processing for the Transcription of Simple Polyphonic Music", in Proc. COST G-6 Conference on Digital Audio Effects (DAFX-00), Verona, Italy, December 7-9, 2000.
[2] M. Berry, M. Browne, A. Langville, V. Pauca and R. Plemmons, "Algorithms and Applications for Approximate Nonnegative Matrix Factorization", submitted, 2006.
[3] J. J. A. Darch, "An Investigation into Automatic Music Transcription", Final Year Project, Bachelor of Electronic Engineering, University of East Anglia (UEA), May 2003.
[4] S. Dixon, "On the Computer Recognition of Solo Piano Music", Australian Computer Music Conference (ACMC), Brisbane, Australia, pp. 31-37, 2000.
[5] R. Grosse, "CS229 Project: Segmenting Music into Notes using Mixture of Gaussians", Machine Learning Final Project, Stanford University, December 16, 2005.
[6] P. O. Hoyer, "Non-negative Matrix Factorization with Sparseness Constraints", Journal of Machine Learning Research, Vol. 5, pp. 1457-1469, November 2004.
[7] A. Klapuri, "Automatic Transcription of Music", Master of Science Thesis, Tampere University of Technology, November 2002.
[8] A. Klapuri, "Automatic Music Transcription as We Know it Today", Journal of New Music Research, Vol. 33, No. 3, pp. 269-282, September 2004.
[9] D. D. Lee and H. S. Seung, "Learning the Parts of Objects by Non-negative Matrix Factorization", Nature 401, pp. 788-791, 1999.
[10] D. D. Lee and H. S. Seung, "Algorithms for Non-Negative Matrix Factorization", in Advances in Neural Information Processing Systems 13, MIT Press, 2001.
[11] C. J. Lin, "Projected Gradient Methods for Non-Negative Matrix Factorization", Technical Report ISSTECH-95-012, Department of Computer Science, National Taiwan University, 2005.
[12] M. Piszczalski and B. Galler, "Automatic Music Transcription", Computer Music Journal, Vol. 1, No. 4, pp. 24-31, 1977.
[13] M. D. Plumbley, S. A. Abdallah, T. Blumensath and M. E. Davies, "Sparse Representations of Polyphonic Music", to appear in Signal Processing, 2005.
[14] M. D. Plumbley and B. Wang, "Musical Audio Stream Separation by Non-Negative Matrix Factorization", in Proc. of the DMRN Summer Conference, Glasgow, UK, 23-24 July 2005.
[15] P. Smaragdis and J. C. Brown, "Non-Negative Matrix Factorization", in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 177-180, October 2003.
[16] P. Smaragdis, "Discovering Auditory Objects through Non-negative Constraints", SAPA, 2004.
[17] K. Yoshii, M. Goto and H. G. Okuno, "AdaMast: A Drum Sound Recognizer based on Adaptation and Matching of Spectrogram Templates", in Proc. 2nd Music Information Retrieval Evaluation eXchange (MIREX 2005), September 2005.
SESSION TS3F
TOPIC: COMPUTER APPLICATIONS & CYBERNETICS
SESSION CHAIRMAN: Dr. You Ah Heng
_________________________________________________________________________________________
Time     Paper No.   Paper Title                                                             Page No.
_________________________________________________________________________________________
4.20pm   TS3F-2      RGB-H-CbCr Skin Colour Model for Human Face Detection                       390
                     Chong Wei Kit, Nusirwan Anwar Abdul Rahman, John See
                     (Multimedia University, Cyberjaya, Selangor, Malaysia)
4.40pm   TS3F-3      Scalable Pornographic Text Filtering System                                 396
                     Oi Mean Foong, Ahmad Izzudin Zainal Abidin, Suet Peng Yong, Harlina Mat Ali
                     (Universiti Teknologi Petronas, Tronoh, Perak, Malaysia)
5.00pm   TS3F-4      Computer Based Instructional Design in Light of Constructivism Theory       402
                     Kamel H.A.R. Rahouma, Inas M. El-H Mandour
                     (Technical College in Riyadh, Saudi Arabia)
5.20pm   TS3F-5      The Role of Computer Based Instructional Media and Computer Graphics        408
                     in Instructional Delivery
                     Inas M. El-H. Mandour, Kamel H. A. R. Rahouma
                     (Technical College in Riyadh, Saudi Arabia)
RGB-H-CbCr Skin Colour Model for Human Face Detection Nusirwan Anwar bin Abdul Rahman, Kit Chong Wei and 1John See Faculty of Information Technology, Multimedia University, 63100 Cyberjaya, Selangor, Malaysia 1
[email protected]
Abstract

While RGB, HSV and YUV (YCbCr) are standard models used in various colour imaging applications, not all of their information is necessary to classify skin colour. This paper presents a novel skin colour model, RGB-H-CbCr, for the detection of human faces. Skin regions are extracted using a set of bounding rules based on the skin colour distribution obtained from a training set. The segmented face regions are further classified using a parallel combination of simple morphological operations. Experimental results on a large photo data set have demonstrated that the proposed model is able to achieve good detection success rates for near-frontal faces of varying orientations, skin colour and background environment. The results are also comparable to those of the AdaBoost face classifier.

Keywords: Face detection, skin colour, colour models, skin classification, skin modeling.

1. Introduction

Detection of the human face is an essential step in many computer vision and biometric applications such as automatic face recognition, video surveillance, human computer interaction (HCI) and large-scale face image retrieval systems. The first step in any of these face processing systems is the detection of the presence and subsequently the position of human faces in an image or video. The main challenge in face detection is to cope with a wide variety of variations in the human face such as face pose and scale, face orientation, facial expression, ethnicity and skin colour. External factors such as occlusion, complex backgrounds, inconsistent illumination conditions and the quality of the image may also contribute significantly to the overall problem.

Throughout the last decade, there has been much development in face detection research, particularly in the abundance of methods and approaches. Recent surveys [1], [2] have comprehensively reviewed the various face detection methods available in the literature. Face detection in colour images has also gained much attention in recent years. Colour is known to be a useful cue for extracting skin regions, and it is only available in colour images. It allows easy localisation of potential facial regions without any consideration of their texture and geometrical properties. Most techniques to date are pixel-based skin detection methods [3], which classify each pixel as skin or non-skin individually and independently of its neighbours. Early methods used various statistical colour models such as a single Gaussian model [4], a Gaussian mixture density model [5] and a histogram-based model [6]. Some colour spaces have their luminance component separated from the chromatic components, and they are known to possess higher discriminality between skin and non-skin pixels over various illumination conditions. Skin colour models that operate only on chrominance subspaces such as Cb-Cr [7], [8], [9] and H-S [10] have been found to be effective in characterising various human skin colours. Skin classification can also be accomplished by explicitly modelling the skin distribution in certain colour spaces using parametric decision rules. Peer et al. [11] constructed a set of rules to describe the skin cluster in RGB space, while Garcia and Tziritas [12] used a set of bounding rules to classify skin regions in both the YCbCr and HSV spaces.

In this paper, we present a novel skin colour model, RGB-H-CbCr, for human face detection. This model utilises the additional hue and chrominance information of the image on top of the standard RGB properties to improve the discrimination between skin and non-skin pixels. In our approach, skin regions are classified using the RGB boundary rules introduced by Peer et al. [11] together with additional new rules for the H and CbCr subspaces. These rules are constructed based on the skin colour distribution obtained from the training images. The classification of the extracted regions is further refined using a parallel combination of morphological operations.

The rest of the paper is organised as follows: Section 2 briefly describes the various steps of our face detection system. The construction of the RGB-H-CbCr skin colour model is described in Section 3. Section 4 presents the use of morphological operations in our algorithm. Experimental results and discussions are provided in Section 5. Finally, Section 6 concludes the paper.

2. System Overview

Figure 1. System overview of the face detection system (Training Images feed the RGB-H-CbCr Skin Model; an Input Image passes through Skin Region Segmentation, Morphological Operations and Region Labeling to give the Detected Faces)

Fig. 1 shows the system overview of the proposed face detection system, which consists of a training stage and a detection stage. In our colour-based approach to face detection, the proposed RGB-H-CbCr skin model is first formulated using a set of skin-cropped training images. Three commonly known colour spaces, RGB, HSV and YCbCr, are used to construct the proposed hybrid model. Bounding planes or rules for each skin colour subspace are constructed from their respective skin colour distributions. In the first step of the detection stage, these bounding rules are used to segment the skin regions of the input test images. After that, a combination of morphological operations is applied to the extracted skin regions to eliminate possible non-face skin regions. Finally, the last step labels all the face regions in the image and returns them as detected faces. In our system, there is no preprocessing step, as it is intended that the input images are thoroughly tested under different image conditions such as illumination variation and quality of image. Since we emphasise the use of a novel skin colour model in this work, our system is restricted to colour images only.

3. RGB-H-CbCr Model

3.1. Preparation of Training Images

In order to build the skin colour model, a set of training images was used to analyse the properties and distribution of skin colour in various colour subspaces. The training image set is composed of 140 skin colour patches from ten colour images obtained from the Internet, covering a wide range of variations (different ethnicity and skin colour). These images contained skin colour regions that were exposed to either normal uniform illumination, daylight illumination (outdoors) or flashlight illumination (under dark conditions). The skin regions in all ten training images were manually cropped for the purpose of examining their colour distribution. Fig. 2 shows two samples of the skin-cropped training images.

Figure 2. Skin-cropped training images

3.2. Skin Colour Subspace Analysis

The prepared skin colour samples were analysed in the RGB, HSV and YCbCr spaces. As opposed to the 3-D space cluster approximation used by Garcia and Tziritas [12], we examine 2-D colour subspaces in each of the mentioned colour models, i.e. H-S, S-V, H-V and so forth, to model the skin clusters more compactly and accurately.

In RGB space, the skin colour region is not well distinguished in any of the 3 channels. A simple observation of its histogram shows that it is uniformly spread across a large spectrum of values. In HSV space, the H (Hue) channel shows significant discrimination of skin colour regions, as observed from the H-V and H-S plots in Fig. 3, where both plots exhibit a very similar distribution of pixels. In the hue channel shown in Fig. 4, most of the skin colour samples are concentrated around values between 0 and 0.1 and between 0.9 and 1.0 (on a normalized scale of 0 to 1).
M2USIC 2006
TS- 3F
Figure 3. H-V and H-S subspace plots

Figure 4. Distribution of the H (Hue) channel

Figure 5. Distribution of Y, Cb and Cr

Some studies have indicated that pixels belonging to skin regions possess similar chrominance (Cb and Cr) values. These values have also been shown to provide good coverage of all human races [8]. Our analysis of the YCbCr space using our training set further substantiates these earlier claims and shows that the Cb-Cr subspace offers the best discrimination between skin and non-skin regions. Fig. 5 shows the compact distribution of the chrominance values (Cb and Cr) in comparison with the luminance value (Y). It is also observed that varying the intensity value of Y (Luminance) does not alter the skin colour distribution in the Cb-Cr subspace; the luminance property merely characterises the brightness of a particular chrominance value.

3.3. Skin Colour Bounding Rules

From the skin colour subspace analysis, a set of bounding rules is derived from all three colour spaces, RGB, YCbCr and HSV, based on our training observations. All rules assume intensity values between 0 and 255 for each colour channel. In RGB space, we use the skin colour rules introduced by Peer et al. [11]. The rule for skin colour under uniform daylight illumination is defined as

    (R > 95) ∩ (G > 40) ∩ (B > 20) AND
    (max{R, G, B} − min{R, G, B} > 15) AND
    (|R − G| > 15) ∩ (R > G) ∩ (R > B)                          (1)

while the rule for skin colour under flashlight or daylight lateral illumination is given by

    (R > 220) ∩ (G > 210) ∩ (B > 170) AND
    (|R − G| ≤ 15) ∩ (R > B) ∩ (G > B)                          (2)

To cover both conditions, we combine rule (1) and rule (2) with a logical OR. The RGB bounding rule is denoted as Rule A:

    Rule A: (1) ∪ (2)                                           (3)

Based on the observation that the Cb-Cr subspace is a strong discriminant of skin colour, we formulated 5 bounding planes from its 2-D subspace distribution, as shown in Fig. 6. The five bounding rules that enclose the Cb-Cr skin colour region are:

    Cr ≤ 1.5862 × Cb + 20                                       (3)
    Cr ≥ 0.3448 × Cb + 76.2069                                  (4)
    Cr ≥ −4.5652 × Cb + 234.5652                                (5)
    Cr ≤ −1.15 × Cb + 301.75                                    (6)
    Cr ≤ −2.2857 × Cb + 432.85                                  (7)

Rules (3) to (7) are combined using a logical AND to obtain the CbCr bounding rule, denoted as Rule B:

    Rule B: (3) ∩ (4) ∩ (5) ∩ (6) ∩ (7)                         (8)

In HSV space, the hue values exhibit the most noticeable separation between skin and non-skin regions. We estimated two cutoff levels as our H subspace skin boundaries,

    H < 25                                                      (9)
    H > 230                                                     (10)

where both rules are combined with a logical OR to obtain the H bounding rule, denoted as Rule C:

    Rule C: (9) ∪ (10)                                          (11)

Thereafter, each pixel that fulfils Rule A, Rule B and Rule C is classified as a skin colour pixel,

    Skin Colour Rule: (3) ∩ (8) ∩ (11)                          (12)
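As a hedged illustration, the combined RGB-H-CbCr decision can be written as a per-pixel predicate. The Python sketch below is not the authors' MATLAB implementation; it assumes 8-bit R, G, B inputs, a hue value rescaled to 0-255 to match the cutoffs in rules (9)-(10), and Cb/Cr computed with the common ITU-R BT.601 conversion (the paper does not spell out a particular conversion).

def is_skin_pixel(r, g, b):
    """RGB-H-CbCr bounding rules (Rules A, B and C combined), 8-bit inputs."""
    # Cb/Cr via ITU-R BT.601 (an assumed, common conversion)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    # Hue in degrees, then rescaled to 0-255
    mx, mn = max(r, g, b), min(r, g, b)
    if mx == mn:
        h_deg = 0.0
    elif mx == r:
        h_deg = (60 * (g - b) / (mx - mn)) % 360
    elif mx == g:
        h_deg = 60 * (b - r) / (mx - mn) + 120
    else:
        h_deg = 60 * (r - g) / (mx - mn) + 240
    h = h_deg * 255.0 / 360.0

    rule_a = ((r > 95 and g > 40 and b > 20 and mx - mn > 15
               and abs(r - g) > 15 and r > g and r > b) or
              (r > 220 and g > 210 and b > 170
               and abs(r - g) <= 15 and r > b and g > b))
    rule_b = (cr <= 1.5862 * cb + 20 and cr >= 0.3448 * cb + 76.2069
              and cr >= -4.5652 * cb + 234.5652 and cr <= -1.15 * cb + 301.75
              and cr <= -2.2857 * cb + 432.85)
    rule_c = h < 25 or h > 230

    return rule_a and rule_b and rule_c

print(is_skin_pixel(200, 140, 110))   # a typical skin tone -> True
print(is_skin_pixel(60, 120, 200))    # a blue pixel -> False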
Figure 6. Bounding planes for the Cb-Cr subspace

Figure 7. Skin segmentation using the RGB-H-CbCr model

3.4. Skin Colour Segmentation

The proposed combination of all 3 bounding rules from the RGB, H and CbCr subspaces (Equation 12) is named the "RGB-H-CbCr" skin colour model. Although skin colour segmentation is normally considered a low-level or "first-hand" cue extraction, it is crucial that the skin regions are segmented precisely and accurately. Our segmentation technique, which uses all 3 colour spaces, was designed to boost the face detection accuracy, as will be discussed in the experimental results. Fig. 7 shows the skin segmentation result for two sample test images using the RGB-H-CbCr model. The resulting segmented skin colour regions have three common issues:
a) Regions are fragmented and often contain holes and gaps.
b) Occluded faces or multiple faces in close proximity may result in erroneous labeling (e.g. a group of faces segmented as one region).
c) Extracted skin colour regions may not necessarily be face regions. Certain skin regions may belong to exposed limbs (arms and legs) or to foreground and background objects that have a high degree of similarity to skin colour (also known as false alarms).

4. Morphological Operations

The next step of the face detection system involves the use of morphological operations to refine the skin regions extracted in the segmentation step. Firstly, fragmented sub-regions can easily be grouped together by applying a simple dilation on the large regions. Holes and gaps within each region can also be closed by a flood fill operation. The problem of occlusion often occurs in the detection of faces in large groups of people; even faces in close proximity may be detected as one single region due to the nature of pixel-based methods. Hence, we use a morphological opening to "open up", or pull apart, narrow connected regions. Additional measures are also introduced to determine the likelihood of a skin region being a face region. Two region properties, box ratio and eccentricity, are used to examine and classify the shape of each skin region. The box ratio property is simply defined as the width-to-height ratio of the region's bounding box. By trial and error, a good range of values was found to lie between 0.4 and 1.0: ratio values above 1.0 would not suggest a face, since human faces are oriented vertically with a height longer than their width, while ratio values below 0.4 were found to misclassify arms, legs or other elongated objects as faces. The eccentricity property measures the ratio of the minor axis to the major axis of a bounding ellipse. Eccentricity values between 0.3 and 0.9 were estimated to be a good range for classifying face regions. Though this property works in a similar way to the box ratio, it is more sensitive to the region shape and is able to accommodate various face rotations and poses. Both the box ratio and eccentricity properties can be applied to the extracted skin regions either sequentially or in parallel, following a dilation, opening or flood fill.
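A hedged sketch of this post-processing chain using SciPy and scikit-image is given below. It is an illustration of the described steps rather than the authors' MATLAB code; the structuring-element sizes are assumed, the "eccentricity" computed here follows the paper's minor/major axis definition (which differs from scikit-image's own eccentricity attribute), and treating the parallel combination as an OR of the two shape tests is one possible reading of the text.

import numpy as np
from scipy import ndimage
from skimage import measure

def face_candidates(skin_mask):
    """Refine a binary skin mask and keep regions whose shape is face-like."""
    # 1) Dilation groups fragmented sub-regions, flood fill closes holes and gaps
    mask = ndimage.binary_dilation(skin_mask, structure=np.ones((5, 5)))
    mask = ndimage.binary_fill_holes(mask)
    # 2) Opening pulls apart narrowly connected (e.g. occluded) regions
    mask = ndimage.binary_opening(mask, structure=np.ones((7, 7)))

    labels = measure.label(mask)
    faces = []
    for region in measure.regionprops(labels):
        minr, minc, maxr, maxc = region.bbox
        box_ratio = (maxc - minc) / max(maxr - minr, 1)   # width / height
        if region.major_axis_length == 0:
            continue
        # Paper's eccentricity: minor axis over major axis of the bounding ellipse
        axis_ratio = region.minor_axis_length / region.major_axis_length
        if 0.4 <= box_ratio <= 1.0 or 0.3 <= axis_ratio <= 0.9:
            faces.append(region.bbox)
    return faces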
5. Experimental Results and Discussion

The proposed RGB-H-CbCr skin colour model for skin region segmentation was evaluated in a face detection system using a test data set of 100 images containing a total of 600 unique faces. In order to build this test data set, the images were randomly selected from the Internet1, each comprising two or more near-frontal faces of a large variety of descents (Asian, Caucasian, Middle-Eastern, Hispanic and African). The test images also consist of various indoor and outdoor scenes and of different lighting conditions: daylight, fluorescent light, flash light (from cameras) or a combination of them. The size of each image ranged from 500x350 to 1000x600, regardless of face size. The face detection system was implemented in MATLAB on a 2.4 GHz Pentium IV machine with 256 MB of RAM.

1 All images taken from the Internet were used solely for this research only and there were no attempts to re-use, edit, manipulate or circulate these images for any other interests.

To evaluate our experiments, we defined two performance metrics to gauge the success of our schemes. The False Detection Rate (FDR) is defined as the number of false detections over the number of detections,

    FDR = (false detections / number of detections) × 100%     (13)

The Detection Success Rate (DSR) is defined as the number of correctly detected faces over the actual number of faces in the image,

    DSR = (correctly detected faces / number of faces) × 100%  (14)

where the number of correctly detected faces is equivalent to the number of faces minus the number of false dismissals.

Table 1 shows the experimental results of the face detection system using various combinations of morphological operations (as described in Section 4). The face detection system achieved a good detection rate of 90.83% using a parallel combination of the opening, box ratio and eccentricity operators. Other combinations resulted in a poorer DSR or a higher FDR. The proposed scheme was also compared with the well-known AdaBoost face detector/classifier by Viola and Jones [13], and the results showed that the proposed scheme (with the right configuration of morphological operators) reaches standards comparable to those achieved by the AdaBoost algorithm (90.17%) on the same data set.

    Table 1. Face detection experimental results
    Method                                          FDR (%)   DSR (%)
    Opening+BoxRatio+Eccentricity (parallel)        28.29     90.83
    AdaBoost [13]                                   18.65     90.17
    Opening+BoxRatio                                20.76     87.17
    Opening+Eccentricity                            23.73     85.17
    Dilation+Opening                                42.29     83.00
    Opening+BoxRatio+Eccentricity (sequential)      13.36     82.17

To evaluate the effectiveness of the RGB-H-CbCr skin colour model (Table 2), the face detection system was tested with various combinations of colour models, each represented by its own set of bounding rules. The combination of all 3 subspaces resulted in the best DSR and the lowest FDR values.

    Table 2. Results using various combinations of colour model bounding rules
    Method            FDR (%)   DSR (%)
    RGB only [11]     43.05     69.00
    RGB+CbCr          36.14     77.17
    RGB+H             33.82     83.50
    RGB+H+CbCr        28.29     90.83

Fig. 8 presents some sample results of the proposed face detection system. As shown in the experimental results, the proposed method sometimes failed to detect a face correctly, as seen from the high FDR of 28.29% (Table 1). This could be attributed to the usage of the morphological operators: though these operators are used in parallel to improve the likelihood of detecting faces, they may sometimes cause "over-detection" of faces. Fig. 8(b) and Fig. 8(c) show examples of false detections and false dismissals encountered in our experiments.
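For completeness, the two metrics of Equations (13) and (14) can be computed as below. This is a trivial illustrative Python helper, not part of the authors' MATLAB evaluation code.

def fdr(false_detections, total_detections):
    """False Detection Rate, Eq. (13)."""
    return 100.0 * false_detections / total_detections

def dsr(num_faces, false_dismissals):
    """Detection Success Rate, Eq. (14): correctly detected = faces - false dismissals."""
    return 100.0 * (num_faces - false_dismissals) / num_faces

# 55 false dismissals out of 600 faces reproduces the reported 90.83% DSR
print(round(dsr(num_faces=600, false_dismissals=55), 2))   # 90.83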
The RGB-H-CbCr skin colour model is able to deal with various brightness and illumination conditions, but it remains susceptible to the detection of non-skin objects that possess chrominance levels similar to skin colour. Occlusion remains a difficult problem to tackle, especially when colour is used as the cue for segmentation: occluded faces and faces that are closely located are often merged together by a minute portion of a connected skin part. Fig. 8(d) shows an example of the misclassification of multiple persons due to occlusion.

6. Conclusion In this paper, we have presented a novel skin colour model, RGB-H-CbCr, to detect human faces. Skin region segmentation was performed using a combination of the RGB, H and CbCr subspaces, which demonstrated evident discrimination between skin and non-skin regions. The experimental results showed that our new approach to modelling skin colour was able to achieve a good detection success rate. On a similar test data set, the performance of our approach was comparable to that of the AdaBoost face classifier. In future work, we intend to refine the use of morphological operations in the post-processing of the extracted skin regions. Adaptive training (incremental learning) of the skin colour model can be used to improve the overall classification of skin regions. Primarily, the elimination of false detections and false dismissals is crucial to the success of a robust face detector.
Figure 8. Sample face detection results using the test data set, panels (a)-(d)

7. References

[1] M.-H. Yang, D. Kriegman, and N. Ahuja, "Detecting Faces in Images: A Survey", IEEE Trans. PAMI, Vol. 24, No. 1, pp. 34-58, Jan. 2002.
[2] E. Hjelmas and B.K. Low, "Face Detection: A Survey", Computer Vision and Image Understanding, Vol. 83, No. 3, pp. 236-274, 2001.
[3] V. Vezhnevets, V. Sazonov, and A. Andreeva, "A Survey on Pixel-based Skin Color Detection Techniques", Proc. Graphicon2003, Moscow, Russia, September 2003.
[4] J.-C. Terrillon, M. David, and S. Akamatsu, "Automatic Detection of Human Faces in Natural Scene Images by use of a Skin Color Model and of Invariant Moments", Proc. Int. Conf. AFGR'98, Nara, Japan, pp. 112-117, 1998.
[5] S.J. McKenna, S. Gong, and Y. Raja, "Modeling Facial Color and Identity with Gaussian Mixtures", Pattern Recognition, 31(12), pp. 1883-1892, 1998.
[6] R. Kjeldsen and J. Kender, "Finding Skin in Color Images", Proc. Int. Conf. AFGR'96, Killington, Vermont, pp. 312-317, 1996.
[7] S. Gundimada, L. Tao, and V. Asari, "Face Detection Technique based on Intensity and Skin Color Distribution", ICIP2004, pp. 1413-1416, 2004.
[8] S.L. Phung, A. Bouzerdoum, and D. Chai, "A Novel Skin Color Model in YCbCr Color Space and its Application to Human Face Detection", ICIP2002, pp. 289-292, 2002.
[9] R.-L. Hsu, M. Abdel-Mottaleb, and A.K. Jain, "Face Detection in Color Images", IEEE Trans. PAMI, 24(5), pp. 696-706, 2002.
[10] K. Sabottka and I. Pitas, "Segmentation and Tracking of Faces in Color Images", AFGR'96, Killington, Vermont, pp. 236-241, 1996.
[11] P. Peer, J. Kovac, and F. Solina, "Human Skin Colour Clustering for Face Detection", EUROCON 2003, Ljubljana, Slovenia, pp. 144-148, September 2003.
[12] C. Garcia and G. Tziritas, "Face Detection Using Quantized Skin Color Regions Merging and Wavelet Packet Analysis", IEEE Trans. Multimedia, 1(3), pp. 264-277, 1999.
[13] P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features", IEEE Conf. CVPR, Vol. 1, pp. 511-518, 2001.
Scalable Pornographic Text Filtering System Oi Mean Foong1, Ahmad Izuddin Zainal Abidin2, Suet Peng Yong3, Harlina Mat Ali Information Technology/Information System Department, Universiti Teknologi PETRONAS, 31750 Tronoh, Perak Darul Ridzuan, Malaysia Tel: +605-3687422, E-mail:1,2,3{ foongoimean, izuddin_z, yongsuetpeng}@petronas.com.my
Abstract

The advancement of computing enables anyone to become an information producer, resulting in rapidly growing information on the Internet. One concern arising from this phenomenon is the easy access to offensive, vulgar or obscene pages by anyone with access to the Internet. One solution to this concern is filtering software. Existing filtering software employs keyword matching to filter site content and requires human intervention in determining the harmfulness of the page content. This paper presents a prototype called DocFilter that filters harmful content from text documents without human intervention. The prototype is designed to extract each word of the document, stem the words to their roots and compare each word to the list of harmful words in a hash set. Two system evaluations were conducted to ascertain the performance of the DocFilter system. Using various blocking levels, the prototype yields an average filtering score of 73.4%. The system is regarded as producing effective filtering accuracy of offensive words for most English text documents.

Keywords: keyword matching, information filtering, adaptive information retrieval

1. Introduction

Advancements achieved in computing enable information to be digitized and to obtain the unique characteristics of bits, namely convergence, compression, increased speed of dissemination and intelligence in the network. These characteristics enable anyone to become an information producer, allowing them to produce and distribute information to a worldwide audience. Science Magazine has estimated that the size of the web is roughly about 320 million pages, with the web growing by several hundred percent per year [1]. The massive amount of information available on the Internet produces information overload. Searching for information on the Internet becomes a challenge, as it often produces an overwhelming number of links, many pointing to entirely irrelevant sites. The web became the focus of public attention in 1994, with the issue of the scourge of easily accessed on-line pornography, violence and hate speech [1].

The fear of this trend brings out the desire to protect the community, especially children, from harmful texts. As the Internet becomes a part of daily life, the need for technology solutions to help manage Internet access in education and enterprise becomes more acute [2]. As laws on harmful Internet content have been passed, the software industry has developed technological solutions, namely content-blocking filtering software that enforces their rules, blocking prohibited sites from being viewed by Internet users. Out of the desire to protect a community from harmful information on the Internet, software filters were invented. The four most popular software filters currently available are Net Nanny, Solid Oak Software's CYBERsitter, The Learning Company's Cyber Patrol and SpyGlass Inc.'s SurfWatch [2]. They employ artificial intelligence web spiders to flag potentially inappropriate content to be reviewed, categorized and added to a blocked list by the company's employees. The list of URLs added to the company's BLOCK LIST is then blocked and inaccessible to their respective clients. A huge amount of resources, mainly labor and money, is required to keep up with the increasing number of web pages on the Internet. The effort to create
a perfectly working filter has not yet been achieved. There are vast numbers of reports on the Internet about the failure of filters to block the most repulsive content, while most of them successfully block non-sexual, non-violent content. There is no software available that is able to detect potentially harmful documents without human intervention. This study aims to produce a prototype software filter that identifies obscene wording in a document. The prototype identifies the existence of obscene words in the document or page without human intervention. In addition, the study seeks a way to reduce the resources needed to filter unwanted information. In general, this study focuses on identifying harmful content of documents to find possible solutions that could be implemented to improve the identification process and reduce human intervention. The rest of this paper is organized as follows: In Section 2, previous work related to filtering software is given. Our document filtering system is introduced in Section 3. Results and discussions on the developed system are presented in Section 4. The last section summarizes this paper and proposes future work.

2. Related Work

Existing filters in the market employ the same mechanisms, that is, categorizing, listing, word filtering and access or distribution control [1]. These companies follow the same procedure of employing a web spider to flag potentially harmful content, summarizing a flagged site by taking its first 25 and last 25 characters. This summary is passed to employees for further reviewing and categorizing. If a site is offensive, its URL is added to the company's block lists. Some other filtering products pull offending words from web pages without providing any clue to the reader that the text has been altered [3]. The altered text that results from such filtering might change the meaning and intent of a sentence dramatically. Porter's algorithm removes the commoner morphological and inflexional endings from words in English [4]. It strips suffixes based on the idea that suffixes in the English language are mostly made of combinations of smaller and simpler suffixes [5]. Another product, CiteSeer, consists of three main components, namely a sub-agent to automatically locate and acquire research publications, a document parser with a database creator, and a database browser interface which supports searching by keyword and browsing by citation links [6]. Another approach to filtering software is the adaptive information retrieval system [7]. This approach is based on queries and relevance feedback from the user. The system retrieves documents based on queries from the user and waits for feedback from the user indicating whether the retrieved documents match the wanted documents. In the process of relevance feedback, the user identifies relevant documents from the list of retrieved documents to enable the system to create a new query based on the sample documents [8]. Based on this concept, the new query created from relevant documents will return documents that are also similar to the desired documents.

None of the above-mentioned approaches to information filtering is directly relevant to our proposed product in the sense that ours provides scalability. The scalable feature allows the user to determine the blocking level of a document. Unlike the approach discussed in [9], our system does not replace censored words with words or sentences that could potentially alter the whole meaning of the original sentence.
2. Related Work Existing filters in the market employed the same mechanism that is categorizing, listing, word filtering and access, or distribution control [1]. These companies follow the same procedure of employing a web spider to flag potential harmful content by summarizing the site flagged by taking the 25 first and 25 last characters. This summary is passed to employees for further reviewing and categorizing. If site is offensive, the URL will be added to the company’s block lists. Some other filtering product pulls offending words from web pages without providing a clue to the reader that the text has been altered [3]. The altered text that results from filtering might change the meaning and intent of a sentence dramatically. Porter’s algorithm removes the commoner morphological and inflexional endings from words in English [4]. It strips suffix based on the idea that the suffixes in English language are mostly made of combination of smaller and simpler suffixes [5]. Another product, CiteSeer, consists of three main components, namely a subagent to automatically locate and acquire research publication, a document parser with database creator and a database browser interface which support searching by keyword and browsing by citation links [6]. Another approach to filtering
3.1. Document Preprocessing The first step is document preprocessing, where DocFilter reads the text documents contained in the specified directory and extracts each string into individual tokens using a method called tokenizing. String tokenizing uses an existing function in Java to extract substrings from the string into individual words. This process is required to enable the system to analyze each word contained in the document word by word rather than painstakingly character by character [10]. Then the process of stemming each word takes place. The existing Porter stemmer algorithm is used in DocFilter. Stemming is the use of linguistics to get the root word.
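DocFilter itself is written in Java, but the preprocessing step can be illustrated with a short Python sketch using NLTK's implementation of the Porter stemmer; the sample text and the simple punctuation stripping below are assumptions made for the example.

from nltk.stem import PorterStemmer

def preprocess(text):
    """Tokenize a document into words and stem each word to its root."""
    stemmer = PorterStemmer()
    tokens = text.split()                       # simple whitespace tokenizing
    words = [t.strip(".,;:!?\"'()").lower() for t in tokens]
    return [stemmer.stem(w) for w in words if w]

print(preprocess("Connected connections are connecting."))
# ['connect', 'connect', 'are', 'connect']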
Figure 1: DocFilter System Architecture

3.2. List Processing
There is a list of offensive words stored in a text file named “denylist.txt”, which acts as the database for DocFilter. In the list processing phase, the offensive words contained in “denylist.txt” are loaded into a hash set. A hash set is a collection of unique elements stored in a hash table backed by an array whose items are accessed by integer index. A hash set is efficient because access time in an array is bounded by a constant regardless of the number of items in the container; the class offers constant-time performance for the basic operations such as add, remove, contains and size, assuming the hash function distributes the elements properly among the buckets.
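A minimal sketch of this loading step, under the assumption of one offensive word per line in denylist.txt (the file format is not specified in the paper), is shown below.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

class DenyList {
    // Load the offensive-word list into a HashSet for constant-time membership tests.
    static Set<String> load(String path) throws IOException {
        Set<String> denyList = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim().toLowerCase();
                if (!word.isEmpty()) {
                    denyList.add(word);      // duplicates are ignored by the set
                }
            }
        }
        return denyList;
    }
}

A call such as DenyList.load("denylist.txt") would then supply the set used during document processing.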
Table 1: Computing precision and recall

                          Actual Classes
Predicted Classes         Offensive      Non-offensive
Retrieved                 a              b
Not retrieved             c              d

Precision = a / (a + b) × 100%
Recall = a / (a + c) × 100%
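Expressed as code, the two formulas in Table 1 amount to the small helper below (a sketch; the class and method names are ours, not the paper's).

class Metrics {
    // a = offensive words correctly retrieved, b = non-offensive words retrieved,
    // c = offensive words that were not retrieved (cells of Table 1).
    static double precision(int a, int b) {
        return a * 100.0 / (a + b);
    }

    static double recall(int a, int c) {
        return a * 100.0 / (a + c);
    }
}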
3.3. Document Processing
The two functions in document processing are word detection and word frequency calculation. DocFilter uses a keyword-matching method to detect and filter occurrences of offensive words in a text document. Each token in the text document is compared against the list of offensive words in the hash set to find all unacceptable words. If an analyzed token matches a word in the offensive list, the word is recorded and its number of occurrences is counted.
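A sketch of this matching-and-counting step, reusing the hypothetical helpers sketched earlier, might look as follows.

import java.util.List;
import java.util.Set;

class OffenceCounter {
    // Count the stemmed tokens that appear in the deny list and return the
    // percentage of offensive-word occurrences for the whole document.
    static double offensivePercentage(List<String> tokens, Set<String> denyList) {
        int offensive = 0;
        for (String token : tokens) {
            if (denyList.contains(token)) {   // keyword matching against the hash set
                offensive++;
            }
        }
        return tokens.isEmpty() ? 0.0 : offensive * 100.0 / tokens.size();
    }
}

The resulting percentage is what the blocking levels described below are compared against.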
The system’s blocking level is scalable, as shown in Figure 2. Users determine the status of the text document. DocFilter provides four different blocking levels, namely “Low”, “Medium”, “High” and “Strict”. Each level uses a different percentage of offensive-word occurrences to determine the offensiveness of a text document. Using this technique, DocFilter can be customized to support different levels of access for different types of users. The blocking levels are defined as follows:
• Low blocking level – the document is offensive when the percentage of occurrence is more than 15%.
• Medium blocking level – the document is offensive when the percentage of occurrence is more than 10% but less than or equal to 15%.
• High blocking level – the document is offensive when the percentage of occurrence is more than 5% but less than or equal to 10%.
• Strict blocking level – the document is offensive when the percentage is less than or equal to 5%.

In addition, each detected offensive word is replaced with a tag to inform the user that DocFilter has filtered the document. Once an offensive word is detected, DocFilter calculates the frequency of its occurrence, and the total number of words in the document is recorded. The percentage of offensive-word occurrences for the whole text document is then calculated using the formula in Table 1.

Once the offensive words are detected and the status of the text document is determined, DocFilter provides the user with a summary of the filtering done. Figure 3 shows the summary section of DocFilter.
Figure 2: Blocking level
Figure 3: DocFilter summary sections
4. System Evaluation
Each module was developed with an objective in mind. Document preprocessing involves tokenizing words into tokens and stemming the tokens into their roots. List processing involves loading the list of offensive words from the text file into a hash set, while document processing involves keyword matching and word-occurrence calculation. Document preprocessing and list processing work in the background and are not visible to the user, while the document-processing module produces output that is readable to the user. Each of these modules functions as expected.
4.1. Comparison between Human Evaluators and DocFilter
Three different types of document were tested using DocFilter: entertainment news, song lyrics and personal blogs. A total of 25 users, consisting of 16 parents and 9 English linguists, were involved in the evaluation process, and each evaluator was given six different text documents. First, each user read the original document. Next, each user highlighted every occurrence of an offensive word encountered. The same document was then processed using DocFilter to extract all offensive words in the document. A mark was given to each word that was both highlighted in the original document and extracted by DocFilter. For each report, the average number of words highlighted by the human evaluators and of offensive words extracted by DocFilter was calculated. Then, the total average for all documents was derived by comparing the scores of the words highlighted by the human evaluators against the scores of DocFilter’s word occurrences. Finally, the text document status determined by DocFilter was compared against the status determined by the human evaluators.
Table 2: Scores comparison between human evaluators and DocFilter

Document Type                Result from Human    Result from      (D/E) × 100%
                             Evaluator (E)        DocFilter (D)
Entertainment news article   27.2                 22               80.9
Song lyrics                  27.68                21               75.9
Personal blog                19.52                19               61.5
Average Percentage                                                 72.8
From the evaluation conducted, the results can be summarized as follows:
• Based on the default blocking level (High), both the song lyrics and the personal blog were identified as offensive by DocFilter, while the entertainment news was not considered offensive. The result was the same for the human evaluators and for DocFilter.
• The average scores of offensive words detected by the human evaluators differ from the offensive words detected by DocFilter for the three categories of text documents (song lyrics, entertainment news and personal blog). Based on the offensive words stored in denylist.txt, the actual occurrences are 24 offensive words in the song lyrics, 27 in the entertainment news and 16 in the personal blog. The average scores by the human evaluators are 27.68 for the song lyrics, 27.2 for the entertainment news and 19.52 for the personal blog, whereas DocFilter detected 21 for the song lyrics, 22 for the entertainment news and 19 for the personal blog.
Based on the “High” blocking level, the performance of DocFilter in determining the status of a text document is 75.86% for the song lyrics, 80.88% for the entertainment news article and 61.47% for the personal blog. Based on these evaluations, the total average score for DocFilter’s filtering accuracy is 72.8%. Table 2 shows the score comparison between the human evaluators and DocFilter. After a document is processed, each offensive word encountered is replaced with a tag.
4.2. Precision and Recall Computation
A system evaluation was also conducted to compute the average precision and average recall for the three categories of text documents (song lyrics, entertainment news and personal blog), as shown in Table 3.
Table 3: Average precision and recall

Types of documents    Total No.    Offensive words    Average            Average
                      Words        retrieved          Precision (P) %    Recall (R) %
Entertainment News    551          22                 4.0                85.0
Personal Blog         363          19                 5.2                77.0
Song Lyrics           264          21                 8.0                60.0
Average Percentage                                                       74.0
The evaluation result reconfirms that precision is inversely proportional to recall; the average recall of the keyword-matching text filtering system is 74.0%.
5. Conclusion
Generally, most filtering software requires human intervention in determining the status of a site. Based on a keyword-matching approach, such filtering software depends on a URL block list and a list of offensive words to filter and block sites from their clients. DocFilter is a prototype aimed at finding a possible method or solution that can identify and determine the harmfulness of a document’s content without human intervention. Furthermore, DocFilter employs a stemming algorithm to reduce words to their roots in order to improve on the keyword matching used by existing filtering software. The architecture of DocFilter was carefully designed to give the best possible system output. Overall, the experimental results show that the application manages to detect offensive-word occurrences in the sample documents. DocFilter produced quite a good result, with an average score of 73.4%. The performance of DocFilter can be improved with the inclusion of an automated revision of the offensive-word list, so that users can add and delete offensive words based on their preferences.

The contribution of the study is that, unlike the filtering approach in [3], our system does not alter censored words into words that do not necessarily have the same meaning as the offensive words; our system censors offensive words and displays the original article minus the offensive words. Secondly, our system is scalable, allowing the user to adjust the level of blocking.

Some added features are recommended for future work, including improvement of the stemming algorithm. Filtering accuracy can be improved if the stemming algorithm stems each word to its root more accurately; currently, DocFilter suffers some inaccuracy in the filtering process due to under-stemming and over-stemming. In addition, automating the list of offensive words would be a good feature: it would enable users to add and delete words from the offensive-word database as they see fit, since new offensive words are likely to appear. Another recommendation is to extend the system to offensive words in Malay, as the number of websites in Malay has increased.

6. References
[1] Web Skills and Evaluation. The Importance of Information. Retrieved 18 August, 2005, from the World Wide Web.
[2] Hunter, C.D., 1999, ‘Filtering the Future?: Software Filters, Porn, PICS and the Internet Content Conundrum’, unpublished thesis, University of Pennsylvania: Faculty of the Annenberg School.
[3] Chowdhury, G.G., 2001, ‘Introduction to Modern Information Retrieval’, Library Association Publishing, London.
[4] Packet Dynamics Ltd., 1999-2005, ‘Studying Bloxx Filtering Technologies Version 2’, Bloxx No Nonsense, 1-6.
[5] The Porter Stemming Algorithm. Retrieved 15 August, 2005, from the World Wide Web.
[6] Bollacker, K., Lawrence, S. & Giles, C.L., 1998, ‘CiteSeer: An Autonomous Web Agent for Automatic Retrieval and Identification of Interesting Publications’, 2nd International ACM Conference on Autonomous Agents.
[7] Department of Computer and System Sciences, 1998, ‘Information Filtering with Collaborative Interface Agents (Report)’, Stockholm, Sweden.
[8] Chen, L. & Sycara, K., 1997, ‘WebMate: A Personal Agent for Browsing and Searching’.
[9] Censorware: How well does Internet filtering software protect students? Retrieved 2 September, 2005, from the World Wide Web.
Computer Based Instructional Design in Light of Constructivism Theory Kamel H. A. R. Rahouma
Inas M. El-H. Mandour
Assoc. Professor of Computer Science Technical College in Riyadh Riyadh, Kingdom of Saudi Arabia E-mail:
[email protected]
Assistant Professor of Educational Technology Faculty of Education for Women in Qwayia Qwayia, Kingdom of Saudi Arabia E-mail:
[email protected]
Abstract
Constructivism is a recently established theory which holds that knowledge is not transferred from the teacher to the learner through instruction; rather, learners build their own knowledge structures through interaction with the learning environment, guided by instruction and the teacher. This paper explains the relationship between constructivism theory and computer based instructional design (CBID). It exposes the different aspects of each of them and gives some constructivist recommendations for CBID. The terms instructivism and constructivism are explained. The instructional design approaches and the relationship between instructional design and instructional development are also explored. Models of computer based instructional design are presented and some of the constructivist recommendations are highlighted.

Keywords: Constructivism, Instructivism, Techno-Centric Instructional Design, Rapid Prototype, Microworlds, Computer Games.

1. Introduction
The concept of constructivism represents a dramatic alternative view of instructional technology. Constructivism asserts that learning is a continuous and never-ending process of building and reshaping mental structures. Instructional designers are asked to merge features of gaming, simulations, and microworlds to construct flexible learning environments in which students can appropriately alter the environment to match their abilities and interests and use it to exercise learning skills. Rather than viewing students as passive agents who "receive" instruction, it is assumed and required that they be active learners [1]. Some research indicates that total learner control of CBI is usually not advisable unless paired with some sort of coaching or advisement strategy [2]. Other research indicates that learner control is an important characteristic of successful instruction [3]. Learner control must consider performance and motivation variables; learners should be provided with some level of control over the selection, sequence, and pacing of content in order to reinforce the belief that they personally control their own success [4].
Section (2) discusses the constructivism and instructivism interpretations of instructional technology and the implications of these interpretations on instructional design. The different approaches to instructional design are introduced in section (3). Section (4) introduces learner mental models and explains the different formats of CBID. Section (5) focuses on the relationship between instructional design and instructional development. Section (6) presents some instructional design recommendations based on constructivism. Section (7) gives some conclusions. The references used are listed at the end of the paper.

2. Instructivism/Constructivism Interpretations of Instructional Technology
There are two dominant and divergent interpretations of instructional technology that affect instructional design, and both assign a significant role to the use of computers in learning and education. The first view is closely aligned with instructional systems development (ISD) and treats instructional applications of computers like conventional applications of other educational media. The second view is based on constructivism and considers the computer a rich source of cognitive tools for learners. A learner's prior knowledge, abilities, needs, and interests have a major influence on how the instruction is designed. Most of the major instructional decisions, such as how content is selected, sequenced, structured, and presented, are usually made on behalf of the learner.
who learn according to the first scenario can pass tests, but may actually revert to their personal view, or theory, of the world when confronted with novel physics problems to solve (Eylon & Linn, 1988). Constructivists believe that learning is enhanced in environments that provide a rich and varied source of engaging experiences [8]. Thus, learning is not about acquiring new knowledge but about constantly reconstructing what one already knows [9]; a new structure is formed when new information no longer matches the available structures. Central to constructivism is the assumption that to know is to continually reconstruct [10].
2.1. Instructivism Interpretation The part "-ism" is used in the term "instructivism" to describe the first interpretation of instructional technology [5]. Instructivist models characterize learning as a progression of stages starting at the novice or beginner level in a particular domain and ending at the point where the learner becomes an expert. Programmed instructional methods were based on and implemented using computers and other media to help the learners go forward and undertake the different steps to the purposed level of instruction. In this process, all instructivists make two assumptions: 1- One instructional purpose is to help the learner understand the "real world." 2- The teachers' and educators' authority and responsibility is to decide what should be taught and how it should be taught for such purpose.
3. Instructional Design Approaches There are essentially two main categories of instructional design. These are the techno-centric and non-techno-centric approaches.
2.2. Constructivism Interpretation 3.1. Techno-centric approach Constructivists define instructional technology as the generation of computer-based tools that provide rich and engaging environments for learners to explore. These environments are referred to by constructivists as microworlds because they allow learners to participate in a set of ideas until they begin to "live" the ideas, not just study them [6]. Constructivism believes that each of us defines the world according to what we know and believe [7]. So, instead of suggesting that knowledge can be transferred from one person to another, individuals use information from the environment as building blocks to construct their knowledge according to their environment or culture in a never static, but dynamic and ever-changing status.
This approach is very common in materials-centered instruction across all media, including computers. It is right to instructional designers confronted with tough design decisions. The techno-centric means that a certain technology, rather than the learner and instruction, is put at the center of the design process. Thus, all subsequent design decisions are based on the technology which is usually equated with the products of instruction. Although it is hard at first glance to find fault with this approach, there are many dangers inherent in it. Designers and consumers of educational computing unconsciously fall into techno-centric traps and believe that good instruction should incorporate all of the machine's capabilities. They encourage the use of all special features, instead of questioning whether such features are relevant to the lesson goals or not [11]. Hence, a techno-centric designer may criticize CBI, that contains no graphics or animation.
2.3. Implications on Instructional Design Consider Newton's first law that states that an object at rest remains at rest and one in motion remains in motion unless acted on by some outside force. Compare two different instructional designs for teaching this principle. First, consider a physics class where a teacher lectures about the principle to a room full of students sitting attentively in their chairs, followed by a series of homework problems from the textbook. Next, consider a second classroom where the teacher has each student build and test a series of ramps with a variety of objects (in order to test different levels of friction). The first scenario has students interacting with information selected and interpreted by someone else. In the second scenario, students begin by interacting with the principle itself and the teacher's job is to facilitate, manage, or at times, guide, the students' interactions. Research indicates that physics students
3.2. Non-Techno-centric approaches 1. The empirical approach is based on trial and error. The designer begins with a believed best idea, with little or no supporting rationale, tries it out, carefully observes what works and what doesn't work, and then makes adjustments for the next attempt. Few improvements come from each trial and the design is slowly and progressively shaped to achieve the desired goal. This approach is used by content experts given teaching or training responsibilities for the first time. 2. The artistic approach looks at instruction as an art or craft that takes years to perfect and thus,
TS- 3F dramatic shift or a true design alternative. This depends on the current view of instructional design. Given the complexity of the design process, designers are necessarily making decisions with either incomplete information or no information. So, much design is necessarily based on conjecture. On the other hand, it would be a mistake to believe that designers who practice rapid prototyping are actually winging it. Rapid prototyping assumes the need for: (1) an understanding of learning theory, (2) instructional theory, (3) instructional practice, (4) profiles of the intended learners, (5) an understanding of the content area, and (6) consideration of contextual constraints such as the physical learning environment, motivational factors, and availability/limitations of resources, and so on. Design with rapid prototyping begins with intelligent and informed first-generation prototypes, which may or may not develop into a final product. Critical principles and assumptions for rapid prototyping may include: 1. The design and development may be considered as a one process although they are separated into individual activities and tasks to simplify and manage the process. Rapid prototyping offers the flexibility to test radically different designs and approaches, including competing hypotheses. 2. The medium within which the designer is working must possess modularity and plasticity in order for rapid prototyping to be practical [15]. Computer tools are usually able to satisfy both attributes. Modularity means the ability to add, delete, or rearrange entire sections of the instruction quickly and easily. Plasticity means the ability to make modifications to the existing prototype materials with minor time and effort.
teachers often begin to build up their teaching strategies based on years of experience. Master teachers may be seen as those who are able to suggest instructional plans that everyone trusts. They may be excellent models and perhaps good mentors to novices but are largely unable to explain what precisely they do or how they do it. 3. The analytic approach uses a systematic and systemic plan to guide the decision-making process of instructional design. It is known as the systems approach to instructional design on which scores of models are based [12]. It incorporates the empirical and artistic, but controls for their weaknesses and potential excesses [13]. It needs regular and systematic field testing of materials. After instructional decisions have been made and instructional strategies and tasks are determined, media that offer appropriate attributes to deliver them to the students are decided.
4. Instructional Design/Development 4.1. Traditional ISD A characteristic feature of traditional ISD is that design and development remain separate processes, though each is expected to provide critical feedback to the other. Traditional ISD assumes and expects that designers and developers will not necessarily be the same people. This allows for effective management and communication of many people working together with limited resources. Traditional views of instructional design at the lesson level go as follows: 1- Lesson objectives are identified. 2- A lesson plan or prescription is drafted and revised (the design phase). 3- A lesson script is written (the development phase). 4- The first drafts of the materials are produced. 5- A formative and summative evaluation is done to improve the materials through a series of field tests (formative) and to provide information on the effectiveness of the materials without using the information for revision (summative) [14]. Costs of producing instructional materials depend on the medium chosen. Production can not be repeated until the instruction is appropriate. Thus, designers use materials' prototypes in the formative evaluation.
5. FORMATS OF CBID CBID has three formats: the microworl, simulation, and computer games. Selection from these formats depends on the learner's mental model which is simply an individual's conceptualization, or theory, of a specific domain or system. Mental models are loosely organized and forever changing as new interactions with the environment suggest adaptations. The application of mental models to the instructional design involves: (1) the target system that a learner is trying to understand, (2) the user's mental model which describes the personal understanding of the target system, and (3) building a conceptual model which may be designed and presented to the users to help them more accurately understand a system [16]. Conceptual models are usually invented by teachers, designers, or engineers [17].
4.2. CBID: Rapid Prototyping Rapid prototyping is based on the idea that design, development, and implementation can never truly be separated and distinguished from one another. For some, it may represent extensions of formative evaluation of traditional ISD; for others, it represents a
TS- 3F characteristics can heavily overlap, each of them can remain mutually exclusive depending on their design and use in a learning environment. A simulation is an attempt to mimic a real or imaginary environment or system while the microworld is designed according to the learner and learning situations. Simulations have a long history in education. Virtual reality is the most recent sibling to simulations and it uses the most sophisticated visualization techniques available [20] and transports users from one reality to another so that what seems to be present really is not. Simulations provide a means of studying a particular system and they teach about the system [22]. They are based on a set of rules or models [21] and offer the advantage of providing the feedback to the student in real-time since the mathematical model of the system is programmed onto the computer which can speed up or slow down the process, that is useful for feedback. Simulations: 1- are distinguished based on their interactivity [23]. 2- allow users to choose or set the value of variables and then watch the effects of their choices. 3- are commonly visual to provide greater similarity between the simulation and the actual system [24]. 4- refer to the degree of realism as its fidelity where the relationship between learning and a simulation's fidelity is nonlinear and depends on the instructional level of the student (e.g., too much realism may cause more harm than good, especially for inexperienced students) [24]. 5- would not be considered microworlds because they are designed to represent many of the variables and factors of the real experience. 6- start to become microworlds when designed to let a novice begin to understand the underlying model. 7- may be easily obtained from microworlds (e.g., a mathematical microworld, that estimates distances using a computer program to move the turtle from a point to another on the screen, becomes a whale search simulation, by changing the turtle into an animated boat and the screen target into a whale).
5.1. Microworlds: The basic format of CBID Constructivism indicates that knowledge is personally constructed as a result of cognitive conflicts with the environment. The term "Microworld" is used to describe placing learners in contact with the learning environments [6]. This refers to planning the learning situations which suit the learners to construct their knowledge. Personal discovery and exploration are essential for learning in microworlds [6] which present learners with experiences within specific domain boundaries and allow learners to establish preliminary ideas and then transform them into more sophisticated aspects of the domain. A microworld, may: 1- be a small, but complete subset of a domain. 2- be the simplest model of a domain that is recognizable by an expert of the domain. 3- provide an immediate "doorway" for novices to access a domain through experiential learning. 4- provide general, useful, and synthetic learning experiences. 5- provide learners with "objects to think with." 6- promote problem solving through "debugging." 7- share characteristics of an interactive "conceptual model." 8- be a small but complete subset of reality to which one can go to learn about a specific domain. Constructivism computer applications (microworlds) (such as LOGO, authorware, .. etc) reflect and promote Piagetian learning and let learners explore many areas of knowledge [18] and may have the characteristics: 1- The program is a procedural language in which a large problem can be broken down into more manageable chunks from top to down. 2- The turtle geometry is a main tool in the program. 3- Students can increase the turtle's vocabulary by creating new commands, or procedures that consist of the language primitives and other procedures. 4- Turtle is an aid to debug, identify and correct errors within the used computer program. 5- Turtle's animated graphics provide instantaneous graphic feedback as a powerful learning strategy. 6- Microworlds should be simple, general, useful, and synchronic [19]. 6- Learner control is essential in successful microworlds [2] to encourage learning conflicts to activate the process of equilibration. 7- Microworlds offer learners an opportunity to exercise a cognitive or intellectual skill that they would be unable or unlikely to do on their own.
5.3. Computer Games Related to Microworlds and Simulations Play serves several cognitive functions as well as entertaining. Through play, one practices a set of information over and over in a variety of contexts until the individual is completely comfortable and familiar with it. Play is valuable for people of all ages. Gaming also offers many similarities to microworlds and simulations, though it can remain totally distinctive. The value of games is that they are fun. Of course, fun is an extremely abstract concept. One common characteristic of most games is competition, in the form
5.2. Simulations Related to Microworlds Some instructional designers may be confused between the microworld and simulation. While their
TS- 3F 2- establish a pattern for learners to goes from "the known to the unknown" and links new ideas to what they already know. 3- emphasize the usefulness of errors which are (1) an important type of feedback, (2) essential for learning and obtaining instructive information for problem solving [29], (3) can be handled in a systematic process usually referred to as debugging, (4) required for concept formation, (5) can be used within a microworld to help a learner to be successful in hypothesis forming and testing. 4- provide a balance between deductive and inductive learning where deductive learning is easier to apply for low level learning outcomes, (e.g., fact learning), and inductive learning is based on a "sink-or-swim" philosophy where learners may become frustrated or bored if they are unsuccessful or disinterested [30]. 5- anticipate and nurture incidental learning where: (1) constructivists recognize that learning rarely follows a fixed sequence for all learners, (2) Instructivism approaches, on the other hand, try to take a group of learners through an instructional sequence designed to meet predetermined learning objectives, (3) carefully designed microworlds help to balance the risks and incentives associated with both intentional and incidental learning, (4) the teacher role becomes critical because of the need to channel incidental learning to the lesson objectives or to revise lesson objectives to accept unexpected learning outcomes.
of learner against learner, learner against computer, or learner against self [25]. There are many negative aspects to competition, especially the learner versus learner ones. Students who constantly lose may become completely turned off to learning. Also, positive aspects of competition can be emphasized by challenge. Malone has proposed a framework to intrinsically motivate instruction by the interplay of challenge, curiosity, and fantasy [26]. Challenge and curiosity are closely related, and must be optimally maintained to be effective. They often result when tasks are novel, moderately complex, or they produce uncertain outcomes. Challenging tasks completion enhances feelings of confidence and competence [27]. To optimize educational game's challenge and curiosity [26]: 1- Design every game with a clear and simple goal. 2- Design games with uncertain outcomes. 3- Structure the game for players to increase/decrease the difficulty to match their skill and interest. 4- Design the game with layers of complexity and a broad range of possible challenges. 5- Provide some measure of success (e.g., scorekeeping features), for players to know how they are doing. 6- Clearly display feedback about a player's performance to make it readily interpretable. 7- Provide players with some level of choice. Fantasy is important where it encourages students to complete the activity in a context that is not really present. Intrinsic and extrinsic fantasies are described by Malone as two main kinds of computer game's fantasies [26]. Extrinsic fantasies overlay some general game context on an existing curriculum area. They can re-use the same game design with any content area. The students put up with the skill or information given in an extrinsic fantasy. Intrinsic fantasies mix the game and skill being learned [28] and allow the students to participate in the instructional skill of the game and combine the characteristics of microworlds & simulations (i.e., challenge, curiosity, and fantasy).
6.2. Graphics Design Recommendations These recommendations are based on the features of instructional applications with static/animated graphics. 1- The graphics intents and outcomes are considered and evaluated throughout the instructional design. 2- The computer visual (representational, analogical, arbitrary) and its instructional function (cosmetic, motivation, practice, presentation) are selected and used based on the learner needs, lesson content, task nature, and delivery system. 3- Graphics should not distract learner's attention. 4- Graphics are designed and used only whenever it is appropriate to serve the intended purposes. 5- Graphics are used in the design of instructional materials to increase motivation and interest. 6- Graphics and transition screens are planned and used to help users understand the information flow. 7- Graphics are used to present meaningful contexts for learning and to increase the intrinsic motivation and fantasy stimulation of the learner. 8- Attention-gaining graphics are designed to pull learners to the instruction and tasks by providing them with direct and clear directions to actively search for or use specific information in the visual.
6. Recommendations for CBID The following guidelines (1) have both instructivist constructivist influences, (2) are a compromise between these approaches, (3) offer a means to understand and incorporate constructivist goals in instruction [5].
6.1. Format Design Recommendations Designed interactive learning environments should: 1- provide a meaningful learning context that supports motivating and self-regulated learning that is relevant to the learner interests and life [28].
TS- 3F [10] Forman, G., & Pufall, P. (1988). Constructivism in the computer age: A reconstructive epilogue. In G. Forman & P. Pufall (Eds.), Constructivism in the computer age (pp. 235-250). Hillsdale, NJ: Lawrence Erlbaum Associates. [11] Ragan, T. J. (1989). Educational technology in the schools: The technoromantic movement. Ed. Tech., 29(11), 32-34. [12] Gustafson, K., & Powell, G. (1991). Survey of instructional models with an annotated ERIC bibliography (2nd ed.). (ERIC Document Reproduction Service No. ED 335 027) [13] Reigeluth, C. (1983). Instructional design: What is it and why is it? In C. Reigeluth (Ed.), Instructional-design theories and models: An overview of their current status (pp. 3-36). Hillsdale, NJ: Lawrence Erlbaum Associates. [14] Gagné, R. M., Briggs, L., & Wager, W. (1992). Principles of instructional design (4th ed.). New York: Holt, Rinehart, & Winston. [15] Tripp, S., & Bichelmeyer, B. (1990). Rapid prototyping: An alternative instructional design strategy. Ed. Tech. Research and Development, 38(1), 31-44. [16] Norman, D. (1988). The psychology of everyday things. New York: Basic Books. [17] Mayer, R. E. (1989). Models for understanding. Review of Educational Research, 59, 43-64. [18] Lockard, J., Abrams, P., & Many, W. (1990). Microcomputers for educators (2nd ed.). Glenview, IL: Scott, Foresman/Little, Brown Higher Education. [19] Papert, S. (1980). Mindstorms: Children, computers, and powerful ideas. New York: Basic Books. [20] Rheingold, H. (1991). Virtual reality. New York: Summit Books. [21] Willis, J., Hovey, L., & Hovey, K. (1987). Computer simulations: A source book to learning in an electronic environment. New York: Garland Publishing. [22] Reigeluth, C., & Schwartz, E. (1989). An instructional theory for the design of computer-based simulation. Journal of Computer-Based Instruction, 16(1), 1-10. [23] Alessi, S. M., & Trollip, S. R. (1985). Computer-based instruction: Methods and development. Englewood Cliffs, NJ: Prentice-Hall. [24] Alessi, S. M. (1988). Fidelity in the design of instructional simulations. J. of Computer-Based Instruction, 15(2), 40-47. [25] Hannafin, M., & Peck, K. (1988). The design, development, and evaluation of instructional software. New York: MacMillan. [26] Malone, T. (1981). Toward a theory of intrinsically motivating instruction. Cognitive Science, 4, 333-369. [27] Weiner, B. (1979). A theory of motivation for some classroom experiences. J. of Ed. Psychology, 71(1), 3-25. [28] Cognition and Technology Group at Vanderbilt (1990). Anchored instruction and its relationship to situated cognition. Educational Researcher, 19(6), 10. [29] Schimmel, B. (1988). Providing meaningful feedback in courseware. In D. Jonassen (Ed.), Instructional designs for microcomputer courseware (pp. 183-195). Hillsdale, NJ: Lawrence Erlbaum Associates. [30] Seaman, D. F., & Fellenz, R. A. (1989). Effective strategies for teaching adults. Columbus, OH: Merrill.
9- Transition screens between lesson parts can help to: provide markers of lesson's completion, break up a series of presentations, and attract attention. 10- Graphics are congruent and relevant to certain text to convey only intended information.
7. Conclusions Constructivism and instructivism terminologies were introduced and their interpretations of instructional technology were explained. The implications of these interpretations on instructional design were also discussed. Different approaches of instructional design were presented. The computer based instructional software (microworlds, simulations, and computer games) were discussed as different formats of CBID. Selection between these formats depends on the used mental model of the learner. The learner mental models were introduced and the different formats of CBID were explained. The relationship between instructional design and instructional development was exposed based on the point of view of traditional instructional system development and the point of view of CBID. Some instructional design recommendations were highlighted based on constructivism and instructivism.
References [1] Jonassen, D. (1988). Integrating learning strategies into courseware to facilitate deeper processing. In D. Jonassen (Ed.), Instructional designs for microcomputer courseware (pp. 151-181). Hillsdale, NJ: Lawrence Erlbaum Associates. [2] Steinberg, E. R. (1989). Cognition and learner control: A literature review, 1977-1988. J. of Computer-Based Instruction, 16, 117-121. [3] Kinzie, M. B., Sullivan, H. J., & Berdel, R. L. (1988). Learner control and achievement in science computerassisted instruction. J. of Ed. Psychology, 80(3), 299-303. [4] Milheim, W., & Martin, B. (1991). Theoretical bases for the use of learner control: Three different perspectives. J. of Computer-Based Instruction, 18(3), 99-105. [5] Rieber, L. P. (1992). Computer-based microworlds: A bridge between constructivism and direct instruction. Ed. Tech. Research and Development, 40(1), 93-106. [6] Dede, C. (1987). Empowering environments, hypermedia and microworlds. The Computing Teacher, 15(3), 20-24, 61. [7] Goodman, N. (1984). Of mind and other matters. Cambridge, MA: Harvard University Press. [8] Papert, S. (1988). The conservation of Piaget: The computer as grist to the constructivist mill. In G. Forman & P. Pufall (Eds.), Constructivism in the computer age (pp. 313). Hillsdale, NJ: Lawrence Erlbaum Associates. [9] Fosnot, C. T. (1989). Enquiring teachers, enquiring learners: A constructivist approach for teaching. New York: Teacher's College Press.
The Role of Computer Based Instructional Media and Computer Graphics in Instructional Delivery Kamel H. A. R. Rahouma
Inas M. El-H. Mandour
Assoc. Professor of Computer Science Technical College in Riyadh Riyadh, Kingdom of Saudi Arabia E-mail:
[email protected]
Assistant Professor of Educational Technology Faculty of Education for Women in Qwayiyah Qwayiyah, Kingdom of Saudi Arabia E-mail:
[email protected]
Abstract
This paper presents the role of computer based instructional media and computer graphics in instructional delivery. Issues concerning computers and instructional multimedia, hypermedia, and the different levels of interactive video are presented. The role of computer instructional graphics is discussed. A classification of instructional graphics and some design issues of instructional graphics are introduced. Recommendations are given for computer graphics and instructional delivery.

Keywords: Computer based Instruction (CBI), Computer Graphics, Interactive Video, Rapid Prototyping, Microworlds

1. Introduction
Highly interactive learning environments that computers make possible represent the major reason for investing in computer technology [1]. Computers undoubtedly offer power to designers, developers, and instructors, and the range, power, and number of computer tools for desktop computers are increasing. The terms computer-assisted instruction (CAI) and computer-based instruction (CBI) overlap. However, CBI may refer to instructional systems that are completely computer-based, such that instructional delivery, testing, remediation, etc., are presented and managed by the computer, whereas CAI may refer to using computers to support a larger instructional system, such as a traditional classroom [2]. Multimedia, the closely related area of interactive video, and the surrounding "hyper-" environments, such as hypertext and hypermedia, are the main tools of both CBI and CAI. These tools depend heavily on their visual effects. Computer visuals refer to all possible computer output, including text and graphics. Instructional computer graphics are a subset of computer visuals and involve the display of nonverbal information, or information that is conveyed spatially. The role of graphics in computer environments covers a lot of ground; computers offer many more instructional applications than just presentation. Computer microworlds are one of the most exciting areas and are based on computer animation [3-5]. This paper presents the relationship between instructional environments, instructional media and computer graphical capabilities. The role of computers in instruction is presented in section (2). Section (3) presents the relationship between computers and the different instructional media formats. A classification of instructional graphics is introduced in section (4). Some design issues are discussed in section (5) and some design recommendations are given in section (6). Section (7) highlights some conclusions, and a list of the references used is given at the end of the paper.
2. The Role of Computers in Instruction Materials-centered instruction depends on media rather than the teacher, trainer, or workshop leader for the primary presentation of instruction. A lot of developments and improvements in the courseware production are gained with the use of microcomputers because of the easy replication of courseware and consequently decreasing the cost of replicating efforts which have two parts:
2.1. The product form A certain medium may be used to deliver certain instructional attributes or stimuli such as: 1- Audiocassettes deliver aural stimuli, such as the spoken word, music, and other sounds. 2- Overheads present static visuals, such as words and pictures. 3- Slide/tape projectors offer static visuals/sound. 4- Videos offer static/dynamic visuals/sound.
TS- 3F video. There are many hypertext tools such as Apple's HyperCard and IBM's Linkway and ToolBook [9]. Hypertext allows users to create their knowledge representations in a particular domain. Hypermedia allow users to explore informational systems by selecting and sequencing their own paths in a domain. Hypertext environments conform a propositional networks which involve an ever-transforming network of nodes and links. Nodes represent one of many different kinds of informational units, and links represent how the nodes are related or associated. Hypermedia derives much more from the links than the nodes. Hypermedia and CBI have opposite design directions where in CBI, authors and designers make decisions on how to relate information in advance [10]. Without guidance, novices may have a difficult time exploring a hypertext environment and often become disoriented [11,12]. Hypertext environments are probably not good instructional systems for introducing novices to an area, but may be good for users to subsequently organize and integrate information. Hypermedia environments have much potential in education because of the need for both structured and unstructured learning environments in training and education. With hypermedia a balance should exist between deductive and inductive learning strategies in the context of simulations, games, and microworlds [8].
5- Static/dynamic animation in linear/nonlinear formats with high-quality sound has become easy. Linking the computer to other media, such as videodisc players, may result examples of multimedia with unlimited number of instructional attributes while using computers, alone, does a poor job of delivering attributes such as responding and reacting to a student's puzzled or frustrated look, or encouraging a student when his/her self-esteem needs a boost. Of course, this has affected the delivered courses and the delivery method. Also, the teacher role has completely changed from teaching to helping.
2.2. Processing the product In traditional instructional design, the design and development phases are completely separated. So, the design is completed and then the product is developed. The final product does not appear unless its final form is decided. This requires the design to go through a test/modification process several times till the decision is established and this is costly and time consuming. With computers, design and development phases are integrated in one process through the rapid prototyping. Thus, the test/modification process is shortened and rapidly the final product form is established [6].
3. Computers and Instructional Media 3.3. Multimedia and Interactive Video 3.1. Multimedia Interactive video resembles the current multimedia interpretations. It is the marriage of video and computer technologies. It may be a computer-managed video or as CBI with a video component. It increases the learner's control over the video material [2]. There are many taxonomies of interactive video levels [13-15]. The following one is mostly hardware-based and it shows the interactive opportunities available through different hardware configurations. These level include: 1. Levels 0 and 1 include only the video technology with no computer. Level 0 includes linear video presentations or broadcast television where no interactivity between video materials and students who can not interrupt the video presentation once it begins. Level 1 includes manual interruptions of a stand-alone videodisc or videotape that is usually stop/start by either the teacher or student. Level 1 also includes manual branching and searching of segments by the teacher or student through the manual controls of the video unit. 2. Level 2 merges between computer and video technologies. A videodisc player, aided with an onboard microprocessor, runs a program for conditional branching based on a student's input to the videodisc's keypad.
Multimedia may refer to any situation in which a variety of educational media are used. It also refers to instructional delivery systems that include two or more media components, such as print-based, computerbased, and video-based. Thus, a traditional instructional setting combining lecture with a slide/tape presentation is considered multimedia. Multimedia may deliver a wide range of visual and verbal stimuli, through or in tandem with computer-based technologies although the computer is not necessarily a prerequisite. Interactive multimedia may be "a collection of computer-centered technologies that give the capability to access and manipulate a variety of media" [7]. Designers may be afraid of falling into the "technocentric design" traps where they become interested in incorporating technology in instruction either it is necessary or not [8]. Also, software designers may not be able to have direct influence on hardware advances.
3.2. Multimedia, Hypertext, and Hypermedia The term "hyper" translates simply as "link" and has been extended to include hypermedia environments, or systems that link various media, such as computer and
3. Level 3 is the level at which the video player is interfaced to a separate computer for mainstream interactive video or multimedia applications. Most level 3 systems have one monitor for the computer and one for the video, one monitor to be switched between the video and computer in case of some expensive systems. A computer monitor may include video "windows," in which a small portion of the monitor displays video material. 4. Level 4 defines the last level of interactive video (some taxonomies include several more levels). Its systems include a creative assortment of hardware, (e.g., multiple video units, sound synthesizers, voice recognition, touch screens, etc.). New technologies (e.g., virtual reality, interactive) give video a completely new definition. The taxonomy makes it easy for newcomers to understand the system configurations but unfortunately it promotes techno-centric design [8]. Some people consider interactive video and multimedia as unique technologies but other people argue that these technologies may be appropriate for learning when based on careful analysis of learner and instructional design attributes. However, Video has a dramatic and immediate ability to elicit an emotional response from an individual and this can provide a strong motivational incentive to choose and persist in a task [16]. Many examples use video technology such as social and language applications [17] and other applications that have a more constructivistic flavor, where learners build their own interactive video materials to learn about science and social studies [18]. Instructional design of these systems should be based on a careful analysis of many interrelated and interdependent elements including psychological foundations of individuals, especially those related to visual learning and instructional design of interactive learning systems.
They range from abstract to highly realistic. While, the line drawing is an example of the abstract representational visual, photographs and richly detailed colored drawings are the most common examples of realistic representational visuals [19]. However, simple line drawings are often considered better learning aids than realistic visuals, especially when the lesson is externally paced, such as in films and video. 2. Analogical Graphics are better for situations when students have absolutely no prior knowledge of the concept [20]. An analogy can act as a familiar "building block" on which a new concept is constructed [21]. If the student does not understand the content of the analogy, then its use is meaningless and confusing. Worse yet, students may form misconceptions from an inadequate understanding of how the analogy and target system are alike and different [22]. Graphics can help learner's see the necessary associations between parts of the analogy. 3. Arbitrary Graphics offer visual clues. They do not share any physical resemblances to the concept being explained, but yet contains visual or spatial characteristics that convey meaning. Examples may include outlines, to flowcharts, bar charts, and line graphs. Graphs also logically represent information along one or more dimensions, but the main purpose of graphs is to show relationships among the variables in the graph.
4.2. Production-Based Classification 1.
4. Instructional Graphic Classification Animated sequences, such as film and video, can be presented by many media. The computer is rapidly becoming an important tool in modern animation studios, such as Disney, where computer is used as an important production tool for creating animation sequences that are normally transferred to film or video for delivery. There are two classifications of visual types commonly used in instruction. The first one is based on the application of graphics and the second classification is based on the production of graphics.
4.1. Application-Based Classification There are three types of graphics classified as [6]: 1. Representational Graphics share a physical similarity with the object they are representing.
There are 3 ways to produce computer graphics [6]: The command-based approaches involve algorithmic processes for defining a graphic such as writing a programming code, using special graphics commands particular to the programming language (e.g., PASCAL, C, BASIC, LOGO). Such approaches have the following features: a- They are based on a Turtle Geometry (TG) or a Cartesian Coordinate System (CCS). b- In the CCS, a set of Cartesian coordinates consists of two numbers, (x,y), denoting the horizontal and vertical axes. Computers usually use the positive/positive quadrant of the screen where the graphics is really just a matter of playing "connect the dots" through a variety of programming commands which have very similar functions across languages. c- Turtle graphic is one of many "microworlds" that was initially developed for use with the LOGO programming language and founded on learning principles associated with the process of creating the graphic to find a way
computer every time it is retrieved from disk to random-access memory (RAM). d- The pict file formats are created to allow a graphic to be stored and imported into a wide range of applications, and they can store both bit-mapped graphics and object-oriented drawings. Most major brand-name graphics software packages can both read and save pict files, allowing for easy swapping of graphics from one application to another. Many word processors, desktop publishing/presentation packages, and authoring packages are only able to import pict files.
for people of varying ages and abilities to think and communicate about geometry without using complex Cartesian methods. d- The TG is based on differential or "intrinsic" mathematics whereas the CCS define graphics on the basis of fixed, absolute points. f- Turtle graphics are drawn by commands that are relative to each and in relation to the turtle position, heading immediately before the command's execution. For instance, the command FORWARD 50 will draw a line in whatever direction the turtle is pointing. 2. The GUI-based approach is based on the graphical user interface which involves graphic tools such as "pencils," "brushes," "fill buckets," "box makers," etc. GUI-based approaches most commonly use input devices such as a mouse, light pens, and graphic tablets, although some of the earliest approaches used the keyboard. Some authoring environments (e.g., HyperCard, Authorware), offer a combination of commandbased and GUI-based approaches. GUI-based graphic packages may be painting packages or drawing packages. Graphics of painting packages can only be saved as a bit map while the graphics of drawing packages may be composed of one or more objects, each of which can be edited and modified. Each package has its menus of functions, tools, features, and effects to control everything on the screen such as grouping/ungrouping of objects, object arrangement, alignment, rotation, layering, line smoothing, and graphing. 3. The Second-Hand Graphics were not created (drawn) by the user from the scratch. There are four alternatives of these graphics. The user can: a- take a graphic and convert it to a digital form. b- find/borrow a graphic drawn by someone else c- copy a graphic from a clip art or scan/digitize a picture or analog images.
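As a toy illustration of the contrast drawn in item 1 above between absolute Cartesian commands and relative turtle-style commands (our own Java sketch; the paper's examples use LOGO, and none of these class names come from the paper), a square can be described either as four fixed corner points or as repeated FORWARD/RIGHT moves:

import java.util.ArrayList;
import java.util.List;

// Relative, turtle-style drawing: each command depends only on the turtle's
// current position and heading, never on absolute screen coordinates.
class Turtle {
    double x = 0, y = 0, heading = 0;                    // heading in degrees, 0 = pointing right
    final List<double[]> segments = new ArrayList<>();   // each segment: {x1, y1, x2, y2}

    void forward(double distance) {                      // draw relative to the current heading
        double nx = x + distance * Math.cos(Math.toRadians(heading));
        double ny = y + distance * Math.sin(Math.toRadians(heading));
        segments.add(new double[] {x, y, nx, ny});
        x = nx;
        y = ny;
    }

    void right(double degrees) {                         // turn in place without drawing
        heading -= degrees;
    }

    public static void main(String[] args) {
        // Cartesian style: a square given as four absolute corner points to "connect".
        double[][] corners = { {0, 0}, {50, 0}, {50, 50}, {0, 50} };

        // Turtle style: the same square as relative commands, like LOGO's FORWARD 50 / RIGHT 90.
        Turtle t = new Turtle();
        for (int i = 0; i < 4; i++) {
            t.forward(50);
            t.right(90);
        }
        System.out.println(corners.length + " absolute points vs " + t.segments.size() + " relative segments");
    }
}

The relative version never mentions screen coordinates; each segment is defined only with respect to the turtle's position and heading immediately before the command executes.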
4.4. Animation-Based Classification Computer graphics may be: (1) non-animated (static graphics) (e.g., photographs, still and fixed pictures, drawings, clip arts, .. etc) or (2) animated (e.g., moving pictures and video clips). Animation is an illusion that tricks a person to induce a perception of a moving object on the computer screen. This involves: (1) creating a series of carefully timed "draw, erase, move, draw" sequences, (2) sequencing these pictures as frames according to the events of the animation and presenting them at a rate of 16 frames or more per second. Like fixed graphics, animated graphics can be produced by a command-based or GUI-based approach. There are two animation designs: 1- Fixed-path animation which is analogous to choreographing a movie sequence. The same exact animation is supposed to happen the same way, in the same place, at the same time, each and every time the sequence is executed. 2- Data-driven animation in which motion and direction of screen objects do not vary according to the actual movement of the human hand, but by some data source. Although fixed-path animation can be created by a data source and then "captured" or "recorded," the data in data-driven animation are defined as that generated by the student during the instructional sequence.
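As a rough sketch of the fixed-path case (a hypothetical Java/Swing example of ours, not taken from the paper), the timed draw-erase-move cycle at roughly 16 frames per second could be written as:

import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.Timer;
import java.awt.Graphics;

// Minimal fixed-path animation: the same "draw, erase, move, draw" cycle
// runs on a timer, producing the illusion of a moving object.
class FixedPathAnimation extends JPanel {
    private int x = 0;                              // current position on the fixed path

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);                    // "erase" the previous frame
        g.fillOval(x, 40, 20, 20);                  // "draw" the object at its new position
    }

    public static void main(String[] args) {
        JFrame frame = new JFrame("Fixed-path animation");
        FixedPathAnimation panel = new FixedPathAnimation();
        frame.add(panel);
        frame.setSize(320, 120);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);

        // About 16 frames per second: "move" a little along the predetermined path, then redraw.
        new Timer(1000 / 16, e -> {
            panel.x = (panel.x + 5) % 300;
            panel.repaint();
        }).start();
    }
}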
4.3. Format-Based Classification
5. Instructional Graphic Design Issues
The graphics obtained from these approaches can be stored in a variety of formats (bit-mapped paint files, drawing files, or pict "picture" files) on a computer disk (floppy, hard, optical, or compact). Common graphics formats include [6]: a- Bit-mapped files, or paint files. b- TIFF (Tagged Image File Format) files, which store bit-mapped graphics and include additional grayscale/color information. c- Drawing files, which, rather than storing the actual graphic as a bit map, store the visual attributes of the graphic as a group of mathematically defined objects, the graphic being simply redrawn by the
5.1. Instructional Graphic Design/Development Instructional design and development are two different processes [23]. Design proposes the instruction and describes its specifications, while development concerns the actual production of the design. Changes to the design are easy and inexpensive, while changes during development can be costly and time-consuming. Designers must therefore consider the resources, conditions, constraints, tradeoffs and compromises throughout the design and development phases.
6- Graphic colors should not interfere with learning; they are used to: a- gain attention effectively. b- show contrast, in order to direct or focus attention. c- show relationships between screen information. d- increase motivation, interest, and perseverance, while avoiding distraction effects. 7- There should be a reason for using each color. 8- Design elements other than color (i.e., text, shape, layout, fonts, etc.) should carry the bulk of the information and work in tandem with colors. 9- The material should be tried out under the same conditions in which the final product of the project will be used. 10- The potential of color should not be underestimated, and alternative color schemes may be considered.
However, computers merge design and development in rapid prototyping. This allows the design to be tested and modified without much cost in order to reach the final form of the instructional courseware product.
5.2. Types of Computer Graphics Display There are three structures of computer images, known as graphic primitives [24,25]: the picture element (pixel, a single point of light on the computer screen), the line, and the polygon. There are two major kinds of computer display systems that can produce pixels, lines, and polygons: raster graphics displays and vector graphics displays [26]. Each type uses a different hardware approach to control the scan rate, resolution, and pattern of the cathode-ray tube (CRT) that displays the graphic. More recently, liquid crystal displays (LCDs) have come into use. In any case, hardware and software should be compatible in order to provide a suitable environment for displaying instructional graphics.
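To make the raster/vector distinction concrete, here is a small sketch contrasting the two storage approaches; the primitive names, canvas size and helper function are illustrative only, not drawn from the references above.

    # A raster image stores every pixel; a vector drawing stores objects that are
    # redrawn (rasterised) on demand, so individual objects stay editable.
    WIDTH, HEIGHT = 16, 8

    raster = [[0] * WIDTH for _ in range(HEIGHT)]       # bit map: one value per pixel

    vector = [                                          # drawing: a list of editable objects
        {"type": "line", "x0": 1, "y0": 1, "x1": 14, "y1": 1},
        {"type": "line", "x0": 1, "y0": 6, "x1": 14, "y1": 6},
    ]

    def rasterise(objects, width, height):
        """Redraw the object list into a fresh bit map (here: horizontal lines only)."""
        grid = [[0] * width for _ in range(height)]
        for obj in objects:
            if obj["type"] == "line" and obj["y0"] == obj["y1"]:
                for x in range(obj["x0"], obj["x1"] + 1):
                    grid[obj["y0"]][x] = 1
        return grid

    # Editing the vector version means changing an object and redrawing;
    # editing the raster version means changing the affected pixels directly.
    vector[1]["y0"] = vector[1]["y1"] = 5
    raster = rasterise(vector, WIDTH, HEIGHT)
    print("\n".join("".join("#" if p else "." for p in row) for row in raster))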
6.2. Functional Design Principles 1- The intent/outcomes of a graphic should be considered and evaluated throughout the design process. 2- The type of visual (representational, analogical, arbitrary) and the instructional function it serves are selected based on learner needs, the content, the nature of the task, and the delivery system. All these variables must be consistent so as to increase motivation and interest without distracting the learners. 3- Graphics are designed carefully to serve their instructional functions and lesson parts. 4- Cosmetic graphics should be carefully planned early in the design process and used cautiously in the material design because: (a) they do not carry any instructional value, (b) they aim only to make materials more attractive, and (c) their extensive use may distract a learner's attention from the instructional message. 5- Transitional screens should be used in ways that help users understand the flow of information. 6- Graphics should present meaningful contexts. 7- Attention-gaining graphics should pull learners back to the instruction and the intended tasks. 8- Graphics must be relevant to the given information and convey the intended message. 9- Students should be cued to process the information contained in a graphic in some overt way. 10- Graphics are used only when they are needed.
5.3. Graphics in Practice Activities Graphics can be very useful in practice activities. They can act as instantaneous visual feedback to students as they interact with lesson ideas and concepts. This is particularly suited to the computer medium, for example in visually based simulations. Real-time animated graphics in interactive learning are known as "interactive dynamics"; they change continuously over time depending on student input, and students learn by discovery and testing. This is called "learning by doing" [27]. Here the role of graphics is merely to reinforce correct responses, such as displaying a happy face for right answers. The danger is that attractive and interesting graphics may actually reinforce wrong responses or other behaviors.
6. Instructional Graphics Principles 6.1. Graphic Software General Principles 1- The use of the software should be clear and straightforward, to help the end user understand, teach and learn about the content or domain easily. 2- Rapid prototyping techniques offer some of the best tools to design and test the screen format. 3- Information emphasis should be distributed across the screen on both cosmetic and informational grounds. 4- Procedural protocols should be followed in delivering the designed lessons [2]. 5- Feedback is an essential element and should be simple and direct, to help students interact with the software. The software should detect mistakes easily and help users remedy them.
6.3. Recommendations for using colors 1- Use one color to group related elements and emphasize the relations between them. 2- Use similar colors of different strengths to denote relationships between elements (e.g., chapters and sections). 3- Link color changes to dynamic events to portray elapsing time or other critical levels.
4- When using colors for coding information, use a maximum of five colors, plus or minus two. 5- Use bright and saturated colors only for special purposes, such as to draw attention and to show error messages, urgent commands, or key words. 6- Use the "temperature" of colors to indicate action levels or priorities. 7- Use logic in choosing meaningful color schemes (e.g., pink for a female child, blue for a male child). 8- Utilize the social meanings of colors, where colors carry meaning individually and in combination. 9- Use colors consistently across all aspects of the project. 10- Do not use highly saturated and spectrally extreme colors, because this causes visual fatigue and afterimages which painfully affect the eyes. 11- Use red and green for central colors, but not for background areas or for small peripheral elements. 12- Avoid adjacent colors which differ only in hue; they should differ in value as well as hue. 13- Consider the final viewing environment: use a dark background with light elements (text, etc.) for dark viewing conditions (slide presentations, etc.) and a light background with dark elements for light viewing conditions (paper, normal computer use). 14- Increase the overall brightness of the display and enhance color contrasts for older operators.

7. Conclusions
The paper aimed to present the role of computer-based instructional media and computer graphics in instructional delivery. Different issues about the design and use of the computer and its graphics capabilities were explored, and the roles of computers and computer graphics were discussed. Issues concerning computers and instructional multimedia, hypermedia, and different levels of interactive video were also presented. Classification, design principles, and recommendations for instructional graphics and their delivery were explained.

References
[1] Hannafin, M. (1992). Emerging technologies, ISD, and learning environments: Critical perspectives. Educational Technology Research and Development, 40(1), 49-63.
[2] Hannafin, M., & Peck, K. (1988). The design, development, and evaluation of instructional software. New York: MacMillan.
[3] Sekular, R., & Blake, R. (1985). Perception. New York: Alfred A. Knopf.
[4] Samuels, S. (1970). Effects of pictures on learning to read, comprehension, and attitudes. Review of Educational Research, 40, 397-407.
[5] Rieber, L. P. (1992). Computer-based microworlds: A bridge between constructivism and direct instruction. Educational Technology Research and Development, 40(1), 93-106.
[6] Rieber, L. P.: Computers, Graphics, & Learning. Available at: http://www.nowhereroad.com/cgl/toc2535.html
[7] Ambron, S., & Hooper, K. (Eds.). (1990). Learning with interactive multimedia: Developing and using multimedia tools in education. Redmond, WA: Microsoft Press.
[8] Rahouma, K. H., & Mandour, I. (2005). Constructivism and its implications for computer based instructional design. EISTA, 2005.
[9] Jonassen, D. (1991b). Hypertext as instructional design. Educational Technology Research and Development, 39(1), 83-92.
[10] Locatis, C., Letourneau, G., & Banvard, R. (1989). Hypermedia and instruction. Educational Technology Research and Development, 37(4), 65-77.
[11] Jonassen, D. (1988). Designing structured hypertext and structuring access to hypertext. Educational Technology, 13-16.
[12] Tripp, S., & Roby, W. (1990). Orientation and disorientation in a hypertext lexicon. Journal of Computer-Based Instruction, 17, 120-124.
[13] Parsloe, E. (1983). Interactive video. Wilmslow, Cheshire (UK): Sigma Technical Press.
[14] Daynes, R., & Butler, B. (1984). The videodisc book: A guide and directory. New York: John Wiley and Sons.
[15] Gayeski, D., & Williams, D. (1985). Interactive media. Englewood Cliffs, NJ: Prentice-Hall.
[16] Cognition and Technology Group at Vanderbilt (1990). Anchored instruction and its relationship to situated cognition. Educational Researcher, 19(6), 10.
[17] Reed, M. (1991). Videodiscs help American Indians learn English and study heritage. Technological Horizons in Education, 19(3), 96-97.
[18] Gerrish, J. (1991). New generation of "wiz kids" develop multimedia curriculum. Technological Horizons in Education, 19(3), 93-95.
[19] Pauline, R., & Hannafin, M. (1987). Interactive slide-sound instruction: Incorporating the power of the computer with high fidelity visual and aural images. Educational Technology, 27(6), 27-31.
[20] Curtis, R. V., & Reigeluth, C. M. (1984). The use of analogies in written text. Instructional Science, 13, 99-117.
[21] Tennyson, R. D., & Cocchiarella, M. J. (1986). An empirically based instructional design theory for teaching concepts. Review of Educational Research, 56(1), 40-71.
[22] Zook, K. B., & Di Vesta, F. J. (1991). Instructional analogies and conceptual misrepresentations. Journal of Educational Psychology, 83(2), 246-252.
[23] Reigeluth, C. (1983b). Instructional design: What is it and why is it? In C. Reigeluth (Ed.), Instructional-design theories and models: An overview of their current status (pp. 3-36). Hillsdale, NJ: Lawrence Erlbaum Associates.
[24] Artwick, B. (1985). Microcomputer displays, graphics, and animation. Englewood Cliffs, NJ: Prentice-Hall.
[25] Pokorny, C. K., & Gerald, C. F. (1989). Computer graphics: The principles behind the art and science. Irvine, CA: Franklin, Beedle & Associates.
[26] Conrac Corporation (1985). Raster graphics handbook. New York: Van Nostrand Reinhold.
[27] Brown, J. (1983). Learning-by-doing revisited for electronic learning environments. In M. A. White (Ed.), The future of electronic learning (pp. 13-32). Hillsdale, NJ: Lawrence Erlbaum Associates.
SESSION TS4A
TOPIC: INFORMATION SECURITY & CRYPTOGRAPHY
SESSION CHAIRMAN: Dr. Andrew Teoh
_________________________________________________________________________________________
Time      Paper No.   Paper Title                                                        Page No.
_________________________________________________________________________________________
2.00pm                Invited Talk: "Security Assurance Framework"
                      Pn. Rozana Rusli (CISSP), Head of Security Assurance,
                      National ICT Security and Emergency Response Centre (NISER)
2.30pm    TS4A-1      A Weight-Based Scheme For Secret Recovery With Personal Entropy      414
                      Rabiah Ahmad, Mohd Aizaini Maarof, Yu Leau Beng
                      (Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia)
2.50pm    TS4A-2      On a Strong Location Privacy and Forward Secrecy RFID                420
                      Challenge-Response Authentication Protocol
                      Shu Yun Lim, Hun-Wook Kim, Hoon-Jae Lee
                      (Dongseo University, Busan, Republic of Korea)
3.10pm    TS4A-3      Securing VPN Using IPSEC: A Case Study                               425
                      Sharipah Setapa, Noraida Kamaruddin, Gopakumar Kurup
                      (Cyberspace Security Lab, MIMOS BHD.)
3.30pm    TS4A-4      Implementation of Spam Detection on Regular and Image based          431
                      Emails - A Case Study using Spam Corpus
                      Valliapan Raman, Biju Issac
                      (Swinburne University of Technology (Sarawak Campus), Kuching, Sarawak, Malaysia)
_________________________________________________________________________________________
A WEIGHT-BASED SCHEME FOR SECRET RECOVERY WITH PERSONAL ENTROPY
Leau Yu Beng
[email protected]

Mohd. Aizaini Maarof
[email protected]

Rabiah Ahmad
[email protected]

ABSTRACT
Many encryption systems keep valuable and confidential information and usually require the user to log into an account with his or her secret password or passphrase for authentication purposes. This means that such systems require users to memorize high-entropy passwords or passphrases and reproduce them accurately. We propose a weight-based scheme to recover the secret with personal entropy while minimizing the workload of secret recovery. The proposed scheme assigns a weight to each entrenched-knowledge question. If the total weight of all the questions which the user has answered correctly is equal to or bigger than the ideal weight, then the secret recovery is successful. The proposed scheme uses the personal entropy-based method as its core concept, combining it with a key encryption scheme and a desired weight for each question. Future work will concentrate on developing a prototype for evaluation and justification, to analyze the performance and security of this proposed scheme.

KEYWORDS
Cryptography, Key Management and Recovery, Secret Sharing, Personal Entropy

1. INTRODUCTION
In general, most application systems which keep valuable and confidential information require users to log into an account with a secret password or passphrase for authentication purposes. The system's user must memorize high-entropy passwords or passphrases and reproduce them exactly. This leads users to select passwords with a poor security level, to keep them by insecure methods, or to forget the meaningless password. To solve this dilemma, Ellison and Schneier proposed in [4] replacing a single, long passphrase with multiple small ones, tied to life experiences, free association, etc. Unfortunately, it has been found that more work is required for the user to recover the secret: the more keystrokes, the less user satisfaction. Besides that, many choices of the parameters in this particular approach are insecure [6].

The proposed scheme addresses these problems by assigning a weight to each entrenched-knowledge question; as long as the total weight of all the questions which the user has answered correctly is equal to or bigger than the ideal weight, the secret recovery is successful. This prevents an attacker from singling out question-answers based on their entropy and brute-forcing them. The notions of the proposed scheme are as below:
i. A weight will be assigned to each question based on its category.
ii. A weight-based scheme for sharing and reconstructing the secret with personal entropy will be created.
iii. The proposed scheme provides higher security under three conditions: a proper set of questions must be chosen; the total weight of this proper set of questions must be equal to or bigger than the ideal weight (Wt >= Wi); and the given answer for each question must exactly match the corresponding answer given at registration.
iv. The proposed scheme is flexible enough to be used in different applications which require diverse security levels, by adjusting the information rate used in calculating the ideal weight.

1.1 Organization
We present some related work on secret key management, the most common secret sharing schemes, general techniques of secret key recovery and some weighted threshold functions in Section 2. In Section 3, we propose a Weight-based Secret Recovery Scheme, followed by its preliminary result in Section 4. Lastly, in Section 5, some future directions are discussed.

2. RELATED WORK
Secret key management is important to ensure the confidentiality of data. Loss of the secret key through human error or catastrophe is a common issue
today. Hence, key escrow plays a vital role in this context. According to Hal Abelson et al. in [10], "key escrow", also called "key recovery", generally refers to any system for assuring third-party access to encrypted data. The term was first introduced by the United States Government in a program called "Clipper" in April 1993. This mechanism required individuals to split their secret encryption keys into two pieces and hand them over to escrow agents, such as trusted third parties of the government's choosing. Shamir and Blakley later extended this idea by designing secret sharing schemes.

2.1 Secret Share Scheme
In [1], Shamir proposed a secret sharing scheme called the (t, n) threshold scheme, based on polynomial interpolation. His goal is to divide data D into n pieces D1, D2, ..., Dn in such a way that: (1) knowledge of any t or more pieces Di makes D easily computable; (2) knowledge of any t-1 or fewer pieces Di leaves D completely undetermined, in the sense that all its possible values are equally likely. On the other hand, Blakley [9] proposed a scheme based on a probabilistic approach over linear projective geometry on a finite field. The dealer picks a one-dimensional flat g and a (t-1)-dimensional flat H such that g and H intersect in a single point P. The secret is the first coordinate of P; g is then made public but H is kept secret. According to Chunming Tang and Zhuojun Liu in [12], Shamir's scheme is perfect but Blakley's scheme is not. Blakley's scheme is also less space-efficient, because its shares are t times larger (where t is the threshold number of participants), while in Shamir's scheme each share is the same size as the original secret, since the x-coordinates of the shares can be known to all the participants.

2.2 Techniques of Authentication
Basically, there are four techniques to authenticate the legitimate participants in the process of recovering the secret or secret key [5]: In-Person Identification, in which the participants physically present themselves to the secret key administrator; Faxed Documentation, which transmits a facsimile of some kind of official identification such as an ID card or passport; Encrypted Email Recovery, in which the participants provide a public key at registration time and their recovery emails are sent encrypted with that key; and, lastly, Question and Answer, which records the participant's personal information during account registration. Compared with passwords, personal-knowledge questions and answers are widely used because memorized passwords are forgotten whereas entrenched knowledge is not [14]. Furthermore, these questions can be very personal, which enhances their security level.

2.3 Secret Recovery Scheme
By applying the concept and technique of question and answer, Ellison et al. and Frykholm proposed secret recovery schemes based on error correction and fuzzy commitment, respectively. In 1999, Ellison et al. [4] proposed a method in which a user can protect a secret key using the personal entropy of his own life, by encrypting the passphrase using the answers to several entrenched-knowledge questions. The method is based on Shamir's secret sharing scheme [1], a (t, n)-threshold scheme in which a secret is distributed into n shares, of which at least t are required to reconstruct the secret. Using hash functions of the entrenched-knowledge questions and answers, the n shares are encrypted and decrypted. The work emphasizes user fault tolerance: users are still able to authenticate successfully and recover the secret even if they forget the answers to some small number of questions. Frykholm and Juels [17] offered an approach similar in flavor to the one proposed by Ellison et al., deriving a strong secret key from a sequence of answers to entrenched-knowledge questions. The main difference is that their method is based on the fuzzy commitment technique [3], which accepts a witness that is close to the original encrypting witness in a suitable metric but not necessarily identical. Both approaches dispense with the use of trusted third parties; in principle, users can safely store their ciphertext even in a public directory. Users are unlikely to forget answers that concern personal facts, hence the chance of reconstructing the secret or secret key is very high. Although Frykholm and Juels [17] state that the Ellison et al. scheme strikes an attractive balance between convenience and security, there are some shortcomings in Ellison's method. In [6], Bleichenbacher and Nguyen showed that some choices of the parameters in Ellison's method, which is based on the noisy polynomial interpolation problem, are insecure. From Frykholm's view [17], a question-and-answer system is only as secure as the entropy of the answers, but Ellison's method lacks a rigorous security analysis because it employs an ad hoc secret sharing scheme that undermines conventional security guarantees. Furthermore, Ellison also mentioned that more work is required of the user in recovering the secret when a larger number of required correct answers, t, is chosen [4]. Hence, the objective of the proposed scheme is to apply weighted personal entropy to Ellison's concept in order to solve these existing problems and
enhance the security level in secret sharing and recovery.

2.4 Weighted Threshold Functions
Weighted threshold secret sharing was introduced by Shamir in his seminal work on secret sharing [1]. In this scheme, the users are not all of the same status: there is a set of users, each of whom is assigned a positive weight. A dealer wishes to distribute a secret among those users so that a subset of users may reconstruct the secret if and only if the sum of the weights of its users exceeds a certain threshold. A secret sharing scheme is ideal if the size of the domain of shares of each user is the same as the domain of possible secrets [19]. In this sense, Shamir's threshold secret sharing scheme can be considered ideal, since the domain of shares of each user coincides with the domain of possible secrets. The concept of weighted threshold functions was also applied by Ito, Saito and Nishizeki in [16]. They generalized the notion of secret sharing to an arbitrary monotone collection of authorized sets, called an access structure: only the sets in the access structure are allowed to reconstruct the secret, while sets that are not in the access structure should gain no information about the secret. Furthermore, Beimel, Tassa and Weinreb [2] state that a weighted threshold access structure is ideal if and only if it is a hierarchical threshold access structure as introduced by Simmons [8], or a tripartite access structure, or a composition of two ideal weighted threshold access structures defined on smaller sets of users.
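As a concrete illustration of the (t, n)-threshold sharing that the schemes above (and the proposed scheme below) build on, here is a minimal Python sketch of Shamir's scheme over a small prime field. The prime, the secret and the helper names are illustrative choices, not part of the original papers; the modular inverse uses pow(x, -1, p), available from Python 3.8.

    import random

    PRIME = 2**61 - 1  # a small Mersenne prime field, chosen only for illustration

    def split_secret(secret, t, n):
        """Split `secret` into n shares; any t of them reconstruct it."""
        coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
        shares = []
        for x in range(1, n + 1):
            y = 0
            for c in reversed(coeffs):          # Horner evaluation of the polynomial at x
                y = (y * x + c) % PRIME
            shares.append((x, y))
        return shares

    def reconstruct(shares):
        """Lagrange interpolation at x = 0 recovers the secret from t shares."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = (num * -xj) % PRIME
                    den = (den * (xi - xj)) % PRIME
            secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
        return secret

    shares = split_secret(123456789, t=3, n=5)
    assert reconstruct(shares[:3]) == 123456789   # any 3 of the 5 shares suffice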
3. PROPOSED SCHEME
The proposed scheme is akin to the previous work mentioned above but differs slightly in its focus on weighted personal entropy.

3.1 Weight in Various Categories of Question
In general, challenge questions are always used as hints for secret recovery. There are nine types of questions, including Bipolar questions (i.e., Yes/No, True/False, Agree/Disagree), Multiple-choice questions, Fill in the blank, Which, What, Who, When, Why and How. Each type of question can be differentiated by the length of the possible given answer and also by the difficulty of answering the question correctly. All of these questions can be grouped into three categories: open-ended questions, close-ended questions, and bipolar/multiple-choice questions. Open-ended questions are those where respondents provide their own answers to the questions without any preset options. These questions afford breadth and depth of reply, and the response is difficult to analyze. Close-ended questions restrict the respondent's options: the respondent can only reply with a finite set of answers such as "None", "One" or "Mohd Ali". Bipolar and multiple-choice questions are special kinds of closed questions. These types of questions limit the respondent even further by only allowing a choice at either pole, such as "yes/no", "true/false", "agree/disagree", or a multiple choice such as "Monday/Tuesday/Wednesday/Thursday/Friday/Saturday". Respondents are not allowed to write down a free response and still be counted as having correctly answered the question [11]. Commonly, these three types of questions can be compared on attributes such as the precision of data, breadth and depth, and ease of analysis. Precision of data refers to the exactness of the respondent's answer to the question; breadth and depth refers to the level of difficulty in answering the question; and ease of analysis refers to the ability of attackers or hackers to analyze the correct answer to the question. The comparison of these attributes shows that the three types of questions can be further categorized into weight-based questions according to their level of entropy. The classifications of questions and their weight values are listed in Tables 3.1 [11] and 3.2 below.

TABLE 3.1 CLASSIFICATIONS OF QUESTIONS

    Types of Question          | Precision of Data | Breadth and Depth | Ease of Analysis
    ---------------------------+-------------------+-------------------+-----------------
    Bipolar/Multiple Choices   | High              | Little            | Easy
    Close-ended                | Moderate          | Moderate          | Moderate
    Open-ended                 | Low               | Much              | Difficult

TABLE 3.2 WEIGHT VALUE OF QUESTIONS

    Types of Questions         | Weight Value (Assumption)
    ---------------------------+--------------------------
    Bipolar/Multiple Choices   | 1
    Close-ended                | 2
    Open-ended                 | 3

Table 3.2 above shows that open-ended questions have weight value 3 because they have the lowest precision of data, the most breadth and depth, and are the most difficult to analyze. Bipolar and multiple-choice questions have weight value 1, given their high precision of data, limited breadth and depth, and ease of analysis. Close-ended questions, which rank between open-ended and bipolar/multiple-choice questions, have weight value 2. In this paper, an assumption was made in assigning these weight values (i.e., 1, 2 and 3). This set of values is used because they are common divisors of many numbers, which provides some significant advantages for the scheme. In this context, a higher weight value is assigned to higher entropy, because the attacker will find it more difficult to guess and analyze the answer. As a result, a higher weight value decreases the number of questions the user needs to answer correctly to recover the secret.
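A minimal sketch of how the weights of Table 3.2 might be accumulated and compared against an ideal weight; the category labels, the example question set and the value of Wi are illustrative only, not taken from the paper.

    # Assumed weight values following Table 3.2 (the paper's 1/2/3 assumption).
    WEIGHTS = {"bipolar/multiple-choice": 1, "close-ended": 2, "open-ended": 3}

    def total_weight(categories):
        """W = w1 + w2 + ... + wn over the chosen question categories."""
        return sum(WEIGHTS[c] for c in categories)

    def recovery_succeeds(correct_categories, ideal_weight):
        """Recovery succeeds when the weight of correctly answered questions Wt >= Wi."""
        return total_weight(correct_categories) >= ideal_weight

    registered = ["close-ended", "open-ended", "open-ended", "bipolar/multiple-choice"]
    W = total_weight(registered)                       # W = 2 + 3 + 3 + 1 = 9
    Wi = 6                                             # hypothetical ideal weight

    print(recovery_succeeds(["open-ended", "close-ended"], Wi))   # 5 < 6  -> False
    print(recovery_succeeds(["open-ended", "open-ended"], Wi))    # 6 >= 6 -> True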
3.2 Weight-based Secret Recovery Scheme
As we know, the encryption and decryption processes are important in a secret sharing scheme to protect the secret between two entities (i.e., sender and recipient). To enhance the security level, McCurley [15] introduced an encryption scheme based on the Diffie-Hellman problem and RSA. This scheme resembles the ElGamal scheme, but it works in a subgroup of ZN, where N is a composite number of special form. The scheme requires Alice to produce a modulus N = pq, where p and q are two large primes with the following criteria [13]:
i. p = 3 (mod 8) and q = 7 (mod 8)
ii. (p - 1)/2 and (q - 1)/2 are primes
iii. (p + 1)/4 and (q + 1)/8 have large prime factors.
Briefly, in the McCurley scheme a composite modulus N is formed as the product of two primes p and q of rather particular construction: both are safe primes, one congruent to 3 mod 8 and the other congruent to 7 mod 8 (a safe prime p is one for which (p - 1)/2 is also prime). Therefore, as shown in [18], this kind of scheme is provably secure, since it is based on the intractability of factoring. The proposed Weight-based Secret Recovery Scheme applies the factoring features of the McCurley scheme to generate Pi and Qi in order to encrypt and decrypt the secret, which involves a set of weighted entrenched-knowledge questions and answers. The following is a description of this weight-based secret recovery scheme for sharing and reconstructing the secret message. Here we assume that a legitimate set of questions and a proper value for n, the number of questions, have been chosen.

The proposed scheme for sharing the secret message M is as follows (a small illustrative sketch follows these steps):
i. Ask the user to choose a random number s and n questions q1, ..., qn, generating answers a1, ..., an.
ii. Using the McCurley encryption scheme, compute yi = 16^s (mod Ni), where Ni = PiQi, for i = {1, 2, ..., n}. (Pi and Qi are randomly generated by the system for question qi and its corresponding answer ai; the values Pi and Qi must fulfil the criteria of the McCurley encryption scheme mentioned above.)
iii. At the same time, a weight value w is assigned to each question chosen by the user, and the system accumulates the value W = w1 + w2 + ... + wn.
iv. After this, the ideal weight Wi is calculated using the information-rate formula, which is discussed in Section 3.3.
v. Use a (t, n)-threshold scheme to split the secret message M into n secret shares M1, ..., Mn.
vi. Encrypt each share, C1 = M1·y1^Wi (mod N1), ..., Cn = Mn·yn^Wi (mod Nn), and send the ciphertexts C1, ..., Cn to the user.
vii. The user remembers q1, ..., qn, s, and C1, ..., Cn.
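A minimal Python sketch of steps ii and vi and of the matching share decryption; the primes below are toy values that do not satisfy the McCurley criteria, the weight and share values are made up for illustration, and pow(x, -1, N) needs Python 3.8 or later.

    # Toy parameters for a single question i; real Pi, Qi must satisfy the
    # McCurley criteria listed above. All values here are illustrative only.
    P, Q = 10007, 10009
    N = P * Q

    s  = 123456        # the user's random number
    Wi = 6             # ideal weight obtained from the information rate (Section 3.3)
    Mi = 424242        # one (t, n)-threshold share of the secret M, encoded as an integer < N

    y = pow(16, s, N)                          # step ii:  y_i = 16^s mod N_i
    C = (Mi * pow(y, Wi, N)) % N               # step vi:  C_i = M_i * y_i^Wi mod N_i

    # Recovery: reproducing s (and enough correct answers so that W'_i = Wi)
    # lets the user recompute y_i and strip the blinding factor.
    M_recovered = (C * pow(pow(y, Wi, N), -1, N)) % N
    assert M_recovered == Mi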
The proposed scheme for reconstructing the secret message M is as follows:
i. Ask the user to enter his or her random number s and to choose the proper t questions q1, ..., qt to answer, generating a'1, ..., a't.
ii. Only the accurate s and the proper question-answer pairs will give the exact values y1, ..., yn for decrypting C1, ..., Cn.
iii. On the other side, the total weight of the t questions, Wt, is calculated only from those answers that match the answers given at registration.
iv. Whenever Wt >= Wi, the scheme uses Wi in calculating M1, ..., Mn:

    W'i = Wi,  if and only if Wt >= Wi
    W'i = Wt,  for any Wt < Wi

(This is a significant security feature: the user needs to answer the proper questions in order to get the exact value of Mn, which results from Wt >= Wi.)
v. Decrypt each ciphertext Cn using the corresponding Nn to obtain Mn:

    M1 = C1·(y1^W'i mod N1)^(-1), ..., Mn = Cn·(yn^W'i mod Nn)^(-1).

By using the (t, n)-threshold scheme, subsets of t shares are selected until the secret message M is correctly reconstructed with the exact value of Wt. In other words, to successfully recover the secret, at least t of the questions need to be answered correctly under the condition Wt >= Wi, where Wt is the total weight of the t proper questions the user answers correctly and Wi is the ideal weight for the scheme. If fewer than t questions are answered correctly, the user is unable to recover the secret. In a nutshell, no share Mn is recovered from a correct answer if Wt < Wi. Unsurprisingly, even if an attacker obtains an accurate value of Wt, he or she still will not succeed in cracking the secret if the proper questions and answers have not been chosen.

3.3 Calculating Wi
The value of Wi is calculated from the value of W. To calculate a proper Wi, the formula for the information rate [7] is applied. In [8], Simmons called a secret sharing scheme extrinsic if the set of possible shares is the same for all participants; it is known that a secret sharing scheme is ideal if it is perfect and has information rate 1. Consequently, to calculate Wi in the proposed scheme, the information rate needs to be set according to the requirements of the application. The formula is as follows:

    rho = log|Wi| / log|W|, where 0 <= rho <= 1   ........ (1)

Thus, the value of Wi can be varied across applications by choosing different information rates. This feature makes the proposed scheme more flexible in meeting the security-level requirements of different situations. There are some differences in calculating the value W when sharing and when reconstructing the secret: in the sharing scheme, W is accumulated over all the questions the user has answered, whereas in the reconstruction process Wt is accumulated only over each pair of correct question and matched answer as given at registration.

4. PRELIMINARY RESULT
In this section we demonstrate the dependency between the ideal weight, Wi, and the information rate, rho. Figure 4.1 shows the differences in Wi for various values of rho. The value of the ideal weight Wi is calculated using various information rates rho over the value of the weight W, which is taken from the set of questions answered by the user. The line graph shows that rho = 1.00 contributes the highest Wi while rho = 0.70 provides the lowest Wi, indicating that as rho becomes bigger, the value of the ideal weight increases steadily. This finding reveals that the number of correct questions required to recover the password also increases when the value of rho becomes bigger. Assume a user has answered a set of questions which gives the value W = 40 during the registration process. Then, in the password recovery process, he needs to answer enough questions from this set to obtain a total weight equal to or bigger than Wi = 23 if the value of rho is 0.85. However, in the same situation, the user is required to answer more questions, to reach Wi = 33 or bigger, if rho is 0.95. This result clearly illustrates the dependency between the ideal weight Wi and the information rate rho. From formula (1), we can conclude that Wi increases with rho and that Wi increases with W.

Figure 4.1 The Differences of Ideal Weight in Various rho
(Ideal weight Wi plotted against weight W for rho = 0.70 to 1.00; the plotted values are:)

    Weight, W      5   10   15   20   25   30   35   40   45   50
    rho = 0.70     3    5    7    8   10   11   12   13   14   15
    rho = 0.75     3    6    8    9   11   13   14   16   17   19
    rho = 0.80     4    6    9   11   13   15   17   19   21   23
    rho = 0.85     4    7   10   13   15   18   21   23   25   28
    rho = 0.90     4    8   11   15   18   21   25   28   31   34
    rho = 0.95     5    9   13   17   21   25   29   33   37   41
    rho = 1.00     5   10   15   20   25   30   35   40   45   50

Overall, the graph shows that the bigger the value of rho, the higher the ideal weight Wi. This points up that, in order to raise the level of security in an application, the value of rho must be bigger. Since the value of rho can be varied, the proposed scheme can be used in different web applications according to their security requirements.
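The tabulated values above are consistent with solving formula (1) for Wi, i.e. Wi = W^rho rounded to the nearest integer; the following short sketch reproduces them under that assumption (the rounding rule is an inference from the data, not stated in the paper).

    # Reproduce the Figure 4.1 values assuming Wi = round(W ** rho), which follows
    # from rho = log(Wi) / log(W) in formula (1).
    weights = range(5, 55, 5)
    for rho in (0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00):
        row = [round(W ** rho) for W in weights]
        print(f"rho = {rho:.2f}:", row)

    # e.g. rho = 0.85 gives 23 at W = 40, and rho = 0.95 gives 33, matching the text.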
5. FUTURE WORK
Our aim is to provide a more secure weight-based secret recovery scheme. Due to time constraints, our work so far only serves to demonstrate the concept of the weight-based scheme, and a number of areas require further investigation. A prototype must be developed, and an evaluation and justification conducted, in order to analyze the performance of the proposed scheme in detail. Designing a set of challenge entrenched-knowledge questions is a difficult task that has little to do with the cryptography of the system; it is an interesting area of psychological research. Research also needs to be done on the actual entropy, from the attacker's point of view, of answering the entrenched-knowledge questions. Additional empirical research is required on the actual weight value for each type of question: in this article, the assumption was made that the question types are weighted 1, 2 and 3, but with little empirical basis.

6. CONCLUSION
In this paper, we introduced a weight for questions based on their characteristics, after analyzing various types of questions. We then proposed a weight-based scheme for secret recovery with personal entropy. In the future, we will develop a prototype to evaluate and justify the performance of this scheme from its security and usability aspects.

7. REFERENCES
[1] Adi Shamir, "How to Share a Secret", Communications of the ACM, vol. 22, no. 11, 1979.
[2] Amos Beimel, Tamir Tassa and Enav Weinreb, "Characterizing Ideal Weighted Threshold Secret Sharing", Proceedings of the Second Theory of Cryptography Conference (TCC), 2005, pp. 600-619.
[3] Ari Juels and Martin Wattenberg, "A Fuzzy Commitment Scheme", 5th ACM Conference on Computer and Communications Security, 1999, pp. 28-36.
[4] C. Ellison, C. Hall, R. Milbert and B. Schneier, "Protecting Secret Keys with Personal Entropy", Future Generation Computer Systems, vol. 16, pp. 311-318, 2000.
[5] Charles Miller, "Password Recovery", GNU, Free Software Foundation, 2002.
[6] Daniel Bleichenbacher and Phong Q. Nguyen, "Noisy Polynomial Interpolation and Noisy Chinese Remaindering", Proceedings of Eurocrypt 2000, LNCS 1807, 2000, pp. 53-69.
[7] Ernest F. Brickell, "Some Ideal Secret Sharing Schemes", Journal of Combinatorial Mathematics and Combinatorial Computing, vol. 6, pp. 105-113, 1989.
[8] Gustavus J. Simmons, "How To (Really) Share A Secret", CRYPTO 88, volume 403 of LNCS, 1990, pp. 390-448.
[9] G. R. Blakley, "Safeguarding Cryptographic Keys", AFIPS 1979 National Computer Conference Proceedings, 1979, pp. 313-317.
[10] Hal Abelson and Ross Anderson, The Risks of Key Recovery, Key Escrow, Trusted Third Party and Encryption, Report by an Ad Hoc Group of Cryptographers and Computer Scientists, 1998.
[11] Julie E. Kendall and Kenneth E. Kendall, System Analysis and Design, Sixth edition, US: Pearson Prentice Hall, 2005.
[12] K. McCurley, "A key distribution system equivalent to factoring", Journal of Cryptology, vol. 1, pp. 85-105, 1988.
[13] Kooshiar Azimian and Javad Mohajeri, A Verifiable Partial Key Escrow, Based on McCurley Encryption Scheme, Electronic Colloquium on Computational Complexity, ECCC Report TR05-078, 2005.
[14] Lawrence O'Gorman, Amit Bagga and Jon Bentley, "Call Center Customer Verification by Query-Directed Passwords", Financial Cryptography: 8th International Conference (FC), 2004, pp. 54-67.
[15] Menezes A. J., Van Oorschot P. C. and Vanstone S. A., Handbook of Applied Cryptography, Boca Raton: CRC Press, 1998.
[16] M. Ito, A. Saito and T. Nishizeki, "Secret Sharing Schemes Realizing General Access Structure", Proceedings of IEEE Globecom, 1987, pp. 99-102.
[17] Niklas Frykholm and Ari Juels, "Error-Tolerant Password Recovery", Proceedings of the ACM Conference on Computer and Communications Security, 2001, pp. 1-8.
[18] Z. Shmuely, Diffie-Hellman Public-Key Generating Systems Are Hard To Break, Technical Report No. 356, Computer Science Department, Technion, Israel, 1985.
[19] E. D. Karnin, J. W. Greene and M. E. Hellman, "On Secret Sharing Systems", IEEE Transactions on Information Theory, vol. 29(1), pp. 35-41.
On a Strong Location Privacy and Forward Secrecy RFID Challenge-Response Authentication Protocol

Hun-Wook Kim (1), Shu-Yun Lim (1), Hoon-Jae Lee (2)
(1) Graduate School of Design and Information Technology, Dongseo University, South Korea
(2) Division of Computer and Information Engineering, Dongseo University, South Korea
[email protected],
[email protected],
[email protected]
Abstract
RFID tags carry vital information in their operation, and thus concerns about privacy and security issues arise. The problem of traceability is critical in open radio frequency environments: an adversary can trace and interact with a tag, which is referred to as tracking. Nevertheless, with a strong authentication mechanism, the rising security problems in RFID systems can be solved. We demonstrate current vulnerabilities and propose an authentication mechanism to overcome them. As long as the secret information stays secret, tag forgery is not possible. Targeting RFID tags with short tag IDs, we incorporate minimal encryption and hash operations to enhance the security features of active-type RFID tags.

Keywords: RFID, challenge-response authentication, location privacy, forward-security

1. Introduction
Radio Frequency Identification technology is fast gaining popularity and attracting interest from both industry and academic institutes. RFID tags have the ability to store data, which can be read rapidly without line of sight. This is especially significant in yielding convenience, efficiency and productivity gains in industry. Armed with all the benefits it can offer, this automatic identification system has received wide deployment in everyday consumer items and plays an important role in the automation of industry. However, these low-cost radio frequency tags provide no access control and no tamper resistance for sensitive information, and hence pose new risks to security and privacy. For privacy, the tag must not disclose its identity until the reader has been authenticated; tags must reveal their identity only to authorised RFID readers. Thus, the reader must authenticate itself to the tag. Location privacy, on the other hand, is largely related to the identifier (ID) carried by a tag. The tag ID is an integer number that is uniquely assigned to each individual tag during manufacturing; it cannot be changed and is read-only. Since the tag ID can leave a trace that allows an adversary to analyze the tag owner's activities, it is a threat to location privacy. Consequently, our primary aim is to protect the tag ID in transmission. The proposed challenge-response authentication scheme, which incorporates both encryption and a hash function, can provide full protection to the tag ID. At the same time, this authentication protocol performs an encryption key update, which is a form of rekeying. A major security concern in a cryptosystem is the protection of secret keys from exposure. All encrypted material is protected from key exposure after the keys are updated; this property is called forward secrecy. With forward secrecy, disclosure of long-term secret key material does not compromise the secrecy of earlier encrypted material. Owing to the high computation cost of both encryption and hash functions, this highly secure authentication protocol might not be applicable to RFID systems directly, because of the lightweight calculation power of a low-cost tag. However, with the emergence of next-generation RFID tags (Class 2 type), this high-level security protocol could one day provide pervasive protection to RFID applications. This paper is organised as follows: In Section 2, we describe the RFID system and its security problems. Section 3 covers our proposed solution, and the security analysis of our scheme is discussed in Section 4. A summary of this proposal is presented in the final section.
2. Background
In this section we give an overview of RFID systems and model a variety of security protocols and attacks.

2.1 RFID System
An RFID system comprises a single reader R and a set of n tags T1, ..., Tn. Tags are radio transponders that respond to reader requests; the reader, or radio transceiver, queries the tags for some identifying information. Readers are often regarded as a simple conduit to a back-end database. In this case, our scheme treats the connection between the reader and the back-end database as a secured link, and the focus is put on the communication link between tags and reader.

2.2 RFID Security Problems
Both data privacy and location privacy are equally important to consider. If an RFID system stores personal data, the privacy of the passive party is threatened: an adversary can gain access to personal data by eavesdropping on the air interface. As opposed to data privacy, a location privacy threat does not expose personal data except for the location. Location privacy refers to the untraceability of a particular RFID application. By tracking the usage of a certain RFID tag, an attacker is able to create a movement profile of the victim by reading out the tag ID on a regular basis. This danger of tracking rises when RFID is employed on a ubiquitous scale. The concept of untraceability includes ID anonymity. An RFID tag carries a tag ID for the reader to recognise the tag's identity. The tag ID appears in various lengths in different standards, as depicted in Table 1. The tag ID number has no structure and does not contain any information beyond uniquely identifying a tag. Even though an adversary has no information about the data carried by a tag, the adversary can find specific patterns of output by tracking a particular tag's movement. Therefore, our main concern here is to give full protection to tag IDs in RF transmissions. The longer the tag ID, the more infeasible brute-force attacks become. An 80-bit level appears to be the smallest general-purpose level: with current knowledge, it protects against the most reasonable and threatening attack scenarios [7]. ID sizes lower than 64 bits should not be used for confidentiality protection, since they offer very poor protection against brute-force attacks. As a result, the ID lengths in ISO 18000-7 and EPC tags are very vulnerable to brute-force attack and enable adversaries to forge a tag by testing all possible values in order to recover the tag ID. In view of the tag ID's importance in its operation, we have tried to encrypt the tag ID and update it on every successful read attempt in a secure manner.

Table 1. Tag ID size and security level

    Tag Standard    | ISO 18000-7 | EPC   | UCODE
    ID size (bit)   | 48          | 64    | 128
    Security level  | 2^48        | 2^64  | 2^128

Besides the tag ID, if a secret key is exposed, security is compromised. The security of a system hinges on the condition that attackers cannot gain access to its secret key. This condition may be difficult to satisfy, since the secret key must be actively used by the system. Consequently, another issue that we are trying to address is forward security. The idea is to protect cryptographically processed data prior to key exposure; it also helps to protect cryptographically processed data after an intrusion.

2.3 Related works
Research has been done depending on the privacy right concerned; RFID applications in different areas are susceptible to different attacks. A procedure based on dynamic generation of a new meta-ID at every read cycle is proposed in [9]; it is called the Randomised Hash-Lock approach. It targets resource-limited low-cost tags, offering a simple security scheme based on one-way hash functions. The scheme requires a hardware-optimised cryptographic hash on the tag and key management on the back-end. Since many hash operations need to be performed in the back-end, it is hardly applicable when the system needs to deal with a large number of tags. Besides, this scheme provides security limited to data privacy but not location privacy, since a tag can be uniquely identified by its hash value: an attacker can track its meta-ID and impersonate the tag to a legitimate reader. Permanent deactivation in certain situations is the best way to ensure data and location privacy. If a tag is able to deactivate itself, it can prevent illegal reader access to its sensitive information. This "Killing Tag" [8] approach can be implemented at the check-out counter of a hypermart: deactivation is done after purchase, when the customer is exposed to privacy threats. However, it forgoes the advantages of reuse for maintenance or recycling, and it appears that the tag is exposed to any query from a malicious reader, which can kill the tags prematurely. Another approach to securing RFID applications is to disable access to RFID tags. By default, RFID
tags can be activated by anyone equipped with the correct hardware and without notification of the owner. The scheme named "Blocker Tag" [8] utilises the tree-walking protocol to prevent tags from being scanned, in order to enhance privacy. When the reader queries for tags and receives overlapping signal responses, it needs to walk down the tree to resolve the jamming signal. As the reader walks through the tree, the blocker tag gives a response to the reader regardless of the queried ID. This reflexive blocker tag prevents itself from being read by jamming the reader.
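To make the brute-force claim in Section 2.2 concrete, here is a short back-of-the-envelope sketch; the guessing rate is an assumed figure, not one from the paper.

    # Rough time to exhaust a tag-ID space at an assumed guessing rate.
    RATE = 10**9  # assumed: one billion ID guesses per second

    for bits in (48, 64, 80, 128):
        seconds = 2**bits / RATE
        years = seconds / (365 * 24 * 3600)
        print(f"{bits:3d}-bit ID: ~{years:.3g} years to exhaust at {RATE:.0e} guesses/s")

    # 48-bit: a few days; 64-bit: centuries; 128-bit: far beyond practical reach.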
3. Proposed Solutions
The proposed protocol is presented in this section, with detailed procedures described. Figure 1 gives the general setup of this challenge-response protocol.

3.1 Notations and operations

    T       RFID tag
    R       RFID reader
    B       Back-end server which keeps a database D
    D       A database in the back-end server
    ID      RFID tag ID
    E()     Encryption algorithm
    h()     One-way hash function
    Ckey    Current key; stored in the database, 128 bits in size
    Lkey    Last key; stored in the database, 128 bits in size
    RR()    Right rotate operation
    LR()    Left rotate operation
    CE      Ciphertext encrypted using the current key, ECkey(RR1(ID)); an operation done in the database
    LE      Ciphertext encrypted using the last key, ELkey(RR2(ID)); an operation done in the database
    key     Encryption secret key
    cnt     An incrementing counter
    S       A nonce generated by a pseudorandom number generator
    r       A random number ranging from 1 to IDlength - 1
    A       Ciphertext generated by the tag or the database, i.e. At or Adb
    T       Hashed value of the concatenation of key, cnt and S

3.2 Initial Setup
Each T, carrying a unique ID, is equipped with a secret key; D in B shares the same copy of the secret key. T itself has an encryption function E() and a hash function h(). Meanwhile, R is equipped with a pseudorandom number generator for generating a nonce in each challenge process. D initially stores a record for every legitimate tag, and it also keeps h() and E() for verifying the identity of T.

3.3 Authentication Process
Challenge-response authentication involves the proof of knowledge of a shared secret. The proposed protocol provides mutual authentication in three steps, as shown in Figure 1.

Figure 1. Proposed challenge-response protocol
(Message flow between RFID tag, reader and database: the reader sends the nonce S; the tag answers with At = Ekey(RR(flag+1)(ID)), T = h(key||cnt||S), cnt and flag; the database returns Adb and r; on receiving them the tag checks ID == RRr(Dkey(Adb)), sets key <- RRr(key) and flag <- 0. Flag states: flag == 1 means "Challenge Incomplete", flag == 0 means "Challenge Responded".)
R generates and saves a new pseudorandom number S and sends it to T. T right-rotates the ID and encrypts it with the secret key, At <- Ekey(RR(flag+1)(ID)). T then hashes the nonce S together with cnt and the key, T <- h(key||cnt||S), and sends the result back to R. R is able to verify the value of S, recognise T as legitimate, and allow T access to D. In D, either Procedure 1 or Procedure 2 is executed. The selection is based on the flag status: 0 stands for "authentication process completed" and 1 stands for "authentication process incomplete". This ensures the authentication process is completely executed and the secret key is successfully updated. Ckey and Lkey are both secret keys used for encryption: Ckey is the current key in use, while Lkey is the key used in the last encryption process. When the flag value is 0, meaning the previous authentication process was performed successfully, procedure Challenge_Responded (fig. 2) is executed. D computes a ciphertext CE in parallel, CE := ECkey(RR1(ID)). In order to find the matching ID in its records, D searches for a received At that matches CE. If more than one result is
found, the algorithm hashes Ckeyi, cnt and S to determine which is the actual ID, i.e. for which h(Ckeyi || cnt || S) == T. For the case where T fails to receive a response from R (e.g. the response is dropped by an adversary), the flag status is still set to 1 and procedure Challenge_Incomplete (fig. 3) is executed. The process is similar to procedure Challenge_Responded, except that LE is not updated from CE. Later on, the database rotates the ID a random number of times and encrypts it using Ckey, Adb <- ECkey(LRr(ID)). Rotation is performed on the ID to further randomise it before encryption. Next, the database key is updated with the newly computed Ckey, where Ckey <- RRr(Ckey); Ckey is right-rotated for a number of rounds determined by the random number r. Lastly, the ciphertext CE <- ECkey(RR1(ID)) is computed and the encrypted ID is sent back to T, so the ID is changed on every successful read attempt. Finally, the ciphertext created by D is sent back to the tag together with the random number r, a value ranging from 1 to IDlength - 1. Now T is aware of the number of rounds the key has been rotated; it performs the same rotation after decryption so that both keys are synchronised. Upon completion of authentication, the flag is set to 0. T is now equipped with a new ID, and the shared key is also updated.
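A minimal, self-contained Python sketch of the tag/database exchange described above. The paper does not fix a concrete cipher, so a SHA-256-derived keystream XOR stands in for E()/D() (purely illustrative and not secure in itself), and the 48-bit ID length, the 128-bit key and the other constants are assumptions for the example.

    import hashlib, secrets

    ID_BITS, KEY_BITS = 48, 128

    def rr(x, n, bits):   # right rotate an integer within `bits`
        n %= bits
        return ((x >> n) | (x << (bits - n))) & ((1 << bits) - 1)

    def lr(x, n, bits):   # left rotate
        return rr(x, bits - (n % bits), bits)

    def keystream(key, length):
        return hashlib.sha256(key.to_bytes(KEY_BITS // 8, "big")).digest()[:length]

    def E(key, value):    # toy stand-in for the unspecified encryption E()
        data = value.to_bytes(ID_BITS // 8, "big")
        return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

    def D(key, ct):       # matching decryption
        return int.from_bytes(bytes(a ^ b for a, b in zip(ct, keystream(key, len(ct)))), "big")

    def h(key, cnt, S):
        return hashlib.sha256(f"{key}|{cnt}|{S}".encode()).digest()

    # --- one protocol run (flag == 0, Challenge_Responded case) ---
    ID  = secrets.randbits(ID_BITS)
    key = secrets.randbits(KEY_BITS)          # shared by tag and database
    cnt, flag = 0, 0
    db = {"ID": ID, "Ckey": key}

    S  = secrets.randbits(64)                 # reader nonce
    At = E(key, rr(ID, flag + 1, ID_BITS))    # tag: At = Ekey(RR(flag+1)(ID))
    T  = h(key, cnt, S)                       # tag: T = h(key||cnt||S)

    # Database side: check the hash, recover the ID, answer and update its key.
    assert h(db["Ckey"], cnt, S) == T
    assert lr(D(db["Ckey"], At), flag + 1, ID_BITS) == db["ID"]
    r   = secrets.randbelow(ID_BITS - 1) + 1          # r in 1 .. IDlength-1
    Adb = E(db["Ckey"], lr(db["ID"], r, ID_BITS))     # Adb = ECkey(LRr(ID))
    db["Ckey"] = rr(db["Ckey"], r, KEY_BITS)          # Ckey <- RRr(Ckey)

    # Tag side: verify Adb, then rotate its own key by the same r (forward secrecy).
    assert rr(D(key, Adb), r, ID_BITS) == ID          # ID == RRr(Dkey(Adb))
    key = rr(key, r, KEY_BITS)
    flag = 0
    assert key == db["Ckey"]                          # both sides hold the updated key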
Procedure 2 Challenge_Incomplete
Input: flag, S, T, At
Output: Adb, r
    r := Rnd(1 ~ IDlength-1)
    LE := ECkey(RR2(ID))
    If flag == 1 Then
        Search LE == At
        If LE.count > 0 Then
            Repeat
                i <- i + 1
                If h(Lkeyi || cnt || S) == T Then
                    ID <- LR(flag+1)(ELkey(At))
                    If IDdb == ID Then
                        Adb <- ELkey(LRr(ID))
                        Ckey <- RRr(Lkey)
                        CE <- ELkey(RR1(ID))
                        Return Adb, r
                Return 0
            Until i

Procedure 1 Challenge_Responded (fragment)
            ... > 0 Then
            Repeat
                i <- i + 1
                If h(Ckeyi || cnt || S) == T Then
                    ID <- LR(flag+1)(ECkey(At))
                    If IDdb == ID Then
                        LE <- ECkey(RR2(ID))
                        Lkey <- Ckey
                        Adb <- ECkey(LRr(ID))
                        Ckey <- RRr(Ckey)
                        CE <- ECkey(RR1(ID))
                        Return Adb, r
                Return 0
            Until i
Yes
Get the first MAX bytes of data, D
D-MAX =Remainder
No Encrypt data, D Compute 256 bits MAC using HMAC with Secret key, K and Ciphertext, C Append computed MAC to the trailer Ciphertext C Embed processed data into ICMP ECHO request message Send to the intended receiver End
Figure 6. Flow chart diagram for embedding process As a means of achieving replay attack immunity, concatenation of eight 2 bytes Sequence Number is
457
M2USIC 2006
TS- 4B difference of the maximum IP datagram length (65535 bytes) and summation of the IP header (20 bytes), ICMP header (8 bytes) and MAC (32 bytes).
supplied as the Initialization Vector for LILI-II keystream generator. The value of Sequence Number will be increase by 1 each time an ICMP Echo Request message is sent. Thus, the same data will be encrypted differently when using the same key. If the data size is greater than the size defined, it will be split iteratively and encrypted in each round until the end. There is a tradeoff between efficiency and ease of realization when defining the data size. The bigger the data size, the easier it will be realized and vice versa. We suggest that the data size to be 24 bytes with 32 bytes MAC to imitate the actual optional data field size (56 bytes) of UNIX ping packet. In addition with that, we had made use of MAC which plays a very important role in data authenticity and data integrity. We extend its usage by serving it as the steganography detection on the receiver side. For data arrival confirmation, it is automatically done by operating system kernel. This is because in principle, a reply by ICMP Echo Reply message is mandatory for every ICMP Echo Request. The flow chart diagram for embedding and extracting process are described in figure 6 and 7 respectively.
Figure 8. Embedded secret data and MAC in ICMP message
4.2 Result and analysis The implementation of our proposed solution, both in sender and client side, were written in C#. We had implement LILI-II keystream generator and HMACSHA256 algorithm. In addition of that, we utilized the .NET Framework class library for ICMP packet creation. For optimum efficiency, we suggested that it should be implemented in a low level programming language. Table 1. Security Level comparison between previous
work and proposed solution Previous Work Proposed solution Confidentiality Susceptible to LILI-II cryptanalysis ≥ 128bit security Integrity ICMP Checksum HMAC-SHA256 ≤ 16bit security ≥ 128bit security Authenticity HMAC-SHA256 ≥ 128bit security
Apparently, the security performance of our proposed solution is far more superior than the previous work. As shown in Table 1, data confidentiality, integrity and authenticity achieved at least 128-bit of security level in our approach. LILI-II keystream generator was selected in view that it is a secure high speed stream cipher where there are no currently known attacks on the cipher which are more feasible than exhaustive key search with complexity of 2128. In the meantime, we have selected HMACSHA256 to preserves the data integrity and data authenticity of our approach. Since each of the encrypted stego is required to be authenticated at the receiver end, any attempt to modify stego’s contents or removes the 32 bytes MAC, will results in an invalid stego. HMAC-SHA256 is secure against currently known style of attack. Unfortunately, there is a famous natural attack which is called a birthday attack; the attack uses birthday paradox to find collision of cryptographic hash algorithm at the complexity 2n/2
Figure 7. Flow chart diagram for the extracting process

The situation where the encrypted data and the 32-byte MAC are embedded inside an ICMP message is illustrated in Figure 8. The maximum size of the data that can be embedded is 65475 bytes, calculated as the difference between the maximum IP datagram length (65535 bytes) and the sum of the IP header (20 bytes), the ICMP header (8 bytes) and the MAC (32 bytes).
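On the receiver side, the detection step implied by Figures 7 and 8 can be sketched as below; the function names are illustrative, and decryption would then proceed with the LILI-II keystream keyed by the same IV as on the sender.

```python
# Receiver-side sketch: recompute HMAC-SHA256 over the received ciphertext and accept
# the packet as a stego carrier only if the tags match; this doubles as the
# steganography detection mechanism described above.
import hmac, hashlib, struct

MAC_LEN = 32
ICMP_HDR = 8
MAX_DATA = 65535 - 20 - ICMP_HDR - MAC_LEN      # = 65475 bytes, as computed above

def detect_stego(icmp_packet: bytes, mac_key: bytes):
    header, payload = icmp_packet[:ICMP_HDR], icmp_packet[ICMP_HDR:]
    if len(payload) <= MAC_LEN or len(payload) - MAC_LEN > MAX_DATA:
        return None
    ciphertext, tag = payload[:-MAC_LEN], payload[-MAC_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None                              # ordinary ping, or tampered stego
    seq = struct.unpack("!H", header[6:8])[0]    # Sequence Number, used to rebuild the IV
    return seq, ciphertext
```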
Since HMAC-SHA256 has a 256-bit output, we therefore expect on the order of 2^{128} MAC values to be produced before a collision occurs. Our approach does not inherit the restriction of the previous work, which limits the maximum message length. The confidentiality of the previous work is, moreover, susceptible to cryptanalysis: a cryptanalyst can easily mount a time-memory tradeoff attack (TMTO) to recover the secret key faster than exhaustive key search. In addition, an Mblock can be recorded, modified and replayed later. The main reason behind all these weaknesses is the use of Electronic Codebook (ECB) mode for the block cipher encryption. ECB mode is not advisable for practical security purposes because identical data blocks are encrypted identically, which can become a loophole in almost any application. Moreover, only a simple 16-bit checksum is used for integrity checking, and there is no data authentication at all in the design of the previous work. All these factors result in the low security performance of their work.
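The determinism that makes ECB mode problematic is easy to visualise with a toy experiment; the per-block transforms below are keyed hash outputs used purely as stand-ins, not real ciphers.

```python
# Toy illustration (not real ciphers) of the ECB weakness discussed above: a
# deterministic per-block transform maps identical plaintext blocks to identical
# ciphertext blocks, while an IV/counter-based keystream hides the repetition.
import hmac, hashlib

key = b"demo-key"
blocks = [b"ATTACK AT DAWN!!"] * 2            # two identical 16-byte plaintext blocks

ecb_like = [hmac.new(key, b, hashlib.sha256).digest()[:16] for b in blocks]
ctr_like = [bytes(x ^ y for x, y in zip(b, hmac.new(key, i.to_bytes(4, "big"),
            hashlib.sha256).digest())) for i, b in enumerate(blocks)]

print(ecb_like[0] == ecb_like[1])   # True  -> repeated plaintext is visible
print(ctr_like[0] == ctr_like[1])   # False -> repetition is hidden
```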
5. Conclusion
Information hiding used to receive meager attention from experts, but extensive research has now begun to focus on this field. This paper presents a novel algorithm for data hiding in ICMP. The approach covers data protection in terms of confidentiality, integrity and authenticity. To supply sufficient input to the cryptographic algorithm, the Initialization Vector for LILI-II is the concatenation of eight 2-byte Sequence Number values from the ICMP message. The data is then encrypted using the keystream generated by the LILI-II algorithm, after which a MAC is generated over the resulting ciphertext with a secret key using the HMAC-SHA256 cryptographic hash algorithm. As a result, even if the existence of this secret communication were revealed, an intruder would not gain any further information from it, nor be able to modify it undetected.
Acknowledgement This research was supported by University IT Research Center Project.
6. References
[1] F. A. P. Petitcolas, R. J. Anderson and M. G. Kuhn, “Information Hiding – A Survey”, Proceedings of the IEEE: special issue on Protection of Multimedia Content, Vol. 87, pp. 1062-1078, 1999.
[2] L. M. Marvel and C. T. Retter, “A Methodology for Data Hiding Using Images”, Proceedings of the IEEE Conference on Military Communication (MILCOM), Boston, MA, October 1998.
[3] K. Gopalan, “Audio Steganography by Cepstrum Modification”, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Philadelphia, March 2005.
[4] A. Westfeld and G. Wolf, “Steganography in a Video Conferencing System”, Information Hiding: Second International Workshop, Preproceedings, Portland, Oregon, 15-17 April 1998.
[5] E. T. Lin and E. J. Delp, “A Review of Data Hiding in Digital Images”, Proceedings of the Image Processing, Image Quality, Image Capture Systems Conference (PICS '99), Savannah, Georgia, April 25-28, 1999, pp. 274-278.
[6] B. Pfitzmann, “Information Hiding Terminology”, Information Hiding: Proceedings of the First International Workshop, Cambridge, U.K., Springer, 1996, pp. 347-349.
[7] J. Postel, “Internet Control Message Protocol”, RFC 792, September 1981.
[8] E. O. Savateev, “Design of the Steganography System Based on the Version 4 Internet Protocol”, Conference on Control and Communication (SIBCON-2005), 21-22 October 2005, pp. 38-51.
[9] E. Dawson, A. Clark, J. Golic, W. Millan, L. Penna and L. Simpson, “The LILI-128 Keystream Generator”, Proc. 1st NESSIE Workshop, http://www.cosic.esat.kuleuven.ac.be/nessie/.
[10] A. Clark, E. Dawson, J. Fuller, J. Golic, H-J. Lee, W. Millan, S-J. Moon and L. Simpson, “The LILI-II Keystream Generator”, Information Security and Privacy: 7th Australasian Conference (ACISP 2002), Melbourne, Australia, July 3-5, 2002, LNCS 2384, 2002, pp. 25.
[11] H. Krawczyk, M. Bellare and R. Canetti, “HMAC: Keyed-Hashing for Message Authentication”, Internet Engineering Task Force, Request for Comments 2104, 1997.
[12] National Institute of Standards and Technology, FIPS 180-2: “Secure Hash Standard (SHS)”, August 2002.
[13] R. Atkinson and S. Kent, “IP Encapsulating Security Payload (ESP)”, Request for Comments 2406, November 1998.
[14] M. Bellare and C. Namprempre, “Authenticated Encryption: Relations Among Notions and Analysis of the Generic Composition Paradigm”, Proceedings of the 6th International Conference on the Theory and Application of Cryptology and Information Security: Advances in Cryptology, Lecture Notes in Computer Science, Vol. 1976, pp. 531-545.
[15] M. E. Hellman, “A Cryptanalytic Time-Memory Tradeoff”, IEEE Transactions on Information Theory, vol. 26, pp. 401-406, 1980.
[16] J. Hong and P. Sarkar, “Rediscovery of Time Memory Tradeoffs”, 2005. http://eprint.iacr.org/2005/090.
[17] A. Biryukov, S. Mukhopadhyay and P. Sarkar, “Improved Time-Memory Trade-offs with Multiple Data”, Proceedings of SAC 2005.
MyKad Security: Threats & Countermeasures Huo-Chong Ling1 and Raphael C.-W. Phan2 1
Center for Cryptography and Information Security, Multimedia University, 63100 Cyberjaya, Malaysia
[email protected] 2 Information Security Research (iSECURES) Lab, Swinburne University of Technology (Sarawak Campus), 93576 Kuching, Malaysia
[email protected]
ABSTRACT MyKad was officially launched on 5th September, 2001 and it is expected that within the next 5 years, MyKad will be used by every Malaysian as the Identity Card (IC), driving license, passport, medical record, credit card and ATM card all rolled into one. One of the issues that a Malaysian MyKad holder would be concerned with is how secure MyKad really is. In this paper, we present potential threats faced by the usage of smart cards, and relate them to MyKad. We also suggest possible countermeasures to guard against these threats. Our intention is to highlight the importance of and the need for, a more transparent and open-minded approach to analyzing the security of MyKad. Keywords: Computer and network security, cryptography, MyKad, threats
I. INTRODUCTION
The national general multipurpose card, also known as the MyKad [1], was officially launched on 5th September 2001, and it is expected that within the next 5 years [2] every Malaysian would be carrying a MyKad in his pocket, to be used simultaneously as an Identity Card (IC), driving license, passport, medical report and Automatic Teller Machine (ATM) card. Obviously, personal as well as financial information would be stored in MyKad, so some of the basic questions that a Malaysian MyKad holder would ask are:
1. Confidentiality: “How secure is the information in MyKad?”
2. Authentication: “If someone gets hold of my MyKad, would he be able to impersonate me?”
3. Integrity: “Can the information in my MyKad be modified without authorization? If so, would I be able to detect such a modification?”
4. Non-repudiation: “If I have used my MyKad for a transaction, would I be able to confirm or deny in future having made such a transaction?”
Clearly, all these questions are important. The first question being so because the personal information stored in MyKad such as our IC number, Personal Identification Numbers (PINs), passwords, and birthdates are private and we would not want simply anyone to be able to have access to them. Also, since we would be using MyKad to prove our identity both within and outside Malaysia (IC and passport), the second question is an equally important one. Meanwhile, the question of ensuring integrity of information is vital because we would want only the authorized parties to be able to modify certain information, and to be able to determine if unauthorized modifications have been performed. Finally, with MyKad being used as an ATM, and in online transactions, it is necessary that whatever previous transactions that have been made with MyKad be binding on the parties involved. Earlier research on the security and design of MyKad has been done by Phan and Mohammed [2]. In this paper, we present the potential threats faced by MyKad and suggest possible countermeasures to guard against these threats.
II. PARTIES INVOLVED IN MYKAD We should realize that unlike computers, a smart card such as MyKad would be more exposed as it would be dealing with many parties throughout its useful lifetime. In this section, we will highlight the parties [3] involved in dealings with MyKad as our initial attempt to show the potential threats faced by a MyKad.
A. MyKad Cardholder
This would be a typical Malaysian who keeps the MyKad in his pocket, using it to prove his identity, his driving license, etc.
B. MyKad Data Owner
This is the party who has the authority to create and modify the data inside MyKad. Data owners vary with the different applications used on a single MyKad; there are various data owners, each owning a specific collection of data within MyKad. For example, the data owners of the IC and passport information in MyKad are the relevant government departments, while the data owner of the digital certificate in MyKad is really the cardholder himself.
C. Terminal
The terminal is a device that allows MyKad to connect to the outside world. There are 3 types of terminals for MyKad [4], as shown in Fig. 1, namely:
(a) Mobile Card Acceptance Device (CAD). This is typically used by government agencies to read information from MyKad and also to verify fingerprints.
(b) Key Ring Reader (KRR). This is a small and convenient reader with two versions. The personal version allows every Malaysian to access some basic information in MyKad, while a more powerful version allows enforcement officers to access selected information in MyKad.
(c) Autogate. This is fixed at all immigration checkpoints in Malaysia to perform immigration clearance based on the passport information in MyKad.
Fig. 1. NRD Website, showing the MyKad Terminals
D. MyKad Issuer
This is the party that issued MyKad to Malaysians, which in this case is the National Registration Department (NRD). The NRD is in control of some information on MyKad, especially the national identification information of the cardholder.
E. MyKad Manufacturer
This refers to the GMPC Consortium [1], comprising 5 local companies: CSA (MSC) Sdn. Bhd., Dibena Enterprise Sdn. Bhd., EPNCR Sdn. Bhd., IRIS Technologies Sdn. Bhd. and Unisys (MSC) Sdn. Bhd., who are responsible for deploying MyKad. The MyKad manufacturer typically also controls the operating system which handles and interacts with the data and functions in MyKad.
F. Software Developer
This refers to the party or parties involved in producing the software within MyKad. There are commonly many different software manufacturers. For example, the software to access IC information could have been written by government agency A, while the software to access ATM information could have been written by Bank B.
III. POTENTIAL THREATS
A. Threats by the Terminal
When a MyKad holder inserts his MyKad into a MyKad terminal, he has complete faith that the terminal is authentic, and fully trusts the terminal to do what it is supposed to do. This trust could be exploited if the terminal is modifiable or can be tampered with [3]. An attacker, having modified the terminal to his advantage, could make use of the terminal to cheat, or to commit fraud or theft. For example, when withdrawing RM1000 from an ATM via a MyKad terminal, you would expect the terminal to correctly subtract that amount from the ATM account information in your MyKad. If the terminal has been tampered with, it could well subtract RM5000 from your account instead.
B. Threats by the Cardholder
In a situation where there are so many parties, you just cannot have too much trust. In the previous subsection, we discussed how terminals could be exploited by attackers. In contrast, since everyone in Malaysia would hold a MyKad, cardholders could also be potential attackers. Such an attacker would target other MyKad-related parties such as the data owner and terminal, by cheating, counterfeiting MyKads or reading the data inside them that belongs to other parties. It appears that such threats are less serious nowadays, with current smart cards using technologies that make them much harder to forge. However, there are still successful attacks, especially those carried out by the cardholder against the data owner, for example side-channel attacks such as fault analysis [5, 6], timing analysis [5, 7] and power analysis [5, 8]. In cases where MyKad is stolen, the thief is considered to be the new cardholder. The difference between an attack by the thief and an attack by the cardholder is that the thief does not have access to any secret information required to activate the card, and he has only a limited amount of time to carry out his attack before the cardholder notices that his card has been stolen. The most obvious attack a thief will attempt is a physical attack on the card itself. Physical attacks attempt to reverse engineer the card and determine the secret key. Such attacks have been demonstrated in practice against smart card chips by Boneh, DeMillo and Lipton [6], Anderson and Kuhn [9] and Paul Kocher [10].
C. Threats by the Manufacturer
This is not to say that the MyKad manufacturer would intentionally launch attacks on MyKad cardholders or data owners, but it serves to highlight what a privileged position the MyKad manufacturer is in. His responsibility is therefore heavy in developing a MyKad secure against physical attacks and a secure operating system within the MyKad. The latter is not an easy task since MyKad, being a multi-application system, would require that various applications be run at the same time, similar to multitasking environments in Windows and Unix. How, for example, would the operating system ensure that a running application does not access or even modify confidential information used by another running application? Certain restrictions would also have to be imposed on these applications, because the operating system cannot afford to offer them too much freedom and control [3]. Otherwise, an attacker could make use of these applications to launch attacks on other applications that are currently running.
D. Attacks When Two or More Parties Collaborate
The MyKad environment assumes that the various parties are independent and do not have influence over one another. If two or more parties choose to collaborate [3] and attack other parties in the MyKad environment, this becomes much easier. For example, consider a Malaysian programmer who works for one of MyKad's software developers and takes part in developing software for MyKad. Since he also has a MyKad, he would be acting as both a cardholder and a software developer. How do you guarantee that he would not insert trapdoors into the MyKad software so that it can be used to his own advantage?
IV. COUNTERMEASURES AGAINST THREATS
A. Integrity Checks on MyKad Terminals
As a possible countermeasure against attacks by terminals, MyKad terminals should have integrity checks to detect if any unauthorized modifications have been done on them. This will ensure that unauthorized modifications by attackers do not go unnoticed; once detected, such terminals would be quarantined or serviced.
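As a rough illustration of the kind of check intended here (not a description of any deployed MyKad terminal), a terminal could hash its installed firmware image and compare the digest against a reference value distributed by the issuer; the file paths and keys below are hypothetical.

```python
# Minimal sketch of a terminal self-check: hash the installed firmware image and
# compare it, in constant time, against a reference digest supplied by the issuer.
import hashlib, hmac

def firmware_digest(path: str) -> bytes:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.digest()

def terminal_is_unmodified(path: str, reference_digest: bytes) -> bool:
    return hmac.compare_digest(firmware_digest(path), reference_digest)
```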
B. Resistance Against Side-channel Attacks
Side-channel attacks are practical to mount so MyKad should be designed to be resistant against these attacks.
For example, an optical fault induction attack was recently presented by researchers from Cambridge University, using reasonably cheap equipment that could be obtained from a standard camera shop [11]. These attacks are cause for concern, and hence MyKad should be designed to resist them. Countermeasures against side-channel attacks are suggested in [5-8].
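One standard software-level defence against the timing attacks cited above is to compare secret values without data-dependent early exits; a minimal sketch:

```python
# Constant-time byte-string comparison: the loop always touches every byte, so the
# running time does not depend on where the first mismatch occurs.
def constant_time_equal(a: bytes, b: bytes) -> bool:
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0
```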
C. Restricting Privileges of MyKad Programs
The MyKad operating system should enforce stricter restrictions on the capabilities that individual MyKad programs may offer. This is to maintain a certain level of control and security, so that it would not be possible for malicious programs to circumvent the security of the operating system or be used by attackers to attack other MyKad programs.
D. Avoiding Implementation Errors
Some attacks on smart cards work by exploiting implementation errors and bugs in smart card programs. MyKad could be using AES, RSA and other high-security techniques, but if the software or hardware implementations of these techniques do not follow the specification exactly, bugs or errors might result. An attacker would then be able to exploit these errors to his advantage.
E. Combining Roles
As was noted in [3], the smart card scenario involves too many parties, sometimes causing more threats than if fewer parties were involved. A possible countermeasure suggested by [3] is to combine the roles of these parties so that, with fewer parties involved, some of the threats can be entirely eliminated. For example, if the cardholder is an enforcement officer such as a member of the Royal Malaysian Police, he would not want to sabotage the terminals to attack himself. In other cases where the cardholder owns the terminal, all attacks by the terminal against the cardholder are eliminated. Similarly, if the cardholder is also the software developer, then he has no desire to insert a Trojan to attack his own data in MyKad.
F. More Openness
Security of MyKad should not rely on secrecy of the techniques that it uses. This concept of “security via obscurity (keeping things vague and hidden)” is highly
discouraged by the security community, because if your security rests only on the technique being secret, it will only be a matter of time before the implementation of that technique is reverse engineered and the details broadcast publicly. This happened to popular security techniques such as RC4 [12]. Therefore, the various parties involved in the design and implementation of MyKad, such as IRIS and MIMOS, should be more open in their dissemination of MyKad details to the community, especially to peer security researchers. We also suggest the setting up of an independent security panel to analyse the security of MyKad. Aspects such as physical security, implementation security, protocol security and resistance against side-channel attacks are among the issues that could be studied in more detail, with the results disseminated to the Malaysian public. This would increase the confidence of Malaysians in the security provided by MyKad.
V. CONCLUSIONS
Designing a security system is easy, but to design a secure one is very hard. You can never anticipate enough what an attacker might use or where he might attack, for he always attempts to breach a system through the weakest point. We have highlighted in this paper the need for a more open approach towards analyzing the security of MyKad. It is only through the cooperative effort of security researchers, designers, analysts, and implementers that a more secure MyKad can be developed.
ACKNOWLEDGEMENTS We would like to thank K. S. Ng[13] for his many comments and discussions on MyKad, and for encouraging us to write up our ideas.
REFERENCES [1] NRD. (2006). GMPC – Government Multipurpose Card. [Online]. Available: http://www.jpn.gov.my/kppk1/Index2.htm [2] R. C.–W. Phan and L. A. Mohammed, “On the Security and Design of MyKad”, Proceedings of the 9th Asia-Pacific Conference on Communication (APCC03), Penang, Malaysia, September 2003, pp. 142-145. [3] B. Schneier and A. Shostack, “Breaking Up is Hard To Do: Modeling Security Threats for Smart Cards”,
Proceedings of USENIX, Workshop on Smart Card Technology, 1999. [4] NRD. (2006). MyKad Devices. [Online]. Available: http://www.jpn.gov.my/kppk1/GMPCdevices.htm [5] J. Kelsey, B. Schneier, D. Wagner and C. Hall, “Side Channel Cryptanalysis of Product Ciphers”, Journal of Computer Security, vol. 8, no. 2-3, pp. 141-158, 1995. [6] D. Boneh, R. A. Demillo and R. J. Lipton, “On the Importance of Checking Cryptographic Protocols for Faults”, Proceedings of Eurocrypt’97, pp. 37-51, 1997. [7] P. C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, Proceedings of Crypto’96, pp. 104-113, 1997. [8] T. S. Messerges, E. A. Dabbish, and R. H. Sloan, “Investigations of Power Analysis Attacks on Smartcards”, Proceedings of USENIX Workshop on Smartcard Technology, pp. 151-162, 1999. [9] R. Anderson and M. Kuhn, “Tamper Resistance – A Cautionary Note”, Proceedings of USENIX Workshop on Electronic Commerce, USENIX Press, pp. 1-11, 1996. [10] P.C. Kocher, “Differential Power Analysis”. [Online]. Available: http://www.cryptography.com/dpa/. [11] S. Skorobogatov and R. J. Anderson, “Optical Fault Induction Attacks”, Proceedings of Cryptographic Hardware and Embedded Systems (CHES), 2002. [12] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, U.S.A: John Wiley & Sons, 1996. [13] K. S. Ng, “Details of MyKad and iVEST”, iVEST Product Development Manager, MIMOS, Malaysia, personal communication, 2003.
Certificateless Encryption Schemes Revisited
Wun-She Yap, Swee-Huay Heng and Bok-Min Goi
Centre for Cryptography and Information Security (CCIS), Multimedia University
{wsyap, shheng, bmgoi}@mmu.edu.my
ABSTRACT
This paper helps in understanding the research work carried out in the area of certificateless public key cryptography (CLPKC) from 2003 to 2006. CLPKC can be seen as a model for the use of public key cryptography that is intermediate between Traditional Public Key Cryptography (TPKC) and Identity-Based Cryptography (IBC). We review the existing certificateless public key encryption (CLPKE) schemes and compare their efficiency and security. We then review the malicious key generation center (KGC) attack proposed by Au et al. before mounting this attack on the Liu and Au provably secure CLPKE scheme in the standard model.
1. INTRODUCTION
Cryptology can be divided into two areas: cryptography and cryptanalysis. Cryptography is the field concerned with linguistic and mathematical techniques for securing information, particularly in communications, by using keys, while the goal of cryptanalysis is to find weaknesses or insecurity in a cryptographic scheme. In this paper, we focus on one of the cryptographic primitives, encryption, since encryption is the core of cryptography. We outline our contributions below:
i. We review various types of public key encryption schemes and summarize the characteristics associated with each of them.
ii. We review the existing CLPKE schemes [2,23,14,3,7,21,4,16,10] along with their efficiency and security considerations.
The remainder of the paper is organized as follows. In Section 2, we introduce some mathematical problems which help in realizing the CLPKE schemes. In Section 3, we review various types of public key encryption schemes in the literature and draw a comparison between them. In Section 4, we review the concepts of CLPKC and its security model. In Section 5, we present the comparison among the existing CLPKE schemes in the random oracle model. In Section 6, we review the first CLPKE scheme in the standard model and show that it is insecure against the malicious key generation center attack. Finally, we conclude in Section 7.
2. MATHEMATICAL PROBLEMS
In this section, we present some mathematical problems which help in realizing CLPKE schemes. Bilinear pairing is an important primitive for many certificateless and identity-based cryptographic schemes. We describe some of its key properties below.
Notation: Throughout this paper, (G1, +) and (G2, ·) denote two cyclic groups of prime order q. A bilinear map e: G1 × G1 → G2 satisfies the following properties:
i. Bilinearity: For all P, Q, R ∈ G1, e(P+Q, R) = e(P, R) e(Q, R) and e(P, Q+R) = e(P, Q) e(P, R).
ii. Non-degeneracy: e(P, Q) ≠ 1.
iii. Computability: There is an efficient algorithm to compute e(P, Q) for any P, Q ∈ G1.
Note that a bilinear map is symmetric, i.e. e(aP, bP) = e(bP, aP) = e(P, P)^{ab} for a, b ∈ Zq*.
Bilinear Diffie-Hellman (BDH) Problem [5]: Let G1, G2, P and e be as above. The BDHP in (G1, G2, e) is as follows: given (P, aP, bP, cP) with uniformly random choices of a, b, c ∈ Zq*, find e(P, P)^{abc} ∈ G2.
Generalized Bilinear Diffie-Hellman (GBDH) Problem [2]: Let G1, G2, P and e be as above. The GBDHP in (G1, G2, e) is as follows: given (P, aP, bP, cP) with uniformly random choices of a, b, c ∈ Zq*, output a pair (Q ∈ G1, e(P, Q)^{abc} ∈ G2).
k-Decision Bilinear Diffie-Hellman Inversion (k-DBDHI) Problem [6]: Let G1, G2, P and e be as above. The k-DBDHIP in (G1, G2, e) is as follows: given (P, xP, x^2 P, …, x^k P) with a uniformly random choice of x ∈ Zq*, output e(P, P)^{1/x} ∈ G2.
Computational Diffie-Hellman (CDH) Problem [4]: Let G1, G2, P and e be as above. The CDHP is as follows: given (P, aP, bP) for uniformly chosen a, b ∈ Zq*, compute abP.
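To make the bilinearity property and the BDH statement above concrete, the following deliberately insecure toy represents elements of G1 by their known exponents and computes the pairing directly from them; all parameters are toy values and nothing here reflects a real pairing implementation.

```python
# Toy model of a symmetric pairing: an element a*P of G1 is represented by the
# integer a mod q (so its "discrete log" is known), G2 is the order-q subgroup of
# Z_p*, and e(a*P, b*P) is computed directly as gT^(a*b). A real pairing hides a, b.
import secrets

q = 101                        # toy prime group order (far too small to be secure)
p = 607                        # prime with q dividing p - 1
gT = pow(3, (p - 1) // q, p)   # generator of the order-q subgroup of Z_p*

def e(aP: int, bP: int) -> int:
    # Pairing evaluated on the exponent representations of its G1 arguments.
    return pow(gT, (aP * bP) % q, p)

P = 1                                           # the generator itself is 1*P
a, b, c = (secrets.randbelow(q - 1) + 1 for _ in range(3))

# Bilinearity / symmetry: e(aP, bP) = e(bP, aP) = e(P, P)^(ab).
assert e(a * P % q, b * P % q) == pow(e(P, P), (a * b) % q, p)

# A BDH instance is (P, aP, bP, cP); the value to be computed is e(P, P)^(abc).
instance = (P, a * P % q, b * P % q, c * P % q)
bdh_solution = pow(e(P, P), (a * b * c) % q, p)
```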
3. VARIOUS PUBLIC KEY ENCRYPTION SCHEMES
3.1 Traditional Public Key Encryption (TPKE)
Traditional public key encryption schemes such as RSA [19], ECC [18] and NTRU [13] were introduced in order to solve the key distribution problem of symmetric cryptography. In TPKE, the sender encrypts a message with the receiver's public key while the
receiver decrypts the ciphertext with his own private key. The public key in TPKE is critical to its security, so a certificate is needed to authenticate the public key and prevent impersonation attacks. The need for certificates in TPKE results in the certificate revocation problem. Several methods, such as certificate revocation lists (CRL), the online certificate status protocol (OCSP) and Novomodo [17], have been proposed to address the certificate management issue. However, the search for a satisfactory solution continues. Thus, the design of a secure and efficient public key encryption scheme without certificates has become the goal of many cryptographers.
3.2 Identity-Based Encryption (IBE)
Identity-based cryptography was introduced by Shamir [20] in 1984 to achieve implicit certification. In IBE, each client has his own identity (ID), which can be an arbitrary string. The ID is used as a certified public key, so a certificate can be omitted when authenticating the public key. To overcome the key revocation problem, the sender encrypts a message using the receiver's ID concatenated with a date, such as [email protected]||18102005. Since all private keys are generated by a trusted third party (TTP) called the Private Key Generator (PKG), the key escrow problem is inherent in this system and the PKG can easily decrypt the client's messages. Besides, the PKG must send the client the corresponding private key over a secure channel. Although the concept of identity-based cryptography was proposed in 1984, the first practical IBE scheme, due to Boneh and Franklin [5] and based on bilinear pairing, emerged only in 2001.
3.3 Self-Certified Public Key Encryption (SCPKE)
The self-certified public key concept was introduced by Girault [12] in 1991 to achieve implicit certification in public key cryptography. This model combines the advantages of the traditional public key and identity-based models. The trick is that the public key is computed by both the authority and the user, so that the certificate is embedded in the public key itself. However, SCPKE is different from IBE because the authority does not know the corresponding private key, which is chosen by the user himself. Self-certified public keys reduce the amount of storage and computation in public key schemes, and no extra certificate is needed to certify the public key. Although SCPKE seems better than IBE, there is no provably secure SCPKE scheme in the literature.
3.4 Certificateless Public Key Encryption (CLPKE)
In order to solve the inherent key escrow problem of identity-based cryptography while keeping implicit certification, a new paradigm called certificateless public key cryptography was introduced by Al-Riyami and Paterson [2]. In this new system, a user's public key is no longer an arbitrary string which is generated by a TTP; rather, it is similar to the public key used in traditional public key cryptography. Thus, the inherent key escrow problem can be solved. At the same time, the implicit certification of CLPKC is achieved by using a partial private key which is generated by a TTP called the Key Generation Center (KGC). Notice that CLPKC can be constructed by using IBC and TPKC [23].
3.5 Certificate-Based Encryption (CBE)
In 2003, Gentry [11] introduced the new notion of certificate-based encryption (CBE). In this model, a certificate acts not only as a certificate but also as a decryption key. To decrypt a message, the receiver needs both his private key and an up-to-date certificate from his Certification Authority (CA). Thus, CBE solves the inherent key escrow problem of IBE while preserving the merit of implicit certification. Besides, CBE does not require a secure channel between the two entities. However, CBE has not received as much attention as CLPKE.
3.6 Comparison of Various Public Key Encryption Schemes
Table 1 summarizes the characteristics associated with the various public key encryption schemes. In the table, “Trust Level” refers to the following three levels of trust as defined by Girault [12]:
i. [Level 1] The authority knows (or can easily compute) the user's private keys and is capable of impersonating any user without being detected.
ii. [Level 2] The authority does not know the private keys, but it can still impersonate any user by generating false certificates that may be used without being detected.
iii. [Level 3] The authority does not know (and cannot compute) the private keys, and if it generates false certificates for users, this can be proven.
It is obvious that in schemes of levels 1 and 2 the authority must be fully trusted by the users, while in schemes of level 3 the authority is considered a potentially powerful adversary that may use any means to impersonate the users. Notice that IBE achieves only level 1 due to the inherent key escrow problem, while the rest of the schemes achieve either level 2 or level 3. By “Guarantee” we refer to the component that acts as a “certificate” in order to achieve either explicit or implicit certification.
Table 1: Comparison of PKE Schemes

                                 TPKE         IBE          SCPKE        CLPKE          CBE
Authority/TTP                    CA           PKG          TTP          KGC            CA
Certification                    Explicit     Implicit     Implicit     Implicit       Implicit
Key Escrow                       No           Yes          No           No             No
Trust Level                      3            1            3            2/3            3
Guarantee                        Certificate  Private Key  Public Key   Private Key    Decryption Key
Communication Channel with TTP   Public       Private      Public       Private        Public

4. CERTIFICATELESS PUBLIC KEY ENCRYPTION
Encryption is considered an important security primitive in CLPKC. Many certificateless public key encryption (CLPKE) schemes [2,23,14,3,7,21,4,16,10] have been proposed since CLPKC was introduced in 2003. The current trend in e-commerce has increased the dependence of both organizations and individuals on sensitive information stored and communicated electronically using computer systems. This has spurred a need to guarantee the confidentiality, authenticity and integrity of data and users. Thus we see the importance of CLPKC schemes in guaranteeing confidentiality and authenticity without using certificates. First, we review the general structure and security model for CLPKE [2].

4.1 Framework of CLPKE
A CLPKE scheme is specified by the following seven algorithms:
i. Setup is a probabilistic algorithm that takes a security parameter k as input and returns the system parameters params and master-key.
ii. Partial-Private-Key-Extract is a deterministic algorithm that takes params, master-key and an identifier for entity A, IDA ∈ {0, 1}*, as inputs. It returns a partial private key DA.
iii. Set-Secret-Value is a probabilistic algorithm that takes params as input and outputs a secret value xA.
iv. Set-Private-Key is a deterministic algorithm that takes params, DA and xA as inputs. The algorithm returns a (full) decryption key SA.
v. Set-Public-Key is a deterministic algorithm that takes params and xA as inputs and outputs a public key PA.
vi. Encrypt is a probabilistic algorithm that takes params, PA, IDA and a message m as inputs and returns either a ciphertext c or the null symbol ⊥ indicating an encryption failure.
vii. Decrypt is a deterministic algorithm that takes params, c and SA as inputs. It returns a message m or the null symbol ⊥ indicating a decryption failure.
In a CLPKE scheme, the Setup and Partial-Private-Key-Extract algorithms are performed by the KGC. A partial private key DA is transmitted to the user A by the KGC through a secure channel. A CLPKE scheme solves the inherent key escrow problem of IBE since Set-Secret-Value, Set-Private-Key and Set-Public-Key are executed by the user A itself. In order to encrypt a message m for A, a sender runs the Encrypt algorithm with inputs params, PA, IDA and m. Finally, A can decrypt the ciphertext by running the Decrypt algorithm with inputs params, c and SA.
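The seven algorithms above can be summarised as an abstract interface; the sketch below uses illustrative Python signatures, with bytes standing in for group elements, and does not correspond to any concrete scheme.

```python
# Minimal interface sketch of the seven CLPKE algorithms listed above.
from abc import ABC, abstractmethod
from typing import Optional, Tuple

class CLPKE(ABC):
    @abstractmethod
    def setup(self, k: int) -> Tuple[bytes, bytes]:
        """Return (params, master_key); run by the KGC."""
    @abstractmethod
    def partial_private_key_extract(self, params: bytes, master_key: bytes, ID: str) -> bytes:
        """Return D_A; run by the KGC and sent to the user over a secure channel."""
    @abstractmethod
    def set_secret_value(self, params: bytes) -> bytes:
        """Return the user-chosen secret value x_A."""
    @abstractmethod
    def set_private_key(self, params: bytes, D_A: bytes, x_A: bytes) -> bytes:
        """Return the full decryption key S_A."""
    @abstractmethod
    def set_public_key(self, params: bytes, x_A: bytes) -> bytes:
        """Return the public key P_A (no certificate is attached)."""
    @abstractmethod
    def encrypt(self, params: bytes, P_A: bytes, ID: str, m: bytes) -> Optional[bytes]:
        """Return a ciphertext c, or None on failure."""
    @abstractmethod
    def decrypt(self, params: bytes, c: bytes, S_A: bytes) -> Optional[bytes]:
        """Return the plaintext m, or None on failure."""
```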
4.2 Security Model
As defined in [2], the security of a CLPKE scheme is analyzed by considering two types of adversary: Type I (AI) and Type II (AII). AI represents a dishonest user who attacks the CLPKE scheme; since no certificate is needed to authenticate the user public key, AI is allowed to replace user public keys at will. On the other hand, AII represents a malicious KGC: AII is equipped with the KGC's master-key but cannot replace user public keys. We therefore distinguish between the two types of adversary with different capabilities:
a. CLPKE Type I Adversary: Adversary AI does not have access to master-key. However, AI may request public keys and replace public keys, extract partial private and private keys, and make decryption queries. There are several natural restrictions on such a Type I adversary:
i. AI cannot extract the private key for ID* at any point.
ii. AI cannot request the partial private key for any identity if the corresponding public key has already been replaced.
iii. AI cannot both replace the public key for the challenge identity ID* before the challenge phase and extract the partial private key for ID* in some phase.
iv. In Phase 2, AI cannot make a decryption query on the challenge ciphertext c* for the combination of identity ID* and public key P* that was used to encrypt mb.
b. CLPKE Type II Adversary: Adversary AII does have access to master-key, but may not replace public keys of entities. Adversary AII can compute partial private keys itself, request public keys, and make private key extraction queries and decryption queries, for identities of its choice. The restrictions on this type of adversary are:
i. AII cannot replace public keys at any point.
ii. AII cannot extract the private key for ID* at any point.
iii. In Phase 2, AII cannot make a decryption query on the challenge ciphertext c* for the combination of identity ID* and public key P* that were used to encrypt mb.
An attack model specifies how much information a cryptanalyst has access to when attempting to decrypt an encrypted message. In analyzing the security of a CLPKE scheme, we consider only two attack models: adaptive chosen plaintext attack (IND-CPA) and adaptive chosen ciphertext attack (IND-CCA2).
IND-CCA2 for CLPKE: A CLPKE scheme is semantically secure against IND-CCA2 if no polynomially bounded adversary A of Type I or Type II has a non-negligible advantage against the challenger in the following game:
Setup: The challenger C takes a security parameter k and runs the Setup algorithm. It gives A the resulting system parameters params. If A is of Type I, the challenger keeps master-key to itself; otherwise, it gives master-key to A.
Phase 1: A issues a sequence of requests, each being either a partial private key extraction, a private key extraction, a request for a public key, a replace-public-key command or a decryption query for a particular entity. These queries may be asked adaptively, but they are subject to the rules on adversary behaviour defined above.
Challenge Phase: Once A decides that Phase 1 is over, it outputs the challenge identity ID* and two equal-length plaintexts m0, m1. Again, the adversarial constraints given above apply. The challenger now picks a random bit b ∈ {0, 1} and computes c*, the encryption of mb under the current public key P* for ID*. If the output of the encryption is ⊥, then A has immediately lost the game. Otherwise, c* is delivered to A.
Phase 2: A issues a second sequence of requests as in Phase 1, again subject to the rules on adversary behaviour above.
Guess: Finally, A outputs a guess b' ∈ {0, 1}. The adversary wins the game if b = b'. We define A's advantage in this game to be Adv(A) := 2(Pr[b = b'] − 1/2).
IND-CPA for CLPKE: This is similar to IND-CCA2 except that the adversary is more limited: the attack game is identical to the IND-CCA2 game except that the adversary cannot make any decryption queries. The advantage of an IND-CPA adversary A against the CLPKE scheme is defined similarly.
5. SURVEY ON EXISTING CLPKE SCHEMES IN THE RANDOM ORACLE MODEL
In this section, we review some recently proposed CLPKE schemes in the random oracle model and study their efficiency and security accordingly. Basically, a cryptographic method is said to be provably secure if the difficulty of defeating it can be shown to be essentially as difficult as solving a well-known and supposedly difficult (typically number-theoretic) problem.
The Al-Riyami-Paterson Scheme (2003) [2]
Al-Riyami and Paterson introduced the notion of CLPKC. Their scheme was derived from bilinear pairings on elliptic curves, based on the Boneh-Franklin IBE scheme [5], and achieves IND-CCA2 security in the random oracle model based on the GBDH assumption. However, Galindo [9] showed that there is a flaw in the security reduction exhibited for this scheme. Moreover, the GBDH problem is less well-established and no harder than the BDH problem. In this scheme, they also introduced an alternative key generation technique which enhances the resilience of a CLPKE scheme against a cheating KGC and allows for non-repudiation of certificateless signatures by binding the public key with the ID before hashing it. It reduces the degree of trust that users need to have in the KGC.
The Yum-Lee Scheme (2004) [23]
Yum and Lee proposed a generic construction of certificateless encryption using both IBE and TPKE schemes, and presented the relation between TPKE, IBE and CLPKE. They claimed that this construction is chosen ciphertext secure if the underlying IBE and TPKE schemes are secure against chosen ciphertext attacks; however, no formal proof was given. Recently, this scheme has been proven insecure both in the full model provided by Al-Riyami and Paterson and even in the weaker model provided in [23].
The Lee-Lee Scheme (2004) [14]
Lee and Lee proposed an authenticated certificateless public key encryption scheme which improves the efficiency and security of the original Al-Riyami-Paterson scheme [2]. This scheme is IND-CCA2 secure under the BDH assumption in the random oracle model.
The Al-Riyami-Paterson Scheme (2005) [3]
Al-Riyami and Paterson presented a new CLPKE scheme resulting from a double encryption construction, using the Boneh-Franklin IBE scheme [5] and ElGamal public key encryption as components. This scheme achieves IND-CCA2 security in the random oracle model based on the BDH assumption. Even though it is based on a stronger assumption compared with the original Al-Riyami-Paterson scheme [2], this scheme is more efficient. However, Zhang and Feng [24] recently showed that the scheme is vulnerable to adaptive chosen ciphertext attacks, and they presented a countermeasure to overcome this security flaw.
The Cheng-Comley Scheme (2005) [7]
Cheng and Comley presented another scheme combining the Boneh-Franklin IBE and the ElGamal public key encryption scheme. This scheme is IND-CCA2 secure based on the BDH assumption. Its advantage is that it involves fewer pairings and hash functions, and its public key length is also shorter than that of the Al-Riyami-Paterson 2005 scheme [3]. Besides,
this scheme can be extended to provide authenticated encryption.
The Shi-Lee Scheme (2005) [21]
Shi and Lee proposed another scheme, based on the Sakai-Kasahara IBE scheme [6] and the Yum-Lee scheme [23]. They claimed that their scheme achieves IND-CCA2 security in the random oracle model; however, they only managed to prove that it is IND-CPA secure in the random oracle model based on the intractability of the k-BDHI problem. This scheme is better than the above schemes since no pairing computation is needed during encryption.
The Baek-Safavi-Naini-Susilo Scheme (2005) [4]
Baek, Safavi-Naini and Susilo presented a CLPKE scheme without using bilinear pairing. This scheme is IND-CCA2 secure under the CDH assumption in the random oracle model, and it is the first scheme which does not need any pairing computation. However, this scheme differs from other CLPKE schemes in that a user needs to possess a partial private key before generating a public key.
The Libert-Quisquater Scheme (2006) [16]
Libert and Quisquater proposed another scheme based on the Sakai-Kasahara IBE scheme [6]. They proved that their scheme achieves IND-CCA2 security in the random oracle model based on the intractability of the k-BDHI problem. Similar to the Shi-Lee scheme, no pairing computation is needed during encryption. Besides, Libert and Quisquater showed that the Yum-Lee generic construction of CLPKE is insecure in the model defined by Al-Riyami and Paterson [2].
The Galindo-Morillo-Rafols Scheme (2006) [10]
Galindo et al. pointed out that the double encryption technique [8] used by Yum and Lee in [23] is insecure even in the weaker security model considered in [23]. Besides, they showed that the generic construction of certificate-based encryption proposed by Yum and Lee is insecure too.
Generally, there are three major cost operations involved: pairing (p), scalar multiplication (s) and exponentiation (e). Table 2 shows a comparison of the existing CLPKE schemes based on computation complexity, public key length and underlying assumption (considering pre-computation).

Table 2: Comparison of CLPKE Schemes

Scheme     Encrypt      Decrypt    Public Key   Assumption
AP [1]     1p+1s+1e     1p+1s      2            GBDH
LL [13]    1p+2s+1e     1p+1s      2            BDH
AP [2]     1p+2s+1e     1p+2s      1            BDH
CC [6]     1p+2s+1e     1p+2s      1            BDH
SL [19]    3s+1e        1p+2s      1            k-BDHI
BNS [3]    4e           3e         2            CDH
LJ [14]    3s+2e        1p+3s      1            k-BDHI
6. CLPKE SCHEME IN THE STANDARD MODEL
Recently, Liu and Au proposed the first provably secure CLPKE scheme in the standard model [15], based on the intractability of the decisional bilinear Diffie-Hellman (DBDH) problem and derived from the Waters IBE scheme [22]. However, this scheme is vulnerable to the malicious KGC attack first pointed out by Au et al. in [1]. In this section, we first review the Liu and Au CLPKE scheme [15] before showing the insecurity of this scheme.

6.1 The Liu and Au CLPKE Scheme
We describe only the Setup, User-Key-Generation and Encrypt algorithms, as these suffice to show that the scheme is vulnerable to the malicious KGC attack. Let S(pub) denote an encapsulation scheme and Mac(sk, m) a message authentication scheme. For more details, the reader may refer to [15].
Setup: Select a pairing e: G1 × G1 → G2, where the order of G1 is p, and let g be a generator of G1. Randomly select s ∈ Zp and g2 ∈ G1, and compute g1 = g^s. Compute pub = Init(1^k). Also, randomly select the following elements: u', g1', h1 ∈ G1 and ŭi ∈ G1 for i = 1, …, n; let Ŭ = {ŭi}. The public parameters params are (e, G1, G2, g, g1, g2, u', g1', h1, Ŭ, pub) and the master secret key is g2^s.
User-Key-Generation: The user selects a secret value x ∈ Zp as his secret key sk and computes his public key as (g^x, g1^x) = (pk(1), pk(2)).
Encrypt: To encrypt a message m ∈ G2 for an identity ID and public key (pk(1), pk(2)), first check whether pk(1), pk(2) ∈ G1 and e(pk(1), g1) = e(pk(2), g). If not, output reject and abort the encryption. Otherwise, run S(pub) to obtain (r, com, dec) and set M = m||dec, assuming there exists a representation of M in G2. Randomly select t ∈ Zp, compute U = u' ∏_{i∈U} ŭi, and set C1 = e(pk(2), g2)^t · M, C2 = g^t, C3 = U^t and C4 = (g1'^com · h1)^t. Let C = (C1, C2, C3, C4) and compute tag = Mac(r, C). The ciphertext is (C, com, tag).
6.2 Malicious KGC Attack on the Liu and Au CLPKE Scheme
Au et al. [1] recently showed that the Al-Riyami and Paterson CLPKE scheme [2] is vulnerable to the malicious KGC attack; in other words, the KGC can obtain the user secret key by manipulating the Setup algorithm. Au et al. [1] stated that the KGC randomly
chooses a ∈ Zp and computes g = H(ID*)^a, where ID* belongs to the victim user targeted by the KGC. The rest follows the original algorithms. Suppose the public key published by the user with identity ID* is (X_{ID*} = g^x, Y_{ID*} = g^{sx}). From the user public key, the KGC can
compute the user secret key as (Y_{ID*})^{a^{-1}}. Thus, the KGC can decrypt and sign any message at will. At first glance, the claim of Au et al. [1] that the Al-Riyami and Paterson CLPKE scheme [2] is insecure against the malicious KGC attack seems true. Nevertheless, we point out some issues concerning this malicious KGC attack. The KGC must first know the victim user identity ID* before generating g = H(ID*)^a, but the problem lies in the fact that the KGC does not know ID* in the real world, since such an ID* has not yet been created. Besides, another merit of CLPKC is that a sender can encrypt a message to a receiver even though the receiver has yet to register any ID with the KGC. Thus, the probability that ID* = ID is negligible. Hence, based on the above observation, the Al-Riyami and Paterson CLPKE scheme [2] is not vulnerable to the malicious KGC attack as claimed by Au et al. [1]. However, we show that the Liu and Au CLPKE scheme [15] does suffer from the malicious KGC attack, as follows. The KGC performs the following steps:
i. Select g2 ∈ G1.
ii. Select a ∈ Zp and compute g = g2^a.
iii. Select s ∈ Zp and set g1 = g^s.
Suppose the public key published by the user with identity ID* is (X_{ID*} = g^x, Y_{ID*} = g^{sx}). From the user public key, the KGC can compute g2^{xs} = (Y_{ID*})^{a^{-1}}. Thus, the KGC can decrypt any message in [15], since M = C1 · e(g^t, g2^{sx})^{-1}. As a conclusion, the Liu and Au CLPKE scheme [15] is insecure against AII, and thereby the key escrow problem is inherent.
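The algebra of this attack can be checked numerically with the same kind of toy model used earlier for pairings: elements of G1 are stored as their exponents (so the "discrete logs" are known) and e(h^u, h^v) = gT^{uv}. The sketch below only verifies the exponent arithmetic of the attack; it is not an implementation of the Liu-Au scheme, and all parameters are toy values.

```python
# Toy numeric check of the malicious-KGC algebra above.
import secrets

q = 101                        # toy prime group order (insecure, illustration only)
p = 607                        # prime with q dividing p - 1
gT = pow(3, (p - 1) // q, p)   # generator of the order-q subgroup of Z_p*

def e(u: int, v: int) -> int:
    # Pairing on exponent representations: e(h^u, h^v) = gT^(u*v).
    return pow(gT, (u * v) % q, p)

def rand() -> int:
    return secrets.randbelow(q - 1) + 1

# Malicious KGC setup: g2 = h, g = g2^a, g1 = g^s (elements stored as exponents of h).
a, s = rand(), rand()
g2 = 1
g = (g2 * a) % q
g1 = (g * s) % q

# Honest user: secret x, public key (pk1, pk2) = (g^x, g1^x).
x = rand()
pk1, pk2 = (g * x) % q, (g1 * x) % q

# The KGC recovers g2^(x*s) from the public key alone, using a^(-1) mod q.
a_inv = pow(a, -1, q)                       # requires Python >= 3.8
g2_xs = (pk2 * a_inv) % q
assert g2_xs == (x * s) % q

# Consequence: the mask e(pk2, g2)^t on C1 can be stripped, recovering M.
t = rand()
M = pow(gT, rand(), p)                      # a random "message" in G2
C1 = (pow(e(pk2, g2), t, p) * M) % p
recovered = (C1 * pow(e((g * t) % q, g2_xs), -1, p)) % p
assert recovered == M
```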
7. CONCLUSION
We reviewed the existing CLPKE schemes and compared their efficiency and security. Besides, we showed that the only provably secure CLPKE scheme in the standard model is insecure against the malicious KGC attack. It remains a challenging open problem to find a practical and efficient CLPKE scheme which is provably secure in the random oracle model with a tighter security reduction.
REFERENCES
[1] M.H. Au, J. Chen, J.K. Liu, Y. Mu, D.S. Wong and G. Yang, “Malicious KGC Attack in Certificateless Cryptography”, Cryptology ePrint Archive, Report 2006/255. http://eprint.iacr.org/2006/255.
[2] S.S. Al-Riyami and K.G. Paterson, “Certificateless Public Key Cryptography”, ASIACRYPT 2003, LNCS 2894, pp. 452-473, Springer-Verlag, 2003.
[3] S.S. Al-Riyami and K.G. Paterson, “CBE from CL-PKE: A Generic Construction and Efficient Schemes”, PKC 2005, LNCS 3386, pp. 398-415, Springer-Verlag, 2005.
[4] J. Baek, R. Safavi-Naini and W. Susilo, “Certificateless Public Key Encryption Without Pairing”, ISC 2005, LNCS 3650, pp. 134-148, Springer-Verlag, 2005.
[5] D. Boneh and M. Franklin, “Identity-Based Encryption from the Weil Pairing”, CRYPTO 2001, LNCS 2139, pp. 213-229, Springer-Verlag, 2001.
[6] L. Chen and Z. Cheng, “Security Proof of Sakai-Kasahara's Identity-Based Encryption Scheme”, IMA 2005, LNCS 3796, pp. 442-459, Springer-Verlag, 2005.
[7] Z.H. Cheng and R. Comley, “Efficient Certificateless Public Key Encryption”, Cryptology ePrint Archive, Report 2005/012, 2005. http://eprint.iacr.org/2005/012.
[8] Y. Dodis and J. Katz, “Chosen-Ciphertext Security of Multiple Encryption”, TCC 2005, LNCS 3378, pp. 188-209, Springer-Verlag, 2005.
[9] D. Galindo, “Boneh-Franklin Identity Based Encryption Revisited”, ICALP 2005, LNCS 3580, pp. 791-803, Springer-Verlag, 2005.
[10] D. Galindo, P. Morillo and C. Rafols, “Breaking Yum and Lee Generic Constructions of Certificateless and Certificate-Based Encryption Schemes”, EuroPKI 2006, LNCS 4043, pp. 81-91, Springer-Verlag, 2006.
[11] C. Gentry, “Certificate-Based Encryption and the Certificate Revocation Problem”, EUROCRYPT 2003, LNCS 2656, pp. 272-293, Springer-Verlag, 2003.
[12] M. Girault, “Self-Certified Public Keys”, EUROCRYPT 1991, LNCS 547, pp. 490-497, Springer-Verlag, 1991.
[13] J. Hoffstein, J. Pipher and J.H. Silverman, “NTRU: A Ring-Based Public Key Cryptosystem”, ANTS-III, LNCS 1423, pp. 267-288, Springer-Verlag, 1998.
[14] Y.R. Lee and H.S. Lee, “An Authenticated Certificateless Public Key Encryption Scheme”, Cryptology ePrint Archive, Report 2004/150, 2004. http://eprint.iacr.org/2004/150.
[15] J.K. Liu and M.H. Au, “Self-Generated-Certificate Public Key Cryptosystem”, Cryptology ePrint Archive, Report 2006/194. http://eprint.iacr.org/2006/194.
[16] B. Libert and J. Quisquater, “On Constructing Certificateless Cryptosystems from Identity-Based Encryption”, PKC 2006, LNCS 3958, pp. 474-490, Springer-Verlag, 2006.
[17] S. Micali, “Novomodo: Scalable Certificate Validation and Simplified PKI Management”, PKI Research Workshop 2002, pp. 15-27, 2002.
[18] V.S. Miller, “Use of Elliptic Curves in Cryptography”, CRYPTO 1985, LNCS 218, pp. 417-426, Springer-Verlag, 1985.
[19] R.L. Rivest, A. Shamir and L. Adleman, “A Method for Obtaining Digital Signatures and Public Key Cryptosystems”, Communications of the ACM, vol. 21, no. 2, pp. 120-126, 1978.
[20] A. Shamir, “Identity Based Cryptosystems and Signature Schemes”, CRYPTO 1984, LNCS 196, pp. 47-53, Springer-Verlag, 1984.
[21] Y. Shi and J. Li, “Provable Efficient Certificateless Public Key Encryption”, Cryptology ePrint Archive, Report 2005/287. http://eprint.iacr.org/2005/287.
[22] B. Waters, “Efficient Identity-Based Encryption without Random Oracles”, EUROCRYPT 2005, LNCS 3494, pp. 114-127, Springer-Verlag, 2005.
[23] D.H. Yum and P.J. Lee, “Generic Construction of Certificateless Encryption”, ICCSA 2004, LNCS 3043, pp. 802-811, Springer-Verlag, 2004.
[24] Z. Zhang and D. Feng, “On the Security of a Certificateless Public Key Encryption”, Cryptology ePrint Archive, Report 2005/426. http://eprint.iacr.org/2005/426.
Analysis of an SVD-based Watermarking Scheme
Grace C.-W. Ting, Bok-Min Goi and Swee-Huay Heng
Abstract— The technique of singular value decomposition (SVD) has been applied to watermarking schemes for some years now. One of the most popularly cited works is due to Liu and Tan, but a recent attack by Zhang and Li casts doubt on its security. However, the Zhang-Li attack is purely mathematical, and no attack scenario or application was given. In this paper, we show that although the Zhang-Li attack is mathematically correct, it is flawed in the sense that it is fundamentally infeasible in an SVD-based watermarking scenario. In particular, it relies on non-standard assumptions that fall outside the original security proofs by Liu-Tan. We give a watermarking attack scenario for such attacks and then present an improved attack that is more reasonable and within the proof assumptions made by Liu-Tan.
Keywords: Robust watermarking, SVD, rightful ownership, non-invertibility, invertible attack, ambiguity, flaw.

Grace Ting is with the Information Security Research (iSECURES) Lab, Swinburne University of Technology (Sarawak Campus), 93576 Kuching, Malaysia. E-mail: [email protected]. Bok-Min Goi is with the Center for Cryptography and Information Security (CCIS), Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia. E-mail: [email protected]. Swee-Huay Heng is with the Center for Cryptography and Information Security (CCIS), Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia. E-mail: [email protected].

I. INTRODUCTION
Digital watermarking [12], [13], [15], [5] is a common technique to provide copyright protection, proof of ownership and traitor tracing of digital content. In the particular case of copyright protection and proof of ownership, an owner-specific watermark is imperceptibly embedded into the content. Successful extraction of this watermark proves ownership. The singular value decomposition (SVD) technique, known to exhibit desirable properties for image processing [4], has in the past years been applied to the watermarking case [8], [17]. The most notable SVD-based watermarking scheme is due to Liu and Tan [8], but a recent attack by Zhang and Li [17] appears to show that the Liu-Tan scheme is insecure. However, we remark that the “attack” was purely mathematical in nature, without being related to a conventional watermarking application scenario. In fact, we show in this paper that the Zhang-Li attack is fundamentally flawed in the sense that it relies on infeasible assumptions that do not fall within the original security proofs of Liu-Tan. We also give a practical watermarking scenario in which such attacks may be feasible, and then present an improved attack that indeed contradicts the security proofs by Liu-Tan, thus showing their scheme does not achieve rightful ownership contrary to their claims.
In section 2, we discuss basic definitions and notations, and then briefly review the SVD technique. The SVD-based watermarking scheme by Liu-Tan is discussed in section 3. In section 4, we concentrate on rightful ownership proofs.
Particularly, we first consider the security proofs by Liu-Tan. We then analyze the Zhang-Li attack and show how it is flawed. We also present an improved attack that directly counters the original Liu-Tan proofs. We conclude in section 5.

II. DEFINITIONS AND NOTATIONS
Let A denote the cover image over which an owner wishes to lay his ownership claim by embedding a watermark W. This results in a watermarked image A_W. E is the encoder function that takes as input the cover image A and watermark W, and generates the watermarked image A_W:
E(A, W) = A_W.  (1)
A decoder function D takes as input the (possibly corrupted) watermarked image A_W, and optionally the cover image A, to extract from A_W the watermark W* or an evidence f(W*) of its presence in A_W. In the case where decoding involves the cover image A, we have:
D(A_W, A) = W*,  (2)
then the watermarking scheme is non-blind. In the case where decoding does not involve the cover image A, we have:
D(A_W) = W*,  (3)
and the watermarking scheme is blind. (Some authors have used the terms public and private to refer to non-blind and blind watermarking schemes respectively, but we do not use these terms to avoid confusion with watermarking schemes that use public-key cryptography [9], [11].) A comparator function C_δ(·) is supplied at its input the original watermark W and the extracted watermark W*, and outputs a binary decision on whether there is a match:
C_δ(W, W*) = 1 if c ≥ δ, and 0 otherwise,  (4)
where c is the correlation of the two watermarks.
Definition 1: A watermarking scheme is the triple (E, D, C_δ) where E is the encoder function, D the decoder function and C_δ the comparator function.
Definition 2: A watermarking scheme (E, D, C_δ) is invertible [6] if for any image A_W there exists a mapping E^{-1} such that:
1) E^{-1}(A_W) = (A_F, W_F),
2) E(A_F, W_F) = A_W,
3) C_δ(D(A_W), W_F) = 1,
where E^{-1} is a computationally feasible mapping, W_F belongs to the set of allowable watermarks, and the fake cover
image A_F is perceptually similar to the authentic (original) watermarked image A_W. Statement (1) simply means that given a watermarked image A_W, it is computationally feasible to find a fake watermark W_F and a fake cover image A_F, perceptually similar to A_W, such that by Statement (2) the embedding of the fake watermark W_F into the fake cover image A_F would result in the authentic watermarked image A_W, and such that by Statement (3) the fake watermark W_F would be correlated to the original watermark extracted from A_W. This sums up to say that an invertible watermarking scheme allows an attacker to find a fake watermark and a fake cover image that can be used to make equal ownership claims to the authentic watermarked image. This is clearly undesirable and causes the problem of rightful ownership [6].
Meanwhile, every real matrix A (as is an image) can be decomposed into a product of three matrices, A = U · Σ · V^T, where U and V are orthogonal matrices, U^T · U = I, V^T · V = I, and Σ = diagonal(λ1, λ2, …); here U^T denotes the conjugate transpose of U. This is known as singular value decomposition (SVD) [4]. The diagonal entries of Σ are called the singular values (SVs) of A, and the columns of U (respectively V) are called the left (respectively right) singular vectors of A. Applying SVD in watermarking [8] takes advantage of the fact that large singular values do not vary much after going through common image processing transformations, and thus they are used to embed the watermark information. This complicates the task of an attacker trying to distort, modify or remove the watermark, as doing so would affect the singular values, which would in turn result in a seriously distorted image.

III. LIU-TAN SVD-BASED WATERMARKING SCHEME
The SVD-based watermarking scheme by Liu and Tan [8] performs the watermark embedding as follows:
E1. Do SVD on A,
SVD(A) = [U, S, V].  (5)
E2. Add S to α · W to obtain D, D = S + α · W;
(6)
here α denotes the scale factor. E3. Do SVD on D,
D2. Recombine SW with the supplied UW , VW to form D, T D = UW · SW · VW .
(10)
D3. Subtract S from D, D − S = α · W ∗, ∗
(11) ∗
thus W is obtained. With the obtained W , the correlation between this extracted watermark W and the original watermark is then computed. IV. R IGHTFUL OWNERSHIP P ROOFS The issue of resolving rightful ownership was introduced by Craver et al. [6] in 1998. The basic idea behind using watermarks to prove ownership of an image, A is that a person who produces a watermark W that matches the extracted watermark from the watermarked image AW , is considered the legitimate owner of the content. However, if multiple ownership claims are made by more than one person (as would be the case for invertible watermarking schemes), how do we resolve this rightful ownership? As a solution to the rightful ownership problem, Craver et al. defined the notion of noninvertibility to (informally) mean that it is computationally infeasible for an attacker to find a fake image AF and a fake watermark WF , such that the pair can result in the same watermarked image AW created by the real owner. This was the approach used by Liu-Tan, i.e. to prove that their scheme provided rightful ownership, they proved that it is noninvertible, namely that the following objective is achieved: An attacker with access to the watermarked image AW is unable to compute a fake cover image AF and a fake watermark, WF such that the fake watermark WF can be extracted from AW via the extraction steps D1 to D3 above. In more detail, their proof considered the following cases: • Given SW , it is infeasible to obtain S and W . • Given SW and S, it is infeasible to obtain W . • Given SW and W , it is infeasible to obtain S. • Given SW and WF , it is infeasible to create SF (SVD of AF ) such that WF is embedded in AW . • Given SW and SF , and by imposing constraints (e.g. semantically meaningful restrictions) on WF , it is infeasible to obtain such a WF such that it is embedded in AW .
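To make the embedding and extraction steps above concrete, the following is a minimal NumPy sketch of steps E1-E4 and D1-D3 together with a correlation-based comparator. It is an illustrative reconstruction from the equations in Sections II and III, not the authors' code; the image size, scale factor and threshold are arbitrary choices made for the example.

```python
import numpy as np

def embed(A, W, alpha=0.1):
    """Liu-Tan style embedding (steps E1-E4): returns A_w plus the side
    information (U_w, V_w^T, S) that is later supplied for extraction."""
    U, s, Vt = np.linalg.svd(A)                 # E1: SVD of the cover image
    D = np.diag(s) + alpha * W                  # E2: add the scaled watermark to S
    Uw, sw, Vwt = np.linalg.svd(D)              # E3: SVD of the modified matrix
    Aw = U @ np.diag(sw) @ Vt                   # E4: recombine S_w with U, V
    return Aw, (Uw, Vwt, np.diag(s))

def extract(Aw, Uw, Vwt, S, alpha=0.1):
    """Liu-Tan style extraction (steps D1-D3)."""
    _, sw, _ = np.linalg.svd(Aw)                # D1: recover S_w from A_w
    D = Uw @ np.diag(sw) @ Vwt                  # D2: recombine with supplied U_w, V_w
    return (D - S) / alpha                      # D3: subtract S and rescale

def matches(W, W_star, delta=0.9):
    """Comparator C_delta of equation (4): correlation against a threshold."""
    c = np.corrcoef(W.ravel(), W_star.ravel())[0, 1]
    return int(c >= delta)

rng = np.random.default_rng(0)
A = rng.random((64, 64))                        # stand-in for a grayscale cover image
W = rng.random((64, 64))                        # stand-in for a watermark
Aw, (Uw, Vwt, S) = embed(A, W)
W_star = extract(Aw, Uw, Vwt, S)
print(matches(W, W_star))                       # expected: 1 (exact reconstruction)
```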
Some Remarks. We have some concerns. On page 1, Liu-Tan pointed out that the Zeng-Liu scheme [16] cannot resolve the rightful ownership problem because its watermark detection does not need the original cover image. But Liu-Tan's watermark detection method similarly does not use the entire original image A, but only the S of A, which is not significant enough. Instead, as pointed out by Zhang-Li [17], U_W and V_W are more significant and thus they influence the watermark detection. So we remark that Liu-Tan's scheme essentially has the same problem as Zeng-Liu's. Liu-Tan's experiments all prove robustness against image processing attacks; thus they only prove that the false negative detection rate (not detecting the correct embedded watermark) is low. They did not consider proving that the false positive detection rate (detecting watermarks that were never embedded) is low.
This mistake occurs frequently when researchers propose watermarking scheme designs but only consider robustness results.

A. The Zhang-Li Attack

Zhang and Li [17] recently demonstrated an attack on the Liu-Tan scheme, showing in particular that it is possible to cause the extracted watermark not to be the embedded watermark, but instead to be determined by the U_W and V_W information used as reference during the watermark extraction process. To prove this, they gave an experimental demonstration using a 256x256 image "Lena" as the host cover image A. "Panda" (W_P) and "Monkey" (W_M) were used as two different watermarks. During the embedding process, the two watermarks were embedded into the host image respectively, using the Liu-Tan scheme, so two different watermarked images A_WP and A_WM were obtained, containing the embedded "Panda" and "Monkey", respectively. Then, the watermarked image A_WM (with "Monkey" embedded) was used during the watermark extraction process, but instead of using U_WM and V_WM as done by Liu-Tan in their experiments, Zhang-Li used U_WP and V_WP. Because of this, the "Panda" watermark was extracted, even though it had never been embedded into A_WM. Zhang-Li explained that this is possible because Liu-Tan's scheme actually requires the U_W and V_W of a reference watermark during the watermark extraction process; these carry a lot of information about that watermark, and therefore influence and subsequently cause the extracted watermark to be that one. This result clearly shows that the Liu-Tan scheme has the false positive detection problem, because they demonstrated that "Panda" can be extracted even though it is not in the watermarked image A_WM.

An Attack Scenario. Zhang-Li gave a theoretically and mathematically correct attack, but did not provide actual attack scenarios in which their attack could apply in the watermarking case, i.e. of circumventing proofs of ownership. The most obvious application of their attack is the case where the image (content) owner Bob has embedded his watermark W_M into his image A, and thus obtains the watermarked image A_WM, which he puts on his website to share with the rest of the world. An attacker Alice downloads A_WM from Bob's website, and wishes to claim ownership of it, i.e. prove that she is able to extract her watermark W_P from A_WM. This would be viewed as an ambiguity attack [6], since both Alice and Bob would be able to lay claim to the ownership of A_WM and no one could prove who is right; thus there is confusion as to who rightfully owns it: the rightful ownership problem. To Alice, she is happy enough that she is able to cause this ambiguity.
Problems with the Assumptions. The problem with Zhang-Li's attack is that they assumed that U_WP and V_WP are available whenever they are needed. To have U_WP and V_WP, the attacker must be able to access the cover image A and be able to embed his watermark "Panda" into it. This is not a valid assumption in an SVD-based watermarking setting, and it further defeats the purpose of attacking: if the attacker has access to A, he does not need to go to all the trouble of mounting the Zhang-Li attack; he can straightaway embed his own watermark "Panda" directly into A to prove he owns A. In the next section, we present an improved attack that does not have this problem.

B. An Improved Attack

Our improvement is to exploit equation (6) instead of requiring the attacker to access the cover image A. No embedding of the attacker's watermark "Panda" is required. From equation (6), note that by only having access to the singular values S of A, the attacker can compute the left-hand side D, since W would be his own watermark "Panda". By taking the SVD of this D, he then gets U_WP and V_WP without having access to the whole cover image A. Unlike the infeasible Zhang-Li attack, this attack falls within the security proofs of Liu-Tan, since the attacker only requires access to A_WM, which is in the public domain, and S, which is considered by Liu-Tan.
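As a companion to the description above, here is a minimal, self-contained NumPy sketch of the improved attack: the attacker never touches the cover image A, only the public A_WM and the disclosed singular values S, yet the extraction still yields something highly correlated with his watermark "Panda". All names, sizes and constants are illustrative assumptions, not the authors' implementation or experimental data.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1
A = rng.random((64, 64))            # Bob's cover image (never seen by the attacker)
W_monkey = rng.random((64, 64))     # Bob's genuine watermark
W_panda = rng.random((64, 64))      # Alice's fake watermark

# Bob embeds "Monkey" (steps E1-E4) and publishes A_wm; S is disclosed for extraction.
U, s, Vt = np.linalg.svd(A)
S = np.diag(s)
Uw_m, sw_m, Vwt_m = np.linalg.svd(S + alpha * W_monkey)
A_wm = U @ np.diag(sw_m) @ Vt

# Alice exploits equation (6): she forms D = S + alpha*W_panda from S alone,
# and its SVD hands her reference matrices U_wp, V_wp without ever seeing A.
Uw_p, _, Vwt_p = np.linalg.svd(S + alpha * W_panda)

# Running the extraction steps D1-D3 on A_wm with Alice's U_wp, V_wp:
_, sw, _ = np.linalg.svd(A_wm)
W_star = (Uw_p @ np.diag(sw) @ Vwt_p - S) / alpha
c = np.corrcoef(W_panda.ravel(), W_star.ravel())[0, 1]
print(c)   # typically close to 1, although 'Panda' was never embedded in A_wm
```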
C. Non-invertibility is Necessary but not Sufficient

Both the Zhang-Li attack and our improved attack demonstrate that although the non-invertibility criterion [6] has been shown to be necessary to prevent ambiguity attacks, it is insufficient to solve the rightful ownership problem: the Liu-Tan scheme was proven non-invertible, yet false positives still occur, causing multiple ownership claims. In fact, our attacks do not require any inverting, nor the need to find a fake watermark satisfying any condition; instead we can simply use any fake watermark. This means our attacks are far easier to mount than the invertible attack that Liu-Tan attempted to prove resistance to; hence our results are more devastating than if someone were to discover an invertible attack on the scheme.

V. CONCLUSION AND COUNTERMEASURES

Our work suggests that more caution should be exercised when proving security against ambiguity attacks and solutions to the rightful ownership problem, and that the non-invertibility criterion is necessary but not sufficient. A direct fix to counter our attack is to ensure that S is not disclosed to any party except those who are authorized to extract the embedded original watermark W. The original Liu-Tan security proofs should thus be modified to take this into account.

ACKNOWLEDGEMENT

We thank God for everything. The first author thanks Dennis ML Wong for advice, insight and the suggestion to consider SVD-based watermarking schemes, and Enn Ong for constant encouragement to pursue research.
REFERENCES
[1] A. Adelsbach, S. Katzenbeisser and A.-R. Sadeghi, "On the Insecurity of Non-invertible Watermarking Schemes for Dispute Resolving", Proceedings of the International Workshop on Digital Watermarking (IWDW '03), Lecture Notes in Computer Science, vol. 2939, pp. 355-369, 2003.
[2] A. Adelsbach, S. Katzenbeisser and H. Veith, "Watermarking Schemes Provably Secure Against Copy and Ambiguity Attacks", Proceedings of the ACM Workshop on Digital Rights Management (DRM '03), pp. 111-119, 2003.
[3] A. Adelsbach and A.-R. Sadeghi, "Advanced Techniques for Dispute Resolving and Authorship Proofs on Digital Works", Proceedings of the SPIE vol. 5020, Security and Watermarking of Multimedia Contents V, pp. 677-688, 2003.
[4] H.C. Andrews and C.L. Patterson, "Singular Value Decomposition (SVD) Image Coding", IEEE Transactions on Communications, pp. 425-432, 1976.
[5] I.J. Cox, M.L. Miller and J.A. Bloom, Digital Watermarking, Morgan Kaufmann, 2002.
[6] S. Craver, N. Memon, B.L. Yeo and M.M. Yeung, "Resolving Rightful Ownerships with Invisible Watermarking Techniques: Limitations, Attacks and Implications", IEEE Journal of Selected Areas in Communication, vol. 16, no. 4, pp. 573-586, 1998.
[7] S. Katzenbeisser and H. Veith, "Securing Symmetric Watermarking Schemes Against Protocol Attacks", Proceedings of the SPIE vol. 4675, Security and Watermarking of Multimedia Contents IV, pp. 260-268, 2002.
[8] R. Liu and T. Tan, "An SVD-Based Watermarking Scheme for Protecting Rightful Ownership", IEEE Transactions on Multimedia, vol. 4, no. 1, pp. 121-128, 2002.
[9] A.J. Menezes, P.C. van Oorschot and S.A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1997.
[10] M. Ramkumar and A.N. Akansu, "Image Watermarks and Counterfeit Attacks: Some Problems and Solutions", Proceedings of the Symposium on Content Security and Data Hiding in Digital Media, pp. 102-112, 1999.
[11] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, John Wiley & Sons, 1996.
[12] M. Swanson, M. Kobayashi, A. Tewfik, "Multimedia Data Embedding and Watermarking Technologies", Proceedings of the IEEE, vol. 86, pp. 1064-1087, June 1998.
[13] G. Voyatzis and I. Pitas, "The Use of Watermarks in the Protection of Digital Multimedia Products", Proceedings of the IEEE, vol. 87, pp. 1197-1207, July 1999.
[14] N.R. Wagner, "Fingerprinting", Proceedings of the IEEE Symposium on Security and Privacy, pp. 18-22, 1983.
[15] R. Wolfgang, C. Podilchuk, E. Delp, "Perceptual Watermarks for Digital Images and Video", Proceedings of the IEEE, vol. 87, pp. 1108-1126, July 1999.
[16] W. Zeng, B. Liu, "A Statistical Watermark Detection Technique without using Original Images for Resolving Rightful Ownerships of Digital Images", IEEE Transactions on Image Processing, vol. 8, pp. 1534-1548, November 1999.
[17] X.-P. Zhang, K. Li, "Comments on 'An SVD-Based Watermarking Scheme for Protecting Rightful Ownership'", IEEE Transactions on Multimedia, vol. 7, no. 2, pp. 593-594, 2005.
[18] J. Zhao, E. Koch, "Embedding Robust Labels Into Images For Copyright Protection", Proceedings of the International Congress on Intellectual Property Rights for Specialized Information, Knowledge and New Technologies, Vienna, 1995.
SESSION TS4C
TOPIC: VIRTUAL REALITY
SESSION CHAIRMAN: Dr. Harold M. Thwaites
_________________________________________________________________________________________
Time / Paper No. / Paper Title / Page No.
_________________________________________________________________________________________
9.00am - 9.30am   Invited Talk: "Latest R&D trends in VR and AR in both extremes [high-end oriented and low-end oriented]"
                  Simon Su (1), William R. Sherman (1), R. Bowen Loftin (2)
                  (1) Desert Research Institute, Reno, USA; (2) Texas A&M University at Galveston, USA

PANEL SESSION

SESSION TS4D
TOPIC: VIRTUAL REALITY
SESSION CHAIRMAN: Dr. Harold M. Thwaites
_________________________________________________________________________________________
Time / Paper No. / Paper Title / Page No.
_________________________________________________________________________________________
          Invited Talk: "ECulture and Its EPossibilities: Immortalising Cultural Heritage through Digital Repository"
          Faridah Noor Mohd Noor (UM, Kuala Lumpur, Malaysia) ... 497
11.00am   TS4D-1
11.30am   TS4D-2   Using Heuristic to Reduce Latency in A VR-Based Application
                   Mohamed Nordin Zakaria, Yulita Hanum Iskandar, Abas Md. Said (Universiti Teknologi Petronas, Tronoh, Perak, Malaysia) ... 480
11.50am   TS4D-3   Surface Reconstruction from the Point Clouds: A Study
                   Ng Kok Why (Multimedia University, Cyberjaya, Selangor, Malaysia) ... 486
12.10pm   TS4D-4   A Network Communications Architecture for Large Networked Virtual Environments
                   Yang-Wai Chow, Ronald Rose, Matthew Regan (Monash University, Australia) ... 491
12.30pm   TS4D-5   The Instructional Design and Development of NoviCaD
                   Chwen Jen Chen, Chee Siong Teh (Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia) ... 475
The Instructional Design and Development of NoviCaD
Chwen Jen CHEN and Chee Siong TEH
Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak
[email protected], [email protected]

Abstract

NoviCaD is a virtual reality (VR)-based learning environment that aims to assist novice car drivers to learn traffic rules. This paper emphasises the instructional design and technical procedures for developing NoviCaD. It provides a detailed description of the various features of this learning environment, which is designed according to an instructional design theoretical framework grounded in the concept of integrative goals, the constructivist learning environments design model, and the cognitive theory of multimedia learning. Such a theoretical framework enables the creation of a learning environment that allows the learner to actively construct knowledge on traffic rules. The paper also includes an overview of the technical procedures for developing the learning environment as well as a brief report on the findings of the educational effectiveness evaluation of this learning environment.

Keywords: novice car drivers, learning environment, virtual reality, VRML

1. Introduction

NoviCaD is a virtual reality (VR)-based learning environment that was created for novice car drivers to better comprehend the traffic rules for various road scenarios that consist of ordinary roads, different types of road junctions and related traffic signs. A survey on the use of virtual reality (VR) to create driving simulators has revealed that there are two major types of such simulators: (i) the costly and sophisticated research-based driving simulator that uses the latest in visual display technology and a high-fidelity audio system to complete the driving experience, e.g. [1], [2], [3], and (ii) the more common research-based or commercial driving simulator that employs a steering wheel, brake and accelerator pedals, a visual display that gives a three-dimensional scenario, an auditory source, and dashboard instruments, e.g. [4], [5]. Although the idea of using VR to create driving simulators is not new, NoviCaD is noteworthy as it only utilises a conventional personal computer setting without introducing any additional peripherals. Its cost is much lower, making it feasible and cost-effective to be ubiquitously utilised. In addition, unlike most other driving simulators, this learning environment focuses only on the cognitive domain and employs an instructional design theoretical framework that enables the learner to actively construct knowledge on traffic rules.

As reported in a local newspaper [6], from May 2002 until August 2003, around 500,000 candidates underwent the Malaysia Road Transport Department computer-based theory test. However, only 219,965 of them passed the test. Further investigation revealed that those who did not pass the test have difficulty in studying the materials, which are in the form of text and two-dimensional static images. Indeed, the current methods of instruction for the cognitive domain, which rely on a textbook and basic practical lessons and test the learners using a theory test, are observed to pose various limitations in assisting learners to recall or recognise knowledge, and to develop their understanding, intellectual abilities and skills. These limitations include the inability to (i) support learners with different cognitive abilities, (ii) provide authentic and meaningful tasks, (iii) provide concrete experience and active experimentation to support learners with this type of learning style, and (iv) provide learner-directed learning. On the other hand, the various capabilities of virtual reality technology are foreseen to be able to overcome the observed problems, such as its ability to provide three-dimensional representation, authentic representation, an excellent visualisation tool, multiple perspectives, a controlled level of complexity, active learning, learner-centredness, and strong positive emotional reaction. A thorough description of these limitations and the potential of VR to overcome them is provided in [7].
2. Instructional design of NoviCaD
NoviCaD runs on a conventional personal computer and uses mouse and/or keyboard as the input devices. The design of NoviCaD was guided by an instructional
design theoretical framework grounded in the concept of integrative goals [8], the constructivist learning environments design model [9], and the cognitive theory of multimedia learning [10]. Figure 1 shows a screenshot of the learning environment.

Figure 1: Screenshot of the learning environment

2.1 Integrative goal

According to [8], an instructional goal must be a combination of several different objectives that are to be integrated into a comprehensive purposeful activity, which is known as an enterprise. In other words, an instructional designer needs to identify the component skills and knowledge that relate to the goal and design a scenario that relates each piece of knowledge or skill to the goal. The integrative goal is incorporated within the enterprise schema as verbal knowledge. In this learning environment, after identifying the types of learning and the respective learning objectives, the integrative goal is identified as the learners' ability to interpret the basic rules of road scenarios that comprise ordinary roads, road junctions, and various traffic signs.

2.2 Enterprise scenario or problem

The learning environment also comprises an enterprise scenario or problem, which includes three integrated components: problem context, problem representation, and problem manipulation space.

Problem context. The importance and the learning goal are presented when a learner begins exploring the environment. As pointed out by [11], one critical attribute of a problem is that it must have some social, cultural, or intellectual value to the problem solver. Thus, by explicitly revealing the value and focus of the learning environment, it assists in engaging the learner with the learning activities.

Problem representation. The narrative that is presented in text and the virtual environments help the learner build a mental representation of the problem. The narrative is in the form of stories while the virtual environments present various virtual road scenarios that correspond to the stories. Both the problem context and the problem representation describe a set of events, which explains the problem that needs to be resolved.

Problem manipulation space. In this learning environment, the virtual road scenarios serve as the problem manipulation space that allows the learner to navigate his or her virtual car through the virtual road scenarios using input devices such as a mouse or the keyboard. Navigation is however restricted to movements that would be possible in the real world, such as moving forward or backward, and turning left or right. The effect of this navigation on the virtual environment is viewed in real time and thus closely resembles real-world car navigation.

2.3 Related cases

It is important that the constructivist learning environment provides access to a set of related experiences or knowledge that learners can refer to. In this learning environment, the virtual road scenarios implicitly provide authentic representations that the learner can easily relate to those found in the real world. For example, incorporating appropriate traffic signs in a simulated road scenario presents a similar cognitive challenge to that faced in real driving conditions. Through the process of visiting or exploring the simulated environment, learners can further comprehend the real uses of these signs, in contrast to learning them in isolation through printed text alone.

2.4 Information resources

Rich sources of information are also essential in the constructivist learning environment. This enables learners to construct their mental models and to formulate hypotheses that drive the manipulation of the problem space. In this learning environment, hyperlinks to various resources that contain the description of relevant basic rules for ordinary roads and road junctions, traffic signs, and line markings are provided. The learner is free to access these resources while trying to solve the problem.
2.5 Cognitive tools

The learning environment incorporates a few cognitive tools. The virtual road scenarios act as a visualisation tool where learners can visualise a dynamic three-dimensional representation of the problem. This is much more authentic when compared with static two-dimensional representations in picture form. This representation, which mimics the real world, helps reduce the learner's cognitive load in constructing mental images and performing visualising activities. The virtual environment also functions as a cognitive tool that is capable of making imperceptible things perceptible. It can be designed to make the abstract more concrete and visible by providing symbols not available in the real world. The learning environment, for instance, provides guiding arrows at appropriate places in the virtual road scenarios to prevent the learner from getting lost in the virtual environment. Overall, the virtual road scenarios are designed to be less complex than those in the real world so that the learner can focus on the salient aspects of the representation. As pointed out by [12], reduced fidelity is known to benefit learning for a novice learner.

Virtual environments also allow a learner to visualise and understand complex structures that would otherwise remain hidden. The learner's viewpoint may also be manipulated, through which arbitrary levels of scale can be applied to facilitate observations [13]. Indeed, the uniqueness of virtual reality is its ability to provide an infinite number of viewpoints. In this learning environment, screenshots of appropriate, physically impossible viewpoints of the virtual road scenarios are provided. These include a plan view map to provide understanding of the overall road scenario, 2.5-dimensional (or bird's eye) views of various parts of the road scenario, and a tracer that shows the position of the learner's vehicle on a two-dimensional plan view map in real time. The learning environment also presents a summary of the overall journey plan and a structure map that guides the sequence of the problem solving process. These components all act as cognitive tools that help to reinforce the learner's mental representation of the problem and help him or her in performing the learning activities of the learning environment.

2.6 Presentation strategies

This learning environment also employed a number of design principles proposed by [10], which include the multimedia principle, spatial contiguity principle, coherence principle, and redundancy principle, to guide the decisions on the presentation strategies of the learning environment.

3. Technical procedures for developing the virtual road scenario

Generally, the learning environment was formed by integrating the virtual road scenarios onto the web interface. The following provides an overview of the technical procedures for developing the learning environment. Emphasis is given to the development of the virtual road scenarios and how these virtual environments were integrated onto the web interface. A Virtual Reality Modeling Language (VRML)-based tool was used for this purpose. Figure 2 depicts the core steps for developing a virtual road scenario: identify a scenario, sketch a two-dimensional plan, and assemble a scene.

Figure 2: Steps for developing a virtual road scenario

3.1 Identify a scenario

The road scenario that was simulated in the virtual environment was chosen based on the input given by the subject matter expert and a thorough review of related documents. This overall scenario was divided into five sub-scenarios to provide scaffolding for the learners.

3.2 Sketch a two-dimensional plan

The identified overall road scenario was sketched on graph paper to determine the scale, such as the length and width of each segment of the road. This was necessary as it helped in creating the virtual road at the appropriate proportion.

3.3 Assemble a scene

The development started by assembling the virtual road scenario. This involved creating a new scene, setting the background, creating and adding the virtual road, setting the viewpoint, creating and inlining other virtual objects, and setting the navigation.
To specify the background of the scene, a set of images that defined a background panorama was specified. These images are mapped to the inside faces of a cube that forms the scene. An image of blue sky with clouds was selected. Based on the two-dimensional plan of the road that was sketched on the graph paper, the image for each segment of the road was created using image editing software. The road was segmented based on the shapes of the various parts of the road and the number of grids occupied by each shape on the graph paper. The image of each segment was then textured onto the face of a VRML box-shaped primitive object. The various segments of the virtual road were then added to the scene. All the other virtual objects were created using a three-dimensional modelling tool or obtained from other sources, and were then converted into .wrl format. These objects were then inlined into the scene at the appropriate position and orientation.

To control how the learner viewed the scene once it was loaded, the viewpoint was set by defining an appropriate position and orientation for the camera. The NavigationInfo node was used to specify the values that describe the virtual presence of the learner within the world and the viewing model. The parameters of the avatarSize property were specified to determine the viewpoint dimensions for the purpose of collision detection. The speed property, which sets the rate at which the viewer travels through a scene, was also defined. The type of navigation allowed was 'WALK'.

3.4 Make a scene interactive

The building blocks of the scene are the virtual objects. Every object contains fields that hold the values of its parameters. The basis for animation and interaction with a world is the ability to change the values of a given object's fields. The way to change a field is to send an event to that field by means of a mechanism through which this event can be propagated to cause changes in other objects. This is called a route. The route is the connection between an object generating an event and an object receiving the event, which can be compared to a path through which information is passed from one object to another.

Object animation was performed by changing the position, orientation and size of any object in the scene over time. Other properties, such as colour, transparency and intensity, can also be animated. Vehicle movement, i.e. changing the position and orientation of a vehicle, was applied to almost all the vehicles in the scene. For example, to create a car animation, a sequence of key positions of the car and an interpolator were defined. This interpolator computes the intermediate positions to produce the effect of car movement. A similar procedure was used to simulate the colour changes of the traffic light; however, instead of changing the position or orientation, the colour was changed at each keyframe.

In addition to the incoming and outgoing events that control the animation, additional incoming and outgoing events were associated with the object to control the animation playing. For example, the movement of vehicles in all the scenarios was triggered by the ProximitySensor object. Events were generated whenever the learner's viewpoint entered or left a defined box-shaped region. The outgoing event was routed to the incoming event of other object(s) to trigger other behaviour. For example, to control the animation playing of a car, a proximity sensor was placed behind the car in the scene. When the learner navigated his or her vehicle forward in the scene, the vehicle would reach and enter the box-shaped region defined by the sensor. The 'enter' event would generate an outgoing event of the SFTime type, which was routed to Animation.start as an incoming event. This would start the animation playback. Such a routing mechanism was also used to detect collisions and provide a number of feedbacks if a collision occurred, such as producing a crashing sound, stopping all animations, highlighting the car, producing a text message, and producing a narrated version of the message.
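The animation-and-routing mechanism described above can be made concrete with a small fragment. The sketch below is not taken from NoviCaD's source; it is a hypothetical Python snippet that simply prints a minimal VRML97 fragment showing the pattern the paper describes: a ProximitySensor starting a TimeSensor, which drives a PositionInterpolator that moves a car Transform via ROUTE statements. Node and field names follow VRML97, while the coordinates, names and timing are arbitrary assumptions.

```python
# A hypothetical generator for the kind of VRML97 fragment described in Section 3.4.
def car_route_fragment(key_positions, cycle_seconds=4.0):
    keys = " ".join(f"{i / (len(key_positions) - 1):.2f}" for i in range(len(key_positions)))
    values = ", ".join(f"{x} {y} {z}" for (x, y, z) in key_positions)
    return f"""
DEF Car Transform {{
  children [ Shape {{ geometry Box {{ size 2 1 4 }} }} ]
}}
DEF Clock TimeSensor {{ cycleInterval {cycle_seconds} loop FALSE }}
DEF CarPath PositionInterpolator {{
  key [ {keys} ]
  keyValue [ {values} ]
}}
DEF Trigger ProximitySensor {{ center 0 0 10 size 10 4 10 }}

ROUTE Trigger.enterTime TO Clock.set_startTime
ROUTE Clock.fraction_changed TO CarPath.set_fraction
ROUTE CarPath.value_changed TO Car.set_translation
"""

# Example: a car that drives 30 metres forward once the learner's viewpoint
# enters the box-shaped region behind it.
print(car_route_fragment([(0, 0, 0), (0, 0, -15), (0, 0, -30)]))
```

In NoviCaD itself the scene was assembled with a VRML-based authoring tool rather than generated this way; the fragment is only meant to make the ROUTE/event mechanism concrete.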
4. Effectiveness evaluation

A quasi-experimental study was conducted to investigate the cognitive effects resulting from the use of this learning environment. A group of 62 participants used the VR-based learning environment and another group of 64 participants relied on lectures and reading materials, which contained only text and two-dimensional static images. Each group was given a pretest as well as a posttest. A one-way analysis of covariance was then conducted to compare the effects of these two instructional treatments. Learners' scores on the pretest were used as the covariate in the analysis, and the gain score, obtained by subtracting each pre-test score from the respective post-test score, served as the dependent variable. After adjusting for the pre-test scores, the analysis showed that there was a significant difference between the two instructional treatments on the gain scores, F(1, 123) = 34.524, p = 0.000. The effect size, calculated using η², was 0.207, which in Cohen's (1988) terms would be considered a large effect size. The group that employed the VR-based learning environment had an adjusted mean, M, of 28.751 and the group that utilised the conventional text and two-dimensional static images had an adjusted mean, M, of 14.231. A more thorough report of this evaluation study is given in [14].
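For readers who want to reproduce this kind of analysis on their own data, the following is a minimal sketch of a one-way ANCOVA with the pretest as covariate, using pandas and statsmodels. The column names and the synthetic data are placeholders rather than the study's dataset, and the η² shown is one common definition (effect sum of squares over total sum of squares).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Placeholder data standing in for the two treatment groups (VR vs. conventional).
rng = np.random.default_rng(42)
n_vr, n_conv = 62, 64
pre = rng.normal(50, 10, n_vr + n_conv)
gain = np.where(np.arange(n_vr + n_conv) < n_vr, 28, 14) \
       + 0.2 * (pre - 50) + rng.normal(0, 8, n_vr + n_conv)
df = pd.DataFrame({
    "group": ["VR"] * n_vr + ["conventional"] * n_conv,
    "pretest": pre,
    "gain": gain,                     # posttest minus pretest
})

# One-way ANCOVA: gain ~ group, adjusting for the pretest covariate.
model = smf.ols("gain ~ C(group) + pretest", data=df).fit()
table = anova_lm(model, typ=2)
print(table)

# Eta squared for the group effect: SS_group / SS_total.
ss_group = table.loc["C(group)", "sum_sq"]
eta_sq = ss_group / table["sum_sq"].sum()
print(f"eta squared = {eta_sq:.3f}")
```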
5. Conclusion

To date, various hardware and software tools are available for creating virtual reality applications. Indeed, virtual reality technology demonstrates various capabilities that represent impressive technical achievements, and it continues to advance to provide even more capabilities. Undeniably, such advancements have afforded numerous new possibilities for learning applications. Nevertheless, it is crucial to realise that tools by themselves do not teach. Creating VR-based learning environments that purely demonstrate these technical capabilities will not ensure effective learning, unless they are guided by appropriate instructional design and human factors guidelines. This paper has reported the technical procedures for creating NoviCaD as well as provided evidence on the positive learning effects that a properly designed VR-based learning environment can bring about.

6. References

[1] BMW Latest Driver Assistance R&D Developments. (2006). Retrieved Sept 28, 2006, from http://www.worldcarfans.com/news.cfm/newsID/2060804.010/page/10/lang/eng/country/gcf/bmw
[2] Leeds Advanced Driving Simulator. Retrieved Sept 25, 2006, from http://www.its.leeds.ac.uk/facilities/lads/
[3] The National Advanced Driving Simulator. (2006). Retrieved Sept 28, 2006, from http://www.nads-sc.uiowa.edu/
[4] STISIM Drive. (2006). Retrieved Sept 25, 2006, from http://www.systemstech.com/content/view/23/39/
[5] Virtual Driver Interactive. (2003). Retrieved Sept 25, 2006, from http://www.virtualdriver.net/
[6] Computer-based traffic rules and regulations test: More than 200,000 failures (in Chinese), Sin Chew Jit Poh, 28 November 2003. Retrieved November 28, 2003, from http://www.sinchew-i.com/print.phtml?sec=1&artid=200311280278
[7] C.J. Chen, S.C. Toh, and M.F. Wan, "Virtual reality: a potential technology for providing novel perspective to novice driver education in Malaysia". In N. Ansari et al. (Eds.), Proceedings of the International Conference on Information Technology: Research and Education, New Jersey: IEEE, 2003, pp. 184-188.
[8] R.M. Gagné and M.D. Merrill, "Integrative goals for instructional design", Educational Technology Research and Development, vol. 38, no. 1, pp. 23-30, 1990.
[9] D.H. Jonassen, "Designing constructivist learning environments". In C.M. Reigeluth (Ed.), Instructional-design Theories and Models: A New Paradigm of Instructional Theory, New Jersey: Lawrence Erlbaum Associates, 1999, vol. 2, pp. 215-239.
[10] R.E. Mayer, Multimedia Learning, Cambridge: Cambridge University Press, 2002.
[11] D.H. Jonassen, "Integration of problem solving into instructional design". In R.A. Rieser & J.V. Dempsey (Eds.), Trends and Issues in Instructional Design and Technology, New Jersey: Merrill Prentice Hall, 2002, pp. 107-120.
[12] S.M. Alessi and S.R. Trollip, Multimedia for Learning: Methods and Development, Massachusetts: Allyn & Bacon, 2002.
[13] W.D. Winn, "A Conceptual Basis for Educational Applications of Virtual Reality", Tech. Rep. R-93-9, Seattle, WA: Human Interface Technology Laboratory, 1993.
[14] C.J. Chen, S.C. Toh, and M.F. Wan, "The design, development and evaluation of a virtual reality (VR)-based learning environment: Its efficacy in novice car driver instruction". World Conference on Educational Multimedia, Hypermedia & Telecommunications, Montreal, Canada: AACE, 2005, pp. 4516-4524.
Using Heuristic To Reduce Latency In A VR-based Application

Yulita Hanum Iskandar
Fakulti Sains & Teknologi Informasi, Kolej Universiti Islam Antarabangsa Selangor
[email protected]

Abas Md. Said
ICT/BIS Department, Universiti Teknologi PETRONAS
[email protected]

M Nordin Zakaria
ICT/BIS Department, Universiti Teknologi PETRONAS
[email protected]
Abstract

Latency is one of the most frequently cited shortcomings of current virtual reality (VR) applications. To compensate for latency, previous prediction mechanisms insert mathematical algorithms, which may not be appropriate for complex virtual training applications. For a complex VR simulation, this will most likely impose a greater computation burden and result in an increase in latency. This study implements a new prediction algorithm based on heuristics that could be used to develop a more effective system for virtual training applications. The heuristic-based prediction provides a platform to utilise the heuristic power of humans along with the algorithmic capability and geometric accuracy of motion-planning programs and the biomechanical laws of human movement.

In measuring the performance of various prediction methods, this study makes a comparison in real tasks between the heuristic-based prediction and Grey system prediction. Findings indicate that the heuristic-based algorithm is an accurate prediction method for reducing latency in virtual training. Overall, the findings indicate that heuristic-based prediction is efficient, robust and easier to implement.

Keywords: interdisciplinary research, latency, heuristic-based prediction, Grey system prediction.

1.0 Introduction

One major challenge in VR applications is to provide an immersive environment which is realistic in appearance, behaviour and interaction. For example, an end-user of a virtual training system may be apprehensive about system latency, which limits its usefulness in real world applications [1, 3, 6, 7, 13, 17]. Latency, in this context, refers to the delay that occurs between the movement of the real input device (e.g. the racket) within a virtual environment (VE) and the result of that action being reflected by the VE [15]. The computer needs time to read the tracker measurements, set the new camera position and perform rendering. Because of this, the picture is presented with some delay, which makes some especially fast tasks harder to perform.

Quite a number of prediction methods have been proposed to address the problem of latency [3, 7, 9, 13, 17]. However, each prediction algorithm has its own limitations. In this study, we propose a prediction method based on heuristics that could be used to develop a more effective and general system for virtual training applications.

Heuristics have been successfully applied to a wide range of humanoid robots and avatars in VR systems [10, 12, 14, 22], but not yet as a prediction algorithm to reduce latency in a virtual training application. To a certain extent, human body movement principles can be considered as heuristics, because the movements can be patterned and are often orderly during a specific task. The heuristic algorithm implemented in this study focuses on the habitual motions of the squash game.

Therefore, in this research, we explore the benefits of heuristics over previous prediction approaches in estimating human motion. Drawing on this, this study explores how this can be done and what the subsequent outcomes are.
In the next section, we discuss basic issues in latency. In Section 3, we describe the methods used in this study and discuss in detail the heuristic-based prediction algorithm for predicting user motion, followed by a description of the Grey system prediction we tested against. We then present the prediction experiment conducted to study the viability of the newly proposed algorithm. Sections 4 and 5 are dedicated to the discussion and analysis of the empirical findings, and the conclusion.

2.0 How Latency is Handled

A number of prediction methods have been proposed to handle latency in VR applications. These can be categorised as interpolation, extrapolation, integerised, filter-based or neural-network-based [5]. While interpolation and extrapolation methods may fail in quick-motion applications [2] and are susceptible to noise amplification [4], the neural-network-based approaches are learning algorithms and therefore incur more overhead.

More promising approaches are often confined to Grey system and filter-based predictions.

2.1 Grey System Prediction and Filter-based Prediction

La Viola [9] and Rhijn et al. [13] present some studies comparing prediction filters. They compare different filtering methods to compensate for system latency, especially for orientation prediction of hand movements. However, the results are inconclusive and are strongly dependent on the input data, parameters and requirements.

Wu and Ouhyoung [17, 18, 19] proposed a prediction method using Grey system theory. The method is applied to the prediction of tracker motion because the behaviour of the tracker output is "grey" (only partially known). The study shows that the computation requirement of the Grey system is relatively low. The Grey system algorithm's parameters were tuned using a limited amount of motion data, optimising their performance for motion data with similar characteristics. If the prediction needs to be applied to other types of motion, these tuned parameters may not yield accurate results.

Wu and Ouhyoung [17, 18, 19] also compare three prediction algorithms for head motion, using the same datasets. The algorithms in the comparison were an Extended Kalman filter, an extrapolator and a Grey system theory-based predictor. On average, results from Extended Kalman filtering and the Grey system are significantly better than those obtained without prediction, but in terms of jittering, filtering appears to produce the most. Despite the robustness of the Grey system, their testing was limited to head motion sequences only.

In general, a mathematical algorithm is usually inserted in a prediction algorithm. For more complex VR applications, this will most likely impose drastically greater computation burdens [21]. Furthermore, almost all prediction algorithms contain one or more parameters used for tuning to optimise performance [8]. Therefore, a significant aspect in determining which prediction algorithm to use is adjusting the algorithm's parameter values. Making these adjustments is nontrivial in the sense that an optimal parameter setting for one type of user motion may not be optimal for another.
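Since the Grey system predictor is the main baseline in this study, a compact sketch of the classical GM(1,1) one-step-ahead predictor may help. This is the textbook formulation, offered here as one plausible reading of the Grey prediction used in [17, 18, 19] rather than their exact implementation; the sample tracker positions are made up.

```python
import numpy as np

def gm11_predict(x0, steps=1):
    """One-step-ahead GM(1,1) Grey prediction for a short, positive-valued series."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                                   # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])                        # background values
    B = np.column_stack([-z1, np.ones(n - 1)])
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]          # least-squares fit of the grey model
    def x1_hat(k):                                       # fitted AGO sequence, k = 1, 2, ...
        return (x0[0] - b / a) * np.exp(-a * (k - 1)) + b / a
    preds = [x1_hat(n + s) - x1_hat(n + s - 1) for s in range(1, steps + 1)]
    return np.array(preds)

# Made-up 1-D tracker positions (e.g. racket x-coordinate in cm, sampled per frame).
samples = [52.0, 53.1, 54.5, 56.2, 58.1, 60.3]
print(gm11_predict(samples, steps=2))                    # predicted next two positions
```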
3.0 Heuristic-based Prediction

Our proposed heuristic-based prediction method hinges on the premise that human movements for a specific class of tasks can be predicted, and it is applied here to the game of squash.

A human motion sequence consists of certain motion elements, namely primitive motions [11]. A motion primitive consists of a basic motion which is common to all players. Furthermore, human motion cannot be modelled exactly by mathematical and physical methods because it is very complex and controlled by the brain [20]. Sport is a game of habits, during which a player typically acts in a certain habitual pattern. It involves fast strategic thinking and planning skills, and basic habits that give the player basic control.

Squash is a fast-movement game and has certain rules, tactics and patterns of movement that make it an ideal subject for the application of heuristics and the study of VR latency. The basic heuristic-based algorithm moves the player from a start location to a goal location while minimising the path length in terms of time.
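As a deliberately simplified illustration of this idea, and not the authors' implementation, the toy predictor below encodes one habitual squash rule, namely that after playing a shot the player tends to return towards the T-area, and otherwise extrapolates the current velocity. Every constant, coordinate and rule in it is an assumption made purely for the example.

```python
import numpy as np

T_AREA = np.array([0.0, 0.0])        # assumed court coordinates of the T (metres)
RETURN_SPEED = 2.5                   # assumed habitual return speed (m/s)

def predict_position(pos, vel, latency, just_played_shot):
    """Predict where the player will be `latency` seconds from now.

    Rule-based branch: after a shot, assume the habitual move back towards the T.
    Fallback branch: plain constant-velocity extrapolation of the tracked motion.
    """
    pos, vel = np.asarray(pos, float), np.asarray(vel, float)
    if just_played_shot:
        to_t = T_AREA - pos
        dist = np.linalg.norm(to_t)
        if dist > 1e-6:
            step = min(RETURN_SPEED * latency, dist)    # don't overshoot the T
            return pos + step * to_t / dist
        return pos
    return pos + vel * latency

# Example: 80 ms of latency, player at (2.1, -1.5) m on the court.
print(predict_position([2.1, -1.5], [0.4, -0.3], 0.08, just_played_shot=True))
print(predict_position([2.1, -1.5], [0.4, -0.3], 0.08, just_played_shot=False))
```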
[Figure 1 (diagram not reproduced). Its labelled blocks include: User Action, Tracking (T1), Simulation, Display, Perception, Cognitive & Evaluation (T2), Path Prediction, Motion Planning, Human Rule, Motion Implementation, Updating Motion and T_Predict, arranged in a first and a second layer.]

Figure 1: Framework of heuristic-based prediction

Figure 1 shows the framework of heuristic-based prediction. Motion planning incorporates path prediction with a simple human rule to form an integrated reasoning module for richer expressiveness and robust motion. The path prediction is based on an analysis of human behaviour and specific domain knowledge of the sport application. Motion planning is used to update the player's motion. This update can be referred to as motion implementation, or heuristic-based prediction.

3.1 Constructing the Heuristic

The construction of the heuristic is organised according to the flowchart in Figure 2: analysis, identifying the pattern of movement, marker matching, and marker transformation.

Figure 2: Flowchart of heuristic-based prediction

In the analysis phase, we observed people playing the game. The main purpose was to acquire information on players' general positions and movements on the court. Based on this analysis, we found that players' movements are related to various strategic actions. The motion of the squash game comprises different stops and poses, changes of motion direction, turns, jumps and side steps. As previous work has demonstrated that a player's reaction is mainly influenced by footwork, the heuristic-based prediction focuses on position and movement analysis.

An understanding of biomechanics and of how the player moves his body is needed for planning the marker placement. Specifically, the marker data guide the physical simulation of the player's movement. Figure 3 illustrates the procedure of marker matching. Based on the analysis of the squash game, the marker positions were chosen with the key frames spaced at 0.03 s, and the marker shapes used are round blips.

Figure 3: Marker matching - trajectory of a squash international player (Source: [17])

Figure 4: Marker transformation - motion in the x-z plane (legend: trajectory of the squash player; T-area)

In contrast to most physically-based mapping techniques that synthesise motion from scratch, motion transformation is used in this study as the underlying paradigm for generating motion sequences. In doing so, based on marker matching, a set of normalised trajectories is calculated and stored. Figure 4 shows patterns of the squash game that result from the marker transformation.
Since the motion of most objects in nature is continuous, the resulting motion trajectories are piecewise smooth. This is because the motion of the player can be described by the direction of the body links.

Since the concept of unit motions is introduced as smooth primitive motions in the coordinate system, and based on the movement patterns of the squash player, the heuristic-based prediction of squash game movements employs B-spline curves, which are constructed by stitching together Bezier curves. We used Bezier curves for interpolation and B-splines to obtain smooth movements. The B-spline curve is an extended version of the Bezier curve that consists of segments, each of which can be viewed as an individual Bezier curve. The Bezier curve, which is an interpolating curve, has been a very common way to display smooth curves, both in computer graphics and in mathematics. Figure 4 also shows how we create a motion by using disjoint segments in one B-spline (done through the knot vector) and interpolate the movement through Bezier curves. A screenshot of the scene is shown in Figure 5.
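To make the interpolation step concrete, here is a small sketch of evaluating a cubic Bezier segment through the de Casteljau recurrence and sampling it for smooth in-between positions. The control points stand in for normalised trajectory key positions and are invented for the example, not taken from the paper's data; a B-spline can then be built by stitching such segments together, as described above.

```python
import numpy as np

def de_casteljau(control_points, t):
    """Evaluate one Bezier segment at parameter t in [0, 1] (de Casteljau recurrence)."""
    pts = np.asarray(control_points, dtype=float)
    while len(pts) > 1:
        pts = (1.0 - t) * pts[:-1] + t * pts[1:]   # repeated linear interpolation
    return pts[0]

def sample_segment(control_points, n=20):
    """Sample n smooth in-between positions along the segment."""
    return np.array([de_casteljau(control_points, t) for t in np.linspace(0.0, 1.0, n)])

# Invented key positions (x, z) for one piece of a player's trajectory on the court.
segment = [(2.0, -1.5), (1.4, -0.9), (0.6, -0.4), (0.0, 0.0)]
trajectory = sample_segment(segment)
print(trajectory[:3])   # the first few interpolated positions
```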
4.1 Prediction Accuracy

Table 1: Statistical Analysis on Prediction's Accuracy

[Table 1 is not reproduced here: the extracted copy preserves only its row labels (Sum, Mean, Variance, Standard Deviation, Observations, df, t Stat, P(T<=t)) and the column heading "Grey system prediction"; the numeric entries are not recoverable.]
"
9
,8-9 #
# 8
0
%
# #
8
6
#
0
$2'
#
4 #
,8-9 8=
#
(
+ ( +η − 5
(
6
)
5
(
6 4 40
− #3
5 ,?-#,;(−
$3'
)?
530
!
"
#
" )3
M2USIC 2006
#
TS- 4F
$ %+ 7&
< 7µ
5
!
()*+
%+ #
# E * ( & + F# 1@@@ 119 G ? ; ?08# ) BB7
% + 38#
G
G
G C
#
0 & 02 µ 8 2 & # 27#
H
%
D 7
= # = ) % I>( # E J