Republic of Iraq Ministry of Higher Education & Scientific Research University of Technology Department of Computer Science

AN IMPROVED AES ENCRYPTION OF AUDIO WAVE FILES A THESIS SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TECHNOLOGY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCES

By Nada Hussein Mohammad Ali

Supervised by Prof. Dr. Abdul Monem S. Rahma Assist. Prof. Dr. Abdul Mohssen Jaber Abdul Hossen

April 2015

1436

In the name of Allah, the Most Gracious, the Most Merciful

"And that there is not for man except that for which he strives. And that his effort is going to be seen. Then he will be recompensed for it with the fullest recompense."

God Almighty has spoken the truth. An-Najm (39-41)

Dedication

TO

MY FATHER AND THE MEMORY OF MY MOTHER, WITH LOVE

NADA 2015

ACKNOWLEDGEMENTS

First of all, thanks to Allah, who enabled me to achieve this thesis and gave me the great honor of carrying out my research under the supervision of Prof. Dr. Abdul Monem S. Rahma, for his valuable advice, guidance and cooperation, and for generously giving his valuable time whenever help was needed throughout this study. I also express my gratitude to my co-advisor Assist. Prof. Dr. Abdul Mohssen Jaber Abdul Hossen for his guidance, support and help.

My deep thanks and sincere gratitude go to co-advisor Prof. Dr. Sufian Yousef (Anglia Ruskin University, United Kingdom) for his guidance and help throughout my research visit to the United Kingdom. I am sincerely grateful to my husband, Prof. Dr. Hussein H. Karim (Building & Construction Engineering Department, University of Technology), for his encouragement, help and assistance in reviewing my thesis, and for supporting me throughout my Ph.D. study. Special thanks to Assist. Prof. Dr. Lamiaa Hafidh (Computer Science Department, College of Science, University of Baghdad) for her valuable scientific comments and advice. Many thanks are also extended to my family, especially my mother (God rest her soul), my father, my son "Zaidoon", my two daughters "Rand" and "Dina", and my brothers and sisters, for their patience, their moral support and for providing me with convenient facilities.

Nada Hussein Mohammad Ali
2015

Abstract

The main objective of this study is to achieve secure broadcasting of audio files (specifically the WAVE type) in real-time applications, and to increase the complexity of the encryption/decryption arithmetic operations performed on these files in the finite field GF(2^8). This objective is achieved by implementing different proposed architectures, methods and algorithms, in both block and stream cipher structures, that increase the degree of complexity. For high utilization of resources and fast execution of the cryptographic processes in the proposed algorithms, parallel computing was implemented using OpenMP directives. The AES Rijndael algorithm is taken as the base for the developed block cipher algorithms. To increase the complexity of the encryption/decryption operations, the AES Rijndael transformation functions SubByte, MixColumn and ShiftRows are modified into the new functions DK-SBOX-AES, 4K-MIX-AES and 5K-SHIFT-AES, respectively. These proposed modified functions, together with AddRoundKey, form the proposed improved AES algorithm. The improved AES achieves a high degree of complexity (256! * (2-16)! * 4! * 5!) compared to AES Rijndael (256!) with only a small increase in execution time. This work also finds that OpenMP can reduce the design/development costs and the implementation time of these systems, and increase programmer productivity. Parallel execution reduces the running time of both the encryption and the decryption processes of the proposed algorithms to one half or less, depending on the number of processors used.

PUBLICATIONS

1- Nada Hussein M. Ali, Abdul Monem S. Rahma and Abdul Mohsen Jaber, 2013, "Encryption using Dual Key Transformation based on Creation of Multi S-Boxes in AES Algorithm", International Journal of Computer Applications, Volume 83, No. 10, December, pp. 1-6.
2- Nada Hussein Mohammad Ali Al-Khafaji, Abdul Monem S. Rahma, Abdul Mohsen Jaber, Sufian Yousef, 2014, "Random Key Permutation Stream Algorithm Based on Modified Functions in AES Algorithm", International Journal of Engineering and Technology, Volume 4, No. 6, June, pp. 367-373.
3- Nada Hussein M. Ali, Abdul Monem S. Rahma, Abdul Mohsen Jaber, Sufian Yousef, 2014, "A Byte-Oriented Multi Keys Shift Rows Encryption and Decryption Cipher Processes in Modified AES", International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April, pp. 953-955.
4- Nada Hussein M. Ali, Sufian Yousef, Abdul Monem S. Rahma, Abdul Mohssen J. Abdul Hossen, 2014, "A Novel Multi Modification in AES Block Cipher Algorithm for Comlexity", International Review on Computers and Software (I.RE.CO.S.), Volume 9, No. 6, pp. 01-905, June.
5- Nada Hussein M. Ali, Abdul Monem S. Rahma, Abdul Mohssen J. Abdul Hossen, Sufian Yousef, 2014, "Multi Keys Shift Rows Encryption and Decryption Processes in AES transformation", 4th Annual Research and Scholarship Conference, 14th May 2014, Anglia Ruskin University, United Kingdom.

LIST OF CONTENTS

ABSTRACT
PUBLICATIONS
LIST OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
LIST OF THE PROPOSED ALGORITHMS

Chapter One: INTRODUCTION
1.1 Introduction
1.2 Symmetric Key Encryption Algorithms
1.3 Accelerating Symmetric Key Cipher using OpenMP Parallel Processing
1.4 Problem Statement
1.5 Literature Survey
1.6 Aims of the Thesis
1.7 Thesis Structure

Chapter Two: THEORETICAL BACKGROUND
2.1 Introduction
2.2 Finite Fields
2.3 Symmetric Key Encryption
2.4 Advanced Encryption Standard (AES)
2.4.1 AES Transformation Functions
2.4.2 AES Encryption
2.4.3 AES Decryption
2.4.4 Operational Modes for Security
2.5 Fundamentals of Audio Files
2.5.1 Digital Audio
2.5.2 WAVE Audio Format
2.6 Types of Parallel Programming
2.7 Parallel Processing using OpenMP Structure
2.7.1 Introduction to OpenMP
2.7.2 Benefits and Applications of OpenMP
2.7.3 Speedup using OpenMP
2.8 Programming in OpenMP
2.8.1 OpenMP API
2.8.2 The OpenMP Execution Model (Fork/Join)
2.8.3 Number of Threads
2.8.4 Creating an OpenMP Program
2.8.5 Parallel Construct Directives

Chapter Three: THE PROPOSED IMPROVEMENT FOR ALGORITHMS ON BLOCK AND STREAM CIPHER
3.1 Introduction
3.2 The Proposed SubByte Transformation in AES using Dual Keys (DK-SBOX-AES) Function
3.3 MixColumn Transformation
3.3.1 The Proposed MixColumn Transformation in AES using Four Keys (4K-MIX-AES) Function
3.4 The Proposed ShiftRows Transformation in AES using Multi Keys (5K-SHIFT-AES) Function
3.5 The Proposed Improved Block Cipher AES Algorithm
3.6 The Proposed Stream Cipher Algorithms
3.7 AES Parallelization
3.8 Data Parallelism in Wave File using OpenMP
3.8.1 Parallel Design Methodology
3.9 The Proposed Stream Cipher Methodology

Chapter Four: PROCESSING RESULTS AND DISCUSSION
4.1 Introduction
4.2 Block Cipher
4.2.1 The Proposed SubByte Function using Dual Keys Algorithm (DK-SBOX-AES)
4.2.2 The Proposed MixColumn Function in AES using Four Keys (4K-MIX-AES)
4.2.3 The Proposed ShiftRows Function in AES using Five Keys in Shift Operation (5K-SHIFT-AES)
4.2.4 The Proposed Improved AES Algorithm
4.3 The Proposed Stream Cipher Algorithms
4.4 Parallel Block Cipher Algorithms using OpenMP
4.4.1 Parallel AES Rijndael Algorithm
4.4.2 The Proposed SubByte Function using Dual Keys in the Encryption Algorithm (DK-SBOX-AES)
4.4.3 The Proposed MixColumn Transformation in AES using Four Keys (4K-MIX-AES)
4.4.4 The Proposed ShiftRows Transformation in AES using Five Keys in Shift Operation (5K-SHIFT-AES)
4.4.5 The Proposed Improved AES Algorithm
4.4.6 The Proposed Parallel Stream Cipher Algorithms

Chapter Five: CONCLUSIONS AND RECOMMENDATIONS
5.1 Conclusions
5.2 Recommendations

References
Appendix A – Finite Fields
Appendix B – Parallel Computing using OpenMP

LIST OF ABBREVIATIONS

AES: Advanced Encryption Standard
API: Application Programming Interface
ARB: Architecture Review Board
AddRoundKey: Add Round Key transformation
ASCII: American Standard Code for Information Interchange
CBC: Cipher Block Chaining
CFB: Cipher Feedback
CPU: Central Processing Unit
CTR: Counter Mode
CUDA: Compute Unified Device Architecture
DES: Data Encryption Standard
ECB: Electronic Code Book
GCD: Greatest Common Divisor
GF: Galois Field
GPU: Graphics Processing Unit
HDL: Hardware Description Language
Hz: Hertz
ICV: Internal Control Variables
InvMixColumns: Inverse Mix Columns transformation
InvShiftRows: Inverse Shift Rows transformation
InvSubByte: Inverse Sub Byte transformation
MDS: Maximum Distance Separable (matrix)
MixColumns: Mix Columns transformation
MPI: Message Passing Interface
NIST: National Institute of Standards and Technology
NUMA: Non-Uniform Memory Access
OFB: Output Feedback
OpenMP: Open Multi-Processing
PCM: Pulse Code Modulation
PDA: Personal Digital Assistant
POSIX: Portable Operating System Interface
PRNG: Pseudo-Random Number Generator
RIFF: Resource Interchange File Format
ShiftRows: Shift Rows transformation
SMP: Symmetric Multiprocessing
SPMD: Single Program Multiple Data
SPN: Substitution-Permutation Network
SubByte: Sub Byte transformation
UMA: Uniform Memory Access
WAVE: Waveform Audio File Format
XOR: Exclusive OR

LIST OF FIGURES

(2.1) Add Round Key transformation in AES.
(2.2) Sub Bytes transformation in AES.
(2.3) Shift Rows transformation in AES.
(2.4) The Mix Column transformation in AES.
(2.5) AES diagram for encryption process.
(2.6) Inverse Shift Rows diagram in AES.
(2.7) AES diagram for decryption process.
(2.8) Pulse Code Modulation (PCM).
(2.9) Basic WAVE file layout.
(2.10) Shared and distributed memory models.
(2.11) OpenMP main structure language.
(2.12) Fork/Join model.
(3.1) Possible [4 * 4] key distribution.
(3.2) Multiplication process in modified MixColumn transformation.
(3.3) One byte key example.
(3.4) Permutation function details.
(3.5) Parallel version of the AES Rijndael 128-bit algorithm.
(3.6) Data parallelism.
(3.7) The methodology overview.
(3.8) Overview of the methodology distributed over a multicore system using OpenMP.
(4.1) Four key parts and their inverse matrices.
(4.2) ShiftRow transformation in AES algorithm.
(4.3) The serial version of the encryption algorithm.
(4.4) The parallel version of the encryption algorithm.
(4.5) Performance graph of AES Rijndael encryption algorithm using multiple processors on different file sizes.
(4.6) Performance graph of AES Rijndael decryption algorithm using multiple processors on different file sizes.
(4.7) Performance graph of DK-SBOX-AES encryption algorithm using multiple processors on different file sizes.
(4.8) Performance graph of DK-SBOX-AES decryption algorithm using multiple processors on different file sizes.
(4.9) Performance graph of 4K-MIX-AES encryption algorithm using multiple processors on different file sizes.
(4.10) Performance graph of 4K-MIX-AES decryption algorithm using multiple processors on different file sizes.
(4.11) Performance graph of 5K-SHIFT-AES encryption algorithm using multiple processors on different file sizes.
(4.12) Performance graph of 5K-SHIFT-AES decryption algorithm using multiple processors on different file sizes.
(4.13) Performance graph of improved AES encryption algorithm using multiple processors on different file sizes.
(4.14) Performance graph of improved AES decryption algorithm using multiple processors on different file sizes.
(4.15) Performance graph of PER-MIX-SBOX encryption algorithm using multiple processors on different file sizes.
(4.16) Performance graph of PER-MIX-SBOX decryption algorithm using multiple processors on different file sizes.

LIST OF TABLES

(2.1) The AES S-Box table.
(2.2) The AES inverse S-Box.
(3.1) The multiplicative inverses in Rijndael's Galois Field.
(3.2) AES S-Box for Key=0x85 and C=0x45.
(3.3) Inverse S-Box for Key=0x85 and C=0x45.
(3.4) Details of one byte key.
(4.1) AES S-Box for Key=0x67 and C=0x82.
(4.2) Inverse S-Box for Key=0x67 and C=0x82.
(4.3) AES S-Box for Key=0x4C and C=0xC1.
(4.4) Inverse S-Box for Key=0x4C and C=0xC1.
(4.5) Comparison between AES Rijndael and DK-SBOX-AES algorithms.
(4.6) A comparison between AES Rijndael and 4K-MIX-AES algorithms.
(4.7) A comparison between AES Rijndael and 5K-SHIFT-AES algorithms.
(4.8) A comparison between AES Rijndael and improved AES in time.
(4.9) Comparison between AES Rijndael and improved AES algorithms for different factors.
(4.10) The proposed stream algorithms properties.
(4.11) The encryption time for AES Rijndael implemented on different CPU cores using key size 128 bits.
(4.12) The decryption time for AES Rijndael implemented on different CPU cores using key size 128 bits.
(4.13) Improvement in time obtained for AES Rijndael implemented on different CPU cores using key size 128 bits.
(4.14) The encryption time for DK-SBOX-AES implemented on different CPU cores using key size 128 bits.
(4.15) The decryption time for DK-SBOX-AES implemented on different CPU cores using key size 128 bits.
(4.16) Improvement in time obtained for DK-SBOX-AES implemented on different CPU cores using key size 128 bits.
(4.17) The encryption time for 4K-MIX-AES implemented on different CPU cores using key size 128 bits.
(4.18) The decryption time for 4K-MIX-AES implemented on different CPU cores using key size 128 bits.
(4.19) Improvement in time obtained for 4K-MIX-AES implemented on different CPU cores using key size 128 bits.
(4.20) The encryption time for 5K-SHIFT-AES implemented on different CPU cores using key size 128 bits.
(4.21) The decryption time for 5K-SHIFT-AES implemented on different CPU cores using key size 128 bits.
(4.22) Time saving percentage obtained for 5K-SHIFT-AES implemented on different CPU cores using key size 128 bits.
(4.23) The encryption time for the proposed improved AES algorithm.
(4.24) The decryption time for the proposed improved AES algorithm.
(4.25) Time saving percentage obtained for the proposed improved AES implemented on different CPU cores using key size 128 bits.
(4.26) The comparison between AES Rijndael and the proposed improved AES encryption execution time.
(4.27) The comparison between AES Rijndael and the proposed improved AES decryption execution time.
(4.28) The proposed stream cipher algorithms encryption time.
(4.29) The proposed stream cipher algorithms decryption time.
(A.1) Addition in GF(2^3).
(A.2) Addition inverse in GF(2^3).
(A.3) Multiplication in GF(2^3) with the irreducible polynomial ( ) ( ).
(A.4) Multiplicative inverse in GF(2^3) with the irreducible polynomial ( ) ( ).

LIST OF THE PROPOSED ALGORITHMS

(3.1) The proposed S-Boxes generation.
(3.2) The proposed SubByte Encryption/Decryption Transformation Function (DK-SBOX-AES).
(3.3) The proposed MixColumn Encryption/Decryption Transformation Function (4K-MIX-AES).
(3.4) The proposed ShiftRow Encryption/Decryption Transformation Function (5K-SHIFT-AES).
(3.5) The proposed Improved AES Algorithm.
(3.6) Random key Permutation function in stream cipher.
(3.7) The Parallel AES using OpenMP.
(3.8) The parallel proposed stream cipher algorithm for the wave file.

Chapter One
General Introduction

1.1 Introduction

Rapid developments in science and technology will play a large role in the future security of sending confidential information, an application that is becoming ever more important; it can be said that this is the Information Age. Nowadays, many large organizations and even small commercial businesses depend on stored information, and the security of this information is of top importance. This study deals with the security of audio information, which is a very important issue, especially in communication over networks. Encrypting such information allows it to be stored, sent and retrieved securely by the proper individuals or companies only. As technology develops, however, the ability of individuals to break encryption also increases. Real-time applications need high transfer speed in addition to security; thus, parallel processing is a good solution for accelerating encryption and decryption operations.

1.2 Symmetric Key Encryption Algorithms

In a symmetric key cipher, the sender and receiver use the same key for encryption and decryption. The symmetric key cipher is also called a secret key cipher, because both sender and receiver have to keep the key secret and properly protected. Basically, the security level of a symmetric key cipher depends entirely on how well the users protect the keys: if an intruder learns the key, all data encrypted with that key can be decrypted. This makes matters more complicated when symmetric keys must be shared and, if necessary, updated in practice. Symmetric key methods can be classified into two groups, namely block and stream ciphers [Sta 11]. In June 2003, the United States Government announced that the AES (Advanced Encryption Standard) block cipher may be used to protect classified information: all AES key lengths, including 128 bits, are sufficient up to the SECRET level, while TOP SECRET information requires the 192- or 256-bit key lengths [Gra 10]. AES is a recent block cipher that encrypts and decrypts data in blocks of 128 bits. For key sizes of 128, 192 or 256 bits it uses 10, 12 or 14 rounds, respectively; with a 128-bit key, for example, AES encrypts each 128-bit block by performing 10 rounds, each with a different round key generated by the AES key expansion algorithm. In practice, an encryption algorithm is considered secure as long as no feasible cryptanalysis of it is known [Sri 12].
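The relationship between key size and number of rounds stated above follows a simple rule in the AES specification (FIPS-197): with Nk the key length in 32-bit words, the cipher uses Nr = Nk + 6 rounds and needs Nr + 1 round keys, so the key expansion produces 4(Nr + 1) words. The Python sketch below only tabulates these parameters; the function name aes_parameters is illustrative and not part of the thesis code.

```python
def aes_parameters(key_bits: int) -> dict:
    """Return the FIPS-197 parameters for an AES key size (block size fixed at 128 bits)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key size must be 128, 192 or 256 bits")
    nb = 4                  # State columns: 32-bit words per 128-bit block
    nk = key_bits // 32     # key length in 32-bit words: 4, 6 or 8
    nr = nk + 6             # number of rounds: 10, 12 or 14
    return {
        "Nk": nk,
        "Nr": nr,
        "round_keys": nr + 1,             # initial AddRoundKey plus one per round
        "expanded_words": nb * (nr + 1),  # size of the expanded key schedule in words
    }

for bits in (128, 192, 256):
    p = aes_parameters(bits)
    print(bits, p["Nr"], p["expanded_words"])
# 128 10 44
# 192 12 52
# 256 14 60
```

The extra round key explains why a 10-round AES-128 still needs 11 round keys: one is consumed by the AddRoundKey applied before the first round.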

1.3 Accelerating Symmetric Key Cipher using OpenMP Parallel Processing

In the 1970s, vector and parallel computers were evolving. Programming assistance was provided through language extensions, first to Fortran and then to C, in the form of directives and pragmas, respectively. The 1980s were a golden era of parallel architectural evolution, with many people writing parallel programs; extensions again diverged, and programming needs grew [Cha 09]. In October 1997, the Architecture Review Board (ARB) published the first OpenMP API specification, OpenMP Fortran 1.0, and released the C/C++ standard the following year. Well-designed parallel programming has long been recognized as the way to obtain high-performance scientific computing. The simulation of scientific problems is essential in the natural and engineering sciences, and such simulations need ever greater computing power and memory. Over the last decades, high-performance research has followed new progress in parallel software and hardware technologies [Rau 10]. OpenMP is a high-level programming model for parallel computation. When it is embedded in other applications it has many advantages; for example, converting a sequential program into a parallel version requires minimal effort. These benefits reduce time and costs and increase programmer productivity. OpenMP may be developed further to allow embedded application developers to exploit multiple levels of parallelism [Cha 09].

1.4

Problem Statement

There are many real-time encryption algorithms that encrypt data in WAVE audio files. Real-time applications need high-speed information transfer in addition to security. Since no successful attack against AES has been recognized since 2004, the present work takes the AES algorithm as the basis for developing and improving other algorithms in both block and stream cipher systems. Parallel processing is used to accelerate the manipulation of information or data for both the sender and the receiver on a stand-alone personal computer.

1.5

Literature Survey

Many researchers have made the AES block cipher, stream cipher encryption and OpenMP directives the focus of their work. Some of the published works are the following:

• Hosseinkhani and Javadi (2012) introduced a new algorithm that generates a dynamic S-Box from the cipher key. The S-Box used in AES is fixed and cannot be changed; generating it dynamically increases the cryptographic strength of the AES cipher system. The quality of the algorithm was tested by changing only two bits of the cipher key to generate new S-Boxes.
• Yacob (2012) proposed a symmetric dual key dynamic block algorithm (SDD) for partial encryption of digital video. The algorithm meets real-time requirements with a high level of complexity at a considerable speed: SDD is 13 times faster than AES for encryption and 9 times faster for decryption.
• Barnes et al. (2012) took a sequential program that implements the AES algorithm and converted it to run on multicore architectures with minimal effort. Two different parallel programs were implemented, one using the fork system call in Linux and the other using Pthreads, the POSIX standard for threads.
• Das et al. (2012) devised a new algorithm that generates a random S-Box and its inverse by using different irreducible polynomials in the finite field GF(2^8), whereas the AES standard uses only one fixed polynomial, $m(x) = x^8 + x^4 + x^3 + x + 1$, to find multiplicative inverses. This polynomial is well known to any attacker. To overcome this problem, the proposed algorithm uses a different irreducible polynomial in GF(2^8) every time and sends it to the receiver together with the secret key, raising the security of the cipher operations.
• Lambić and Živković (2013) applied an alternative S-box generation method that forms compositions of permutations from some fixed sets. After these sets are chosen, output S-boxes are obtained by making various compositions of the starting S-boxes; the sequence of indices of the starting S-boxes used is key-controlled.
• Navalgund et al. (2013) proposed an optimized parallel AES algorithm implemented on shared memory architecture systems. The algorithm uses two parallelization techniques: the first is data-level parallelization and the second is control-level parallelization. It is executed on a multicore system, the program is written in C, and the OpenMP standard is embedded into the program to obtain the full benefit of parallelization. The results show very attractive performance-to-effort ratios for OpenMP.
• Nagendra and Sekhar (2014) implemented the AES algorithm using OpenMP to parallelize it. The OpenMP standard directives are embedded into an existing program to produce a new parallel version of that program. The parallel AES is implemented on a dual-core (Intel Core 2 Duo) system. The results show that the time is about 40-45% less than that of the sequential AES for both the encryption and the decryption process.

1.6

Aims of the Thesis

This thesis aims to provide secure broadcasting in real-time applications for one specific type of information, namely audio files. Different architectures, methods and techniques are proposed, in both block and stream cipher structures, to increase the complexity of the cryptographic operations. For fast execution and high utilization of resources, parallel computing using OpenMP is implemented to reduce the execution time of the proposed algorithms. OpenMP is used to speed up the AES algorithm and the other improved algorithms, in both block and stream cipher, on a stand-alone personal computer.

1.7

Thesis Structure

The thesis is organized into five chapters. Chapter Two is concerned with the mathematical concepts of finite fields used in the proposed encryption and decryption algorithms. The AES algorithm is described, covering both its encryption and decryption transformation functions (AddRoundKey, SubByte, MixColumn and ShiftRows). The chapter also gives a brief description of the fundamentals and structure of digital wave files, which is useful for understanding the practical chapter. Finally, it describes the OpenMP directives used to implement the proposed algorithms in parallel. Chapter Three explains the proposed algorithms and methodologies for the AES improvement functions, introduces the proposed stream cipher algorithms, and shows how the sequential versions of the proposed algorithms are modified into parallel versions. Chapter Four implements the algorithms presented in Chapter Three and demonstrates the results obtained for both the sequential and the parallel versions. Chapter Five lists the conclusions drawn from the analysis of the test results and presents some proposals for future work.

Chapter Two

Theoretical Background

2.1 Introduction

Efficient implementation of cryptographic algorithms has been the focus of major research efforts over the last two decades. Different metrics, such as implementation space (silicon area, code size), execution time (speed), memory usage and power usage/energy consumption, are used to measure the performance of a design or implementation quantitatively. The majority of cryptographic algorithms use arithmetic operations on finite mathematical structures such as finite multiplicative rings, groups and finite fields [Sav 10]. Block and stream cipher algorithms are presented in this chapter. In October 2000, the National Institute of Standards and Technology (NIST) announced that the Rijndael algorithm had been selected for the Advanced Encryption Standard (AES). The symmetric key AES algorithm is currently the most common and widely used algorithm for sending information securely [Nal 07]. The parallelization of the AES (Rijndael) algorithm is presented along with a description of the parallelization tools exploited: data dependence analysis of loops and appropriate loop transformations were applied to parallelize the sequential algorithm, the OpenMP standard was chosen to represent the parallelism of the AES algorithm, and speedup measurements for the parallel program are given [Bie 05].


2.2 Finite Fields

Finite fields are fields with only finitely many elements. They are also called Galois Fields (GF), in honor of Evariste Galois (1811-1832), who discovered many of their fundamental properties in his study of the roots of polynomials. Many cryptographic algorithms are based on finite field arithmetic (e.g., Diffie and Hellman, 1976; ElGamal, 1985; Miller, 1986; Kravitz, 1993), including the Advanced Encryption Standard (AES) [NIST 01]. To understand the AES algorithm and some other modern cryptosystems, it is necessary to understand a little about finite fields. For a long time, the theory of finite fields was considered a branch of mathematics of purely theoretical interest; following the dawn of computers, however, practical applications were found, e.g., in error-correcting codes and cryptography [Yac 12]. A field is more than just a set of elements: it is a set of elements with two operations, called addition and multiplication, together with a set of properties governing these operations. Addition and multiplication also imply inverse operations called subtraction and division: subtraction is the inverse of addition, and division is the inverse of multiplication [Lid 00]. To understand the encryption algorithms and methods presented in this study, some mathematical concepts of finite fields must be reviewed, so a brief but sufficient coverage of the relevant abstract algebra is provided. The polynomial representation of finite field elements, which is used throughout this work, is also introduced. Further details about mathematical operations in finite fields are presented in Appendix A.
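As a concrete illustration of the field operations described above, the following Python sketch implements addition, multiplication and inversion in GF(2^8) with the AES irreducible polynomial $m(x) = x^8 + x^4 + x^3 + x + 1$ (hexadecimal 0x11B). The function names are illustrative assumptions, not the thesis code. The checked values {57} + {83} = {D4} and {57} · {83} = {C1} are the worked examples given in FIPS-197. The modulus is left as a parameter so that another irreducible polynomial could be tried, but 0x11B is the only one AES uses.

```python
AES_MODULUS = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1


def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^8) is bitwise XOR of the coefficients."""
    return a ^ b


def gf_mul(a: int, b: int, modulus: int = AES_MODULUS) -> int:
    """Multiply two bytes as polynomials over GF(2), reducing modulo m(x)."""
    result = 0
    while b:
        if b & 1:            # current coefficient of b is 1: add a to the result
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:        # degree reached 8: subtract (XOR) the modulus
            a ^= modulus
    return result


def gf_inv(a: int, modulus: int = AES_MODULUS) -> int:
    """Multiplicative inverse as a^254 (the nonzero elements form a group of
    order 255, valid only for an irreducible modulus); by convention inv(0) = 0."""
    if a == 0:
        return 0
    result, base, exp = 1, a, 254
    while exp:               # square-and-multiply exponentiation
        if exp & 1:
            result = gf_mul(result, base, modulus)
        base = gf_mul(base, base, modulus)
        exp >>= 1
    return result


print(hex(gf_add(0x57, 0x83)))  # 0xd4
print(hex(gf_mul(0x57, 0x83)))  # 0xc1
print(hex(gf_inv(0x53)))        # 0xca
```

Exponentiation is used for the inverse only because it is short; a lookup table or the extended Euclidean algorithm would be faster in a real implementation.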


2.3 Symmetric Key Encryption

In cipher systems that use symmetric keys, both parties (transmitter and receiver) use an identical key in the cipher operations (encryption and decryption). "Secret key" is another name for a symmetric key cipher, since the key must remain secret and be carefully protected by both the transmitter and the receiver. Essentially, the security level of a symmetric key cipher depends entirely on how the users protect the key. If an intruder identifies the key, then all information or data encrypted with that key can be decrypted. This makes protection more complicated when symmetric keys must be updated and shared in practice. Symmetric key methods can be categorized into two sets: block ciphers and stream ciphers [Sta 11].

1. Block Ciphers: The main property of a block cipher is that it applies a fixed transformation to a fixed-length group of bits called a block. The block length can be, for example, 64, 128 or 256 bits. A block cipher that takes a 128-bit block of plaintext as input yields a 128-bit block of ciphertext as output, and the precise transformation depends on the secret key. Similarly, for decryption the algorithm takes a 128-bit block of ciphertext together with the secret key and produces the original 128-bit block of plaintext. To encrypt messages longer than the block size, the message is divided into blocks that are encrypted under a mode of operation. Common modes of operation for block ciphers are cipher-block chaining (CBC), output feedback (OFB), cipher feedback (CFB) and electronic codebook (ECB). Rijndael and DES are examples of algorithms built on this idea [Sch 96].


2. Stream Ciphers: A stream cipher is an important encryption method in which the plaintext is encrypted bit by bit or symbol by symbol to yield the corresponding ciphertext. A stream cipher can be built by generating a pseudo-random keystream, for example from the output of a block cipher, and XORing it with the plaintext to yield the ciphertext at the sender side. At the receiver side, the plaintext is recovered by generating the same keystream and XORing it with the ciphertext. Stream ciphers can be used for high-speed networks at the physical layer of a communication system [Zha 07]. A stream cipher is well suited to real-time systems because it removes the need to pad a message to an integral number of blocks; each character in the transmitted stream can be encrypted and transmitted immediately by using a character-oriented stream cipher. One requirement of a stream cipher is that the ciphertext must be the same length as the plaintext; thus, if 8-bit characters are being sent, each character should be encrypted to yield an 8-bit ciphertext output [Suo 10].
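The two cipher families above can be contrasted in a short Python sketch. Because the Python standard library has no AES, the sketch relies on two clearly labeled stand-ins: a hypothetical toy_block_encrypt that simply XORs a 16-byte block with the key (insecure, used only so that ECB and CBC chaining can be shown), and a keystream derived from SHA-256 over key, nonce and counter in place of a block cipher's output. It shows that ECB maps equal plaintext blocks to equal ciphertext blocks while CBC does not, and that the stream cipher's ciphertext has exactly the plaintext's length.

```python
import hashlib

BLOCK = 16  # 128-bit blocks, as in AES


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    """HYPOTHETICAL stand-in for a real block cipher such as AES.
    XOR with the key is NOT secure; it only keeps this sketch self-contained."""
    return xor_bytes(block, key)


toy_block_decrypt = toy_block_encrypt  # XOR with the same key is its own inverse


def ecb_encrypt(pt: bytes, key: bytes) -> bytes:
    """ECB: every block is encrypted independently under the same key.
    Assumes len(pt) is a multiple of BLOCK (no padding in this sketch)."""
    assert len(pt) % BLOCK == 0
    return b"".join(toy_block_encrypt(pt[i:i + BLOCK], key)
                    for i in range(0, len(pt), BLOCK))


def cbc_encrypt(pt: bytes, key: bytes, iv: bytes) -> bytes:
    """CBC: each plaintext block is XORed with the previous ciphertext block
    (the IV for the first block) before being encrypted."""
    assert len(pt) % BLOCK == 0
    prev, out = iv, []
    for i in range(0, len(pt), BLOCK):
        prev = toy_block_encrypt(xor_bytes(pt[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(ct: bytes, key: bytes, iv: bytes) -> bytes:
    """Undo CBC: decrypt each block, then XOR with the previous ciphertext block."""
    prev, out = iv, []
    for i in range(0, len(ct), BLOCK):
        block = ct[i:i + BLOCK]
        out.append(xor_bytes(toy_block_decrypt(block, key), prev))
        prev = block
    return b"".join(out)


def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """n pseudo-random bytes from SHA-256(key || nonce || counter); this counter
    construction stands in for the output of a block cipher."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def stream_xor(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """Stream cipher: ciphertext = plaintext XOR keystream; applying it again decrypts."""
    return xor_bytes(data, keystream(key, nonce, len(data)))


key = bytes(range(16))
iv = bytes([0xA5] * 16)
pt = b"SAMPLE-BLOCK-16B" * 2          # two identical 16-byte blocks
ecb_ct = ecb_encrypt(pt, key)
cbc_ct = cbc_encrypt(pt, key, iv)
print(ecb_ct[:16] == ecb_ct[16:])     # True: ECB reveals the repeated block
print(cbc_ct[:16] == cbc_ct[16:])     # False: CBC chaining hides it

audio = bytes([0x00, 0x7F, 0x80, 0xFF, 0x10])  # a few 8-bit PCM sample bytes
nonce = b"nonce-01"
ct = stream_xor(audio, key, nonce)
print(len(ct) == len(audio), stream_xor(ct, key, nonce) == audio)  # True True
```

ECB is included only for contrast: with audio data, identical blocks (for instance runs of silence) would produce identical ciphertext blocks under ECB, which chaining or a keystream avoids.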

2.4 Advanced Encryption Standard (AES) Because of great developments in science and technology and the growing need to exchange information through communication channels over networks, DES is no longer adequate for such demands. In 1997, NIST announced the development of a new standard, which became known as AES.

The Advanced Encryption Standard is also known as Rijndael [NIST 01]. The AES algorithm is composed of four transformation functions: AddRoundKey, substitution bytes (SubBytes), MixColumns and shift rows (ShiftRows). AES is an iterative algorithm whose number of rounds depends on the key length: for key sizes of 128, 192 and 256 bits, the number of rounds is ten, twelve and fourteen, respectively. Each plaintext block encrypted by AES is 128 bits long and is represented as a 4×4 square matrix of bytes called the State matrix. Most AES operations are performed over the finite field GF(2^8) [Nal 07].
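The State layout and the round count described above can be sketched in C as follows. Following the FIPS-197 convention, which this section does not spell out, the 16 input bytes fill the State column by column. The function names aes_rounds and load_state are hypothetical helpers, not part of any library.

```c
#include <stdint.h>

/* Number of rounds Nr for a given key length in bits (10, 12 or 14);
 * returns -1 for an unsupported length. Equivalent to Nr = Nk + 6,
 * where Nk is the key length in 32-bit words. */
int aes_rounds(int key_bits)
{
    switch (key_bits) {
    case 128: return 10;
    case 192: return 12;
    case 256: return 14;
    default:  return -1;
    }
}

/* Load a 16-byte input block into the 4x4 State. The bytes fill the
 * State column by column: state[r][c] = in[r + 4c]. */
void load_state(const uint8_t in[16], uint8_t state[4][4])
{
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            state[r][c] = in[r + 4 * c];
}
```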

2.4.1 AES Transformation Functions The AES encryption algorithm has four basic transformations: AddRoundKey, SubBytes, ShiftRows and MixColumns. 1. AddRoundKey is the simplest transformation to understand: it is a bitwise XOR operation in which the 128-bit State matrix is XORed with a 128-bit round key [Sta 12]. Figure 2.1 shows the State matrix being XORed with the current round key to yield the output.

Figure 2.1: Add Round Key transformation in AES [WebSite 1].
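A minimal C sketch of AddRoundKey, assuming (as a hypothetical layout choice) that the State and the round key are both stored as 16 bytes in column order:

```c
#include <stdint.h>

/* AddRoundKey: XOR every byte of the State with the matching byte of the
 * 128-bit round key (both stored column by column as 16 bytes). */
void add_round_key(uint8_t state[16], const uint8_t round_key[16])
{
    for (int i = 0; i < 16; i++)
        state[i] ^= round_key[i];
}
```

Because XOR is its own inverse, applying the function twice with the same round key restores the original State, which is why the same transformation also serves in decryption.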

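2. SubBytes transformation: This is a nonlinear substitution function. Each byte of the State matrix is processed independently by replacing its value with the corresponding entry of the S-Box table. The S-Box is constructed in two steps: first, each byte is mapped to its multiplicative inverse in the finite field GF(2^8), with multiplication performed modulo the irreducible polynomial m(x) = x^8 + x^4 + x^3 + x + 1 (the byte {00} is mapped to itself); second, an affine transformation is applied [Sah 12]. SubBytes implements the following equation, where b_0 to b_7 are the bits of the multiplicative inverse (b_0 being the least significant bit), b'_0 to b'_7 are the bits of the output byte, and the constant vector is the byte {63}:

$$
\begin{bmatrix} b'_0 \\ b'_1 \\ b'_2 \\ b'_3 \\ b'_4 \\ b'_5 \\ b'_6 \\ b'_7 \end{bmatrix}
=
\begin{bmatrix}
1&0&0&0&1&1&1&1\\
1&1&0&0&0&1&1&1\\
1&1&1&0&0&0&1&1\\
1&1&1&1&0&0&0&1\\
1&1&1&1&1&0&0&0\\
0&1&1&1&1&1&0&0\\
0&0&1&1&1&1&1&0\\
0&0&0&1&1&1&1&1
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \end{bmatrix}
\oplus
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
\qquad (2.1)
$$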

Figure 2.2 shows the SubBytes transformation. In the implementation, this process is carried out as a lookup in the S-Box table shown in Table 2.1.

Table (2.1): The AES S-Box [Sta 12]. Row X is the high hexadecimal digit and column Y the low hexadecimal digit of the input byte.

X\Y  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0    63 7C 77 7B F2 6B 6F C5 30 01 67 2B FE D7 AB 76
1    CA 82 C9 7D FA 59 47 F0 AD D4 A2 AF 9C A4 72 C0
2    B7 FD 93 26 36 3F F7 CC 34 A5 E5 F1 71 D8 31 15
3    04 C7 23 C3 18 96 05 9A 07 12 80 E2 EB 27 B2 75
4    09 83 2C 1A 1B 6E 5A A0 52 3B D6 B3 29 E3 2F 84
5    53 D1 00 ED 20 FC B1 5B 6A CB BE 39 4A 4C 58 CF
6    D0 EF AA FB 43 4D 33 85 45 F9 02 7F 50 3C 9F A8
7    51 A3 40 8F 92 9D 38 F5 BC B6 DA 21 10 FF F3 D2
8    CD 0C 13 EC 5F 97 44 17 C4 A7 7E 3D 64 5D 19 73
9    60 81 4F DC 22 2A 90 88 46 EE B8 14 DE 5E 0B DB
A    E0 32 3A 0A 49 06 24 5C C2 D3 AC 62 91 95 E4 79
B    E7 C8 37 6D 8D D5 4E A9 6C 56 F4 EA 65 7A AE 08
C    BA 78 25 2E 1C A6 B4 C6 E8 DD 74 1F 4B BD 8B 8A
D    70 3E B5 66 48 03 F6 0E 61 35 57 B9 86 C1 1D 9E
E    E1 F8 98 11 69 D9 8E 94 9B 1E 87 E9 CE 55 28 DF
F    8C A1 89 0D BF E6 42 68 41 99 2D 0F B0 54 BB 16


Figure 2.2: Sub Bytes transformation in AES [WebSite 1].
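The two-step construction behind Equation 2.1 can be checked with a short C sketch that regenerates Table 2.1 at run time. The multiplicative inverse is found by brute-force search, which is simple but slow, so this only illustrates the construction; the function names are hypothetical. The affine map is written as XORs of rotated copies of the inverse plus the constant {63}, which is equivalent to the matrix form of Equation 2.1.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;                                        /* add this partial product */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00)); /* a *= x, reduced by m(x) */
        b >>= 1;
    }
    return p;
}

/* Rotate an 8-bit value left by n bits (0 < n < 8). */
static uint8_t rotl8(uint8_t v, int n)
{
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* Build the S-Box: multiplicative inverse of each byte ({00} maps to {00}),
 * followed by the affine transformation of Eq. (2.1). */
void build_sbox(uint8_t sbox[256])
{
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;
        for (int y = 1; y < 256 && x != 0; y++) {
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) {
                inv = (uint8_t)y;
                break;
            }
        }
        sbox[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                            rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}
```

For example, {53} has the inverse {CA}, and the affine map sends {CA} to {ED}, which agrees with row 5, column 3 of Table 2.1.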

3. ShiftRows transformation: This is a permutation function in which each row of the State matrix is cyclically shifted by a different offset. The first (top) row of the State matrix is left unchanged, while the second, third and fourth (bottom) rows are cyclically shifted to the left by one, two and three bytes, respectively [Dae 03]. Shifting the rows by different offsets spreads the four bytes of each column over four different columns, so that after ShiftRows every column contains one byte from each of the four input columns [Gra 10]. Figure 2.3 demonstrates this operation.

Figure 2.3: ShiftRow Transformation in AES Algorithm [Dae 03].
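A C sketch of ShiftRows, assuming the State is stored as 16 bytes in column order (s[r + 4c]) as in FIPS-197; the function name is illustrative:

```c
#include <stdint.h>
#include <string.h>

/* ShiftRows on a State stored column by column (s[r + 4c]).
 * Row r is rotated left by r positions: s'[r][c] = s[r][(c + r) mod 4]. */
void shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];
    memcpy(s, t, sizeof t);
}
```

For the input bytes 00, 01, ..., 0F, the output is 00 05 0A 0F 04 09 0E 03 08 0D 02 07 0C 01 06 0B: each output column now holds one byte from each of the four input columns.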


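4. MixColumns transformation: MixColumns is one of the more complex functions that make up AES. It is applied to the State matrix column by column. Each column is treated as a four-term polynomial over GF(2^8) and multiplied modulo x^4 + 1 by the fixed polynomial a(x) = {03}x^3 + {01}x^2 + {01}x + {02}. This multiplication can be written as the following matrix multiplication over GF(2^8), also shown in Figure 2.4, where the matrix entries are hexadecimal bytes and s_{r,c} denotes the byte in row r and column c of the State:

$$
\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01\\
01 & 02 & 03 & 01\\
01 & 01 & 02 & 03\\
03 & 01 & 01 & 02
\end{bmatrix}
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix},
\quad 0 \le c < 4
\qquad (2.2)
$$

The transformation must be reversible so that the data can be decrypted again [Sta 12].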

Figure (2.4): The Mix Column transformation in AES [WebSite 1].

In AES, MixColumns is the most computationally expensive transformation: the input matrix is multiplied over GF(2^8) by an MDS matrix (a Maximum Distance Separable matrix, i.e., a matrix representing a function with diffusion properties that are useful in cryptography) [Saj 13].
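Equation 2.2 needs only multiplications by {02} and {03}, so MixColumns is usually implemented with a helper commonly called xtime instead of a general GF(2^8) multiplier. A minimal C sketch for one column (the names are illustrative):

```c
#include <stdint.h>

/* Multiply by {02} in GF(2^8) (the "xtime" operation): shift left and
 * reduce by m(x) = x^8 + x^4 + x^3 + x + 1 when the top bit overflows. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* MixColumns on one column: multiply by the matrix of Eq. (2.2).
 * {03}*a is computed as xtime(a) ^ a; addition is XOR. */
void mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3);
    col[3] = (uint8_t)(xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3));
}
```

For example, the column (DB, 13, 53, 45) is mapped to (8E, 4D, A1, BC), a commonly used test vector for MixColumns.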


2.4.2 AES Encryption The AES encryption process is carried out as follows. It starts by adding (XORing) the first round key to the input data. Next, a fixed number of rounds of a substitution-permutation network (SPN) are applied. Each round consists of the four transformations, except the final round, which omits MixColumns [Jun 05]. Figure 2.5 shows the encryption process in detail.

Figure (2.5): AES diagram for encryption process [Gra 10].
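To make the round structure of Figure 2.5 concrete, the following self-contained C sketch encrypts a single block with AES-128 (Nr = 10). It is a compact teaching implementation, not the code developed in this thesis, and it is not hardened for production use: it is not constant-time, and it rebuilds the S-Box on every call. All helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint8_t sbox[256];

/* Multiply by {02} in GF(2^8), reducing by m(x) = x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t a) { return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00)); }

/* General multiplication in GF(2^8) by repeated xtime. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (; b; b >>= 1, a = xtime(a))
        if (b & 1) p ^= a;
    return p;
}

/* S-Box: multiplicative inverse (brute force), then the affine map of Eq. (2.1). */
static void build_sbox(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t v = 0, s;
        for (int y = 1; y < 256 && x != 0; y++)
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) { v = (uint8_t)y; break; }
        s = v;
        for (int k = 1; k <= 4; k++)
            s ^= (uint8_t)((v << k) | (v >> (8 - k)));
        sbox[x] = (uint8_t)(s ^ 0x63);
    }
}

/* AES-128 key expansion: 11 round keys of 16 bytes (176 bytes in total). */
static void expand_key(const uint8_t key[16], uint8_t rk[176])
{
    uint8_t rcon = 0x01;
    memcpy(rk, key, 16);
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4] = { rk[i - 4], rk[i - 3], rk[i - 2], rk[i - 1] };
        if (i % 16 == 0) {                 /* RotWord, SubWord, XOR Rcon */
            uint8_t first = t[0];
            t[0] = (uint8_t)(sbox[t[1]] ^ rcon);
            t[1] = sbox[t[2]];
            t[2] = sbox[t[3]];
            t[3] = sbox[first];
            rcon = xtime(rcon);
        }
        for (int j = 0; j < 4; j++)
            rk[i + j] = (uint8_t)(rk[i - 16 + j] ^ t[j]);
    }
}

static void add_round_key(uint8_t s[16], const uint8_t *k) { for (int i = 0; i < 16; i++) s[i] ^= k[i]; }
static void sub_bytes(uint8_t s[16]) { for (int i = 0; i < 16; i++) s[i] = sbox[s[i]]; }

/* State stored column by column: s[r + 4c]. Row r rotates left by r. */
static void shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];
    memcpy(s, t, 16);
}

/* Each column multiplied by the matrix of Eq. (2.2); {03}a = xtime(a) ^ a. */
static void mix_columns(uint8_t s[16])
{
    for (int c = 0; c < 4; c++) {
        uint8_t *p = s + 4 * c;
        uint8_t a0 = p[0], a1 = p[1], a2 = p[2], a3 = p[3];
        p[0] = (uint8_t)(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3);
        p[1] = (uint8_t)(a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3);
        p[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3);
        p[3] = (uint8_t)(xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3));
    }
}

/* Encrypt one 16-byte block with a 128-bit key (Nr = 10 rounds). */
void aes128_encrypt(const uint8_t key[16], const uint8_t in[16], uint8_t out[16])
{
    uint8_t rk[176], s[16];
    build_sbox();
    expand_key(key, rk);
    memcpy(s, in, 16);
    add_round_key(s, rk);                  /* initial AddRoundKey */
    for (int round = 1; round <= 10; round++) {
        sub_bytes(s);
        shift_rows(s);
        if (round < 10)                    /* the final round omits MixColumns */
            mix_columns(s);
        add_round_key(s, rk + 16 * round);
    }
    memcpy(out, s, 16);
}
```

With the key 000102030405060708090A0B0C0D0E0F and the plaintext 00112233445566778899AABBCCDDEEFF from the FIPS-197 example, the expected ciphertext is 69C4E0D86A7B0430D8CDB78070B4C55A.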

2.4.3 AES Decryption Similarly, the AES decryption algorithm works by applying the inverses of all the transformations described in the previous section.


AddRoundKey is exactly the same transformation as the one used in the encryption process, because XOR is its own inverse: XORing the State with the same round key again restores the previous State. Hence the same AddRoundKey operation is used in both encryption and decryption. InverseShiftRows is, as its name implies, the inverse of ShiftRows: the top row of the State is left unchanged, while the second, third and fourth rows are cyclically shifted one, two and three bytes to the right, respectively [Gra 10]. A diagram of this operation is shown in Figure 2.6.

Figure (2.6): Inverse Shift Rows diagram in AES [NIST 01].
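InverseShiftRows only reverses the direction of the rotation. A C sketch, assuming the State is stored as 16 bytes in column order (s[r + 4c]); the function name is illustrative:

```c
#include <stdint.h>
#include <string.h>

/* InverseShiftRows on a State stored column by column (s[r + 4c]).
 * Row r is rotated right by r positions, undoing ShiftRows:
 * the byte in column c moves to column (c + r) mod 4. */
void inv_shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * ((c + r) % 4)] = s[r + 4 * c];
    memcpy(s, t, sizeof t);
}
```

Applied to 00 05 0A 0F 04 09 0E 03 08 0D 02 07 0C 01 06 0B (the ShiftRows output for the bytes 00 to 0F), it returns 00, 01, ..., 0F.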

InverseSubBytes uses the inverse S-Box shown in Table 2.2. It is a one-to-one mapping that reverses the SubBytes function used in the encryption process, so Tables 2.1 and 2.2 are inverses of each other [Gra 10].


Table (2.2): The AES Inverse S-Box [Sta 12]. Row X is the high hexadecimal digit and column Y the low hexadecimal digit of the input byte.

X\Y  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0    52 09 6A D5 30 36 A5 38 BF 40 A3 9E 81 F3 D7 FB
1    7C E3 39 82 9B 2F FF 87 34 8E 43 44 C4 DE E9 CB
2    54 7B 94 32 A6 C2 23 3D EE 4C 95 0B 42 FA C3 4E
3    08 2E A1 66 28 D9 24 B2 76 5B A2 49 6D 8B D1 25
4    72 F8 F6 64 86 68 98 16 D4 A4 5C CC 5D 65 B6 92
5    6C 70 48 50 FD ED B9 DA 5E 15 46 57 A7 8D 9D 84
6    90 D8 AB 00 8C BC D3 0A F7 E4 58 05 B8 B3 45 06
7    D0 2C 1E 8F CA 3F 0F 02 C1 AF BD 03 01 13 8A 6B
8    3A 91 11 41 4F 67 DC EA 97 F2 CF CE F0 B4 E6 73
9    96 AC 74 22 E7 AD 35 85 E2 F9 37 E8 1C 75 DF 6E
A    47 F1 1A 71 1D 29 C5 89 6F B7 62 0E AA 18 BE 1B
B    FC 56 3E 4B C6 D2 79 20 9A DB C0 FE 78 CD 5A F4
C    1F DD A8 33 88 07 C7 31 B1 12 10 59 27 80 EC 5F
D    60 51 7F A9 19 B5 4A 0D 2D E5 7A 9F 93 C9 9C EF
E    A0 E0 3B 4D AE 2A F5 B0 C8 EB BB 3C 83 53 99 61
F    17 2B 04 7E BA 77 D6 26 E1 69 14 63 55 21 0C 7D
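Table 2.2 can be regenerated by undoing the two steps of SubBytes in reverse order. The first step is the inverse of the affine transformation, which can be written as b = rotl(s,1) ⊕ rotl(s,3) ⊕ rotl(s,6) ⊕ {05}, and the second is the multiplicative inverse in GF(2^8). The following C sketch is for illustration only: it finds the inverse by brute force, and its function names are hypothetical.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Rotate an 8-bit value left by n bits (0 < n < 8). */
static uint8_t rotl8(uint8_t v, int n)
{
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* Build the inverse S-Box: undo the affine transformation first, then take
 * the multiplicative inverse in GF(2^8), with {00} mapped to {00}. */
void build_inv_sbox(uint8_t inv_sbox[256])
{
    for (int s = 0; s < 256; s++) {
        uint8_t b = (uint8_t)(rotl8((uint8_t)s, 1) ^ rotl8((uint8_t)s, 3) ^
                              rotl8((uint8_t)s, 6) ^ 0x05);
        uint8_t inv = 0;
        for (int y = 1; y < 256 && b != 0; y++) {
            if (gf_mul(b, (uint8_t)y) == 1) {
                inv = (uint8_t)y;
                break;
            }
        }
        inv_sbox[s] = inv;
    }
}
```

For instance, the sketch yields {52} for the input {00} and {C2} for the input {25}. These are the row 0, column 0 and row 2, column 5 entries of the inverse S-Box.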

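InverseMixColumns: In this transformation, each column of the State is multiplied modulo x^4 + 1 by the fixed polynomial of equation 2.3, which is the inverse of the MixColumns polynomial [Man 04]:

$$
a^{-1}(x) = \{0B\}x^3 + \{0D\}x^2 + \{09\}x + \{0E\}
\qquad (2.3)
$$

This corresponds to the following matrix multiplication over GF(2^8) [Sta 12]:

$$
\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix}
=
\begin{bmatrix}
0E & 0B & 0D & 09\\
09 & 0E & 0B & 0D\\
0D & 09 & 0E & 0B\\
0B & 0D & 09 & 0E
\end{bmatrix}
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix},
\quad 0 \le c < 4
\qquad (2.4)
$$

The decryption process applies the inverse transformations in the reverse order of the encryption process. Figure 2.7 demonstrates the decryption operation.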


Figure (2.7): AES Diagram for decryption process [Gra 10].
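InverseMixColumns needs multiplications by {09}, {0B}, {0D} and {0E}, so a general GF(2^8) multiplier is convenient here. A C sketch for one column, implementing the matrix of Equation 2.4 (the names are illustrative):

```c
#include <stdint.h>

/* General multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* InverseMixColumns on one column: multiply by the matrix of Eq. (2.4);
 * the sum over k is an XOR accumulation. */
void inv_mix_column(uint8_t col[4])
{
    static const uint8_t m[4][4] = {
        {0x0E, 0x0B, 0x0D, 0x09},
        {0x09, 0x0E, 0x0B, 0x0D},
        {0x0D, 0x09, 0x0E, 0x0B},
        {0x0B, 0x0D, 0x09, 0x0E},
    };
    uint8_t in[4] = { col[0], col[1], col[2], col[3] };
    for (int r = 0; r < 4; r++) {
        uint8_t acc = 0;
        for (int k = 0; k < 4; k++)
            acc ^= gf_mul(m[r][k], in[k]);
        col[r] = acc;
    }
}
```

Applied to the column (8E, 4D, A1, BC), it returns (DB, 13, 53, 45), which undoes MixColumns on that column.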

2.4.4 Operational Modes for Security Although AES provides considerable security for transferring large amounts of data over a network, its security also depends on the mode of operation used for encipherment. In practice, a message may be large or of variable size, often much larger than the block size of the encryption method. Dworkin (2001) specified five modes of operation for encrypting such data. These modes are as follows:


• CBC (Cipher Block Chaining mode).
• CFB (Cipher FeedBack mode).
• CTR (Counter mode).
• ECB (Electronic Codebook mode).
• OFB (Output FeedBack mode).

At present, several researchers have suggested implementing parallel AES on the Compute Unified Device Architecture (CUDA) to achieve real-time encipherment. ECB mode can be used for parallel encipherment, as described by Di Biagio et al. (2009), Daniel and Mircea (2011) and Tran et al. (2011) [Sri 12]. In ECB mode, each plaintext block is encrypted individually, so different blocks can be encrypted concurrently. In CBC mode, by contrast, the encryption of each block depends on the preceding ciphertext block (and in OFB mode on the preceding output block), so encryption cannot be parallelized in these modes. In ECB mode, identical plaintext blocks always produce identical ciphertext blocks, so patterns at the block level are preserved: if some ciphertext blocks are identical, an attacker can tell that the corresponding plaintext blocks are identical too. The independence of the blocks creates security gaps in parallel AES in the ECB and CTR modes, which motivates further research on parallel AES [Sri 12].
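The pattern leakage of ECB and the sequential dependency of CBC can be illustrated with a short C sketch. To keep it self-contained, a deliberately insecure, hypothetical 16-byte toy block cipher (toy_encrypt_block) stands in for AES, since only the mode logic matters here. The OpenMP pragma on the ECB loop is an assumption about how such a loop could be parallelized; compilers without OpenMP support simply ignore it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16   /* block size in bytes, as for AES */

/* Hypothetical stand-in for a block cipher (NOT secure): XOR with the key,
 * then rotate the bytes by one position. It is a keyed permutation. */
static void toy_encrypt_block(const uint8_t in[BLK], uint8_t out[BLK], const uint8_t key[BLK])
{
    for (int i = 0; i < BLK; i++)
        out[(i + 1) % BLK] = (uint8_t)(in[i] ^ key[i]);
}

/* ECB: every block is encrypted independently, so the iterations do not
 * depend on each other and can be shared among OpenMP threads. */
void ecb_encrypt(const uint8_t *in, uint8_t *out, size_t nblocks, const uint8_t key[BLK])
{
    #pragma omp parallel for
    for (long b = 0; b < (long)nblocks; b++)
        toy_encrypt_block(in + b * BLK, out + b * BLK, key);
}

/* CBC: each plaintext block is XORed with the previous ciphertext block
 * (the IV for the first block) before encryption, so the blocks must be
 * processed one after another. */
void cbc_encrypt(const uint8_t *in, uint8_t *out, size_t nblocks,
                 const uint8_t key[BLK], const uint8_t iv[BLK])
{
    uint8_t prev[BLK], tmp[BLK];
    memcpy(prev, iv, BLK);
    for (size_t b = 0; b < nblocks; b++) {
        for (int i = 0; i < BLK; i++)
            tmp[i] = (uint8_t)(in[b * BLK + i] ^ prev[i]);
        toy_encrypt_block(tmp, out + b * BLK, key);
        memcpy(prev, out + b * BLK, BLK);
    }
}
```

In ECB mode, two identical plaintext blocks encrypt to two identical ciphertext blocks. In CBC mode they encrypt to different ciphertext blocks, because the second block is first XORed with the first ciphertext block.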


2.5 Fundamentals of Audio Files Audio technology has many applications, but in general the main aim is to reproduce sound at another place, at a later time, or both [Wat 01]. Physically, sound consists of small, rapid changes in pressure, typically transmitted through the air. Vibrations are perceived as sound when the air pressure increases and decreases many times per second. The magnitude of the pressure change determines loudness, and how rapidly the pressure cycles repeat determines frequency; thus the two parameters amplitude and frequency are needed to describe a sound [Wag 10]. The human ear is sensitive to a wide range of sound frequencies, normally from about 20 Hz to 22,000 Hz [Sal 07]. In the real world, sound exists as continuous analog values, representing an effectively infinite range of amplitudes (loudness) and frequencies [Wag 10]. Before an audio signal can be represented in digital format, its continuous amplitude values must be converted to a discrete representation that a computer can store [Bos 03].

2.5.1 Digital Audio Ideal digital and analog audio recorders have the same characteristics. Digital audio signals have two important characteristics, the sampling rate and the sample size, plus a third fundamental characteristic, the number of channels. To capture the spatial characteristics of sound, the sound must be measured at different points in space and the values recorded at the same time; every independent measurement point creates one channel. For example, the number of channels is one for mono sound and two for stereo sound, and newer systems such as Dolby 5.1 can use up to six channels [Bed 13]. There are different methods for converting analog sound into digital values. Pulse Code Modulation (PCM), which is in virtually universal use, represents audio digitally; Figure 2.8 shows how PCM works. The time axis is represented in a discrete, stepwise manner instead of being continuous: the waveform is measured at regular intervals rather than being represented continuously. This process is called sampling, and the frequency at which samples are taken is called the sampling rate or sampling frequency, Fs. The sampling rate is generally fixed and is thus independent of any signal frequency. If every effort is made to eliminate jitter (time instability) in the sampling clock, every sample is taken at an exactly even time step. Any subsequent time base error changes the instants at which samples arrive, and the effect can be detected. If samples arrive at a destination with an irregular time base, the effect can be eliminated by storing the samples temporarily in a memory and reading them out using a stable, locally generated clock. This process is called time base correction, and all properly engineered digital audio systems must use it. A time base error is therefore not merely reduced; it is eliminated entirely [Wat 01].


Figure (2.8): Pulse Code Modulation (PCM) [Wat 01] .
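The sampling and quantization steps of PCM, and the resulting data rate, can be illustrated with small C helpers. Scaling 16-bit samples by 32767 with round-to-nearest is one common convention among several, and the function names are hypothetical.

```c
#include <stdint.h>

/* Quantize one sample in [-1.0, 1.0] to a signed 16-bit PCM value by
 * scaling to +/-32767 and rounding to the nearest integer; values outside
 * the range are clipped first. */
int16_t pcm16_quantize(double x)
{
    if (x > 1.0)  x = 1.0;
    if (x < -1.0) x = -1.0;
    double scaled = x * 32767.0;
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Time (in seconds) of sample n for sampling rate fs: t_n = n / fs. */
double pcm_sample_time(long n, long fs)
{
    return (double)n / (double)fs;
}

/* Raw PCM data rate in bytes per second: Fs * channels * bits / 8. */
long pcm_byte_rate(long fs, int channels, int bits_per_sample)
{
    return fs * channels * bits_per_sample / 8;
}
```

For CD-quality stereo audio (Fs = 44,100 Hz, 16 bits, 2 channels), pcm_byte_rate gives 176,400 bytes per second, the value stored in the byte-rate field of a WAVE format chunk.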

Audio files played on personal computers come in different formats, which are usually associated with different file extensions such as .wav and .mp3 [Azi 07].

2.5.2 WAVE Audio Format In the Windows operating system, the WAVE format is a popular file format for storing digital audio data. Modern operating systems support a native format for audio files, so audio formats have become as important as image and video formats. WAVE, short for Waveform Audio File Format, is a standard data format for storing audio data. A WAVE file is a collection of chunks of different types; each chunk starts with a header and may contain sub-chunks as well as data. Figure 2.9 shows the basic layout of a WAVE file, which consists of three basic chunks: the descriptor chunk, the format chunk and the data chunk. The descriptor chunk is the WAVE header. The format chunk contains important parameters describing the waveform, such as the sample rate, byte rate and bits per sample. The data chunk indicates the size of the sound data and contains the raw data. New types of chunks may be added in the future, so existing software may encounter chunks that it does not recognize; the general rule is therefore to skip any unrecognized chunks [Bed 13].

Figure (2.9): Basic WAVE file layout [WebSite 2] .
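The chunk structure, together with the rule of skipping unrecognized chunks, can be shown with a small C parser that reads a WAVE file already loaded into memory. The field offsets follow the common RIFF/WAVE layout: little-endian sizes, a 16-byte PCM "fmt " chunk body, and chunks padded to an even length. The type and function names are hypothetical, and the sketch ignores extensions such as WAVE_FORMAT_EXTENSIBLE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Information extracted from the "fmt " and "data" chunks. */
typedef struct {
    uint16_t audio_format;     /* 1 = PCM */
    uint16_t num_channels;
    uint32_t sample_rate;
    uint32_t byte_rate;
    uint16_t bits_per_sample;
    const uint8_t *data;       /* start of the raw sample data */
    uint32_t data_size;        /* size of the data chunk in bytes */
} wave_info;

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse a WAVE file held in memory. Walks the chunk list after the 12-byte
 * RIFF/WAVE descriptor, reads "fmt " and "data", and skips any chunk it does
 * not recognize. Returns 0 on success, -1 on error. */
int wave_parse(const uint8_t *buf, size_t len, wave_info *out)
{
    int have_fmt = 0, have_data = 0;
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;
    size_t pos = 12;
    while (pos + 8 <= len) {
        const uint8_t *id = buf + pos;
        uint32_t size = le32(buf + pos + 4);
        const uint8_t *body = buf + pos + 8;
        if (size > len - pos - 8)
            return -1;                         /* truncated chunk */
        if (memcmp(id, "fmt ", 4) == 0 && size >= 16) {
            out->audio_format    = le16(body);
            out->num_channels    = le16(body + 2);
            out->sample_rate     = le32(body + 4);
            out->byte_rate       = le32(body + 8);
            out->bits_per_sample = le16(body + 14);
            have_fmt = 1;
        } else if (memcmp(id, "data", 4) == 0) {
            out->data = body;
            out->data_size = size;
            have_data = 1;
        }                                      /* any other chunk is skipped */
        pos += 8 + (size_t)size + (size & 1u); /* chunks are padded to even length */
    }
    return (have_fmt && have_data) ? 0 : -1;
}
```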


2.6 Types of Parallel Programming Nowadays, parallel processing is in growing demand as a way to speed up the execution of many applications. A main goal of developers is to take full advantage of the multiprocessing capability of the system, so it is essential to identify where parallel processing is needed. For example, consider a computer with more than one processor (or one processor with Hyper-Threading): if a sequential program is executed on such a system, only one processor is used and the processing power of the others is wasted. Instead of letting the other processors sit idle, the program should use the full power of all of them, and OpenMP can speed up such programs. Parallel processing can be divided into two groups, task based and data based [Akt 06]. 1. Task based: In this type, every processor in the system executes a different job or task in parallel. For example, in a word processor application, printing is handled by one thread while spell checking is run by another thread at the same time; each thread is treated as an independent task.

2. Data based: In this type, the same task is executed on different data sets. The first step is to divide the data load among several processors. For example, to convert a color image to a grayscale image on a dual-core system, the image is divided into two halves: the upper half is converted by the first processor while the lower half is converted by the second, which ideally halves the processing time. On systems with more than two processors, the time ideally decreases in proportion to the number of CPUs.
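The grayscale example above is a typical data-based loop. A minimal C sketch with OpenMP follows; the function name and the integer luminance weights (an approximation of the common 0.299/0.587/0.114 weights) are assumptions for illustration. With two threads and a static schedule, each thread typically receives about half of the rows, which matches the upper-half/lower-half split described in the text. Unless a schedule clause is given, the exact division is left to the OpenMP implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Data-based parallelism: the same operation (RGB -> gray) is applied to
 * different pixels. OpenMP divides the rows among the available threads;
 * without OpenMP the pragma is ignored and the loop runs sequentially.
 * The integer weights 77, 150 and 29 sum to 256, so ">> 8" divides by 256. */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, int width, int height)
{
    #pragma omp parallel for
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            size_t idx = (size_t)y * (size_t)width + (size_t)x;
            const uint8_t *p = rgb + 3 * idx;
            gray[idx] = (uint8_t)((77 * p[0] + 150 * p[1] + 29 * p[2]) >> 8);
        }
    }
}
```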


2.7 Parallel Processing using OpenMP Structure OpenMP has become a standard paradigm for shared-memory programming, as it offers simple program development and incremental parallelization at a higher level of abstraction. OpenMP provides an expandable, reliable computational environment that is considerably more economical than large, massively parallel machines [Phi 08].

2.7.1 Introduction to OpenMP OpenMP (Open Multi-Processing) is an API for multiprocessing on shared-memory platforms. OpenMP directives are currently supported in three programming languages (C, C++ and Fortran) on most operating systems and processor architectures. OpenMP consists of the following [ARB 11]:
• A set of compiler directives (a directive is a statement added to the program). At compile time, these directives are used to produce multi-threaded code.
• A small set of library routines that support shared-memory parallelism. These routines give finer control over thread execution.
• Environment variables that influence run-time behavior.

OpenMP programs execute according to the fork-join model. An OpenMP program begins as a single thread, called the master thread, which executes the program sequentially until it encounters the first parallel directive. The master thread then forks, and the code of the parallel region is executed by a team of threads. When the parallel region finishes, the threads in the team join at an implicit barrier, and only the master thread continues execution [Hoe 06]. There are two paradigms for parallel processing: the shared-memory model and the distributed-memory model. In the first model, as in desktop or laptop computers, there is a single memory unit and multiple processing units, and all processors can access the memory unit shared between them. In the distributed model, as in a computer cluster, each processing unit has its own memory unit for storing data, and information is exchanged between these units. A computer cluster can be defined as a set of separate computers that are connected through a network and act as a single system [Mat 12]. The underlying architecture can be shared memory (UMA) or physically distributed memory (NUMA), as illustrated in Figure 2.10.

Figure (2.10): Shared and distributed memory models [Man 10].


2.7.2 Benefits and Applications of OpenMP OpenMP is appropriate for execution on a broad range of SMP (symmetric multiprocessing) architectures. As multi-core machines and multithreading processors spread in the marketplace, it may also be increasingly used to create programs for uniprocessor computers. OpenMP offers a simple approach for converting a sequential program into a parallel program with minimal programmer effort, code change and time. This simplicity is achieved by embedding OpenMP directives into the sequential program at suitable points to take full advantage of the shared-memory model. Many applications contain considerable parallelism that can be exploited [Cha 08]. Parallel computing offers the potential for high performance, and improved application performance is the distinctive practical benefit of parallel processing. To obtain this benefit, support for parallel processing must be available in both software and hardware. Nowadays, the development and wide availability of multi-CPU and multi-core hardware have met the need for hardware support [Des 11]. To apply parallelism in software and achieve a significant performance improvement, the developer must accept additional design and programming complexity. The main goal of OpenMP is to provide a portable, standard parallel-computing API specifically for programming shared-memory multiprocessors. Moreover, OpenMP supports the three main aspects of parallel programming: dividing the workload, communicating between threads, and synchronizing threads [Cha 08].


OpenMP offers the following practical benefits [Des 11]:
• It requires only a small to moderate increase in code size; the increase depends on the extent of the changes required for parallel scalability.
• Parallelism can be implemented in segments: small segments of the application can be parallelized individually.
• Rich application development and debugging tools are available.
• OpenMP uses compiler directives and a few simple library calls; the directives are ignored if the compiler does not support OpenMP.
• Portability.

2.7.3 Speedup using OpenMP Speedup is the expected performance advantage of running an application on a multi-core machine rather than a single-core machine. Single-core performance is taken as the baseline when speedup is measured. For example, if an application takes six hours on a single-core machine and three hours on a quad-core machine, the speedup is 6/3 = 2, which means that the application runs twice as fast. One might expect an application to run twice as fast on a dual-core machine and four times as fast on a quad-core machine, but that is not exactly correct. With some notable exceptions, such as superlinear speedup, linear speedup is not achievable even if the entire application runs in parallel, because parallelizing an application always incurs some overhead, such as scheduling threads onto separate processors. Some of the limitations to linear speedup of parallel code are listed below [Mar 11]:
• Serial code.
• Overhead from parallelization.
• Synchronization.
• Sequential input/output.
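The speedup arithmetic can be written as small C functions. Speedup and efficiency follow the definitions in the text. The third function implements Amdahl's law, a standard model that is not taken from [Mar 11]; it shows how the serial fraction of a program caps the achievable speedup. The function names are illustrative.

```c
/* Speedup S = T1 / Tp: T1 is the single-core time, Tp the time on p cores. */
double speedup(double t1, double tp) { return t1 / tp; }

/* Parallel efficiency E = S / p (1.0 would be linear speedup). */
double efficiency(double t1, double tp, int p) { return speedup(t1, tp) / p; }

/* Amdahl's law: upper bound on speedup when only a fraction f of the work
 * can run in parallel on p cores; the serial part (1 - f) is never sped up. */
double amdahl_bound(double f, int p) { return 1.0 / ((1.0 - f) + f / p); }
```

For the quad-core example, speedup(6.0, 3.0) returns 2.0 and efficiency(6.0, 3.0, 4) returns 0.5. Even with 90% of the work running in parallel, amdahl_bound(0.9, 4) is only about 3.08.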

2.8 Programming in OpenMP In OpenMP, a program usually consists of multiple independent threads sharing data, although each thread may also have some additional private memory regions. OpenMP provides a straightforward interface for writing software for multi-core computers. Using OpenMP, the programmer can write code that uses all cores of a multi-core computer and can run faster as the number of cores increases [Aly 14].

2.8.1 OpenMP API To support shared-memory parallelism, the OpenMP API consists of a set of compiler directives and a small set of library routines. The directives provide adequate support for programming parallel threads of execution, and the library routines can be used to obtain finer control over thread execution. Figure 2.11 demonstrates the structure of the OpenMP model. OpenMP is divided into three main API components [Des 11]:
• Compiler directives.
• Runtime library routines.
• Data environment.


Figure (2.11): OpenMP main structure language [WebSite 3].

Parallel program structures can be classified into four design patterns [Mar 11]:
• SPMD (Single Program/Multiple Data): a single parallel operation is applied to multiple data sequences; in a parallel program, the processor cores often execute the same task on a collection of data.
• Master/Worker: a process (the master) sets up a pool of executable units (workers), such as threads, which execute concurrently.
• Loop parallelism: the iterations of a sequential loop are converted into separate parallel operations; resolving dependencies between loop iterations is one of the challenges. The .NET Framework 4 provides several solutions for loop parallelism, including Parallel.For and Parallel.ForEach.
• Fork/Join: work is decomposed into separate tasks that each complete some portion of the work. A unit of execution, such as a thread, spawns the separate tasks and then waits for them to complete.

2.8.2 The OpenMP Execution Model (Fork/Join) OpenMP programs execute according to the fork-join model. The program starts execution sequentially with a single thread, called the master thread. When this thread encounters a parallel directive, the program forks and the parallel region is executed by a team of threads. When the parallel region has completed, the threads in the team join at an implicit barrier, and only the master thread continues program execution [Hoe 06]. The fork-join model can be nested if needed: if a thread of a team that is executing a parallel region encounters another parallel region, execution forks again and a new team of threads is created for the new region, and so on. When the parallel execution finishes, the team joins the master thread again and execution continues. Figure 2.12 shows the fork/join structure [Bar]. OpenMP uses the fork-join model of parallel execution as follows:
• All OpenMP programs start as a single process in which the master thread executes sequentially until the first parallel region construct is encountered.
• FORK: the master thread creates a team of parallel threads.
• The statements enclosed by the parallel region construct are executed in parallel by the threads of the team.
• JOIN: when the team threads complete the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread.
• The number of parallel regions and the number of threads in them are arbitrary.

C / C++: General Code Structure [ARB 14].

#include <omp.h>

int main(void)
{
    int var1, var2, var3;

    /* Serial code ... */

    /* Beginning of parallel section: fork a team of threads
       and specify variable scoping. */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        /* Parallel section executed by all threads:
           other OpenMP directives and run-time library calls ... */

        /* All threads join the master thread and disband. */
    }

    /* Resume serial code ... */
    return 0;
}

Figure 2.12: Fork/Join model [WebSite 4].


OpenMP provides a set of synchronization constructs: critical sections for mutual exclusion, barriers that hold threads until all threads in a team have arrived, atomic constructs that make individual operations atomic, reduction operations, and a way of ordering code sections within a parallel loop. With these synchronization operations, most desired forms of cooperation between threads can be implemented [Hoe 06]. The job of the OpenMP implementation is to handle the low-level details of creating independent threads to execute the code and of assigning work to them according to the strategy specified by the programmer [Cha 08].
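The difference between reduction, atomic and critical can be seen on a simple sum, here the energy (sum of squares) of 16-bit audio samples. This is a minimal C sketch with hypothetical function names. All three versions return the same result, but the reduction version is usually the cheapest because each thread combines its partial result only once, at the end.

```c
#include <stdint.h>

/* Sum of squared 16-bit samples (signal energy) computed three ways.
 * Without OpenMP the pragmas are ignored and all three run sequentially. */
long long energy_reduction(const int16_t *x, long n)
{
    long long sum = 0;
    /* Each thread sums into a private copy; the copies are added at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += (long long)x[i] * x[i];
    return sum;
}

long long energy_atomic(const int16_t *x, long n)
{
    long long sum = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        long long sq = (long long)x[i] * x[i];
        /* One indivisible update per iteration: correct but slower. */
        #pragma omp atomic
        sum += sq;
    }
    return sum;
}

long long energy_critical(const int16_t *x, long n)
{
    long long sum = 0;
    #pragma omp parallel
    {
        long long local = 0;
        #pragma omp for
        for (long i = 0; i < n; i++)
            local += (long long)x[i] * x[i];
        /* Only one thread at a time may execute this statement. */
        #pragma omp critical
        sum += local;
    }
    return sum;
}
```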

2.8.3 Number of Threads Threads are numbered from 0 (the master thread) to N-1. The number of threads in a parallel region is determined by the following factors, in order of precedence:
1. Evaluation of the IF clause.
2. Setting of the NUM_THREADS clause.
3. Use of the omp_set_num_threads() library function.
4. Setting of the OMP_NUM_THREADS environment variable.
5. Implementation default: usually the number of CPUs on a node, though it could be dynamic.
Example 2.1 illustrates the use of a parallel region with OpenMP.


Example 2.1: [ARB 14] C/C++ Parallel Region Example

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int nthreads, tid;

    /* Fork a team of threads, each with its own private tid variable */
    #pragma omp parallel private(tid)
    {
        /* Obtain and print the thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only the master thread does this */
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }   /* All threads join the master thread and terminate */

    return 0;
}
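The precedence of the IF and NUM_THREADS clauses listed in Section 2.8.3 can be checked with a short C sketch. The function name is hypothetical. The _OPENMP guard lets the sketch compile and run without OpenMP, in which case the team size is always 1.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the team size of a parallel region that requests `requested`
 * threads through the num_threads clause. The if clause makes the region
 * run with a single thread when `use_parallel` is 0, because the IF clause
 * takes precedence over NUM_THREADS. */
int team_size(int requested, int use_parallel)
{
    int size = 1;
#ifdef _OPENMP
    #pragma omp parallel if (use_parallel) num_threads(requested)
    {
        #pragma omp single
        size = omp_get_num_threads();
    }
#else
    (void)requested;
    (void)use_parallel;
#endif
    return size;
}
```

When compiled with OpenMP enabled, team_size(4, 0) returns 1, and team_size(2, 1) returns at most 2; the runtime may grant fewer threads than requested.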

2.8.4 Creating an OpenMP Program The first step in converting a sequential program into a parallel one using the OpenMP standard is to identify the parts of the program that can be replaced with parallel code; essentially, this means finding the parts of the program that can be executed simultaneously by multiple processors. Identifying parallel regions in a sequential program is not always easy: developers must first find the parts that can be converted to parallel execution, or even replace them with another algorithm or strategy to take full advantage of parallel execution. This can be a challenging problem. Fortunately, many parallel structures, strategies, algorithms and designs can serve as alternatives for replacing sequential code with parallel code, and a good deal of knowledge exists about algorithms and their suitability for parallel execution [Cha 08]. The second step in creating an OpenMP program is to rewrite the program so that it expresses the parallelism that has been identified. A great practical advantage of OpenMP is that it can be used to create a parallel program incrementally from existing sequential code. The developer can turn the sequential program into a parallel version by inserting OpenMP directives around the parallel regions, leaving the rest of the program to execute sequentially. Once the resulting version has been successfully compiled and tested, another portion of the code can be parallelized. The programmer can stop this process once the desired speedup has been obtained [Cha 08].

2.8.5 Parallel Construct Directives Although OpenMP has many directives, it is easy to start by knowing just a few. The most important and widely used directives are #pragma omp parallel and #pragma omp parallel for.

1. #pragma omp parallel: This directive creates a parallel region for the dynamic extent of the structured block that follows it. The directive tells the compiler that the structured block of code should be executed in parallel by several threads. Each thread executes the same instruction stream, though not necessarily the same set of instructions, depending on control-flow statements such as if-else [ARB 13].

2. #pragma omp parallel for: This directive parallelizes the for loop that follows it. At the start of the parallel region, the master thread creates zero or more child threads, depending on the number of processors available in the system. The loop iterations are divided among the master and the child threads of the same team, and the threads in the team execute in parallel. Unlike sequential execution, there is no guarantee that the iterations are executed in order, as they would be if a single thread executed the whole loop. There is an implied barrier at the end of the parallel for loop: each thread that finishes its iterations waits at this barrier until all other threads have finished theirs. Only the master thread continues beyond the loop [ARB 13].
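A loop whose iterations are independent is the natural target for #pragma omp parallel for. The C sketch below XORs each byte of an audio buffer with a keystream byte that depends only on the key and the byte position, so no iteration depends on another. The mixing function is a hypothetical stand-in: it is not a secure keystream generator and not one of the algorithms proposed in this thesis.

```c
#include <stdint.h>

/* Hypothetical per-position keystream byte (a simple integer mix, NOT a
 * secure generator). Because it depends only on (key, i), every iteration
 * of the loop below is independent of the others. */
static uint8_t keystream_byte(uint32_t key, long i)
{
    uint32_t x = key ^ ((uint32_t)i * 0x9E3779B9u);
    x ^= x >> 16;
    x *= 0x85EBCA6Bu;
    x ^= x >> 13;
    return (uint8_t)x;
}

/* XOR every byte of an audio buffer with its keystream byte. The
 * "omp parallel for" divides the iterations among the team; each thread
 * writes a distinct out[i], so no synchronization is needed inside the loop,
 * and the implied barrier at the end guarantees the buffer is complete. */
void parallel_xor_buffer(const uint8_t *in, uint8_t *out, long n, uint32_t key)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        out[i] = (uint8_t)(in[i] ^ keystream_byte(key, i));
}
```

Because the result does not depend on the order in which the iterations run, the parallel output is identical to the sequential one. Applying the function twice with the same key restores the original samples.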


Chapter Three

The Proposed Improvement for Algorithms in Block and Stream Cipher

3.1 Introduction

This chapter presents the development of the proposed AES algorithm and the proposed stream ciphers. The improved functions in the proposed AES algorithm are the SubByte, MixColumn and ShiftRows transformations. The improved AES has been applied to audio wave files, and some of the improved AES functions have also been used to build stream ciphers. OpenMP, a common parallel programming paradigm for shared-memory multiprocessors, is used with the SPMD model; OpenMP and the improved AES algorithm are applied together to wave files to reduce the execution time of both encryption and decryption. Specifically, this chapter describes the substitution byte transformation and its improvement using a Dual Keys algorithm (DK-SBOX-AES) in encryption and decryption; the MixColumn function, briefly, together with the proposed MixColumn using four keys (4K-MIX-AES); the ShiftRows transformation and the proposed improvement using multiple keys in the shift operation (5K-SHIFT-AES); the stream ciphers and the proposed algorithms built from the improved AES transformations; and a brief introduction to OpenMP directives, with the above algorithms modified to run in parallel with OpenMP on wave files to reduce encryption and decryption time. The following block diagram shows the algorithms executed in both the block and stream ciphers and the parallel implementation of these algorithms.

(Block diagram: AES and the proposed algorithms, each run with sequential processing and with parallel processing using OpenMP. Block cipher algorithms: AES Rijndael (AddRoundKey, SubByte, MixColumn, ShiftRows), the proposed DK-SBOX-AES, 4K-MIX-AES and 5K-SHIFT-AES transformation algorithms, and the proposed Improved AES algorithm including AddRoundKey, DK-SBOX-AES, 4K-MIX-AES and 5K-SHIFT-AES. Stream cipher algorithms: PER-MIX-SBOX (4K-MIX-AES, PER-RND-KEY, DK-SBOX-AES) and PER-SHIFT-SBOX (5K-SHIFT-AES, PER-RND-KEY, DK-SBOX-AES). Parallel versions: Parallel AES Rijndael, Parallel DK-SBOX-AES, Parallel 4K-MIX-AES, Parallel 5K-SHIFT-AES, Parallel Proposed Improved AES, Parallel PER-MIX-SBOX and Parallel PER-SHIFT-SBOX.)

A Block Diagram for the Proposed Block and Stream cipher Algorithms


3.2 The Proposed SubByte Transformation in AES using Dual Keys (DK-SBOX-AES) Function

In general, an S-Box is a nonlinear substitution table that takes a number and returns another number. In AES the values of the S-Box table are fixed. The goal of the proposed approach is to use dual keys in the SubByte transformation during encryption and decryption instead of the fixed S-Box structure used in AES Rijndael. By replacing the single S-Box table of AES Rijndael, the proposed SubByte function removes the problem of the fixed structure, which leads to more secure block ciphers. Each byte of the State matrix is encrypted using one of several S-Box tables created by the first key, which increases the security of the AES block cipher system. The main advantage of the proposed function is that a huge number of S-Boxes can be generated. The second key represents a random distribution of the S-Boxes created by the first key: it takes the form of a sequence of S-Box tables, arranged randomly and agreed on by the two parties (sender and recipient). This increases the degree of complexity while keeping nearly the same delay during encryption and decryption in the proposed SubByte function.

The two keys are described in detail as follows:
1. The first key: a set of random values, each of which generates a unique S-Box, provided that each S-Box has an associated inverse. Algorithm 3.1 shows in detail how eight different random keys are used to generate eight different S-Boxes.

2. The second key: a two-dimensional [4*4] matrix, so the key space available in this manner is 16!, as shown in Figure 3.1. This figure presents some of the possible [4*4] keys that can be generated, while Algorithm 3.2 demonstrates the encryption process using the first and second keys.

Algorithm 3.1: The Proposed S-Boxes Generation
Input: Eight different values { Rnd_Key[k], Con_c[k], k = 1, 2, ..., 8 }
Output: Eight different S-Boxes { (S-box[i][j])k, (S^-1-box[i][j])k, k = 1, 2, ..., 8 }
Step1: Select 8 keys Rnd_Key[k], each of which generates a unique S-Box, on condition that every key has an inverse so that the inverse S-Box can be rebuilt.
Step2: Select 8 random values Con_c[k] for the constant C.
Step3: For every key Rnd_Key[k] and its corresponding constant Con_c[k], create its own S-box[i][j] using the affine transformation
       (S-box[i][j])k = Rnd_Key[k] * mulp[r][c] + Con_c[k]
       where mulp[r][c] is the multiplicative inverse in GF(2^8), as shown in Table 3.1.
Step4: Use the resulting (S-box[i][j])k in the encryption process.
Step5: End

Figure (3.1): Possible [4 * 4] Key distribution.
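A minimal sketch of Algorithm 3.1 for one (Rnd_Key, Con_c) pair. It assumes that the product Rnd_Key[k] * mulp[r][c] means the 8*8 bit-matrix product of the AES affine transformation, with Rnd_Key[k] read as the first row of a circulant bit matrix; the thesis does not spell this out. Under this reading Rnd_Key = 0xF1 with C = 0x63 gives the standard AES S-Box, and Key = 0x85 with C = 0x45 gives the spot-checked entries of Table 3.2 (for example 0x00 -> 0x45, 0x01 -> 0x06, 0x02 -> 0xB0). The function names are hypothetical, and the multiplicative inverse is found by a simple search instead of the Table 3.1 lookup.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using SBox = std::array<uint8_t, 256>;

// Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t product = 0;
    while (b) {
        if (b & 1) product ^= a;
        const bool carry = (a & 0x80) != 0;
        a = static_cast<uint8_t>(a << 1);
        if (carry) a ^= 0x1B;          // reduce modulo 0x11B
        b >>= 1;
    }
    return product;
}

// Multiplicative inverse in GF(2^8) (the values of Table 3.1); 0 is mapped to 0 as in AES.
uint8_t gf_inv(uint8_t a) {
    if (a == 0) return 0;
    for (int b = 1; b < 256; ++b)
        if (gf_mul(a, static_cast<uint8_t>(b)) == 1) return static_cast<uint8_t>(b);
    return 0;                          // not reached: every nonzero element has an inverse
}

// Affine step under the assumed circulant reading of Rnd_Key:
// output bit i = XOR over the set key bits j of input bit (i + j) mod 8, then XOR with Con_c.
uint8_t affine(uint8_t rnd_key, uint8_t con_c, uint8_t in) {
    uint8_t out = 0;
    for (int i = 0; i < 8; ++i) {
        int bit = 0;
        for (int j = 0; j < 8; ++j)
            bit ^= ((rnd_key >> j) & 1) & ((in >> ((i + j) % 8)) & 1);
        out = static_cast<uint8_t>(out | (bit << i));
    }
    return static_cast<uint8_t>(out ^ con_c);
}

// Builds one S-box and its inverse from a (Rnd_Key, Con_c) pair.
void make_sbox(uint8_t rnd_key, uint8_t con_c, SBox& sbox, SBox& inv_sbox) {
    inv_sbox.fill(0);
    for (int x = 0; x < 256; ++x) {
        sbox[x] = affine(rnd_key, con_c, gf_inv(static_cast<uint8_t>(x)));
        inv_sbox[sbox[x]] = static_cast<uint8_t>(x);
    }
    for (int y = 0; y < 256; ++y)      // the key must give a permutation, otherwise no inverse exists
        assert(sbox[inv_sbox[y]] == y);
}
```

For example, make_sbox(0x85, 0x45, s, inv) fills s with the S-Box of Table 3.2 and inv with the inverse of Table 3.3; calling it once per (Rnd_Key[k], Con_c[k]) pair yields the eight S-Boxes of the algorithm.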


Algorithm 3.2: The Proposed SubByte Encryption/Decryption Transformation Function (DK-SBOX-AES)
Input: Plaintext block message { State[Row][Column], Row, Column = 1, 2, 3, 4 }
       Eight different S-Boxes { (S-box[i][j])k, (S^-1-box[i][j])k, k = 1, 2, ..., 8 }
       Key distribution matrix { Key_Enc[4][4] }
Output: Ciphertext block message { State[Row][Column], Row, Column = 1, 2, 3, 4 }
For every block to be ciphered by the AES algorithm, using the newly created S-box[i][j]:
Step1: For every Row in the State matrix do
Step2:   For every Column in the State matrix do
Step3:     Y = State[Row][Column] & 0x0f;  X = (State[Row][Column] >> 4) & 0x0f;
           where X and Y are the row and column indices into each S-Box, respectively.
Step4:     Encrypt the byte with the S-Box whose index is given by the key distribution matrix:
           State[Row][Column] = (S-box[X][Y])k, where k = Key_Enc[Row][Column]
Step5: End
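The following sketch combines both keys of Algorithm 3.2, using the Rnd_Key and Cons_c values listed in Section 4.2.1 and the Key_Enc distribution of Example 3.2. Two details are assumptions inferred from Example 3.2 rather than stated in the algorithm: Rnd_Key[k] is paired with Cons_c[k] by position, and Key_Enc is read column by column (entry 4*c + r selects the S-Box for State[r][c]). With these choices the first bytes produced match step 6 of Example 3.2 (0x71 -> 0x56, 0x5C -> 0xE0, 0x0C -> 0x39). The affine step again reads Rnd_Key as the first row of a circulant bit matrix, and all function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using SBox = std::array<uint8_t, 256>;

// GF(2^8) multiplication modulo 0x11B and the multiplicative inverse by search (0 maps to 0).
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    for (; b; b >>= 1) {
        if (b & 1) p ^= a;
        a = static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1B : 0));
    }
    return p;
}
static uint8_t gf_inv(uint8_t a) {
    for (int b = 1; a && b < 256; ++b)
        if (gf_mul(a, static_cast<uint8_t>(b)) == 1) return static_cast<uint8_t>(b);
    return 0;
}

// First key: Rnd_Key values and their constants Cons_c, paired by position (Section 4.2.1).
const uint8_t RND_KEY[8] = {0xf1, 0x67, 0x25, 0x85, 0xb5, 0xA4, 0x19, 0x4c};
const uint8_t CONS_C[8]  = {0x63, 0x82, 0xc4, 0x45, 0xa5, 0x7b, 0xd5, 0xc1};
// Second key: Key_Enc from Example 3.2; each entry selects S-box number 1..8.
const int KEY_ENC[16] = {5, 3, 7, 6, 2, 4, 1, 7, 8, 2, 6, 4, 1, 3, 5, 8};

// Builds the eight S-boxes (GF(2^8) inverse followed by the affine map) and their inverses.
void build_dk_sboxes(SBox fwd[8], SBox inv[8]) {
    for (int k = 0; k < 8; ++k)
        for (int x = 0; x < 256; ++x) {
            const uint8_t b = gf_inv(static_cast<uint8_t>(x));
            uint8_t out = 0;
            for (int i = 0; i < 8; ++i) {
                int bit = 0;
                for (int j = 0; j < 8; ++j)
                    bit ^= ((RND_KEY[k] >> j) & 1) & ((b >> ((i + j) % 8)) & 1);
                out = static_cast<uint8_t>(out | (bit << i));
            }
            fwd[k][x] = static_cast<uint8_t>(out ^ CONS_C[k]);
            inv[k][fwd[k][x]] = static_cast<uint8_t>(x);
        }
}

// DK-SBOX-AES on one State: byte (r, c) goes through the S-box chosen by Key_Enc[4*c + r].
// Passing the inverse tables instead of the forward tables performs decryption.
void dk_sub_bytes(uint8_t state[4][4], const SBox boxes[8], const int key_enc[16]) {
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) {
            const int k = key_enc[4 * c + r];
            assert(k >= 1 && k <= 8);
            state[r][c] = boxes[k - 1][state[r][c]];   // S-box[X][Y], X = high nibble, Y = low nibble
        }
}
```

Because each S-Box is stored as a flat 256-entry table, indexing it with the whole byte is the same as splitting the byte into X (high nibble) and Y (low nibble) as in Step3. Decryption needs no separate routine: the same lookup with the inverse tables undoes the substitution.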

The S-Box is created by determining the multiplicative inverse of a given number in GF(2^8). Table 3.1 shows the multiplicative inverses in Rijndael's Galois Field GF(2^8), while Tables 3.2 and 3.3 give the S-Box created for the values Key = 0x85 and C = 0x45 and its inverse in the proposed algorithm.

Table (3.1): The multiplicative inverses in Rijndael's Galois Field [WebSite 5] (row X = high nibble, column Y = low nibble)

X\Y  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
 0  -- 01 8D F6 CB 52 7B D1 E8 4F 29 C0 B0 E1 E5 C7
 1  74 B4 AA 4B 99 2B 60 5F 58 3F FD CC FF 40 EE B2
 2  3A 6E 5A F1 55 4D A8 C9 C1 0A 98 15 30 44 A2 C2
 3  2C 45 92 6C F3 39 66 42 F2 35 20 6F 77 BB 59 19
 4  1D FE 37 67 2D 31 F5 69 A7 64 AB 13 54 25 E9 09
 5  ED 5C 05 CA 4C 24 87 BF 18 3E 22 F0 51 EC 61 17
 6  16 5E AF D3 49 A6 36 43 F4 47 91 DF 33 93 21 3B
 7  79 B7 97 85 10 B5 BA 3C B6 70 D0 06 A1 FA 81 82
 8  83 7E 7F 80 96 73 BE 56 9B 9E 95 D9 F7 02 B9 A4
 9  DE 6A 32 6D D8 8A 84 72 2A 14 9F 88 F9 DC 89 9A
 A  FB 7C 2E C3 8F B8 65 48 26 C8 12 4A CE E7 D2 62
 B  0C E0 1F EF 11 75 78 71 A5 8E 76 3D BD BC 86 57
 C  0B 28 2F A3 DA D4 E4 0F A9 27 53 04 1B FC AC E6
 D  7A 07 AE 63 C5 DB E2 EA 94 8B C4 D5 9D F8 90 6B
 E  B1 0D D6 EB C6 0E CF AD 08 4E D7 E3 5D 50 1E B3
 F  5B 23 38 34 68 46 03 8C DD 9C 7D A0 CD 1A 41 1C

Table (3.2): AES S-Box for Key=0x85 and C=0x45 (row X = high nibble, column Y = low nibble).

X\Y  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
 0  45 06 B0 E3 EB 27 16 43 46 47 74 34 B8 1F 12 FC
 1  C4 B5 10 4A 89 F2 FD 73 BB CB 3C 23 BA 95 CD 3E
 2  85 6C 3D 2B EF C1 96 6D 77 D9 CA 3F 19 98 0A B2
 3  3A DB 56 EA AD 40 76 13 EE 57 2D 2F 01 67 F8 28
 4  25 F9 D1 35 79 5A 26 A4 44 F0 53 B4 AC 63 05 1C
 5  08 B6 0B A8 82 20 2C 6A 6B 88 AB 68 E2 4B BE B9
 6  FA 30 5E C5 CC 07 92 50 65 5D 93 D2 DC 15 6E C6
 7  90 70 18 AA 71 F6 24 0E 33 C9 00 CE CF F4 A7 62
 8  21 58 1B E4 5B 0C 29 2A 0F 41 9E 59 A0 C3 E1 81
 9  91 61 9F A9 1A 78 E9 4F B1 7C 02 FE 31 17 BD 4C
 A  B7 DE BC F1 36 A2 B3 8F A6 2E F7 09 A5 94 86 7B
 B  52 5C A3 8E 32 87 D3 8A C2 75 42 4D EC AF 6F 69
 C  9A 37 FF 49 9C 0D 51 97 D5 E5 64 48 AE 7F 9B D7
 D  55 8D 1D 38 7A DF DA C0 DD 3B 39 4E 84 72 D0 22
 E  FB 11 8B 83 BF D4 E6 D8 5F 04 C8 99 F5 A1 E0 7D
 F  7E E8 03 14 E7 1E 80 F3 54 C7 9D 8C 60 ED D6 66


Table (3.3): Inverse S-Box for Key=0x85 and C=0x45 (row X = high nibble, column Y = low nibble).

X\Y  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
 0  7A 3C 9A F2 E9 4E 01 65 50 AB 2E 52 85 C5 77 88
 1  12 E1 0E 37 F3 6D 06 9D 72 2C 94 82 4F D2 F5 0D
 2  55 80 DF 1B 76 40 46 05 3F 86 87 23 56 3A A9 3B
 3  61 9C B4 78 0B 43 A4 C1 D3 DA 30 D9 1A 22 1F 2B
 4  35 89 BA 07 48 00 08 09 CB C3 13 5D 9F BB DB 97
 5  67 C6 B0 4A F8 D0 32 39 81 8B 45 84 B1 69 62 E8
 6  FC 91 7F 4D CA 68 FF 3D 5B BF 57 58 21 27 6E BE
 7  71 74 DD 17 0A B9 36 28 95 44 D4 AF 99 EF F0 CD
 8  F6 8F 54 E3 DC 20 AE B5 59 14 B7 E2 FB D1 B3 A7
 9  70 90 66 6A AD 1D 26 C7 2D EB C0 CE C4 FA 8A 92
 A  8C ED A5 B2 47 AC A8 7E 53 93 73 5A 4C 34 CC BD
 B  02 98 2F A6 4B 11 51 A0 0C 5F 1C 18 A2 9E 5E E4
 C  D7 25 B8 8D 10 63 6F F9 EA 79 2A 19 64 1E 7B 7C
 D  DE 42 6B B6 E5 C8 FE CF E7 29 D6 31 6C D8 A1 D5
 E  EE 8E 5C 03 83 C9 E6 F4 F1 96 33 04 BC FD 38 24
 F  49 A3 15 F7 7D EC 75 AA 3E 41 60 E0 0F 16 9B C2

To break the proposed DK-SBOX-AES algorithm, a cryptanalyst would have to generate all possible S-Boxes and try them in the proposed function of the AES cipher system, in addition to finding the encryption key.

3.3 MixColumn Transformation

Different irreducible polynomials could be used to implement the GF(2^8) multiplication operations in the MixColumn transformation function. The following are some irreducible polynomials over GF(2), given in octal representation:

435, 567, 763, 551, 675, 747, 453, 727, 023, 545, 613, 543, 433, 477, 537, 703, 471, 037, 007.

The binary digits are the coefficients of the polynomial, with the higher-order coefficients on the left. For example, the binary equivalent of 763 is (1 1 1 1 1 0 0 1 1), and the corresponding polynomial is x^8 + x^7 + x^6 + x^5 + x^4 + x + 1. Different irreducible polynomials can therefore be used in the MixColumn function. In Example 3.1 the State matrix is encrypted using two different irreducible polynomials in the MixColumn transformation of AES Rijndael:
First: x^8 + x^6 + x^5 + x^2 + 1
Second: x^8 + x^7 + x^3 + x + 1

Example 3.1:
1. The State matrix encrypted using the first irreducible polynomial:

   State matrix    Enc. State matrix
   22 6A C0 E1     B3 FA BD EB
   35 88 C9 DD     F3 59 F2 79
   41 97 D3 D9     8B E6 53 89
   4D 96 DF C8     BD F7 1F 50

2. The State matrix encrypted using the second irreducible polynomial:

   State matrix    Enc. State matrix
   22 6A C0 E1     5D FA BD EB
   35 88 C9 DD     F3 B7 1C 97
   41 97 D3 D9     65 08 53 89
   4D 96 DF C8     BD F7 F1 BE
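A sketch of the polynomial handling described in this section: converting the octal notation to a bit mask, testing whether a degree-8 polynomial is irreducible by trial division, and multiplying in GF(2^8) with a selectable modulus. Note that in the octal list, 023, 037 and 007 stand for x^4 + x + 1, x^4 + x^3 + x^2 + x + 1 and x^2 + x + 1; they are irreducible but of degree below 8, so they cannot serve as the modulus of GF(2^8). The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// A GF(2) polynomial is stored as a bit mask: bit n holds the coefficient of x^n.
int poly_degree(unsigned p) {
    int d = -1;
    for (; p; p >>= 1) ++d;
    return d;                                   // -1 for the zero polynomial
}

// Octal notation of the list (e.g. "763") to a bit mask.
unsigned octal_to_poly(const std::string& octal) {
    return static_cast<unsigned>(std::stoul(octal, nullptr, 8));
}

// Readable form, e.g. 0x1F3 -> "x^8 + x^7 + x^6 + x^5 + x^4 + x + 1".
std::string poly_to_string(unsigned p) {
    std::string s;
    for (int n = poly_degree(p); n >= 0; --n) {
        if (!((p >> n) & 1)) continue;
        if (!s.empty()) s += " + ";
        s += (n == 0) ? std::string("1") : (n == 1) ? std::string("x") : "x^" + std::to_string(n);
    }
    return s;
}

// Remainder of a(x) divided by m(x) over GF(2).
unsigned poly_mod(unsigned a, unsigned m) {
    const int dm = poly_degree(m);
    while (poly_degree(a) >= dm) a ^= m << (poly_degree(a) - dm);
    return a;
}

// A degree-8 polynomial is irreducible iff no polynomial of degree 1..4 (masks 2..31) divides it.
bool is_irreducible_deg8(unsigned p) {
    if (poly_degree(p) != 8) return false;
    for (unsigned d = 2; d < 32; ++d)
        if (poly_mod(p, d) == 0) return false;
    return true;
}

// Multiplication in GF(2^8) with a selectable degree-8 modulus (0x11B for AES Rijndael).
uint8_t gf_mul_mod(uint8_t a, uint8_t b, unsigned modulus) {
    assert(poly_degree(modulus) == 8);
    unsigned acc = 0, x = a;
    for (; b; b >>= 1) {
        if (b & 1) acc ^= x;
        x <<= 1;
        if (x & 0x100) x ^= modulus;             // reduce as soon as x reaches degree 8
    }
    return static_cast<uint8_t>(acc);
}
```

For example, gf_mul_mod(0x80, 0x02, m) gives 0x1B for the AES polynomial m = 0x11B, but 0x65 for the first polynomial of Example 3.1 (0x165) and 0x8B for the second (0x18B). Whenever a product overflows degree 7 the result depends on the modulus, which is why the same State matrix gives different MixColumn outputs.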

3.3.1 The Proposed MixColumn Transformation in AES using Four Keys (4K-MIX-AES) Function

In AES, MixColumn is the most expensive transformation, since the input matrix is multiplied over GF(2^8). The key matrix used by the forward and inverse MixColumn transformations on the State matrix is a single, fixed [4*4] matrix. The proposed algorithm splits the key matrix into four parts, each representing a different [2*2] key chosen by the user. The State matrix is likewise divided into four [2*2] parts, each of which corresponds to the key at the same position in the key matrix. Within each part of the product matrix, every element is the sum of the products of the elements of one row and one column; the additions are XOR operations (bitwise addition in GF(2)) and the multiplications are performed in GF(2^8). The transformation is defined by the matrix multiplication between the State and key matrices shown in Figure 3.2 and Equation 3.1. The proposed algorithm for the improved MixColumn transformation is given in Algorithm 3.3.

Figure (3.2): Multiplication process in Modified MixColumn Transformation.


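Where, for each part (p, q = 1, 2) of the State matrix S, the key matrix K and the result C, with multiplication (•) and addition (⊕) in GF(2^8):

$$
\begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}_{pq}
=
\begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}_{pq}
\begin{pmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{pmatrix}_{pq},
\qquad
c_{ij} = (s_{i1} \bullet k_{1j}) \oplus (s_{i2} \bullet k_{2j}), \quad i, j = 1, 2
\qquad (3.1)
$$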

Algorithm 3.3: The Proposed MixColumn Encryption/Decryption Transformation Function (4K-MIX-AES)
Input: Plaintext block message P_Block[r][c], r, c = 1, ..., 4
       MixColumn encryption key Mix_Key[r][c], r, c = 1, ..., 4
Output: Ciphertext block message C_Block[r][c], r, c = 1, ..., 4
Step1: Split P_Block[r][c] and Mix_Key[r][c] into four parts, each of size 2*2, and find the inverse matrix of each Mix_Key[2][2] part (used for decryption).
Step2: Multiply each part P_Block[2][2] * Mix_Key[2][2] to produce C_Matrix[2][2]; the multiplications and additions are executed in GF(2^8).
Step3: Reconstruct C_Block[r][c], r, c = 1, ..., 4, from the four parts C_Matrix[2][2].
Step4: End
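A minimal sketch of Algorithm 3.3, assuming the AES field polynomial 0x11B and the product order C = P * K stated in Step2. The thesis does not list the four 2*2 keys of its examples, so any Mix_Key used with this sketch is an arbitrary choice. For decryption, each 2*2 key part is inverted with the closed form det^-1 * [[d, b], [c, a]], which holds because subtraction equals addition (XOR) in GF(2^8); a key part with a zero determinant cannot be used.

```cpp
#include <cassert>
#include <cstdint>

// GF(2^8) arithmetic with the AES modulus 0x11B (an assumption for this sketch).
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    for (; b; b >>= 1) {
        if (b & 1) p ^= a;
        a = static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1B : 0));
    }
    return p;
}
static uint8_t gf_inv(uint8_t a) {
    for (int b = 1; a && b < 256; ++b)
        if (gf_mul(a, static_cast<uint8_t>(b)) == 1) return static_cast<uint8_t>(b);
    return 0;
}

// Inverse of a 2x2 key part [[a, b], [c, d]]: det = a*d + b*c, inverse = det^-1 * [[d, b], [c, a]].
static void invert_2x2(const uint8_t k[2][2], uint8_t out[2][2]) {
    const uint8_t det = static_cast<uint8_t>(gf_mul(k[0][0], k[1][1]) ^ gf_mul(k[0][1], k[1][0]));
    assert(det != 0 && "every 2x2 key part must be invertible");
    const uint8_t d = gf_inv(det);
    out[0][0] = gf_mul(d, k[1][1]);  out[0][1] = gf_mul(d, k[0][1]);
    out[1][0] = gf_mul(d, k[1][0]);  out[1][1] = gf_mul(d, k[0][0]);
}

// 4K-MIX-AES: the State and Mix_Key are split into four 2x2 parts, and each State part is
// multiplied by the key part at the same position (C = P * K). decrypt = true uses K^-1.
void mix_4k(uint8_t st[4][4], const uint8_t mix_key[4][4], bool decrypt) {
    for (int br = 0; br < 4; br += 2)
        for (int bc = 0; bc < 4; bc += 2) {
            uint8_t k[2][2] = {{mix_key[br][bc], mix_key[br][bc + 1]},
                               {mix_key[br + 1][bc], mix_key[br + 1][bc + 1]}};
            if (decrypt) {
                uint8_t ki[2][2];
                invert_2x2(k, ki);
                for (int i = 0; i < 2; ++i)
                    for (int j = 0; j < 2; ++j) k[i][j] = ki[i][j];
            }
            const uint8_t p[2][2] = {{st[br][bc], st[br][bc + 1]},
                                     {st[br + 1][bc], st[br + 1][bc + 1]}};
            for (int i = 0; i < 2; ++i)
                for (int j = 0; j < 2; ++j)   // row i of P times column j of K; addition is XOR
                    st[br + i][bc + j] = static_cast<uint8_t>(gf_mul(p[i][0], k[0][j]) ^
                                                              gf_mul(p[i][1], k[1][j]));
        }
}
```

Encrypting with mix_4k(state, key, false) and then calling mix_4k(state, key, true) restores the original State, because each part computes (P * K) * K^-1 = P.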

3.4 The Proposed ShiftRows Transformation in AES using Multi Keys (5K-SHIFT-AES) Function

The AES ShiftRows transformation is byte-oriented, whereas the proposed algorithm is bit-oriented. Five keys are used in each round to increase the complexity of the ShiftRows transformation. The keys are generated randomly; with 10 rounds, 50 keys are needed.

Each shift in encryption and decryption is applied either to a row or to a column. The meaning of each key is described in Figure 3.3 and Table 3.4, while Algorithm 3.4 gives the steps of the proposed ShiftRows transformation.

Figure (3.3): One byte key example.

Table (3.4): Details of one byte key.

Bits index      Description
0, 1, 2, 3, 4   Number of bits to be shifted cyclically (2^5 values)
5, 6            Index of the row or column to be shifted (2^2 values)
7               0 = apply on a column, 1 = apply on a row

Algorithm (3.4): The proposed ShiftRows Encryption/Decryption Transformation Function (5K-SHIFT-AES)
Input: Plaintext block message { State[Row][Column], Row, Column = 1, 2, 3, 4 }
       Shift_Key[Round][5]={..,..,..,…..}, Round = 1, 2, ..., depending on the key length.
Output: Ciphertext block message { State[Row][Column], Row, Column = 1, 2, 3, 4 }
For every block message to be ciphered by the AES algorithm do:
Step1: Repeat for the State[Row][Column] using Shift_Key[Round][j], j = 1, 2, ..., 5
Step2: For each key (one byte = 8 bits {0, 1, ..., 7}) find the following:
       RC: one bit indicating whether a ROW or a COLUMN is shifted
       Index_RC: the index of the row or column to be shifted (two bits)
       Shift_No: the number of bits to be shifted (five bits)
Step3: Encrypt/decrypt the State matrix five times in each round, according to the keys given in the Shift_Key set.
Step4: End
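A sketch of one 5K-SHIFT-AES round under stated assumptions. Table 3.4 does not fix the bit order or the shift direction, so this sketch takes bit 0 of each key byte as the least significant bit, treats a row or column as a 32-bit word whose first byte is most significant, rotates left to encrypt, and rotates right (applying the five keys in reverse order) to decrypt. These are illustrative choices, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

static uint32_t rotl32(uint32_t v, unsigned n) {
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

// Row (is_row) or column idx of the State as one 32-bit word; the first byte is most significant.
static uint32_t load_line(const uint8_t st[4][4], bool is_row, int idx) {
    uint32_t w = 0;
    for (int t = 0; t < 4; ++t)
        w = (w << 8) | (is_row ? st[idx][t] : st[t][idx]);
    return w;
}

static void store_line(uint8_t st[4][4], bool is_row, int idx, uint32_t w) {
    for (int t = 3; t >= 0; --t, w >>= 8) {
        uint8_t& cell = is_row ? st[idx][t] : st[t][idx];
        cell = static_cast<uint8_t>(w & 0xFF);
    }
}

// One key byte (Table 3.4): bits 0-4 = Shift_No, bits 5-6 = Index_RC, bit 7 = RC (1 row, 0 column).
static void apply_shift_key(uint8_t st[4][4], uint8_t key, bool decrypt) {
    const unsigned shift_no = key & 0x1F;
    const int index_rc = (key >> 5) & 0x3;
    const bool is_row = ((key >> 7) & 1) != 0;
    const uint32_t w = load_line(st, is_row, index_rc);
    // encryption rotates left; decryption rotates right by the same amount (left by 32 - n)
    store_line(st, is_row, index_rc, rotl32(w, decrypt ? 32 - shift_no : shift_no));
}

// One round of 5K-SHIFT-AES: five keys applied in order; decryption undoes them in reverse order.
void shift_5k(uint8_t st[4][4], const uint8_t keys[5], bool decrypt) {
    assert(keys != nullptr);
    for (int j = 0; j < 5; ++j)
        apply_shift_key(st, keys[decrypt ? 4 - j : j], decrypt);
}
```

For instance, the key byte 0x81 (RC = 1, Index_RC = 0, Shift_No = 1) rotates row 0 left by one bit, so a row holding 80 00 00 00 becomes 00 00 00 01.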

3.5 The Proposed Improved Block Cipher AES Algorithm

The aim of the improved AES is to achieve higher complexity than AES Rijndael while keeping the time required for encryption/decryption nearly the same. This complexity is obtained by applying the proposed modified SubByte function (DK-SBOX-AES), the proposed modified MixColumn function (4K-MIX-AES) and the proposed modified ShiftRows function (5K-SHIFT-AES). The proposed algorithm has been applied to wave files of various sizes. Algorithm 3.5 describes in detail the encryption process with the proposed modified functions, which follows the same steps as the AES Rijndael algorithm.

Algorithm (3.5): The proposed Improved AES Algorithm
Input: Plaintext block message { State[Row][Column], Row, Column = 1, 2, 3, 4 }
Output: Ciphertext block message { State[Row][Column], Row, Column = 1, 2, 3, 4 }
For every block message to be ciphered by the AES algorithm do:
Step1: AddRoundKey(State[Row][Column])
Step2: Repeat Step3 for nine rounds
Step3: DK-SBOX-AES
       5K-SHIFT-AES
       4K-MIX-AES
       AddRoundKey
Step4: The last round applies the following functions:
       DK-SBOX-AES
       5K-SHIFT-AES
       AddRoundKey
Step5: End
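The round structure of Algorithm 3.5 can be sketched independently of the individual transformations by passing them in as callables. The ImprovedAesStages interface below is hypothetical and only fixes the order of the calls; it makes explicit that the last (tenth) round omits 4K-MIX-AES, exactly as AES Rijndael omits MixColumn.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

typedef uint8_t State[4][4];
// A transformation applied to the State in a given round (0 = the initial AddRoundKey).
using Stage = std::function<void(State&, int round)>;

// Hypothetical interface: the caller supplies the four transformations of Algorithm 3.5.
struct ImprovedAesStages {
    Stage add_round_key, dk_sbox, shift_5k, mix_4k;
};

// Algorithm 3.5 for a 128-bit key (10 rounds).
void improved_aes_encrypt(State& st, const ImprovedAesStages& s, int rounds = 10) {
    assert(rounds >= 1);
    s.add_round_key(st, 0);                         // Step1
    for (int round = 1; round < rounds; ++round) {  // Steps 2-3: nine full rounds
        s.dk_sbox(st, round);
        s.shift_5k(st, round);
        s.mix_4k(st, round);
        s.add_round_key(st, round);
    }
    s.dk_sbox(st, rounds);                          // Step4: last round without 4K-MIX-AES
    s.shift_5k(st, rounds);
    s.add_round_key(st, rounds);
}
```

With 10 rounds this calls AddRoundKey 11 times, DK-SBOX-AES and 5K-SHIFT-AES 10 times each, and 4K-MIX-AES 9 times.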

3.6 The Proposed Stream Cipher Algorithms

The following improved functions, developed for the AES algorithm, can be used to implement encryption/decryption in the proposed stream cipher algorithms:
1. DK-SBOX-AES, for SubByte and InvSubByte.
2. 4K-MIX-AES, for MixColumn and InvMixColumn.
3. 5K-SHIFT-AES, for ShiftRows and InvShiftRows.

Another function, PER-RND-KEY, has been created for the proposed stream cipher algorithms; it represents the key. The key values are arranged randomly according to the user's choice to increase the complexity, and the random key is used to permute the elements of a pair of tiny blocks. The proposed algorithms are applied as stream ciphers on tiny blocks of size N*N, where different block sizes ([2,2], [4,4], [8,8]) can be chosen. Consequently, two algorithms have been developed and applied to wave files:
• The first proposed algorithm (PER-MIX-SBOX) uses three functions: 4K-MIX-AES, DK-SBOX-AES and PER-RND-KEY. In encryption they are applied in the following order: 1. 4K-MIX-AES. 2. PER-RND-KEY. 3. DK-SBOX-AES. In decryption the inverse functions are applied in reverse order: 1. INV-DK-SBOX-AES. 2. INV-PER-RND-KEY. 3. INV-4K-MIX-AES.
• The second proposed algorithm (PER-SHIFT-SBOX) also uses three functions, applied in encryption in the following order: 1. 5K-SHIFT-AES. 2. PER-RND-KEY. 3. DK-SBOX-AES.

In decryption with the same algorithm, the inverse functions are applied in reverse order: 1. INV-DK-SBOX-AES. 2. INV-PER-RND-KEY. 3. INV-5K-SHIFT-AES. The proposed algorithms have been implemented and give successful encryption/decryption results with acceptable execution time. Figure 3.4 illustrates the permutation function, while Algorithm 3.6 gives the pseudocode of the encryption/decryption process on tiny blocks using the PER-RND-KEY function.

Figure (3.4): Permutation function details.


Algorithm 3.6: Random Key Permutation Function in Stream Cipher (PER-RND-KEY)
Input: Two plaintext tiny blocks of size N*N, Block1[r][c] and Block2[r][c], r, c = 1, 2, ..., N.
Output: New_Block1[r][c] and New_Block2[r][c], r, c = 1, 2, ..., N.
Step1: For Per_Key[r][c], generate a unique index for each item of Per_Key, where the index range is 1, 2, ..., N*N.
Step2: Create Per_Block[r][c], r, c = 1, 2, ..., N, where
       first half of Per_Block = first half of Block2,
       second half of Per_Block = second half of Block1.
Step3: Construct New_Per_Block[r][c], r, c = 1, 2, ..., N, from Per_Block according to Per_Key[r][c]: each element of New_Per_Block is the element of Per_Block at the position given by the corresponding Per_Key item.
Step4: Reconstruct New_Block1[r][c] and New_Block2[r][c] as follows:
       first half of New_Block2 = first half of New_Per_Block (its second half is that of Block2),
       second half of New_Block1 = second half of New_Per_Block (its first half is that of Block1).
Step5: End
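A sketch of Algorithm 3.6 for 4*4 tiny blocks. Each block is stored column by column, so the first half means the first two columns, and Per_Key is zero-based as in Example 3.2, with New_Per_Block[i] = Per_Block[Per_Key[i]]. Both layout choices are inferred from Example 3.2, which this sketch reproduces: for instance, the first column of the reconfigured BLOCK2 becomes 80 D9 61 8E. The Tiny type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 4x4 tiny block stored column by column (element 4*c + r).
using Tiny = std::array<uint8_t, 16>;

// Per_Key must contain every index 0..15 exactly once (zero-based, as in Example 3.2).
static bool is_valid_key(const int key[16]) {
    bool seen[16] = {};
    for (int i = 0; i < 16; ++i) {
        if (key[i] < 0 || key[i] > 15 || seen[key[i]]) return false;
        seen[key[i]] = true;
    }
    return true;
}

// PER-RND-KEY. Step2: Per_Block = first half of Block2 + second half of Block1.
// Step3: New_Per_Block[i] = Per_Block[Per_Key[i]] (decryption scatters the elements back).
// Step4: the first half returns to Block2 and the second half to Block1.
void per_rnd_key(Tiny& block1, Tiny& block2, const int key[16], bool decrypt) {
    assert(is_valid_key(key));
    Tiny per{}, out{};
    for (int i = 0; i < 8; ++i) {
        per[i] = block2[i];
        per[8 + i] = block1[8 + i];
    }
    for (int i = 0; i < 16; ++i) {
        if (decrypt) out[key[i]] = per[i];
        else out[i] = per[key[i]];
    }
    for (int i = 0; i < 8; ++i) {
        block2[i] = out[i];
        block1[8 + i] = out[8 + i];
    }
}
```

Only the exchanged halves (the second half of BLOCK1 and the first half of BLOCK2) take part in the permutation; the other two halves pass through unchanged, as in step 5 of Example 3.2.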

The following example (Example 3.2) explains in detail how to encrypt/decrypt the plaintext using the PER-MIX-SBOX algorithm.

Example 3.2: Two tiny blocks, BLOCK1 and BLOCK2, each of size [4*4], are encrypted in this example. The necessary keys are:
a. The substitution byte key (creation of 8 different S-Boxes):
   Rnd_Key[8] = {0xf1, 0x67, 0x25, 0x85, 0xb5, 0xA4, 0x19, 0x4c}
b. The key distribution matrix, used for the random selection of the S-Boxes created in the previous step:
   Key_Enc[16] = {5, 3, 7, 6, 2, 4, 1, 7, 8, 2, 6, 4, 1, 3, 5, 8}
c. Four keys of size 2*2, used in the MixColumn transformation.
d. The permutation key:
   PER_Key[4][4] = {8, 0, 14, 4, 1, 3, 10, 7, 6, 11, 2, 15, 13, 12, 5, 9}

1. The plaintext tiny blocks are:

   BLOCK1          BLOCK2
   22 6A C0 E1     B0 69 35 40
   35 88 C9 DD     A2 5C 32 46
   41 97 D3 D9     94 47 33 57
   4D 96 DF C8     77 39 3A 60

2. After encryption using the 4K-MIX-AES function:

   BLOCK1          BLOCK2
   71 5C 80 9D     D9 8E 63 89
   0C B5 09 3C     94 03 07 06
   96 66 62 61     0D 0C 5C CE
   0C 31 CB FB     E3 7E 21 39

3. The PER-BLOCK configured from the first half (first two columns) of BLOCK2 and the second half (last two columns) of BLOCK1:

   PER-BLOCK
   D9 8E 80 9D
   94 03 09 3C
   0D 0C 62 61
   E3 7E CB FB

4. The RND-PER-BLOCK after rearrangement with the permutation key (PER_Key), with the elements numbered 0 to 15 column by column:

   RND-PER-BLOCK
   80 94 0C 3C
   D9 E3 CB 9D
   61 62 0D 03
   8E 7E FB 09

5. The two tiny blocks are reconfigured:

   BLOCK1          BLOCK2
   71 5C 0C 3C     80 94 63 89
   0C B5 CB 9D     D9 E3 07 06
   96 66 0D 03     61 62 5C CE
   0C 31 FB 09     8E 7E 21 39

6. Encryption using the DK-SBOX-AES function:

   BLOCK1          BLOCK2
   56 E0 39 EB     E5 A5 00 7E
   DD 87 B5 C9     F1 83 06 BA
   89 33 CF DB     A5 99 07 D4
   B3 6C 8C C6     AB 7C 6C FE

The result of step 6 represents the encrypted blocks, which are then decrypted in reverse order as follows:

1. INV-BLOCK1 and INV-BLOCK2 after applying INV-DK-SBOX-AES:

   INV-BLOCK1      INV-BLOCK2
   71 5C 0C 3C     80 94 63 89
   0C B5 CB 9D     D9 E3 07 06
   96 66 0D 03     61 62 5C CE
   0C 31 FB 09     8E 7E 21 39

2. The INV-PER-BLOCK is reconfigured from INV-BLOCK1 and INV-BLOCK2:

   INV-PER-BLOCK
   80 94 0C 3C
   D9 E3 CB 9D
   61 62 0D 03
   8E 7E FB 09

3. The INV-RND-PER-BLOCK after rearrangement with the permutation key (PER_Key):

   INV-RND-PER-BLOCK
   D9 8E 80 9D
   94 03 09 3C
   0D 0C 62 61
   E3 7E CB FB

4. INV-BLOCK1 and INV-BLOCK2 are reconfigured:

   INV-BLOCK1      INV-BLOCK2
   71 5C 80 9D     D9 8E 63 89
   0C B5 09 3C     94 03 07 06
   96 66 62 61     0D 0C 5C CE
   0C 31 CB FB     E3 7E 21 39

5. The inverse blocks are decrypted with INV-4K-MIX-AES:

   INV-BLOCK1      INV-BLOCK2
   22 6A C0 E1     B0 69 35 40
   35 88 C9 DD     A2 5C 32 46
   41 97 D3 D9     94 47 33 57
   4D 96 DF C8     77 39 3A 60


The aim of linking the two blocks and randomly distributing their elements with the PER-RND-KEY function is to increase the complexity of the encryption/decryption process. This is achieved by increasing the degree of diffusion among the elements of the two blocks, which mix with each other and make cryptanalysis more difficult.

3.7 AES Parallelization

During this research, it was found that the total running time of the AES algorithm consists of the following time-consuming operations:
1. Reading data from the input file (plaintext or ciphertext),
2. Data encryption,
3. Data decryption, and
4. Writing the data to the output file (ciphertext or plaintext).

In the proposed parallel AES algorithm, the plaintext and the expanded key are first stored. The plaintext is then subdivided into 16-byte blocks, which are encrypted fully in parallel. In this study, the 128-bit AES algorithm is executed in parallel. Figure 3.5 shows the parallel version of the AES algorithm in detail.

(Flowchart: several parallel branches, each performing Start, Input key, Key expansion, Input plaintext, AddRoundKey, rounds 1 to 9 (SubByte, ShiftRow, MixColumn, AddRoundKey), a final round (SubByte, ShiftRow, AddRoundKey), Save ciphertext, End.)

Figure 3.5: Parallel version of the AES Rijndael 128-bit algorithm.


3.8 Data Parallelism in Wave File using OpenMP

In the data parallelism approach, the data are subdivided into several parts, and each part is sent to a different processor for execution. Each processor performs the same function or procedure, but on different data. This approach is very effective when there is a large amount of data to process. AES can be executed with data parallelism as shown in Figure 3.6: the master processor sends the key together with the first part of the plaintext to processor 1, which computes the ciphertext by running the AES algorithm and sends the result back to the master processor. Processor 2 follows the same procedure with the second part of the plaintext.

(Diagram: the master processor sends Key + Plaintext1 to AES (PROCESSOR1) and Key + Plaintext2 to AES (PROCESSOR2); each processor returns its result to the master processor.)

Figure (3.6): Data Parallelism.

3.8.1 Parallel Design Methodology

In the present study, the wave file data are subdivided into smaller chunks according to the number of available processors, and the encryption/decryption process is performed in parallel, so that the overall throughput is much greater than that of the single-processor implementation, as shown in Figure 3.7.

Figure (3.7): The Methodology Overview.

Algorithm 3.7 gives the proposed pseudocode for the parallel encryption/decryption operation, while Figure 3.8 gives an overview of the methodology of the present study. As shown in this figure, the system subdivides the large data file into small chunks of equal size. The number of chunks depends on the number of available processors in the system; for example, if the system has four CPUs, the number of chunks must be a multiple of four. The encryption is then performed in parallel according to the given number of processes or threads in the system. This process is repeated 100 times and the average execution time is calculated; the purpose of this step is to reduce the influence of runs with large deviations, so that the recorded times are fair.

Algorithm 3.7: The Parallel AES using OpenMP
Input: Wave file to be encrypted or decrypted.
Output: Encrypted or decrypted output file.
Step 1: Define iCPU, the maximum number of processors (CPUs) to be used in the program execution.
Step 2: Divide the wave file into chunks: number of chunks = wave file length / (16 * iCPU), since each chunk holds iCPU sub-chunks of one [4*4] State matrix each.
Step 3: Repeat for all chunks of the wave file:
Step 4: Read the sub-chunks i = 1, 2, ..., iCPU of the chunk, where each sub-chunk represents a [4*4] State matrix.
Step 5: Define the parallel region using OpenMP directives.
Step 6: Each process (CPU) performs the encryption/decryption of its sub-chunk in parallel using the AES code.
Step 7: Go to Step 3.
Step 8: Calculate the execution time.
Step 9: Close all open files.
Step 10: Print the results and end.
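A sketch of the chunking in Algorithm 3.7, with the AES block function left as a parameter: BlockCipher is a hypothetical type standing for any routine that encrypts or decrypts one 16-byte State in place. Each chunk holds iCPU sub-chunks, and the sub-chunks of a chunk are processed by an OpenMP parallel for with num_threads(iCPU). The algorithm does not say how a tail shorter than one chunk is handled, so the sketch leaves it untouched and reports the number of full chunks; timing (Step 8) is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Any routine that encrypts (or decrypts) one 16-byte State in place, e.g. an AES block function.
using BlockCipher = std::function<void(uint8_t* state16)>;

// Algorithm 3.7: the data are split into chunks of i_cpu sub-chunks (one 16-byte State each),
// and the sub-chunks of every chunk are processed in parallel, one per thread.
// Returns the number of full chunks; a shorter tail is left unchanged here.
std::size_t encrypt_in_chunks(std::vector<uint8_t>& data, int i_cpu, const BlockCipher& cipher) {
    assert(i_cpu > 0);
    const std::size_t chunk_bytes = 16 * static_cast<std::size_t>(i_cpu);
    const std::size_t chunks = data.size() / chunk_bytes;           // Step 2
    for (std::size_t ch = 0; ch < chunks; ++ch) {                   // Steps 3 and 7
        uint8_t* chunk = data.data() + ch * chunk_bytes;             // Step 4
        #pragma omp parallel for num_threads(i_cpu)                  // Steps 5 and 6
        for (int t = 0; t < i_cpu; ++t)
            cipher(chunk + 16 * t);
    }
    return chunks;
}
```

Because every sub-chunk is a separate State, the threads never write to the same bytes; the block function itself must be safe to call from several threads at once (for example, it should not modify shared tables).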


(Flowchart: Open wave file; Define iCPU = number of processors; Split the input wave file into chunks, each chunk holding iCPU sub-chunks per round; Create a copy of the AES code for each processor iCPU using OpenMP directives; Encrypt each chunk of the input file in parallel; Run the previous step on the same input file 100 times, recording the execution time of each run; Take the average execution time to control the run-time deviation; Save the encrypted chunks into the output wave file.)

Figure (3.8): Overview of the Methodology distributed over multicore system using OpenMP.

3.9 The Proposed Stream Cipher Methodology

The proposed algorithms PER-MIX-SBOX and PER-SHIFT-SBOX are modified to be compatible with parallel computation when applied to wave files. The master thread (a single thread) performs the initialization and then forks the threads. Each thread works on its share of the cipher initialization, and a global barrier ensures that the initialization is complete before any thread starts encrypting. The first step is to calculate the number of blocks per thread, i.e., the number of blocks in the input data divided by the number of threads. Each thread encrypts/decrypts one pair of blocks simultaneously and saves them in the output file; the threads then move on to the next blocks until the end of the input file is reached. Algorithm 3.8 shows in pseudocode how the OpenMP directives are embedded into the algorithm and how the multiprocessing work is carried out.

Algorithm 3.8: The parallel proposed stream cipher algorithm for the wave file
Input: Wave file to be encrypted/decrypted.
Output: Encrypted/decrypted wave file.
Step1: Read Data-Size, the total data size; MAX-THRD, the maximum number of threads to be used; and length: 2, 4 or 8.
Step2: Initialize the serial part of the code: initialization();
Step3: Read BLOCKS[N*N*MAX-THRD], which holds the pairs of input blocks BLOCK1[N] and BLOCK2[N] for all the processors to be used, N = 1, 2, ..., length.
Step4: for (block-length = 1; block-length < Data-Size; block-length++)
Step5:   Start the OpenMP directives: global barrier(); start the multithreaded operations.
Step6:   Encrypt(BLOCKS);
Step7:   End global barrier();
Step8:   Save the encrypted/decrypted BLOCKS into the output file.
Step9:   Jump to the next BLOCKS.
Step10: End
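A sketch of the block-pair loop of Algorithm 3.8, assuming that each pair (BLOCK1, BLOCK2) of N*N bytes can be processed independently of the other pairs, so that an OpenMP parallel for can share the pairs among MAX-THRD threads. The implicit barrier at the end of the loop plays the role of the global barrier in Step7. PairCipher is a hypothetical stand-in for PER-MIX-SBOX or PER-SHIFT-SBOX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for PER-MIX-SBOX or PER-SHIFT-SBOX: processes one pair of N*N tiny blocks in place.
using PairCipher = std::function<void(uint8_t* block1, uint8_t* block2, std::size_t n2)>;

// Algorithm 3.8: consecutive pairs (BLOCK1, BLOCK2) are independent, so an OpenMP loop shares
// them among max_thrd threads; the implicit barrier at the end of the loop acts as the global barrier.
// Returns the number of pairs processed; bytes after the last full pair are left unchanged here.
std::size_t process_block_pairs(std::vector<uint8_t>& data, int n, int max_thrd,
                                const PairCipher& cipher) {
    assert((n == 2 || n == 4 || n == 8) && max_thrd > 0);
    const std::size_t n2 = static_cast<std::size_t>(n) * static_cast<std::size_t>(n);
    const long pairs = static_cast<long>(data.size() / (2 * n2));
    #pragma omp parallel for num_threads(max_thrd) schedule(static)
    for (long p = 0; p < pairs; ++p) {
        uint8_t* block1 = data.data() + 2 * n2 * static_cast<std::size_t>(p);
        cipher(block1, block1 + n2, n2);
    }
    return static_cast<std::size_t>(pairs);
}
```

With schedule(static), each thread receives a contiguous range of pairs, which corresponds to every thread handling its own share of BLOCKS before the next group is read.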


Chapter Four

Processing Results and Discussion

4.1 Introduction This chapter, demonstrates the processing results obtained by implementing the proposed algorithms described in Chapter 3, where, all programs have been executed in Microsoft Visual Studio 2010/ C++, Windows 7 with 64-bit operating system, Intel(R) Core(TM) i5 , four processors with speed 2,50 GHZ for each. This chapter is subdivided into three main parts. The first part deals with block cipher algorithms and the proposed improved algorithms to increase the complexity in the encryption/decryption process, these algorithms are:  AES Rijndael algorithm.  The proposed improved functions in AES including the following algorithms:  Dual Keys SubByte function (DK- SBOX-AES). 

Four Keys MixColumn function (4K-MIX-AES).



Five Keys ShiftRows functions (5K-SHIFT-AES).

 The proposed Improved AES algorithm. The second part deals with the stream cipher operations. Some of the functions in part one have been used and mixed with a permutation function, PER-RND-KEY, developed to get high complexity algorithms in the encryption/decryption processes. The third part presents the results obtained, after applying the OpenMP directives, on algorithms in the first and second parts to reduce the time spent on both encryption /decryption processes by utilizing different CPUs which is available in the computer.


4.2 Block Cipher
In cryptography, a block cipher is an algorithm that encrypts data in fixed-size groups of bits called blocks; block ciphers are extensively applied to encrypt large amounts of data.

4.2.1 The Proposed SubByte Function using Dual Keys Algorithm (DK-SBOX-AES)
The proposed algorithm uses dual keys. The first key is a set of up to 16 values. As in the AES Rijndael algorithm, each value in this set has a related constant, and each (value, constant) pair generates a different S-Box together with its associated inverse S-Box. The following hexadecimal values represent the first key, which uses eight different values with their related constants to create eight different S-Boxes:
Rnd_Key[8]={0xf1,0x67,0x25,0x85,0xb5,0xA4,0x19,0x4c}
Cons_c[8]={0x63,0x82,0xc4,0x45,0xa5,0x7b,0xd5,0xc1}
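The S-Boxes in Tables 4.1 to 4.4 are consistent with the usual AES construction S(x) = A_k * x^(-1) XOR c over GF(2^8), reading each Rnd_Key value k as the first row of the circulant affine matrix A_k and the matching Cons_c value as c. Under this reading the pair (0xf1, 0x63) gives the standard AES S-Box, and the pair (0x67, 0x82) gives the first entries 82, 4F, F0 of Table 4.1. The C++ sketch below builds a forward/inverse S-Box pair under that assumption. The function names are hypothetical, and the generation procedure the thesis actually uses is the one described in Chapter 3. A circulant matrix of this kind is invertible over GF(2) exactly when k has an odd number of set bits, and all eight Rnd_Key values meet this condition. The sketch rejects any (k, c) pair that does not give a bijection.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1) p ^= a;
        const bool carry = (a & 0x80) != 0;
        a = static_cast<uint8_t>(a << 1);
        if (carry) a ^= 0x1B;
        b >>= 1;
    }
    return p;
}

// Multiplicative inverse computed as a^254 (because a^255 = 1); 0 maps to 0 as in AES.
static uint8_t gf_inv(uint8_t a) {
    if (a == 0) return 0;
    uint8_t result = 1, base = a;
    for (int e = 254; e > 0; e >>= 1) {
        if (e & 1) result = gf_mul(result, base);
        base = gf_mul(base, base);
    }
    return result;
}

// Rotate an 8-bit value right by t positions (0 <= t < 8).
static uint8_t rotr8(uint8_t b, int t) {
    return static_cast<uint8_t>((b >> t) | (b << ((8 - t) & 7)));
}

// Circulant affine matrix whose first row is `key`: bit t of key adds rotr(b, t).
static uint8_t affine(uint8_t b, uint8_t key) {
    uint8_t out = 0;
    for (int t = 0; t < 8; ++t)
        if ((key >> t) & 1) out ^= rotr8(b, t);
    return out;
}

struct SBoxPair {
    std::array<uint8_t, 256> fwd{};  // S-Box
    std::array<uint8_t, 256> inv{};  // inverse S-Box
};

// Builds S(x) = affine(x^-1, key) XOR c and its inverse; returns false if S is not a bijection.
static bool make_sbox(uint8_t key, uint8_t c, SBoxPair& out) {
    std::array<bool, 256> seen{};
    for (int x = 0; x < 256; ++x) {
        const uint8_t y = static_cast<uint8_t>(affine(gf_inv(static_cast<uint8_t>(x)), key) ^ c);
        if (seen[y]) return false;
        seen[y] = true;
        out.fwd[x] = y;
        out.inv[y] = static_cast<uint8_t>(x);
    }
    return true;
}
```

For example, `make_sbox(0x4c, 0xc1, pair)` under this assumption starts with C1, A5, 25, matching the first row of Table 4.3. A key with an even number of set bits, such as 0x03, is rejected.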

Tables 4.1 and 4.3 give the S-Boxes created for the keys 0x67 and 0x4C, and Tables 4.2 and 4.4 give the corresponding inverse S-Boxes. The constants added for these two keys are 0x82 and 0xC1 respectively, and they act as the constant of the affine transformation. Each key used to create an S-Box in the proposed algorithm must give an invertible transformation, so that the S-Box can be reversed during the decryption process. The second key is also a set of values. It represents a random distribution, over the bytes of the State matrix, of the S-Boxes created by the first key. The dual keys increase the degree of complexity while keeping the time needed for the encryption and decryption processes in the SubByte function close to that of AES Rijndael. This is because the proposed algorithm applies different S-Boxes to the bytes during the cipher operations, instead of applying a single fixed S-Box to every byte in the State matrix. The results show that the proposed algorithm has good cryptographic strength. The algorithm is resistant to linear and differential cryptanalysis, since these attacks require the S-Boxes to be known in addition to the encryption key.

Table 4.5 summarizes the comparison between the AES Rijndael and DK-SBOX-AES algorithms, for the proposed algorithm implemented on wave files. The comparison covers the number of S-Boxes, the complexity, the mapping using the S-Box and other properties. The table shows that the complexity is increased many times compared to AES Rijndael; this is achieved by creating between 2 and 16 different S-Boxes. Example 4.1 demonstrates the encryption process using the proposed transformation function (DK-SBOX-AES). The structure of AES Rijndael is unchanged except for the creation of the S-Box and its inverse. Each S-Box table maps 8-bit values, so it has 256 entries and each value appears exactly once in the table. In the proposed DK-SBOX-AES function, each of the 16 elements of the State matrix can be mapped to another value with a complexity of up to 16! * 256!, since the first key generates numerous S-Box look-up tables. The factor 256! represents the possible arrangements of the values in each S-Box, which depend on the multiplicative inverse used to calculate that table.
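The second key decides which of the generated S-Boxes is applied to each byte of the State matrix. The minimal C++ sketch below rests on two assumptions. First, each Key_Enc entry (values 1 to 8, Example 4.1) is taken as a 1-based index into the eight S-Boxes produced from Rnd_Key. Second, the inverse S-Box at the same position undoes the substitution during decryption. The exact mapping is defined in Chapter 3, and the names `dk_sub_bytes`, `dk_inv_sub_bytes` and `Table` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Table = std::array<uint8_t, 256>;  // one S-Box (or inverse S-Box) look-up table
using State = std::array<uint8_t, 16>;   // the 16 bytes of the State matrix

// Hypothetical reading of the second key: byte i of the State is substituted with
// S-Box number key_enc[i], counted from 1. An out-of-range index throws std::out_of_range.
void dk_sub_bytes(State& state, const std::vector<Table>& sboxes,
                  const std::array<int, 16>& key_enc) {
    for (int i = 0; i < 16; ++i)
        state[i] = sboxes.at(static_cast<std::size_t>(key_enc[i] - 1))[state[i]];
}

// Decryption applies the matching inverse S-Box at the same byte position.
void dk_inv_sub_bytes(State& state, const std::vector<Table>& inv_sboxes,
                      const std::array<int, 16>& key_enc) {
    for (int i = 0; i < 16; ++i)
        state[i] = inv_sboxes.at(static_cast<std::size_t>(key_enc[i] - 1))[state[i]];
}
```

Because every position keeps its own S-Box index, decryption needs only the same Key_Enc and the inverse tables. No extra state is carried between bytes.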


Table (4.1): AES S-Box for Key=0x67 and C=0x82.
X\Y 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0   82 4F F0 DE 2F B6 AC 06 C0 FE 98 17 01 63 54 76
1   A3 36 28 C9 1B 03 48 22 43 E8 E6 4E 7D F1 6C 9A
2   12 8A D8 BF D7 65 B3 B4 DA 77 D6 A4 E7 C6 46 8C
3   62 0B 23 11 24 44 E4 6A E9 1D 3B 47 F5 39 8E FD
4   CA B0 86 29 AF 2A 88 EB BC 7F E5 08 1A C1 0D 21
5   3A 74 78 E2 A8 0C 05 0E 30 25 A0 72 E0 F7 85 3F
6   F2 EF D2 9D 52 71 4B A7 45 90 75 C4 B1 EE F6 DF
7   37 60 D9 9E 5E FB F4 BE AD 94 CB 2A 10 87 A9 FF
8   32 56 9B 64 14 C2 C3 81 80 7A 42 68 13 19 A2 EA
9   09 BD 7C DC A5 91 53 0F CE 69 B7 0A D1 92 C7 4D
A   4A CD F9 41 6B 6F B2 9F 97 79 C5 04 B5 CF 50 D3
B   DB AE 51 A1 93 6E FA 59 27 A6 38 73 95 58 C8 4C
C   BA 55 34 8B 3E FC 99 8D 7E 5A 7B B5 66 2B 84 02
D   61 E3 1F 1E ED F3 35 5B 8F 5C 20 31 2C 1C B8 70
E   CC 16 67 96 BB 40 18 49 EC 33 AA F8 B9 2D 9C 57
F   15 6D 89 D0 26 5D D4 3D 5F E1 00 DD 83 AB 3C 07

Table (4.2): Inverse S-Box for Key=0x67 and C=0x82.
X\Y 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0   FA 0C CF 15 AB 56 07 FF 4B 90 9B 31 55 4E 57 97
1   7C 33 20 8C 84 F0 E1 0B E6 8D 4C 14 DD 39 D3 D2
2   DA 4F 17 32 34 59 F4 D8 12 43 45 CD BC ED 7B 04
3   58 DB 80 E9 C2 D6 11 70 BA 3D 50 3A FE F7 C4 5F
4   E5 A3 8A 18 35 68 2E 3B 16 E7 A0 66 BF 9F 1B 01
5   AE B2 64 96 0E C1 81 EF BD B7 C9 B7 D9 F5 74 F8
6   71 D0 30 0D 83 25 CC E2 8B 99 37 A4 1E F1 B5 A5
7   DF 65 5B BB 51 6A 0F 29 52 A9 89 CA 92 1C C8 49
8   88 87 00 FC CE 5E 42 7D 46 F2 21 C3 2F C7 3E D8
9   69 95 9D D4 79 BC E3 A8 0A C6 1F 82 EE 63 73 A7
A   5A B3 8E 10 2B 94 B9 67 54 7E EA FD 06 78 B1 44
B   41 6C A6 26 27 CB 05 9A DE EC C0 E4 48 91 77 23
C   08 4D 85 86 6B AA 2D 9E BE 13 40 7A E0 A1 98 AD
D   F3 9C 62 AF F6 AC 2A 24 22 72 28 B0 93 FB 03 6F
E   5C F9 53 D1 36 4A 1A 2C 19 38 8F 47 E8 D4 6D 61
F   02 1D 60 D5 76 3C 6E 5D EB A2 B6 75 C5 3F 09 7F


Table (4.3): AES S-Box for Key=0x4C and C=0xC1.
X\Y 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0   C1 A5 25 79 65 56 9D C8 45 C6 0A EA 39 02 93 A7
1   83 A8 94 57 F2 C2 54 80 BD 15 F6 58 3E D8 1C F1
2   E0 2E 75 44 6B 0E 5C AD 8E 2A 96 72 0B 49 B7 22
3   FF 2D 7D E6 8C 4C 0D 10 E8 FE 4D 4A 2F B6 D9 C0
4   51 5A 36 69 9B 6F D5 13 42 C5 F0 2B 0F B8 21 86
5   B0 2C 34 01 6A DC CE 27 A4 71 85 20 FA D4 30 BA
6   DE E4 61 00 9F 26 52 74 B1 E5 D1 B2 A7 19 29 84
7   55 04 88 06 87 CC D2 B9 60 12 AC 98 1B CB 97 3B
8   5F 68 0C F3 EC BE 43 C7 3A CF 40 EB 1D 09 7E EE
9   D6 BF C3 82 8F 18 62 DA A6 16 AB D0 67 1E B4 5E
A   AF A0 37 46 ED 1A A1 FB 14 C9 4F 33 90 5B 64 9C
B   73 66 99 78 E3 E7 31 76 8A 89 4B DD EF 8B AA A3
C   4E 6E 53 D3 47 3D F7 DF 38 70 32 50 08 92 CD 3F
D   F9 FC 05 F8 1F 23 AE 8D 34 7C 7B 59 63 03 B5 DB
E   5D 17 F5 E9 B3 BB F4 A9 E2 A2 91 CA 48 9E FD 95
F   11 E1 28 9A 77 81 6D 41 7A 07 C4 7F 3C 6C BC 35

Table (4.4): Inverse S-Box for Key=0x4C and C=0xC1.
X\Y 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0   63 53 0D DD 71 D2 73 F9 CC 8D 0A 2C 82 36 25 4C
1   37 50 79 47 A8 19 99 E1 95 6D A5 7C 1E 8C 9D D4
2   5B 4E 2F D5 D8 02 65 57 F2 6E 29 4B 51 31 21 3C
3   5E B6 CA AB 52 FF 42 A2 C8 0C 88 7F FC C5 1C CF
4   8A F7 48 86 23 08 A3 C4 EC 2D 3B BA 35 3A C0 AA
5   CB 40 66 C2 16 70 05 13 1B DB 41 AD 26 E0 9F 80
6   78 62 96 DC AE 04 B1 9C 81 43 54 24 FD F6 C1 45
7   C9 59 2B B0 67 22 B7 F4 B3 03 F8 DA D9 32 8E FB
8   17 F5 93 10 6F 5A 4F 74 72 B9 B8 BD 34 D7 28 94
9   AC EA CD 0E 12 EF 2A 7E 7B B2 F3 44 AF 06 ED 64
A   A1 A6 E9 BF 58 01 98 6C 11 E7 BE 9A 7A 27 D6 A0
B   50 68 6B E4 9E DE 3D 2E 4D 77 5F E5 F2 18 85 91
C   3F 00 15 92 FA 49 09 87 07 A9 EB 7D 75 CE 56 89
D   9B 6A 76 C3 5D 46 90 0F 1D 3E 97 DF 55 BB 60 C7
E   20 F1 E8 B4 61 69 33 B5 38 E3 0B 8B 84 A4 8F EC
F   4A 1F 14 83 E6 E2 1A C6 D3 D0 5C A7 D1 EE 39 30


Table (4.5): Comparison between AES Rijndael and DK-SBOX-AES algorithms.
Properties | AES Rijndael algorithm | DK-SBOX-AES algorithm
1. Block length | 16 bytes | The same
2. Number of rounds | 10/12/14 | The same
3. Key length (bytes) | 16/24/32 | The same
4. S-box | Single | Multi (2 to 16)
5. Mapping using S-Box | Depends on State matrix | Depends on the index of every byte in State matrix
6. Single round details | AddRoundKey, SubByte, MixColumn, ShiftRow | The same + create the new S-Boxes
7. Complexity | 256! | (2 to 16)! * 256!
8. Time (wave file size = 182 KB) | Encryption = 0.61 sec, Decryption = 0.62 sec | Encryption = 0.76 sec, Decryption = 0.76 sec

The dual keys increase the degree of complexity while keeping the delay time of the encryption and decryption processes in the SubByte function close to that of AES Rijndael. The results show that the proposed S-Box transformation algorithm has good cryptographic strength. In addition, the algorithm resists linear and differential cryptanalysis, since these attacks require both the S-Boxes and the encryption key to be known.

Example 4.1: Encryption operation for the State matrix (4*4) using the proposed algorithm DK-SBOX-AES.
The first key is:
Rnd_Key={0xf1,0x67,0x25,0x85,0xb5,0xA4,0x19,0x4c},
Cons_c ={0x63,0x82,0xc4,0x45,0xa5,0x7b,0xd5,0xc1},
while the second key is:
Key_Enc={5,3,7,6,2,4,1,7,8,2,6,4,1,3,5,8};

1. The encrypted State matrix after one round using DK-SBOX-AES.
State Matrix:
22 6A C0 E1
35 41 88 97
C9 D3 DD D9
4D 96 DF C8
ENC. State Matrix in 1st Round:
5B BF DB A8
96 2B BE 7B
6E AA 77 C2
45 05 3C AD

2. The final encrypted State matrix after 10 rounds using DK-SBOX-AES.
Final ENC. State Matrix:
3D 40 4D FD
E1 CD A4 B2
B3 41 21 15
19 61 5D 7D

4.2.2 The Proposed MixColumn Function in AES using Four Keys (4K-MIX-AES)
The proposed algorithm improves the MixColumn transformation function. This is achieved by splitting the key matrix into four parts, each part representing a different key of dimension [2*2] chosen by the user. The State matrix is likewise subdivided into four parts of dimension [2*2], each corresponding to the key at the same position in the key matrix. The inverse of each key part is calculated over GF(2^8). Figure 4.1 shows the inverse matrix of each part of the key matrix; the key values can be varied.

Each [2*2] matrix below is written row by row, with rows separated by a semicolon.
Key part 1: [02 03; 01 02]   Inverse: [B9 68; D1 B9]
Key part 2: [01 01; 03 01]   Inverse: [8D 8D; 8C 8D]
Key part 3: [01 01; 03 01]   Inverse: [8D 8D; 8C 8D]
Key part 4: [02 03; 01 02]   Inverse: [B9 68; D1 B9]
Figure (4.1): Four key parts and their inverse matrices.
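The inverses in Figure 4.1 can be reproduced with 2*2 matrix arithmetic over GF(2^8). For a key part [a b; c d], det = a*d XOR b*c, and the inverse is det^(-1) * [d b; c a], because subtraction is XOR in characteristic 2. The C++ sketch below assumes two things: the State is stored row by row, and quadrant q of the State is multiplied on the left by key part q. This gives 2 multiplications and 1 XOR per element. The names are hypothetical. The reduction polynomial is a parameter, so the x^8 + x^6 + x^5 + x^2 + 1 variant of Example 4.3 corresponds to `poly = 0x65`, taking that polynomial to be irreducible as the example states. The inverse computed as a^254 is valid only for an irreducible polynomial.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mat2 = std::array<uint8_t, 4>;    // row-major key part: {a, b, c, d} = [a b; c d]
using State = std::array<uint8_t, 16>;  // row-major 4*4 State matrix

// GF(2^8) multiply; `poly` is the low byte of the degree-8 reduction polynomial
// (0x1B for x^8 + x^4 + x^3 + x + 1).
uint8_t gf_mul(uint8_t a, uint8_t b, uint8_t poly = 0x1B) {
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1) p ^= a;
        const bool carry = (a & 0x80) != 0;
        a = static_cast<uint8_t>(a << 1);
        if (carry) a ^= poly;
        b >>= 1;
    }
    return p;
}

// a^254 equals a^-1 when the polynomial is irreducible (255 non-zero field elements).
uint8_t gf_inv(uint8_t a, uint8_t poly = 0x1B) {
    uint8_t result = 1, base = a;
    for (int e = 254; e > 0; e >>= 1) {
        if (e & 1) result = gf_mul(result, base, poly);
        base = gf_mul(base, base, poly);
    }
    return result;
}

// Inverse of [a b; c d] is det^-1 * [d b; c a]; returns false when det == 0.
bool mat2_inverse(const Mat2& k, Mat2& out, uint8_t poly = 0x1B) {
    const uint8_t det = static_cast<uint8_t>(gf_mul(k[0], k[3], poly) ^ gf_mul(k[1], k[2], poly));
    if (det == 0) return false;
    const uint8_t d = gf_inv(det, poly);
    out = {gf_mul(d, k[3], poly), gf_mul(d, k[1], poly),
           gf_mul(d, k[2], poly), gf_mul(d, k[0], poly)};
    return true;
}

// Multiplies each 2*2 quadrant of the State by the key part at the same position
// (quadrants ordered top-left, top-right, bottom-left, bottom-right).
State mix_quadrants(const State& s, const std::array<Mat2, 4>& keys, uint8_t poly = 0x1B) {
    State out{};
    for (int q = 0; q < 4; ++q) {
        const int r0 = (q / 2) * 2, c0 = (q % 2) * 2;
        const Mat2& k = keys[q];
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j) {
                const uint8_t top = s[r0 * 4 + c0 + j];
                const uint8_t bottom = s[(r0 + 1) * 4 + c0 + j];
                // 2 multiplications and 1 XOR per output element
                out[(r0 + i) * 4 + c0 + j] = static_cast<uint8_t>(
                    gf_mul(k[2 * i], top, poly) ^ gf_mul(k[2 * i + 1], bottom, poly));
            }
    }
    return out;
}
```

Decryption under this sketch is `mix_quadrants` called with the four inverse key parts. For example, applying the inverses of the Example 4.2 keys undoes the forward step exactly.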

A new MixColumn transformation has been proposed in which each key part has an inverse. This transformation improves cipher performance at the cost of diffusion power. The proposed modification keeps the throughput unchanged and achieves high complexity with reasonable CPU time. The results of the proposed algorithm, implemented on wave files of different sizes, compare well with AES Rijndael with respect to time. Examples 4.2 and 4.3 show the encryption process for the State matrix in the proposed function (4K-MIX-AES), which uses four keys instead of the single fixed key of AES Rijndael. The only difference between the two examples is the irreducible polynomial. Table 4.6 summarizes the comparison between the AES Rijndael and 4K-MIX-AES algorithms.

Example 4.2: Encryption operation for the State matrix (4*4) using the proposed algorithm (4K-MIX-AES). The four keys (in hexadecimal) are:
a. Key1 = 0x01, 0x03, 0x01, 0x02.
b. Key2 = 0x01, 0x01, 0x03, 0x01.
c. Key3 = 0x01, 0x01, 0x03, 0x01.
d. Key4 = 0x02, 0x03, 0x01, 0x02.
e. Irreducible polynomial m(x) = x^8 + x^4 + x^3 + x + 1.

1. The encrypted State matrix after one round using 4K-MIX-AES.
State Matrix:
22 6A C0 E1
35 41 88 97
C9 D3 DD D9
4D 96 DF C8
ENC. State Matrix in 1st Round:
0B E4 C6 17
F2 0D 08 2A
24 FC 77 AD
03 89 8A A7

2. The final encrypted State matrix after 10 rounds using 4K-MIX-AES.
Final ENC. State Matrix:
33 C5 E5 44
C3 1D 5B 95
21 05 3D 4F
28 8F D0 BD

Example 4.3: Encryption operation for the State matrix (4*4) using the proposed algorithm (4K-MIX-AES). The keys are the same as in the previous example, except that the irreducible polynomial is m(x) = x^8 + x^6 + x^5 + x^2 + 1.

1. The encrypted State matrix after one round using 4K-MIX-AES.
State Matrix:
22 6A C0 E1
35 41 88 97
C9 D3 DD D9
4D 96 DF C8
ENC. State Matrix in 1st Round:
0B E4 B8 17
F2 73 08 2A
24 FC 77 E3
03 89 8A D9

2. The final encrypted State matrix after 10 rounds using 4K-MIX-AES.
Final ENC. State Matrix:
33 02 E8 CF
BD 6D 83 21
E5 31 7A AE
2B 05 56 C3

Table (4.6): A comparison between AES Rijndael and 4K-MIX-AES algorithms.
Properties | AES Rijndael | 4K-MIX-AES
1. Block length | 16 bytes | The same
2. Number of rounds | 10/12/14 | The same
3. Key length (bytes) | 16/24/32 | The same
4. MixColumn keys | Single | Four
5. Single round details | AddRoundKey, SubByte, MixColumn, ShiftRow | The same + four keys, each of dimension 2*2, in the MixColumn function
6. Complexity | 256! | 4! * 256!
7. Time (wave file size = 182 KB) | Encryption = 0.61 sec, Decryption = 0.62 sec | Encryption = 0.48 sec, Decryption = 0.48 sec

4.2.3 The Proposed ShiftRows Function in AES using Five Keys in Shift Operation (5K-SHIFT-AES)
The AES ShiftRows transformation is byte-oriented, as shown in Figure (4.2), whereas the proposed algorithm is bit-oriented. The aim is to achieve higher complexity than AES Rijndael while keeping the time required for the encryption and decryption processes almost the same. This complexity is obtained by applying 5K-SHIFT-AES with five keys. The proposed algorithm has been implemented on wave files of various sizes. Table (4.7) summarizes the comparison between AES Rijndael and the modified AES ShiftRows transformation (5K-SHIFT-AES) for different factors. Example 4.4 shows the encryption process for the State matrix using five keys in the proposed algorithm (5K-SHIFT-AES) instead of the fixed shift offsets of AES Rijndael. Each key is explained in detail in Section 3.4, and its description is given in Table 3.4.

Input State Matrix:
S11 S12 S13 S14
S21 S22 S23 S24
S31 S32 S33 S34
S41 S42 S43 S44
Output State Matrix after ShiftRows:
S11 S12 S13 S14
S22 S23 S24 S21
S33 S34 S31 S32
S44 S41 S42 S43
Figure (4.2): ShiftRow Transformation in AES Algorithm.
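For reference, the byte-oriented baseline of Figure (4.2) rotates row r of the State left by r bytes. The C++ sketch below stores the State row by row as drawn in the figure. Applied to the Rijndael SubByte result of Example 4.5, it returns the ShiftRows result listed there (second row 26 BE 3B 11, third row 90 6C 78 87, fourth row CC 45 38 FA). The bit-oriented, key-driven shifts of 5K-SHIFT-AES are defined in Section 3.4 and are not reproduced here. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using State = std::array<uint8_t, 16>;  // row-major: s[4*r + c] holds S(r+1)(c+1)

// Byte-oriented AES ShiftRows (Figure 4.2): row r is rotated left by r bytes.
State shift_rows(const State& s) {
    State out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[4 * r + c] = s[4 * r + (c + r) % 4];
    return out;
}

// Inverse ShiftRows: row r is rotated right by r bytes.
State inv_shift_rows(const State& s) {
    State out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[4 * r + (c + r) % 4] = s[4 * r + c];
    return out;
}
```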


Table (4.7): A comparison between AES Rijndael and 5K-SHIFT-AES algorithms.
Properties | Standard AES | Modified AES for ShiftRows function
1. Block length | 16 bytes | The same
2. Number of rounds | 10/12/14 | The same
3. Key length (bytes) | 16/24/32 | The same
4. Single round details | AddRoundKey, SubByte, MixColumn, ShiftRow | The same functions + 5 keys selected randomly within ShiftRow in each round
5. ShiftRow keys | One key/one round | 5 keys/one round
6. Complexity | 256! | 5! * 256!
7. Time (wave file size = 182 KB) | Encryption = 0.61 sec, Decryption = 0.62 sec | Encryption = 0.99 sec, Decryption = 0.98 sec

Example 4.4: Encryption operation for the State matrix (4*4) using the proposed algorithm (5K-SHIFT-AES). The function uses five different keys in the shift operation of each round, so ten rounds require fifty keys:
Sft_Key[50]={0x19,0xbe,0xcf,0x74,0x2a,0x6e,0x54,0xab,0xc5,0x1f,
0x33,0xbe,0x29,0x31,0x89,0x24,0x15,0x1c,0xb8,0x43,
0x8f,0xab,0xc8,0x94,0xa0,0x10,0x88,0x99,0x55,0xff,
0x93,0x42,0x11,0x60,0x70,0xaa,0x20,0x68,0xdd,0xdc,
0x92,0xcc,0x14,0x50,0x15,0x10,0x37,0x1d,0xf8,0x9c};

1. The encrypted State matrix after one round using 5K-SHIFT-AES.
State Matrix:
22 6A C0 E1
35 41 88 97
C9 D3 DD D9
4D 96 DF C8
ENC. State Matrix in 1st Round:
02 C5 96 52
46 2B 38 80
53 F0 01 49
18 2B E3 A9

2. The final encrypted State matrix after 10 rounds using 5K-SHIFT-AES.
Final ENC. State Matrix:
7E A1 9A A6
23 5F 97 35
A0 C2 61 D8
DB 31 95 34

4.2.4 The Proposed Improved AES Algorithm
The purpose of the proposed improved AES algorithm is to achieve higher complexity than the AES Rijndael algorithm while keeping the time required for the encryption and decryption processes almost the same. This complexity is obtained by applying the DK-SBOX-AES, 4K-MIX-AES and 5K-SHIFT-AES transformation functions. The proposed algorithm has been applied to wave files of various sizes. Example 4.5 details the first round of the encryption process for the State matrix in both AES Rijndael and the proposed improved AES. It also gives the State matrix resulting from the last round of each algorithm. Table (4.8) compares the time of AES Rijndael and the proposed improved AES algorithm, and Table (4.9) summarizes the comparison between the two algorithms for different factors. These tables show that the degree of complexity is increased many times compared to AES Rijndael. The time for the cipher operations increases only moderately; for example, encrypting the 1515 KB file takes 6.37 seconds instead of 5.3 seconds (Table 4.8).
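The factorial expressions in the complexity rows of Tables 4.5, 4.6, 4.7 and 4.9 are easier to compare on a common scale when rewritten as base-2 logarithms (bits). The figures below simply convert the expressions as stated, taking 16! as the upper end of the (2 to 16)! range; they are not an independent security estimate.

```latex
\begin{aligned}
\log_2(256!) &\approx 1684.0, \qquad \log_2(16!) \approx 44.25, \qquad \log_2(5!) \approx 6.91, \qquad \log_2(4!) \approx 4.58,\\
\log_2\bigl(16!\cdot 4!\cdot 5!\cdot 256!\bigr) &\approx 44.25 + 4.58 + 6.91 + 1684.0 \approx 1739.7 .
\end{aligned}
```

On this scale, the improved AES expression adds about 55.7 bits to the 256! term it shares with AES Rijndael.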

Example 4.5: The encryption operation for the State matrix (4*4) using both AES Rijndael and the proposed improved AES algorithm. The keys used for the proposed improved AES algorithm are those given in Examples 4.1, 4.2 and 4.4.

1. The State matrix after executing the AddRoundKey function.
State Matrix:
22 6A C0 E1
35 41 88 97
C9 D3 DD D9
4D 96 DF C8
ENC. State Matrix in 1st Round:
23 16 04 2A
E3 23 5A 49
C1 EA 96 B8
68 76 14 27

2. The encrypted State matrix after executing the SubByte (AES Rijndael) and DK-SBOX-AES (improved AES) transformation functions.
ENC. State in Rijndael, 1st Round:
26 47 F2 E5
11 26 BE 3B
78 87 90 6C
45 38 FA CC
ENC. State in Imp. AES, 1st Round:
5B BF DB A8
96 2B BE 7B
6E AA 77 C2
45 05 3C AD

3. The encrypted State matrix after executing the ShiftRows (AES Rijndael) and 5K-SHIFT-AES (improved AES) transformation functions.
ENC. State in Rijndael, 1st Round:
26 47 F2 E5
26 BE 3B 11
90 6C 78 87
CC 45 38 FA
ENC. State in Imp. AES, 1st Round:
02 C5 96 52
46 2B 38 80
53 F0 01 49
18 2B E3 A9

4. The encrypted State matrix after executing the MixColumn (AES Rijndael) and 4K-MIX-AES (improved AES) transformation functions.
ENC. State in Rijndael, 1st Round:
CD F3 F5 BD
FA 27 98 F7
BD 8C 53 61
EB 07 89 2E
ENC. State in Imp. AES, 1st Round:
0B E4 C6 17
F2 0D 08 2A
24 FC 77 AD
03 89 8A A7

5. The encrypted State matrix after AddRoundKey for both algorithms.
ENC. State in Rijndael, 1st Round:
8E 92 F2 98
31 CE 17 5A
FE ED 54 44
20 EE 06 83
ENC. State in Imp. AES, 1st Round:
FD CC D0 58
31 FB EF 6C
F5 83 F4 16
1F 2A EE D5

6. The final encrypted State matrix after 10 rounds using both algorithms.
Final ENC. State in Rijndael:
07 67 E9 AC
78 BF 67 BF
89 8E EB 39
91 1C 2A 49
Final ENC. State in Imp. AES:
BA F8 BB 7A
60 4A 12 0E
D3 E7 BD 67
A4 5E 0A A6

Table (4.8): A comparison between AES Rijndael and improved AES in time (seconds).
Data size (KB) | AES Rijndael ENC | Improved AES ENC | AES Rijndael DEC | Improved AES DEC
File1: 182 | 0.61 | 0.76 | 0.62 | 0.76
File2: 540 | 2.28 | 2.3 | 2.18 | 2.25
File3: 1515 | 5.3 | 6.37 | 5.2 | 6.26


Table (4.9): A comparison between AES Rijndael and improved AES algorithms for different factors.
Properties | AES Rijndael | Improved AES
1. Block length | 16 bytes | The same
2. Number of rounds | 10/12/14 | The same
3. Key length (bytes) | 16/24/32 | The same
4. MixColumn keys | Single | Four
5. S-box | Single | Multi (2 to 16)
6. Mapping using S-Box | Depends on State matrix | Depends on the index of every byte in State matrix
7. ShiftRow keys | One key/one round | 5 keys/one round
8. Single round details | AddRoundKey, SubByte, MixColumn, ShiftRow | The same + create the new S-Boxes + four keys, each of dimension 2*2, in the MixColumn function
9. Complexity | 256! | (2 to 16)! * 4! * 5! * 256!

4.3 The Proposed Stream Cipher Algorithms
Typically, stream ciphers execute faster than block ciphers. Using tiny blocks of size {2*2} or {4*4} in the stream cipher makes it possible to reuse the improved AES functions and to mix them with a developed permutation function (PER-RND-KEY). The aim is to achieve high complexity with reasonable time for the encryption and decryption processes. Two stream algorithms have been developed and implemented on wave files of different sizes. The first algorithm (PER-MIX-SBOX) consists of three functions:
1. 4K-MIX-AES.
2. PER-RND-KEY.
3. DK-SBOX-AES.
The second algorithm (PER-SHIFT-SBOX) consists of the following functions:
1. 5K-SHIFT-AES.
2. PER-RND-KEY.
3. DK-SBOX-AES.
The PER-RND-KEY function uses a randomly generated key to permute the elements within the tiny blocks of the plaintext. Table (4.10) summarizes the details of the cipher operations for different factors.
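This section says only that PER-RND-KEY uses a randomly generated key to permute the elements of each tiny block, so the C++ sketch below illustrates key-driven permutation under stated assumptions. The key is used as a seed, `std::mt19937` with `std::shuffle` stands in for the thesis's key generation, and the names `make_permutation`, `permute` and `unpermute` are hypothetical. `std::mt19937` is not a cryptographically secure generator. The algorithm behind `std::shuffle` is also not fixed by the C++ standard, so different standard libraries may produce different permutations for the same key. A real implementation would need a portable, key-driven construction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Derives a permutation of n positions from a numeric key (illustrative only).
std::vector<std::size_t> make_permutation(std::size_t n, uint32_t key) {
    std::vector<std::size_t> perm(n);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::mt19937 rng(key);  // NOT cryptographically secure; stand-in for the key generator
    std::shuffle(perm.begin(), perm.end(), rng);
    return perm;
}

// Encryption: output position i receives input element perm[i].
std::vector<uint8_t> permute(const std::vector<uint8_t>& block, const std::vector<std::size_t>& perm) {
    std::vector<uint8_t> out(block.size());
    for (std::size_t i = 0; i < perm.size(); ++i) out[i] = block[perm[i]];
    return out;
}

// Decryption: element i goes back to position perm[i].
std::vector<uint8_t> unpermute(const std::vector<uint8_t>& block, const std::vector<std::size_t>& perm) {
    std::vector<uint8_t> out(block.size());
    for (std::size_t i = 0; i < perm.size(); ++i) out[perm[i]] = block[i];
    return out;
}
```

In the PER-MIX-SBOX and PER-SHIFT-SBOX pipelines, a permutation of this kind would sit between the MixColumn or ShiftRows step and the DK-SBOX-AES step of each tiny block.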


Table (4.10): The proposed stream algorithms properties.
Properties | Random Key Permutation Stream Algorithm
1. Text length | 2*2, 4*4, 8*8, ...
2. Number of rounds | No rounds
3. Key length | Depends on text length
4. MixColumn keys | 1, 4, 8 or more, depending on key length
5. S-box | Multi, depending on text length
6. Mapping using S-Box | Depends on the index of every byte in the tiny block
7. ShiftRow keys | Five keys
8. Single round details | a) The PER-MIX-SBOX algorithm: 1. 4K-MIX-AES, 2. PER-RND-KEY, 3. DK-SBOX-AES; b) The PER-SHIFT-SBOX algorithm: 1. 5K-SHIFT-AES, 2. PER-RND-KEY, 3. DK-SBOX-AES
9. Complexity | a) (2 to 16)! * 4! * 256! * (key length)!; b) (2 to 16)! * 5! * 256! * (key length)!

4.4 Parallel Block Cipher Algorithms using OpenMP
The proposed algorithms are symmetric-key algorithms: encryption and decryption use the same key. The parallel algorithm is designed with a shared-memory model in which a master thread partitions and distributes the data. The master thread initializes the parameters of the multicore system before assigning work. Both the serial and parallel versions are implemented in Microsoft Visual Studio 2010 / C++ combined with the OpenMP directives. The master thread is also responsible for the input and output files of the encryption/decryption processes: it reads the original file from disk and writes the encrypted/decrypted file to disk. The serial encryption algorithm is shown in Figure 4.3, where the blocks in shared memory are encrypted in order. Figure 4.4 shows the parallel version of the proposed encryption algorithms. For example, with four threads, thread 1 encrypts block 1 while thread 2 processes block 2, and so on. Because the encrypted blocks do not depend on each other, the threads can encrypt the blocks in any order.

Figure (4.3): The serial version of the encryption algorithm


Figure (4.4): The parallel version of the encryption algorithm.
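The scheme of Figures 4.3 and 4.4 can be sketched in C++ with a single OpenMP worksharing loop over independent blocks. This is a minimal sketch under three assumptions: the data length is a multiple of the block size, the per-block function is safe to call from several threads at once, and `encrypt_blocks_parallel` is a hypothetical name. Any block cipher function can be passed as the callback. The pragma takes effect when the compiler's OpenMP support is enabled (`/openmp` in Visual C++, `-fopenmp` in GCC and Clang). Otherwise the pragma is ignored and the same loop runs serially, so one source file covers both the serial version (Figure 4.3) and the parallel version (Figure 4.4).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encrypts (or decrypts) every block of `data` in place. Blocks are independent,
// so OpenMP may hand them to threads in any order (Figure 4.4); without OpenMP
// the pragma is ignored and blocks are processed in order (Figure 4.3).
template <typename BlockFn>
void encrypt_blocks_parallel(std::vector<uint8_t>& data, BlockFn process_block,
                             std::size_t block_size = 16) {
    // Assumes data.size() is a multiple of block_size; padding is not handled here.
    const int nblocks = static_cast<int>(data.size() / block_size);
    // A signed int loop index keeps the loop valid for OpenMP 2.0 (Visual C++ 2010).
#pragma omp parallel for schedule(static)
    for (int b = 0; b < nblocks; ++b) {
        process_block(data.data() + static_cast<std::size_t>(b) * block_size);
    }
}
```

The static schedule gives each thread a contiguous range of blocks. This works well here because every block costs the same amount of work.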

4.4.1 Parallel AES Rijndael Algorithm
The improvement in time obtained by using the OpenMP standard for multiprocessing depends on many factors. These include the file size, the number of processors, the processor clock speed, the parallel programming strategy and the full utilization of system resources. AES Rijndael is implemented on one, two and four CPUs for wave files of different lengths, and the execution times of the encryption/decryption processes are measured in seconds. Table 4.11 lists the results for encryption, and Table 4.12 lists those for decryption. Table 4.13 compares the implementation of AES Rijndael on one, two and four processors; each percentage is the reduction in execution time relative to the single-processor run. Figures 4.5 and 4.6 show graphically the improved time performance of the AES Rijndael algorithm on different numbers of processors.
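The percentages in Tables 4.13, 4.16 and 4.19 agree with the relative reduction in execution time, (T1 - Tn) / T1 * 100, rounded to the nearest integer. Here T1 is the single-CPU time and Tn the time on n CPUs. For example, File1 encryption in Table 4.11 drops from 0.61 to 0.38 seconds on four CPUs. That is a 37.7% reduction, reported as 38%, and corresponds to a speedup of about 1.6. Below is a minimal C++ sketch with hypothetical function names. The speedup T1/Tn is included only for comparison and is not reported in the tables.

```cpp
#include <cassert>
#include <cmath>

// Percentage reduction in execution time relative to the single-CPU run:
// (T1 - Tn) / T1 * 100, as reported in Tables 4.13, 4.16 and 4.19.
double time_saving_percent(double t_serial, double t_parallel) {
    return (t_serial - t_parallel) / t_serial * 100.0;
}

// Classical speedup T1 / Tn (for comparison only; not reported in the tables).
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}
```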


Table (4.11): The encryption time for AES Rijndael implemented on different CPU cores using key size 128 bits.
Data size (KB) | Serial AES, 1 CPU (sec) | Parallel AES, 2 CPUs (sec) | Parallel AES, 4 CPUs (sec)
File1: 182 | 0.61 | 0.48 | 0.38
File2: 540 | 2.28 | 1.6 | 1.1
File3: 1515 | 5.3 | 3.9 | 3

Table (4.12): The decryption time for AES Rijndael implemented on different CPU cores using key size 128 bits.
Data size (KB) | Serial AES, 1 CPU (sec) | Parallel AES, 2 CPUs (sec) | Parallel AES, 4 CPUs (sec)
File1: 182 | 0.62 | 0.48 | 0.39
File2: 540 | 2.18 | 1.45 | 1.15
File3: 1515 | 5.2 | 3.98 | 3

Table (4.13): Improvement in time obtained for AES Rijndael implemented on different CPU cores using key size 128 bits.
Data size (KB) | Encryption, 1 & 2 CPUs (%) | Encryption, 1 & 4 CPUs (%) | Decryption, 1 & 2 CPUs (%) | Decryption, 1 & 4 CPUs (%)
File1: 182 | 21 | 38 | 23 | 37
File2: 540 | 30 | 52 | 33 | 47
File3: 1515 | 26 | 43 | 23 | 42


AES Rijndael Encryption Algorithm: encryption time (sec) versus file size (182, 540 and 1515 KB) for 1, 2 and 4 CPUs.
Figure (4.5): Performance graph of AES Rijndael encryption algorithm using multiple processors on different file sizes.

AES Rijndael Decryption Algorithm: decryption time (sec) versus file size (182, 540 and 1515 KB) for 1, 2 and 4 CPUs.
Figure (4.6): Performance graph of AES Rijndael decryption algorithm using multiple processors on different file sizes.


4.4.2 The Proposed SubByte Function using Dual Keys in the Encryption Algorithm (DK-SBOX-AES)
Several methods for random S-box generation have been proposed in recent years. The main aim of the proposed approach is to generate random key-dependent S-boxes in AES. The results above show that generating S-Boxes from different cipher keys increases complexity. To provide both security and minimum time in secure real-time media transmission, parallel processing is implemented in the proposed SubByte function. Tables 4.14 and 4.15 give the times obtained for the encryption and decryption processes using different file sizes. As these tables show, when the proposed DK-SBOX-AES algorithm runs on four processors, the time needed is roughly half that on a single processor (a 47% to 58% reduction in Table 4.16). Table 4.16 compares the implementation of the algorithm on one, two and four processors; each percentage is the reduction in execution time relative to one processor. For example, for File1 on two processors, the encryption time is reduced by 39% and the decryption time by 37% compared with one processor. Figures 4.7 and 4.8 show graphically the improved time performance of the proposed DK-SBOX-AES algorithm.


Table (4.14): The encryption time for DK-SBOX-AES implemented on different CPU cores using key size 128 bits.
Data size (KB) | Serial, 1 CPU (sec) | Parallel, 2 CPUs (sec) | Parallel, 4 CPUs (sec)
File1: 182 | 0.76 | 0.46 | 0.39
File2: 540 | 2.3 | 1.4 | 1.1
File3: 1515 | 6.4 | 4.02 | 3.3

Table (4.15): The decryption time for DK-SBOX-AES implemented on different CPU cores using key size 128 bits.
Data size (KB) | Serial, 1 CPU (sec) | Parallel, 2 CPUs (sec) | Parallel, 4 CPUs (sec)
File1: 182 | 0.76 | 0.48 | 0.32
File2: 540 | 2.3 | 1.4 | 1
File3: 1515 | 6.3 | 4.02 | 3.35

Table (4.16): Improvement in time obtained for DK-SBOX-AES implemented on different CPU cores using key size 128 bits.
Data size (KB) | Encryption, 1 & 2 CPUs (%) | Encryption, 1 & 4 CPUs (%) | Decryption, 1 & 2 CPUs (%) | Decryption, 1 & 4 CPUs (%)
File1: 182 | 39 | 49 | 37 | 58
File2: 540 | 39 | 52 | 39 | 57
File3: 1515 | 37 | 48 | 36 | 47


DK-SBOX-AES Encryption Algorithm: encryption time (sec) versus file size (182, 540 and 1515 KB) for 1, 2 and 4 CPUs.
Figure (4.7): Performance graph of DK-SBOX-AES encryption algorithm using multiple processors on different file sizes.

DK-SBOX-AES Decryption Algorithm: decryption time (sec) versus file size (182, 540 and 1515 KB) for 1, 2 and 4 CPUs.
Figure (4.8): Performance graph of DK-SBOX-AES decryption algorithm using multiple processors on different file sizes.


4.4.3 The Proposed MixColumn Transformation in AES using Four Keys (4K-MIX-AES)
The MixColumn transformation is the most expensive operation, since the input matrix is multiplied over GF(2^8). In AES Rijndael, the forward and inverse MixColumn transformation functions use a single fixed key matrix operating on the State matrix of dimension [4*4]. In the proposed 4K-MIX-AES algorithm, this fixed key is split into four keys, and the inverse of each key is found in GF(2^8). In this way the degree of complexity is increased while the number of arithmetic operations (multiplications and additions) is reduced. In AES Rijndael, each element of the State matrix requires 4 multiplications and 3 XOR operations:
Total number of multiplications for the State [4*4] = 16 * 4 = 64; total number of XOR operations for the State [4*4] = 16 * 3 = 48.
The proposed 4K-MIX algorithm performs fewer operations, since each State part and key part has dimension [2*2], as shown in equation 3.1. Each element requires 2 multiplications and 1 XOR operation:
Total number of multiplications for the State (four [2*2] parts) = 16 * 2 = 32; total number of XOR operations for the State (four [2*2] parts) = 16 * 1 = 16.
Tables 4.17 and 4.18 give the times obtained on different numbers of CPUs. Compared with Tables 4.11 and 4.12, they show that this algorithm needs less time than AES Rijndael for both encryption and decryption on one, two and four CPUs (for example, 4.09 versus 5.3 seconds for serial encryption of File3). Table 4.19 shows the time saved by parallel processing for the same algorithm. Figures 4.9 and 4.10 show graphically the improved time performance of the proposed 4K-MIX-AES algorithm.

Table (4.17): The encryption time for 4K-MIX-AES implemented on different CPU cores using key size 128 bits.
Data size (KB) | Serial, 1 CPU (sec) | Parallel, 2 CPUs (sec) | Parallel, 4 CPUs (sec)
File1: 182 | 0.48 | 0.32 | 0.22
File2: 540 | 1.46 | 0.87 | 0.71
File3: 1515 | 4.09 | 2.48 | 2

Table (4.18): The decryption time for 4K-MIX-AES implemented on different CPU cores using key size 128 bits.
Data size (KB) | Serial, 1 CPU (sec) | Parallel, 2 CPUs (sec) | Parallel, 4 CPUs (sec)
File1: 182 | 0.48 | 0.31 | 0.22
File2: 540 | 1.4 | 0.9 | 0.7
File3: 1515 | 3.91 | 2.5 | 2


Table (4.19): Improvement in time obtained for 4K-MIX-AES implemented on different CPU cores using key size 128 bits.
Data size (KB) | Encryption, 1 & 2 CPUs (%) | Encryption, 1 & 4 CPUs (%) | Decryption, 1 & 2 CPUs (%) | Decryption, 1 & 4 CPUs (%)
File1: 182 | 33 | 54 | 35 | 54
File2: 540 | 40 | 51 | 36 | 50
File3: 1515 | 39 | 51 | 36 | 49

4K-MIX-AES Encryption Algorithm: encryption time (sec) versus file size (182, 540 and 1515 KB) for 1, 2 and 4 CPUs.
Figure (4.9): Performance graph of 4K-MIX-AES encryption algorithm using multiple processors on different file sizes.


[Graph: 4K-MIX-AES Decryption Algorithm; decryption time (sec) versus file size (182, 540 and 1515 KB) for 1 CPU, 2 CPUs and 4 CPUs.]

Figure (4.10): Performance graph of the 4K-MIX-AES decryption algorithm using multiple processors on different file sizes.

4.4.4 The Proposed ShiftRows Transformation in AES using Five Keys in Shift Operation (5K-SHIFT-AES)

The key set used for encryption and decryption in the proposed ShiftRows transformation is selected randomly by the user, with five keys for each round. The size of the key set therefore depends on the number of rounds, which is set by the length of the key used in the AddRoundKey function:

- 10 rounds for a 128-bit key give a key set of 50 keys.
- 12 rounds for a 192-bit key give a key set of 60 keys.
- 14 rounds for a 256-bit key give a key set of 70 keys.

Five keys are used in each round to increase the complexity of the ShiftRows transformation in the 5K-SHIFT-AES algorithm. Tables 4.20 and 4.21 show the encryption and decryption times, and Table 4.22 shows the time saved. Figures 4.11 and 4.12 show graphically the improved time performance of the proposed 5K-SHIFT-AES algorithm.
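The key-set sizes above follow from the AES round counts (Nr = 10, 12 and 14 for 128-, 192- and 256-bit keys). The C++ sketch below computes them. It also adds an illustrative keyed ShiftRows that rotates each State row by an amount the caller supplies.

The five user keys are turned into these amounts elsewhere in the thesis, and that step is not modelled here. `keyed_shift_rows` and its `shifts` parameter are therefore assumptions.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>

// Number of AES rounds for a given key size (FIPS-197: Nr = Nk + 6).
int aes_rounds(int key_bits) {
    switch (key_bits) {
        case 128: return 10;
        case 192: return 12;
        case 256: return 14;
        default: throw std::invalid_argument("key size must be 128, 192 or 256");
    }
}

// Size of the 5K-SHIFT-AES key set: five shift keys for each round.
int shift_key_set_size(int key_bits) { return 5 * aes_rounds(key_bits); }

// The State is stored column-major as in FIPS-197: state[r + 4*c].
using State = std::array<uint8_t, 16>;

// Illustrative keyed ShiftRows (an assumption, not the thesis definition):
// row r is rotated left by shifts[r] positions (mod 4).
State keyed_shift_rows(const State& s, const std::array<int, 4>& shifts) {
    State out{};
    for (int r = 0; r < 4; ++r) {
        int k = ((shifts[r] % 4) + 4) % 4;       // normalise to 0..3
        for (int c = 0; c < 4; ++c)
            out[r + 4 * c] = s[r + 4 * ((c + k) % 4)];
    }
    return out;
}
```

With shifts {0, 1, 2, 3}, the function reduces to the standard AES ShiftRows. Applying {0, 3, 2, 1} afterwards undoes it, which is the kind of inverse a decryption routine would need.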

Table (4.20): The encryption time for 5K-SHIFT-AES implemented on different CPU cores using key size 128 bits.

| Data size (KB) | Serial AES, 1 CPU: time (sec) | Parallel AES, 2 CPUs: time (sec) | Parallel AES, 4 CPUs: time (sec) |
|---|---|---|---|
| File1: 182 | 0.99 | 0.74 | 0.49 |
| File2: 540 | 3.1 | 2.37 | 1.52 |
| File3: 1515 | 8.6 | 6.67 | 4.27 |

Table (4.21): The decryption time for 5K-SHIFT-AES implemented on different CPU cores using key size 128 bits.

| Data size (KB) | Serial AES, 1 CPU: time (sec) | Parallel AES, 2 CPUs: time (sec) | Parallel AES, 4 CPUs: time (sec) |
|---|---|---|---|
| File1: 182 | 0.98 | 0.79 | 0.53 |
| File2: 540 | 3 | 2.27 | 1.52 |
| File3: 1515 | 8.3 | 6.67 | 4.24 |

Table (4.22): Time saving percentage obtained for 5K-SHIFT-AES implemented on different CPU cores using key size 128 bits.

| Data size (KB) | Encryption: 1 & 2 CPUs (%) | Encryption: 1 & 4 CPUs (%) | Decryption: 1 & 2 CPUs (%) | Decryption: 1 & 4 CPUs (%) |
|---|---|---|---|---|
| File1: 182 | 25 | 51 | 19 | 46 |
| File2: 540 | 24 | 51 | 24 | 49 |
| File3: 1515 | 22 | 50 | 20 | 49 |


[Graph: 5K-SHIFT-AES Encryption Algorithm; encryption time (sec) versus file size (182, 540 and 1515 KB) for 1 CPU, 2 CPUs and 4 CPUs.]

Figure (4.11): Performance graph of the 5K-SHIFT-AES encryption algorithm using multiple processors on different file sizes.

[Graph: 5K-SHIFT-AES Decryption Algorithm; decryption time (sec) versus file size (182, 540 and 1515 KB) for 1 CPU, 2 CPUs and 4 CPUs.]

Figure (4.12): Performance graph of the 5K-SHIFT-AES decryption algorithm using multiple processors on different file sizes.


4.4.5 The Proposed Improved AES Algorithm

Three functions of AES Rijndael (SubByte, MixColumn and ShiftRows) have been modified to achieve higher complexity while keeping the encryption and decryption times close to those of the original. The performance of AES Rijndael and the proposed improved AES algorithm was compared by applying both algorithms to wave files of different lengths. Despite the increased complexity, there is little difference in execution time between the two schemes for encryption and decryption.

Tables 4.23 and 4.24 show the results, and Table 4.25 gives the percentage of time saved when running on multiple processors. For the 1515 KB file, the time saving with 2 and 4 processors ranges from 32% to 46% for encryption and from 35% to 44% for decryption. With 4 processors, the encryption and decryption times are therefore reduced to roughly half. Figures 4.13 and 4.14 show graphically the improved time performance of the proposed improved AES algorithm.

Table (4.23): The encryption time of the proposed improved AES algorithm.

| Data size (KB) | Serial AES, 1 CPU: time (sec) | Parallel AES, 2 CPUs: time (sec) | Parallel AES, 4 CPUs: time (sec) |
|---|---|---|---|
| File1: 182 | 0.76 | 0.5 | 0.45 |
| File2: 540 | 2.3 | 1.46 | 1.32 |
| File3: 1515 | 6.37 | 4.3 | 3.46 |


Table (4.24): The decryption time of the proposed improved AES algorithm.

| Data size (KB) | Serial AES, 1 CPU: time (sec) | Parallel AES, 2 CPUs: time (sec) | Parallel AES, 4 CPUs: time (sec) |
|---|---|---|---|
| File1: 182 | 0.76 | 0.53 | 0.46 |
| File2: 540 | 2.25 | 1.45 | 1.35 |
| File3: 1515 | 6.26 | 4.1 | 3.48 |

Table (4.25): Time saving percentage obtained for the proposed improved AES implemented on different CPU cores using key size 128 bits.

| Data size (KB) | Encryption: 1 & 2 CPUs (%) | Encryption: 1 & 4 CPUs (%) | Decryption: 1 & 2 CPUs (%) | Decryption: 1 & 4 CPUs (%) |
|---|---|---|---|---|
| File1: 182 | 34 | 41 | 30 | 39 |
| File2: 540 | 37 | 43 | 36 | 40 |
| File3: 1515 | 32 | 46 | 35 | 44 |


[Graph: The Improved AES Encryption Algorithm; encryption time (sec) versus file size (182, 540 and 1515 KB) for 1 CPU, 2 CPUs and 4 CPUs.]

Figure (4.13): Performance graph of the improved AES encryption algorithm using multiple processors on different file sizes.

[Graph: The Improved AES Decryption Algorithm; decryption time (sec) versus file size (182, 540 and 1515 KB) for 1 CPU, 2 CPUs and 4 CPUs.]

Figure (4.14): Performance graph of the improved AES decryption algorithm using multiple processors on different file sizes.


AES Rijndael and the proposed improved AES were compared on audio wave files. The encryption and decryption times of the two algorithms are close, while the complexity of the proposed improved AES is far higher than that of AES Rijndael. Table 4.26 summarizes the encryption times of the two algorithms on different numbers of processors, and Table 4.27 gives the same comparison for decryption.

For the 1515 KB file, the improved AES on 4 CPUs needs 3.46 s for encryption and 3.48 s for decryption. This is close to the 3 s needed by AES Rijndael on 4 CPUs, despite the much higher complexity of the improved AES (256! * (2-16)! * 4! * 5!) compared with AES Rijndael (256!). This is considered a good achievement.
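The factorial complexity figures can be compared on a common scale by expressing them in bits as log2(n!). The sketch computes this with `std::lgamma`, since ln(n!) = lgamma(n + 1). It covers only the 256!, 4! and 5! factors; the middle factor is left out of this illustration. The function names are illustrative.

```cpp
#include <cmath>

// log2(n!) computed through the log-gamma function: ln(n!) = lgamma(n + 1).
double log2_factorial(int n) {
    return std::lgamma(static_cast<double>(n) + 1.0) / std::log(2.0);
}

// The bit count of a product of factorials is the sum of their log2 values.
double log2_product_of_factorials(const int* ns, int count) {
    double bits = 0.0;
    for (int i = 0; i < count; ++i) bits += log2_factorial(ns[i]);
    return bits;
}
```

log2(256!) is roughly 1684 bits. The 4! and 5! factors add about 4.6 and 6.9 bits respectively.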

Table (4.26): Comparison of the encryption execution time of AES Rijndael and the proposed improved AES.

| Data size (KB) | 1 CPU: AES (sec) | 1 CPU: Improved AES (sec) | 2 CPUs: AES (sec) | 2 CPUs: Improved AES (sec) | 4 CPUs: AES (sec) | 4 CPUs: Improved AES (sec) |
|---|---|---|---|---|---|---|
| File1: 182 | 0.61 | 0.76 | 0.48 | 0.5 | 0.38 | 0.45 |
| File2: 540 | 2.28 | 2.3 | 1.6 | 1.46 | 1.1 | 1.32 |
| File3: 1515 | 5.3 | 6.37 | 3.9 | 4.3 | 3 | 3.46 |


Table (4.27): Comparison of the decryption execution time of AES Rijndael and the proposed improved AES.

| Data size (KB) | 1 CPU: AES (sec) | 1 CPU: Improved AES (sec) | 2 CPUs: AES (sec) | 2 CPUs: Improved AES (sec) | 4 CPUs: AES (sec) | 4 CPUs: Improved AES (sec) |
|---|---|---|---|---|---|---|
| File1: 182 | 0.62 | 0.76 | 0.48 | 0.53 | 0.39 | 0.46 |
| File2: 540 | 2.18 | 2.25 | 1.45 | 1.45 | 1.15 | 1.35 |
| File3: 1515 | 5.2 | 6.26 | 3.98 | 4.1 | 3 | 3.48 |

4.4.6 The Proposed Parallel Stream Cipher Algorithms

Two stream cipher algorithms (PER-MIX-SBOX and PER-SHIFT-SBOX) have been developed and implemented on different numbers of CPUs, as mentioned before. Table 4.28 displays the encryption times obtained with these algorithms. Decryption works in the reverse order for both algorithms, and Table 4.29 shows the decryption times.

Table (4.28): Encryption time of the proposed stream cipher algorithms.

| Data size (KB) | 1 CPU: PER-MIX-SBOX (sec) | 1 CPU: PER-SHIFT-SBOX (sec) | 2 CPUs: PER-MIX-SBOX (sec) | 2 CPUs: PER-SHIFT-SBOX (sec) | 4 CPUs: PER-MIX-SBOX (sec) | 4 CPUs: PER-SHIFT-SBOX (sec) |
|---|---|---|---|---|---|---|
| File1: 182 | 0.058 | 0.041 | 0.085 | 0.077 | 0.037 | 0.047 |
| File2: 540 | 0.174 | 0.13 | 0.27 | 0.27 | 0.13 | 0.09 |
| File3: 1515 | 0.468 | 0.35 | 0.76 | 0.69 | 0.32 | 0.28 |


Table (4.29): Decryption time of the proposed stream cipher algorithms.

| Data size (KB) | 1 CPU: PER-MIX-SBOX (sec) | 1 CPU: PER-SHIFT-SBOX (sec) | 2 CPUs: PER-MIX-SBOX (sec) | 2 CPUs: PER-SHIFT-SBOX (sec) | 4 CPUs: PER-MIX-SBOX (sec) | 4 CPUs: PER-SHIFT-SBOX (sec) |
|---|---|---|---|---|---|---|
| File1: 182 | 0.057 | 0.039 | 0.084 | 0.76 | 0.038 | 0.048 |
| File2: 540 | 0.171 | 0.12 | 0.28 | 0.24 | 0.153 | 0.098 |
| File3: 1515 | 0.457 | 0.32 | 0.75 | 0.7 | 0.34 | 0.28 |

The following points may be drawn from the results of Tables 4.28 and 4.29:

a. PER-MIX-SBOX gained more from parallel execution than PER-SHIFT-SBOX when both algorithms were run on more than one processor. The multiplication operations in PER-MIX-SBOX are the most time-consuming part, so distributing them over several processors improves its execution time the most.

b. On two processors, neither algorithm was faster than on a single processor; both were in fact slower. This is due to the extra overhead of the fork and join operations in OpenMP. The overhead is reduced when more processors are used, because it is distributed among them, so both algorithms perform better on 4 processors than on 2.

c. PER-SHIFT-SBOX shows no large difference in time between 4 processors and a single processor.

Figures 4.15 and 4.16 show graphically the encryption and decryption time performance of the PER-MIX-SBOX algorithm.

[Graph: PER-MIX-SBOX encryption algorithm; encryption time (sec) versus file size (182, 540 and 1515 KB) for 1 CPU, 2 CPUs and 4 CPUs.]

Figure (4.15): Performance graph of the PER-MIX-SBOX encryption algorithm using multiple processors on different file sizes.

[Graph: PER-MIX-SBOX decryption algorithm; decryption time (sec) versus file size (182, 540 and 1515 KB) for 1 CPU, 2 CPUs and 4 CPUs.]

Figure (4.16): Performance graph of the PER-MIX-SBOX decryption algorithm using multiple processors on different file sizes.

Chapter Five

Conclusions and Recommendations

5.1 Conclusions

The main conclusions of this study can be summarized as follows:

1. The AES Rijndael transformation functions (SubByte, MixColumn and ShiftRows) have been modified to increase the complexity of the block cipher while staying within the same finite field, GF(2^8). The modified functions are:
• The Dual Keys SubByte function (DK-SBOX-AES).
• The Four Keys MixColumn function (4K-MIX-AES).
• The Five Keys ShiftRows function (5K-SHIFT-AES).

2. Each original AES Rijndael function was replaced with its modified version and the resulting algorithm was executed. The following properties were obtained:

a. The complexity of the AES Rijndael algorithm is considered to be 256!, while that of the proposed improved AES algorithm is 256! * (2 to 6)! * 4! * 5!.

b. The encryption and decryption operations take only slightly longer than in the original AES Rijndael algorithm.

3. Increasing the complexity of an algorithm increases its processing time. To reduce this time, OpenMP directives were used to parallelize encryption and decryption in the proposed algorithms.

4. For the proposed improved AES algorithm and the 1515 KB file, for example, the time saving obtained with 2 and 4 processors ranges from 32% to 46% for encryption and from 35% to 44% for decryption. The time is therefore reduced to approximately one half with 4 processors.

5. The ideal speedup with OpenMP directives depends on the number of threads used in the system, so choosing the correct parallelization strategy is very important. A good choice can bring the execution time close to the ideal. A poor choice can make it even worse than the sequential time.
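The usual measures behind "ideal speedup" are the speedup S_p = T_1 / T_p and the parallel efficiency E_p = S_p / p. The ideal case is S_p = p, that is E_p = 1. Below is a small C++ sketch; the function names are illustrative.

```cpp
// Speedup of a run on p processors relative to the single-processor time t1.
double speedup(double t1, double tp) { return t1 / tp; }

// Parallel efficiency: the fraction of the ideal speedup (p) actually achieved.
double efficiency(double t1, double tp, int p) { return speedup(t1, tp) / p; }
```

Table 4.23 gives the improved AES encryption times for the 1515 KB file. From these, S_4 = 6.37 / 3.46 ≈ 1.84 and E_4 ≈ 0.46, well below the ideal speedup of 4.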


5.2 Recommendations

The following topics are suggested for future research:

1. In this study, the operations are performed in the finite field GF(2^8). Larger fields, e.g. GF(2^16), could be used to increase the complexity of the mathematical operations.

2. Audio file types other than wave, such as MP3 (a compressed file type), could be used. The proposed algorithms could be applied to the compressed type and the resulting times compared.

3. The sequential programs could be parallelized with languages other than C++ with OpenMP directives, e.g. VB.NET (.NET Framework). The two languages could then be compared when converting the sequential program to its parallel version.

4. Task parallelism could be used instead of data parallelism to parallelize the algorithms in OpenMP, and the results of the two approaches compared.

5. The algorithms could be applied to other data types, such as text, video and image files.


References

[Aly 14] Alyasseri, Z. A. A., Al-Attar, K., Ismail, N., M., 2014, "Parallelize Bubble Sort Algorithm Using OpenMP", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 1, January.

[Akt 06] Akhter, S. and Roberts, J., 2006, "Multi-Core Programming: Increasing Performance through Software Multi-Threading", Intel Press Corporation, 321 P.

[ARB 11] ARB version 3.1, 2011, "OpenMP Application Program Interface", 346 P.

[ARB 13] ARB version 3.1, 2013, "OpenMP Application Program Interface", Version 4.0, July 2013, 320 P.

[ARB 14] ARB, OpenMP Architecture Review Board, 2014, "OpenMP Application Program Interface Examples", Version 4.0.1, 226 P.

[Azi 07] Aziz, W. M., 2007, "Fast Fractal Audio Compression Using DCT Descriptors", M.Sc. Thesis, Al-Nahrain University, College of Science.

[Bar ] Barney, B., "OpenMP", Lawrence Livermore National Laboratory. https://computing.llnl.gov/tutorials/openMP.

[Bar 12] Barnes, A., Fernando, R., Mettananda, K. and Ragel, R., 2012, "Improving the Throughput of the AES Algorithm with Multicore Processors", IEEE 7th International Conference on Industrial and Information Systems, Aug. 2012, pp. 1-6.

[Bed 13] Beden, A. K., 2013, "Speeding-up Fractal Audio Compression Using Tiling and Complementary Moments", M.Sc. Thesis in Computer Science, College of Science, University of Baghdad.

[Bie 05] Bielecki, W. and Burak, D., 2005, "Parallelization of the AES Algorithm", Proceedings of the 4th WSEAS Int. Conf. on Information Security, Communications and Computers, Tenerife, Spain, December 16-18, 2005, pp. 224-228.

[Bos 03] Bosi, M. and Goldberg, R. E., 2003, "Introduction to Digital Audio Coding and Standards", Kluwer Academic Publishers, Text book.

[Cha 08] Chapman, B., Jost, G. and Pas, R. van der, 2008, "Using OpenMP: Portable Shared Memory Parallel Programming", The MIT Press, Cambridge, Massachusetts; London, England, Text book.

[Cha 09] Chapman, B., Huang, L., Biscondi, E., Stotzer, E., Shrivastava, A. and Gatherer, A., 2009, "Implementing OpenMP on a High Performance Embedded Multicore MPSoC", IEEE International Symposium on Parallel & Distributed Processing, pp. 1-8, IEEE Conference Publications.

[Dae 03] Daemen, J. and Rijmen, V., 2003, "AES Proposal: Rijndael", document version 2, modified on 9/04/2003.

[Das 12] Das, I., Nath, S., Roy, S. and Mondal, S., 2012, "Random S-Box Generation in AES by Changing Irreducible Polynomial", International Conference on Communications, Devices and Intelligent Systems (CODIS), 28-29 Dec. 2012, Kolkata, pp. 56-559, IEEE.

[Des 11] Deshmukh, A., 2011, "Writing Parallel Processing Compatible Engines Using OpenMP", Cytel Incorporation India, PhUSE 2011, Paper CC02, Seventh Annual Conference, Brighton, 9th to 12th October 2011, pp. 1-8.

[Gra 10] Graham, J., Jones, S. and Wirtzfeld, J., 2010, "AES Voice Encryption", Group 228, Final Report, 26 P.

[Hoe 06] Hoeflinger, J. P., 2006, "Extending OpenMP to Clusters", White Paper, Intel Corporation, 10 P.

[Hos 12] Hosseinkhani, R. and Javadi, H. H. S., 2012, "Using Cipher Key to Generate Dynamic S-Box in AES Cipher System", International Journal of Computer Science and Security (IJCSS), Vol. 6, Issue 1, pp. 19-28.

[Jun 05] Junod, P., 2005, "Statistical Cryptanalysis of Block Ciphers", Ph.D. thesis, Institute of System and Communication, University of Lausanne.

[Lam 13] Lambić, D. and Živković, M., 2013, "Comparison of Random S-Box Generation Methods", Publications de l'Institut Mathématique, Nouvelle série, tome 93 (107), pp. 109-115.

[Lid 00] Lidl, R. and Niederreiter, H., 2000, "Introduction to Finite Fields and Their Applications", Cambridge University Press, Text book.

[Mad 12] Madhuri, O. B. B., Rambabu, E. and Murali, M., 2010, "Design and Implementation of Arithmetic Unit for GF(2^m)", International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Vol. 1, Issue 9, pp. 185-191.

[Man 04] Manteen, R., 2004, "A VHDL Implementation of the Advanced Encryption Standard-Rijndael Algorithm", University of South Florida, College of Engineering, M.S. thesis.

[Man 10] Manchanda, N. and Anand, K., 2010, "Non-Uniform Memory Access (NUMA)", New York University.

[Mar 11] Marshall, D., 2011, "Parallel Programming with Microsoft Visual Studio 2010 Step by Step", O'Reilly Media, Inc., 226 P., Text book.

[Mat 12] Matloff, N., 2012, "Programming on Parallel Machines", CUDA and NVIDIA, 324 P.

[Nag 14] Nagendra, M. and Sekhar, M. C., 2014, "Performance Improvement of Advanced Encryption Algorithm using Parallel Computation", International Journal of Software Engineering and Its Applications, SERSC, Vol. 8, No. 2, pp. 287-296.

[Nal 07] Nalini, C., Anandmohan, P. V., Poornaiah, D. V. and Kulkarni, V. D., 2007, "Optimized S-BOX Design for AES Core", IET-UK International Conference on Information and Communication Technology in Electrical Sciences (ICTES 2007), India, Dec. 20-22, 2007, pp. 843-849.

[Nav 13] Navalgund, S. S., Desai, A., Ankalgi, K. and Yamanur, H., 2013, "Parallelization of AES Algorithm Using OpenMP", Lecture Notes on Information Theory, Engineering and Technology Publishing, Vol. 1, No. 4, December 2013.

[NIST 01] NIST, 2001, "Advanced Encryption Standard (AES)", Federal Information Processing Standards Publication (FIPS PUB) 197.

[Phi 08] Philos, G. C., Dimakopoulos, V. V. and Hadjidoukas, P. E., 2008, "A Runtime System Architecture for Ubiquitous Support of OpenMP", International Symposium on Parallel and Distributed Computing, pp. 189-196, IEEE.

[Rah 11] Rahma, A. M. S. and Yacob, B. Z., 2011, "The Dynamic Dual Key Encryption Algorithm Based on Joint Galois Fields", IJCSNS International Journal of Computer Science and Network Security, Vol. 11, No. 8, pp. 190-199.

[Rah 12] Rahma, A. M. S. and Yacob, B. Z., 2012, "Real-Time Partial Encryption of Digital Video Using Symmetric Dynamic Dual Keys Algorithm (SDD)", Eng. & Tech. Journal, Vol. 30, No. 5.

[Rau 10] Rauber, T. and Rünger, G., 2010, "Parallel Programming for Multicore and Cluster Systems", Springer-Verlag Berlin Heidelberg, 455 P., Text book.

[Saj 13] Sajadieh, M., Dakhilalian, M., Mala, H. and Sepehrdad, P., 2013, "Efficient Recursive Diffusion Layers for Block Ciphers and Hash Functions", Provider: CiteSeer.

[Sal 07] Salomon, D., 2007, "Data Compression", Springer, London, United Kingdom, 4th Edition, Text book.

[Sav 10] Savas, E. and Koc, C. K., 2010, "Finite Field Arithmetic for Cryptography", IEEE Circuits and Systems Magazine, 56 P.

[Sch 96] Schneier, B., 1996, "Applied Cryptography: Protocols, Algorithms, and Source Code in C", 2nd edition, John Wiley and Sons, Inc., New York.

[Sri 12] Srivastavaa, S., Singh, A. K. and Nandi, G. C., 2012, "Inter Cipher Block Diffusion: A Novel Transformation for Proposed Parallel AES", 2nd International Conference on Communication, Computing & Security (ICCCS-2012), Procedia Technology 6, Elsevier, pp. 872-879.

[Sta 11] Stallings, W., 2011, "Network Security Essentials: Applications and Standards", Fourth edition, Pearson Education, Inc., 417 P., Text book.

[Sta 12] Stallings, W., 2012, "Cryptography and Network Security: Principles and Practice", Fifth Edition, Prentice Hall.

[Suo 10] Suod, A. T. and Sagheer, A. A., 2010, "Counter Mode Development for Block Cipher Operations", Journal of University of Anbar for Pure Science, Vol. 4, No. 1.

[Tal 06] Talbot, J. and Welsh, D., 2006, "Complexity and Cryptography: An Introduction", Cambridge University Press, Text book.

[Uus 07] Uusheikh, 2007, "Begin Parallel Programming With OpenMP", Code Project web site, http://www.codeproject.com/Articles/19065/Begin-Parallel- ProgrammingWith-OpenMP.

[Wag 10] Waggoner, B., 2010, "Compression for Great Video and Audio", 2nd Edition, Elsevier, Inc., Text book.

[Wat 01] Watkinson, J., 2001, "The Art of Digital Audio", Third Edition, Plant A Tree, 752 P., Text book.

[Yac 12] Yacob, B. M., 2012, "An Improved Algorithm for Partial Cryptography of Digital Video", Ph.D. Thesis, University of Zakho.

[Zha 07] Zhang, L. and Howard, H. M., 2007, "Hardware Design and Analysis of Statistical Cipher Feedback Mode Using Serial Transfer", IEEE Canadian Conference on Electrical and Computer Engineering, pp. 1133-1136.

[WebSite 1] http://en.wikipedia.org/wiki/Advanced_Encryption_Standard.

[WebSite 2] https://ccrma.stanford.edu/courses/422/projects/WaveFormat.

[WebSite 3] http://en.wikipedia.org/wiki/OpenMP.

[WebSite 4] https://computing.llnl.gov/tutorials/openMP.

[WebSite 5] http://msdn.microsoft.com/en-us/magazine/cc163717.aspx

[WebSite 6] http://bisqwit.iki.fi/story/howto/openmp/

[WebSite 7] http://www.pgroup.com/products/freepgi/freepgi_ref/ch01.htm

Appendix A

Finite Fields

A.1 Introduction

Arithmetic in a finite field is different from standard integer arithmetic. A finite field has a limited number of elements, and every operation performed in the field yields an element of that field. There are infinitely many different finite fields. The number of elements of each (also called its cardinality) is necessarily of the form p^m, where p is a prime number and m is a positive integer, and any two finite fields of the same size are isomorphic. The prime p is called the characteristic of the field, and the positive integer m is called the dimension of the field over its prime field.

Addition, multiplication, division, exponentiation and multiplicative inversion are the most basic arithmetic operations in a finite field. When the elements are represented as polynomials, addition and subtraction are performed by adding or subtracting the two polynomials and reducing each coefficient modulo the characteristic [Mad 12].

A.2 Finite Field Operations of the Form GF(2^n) and Irreducible Polynomials

Let p be a prime number. The integers mod p, {0, 1, 2, ..., p−1}, with addition and multiplication performed mod p, form a finite field of order p. The finite field of order p^n is generally written GF(p^n), where GF stands for Galois Field.

This study is concerned with the case p = 2, where the elements of GF(2^n) are conventionally written as binary numbers. Finite fields of order 2^n are called binary fields or characteristic-two finite fields. One way to construct GF(2^n) is a polynomial basis representation. In it, the elements of GF(2^n) are the binary polynomials of degree at most n−1, i.e. polynomials whose coefficients lie in the field GF(2) = {0, 1}. A polynomial f(x) in GF(2^n) has the form of equation A.1 and is uniquely represented by its n binary coefficients (a_{n-1} a_{n-2} ... a_0) [Sta 12]:

$$f(x) = a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_1 x + a_0 = \sum_{i=0}^{n-1} a_i x^i, \qquad a_i \in \{0, 1\} \qquad \text{(A.1)}$$

Thus, every polynomial in GF(2^n) can be represented by an n-bit number. A polynomial f(x) of degree n is irreducible if it cannot be factored as a product of binary polynomials of lower degree [Tal 06].

A.3 Addition

Addition is the most fundamental arithmetic operation in finite fields, and all other arithmetic operations are based on it. Two finite field elements are added by adding the coefficients of corresponding powers in their polynomial representations. This addition is performed in GF(2), that is mod 2, so that 1 + 1 = 0. Consequently, addition and subtraction are both equivalent to an exclusive-or (XOR) of the n bits that represent the field elements of GF(2^n).

The following example shows the addition of two polynomials in GF(2^3) and its equivalent binary and hexadecimal representations [Tal 06]. Tables A.1 and A.2 give the addition and the additive inverse in GF(2^3), respectively.

Example A.1: The addition of two polynomials in the finite field GF(2^3):

Polynomial: (x^2 + x + 1) + (x^2 + 1) = x

Binary: {111} + {101} = {010}

Hexadecimal: {7} + {5} = {2}

Table A.1: Addition in GF(2^3) [Sta 12].
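Addition in GF(2^n) is a bitwise XOR, so it can be sketched in a couple of lines of C++. The function names are illustrative, and elements are assumed to be stored one per byte (n ≤ 8).

```cpp
#include <cstdint>

// Addition in GF(2^n): coefficients are added mod 2, which is bitwise XOR.
uint8_t gf2n_add(uint8_t a, uint8_t b) { return a ^ b; }

// Subtraction is identical to addition, so every element is its own
// additive inverse (the property tabulated in Table A.2).
uint8_t gf2n_additive_inverse(uint8_t a) { return a; }
```

`gf2n_add(7, 5)` returns 2, which reproduces Example A.1.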


Table A.2: Additive inverse in GF(2^3) [Sta 12].

A.4 Multiplication

Finite field multiplication is more difficult than addition. The polynomials of the two elements are multiplied and like powers of x in the result are collected. If the product has a degree greater than n-1, it is reduced modulo an irreducible polynomial m(x) of degree n; that is, it is divided by m(x) and the remainder is kept [Rah 12]. As mentioned before, a polynomial f(x) over a field F is called irreducible if and only if f(x) cannot be expressed as a product of two polynomials that are both over F and both of degree lower than that of f(x), neither of which is of degree zero [Sta 12].

The polynomial for a 3-bit block has powers of x up to x^2, so the product can have powers of x up to x^4 and no longer fits within 3 bits. Similarly, for 4-, 5- and 6-bit blocks the product can reach x^6, x^8 and x^10, respectively. This situation is handled by replacing the result with its remainder after division by an irreducible polynomial of the corresponding degree [Rah 11]. The irreducible polynomials of some degrees are [Yac 12]:
• Degree 3 (there are only two): x^3 + x + 1, x^3 + x^2 + 1
• Degree 4 (there are three): x^4 + x + 1, x^4 + x^3 + 1, x^4 + x^3 + x^2 + x + 1

• Degree 5 (there are six): x^5 + x^2 + 1, x^5 + x^3 + 1, x^5 + x^3 + x^2 + x + 1, x^5 + x^4 + x^2 + x + 1, x^5 + x^4 + x^3 + x + 1, x^5 + x^4 + x^3 + x^2 + 1

• Degree 6 (there are nine): x^6 + x + 1, x^6 + x^3 + 1, x^6 + x^5 + 1, x^6 + x^4 + x^2 + x + 1, x^6 + x^4 + x^3 + x + 1, x^6 + x^5 + x^2 + x + 1, x^6 + x^5 + x^3 + x^2 + 1, x^6 + x^5 + x^4 + x + 1, x^6 + x^5 + x^4 + x^2 + 1

To construct the multiplication table of GF(2^n), an irreducible polynomial of degree n must first be chosen. The table has one row and one column for each of the 2^n elements of GF(2^n), each element standing for a polynomial. Each entry is obtained by multiplying its row element by its column element. If the product has a degree greater than n-1, it is divided by the chosen irreducible polynomial m(x) and the remainder is kept as the result. Table A.3 gives multiplication in GF(2^3) [Yac 12].

Table A.3: Multiplication in GF(2^3) with the irreducible polynomial m(x) = x^3 + x + 1 [Sta 12].
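The following C++ sketch represents binary polynomials as bit masks, where bit i is the coefficient of x^i. It tests irreducibility by trial division and builds the GF(2^n) multiplication table as described above. The function names are illustrative. Counting the irreducible polynomials of each degree with `is_irreducible` reproduces the numbers listed above: 2, 3, 6 and 9 for degrees 3 to 6.

```cpp
#include <vector>

// Carry-less multiplication of two binary polynomials (bit i = coefficient of x^i).
unsigned poly_mul(unsigned a, unsigned b) {
    unsigned r = 0;
    while (b) {
        if (b & 1u) r ^= a;
        a <<= 1;
        b >>= 1;
    }
    return r;
}

// Degree of p; -1 for the zero polynomial.
int poly_degree(unsigned p) {
    int d = -1;
    while (p) { p >>= 1; ++d; }
    return d;
}

// Remainder of a divided by m over GF(2) (m must be non-zero).
unsigned poly_mod(unsigned a, unsigned m) {
    int dm = poly_degree(m);
    while (poly_degree(a) >= dm) a ^= m << (poly_degree(a) - dm);
    return a;
}

// f (degree n) is irreducible if no polynomial of degree 1..n/2 divides it.
bool is_irreducible(unsigned f) {
    int n = poly_degree(f);
    for (unsigned g = 2; poly_degree(g) <= n / 2; ++g)
        if (poly_mod(f, g) == 0) return false;
    return n >= 1;
}

// Multiplication table of GF(2^n) built with an irreducible m of degree n:
// entry [i][j] = (i * j) mod m.
std::vector<std::vector<unsigned>> mul_table(unsigned m) {
    unsigned size = 1u << poly_degree(m);
    std::vector<std::vector<unsigned>> t(size, std::vector<unsigned>(size));
    for (unsigned i = 0; i < size; ++i)
        for (unsigned j = 0; j < size; ++j)
            t[i][j] = poly_mod(poly_mul(i, j), m);
    return t;
}
```

For example, `mul_table(0b1011)` uses m(x) = x^3 + x + 1 and gives x · (x^2 + 1) = 1, so entry [2][5] is 1.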

A.5 Finding the Multiplicative Inverse

In mathematics, the multiplicative inverse of a number x, denoted by 1/x or x^(-1), is the number which, when multiplied by x, yields the multiplicative identity 1. Every element of a finite field other than 0 has a multiplicative inverse.

The Euclidean algorithm can be adapted to find the greatest common divisor (GCD) of two polynomials, and the extended Euclidean algorithm can be adapted to find the multiplicative inverse of a polynomial. Specifically, the latter finds the multiplicative inverse of b(x) mod m(x) if the degree of b(x) is less than the degree of m(x) and gcd[m(x), b(x)] = 1. If m(x) is irreducible, it has no factor other than itself and 1, so gcd[m(x), b(x)] = 1 for every non-zero b(x) of lower degree. The multiplicative inverse table of GF(2^n) can also be read directly from the multiplication table of GF(2^n) [Sta 12]. Table A.4 gives the multiplicative inverse in GF(2^3) with the irreducible polynomial x^3 + x + 1.

Table A.4: Multiplicative inverse in GF(2^3) with the irreducible polynomial m(x) = x^3 + x + 1 [Sta 12].
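The extended Euclidean algorithm described above can be sketched in C++ with polynomials stored as bit masks. The function names are illustrative, and `gf_inverse` returns 0 when no inverse exists (b = 0).

```cpp
#include <cassert>

// Degree of p; -1 for the zero polynomial.
int poly_degree(unsigned p) { int d = -1; while (p) { p >>= 1; ++d; } return d; }

// Carry-less product of two binary polynomials.
unsigned poly_mul(unsigned a, unsigned b) {
    unsigned r = 0;
    for (; b; b >>= 1, a <<= 1) if (b & 1u) r ^= a;
    return r;
}

// Polynomial long division over GF(2): a = q * b + r with deg r < deg b.
void poly_divmod(unsigned a, unsigned b, unsigned& q, unsigned& r) {
    assert(b != 0);
    q = 0;
    r = a;
    int db = poly_degree(b);
    while (poly_degree(r) >= db) {
        int shift = poly_degree(r) - db;
        q ^= 1u << shift;
        r ^= b << shift;
    }
}

// Multiplicative inverse of b modulo the irreducible polynomial m using the
// extended Euclidean algorithm (deg b < deg m is assumed).
unsigned gf_inverse(unsigned b, unsigned m) {
    unsigned r0 = m, r1 = b;          // remainders
    unsigned t0 = 0, t1 = 1;          // invariant: r_i == t_i * b (mod m)
    while (r1 != 0) {
        unsigned q, r;
        poly_divmod(r0, r1, q, r);
        r0 = r1;
        r1 = r;
        unsigned t = t0 ^ poly_mul(q, t1);   // subtraction is XOR in GF(2)
        t0 = t1;
        t1 = t;
    }
    if (r0 != 1) return 0;            // gcd(m, b) != 1: no inverse (e.g. b == 0)
    unsigned q, r;
    poly_divmod(t0, m, q, r);         // keep the result below deg m
    return r;
}
```

Two examples:

- With m(x) = x^3 + x + 1, `gf_inverse(2, 0b1011)` returns 5, i.e. x^(-1) = x^2 + 1.
- With the AES polynomial 0x11B, the inverse of {53} is {CA}.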


Appendix B

Parallel Computing using OpenMP

B.1 Introduction

The compiler directives, library routines and environment variables described in this appendix together define the OpenMP Application Program Interface (OpenMP API). The API covers shared-memory parallelism in C, C++ and Fortran programs, and is used in this study on Windows systems. It provides a model for parallel programming that is portable across shared-memory architectures from different vendors, and compilers from numerous vendors support it. More information about the OpenMP API can be found at the following web site: http://www.openmp.org.

The directives, library routines and environment variables allow users to create and manage parallel programs while permitting portability. The directives extend the C, C++ and Fortran base languages with several kinds of constructs:

- single program multiple data (SPMD) constructs
- tasking constructs
- device constructs
- work-sharing constructs
- synchronization constructs

They also provide support for sharing and privatizing data [ARB 13].

B.2 OpenMP Setting in Microsoft Visual Studio

OpenMP is an explicit (not automatic) parallel programming model that gives the programmer full control over parallelization. Using OpenMP with Visual Studio requires:
1. Visual Studio Professional or a later version.
2. A multi-processor or multi-core system, to obtain a speed-up.
3. An algorithm to parallelize.

OpenMP is set up in Microsoft Visual Studio in two steps:
1. Include the OpenMP header: #include <omp.h>
2. Enable the OpenMP compiler switch (/openmp) in the Project Properties.

Figure B.1 shows the OpenMP setting in the Project Properties.


Figure B.1: Enabling OpenMP in Visual Studio [Uus 07].

B.3 OpenMP Directives

In C/C++, OpenMP directives are specified with the #pragma mechanism provided by the C and C++ standards. The syntax of an OpenMP directive is formally specified as follows [WebSite 5]:

#pragma omp directive-name [clause[ [,] clause]...]

The OpenMP function prototypes and types are declared in the header file:

#include <omp.h>

• The parallel directive: starts a parallel block. It creates a team of N threads, where N is determined at runtime, usually from the number of CPU cores.
• The parallel for directive: the for directive splits the for-loop so that each thread in the current team handles a different portion of the loop [WebSite 6].
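The following is a minimal C++ sketch of the parallel for pattern. It assumes the data are processed as independent 16-byte blocks. `process_block` is a hypothetical placeholder (XOR with a key), not AES. When OpenMP is not enabled, the pragma is simply ignored, so the same code also builds and runs serially.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::array<uint8_t, 16>;

// Hypothetical placeholder for a block transformation (NOT AES): XOR with a key.
Block process_block(const Block& in, const Block& key) {
    Block out{};
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] ^ key[i];
    return out;
}

// Each iteration reads a different input block and writes a different output
// element, so the iterations are independent and can be split among threads.
std::vector<Block> process_all(const std::vector<Block>& blocks, const Block& key) {
    std::vector<Block> out(blocks.size());
    const long n = static_cast<long>(blocks.size());
#pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        out[i] = process_block(blocks[i], key);
    return out;
}
```

With `schedule(static)`, each thread receives a contiguous share of the blocks. This suits iterations of equal cost such as these.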


• Scheduling: the schedule clause controls how loop iterations are assigned to threads. The two kinds discussed here are static and dynamic; OpenMP also defines guided, auto and runtime. Static is the usual default. With dynamic scheduling, the order in which loop iterations are assigned to threads is not predictable.
• Ordered clause: the order in which loop iterations are executed is unspecified and depends on runtime conditions. However, the ordered clause can force certain events within the loop to happen in sequential order [WebSite 7].
• Critical directive: the enclosed code block is executed by only one thread at a time, never by several threads simultaneously. It is often used to protect shared data from race conditions.
• Atomic directive: the memory update (write, or read-modify-write) in the next statement is performed atomically. This does not make the entire statement atomic; only the memory update is atomic.
• Barrier directive: each thread waits until all the other threads of its team have reached this point. A work-sharing construct has an implicit barrier at its end [WebSite 6].
• Nowait clause: threads that complete their assigned work can proceed without waiting for the other threads in the team to finish. Without this clause, threads meet a barrier synchronization at the end of the work-sharing construct.
• Single and master directives: the single directive specifies that the given statement or block is executed by only one thread, and which thread is unspecified. The other threads skip the statement or block and wait at an implicit barrier at the end of the construct. The master directive restricts the block to the master thread and has no implied barrier.
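The C++ sketch below combines three of these directives on an illustrative task, summarizing audio samples:

- `schedule(dynamic, 64)` hands out chunks of 64 iterations to whichever thread is free.
- `atomic` protects a counter update.
- `critical` protects a compare-and-update of the maximum.

The `Stats` struct and `sample_stats` function are hypothetical. Entering a critical section on every iteration only illustrates the directive and would be slow in practice.

```cpp
#include <vector>

struct Stats { long nonzero = 0; int max_value = 0; };

// Counts non-zero samples with an atomic update and tracks the maximum inside
// a critical section. Both 's' members are shared by all threads.
Stats sample_stats(const std::vector<int>& samples) {
    Stats s;
    const long n = static_cast<long>(samples.size());
#pragma omp parallel for schedule(dynamic, 64)
    for (long i = 0; i < n; ++i) {
        if (samples[i] != 0) {
#pragma omp atomic
            s.nonzero += 1;               // only this update is atomic
        }
#pragma omp critical
        {
            // compare-and-update must not be interleaved between threads
            if (samples[i] > s.max_value) s.max_value = samples[i];
        }
    }
    return s;
}
```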


• The private and shared clauses: In a parallel region, it is possible to specify which variables are shared among the threads and which are private to each thread. By default, all variables are shared except those declared within the parallel block and the loop variable of a work-shared for loop, which are private [WebSite 6].
• The firstprivate clause: Firstprivate is a special case of private that initializes each private copy with the value of the corresponding original variable in the master thread.
• The lastprivate clause: Lastprivate copies the value of a private variable from the sequentially last iteration back to the original (shared) variable after the construct.
• The default clause: The most useful purpose of the default clause is to check that every variable has been considered for the private/shared question, by using the default(none) setting, which makes the compiler reject any variable whose data-sharing attribute is not stated explicitly. The default clause can also be used to state that all variables are shared by default (default(shared)) [WebSite 6].
• The threadprivate directive: Makes global data private to each thread. Threadprivate variables can be initialized from the master thread's values by using the copyin clause.
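The data-sharing clauses and the threadprivate directive can be sketched as follows. This is an illustration under the same assumptions; data_sharing_demo, copyin_demo and the file-scope variable seed are hypothetical names.

```c
/* default(none) makes the compiler reject any variable whose sharing is not listed. */
long data_sharing_demo(const int *v, int n, int offset, int *last_index)
{
    long total = 0;   /* shared: every thread adds to the same variable */
    int tmp;          /* private: each thread gets its own, uninitialized copy */
    int last = -1;    /* lastprivate: receives the value from iteration n-1 */

    #pragma omp parallel for default(none) shared(v, n, total) private(tmp) \
            firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++) {
        tmp = v[i] + offset;   /* each thread's copy of offset starts with the caller's value */
        #pragma omp atomic
        total += tmp;
        last = i;
    }
    *last_index = last;
    return total;
}

/* threadprivate: every thread has its own copy of this file-scope variable. */
static int seed = 0;
#pragma omp threadprivate(seed)

/* copyin: at the start of the region each thread's seed is set from the
   master thread's copy. Returns how many threads observed that value. */
int copyin_demo(int value)
{
    int matches = 0;
    seed = value;                      /* set the master thread's copy */
    #pragma omp parallel copyin(seed)
    {
        if (seed == value) {
            #pragma omp atomic
            matches++;
        }
    }
    return matches;
}
```

With OpenMP enabled, copyin_demo returns the team size because every thread sees the copied value; in a serial build it returns 1.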

B.4 Library Routines

OpenMP also has a set of runtime routines that are useful for writing OpenMP applications. There are three broad classes of routines available: execution environment routines, lock/synchronization routines, and timing routines. The following list summarizes the runtime library calls [WebSite 7].
• omp_get_num_threads: Returns the number of threads in the team executing the parallel region from which it is called. When called from a serial region, this function returns 1. When called from a nested parallel region, it returns the size of the innermost team, which is 1 if nested parallelism is disabled.


• omp_set_num_threads: Sets the number of threads to use for subsequent parallel regions. Early OpenMP versions allowed it to be called only from a serial region of code; since OpenMP 3.0 it may also be called inside a parallel region, where it affects the nested regions encountered by the calling thread.
• omp_get_thread_num: Returns the thread number of the calling thread within its team. The thread number lies between 0 and omp_get_num_threads()-1.
• omp_get_max_threads: Returns an upper bound on the number of threads that could be used to form a new team [ARB 13].
• omp_get_num_procs: Returns the number of processors available to the device [ARB 13].
• Lock routines: The OpenMP runtime library includes a set of general-purpose lock routines that can be used for synchronization. These routines operate on OpenMP locks, which are represented by OpenMP lock variables. The simple lock routines are as follows:
  - omp_init_lock initializes a simple lock.
  - omp_destroy_lock uninitializes a simple lock.
  - omp_set_lock waits until a simple lock is available, and then sets it.
  - omp_unset_lock unsets a simple lock.
  - omp_test_lock tests a simple lock, and sets it if it is available.
• Timing routines: These routines provide a portable wall-clock timer.
  - omp_get_wtime returns the elapsed wall-clock time in seconds, measured from an arbitrary point in the past, so it is used by taking the difference of two calls.
  - omp_get_wtick returns the precision of the timer used by omp_get_wtime.
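The runtime routines above can be combined in a short sketch: a shared total protected by a simple lock, timed with omp_get_wtime, and a check that thread numbers stay below the team size. The #else branch supplies serial stand-ins, an assumption of this sketch modelled on the idea of the stub routines in the OpenMP specification, so that it also compiles without OpenMP; locked_sum and max_thread_id_seen are hypothetical names.

```c
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
/* Serial stand-ins so the sketch also builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static int  omp_get_thread_num(void)        { return 0; }
static int  omp_get_max_threads(void)       { return 1; }
/* approximation: processor time rather than true wall-clock time */
static double omp_get_wtime(void) { return (double)clock() / CLOCKS_PER_SEC; }
#endif

/* Sums 1..n into a shared total guarded by a simple lock and stores the
   elapsed time, measured with omp_get_wtime(), in *seconds. */
long locked_sum(int n, double *seconds)
{
    omp_lock_t lock;
    long total = 0;
    omp_init_lock(&lock);

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (int i = 1; i <= n; i++) {
        omp_set_lock(&lock);     /* wait until the lock is free, then take it */
        total += i;
        omp_unset_lock(&lock);   /* release it for the other threads */
    }
    *seconds = omp_get_wtime() - t0;

    omp_destroy_lock(&lock);
    return total;
}

/* Returns the largest thread number that executed an iteration; it always
   lies in 0 .. omp_get_num_threads()-1 of the team running the loop. */
int max_thread_id_seen(int n)
{
    int max_id = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int id = omp_get_thread_num();
        #pragma omp critical
        {
            if (id > max_id)
                max_id = id;
        }
    }
    return max_id;
}
```

Guarding a single addition with a lock is only for illustration; the atomic directive would normally be cheaper for such an update.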


B.5 Environment Variables

The names of the environment variables must be upper case. The values assigned to them are case insensitive and may have leading and trailing white space. Modifications to the environment variables after the program has started, even if made by the program itself, are ignored by the OpenMP implementation. However, the settings of some of the ICVs (Internal Control Variables) can be modified during the execution of the OpenMP program by using the appropriate directive clauses or OpenMP API routines. The environment variables are as follows [ARB 13]:
• OMP_SCHEDULE sets the run-sched-var ICV that specifies the runtime schedule type and chunk size. It can be set to any of the valid OpenMP schedule types.
• OMP_NUM_THREADS sets the nthreads-var ICV that specifies the number of threads to use in parallel regions.
• OMP_DYNAMIC sets the dyn-var ICV that specifies the dynamic adjustment of threads to use in parallel regions.
• OMP_PROC_BIND sets the bind-var ICV that controls the OpenMP thread affinity policy.
• OMP_PLACES sets the place-partition-var ICV that defines the OpenMP places that are available to the execution environment.
• OMP_NESTED sets the nest-var ICV that enables or disables nested parallelism.
• OMP_STACKSIZE sets the stacksize-var ICV that specifies the size of the stack for threads created by the OpenMP implementation.
• OMP_WAIT_POLICY sets the wait-policy-var ICV that controls the desired behavior of waiting threads.
• OMP_MAX_ACTIVE_LEVELS sets the max-active-levels-var ICV that controls the maximum number of nested active parallel regions.
• OMP_THREAD_LIMIT sets the thread-limit-var ICV that controls the maximum number of threads participating in the OpenMP program.
• OMP_CANCELLATION sets the cancel-var ICV that enables or disables cancellation.
• OMP_DISPLAY_ENV instructs the runtime to display the OpenMP version number and the initial values of the ICVs, once, during initialization of the runtime.
• OMP_DEFAULT_DEVICE sets the default-device-var ICV that controls the default device number.
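Two of these variables can be observed from C code, as in the sketch below, which assumes an OpenMP-enabled build (otherwise the pragma is ignored and the fallback returns 1). The schedule(runtime) clause defers the loop schedule to the run-sched-var ICV set by OMP_SCHEDULE; scale and default_team_size are hypothetical names.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* schedule(runtime) takes the schedule kind and chunk size from the
   run-sched-var ICV, which can be set with e.g. OMP_SCHEDULE="dynamic,4". */
void scale(double *a, int n, double k)
{
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < n; i++)
        a[i] *= k;
}

/* Number of threads a new parallel region would use by default; with
   OpenMP this reflects nthreads-var (OMP_NUM_THREADS, if set), else 1. */
int default_team_size(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

For example, running the compiled program as OMP_NUM_THREADS=4 OMP_SCHEDULE="dynamic,4" ./program (./program is a placeholder) would typically make default_team_size() return 4 and make scale() hand out iterations dynamically in chunks of four, without recompiling.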

B.6 If Clause

The if clause is used on the parallel construct to specify conditional execution (other constructs, such as task, also accept it): when its scalar expression evaluates to false, the region is executed by a team of only one thread. Since some overhead is inevitably incurred with the creation and termination of a parallel region, it is sometimes necessary to test whether there is enough work in the region to warrant its parallelization.
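A minimal sketch of the if clause: the region is executed by a full team only when the problem size exceeds a threshold. PAR_THRESHOLD and its value are hypothetical and would need to be measured on the target machine; team_size_for is also a hypothetical name.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical cutoff: below it, the region's creation/termination overhead
   is assumed to outweigh the benefit. A real value would come from measurement. */
#define PAR_THRESHOLD 10000

/* Returns the number of threads that executed the region for a problem of size n. */
int team_size_for(int n)
{
    int size = 1;
    (void)n;   /* n is otherwise referenced only by the pragma */
    #pragma omp parallel if(n > PAR_THRESHOLD)
    {
        #pragma omp single
        {
#ifdef _OPENMP
            size = omp_get_num_threads();   /* 1 whenever the if expression is false */
#endif
        }
    }
    return size;
}
```

With OpenMP enabled, team_size_for(10) returns 1 because the if expression is false, while a large n returns the normal team size.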

B.7 Sections Construct

The sections construct is a non-iterative worksharing construct that contains a set of structured blocks that are to be distributed among and executed by the threads in a team. Each structured block is executed once by one of the threads in the team in the context of its implicit task [ARB 13].
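A minimal sketch of the sections construct, here in its combined parallel sections form, in which two independent scans of the same array may be executed by different threads. min_and_max is a hypothetical name, and the sketch assumes n >= 1.

```c
/* Two independent scans: each section is executed once, by one thread of the team. */
void min_and_max(const int *v, int n, int *mn, int *mx)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            int m = v[0];
            for (int i = 1; i < n; i++)
                if (v[i] < m) m = v[i];
            *mn = m;   /* only this section writes *mn */
        }
        #pragma omp section
        {
            int m = v[0];
            for (int i = 1; i < n; i++)
                if (v[i] > m) m = v[i];
            *mx = m;   /* only this section writes *mx */
        }
    }
}
```

Because each section writes a different output, no synchronization is needed beyond the implicit barrier at the end of the construct.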


Abstract

The main aim of this study is to provide secure transmission in real-time applications for audio files [specifically WAVE files], and to increase the degree of complexity of the mathematical operations used for encryption and decryption in finite fields (the field GF(2^8)) on these files. This is achieved by implementing different structures and methods, as well as the proposed algorithms, in both the block and the stream encryption structures, to increase the degree of complexity. For fast execution of the encryption operations, and for optimal use of resources in the proposed encryption algorithms, parallel computing was implemented using OpenMP directives to reduce the execution time of these algorithms.

The AES Rijndael algorithm was taken as the basis for the algorithms developed in the block and stream systems. To increase the degree of complexity of the encryption operations, the Rijndael AES transformation functions (SubByte, MixColumn and ShiftRows) were developed into new functions (DK-SBOX-AES, 4K-MIX-AES and 5K-SHIFT-AES), respectively. Using the AddRoundKey function together with the functions mentioned above, an Improved AES algorithm was built. A high degree of complexity was obtained in the proposed improved AES algorithm (256! * (2-16)! * 4! * 5!), several times that of AES Rijndael (256!), with only a modest increase in the execution time of the encryption and decryption operations.

This study concluded that OpenMP directives are able to increase program speed and to reduce the design/development costs and the time needed to bring such systems to market. Parallel acceleration of the programs reduces the time to half or less, depending on the number of processors used, for both the encryption and the decryption operations in the proposed algorithms.

Republic of Iraq
Ministry of Higher Education and Scientific Research
University of Technology
Department of Computer Science

Encryption of WAVE Audio Files Using an Improved AES Algorithm
A thesis submitted to the Department of Computer Science, University of Technology, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Sciences

By
Nada Hussein Mohammad Ali

Supervised by
Prof. Dr. Abdul Monem S. Rahma
Assist. Prof. Dr. Abdul Mohssen Jaber Abdul Hossen

Jumada al-Akhirah 1436
April 2015
