IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 3, NO. 4, OCTOBER-DECEMBER 2006

Neural Network Techniques for Proactive Password Checking

Angelo Ciaramella, Paolo D’Arco, Alfredo De Santis, Clemente Galdi, and Roberto Tagliaferri, Senior Member, IEEE

Abstract—This paper deals with the access control problem. We assume that valuable resources need to be protected against unauthorized users and that, to this aim, a password-based access control scheme is employed. Such an abstract scenario captures many applicative settings. The issue we focus our attention on is the following: password-based schemes provide a certain level of security as long as users choose good passwords, i.e., passwords that are hard to guess in a reasonable amount of time. In order to force the users to make good choices, a proactive password checker can be implemented as a submodule of the access control scheme. Such a checker, any time the user chooses/changes his own password, decides on the fly whether to accept or refuse the new password, depending on its guessability. Hence, the question is: how can we get an effective and efficient proactive password checker? By means of neural networks and statistical techniques, we answer the above question, developing suitable proactive password checkers. Through a series of experiments, we show that these checkers have very good performance: error rates are comparable to those of the best existing checkers, implemented on different principles and by using other methodologies, and the memory requirements are better in several cases. It is the first time that neural network technology has been fully and successfully applied to designing proactive password checkers.

Index Terms—System security, access control, passwords, machine learning, neural networks.

A. Ciaramella and R. Tagliaferri are with the Dipartimento di Matematica ed Informatica, Università di Salerno, Via Ponte Don Melillo, I-84084, Fisciano (SA), Italy. E-mail: {ciaram, robtag}@dmi.unisa.it.
P. D’Arco and A. De Santis are with the Dipartimento di Informatica ed Applicazioni, Università di Salerno, Via Ponte Don Melillo, I-84084, Fisciano (SA), Italy. E-mail: {paodar, ads}@dia.unisa.it.
C. Galdi is with the Dipartimento di Scienze Fisiche, Università di Napoli “Federico II”, Via Cinthia, Complesso Monte S. Angelo, I-80126, Napoli, Italy. E-mail: [email protected].

Manuscript received 6 Dec. 2005; accepted 17 May 2006; published online 2 Nov. 2006.

1 INTRODUCTION

The Access Control Problem: A Challenge. The design of efficient and secure protocols for protecting partially shared or private resources from unauthorized users’ accesses is a big challenge for computer scientists. Even though several suitable techniques have been proposed in the literature over the years, e.g., biometric identification schemes [20], [13] and challenge-response protocols based on smart cards [22], password-based schemes are still frequently used due to their simplicity.

Password-Based Schemes and Dictionary Attacks. In a password-based access control scheme, a user who wishes to gain access to a resource or a system executes a (possibly interactive) protocol whose goal is to “prove knowledge” of some secret information, i.e., the password. As an example, the familiar login-password scheme to get access to a computer constitutes the basic authentication scheme implemented by every operating system. These schemes seem to be secure if the user keeps his password secret. Unfortunately, this is not true. Indeed, a password can be retrieved not only when the user accidentally discloses it, but also when the password is


easy to guess, i.e., when it belongs to a small dictionary of words [19], [15], [14]. In this case, all the words in the dictionary can be checked, in a reasonable amount of time, until a match is found. Such attacks are referred to as dictionary attacks. A solution developed in order to strengthen password-based schemes is the one-time password approach [11], in which the user, by means of a passphrase, generates a list of passwords that are each used to log in to a remote host just once. However, such a technique is still not secure against dictionary attacks. Indeed, Request for Comments 2289 [12] requires that, in order to reduce the risks related to dictionary attacks, the length of the secret information used to generate the one-time password sequence be at least 10 characters. Notice that, starting from [2], a lot of research has been done in order to design password-based authentication schemes that are secure against dictionary attacks [25], [8], [18], [7].

We stress that the problem of choosing good passwords is not restricted to access control of network system hosts. Indeed, passwords are also often used, for example, to protect private information such as cryptographic keys or data files. In general, to increase the security level of password-based systems, we need a method to reduce the efficacy of dictionary attacks. This goal can be achieved if users are not allowed to choose easy-to-guess passwords.

Weak and Strong Passwords. To simplify our discussion, we will informally use the terms weak or bad for easy-to-guess passwords and strong or good for hard-to-guess ones. Notice that, according to the definition of dictionary attack, weak basically means membership in some dictionary of words that can be exhaustively checked in a reasonable amount of time, while strong refers to the opposite condition. These two notions are computational in nature. Hence, a password is weak if it can be found in a reasonable amount of time, while it is strong if the search requires unavailable resources of time or space, i.e., it can be any element of a big dictionary constructed over a given alphabet. It follows that a strong password looks like a random string.

Previous Work on the Subject. Several papers have addressed the issue of choosing good passwords. In the literature, different techniques have been proposed in order to discourage/remove the choice of easy-to-guess passwords (see [26] for a recent overview). Proactive password checking is a promising technique. A proactive password checker is a program that interacts with the user when he changes his password. It checks the proposed new password, and the change is allowed only if the password is hard to guess. If the password is easy to guess, the system asks the user to type in another password instead. The philosophy on which these programs are based is that the user has the ability to select a password, but the system enables the selection of nontrivial ones only.

Proactive Password Checkers. Conceptually, a proactive password checker is a simple program. It holds a list of weak passwords that must be rejected. When the user chooses or wishes to change his password, it checks for membership in the list. If the password is found in the list, the substitution is not enabled and a short justification is given; otherwise, the substitution is allowed. However, a straightforward implementation of such a program is not suitable for two reasons: the list of weak passwords can be very long and cannot be kept in the first levels (i.e., cache and main memory) of the memory hierarchy, and the time for checking membership can be high, which implies an unacceptably long wait for the user. Therefore, several proactive password checkers that aim at reducing the time and space complexities of the trivial approach have been proposed (see [17], [10], [23], [16]). All these models are an improvement over the straightforward scheme. However, both the straightforward scheme and these checkers have low predictive power when tested on new dictionaries of words, i.e., they do not perform well if passwords are chosen from dictionaries which have not been considered during the setup phase of the checker. Indeed, a desirable feature of a proactive password checker is the ability to correctly classify passwords which do not appear in the initial set.

To this aim, an interesting approach for designing a proactive password checker is the one applied in [1]. The problem of password classification is therein viewed as a machine learning problem. The system, in a training phase, using dictionaries of examples of weak and strong passwords, acquires the knowledge for distinguishing weak passwords from strong ones. This knowledge is represented by means of a decision tree. Later on, the decision tree is used for classification. The experimental results reported in [1] showed a meaningful improvement over the error rates of previous solutions. The same technique was subsequently applied in [5], where the power of the checker was increased by exploiting another key idea of machine learning in the construction of the decision tree: the Minimum Description Length Principle (MDLP). Finally, [6] put forward the possibility of using neural networks for

proactive password checking. Instead of using standard computing techniques, the classifier was implemented by means of a perceptron, the simplest example of a neural network. The extended abstract pointed out the efficiency and efficacy of the approach compared to previous proposals.

Our Contribution. In this paper, we fully develop the approach of [6]. We discuss and analyze proactive password checkers based on multilayer neural networks. We evaluate the performance of several network topologies and of a combined approach comprising standard preprocessing techniques of the inputs and neural networks. We compare the performance of our system with the results obtained in [5], [16]. The results show that proactive password checkers based on this technology are a suitable alternative to currently available solutions and, for resource-constrained devices (e.g., smart cards), they might represent the best choice.

2 A MATHEMATICAL FRAMEWORK

A Model [4]. Let P be the set of all acceptable passwords, let p be an element chosen from P, and let s be a function used to select the password p from P. Then, denote by p' a guess for the password p and assume that it takes a constant amount of time T = t(p') to determine whether this guess is correct, i.e., whether p' = p. We can model the choice of p in P with a random variable G, taking values in P. These values are assumed according to a probability distribution P_G over the elements of P that is induced by the selection function s. Moreover, assuming that s is known, the time to guess p can be represented with a random variable F_{P_G}, which assumes real values according to P_G.

If G is uniformly distributed on P, i.e., P_G = U, and no prior knowledge of the authentication function (the function used by the operating system to check the equality of a guess with the true password) is available, then, as pointed out in [4], to guess the selected password p, we have to try, on average, |P|/2 passwords from P, and the expected running time is E(F_U) = T |P|/2. Notice that, in this model, there is a correspondence between the set S of selection functions and the set D_P of all probability distributions on the set P. Therefore, we can characterize the bad selection functions s to choose p in P by those probability distributions P_G such that

    E(F_{P_G}) \le k E(F_U).    (1)

The parameter k \in [0, 1] defines a lower bound on the suitability of a given selection function, represented by the distribution P_G. If p is chosen according to a probability distribution P_G that satisfies (1), we say that p is easy-to-guess. A family of bad selection functions is represented by language dictionaries, where the dictionary can be seen as the image set of a selection function s. The words in the dictionary are a small subset of all the strings that can be constructed with the symbols of a given alphabet. According to our model, the distribution induced by languages is skewed on P since they assign nonzero values only to a small subset of elements; therefore, E(F_{P_G}) is much smaller than E(F_U). Hence, it is sufficient to try a number of passwords smaller than |P|/2 to guess the chosen p. To guarantee the security of the system against illegal accesses, we have to require that the selection function does not localize a small subset of P. This means that we have to find a method to discard those probability distributions P_G on P such that E(F_{P_G}) is too small. If E(F_U) is very large and we can force P_G to look like U, then the goal is obtained.

Our Point of View. In the above abstract model, a proactive password checker can be viewed as a tool for checking whether a password p is chosen from P according to a suitable selection function, i.e., a function which induces a probability distribution that looks like the uniform one. Such a viewpoint is useful in order to understand why, in implementing our proactive password checker, we use statistical techniques. Indeed, the password selection problem can be cast as a particular instance of a pattern recognition problem, and the most general and natural framework in which to formulate solutions to pattern recognition problems is a statistical framework.

A Concrete Setting. The above is a general analysis of the password selection problem. In order to derive practical results, we need to carefully specify the password space P. We consider the set of all strings composed of “printable” ASCII characters with length less than or equal to 8 (i.e., the length allowed for passwords by Unix-like operating systems). This set is reported in Table 1.

TABLE 1 The Charset

The charset is divided into “weak” characters (namely, all the letters), the digits, and “strong” characters. This partition is motivated by the empirical evidence that strong characters do not usually appear in passwords. On the other hand, digits are considered “weaker” than strong characters because of the users’ habit of using numbers in their passwords, e.g., birth dates. We stress that this partition should be considered as an experimental setting and that it can be modified in order to fit per-site password policies.
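As a worked illustration of the model, the following minimal Python sketch (ours, not the paper's) estimates the expected number of guesses E(F) when an attacker tries candidates in decreasing order of probability; the password space size and the skewed distribution are assumptions chosen only for illustration. With T = 1, the uniform case reproduces the E(F_U) = |P|/2 behavior discussed above, while a distribution concentrated on a small dictionary collapses the expected effort.

```python
# Minimal sketch (not from the paper): the expected number of guesses
# when an attacker tries candidates in decreasing order of probability.
# E(F) = sum_i i * Pr[the password is the i-th candidate tried].

def expected_guesses(probs):
    ranked = sorted(probs, reverse=True)  # attacker tries likely words first
    return sum(i * p for i, p in enumerate(ranked, start=1))

N = 100_000                                  # |P|, a toy password space
uniform = [1.0 / N] * N                      # P_G = U
skewed = ([0.9 / 100] * 100                  # 90% of the mass on a
          + [0.1 / (N - 100)] * (N - 100))   # 100-word "dictionary"

print(expected_guesses(uniform))  # ~ |P|/2 = 50000.5 trials
print(expected_guesses(skewed))   # ~ 5050 trials: a dictionary attack pays off
```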

3 PATTERN RECOGNITION AND NEURAL NETWORKS

Pattern Recognition. Pattern recognition is a solid area of study. It encompasses a wide range of information processing problems of great practical significance, from speech recognition and classification to fault detection in machinery and medical diagnosis. From an abstract point of view, it concerns function approximation and object classification. In this paper, we restrict our attention to object classification. Loosely speaking, we assume that a given universe of objects can be partitioned into different classes according to some characteristics. The recognition problem consists of associating each object with a class.

Generally, a pattern recognition system is a two-part device: a feature extractor and a classifier. It takes an object as input and outputs the classification. A feature is a measurement taken over the input object that has to be classified. The values of the measurements are usually real numbers and are arranged in a vector, called the feature vector. The set of possible feature vectors is called the feature space. The feature extractor of a pattern recognition system simply takes measurements over the object and passes the feature vector to the classifier. The classifier applies a given criterion, implemented by means of a discriminant function, to establish to which class the object belongs. A discriminant function is a function that maps the feature vector into the classification space and, usually, defines a boundary among the classes. If the discriminant function is a linear function, i.e., it defines a boundary in the classification space that looks like a hyperplane, the classifier is said to be linear. Of course, a linear classifier can be used if the classes themselves can be separated by means of a straight line. When this happens, we say that the problem is linearly separable.

As we will see later, the system we are looking for is a pattern recognition system which takes a password as input, extracts some features from it, and passes them to a classifier which outputs a decision. In our abstract view, if the classifier says that the password is weak, then it means that the selection function the user is applying is not suitable.

Neural Networks. Neural networks (NNs, for short) can be considered a statistical technique in pattern recognition. They implement nonlinear mappings from several input variables to several output variables, where the form of the mapping is governed by a number of adjustable parameters. An NN learns how to compute a mapping by trial and error, through a certain parameter optimization algorithm. Such an algorithm, due to the biological premises of the theory of NNs, is called a learning algorithm. During the learning process (also called training), the network receives a sequence of examples and adapts its internal parameters to match the desired input-output functionality. The knowledge to compute the mapping is therefore acquired during this learning process, and it is stored in the modified values of the internal parameters. Several learning algorithms have been developed in order to teach an NN to compute a certain mapping. We will use an NN to implement the classifier in our pattern recognition system, i.e., the proactive password checker.

Learning Typologies. Let us suppose that a sufficiently large set of examples is available. Two main learning strategies can be adopted in general:

- Supervised learning: This learning typology requires that target output values of the network be known for all the input patterns of the training set. Examples of algorithms implementing such a strategy are the Multilayer Perceptron and Radial Basis Functions.


- Unsupervised learning: This learning typology can be applied when target answers of the network for the input patterns of the training set are unknown. Unsupervised learning teaches the network to discover by itself correlations and similarities among the input patterns of the training set. Examples of algorithms implementing such a strategy are Self-Organizing Maps, Hopfield Nets, and PCA NNs.

In the following, we consider NNs which apply the first learning strategy. In this case, an NN can be regarded simply as a particular choice of a function of the form

    y(x; w),    (2)

where x denotes the input vector and w denotes the vector of adjustable parameters, called weights. Learning in NNs, i.e., adaptation of the value of w during the training process for learning a certain mapping, is usually formulated in terms of minimizing an error function with regard to w.

Generalization and Early Stopping: Training, Validation, and Test Sets. The goal of the training procedure does not consist of exactly modeling a given set of objects, but rather of learning the mapping underlying such a data set. Indeed, the network should exhibit good performance over new, unseen inputs. A method to control the complexity of the learning process is called early stopping. More precisely, a training procedure can be seen as an iterative procedure which aims at reducing the error function over the training set. This error decreases as a function of the number of iterations of the procedure. However, the error measured on an independent data set, referred to as the validation set, often shows a decrement followed by an increment when the network starts to overfit. Therefore, it is convenient to stop the process at the point of smallest error on the validation set. Indeed, it is known that the network we get at this point has the highest generalization power. The performance of the selected network should be confirmed by using another independent data set, referred to as the test set.

Single Layer Networks. Single Layer Networks, or Single Layer Perceptrons (SLP), implement the well-known statistical techniques of linear regression and function approximation [3]. Such NNs have a single layer of adaptive weights between the inputs and the outputs (see Fig. 1).

Fig. 1. Single layer perceptron.

More precisely, the input values to the network are denoted by x_i, for i = 1, ..., d. The network, for k = 1, ..., c, forms c linear combinations of these inputs, producing a set of intermediate variables a_k defined by

    a_k(x) = w_k^T x + w_{0k} = \sum_{i=1}^{d} w_{ik} x_i + w_{0k},    (3)

with one variable a_k associated with each output unit. The values w_{ik} represent the weights and the values w_{0k} represent the bias parameters. Moreover, to each output unit is associated an activation function of its own inputs. The choice of the activation functions for the output units depends on the application. The simplest choice is the linear function of the form

    y_k(x; w) = a_k(x).    (4)

Another choice, when multiple independent attributes are involved, is given by using independent logistic sigmoidal activation functions, applied to each of the outputs independently, defined by

    y_k(x; w) = 1 / (1 + e^{-a_k(x)}).    (5)

One of the most used methods for training an SLP is the least-squares learning algorithm [3], [21]. Nevertheless, it is also possible to take advantage of the linear (or near-linear) structure of the network and use a particularly efficient special-purpose learning algorithm known as Iterated Reweighted Least Squares (IRLS) [21]. Notice that, in a classification problem, an SLP is used as follows: once the network has been trained applying the aforementioned early stopping method, a new vector is classified by giving it as input to the network, computing the output unit activations, and assigning the vector to the class whose output unit has the largest activation value.

It is possible to show that an SLP, by appropriately instantiating the output units with specific activation functions, implements various forms of linear discriminant functions [3]. This implies that an SLP always defines a linear decision boundary among the classes. Unfortunately, there are some problems that cannot be solved using a linear decision boundary or, in other words, that are not linearly separable. Hence, SLPs correspond to a narrow class of possible discriminant functions and, in many situations, may not represent the optimal choice.

Multilayer Perceptron. The Multilayer Perceptron (MLP) is probably the most widely used architecture for practical applications of NNs. Usually, the network consists of two layers of adaptive weights with full connectivity between inputs and hidden units and between hidden units and outputs (see Fig. 2).

Fig. 2. Multilayer perceptron.

From the theory of NNs [3], it is well known that this architecture is capable of universal approximation, in the sense that it can approximate to arbitrary accuracy any continuous function from a compact region of the input space, provided the number of hidden units is sufficiently large and provided the weights and biases are chosen appropriately. In practice, this means that, if there is enough data to estimate the network parameters, an MLP can model any smooth function.

The input values to the network are denoted by x_i, for i = 1, ..., d. The first layer of the network forms M linear combinations of these inputs, producing a set of intermediate activation variables a_j^{(1)}, for j = 1, ..., M, defined by

    a_j^{(1)} = \sum_{i=1}^{d} w_{ij}^{(1)} x_i + w_{0j}^{(1)},    (6)

with one variable a_j^{(1)} associated with each hidden unit. The values w_{ij}^{(1)} represent the weights of the first layer of wires, while the values w_{0j}^{(1)} represent the bias parameters associated with the hidden units. The variables a_j^{(1)} are then transformed by the nonlinear functions of the hidden layer. In our application, we restrict our attention to the hyperbolic tangent (tanh) activation functions since this is the most appropriate choice for classification problems [3]. The outputs of the hidden units are therefore given by

    z_j = tanh(a_j^{(1)}).    (7)

The z_j are then transformed by the second layer of weights and biases, yielding the second-layer activation values a_k^{(2)}, for k = 1, ..., c:

    a_k^{(2)} = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}.    (8)

Finally, these values are passed through the output-unit activation functions (e.g., linear, sigmoidal, softmax [3]), producing the output values y_k(x; w). For networks having differentiable activation functions, such as tanh, there exists a powerful and computationally efficient learning algorithm called error back-propagation [3], [21]. However, to estimate the weights of a two-layer MLP, we can also adopt different optimization strategies (e.g., conjugate gradients, scaled conjugate gradients, quasi-Newton methods [3]). In our implementation, we use the quasi-Newton optimization algorithm, which is more stable than the back-propagation algorithm [3], [21].

Preprocessing of the Input. An NN can implement any arbitrary functional mapping between multidimensional spaces. However, in real applications, a straightforward use of a network to map the raw input data directly into the required output variables is often not a suitable choice. In practice, it is nearly always advantageous to apply preprocessing transformations to the input data before it is presented to a network. Data preprocessing is one of the most important stages in the development of a solution, and the choice of preprocessing steps can have a significant effect on the performance of the network. One of the most important forms of preprocessing is the reduction of the dimensionality of the input data. To this aim, several approaches require forming linear or nonlinear combinations of the original measurements on the object in order to generate inputs for the network (i.e., the feature vector). The principal motivation for dimensionality reduction is that it can help to alleviate the worst effects of the so-called curse of dimensionality (see [3] for details).

Principal Component Analysis. A widely used preprocessing technique is Principal Component Analysis (or PCA, for short), also known as the Karhunen-Loève transformation [3]. Let us briefly describe classical PCA. Let x_1, ..., x_N be a set of d-dimensional vectors. Our goal is to map the d-dimensional column vectors x_i to m-dimensional vectors z_i, where m < d. To this aim, notice that a generic vector x_i can be represented, without loss of generality, as a linear combination of a set of d orthonormal vectors u_j:

    x_i = \sum_{j=1}^{m} x_{ij} u_j + \sum_{j=m+1}^{d} x_{ij} u_j.    (9)

Since our goal is to reduce the space dimensionality while keeping as much information as possible, we would like to retain only a subset of m < d of the basis vectors u_j, so that we use only m coefficients x_{ij} for representing x_i; i.e., the m-dimensional vector z_i corresponding to the d-dimensional vector x_i is given by z_i = (x_{ij_1}, ..., x_{ij_m}) with regard to a suitably chosen set of d orthonormal vectors u_1, ..., u_d. Hence, we need to find a set of d orthonormal vectors such that, by retaining only m of them, we maximize the information held by the vectors in the m-dimensional space. It is possible to show that the directions of maximum variance (i.e., the directions that carry most of the information in the original vectors) are parallel to the eigenvectors corresponding to the largest eigenvalues of the covariance matrix defined by

    \Sigma = \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T,    (10)

where

    \bar{x} = (1/N) \sum_{i=1}^{N} x_i    (11)

and the x_i are the original feature vectors. For this reason, the set of vectors u_j we need is just the set of the eigenvectors^1 u_j of \Sigma with the highest eigenvalues \lambda_j. Such eigenvectors of \Sigma represent a new system of coordinates. Each eigenvector is called a principal component. In this way, each vector x_i in the original d-dimensional space is represented by z_i in this new system of coordinates for the m-dimensional space.
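The following is a minimal sketch of the forward pass (6)-(8) for a two-layer MLP with tanh hidden units and a single linear output unit, the configuration used later for the checker. The parameters are random placeholders; the quasi-Newton training that the paper uses is omitted.

```python
import numpy as np

# A sketch of the forward pass (6)-(8): d inputs, M tanh hidden units,
# one linear output unit. Parameters are hypothetical placeholders.

rng = np.random.default_rng(0)
d, M = 4, 8                             # four input features, eight hidden nodes
W1 = 0.5 * rng.standard_normal((M, d))  # first-layer weights w_ij^(1)
b1 = np.zeros(M)                        # first-layer biases w_0j^(1)
W2 = 0.5 * rng.standard_normal((1, M))  # second-layer weights w_kj^(2)
b2 = np.zeros(1)                        # output bias w_k0^(2)

def mlp_forward(x):
    a1 = W1 @ x + b1     # eq. (6): first-layer activations
    z = np.tanh(a1)      # eq. (7): hidden-unit outputs
    return W2 @ z + b2   # eq. (8) followed by a linear output unit

print(mlp_forward(np.array([1.8, 2.0, 0.5, 4.0])))
```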

4 EXPERIMENTAL EVALUATION AND COMPARISON

In order to verify the applicability of NNs to proactive password checking systems, we have run several experiments with both single-layer and multilayer perceptrons.

1. Notice that u_j is an eigenvector of \Sigma with eigenvalue \lambda_j if and only if \Sigma u_j = \lambda_j u_j.

We have applied these models using different architectures and using a PCA preprocessing technique. The results obtained are compared to identify the NN model that yields the best performance. In the following, we describe in detail the dictionaries and the features we have used and the experimental results we have obtained.

Dictionaries. The dictionaries we have used for the training and testing phases of the NNs are reported in Table 2.

TABLE 2 Dictionaries Description

More specifically, the weak dictionary is composed of 327,878 words from the English dictionary. All the words are lowercase, and no special symbol belongs to any of these words. We created three types of dictionaries of strong passwords. The first type is composed of strings whose characters are pseudorandomly sampled from the set of characters reported in Table 1. This dictionary is composed of 30,000 words of length 6 (the dictionary strong.0.1), 30,000 words of length 7 (the dictionary strong.0.2), and 30,000 words of length 8 (the dictionary strong.0.3). Notice that the set of strong characters is approximately one-third of the whole charset; hence, this strategy for constructing strong passwords does not rule out the possibility of obtaining pseudorandom passwords composed of only lowercase letters. For this reason, we have constructed the dictionaries strong.1.x and strong.2.x so as to force each password in these dictionaries to contain either strong characters or some digits. More precisely, each word in strong.1.x contains at least one strong character or at least two digits. Similarly, each word in strong.2.x contains at least two strong characters or three digits. Each dictionary is composed of 30,000 words, of length 6 in strong.y.1, length 7 in strong.y.2, and length 8 in strong.y.3.

To simulate usual users’ behavior, we have also used noisy dictionaries. The idea underlying the construction of such dictionaries is that users, in order to remember their passwords, might substitute one or more characters in a weak word with strong ones and/or use both lowercase and uppercase letters. Hence, the dictionary noise.0.x is obtained by substituting, in each word, x strong characters in randomly chosen positions. Finally, the dictionary noise.1.x is constructed from noise.0.x by substituting half of the lowercase letters with the corresponding uppercase ones.

Features. The four features we have used for the classification process are the following: Classes, #Strong Characters, Digrams, and Upper-Lower Distribution. More precisely:

- Classes: It is reasonable to consider the set of ASCII characters divided into classes of different strength. Commonly, passwords are composed of letters; this means that all (uppercase and lowercase) letters must have low values. In a second class, we can put the digits 0, ..., 9. This is because it is not usual to find a digit in a password, but it is not so unusual, either. In the last class, called the class of strong characters, we can put every character that does not belong to the first two classes. To mark the distance among these classes, we have assigned to the class of letters a value equal to 0.2, to the class of digits a value equal to 0.4, and to the last class a value equal to 0.6. The overall value of a password is computed by summing up the values associated with each character in the password. Notice that, since the feature is a sum, the longer the password, the higher the value.
- #Strong Characters: The second feature is the number of strong characters (as defined in Table 1) contained in the password.
- Upper-Lower Distribution: The value of this feature is computed by the formula |UPP - LOW| / LET, where UPP is the number of uppercase letters, LOW is the number of lowercase letters, and LET is the number of letters in the password. The presence of this feature is due to the observation that passwords that contain both uppercase and lowercase letters are slightly stronger than passwords composed of lowercase (or uppercase) letters only.
- Digrams: This feature looks at the types of digrams present in the password. More precisely, we say that a digram is an alternance if the two characters of the digram belong to different classes. The checker scans the password, analyzes all the digrams from left to right, and assigns values to each of them. In a password with n characters, we consider all n - 1 possible digrams. The more alternances the password has, the higher the value.

A sketch of this feature extraction is given below.
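The following is a minimal sketch of the four features just described, returned in the order Classes, #Strong Characters, Upper-Lower Distribution, Digrams. The per-digram scoring is an assumption: the paper counts alternances but does not spell out the exact value assigned to each digram.

```python
import string

# A sketch of the four password features described above. The digram
# feature simply counts alternances, which is an assumption: the paper
# does not give the exact per-digram values.

CLASS_VALUE = {"letter": 0.2, "digit": 0.4, "strong": 0.6}

def char_class(c):
    if c in string.ascii_letters:
        return "letter"
    if c in string.digits:
        return "digit"
    return "strong"

def features(password):
    classes = sum(CLASS_VALUE[char_class(c)] for c in password)
    n_strong = sum(1 for c in password if char_class(c) == "strong")
    letters = [c for c in password if c in string.ascii_letters]
    upp = sum(1 for c in letters if c.isupper())
    low = len(letters) - upp
    ul_dist = abs(upp - low) / len(letters) if letters else 0.0
    digrams = sum(1 for a, b in zip(password, password[1:])
                  if char_class(a) != char_class(b))  # alternances
    return [classes, n_strong, ul_dist, digrams]

print(features("pa55w!Rd"))  # [2.4, 1, 0.6, 4]
```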

Experimental Results. The aim of these experiments is to compare an SLP and an MLP in classifying words as weak or strong. We use a supervised learning strategy to train the NN. In order to implement such a strategy, we need to construct a training set (as well as the validation and test sets required by the early stopping method [3]) by using the dictionaries described in Table 2. As a starting point, we consider a data set, referred to as Data_Set_1, constructed as follows: we assume that the weak and the noisy dictionaries contain weak passwords, while the remaining ones, namely strong.x.y, for x = 0, 1, 2 and y = 1, 2, 3, contain strong passwords. We label each weak (respectively, strong) password with a 0 (respectively, 1).

The training, validation, and test sets have been obtained by collecting the labeled passwords in order to form two big dictionaries of weak and strong passwords and by assigning a randomly chosen 60 percent of the two dictionaries to the training set, a randomly chosen 20 percent to the validation set, and the remaining 20 percent to the test set.

Our experiments were carried out using algorithms of the Netlab Toolbox [21]. More precisely, the training algorithm used for the SLP is IRLS [21], with activation functions for the output units given by the logistic sigmoidal function. Moreover, we have used the quasi-Newton optimization algorithm as the training algorithm for the MLP, with tanh activation functions for the nodes of the hidden layer and a linear function for the nodes of the output layer. The number of inputs for both NNs is four and the number of outputs is one. In the case of the MLP, we have considered several instances of the network by changing the number of hidden nodes from four to 10.

Performance. In Table 3, we show the classification rates with respect to Data_Set_1, obtained with different NNs.

TABLE 3 Classification Percentage of Data_Set_1 with Different NNs, No PCA

From these results, it is clear that MLPs achieve better performance than SLPs, even when the number of hidden nodes is small. Since the classification rate is higher when the number of hidden nodes is eight, we report in Table 4 the classification rates of such an NN on all the dictionaries we have constructed.

TABLE 4 Classification of the Dictionaries of Data_Set_1 Using an Eight-Hidden-Nodes MLP
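A minimal sketch of the 60/20/20 split just described; the function name and the toy example list are hypothetical. In the paper, the three parts feed the early stopping procedure of Section 3: train on the first part, stop at the minimum validation error, and report results on the test set.

```python
import random

# A sketch of the 60/20/20 random split described above. Each entry is a
# (password, label) pair, 0 = weak; the example list is a tiny placeholder,
# while the paper's dictionaries contain hundreds of thousands of words.

def split_dataset(examples, seed=0):
    rnd = random.Random(seed)
    shuffled = examples[:]
    rnd.shuffle(shuffled)
    n_train = int(0.6 * len(shuffled))
    n_valid = int(0.2 * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

examples = [("password", 0), ("birthday1", 0), ("q7!Rz@2p", 1), ("X4$u9&wM", 1)]
train, valid, test = split_dataset(examples)
```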

Fig. 3. Eigenvalues plot of the training set.

From the analysis of the correlation matrix, we note that the first two principal components (see Fig. 3) carry meaningful information (more than 90 percent of the information). This allows us to use a linear PCA technique to extract two features and to obtain a two-dimensional dictionary. More precisely, from the four features of the training set, which we denote as x_1 = Classes, x_2 = #Strong Characters, x_3 = Upper-Lower Distribution, and x_4 = Digrams, we obtain two features (y_1 and y_2). These features are linear combinations of the four source features, and they can be described by the following equation:

    y_i = \sum_{j=1}^{4} \alpha_{ij} x_j,    for i = 1, 2,    (12)

where the coefficients \alpha_i = (\alpha_{i1}, ..., \alpha_{i4}) are the coefficients of the principal components of the covariance matrix, i.e.,

    \alpha_1 = [0.1331  0.7435  0.5892  0.0434],
    \alpha_2 = [0.0422  0.3982  0.3892  0.8296].    (13)
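Applying (12) with the coefficients of (13) amounts to a single matrix-vector product, as the sketch below shows; the input feature vector is a hypothetical example, not a value from the paper.

```python
import numpy as np

# Projecting the four password features onto the first two principal
# components, per (12)-(13). The feature vector x is hypothetical.

alpha = np.array([[0.1331, 0.7435, 0.5892, 0.0434],
                  [0.0422, 0.3982, 0.3892, 0.8296]])

x = np.array([2.4, 1.0, 0.6, 4.0])  # (Classes, #Strong, Upper-Lower, Digrams)
y = alpha @ x                       # the two-dimensional representation
print(y)                            # [y1, y2]
```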

TABLE 5 Classification Percentage of Data_Set_1 with Different NNs, with PCA

In Table 5, we show the results obtained by the different NNs when preprocessing the inputs using the PCA technique. We point out that the results obtained using the best NN (eight hidden nodes) on each dictionary are very close to those obtained without applying PCA. In Fig. 4, we plot the two-dimensional data set and the decision boundary obtained with the MLP with eight hidden nodes.

Fig. 4. Data visualization and contour plot obtained by using an MLP with eight hidden nodes with PCA on the whole dictionary Data_Set_1: (.) class with label 1, (+) class with label 0.

Along the lines of the above experiments, we have carried out a second set of experiments by using a new data

set, referred to as Data_Set_2, constructed by changing Data_Set_1 so that the words in noise.1.2 are labeled with 1, i.e., noise.1.2 is treated as a strong dictionary. We report in Table 6 the classification rates obtained by applying an SLP and an MLP with different numbers of hidden nodes, trained using this new training set, but with no PCA preprocessing.

TABLE 6 Classification Percentage of Data_Set_2 with Different NNs, No PCA

Again, we note that by using an MLP, we obtain a better result than by using an SLP. Moreover, since the best result is obtained using an MLP with 10 hidden nodes, we classify the original dictionaries using this network configuration. In Table 7, we show the obtained classification rates.

TABLE 7 Classification of the Dictionaries of Data_Set_2 by Using an MLP with 10 Hidden Nodes

In Table 8, we show the results obtained on this data set using an SLP and different MLPs with four, six, eight, and 10 hidden nodes, respectively, and applying the PCA preprocessing technique. We note that the results are again very similar to the ones obtained without using the PCA technique.

TABLE 8 Classification Percentage of Data_Set_2 with Different NNs, with PCA

In Fig. 5, we plot the two-dimensional data set and the obtained decision boundary. We can also note from this figure that the classification is hard using an SLP.

Fig. 5. Data visualization and contour plot obtained by using an MLP with 10 hidden nodes with PCA on the whole dictionary Data_Set_2: (.) class with label 1, (+) class with label 0.

From these results, it is clear that the choice to consider noise.1.2 a strong dictionary reduces the overall error rate and significantly reduces the gap among the errors associated with the different dictionaries. Furthermore, we notice that the percentage of false negatives, i.e., misclassified weak passwords, is negligible. The false negatives in Table 7 are the passwords in the dictionaries weak, noise.0.x, and noise.1.1 classified as strong, and their percentage with regard to the number of all weak passwords is 0.4 percent. This property ensures that a negligible number of user passwords will be easily guessable.

Fig. 6. Decision boundary of the RBF model with 10 hidden nodes and with PCA on Data_Set_1.

Fig. 8. Decision boundary of the FRNN model with 10 hidden nodes and with PCA on Data_Set_1.

Notice that most of the

errors are due to the noise.1.1 dictionary, composed of passwords of the dictionary weak in which a randomly chosen character has been substituted with a strong character and half of the letters have been substituted with their uppercase versions.

Comparison with Other Classification Methods. For the sake of completeness, we have compared the results obtained with MLPs with the following classification methods:

- Kernel Functions: Radial Basis Functions (RBFs) are powerful kernel-based techniques for interpolation and classification in multidimensional spaces. Basically, an RBF is a function which implements a distance criterion with respect to a center. Radial basis functions have been applied in the area of NNs. Such networks have three layers: the input layer, the hidden layer with the RBF functions, and a linear output layer. The most popular choices for RBFs are Gaussian functions. For a complete introduction to NNs based on RBFs, the reader is referred to [3].
- Fuzzy Models: Fuzzy Relational Neural Networks (FRNNs) were introduced in [9]. FRNNs apply fuzzy rules in order to classify objects. These fuzzy rules are obtained by combining fuzzy relations learned during the training process. The composition of such relations is accomplished by using suitable norms (e.g., t-norms).

The experiments considered so far show that the percentages of classification obtained with and without PCA preprocessing are substantially the same. On the other hand, PCA preprocessing allows one to construct a smaller

NN and enables a visual representation of the classification process. For these reasons, in using the RBF and FRNN models, we have only considered the two data sets obtained by applying the PCA preprocessing.

With respect to Data_Set_1, we obtain, by using an RBF-based NN with 10 hidden nodes, 92.1818 percent perfect classification on the training set and 92.2148 percent on the test set. With respect to Data_Set_2, we obtain 94.8316 percent perfect classification on the training set and 94.8507 percent on the test set. By using an FRNN with 10 hidden nodes on Data_Set_1, we get 95.9108 percent perfect classification on the training set and 95.8644 percent on the test set. With respect to Data_Set_2, we obtain 97.194 percent perfect classification on the training set and 97.2195 percent on the test set. In Figs. 6, 7, 8, and 9, we plot the decision boundaries obtained in the above experiments.

Notice that the experiments executed with FRNNs have also shown that a smaller number of hidden nodes is sufficient to get high classification rates. Indeed, an FRNN with five hidden nodes correctly classifies 93.9547 percent of the training set and 93.9114 percent of the test set of Data_Set_1. At the same time, it correctly classifies 97.1391 percent of the training set and 97.1782 percent of the test set of Data_Set_2.

Comparison with Other Password Checkers. We compared our password checker with those described in [5] and [16]. We chose these proactive password checkers since the system described in [5] outperforms all the previous solutions described in the literature and the second one is derived from Crack [15], the most widely used password cracker. Both these systems can be adapted to a per-site policy.

Fig. 7. Decision boundary of the RBF model with 10 hidden nodes and with PCA on Data_Set_2.

Fig. 9. Decision boundary of the FRNN model with 10 hidden nodes and with PCA on Data_Set_2.



TABLE 9 Hyppocrates—noise.1.2 Considered in Training Phase as Strong

TABLE 11 Hyppocrates—noise.1.2 Considered in Training Phase as Strong

Hyppocrates: This system uses decision trees to classify passwords. More precisely, Hyppocrates creates a decision tree from the training set, which consists of positive and negative examples, namely, weak and strong passwords. The result of the training process is a trained tree. Starting from this tree, it is possible to construct a pruned one. We used the pruned tree generated by the system since it requires less space than the trained tree while maintaining almost the same classification rates. For a more detailed description, we refer the reader to [5]. We ran several experiments, and in Tables 9, 10, 11, 12, 13, and 14 we report the results of the most relevant ones.

Notice that Hyppocrates is case-insensitive. Thus, from the system’s point of view, the dictionaries noise.0.x and noise.1.x are exactly the same, while, in NN-based checkers, such dictionaries are distinct. For this reason, we only consider the dictionaries noise.1.x for both the training and the testing phases.

The idea behind the experiments is to test different trees constructed with training sets in which the weak passwords are sampled from weak, noise.1.1, and noise.1.2. In all the experiments except Experiment 5, the strong dictionary used for the training is randomly generated by Hyppocrates, and its size is 10 percent of the size of the weak dictionary used for the same phase. The value 10 percent is suggested by the authors of [5] in order to optimize the performance of the system. In Experiment 5, we use strong.1.x, with x = 1, 2, 3, during the training phase.

In Experiment 1, we trained the system by using the dictionary weak, transformed by Hyppocrates in the following way: each word, with probability one-third, is not modified; with probability two-thirds, it is modified by substituting a strong character in a randomly chosen position. Notice that two-thirds of the words in the resulting dictionary have the same characteristics as words belonging to noise.1.1.

TABLE 10 Hyppocrates—noise.1.2 Considered in Training Phase as Strong

TABLE 12 Hyppocrates—noise.1.2 Considered in Training Phase as Weak


TABLE 13 Hyppocrates—noise.1.2 Considered in Training Phase as Weak

In Experiment 2, we considered the case in which examples of weak passwords are taken from noise.1.1. In Experiment 3, half of the weak passwords are taken from weak and the other half are taken from noise.1.1. In all the above experiments, the passwords in noise.1.2 are classified as strong. In order to force the system to classify such a dictionary as weak, in the training phase of Experiments 4 and 5, we used half of the examples of weak passwords from noise.1.2. Finally, in Experiment 6, one-third of the examples of weak passwords are sampled from weak, one-third from noise.1.1, and the remaining third from noise.1.2, as suggested by the authors in [5].

TABLE 14 Hyppocrates—noise.1.2 Considered in Training Phase as Weak

TABLE 15 Classification of the Dictionaries Used to Construct Data_Set_1 and Data_Set_2 Provided by Cracklib in Several Experiments

CrackLib: This password checker also provides the possibility of adapting the system in order to tune its performance to local policies. In this case, the training is carried out by using only the dictionary containing the weak passwords that have to be rejected. We report in Tables 15 and 16 the results of the following experiments: in

Experiment 1, the system has been constructed by using half of the dictionary weak. In Experiment 2, the system has been constructed by using the whole dictionary weak. In Experiment 3, we have used half of the words in weak and half of the words in noise.0.1. In Experiment 4, we have used noise.0.1. Finally, in Experiment 5, we have used both weak and noise.0.1.

TABLE 16 Classification of the Dictionaries Used to Construct Data_Set_1 and Data_Set_2 Provided by Cracklib in Several Experiments

Performance Evaluation. From the test results reported in Table 4 and Table 7, we can state that the system presented in this paper correctly classifies almost all the weak passwords. More precisely, if noise.1.2 is considered to be a weak dictionary (Table 4), the percentage of misclassified weak passwords, i.e., weak passwords classified as strong, is 0.5 percent. On the other hand, the classification error on strong passwords is 25.5 percent. If noise.1.2 is considered to be a strong dictionary (see Table 7), the error rate over weak passwords decreases to 0.4 percent and, simultaneously, the error rate over the strong passwords decreases to 10 percent.

For Hyppocrates, we note that Experiments 1, 2, and 3 show a very good classification rate for weak passwords. Indeed, if we consider the dictionary noise.1.2 a strong one, the percentage of weak passwords that are misclassified is 0 for Experiments 2 and 3. Furthermore, the memory requirements for the pruned tree are almost comparable to the size of the NN (see Table 17). On the other hand, we note that, for Hyppocrates, it is hard to classify noise.1.2 as a weak dictionary. The results of Experiments 4, 5, and 6 show that, if we consider noise.1.2 a weak dictionary, the percentage of misclassified passwords becomes unacceptable. Only in Experiment 6 is the percentage of misclassification of weak passwords as low as 0.67, but the price we pay is that the size of the pruned tree grows to 76 KB. Hence, the size of a tree that classifies noise.1.2 as a weak dictionary (and that does not induce misclassification on the other weak dictionaries) is quite big.

The results of the experiments for Cracklib show very poor performance with regard to the classification of weak passwords (see Tables 15 and 16). Indeed, while it correctly classifies almost all the strong passwords, it fails in classifying weak passwords if they have not been used during the training phase.

Memory Requirement. As a final remark, we would like to stress that the space needed to store an NN, once it has been trained, consists of a few hundred bytes and is independent of the size of the training set. In our case, the PCA matrix consists of eight real values. An MLP with h hidden nodes, four inputs, and one output requires storing 5h real values for the weights and biases needed to compute the intermediate activation variables and h + 1 values for computing the second-layer activation variable, along with the bias for the output unit. This means that, in an MLP with 10 hidden nodes with PCA preprocessing, if a real value is encoded using 8 bytes, the total number of bytes to be stored is 552, independently of the size of the training set. In contrast, for the previous solutions presented in [5], [1], [16], the size of the information to be stored once the system has been trained depends on the specific training set. In Table 17, we report the space required by the different experiments we have presented in this paper. It is clear that previous solutions require an amount of information that is, in most cases, considerably larger than that of the solution presented in this paper.

TABLE 17 Space Requirement
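The storage figure above is a simple parameter count; the following sketch (the function name is ours, not the paper's) reproduces the 552-byte total for the 10-hidden-node MLP with PCA preprocessing.

```python
# Reproducing the storage count given above: 5h reals for the first layer
# (4h weights + h biases), h + 1 reals for the output layer, plus the
# 8 reals of the PCA matrix, at 8 bytes per real.

def mlp_storage_bytes(h, bytes_per_real=8, pca_reals=8):
    first_layer = 4 * h + h          # weights + biases for h hidden nodes
    output_layer = h + 1             # weights + output-unit bias
    return (first_layer + output_layer + pca_reals) * bytes_per_real

print(mlp_storage_bytes(10))         # 552, matching the figure in the text
```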

5 CONCLUSIONS

We have applied SLP and MLP networks to the design of proactive password checkers. It is the first time that such techniques have been fully (and successfully) employed in this setting. We have evaluated the performance of several network topologies. For some of them, we have provided a visualization of the behavior of the network using a standard preprocessing technique of the inputs. Moreover, we have compared the MLP networks with kernel-based and fuzzy-based neural network models. Although such models obtain very high classification rates, MLP networks still appear to be the best choice. Finally, we have compared the classification rates obtained by our solutions with those of previously presented proactive password checkers. In all cases, the results confirm that proactive password checkers based on this technology have high efficiency and efficacy. The solution presented has the main advantage that the size of the information to be stored after the training of the NN is independent of the size of the training set and, in our case, can be as low as 552 bytes. Hence, such checkers might easily be implemented using smart card technology.

REFERENCES

[1] F. Bergadano, B. Crispo, and G. Ruffo, “High Dictionary Compression for Proactive Password Checking,” ACM Trans. Information and System Security, vol. 1, no. 1, pp. 3-25, 1998.
[2] S.M. Bellovin and M. Merritt, “Encrypted Key Exchange: Password-Based Protocols Secure against Dictionary Attacks,” Proc. IEEE Symp. Research in Security and Privacy, pp. 72-84, 1992.
[3] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford Univ. Press, 1995.
[4] M. Bishop, “Proactive Password Checking,” Proc. Fourth Workshop Computer Security Incident Handling, pp. 1-9, 1992.
[5] C. Blundo, P. D’Arco, A. De Santis, and C. Galdi, “Hyppocrates: A New Proactive Password Checker,” J. Systems and Software, vol. 71, nos. 1-2, pp. 163-175, Apr. 2004.
[6] C. Blundo, P. D’Arco, A. De Santis, and C. Galdi, “A Novel Approach to Proactive Password Checking,” Proc. Infrastructure Security (INFRASEC ’02), pp. 30-39, 2002.
[7] M.K. Boyarsky, “Public-Key Cryptography and Password Protocols: The Multi-User Case,” Proc. ACM Conf. Computer and Comm. Security, pp. 63-72, 1999.
[8] V. Boyko, P. MacKenzie, and S. Patel, “Provably Secure Password-Authenticated Key Exchange Using Diffie-Hellman,” Proc. Eurocrypt 2000, pp. 156-171, 2000.
[9] A. Ciaramella, R. Tagliaferri, W. Pedrycz, and A. Di Nola, “Fuzzy Relational Neural Network,” Int’l J. Approximate Reasoning, vol. 41, pp. 146-163, 2006.
[10] C. Davies and R. Ganesan, “Bapasswd: A New Proactive Password Checker,” Proc. 16th Nat’l Conf. Computer Security, pp. 1-15, 1993.
[11] N.M. Haller, “The S/KEY One-Time Password System,” Proc. ISOC Symp. Networks and Distributed Systems Security, 1994.
[12] N. Haller, C. Metz, P. Nesser, and M. Straw, A One-Time Password System, Request for Comments 2289, 1998.
[13] L.C. Jain, U. Halici, I. Hayashi, S.B. Lee, and S. Tsutsui, Intelligent Biometric Techniques in Fingerprint and Face Recognition. CRC Press, 1999.
[14] “John the Ripper” password cracker, http://www.openwall.com/john, 2006.


[15] A.D. Muffett, “Crack 5.0,” http://www.crypticide.com/users/alecm/, 1997.
[16] A.D. Muffett, “Cracklib v2.7: A Proactive Password Sanity Library,” http://www.crypticide.com/users/alecm/, 1997.
[17] J.B. Nagle, “An Obvious Password Detector,” Usenet News, 1988.
[18] J. Katz, R. Ostrovsky, and M. Yung, “Efficient Password-Authenticated Key Exchange Using Human-Memorable Passwords,” Proc. Eurocrypt ’01, pp. 475-495, 2001.
[19] D.V. Klein, “Foiling the Cracker: A Survey of, and Improvements to, Password Security,” Proc. Second USENIX Workshop Security, pp. 5-14, 1990.
[20] R. de Luis-García, C. Alberola-López, O. Aghzout, and J. Ruiz-Alzola, “Biometric Identification Systems,” Signal Processing, vol. 83, no. 12, pp. 2539-2557, 2003.
[21] I.T. Nabney, NETLAB: Algorithms for Pattern Recognition. Springer-Verlag, 2002.
[22] C. Schnorr, “Efficient Identification and Signature for Smart Cards,” Proc. Eurocrypt ’89, pp. 239-252, 1989.
[23] E. Spafford, “Opus: Preventing Weak Password Choices,” Computers and Security, vol. 11, no. 3, 1992.
[24] W. Stallings, Network and Internetwork Security: Principles and Practice. Prentice Hall, 1995.
[25] T. Wu, “The Secure Remote Password Protocol,” Proc. ISOC Network and Distributed System Security Symp., pp. 97-111, 1998.
[26] J. Yan, “A Note on Proactive Password Checking,” Proc. ACM New Security Paradigms Workshop, Sept. 2001.

Angelo Ciaramella received the laurea degree (cum laude) and PhD degree in computer science from the University of Salerno, Italy, in 1998 and 2002, respectively. He is currently a postdoctoral researcher with the Department of Mathematics and Computer Science at the University of Salerno. He works on nonlinear PCA for periodicity detection and on independent component analysis in blind source separation for linear, convolutive, and single-channel mixtures. He also works on fuzzy and neuro-fuzzy systems. He is the author of several publications in the area of soft computing and signal processing.

Paolo D’Arco received the PhD degree in computer science from the University of Salerno in February 2002. From November 2001 to September 2002, he was a postdoctoral fellow at the Centre for Applied Cryptographic Research, Department of Combinatorics and Optimization, University of Waterloo (Canada). Since December 2003, he has been an assistant professor at the University of Salerno. His research interests include cryptography and data security.


Alfredo De Santis received the laurea degree in computer science (cum laude) from the University of Salerno in 1983. Since 1984, he has been with the Dipartimento di Informatica ed Applicazioni of the University of Salerno, in 1984-1986 as an instructor in charge of the computer laboratory, in 1986-1990 as a faculty researcher, and since November 1990 as a professor of computer science. From November 1991 to October 1995 and November 1998 to October 2001, he was the chair of the Dipartimento di Informatica ed Applicazioni of the University of Salerno. He was the chairman of the graduate program in computer science at the University of Salerno: ciclo XII (1996-2000), ciclo XIII (1997-2001), ciclo XIV (1998-2002), ciclo XV (1999-2002), and ciclo XVI (2000-2003). From September 1987 to February 1990, he was a visiting scientist at the IBM T.J. Watson Research Center, Yorktown Heights, New York. He spent August 1994 at the International Computer Science Institute (ICSI), Berkeley, California, as a visiting scientist. He was the program chairman of Eurocrypt ’94, of the Fifth Italian Conference on Theoretical Computer Science, 1995, and of the Security in Communication Networks Conference, 1996. He was cochair of the Advanced School on Computational Learning and Cryptography, Vietri sul Mare, Italy, 1993. He served on the scientific program committees of several international conferences. He was the editor of the Proceedings of the Fifth Italian Conference on Theoretical Computer Science (World Scientific, 1996), the editor of the volume Advances in Cryptology—Eurocrypt ’94, and coeditor of the volume Sequences II: Methods in Communication, Security and Computer Science (Springer-Verlag, 1993). His research interests include algorithms, data security, cryptography, communication networks, information theory, and data compression.

Clemente Galdi received the laurea degree (cum laude) and the PhD degree in computer science from the University of Salerno (Italy) in 1997 and 2002, respectively. From May to September 2001, he visited Telcordia Technologies, New Jersey. From November 2001 to October 2004, he was a postdoctoral fellow with the Department of Computer Engineering and Informatics of the University of Patras and the Computer Technology Institute, Patras, Greece. Since April 2006, he has been an assistant professor at the University of Napoli “Federico II.” His research interests include cryptography, data security, and algorithms.

Roberto Tagliaferri received the laurea degree in computer science from the University of Salerno, Italy, in 1984. From 1986 to 1999, he was a researcher with the Department of Computer Science at the University of Salerno. Since 2000, he has been an associate professor with the Department of Mathematics and Informatics of the University of Salerno. His research covers the area of neural nets: neural dynamics, fuzzy neural nets, clustering and data visualization techniques, and their applications to signal and image processing with astronomical and geological data, bioinformatics, and medical computer-aided diagnosis. He has been cochairman of special sessions at AMSE ISIS ’97, IJCNN ’99, IJCNN ’01, IJCNN ’03, WILF ’03, IJCNN ’04, IJCNN ’05, WILF ’05, and IJCNN ’06, and coeditor of a special issue of Neural Networks. He presented tutorials on “Learning with Multiple Machines: ECOC Models versus Bayesian Framework” at IJCNN ’03 and on “Visualization of High Dimensional Scientific Data” at IJCNN ’05. He is the author of more than 100 publications in the area of neural networks. Since 1995, he has been coeditor of the Proceedings of the Italian Workshops on Neural Nets (WIRN). He was secretary of SIREN (Società Italiana Reti Neuroniche) from 1994 to 2005. Currently, he is a cochair of the Bioinformatics SIG of the INNS, a member of the Director Council of the IIASS (International Institute for Advanced Scientific Studies) E.R. Caianiello, a senior member of the IEEE, and a member of INFN, INFM, and AIIA.
