Towards a Quantum-Inspired Framework for Binary Classification

Prayag Tiwari
University of Padova, Department of Information Engineering
[email protected]

Massimo Melucci
University of Padova, Department of Information Engineering
[email protected]
ABSTRACT
Machine Learning (ML) models learn the relationship between input and output from examples and then apply the learned models to unseen input. Although ML has successfully been used in almost every field, there is always room for improvement. To this end, researchers have recently been trying to implement Quantum Mechanics (QM) in ML, since it is believed that quantum-inspired ML can enhance learning rate and effectiveness. In this paper, we address a specific task of ML and present a binary classification model inspired by the quantum detection framework. We compared the model to the state of the art. Our experimental results suggest that the use of the quantum detection framework in binary classification can improve effectiveness for a number of topics of the RCV-1 test collection and that it may still provide ways to improve effectiveness for the other topics.
KEYWORDS
Machine Learning, Quantum Detection, Binary Classification

ACM Reference Format:
Prayag Tiwari and Massimo Melucci. 2018. Towards a Quantum-Inspired Framework for Binary Classification. In The 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), October 22–26, 2018, Torino, Italy. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3269206.3269304
1 INTRODUCTION
Machine Learning (ML) is the field of computer science in which algorithms learn patterns from data and predict outputs for given unknown input data. ML has been applied successfully in several areas, including robotics, brain-computer interfaces, and chemistry. As part of both Statistics and Artificial Intelligence, ML models can process huge amounts of data for tasks that come naturally to the human brain, such as speech recognition, image recognition, email spam filtering, and financial risk estimation [7].

The notion of classification has been introduced to capture the extent and abstraction of the problem space to which a given object belongs. For a given set of classes, one attempts to decide which category a given object belongs to [4]. Classification is a key task of ML in a variety of Data Science disciplines such as Data Mining, Information Retrieval and Recommender Systems. The effectiveness of classification algorithms is dependent upon the theory of probability, sets and vector spaces. Enhancing the effectiveness of classifiers has been a field of in-depth research in ML over the last decades. This intensive research has led to state-of-the-art classifiers such as Naïve Bayes (NB), Support Vector Machines (SVM) and decision trees.

Since data is constantly and exponentially growing, the main challenge is to identify innovative methods in ML. The framework of Quantum Mechanics (QM) might be a way to address these challenges. Physicists have demonstrated the strength of QM for information processing. While classical computers use two states called 0 and 1, quantum computers use the superposition of the quantum states |0⟩ and |1⟩ in order to observe several different paths of measurement. Similarly to the way QM shifted the computational paradigm from bits to quantum bits, also known as qubits, the quantum mechanical framework can inspire the design of novel ML algorithms. It is necessary to note that the use of QM suggested in this paper is not about the use of quantum computers to perform ML tasks. Our aim is to leverage and replace classical probability theory with the more general quantum probability theory, thus finding novel algorithms that cannot be seen through the lens of a classical theoretical framework.

In this paper, we implement a quantum-inspired binary classification framework, which is a step towards moving from state-of-the-art models to quantum-inspired models. Our results suggest that an effective quantum-inspired classification framework can be achieved.

The rest of the paper is organized as follows: Section 2 presents the background on Chi-square statistics, NB, SVM, Signal Detection Theory (SDT), and quantum SDT. Sections 3 and 4 present the proposed methodology and our experiments, respectively. Finally, Section 5 presents the conclusion and possible future work.
2 BACKGROUND

2.1 Chi-Square
Chi-square feature selection has been used to select the most important features. $\chi^2$ is calculated between every target and feature to choose the features with the highest $\chi^2$ scores. In general, $\chi^2$ is used to test the interdependency of two events. $\chi^2$ can be calculated as $\chi^2 = \sum_{i=1}^{n} (O_C - E_C)^2 / E_C$, where $O_C$ is the number of observations in the class $C$ and $E_C$ is the number of expected observations in the class $C$.
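For illustration only, the following is a minimal sketch of chi-square feature selection, assuming scikit-learn and a toy corpus that are not part of this work:

```python
# Minimal sketch of chi-square feature selection (toy data and k are assumptions).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["cheap pills online", "meeting agenda attached", "cheap offer online", "project meeting notes"]
labels = [1, 0, 1, 0]  # binary topic labels

X = CountVectorizer().fit_transform(docs)   # document-term frequency matrix
selector = SelectKBest(chi2, k=3)           # keep the 3 features with the highest chi^2 scores
X_reduced = selector.fit_transform(X, labels)
print(selector.scores_)                     # chi^2 score of every feature
```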
2.2 Naïve Bayes Classification
NB is a probabilistic learning model. For a given set of documents $D \in S$, a document space is represented by $S$ and $C \in \{C_1, C_2, C_3, \ldots, C_j\}$ is the set of classes. The probability for a given document $D$ being in the class $C$ can be estimated as $\Pr(C|D) \propto \Pr(C) \prod_{0 \le r \le n_D} \Pr(t_r|C)$. The conditional probability of the occurrence of term $t_r$ in a document for the given class $C$ is represented by $\Pr(t_r|C)$, which also explains how much evidence $t_r$ provides about $C$. The prior class probability is $\Pr(C)$.
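As an illustrative sketch only (the library, smoothing and toy data are assumptions), multinomial Naïve Bayes over term counts implements the estimate Pr(C|D) ∝ Pr(C) ∏ Pr(t_r|C) described above:

```python
# Minimal Naive Bayes text-classification sketch (toy data, default Laplace smoothing assumed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = ["cheap pills online", "meeting agenda attached", "cheap offer online", "project meeting notes"]
train_labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)      # term-frequency vectors
clf = MultinomialNB().fit(X_train, train_labels)    # estimates Pr(C) and Pr(t_r | C)

X_test = vectorizer.transform(["cheap meeting online"])
print(clf.predict(X_test), clf.predict_proba(X_test))   # predicted class and Pr(C | D)
```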
2.3 Support Vector Machine
SVMs consider the input data (documents) as points in a geometrical space. In the case of binary classification, the objective is to find the best hyperplane that distinguishes between positive and negative documents. In terms of flexibility, SVM uses a set of feature functions instead of the event space used in NB. In the case of linearly separable data, $x^-$ and $x^+$ are the closest negative and positive training points to the hyperplane. The margin can be written as $\mathrm{margin} = (|w \cdot x^-| + |w \cdot x^+|) / \|w\|$. A vector $w$ has to be found to solve the optimization problem of finding the best hyperplane. The optimization problem can be stated as minimizing $\|w\|^2 / 2$ subject to $w \cdot x_i \ge 1$ for all $i$ such that $\mathrm{class}(i)$ is positive and $w \cdot x_i \le -1$ for all $i$ such that $\mathrm{class}(i)$ is negative. In the real world, few datasets are linearly separable; on the contrary, most of them are non-linearly separable.
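A minimal sketch of a linear SVM for binary text classification, assuming scikit-learn's LinearSVC and toy data (neither is prescribed by the paper):

```python
# Minimal linear SVM sketch (library choice and toy data are assumptions).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_docs = ["cheap pills online", "meeting agenda attached", "cheap offer online", "project meeting notes"]
train_labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)
svm = LinearSVC(C=1.0).fit(X_train, train_labels)   # soft-margin variant of the ||w||^2/2 objective above

test_doc = vectorizer.transform(["cheap meeting online"])
print(svm.predict(test_doc), svm.decision_function(test_doc))   # predicted class and the value of w . x + b
```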
2.4 Signal Detection Theory
SDT gives a general structure to depict and examine choices made in ambiguous and uncertain circumstances. SDT is based on signal, noise and decision. Source, channel and receiver are the components of signal detection theory. SDT has been utilized widely in the field of psychophysics, that is, the field that examines the connection between a stimulus and its psychological effect. SDT requires some inference about how any kind of decision is taken under uncertainty. There are three grounded questions in SDT, i.e. (a) are we sure about the signal, (b) how can we know that it is the correct signal and (c) how do we decide whether to act or not?

Figure 1: Signal Detection Theory (SDT)

Figure 2: Classical communication system (source → channel → receiver)

                      Presence of signal (1)   Absence of signal (0)
Decision as Yes (1)   Detection (C11)          False Alarm (C01)
Decision as No (0)    Miss (C10)               Rejection (C00)

Table 1: Summary of decision about the presence / absence of signal. For each actual state and decision the corresponding outcome has costs.

The main task of SDT is to classify the observed impulse, that is, to decide whether it is generated by the noise or by the signal [1, 5, 8]. A good way to understand SDT is to take the example of an Information Retrieval (IR) system. A user has to decide whether the retrieved documents fulfill the user's requirement (i.e. they are signal) or not (i.e. they are noise). The objective is to discriminate signal from noise by observation, through a statistical decision based on the probability and the costs of each decision w.r.t. the actual state, as summarized by Table 1. We leverage the analogy between decision in SDT and decision in ML to define the methodology presented in Section 3.
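As a hedged illustration of the cost-based decision summarized in Table 1 (the Gaussian observation models, priors and unit costs below are assumptions, not taken from the paper), the classical SDT decision reduces to a likelihood-ratio test against a threshold determined by the priors and the costs:

```python
# Minimal classical SDT decision sketch: likelihood-ratio test with the costs of Table 1.
# Gaussian signal/noise models, priors and cost values are illustrative assumptions.
import numpy as np
from scipy.stats import norm

p_signal, p_noise = 0.5, 0.5              # prior probabilities of signal presence / absence
C11, C01, C10, C00 = 0.0, 1.0, 1.0, 0.0   # costs of detection, false alarm, miss, rejection
noise = norm(loc=0.0, scale=1.0)          # observation distribution when only noise is present
signal = norm(loc=1.5, scale=1.0)         # observation distribution when the signal is present

# Bayes-optimal threshold on the likelihood ratio Pr(x | signal) / Pr(x | noise)
eta = (p_noise * (C01 - C00)) / (p_signal * (C10 - C11))

def decide(x):
    """Return 1 (signal present) or 0 (only noise) for an observed impulse x."""
    likelihood_ratio = signal.pdf(x) / noise.pdf(x)
    return int(likelihood_ratio > eta)

print([decide(x) for x in np.array([-0.5, 0.4, 1.2, 2.0])])
```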
2.5 Quantum Mechanics
The essence of QM lies in the indeterminacy of measurement. While measurement is deterministic in the classical world, because the real state of a system can be measured exactly several times, randomness is, on the contrary, an intrinsic element of the quantum world. When a quantum system is repeatedly measured, the series of measurements may produce different results even though the initial conditions of the system being observed are always the same. This is not due to the measurement error occurring in the classical world; it is rather due to the intrinsic randomness of the state of the system. From a mathematical point of view, the quantum randomness of the state is described by a complex wave function whose squared modulus gives the probability density of the random measurement result.
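A minimal simulation sketch (the qubit amplitudes below are arbitrary illustrative values, not from the paper) of this intrinsic randomness: repeated measurements of identically prepared states yield different outcomes, with probabilities given by the squared moduli of the amplitudes:

```python
# Minimal sketch: measurement randomness of a qubit |phi> = alpha|0> + beta|1> (illustrative amplitudes).
import numpy as np

phi = np.array([np.sqrt(0.7), np.sqrt(0.3)])   # |<0|phi>|^2 = 0.7, |<1|phi>|^2 = 0.3
probabilities = np.abs(phi) ** 2               # Born rule: squared moduli give outcome probabilities

rng = np.random.default_rng(seed=0)
outcomes = rng.choice([0, 1], size=10, p=probabilities)   # ten measurements of identically prepared states
print(outcomes)   # a mix of 0s and 1s even though the initial state is always the same
```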
2.6 Quantum SDT

Figure 3: Quantum communication system (source → coder → channel → receiver)
In quantum SDT there is a coder between the source and the channel, as depicted in Figure 3. On one side, the coder encodes the signal into a particle (e.g. a qubit) and assigns to the particle a pure state described by its state vector |ϕ⟩. On the other side, a measurement is performed as in the classical detection framework. The key difference between the classical framework and the quantum framework lies in what an encoder encodes to and what a decoder decodes to [3]. In the classical framework the initial encoding is formulated as a classical-classical (c-c) mapping from a symbol to a wave to be sent to the corrupting channel; the decoding provides a c-c mapping again from the corrupted wave to a symbol. In sum, in a classical framework all the operations are performed according to c-c mappings. In the quantum framework the classical symbol is transmitted through quantum states. The initial encoding then becomes a classical-quantum (c-q) mapping from the symbol to a quantum state selected from a finite set of states. The quantum channel provides a q-q mapping from the quantum state to a corrupted version thereof. The decoding provides a q-c mapping. Informally speaking, the potential of the quantum framework is the larger number of degrees of freedom in setting the decoder, which may be set to measure different observables. While the classical symbols are selected from a finite set (e.g. a feature vocabulary), the c-q and q-c mappings can leverage more effective signals, in the sense that these signals are less susceptible to classification errors.
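To make the c-q and q-c mappings concrete, here is a small illustrative sketch (the two pure states and the measurement direction are arbitrary choices, not the paper's): a classical symbol is encoded into one of two qubit states and decoded by a projective measurement, whose detection and false-alarm probabilities are tr(ρ1 P) and tr(ρ0 P):

```python
# Minimal c-q / q-c sketch: encode symbols into qubit states, decode with a projector (illustrative values).
import numpy as np

psi0 = np.array([1.0, 0.0])                                   # c-q mapping for symbol 0
psi1 = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)])       # c-q mapping for symbol 1 (non-orthogonal to psi0)
rho0, rho1 = np.outer(psi0, psi0), np.outer(psi1, psi1)       # density operators of the two encodings

e = np.array([np.sin(np.pi / 8), np.cos(np.pi / 8)])          # measurement direction (arbitrary choice)
P = np.outer(e, e)                                            # projector implementing the q-c decision "symbol 1"

detection = np.trace(rho1 @ P)     # Pr(decide 1 | symbol 1 was sent)
false_alarm = np.trace(rho0 @ P)   # Pr(decide 1 | symbol 0 was sent)
print(detection, false_alarm)
```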
3 A BINARY CLASSIFICATION FRAMEWORK BASED ON QUANTUM SDT
We assume that a document is in a binary state with respect to a single topic, in the sense that the document either belongs to the topic or it does not. Mathematically speaking, we utilize a projector $P$ for a given topic to quantify whether a test document belongs to the topic or not. To decide whether a document belongs to a topic, $P$ is checked against a vectorial representation of the document.

Suppose there are $n$ distinct features in the collection of documents. Each document $d$ of the document collection can be represented as a feature vector of $n$ features, $W_d = (W_{d1}, \ldots, W_{dn})$; each element in the feature vector is a non-negative integer such as a frequency. Each document $D_i$ in the training set $\{D_i\}_{i=1}^{N}$ has a binary label $C(D_i) \in \{0, 1\}$; each document $D_j$ in the test set $\{D_j\}_{j=1}^{M}$ has a feature vector. The main objective is to acquire the unknown binary label for every document in the test set.

Following quantum SDT, our proposed algorithm starts by computing the density operators $\rho_0$ and $\rho_1$, respectively, from the training documents of the negative (i.e. $C(D_i) = 0$) and positive (i.e. $C(D_i) = 1$) examples. In order to obtain these density operators, for each distinct feature we first compute the number of documents with non-zero values on this feature. In this way, one $n$-dimensional vector $|v\rangle$ is produced for every class, thus obtaining two vectors $|v_0\rangle$ and $|v_1\rangle$. These vectors can be considered as statistics of the features in a class. We normalize the obtained vectors so that $|\langle v|v\rangle|^2 = 1$. Then, we compute the outer products in order to acquire the density operators $\rho_0$ and $\rho_1$:

$\rho_0 = |v_0\rangle\langle v_0| / \mathrm{tr}(|v_0\rangle\langle v_0|)$   (4)

$\rho_1 = |v_1\rangle\langle v_1| / \mathrm{tr}(|v_1\rangle\langle v_1|)$   (5)
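A minimal sketch of the density-operator construction described above, assuming a toy binary-labelled document-feature matrix; the projector-based decision rule that follows in the paper is not reproduced in this sketch:

```python
# Minimal sketch: density operators rho_0 and rho_1 from training documents (toy data assumed).
import numpy as np

# Rows are training documents, columns are the n features; labels give C(D_i).
X_train = np.array([[2, 0, 1, 0],
                    [1, 0, 3, 0],
                    [0, 2, 0, 1],
                    [0, 1, 0, 2]])
labels = np.array([0, 0, 1, 1])

def density_operator(docs):
    """Count, per feature, the documents with a non-zero value, then build |v><v| / tr(|v><v|)."""
    v = (docs > 0).sum(axis=0).astype(float)   # per-feature document counts for the class
    v = v / np.linalg.norm(v)                  # normalise so that <v|v> = 1
    rho = np.outer(v, v)                       # outer product |v><v|
    return rho / np.trace(rho)

rho_0 = density_operator(X_train[labels == 0])
rho_1 = density_operator(X_train[labels == 1])
print(np.trace(rho_0), np.trace(rho_1))        # both traces equal 1, as required of density operators
```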