
On the Universality of Memcomputing Machines

arXiv:1712.08702v1 [cs.NE] 23 Dec 2017

Yan Ru Pei, Fabio L. Traversa, and Massimiliano Di Ventra

Abstract—Universal memcomputing machines (UMMs) [IEEE Trans. Neural Netw. Learn. Syst. 26, 2702 (2015)] represent a novel computational model in which memory (time non-locality) accomplishes both tasks of storing and processing of information. UMMs have been shown to be Turing-complete, namely they can simulate any Turing machine. In this paper, using set theory and cardinality arguments, we compare them with liquid-state machines (or “reservoir computing”) and quantum machines (“quantum computing”). We show that UMMs can simulate both types of machines, hence they are both “liquid-” or “reservoir-complete” and “quantum-complete”. Of course, these statements pertain only to the type of problems these machines can solve, and not to the amount of resources required for such simulations. Nonetheless, the method presented here provides a general framework in which to describe the relation between UMMs and any other type of computational model.

Index Terms—memory, elements with memory, memcomputing, Turing machine, reservoir computing, quantum computing.

Y. R. Pei and M. Di Ventra are with the Department of Physics, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093-0319, USA (e-mail: [email protected], [email protected]). F. L. Traversa is with MemComputing Inc., San Diego, CA 92130 USA (e-mail: [email protected]).

I. INTRODUCTION

Memcomputing stands for “computing in and with memory” [1]. It is a novel computing paradigm whereby memory (time non-locality) is employed to perform both tasks, storing and processing of information, in the same physical location. This paradigm is substantially different from the one implemented in our modern-day computers [2]. In these machines there is a separation of tasks between a memory (storage) unit and one that performs the processing of information. Modern-day computers are the closest physical approximation possible to the well-known (mathematical) Turing paradigm of computation, which maps a finite string of symbols into a finite string of symbols in discrete time [3].

The memcomputing paradigm has been formalized by Traversa and Di Ventra in Ref. [4]. In that paper it was shown that universal memcomputing machines (UMMs) can be defined as digital (so-called digital memcomputing machines (DMMs) [5]) or analog [6], with the digital ones offering an easy path to scalable (in terms of size) machines.

UMMs have several features that make them a powerful computing model, most notably intrinsic parallelism, information overhead, and functional polymorphism [4]. The first feature means that they can operate collectively on all (or portions of) the data at once, in a cooperative fashion. This is reminiscent of neural networks, and indeed neural networks can be viewed as special cases of UMMs. However, neural networks do not have “information overhead”. This feature is related to the topology (or architecture) of the network of memory units (memprocessors). It means the machine has

access, at any given time, to more information (precisely, information originating from the topology of the network) than what would be available were the memprocessors not connected to each other. Of course, this information overhead is not necessarily stored by the machine (the stored information is the Shannon one) [4]. Nevertheless, with appropriate topologies, hence with appropriate information overhead, UMMs can solve complex problems very efficiently [4]. Finally, functional polymorphism means that the machine is able to compute different functions without modifying the topology of the machine network, by simply applying the appropriate input signals [4].

In Ref. [4] it was shown that UMMs are Turing-complete, meaning that UMMs can simulate any Turing machine. Note that Turing-completeness means only that all problems a Turing machine can solve can also be solved by a UMM. It does not imply anything about the resources (in terms of time, space and energy) required to solve those problems. The reverse statement, namely that all problems solved by UMMs are also solved by Turing machines (which would establish equivalence between the two models of computation), has not been proved yet. Nevertheless, a practical realization of digital memcomputing machines, using electronic circuits with memory [5], has been shown to be efficiently simulated with our modern-day computers (see, e.g., [7]). In other words, the ordinary differential equations representing DMMs and the problems they are meant to solve can be efficiently simulated on a classical computer.

In recent years, other computational models, each addressing different types of problems, have attracted considerable interest in the scientific community. On the one hand, quantum computing, namely computing using features, like tunneling or entanglement, that pertain only to quantum phenomena, has become a leading candidate to solve specific problems, such as prime factorization [8], annealing [9], or even simulations of quantum Hamiltonians [10]. This paradigm has matured from a simple proposal [11] to full-fledged devices for possible commercial use [12]. However, its scalability faces considerable practical challenges, and its range of applicability is very limited compared to even classical Turing machines.

Another type of computing model pertains to a seemingly different domain, that of spiking (recurrent) neural networks, in which spatio-temporal patterns of the network can be used to, e.g., learn features after some training [13]. Although somewhat different realizations of this type of network have been suggested, for the purposes of this paper we will focus only on the general concept of “reservoir computing” [14], and in particular on its “liquid-state machine” (LSM) realization [15], rather than the “echo-state network” one [16]. The results presented in this paper carry over to this other type of realization as well. Therefore, we will use the term “reservoir computing” as analogous to “liquid-state machine”


and will not differentiate among the different realizations of the former.

Our goal for this paper is to study the relation between these seemingly different computational models. In particular, we will show that UMMs encompass both quantum machines and liquid-state machines, in the sense that, on a theoretical level, they can simulate any quantum computer or any liquid-state machine. Again, this does not say anything about the resources needed to simulate such machines, only that such a mapping exists. In other words, we prove here that UMMs are not only Turing-complete, but also “liquid-complete” (or “reservoir-complete”) and “quantum-complete”. In order to prove these statements, we will use set theory and cardinality arguments to show that the LSM and quantum computers can be mapped to subsets of the UMM. This methodology is general. Therefore, we expect it to be applicable to any other type of computational model, other than the ones we consider in this work.

Our paper is organized as follows. In Sec. II we briefly review the mathematical definition of the machines we consider in this work, starting with UMMs. In Sec. III we introduce the basic ingredients of set theory that are needed to prove the completeness statements later on. In Sec. IV we show how to define a general computing model in terms of set theory and cardinality arguments. We will use these results in Sec. V to re-write UMMs, quantum computers and liquid-state machines in this set-theory language. This allows us to show explicitly in Sec. VI that UMMs are not only Turing-complete, but also quantum-complete and liquid-complete. In Sec. VII we offer our conclusions and thoughts for future work.

II. REVIEW OF MACHINE DEFINITIONS

We first provide a brief review of the definitions of the three machines we will be discussing in this paper, so the reader will have a basis of reference.

A. Universal Memcomputing Machines

The UMM is an ideal machine formed by interconnected memory cells (“memcells” for short, or memprocessors). It is set up in such a way that it has the properties we have anticipated: intrinsic parallelism, functional polymorphism, and information overhead [4]. We can define a UMM as the eight-tuple [4]:

$(M, \Delta, P, S, \Sigma, p_0, s_0, F)$ (1)

where $M$ is the set of possible states of a single memory cell, and $\Delta$ is the set of all transition functions:

$\delta_a : M^{m_a} \setminus F \times P \to M^{m'_a} \times P^2 \times A$ (2)

where $m_a$ is the number of cells used as input (being read), $F$ is the final state, $P$ is the set of input pointers, $m'_a$ is the number of cells used as output (being written), $P^2$ is the Cartesian product of the set of output pointers and the set of input pointers for the next step, and $A$ is the set of indices $a$. Informally, the machine does the following: every transition function has some label $a$, and the chosen transition function directs the machine to what cells to read as inputs. Depending on what is being read, the transition function then writes the output on a new set of cells (not necessarily distinct from the original ones). The transition function also contains information on what cells to read next, and what transition function(s) to use for the next step.

Using this definition, we can better explain what the two properties of intrinsic parallelism and functional polymorphism mean. Unlike a Turing machine, the UMM does not operate on each input cell individually. Instead, it reads the input cells as a whole, and writes to the output cells as a whole. Formally, this means that the transition function cannot be written as a Cartesian product of transition functions acting on individual cells, namely $\delta(\prod_i M_i) \neq \prod_i \delta_i(M_i)$. This is the property of intrinsic parallelism. We will later show that the set of transition functions without intrinsic parallelism is, in fact, a small subset of the set of transition functions with intrinsic parallelism.

Furthermore, the transition function of the UMM is dynamic, meaning that it is possible for the transition function to change after each time step. This is reflected in the set $A$ at the output of the transition function, whose elements indicate what transition functions to use for the next time step. This is the property of functional polymorphism, meaning that the machine can admit multiple transition functions. Finally, the topology of the network is encoded in the definition of the transition functions, which map a given state of the machine into another. A more in-depth discussion of these properties can be found in the original paper [4].
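To make intrinsic parallelism concrete, consider the following minimal Python sketch (our illustration, not taken from Ref. [4]; the binary cell states and the parity rule are assumptions chosen for the demo). It contrasts a factorizable, per-cell update with a collective update that depends on the whole state at once:

```python
# Illustrative sketch: a "per-cell" update, which factorizes over
# memprocessors, versus a "collective" update that reads the whole
# register at once, as intrinsic parallelism requires.
from typing import List

def per_cell_update(state: List[int]) -> List[int]:
    # Factorizable: each cell i is updated by its own delta_i,
    # independently of every other cell.
    return [(cell + 1) % 2 for cell in state]

def collective_update(state: List[int]) -> List[int]:
    # Non-factorizable: every output cell depends on the parity of the
    # whole register, so no product of single-cell maps reproduces it.
    parity = sum(state) % 2
    return [(cell + parity) % 2 for cell in state]

if __name__ == "__main__":
    s = [1, 0, 1]
    print(per_cell_update(s))    # [0, 1, 0] -- independent flips
    print(collective_update(s))  # [1, 0, 1] -- parity is 0, nothing flips
```

No assignment of independent single-cell maps reproduces the collective update, since the image of each cell depends on all the others through the parity.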

B. Liquid-State Machines

Informally, we can think of the LSM as a reservoir of water [16]. The process of sending the input signals into the machine is analogous to dropping stones into the water, and the evolution of the internal states of the machine is the propagation of the water waves. Different waveforms will be excited depending on which stones were dropped, and how and when they were dropped, and the properties of the waveforms will encode the information of the stones being dropped. Therefore, we can train a function that maps the waveforms to the corresponding output states that we desire.

Formally, we can define the machine using a set of filters and a trained function [13]. A series of filters defines the evolution of the internal states, and the trained function maps the internal states to some output. The set of filters must satisfy the point-wise separation property, defined as follows:

Definition 1. A class $B$ of basis filters has the point-wise separation property if for any two input functions $u$ and $v$, with $u(s) \neq v(s)$ for some $s \leq t$, there is a basis filter $b \in B$ such that $(b \circ u)(t) \neq (b \circ v)(t)$.

This means that we can choose a series of filters such that the evolution of the internal states will be unique to any given signal, at any given time. In other words, this property ensures that different “stones” excite different “waveforms”.
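As a numerical illustration of Definition 1 (ours; the leaky-integrator filters and the particular inputs are assumptions made for this sketch), the following Python snippet builds a tiny basis class $B$ of exponential filters and checks that two inputs differing at a single time are separated at a later time $t$:

```python
# Illustrative sketch: a family of leaky-integrator filters
# b_tau(u)(t) = sum_{s<=t} u(s) * exp(-(t-s)/tau) on a discrete time grid.
# The filter bank plays the role of the basis class B.
import math

def filter_output(u, t, tau):
    """Discrete-time leaky integral of u over s = 0..t, time constant tau."""
    return sum(u[s] * math.exp(-(t - s) / tau) for s in range(t + 1))

u = [0.0, 1.0, 0.0, 0.0, 0.0]   # a "stone" dropped at s = 1
v = [0.0, 0.0, 1.0, 0.0, 0.0]   # the same stone dropped at s = 2
t = 4
for tau in (0.5, 1.0, 2.0):      # a (tiny) basis class B
    bu, bv = filter_output(u, t, tau), filter_output(v, t, tau)
    print(f"tau={tau}: (b.u)(t)={bu:.4f}  (b.v)(t)={bv:.4f}  separated={bu != bv}")
```

Because the exponential tails decay at different rates, every filter in this bank already distinguishes the two inputs at time $t$, i.e., the two “waveforms” differ.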


The trained output function must satisfy the “fading memory” property, defined as follows:

Definition 2. $F : U \to \mathbb{R}^n$ has fading memory if for every internal state $u \in U$ and every $\epsilon > 0$, there exist $\delta > 0$ and $T > 0$ so that $|(Fv)(0) - (Fu)(0)| < \epsilon$ for all $v \in U$ with $|u(t) - v(t)| < \delta$ for all $t \in [-T, 0]$.

Intuitively, this means that we do not need to know the evolution of the internal states from the infinite past to determine a unique output. Instead, we only need to know the evolution starting from a finite time $-T$.

C. Quantum Computers

There are many ways in which one can define a quantum computer. For simplicity's sake, we consider the most general model of quantum computing and ignore its specific operations. Consider a quantum computer with $n$ identical qubits, where each qubit can be expressed as a linear combination of $m$ basis states. The choice of the basis states can be arbitrary; however, they have to span the entire Hilbert space of the system. If we look at one single qubit, then every state that it admits can be expressed as a linear combination of the $m$ basis states. (Typically $m$ is chosen equal to 2, but we do not restrict the quantum machine to this basis number here.) In other words, $|\psi\rangle = \sum_{i=0}^{m-1} c_i |i\rangle$, where $i$ simply labels the basis state, and $c_i$ is some complex number. Note that we have to impose the normalization condition $\sum_{i=0}^{m-1} |c_i|^2 = 1$.

Now, let us consider the whole system. In general, the total state can be expressed as $|\psi\rangle = \sum_{i_1=0}^{m-1} \sum_{i_2=0}^{m-1} \cdots \sum_{i_n=0}^{m-1} c_{i_1 i_2 \ldots i_n} |i_1 i_2 \ldots i_n\rangle$. Here, $i_j$ denotes that the $j$-th qubit is in the $i$-th basis state. A basis state of the total wavefunction can be expressed as the tensor product of the basis states of the individual qubits. Then it is not hard to see that the total state will have a total number of $m^n$ basis states. Each basis state is associated with some complex factor $c_{i_1 i_2 \ldots i_n}$ where, again, the normalization condition $\sum_{i_1=0}^{m-1} \sum_{i_2=0}^{m-1} \cdots \sum_{i_n=0}^{m-1} |c_{i_1 i_2 \ldots i_n}|^2 = 1$ has to be imposed.

Any quantum algorithm can be expressed as a series of unitary operations [17]. Since the product of multiple unitary operations is still a unitary operation, we can then express the total operation after time $t$ with a single operator $\hat{U}(t)$, so that $|\psi(t)\rangle = \hat{U}(t)|\psi(0)\rangle$. In other words, the state of the quantum system at any point in time can be expressed as some unitary operation on the initial state. Note that $\hat{U}(t)$ can be either continuous or discrete in time. Either way, $\hat{U}(t)$ can be considered as the “transition function” of the quantum computer.

Finally, we have to make measurements on the system in order to convert the quantum state into some meaningful output for the observable we are interested in. The process of measurement can be considered as finding the expectation value of some observable $\hat{O}$, so the output function of a quantum computer can be written as $\langle\psi(t)|\hat{O}|\psi(t)\rangle = \langle\psi(0)|\hat{U}^\dagger(t)\hat{O}\hat{U}(t)|\psi(0)\rangle$. Of course, to obtain an accurate result for the expectation value, many measurements have to be made. In fact, the initial state has to be prepared multiple times, evolved multiple times, and the corresponding expectation value at a given time must be measured multiple times. A quantum computer is thus an intrinsically probabilistic type of machine.
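The picture above can be made concrete with a toy state-vector simulation. The following Python/NumPy sketch (our own; the choice of gates and observable is an assumption for the demo, with $n = 2$ and $m = 2$) evolves an initial state with a unitary $\hat{U}(t)$ and computes the output $\langle\psi(t)|\hat{O}|\psi(t)\rangle$:

```python
# Illustrative sketch: state-vector simulation of the abstract picture above
# for n = 2 qubits, m = 2 basis states each. The specific gates (a Hadamard
# on the first qubit followed by a CNOT) are our choice for the demo; any
# product of unitaries would do.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Hadamard
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]])

U = CNOT @ np.kron(H, I2)                        # total unitary U(t)
psi0 = np.zeros(4); psi0[0] = 1.0                # |psi(0)> = |00>
psi_t = U @ psi0                                 # |psi(t)> = U(t)|psi(0)>

# Normalization is preserved: the m^n = 4 amplitudes satisfy sum |c|^2 = 1.
assert np.isclose(np.vdot(psi_t, psi_t).real, 1.0)

# Output function: expectation value <psi(t)| O |psi(t)>, here with
# O = Z x I (Pauli-Z on the first qubit).
Z = np.diag([1.0, -1.0])
O = np.kron(Z, I2)
print(np.vdot(psi_t, O @ psi_t).real)            # 0.0 for this Bell state
```

On actual hardware this number would be estimated statistically from repeated preparations and measurements, which is the probabilistic aspect noted above.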

III. MATHEMATICAL TOOLS

After the description of the three machines we consider in this work, we now introduce the necessary mathematical tools that will allow us to construct a general model of a computing machine using set theory and cardinality arguments. Most of the definitions and theorems in this section, with their detailed proofs, can be found in the literature on the subject (see, e.g., the textbook [18]).

We denote the cardinality of the set of all natural numbers as $\aleph_0$ ($|\mathbb{N}| = \aleph_0$), and the cardinality of the set of all real numbers as $c$ ($|\mathbb{R}| = c$). Then, one can prove the following theorems [19]:

Theorem III.1. The power set of the natural numbers has cardinality $c$: $|2^{\mathbb{N}}| = 2^{\aleph_0} = c$.

Theorem III.2. Any open interval of real numbers has cardinality $c$.

We can generalize the concept of infinity by introducing the Beth numbers [20], defined as follows:

Definition 3. Let $\beth_0 = \aleph_0$, and $\beth_{\alpha+1} = 2^{\beth_\alpha}$ for all $\alpha \in \mathbb{N}$.

By this definition, we see that $c = \beth_1$ and $2^c = \beth_2$. Each Beth number is strictly greater than the one preceding it. The following theorem [18] allows us to perform arithmetic on infinite cardinal numbers, and to derive relationships between Beth numbers.

Theorem III.3. Given any two cardinal numbers $\mu$ and $\kappa$, if at least one of them is infinite, then $\mu + \kappa = \mu\kappa = \max\{\mu, \kappa\}$.

Using this theorem, we can prove the following properties of Beth numbers:

Corollary III.3.1. $\beth_\beta \beth_\alpha = \beth_\beta + \beth_\alpha = \beth_\beta$ for all $\alpha, \beta \in \mathbb{N}$ with $\alpha \leq \beta$.

Corollary III.3.2. $\mu^{\beth_\alpha} = \beth_{\alpha+1}$ if $2 \leq \mu \leq \beth_{\alpha+1}$; $(\beth_\alpha)^\kappa = \beth_\alpha$ if $1 \leq \kappa \leq \beth_{\alpha-1}$.

Proof. Corollary III.3.1 can be proven trivially if we note that each Beth number is greater than its predecessor.

Proof. Corollary III.3.2 can be proven as follows. By definition, $2^{\beth_\alpha} = \beth_{\alpha+1}$. Furthermore, $(\beth_{\alpha+1})^{\beth_\alpha} = (2^{\beth_\alpha})^{\beth_\alpha} = 2^{\beth_\alpha \beth_\alpha} = 2^{\beth_\alpha} = \beth_{\alpha+1}$, where $\beth_\alpha \beth_\alpha = \beth_\alpha$ from Corollary III.3.1. Since $2 \leq \mu \leq \beth_{\alpha+1}$, then $\beth_{\alpha+1} = 2^{\beth_\alpha} \leq \mu^{\beth_\alpha} \leq (\beth_{\alpha+1})^{\beth_\alpha} = \beth_{\alpha+1}$. This implies $\mu^{\beth_\alpha} = \beth_{\alpha+1}$. The second equality can be proven in the same vein.
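As a short worked instance of these rules (our example; it uses only Definition 3 and Corollary III.3.2):

```latex
\begin{align*}
  \aleph_0^{\aleph_0} &= \beth_0^{\beth_0} = \beth_1
      && (\mu = \beth_0 \text{ satisfies } 2 \le \mu \le \beth_1,\ \alpha = 0), \\
  c^{\aleph_0} &= (\beth_1)^{\beth_0} = \beth_1
      && (\kappa = \beth_0 \text{ satisfies } 1 \le \kappa \le \beth_{1-1} = \beth_0).
\end{align*}
```

The first line is exactly the computation that will give the cardinality $|S|^{|S|}$ of the full set of transition functions on a countably infinite state set.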


In the following section, we will define computing models using Cartesian products and mapping functions. The following two theorems will be helpful [18]:

Theorem III.4. Let the set $S$ be the Cartesian product of the sets $S_1$ and $S_2$, or $S = S_1 \times S_2$. Then $|S| = |S_1||S_2|$.

Theorem III.5. Let $f : S \to T$ be a function that maps a set $S$ to a set $T$, and let $F$ be the set of all possible such functions $f$. Then $|F| = |T|^{|S|}$.

Finally, we introduce the following theorem, which can be derived directly from the definition of cardinality [18]:

Theorem III.6. Two sets have the same cardinality if and only if there is a bijection between them.

This theorem implies that there is a bijection between any two real coordinate spaces, regardless of their dimensions:

Corollary III.6.1. There is a bijection between $\mathbb{R}^n$ and $\mathbb{R}^m$, for any $n, m \in \mathbb{N}$.

Proof. From Corollary III.3.2, we see that $|\mathbb{R}^n| = |\mathbb{R}|^n = (\beth_1)^n = \beth_1$. Similarly, $|\mathbb{R}^m| = \beth_1$. Therefore, the two sets have the same cardinality, so there is a bijection between the two.

Note that for complex coordinate spaces, $|\mathbb{C}^n| = |\mathbb{C}|^n = |\mathbb{R}^2|^n = |\mathbb{R}|^{2n} = (\beth_1)^{2n} = \beth_1$. Therefore, there is a bijection between any two complex coordinate spaces as well. Furthermore, there is a bijection between any complex coordinate space and any real coordinate space.

IV. GENERAL COMPUTING MODEL

We have now introduced all the mathematical tools necessary to define a computing machine using set theory. We begin by describing the Cartesian product of two sets: the set of all internal states and the set of all transition functions.

A. Internal States

Consider a general computing machine. We let the state variable $s$ describe the full internal state of the machine, which belongs to a set $S$ of states. The internal state should encompass all the necessary information, such that given this state variable $s$ and any transition function $\delta$ (which will be defined shortly), the machine can fully determine the next internal state. The important thing to note is that not all the elements of $S$ are necessarily possible internal states of the machine. In other words, the set of all possible internal states is only a subset of $S$; the set $S$ and this subset are not necessarily equivalent. The reason for this will be explained in greater detail later, in Sec. IV-C.

As an example, consider the full internal state of the Turing machine. The internal state should include the Cartesian product of three states: the register state of the control ($s^{(1)}$), the tape symbols written on the cells ($s^{(2)}$), and the current address of the head ($s^{(3)}$) [21]. Given these three states and some transition function (which depends on how the machine is coded), the processor will know what to write (thereby changing $s^{(2)}$), in what direction to move (thereby changing $s^{(3)}$), and what new state to be in (thus changing $s^{(1)}$). We see that the set $S$ is expressible as a Cartesian product of three sets, $S = S^{(1)} \times S^{(2)} \times S^{(3)}$; then $|S| = |S^{(1)}||S^{(2)}||S^{(3)}|$.

Consider a Turing machine with $m$ tape symbols, $n$ tape cells, and $k$ register states. It is easy to see that $|S^{(1)}| = k$, $|S^{(2)}| = m^n$, and $|S^{(3)}| = n$. Therefore, we can calculate $|S| = |S^{(1)}||S^{(2)}||S^{(3)}| = k\,m^n\,n < |\mathbb{N}| = \beth_0$. The strict inequality comes from the fact that $m$, $n$, and $k$ are all finite numbers. In other words, this is a finite digital/discrete machine and, in general, $|S| < \beth_0$ holds for any finite digital machine.

On the other hand, if we consider the theoretical model of a Turing machine with an infinite number $\mu$ of tape cells, then the calculation changes significantly. First, note that $\mu = |\mathbb{N}| = \beth_0$, because we are working with infinitely many discrete cells and we can map each cell to an integer in $\mathbb{N}$. Then, we have $|S^{(1)}| = k$, $|S^{(2)}| = m^{\beth_0} = \beth_1$ (from Corollary III.3.2), and $|S^{(3)}| = \beth_0$. Therefore, we see that $|S| = |S^{(1)}||S^{(2)}||S^{(3)}| = k\,\beth_1\,\beth_0 = \beth_1$ (from Corollary III.3.1). For the purpose of proving Turing-completeness, we use this model of an infinite tape.

B. Transition Function

1) General Definition of Transition Function: The operation of any computing machine is defined using a transition function [21] (or a set of transition functions, as in a general UMM [4]). However, the transition functions we consider are defined as “deterministic”, in contrast to the “non-deterministic” transition functions used, for example, to define the non-deterministic Turing machine [3]. Here, we give a much more general definition of the (deterministic) transition function. Essentially, we are throwing away constraints such as initial states, accepting states, tape symbols, and so forth, and simply define the transition function as a mapping from some state to some other (or the same) state. We can then formally define the transition function as follows:

Definition 4. Let $S$ be any set; then $\varphi_S$ is said to be a transition function on $S$ if, for every $s \in S$, we have a unique $\varphi_S(s) \in S$. In other words, $\varphi_S$ is a function that maps $S$ to itself, or $\varphi_S : S \to S$.

We denote the set of all transition functions on $S$ as $\Phi_S$. From Theorem III.5, it is easy to see that the cardinality of $\Phi_S$ is simply $|\Phi_S| = |S|^{|S|}$.

Fig. 1. The set of all transition functions for $|S| = 2$. There are $2^2 = 4$ transition functions in total. A transition function on this set is described by two red arrows.
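The case of Fig. 1 can be enumerated directly. A minimal Python sketch (ours; the element labels are arbitrary) lists all $|S|^{|S|} = 2^2 = 4$ transition functions on a two-element set:

```python
# Illustrative sketch of Fig. 1 and of |Phi_S| = |S|^|S|: enumerate every
# transition function on a two-element set S.
from itertools import product

S = ("s0", "s1")
# Each function phi: S -> S is a choice of image for every element of S.
all_phis = [dict(zip(S, images)) for images in product(S, repeat=len(S))]

print(len(all_phis))          # 2^2 = 4, matching |S|^|S|
for phi in all_phis:
    print(phi)                # e.g. {'s0': 's0', 's1': 's0'}, ...
```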


Again, it is important to note that the set of the actual transition functions of a given machine is a subset of all possible transition functions $\Phi_S$. The two are generally not equivalent. This is because we are not defining the transition function based on the operation of the machine. Instead, we are defining the transition function as the mapping of a set to itself (see Fig. 1 for a schematic representation), so there are transitions that are impossible for the machine to support. The difference between the “possible” and “actual” transition functions will be formalized in Section IV-C.

2) Turing Machine as Example: For a Turing machine, after the machine is coded, the transition function remains stationary and cannot be changed during the execution of an algorithm (unlike a UMM, where the transition function is dynamic). In other words, the machine will take some initial internal state $s_i$ and apply some transition function $\varphi_S$ to it recursively until the final state $s_f$ is reached and the machine halts. We can express this as $s_f = \varphi_S(\ldots\varphi_S(\varphi_S(s_i)))$ with $n$ iterations, where $n$ is some

integer. Furthermore, when the final state is reached, we should have $s_f = \varphi_S(s_f)$, meaning that the transition function should not alter the final state; this represents the termination of the algorithm.

The entire machine process can be fully described given some initial state $s_i$ and some transition function $\varphi_S$. To show this, we only have to consider a single-step process, or show that there is an appropriate choice of $\varphi_S$ such that, given some state $s$, $\varphi_S(s)$ will always give us the expected next state for any algorithm. The following argument will demonstrate why this is true. From Section IV-A, we know that given a state $s_i$, we can divide the state into three components, $s_i = s_i^{(1)} \times s_i^{(2)} \times s_i^{(3)}$. Recall that the three components give the register state, the tape symbols on the tape cells, and the address of the head, respectively. First, the machine reads the tape symbol under the head. This is equivalent to the transition function taking the input $s_i^{(3)}$ (locating the head) and $s_i^{(2)}$ (reading the tape symbol). Then, according to what is being read and the register state of the control, the head writes a new symbol on the tape cell. This corresponds to the transition function taking the input $s_i^{(1)}$ (reading the register state) and outputting $s_{i'}^{(2)}$ (updating the tape symbol). Finally, the machine moves the head and changes the register state. This is equivalent to the transition function outputting $s_{i'}^{(1)}$ (updating the register state) and $s_{i'}^{(3)}$ (updating the address). Therefore, we see that a single-step machine process is fully encompassed in the transition $s_i \to s_{i'}$. For every $s_i \in S$, there exists a unique $s_{i'} \in S$ associated with it. Then we can choose $\varphi_S \in \Phi_S$ such that $\varphi_S(s_i) = s_{i'}$ is satisfied for every $s_i \in S$. Therefore, $\varphi_S$ fully describes the single-step machine process, and this implies that recursions of $\varphi_S$ can describe any Turing machine algorithm.
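The single-step argument can be summarized in a short Python sketch (our encoding, not from the text; the rule format and the toy program are assumptions). One transition function $\varphi_S$ acts on the full state $(s^{(1)}, s^{(2)}, s^{(3)})$ and is iterated to a fixed point:

```python
# Illustrative sketch: a single Turing-machine step realized as one
# transition function phi_S on the full internal state
# s = (s1, s2, s3) = (register state, tape contents, head address).
def phi_S(state, rules):
    """One TM step. `rules` maps (register, read symbol) ->
    (new register, written symbol, head move)."""
    reg, tape, head = state
    if (reg, tape[head]) not in rules:          # halting: phi_S(s_f) = s_f
        return state
    new_reg, write, move = rules[(reg, tape[head])]
    new_tape = tape[:head] + (write,) + tape[head + 1:]
    return (new_reg, new_tape, head + move)

# Toy rules: in register 'A', flip 0 -> 1 and move right; otherwise halt.
rules = {("A", 0): ("A", 1, +1)}
s = ("A", (0, 0, 1), 0)
while True:
    s_next = phi_S(s, rules)
    if s_next == s:                              # fixed point = final state
        break
    s = s_next
print(s)                                         # ('A', (1, 1, 1), 2)
```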

Even though one can find a $\varphi_S \in \Phi_S$ for every Turing machine algorithm, the reverse is not true. In fact, there are transition functions in $\Phi_S$ that the Turing machine cannot support. For example, consider the operation of simultaneously writing two tape symbols on two cells. This cannot be done by the Turing machine since, by definition, a Turing machine can only write one tape symbol on one tape cell at a time. However, this transition function is not excluded a priori from $\Phi_S$, because we can always find a $\varphi_S \in \Phi_S$ such that $s_i^{(2)}$ and $s_{i'}^{(2)}$, with $s_{i'} = \varphi_S(s_i)$, differ by two tape symbols for some $s_i \in S$.

Fig. 2. Let $S$ and $T$ be two sets of the same cardinality, and label the set elements such that $s_i$ pairs with $t_i$. The blue arrows represent a single transition function for each set. Informally, the pairing of the two transition functions under the “proper bijection” is represented by the purple arrows that pair the arrows in set $S$ with their respective counterparts in set $T$.

3) Proper Bijection: Let us first consider the following important theorem:

Theorem IV.1. If $|S_1| = |S_2|$, then $|\Phi_{S_1}| = |\Phi_{S_2}|$.

The theorem can be easily proven using cardinality arguments (from Theorem III.6). In other words, if there exists a bijection between two sets, then there must also be a bijection between the sets of their respective transition functions. Note that at this point, we have only proved that there is a bijection between the two transition function sets. However, in our case, we are really only interested in one particular bijection. For simplicity, let us define this bijection for the $|S| = \beth_1$ case. This definition generalizes easily to cases where $|S| > \beth_1$.

If $|S| = \beth_1$, then there is a bijection from $S$ to $\mathbb{R}$. Denote this bijective function as $R = f(S)$. Let $\varphi_S \in \Phi_S$ be some transition function on $S$, and for every $s_i \in S$, let $s_j = \varphi_S(s_i)$, where $i, j \in \mathbb{R}$ are just arbitrary real-number labels. Then we can define a bijection from $S$ to $\mathbb{R}$ such that $f(s_i) = i$ and $f(s_j) = j$. Note that if we let $\varphi_R = f \circ \varphi_S \circ f^{-1}$, then $j = \varphi_R(i)$. Now let $T$ be another set such that $|T| = |S|$; then, similarly, we can let $g(t_i) = i$ and $g(t_j) = j$. Now if we let $\varphi_T = g^{-1} \circ \varphi_R \circ g$, then $t_j = \varphi_T(t_i)$. We can then define $\varphi_T$ as the proper bijection of $\varphi_S$, and we do this for every single element of the transition function set. To put this informally, we first define some bijection from set $S$ to set $T$; then, if $s$ bijects to $t$, then $\varphi_S(s)$ also bijects to $\varphi_T(t)$. The proper bijection essentially maps all machine operations from one machine to another machine. See Fig. 2 for a schematic representation of a pairing of transition functions under the proper bijection.

The following theorem is a restatement of the above discussion:


Theorem IV.2. If $|S| = |T|$, then there is a proper bijection from $\Phi_S$ to $\Phi_T$.

It is easy to see that a proper bijection implies that any algorithm that the machine $S \times \Phi_S$ can run can also be run by the machine $T \times \Phi_T$. This essentially means that $S$ and $T$ are equivalent [22]. However, this is under the assumption that there are absolutely no constraints whatsoever on the transition functions and/or states of the machines. As we will see shortly, this is generally not the case.

C. Constraints

The above naive argument essentially states that if the internal states of two different machines have the same cardinality, then the two machines are equivalent. This is obviously not the case. Any computational model always has some constraints, either on the internal states or the transition functions (or both) of the machine at hand. In other words, the actual set of internal states $S'$ is a subset of $S$, and the actual set of transition functions $\Phi'_S$ is a subset of $\Phi_S$. In some sense, the computational structure of a machine is defined by its constraints, so we should carefully discuss what constraints actually are.

1) Constraints on Internal States: We make the following distinction between the full set of internal states and the actual set of internal states.

Definition 5. Let $S'$ be the set of all possible internal states that a machine can support. We call this the actual set of internal states. Let $S$ be some arbitrary superset of $S'$. We call $S$ the full set of internal states.

As an example, consider a very simple computer that consists of only two bits, with each bit supporting two states, 0 and 1. However, the two bits are connected in such a way that they must have the same state at a given time. In other words, the actual set of internal states is $S' = \{00, 11\}$, where $ij$ denotes the first bit being in state $i$ and the second bit being in state $j$. A natural choice for the full set is instead $S = \{00, 01, 10, 11\}$, namely the set of internal states if the two bits were not connected (constrained). However, we may as well have chosen $S$ to be, for example, $S = \{00, 01, 10\}$ or $S = \{00, 01, 10, 11, -10\}$. (Note that the last element of the second set, $-10$, represents a state that is not supported by the machine, even if the constraint were to be removed.) However, it is usually convenient for us to choose the full set such that it includes, and only includes, all the possible states when the constraints on the machine are removed.

2) Constraints on Transition Functions: Following a similar reasoning, we distinguish between the full set of transition functions and the actual set of transition functions. The full set of transition functions on $S$ has already been defined in Def. 4, and the definition of the actual set is as follows:

Definition 6. Let $\Phi'_S$ be the set of all transition functions of a machine when the constraints on the machine are included. We call this the actual set of transition functions.

Following the above example, $S = \{00, 01, 10, 11\}$ and $S' = \{00, 11\}$. For convenience, let us express the set elements in decimal representation. Then $S = \{0, 1, 2, 3\}$ and $S' = \{0, 3\}$. We can then choose $\Phi'_S$ so that it contains only 4 transition functions, which we take to be $\varphi_{S,1}(n) = n$, $\varphi_{S,2}(n) = (n + 1) \pmod 4$, $\varphi_{S,3}(n) = (n + 2) \pmod 4$, and $\varphi_{S,4}(n) = (n + 3) \pmod 4$. Note that this is obviously not the full set of transition functions because, for example, there is no transition function such that $\varphi_{S,i}(0) = 2 \wedge \varphi_{S,i}(2) = 3$ (where $\wedge$ is the logical AND) is satisfied. In fact, the full set of transition functions has cardinality $4^4 = 256$, and we merely considered 4 of them.

In addition, the sets $\Phi'_S$ and $\Phi_{S'}$ are not the same. $\Phi_{S'}$ is the full set of transition functions on the actual set of internal states $S'$. To illustrate this difference, consider the previous example of $S' = \{0, 3\}$. Let us try to find the full set of transition functions on this set. There should be $2^2 = 4$ possible transition functions. They are $\varphi_{S',1}(n) = n$, $\varphi_{S',2}(n) = 3 - n$, $\varphi_{S',3}(n) = 3$, and $\varphi_{S',4}(n) = 0$. Without any formal proof, we can already see that $\Phi_{S'}$ is very different from $\Phi'_S$, and there is no proper bijection between $\Phi'_S$ and $\Phi_{S'}$ even though they have the same cardinality of 4. This is because the sets that the transition functions act on are different (they have different cardinality), so it makes no sense to pair them together.

An important aside is that a transition function is defined by its operation on every element of a particular set, and not by its analytic representation. For example, $\varphi_{S'}(n) = 3 - n$ is different from $\varphi_S(n) = 3 - n$ even though they have the same analytic expression. (For example, $\varphi_{S'}(2)$ is not defined, while $\varphi_S(2) = 1$.) On the other hand, $\varphi_{S',i}(n) = 3 - n$ and $\varphi_{S',j}(n) = 3 - n^2/3$ are the same on the set $S'$ even though their analytic expressions are different. (They are, however, not the same on the set $S$.)

D. Equivalence and Completeness

At this point we can recall the traditional definitions of “equivalence” and “completeness” among machines. The equivalence relationship is defined as follows [22]:

Definition 7. Two machines, $S \times \Phi'_S$ and $T \times \Phi'_T$, are said to be equivalent if $S \times \Phi'_S$ can simulate $T \times \Phi'_T$, and $T \times \Phi'_T$ can simulate $S \times \Phi'_S$.

Then the following theorem is clearly true:

Theorem IV.3. Two machines, $S \times \Phi'_S$ and $T \times \Phi'_T$, are equivalent if $|S| = |T|$ and there is a proper bijection between $\Phi'_S$ and $\Phi'_T$.

This theorem is fairly obvious. If there is a proper bijection between two sets of transition functions, then we can map every machine operation from the first machine to the second machine, and vice versa. This implies that the two machines can simulate each other. Note that from Section IV-B3, it is clear that there is always a proper bijection between the two full sets of transition functions $\Phi_S$ and $\Phi_T$ if $|S| = |T|$; so, if two equal-sized machines admit the full sets of transition functions, then the two machines are equivalent.
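The two-bit example lends itself to direct enumeration. The following Python sketch (our illustration; the decimal encoding matches the example above) constructs the constrained set $\Phi'_S$ and the full set $\Phi_{S'}$, confirming that they have the same cardinality of 4 while acting on different sets:

```python
# Illustrative enumeration of the two-bit example: the actual states
# S' = {0, 3} versus the full set S = {0, 1, 2, 3}.
from itertools import product

S_full = (0, 1, 2, 3)
S_actual = (0, 3)

# The four constrained transition functions Phi'_S chosen in the example:
phi_prime_S = [lambda n, k=k: (n + k) % 4 for k in range(4)]

# The full set Phi_{S'} of transition functions on the actual states:
phi_S_actual = [dict(zip(S_actual, images))
                for images in product(S_actual, repeat=len(S_actual))]

print(len(phi_prime_S), len(phi_S_actual))   # 4 4 -- same cardinality, but
# they act on different sets, so no proper bijection pairs them.
print([phi(0) for phi in phi_prime_S])       # [0, 1, 2, 3] on the full set S
print(phi_S_actual)                          # maps {0, 3} -> {0, 3} only
```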


Now, the natural next step is to define the concept of completeness, but before we do so, let us first introduce the concept of a reduced machine.

Definition 8. Consider a machine $S \times \Phi'_S$. If $S' \subset S$, then we call $S' \times \Phi'_{S'}$ the reduced machine of $S \times \Phi'_S$.

It is easy to obtain mathematically a reduced machine from a full machine. First, we let $S'' = S \setminus S' = \{s \,|\, s \in S \wedge s \notin S'\}$; then we can express the full set as $S = S' \cup S''$ (note this is only true if $S' \subseteq S$). It is not hard to show that $\Phi'_{S'} \times \Phi'_{S''} \subseteq \Phi'_S$, and from this subset we can simply “ignore” the $\Phi'_{S''}$ factor. Then it is clear that $S \times \Phi'_S$ simulates $S' \times \Phi'_{S'}$, but not the other way around. Informally, we are using a machine with greater “resources” to simulate another machine with lesser “resources”.

Let us then recall the conventional definition of completeness [22]:

Definition 9. A machine $S \times \Phi'_S$ is $T \times \Phi'_T$-complete if the former can simulate the latter.

From this definition, it is clear that, for the example above, $S \times \Phi'_S$ is $S' \times \Phi'_{S'}$-complete. Note also that equivalence implies completeness, but the reverse is not true.

It is also obvious that $S \times \Phi_S$ can simulate $S \times \Phi'_S$, since $\Phi'_S$ is just a subset of the full set of machine processes (transition functions). Then we can also say that $S \times \Phi_S$ is $S \times \Phi'_S$-complete. Informally, we are using an unconstrained machine to simulate a constrained machine.

Let us then recall a few obvious lemmas:

Lemma IV.4. If machine A is equivalent to machine B, and machine B is equivalent to machine C, then machine A is equivalent to machine C.

Lemma IV.5. If machine A is B-complete, and machine B is C-complete, then machine A is C-complete.

With these lemmas and preliminaries, we can then prove this important theorem:

Theorem IV.6. Machine $S \times \Phi'_S$ is $T \times \Phi'_T$-complete if $|T| \leq |S|$, and if we can find a reduced machine $S' \times \Phi'_{S'}$ such that $|S'| = |T|$ and there is a proper bijection between $\Phi'_T$ and some subset of $\Phi'_{S'}$.

Proof. The proof is quite simple. We know that $S \times \Phi'_S$ is $S' \times \Phi'_{S'}$-complete, since the latter is just the reduced machine of the former. Furthermore, we also know that $S' \times \Phi'_{S'}$ is $T \times \Phi'_T$-complete. This is because $\Phi'_T$ is just a subset of $\Phi'_{S'}$, or there is a “proper injection” from $\Phi'_T$ to $\Phi'_{S'}$, implying that any machine process of $T \times \Phi'_T$ can be simulated by $S' \times \Phi'_{S'}$. Therefore, from Lemma IV.5, we see that $S \times \Phi'_S$ is $T \times \Phi'_T$-complete.

From this theorem, we can immediately derive a corollary that will be useful for showing the universality of memcomputing machines in Section VI:

Corollary IV.6.1. Machine $S \times \Phi_S$ is $T \times \Phi'_T$-complete if $|T| \leq |S|$, where $\Phi_S$ is the full set of transition functions and $\Phi'_T$ can be either the actual set or the full set of transition functions.

Proof. First, we find a reduced machine $S' \times \Phi_{S'}$ of $S \times \Phi_S$ such that $|S'| = |T|$. Note that if a machine admits the full set of transition functions, then its reduced machine also admits the full set. Since $|S'| = |T|$, we can find a proper bijection from $\Phi_{S'}$ to $\Phi_T$ (since both sets of transition functions are full). Under the same proper bijective mapping, we can map $\Phi'_T$ (a subset of $\Phi_T$) to $\Phi'_{S'}$ (a subset of $\Phi_{S'}$). In other words, there is a proper bijection from $\Phi'_T$ to some subset of $\Phi_{S'}$, and from Theorem IV.6, machine $S \times \Phi_S$ is $T \times \Phi'_T$-complete.

V. REVISED MACHINE DEFINITIONS WITHIN SET THEORY

Up to this point, we have avoided the discussion of the concept of “output states” of a machine. However, under this new mathematical framework, this concept is easily expressible. In general, the internal state of the machine must be “decoded” into an output state to be read by the user. We can denote the set of all possible output states as $F$. Then it is obvious that $|F| \leq |S'|$, otherwise we would not be able to find a function that maps $S'$ onto $F$. In other words, every internal state must correspond to a unique output state, and it is easy to show that the output function is expressible as a transition function. To show this, we first choose a subset $S'_F \subseteq S'$ such that $|S'_F| = |F|$; then there is a bijection between $F$ and $S'_F$. Therefore, we can simply describe the output mapping function as a transition function that maps $S'$ to $S'_F$. Then it is clear that the set of all output mapping functions is a subset of $\Phi_{S'}$, so there is no need to define a new set to include the output functions.

The mathematical framework has now been fully established, and we are ready to redefine the three machines we are considering in this work within this framework.

A. Universal Memcomputing Machines

In the original definition of the UMM, there is the complication of input and output pointers (see Eq. (1)). Within the new set-theory framework, we can avoid the concept of pointers, since the transition function always reads the full internal state (all the cells) and writes the full internal state. In other words, we consider the combination of all cell states as a whole, and make no effort to describe which cell admits which state. In this case, intrinsic parallelism is obviously implied.

Let us then discuss the cardinality of the set of internal states. For a UMM, there is a finite number of cells, $n < \beth_0$, and each cell may admit a continuous state (with cardinality $\beth_1$). In this case, it is easy to show that $|S| = (\beth_1)^n = \beth_1$. Furthermore, in the original definition of the UMM [Eq. (1)], there are no constraints on the transition function $\delta$. This means that we can use the full set of transition functions, $\Phi_S$, to describe the machine. In this case, functional polymorphism is obviously implied. Therefore, we can define the UMM machine as $S \times \Phi_S$, with $|S| = \beth_1$.

B. Liquid-State Machine

It is not hard to see that the internal state structures of the LSM and the UMM are similar. Instead of memory cells, the


LSM has neurons. But if we make the conservative assumption that there are no constraints on the internal states of the LSM, then the cardinality of the set of internal states for this machine is the same as that of a UMM, $|S| = \beth_1$.

However, what really distinguishes the LSM from the UMM is that the set of transition functions for the LSM is not full. Recall that the LSM consists of a series of filters and an output function (see Sec. II). The set of filters satisfies the point-wise separation property, and the output function satisfies the fading-memory property. There is no need to express the two properties in the language of our new framework. Instead, it is enough to note that the point-wise separation property is a property of a set, while the fading-memory property is a property of an element of the set. Therefore, we can find a subset $(\Phi_S)_1 \subseteq \Phi_S$ such that its elements represent the filters, with the subset itself satisfying the point-wise separation property. As discussed earlier, we can express the output function as a transition function, so we can find a subset $(\Phi_S)_2 \subseteq \Phi_S$ such that its elements represent the output functions, and they satisfy the fading-memory property. Then, we can take the union of the two subsets, $\Phi'_S = (\Phi_S)_1 \cup (\Phi_S)_2$, to get the actual set of transition functions on $S$.

Therefore, we can describe the LSM as $S \times \Phi'_S$, with $|S| = \beth_1$. The specific structure of the machine can be defined by expressing the two properties as constraints on $\Phi'_S$. This is a slightly tedious process, so we will not present it here, since it is irrelevant for our conclusions.

C. Quantum Computers

Again, consider a quantum computer with $n$ identical qubits, each having $m$ basis states. In subsection II-C, we have shown that the total number of basis states for the entire system is $m^n < |\mathbb{N}|$. Each basis state is associated with some complex factor $c_{i_1 i_2 \ldots i_n} \in \mathbb{C}$, and these factors are constrained by the normalization condition $\sum_{i_1=0}^{m-1} \sum_{i_2=0}^{m-1} \cdots \sum_{i_n=0}^{m-1} |c_{i_1 i_2 \ldots i_n}|^2 = 1$. Furthermore, in practice we usually ignore an overall phase factor, since it does not affect the expectation value of an observable. At this point, it is clear that we can fully describe a quantum state as the Cartesian product of the complex factors for all basis states. In other words, we can represent an internal state as $c_{0_1 0_2 \ldots 0_n} \times c_{1_1 0_2 \ldots 0_n} \times \cdots \times c_{(m-1)_1 (m-1)_2 \ldots (m-1)_n}$, where there are $m^n$ factors.

Given this information, we can calculate the cardinality of the full set of internal states to be $|S| = |\mathbb{C}^{m^n}| - |\mathbb{R}^2| = |\mathbb{C}^{m^n}| \leq |\mathbb{C}^{\mathbb{N}}| = |\mathbb{C}|^{|\mathbb{N}|} = (|\mathbb{R}|^{|\mathbb{N}|})^2 = (\beth_1^{\beth_0})^2 = (\beth_1)^2 = \beth_1$. (The unimportant $|\mathbb{R}^2|$ comes from the normalization condition and from factoring out the overall phase factor.) In addition, $|S| = |\mathbb{C}^{m^n}| \geq |\mathbb{R}| = \beth_1$. Therefore, we have $|S| = \beth_1$.

The full set of internal states contains quantum states with varying degrees of entanglement. It is worth stressing, though, that in practice it is extremely hard to construct a quantum computer that can support the full set of quantum states. For example, it is very challenging to prepare 100 qubits that are fully entangled. (The current record is on the order of tens of fully entangled qubits [23], [24].) Therefore, the actual set $S'$ is

a small subset of $S$, or $S' \subseteq S$, unless $m$ and $n$ are both very small. However, since we are making here only theoretical arguments, let us just assume that the full set $S$ of all possible entangled states can be supported.

As discussed previously, the transition functions of a quantum computer can be expressed as unitary operations on some initial state. The set of all unitary operations is obviously a strict subset of the full set of transition functions. For example, one cannot find a unitary operation that collapses every single state to $|00\ldots0\rangle$ (setting $c_{0_1 0_2 \ldots 0_n} = 1$, and setting all the other factors to 0), though this map is included in $\Phi_S$. Therefore, we can describe the quantum computer as $S \times \Phi'_S$.

A few words on the “output function” of a quantum computer are also in order. The output function essentially represents the operation of taking the expectation value of some observable on the internal state, or $\langle\psi|\hat{O}|\psi\rangle$. This maps the set of internal states to the set of output states $F \subseteq \mathbb{R}$ (expectation values have to be real), so we obviously have $|F| \leq |S|$.

VI. UNIVERSALITY OF MEMCOMPUTING MACHINES

From the above discussions, we can summarize all the results we have obtained so far, and express all the machines considered here in their most general form:

• Turing Machine: $T' \times \Phi'_{T'}$, $|T'| \leq \beth_1$,
• Liquid-State Machine: $L \times \Phi'_L$, $|L| = \beth_1$,
• Quantum Computer: $Q \times \Phi'_Q$, $|Q| = \beth_1$,
• Universal Memcomputing Machine: $M \times \Phi_M$, $|M| = \beth_1$.

Therefore, by applying Corollary IV.6.1, we see that a UMM can simulate any Turing machine, any liquid-state machine, and any quantum computer. Let us expand on this for each pair of machines separately. In particular, let us briefly discuss how a mapping between a UMM and the three other machines can be realized in theory. Of course, this mapping does not tell us anything about the resources required for a UMM to simulate the other machines. Hence, this is by no means a discussion on how to realize an efficient or practical mapping.

1) UMM vs. Turing Machines: Let us look at the mapping between a Turing machine and a UMM. First, we map each tape cell to a memory cell (memcell). We can denote these memcells collectively as a “memtape”. The tape symbols can be mapped to the internal states of each memcell of the memtape. Then, we can map the state register to another memcell, which we will denote as the “memregister”. The state of the Turing machine is then stored as the internal state of the memregister. Finally, we can store the current address of the head as an internal state of yet another memcell, which we denote as the “memaddress”.

We can then wire the memcells together into a circuit such that it simulates the operation of the Turing machine. Note that, as a result of functional polymorphism, we do not have to re-wire the circuit each time we choose to run a different algorithm. The circuit first reads the memregister and the memaddress, so that it knows which memcell of the memtape to modify and how to modify it. After that memcell


is modified, the memregister and memaddress then update themselves to prepare for the next cycle. In short, we are replacing the tape, head, and control with memprocessors.

2) UMM vs. LSM: The mapping between an LSM and a UMM is fairly obvious. We simply have to map each “reservoir cell” to a memcell, and wire the circuit such that the point-wise separation and fading-memory properties are satisfied. The explicit construction of the circuit to realize such properties will not be explored here.

Although, in theory, it is possible to simulate an LSM with a UMM, it is not always efficient or necessary to do so in practice. The circuit topologies of the two machines are very different, and they are designed to perform different tasks. For the LSM model, the connections between the reservoir cells are typically random, and the reservoir as a whole is not trained. The expectation of getting the correct output relies entirely on training the output function correctly. In the end, the operation of the machine relies on statistical methods, and is inevitably prone to making errors. In some sense, the machine as a whole is analogous to a “learning algorithm” [25]. On the other hand, for the UMM, we can connect the memcells into a circuit specific to the tasks at hand. One realization of this connection employs self-organizing logic gates [5] to control the evolution of the machine such that it will always evolve towards an equilibrium state representing the solution to a particular problem (the machine is deterministic). In the general case, the UMM is an entirely different computing paradigm than the LSM, proven already to provide exponential speed-up for certain hard problems [5], [7]. In other words, while possible, utilizing a UMM to simulate an LSM will not exploit all the properties of the UMM to their full use. In practical applications, it would then be more advantageous to use a UMM (and its digital version, a DMM) to tackle directly the same problems investigated by LSMs.

3) UMM vs. Quantum Computers: Simulating a quantum computer with a UMM requires “compressing” the internal state of the quantum computer. Recall that a quantum computer has $m^n$ basis states, and each basis state is associated with some complex factor. All these complex factors would then need to be represented with only $k = O(n)$ interacting memcells. In this work we have shown that this is in fact doable on a theoretical level. However, as already mentioned, this result does not provide any information on the resources required for a UMM to simulate a quantum computer.

Nevertheless, one of the features that makes UMMs a practical and powerful model of computation is precisely their “information overhead”. Information overhead and quantum entanglement share some similarities: in some sense, both of them allow the machine to access a set of results of mathematical operations (without actually storing them) that is larger than that provided by simply the union of non-interacting processing units (cf. Refs. [4] and [8]). We could then argue that we may exploit the information overhead property of a UMM to represent efficiently the entanglement of a quantum system. At this point, however, this question is still open.

VII. CONCLUSIONS

In conclusion, we have employed set theory and cardinality arguments to describe the relation between universal memcomputing machines and other types of computing models, in particular liquid-state machines and quantum computers. Using this mathematical framework we have confirmed that UMMs are Turing-complete, a result already obtained in Ref. [4] using a different approach.

In addition, we have also shown that UMMs are liquid-complete (or reservoir-complete) and quantum-complete, namely they can simulate any liquid-state (or reservoir-computing) machine and any quantum computer. Of course, the results discussed here do not provide an answer to the question of what resources would be needed for a UMM to efficiently simulate such machines, only that such a mapping exists.

Along these lines, it would be interesting to study the relation between information overhead and quantum entanglement. If such a relation exists and can be exploited at a practical level, it may suggest how to utilize UMMs to efficiently simulate quantum problems that are currently believed to be only within reach of quantum computers (such as the efficient simulation of quantum Hamiltonians). Further work is however needed to address this practical question.

Acknowledgments – MD acknowledges partial support from the Center for Memory and Recording Research at UCSD.

REFERENCES

[1] M. Di Ventra and Y. V. Pershin, “The parallel approach,” Nature Physics, vol. 9, pp. 200–202, 2013.
[2] J. von Neumann, “First draft of a report on the EDVAC,” tech. rep., 1945.
[3] S. Arora and B. Barak, Computational Complexity: A Modern Approach. New York, NY, USA: Cambridge University Press, 1st ed., 2009.
[4] F. L. Traversa and M. Di Ventra, “Universal memcomputing machines,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 11, p. 2702, 2015.
[5] F. L. Traversa and M. Di Ventra, “Polynomial-time solution of prime factorization and NP-complete problems with digital memcomputing machines,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 27, p. 023107, 2017.
[6] F. L. Traversa, C. Ramella, F. Bonani, and M. Di Ventra, “Memcomputing NP-complete problems in polynomial time using polynomial resources and collective states,” Science Advances, vol. 1, no. 6, p. e1500031, 2015.
[7] F. Traversa, P. Cicotti, F. Sheldon, and M. Di Ventra, “Evidence of an exponential speed-up in the solution of hard optimization problems,” arXiv:1710.09278, 2017.
[8] P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,” SIAM J. Comput., vol. 26, pp. 1484–1509, Oct. 1997.
[9] A. Finnila, M. Gomez, C. Sebenik, C. Stenson, and J. Doll, “Quantum annealing: A new method for minimizing multidimensional functions,” Chemical Physics Letters, vol. 219, pp. 343–348, Mar. 1994.
[10] G. H. Low and I. L. Chuang, “Optimal Hamiltonian simulation by quantum signal processing,” Physical Review Letters, vol. 118, Jan. 2017.
[11] P. Benioff, “The computer as a physical system: A microscopic quantum mechanical Hamiltonian model of computers as represented by Turing machines,” Journal of Statistical Physics, vol. 22, pp. 563–591, May 1980.
[12] D. Korenkevych, Y. Xue, Z. Bian, F. Chudak, W. G. Macready, J. Rolfe, and E. Andriyash, “Benchmarking quantum hardware for training of fully visible Boltzmann machines,” 2016.
[13] W. Maass, T. Natschläger, and H. Markram, “Real-time computing without stable states: A new framework for neural computation based on perturbations,” Neural Computation, vol. 14, no. 11, pp. 2531–2560, 2002.


[14] M. Lukoševičius and H. Jaeger, “Reservoir computing approaches to recurrent neural network training,” Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.
[15] D. Verstraeten, B. Schrauwen, D. Stroobandt, and J. Van Campenhout, “Isolated word recognition with the liquid state machine: a case study,” Information Processing Letters, vol. 95, no. 6, pp. 521–528, 2005.
[16] H. Jaeger, “The echo state approach to analysing and training recurrent neural networks - with an erratum note,” Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, vol. 148, no. 34, p. 13, 2001.
[17] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information. Cambridge Series on Information and the Natural Sciences, Cambridge University Press, 10th Anniversary ed., 2010.
[18] H. B. Enderton, Elements of Set Theory. Academic Press, 1977.
[19] G. Cantor, “Ueber eine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen,” Journal für die reine und angewandte Mathematik, vol. 77, pp. 258–262, 1874.
[20] T. E. Forster, Set Theory with a Universal Set: Exploring an Untyped Universe. 1994.
[21] J. E. Hopcroft, R. Motwani, and J. D. Ullman, “Introduction to automata theory, languages, and computation,” ACM SIGACT News, vol. 32, no. 1, pp. 60–65, 2001.
[22] H. R. Lewis and C. H. Papadimitriou, Elements of the Theory of Computation. Prentice Hall PTR, 1997.
[23] T. Monz, P. Schindler, J. T. Barreiro, M. Chwalla, D. Nigg, W. A. Coish, M. Harlander, W. Hänsel, M. Hennrich, and R. Blatt, “14-qubit entanglement: Creation and coherence,” Physical Review Letters, vol. 106, no. 13, p. 130506, 2011.
[24] C. Song, K. Xu, W. Liu, C.-p. Yang, S.-B. Zheng, H. Deng, Q. Xie, K. Huang, Q. Guo, L. Zhang, et al., “10-qubit entanglement and parallel logic operations with a superconducting circuit,” Physical Review Letters, vol. 119, no. 18, p. 180511, 2017.
[25] Y. Zhang, P. Li, Y. Jin, and Y. Choe, “A digital liquid state machine with biologically inspired learning and its application to speech recognition,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 11, pp. 2635–2649, 2015.