MTHSC 641: Project Proposal
Parallelization of a Learning Algorithm for Neural-Net Decision Support

William M. Jones, Nathan DeBardeleben
February 21, 2001
1 Motivation
Artificial Neural Networks (ANNs) can provide decision support that is both adaptive and capable of acquiring knowledge and solving problems. These networks are often implemented as software simulations. As with any simulation, a set of well-defined parameters describes the structure and flow of the system, and ANNs are no exception. Arguably the most fundamental network structure used in many applications is the multilayer feed-forward ANN (FFANN). This architecture is characterized by several features, including the number of layers, the number of nodes in a given layer, the interconnection between adjacent layers, and individual node parameters. In addition to the network architecture itself, there is the significant burden of training the ANN. Classical learning algorithms such as first-order gradient descent often converge rather slowly but are less computationally expensive than more sophisticated techniques. To add to the complexity of the situation, there is the issue of the scalability of the algorithm as a function of the number of layers and nodes. To accommodate larger networks, there is a need to decrease the time required to train the ANN. Efforts to do so have traditionally taken the form of modifications to the learning algorithm, particularly the back-propagation (BP) phase: work has been done to modify the BP algorithm to converge more rapidly while maintaining learning accuracy. For this project we propose a parallel solution to the modified BP algorithm initially studied by Piramuthu et al. [2].
2 Proposal
For the purpose of this project, we will implement the modified BP algorithm presented in [2]. Additionally, we will implement the classic BP algorithm as a reference point. This first implementation will be sequential in nature. We will then validate a number of results regarding the overall performance of this algorithm compared with the classic BP algorithm. Given that the primary focus is to speed up the training procedure, we will parallelize the learning algorithm. This will most likely require that we formulate the FFANN as a series of matrix operations, which will put us in a position to leverage the considerable knowledge base on parallel matrix operations.
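As a concrete sketch of the matrix formulation we have in mind (the notation below is standard and is our own illustration, not taken from [2]), the forward pass through layer l and the classic BP weight update can be written as

\[
a^{(l)} = f\big(z^{(l)}\big), \qquad z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)},
\]
\[
\delta^{(L)} = f'\big(z^{(L)}\big) \odot \big(a^{(L)} - t\big), \qquad
\delta^{(l)} = f'\big(z^{(l)}\big) \odot \big(W^{(l+1)}\big)^{T} \delta^{(l+1)}, \qquad
\Delta W^{(l)} = -\eta\, \delta^{(l)} \big(a^{(l-1)}\big)^{T},
\]

where f is the node activation function, \odot is the element-wise product, t is the target vector for a squared-error criterion, L is the output layer, and \eta is the learning rate. Both the forward and backward phases are dominated by matrix-vector products (matrix-matrix products when patterns are presented in batches), which is precisely the kind of computation for which parallel techniques are well developed.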
We will investigate the issues and merits of parallelizing the FFANN for both shared-memory and distributed-memory systems, and we will then implement the FFANN on at least one of these platforms. After developing the parallel solution, we will adjust some of the parameters of the FFANN to determine the effect on learning convergence and accuracy.
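To make the shared-memory option concrete, the following is a minimal sketch of how a single layer's forward pass might be parallelized with OpenMP; the routine name layer_forward and the sigmoid activation are our own illustrative choices, not part of the algorithm in [2].

    #include <math.h>
    /* Compile with an OpenMP-capable compiler, e.g. "cc -fopenmp". */

    /* Forward pass for one layer: out = f(W*in + b), with W stored row-major
     * as an n_out x n_in matrix.  Each output node is independent, so the
     * rows of W can be divided among the threads. */
    void layer_forward(const double *W, const double *b, const double *in,
                       double *out, int n_out, int n_in)
    {
        int i, j;
        #pragma omp parallel for private(j)
        for (i = 0; i < n_out; i++) {
            double sum = b[i];
            for (j = 0; j < n_in; j++)
                sum += W[i * n_in + j] * in[j];
            out[i] = 1.0 / (1.0 + exp(-sum));  /* sigmoid node activation */
        }
    }

The same row-wise decomposition applies to the backward phase, since the delta and weight-update computations are also matrix products.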
3 Background
Before we can effectively parallelize the FFANN, we will need to analyze the structure of the computations involved in solving the problem. As with any parallel implementation, the inherent granularity, or time between communications, of the sequential algorithm dictates the way in which the problem can effectively be decomposed into disjoint parallel parts.

Another consideration for algorithm decomposition is the target parallel processing platform. Although there are several flavors, the vast majority of parallel systems can be divided into two, possibly overlapping, categories: shared-memory and distributed-memory systems. The salient distinguishing feature of these systems is the physical location of the memory. As the name implies, a distributed-memory system typically has physically separated memories with no global name space. This forces a style of communication between related parallel tasks commonly referred to as explicit message passing, a paradigm often best suited to solving problems with a large grain size. Shared-memory systems, on the other hand, have localized, tightly coupled main memories that allow related parallel tasks to share a common global name space. These systems are well suited to solving problems with a variety of grain sizes. Although they provide more flexibility, they are also more expensive and less scalable than their distributed-memory counterparts.
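As an illustration of the explicit message-passing style, one coarse-grain decomposition is to split the training patterns across processes, have each process run the sequential BP code on its own patterns, and combine the resulting gradients once per batch with a single collective operation. The sketch below assumes MPI; compute_local_gradient and apply_update are placeholders for the sequential BP code.

    #include <mpi.h>

    /* Placeholders for the sequential BP routines; declared here only so the
     * sketch is self-contained. */
    void compute_local_gradient(const double *weights, double *grad, int n);
    void apply_update(double *weights, const double *grad, int n, double eta);

    /* One batch-parallel training step: each process computes the error
     * gradient over its share of the training patterns, the gradients are
     * summed across all processes, and every process applies the same
     * weight update.  Only one communication occurs per batch, giving the
     * large grain size that distributed-memory systems favor. */
    void parallel_train_step(double *weights, double *grad, double *grad_sum,
                             int n_weights, double eta)
    {
        compute_local_gradient(weights, grad, n_weights);
        MPI_Allreduce(grad, grad_sum, n_weights, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        apply_update(weights, grad_sum, n_weights, eta);
    }

This pattern-parallel approach is coarse-grain and suits a cluster, while the matrix formulation described above also admits finer-grain node-parallel decompositions better suited to shared memory.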
4 Reference Material
Although this project was initially selected as a result of our inspection of the material presented in [2], we have since been able to collect some additional sources of information pertaining to ANNs [3] and their parallelization on specific hardware architectures [1]. We intend to bring these resources to bear on this project, and it is our intention to continue to expand our reference base as we move toward our ultimate goal.
5 Conclusion
This paper represents our tentative plan to meet the goals of the semester project in MTHSC 641, and it essentially serves as a contract between the group and the professor. If there are any questions, comments, or suggestions, please feel free to contact us at:
[email protected].
References

[1] Douglas Aberdeen, Jonathan Baxter, and Robert Edwards. 92¢/MFlops/s, Ultra-Large-Scale Neural-Network Training on a PIII Cluster. In Proceedings of the IEEE/ACM SC2000 Conference. IEEE Computer Society, November 2000.

[2] Selwyn Piramuthu et al. Learning Algorithms for Neural-Net Decision Support. ORSA Journal on Computing, 5:361-373, Fall 1993. Operations Research Society of America.

[3] Robert J. Schalkoff. Artificial Neural Networks. McGraw-Hill, Inc., 1997.