Signal and Image Processing in Java Jonathan Campbell and Fionn Murtagh University of Ulster, Magee College, Derry, BT48 7JL email:
[email protected]. Revisions available from http://www.infm.ulst.ac.uk/research/preprints.html Original paper presented at IMVIP '97 University of Ulster, Magee College, Derry 10-13 September 1997. 9 September 1997 Revised 20 September 1997 Revised 6 November 1997
Abstract
We describe the implementation of a multi-purpose data analysis laboratory, DataLab-J, in the programming language Java. We briefly trace the stages of the evolution of DataLab from a FORTRAN-IV system in 1973 to the current Java development. Describing this evolution allows us to discuss some key design and functionality decisions and issues that arose over the years; many of these issues remain topical, so, in addition to an evaluation of Java, we identify and discuss what are for us the major issues in the design of such software. Moreover, we address questions raised by the need to convert legacy systems, e.g. those programmed in C and various versions of FORTRAN. The experience of redesign and implementation in Java is described, together with a brief evaluation of the suitability of Java for 'number-crunching'. Overall conclusions are drawn regarding the design of such software, lessons learned, traps to avoid, and Java itself.
1 Introduction
As in most applied mathematics, research in signal processing, image processing, pattern classification, and signal estimation depends more and more on good software tools; especially important is easy-to-use interactivity with effective data visualisation. Of course, nowadays, there are a great number of commercially available packages, e.g. [Mathworks, 1997, Khoral, 1997], and numerous public-domain and shareware packages, e.g. [Eaton, 1997]. However, by the very nature of `research', workers in these areas often demand more than mere `canned solutions'. New algorithms must be implemented in the system, which is not always easy in third-party software. Memory and speed performance issues that can be considered as secondary in a multipurpose product may become crucial to the feasibility of, for example, lengthy Monte Carlo simulations. Further, valuable insight is often gained by probing algorithms and inspecting intermediate data. We have always held the view that, sooner or later, bought-in software will be found wanting, whether in general lack of functionality, in lack of programmability/extensibility, in lack of performance, etc.
In addition, there are libraries and (object-oriented) class hierarchies such as the Image Understanding Environment (IUE) [AAI, 1997]. IUE is public domain, is very comprehensive in covering apparently every `image processing' need, and caters for the range of abstractions from low-level pixel operations up to high-level `descriptions'. Moreover, IUE's aim of promoting cooperation and the sharing of research results is laudable. However, the sheer size and complexity of IUE seem inappropriate for our requirements. Issues of complexity and parsimony of design are addressed more explicitly later.
On the other hand, the alternative extreme, that of each researcher, even within a laboratory, `rolling their own' software in an ad hoc manner, cannot be countenanced. Not only will individuals waste much energy in developing infrastructural functions such as file handling and displays, but such practices will ensure the dissipation of any permanent technology base for the organisation. Just as human culture needs a language, so does a developing technology need its `language'. If we are to foster proper technology accumulation and retention, we need `reusability' in its widest sense.
The project/package that is described in this paper has its origins in the very different computational environment of the early 1970s. The Data Structure Laboratory [Morgan et al., 1973] was an early multivariate data analysis laboratory developed by a company whose business was sensors and systems. These were also early days for digital signal processing (DSP) and especially for digital image processing; in fact we were only just at the stage where we could discard analogue solutions (section 3). One of us has lived with this system since 1973, and in that time has seen it evolve in some ten stages: from FORTRAN IV on a small single-user minicomputer, to larger multitasking and virtual memory minicomputers, to mainframes; then to MS-DOS and an Intel 8088 based PC, still in FORTRAN. Next came the C programming language, still on the ubiquitous PC and MS-DOS; then expansion again to inhabit a multitasking, virtual memory environment under Linux (on a PC) with X-Window system displays. Finally, the chief focus of this paper is its most recent metamorphosis to inhabit the world of Java and the World Wide Web.
In all of its time, except at the very beginning, the project has survived with little or no funding support; nonetheless, it has been capable of being used for serious projects in such diverse areas as face recognition, development of fuzzy logic algorithms, experimentation with neural networks, as well as the usual pattern classification, and signal and image processing activities [Campbell and Hashim, 1992]. DataLab has been used for teaching an image processing course [Campbell, 1995], and has been used successfully on a large number of student dissertations, see e.g. [Doherty, 1994]. Although DataLab normally comes under the category `image processing system', we must emphasise that our model of image processing involves a great deal more than display and manipulation of graphic images: we include everything from signal processing to monochrome and multivariate image processing, multivariate statistics, data mining, and vector and matrix arithmetic.
Section 2 of the paper describes the general requirements and characteristics of such a package: processing functions, display and user-interaction, and the sort of data objects involved; the one permanent theme of the 25 years of evolution is that of data abstraction [Shaw and Garlan, 1996]. Section 3 outlines the evolution. Section 4 describes the Java implementation. Section 5 discusses what appear to be the major issues and critical decisions in such designs. Section 6 presents conclusions and discusses future directions for the work.
2 DataLab: Signal Processing, Image Processing, Multivariate Data Analysis
2.1 General
The target is a software environment for processing and handling signal (1-D), image (2-D), and multivariate data sets. In order to explore the data and processing requirements, it is instructive to examine the facilities of DataLab [Campbell, 1994b, Campbell, 1994a]. The most general data structure is a multidimensional image which could represent, amongst other things, a multicolour two-dimensional spatial intensity pattern, i.e. one whose pixels are vector valued, or a time sequence of monochrome images. We observe that an image with only one row is a digital `signal', i.e. a discrete data sequence. This, too, could be vector valued. For unordered data collections we can again use one row of an image. DataLab supports matrix and vector mathematics; matrices are single dimensional images; vectors are single dimensional, single row, images.
When image objects are being defined, DataLab allows users to specify the representation data type: BYTE - 8 bits, INT - 16 bits, REAL - 32 bit IEEE floating-point. However, in most interfaces, elementary data are treated as `numbers', with integer and floating-point data being treated uniformly. Although byte and integer data types have been necessary in the past for systems with limited memory, the difficulties attendant on fixed-point arithmetic have generated pressure to avoid integer data types altogether. In addition to the primary image data, a DataLab image may possess two additional data elements: label data - which are typically used for class labels in pattern classification experiments, and `ancillary' data - typically the dependent/concomitant data in estimation experiments, e.g. regression. Clearly, some DataLab functions are restricted in the type/structure of `images' they can handle; for example, the operation of a two-dimensional edge detector is not defined for an `image' with only one row, and it is even less relevant for a data collection. The inverse Discrete Fourier transform requires a pair of images, etc. Nevertheless, many functions, e.g. `add', are defined for any compatible pair of `images'. In general, the laissez-faire principle adopted is `if the user commanded it, try to do it'.
2.2 Summary, Data and Operations
2.2.1 Signal - Data Sequence Data

$$x[n], \quad n = 0, 1, 2, \ldots, N-1$$

where $N$ is the number of samples.

Typical Process - Convolution

$$y[n] = x[n] * h[n] = \sum_{m=0}^{N-1} x[n-m]\, h[m]$$
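The convolution sum above can be sketched directly in Java. This is a minimal illustration, not DataLab code: the class name `Conv1D` is hypothetical, and it assumes that samples $x[n]$ with $n < 0$ are zero, that $h[m]$ is zero beyond its stored length, and that the output has the same length as $x$.

```java
// Sketch only: direct-form convolution y[n] = sum_m x[n-m] h[m].
// Assumptions (not stated in the paper): x[n] = 0 for n < 0, h[m] = 0 beyond
// its stored length, and the output has the same length as x.
class Conv1D {
    static float[] convolve(float[] x, float[] h) {
        float[] y = new float[x.length];
        for (int n = 0; n < x.length; n++) {
            float sum = 0.0f;
            // Only terms with n - m >= 0 contribute.
            for (int m = 0; m < h.length && m <= n; m++) {
                sum += x[n - m] * h[m];
            }
            y[n] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        float[] x = {1, 2, 3, 4};
        float[] h = {0.5f, 0.5f};   // two-point moving average
        float[] y = convolve(x, h);
        for (int n = 0; n < y.length; n++) {
            System.out.println("y[" + n + "] = " + y[n]);   // 0.5, 1.5, 2.5, 3.5
        }
    }
}
```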
2.2.2 Digital Image Data
$$f[r, c], \quad r = 0, 1, 2, \ldots, n_r - 1, \quad c = 0, 1, 2, \ldots, n_c - 1$$

where $n_r$ is the number of rows and $n_c$ the number of columns.

Typical Process - Convolution

$$g[r, c] = \sum_{k=r-N+1}^{r} \; \sum_{l=c-M+1}^{c} f[k, l]\, h[r-k, c-l]$$
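A corresponding sketch of the two-dimensional sum, with an $N \times M$ kernel $h$, follows. The class name `Conv2D` is hypothetical, and treating pixels outside the image as zero is an assumption made here to keep the loops simple; the paper does not state a border policy.

```java
// Sketch only: g[r][c] = sum over k, l of f[k][l] * h[r-k][c-l], h being N x M.
// Assumption (not from the paper): pixels outside the image are treated as zero.
class Conv2D {
    static float[][] convolve(float[][] f, float[][] h) {
        int nr = f.length, nc = f[0].length;
        int N = h.length, M = h[0].length;
        float[][] g = new float[nr][nc];
        for (int r = 0; r < nr; r++) {
            for (int c = 0; c < nc; c++) {
                float sum = 0.0f;
                // k runs from r-N+1 to r and l from c-M+1 to c, clipped at the border.
                for (int k = Math.max(0, r - N + 1); k <= r; k++) {
                    for (int l = Math.max(0, c - M + 1); l <= c; l++) {
                        sum += f[k][l] * h[r - k][c - l];
                    }
                }
                g[r][c] = sum;
            }
        }
        return g;
    }
}
```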
2.2.3 Multivariate Data Set Data

$$X_T = \{x_i, \omega_i\}, \quad i = 0, 1, 2, \ldots, n_s - 1$$

where $n_s$ is the size of the sample, $x_i$ is a typical datum, and $\omega_i$ is its class label.

Typical Process - Pattern Classification

Given the training data $X_T = \{x_i, \omega_i\}$, $i = 0, 1, 2, \ldots, n_s - 1$, and a new vector $x_j$, infer $\omega_j$:

$$\omega_j = f(x_j, X_T)$$
In cases like this of set data, i.e. where there is no spatial or sequential relationship between data, it is a simple matter to represent the data as one row of a multispectral image: $f[d][0][c]$, where $d = 0, \ldots, p-1$ is the vector index and $c = 0, \ldots, n_s-1$ is the collection index. On the other hand, the system is well capable of coping with true signal and image data, even if they are multivariate. In order to cope with labels, we use a separate, similar, data structure for class labels, in which case we can use integers. We can also imagine the case where, instead of a discrete label, we are estimating a continuous value, $y$, e.g. linear regression; we call the $y$ (dependent variable) data ancillary data. We represent ancillary data simply as another multivariate image.
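To make the one-row layout concrete, the following sketch classifies a new vector against training data stored as `data[d][0][c]`. The nearest-neighbour rule is only one possible choice of the function $f$ and is an assumption made here for illustration; the class name and the use of a plain `int[]` for the labels are likewise hypothetical simplifications.

```java
// Sketch only: a nearest-neighbour rule as one possible f(x_j, X_T).
// data[d][0][c] holds component d of datum c (one row of a multispectral image);
// labels[c] is the integer class label of datum c (hypothetical simplification).
class NearestNeighbour {
    static int classify(float[][][] data, int[] labels, float[] x) {
        int p = data.length;            // dimensionality of each datum
        int ns = data[0][0].length;     // number of training samples
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < ns; c++) {
            double dist = 0.0;
            for (int d = 0; d < p; d++) {
                double diff = data[d][0][c] - x[d];
                dist += diff * diff;    // squared Euclidean distance
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return labels[best];
    }
}
```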
2.2.4 Statistics One may need to store statistics, e.g. ranges, means, variance-covariance matrix; most statistics may be class dependent. However, the storage of statistics, or indeed of any value that can be computed, is a moot point: in addition to computing them, software must be provided to store them, to retrieve them, and to keep track of their existence and currency.
2.3 Overall Data Object
Consequently, a general data object consists of:
- Data (primary data): f[d][r][c]
- (Optional) Labels: w[l][r][c]
- (Optional) Ancillary Data: a[d][r][c]
- Metadata, for example:
  - Size of data `image': p, nr, nc.
  - Existence of statistics data.
  - Existence of, and number of labels.
  - Existence of, and dimensionality of ancillary data.
3 History and Origins In this section we trace the development of DataLab from its FORTRAN IV, minicomputer origins through to the current Java development.
3.1 Data Structure Laboratory (DSL), 1973
The DataLab project had its origins in the Plessey Data Structure Laboratory [Morgan et al., 1973]. One of its first projects was the design of optimal (spectral) filtering for the detection of shallow graves using multispectral data, remotely sensed from an airborne platform.
Hardware Platform Honeywell DDP 516 mini-computer. Single user BOS operating system. 16 Kwords (16-bit) memory, 2 x 1.5 Mb disks. Tektronix 611 Storage-tube graphical display.
Software Platform Single user BOS operating system, FORTRAN IV.
Software Architecture Stand alone programs. Data objects are files, each accompanied by a header-file (meta-data). Data files are accessed via a well defined protocol/subprograms. Software is developed independently, but must use the DSL protocol; before entering the 'validated library', software is subject to informal scrutiny. DSL is not mandated, but the pull provided by the ease of use and availability of a rapidly growing software base means that it gains widespread acceptance.
Related Software Multics OLPARS [Sammon, 1970].
Key Concerns Visualisation - 'data structure', feedback, analyst-in-loop, 'serendipity' in data analysis. We had no line printer, based on the motto: the purpose of computers is to provide insight, not create data - John Tukey. Performance was a concern: we were still experimenting with analogue solutions for image processing, e.g. an analogue nearest-mean classifier. In 1974, 4 Kbit memory chips were announced: a 512 x 512 x 3 colour image would need 3K chips, or 30 boards and a lot of power.
Events The ERTS (LANDSAT) programme started in 1972 - multispectral image data for (non-military) land-use applications. Meteosat.
3.2 Signal Processing DSL (SPDSL)
Hardware and Software Platforms As DSL.
Software Architecture Data objects are sub-arrays in FORTRAN COMMON (shared) memory, with some metadata. Up to 6 x length 2048 signals. Processing is via stand-alone 'segments'. Bridge to DSL via files.
3.3 DSL Image Processing (DSLIMP)
Hardware and Software Platforms As DSL.
Software Architecture Data objects are sub-arrays in FORTRAN COMMON (shared) memory, with some metadata. Up to 3 x 64 x 64 images. Processing is via stand-alone 'segments'. Bridge to DSL via files.
3.4 Conversion to Prime 300
Hardware & Software Platform Prime 300 16-bit mini-computer. Primos multitasking, virtual memory operating system (64K virtual memory). As a consequence, we were able to increase sequence lengths by a factor of four, and image dimensions by a factor of two.
Key Concerns and Projects Winograd Fourier Transform algorithm (WFTA) [McClellan and Rader, 1976, Winograd, 1978]; number theoretic transforms (NTT), implemented using WFTA [Bailey and White, 1977]. The Prime 300 was microprogrammable, so that key algorithm hot-spots could be implemented in microcode, e.g. the FFT 'butterfly'.
3.5 IDP 3000 1975-80
First 'commercial' digital image processing system in Europe.
3.6 DSL Revision [Campbell, 1979]
Hardware & Software Platform Prime 300 16-bit mini-computer.
Software Architecture As before. Statistics block, means, covariances etc., stored with data.
3.7 Laboratory for Multivariate and Image Data (LAMID) 1980-81 [Campbell, 1981] Hardware and Software Platform DECsystem-20, TOPS-20 operating system. FORTRAN IV. (University College, Dublin).
Software Architecture LAMID - stand alone programs. Data objects are single UDSF files (Universal Data Storage Format): metadata, data, statistics. IMPS - table of commands, dispatcher for table of subprograms. Data objects are again FORTRAN COMMON blocks.
3.8 DISPP (1986)
Hardware and Software Platform IBM PC, MS-DOS, Microsoft FORTRAN-77.
3.9 LISSP, DataLab, 1990-95 [Campbell, 1994b, Campbell, 1994a] Hardware and Software Platform IBM PC, MS-DOS, Borland C.
Software Architecture Table of commands - functions; command interpreter - dispatcher. Fully memory based data objects: based on the IM(age) abstract-data-type described in section 2.
3.10 DataLab-32X, 1995-97
Hardware and Software Platform IBM PC, Linux, GNU C, X-Window displays.
Software Architecture As before.
3.11 DataLab-Java, 1997-
Hardware and Software Platform Any that supports Java. Software Architecture As before. Key Concerns and Projects Data mining, applets, operability via WWW.
4 DataLab-Java 4.1 Introduction
By 1997, DataLab was ripe for conversion to an object-oriented design. C++ was a strong contender because of supposed performance advantages, but eventually Java was chosen, mainly because of the prospect of Web based activity. Later, other advantages of Java became apparent. However, this article asserts neither that Java is perfect, nor that C++ is bad.
4.2 Java
Descendant of C, C++ Java [Flanagan, 1997, Horstmann, 1997] is a descendant of C++. Like C++, Java supports object-oriented software development, though it is perfectly feasible to produce Java code in an almost entirely imperative style. Indeed, most C functions which deal purely with numbers convert directly to Java. Java functions which deal with input-output, with strings, characters, and heap arrays are considerably different from their C and C++ counterparts. Nonetheless, someone with a working knowledge of C++ classes/objects, and with a good knowledge of procedural C or C++, could expect to be programming productively in Java with one day's reading of textbooks such as [Flanagan, 1997, Horstmann, 1997].
Strong Typing Java type checking is stricter than that of C or C++. Mixed-type expressions that would lose precision on assignment, even from double to float, must use explicit casts; this is not a significant inconvenience, and, in fact, is a considerable advantage for Java as a teaching language. It brings a slight penalty for the conversion of carelessly mixed-mode C/C++ programs.
Pointers Although it is often said that Java has no pointers, this is untrue: Java does rely heavily on pointers/references, but it allows programmers to access these only in a benign manner; for example, no so-called 'pointer-arithmetic' is allowed. Array and object variables are passed-by-reference, but without the dangerous side-effects of C/C++.
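A minimal sketch of both points, using hypothetical names: narrowing from double to float needs an explicit cast, and an array handed to a method is shared with the caller, although the method cannot redirect the caller's own variable.

```java
// Sketch only; class and method names are illustrative, not DataLab code.
class TypingAndReferences {
    // Scales every element of the caller's array in place.
    static void scale(float[] a, float factor) {
        for (int i = 0; i < a.length; i++) {
            a[i] *= factor;
        }
    }

    // Rebinding the parameter has no effect on the caller's variable.
    static void rebind(float[] a) {
        a = new float[] {-1.0f};
    }

    public static void main(String[] args) {
        double d = 2.5;
        // float f = d;        // rejected by the compiler: possible loss of precision
        float f = (float) d;   // an explicit cast is required
        float[] x = {1.0f, 2.0f};
        scale(x, f);           // x is modified through the shared reference
        rebind(x);             // x still refers to the original array
        System.out.println(x[0] + " " + x[1]);   // prints 2.5 5.0
    }
}
```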
Garbage Collection Perhaps the greatest contrast with C++ is Java's built-in garbage collection. All array and object variables are allocated on the heap; however, although this allows them to have a lifetime beyond that of their defining scope unit, the programmer is still absolved of the difficulty of remembering to delete them before their reference is destroyed. Although the garbage collection feature only reaches true significance in considerably sized programs, this significance is great, for there is anecdotal evidence that, in large programs which use dynamically allocated memory (and most programs in this category must), upwards of 50% of errors are to do with memory allocation/deallocation, i.e. garbage or dangling pointers.
Vector class The Java Vector class is quite different from the vector classes provided by the C++ STL and other libraries: those are simply 'super' arrays, with perhaps the facility of run-time resizing. A Java Vector is somewhat like a List: a Vector can have objects inserted anywhere, with automatic resizing. Java anticipates their use as temporary stores of undetermined size, and provides a copyInto method specifically for the purpose of copying the contents of a Vector to a more efficient array storage.
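The temporary-store pattern might look as follows. In `java.util.Vector` the copying method is named `copyInto`; the surrounding class and the thresholding task are hypothetical, and the generic type parameter is a later language feature used here only so that the sketch compiles cleanly on current compilers.

```java
import java.util.Vector;

// Sketch only: collect an unknown number of values in a Vector, then copy them
// to a fixed-size array. Class name and task are illustrative.
class VectorToArray {
    static Float[] collect(float[] samples, float threshold) {
        Vector<Float> kept = new Vector<Float>();    // grows as needed
        for (int i = 0; i < samples.length; i++) {
            if (samples[i] > threshold) {
                kept.addElement(Float.valueOf(samples[i]));
            }
        }
        Float[] out = new Float[kept.size()];
        kept.copyInto(out);                          // Vector contents -> array
        return out;
    }
}
```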
Header files, dependency Java does not have header files. This is no loss, since C/C++ header files merely duplicate the information in their implementation files. The Java compiler will simply read the corresponding part of the appropriate implementation file. Java import is not the equivalent of header inclusion: import simply allows a program to use unqualified names - it imports scope.
Interpreted - Performance Most current Java implementations execute by interpreting compiled bytecode; consequently, performance penalties are to be expected. Benchmarking tests carried out during this project (specifically, computation of the Sobel gradient of a 512x512 image, JDK 1.1.3, Linux) seem to indicate a performance penalty of about a factor of six compared to DataLab C compiled code (gcc, Linux). With just-in-time and native compilation, we can expect closer convergence. In some cases, such a factor may be problematical, e.g. in the case mentioned (512x512 Sobel) the increase is from 5 seconds (C) to 30 seconds (Java); for exploratory interactive work, 30 seconds is probably intolerable. On the other hand, it is probably feasible to carry out exploratory work (where speed of interactivity is an issue) on smaller images. We note that much greater performance deficits (e.g. 50 times) are tolerated in some comparable interpreted packages.
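A benchmark of this kind could be sketched as below. The benchmark code itself is not given in the paper, so everything here is an assumption: the class name, the use of |gx| + |gy| as the gradient magnitude, the zeroed border, and the synthetic step-edge test image. Timings will naturally differ greatly from the 1997 figures quoted above.

```java
// Sketch only: 3x3 Sobel gradient with a simple wall-clock timing harness.
// Assumptions: magnitude approximated by |gx| + |gy|; border pixels left at zero.
class SobelTiming {
    static float[][] sobel(float[][] f) {
        int nr = f.length, nc = f[0].length;
        float[][] g = new float[nr][nc];
        for (int r = 1; r < nr - 1; r++) {
            for (int c = 1; c < nc - 1; c++) {
                // Horizontal difference: right column minus left column.
                float gx = (f[r - 1][c + 1] + 2 * f[r][c + 1] + f[r + 1][c + 1])
                         - (f[r - 1][c - 1] + 2 * f[r][c - 1] + f[r + 1][c - 1]);
                // Vertical difference: lower row minus upper row.
                float gy = (f[r + 1][c - 1] + 2 * f[r + 1][c] + f[r + 1][c + 1])
                         - (f[r - 1][c - 1] + 2 * f[r - 1][c] + f[r - 1][c + 1]);
                g[r][c] = Math.abs(gx) + Math.abs(gy);
            }
        }
        return g;
    }

    public static void main(String[] args) {
        int n = 512;
        float[][] f = new float[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                f[r][c] = (c < n / 2) ? 0.0f : 1.0f;   // synthetic vertical step edge
            }
        }
        long t0 = System.currentTimeMillis();
        float[][] g = sobel(f);
        long t1 = System.currentTimeMillis();
        System.out.println("Sobel " + n + "x" + n + ": " + (t1 - t0) + " ms");
        System.out.println("g at the edge: " + g[n / 2][n / 2]);
    }
}
```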
Inheritance Java allows only single inheritance; in the DataLab context this has not been a problem. We have not had to employ the substitute interface construct.
Higher-order functions, pointers-to-functions A tentative design for part of the dispatcher suggested grouping of DataLab-functions (i.e. functions available at the user-interface), with either a sub-scheduler for each group, or passing of functions to a group executor. Java functions are not higher-order, i.e. they cannot take functions as arguments, not even pointers-to-functions as in C. Though a workaround, using interfaces, is indicated in [Harold, 1997], we have adopted a simpler solution in the current prototype.
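For comparison, the interface idiom can be sketched as follows. This is not the solution adopted in the prototype, and all names (`DlFunction`, `GroupExecutor`, the two example functions) are hypothetical: each function is wrapped in an object implementing a one-method interface, so that it can be stored in a table and passed to an executor.

```java
import java.util.Hashtable;

// Sketch only: passing "functions" as objects that implement a common interface.
// All names are hypothetical; this is not the design used in DataLab-J.
interface DlFunction {
    float[] apply(float[] x);
}

class Negate implements DlFunction {
    public float[] apply(float[] x) {
        float[] y = new float[x.length];
        for (int i = 0; i < x.length; i++) y[i] = -x[i];
        return y;
    }
}

class AbsoluteValue implements DlFunction {
    public float[] apply(float[] x) {
        float[] y = new float[x.length];
        for (int i = 0; i < x.length; i++) y[i] = Math.abs(x[i]);
        return y;
    }
}

class GroupExecutor {
    private final Hashtable<String, DlFunction> table = new Hashtable<String, DlFunction>();

    void register(String name, DlFunction fn) {
        table.put(name, fn);
    }

    // Looks up the command by name and applies the associated function object.
    float[] execute(String name, float[] x) {
        DlFunction fn = table.get(name);
        if (fn == null) {
            throw new IllegalArgumentException("unknown command: " + name);
        }
        return fn.apply(x);
    }
}
```

The cost of this idiom is one small class per function, which may be part of why a simpler scheme was preferred for a prototype.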
Scientific libraries Although the lack of scientific software libraries for Java could be seen as a weakness, things are catching up fast. For linear algebra (matrix inversion, singular value decomposition, etc.) we use the public domain LINPACK conversions provided by [Verrill, 1997].
Polymorphism via Templates Java does not have templates. Although C++ templates are appealing for the implementation of, for example, multiple image types, byte, integer, float, etc., see [Campbell, 1997], we have successfully overcome their absence in our current simplified design, see Section 5 - multiple numeric types.
Polymorphic Collections Since, in the style of Smalltalk, all Java classes inherit from the class Object, Java allows, with appropriate casts, polymorphic collections, e.g. Vector. Nevertheless, in the DataLab design, we have never found it necessary to deviate from homogeneity within a particular collection.
4.3 Classes, Objects and Encapsulation
The class construct provides the basic mechanism for object-oriented programming. The class serves as the vehicle for providing abstract-data-types. The representation and the associated set of interface functions (methods) are grouped together in a class declaration. An object is an instance of a class, just as a variable is an instance of a type. Classes provide encapsulation, or equivalently information-hiding, via public and private. Private members (functions or data) are inaccessible outside the class. We can also take a more syntactic view of classes: a class is an extension of the record, or struct in C++. As well as having data members, a class can have function members.
Figure 1 shows part of a class Im, which is the basic DataLab class: it represents a simple float two-dimensional data array; as we will indicate later, a data sequence may be implemented by a single row Im object, likewise a data collection. As a consequence of encapsulation we can view objects, e.g. of class Im, as capsules containing the private representation data; these data may be accessed only through the interface functions, see Figure 2. Regarding the names of the interface functions we note the terminology:
- Constructor: to construct objects, usually by allocating memory and initialising it.
- Selector or inspector: to inspect object state.
- Mutator: to update object state.
- Destructor: in C++, a class developer must normally provide a destructor which is called (implicitly) when an object goes out of scope. Since Java has garbage collection, destructors are not needed.
- Operators: unlike C++, functions cannot be defined as operators in Java. No one should regard this loss as significant.
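The terminology can be illustrated with a small class in the spirit of Im. Since the figure is not reproduced in this text, every member name below is an assumption rather than the actual DataLab-J declaration; the class is named `ImSketch` to make that clear.

```java
// Illustrative only: member names are assumptions, not the DataLab-J class Im.
class ImSketch {
    private float[][] pix;            // private representation, hidden from clients

    // Constructor: allocates an nr x nc array, initialised to zero by Java.
    ImSketch(int nr, int nc) {
        if (nr < 1 || nc < 1) {
            throw new IllegalArgumentException("image must have at least one pixel");
        }
        pix = new float[nr][nc];
    }

    // Selectors (inspectors): report state without changing it.
    int nrows() { return pix.length; }
    int ncols() { return pix[0].length; }
    float get(int r, int c) { return pix[r][c]; }

    // Mutator: updates state.
    void set(int r, int c, float v) { pix[r][c] = v; }

    // No destructor: the array is reclaimed by the garbage collector.
}
```

Note that the size selectors read the array lengths directly, so no separate size fields need to be kept consistent with the data.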
4.4 DataLab-Java, Object Design
Figure 3 shows an abridged DataLab-J object dependency diagram. The following paragraphs describe some of these classes, and outline further aspects of the design.
Class Im Class Im is already described in the previous section. In an earlier prototype we employed two subclasses of Im:
- Vect: which encapsulated a Java 1-d. array (float[]) and which implemented one row of a Matrix (see next item) or a data sequence.
- Matrix: which encapsulated a Java 2-d. array (float[][]) and which implemented monochrome image or matrix representation.
Eventually, these classes proved to be more hindrance than help, and we now use plain Java arrays in their place, coupled with a class Matvec that provides no data-abstraction, but a collection of vector and matrix operations, and incorporates some of the Basic Linear Algebra Subprograms (BLAS) Java conversions provided by [Verrill, 1997]. Whilst the Im image representation does benefit from data-abstraction and encapsulation, special array and matrix objects were incompatible with external software, e.g. the LINPACK Java conversions provided by [Verrill, 1997], and much complexity was introduced by the required conversion functions. An alternative approach, to convert the externally produced libraries to use our objects, is attractive to the naive, but costs dearly in the long run. With the current approach, we can use external software unmodified, and our low-level software is also usable on other projects. Moreover, since Java arrays are in fact objects, Java provides their memory management, i.e. they are allocated on the heap, and, eventually, garbage collected.
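The style of such an operations-only class can be sketched as below. The method set and names are assumptions for illustration (the class is called `MatvecSketch` to avoid suggesting it is the real Matvec): static methods take and return plain `float[]` and `float[][]`, so the results can be handed directly to any other array-based library.

```java
// Sketch only: static operations on plain Java arrays, with no state of their own.
// Method names are hypothetical; this is not the DataLab-J Matvec class.
class MatvecSketch {
    // Inner product of two vectors of equal length.
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    // Matrix-vector product y = m v, computed row by row.
    static float[] multiply(float[][] m, float[] v) {
        float[] y = new float[m.length];
        for (int r = 0; r < m.length; r++) y[r] = dot(m[r], v);
        return y;
    }

    // Transpose of a rectangular matrix.
    static float[][] transpose(float[][] m) {
        float[][] t = new float[m[0].length][m.length];
        for (int r = 0; r < m.length; r++)
            for (int c = 0; c < m[0].length; c++)
                t[c][r] = m[r][c];
        return t;
    }
}
```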
Class Imd Figure 4 shows part of a class Imd. This time we use a Java Vector: consequently, see an earlier section, it is resizable, i.e. the number of bands/images can be altered dynamically; given the high level of this class, the overhead of the less efficient Vector is insignificant. Its methods follow the same pattern as Im.
Class Dld Class Dld represents the overall DataLab data object and part of it is shown in Figure 5. As has been described, it consists of (primary) data - dat, optional labels - lab, and optional ancillary data - anc.
Class Cmd This class handles the command-line user-interface: reading commands (from keyboard or from file, which provides a rudimentary script facility) and decoding them and their parameters.
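The Imd and Dld classes described above might look roughly like the following sketch. The figures are not reproduced in this text, so the method names are assumptions; the field names dat, lab and anc follow the description. To keep the block self-contained, each band is a plain `float[][]` standing in for an Im object.

```java
import java.util.Vector;

// Sketch only: a resizable collection of bands, in the spirit of class Imd.
// Each band is a plain float[][] here, standing in for an Im object.
class ImdSketch {
    private final Vector<float[][]> bands = new Vector<float[][]>();

    void addBand(float[][] band) { bands.addElement(band); }
    int nbands() { return bands.size(); }
    float get(int d, int r, int c) { return bands.elementAt(d)[r][c]; }
}

// Sketch only: the overall data object, in the spirit of class Dld.
class DldSketch {
    ImdSketch dat;    // primary data, always present
    ImdSketch lab;    // optional labels; null when absent
    ImdSketch anc;    // optional ancillary data; null when absent

    DldSketch(ImdSketch dat) { this.dat = dat; }

    // Existence of the optional parts is a null test, not a stored flag.
    boolean hasLabels() { return lab != null; }
    boolean hasAncillary() { return anc != null; }
}
```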
Class Dlj This class provides the main program:
1. Calls the command fetcher-and-decoder (class Cmd);
2. Dispatches the commands to DataLab-functions;
3. Manages the computational environment: Dld variables as they are created and/or destroyed during an interactive session. Currently we use a Dld-object list implemented as a Java Vector, with objects referenced simply by their position in the list. In the next version we plan to incorporate an Env (environment) class, which will be implemented using an appropriate Dictionary/symbol-table class.
Key points are shown in Figure 6.
DataLab-functions A typical DataLab-function, the Karhunen-Loeve transformation (kl) (Principal Components Analysis), is shown in Figure 7.
5 Discussion 5.1 Java
In this discussion we are keen to praise Java, especially the fact that the first revision (9 September 1997) of the work was carried out in around 50 hours, from a starting point of little or no knowledge of Java. The similarity of the basic language to C/C++ is an advantage, in that purely procedural programs are easy to convert. We have already noted that some of the more troublesome aspects have been removed. Our current discussion focusses on Java as a general purpose computing language; our interest in its Web capabilities is secondary. Indeed, we note that there is a fairly widespread and unfortunate misconception that Java is for Web applets only.
5.2 Data Model
As can be seen in the previous section, our data model is extremely simple. In previous systems, we have always wondered about the poverty of the raster/lattice data model: would a pyramidal representation be worth considering? More importantly, would it be easy for client programmers and users? And what about a symbolic or relational representation for higher level processing? We have even simplified the original DataLab representation: we no longer allow sequences and images to have negative indices. These were possible to fake in C; negative indices were one of those features that seemed 'nice to have', but were actually used only in a minuscule percentage of cases; nonetheless, the feature incurred significant support code.
5.3 Data Implementation
The image data model is clearly that of a function on a restricted grid (its domain is a subset of $\mathbb{Z}^2$), so that a grid-based raster storage is natural. Nevertheless, our data abstraction properly denies any statement about the actual implementation of the storage. Our data could be stored in some sparse structure, using for example run-length encoding. Yet another possibility is the inclusion of functional representation, e.g. $f[r, c] = \mathrm{constant}(1.0)$, or $x[n] = \sin(2\pi a n / N)$. In previous designs, we have attempted to provide a continuous data abstraction, such that the image function domain was modelled as a subset of $\mathbb{R}^2$. This was more trouble than it was worth, since the vast majority of digital signal and image processing is based on a sampled data model.
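The point that clients need not know how samples are stored can be sketched with a small interface and two implementations, one array-backed and one functional. All names are hypothetical, and nothing of this kind is claimed to exist in DataLab-J; the sinusoid assumes the form $\sin(2\pi a n/N)$.

```java
// Sketch only: two storage strategies behind one abstraction. Names are hypothetical.
interface Sequence {
    int length();
    float get(int n);
}

// Conventional storage: one stored sample per index.
class ArraySequence implements Sequence {
    private final float[] x;
    ArraySequence(float[] x) { this.x = x.clone(); }
    public int length() { return x.length; }
    public float get(int n) { return x[n]; }
}

// Functional representation: x[n] = sin(2*pi*a*n/N), computed on demand.
class SineSequence implements Sequence {
    private final double a;
    private final int N;
    SineSequence(double a, int N) { this.a = a; this.N = N; }
    public int length() { return N; }
    public float get(int n) { return (float) Math.sin(2.0 * Math.PI * a * n / N); }
}

class SequenceOps {
    // Works identically for either representation.
    static float sum(Sequence s) {
        float total = 0.0f;
        for (int n = 0; n < s.length(); n++) total += s.get(n);
        return total;
    }
}
```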
5.4 Metadata
By `metadata' we mean object descriptor data such as image size, existence of optional data blocks, etc. What is remarkable about the classes described in the previous section is the complete absence of metadata. This is because Java provides methods for interrogating the size of arrays and Vectors. Indeed, if optional fields are unused, e.g. label data, lab in Dld, we simply initialise them to null; testing for existence is simply a test of equality with null. The absence of metadata may seem a minor point, but it is significant: such data are redundant, yet they must be kept up to date, which may not only require a considerable amount of code, but also result in conceptual pollution and an increase in complexity. Related is the decision to `store or compute'.
5.5 Store or Compute?
Data such as statistics, e.g. covariance matrices etc., are related to metadata. The question arises: store or compute? Since they seem expensive to compute, there is a temptation to store them. However, storing brings its own expense: stored statistics must be kept up to date; how? By marking them as invalid for every updating of the related data? By recomputing them at each update? A previous design used the policy of `opening' data collections for writing or reading, just like a file; this way, the necessity of updating stored statistics could be determined; however, this `feature' cost five to fifteen lines of code for every process. Measures such as lazy evaluation and memoising [MacLennan, 1990] are partial solutions, but may incur unwarranted complexity.
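The bookkeeping that lazy evaluation with memoising entails can be seen even for a single statistic. The following sketch is hypothetical and not part of DataLab-J: it caches the mean and recomputes it only after the data have changed. The recomputation counter exists purely for demonstration.

```java
// Sketch only: lazy evaluation with memoising for one statistic (the mean).
class MemoisedMean {
    private final float[] data;
    private boolean valid = false;   // false whenever the cached mean may be stale
    private float mean;
    private int computations = 0;    // counts recomputations, for demonstration only

    MemoisedMean(float[] data) {
        this.data = data.clone();    // private copy, so all writes go through set()
    }

    // Mutator: every mutator must remember to invalidate the cache.
    void set(int i, float v) {
        data[i] = v;
        valid = false;
    }

    // Computes the mean only when the cached value is stale.
    float mean() {
        if (!valid) {
            float sum = 0.0f;
            for (int i = 0; i < data.length; i++) sum += data[i];
            mean = sum / data.length;
            valid = true;
            computations++;
        }
        return mean;
    }

    int computations() { return computations; }
}
```

Even here, correctness depends on every mutator clearing the flag and on no outside code holding a reference to the array, which gives some feel for how the complexity grows once several statistics and several data blocks are involved.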
5.6 Multiple Numeric Types
In previous systems it was necessary to store, at least initially, large images in byte and integer format. However, fixed-point computations are simply inconvenient, so that there is always a temptation, where feasible, to work in floating-point. In the current system, all numerical data are stored in float, though, of course, this is hidden by the data abstraction. Eventually, we may have to consider reintroducing multiple types, e.g. byte for class labels.
5.7 Software Architecture
The software architecture based on function-table/shared memory/objects has some limitations compared to other architectures [Shaw and Garlan, 1996]. The chief drawback is that of configuration management: it is difficult to extend the system across distributed sites without serious divergence occurring, divergence that is most difficult to correct. A UNIX `filter' architecture is one notable and attractive alternative. Web distribution may generate further requirements in this respect.
6 Conclusions
Our primary goal was to explore the feasibility of numerical programming in Java; that feasibility, in particular performance, has been proven. Java is easy to learn. Compared to C++ and C, it is small; it is remarkably devoid of small-print and `gotchas'. The Java JDK classes provide a great many of the functions that had to be programmed in the previous C implementations. We have demonstrated a pleasantly parsimonious design, and it is pleasing to report that many `features' have been relinquished, compared to the predecessor. So far, we have definitely avoided the `second system' syndrome [Brooks, 1995], in which developers run riot with indulgence, pride and delusions of grandeur.
Acknowledgements
We are indebted to George Row, now at Southbank University, London, for many enlightening discussions on data-abstraction, functional programming, and, indeed, computer science in general. We are likewise indebted to Aiden McCaughey for his support and encouragement on Linux and Java.
References [AAI, 1997] AAI (1997). Www page: Image understanding environment. Technical report, AAI Corp. http://www.aai.com/AAI/IUE/IUE.html (9 Sept 1997). [Bailey and White, 1977] Bailey, D. and White, I. (1977). The Winograd Fourier Transform Algorithm { a description with software and hardware applications. Technical report, Plessey Radar Research Centre, Havant, U.K., Report 17/77/R132U. [Brooks, 1995] Brooks, F. (1995). The Mythical Man Month. Reading: MA, Addison Wesley, second (20th anniversary) edition. [Campbell, 1979] Campbell, J. (1979). The Data Structure Laboratory, Description and User Manual. Technical report, Plessey Electronic Systems Research, Havant, United Kingdom.
[Campbell, 1981] Campbell, J. (1981). The Use of Landsat MSS Data for Ecological Mapping. In Proceedings of Ninth Annual Conference of the Remote Sensing Society, University of London, pages 143-161.
[Campbell, 1994a] Campbell, J. (1994a). DataLab Programmers' Manual. Technical report, University of Ulster, Interactive Systems Centre, Report isc/94/016/n; available from http://www.infm.ulst.ac.uk/ jgc/dl/dlusr.a.
[Campbell, 1994b] Campbell, J. (1994b). DataLab Users' Manual. Technical report, University of Ulster, Interactive Systems Centre, Report isc/94/015/r; available from http://www.infm.ulst.ac.uk/ jgc/dl/dlprg.a.
[Campbell, 1995] Campbell, J. (1995). Lecture notes on image processing. Technical report, University of Ulster, Module AC460; available from http://www.infm.ulst.ac.uk/ jgc/ip/.
[Campbell, 1997] Campbell, J. (1997). Lessons on object-oriented programming. Technical report, University of Ulster, Module AC264; available from http://www.infm.ulst.ac.uk/ jgc/oop/oop.complete.ps.
[Campbell and Hashim, 1992] Campbell, J. and Hashim, A. (1992). Fuzzy sets, pattern recognition, linear estimation, and neural networks - a unification of the theory with relevance to remote sensing. In Proceedings of Eighteenth Annual Conference of the Remote Sensing Society, University of Dundee, pages 508-517.
[Doherty, 1994] Doherty, W. (1994). Textile flaw detection. Master's thesis, University of Ulster, Dept. of Applied Computing.
[Eaton, 1997] Eaton, J. (1997). WWW page: Octave. Technical report, Univ. Wisconsin. http://www.che.wisc.edu/octave (9 Sept 1997).
[Flanagan, 1997] Flanagan, D. (1997). Java in a Nutshell. O'Reilly and Assoc., 2nd edition.
[Harold, 1997] Harold, E. R. (1997). WWW page: Java frequently asked questions (FAQ). Technical report, http://sunsite.unc.edu/javafaq/javafaq.html#methodpointers (6 Nov 1997).
[Horstmann, 1997] Horstmann, C. (1997). Practical Object-oriented Development in C++ and Java. New York: John Wiley.
[Khoral, 1997] Khoral (1997). WWW page: Khoral Research Inc. Creators of Khoros Technology. Technical report, Khoral Research Inc. http://www.khoral.com/ (9 Sept 1997).
[MacLennan, 1990] MacLennan, B. (1990). Functional Programming: Practice and Theory. Reading, MA: Addison Wesley.
[Mathworks, 1997] Mathworks (1997). WWW page: The Mathworks Web Site. Technical report, Mathworks. http://www.mathworks.com/ (9 Sept 1997).
[McClellan and Rader, 1976] McClellan, J. and Rader, C. (1976). Seminar: There is something much faster than the Fast Fourier Transform, October 21, 1976. Technical report, Massachusetts Institute of Technology, Lincoln Laboratory.
[Morgan et al., 1973] Morgan, O. E., White, I., Balston, D., Morton, R. D., Stentiford, F., Pike, M., and Campbell, J. (1973). The Data Structure Laboratory - software currently available. Technical report, Plessey Radar Research Centre, Havant, United Kingdom.
[Sammon, 1970] Sammon, J. (1970). Interactive pattern analysis and classification. IEEE Trans. Computers, C-19(7).
[Shaw and Garlan, 1996] Shaw, M. and Garlan, D. (1996). Software Architecture: Perspectives on an Emerging Discipline. Upper Saddle River, NJ: Prentice-Hall.
[Verrill, 1997] Verrill, S. (1997). WWW page: Linear algebra for statistics Java package. Technical report, http://www1.fpl.fs.fed.us/linear algebra.html (6 Nov 1997).
[Winograd, 1978] Winograd, S. (1978). On Computing the Discrete Fourier Transform. Mathematics of Computation, 32(141).
//---- dlj.Im.java ----------------------------------
package dlj;

import dlj.*;
import java.io.*;

public class Im{
  private float[][] dat;

  // constructors
  public Im(int nr,int nc){
    dat= Matvec.make(nr,nc);
  }
  public Im(Im m){
    this(m.nrows(), m.ncols());
    copy(m);
  }
  public Im(int nr, int nc, float val){
    dat= Matvec.make(nr,nc,val);
  }
  public Im(double[][] v){
    dat = Matvec.fromDouble(v);
  }
  // etc...

  // accessors
  public int nrows(){ return dat.length; }
  public int ncols(){ return dat[0].length; }
  public float get(int r, int c){ return dat[r][c]; }
  public float max(){ return Matvec.max(dat); }
  // etc...

  // mutators
  public void put(float val, int r, int c){ dat[r][c]= val; }

  // print
  public void println(PrintStream out){
    Matvec.println(dat, out);
  }
}
Figure 1: Class Im
              +----------------------------------+
              |Hidden: private float[][] dat;    |
Public        |                                  |
Interface     |                                  |
functions     |                                  |
            +---------+                          |
Constructors| Im()    | etc.                     |
            +---------+                          |
              |                                  |
            +------------------+                 |
Selectors/  | get(int r, int c)| etc.            |
Inspectors  +------------------+                 |
              |                                  |
            +--------------+                     |
Mutators    | put(val,r,c) | etc.                |
            +--------------+                     |
etc...        |                                  |
              +----------------------------------+
Figure 2: Object as Capsule
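The practical consequence of the capsule can be sketched as follows. This is an illustrative assumption, not part of DataLab-J: `FlatIm` and `CapsuleSketch` are hypothetical names. `FlatIm` offers the same selectors and mutator as the class in Figure 1 but hides a single row-major array instead of a `float[][]`; the client method `sum` uses only the public interface and therefore does not depend on which representation is hidden.

```java
// Hypothetical alternative to the class Im of Figure 1: the same public
// interface, but the hidden data is one row-major array, not a float[][].
class FlatIm {
    private final float[] dat; // hidden representation
    private final int nr;
    private final int nc;

    // constructor
    FlatIm(int nr, int nc) {
        this.nr = nr;
        this.nc = nc;
        this.dat = new float[nr * nc];
    }

    // selectors / inspectors
    int nrows() { return nr; }
    int ncols() { return nc; }
    float get(int r, int c) { return dat[r * nc + c]; }

    // mutator; argument order as in Figure 1 (value, then row and column)
    void put(float val, int r, int c) { dat[r * nc + c] = val; }
}

class CapsuleSketch {
    // Client code written against the public interface only.
    static float sum(FlatIm m) {
        float s = 0.0f;
        for (int r = 0; r < m.nrows(); r++) {
            for (int c = 0; c < m.ncols(); c++) {
                s += m.get(r, c);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        FlatIm m = new FlatIm(2, 2);
        m.put(1.0f, 0, 0);
        m.put(2.5f, 1, 1);
        System.out.println(sum(m)); // prints 3.5
    }
}
```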
              +---Dlj -- user interface + environment/
             / /   \     collection of DL objects
        uses/ /     \
           / /       \
        Cmd /         \
   +-------           *\has
  /                     Dld -- DataLab object
 /uses                   |     data, labels, anc. data
Dld/------+             3|has
funct-   \ uses  |       |
-ions     \      |       |
           \     +-----Imd -- multispectral image
            \          |      or multivariate set
             \        *|has
              \        |
               \       |
                +--------Im -- mono. image
                         |
                         |is
                         |
                    Java 2-d. array
Figure 3: DataLab Object Dependency Diagram
//---- dlj.Imd.java ----------------------------------
// j.g.c. 16/8/97, 10/10/97
package dlj;

import dlj.*;
import java.util.*;
import java.io.*;

public class Imd{
  private Vector dat;

  public Imd(int nd,int nr,int nc){
    dat= new Vector();
    for(int d= 0; d