Modeling Languages Versus Matrix Generators for ...

Modeling Languages Versus Matrix Generators for Linear Programming ROBERT FOURER Northwestern University

Linear optimization problems (linear programs) are expressed in one kind of form for human modelers, but in a quite different form for computer algorithms. Translation from the modeler's form to the algorithm's form is thus an unavoidable task in linear programming. Traditionally, this task of translation has been divided between human and computer, through the writing of computer programs known as matrix generators. An alternative approach leaves almost all of the work of translation to the computer. Central to such an approach is a computer-readable modeling language that expresses a linear program in much the same way that a modeler does. It is argued that modeling languages should lead to more reliable application of linear programming at lower overall cost.

Categories and Subject Descriptors: D.2.4 [Software Engineering]: Program Verification--reliability, validation; D.3.2 [Programming Languages]: Language Classifications--applicative languages, nonprocedural languages; G.1.6 [Numerical Analysis]: Optimization--linear programming

General Terms: Documentation, Languages, Reliability, Verification

Additional Key Words and Phrases: Modeling languages, matrix generators

1. INTRODUCTION

People and computers see linear optimization in different ways. For the human modeler, a linear program is an abstract representation to be analyzed and understood; for the computer algorithm, a linear program is a concrete problem to be solved. Thus modelers describe linear programs in a readable and symbolic form, such as the familiar algebraic notation for variables, constraints, and objectives. Algorithms, on the other hand, require a convenient and explicit form, typically a variable-by-variable list of nonzero coefficients. These two forms of a linear program--the modeler's form and the algorithm's form--are not much alike, and yet neither can be done without. Thus any application of linear optimization involves translating the one form to the other.

Research for this paper was initially supported by National Science Foundation Grant MCS 76-01311 to the National Bureau of Economic Research and the M.I.T. Center for Computational Research in Economics and Management Science. Author's address: Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60201. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1983 ACM 0098-3500/83/0600-0143 $00.75. ACM Transactions on Mathematical Software, Vol. 9, No. 2, June 1983, Pages 143-183.


This process of translation has long been recognized as a difficult and expensive task of practical linear programming. In the traditional approach to translation, the work is divided between modeler and machine. First the modeler converts a symbolic modeler's form to a special computer program; then the computer executes this program to produce a linear program in the algorithm's desired form. The intermediate computer program is known as a matrix generator. Matrix generators have been the predominant means of translation for large linear programs. Specialized systems and programming languages have thus evolved for matrix generation, while certain features and conventions of matrix generators have come to be regarded as standard.

There is also a quite different approach to translation, in which as much work as possible is left to the machine. The central feature of this alternative approach is a modeling language that is written by the modeler and translated by the computer. A modeling language is not a programming language; rather, it is a declarative language that expresses the modeler's form of a linear program in a notation that a computer system can interpret. Although the idea of a modeling language has been used successfully in other kinds of modeling, it has not been widely applied in linear programming. Moreover, most applications of modeling languages to linear programming have been small-scale or experimental, or have employed the concepts of a modeling language to a limited extent.

The central contention of this paper is that modeling languages have certain inherent advantages over matrix generators. In particular, it is argued that modeling languages avoid the common programming difficulties--verification, modification, and documentation--associated with matrix-generator programs; and it is asserted that modeling languages make linear programming easier and more reliable, by involving people with fewer and simpler forms of the model. In addition, it is argued that various specific drawbacks of standard matrix generators may be avoided through the use of modeling languages. On the basis of these arguments, it is concluded that modeling languages should afford more reliable linear programming at lower overall cost.

Since modeling languages for linear programming are at an early stage of development, while matrix-generator systems are at an advanced stage, this paper does not attempt to directly compare implementations of the two. Rather, the aim is to compare what matrix generators are--both in principle and in standard practice--with what modeling languages should be if they are fully developed as part of large-scale systems. It is hoped that this comparison will encourage both successful design and further development of modeling languages.

1.1 Outline of This Paper

Modeler's form, algorithm's form, and the process of translation are defined more precisely in Section 2. The next two sections then define and describe matrix generators and their drawbacks. Section 3 takes a general point of view, summarizing the principles of matrix generation and identifying consequent weaknesses that are common to all matrix generators. Section 4 considers particular features and drawbacks of standard matrix generation systems.

The central case for modeling languages is set forth in the following three sections. Section 5 defines and explains the idea of a modeling language, and offers a hypothetical example. Sections 6 and 7 then compare modeling languages and matrix generators. The presentations of Sections 6 and 7 parallel those of Sections 3 and 4, respectively, so that the advantages of modeling languages are related directly to the drawbacks of matrix generators.

Section 8 considers the practicality of modeling languages and their translators. It outlines the design of a translator and estimates the costs of running the translator in various modeling situations. As further evidence for the practicality of modeling languages, this section also lists and compares various implementations of linear-programming systems that incorporate some of the concepts of a modeling language; successful modeling languages for other kinds of models are cited as well.

Section 9 briefly considers how a full linear-programming system may be built around a modeling language. This section suggests that the use of a modeling language may benefit several essential tasks of linear programming--notably data management, algorithm control, and result reporting--in addition to translation. The overall economy of modeling languages is touched upon in the concluding remarks of Section 10. Appendixes A and B offer further information on implementations of modeling-language features in numerous systems.

To avoid ambiguity, linear programs are generally called "LPs" in this paper. Sequences of executable statements for a computer are called "programs."

2. FORMS OF A LINEAR PROGRAM

Two forms of an LP, labeled modeler's form and algorithm's form, have been sketched in the Introduction. Each form is discussed at greater length below; a final subsection defines translation from one form to the other.

2.1 Modeler's Form

The modeler is the person who formulates a linear program. This person needs to express both what the LP is and how it relates to the situation being modeled. A modeler's form of an LP is a notation that serves these purposes.

Modeler's forms have certain common characteristics. They are symbolic: they represent most of the problem data by symbols, which are usually mnemonic in nature. They are general: they can define an entire class of LPs together, each particular LP corresponding to some choice of data. They are concise: they describe an LP nearly as briefly as possible, in such a way that the description's length depends on the complexity of the model rather than on the quantity of data or on particular data values. Finally, they are understandable: they present an LP in a form that is easily read and comprehended by people.

A simple but typical example of a modeler's form appears in Figure 1. (Figures 1-5 display various forms of one simple LP. To facilitate comparisons, all five figures are collected in Appendix C at the end of this paper.) This form describes an LP in terms of an objective and constraints. It is certainly symbolic and general; it is also concise, its length being a function only of the number of different varieties of data, variables, and constraints. (One cannot even say whether it represents a "small" or "large" LP, since the numbers of variables and constraints depend on the value of parameter T and the size of sets P and R.) The use of familiar algebraic notation ensures that the description is understandable.
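The observation that a modeler's form does not fix the size of the LP can be illustrated with a small sketch. The Python fragment below is not a rendering of Figure 1; the dictionary `MODEL`, the function `instantiate`, and the index sets `P` and `T` with their contents are assumptions introduced only for illustration. The symbolic description stays the same length while the number of variables and constraints follows from the data supplied.

```python
from itertools import product

# Hypothetical symbolic description: one variable family x[p, t] and one
# constraint family cap[t]. No particular data value appears here.
MODEL = {
    "variables": {"x": ("P", "T")},    # x[p, t] for p in P, t in T
    "constraints": {"cap": ("T",)},    # one capacity constraint per period
}


def instantiate(model, sets):
    """Expand every indexed family over the index sets supplied as data."""
    def expand(families):
        return [
            (name, index)
            for name, index_sets in families.items()
            for index in product(*(sets[s] for s in index_sets))
        ]
    return expand(model["variables"]), expand(model["constraints"])


# Two hypothetical choices of data for the same symbolic description.
small = {"P": ["nuts", "bolts"], "T": [1, 2]}
large = {"P": [f"p{i}" for i in range(50)], "T": list(range(1, 13))}

for data in (small, large):
    variables, constraints = instantiate(MODEL, data)
    print(len(variables), "variables,", len(constraints), "constraints")
```

With the first data set the description expands to 4 variables and 2 constraints, and with the second to 600 variables and 12 constraints, although `MODEL` itself is unchanged.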


The objective-and-constraints form of Figure 1 serves as an example throughout this paper. Since all optimization is fundamentally the minimization or maximization of an objective function subject to constraints, this form is both natural and general. Moreover, as is argued below, this is the form on which modeling languages are most likely to be based.

Nevertheless, there are other candidates for the LP modeler's form. Some models can be naturally expressed in terms of activities, costs, and requirements (see, for example, [19]), and some are best described by nodes, arcs, and network flows. More specialized forms may be appropriate in applications such as refining or portfolio management. Although this paper favors a variable-and-constraint form, the choice of a particular modeler's form is not crucial to its conclusions. The same arguments could be based on any other form that is symbolic, general, concise, and understandable.

2.2 Algorithm's Form

The algorithm is the computational method that finds optimal solutions to the modeler's LPs. Such a method is not concerned with how an LP came to be formulated; it simply needs a description of an LP upon which it can operate efficiently. An algorithm's form is a description of this sort.

Although algorithm's forms also share certain characteristics, they are distinctly different from modeler's forms. They are explicit rather than symbolic: they incorporate numerical problem data directly in the LP description. They are specific rather than general: they describe just one or a few particular LPs. They are redundant rather than concise: they describe an LP more extensively than necessary, and the length of their description depends on the number and size of the data values. Finally, they are convenient rather than understandable: they organize an LP so that it can be stored and operated upon most efficiently by the computer.

Figure 5 depicts a common algorithm's form--a supersparse representation of variables' coefficients for the simplex method--which is used to describe the same LP as the modeler's form of Figure 1. This example is clearly explicit and specific. Its redundancy is evident in several ways: information about the arrangement of nonzeros is repeated for each period, numerical data values are copied into a special array, and the length of the description is a function of the numbers of products, raw materials, and periods. As for convenience, the form of Figure 5 is well appreciated in design of LP systems [10, 20], yet it so disguises the modeler's intent that it is useless for formulating or understanding an LP.

There are many acceptable algorithm's forms for an LP, just as there are many acceptable modeler's forms. Choice of an algorithm's form depends on the algorithm and its implementation.
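A rough idea of such a variable-by-variable form can be given in a few lines of Python. The sketch below is a simplified assumption, not the layout of Figure 5: the function `to_column_form`, its array names, and the small coefficient matrix are all hypothetical. It stores the nonzeros column by column and keeps each distinct numerical value once in a separate pool, in the spirit of the "special array" of data values mentioned above.

```python
def to_column_form(dense):
    """Store nonzeros column by column, with a pool of distinct values.

    Returns (col_start, row_idx, val_idx, pool), where the nonzeros of
    column j occupy positions col_start[j] to col_start[j + 1] - 1, and
    pool[val_idx[k]] is the numerical value of the k-th stored nonzero.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    pool = []          # each distinct nonzero value, stored once
    pool_index = {}    # value -> its position in pool
    col_start = [0]
    row_idx = []
    val_idx = []
    for j in range(n_cols):
        for i in range(n_rows):
            a = dense[i][j]
            if a != 0:
                if a not in pool_index:
                    pool_index[a] = len(pool)
                    pool.append(a)
                row_idx.append(i)
                val_idx.append(pool_index[a])
        col_start.append(len(row_idx))
    return col_start, row_idx, val_idx, pool


# A hypothetical 3-by-4 coefficient matrix with repeated values.
dense = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 2.5, 0.0, 2.5],
    [1.0, 1.0, 0.0, 0.0],
]
col_start, row_idx, val_idx, pool = to_column_form(dense)
print(col_start)  # [0, 2, 4, 5, 6]
print(row_idx)    # [0, 2, 1, 2, 0, 1]
print(val_idx)    # [0, 0, 1, 0, 0, 1]
print(pool)       # [1.0, 2.5]
```

The four arrays are explicit and specific, and their length grows with the number of nonzeros; nothing in them reveals which constraint or variable of the model a given entry came from.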
The main arguments herein, however, will not depend on any particular algorithm's form, but will apply to any explicit, specific, and redundant form that is convenient to an algorithm.

2.3 Translation

Both of the above forms are essential to linear programming: every practical LP of any size is conceived in a modeler's form and solved in an algorithm's form. Yet the fundamental characteristics of these two forms are opposite and incompatible. No single means of expression could serve as both a modeler's and an algorithm's form.

Practical linear programming thus requires a conversion between the two forms. Most importantly, the modeler's-form conception of an LP must be transformed to an algorithm's form before the LP is solved or analyzed. The work involved in this transformation is here called translation.

Since the modeler's form is written by a person and the algorithm's form is read by a computer, translation necessarily entails both human work and machine computation. Modeling languages and matrix generators differ most fundamentally in how they divide this task of translation between human and machine. Thus translation is a central topic of this paper.

Because the modeler's form is symbolic and the algorithm's form is explicit, translation must involve the substitution of appropriate explicit data values for certain symbolic parameters. In order to specify such a substitution, a modeler must first gather the data values and define their correspondence with the parameters; such a task is called data management in the sequel.

Conceptually, data management and translation are independent tasks; the former creates and organizes data values, while the latter merely uses them. As a practical matter, however, the implementation of either of these tasks has certain implications for the implementation of the other. Thus, although this paper is concerned primarily with translation, data management is also considered where it bears on the comparison of modeling languages and matrix generators.

3. GENERAL DRAWBACKS OF MATRIX GENERATORS

A matrix generator is a computer program that writes out the algorithm's form--the coefficient matrix--of an LP. Translating an LP from modeler's to algorithm's form traditionally involves both a person and a matrix generator: first a person reads the modeler's form and writes a matrix-generator program for it; then the matrix-generator program is executed to produce an algorithm's form.

Since a matrix-generator program is itself a description of an LP model, it is properly viewed as just another form of LP. Such a matrix-generator form can be symbolic or explicit, general or specific, and concise or redundant to various degrees, but its most important qualities are understandability as a programming language and convenience to a compiler or interpreter. Consequently, a matrix generator typically incorporates loops, assignments, transfers of control, or other executable statements, which are more valuable in describing a computer program than in describing an LP.

Matrix-generator form is thus not a modeler's form as previously defined. It is instead a distinct intermediate form. Whereas modeler's forms are designed for formulating and communicating LPs, matrix-generator forms are intended primarily to facilitate translation to an algorithm's form. Because modeler's form and matrix-generator form are dissimilar, converting from the one to the other--which must still be done by a person--is not a trivial job. Indeed, writing a matrix generator is a job of computer programming, and has all of the characteristics of any programming task.

Matrix generation is certainly a great advance over translation by human labor alone, especially for very large LPs. Nevertheless, there are substantial difficulties inherent in the use of matrix generators. This section first considers difficulties of three kinds--verifiability, modifiability, and documentability--that are inherent in any sort of modeling, but that are exacerbated by the introduction of matrix-generator form and computer programming. Subsequently, the drawbacks of matrix generators in two additional areas--independence and simplicity--are also argued. Specific drawbacks of some of the standard features of matrix-generator design are considered separately in the following section.

It will be convenient in the sequel to refer to matrix generators as MG programs or just MGs. A programming language intended for writing matrix generators will thus be called an MG language, and a processor for such a language will yield an MG system. Analogously, the matrix-generator form of an LP will be called the MG form.

3.1 Verifiability

A central problem of linear programming is verification: determining whether the modeler's form correctly represents reality, and determining whether the algorithm's form is a correct translation of the modeler's form. Verifying the modeler's form (or validating the model) is a task inherent in modeling, whereas verifying the algorithm's form is a job that varies according to how the translation is carried out.

Under matrix generation, in particular, verifying the algorithm's form amounts to debugging an MG computer program. MG verification is thus subject to the problems of cost and reliability that are normally associated with program debugging.

Moreover, MG programs tend to be especially difficult to debug. Their output--the algorithm's form--is voluminous and is not meant to be read by people; hence, direct manual methods of verification, such as inspection of listings of the MG output, are neither efficient nor reliable for large LPs. Instead, MGs are most often debugged by a series of indirect or automated methods. First, an MG program may be double-checked by a person who may recognize obvious inconsistencies with the modeler's form. Second, an MG may be run through the MG system, which may signal certain errors in compilation or execution. Third, an MG's output may be examined or tested by specially designed computer routines: examination routines (such as those in PERUSE [23]) simply read the MG output and display parts of it more understandably, while more sophisticated testing routines (such as those incorporated in ANALYZE [11, 35] and the CHEK component of RPMS [2]) conduct diagnostics designed to reveal common errors. Finally, an MG's output may be fed to the simplex algorithm, where remaining errors may be reflected in infeasibility, unboundedness, or implausibility of an optimal solution.
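The third kind of check, automated testing of MG output, might be sketched as follows. This Python fragment is only an assumption about the general flavor of such tests; it does not reproduce the diagnostics of ANALYZE, CHEK, or any other cited system, and the function `diagnose`, the row and column names, and the coefficient lists are hypothetical.

```python
def diagnose(rows, cols, entries):
    """Return warnings about a generated list of (row, column, value) triples."""
    warnings = []
    used_rows = {r for r, _, _ in entries}
    used_cols = {c for _, c, _ in entries}
    # Declared rows or columns that never receive a coefficient.
    warnings += [f"row {r} has no coefficients" for r in rows if r not in used_rows]
    warnings += [f"column {c} has no coefficients" for c in cols if c not in used_cols]
    seen = set()
    for r, c, _ in entries:
        # Coefficients that refer to names which were never declared.
        if r not in rows or c not in cols:
            warnings.append(f"coefficient ({r}, {c}) uses an undeclared name")
        # The same matrix position generated more than once.
        if (r, c) in seen:
            warnings.append(f"coefficient ({r}, {c}) is given twice")
        seen.add((r, c))
    return warnings


rows = ["CAP1", "CAP2"]
cols = ["XA1", "XA2"]

# A structural error: the second coefficient lands in a nonexistent row.
misplaced = [("CAP1", "XA1", 1.0), ("CAP3", "XA2", 1.0)]
# A numerical error: the second coefficient was meant to be 1.0.
wrong_value = [("CAP1", "XA1", 1.0), ("CAP2", "XA2", 10.0)]

print(diagnose(rows, cols, misplaced))
print(diagnose(rows, cols, wrong_value))
```

The structural error yields two warnings, while the wrong coefficient value yields none. Tests of this kind examine the generated matrix rather than the modeler's intent, so some errors pass through them unnoticed.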
Usually the nature of an abnormal LP solution offers some clue to the location of an MG error, but a certain amount of detective work--by hand or by use of computer routines--is needed to actually find the error.

Debugging by the above methods is a substantial task that must be repeated for each new MG program, often at considerable cost in human time, computer time, or both. Yet these methods, since they rely on heuristic and indirect approaches, are only moderately reliable. Even an erroneous MG can look correct to a person, can generate output that passes many diagnostic tests, and can represent an LP that has a plausible solution. Thus there is normally a nonnegligible risk, in the use of an MG, that the wrong LP will be generated, solved, and analyzed.

Additionally, employment of the above indirect methods to debug an MG and verify the algorithm's form tends to complicate the task of validating the model. Certain signs of an incorrect model--particularly failure of diagnostic tests and abnormality of the solution--are much the same as the indirect signs of an incorrect MG. In response to such signs, the modeler must consider the possibility of an MG error, even when only a model error is causing the difficulty.

3.2 Modifiability

Parallel to verification, there is the problem of modification: determining a new modeler's form when there are changes in reality, and determining a new algorithm's form when there are changes in the modeler's form. Modifying the modeler's form is a task inherent in modeling, whereas modifying the algorithm's form is a job that varies depending on the means of translation.

Under matrix generation, modifying the algorithm's form amounts to revising the MG computer program. Revising the MG tends--as for any computer program--to introduce new bugs, thus raising again all of the problems of debugging discussed above.

MG programs are especially prone to modification. Whenever an LP model is changed--whether to reflect changes in reality, to test a new hypothesis, or to correct a deficiency in the model--the corresponding MG must also be changed. Thus it is often impractical to freeze an MG in the interests of reliability or economy, as is done with other application programs. Instead, modifying and debugging the MG can become a fairly persistent task.

3.3 Documentability

Along with verification and modification, there is the unavoidable problem of documentation: maintaining understandable information about the LP's forms and their relationships. The modeler's form must be documented with an explanation of how it is intended to represent reality, and each algorithm's form must be documented with a record of the model and data that it represents.

In matrix generation, it is also necessary to document the MG program. Thus the modeler also encounters all of the usual difficulties of documenting a computer program. Adequate documentation is time-consuming and unpopular with programmers, yet inadequate documentation can eventually make it harder to use or change a program.

Documentation of MG programs, moreover, tends to be particularly extensive and essential. An MG requires internal documentation of its logic and organization, just like any computer program, plus external documentation of how it represents the LP modeler's form. Good MG documentation thus is fairly lengthy, both because MG form is different from modeler's form and because MG form is not especially understandable.

Inadequate MG documentation can severely complicate the already difficult problems of verification and modification. Poor internal documentation makes it harder to debug the MG, while poor external documentation makes it harder to debug the model. In extreme cases of the latter, the relationship between a model and its matrix generator may become quite obscure, so that eventually no modeler can remember exactly what LP the MG represents.

3.4 Independence

A further weakness of matrix generators lies in their involvement with the algorithm's form. Whereas a modeler's form of a linear program is independent of other forms, an MG form is naturally tied to a particular algorithm's form, since the former specifies how the latter is written. Indeed, primitive matrix generators contain executable statements that describe the writing of algorithm's form in great detail. More sophisticated MG languages handle these details implicitly, but their designs still necessarily reflect the intended algorithm's form.

Interdependence of MG form and algorithm's form has two drawbacks. First, modelers who write MGs are forced to learn about a particular algorithm's form, even though they are really interested only in a modeler's form. Second, people who implement algorithms are forced to accept the particular algorithm's forms that are written by existing MGs, even though other forms may be preferable.

It is possible to avoid both of these drawbacks only at some cost. To insulate the modeler from algorithm's form, special computer systems may be devised to automatically generate MG programs according to a modeler's instructions; such systems are a further expense, however, and they further complicate the problems of verification, modification, and documentation. To insulate the algorithm designer from existing particular algorithm's forms, programs may be written to convert between existing forms and new ones; such programs, too, are an additional expense.

3.5 Simplicity

A final drawback of matrix generation stems from its overall influence on the job of translation. By introducing another form of LP, the MG form, matrix generation inevitably complicates the conceptually simple job of translating from modeler's to algorithm's form. The MG user must deal both with the task of formulating an LP in an understandable form and with the task of programming an LP in an executable form, and must plan a conversion from the first of these forms to the second.

It is reasonable to ask whether there may be a simpler and more straightforward approach. After all, while the modeler's and algorithm's forms are inherent to linear programming, the MG form is purely a consequence of computer-system design.

4. SPECIFIC DRAWBACKS OF CURRENT MATRIX GENERATORS

Existing matrix-generation systems include DATAFORM [21], DATAMAT [25], GAMMA [32, 33, 34], MaGen and OMNI/PDS [12, 13], IBM MGRW [16], APEX-II MRG [5], and MODELER [4]. These systems have quite similar philosophies of design, though they differ more or less in details. The comments below thus apply to all of them, with only a few exceptions.¹

¹ A more extensive introduction to the organization and use of current MG systems is contained in the textbook by Murtagh [28].


Each of these systems incorporates its own MG language in which its MG programs are written. Additionally, each system writes its output in a standard kind of algorithm's form known as MPS form [17]. Virtually all commercial simplex-method implementations accept this MPS form as input, then translate it to a compact specialized algorithm's form that is more suitable to efficient computer routines. Particular variations and extensions of MPS form are also accepted by particular commercial codes, such as CDC APEX-III [6] and Honeywell MPS [15].

Current MG systems thus commonly presume a sequence of four forms: modeler's form, MG form, standard (MPS) algorithm's form, and specialized algorithm's form. Only the first of these is of intrinsic interest to the modeler, and only the last is actually employed by a simplex-method code to solve an LP problem. The other two forms are intermediaries in the process of translation.

The modeler's form and specialized algorithm's form for a sample LP are displayed, as noted previously, in Figures 1 and 5 (Appendix C). A MaGen MG program for the same LP is presented in Figure 3, and the MPS-form output of this program is in Figure 4. Even a cursory examination shows that each of these forms is quite different from any of the others.

The most striking feature of Figure 3 is its use of a special-purpose matrix-generator language; all of the above-cited MG systems employ a language of this kind. In principle, any sufficiently powerful programming language may serve to write MGs. FORTRAN may serve as an MG language, for example, and a FORTRAN compiler may be part of an MG system. From the programmer's point of view, however, there are definite advantages to using an MG language that is specially designed for matrix generation.
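The remark that a general-purpose language can serve to write MGs may be made concrete with a brief sketch, here in Python rather than FORTRAN. The LP, the function `generate`, and all data values are hypothetical and are not those of Figures 3 and 4. The output follows the section structure of MPS form (NAME, ROWS, COLUMNS, RHS, ENDATA), but the exact column positions of the fixed format are not reproduced.

```python
def generate(products, periods, profit, hours, capacity):
    """Write MPS-style records for a hypothetical production LP.

    Variables are X<product><period>; there is one objective row PROFIT
    and one capacity row CAP<period> for each period.
    """
    lines = ["NAME          SKETCH", "ROWS", " N  PROFIT"]
    for t in periods:
        lines.append(f" L  CAP{t}")
    lines.append("COLUMNS")
    for p in products:
        for t in periods:
            col = f"X{p}{t}"      # name built by concatenating short codes
            row = f"CAP{t}"
            if len(col) > 8 or len(row) > 8:
                raise ValueError("MPS names are limited to eight characters")
            lines.append(f"    {col:<8}  {'PROFIT':<8}  {profit[p]:g}")
            lines.append(f"    {col:<8}  {row:<8}  {hours[p]:g}")
    lines.append("RHS")
    for t in periods:
        row = f"CAP{t}"
        lines.append(f"    {'RHS':<8}  {row:<8}  {capacity[t]:g}")
    lines.append("ENDATA")
    return lines


# Hypothetical data: two products, two periods.
lines = generate(
    products=["A", "B"],
    periods=[1, 2],
    profit={"A": 3, "B": 5},
    hours={"A": 1, "B": 2},
    capacity={1: 40, 2: 40},
)
print("\n".join(lines))
```

Even in this small case the program consists mostly of loops, string formatting, and record writing, and the structure of the LP has to be inferred from them.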
Programs are easier to write in a special-purpose MG language, because it explicitly deals with major features of an LP (sets, parameters, rows, columns, bounds) and implicitly handles the minor details of writing the MPS form. A specialized MG language also fosters more reliable programming by imposing certain conventions and structures on MGs.

Nevertheless, from the modeler's point of view, current special-purpose MG languages still have serious drawbacks. Most importantly, these languages are influenced only weakly by LP modeler's forms. They are much more strongly influenced by algorithm's forms--particularly MPS form--and by the data formats of MG systems. As a consequence, the modeler who uses one of these languages must deal with various conventions and restrictions of MG data formats and of MPS form, at the cost of extra work and complication.

This section presents in detail the drawbacks of current MG languages and systems. It first considers weaknesses in the representation of MG data, then looks at problems in three areas--naming of LP components, ordering of coefficients, and representation of special constraints--where the influence of MPS form is particularly strong.

4.1 Representation of Data

There is a significant ambiguity in the representation of data in a typical MG program, such as the one of Figure 3. Strictly speaking, only the statements beginning with the first "COPY" are executed to generate the LP matrix; the preceding DICTIONARY and DATA sections merely define one instance of the LP's set and parameter data. Yet, to any reader of Figure 3, the DICTIONARY and DATA sections also serve as a general description of classes and tables that the LP requires; without these sections the figure would be incomplete.

Data representation for an MG system such as MaGen is thus a compromise between the symbolic description in modeler's form and the explicitness of algorithm's form. This is a workable arrangement, but it has two sorts of drawbacks for data handling in MG systems.

First, MG data-handling is inflexible. Any data to be used in the LP, from whatever source, must be converted to the form of the CLASS and TABLE statements in order to be used by the MG program. These statement forms necessarily restrict the available data and file structures. MaGen, for example, requires that all numerical data be expressed in the form of two-way tables. There is no provision for expressing data as scalars, vectors, or three-way tables; nor is there a convenient way of expressing large sparse tables compactly.

Second, MG data-handling complicates verification. Because the explicit and symbolic representations of MG data are combined, an error in specifying particular data values can easily lead to structural errors in the generated matrix. As an example, one typographical error in a CLASS listing can cause an MG to produce nonsensical variables. Such an error is especially likely when CLASS and TABLE data are repeatedly stored, modified, and recalled.

4.2 Naming of LP Components

A prominent feature of the MPS form in Figure 4 is the use of short "names" to identify model components. In the ROWS section, each constraint is defined by a unique name of eight or fewer characters. In the COLUMNS section, each variable is similarly defined by a unique eight-character name; each nonzero coefficient of a variable is also defined by giving its value and the row name of the constraint in which it lies. Simple bounds on variables are similarly defined in the BOUNDS section.

Since current MG systems write MPS form, one of their major tasks is to form distinct eight-character names in some systematic way. This task has heavily influenced the design of matrix-generator languages, as Figure 3 shows clearly. In the DATA section, which specifies the LP's numerical data, every table row and column is labeled by a short string of characters. Subsequently, a special kind of compact expression--for example, "X(PRD)(T)"--indicates how these short strings are to be concatenated to form row and column names. In the COLUMNS and BOUNDS sections, the relationship between short strings and tables is exploited to specify the data values for coefficients and bounds.

Used carefully, the name-manipulation features of a matrix-generator language can serve much the same purpose as subscripting in a modeler's form. Each CLASS in the DICTIONARY section is similar to a set of indices; an expression such as X(PRD)(T) can be regarded as defining a collection of variables subscripted by the indices in classes PRD and T.

The principal drawback of MPS naming is its inflexibility: all indexing information for constraints and variables must be encoded in fixed-length character strings, regardless of the complexity or nature of the indexing. MPS naming schemes become progressively more difficult to construct as variables are indexed over more or larger sets, since normally each index set must be represented by at least one character in an MPS code and large sets may require two or three of the eight available characters. Numerical indices also pose special difficulties because numerical values must be converted to character strings within MPS names. Arithmetic operations on numerical indices in expressions such as $s_{i,t+1}$ are particularly troublesome and are often handled awkwardly by MG languages (as in Figure 3).

The inflexibility of MPS names also tends to make them hard to understand. As one introduction to GAMMA [32] states, "Analysts normally attempt to associate some mnemonic significance with the variable label, but when the complexity and number of variables becomes large, the readability of variables becomes nearly hopeless." To the extent that it is difficult to decipher MPS form, it is also difficult to debug the MG program, as explained previously. Specialized systems, such as the aforementioned PERUSE and ANALYZE, may help with debugging, but even these systems rely on the modeler's understanding of the MPS naming scheme.

Finally, inflexibility of MPS names adds to the work of linear programming by requiring the modeler to plan and document string lengths and naming conventions. This extra work is not a fundamental part of LP modeling; it is necessitated only by the particular design of MG systems and MPS form.

4.3 Ordering of Coefficients
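The naming conventions of Section 4.2 and the columnwise ordering considered in this section can be seen together in a small Python sketch. It is only an illustration under stated assumptions: the balance constraint, the prefixes BAL, X, and S, the product code P1, and the helper functions `name`, `rowwise_balance`, and `to_columnwise` are all hypothetical, and the printed layout is MPS-like rather than an exact fixed-field MPS file. The sketch states a constraint row by row, converts the numerical period index (including t + 1) into a zero-padded string inside each name, and then regroups the coefficients by column as a COLUMNS section requires.

```python
from collections import defaultdict


def name(prefix, p, t):
    """Concatenate a prefix, a product code, and a zero-padded period number."""
    label = f"{prefix}{p}{t:02d}"
    if len(label) > 8:                      # MPS names hold at most eight characters
        raise ValueError(f"name {label!r} exceeds eight characters")
    return label


def rowwise_balance(products, n_periods):
    """Yield (row, column, coefficient) triples, constraint by constraint.

    Assumed illustrative constraint for each product p and period t:
        x[p,t] + s[p,t] - s[p,t+1] = demand[p,t]
    """
    for p in products:
        for t in range(1, n_periods + 1):
            row = name("BAL", p, t)
            yield row, name("X", p, t), 1.0
            yield row, name("S", p, t), 1.0
            # Index arithmetic is done on the integer, then converted to a string.
            yield row, name("S", p, t + 1), -1.0


def to_columnwise(triples):
    """Regroup row-wise triples so that each column lists its own nonzeros."""
    columns = defaultdict(list)
    for row, col, coef in triples:
        columns[col].append((row, coef))
    return dict(columns)


columns = to_columnwise(rowwise_balance(["P1"], 2))
for col in sorted(columns):
    for row, coef in columns[col]:
        print(f"    {col:<8}  {row:<8}  {coef:12.1f}")
```

The regrouping step in `to_columnwise` is the work that a columnwise MG language leaves to the person writing the MG program when the model is conceived row by row. The length check in `name` shows how quickly the eight-character limit is reached: a five-character product code with the BAL prefix and a two-digit period already overflows it.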

MPS form, like the more specialized algorithm's forms, contains a column-by-column list of the nonzero elements of the LP constraint matrix. Consequently, many popular MG languages require that nonzeros be specified variable by variable. In the MaGen program of Figure 3, for example, the ROWS section gives only the names of the constraints, whereas the following COLUMNS section both names each variable and indicates each variable's nonzero coefficients. This columnwise arrangement, which is exactly parallel to the arrangement of MPS form, is part of the MaGen language; there is no option to put the COLUMNS section first and list the coefficients in the ROWS section.

Other MG systems, such as DATAMAT, GAMMA, and IBM MGRW, do provide for row-by-row specification of coefficients. Even so, their design tends to favor the columnwise arrangement of MPS form; row-wise specification of coefficients may be less convenient or may be handled less efficiently. A GAMMA manual [34], for example, advises under "Row-Wise Coefficient Statements" that "the user should use this option sparingly when defining large problems." Similar advice is given in an MGRW manual [16] under "Generation by Row or Column."

This columnwise bias of current MGs is justifiable when LPs have a natural formulation by column (or "by activity"). Since the concept of a "natural" formulation is open to interpretation, the prevalence of natural columnwise formulations is a matter of some debate. It can be argued that, in practical applications, more LPs are naturally columnwise than are naturally row-wise. Further, the columnwise bias of current MG systems and languages can be cited as evidence that modelers prefer columnwise formulations.

Nevertheless, a strong case can be made for the naturalness of row-wise (or constraintwise) formulations, as in the example of Figure 1. For one thing, it is natural to regard an LP as a problem of optimization, and the general form of an optimization problem--minimization or maximization of an objective, subject to constraints--is essentially row-wise. Furthermore, constraintwise descriptions are greatly favored in written reports of LPs, which presumably reflect how modelers naturally think of these problems. Finally, there is good reason to believe that the columnwise organization of current MG languages was dictated more by convenience and habit than by user preference, as Orchard-Hays [29] suggests.

When LPs are formulated by constraint, the columnwise bias of MG systems is a considerable inconvenience. Columnwise bias complicates the job of translation by requiring the writer of an MG program to convert from row-wise to columnwise organization. Columnwise MG form is also harder for people to verify and modify when the LP is naturally thought of in row-wise terms. The columnwise treatment of objectives and right-hand sides is particularly inconvenient; coefficients for all objective functions are mixed in with variables' other coefficients, while all right-hand-side constants are collected in a separate section (headed RHS in Figure 3).

From the modeler's standpoint, therefore, MG languages' columnwise bias is arguably a drawback for linear programming. At least, to the extent that the modeler's form is naturally constraintwise, a columnwise MG form is inconvenient and unnatural. Only from a computational standpoint do columnwise MG programs have some advantage, as they generally execute efficiently in writing the columnwise algorithm's form.

In addition, MG languages' columnwise bias is certainly a drawback for nonlinear programming. The columnwise expression of coefficients does not naturally generalize to the expression of varied nonlinear objectives or constraints. Consequently, current MGs are not conveniently adapted to handle nonlinear models.

4.4 Representation of Special Constraints

An efficient algorithm's form stores upper and lower bounds on the variables separately from other constraints, so that the bounds can be handled directly by the simplex method. Thus MPS form, in particular, has a separate section in which variables' bounds are specified. As a consequence, there is a corresponding BOUNDS section in the MaGen program of Figure 3; any simple bounding constraint that is to be handled as such by the simplex method must be specified in this BOUNDS section rather than in the FORM ROW ID and COLUMNS sections.

Similar comments apply to other kinds of special constraints. Upper bounds on slack or surplus variables must be specified in a RANGES section of MPS form, and various kinds of generalized upper bounds require marker cards of different sorts. These are handled in an MG program by special statements that generate the required headings and markers.

By contrast, modeler's forms express special constraints in the same way as other constraints. Simple upper bounds are written, for example, "x,