Programming Shorthands
Todd A. Proebsting, Benjamin G. Zorn
January 19, 2000
Microsoft Research Technical Report MSR-TR-2000-03

© 2000 Todd A. Proebsting and Benjamin G. Zorn. Microsoft Research, Microsoft Corp., One Microsoft Way, Redmond, WA 98052, USA


Abstract

We propose programming language mechanisms to reduce redundancy in program source code. These abbreviation mechanisms, shorthands, make programs shorter and easier to write and read. In addition, we provide a framework for describing language abbreviation mechanisms.

1 Introduction

Most computer programs contain redundant descriptions of similar entities. For instance, it is not uncommon to see program fragments like the following, which includes a repeated expression.

    foo.diffusion_array[i] = xi;
    foo.diffusion_array[j] = yi;
    foo.diffusion_array[k] = zi;

Such repetition is annoying to write and provides little benefit in terms of program readability. It is possible to eliminate some kinds of redundancy by naming redundant entities, either through variable bindings or macro substitutions, and then repeating that name:

    tmp = foo.diffusion_array;
    tmp[i] = xi;
    tmp[j] = yi;
    tmp[k] = zi;

Names can reduce redundancy, but they are tedious to invent and are often cumbersome or impossible to use cleanly. For example, the following contains redundancy that would be impossible to eliminate in Algol-like languages without resorting to awkward and inelegant macro processing.

    fn(a, 0, 1, 2);
    fn(b, 0, 1, 2);
    fn(c, 0, 1, 2);

The macro solution is awkward and hard to read:

    #define FN(x) fn((x), 0, 1, 2)
    FN(a);
    FN(b);
    FN(c);

We present new mechanisms for reducing program redundancy while maintaining program readability. Our mechanisms are designed to make writing programs easier by eliminating the need to define new names (or macros) for repeated program segments. Eliminating names is a substantial benefit: programmers dislike creating names [1, 6].

Mechanisms for abbreviating programs have existed for some time. Pascal's with construct eliminates the need to repeat references to the same structure. Similarly, C's "x++" abbreviates the common idiom "x=x+1". As a sample of the kinds of language mechanisms that we favor, we offer alternatives to our previous examples that are easier to read and to write.

    foo.diffusion_array[i] = xi;
    [j] = yi;
    [k] = zi;

    fn(a, 0, 1, 2);
    fn(b, ... );
    fn(c, ... );

Clearly, these examples look unusual. That is part of the contribution of this research: to challenge the conventional wisdom about what programming languages should look like and what features they should support. We are proposing language features that support what programmers actually want to do, namely write short programs quickly (i.e., "programming-in-a-hurry") without losing readability. To this end, we propose program abbreviation mechanisms, which we call "shorthands." To facilitate reasoning about and comparing these mechanisms, we outline a framework for classifying program shorthands.

This work is meant to focus attention on the important issue of supporting programmers as they actually work. We've all heard programmers boast how their language (e.g., Perl, Tcl, etc.) allowed them to write some concise program. This work was motivated, in part, by the abbreviation mechanisms found in languages like Perl and Tcl. We hope to launch a principled discussion of mechanisms for making programs shorter, and to propose a few means for doing this. We have accomplished our goal with this paper if every programmer reading it says at some point: "I could have really used that feature." Further, we encourage language designers to address explicitly what shorthands they are including in their language.

2 Names

Natural language is highly effective at compressing and unambiguously expressing complex concepts. Words (names) provide a concise encoding that provides significant compression with little loss of information. Compression is achieved in natural language in two ways: large vocabularies and pronouns. Natural languages have very limited forms of user-defined names (proper nouns) and instead support great expressiveness by providing large fixed vocabularies. Further compression is achieved by providing pronouns whose referent is context dependent. Most people would consider the sentence "The Archbishop of Canterbury entered the pub where the Archbishop of Canterbury ordered a pint of ale" too long. Substituting the pronoun "he" for the second occurrence of "Archbishop of Canterbury" improves the sentence considerably, making it easier to read (and write!). Note that the use of pronouns does not require the creation of a new name in order to shorten the sentence (see footnote 1).

Unlike natural languages, programming languages typically have a small fixed vocabulary (builtins, keywords) and a larger user-defined vocabulary (function names, types, local variables, etc.). As a result, a significant part of the effort of writing a program is deciding what things to name and what to name them. While programmers have many naming decisions to make, languages typically provide few mechanisms beyond definition facilities to help them make these decisions. A well-designed language will provide the user both with features to create new names and with features to avoid creating unnecessary names.

Every additional name added to a program has associated costs. The programmer has the burden of choosing an appropriate name, declaring the entity being named in languages with declarations, and ensuring that the name does not conflict with existing names in the namespace. As more names are introduced, the mental task of remembering all names and their scopes becomes increasingly difficult. A person reading a program with many unfamiliar names has the burden of remembering what each name means.

One of the widely used features in C++ and ANSI C is the ability to mix statements and declarations in the body of a function. The main advantage of this feature is that it reduces the effort required by programmers when they need to create new names.

Because introducing new names is an effective way to make programs more concise, programmers are often tempted to create them. Programming shorthands, akin to pronouns in natural language, allow programmers to achieve conciseness in programs without resorting to creating new names.

Footnote 1: In fact, modern programming language design would dictate that the original sentence be either left alone, or turned into something like "Let AC be (Archbishop of Canterbury) in (The AC entered the pub where the AC ordered a pint of ale.)" This hardly seems an improvement.

3 Framework

Before discussing specific proposals for shorthand notations, we first introduce a framework for classifying the way that names and abbreviations are used. First, it is necessary to figure out how programmers will refer to things. We see three interesting ways to refer to things:

Full Name: Items can be referred to by their full names, which usually consist of identifiers and data-access operators. For example,

    foo.bar.sam = 3;

Pronoun: Items can be referred to by an abbreviated name. In Perl, for instance, one can access a subroutine's parameter array as @_.

Anonymous (Default): Items can be referred to by no name at all. For instance, method calls in most object-oriented languages default to dispatch on the current object. I.e., foo() is a shorthand for self.foo().

The other dimension of our classification framework is determining the referent of a particular name. Full names are easily understood: they refer to exactly what they name. Pronouns and default references must refer to something else, however. We see two possibilities for these references:

Programmer-defined Referent: Pronouns (or default values) can be explicitly bound by the programmer. For instance, in Perl it is possible to refer to a sub-match of a regular expression by putting the sub-expression inside parentheses and then referring to the matched value as $n for the nth such sub-expression.

Language-defined Referent: Many languages have pronouns that refer to specific values by convention. For instance, AWK uses $1 to refer to the first field of a parsed input record. Similarly, Java uses this to refer to the current object within a virtual method.

Thus, we have outlined two dimensions to the problem of naming individual program entities: how the noun is named, and how the referent is bound. The table below illustrates the different combinations along these dimensions, using the following simple code fragment as an example.

    a.b.c.d.e = x+y;
    a.b.c.d.f = x+z;

(The examples are fully explained in the subsequent section; the table is simply designed to shed light on the two dimensions of the framework.)

Table: combinations of naming (rows) and referent binding (Programmer-Defined or Language-Defined).

Name: Full Name
  Programmer-Defined referent:
      a.b.c.d.e = x+y;
      a.b.c.d.f = x+z;
    Full names explicitly refer to their values; there is no implicit referent, either programmer- or language-defined.
  Language-Defined referent:
    (N/A)

Name: Pronoun
  Programmer-Defined referent:
      (a.b.c.d).e = x+y;
      $().f = x+z;
    or
      (a.b.c.d).e = x+y;
      '' .f = x+z;
    The programmer-defined referent is the most-recently parenthesized expression. The pronouns are either $() or ''.
  Language-Defined referent:
      a.b.c.d.e = x+y;
      '' .f = x+z;
    The language-defined referent of the pronoun '' is the last expression to which "." was applied.

Name: Anonymous
  Programmer-Defined referent:
      (a.b.c.d).e = x+y;
      .f = x+z;
    The programmer-defined referent of the missing (i.e., anonymous) expression is the most-recently parenthesized expression. Therefore, .f is applied to that parenthesized expression.
  Language-Defined referent:
      a.b.c.d.e = x+y;
      .f = x+z;
    The .f is applied to the language-defined referent of the missing expression. This referent is the most recent expression to which "." was applied.

4 Proposed Shorthands

Our proposed shorthands use pronoun and anonymous naming as well as programmer-defined and language-defined referents. The list is not meant to be exhaustive, but rather to demonstrate that shorthands can yield concise, unambiguous programs that are easy to read and write.

4.1 Parenthesized Referents

A simple, concise mechanism for a programmer to denote a referent is to put it inside parentheses. A mnemonic pronoun to refer to the most recently parenthesized expression could be $(), if $ is used to prefix pronouns. A simple example follows.

    (foo.diffusion_array)[i] = xi;
    $()[j] = yi;
    $()[k] = zi;

Similarly, this example makes sense with anonymous naming:

    (foo.diffusion_array)[i] = xi;
    [j] = yi;
    [k] = zi;

Please note that it is not our intent to promote a particular pronoun (e.g., $()), or to argue for or against anonymous naming, but rather to point out that these concepts are convenient and should be considered.
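
One way to make the parenthesized-referent rule concrete is a text-level expansion pass. The Python sketch below is an illustration only, not an implementation from this report: the function name expand_paren_pronoun is hypothetical, and it assumes that every balanced parenthesized group (including a call's argument list) counts as a referent and that $() always denotes the textually closest preceding group.

```python
def expand_paren_pronoun(source: str, pronoun: str = "$()") -> str:
    """Replace each pronoun with the most recently closed "( ... )" group."""
    out = ""
    open_stack = []    # offsets in `out` of the currently open "("
    last_group = None  # text of the most recently closed group, parentheses included
    i = 0
    while i < len(source):
        if source.startswith(pronoun, i):
            if last_group is None:
                raise ValueError(f"{pronoun} used before any parenthesized expression")
            out += last_group
            i += len(pronoun)
        elif source[i] == "(":
            open_stack.append(len(out))
            out += "("
            i += 1
        elif source[i] == ")":
            out += ")"
            if open_stack:
                last_group = out[open_stack.pop():]
            i += 1
        else:
            out += source[i]
            i += 1
    return out


print(expand_paren_pronoun(
    "(foo.diffusion_array)[i] = xi; $()[j] = yi; $()[k] = zi;"))
```

Because the pass works on raw text, it has no notion of scope or of which parentheses the programmer meant as a binding; a parser-based version would need to decide both.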

4.2 Similar-Use Referent

It is common to apply the same operator (or method) repeatedly to the same expression with different operands, or in different contexts. For this situation, pronoun or anonymous naming can refer to the last expression to which a given operator was applied. In the following example, the pronoun '' (ditto) simply refers to the last expression to be used in a similar context:

    foo.diffusion_array[i] = xi;
    ''[j] = yi;
    ''[k] = zi;

Anonymous naming works similarly:

    foo.diffusion_array[i] = xi;
    [j] = yi;
    [k] = zi;

Repetitive calls on the same function could be abbreviated with either a pronoun or anonymous naming of the referents:

    foo(x + y, z, bar(g));
    foo(x - y, ...);
    foo(x + 1, ...);
    foo(x - 1, ...);

or

    foo(x + y, z, bar(g));
    foo(x - y, );
    foo(x + 1, );
    foo(x - 1, );
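
The argument-elision shorthand can be read as "copy whatever arguments are missing from the previous call to the same function." The Python sketch below follows that reading; it is one possible interpretation rather than semantics fixed by this proposal. Calls are modeled as (name, args) tuples, Python's Ellipsis stands in for "...", and the helper name expand_elided_call is hypothetical.

```python
def expand_elided_call(call, previous):
    """Fill a trailing Ellipsis in `call` from the matching arguments of `previous`.

    Both calls are (name, args) tuples; each argument is a string, a nested
    call tuple, or Ellipsis (meaning "the rest, as in the previous call").
    """
    name, args = call
    prev_name, prev_args = previous
    if name != prev_name:
        raise ValueError("elision needs a previous call to the same function")
    expanded = []
    for i, arg in enumerate(args):
        if arg is Ellipsis:
            # Copy every remaining argument from the previous call.
            expanded.extend(prev_args[i:])
            break
        same_nested_call = (
            isinstance(arg, tuple)
            and i < len(prev_args)
            and isinstance(prev_args[i], tuple)
            and prev_args[i][0] == arg[0]
        )
        if same_nested_call:
            # Recurse so an elision inside a nested call is resolved positionally.
            arg = expand_elided_call(arg, prev_args[i])
        expanded.append(arg)
    return (name, expanded)


first = ("foo", ["x + y", "z", ("bar", ["g"])])
second = ("foo", ["x - y", ...])
print(expand_elided_call(second, first))
```

Matching nested calls by argument position is the simplest choice; a real design would have to say what happens when the two calls differ in shape.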

4.3 Construct-Bound Referents

Control constructs are often controlled by the values produced by a particular expression. For example, loops often have a control variable. We propose that a pronoun be defined whose language-defined referent is the control variable of the enclosing control construct. The example below uses the pronoun $loop to represent the values controlling a loop.

    for 1 to 10 {
        if $loop > 5 print "big";
    }

4.4 Recent-Definition Referents

Often, values are computed into a temporary value and then immediately used simply to break up a complex computation. Likewise, sometimes an assignment is made to a complex variable, and that value is immediately accessed. We propose a pronoun, $=, whose language-defined referent is the most recently assigned value. Thus, one could write the following.

    spatial_dist.pt.x.velocity = z * c;
    foo($=);

Pronouns could refer to specific recently computed values. For instance, $retval might refer to the value most recently returned from a function:

    fopen("foo.txt", "r");
    if ($retval != 0) print "error opening file";

4.5 Data-Selector Referents

Sometimes, we want to abbreviate a repeated set of operations. For instance, traversing many levels of indirection occurs frequently when accessing complex data. Being able to elide such references makes code easier to read and write:

    planet1.spatial_dist.pt.x.velocity = v1;
    planet2.spatial_dist.pt.x.velocity = v2;
    planet1...mass = m1;
    planet2...mass = m2;
    planet1...accel = a1;
    planet2...accel = a2;

The operator "..." denotes the elision. While strictly not a pronoun, it has a similar "shorthand" feel, and how it is resolved is language-defined. One could imagine a programmer-defined analogue:

    planet1.(spatial_dist.pt[0]).velocity = v1;
    planet1.(...).mass = m1;
    planet1.(...).accel = a1;

4.6 Parameter Referents

A common shorthand in some programming languages is the use of default parameters. Unfortunately, many languages limit the defaulting mechanism to omitted trailing parameters for no compelling reason. Default parameters can be language-defined or programmer-defined, and both pronoun and anonymous naming work. In the example below, "~" is our pronoun to represent the use of a default parameter value, and parameter c uses the language-defined default value.

    function foo(a := 3, b := "hello", c)
    end function

    foo(~, "bye", ~);
    foo( , "bye", );

Use of default (or optional) parameters that bind to language-defined referents is a common practice in programming languages. For instance, the Pascal write procedure takes an optional first argument to specify the output file; if it is missing, it defaults to output.
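
A default-parameter pronoun that works in any position can be approximated in an existing language. The Python sketch below is an illustration under stated assumptions: DEFAULT is a hypothetical sentinel standing in for "~", with_default_pronoun is a hypothetical decorator, and None plays the role of the language-defined default for c, since Python requires every parameter after a defaulted one to declare a default as well.

```python
import functools
import inspect

DEFAULT = object()  # hypothetical stand-in for the "~" pronoun


def with_default_pronoun(func):
    """Let callers pass DEFAULT in any position to request the declared default."""
    params = list(inspect.signature(func).parameters.values())

    @functools.wraps(func)
    def wrapper(*args):
        resolved = list(args)
        for i, arg in enumerate(resolved):
            if arg is DEFAULT:
                default = params[i].default
                if default is inspect.Parameter.empty:
                    raise TypeError(f"parameter {params[i].name!r} has no default")
                resolved[i] = default
        return func(*resolved)

    return wrapper


@with_default_pronoun
def foo(a=3, b="hello", c=None):
    # None is an assumed placeholder for a language-defined default value.
    return (a, b, c)


print(foo(DEFAULT, "bye", DEFAULT))
```

Python's keyword arguments already allow foo(b="bye"), which skips non-trailing defaults by naming the parameter; the sentinel version instead keeps the call purely positional, as in the proposal.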

4.7 Positional Referents

One might want to refer to values by their declared position. For instance, one can refer to function parameters by their declared position, or to structure fields by their declared position. These positional pronouns refer to language-defined referents.

    function min(...)
        if $1 < $2 then return $1 else return $2
    end
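
Python has no positional pronouns, but a variadic parameter tuple gives a rough analogue: the parameters are never named individually and are reached only by position. In the sketch below, the function name minimum is an assumption, and positions are 0-based rather than 1-based.

```python
def minimum(*args):
    # args[0] and args[1] play the roles of $1 and $2.
    return args[0] if args[0] < args[1] else args[1]


print(minimum(3, 7))
```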

5 What is the Scope of a Pronoun Binding?

Determining the referent of a pronoun is subtle. Even the simple example of a parenthesis-bound referent, where the programmer refers to the most recently parenthesized expression with $(), raises interesting questions about the scope of a pronoun-referent binding. Consider the following example.

    x = (a.b.c.d);
    if (y) {
        i = (0);
    } else {
        r = (3.14);
    }
    write($());

What does $() refer to: a.b.c.d, 0, or 3.14? We see three broad categories of scoping policies for pronoun bindings:

Closest-Definition Binding: A simple macro-substitution definition of pronoun binding would find the lexically-closest parenthesized expression (3.14) and use it.

Dynamic Binding: A dynamic-binding mechanism would bind the pronoun to the most recently executed binding expression. Hence, $() could refer to either 0 or 3.14 depending on dynamic control flow.

Static Binding: A static binding mechanism would restrict bindings to obey normal lexical scoping conventions. In the example above, 0 and 3.14 would not be able to escape their scopes and $() would be bound to a.b.c.d.
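
The three policies can be compared on the example above with a small model. The Python sketch below is illustrative only: the Binding record, the scope-path encoding, and the function names are assumptions, and the parenthesized condition in if (y) is deliberately not treated as a binding.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    expr: str       # text of the parenthesized expression
    scope: tuple    # lexical scope path; () is the outermost block
    executed: bool  # whether control flow reached it on this run


def closest_definition(bindings):
    # Textually last binding before the use, regardless of scope or execution.
    return bindings[-1].expr


def dynamic(bindings):
    # Last binding that actually executed. Textual order equals execution
    # order here because the modeled fragment has no loops.
    return [b for b in bindings if b.executed][-1].expr


def static(bindings, use_scope):
    # Only bindings whose scope encloses the point of use are visible.
    visible = [b for b in bindings if use_scope[:len(b.scope)] == b.scope]
    return visible[-1].expr


def example(y):
    # Bindings of the example fragment, in textual order.
    return [
        Binding("a.b.c.d", (), True),
        Binding("0", ("then",), y),
        Binding("3.14", ("else",), not y),
    ]


for y in (True, False):
    b = example(y)
    print(y, closest_definition(b), dynamic(b), static(b, ()))
```

Only the dynamic policy changes its answer with y; the other two are decided entirely by the program text.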

6 Implementation Issues

Implementing some of the proposed shorthands may be difficult, to the extent that they may not be possible to implement in their full generality. There are three approaches to implementing the shorthand bindings that we have presented: text-based substitution, structure-based substitution, and semantics-based bindings.

Shorthands can be implemented as a pre-processing phase that performs text substitutions, which has advantages. First, because the text expansion is independent of the target language syntax, the shorthand notations it provides can be used across target languages. In this way, C, C++, and Java could all be extended with the same set of shorthands. Second, this approach is the easiest to implement and deploy.

There are also limitations of using a preprocessor to implement shorthands. First, because there is no syntax checking, the program source can become embedded with syntactically invalid constructs. For example:

    x = (a + ) b
    y = $() c

Second, there are many shorthand notations we have discussed that cannot be expressed using a preprocessor. For example, uses of anonymous naming are not possible because there is no pronoun to indicate where the substituted text should be placed.

Shorthand notations can also be implemented as parser extensions. In this case, shorthand symbols (or the absence of symbols, in some cases) are handled as a special case by the parser and the resulting parse tree is manipulated appropriately. Parser extensions can implement anonymous references to the extent that a language grammar that includes them can be written unambiguously. Because they lack a semantic understanding of the program, parser extensions remain a form of substitution, but are more powerful than preprocessors. To illustrate, the following example can be implemented with a parser extension, but not with a preprocessor. For this example, assume that the use of the pronoun $() in this context refers to the previous left-hand operand of a "." operator (i.e., rectangle[i]).

    area = rectangle[i].width * $().height;

The parser technology needed to implement some of the shorthands we have proposed is nontrivial. For example, this notation

    foo(x + y, z, bar(a, x*x, y*y));
    foo(x - y, z, bar(a + 1, ...));

requires tree pattern matching mechanisms. While it seems clear to a human reader what is meant, it is less clear what the actual semantics of this shorthand are and whether the semantics could be explained to a programmer clearly and concisely. Parser extensions are still limited in their expressiveness and are unable to implement shorthands for which semantic information is necessary to resolve ambiguity.

The syntax-based mechanisms for implementing shorthands are limited in power because they are essentially substitution-based techniques. The referent of a pronoun is simply substituted for the pronoun, without any analysis or optimizations applied. Compiler-based techniques are more powerful. The lack of analysis in a substitution-based mechanism eliminates certain kinds of pronoun-referent bindings. For instance, substitution-based mechanisms cannot implement default parameter substitution, which would require the use of a symbol table. In fact, they cannot handle any pronoun binding policies that require symbol-table information (e.g., data-selector referents, positional referents, etc.).

Making pronoun-binding a part of the compiler also creates optimization opportunities. It is possible for the implementation to allocate temporary storage for pronouns efficiently. For instance, the L-value or the R-value (as appropriate) for a pronoun could be cached in storage rather than being recomputed, which is an advantage over substitution-based techniques. Further, this caching of an L-value may be more desirable semantics for a pronoun than the recomputation implied by the substitution-based techniques.
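
The difference between recomputation and L-value caching becomes visible when the referent expression has a side effect. The Python sketch below is a hedged illustration: the IndexSource class and both function names are hypothetical, and the shorthand being modeled is the assumed fragment "(cells[next_index()]) = 1; $() = 2;".

```python
class IndexSource:
    """Hands out 0, 1, 2, ... and counts how often it is evaluated."""

    def __init__(self):
        self.evaluations = 0

    def next_index(self):
        index = self.evaluations
        self.evaluations += 1
        return index


def by_substitution(cells, source):
    # Textual expansion repeats the referent, so next_index() runs twice
    # and the second assignment lands in a different cell.
    cells[source.next_index()] = 1
    cells[source.next_index()] = 2


def by_cached_lvalue(cells, source):
    # A compiler-supported pronoun could compute the location once and reuse it.
    key = source.next_index()
    cells[key] = 1
    cells[key] = 2


a, b = [0, 0, 0], [0, 0, 0]
by_substitution(a, IndexSource())
by_cached_lvalue(b, IndexSource())
print(a, b)
```

With substitution the two assignments touch different cells; with the cached location both touch the same one, which is arguably what a reader expects a pronoun to mean.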

7 Related Work

Many languages contain shorthand mechanisms that are appreciated by programmers. A number of researchers have specifically considered the cognitive aspects of programming language design. More than twenty years ago, John Backus in his Turing Award lecture complained about the elaborate naming mechanisms in von Neumann languages that "...interfere with the use of simple combining forms..." [1]. As Backus points out, the elegance of one-line programs in APL is achieved largely without the use of any names at all.

Many language designers insist that the terseness of APL programs is a source of significant problems in actual use. In the Ada83 design rationale, for example, keyword abbreviations, such as proc instead of procedure, are avoided "...since we believe full words to be simpler to read" [3]. There is currently little strong evidence either way that languages with shorthand notations are more difficult to read or write. Several successful languages, including C, C++, Java, and Perl, are viewed as cryptic and terse by novice programmers.

Nardi discusses the goal of making human-computer interaction mimic conversation [7]; much of what makes human interaction so compact is the large amount of context shared by the individuals interacting. This context is used to resolve natural ambiguities that arise when words like "that" and "nice" are overloaded. Our proposals for shorthands similarly attempt to achieve a greater degree of compactness in programming languages. Nardi also cites studies showing that spreadsheet users benefit from not having to create variable names, a task many users "find confusing and burdensome" [6]. Our pronouns often eliminate new variables.

Shorthands are not new and have been an important and heavily used feature for many years. From the earliest designs, efforts to reduce the amount of text in a program have been proposed and adopted with varying amounts of success. Fortran's implicit type declarations based on the starting character of a variable name represent an early effort to reduce unnecessary verbiage. Macro processors have been adopted with varying degrees of success for similar reasons, but they differ from our shorthands in two ways. First, macros require the creation of new names, which we believe to be a substantial source of overhead, and hence avoid in our shorthand suggestions. Second, macros are traditionally preprocessor transformations, and hence have syntactic effect but do not require semantic support in a language. Our shorthands are both syntactic and semantic, and thus may require compiler support to implement. While extensible languages enable programmers to create language constructs that abbreviate compositions of existing constructs, their focus is not on creating shorter programs.

Traditional programming languages contain a number of useful shorthands that increase their ease of use. Assembly languages, including Knuth's MIX language, often include a shorthand, such as *, for referring to the current program location [5]. MIX's "local symbols" are another shorthand mechanism that allows locations to be named based on context. C programmers make heavy use of the shorthand x++ for x=x+1. As mentioned, object-oriented languages such as C++, Smalltalk, and Java provide shorthand forms (both pronoun and implicit references) for referring to the instance object inside methods of its class. Pascal, a feature-sparse language, provides the with construct, a mechanism for eliminating the need to refer repeatedly to redundant selectors [4]. Most languages with package mechanisms, such as Ada, provide a use declaration that eliminates the need to qualify fully external references to symbols in other packages. While all of these features may confuse naive users, they are also all widely appreciated and used.

Shorthand notations exist in less mainstream languages as well. Functional languages have introduced a number of interesting syntactic conventions. For example, both Haskell [2] and Python [8] use indentation and layout to convey syntactic information, eliminating matching begin/end tokens. Shell languages, because they are used as a command-line interface, have adopted numerous conventions to reduce the amount users type. Shell languages have powerful mechanisms for referring to multiple files concisely using wildcard naming conventions.

Shorthand notations for naming aggregates (plural values) are common in programming languages. Array assignment, list and array comprehensions, and array slicing notations are all examples of plural shorthands (available, for example, in Haskell [2]). While the techniques we consider in this paper can be extended to shorthands for plurals, the focus of this paper is on naming individual entities.

Perl provides aggressive support for shorthand notations [9]. Perl includes many predefined variables, giving it a large initial vocabulary of names, and many variables that are similar to English pronouns because they are defined in a context-dependent manner. For example, the variable $_ refers, depending on context, to the current input record, the current pattern string, or the current foreach loop iterator variable, among other things. Perl also defines many of the arguments to built-in functions to have defaults that are defined by the context. Perl provides an inspiration for this paper, as we share Larry Wall's perspective that language designers should pay close attention to what programmers are doing. Our work differs from the Perl design in that our intention is to elevate the ideas present in Perl (and other languages) to the point that they become a more central part of modern language design.

8 Summary

Programs are unnecessarily verbose, which makes them difficult to read and write. We present many possible program abbreviation shorthands, and we present a framework for classifying those shorthands. Our shorthands make programs more concise without making programs more difficult to understand. As a further advantage, shorthands reduce the need to create new names in programs. Many programming languages benefit from limited shorthand mechanisms, yet shorthands have not received significant attention from language designers or researchers. We propose that shorthands be carefully investigated and widely supported in programming languages.

Acknowledgements

We thank Chris Fraser, Dave Hanson, and Clayton Lewis for their insightful comments.

References

[1] John Backus. Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. Communications of the ACM, 21(8):613-641, August 1978.
[2] P. Hudak, S. Peyton Jones, P. Wadler, B. Boutel, J. Fairbairn, J. Fasel, M. M. Guzman, K. Hammond, J. Hughes, T. Johnsson, D. Kieburtz, R. Nikhil, W. Partain, and J. Peterson. Report on the programming language Haskell, version 1.2. SIGPLAN Notices, 27(5), May 1992.
[3] J. Ichbiah, J. Barnes, R. Firth, and M. Woodger. Rationale for the Design of the Ada Programming Language. Cambridge University Press, New York, N.Y., 1991. ISBN 0-521-39267-5.
[4] Kathleen Jensen and Niklaus Wirth. PASCAL User Manual and Report (third edition). Springer-Verlag, New York, N.Y., 1985. Revised to the ISO Standard by Andrew B. Mickel and James F. Miner.
[5] Donald E. Knuth. The Art of Computer Programming, volume I: Fundamental Algorithms, chapter 2. Addison-Wesley, second edition, 1973.
[6] Clayton Lewis and Gary M. Olson. Can principles of cognition lower the barriers to programming? In Empirical Studies of Programmers: Second Workshop, pages 248-263, 1987.
[7] Bonnie A. Nardi. A Small Matter of Programming: Perspectives on End User Computing. MIT Press, Cambridge, Massachusetts, 1993.
[8] Guido van Rossum. Python reference manual. In 115, page 56. Centrum voor Wiskunde en Informatica (CWI), ISSN 0169-118X, April 30 1995. AA (Department of Algorithmics and Architecture).
[9] Larry Wall, Randal L. Schwartz, Tom Christiansen, and Stephen Potter. Programming Perl. Nutshell Handbook. O'Reilly & Associates, 2nd edition, 1996.