The Rubberband Programming Language

Luiz Romário Santana Rios¹, Lais do Nascimento Salvador¹

¹ Departamento de Ciência da Computação – Universidade Federal da Bahia (UFBA), 40.170-110 – Salvador – Bahia – Brasil
{luizromario, laisns}@dcc.ufba.br
Abstract. Many programming languages have features that increase their expressivity, allowing the programmer to abstract away implementation details and express some language constructs in a more readable way. However, this approach often means introducing features which are not themselves customizable. Moreover, too many core language features can compromise a language’s learnability and readability, since the user has to learn a large portion of those features to understand the code. In this context, we present a new programming language called Rubberband. Rubberband is an object-oriented, classless, experimental programming language. Its main goals are: (i) to offer a high degree of freedom to the programmer; (ii) to be highly flexible and minimalistic. Some of the more distinctive features of Rubberband are the ability to define any object as the scope of a block and the fact that all operations are based on message passing to objects, including variable access, assignment and arithmetic operations. At the end of this paper, the language is compared to similar efforts and future plans are outlined.
1. Introduction

Many modern programming languages have features that help the programmer abstract specific operations into more general concepts. These features increase the reusability and readability of the code and the expressiveness of the language. But those features tend not to be customizable themselves. For example, in C++, the way to declare classes is fixed and defined by the language standard [1]. To circumvent this, the Qt GUI framework (http://www.qt.io/) had to create a meta-object compiler, which analyses the source code before the program is compiled to extract meta-information about its classes [2]. Besides, according to Sebesta [3], a language with too many basic constructs is more difficult to learn than one with fewer.

This article introduces the Rubberband programming language, created with the main goal of experimenting with giving the developer a very high degree of customizability, flexibility and freedom in several areas of the language.

The paper is structured as follows: the next section presents the design goals and the characteristics of Rubberband; Section 3 describes Rubberband constructs with code examples; Section 4 presents related work; and, finally, Section 5 discusses the presented work.
ERBASE - WTICG - 2016
2. Language Overview

This section gives an overview of the design goals and the characteristics of Rubberband. It also compares Rubberband’s characteristics with similar approaches in other existing programming languages.

2.1. Design Goals

The main goal of Rubberband, presented in the introduction, led to the following design goals, some of them based on Sebesta’s language evaluation criteria [3]:

• Core minimalism: having just a few basic elements instead of putting many features inside the core of the language makes it easier to build its features – and the ones needed by the developer – from the ground up, avoiding the redundancy of core language constructs.
• Orthogonality: to give the developer the flexibility to customize the language, it needs to allow composition between most elements of the language in a way that makes sense.
• Explicitness: hidden or implicit mechanisms in the language create elements which, being out of reach of the runtime, cannot be directly customized; thus, explicitness increases the customizability of the language.
• Conciseness: although making a language concise solely to save keystrokes is considered a bad idea, since it may hurt readability, too much verbosity may also make the code hard to follow. Since the language is supposed to be explicit, many basic expressions will be visible in the code; making all these expressions too long leads to verbose code, which can hurt readability more than conciseness does. Aside from that, avoiding reserved words like “if”, “for”, “while” and others in favor of symbols like “$”, “@”, “~”, etc. leaves room for the developer to use the former for their own purposes.
• Freedom (and responsibility) to the developer: the core Rubberband language intends to define as few policies and follow as few conventions as possible, because one of its goals is to give the developer the ability to define their own conventions and policies. This comes from the idea that defining coding style, conventions, design patterns and other kinds of project policies is the job of the framework, not of the language, which should only be concerned with providing the constructs necessary to express those policies.

There can be many more considerations on the design goals of Rubberband, but those described above provide a good summary of the general idea of the language and the reasoning behind its current design.

2.2. Characteristics

Now that the design goals have been set, the actual characteristics of the latest implementation will be described. Rubberband is:

• Purely object-oriented: object-orientation is a concept which needs no introduction. However, pure in this context means that every value in the language is an object. There are some languages in which this isn’t the case, that is, not all values are objects; C++ [1] is one example. That’s why this distinction is made.
• Message-based: unlike many object-oriented languages, such as C++, Rubberband is based on message passing instead of function calling. This makes it more similar to Smalltalk [7] and other languages which inherited this characteristic from it, like Objective-C [4]. But, unlike Smalltalk, messages do not have arguments attached to them in Rubberband.
• Classless: in Rubberband, objects are standalone and don’t need a class to exist, although the programmer can inquire about what type of object they are dealing with. In that regard, Rubberband is closer to languages like Self [9], Lua [10] and Io [11]. Languages like those are also referred to as prototype-based languages.
• Reflexive: all of the basic objects of the language offer reflection capabilities which enable the developer to query metainformation about an object; they can query the type of the object and whether a given message is valid.
• Duck-typed: duck typing is a concept common in dynamic languages like Python [5]. Like in those languages, in Rubberband, if an object behaves like you want it to in the context you’re using it, then, for all practical purposes, it’s the object you want. Without duck typing, the programmer is required to check whether the type of the object is the right type.
3. Language Presentation

This section presents the Rubberband language: its constructs, features and base library. Inline code snippets will be shown inside boxes in a monospaced typeface (like this: 2 + 3 ).

3.1. Objects

As seen before, Rubberband is a purely object-oriented, message-based, classless, reflexive, duck-typed, experimental programming language. Since the language is purely object-oriented, objects are everywhere in the program. The basic objects in the language are:

• Numbers: 10 , 200.555 , -3 , 0
• Symbols, used to name variables, functions, operations, fields, etc. Basically, identifiers: a , while , + , etc.
• The empty object, denoted by () , used to indicate the absence of a value
• Booleans: ?1 and ?0 (true and false, respectively)
• Arrays. They start with | , their elements are separated by commas and they accept any object as an element, but they are not resizable:
    – |10, 20, 30
    – |a, 10, (|10, 20, 30), ?1
• Tables, which map symbols to objects. They start with : , followed by the symbol-object pairs, separated by commas:
    – :a -> 10, b -> 20, c -> 30
    – :yes -> ?1, twelve -> 12
    – :list -> (|2, 3, 5, 7), table -> (:happy -> ?1, age -> 25)
• Blocks of code. They are first-class citizens in Rubberband, so, just like arrays or numbers, they can be put inside a table, an array, etc. They’re enclosed between
curly braces, within which the code is defined; currently, the code contained inside a Rubberband script file also defines a block object, the only difference being the lack of curly braces. More details on blocks will be given in Section 3.3.

3.2. Messages and answers

Rubberband is a message-based language. This means that the programmer gets information from an object and modifies its state by sending messages to it. A message to an object can be any other object; to send a message, the programmer just needs to put the message at the right side of the object:

    ~:arr -> |10, 20, 30, 40
    ~:third_element -> ~arr 2
In the above example, the message 2 is being sent to the variable arr . As a reaction to receiving this message, the array gives an answer, which is, in this case, the third element of the array. There are two ways in which an object can react to a message: it either gives an answer or it causes a runtime error if the object doesn’t respond to that message. For example, if you sent 4 instead of 2 to arr , since that index is out of range, that would generate a runtime error:

    terminate called after throwing an instance of 'rbb::in_statement_error'
      what(): In statement "~ arr 4": When sending message 4 to object (|10, 20, 30, 40): Message not recognized
To send a second message to the answer of a previous message, just place the second message at the right side of the first one, and do the same for any subsequent messages. For example:

    ~:nested -> |1, (|2, 3, (|4, 5, 6, (|7, 8, 9, 10, 11), 12), 13)
    ~:number -> ~nested 1 2 3 4
In the last expression, the chain of message passing gets a value in the innermost nested array – its fifth element, 11 .
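The array behavior described in this section can be approximated outside Rubberband. The following is a minimal Python sketch, not the language's actual implementation: the names `Array`, `send`, `send_chain` and `MessageNotRecognized` are hypothetical and exist only for illustration. It models an array that answers index messages, fails on unrecognized ones, and lets messages be chained on successive answers.

```python
class MessageNotRecognized(Exception):
    """Hypothetical stand-in for Rubberband's runtime error."""


class Array:
    """Fixed-size array that answers index messages (a modeling assumption)."""

    def __init__(self, *elements):
        self._elements = tuple(elements)

    def send(self, message):
        # An in-range integer index is answered with the element at that
        # position; any other message is treated as unrecognized.
        is_index = isinstance(message, int) and not isinstance(message, bool)
        if is_index and 0 <= message < len(self._elements):
            return self._elements[message]
        raise MessageNotRecognized(f"Message not recognized: {message!r}")


def send_chain(receiver, *messages):
    """Send each message to the answer of the previous one, left to right."""
    for message in messages:
        receiver = receiver.send(message)
    return receiver


# ~:arr -> |10, 20, 30, 40
arr = Array(10, 20, 30, 40)
# ~:third_element -> ~arr 2
third_element = arr.send(2)

# ~:nested -> |1, (|2, 3, (|4, 5, 6, (|7, 8, 9, 10, 11), 12), 13)
nested = Array(1, Array(2, 3, Array(4, 5, 6, Array(7, 8, 9, 10, 11), 12), 13))
# ~:number -> ~nested 1 2 3 4
number = send_chain(nested, 1, 2, 3, 4)
```

In this model, `arr.send(4)` raises `MessageNotRecognized`, mirroring the out-of-range case discussed above.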
3.2.1. Variable assignment and access

Message passing is the only operation in the language, so, even when the programmer is, say, creating or accessing the value of a variable, they are sending a message to an object. You might have noticed that, in the previous examples, the syntax of the declaration of the arr and third_element variables looks similar to that of the tables presented in the previous section. That’s not a coincidence: to create a variable or modify the value of an existing one, you send a table containing all the variables you want to declare or change to the current context, which is denoted by ~ . Consider the following example:

    ~:pi -> 3.14159
    ~:circumference -> ~pi * 2 * 1.5
In the first line, you’re simply sending :pi -> 3.14159 (a table) to ~ , which is the block’s context (or scope) – and also usually a table. Then, to access the pi variable, as depicted in the second line, you just need to send the pi symbol to the context.
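As a rough illustration of this idea, the sketch below models a context in Python. It is an assumption-laden model rather than Rubberband's implementation: the `Context` class and its `send` method are hypothetical, a Python `dict` stands in for a table, a `str` stands in for a symbol, and the answer given to a table message (the context itself) is chosen arbitrarily because the text does not specify it.

```python
class Context:
    """Table-like context object (a modeling assumption for illustration)."""

    def __init__(self):
        self._fields = {}

    def send(self, message):
        if isinstance(message, dict):
            # A table message declares or updates variables. Answering with
            # the context itself is an assumption made for this sketch.
            self._fields.update(message)
            return self
        if isinstance(message, str) and message in self._fields:
            # A symbol message is answered with the variable's value.
            return self._fields[message]
        raise LookupError(f"Message not recognized: {message!r}")


context = Context()
# ~:pi -> 3.14159
context.send({"pi": 3.14159})
# ~:circumference -> ~pi * 2 * 1.5
context.send({"circumference": context.send("pi") * 2 * 1.5})
circumference = context.send("circumference")
```

Both declaration and access go through the same `send` entry point, which is the property the text emphasizes: there is no separate assignment operation.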
3.2.2. Arithmetic and Comparison

Like variables, arithmetic operations are also based on message passing. Here’s what arithmetic looks like in Rubberband:

    ~:two_plus_two -> 2 + 2
    ~:my_bmi -> 93 / (1.8 * 1.8)
    ~:overweight -> ~my_bmi > 25
Messages in Rubberband do not have any parameters and an object can only respond to one message at a time. However, it is possible to implement curried functions – that is, functions that return other functions – to emulate message parameters the way they work in Smalltalk or Self. Knowing this, let’s interpret the 2 + 2 expression: the + message is sent to the 2 object. It’s important to note that, in Rubberband, + is just a regular symbol, no different from a syntactic or semantic standpoint from, say, plus or my_bmi . The response to that message is a function that gets any number sent to it and answers that number plus 2. Then, we send 2 to the returned function, getting 4 . For simplicity, from now on, function objects that are answers to symbol messages will be called methods; for instance, the object 2 has the + method, which accepts a number as a message.

Note that the operations in Rubberband do not have any precedence rules: they’re simply executed in left-to-right order. For instance, the expression 1 + 2 * 3 + 4 evaluates to 13 , instead of 11 , as one might expect; to evaluate it to the latter value, parentheses are needed, like this: 1 + (2 * 3) + 4 .

Arithmetic comparison operations – less than, greater than, etc. – are also methods of any number, going by the names of < , > , etc. The difference is that they result in a boolean object; in the example, the overweight variable will be assigned ?1 , that is, true.
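The curried-method reading of arithmetic and the absence of precedence can be sketched in Python. This is a model under stated assumptions, not Rubberband's implementation: `Number`, `send` and `evaluate` are hypothetical names, Python strings stand in for symbols, and Python's `bool` stands in for the boolean objects ?1 and ?0.

```python
import operator


class Number:
    """Number object whose symbol messages answer one-argument methods."""

    _METHODS = {
        "+": operator.add, "-": operator.sub,
        "*": operator.mul, "/": operator.truediv,
        "<": operator.lt, ">": operator.gt, "==": operator.eq,
    }

    def __init__(self, value):
        self.value = value

    def send(self, symbol):
        operation = self._METHODS[symbol]

        def method(other):
            # The method is the answer to the symbol message; sending it a
            # number completes the operation (currying).
            result = operation(self.value, other.value)
            # Comparisons answer a boolean; arithmetic answers a Number.
            return result if isinstance(result, bool) else Number(result)

        return method


def evaluate(first, *rest):
    """Evaluate `first symbol argument symbol argument ...` left to right."""
    receiver = first
    for symbol, argument in zip(rest[0::2], rest[1::2]):
        receiver = receiver.send(symbol)(argument)
    return receiver


# 1 + 2 * 3 + 4, with no precedence rules
flat = evaluate(Number(1), "+", Number(2), "*", Number(3), "+", Number(4))
# 1 + (2 * 3) + 4, where the nested call plays the role of the parentheses
grouped = evaluate(Number(1), "+", evaluate(Number(2), "*", Number(3)),
                   "+", Number(4))
# ~:my_bmi -> 93 / (1.8 * 1.8)
my_bmi = evaluate(Number(93), "/", evaluate(Number(1.8), "*", Number(1.8)))
# ~:overweight -> ~my_bmi > 25
overweight = evaluate(my_bmi, ">", Number(25))
```

Here `flat.value` is 13 and `grouped.value` is 11, matching the two results discussed in the text.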
3.2.3. Control Flow

There are no special keywords to handle control flow in the core language: control flow, just like other operations, is also done by sending messages, this time to the boolean objects. For example, following the example from the previous subsection:
~overweight?~ { %inspect_object overweight } { %inspect_object fine }
Any boolean object has the ? method, which gets the context (this will be explained later) and two blocks of code: the first one is executed if the boolean is ?1 ; the second one, if it is ?0 . In other words, the ? method gets the following parameters:

• The context, which will be bound to whichever block is executed (explained later)
• The "if" block, { %inspect_object overweight }
• The "else" block, { %inspect_object fine }

The %inspect_object method gets an object and prints a string representation of it.

Loops are not present in the core language, but they can be done through recursion (as we’re going to see in the next section) and they’re also provided by the standard library, which will be introduced later.

3.3. Blocks

Blocks define a sequence of instructions, optionally containing a response value. For example:

    ~:factorial -> {
        ~:n -> $, self -> @
        !~n == 0?~ {!1} {!~n * (~self~(~n - 1))}
    }
This block contains code to calculate the factorial of a number and it uses all the special symbols that can be used inside a block:

• ~ : The context, already presented, which is similar to the scope of the code.
• $ : The message received by the block. In our example, this would be the number for which we want to calculate the factorial.
• @ : A reference to the block itself (self-reference).
• ! : The response expression. The expression, in this case, is an “if else” statement, which answers with the response of whichever block runs – either the “if” block or the “else” one. Here, if the ~n == 0 expression is ?1 (true), the block responds 1 ; otherwise, it responds ~n * (~self~(~n - 1)) .

A bit more detail is needed to understand what a context is. Blocks aren’t actually executable, at least not directly. To execute the code inside a block, you need to instantiate it. A block instance is bound to an object until the end of its lifetime; that object is the context, that is, the object which is represented by ~ inside the block. To
create a new instance, you send the block the object you want to bind it to – that is, the context. So, for instance, to calculate the factorial of some numbers using the code provided by the block above, we could do the following:

    ~:factorial_instance -> ~factorial(:)
    %inspect_object(~factorial_instance 6)   # 720
    %inspect_object(~factorial_instance 10)  # 3628800
    %inspect_object(~factorial_instance 12)  # 479001600
First, we create a new instance of the factorial block bound to a new empty table (which is used to store n inside the block). Then, in the following lines, we send some numbers as messages to that instance; as has been shown, these are represented by $ inside the block.

One noteworthy difference between ~ (the context) and $ (the message) is that, while the message may be different in each execution of the block instance, the context is the same object in all of its executions; in other words, any changes to the context are kept in the object after the block instance finishes executing. This didn’t have an effect in the previous example, but consider this:

    ~:add -> {
        ~:x -> ~x + $
        !~x
    }
    ~:add_instance -> ~add:x -> 1
    %inspect_object(~add_instance 5)  # 6
    %inspect_object(~add_instance 6)  # 12
    %inspect_object(~add_instance 7)  # 19
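The relationship between blocks, instances and contexts shown in the factorial and add examples can be modeled in Python. The sketch below is only an approximation under stated assumptions: `Block`, `bind` and `ask` are hypothetical names, a `dict` stands in for a table used as context, and `ask` models the boolean ? method from Section 3.2.3 by binding the chosen block to the given context and running it.

```python
class Block:
    """Uninstantiated code; `body` takes (context, message, self_block)."""

    def __init__(self, body):
        self._body = body

    def bind(self, context):
        """Model of sending a context to a block: the answer is an instance."""
        def instance(message=None):
            # ~ is `context`, $ is `message`, @ is the block itself.
            return self._body(context, message, self)
        return instance


def ask(condition, context, if_block, else_block):
    """Model of `?`: bind the chosen block to `context`, then run it."""
    chosen = if_block if condition else else_block
    return chosen.bind(context)()


def factorial_body(context, message, self_block):
    # ~:n -> $, self -> @
    context["n"] = message
    context["self"] = self_block
    # !~n == 0?~ {!1} {!~n * (~self~(~n - 1))}
    # Python reads the left operand ctx["n"] before the recursive call
    # overwrites n in the shared context.
    return ask(
        context["n"] == 0,
        context,
        Block(lambda ctx, msg, blk: 1),
        Block(lambda ctx, msg, blk:
              ctx["n"] * ctx["self"].bind(ctx)(ctx["n"] - 1)),
    )


def add_body(context, message, self_block):
    # ~:x -> ~x + $
    context["x"] = context["x"] + message
    # !~x
    return context["x"]


# ~:factorial_instance -> ~factorial(:)
factorial_instance = Block(factorial_body).bind({})
factorials = [factorial_instance(6), factorial_instance(10)]

# ~:add_instance -> ~add:x -> 1
add_instance = Block(add_body).bind({"x": 1})
sums = [add_instance(5), add_instance(6), add_instance(7)]
```

In this model `sums` is `[6, 12, 19]`: the dictionary bound to `add_instance` is the same object in every call, so each call sees the value of x left behind by the previous one.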
add gets a number, adds it to the value of x , assigns the result to that same variable and returns it. We then create an instance of it bound to the table :x -> 1 , setting the initial value of x to 1 . As we call the instance again and again with new values, x keeps growing because the changes we make to the context in one call – that is, the increments to the value – are still available in the next calls.

Going back to the factorial example, we can now understand why we have to pass a context object to the ? method:

    !~n == 0?~ {!1} {!~n * (~self~(~n - 1))}
The two blocks sent to it ( { !1 } and { !~n * (~self~(~n - 1)) } ) are not block instances and, thus, aren’t bound to any context. That method, then, needs a context to create an instance from one of those blocks and execute it properly. Since we’re passing ~ as their context, they will run over the same context as the factorial block; this means that they will be able to directly access and modify anything in the parent context, which is why n and self are accessible from inside the second block.

The self-reference, as introduced before, is denoted by @ ; it was assigned to the self variable for reasons which will be explained shortly. The self-reference is used in this case because we need it to do the recursive calls that are necessary to calculate the factorial of a number. For example, suppose we send 2 to a factorial instance. ~n == 0 will be false, so the block { !~n * (~self~(~n - 1)) } will be called. This block will first
get the value of n , then call itself with ~n - 1 as the message. Note that we have to call ~self~(~n - 1) ; that’s because the self-reference is a reference to the block, not to the current instance of it, so we need to instantiate it in order to call it. From this, we recursively calculate the factorial of 2 .

A final note on why it was necessary to declare the n and self variables: blocks don’t automatically bind to their parent scope, so if we, instead, did:

    ~:factorial -> { !~n == 0?~ {!1} {!$ * (@~(~n - 1))} }
the code wouldn’t work as expected, since the $ , in this case, would be the message passed to the else block, which is () ; since () has no * method, this would result in a runtime error. Even if the runtime error didn’t happen, @ is a reference to the else block, not to factorial ; in this case, the stopping condition would never be checked and the program would enter an endless loop.

3.4. Object Metainformation and Raising Runtime Errors

This subsection takes a quick look at these two features. Their most common use is to ensure that an object passed to some block is the kind of object which is expected and to raise an error (instead of continuing execution) when it’s not. Object metainformation is how reflection [8] takes form in Rubberband. You’re able to query information about the kind of an object and to what methods it responds, using the