Unicode in Domain-Specific Programming Languages for Modeling & Simulation: ScalaTion as a Case Study
Michael E. Cotterell
John A. Miller
Tom Horton
arXiv:1112.1751v1 [cs.PL] 8 Dec 2011
Department of Computer Science University of Georgia Athens, GA 30602
Abstract

As recent programming languages provide improved conciseness and flexibility of syntax, the development of embedded or internal Domain-Specific Languages has increased. The field of Modeling and Simulation has had a long history of innovation in programming languages (e.g., Simula-67, GPSS). Much effort has gone into the development of Simulation Programming Languages. The ScalaTion project is working to develop an embedded or internal Domain-Specific Language for Modeling and Simulation which could streamline language innovation in this domain. One of its goals is to make the code concise, readable, and in a form familiar to experts in the domain. In some cases the code looks very similar to textbook formulas. To enhance readability by domain experts, a version of ScalaTion is provided that heavily utilizes Unicode. This paper discusses the development of the ScalaTion DSL and the underlying features of Scala that make this possible. It then provides an overview of ScalaTion highlighting some uses of Unicode. Statistical analysis capabilities needed for Modeling and Simulation are presented in some detail. The notation developed is clear and concise, which should lead to improved usability and extendibility.

Categories and Subject Descriptors D.2.11 [Software Engineering]: Software Architectures - Domain-specific Architectures; D.2.13 [Software Engineering]: Reusable Software - Reusable Libraries; D.3.2 [Programming Languages]: Language Classifications - Extensible Languages, Specialized Application Languages; I.6.2 [Simulation and Modeling]: Simulation Languages

General Terms Languages

Keywords Domain-specific languages, Java, Scala, ScalaTion, Unicode
1. Introduction

ScalaTion is an embedded Domain-Specific Language (DSL) for modeling and simulation (M&S). M&S has had a long history of using both General-purpose Programming Languages (GPLs) and Simulation Programming Languages (SPLs). Traditionally, the SPLs may be viewed as external DSLs, although M&S is broader than many domains of study. Thus, SPLs require many of the features of a GPL, and the fact that they are external DSLs means that they require extensive custom language support and longer development cycles.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Technical Report #UGA-CS-LSDIS-TR-11-011, Department of Computer Science, University of Georgia, Athens, Georgia (May 2011). Copyright © 2011 ACM [to be supplied]...$10.00

Deursen defines DSLs as “programming languages or executable specification languages that offer, through appropriate notations and abstractions, expressive power focused on, and usually restricted to, a particular problem domain.” [5] Providing Unicode support in DSLs is a natural way to facilitate this expressive power by enabling domain-specific notations in programming. DSLs are often implemented using a GPL. The differences between DSLs and GPLs are covered quite extensively by Deursen [5]. Many of these DSLs are implemented externally through the use of lexical parser combinators. However, as covered by Hofer [10, 11], there are domain-specific embedded languages (DSELs) in which the DSL is “embedded as a library into a typed host language instead of creating an external DSL”. These are referred to simply as internal or embedded DSLs.

Until recently, it was difficult to find a GPL suitable for building an embedded DSL for M&S. Desirable features and capabilities include the following:

• Object-Oriented, Functional Programming Language. Mature object models in modern programming languages allow programmers to work with various levels of hierarchy and abstraction. This leads to efficiency and code reuse. Functional programming paradigms help increase readability by emphasizing immutable states and the application of functions as opposed to imperative procedures. It is difficult to find languages that take full advantage of both object-oriented and functional paradigms.

• Support for Unicode. Unicode character encoding enables a great number of additional characters outside the traditional ASCII subset to be included in programming languages. Languages can support these characters in literals, identifiers and operators. Although most modern programming languages support Unicode characters in literals, fewer support such characters in their identifiers and operators, which greatly diminishes their usefulness in the domain-engineering of an embedded or internal DSL.

• Adequate Performance. Some GPLs and their corresponding embedded or internal DSLs suffer in execution speed for several reasons. These include the fact that some of them are interpreted languages or that some of them are dynamically, instead of statically, typed. For M&S, because of its compute-intensive nature, an ideal GPL should be both statically typed and allow compilation to machine code (at least using a Just-In-Time (JIT) compiler) in order to minimize overhead and improve the speed of execution of its programs.

The Scala Programming Language provides these features and capabilities in a form familiar to Java programmers, so that such programmers can quickly program in Scala. (This has been the case when Scala has been used in classes at the University of Georgia.) An overview of ScalaTion, based on Scala, is given in Miller et al. [15], which highlights the five modeling paradigms (or world-views) supported by ScalaTion (event-, activity-, process-, state-, and dynamics-based modeling).

Just as we chose Scala as our GPL because, among other reasons, it is statically typed and supports Unicode, there appears to be a growing trend toward using Scala for creating DSLs.

    DSL                            Domain
    Apache Camel Scala DSL [24]    Routing
    OptiML [4]                     Machine Learning
    Regions [11]                   Image Processing
    Sake [27]                      Build Tool
    ScalaTion [15]                 Modeling & Simulation
    Squeryl [25]                   Relational Databases

Table 1. Examples of DSLs implemented in Scala
In this paper, we focus on the statistical capabilities provided by ScalaTion. These capabilities have the following uses in M&S:

• Random Variate Generation. M&S requires the use of a good random number generator and several random variate generators for common probability distributions. ScalaTion provides the following distributions that mix in the Variate trait: Bernoulli, Beta, Binomial, Cauchy, ChiSquare, Deterministic, Discrete, Erlang, Exponential, Fisher, Gamma, Geometric, HyperExponential, HyperGeometric, LogNormal, NegativeBinomial, Normal, Poisson, Randi, Random, RandVec, StudentT, Triangular, TruncatedNormal, Uniform and Weibull.

• Output Analysis. The purpose of output analysis is to obtain reliable statistics from simulation runs, including point and interval estimates (e.g., means and confidence intervals). In this paper, we illustrate how concisely and straightforwardly the Method of Batch Means can be implemented in ScalaTion. This method makes one long run and divides it into batches, so that the batch means are sufficiently uncorrelated and the confidence interval computed from the batch means and centered around the grand mean is adequately tight.

• Comparative Analysis. M&S is often used to design systems. As such, it is frequently useful to compare alternative designs. For example, one factor that could affect the performance (e.g., response time) of a Database Management System (DBMS) is the type of concurrency control protocol used. A one-way Analysis of Variance (ANOVA) test could be conducted to determine if the effect of changing concurrency control protocols is significant. One might also wish to determine how the size of the database cache and the speed of main memory affect the response time of the DBMS. Multiple Regression can be used to address this question.

The rest of this paper is organized as follows: In section 2, we discuss the features and capabilities that are desirable for an embedded DSL for M&S, including the advantages of using an Object-Oriented, Functional Programming Language as a basis, the benefits gained by the increasing use of Unicode in programming languages, and the performance advantages of using a statically typed, compiled language. Section 3 provides an overview of ScalaTion and highlights some of its features, particularly its conciseness and use of Unicode. In section 4, the statistical capabilities of the ScalaTion DSL needed for M&S are illustrated with examples. Section 5 addresses some practical issues that arise when using Unicode. Finally, conclusions and future work are given in section 6.

2. Embedded DSL for M&S

In this section, we discuss in more detail the language features and capabilities listed in the Introduction.

2.1 GPL Language Features

GPLs that take advantage of both Object-Oriented and Functional programming language features enable the use of these features in their embedded or internal DSLs. These features are useful because they allow programmers to work with various levels of abstraction while also increasing readability.

2.1.1 Object-Oriented, Functional Language Features in Scala

Many of Scala’s object-oriented, functional language features make it ideal for implementing an embedded or internal DSL. These include:

• Functional Object Model. Scala’s object model provides the benefits of both object-oriented and functional programming paradigms. In Scala, everything is an object [17].

• For Comprehensions, Folds, and Ranges. More functional language features such as these provide powerful abstractions for writing intuitive segments of code. In most cases, the difference between the parallel and sequential versions of these operations is simply a matter of the implementation of the underlying data structures.

• Mutability & Immutability. The ability to enforce the immutability of an object helps enforce functional data structures and create code with fewer side effects. This also helps create code that is more thread-safe.

• Implicit Conversions. This language feature enables DSL designers to extend the methods and operations available to core language classes, traits and types. We will examine this particular feature more closely in Section 3.

• Tuples. Built-in functionality for handling tuple types not only helps enforce functional programming paradigms but also aids in statically typed pattern matching.

• Generic Arrays. This particular feature, which is not available in Java [17], enables the construction of generic containers such as vectors and matrices with underlying statically typed arrays. This minimizes the need for casting.

• Name-based Operator Precedence. Although this particular feature stems from Scala’s functional object model, we take advantage of it in our implementation of operators defined with Unicode identifiers.

2.1.2 Statically Typed, Compiled Language Features in Scala
M&S is compute intensive for several reasons: systems being modeled are often complex, consisting of many subparts; systems are typically simulated over time; simulation runs need to be replicated to obtain reliable statistics; and alternative designs and scenarios need to be considered. Many of Scala’s statically typed, compiled language features make it ideal for implementing an embedded or internal DSL. These include:

• Typed Language. Scala enforces the type of variables, and if a parameter specifies a specific type then only that type is allowed. There are instances where a different type will be accepted, but those cases are the result of implicit conversions that are explicitly defined by the programmer.

• Static Semantics. Scala checks and validates semantic rules, or well-formedness, at compile time [17]. Examples of static semantics, in general, include rules (usually defined by some context-free grammar [11]) such as identifier declaration and uniqueness in matching labels [23].

• Compiled Language. Scala compiles to bytecode which is executed by the Java Virtual Machine (JVM) at runtime. This speeds up execution when compared to interpreted languages because it removes intermediate steps involved in translating the high-level language code into machine code.

The precedence of operators in Scala is determined by the identifier’s first character. In Scala, Unicode characters are considered to be “special characters” and have a higher precedence than all other operators. The precedence levels, from lowest to highest, are:

    (all letters)
    |
    ^
    &
    = !
    < >
    :
    + -
    * / %
    (all other special characters)

2.2 Unicode in Programming Languages
The language one uses to tell a computer what one wants done can have a large impact on the speed and accuracy of doing this, i.e., writing a computer program. In the 1960s, there was substantial progress in programming languages, notably Algol-60, Simula-67 and Algol-68. Since that time, in some sense progress has been slower, although advances in object-oriented languages and functional languages have been important.

One barrier to making programming languages or domain-specific languages simultaneously more concise and more readable is the limited character set. The use of Unicode in programming languages allows language designers and programmers to utilize a wider range of characters than what is contained in the traditional ASCII subset. Recently, languages have been providing ever greater support for Unicode, as indicated in Table 2:

    Use of Unicode        Java    Scala
    Character Literals    yes     yes
    String Literals       yes     yes
    Method Names          yes     yes
    Prefix Operators      no      no
    Infix Operators       no      yes
    Postfix Operators     no      yes

Table 2. Unicode in Programming Languages
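The Scala column of Table 2 can be illustrated with a small sketch. The names used here (UnicodeSupportDemo, RichNumber, the ↑ power operator, the ° degrees-to-radians operator and the µ mean function) are chosen for illustration only and are not claimed to match ScalaTion's actual definitions.

```scala
import scala.language.postfixOps

object UnicodeSupportDemo {
  // Hypothetical wrapper that adds Unicode operators to Double.
  implicit class RichNumber(val x: Double) {
    def ↑(p: Double): Double = math.pow(x, p)   // used as an infix operator
    def ° : Double = math.toRadians(x)          // used as a postfix operator
  }

  // Method name consisting of a Unicode letter.
  def µ(xs: Seq[Double]): Double = xs.sum / xs.size

  val label: String = "σ² ≥ 0"   // Unicode string literal
  val sigma: Char   = 'σ'        // Unicode character literal

  def main(args: Array[String]): Unit = {
    println(2.0 ↑ 3.0)               // 8.0
    println((180.0 °))               // approximately Pi
    println(µ(Seq(1.0, 2.0, 3.0)))   // 2.0
    println(s"$sigma: $label")
  }
}
```

A prefix form such as `° 180.0` has no counterpart here, which matches the "no" entry for prefix operators in both columns of the table.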
2.2.1 Support for Unicode in Java & Scala

According to the Java Language Specification [9] and the Scala Language Specification [17], both Java and Scala are programming languages which compile to the JVM (Java Virtual Machine) and support Unicode. They support character literals, string literals, and identifiers composed of characters within the Unicode Basic Multilingual Plane (BMP) character set. While this only includes characters in the range 0x0000–0xFFFF, the idea, as described in the Unicode Standard [26], is that this set contains the “majority of common-use characters for all modern scripts of the world”. It is interesting to note that the Scala Language Specification also states that infix operators can be defined with any arbitrary, but syntactically legal, identifier. The language even reserves the Unicode equivalents of some built-in operators: ‘⇒’ is equivalent to the ‘=>’ operator, and ‘←’ is equivalent to the ‘<-’ operator.

[...]

    val rising = 4 ⇑ 4 // 840
    val falling = 4 ⇓ 4 // 24

3.2.3

    println(2 ∈ set1) // output: true
    println(5 ∈ set1) // output: false

Also, as these operations are statically typed, conversions from one type to another are implicitly performed on a contextual basis. For example, in Scala, an Int can be implicitly converted to a Double. This enables the following statement.

    println(2 ∈ set2)

ScalaTion also supports the universal and existential quantifiers.

    println(∀(set1, _ > 2)) // output: false
    println(∃(set1, _ > 2)) // output: true

Product Series

Support for product series is also included in the ScalaTion trait. We implemented this functionality using a Unicode method identified by the unary product symbol ‘∏’, which is similar to the Greek capital letter ‘Π’. The range is defined using Scala’s built-in Range class.

    val prod1 = ∏(1 to 3) // 6
    val prod2 = ∏(1 to 3, i ⇒ i↑2) // 36

    val set = Set(1, 2, 3, 4)
    val prod3 = ∏(set)
    val prod4 = ∏(0 to 2, i ⇒ set(i))

3.2.7 Numeric Vectors
Support for numeric vectors and their operations is included in the ScalaTion trait. As mentioned earlier, in addition to other common vector operations, the Vec class defines a Unicode infix operator for the dot product.

    val vec1 = Vec(1, 2, 3)
    val vec2 = Vec(4, 3, 2)

    val dp1 = vec1 · vec2
    println(dp1) // output: 16
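As a rough illustration of how such an infix operator can be defined, the sketch below uses a hypothetical stand-in for Vec (not ScalaTion's actual class). It assumes the operator is spelled with U+22C5 DOT OPERATOR, a character in Unicode category Sm, which Scala accepts as an operator character.

```scala
object DotProductSketch {
  // Hypothetical stand-in for ScalaTion's Vec; only the dot product is modeled.
  final case class Vec(elems: Double*) {
    // U+22C5 DOT OPERATOR used as an infix method name (an assumption of this sketch).
    def ⋅(that: Vec): Double = {
      require(elems.length == that.elems.length, "vectors must have the same dimension")
      elems.zip(that.elems).map { case (a, b) => a * b }.sum
    }
  }

  def main(args: Array[String]): Unit = {
    val vec1 = Vec(1, 2, 3)
    val vec2 = Vec(4, 3, 2)
    println(vec1 ⋅ vec2)   // 16.0
  }
}
```

Because the method takes a single argument, Scala lets it be written between its operands without a dot or parentheses, which is what gives the expression its textbook appearance.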
As with Array or IndexedSeq in Scala, the Vec class provides random access to the vector elements using the apply method. ScalaTion overloads this method to accept a Range parameter as well. When this is done, a new vector is returned that is equivalent to the original vector sliced at the bounds of the provided range.

    val v = Vec(1, 2, 3, 4, 5, 6, 7)

    val v2 = v(2) // 3
    val v3 = v(3) // 4

    val v2_4 = v(2 to 4) // Vec(3, 4, 5)
    val v2_3 = v(2 until 4) // Vec(3, 4)
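The overloading described above might look roughly like the following. This is a minimal sketch with a hypothetical Vec backed by an immutable Vector[Double]; the actual ScalaTion class may be organized quite differently.

```scala
object SliceSketch {
  // Hypothetical stand-in for ScalaTion's Vec.
  final class Vec(private val elems: Vector[Double]) {
    def apply(i: Int): Double = elems(i)                           // element access
    def apply(r: Range): Vec = new Vec(r.map(i => elems(i)).toVector) // slice by range
    def toVector: Vector[Double] = elems
  }

  object Vec {
    def apply(xs: Double*): Vec = new Vec(xs.toVector)
  }

  def main(args: Array[String]): Unit = {
    val v = Vec(1, 2, 3, 4, 5, 6, 7)
    println(v(2))                      // 3.0
    println(v(2 to 4).toVector)        // Vector(3.0, 4.0, 5.0)
    println(v(2 until 4).toVector)     // Vector(3.0, 4.0)
  }
}
```

Overload resolution picks the right apply from the static type of the argument, so indexing and slicing share one call syntax.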
In an attempt to provide a more mathematically recognized notation for Scala’s built-in Range, we implemented an alias for the to method in the form of a short ellipsis (using the Unicode ellipsis character). This enables users of ScalaTion to write the following:

    val v2_4 = v(2...4) // Vec(3, 4, 5)

3.3 Unresolved Issues
Choosing Scala as ScalaTion’s GPL was not without some limitations. For example, given our current approach, we are unable to implement the following language features:

• Unicode prefix operators. As mentioned earlier, Scala does not allow custom prefix operators in ASCII, Unicode, or otherwise. This required us to implement prefix operators as ordinary methods and functions, requiring parentheses.

    val b = true
    println(¬(b)) // output: false

• Control of precedence levels. As groups of operators in Scala have a fixed level of precedence and all Unicode-identified operators fall into the same group, we are unable to differentiate the precedence of such operators when evaluated. In order to get around this problem, we could take two approaches. First, we could perform some sort of text-substitution preprocessing that replaces Unicode operators with operators of appropriate precedence. For example, in order to preserve the precedence of set union and intersection, we could replace the ∪ and ∩ characters with | and & respectively. In order to provide the subset operator with a lower precedence, we could replace the ⊆ character with subsetOf. Second, we could extend the parser combinators in Scala’s compiler through plugins in order to assign precedence without the need for preprocessing. However, both of these approaches are outside the scope of internal or embedded DSLs, because they externalize the language either by increasing the number of intermediate steps required by the end user during the compilation process or by requiring more than just the GPL and DSL library. (There have been some discussions about either including a certain subset of Unicode operators in Scala with different precedence levels or providing full user control over operator precedence [22].)
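The precedence limitation can be observed directly. In the sketch below, ∪ and ∩ are hypothetical aliases for Scala's built-in | and & on Set (the names PrecedenceSketch and SetOps are assumptions, not ScalaTion definitions). Because both Unicode operators begin with a "special character", they share one precedence level and associate to the left, whereas & binds more tightly than |.

```scala
object PrecedenceSketch {
  // Hypothetical Unicode aliases for set union and intersection.
  implicit class SetOps[A](val s: Set[A]) {
    def ∪(t: Set[A]): Set[A] = s | t
    def ∩(t: Set[A]): Set[A] = s & t
  }

  val a: Set[Int] = Set(1)
  val b: Set[Int] = Set(2)
  val c: Set[Int] = Set(3)

  val ascii: Set[Int]   = a | b & c   // parsed as a | (b & c)
  val unicode: Set[Int] = a ∪ b ∩ c   // parsed as (a ∪ b) ∩ c

  def main(args: Array[String]): Unit = {
    println(ascii)     // Set(1)
    println(unicode)   // Set()
  }
}
```

The two expressions look equivalent on paper but evaluate differently, so code using the Unicode forms needs explicit parentheses wherever the mathematical convention relies on intersection binding more tightly than union.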
4. Statistical Analysis using the ScalaTion DSL

Results are produced in simulation by making multiple runs of a program or by dividing a long run into multiple batches. Each of these produces sample data points that must be analyzed statistically. Consequently, simulation relies heavily upon statistics. The goals guiding the development of the statistical analysis capabilities of ScalaTion are the following:

• Make the code concise and intuitive so that someone reading a Modeling and Simulation or Statistics textbook would find the code easy to use (not exactly the formulas in the textbook, but similar enough to be easily recognized).

• Make the code reasonably efficient. Following notation in a textbook too closely may lead to inefficient code, but if the efficiency leads to obfuscation, efficiency needs to take a backseat.

• Rely heavily on the use of vectors, as this leads to concise and readable code and provides opportunities for parallel processing based on Scala 2.9’s parallel collection classes.

4.1 Random Variate Generation

ScalaTion provides classes for producing many different kinds of probability distributions.

    val rv = Normal(µ, σ)
    val x = rv.gen

The RandVec class provides a way to generate a numeric vector populated with a random distribution of numbers. Each number in the vector has an associated probability. By default, all numbers in a RandVec have equal probability. This class extends Vec and therefore supports all of the vector operations discussed earlier. It also mixes in the Variate trait in order to allow interaction with certain statistical functions.

4.2 Output Analysis

Output analysis is the examination of data generated through a simulation. According to Banks, Carson, Nelson and Nicol [3], the purpose of the statistical analysis is to estimate the confidence interval or to determine the number of observations required to achieve a confidence interval. ScalaTion includes a collection of statistical procedures for analyzing the observed variance in a particular variable or series of variables. Using the mathematical notation described earlier, ScalaTion is able to provide many statistical formulas that look similar to the way they are defined in a textbook. In this section, we will present some of the statistical functions provided by ScalaTion that benefit from the use of Unicode in their function identifiers.

Mean and Expectation

In ScalaTion, the mean of a vector is provided by the mean function. This makes it easier for us to define the mean statistic, µ, for any given RandVec. In the case of the mean, both the sample statistic and the population characteristic are the same.

    def µ (x: RandVec) = x.mean

We should also note that, in general, the expected value E[x] is also referred to as the mean, µ, or the first moment of x.

$$\mu(x) = E[x]$$

This, however, is just a matter of abstraction. In ScalaTion, all types that mix in Variate must define their own mean.

4.2.1 Mean Square

The mean square, or second moment, of x is simply the average of the squares of x.

$$ms = \mu_2 = \mu(x^2)$$

In ScalaTion, we define the ms function to take a RandVec and return this value.

    def ms (x: RandVec) = µ(x↑2)

4.2.2 Variance

In statistics, variance measures how far a set of numbers is spread out. In Banks, Carson, Nelson and Nicol [3], an equation for population variance is provided. For our purposes, we define this equation using the mean of a RandVec as its expected value. We also utilize the mean square calculation.

$$\sigma^2 = V(x) = \mu\left((x - \mu(x))^2\right) = \mu(x^2) - \mu(x)^2 = ms - \mu(x)^2$$

This produces the following definition in ScalaTion.

    def σ2 (x: RandVec) = ms(x) - µ(x)↑2

We also define the sample variance σ2^.

    def σ2^ (x: RandVec) = {
      val n = x.dim
      n * σ2(x) / (n-1)
    }

Note that several sample statistics are provided in ScalaTion, but in this paper we focus on population characteristics for simplicity.

4.2.3 Standard Deviation
Standard deviation shows how much variation there is from the mean or expected value. In Banks, Carson, Nelson and Nicol [3], it is defined as:

$$\sigma = \sqrt{V(x)}$$

In ScalaTion, we define the standard deviation of a RandVec using the following function definition in Scala.

    def σ (x: RandVec) = σ2(x)↓2
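The definitions of the mean, mean square, variance and standard deviation can be tried out on plain Scala collections. The sketch below is an approximation under stated assumptions: it uses Vector[Double] in place of RandVec, math.pow and math.sqrt in place of the ↑ and ↓ operators, and the hypothetical name sampleVar in place of the sample variance identifier.

```scala
object MomentsSketch {
  type Sample = Vector[Double]   // assumption: a plain Vector stands in for RandVec

  def µ(x: Sample): Double  = x.sum / x.size                  // mean
  def ms(x: Sample): Double = µ(x.map(v => v * v))            // mean square (second moment)
  def σ2(x: Sample): Double = ms(x) - math.pow(µ(x), 2)       // population variance
  def σ(x: Sample): Double  = math.sqrt(σ2(x))                // population standard deviation

  // Sample variance: rescales the population variance by n / (n - 1).
  def sampleVar(x: Sample): Double = {
    val n = x.size
    n * σ2(x) / (n - 1)
  }

  def main(args: Array[String]): Unit = {
    val x = Vector(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)
    println(µ(x))    // 5.0
    println(ms(x))   // 29.0
    println(σ2(x))   // 4.0
    println(σ(x))    // 2.0
    println(sampleVar(x))
  }
}
```

Computing the variance as ms minus the squared mean mirrors the identity in the text; for data with a large mean and small spread, a two-pass formula would be numerically safer.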
4.2.4 Skewness

The skewness statistic is a measure of the asymmetry of the probability distribution of a Variate, defined as follows:

$$\mu = \mu(x), \quad \mu_3 = \mu(x^3), \quad \sigma = \sigma(x)$$

$$\gamma_1 = \frac{\mu_3 - 3\mu\sigma^2 - \mu^3}{\sigma^3}$$

Here is the corresponding definition in ScalaTion:

    def γ1 (x: RandVec) = {
      val (µx, µ3, σx) = (µ(x), µ(x↑3), σ(x)) // local names must not shadow the functions µ and σ
      (µ3 - 3 * µx * σx↑2 - µx↑3) / (σx↑3)
    }
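The closed form for γ1 can be checked by expanding the third central moment. The derivation below is a sketch that assumes γ1 denotes the standardized third central moment and uses µ(x²) = σ² + µ², which follows from the variance identity in Section 4.2.2.

```latex
\begin{align*}
\mu\left((x - \mu)^3\right)
  &= \mu(x^3) - 3\mu\,\mu(x^2) + 3\mu^2\,\mu(x) - \mu^3 \\
  &= \mu_3 - 3\mu\,(\sigma^2 + \mu^2) + 3\mu^3 - \mu^3 \\
  &= \mu_3 - 3\mu\sigma^2 - \mu^3 \\[4pt]
\gamma_1
  &= \frac{\mu\left((x - \mu)^3\right)}{\sigma^3}
   = \frac{\mu_3 - 3\mu\sigma^2 - \mu^3}{\sigma^3}
\end{align*}
```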
4.2.5 Covariance

The covariance statistic is a measure of how much two variables change together. It is defined by the following equation.

$$\mathrm{cov}(x, y) = \mu\left((x - \mu(x))(y - \mu(y))\right) = \mu(xy) - \mu(x)\mu(y)$$

Here is the corresponding definition in ScalaTion:

    def cov (x: RandVec, y: RandVec) = µ(x*y) - µ(x) * µ(y)
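The second equality in the covariance equation follows from the linearity of the mean, treating µ(x) and µ(y) as constants. A short worked expansion:

```latex
\begin{align*}
\mathrm{cov}(x, y)
  &= \mu\left((x - \mu(x))(y - \mu(y))\right) \\
  &= \mu(xy) - \mu(x)\,\mu(y) - \mu(x)\,\mu(y) + \mu(x)\,\mu(y) \\
  &= \mu(xy) - \mu(x)\,\mu(y)
\end{align*}
```

Setting y = x recovers the variance identity, since cov(x, x) = µ(x²) − µ(x)².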
4.2.6 Correlation

The population correlation statistic, also known as the Pearson product-moment correlation coefficient, is a measure of dependence between two Variate objects. It is defined by the following equation.

$$\rho(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma(x)\,\sigma(y)}$$

Here is the corresponding definition in ScalaTion:

    def ρ (x: RandVec, y: RandVec) = cov(x, y) / (σ(x) * σ(y))
4.2.7 Autocorrelation

The first-order autocorrelation statistic of a Variate is the correlation statistic between different ranges of the Variate. We generalize this into the following formula.

$$\rho(x_{0 \ldots n-1}) = \rho(x_{0 \ldots n-2},\; x_{1 \ldots n-1})$$

In ScalaTion, autocorrelation is defined similarly using the correlation function we defined earlier. When we combine this with ScalaTion’s built-in vector slicing, we are able to produce the following Scala code.

    def ρ (x: RandVec) = {
      val n = x.dim
      ρ(x(0...(n-2)), x(1...(n-1)))
    }
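The covariance, correlation and first-order autocorrelation definitions can be exercised together on plain collections. This sketch assumes Vector[Double] in place of RandVec and uses init and tail in place of ScalaTion's range-based slicing; it makes no attempt to guard against a zero standard deviation, in which case the result is NaN.

```scala
object CorrelationSketch {
  type Sample = Vector[Double]   // assumption: a plain Vector stands in for RandVec

  def µ(x: Sample): Double = x.sum / x.size
  def σ(x: Sample): Double = math.sqrt(µ(x.map(v => v * v)) - µ(x) * µ(x))

  // cov(x, y) = µ(xy) - µ(x) µ(y), with xy taken element by element.
  def cov(x: Sample, y: Sample): Double =
    µ(x.zip(y).map { case (a, b) => a * b }) - µ(x) * µ(y)

  def ρ(x: Sample, y: Sample): Double = cov(x, y) / (σ(x) * σ(y))

  // Lag-1 autocorrelation: correlate x(0 .. n-2) with x(1 .. n-1).
  def ρ(x: Sample): Double = ρ(x.init, x.tail)

  def main(args: Array[String]): Unit = {
    val x = Vector(1.0, 2.0, 3.0, 4.0)
    val y = x.map(v => 2 * v + 1)                   // perfectly linear in x
    println(ρ(x, y))                                // close to 1.0
    println(ρ(Vector(1.0, -1.0, 1.0, -1.0, 1.0)))   // close to -1.0
  }
}
```

Overloading ρ on the number of arguments keeps the one-argument form visually tied to the two-argument form, as in the text; Scala requires explicit result types on overloaded methods that call each other.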
4.2.8 Batch Means

ScalaTion extends RandVec to create BatchVec, a class for calculating the batch means and confidence levels for simulation. The method of batch means is a popular output analysis technique used for steady-state simulations [3]. Given an initial batch size of b, we try new batch sizes (doubling for each attempt) until the autocorrelation of the batch means drops to a threshold of, for example, 0.1. In ScalaTion, we implement this using the following Scala code:

    def makeBatch (b: Int, n: Int = 1): RandVec = ??? // simulate to collect b*n sample data points

    def µBatch (x: RandVec, b: Int): RandVec = {
      val n = x.dim / b // number of complete batches
      for (i ← 0 until n) yield µ(x((i*b)...((i+1)*b-1)))
    }

    def formBatches (b: Int = 10, n: Int = 10,
                     x: RandVec = RandVec.ofLength(0)): (Int, RandVec, RandVec) = {
      val y = x ++ makeBatch(b, n)
      val µVec = µBatch(y, b)
      (ρ(µVec) > 0.1) match {
        case true => formBatches(2*b, n, y) // still correlated: double the batch size
        case false => (b, y, µVec)
      }
    }
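The batching step on its own can be sketched with standard collection methods. The example below assumes Vector[Double] in place of RandVec and uses grouped in place of range slicing; like the integer division in the listing above, it ignores a trailing partial batch.

```scala
object BatchMeansSketch {
  type Sample = Vector[Double]   // assumption: a plain Vector stands in for RandVec

  def µ(x: Sample): Double = x.sum / x.size

  // Means of consecutive, non-overlapping batches of size b.
  // A trailing batch with fewer than b points is dropped.
  def µBatch(x: Sample, b: Int): Sample =
    x.grouped(b).filter(_.size == b).map(g => µ(g)).toVector

  def main(args: Array[String]): Unit = {
    val x = (1 to 12).map(_.toDouble).toVector
    println(µBatch(x, 4))   // Vector(2.5, 6.5, 10.5)
    println(µBatch(x, 5))   // Vector(3.0, 8.0)
  }
}
```

Doubling b, as formBatches does, halves the number of batch means while averaging over longer stretches of the run, which is what drives their autocorrelation down.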
Now that the batch means are sufficiently uncorrelated, we can compute a confidence interval and determine relative precision (ratio of the confidence interval half-width to the grand mean).

    var (b, x, µVec) = formBatches()
    var (gµ, precision) = (0.0, 0.0)
    do {
      gµ = µ(µVec) // grand mean of the batch means
      precision = µVec.interval() / gµ // relative precision
      if (precision > 0.2) {
        x = x ++ makeBatch(b)
        µVec = µBatch(x, b)
      }
    } while (precision > 0.2)
The loop above will cause additional batches to be collected until a sufficient relative precision is obtained. 4.3
Comparative Analysis
In simulation, comparative analysis may be used to consider design alternatives, e.g., which server configuration is more efficient, two slower, less costly chips or one faster, more expensive chip. Again, as simulation results are stochastic, it is important to use rigorous statistical techniques to compare alternatives. There are several techniques for comparing design alternatives including paired-t tests and ANOVA as well as advanced techniques for ranking and selection [13]. 4.3.1
One-way Analysis of Variance
ScalaTion provides an Anova class and object for performing a oneway Analysis of Variance (ANOVA). One-way ANOVA is often
used to compare multiple treatments (e.g., design alternatives) typically using a Fisher distribution. An Anova object can be constructed with either a numerical matrix or a sequence of numerical vectors. For the following examples, let m and n be the dimensions of the input matrix x. 1 2
The F-statistic is the ratio of the between-groups and within-groups sum of of squares divided by their respective degrees of freedom. It is used in conjunction with a Fisher distribution to determine if the values are statistically significant for some probability.
val m = x.dim1 // m rows val n = x.dim2 // n columns
Each row of the matrix corresponds to a treatment and contains n replicates. Grand Mean
f=
Pm−1
µ(xi ) m
i=0
gµ =
In ScalaTion, we can define the grand mean using the same formula. def gµ = Σ(0, m-1, i => µ(x(i))) / m
ssb /m − 1 ssw /m · (n − 1)
In ScalaTion, we define this value in the Anova class using similar notation. 1
The grand mean is the mean of the means of each group [6]. It is defined by the following equation.
1
F-statistic
5.
def f = (ssb / m-1) / (ssw / m*(n-1))
Practical Issues in Using Unicode
Unicode support for embedded or internal domain-specific languages must include proper tooling. By this, we mean that adequate tools should be provided so that end users of the language can utilize the special Unicode features. As mentioned in the previous section, there are a few missing capabilities that can make this task difficult. However, they is nothing that cannot be dealt with by extending existing technologies.
Total Sum of Squares
5.1
The total sum of squares can be written as the sum of the squares of the group deviations. It is defined by the following equation.
Any claim of advantages or benefits to our efforts to enhance an internal or embedded domain-specific languages through the use of the extended character set available through Unicode could be legitimately criticized if the user is not able to easily incorporate these characters into their programming environment. This issue is not new and validly applies whenever the difficulties or disadvantages of inclusion of a new feature outweigh the potential benefits. It is for this reason that we elaborate on this problem and some of the methods and technologies that eliminate or greatly reduce end user burden when entering Unicode characters. As with the definition of Unicode itself, this is a dynamic process and the advancement of this goal is ongoing. The Unicode Standard - Version 6.0 - Core Specification is a 670 page document that contains the “universal character encoding, extensive descriptions and a vast amount of data how the characters function” [26]. The use of Unicode characters and symbols addresses the challenge of computer system users worldwide to expand upon the base problem of being able to utilize characters and symbols beyond that found on a traditional 80 to 100 key keyboard. The breadth of the problem that Unicode addresses can be illustrated by knowing the that Unicode character set covers over 100,000 characters in 93 scripts [26], although we focus on the BMP Unicode subset. In order for users to effectively utilize the advantages interoperability between different application implementations and the world’s languages that Unicode addresses there must be effective and easy to use methods for entering Unicode characters into a computer system. The data entry methods currently used seem to fit into the categories of supported by hardware, software, or a combination. Keyboards of many designs have long been used to implement spoken and computer languages. The number of national language keyboards exceeds 100 different keyboards. 
There are obvious and known problems with keyboards having a relatively small number of physical keys addressing certain spoken languages. The BIOS of some systems have been designed to accommodate the limited number of special keyboard characters. The “ControlAlt-Delete” key sequence (or chord) has been part of computer users’ entry repertoire for decades . Similarly, some computer programming languages and applications use symbols that are not universally found on computer key-
sst =
m−1 X n−1 X
(x2i,j ) − m · n · gµ2
i=0 j=0
In the Anova class, we define the total sum of squares using the same formula. 1
def sst = Σ(0, m-1, i ⇒ Σ(x(i)↑2)) - m*n*gµ↑2
The code above takes advantage of applying the exponentiation operator to each element of the vector. Between-groups Sum of Squares The between-groups sum of squares can be written as the square of the sum of deviations between each group. It can be defined by the following equation. ssb =
m−1 X n−1 X i=0 j=0
x2i,j n
− m · n · gµ2
We implement this formula in ScalaTion using the following Scala code. 1
def ssb = Σ(0, m-1, i ⇒ Σ(x(i))↑2/n) - m*n*gµ↑2
Within-groups Sum of Squares The within-groups sum of squares can be written as the square of the sum of deviations within each group. It can simplified into the difference between the total and between-groups sum of squares. It is defined by the following equation. ssw = sst − ssb We easily define this statistic in ScalaTion using the following Scala code. 1
val ssw = sst - ssb
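The three sums of squares can be checked on a small data set. The following is a minimal sketch using plain Scala arrays rather than ScalaTion's vector and matrix classes. The object AnovaSketch, its summation helper, and the method sums are hypothetical stand-ins written for illustration; they are not the Anova class itself.

```scala
object AnovaSketch {
  // Hypothetical stand-in for a summation helper: sum of f(i) for i = lo..hi.
  def Σ(lo: Int, hi: Int, f: Int => Double): Double = (lo to hi).map(f).sum

  // x(i)(j) is observation j of group i; m groups with n observations each.
  // Returns (sst, ssb, ssw).
  def sums(x: Array[Array[Double]]): (Double, Double, Double) = {
    val m = x.length
    val n = x(0).length
    val gmu = x.map(_.sum).sum / (m * n)                 // grand mean
    val correction = m * n * gmu * gmu                   // m * n * gmu^2
    // total: sum of all squared observations minus the correction term
    val sst = Σ(0, m - 1, i => x(i).map(v => v * v).sum) - correction
    // between groups: squared group totals divided by n, minus the correction term
    val ssb = Σ(0, m - 1, i => math.pow(x(i).sum, 2) / n) - correction
    (sst, ssb, sst - ssb)
  }
}
```

For the two groups (1, 2, 3) and (4, 5, 6), the grand mean is 3.5, so sst = 91 - 73.5 = 17.5, ssb = 87 - 73.5 = 13.5, and ssw = 4.0, which matches the within-group squared deviations computed directly (2 + 2).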
Input Methods for Unicode
boards. The concurrent use of multiple keys, or a chord, is used to address entry of characters not otherwise found on a keyboard. Certainly, the concurrent use of a shift key allows for entry of upper and lower case letters. Likewise, the “alt”, “ctrl”, and “alt-ctrl” keys expand the base keyboard character sets. An example of one form of special and unique hardware support of an expanded character set is illustrated by the Art Lebedev Studio keyboard offerings. Their “Optimus Tactus keyboard does not have physical individual keys removing restrictions upon the shape and size” of keys [1].
Figure 3. XK-Professional by P.I. Engineering
Figure 1. Optimus Tactus Keyboard by Art Lebedev Studio
Additionally, any part of the keyboard surface can be programmed to perform a function or to display an image. The “Tactus” can be programmed to appear as a typical QWERTY keyboard or as a video image [2]. The “Maximus” keyboard does have typical physical keys, but it can be programmed to enter characters of many languages, special symbols, HTML code, and math functions. Each key top is a small display that indicates what the key is programmed to do [1].
Figure 2. Optimus Maximus Keys by Art Lebedev Studio
Another example of hardware supporting an extended character set is provided by the X-Keys product series [20]. The X-Keys products (keyboards, keypads, and other devices) are physical extensions of, or auxiliary data entry devices for, the standard keyboard. Without a major interruption to data entry, a user can switch from the standard keyboard to an auxiliary key device and thereby use a greatly expanded character set. Specialty hardware is certainly not a requirement for the entry of Unicode characters. There is a large (and growing) number of software products that address the entry of expanded character sets. Microsoft Windows provides a basic method for entering Unicode characters: using “alt + num pad” input, users can enter a Unicode (UTF-16) character by typing its numeric code [14]. This expands the character set that can be entered from a standard keyboard. Other software tools combine the keyboard with the screen or display; ISO 14755 refers to this as a screen-selection entry method [7].
5.2 ScalaTion-specific Considerations
The need for entry of an expanded character set in ScalaTion is not as broad as the general Unicode character entry problem. Currently, ScalaTion shows how using a subset of familiar mathematical symbols produces code that appears closer to the end user’s problem formulation. It is very important to give the user an easy and efficient way to enter the Unicode symbols used in this version of ScalaTion. End users have their choice of entry methods: hardware, software, or a combination of the two. What is important is for users to understand that data entry should not be a stumbling block to using a problem-solving tool whose character set goes beyond that of the standard keyboard. We believe that, as hardware and software continue to mature, entry of expanded character sets will likely involve graphics, multimedia, mobile devices, and the full range of input and output devices.
6. Conclusions and Future Work
We have developed and presented ScalaTion, an embedded (internal) DSL for M&S, which we believe will streamline language innovation in this domain through its use of Unicode-encoded identifiers. The code and documentation are available on the ScalaTion project website: http://code.google.com/p/scalation/. Through our case study on ScalaTion, we have demonstrated that there are ways to make Scala code more concise, more readable, and closer in form to what M&S domain experts already know. ScalaTion provides Unicode-identified functions and operators that are easily recognized by domain engineers in M&S. Such domain-specific notation enables these engineers and other users of ScalaTion to write concise, easily readable code. We took advantage of three different methods for adding Unicode support to the ScalaTion DSL. The first and easiest method was the creation of new classes and objects that define their own Unicode operators (e.g., the dot product operator in Vec). The second method was Scala’s mixin composition, which enabled us to add Unicode constants, functions, and operators to the scope of any newly created object. For example, when extending the App trait for easy application creation, we can mix in the ScalaTion trait, which enables the use of these Unicode definitions within the application object. The third method was implicit conversions, which enabled us to implicitly add Unicode functions and operators to existing Scala classes, objects, and types (e.g., adding the ∈ operator to all types so that we can test whether a value is contained in a set). These methods demonstrate how easy it is to add Unicode support to both new and existing DSLs implemented in the Scala programming language.
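The three methods can be illustrated side by side. The following is a minimal sketch under stated assumptions: the names UnicodeDslSketch, UnicodeNames, ElementOps, toElementOps, and Demo are hypothetical, this Vec is a toy class rather than ScalaTion's, and the choice of ⋅, π, and √ as symbols is an assumption made for illustration.

```scala
import scala.language.implicitConversions

object UnicodeDslSketch {
  // Method 1: a new class that defines its own Unicode operator (dot product).
  final case class Vec(elems: Double*) {
    def ⋅(that: Vec): Double =
      elems.zip(that.elems).map { case (a, b) => a * b }.sum
  }

  // Method 2: a trait that brings Unicode names into scope when mixed in.
  trait UnicodeNames {
    val π: Double = math.Pi
    def √(x: Double): Double = math.sqrt(x)
  }

  // Method 3: an implicit conversion that adds ∈ to values of any type.
  final class ElementOps[A](val a: A) {
    def ∈(s: Set[A]): Boolean = s.contains(a)
  }
  implicit def toElementOps[A](a: A): ElementOps[A] = new ElementOps(a)

  // An object that mixes in the trait and uses all three mechanisms.
  object Demo extends UnicodeNames {
    val dot: Double     = Vec(1.0, 2.0, 3.0) ⋅ Vec(4.0, 5.0, 6.0)
    val root: Double    = √(16.0)
    val member: Boolean = 3 ∈ Set(1, 2, 3)
  }
}
```

The implicit conversion is the most invasive of the three, since it applies to every type in scope; in a larger code base it may be preferable to restrict it to the types that actually need the operator.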
Although ScalaTion already provides many of the functions needed for programming in M&S, there is always room for improvement. Here are some of our proposals for future work.
• IDE Plugins and Frontends. We will work on integrating tools for using the ScalaTion DSL into Integrated Development Environments (IDEs). This will include toolbars for selecting Unicode-identified operators and extensions to content-assist services for looking up and suggesting operators that are available and contextually relevant. Popular IDEs that currently support Scala via plugins include Eclipse and IntelliJ. Some work has already begun on extending Eclipse to support the ScalaTion DSL via toolbar plugins. In the future, such developments may lead to a unified frontend for ScalaTion similar to the frontends of external DSLs like R, Maple, Mathematica, and MATLAB. Such work will also help ease the input and output of mathematical notation in ScalaTion.
• LaTeX to ScalaTion. As seen earlier, many mathematical and statistical formulas can be expressed in ScalaTion. It would therefore be convenient for users of the ScalaTion DSL if we implemented a way to convert formulas written in LaTeX to code that compiles with ScalaTion. This convenience extends beyond allowing users of the DSL to first write their formulas in LaTeX. It also enables users to write their formulas in languages and environments (e.g., Maple) that support export to LaTeX.
• ScalaTion to LaTeX. It would often be convenient if a user of the ScalaTion DSL could easily convert code written in ScalaTion to LaTeX (for instance, when preparing formulas for a paper). Because the syntax of both ScalaTion and LaTeX is linear, it should be possible to parse a formula written in one and convert it to the other.
• Prefix Operators via Compiler Plugins. Unless Scala changes how it handles operator precedence and associativity, we need to work on ways to define them for our Unicode-identified operators. This can be accomplished through the development of plugins for the Scala compiler. Possible implementations range from something as simple as regular expression substitution in a pre-processor phase to something as non-trivial as extending Scala’s own lexical parser combinators. We will also explore the language virtualization benefits of Rompf and Odersky’s work [21] on Lightweight Modular Staging (LMS) in order to make these improvements easier to implement. Such work will help make the language both more familiar to and easier to use for domain engineers.
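The simplest option mentioned in the last proposal, regular expression substitution in a pre-processor phase, might look like the following. This is a minimal sketch, not a compiler plugin and not an existing ScalaTion feature: the object PrefixPreprocessorSketch, the choice of ¬ as the prefix symbol, and the target function name not are all hypothetical. It relies on the fact that Scala only treats +, -, ! and ~ as prefix operators, so a prefix ¬ has to be rewritten before compilation.

```scala
object PrefixPreprocessorSketch {
  // Matches a prefix ¬, optional whitespace, and a simple identifier.
  private val PrefixNot = """¬\s*([A-Za-z_][A-Za-z0-9_]*)""".r

  // Rewrites each "¬name" in the source text to "not(name)".
  // Only simple identifiers are handled; parenthesized operands,
  // string literals, and comments are deliberately out of scope.
  def rewrite(source: String): String =
    PrefixNot.replaceAllIn(source, m => s"not(${m.group(1)})")
}
```

For example, rewrite("if (¬done) retry()") yields "if (not(done)) retry()". A purely textual rewrite like this cannot distinguish code from string literals or comments, which is one reason a lexer-level or compiler-plugin approach may be preferable.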
Acknowledgments We would like to acknowledge the other ScalaTion group members for their contributions to the project: Jun Han, Maria Hybinette and Robert Davis.
References
[1] Art. Lebedev Studio. Optimus Maximus keyboard. May 2011. http://www.artlebedev.com/everything/optimus/.
[2] Art. Lebedev Studio. Optimus Tactus keyboard. May 2011. http://www.artlebedev.com/everything/optimus-tactus/.
[3] J. Banks, J. S. Carson, B. L. Nelson, and D. M. Nicol. Discrete-Event System Simulation. Prentice Hall, 4th edition, 2000.
[4] H. Chafi, A. K. Sujeeth, K. J. Brown, H. Lee, A. R. Atreya, and K. Olukotun. A Domain-Specific Approach to Heterogeneous Parallelism. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP ’11, pages 35–46, New York, NY, USA, 2011. ACM.
[5] A. van Deursen, P. Klint, and J. Visser. Domain-Specific Languages: An Annotated Bibliography. ACM SIGPLAN Notices, 35(6):26–36, 2000.
[6] B. Everitt and A. Skrondal. The Cambridge Dictionary of Statistics. Cambridge University Press, New York, NY, USA, 2002.
[7] International Organization for Standardization, Geneva; International Electrotechnical Commission, Geneva. ISO/IEC 14755 - Input methods to enter characters from the repertoire of ISO/IEC 10646 with a keyboard or other input devices. 1996.
[8] Gabriel C. Boolean Algebra Internal DSL in Scala. May 2011. http://gabrielsw.blogspot.com/2009/06/boolean-algebra-internal-dsl-in-scala.html.
[9] J. Gosling, B. Joy, G. Steele, and G. Bracha. Java(TM) Language Specification, The (3rd Edition) (Java (Addison-Wesley)). Addison-Wesley Professional, 2005.
[10] C. Hofer and K. Ostermann. Modular Domain-specific Language Components in Scala. In Proceedings of the Ninth International Conference on Generative Programming and Component Engineering, GPCE ’10, pages 83–92. ACM, 2010.
[11] C. Hofer, K. Ostermann, T. Rendel, and A. Moors. Polymorphic Embedding of DSLs. In Proceedings of the 7th International Conference on Generative Programming and Component Engineering, GPCE ’08, pages 137–148. ACM, 2008.
[12] D. Knuth. Mathematics and Computer Science: Coping with Finiteness. Mathematics: People, Problems, Results, 2, 1984.
[13] A. M. Law and W. D. Kelton. Simulation Modeling & Analysis. McGraw-Hill, 2nd edition, 1982.
[14] Microsoft Developer Network, Microsoft Corporation. Glossary of Terms Used on this Site. May 2011. http://msdn.microsoft.com/en-us/goglobal/66964658.aspx.
[15] J. A. Miller, J. Han, and M. Hybinette. Using Domain Specific Language For Modeling and Simulation: ScalaTion as a Case Study. In Proceedings of the 2010 Winter Simulation Conference, pages 741–752. ACM, December 2010.
[16] M. Odersky. Poor Man’s Type Classes. In Presentation at the meeting of IFIP WG, July 2006.
[17] M. Odersky. The Scala Language Specification, Version 2.8. Technical report, Programming Methods Laboratory, EPFL, Nov. 2010.
[18] M. Odersky, P. Altherr, V. Cremet, B. Emir, S. Maneth, S. Micheloud, N. Mihaylov, M. Schinz, E. Stenman, and M. Zenger. An Overview of the Scala Programming Language. Technical report, Citeseer, 2004.
[19] M. Odersky, V. Cremet, C. Rockl, and M. Zenger. A Nominal Theory of Objects with Dependent Types. In Proceedings of 2003 European Conference on Object-Oriented Programming, ECOOP ’03, pages 201–224. Springer, 2003.
[20] P.I. Engineering, Inc. X-keys Programmable Devices. May 2011. http://www.piengineering.com/xkeys.php.
[21] T. Rompf and M. Odersky. Lightweight Modular Staging: a Pragmatic Approach to Runtime Code Generation and Compiled DSLs.
[22] Scala - User Mailing List. Special Unicodes in Method Names and Precedence. May 2011. http://scala-programming-language.1934581.n4.nabble.com/Special-unicodes-in-method-names-and-precedence-td1957496.html.
[23] M. L. Scott. Programming Language Pragmatics. Morgan Kaufmann Publishers, 2nd edition, 2000.
[24] The Apache Software Foundation. Apache Camel - Scala DSL. May 2011. http://camel.apache.org/scala-dsl.html.
[25] The Squeryl Team. Squeryl - A Scala ORM for SQL Databases. May 2011. http://squeryl.org/.
[26] The Unicode Consortium. The Unicode Standard, Version 6.0.0 – Core Specification. Technical report, Mountain View, CA, USA, 2011.
[27] D. Wampler. Sake: a Build Tool written in Scala. May 2011. https://github.com/deanwampler/SakeScalaBuildTool.