Department of Computer and Information Science

Norwegian University of Science and Technology

Extending Rosetta: The Embedding of a Programming Language

Project Report

Thomas Ågotnes

Knowledge Systems Group

Trondheim, April 30, 1998


Abstract

The information content in the world is rapidly increasing, and tools are needed to extract useful and previously unknown patterns from large data sets. One such tool is Rosetta, an interactive software system for knowledge discovery within the framework of the Rough Sets theory. Knowledge discovery has an important experimental aspect, which has been recognized in the design of Rosetta's graphical user interface. However, such interfaces generally offer few means of automating subtasks and creating abstractions of frequently used computational patterns.

The main objective of this report is to design a programming language and to implement an interpreter for it in Rosetta. A programming language will, by providing an interface to Rosetta's class library, be a supplement and an alternative to the graphical user interface.

The structures of languages affect the way we think about problems. It is therefore important to find a programming language that allows the user to fully employ the powerful tool of abstraction, and that allows problems to be stated in a natural way. One such language is the algorithmic language Scheme, which is proposed as a programming language for Rosetta in this report.

When an interpreter for Scheme is to be embedded in Rosetta, the interface between the two must be defined. Rosetta's data needs to be accessed from Scheme, and, in order to retain Scheme's semantics, a model that explains the presence of Rosetta from the viewpoint of the interpreter, in terms of Scheme's semantics, must be found. In this report, such a model is proposed. In this model, the Rosetta system is regarded as an enclosing environment to the environment where the interpreter runs. Furthermore, Rosetta's data and algorithm objects are represented in the interpreter as lexical closures. The message passing programming style is used to access the methods of the individual objects. An interpreter exhibiting the proposed model has been implemented.


Preface

This report completes the project in Course 78070 Computer Science, Projects at the Department of Computer Science, the Norwegian University of Science and Technology, Trondheim, Norway. The project work is stipulated at 20 hours per week during one semester.

Many people have influenced this report, either in the past or during the semester. First, I would like to thank my supervisors, Staal Vinterbo and Alexander Øhrn. I would also like to thank Bjarte M. Østvold for useful discussions and insightful comments at the start of the project, and Tor-Kristian Jenssen for providing comments and useful hints regarding Rosetta and memory management. I would like to thank Jan Komorowski for introducing me to the theoretical aspects of programming languages.

Finally, I would like to thank those who provided software or data used during this project. The Free Software Foundation provided high quality software tools. Brent Benson provided libscheme. The breast cancer data set used in Chapter 9 was obtained from Dr. William H. Wolberg, University of Wisconsin Hospitals.

Trondheim, 30 April 1998, Thomas Ågotnes


Contents

Preface

1 Introduction
  1.1 Problem Formulation
  1.2 Formal Abstractions
  1.3 Reader's Guide

2 Data Mining with Rough Sets
  2.1 Knowledge Discovery and Data Mining
  2.2 Rough Sets

3 Rosetta
  3.1 Introducing Rosetta
  3.2 Inside Rosetta
    3.2.1 The Kernel
    3.2.2 The Front-end

4 Programming Languages
  4.1 Languages as Abstraction Tools
    4.1.1 Data Abstraction
    4.1.2 Procedural Abstraction
  4.2 Language Definition
    4.2.1 Syntax
    4.2.2 Semantics
    4.2.3 Pragmatics
  4.3 Language Features
  4.4 Interpreters
    4.4.1 Interpreting Programs
  4.5 Language Paradigms
    4.5.1 The Imperative Language Paradigm
    4.5.2 The Functional Language Paradigm
    4.5.3 The Object Oriented Paradigm
    4.5.4 The Logic Programming Paradigm
  4.6 Scripting and Extension Languages
    4.6.1 Scripting Languages
    4.6.2 Extension Languages

5 An Extension Language for Rosetta
  5.1 Reformulating the Problem
    5.1.1 The Extension Language Game Board
  5.2 General or Special Purpose Language?
    5.2.1 Language Paradigms
  5.3 Programming Language Controversies
    5.3.1 Pitfalls of Imperative Programming
    5.3.2 A Closer Look at the Functional Paradigm
    5.3.3 The Economy of Learning
  5.4 Adopting a Language
    5.4.1 Scheme Overview

6 Integration
  6.1 Constructing a Conceptual Model
  6.2 Considerations
    6.2.1 Visibility in the global namespace
    6.2.2 Restrictions on manipulation
    6.2.3 Scheme-created objects in the GUI
    6.2.4 Where data should live
  6.3 Algorithms and Structures
    6.3.1 Procedural Abstractions: Algorithms
    6.3.2 Data Abstraction: Structures
  6.4 Objects as Closures
    6.4.1 Structures
    6.4.2 Algorithms
  6.5 The Universal Environment
    6.5.1 Naming
    6.5.2 Creating new bindings
  6.6 Summary

7 Scheme Interpreters
  7.1 Pragmatic Concerns
    7.1.1 Garbage Collection
    7.1.2 Tail Recursion
    7.1.3 Continuations
    7.1.4 Architecture
  7.2 Existing Implementations
    7.2.1 Embedding an Existing Implementation
    7.2.2 Classification and Common Features
    7.2.3 Desirable Features
    7.2.4 Implementation Summary
    7.2.5 Elk
    7.2.6 Guile
    7.2.7 libscheme
    7.2.8 MzScheme
    7.2.9 RScheme
    7.2.10 Scheme 48
    7.2.11 SCM
    7.2.12 VSCM
  7.3 Implementing from Scratch
    7.3.1 Garbage Collection
    7.3.2 Architecture
    7.3.3 Tail Recursion and Continuations
  7.4 Conclusion
    7.4.1 An Implementation Strategy

8 Implementation
  8.1 Design Parameters
  8.2 Integration
    8.2.1 The Universal Environment
    8.2.2 Objects As Closures
  8.3 User Interface
  8.4 libscheme

9 Experiences, Conclusions and Future Work
  9.1 Experiences
    9.1.1 Performance Issues
  9.2 Conclusions
  9.3 Future Work

A Source Code
  A.1 InteractiveSchemeInterpreter.h
  A.2 InteractiveSchemeInterpreter.cpp
  A.3 SchemeAlgorithmDispatcher.h
  A.4 SchemeAlgorithmDispatcher.cpp
  A.5 SchemeEditCtrlInputPort.h
  A.6 SchemeEditCtrlInputPort.cpp
  A.7 SchemeEditCtrlOutputPort.h
  A.8 SchemeEditCtrlOutputPort.cpp
  A.9 SchemeEnvironment.cpp
  A.10 SchemeEnvironment.h
  A.11 SchemeError.h
  A.12 SchemeError.cpp
  A.13 SchemeEvaluator.h
  A.14 SchemeEvaluator.cpp
  A.15 SchemeInputPort.h
  A.16 SchemeInputPort.cpp
  A.17 SchemeInterpreter.h
  A.18 SchemeInterpreter.cpp
  A.19 SchemeInterpreterDoc.h
  A.20 SchemeInterpreterDoc.cpp
  A.21 SchemeMessageDispatcher.h
  A.22 SchemeMessageDispatcher.cpp
  A.23 SchemeObject.h
  A.24 SchemeObject.cpp
  A.25 SchemeOutputPort.h
  A.26 SchemeOutputPort.cpp
  A.27 SchemeStructureDispatcher.h
  A.28 SchemeStructureDispatcher.cpp
  A.29 SchemeUniversalEnv.h
  A.30 SchemeUniversalEnv.cpp
  A.31 viewschemeinterpreter.h
  A.32 viewschemeinterpreter.cpp
  A.33 Scheme.h

Chapter 1

Introduction

The most powerful computing element in existence is the human brain, with unparalleled computing power and a vast storage capacity. The world around us is extremely complex, and in spite of its power and capacity not even the human brain would be able to handle the world if it were to deal with every detail of it. To cope with the complexity of the world, humans use abstractions. By using abstractions we can reason about concepts such as "a car" or "to run". Abstractions enable us to think about general concepts such as "cars", without enumerating all the cars that have been, are, and will be in the world. Also, abstractions allow us to think about a car as a car, instead of as a composition of a motor, a body and four wheels, which again are compositions of e.g. a cylinder block and paint, which again are compositions of molecules, and so on. Abstractions are built on abstractions. This report is an exercise in making the powerful tool of abstraction available to the user of the Rosetta system.

The information content in the world is rapidly increasing. Instead of dealing with vast data sets directly, we need tools to extract useful and previously unknown patterns from the data. One such tool is Rosetta. Rosetta is a toolkit for data analysis within the Rough Sets framework. One application of Rosetta is knowledge discovery. The process of knowledge discovery has a distinctive experimental aspect, which has been recognized in the design of Rosetta.

1.1 Problem Formulation

The main objective of this project is to design a programming language for Rosetta, and to implement an interpreter for it. The project formulation has been given as follows:

A software toolkit for data mining and knowledge discovery within the framework of rough set theory has been developed by the Knowledge Systems Group, in cooperation with the Dept. of Mathematics at Warsaw University, Poland. The toolkit, named Rosetta, is written in C++ and a GUI has been developed that runs under Windows 95/NT. In practice when analyzing real-world datasets, the need often arises to partially automate analysis steps that are either repetitions with minor variations, or that are highly problem-specific. A fully flexible data mining tool should therefore offer a programming language of some kind in order to aid in writing of ad-hoc scripts and novel combination of existing library contents. Using the existing Rosetta C++ class library, a language for this purpose is to be designed and an interpreter implemented.

The need for a programming language further emphasizes the importance of the experimental aspect. A language will be a supplement to the graphical user interface (GUI), and will offer control constructs not found in the GUI, such as iteration.

1.2 Formal Abstractions

Since abstractions are mental tools, and thought is (at least) closely related to language, the means of abstraction must also be closely related to language. Programming languages allow us to define formal abstractions, that is, abstractions formally defined in terms of the well-defined semantics of their parts. Programming language abstractions are generally divided into data abstractions and procedural abstractions, describing compound data objects (cf. "cars") and computational patterns (cf. "running").

The selection of a programming language design is not arbitrary. In addition to providing powerful means of abstraction, a suitable language must capture the experimental spirit of Rosetta by lending itself to incremental and interactive problem solving.

1.3 Reader's Guide

This report assumes no background knowledge beyond discrete mathematics and elementary computer science, although familiarity with Rosetta and with Lisp-like programming languages is recommended. In addition, the chapter regarding implementation uses terminology from the C++ programming language.

In the next chapter, an overview of the knowledge discovery process and the Rough Sets theory is given. Rosetta is presented in Chapter 3. In Chapter 4, aspects of programming languages are discussed, including formal definitions, paradigms and types of languages. One section is also devoted to interpreters.

The most important chapters of this report are Chapters 5 and 6. In the former, a programming language for Rosetta is selected. In the latter, the interface between an interpreter for this language and the rest of the Rosetta system is chiseled out. Since an interpreter is to be implemented, Chapter 7 discusses the available alternatives for doing just that, before the concrete implementation is described in Chapter 8. Finally, the last chapter presents experiences and discusses conclusions and future work.

Chapter 2

Data Mining with Rough Sets

An information explosion has followed the entry of the modern computer. Information is becoming increasingly important in our everyday lives, as we enter the information age. Vast amounts of information are produced, collected, communicated and stored at any time. Most of the information produced and stored is never seen by human eyes. Combining atomic pieces of information from a large collection can result in new and useful knowledge. The synergetic effect of combining large databases further increases the potential amount of extractable knowledge, but can also raise important ethical considerations.

2.1 Knowledge Discovery and Data Mining

The rapidly increasing amounts of data require new methods to extract information from databases. The main objective is to extract patterns reflecting relations in the data, rather than atomic items of information. In [Mol97], Mollestad defines Knowledge Discovery (KD) as follows.

Knowledge Discovery (KD) is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data.

The KD process can be seen as a mapping between a set of objects with associated attributes (see the next section), and some new knowledge. Generally, a set of classification rules in human or machine readable form, which can be used to classify new objects, is generated. The task of generating rules is called Data Mining (DM). DM is only one of the subtasks of the KD process. An outline of the overall KD process, taken from [KSS97], is shown in Figure 2.1.

[Figure: Data -> Selection -> Target data -> Preprocessing -> Preprocessed data -> Transformation -> Transformed data -> Data Mining -> Patterns -> Interpretation and evaluation -> Knowledge, with feedback between the steps.]

Figure 2.1: Outline of the overall KD process. Taken from [KSS97].

2.2 Rough Sets

Rough Sets [Paw82] is an approximate set theory, introduced by Pawlak. Rough Sets has become a popular theory for Data Mining, much due to its ability to handle uncertain and/or incomplete data. A thorough understanding of the properties of Rough Sets is not strictly needed for the discussion that follows in this report. Thus, only a short introduction is given in this section. More comprehensive descriptions, rationale and discussions can be found elsewhere, e.g. in [Mol97]. The Rough Sets theory is concerned with sets of objects.

Definition 2.1 (Information System, Decision System) An information system (IS) is an ordered pair $\mathcal{A} = (U, A)$, where the universe $U$ is a finite set of objects, and $A$ is a finite set of attributes where each $a \in A$ is a total function $a : U \to V_a$. An information system is called a decision system if there is a distinguished $d \in A$ that is called a decision attribute. Then, $A - \{d\}$ is the set of condition attributes.

Fundamental to the theory is the ability to discern between objects in an information system via their attributes. If objects are discernible, then the universe is naturally partitioned into classes of mutually indiscernible objects.

Definition 2.2 (Indiscernibility Relation) Let $\mathcal{A} = (U, A)$ be an IS, and $B \subseteq A$. Then the indiscernibility relation

$$\mathrm{IND}(B) = \{(x, y) \in U^2 \mid a(x) = a(y) \text{ for every } a \in B\}$$

is an equivalence relation. The equivalence class of an object $x \in U$ induced by $\mathrm{IND}(B)$ is denoted $[x]_B$.
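The partition induced by the indiscernibility relation can be sketched in C++ as follows. This is only an illustration and not code from the Rosetta kernel: the `Object` alias, the function name `indiscernibility_classes`, and the choice of representing attributes as column indices and values as strings are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: one row of attribute values per object in U.
using Object = std::vector<std::string>;

// Partition object indices into the equivalence classes U/IND(B),
// where B is given as a list of attribute (column) indices.
std::vector<std::vector<std::size_t>> indiscernibility_classes(
    const std::vector<Object>& universe,
    const std::vector<std::size_t>& B) {
  // Objects whose values agree on every attribute in B share a key,
  // and therefore end up in the same class.
  std::map<std::vector<std::string>, std::vector<std::size_t>> classes;
  for (std::size_t x = 0; x < universe.size(); ++x) {
    std::vector<std::string> key;
    for (std::size_t a : B) key.push_back(universe[x][a]);
    classes[key].push_back(x);
  }
  std::vector<std::vector<std::size_t>> partition;
  for (const auto& entry : classes) partition.push_back(entry.second);
  return partition;
}
```

Grouping by the tuple of values on $B$ avoids comparing every pair of objects; two objects are in the same returned class exactly when they agree on all attributes in $B$.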

Given a partition, the particular attributes discerning each pair of equivalence classes can be written down.


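Definition 2.3 (Discernibility Matrix) Let $\mathcal{A} = (U, A)$ be an IS, $B \subseteq A$, $n = |U/\mathrm{IND}(B)|$ and $E_i \in U/\mathrm{IND}(B)$, $1 \le i \le n$. Then the symmetric $n \times n$ discernibility matrix of $\mathcal{A}$ is $M[B] = \{m[B](i, j)\}$, $1 \le i, j \le n$, where

$$m[B](i, j) = \{a \in B \mid a(E_i) \neq a(E_j)\}, \quad 1 \le i, j \le n$$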
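As a hedged illustration of the definition, the matrix entries can be computed directly from one representative object per equivalence class. The names `Row`, `Matrix` and `discernibility_matrix` are hypothetical and chosen for this sketch only; attributes are again assumed to be column indices.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: each equivalence class E_i is given by the
// attribute values of one representative (all its members agree on B).
using Row = std::vector<std::string>;
using Matrix = std::vector<std::vector<std::set<std::size_t>>>;

// Entry (i, j) holds the attributes in B on which E_i and E_j differ.
Matrix discernibility_matrix(const std::vector<Row>& representatives,
                             const std::vector<std::size_t>& B) {
  const std::size_t n = representatives.size();
  Matrix m(n, std::vector<std::set<std::size_t>>(n));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t a : B)
        if (representatives[i][a] != representatives[j][a]) m[i][j].insert(a);
  return m;
}
```

The diagonal entries are always empty and the matrix is symmetric, so an implementation concerned with efficiency would only need to fill one triangle.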

A particular attribute may not be needed to discern between the objects in a given equivalence class. Then the attribute is said to be dispensable. A subset of the original attribute set that preserves the partitioning of the universe is called a reduct.

Definition 2.4 (Dispensability, Reducts) Let $\mathcal{A} = (U, A)$ be an IS, and $B \subseteq A$. Then $a \in B$ is dispensable in $B$ if $\mathrm{IND}(B) = \mathrm{IND}(B - \{a\})$. $B' \subseteq B$ is a reduct of $B$ if all attributes $a \in B - B'$ are dispensable and $\mathrm{IND}(B') = \mathrm{IND}(B)$.

Because reducts describe a partition of the universe, they induce decision rules that describe the functional dependency between the decision attribute and the condition attributes. Given a partition of the universe, an arbitrary set of objects can be approximated by a subset of the equivalence classes.

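Definition 2.5 (Lower and Upper Approximation, Boundary Region) Let $\mathcal{A} = (U, A)$ be an IS, $X \subseteq U$ and $B \subseteq A$. Then, the B-lower approximation $\underline{B}X$ and the B-upper approximation $\overline{B}X$ of $X$ with respect to $B$ are

$$\underline{B}X = \{x \in U \mid [x]_B \subseteq X\}$$
$$\overline{B}X = \{x \in U \mid [x]_B \cap X \neq \emptyset\}$$

The set $\overline{B}X - \underline{B}X$ is called the B-boundary of $X$.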
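The approximations can be sketched in a few lines of C++, given the partition $U/\mathrm{IND}(B)$ as a list of classes. The names `ObjectSet`, `Approximation`, `approximate` and `boundary`, as well as the use of integer object identifiers, are assumptions made for this illustration and do not reflect the Rosetta class library.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

using ObjectSet = std::set<int>;  // objects identified by integer ids (an assumption)

struct Approximation {
  ObjectSet lower;  // union of the classes contained in X
  ObjectSet upper;  // union of the classes that intersect X
};

// Approximate X by the (non-empty) equivalence classes in `partition`.
Approximation approximate(const std::vector<ObjectSet>& partition,
                          const ObjectSet& X) {
  Approximation result;
  for (const ObjectSet& cls : partition) {
    // std::includes requires sorted ranges, which std::set guarantees.
    const bool inside =
        std::includes(X.begin(), X.end(), cls.begin(), cls.end());
    const bool overlaps = std::any_of(
        cls.begin(), cls.end(), [&X](int x) { return X.count(x) > 0; });
    if (inside) result.lower.insert(cls.begin(), cls.end());
    if (overlaps) result.upper.insert(cls.begin(), cls.end());
  }
  return result;
}

// B-boundary: objects in the upper but not in the lower approximation.
ObjectSet boundary(const Approximation& a) {
  ObjectSet out;
  for (int x : a.upper)
    if (a.lower.count(x) == 0) out.insert(x);
  return out;
}
```

For instance, with the classes {0, 1}, {2} and {3, 4} and X = {0, 1, 3}, the lower approximation is {0, 1}, the upper approximation is {0, 1, 3, 4}, and the boundary is {3, 4}.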


Chapter 3

Rosetta

The model of the KD process depicted in Figure 2.1 shows abstractions of the subtasks in the process, but reveals nothing about the computational methods that can achieve those tasks. For most of the subtasks, a range of methods is available. For example, a common preprocessing problem is discretization, which can be achieved by a variety of discretization algorithms. Algorithms performing basically the same task may differ in a number of properties, e.g. efficiency, accuracy and memory requirements.

As indicated in Figure 2.1, the KD process is not a strict pipeline, but includes feedback between the substeps. The process is experimental in nature; a set of algorithms and parameters yielding a good result for one instance of a database does not necessarily work for another instance or another database. Not only may the nature of interrelations in the underlying model be different, but also the statistical distribution of data, noise, missing values, size, etc. will affect the steps of the KD process. A discretization algorithm, for instance, may require manual tuning of its parameters depending on the clustering of the data in the database instance. A good configuration of algorithms and parameters cannot be decided in advance, but will usually emerge as a result of extensive experimentation.

In recognition of the need for an experimental, interactive environment for data analysis, Rosetta [K97, KSS97] has been constructed as a joint effort between the Norwegian University of Science and Technology and the University of Warsaw. Rosetta is a toolkit for data analysis within the framework of the Rough Sets theory.

3.1 Introducing Rosetta

The Rosetta system is, at the time of writing, an application running under Windows NT and compatible operating systems. The application consists of two parts, a computational kernel and a GUI front-end. Implemented in the kernel is a range of algorithms and data structures corresponding to the steps in the KD process. The front-end offers a reflection of the kernel in the GUI. The need for experimentation is met by the front-end, which organizes data structures in a project tree. The topology of the project tree shows the relations between the structures, e.g. that a particular reduct set was derived from a particular decision table.


The Rosetta front-end and a sample project tree are shown in Figure 3.1. Several GUI features are included to make the front-end user friendly, including context menus and drag and drop functionality.

Figure 3.1: The Rosetta front-end.

3.2 Inside Rosetta

Here, some of the aspects of the Rosetta implementation are considered. See [KSS97] for a more comprehensive presentation. Rosetta is implemented in C++, and consists of a kernel and a front-end. An important design parameter is that the kernel is independent of the front-end, and can thus be compiled on platforms other than those the GUI is bound to.

3.2.1 The Kernel

The kernel can be viewed as a C++ class library. The main object dichotomy is into structure and algorithm objects. Structure and algorithm objects are, respectively, instances of classes inheriting from the Structure and Algorithm classes. Algorithm objects operate on structure objects, and are able to create new structure objects and/or mutate existing ones.


Memory Management

The need to manually deallocate dynamically allocated objects is a general source of frustration and memory leaks, and is perhaps the most common source of errors in C++ programs. Rosetta attacks this problem by using smart pointers, or handles. A handle is a wrapper object around an ordinary C++ pointer, with approximately the same semantics. The handle mechanism implements reference counting garbage collection by updating special attributes in the referenced object when new references are introduced or old ones cease to exist. When the object is no longer referenced, its storage space is reclaimed. Classes to be used with the handle mechanism must therefore inherit from a particular class, namely the Referent class.

The reference count mechanism works by counting the number of references to an object. Thus, if, say, two objects form a cycle by pointing at each other, they will never be deallocated and a memory leak exists. This restriction must be borne in mind when using the class library.
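The cycle problem can be demonstrated with a small sketch. Rosetta's own handle and Referent classes are not shown in this report section, so the example below substitutes the standard library's `std::shared_ptr`, which is also reference counted; the `Node` type, the `live_nodes` counter and the function names are hypothetical.

```cpp
#include <cassert>
#include <memory>

static int live_nodes = 0;  // number of Node objects currently alive

struct Node {
  std::shared_ptr<Node> next;  // counted reference to another node
  Node() { ++live_nodes; }
  ~Node() { --live_nodes; }
};

// Two nodes without a cycle: both are reclaimed when the function returns.
void acyclic() {
  auto a = std::make_shared<Node>();
  auto b = std::make_shared<Node>();
  a->next = b;  // a -> b only
}

// Two nodes pointing at each other: when the local handles go out of scope,
// each node is still referenced by the other, so neither count reaches zero.
// The returned raw pointer is uncounted and is meant for inspection only.
Node* make_cycle() {
  auto a = std::make_shared<Node>();
  auto b = std::make_shared<Node>();
  a->next = b;
  b->next = a;  // cycle a -> b -> a
  return a.get();
}
```

After a call to `acyclic()` the counter `live_nodes` is back at zero, whereas after a call to `make_cycle()` it stays at two until one of the two references is reset explicitly. A reference counting scheme typically needs such manual cycle breaking, or a non-counting (weak) reference for one direction of the cycle.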

Algorithm Application

An algorithm is applied to a structure object by passing the algorithm to the structure's Apply method. The declaration of the latter is:

    virtual Structure *Apply(const Algorithm &algorithm);

The algorithm parameters that can be specified textually are passed as a single string to the following method:

    virtual bool SetParameters(const String &parameters, bool warn = true);

Other parameters do not have a convenient textual representation, and must be set with special methods.

3.2.2 The Front-end

Rosetta is an MDI application based on Microsoft Foundation Classes (MFC) and Microsoft's document/view architecture. The front-end reflects the contents of the kernel, and new additions to the kernel will appear in the front-end with little or no programming.


Chapter 4

Programming Languages

Designing and implementing a programming language for the problem domain discussed in the previous chapters requires a discussion of several aspects of programming languages. The main dichotomy used by most languages is into data and procedures. Data are collections of attributes, which make it possible to discern between objects. Procedures are descriptions of processes. Processes are "abstract beings that inhabit computers" [AS96]. They are a part of the computer's state, and evolve with time. Processes have the ability to mutate data. The goal of programming with programming languages is to create processes that manipulate data in a particular way. As will be seen in later chapters, in some programming languages the distinction between "active" processes and "passive" data becomes blurred.

In this chapter, some of the aspects of programming languages are illuminated. In the next section, programming languages as tools for abstraction are discussed. Languages are defined by their syntax and semantics, as described in Section 4.2. Selected common features of programming languages are discussed in the following section. An interpreter is to be implemented for the selected language, and interpreters are discussed in Section 4.4, before language paradigms are discussed. Several kinds of embedded languages exist in applications, and these are discussed in the last section.

4.1 Languages as Abstraction Tools

Programming languages provide users with the powerful tool of abstraction. Programmers use abstraction as a tool for

1. Dealing with complexity
2. Increasing modularity

Abstractions are used to create compound data and computational patterns.


4.1.1 Data Abstraction

Data abstraction is a design methodology where the details of the representation of data objects are isolated from the use of the objects. In other words, users of the data objects make no assumptions about the internal representation of the data. This serves two purposes. First, the data objects can be thought of as abstract entities instead of collections of related entities, the same way that we think about a car as an abstract entity. Second, the internal representation is "abstracted away" and is of no concern. Thus, the internal representation can be readily changed if the rest of the system only operates on the abstract data interface.

The data objects are used through a small set of procedures that operate on the internal data. These procedures are the glue between the internal representation and the users of the object: they form the object's interface, and are the only ones that can access the internal data. The set of procedures is divided into those that access data, selectors, and those that create objects, constructors. Together, the internal data and the set of procedures are called an abstract data type, or ADT.

Data abstraction can be seen as a style of programming, but built-in language mechanisms can be of great help. For instance, language mechanisms can help ensure that no assumptions about the internal representation are made, by only allowing the procedures of an ADT to operate on its internal data. This mechanism is called information hiding.
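A minimal sketch of an abstract data type in C++ may make the terms concrete. The rational number example, and the names `Rational`, `numer`, `denom` and `add`, are chosen for illustration only and are not part of Rosetta. The constructor and the two selectors are the only procedures that touch the internal data; information hiding is enforced by the `private` section.

```cpp
#include <cassert>
#include <stdexcept>

class Rational {
 public:
  // Constructor: creates an object; callers never see how it is stored.
  Rational(int numerator, int denominator) {
    if (denominator == 0) throw std::invalid_argument("zero denominator");
    if (denominator < 0) {  // keep the sign in the numerator
      numerator = -numerator;
      denominator = -denominator;
    }
    const int g = greatest_common_divisor(
        numerator < 0 ? -numerator : numerator, denominator);
    n_ = numerator / g;
    d_ = denominator / g;
  }

  // Selectors: the only way to read the internal data.
  int numer() const { return n_; }
  int denom() const { return d_; }

 private:
  // Euclid's algorithm; expects a >= 0 and b > 0.
  static int greatest_common_divisor(int a, int b) {
    while (b != 0) {
      const int t = a % b;
      a = b;
      b = t;
    }
    return a;
  }

  int n_;  // internal representation, hidden from users
  int d_;
};

// Written purely against the interface, so it keeps working if the
// internal representation of Rational changes.
Rational add(const Rational& a, const Rational& b) {
  return Rational(a.numer() * b.denom() + b.numer() * a.denom(),
                  a.denom() * b.denom());
}
```

Here `add(Rational(1, 2), Rational(1, 3))` yields a value whose selectors return 5 and 6. Whether fractions are stored in lowest terms, or reduced only when read, is a decision that can be changed inside the class without touching `add`.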

4.1.2 Procedural Abstraction

How the square root procedure is implemented is of no concern to the user of the procedure. Any procedure that provides the same output for the same input can be used instead of the original procedure. Languages provide means to combine procedures to create procedural abstractions. Procedural abstractions are analogous to our general abstractions of behavior, like "running".
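As a hedged illustration in Python, the sketch below defines two interchangeable square root procedures and a caller that does not care which one it is given. The names sqrt_newton, sqrt_library and hypotenuse are hypothetical, and non-negative input is assumed.

```python
import math

def sqrt_newton(x, tolerance=1e-12):
    """Square root by Newton's method; assumes x >= 0."""
    guess = x if x > 1 else 1.0
    while abs(guess * guess - x) > tolerance * max(x, 1.0):
        guess = (guess + x / guess) / 2
    return guess

def sqrt_library(x):
    """Square root delegated to the standard library."""
    return math.sqrt(x)

def hypotenuse(a, b, sqrt=sqrt_library):
    """The caller depends only on what sqrt computes, not on how."""
    return sqrt(a * a + b * b)

print(hypotenuse(3, 4))                    # uses the library version
print(hypotenuse(3, 4, sqrt=sqrt_newton))  # uses Newton's method
```

Both calls give 5 (up to rounding), which is the point: the user of hypotenuse never needs to know which implementation was chosen.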

4.2 Language Definition

Any formal (or natural) language can be defined by the set of sentences that are valid in this language, along with a description of the meaning of each sentence. The former is referred to as the syntax of the language, the latter as the semantics. Consider the English sentence "He saw Nidarosdomen flying over Trondheim". Although syntactically correct, the sentence cannot be interpreted without a rule that establishes the meaning of the parts of the sentence [1]. A third facet of programming languages, the pragmatics, focuses on implementation.

[1] In this example, the sentence can actually have more than one meaning. A language with a syntax like this is called ambiguous.

4.2. LANGUAGE DEFINITION


4.2.1 Syntax

A language is defined over a set of symbols [2]. The syntax tells us how to compose symbols to make sentences, and thus determines the form of valid sentences and which strings of symbols are in the language.

Consider the task of adding 3 to the product of 4 and 5. This idea can be represented by a variety of syntactic forms (strings of symbols), e.g.

    3 + 4 * 5

or

    (+ 3 (* 4 5))

or simply as the first sentence of this paragraph. In the first two examples, the symbols of the language are single characters like digits. In the last example, the symbols are words from the English language.

Syntax describes the connection between symbols on paper or a computer screen and an abstract concept. In a particular language, one of the above expressions can be a syntactic expression for the abstract concept of applying the operator * to 4 and 5, before applying + to the result and 3. Many texts refer to the abstract concept as abstract syntax, and a concrete string of symbols as concrete syntax. An abstract syntax tree for our example is shown in Figure 4.1. The abstract syntax tree represents the relation between the symbols independent of the concrete representation of each symbol. One can think of the abstract syntax tree as the result of using "abstract" grammar rules (see below) of the type:

• An expression can be an arithmetic expression or a numeral
• An arithmetic expression consists of an operator and two sub-expressions
• An operator can be an addition operator or a multiplication operator

    Expression
      Arithmetic Expression
        Operator
          Addition
        Expression
          Numeral
            3
        Expression
          Arithmetic Expression
            Operator
              Multiplication
            Expression
              Numeral
                4
            Expression
              Numeral
                5

Figure 4.1: Abstract Syntax Tree

[2] Symbols can be defined in terms of e.g. strings of characters, topologies of connected networks of graphical symbols or sequences of smoke signals. In what follows, we consider symbols to be strings of characters, although any other representation would suffice.

In languages with unambiguous semantics, the mapping between concrete and abstract syntax determines the relative precedence of the syntactic constructs. If Figure 4.1 is an abstract syntax tree for the first example above, * has precedence over + in this language. The syntax of a language can be redundant in that more than one syntactic form can be used to specify the same concept. Redundant syntactic forms are called syntactic sugar, and can be defined in terms of "basic" syntactic forms.

Many criteria can be specified in constructing a syntax for a programming language. Some are discussed in [PZ96], and are presented as general goals:

• Readability
• Writeability
• Ease of verifiability
• Ease of translation
• Lack of ambiguity

Obviously, these goals are conflicting. In particular, programs that are easy to write are often difficult to read. Making the process of transforming an idea to a program as simple as possible requires writeability. Writeability is enhanced by, for example, using the same syntactic structures in more than one context. Readability is claimed to enhance the maintainability of computer programs. However, the author's claim is that the features that make a program more readable (e.g. many syntactic forms, natural statement formats and heavy use of assignment) also make the source code more error-prone and less reusable. Many alternative "convenient" syntactic structures make the language less uniform and cause more trouble than they are worth when programs get large and complex [AS96].

Another aspect that is orthogonal to readability and writeability is expressiveness. Expressiveness is not only related to syntactic aspects, but also to e.g. means of abstraction. As an illustration, the Scheme expression

    (map (lambda (x) (+ (* x x) x)) '(1 2 3 4))

is both expressive and writable (it uses three very similar syntactic forms).
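For comparison only, the same computation can be sketched in Python, once with an explicit loop and assignment and once with a higher-order procedure. The name squares_plus_self_loop is illustrative and not taken from the report.

```python
def squares_plus_self_loop(numbers):
    # Explicit loop: the traversal and the accumulation are spelled out.
    result = []
    for x in numbers:
        result.append(x * x + x)
    return result

# Higher-order version: the traversal is abstracted away by map.
mapped = list(map(lambda x: x * x + x, [1, 2, 3, 4]))

print(squares_plus_self_loop([1, 2, 3, 4]))  # [2, 6, 12, 20]
print(mapped)                                # [2, 6, 12, 20]
```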


Formal Methods

Thousands of pages have been written about the syntax of programming languages, and a number of theoretical formalisms and practical tools have been built for it. The early efforts of formal language design focused entirely on syntax, disregarding the importance of semantics. One reason for this is that the tools available at that time were suitable for attacking the problem of syntax. As a result, the concept of syntax is very well understood and can in many contexts be regarded as a "solved problem". Designing tools to automatically recognize the syntax of a program is fairly mechanical. Here we consider the most common formal method for describing syntax: BNF.

Definition 4.1 An alphabet A is a finite, nonempty set. The elements of an alphabet are symbols. A sentence over A is a finite string of symbols from A. A language over A is a subset of A*, the set of all sentences over A.

A language can be described by listing all the sentences of the language. This is obviously not very convenient. The most common method is to use a grammar.

Definition 4.2 A phrase-structure grammar G = (A,T,S,P) consists of an alphabet A, a set T ⊂ A of terminal symbols, a start symbol S ∈ A and a set of productions P that specifies how to derive a new string from another string. The language generated by G, L(G), is the set of all strings in T* that can be derived from S.

Phrase-structure grammars can be classified according to the types of productions that are allowed. The most common classification scheme is due to the American linguist Noam Chomsky. Chomsky defined four types of grammars. The types are numbered from 0 through 3, and every grammar of a given type is also a grammar of each type with a lower number. Especially important in the context of programming languages are the context free grammars (type 2). The languages generated by context free grammars are called context free languages. Nearly all programming languages are contained in the class of context free languages. The most common notation for specifying the productions of a context free grammar is Backus-Naur form (BNF) [3]. In BNF, a production is written with a single non-terminal symbol enclosed in angle brackets to the left of an arrow, and to the right the strings of symbols, separated by |, that can replace the non-terminal [4]. Example:

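    <expression> → <arithmetic expression> | <numeral>
    <arithmetic expression> → <expression> <operator> <expression>
    <numeral> → 0|1|2|3|4|5|6|7|8|9
    <operator> → +|*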

At the level of a programming language, the symbols are the "atomic" terms that make up the sentences of the language. But how do we specify the individual symbols themselves? As mentioned, symbols are strings of characters. Thus, we can define a language with a grammar that uses the set of characters as the alphabet. It turns out that the languages for tokens of programming languages all fall into the smallest class of languages: those generated by regular grammars (type 3). This class is called regular languages. Regular languages are commonly defined by regular expressions.

[3] BNF was invented by John Backus, and refined by Peter Naur for the description of Algol 60.
[4] Actual notation may differ.
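To make the role of regular expressions concrete, the following Python sketch describes the tokens of the small expression language with one regular expression per token class, using the standard re module. The token names, the inclusion of parentheses and the decision to allow multi-digit numerals are assumptions made for this illustration.

```python
import re

# One regular expression per token class (names are illustrative).
TOKEN_SPEC = [
    ("NUMERAL", r"\d+"),      # assumption: numerals may have several digits
    ("OPERATOR", r"[+*]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),         # whitespace separates tokens and is dropped
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Map a string of characters to a list of (token class, text) pairs."""
    tokens = []
    position = 0
    while position < len(text):
        match = TOKEN_RE.match(text, position)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[position]!r} at {position}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens

print(tokenize("3 + 4 * 5"))
```

Each alternative in the combined pattern is a regular expression, so the set of strings accepted for each token class is a regular language.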

4.2.2 Semantics

A language's semantics is a function that maps a syntactic structure, i.e. an abstract syntax tree, to a meaning. In most programming languages, the semantics of a compound expression is a function of the semantics of the parts. A semantics defined this way is called syntax directed. Under a particular semantics, the meaning of the structure in Figure 4.1 can be 23 if the numerals are interpreted as numbers and * and + as the usual binary algebraic operations [5].
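A syntax directed semantics can be sketched as a recursive function over the tree. The Python fragment below assumes a nested-tuple encoding of the tree in Figure 4.1; the encoding and the names TREE, meaning, USUAL and ALTERNATIVE are illustrative. Passing the operator table as a parameter shows that the same tree can be given different meanings.

```python
import operator

# Assumed encoding of Figure 4.1: (operator, left, right) or a numeral string.
TREE = ("+", "3", ("*", "4", "5"))

def meaning(node, operators):
    """The meaning of a node is a function of the meanings of its parts."""
    if isinstance(node, str):          # leaf: interpret the numeral as a number
        return int(node)
    symbol, left, right = node
    return operators[symbol](meaning(left, operators),
                             meaning(right, operators))

USUAL = {"+": operator.add, "*": operator.mul}
ALTERNATIVE = {"+": operator.add, "*": max}   # * read as binary maximum

print(meaning(TREE, USUAL))        # 23
print(meaning(TREE, ALTERNATIVE))  # 8
```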

Formal Methods

Unfortunately, due to the focus on syntax, formal techniques for describing semantics have not been given their fair share of attention [6]. In contrast to syntax, no universally accepted notation is available for semantics, and many programming languages do not have a formally defined semantics. However, the need for a formal mathematical framework for semantics seems to be gaining more and more recognition. The principal methods for description of semantics are:

• Operational Semantics
• Denotational Semantics
• Axiomatic Semantics

[TG97] contains an excellent introduction to operational and denotational semantics. In-depth discussions on formal semantics are provided in [Lee90, Win93, Mey88, Ten90].

4.2.3 Pragmatics

Although a syntax directed semantics says what the meaning of an expression is in terms of the subexpressions, it does not say how to compute the meaning. Pragmatic issues such as memory allocation strategy do not affect the result of the program (the meaning), but can affect the use of resources such as computing time and memory. Some formal language descriptions include pragmatic issues [7].

[5] In another interpretation, * could be interpreted as the binary maximum operator, giving the meaning of the structure to be 8.
[6] As put in [TG97]: This state of affairs is reminiscent of the popular tale of a person who searches all night under a street lamp for a lost item not because the item was lost there but because the light was better.
[7] As we shall see, the Scheme standard requires implementations to be properly tail recursive.

4.3. LANGUAGE FEATURES


4.3 Language Features

Programming languages can be classified according to a wide variety of criteria. A thorough treatment of the properties of the different classifications is outside the scope of this report, hence only a brief discussion in the context of interpreters is provided in this section. One of the most important classifications of programming languages, language paradigms, is postponed to Section 4.5. Some of the features that distinguish programming languages are:

• Typing
• Scoping
• Extent
• Restrictions on Manipulation
• Parameter Passing Methods
• Order of Evaluation
• Case Sensitivity

Typing refers to when type checking occurs. In a dynamically typed language like Lisp (such languages are also called latently typed, and sometimes loosely weakly typed), type checking is deferred as long as possible, typically until evaluation. In a statically typed language like Pascal (sometimes loosely called strongly typed), type checking occurs before evaluation, typically during parsing. Typically, in statically typed languages the programmer must explicitly specify the types of variables, parameters and return values. Most often, static typing restricts the expressiveness of the language. An example is higher-order procedures, where the argument and result types of all argument procedures must be declared in a statically typed language. Unless the type system is polymorphic, this renders certain abstractions of general behavior, like mapping a procedure over all elements in a general data structure, impossible. On the other hand, dynamically typed languages are prone to hard-to-find errors in rarely executed code.

Scoping refers to where non-local variable references are looked up. In a lexically scoped programming language like Pascal, variables that are not bound in the current block are looked up in the enclosing definitions (as they appear in the program code). Thus, variable references can be resolved at parse time. In a dynamically scoped language like Emacs-Lisp, variables that are not bound locally are looked up in the calling environment, and resolution must be postponed to runtime. The problems related to dynamic scoping are obvious: the semantics of an expression depends on the variables of whatever environment it happens to be evaluated from.

Programming languages also differ in the extent of the values, or objects, created. In some languages, objects are deleted when the variables used for their initial bindings go out of scope. In others, like Scheme, objects have unlimited extent and are theoretically never deleted. Because of the (pragmatic) need to reclaim memory, such objects are actually deleted when it can be proved that they cannot be referenced.

Languages have different Restrictions on Manipulation for the different computation elements. Those computation elements with the fewest restrictions have first-class status and can be:

• named by variables
• passed as arguments to procedures
• returned as results from procedures
• included in data structures
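The four properties can be demonstrated with procedures in Python, a language that happens to give procedures first-class status. The names compose, increment and double are illustrative.

```python
def increment(x):
    return x + 1

def double(x):
    return 2 * x

def compose(f, g):
    """Takes procedures as arguments and returns a new procedure."""
    return lambda x: f(g(x))

named = increment                            # named by a variable
table = {"inc": increment, "dbl": double}    # included in a data structure
inc_then_double = compose(double, increment) # passed as arguments, returned as result

print(inc_then_double(3))      # 8
print(table["dbl"](named(1)))  # 4
```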

The main distinction is between languages that award first-class status to procedures and those that do not. The impact of first-class procedures on expressiveness should be obvious.

Parameter Passing Methods refers to the semantics of the formal and actual parameters in a procedure call. Some methods are:

• Pass-by-value
• Pass-by-reference

If parameters are passed by value, the values of the actual arguments are bound to the formal argument variables. In pass-by-reference, the formal arguments are bound to the locations of the actual argument variables. In the latter case, mutation of a variable inside a procedure will have external effects.

The Order of Evaluation differs in when arguments to procedures are evaluated. In languages with lazy evaluation, an argument is evaluated only when its value is needed. In applicative-order languages, arguments are evaluated before the procedure is applied.

Last, programming languages differ in the case sensitivity of identifiers.
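The difference in evaluation order can be made visible with a small Python sketch. Python itself is applicative-order; lazy evaluation is only simulated here by wrapping each argument in a parameterless procedure (a thunk). The names expensive, choose_eager and choose_lazy are hypothetical.

```python
evaluated = []   # records which argument expressions were actually evaluated

def expensive(label, value):
    evaluated.append(label)
    return value

def choose_eager(condition, if_true, if_false):
    # Applicative order: both arguments were evaluated before this body runs.
    return if_true if condition else if_false

def choose_lazy(condition, if_true, if_false):
    # Simulated lazy evaluation: a thunk is called only if its value is needed.
    return if_true() if condition else if_false()

choose_eager(True, expensive("a", 1), expensive("b", 2))
print(evaluated)  # ['a', 'b']

evaluated.clear()
choose_lazy(True, lambda: expensive("a", 1), lambda: expensive("b", 2))
print(evaluated)  # ['a']
```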

4.4 Interpreters

The task at hand is to implement an interpreter, that is, to make a computer program that evaluates a source code program for a given language. Thus, the program has become data to be input to another program. Also observe that the interpreter program itself most often becomes input to yet another program (e.g. a compiler). In this section these issues are investigated further, and a related terminology is developed.

What is needed is a way to define a programming language in terms of another. There are two principal constructions that achieve this: interpreters and translators. Although our primary concern is interpreters, we will need to talk a bit about translators here and in later chapters. The following definitions are purposely vague.

Definition 4.3 An interpreter is a program written in a defining language B that, when run, evaluates a program written in a defined language I.

Definition 4.4 A translator is a program written in a base language B that, when run, transforms a program written in a source language S to a program written in a target language T.

4.4. INTERPRETERS


Consider an executable program running on a computer. Conceptually, the computer's microprocessor does exactly the same thing as an interpreter; it evaluates a program. Thus, the microprocessor can be seen as an interpreter that evaluates programs written in the microprocessor's instruction set (executable object code). The defining language of this interpreter could for example be said to be microcode, and the interpreter could be visualized running on a unit that evaluates microcode. Now what if the executable program is an interpreter itself? Then the microprocessor's instruction set has become the defining language for this interpreter. Together, the "microprocessor interpreter" and the machine it runs on (the microcode unit) have become the machine that our program interpreter runs on, defining the defining language for this interpreter. We see that the combination of an interpreter written in a defining language B for a defined language I and a machine for language B becomes a machine for language I. [TG97] introduces the general concept of a virtual machine:

Definition 4.5 A virtual machine is an agent that takes a program written in a language as input, and returns a result.

Thus, an interpreter is a program that transforms a virtual machine for language B to a virtual machine for the defined language I. Together with the virtual machine that results from the combination of the "microprocessor interpreter" and the (virtual) machine it runs on (the microcode unit), the program interpreter again defines a virtual machine. This virtual machine accepts programs written in the interpreter's defined language.

[TG97] introduces a pedagogical graphical notation for virtual machines and programs (Figure 4.2). The notation resembles a jigsaw puzzle. To illustrate that a program, e.g. an interpreter, cannot do anything until combined with a virtual machine, the virtual machine that accepts the interpreter's defining language fits exactly in the hole in the program. The indentations and protrusions indicate input and output of various forms. The example above is illustrated in Figure 4.3.

Figure 4.2: A virtual machine and a program (notation taken from [TG97]). The figure shows a piece labelled Program with a hole into which a piece labelled Virtual Machine fits.

Figure 4.3: Interpreter example. Instruction set interpreter + Microcode VM = Instruction set VM; Interpreter program + Instruction set VM = Interpreter VM.
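The composition of an interpreter and a virtual machine into a new virtual machine can be mimicked in a few lines of Python. This is only a sketch under simplifying assumptions: the defining language B is taken to be "Python callables", the defined language I is a tiny postfix arithmetic notation invented for the example, and the names base_machine, postfix_interpreter and combine are hypothetical.

```python
def base_machine(program, program_input):
    """Stand-in for a virtual machine for language B (here: Python callables)."""
    return program(program_input)

def postfix_interpreter(source):
    """An interpreter written in B for a tiny defined language I."""
    stack = []
    for token in source.split():
        if token in ("+", "*"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if token == "+" else left * right)
        else:
            stack.append(int(token))
    return stack.pop()

def combine(interpreter, machine):
    """Interpreter for I written in B + machine for B = machine for I."""
    return lambda source: machine(interpreter, source)

postfix_machine = combine(postfix_interpreter, base_machine)
print(postfix_machine("3 4 5 * +"))  # 23
```

The result of combine takes a program in the defined language as input and returns a result, so it fits Definition 4.5 in the same way the base machine does.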


4.4.1 Interpreting Programs

To help understand some of the more pragmatic issues relating to an interpreter, an overview of its workings follows. This is a highly simplified description. The interpretation of a program can be divided into three phases:

1. Scanning
2. Parsing
3. Evaluation

The task of the scanning, or lexical analysis, phase is to map a stream of characters into a stream of symbols, or tokens, in the alphabet under consideration. The scanner must recognize symbols such as reserved words, special symbols, numbers and identifiers. As mentioned in Section 4.2.1, regular expressions are used to describe the symbols of a language. Thus the specification is straightforward, but for many languages the implementation of a scanner can be complex. Regular languages are recognized by finite state machines, or finite automata (Kleene's theorem). Finite automata are mathematical models of simple computing machines; see e.g. [Ros96] for formal descriptions. Fortunately, software tools are available to cope with the complexity of creating finite automata. One such tool is the scanner generator Lex [8] [LMB92]. Lex takes a set of regular expressions as input, and outputs source code that implements a finite automaton that recognizes the corresponding regular language.

The results of the scanning phase are fed to the parsing phase. The parser maps a stream of symbols into an abstract syntax tree. As mentioned before (Section 4.2.1), the syntax of a context free language is most often described using BNF. Also for the parsing phase, powerful and practical tools are available. One tool is the parser generator [9] Yacc [10] ("Yet Another Compiler-Compiler") [LMB92]. Yacc maps a description in BNF into the source code of an LALR(1) parser, which is driven by a deterministic finite automaton of LALR(1) items [Lou97]. Most often, the result of the parsing phase is a simplified syntax tree. Some texts distinguish between a tree that reflects the exact operation of the grammar, like Figure 4.1, and one that only shows an abstraction of the tokens. However, the use of terms like "abstract syntax tree", "syntax tree" or "parse tree" is inconsistent; one term in one text means the opposite concept in another text. In this report the issue is avoided by using "abstract syntax tree" for both concepts. The important thing is that the first type of tree contains all the information needed to reconstruct the sequence of symbols, while the second does not. In most cases, such extended information is not needed.

After parsing, the abstract syntax tree is evaluated. In a syntax directed semantics, the semantics of a node is recursively defined as a function of the semantics of the child nodes. The leaf nodes, like numbers or identifiers, have a built-in semantics.

[8] One very popular implementation of Lex is flex.
[9] Also called a compiler-compiler.
[10] One very popular implementation of Yacc is GNU bison. Both flex and bison are freely available for a variety of platforms; bison is distributed under the GNU General Public License, flex under a BSD-style license.
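The three phases can be sketched end to end for the prefix notation used earlier, e.g. (+ 3 (* 4 5)). The Python fragment below is hand-written rather than generated by Lex or Yacc, handles only binary + and * on non-negative integers, and uses illustrative names (scan, parse, evaluate, interpret); it is a sketch, not the interpreter developed in this report.

```python
import re

def scan(text):
    """Scanning: characters -> tokens. Unknown characters are silently skipped."""
    return re.findall(r"\(|\)|[+*]|\d+", text)

def parse(tokens):
    """Parsing: tokens -> abstract syntax tree (nested lists and integers)."""
    def expression(position):
        token = tokens[position]
        if token == "(":
            symbol = tokens[position + 1]
            left, position = expression(position + 2)
            right, position = expression(position)
            if tokens[position] != ")":
                raise SyntaxError("expected ')'")
            return [symbol, left, right], position + 1
        return int(token), position + 1

    tree, position = expression(0)
    if position != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree

def evaluate(tree):
    """Evaluation: the meaning of a node is computed from its children."""
    if isinstance(tree, int):
        return tree
    symbol, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if symbol == "+" else a * b

def interpret(text):
    return evaluate(parse(scan(text)))

print(parse(scan("(+ 3 (* 4 5))")))  # ['+', 3, ['*', 4, 5]]
print(interpret("(+ 3 (* 4 5))"))    # 23
```

Note that the tree produced by parse is of the simplified kind: the parentheses needed to reconstruct the original token sequence are not stored.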


4.5 Language Paradigms

In Section 4.3, some of the features that constitute the semantics of programming languages were described. In this section, a broader classification approach is taken in describing the computation model supported by the different programming languages. In choosing a proper extension language, it is important to identify the distinctions between the paradigms and their properties in the context of the problem domain.

Programming languages differ in the design philosophy of their semantics. The semantics of a language is defined with a particular computational model in mind. This computational model not only affects the pragmatic issue of how a program should be executed, but also the style of programming used with the language. A language paradigm can be described by its view of the nature of the world, its "ontological commitments". The functional programming paradigm, for instance, considers problems to consist of functions that can be composed to make up other functions. Languages supporting the same paradigm have some of the same constructs, like procedure application or abstraction mechanisms, in common. There are four basic language paradigms: the imperative, the functional, the object oriented and the logic.

4.5.1 The Imperative Language Paradigm

Imperative, or procedural, programming languages are characterized by extensive use of assignment. Assignment is a construct that mutates the value of a variable. Programs written in imperative languages consist of a sequence of statements. When executed, each statement alters the state of the computer, thus execution can be seen as a sequence of state transitions. Since the state is altered by each statement, the order of the statements is critical. Typically, imperative programming languages offer a wide range of sequence control mechanisms. A typical basic syntax outline for imperative languages is:

!

program ; |

!



11

[11] ε is the empty sentence.

Consider the problem of calculating the factorial of a number n. A program written in imperative style could for example be a sequence of multiplication operations, each updating a variable representing an intermediate result. When the program has executed, the variable holds the final result; see Figure 4.4.

    factorial = 1
    counter = 1
    while counter <= n do
        factorial = factorial * counter
        counter = counter + 1

Figure 4.4: Imperative style factorial function.

Imperative programming languages are often close to the computer hardware, because the computational model resembles the conventional von Neumann computer hardware architecture. Examples of programming languages using this model are Algol 60, COBOL, Pascal and C.
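The view of execution as a sequence of state transitions can be made explicit by recording the state after every pass through the loop. The Python sketch below follows Figure 4.4; the function name factorial_with_trace and the choice to represent a state as the pair (factorial, counter) are assumptions made for the illustration.

```python
def factorial_with_trace(n):
    """Imperative factorial that also records every intermediate state."""
    factorial = 1
    counter = 1
    states = [(factorial, counter)]        # the initial state
    while counter <= n:
        factorial = factorial * counter    # assignment mutates the state
        counter = counter + 1
        states.append((factorial, counter))
    return factorial, states

result, states = factorial_with_trace(4)
print(result)  # 24
print(states)  # [(1, 1), (1, 2), (2, 3), (6, 4), (24, 5)]
```

Swapping the two assignments inside the loop would produce a different sequence of states and a wrong result, which illustrates why statement order is critical.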

4.5.2 The Functional Language Paradigm

An imperative program produces a sequence of states, starting with the initial state and ending with the goal state. When humans think about problems, we usually try to find a mapping between the problem (the initial state) and the solution (the goal state) instead of a sequence of states leading to the goal state. This approach is also taken by the functional, or applicative, programming languages. Instead of a sequence of state transitions, a program in a functional language represents a function. While programming in imperative languages focuses on the available data, programming in functional languages focuses on the result. This is done by constructing a function that maps the initial values to the desired result, rather than constructing consecutive states. Programs are often decomposed into smaller subproblems that again represent functions. The functions are combined using function composition. A typical basic syntax outline for functional languages is:

!

( * )

!

The factorial problem can be formulated in a functional language by considering the properties of the result. The properties of the factorial of a number n are:

• it equals n times the factorial of the number n - 1, except
• when n equals 0, then the factorial equals 1

An example that calculates the factorial of k in a Lisp-like language is shown in Figure 4.5.

    (define (fact n)
      (if (= n 0)
          1
          (* n (fact (- n 1)))))
    (fact k)

Figure 4.5: Functional style factorial function.

This example illustrates that the informal description often can be directly mapped into a formal functional program. Another point illustrated is the use of recursion as an abstraction technique for defining a concept in terms of itself. Examples of functional programming languages are Lisp, Miranda and Haskell.
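The same two properties can be transcribed into Python without any assignment to intermediate variables. The second definition is an additional sketch, not taken from the report, that expresses the factorial by combining a multiplication function with reduce from the standard functools module.

```python
from functools import reduce

def fact(n):
    """Direct transcription of the two properties of the factorial."""
    return 1 if n == 0 else n * fact(n - 1)

def fact_by_reduction(n):
    """The product of 1..n, built by repeatedly applying a two-argument function."""
    return reduce(lambda product, k: product * k, range(1, n + 1), 1)

print(fact(5))               # 120
print(fact_by_reduction(5))  # 120
```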


4.5.3 The Object Oriented Paradigm

Objects are a more flexible and modular approach to data abstraction than ADTs. An object instance consists of a local encapsulated state and a set of shared procedures, called methods, that operate on its state. The set of procedures makes up the object's interface [12]. Thus an object is a set of operations, constituting the interface of the ADT, that share a state. In Chapter 6 a representation of objects as procedures in a language with first-class procedures is presented, along with a style of programming called message passing.

Often, objects share the same methods and are said to be of the same class. In some languages, classes are objects themselves. Languages that employ objects together with inheritance are called object-oriented. Inheritance is used when data abstractions that are specifications are needed. If an object is an instance of a class (the subclass) inherited from another class (the superclass), then the object inherits the superclass's state and methods in addition to those specified for the subclass itself.

Object-oriented programming can be seen as a combination of the two paradigms above, in that the imperative model is used in building concrete data objects and the functional is used in building classes of functions that use a restricted set of data objects [PZ96]. The object oriented paradigm is becoming increasingly important, although the fundamentals of object-oriented programming are not yet fully understood. Examples of object-oriented programming languages are Simula-67, Smalltalk and C++.
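The idea of representing an object as a procedure that receives messages can be hinted at with a Python closure. This is only a sketch of the general technique; it is not the representation presented in Chapter 6, and the names make_counter and dispatch as well as the messages "increment" and "value" are invented for the example.

```python
def make_counter(start=0):
    count = start   # local, encapsulated state; unreachable from outside

    def dispatch(message, *arguments):
        """The object itself: a procedure that selects a method by message."""
        nonlocal count
        if message == "increment":
            count += arguments[0] if arguments else 1
            return count
        if message == "value":
            return count
        raise ValueError(f"unknown message: {message}")

    return dispatch

a = make_counter()
b = make_counter(10)
a("increment")
a("increment", 5)
print(a("value"), b("value"))  # 6 10
```

Each call to make_counter creates a new closure with its own state, so a and b respond to the same messages but do not interfere with each other.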

4.5.4 The Logic Programming Paradigm

Logic programming languages take a different approach from the languages employing the other paradigms. Grown out of research in automatic theorem proving, these languages are used to describe some aspect of the world. The program can then be "run" to find sound consequences. An example of a logic programming language that solves problems stated in predicate calculus is Prolog.

4.6 Scripting and Extension Languages

Almost every application program of a certain complexity contains a parser that recognizes the grammar of a language of some kind. This is not new; even the first word processors and spreadsheet programs recognized the need for the user to extend the application's functionality and included command line options, configuration files or even macro facilities. Such embedded languages take a plethora of different shapes and serve different purposes. Hence an often confusing terminology describing the different kinds of languages has developed. The view taken in this report is that the main dichotomy is between scripting and extension languages.

[12] Sometimes called the protocol.

4.6. SCRIPTING AND EXTENSION LANGUAGES


4.6.1 Scripting Languages

A scripting language is a self-contained system program residing on an operating system. The purpose of a scripting language is to serve as glue between applications and other system programs. Scripting language programs are used to feed the results from one program into another program, creating a pipeline. They are very popular on, for instance, Unix operating systems, where many programs have been designed following a philosophy stating that each program should perform one basic task. Scripting languages include control mechanisms to specify the sequence of external programs, and a powerful system interface. Examples of scripting languages are the Bourne shell, Perl, Python and Tcl.

4.6.2 Extension Languages

Embedded languages are used to extend the functionality of applications. Extension languages are tightly integrated with the applications they are embedded in. The need for such languages is primarily rooted in one of two different requirements:

• The need to customize the application.
• The need to use a programming language interface with the application.

An extension language can offer means to customize the functionality or the user interface of an application. This can be seen as an alternative to changing and recompiling the program's source code. Changing the appearance and controlling the functionality of applications can help adapt the applications to the users' needs. Customization of the user interface can include changing menus or the output data format. Customization of functionality can include specification of parameters to be used in algorithms in a spreadsheet application. Customization languages take on many forms:

• Command line options
• Configuration files
• Macro languages

Often, these are proprietary, cryptic (though powerful) languages.

In addition to affecting the behavior of the applications, extension languages can be used as the (possibly sole) user interface to the application. Many modern application programs have GUIs. GUIs have their advantages; they are (often) intuitive and have gentle learning curves. However, GUIs do not meet all needs. They are seldom flexible, and can at most be customized to a small extent as described above. The users are left at the mercy of the application designers' judgment of their needs. GUIs are functional when the use is limited to tasks foreseen by the designers, but when operations (often simple ones) that are not in the GUIs' repertoire are needed, they become frustrating and cumbersome to use. Extension languages offer more flexible and powerful means of abstraction and combination of data and operations, and give more freedom to the user. They can be used either as the sole user interface, or to complement a GUI. The latter is the concern of this report.

A prominent example of an application with an extension language is the Free Software Foundation's GNU Emacs [Sta79]. The extension language used by GNU Emacs is the Lisp dialect Emacs-Lisp. Emacs-Lisp is a full-fledged programming language, specialized for text editing, and is an example of a general customization language. Emacs-Lisp also has aspects of being an application interface, blurring the distinction between the two classifications.

Chapter 5

An Extension Language for Rosetta

Programming is not merely specifying a computational process via a program; it is itself a process, where the programmer gains insight into the problem and usually redefines the original problem formulation. In fact, programming is probably the best way of thinking about a wide range of problems. The nature of a programming language therefore has considerable influence on the creative task of problem solving, so a particular extension language must be selected carefully.

5.1 Reformulating the Problem

The project formulation was stated in Chapter 1. In the light of the background information presented in the previous chapters, a closer look at the problem in the terms introduced is opportune. The term "scripting language" used in the original problem formulation is somewhat inaccurate; see the discussion in Section 4.6. In the terminology of that section, what is sought is an extension language or, more precisely, an extension language to be used as an application interface.

5.1.1 The Extension Language Game Board

The project formulation states that an interpreter is to be implemented. In Section 4.4 it was argued that this is the problem of defining one language in terms of another. Interpreters, translators and virtual machines were also discussed in that section, and will help make the problem concrete. If an interpreter transforms one virtual machine into another and programs are just data, why make an interpreter at all? Why not use the existing virtual machine in the first place? The answer is, of course, that this machine fails to offer the high-level language features required of an extension language. Instead, we create an abstraction of this machine in the form of a new one. Rosetta's implementation language is C++ (Section 3.2.1), and it will be assumed that the


interpreter is to be implemented in the same language (footnote 1). Figure 5.1 shows this setup. It is assumed that a C++ translator is available for the same platform (virtual machine) that it translates programs to; in other words, its base and target languages are identical. This virtual machine is called i386. Together, the translator and an i386 virtual machine again constitute a virtual machine that accepts C++ programs. The interpreter program for the imaginary language ExtLang, written in C++, is given as input to this machine, and the output is a semantically equivalent interpreter program written in the translator's base language, i386. Combining this program with an i386 virtual machine, as shown in Figure 5.2 a), makes up a new virtual machine that accepts programs written in ExtLang. Combining this virtual machine in turn with a program written in ExtLang creates a virtual machine for the given program; see Figure 5.2 b). The ExtLang virtual machine is what is sought. After the one-time translation, the interpreter is combined with the i386 virtual machine, i.e. run on the corresponding platform, and will accept ExtLang programs for evaluation.

5.2 General or Special Purpose Language?

The first issues to be decided are whether the extension language should be a special purpose or a general programming language and, in the latter case, whether an existing programming language should be used.

Application programs have a tendency to grow considerably in complexity. Unfortunately, this important fact is seldom captured in the initial design philosophy. The result, with respect to embedded languages, is that applications often start out with moderate language capabilities, e.g. in the form of macro languages used for user customization. These language capabilities are then extended as the complexity of the application increases. Often, at some point, the language takes on a nature more like that of an interface language.

This evolution of embedded languages has several consequences. First, it leads to heavily specialized and powerful languages, as features are added to suit the application. As put in [LB94]:

    Experience also indicates that simplified or specialized extension languages often have more features added and grow until they resemble a full programming language. Such "organically grown" extension languages are likely to be contorted designs as they will consist of several levels of extensions glued on to their initial, more limited design.

Second, it leads to cryptic languages that are hard to understand and use, and that are seldom used by typical users [LB94]. Finally, languages initially designed for small customization tasks do not scale up, making it hard or impossible to write actual programs over a certain size. Embedding a general purpose programming language from the start removes the need for this kind of language growth. Also, users want languages they can write fair-sized programs with, because that is exactly what they are going to do. A recognition of this has led to a trend away from

Footnote 1: Or in C, which here can be viewed as a subset of C++.

[Figure 5.1 (diagram): the ExtLang interpreter, written in C++, is given as input to the C++ -> i386 translator (written in i386, running on an i386 VM); the output is the ExtLang interpreter written in i386.]

Figure 5.1: Compiling an interpreter program for the extension language ExtLang.

[Figure 5.2 (diagram): a) ExtLang interpreter (written in i386) + i386 VM = ExtLang VM. b) ExtLang program + ExtLang VM = Program VM.]

Figure 5.2: a) Combining the result from the translation in Figure 5.1 with an i386 virtual machine. b) Combining a program written in ExtLang with the virtual machine from a) to create a virtual machine for the given program.

specialized cryptic extension languages towards general programming languages with Algol-like or Lisp-like syntax and semantics. In addition to the arguments above, the need for general programming language features like file I/O and data structures was emphasized in discussions with Rosetta users. The inevitable conclusion is:

A general purpose programming language will be used as an extension language for Rosetta.

The next question is then whether an existing general programming language should be used, or a new one designed. As a plethora of general programming languages are available to choose from, designing a new one seems unnecessary. Also, given that a new language would have to be general and have some advantage over existing general languages, designing one would in all likelihood be outside the scope of the project. Thus, an existing language will be used as a basis for an extension language. Of course, the syntax and semantics of the language can be modified and/or extended to suit our needs.

5.2.1 Language Paradigms

A language paradigm is more an aid for a particular programming style than an implicit property of all programs written in the language. Most general programming languages support a variety of programming styles. However, a language is generally biased towards one paradigm, which is what is meant by a phrase such as "functional language" in this report. It is for instance said that Lisp is a functional language, although almost any language paradigm can find expression in Lisp.

Of the paradigms mentioned in Section 4.5, only imperative and functional languages will be considered here. The logic programming paradigm, although very powerful for the right kind of problems, does not generalize very well. The object-oriented paradigm finds convenient expression in the other two paradigms, which, incidentally, is a main point in the next chapter. Also, most contemporary object-oriented programming languages require large and complex implementations and are not very well suited to be used as extension languages.

To further narrow the set of alternatives, mainly Algol-like imperative style languages like Pascal or C and Lisp-like functional style languages like Scheme will be considered. Lisp dialects have a simple syntax and a simple yet powerful semantics, and are especially suitable to create interpreters for. Other functional languages, such as those in the ML family, have a more specialized syntax and require large run-time systems, rendering them unfit for a simple embedded interpreter. Some imperative style languages other than those considered here are in use. However, these are often ad-hoc customization languages grown into full general extension languages, and suffer from the problems indicated in Section 5.2. One such language is Tcl [Ous90]. Tcl is a popular scripting/extension language, but lacks many of the features characterizing "real" general programming languages (footnote 2).
Footnote 2: Tcl has been the subject of controversies described in the next section. On September 23rd 1994, the Free Software Foundation's Richard Stallman posted an article titled "Why you should not use Tcl" to Usenet, pointing at the weaknesses of Tcl. Predictably, a heated discussion followed, involving among many others Tcl's designer John Ousterhout.


5.3 Programming Language Controversies

Every discussion between computer programmers will eventually degenerate into a discussion about either religion or programming languages (footnote 3). There are great controversies between supporters of various languages, and particularly between followers of the imperative versus the functional paradigm. Of course, the objective of this chapter is not to take a stand in this debate, but to select a suitable extension language. To do that, a closer look at some of the characteristics of the languages must be taken.

5.3.1 Pitfalls of Imperative Programming

Imperative programming languages have traditionally dominated both education [Val96, AS96] and systems engineering. Intuitively, one would think that this must be because they have some inherent exceptional ability to capture ideas and turn them into processes. This is not necessarily the case. [Val96] argues that functional languages are well suited to teaching engineering principles, and points at some of the reasons for the prevailing reluctance to actually use them.

Programming languages are used to express ideas, hence it is important that the style of programming promoted by the language easily lends itself to the way we think about ideas. Unfortunately, imperative programming languages to a great extent model the computer; they are constructed to program a (von Neumann) computer instead of to express ideas and problems. An illustrative example is the factorial function from Section 4.5. The functional programming example directly expresses the idea of a factorial, while the imperative programming example uses artificial constructions to mechanically create the same process. Unlike functional languages, imperative languages are rarely based on formal mathematical foundations, nor do they usually have formal semantics, which raises questions about their computational models.

One pitfall of imperative languages is that they allow hard-to-find program errors that functional languages do not. Abelson and Sussman [AS96] point at an example involving the factorial function. Imagine changing the order of the assignments updating the variables in Figure 4.4. This would lead to an incorrect program, illustrating that the sequence of assignments must be carefully considered.
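The assignment-order pitfall can be sketched in Scheme itself, using the assignment form set!. The two procedures below are an illustrative reconstruction in the spirit of the example in [AS96], not a copy of Figure 4.4; the names factorial-imperative and factorial-swapped are made up for this sketch.

```scheme
;; Imperative-style factorial: two assignments per step.
(define (factorial-imperative n)
  (let ((product 1)
        (counter 1))
    (define (iter)
      (if (> counter n)
          product
          (begin
            (set! product (* counter product)) ; uses the old counter
            (set! counter (+ counter 1))
            (iter))))
    (iter)))

;; The same procedure with the two assignments swapped.
;; It still runs, but product is multiplied by the already
;; incremented counter, so the result is wrong.
(define (factorial-swapped n)
  (let ((product 1)
        (counter 1))
    (define (iter)
      (if (> counter n)
          product
          (begin
            (set! counter (+ counter 1))
            (set! product (* counter product)) ; uses the new counter
            (iter))))
    (iter)))
```

With these definitions, (factorial-imperative 3) evaluates to 6, while (factorial-swapped 3) evaluates to 24. No error is signalled in the second case, which is what makes this kind of mistake hard to find.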

5.3.2 A Closer Look at the Functional Paradigm

The functional paradigm is now gaining popularity, although the functional approach was strongly advocated very early, e.g. by John Backus in 1978 [AS96]. When discussing suitability as extension languages for Rosetta, some aspects of functional languages stand out. The function composition aspect fits very well into a description of the general KDD process, Figure 2.1. The process can be seen as a pipeline, where each step is a function transforming its input into the input to the next step. Thus, the whole pipeline

Footnote 3: Sometimes the distinction becomes very subtle.


is a composite function. Also, Lisp-like functional programming languages are especially well suited for interpretation. They generally have very clean syntax-directed semantics, and a simple computational model saying that, modulo the imperative features, equals can be substituted for equals. Imperative languages are better suited for compilation, unlike functional languages, which generally require a rather large run-time environment.

It is suspected that iterative constructs are an important requirement of a Rosetta extension language. Such constructs are often associated with imperative languages, in the same way that functional languages are associated with recursion. Actually, iterative procedures can equally well be represented in Lisp-like languages. As a matter of fact, iteration can be defined in terms of recursion, reducing the requirement to syntactic sugar.
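The claim that iteration reduces to syntactic sugar over recursion can be illustrated with a small sketch. The first procedure uses the standard iteration construct do; the second expresses the same loop as a recursive procedure written with a named let. The procedure names are chosen for this example only.

```scheme
;; Sum the integers 0, 1, ..., n-1 with the iteration construct `do`.
(define (sum-below-do n)
  (do ((i 0 (+ i 1))        ; loop variable, initial value, step
       (acc 0 (+ acc i)))   ; accumulator, stepped using the old i
      ((= i n) acc)))       ; termination test and result

;; The same loop as a recursive procedure (named let).
;; The recursive call to `loop` is the last thing the body does.
(define (sum-below-rec n)
  (let loop ((i 0) (acc 0))
    (if (= i n)
        acc
        (loop (+ i 1) (+ acc i)))))
```

Both procedures compute the same value for any non-negative integer n; for instance, each evaluates to 10 when n is 5.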

5.3.3 The Economy of Learning

An assertion used against functional languages in general is that they allegedly are counter-intuitive and have a steep learning curve. This seems especially important in the context of an extension language; a steep learning curve is even less acceptable if one only wants to automate a few steps in a process than if one is, e.g., writing a complete application. For Lisp-like languages, both arguments can be suspected to be rooted in what is perceived as an unfriendly syntax. At first sight all the parentheses of an expression may look intimidating, but once the basic syntactic forms and the computational model are learned, most features actually seem rather intuitive.

That Lisp-like languages have a steeper learning curve than some imperative style languages may be true, although this is not obvious. Programs in imperative languages usually look easier to write, but the fact is that generally very few different language constructs are needed to write a program in a functional language. But suppose, for the sake of argument, that the statement is true. Is it then a viable argument against the use of functional languages? A steep learning curve certainly doesn't seem very nice, but seen from an "economic" viewpoint it is not so important after all. From economics, we know that, in the long run, variable costs dominate fixed overheads. This goes for the economics of time and learning, too. In the long run, the total time used on learning depends on the variable "time costs". Lisp-like languages may have a steeper initial learning curve, but (at least some of them) require very little relearning once learned. This is contrary to imperative languages, which often have large language manuals devoted mostly to syntax. Even expert users of these languages have to consult language documentation regularly to look up subtleties of syntax.

5.4 Adopting a Language

The previous section argued partly in favor of functional languages. One particular functional language stands out as especially suitable to be used as an extension language.

The programming language Scheme will be used as an extension language for Rosetta.

Scheme is a statically scoped Lisp dialect with first-class procedures. After some of the arguments for selecting Scheme are discussed, a short overview of the language is presented.


Reasons for selecting Scheme are:

• Scheme is a general programming language. The advantages of selecting a general language were discussed in Section 5.2.

• Lisp lends itself readily to interpretation.

• Lisp has an inherently interactive nature, which encourages exploration and testing of solutions in an incremental way.

• Scheme is the only Lisp dialect that is small enough to be used as a simple extension language without dominating the application, yet general enough.

• Scheme has a simple syntax and powerful semantics.

• Scheme is standardized.

• Scheme is small, orthogonal and well-defined enough to be learned with little effort, which is required of an extension language. A steep learning curve is thus not a problem, even in the context of an extension language.

• Scheme allows small implementations, which is desired of an extension language.

• Although termed a functional language, Scheme is actually just as much akin to Algol as to the original Lisp. From Algol, Scheme has inherited static scoping and block structure. With Scheme, users are not bound to a particular style of programming, but can e.g. write programs that are structurally identical to corresponding programs in traditionally imperative languages.

• Because of Scheme's symbol handling abilities and metalinguistic power, it lends itself to language interpretation. This comes in quite handy for an extension language: with a Scheme interpreter as a base, any language can be put on top. Thus the application can easily be augmented with a new language, and the individual user can use the language of choice. The new language constructs will actually be syntactic sugar for Scheme expressions. This approach is taken in the Scheme system Guile (see Section 7.2.6). In this system, an evaluator for a C-like language has already been implemented, and support for Perl, Python, Tcl, REXX and Emacs Lisp is under development.

A counter-argument is that Lisp is very inefficient. This is mainly a myth, although it certainly is possible to write extremely inefficient code in Lisp. As discussed in the next section, proper tail recursion is required of Scheme implementations, which allows recursively defined procedures to exhibit iterative behavior and thus disposes of the worst causes of inefficiency. Of course, the programmer must know which kinds of expressions are tail calls and which are not in order to make use of tail recursion. The conclusion is that, even in functional languages, the declarative problem solving approach must be assisted by pragmatic efficiency considerations.
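The difference between a recursive call that is a tail call and one that is not can be shown with two versions of the factorial function. This is a minimal sketch; the procedure names are chosen for illustration.

```scheme
;; Not a tail call: after the recursive call returns, the
;; multiplication by n is still pending, so control information
;; accumulates in proportion to n.
(define (fact-recursive n)
  (if (= n 0)
      1
      (* n (fact-recursive (- n 1)))))

;; Tail call: the call to iter is the last thing iter does, so a
;; properly tail recursive implementation runs it as a loop, with
;; the running product carried in the argument acc.
(define (fact-iterative n)
  (define (iter k acc)
    (if (= k 0)
        acc
        (iter (- k 1) (* k acc))))
  (iter n 1))
```

Both procedures compute the same values; they differ only in how much control information the interpreter has to keep while the computation is running.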


5.4.1 Scheme Overview

Scheme was first described in 1975, and was standardized by the IEEE in 1990. The language is constantly evolving, which has resulted in a series of language reports. The latest, "Revised^5 Report on the Algorithmic Language Scheme" (R5RS) [KCE98], was published after the start of this project, on 20 February 1998. Henceforth, the phrase "the Scheme standard" refers to the latest revised report.

Here, a brief overview of some of Scheme's features is given. See Section 4.3 for a description of some of the language features discussed. For a complete description of the language, see the latest revised report [KCE98]. For a more thorough introduction to Scheme programming, and to programming and computer science in general, see Abelson and Sussman's textbook [AS96].

Expressions

Scheme programs consist of expressions. Every expression evaluates to a value, and programs are usually created by constructing compound expressions. One type of expression is the literal expression, which essentially evaluates "to itself". An example expression is:

    (+ 1 (+ 2 3))

Syntactic Forms

Expressions are constructed from a restricted set of syntactic forms. The syntactic form used in the example above is a combination. A combination consists of a sequence of subexpressions separated by whitespace and enclosed in parentheses. In evaluating a combination, the first subexpression is evaluated to a procedure and the rest of the subexpressions are evaluated to their respective values. These values are then passed, by value, as arguments to the procedure, which yields the value of the combination. Several special forms are available in addition to combinations. Most notable are lambda expressions, which evaluate to procedures:

    ((lambda (x) (* x x)) 3)  =>  9

Others include conditionals, binding constructs, and sequencing.
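As a minimal sketch of the other special forms mentioned, the following definitions use one conditional (if), one binding construct (let) and one sequencing form (begin). All procedure and variable names are invented for this illustration.

```scheme
;; Conditional: `if` evaluates only one of its two branches.
(define (absolute x)
  (if (< x 0)
      (- x)
      x))

;; Binding construct: `let` introduces local variables.
(define (hypotenuse-squared a b)
  (let ((a2 (* a a))
        (b2 (* b b)))
    (+ a2 b2)))

;; Sequencing: `begin` evaluates its subexpressions from left to
;; right and yields the value of the last one.
(define counter 0)
(define (next!)
  (begin
    (set! counter (+ counter 1))
    counter))
```

For example, (absolute -3) evaluates to 3 and (hypotenuse-squared 3 4) to 25, while the first call of (next!) yields 1 and the second yields 2.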

Model of Evaluation

Besides literal expressions and syntactic forms, Scheme expressions also include variables. A variable evaluates to the value stored in the location to which the variable is bound. Names for variables in Scheme are case-insensitive. Scheme's semantics can be explained with the environment model of evaluation. In this model, every procedure runs in a current environment. When variables are evaluated, they are looked up in the current environment or, if not found there, recursively in the current environment's enclosing environment. An interpreter typically initially evaluates expressions in the top-level, or global, environment,


which has no enclosing environment. When a combination is evaluated, the subexpressions are evaluated first. Then a new environment is created, the values of the subexpressions are bound to variables representing the procedure's formal parameters, and the procedure body is evaluated in the newly created environment. The Scheme standard specifies that the top-level environment initially shall contain bindings for a set of standard procedures.

Lexical Scoping

The value resulting from evaluation of the lambda expression in the example above contains, in addition to a procedure, the environment in which the expression was evaluated. This combination of a procedure and an environment is called a closure. The environment contained in a closure becomes the enclosing environment to the environment that holds the formal parameters for the procedure, as described above. Thus the semantics of a procedure is defined in part by its defining environment. In Section 4.3 this was called lexical, or static, scoping.
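A small sketch may make the notion of a closure concrete. The names make-adder, add5 and offset are invented for this example. The procedure returned by make-adder carries with it the environment in which the lambda expression was evaluated, so its free variable offset refers to the binding created by the call to make-adder, not to any binding of the same name at the place of use.

```scheme
;; make-adder returns a closure: a procedure together with the
;; environment in which `offset` is bound.
(define (make-adder offset)
  (lambda (x) (+ x offset)))

(define add5 (make-adder 5))

;; A global variable with the same name does not affect add5;
;; the closure still sees the binding offset = 5.
(define offset 100)

(define result (add5 1))
```

Here result is 6, not 101, because the variable is looked up in the closure's defining environment.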

Typing

Scheme is a dynamically typed language, so the procedures are themselves responsible for type checking.
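As a minimal sketch of what this means in practice, a procedure can use a standard type predicate such as number? to check its own argument at run time. The procedure name safe-square and the symbol it returns for a bad argument are choices made for this example only.

```scheme
;; The argument may be any Scheme object; the procedure itself
;; decides what to do when it is not a number.
(define (safe-square x)
  (if (number? x)
      (* x x)
      'not-a-number))
```

With this definition, (safe-square 4) evaluates to 16, and (safe-square "four") evaluates to the symbol not-a-number instead of failing inside the multiplication.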

First-class Procedures

Closures have first-class status in Scheme. Partly because of this, Scheme objects (including procedures) have unlimited extent. Unlimited extent is necessary because, for instance, closures can be bound to variables, and the closure environment contains variables bound to yet other objects. These objects can never be deleted, since the procedure object theoretically can "live" forever. This is only a part of the truth, as e.g. the semantics of assignment also must be taken into consideration.

First-class Continuations

The continuation of an evaluation is the entire future of the computation after the evaluation has been made. For instance, the continuation of an evaluation in the top-level environment is the printing of the value on the screen. In Scheme, continuations can be captured and saved as first-class procedures via the standard procedure call-with-current-continuation (also called call/cc).
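One common use of a captured continuation is an early exit from a traversal. The following is a minimal sketch; the procedure name first-negative and the parameter name return are chosen for this example. Calling the continuation abandons the rest of the for-each and makes its argument the value of the whole call-with-current-continuation expression.

```scheme
;; Return the first negative number in lst, or #f if there is none.
(define (first-negative lst)
  (call-with-current-continuation
    (lambda (return)
      (for-each (lambda (x)
                  (if (< x 0)
                      (return x)))   ; escape with x as the result
                lst)
      #f)))                          ; reached only if nothing escaped
```

For example, (first-negative (list 3 1 -4 1 -5)) evaluates to -4, and (first-negative (list 1 2 3)) evaluates to #f.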

Tail Recursion

The Scheme standard [IEE90] requires implementations to be properly tail recursive. An implementation is properly tail recursive if it exhibits iterative control behavior for tail calls. A process exhibits iterative control behavior if it uses a bounded amount of memory for control/context


information. A tail call is a procedure call that occurs in tail position. Tail positions are defined inductively. The complete definition will not be presented here, but the most important instance is

    (lambda <formals> <definition>* <expression>* <tail expression>)

In this expression, <tail expression> is in a tail position. An example of an inductively defined tail position is

    (if <expression> <tail expression> <tail expression>)

If this expression occurs in a tail position, then the expressions named <tail expression> are in tail position. For the complete definition of tail position, see [KCE98]. Also, [FWH92] contains an excellent discussion about tail recursion and continuations.

External Representations

The external representation of a Scheme object is a sequence of characters. The integer 42 has the external representation "42", the list created by the expression (list 1 2 3) has the external representation "(1 2 3)", and the pair consisting of the integers 1 and 2 has the external representation "(1 . 2)".


Chapter 6

Integration

Having selected a suitable extension language for Rosetta, the next task is to define how the application and the interpreter should work together. Although an interpreter is to be embedded into Rosetta, we will henceforth take the view that there are two individual programs running in parallel, sharing certain memory objects. This view is not fundamentally different from reality, but makes it possible to talk about Rosetta and the Scheme interpreter as separate entities. The Scheme interpreter will sometimes be referred to as "the interpreter" or simply "Scheme". Sometimes, the term "Rosetta" will be used when the Rosetta GUI is meant. The distinction will be clear from the context.

The task of integrating the interpreter and Rosetta is a task of communication. There are two concerns, corresponding to communication in each direction:

1. Making it feasible to access Rosetta objects (algorithms such as reducers and structures such as decision tables) from Scheme programs.

2. Making it feasible to insert new objects defined in Scheme programs into the project tree, and to change and possibly remove existing objects.

The former is of prime importance; it is an absolute requirement for making an interpreter useful. Hence, the focus will be on this concern, although without disregarding the latter completely. The interaction should be sound within Scheme's semantics, that is, it should be possible to explain the presence of Rosetta in a semantically sound way. Therefore, we want to construct a conceptual model of how the interpreter should view its surroundings. The next section contains an initial discussion of such a model.

6.1 Constructing a Conceptual Model

When integrating the Scheme interpreter with other systems, we don't want the presence of the other elements to affect the semantics of Scheme expressions. It is, therefore, of importance to be able to explain parts of the other systems' behavior in terms of the Scheme semantics. If


we exhibit a model that fails to comply with this requirement, the semantics of the integrated interpreter will be broken; i.e. the system as seen from the interpreter does not respect the standard Scheme semantics.

The evaluation of Scheme expressions can be described by very elegant and powerful models. I will try to capture this spirit when carefully constructing a conceptual view of the world from the interpreter's viewpoint. This kind of model has to describe the interaction between Scheme and "the rest of the world" in terms of Scheme's semantics. That is, the communication between the interpreter and Rosetta must be specified in such terms.

The environment model of evaluation (Section 5.4.1) views the world as consisting of processes, objects, and environments. This view will also be taken in the construction of a conceptual model. Communication between processes takes place through environments. Although other methods of attack are possible, we will use the following proposition in constructing a conceptual model.

Proposition 6.1 All communication between Rosetta and Scheme will take place through an environment.

The Rosetta GUI, like the Scheme interpreter, is an agent that operates on a collection of data objects. Thus, it makes sense to model the GUI as a process. Conceptually, the GUI is a process, just like any other process, e.g. the interpreter.

Proposition 6.2 Conceptually, the Rosetta GUI is a process.

A complete conceptual model of the world from the interpreter's viewpoint will be proposed in Sections 6.4 and 6.5. First, we will discuss some of the issues that must be taken into consideration when constructing a model.

6.2 Considerations

Making the Rosetta objects available to the Scheme user can be done in a number of ways. For instance, we are not restricted to Scheme's model of evaluation; we can define a whole new model. We will, however, only discuss models of evaluation that are identical or almost identical to the environment model of evaluation. Even with this constraint, there are still many factors involved and the number of potential alternatives is large. Here, we discuss four of these factors.

A clarification of the terminology used here is in order. Rosetta objects are data objects defined by the Rosetta class library. These are not necessarily created by the GUI, as will be shown below. Scheme objects are objects of native Scheme data types [KCE98], defined by the interpreter framework and created in the interpreter.


6.2.1 Visibility in the global namespace

The language constructions that allow Scheme expressions to gain access to Rosetta objects must be established. The main question to be answered is: should names representing Rosetta objects be available in Scheme; i.e., should there exist bindings between variables and Rosetta objects in the global (top-level) environment? For example, the data structure object "training" in Figure 3.1 could be referenced like this:

    (define x training)

In this scheme, the Rosetta objects are (conceptually) bound to names (footnote 1) in the global Rosetta environment. The above scheme is not the only alternative for object referencing. Other alternatives include new primitive functions and/or new special syntactic forms. As an illustration,

    (define x (get-gui-object "training"))

uses a primitive function to achieve the same operation as above. The function get-gui-object takes a string as an argument and evaluates to a Rosetta object that presumably matches the description in the string. Note that in this scheme, no new names are bound in any environment.

A widely recognized general goal when constructing programs that manipulate programs is to promote reusability. One potential problem with the first alternative above is the lack of name hiding. This scheme introduces the possibility of name collisions when a sequence of Scheme expressions creates bindings with names that coincide with names of Rosetta objects. If the Rosetta objects are bound in the same environment as the one where the new bindings are introduced, the Scheme program will fail. Similar scenarios arise when the new bindings are introduced in a local environment (they will then overshadow the Rosetta object bindings), or when new Rosetta objects are created after the creation of global bindings with the same names (the global bindings will then either be overwritten or the creation will fail, depending on the semantics of object creation). Thus, a program that works in one context does not work, or has a different semantics, in another context, and is not very reusable.
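The overshadowing scenario can be sketched with ordinary Scheme values standing in for Rosetta objects. Everything in the following sketch is hypothetical: gui-objects is a plain association list playing the role of the project tree, the symbols training-table and test-table stand in for Rosetta objects, and get-gui-object is a stand-in written in Scheme, not the primitive discussed above.

```scheme
;; Hypothetical stand-in for the objects known to the GUI.
(define gui-objects
  (list (cons "training" 'training-table)
        (cons "test" 'test-table)))

;; Hypothetical sketch of a lookup-by-string primitive.
(define (get-gui-object name)
  (let ((entry (assoc name gui-objects)))
    (if entry (cdr entry) #f)))

;; First alternative: the object is bound to a global name.
(define training (get-gui-object "training"))

;; A local binding with the same name overshadows the global one,
;; so code inside the let no longer sees the Rosetta object.
(define (reference-by-name)
  (let ((training 'something-else))
    training))

;; Second alternative: lookup by string is unaffected by local
;; bindings, because no variable named training is consulted.
(define (reference-by-string)
  (let ((training 'something-else))
    (get-gui-object "training")))
```

In this sketch, (reference-by-name) evaluates to something-else while (reference-by-string) still evaluates to training-table, which illustrates why the string-based alternative is less sensitive to the context in which a program is run.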

Footnote 1: The actual names are not important; they can be the actual names as they appear in the GUI or any other easily deducible names.

6.2.2 Restrictions on manipulation

Closely related to the question of visibility in the global namespace is how the manipulation of representations of Rosetta objects should be restricted. Particularly, should they have first-class status? First-class status entails certain problems, especially for large objects. One problem is that Scheme passes all arguments by value, so that e.g. a recursive procedure taking an object as parameter would induce a considerable overhead caused by copying actual to formal parameters. Some of the possible solutions to this problem are:


1. Pass-by-reference. A new syntactic form with pass-by-reference semantics could be introduced.

2. Lazy evaluation. A new syntactic form with lazy semantics could be introduced, so that no arguments would be evaluated until needed.

3. Pointers. A new pointer data type could be introduced to Scheme, making it possible to "manually" take references.

Actually, it is possible to achieve the desired kind of behavior without introducing new forms or types. An existing Scheme type already exhibits this behavior, namely the pair. Pairs are in fact just double pointers.

An example of a scheme where Rosetta objects neither are bound in a Scheme environment nor have first-class status is:

    (eval "RSESGeneticReducer" my-decisiontable)

In this scheme, the Rosetta algorithm objects are both referenced and used through a special form (eval ...). An algorithm object can, for example, never be bound to a variable. my-decisiontable is an expression that evaluates to the objects that are to be passed to the algorithm. Although the focus here is on algorithm representation, these objects can also have an arbitrary representation (first-class, strings, etc.).
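The observation that pairs behave like pointers can be checked with a small sketch in standard Scheme. The procedure names rename-first! and same-object and the sample list are made up for illustration. The argument is passed by value, but the value is a reference to the pair, so the pair itself is not copied and a mutation made inside the procedure is visible to the caller.

```scheme
;; Mutate the first element of the list the caller passed in.
(define (rename-first! table new-name)
  (set-car! table new-name))

;; Return the argument unchanged, to compare identities below.
(define (same-object obj) obj)

;; Built with `list` (not a quoted literal) so that it may be mutated.
(define my-table (list 'a 'b 'c))

(rename-first! my-table 'x)

;; The object that comes back from a call is the very same pair.
(define identical? (eq? my-table (same-object my-table)))
```

After these definitions my-table is the list (x b c) and identical? is #t. Under this reading, a Rosetta object represented through a pair (or any other reference-like Scheme value) would not have to be copied on each procedure call.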

6.2.3 Scheme-created objects in the GUI

If Scheme is allowed to create new Rosetta objects, should they be automatically visible in the GUI? If so, always? If not, how? Clearly, it is undesirable that new objects should always be made visible in the GUI automatically. Consider the task of classifying the elements of a decision table consisting of, say, 1,000 elements. This can be accomplished by creating a classifier and iterating over the elements (footnote 2). Typically, each iteration will create one object for the element and one for the classification value. Presumably, the user is not interested in 2,000 new objects in the project tree after executing this simple operation.

It seems reasonable to allow the user to specify whether a certain object should be visible in the GUI. Thus, we are dealing with two kinds of Rosetta objects: those created in the GUI, or created in Scheme and made visible, and those that are only available through the Scheme interpreter. User-specified visibility introduces the problem of structural relationships. An arbitrary object can only be made visible if the object's parent is visible. The mechanism for specifying visibility has to take this into account.

Footnote 2: Incidentally, this can also be accomplished by using the BatchClassifier algorithm.

6.2.4 Where data should live

In a scheme that allows Rosetta objects to be represented as Scheme values, e.g. bound to variables, a new question arises. How should the objects be represented: by a Scheme construction, by the native Rosetta representation, or both? When discussing the alternatives, figures will be used to illustrate the computer memory used by Rosetta and by Scheme to represent objects. Figure 6.1 illustrates a scheme where no objects are shared between the Rosetta GUI and the Scheme interpreter, i.e. just a plain Scheme interpreter running in parallel with Rosetta without interaction. The figure is a Venn diagram where the universal set, represented by a rectangle, is the heap (i.e. the set of memory locations), and the circles represent the sets of objects available in the Rosetta GUI (dark) and in the Scheme interpreter (light). Even though Scheme is embedded in the Rosetta application, we distinguish between objects that are instances of classes from the Rosetta class library, called Rosetta objects (round points), and those that are Scheme values, called Scheme objects (square points).

Figure 6.1: Venn diagram representing Rosetta objects (circular points) and Scheme objects (square points) on the heap. The objects inside the dark circle are available in the Rosetta GUI, those inside the light circle are available in the Scheme interpreter. None of the objects are shared.

The question of internal representation of objects is not only an implementation issue, but also has considerable influence on the semantics we want to define. Consider an arbitrary decision table. This data structure could e.g. be represented in Scheme by the list

    ((a b c) (1 2 3) (1 2 1) (2 2 3) (2 3 3) (3 5 1))

i.e. an ordinary list which can be manipulated like any other list. For instance, an expression like (car (car mydt)) could be used to obtain the name of the first attribute. A scheme like this is fundamentally different from one where the internal representation is a native Rosetta object. In the latter case, the external representation (Section 5.4.1) could be identical to the external representation of the list, but the semantics would of course be different since the objects would not be lists.

A scheme where Rosetta objects created in the interpreter are internally represented by Scheme structures is illustrated in Figure 6.2. Conversely, all the Rosetta objects can be represented in their native format, as shown in Figure 6.3.

Figure 6.2: Rosetta objects and Scheme objects, using Scheme structures to represent the former that are visible inside Scheme.

Figure 6.3: Rosetta objects and Scheme objects, using native representation for the former inside Scheme.

Regardless of which format is used for the objects, a mapping must be provided to "the other" representation. If Rosetta objects are represented as Scheme structures, a mapping into native Rosetta objects must be provided if these objects are to be referenced from the GUI. Conversely, a mapping from Rosetta objects to Scheme structures must be provided if the objects are represented in the native format.

Actually, the differences between the two representation schemes can be abstracted away by the use of mapping functions. E.g. all the list manipulating procedures could be extended to operate on native Rosetta objects, and semantically treat them as lists, blurring the distinction between the two alternatives. However, the latter is not crucial. Rosetta objects are mainly used to be passed back to other Rosetta objects (i.e. algorithms), and it is not very important that they can be manipulated with Scheme's data manipulating functions. The important point is to establish a conceptual model of what data is, independent of how it is represented.
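As an illustration of the list representation discussed above, the following minimal sketch defines the example decision table together with a few selectors. The selector names attribute-names, table-rows and row-ref are hypothetical and chosen for this example only. Code written against such selectors does not depend on whether the table is a Scheme list or a native object behind a mapping function.

```scheme
;; The example table from the text: the first element holds the
;; attribute names, the remaining elements hold the objects.
(define mydt
  '((a b c) (1 2 3) (1 2 1) (2 2 3) (2 3 3) (3 5 1)))

;; Hypothetical selectors forming an abstraction barrier.
(define (attribute-names table) (car table))
(define (table-rows table) (cdr table))
(define (row-ref table i) (list-ref (table-rows table) i))

(car (car mydt))         ; => a, the name of the first attribute
(attribute-names mydt)   ; => (a b c)
(row-ref mydt 4)         ; => (3 5 1)
```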

6.3 Algorithms and Structures

Before an object representation and a model of evaluation are proposed in the next sections, some of the more intuitive alternatives are outlined in the light of the aspects introduced in the previous section. It is important to recognize that the Rosetta objects are divided into algorithms and structures, and that the representations and reference mechanisms for these need not be the same.

Rosetta algorithm objects are descriptions of procedures. To the GUI user, these objects are abstractions of general procedures. Procedural abstractions cause procedures, and procedures cause processes. What kinds of abstractions should be used to represent these procedures in Scheme?

Rosetta data structure objects are abstractions of collections of data. Scheme has powerful means of data abstraction. However, as stressed by [AS96], it is not so clear what data "is"; in fact, data need not be represented by data structures at all. More will be said about this in Section 6.4.

6.3.1 Procedural Abstractions: Algorithms

Intuitively, it does not seem like a good idea to give algorithm objects first-class status or to create bindings for them in the global environment. One of the reasons is, as discussed in Section 6.2.2, that Scheme passes arguments by value, so that an operation that involves passing an algorithm as an argument would induce a considerable overhead caused by copying actual to formal parameters. Also, if there were bindings for the algorithms in the global environment, the bindings for the algorithm names could be changed to arbitrary values and types:

    (set! RSESGeneticReducer "Penguin")

Algorithms should not be changed at all, and certainly not be changed to Scheme types like strings. The indicated problems seem to be solved by introducing a new special form for evaluating the application of Rosetta algorithms. We saw this construction in an example in Section 6.2.2:

    (eval algorithm-name argument+)

In evaluating this special form, the first argument must evaluate to a string that matches the name of an algorithm, and the rest of the arguments are evaluated and passed as arguments to the indicated algorithm. Syntactic sugar for the above construct could be

    ([algorithm-name] argument+)

e.g.

    (["RSESGeneticReducer"] my-decisiontable)
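The name-based lookup described above can be sketched with an ordinary procedure standing in for the proposed special form. Everything in this sketch is an assumption made for illustration: the registry, the procedure apply-algorithm, and the stand-in procedure registered under the name "RSESGeneticReducer", which merely tags its argument and is not the real algorithm.

```scheme
;; Hypothetical registry mapping algorithm names to procedures.
(define algorithm-registry
  (list (cons "RSESGeneticReducer"
              (lambda (table) (list 'reducts-of table)))))

;; Ordinary procedure standing in for the proposed special form:
;; the first argument names the algorithm, the rest are passed on.
(define (apply-algorithm name . args)
  (let ((entry (assoc name algorithm-registry)))
    (if entry
        (apply (cdr entry) args)
        (error "Unknown algorithm:" name))))

(define my-decisiontable '((a b) (1 2)))
(apply-algorithm "RSESGeneticReducer" my-decisiontable)
;; => (reducts-of ((a b) (1 2)))
```

Because algorithms are reached through strings rather than variables, an expression such as (set! RSESGeneticReducer "Penguin") has nothing to rebind, which is the protection the special form was meant to give.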

This scheme thus represents algorithm objects on the "Rosetta side", see Figure 6.3.

Other issues regarding algorithm objects must also be settled. One is how to pass arguments to the algorithms. Actually, there are two kinds of arguments. It is important to realize that Rosetta objects have state. This is transparent to the user of the GUI. When the user applies an algorithm, (s)he enters arguments in a dialog box. Behind the scenes, the dialog box alters the state of the algorithm object before "running" the algorithm. One kind of argument comprises values such as numerical thresholds; examples of the other kind are structure objects representing rules to be used by a classifier. In the class library, the distinction is made by setting the first kind of argument by means of a method accepting a textual representation of the values, and setting the second kind by means of unique methods for each argument. Here, the distinction between arguments and other state changing methods becomes somewhat blurred. Methods of setting each kind of argument must be determined.


The problem with this intuitive solution is, of course, the same thing that motivated it: non-first-class status.

6.3.2 Data Abstraction: Structures

Data structures represent the intermediate results in the knowledge discovery process (Figure 2.1). This process "is an iterated waterfall cycle with possible backtracking on the individual substeps" [KSS97]. Clearly, few restrictions on the manipulation of structures are desired. First, iterations and backtracking in the process require the intermediate results to be saved for future reference. The most convenient way to achieve this is to allow variables to be bound to structures. Second, the results from one substep in the process are the input to another. Therefore, structures must be allowed to be returned as values of, and passed as arguments to, procedures. Third, it would be convenient to be able to include structures in native data structures (footnote 3), e.g. a list of decision tables. Structures thus stand out as candidates for first-class status.

If structures have first-class status, it is natural that the structures defined in the GUI are automatically named by variables instead of providing a get-gui-object primitive (Section 6.2.1). However, introducing bindings for the GUI structures in the global environment complicates the conceptual model of interaction between the Scheme interpreter and the Rosetta GUI.

GUI structures bound in the global environment

One problem with global bindings was pointed out in Section 6.2.1. Consider evaluating the expression

    (define dt 42)

entered directly in the interpreter or read from a file, and evaluated in the global environment. The evaluation would fail if the name dt had previously been introduced by the creation of an object in the GUI.

More interesting is the impact such bindings have on our conceptual model. Employing the environmental model of evaluation implies that a process has access to all bindings in its evaluating environment and the latter's enclosing environments. This causes no problems from the interpreter's point of view: conceptually it is a process running in the global environment, and it can access all bindings including those defined in the GUI. However, the GUI process must be modeled as one of two different concepts:

1. A process running in an environment that is enclosed by the global environment, Figure 6.4. The process can then access all GUI objects, but cannot create new objects.
2. A process running in the global environment, Figure 6.5. The process can access all objects, and create new objects. However, this environment is not only inhabited by GUI objects, but also by other Rosetta objects created in the interpreter and by Scheme objects. If the Rosetta GUI is to be conceptually modeled as a Scheme process, the most intuitive semantics of the project tree is a visual representation of the environment. With the process running in the global environment, this semantics is broken because not all of the objects in the environment are available to the GUI user. The conceptual model would be much nicer if we could define the semantics of exploring the structures in the project tree as looking up variables in an environment.

Footnote 3: I.e. Scheme's built-in structures. The use of the term structure in this chapter refers to Rosetta structure objects, unless otherwise noted.

    global env: training, testing, myrules, x
      (interpreter code...)
      GUI env (enclosed by global env):
        (gui-proc code...)

Figure 6.4: The GUI modeled as a process running in an environment that is enclosed by the global environment.

    global env: training, testing, myrules, x
      (interpreter code...) (gui-proc code...)

Figure 6.5: The GUI modeled as a process running in the global environment.

The first concept must be rejected, and the second is not very elegant. Also, keep in mind the problems with first-class status mentioned in Section 6.2.2. In the next section, a conceptual model of algorithm and structure objects that remedies most of the deficiencies mentioned is proposed.

6.4 Objects as Closures

In this section, a representation of Rosetta objects inside Scheme is proposed. First, a general object model for Scheme is presented. Then, conceptual representation models for Rosetta structure and algorithm objects are proposed.

The discussion in the previous sections concluded that, at least, structure objects should have first-class status. Unfortunately, the discussion also presented some undesired properties of a naive representation. The model proposed here, in addition to being an elegant conceptual model, remedies these problems. Recall that, although it is decided below that the objects will be implemented as Rosetta objects rather than compound Scheme objects (they will be implemented on "the Rosetta side", Figure 6.3), a conceptual model specifying the interpreter's view of these objects in Scheme terms must be found.

Data abstraction is a technique that separates the means of access to a data object from its implementation (see Section 4.1.1). If fortunate enough to have been exposed to [AS96], one may have, like the author, felt a jolt of excitement when exposed to the fact that data abstractions, defined by the behavior of selectors and constructors, can be implemented without data structures at all but only with procedures. In the excellent textbook, this striking fact is illustrated with the alternative definition of the pair selectors and constructors in Figure 6.6.

    (define (cons x y)
      (define (dispatch m)
        (cond ((= m 0) x)
              ((= m 1) y)
              (else (error "Argument not 0 or 1 -- CONS" m))))
      dispatch)
    (define (car z) (z 0))
    (define (cdr z) (z 1))

Figure 6.6: Alternative definition of pairs [AS96].

The value of this example is twofold. First, it illustrates the power of data abstraction: the interface to the data can be separated from the implementation by an abstraction barrier. Second, it shows that procedural representation of data can be used to explore a style of programming called message passing.

Message passing is heavily used in describing data abstraction models for object-oriented programming languages. Most object-oriented programming languages are based on the classical object model presented in Section 4.5.3. Applying a method can be seen as telling the object the name of the method to be applied to its state. In other words, an object can be viewed as encapsulated in a procedure that maps the names of the methods into procedure calls. This is exactly what message passing is.

The message passing programming style induces a natural way to represent objects in Scheme: as lexical closures. Observe that closures coincide with the classical object model in that they have local data and shared procedures. To represent objects as closures, all we need to do is to create a dispatching function that maps between messages and methods. Indeed, a dispatching function is present in the code in Figure 6.6. In Figure 6.7 a constructor for a complex number object (a recurring example from [AS96]) is shown. The result of evaluating (define x (make-complex 3 7)) is shown in Figure 6.8. The procedure make-complex has created a closure that contains a procedure that accepts a message. In Scheme, this closure object can be manipulated like any other datum, due to its first-class status.


    (define (make-complex real imag)
      (lambda (message)
        (cond ((eq? message 'get-real) real)
              ((eq? message 'get-imag) imag)
              ((eq? message 'set-real!)
               (lambda (new-real) (set! real new-real)))
              ((eq? message 'set-imag!)
               (lambda (new-imag) (set! imag new-imag))))))

Figure 6.7: Complex number object constructor.

    global env:
      make-complex: procedure (parameters: real, imag; body: (lambda (message) ...))
      x: procedure (parameters: message; body: (cond (...)), with environment E1
    E1 (enclosed by global env):
      real: 3
      imag: 7

Figure 6.8: After evaluating the expression (define x (make-complex 3 7)).


The advantage of implementing objects as closures in our case is twofold:

1. It provides an elegant, powerful and well-known conceptual model.
2. It solves the problems related to argument passing by value. In Section 6.2.2 it was hinted that these problems would be solved if the objects were represented as pointers to the real data. In fact, this is exactly the case with closures. Since a closure is a pair of pointers, just like a value of the pair type, creating a copy of a closure comprises creating a new pair and copying the pointers. Compared to copying a large data structure, the decrease in overhead achieved by using closures is evident.

The latter point also introduces some important semantic properties of objects as closures. Since an object evaluates to a set of pointers, it is not possible to make a copy of an object variable in the usual way. Consider the following evaluations:

    (define x (make-complex 3 7))  => x
    (define y x)                   => y
    (y 'get-real)                  => 3
    ((x 'set-real!) 5)             => undefined
    (y 'get-real)                  => 5

In the example, the value that the variable y is bound to appears to be changed when the variable x is mutated. In reality, the value of y is not changed; it is still the same pair of pointers that constitutes the closure. What has changed is the binding for the variable real in the environment pointed to by the closure (footnote 4). This behavior illustrates that the semantics of this model are completely different from those of a model where Rosetta objects are implemented by compound native Scheme data structures.

The question arises as to whether this scheme is in accordance with Scheme's philosophy and emphasis on functional programming style, given the apparent side effects of message passing illustrated above. In the author's opinion it is, at least to the degree that the imperative features of the language are. The potential problems are not introduced by using closures to represent objects, but are inherited from the introduction of assignment to Scheme (footnote 5).

Footnote 4: Strictly speaking, by the same argument x is not changed either, only the environment it points to. Nevertheless the tradition is to say that x is mutated.

Footnote 5: Closures are not the only constructions that show this kind of behavior. Lists are also represented as pairs of pointers. Consider the evaluations

    (define x (list 1 2))  => x
    (define y x)           => y
    (car y)                => 1
    (set-car! x 3)         => undefined
    (car y)                => 3

which illustrate the same thing, namely that a value pointed to by more than one pointer is changed. Thus, this concept should not be unfamiliar to Scheme programmers.
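Since binding a second variable to a closure only copies pointers, an independent copy has to be constructed explicitly from the object's state. The following sketch contrasts the two cases. The constructor follows Figure 6.7, here with set-imag! assigning to imag; the helper copy-complex and the variable names alias and independent are hypothetical.

```scheme
;; Constructor following Figure 6.7 (set-imag! assigns to imag).
(define (make-complex real imag)
  (lambda (message)
    (cond ((eq? message 'get-real) real)
          ((eq? message 'get-imag) imag)
          ((eq? message 'set-real!)
           (lambda (new-real) (set! real new-real)))
          ((eq? message 'set-imag!)
           (lambda (new-imag) (set! imag new-imag))))))

;; Hypothetical helper: builds a fresh closure, and thereby a fresh
;; environment, from the current state of z.
(define (copy-complex z)
  (make-complex (z 'get-real) (z 'get-imag)))

(define x (make-complex 3 7))
(define alias x)                      ; same closure, same environment
(define independent (copy-complex x)) ; new closure, new environment

((x 'set-real!) 5)
(alias 'get-real)         ; => 5
(independent 'get-real)   ; => 3
```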


In the following, it is shown how objects as closures can be used as a part of a conceptual model for the representation of Rosetta objects in Scheme.

6.4.1 Structures

As seen above, objects as closures are a sound representation of general objects in a lexically scoped language. In the following, a proposal for the representation of Rosetta structure objects inside Scheme is given. A similar proposal regarding algorithm objects is given in the next subsection. The representation presented here solves the problems indicated in the previous sections.

Proposition 6.3 Rosetta structure objects will be represented as closures.

The interface to the closure objects will be a reflection of the C++ interface to the native objects, i.e. the protocol will be a set of messages that each correspond to a public member function.

The general object model exemplified above exhibits currying. Currying is an approach to representing functions of multiple arguments where a function takes one argument and returns a new function that takes the rest of the (recursively curried) arguments. The semantics of the closures representing Rosetta structure objects, however, will not be defined to exhibit currying. A non-curried semantics will simplify the syntax needed to apply methods; compare (mydt 'createinformationvector 6 #t) to (((mydt 'createinformationvector) 6) #t) (footnote 6). A second reason is purely pragmatic: the implementation of curried semantics would require more of the limited time available for the project.

Instead of using currying, the procedures representing the objects (the dispatching functions) will take n + 1 arguments for each n-argument method. The first argument is interpreted as the message and must satisfy the symbol? predicate. The values of the rest of the arguments are passed on to the member function after a suitable mapping from Scheme objects to the C++ types of the formal parameters. If the first argument fails to be of the correct type, the message is not in the object's protocol, or any of the type mappings fail, an error is signaled and the value of the evaluation is undefined. The protocol for each object type must be documented and available to the Rosetta user. The model is illustrated by the following example:

    (define vector (dt 'createinformationvector 9 #t))

Here, dt is a variable that is bound to a closure representing a Rosetta object that is an instance of the class DecisionTable. As illustrated, evaluating objects can create new objects. But observe that we have not talked about how the initial objects have been made available in the first place. This will be the issue of Section 6.5.

The lack of currying represents no noticeable loss of expressiveness. If an abstraction of the method createinformationvector above is needed, as would be obtained by evaluating (define x (dt 'createinformationvector)) in a curried semantics, this can be achieved by using the procedure make-curry defined in Figure 6.9.

    (define (make-curry fun noargs args)
      (if (= noargs 0)
          (apply fun args)
          (lambda (arg)
            (make-curry fun (- noargs 1) (append args (list arg))))))

Figure 6.9: make-curry creates a curried version of a procedure.

We must, however, consider non-curried semantics in light of a conceptual model. Procedures exhibiting this kind of semantics behave differently according to the value of the first argument (the message). In particular, the number of arguments taken by the procedure differs. Scheme semantics only allow one value to be associated with each name (footnote 7), so the only option is to conceptually view these procedures as taking an unbounded number of arguments. Scheme procedures can be declared as taking either a specified finite number of arguments, or an arbitrary, unbounded number. In the latter case, the procedure actually receives a list containing the actual arguments, so the number of arguments is arbitrary.

A new question then arises: what is the semantics of combinations involving object closures with an "illegal" number of arguments? A combination like this must be legal from the interpreter's viewpoint, because the procedures cannot be conceptually modeled as taking a specified number of arguments, since the number of arguments varies with the value of the first. Thus, these procedures must accept any number of arguments. The solution to this problem is to say that the procedures signal an error in the case of a wrong number of arguments, by evaluating an expression that signals an error. Since the value of an expression that evaluates subexpressions that signal errors is an error value, this makes it possible for a procedure to "manually" signal an error. Although this model of operation is not as elegant as a curried version, it is sound within Scheme's semantics.

Using this model, it makes sense to internally represent Rosetta objects in their native format, see Figure 6.3. A representation like this makes it easy to implement the proposed model; all that is needed is a function that maps between messages and methods.

Footnote 6: Alternatively, and more commonly, currying could be exhibited by the first parameter (the message) only.
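The non-curried dispatching procedures described above can be sketched as follows. The object is a stand-in built from a Scheme list rather than a native Rosetta object, and the constructor make-table-object as well as the messages size and row are hypothetical; they are not claimed to be part of any Rosetta protocol. The procedure takes the message followed by an arbitrary number of arguments and signals an error "manually" when the count is wrong. make-curry is repeated from Figure 6.9.

```scheme
;; Hypothetical stand-in for a structure object: a closure accepting
;; a message followed by the arguments of the corresponding method.
(define (make-table-object rows)
  (lambda (message . args)
    ;; Signals an error unless exactly n arguments follow the message.
    (define (expect n)
      (if (not (= (length args) n))
          (error "Wrong number of arguments for message:" message)))
    (cond ((not (symbol? message))
           (error "Message must be a symbol:" message))
          ((eq? message 'size)
           (expect 0)
           (length rows))
          ((eq? message 'row)
           (expect 1)
           (list-ref rows (car args)))
          (else (error "Message not in protocol:" message)))))

;; make-curry as defined in Figure 6.9.
(define (make-curry fun noargs args)
  (if (= noargs 0)
      (apply fun args)
      (lambda (arg)
        (make-curry fun (- noargs 1) (append args (list arg))))))

(define dt (make-table-object '((1 2 3) (1 2 1))))
(dt 'size)     ; => 2
(dt 'row 1)    ; => (1 2 1)

;; An abstraction of the row method, obtained without curried objects.
(define row-of-dt (make-curry dt 1 '(row)))
(row-of-dt 0)  ; => (1 2 3)
```

A combination such as (dt 'row) is accepted by the interpreter as a procedure application and only fails inside the dispatching procedure, which corresponds to the model of errors described above.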

Proposition 6.4 Rosetta structure objects will be internally represented in their native format.

6.4.2 Algorithms

Here, a representation of Rosetta algorithm objects inside Scheme is proposed. How necessary, really, is the distinction between structure and algorithm objects? When seen from the perspective of a user of the class library, the algorithm objects actually have first-class status like the structure objects. When seen from the perspective of a GUI user, both kinds of objects also have similar representations. Also, as discussed in Section 6.2.2, algorithm objects have state. The motivation for non-first-class status was the overhead incurred by copying presumably large objects during e.g. iterations. Using closures solved this problem for structure objects. There is then no evident reason not to use the same representation for algorithm objects. In fact, doing so can be seen as following Rosetta's design principles to a certain degree.

Footnote 7: Contrary to languages that allow function overloading.


Therefore, Rosetta algorithm objects will be represented as closures, similarly to structure objects. The representation will be identical, and the comments regarding arguments apply to algorithm objects as well as to structure objects.

Proposition 6.5 Rosetta algorithm objects will be represented as closures.

One consequence of this decision is that the semantics of combinations involving algorithm objects is somewhat untraditional. An algorithm object is not conceptually an algorithm, but an agent that can respond to messages in various ways. The object cannot be applied to arguments directly, as would be the case for a "conventional" procedure, e.g. (+ 1 2 3). Instead, the combination must pass a message indicating that the algorithm should be "run":

    (batchclassifier 'apply dt)

In this example, the object batchclassifier, corresponding to the BatchClassifier algorithm, is applied to a decision table bound to the variable dt by calling the algorithm object's Apply method.

The proposed model also answers the questions regarding argument passing discussed in Section 6.3.1. Simple algorithm arguments can be set by evaluating e.g.

    (majorityvoter 'setparameter "TRESHOLD=0.7")

Here, the parameter string is passed to the class library directly for parsing. The second kind of arguments can be set by their respective methods:

    (majorityvoter 'setrules myrules)

Of course, the only real alternative for the internal representation of algorithm objects is the native object format.

Proposition 6.6 Rosetta algorithm objects will be internally represented in their native format.

As a final example, the use of structures as objects and algorithms as objects is combined:

    (define classified
      (majorityvoter 'apply (mydt 'createinformationvector 6 #t)))
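To make the message protocol for algorithm objects more tangible, the following sketch mocks an algorithm object entirely in Scheme. It is an assumption-laden illustration: in the proposed model the closure would wrap a native Rosetta object, whereas the constructor make-mock-voter, the stored state, and the return values used here are invented for the example. Only the messages setparameter, setrules and apply are taken from the text.

```scheme
;; Hypothetical mock of an algorithm object with local state.
(define (make-mock-voter)
  (let ((parameters '())   ; parameter strings received so far
        (rules #f))        ; rule set to classify with, if any
    (lambda (message . args)
      (cond ((eq? message 'setparameter)
             (set! parameters (cons (car args) parameters))
             #t)
            ((eq? message 'setrules)
             (set! rules (car args))
             #t)
            ((eq? message 'apply)
             (if rules
                 (list 'classified (car args) 'using rules)
                 (error "No rules have been set")))
            (else (error "Message not in protocol:" message))))))

(define majorityvoter (make-mock-voter))
(majorityvoter 'setparameter "TRESHOLD=0.7")
(majorityvoter 'setrules 'myrules)
(majorityvoter 'apply 'some-object)
;; => (classified some-object using myrules)
```

The two setter messages change the state captured by the closure, and the apply message then runs with that state, mirroring what the GUI dialog box does behind the scenes.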

6.5 The Universal Environment

In the previous section, a conceptual object model was proposed. This model answered many of the questions posed in Sections 6.2 and 6.3, particularly regarding restrictions on manipulation and where data should "live". However, the questions regarding naming, namespace, global environments and accessibility of Scheme-created Rosetta objects in the GUI are not answered by the object model. In this section, the construction of a conceptual model is continued by proposing a model of evaluation that will do just that.

In Section 5.4.1 Scheme's environment model of evaluation was presented. This model helps explain the semantics of expressions in an easy and intuitive, yet powerful, way. Since Scheme's overall semantics would otherwise have to be described using a new model, the environment model is chosen as the model of evaluation when interfacing Scheme with Rosetta.

Proposition 6.7 The conceptual model will be explained in terms of the environmental model of evaluation.

Prop. 6.1 states that Scheme will gain access to Rosetta GUI objects through an environment, but Section 6.3.2 showed that none of the possible conceptual models able to explain the presence of such objects in the global environment was satisfactory. A natural question then is: do the objects need to be bound in the global environment? Obviously, they cannot be bound in an environment enclosed by the global environment, in which they would be inaccessible from the interpreter running in the global environment. But what if they were bound in an environment that encloses the global environment, that is, what if the global environment were not the top-level environment? This line of reasoning leads to the definition of a universal environment.

Definition 6.1 In the conceptual model, the universal environment is the sole environment that encloses the global environment.

Proposition 6.8 In the conceptual model, Rosetta GUI objects are bound in the universal environment.

Although the global environment is enclosed by another, it is still "global" in the sense that this is where the Scheme interpreter runs. These names are just a matter of convenience; the new top-level environment could as well have been called "global" and the interpreter environment something else, all without affecting the environment model of evaluation. Now, the GUI can be modeled as a process running in the top-level environment.

Proposition 6.9 In the conceptual model, the Rosetta GUI is a process that runs in the universal environment.

The model is illustrated in Figure 6.10. Observe that this model solves the problems induced by modeling the GUI as a process running in the global environment (Section 6.3.2). The project tree as an environment metaphor is sound, since the only variables that can be bound in the universal environment are those introduced by the GUI. Moreover, all the Rosetta GUI objects are visible in the interpreter's global (footnote 8) namespace, that is, they are accessible from the interpreter. Also, the Rosetta and Scheme objects bound in the global environment are inaccessible to the GUI, and the GUI cannot bind new variables there.

    universal env: training, testing, myrules
      (gui-proc code...)
      global env (enclosed by universal env): x
        (interpreter code...)

Figure 6.10: The universal environment.

The proposed model is not an extension of the environment model of evaluation; it merely describes the presence of Rosetta in terms of this model. Scheme's semantics are equally well described by the environmental model even though the interpreter does not run in the top-level environment. Thus, the proposed conceptual model retains Scheme's semantics.

A few more issues need to be settled before the model is complete. One concerns the naming of variables, another the creation of new universal bindings from the interpreter.

Footnote 8: Or maybe it should be "universal"?
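The relationship between the universal and the global environment can be sketched with a toy version of the environment model. This is only a conceptual illustration under stated assumptions: environments are modeled as pairs of an association list and an enclosing environment, and the names make-env, lookup and define-in!, as well as the placeholder values, are hypothetical. It is not how the embedded interpreter is implemented.

```scheme
;; An environment is a pair (bindings . enclosing-env), where bindings
;; is an association list. The top-level environment encloses nothing,
;; which is represented by '().
(define (make-env bindings enclosing) (cons bindings enclosing))

;; Searches the given environment, then its enclosing environments.
(define (lookup name env)
  (if (null? env)
      (error "Unbound variable:" name)
      (let ((binding (assq name (car env))))
        (if binding
            (cdr binding)
            (lookup name (cdr env))))))

;; A definition always adds to the frame of the given environment.
(define (define-in! name value env)
  (set-car! env (cons (cons name value) (car env))))

;; GUI objects live in the universal environment.
(define universal-env
  (make-env (list (cons 'training 'gui-table-1)
                  (cons 'myrules 'gui-rules-1))
            '()))

;; The interpreter runs in the global environment, enclosed by it.
(define global-env (make-env '() universal-env))

(define-in! 'x 42 global-env)
(lookup 'training global-env)   ; => gui-table-1, found in universal-env
(lookup 'x global-env)          ; => 42
;; (lookup 'x universal-env) would signal an error: bindings made in
;; the global environment are not visible from the universal one.
```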

6.5.1 Naming

Since the GUI objects will be available as variable bindings, the names of the variables must be decided. The objects already have names, see Figure 3.1. A naming scheme for the universal environment is not restricted to using names identical to the GUI names. The only requirement is that it uses an easily deducible one-to-one mapping.

In most Scheme implementations variable names are case insensitive, while object names in the GUI are not. Also, the GUI allows distinct objects to have identical names. Two options are then available for a naming scheme:

1. Use variable names identical to the GUI names and change the naming rules in Rosetta to disallow identical (case insensitive) names.
2. Use variable names that allow the user to discern between objects with identical GUI names.

Consider Figure 3.1 for an example illustrating the latter. The object named "MyRules" in the project tree could be bound to the variable breastcancer::discr::training::myreducts::myrules. Here, "::" is not an operator; it is a part of the variable name. This illustrates the freedom to choose a variable naming scheme, as long as the names are easily deducible. Still, there is one more problem if this scheme is used: objects with identical names on the "same level". If "MyReducts" had another child object that was also called "MyRules", the scheme above would not give unique names to the two corresponding variables. This could be dealt with by requiring names on the same level to differ and, if they do not, binding only one of the objects to a variable.
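The second naming option can be sketched as a mapping from a path in the project tree to a variable name. The procedure names downcase-string, join-names and path->variable-name are hypothetical, and the GUI names in the usage example are inferred from the variable name given in the text rather than taken from Figure 3.1. The sketch lowercases each name to sidestep case sensitivity and joins the names with "::".

```scheme
;; Lowercases a string character by character.
(define (downcase-string s)
  (list->string (map char-downcase (string->list s))))

;; Joins a non-empty list of strings with "::" between them.
(define (join-names names)
  (if (null? (cdr names))
      (car names)
      (string-append (car names) "::" (join-names (cdr names)))))

;; Maps a path of GUI object names, from the root of the project
;; tree down to the object, to a variable name (a symbol).
(define (path->variable-name path)
  (string->symbol (join-names (map downcase-string path))))

(path->variable-name
 '("BreastCancer" "Discr" "Training" "MyReducts" "MyRules"))
;; => breastcancer::discr::training::myreducts::myrules
```

Two sibling objects with the same GUI name map to the same symbol under this sketch, which is exactly the "same level" problem noted above.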


Naming considerations are really an implementation issue, and are not important for the interface model. The above discussion illustrates some of the implementation alternatives.

6.5.2 Creating new bindings

Up to this point, one question posed in Section 6.2 has not been addressed: how and when Rosetta objects created in the Scheme interpreter should be registered in the project tree. In the light of the proposed model, this is the problem of how a process can create a binding in an enclosing environment. Unfortunately, this problem cannot be solved within Scheme's semantics. Although a process can change the value of a variable bound in an environment that encloses the environment the process runs in, it cannot create new bindings there. The ability to register new objects in the GUI is highly desirable, which leads us to the one point where we must refine Scheme's semantics to explain the conceptual model. The refinement is in the form of a new syntactic special form that creates bindings in the top-level environment, like

    (define-universal name val)

The evaluation of this special form will create a binding in the universal environment for the value that val evaluates to and, correspondingly, show the graphical representation in the project tree. Although a new syntactic form with new semantics has been introduced, this is not very dramatic for the soundness of the proposed conceptual model. Scheme's semantics have not been changed, only extended. The meaning of every regular Scheme expression is still the same; the only new thing is the new syntactic form (footnote 9). The new syntactic form gives the Scheme user complete control over which objects to register in the GUI. It was stated above that the object will be made visible in the project tree, but not where in that tree.

Structural Relationship

In Section 6.2.3 the problem of structural relationships was pointed out. The new syntactic form offers no solution to the problem; it does not say anything about the parent object. The problem is solved below by using the syntactic form to explain the behavior of a procedure, instead of using it directly. Variable bindings in the universal environment will be created by the object that is the structural parent (footnote 10). Each Rosetta structure object has a method and a corresponding message for creating a new binding and, as a side effect, establishing a parent-child structural relationship. For example,

    (mydt 'appendchild myotherdt)

will create a new variable in the universal environment, assign to it the value of the (local, global or universal) variable myotherdt, and create a structural relationship between the GUI object corresponding to mydt (parent) and the new object (child). Naming of the new variables remains an open issue to be settled by an implementation. If mydt is not bound in the universal environment, the method has no effect.

With the indicated message, the special syntactic form can be disposed of. However, when trying to explain the behavior of the procedures that handle the appendchild message, the same semantic problems that led to the introduction of the special form resurface. Hence the special form, instead of being used explicitly, is used to explain the behavior of the procedures. The special form will not be directly available to the user since, in reality, it does not exist and is only used as a conceptual tool. This can be achieved by stating that the concrete syntax of the special form will not be disclosed.

The current discussion illustrates how the semantics of methods in the Rosetta class library can be explained in terms of Scheme semantics. No attempt will be made to do the same for all objects and methods, nor is it claimed that they can be explained by such means. The indicated appendchild message (footnote 11) is in an exceptional position because it is so closely related to the conceptual model.

Footnote 9: This is exactly why special syntactic forms are needed: the needed behavior cannot be explained by the semantics of the "regular" syntactic forms.

Footnote 10: This concept only has a meaning in the project tree; structural relationships are not reflected by the conceptual model.
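The behavior of the appendchild message can be sketched in C++ as a message dispatching procedure wrapped around a native object. This is a simplified illustration, not the code of Appendix A: the types Structure, Closure and UniversalEnv, the function makeClosure, and the choice of naming the new variable "parent::child" are all assumptions made for the example.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Rosetta structure object in the project tree.
struct Structure {
    std::string name;
    std::vector<std::shared_ptr<Structure>> children;
};

// A "closure" here is a procedure that receives a message and one argument.
struct Closure;
using ClosurePtr = std::shared_ptr<Closure>;
struct Closure {
    std::shared_ptr<Structure> object;  // wrapped native object
    std::function<bool(const std::string&, const ClosurePtr&)> send;
};

// Illustrative universal environment: variable name -> closure.
using UniversalEnv = std::map<std::string, ClosurePtr>;

// Wraps obj in a message dispatching procedure known under the name var.
ClosurePtr makeClosure(UniversalEnv& env, const std::string& var,
                       std::shared_ptr<Structure> obj) {
    auto self = std::make_shared<Closure>();
    self->object = std::move(obj);
    Structure* parent = self->object.get();
    self->send = [&env, var, parent](const std::string& msg,
                                     const ClosurePtr& arg) {
        if (msg != "appendchild" || !arg) return false;
        // No effect if the receiver is not bound in the universal environment.
        if (env.find(var) == env.end()) return false;
        parent->children.push_back(arg->object);       // structural relation
        std::string childVar = var + "::" + arg->object->name;  // assumed naming
        env[childVar] = makeClosure(env, childVar, arg->object);  // new binding
        return true;
    };
    return self;
}
```

Sending appendchild to a closure bound in the universal environment both extends the project tree and creates a new universal binding, while the same message sent to a closure that is only locally bound does nothing, mirroring the rule stated above.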

6.6 Summary

In the current chapter, the Rosetta GUI has been modeled in Scheme terms. Here, the results are collected. Props. 6.1 and 6.2 laid the foundations for a conceptual model by stating that the GUI is a process, and that all communication between the interpreter and GUI processes will occur through an environment. Then, a conceptual model was constructed by Props. 6.3 through 6.9. These are summarized in Prop. 6.10.

Proposition 6.10 (Conceptual Model) The following constitutes the conceptual model of Rosetta from the viewpoint of the Scheme interpreter:

• Rosetta structure and algorithm objects will both be represented as closures.
• Rosetta structure and algorithm objects will both be internally represented in their native format.
• The semantics of the model complies with the environmental model of evaluation.
• GUI objects are bound in the universal environment.
• The Rosetta GUI is a process running in the universal environment.

Except for the new special form introduced in Section 6.5.2, the proposed model does not alter Scheme's overall semantics. Even with that special form, the semantics are only extended. Furthermore, a uniform representation of structures and algorithms is achieved. In this model, the Scheme interpreter can be seen as an interface to the class library.

Footnote 11: Actually, appendchild need not and will not be a one-to-one mapping with the Structure::AppendChild method in the class library.

Chapter 7

Scheme Interpreters

A wide range of Scheme implementations already exist, and some of them are designed to be easily embedded in other applications. In this chapter, features of existing implementations are discussed and a selection of implementations is presented. The alternative, creating a new implementation from scratch, is also discussed. Finally, the alternatives are compared and a conclusion is drawn.

7.1 Pragmatic Concerns

A wide range of pragmatic issues must be taken into consideration when implementing a Scheme interpreter. Some of them are discussed in this section.

7.1.1 Garbage Collection

The requirement that Scheme objects have unlimited extent (Section 5.4.1) seems incompatible with the fact that computers only have a finite amount of memory. To maintain the illusion of unlimited extent, objects are deleted if and only if it can be proved that they are inaccessible from any Scheme process. Inaccessible objects are called garbage, and the process of removing them is called garbage collection. An implementation must include a strategy for garbage collection. Theoretical foundations for a range of such strategies have been laid. It is a common misunderstanding that garbage collection is less efficient than explicit deallocation strategies. In fact, studies show that the former performs better.

7.1.2 Tail Recursion

Scheme interpreter implementations must have a strategy for achieving proper tail recursion. The abstract machine architecture described below can help fulfill this requirement. See [FWH92] for a thorough discussion of related issues.


7.1.3 Continuations

First-class continuations introduce many subtleties, e.g. that a procedure can return many times and that the entire state of the computation must be saved. An important design parameter for a Scheme implementation is how continuations should be implemented.

7.1.4 Architecture

Scheme implementations are generally built on one of two architectures: either traversing the (possibly simplified) syntax tree during evaluation, or translating the Scheme code to an intermediate form to be interpreted by a byte-code interpreter.

Naive Architecture

In the naive architecture, the Scheme code is parsed and the generated syntax tree is evaluated directly as described in Section 4.4.

Abstract Machine Architecture

Direct evaluation of the abstract syntax tree is not very efficient. Most of the existing implementations are based on an abstract machine architecture. After the Scheme code is read and parsed, the syntax tree is translated to a semantically equivalent program in another language. This generated code is usually simple byte code. The byte code is then fed to an interpreter for the abstract machine, which evaluates the program by interpreting the byte code.

The abstract machine architecture has many advantages, especially when it comes to code optimization (footnote 1). One of the main advantages of using an abstract machine, however, is related to the implementation of proper tail recursion. An abstract machine is usually implemented as a stack machine. It is relatively easy to implement tail recursion while manipulating low-level machine code for a simple stack machine. Without an abstract machine, tail recursion must be implemented by means of the defining language. The naive architecture thus places certain demands on the defining language: it must itself be able to make tail-recursive calls in constant control space, which restricts the class of possible defining languages to those that include the low-level constructs needed to accomplish this. Examples of such constructs found in many languages are "jump" and "goto". The use of an abstract machine moves these restrictions from the interpreter's defining language to the abstract machine's defining language. In an architecture without an abstract machine, the defining language can be viewed as the virtual machine.

Another advantage of the abstract machine architecture is that it lends itself to the implementation of continuations. When saving a continuation, the relevant aspects of the state of the machine must be taken into account. On a typical architecture, this means that the processor's registers and the runtime stack must be saved. Using an abstract stack machine makes the machine state space much more controllable.

Footnote 1: Translating parts of a program to semantically equivalent but more efficient code is traditionally, by abuse of language, called "code optimization", although the code is seldom made "optimal" in any sense.
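To make the idea of an abstract stack machine concrete, a minimal byte-code interpreter is sketched below in C++. The instruction set (Push, Add, Mul, Halt) and the names Op, Instr and run are invented for the illustration and do not correspond to any of the implementations discussed in this chapter.

```cpp
#include <cstddef>
#include <vector>

// Illustrative instruction set for a tiny stack machine.
enum class Op { Push, Add, Mul, Halt };

struct Instr {
    Op op;
    long operand;  // used by Push only
};

// Interprets the byte code. The complete machine state is the program
// counter and the operand stack.
long run(const std::vector<Instr>& code) {
    std::vector<long> stack;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        const Instr& in = code[pc];
        switch (in.op) {
        case Op::Push:
            stack.push_back(in.operand);
            break;
        case Op::Add:
        case Op::Mul: {
            long b = stack.back(); stack.pop_back();
            long a = stack.back(); stack.pop_back();
            stack.push_back(in.op == Op::Add ? a + b : a * b);
            break;
        }
        case Op::Halt:
            return stack.back();
        }
    }
    return stack.empty() ? 0 : stack.back();
}
```

The Scheme expression (+ 1 (* 2 3)) could, for instance, be translated to the sequence Push 1, Push 2, Push 3, Mul, Add, Halt. Because the whole state of this machine consists of the program counter and the stack, it is easy to see what would have to be copied in order to save a continuation.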

7.2 Existing Implementations

Due to Scheme's simple syntax, clean semantics and popularity in academic communities, a wide variety of Scheme implementations have surfaced over the years. Most of these have been made available by academic institutions, such as Rice University and M.I.T., as tools for teaching and research in programming languages and artificial intelligence. Such implementations are usually distributed with full source code and can be used free of charge if the use fulfills certain conditions. Some commercial Scheme implementations are also available.

In this section, the existing implementations that were found to be the most suitable starting points for an extension language are presented. The evaluated implementations represent only a small fraction of the large Scheme flora. Implementations not presented here were considered obviously unsuitable. None of the implementations considered here are commercial. Generally, commercial implementations are not delivered with source code and are thus unusable for embedding.

In Chapter 6 we described a model for integrating the Rosetta object view with the Scheme environment model. Embedding an existing implementation in Rosetta also imposes a set of additional requirements on the implementation. Before we take a look at the individual implementations, we will discuss the implementation issues that our model raises, summarize some of the implementations' common features, and discuss requirements and desirable features.

7.2.1 Embedding an Existing Implementation

An existing implementation cannot just be plugged into Rosetta, but must be altered to reflect the model introduced in the previous chapter. There are two main issues to consider:

• Implementing the universal environment
• Implementing the representation of Rosetta objects as closures

The universal environment is conceptually uncomplicated. All that is needed is to make sure that a new global environment is created, in which the interpreter's read-eval-print loop starts out. The old global environment then becomes the universal environment. A lookup function must be implemented for the universal environment, following a decided naming policy. The universal environment lookup function, as well as the lookup functions for the regular environments in case Rosetta objects are bound in these, must return a closure. Since Rosetta objects will be represented in their native format (Props. 6.4 and 6.6), a wrapper in the form of a message dispatching procedure around the objects must be implemented. Wrapper objects must be created when objects are looked up in the universal environment, and when new Rosetta objects are created. A last issue worth mentioning is:

• Redefinition of input/output functionality

Most of the existing implementations are not native Windows NT implementations and use Posix-style I/O streams (standard input, output and error). Input/output routines must be rewritten to suit Windows NT and the Rosetta MDI design.

7.2.2 Classification and Common Features

Most implementations implement supersets of the language defined in the standard documents, e.g. [IEE90, CE91, KCE98]. After discussing a classification of the set of available implementations, some of the most common extensions to the standard are discussed.

External Interface

The Scheme implementations can be classified into two categories: those that provide means for low-level interfacing with other systems and those that do not:

• Embeddable systems
• Stand-alone interpreters

Embeddable systems are designed to be integrated within other software. They offer interfaces that include the low-level functionality of an interpreter, such as creating environments, manipulating variable bindings, evaluating expressions, etc. Some of the embeddable systems are pure Scheme libraries; others are fully functional interpreters that also provide functionality for embedding. Stand-alone interpreters are implementations of Scheme interpreters that are not intended by design to be embedded in other applications. These include the full functionality of an interpreter, but have no external interface for accessing low-level functions.

Executable Memory Images

Some implementations include the possibility of generating executable memory images. At an arbitrary point in the evaluation of a Scheme program, the execution can be "frozen" and saved to an executable file (footnote 2). When the executable is run, the evaluation continues where it left off.

Footnote 2: Executable memory images can be thought of as saved "continuations".

Object System

An object system encouraging object-oriented programming is included in some of the implementations. Such systems incorporate new object types together with encapsulation and inheritance into Scheme. Some embeddable systems also allow C++ objects to be represented and manipulated in Scheme's object system.

First-class Environments

It is also common to represent environments explicitly as first-class objects. Typically, primitives are included to capture environments, e.g. the current or the top-level environment. These environment data objects can be bound to variables, passed as arguments and returned as values like any other first-class object. The usefulness of this representation is that expressions can be evaluated in a chosen environment. A new special form is most often included to accomplish this, e.g.

    (eval (+ x y) savedenv)

7.2.3 Desirable Features

The task of embedding an existing implementation in Rosetta poses some requirements on the particular implementation. Requirements on and desirable features of an implementation are:

• Source code available
• Embeddable
• Well-designed
• Portable
• Non-conflicting object representation
• Non-conflicting environment representation
• Moderate complexity
• Acceptable license policy

Making Scheme and Rosetta communicate requires modification of the Scheme implementation's source code. For example, as described above, variable bindings in the universal environment must be handled as a particular case (footnote 3).

In the previous section, a distinction was made between implementations with external interfaces and those without. It is clearly desirable that an implementation has an external interface intended for embedding the implementation within other applications. The source code of an implementation without such an interface is inclined to be tightly coupled, and will require alterations of the implementation's internals to a higher degree than that of an implementation with an external interface. Obviously, an implementation that can be integrated with Rosetta via a well-defined interface will be preferred over one that requires extensive inspection of the internal source code. In practice, a hybrid between these two hypothetical implementations is most likely.

Independent of the above discussion on interfaces, it is desirable that sound software engineering techniques have been used in an implementation, which simplifies extending it. A high degree of seamless integration with the rest of Rosetta is desirable; an absolute requirement is thus that the defining language is C or C++. As discussed in Section 7.2.1, an abstraction layer encapsulating the functionality in the Scheme implementation must be constructed. A C++ implementation will help achieve seamless integration, lessen the work needed to construct an abstraction layer, and will be highly preferred over a C implementation.

Portability is important in two respects. First, it is of prime importance that the implementation is portable to, if not already available for, Rosetta's primary platform (Windows NT). Second, portability in general is desirable to comply with Rosetta's design philosophy. Thus, we are looking for an implementation that is portable by design and that preferably has been ported to Windows NT.

As described under Classification and Common Features above, many implementations have extended Scheme's standard type system with an object system. Integrating the object model from Chapter 6 with Scheme requires that the implementation's native object system is compatible with the model. An interpreter with two object systems, one for "Rosetta objects" and one for other objects, would indeed be a poor design. The important point is that an implementation does not contain a conflicting object system.

Footnote 3: Theoretically, this could be achieved without altering the source code of a sufficiently modular implementation available in object code only. In practice, such implementations do not exist.
Generally, as described below, we are looking for a small, simple implementation that does not extend the current language standard with complex features. In fact, representing objects as closures does not introduce a new object system. This model does not extend Scheme's standard type system; it only conceptually represents objects as an existing type. Therefore, an implementation without an object system is desirable. Similarly, a possible explicit environment representation must not conflict with our model. Generally, an implementation without explicit representation of environments would be preferred.

While considering possible implementations, it is important to keep in mind that an extension to Rosetta is being sought. In some applications, an extension language has been a part of the design from the start. Some applications are even by and large written in their own extension language, reducing the core application to an interpreter for the extension language, with the functionality of the application written in the extension language on top (footnote 4). Although undoubtedly borne in mind through the design of Rosetta, an extension language has not been a part of the design philosophy. In addition, it is important to remember the motivation for embedding a language in the first place. The motivation was not to include a complex general-purpose programming environment, but to make available means for expressing patterns of use and for rapid access to the Rosetta kernel. Therefore, an extension language for Rosetta should be considered an add-on component and should be included within the existing design philosophy. The extension should not dominate the application in any way. Design aside, an extension should dominate neither the size nor the speed of Rosetta. A small, simple and efficient implementation without excessive features is thus desirable. In particular, a feature such as executable memory images would be excessive.

Footnote 4: GNU Emacs is an example of this design philosophy.

7.2.4 Implementation Summary

A summary of the implementations that have been evaluated is shown in Table 7.1. Compliance states which standard document, if any, each implementation complies with. Platforms indicates the platforms to which the implementation is known to have been ported (not necessarily all those to which it can be ported). Type is either embeddable or stand-alone, referring to the classification described above. Obj. Sys. and Expl. Env's mark implementations that include an object system and an explicit environment representation, respectively.

Name        Compliance  Platforms                 Type         Obj. Sys.  Expl. Env's
Elk         [CE91]      Unix                      Embeddable   yes        yes
Guile       unknown     Unix                      Embeddable
libscheme   unknown     Unix, Windows NT          Embeddable
MzScheme    [CE91]      Unix, Windows NT, MacOS   Embeddable   yes
RScheme     unknown     Unix                      Stand-alone  yes
Scheme 48   [CE91]      Unix, Windows NT          Stand-alone
SCM         [KCE98]     Unix, MacOS, others       Stand-alone
VSCM        [CE91]      Unix                      Stand-alone

Table 7.1: Existing Implementation Summary

7.2.5 Elk

Elk [LB94] (Extension Language Kit) is a Scheme system designed to be embedded in applications. The motivation behind the design of Elk is the recognition of the need for a general-purpose extension language system that can easily be embedded, together with the growing popularity of Lisp-like languages. As such, Elk includes a library interface for easy embedding and many other features that make it suitable as an extension language. However, Elk is a rather large and complex implementation. It includes, for example, executable memory images, a clearly undesired feature. A perhaps more serious problem is that when an application is integrated with Elk, the control in the resulting executable rests within Elk. That is, the original application must be started from the interpreter. Another possible problem is Elk's implementation of continuations, which is not strictly portable.

7.2.6 Guile

Like Elk, the Free Software Foundation's (FSF) Guile is designed to be used as an extension language. Guile is a response to the increasing use of "cryptic", poorly designed languages such as Tcl. The motivation behind Guile is that a common general-purpose extension language that can be embedded in all kinds of applications is needed.

At the present time, Guile is in the development stage. Many new ideas and concepts have been implemented, and even more have been suggested. An interesting development effort is the construction of interpreters for other languages, such as Tcl and Perl, on top of Guile. Guile seems to be a very promising candidate for a standard extension language, and is intended to be the FSF's official extension language.

7.2.7 libscheme

libscheme [Ben94] is a simple C library that implements the functionality of a Scheme interpreter. The interface is provided through a single C header file. libscheme is a stripped-down implementation of the core Scheme features, without any extensions. In fact, proper tail recursion has not been implemented, which makes the implementation extremely small and simple. The external interface offers functions to access the various parts of the interpreter, manipulate environments, add new primitive functions, add new syntax and special forms, and add new types. libscheme uses Hans Boehm's popular conservative garbage collector gc [BW88]. The library is distributed for Unix operating systems, but is portable to any platform that gc can be ported to, e.g. Windows NT. libscheme is not based on an abstract machine architecture and does not make any notable code optimizations, and is thus not very fast. Due to its simplicity, the implementation is very small compared to the others.

7.2.8 MzScheme

PLT MzScheme [Fla97b] is one of many Scheme-related software packages produced by the Programming Languages Group at Rice University. MzScheme is a high-end implementation that includes numerous extensions to [CE91], and is readily available on a number of platforms. The external interface is roughly identical to, and compatible with, that of libscheme. An object system that includes new primitive data types for objects and classes is included, and will thus conflict with the model proposed in Chapter 6. The interface allows C++ classes to be installed as classes in the object system. Other major extensions are:

• Pre-emptive threads
• Structure data type
• First-class compilation units
• Regular expression matching
• Simple TCP communications


As a consequence of the extensive set of features, MzScheme is a fairly large and complex implementation. It is unusually well documented; see [Fla97b, Fla97a, Fla97c]. MzScheme is written in C.

7.2.9 RScheme

RScheme is a stand-alone, abstract machine based Scheme implementation that includes an object system, a module system and thread support. It is designed to be very portable, but suffers from an almost complete lack of documentation.

7.2.10 Scheme 48

The design philosophy behind Scheme 48 [RK95] is to provide an efficient, straightforward and correct implementation of R4RS. It is based on an abstract machine architecture, contains few nonstandard features, and has no embedding functionality.

7.2.11 SCM

SCM is a plain stand-alone Scheme interpreter. It includes some minor and, with respect to being used as an extension language, unimportant additions to the standard. The implementation is portable to a wide range of platforms, but a fully functional Windows NT port is not yet available. It does not offer any external interface to be used for embedding.

7.2.12 VSCM

VSCM is a stand-alone Scheme implementation. It is based on an abstract machine architecture, and includes features such as dumping of memory images. In fact, the interpreter needs to boot from such an image, making it quite unsuitable.

7.3 Implementing from Scratch

The alternative to using an existing implementation is to construct a Scheme implementation from scratch. This is certainly a feasible approach, given Scheme's small definition and simple syntax. The advantage of this approach is that the extensions mentioned in Section 7.2.1 can be a part of the design from the start. Also, full control is retained over the parameters discussed in Section 7.2.3. Typically, such an implementation would be a direct mapping of R5RS and would not include any extensions other than those needed for integration with Rosetta. The pragmatic issues discussed in Section 7.1 must be addressed.


7.3.1 Garbage Collection

Rosetta's handle mechanism (Section 3.2.1) cannot be used to trace memory allocation in Scheme, because of the restriction on cycles. Thus, a new strategy must be used. As the implementation of a garbage collector would be outside the scope of the project, a usable existing implementation must be found.

7.3.2 Architecture

Design and implementation of an abstract machine would require a large part of the time resources available for the project, and is not necessarily very important. Thus, an implementation from scratch should be based on a naive architecture.

7.3.3 Tail Recursion and Continuations

Apart from being required by the Scheme standard, proper tail recursion is highly desirable from a performance viewpoint. An implementation must implement tail recursion within the semantics of C++. Continuations are a subtle issue, especially in the absence of an abstract machine architecture. This feature is not crucial in the context of a small extension language; hence a simple Scheme implementation could relinquish this requirement.
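One well-known way of obtaining tail calls in constant control space within the semantics of C++ is a trampoline: a procedure returns a description of its tail call instead of performing it, and a driver loop performs the call. The sketch below only illustrates this general technique; it is not taken from libscheme or from the implementation in Appendix A, and the names Step, finish, tailCall, trampoline and sum are hypothetical.

```cpp
#include <functional>
#include <utility>

// Result of one evaluation step: either a final value or a pending tail call.
struct Step {
    bool done;
    long long value;             // meaningful when done is true
    std::function<Step()> next;  // meaningful when done is false
};

Step finish(long long v) { return Step{true, v, nullptr}; }
Step tailCall(std::function<Step()> f) { return Step{false, 0, std::move(f)}; }

// Driver loop: pending tail calls are performed iteratively, so the C++
// stack does not grow with the number of tail calls.
long long trampoline(Step s) {
    while (!s.done) {
        std::function<Step()> f = std::move(s.next);
        s = f();
    }
    return s.value;
}

// Scheme: (define (sum n acc) (if (= n 0) acc (sum (- n 1) (+ acc n))))
Step sum(long long n, long long acc) {
    if (n == 0) return finish(acc);
    return tailCall([n, acc] { return sum(n - 1, acc + n); });
}
```

A call such as trampoline(sum(100000, 0)) runs in a loop rather than through one hundred thousand nested C++ calls. The price is an allocation or copy per tail call, which is one reason why an abstract machine is usually the faster solution.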

7.4 Conclusion

Some of the pragmatic aspects of interpreting Scheme have been briefly presented, the classification and the common and desirable features of existing implementations have been discussed, a selection of existing implementations has been presented, and some aspects of constructing an implementation from scratch have been examined. A decision must now be made as to whether an existing implementation should be used or one implemented from scratch. If an ideal existing implementation were available, the decision would be easy. Although Scheme is a small programming language, implementing it from scratch would still take a considerable amount of time for design, implementation, testing, etc. Thus, the existing implementations will be reviewed first.

Clearly, embeddable interpreters are preferable to stand-alone interpreters. SCM does not seem like a good candidate due to portability problems, nor does VSCM due to the boot image feature. RScheme's poor documentation, in addition to it being a stand-alone interpreter, all but rules it out. Scheme 48 seems like a candidate, although it is a stand-alone interpreter. It is not a very good candidate, however, because its documentation also turns out to be poor.

Elk stands out as a very good candidate at first sight. It has almost everything one could want from an embeddable implementation, and then some, unfortunately. Recall that an important design parameter was that the embedded language should not dominate the application. Clearly, this is incompatible with Elk's design, where the interpreter runs the application instead of the other way around. Suppose, however, that this requirement could be relaxed. Evidence of successful porting of Elk to Windows NT has not been found. Electronic mail correspondence with Elk's designer, Oliver Laumann (footnote 5), reveals that NT ports allegedly have been done in commercial applications, but that none of them are freely available. Elk has been available for many years, and the fact that an NT port has failed to appear must be taken, at least partly, as evidence of porting problems. The time available does not allow exhaustive porting experiments, so Elk must regrettably be discarded.

Guile would also be an almost perfect candidate if it were finished, but at the time of writing it is still in the development stage. Given also that its portability to NT is doubtful, Guile must be discarded too.

libscheme, the smallest implementation considered, also seems like a very good candidate because of its size. It has been ported to NT, although the port is not available. However, few porting problems are expected, because of the implementation's simple design and the fact that the garbage collector, gc, compiles directly under NT. Of course, the lack of tail recursion is undesirable.

Although a fairly large and complex implementation with many (for our purpose) unnecessary features, MzScheme is also an interesting implementation. It compiles directly on Windows NT and is very well documented. MzScheme's size and complexity, however, make integrating MzScheme with Rosetta a larger task than using, say, libscheme.

To recapitulate, two existing implementations have surfaced as good candidates: libscheme and MzScheme. The alternative of constructing a new implementation must also be considered. As argued above, the choice would be easy if a perfect existing implementation were available. But this goes the other way around too: implementing from scratch would probably take fewer resources than using the wrong existing implementation. Given Scheme's simplicity, the time spent struggling with problems related to, e.g., porting or the extensions specified in Chapter 6 could quickly exceed the time needed to construct an implementation from scratch. And as it is, the two candidates are not perfect. The solution is a compromise.

7.4.1 An Implementation Strategy

The Scheme interpreter to be embedded in Rosetta will be implemented using libscheme. The reasons for this decision are:

• libscheme is designed as a library with the sole intention of being used by other applications.
• libscheme is a very small and simple implementation, and little effort is expected to be needed to extend it as proposed in the previous chapter. Presumably, an implementation as simple as libscheme will cause few problems and will not require as much effort as creating a new implementation would.
• libscheme's interface is shared by MzScheme. If libscheme fails to meet the performance requirements, on the grounds of lack of tail recursion and a naive architecture, it can be replaced by MzScheme with little or no change to the interface code.

Footnote 5: Technische Universität Berlin, Germany


Chapter 8

Implementation

A Scheme interpreter has been implemented inside Rosetta, and integrated in the way described in Chapter 6, using libscheme. The source code is included in Appendix A. The libscheme source code (except for the main header file) has been omitted because of limited space, although a range of changes and extensions have been made to that code. As the source code should be well structured and documented, only a brief overview is presented here.

8.1 Design Parameters

It has been of prime importance to follow Rosetta's design philosophy when implementing the interpreter. Any extension should be an integral part of the application's source code. As such, some of the goals have been:

• The interpreter should interfere as little as possible with the rest of the application. In other words, as few changes as possible should be introduced to the existing source code.
• Rosetta's existing components should be reused as much as possible. A new concept should not be introduced if there already is a similar concept in Rosetta.
• The interface between the interpreter and Rosetta should not reveal any details of the internal interpreter implementation, in particular libscheme.
• If similar problems have already been solved in Rosetta, new problems should be solved in a similar way.
• Rosetta's existing code layout standard should be followed.

8.2 Integration

The interpreter exhibits the model constructed in Chapter 6. Implementing this model consisted mainly of solving two problems: the universal environment and objects as closures.


8.2.1 The Universal Environment

libscheme's Scheme_Env struct type was changed to a class type, renamed to SchemeEnvironment, and augmented with a lookup function. libscheme's own lookup function was changed to call this function. A SchemeUniversalEnv class is derived from SchemeEnvironment, which takes advantage of C++'s polymorphism so that the lookup is performed in the universal environment when necessary. Instances of SchemeUniversalEnv are created by passing the constructor a pointer to a Project object. This pointer is used to look up names in the project tree.
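The lookup chain described above can be illustrated with a minimal, self-contained sketch. It is not the Rosetta or libscheme code: the names EnvSketch, UniversalEnvSketch and ProjectSketch are hypothetical, values are reduced to plain integers, and the project tree is reduced to a flat map. It only shows how a virtual lookup function lets the outermost environment fall back to the project when a name has no ordinary binding.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the project tree: object names mapped to ids.
struct ProjectSketch {
    std::map<std::string, int> objects;
};

// Plays the role of SchemeEnvironment: a frame of bindings plus a parent.
class EnvSketch {
public:
    explicit EnvSketch(EnvSketch* parent = nullptr) : parent_(parent) {}
    virtual ~EnvSketch() {}

    void Define(const std::string& name, int value) { bindings_[name] = value; }

    // Returns true and sets 'value' if 'name' is bound in this frame or in
    // an enclosing environment.
    virtual bool Lookup(const std::string& name, int& value) const {
        auto it = bindings_.find(name);
        if (it != bindings_.end()) {
            value = it->second;
            return true;
        }
        return parent_ != nullptr && parent_->Lookup(name, value);
    }

private:
    EnvSketch* parent_;
    std::map<std::string, int> bindings_;
};

// Plays the role of SchemeUniversalEnv: the outermost environment, which
// consults the project when no ordinary binding exists.
class UniversalEnvSketch : public EnvSketch {
public:
    explicit UniversalEnvSketch(const ProjectSketch* project)
        : project_(project) {}

    bool Lookup(const std::string& name, int& value) const override {
        if (EnvSketch::Lookup(name, value)) return true;
        auto it = project_->objects.find(name);
        if (it == project_->objects.end()) return false;
        value = it->second;
        return true;
    }

private:
    const ProjectSketch* project_;
};
```

Because the global environment simply has the universal environment as its parent, an ordinary Scheme definition shadows a project object of the same name, which matches the idea of Rosetta as an enclosing environment.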

8.2.2 Objects As Closures

libscheme's Scheme_Object struct was augmented with a handle pointing to an instance of a class derived from SchemeMessageDispatcher. When SchemeUniversalEnv::Lookup performs a successful lookup in the project tree, a SchemeMessageWrapper instance is created as a wrapper around the particular Rosetta object. Then, a Scheme_Object object is created, the type field is set to indicate a closure, and the handle is set to point to the wrapper. libscheme's evaluation function has been changed so that, for closures with a non-NULL handle, the Dispatch method of the object pointed to by the handle is invoked. This method dispatches on the first argument, presumably a textual message, and calls the appropriate member function of the associated Rosetta object. The addition of a new field to Scheme_Object is not very elegant, but it was the only alternative to a major rewrite of libscheme.

At present, two classes derive from SchemeMessageDispatcher: SchemeStructureDispatcher and SchemeAlgorithmDispatcher. These are used as wrappers around structure and algorithm objects, respectively. The dispatch functions typically dispatch on object type in addition to messages.
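The message passing style can be sketched in isolation as follows. This is an illustration under simplifying assumptions, not the code in Appendix A: TableSketch and TableDispatcherSketch are hypothetical names, the messages "count" and "entry" are invented for the example, and arguments are passed as strings rather than as Scheme objects.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a Rosetta structure such as a decision table.
struct TableSketch {
    std::vector<std::vector<int> > rows;
    int Count() const { return static_cast<int>(rows.size()); }
    int Entry(int row, int col) const { return rows.at(row).at(col); }
};

// Plays the role of a class derived from SchemeMessageDispatcher.
class TableDispatcherSketch {
public:
    explicit TableDispatcherSketch(const TableSketch* table) : table_(table) {}

    // args[0] is the message; the remaining elements are its arguments.
    int Dispatch(const std::vector<std::string>& args) const {
        if (args.empty()) throw std::invalid_argument("missing message");
        const std::string& message = args[0];

        if (message == "count") {
            if (args.size() != 1)
                throw std::invalid_argument("count takes no arguments");
            return table_->Count();
        }
        if (message == "entry") {
            if (args.size() != 3)
                throw std::invalid_argument("entry takes two arguments");
            return table_->Entry(std::stoi(args[1]), std::stoi(args[2]));
        }
        // Unhandled message
        throw std::invalid_argument("unknown message: " + message);
    }

private:
    const TableSketch* table_;
};
```

From the Scheme side, an object wrapped this way behaves like a procedure whose first argument selects the operation, so a call of the hypothetical form (table 'entry 1 0) would correspond to Dispatch({"entry", "1", "0"}) in the sketch.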

8.3 User Interface

The interpreter is installed as an algorithm, which is applicable to the top-level project node only. SchemeInterpreter::Apply creates a SchemeUniversalEnv object, and a SchemeEvaluator object with the former as a parameter to the latter. The SchemeInterpreter is in reality an InteractiveSchemeInterpreter, the only non-virtual class derived from SchemeInterpreter. InteractiveSchemeInterpreter uses a document/view template to create a related frame, document and view. The document is an instance of CSchemeInterpreterDoc, and is given the SchemeEvaluator object as an argument.

SchemeEvaluator objects can evaluate Scheme expressions. They are initialized with Scheme input and output ports. SchemeEvaluator::Eval reads the next expression from the input port, evaluates it, and sends the result to the output port. Essentially, CSchemeInterpreterDoc just holds a SchemeEvaluator object.

The view created by the template is a CViewSchemeInterpreter object. This class is derived from the MFC class CEditView, and appears on the screen as an edit window. The main task of this class is to channel the user input to the document (which just forwards it to the SchemeEvaluator), and to print the result returned. This is conceptually simple, but the implementation required quite a lot of overhead. The view object creates SchemeEditCtrlInputPort and SchemeEditCtrlOutputPort objects, which are passed on to the SchemeEvaluator object. These objects implement Scheme ports. The current mechanism is simple and not very robust, and should be redesigned. Generally, the GUI part of the interpreter can be regarded as a fully functional prototype that needs to be made more robust.
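The data flow between ports and evaluator can be sketched as follows. The sketch is deliberately reduced and all names in it are hypothetical: the ports read from and write to strings instead of an edit control, and the only expressions understood are integer literals, which evaluate to themselves. The Getc/Ungetc pair mirrors the interface of the input port class in Appendix A, and shows why one character of lookahead must be returned to the port.

```cpp
#include <cassert>
#include <cctype>
#include <string>

class InputPortSketch {
public:
    virtual ~InputPortSketch() {}
    virtual int Getc() = 0;     // next character, or -1 when exhausted
    virtual void Ungetc() = 0;  // step back one character
};

class OutputPortSketch {
public:
    virtual ~OutputPortSketch() {}
    virtual void Write(const std::string& text) = 0;
};

// Input port reading from a fixed string (an assumption of this sketch).
class StringInputPort : public InputPortSketch {
public:
    explicit StringInputPort(const std::string& text) : text_(text), pos_(0) {}
    int Getc() override {
        return pos_ < text_.size()
                   ? static_cast<unsigned char>(text_[pos_++])
                   : -1;
    }
    void Ungetc() override { if (pos_ > 0) --pos_; }
private:
    std::string text_;
    std::string::size_type pos_;
};

// Output port collecting everything written to it.
class StringOutputPort : public OutputPortSketch {
public:
    void Write(const std::string& text) override { buffer += text; }
    std::string buffer;
};

class EvaluatorSketch {
public:
    EvaluatorSketch(InputPortSketch* in, OutputPortSketch* out)
        : in_(in), out_(out) {}

    // Reads one expression, evaluates it and writes the result.
    // Returns false when the input port holds no further expression.
    bool Eval() {
        int c = in_->Getc();
        while (c != -1 && std::isspace(c)) c = in_->Getc();
        if (c == -1) return false;

        std::string token;
        while (c != -1 && !std::isspace(c)) {
            token += static_cast<char>(c);
            c = in_->Getc();
        }
        if (c != -1) in_->Ungetc();  // leave the delimiter for the next read

        // Only integer literals are understood; std::stol throws otherwise.
        long value = std::stol(token);
        out_->Write("=> " + std::to_string(value) + "\n");
        return true;
    }

private:
    InputPortSketch* in_;
    OutputPortSketch* out_;
};
```

Keeping the evaluator ignorant of where characters come from is what allows the same evaluator to be driven by an edit window, a file or a test string.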

8.4 libscheme

Because some of the changes introduced required C++ specific constructs, libscheme, being a C library, had to be compiled as C++ code. This entailed a few problems, and further changes to the libscheme code.

Another and, as it turned out, more serious problem with libscheme was its tight coupling to the Unix programming environment. First, input/output code was spread around the source code, and had to be rewritten. Second, libscheme's error control system is implemented with the C function longjmp. Unfortunately, longjmp does not respect C++ semantics. For instance, if an error is discovered in a function with a call chain involving class member functions, the functions never return, and the destructors of their local objects are not run. libscheme's error control system was therefore rewritten using C++ exceptions.

libscheme fails to implement proper tail recursion. Fortunately, instead of being syntactic sugar for a recursive construct, the iteration construct do is implemented iteratively in libscheme's source code. Iterative control behaviour can thus be achieved by using this construct.

Generally, considerable effort was needed to adapt libscheme.
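The difference that motivated the rewrite can be illustrated with a small sketch. It does not reproduce the rewritten error system; SchemeErrorSketch, ResourceSketch and the functions below are hypothetical names. The point is that when an error is signalled with an exception, every frame between the error and the top level is unwound and the destructors of its local objects run, which a longjmp out of the same call chain would skip.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical error type standing in for the interpreter's error signal.
class SchemeErrorSketch : public std::runtime_error {
public:
    explicit SchemeErrorSketch(const std::string& what)
        : std::runtime_error(what) {}
};

// Counts live instances so that a skipped destructor would be visible.
class ResourceSketch {
public:
    ResourceSketch() { ++live; }
    ~ResourceSketch() { --live; }
    static int live;
};
int ResourceSketch::live = 0;

// A deeply nested primitive that discovers an error.
int DivideSketch(int a, int b) {
    ResourceSketch scratch;
    if (b == 0) throw SchemeErrorSketch("division by zero");
    return a / b;
}

// An intermediate frame holding a resource of its own.
int ApplySketch(int a, int b) {
    ResourceSketch frame;
    return DivideSketch(a, b);
}

// Top level of a read-eval-print loop: an error ends only the current
// evaluation, and the loop can carry on with the next expression.
std::string EvalTopLevelSketch(int a, int b) {
    try {
        return std::to_string(ApplySketch(a, b));
    } catch (const SchemeErrorSketch& e) {
        return std::string("error: ") + e.what();
    }
}
```

After a failed evaluation the counter is back at zero, showing that both frames released their resources on the way out.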


Chapter 9

Experiences, Conclusions and Future Work

9.1 Experiences

The purpose of the example shown in this section is to illustrate practical use of Scheme. No interpretation of the numerical results will be presented.

A typical application of an embedded language in Rosetta is the calculation of various simple statistical properties. The problem of calculating the c-index (Mann & Whitney U test) has been approached by writing a Scheme program. The c-index statistic is a measure of how well a classifier (e.g. a set of rules) classifies a particular data set, given the correct classifications.

The calculation of the c-index can be divided into two parts. First, a list of pairs is generated. Each pair corresponds to one object in the data set. The second value in a pair reflects the correct classification, and the first value is a measure of the certainty that the object would be classified as a "positive" by the classifier. Figure 9.1 shows the Scheme code used to generate the list of pairs. getpairs-iter generates a list of pairs from a decision table and a classifier. In each iteration, a pair is calculated by makepair. The first element of a pair is equal to the certainty measure returned by the classifier if the predicted value is "positive", or one minus the certainty returned if not. The picture is complicated by the fact that there is no guarantee that the classifier is able to classify a given object. The fallbackval and fallbackcert parameters to getpairs-iter specify, respectively, a fallback value and a fallback certainty. If an object cannot be classified, the fallback value is used with the associated certainty; see the ranking and predicted procedures. The last argument, one, is the classification value that is to be regarded as "positive".

The c-index is then calculated by processing the list, as shown in Figure 9.2. pairs2cindex takes a decision table, a set of rules, a list of pairs and the "positive" value, and returns the c-index. The actual computation is rather technical. It utilizes count to calculate results for each pair. cindex is a procedure that sets up the parameters, and combines pairs2cindex and getpairs-iter.
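The second part of the calculation can be restated compactly outside Scheme. The following sketch follows the counting scheme of Figure 9.2 as read here: every "positive" object is compared with every other object, a comparison counts as concordant if the positive object has the higher certainty and as a tie if the certainties are equal, and the result is (concordant + ties/2) divided by the number of comparisons. The names CIndexSketch and RankedObject are hypothetical, and the return value of -1.0 for an input without any comparison is an assumption of the sketch, not something the Scheme code handles.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// first: certainty that the object is "positive";
// second: the actual decision value of the object.
typedef std::pair<double, int> RankedObject;

// Returns the c-index, or -1.0 if there is no positive/non-positive pair
// to compare (an assumption made for this sketch).
double CIndexSketch(const std::vector<RankedObject>& objects, int positive) {
    long comparisons = 0;
    long concordant = 0;
    long ties = 0;

    for (const RankedObject& p : objects) {
        if (p.second != positive) continue;      // p must be a positive
        for (const RankedObject& q : objects) {
            if (q.second == positive) continue;  // q must not be a positive
            ++comparisons;
            if (p.first > q.first) {
                ++concordant;
            } else if (p.first == q.first) {
                ++ties;
            }
        }
    }

    if (comparisons == 0) return -1.0;
    return (concordant + ties / 2.0) / comparisons;
}
```

As a small worked case, take the pairs (0.9, 2), (0.8, 2), (0.3, 4) and (0.8, 4) with 2 as the positive value. There are four comparisons, of which three are concordant and one is a tie, giving (3 + 0.5) / 4 = 0.875.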


; classification functions:
(define (ranking classification fallbackcert)
  (if classification
      (classification 'getcertaintycoefficient 0)
      fallbackcert))

(define (actual vector decattr)
  (vector 'getentry decattr))

(define (predicted classification fallbackval)
  (if classification
      (classification 'getdecisionvalue 0)
      fallbackval))

; make ranking/actual pair from vector, decision attribute and classifier.
(define (makepair infvector decattr classifier fallbackval fallbackcert one)
  (letrec ((rank (lambda () (ranking (classification) fallbackcert)))
           (classification (lambda () (classifier 'apply infvector))))
    (cons (if (= (predicted (classification) fallbackval) one)
              (rank)
              (- 1 (rank)))
          (actual infvector decattr))))

; getpairs constructs a list of (Scheme) pairs.
(define (getpairs-iter dt classifier fallbackval fallbackcert one)
  (do ((i 0 (+ i 1))
       (size (dt 'getnoobjects #f))
       (pairs '())
       (da (dt 'getdecisionattribute #f))
       (vector 'foo))
      ((= i size) pairs)
    (set! vector (dt 'createinformationvector i #f))
    (let ((pair (makepair vector da classifier fallbackval fallbackcert one)))
      (set! pairs (append pairs (list pair))))))

Figure 9.1: C-index list of pairs.


(define (count pairlist pair one)
  (do ((rest pairlist (cdr rest))
       (pairs 0 (+ pairs (if (= one (cdar rest)) 0 1)))
       (c 0 (+ c (if (= one (cdar rest))
                     0
                     (if (> (car pair) (caar rest)) 1 0))))
       (t 0 (+ t (if (= one (cdar rest))
                     0
                     (if (= (car pair) (caar rest)) 1 0)))))
      ((null? rest) (list pairs c t))
    'foo))

(define (pairs2cindex dt rules pairlist one)
  (do ((rest pairlist (cdr rest))
       (ones 0 (+ ones (if (= one (cdar rest)) 1 0)))
       (counts '(0 0 0) (if (= one (cdar rest))
                            (count pairlist (car rest) one)
                            '(0 0 0)))
       (pairs 0 (+ pairs (car counts)))
       (c 0 (+ c (cadr counts)))
       (t 0 (+ t (caddr counts))))
      ((null? rest)
       ; Add up last counts, and calculate c-index
       (let ((pairs (+ pairs (car counts)))
             (c (+ c (cadr counts)))
             (t (+ t (caddr counts))))
         (/ (+ c (/ t 2)) pairs)))
    'foo))

(define (cindex dt rules)
  ; set up the parameters
  (let ((one 2)                   ; "positive" classification
        (fallbackvalue 4)         ; fallback to fallbackvalue with certainty
        (fallbackcertainty 0.43)  ; fallbackcertainty if classifier fails.
        (classifier majorityvoter))
    ; set up the classifier
    (classifier 'setrules rules)
    ; compute
    (pairs2cindex dt rules
                  (getpairs-iter dt classifier fallbackvalue fallbackcertainty one)
                  one)))

Figure 9.2: Calculation of c-index from list of pairs.


The program was run on a sample data set: a breast cancer database from Dr. William H. Wolberg, University of Wisconsin Hospitals, Madison, USA. The data set contains 400 objects with 11 attributes corresponding to clinical data. The decision attribute indicates one of two decision classes, and has the value 2 (respectively 4) for a benign (respectively malignant) case. To create a classifier, the following steps were taken:

1. The decision table was split into a training set and a testing set of 268 and 132 objects, respectively.
2. Both sets were discretized.
3. A set of 50 reducts was extracted from the training set.
4. A rule set was generated from the reducts.

The cindex procedure was set up as shown in Figure 9.2. The value 4 is used as the fallback value, and the fallback certainty is chosen to be the a priori distribution of the particular classification, in this case 43%. cindex was run with the testing set and the rule set as arguments. The interaction with the Scheme interpreter is shown in Figure 9.3. Figure 9.4 shows the setup.

This is Scheme.
Copyright Thomas Ågotnes 1998, libscheme 0.5 Copyright Brent Benson 1994.

> (load "c:\\rosetta\\scheme\\samples\\pairs\\cindex.scm")
=> cindex
> (cindex testing myrules)
=> 0.928304
>

Figure 9.3: Calculating the c-index. Interaction with the interpreter.

9.1.1 Performance Issues

The Scheme code shown here is for illustrative purposes, and has not been optimized in any sense. For example, it includes many redundant procedure activations. The first attempt to generate the list of pairs was a recursive procedure, shown in Figure 9.5. Observe that the recursive call is not in tail position. On large data sets, this procedure exhausted the available memory.


Figure 9.4: Setup for calculating c-index.

9.2 Conclusions

It is still too early to conclude whether Scheme is a good choice of programming language. The real test is the extent to which people will actually use the interpreter.

The integration model constructed in this report seems viable in practical programming. The model is both sound and easy to use, requiring little overhead to access Rosetta objects.

The selection of libscheme may be regarded as a bad choice. A considerable part of the available time has been spent on libscheme, and it is not certain that creating a new implementation would have required significantly more time.

Practical experience shows that Scheme can be used to solve real-world problems in Rosetta. However, it also shows that the programmer must take pragmatic concerns into consideration, especially concerning memory usage. It is important to restrict deep recursion. Fortunately, Scheme offers other programming techniques that can be used both as a supplement and as an alternative to recursion. The problems concerning memory management can be suspected to be related to libscheme. A more efficient implementation would probably solve at least part of the problem.


(define (getpairs-rek dt i classifier fallbackval fallbackcert one)
  (let ((infvector (lambda () (dt 'CreateInformationVector i #f)))
        (decisionattr (dt 'GetDecisionAttribute #f)))
    (if (= i (- (dt 'getnoobjects #f) 1))
        '()
        (let ((pair (makepair (infvector) decisionattr classifier
                              fallbackval fallbackcert one)))
          (cons pair
                (getpairs-rek dt (+ i 1) classifier
                              fallbackval fallbackcert one))))))

Figure 9.5: Recursive version of getpairs-iter.

9.3 Future Work

The existing interpreter implementation is fully functional, but both an overhaul and extensions are in order. Some aspects are:

• Documentation of the available methods and corresponding messages, and of their arguments, must be made available to Rosetta users.
• The GUI part of the interpreter should be overhauled to make it more robust.
• Alternatives to libscheme should be considered. Creating a new implementation from scratch would make the Rosetta source code cleaner and more uniform. MzScheme is also worth considering.
• Library functionality for Scheme code should be looked at.
• Means for registering Scheme code as algorithms in the GUI should be investigated.
• Means for registering objects of arbitrary Scheme types in the project tree would be useful for attaching general results to their related structures, and should be investigated.


Appendix A

Source Code

A.1 InteractiveSchemeInterpreter.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __INTERACTIVESCHEMEINTERPRETER_H__
#define __INTERACTIVESCHEMEINTERPRETER_H__

#include "SchemeInterpreter.h"

//-------------------------------------------------------------------
// Identifier declarations.
//===================================================================

DeclareId(INTERACTIVESCHEMEINTERPRETER)

//-------------------------------------------------------------------
// Class.........: InteractiveSchemeInterpreter
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Starts interactive window based Scheme interpreter.
// Revisions.....:
//===================================================================

class InteractiveSchemeInterpreter : public SchemeInterpreter {
public:

  //- Constructors/destructor
  InteractiveSchemeInterpreter();
  virtual ~InteractiveSchemeInterpreter();

protected:

  //- Methods inherited from SchemeInterpreter
  virtual void go(Handle schemer) const;

};

#endif

A.2 InteractiveSchemeInterpreter.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "InteractiveSchemeInterpreter.h"
#include

//-------------------------------------------------------------------
// Identifier implementations.
//===================================================================

ImplementId(INTERACTIVESCHEMEINTERPRETER, "InteractiveSchemeInterpreter", "Interactive Scheme Interpreter")

//-------------------------------------------------------------------
// Methods for class InteractiveSchemeInterpreter.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

InteractiveSchemeInterpreter::InteractiveSchemeInterpreter()
{
}

InteractiveSchemeInterpreter::~InteractiveSchemeInterpreter()
{
}

//-------------------------------------------------------------------
// Method........: go
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......: Creates a document containing a Scheme evaluator,
//                 and opens frame/view with read-eval-print loop.
// Revisions.....:
//===================================================================

void
InteractiveSchemeInterpreter::go(Handle schemer) const
{
  // Create document to evaluate expressions in the context of the project
  CSchemeInterpreterDoc *pNewDoc = new CSchemeInterpreterDoc(schemer);

  // Create interpreter window with read-eval-print loop
  CFrameWnd* pNewFrame = theApp.m_SchemeInterpreterTemplate->CreateNewFrame(pNewDoc, NULL);
  theApp.m_SchemeInterpreterTemplate->InitialUpdateFrame(pNewFrame, pNewDoc);
}

A.3 SchemeAlgorithmDispatcher.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEALGORITHMDISPATCHER_H__
#define __SCHEMEALGORITHMDISPATCHER_H__

#include "SchemeMessageDispatcher.h"
#include
#include "scheme.h"

//-------------------------------------------------------------------
// Class.........: SchemeAlgorithmDispatcher
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Wrapper for algorithm objects.
// Revisions.....:
//===================================================================

class SchemeAlgorithmDispatcher : public SchemeMessageDispatcher {
public:

  //- Constructors/destructor
  SchemeAlgorithmDispatcher(Handle a);
  SchemeAlgorithmDispatcher() {}

  //- Access
  virtual Handle Dispatch(int argc, Handle argv[]);
  virtual bool   IsA(Id id);
  virtual Handle GetObject();

protected:

  Handle algorithm_;

};

#endif

A.4 SchemeAlgorithmDispatcher.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeAlgorithmDispatcher.h"
#include "scheme.h"
#include
#include
#include
#include
#include

//-------------------------------------------------------------------
// Methods for class SchemeAlgorithmDispatcher.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

SchemeAlgorithmDispatcher::SchemeAlgorithmDispatcher(Handle a)
{
  algorithm_ = a;
}

//-------------------------------------------------------------------
// Method........: IsA
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......:
// Revisions.....:
//===================================================================

bool
SchemeAlgorithmDispatcher::IsA(Id id)
{
  return algorithm_->IsA(id);
}

//-------------------------------------------------------------------
// Method........: Dispatch
// Author........: Thomas Ågotnes
// Date..........:
// Description...: The message dispatch function.
// Comments......:
// Revisions.....:
//===================================================================

Handle
SchemeAlgorithmDispatcher::Dispatch(int argc, Handle argv[])
{
  Handle wrapper;
  String ident = "Algorithm";

  // Assert first argument (message)
  Assert(argc > 0, ident, SchemeErrorMsg::ArgNum(1));
  Assert(argv[0]->IsSymbol(), ident, SchemeErrorMsg::ArgType("first", "symbol"));

  String message = argv[0]->SymbolVal();

  // First dispatch on self type, then on message
  if (IsA(CLASSIFIER)) {
    ident = "Classifier";
    Handle classifier = dynamic_cast(Classifier *, algorithm_.GetPointer());
    if (message == "setrules!") {
      // Assert arguments
      Assert(argc == 2, ident, message, SchemeErrorMsg::ArgNum(1));
      Assert(argv[1]->IsRosettaObject(RULES), ident, message, SchemeErrorMsg::ArgType("first", "Rules"));
      return (new SchemeObject(classifier->SetRules(argv[1]->GetRules().GetPointer())));
    }
  }

  // Generic Algorithm methods
  if (message == "apply") {
    // Assert arguments
    Assert(argc == 2, ident, message, SchemeErrorMsg::ArgNum(1));
    Assert(argv[1]->IsRosettaObject(STRUCTURE), ident, message, SchemeErrorMsg::ArgType("first", "Structure"));

    // Apply algorithm
    Handle arg = argv[1]->GetStructure();
    Handle result = algorithm_->Apply(*arg);

    // Get rid of those progress/continue windows
    Message::Reset();

    // Application failed, return #f
    if (result == NULL)
      return new SchemeObject((bool) FALSE);
      //Fail("Algorithm::Apply failed.");

    // Wrap up result
    wrapper = new SchemeStructureDispatcher(result);
    return (new SchemeObject(wrapper));
  }
  else if (message == "setparameters") {
    // Assert arguments
    Assert(argc == 2, ident, message, SchemeErrorMsg::ArgNum(1));
    Assert(argv[1]->IsString(), ident, message, SchemeErrorMsg::ArgType("first", "string"));
    return (new SchemeObject(algorithm_->SetParameters(argv[1]->StringVal())));
  }
  else if (message == "getparameters") {
    // Assert parameters (none)
    Assert(argc == 1, ident, message, SchemeErrorMsg::ArgNum(0));
    return (new SchemeObject(algorithm_->GetParameters()));
  }
  // Unhandled Message
  else {
    UnknownMessage(ident, message);
  }

  return NULL;
}

//-------------------------------------------------------------------
// Method........: GetObject
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......:
// Revisions.....:
//===================================================================

Handle
SchemeAlgorithmDispatcher::GetObject()
{
  return algorithm_.GetPointer();
}

A.5 SchemeEditCtrlInputPort.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEEDITCTRLINPUTPORT_H__
#define __SCHEMEEDITCTRLINPUTPORT_H__

#include "SchemeInputPort.h"

//-------------------------------------------------------------------
// Class.........: SchemeEditCtrlInputPort
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Scheme input port attached to an edit control.
// Revisions.....:
//===================================================================

class SchemeEditCtrlInputPort : public SchemeInputPort
{
protected:

  //- Local data
  CEdit *editctrl_;
  int    pos_;
  String buffer_;
  bool   closed_;

public:

  //- Exceptions
  class ReadPanic {};

  //- Constructors/destructor
  SchemeEditCtrlInputPort(CEdit *editctrl);
  virtual ~SchemeEditCtrlInputPort();

  //- Inherited from SchemeInputPort
  virtual void Reset();
  virtual void Update();
  virtual void Close();

protected:

  //- Inherited from SchemeInputPort
  virtual int  Getc() throw (ReadPanic);
  virtual void Ungetc();
  virtual int  CharReady();

};

#endif

A.6 SchemeEditCtrlInputPort.cpp //------------------------------------------------------------------// Author........: Thomas gotnes // Date..........: // Description...: // Revisions.....: //===================================================================

91

APPENDIX A. SOURCE CODE

92 #include "stdafx.h" #include "rosetta.h" #include "SchemeEditCtrlInputPort.h"

//------------------------------------------------------------------// Methods for class ManualScaler. //=================================================================== //------------------------------------------------------------------// Constructors/destructor. //=================================================================== SchemeEditCtrlInputPort::SchemeEditCtrlInputPort(CEdit *editctrl) { editctrl_ = editctrl; closed_ = FALSE; Reset(); } SchemeEditCtrlInputPort::~SchemeEditCtrlInputPort() { } //------------------------------------------------------------------// Method........: Reset // Author........: Thomas gotnes // Date..........: // Description...: Resets position to the end of the buffer. // Comments......: // Revisions.....: //=================================================================== void SchemeEditCtrlInputPort::Reset() { int __dummy; Update(); // Read from cursor position editctrl_->GetSel(pos_, __dummy); } //------------------------------------------------------------------// Method........: Update

A.6. SCHEMEEDITCTRLINPUTPORT.CPP // Author........: Thomas gotnes // Date..........: // Description...: Gets new data from the edit control. // Comments......: // Revisions.....: //=================================================================== void SchemeEditCtrlInputPort::Update() { CString cstr; // Get text editctrl_->GetWindowText(cstr); buffer_ = cstr; } //------------------------------------------------------------------// Method........: Getc // Author........: Thomas gotnes // Date..........: // Description...: Returns next character from the buffer. // If no character is available, the function waits // in it's own message loop. // // Comments......: This is a crude prototype only, and should be // reimplemented. The message loop is needed to // avoid busy waiting. Other alternatives should // be considered. An exeption needs to be thrown // when the object is about to be destroyed, in order // to get out of the parsing before the port is // destroyed. This mechanism should also be redesigned. // Revisions.....: //=================================================================== int SchemeEditCtrlInputPort::Getc() { bool done = CharReady(); while ( !done ) { done = CharReady(); MSG msg;

93

APPENDIX A. SOURCE CODE

94

while ( ::PeekMessage( &msg, NULL, 0, 0, PM_NOREMOVE ) ) { if ( !theApp.PumpMessage( ) ) { done = TRUE; ::PostQuitMessage(0); break; } if (closed_) { throw ReadPanic(); } } // let MFC do its idle processing LONG lIdle = 0; while ( AfxGetApp()->OnIdle(lIdle++ ) ) ; } return buffer_[pos_++]; } //------------------------------------------------------------------// Method........: Ungetc // Author........: Thomas gotnes // Date..........: // Description...: Moves buffer pointer back. // Comments......: // Revisions.....: //=================================================================== void SchemeEditCtrlInputPort::Ungetc() { if (pos_ > 0) pos_--; } //------------------------------------------------------------------// Method........: CharReady // Author........: Thomas gotnes // Date..........: // Description...: True if there are unread characters in the buffer. // Comments......: // Revisions.....:

//===================================================================

int
SchemeEditCtrlInputPort::CharReady()
{
  return (pos_ < buffer_.GetLength());
}

//-------------------------------------------------------------------
// Method........: Close
// Author........: Thomas Ågotnes
// Date..........:
// Description...: Needed to interrupt the message loop.
// Comments......:
// Revisions.....:
//===================================================================

void
SchemeEditCtrlInputPort::Close()
{
  closed_ = TRUE;
}
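The comments for Getc describe the blocking message loop as a crude prototype. As a rough illustration only, the buffer bookkeeping (Update, Getc, Ungetc, CharReady) can be kept separate from the waiting strategy. The sketch below is not part of Rosetta or libscheme: the class name BufferedInputPort is hypothetical, std::string stands in for CString, and returning EOF when no character is available is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>   // EOF
#include <string>

// Hypothetical stand-in for the buffer logic of SchemeEditCtrlInputPort.
// It never waits for input: Getc() returns EOF when the buffer is
// exhausted, leaving the waiting strategy to the caller.
class BufferedInputPort {
public:
  BufferedInputPort() : pos_(0) {}

  // Replaces the buffer contents, e.g. with the text of an edit control.
  // The read position is kept, as in the listing's Update().
  void Update(const std::string &text) { buffer_ = text; }

  // True if there are unread characters in the buffer.
  bool CharReady() const { return pos_ < buffer_.size(); }

  // Returns the next character, or EOF if none is available.
  int Getc() {
    if (!CharReady())
      return EOF;
    return static_cast<unsigned char>(buffer_[pos_++]);
  }

  // Moves the read position one character back, if possible.
  void Ungetc() {
    if (pos_ > 0)
      pos_--;
  }

private:
  std::string buffer_;
  std::size_t pos_;
};
```

With this split, a caller can poll CharReady() from whatever waiting mechanism it prefers, and Getc() never indexes past the end of the buffer.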

A.7 SchemeEditCtrlOutputPort.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEDITCTRLOUTPUTPORT_H__
#define __SCHEMEDITCTRLOUTPUTPORT_H__

#include "SchemeOutputPort.h"

//-------------------------------------------------------------------
// Class.........: SchemeEditCtrlOutputPort
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:

//===================================================================

class SchemeEditCtrlOutputPort : public SchemeOutputPort
{
public:

  //- Constructors/destructor
  SchemeEditCtrlOutputPort(CEdit *editctrl);
  virtual ~SchemeEditCtrlOutputPort();

  virtual void Close();

protected:

  virtual void Write(String str);

  CEdit *editctrl_;
};

#endif

A.8 SchemeEditCtrlOutputPort.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeEditCtrlOutputPort.h"

//-------------------------------------------------------------------
// Methods for class SchemeEditCtrlOutputPort.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

SchemeEditCtrlOutputPort::SchemeEditCtrlOutputPort(CEdit *editctrl)
{
  editctrl_ = editctrl;

}

SchemeEditCtrlOutputPort::~SchemeEditCtrlOutputPort()
{
}

void
SchemeEditCtrlOutputPort::Write(String str)
{
  // Insert text at the current cursor location
  editctrl_->ReplaceSel(str.GetBuffer());
}

void
SchemeEditCtrlOutputPort::Close()
{
}

A.9 SchemeEnvironment.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeEnvironment.h"
#include "scheme.h"

//-------------------------------------------------------------------
// Methods for class SchemeEnvironment.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

SchemeEnvironment::SchemeEnvironment() {

}

SchemeEnvironment::~SchemeEnvironment()
{
}

//-------------------------------------------------------------------
// Method........: Lookup
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......: lookup function in regular environment.
// Revisions.....:
//===================================================================

Handle
SchemeEnvironment::Lookup(Handle obj)
{
  Scheme_Object *symbol = obj->GetRepr();
  int i;

  // try libscheme local lookup
  for ( i=0 ; iLookup(obj));

  // try libscheme global lookup
  Scheme_Object *val = scheme_lookup_global (symbol, this);
  if (val)
    return new SchemeObject(val);

  return NULL;
}
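Part of the local lookup loop above did not survive in the listing, and the sketch below does not attempt to reconstruct it. It only illustrates the general idea of looking a symbol up in the local frame first and then in the enclosing environments, which is the model the report proposes for viewing Rosetta as an enclosing environment. The class Env, its std::map storage and the integer values are all hypothetical simplifications; the actual class uses libscheme's symbol and value arrays and scheme_lookup_global.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, simplified environment: a frame of bindings plus a
// pointer to the enclosing environment (NULL for the outermost one).
class Env {
public:
  explicit Env(const Env *next = NULL) : next_(next) {}

  // Binds (or rebinds) a symbol in this frame only.
  void Define(const std::string &symbol, int value) {
    bindings_[symbol] = value;
  }

  // Looks the symbol up in this frame first, then in the enclosing
  // frames. Returns NULL if the symbol is unbound everywhere.
  const int *Lookup(const std::string &symbol) const {
    std::map<std::string, int>::const_iterator it = bindings_.find(symbol);
    if (it != bindings_.end())
      return &it->second;
    if (next_ != NULL)
      return next_->Lookup(symbol);
    return NULL;
  }

private:
  std::map<std::string, int> bindings_;
  const Env *next_;
};
```

An inner binding shadows an outer one with the same name, while symbols that are only defined further out remain visible from the inner environment.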

A.10 SchemeEnvironment.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEENVIRONMENT_H__
#define __SCHEMEENVIRONMENT_H__

#include #include #include #include #include

"KERNEL\BASIC\referent.h" "scheme.h" "SchemeObject.h" // Added by ClassView "gc_c++.h"

//-------------------------------------------------------------------
// Class.........: SchemeEnvironment
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: replacement for libscheme Scheme_Env.
// Revisions.....:
//===================================================================

class SchemeEnvironment : public gc
{
public:

  // libscheme Scheme_Env members:
  int num_bindings;
  struct Scheme_Object **symbols;
  struct Scheme_Object **values;
  Scheme_Hash_Table *globals;
  SchemeEnvironment *next;

public:

  //- Constructors/destructor
  SchemeEnvironment();
  virtual ~SchemeEnvironment();

  virtual Handle Lookup(Handle obj);

};

#endif

A.11 SchemeError.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEERROR_H__
#define __SCHEMEERROR_H__

#include "SchemeObject.h"

//-------------------------------------------------------------------
// Class.........: SchemeError
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

class SchemeError : public SchemeObject
{
public:

  //- Constructors/destructor
  SchemeError();
  SchemeError(String s);
  virtual ~SchemeError();

  virtual bool IsError() { return TRUE; }
};

#endif

A.12 SchemeError.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeError.h"
#include "scheme.h"

//-------------------------------------------------------------------
// Methods for class SchemeError.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

SchemeError::SchemeError() : SchemeObject(NULL)
{
  obj_ = scheme_make_type("#");
}

SchemeError::SchemeError(String s) : SchemeObject(NULL)
{
  String err = "#: ";
  err += s;
  obj_ = scheme_make_type(const_cast(char *, err.GetBuffer()));
}

SchemeError::~SchemeError()
{
}

A.13 SchemeEvaluator.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes

// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEEVALUATOR_H__
#define __SCHEMEEVALUATOR_H__

#include #include #include #include #include #include #include



//-------------------------------------------------------------------
// Class.........: SchemeEvaluator
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Evaluates scheme expressions
// Revisions.....:
//===================================================================

class SchemeEvaluator : public Referent
{
public:

  //- Constructors/destructor
  SchemeEvaluator(SchemeEnvironment *e);
  virtual ~SchemeEvaluator();

  virtual Handle Eval();
  virtual void SetOutputPort(Handle port);
  virtual void SetInputPort(Handle port);

protected:

  //- data
  SchemeEnvironment *env;
  Handle output_port_;
  Handle input_port_;
};

#endif

A.14 SchemeEvaluator.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include #include #include #include #include #include #include

"stdafx.h" "rosetta.h" "SchemeEvaluator.h" "scheme.h"

//-------------------------------------------------------------------
// Methods for class SchemeEvaluator.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

SchemeEvaluator::SchemeEvaluator(SchemeEnvironment *e)
{
  env = e;
  output_port_ = NULL;
  input_port_ = NULL;
}

SchemeEvaluator::~SchemeEvaluator()
{
}

//-------------------------------------------------------------------
// Method........: Eval
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......: Reads the next expression from the input port and
//                 evaluates it.
// Revisions.....:

//===================================================================

Handle
SchemeEvaluator::Eval()
{
  Scheme_Object *val;

  // Get expression from stream
  Scheme_Object *exp = scheme_read(input_port_->GetRepr());

  try {
    // Evaluate
    val = scheme_eval(exp, env);
  }
  catch(String s) {
    return new SchemeError(s);
  }

  return (new SchemeObject(val));
}

void
SchemeEvaluator::SetOutputPort(Handle port)
{
  // Save the port object (important!)
  output_port_ = port;

  // Set global libscheme variables
  cur_out_port = scheme_stdout_port = scheme_stderr_port = port->GetRepr();
}

void
SchemeEvaluator::SetInputPort(Handle port)
{
  // Save the port object (important!)
  input_port_ = port;

  // Set global libscheme variables
  cur_in_port = scheme_stdin_port = port->GetRepr();
}

A.15 SchemeInputPort.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEINPUTPORT_H__
#define __SCHEMEINPUTPORT_H__

#include "SchemeObject.h"
#include "scheme.h"

//-------------------------------------------------------------------
// Class.........: SchemeInputPort
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Virtual base class for Scheme input ports.
//
// Revisions.....:
//===================================================================

class SchemeInputPort : public SchemeObject
{
public:

  //- Constructors/destructor
  SchemeInputPort();
  virtual ~SchemeInputPort();

  //- Operations
  virtual void Reset() = 0;
  virtual void Update() = 0;
  virtual void Close() = 0;

protected:

  //- libscheme interface
  static int libschemeGetcFun(Scheme_Input_Port *port);
  static void libschemeUngetcFun(int i, Scheme_Input_Port *port);
  static int libschemeCharReadyFun(Scheme_Input_Port *port);
  static void libschemeCloseFun(Scheme_Input_Port *port);


  //- Access
  virtual int Getc() = 0;
  virtual void Ungetc() = 0;
  virtual int CharReady() = 0;
};

#endif

A.16 SchemeInputPort.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeInputPort.h"

//-------------------------------------------------------------------
// Methods for class SchemeInputPort.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

//-------------------------------------------------------------------
// Method........: Constructor
// Author........: Thomas Ågotnes
// Date..........:
// Description...: Creates a libscheme port object, as defined
//                 by the functions below.
// Comments......:
// Revisions.....:
//===================================================================

SchemeInputPort::SchemeInputPort() : SchemeObject(NULL)
{
  // Create libscheme port

  Scheme_Input_Port *port = scheme_make_input_port(scheme_input_port_type,
                                                   this,
                                                   libschemeGetcFun,
                                                   libschemeUngetcFun,
                                                   libschemeCharReadyFun,
                                                   libschemeCloseFun);

  // Create libscheme object
  Scheme_Object *p = scheme_alloc_object ();
  SCHEME_TYPE(p) = scheme_input_port_type;
  SCHEME_PTR_VAL(p) = port;
  obj_ = p;
}

SchemeInputPort::~SchemeInputPort()
{
}

//-------------------------------------------------------------------
// Method........: libschemeGetcFun
// Author........: Thomas Ågotnes
// Date..........:
// Description...: Access point for the libscheme port.
//                 Calls the "real" function defined in a subclass.
// Comments......:
// Revisions.....:
//===================================================================

int
SchemeInputPort::libschemeGetcFun(Scheme_Input_Port *port)
{
  SchemeInputPort *theport = (SchemeInputPort *) port->port_data;
  return theport->Getc();
}

//-------------------------------------------------------------------
// Method........: libschemeUngetcFun
// Author........: Thomas Ågotnes
// Date..........:
// Description...: Access point for the libscheme port.
//                 Calls the "real" function defined in a subclass.
// Comments......:
// Revisions.....:
//===================================================================

void

SchemeInputPort::libschemeUngetcFun(int i, Scheme_Input_Port *port)
{
  SchemeInputPort *theport = (SchemeInputPort *) port->port_data;
  theport->Ungetc();
}

//-------------------------------------------------------------------
// Method........: libschemeCharReadyFun
// Author........: Thomas Ågotnes
// Date..........:
// Description...: Access point for the libscheme port.
//                 Calls the "real" function defined in a subclass.
// Comments......:
// Revisions.....:
//===================================================================

int
SchemeInputPort::libschemeCharReadyFun(Scheme_Input_Port *port)
{
  SchemeInputPort *theport = (SchemeInputPort *) port->port_data;
  return theport->CharReady();
}

//-------------------------------------------------------------------
// Method........: libschemeCloseFun
// Author........: Thomas Ågotnes
// Date..........:
// Description...: Access point for the libscheme port.
//                 Calls the "real" function defined in a subclass.
// Comments......:
// Revisions.....:
//===================================================================

void
SchemeInputPort::libschemeCloseFun(Scheme_Input_Port *port)
{
  SchemeInputPort *theport = (SchemeInputPort *) port->port_data;
  theport->Close();
}
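The four static functions above all follow one pattern: a C library stores an opaque data pointer and plain function pointers, and each static member function recovers the C++ object from that pointer and forwards to a virtual method. The following is a minimal, self-contained sketch of the pattern. The struct CPort and the classes InputPort and StringPort are hypothetical stand-ins invented for the example; they are not libscheme's Scheme_Input_Port or the classes in this appendix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C-style port record, loosely modelled on the way the
// listing uses Scheme_Input_Port: an opaque data pointer plus a callback.
struct CPort {
  void *port_data;
  int (*getc_fun)(CPort *port);
};

// Abstract C++ port; subclasses supply the "real" Getc().
class InputPort {
public:
  InputPort() {
    port_.port_data = this;        // remember the owning object
    port_.getc_fun = &GetcThunk;   // plain function pointer for the C side
  }
  virtual ~InputPort() {}

  CPort *GetRepr() { return &port_; }

protected:
  virtual int Getc() = 0;

private:
  // Static member functions have no `this`, so they can be used where
  // a C function pointer is expected. The object is recovered from
  // port_data and the call is forwarded to the virtual function.
  static int GetcThunk(CPort *port) {
    InputPort *self = static_cast<InputPort *>(port->port_data);
    return self->Getc();
  }

  CPort port_;
};

// Example subclass reading from a string; returns -1 at the end.
class StringPort : public InputPort {
public:
  explicit StringPort(const std::string &s) : text_(s), pos_(0) {}

protected:
  virtual int Getc() {
    return pos_ < text_.size() ? text_[pos_++] : -1;
  }

private:
  std::string text_;
  std::size_t pos_;
};
```

The C side only ever sees CPort and the function pointer, so new port types can be added by subclassing without touching the code that consumes the port.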

A.17 SchemeInterpreter.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEINTERPRETER_H__
#define __SCHEMEINTERPRETER_H__

#include "KERNEL\ALGORITHMS\algorithm.h"
#include

//-------------------------------------------------------------------
// Identifier declarations.
//===================================================================

DeclareId(SCHEMEINTERPRETER)

//-------------------------------------------------------------------
// Class.........: SchemeInterpreter
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Virtual base class for Scheme interpreters.
// Revisions.....:
//===================================================================

class SchemeInterpreter : public Algorithm
{
public:

  //- Constructors/destructor
  SchemeInterpreter();
  virtual ~SchemeInterpreter();

  //- Methods inherited from Identifier
  DeclareIdMethods()

  //- Methods inherited from Algorithm
  virtual Structure * Apply(Structure &structure) const;
  virtual bool IsApplicable(const Structure &structure, bool warn = true) const;


protected:

  //- Main interpreter function (e.g. read-eval-print loop)
  virtual void go( Handle schemer ) const = 0;
};

#endif

A.18 SchemeInterpreter.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeInterpreter.h"
#include #include #include #include #include #include #include #include #include

"scheme.h"

//-------------------------------------------------------------------
// Identifier implementations.
//===================================================================

ImplementId(SCHEMEINTERPRETER, "SchemeInterpreter", "Scheme Interpreter")

//-------------------------------------------------------------------
// Methods for class SchemeInterpreter.
//===================================================================

//-------------------------------------------------------------------

// Constructors/destructor.
//===================================================================

//-------------------------------------------------------------------
// Method........: Constructor
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......: Default empty constructor.
// Revisions.....:
//===================================================================

SchemeInterpreter::SchemeInterpreter()
{
}

//-------------------------------------------------------------------
// Method........: Destructor
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......:
// Revisions.....:
//===================================================================

SchemeInterpreter::~SchemeInterpreter()
{
}

//-------------------------------------------------------------------
// Methods inherited from Identifier.
//===================================================================

ImplementIdMethods(SchemeInterpreter, SCHEMEINTERPRETER, Algorithm)

//-------------------------------------------------------------------
// Methods inherited from Algorithm.
//===================================================================

//-------------------------------------------------------------------
// Method........: IsApplicable
// Author........: Thomas Ågotnes
// Date..........:
// Description...: Returns true if the algorithm is applicable to the
//                 structure, false otherwise.

// Comments......:
// Revisions.....:
//===================================================================

bool
SchemeInterpreter::IsApplicable(const Structure &structure, bool /*warn*/) const
{
  return structure.IsA(PROJECT);
}

//-------------------------------------------------------------------
// Method........: Apply
// Author........: Thomas Ågotnes
// Date..........:
// Description...: Creates environment and Scheme evaluator.
//                 The evaluator is passed to the virtual interpreter
//                 loop function.
//
// Comments......: The space allocated to the env variable is
//                 collected by GC when inaccessible.
// Revisions.....:
//===================================================================

Structure *
SchemeInterpreter::Apply(Structure & structure) const
{
  // Check input structure.
  if (!IsApplicable(structure))
    return NULL;

  // Cast to verified type.
  Handle project = dynamic_cast(Project *, &structure);

  // Create universal environment
  SchemeUniversalEnv *env = scheme_universal_env(project);
  if (!env)
    return NULL;

  // Create the Scheme evaluator
  Handle schemer = new SchemeEvaluator(env);
  if (schemer == NULL)
    return NULL;

  // Enter interpreter, e.g. read-eval-print loop
  go(schemer);

  return &structure;

}

A.19 SchemeInterpreterDoc.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEINTERPRETERDOC_H__
#define __SCHEMEINTERPRETERDOC_H__

#include #include #include #include

"SchemeObject.h" // Added by ClassView

//-------------------------------------------------------------------
// Class.........: SchemeInterpreterDoc
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Document for use with CSchemeInterpreterView.
//                 Created by InteractiveSchemeInterpreter::go.
// Revisions.....:
//===================================================================

class CSchemeInterpreterDoc : public CDocument
{
  DECLARE_DYNAMIC(CSchemeInterpreterDoc)

protected:

  //- Local variables
  Handle schemer_;

public:

  //- Inherited from CDocument
  virtual BOOL CanCloseFrame(CFrameWnd*);


  //- Constructors/destructor
  CSchemeInterpreterDoc(Handle evaluator);
  virtual ~CSchemeInterpreterDoc();

  //- Access
  Handle Eval();
  void SetOutputPort(Handle port) { schemer_->SetOutputPort(port); }
  void SetInputPort(Handle port) { schemer_->SetInputPort(port); }
};

#endif

A.20 SchemeInterpreterDoc.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeInterpreterDoc.h"

//-------------------------------------------------------------------
// Identifier implementations.
//===================================================================

IMPLEMENT_DYNAMIC(CSchemeInterpreterDoc, CDocument)

//-------------------------------------------------------------------
// Methods for class CSchemeInterpreterDoc.
//===================================================================

//-------------------------------------------------------------------
// Inherited from CDocument
//===================================================================

//-------------------------------------------------------------------
// Method........: CanCloseFrame

// Author........:
// Date..........:
// Description...: Called from the framework when the last frame
//                 displaying a document is closed.
// Comments......: We don't need to save anything.
// Revisions.....:
//===================================================================

BOOL
CSchemeInterpreterDoc::CanCloseFrame(CFrameWnd*)
{
  return TRUE;
}

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

CSchemeInterpreterDoc::CSchemeInterpreterDoc(Handle evaluator)
{
  schemer_ = evaluator;
}

CSchemeInterpreterDoc::~CSchemeInterpreterDoc()
{
}

//-------------------------------------------------------------------
// Method........: Eval
// Author........:
// Date..........:
// Description...: Call evaluator.
// Comments......:
// Revisions.....:
//===================================================================

Handle
CSchemeInterpreterDoc::Eval()
{
  return schemer_->Eval();
}

A.21 SchemeMessageDispatcher.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980410
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEMESSAGEDISPATCHER_H__
#define __SCHEMEMESSAGEDISPATCHER_H__

#include #include #include

struct Scheme_Object;

//-------------------------------------------------------------------
// Class.........: SchemeMessageDispatcher
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Virtual base class for message dispatcher objects.
//                 Message dispatchers are used as wrappers around
//                 Rosetta structure/algorithm objects. Subclasses
//                 override the Dispatch method to translate
//                 (textual) messages into function calls.
//
// Revisions.....:
//===================================================================

class SchemeMessageDispatcher : public Referent
{
public:

  virtual Handle Dispatch(int argc, Handle argv[]) = 0;
  virtual bool IsA(Id id) = 0;
  virtual Handle GetObject() = 0;

  // libscheme interface
  Scheme_Object* Dispatch(int argc, Scheme_Object *argv[]);

protected:


  virtual void Assert(bool expr, String id, String msg);
  virtual void Assert(bool expr, String id, String message,
                      String msg);
  virtual void Fail(String msg);
  virtual void UnknownMessage(String id, String message);
};

#endif

A.22 SchemeMessageDispatcher.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeMessageDispatcher.h"
#include "scheme.h"

//-------------------------------------------------------------------
// Methods for class SchemeMessageDispatcher.
//===================================================================

//-------------------------------------------------------------------
// Method........: Assert
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......:
// Revisions.....:
//===================================================================

void
SchemeMessageDispatcher::Assert(bool expr, String id, String msg)
{
  id += ": ";
  id += msg;

  // scheme_signal_error does not alter the buffer

  SCHEME_ASSERT(expr, const_cast(char *, id.GetBuffer()));
}

void
SchemeMessageDispatcher::Assert(bool expr, String id, String message, String msg)
{
  id += "->";
  id += message;
  Assert(expr, id, msg);
}

//-------------------------------------------------------------------
// Method........: Fail
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......:
// Revisions.....:
//===================================================================

void
SchemeMessageDispatcher::Fail(String msg)
{
  scheme_signal_error(const_cast(char *, msg.GetBuffer()));
}

//-------------------------------------------------------------------
// Method........: UnknownMessage
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......:
// Revisions.....:
//===================================================================

void
SchemeMessageDispatcher::UnknownMessage(String id, String message)
{
  id += ": ";
  id += message;
  id += ": unknown message.";
  Fail(id);
}

//-------------------------------------------------------------------
// Method........: Dispatch
// Author........: Thomas Ågotnes
// Date..........:
// Description...: libscheme interface to the "real" dispatch function.
// Comments......:
// Revisions.....:
//===================================================================

Scheme_Object*
SchemeMessageDispatcher::Dispatch(int argc, Scheme_Object * argv [ ])
{
  int i;
  Handle newargv[SCHEME_MAX_ARGS];

  // make native objects from libscheme objects
  for (i = 0; i < argc; i++)
    newargv[i] = new SchemeObject(argv[i]);

  // call native dispatcher
  return Dispatch(argc, newargv)->GetRepr();
}
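The dispatcher base class leaves the translation of textual messages into function calls to its subclasses. The sketch below shows, under simplifying assumptions, what such a subclass might look like. Everything in it is hypothetical: the classes Table, MessageDispatcher and TableDispatcher, the message names "rows" and "cols", the integer return value and the use of std::runtime_error in place of scheme_signal_error. No concrete dispatcher from Rosetta is reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapped object, standing in for a Rosetta structure.
class Table {
public:
  Table(int rows, int cols) : rows_(rows), cols_(cols) {}
  int Rows() const { return rows_; }
  int Cols() const { return cols_; }
private:
  int rows_, cols_;
};

// Abstract dispatcher: translates a textual message into a call.
class MessageDispatcher {
public:
  virtual ~MessageDispatcher() {}
  virtual int Dispatch(const std::vector<std::string> &args) = 0;

protected:
  // Reports a message the wrapper does not understand.
  void UnknownMessage(const std::string &id, const std::string &message) {
    throw std::runtime_error(id + ": " + message + ": unknown message.");
  }
};

// Wrapper around a Table; args[0] is the message name.
class TableDispatcher : public MessageDispatcher {
public:
  explicit TableDispatcher(const Table &table) : table_(table) {}

  virtual int Dispatch(const std::vector<std::string> &args) {
    if (args.empty())
      throw std::runtime_error("table: missing message.");
    if (args[0] == "rows")
      return table_.Rows();
    if (args[0] == "cols")
      return table_.Cols();
    UnknownMessage("table", args[0]);
    return 0;  // not reached
  }

private:
  Table table_;
};
```

Each wrapped class gets its own dispatcher, so the set of messages an object understands is defined in one place, next to the object it wraps.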

A.23 SchemeObject.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980410
// Description...: Interface for the SchemeObject class.
// Revisions.....:
//===================================================================

#ifndef __SCHEMEOBJECT_H__
#define __SCHEMEOBJECT_H__

#include #include #include

class SchemeOutputPort;
class SchemeMessageDispatcher;
struct Scheme_Object;

//-------------------------------------------------------------------
// Class.........: SchemeObject
// Author........: Thomas Ågotnes
// Date..........: 980410
// Description...:
// Revisions.....:
//===================================================================

class SchemeObject : public Referent
{
protected:

  // libscheme object
  Scheme_Object *obj_;

public:

  // Constructors/destructor
  SchemeObject(Scheme_Object *o);
  SchemeObject(int val);
  SchemeObject(bool val);
  SchemeObject(float val);
  SchemeObject(String val);
  SchemeObject(Handle wrapper);
  virtual ~SchemeObject();

  // Identification
  virtual bool IsInteger();
  virtual bool IsBool();
  virtual bool IsString();
  virtual bool IsSymbol();

  // Rosetta object identification
  virtual bool IsRosettaObject();
  virtual bool IsRosettaObject(Id id);

  // Access
  virtual int IntVal();
  virtual bool BoolVal();
  virtual String StringVal();
  virtual String SymbolVal() { return StringVal(); }
  virtual Handle RosettaVal();

  // Rosetta object access

  virtual Handle GetRules();
  virtual Handle GetStructure();

  // Textual representation
  virtual void Print(Handle port);

  // libscheme interface
  Scheme_Object* GetRepr();
};

#endif

A.24 SchemeObject.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include #include #include #include #include #include #include

"stdafx.h" "rosetta.h" "SchemeObject.h" "scheme.h"

//-------------------------------------------------------------------
// Methods for class SchemeObject.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

SchemeObject::SchemeObject(Scheme_Object *o)
{
  obj_ = o;
}

SchemeObject::SchemeObject(int val)

{
  obj_ = scheme_make_integer(val);
}

SchemeObject::SchemeObject(bool val)
{
  if (val)
    obj_ = scheme_true;
  else
    obj_ = scheme_false;
}

SchemeObject::SchemeObject(float val)
{
  obj_ = scheme_make_double(val);
}

SchemeObject::SchemeObject(String val)
{
  obj_ = scheme_make_string(const_cast(char *, val.GetBuffer()));
}

SchemeObject::SchemeObject(Handle wrapper)
{
  obj_ = scheme_make_prim(wrapper);
}

SchemeObject::~SchemeObject()
{
}

//-------------------------------------------------------------------
// Method........: Print
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......:
// Revisions.....:
//===================================================================

void
SchemeObject::Print(Handle port)
{
  // libscheme write
  scheme_write(obj_, port->GetRepr());

}

//-------------------------------------------------------------------
// Method........: GetRepr
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......:
// Revisions.....:
//===================================================================

Scheme_Object*
SchemeObject::GetRepr()
{
  return obj_;
}

bool
SchemeObject::IsInteger()
{
  return SCHEME_INTP(obj_);
}

bool
SchemeObject::IsBool()
{
  return SCHEME_BOOLP(obj_);
}

bool
SchemeObject::IsString()
{
  return SCHEME_STRINGP(obj_);
}

bool
SchemeObject::IsSymbol()
{
  return SCHEME_SYMBOLP(obj_);
}

bool
SchemeObject::IsRosettaObject()
{
  return (SCHEME_PRIMP(obj_) && (obj_->wrapper != NULL));

}

bool
SchemeObject::IsRosettaObject(Id id)
{
  return (IsRosettaObject() && RosettaVal()->IsA(id));
}

int
SchemeObject::IntVal()
{
  return SCHEME_INT_VAL(obj_);
}

bool
SchemeObject::BoolVal()
{
  return SCHEME_INT_VAL(obj_);
}

String
SchemeObject::StringVal()
{
  return SCHEME_STR_VAL(obj_);
}

Handle
SchemeObject::RosettaVal()
{
  // be paranoid
  if (!IsRosettaObject())
    return NULL;

  return obj_->wrapper;
}

Handle
SchemeObject::GetRules()
{
  if (IsRosettaObject(RULES))
    return dynamic_cast(Rules *, RosettaVal()->GetObject().GetPointer());

  return NULL;
}

Handle

A.25. SCHEMEOUTPUTPORT.H SchemeObject::GetStructure() { if (IsRosettaObject(STRUCTURE)) return dynamic_cast(Structure *, RosettaVal()->GetObject().GetPointer()); return NULL; }
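The predicates and accessors of SchemeObject follow a check-then-extract pattern: a caller first asks what kind of value is wrapped and only then reads it. The following is a minimal, self-contained sketch of that pattern. The names MiniValue and Tag are hypothetical and do not appear in Rosetta or libscheme; named factory functions are used instead of overloaded constructors so that a string literal cannot silently select a bool overload.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag for the kind of value stored; not the libscheme layout.
enum class Tag { Integer, Boolean, String };

// Hypothetical miniature of a wrapped, dynamically typed value.
class MiniValue {
public:
    static MiniValue FromInt(int v)   { MiniValue m(Tag::Integer); m.int_ = v;  return m; }
    static MiniValue FromBool(bool v) { MiniValue m(Tag::Boolean); m.bool_ = v; return m; }
    static MiniValue FromString(const std::string &s) {
        MiniValue m(Tag::String);
        m.str_ = s;
        return m;
    }

    // Predicates: safe to call on any value.
    bool IsInteger() const { return tag_ == Tag::Integer; }
    bool IsBool()    const { return tag_ == Tag::Boolean; }
    bool IsString()  const { return tag_ == Tag::String; }

    // Accessors: only meaningful after the matching predicate succeeded.
    int IntVal() const                   { assert(IsInteger()); return int_; }
    bool BoolVal() const                 { assert(IsBool());    return bool_; }
    const std::string &StringVal() const { assert(IsString());  return str_; }

private:
    explicit MiniValue(Tag t) : tag_(t) {}
    Tag tag_;
    int int_ = 0;
    bool bool_ = false;
    std::string str_;
};
```

A dispatcher would typically call the predicate inside an argument assertion and the accessor only afterwards, which is the order used in the message dispatch code of this appendix.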

A.25 SchemeOutputPort.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEOUTPUTPORT_H__
#define __SCHEMEOUTPUTPORT_H__

#include "KERNEL\BASIC\referent.h"
#include "scheme.h"
#include

//-------------------------------------------------------------------
// Class.........: SchemeOutputPort
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Virtual base class.
// Revisions.....:
//===================================================================

class SchemeOutputPort : public SchemeObject {
public:

  //- Constructors/destructor
  SchemeOutputPort();
  virtual ~SchemeOutputPort();

  virtual void Close() = 0;

protected:

  //- libscheme interface
  static void libschemeWriteFun(char *str, Scheme_Output_Port *port);
  static void libschemeCloseFun(Scheme_Output_Port *port);

  virtual void Write(String str) = 0;

};

#endif

A.26 SchemeOutputPort.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeOutputPort.h"

//-------------------------------------------------------------------
// Methods for class SchemeOutputPort.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

SchemeOutputPort::SchemeOutputPort() : SchemeObject(NULL) {

  // Create libscheme port
  Scheme_Output_Port *port = scheme_make_output_port(scheme_output_port_type,
                                                     this,
                                                     libschemeWriteFun,
                                                     libschemeCloseFun);

  // Create libscheme object
  Scheme_Object *p = scheme_alloc_object ();
  SCHEME_TYPE(p) = scheme_output_port_type;
  SCHEME_PTR_VAL(p) = port;
  obj_ = p;
}

SchemeOutputPort::~SchemeOutputPort() {
}

//-------------------------------------------------------------------
// Method........: libschemeWriteFun
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......: Interface between libscheme and "real" write fun.
// Revisions.....:
//===================================================================

void SchemeOutputPort::libschemeWriteFun(char *str, Scheme_Output_Port *port) {
  SchemeOutputPort *theport = (SchemeOutputPort*) port->port_data;
  theport->Write(str);
}

void SchemeOutputPort::libschemeCloseFun(Scheme_Output_Port *port) {
  SchemeOutputPort *theport = (SchemeOutputPort *) port->port_data;
  theport->Close();
}
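The two static functions in SchemeOutputPort.cpp act as trampolines: the C library only knows a plain function pointer and an opaque data pointer, and the static function converts the data pointer back to the C++ object before calling a virtual method. The sketch below isolates this technique. CPort, OutputPortBase, StringPort and c_library_write are hypothetical names chosen for illustration and are only loosely modelled on Scheme_Output_Port; they are not part of libscheme or Rosetta.

```cpp
#include <cassert>
#include <string>

// Hypothetical C-style port record holding a callback and a back-pointer.
struct CPort {
    void *port_data;                                  // opaque pointer to the C++ object
    void (*write_string_fun)(const char *, CPort *);  // plain C callback
};

class OutputPortBase {
public:
    OutputPortBase() {
        port_.port_data = this;
        port_.write_string_fun = &OutputPortBase::WriteTrampoline;
    }
    virtual ~OutputPortBase() {}
    OutputPortBase(const OutputPortBase &) = delete;             // back-pointer must stay valid
    OutputPortBase &operator=(const OutputPortBase &) = delete;

    CPort *GetPort() { return &port_; }

protected:
    virtual void Write(const std::string &str) = 0;

private:
    // Static member: has the signature the C side expects, recovers the
    // object from port_data and forwards to the virtual Write.
    static void WriteTrampoline(const char *str, CPort *port) {
        static_cast<OutputPortBase *>(port->port_data)->Write(str);
    }
    CPort port_;
};

// A concrete port that collects everything written to it.
class StringPort : public OutputPortBase {
public:
    const std::string &Contents() const { return buffer_; }
protected:
    void Write(const std::string &str) override { buffer_ += str; }
private:
    std::string buffer_;
};

// Stands in for the C library: it calls through the function pointer only.
inline void c_library_write(CPort *port, const char *text) {
    port->write_string_fun(text, port);
}
```

The design keeps the C interface unchanged while letting each concrete port, such as one backed by an edit control, decide what writing means.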


A.27 SchemeStructureDispatcher.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMESTRUCTUREDISPATCHER_H__
#define __SCHEMESTRUCTUREDISPATCHER_H__

#include "SchemeMessageDispatcher.h"
#include
#include "scheme.h"

//-------------------------------------------------------------------
// Class.........: SchemeStructureDispatcher
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Dispatches messages for structure objects.
// Revisions.....:
//===================================================================

class SchemeStructureDispatcher : public SchemeMessageDispatcher {
public:

  //- Constructor/destructor
  SchemeStructureDispatcher(Handle s);
  virtual ~SchemeStructureDispatcher();

  virtual Handle Dispatch(int argc, Handle argv[]);
  virtual bool IsA(Id id);
  virtual Handle GetObject();

protected:

  Handle structure_;

};

#endif

A.28 SchemeStructureDispatcher.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeStructureDispatcher.h"
#include
#include
#include
#include

//-------------------------------------------------------------------
// Methods for class SchemeStructureDispatcher.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

SchemeStructureDispatcher::SchemeStructureDispatcher(Handle s) {
  structure_ = s;
}

SchemeStructureDispatcher::~SchemeStructureDispatcher() {
}

bool SchemeStructureDispatcher::IsA(Id id) {
  return structure_->IsA(id);
}

Handle SchemeStructureDispatcher::GetObject() {
  return structure_.GetPointer();
}

//-------------------------------------------------------------------
// Method........: Dispatch
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......: Message dispatch function.
// Revisions.....:
//===================================================================

Handle SchemeStructureDispatcher::Dispatch(int argc, Handle argv[]) {

  String ident = structure_->GetName();

  // Assert first argument (message)
  Assert(argc > 0, ident, SchemeErrorMsg::ArgNum(1));
  Assert(argv[0]->IsSymbol(), ident, SchemeErrorMsg::ArgType("first", "symbol"));

  String message = argv[0]->SymbolVal();

  // First dispatch on self type, then on message
  if (IsA(DECISIONTABLE)) {

    Handle dt = dynamic_cast(DecisionTable *, structure_.GetPointer());

    if (message == "createinformationvector") {
      // Assert arguments
      Assert(argc == 3, ident, message, SchemeErrorMsg::ArgNum(2));
      Assert(argv[1]->IsInteger(), ident, message, SchemeErrorMsg::ArgType("first", "integer"));
      Assert(argv[2]->IsBool(), ident, message, SchemeErrorMsg::ArgType("second", "boolean"));
      // Call method
      Handle inf = dt->CreateInformationVector(argv[1]->IntVal(), NULL, argv[2]->BoolVal()); // META: Check boolean
      // Application failed
      if (inf == NULL)
        Fail("DecisionTable::CreateInformationVector failed.");
      // Create new object
      Handle wrapper = new SchemeStructureDispatcher(inf);
      return (new SchemeObject(wrapper));
    }
    else if (message == "getdecisionattribute") {
      // Assert arguments
      Assert(argc == 2, ident, message, SchemeErrorMsg::ArgNum(1));
      Assert(argv[1]->IsBool(), ident, message, SchemeErrorMsg::ArgType("first", "boolean"));
      return (new SchemeObject(dt->GetDecisionAttribute(argv[1]->BoolVal())));
    }
    else if (message == "getnoobjects") {
      // Assert arguments
      Assert(argc == 2, ident, message, SchemeErrorMsg::ArgNum(1));
      Assert(argv[1]->IsBool(), ident, message, SchemeErrorMsg::ArgType("first", "boolean"));
      return (new SchemeObject(dt->GetNoObjects(argv[1]->BoolVal())));
    }

  }
  else if (IsA(CLASSIFICATION)) {

    // META: macro this:?
    Handle classification = dynamic_cast(Classification *, structure_.GetPointer());

    if (message == "getcertaintycoefficient") {
      // Assert arguments
      Assert(argc == 2, ident, message, SchemeErrorMsg::ArgNum(1));
      Assert(argv[1]->IsInteger(), ident, message, SchemeErrorMsg::ArgType("first", "integer"));
      return (new SchemeObject(classification->GetCertaintyCoefficient(argv[1]->IntVal())));
    }
    else if (message == "getdecisionvalue") {
      // Assert arguments
      Assert(argc == 2, ident, message, SchemeErrorMsg::ArgNum(1));
      Assert(argv[1]->IsInteger(), ident, message, SchemeErrorMsg::ArgType("first", "integer"));
      return (new SchemeObject(classification->GetDecisionValue(argv[1]->IntVal())));
    }
    else if (message == "getnodecisionvalues") {
      // Assert arguments
      Assert(argc == 1, ident, message, SchemeErrorMsg::ArgNum(0));
      return (new SchemeObject(classification->GetNoDecisionValues()));
    }

  }
  else if (IsA(INFORMATIONVECTOR)) {

    Handle inf = dynamic_cast(InformationVector *, structure_.GetPointer());

    if (message == "getentry") {
      // Assert arguments
      Assert(argc == 2, ident, message, SchemeErrorMsg::ArgNum(1));
      Assert(argv[1]->IsInteger(), ident, message, SchemeErrorMsg::ArgType("first", "integer"));
      return (new SchemeObject(inf->GetEntry(argv[1]->IntVal())));
    }

  }

  // Generic Structure methods
  if (message == "appendchild") {
    // Assert arguments
    Assert(argc == 2, ident, message, SchemeErrorMsg::ArgNum(1));
    Assert(argv[1]->IsRosettaObject(STRUCTURE), ident, message, SchemeErrorMsg::ArgType("first", "Structure"));
    Handle arg = argv[1]->GetStructure();
    Handle result = new SchemeObject(structure_->AppendChild(arg.GetPointer()));
    return result;
  }
  // Unhandled message
  else {
    UnknownMessage(ident, message);
  }

  return NULL;
}
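The Dispatch function above implements the message passing style with a chain of string comparisons, each followed by argument assertions. As an illustration of the same idea in a smaller setting, the sketch below keeps the handlers in a lookup table keyed by message name and checks the argument count before calling the handler. This is a swapped-in technique, not the author's implementation: TableDispatcher, MakeDemoDispatcher and the messages "getsize" and "gettwice" are hypothetical, and arguments are plain integers rather than Scheme objects.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of message dispatch with arity checking.
class TableDispatcher {
public:
    using Handler = std::function<int(const std::vector<int> &)>;

    // Registers a handler for `message` expecting exactly `arity` arguments.
    void Register(const std::string &message, std::size_t arity, Handler h) {
        handlers_[message] = Entry{arity, std::move(h)};
    }

    // Looks up the message, validates the argument count, then calls the handler.
    int Dispatch(const std::string &message, const std::vector<int> &args) const {
        auto it = handlers_.find(message);
        if (it == handlers_.end())
            throw std::invalid_argument("unknown message: " + message);
        if (args.size() != it->second.arity)
            throw std::invalid_argument("wrong number of arguments to " + message);
        return it->second.handler(args);
    }

private:
    struct Entry {
        std::size_t arity;
        Handler handler;
    };
    std::map<std::string, Entry> handlers_;
};

// Builds a dispatcher for a made-up object that reports `size`.
inline TableDispatcher MakeDemoDispatcher(int size) {
    TableDispatcher d;
    d.Register("getsize", 0, [size](const std::vector<int> &) { return size; });
    d.Register("gettwice", 1, [](const std::vector<int> &a) { return 2 * a[0]; });
    return d;
}
```

A table makes it cheap to add messages, whereas the explicit chain in the listing keeps type-specific argument checks next to each call; either choice preserves the closure-like view of an object as something that receives a message and arguments.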

A.29 SchemeUniversalEnv.h

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#ifndef __SCHEMEUNIVERSALENV_H__
#define __SCHEMEUNIVERSALENV_H__

#include "SchemeEnvironment.h"
#include
#include

//-------------------------------------------------------------------
// Class.........: SchemeUniversalEnv
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Evaluates Scheme expressions
// Revisions.....:
//===================================================================

class SchemeUniversalEnv : public SchemeEnvironment {
public:

  //- Constructors/destructor
  SchemeUniversalEnv(Handle p);
  SchemeUniversalEnv( void ) {}
  virtual ~SchemeUniversalEnv();

  virtual Handle Lookup(Handle obj);
  void SetProject(Handle p) { project = p; }

protected:

  Handle project;

  virtual Handle GetIdentifier(Id id, const String &name);
  bool GetMatchingIdentifiers(Id id, const String &name, Identifier::Vector &identifiers) const;

};

#endif
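The class declared above overrides Lookup so that a name is first sought in the ordinary Scheme environments and only then in the surrounding Rosetta project. The following sketch shows that fallback order in isolation. Env and UniversalEnv are hypothetical stand-ins, values are plain integers, and a std::map plays the role of the project tree; none of this is the Rosetta API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical environment: a frame of bindings plus an optional parent.
class Env {
public:
    explicit Env(const Env *parent = nullptr) : parent_(parent) {}
    virtual ~Env() {}

    void Define(const std::string &name, int value) { bindings_[name] = value; }

    // Returns true and sets `out` when `name` is bound here or in a parent.
    virtual bool Lookup(const std::string &name, int &out) const {
        auto it = bindings_.find(name);
        if (it != bindings_.end()) {
            out = it->second;
            return true;
        }
        return parent_ ? parent_->Lookup(name, out) : false;
    }

private:
    const Env *parent_;
    std::map<std::string, int> bindings_;
};

// Consults a host-side table (standing in for the project tree) only
// when the ordinary environments have no binding for the name.
class UniversalEnv : public Env {
public:
    explicit UniversalEnv(const std::map<std::string, int> &host) : host_(host) {}

    bool Lookup(const std::string &name, int &out) const override {
        if (Env::Lookup(name, out))
            return true;
        auto it = host_.find(name);
        if (it == host_.end())
            return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::string, int> host_;
};
```

Because the ordinary bindings are consulted first, a definition made in Scheme shadows a host object of the same name, which matches the view of the host system as an enclosing environment.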

A.30 SchemeUniversalEnv.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "rosetta.h"
#include "SchemeUniversalEnv.h"
#include "gc_c++.h"
#include
#include
#include

//-------------------------------------------------------------------
// Methods for class SchemeUniversalEnv.
//===================================================================

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

SchemeUniversalEnv::SchemeUniversalEnv(Handle p) {
  project = p;
}

SchemeUniversalEnv::~SchemeUniversalEnv() {
}

//-------------------------------------------------------------------
// Method........: Lookup
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......: The universal environment lookup function
//
// Revisions.....:
//===================================================================

Handle SchemeUniversalEnv::Lookup(Handle obj) {

  // First look in local/global environments
  Handle val = SchemeEnvironment::Lookup(obj);
  if (val != NULL)
    return val;

  // Look in universal environment
  return GetIdentifier(IDENTIFIER, SCHEME_STR_VAL(obj->GetRepr()));
}

//-------------------------------------------------------------------
// Method........: GetIdentifier
// Author........: Thomas Ågotnes
// Date..........:
// Description...:
// Comments......: Gets the first matching object from the project tree.
//
// Revisions.....:
//===================================================================

Handle SchemeUniversalEnv::GetIdentifier(Id id, const String &name) {

  Identifier::Vector identifiers;

  // Get matching identifiers from the project
  GetMatchingIdentifiers(id, name, identifiers);

  // Identifier was not found
  if (!identifiers.size()) {
    return NULL;
  }

  // Get the first matching identifier
  Handle found = identifiers[0];

  // Create wrapper object for message dispatching
  // Order is important; most special -> most general
  Handle wrapper;

  // Generic Algorithm
  if (found->IsA(ALGORITHM))
    wrapper = new SchemeAlgorithmDispatcher(dynamic_cast(Algorithm *, found.GetPointer()));
  else if (found->IsA(STRUCTURE))
    wrapper = new SchemeStructureDispatcher(dynamic_cast(Structure *, found.GetPointer()));

  // Make primitive Scheme function
  Scheme_Object *prim = scheme_make_prim(wrapper);

  // Return first identifier found
  return new SchemeObject(prim);
}

//-------------------------------------------------------------------
// Method........: GetMatchingIdentifiers
// Author........: Thomas Ågotnes
// Date..........:
// Description...: Slight modification of Project::GetAllIdentifiers
//                 to make name comparing case insensitive
// Comments......:
//
// Revisions.....:
//===================================================================

bool SchemeUniversalEnv::GetMatchingIdentifiers(Id id, const String &name, Identifier::Vector &identifiers) const {

  // Get all identifiers of the specified type in the project.
  if (!project->GetAllIdentifiers(id, identifiers))
    return false;

  String current;

  // Remove those whose name does not match.
  for (int i = identifiers.size() - 1; i >= 0; i--) {

    // Get name.
    if (identifiers[i]->IsA(STRUCTURE))
      current = dynamic_cast(Structure *, identifiers[i].GetPointer())->GetName();
    else
      current = IdHolder::GetClassname(identifiers[i]->GetId());

    // Erase?
    String n = name;
    n.ToUppercase();
    current.ToUppercase();
    if (current != n)
      identifiers.erase(identifiers.begin() + i);
  }

  return true;
}
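GetMatchingIdentifiers filters a vector in place by walking it from the last index to the first, so that erasing an element never shifts an element that is still to be visited. A minimal sketch of the same loop on plain strings follows; ToUpper and KeepMatching are hypothetical helper names, and std::string replaces Rosetta's String class.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Returns an upper-case copy of `s` (ASCII semantics via std::toupper).
inline std::string ToUpper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Keeps only the names equal to `wanted`, ignoring case.
// Iterates backwards so erasing index i leaves indices below i untouched.
inline void KeepMatching(std::vector<std::string> &names, const std::string &wanted) {
    const std::string target = ToUpper(wanted);
    for (int i = static_cast<int>(names.size()) - 1; i >= 0; --i) {
        if (ToUpper(names[i]) != target)
            names.erase(names.begin() + i);
    }
}
```

The surviving elements keep their original relative order, which matters in the listing because GetIdentifier returns the first match.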

A.31 viewschemeinterpreter.h

#ifndef __VIEWSCHEMEINTERPRETER_H__
#define __VIEWSCHEMEINTERPRETER_H__

#include "INTERPRETER\SchemeInterpreterDoc.h" // Added by ClassView
#include "KERNEL\BASIC\string.h" // Added by ClassView
#include

/////////////////////////////////////////////////////////////////////////////
// CViewSchemeInterpreter view

class CViewSchemeInterpreter : public CEditView
{
protected:
  CViewSchemeInterpreter(); // protected constructor used by dynamic creation
  DECLARE_DYNCREATE(CViewSchemeInterpreter)

// Attributes
private:
  Handle output_port_;
  Handle input_port_;
  bool evaluating_;

// Operations
protected:
  void EndCursor();
  void Write(String str);

public:
  virtual CSchemeInterpreterDoc * GetDocument();

// Overrides
  // ClassWizard generated virtual function overrides
  //{{AFX_VIRTUAL(CViewSchemeInterpreter)
  public:
  virtual void OnInitialUpdate();
  protected:
  virtual void OnDraw(CDC* pDC); // overridden to draw this view
  //}}AFX_VIRTUAL

// Implementation
protected:
  virtual ~CViewSchemeInterpreter();
#ifdef _DEBUG
  virtual void AssertValid() const;
  virtual void Dump(CDumpContext& dc) const;
#endif

  // Generated message map functions
protected:
  //{{AFX_MSG(CViewSchemeInterpreter)
  afx_msg void OnDestroy();
  afx_msg void OnKeyUp(UINT nChar, UINT nRepCnt, UINT nFlags);
  //}}AFX_MSG
  DECLARE_MESSAGE_MAP()
};

/////////////////////////////////////////////////////////////////////////////

//{{AFX_INSERT_LOCATION}}
// Microsoft Developer Studio will insert additional declarations immediately before the previous line.

#endif

A.32 viewschemeinterpreter.cpp

//-------------------------------------------------------------------
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...:
// Revisions.....:
//===================================================================

#include "stdafx.h"
#include "..\..\rosetta.h"
#include "viewschemeinterpreter.h"
#include
#include
#include
#include
#include

// Ridiculous hack (something is rotten in MFC):
const char CRLF[3] = {13, 10, 0};

//-------------------------------------------------------------------
// Methods for class CViewSchemeInterpreter.
//===================================================================

IMPLEMENT_DYNCREATE(CViewSchemeInterpreter, CEditView)

//-------------------------------------------------------------------
// Constructors/destructor.
//===================================================================

CViewSchemeInterpreter::CViewSchemeInterpreter() {
  evaluating_ = FALSE;
  input_port_ = NULL;
  output_port_ = NULL;
}

CViewSchemeInterpreter::~CViewSchemeInterpreter() {
}

BEGIN_MESSAGE_MAP(CViewSchemeInterpreter, CEditView)
  //{{AFX_MSG_MAP(CViewSchemeInterpreter)
  ON_WM_DESTROY()
  ON_WM_KEYUP()
  //}}AFX_MSG_MAP
END_MESSAGE_MAP()

//-------------------------------------------------------------------
// Method........: OnDraw
// Author........:
// Date..........:
// Description...:
// Revisions.....:
//===================================================================

void CViewSchemeInterpreter::OnDraw(CDC* pDC) {
  CEditView::OnDraw(pDC);
}

/////////////////////////////////////////////////////////////////////////////
// CViewSchemeInterpreter diagnostics

#ifdef _DEBUG
void CViewSchemeInterpreter::AssertValid() const {
  CEditView::AssertValid();
}

void CViewSchemeInterpreter::Dump(CDumpContext& dc) const {
  CEditView::Dump(dc);
}
#endif //_DEBUG

//-------------------------------------------------------------------
// Method........: GetDocument
// Author........:
// Date..........:
// Description...:
// Revisions.....:
//===================================================================

CSchemeInterpreterDoc * CViewSchemeInterpreter::GetDocument() {
  return (CSchemeInterpreterDoc*)m_pDocument;
}

//-------------------------------------------------------------------
// Method........: OnInitialUpdate
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Writes welcome text and attaches ports to the
//                 evaluator.
// Revisions.....:
//===================================================================

void CViewSchemeInterpreter::OnInitialUpdate() {

  CEditView::OnInitialUpdate();

  // Initial window text
  String s = "This is Scheme.";
  s += CRLF;
  s += "Copyright Thomas Ågotnes 1998, libscheme 0.5 Copyright Brent Benson 1994.";
  s += CRLF;
  s += CRLF;
  s += "Press F5 to evaluate.";
  s += CRLF;
  s += CRLF;
  s += "> ";
  SetWindowText(s.GetBuffer());

  // Place cursor at the end
  GetEditCtrl().SetSel(s.GetLength(), s.GetLength());

  // Create ports
  if ((output_port_ = new SchemeEditCtrlOutputPort(&GetEditCtrl())) != NULL)
    GetDocument()->SetOutputPort(output_port_);
  else {
    Message::FatalError("Unable to open output port.");
    DestroyWindow();
  }

  if ((input_port_ = new SchemeEditCtrlInputPort(&GetEditCtrl())) != NULL)
    GetDocument()->SetInputPort(input_port_);
  else {
    Message::FatalError("Unable to open input port.");
    DestroyWindow();
  }
}

//-------------------------------------------------------------------
// Method........: Write
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Insert text at current cursor position
//
// Revisions.....:
//===================================================================

void CViewSchemeInterpreter::Write(String str) {
  GetEditCtrl().ReplaceSel(str.GetBuffer());
}

//-------------------------------------------------------------------
// Method........: EndCursor
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Positions cursor at the end.
//
// Revisions.....:
//===================================================================

void CViewSchemeInterpreter::EndCursor() {
  GetEditCtrl().SetSel( (int) GetBufferLength(), (int) GetBufferLength());
}

//-------------------------------------------------------------------
// Method........: OnDestroy
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Closes the ports before the rug gets pulled beneath
//                 their feet.
//
// Revisions.....:
//===================================================================

void CViewSchemeInterpreter::OnDestroy() {
  CEditView::OnDestroy();
  input_port_->Close(); // Terminate internal message loop
  output_port_->Close();
}

//-------------------------------------------------------------------
// Method........: OnKeyUp
// Author........: Thomas Ågotnes
// Date..........: 980401
// Description...: Captures user interaction (after the current key
//                 has been processed by Windows).
//
//                 This is the read-eval-print loop. Evaluates
//                 expressions read from the current input port and
//                 prints the result. Resets the input port to
//                 indicate that the next read is to begin after the
//                 new mark (">").
//
//                 This is a -- not very robust -- prototype version.
//                 The user must not edit the buffer, nor place the
//                 cursor, above the most recent mark.
//
// Comments......: This function must catch ReadPanic thrown by
//                 SchemeEditCtrlInputPort::Getc. A future
//                 reimplementation of the port should remedy this.
//
// Revisions.....:
//===================================================================

void CViewSchemeInterpreter::OnKeyUp(UINT nChar, UINT nRepCnt, UINT nFlags) {

  Handle val;

  if (nChar == VK_RETURN) {
    input_port_->Update();
    if (evaluating_) {
    }
    else {
      CString cstr;
      CSchemeInterpreterDoc *pDoc = GetDocument();
      evaluating_ = TRUE;
      try {
        val = pDoc->Eval();
      }
      catch (SchemeEditCtrlInputPort::ReadPanic) {
        return;
      }
      // Output result
      EndCursor();
      String result = CRLF;
      result += " => ";
      Write(result);
      if (val == NULL)
        Message::Error("Evaluation failed");
      else
        val->Print(output_port_);
      result = CRLF;
      result += CRLF;
      result += "> ";
      Write(result);
      input_port_->Reset();
      evaluating_ = FALSE;
    }
  }

  CEditView::OnKeyUp(nChar, nRepCnt, nFlags);
}
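OnKeyUp uses the evaluating_ flag to ignore key events that arrive while an evaluation is already running. The sketch below isolates that reentrancy guard from MFC. It is an illustration under assumptions, not the author's code: MiniRepl and FlagGuard are hypothetical names, the evaluator is an arbitrary function from string to string, and a small scope guard resets the flag on every exit path, which is one possible way to organise the bookkeeping.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Sets a flag for the lifetime of the guard and clears it on scope exit,
// including when an exception propagates.
class FlagGuard {
public:
    explicit FlagGuard(bool &flag) : flag_(flag) { flag_ = true; }
    ~FlagGuard() { flag_ = false; }
    FlagGuard(const FlagGuard &) = delete;
    FlagGuard &operator=(const FlagGuard &) = delete;
private:
    bool &flag_;
};

// Hypothetical read-eval-print step driven by a "return key" event.
class MiniRepl {
public:
    explicit MiniRepl(std::function<std::string(const std::string &)> eval)
        : eval_(std::move(eval)) {}

    // Returns the text to append to the transcript, or an empty string
    // when the call arrives while an evaluation is still in progress.
    std::string OnReturnKey(const std::string &input) {
        if (evaluating_)
            return "";
        FlagGuard guard(evaluating_);
        return "\r\n => " + eval_(input) + "\r\n\r\n> ";
    }

    bool Evaluating() const { return evaluating_; }

private:
    std::function<std::string(const std::string &)> eval_;
    bool evaluating_ = false;
};
```

The transcript format mirrors the listing: a line break, the marker " => ", the printed result, a blank line and a new prompt.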

A.33 Scheme.h

This is the libscheme definition file.

/*
  libscheme
  Copyright (c) 1994 Brent Benson
  All rights reserved.

  Permission is hereby granted, without written agreement and without
  license or royalty fees, to use, copy, modify, and distribute this
  software and its documentation for any purpose, provided that the
  above copyright notice and the following two paragraphs appear in
  all copies of this software.

  IN NO EVENT SHALL BRENT BENSON BE LIABLE TO ANY PARTY FOR DIRECT,
  INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT
  OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF BRENT
  BENSON HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

  BRENT BENSON SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT
  NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN
  "AS IS" BASIS, AND BRENT BENSON HAS NO OBLIGATION TO PROVIDE
  MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
*/

#ifndef SCHEME_H
#define SCHEME_H

#include
#include
#include
#include

#include "stdafx.h" // TAA
#include "rosetta.h"
#include "gc_c++.h"
#include
#include
#include

//#ifdef __cplusplus
//extern "C"
//{
//#endif

struct Scheme_Bucket
{
  char *key;
  void *val;
  struct Scheme_Bucket *next;
};
typedef struct Scheme_Bucket Scheme_Bucket;

struct Scheme_Hash_Table
{
  int size;
  Scheme_Bucket **buckets;
};
typedef struct Scheme_Hash_Table Scheme_Hash_Table;

class SchemeMessageDispatcher; // TAA
class SchemeUniversalEnv; // TAA
class SchemeEnvironment;
typedef SchemeEnvironment Scheme_Env;

struct Scheme_Object
{
  union
    {
      char char_val;
      int int_val;
      double double_val;
      char *string_val;
      void *ptr_val;
      struct { void *ptr1, *ptr2; } two_ptr_val;
      struct Scheme_Object *(*prim_val) (int argc, struct Scheme_Object *argv[]);
      struct Scheme_Object *(*syntax_val)
        //(struct Scheme_Object *form, struct Scheme_Env *env);
        (struct Scheme_Object *form, Scheme_Env *env); // TAA
      struct { struct Scheme_Object *car, *cdr; } pair_val;
      struct { int size; struct Scheme_Object **els; } vector_val;
      //struct { struct Scheme_Env *env; struct Scheme_Object *code; } closure_val;
      struct { Scheme_Env *env; struct Scheme_Object *code; } closure_val; //TAA
      struct { struct Scheme_Object *def; struct Scheme_Method *meths; } methods_val;
    } u;
  struct Scheme_Object *type;
  Handle wrapper; // TAA: hack
};
typedef struct Scheme_Object Scheme_Object;

// TAA:
// Changed Scheme_Env from struct to class

/*
class Scheme_Env
{
public:
  int num_bindings;
  struct Scheme_Object **symbols;
  struct Scheme_Object **values;
  Scheme_Hash_Table *globals;
  struct Scheme_Env *next;
  virtual Scheme_Object* Lookup(Scheme_Object* obj)
};
*/
// typedef struct Scheme_Env Scheme_Env;

/* access macros */
#define SCHEME_TYPE(obj)      ((obj)->type)
#define SCHEME_CHAR_VAL(obj)  ((obj)->u.char_val)
#define SCHEME_INT_VAL(obj)   ((obj)->u.int_val)
#define SCHEME_DBL_VAL(obj)   ((obj)->u.double_val)
#define SCHEME_STR_VAL(obj)   ((obj)->u.string_val)
#define SCHEME_PTR_VAL(obj)   ((obj)->u.ptr_val)
#define SCHEME_PTR1_VAL(obj)  ((obj)->u.two_ptr_val.ptr1)
#define SCHEME_PTR2_VAL(obj)  ((obj)->u.two_ptr_val.ptr2)
#define SCHEME_SYNTAX(obj)    ((obj)->u.syntax_val)
#define SCHEME_PRIM(obj)      ((obj)->u.prim_val)
#define SCHEME_CAR(obj)       ((obj)->u.pair_val.car)
#define SCHEME_CDR(obj)       ((obj)->u.pair_val.cdr)
#define SCHEME_VEC_SIZE(obj)  ((obj)->u.vector_val.size)
#define SCHEME_VEC_ELS(obj)   ((obj)->u.vector_val.els)
#define SCHEME_CLOS_ENV(obj)  ((obj)->u.closure_val.env)
#define SCHEME_CLOS_CODE(obj) ((obj)->u.closure_val.code)
#define SCHEME_METH_DEF(obj)  ((obj)->u.methods_val.def)
#define SCHEME_METHS(obj)     ((obj)->u.methods_val.meths)

struct Scheme_Method
{
  Scheme_Object *type;
  Scheme_Object *fun;
  struct Scheme_Method *next;
};
typedef struct Scheme_Method Scheme_Method;

typedef struct Scheme_Object *
(Scheme_Prim) (int argc, struct Scheme_Object *argv[]);

typedef struct Scheme_Object *
(Scheme_Syntax) (struct Scheme_Object *form, Scheme_Env *env); // TAA
//(Scheme_Syntax) (struct Scheme_Object *form, struct Scheme_Env *env);

/* error handling */
extern jmp_buf scheme_error_buf;

void scheme_signal_error (char *msg, ...);
void scheme_warning (char *msg, ...);


void scheme_default_handler (void);

#define SCHEME_CATCH_ERROR(try_expr, err_expr) \
  (setjmp(scheme_error_buf) ? (err_expr) : (try_expr))
#define SCHEME_ASSERT(expr,msg) \
  ((expr) ? 0 : (scheme_signal_error(msg), 1))

/* types */
extern Scheme_Object *scheme_type_type;
extern Scheme_Object *scheme_char_type;
extern Scheme_Object *scheme_integer_type, *scheme_double_type;
extern Scheme_Object *scheme_string_type, *scheme_symbol_type;
extern Scheme_Object *scheme_null_type, *scheme_pair_type;
extern Scheme_Object *scheme_vector_type;
extern Scheme_Object *scheme_prim_type, *scheme_closure_type;
extern Scheme_Object *scheme_cont_type;
extern Scheme_Object *scheme_input_port_type, *scheme_output_port_type;
extern Scheme_Object *scheme_eof_type;
extern Scheme_Object *scheme_true_type, *scheme_false_type;
extern Scheme_Object *scheme_syntax_type, *scheme_macro_type;
extern Scheme_Object *scheme_promise_type, *scheme_struct_proc_type;

/* common symbols */
extern Scheme_Object *scheme_quote_symbol;
extern Scheme_Object *scheme_quasiquote_symbol;
extern Scheme_Object *scheme_unquote_symbol;
extern Scheme_Object *scheme_unquote_splicing_symbol;

/* constants */
extern Scheme_Object *scheme_eof;
extern Scheme_Object *scheme_null;
extern Scheme_Object *scheme_true;
extern Scheme_Object *scheme_false;

/* basics */
Scheme_Object *scheme_read (Scheme_Object *port);
Scheme_Object *scheme_eval (Scheme_Object *obj, Scheme_Env *env);
void scheme_write (Scheme_Object *obj, Scheme_Object *port);
void scheme_display (Scheme_Object *obj, Scheme_Object *port);
void scheme_write_string (char *str, Scheme_Object *port);
char *scheme_write_to_string (Scheme_Object *obj);
char *scheme_display_to_string (Scheme_Object *obj);
void scheme_debug_print (Scheme_Object *obj);
Scheme_Object *scheme_apply (Scheme_Object *rator, int num_rands, Scheme_Object **rands);
Scheme_Object *scheme_apply_to_list (Scheme_Object *rator, Scheme_Object *rands);
Scheme_Object *scheme_apply_struct_proc (Scheme_Object *rator, Scheme_Object *rands);
Scheme_Object *scheme_alloc_object (void);
void *scheme_malloc (size_t size);
void *scheme_calloc (size_t num, size_t size);
char *scheme_strdup (char *str);

/* garbage collected heap interface */
/*extern void *GC_malloc (size_t size_in_bytes);
extern int GC_expand_hp (int num_4k_blocks);
*/ //TAA

/* hash table interface */
Scheme_Hash_Table *scheme_hash_table (int size);
void scheme_add_to_table (Scheme_Hash_Table *table, char *key, void *val);
void scheme_change_in_table (Scheme_Hash_Table *table, char *key, void *new_val);
void *scheme_lookup_in_table (Scheme_Hash_Table *table, char *key);

/* constructors */
Scheme_Object *scheme_make_prim (Handle w); //TAA
Scheme_Object *scheme_make_prim (Scheme_Prim *prim);
Scheme_Object *scheme_make_closure (Scheme_Env *env, Scheme_Object *code);
Scheme_Object *scheme_make_cont (jmp_buf buf);
Scheme_Object *scheme_make_type (char *name);
Scheme_Object *scheme_make_pair (Scheme_Object *car, Scheme_Object *cdr);
Scheme_Object *scheme_make_string (char *chars);
Scheme_Object *scheme_alloc_string (int size, char fill);
Scheme_Object *scheme_make_vector (int size, Scheme_Object *fill);
Scheme_Object *scheme_make_integer (int i);
Scheme_Object *scheme_make_double (double d);
Scheme_Object *scheme_make_char (char ch);
Scheme_Object *scheme_make_syntax (Scheme_Syntax *syntax);
Scheme_Object *scheme_make_promise (Scheme_Object *expr, Scheme_Env *env);

/* generic port support */

/******************** TAA: *******************/

SchemeUniversalEnv *scheme_universal_env (Handle project);
extern Scheme_Object *cur_in_port;
extern Scheme_Object *cur_out_port;

/*************************************************/

struct Scheme_Input_Port
{
  Scheme_Object *sub_type;
  void *port_data;
  int (*getc_fun) (struct Scheme_Input_Port *port);
  void (*ungetc_fun) (int ch, struct Scheme_Input_Port *port);
  int (*char_ready_fun) (struct Scheme_Input_Port *port);
  void (*close_fun) (struct Scheme_Input_Port *port);
};
typedef struct Scheme_Input_Port Scheme_Input_Port;

struct Scheme_Output_Port
{
  Scheme_Object *sub_type;
  void *port_data;
  void (*write_string_fun) (char *str, struct Scheme_Output_Port *);
  void (*close_fun) (struct Scheme_Output_Port *);
};
typedef struct Scheme_Output_Port Scheme_Output_Port;

int scheme_getc (Scheme_Object *port);
void scheme_ungetc (int ch, Scheme_Object *port);
int scheme_char_ready (Scheme_Object *port);
void scheme_close_input_port (Scheme_Object *port);
void scheme_close_output_port (Scheme_Object *port);

Scheme_Input_Port *
scheme_make_input_port (
  Scheme_Object *subtype,
  void *data,
  int (*getc_fun) (Scheme_Input_Port*),
  void (*ungetc_fun) (int, Scheme_Input_Port*),
  int (*char_ready_fun) (Scheme_Input_Port*),
  void (*close_fun) (Scheme_Input_Port*)
);

Scheme_Output_Port *
scheme_make_output_port (
  Scheme_Object *subtype,
  void *data,
  void (*write_string_fun) (char *str, Scheme_Output_Port*),
  void (*close_fun) (Scheme_Output_Port*)
);

Scheme_Object *scheme_make_file_input_port (FILE *fp);
Scheme_Object *scheme_make_string_input_port (char *str);
Scheme_Object *scheme_make_file_output_port (FILE *fp);
Scheme_Object *scheme_make_string_output_port (char *str);

extern Scheme_Object *scheme_stdin_port;
extern Scheme_Object *scheme_stdout_port;
extern Scheme_Object *scheme_stderr_port;

/* environment */


void scheme_add_global (char *name, Scheme_Object *val, Scheme_Env *env);
Scheme_Env *scheme_new_frame (int num_bindings);
void scheme_add_binding (int index, Scheme_Object *sym, Scheme_Object *val, Scheme_Env *frame);
Scheme_Env *scheme_extend_env (Scheme_Env *frame, Scheme_Env *env);
Scheme_Env *scheme_add_frame (Scheme_Object *syms, Scheme_Object *vals, Scheme_Env *env);
Scheme_Env *scheme_pop_frame (Scheme_Env *env);
void scheme_set_value (Scheme_Object *var, Scheme_Object *val, Scheme_Env *env);
Scheme_Object *scheme_lookup_value (Scheme_Object *symbol, Scheme_Env *env);
Scheme_Object *scheme_lookup_global (Scheme_Object *symbol, Scheme_Env *env);
extern Scheme_Env *scheme_env;

/* symbols */
Scheme_Object *scheme_intern_symbol (char *name);

/* initialization */
Scheme_Env *scheme_basic_env (void);
void scheme_init_type (Scheme_Env *env);
void scheme_init_list (Scheme_Env *env);
void scheme_init_port (Scheme_Env *env);
void scheme_init_proc (Scheme_Env *env);
void scheme_init_vector (Scheme_Env *env);
void scheme_init_string (Scheme_Env *env);
void scheme_init_number (Scheme_Env *env);
void scheme_init_eval (Scheme_Env *env);
void scheme_init_promise (Scheme_Env *env);
void scheme_init_struct (Scheme_Env *env);

/* misc */
int scheme_eq (Scheme_Object *obj1, Scheme_Object *obj2);
int scheme_eqv (Scheme_Object *obj1, Scheme_Object *obj2);
int scheme_equal (Scheme_Object *obj1, Scheme_Object *obj2);
int scheme_list_length (Scheme_Object *list);
Scheme_Object *scheme_alloc_list (int size);
Scheme_Object *scheme_map_1 (Scheme_Object *(*fun)(Scheme_Object*), Scheme_Object *lst);
Scheme_Object *scheme_car (Scheme_Object *pair);
Scheme_Object *scheme_cdr (Scheme_Object *pair);
Scheme_Object *scheme_cadr (Scheme_Object *pair);
Scheme_Object *scheme_caddr (Scheme_Object *pair);
Scheme_Object *scheme_vector_to_list (Scheme_Object *vec);
Scheme_Object *scheme_list_to_vector (Scheme_Object *list);

/* convenience macros */
#define SCHEME_CHARP(obj)    (SCHEME_TYPE(obj) == scheme_char_type)
#define SCHEME_INTP(obj)     (SCHEME_TYPE(obj) == scheme_integer_type)
#define SCHEME_DBLP(obj)     (SCHEME_TYPE(obj) == scheme_double_type)
#define SCHEME_NUMBERP(obj)  (SCHEME_INTP(obj) || SCHEME_DBLP(obj))
#define SCHEME_STRINGP(obj)  (SCHEME_TYPE(obj) == scheme_string_type)
#define SCHEME_SYMBOLP(obj)  (SCHEME_TYPE(obj) == scheme_symbol_type)
#define SCHEME_BOOLP(obj)    ((obj == scheme_true) || (obj == scheme_false))
#define SCHEME_SYNTAXP(obj)  (SCHEME_TYPE(obj) == scheme_syntax_type)
#define SCHEME_PRIMP(obj)    (SCHEME_TYPE(obj) == scheme_prim_type)
#define SCHEME_CONTP(obj)    (SCHEME_TYPE(obj) == scheme_cont_type)
#define SCHEME_NULLP(obj)    (obj == scheme_null)
#define SCHEME_PAIRP(obj)    (SCHEME_TYPE(obj) == scheme_pair_type)
#define SCHEME_LISTP(obj)    (SCHEME_NULLP(obj) || SCHEME_PAIRP(obj))
#define SCHEME_VECTORP(obj)  (SCHEME_TYPE(obj) == scheme_vector_type)
#define SCHEME_CLOSUREP(obj) (SCHEME_TYPE(obj) == scheme_closure_type)
#define SCHEME_PROCP(obj)    (SCHEME_PRIMP(obj) || SCHEME_CLOSUREP(obj) || SCHEME_CONTP(obj))
#define SCHEME_INPORTP(obj)  (SCHEME_TYPE(obj) == scheme_input_port_type)
#define SCHEME_OUTPORTP(obj) (SCHEME_TYPE(obj) == scheme_output_port_type)
#define SCHEME_EOFP(obj)     (SCHEME_TYPE(obj) == scheme_eof_type)
#define SCHEME_PROMP(obj)    (SCHEME_TYPE(obj) == scheme_promise_type)

/* other */
#define SCHEME_CADR(obj) (SCHEME_CAR (SCHEME_CDR (obj)))
#define SCHEME_CAAR(obj) (SCHEME_CAR (SCHEME_CAR (obj)))
#define SCHEME_CDDR(obj) (SCHEME_CDR (SCHEME_CDR (obj)))

/* constants */
#define SCHEME_MAX_ARGS 256 /* max number of args to function */

//#ifdef __cplusplus
//}
//#endif

#endif /* ! SCHEME_H */
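The header's error handling is built on setjmp and longjmp: scheme_signal_error is expected to jump back to a saved point in scheme_error_buf, and SCHEME_CATCH_ERROR selects between a normal and an error expression. The sketch below shows the same non-local exit technique on a small example. The names demo_error_buf, checked_divide and safe_divide are hypothetical, and the sketch places setjmp in the controlling expression of an if statement rather than inside a conditional expression.

```cpp
#include <cassert>
#include <csetjmp>

// Hypothetical jump buffer; plays the role of scheme_error_buf.
static std::jmp_buf demo_error_buf;

// Signals an error by jumping back to the most recent setjmp on
// demo_error_buf instead of returning a value.
inline int checked_divide(int a, int b) {
    if (b == 0)
        std::longjmp(demo_error_buf, 1);
    return a / b;
}

// Establishes the recovery point, then runs the computation.
// No objects with non-trivial destructors live between the setjmp and
// the longjmp, which keeps the jump well defined in C++.
inline int safe_divide(int a, int b) {
    if (setjmp(demo_error_buf) != 0)
        return -1;  // reached only via longjmp from checked_divide
    return checked_divide(a, b);
}
```

In C++ code that owns resources, exceptions are usually the safer mechanism, since longjmp bypasses destructors; the interpreter view in A.32 already uses an exception (ReadPanic) for its own early exit.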
