USING TEST-DRIVEN DEVELOPMENT IN A PARSE-TREE BASED ON-LINE ASSESSMENT SYSTEM

Li-Ren Chien
Department of Computer Science and Information Engineering, Chung Cheng University, #168, University Rd, Min-Hsing, Chia-Yi, Taiwan, R.O.C.

Daniel J. Buehrer
Department of Computer Science and Information Engineering, Chung Cheng University, #168, University Rd, Min-Hsing, Chia-Yi, Taiwan, R.O.C.

Chin Yi Yang
Department of Information Management, Chung Cheng University, #168, University Rd, Min-Hsing, Chia-Yi, Taiwan, R.O.C.
ABSTRACT
DICE is a parse-tree-based on-line computer-aided assessment system that has been used for several years to help teach computer programming language courses at Hsing Kuo High School in Taiwan. We need a more sophisticated testing mechanism for underachievers. This paper describes the use of test-driven development (TDD) as an extension of our DICE system. The DICE system has been used to assess student-written code, which was graded by comparing its output against instructor-given answers. This extension makes it possible for a TDD test unit to be made by the instructor, by a co-learner, or by the students themselves. We expect the new functionality to prompt underachievers to improve their programming ability.

KEYWORDS

DICE, Test-Driven Development, Test-based Grader, Parse-Tree Based, Automatic Grading, Computer Aided Assessments.
1. INTRODUCTION

Many researchers have argued that assessment is fundamental to education [Sandra, F. et al, 1997]. An automatic assessment system not only saves time and improves grading consistency, but it also gives immediate feedback to students [Edward L. Jones, 2001]. Most grading systems attempt to automate the test-based assessment of student programming assignments by comparing the output against an instructor's answers [Christopher, D. et al, 2005]. Such systems, including our DICE system, try to correct students' shortcomings by focusing first and foremost on the correctness of their output [Stephen H., 2003].

DICE is a parse-tree-based on-line automatic assessment system for an environment with test-based assignment tutoring and problem solving. At the moment, DICE has been used in Hsing Kuo High School's computer programming course for about one year. Over 1,000 students have used the system, and they have acquired programming skills more quickly than students who were taught by traditional teaching methods in the past. However, we found some well-known problems of test-based graders, which caused the underachievers to give up on DICE. We therefore need a more sophisticated testing mechanism for the underachievers.
This paper describes a TDD extension of DICE. Section 2 briefly describes DICE, our computer-aided assessment system. Section 3 describes the TDD model in DICE and our view of TDD in education. Section 4 gives a C example for our DICE TDD model. Section 5 concludes with areas for future work.
2. AN OVERVIEW OF DICE

The DICE project sprang from the requirements of a teaching assistant's job in a programming language course [Li-Ren Chien, et al, 2007]. Previously, we usually asked students to hand in their programming assignments weeks after class. This policy caused a serious plagiarism problem, so we decided to implement an on-line assessment system that could judge the students' code automatically and immediately. DICE was implemented as an OS-independent, distributed, client-server, parse-tree-based automatic assessment system. We also plan to build an automatic grading and intelligent tutoring system based on parse trees.
Figure 1. The student client
Figure 2. Grading result
Figure 3. Monitoring a particular client
Figure 4. The instructor client
The teacher starts a testing plan for a programming language (C or Java) by making a problem set. He is asked to organize his problem descriptions, input data sets, and standard output into a specified directory. Each student's data is stored in a specified directory as well. The students' data can be stored in text files, Excel spreadsheets, or a database that can be reached through JDBC. After the teacher starts the server at a particular port, the students can log in to the judge server over an IP network, and so can other teachers. Servers can be deployed on the same host by using different ports, or on different hosts by using the same port. A load balancer distributes the clients to the different hosts based on the load on each host.

At the server side, the system manager can monitor the actions of the whole system. He can open a dialog with each client, supervise what the client is doing, or terminate the client's session. A teacher can log in as a client and, after passing the teacher validation, gets more rights than students. A teacher can use all of the functions of the server from any client computer. He can also obtain the parse tree of each student's answer. Throughout the term, the teacher can merge the testing results over the semester into an Excel file.

A student logs in and is assigned to a server after a course unit. Several problems are waiting for the student, and he is asked to solve them within a stipulated time range. After login, he can look over the problem set and send his answer as a source file or an executable file. The system judges his answer by executing the executable file, or by recompiling his code and executing it. The student's program is fed with the input data set that was prepared by the teachers. The system compares the output of the student's program with the standard output file to decide the score that he gets. The result is immediately sent back to the student.

To keep the system lightweight and database-free, all information is stored in files. The system information, such as the examination questions from the teachers and the answers and scores of the students, is organized with plain text files and directories. For some built-in environments we also provide a JDBC connection to traditional databases for student information.

The plagiarism problem mentioned at the beginning of this section should be solved by the system. We think that the key is providing rapid, concrete, and immediate feedback through an automated assessment tool to which students can submit their code. Because most students will be concerned only with their own problem solving, the probability of cheating comes down to that of traditional testing. We still provide four levels of plagiarism detection to catch cheating actions such as answer resending, adding white space, variable renaming, semantic copying, and so on. We used SableCC [Gagnon, Hendren L.J. 1998], a parser tool developed by Etienne Gagnon. An Abstract Syntax Tree (AST) is built by a C or Java parser for each student's program. The AST is translated into a Reverse Polish Notation (RPN) form for evaluation. Once a student's answer has been translated into an RPN string, we can do pattern matching on it. Because the RPN was translated from the AST, we can treat it as the semantic symbols of the original string. Some screenshots of DICE are shown in Figures 1 to 4.
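To illustrate the grading step described above, the following sketch compares a student program's output with the instructor's standard output file to obtain a correctness ratio for one test case. The file names, the shell command, and the line-by-line comparison policy are assumptions for illustration, not DICE's actual implementation.

    /* Minimal sketch of output-based grading: run the student program on the
     * instructor's input file and compare its output, line by line, with the
     * standard output file.  File names and the shell command are illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Compare two text files line by line; return the ratio of matching lines. */
    double correctness_ratio(const char *student_out, const char *standard_out) {
        FILE *fs = fopen(student_out, "r");
        FILE *fr = fopen(standard_out, "r");
        if (!fs || !fr) {
            if (fs) fclose(fs);
            if (fr) fclose(fr);
            return 0.0;
        }
        char a[1024], b[1024];
        int total = 0, matched = 0;
        while (fgets(b, sizeof b, fr)) {          /* one line of expected output */
            total++;
            if (fgets(a, sizeof a, fs) && strcmp(a, b) == 0)
                matched++;
        }
        fclose(fs);
        fclose(fr);
        return total ? (double)matched / total : 0.0;
    }

    int main(void) {
        /* Feed the prepared input data set to the student's executable. */
        system("./student_program < input1.txt > student1.out");
        double r = correctness_ratio("student1.out", "standard1.out");
        printf("correctness ratio for test case 1: %.2f\n", r);
        return 0;
    }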
In summary, we have implemented an automated assessment system for test-based assignment tutoring. With this system, we can push the students of a computer language course to put more effort into improving their coding ability, while significantly reducing the burden of grading the programs.
3. TDD MODEL IN DICE

After running DICE, our parse-tree-based grader, at Hsing Kuo High School for years, we found some well-known problems of test-based graders. These caused the underachievers to give up on the DICE system. One problem is that only clearly defined questions with a completely specified interface can be used. This leads students to focus on output correctness first and foremost, and it does not encourage or reward good performance in testing [Stephen H., 2003]. Another perceived shortcoming is that such inflexibility prevents the assessment of more complex questions [Christopher, D. et al, 2005]. When a complex question arrived, we found that some underachievers just sat in front of their computers and waited for the bell to ring. So we need a more sophisticated mechanism to help underachievers.

Over the past five years, the idea of including software testing practices in programming assignments within the undergraduate computer science curriculum has grown from a fringe practice to a recurring theme [Stephen H. et al, 2007]. Some researchers may argue that starting too early with a test-first approach can
lead to the “paralysis of analysis” [Don Colton. et al, 2006]. We believe that TDD with instructor-made test suites will help overcome the shortcomings of test-based graders.
3.1 TDD in an Automatic Grading System

Some systems have already integrated TDD into a computer-aided assessment system through xUnit [Allowatt, A. 2004], assert-like functions [David S. Janzen. et al, 2006], or a Web-based center [Edwards, S.H. 2003]. We introduce a more comprehensive model of TDD in an automatic grading system for DICE.
Figure 5. The TDD model in DICE
As Figure 5 shows, a programming assignment is given after each teaching unit. The instructor makes a test plan consisting of problems (also called test cases). Each test case is composed of test-based-grader-like data sets and TDD-like data sets.
3.2 Test-based Data Sets vs. TDD Data Sets

Test-based-grader-like data sets in DICE are composed of input files and standard output files. A test-based grader provides a few hand-made, hand-verified pairs of files for each test case [Don Colton. et al, 2006]. One file is fed as input to the student program, and the output results are collected and compared with the desired output. If there are $n$ test cases in a teaching unit, the correctness score can be expressed as follows:

$$C = \sum_{i=1}^{n} s_i \cdot r_i,$$

where $s_i$ and $r_i$ are the score and the correctness ratio of test case $i$. Some graders, like those for ACM contests, may set the scoring policy for the competition as

$$C = \sum_{i=1}^{n} s_i \cdot \delta_{1, r_i},$$

where $\delta_{1, r_i}$ is 1 when $r_i = 1$ and 0 otherwise.
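The two scoring policies can be written out as a short sketch; the score and ratio arrays below are illustrative values, not taken from the paper.

    /* Sketch of the two scoring policies described above.
     * s[i] is the score assigned to test case i, r[i] its correctness ratio. */
    #include <stdio.h>

    /* DICE-style weighted score: C = sum_i s_i * r_i */
    double weighted_score(const double s[], const double r[], int n) {
        double c = 0.0;
        for (int i = 0; i < n; i++)
            c += s[i] * r[i];
        return c;
    }

    /* ACM-contest-style all-or-nothing score: C = sum_i s_i * delta(1, r_i) */
    double all_or_nothing_score(const double s[], const double r[], int n) {
        double c = 0.0;
        for (int i = 0; i < n; i++)
            c += (r[i] == 1.0) ? s[i] : 0.0;
        return c;
    }

    int main(void) {
        double s[] = {30, 30, 40};        /* per-case scores (illustrative)    */
        double r[] = {1.0, 0.5, 1.0};     /* per-case correctness ratios       */
        printf("weighted: %.1f\n", weighted_score(s, r, 3));            /* 85.0 */
        printf("all-or-nothing: %.1f\n", all_or_nothing_score(s, r, 3)); /* 70.0 */
        return 0;
    }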
We leave this elasticity of scoring as a configurable function of DICE, to allow for grading variety.

The TDD data sets in DICE consist of test suites that map to each test case. A test suite is composed of test units, and a test unit represents a conceptual unit for solving a test case. Let U_ij be test unit j of test case i, where i = 1..n, j = 1..s(i), and s(i) is the number of test units for problem i. A test unit is composed of test items. Each test item is an extreme Boolean expression that tests a test unit. Extreme testing makes sure that every function in a test unit works correctly and that it does nothing more than it needs to do.
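The hierarchy described above can be pictured, for example, as nested C structures. This is only a sketch: the paper does not specify DICE's internal representation, and the type and field names are invented for illustration.

    /* Sketch of the TDD data-set hierarchy described above:
     * a test case owns a test suite, a test suite is a sequence of test units,
     * and each test unit is checked by a list of Boolean test items. */
    #include <stdbool.h>

    typedef struct {
        const char *expression;     /* e.g. "handleEvenOdd(1) == 4"           */
        bool (*check)(void);        /* evaluates the expression               */
    } TestItem;

    typedef struct {
        const char *name;           /* conceptual unit, e.g. "handleEvenOdd"  */
        TestItem   *items;          /* extreme Boolean tests for this unit    */
        int         item_count;
    } TestUnit;                     /* U_ij: test unit j of test case i       */

    typedef struct {
        TestUnit *units;            /* s(i) test units for test case i        */
        int       unit_count;
    } TestSuite;

    typedef struct {
        const char *problem;        /* problem description                    */
        TestSuite   tdd;            /* the TDD-like data set                  */
        /* input/standard-output files of the test-based data set omitted     */
    } TestCase;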
3.3 TDD of Programming Skill vs. TDD of Concept

We denote the set of concepts or functions in a test unit P as Γ(P). Let P and Q be two test units of a test suite. We define:

1) If Γ(P) ⊇ Γ(Q), then P entails Q semantically on a test unit, denoted by P ╞Γ Q. Similarly, we define P ╡Γ Q.

2) If |Γ(P)| > |Γ(Q)|, then P's semantics are greater than Q's, denoted by P >Γ Q. Similarly, we define P =Γ Q and P <Γ Q.

Analogously, we denote the set of concepts in a test case C as Ω(C). If |Ω(C)| > |Ω(D)|, then C has more information than D, denoted by C >Ω D. Similarly, we define C ╞Ω D, C =Ω D and C <Ω D. The sequence of test units in a test suite is denoted by Y, and a test suite in which ∀j (U_Yj ╞Γ U_Y(j-1)) holds is a well-defined TDD test suite, denoted YΓ╞.

9) If the sequence of test units in a test suite Y has ∀j (U_Yj >Γ U_Y(j-1)), then we say it is a TDD-like test suite, denoted YΓ>.

In other words, a TDD-like test suite in DICE has more concepts in each unit than in its antecedent. Obviously, YΓ╞ implies YΓ>. The sequence of test cases in a teaching unit is denoted by X. We have

10) $X = \bigcup_{i=1}^{n} \Omega(x_i)$, where $n$ is the number of test cases and $\Omega(x_i) = \bigcup_{j=1}^{s(i)} \Gamma(U_{ij})$.

11) If the sequence of test cases in a teaching unit X has ∀i (x_i ╞Ω x_(i-1)), then we say that such a unit is a well-defined TDD teaching unit, denoted by XΩ╞.
In other words, a well-defined TDD teaching unit is one where the concepts in each test case are extensions of those in its antecedent. Obviously, XΩ╞ implies XΩ>.

12) For the sequence of test cases in a teaching unit X, let S_j be the set with elements X_0 to X_(j-1) (i.e. {X_0, X_1, ..., X_(j-1)}) and let P_j be the power set of S_j. If we have ∀j ∃x (x ∈ P_j, x ╞Γ X_(j-1)), then X is said to be a TDD-modular teaching unit, denoted XΩm.

13) If the sequence of test cases in a teaching unit X has ∀i (x_i >Ω x_(i-1)), then we say it is a TDD-like teaching unit, denoted XΩ>.
In other words, a TDD-like teaching unit has more concepts in each test case than in its antecedent.

In our opinion, the TDD model in DICE can be organized along two dimensions: one for the test cases in a teaching unit (XΩ) and one for the test units in a test case (YΓ). The XΩ axis represents the coupling degree of the test-case sequence in a teaching unit, while the YΓ axis represents the coupling degree of the test-unit sequence in a test case. XΩ is concerned more with the concepts of a teaching unit, whereas YΓ is concerned more with programming skills.
Table 1. TDD classification by Ω and Γ

YΓ \ XΩ | XΩø               | XΩ>                             | XΩm                                | XΩ╞
YΓø     | Exploration       | Concept-like                    | Concept-modular                    | Concept-Instruction
YΓ>     | Skill-like        | Like                            | Concept-modular, Skill-like        | Concept-Instruction, Skill-like
YΓm     | Skill-Modular     | Concept-like, Skill-Modular     | Modular                            | Concept-Instruction, Skill-Modular
YΓ╞     | Skill-Instruction | Concept-like, Skill-Instruction | Concept-modular, Skill-Instruction | Instruction
As Table 1 shows, we divided the TDD plane in DICE into nine areas by using three critical factors, denoted by ø for “none”, > for “like”, and ╞ for “completed”. For example, the XΩø–YΓø mode represents a teaching-unit assignment with no relationship between test cases and none between test units. The XΩ╞–YΓ╞ mode means an assignment has well-defined TDD test cases with well-defined TDD test units. Most typical test-based automatic graders can support the XΩ–YΓø modes by rearranging the problem sets. DICE will support all nine modes.
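To make the two-dimensional classification concrete, consider a small hypothetical test suite; the concept sets Γ(U_j) below are invented purely for illustration and do not come from the paper.

    % A hypothetical test suite Y with three test units U_1, U_2, U_3:
    \Gamma(U_1) = \{\text{arithmetic}\}
    \Gamma(U_2) = \{\text{arithmetic},\ \text{conditional}\}
    \Gamma(U_3) = \{\text{arithmetic},\ \text{conditional},\ \text{loop}\}
    % Each set contains its antecedent, so U_j \models_\Gamma U_{j-1} for every j:
    % Y is a well-defined TDD test suite (Y_{\Gamma\models}), and hence also TDD-like (Y_{\Gamma>}).
    % If instead \Gamma(U_2) = \{\text{conditional},\ \text{I/O}\}, then |\Gamma(U_2)| > |\Gamma(U_1)|
    % but \Gamma(U_2) \not\supseteq \Gamma(U_1), so Y would only be TDD-like (Y_{\Gamma>}).

If, in addition, the test cases of the teaching unit have no such relationship among themselves (XΩø), this test suite places the assignment in the Skill-Instruction cell of Table 1.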
4. AN EXAMPLE WITH C

We take the famous 3n+1 problem from the ACM programming contest (http://acm.uva.es/p/v1/100.html) as an example of the TDD skill-instruction mode (YΓ╞–XΩø) in DICE. The students are asked to find the maximum cycle length of the 3n+1 sequence between two numbers. There are two parts to the problem: the problem description and the test units.
4.1 The Problem Description

The problem description, made by an instructor, consists of the background, the problem, the input, the output, an input sample, and an output sample. In traditional DICE, the instructor supplies three files to the system. Like most test-based automatic grading systems, DICE publishes the problem description to the students and asks them to turn in their whole program for assessment. We do not show the 3n+1 problem description here; its content can easily be found on the Web.
4.2 The Sample Test-Driven Code

Table 2. A test unit

Serial Number: 1
Unit name: handleEvenOdd
Description: input an integer n; if n is odd then return 3*n+1, else return n/2
Prototype: int handleEvenOdd(int n)
Default Return: -999
Test items:
  1. handleEvenOdd(1) == 4
  2. handleEvenOdd(2) == 1
  3. handleEvenOdd(3) == 10

Serial Number: 2
Unit name: getCycLength
Description: input an integer n; return an integer giving the cycle length of n
Prototype: int getCycLength(int n)
Default Return: -999
Test items:
  1. getCycLength(22) == 16
  2. getCycLength(2) == 2

Serial Number: 3
…
The test cases in Table 2 are translated by DICE into the following C TDD sample code, which is delivered to the student.

    // Created by DICE
    // You should not modify any of the following code until the end of function testFirst()
    #include <stdio.h>
    #include <stdlib.h>

    #define jassert(expression,mainNo,subNo) \
        ((void) ((expression) ? \
            printf("1,%d,%d,%s Passed.\n",mainNo,subNo,#expression) : \
            printf("0,%d,%d,%s Failed.\n",mainNo,subNo,#expression)))

    void testFirst();
    int handleEvenOdd(int n);
    int getCycLength(int n);

    int main(int argc, char *argv[]) {
        testFirst();
        system("Pause");   // only this line can be replaced
        return 0;
    }

    void testFirst(){
        // Testing handleEvenOdd
        jassert(handleEvenOdd(1) == 4,1,1);
        jassert(handleEvenOdd(2) == 1,1,2);
        jassert(handleEvenOdd(3) == 10,1,3);
        // Testing getCycLength
        jassert(getCycLength(22) == 16,2,1);
        jassert(getCycLength(2) == 2,2,2);
    }

    // You should not modify any previous code
    // You should modify the following code to pass the testing phase
    int handleEvenOdd(int n){
        return -999;   // default return; to be replaced by the student
    }

    int getCycLength(int n){
        return -999;   // default return; to be replaced by the student
    }

Notice the use of the jassert() macro, which was adapted from the standard C library's assert(). Many languages contain a standard mechanism for executing assertions, and the assert approach minimizes the barriers to introducing unit testing [David S. Janzen. 2006]. The default return values of the last two functions in the C code above make all of the Boolean expressions in jassert() fail. The student now has an executable C program. As Figure 6 shows, he/she can compile and run the program and get the result immediately.
Figure 6. The result of executing the TDD sample code
Now the student focuses on writing functions to make those test cases pass. For example, if a student corrects his handleEvenOdd() as in the following code, he will get the result shown in Figure 7.

    int handleEvenOdd(int n){
        return n%2==0 ? n/2 : 3*n+1;
    }
Figure 7. The result of executing the TDD sample code after coding some functions
This is what is called TDD in DICE.
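For completeness, one way the student might finish the remaining unit is sketched below. It plugs into the generated skeleton above and assumes the usual 3n+1 convention that the cycle length counts both the starting number and the final 1; it is not part of the paper's listing.

    /* A possible completion of the second test unit: the cycle length of n
     * counts every number produced by repeatedly applying handleEvenOdd()
     * until 1 is reached, including n itself and the final 1. */
    int getCycLength(int n){
        int length = 1;               /* count n itself                     */
        while (n != 1) {
            n = handleEvenOdd(n);     /* 3n+1 if odd, n/2 if even           */
            length++;
        }
        return length;                /* getCycLength(22) == 16, (2) == 2   */
    }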
5. CONCLUSIONS AND FUTURE WORK

In this paper we have described the introduction of the TDD model into DICE. Unlike a test-based system, which only supplies data to test the correctness of a test case's output, the TDD model supports two-dimensional TDD plans for tuning the test plans of DICE. Such a model gives more elasticity in assigning programming exercises to different types of students. Obviously, an XΩ–YΓ╞ or XΩ–YΓ> mode can supply the students with an executable frame that will encourage them to do the exercises enthusiastically. This is helpful for solving complex problems, especially for underachievers.

We have defined a systematic and symbolic model for TDD in Section 3.3, which makes it more suitable for developing machine learning and reasoning mechanisms with artificial intelligence methods. At the same time, we have proposed a scoring model for TDD in DICE, based on both theory and implementation. In particular, we consider that there are three ways to make the TDD test units: a TDD test case can be made by the instructor, by classmates, or by the students themselves. An instructor-made test case may be good for teaching programming skills and concepts, while student-made ones are better for learning TDD skills. A
classmate-made one can support pair programming and collaboration. The scoring model should distinguish the differences between those modes precisely; we leave this as future work.

Another piece of future work is to distinguish different kinds of students and to design different TDD plans for them. Educational psychologists have long been of the opinion that different people learn in different ways [Lynda Thomas. et al, 2002]. Researchers have demonstrated the importance of learning style for improving teaching performance [Bostrom. et al, 1990] [Snow, R. E., 1986]. These findings suggest that the same training method may not be suitable for every novice. To best improve teaching performance, we will try to find the relationship between our TDD model plans and different kinds of student learning styles. We will use Kolb's learning style inventory (KLSI) to classify the different kinds of learning styles [Kolb and Fry, 1975]. Referring to Kolb's learning styles, we can expect that TDD-complete-oriented teaching methods will do more to help students inclined toward a concrete-experience learning style. Next, we will use statistics and data mining to precisely cross-tabulate the four kinds of learning styles with the nine kinds of teaching modes in our TDD model. Finally, we will develop a TDD-oriented system for students with different learning styles.
REFERENCES

Allowatt, A. and Edwards, S.H. 2004. IDE support for test-driven development and automated grading in both Java and C++. Proc. Eclipse Technology Exchange (eTX) Workshop at OOPSLA, ACM, October 2005, pp. 100-104.
Bostrom, R.P. 1990. The importance of learning style in end-user training. MIS Quarterly, 14(1), pp. 101-119.
Christopher, D., David, L. and James, O. 2005. Automatic test-based assessment of programming: a review. ACM Journal of Educational Resources in Computing, Vol. 5, No. 3, September 2005, Article 4.
Christopher G. Jones. 2004. Test-driven development goes to school. Journal of Computing Sciences in Colleges, 20(1), pp. 220-231.
David S. Janzen and Hossein Saiedian. 2006. Test-driven learning: intrinsic integration of testing into the CS/SE curriculum. SIGCSE '06, March 1-5, 2006, Houston, Texas, USA.
Don Colton, Leslie Fife and Andrew Thompson. 2006. A web-based automatic program grader. Proc. ISECON 2006, v23.
Edward L. Jones. 2001. Grading student programs - a software testing approach. Journal of Computing in Small Colleges, 16(2), pp. 195-192.
Edwards, S.H. 2003. Improving student performance by evaluating how well students test their own programs. ACM Journal of Educational Resources in Computing, 3(2), pp. 1-24, Sept. 2003.
Gagnon, E. and Hendren, L.J. 1998. SableCC - an object-oriented compiler framework. Proceedings of TOOLS 26: Technology of Object-Oriented Languages.
Kolb, D.A. and Fry, R. 1975. Toward an applied theory of experiential learning. In Theories of Group Process, C.L. Cooper (ed.), John Wiley and Sons, Inc., New York, NY, pp. 33-54.
Kolb, D.A. 1976. The Learning Style Inventory Technical Manual. McBer and Company, Boston, MA.
Li-Ren Chien, D. Buehrer and Chin Yi Yang. 2007. DICE, a parse-tree based on-line assessment system for a programming language course. The Third Conference on Computer and Network Technology.
Sandra, F., Greg, M. and Nils Toms. 1997. Automatic assessment of elementary Standard ML programs using Ceilidh. Journal of Computer Assisted Learning, 13, pp. 99-108.
Snow, R.E. 1986. Individual differences in the design of educational programs. American Psychologist, 41(10), October 1986, pp. 1029-1039.
Stephen H. Edwards. 2003. Improving student performance by evaluating how well students test their own programs. ACM Journal of Educational Resources in Computing, 3(3), Article 1.
Thomas, L., Ratcliffe, M., Woodbury, J. and Jarman, E. 2002. Learning styles and performance in the introductory programming sequence. Proceedings of the 33rd SIGCSE Technical Symposium, ACM, 2002.