National Conference on Software Engineering and Computer Systems 2007, organized by Universiti Malaysia Pahang, Pahang, Malaysia.

The Design of an Automated C Programming Assessment Using Pseudo-code Comparison Technique

Khirulnizam Abd Rahman (1), Syarbaini Ahmad (2), Md Jan Nordin (3)
[email protected], [email protected], [email protected]
(1, 2) Faculty of Technology and Information Science, Kolej Universiti Islam Antarabangsa Selangor (KUIS), Bandar Seri Putera, Bangi, 43000 Kajang, Selangor
(3) Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia (UKM), 43200 Bangi, Selangor

Abstract — Automated programming assessment is a computer-aided approach to checking and grading students' programming exercises without the hassle of doing it manually. This work is an attempt to assess programming exercises using one of the static analysis approaches: non-structural similarity analysis through pseudo-code comparison. The application generates pseudo-code for the students' C programming exercises and for the solution models (answer schemes) provided by the programming instructor. At the end of the process, the students' pseudo-code and the pseudo-code of the solution models are compared, and the similarity percentage is taken as the mark. This paper discusses only the design of the application. Since the development follows an object-oriented approach, the design is represented using the Unified Modeling Language (UML). Each class and its relationships with the other classes are elaborated.

Keywords: automated programming assessment, pseudo-code generator, pseudo-code comparison, static analysis, non-structural similarity.

INTRODUCTION

An automated programming assessment system is a computer-aided application that checks, evaluates, or even grades programming exercises. The main reason for developing any automated programming assessor or grader is to help programming instructors check and evaluate their students' performance in programming courses. Since enrollment in most programming subjects is high, and many exercises or assignments need to be given to the students [8], the instructors need a helping hand to reduce their burden. In this case, an automated programming assessment system is preferable to hiring tutors or demonstrators. There are two major approaches to automated programming assessment.
The first is the dynamic approach, which requires executing the program in order to determine its correctness. The second is the static approach, whereby the program is assessed without being executed [1]. This research develops another assessment method

categorized under the static analysis approach. The method is a non-structural similarity analysis [5]. The assessment is done by translating the students' and the instructor's source code into pseudo-code, which is then compared to find a similarity percentage. This paper focuses on the design of the application. The whole process is summarized in a flow chart. Since the application is developed with an object-oriented programming methodology, a UML class diagram is used to visualize the objects' interactions and relationships, and a use case diagram shows the interaction between the application and the users involved.

PREVIOUS WORKS

Many methods fall under the static analysis approach to automated programming assessment. Ala-Mutka [1] lists coding style assessment, programming error detection, software metrics measurement, design assessment, keyword detection and plagiarism detection among them. Truong et al. [4] added structural similarity analysis to the list, and Norshuhani et al. [5] added non-structural similarity analysis to complement Truong's suggestion.

Environment for Learning Programming (ELP)

In the research to develop the Environment for Learning Programming (ELP), Truong et al. [4] use structural similarity analysis as part of the static analysis assessment to evaluate students' programming answers. The structural similarity analysis is done by converting the program source code into an abstract pseudo-code, a representation of the basic algorithmic structure of the program. The student's abstract representation is then compared to the abstract representation generated from the instructor's solution models (the answer scheme). Since the abstract is very simple, the proposed method is only suitable for assessing simple programming exercises.
More complicated programming exercises will produce variation in the students' answers, so the instructors need to provide several different answer schemes in order to cover all the possible answers.


Web-based Automatic Grader System (WAGS) for Programming Exercises

WAGS is another APAS that concentrates solely on static analysis assessment. It is being developed by a group of researchers from Universiti Teknologi Petronas to automatically grade programming exercises in Visual Basic, C and Java [5]. The system is capable of comparing programming source code submitted by the students with the answer schemes provided by the instructor. The instructor needs to provide more than one answer scheme in order to cover all the possible answer variations by the students. A student's answer is compared to all of the answer schemes provided, and the highest mark from the comparisons becomes the mark for that answer.

Fig. 1. Solution proposed by Truong et al. [4] for the AST comparison.

Fig. 1 shows how the whole process is done in the static analysis assessment as part of the ELP. The student's and the instructor's programs are translated into ASTs (abstract syntax trees) and then marked up with XML tags. The ASTs are then compared to decide whether they match or not. Refer to Fig. 2 for a sample of the generated abstract pseudo-code.

Fig. 3 is provided by the researchers of WAGS to explain all the processes involved in the automatic grading of the students' programming assignments.

Fig. 3. Solution proposed by Norshuhani et al. [5].

One weakness can be observed in the comparison process of WAGS. In programming, variables may be declared anywhere in the program, as long as each is declared before it is used. In our opinion, it is better to separate all the variable declarations from the main program and compare the student's variable declarations with the variable declarations from the answer scheme. Done this way, we believe the comparison result will be better.

Fig. 2. Sample of AST generated by ELP.

According to the authors, the comparison process produces only a yes (for matching structures) or a no (for non-matching structures). This is the main weakness of the system: the result is all-or-nothing. There should be a percentage value to indicate how similar the student's program structure is to the structure of the instructor's answer scheme.
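The difference between an all-or-nothing match and a percentage score can be illustrated with a short sketch (our illustration, not ELP's actual code), using Python's difflib as a stand-in similarity measure:

```python
from difflib import SequenceMatcher

scheme  = "read num1\nread num2\nsum = num1 + num2\nprint sum"
student = "read num1\nread num2\nsum = num2 + num1\nprint sum"

# An ELP-style structural check yields only yes or no:
exact_match = (scheme == student)  # False: one swapped operand fails the whole answer

# A percentage gives partial credit for a mostly-correct answer:
similarity = SequenceMatcher(None, scheme, student).ratio() * 100

print(exact_match)      # False
print(similarity > 90)  # True: the two texts differ in only a few characters
```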

THE SUMMARY OF THE WHOLE PROCESS

The illustration in Fig. 4 shows the flow of the major processes involved. There are two user categories: students and instructors. The instructors prepare all the questions and key them into the system. For each question, the instructor needs to prepare all the possible solutions. This is to provide a better similarity


percentage between the students' answers and the schemes provided.

Fig. 4. The whole process as a flowchart.
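The grading flow in Fig. 4 reduces to a compare-against-every-scheme loop that keeps the maximum. A minimal sketch under assumed names (grade, to_pseudocode and similarity_percent are our hypothetical stand-ins for the real classes described later):

```python
from difflib import SequenceMatcher

def grade(student_source, scheme_sources, to_pseudocode, similarity_percent):
    """Translate the student's answer and every answer scheme to pseudo-code,
    compare against each scheme, and keep the highest percentage as the mark."""
    student_pc = to_pseudocode(student_source)
    return max(similarity_percent(student_pc, to_pseudocode(s)) for s in scheme_sources)

# Toy stand-ins for the real translator and comparator:
identity = lambda src: src
ratio = lambda a, b: SequenceMatcher(None, a, b).ratio() * 100

mark = grade("print sum", ["print total", "print sum"], identity, ratio)
print(mark)  # 100.0: the second scheme matches exactly
```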



The students' source code is tokenized and translated into pseudo-code, and the same is done for all of the source code schemes provided by the instructor. Next, the student's pseudo-code is compared to one of the pseudo-code schemes, and the similarity percentage is calculated and stored in the application. The process is repeated with the same student's pseudo-code compared against the next pseudo-code scheme. Since each comparison may produce a different similarity percentage, there is a step to find the highest one; the highest percentage is chosen as the mark for the student's answer. This mark is stored and displayed to the instructor.

USE CASE DIAGRAM

The use case diagram visualizes the roles of the users involved in the system [3]. The students and the instructors are the users (actors), represented by stick figures. The students access the system through a web-based interface to read the questions and upload source code files as their answers. The instructor also uses the web-based interface to create the questions for the exercises and to upload all the answer schemes for each question. The instructor then activates the exercise so the students can access the exercise page and answer the questions. Finally, the instructors view all the results and finalize the marks.

Fig. 5. The use case diagram.

CLASS DIAGRAM

There are four main classes in the application: the interface (input and output), LexicalAnalyzer, PseudocodeGenerator, and PseudocodeComparator. The other classes are supporting classes: Token, TokenList, Keywords and VariableSeparator. Each class has its own function. The complete class diagram, with the list of all attributes and methods defined in each class, can be found in Appendix A.



The Interface Class

This class is the mediator between the users and the application. The instructors use this interface to create an exercise and all of its questions. They also need to provide all the answer schemes for the questions.

The students use this interface to gain access to the exercises prepared by the instructors, reading the questions and uploading source-code files to answer them (the assignments). All of these inputs (the instructor's questions and answer schemes, and the students' answers) are stored in the application to be used later by the next class.

Fig. 6. The class diagram.

The LexicalAnalyzer Class

This class receives all the source-code files (the instructor's answer schemes and the students' answers) and generates the token list. It is supported by three other classes: Token, Keywords and TokenList. Token is the basic node of the token linked-list data structure. Keywords assists LexicalAnalyzer in the tokenizing process; it contains the list of all the keywords defined in the token specification (Appendix B) and identifies whether a token is a keyword or not.

What actually happens in this class? The source code is processed character by character until a word or a specific symbol is found. If a word is found, it is compared to the list of keywords in the Keywords class. If a match is found, the token is categorized as a keyword token; if no match is found, the token is categorized as a variable. We also propose a variable renaming procedure to standardize all the variables used, which overcomes the problem of variable name variation.

Example: suppose there is this program segment:

float number1, number2, number3;
float sum, average;
sum=number1+number2+number3;
average=sum/3;

Through the variable renaming procedure, these statements are converted into:

float v1, v2, v3;
float v4, v5;
v4=v1+v2+v3;
v5=v4/3;

If a token is identified as an operator, it is categorized according to the definitions in Appendix B, the grammar specification. Each new token is stored in a node (using the Token class) and attached as the next node in the token linked list. The TokenList class also provides a facility to insert a new token node and a function to traverse the sequential token list. The linked list of tokens created in this class is then sent to the next class, PseudocodeGenerator.

The PseudocodeGenerator Class

As the name suggests, this class converts the source code (in the form of a token list) from the previous class into pseudo-code. We use the pseudo-code specification suggested by Robertson [7]. Each statement defined in the grammar specification (Appendix B) has its own method in the class that acts as its pseudo-code converter. The pseudo-code produced by this class is a string, with each line separated by a newline ('\n') character. The pseudo-code string is then stored in the system to be used later for comparison.
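The variable renaming procedure described under LexicalAnalyzer can be sketched as follows (our illustration, not the system's actual code): scan identifiers in order and assign v1, v2, … on first sight, leaving keywords untouched.

```python
import re

# C keywords and type names that must NOT be renamed (a subset, for the sketch).
KEYWORDS = {"float", "int", "char", "double", "long", "return", "if", "else", "while"}

def rename_variables(source: str) -> str:
    """Replace every identifier that is not a keyword with v1, v2, ...
    in order of first appearance, so differently named but structurally
    identical programs normalize to the same text."""
    mapping = {}
    def repl(m):
        word = m.group(0)
        if word in KEYWORDS:
            return word
        if word not in mapping:
            mapping[word] = f"v{len(mapping) + 1}"
        return mapping[word]
    return re.sub(r"[A-Za-z_]\w*", repl, source)

code = ("float number1, number2, number3;\n"
        "float sum, average;\n"
        "sum=number1+number2+number3;\n"
        "average=sum/3;")
print(rename_variables(code))
# float v1, v2, v3;
# float v4, v5;
# v4=v1+v2+v3;
# v5=v4/3;
```

This reproduces the paper's example exactly: number1..number3 become v1..v3, then sum and average become v4 and v5.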

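The PseudocodeGenerator's statement-by-statement conversion can be sketched like this (our own rough illustration; the statement patterns and the Robertson-style wording "Read", "Set … to …", "Display" are assumptions, not the paper's actual converter methods):

```python
import re

def statement_to_pseudocode(stmt: str) -> str:
    """Map a single simple C statement to one pseudo-code line."""
    stmt = stmt.strip().rstrip(";")
    m = re.match(r'scanf\s*\(\s*"[^"]*"\s*,\s*&(\w+)\s*\)', stmt)
    if m:                                  # scanf("%f", &x)  ->  Read x
        return f"Read {m.group(1)}"
    m = re.match(r'printf\s*\(\s*"[^"]*"\s*(?:,\s*(\w+))?\s*\)', stmt)
    if m:                                  # printf("%f", x)  ->  Display x
        return f"Display {m.group(1) or 'message'}"
    m = re.match(r"(\w+)\s*=\s*(.+)", stmt)
    if m:                                  # x = expr  ->  Set x to expr
        return f"Set {m.group(1)} to {m.group(2)}"
    return stmt                            # fallback: keep the statement as-is

lines = ['scanf("%f", &v1);', 'v2 = v1 * 2;', 'printf("%f", v2);']
print("\n".join(statement_to_pseudocode(s) for s in lines))
# Read v1
# Set v2 to v1 * 2
# Display v2
```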
The PseudocodeComparator Class

This class needs the student's pseudo-code and the instructor's pseudo-code in order to perform the comparison. Each pseudo-code first goes through the VariableSeparator class, which separates the variable declarations from the main pseudo-code before the comparison is done. Since the variable declarations may be scattered anywhere in the pseudo-code, they would have a negative impact on the comparison process; separating them increases the similarity percentage.

The first comparison is between the variable declaration parts of the student's pseudo-code and the pseudo-code scheme. Five percent (5%) of the mark is allocated to the variable declaration comparison. There are three major variable categories: i) integer (int and long), ii) decimal (float and double), and iii) character (char). The comparison is done by counting the number of variables used in each category. If no variable is declared in a category in both the student's and the answer scheme's pseudo-code, that category is not included in the calculation. If the number of variables in the student's pseudo-code (in a category) is less than the number in the answer scheme, no mark is given for that category; if the number is equal or greater, the mark is one (1). Refer to Appendix C for a sample calculation.

Variable similarity = (marks from the correct categories / number of categories involved) x 5

The next comparison is between the student's main pseudo-code and the instructor's main pseudo-code. The comparison uses one of the functions provided in the PHP library, similar_text [6]. This function takes the two strings as parameters and calculates their similarity character by character. Ninety-five percent (95%) of the mark is allocated to this comparison.

Main pseudocode similarity = (string similarity / number of characters in the pseudo-code scheme) x 95

In the end, the similarity percentage is the sum of the variable similarity percentage and the main pseudocode similarity percentage:

Similarity percentage = Variable similarity + Main pseudocode similarity

The whole comparison process is repeated between the student's pseudo-code answer and the instructor's other pseudo-code schemes until all the schemes have been compared. The similarity percentage values are stored, to be displayed in the output interface later.

The Output Interface Class

Since the previous process produces many similarity percentage values, this class determines the highest one. The highest percentage is displayed to the instructor to be finalized.

CONCLUSION AND FUTURE WORKS

This paper has discussed the design of yet another automated programming assessment using the static analysis approach. The difference in this static analysis assessment is that the program source codes are converted into pseudo-code in order to map the source code into a more uniform text. The whole process is represented by a flowchart, and UML use case and class diagrams visualize the users' interactions and the classes' relationships respectively. The system is in the development stage and will be deployed soon, in a web-based environment using PHP as the main server-side script and MySQL as the database server. We plan to roll the system out to the students of Selangor International Islamic University College when it is ready.


REFERENCES

[1] Ala-Mutka, Kirsti M. 2005. "A Survey of Automated Assessment Approaches for Programming Assignments". Computer Science Education 15 (June 2005): 83-102.
[2] Ala-Mutka, K., et al. 2004. "Supporting Students in C++ Programming Courses with Automatic Program Style Assessment". Journal of Information Technology Education 3.
[3] Booch, G., Rumbaugh, J., Jacobson, I. 1999. The Unified Modeling Language User Guide. Addison Wesley.
[4] Nghi Truong, Paul Roe, Peter Bancroft. 2004. "Static Analysis of Students' Java Programs". Paper read at the 6th Australasian Computing Education Conference (ACE2004), Dunedin, New Zealand.
[5] Norshuhani Zamin, Emy Elyanee Mustapha, Savita K. Sugathan, Mazlina Mehat, Ellia, and Anuar. 2006. "Development of a Web-Based Automated Grading System for Programming Assignments Using Static Analysis Approach". Paper read at the International Conference on Electrical and Informatics, Bandung, Indonesia.
[6] php.net. 2007. PHP Manual [cited January 2007]. Available from http://php.net/similar_text.
[7] Robertson, L. A. 2004. Simple Program Design: A Step by Step Approach. Thomson Course Technology.
[8] Venables, A., Haywood, L. 2003. "Programming students NEED instant feedback!" Paper read at Conferences in Research and Practice in Information Technology.



Appendix A: The list of attributes and methods in the class diagram.



Appendix B: The token specification (in BNF) of the C language supported by the application.

InputElement:         Whitespace | Comment | Token
Whitespace:           \b | \t | \r | \n | \f | \r\n
Comment:              LineComment | BlockComment
LineComment:          // any string ended by \r or \n or \r\n
BlockComment:         /* any string */
Token:                Identifier | Keyword | Literal | Operator | Separator
Identifier:           Letter (Letter | Digit)*
Literal:              Number | Character | String
Number:               Integer | Real
Integer:              -?Digit+
Real:                 -?Digit+(\.Digit+)?
Character:            'any character'
String:               "any string ended with"
Letter:               [A,…,Z,a,…,z]
Digit:                [0,…,9]
Keyword:              asm | auto | break | case | char | const | continue | default | do | double | else | enum | extern | float | for | goto | if | int | long | register | return | short | signed | sizeof | static | struct | switch | typedef | union | unsigned | void | volatile | while | main | printf | scanf
Operator:             ArithmetikOperator | LogicalOperator | RelationalOperator | AssignmentOperator | DecrementOperator | IncrementOperator | DataTypeRepsOperator
ArithmetikOperator:   ArithopMult | ArithopDiv | ArithopMod | ArithopAdd | ArithopSub
ArithopMult:          *
ArithopDiv:           /
ArithopMod:           %
ArithopAdd:           +
ArithopSub:           -
LogicalOperator:      LogicopNot | LogicopAnd | LogicopOr
LogicopNot:           !
LogicopAnd:           &&
LogicopOr:            ||
RelationalOperator:   RelopEqual | RelopNotEqual | RelopLessthan | RelopGreaterthan | Relopltorequal | Relopgtorequal
RelopEqual:           ==
RelopNotEqual:        !=
RelopLessthan:        <
RelopGreaterthan:     >
Relopltorequal:       <=
Relopgtorequal:       >=
AssignmentOperator:   Assop | AssopMult | AssopDiv | AssopMod | AssopAdd | AssopSub
Assop:                =
AssopMult:            *=
AssopDiv:             /=
AssopMod:             %=
AssopAdd:             +=
AssopSub:             -=
DecrementOperator:    --
IncrementOperator:    ++
Separator:            Seperatorcoma | SeperatorEndStmt | OpenStmtBlock | CloseStmtBlock | OpenArrBound | CloseArrBound | OpenParenthesis | CloseParenthesis | CloseDDot
Seperatorcoma:        ,
SeperatorEndStmt:     ;
OpenStmtBlock:        {
CloseStmtBlock:       }
OpenArrBound:         [
CloseArrBound:        ]
OpenParenthesis:      (
CloseParenthesis:     )
CloseDDot:            :
DataTypeRepsOperator: DTROInteger | DTRODecimal | DTROChar | DTROString
DTROInteger:          %d
DTRODecimal:          %f
DTROChar:             %c
DTROString:           %s
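A small subset of the Appendix B token specification can be exercised with a regex-based sketch (our illustration; the real LexicalAnalyzer is the class-based, character-by-character scanner described earlier):

```python
import re

# Regex stand-ins for a few of the Appendix B token classes.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|char|if|else|while|return|main|printf|scanf)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("REAL",       r"-?\d+\.\d+"),
    ("INTEGER",    r"-?\d+"),
    ("OPERATOR",   r"==|!=|<=|>=|\+\+|--|[+\-*/%=<>!]"),
    ("SEPARATOR",  r"[,;{}\[\]():]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source: str):
    """Yield (type, text) pairs in source order, skipping whitespace."""
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("int x = 3;")))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'x'), ('OPERATOR', '='), ('INTEGER', '3'), ('SEPARATOR', ';')]
```

Ordering matters in the alternation: KEYWORD is tried before IDENTIFIER so that reserved words are never tokenized as variables, mirroring the Keywords-lookup step in the design.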

Appendix C: A sample of variable declaration similarity calculation.

Category   | Variable number in student's code | Variable number in answer scheme | Mark | Number of categories
integer    | 3                                 | 4                                | 0    | 1
decimal    | 2                                 | 2                                | 1    | 1
character  | 0                                 | 0                                | 0    | 0
TOTAL      |                                   |                                  | 1    | 2

Variable similarity for the sample = (marks from the correct categories / number of categories involved) x 5 = (1/2) x 5 = 2.5
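The Appendix C numbers, and the two-part score from the PseudocodeComparator section, can be reproduced with a sketch (ours, not the system's code; difflib stands in for PHP's similar_text, whose character counting differs slightly, and the behavior when no category is involved is our assumption):

```python
from difflib import SequenceMatcher

def variable_similarity(student_counts, scheme_counts):
    """5% component: one mark per category where the student declares at
    least as many variables as the scheme; unused categories are skipped."""
    marks = involved = 0
    for cat in ("integer", "decimal", "character"):
        s, a = student_counts.get(cat, 0), scheme_counts.get(cat, 0)
        if s == 0 and a == 0:
            continue                 # category unused on both sides: not involved
        involved += 1
        if s >= a:                   # enough variables declared: full mark
            marks += 1
    return marks / involved * 5 if involved else 5.0  # all-empty case: assumption

def main_similarity(student_pc, scheme_pc):
    """95% component: matching characters over scheme length, times 95."""
    matcher = SequenceMatcher(None, student_pc, scheme_pc)
    common = sum(block.size for block in matcher.get_matching_blocks())
    return common / len(scheme_pc) * 95

# Appendix C sample: integer 3 vs 4 (no mark), decimal 2 vs 2 (mark), character unused.
vs = variable_similarity({"integer": 3, "decimal": 2}, {"integer": 4, "decimal": 2})
print(vs)                                           # 2.5
print(vs + main_similarity("read v1", "read v1"))   # 97.5 for an identical main body
```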

