Integrating Technology into Computer Science Examinations
Denise M. Woit
David V. Mason
School of Computer Science
Ryerson Polytechnic University
350 Victoria Street, Toronto, Ontario, Canada M5B 2K3
dwoit@scs.ryerson.ca
[email protected]
Abstract
On-line programming tests and examinations were administered to approximately 120 first-year computer science students in order to evaluate their practical skills. We describe our motivation for on-line testing, outline the technical details of our closed testing environment, and present our observations about student performance. We also compare the effectiveness of on-line tests versus conventional tests, report the problems we encountered and our solutions, relate student opinion regarding the on-line testing experiment, and present some insights gained by this experiment.
Introduction
We were motivated to try on-line examinations as a result of discussions with colleagues and senior students about copying of assignments and labs by computer students. Some were concerned that it was possible for a student to obtain a passing grade in our first-year computer courses without having attained adequate practical programming skills. By copying, such students could obtain excellent assignment and lab marks through little or no practical work of their own, and could manage to concatenate enough memorized code fragments to achieve a "good" grade in a course through part-marks on examinations. We felt that an effort should be made to distinguish, in some way, between those students who had engaged in the amount of practical course work we expected of them, and those who had not. Although we desired to penalize those who had not fulfilled the programming portion of the course, we did not believe they should necessarily fail the course, as they may have attained adequate theoretical knowledge (problem-solving, design, etc.) to justify a passing grade. We require that practical programming skills be a component of mark composition for several of our computer science courses; the main reason being that the university is a polytechnic and we desire to teach students both practical skills and theoretical knowledge.

Our discussions of on-line testing highlighted further potential advantages, which we hoped would be validated or refuted by our experiment. We expected that students would understand, although perhaps not until after the first midterm, that they could not do well in an on-line test without having completed their individual assignments without copying or cheating. The process of completing their assignments without copying would constitute part of their "studying" for the on-line tests. This is true even of conventional tests, but it was felt that student understanding of this point would be increased when on-line tests were encountered. Another expected advantage was the availability of the on-line documentation (man-pages) for C and Unix. It was expected that a combination of the apropos command and various shell utilities, such as grep, would provide a better resource than could cheat-sheets or reference texts. Expectations among our colleagues were that the marking time required would be substantially less for on-line tests, as time-saving tools could be built to perform much of the marking automatically.

In the subsequent sections of this paper, we discuss the implementation of the on-line tests, the problems encountered, and their solutions. We discuss the experiment in relation to our objectives and expectations and outline what we learned from this exercise.
Related Work
Computers have been used for self-administered progress tests in engineering [4, 3] and for on-line programming drills in computer science [2]. In mathematics and chemistry, computers have been used to administer and grade on-line placement tests [1]. An attempt has even been made to use computers to automatically grade student writing samples [5]. However, to the best of our knowledge, no one has previously chronicled the execution of on-line programming tests and examinations.
Implementation

We taught two courses, CPS209 and CPS393, to essentially the same group of about 120 second-semester computer science students. CPS209 is a second course in principles of computer science. Concepts are illustrated using the functional programming language Scheme. CPS393 introduces the procedural paradigm and focuses on developing the students' skills in both the C programming language and Unix shell programming. This course is in place so that in future core computer science courses using C and Unix, students are not burdened with learning the technicalities of the programming environment and can concentrate instead on the computer science.

CPS209 had two midterms and a final exam, where the first midterm and part of the final were on-line and the second midterm was a conventional test. The conventional midterm and the conventional portion of the final exam tested the more theoretical aspects of the course, while the on-line midterm and on-line section of the final tested more practical programming skills. CPS393 had a midterm and final exam that were both on-line tests. As the main purpose of this course is the students' acquisition of practical programming skills in C and Unix, we surmised it was appropriate to do all on-line testing. We attempted to incorporate the testing of skills such as problem-solving and design into the on-line questions.

Physical Environment

All of the on-line tests were implemented in an environment which was a subset of the students' usual Unix environment. Our three main goals in the development of the test environment were security, fairness and familiarity. The environment was created on a system running Solaris 2.3, although it should be portable to any other modern Unix implementation.

Security and Fairness

Although these students were only second semester, they were very enterprising, so we worked hard to make the environment completely secure. Unix has a facility called chroot which allows a privileged program to make a directory appear to be the top of the directory structure. We used this to put each student in an individual, private directory tree. Even directories such as /tmp (where everyone can create files and students could otherwise leave notes for each other) were private. Other communication facilities such as mail, talk, and telnet were not made available. There were no covert channels available to the outside world. (Eliminating covert channels can be very challenging, as even commands such as those to show active processes on the system can provide communication between users.)

Students were randomly assigned to terminals to reduce the slight chance of collaboration between two friends, and partitions were put in place between close rows of terminals. We had a wide range of terminals available, including character-based terminals, PCs, and X-terminals. We brought them all to a least-common-denominator of 24 x 80 character-mode terminal. Special login names and user-ids were created and all non-test accounts were disabled for the duration of the tests. Passwords for the test accounts were handed out after the students were seated at their terminals.
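The scripts we used are not reproduced here (they were made available separately; see the URL in the Conclusions). As a rough illustration of the approach described above, the following minimal sketch builds one private, chroot-able tree per test account. The directory layout (/testroot, a prepared skeleton of allowed binaries, libraries, and man-pages) and the account list file are illustrative assumptions, not our actual tooling.

    # Build one private directory tree per test account (illustrative sketch).
    for user in `cat test_accounts.txt`
    do
        root=/testroot/$user
        mkdir -p $root/tmp $root/home/$user       # private /tmp and home
        cp -r /testroot/skeleton/bin $root/bin    # allowed commands only
        cp -r /testroot/skeleton/lib $root/lib    # shared libraries
        cp -r /testroot/skeleton/man $root/man    # on-line documentation
        chown -R $user $root/tmp $root/home/$user
    done
    # At login time a privileged wrapper confines the session to its tree,
    # roughly:  chroot /testroot/$USER /bin/sh

Because chroot hides everything outside the tree, each copy must contain every command and library the students are permitted to use; this is also what enforces the "subset of the usual Unix environment" described above, since anything left out of the skeleton (mail, talk, telnet, a shared /tmp) simply does not exist inside the test session.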
Familiarity

To make the environment as familiar as possible, it included most commands with which the students were familiar, as well as the on-line documentation for those commands.
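As a small illustration of how the on-line documentation was expected to be used in place of reference texts (the particular queries below are ours, not examples taken from the tests):

    apropos copy | grep -i file      # find man-page entries about copying files
    man -s 1 cp                      # read the section-1 page for cp (Solaris syntax)
    man strcmp | grep -n return      # locate the return-value description quickly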
Problems Encountered

In this section we highlight the problems encountered in the on-line testing process and their solutions, both implemented and proposed.
Terminals
In the interest of fairness, and because of scheduling constraints, we required that all students write the test at once. As mentioned above, students were not aware of which terminal they would be using until immediately prior to the commencement of the test, and a wide range of terminals were used. Thus, it was possible for students to be tested on a terminal with which they had no prior experience. We attempted to ameliorate this problem by warning the students of this potential, and allowing them as much free time as desired to familiarize themselves with the various terminals for the week prior to the first on-line test. This approach proved effective in general; however, a small number of students experienced difficulty because they had not familiarized themselves with the specific kinds of terminals they were assigned.
Hardcopy Required
The Emacs editor allowed the students to maintain multiple windows on their screen. We expected that the test questions could be displayed in one window, while the students worked on the problem in another. However, the students found this onerous on their small screens, especially in situations where more than 2 windows were required (for example, shell windows and man-page windows were often desired as well). At the end of the first on-line test, the students requested a hard copy of the question sheet for subsequent on-line tests. The students also requested that they be allowed scrap paper in order to perform initial problem solving. Requiring them to use an editor for the initial problem-solving activities resulted in undue pressure and loss of familiarity. In response to these requests, in the subsequent tests we handed out hardcopies of the question sheets along with scrap paper on which they could problem-solve.
One unexpected result of on-line tests was that some students who tend to be more anxious became alarmed and discouraged when their code did not compile. Occasionally a subtle syntactical error was not discovered, and the student went on to the next question feeling as if the previous question were mainly wrong, when in fact it was mainly correct, and indeed received almost full marks. This undermined student confidence and affected their performance on remaining questions. In conventional tests, such small syntactical errors would not have produced such effects.
Evaluation
Both courses employed a test harness to aid in the evaluation of the questions. The Scheme course used a completely automated test harness, while the C and Unix course employed only a partially automated one. Both harnesses expected the student programs to be given particular names. In the C and Unix course, when expected naming practices were violated by the students, it was flagged by the partially automated harness. The professor simply listed the test directory and compiled the incorrectly named program, deducting some marks for not following instructions. However, incorrect naming practices were far more problematic for the completely automated test harness used in the Scheme course: the harness would not find the expected file or function and would assign a mark of zero. For the mid-term test in this course, the professor spent much time correcting these gratuitous changes so the harness could assign reasonable grades. For the final exam in this course, we provided a simple program that checked the files and function names and gave the students feedback if they had violated expected naming practices. This significantly reduced the problem. The feedback program has been modified to make one test-call to the functions to make sure that the domain and range of the functions are within the specification. This has eliminated most problems with using a completely automated test harness for the Scheme course.
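The feedback program itself is not shown here; the sketch below only illustrates the kind of check described, for a hypothetical question requiring a file q1.scm that defines a procedure my-reverse. The file name, procedure name, test call, and interpreter command are illustrative assumptions.

    # Sketch of a pre-submission naming check for one Scheme question.
    file=q1.scm
    proc=my-reverse
    if [ ! -f $file ]; then
        echo "Expected file $file not found: the automated harness would assign zero."
        exit 1
    fi
    if grep -q "(define ($proc" $file; then
        # One smoke-test call so obviously wrong domains/ranges are caught early.
        echo "(load \"$file\") (display ($proc '(1 2 3)))" | scheme
    else
        echo "Expected procedure ($proc ...) is not defined in $file."
        exit 1
    fi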
Observations on Objectives and Expectations
In this section we present observations resulting from the on-line testing experiment.
Marking
The expectations of our colleagues were that on-line testing would save much time in marking. It turned out that the marking time was reduced in one course and increased in the other. Marking was handled differently in the two courses.
In the C and Unix course, approximately 50% of the marking required extensive visual on-line inspection of the code (for assignment of part marks). The remainder of the marking was performed using a test harness to compile and execute the students' programs. In the Scheme course a completely automated test harness was used; marks were based only on the behavior of the students' code.

In the C and Unix course, the on-line marking required less time than it would have for a traditional test. This occurred partly because approximately 15% of the questions were answered with correct programs. Marking these questions required noting the "correct" output from the test harness and then a brief visual inspection of the code to verify correctness. A visual inspection was necessary, especially for the Unix shell programs, in order to verify that the students followed directions. For example, if the question asked them to write a shell program "from scratch" to simulate Unix wc, then it was necessary to check that the program did not in fact utilize the Unix wc command! For those programs that compiled and ran, but did not pass all tests, a mark usually could be quickly assigned based on the type and/or number of tests passed. Again, a brief visual inspection was also necessary. A more extensive inspection was sometimes necessary depending on the tests failed. Programs that did not compile were marked in a conventional fashion with visual inspection. In rare cases compilation was hindered by an obvious syntax error; thus, it was possible to quickly edit the code, compile it, and run it through the test harness, which decreased the marking time of such code.

In the Scheme course, a more elaborate test harness was used in an attempt to do completely automatic marking. This program loaded the students' questions into the Scheme interpreter, verified that the expected procedures had been loaded, called the procedures with a full test suite of valid and out-of-range data, and verified the results from the procedure calls. Considerable effort was put into making the process robust, but the inventiveness of the students exceeded the imagination of the professor, and in addition to file and function names being assigned arbitrarily by the students, there were many small differences between the expected results and the generated results, so that manual intervention was frequently required. Because of the need for frequent intervention by the professor, the overall marking time for this course was somewhat increased. In a subsequent running of the course, the feedback program mentioned earlier solved most of these problems.
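The partially automated harness is likewise not reproduced; the following sketch only shows the style of marking run it describes, assuming one sub-directory per student under students/, a C submission named q2.c, test inputs under tests/, and reference output under expected/. All of these names, and the final wc check, are illustrative assumptions.

    # Sketch: compile and run one C question for every student directory.
    for dir in students/*
    do
        (
          cd $dir || exit
          if cc -o q2 q2.c 2> compile.log; then
              ./q2 < ../../tests/input1 > out1
              if diff out1 ../../expected/output1 > /dev/null; then
                  echo "$dir: passed test 1"
              else
                  echo "$dir: failed test 1 -- inspect by hand for part marks"
              fi
          else
              echo "$dir: did not compile -- mark conventionally"
          fi
        )
    done
    # For the "write wc from scratch" shell question, flag submissions that
    # simply invoke the real wc so they can be examined visually:
    grep -lw wc students/*/mywc.sh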
Student Opinions
The term following the on-line experiments, we surveyed the students who had been involved in order to obtain their overall opinions of the on-line testing versus conventional testing. Students were asked 46 questions, which they answered on scanner sheets and in prose. For most questions students were presented with a scale; for example, the question "Did you find the types of questions were similar for on-line and conventional tests?" had scale (1) no difference ... (5) completely different.
For 14 of the questions, students were given space on the question sheet to include their opinions. Through a combination of student opinion and our own observations, we address our objectives and expectations of the on-line testing experiment.

The original motivator for the on-line experiment was that we would attempt to identify and somewhat penalize those students who were not performing the practical work we expected of them (e.g., we wanted to distinguish between those who had implemented the number of programs and other lab questions we expected throughout the course, and those who had avoided gaining such practical skills). Because no oracle exists to state whether or not a particular student should fail, or what mark they should receive, it is not possible to objectively evaluate the effectiveness of the on-line tests in penalizing students who did not perform the expected practical course work. We cannot determine definitively whether or not the on-line tests had failed/passed students who had practical skills that should have allowed them to pass/fail, or whether on-line tests assigned lower/higher marks than they should have, given the students' skill levels. We could, however, elicit student opinion on the matter, which was as follows.

Student response indicated that they think conventional tests are more likely to pass students without adequate practical skills (a mean answer of 3.7 for scale (1) on-line were likely to pass students without adequate practical skills, (3) no difference, (5) conventional were likely to pass students without adequate practical skills, with 8% of students choosing a response less than (3), 35% choosing response (3) and 57% choosing a response greater than (3)). They also believed that conventional tests were more likely to assign higher marks to students who did not have adequate practical skills. For the question "How possible do you think it is that a student would fail the on-line tests but still have adequate practical skills?" and the same question but for conventional tests, students responded with means of 2 and 2.2 respectively on the scale (1) very possible (5) impossible. They responded that students who "deserved" to fail because of extreme lack of practical skills would be more likely to be failed by the on-line rather than the conventional tests (a mean of 2.1 in scale (1) on-line more likely to fail (3) no difference (5) conventional more likely to fail, with 55% choosing a number less than (3), 36% choosing (3) and 9% choosing a number greater than (3)). Students thought that on-line tests were a better indicator of their practical skills (mean 2.3 in (1) on-line (3) no difference (5) conventional, with 55% choosing a number less than (3), 26% choosing (3) and 19% choosing a number above (3)). And of those selecting (1) or (2) above (on-line a better indicator), 88% believed that the benefits of the on-line tests for testing practical skills could not be achieved by making "better" conventional tests. An interesting point, given the above student responses, is that they think conventional tests are somewhat "easier to pass"
(mean 3.6 in (1) on-line easier (3) no difference (5) conventional easier, with 17% choosing a number less than (3), 36% choosing (3) and 47% choosing a number greater than (3)). Therefore, students believe on-line tests are a better indicator of their practical skills, even though they find them more difficult to pass. They think on-line tests are more appropriate for testing practical skills because, as one student put it: "it allows us to become better programmers. In the real world, we must get things done before deadlines with stress!" (mean 2.2 of (1) on-line (3) same (5) conventional, with 62% choosing a number less than (3), 16% choosing (3) and 22% choosing a number above (3)).

Students did somewhat believe that environmental factors, such as typing skills, could lower one's mark on the on-line tests (mean of 2.8 on scale (1) very likely (5) impossible, with 43% choosing a number less than (3), 19% choosing (3) and 38% choosing a number greater than (3)). However, they tended to feel that they, personally, were not significantly slowed down by the on-line process (mean of 3.4 on scale (1) slowed down a lot (5) not slowed at all, with 35% choosing a number less than (3), 12% choosing (3) and 53% choosing a number greater than (3)). In the comments section for this question, those students who felt slowed down mainly cited unfamiliarity with the editor commands. (We believe that it is unlikely that students who have attained the practical skills we desire in this course could be unfamiliar with the editor.) Interestingly, we found no correlation between students' marks and their answers to this question.

They felt that having the compiler/interpreter available during the test helped them and did not slow them down significantly (mean of 2.36 on scale (1) it helped (3) no difference (5) it hindered, with 62% choosing a number less than (3), 9% choosing (3) and 29% choosing a number greater than (3)). Students who felt the compiler helped, and who commented on this question, estimated that it took them less time to debug/test their code than it would have by hand.

Our original motivating factor was related also to copying of assignments/labs. It is important to note that we encourage collaboration, but not copying. We expect that by the time the lab/assignment is handed in, even the weakest student in the group has a good understanding of the solution and would likely be able to reproduce it independently. We expected that those students who were habitual copiers, as well as those who received "too much help", would be less likely to do well in on-line tests (mainly because of lack of practical experience). A majority of the students felt that marks on on-line tests were a good indication of whether or not a student copied/cheated on labs/assignments (mean 2.6 on scale (1) very good indication (5) no indication at all, with 60% choosing a number less than (3), 9% choosing (3) and 31% choosing a number greater than (3)). Students were asked whether or not the on-line tests motivated them, personally, not to cheat and not to get "too much help" on labs/assignments. They responded with a mean of 3.3 on scale (1) big motivation (5) no more motivation than conventional tests, with 37% choosing a number less than (3), 16% choosing (3) and 47% choosing a number greater than (3).
Interestingly, there was a slightly positive correlation between students' marks and their own motivation not to copy and not to get too much help. Students with the highest marks felt more motivated not to cheat on assignments/labs because of the on-line tests. However, the responses were fairly bi-modal; for example, out of the "A" students, nearly equal numbers felt a big motivation as felt no more motivation than conventional tests. (We suspect this is because many of the "A" students do not cheat/copy regardless of the situation.) The poorer the mark, the less motivated they felt not to cheat/copy (although they were still more motivated than with conventional tests). The "threat" of the on-line tests was a motivating factor not to cheat or get too much help for a majority of the students. It is important to note that this response was elicited well after the students were aware that they would receive part marks for non-perfect answers, just as they would for conventional tests.

Our expectation that the combination of on-line documentation (especially apropos in the man-pages) and Unix shell tools such as grep would be more useful to students during testing than cheat-sheets or open reference texts was well substantiated by our student survey (mean of 2.35 on scale (1) much more useful than reference texts (5) no more useful, with 62% choosing a number less than (3), 13% choosing (3) and 25% choosing a number greater than (3)). The students with lower marks in the on-line tests reported the on-line documentation more useful than the students with higher marks (As and Bs). We suspect this is because the stronger students require less documentation of any kind.

In general, many of our expectations were substantiated by student opinion. Students felt that on-line documentation was more helpful to them than a hard copy reference manual. Students felt that on-line tests were better for testing practical skills and tended not to be hindered by the environmental factors associated with on-line tests. They believed on-line tests were more likely to fail those that should fail and that conventional tests were more likely to pass those that should have failed. Even with the knowledge that part marks were given for on-line tests, students still felt that on-line tests (as opposed to conventional tests) motivated them not to copy and to attain the practical skills expected of them through course work. We interpret this to indicate that they felt as if they were being held to a higher standard by the on-line tests. In general, students believe that on-line tests were preferable to conventional tests for practical skills, and this opinion was not motivated by expected higher marks for on-line tests.
Conclusions

We feel that it is possible to produce both good on-line and good conventional tests to assess students' skill levels. However, for assessment of the practical skills we expect students to obtain in these courses, we conclude that on-line tests have the advantage of bringing the students to a higher standard, of causing them to reduce the level of cheating and copying, and of encouraging them to attain the practical skills we expect from course work. These conclusions have been substantiated by student opinion of our on-line testing experiment. Therefore, we will continue to have some portion of future tests in these courses on-line, in order to raise the standards to which students aspire in the course, and to encourage them to attain the practical skills expected of them in our polytechnic environment. We do not advocate on-line testing for most computer science courses; however, for testing practical skills in courses in which acquisition of practical skills is deemed important, we conclude that on-line testing is appropriate.

The tools to set up a secure testing environment are available from: http://www.scs.ryerson.ca/dmason/online/
References

[1] AGER, T. Online placement testing in mathematics and chemistry. Journal of Computer-Based Instruction 20, 2 (1993), 52-57.

[2] BENNETT, R., AND WADKINS, J. Interactive performance assessment in computer science: the Advanced Placement Computer Science (APCS) practice system. Journal of Educational Computing Research 12, 4 (1995), 363-78.

[3] SANFORD, R., AND NAGSUE, P. Selftest, a versatile menu-driven PC tutorial simulates test-taking. Computers in Education Journal 2, 1 (1992), 58-69.

[4] WALWORTH, A., AND HERRICK, R. The use of computers for educational and testing purposes. In Proc. Frontiers in Education, Twenty-first Annual Conference: Engineering Education in a New World Order (1991), IEEE, pp. 510-14.

[5] WRESCH, W. The imminence of grading essays by computer - 25 years later. Computers and Composition 10, 2 (1993), 45-58.