An Approach to Automatic Evaluation of Higher Cognitive Levels Assessment Items

Shilpi Banerjee∗, Chandrashekar Ramanathan∗, N. J. Rao∗
∗International Institute of Information Technology, Bangalore
Email: [email protected], [email protected], [email protected]
Abstract—Large-scale assessments assess relatively large numbers of students. One of the biggest challenges in MOOCs today is conducting effective assessments in a large-scale environment. The quality of large-scale assessment is under threat from multiple sources, including assessment instrument specific errors and measurement errors. Assessment instrument specific errors relate to the extent to which an assessment meets its objectives, while measurement errors are incurred during the process of evaluation. A survey of a sample of existing instruments used for large-scale assessments is conducted to identify assessment instrument specific errors. In this paper, we propose the use of technology to build electronic item banks that avoid both assessment instrument specific and measurement errors, thereby improving the quality of assessments. We propose 12 unique item types that are amenable to automatic evaluation, and the process of evaluating student responses automatically is discussed in detail for each item type. These automated item types provide cost-effective ways of achieving validity and reliability in large-scale assessments.
I. INTRODUCTION

Designing quality large-scale assessments is extremely challenging, since such assessments carry high stakes for students enrolled across multiple centres and are tied to national or state level competencies. The main challenge is to keep the manual evaluation of a large number of student scripts tractable. To meet this challenge, assessment items that focus only on rote learning and memorization are included, because they are easy to evaluate [1]. Items that assess higher mental abilities such as critical thinking, problem solving and creative ability are excluded, as they are heavy on evaluation. As a result, students who can merely remember concepts do better in such assessments than students who can apply them. This type of error is termed an assessment instrument specific error and arises from insufficient cognitive level coverage. Assessment instruments are valid only if assessment items are aligned with course competencies [2]. Another challenge in achieving quality is measurement error. Manual evaluation of student scripts produces large variations across universities due to inter-rater and intra-rater factors [3][4]. These factors can be eliminated if a system-based approach is used to evaluate assessment items automatically, thereby producing reliable measures. In this paper, we propose 12 unique item types which are amenable to automatic evaluation and can be used
to assess a wide range of student abilities pertaining to a competency.

II. QUALITY OF EXISTING ASSESSMENT INSTRUMENTS - A SURVEY

We conducted a survey of twelve assessment instruments used for the NIELIT O Level examination, a national-level examination that aims to help students develop a concept-based approach to problem solving using IT as a tool. The motivation behind the survey is to evaluate the cognitive level coverage of these instruments. The authoring agency conducting the large-scale assessments provides a list of competencies, their cognitive levels and the expected number of lecture hours. Each subject has 8 to 12 competencies, and each assessment instrument is composed of items from all the competencies. We observed that almost all the competencies belong to either the understand or the apply cognitive level [5]. We therefore assumed the estimated cognitive level weights for remember (WR), understand (WU) and apply (WA) items in an assessment instrument to be 30%, 40% and 30% respectively. Each item in all twelve assessment instruments was tagged with the appropriate cognitive level and competency, the actual cognitive level weight of each instrument was computed, and the deviation of the actual weight from the estimated weight for each cognitive level was determined, as shown in Table I. The following observations were derived from the survey results:
• The existence of standard deviation shows that the actual cognitive level distribution is not the same as the expected cognitive level weights.
• The assessment instruments fail to assess students' abilities at all cognitive levels adequately; lower cognitive levels (remember/understand) are given more significance.
Assessment instruments that focus mostly on lower cognitive levels are easy to evaluate, but their quality is poor because of the presence of assessment instrument specific errors. Validity and reliability can be achieved by including assessment items that are aligned with course competencies and by avoiding the inter-rater and intra-rater errors that occur during the evaluation of answer scripts.
TABLE I
COGNITIVE LEVEL COVERAGE FOR ASSESSMENT INSTRUMENTS

Assessment Instrument    WR(actual)   WU(actual)   WA(actual)
AI1                          56%          35%           9%
AI2                          22%          68%          10%
AI3                          50%          40%          10%
AI4                          38%          49%          13%
AI5                          47%          51%           2%
AI6                          38%          50%          12%
AI7                          15%          15%          70%
AI8                          25%           7%          68%
AI9                         100%           -            -
AI10                        100%           -            -
AI11                        100%           -            -
AI12                        100%           -            -
Standard deviation           41           23           26
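As an illustration, deviation figures of the kind reported in Table I can be computed from per-item cognitive level tags. The following is a minimal sketch in Python; it assumes the reported deviation is the root-mean-square deviation of the actual weights from the estimated weights (the exact formula is not spelled out above), and the data layout and function names are purely illustrative.

from math import sqrt

# Estimated (expected) cognitive level weights for an instrument, in percent.
ESTIMATED = {"remember": 30.0, "understand": 40.0, "apply": 30.0}

def actual_weights(items):
    """Actual cognitive level weights (in percent) of one instrument.
    `items` is a list of (cognitive_level, marks) pairs, one per item."""
    total = sum(marks for _, marks in items)
    weights = {level: 0.0 for level in ESTIMATED}
    for level, marks in items:
        weights[level] += marks
    return {level: 100.0 * w / total for level, w in weights.items()}

def rms_deviation(instruments, level):
    """Root-mean-square deviation of actual from estimated weight for one
    cognitive level, across a list of instruments (dicts of actual weights)."""
    diffs = [inst[level] - ESTIMATED[level] for inst in instruments]
    return sqrt(sum(d * d for d in diffs) / len(diffs))

# Example with the first two instruments of Table I:
instruments = [
    {"remember": 56.0, "understand": 35.0, "apply": 9.0},   # AI1
    {"remember": 22.0, "understand": 68.0, "apply": 10.0},  # AI2
]
print(rms_deviation(instruments, "remember"))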
III. RELATED WORK - ITEM TYPES

The existing literature makes a clear distinction between selective and supply assessment items [6]. Selective items require students to select the correct response from several alternatives or to supply a word to answer a question. They mainly comprise the following item types: multiple choice, true/false, match the following and fill in the blanks. Each of these item types can be evaluated automatically; a minimal scoring sketch is given below. Supply items require students to write and present an original response. They include short-answer essays, extended-response essays, problem solving and performance tasks. Supply items assess student abilities across a wide range of cognitive levels, but they cannot be evaluated automatically for various reasons [7]. Assessment instruments with supply items are valid, but not reliable, because of various measurement errors [3]. For evaluating essay-type items, various automated techniques, including latent semantic analysis, natural language processing and the multivariate Bernoulli model, have been proposed in the literature [8][9][10], but the average evaluation accuracy of each of these methods is 90% [7]. On the one hand, manual evaluation of supply-type items is not practically feasible for the large number of answer scripts in a large-scale assessment; on the other hand, it is essential to assess students' higher cognitive abilities, which is only possible with the help of supply items. To overcome this challenge, we propose a systems approach to assess students' higher cognitive abilities using items that can be evaluated automatically.
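The sketch below scores the selective item types listed above by comparing a student's selection against the answer key stored with the item; the item representation and function name are illustrative assumptions rather than part of any specific tool.

def score_selective(item, response):
    """Score a selective item (multiple choice, true/false, match the
    following, fill in the blanks) against its stored answer key."""
    kind = item["type"]
    if kind in ("multiple_choice", "true_false"):
        # Single selection: full marks only for the keyed option.
        return item["marks"] if response == item["key"] else 0
    if kind == "fill_in_the_blanks":
        # Case-insensitive match against the accepted answers for each blank.
        correct = sum(
            1 for given, accepted in zip(response, item["key"])
            if given.strip().lower() in {a.lower() for a in accepted}
        )
        return item["marks"] * correct / len(item["key"])
    if kind == "match_the_following":
        # `key` and `response` both map left-hand entries to right-hand entries.
        correct = sum(1 for k, v in item["key"].items() if response.get(k) == v)
        return item["marks"] * correct / len(item["key"])
    raise ValueError("unsupported item type: " + kind)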
Fig. 1. Assessment Item Attributes
IV. AUTOMATED ASSESSMENT ITEMS
Fig. 2. Compare Item Type
Fig. 3. Reorder Item Type
The following is a discussion of the 12 automated item types and their evaluation criteria. An item is created by an item author using an item authoring tool, which is used for creating and storing items. Each item is represented by attributes including cognitive level, knowledge dimension and difficulty, as per the IMS QTI [12] specification, as shown in Figure 1. Item authors also provide a sample response, which is used for automatic evaluation. An item bank is a collection of assessment items used for constructing assessment instruments.

1) Compare: This item type is used when two or more concepts, procedures, systems or products are compared on a given set of criteria, such as accuracy, efficiency and speed. Since the expected response is finite and fixed, the item can be evaluated automatically by comparing the student's response with the expected response entered by the item author (a scoring sketch is given below, after the Reorder item). A supply-type item, "Explain the difference between insertion and merge sort.", can be asked using the format shown in Figure 2.

2) Reorder: A sequence of actions is provided. The specific objective that the sequence of actions is supposed to achieve is also specified.
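A minimal sketch of how Compare and Reorder responses might be scored automatically, assuming the author's sample response is stored as a table of criterion values for Compare and as an ordered list of actions for Reorder; the data layout and the all-or-nothing rule for Reorder are illustrative assumptions.

def score_compare(expected, response, marks):
    """Compare item: `expected` maps each criterion to the keyed value for
    every entity being compared, e.g.
    {"worst-case complexity": {"insertion sort": "O(n^2)",
                               "merge sort": "O(n log n)"}}.
    The student's `response` has the same shape; each correct cell earns an
    equal share of `marks`."""
    cells = [(c, e) for c, row in expected.items() for e in row]
    correct = sum(
        1 for criterion, entity in cells
        if response.get(criterion, {}).get(entity) == expected[criterion][entity]
    )
    return marks * correct / len(cells)

def score_reorder(expected_order, response_order, marks):
    """Reorder item: full marks only if every action is in the keyed position
    (partial credit per correctly placed action is an obvious alternative)."""
    return marks if list(response_order) == list(expected_order) else 0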
Fig. 4. Complete the Block Diagram Item Type
3) Complete the block diagram: In this item type, a block diagram with a certain number of blanks is provided to students, and they are asked to fill in those blanks. For each blank, any one selective item type (multiple choice, true/false, match the following and fill in the blanks) can be selected, which makes the item amenable to automatic evaluation. A supply-type item, "Explain the basic organization of a computer system using a block diagram", can be asked using the format shown in Figure 4.

4) Complete the flow chart: In this item type, a flow chart with a certain number of blanks is provided to students, and they are asked to fill in those blanks. Blanks can correspond to a connector type, a block type or a block caption. The block type can be input, output, process, decision, summation, etc. The connector can be a single-sided arrow, a double-sided arrow, a dotted line or a connector without any arrows. The specific system that the flow chart is supposed to represent is also specified. For each blank, any one selective item type (multiple choice, true/false, match the following and fill in the blanks) can be selected, which makes the item amenable to automatic evaluation. A supply-type item, "Draw the flow chart for the following code."

main() { int i; for (i = 1; i