Use of Data Mining to Determine Cheating in Online ... - CiteSeerX


Conference on Data Mining | DMIN'06 |

Use of Data Mining to Determine Cheating in Online Student Assessment

Alberto Ochoa1,2 and Amol Wagholikar3

1 Programa de Ingeniería en Computación, UAIE; Universidad Autónoma de Zacatecas, Mexico.
2 Institute (Postdoctoral Program), State University of Campinas, Brazil.
3 School of Information and Communication Technology, Faculty of Engineering and Information Technology, Griffith University, Australia.

Abstract
Several online assessment applications exist, built to various technology specifications. All of them perform certain basic functions, such as providing assessment items, training and/or evaluation, and assigning a grade. To carry out these tasks, the applications store a large amount of data. Items such as starting times, finishing times, and local or remote IP addresses are stored in the database, along with data on student behavior, such as frequency of visits, training attempts, preliminary grades for specific subjects, demographics, and perceptions of the subjects being evaluated. It is important to use this data effectively to learn patterns and trends in student behavior in order to identify and prevent fraudulent behavior. We therefore propose a data mining based approach to identify students who cheat in online assessments and to identify patterns that help detect and avoid this practice.

Keywords: Data mining, Internet Frauds, Online Assessment, Data Warehouse, KDD

1.0 Introduction
Online assessments are useful for evaluating students' knowledge. In online assessment, however, the identity of the student can be misrepresented, which makes fraudulent behavior in online assessment an issue of interest. In this work, we define fraud as deception for personal gain in education [1]. Put simply, “cheating” means obtaining a better grade in an online assessment by fraudulent means. Because the Internet is the medium through which this cheating takes place, we introduce the term cyber cheat. Our discussion focuses on student behavior in the online assessment environment. We propose a model to help organizations detect and prevent cheating in online assessments. First, we analyze different student personalities, the stress generated by online assessments, and the common cheating practices students use to obtain better exam grades. We then present our proposed DMDC (Data Mining to Detect Cheats) model and explain its components. Next, we analyze the database schema designed to record student information. We then explain the use of Weka [2] to carry out data mining in order to find fraudulent behavior patterns that fit suspect profiles. Finally, we discuss the preliminary data obtained by applying the proposed model in a real university environment and summarize our conclusions.

2.0 Analysis of student behavior
Donald McCabe conducted a survey for the Center for Academic Integrity [3] that identified the top five reasons students cheat: laziness or failure to study or prepare; the need to pass a class or improve a grade; external pressure to succeed; not knowing the answers; and time pressure or too much work. In a sample of 1,800 students at nine state universities in the United States, seventy percent admitted to cheating on exams [4]. These figures underscore the importance of the issue and the need for approaches to address it.

3.0 Proposed Model to Detect Cheats
Data mining is a knowledge discovery process that reveals patterns and relationships in large and complex data sets [5]. It refers to extracting or “mining” knowledge from large amounts of data [6]. Moreover, data mining can be used to predict an outcome for a given entity. Data mining has been used successfully to analyze student behavior [7, 8] and to detect fraud in credit card use [9] and in insurance companies. We propose the use of Knowledge (interesting patterns) Discovery in Databases (KDD), a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [8], to detect student cheating in online exams. Figure 1 shows the required modules: a Data Base, a Data Mining Engine (DME), the Knowledge base (Training Data, Verification and Evaluation), and the pattern recognition Model.

[Figure 1 components: Operational Databases; Data Preparation (Clean, Collect, Summarize); Data Base; Data Mining; Training Data; Verification & Evaluation; Model, Patterns]

Figure 1. KDD schema and data mining

We explain the components of our model next.
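As a rough illustration of the Data Preparation step in Figure 1 (clean, collect, summarize), the following Python sketch reduces raw log rows to per-student features. The row layout, the sample values, and the two features (`distinct_ips`, `duration`) are assumptions made for illustration; they are not taken from the authors' implementation.

```python
from collections import defaultdict

# Hypothetical raw log rows: (student_id, ip_address, action, timestamp_seconds).
# The fields loosely follow a transaction log; all values are made up.
RAW_LOG = [
    ("s1", "10.0.0.5", "start", 0),
    ("s1", "10.0.0.5", "finish", 1500),
    ("s2", "10.0.0.9", "start", 10),
    ("s2", None, "finish", 900),       # incomplete row, dropped by clean()
    ("s2", "10.0.0.7", "finish", 905),
]

def clean(rows):
    """Drop rows that have any missing field."""
    return [r for r in rows if all(field is not None for field in r)]

def collect(rows):
    """Group the cleaned rows by student."""
    by_student = defaultdict(list)
    for student_id, ip, action, ts in rows:
        by_student[student_id].append((ip, action, ts))
    return dict(by_student)

def summarize(by_student):
    """Reduce each student's events to a few features for the mining step."""
    summary = {}
    for student_id, events in by_student.items():
        starts = [ts for _, action, ts in events if action == "start"]
        finishes = [ts for _, action, ts in events if action == "finish"]
        summary[student_id] = {
            "distinct_ips": len({ip for ip, _, _ in events}),
            "duration": (max(finishes) - min(starts)) if starts and finishes else None,
        }
    return summary

features = summarize(collect(clean(RAW_LOG)))
```

Features of this kind (for example, a session that changes IP address midway) are the sort of input a mining step could later inspect; which features actually matter is left open here.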

3.2 Online Surveys
Information about student behavior, demographics, and students' perception of the subject and the professor's style is obtained through online surveys. Information about the Subject Environment is obtained from the professor.

3.3 Proprietary Online Testing System (OTS)
The selected OTS must supply information about careers, subjects, topics, test results, and log files. In our case we used a customized OTS. This system uses XML learning objects based on IMS QTI Version 2.1 [12]. It is multiplatform (developed in Java), encrypts communication between the client and the server, and allows students to be tested remotely or over a local area network [13]. Our OTS presents the test questions and their answers in random order. Each question is assigned a time limit that depends on its difficulty level, and students are not permitted to go back to previous questions. These features are recommended to avoid cheating [14].
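The three anti-cheating features described above (random question and answer order, a time limit per difficulty level, and forward-only navigation) can be sketched in Python as follows. This is not the authors' Java OTS; the function names, the question format, and the time limits in `TIME_LIMITS` are hypothetical.

```python
import random

# Hypothetical seconds allowed per difficulty level. The source only states
# that the time depends on difficulty, not what the values are.
TIME_LIMITS = {"easy": 30, "medium": 60, "hard": 120}

def build_test(questions, seed):
    """Return one student's test: shuffled questions, shuffled answers, time limits."""
    rng = random.Random(seed)          # a per-student seed makes the order reproducible
    shuffled = list(questions)         # copy, so the question bank is not mutated
    rng.shuffle(shuffled)
    test = []
    for q in shuffled:
        answers = list(q["answers"])
        rng.shuffle(answers)
        test.append({
            "text": q["text"],
            "answers": answers,
            "time_limit": TIME_LIMITS[q["difficulty"]],
        })
    return test

def take_test(test):
    """Yield questions one at a time; a generator cannot step backwards."""
    for question in test:
        yield question

QUESTIONS = [
    {"text": "Q1", "answers": ["a", "b", "c"], "difficulty": "easy"},
    {"text": "Q2", "answers": ["d", "e", "f"], "difficulty": "hard"},
    {"text": "Q3", "answers": ["g", "h", "i"], "difficulty": "medium"},
]

student_test = build_test(QUESTIONS, seed=42)
```

Seeding the generator per student is a design choice of this sketch: it lets a test be regenerated later for auditing while still giving each student a different order.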

3.1 Database
In this section we analyze the relevant classes, their key variables or attributes, and expected values. The suggested variables for the class Students, which will be stored in the DBMS, are based on the work of Genderman [10] and Smyth M. L. et al. [11]. Once we identified the key variables, we established relationships among the classes, as shown in the database (DB) schema in Figure 2. The classes Careers, Subjects, and Topics are used to manage tests.
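A small part of the schema described in Section 3.1 can be sketched with Python's built-in sqlite3 module. The table and column names come from Figure 2; the column types, the choice of SQLite, the reduced set of Students columns, and the sample rows are assumptions for illustration only.

```python
import sqlite3

# Three of the Figure 2 classes. Column types are assumptions; the paper
# names the attributes but does not give their types.
SCHEMA = '''
CREATE TABLE Careers (
    IdCareer INTEGER PRIMARY KEY,
    name     TEXT
);
CREATE TABLE Students (
    IdStudent INTEGER PRIMARY KEY,
    IdCareer  INTEGER REFERENCES Careers(IdCareer),
    GPA       REAL,
    Major     TEXT
);
CREATE TABLE HistoryLog (
    IdTransaction INTEGER PRIMARY KEY,
    IdCareer  INTEGER REFERENCES Careers(IdCareer),
    IdStudent INTEGER REFERENCES Students(IdStudent),
    IPaddress TEXT,
    "Action"  TEXT,
    "Date"    TEXT,
    "Time"    TEXT
);
'''

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

# Hypothetical sample rows.
conn.execute("INSERT INTO Careers VALUES (1, 'Computer Engineering')")
conn.execute("INSERT INTO Students VALUES (1, 1, 8.5, 'Business')")
conn.execute(
    "INSERT INTO HistoryLog VALUES (1, 1, 1, '10.0.0.5', 'start', '2006-01-01', '09:00')"
)
conn.commit()
```

With foreign keys enabled, a HistoryLog row that refers to an unknown student is rejected, which keeps every logged transaction tied to a registered student.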

3.4 Knowledge base
The knowledge base contains knowledge that allows the inference mechanisms to reach conclusions [15]. Such knowledge can include concept hierarchies, which organize attributes or attribute values into different levels of abstraction [16]. Knowledge acquisition is illustrated in Figure 3.
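To make the idea of a concept hierarchy concrete, the sketch below maps attribute values to a coarser level of abstraction. The Major values appear in the paper's training data; the groupings, the GPA cutoffs, and the function names are hypothetical and serve only to show the mechanism.

```python
# Hypothetical two-level hierarchy over Major values. The groups are
# illustrative; the source does not define them.
MAJOR_HIERARCHY = {
    "Business": "Professional",
    "Education": "Professional",
    "Health Professions": "Professional",
    "Fine Arts/Humanities": "Arts",
    "Math Sciences/IT": "Science",
}

def gpa_level(gpa, cutoffs=((9.0, "high"), (8.0, "medium"))):
    """Map a continuous GPA to a coarser label; the cutoffs are assumptions."""
    for threshold, label in cutoffs:
        if gpa >= threshold:
            return label
    return "low"

def generalize(record):
    """Replace each attribute value with its parent in the hierarchy."""
    return {
        "Major": MAJOR_HIERARCHY[record["Major"]],
        "GPA": gpa_level(record["GPA"]),
    }

generalized = generalize({"Major": "Education", "GPA": 8.4})
```

Mining over generalized values like these can expose patterns that are too sparse to see at the level of individual majors or exact GPA values.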

[Figure 3 components: decision tree with splitting nodes Busy?, Major (Business), and GPA (80), and Yes/No branches]

Figure 3. Knowledge Hierarchies

[Figure 2 classes and attributes:
Students: IdStudent, IdCareer (FK), first, last, middle, age, gender, electronicSignature, permission, GPA, Semester, Employment, Enrollment, Major, Personality, SocialActivities, Influenced, Awareness
HistoryLog: IdTransaction, IdCareer (FK), IdStudent (FK), IPaddress, Action, Date, Time
Careers: IdCareer, name
Subjects: IdSubject, IdCareer (FK), name, semester
Topics: IdTopic, IdCareer (FK), IdSubject (FK), name, associatedTes
SubjectEnvironment: IdSubject, IdProfessor, ChatAccess, SpacingOfStudents, MultipleChoiceExam, MultipleVersionsExam
ResultsOfTests: IdResult, IdCareer, IdTopic (FK), IdSubject (FK), TestPassed, TestInterrupted, Rights, Wrongs, Grade, Rating, Level, Questions, Date, Time
StudentsPerception: IdStudent, IdProfessor, IdSubject, SubjectPerception, ProffesorInvolved, InstructorVigilance, UnfairExam, ConfusingExam, QuizTest
Professor: IdProfessor, Name]

Figure 2. Database schema to control transactions.

3.4. Data mining engine (DME)

Ideally, the DME consists of a set of modules for tasks such as characterization, classification, cluster analysis, and evolution and deviation analysis. We use Weka as the data mining engine. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or be called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes [17].

Attribute types: Busy (categorical), Major (categorical), GPA (continuous), Cheat (class).

Tid  Busy  Major
1    Yes   Business
2    No    Business
3    Yes   Business
3    Yes   Education
4    Yes   Education
5    No    Education
6    Yes   Fine Arts/Humanities
7    Yes   Fine Arts/Humanities
8    No    Fine Arts/Humanities
9    Yes   Health Professions
10   Yes   Health Professions
11   No    Health Professions
12   Yes   Math Sciences/IT
13   No    Math Sciences/IT
14   Yes   Math Sciences/IT

GPA, Cheat: 9.0 No; >8.0 Yes; 9.0 No; >9.0 No; >9.0 No; 9.0 No; >9.0 No
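Weka reads datasets in its ARFF text format, so training data with the attributes Busy, Major, GPA, and Cheat would need to be exported in that form before mining. The Python sketch below writes such a file as a string. The attribute names and nominal values follow the paper's table; the relation name, the helper functions, and the three data rows (including their GPA values) are hypothetical placeholders, not the authors' data.

```python
def quote(value):
    """Quote a value if it contains characters that ARFF treats specially."""
    if any(ch in value for ch in " ,{}%'\""):
        return "'" + value.replace("'", "\\'") + "'"
    return value

def to_arff(relation, attributes, rows):
    """Build ARFF text. Each attribute is (name, 'numeric') or (name, [nominal values])."""
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            spec = "{" + ",".join(quote(v) for v in kind) + "}"
        else:
            spec = kind
        lines.append("@attribute " + name + " " + spec)
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(quote(str(v)) for v in row))
    return "\n".join(lines) + "\n"

ATTRIBUTES = [
    ("Busy", ["Yes", "No"]),
    ("Major", ["Business", "Education", "Fine Arts/Humanities",
               "Health Professions", "Math Sciences/IT"]),
    ("GPA", "numeric"),
    ("Cheat", ["Yes", "No"]),      # the class attribute
]

# Hypothetical placeholder rows, not taken from the paper.
ROWS = [
    ("Yes", "Business", 9.1, "No"),
    ("No", "Education", 7.8, "Yes"),
    ("Yes", "Fine Arts/Humanities", 8.2, "No"),
]

arff_text = to_arff("cheat", ATTRIBUTES, ROWS)
```

The resulting text can be saved with an `.arff` extension and opened in Weka, where Cheat would be selected as the class attribute for a classifier.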
