Learning SQL with a Computerized Tutor


Antonija Mitrovic
Computer Science Department, University of Canterbury
Christchurch, New Zealand
[email protected]

Abstract

SQL, the dominant database language, is a simple and highly structured language; yet, students have many difficulties learning it. This paper presents SQL-Tutor, an Intelligent Teaching System designed as a guided discovery learning environment, which helps students overcome these difficulties. We present design issues and the current state of the implementation of the system, with special focus on individualization of instruction towards a particular student.

Introduction

The author has been teaching SQL since 1991, as part of a Database Management course. The course is usually taken by about 50 higher-level undergraduate students, and covers various topics, such as data models, database design, relational query languages (SQL, relational algebra and calculus), query processing, normalization theory, transaction processing and distributed databases. SQL is a simple and highly structured language; yet, students have many difficulties learning it. In this paper we present SQL-Tutor, an Intelligent Teaching System (ITS) developed for guided discovery learning of SQL. Like other ITSs, SQL-Tutor focuses on the individualization of instructional sessions towards a particular student, by developing a model of the student's knowledge, learning abilities and general characteristics, and tailoring instructional actions to the student's needs.

The rest of the paper is organized as follows. In the next section we look at the typical problems students have when learning SQL, and then discuss some related approaches to supporting SQL learning in section 3. Section 4 presents the architecture of SQL-Tutor and briefly surveys its components. Learning support provided by SQL-Tutor is the theme of section 5. Finally, section 6 gives directions for future research and our plans for SQL-Tutor.

Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee.
SIGCSE 98 Atlanta GA USA
Copyright 1998 0-89791-994-7/98/2..$5.00

Why SQL?

SQL is a complete database language; it contains data and view definition statements, as well as data manipulation statements. The entry level of the SQL2 standard from 1992 is supported by most DBMS vendors today. Although there is a movement towards graphical query interfaces, SQL is an extremely important development in the database world. It will be used for years to come, either for interactive or for programmed access to databases (embedded in other programming languages and in tools for application development). SQL3, the latest standard, scheduled to appear in 1998, is likely to make SQL even more important, with the introduction of support for knowledge-based and OO applications and distributed databases, among other new features.

Despite the simplicity and highly structured nature of SQL, students have many problems learning it. Some errors in students' queries come from the burden of having to memorize database schemas; incorrect solutions may, for example, contain incorrect table or attribute names. Other errors come from misconceptions in the student's understanding of the elements of SQL and of the relational data model in general. Some of the concepts students find particularly difficult to grasp are grouping and restricting grouping. Join conditions and the difference between aggregate and scalar functions are another two common sources of confusion. Other researchers report the same student misconceptions [6].

SQL is usually taught in the classroom, by solving problems on the blackboard, complemented by lab exercises. However, students find that it is not easy to learn SQL directly by working with a DBMS, as the error messages are limited to the syntax only. Figure 1 illustrates a situation in which Example 1 requires the student to specify a query with five clauses, as shown in the correct solution. When the student enters his/her incorrect solution, the error message generated by a RDBMS (Ingres in this case) is typically not of much help. The same figure illustrates the kind of messages¹ the student may obtain from SQL-Tutor. Note that SQL-Tutor can give feedback on semantic errors as well (such as specifying two tables in FROM where only the MOVIE table is needed).
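The checks behind the SQL-Tutor messages in Figure 1 can be sketched in a few lines. This is a hypothetical Python illustration, not SQL-Tutor's CLOS implementation: it assumes a query has already been split into clauses (the `Query` class and `check` function are invented names), treats SUM, COUNT, AVG, MIN and MAX as the aggregate functions, and ignores join conditions. The first check is syntactic and inspects only the student's query; the other three are semantic and need the ideal solution stored with the problem.

```python
import re
from dataclasses import dataclass, field

# Aggregate functions recognised by this sketch (an assumption, not SQL-Tutor's list).
AGGREGATE = re.compile(r"^\s*(SUM|COUNT|AVG|MIN|MAX)\s*\(", re.IGNORECASE)


@dataclass
class Query:
    """Hypothetical clause-by-clause view of a SELECT statement."""
    select: list[str]
    from_tables: list[str]
    where: str = ""
    group_by: list[str] = field(default_factory=list)
    having: str = ""


def is_aggregate(expr: str) -> bool:
    return AGGREGATE.match(expr) is not None


def check(student: Query, ideal: Query) -> list[str]:
    """Return every applicable message (the tutor itself shows one at a time)."""
    messages = []
    # Syntactic check: uses the student's query only.
    has_aggregate = any(is_aggregate(e) for e in student.select)
    only_aggregates = all(is_aggregate(e) for e in student.select)
    if has_aggregate and not student.group_by and not only_aggregates:
        messages.append("If there are aggregate functions in the SELECT clause, and the "
                        "GROUP BY clause is empty, then SELECT must consist of "
                        "aggregate functions only.")
    # Semantic checks: compare the student's query with the ideal solution.
    if ideal.group_by and not student.group_by:
        messages.append("You need to specify the GROUP BY clause!")
    if ideal.having and not student.having:
        messages.append("Specify the HAVING clause as well!")
    if set(student.from_tables) - set(ideal.from_tables):
        messages.append("You do not need all the tables you specified!")
    return messages


# Example 1 from Figure 1 (the join condition is not modelled here).
IDEAL_1 = Query(select=["DIRECTOR", "SUM(AAWON)"], from_tables=["MOVIE"],
                where="TYPE='comedy'", group_by=["DIRECTOR"],
                having="SUM(AAWON) > 1")
STUDENT_1 = Query(select=["DIRECTOR", "SUM(AAWON)"],
                  from_tables=["DIRECTOR", "MOVIE"], where="TYPE='comedy'")

if __name__ == "__main__":
    for msg in check(STUDENT_1, IDEAL_1):
        print(msg)
```

Run on Example 1, `check(STUDENT_1, IDEAL_1)` returns four messages (the aggregate rule, the missing GROUP BY and HAVING clauses, and the unnecessary DIRECTOR table); a pedagogical component would then choose which one to show first.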

¹ Note that the student is given only one message at a time, as governed by the pedagogical rules. Here we show all relevant messages for illustration.

Example 1: For each director, list the director's number and the total number of awards won by comedies he/she directed, if that number is greater than 1.

Correct solution:
SELECT DIRECTOR, SUM(AAWON)
FROM MOVIE
WHERE TYPE='comedy'
GROUP BY DIRECTOR
HAVING SUM(AAWON) > 1

Student's solution:
SELECT DIRECTOR, SUM(AAWON)
FROM DIRECTOR JOIN MOVIE ON DIRECTOR=DIRECTOR.NUMBER
WHERE TYPE='comedy'

INGRES:
E_US0B63 line 1, The columns in the SELECT clause must be contained in the GROUP BY clause.

SQL-Tutor:
- You need to specify the GROUP BY clause! The problem requires summary information.
- Specify the HAVING clause as well! Not all groups produced by the GROUP BY clause are relevant for this problem.
- You do not need all the tables you specified!
- If there are aggregate functions in the SELECT clause, and the GROUP BY clause is empty, then SELECT must consist of aggregate functions only.
- For every table that appears in the FROM clause, there must be at least one attribute from that table used in any clause of the query.

Figure 1. Inadequacy of feedback from a RDBMS

Example 2: List the names of all directors born in or after 1920.

Correct solution:
SELECT LNAME, FNAME
FROM DIRECTOR
WHERE BORN >= 1920

Student's solution:
SELECT LNAME, FNAME
FROM DIRECTOR
WHERE DIED >= 1920

Ingres:
fname    lname
Alfred   Hitchcock
Cecil    De Mille
John     Ford

SQL-Tutor:
Check that you are comparing the numerical constant to the right attribute in the WHERE clause!

Figure 2. Inability of a RDBMS to deal with semantic errors

Figure 2 illustrates a semantic error. Instead of using the BORN attribute, the student specified the search condition on the DIED attribute, and the DBMS produced the result shown. The student may not even notice the error. However, SQL-Tutor provides an appropriate message.

Related Work

… to specify queries in relational algebra, tuple or domain relational calculus, or SQL. Queries can be executed and students can inspect the resulting tables. The system also allows students to inspect definitions of tables, create new databases, alter or update existing databases and store definitions of queries. The esql system [6] supports learning SQL by visualizing the stages in query processing. The student sees esql as a graphical query interface. Once the student has specified an SQL statement, the system provides a step-by-step display of how the resulting table is formed.

Both systems provide more information on SQL than commercial DBMSs, and better user interfaces. However, they suffer from the same problem as DBMSs: they cannot provide feedback based on the student's solution, due to the lack of the knowledge needed to reason about the semantics of the problem being solved.

It is well known that one-on-one human tutoring is much more effective than traditional classroom instruction [2]. The goal of research in ITS is to build computerized tutors that achieve the effects of learning individually with a human tutor. ITSs contain domain knowledge, which enables them to select problems to be posed to students, to diagnose students' solutions and/or to solve the problems. Furthermore, such systems also contain knowledge of their students, represented in the form of student models. Pedagogical knowledge is necessary in ITSs in order to generate appropriate pedagogical actions (such as feedback). Finally, these systems also require communication knowledge, in order to communicate effectively with students.

SQL-Tutor is an ITS for SQL programming, implemented in CLOS [5] on SUN workstations. It will soon be ported to PC compatibles. Many dialects of SQL exist, since database vendors do not follow the standards; SQL-Tutor has been tailored to SQL as implemented by Ingres. SQL-Tutor is designed as a practice environment; we suppose that students have previously been exposed to the concepts of database management in lectures. Therefore, the system is not a substitute for the conventional style of education, but a complement to it. The system currently covers only the SELECT statement of SQL, but the same approach could be used with other SQL statements. This focus on the SELECT statement does not reduce the importance of the system, because queries cause the most misconceptions for students. Moreover, many of the concepts covered by SELECT are directly relevant to other SQL statements and to other relational database languages in general.

Figure 3. Architecture of SQL-Tutor (diagram components: Student, Interface, Pedagogical module, CBM, student models)

As illustrated in Figure 3, SQL-Tutor has a very simple architecture; it consists of a user interface, a pedagogical module and a student modeler. The interface is illustrated in Figure 4. The interface is a mediating device and hence it provides information about the system itself. The main window of SQL-Tutor is divided into three areas that are always visible to the student. The upper part of the window displays the text of the problem being solved, so the student can always easily remind him/herself of the elements requested in the query. The middle part contains the clauses of the SQL SELECT statement, thus visualizing the goal structure; students need not remember the exact keywords used or the relative order of clauses. The lowest part displays the schema of the currently chosen database. The schema name is given first, followed by the descriptions of tables. Each table is shown by its name and schema enclosed in a box. The name(s) of the attribute(s) forming the primary key are underlined and given in blue. The foreign key attributes are given in red. In these ways, the interface of SQL-Tutor supports the reification of the goal structure and reduces the working-memory load of students. The visualization of schemas is quite important; all database users are painfully aware of the constant need to remember table and attribute names, and the corresponding semantics as well.

Students can ask for the description of databases, tables or attributes by selecting appropriate options from the Help menu, or by directly selecting table/attribute names. Furthermore, users can learn about elements of SQL, such as functions, expressions, predicates, operators and others, by selecting appropriate options in the Help menu. The motivation here is to relieve the student of some of the cognitive load required for checking the low-level syntax, and to enable the student to focus on higher-level query definition problems. Students can also obtain the descriptions of various clauses by selecting the appropriate clause or by asking for help from the main menu. The Open menu allows for selection of a database or a problem to work on.

The pedagogical module (PM) is the heart of the system; it selects problems to be given to students and generates appropriate instructional actions according to the student model. PM observes every action the student performs in the interface, and reacts to it appropriately. At the beginning of the interaction, a problem must be selected for the student to work on.
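The problem-selection step can be sketched as follows. This is a minimal Python illustration (SQL-Tutor itself is written in CLOS), assuming a hypothetical student model that counts, per constraint, how often it was relevant and how often it was violated, and a hypothetical table mapping each problem to the constraints its ideal solution exercises. The priority order (a previously violated constraint first, then a constraint the student has not yet used) follows the simple strategy the paper describes in the section on learning in SQL-Tutor; `StudentModel` and `select_problem` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StudentModel:
    """Hypothetical per-constraint history of a single student."""
    relevant: dict[int, int] = field(default_factory=dict)  # constraint id -> times relevant
    violated: dict[int, int] = field(default_factory=dict)  # constraint id -> times violated
    solved: set[str] = field(default_factory=set)           # ids of solved problems

    def record(self, relevant_ids, violated_ids):
        """Update the counts after the student modeler has checked a solution."""
        for cid in relevant_ids:
            self.relevant[cid] = self.relevant.get(cid, 0) + 1
        for cid in violated_ids:
            self.violated[cid] = self.violated.get(cid, 0) + 1


def select_problem(problems: dict[str, set[int]], model: StudentModel) -> Optional[str]:
    """problems maps a problem id to the constraint ids its ideal solution exercises."""
    unsolved = sorted(pid for pid in problems if pid not in model.solved)
    # 1. Prefer a problem that revisits a constraint violated before.
    for pid in unsolved:
        if any(model.violated.get(c, 0) > 0 for c in problems[pid]):
            return pid
    # 2. Otherwise pick a problem that needs a constraint not used yet.
    for pid in unsolved:
        if any(c not in model.relevant for c in problems[pid]):
            return pid
    # 3. Fall back to any unsolved problem; None when everything is solved.
    return unsolved[0] if unsolved else None
```

This sketch is deterministic; the paper notes that letting students pick problems themselves adds useful randomness, which a fuller version would need to accommodate.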
When the student enters the solution for the current problem, PM sends it to the student modeler, which checks whether the solution is correct or incorrect, and updates the student model. The pedagogical module then generates appropriate feedback. When the current problem is solved, or the student requests a new problem to work on, the pedagogical module selects a new problem on the basis of the student model.

Figure 4. The Interface of SQL-Tutor

The system contains definitions of several databases, which are also implemented on the RDBMS used in the lab (currently Ingres). New databases can easily be added by supplying the same SQL files used to create the database in Ingres. SQL-Tutor also contains a set of problems for the specified databases and the ideal solutions to them. The solutions are necessary because the system has no domain module and is not capable of solving problems. The rationale for such a departure from the typical architecture of an ITS, which also includes a domain module, follows.

Designing an ITS to teach SQL presents various difficulties. Database queries are given in a natural language; however, the current state of the art in Natural Language Processing (NLP) is still far from being able to handle various problems present in queries, such as references and synonyms. One way to circumscribe the NLP problem is to represent the text of the problem not in its natural-language form, but in a form which could be the product of NLP, as done in [1]. However, it is hard not to build parts of the solution into such a representation. Furthermore, even if we overlook the NLP problem, the knowledge required to write SQL queries is very fuzzy. Therefore, it would be very difficult, if not entirely impossible, to develop a problem solver in this area.

SQL-Tutor is based on Constraint-Based Modeling (CBM) [8], a student modeling approach that focuses on student errors. For further details of CBM and how it is implemented in SQL-Tutor, see [7]. Domain knowledge is represented in CBM in a descriptive form, as constraints, and is used to identify the errors. Constraints divide all possible problem states into equivalence classes. All states in a single class are deemed to be pedagogically equivalent, in that they generate the same instructional action.

SQL-Tutor evaluates students' solutions by matching them to constraints. Some constraints deal with the syntax of the language; for example, there is a constraint saying that the SELECT clauses of all solutions must not be empty. Another example of a syntactic constraint checks that if a student's solution contains aggregate functions in the SELECT clause and the GROUP BY clause is empty, then the only kind of expressions allowed in the SELECT clause are aggregate functions. Other constraints deal with the semantics of problems, by comparing students' solutions to the ideal (correct) ones. That is the reason SQL-Tutor requires ideal solutions to problems. Constraints that compare the student's solution to the ideal one are more complex. For example, constraint 186 applies to situations where the WHERE clause of the ideal solution contains (at least one) condition which checks whether the value of a numeric attribute is greater than some numeric constant, and the same attribute appears in the student's solution in a condition with the greater-than-or-equal operator instead of the greater-than operator. If that is the case, the constraint ensures that the constant in the student's solution is 1 greater than the constant in the ideal solution (for integer values, A > c holds exactly when A >= c + 1).

The constraint base of SQL-Tutor currently consists of 199 constraints, which were acquired by analyzing the domain knowledge [4, 9] and on the basis of a comparative analysis of correct and incorrect solutions. It is well known that knowledge acquisition is a very slow, time-consuming and labour-intensive process. Anderson [1] reports that 10 or more hours are necessary for the induction of a production rule. When interviewing domain experts in order to acquire knowledge for expert systems, usually 2 to 5 production-rule equivalents are identified per day. The time spent on identification, implementation and testing of SQL-Tutor constraints averages 1.3 hours per constraint, which is significantly shorter than the times above. This may be a consequence of the same person serving as the domain expert and the knowledge engineer (and the system developer, for that matter), but it may also illustrate the appropriateness of the chosen formalism.
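The relevance/satisfaction structure of CBM constraints [8] can be illustrated with a small sketch. It is written in Python rather than SQL-Tutor's CLOS, and it assumes a hypothetical, heavily simplified representation in which a WHERE clause is a list of (attribute, operator, integer constant) comparisons joined by AND. The constraint modelled on 186 is relevant when the ideal solution compares an attribute with `>` and the student compares the same attribute with `>=`, and it is satisfied when the student's constant is one larger, since for integers A > c holds exactly when A >= c + 1. Constraint id 1 and all message texts are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

# Hypothetical simplified WHERE clause: (attribute, operator, integer constant).
Condition = tuple[str, str, int]


@dataclass
class Solution:
    select: list[str]
    where: list[Condition]


@dataclass
class Constraint:
    ident: int
    relevance: Callable[[Solution, Solution], bool]     # (student, ideal) -> bool
    satisfaction: Callable[[Solution, Solution], bool]  # (student, ideal) -> bool
    message: str


def gt_vs_ge(student: Solution, ideal: Solution) -> Iterator[tuple[int, int]]:
    """Yield (ideal constant, student constant) for each attribute compared
    with > in the ideal solution and with >= in the student's solution."""
    for attr, op, const in ideal.where:
        if op != ">":
            continue
        for s_attr, s_op, s_const in student.where:
            if s_attr == attr and s_op == ">=":
                yield const, s_const


CONSTRAINTS = [
    # Syntactic: the SELECT clause must not be empty (always relevant).
    Constraint(1, relevance=lambda s, i: True,
               satisfaction=lambda s, i: bool(s.select),
               message="The SELECT clause cannot be empty!"),
    # Semantic, modelled on constraint 186: A > c is equivalent to A >= c + 1.
    Constraint(186, relevance=lambda s, i: any(True for _ in gt_vs_ge(s, i)),
               satisfaction=lambda s, i: all(s_c == c + 1 for c, s_c in gt_vs_ge(s, i)),
               message="Check the constant you compare with in the WHERE clause!"),
]


def diagnose(student: Solution, ideal: Solution):
    """Return (relevant ids, violated ids, messages); only violated constraints
    produce a message, as irrelevant or satisfied ones need no action."""
    relevant, violated, messages = [], [], []
    for con in CONSTRAINTS:
        if con.relevance(student, ideal):
            relevant.append(con.ident)
            if not con.satisfaction(student, ideal):
                violated.append(con.ident)
                messages.append(con.message)
    return relevant, violated, messages
```

For an ideal condition BORN > 1919, a student's BORN >= 1920 satisfies the constraint and BORN >= 1919 violates it; a condition on DIED instead of BORN is not caught here, since that error would need a separate constraint comparing attributes.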

A student model contains general information about the student (his/her name and level of knowledge), a history of previously solved problems, and information about the usage of constraints, as manifested in the student's solutions.

Learning in SQL-Tutor

The main goal of ITSs is the individualization of instruction. In SQL-Tutor, instruction can be individualized in several ways: by generating feedback dynamically, and by selecting topics and problems on the basis of the student model. The level of feedback determines how much information is provided to the student. Currently, there are five levels of feedback in the system: positive/negative feedback, error flag, hint, partial solution and complete solution, arranged in increasing order of the amount of information. At the lowest level (positive/negative feedback), the message simply informs the student whether the solution is correct or not and, in the latter case, how many errors there are. An error flag message informs the student about the clause* in which the error occurred. A hint-type message gives more information about the type of error, as illustrated in Figure 4. Here, the student is given a general description of the cause of the error. Partial solution feedback displays the correct content of the clause in question, while the complete solution simply displays the correct solution of the current problem.

Problems are also selected on the basis of the student model. SQL-Tutor examines the student model and selects a problem for a constraint that the student has violated before, or a problem that requires the use of a constraint not yet used by the student. The system also allows the student to select the problem on his/her own. Such an approach introduces randomness in the coverage of constraints, which may mean that the student practises the use of some known constraint or even meets new ones. The randomness thus provides for challenge and/or review, and at the same time helps control for potential inaccuracies in the student model. Admittedly, the problem selection strategies just discussed are too simple, and we are currently developing more sophisticated ones.

SQL-Tutor is based on guided discovery and learning-by-doing. It supports three kinds of learning: conceptual, problem solving and meta-learning. The student can learn about concepts and elements of SQL by asking for explanations, using menu options and interface controls. SQL-Tutor is a problem-solving environment that supports acquisition of domain knowledge in a declarative form (i.e. constraints) and strengthening of this knowledge in practice. SQL-Tutor provides assistance in problem solving and arguments against incorrect actions. Finally, the system encourages meta-learning by supporting self-explanation on the basis of error messages and correct solutions. We plan to elaborate the pedagogical actions so that they place more emphasis on self-explanation, and also to incorporate other forms of meta-learning, such as using analogies.

* In case there are several messages for various clauses, the pedagogical module will select one of them to start with.

Conclusions

This paper presented the current state of the implementation of SQL-Tutor. The system has been shown to a number of database teachers, who were very supportive and expressed great enthusiasm for using it in their own courses. We plan to have the system ready for classroom use in early 1998. Before the system can be evaluated, there are several short-term goals, such as further sophistication of the interface and completion of the constraint base. In order to provide a more realistic working environment, we plan to connect SQL-Tutor to a DBMS, so that the student may inspect tables or query results.

We believe that SQL-Tutor will prove to be invaluable due to the semantically rich feedback it generates and its ability to adapt to a particular student. There are many possibilities for extending this research. More research is needed on pedagogical rules and problem-selection strategies. Related areas in the database arena, such as relational algebra and calculus, data modeling or normalization, could serve as domains for other small instructional tools and be connected with SQL-Tutor into a database exploration "world".

References

1. Anderson, J.R., Corbett, A.T., Koedinger, K.R. and Pelletier, R. Cognitive Tutors: Lessons Learned. The Journal of the Learning Sciences 4, (1995) 167-207.
2. Bloom, B. The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-one Tutoring. Educational Researcher 13, (1984) 3-16.
3. Dietrich, S. WinRDBI: a Windows-Based Relational Database Educational Tool. In SIGCSE'97, 126-130.
4. Elmasri, R. and Navathe, S.B. Fundamentals of Database Systems (2nd ed.). Benjamin/Cummings, Redwood City, CA, 1994.
5. Franz Inc. Allegro Common Lisp, 1996.
6. Kearns, R., Shead, S. and Fekete, A. A Teaching System for SQL. In Australasian Computer Science Education ACSE'97, ACM Press, (1997) 224-231.
7. Mitrovic, A. SQL-Tutor: a Preliminary Report. Tech. Rep. TR-COSC 08/97, Computer Science Dept., Univ. of Canterbury, 1997.
8. Ohlsson, S. Constraint-Based Student Modeling. In J.E. Greer and G.I. McCalla (eds.): Student Modelling: the Key to Individualized Knowledge-Based Instruction. Springer-Verlag, Berlin (1994) 167-189.
9. Pratt, P.J. A Guide to SQL. Boyd & Fraser, Boston, 1990.
