Introductory Programming Languages in Higher Education
D. Krpan, I. Bilobrk
Faculty of Science, Split, Croatia
E-mail: {divna.krpan, ivan.bilobrk}@pmfst.hr
Abstract - A growing number of students enrolled in introductory programming courses experience difficulties grasping the basic concepts and algorithms, which in turn manifest themselves in poor exam results. While determining the cause of these difficulties, we found that they are not unique to Croatian universities, but are rather a worldwide phenomenon. Focusing on what our findings indicate to be the leading cause, the choice of the introductory programming language, we have observed that today’s students require constant motivation, which tends to be absent in languages with a complex syntax. In this paper we describe our experiences in using the C, QBasic and Python programming languages in introductory programming courses. In order to better examine the problems, we conducted a two-part study with the students. The first part focused on group work with programming assignments using the three aforementioned languages, and the second part was a follow-up study in which we examined students’ experiences and knowledge retention after a relatively short period of time.
I. INTRODUCTION
Students enrolled in introductory programming courses often experience difficulties grasping basic programming concepts and algorithms. Many first-year students at our faculty come from high schools without any previous programming knowledge or experience. In their first semester they learn basic algorithms in the QBasic programming language, and in their second semester they learn C. The complex C syntax makes learning programming difficult, more so for students who failed the introductory programming course in the previous semester.

Problems with learning programming are present at universities all over the world. Students in introductory programming courses often encounter complex languages such as Java or C++ because these languages are very popular in the industry and on the job market, but their complexity is often noted as the source of problems [1]. Such programming languages are powerful but are not intended for writing small, beginner-style programs. Python emerged as an alternative, offering flexibility and simple, elegant solutions to novice problems, and it is asserting itself as the language of choice for introductory programming courses. Python has also gained a lot of popularity in the industry, with users such as Philips, Google, NASA, the US Navy and Disney [2]. Although its syntax is simple, Python is a powerful programming language that supports different programming styles, from procedural to object-oriented, as well as rapid prototyping [3]. Novice programmers might benefit from its simple syntax because they can focus on designing algorithms rather than on the syntax. Students usually like to see their program work as soon as possible and immediately start writing programs in the language they are learning, but later they often realize that they do not know enough syntax to express their ideas.

In the second section we give a short overview of the use of programming languages in the world and in Croatia, as well as our findings regarding the problems students face while learning programming. In the third section we describe in detail our research, which was conducted with first-year students at the Faculty of Science in Split over two semesters. Lastly, we give a conclusion of our findings and a possible direction for future research.

II. ORIGIN OF DIFFICULTIES WHILE LEARNING PROGRAMMING IN HIGHER EDUCATION
Although we acknowledge that students who are beginning to learn programming need to learn programming concepts and techniques rather than a specific programming language, one language must be chosen in order to express those concepts and implement those routines. This is the introductory programming language. In this paper, we consider the introductory programming language to be the one with which students become familiar with basic data types and algorithms for solving relatively simple problems. Some languages are better suited than others, but every introductory language should have a simple syntax, give quick feedback (interpreted languages), and support structured programming (to visually distinguish block statements) [4]. Statistics, on the other hand, unequivocally show which languages are most sought after in the industry, and which languages the beginners of today will have to know when they stop being beginners and start applying for jobs tomorrow. For example, C++ might be used for introducing modularity and abstraction, but although it is used at many universities, it is also very difficult to learn and to teach [5], and students quickly lose their motivation.

A. Learning programming prior to higher education
Students should spend as much time coding as possible. That is why the language syntax should be simple and structured, but also powerful [4]. Pascal and Logo were considered adequate teaching languages, but Pascal did not develop as fast as other languages, and Logo is considered a language for children. Programming courses prepare students for real life, and many universities take real-life requirements into account. The best way to assess those requirements is to look at the real-life relevance of programming languages. One of the most cited relevance indexes is the TIOBE Programming Community Index, which takes into account the number of times a certain language was searched for on a number of the most popular search engines, the courses taught, job ads posted, third-party vendor sales made and the number of skilled engineers worldwide in the past month. In Table I we give the top 10 rankings for July 2010 [6]. It is obvious that Java, C and C++ are very popular. From our point of view, the suitability of those languages for beginners is questionable because of their complexity.

Many students come to our faculty with poor or no programming skills, or even poor computer skills. The Croatian national curriculum envisages informatics classes twice a week for 45 minutes [7], but not in all high schools. Some schools teach informatics only in the first year, and students learn mostly office applications (word processors, spreadsheets etc.) with very little programming, if any. They then have a three-year gap before university. Schools with “mathematics” in their title teach four years of informatics with much more programming. There is very little online information about the programming languages used in different high schools, but from our research and our experience of working in some of those schools, the languages used to teach programming are Pascal, BASIC and C. Guido van Rossum, the developer of Python, recommends teaching programming in elementary school as a basic skill alongside reading, writing and mathematics [8].
Not all students who learn programming will become great programmers, just as not all students who learn writing will become great writers. That is acceptable, but the baseline of computer skills should be higher than it is today. The problems students face in higher education are a direct result of the unresolved problems in elementary and high schools. Elkner [5] described his experiences with teaching programming in high school. His school had chosen C++ because it was also used at the universities in which students might later enroll. Students were
TABLE I. LANGUAGE POPULARITY BASED ON TIOBE INDEX

| Rank July 2010 | Rank July 2009 | Programming language | July 2010 | Change from July 2009 |
|---|---|---|---|---|
| 1 | 1 | Java | 18.673% | -1.78% |
| 2 | 2 | C | 18.480% | +1.16% |
| 3 | 3 | C++ | 10.469% | +0.05% |
| 4 | 4 | PHP | 8.566% | -0.70% |
| 5 | 6 | C# | 5.730% | +1.19% |
| 6 | 5 | Visual Basic | 5.516% | -2.27% |
| 7 | 7 | Python | 4.217% | -0.22% |
| 8 | 8 | Perl | 3.099% | -1.10% |
| 9 | 21 | Objective-C | 2.498% | +1.99% |
| 10 | 9 | JavaScript | 2.432% | -1.08% |
gradually losing their interest in the C++ programming course because they found it very difficult. Teachers and students were thrilled after switching to Python, especially the advanced programmers, because the students were able to write far more complex programs, which were later used by the school. The number of enrolled students increased even though the course was described as challenging [9].

Problems with learning programming therefore already exist before university, and they certainly affect students’ success in university programming courses. We also emphasize the importance of choosing an adequate introductory programming language and have decided to review what others use. Because there are many universities in the world, we decided to look at the top-ranked ones. Just as we chose a ranking system for programming languages, we selected a ranking system for universities, based on the two most prominent ones:
• The Academic Ranking of World Universities (ARWU) (http://www.arwu.org/), developed at Shanghai Jiao Tong University,
• Times Higher Education - QS World University Rankings (THE-QS) (http://www.topuniversities.com/), published in the Times Higher Education magazine.
The rankings differ somewhat in the positions of certain universities, as well as in which universities are included in the top ten. Without giving precedence to either ranking system, we chose to analyze the seven universities found among the top ten of both rankings. They are listed alphabetically in Table II, together with their respective positions in the ARWU and THE-QS top ten.

TABLE II. “TOP RANKED” UNIVERSITIES

| University | ARWU | THE-QS | Language |
|---|---|---|---|
| CalTech | 6 | 10 | Python |
| Cambridge | 4 | 2 | Java |
| Harvard | 1 | 1 | C, PHP and JavaScript |
| MIT | 5 | 9 | Python |
| Oxford | 10 | 6 | Haskell |
| Princeton | 8 | 8 | JavaScript, Java |
| University of Chicago | 9 | 7 | C/C++ |

As shown in Table II, the “top seven” employ a mixture of languages to introduce the novice student to programming, from procedural through scripting and object-oriented languages to functional ones.

Considering the fact that the Croatian higher education system differs profoundly from these universities, it is pointless to analyze Croatian universities per se. We shall instead analyze the Croatian faculties that emphasize computer science in their curricula. Unlike the wide variety of languages used at the “top seven” universities, Croatian colleges are almost unanimous when it comes to choosing an introductory programming language. We observed the following colleges in Croatia:
• Department of Electrical Engineering Information Technology (EIR) Dubrovnik
• Department of Informatics (INF) Rijeka
• Department of Mathematics (MATH) Zagreb
• Faculty of Electrical Engineering (ETF) Osijek
• Faculty of Electrical Engineering and Computing (FER) Zagreb
• Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture (FESB) Split and
• Faculty of Organization and Informatics (FOI) Varaždin
Since only FOI and INF teach C++ and all others teach the venerable C, it is evident that the introductory programming language in Croatia is mostly the same. We excluded our own Faculty of Science in Split from the list above because it has two introductory programming courses, whereas all the others have one. The reason for this is fairly complex and beyond the scope of this article, but in short, it is the previously mentioned poor standard of teaching programming in both primary and secondary education. Students (or more precisely, pupils) rarely write a single line of code, and those who actually do some programming typically do it in their first year of high school, three years before they enroll at our Faculty. Of course, there are schools which focus on programming in their computer science courses, but they are the exception to the rule. Since we cannot treat the vast majority of these students the same as the few who have actually done some programming, the introduction to this world is fairly gentle. The first course, Programming 1 (P1), introduces the student to the basic concepts using QBasic, and the following course, Programming 2 (P2), deepens that knowledge using a more powerful programming language, namely C.

In the academic year 2009/10, courses P1 and P2 both consisted of 15 weeks of lectures and labs. One group of students also had advanced labs. Students attending advanced labs in course P1 solved advanced QBasic assignments, but they did not produce better results than the other groups. It therefore did not seem very likely that advanced assignments in C would bring any improvement for them in the next semester, so we introduced Python in order to try to simplify things for them. In the advanced labs, students worked in Python on the same assignments that were solved in C in the regular labs. We hoped that the simpler syntax would enable them to understand the algorithms.
The recurring term so far is “simpler syntax”, so let us examine it.

B. Syntax comparison: QBasic, C and Python
The first program beginners write is the “Hello world” program. In C it looks like this:

    #include <stdio.h>

    int main() {
        printf("Hello world!\n");
        return 0;
    }

On the other hand, in Python and QBasic we simply write:

    print "Hello world!"

Here we give a comparison of Python, C and QBasic based on the programming concepts we teach our students, including a description of the idiosyncrasies of each language, with an emphasis on syntax. The most rudimentary programming concepts every student needs to comprehend in an introductory programming course are: data types, input/output, decision making (branching), loops, arrays, functions and files. The full complexity of C syntax is already visible in the “Hello world” example: we had to use a function, include a library, use special characters and return a function value. All this complicates programming for beginners.

1) Data types
Generally, numeric data types are similar in all three languages. C requires variable declaration, but QBasic and Python do not. Python is dynamically typed, which means that one variable may hold values of different types in the same program. There are pros and cons to this [10]. Python is special because it includes a very long integer, which is virtually unlimited in size, and a built-in complex number type. In C, a character is also considered a number (equivalent to its ASCII code). A string is not available as a simple data type in C. Since a string is defined in C as a null-terminated array of characters, assignment is only available in the declaration. On the other hand, a string in QBasic is treated as a simple data type, which, unlike in C, complicates access to each character. Python is more flexible here because it may treat a string as a simple data type, as an array of characters and also as an object. It is impossible to input a string in C without introducing arrays.
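The Python data-type features mentioned above can be illustrated with a minimal sketch. It is an illustrative example only, not taken from the course materials; it assumes Python 3 (where print is a function), and all variable names are hypothetical.

```python
# Illustrative sketch only; all variable names are hypothetical.

# Dynamic typing: the same name may refer to values of different types.
value = 42
type_before = type(value).__name__   # name of the integer type
value = "forty-two"
type_after = type(value).__name__    # name of the string type

# Integers grow as needed instead of overflowing a fixed-size type.
big = 2 ** 100

# Complex numbers are built in; j denotes the imaginary unit.
z = (1 + 2j) * (3 - 1j)

# A string can be indexed like an array of characters ...
text = "Hello"
first_char = text[0]
# ... and it is also an object with methods.
shouted = text.upper()

print(type_before, type_after)
print(big)
print(z)
print(first_char, shouted)
```

None of these lines needs a declaration, a library include or an array, which is the contrast with C drawn in the paragraph above.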
2) Input and output
Data input and output is most complicated in C, because students must use functions and understand memory addresses and the passing of parameters to functions by value and by reference in order to simply input a number. From version 3.0 onwards, Python also uses functions for both input and output, but they are simpler, and Python also supports C-style formatted output using “placeholders” if necessary. QBasic contains input and output statements. Here we should remind ourselves that we are observing these “complications” from a novice programmer’s point of view. String input may become complicated or confusing when spaces are involved. A space character is treated as a “data separation” character by some C input functions, so the programmer must be more careful with strings. QBasic and Python are not “space-sensitive”.

3) Branching
Curly brackets in C are used to denote an instruction block. Each statement in QBasic which may contain an instruction block has a keyword for that purpose. Python imposes indentation. Each instruction block must be
indented. The indentation of all instructions within one block must be the same. This feature may be useful for beginners because it forces them to write tidy code. A “select case” type of statement is not available in Python.

4) Arrays
The standard data structure available in QBasic and C is the array. Instead of arrays, Python offers lists, tuples and dictionaries. Lists resemble one-dimensional arrays, but may contain mixed data types such as integers and strings, and also other lists. Tuples are immutable (unchangeable) sequences, which may also contain mixed data types. Dictionaries are like hash tables. String input is more complicated in C because the programmer must consider the maximum size of an array, but in QBasic and Python it is no different from the input of a single character.

5) Loops
All three programming languages have the counted (FOR) loop as well as conditional loops. QBasic and C contain loops with the condition at the end, but Python does not. FOR loops in C and QBasic are driven by a numeric counter, whereas the Python for loop iterates directly over any sequence, including lists, strings and files.

6) Functions
Python is stricter about passing parameters to functions. Parameters are passed by reference, but immutable objects such as tuples, integers and strings cannot be changed by the called function.

7) Files
File access is available in all three languages in text and in binary form, reading one character, one line, one structure, etc. at a time.

We decided to conduct a two-part study in order to determine and resolve the problems that arise while learning programming in our introductory programming courses. The first part was conducted with the intention of improving students’ achievement by encouraging them to work in groups, and the second was conducted as a follow-up study.

III. DETERMINING AND RESOLVING ISSUES IN INTRODUCTORY PROGRAMMING COURSES
Students involved in the first part of the research were monitored in their first semester during the course P1. Through informal conversation, we realized that they rarely collaborate or learn together. As first-year students, they do not know each other well, so this was expected. After the first semester, we noticed that they continued with the same practice of individual work, although some students expressed interest in collaboration. Students seem more comfortable asking their peers for help than asking teachers, and since they lacked collaboration, teachers had to take the initiative.

A. Group modeling research
Students involved in the course P2 described the C syntax as very complex and difficult. We hoped that introducing Python would focus students’ attention on algorithms. Python seems very close to natural English and offers different modes and styles for solving problems. During labs, students expressed their opinions about the three programming languages they encountered. Some of our students were excited about Python’s often shorter solutions. On the other hand, some students had learned C in high school and did not share the same enthusiasm about learning another programming language besides C and QBasic.

We decided to conduct the research in the last three weeks of advanced labs, after students had learned enough to be able to work on more advanced assignments. Students were divided into 5 small groups of 6 members. Each group had to program in all three languages, but group members were allowed to select their preferred language.

1) Assignment selection
Respecting students’ different preferences, we decided to distribute assignments that each group had to solve in all three programming languages. To minimize the chances of slacking, it was necessary to divide the workload among pairs. Assignments for each group were similar and consisted of two parts, to make working in pairs easier. Each assignment should be too complex for an individual member, but simple enough to keep students interested and motivated, based on Hoppe’s problem-selection criterion [11].

Students had to write a program for coding/decoding text files. The first part of the program reads a text file, translates each character into its ASCII code, and then translates each code into a number system with base N. The second part of the program reads a file containing numbers in base N, translates them into ASCII codes, and then back into characters. Each group had a different number base (3, 4, 5, 7 or 8). Since we formed groups of six students, there were three pairs available in each group. Each pair of students had to write the program in one programming language, and each student in the pair was to write one direction (coding or decoding). Students were free to choose peers and languages according to their preferences. Assignments were graded on three aspects: (i) speed and accuracy, (ii) communication and (iii) report.
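The coding/decoding assignment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not one of the students’ solutions: the function names are hypothetical, and the space-separated output format is an assumption, since the paper does not specify how the base-N numbers were delimited in the file.

```python
# Illustrative sketch of the coding/decoding assignment.
# Assumptions (not from the paper): function names, and the use of a
# single space to separate the encoded numbers.

DIGITS = "0123456789"


def to_base(number, base):
    """Return a non-negative integer written in the given base (2..10)."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        digits.append(DIGITS[number % base])  # least significant digit first
        number //= base
    return "".join(reversed(digits))


def encode_text(text, base):
    """Coding direction: character -> ASCII code -> number in base N."""
    return " ".join(to_base(ord(ch), base) for ch in text)


def decode_text(encoded, base):
    """Decoding direction: number in base N -> ASCII code -> character."""
    return "".join(chr(int(token, base)) for token in encoded.split())


def encode_file(source_path, target_path, base):
    """Read a text file and write its base-N encoding to another file."""
    with open(source_path, encoding="ascii") as source:
        text = source.read()
    with open(target_path, "w", encoding="ascii") as target:
        target.write(encode_text(text, base))


def decode_file(source_path, target_path, base):
    """Read a file of base-N numbers and write the decoded text."""
    with open(source_path, encoding="ascii") as source:
        encoded = source.read()
    with open(target_path, "w", encoding="ascii") as target:
        target.write(decode_text(encoded, base))
```

The split into an encoding and a decoding function mirrors the way the workload was divided within a pair, with each student responsible for one direction.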
Only the delivery of a complete solution (both directions at the same time) was accepted. Communication between groups was not allowed. Since faster groups would get more points, such communication was not in their best interest. This was inspired by the competitiveness criterion [11]. Communication within a group was recommended. We selected the social networking tool Facebook as the communication medium.

2) Selection of group members
There are two kinds of student groups: informal and formal ones. Informal groups are temporary and unplanned, and exist for a short period of time. Formal groups are planned, and their members work together on a project or an assignment. Learning in small groups might increase students’ satisfaction, learning and retention of knowledge [12]. The groups discussed in this paper are formal groups. To simplify, we might say there are two ways to select group members: teacher’s choice or students’ choice. If the teacher plans the formal groups, students might not collaborate or communicate. If we allow students to choose, they might form uneven groups, and the process could take longer.
In order to overcome the disadvantages of both approaches, we used a hybrid approach. We prepared a questionnaire which contained all students’ names. Each of the 30 students had to grade at least five other students on a scale from 0 to 4. Grade 0 indicated dislike (“I don’t want to work with that student”), and the positive grades indicated increasing levels of preference, with a maximum grade of 4 (“I really want to work with that student”). Students were encouraged to grade all other students in order to provide more detailed information. Generally, students are not very enthusiastic about questionnaires [13], and our students provided fewer details than expected. However, from that information we formed three groups consisting of six students each. Because of insufficient information on the preferences of the remaining 12 students, we also considered other parameters, such as test results in programming languages (final exams in P1 and mid-term exams in Python and C). We did not form homogeneous groups with a similar level of knowledge, because those two groups would have been too different from each other, so we formed the last two groups with evenly distributed levels of knowledge.

3) Discussion of results
Students were excited about the unusual informal conversation with their teachers on Facebook, and that seemed promising. Groups were labeled with the letters A, B, C, D and E, in the order in which their members were selected. That means that groups A, B and C were selected using students’ social preferences, and groups D and E were selected using previous test results. Members of group B were the fastest in joining their Facebook group and started communicating with the teachers. The next groups were A and C.
It is interesting to note that some members of groups D and E never joined their Facebook groups, which resulted in incomplete groups. According to the students’ statements, they were concerned about solving their assignments because they did not know each other well and felt they would not be able to collaborate. Group A consisted of students who had to take the course P2 for the second time, and they did not know the other students well. They were unusually motivated, submitting their solutions to the assignments in C and QBasic before any other group, but the Python part remained unfinished. One of the students reported the group’s progress to the teachers, describing how they met after classes and communicated offline, which could not be monitored. Although group B was the first to join its Facebook group, in the end it submitted nothing. Group C finished the QBasic and Python programs but did not finish the C part. Individual students in groups D and E sent programs in Python by email, and they did not collaborate with other members. The assignments therefore remained incomplete for all groups.

Since the research was conducted in the last three weeks of labs, some students from group B stated that they had to prepare for pending exams. The best groups and individuals were rewarded with an increased final grade in the course P2, but there was no penalty for unfinished assignments. Although one of the pending exams was the exam for the course P2, students missed a good chance for practice.
Students’ informal statements did not provide us with much useful information about which programming language they prefer and why, so we decided to take a closer look. At the beginning of their second year, in the third semester, we gathered opinions from the students on all the programming languages they had encountered in the introductory courses P1 and P2, and tried to determine how much knowledge had stayed with them after the holidays. We were hoping that the gathered data could help us in choosing an appropriate introductory programming language.

B. Follow-up research
We prepared 3 problems of varying difficulty for the students to solve in each of the programming languages they had learned in the courses P1 and P2. We encouraged students to use a modular approach, using functions to help them get past different parts of each problem, but did not insist on it. The selected problems were:
• Calculate n!
• Reverse a given string
• Count the words in a given text file
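For orientation, the three problems can be sketched in Python as shown here. These are illustrative solutions only, not the students’ submissions; the function names are hypothetical, and the word count assumes that words are separated by whitespace.

```python
# Illustrative sketches of the three follow-up problems.
# Function names are hypothetical; these are not the students' solutions.


def factorial(n):
    """Problem 1: return n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for factor in range(2, n + 1):
        result *= factor
    return result


def reverse_string(text):
    """Problem 2: return the text reversed letter by letter."""
    return text[::-1]


def count_words(path):
    """Problem 3: count whitespace-separated words in a text file."""
    with open(path) as source:
        return sum(len(line.split()) for line in source)
```

Writing each problem as a separate function follows the modular approach that the students were encouraged, but not required, to use.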
The first problem is the easiest of the three, and we believed that none of the students would have any problems with it in any of the languages. The second problem was an intermediate one, where students had to enter a string and reverse it letter by letter, as opposed to word by word. Students were free to choose between pointer arithmetic and “classic” array indexing while programming in C. The third problem, in our opinion the most difficult one, was to count the words in a text file. Students were given predefined sentences, which they had to enter into a text file. They then had to open that text file and count the words in it. We defined words as being separated from each other by at least a single white space, with no punctuation marks, dashes or dots. Before and after the actual programming, we conducted a quick survey on students’ expectations: whether they would solve the problems, which language and which assignment would pose the most difficulties, and so on.

1) Discussion of solutions
The solutions varied in the number of lines of code according to the difficulty of the assignment, but also according to the programming language used. This was particularly noticeable when comparing solutions of the same assignment in C and in Python. The C solutions were as much as three times longer, whereas the Python solutions were short, elegant and extremely self-explanatory. Students’ solutions varied. Very few students solved the assignments in all of the languages they had learned, and most of them did not solve the third problem at all.

TABLE III. STUDENT SOLUTION SUCCESS

| Assignment | Done in QBasic | Done in C | Done in Python |
|---|---|---|---|
| 1 | 74.19% | 74.19% | 35.71% |
| 2 | 51.61% | 38.71% | 21.43% |
| 3 | 0.00% | 3.23% | 0.00% |

2) Discussion on survey
The survey results were quite interesting, with students mostly perceiving correctly where their difficulties lie, be it a specific assignment, a programming concept they grasp in one programming language but not in the other(s), or a programming language as a whole. The questions in the pre-survey were:
1. Grade the ease of use of each programming language from 1 to 5
2. Which programming concepts do you find easy and which do you find difficult?
3. Which programming language will be the easiest and which the most difficult for you to solve each assignment in?
4. Will you solve all three assignments?
In the post-survey we asked these questions:
1. To what degree were your expectations true?
2. Did you overestimate or underestimate the difficulty of any of the assignments?
3. Which programming language was the easiest for solving the assignments and why?
4. Which programming language was the most difficult for solving the assignments and why?

Although the students graded C as the most difficult programming language to use (average grade 2.60) before actually solving the assignments, 70.97% of them later stated that it was easier to solve the assignments in C. The main reason stated was that C was the most recently used language, which seems to coincide with 51.61% of the students saying that QBasic was more difficult because they had learned it in the first semester and had forgotten the syntax. Although Python was studied in the same period of time as C, the final course grade included only programming in C, not in Python, so in our opinion the students did not take Python as seriously as they should have. Students accurately perceived which programming concepts they would have the most difficulties with (files and string manipulation). This is apparent from the small number of students who solved the third and second assignments. We are pleased, however, that our perception of difficulty did not differ from the students’: when ranking the assignments by difficulty, the students ranked them in exactly the same way as we did.

IV. CONCLUSION
In this paper we wanted to assess the potential causes of problems while learning programming in higher education. We have found that these problems are not unique to our own Faculty but are also present at leading universities in Croatia and the world. We have identified the causes to be, firstly, a poor programming background in primary and secondary education and, secondly, the choice of the introductory programming language. During our research we tested three programming languages (QBasic, C and Python) in order to assess which would be the best introductory programming language for our students. Although Python seemed promising, the final grade in the introductory programming course P2 was regrettably based only on the knowledge of C and not of Python, so the students did not approach Python seriously enough. We have also found that the retention of knowledge was poor, which we attribute to the relatively long time between the courses and our research. Future research would include a more detailed study, not only at our Faculty but across the entire country, to get a clearer picture of student problems.

REFERENCES
[1] J. Zelle, Python programming: An introduction to computer science. Wilsonville, Oregon: Franklin, Beedle & Associates Inc., 2003.
[2] H. Fangohr, "A comparison of C, MATLAB, and Python as teaching languages in engineering," in Computational Science - ICCS 2004, 2004, pp. 1210-1217.
[3] A. Gauld. (2010, March 3.). Learning to Program. Available: http://www.freenetpages.co.uk/hp/alan.gauld/, 2007.
[4] L. Grandell, M. Peltomäki, R. Back, and T. Salakoski, "Why complicate things?: Introducing programming in high school using Python," in Proceedings of the 8th Australian Conference on Computing Education, 2006, pp. 71-80.
[5] J. Elkner. (2010, 20.06.). Using python in a high school computer science program. Available: http://www.python.org/workshops/200001/proceedings/papers/elkner/pyYHS.html, 2000.
[6] TIOBE. (2010, 10.07.). TIOBE programming community index for July 2010. Available: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
[7] MZOŠ. (2010, 22.10.). Gimnazijski i umjetnički programi. Available: http://public.mzos.hr/Default.aspx?sec=3261
[8] G. van Rossum, "Computer programming for everybody (revised proposal): A scouting expedition for the programmers of tomorrow", Corporation for National Research Initiatives, Reston VA, 1999.
[9] J. Elkner, L. Berezhny, and J. Straw, "Using Python in a high school computer science program - year 2," presented at the Tenth International Python Conference, 2002.
[10] S. Ferg. (2010, 04.03.). Python & Java: A Side-by-Side Comparison. Available: http://pythonconquerstheuniverse.wordpress.com/category/javaand-python/, 2009.
[11] U. Hoppe, "The use of multiple student modeling to parameterize group learning," in Proceedings of AI-ED'95, 7th World Conference on Artificial Intelligence in Education, Washington, DC, AACE, 1995, pp. 234-249.
[12] B. G. Davis, Tools for teaching. San Francisco: Jossey-Bass Publisher, 1993.
[13] A. Gleeson, J. McDonald, and J. Williams, "Introductory microeconomics students’ perceptions of the effectiveness of a collaborative learning method", Innovation, 2005.