British Journal of Educational Technology doi:10.1111/j.1467-8535.2009.00980.x
Vol 42 No 1 2011
40–49
Using text mining to uncover students’ technology-related problems in live video streaming
M’hammed Abdous and Wu He
M’hammed Abdous is director of the Center for Learning Technologies at Old Dominion University in Norfolk, Virginia. His research interests include distributed learning trends, e-learning and quality assurance, process reengineering, and curriculum planning and development. Wu He is an instructional technologist in the Center for Learning Technologies at Old Dominion University in Norfolk, Virginia. His email address is [email protected]. Address for correspondence: M’hammed Abdous, Center for Learning Technologies, Old Dominion University, Norfolk, VA 23529, USA. Tel: 1757 683 6378; email: [email protected]
Abstract
Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students’ learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students’ technology-related problems and to improve their learning experience, we applied text mining to data culled from LVS interactions. Our findings revealed low LVS student participation, which prompted us to initiate several actions to promote more active student participation. Our findings support previous studies regarding the effectiveness of data mining in transforming raw educational data into knowledge and decision-making tools.
© 2009 The Authors. British Journal of Educational Technology © 2009 Becta. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.
Introduction
At the risk of overstating the obvious, unprecedented advances and convergences in hardware, software and networking technologies are reshaping the distance learning landscape. A new generation of broadband wireless tools with staggering computing processing power has increased storage capability, and more efficient audio/video compression encoding standards are diversifying distance learning delivery modes. According to a 2006–07 survey completed by the US National Center for Education Statistics, 75% of all US distance education programmes use some form of synchronous internet-based technology, with 49% of these programmes using two-way interactive video (Parsad & Lewis, 2008). Although traditional distance learning programmes have long used video as a delivery modality, either by broadcast, videotape playback or web-based video archive, few institutions use live video streaming (LVS) as a delivery mode. As a real-time delivery mode,
LVS leverages the explosive growth of broadband connectivity to expand classroom walls by enabling students to attend classes remotely. Although this delivery mode clearly provides real-time interaction and feedback opportunities to students, it also poses new logistical and pedagogical challenges, both for instructors and for students. Instructors teaching LVS students have to manage interaction using multiple devices while integrating their LVS students into an expanded classroom dynamic. Meanwhile, the LVS students are challenged to remain active and engaged, and to participate in the classroom dynamic while watching their instructor on their desktop screens. Parallel to this trend, explosive growth in online learning management systems is generating massive amounts of unstructured educational data. This phenomenon is reflected in the emergence of data/text mining (TM) and analytics as complementary forces transforming the landscape of technology in higher education (Norris, Baer, Leonard, Pugliese & Lefrere, 2008). Indeed, Allison and DeBlois (2008) note in the latest EDUCAUSE Current Issues Annual Survey that business intelligence, analytics and data mining are among the main areas of e-services which are currently consuming resources. According to Campbell, DeBlois and Oblinger (2007), analytics uses statistical techniques to mine institutional data and uses predictive modelling to produce ‘actionable intelligence’. By exploiting the capabilities of recent technological innovations (computational processing power, the sophistication of databases, artificial intelligence, and modelling and statistical methods), data mining is enabling organisations both to uncover and to understand the hidden trends and patterns found in vast databases (Luan, 2002), and to turn data into useful information and knowledge (Hanna, 2004). This paper is focused upon the intersection of these two relatively new trends: LVS and data/TM. 
By applying TM to the data generated from LVS students’ questions, we aim to unveil hidden patterns, technology-related problems and trends capable of guiding our efforts in improving LVS students’ learning experience. We begin by reviewing the literature on the usefulness of data/TM in turning educational data into knowledge and decision-making tools. After that, we share some background information about our study, followed by a description of our data collection and mining process. Finally, we discuss some preliminary findings and make recommendations for the effective use of TM as a tool which can improve LVS students’ learning experience.
Literature review
Data mining
Although data mining has been widely used in business environments to predict future trends and consumer behaviours (Harding, Shahbaz, Srinivas & Kusiak, 2006; Ngai, Xiu & Chau, 2009), only recently have higher education institutions started to exploit the potential of this powerful analytical tool (Black, Dawson & Priem, 2008). According to Castro, Vellido, Nebot and Mugica (2007), data mining is being used in higher education (1) to assess students’ learning performance, (2) to provide feedback and
learning recommendations based on students’ learning behaviours, (3) to evaluate learning material and web-based courses, and (4) to detect atypical students’ learning behaviour. Following this line of thinking, Perera, Kay, Koprinska, Yacef and Zaiane (2009) used clustered data mining techniques to support learning group skills by building automated mirroring tools capable of facilitating group work. In a similar study, Sun, Cheng, Lin and Wang (2008) used rules based on data mining to form high-interaction learning groups. For their part, Hung and Zhang (2008) applied data mining techniques to server logs, both to reveal online learning behaviour patterns and to support online learning management, facilitation and design. Their study’s results revealed students’ behavioural patterns and preferences, which helped them to identify active and passive learners, and which extracted important parameters for the prediction of their performance (Hung & Zhang). Using a similar approach, Ba-Omar, Petrounias and Anwar (2007) analysed Web access logs to identify learning patterns and offline learning styles. Elsewhere, Zaiane and Luo (2001) have analysed server logs to understand online learners’ behaviours in an effort to improve their web-based learning environment. Later, Zaiane (2002) used association rule mining to construct a recommender system based on data from online learners’ profiles, access histories and collective navigation patterns. This system can ‘intelligently’ recommend learning activities or shortcuts to learners, based on the actions of previous learners. Similarly, Burr and Spennemann (2004) have pointed out that analysis of the patterns of user behaviour is important from both the technical and pedagogical perspectives in order to: (1) predict network and traffic load, (2) align pedagogy with users’ behaviours, and (3) plan and deliver services in a timely manner. 
TM
TM is focused on finding and extracting useful or interesting patterns, models, directions, trends or rules from unstructured text documents (Feldman, 1995; Hung & Zhang, 2008; Nahm, 2004). As an automated technique, TM is used to efficiently and systematically identify, extract, manage, integrate and exploit knowledge for research and education (Ananiadou, 2008). To illustrate the steps in the process of TM, we propose the following, as seen in Figure 1.
In sum, with their ability to sift through massive structured textual data (data mining) and unstructured textual data (TM), both analytical tools have great potential to transform raw educational data into actionable information.
This paper describes our contribution to this upcoming field of research. By applying TM to messages submitted by students to their instructors, we propose that support for the LVS students’ learning experience can be improved. To limit the scope of this paper, our goal is to explore the potential of TM as a tool (1) to improve student support during LVS courses
[Figure 1: Text mining process. Step 1: data/text collection; Step 2: data/text preprocessing; Step 3: data/text analysis; Step 4: actionable information]
(particularly by detecting self-reported technical problems by the students) in an effort to improve student satisfaction, and (2) to detect students’ participation patterns, engagement or disengagement, in order to improve both the students’ learning experience and their instructors’ ongoing support.
Background of the study
This study was conducted in a moderate-sized, urban, public university that has been involved in technology-delivered distance learning since the mid-1980s. Historically, its distance learning courses have been broadcast via satellite from the main campus to remote receive sites around the country. In recent years, though, delivery modes have been expanded, and they now include two-way video, the Internet, CD-ROM and LVS. In Spring 2009, all of the televised distance learning course archives (115 courses) were available for distance learning students to view online. Of these courses, 75% are offered in an LVS format, accessible by students unable to attend remote receive sites. To enable the option of LVS, a cross-functional team was created, which included network engineers, programmers, instructional designers and audio/video experts who addressed a mix of technical, pedagogical, logistical and policy issues. On the technical side, in order to build upon our existing two-way videoconferencing system, our video
Figure 2: Live video streaming student interface
streaming team selected H.264 (a worldwide standard known for delivering high-quality video) as our video compression standard. To support this choice, a cutting-edge hardware and software infrastructure was built, which ingests, stores and pushes the signal to an external provider. In order to reduce the usual delay and latency problems associated with streaming video, we outsourced our live streaming services to Akamai, a leader in Internet content delivery.
LVS interface
For the receiving end, a comprehensive LVS interface (Figure 2) was designed. This interface allows students to view the video stream (either live or archived), take notes and email them, send messages to the instructor during class, communicate with other streaming students during class and get help with their questions. In addition, students can search, bookmark and share their bookmarks, on either the live or the archived videos of their classes.
Virtual instructional assistant
As part of the user interface, the virtual instructional assistant was developed to enable LVS students to submit their questions/comments to their instructor. Questions submitted by LVS students are displayed instantaneously on a monitor next to the instructor. Instructors have the option to read/answer the messages, or to save, archive and email them for later review. This tool is intended to enable instructors to seamlessly integrate LVS students into their classroom dynamic, without distraction or overburdening during class time.
Methodology
Sample
In an effort to improve the LVS learning experience, this study applied TM techniques (extraction and categorisation) to messages generated by students during their interaction with their instructors. Our sample included 125 LVS courses (from five different colleges within the university) with a total enrolment of 942. A total of 1780 online text messages were sent by students to their instructors during their participation in these LVS courses. Each saved message was associated with a course ID, a user ID, a question, and a date and time stamp.
TM process
As shown in Figure 1, data/TM is a multi-step process. In our case study, we followed a process similar to one proposed by Clementine (Clementine, SPSS Inc., Chicago, Illinois, USA), a leading TM software using Statistical Package for the Social Sciences (SPSS):
1. Identify the text to be mined. We retrieved the online text messages from the database into Excel spreadsheets and conducted pre-processing. In this step, we deleted repetitive entries and analysed the average length of each message. In total, 1289 messages were left for further analysis, with an average of 15 words per message.
2. Mine the text and extract structured data. We created a data stream using Clementine’s TM module.
After that, we initiated the mining process, which applied linguistic methods (extracting, grouping, indexing, etc) to explore and extract key concepts from the SPSS data file.
3. Build concept and category models. After executing the current stream, the TM node launched an interactive workbench session. This session allowed us to create categories, work with extracted concepts from our text data, and explore patterns and clusters. This workbench session also offered us the ability to identify relationships and associations between concepts, based on known patterns.
Research findings
Improving LVS student support
Because one of our aims was to improve user satisfaction by addressing technical problems, we were particularly interested in uncovering information related to
Table 1: Four main categories
Categories | Descriptions
Course content | Concepts, questions, tasks, etc
Class logistics | Test/exam, time, recording, video archives, planning, etc
Technical difficulty | Computer problems; video problems; audio problems; account login problems; time out, cannot see other video streamers, etc
Other | Includes messages difficult to categorise and social statements, such as greetings and emotional responses
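As a rough illustration of the pre-processing step (removing repeated entries and measuring average message length) and of assigning messages to the four categories in Table 1, the following Python sketch may help. It is not the Clementine workflow used in the study: the `Message` fields mirror the saved message attributes described in the Sample section, but the keyword lists, the sample messages and all function names are assumptions made for illustration, and plain keyword matching is a much cruder stand-in for linguistic concept extraction.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    course_id: str
    user_id: str
    text: str
    timestamp: str  # date and time stamp, kept as a plain string here


# Hypothetical keyword lists (an assumption, not taken from the study).
# Categories are checked in this order; the first match wins.
CATEGORY_KEYWORDS = {
    "Technical difficulty": ("sound", "audio", "video", "picture", "login", "technical"),
    "Class logistics": ("exam", "test", "recording", "archives"),
    "Course content": ("concept", "chapter", "explain", "example"),
}


def deduplicate(messages):
    """Drop repeated entries: same course, same user, same normalised text."""
    seen = set()
    unique = []
    for m in messages:
        key = (m.course_id, m.user_id, " ".join(m.text.lower().split()))
        if key not in seen:
            seen.add(key)
            unique.append(m)
    return unique


def average_length(messages):
    """Average number of whitespace-separated words per message."""
    if not messages:
        return 0.0
    return sum(len(m.text.split()) for m in messages) / len(messages)


def categorise(text):
    """Return the first category whose keywords appear in the text, else 'Other'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & set(keywords):
            return category
    return "Other"


# Made-up sample data; the first text is one of the quotes reported in this paper.
SAMPLE_MESSAGES = [
    Message("C1", "u1", "Can the sound on the video be turned up.", "2009-03-02 19:05"),
    Message("C1", "u1", "can the sound on the video be turned up.", "2009-03-02 19:06"),
    Message("C1", "u2", "Hello everyone", "2009-03-02 19:07"),
    Message("C2", "u3", "Will the exam cover chapter 3?", "2009-03-03 10:15"),
]

if __name__ == "__main__":
    unique = deduplicate(SAMPLE_MESSAGES)
    print(len(unique), "unique messages, average length", average_length(unique))
    for m in unique:
        print(categorise(m.text), "|", m.text)
```

Because the first matching category wins, a message that mentions both video and archives would be labelled as a technical difficulty in this sketch; this is one reason automatically generated categories need the kind of manual review and refinement described in this section.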
technical difficulties or issues encountered by the LVS students. Some representative quotes about technical difficulties faced by LVS students, particularly during the first 2 weeks of class, are listed here:
• Prof. __, if you can see this. none of the video streamers can see or hear you.
• Can the sound on the video be turned up.
• I don’t know if you can see this. I have no sound or picture. I couldn’t even login until a few minutes ago.
• I am checking in ... . However, there is no sound.
• I did not check in early due technical issues. When will the video archives be available?
Related to this, we noticed that students were eager to help each other and report solved technical issues:
• Video and sound now up and running
• We had technical issues with video streaming today but it is now working (on my computer).
• I had some technical difficulties, but I got my communication working now and I’m checking in.
In order to capture the full range of the messages and to produce better results, we combined several concept-grouping techniques as we created categories. Among them were concept derivation, concept inclusion and semantic networks (Clementine, 2008). Each works in a slightly different way. Using these automated techniques, a number of categories were identified. After reviewing the generated categories, we made some adjustments by deleting, merging, combining and/or refining, and then we identified four main categories (Table 1).
In general, most of the technical problems were addressed as they appeared. In addition to this, we used the mined data to update our frequently asked questions website, and we upgraded our computer compatibility test, which is an online tool that we developed for students to ensure their technical readiness to take LVS courses. Also, while reviewing the text messages, we identified a frequent request from students asking to enable them
[Figure 3: Page views noting use on particular days of the week, over a semester. x-axis: day of the week (Sunday to Saturday); y-axis: page views (0 to 20 000; 0% to 22.2%)]
[Figure 4: Page views noting hours of use. x-axis: hour of the day (midnight to 9:00 p.m.); y-axis: page views (0 to 9000; 0% to 10.0%)]
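The day-of-week and hour-of-day counts summarised in Figures 3 and 4 come from server log analysis. As a hedged illustration of the underlying idea only, and not of the log analysis tool used in the study, the sketch below counts page views per weekday and per hour from a list of timestamps using the Python standard library. The timestamp format, the sample values and the function names are assumptions; real server logs would first need to be parsed into timestamps.

```python
from collections import Counter
from datetime import datetime

# Fixed English day names, indexed by datetime.weekday() (Monday == 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def page_views_by_day_and_hour(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count page views per day of the week and per hour of the day."""
    by_day = Counter()
    by_hour = Counter()
    for raw in timestamps:
        moment = datetime.strptime(raw, fmt)
        by_day[DAY_NAMES[moment.weekday()]] += 1
        by_hour[moment.hour] += 1
    return by_day, by_hour


def as_percentages(counts):
    """Convert raw counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: 100.0 * value / total for key, value in counts.items()}


# Made-up page-view timestamps for illustration.
SAMPLE_TIMESTAMPS = [
    "2009-03-02 19:05:00",
    "2009-03-02 20:15:00",
    "2009-03-03 19:45:00",
    "2009-03-07 09:30:00",
]

if __name__ == "__main__":
    days, hours = page_views_by_day_and_hour(SAMPLE_TIMESTAMPS)
    print(as_percentages(days))
    print(sorted(hours.items()))
```

Percentages are computed alongside raw counts because the figures report both scales on the vertical axis.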
to expand their viewing window by adding a full-screen option to the video window. As a result of our TM, we were able to build this option into a recent update to the LVS interface.
Finally, to gain a better understanding of video streaming students’ navigation patterns and habits, we analysed the server logs using Sawmill, a powerful, hierarchical Web server log analysis tool. Figures 3 and 4 display statistics showing day and time patterns. The statistics shown in Figures 3 and 4 gave us enough information to identify students’ viewing habits and to ensure adequate technical support during periods of use.
Improving student participation
To our surprise, our findings revealed a very low level of engagement and participation by LVS students. The average number of messages posted by each student was fewer than two. Many LVS students seldom participated in the live class discussion; in fact, some students never posted a single message during the entire semester. After reviewing the number of messages per course against class time, we concluded that this lack of student participation was not related to class time (ie, early morning and Saturday morning classes). This low level of engagement and interaction triggered several actions, which targeted both instructors and students. Assuming that instructors play a critical role in engaging their students, we set out to implement the following actions to facilitate instructors’ work with LVS students:
1. Develop a comprehensive online orientation covering all aspects of LVS course facilitation. The topics included logistics, participation, interaction and collaboration techniques. Under the collaboration and interaction section, we provided several examples of engagement and participation techniques, such as encouraging students to post at least one question and reserving time for answering LVS students’ questions.
2. Provide an online interface for instructors to review archived messages, thus highlighting the need to focus on the quality of messages. Clearly, instructors are aware that LVS students’ participation or engagement is not merely assessed by the number of messages, but rather by the thoughtfulness of the messages. Our interface elucidates this for them.
3. Provide an online interface to track LVS student attendance by automatically capturing login and logout times.
For the students, we developed a separate, student-focused orientation, which explains LVS interface functions and encourages them to participate in class. The orientation also shares some practical tips on ways to actively participate in their LVS courses.
Conclusion and future research
The fast-developing field of educational data and TM is enabling higher education institutions to sift through large amounts of data generated by various learning management systems. These powerful analytical tools are helping administrators to uncover hidden trends and meaningful student learning behaviours (Romero & Ventura, 2007). In our research for this paper, we applied TM techniques to textual data generated from students’ interactions during LVS courses. As a result, we identified some insights and issues associated with students’ technical problems and interaction. Our results support the findings in previous studies (Hung & Zhang, 2008) that data and TM techniques are efficient tools for discovering patterns of data and can be potentially insightful about students’ learning performance.
Indeed, our findings revealed a low level of student participation, which led to several action items intended to improve the overall LVS student learning experience. To assess the effectiveness of these actions, we intend to mine data generated during multiple semesters, including LVS student-to-student chat messages. By exploiting the computational power of data/TM to understand the variables influencing students’ participation and engagement in LVS courses, we anticipate an improvement, both in our students’ overall learning experience and in their satisfaction with this flexible delivery mode.
References
Allison, D. H. & DeBlois, P. B. (2008). Current issues survey report, 2008. EDUCAUSE Quarterly, 31, 2, 14–30.
Ananiadou, S. (2008). National centre for text mining: introduction to tools for researchers. Retrieved February 8, 2009, from http://www.jisc.ac.uk/publications/publications/bpnationalcentrefortextminingv1.aspx
Ba-Omar, H., Petrounias, I. & Anwar, F. (2007). A framework for using web usage mining to personalise e-learning. In Proceedings of the Seventh IEEE International Conference on Advanced Learning Technologies (pp. 937–938). Niigata, Japan.
Black, E. W., Dawson, K. & Priem, J. (2008). Data for free: using LMS activity logs to measure community in online courses. The Internet and Higher Education, 11, 2, 65–70.
Burr, L. & Spennemann, D. H. (2004). Patterns of user behavior in university online forums. Journal of Instructional Technology and Distance Learning, 1, 10. Retrieved February 8, 2009, from http://www.itdl.org/Journal/Oct_04/article01.htm
Campbell, J. P., DeBlois, P. B. & Oblinger, D. G. (2007). Academic analytics: a new tool for a new era. EDUCAUSE Review, 42, 4, 40–42.
Castro, F., Vellido, A., Nebot, À. & Mugica, F. (2007). Applying data mining techniques to e-learning problems. Studies in Computational Intelligence, 62, 183–221.
Clementine (2008). Clementine. Retrieved February 8, 2009, from http://www.spss.com/clementine/
Feldman, R. D. (1995). Knowledge discovery in textual databases (KDT). Paper presented at the First International Conference on Knowledge Discovery and Data Mining (KDD-95), August 20–21, Montreal, Canada.
Hanna, M. (2004). Data mining in the e-learning domain. Campus-Wide Information Systems, 21, 1, 29–34.
Harding, J. A., Shahbaz, M., Srinivas, S. & Kusiak, A. (2006). Data mining in manufacturing: a review. Journal of Manufacturing Science and Engineering, 128, 4, 969–976.
Hung, J. & Zhang, K. (2008). Revealing online learning behaviors and activity patterns and making predictions with data mining techniques in online teaching. MERLOT Journal of Online Learning and Teaching, 4, 4. Retrieved April 6, 2009, from http://jolt.merlot.org/vol4no4/hung_1208.htm
Luan, J. (2002). Data mining and its applications in higher education. New Directions for Institutional Research, 113, 17–36.
Nahm, U. Y. (2004). Text-mining with information extraction (Doctoral dissertation, The University of Texas at Austin, Austin).
Ngai, E. W. T., Xiu, L. & Chau, D. C. K. (2009). Application of data mining techniques in customer relationship management: a literature review and classification. Expert Systems with Applications, 36, 2, 2592–2602.
Norris, D., Baer, L., Leonard, J., Pugliese, L. & Lefrere, P. (2008). Action analytics: measuring and improving performance that matters in higher education. EDUCAUSE Review, 43, 1, 42–44.
Parsad, B. & Lewis, L. (2008). Distance education at degree-granting postsecondary institutions: 2006–07. First look. NCES 2009-044. National Center for Education Statistics. Retrieved February 8, 2009, from http://nces.ed.gov/pubSearch/pubsinfo.asp?pubid=2009044
Perera, D., Kay, J., Koprinska, I., Yacef, K. & Zaiane, O. (2009). Clustering and sequential pattern mining of online collaborative learning data. IEEE Transactions on Knowledge and Data Engineering, 21, 6, 759–772.
Romero, C. & Ventura, S. (2007). Educational data mining: a survey from 1995 to 2005. Expert Systems with Applications, 33, 1, 135–146.
Sun, P. C., Cheng, H. K., Lin, T. C. & Wang, F. S. (2008). A design to promote group learning in e-learning: experiences from the field. Computers & Education, 50, 3, 661–677.
Zaiane, O. R. (2002). Building a recommender agent for e-learning systems. In Proceedings of the International Conference on Computers in Education, December 3–6, 2002 (pp. 55–59). Washington, DC, USA.
Zaiane, O. R. & Luo, J. (2001). Towards evaluating learners’ behavior in a web-based distance learning environment. In Proceedings of the IEEE International Conference on Advanced Learning Technologies, August 6–8, 2001 (pp. 357–360). ICALT, IEEE Computer Society, Washington, DC, USA.