An R Function to generate proportions and logistic regression models of correct answers based on topic, learning levels, and question format
Felix E. Rivera-Mariani, PhD
March 3, 2017
Summary
This report shares a simple R function, prop_correct_logit, which generates tables of the proportions of correct answers in formative or summative assessments by topic, Bloom’s taxonomy level, knowledge dimension, and question format. Rather than writing a separate line of code for each proportion table, you can generate all four tables with a single call to this function. In addition, the function fits a logistic regression model to determine which topic, Bloom’s taxonomy level, knowledge dimension, or question format carries the most weight in predicting a correct answer. Instructions on how to collect and organize the data are given below.
Note: Find the complete RMarkdown for this report here.
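To illustrate the idea of getting all four proportion tables and a logistic model from one short piece of code, here is a minimal base-R sketch. It uses the example answers shown in Table 1 below, already in tidy form. The helper prop_tables() and the object names are hypothetical and are not the author's prop_correct_logit. Because the six example questions are too few to separate the four descriptors, the logistic model here uses topic alone.

```r
# Example data from Table 1 in tidy (long) form: 3 students x 6 questions.
q_topic  <- c("bio_hierch", "sci_meth", "sci_meth", "bio_hierch", "bio_hierch", "bio_hierch")
q_bt     <- c("RMB", "UND", "APPL", "RMB", "ANLZ", "ANLZ")
q_kd     <- c("FACT", "FACT", "CNCP", "FACT", "CNCP", "CNCP")
q_format <- c("SA", "SA", "T/F", "SA", "SA", "SA")

tidy <- data.frame(
  question = rep(1:6, times = 3),
  topic    = rep(q_topic, times = 3),
  bt       = rep(q_bt, times = 3),
  kd       = rep(q_kd, times = 3),
  format   = rep(q_format, times = 3),
  student  = rep(c("student1", "student2", "student3"), each = 6),
  answer   = c(1, 0, 0, 0, 0, 0,   # student1
               0, 0, 0, 1, 1, 1,   # student2
               1, 1, 1, 1, 1, 1),  # student3
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT the author's prop_correct_logit): returns, for every
# grouping variable, a table of row percentages (column "0" = incorrect,
# column "1" = correct), so each row sums to 100.
prop_tables <- function(data, vars = c("topic", "bt", "kd", "format")) {
  tabs <- lapply(vars, function(v) {
    100 * prop.table(table(data[[v]], data$answer), margin = 1)
  })
  setNames(tabs, vars)
}

props <- prop_tables(tidy)
round(props$format, 1)

# Logistic regression of a correct answer on topic. With only six questions the
# four descriptors are confounded (a model with all of them has aliased, NA
# coefficients), so this toy example uses topic alone.
fit <- glm(answer ~ topic, data = tidy, family = binomial)
# exp(intercept): odds of a correct answer for bio_hierch (about 2);
# exp(topicsci_meth): odds ratio of sci_meth vs. bio_hierch (about 0.25)
exp(coef(fit))
```

In this toy example, sci_meth questions have one quarter of the odds of a correct answer compared with bio_hierch questions (2 of 6 correct versus 8 of 12).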
Outline
1) R packages needed
2) How to collect the data
3) How to organize the data
4) The prop_correct_logit function
5) Example output
6) Discussion
7) R session information that generated this report
R packages needed
1) pander: R package to easily render tables in R and RMarkdown documents.
2) tidyr: R package with functions to manipulate data, including converting a wide dataset into a tidy dataset. See the section on organizing the data below for more information about tidy datasets.
3) A package to import the data, such as reader, readxl, or xlsx, depending on the format in which your data was saved.
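As a minimal sketch of the import step, base R's read.csv() can read data laid out like Table 1 below. The text= argument only keeps the example self-contained. With real data you would pass a file path instead (the file name in the comment is hypothetical), or use an import package such as readxl (readxl::read_excel()) for Excel files.

```r
# Wide-format quiz data in the Table 1 layout, supplied inline for the example.
# With a real file: quiz <- read.csv("quiz_answers.csv")  # hypothetical file name
quiz_csv <- "question,topic,bt,kd,format,student1,student2,student3
1,bio_hierch,RMB,FACT,SA,1,0,1
2,sci_meth,UND,FACT,SA,0,0,1
3,sci_meth,APPL,CNCP,T/F,0,0,1
4,bio_hierch,RMB,FACT,SA,0,1,1
5,bio_hierch,ANLZ,CNCP,SA,0,1,1
6,bio_hierch,ANLZ,CNCP,SA,0,1,1"

# stringsAsFactors = FALSE keeps the descriptor columns as plain character
quiz <- read.csv(text = quiz_csv, stringsAsFactors = FALSE)
str(quiz)
```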
How to collect the data
The main goal of prop_correct_logit is to facilitate the analysis of the proportions (in percentages) of correct and incorrect answers in assessments. For this purpose, the answers must be recorded in a dichotomous format, for example correct/incorrect or 1/0. Table 1 shows an example of data collected for analysis with the prop_correct_logit function.
Table 1: Example 1 for collecting data to use in the prop_correct_logit function
question  topic       bt    kd    format  student1  student2  student3
1         bio_hierch  RMB   FACT  SA      1         0         1
2         sci_meth    UND   FACT  SA      0         0         1
3         sci_meth    APPL  CNCP  T/F     0         0         1
4         bio_hierch  RMB   FACT  SA      0         1         1
5         bio_hierch  ANLZ  CNCP  SA      0         1         1
6         bio_hierch  ANLZ  CNCP  SA      0         1         1
In the example above, which corresponds to students’ responses in in-class quizzes, the first five columns contain information about each question: question number, topic of the question, Bloom’s taxonomy level (bt; RMB = remembering, UND = understanding, APPL = applying, ANLZ = analyzing), knowledge dimension (kd; FACT = factual, CNCP = conceptual), and question format (SA = short answer, T/F = true/false). The remaining columns correspond to each student’s responses in the quizzes: 1 = correct, 0 = incorrect.
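If answers were recorded as labels such as correct/incorrect rather than 1/0, they need converting before analysis. The helper to_binary() below is a hypothetical sketch, not part of prop_correct_logit. It normalizes case and stray spaces, then stops on any unexpected label instead of silently coding it as incorrect.

```r
# Hypothetical helper: convert "correct"/"incorrect" labels to 1/0 integers,
# stopping if any other value (including NA) is present.
to_binary <- function(x) {
  clean <- tolower(trimws(as.character(x)))
  bad <- setdiff(unique(clean), c("correct", "incorrect"))
  if (length(bad) > 0) {
    stop("Unexpected answer labels: ", paste(bad, collapse = ", "))
  }
  as.integer(clean == "correct")
}

to_binary(c("correct", "incorrect", "Correct ", "INCORRECT"))
```

Applied to each student column, for example quiz[student_cols] <- lapply(quiz[student_cols], to_binary), it yields the 1/0 layout of Table 1 (student_cols is an illustrative vector of column names).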
How to organize the data
The data to be analyzed with the function must be in a tidy format. Find more information about tidy datasets here. Briefly, a tidy dataset has the following main features:
• each column is a variable
• each row is an observation
For example, the dataset shown in Table 1 is not tidy because student1, student2, and student3 are values of a single variable; they should be stacked in one column (student), with the corresponding responses in another column (answer). Table 2 shows the tidy dataset generated from the data in Table 1.
Table 2: Tidy dataset to use in the prop_correct_logit function (first six rows shown)
question  topic       bt    kd    format  student   answer
1         bio_hierch  RMB   FACT  SA      student1  1
2         sci_meth    UND   FACT  SA      student1  0
3         sci_meth    APPL  CNCP  T/F     student1  0
4         bio_hierch  RMB   FACT  SA      student1  0
5         bio_hierch  ANLZ  CNCP  SA      student1  0
6         bio_hierch  ANLZ  CNCP  SA      student1  0
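The conversion from Table 1 to Table 2 is what tidyr's gather(wide, key = "student", value = "answer", student1:student3) does, or pivot_longer(wide, cols = student1:student3, names_to = "student", values_to = "answer") in newer tidyr versions. The sketch below performs the same conversion with base R's reshape() so that it runs without extra packages. The object names wide and tidy are illustrative. It ends by checking the second tidy rule: each student-question observation appears only once.

```r
# Wide data laid out like Table 1 (one column per student).
wide <- data.frame(
  question = 1:6,
  topic    = c("bio_hierch", "sci_meth", "sci_meth", "bio_hierch", "bio_hierch", "bio_hierch"),
  bt       = c("RMB", "UND", "APPL", "RMB", "ANLZ", "ANLZ"),
  kd       = c("FACT", "FACT", "CNCP", "FACT", "CNCP", "CNCP"),
  format   = c("SA", "SA", "T/F", "SA", "SA", "SA"),
  student1 = c(1, 0, 0, 0, 0, 0),
  student2 = c(0, 0, 0, 1, 1, 1),
  student3 = c(1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

student_cols <- c("student1", "student2", "student3")

# Stack the student columns: each (question, student) pair becomes one row.
tidy <- reshape(
  wide,
  direction = "long",
  varying   = student_cols,  # columns to stack
  v.names   = "answer",      # name of the new value column (1/0)
  timevar   = "student",     # name of the new key column
  times     = student_cols,  # values stored in the key column
  idvar     = "question"     # identifies the rows of the wide data
)
rownames(tidy) <- NULL       # drop reshape's row names such as "1.student1"

# Tidy check: every student-question observation appears exactly once.
stopifnot(anyDuplicated(tidy[c("student", "question")]) == 0)
head(tidy)
```

The result has 18 rows (6 questions x 3 students), and head(tidy) shows the same six rows as Table 2.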
The prop_correct_logit function
Below is the prop_correct_logit function, with each code line commented to its right.
setwd("C:/Users/Felix/Dropbox/DataScience/Projects/teaching/R_Functions")  # author's working directory; change to your own
prop_correct_logit