Analyzing data from an adaptive MOOC

Ilia Rushkin, Harvard University

[You’re reading part 4 of a 4-part series on adaptive learning, a deep dive into the technology and approach deployed in HarvardX’s Super-Earths and Life in 2016-17. You can also read Part 1, Part 2, Part 3.]

In October 2016 an experimental HarvardX MOOC “Super-Earths and Life” was launched, in which some of the assessment problems were made adaptive. While normally all students in a MOOC see the same problems on a webpage, here the problems are served to a student one by one, in an individualized sequence produced on the fly by an artificial intelligence algorithm.

This being an experiment, all users were randomly split 50/50 into an experimental group and a control group. The adaptive functionality is turned on only for the experimental group, and its implementation is local and limited: it occurs in only 4 out of 18 homeworks of the course and involves a pool of just 39 problems, along with some instructional materials to help with the more difficult problems. The problems assigned to these homeworks were tagged with difficulty levels (easy, regular and advanced), as well as with learning objectives. Both groups of users can attempt these problems: if you are in the experimental group, some of these problems will be served to you adaptively, and if you are in the control group, the regular-level problems will be the ones you see in the homeworks, while all the easy and advanced ones will be offered as extra materials in subsections after the homeworks.

Since the users are split into control and experimental groups randomly, we can be reasonably certain that at the outset there are no significant differences between these two groups (although yes, we verify it just in case). They have very similar distributions of prior knowledge, demographics, etc. And until they interact with one of the 4 special homeworks, there is no difference in their user experience. Hence, to compare the groups, we subset both of them to include only those students who interacted with any problems in the 4 homeworks.
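To make this concrete, here is a minimal sketch of that subsetting step in pandas. The file names and column names (user_id, group, problem_id) are hypothetical, standing in for whatever the actual event logs contain:

```python
import pandas as pd

# Hypothetical event log: one row per problem interaction.
# Assumed columns: user_id, group ("experimental"/"control"), problem_id.
events = pd.read_csv("problem_events.csv")

# IDs of the 39 problems belonging to the 4 special homeworks (placeholder file).
adaptive_ids = set(pd.read_csv("adaptive_problems.csv")["problem_id"])

# Keep only users, in either group, who attempted at least one of these problems.
touched = events[events["problem_id"].isin(adaptive_ids)]
eligible_users = touched["user_id"].unique()
analysis_set = events[events["user_id"].isin(eligible_users)]

# Group sizes after subsetting, as a sanity check.
print(analysis_set.groupby("group")["user_id"].nunique())
```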

The data that we are collecting from this experiment allows us to examine many research questions, but we can divide them into 4 broad categories:

  1. Performance. Do students who experience adaptivity tend to finish the course with better grades? Is there a difference in the scores they get on other, non-adaptive problems in the course? What about their scores in the pre-test and the post-test - how do these compare with those of the students in the control group? Any difference in the drop-out rates? Etc.
  2. Speed. Do students who experience adaptivity go through the course faster? Is their time on task affected? The number of days they are active in the course? How many problems do they interact with? Etc.
  3. Engagement. Does experiencing adaptivity affect the likelihood that students drop out? Or their activity level in the forum? Or the number of videos they watch? Etc.
  4. Demographics. Could it be that the effects of adaptivity are different for different demographic groups? Perhaps how much it helps depends on your age or education level? Etc.

What do we hope to see in the data?

For the first two categories of research questions, the answer is simple. In the best-case scenario, we would like to see a performance gain among the students who experienced adaptivity (e.g. a boost to the average final grade). Speed, in conjunction with performance, is also important to us. It would certainly be great if it turned out that experimental students achieve the same performance benchmarks as the control students while spending less time in the course and doing fewer problems. That would be a sign of efficiency in learning.

When it comes to engagement and demographics, there are no unambiguous “bad” and “good” research results. There are, however, useful ones. In our future adaptivity work it will be helpful to know how different demographics react to adaptivity and what to expect in the engagement metrics.

What do we see so far?

Since the scope of adaptivity in this pilot project is very limited, we do not expect to see a substantial effect on the students. Instead, we used this pilot course as a training ground for data analysis and put in place all the data machinery. It is now ready to process data from future adaptive courses, where the adaptivity will be on a much greater scale.

But even though the adaptivity data from this pilot is limited so far, can we see in it any hints of what could be going on? Are we able to make any conjectures that could be verified with more data later?

Yes. The most promising finding is about efficiency in learning: we find that the experimental users, who experienced adaptivity, tended to try fewer adaptive problems (especially the easy ones) than the users in the control group. And yet, experimental and control students show no significant difference in overall performance or engagement.

[Figure: engagement]

If anything, the comparison of pre-test and post-test shows that the learning gains were greater for the experimental students. (The figure below illustrates that: in it, p is the two-tailed p-value from the Welch t-test and ES is the effect size measured by Cohen’s d.) It appears that serving problems to students with an adaptive algorithm streamlines the learning process: students do not spend time unnecessarily on those problems that add little or nothing to their level of mastery.
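For readers who want to run this kind of comparison on their own data, here is a minimal sketch of computing those two statistics with numpy and scipy. The function and argument names are placeholders, and the pooled-standard-deviation form of Cohen’s d used here is just one common convention:

```python
import numpy as np
from scipy import stats

def compare_gains(gains_experimental, gains_control):
    """Two-tailed Welch t-test p-value and Cohen's d for two samples of learning gains."""
    # Welch's t-test: equal_var=False drops the equal-variance assumption.
    _, p_value = stats.ttest_ind(gains_experimental, gains_control, equal_var=False)

    # Cohen's d with a pooled standard deviation.
    n1, n2 = len(gains_experimental), len(gains_control)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(gains_experimental, ddof=1) +
                         (n2 - 1) * np.var(gains_control, ddof=1)) / (n1 + n2 - 2))
    d = (np.mean(gains_experimental) - np.mean(gains_control)) / pooled_sd
    return p_value, d
```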

[Figure: pre-post]

Another interesting observation is that among the users who had low scores in the pre-test, the chances of completing the course were higher in the experimental group. Indeed, initially there was no significant difference in pre-test scores of the experimental and the control groups (which makes sense, since the groups are formed randomly). But the figure above shows only the students who interacted with the four special homeworks and reached the post-test, and here we see a significant difference in the pre-tests. In other words, it appears that by serving problems in a way mindful of the student’s current mastery level, adaptivity decreases the drop-out tendency among students who come to the course with little prior knowledge.
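A sketch of how such a conditional comparison might be set up, again with hypothetical column names (group, pretest_score, completed) and an arbitrary cutoff for what counts as a low pre-test score:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical per-student table with columns: group, pretest_score, completed (0/1).
students = pd.read_csv("students.csv")

# "Low pre-test" defined here by an arbitrary threshold; the real cutoff is an analysis choice.
low = students[students["pretest_score"] < 0.5]

# 2x2 table of group vs. completion, tested with a chi-square test of independence.
table = pd.crosstab(low["group"], low["completed"])
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi-square p = {p_value:.3f}")
```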

What next?

In the future, we plan to have more adaptive MOOCs, in which the adaptivity will be both broader (not limited to a small number of assessment modules) and deeper (bigger pool of problems for each learning objective, from which the adaptive algorithm can pick), providing us with much richer data. Moreover, there are many flavors of adaptivity. For instance: should we let the algorithm serve at any time any problem tagged with a particular learning objective, or should we create a dependency structure such as “Content X should be served only if the student has already seen content Y”? What is the effect of increasing the granularity of learning objectives (surely, at some point we should see diminishing returns)? What is the effect of increasing the complexity in the structure of learning objectives (they can be organized by topic only, or by topic and cross-cutting concepts)? What is the optimal number of difficulty levels in problems? And so on. Many intricate parameters in the adaptive algorithm can be tuned, and many different policy decisions can be made. Here, we compared the effect of having some adaptivity vs. having none. In the courses to come, we will be able to start searching for an adaptivity configuration best adapted (pun intended) for MOOCs.


Disclaimer: Responsibility for the information and views expressed in this blog lies entirely with the author; they are NOT endorsed in any way by the organizations mentioned in the blog.