Entering the Big Data Waves: Actionable Insights for Online Learning

by Yigal Rosen, Senior Research Scientist at VPAL-Research, Harvard University


The recent rise in popularity of massive open online courses (MOOCs), distributed on platforms such as edX, Udacity, and Coursera, has made it possible for anyone with an Internet connection to enroll in free, university-level courses. Harvard University has contributed to MOOC offerings with HarvardX, which aims to provide the highest quality courses representing Harvard’s academic diversity to serious learners everywhere. By offering open and primarily no-cost online courses, HarvardX has reached 3.5 million global learners (i.e., registrants) with over 100 courses. This “digital wave” of data from HarvardX learners’ online activities provides a powerful platform for conducting big data analytics and running experiments that could transform teaching, learning, and assessment in a variety of educational settings.

Let’s take a look at a new wave of data collected from learners’ activities in HarvardX online courses on Tuesday, October 13, 2015:

HarvardX learners

20,121 unique learners from 167 countries; 7,487 learners from the U.S. (from all 50 states); 12,634 learners from outside the U.S.; 4,779 learners from developing countries


1,595,550 learner-generated clicks
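
As a quick sanity check on the figures above, the following minimal Python sketch confirms that the U.S. and non-U.S. counts add up to the total and derives a few shares. The variable names and the `share` helper are illustrative only, and the clicks-per-learner figure assumes that all reported clicks came from the same 20,121 learners, which the numbers above suggest but do not state outright.

```python
# Figures as reported for HarvardX activity on October 13, 2015.
unique_learners = 20_121
us_learners = 7_487
non_us_learners = 12_634
developing_country_learners = 4_779
clicks = 1_595_550


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The U.S. and non-U.S. counts should partition the unique learners.
counts_are_consistent = us_learners + non_us_learners == unique_learners

us_share = share(us_learners, unique_learners)
non_us_share = share(non_us_learners, unique_learners)
developing_share = share(developing_country_learners, unique_learners)

# Assumption: every click was generated by one of the unique learners above.
clicks_per_learner = round(clicks / unique_learners, 1)

print(f"Counts consistent: {counts_are_consistent}")
print(f"U.S. share: {us_share}%")
print(f"Non-U.S. share: {non_us_share}%")
print(f"Developing-country share: {developing_share}%")
print(f"Mean clicks per learner: {clicks_per_learner}")
```

Under these assumptions, a little over a third of the day's learners were in the U.S., and each learner generated roughly 79 clicks on average.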

The course Super-Earths and Life was launched on October 13, 2015. This is a six-week course in astrobiology led by Prof. Dimitar Sasselov, Phillips Professor of Astronomy at Harvard, that describes how exoplanets are discovered, how life evolved on Earth, and how we may one day discover life around other stars. The course is intended for individuals who are interested in science and have roughly a high-school-level understanding of it. A closer look at the digital wave from October 13 reveals interesting patterns in the problem-solving space that could be explored further.

[Figure: digital wave]

The data itself, however, is only a starting point: it is necessary but not sufficient to transform education. The ultimate goal of data collection and analysis is to provide insight and inform decisions. Thus, the question is how to gain truly actionable insights from the big data generated by HarvardX and other major online learning initiatives. This is a major challenge currently being explored by the Vice Provost for Advances in Learning (VPAL) Research Team. Drawing on a broad range of expertise in cognitive psychology, education, data science, instructional design, and computer science, our team is focused on advancing the science of online and residential learning at Harvard and beyond. By analyzing data from click-streams and open-ended forum discussions, and by conducting focused experiments across multiple courses in partnership with Harvard faculty members and course teams, our team is tasked with making generalizable inferences that will ultimately improve learning outcomes.

The beauty of data and the excitement of new findings are fascinating and invigorating. To make use of this powerful digital data wave and generate useful conclusions about best practices in teaching, learning, and assessment, researchers now have a wealth of new tools and methods at their disposal. For instance, trends in HarvardX and MITx over the two-year span of 2012-2014 were reported using comprehensive survey instruments, and course participation pathways were charted using network analysis. In addition, computational tools were developed to help instructional teams uncover themes and patterns in what MOOC students write in forums, assignments, and surveys. Various factors that affect certification attainment rates and usage patterns were explored, experiments with content release and learner support strategies were conducted, and further directions for research were proposed.

Our research agenda builds on lessons learned from this line of studies and includes a broad range of areas for further exploration. Examples include:

1. How do learners acquire knowledge, skills, and abilities across HarvardX courses? What are the characteristics of “good” learning practices in MOOC environments across subject areas, instructional settings, and learner profiles? When the various data streams are combined, the accumulated information can potentially provide increasingly reliable and valid evidence about what learners know and can do across multiple contexts.

2. What are the factors that affect learners’ motivation and persistence? MOOCs employ a variety of components to engage students in learning (e.g., videos, performance tasks, discussion forums, assessments). Further studies should shed light on optimal ways of creating meaningful learning experiences. Enabling quality lifelong learning for all has been a defining goal for education for many years, emphasizing the need for learning to take place over the whole lifespan and across the different spheres that make up our lives.

3. What technology tools best support formative assessment and actionable feedback? Developing novel ways of assessing knowledge and real-world skills can dramatically change how learning takes place in MOOCs. This involves high-quality, ongoing, unobtrusive assessments embedded in various stages of learning that can be aggregated to inform a learner’s evolving learning trajectory and also aggregated across learners to inform higher-level decisions.

4. How can educators create more personalized learning experiences that acknowledge diversity in learner characteristics? Online learning environments now often make use of adaptive technologies whose purpose is to “adapt” assignments to the individual learner based on his or her knowledge, skills, and attributes. Further research should explore how such adaptive technologies can best be embedded into learning experiences. As education systems continue to diversify, offering a greater set of educational opportunities and engaging with a broader range of learners than in the past, the transition to personalized learning can make MOOCs even more attractive. Providing automated recommendations for learners’ next activities is one promising direction.

5. How can emerging online resources be leveraged and blended with residential learning? The level of success in implementing blended courses suggests that using MOOC resources in traditional post-secondary institutions holds promise for expanding curricular offerings.

This new generation of online learning experiences has the promise to level inequalities in the educational model, advance the state of the art in pedagogy, and further the learning sciences and assessment techniques. In our Research Blog series, VPAL Research team members, colleagues from Harvard and MIT, and researchers from other institutions will share the most recent insights, highlighting conceptual advancements and practical implications for both on-campus and online learning.

Welcome to the VPAL Research blog series!



Disclaimer: Responsibility for the information and views expressed in this blog lies entirely with the author; they are NOT endorsed in any way by the organizations mentioned in the blog.