Big(ger) Data as Better Data in Open Distance Learning

International Review of Research in Open and D istributed Learnin g
Volum e 16, Num ber 1

February – 2015

Big(ger) Data as Better Data in Open Distance
Learning

Paul Prinsloo, Elizabeth Archer, Glen Barnes, Yuraisha Chetty and Dion van Zyl
University of South Africa

Abstract
In the context of the hype, prom ise and perils of Big Data and the currently dom inant paradigm of
data-driven decision-m akin g, it is im portant to critically engage with the potential of Big Data for
higher education. We do not question the potential of Big Data, but we do raise a num ber of
issues, and present a num ber of theses to be seriously considered in realisin g this potential.
The Un iversity of South Africa (Unisa) is one of the m ega ODL institutions in the world with m ore
than 360 ,0 0 0 students an d a range of courses and program m es. Unisa already has access to a
staggering am ount of student data, hosted in disparate sources, and govern ed by different
processes. As the university m oves to m ainstream in g online learn ing, the am ount of and need for
analyses of data are increasin g, raising im portan t questions regarding our assum ptions,

understanding, data sources, system s and processes.
This article presents a descriptive case study of the current state of student data at Unisa, as well
as explores the im pact of existing data sources and analytic approaches. From the analysis it is
clear that in order for big(ger) data to be better data, a num ber of issues n eed to be addressed.
The article concludes by presenting a num ber of theses that should form the basis for the
im perative to optim ise the harvesting, analysis and use of student data.
Ke yw o rd s : Big Data, learning analytics, student success

284

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

Introduction
Technology is neither good nor bad; nor is it n eutral...technology’s
interaction with the social ecology is such that technical
developm ents frequently have en vironm ental, social, and hum an
consequences that go far beyond the im m ediate purposes of the
technical devices and practices them selves.
Melvin Kranzberg (198 6, p. 545)

The view that Big Data is neither in herently good nor bad, increasingly finds its way into the
current discourses in higher education. There is therefore a need for critical scholarly engagem ent
am idst claim s that “Big Data represents a paradigm shift in the ways we understand and study
our world” (Eynon , 20 13, p. 237). Much of the discourses regarding Big Data in higher education
focus on increasin g efficiency and cost-effectiven ess (Altbach, Reisberg & Rum bley, 20 0 9),
am idst concerns regardin g privacy, surveillance, the n ature of evidence in education, and so forth
(Biesta, 20 0 7, 20 10 ; Eynon , 20 13; Prinsloo & Slade, 20 13; Wagner & Ice, 20 12). While Big Data
has been lauded as “chang(ing) everythin g” (Wagner & Ice, 20 12), and “revolutionis(in g)
learnin g” (Van Rijm enam , 20 13), boyd and Crawford (20 12, 20 13) question , quite rightfully,
whether Big Data is an unqualified good.
The potential, perils, and the harvestin g and analysis of Big Data in general, and in higher
education in particular are, therefore, still relatively unexplored (Lyon, 20 14; Siem ens & Long,
20 11). There is a need to establish a com m on language (Van Barneveld, Arnold, & Cam pbell,
20 12) as well as resolve a n um ber of conceptual, ethical and practical m atters (Andrejevic, 20 14;
boyd & Crawford, 20 12; Bryant & Raja, 20 14; Clow, 20 13a, 20 13b, 20 13c; Morozov, 20 13a, 20 13b;
Mayer-Schön berger, 20 0 9; Mayer-Schönberger & Cukier, 20 13; Puschm an n & Burgess, 20 14;
Richards & Kin g, 20 13; Slade & Prinsloo, 20 13; Siem ens, 20 13).
While higher education institutions always had access to relatively large data sets and tools for
analysis, there is an increasing am ount of digital student data that can be harvested and analysed
(Swain , 20 13), as well as increased technological and analytical capabilities (Wishon & Rom e,

20 12). Describing learnin g analytics as the “new black” (Booth, 20 12), and student data as the
“new oil” (Watters, 20 13) m ay resem ble “techno-solutionism ” (Morozov, 20 13b) and a certain
“techno-rom anticism ” in education (Selwyn, 20 14). Despite various claim s regarding the success
of learnin g analytics to im prove student success and retention (e.g., Arnold, 20 10 ; Clow 20 13a,
20 13b, 20 13c), Watters (20 13) warns that “the claim s about big data and education are incredibly
bold, and as of yet, m ostly unproven” (par. 17). Without ascribin g to a technological pessim ism ,
we have to critically explore the current belief in higher education that harvestin g and analysing
increasingly bigger data sets, will, n ecessarily im prove student success and retention (boyd &
Crawford, 20 12; Briggs, 20 14; Epling, Tim m ons & Wharrad, 20 0 3).

This work is licensed under a Creative Commons Attribution 4.0 International License.

285

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

This article does not question the potential of big(ger) data to be better data but would rather
present a num ber of theses that m ay ensure that big(ger) data will result in a m ore thorough
understanding of the com plexities of student success and retention , and m ore appropriate,

tim eous and effective institutional support.

Method and Context
This descriptive and interpretive case study explores the proposition of big(ger) data within the
context of a m ega open, distance learnin g institution (ODL), the University of South Africa
(Unisa). With m ore than 360 ,0 0 0 students registered for a variety of qualifications (m ore than
1,20 0 ) and selecting from a range of close to 3,0 0 0 courses– the size of available data m ay already
qualify as ‘big, ’ though we acknowledge that com pared to definitions of Big Data (e.g., Kitchen ,
20 13) the data harvested, analysed and used by Unisa currently falls short.
As a descriptive case study, the issues and theses raised do not claim to be original, nor
generalisable to other higher education institutions, but rather linked to a num ber of theoretical
propositions or theses (Yin, 20 0 9). The value of descriptive case studies lies in its potential to
extrapolate from a specific phenom enon a num ber of abstract interpretations of data and
propositions for theoretical developm ent (Mills, Eurepos & Wiebe, 20 10 ). As an interpretive case
study we attem pt to not on ly describe a specific case of the use of student data, but interpret the
case to m ove to a tem porary conceptual fram ework, propositions or theses (Thom as, 20 11). Our
aim is therefore to advance phron esis or practical knowledge.
As Unisa increasin gly m oves into digitised and onlin e learning, the am ount of data available, as
well as increased analytical capability provides am ple foundation for an intensification of data
harvesting and analysis efforts. Som e of the question s integral to our exploration relate to how

optim ally we use data that we already have access to and whether m ore and/ or different data will
support our endeavours in finding the Holy Grail of a unified theory of student engagem ent,
retention and success. How do we distinguish between signals and noise (Silver, 20 12)? We
accept that harvesting m ore and different data m ay hold potential, but if we do not thin k critically
about institutional support and operational integration with regard to the harvesting and analysis
of data, we m ay never realise the potential of bigger data.
All of the authors except one, who is a researcher in open distance learn ing, are involved in
institutional research at Unisa, with the m andate to provide high quality, relevant and tim ely
inform ation, analysis and institutional research, m aking the institution m ore intelligible to itself.
As such, the question regarding Big Data is central to the daily praxis of the researchers.
The authors en gaged with literature and the current status of student data at Unisa in a
system atic way over the course of six m onths. Several m eetin gs were held, and the analysis in this
This work is licensed under a Creative Commons Attribution 4.0 International License.

286

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

article resulted from the field notes from these m eetings. Construct validity was ensured by

engagin g with m ultiple sources of evidence. The team confirm ed internal validity by explanation
buildin g and addressing rival explanations as suggested by Yin (20 0 9). The resulting theses were
com pared with the literature and available evidence in the field.
This paper em ploys the insights, experience and thoughts of these researchers engaging with the
often undocum ented realities of engagin g with big(ger) data within a m ega open, distance
learnin g institution . The aim of the paper is to m ove beyond the discussion of whether or not
big(ger) data is better data towards the m ore practical questions of: in order for big(ger) data to
be better data, what n eeds to be in place?

Big Data
Big Data refer to “the capacity to search, aggregate and cross-reference large data sets” (boyd &
Crawford, 20 12, p. 663) an d should be explored not only for its potential but also to question its
capacities, its socio-political consequences and the need for critique (Lyon , 20 14). “Big Data is, in
m any ways, a poor term ” (boyd & Crawford, 20 12, p. 1) and increasingly refers to m etadata or
“data about data” (Lyon, 20 14, p. 3). Rob Kitchen (in Lyon, 20 14, p. 5) describes Big Data as
having the followin g characteristics:
huge volum e, consisting of terabytes or petabytes of data; high
velocity, being created in or near real tim e; extensive variety,
both structured and unstructured; exhaustive in scope, strivin g
to capture entire populations of system s; fine-grain ed resolution,

aim in g

at

m axim um

detail,

while

bein g

in dexical

in

identification; relational with com m on fields that enable the
conjoining of different

data-sets; flexible, with


traits of

extensionality (easily adding new fields) and scalability (the
potential to expand rapidly)
While m ost of the current discourses em phasise the increasin g am ount of data, the ‘real’ value
(and peril) in Big Data lies in its networked and relational nature (Baum an & Lyon, 20 13; boyd &
Crawford, 20 12; Marwick, 20 14; Mayer-Schön berger & Cukier, 20 13; Solove, 20 0 1) with “at least
three sign ificant actors in this dram a, governm ent agencies, private corporations and, albeit
unwittin gly, ordinary users” (Lyon, 20 14, p. 3). “It is the kind of data that encourages the practice
of apophenia: seein g patterns where none actually exist, sim ply because m assive quantities of
data can offer connections that radiate in all directions” (boyd & Crawford, 20 12, p. 2).
In the context of Big Data, there is talk of the “age of analytics” (Tene & Polonetsky, 20 12, p.1),
and increasingly, the “algorithm ic turn”, “the algorithm as institution” (Napoli, 20 13, p. 1), the
This work is licensed under a Creative Commons Attribution 4.0 International License.

287

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl


“threat of algocracy” (Danaher, 20 14) and “algorithm ic regulation ” (Morozov, 20 13a, par.15). In
these instances, algorithm s have regulative, norm ative, and cultural-cogn itive dim ensions in the
intersection between algorithm and institution where code becom es law (Napoli, 20 13, referrin g
to Lessig, 20 0 6). A num ber of authors interrogate the potential of Big Data through the lens of
“societies of control” (Deleuze, 1992, p. 4) (also see H enm an , 20 0 4). Big Data an d its algorithm s
resem ble a possible “gn oseological turning point” in our understanding of knowledge,
inform ation and faculties of learning where bureaucracies increasingly aspire to transform and
reduce “ontological entities, individuals, to standardized ones through form al classification” into
algorithm s and calculable processes (Totaro & Ninno, 20 14, p. 29).
A num ber of authors (boyd & Crawford, 20 13; Lyon, 20 14; Richards & King, 20 13) therefore posit
som e provocations regarding Big Data that dem and critical reflection. The increasing reliance on
Big Data questions our traditional assum ptions about knowledge in the context of Big Data’s
claim to “objectivity and accuracy [which] are m isleading.” We also need to realise that “not all
data are equivalent” (boyd & Crawford, 20 13, pp. 3-12). There is also the potential that Big Data
will create new divides an d be em ployed to perpetuate and increase existin g inequalities and
injustices (Andrejevic, 20 14; boyd & Crawford, 20 13; Couldry, 20 14; Lyon , 20 14; Richard & Kin g,
20 13).
There are also claim s that Big Data “is about w hat, n ot w hy . We don’t always n eed to know the
cause of the phenom enon; rather, we can let data speak for itself” (Mayer-Schön berger & Cukier,

20 13, p. 14). And herein lays the dilem m a. Data cannot speak for itself. boyd and Crawford
(20 12) refer to this assum ption as “an arrogant undercurrent in m any Big Data debates” (p. 4)
and Gitelm an (20 13) states that raw data “is an oxym oron.”
Am idst the hype and relative current scarcity of evidence regardin g the im pact of learning
analytics on increasing the effectiveness of teaching an d learning (Altbach, Reisberg & Rum bley,
20 0 9; Clow, 20 13a; Watters, 20 13), it is necessary to problem atise the relationship between Big
Data and education . Selwyn (20 14) rem arks that educational technology “needs to be understood
as a knot of social, political, econom ic and cultural agendas that is riddled with com plications,
contradictions and con flicts” (p. 6). There are also concerns about education’s current
preoccupation with evidence-based teachin g and learning (Biesta, 20 0 7, 20 10 ).
In the following section we therefore attem pt to chart the relationship of learn in g analytics with
academ ic analytics in the context of the bigger picture of data in education and Big Data.

Learning Analytics, Academic Analytics and Big Data
This article is not prim arily concerned with the distinctions, scope and defin itions of the different
term inology used in the context of Big Data in higher education such as business, academ ic and
learnin g analytics (see for exam ple Buckin gham Shum & Ferguson, 20 12; Cam pbell, DeBlois, &
Oblin ger, 20 0 7; Clow, 20 13a; Siem ens & Lon g, 20 11; Van Barneveld et al., 20 12).

This work is licensed under a Creative Commons Attribution 4.0 International License.


288

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

We therefore adopt the joint use (not definition) of learning and academ ic analytics (as proposed
by Siem ens & Lon g, 20 11). Learning analytics concern s itself with how students learn, integratin g
m any aspects of students’ transactions with am on gst others, the learner m anagem ent system ,
while areas such as adm ission, course success, graduation, em ploym ent and citizenship are better
encom passed under the broader definition of academ ic analytics (see Siem ens & Long, 20 11). The
notion of Big Data incorporates all sorts of electronic interactions even beyond the higher
education environm ent, incorporatin g different elem ents of students’ digital identities and
personal learning environ m ents (PLEs). Interestingly, as various sources of inform ation start to
resem ble “electron ic collages” and an “elaborate lattice of inform ation n etworkin g” (Solove, 20 0 4,
p. 3), learn ing analytics m ay potentially overlap with academ ic analytics, the available data in the
context of higher education , and data available ‘outside’ the strict boundaries of higher education
(see Diagram 1).
D iagram 1: Mappin g Learning and Academ ic Analytics in the Context of Big Data

It is crucial to delim it the scope of this article’s focus on the potential of big(ger) data as better
data as focusing only on the data available in , harvested and analysed by higher education
institutions. This delim itation is, however, som ewhat problem atic, as the lattices of inform ation
networkin g (Solove, 20 0 4) do not necessarily care about the (relatively) artificial boundaries
between data ‘in’ higher education and data ‘outside’ of higher education. There is evidence that
the nature of these boun daries m ay, actually, becom e increasin gly artificial as student data
becom es a potential incom e stream for institutions, and secondly, as student learning m oves
This work is licensed under a Creative Commons Attribution 4.0 International License.

289

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

beyond the strict confin es of institutional learn ing m an agem ent system s (LMS) (Prinsloo & Slade,
20 13) (also see Bennet, 20 0 1; Giroux, 20 14; Lyon, 20 14; Marx & Muschert, 20 0 7).

An Overview of the Current State of Student Data at Unisa
The current state of the harvestin g and use of student data at Unisa should be understood in
term s of the institution’s understanding of student retention and success (the student journey or
walk), and secondly in the context of existing data sources and analytical approaches. The aim of
this section is therefore not to offer im m ediate and possible solutions to som e of the challen ges
experienced, but to contextualize and highlight the current state.

Student Data and the Student Journey
Unisa understands and predicts student retention, dropout and student success based on a sociocritical understanding of student success as described by Subotzky and Prinsloo (20 11). This
m odel draws attention to five student data elem ents, which can be harvested to support
institutional plan nin g an d decision-m aking and to determ ine possible student support
interventions. These include adm ission, learnin g activity, course success, graduation, and
em ploym ent and citizenship. Data, which pertains to learning activities is perhaps the one area
where learnin g analytics finds it hom e given that it is concerned with how students learn. The
other areas (adm ission, course success, graduation, and em ploym ent and citizenship) are better
encom passed under the broader defin ition of academ ic analytics (see Siem ens & Long, 20 11).

Admissions and Registrations
A variety of dem ographic data is captured at registration about the student, including attributes
such as gender, ethnicity, age and educational background, which can be used by researchers to
create student profiles lin ked to success indicators such as exam success and graduation, as well
as learnin g analytics data, which are available on the learnin g m anagem ent system (LMS). The
integrity of the data, is of course som ethin g that n eeds to be regularly m onitored. There are
im portant profiling elem en ts, which are not currently available in the databases, such as livin g
standard m easures and access to ICTs – these are instead sourced through research surveys
involving sam ples of studen ts and not the entire studen t population .

Learning Activities
This aspect of the student journey includes, but is not lim ited to, downloadin g study m aterials
and learning objects, preparin g and subm ittin g assign m ents, engagin g with e-tutors and em entors, en gaging with a range of learnin g facilitators via the LMS through discussions groups
and blogs, uploading assign m ents as they prepare for exam inations and digitised learnin g activity
indicators such as tutorial attendance. The Un isa LMS is a key data source, which can be used for
harvesting and m aking sen se of student data, and discussions are underway to m ake relevant data
This work is licensed under a Creative Commons Attribution 4.0 International License.

290

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

available for research and analysis purposes. It is im portant to note that while m ost of the above
data can be m ade available, the data currently reside in different system s, which are not
integrated. This presents a challenge and lengthens the tim e of gainin g access to relevant data.
Considering the fact that m uch of students’ learn in g and learning activities m ay actually take
place outside of the institutional LMS in digital an d non-digital personal learning networks
(PLNs) or personal learnin g environm ents (PLEs), this poses som e additional challenges to the
harvesting and analysis of data, on a num ber of levels (e.g., interoperability, appropriateness of
data, etc.). Slade and Prin sloo (20 13) also recognised the potential loss of gaining a “holistic
picture of students’ lifeworlds” (p. 1515) due to the “inability to track activity outside of an
institution” (also see Prinsloo & Slade, 20 14).

Course Success
Course success (as m easured by passin g the exam ination) rem ains but one elem ent of m easured
success (Subotzky & Prinsloo, 20 11). The exam ination process is closely m onitored and reported
on, covering aspects of exam adm ission , exam absence, and success in term s of num ber of
students that passed relative to the num ber that sat for the exam ination . Additional inform ation
is also available on deferred exam ination statistics. All these data provide a com prehensive view
of the exam ination event. These data can also be extended to com pute other cohort aspects of
success including exam ination passes relative to enrolm ents and intake. A num ber of reportin g
repositories are available to academ ics for the dissem in ation of exam ination statistics.
Recent engagem ent in this area suggests that attrition during the exam ination event is relatively
sm all and focused attention is now m ovin g to retention prior to the exam ination . The
com bination of academ ic and real-tim e learning analytics will allow us to get a sense of students’
losing or changing m om entum as they progress, whether by analysin g their engagem ent in online
forum s, and/ or subm ission of assign m ents.

Retention
At present, m ost of the analysis on retention is focused on various form s of student cohort
analyses that are conducted and reported on at various decision m akin g forum s. While cohort
retention and throughput inform ation is readily available, the issue of the level of granularity of
the data is under question and autom ated ‘drill through’ reports are not yet in place. These
analyses rem ain plagued by data structure issues and the lack of a nuanced definition of ‘dropout’. A pertinent issue is that student dropout is only picked up when students fail to subm it
assign m ents, sit for the exam ination or re-register. Currently there is no m echanism in place to
record, analyse and respon d to real-tim e data, for exam ple, where students do not log onto the
LMS in m ore than a week, which would then enable early identification of students at risk of
drop-out and the ability to provide personalised support and feedback. (For a critical appraisal of
the use of learnin g analytics to increase student retention, see Clow, 20 13b.)

This work is licensed under a Creative Commons Attribution 4.0 International License.

291

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

Graduation
Analysis of graduation data provides another lens an d level of granularity on student retention.
Graduation data are available for analysis and reporting purposes to a range of stakeholders.
However, this is a com plex process as graduation data can be reported from a num ber of different
perspectives, which are m utually exclusive and result in different outcom e indicators.

Employment and Citizenship
Though the Alum ni database at the institution stores data on graduates and their contact details,
it is alm ost im possible to keep this database current. As an im portant data source, an up-to-date
and inform ation-rich Alum ni database m ay provide crucial data for analyses such as m easuring
the im pact of qualifications on graduate em ploym ent trajectories, as well as provide opportun ities
for post-enrollm ent guidan ce and lifelon g learn ing.

Engaging with the Student Data Elements
Existing Data Sources and Analytical Approaches
Exis tin g d ata s o u rce s .
The existin g data sources lie in a num ber of disparate operational system s m aintained by various
functional units within the institution. Broadly, the roles of Data Owners, Data Custodians and
Data Stewards have been identified and com m unicated, however, the need for cross-functional
data access m akes operatin g within these specific roles problem atic. A far greater level of system s
integration needs to be achieved if these roles are to be m ean ingfully applied. The disparate
system s typically have grown from the n eed of different functional areas to have custom ised
functionality built into operational system s. These developm ents take place without a reportin g or
analytic objective in m ind with the result that leverin g sensible analytics from these sources is a
problem .
The disparate nature of system s at Unisa has not em braced the concept of central warehousin g,
with data stores rem ainin g fragm ented into the m ajor areas of (1) student adm inistration, (2)
learner m anagem ent, (3) hum an resources, (4) finance and (5) space and the built environm ent.
Not only are these areas supported by different operational system s, they have produced separate
data warehouses and even separate business intelligen ce fram eworks. While this is not a problem
for operational needs, it rem ains a huge problem for integrated analysis and reportin g, which
requires the integration of these data sets.
Considering the velocity, agility and com plexity of Big Data in the context of disparate existing
data sources, it is clear that in order for big(ger) data to be better data, Unisa will have to rethin k
This work is licensed under a Creative Commons Attribution 4.0 International License.

292

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

the governance and integration of these different sources. Not only does Big Data increasin gly
allow the sam e data to be used for different purposes (Lyon, 20 14), but with each recursion, there
is an increasing threat of the loss of original context. With each new configuration and
com bination the data m ay “be given n ew m eanings that are cut off from the values that once m ade
sense of them and the identifiable subjects whose activities generated them in the first place”
(Lyon, 20 14, p. 6) (also see Am oore, 20 11).

Analytical Approaches
The disparate nature of the various data sources typically requires techn ically skilled staff within
the institution to be com m ission ed to translate the ‘busin ess rules’ of the organisation into
extracts that afford data m igrated from an operational system towards an analytic fram ework.
Data extracts vary in com plexity depending on the alignm ent of business rules to analytic
opportun ities. In m any cases, tim e-based analyses of the raw data are not possible as there are no
date stam ps on certain areas of the data. For exam ple, the need to analyse the academ ic drift in
the offerin gs of the institution requires the Program m e Qualification Mix (PQM) or academ ic
structure to be stored in a way that differentiates tim e (per year). In Unisa’s case, the academ ic
structure can be overwritten each year with the n ew (or com binations of) offerings with the result
that the history of the data set is lost. The warehousin g project will then have to take snapshots at
designated points in the year to store this inform ation in order to conduct the analytics.
Once on the warehouse a num ber of ‘views’ of the data are required prim arily to address the
perform ance of analytics and are set up to constitute a ‘reporting layer’. Since the core business of
the institution is undergraduate teaching and learn in g, m ore em phasis is placed on the various
student views. The term used here does not refer to a specific table on a database but rather a
design feature that can be applied to any technical en viron m ent. Typically, views are determ ined
by a particular granularity (level of detail) of the data, or could also be designed around a
com m on ‘currency’ in the data. For exam ple, a course view has a sign ificantly different granularity
of student inform ation com pared with a program m e or qualification view. Currency would refer
to the underlying com m on elem ent lin king aspects of the analytics, for exam ple the course could
form a com m on currency for assign m ent inform ation, tutor inform ation and student inform ation.
Broadly speaking, student views lie separately in the areas of assign m ents, courses, qualifications,
tutors and lecturers. All m etrics are govern ed by the structure within which they reside. Courses
and qualifications reside within the academ ic structure or curriculum , but are also closely
associated with the organ isational structure at the tim e. Changes in these areas over tim e need to
be addressed in an autom ated way in order to retain the integrity of analytics. Challenges in these
areas em anate when the objectives of analytics are not clearly defin ed as an institutional take on
reporting requirem ents. A good exam ple of this would be the creation or dissolution of a faculty,
school or college. These m ajor chan ges occur relatively frequently in the current higher education
environm ent and have m ajor im pacts on the intelligence of the institution. An institutional
decision is required where reporting is either done em ploying the current structure or not. Many
institutions choose to link all dependencies to the current structure and to transpose the history
This work is licensed under a Creative Commons Attribution 4.0 International License.

293

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

in the data to that structure. In the case of our current exam ple, if a college is created in a
particular year, the history of all the related m etrics (enrolm ents, exam statistics, assignm ents,
etc.) are lin ked to the college and reported as if it were in existence in history. The question which
needs to be considered is ‘at what point is historical data irrelevant?’
The design of the reportin g layer takes cognisance of the vast variety of analytics to be perform ed,
from autom ated sum m ary tables that underpin aggregated dashboards to the need for extended
data m inin g of all attributes in these views to pick up underlying trends. The m ain purposes for
the creation of the reportin g layer is to support the user requirem ents of analysts and researchers,
this being different from Managem ent Inform ation referred to above. Data sets are provided to
these users either via direct access to the warehouse environ m ent (SQL Server database), by
provisioning front-end tools linked to these sources (MS Excel pivots, IBM SPSS, etc.) and also
via autom ated web applications (Busin ess Intelligence tools, custom ised n et applications,
proprietary tools, etc.).

Ensuring Big(Ger) Data as Better Data: A Number of
Theses
Having described the current state of student data in a descriptive case study, we now propose a
num ber of theoretical theses to inform further research and policy developm ent. The theses are
not m utually exclusive and the sequencing of the theses indicate a relative weightin g in
im portance in order to address the research focus of this article, nam ely: In order for big(ger)
data to be better data, what are the issues that we n eed to consider?

Thesis 1: A Critical View of Big Data
Am idst the hype and prom ise of Big Data in higher education, it is crucial to critically question
and engage not only with its potential and its perils, but also with our assum ptions about student
engagem ent, retention and success. In order for big(ger) data to be better data in the context of
Unisa, it is clear that a skeptical (Selwyn, 20 14) and m ost probably a disenchan ted view of Big
Data’s potential and prom ise is necessary (Couldry, 20 14) (also see boyd & Crawford, 20 12, 20 13;
Lyon, 20 14 and Richards & King, 20 13). In following Gitelm an (20 13) we believe that raw data “is
an oxym oron”, and contrary to Mayer-Schönberger an d Cukier (20 13) we do not believe that data
speaks for itself. We need to consider ‘what’ and the ‘why’ (also see Kitchen , 20 13). Considering
that data is increasin gly used to determ ine access to support and resources (Hen m an, 20 0 4), we
need to be cognisant of the im pact and unintended consequences of our assum ptions
underpinning the algorithm ic turn in higher education. Being skeptical about the potential of
m ore data to be n ecessarily better data, we also need to critically engage with why m uch of the
data harvested and analysed are not used (Macfadyen & Dawson , 20 12).

This work is licensed under a Creative Commons Attribution 4.0 International License.

294

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

Not only does such a critical and even skeptical view of big(ger) data herald and necessitate the
need for a paradigm shift, such an approach instigates a num ber of other paradigm shifts with
regard to institutional processes, skills-sets, system s and institutional knowledge.

Thesis 2: Process Changes
A com bination of system s is used at different levels to harvest the data, but first let us determ ine
the sources. The m ain source and m anagem ent of data occur at the enterprise system level
(Oracle student, Oracle HR, LMS, XMO, etc.). At this level, the data entry points are established
and processes are put in place to capture and m an age data in put. It is at this level where
boundaries and roles need to be clearly delineated to ensure the correct access is given to those
that need update rights on the data. In m any cases, we have staff with access to update data
elem ents that are not specific to their environm ent and operational needs.
From these system s, a very lim ited quantity of Managem ent Inform ation (MI) is provided, m ainly
because these system s are in-house developm ents, which were not design ed with a reporting
perspective in m ind. We m ake a distinction between MI and analytics in this article and subm it
that any respectable proprietary or in-house developed system should cater for all the needs of the
user to perform the operational functions of their respective departm ents. As we m ove towards
em bracing analytics and the possible benefits of big(ger) data as better data, a paradigm shift will
be required as it is this very operational and transactional data that will provide the basis for real
tim e analytics.
The process of harvestin g is im portant as durin g this process attention is given to the quality of
the data. Procedures in place rely on a series of ‘exception’ or ‘error’ reports being run to
determ in e data quality issues. These processes also allow for feedback to the source and
responsible person to address data quality issues. This process is an iterative one of checkin g,
feedback, fixing and re-extracting the data before release to users in the reporting layer.

Thesis 3: Skills and Capabilities Shift
The harvestin g and analysis of available data by institutional researchers within Unisa is
im perative as a contributor, firstly, towards understanding student success and secondly, to
support strategic and operational decision m aking in order to ensure institution al sustainability
and growth. However, where institutional researchers have dwelled within a paradigm of m ore
traditional research design s and m ethods, Big Data offer significant new challen ges from a data
harvesting and m inin g perspective.
The availability of such Big Data is, however, still only in its infancy at Unisa. In order for Unisa
researchers to harvest and analyse such data in future to create valuable and actionable insights,
a new range of skills and com petencies will be required. Traditional research m ethods often focus
on data gen eration due to a scarcity or lack of data. This required researchers to focus and
develop their skillsets around areas such as the science of m easurem ent (are we m easurin g what

This work is licensed under a Creative Commons Attribution 4.0 International License.

295

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

we are supposed to m easure), data collection (to ensure representativeness) and analysis (can we
m ake inferences about the larger population ).
The prom ise of Big Data offers a new world and requires a new skillset for an alysts. Big data
em anate from m ultiple sources, but require som e m anipulations to answer specific research
questions. Researchers an d stakeholders m ust now scope inform ation needs to m ore exact
questions. But with m ore data com es m ore noise and also m issing values (i.e., gaps in data). The
‘new’ institutional researcher m ust have skills to find or direct others to m eanin gful big data and
m ake relevant connections. Program m ing skills in new languages com bin ed with m ulti-source
data m ining, statistical m odelling and prediction are now required. While the developm ent of
algorithm s will be a part of this process, from an institutional and pedagogical perspective, an
understanding of what drives student learning an d success will rem ain key. Institutional
researchers m ust balance the “what” provided by the patterns in data with the “why” which
require m ore in-depth investigation through traditional research approaches. Identifying
correlations alone will therefore not be sufficient. See Table 1 for a contrasting view between
traditional approaches to in stitutional research and data analyses and m odern analytics.

This work is licensed under a Creative Commons Attribution 4.0 International License.

296

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

Table 1
Shifts from Traditional Institutional Research Skills to the Dem ands of Modern Analy tics

Data

Skillset em phasis

More Traditional Institutional
Research
• Scarcity of data
• Focus on data
generation
• Generally historical,
trend or snap shot
orientation
• Science of
m easurem ent including
instrum ent design and
construct m easurem ent
• Data collection to
ensure representativeness
and generalisations
• Analysis for in ference
about the larger population

Modern Analytics
• Big Data
• Multiple data
sources
• Generally trend and
real-tim e data.
• Findin g m eanin g in
Big Data and m aking
relevant conn ections
• Program m ing skills
• Multi-source data
m ining
• Statistical
m odelling
• Developm ent of
algorithm s
• Big Data, not
tailored to any
questions, narrowed
down to inform ation
needs for specific
questions
• More data = m ore
noise, difficulty to
determ in e which data
is m ean in gful and what
the patterns are
reflecting
• Missin g values
• Data Quality

Approach

• Multitude of possible
questions narrowed down
to m ore specific research
questions

Main com plexities

• Lim ited granularity
and ability to segm ent due
to sm all num bers
• Representativeness of
sam ple and ability to m ake
inferences about the larger
population

Driver

• Understandin g of what drives student learning and
success

Will the future institutional researcher be on e that is described as statistician, m athem atician ,
com puter scientist, database adm in istrator, coder, hardware guru, system s adm inistrator,
researcher and interrogator, all in one? Or will the in dividual m ake way for a m ore team based
approach? A literature search points to a lack of research investigating the dem ands of Big Data
on the skills and capabilities of institutional researchers. The con vergence of traditional
This work is licensed under a Creative Commons Attribution 4.0 International License.

297

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

institutional research skills, data science, analytical services and organisation intelligence
therefore becom es a key area of consideration that should be taken cogn isance of.

Thesis 4: Systems Evolution
The m ove toward central warehousing is im perative if the potential of Big Data and enterprisewide analytics is to be achieved. A system atic m appin g of the data elem ents needed for reportin g
and analysis is required, this in conjunction with an overview approach to system s architecture.
Very little opportunity is taken to look at reporting and analytic requirem ents from an
institutional perspective, which is exacerbated by the lack of clear roles and responsibilities, poor
com m unication between affected parties and ‘turf wars’.
Unisa should m ove towards the deploym ent of an ‘inform ation control cen tre’, which has
analytics as the prim ary objective, and which is built from a solid base of technical hardware,
software and skilled personnel. The use of sophisticated software to bridge the gaps and
boundaries of storage and location of data needs to be in vestigated. The m ovem ent away from
traditional developm ent thinking needs to happen in order to be agile in this area. We rem ain
trapped in the m indset that requires a particular process to be followed: (1) data elem ents are
identified for a particular purpose, (2) the m etrics and m easures for the elem ent then defined, (3)
the specifications are written to enhance a particular system for the purposes of storage, (4)
developm ent takes place to enhance the system , (5) the im plem entation of the upgrade is
com m issioned through extensive, often invaluable consultation, and (6) the process of capture
and m onitoring is docum en ted and applied.
Alignm ent of processes an d system s will allow Unisa to not only use historical data in m ore
effective (and appropriate) ways, but ensure an agile inform ation architecture to optim ise the
potential and guard against the perils of Big Data.

Thesis 5: Institutional Relevance, Context, and Knowledge
With the possibilities of workin g with Big Data, traditional research m ethods will have to evolve
to face this new reality of Big Data. The challen ge lies also not only with the technical aspects of
findin g, organ isin g and com binin g the often unstructured data, but with the contextual insights
needed to interpret and apply the knowledge and intelligence gained. Contextual intelligence
speaks to an in -depth understanding of the institution and how it functions, and will include
knowledge of its operation al and strategic objectives and direction . Herein lays one of the m ajor
challenges with bigger data. All this will require new ways of thin kin g on various levels.
Im portantly, researchers and analysts will need to balance the prom ise of Big Data, and the
various opportunities it presents to uncover patterns in data and em ploy advanced analytics (the
“what”), with an ongoing search for the drivers of student learnin g, success, retention, dropout
and throughput (the “why”) in order to provide relevant intelligence with m axim um im pact. A
stark reality within the Un isa context, which also faces institutional researchers globally, is the
pertin ent question of “So what”? If all the analytics and research do not result in actionable
interventions by the un iversity, in our case to strengthen support to students towards success,
This work is licensed under a Creative Commons Attribution 4.0 International License.

298

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

then how relevant are we? This is a sobering thought, which also speaks to the tension between
the provisionin g of intelligence and influencin g action.

Limitations to this Study
We acknowledge the concerns regarding case study m ethodology of a lack of rigor, and the
im possibility to generalise from a single case (Yin , 20 0 9). The value this case study adds is an
attem pt to generalise to “theoretical propositions, an d not to populations or universes” (Yin ,
20 0 9, p. 15). The theses proposed in the preceding section therefore could be used as a heuristic
fram ework for engaging with the state and use of student data in other higher education contexts.

(In)Conclusion
Despite and am idst the fact that “big data is all the rage” (Richards & Kin g, 20 13, p. 41) and
various claim s that Big Data will revolution ise and transform the way we live, work, and thin k
(Mayer-Schön berger & Cukier, 20 13), there are also m any authors who take a m ore sceptical and
critical approach to Big Data (Am oore, 20 11; boyd & Crawford, 20 12, 20 13; Bryan t & Raja, 20 14;
Couldry, 20 14; Lyon, 20 14; Richards & Kin g, 20 13).
In the current context of persisting concerns about student success and retention in higher
education and in particular ODL (e.g., Subotzky & Prinsloo, 20 11) and the prevailin g logic of
evidence-based decision-m akin g and the “algorithm ic turn” (Napoli, 20 13, p. 1) and “algorithm ic
regulation” (Morozov, 20 13a, par.15) in higher education, a critical interrogation of the potential
of Big Data is called for.
This article did not attem pt to question the potential of Big Data for higher education but rather
raised the question: “In order for big(ger) data to be better data, and to result in m ore effective
and appropriate teaching, learnin g and support, what are the issues that we need to consider?” In
the context of Unisa as a m ega ODL institution, this question necessitated a sober, and som ewhat
sceptical (if not disenchanted), view of the practical im plications to realise big(ger) data’s
potential. We concluded the article with proposin g five theses that, when considered, can
increase the realisation of Big Data’s potential in a specific context, that of Unisa.
Though som e of the issues m ay be m ore applicable to higher and distance education contexts in
developing world contexts, we propose the concluding theses to inform further research,
contem plation and consideration.

This work is licensed under a Creative Commons Attribution 4.0 International License.

299

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

Acknowledgement
The authors would like to acknowledge the com m ents and critical input received from the
reviewers during the review process. Their critical and supportive engagem ent allowed the
authors to reconsider and rework the value proposition of the article.

This work is licensed under a Creative Commons Attribution 4.0 International License.

300

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

References
Altbach, P.G., Reisberg, L., & Rum bley, L.E. (20 0 9). Trends in global higher education: Tracking
an academ ic revolution. A report prepared for the UN ESCO W orld Conference on
Higher Education. Paris. UNESCO. Retrieved from
http:/ / atepie.cep.edu.rs/ public/ Altbach,_ Reisberg,_ Rum bley_ Tracking_ an_ Academ ic_
Revolution ,_ UNESCO_ 20 0 9.pdf
Am oore, L. (20 11). Data derivatives: On the em ergence of a security risk calculus for our tim es.
Theory Culture Society , 28 , pp. 24—43. DOI: 10 .1177/ 0 263276411417430
Andrejevic, M. (20 14). The Big Data divide. International Journal of Com m unication, 8, 1673—
1689.
Arnold, K. (20 10 , March 3). Signals: Applying academ ic analytics. EDUCAUSErev iew [online].
Retrieved from http:/ / www.educause.edu/ ero/ article/ signals-applying-academ icanalytics
Baum an , Z., & Lyon, D. (20 13). Liquid surveillance. Cam bridge, UK: Polity.
Benn ett, C. J . (20 0 1). Cookies, web bugs, webcam s and cue cats: Patterns of surveillance on the
world wide web. Com m unications of the ACM, 3, 197– 210 .
DOI:10 .10 23/ A:10 12235815384.
Biesta, G. (20 0 7). Why “what works” won’t work: Evidence-based practice and the dem ocratic
deficit in educational research. Educational Theory , 57(1), 1– 22. DOI: 10 .1111/ j.17415446.20 0 6.0 0 241.x.
Biesta, G. (20 10 ). Why ‘what works’ still won’t work: From evidence-based education to valuebased education . Studies in Philosophy of Education, 29, 491– 50 3. DOI 10 .10 0 7/ s112170 10 -9191-x.
Bollier, D. (20 10 ). The prom ise and peril of Big Data. Eighteenth Annual Aspen Institute
Roundtable on Inform ation Technology . Retrieved from
http:/ / www.aspeninstitute.org/ sites/ default/ files/ content/ docs/ pubs/ The_ Prom ise_ and
_ Peril_ of_ Big_ Data.pdf
Booth, M. (20 12, J uly 18 ). Learning analytics: The new black. EDUCAUSEreview [online].
Retrieved from http:/ / www.educause.edu/ ero/ article/ learnin g-analytics-n ew-black
boyd, D., & Crawford, K. (20 12). Critical questions for Big Data. Inform ation, Com m unication &
Society , 15(5), 662—679. DOI: 10 .10 80 / 1369118 X.20 12.678878

This work is licensed under a Creative Commons Attribution 4.0 International License.

301

Big(ger) Data as Better Data in Open Distance Learning
Prinsloo, Archer, Barnes, Chetty and van Zyl

boyd, D., & Crawford, K. (20 13). Six provocations for Big Data. Retrieved from
http:/ / papers.ssrn .com / sol3/ papers.cfm ?abstract_ id=1926431
Briggs, S. (20 14, J anu