The Human Element of Big Data: Issues, Analytics, and Performance

  

The Human Element of

Big Data

  

Issues, Analytics, and Performance

  


Edited by

  

Geetam S. Tomar

Narendra S. Chaudhari

Robin Singh Bhadoria

Ganesh Chandra Deka

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2017 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper
Version Date: 20160824

International Standard Book Number-13: 978-1-4987-5415-6 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

  

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

  

For permission to photocopy or use material electronically from this work, please contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

  

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site and the CRC Press Web site.

  Contents

  

  

  

  

  

  

   Kuldeep Singh Jadon and Radhakishan Yadav

  

   George Papachristos and Scott W. Cunningham

  

Meena Jha, Sanjay Jha, and Liam O’Brien

  

Utkarsh Sharma and Robin Singh Bhadoria

  

Ankita Sinha and Prasanta K. Jana


  

Siddhartha Duggirala

  

Richard Millham and Surendra Thakur

  

Rafael Souza and Chandrakant Patil

  

Akshi Kumar and Abhilasha Sharma

  

  

Zhihan Lv, Xiaoming Li, Weixi Wang, Jinxing Hu, and Ling Yin

  

Raghavendra Kankanady and Marilyn Wells

   Mayank Bhushan, Apoorva Gupta, and Sumit Kumar Yadav

  

Daphne Lopez and Gunasekaran Manogaran

  

  

  This book contains 16 chapters of eminent quality research and practice in the field of Big Data analytics from academia, research, and industry experts. The book tries to provide quality discussion on the issues, challenges, and research trends in Big Data in regard to human behavior that could influence decision-making processes.

  During the last decade, people began interacting with so many devices, creating a huge amount of data to handle. This led to the concept of Big Data necessitating development of more efficient algorithms, techniques, and tools for analyzing this huge amount of data.

  As humans, we put out a lot of information on several social networking websites, including Facebook, Twitter, and LinkedIn, and this information, if tapped properly, could be of great value to perform analysis through Big Data algorithms and techniques. Data available on the Web can be in the form of video from surveillance systems or voice data from any call center about a particular client/human. Mostly, this information is in unstructured form, and a challenging task is to segregate this data.

  This trend inspired us to write this book on the human element of Big Data to present a wide conceptual view of prospective challenges and their remedies for an architectural paradigm for Big Data. Chapters in this book present detailed surveys and case studies for different application areas like the Internet of Things (IoT), healthcare, social media, market prediction analysis, and climate change variability. Fast data analysis is a crucial phase in Big Data analytics, and it is briefly covered in this book. Another important aspect of Big Data addressed in this book is costing issues. For smooth navigation, the book is divided into the following four sections:

  

  Introduction to the Human Element of Big Data: Definition, New Trends, and Methodologies

  Future Research and Scope for the Human Element of Big Data

  Case Studies for the Human Element of Big Data: Analytics and Performance

  Geetam Singh Tomar earned an undergraduate degree at the Institute of Engineers Calcutta, a postgraduate degree at REC Allahabad, and a PhD at RGPV Bhopal in electronics engineering. He completed postdoctoral work in computer engineering at the University of Kent, Canterbury, UK. He is the director of Machine Intelligence Research Labs, Gwalior, India. He served prior to this in the Indian Air Force, MITS Gwalior, IIITM Gwalior, and other institutes. He also served at the University of Kent and the University of the West Indies, Trinidad.

  He received the International Plato Award for academic excellence in 2009 from IBC Cambridge UK. He was listed in the 100 top academicians of the world in 2009 and 2013, and he was listed in Who’s Who in the World for 2008 and 2009. He has organized more than 20 IEEE international conferences in India and other countries. He is a member of the IEEE/ISO working groups to finalize protocols. He has delivered the keynote address at many conferences. He is the chief editor of five international journals, holds 1 patent, has published 75 research papers in international journals and 75 papers at IEEE conferences, and has written 6 books and 5 book chapters for CRC Press and IGI Global. He has more than 100 citations per year. He is associated with many other universities as a visiting professor.

  Narendra S. Chaudhari has more than 20 years of rich experience and more than 300 publications in top-quality international conferences and journals. Currently, he is the director for the Visvesvaraya National Institute of Technology (VNIT) Nagpur, Maharashtra, India. Prior to VNIT Nagpur, he was with the Indian Institute of Technology (IIT) Indore as a professor of computer science and engineering. He has also served as a professor in the School of Computer Engineering at Nanyang Technological University, Singapore. He earned BTech, MTech, and PhD degrees at the Indian Institute of Technology Bombay, Mumbai, Maharashtra, India. He has been the keynote speaker at many conferences in the areas of soft computing, game artificial intelligence, and data management. He has been a referee and reviewer for a number of premier conferences and journals, including IEEE Transactions and Neurocomputing.

  Robin Singh Bhadoria is pursuing a PhD in computer science and engineering at the Indian Institute of Technology Indore. He has worked in numerous fields, including data mining, frequent pattern mining, cloud computing and service-oriented architecture, and wireless sensor networks. He earned bachelor’s and master’s of engineering degrees in computer science and engineering at Rajiv Gandhi Technological University, Bhopal (MP), India. He has published more than 40 articles in international and national conferences, journals, and books published by IEEE and Springer. Presently, he is an associate editor for the International Journal of Computing, Communications and Networking (IJCCN) as well as an editorial board member for different journals. He is a member of several professional research bodies, including IEEE (USA), IAENG (Hong Kong), Internet Society (Virginia), and IACSIT (Singapore).

  Ganesh Chandra Deka is the deputy director (training) under the Directorate General of Training, Ministry of Skill Development and Entrepreneurship, Government of India. His research interests include ICT (information and communications technology) in rural development, e-governance, cloud computing, data mining, NoSQL databases, and vocational education and training. He has published more than 57 research papers at various conferences and workshops and in reputed international journals published by IEEE and Elsevier. He is the editor-in-chief of the International Journal of Computing, Communications, and Networking. He has organized eight IEEE international conferences as the technical chair in India. He is a member of editorial boards and a reviewer for various journals and international conferences. He is the coauthor of four books on the fundamentals of computer science, and he has published four edited books on cloud computing. He earned a PhD in computer science. He is a member of IEEE, the Institution of Electronics and Telecommunication Engineers, India, and he is an associate member of the Institution of Engineers, India.

   Awais Ahmad

  Raghavendra Kankanady

  Prasanta K. Jana

  Department of Computer Science and Engineering

  Indian School of Mines Dhanbad, India

  Meena Jha

  Central Queensland University Sydney, Australia

  Sanjay Jha

  Central Queensland University Sydney, Australia

  School of Engineering and Technology Central Queensland University Melbourne, Australia

  Institute of Information Technology and Management

  Akshi Kumar

  Department of Computer Science and Engineering

  Delhi Technological University New Delhi, India

  Xiaoming Li

  Shenzhen Institutes of Advanced Technology

  Chinese Academy of Sciences Shenzhen, China

  Daphne Lopez

  School of Information Technology and Engineering

  Madhya Pradesh, India

  Kuldeep Singh Jadon

  School of Computer Science and Engineering

  Delft Technical University Delft, The Netherlands

  Kyungpook National University Daegu, South Korea

  Robin Singh Bhadoria

  Discipline of Computer Science and Engineering

  Indian Institute of Technology Indore, India

  Mayank Bhushan

  ABES Engineering College Ghaziabad, India

  Scott W. Cunningham

  Faculty of Technology Policy and Management

  Audrey Depeige

  Chinese Academy of Sciences Shenzhen, China

  Telecom Ecole de Management—LITEM Evry, France

  Siddhartha Duggirala

  Bharat Petroleum Corporation Limited Mumbai, India

  Apoorva Gupta

  Institute of Innovation in Technology and Management (IITM)

  New Delhi, India

  Jinxing Hu

  Shenzhen Institutes of Advanced Technology

  VIT University Vellore, India

  Contributors Zhihan Lv

  Shenzhen Institutes of Advanced Technology

  G.L. Bajaj Group of Institutions Mathura, Uttar Pradesh, India

  Ankita Sinha

  Department of Computer Science and Engineering

  Indian School of Mines Dhanbad, India

  Rafael Souza

  Cipher Ltd.

  São Paulo, Brazil

  Surendra Thakur

  Durban University of Technology Durban, South Africa

  Weixi Wang

  Chinese Academy of Sciences Shenzhen, China

  Utkarsh Sharma

  Marilyn Wells

  School of Engineering and Technology Central Queensland University Rockhampton, Australia

  Radhakishan Yadav

  Discipline of Computer Science and Engineering

  Indian Institute of Technology Indore, India

  Sumit Kumar Yadav

  Indira Gandhi Delhi Technological University for Women

  New Delhi, India

  Ling Yin

  Shenzhen Institutes of Advanced Technology

  Department of Computer Science and Engineering

  Delhi Technological University

  Shenzhen Institutes of Advanced Technology

  George Papachristos

  Chinese Academy of Sciences Shenzhen, China

  Gunasekaran Manogaran

  School of Information Technology and Engineering

  VIT University Vellore, India

  Sourav Mazumder

  IBM Analytics San Francisco, California, USA

  Richard Millham

  Durban University of Technology Durban, South Africa

  Liam O’Brien

  Geoscience Australia Canberra, Australia

  Faculty of Technology Policy and Management

  Department of Computer Science and Engineering

  Delft Technical University Delft, The Netherlands

  Chandrakant Patil

  Texec Pvt. Ltd.

  Pune, India

  Anand Paul

  School of Computer Science and Engineering

  Kyungpook National University Daegu, South Korea

  M. Mazhar Rathore

  School of Computer Science and Engineering

  Kyungpook National University Daegu, South Korea

  Abhilasha Sharma

  Chinese Academy of Sciences Shenzhen, China

  

  Audrey Depeige

  ABSTRACT Undeniably, Big Data analytics have drawn increased interest among researchers and practitioners in the data sciences, digital information and communication, and policy shaping or decision making at multiple levels. Complex data models and knowledge-intensive problems require efficient analysis techniques; performed manually, such analysis would be time consuming and prone to numerous errors. The need for efficient solutions to manage growing amounts of data has resulted in the rise of data mining and knowledge discovery techniques, and in particular the development of computer intelligence via powerful algorithms. Yet, complex problem-solving and decision-making areas do not constitute a single source of truth and still require human intelligence. The human elements of Big Data are aspects of strategic importance: they are essential to combine the advantages provided by the speed and accuracy of scalable algorithms with the capabilities of the human mind to perceive, analyze, and make decisions, e.g., by letting people interact with integrative data visualization solutions. This chapter thus seeks to reflect on the various methods available to combine data mining and visualization techniques toward an approach integrating both machine capabilities and human sense-making. Building on a literature review in the fields of knowledge discovery, Big Data analytics, human–computer interactions, and decision making, the chapter highlights the evolution in knowledge discovery theorizations, trends in Big Data applications, challenges of techniques such as machine learning, and how human capabilities can best optimize the use of mining and visualization techniques.

1.1 Big Data for All: A Human Perspective on Knowledge Discovery

  1.1.1 The Knowledge Revolution: State of the Art and Challenges of Data Mining

  The rise of Big Data over the last couple of years is easily noticeable. Referring to our ability to harness, store, and extract valuable meaning from vast amounts of data, the term Big Data holds the implicit promise of answering fundamental questions, which disciplines such as the sciences, technology, healthcare, and business have yet to answer. In fact, as the volume of data available to professionals and researchers steadily grows, opportunities for new discoveries as well as the potential to answer research challenges at stake are fast increasing (Manovich, 2011). It is expected that Big Data will transform various fields such as medicine, businesses, and scientific research overall (Chen and Zhang, 2014), and generate profound shifts in numerous disciplines (Kitchin, 2014). Yet, the adoption of advanced technologies in the field of Big Data remains a challenge for organizations, which still need to strategically engage in the change toward rapidly shifting environments (Bughin et al., 2010). What is more, organizations adopting Big Data at an early stage still face difficulties in understanding its guiding principles and the value it adds to the business (Wamba et al., 2015). Moreover, data sets are often of different types, which urges organizations to develop or apply “new forms of processing to enable enhanced decision making, insights discovery and process optimization” (Chen and Zhang, 2014, p. 315). “A knowledge of analytics approaches” to different unstructured data types such as text, pictures, and video also proves highly beneficial (Davenport et al., 2014), so that data scientists can quickly test and provide solutions to business challenges, emphasizing the application of Big Data analytics in their business context over a specific analytical approach. Indeed, a data science student can be taught “how to write a Python program in half an hour” but can’t be taught “very easily what is the domain knowledge” (Dumbill et al., 2013). This argument highlights the dependencies that exist for an effective analysis and up-to-speed discovery process.

  1.1.2 Big Data: Relational Dependencies and the Discovery of Knowledge

  Specialized literature and research on the topic concur that Big Data involves working on data sets that are so voluminous that their size goes beyond the capability of popular software to extract, manage, and process data in a short time (Manovich, 2011). The question of what type of insights and understanding can be gained through data analysis, in comparison to traditional science methods, is an important one in the context of digitalization

  Taming the Realm of Big Data Analytics

  creation, collection, analysis, curation, and broadcasting of knowledge (Amer-Yahia et al., 2010) having demonstrated the benefits of spontaneous collaboration and analysis of interactions of vast numbers of users to tackle scientific problems that remained unsolved by smaller numbers of people. Yet, challenges arise when organizations need to adopt new technologies to process vast amounts of data while they also need to overcome issues related to the capture, storage, curation, analysis, and visualization of data in their quest for optimized decision making and new insights on potential business opportunities. Issues that organizations face in implementing Big Data applications are related to the technology and techniques used, the access to data itself, as well as organizational change and talent issues (Wamba et al., 2015). These results indicate that human elements such as the skills and knowledge required to implement and generate value from Big Data analytics (technical skills, analytical skills, and governance skills), as well as change management factors such as the buy-in from top management, remain much needed to unlock its full potential.

1.1.3 Potentials and Pitfalls of Knowledge Discovery

  Big Data and data-intensive applications have become a new paradigm for innovative discoveries and data-centric applications. As Chen and Zhang (2014) recall, the potential value and insights hidden in the sea of data sets surrounding us are massive, giving birth to new research paradigms such as data-intensive scientific discovery (DISD). Big Data represents opportunities to achieve tremendous progress in varied scientific fields, while business model landscapes are also transformed by explorations and experimentations with Big Data analytics. This argument is supported by high-level organizations and government bodies, which argue that the use of data-intensive decision making has had substantial impact on their present and future developments (Chen and Zhang, 2014). Such potentials cover the improvement of operational efficiencies, making informed decisions, providing better customer services, identifying and developing new products and services, as well as identifying new markets or accelerating go-to-market cycles. However, it appears that very little empirical research has assessed the real potential of Big Data in realizing business value (Wamba et al., 2015). The process of knowledge discovery, as illustrated in Figure 1.1, is a good example of such value creation, as it intrinsically guides attempts to identify relationships existing within a data set and to extract meaningful insights on the basis of their configuration. This process is highly dependent on guided assumptions and strategic decisions as regards the framework and analysis strategies, so that “theoretically informed decisions are made as to how best to tackle a data set, such that it will reveal information which will be of potential interest and is worthy of further research” (Kitchin, 2014).

  FIGURE 1.1 Stages shown: collecting and cleaning data; integrating data; selecting and transforming data; patterns discovery and evaluation; data visualization and decision aiding.


  Big Data is thus estimated to generate billions of dollars of potential value if exploited accurately, notwithstanding the challenges that come with data-intensive technologies and applications. Such issues relate to the collection, storage, analysis, and visualization stages involved in processing Big Data. In other words, organizations need to grow their capabilities to explore and exploit data, in a context where “information surpasses our capability to harness” (Chen and Zhang, 2014, p. 5), and where pitfalls faced by organizations typically include inconsistencies, incompleteness, lack of scalability, irrelevant timeliness, or security issues in handling, processing, and representing structured and unstructured data. In particular, it appears that organizations need to rely on high-performing storage technologies and adapted network bandwidth, as well as the capability to manage large-scale data sets in a structured way. The potential of Big Data emerges in the “proliferation, digitization and interlinking of diverse set of analogue and unstructured data” (Kitchin, 2014). Thus, the next steps are to cope with the volume of data to analyze and to enhance analytical data mining techniques, algorithms, and visualization methods so that they are scalable where possible; the aspect of timeliness constitutes a priority for real-time Big Data applications (Chen and Zhang, 2014). In this perspective, methods concentrating on the curation, management, and analysis of hundreds of thousands of data entries reflect the progression of new digital humanities techniques.

1.2 The Data Mining Toolbox: Untangling Human-Generated Texts

1.2.1 Interactive Generation and Refinement of Knowledge: The Analytic-Self

  The evolution of humanist and social sciences toward the “mining” of human-generated data comes as an answer to the digitalization of businesses, which calls for the use of “techniques needed to search, analyze and understand these every day materials” (Manovich, 2011). The rise of social media communications early in the 21st century has provided researchers and data analysts with new opportunities to deepen their understanding of socially accepted theories such as opinion spreading, sentiment expression, and idea generation, among others. Research fields relying on such large amounts of surface data include marketing, economics, and behavioral science (sociology, communications). In between the “surface data” and “deep data” has also emerged the pioneering discipline of digital ethnography, which offers a new approach for depicting and analyzing storytelling in social media, using interactive components such as user-generated data, and applying anthropological research methods in digital data analysis and planning. As an illustration, the increasing number of digital ethnography centers reveals the intersections made possible between anthropological and business perspectives on one hand, and between individual or consumer behaviors and the corporate world on the other hand. Such methods rely on the use of public data generated on online networks and social media, which constitute a pool of daily interactions. In this perspective, digital ethnography and other methods relying on the use of Big Data on digital platforms place the user at the center, where self-representation and online identities emerge from the different interactions and strategies that the user activates in various digital public spheres. The use of mixed research methods (both quantitative and qualitative) thus enables researchers to focus on the digital life of the users, combining techniques such as co-occurrences or network analysis (from a quantitative standpoint) with sentiment analysis (from a qualitative standpoint).
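
  To make the combination of co-occurrence counting and sentiment analysis more concrete, the following minimal Python sketch counts how often pairs of terms appear in the same post and attaches a toy lexicon-based polarity score to each post. The sample posts, the POSITIVE and NEGATIVE word lists, and the function names are hypothetical illustrations rather than material from the studies cited above, and a word-count score is only a crude stand-in for real sentiment analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample posts and toy sentiment lexicons (assumptions for illustration).
POSTS = [
    "love the new phone camera",
    "phone battery is terrible",
    "camera quality is great love it",
]
POSITIVE = {"love", "great"}
NEGATIVE = {"terrible"}


def cooccurrences(posts):
    """Count how often each unordered pair of distinct terms shares a post."""
    counts = Counter()
    for post in posts:
        # Sorting makes each pair appear in a single canonical order.
        terms = sorted(set(post.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


def polarity(post):
    """Positive-minus-negative word count: a crude lexicon-based score."""
    words = post.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


pair_counts = cooccurrences(POSTS)
scores = [polarity(p) for p in POSTS]
```

  In this sketch the pair counts could feed a term network (pairs as weighted edges), while the per-post scores give a rough indication of tone that a researcher would still need to interpret in context.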


  1.2.2 Looking into the Mirror: Data Mining and Users’ Profile Building

  Large data sets are being used in projects resonating with “digital humanities” application fields, as professionals start working with user-generated content (e.g., videos), user interactions (web searches, comments, clicks, etc.), user-created data (tags), and user communications (messages). Such data sets are extremely large and continuously growing, not to mention “infinitely larger than already digitized cultural heritage” (Manovich, 2011). These developments raise theoretical, practical, and ethical issues related to the collection, use, and analysis of large amounts of individually and socially generated data. The monitoring and collection of such user-generated interactions (voluntary communications such as blog posts, comments, tweets, check-ins, and video sharing) has been on the rise and is sought after by marketing and advertising agencies, which reuse this data to analyze and extract value from “deep data” about individuals’ trajectories in the online world (Manovich, 2011). The rise of social media combined with the emergence of new technologies has made it possible to adopt a new approach to understanding individuals and society at large, erasing the long-existing dichotomy between large sample size (quantitative studies) and in-depth analysis (qualitative studies). In other words, profiles or “personas” that were earlier built on extended analysis of a small set of people are now achievable at a large scale, relying on continuous data generated from daily user interactions.

  The study of social interactions and human behaviors in this context consequently offers opportunities to analyze interaction patterns directly from the structured and unstructured data, opening the door to the development of new services that take into account how interactions emerge, evolve, and link with others or disaggregate across collective digital spheres. This view confirms the opportunities represented by consumers’ data mining, since numerous companies see their customers spread around the world and generating vast amounts of fast-moving transactional artifacts. However, previous work has reported that even though Big Data can provide astoundingly detailed pictures of customers (Madsbjerg and Rasmussen, 2014), such profiles are actually far from complete and may also mislead people working with such insights. The challenge of getting the right insights to make relevant customer decisions is critical and is detailed in the next section.

  

1.2.3 Accurately Interpreting Knowledge Artifacts: The Shadows of Human Feedback

  The Office of Digital Humanities, created in 2008, has opened the door for humanists to pursue their research work making use of large data sets (Manovich, 2011) that include transactional data such as web searches and message records. The use and analysis of such data sources herald exciting opportunities for research and practice, yet the analysis of millions and billions of online interactions presents a few “dark areas” that deserve attention from decision makers, those who will make final use of this new, large-scale, user-generated data. In particular, there is a need to clarify the skills digital humanists will require in order to take full advantage of such data (Manovich, 2011), that is to say, specific statistics and data analysis methods. This means that interpreting knowledge artifacts extracted from large-scale data sets and related visualizations calls for skills in statistics and data mining, skills that social researchers often do not gain, at least in the way they are initially trained. This view is supported by recent research work highlighting that Big Data should not be envisioned from its analytical side alone, since acute human skills are critical: Big Data shall be approached “not only in terms of analytics, but more in terms developing high-level skills that allow the use of a new generation of IT tools and architectures to collect data from various sources, store, organize, extract, analyze, generate valuable insights” (Wamba et al., 2015, p. 6). There exists, indeed, a “large gap between what can be done with the right software tools, right data, and no knowledge of computer science and advanced statistics, and what can only be done if you have this knowledge” (Manovich, 2011), highlighting that researchers and professionals do need specialized skills and knowledge (statistics, computational linguistics, text mining, computer science, etc.) in order to be able to extract meaningful results from the collected data.

  Organizations that capitalize on Big Data often tend to rely on data scientists rather than data analysts (Davenport et al., 2012), since the information that is collected and processed is often too voluminous, unstructured, and flowing for conventional database structures. The role of data scientist appeared early in the 21st century, together with the acceleration of social media presence and the development of roles dedicated to the storage, processing, and analysis of data, which Davenport (2014, p. 87) depicts as “hacker, scientist, qualitative analyst, trusted advisor and business expert,” pointing out that “many of the skills are self taught anyway.” Although such skills have become prevalent in today’s context, the access to the data and its publication raise some questions related to the use, storage, and informational use of such user-generated data. Specifically, not all interactions on social media and in the digital world in general can be deemed authentic (Manovich, 2011); rather, such data reflects a well-thought-out curation and management of online presence and expressions. Conversely, interpreting the outcomes of data analysis can be made difficult by the quality of the collected data, which may happen to be inconsistent, incomplete, or simply noisy (Chen and Zhang, 2014). This issue pertains to the “veracity” property of Big Data, inducing uncertainty about the level of completeness and consistency of the data as well as other ambiguous characteristics (Jin et al., 2015). Indeed, there always exists a risk of the data being “redundant, inaccurate and duplicate data which might undermine service delivery and decision making processes” (Wamba et al., 2015, p. 24).
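
  As a rough illustration of the veracity concerns described above, the sketch below flags incomplete records and drops exact duplicates from a small list of records. The field names and sample records are hypothetical, and the checks are deliberately simple: real data sets typically require fuzzier matching than exact equality, as well as human judgment about which records are trustworthy.

```python
# Hypothetical records; the field names are assumptions for illustration only.
REQUIRED_FIELDS = ("user", "text")

records = [
    {"user": "a1", "text": "great service"},
    {"user": "a1", "text": "great service"},   # exact duplicate
    {"user": "b2", "text": ""},                # incomplete: empty text
    {"user": "c3", "text": "late delivery"},
]


def is_complete(record):
    """A record is complete when every required field is present and non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def deduplicate(items):
    """Keep the first occurrence of each record, preserving the original order."""
    seen = set()
    unique = []
    for item in items:
        key = tuple(sorted(item.items()))
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


unique_records = deduplicate(records)
clean_records = [r for r in unique_records if is_complete(r)]
incomplete = [r for r in unique_records if not is_complete(r)]
```

  Keeping the incomplete records in a separate list, rather than silently discarding them, leaves room for an analyst to decide whether they can be repaired or should be excluded.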

  Even though there exist techniques dedicated to virtually correcting inconsistencies in data sets and removing noise, we have to keep in mind that this data is not a “transparent window into people’s imaginations, intentions, motives, opinion and ideas” (Manovich, 2011); rather, it may include fictional data that aimed to construct and project a certain online expression. Despite gaining access to a new set of digitally captured interactions and records of individual behaviors, the human elements of Big Data remain such that data scientists and analysts will gain different insights than those that ethnographers in the field would get. In other words, one can say that in order to “understand what makes customer tick, you have to observe them in their natural habitats” (Madsbjerg and Rasmussen, 2014). This view is in line with the fact that subject matter experts in data science, and therefore human elements, are much needed, as they have “a very narrow and particular way of understanding” and are “needed to assess the results of the work, especially when dealing with sensitive data about human behavior” (Kitchin, 2014); it is difficult to interpret data independently from the context in which it has been generated, as if it were stripped of its domain expertise.

1.3 The Deep Dialogue: Lessons of Machine Learning for Data Analysis

1.3.1 Human–Machine Interaction and Data Analysis: The Rise of Machine Learning

  One of the questions raised by the use of Big Data analytics is as follows: Could the enter-

  [Figure 1.2: Gathering data → Recording/storing data → Analyzing data]

FIGURE 1.2 Premises of Big Data’s promises: from data collection to data analysis.

  insights from every customer interaction, and didn’t have to wait for months to get data from the field (Bughin et al., 2010)? It is estimated that publicly available data doubles every eighteen months, while the means to capture and analyze such data streams are becoming widely available at reduced cost. The first stages of the data analysis process are depicted in Figure 1.2 and are used by companies as a foundation to analyze customer situations and to support real-time decisions, such as testing new products and customer experiences.

  Companies may therefore make use of real-time information from any sensor in order to better understand the business context in which they operate; develop new products, processes, and services; and anticipate and respond to changes in usage patterns (Davenport et al., 2012), as well as take advantage of more granular analyses. Beyond these developments, the opportunities brought by machine learning research are noteworthy: methods that marshal the data generated from customers’ interactions and use it to predict outcomes or upcoming interactions give data science the potential to radically transform the way people conduct research, develop innovations, and market their ideas (Bughin et al., 2010). Similarly, Kitchin (2014) states that applications of Big Data and analytics bring disruptive innovations into play and contribute to reinventing how research is conducted. This context calls for research aiming to understand the impact of Big Data on processes, systems, and business challenges overall (Wamba et al., 2015). Several large players in the technology industry have been using and developing such paradigms in order to refine their marketing methods, identify user groups, and develop tailored offers for certain profiles. Everyday information from transactions (payments, clicks, posts, etc.) is collected and analyzed in order to optimize existing opportunities or develop new services in very short times, even in real time. Does this mean that Big Data applications make each of us a human sensor, connected to a global system, and that Big Data thus has the potential to become humanity’s dashboard (Smolan and Erwitt, 2012)? Other researchers have voiced concerns about such a possibility, because Big Data can typically expand the frontier of the “knowable future,” calling into question “people’s ability to analyze it wisely” (Anderson and Rainie, 2012). As the sea level of data sets is rising rapidly, the crunch of algorithms might draw right (or wrong) conclusions about who people are, how they behave now, how they may behave in the future, how they feel, and so forth in a context where “nowcasting” or real-time analytics are getting better.

1.3.2 Using Machine Learning Techniques to Classify Human Expressions

  Other companies are going a step further and seek to better understand the impact of dedicated actions/initiatives such as marketing campaigns on their customers: not only do machine learning technologies enable companies to gauge and classify consumers according to the sentiment they express toward the brand, company, or site, but the analysis also enables companies to trace, test, and learn from user interactions how senti-

  [Figure 1.3: Gathering data → Recording/storing data → Analyzing data → Predicting interaction]

FIGURE 1.3 From data collection to prediction: a test-and-learn approach.

  While organizations may be interested in understanding the evolutions that exist within the collected data and how they can be meaningful (something that is traditionally cast as being specific to the human mind), data analytics software developed for such applications (data mining and visualization to answer customers) has claimed to have removed “the human element that goes into data mining, and as such the human bias that goes with it” (Kitchin, 2014). This tends to inaccurately suggest that data speaks for itself, requiring neither human framing nor any effort to depict the meaning of patterns and relationships within Big Data. Kitchin (2014) describes this paradox: the attractive set of ideas that surrounds Big Data is based on the principle that the reasoning that underpins Big Data is inductive in nature, and runs counter to the deductive approach that dominates in modern science. Researchers should be particularly cautious with regard to Big Data, because it represents a sample that is shaped by several parameters such as the use of the tools, the data ontology shaping the analysis, sample bias, and a relative abstraction from the world that is generally accepted but provides oligoptic views of the world.
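To make the idea of classifying expressed sentiment more concrete, the sketch below trains a small multinomial naive Bayes classifier with add-one smoothing on a handful of labeled messages. This is only one of many possible techniques and is not presented as the method used by the companies or authors cited here; the training messages, labels, and function names are invented for illustration. Note how the labels themselves are a human framing of the data, in line with the caution expressed above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(examples):
    """Collect label and per-label word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-labeled messages (illustrative data only).
training = [
    ("love this brand", "positive"),
    ("great service love it", "positive"),
    ("terrible support", "negative"),
    ("hate the slow delivery", "negative"),
]
model = train(training)
first = classify(model, "love the service")
second = classify(model, "terrible slow support")
```

The classifier can only reproduce the categories and vocabulary chosen by the people who labeled the training data, which is one way in which human bias enters an apparently automatic analysis.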

1.3.3 Learning Decision Rules: The Expertise of Human Forecasting

  Previous research has argued that Big Data has the potential to transform the ways decisions are made, providing senior executives with increased visibility over operations and performance (Wamba et al., 2015). Managers may for instance use the Big Data infrastructure to gain access to dashboards fed with real-time data, so that they can identify future needs and formulate strategies that incorporate predicted risks and opportunities. Professionals can also take advantage of Big Data by identifying specific needs and subsequently delivering tailored services that will meet each of those needs. Yet, while platforms enabling the analysis of real-time data may for some be considered as a single source of truth, decision-making capabilities do not rely solely on the capabilities brought by machine learning technologies; rather, it emerges that experimentation, test-and-learn scenarios (Bughin et al., 2010), and human sense-making of the outcomes and patterns identified are essential to the organizational and cultural changes involved. This attitude specifically highlights “the role of imagination (or lack thereof) in artificial, human and quantum cognition and decision-making processes” (Gustafson, 2015). In other words, “analysts should also try to interpret the results of machine learning analyses, looking into the black box to try and make sense out of why a particular model fits the best” (Davenport, 2014, p. 96). Another example of the role of the human thought process in Big Data is given by Wamba et al. (2015, p. 21), pointing out that “having real-time information on ‘who’ and ‘where’ is allowing not only the realignment and movement of critical assets …, but also informing strategic decision about where to invest in the future to develop new capabilities.”
Such a perspective encompasses efforts from companies that have identified the right skills and methods they need in order to lead and conduct experiential scenarios as well as extract value from Big Data analytics. These scenarios are represented in Figure 1.4, highlighting the role of data in decision making processes, while supporting the fact that the

  [Figure 1.4: Gathering data → Recording/storing data → Analyzing data → Predicting interaction → Making a decision]

FIGURE 1.4 A structured path to decision making in Big Data projects.

  for data scientists” (Davenport, 2014, p. 87). This is where the human elements of Big Data are commonly stronger: rigorous analysis of, and decision making over, the various scenarios identified via Big Data analytics require people to be aware that strong cultural changes are at stake. Executives must embrace “the value of experimentation” (Bughin et al., 2010) and act as role models for all echelons of the company. In parallel, human interactions, and especially communication and strong relationships, are essential, with data scientists being “on the bridge advising the captain at close range” (Davenport, 2014).

1.4 Making Sense of Analytics: From Insights to Value

1.4.1 Complementarity of Data and Visual Analytics: A View on Integrative Solutions

  Initiatives such as the Software Studies lab (Manovich, 2011) have focused on developing techniques to analyze visual data and exploring new visualization methods in order to detect patterns in large sets of visual artifacts such as user-generated videos, photographs, or films. The widespread preference for visual analytics (Davenport, 2014) is very noticeable in Big Data projects, for several reasons: visual analytics are easier to interpret and catch the audience’s eye more easily, even though they may not be suited to complex modeling. Manovich’s work highlights that human understanding and analysis are still needed to provide nuanced interpretations of data and to understand deep meanings that have yet to be uncovered. This is supported by the fact that even though sophisticated approaches have emerged in the field of data visualization, currently available solutions offer poor functionality, scalability, and performance (Chen and Zhang, 2014). Very few tools have the capability to handle complex, large-scale data sets and transform them into intuitive representations while remaining interactive. It is therefore clear that the modeling of complex data sets and the graphical characterization of their properties need to be rethought to support the visual analytics process. In other words, Big Data does not aim to substitute for human judgment or replace experts with technology; rather, technology helps to visualize huge sets of data and to detect patterns or outliers, where human judgment is needed for closer analysis and for making sense of the detected patterns. This may explain why visual analytics are extremely common in Big Data projects (Davenport, 2014), since they are much more appealing for communicating results and findings to nontechnical audiences.
By processing structured and unstructured data, organizations are able to push some intelligence into their structure so as to support operations in the field, and implement innovative products and services (Wamba et al., 2015). It therefore emerges that combining the ability of technology to analyze huge sets of data with that of the human mind to interpret data undoubtedly gives the most meaningful results, since human analytical thinking can’t process such large data volumes, and computers’ ability to understand


  working with the analysis of data needs to be able to communicate well and easily explain the outcomes of analyses to nontechnical people (Davenport, 2014).
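The division of labor described in this section, in which technology detects patterns or outliers and human judgment takes over for closer analysis, can be illustrated with a short sketch. The fragment below flags unusual values using the modified z-score based on the median absolute deviation, a standard robust statistic chosen here for illustration; the function name, the threshold default, and the sample figures are assumptions and do not come from the cited works.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: the score is undefined, flag nothing.
        return []
    return [
        (i, v) for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical daily interaction counts (illustrative data only).
daily_counts = [10, 12, 11, 13, 12, 11, 95]
flagged = flag_outliers(daily_counts)
```

The code only reports that the last value is unusual. Whether it reflects a successful campaign, a data-entry error, or automated traffic is a question for an analyst who knows the context in which the data was generated.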

1.4.2 From Analytics to Actionable Knowledge-as-a-Service