Machine Learning, Optimization, and Big Data 2017

Giuseppe Nicosia · Panos Pardalos (Eds.)

Giovanni Giuffrida · Renato Umeton

Machine Learning, Optimization, and Big Data

Third International Conference, MOD 2017
Volterra, Italy, September 14–17, 2017
Revised Selected Papers

LNCS 10710

Lecture Notes in Computer Science 10710

Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison Lancaster University, Lancaster, UK

Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA

Josef Kittler University of Surrey, Guildford, UK

Jon M. Kleinberg Cornell University, Ithaca, NY, USA

Friedemann Mattern ETH Zurich, Zurich, Switzerland

John C. Mitchell Stanford University, Stanford, CA, USA

Moni Naor Weizmann Institute of Science, Rehovot, Israel

C. Pandu Rangan Indian Institute of Technology, Madras, India

Bernhard Steffen TU Dortmund University, Dortmund, Germany

Demetri Terzopoulos University of California, Los Angeles, CA, USA

Doug Tygar University of California, Berkeley, CA, USA

Gerhard Weikum Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at

Giuseppe Nicosia · Panos Pardalos · Giovanni Giuffrida · Renato Umeton (Eds.)

Machine Learning, Optimization, and Big Data

Third International Conference, MOD 2017
Volterra, Italy, September 14–17, 2017
Revised Selected Papers

Editors
Giuseppe Nicosia, University of Catania, Catania, Italy
Giovanni Giuffrida, University of Catania, Catania, Italy
Panos Pardalos, University of Florida, Gainesville, FL, USA
Renato Umeton, Harvard University, Cambridge, MA, USA

ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-72925-1
ISBN 978-3-319-72926-8 (eBook)
https://doi.org/10.1007/978-3-319-72926-8
Library of Congress Control Number: 2017962876
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
© Springer International Publishing AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG

Preface

MOD is an international conference embracing the fields of machine learning, optimization, and data science. The third edition, MOD 2017, was held on September 14–17, 2017, in Volterra (Pisa, Italy), a stunning medieval town dominating the picturesque countryside of Tuscany.

The key role of machine learning, reinforcement learning, artificial intelligence, large-scale optimization, and big data for developing solutions to some of the greatest challenges we are facing is undeniable. MOD 2017 attracted leading experts from the academic world and industry with the aim of strengthening the connection between these institutions. The 2017 edition of MOD represented a great opportunity for professors, scientists, industry experts, and postgraduate students to learn about recent developments in their own research areas and in contiguous ones, with the aim of creating an environment to share ideas and trigger new collaborations.

As chairs, we were honored to organize a premier conference in these areas and to receive a large variety of innovative and original scientific contributions. During this edition, six plenary lectures were presented:

Yi-Ke Guo, Department of Computing, Faculty of Engineering, Imperial College London, UK. Founding Director of Data Science Institute
Panos Pardalos, Department of Systems Engineering, University of Florida, USA. Director of the Center for Applied Optimization
Ruslan Salakhutdinov, Machine Learning Department, School of Computer Science at Carnegie Mellon University, USA. Director of AI Research at Apple
My Thai, Department of Computer and Information Science and Engineering, University of Florida, USA
Jun Pei, Hefei University of Technology, China
Vincenzo Sciacca, Cloud and Cognitive Division – IBM Rome, Italy

There were also two tutorial speakers:

Domenico Talia, Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica, Università della Calabria, Italy
Xin-She Yang, School of Science and Technology – Middlesex University London, UK

Moreover, the conference hosted the second edition of the industrial session on “Machine Learning, Optimization and Data Science for Real-World Applications”:

Luca Maria Aiello, Nokia Bell Labs, UK
Pierpaolo Basile, University of Bari, Italy
Carlos Castillo, Universitat Pompeu Fabra in Barcelona, Spain
Moderator: Aris Anagnostopoulos, Sapienza University of Rome, Italy

We received 126 submissions from 46 countries and five continents; each manuscript was independently reviewed by a committee formed by at least five members through a blind review process. These proceedings contain 49 research articles written by leading scientists in the fields of machine learning, artificial intelligence, reinforcement learning, computational optimization, and data science presenting a substantial array of ideas, technologies, algorithms, methods, and applications.

For MOD 2017, Springer generously sponsored the MOD Best Paper Award. This year, the paper by Khaled Sayed, Cheryl Telmer, Adam Butchy, and Natasa Miskov-Zivanov titled “Recipes for Translating Big Data Machine Reading to Executable Cellular Signaling Models” received the MOD Best Paper Award.

This conference could not have been organized without the contributions of these researchers, and so we thank them all for participating. A sincere thank you also goes to the entire Program Committee, formed by more than 300 scientists from academia and industry, for their valuable work of selecting the scientific contributions.

Finally, we would like to express our appreciation to the keynote speakers, tutorial speakers, and the industrial panel who accepted our invitation, and to all the authors who submitted their research papers to MOD 2017.

September 2017

Giuseppe Nicosia Panos Pardalos

Giovanni Giuffrida Renato Umeton

Organization

General Chair

Renato Umeton Harvard University, USA

Conference and Technical Program Committee Co-chairs

Giuseppe Nicosia University of Catania, Italy and University of Reading, UK
Panos Pardalos University of Florida, USA
Giovanni Giuffrida University of Catania, Italy

Tutorial Chair

Giuseppe Narzisi New York University Tandon School of Engineering, USA

Industrial Session Chairs

Ilaria Bordino UniCredit R&D, Italy
Marco Firrincieli UniCredit R&D, Italy
Fabio Fumarola UniCredit R&D, Italy
Francesco Gullo UniCredit R&D, Italy

Organizing Committee

Piero Conca CNR, Italy
Jole Costanza Italian Institute of Technology, Milan, Italy
Giorgio Jansen University of Catania, Italy
Giuseppe Narzisi New York University Tandon School of Engineering, USA
Andrea Patane’ University of Oxford, UK
Andrea Santoro Queen Mary University London, UK
Renato Umeton Harvard University, USA

Technical Program Committee

Agostinho Agra Universidade de Aveiro, Portugal Kerem Akartunali University of Strathclyde, UK Richard Allmendinger The University of Manchester, UK Aris Anagnostopoulos Università di Roma La Sapienza, Italy Takaya Arita Nagoya University, Japan Jason Atkin The University of Nottingham, UK Chloe-Agathe Azencott Institut Curie Research Centre, Paris, France Jaume Bacardit Newcastle University, UK James Bailey University of Melbourne, Australia Baski Balasundaram Oklahoma State University, USA Elena Baralis Politecnico di Torino, Italy Xabier E. Barandiaran University of the Basque Country, Spain Cristobal Barba-Gonzalez University of Malaga, Spain Helio J. C. Barbosa

Laboratório Nacional de Computacao Cientiﬁca, Brazil Roberto Battiti University of Trento, Italy Lucia Beccai Istituto Italiano di Tecnologia, Italy Aurelien Bellet Inria Lille, France Gerardo Beni University of California at Riverside, USA Khaled Benkrid The University of Edinburgh, UK Peter Bentley University College London, UK Katie Bentley Harvard Medical School, USA Heder Bernardino Universidade Federal de Juiz de Fora, Brazil Daniel Berrar Tokyo Institute of Technology, Japan Adam Berry CSIRO, Australia Luc Berthouze University of Sussex, UK Martin Berzins SCI Institute, University of Utah, USA Mauro Birattari

IRIDIA, Université Libre de Bruxelles, Belgium Leonidas Bleris University of Texas at Dallas, USA Christian Blum Spanish National Research Council, Spain Paul Bourgine École Polytechnique Paris, France Anthony Brabazon University College Dublin, Ireland Paulo Branco Instituto Superior Tecnico, Portugal Juergen Branke University of Warwick, UK Larry Bull University of the West of England, UK Tadeusz Burczynski Polish Academy of Sciences, Poland Robert Busa-Fekete Yahoo! Research, NY, USA Sergiy I Butenko Texas A&M University, USA Stefano Cagnoni University of Parma, Italy Yizhi Cai University of Edinburgh, UK Guido Caldarelli

IMT Lucca, Italy Alexandre Campo Université Libre de Bruxelles, Belgium Angelo Cangelosi University of Plymouth, UK Salvador Eugenio Caoili University of the Philippines Manila, Philippines Timoteo Carletti University of Namur, Belgium Jonathan Carlson Microsoft Research, USA Celso Carneiro Ribeiro Universidade Federal Fluminense, Brazil Michelangelo Ceci University of Bari, Italy Adelaide Cerveira Universidade de Tras-os-Montes e Alto Douro,

Portugal


Xu Chang University of Sydney, Australia W. Art Chaovalitwongse University of Washington, USA Antonio Chella Università di Palermo, Italy Ying-Ping Chen National Chiao Tung University, Taiwan Haifeng Chen NEC Labs, USA Keke Chen Wright State University, USA Gregory Chirikjian Johns Hopkins University, USA Silvia Chiusano Politecnico di Torino, Italy Miroslav Chlebik University of Sussex, UK Sung-Bae Cho Yonsei University, South Korea Anders Christensen Lisbon University Institute, Portugal Dominique Chu University of Kent, UK Philippe Codognet

University Pierre and Marie Curie – Paris 6, France Carlos Coello Coello CINVESTAV-IPN, Mexico George Coghill University of Aberdeen, UK Pietro Colombo University of Insubria, Italy David Cornforth University of Newcastle, UK Luís Correia University of Lisbon, Portugal Chiara Damiani University of Milan-Bicocca, Italy Thomas Dandekar University of Würzburg, Germany Ivan Luciano Danesi Unicredit Bank, Italy Christian Darabos Dartmouth College, USA Kalyanmoy Deb Michigan State University, USA Nicoletta Del Buono University of Bari, Italy Jordi Delgado Universitat Politecnica de Catalunya, Spain Ralf Der MPG, Germany Clarisse Dhaenens Université Lille, France Barbara Di Camillo University of Padua, Italy Gianni Di Caro

IDSIA, Switzerland Luigi Di Caro University of Turin, Italy Luca Di Gaspero University of Udine, Italy Peter Dittrich Friedrich Schiller University of Jena, Germany Federico Divina Pablo de Olavide University of Seville, Spain Stephan Doerfel Kassel University, Germany Devdatt Dubhashi Chalmers University, Sweden George Dulikravich Florida International University, USA Juan J. Durillo University of Innsbruck, Austria Omer Dushek University of Oxford, UK Marc Ebner Ernst-Moritz-Arndt-Universität Greifswald, Germany Pascale Ehrenfreund The George Washington University, USA Gusz Eiben

VU Amsterdam, The Netherlands Aniko Ekart Aston University, UK Talbi El-Ghazali University of Lille, France Michael Elberfeld RWTH Aachen University, Germany Michael T. M. Emmerich Leiden University, The Netherlands

Anton Eremeev Sobolev Institute of Mathematics, Russia Harold Fellermann Newcastle University, UK Chrisantha Fernando Queen Mary University, UK Cesar Ferri Universidad Politecnica de Valencia, Spain Paola Festa University of Naples Federico II, Italy Jose Rui Figueira Instituto Superior Tecnico, Lisbon, Portugal Grazziela Figueredo The University of Nottingham, UK Alessandro Filisetti Explora Biotech Srl, Italy Christoph Flamm University of Vienna, Austria Enrico Formenti Nice Sophia Antipolis University, France Giuditta Franco University of Verona, Italy Piero Fraternali Politecnico di Milano, Italy Valerio Freschi University of Urbino, Italy Enrique Frias Martinez Telefonica Research, Spain Walter Frisch University of Vienna, Austria Rudolf M. Fuchslin Zurich University of Applied Sciences, Switzerland Claudio Gallicchio University of Pisa, Italy Patrick Gallinari

LIP6 – University of Paris 6, France Luca Gambardella

IDSIA, Switzerland Jean-Gabriel Ganascia

Pierre and Marie Curie University – LIP6, France Xavier Gandibleux Université de Nantes, France Alfredo G. Hernandez-Diaz

Pablo de Olavide University – Seville, Spain Jose Manuel Garcia Nieto University of Malaga, Spain Paolo Garza Politecnico di Torino, Italy Romaric Gaudel Inria, France Nicholas Geard University of Melbourne, Australia Philip Gerlee Chalmers University, Sweden Mario Giacobini University of Turin, Italy Onofrio Gigliotta University of Naples Federico II, Italy Giovanni Giuffrida University of Catania, Italy Giorgio Stefano Gnecco University of Genoa, Italy Christian Gogu Université Toulouse III, France Faustino Gomez

IDSIA, Switzerland Michael Granitzer University of Passau, Germany Alex Graudenzi University of Milan-Bicocca, Italy Julie Greensmith University of Nottingham, UK Roderich Gross

The University of Sheffield, UK Mario Guarracino

ICAR-CNR, Italy Francesco Gullo Unicredit Bank, Italy Steven Gustafson GE Global Research, USA Jin-Kao Hao University of Angers, France Simon Harding Machine Intelligence Ltd., Canada Richard Hartl University of Vienna, Austria Inman Harvey University of Sussex Jamil Hasan University of Idaho, USA


Geir Hasle SINTEF ICT, Norway Carlos Henggeler Antunes University of Coimbra, Portugal Francisco Herrera University of Granada, Spain Arjen Hommersom Radboud University, The Netherlands Vasant Honavar Pennsylvania State University, USA Fabrice Huet University of Nice Sophia Antipolis, France Hiroyuki Iizuka Hokkaido University, Japan Takashi Ikegami University of Tokyo, Japan Bordino Ilaria Unicredit Bank, Italy Hisao Ishibuchi Osaka Prefecture University, Japan Peter Jacko Lancaster University Management School, UK Christian Jacob University of Calgary, Canada Yaochu Jin University of Surrey, UK Colin Johnson University of Kent, UK Gareth Jones Dublin City University, Ireland Laetitia Jourdan Inria/LIFL/CNRS, France Narendra Jussien Ecole des Mines de Nantes/LINA, France Janusz Kacprzyk Polish Academy of Sciences, Poland Theodore Kalamboukis Athens University of Economics and Business, Greece George Kampis Eotvos University, Hungary Dervis Karaboga Erciyes University, Turkey George Karakostas McMaster University, Canada Istvan Karsai ETSU, USA Jozef Kelemen Silesian University, Czech Republic Graham Kendall Nottingham University, UK Didier Keymeulen

NASA – Jet Propulsion Laboratory, USA Daeeun Kim Yonsei University, South Korea Zeynep Kiziltan University of Bologna, Italy Georg Krempl University of Magdeburg, Germany Erhun Kundakcioglu Ozyegin University, Turkey Renaud Lambiotte University of Namur, Belgium Doron Lancet Weizmann Institute of Science, Israel Pier Luca Lanzi Politecnico di Milano, Italy Sanja Lazarova-Molnar University of Southern Denmark, Denmark Doheon Lee KAIST, South Korea Jay Lee

Center for Intelligent Maintenance Systems – UC, USA Eva K. Lee Georgia Tech, USA Tom Lenaerts Université Libre de Bruxelles, Belgium Rafael Leon Universidad Politecnica de Madrid, Spain Shuai Li Cambridge University, UK Lei Li Florida International University, USA Xiaodong Li RMIT University, Australia Joseph Lizier The University of Sydney, Australia Giosue’ Lo Bosco

Università di Palermo, Italy Daniel Lobo University of Maryland Baltimore County, USA

Daniele Loiacono Politecnico di Milano, Italy Jose A. Lozano University of the Basque Country, Spain Paul Lu University of Alberta, Canada Angelo Lucia University of Rhode Island, USA Dario Maggiorini University of Milan, Italy Gilvan Maia Universidade Federal do Ceará, Brazil Donato Malerba University of Bari, Italy Lina Mallozzi University of Naples Federico II, Italy Jacek Mandziuk Warsaw University of Technology, Poland Vittorio Maniezzo University of Bologna, Italy Marco Maratea University of Genoa, Italy Elena Marchiori Radboud University, The Netherlands Tiziana Margaria University of Limerick and Lero, Ireland Omer Markovitch University of Groningen, The Netherlands Carlos Martin-Vide Rovira i Virgili University, Spain Dominique Martinez LORIA, France Matteo Matteucci Politecnico di Milano, Italy Giancarlo Mauri University of Milan-Bicocca, Italy Mirjana Mazuran Politecnico di Milano, Italy Suzanne McIntosh NYU Courant Institute, and Cloudera Inc., USA Peter Mcowan Queen Mary University, UK Gabor Melli Sony Interactive Entertainment Inc., Japan Jose Fernando Mendes University of Aveiro, Portugal David Merodio-Codinachs ESA, France Silja Meyer-Nieberg Universität der Bundeswehr München, Germany Martin Middendorf University of Leipzig, Germany Taneli Mielikainen Nokia, Finland Kaisa Miettinen University of Jyvaskyla, Finland Orazio Miglino

University of Naples “Federico II”, Italy Julian Miller University of York, UK Marco Mirolli

ISTC-CNR, Italy Natasa Miskov-Zivanov University of Pittsburgh, USA Carmen Molina-Paris University of Leeds, UK Sara Montagna Università di Bologna, Italy Marco Montes de Oca Clypd, Inc., USA Sanaz Mostaghim Otto von Guericke University Magdeburg, Germany Mohamed Nadif University of Paris Descartes, France Hidemoto Nakada NIAIST, Japan Amir Nakib Università Paris EST Creteil, Laboratoire LISSI, France Mirco Nanni

CNR – ISTI, Italy Sriraam Natarajan Indiana University, USA Chrystopher L. Nehaniv University of Hertfordshire, UK Michael Newell Athens Consulting, LLC Giuseppe Nicosia University of Catania, Italy Xia Ning

IUPUI, USA


Eirini Ntoutsi Leibniz University of Hanover, Germany Michal Or-Guil Humboldt University of Berlin, Germany Mathias Pacher Goethe-Universität Frankfurt am Main, Germany Ping-Feng Pai National Chi Nan University, Taiwan Wei Pang University of Aberdeen, UK George Papastefanatos

IMIS/RC Athena, Greece Luis Paquete University of Coimbra, Portugal Panos Pardalos University of Florida, USA Andrew J. Parkes Nottingham University, UK Andrea Patane’

University of Oxford, UK Joshua Payne University of Zurich, Switzerland Jun Pei University of Florida, USA Nikos Pelekis University of Piraeus, Greece Dimitri Perrin Queensland University of Technology, Australia Koumoutsakos Petros ETH, Switzerland Juan Peypouquet Universidad Tecnica Federico Santa Maria, Chile Andrew Philippides University of Sussex, UK Vincenzo Piuri University of Milan, Italy Alessio Plebe University of Messina, Italy Silvia Poles Noesis Solutions NV Philippe Preux Inria, France Mikhail Prokopenko University of Sydney, Australia Paolo Provero University of Turin, Italy Buyue Qian

IBM T. J. Watson, USA Chao Qian University of Science and Technology of China, China Gunther Raidl TU Wien, Austria Helena R. Dias Lourenco Pompeu Fabra University, Spain Palaniappan Ramaswamy University of Kent, UK Jan Ramon Inria, France Vitorino Ramos Technical University of Lisbon, Portugal Shoba Ranganathan Macquarie University, Australia Cristina Requejo Universidade de Aveiro, Portugal John Rieffel Union College, USA Laura Anna Ripamonti Università degli Studi di Milano, Italy Eduardo Rodriguez-Tello Cinvestav-Tamaulipas, Mexico Andrea Roli Università di Bologna, Italy Vittorio Romano University of Catania, Italy Andre Rosendo University of Cambridge, UK Samuel Rota Bulo Fondazione Bruno Kessler, Italy Arnab Roy Fujitsu Laboratories of America, USA Alessandro Rozza Parthenope University of Naples, Italy Kepa Ruiz-Mirazo University of the Basque Country, Spain Florin Rusu University of California Merced, USA Jakub Rydzewski N. Copernicus University, Poland Nick Sahinidis Carnegie Mellon University, USA

Francisco C. Santos

INESC-ID Instituto Superior Tecnico, Portugal Claudio Sartori University of Bologna, Italy Frederic Saubion

Université d’Angers, France Andrea Schaerf University of Udine, Italy Oliver Schuetze CINVESTAV-IPN, Mexico Luis Seabra Lopes Universidade of Aveiro, Portugal Roberto Serra University of Modena and Reggio Emilia, Italy Marc Sevaux Lab-STICC, Université de Bretagne-Sud, France Ruey-Lin Sheu National Cheng Kung University, Taiwan Hsu-Shih Shih Tamkang University, Taiwan Patrick Siarry Université de Paris 12, France Alkis Simitsis HP Labs, USA Johannes Sollner Emergentec Biodevelopment GmbH, Germany Ichoua Soumia Embry-Riddle Aeronautical University, USA Giandomenico Spezzano CNR-ICAR, Italy Antoine Spicher LACL University of Paris Est Creteil, France Pasquale Stano University of Salento, Italy Thomas Stibor GSI Helmholtz Centre for Heavy Ion Research,

Germany Catalin Stoean University of Craiova, Romania Reiji Suzuki Nagoya University, Japan Domenico Talia University of Calabria, Italy Kay Chen Tan National University of Singapore, Singapore Letizia Tanca Politecnico di Milano, Italy Charles Taylor UCLA, USA Maguelonne Teisseire

Cemagref – UMR Tetis, France Tzouramanis Theodoros University of the Aegean, Greece Jon Timmis University of York, UK Gianna Toffolo University of Padua, Italy Joo Chuan Tong Institute of HPC, Singapore Nickolay Trendafilov

Open University, UK Soichiro Tsuda University of Glasgow, UK Shigeyoshi Tsutsui Hannan University, Japan Aditya Tulsyan MIT, USA Ali Emre Turgut

IRIDIA-ULB, France Karl Tuyls University of Liverpool, UK Jon Umerez University of the Basque Country, Spain Renato Umeton Harvard University, USA Ashish Umre University of Sussex, UK Olgierd Unold Politechnika Wroclawska, Poland Giorgio Valentini Università degli Studi di Milano, Italy Edgar Vallejo

ITESM Campus Estado de Mexico, Mexico Sergi Valverde Pompeu Fabra University, Spain Werner Van Geit EPFL, Switzerland Pascal Van Hentenryck University of Michigan, USA


Carlos Varela Rensselaer Polytechnic Institute, USA Eleni Vasilaki

University of Sheffield, UK Richard Vaughan Simon Fraser University, Canada Kalyan Veeramachaneni MIT, USA Vassilios Verykios Hellenic Open University, Greece Mario Villalobos-Arias Universidad de Costa Rica, Costa Rica Marco Villani University of Modena and Reggio Emilia, Italy Katya Vladislavleva Evolved Analytics LLC, Belgium Stefan Voss University of Hamburg, Germany Dean Vucinic Vrije Universiteit Brussel, Belgium Markus Wagner The University of Adelaide, Australia Toby Walsh UNSW Sydney, Australia Lipo Wang Nanyang Technological University, Singapore Liqiang Wang University of Central Florida, USA Rainer Wansch Fraunhofer IIS, Germany Syed Waziruddin Kansas State University, USA Janet Wiles University of Queensland, Australia Man Leung Wong Lingnan University, Hong Kong, SAR China Andrew Wuensche University of Sussex, UK Petros Xanthopoulos University of Central Florida, USA Ning Xiong Malardalen University, Sweden Xin Xu George Washington University, USA Gur Yaari Yale University, USA Larry Yaeger Indiana University, USA Shengxiang Yang De Montfort University, UK Qi Yu Rochester Institute of Technology, USA Zelda Zabinsky University of Washington, USA Ras Zbyszek University of North Carolina, USA Hector Zenil University of Oxford, UK Guang Lan Zhang Boston University, USA Qingfu Zhang City University of Hong Kong, Hong Kong,

SAR China Rui Zhang

IBM Research – Almaden, USA Zhi-Hua Zhou Nanjing University, China Tom Ziemke University of Skovde, Sweden Antanas Zilinskas Vilnius University, Lithuania

Best Paper Awards

MOD 2017 Best Paper Award
“Recipes for Translating Big Data Machine Reading to Executable Cellular Signaling Models”
Khaled Sayed*, Cheryl Telmer**, Adam Butchy*, and Natasa Miskov-Zivanov*
* University of Pittsburgh, USA
** Carnegie Mellon University, USA
Springer sponsored the MOD 2017 Best Paper Award with a cash prize of EUR 1,000.

MOD 2016 Best Paper Award
“Machine Learning: Multi-site Evidence-Based Best Practice Discovery”
Eva Lee, Yuanbo Wang and Matthew Hagen
Eva K. Lee, Professor Director, Center for Operations Research in Medicine and HealthCare, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA

MOD 2015 Best Paper Award
“Learning with Discrete Least Squares on Multivariate Polynomial Spaces Using Evaluations at Random or Low-Discrepancy Point Sets”
Giovanni Migliorati
Ecole Polytechnique Federale de Lausanne – EPFL, Lausanne, Switzerland

Contents

Khaled Sayed, Cheryl A. Telmer, Adam A. Butchy, and Natasa Miskov-Zivanov

Michael Cohen

Gregor Ulm, Emil Gustavsson, and Mats Jirstrand

Tome Eftimov, Peter Korošec, and Barbara Koroušić Seljak

Angelo Lucia, Edward Thomas, and Peter A. DiMaggio

Ahmad Mazyad, Fabien Teytaud, and Cyril Fonlupt

S. P. Sidorov, S. V. Mironov, and M. G. Pleshakov

Danny D’Agostino, Andrea Serani, Emilio F. Campana, and Matteo Diez


Manousos Rigakis, Dimitra Trachanatzi, Magdalene Marinaki, and Yannis Marinakis

Stefano Cagnoni, Paolo Fornacciari, Juxhino Kavaja, Monica Mordonini, Agostino Poggi, Alex Solimeo, and Michele Tomaiuolo

Mauro Dell’Amico, Natalia Selini Hadjidimitriou, Thorsten Koch, and Milena Petkovic

Maria João Alves and Carlos Henggeler Antunes

Jason Adair, Alexander Brownlee, Fabio Daolio, and Gabriela Ochoa

Riccardo Pellegrini, Andrea Serani, Giampaolo Liuzzi, Francesco Rinaldi, Stefano Lucidi, Emilio F. Campana, Umberto Iemma, and Matteo Diez

Beatrice Lazzerini and Francesco Pistolesi

Alice Plebe and Mario Pavone

Stéphane Chrétien and Sébastien Darses

Francesco Bagattini, Paola Cappanera, and Fabio Schoen

Joana Dias, Humberto Rocha, Tiago Ventura, Brígida Ferreira,


Ogerta Elezaj, Sule Yildirim, and Edlira Kalemi

Zekarias T. Kefato, Nasrullah Sheikh, and Alberto Montresor

Margarita Zaleshina, Alexander Zaleshin, and Adriana Galvani

Ziad Salem, Gerald Radspieler, Karlo Griparić, and Thomas Schmickl

Ramses Sala, Niccolò Baldanzini, and Marco Pierini

Roy Khristopher Bayot and Teresa Gonçalves

Peng Shi and Dario Landa-Silva

Iván Darío López, Cristian Heidelberg Valencia, and Juan Carlos Corrales

Kamer Kaya, Ş. İlker Birbil, M. Kaan Öztürk, and Amir Gohari

Mingxi Li, Yusuke Tanimura, and Hidemoto Nakada

Marco Baioletti, Gabriele Di Bari, Valentina Poggioni, and Mirco Tracolli

Chunfeng Ma, Min Kong, Jun Pei, and Panos M. Pardalos

Guillermo Rela, Franco Robledo, and Pablo Romero

Gabriel Bayá, Antonio Mauttone, Franco Robledo, and Pablo Romero

Cristian Galleguillos, Alina Sîrbu, Zeynep Kiziltan, Ozalp Babaoglu, Andrea Borghesi, and Thomas Bridi

Boris Musarais

Olgierd Unold and Radosław Tarnawski

Humberto Rocha and Joana Dias

Natalia Castro, Graciela Ferreira, Franco Robledo, and Pablo Romero

Matthias Horn, Günther Raidl, and Christian Blum

Reiji Hatsugai and Mary Inaba

Benedikt Klocker, Herbert Fleischner, and Günther R. Raidl

Francesco Calimeri, Mirco Caracciolo, Aldo Marzullo, and Claudio Stamile

Roberto Aringhieri, Davide Dell’Anna, Davide Duma, and Michele Sonnessa


Christopher Bacher and Günther R. Raidl

Stefano Mauceri, Louis Smith, James Sweeney, and James McDermott

Alberto Castellini and Giuditta Franco

Author Index

Recipes for Translating Big Data Machine Reading to Executable Cellular Signaling Models

Khaled Sayed¹, Cheryl A. Telmer², Adam A. Butchy³, and Natasa Miskov-Zivanov¹,³,⁴(✉)

¹ Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA
{k.sayed,nmzivanov}@pitt.edu
² Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
[email protected]
³ Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
[email protected]
⁴ Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA

Abstract. Biological literature is rich in mechanistic information that can be utilized to construct executable models of complex systems to increase our understanding of health and disease. However, the literature is vast and fragmented, and therefore, automation of information extraction from papers and of model assembly from the extracted information is necessary. We describe here our approach for translating machine reading outputs, obtained by reading biological signaling literature, to discrete models of cellular networks. We use outputs from three different reading engines, and demonstrate the translation of different features using examples from cancer literature. We also outline several issues that still arise when assembling cellular network models from state-of-the-art reading engines. Finally, we illustrate the details of our approach with a case study in pancreatic cancer.

Keywords: Machine reading · Big data in literature · Text mining · Cell signaling networks · Automated model generation

1 Introduction

Biological knowledge is voluminous and fragmented; it is nearly impossible to read all scientific papers on a single topic such as cancer. When building a model of a particular biological system, such as the cancer microenvironment, researchers usually start by searching for existing relevant models and by looking for information about system components and their interactions in published literature.

Although there have been attempts to automate the process of model building, most often modelers conduct these steps manually, with multiple iterations between (i) information extraction, (ii) model assembly, (iii) model analysis, and (iv) model validation through comparison with the most recently published results. To allow for rapid modeling of complex diseases like cancer, and for efficient use of the ever-increasing amount of information in published work, we need representation standards and interfaces such that these tasks can be automated. This, in turn, will allow researchers to ask informed, interesting questions that can improve our understanding of health and disease.

The systems biology community has designed and proposed a standardized format for representing biological models called the Systems Biology Markup Language (SBML). This language allows for using different software tools, without the need for recreating models specific for each tool, as well as for sharing the built models between different research groups. However, the SBML standard is not easily understood by biologists who create mechanistic models, and thus requires an interface that allows biologists to focus on modeling tasks while hiding the details of the SBML language.

To this end, the contributions of the work presented in this paper include:

A representation format that is straightforward to use by both machines and humans, and allows for efficient synthesis of models from big data in literature.
An approach to effectively use state-of-the-art machine reading output to create executable discrete models of cellular signaling.
A proposal for directions to further improve automation of assembly of models from big data in literature.

In Sect. 2, we briefly describe cellular networks, our modeling approach, and our framework that integrates machine reading, model assembly, and model analysis. Section 3 presents the model representation format, and Sect. 4 outlines our approach to translating reading output to that format. The subsequent sections discuss other issues that need to be taken into account when building an interface between big data reading and model assembly in biology, describe a case study that uses our translation methodology, and conclude the paper.

2 Background

2.1 Cellular Networks

Intra-cellular networks include signal transduction, gene regulation, and metabolic networks. Signaling networks are characterized by protein phosphorylation and binding events, which transduce extracellular signals across the plasma membrane and through the cytoplasm. Gene regulatory networks involve translocation of signaling proteins from the cytoplasm to the nucleus, where the integration of these protein signals acts on the genome, resulting in changes in gene expression and cellular processes. The regulation of metabolic networks incorporates phosphorylation and binding, as do signaling networks, and also integrates allosteric regulation, other protein modifications, and subcellular compartmentalization.

Recipes for Translating Big Data Machine Reading

Inter-cellular networks assume interactions between cells of the same or different types. These interactions occur via signaling molecules such as growth factors and cytokines, synthesized and secreted by one cell, and bound to itself or other cells in its surroundings, or via a cell-cell contact.

At all levels of signaling, there are feedforward and feedback loops and crosstalk between signaling pathways to either maintain homeostasis or amplify changes initiated by extracellular signals.

2.2 Modeling Approach

When generating executable models, we use a discrete modeling approach described previously. In the toy example of Fig. 1, we represent system components as model elements (A, B, and C in the example), where each element is defined as having a discrete number of levels of activity. Each element has a list of regulators, called its influence set. In our example, A is a positive regulator of C, B and C are positive regulators of A, and C activates itself while B inhibits itself. Additionally, each element has a corresponding update rule, a discrete function of its regulators. In our example, A is a conjunction of B and C, while C is a disjunction of A and C. Although the model structure is fixed, the simulator that we use is stochastic, and thus allows for closely recapitulating the behavior of biological pathways and networks.

Fig. 1. Toy example illustrating our modeling approach.
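The toy example can be sketched in a few lines of Python. This is a minimal illustration and not the simulator used in this work: it assumes only two activity levels per element (0 and 1), reads the self-inhibition of B as negation, and updates one randomly chosen element per step, which is just one of several possible stochastic schemes. The names UPDATE_RULES and simulate are hypothetical.

```python
import random

# Update rules of the toy example, assuming two activity levels (0 = low, 1 = high).
UPDATE_RULES = {
    "A": lambda s: s["B"] and s["C"],  # A is a conjunction of B and C
    "B": lambda s: not s["B"],         # B inhibits itself (assumed to mean negation)
    "C": lambda s: s["A"] or s["C"],   # C is a disjunction of A and C
}


def simulate(initial, steps, seed=None):
    """Return the list of states visited under random single-element updates."""
    rng = random.Random(seed)
    state = dict(initial)
    trajectory = [dict(state)]
    elements = sorted(UPDATE_RULES)
    for _ in range(steps):
        # Assumed scheme: pick one element uniformly at random and apply its rule.
        element = rng.choice(elements)
        state[element] = int(bool(UPDATE_RULES[element](state)))
        trajectory.append(dict(state))
    return trajectory


if __name__ == "__main__":
    for state in simulate({"A": 0, "B": 1, "C": 1}, steps=5, seed=0):
        print(state)
```

Because the element to update is drawn at random, repeated runs with different seeds can visit different state sequences even though the model structure and update rules stay the same.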

2.3 Framework Overview

To automatically incorporate new reading outputs into models, we have developed a reading-modeling-explanation framework, called DySE (Dynamic System Explanation), outlined in Fig. 2. This framework allows for (i) expansion of existing models or assembly of new models from machine reading output, (ii) analysis and explanation of models, and (iii) generation of machine-readable feedback to reading engines. We focus here on the front end of the framework, the translation from reading outputs to the list of elements and their influence sets, with context information, where available.

3 Model Representation Format

To enable comprehensive translation from reading engine outputs to executable models, the models are first represented in a tabular format. It is important to note that the tabular representation does not include final update rules; the tabular version of the model is further translated into an executable model that can be simulated. Each row in the model table corresponds to one specific model element (i.e., modeled system component), and the columns are organized in several groups: (i) information about the modeled system component, (ii) information about the component’s regulators, and (iii) information about knowledge sources. This format enables straightforward model extension: additional system components are represented as new rows in the table, and additional component-related features as new columns. New columns are added as machine reading improves.

Fig. 2. DySE framework.

The first group of fields in our representation format includes system component-related information. This information is either used by the executable model, or kept as background information to provide specific details about the system component when creating a hypothesis or explaining outcomes of wet lab experiments.

A. Name – full name of the element, e.g., “Epidermal growth factor receptor”.
B. Nomenclature ID – name commonly used in the field for cellular components, e.g., “EGFR” is used for “Epidermal growth factor receptor”.
C. Type – the types of entities used by reading engines, as listed in Table 1.
D. Unique ID – identifiers corresponding to elements that are listed in databases, according to Table 1.
E. Location – subcellular locations and the extracellular space, as listed in Table 2.
F. Location identifier – location identifiers as listed in Table 2.
G. Cell line – obtained from reading output.
H. Cell type – obtained from reading output.
I. Tissue type – obtained from reading output.
J. Organism – obtained from reading output.
K. Executable model variable – variable names currently include the above-described fields B, C, E, and H.

Table 1. Element type and ID database.

Element type          Database name
Protein               UniProt
Protein family        Pfam
Protein complex       Bioentities
Chemical              PubChem
Gene                  HGNC
Biological process    GO

Table 2. The list of cellular locations and their IDs from the Gene Ontology database.

Location name            Location ID
Cytoplasm                GO:0005737
Cytosol                  GO:0005829
Plasma membrane          GO:0005886
Nucleus                  GO:0005634
Mitochondria             GO:0005739
Extracellular            GO:0005576
Endoplasmic reticulum    GO:0005783

The second group of fields in our representation includes component regulator-related information that is mainly used by executable models, with a few fields used for bookkeeping, similar to the first group of fields.

L. Positive regulator nomenclature IDs – list of positive regulators of the element.
M. Negative regulator nomenclature IDs – list of negative regulators of the element.
N. Interaction type – for each listed regulator, whether the interaction is direct or indirect, in case this is known.
O. Interaction mechanism – for each known direct interaction, the mechanism of interaction, if it is known. Mechanisms that can be obtained from reading engines are listed in Table 3.
P. Interaction score – for each interaction, a confidence score obtained from reading.

The third group of fields in our representation includes interaction-related provenance information.

Q. Reference paper IDs – for each interaction, we list IDs of published papers that mention the interaction. This information is obtained directly from reading output.
R. Sentences – for each interaction, we list sentences describing the interaction. This information is obtained directly from reading output.

It is worth mentioning that this representation format can be converted into the SBML format to be used by different software tools and shared between different working groups. Additionally, the tabular format provides an interface that can be easily created or read by biologists, and generated or parsed by a machine.
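To make the three field groups concrete, one row of the model table can be sketched as a Python data class. This is an illustrative sketch only: the class name ModelRow, the attribute names, the ';'-joined encoding of list-valued fields, and the underscore-joined variable name are assumptions, not the file layout used in this work. Only the field letters A through R and their meanings are taken from the description above, and the example values are for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, field, fields
from typing import List


@dataclass
class ModelRow:
    # Group 1: information about the modeled system component (fields A-J).
    name: str = ""              # A
    nomenclature_id: str = ""   # B
    element_type: str = ""      # C
    unique_id: str = ""         # D
    location: str = ""          # E
    location_id: str = ""       # F
    cell_line: str = ""         # G
    cell_type: str = ""         # H
    tissue_type: str = ""       # I
    organism: str = ""          # J
    # Group 2: information about the component's regulators (fields L-P).
    positive_regulators: List[str] = field(default_factory=list)     # L
    negative_regulators: List[str] = field(default_factory=list)     # M
    interaction_types: List[str] = field(default_factory=list)       # N
    interaction_mechanisms: List[str] = field(default_factory=list)  # O
    interaction_scores: List[float] = field(default_factory=list)    # P
    # Group 3: provenance (fields Q-R).
    reference_paper_ids: List[str] = field(default_factory=list)     # Q
    sentences: List[str] = field(default_factory=list)               # R

    @property
    def variable(self) -> str:
        """Field K, built from fields B, C, E, and H (the separator is an assumption)."""
        parts = [self.nomenclature_id, self.element_type, self.location, self.cell_type]
        return "_".join(p.replace(" ", "") for p in parts if p)


def rows_to_csv(rows: List[ModelRow]) -> str:
    """Serialize rows to CSV text, one table row per model element."""
    columns = [f.name for f in fields(ModelRow)] + ["variable"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for row in rows:
        record = {
            key: ";".join(str(item) for item in value) if isinstance(value, list) else value
            for key, value in asdict(row).items()
        }
        record["variable"] = row.variable
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    egfr = ModelRow(
        name="Epidermal growth factor receptor",
        nomenclature_id="EGFR",
        element_type="Protein",
        location="Plasma membrane",
        location_id="GO:0005886",
        positive_regulators=["EGF"],  # illustrative value only
    )
    print(rows_to_csv([egfr]))
```

In such a layout, a new system component is a new ModelRow, and a new component-related feature is a new attribute, which mirrors the row and column extension described for the tabular format.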

4 From Reading to Model

We obtain outputs from three types of reading engines, namely REACH, RUBICON, and LTR, which produce output files with similar but not exactly the same format. In Table 3, we list the interaction mechanisms that can be obtained from these three reading engines, and in the following sub-sections we outline their differences and the advantages of each reading engine.


Table 3. Intracellular interactions (mechanisms) recognized by the three reading engines.

Reading engine    Recognized mechanisms
REACH             Activation, Inhibition, Binding, Phosphorylation, Dephosphorylation, Ubiquitination, Acetylation, Methylation, Increase or Decrease Amount, Transcription, Translocation
RUBICON           Activation, Inhibition, Promotes, Signaling, Reduce, Induce, Supports, Attenuates, Stimulate, Antagonize, Synergize, Increase and Decrease Amount, Abrogates
LTR               Binding, Phosphorylation, Dephosphorylation, Isomerizations

4.1 Simple Interaction Translation

The first type of reading engine, REACH, can extract both direct and indirect interactions, as well as interaction mechanisms, where available. The simplest and most common reading outputs are those that include only a regulated element and a single regulator, each of them having one of the entity types listed in Table 1, with the interaction mechanism being one of the mechanisms described in Table 3. Such interactions have a straightforward translation to our representation format, that is, they are translated into a single table row with some or all of the fields described in Sect. 3. Given that our modeling formalism accounts for positive and negative regulators, while reading engines can also output specific mechanisms where available in text, we assume in the translation that Phosphorylation, Acetylation, Increase Amount, and Methylation represent positive regulations, and Dephosphorylation, Ubiquitination, Decrease Amount, and Demethylation represent negative regulations. Additionally, we treat Transcription events as positive regulation.
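The sign assignment described above can be sketched as a small lookup in Python. This is a hedged illustration, not the translation code used in this work: the names MECHANISM_SIGN and translate_simple_interaction and the dictionary keys of the returned row are hypothetical, mechanism labels are assumed to be already normalized to the names used in the text, and the entries for Activation and Inhibition are an assumption that is not stated explicitly above.

```python
from typing import Dict, Optional

# Sign assigned to each mechanism. The specific mechanisms and Transcription
# follow the text; Activation and Inhibition are assumed to map to positive
# and negative regulation, respectively.
MECHANISM_SIGN: Dict[str, str] = {
    "Activation": "positive",   # assumption
    "Inhibition": "negative",   # assumption
    "Phosphorylation": "positive",
    "Acetylation": "positive",
    "Increase Amount": "positive",
    "Methylation": "positive",
    "Transcription": "positive",
    "Dephosphorylation": "negative",
    "Ubiquitination": "negative",
    "Decrease Amount": "negative",
    "Demethylation": "negative",
}


def translate_simple_interaction(
    regulated: str, regulator: str, mechanism: str
) -> Optional[dict]:
    """Translate one (regulated element, single regulator) output into one row.

    Returns None for mechanisms without an assigned sign in this sketch,
    such as Binding or Translocation, which need separate handling.
    """
    sign = MECHANISM_SIGN.get(mechanism)
    if sign is None:
        return None
    row = {
        "nomenclature_id": regulated,
        "positive_regulators": [],
        "negative_regulators": [],
        "interaction_mechanism": mechanism,
    }
    row[f"{sign}_regulators"].append(regulator)
    return row


if __name__ == "__main__":
    # ELEMENT_X and ELEMENT_Y are placeholder names, not real reading output.
    print(translate_simple_interaction("ELEMENT_X", "ELEMENT_Y", "Phosphorylation"))
    print(translate_simple_interaction("ELEMENT_X", "ELEMENT_Y", "Binding"))
```

Keeping the mapping in a single table makes the modeling assumption explicit and easy to revise, for example if a mechanism such as Phosphorylation should be treated as negative regulation for a particular element.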

4.2 Translation of Translocation Interaction

We translate translocation events (moving components from one cellular location to another) using the formalism described in
