Machine Learning, Optimization, and Big Data 2017

  Giuseppe Nicosia · Panos Pardalos · Giovanni Giuffrida · Renato Umeton (Eds.)

Machine Learning, Optimization, and Big Data

  Third International Conference, MOD 2017, Volterra, Italy, September 14–17, 2017, Revised Selected Papers

  LNCS 10710

  

Lecture Notes in Computer Science 10710

  Commenced Publication in 1973
  Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

  Editorial Board

  David Hutchison Lancaster University, Lancaster, UK

  Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA

  Josef Kittler University of Surrey, Guildford, UK

  Jon M. Kleinberg Cornell University, Ithaca, NY, USA

  Friedemann Mattern ETH Zurich, Zurich, Switzerland

  John C. Mitchell Stanford University, Stanford, CA, USA

  Moni Naor Weizmann Institute of Science, Rehovot, Israel

  C. Pandu Rangan Indian Institute of Technology, Madras, India

  Bernhard Steffen TU Dortmund University, Dortmund, Germany

  Demetri Terzopoulos University of California, Los Angeles, CA, USA

  Doug Tygar University of California, Berkeley, CA, USA

  Gerhard Weikum Max Planck Institute for Informatics, Saarbrücken, Germany

  Editors

  Giuseppe Nicosia University of Catania, Catania, Italy
  Giovanni Giuffrida University of Catania, Catania, Italy
  Panos Pardalos University of Florida, Gainesville, FL, USA
  Renato Umeton Harvard University, Cambridge, MA, USA

  ISSN 1611-3349 (electronic)
  Lecture Notes in Computer Science

ISBN 978-3-319-72925-1
  ISBN 978-3-319-72926-8 (eBook)
  https://doi.org/10.1007/978-3-319-72926-8

  Library of Congress Control Number: 2017962876

  LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

  © Springer International Publishing AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG

  

Preface

  MOD is an international conference embracing the fields of machine learning, optimization, and data science. The third edition, MOD 2017, was held on September 14–17, 2017 in Volterra (Pisa, Italy), a stunning medieval town dominating the picturesque countryside of Tuscany.

  The key role of machine learning, reinforcement learning, artificial intelligence, large-scale optimization, and big data in developing solutions to some of the greatest challenges we face is undeniable. MOD 2017 attracted leading experts from academia and industry with the aim of strengthening the connection between the two. The 2017 edition of MOD gave professors, scientists, industry experts, and postgraduate students a great opportunity to learn about recent developments both in their own research areas and in neighboring ones, in an environment designed for sharing ideas and starting new collaborations.

  As chairs, we were honored to organize a premier conference in these areas and to receive a large variety of innovative and original scientific contributions. This edition featured six plenary lectures:

  • Yi-Ke Guo, Department of Computing, Faculty of Engineering, Imperial College London, UK; Founding Director of the Data Science Institute
  • Panos Pardalos, Department of Systems Engineering, University of Florida, USA; Director of the Center for Applied Optimization
  • Ruslan Salakhutdinov, Machine Learning Department, School of Computer Science, Carnegie Mellon University, USA; Director of AI Research at Apple
  • My Thai, Department of Computer and Information Science and Engineering, University of Florida, USA
  • Jun Pei, Hefei University of Technology, China
  • Vincenzo Sciacca, Cloud and Cognitive Division, IBM Rome, Italy

  There were also two tutorial speakers:

  • Domenico Talia, Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica, Università della Calabria, Italy
  • Xin-She Yang, School of Science and Technology, Middlesex University London, UK

  Moreover, the conference hosted the second edition of the industrial session on “Machine Learning, Optimization and Data Science for Real-World Applications”:

  • Luca Maria Aiello, Nokia Bell Labs, UK
  • Pierpaolo Basile, University of Bari, Italy
  • Carlos Castillo, Universitat Pompeu Fabra, Barcelona, Spain
  • Moderator: Aris Anagnostopoulos, Sapienza University of Rome, Italy

  We received 126 submissions from 46 countries and five continents; each manuscript was independently reviewed by at least five committee members through a blind review process. These proceedings contain 49 research articles written by leading scientists in the fields of machine learning, artificial intelligence, reinforcement learning, computational optimization, and data science, presenting a substantial array of ideas, technologies, algorithms, methods, and applications.

  For MOD 2017, Springer generously sponsored the MOD Best Paper Award. This year, the paper by Khaled Sayed, Cheryl Telmer, Adam Butchy, and Natasa Miskov-Zivanov titled “Recipes for Translating Big Data Machine Reading to Executable Cellular Signaling Models” received the MOD Best Paper Award.

  This conference could not have been organized without the contributions of these researchers, and we thank them all for participating. A sincere thank you also goes to the entire Program Committee, more than 300 scientists from academia and industry, for their valuable work in selecting the scientific contributions.

  Finally, we would like to express our appreciation to the keynote speakers, tutorial speakers, and the industrial panel who accepted our invitation, and to all the authors who submitted their research papers to MOD 2017.

  September 2017

  Giuseppe Nicosia Panos Pardalos

  Giovanni Giuffrida Renato Umeton

  

Organization

General Chair

  Renato Umeton Harvard University, USA

  Conference and Technical Program Committee Co-chairs

  Giuseppe Nicosia University of Catania, Italy and University of Reading, UK

  Panos Pardalos University of Florida, USA
  Giovanni Giuffrida University of Catania, Italy

  Tutorial Chair

  Giuseppe Narzisi New York University Tandon School of Engineering, USA

  Industrial Session Chairs

  Ilaria Bordino UniCredit R&D, Italy
  Marco Firrincieli UniCredit R&D, Italy
  Fabio Fumarola UniCredit R&D, Italy
  Francesco Gullo UniCredit R&D, Italy

  Organizing Committee

  Piero Conca CNR, Italy
  Jole Costanza Italian Institute of Technology, Milan, Italy
  Giorgio Jansen University of Catania, Italy
  Giuseppe Narzisi New York University Tandon School of Engineering, USA
  Andrea Patane’ University of Oxford, UK
  Andrea Santoro Queen Mary University London, UK
  Renato Umeton Harvard University, USA

  Technical Program Committee

  Agostinho Agra Universidade de Aveiro, Portugal
  Kerem Akartunali University of Strathclyde, UK
  Richard Allmendinger The University of Manchester, UK
  Aris Anagnostopoulos Università di Roma La Sapienza, Italy
  Takaya Arita Nagoya University, Japan
  Jason Atkin The University of Nottingham, UK
  Chloe-Agathe Azencott Institut Curie Research Centre, Paris, France
  Jaume Bacardit Newcastle University, UK
  James Bailey University of Melbourne, Australia
  Baski Balasundaram Oklahoma State University, USA
  Elena Baralis Politecnico di Torino, Italy
  Xabier E. Barandiaran University of the Basque Country, Spain
  Cristobal Barba-Gonzalez University of Malaga, Spain
  Helio J. C. Barbosa Laboratório Nacional de Computacao Cientifica, Brazil
  Roberto Battiti University of Trento, Italy
  Lucia Beccai Istituto Italiano di Tecnologia, Italy
  Aurelien Bellet Inria Lille, France
  Gerardo Beni University of California at Riverside, USA
  Khaled Benkrid The University of Edinburgh, UK
  Peter Bentley University College London, UK
  Katie Bentley Harvard Medical School, USA
  Heder Bernardino Universidade Federal de Juiz de Fora, Brazil
  Daniel Berrar Tokyo Institute of Technology, Japan
  Adam Berry CSIRO, Australia
  Luc Berthouze University of Sussex, UK
  Martin Berzins SCI Institute, University of Utah, USA
  Mauro Birattari IRIDIA, Université Libre de Bruxelles, Belgium
  Leonidas Bleris University of Texas at Dallas, USA
  Christian Blum Spanish National Research Council, Spain
  Paul Bourgine École Polytechnique Paris, France
  Anthony Brabazon University College Dublin, Ireland
  Paulo Branco Instituto Superior Tecnico, Portugal
  Juergen Branke University of Warwick, UK
  Larry Bull University of the West of England, UK
  Tadeusz Burczynski Polish Academy of Sciences, Poland
  Robert Busa-Fekete Yahoo! Research, NY, USA
  Sergiy I Butenko Texas A&M University, USA
  Stefano Cagnoni University of Parma, Italy
  Yizhi Cai University of Edinburgh, UK
  Guido Caldarelli IMT Lucca, Italy
  Alexandre Campo Université Libre de Bruxelles, Belgium
  Angelo Cangelosi University of Plymouth, UK
  Salvador Eugenio Caoili University of the Philippines Manila, Philippines
  Timoteo Carletti University of Namur, Belgium
  Jonathan Carlson Microsoft Research, USA
  Celso Carneiro Ribeiro Universidade Federal Fluminense, Brazil
  Michelangelo Ceci University of Bari, Italy
  Adelaide Cerveira Universidade de Tras-os-Montes e Alto Douro, Portugal

  Xu Chang University of Sydney, Australia
  W. Art Chaovalitwongse University of Washington, USA
  Antonio Chella Università di Palermo, Italy
  Ying-Ping Chen National Chiao Tung University, Taiwan
  Haifeng Chen NEC Labs, USA
  Keke Chen Wright State University, USA
  Gregory Chirikjian Johns Hopkins University, USA
  Silvia Chiusano Politecnico di Torino, Italy
  Miroslav Chlebik University of Sussex, UK
  Sung-Bae Cho Yonsei University, South Korea
  Anders Christensen Lisbon University Institute, Portugal
  Dominique Chu University of Kent, UK
  Philippe Codognet University Pierre and Marie Curie – Paris 6, France
  Carlos Coello Coello CINVESTAV-IPN, Mexico
  George Coghill University of Aberdeen, UK
  Pietro Colombo University of Insubria, Italy
  David Cornforth University of Newcastle, UK
  Luís Correia University of Lisbon, Portugal
  Chiara Damiani University of Milan-Bicocca, Italy
  Thomas Dandekar University of Würzburg, Germany
  Ivan Luciano Danesi Unicredit Bank, Italy
  Christian Darabos Dartmouth College, USA
  Kalyanmoy Deb Michigan State University, USA
  Nicoletta Del Buono University of Bari, Italy
  Jordi Delgado Universitat Politecnica de Catalunya, Spain
  Ralf Der MPG, Germany
  Clarisse Dhaenens Université Lille, France
  Barbara Di Camillo University of Padua, Italy
  Gianni Di Caro IDSIA, Switzerland
  Luigi Di Caro University of Turin, Italy
  Luca Di Gaspero University of Udine, Italy
  Peter Dittrich Friedrich Schiller University of Jena, Germany
  Federico Divina Pablo de Olavide University of Seville, Spain
  Stephan Doerfel Kassel University, Germany
  Devdatt Dubhashi Chalmers University, Sweden
  George Dulikravich Florida International University, USA
  Juan J. Durillo University of Innsbruck, Austria
  Omer Dushek University of Oxford, UK
  Marc Ebner Ernst-Moritz-Arndt-Universität Greifswald, Germany
  Pascale Ehrenfreund The George Washington University, USA
  Gusz Eiben VU Amsterdam, The Netherlands
  Aniko Ekart Aston University, UK
  Talbi El-Ghazali University of Lille, France
  Michael Elberfeld RWTH Aachen University, Germany
  Michael T. M. Emmerich Leiden University, The Netherlands

  Anton Eremeev Sobolev Institute of Mathematics, Russia
  Harold Fellermann Newcastle University, UK
  Chrisantha Fernando Queen Mary University, UK
  Cesar Ferri Universidad Politecnica de Valencia, Spain
  Paola Festa University of Naples Federico II, Italy
  Jose Rui Figueira Instituto Superior Tecnico, Lisbon, Portugal
  Grazziela Figueredo The University of Nottingham, UK
  Alessandro Filisetti Explora Biotech Srl, Italy
  Christoph Flamm University of Vienna, Austria
  Enrico Formenti Nice Sophia Antipolis University, France
  Giuditta Franco University of Verona, Italy
  Piero Fraternali Politecnico di Milano, Italy
  Valerio Freschi University of Urbino, Italy
  Enrique Frias Martinez Telefonica Research, Spain
  Walter Frisch University of Vienna, Austria
  Rudolf M. Fuchslin Zurich University of Applied Sciences, Switzerland
  Claudio Gallicchio University of Pisa, Italy
  Patrick Gallinari LIP6 – University of Paris 6, France
  Luca Gambardella IDSIA, Switzerland
  Jean-Gabriel Ganascia Pierre and Marie Curie University – LIP6, France
  Xavier Gandibleux Université de Nantes, France
  Alfredo G. Hernandez-Diaz Pablo de Olavide University – Seville, Spain
  Jose Manuel Garcia Nieto University of Malaga, Spain
  Paolo Garza Politecnico di Torino, Italy
  Romaric Gaudel Inria, France
  Nicholas Geard University of Melbourne, Australia
  Philip Gerlee Chalmers University, Sweden
  Mario Giacobini University of Turin, Italy
  Onofrio Gigliotta University of Naples Federico II, Italy
  Giovanni Giuffrida University of Catania, Italy
  Giorgio Stefano Gnecco University of Genoa, Italy
  Christian Gogu Université Toulouse III, France
  Faustino Gomez IDSIA, Switzerland
  Michael Granitzer University of Passau, Germany
  Alex Graudenzi University of Milan-Bicocca, Italy
  Julie Greensmith University of Nottingham, UK
  Roderich Gross The University of Sheffield, UK
  Mario Guarracino ICAR-CNR, Italy
  Francesco Gullo Unicredit Bank, Italy
  Steven Gustafson GE Global Research, USA
  Jin-Kao Hao University of Angers, France
  Simon Harding Machine Intelligence Ltd., Canada
  Richard Hartl University of Vienna, Austria
  Inman Harvey University of Sussex, UK
  Jamil Hasan University of Idaho, USA

  Geir Hasle SINTEF ICT, Norway
  Carlos Henggeler Antunes University of Coimbra, Portugal
  Francisco Herrera University of Granada, Spain
  Arjen Hommersom Radboud University, The Netherlands
  Vasant Honavar Pennsylvania State University, USA
  Fabrice Huet University of Nice Sophia Antipolis, France
  Hiroyuki Iizuka Hokkaido University, Japan
  Takashi Ikegami University of Tokyo, Japan
  Bordino Ilaria Unicredit Bank, Italy
  Hisao Ishibuchi Osaka Prefecture University, Japan
  Peter Jacko Lancaster University Management School, UK
  Christian Jacob University of Calgary, Canada
  Yaochu Jin University of Surrey, UK
  Colin Johnson University of Kent, UK
  Gareth Jones Dublin City University, Ireland
  Laetitia Jourdan Inria/LIFL/CNRS, France
  Narendra Jussien Ecole des Mines de Nantes/LINA, France
  Janusz Kacprzyk Polish Academy of Sciences, Poland
  Theodore Kalamboukis Athens University of Economics and Business, Greece
  George Kampis Eotvos University, Hungary
  Dervis Karaboga Erciyes University, Turkey
  George Karakostas McMaster University, Canada
  Istvan Karsai ETSU, USA
  Jozef Kelemen Silesian University, Czech Republic
  Graham Kendall Nottingham University, UK
  Didier Keymeulen NASA – Jet Propulsion Laboratory, USA
  Daeeun Kim Yonsei University, South Korea
  Zeynep Kiziltan University of Bologna, Italy
  Georg Krempl University of Magdeburg, Germany
  Erhun Kundakcioglu Ozyegin University, Turkey
  Renaud Lambiotte University of Namur, Belgium
  Doron Lancet Weizmann Institute of Science, Israel
  Pier Luca Lanzi Politecnico di Milano, Italy
  Sanja Lazarova-Molnar University of Southern Denmark, Denmark
  Doheon Lee KAIST, South Korea
  Jay Lee Center for Intelligent Maintenance Systems – UC, USA
  Eva K. Lee Georgia Tech, USA
  Tom Lenaerts Université Libre de Bruxelles, Belgium
  Rafael Leon Universidad Politecnica de Madrid, Spain
  Shuai Li Cambridge University, UK
  Lei Li Florida International University, USA
  Xiaodong Li RMIT University, Australia
  Joseph Lizier The University of Sydney, Australia
  Giosue’ Lo Bosco Università di Palermo, Italy
  Daniel Lobo University of Maryland Baltimore County, USA

  Daniele Loiacono Politecnico di Milano, Italy
  Jose A. Lozano University of the Basque Country, Spain
  Paul Lu University of Alberta, Canada
  Angelo Lucia University of Rhode Island, USA
  Dario Maggiorini University of Milan, Italy
  Gilvan Maia Universidade Federal do Ceará, Brazil
  Donato Malerba University of Bari, Italy
  Lina Mallozzi University of Naples Federico II, Italy
  Jacek Mandziuk Warsaw University of Technology, Poland
  Vittorio Maniezzo University of Bologna, Italy
  Marco Maratea University of Genoa, Italy
  Elena Marchiori Radboud University, The Netherlands
  Tiziana Margaria University of Limerick and Lero, Ireland
  Omer Markovitch University of Groningen, The Netherlands
  Carlos Martin-Vide Rovira i Virgili University, Spain
  Dominique Martinez LORIA, France
  Matteo Matteucci Politecnico di Milano, Italy
  Giancarlo Mauri University of Milan-Bicocca, Italy
  Mirjana Mazuran Politecnico di Milano, Italy
  Suzanne McIntosh NYU Courant Institute, and Cloudera Inc., USA
  Peter Mcowan Queen Mary University, UK
  Gabor Melli Sony Interactive Entertainment Inc., Japan
  Jose Fernando Mendes University of Aveiro, Portugal
  David Merodio-Codinachs ESA, France
  Silja Meyer-Nieberg Universität der Bundeswehr München, Germany
  Martin Middendorf University of Leipzig, Germany
  Taneli Mielikainen Nokia, Finland
  Kaisa Miettinen University of Jyvaskyla, Finland
  Orazio Miglino University of Naples “Federico II”, Italy
  Julian Miller University of York, UK
  Marco Mirolli ISTC-CNR, Italy
  Natasa Miskov-Zivanov University of Pittsburgh, USA
  Carmen Molina-Paris University of Leeds, UK
  Sara Montagna Università di Bologna, Italy
  Marco Montes de Oca Clypd, Inc., USA
  Sanaz Mostaghim Otto von Guericke University Magdeburg, Germany
  Mohamed Nadif University of Paris Descartes, France
  Hidemoto Nakada NIAIST, Japan
  Amir Nakib Università Paris EST Creteil, Laboratoire LISSI, France
  Mirco Nanni CNR – ISTI, Italy
  Sriraam Natarajan Indiana University, USA
  Chrystopher L. Nehaniv University of Hertfordshire, UK
  Michael Newell Athens Consulting, LLC
  Giuseppe Nicosia University of Catania, Italy
  Xia Ning IUPUI, USA

  Eirini Ntoutsi Leibniz University of Hanover, Germany
  Michal Or-Guil Humboldt University of Berlin, Germany
  Mathias Pacher Goethe-Universität Frankfurt am Main, Germany
  Ping-Feng Pai National Chi Nan University, Taiwan
  Wei Pang University of Aberdeen, UK
  George Papastefanatos IMIS/RC Athena, Greece
  Luis Paquete University of Coimbra, Portugal
  Panos Pardalos University of Florida, USA
  Andrew J. Parkes Nottingham University, UK
  Andrea Patane’ University of Oxford, UK
  Joshua Payne University of Zurich, Switzerland
  Jun Pei University of Florida, USA
  Nikos Pelekis University of Piraeus, Greece
  Dimitri Perrin Queensland University of Technology, Australia
  Koumoutsakos Petros ETH, Switzerland
  Juan Peypouquet Universidad Tecnica Federico Santa Maria, Chile
  Andrew Philippides University of Sussex, UK
  Vincenzo Piuri University of Milan, Italy
  Alessio Plebe University of Messina, Italy
  Silvia Poles Noesis Solutions NV
  Philippe Preux Inria, France
  Mikhail Prokopenko University of Sydney, Australia
  Paolo Provero University of Turin, Italy
  Buyue Qian IBM T. J. Watson, USA
  Chao Qian University of Science and Technology of China, China
  Gunther Raidl TU Wien, Austria
  Helena R. Dias Lourenco Pompeu Fabra University, Spain
  Palaniappan Ramaswamy University of Kent, UK
  Jan Ramon Inria, France
  Vitorino Ramos Technical University of Lisbon, Portugal
  Shoba Ranganathan Macquarie University, Australia
  Cristina Requejo Universidade de Aveiro, Portugal
  John Rieffel Union College, USA
  Laura Anna Ripamonti Università degli Studi di Milano, Italy
  Eduardo Rodriguez-Tello Cinvestav-Tamaulipas, Mexico
  Andrea Roli Università di Bologna, Italy
  Vittorio Romano University of Catania, Italy
  Andre Rosendo University of Cambridge, UK
  Samuel Rota Bulo Fondazione Bruno Kessler, Italy
  Arnab Roy Fujitsu Laboratories of America, USA
  Alessandro Rozza Parthenope University of Naples, Italy
  Kepa Ruiz-Mirazo University of the Basque Country, Spain
  Florin Rusu University of California Merced, USA
  Jakub Rydzewski N. Copernicus University, Poland
  Nick Sahinidis Carnegie Mellon University, USA

  Francisco C. Santos INESC-ID Instituto Superior Tecnico, Portugal
  Claudio Sartori University of Bologna, Italy
  Frederic Saubion Université d’Angers, France
  Andrea Schaerf University of Udine, Italy
  Oliver Schuetze CINVESTAV-IPN, Mexico
  Luis Seabra Lopes Universidade de Aveiro, Portugal
  Roberto Serra University of Modena and Reggio Emilia, Italy
  Marc Sevaux Lab-STICC, Université de Bretagne-Sud, France
  Ruey-Lin Sheu National Cheng Kung University, Taiwan
  Hsu-Shih Shih Tamkang University, Taiwan
  Patrick Siarry Université de Paris 12, France
  Alkis Simitsis HP Labs, USA
  Johannes Sollner Emergentec Biodevelopment GmbH, Germany
  Ichoua Soumia Embry-Riddle Aeronautical University, USA
  Giandomenico Spezzano CNR-ICAR, Italy
  Antoine Spicher LACL University of Paris Est Creteil, France
  Pasquale Stano University of Salento, Italy
  Thomas Stibor GSI Helmholtz Centre for Heavy Ion Research, Germany
  Catalin Stoean University of Craiova, Romania
  Reiji Suzuki Nagoya University, Japan
  Domenico Talia University of Calabria, Italy
  Kay Chen Tan National University of Singapore, Singapore
  Letizia Tanca Politecnico di Milano, Italy
  Charles Taylor UCLA, USA
  Maguelonne Teisseire Cemagref – UMR Tetis, France
  Tzouramanis Theodoros University of the Aegean, Greece
  Jon Timmis University of York, UK
  Gianna Toffolo University of Padua, Italy
  Joo Chuan Tong Institute of HPC, Singapore
  Nickolay Trendafilov Open University, UK
  Soichiro Tsuda University of Glasgow, UK
  Shigeyoshi Tsutsui Hannan University, Japan
  Aditya Tulsyan MIT, USA
  Ali Emre Turgut IRIDIA-ULB, Belgium
  Karl Tuyls University of Liverpool, UK
  Jon Umerez University of the Basque Country, Spain
  Renato Umeton Harvard University, USA
  Ashish Umre University of Sussex, UK
  Olgierd Unold Politechnika Wroclawska, Poland
  Giorgio Valentini Università degli Studi di Milano, Italy
  Edgar Vallejo ITESM Campus Estado de Mexico, Mexico
  Sergi Valverde Pompeu Fabra University, Spain
  Werner Van Geit EPFL, Switzerland
  Pascal Van Hentenryck University of Michigan, USA

  Carlos Varela Rensselaer Polytechnic Institute, USA
  Eleni Vasilaki University of Sheffield, UK
  Richard Vaughan Simon Fraser University, Canada
  Kalyan Veeramachaneni MIT, USA
  Vassilios Verykios Hellenic Open University, Greece
  Mario Villalobos-Arias Universidad de Costa Rica, Costa Rica
  Marco Villani University of Modena and Reggio Emilia, Italy
  Katya Vladislavleva Evolved Analytics LLC, Belgium
  Stefan Voss University of Hamburg, Germany
  Dean Vucinic Vrije Universiteit Brussel, Belgium
  Markus Wagner The University of Adelaide, Australia
  Toby Walsh UNSW Sydney, Australia
  Lipo Wang Nanyang Technological University, Singapore
  Liqiang Wang University of Central Florida, USA
  Rainer Wansch Fraunhofer IIS, Germany
  Syed Waziruddin Kansas State University, USA
  Janet Wiles University of Queensland, Australia
  Man Leung Wong Lingnan University, Hong Kong, SAR China
  Andrew Wuensche University of Sussex, UK
  Petros Xanthopoulos University of Central Florida, USA
  Ning Xiong Malardalen University, Sweden
  Xin Xu George Washington University, USA
  Gur Yaari Yale University, USA
  Larry Yaeger Indiana University, USA
  Shengxiang Yang De Montfort University, UK
  Qi Yu Rochester Institute of Technology, USA
  Zelda Zabinsky University of Washington, USA
  Ras Zbyszek University of North Carolina, USA
  Hector Zenil University of Oxford, UK
  Guang Lan Zhang Boston University, USA
  Qingfu Zhang City University of Hong Kong, Hong Kong, SAR China
  Rui Zhang IBM Research – Almaden, USA
  Zhi-Hua Zhou Nanjing University, China
  Tom Ziemke University of Skovde, Sweden
  Antanas Zilinskas Vilnius University, Lithuania

  Best Paper Awards

  MOD 2017 Best Paper Award
  “Recipes for Translating Big Data Machine Reading to Executable Cellular Signaling Models”
  Khaled Sayed*, Cheryl Telmer**, Adam Butchy*, and Natasa Miskov-Zivanov*
  * University of Pittsburgh, USA
  ** Carnegie Mellon University, USA

  Springer sponsored the MOD 2017 Best Paper Award with a cash prize of EUR 1,000.

  MOD 2016 Best Paper Award
  “Machine Learning: Multi-site Evidence-Based Best Practice Discovery”
  Eva Lee, Yuanbo Wang, and Matthew Hagen
  Eva K. Lee, Professor, Director, Center for Operations Research in Medicine and HealthCare, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA

  MOD 2015 Best Paper Award
  “Learning with Discrete Least Squares on Multivariate Polynomial Spaces Using Evaluations at Random or Low-Discrepancy Point Sets”
  Giovanni Migliorati
  Ecole Polytechnique Federale de Lausanne – EPFL, Lausanne, Switzerland

  

Contents

   Khaled Sayed, Cheryl A. Telmer, Adam A. Butchy, and Natasa Miskov-Zivanov

   Michael Cohen

   Gregor Ulm, Emil Gustavsson, and Mats Jirstrand

   Tome Eftimov, Peter Korošec, and Barbara Koroušić Seljak

   Angelo Lucia, Edward Thomas, and Peter A. DiMaggio

  Ahmad Mazyad, Fabien Teytaud, and Cyril Fonlupt

  S. P. Sidorov, S. V. Mironov, and M. G. Pleshakov

  Danny D’Agostino, Andrea Serani, Emilio F. Campana, and Matteo Diez


   Manousos Rigakis, Dimitra Trachanatzi, Magdalene Marinaki, and Yannis Marinakis

   Stefano Cagnoni, Paolo Fornacciari, Juxhino Kavaja, Monica Mordonini, Agostino Poggi, Alex Solimeo, and Michele Tomaiuolo

   Mauro Dell’Amico, Natalia Selini Hadjidimitriou, Thorsten Koch, and Milena Petkovic

   Maria João Alves and Carlos Henggeler Antunes

   Jason Adair, Alexander Brownlee, Fabio Daolio, and Gabriela Ochoa

   Riccardo Pellegrini, Andrea Serani, Giampaolo Liuzzi, Francesco Rinaldi, Stefano Lucidi, Emilio F. Campana, Umberto Iemma, and Matteo Diez

   Beatrice Lazzerini and Francesco Pistolesi

   Alice Plebe and Mario Pavone

   Stéphane Chrétien and Sébastien Darses


  Francesco Bagattini, Paola Cappanera, and Fabio Schoen

   Joana Dias, Humberto Rocha, Tiago Ventura, Brígida Ferreira,


   Ogerta Elezaj, Sule Yildirim, and Edlira Kalemi

   Zekarias T. Kefato, Nasrullah Sheikh, and Alberto Montresor

   Margarita Zaleshina, Alexander Zaleshin, and Adriana Galvani

   Ziad Salem, Gerald Radspieler, Karlo Griparić, and Thomas Schmickl

   Ramses Sala, Niccolò Baldanzini, and Marco Pierini

  

  Roy Khristopher Bayot and Teresa Gonçalves

  Peng Shi and Dario Landa-Silva

  Iván Darío López, Cristian Heidelberg Valencia, and Juan Carlos Corrales

  Kamer Kaya, Ş. İlker Birbil, M. Kaan Öztürk, and Amir Gohari

  Mingxi Li, Yusuke Tanimura, and Hidemoto Nakada

  Marco Baioletti, Gabriele Di Bari, Valentina Poggioni, and Mirco Tracolli

   Chunfeng Ma, Min Kong, Jun Pei, and Panos M. Pardalos

   Guillermo Rela, Franco Robledo, and Pablo Romero

  

  Gabriel Bayá, Antonio Mauttone, Franco Robledo, and Pablo Romero

  Cristian Galleguillos, Alina Sîrbu, Zeynep Kiziltan, Ozalp Babaoglu, Andrea Borghesi, and Thomas Bridi

   Boris Musarais

  

  Olgierd Unold and Radosław Tarnawski

  Humberto Rocha and Joana Dias

  Natalia Castro, Graciela Ferreira, Franco Robledo, and Pablo Romero

  Matthias Horn, Günther Raidl, and Christian Blum

  Reiji Hatsugai and Mary Inaba

  Benedikt Klocker, Herbert Fleischner, and Günther R. Raidl

  Francesco Calimeri, Mirco Caracciolo, Aldo Marzullo, and Claudio Stamile

  Roberto Aringhieri, Davide Dell’Anna, Davide Duma, and Michele Sonnessa


  

  Christopher Bacher and Günther R. Raidl

  Stefano Mauceri, Louis Smith, James Sweeney, and James McDermott

  Alberto Castellini and Giuditta Franco

  Author Index

  

Recipes for Translating Big Data Machine Reading to Executable Cellular Signaling Models

Khaled Sayed¹, Cheryl A. Telmer², Adam A. Butchy³, and Natasa Miskov-Zivanov¹,³,⁴(&)

¹ Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA
{k.sayed,nmzivanov}@pitt.edu
² Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
[email protected]
³ Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
[email protected]
⁴ Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA

Abstract. Biological literature is rich in mechanistic information that can be used to construct executable models of complex systems and thereby increase our understanding of health and disease. However, the literature is vast and fragmented, and therefore automating both the extraction of information from papers and the assembly of models from the extracted information is necessary. Here we describe our approach for translating machine reading outputs, obtained by reading the biological signaling literature, into discrete models of cellular networks. We use outputs from three different reading engines and demonstrate the translation of different features using examples from the cancer literature. We also outline several issues that still arise when assembling cellular network models from the outputs of state-of-the-art reading engines. Finally, we illustrate the details of our approach with a case study in pancreatic cancer.

Keywords: Machine reading · Big data in literature · Text mining · Cell signaling networks · Automated model generation

1 Introduction

  Biological knowledge is voluminous and fragmented; it is nearly impossible to read all scientific papers on a single topic such as cancer. When building a model of a particular biological system, for example the cancer microenvironment, researchers usually start by searching for existing relevant models and by looking for information about system components and their interactions in the published literature.

  Although there have been attempts to automate the process of model building […], most often modelers conduct these steps manually, with multiple iterations between (i) information extraction, (ii) model assembly, (iii) model analysis, and (iv) model validation through comparison with the most recently published results. To allow for rapid modeling of complex diseases like cancer, and to make efficient use of the ever-increasing amount of information in published work, we need representation standards and interfaces that allow these tasks to be automated. This, in turn, will allow researchers to ask informed, interesting questions that can improve our understanding of health and disease.

  The systems biology community has designed and proposed a standardized format for representing biological models, called the Systems Biology Markup Language (SBML). This language allows models to be used with different software tools without re-creating them for each tool, and to be shared between research groups […].

  However, the SBML standard is not easily understood by biologists who create mechanistic models; it therefore requires an interface that allows biologists to focus on modeling tasks while hiding the details of the SBML language […].

  To this end, the contributions of the work presented in this paper include:

  • A representation format that is straightforward to use by both machines and humans, and allows for efficient synthesis of models from big data in literature.
  • An approach to effectively use state-of-the-art machine reading output to create executable discrete models of cellular signaling.
  • Proposed directions for further automating the assembly of models from big data in literature.

  In Sect. 2, we briefly describe cellular networks, our modeling approach, and our framework that integrates machine reading, model assembly, and model analysis. Section 3 describes our model representation format, and Sect. 4 outlines our approach to translating reading output into this format. Section […] discusses other issues that need to be taken into account when building an interface between big data reading and model assembly in biology. Section […] describes a case study that uses our translation methodology, and Sect. […] concludes the paper.

  2 Background

  2.1 Cellular Networks

  Intracellular networks include signal transduction, gene regulation, and metabolic networks […]. Signaling networks are characterized by protein phosphorylation and binding events, which transduce extracellular signals across the plasma membrane and through the cytoplasm […]. Gene regulatory networks involve the translocation of signaling proteins from the cytoplasm to the nucleus, where the integration of these protein signals acts on the genome, resulting in changes in gene expression and cellular processes […]. The regulation of metabolic networks incorporates phosphorylation and binding, as do signaling networks, and also integrates allosteric regulation, other protein modifications, and subcellular compartmentalization […].

  Recipes for Translating Big Data Machine Reading

  3

  Intercellular networks involve interactions between cells of the same or different types. These interactions occur either via signaling molecules, such as growth factors and cytokines, that are synthesized and secreted by one cell and bind to that same cell or to other cells in its surroundings, or via cell-cell contact.

  At all levels of signaling, there are feedforward and feedback loops and crosstalk between signaling pathways to either maintain homeostasis or amplify changes initiated by extracellular signals […].

  2.2 Modeling Approach

  When generating executable models, we use a discrete modeling approach previously described in […]. As illustrated by the toy example in Fig. 1, we represent system components as model elements (A, B, and C in the example), where each element is defined as having a discrete number of activity levels. Each element has a list of regulators called its influence set. In our example, A is a positive regulator of C, B and C are positive regulators of A, and C activates itself while B inhibits itself. Additionally, each element has a corresponding update rule, a discrete function of its regulators. In our example, the update rule of A is a conjunction of B and C, while that of C is a disjunction of A and C. Although the model structure is fixed, the simulator that we use […] is stochastic and thus allows for closely recapitulating the behavior of biological pathways and networks.

  Fig. 1. Toy example illustrating our modeling approach.
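  To make the toy example concrete, the Python sketch below encodes the three elements with two activity levels (0 and 1). The influence sets and the rules for A and C follow the description above. The rule for B is not given in the text, so we assume, purely for illustration, that its self-inhibition is a logical NOT. The random-order, one-element-at-a-time update is a common stochastic scheme that we use only as a stand-in; it is not a description of the simulator referenced above.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]

# Influence sets: element -> (positive regulators, negative regulators), as described for Fig. 1.
INFLUENCES: Dict[str, Tuple[List[str], List[str]]] = {
    "A": (["B", "C"], []),
    "B": ([], ["B"]),
    "C": (["A", "C"], []),
}

# Update rules as discrete (here Boolean) functions of each element's regulators.
RULES: Dict[str, Callable[[State], int]] = {
    "A": lambda s: int(s["B"] and s["C"]),  # conjunction of B and C
    "B": lambda s: int(not s["B"]),         # ASSUMPTION: self-inhibition modeled as NOT B
    "C": lambda s: int(s["A"] or s["C"]),   # disjunction of A and C
}

def simulate(initial: State, steps: int, seed: int = 0) -> List[State]:
    """Each step updates every element once, in a freshly shuffled order,
    always reading the most recent state; returns the initial state plus one state per step."""
    rng = random.Random(seed)
    state = dict(initial)
    trace = [dict(state)]
    for _ in range(steps):
        order = list(RULES)
        rng.shuffle(order)
        for name in order:
            state[name] = RULES[name](state)
        trace.append(dict(state))
    return trace
```

  For example, `simulate({"A": 0, "B": 1, "C": 1}, steps=5)` keeps C at 1 throughout (C sustains itself), while A follows B, whose value depends on the random update order.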

  2.3 Framework Overview

  To automatically incorporate new reading outputs into models, we have developed a reading-modeling-explanation framework called DySE (Dynamic System Explanation), outlined in Fig. 2. This framework allows for (i) expansion of existing models or assembly of new models from machine reading output, (ii) analysis and explanation of models, and (iii) generation of machine-readable feedback to reading engines. We focus here on the front end of the framework: the translation from reading outputs to the list of elements and their influence sets, together with context information where available.

3 Model Representation Format

  To enable comprehensive translation from reading engine outputs to executable models, the models are first represented in a tabular format. Note that the tabular representation does not include final update rules; the tabular version of the model is further translated into an executable model that can be simulated.

Fig. 2. DySE framework.

  Each row in the model table corresponds to one specific model element (i.e., a modeled system component), and the columns are organized in three groups: (i) information about the modeled system component, (ii) information about the component's regulators, and (iii) information about knowledge sources. This format makes model extension straightforward: additional system components are represented as new rows in the table, and additional component-related features as new columns. New columns are added as machine reading improves.

  The first group of fields in our representation format includes information related to the system component. This information is either used by the executable model or kept as background information that provides specific details about the system component when creating a hypothesis or explaining the outcomes of wet lab experiments.

  A. Name – full name of element, e.g., “Epidermal growth factor receptor”.

B. Nomenclature ID – name commonly used in the field for cellular components, e.g., “EGFR” is used for “Epidermal growth factor receptor”.

  C. Type – the entity type used by the reading engines, as listed in Table 1.

  D. Unique ID – we use identifiers corresponding to elements that are listed in databases, according to Table 1.

  E. Location – we include subcellular locations and the extracellular space, as listed in Table 2.

  F. Location identifier – we use location identifiers as listed in Table 2.

  G. Cell line – obtained from reading output.

  H. Cell type – obtained from reading outputs.

  Table 1. Element type and ID database.

  Element type          Database name
  Protein               UniProt […]
  Protein family        Pfam […]
  Protein complex       Bioentities […]
  Chemical              PubChem […]
  Gene                  HGNC […]
  Biological process    GO […]

  Table 2. The list of cellular locations and their IDs from the Gene Ontology […] database.

  Location name           Location ID
  Cytoplasm               GO:0005737
  Cytosol                 GO:0005829
  Plasma membrane         GO:0005886
  Nucleus                 GO:0005634
  Mitochondria            GO:0005739
  Extracellular           GO:0005576
  Endoplasmic reticulum   GO:0005783

  I. Tissue type – obtained from reading output.

  J. Organism – obtained from reading output.

  K. Executable model variable – variable names are currently built from fields B, C, E, and H described above.
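  Field K combines fields B, C, E, and H, but the exact naming scheme is not specified here. The sketch below assumes one hypothetical convention: the fields are normalized and joined with underscores, and a location is accepted only if it appears in Table 2. The function names `model_variable` and `_token` and the output format are our own illustration, not the authors' scheme.

```python
from typing import Optional

# Table 2: cellular locations and their Gene Ontology IDs.
GO_LOCATIONS = {
    "Cytoplasm": "GO:0005737",
    "Cytosol": "GO:0005829",
    "Plasma membrane": "GO:0005886",
    "Nucleus": "GO:0005634",
    "Mitochondria": "GO:0005739",
    "Extracellular": "GO:0005576",
    "Endoplasmic reticulum": "GO:0005783",
}

def _token(text: str) -> str:
    """HYPOTHETICAL normalization: replace non-alphanumeric characters with '-'."""
    return "".join(ch if ch.isalnum() else "-" for ch in text.strip())

def model_variable(nomenclature_id: str, element_type: str,
                   location: Optional[str] = None,
                   cell_type: Optional[str] = None) -> str:
    """Build a variable name from fields B, C, E, and H (HYPOTHETICAL underscore-joined scheme).
    Missing optional fields are simply omitted."""
    if location is not None and location not in GO_LOCATIONS:
        raise ValueError(f"location not in Table 2: {location!r}")
    parts = [nomenclature_id, element_type, location, cell_type]
    return "_".join(_token(p) for p in parts if p)
```

  Under this convention, EGFR in the plasma membrane would become `EGFR_Protein_Plasma-membrane`; keeping the location in the name lets the same protein in two compartments map to two distinct model variables.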

  The second group of fields in our representation includes information about the component's regulators, which is mainly used by the executable model; as in the first group, a few fields are kept for bookkeeping.

  L. Positive regulator nomenclature IDs – list of positive regulators of the element.

  M. Negative regulator nomenclature IDs – list of negative regulators of the element.

  N. Interaction type – for each listed regulator, whether the interaction is direct or indirect, when this is known.

  O. Interaction mechanism – for each direct interaction, the mechanism of interaction, when known. Mechanisms that can be obtained from reading engines are listed in Table 3.

  P. Interaction score – for each interaction, a confidence score obtained from reading.

  The third group of fields in our representation includes interaction-related provenance information.

  Q. Reference paper IDs – for each interaction, we list the IDs of published papers that mention the interaction. This information is obtained directly from reading output.

  R. Sentences – for each interaction, we list the sentences describing the interaction. This information is obtained directly from reading output.

  This representation format can be converted into the SBML format to be used by different software tools and shared between different working groups. Additionally, the tabular format provides an interface that biologists can easily create or read, and that machines can generate or parse.
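  A minimal sketch of one model-table row, assuming Python dataclasses and a tab-separated file. The attribute names are our own shorthand for fields A to R, and joining list-valued fields with "|" is an assumed convention, not part of the format described above. The class mirrors the three field groups and the one-row-per-element layout.

```python
import csv
import io
from dataclasses import dataclass, field, fields
from typing import List

@dataclass
class ModelRow:
    # Group (i): the modeled system component (fields A-K).
    name: str = ""                 # A
    nomenclature_id: str = ""      # B
    element_type: str = ""         # C
    unique_id: str = ""            # D
    location: str = ""             # E
    location_id: str = ""          # F
    cell_line: str = ""            # G
    cell_type: str = ""            # H
    tissue_type: str = ""          # I
    organism: str = ""             # J
    model_variable: str = ""       # K
    # Group (ii): the component's regulators (fields L-P).
    positive_regulators: List[str] = field(default_factory=list)     # L
    negative_regulators: List[str] = field(default_factory=list)     # M
    interaction_types: List[str] = field(default_factory=list)       # N
    interaction_mechanisms: List[str] = field(default_factory=list)  # O
    interaction_scores: List[float] = field(default_factory=list)    # P
    # Group (iii): provenance (fields Q-R).
    reference_paper_ids: List[str] = field(default_factory=list)     # Q
    sentences: List[str] = field(default_factory=list)               # R

def write_table(rows: List[ModelRow]) -> str:
    """Serialize rows as TSV with a header line; list fields are joined with '|' (assumed convention)."""
    columns = [f.name for f in fields(ModelRow)]
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        cells = []
        for col in columns:
            value = getattr(row, col)
            cells.append("|".join(map(str, value)) if isinstance(value, list) else value)
        writer.writerow(cells)
    return buf.getvalue()
```

  In this sketch, a new system component is one more `ModelRow` in the list, and a new feature from improved reading is one more dataclass attribute, which `write_table` picks up automatically as a new column.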

4 From Reading to Model

  We obtain outputs from three reading engines, namely REACH […], RUBICON […], and LTR […], which produce output files with similar, but not identical, formats. In Table 3, we list the interaction mechanisms that can be obtained from these three reading engines, and in the following subsections we outline their differences and the advantages of each reading engine.


  

Table 3. Intracellular interactions (mechanisms) recognized by the three reading engines.

  Reading engine   Recognized mechanisms
  REACH […]        Activation, Inhibition, Binding, Phosphorylation, Dephosphorylation, Ubiquitination, Acetylation, Methylation, Increase or Decrease Amount, Transcription, Translocation
  RUBICON […]      Activation, Inhibition, Promotes, Signaling, Reduce, Induce, Supports, Attenuates, Stimulate, Antagonize, Synergize, Increase and Decrease Amount, Abrogates
  LTR […]          Binding, Phosphorylation, Dephosphorylation, Isomerizations

  4.1 Simple Interaction Translation

  The first type of reading engine, REACH […], can extract both direct and indirect interactions, as well as interaction mechanisms, where available. The simplest and most common reading outputs are those that include only a regulated element and a single regulator, each having one of the entity types listed in Table 1, with the interaction mechanism being one of the mechanisms listed in Table 3. Such interactions translate straightforwardly into our representation format; that is, each is translated into a single table row with some or all of the fields described in Sect. 3. Because our modeling formalism accounts only for positive and negative regulators, while reading engines can also output specific mechanisms where these are available in the text, we assume in the translation that Phosphorylation, Acetylation, Increase Amount, and Methylation represent positive regulation, and that Dephosphorylation, Ubiquitination, Decrease Amount, and Demethylation represent negative regulation. Additionally, we treat Transcription events as positive regulation.
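  The polarity assumptions above can be sketched as a lookup followed by grouping into influence sets. This is an illustration under stated assumptions, not the authors' code: the input is a hypothetical list of (regulator, mechanism, regulated) triples rather than any engine's actual output schema, the mechanism strings are our own labels, Activation and Inhibition are added by us as the obvious positive and negative cases, and unsigned mechanisms such as Binding are simply skipped here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Polarity assumptions from Sect. 4.1; Activation and Inhibition are our own additions.
POSITIVE = {"Activation", "Phosphorylation", "Acetylation", "Increase Amount",
            "Methylation", "Transcription"}
NEGATIVE = {"Inhibition", "Dephosphorylation", "Ubiquitination",
            "Decrease Amount", "Demethylation"}

def polarity(mechanism: str) -> Optional[str]:
    """Return '+' or '-', or None when the mechanism carries no sign (e.g., Binding)."""
    if mechanism in POSITIVE:
        return "+"
    if mechanism in NEGATIVE:
        return "-"
    return None

def to_influence_sets(
    interactions: Iterable[Tuple[str, str, str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Group HYPOTHETICAL (regulator, mechanism, regulated) triples into one entry
    per regulated element, i.e., one table row with positive/negative regulator lists."""
    table: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"positive": [], "negative": []})
    for regulator, mechanism, regulated in interactions:
        sign = polarity(mechanism)
        if sign is None:
            continue  # unsigned mechanisms are not translated in this sketch
        key = "positive" if sign == "+" else "negative"
        if regulator not in table[regulated][key]:  # ignore duplicate extractions
            table[regulated][key].append(regulator)
    return dict(table)
```

  For instance, the illustrative triples `("EGF", "Activation", "EGFR")` and `("PTPN1", "Dephosphorylation", "EGFR")` yield a single EGFR entry with EGF as a positive and PTPN1 as a negative regulator, which corresponds to fields L and M of one table row.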

  4.2 Translation of Translocation Interaction

  We translate translocation events (moving components from one cellular location to another) using the formalism described in