Machine Learning, Optimization, and Big Data 2017
Giuseppe Nicosia · Panos Pardalos (Eds.)
Giovanni Giuffrida · Renato Umeton
Machine Learning, Optimization, and Big Data
Third International Conference, MOD 2017
Volterra, Italy, September 14–17, 2017
Revised Selected Papers
LNCS 10710
Lecture Notes in Computer Science 10710
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison Lancaster University, Lancaster, UK
Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler University of Surrey, Guildford, UK
Jon M. Kleinberg Cornell University, Ithaca, NY, USA
Friedemann Mattern ETH Zurich, Zurich, Switzerland
John C. Mitchell Stanford University, Stanford, CA, USA
Moni Naor Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan Indian Institute of Technology, Madras, India
Bernhard Steffen TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos University of California, Los Angeles, CA, USA
Doug Tygar University of California, Berkeley, CA, USA
Gerhard Weikum Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at
Giuseppe Nicosia · Panos Pardalos · Giovanni Giuffrida · Renato Umeton (Eds.)
Machine Learning, Optimization, and Big Data
Third International Conference, MOD 2017
Volterra, Italy, September 14–17, 2017
Revised Selected Papers
Editors
Giuseppe Nicosia University of Catania, Catania, Italy
Giovanni Giuffrida University of Catania, Catania, Italy
Panos Pardalos University of Florida, Gainesville, FL, USA
Renato Umeton Harvard University, Cambridge, MA, USA
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-72925-1
ISBN 978-3-319-72926-8 (eBook)
https://doi.org/10.1007/978-3-319-72926-8
Library of Congress Control Number: 2017962876
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
© Springer International Publishing AG 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
Preface
MOD is an international conference embracing the fields of machine learning, optimization, and data science. The third edition, MOD 2017, was held on September 14–17, 2017, in Volterra (Pisa, Italy), a stunning medieval town dominating the picturesque countryside of Tuscany.
The key role of machine learning, reinforcement learning, artificial intelligence, large-scale optimization, and big data in developing solutions to some of the greatest challenges we are facing is undeniable. MOD 2017 attracted leading experts from the academic world and industry with the aim of strengthening the connection between these institutions. The 2017 edition of MOD gave professors, scientists, industry experts, and postgraduate students an opportunity to learn about recent developments in their own research areas and in contiguous ones, with the aim of creating an environment to share ideas and trigger new collaborations.
As chairs, it was an honor to organize a premier conference in these areas and to have received a large variety of innovative and original scientific contributions. During this edition, six plenary lectures were presented:
Yi-Ke Guo, Department of Computing, Faculty of Engineering, Imperial College London, UK. Founding Director of Data Science Institute
Panos Pardalos, Department of Systems Engineering, University of Florida, USA. Director of the Center for Applied Optimization
Ruslan Salakhutdinov, Machine Learning Department, School of Computer Science at Carnegie Mellon University, USA. Director of AI Research at Apple
My Thai, Department of Computer and Information Science and Engineering, University of Florida, USA
Jun Pei, Hefei University of Technology, China
Vincenzo Sciacca, Cloud and Cognitive Division – IBM Rome, Italy
There were also two tutorial speakers:
Domenico Talia, Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica, Università della Calabria, Italy
Xin-She Yang, School of Science and Technology – Middlesex University London, UK
Moreover, the conference hosted the second edition of the industrial session on “Machine Learning, Optimization and Data Science for Real-World Applications”:
Luca Maria Aiello, Nokia Bell Labs, UK
Pierpaolo Basile, University of Bari, Italy
Carlos Castillo, Universitat Pompeu Fabra in Barcelona, Spain
Moderator: Aris Anagnostopoulos, Sapienza University of Rome, Italy
We received 126 submissions from 46 countries and five continents; each manuscript was independently reviewed by a committee of at least five members through a blind review process. These proceedings contain 49 research articles written by leading scientists in the fields of machine learning, artificial intelligence, reinforcement learning, computational optimization, and data science, presenting a substantial array of ideas, technologies, algorithms, methods, and applications.
For MOD 2017, Springer generously sponsored the MOD Best Paper Award. This year, the paper by Khaled Sayed, Cheryl Telmer, Adam Butchy, and Natasa Miskov-Zivanov titled “Recipes for Translating Big Data Machine Reading to Executable Cellular Signaling Models” received the MOD Best Paper Award.
This conference could not have been organized without the contributions of these researchers, and so we thank them all for participating. A sincere thank you also goes to the entire Program Committee, formed by more than 300 scientists from academia and industry, for their valuable work in selecting the scientific contributions.
Finally, we would like to express our appreciation to the keynote speakers, tutorial speakers, and the industrial panel who accepted our invitation, and to all the authors who submitted their research papers to MOD 2017.
September 2017
Giuseppe Nicosia
Panos Pardalos
Giovanni Giuffrida
Renato Umeton
Organization
General Chair
Renato Umeton Harvard University, USA
Conference and Technical Program Committee Co-chairs
Giuseppe Nicosia University of Catania, Italy and University of Reading, UK
Panos Pardalos University of Florida, USA
Giovanni Giuffrida University of Catania, Italy
Tutorial Chair
Giuseppe Narzisi New York University Tandon School of Engineering, USA
Industrial Session Chairs
Ilaria Bordino UniCredit R&D, Italy
Marco Firrincieli UniCredit R&D, Italy
Fabio Fumarola UniCredit R&D, Italy
Francesco Gullo UniCredit R&D, Italy
Organizing Committee
Piero Conca CNR, Italy
Jole Costanza Italian Institute of Technology, Milan, Italy
Giorgio Jansen University of Catania, Italy
Giuseppe Narzisi New York University Tandon School of Engineering, USA
Andrea Patane’ University of Oxford, UK
Andrea Santoro Queen Mary University London, UK
Renato Umeton Harvard University, USA
Technical Program Committee
Agostinho Agra Universidade de Aveiro, Portugal
Kerem Akartunali University of Strathclyde, UK
Richard Allmendinger The University of Manchester, UK
Aris Anagnostopoulos Università di Roma La Sapienza, Italy
Takaya Arita Nagoya University, Japan
Jason Atkin The University of Nottingham, UK
Chloe-Agathe Azencott Institut Curie Research Centre, Paris, France
Jaume Bacardit Newcastle University, UK
James Bailey University of Melbourne, Australia
Baski Balasundaram Oklahoma State University, USA
Elena Baralis Politecnico di Torino, Italy
Xabier E. Barandiaran University of the Basque Country, Spain
Cristobal Barba-Gonzalez University of Malaga, Spain
Helio J. C. Barbosa Laboratório Nacional de Computacao Cientifica, Brazil
Roberto Battiti University of Trento, Italy
Lucia Beccai Istituto Italiano di Tecnologia, Italy
Aurelien Bellet Inria Lille, France
Gerardo Beni University of California at Riverside, USA
Khaled Benkrid The University of Edinburgh, UK
Peter Bentley University College London, UK
Katie Bentley Harvard Medical School, USA
Heder Bernardino Universidade Federal de Juiz de Fora, Brazil
Daniel Berrar Tokyo Institute of Technology, Japan
Adam Berry CSIRO, Australia
Luc Berthouze University of Sussex, UK
Martin Berzins SCI Institute, University of Utah, USA
Mauro Birattari IRIDIA, Université Libre de Bruxelles, Belgium
Leonidas Bleris University of Texas at Dallas, USA
Christian Blum Spanish National Research Council, Spain
Paul Bourgine École Polytechnique Paris, France
Anthony Brabazon University College Dublin, Ireland
Paulo Branco Instituto Superior Tecnico, Portugal
Juergen Branke University of Warwick, UK
Larry Bull University of the West of England, UK
Tadeusz Burczynski Polish Academy of Sciences, Poland
Robert Busa-Fekete Yahoo! Research, NY, USA
Sergiy I Butenko Texas A&M University, USA
Stefano Cagnoni University of Parma, Italy
Yizhi Cai University of Edinburgh, UK
Guido Caldarelli IMT Lucca, Italy
Alexandre Campo Université Libre de Bruxelles, Belgium
Angelo Cangelosi University of Plymouth, UK
Salvador Eugenio Caoili University of the Philippines Manila, Philippines
Timoteo Carletti University of Namur, Belgium
Jonathan Carlson Microsoft Research, USA
Celso Carneiro Ribeiro Universidade Federal Fluminense, Brazil
Michelangelo Ceci University of Bari, Italy
Adelaide Cerveira Universidade de Tras-os-Montes e Alto Douro, Portugal
Xu Chang University of Sydney, Australia
W. Art Chaovalitwongse University of Washington, USA
Antonio Chella Università di Palermo, Italy
Ying-Ping Chen National Chiao Tung University, Taiwan
Haifeng Chen NEC Labs, USA
Keke Chen Wright State University, USA
Gregory Chirikjian Johns Hopkins University, USA
Silvia Chiusano Politecnico di Torino, Italy
Miroslav Chlebik University of Sussex, UK
Sung-Bae Cho Yonsei University, South Korea
Anders Christensen Lisbon University Institute, Portugal
Dominique Chu University of Kent, UK
Philippe Codognet University Pierre and Marie Curie – Paris 6, France
Carlos Coello Coello CINVESTAV-IPN, Mexico
George Coghill University of Aberdeen, UK
Pietro Colombo University of Insubria, Italy
David Cornforth University of Newcastle, UK
Luís Correia University of Lisbon, Portugal
Chiara Damiani University of Milan-Bicocca, Italy
Thomas Dandekar University of Würzburg, Germany
Ivan Luciano Danesi Unicredit Bank, Italy
Christian Darabos Dartmouth College, USA
Kalyanmoy Deb Michigan State University, USA
Nicoletta Del Buono University of Bari, Italy
Jordi Delgado Universitat Politecnica de Catalunya, Spain
Ralf Der MPG, Germany
Clarisse Dhaenens Université Lille, France
Barbara Di Camillo University of Padua, Italy
Gianni Di Caro IDSIA, Switzerland
Luigi Di Caro University of Turin, Italy
Luca Di Gaspero University of Udine, Italy
Peter Dittrich Friedrich Schiller University of Jena, Germany
Federico Divina Pablo de Olavide University of Seville, Spain
Stephan Doerfel Kassel University, Germany
Devdatt Dubhashi Chalmers University, Sweden
George Dulikravich Florida International University, USA
Juan J. Durillo University of Innsbruck, Austria
Omer Dushek University of Oxford, UK
Marc Ebner Ernst-Moritz-Arndt-Universität Greifswald, Germany
Pascale Ehrenfreund The George Washington University, USA
Gusz Eiben VU Amsterdam, The Netherlands
Aniko Ekart Aston University, UK
Talbi El-Ghazali University of Lille, France
Michael Elberfeld RWTH Aachen University, Germany
Michael T. M. Emmerich Leiden University, The Netherlands
Anton Eremeev Sobolev Institute of Mathematics, Russia
Harold Fellermann Newcastle University, UK
Chrisantha Fernando Queen Mary University, UK
Cesar Ferri Universidad Politecnica de Valencia, Spain
Paola Festa University of Naples Federico II, Italy
Jose Rui Figueira Instituto Superior Tecnico, Lisbon, Portugal
Grazziela Figueredo The University of Nottingham, UK
Alessandro Filisetti Explora Biotech Srl, Italy
Christoph Flamm University of Vienna, Austria
Enrico Formenti Nice Sophia Antipolis University, France
Giuditta Franco University of Verona, Italy
Piero Fraternali Politecnico di Milano, Italy
Valerio Freschi University of Urbino, Italy
Enrique Frias Martinez Telefonica Research, Spain
Walter Frisch University of Vienna, Austria
Rudolf M. Fuchslin Zurich University of Applied Sciences, Switzerland
Claudio Gallicchio University of Pisa, Italy
Patrick Gallinari LIP6 – University of Paris 6, France
Luca Gambardella IDSIA, Switzerland
Jean-Gabriel Ganascia Pierre and Marie Curie University – LIP6, France
Xavier Gandibleux Université de Nantes, France
Alfredo G. Hernandez-Diaz Pablo de Olvide University – Seville, Spain
Jose Manuel Garcia Nieto University of Malaga, Spain
Paolo Garza Politecnico di Torino, Italy
Romaric Gaudel Inria, France
Nicholas Geard University of Melbourne, Australia
Philip Gerlee Chalmers University, Sweden
Mario Giacobini University of Turin, Italy
Onofrio Gigliotta University of Naples Federico II, Italy
Giovanni Giuffrida University of Catania, Italy
Giorgio Stefano Gnecco University of Genoa, Italy
Christian Gogu Université Toulouse III, France
Faustino Gomez IDSIA, Switzerland
Michael Granitzer University of Passau, Germany
Alex Graudenzi University of Milan-Bicocca, Italy
Julie Greensmith University of Nottingham, UK
Roderich Gross The University of Sheffield, UK
Mario Guarracino ICAR-CNR, Italy
Francesco Gullo Unicredit Bank, Italy
Steven Gustafson GE Global Research, USA
Jin-Kao Hao University of Angers, France
Simon Harding Machine Intelligence Ltd., Canada
Richard Hartl University of Vienna, Austria
Inman Harvey University of Sussex
Jamil Hasan University of Idaho, USA
Geir Hasle SINTEF ICT, Norway
Carlos Henggeler Antunes University of Coimbra, Portugal
Francisco Herrera University of Granada, Spain
Arjen Hommersom Radboud University, The Netherlands
Vasant Honavar Pennsylvania State University, USA
Fabrice Huet University of Nice Sophia Antipolis, France
Hiroyuki Iizuka Hokkaido University, Japan
Takashi Ikegami University of Tokyo, Japan
Bordino Ilaria Unicredit Bank, Italy
Hisao Ishibuchi Osaka Prefecture University, Japan
Peter Jacko Lancaster University Management School, UK
Christian Jacob University of Calgary, Canada
Yaochu Jin University of Surrey, UK
Colin Johnson University of Kent, UK
Gareth Jones Dublin City University, Ireland
Laetitia Jourdan Inria/LIFL/CNRS, France
Narendra Jussien Ecole des Mines de Nantes/LINA, France
Janusz Kacprzyk Polish Academy of Sciences, Poland
Theodore Kalamboukis Athens University of Economics and Business, Greece
George Kampis Eotvos University, Hungary
Dervis Karaboga Erciyes University, Turkey
George Karakostas McMaster University, Canada
Istvan Karsai ETSU, USA
Jozef Kelemen Silesian University, Czech Republic
Graham Kendall Nottingham University, UK
Didier Keymeulen NASA – Jet Propulsion Laboratory, USA
Daeeun Kim Yonsei University, South Korea
Zeynep Kiziltan University of Bologna, Italy
Georg Krempl University of Magdeburg, Germany
Erhun Kundakcioglu Ozyegin University, Turkey
Renaud Lambiotte University of Namur, Belgium
Doron Lancet Weizmann Institute of Science, Israel
Pier Luca Lanzi Politecnico di Milano, Italy
Sanja Lazarova-Molnar University of Southern Denmark, Denmark
Doheon Lee KAIST, South Korea
Jay Lee Center for Intelligent Maintenance Systems – UC, USA
Eva K. Lee Georgia Tech, USA
Tom Lenaerts Université Libre de Bruxelles, Belgium
Rafael Leon Universidad Politecnica de Madrid, Spain
Shuai Li Cambridge University, UK
Lei Li Florida International University, USA
Xiaodong Li RMIT University, Australia
Joseph Lizier The University of Sydney, Australia
Giosue’ Lo Bosco Università di Palermo, Italy
Daniel Lobo University of Maryland Baltimore County, USA
Daniele Loiacono Politecnico di Milano, Italy
Jose A. Lozano University of the Basque Country, Spain
Paul Lu University of Alberta, Canada
Angelo Lucia University of Rhode Island, USA
Dario Maggiorini University of Milan, Italy
Gilvan Maia Universidade Federal do Ceará, Brazil
Donato Malerba University of Bari, Italy
Lina Mallozzi University of Naples Federico II, Italy
Jacek Mandziuk Warsaw University of Technology, Poland
Vittorio Maniezzo University of Bologna, Italy
Marco Maratea University of Genoa, Italy
Elena Marchiori Radboud University, The Netherlands
Tiziana Margaria University of Limerick and Lero, Ireland
Omer Markovitch University of Groningen, The Netherlands
Carlos Martin-Vide Rovira i Virgili University, Spain
Dominique Martinez LORIA, France
Matteo Matteucci Politecnico di Milano, Italy
Giancarlo Mauri University of Milan-Bicocca, Italy
Mirjana Mazuran Politecnico di Milano, Italy
Suzanne McIntosh NYU Courant Institute, and Cloudera Inc., USA
Peter Mcowan Queen Mary University, UK
Gabor Melli Sony Interactive Entertainment Inc., Japan
Jose Fernando Mendes University of Aveiro, Portugal
David Merodio-Codinachs ESA, France
Silja Meyer-Nieberg Universität der Bundeswehr München, Germany
Martin Middendorf University of Leipzig, Germany
Taneli Mielikainen Nokia, Finland
Kaisa Miettinen University of Jyvaskyla, Finland
Orazio Miglino University of Naples “Federico II”, Italy
Julian Miller University of York, UK
Marco Mirolli ISTC-CNR, Italy
Natasa Miskov-Zivanov University of Pittsburgh, USA
Carmen Molina-Paris University of Leeds, UK
Sara Montagna Università di Bologna, Italy
Marco Montes de Oca Clypd, Inc., USA
Sanaz Mostaghim Otto von Guericke University Magdeburg, Germany
Mohamed Nadif University of Paris Descartes, France
Hidemoto Nakada NIAIST, Japan
Amir Nakib Università Paris EST Creteil, Laboratoire LISSI, France
Mirco Nanni CNR – ISTI, Italy
Sriraam Natarajan Indiana University, USA
Chrystopher L. Nehaniv University of Hertfordshire, UK
Michael Newell Athens Consulting, LLC
Giuseppe Nicosia University of Catania, Italy
Xia Ning IUPUI, USA
Eirini Ntoutsi Leibniz University of Hanover, Germany
Michal Or-Guil Humboldt University of Berlin, Germany
Mathias Pacher Goethe-Universität Frankfurt am Main, Germany
Ping-Feng Pai National Chi Nan University, Taiwan
Wei Pang University of Aberdeen, UK
George Papastefanatos IMIS/RC Athena, Greece
Luis Paquete University of Coimbra, Portugal
Panos Pardalos University of Florida, USA
Andrew J. Parkes Nottingham University, UK
Andrea Patane’ University of Oxford, UK
Joshua Payne University of Zurich, Switzerland
Jun Pei University of Florida, USA
Nikos Pelekis University of Piraeus, Greece
Dimitri Perrin Queensland University of Technology, Australia
Koumoutsakos Petros ETH, Switzerland
Juan Peypouquet Universidad Tecnica Federico Santa Maria, Chile
Andrew Philippides University of Sussex, UK
Vincenzo Piuri University of Milan, Italy
Alessio Plebe University of Messina, Italy
Silvia Poles Noesis Solutions NV
Philippe Preux Inria, France
Mikhail Prokopenko University of Sydney, Australia
Paolo Provero University of Turin, Italy
Buyue Qian IBM T. J. Watson, USA
Chao Qian University of Science and Technology of China, China
Gunther Raidl TU Wien, Austria
Helena R. Dias Lourenco Pompeu Fabra University, Spain
Palaniappan Ramaswamy University of Kent, UK
Jan Ramon Inria, France
Vitorino Ramos Technical University of Lisbon, Portugal
Shoba Ranganathan Macquarie University, Australia
Cristina Requejo Universidade de Aveiro, Portugal
John Rieffel Union College, USA
Laura Anna Ripamonti Università degli Studi di Milano, Italy
Eduardo Rodriguez-Tello Cinvestav-Tamaulipas, Mexico
Andrea Roli Università di Bologna, Italy
Vittorio Romano University of Catania, Italy
Andre Rosendo University of Cambridge, UK
Samuel Rota Bulo Fondazione Bruno Kessler, Italy
Arnab Roy Fujitsu Laboratories of America, USA
Alessandro Rozza Parthenope University of Naples, Italy
Kepa Ruiz-Mirazo University of the Basque Country, Spain
Florin Rusu University of California Merced, USA
Jakub Rydzewski N. Copernicus University, Poland
Nick Sahinidis Carnegie Mellon University, USA
Francisco C. Santos INESC-ID Instituto Superior Tecnico, Portugal
Claudio Sartori University of Bologna, Italy
Frederic Saubion Université d’Angers, France
Andrea Schaerf University of Udine, Italy
Oliver Schuetze CINVESTAV-IPN, Mexico
Luis Seabra Lopes Universidade of Aveiro, Portugal
Roberto Serra University of Modena and Reggio Emilia, Italy
Marc Sevaux Lab-STICC, Université de Bretagne-Sud, France
Ruey-Lin Sheu National Cheng Kung University, Taiwan
Hsu-Shih Shih Tamkang University, Taiwan
Patrick Siarry Université de Paris 12, France
Alkis Simitsis HP Labs, USA
Johannes Sollner Emergentec Biodevelopment GmbH, Germany
Ichoua Soumia Embry-Riddle Aeronautical University, USA
Giandomenico Spezzano CNR-ICAR, Italy
Antoine Spicher LACL University of Paris Est Creteil, France
Pasquale Stano University of Salento, Italy
Thomas Stibor GSI Helmholtz Centre for Heavy Ion Research, Germany
Catalin Stoean University of Craiova, Romania
Reiji Suzuki Nagoya University, Japan
Domenico Talia University of Calabria, Italy
Kay Chen Tan National University of Singapore, Singapore
Letizia Tanca Politecnico di Milano, Italy
Charles Taylor UCLA, USA
Maguelonne Teisseire Cemagref – UMR Tetis, France
Tzouramanis Theodoros University of the Aegean, Greece
Jon Timmis University of York, UK
Gianna Toffolo University of Padua, UK
Joo Chuan Tong Institute of HPC, Singapore
Nickolay Trendafilov Open University, UK
Soichiro Tsuda University of Glasgow, UK
Shigeyoshi Tsutsui Hannan University, Japan
Aditya Tulsyan MIT, USA
Ali Emre Turgut IRIDIA-ULB, France
Karl Tuyls University of Liverpool, UK
Jon Umerez University of the Basque Country, Spain
Renato Umeton Harvard University, USA
Ashish Umre University of Sussex, UK
Olgierd Unold Politechnika Wroclawska, Poland
Giorgio Valentini Università degli Studi di Milano, Italy
Edgar Vallejo ITESM Campus Estado de Mexico, Mexico
Sergi Valverde Pompeu Fabra University, Spain
Werner Van Geit EPFL, Switzerland
Pascal Van Hentenryck University of Michigan, USA
Carlos Varela Rensselaer Polytechnic Institute, USA
Eleni Vasilaki University of Sheffield, UK
Richard Vaughan Simon Fraser University, Canada
Kalyan Veeramachaneni MIT, USA
Vassilios Verykios Hellenic Open University, Greece
Mario Villalobos-Arias Univesidad de Costa Rica, Costa Rica
Marco Villani University of Modena and Reggio Emilia, Italy
Katya Vladislavleva Evolved Analytics LLC, Belgium
Stefan Voss University of Hamburg, Germany
Dean Vucinic Vrije Universiteit Brussel, Belgium
Markus Wagner The University of Adelaide, Australia
Toby Walsh UNSW Sydney, Australia
Lipo Wang Nanyang Technological University, Singapore
Liqiang Wang University of Central Florida, USA
Rainer Wansch Fraunhofer IIS, Germany
Syed Waziruddin Kansas State University, USA
Janet Wiles University of Queensland, Australia
Man Leung Wong Lingnan University, Hong Kong, SAR China
Andrew Wuensche University of Sussex, UK
Petros Xanthopoulos University of Central Florida, USA
Ning Xiong Malardalen University, Sweden
Xin Xu George Washington University, USA
Gur Yaari Yale University, USA
Larry Yaeger Indiana University, USA
Shengxiang Yang De Montfort University, USA
Qi Yu Rochester Institute of Technology, USA
Zelda Zabinsky University of Washington, USA
Ras Zbyszek University of North Carolina, USA
Hector Zenil University of Oxford, UK
Guang Lan Zhang Boston University, USA
Qingfu Zhang City University of Hong Kong, Hong Kong, SAR China
Rui Zhang IBM Research – Almaden, USA
Zhi-Hua Zhou Nanjing University, China
Tom Ziemke University of Skovde, Sweden
Antanas Zilinskas Vilnius University, Lithuania
Best Paper Awards
MOD 2017 Best Paper Award
“Recipes for Translating Big Data Machine Reading to Executable Cellular Signaling Models”
Khaled Sayed*, Cheryl Telmer**, Adam Butchy*, and Natasa Miskov-Zivanov*
* University of Pittsburgh, USA
** Carnegie Mellon University, USA
Springer sponsored the MOD 2017 Best Paper Award with a cash prize of EUR 1,000.
MOD 2016 Best Paper Award
“Machine Learning: Multi-site Evidence-Based Best Practice Discovery”
Eva Lee, Yuanbo Wang and Matthew Hagen
Eva K. Lee, Professor Director, Center for Operations Research in Medicine and HealthCare, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
MOD 2015 Best Paper Award
“Learning with Discrete Least Squares on Multivariate Polynomial Spaces Using Evaluations at Random or Low-Discrepancy Point Sets”
Giovanni Migliorati
Ecole Polytechnique Federale de Lausanne – EPFL, Lausanne, Switzerland
Contents
Khaled Sayed, Cheryl A. Telmer, Adam A. Butchy, and Natasa Miskov-Zivanov
Michael Cohen
Gregor Ulm, Emil Gustavsson, and Mats Jirstrand
Tome Eftimov, Peter Korošec, and Barbara Koroušić Seljak
Angelo Lucia, Edward Thomas, and Peter A. DiMaggio
Ahmad Mazyad, Fabien Teytaud, and Cyril Fonlupt
S. P. Sidorov, S. V. Mironov, and M. G. Pleshakov
Danny D’Agostino, Andrea Serani, Emilio F. Campana, and Matteo Diez
Manousos Rigakis, Dimitra Trachanatzi, Magdalene Marinaki, and Yannis Marinakis
Stefano Cagnoni, Paolo Fornacciari, Juxhino Kavaja, Monica Mordonini, Agostino Poggi, Alex Solimeo, and Michele Tomaiuolo
Mauro Dell’Amico, Natalia Selini Hadjidimitriou, Thorsten Koch, and Milena Petkovic
Maria João Alves and Carlos Henggeler Antunes
Jason Adair, Alexander Brownlee, Fabio Daolio, and Gabriela Ochoa
Riccardo Pellegrini, Andrea Serani, Giampaolo Liuzzi, Francesco Rinaldi, Stefano Lucidi, Emilio F. Campana, Umberto Iemma, and Matteo Diez
Beatrice Lazzerini and Francesco Pistolesi
Alice Plebe and Mario Pavone
Stéphane Chrétien and Sébastien Darses
Francesco Bagattini, Paola Cappanera, and Fabio Schoen
Joana Dias, Humberto Rocha, Tiago Ventura, Brígida Ferreira,
Ogerta Elezaj, Sule Yildirim, and Edlira Kalemi
Zekarias T. Kefato, Nasrullah Sheikh, and Alberto Montresor
Margarita Zaleshina, Alexander Zaleshin, and Adriana Galvani
Ziad Salem, Gerald Radspieler, Karlo Griparić, and Thomas Schmickl
Ramses Sala, Niccolò Baldanzini, and Marco Pierini
Roy Khristopher Bayot and Teresa Gonçalves
Peng Shi and Dario Landa-Silva
Iván Darío López, Cristian Heidelberg Valencia, and Juan Carlos Corrales
Kamer Kaya, Ş. İlker Birbil, M. Kaan Öztürk, and Amir Gohari
Mingxi Li, Yusuke Tanimura, and Hidemoto Nakada
Marco Baioletti, Gabriele Di Bari, Valentina Poggioni, and Mirco Tracolli
Chunfeng Ma, Min Kong, Jun Pei, and Panos M. Pardalos
Guillermo Rela, Franco Robledo, and Pablo Romero
Gabriel Bayá, Antonio Mauttone, Franco Robledo, and Pablo Romero
Cristian Galleguillos, Alina Sîrbu, Zeynep Kiziltan, Ozalp Babaoglu, Andrea Borghesi, and Thomas Bridi
Boris Musarais
Olgierd Unold and Radosław Tarnawski
Humberto Rocha and Joana Dias
Natalia Castro, Graciela Ferreira, Franco Robledo, and Pablo Romero
Matthias Horn, Günther Raidl, and Christian Blum
Reiji Hatsugai and Mary Inaba
Benedikt Klocker, Herbert Fleischner, and Günther R. Raidl
Francesco Calimeri, Mirco Caracciolo, Aldo Marzullo, and Claudio Stamile
Roberto Aringhieri, Davide Dell’Anna, Davide Duma, and Michele Sonnessa
Christopher Bacher and Günther R. Raidl
Stefano Mauceri, Louis Smith, James Sweeney, and James McDermott
Alberto Castellini and Giuditta Franco
Author Index
Recipes for Translating Big Data Machine Reading to Executable Cellular Signaling Models
Khaled Sayed¹, Cheryl A. Telmer², Adam A. Butchy³, and Natasa Miskov-Zivanov¹,³,⁴(✉)
¹ Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA
{k.sayed,nmzivanov}@pitt.edu
² Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
³ Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
⁴ Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
Abstract. Biological literature is rich in mechanistic information that can be utilized to construct executable models of complex systems to increase our understanding of health and disease. However, the literature is vast and fragmented, and therefore, automation of information extraction from papers and of model assembly from the extracted information is necessary. We describe here our approach for translating machine reading outputs, obtained by reading biological signaling literature, to discrete models of cellular networks. We use outputs from three different reading engines, and demonstrate the translation of different features using examples from cancer literature. We also outline several issues that still arise when assembling cellular network models from state-of-the-art reading engines. Finally, we illustrate the details of our approach with a case study in pancreatic cancer.
Keywords: Machine reading · Big data in literature · Text mining · Cell signaling networks · Automated model generation
1 Introduction
Biological knowledge is voluminous and fragmented; it is nearly impossible to read all scientific papers on a single topic such as cancer. When building a model of a particular biological system, for example the cancer microenvironment, researchers usually start by searching for existing relevant models and by looking for information about system components and their interactions in published literature.
Although there have been attempts to automate the process of model building, most often modelers conduct these steps manually, with multiple iterations between (i) information extraction, (ii) model assembly, (iii) model analysis, and (iv) model validation through comparison with the most recently published results. To allow for rapid modeling of complex diseases like cancer, and for efficient use of the ever-increasing amount of information in published work, we need representation standards and interfaces such that these tasks can be automated. This, in turn, will allow researchers to ask informed, interesting questions that can improve our understanding of health and disease.
The systems biology community has designed and proposed a standardized format for representing biological models called the Systems Biology Markup Language (SBML). This language allows different software tools to be used without recreating the model for each tool, and allows built models to be shared between different research groups. However, the SBML standard is not easily understood by biologists who create mechanistic models, and thus requires an interface that allows biologists to focus on modeling tasks while hiding the details of the SBML language.
To this end, the contributions of the work presented in this paper include:
- A representation format that is straightforward to use by both machines and humans, and allows for efficient synthesis of models from big data in literature.
- An approach to effectively use state-of-the-art machine reading output to create executable discrete models of cellular signaling.
- A proposal for directions to further improve automation of assembly of models from big data in literature.
In Sect. 2, we briefly describe cellular networks, our modeling approach, and our framework that integrates machine reading, model assembly, and model analysis. Section 3 presents the model representation format, and Sect. 4 outlines our approach to translating reading output into that format. The subsequent sections discuss other issues that need to be taken into account when building an interface between big data reading and model assembly in biology, describe a case study that uses our translation methodology, and conclude the paper.
2 Background
2.1 Cellular Networks
Intra-cellular networks include signal transduction, gene regulation, and metabolic networks. Signaling networks are characterized by protein phosphorylation and binding events, which transduce extracellular signals across the plasma membrane and through the cytoplasm. Gene regulatory networks involve translocation of signaling proteins from the cytoplasm to the nucleus, where the integration of these protein signals acts on the genome, resulting in changes in gene expression and cellular processes. The regulation of metabolic networks incorporates phosphorylation and binding, as do signaling networks, and also integrates allosteric regulation, other protein modifications, and subcellular compartmentalization.
Recipes for Translating Big Data Machine Reading
3
Inter-cellular networks assume interactions between cells of the same or different types. These interactions occur via signaling molecules such as growth factors and cytokines, synthesized and secreted by one cell, and bound to itself or other cells in its surroundings, or via a cell-cell contact.
At all levels of signaling, there are feedforward and feedback loops and crosstalk between signaling pathways to either maintain homeostasis or amplify changes initiated by extracellular signals.
2.2 Modeling Approach
When generating executable models, we use a discrete modeling approach described previously. In the toy example shown in Fig. 1, we represent system components as model elements (A, B, and C in the example), where each element is defined as having a discrete number of levels of activity. Each element has a list of regulators called its influence set. In our example, A is a positive regulator of C, B and C are positive regulators of A, and C activates itself while B inhibits itself. Additionally, each element has a corresponding update rule, a discrete function of its regulators. In our example, A is a conjunction of B and C, while C is a disjunction of A and C. Although the model structure is fixed, the simulator that we use is stochastic and thus allows for closely recapitulating the behavior of biological pathways and networks.
Fig. 1. Toy example illustrating our modeling approach.
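To make the toy example concrete, the following is a minimal sketch of how such a model could be simulated, restricted to two activity levels (0 and 1). The update rules for A and C follow the text; the rule for B (negation of itself) and the update scheme (one randomly chosen element updated per step) are assumptions made for illustration and are not taken from the simulator used in this work. The names RULES and simulate are hypothetical.

```python
import random

# Update rules for the toy model with two activity levels (0 = low, 1 = high).
# A and C follow the text: A = B AND C, C = A OR C.
# The rule for B is an assumption: B inhibits itself, so B = NOT B.
RULES = {
    "A": lambda s: s["B"] and s["C"],
    "B": lambda s: not s["B"],
    "C": lambda s: s["A"] or s["C"],
}


def simulate(initial, steps, seed=0):
    """Return the list of states visited over `steps` stochastic updates.

    Assumed scheme: at each step one element is picked uniformly at random
    and updated from the current state (random-order asynchronous updates).
    """
    rng = random.Random(seed)
    state = dict(initial)
    trajectory = [dict(state)]
    for _ in range(steps):
        element = rng.choice(sorted(RULES))
        state[element] = int(RULES[element](state))
        trajectory.append(dict(state))
    return trajectory


if __name__ == "__main__":
    for step, state in enumerate(simulate({"A": 0, "B": 1, "C": 1}, steps=5)):
        print(step, state)
```

The model structure (the RULES table) never changes between runs; only the order in which elements are updated varies with the random seed, which is one simple way a fixed structure can yield stochastic trajectories.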
2.3 Framework Overview
To automatically incorporate new reading outputs into models, we have developed a reading-modeling-explanation framework, called DySE (Dynamic System Explanation), outlined in Fig. 2. This framework allows for (i) expansion of existing models or assembly of new models from machine reading output, (ii) analysis and explanation of models, and (iii) generation of machine-readable feedback to reading engines. We focus here on the front end of the framework, the translation from reading outputs to the list of elements and their influence sets, with context information, where available.
3 Model Representation Format
To enable comprehensive translation from reading engine outputs to executable models, the models are first represented in a tabular format. It is important to note here that the tabular representation does not include final update rules, that is, the tabular version of the model is further translated into an executable model that can be simulated. Each row in the model table corresponds to one specific model element (i.e., modeled system component), and the columns are organized in several groups: (i) information about the modeled system component, (ii) information about the component’s regulators, and (iii) information about knowledge sources. This format enables straightforward model extension: additional system components are represented as new rows in the table, and additional component-related features as new columns. The addition of new columns occurs with improvements in machine reading.
Fig. 2. DySE framework.
The first group of fields in our representation format includes system component-related information. This information is either used by the executable model, or kept as background information to provide specific details about the system component when creating a hypothesis or explaining outcomes of wet lab experiments.
A. Name – full name of element, e.g., “Epidermal growth factor receptor”.
B. Nomenclature ID – name commonly used in the field for cellular components, e.g., “EGFR” is used for “Epidermal growth factor receptor”.
C. Type – types of entities used by reading engines, as listed in Table 1.
D. Unique ID – we use identifiers corresponding to elements that are listed in databases, according to Table 1.
E. Location – we include subcellular locations and the extracellular space, as listed in Table 2.
F. Location identifier – we use location identifiers as listed in Table 2.
G. Cell line – obtained from reading output.
H. Cell type – obtained from reading output.
Table 1. Element type and ID database.
Element type          Database name
Protein               UniProt
Protein family        Pfam
Protein complex       Bioentities
Chemical              PubChem
Gene                  HGNC
Biological process    GO

Table 2. The list of cellular locations and their IDs from the Gene Ontology database.
Location name            Location ID
Cytoplasm                GO:0005737
Cytosol                  GO:0005829
Plasma membrane          GO:0005886
Nucleus                  GO:0005634
Mitochondria             GO:0005739
Extracellular            GO:0005576
Endoplasmic reticulum    GO:0005783
I. Tissue type – obtained from reading output.
J. Organism – obtained from reading output.
K. Executable model variable – variable names currently include the above-described fields B, C, E, and H.
The second group of fields in our representation includes component regulator-related information that is mainly used by executable models, with a few fields used for bookkeeping, similar to the first group of fields.
L. Positive regulator nomenclature IDs – list of positive regulators of the element.
M. Negative regulator nomenclature IDs – list of negative regulators of the element.
N. Interaction type – for each listed regulator, whether the interaction is direct or indirect, in case this is known.
O. Interaction mechanism – for each known direct interaction, the mechanism of interaction, if it is known. Mechanisms that can be obtained from reading engines are listed in Table 3.
P. Interaction score – for each interaction, a confidence score obtained from reading.
The third group of fields in our representation includes interaction-related provenance information.
Q. Reference paper IDs – for each interaction, we list IDs of published papers that mention the interaction. This information is obtained directly from reading output.
R. Sentences – for each interaction, we list sentences describing the interaction. This information is obtained directly from reading output.
It is worth mentioning that this representation format can be converted into the SBML format to be used by different software tools and shared between different working groups. Additionally, the tabular format provides an interface that can be easily created or read by biologists, and generated or parsed by a machine.
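As an illustration of the tabular format, the sketch below models one table row as a Python dataclass covering a subset of the fields above. The class name ModelElement, the attribute names, and the way the executable variable name (field K) is assembled from fields B, C, E, and H (underscore-joined, spaces removed, empty fields skipped) are assumptions chosen for this example; the text only states which fields the variable name includes, not how they are combined.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelElement:
    """One row of the model table (subset of fields A-R; names are illustrative)."""
    name: str                      # A. Name
    nomenclature_id: str           # B. Nomenclature ID
    element_type: str              # C. Type (Table 1)
    unique_id: str                 # D. Unique ID (database per Table 1)
    location: str                  # E. Location (Table 2)
    location_id: str               # F. Location identifier (Table 2)
    cell_line: str = ""            # G. Cell line
    cell_type: str = ""            # H. Cell type
    positive_regulators: List[str] = field(default_factory=list)  # L.
    negative_regulators: List[str] = field(default_factory=list)  # M.

    def variable_name(self) -> str:
        # K. Executable model variable, built from fields B, C, E, and H.
        # Assumed convention: join with underscores, drop spaces and empty fields.
        parts = [self.nomenclature_id, self.element_type,
                 self.location, self.cell_type]
        return "_".join(p.replace(" ", "") for p in parts if p)


# Illustrative row for EGFR at the plasma membrane, regulated positively by EGF.
egfr = ModelElement(
    name="Epidermal growth factor receptor",
    nomenclature_id="EGFR",
    element_type="Protein",
    unique_id="P00533",            # UniProt accession for human EGFR
    location="Plasma membrane",
    location_id="GO:0005886",
    positive_regulators=["EGF"],
)
print(egfr.variable_name())
```

Keeping location and cell type in the variable name means that the same protein in two compartments or two cell types becomes two distinct model elements, which is consistent with each row describing one specific modeled component.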
4 From Reading to Model
We obtain outputs from three types of reading engines, namely REACH, RUBICON, and LTR, which produce output files with similar but not exactly the same format. In Table 3, we list the interaction mechanisms that can be obtained from these three reading engines, and in the following sub-sections we outline their differences and the advantages of each reading engine.
Table 3. Intracellular interactions (mechanisms) recognized by the three reading engines.
Reading engine    Recognized mechanisms
REACH             Activation, Inhibition, Binding, Phosphorylation, Dephosphorylation,
                  Ubiquitination, Acetylation, Methylation, Increase or Decrease Amount,
                  Transcription, Translocation
RUBICON           Activation, Inhibition, Promotes, Signaling, Reduce, Induce, Supports,
                  Attenuates, Stimulate, Antagonize, Synergize, Increase and Decrease Amount,
                  Abrogates
LTR               Binding, Phosphorylation, Dephosphorylation, Isomerizations
4.1 Simple Interaction Translation
The first type of reading engine, REACH, can extract both direct and indirect interactions, as well as interaction mechanisms, where available. The simplest and most common reading outputs are those that include only a regulated element and a single regulator, each of them having one of the entity types listed in Table 1, with the interaction mechanism being one of the mechanisms described in Table 3. Such interactions have a straightforward translation to our representation format, that is, they are translated into a single table row with some or all of the fields described in Sect. 3. Given that our modeling formalism accounts for positive and negative regulators, while reading engines can also output specific mechanisms where available in text, we assume in the translation that Phosphorylation, Acetylation, Increase Amount, and Methylation represent positive regulations, and Dephosphorylation, Ubiquitination, Decrease Amount, and Demethylation represent negative regulations. Additionally, we treat Transcription events as positive regulation.
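The sign assumptions above can be sketched as a lookup from mechanism to regulation sign, followed by the creation of a single table row. This is an illustrative sketch, not the translation code used in this work: the function name translate_simple, the row keys, and the element names in the usage example are hypothetical, and mapping Activation to positive and Inhibition to negative is an added assumption. Mechanisms without a stated sign (for example Binding) are rejected here rather than guessed.

```python
# Mechanisms with a sign stated in the text.
POSITIVE = {"Phosphorylation", "Acetylation", "Increase Amount",
            "Methylation", "Transcription"}
NEGATIVE = {"Dephosphorylation", "Ubiquitination", "Decrease Amount",
            "Demethylation"}

# Assumption (not stated in the text): generic Activation and Inhibition
# events are positive and negative regulation, respectively.
POSITIVE = POSITIVE | {"Activation"}
NEGATIVE = NEGATIVE | {"Inhibition"}


def translate_simple(regulated, regulator, mechanism):
    """Translate one single-regulator interaction into one table row."""
    row = {
        "nomenclature_id": regulated,      # field B
        "positive_regulators": [],         # field L
        "negative_regulators": [],         # field M
        "interaction_mechanism": mechanism,  # field O
    }
    if mechanism in POSITIVE:
        row["positive_regulators"].append(regulator)
    elif mechanism in NEGATIVE:
        row["negative_regulators"].append(regulator)
    else:
        # No sign is defined for this mechanism in this sketch.
        raise ValueError(f"no regulation sign defined for {mechanism!r}")
    return row


# Hypothetical reading output: MEK phosphorylates ERK.
print(translate_simple("ERK", "MEK", "Phosphorylation"))
```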
4.2 Translation of Translocation Interaction
We translate translocation events (moving components from one cellular location to another) using the formalism described in