
Yi Cai · Yoshiharu Ishikawa · Jianliang Xu (Eds.)

  LNCS 10987

Web and Big Data

  Second International Joint Conference, APWeb-WAIM 2018 Macau, China, July 23–25, 2018 Proceedings, Part I

  

Lecture Notes in Computer Science 10987

  Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

  Editorial Board

  David Hutchison Lancaster University, Lancaster, UK

  Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA

  Josef Kittler University of Surrey, Guildford, UK

  Jon M. Kleinberg Cornell University, Ithaca, NY, USA

  Friedemann Mattern ETH Zurich, Zurich, Switzerland

  John C. Mitchell Stanford University, Stanford, CA, USA

  Moni Naor Weizmann Institute of Science, Rehovot, Israel

  C. Pandu Rangan Indian Institute of Technology Madras, Chennai, India

  Bernhard Steffen TU Dortmund University, Dortmund, Germany

  Demetri Terzopoulos University of California, Los Angeles, CA, USA

  Doug Tygar University of California, Berkeley, CA, USA

Gerhard Weikum Max Planck Institute for Informatics, Saarbrücken, Germany

Yi Cai · Yoshiharu Ishikawa · Jianliang Xu (Eds.)

  Web and Big Data

Second International Joint Conference, APWeb-WAIM 2018
Macau, China, July 23–25, 2018
Proceedings, Part I

Editors
Yi Cai, South China University of Technology, Guangzhou, China
Yoshiharu Ishikawa, Nagoya University, Nagoya, Japan
Jianliang Xu, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China

ISSN 0302-9743    ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-96889-6    ISBN 978-3-319-96890-2 (eBook)
https://doi.org/10.1007/978-3-319-96890-2

Library of Congress Control Number: 2018948814

LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer International Publishing AG, part of Springer Nature 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG

  

Preface

This volume (LNCS 10987) and its companion volume (LNCS 10988) contain the proceedings of the second Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data, called APWeb-WAIM. This joint conference aims to attract participants from different scientific communities as well as from industry, and not merely from the Asia-Pacific region, but also from other continents. The objective is to enable the sharing and exchange of ideas, experiences, and results in the areas of the World Wide Web and big data, thus covering Web technologies, database systems, information management, software engineering, and big data.

The second APWeb-WAIM conference was held in Macau during July 23–25, 2018. As an Asia-Pacific flagship conference focusing on research, development, and applications in relation to Web information management, APWeb-WAIM builds on the successes of APWeb and WAIM: APWeb was previously held in Beijing (1998), Hong Kong (1999), Xi’an (2000), Changsha (2001), Xi’an (2003), Hangzhou (2004), Shanghai (2005), Harbin (2006), Huangshan (2007), Shenyang (2008), Suzhou (2009), Busan (2010), Beijing (2011), Kunming (2012), Sydney (2013), Changsha (2014), Guangzhou (2015), and Suzhou (2016); and WAIM was held in Shanghai (2000), Xi’an (2001), Beijing (2002), Chengdu (2003), Dalian (2004), Hangzhou (2005), Hong Kong (2006), Huangshan (2007), Zhangjiajie (2008), Suzhou (2009), Jiuzhaigou (2010), Wuhan (2011), Harbin (2012), Beidaihe (2013), Macau (2014), Qingdao (2015), and Nanchang (2016). The first joint APWeb-WAIM conference was held in Beijing (2017). With the fast development of Web-related technologies, we expect that APWeb-WAIM will become an increasingly popular forum that brings together outstanding researchers and developers in the field of the Web and big data from around the world.

The high-quality program documented in these proceedings would not have been possible without the authors who chose APWeb-WAIM for disseminating their findings. Out of 168 submissions, the conference accepted 39 regular research papers (23.21%), 31 short research papers, and six demonstrations. The contributed papers address a wide range of topics, such as text analysis, graph data processing, social networks, recommender systems, information retrieval, data streams, knowledge graphs, data mining and applications, query processing, machine learning, database and Web applications, big data, and blockchain.

The technical program also included keynotes by Prof. Xuemin Lin (The University of New South Wales, Australia), Prof. Lei Chen (The Hong Kong University of Science and Technology, Hong Kong, SAR China), and Prof. Ninghui Li (Purdue University, USA), as well as industrial invited talks by Dr. Zhao Cao (Huawei Blockchain) and Jun Yan (YiDu Cloud). We are grateful to these distinguished scientists for their invaluable contributions to the conference program.

As a joint conference, teamwork was particularly important for the success of APWeb-WAIM. We are deeply thankful to the Program Committee members and the external reviewers for lending their time and expertise to the conference. Special thanks go to the local Organizing Committee led by Prof. Zhiguo Gong.

Thanks also go to the workshop co-chairs (Leong Hou U and Haoran Xie), demo co-chairs (Zhixu Li, Zhifeng Bao, and Lisi Chen), industry co-chair (Wenyin Liu), tutorial co-chair (Jian Yang), panel chair (Kamal Karlapalem), local arrangements chair (Derek Fai Wong), and publicity co-chairs (An Liu, Feifei Li, Wen-Chih Peng, and Ladjel Bellatreche). Their efforts were essential to the success of the conference. Last but not least, we wish to express our gratitude to the treasurer (Andrew Shibo Jiang) and the Webmaster (William Sio) for all their hard work, and to our sponsors who generously supported the smooth running of the conference. We hope you enjoy the exciting program of APWeb-WAIM 2018 as documented in these proceedings.

June 2018

Yi Cai
Yoshiharu Ishikawa
Jianliang Xu

Organization

Organizing Committee

Honorary Chair
Lionel Ni, University of Macau, SAR China

General Co-chairs
Zhiguo Gong, University of Macau, SAR China
Qing Li, City University of Hong Kong, SAR China
Kam-fai Wong, Chinese University of Hong Kong, SAR China

Program Co-chairs
Yi Cai, South China University of Technology, China
Yoshiharu Ishikawa, Nagoya University, Japan
Jianliang Xu, Hong Kong Baptist University, SAR China

Workshop Chairs
Leong Hou U, University of Macau, SAR China
Haoran Xie, Education University of Hong Kong, SAR China

Demo Co-chairs
Zhixu Li, Soochow University, China
Zhifeng Bao, RMIT, Australia
Lisi Chen, Wollongong University, Australia

Tutorial Chair
Jian Yang, Macquarie University, Australia

Industry Chair
Wenyin Liu, Guangdong University of Technology, China

Panel Chair
Kamal Karlapalem, IIIT, Hyderabad, India

Publicity Co-chairs
An Liu, Soochow University, China
Feifei Li, University of Utah, USA
Wen-Chih Peng, National Taiwan University, China
Ladjel Bellatreche, ISAE-ENSMA, Poitiers, France

Treasurers
Leong Hou U, University of Macau, SAR China
Andrew Shibo Jiang, Macau Convention and Exhibition Association, SAR China

Local Arrangements Chair
Derek Fai Wong, University of Macau, SAR China

Webmaster
William Sio, University of Macau, SAR China

  Senior Program Committee

Bin Cui, Peking University, China
Byron Choi, Hong Kong Baptist University, SAR China
Christian Jensen, Aalborg University, Denmark
Demetrios Zeinalipour-Yazti, University of Cyprus, Cyprus
Feifei Li, University of Utah, USA
Guoliang Li, Tsinghua University, China
K. Selçuk Candan, Arizona State University, USA
Kyuseok Shim, Seoul National University, South Korea
Makoto Onizuka, Osaka University, Japan
Reynold Cheng, The University of Hong Kong, SAR China
Toshiyuki Amagasa, University of Tsukuba, Japan
Walid Aref, Purdue University, USA
Wang-Chien Lee, Pennsylvania State University, USA
Wen-Chih Peng, National Chiao Tung University, Taiwan
Wook-Shin Han, Pohang University of Science and Technology, South Korea
Xiaokui Xiao, National University of Singapore, Singapore
Ying Zhang, University of Technology Sydney, Australia

  Program Committee

Alex Thomo, University of Victoria, Canada
An Liu, Soochow University, China
Baoning Niu, Taiyuan University of Technology, China
Bin Yang, Aalborg University, Denmark
Bo Tang, Southern University of Science and Technology, China
Zouhaier Brahmia, University of Sfax, Tunisia
Carson Leung, University of Manitoba, Canada
Chih-Chien Hung, Tamkang University, China
Chih-Hua Tai, National Taipei University, China
Cuiping Li, Renmin University of China, China
Daniele Riboni, University of Cagliari, Italy
Defu Lian, Big Data Research Center, University of Electronic Science and Technology of China, China
Dejing Dou, University of Oregon, USA
Dimitris Sacharidis, Technische Universität Wien, Austria
Ganzhao Yuan, Sun Yat-sen University, China
Giovanna Guerrini, Università di Genova, Italy
Guanfeng Liu, The University of Queensland, Australia
Guoqiong Liao, Jiangxi University of Finance and Economics, China
Guanling Lee, National Dong Hwa University, China
Haibo Hu, Hong Kong Polytechnic University, SAR China
Hailong Sun, Beihang University, China
Han Su, University of Southern California, USA
Haoran Xie, The Education University of Hong Kong, SAR China
Hiroaki Ohshima, University of Hyogo, Japan
Hong Chen, Renmin University of China, China
Hongyan Liu, Tsinghua University, China
Hongzhi Wang, Harbin Institute of Technology, China
Hongzhi Yin, The University of Queensland, Australia
Hua Wang, Victoria University, Australia
Ilaria Bartolini, University of Bologna, Italy
James Cheng, Chinese University of Hong Kong, SAR China
Jeffrey Xu Yu, Chinese University of Hong Kong, SAR China
Jiajun Liu, Renmin University of China, China
Jialong Han, Nanyang Technological University, Singapore
Jianbin Huang, Xidian University, China
Jian Yin, Sun Yat-sen University, China
Jiannan Wang, Simon Fraser University, Canada
Jianting Zhang, City College of New York, USA
Jianxin Li, Beihang University, China
Jianzhong Qi, University of Melbourne, Australia
Jinchuan Chen, Renmin University of China, China
Ju Fan, Renmin University of China, China
Jun Gao, Peking University, China
Junhu Wang, Griffith University, Australia
Kai Zeng, Microsoft, USA
Kai Zheng, University of Electronic Science and Technology of China, China
Karine Zeitouni, Université de Versailles Saint-Quentin, France
Lei Zou, Peking University, China
Leong Hou U, University of Macau, SAR China
Liang Hong, Wuhan University, China
Lisi Chen, Wollongong University, Australia
Lu Chen, Aalborg University, Denmark
Maria Damiani, University of Milan, Italy
Markus Endres, University of Augsburg, Germany
Mihai Lupu, Vienna University of Technology, Austria
Mirco Nanni, ISTI-CNR Pisa, Italy
Mizuho Iwaihara, Waseda University, Japan
Peiquan Jin, University of Science and Technology of China, China
Peng Wang, Fudan University, China
Qin Lu, University of Technology Sydney, Australia
Ralf Hartmut Güting, Fernuniversität in Hagen, Germany
Raymond Chi-Wing Wong, Hong Kong University of Science and Technology, SAR China
Ronghua Li, Shenzhen University, China
Rui Zhang, University of Melbourne, Australia
Sanghyun Park, Yonsei University, South Korea
Sanjay Madria, Missouri University of Science and Technology, USA
Shaoxu Song, Tsinghua University, China
Shengli Wu, Jiangsu University, China
Shimin Chen, Chinese Academy of Sciences, China
Shuai Ma, Beihang University, China
Shuo Shang, King Abdullah University of Science and Technology, Saudi Arabia
Takahiro Hara, Osaka University, Japan
Tieyun Qian, Wuhan University, China
Tingjian Ge, University of Massachusetts, Lowell, USA
Tom Z. J. Fu, Advanced Digital Sciences Center, Singapore
Tru Cao, Ho Chi Minh City University of Technology, Vietnam
Vincent Oria, New Jersey Institute of Technology, USA
Wee Ng, Institute for Infocomm Research, Singapore
Wei Wang, University of New South Wales, Australia
Weining Qian, East China Normal University, China
Weiwei Sun, Fudan University, China
Wen Zhang, Wuhan University, China
Wolf-Tilo Balke, Technische Universität Braunschweig, Germany
Wookey Lee, Inha University, South Korea
Xiang Zhao, National University of Defence Technology, China
Xiang Lian, Kent State University, USA
Xiangliang Zhang, King Abdullah University of Science and Technology, Saudi Arabia
Xiangmin Zhou, RMIT University, Australia
Xiaochun Yang, Northeast University, China
Xiaofeng He, East China Normal University, China
Xiaohui (Daniel) Tao, The University of Southern Queensland, Australia
Xiaoyong Du, Renmin University of China, China
Xin Cao, The University of New South Wales, Australia
Xin Huang, Hong Kong Baptist University, SAR China
Xin Wang, Tianjin University, China
Xingquan Zhu, Florida Atlantic University, USA
Xuan Zhou, Renmin University of China, China
Yafei Li, Zhengzhou University, China
Yanghua Xiao, Fudan University, China
Yanghui Rao, Sun Yat-sen University, China
Yang-Sae Moon, Kangwon National University, South Korea
Yaokai Feng, Kyushu University, Japan
Yi Cai, South China University of Technology, China
Yijie Wang, National University of Defense Technology, China
Yingxia Shao, Peking University, China
Yongxin Tong, Beihang University, China
Yu Gu, Northeastern University, China
Yuan Fang, Institute for Infocomm Research, Singapore
Yunjun Gao, Zhejiang University, China
Zakaria Maamar, Zayed University, United Arab Emirates
Zhaonian Zou, Harbin Institute of Technology, China
Zhiwei Zhang, Hong Kong Baptist University, SAR China

Keynotes

  

Graph Processing: Applications, Challenges, and Advances

  Xuemin Lin

  

School of Computer Science and Engineering,

University of New South Wales, Sydney

lxue@cse.unsw.edu.au

Abstract. Graph data are key parts of Big Data and widely used for modelling complex structured data with a broad spectrum of applications. Over the last decade, tremendous research efforts have been devoted to many fundamental problems in managing and analyzing graph data. In this talk, I will cover various applications, challenges, and recent advances. We will also look to the future of the area.

  

Differential Privacy in the Local Setting

  Ninghui Li

  

Department of Computer Sciences, Purdue University

ninghui@cs.purdue.edu

Abstract. Differential privacy has been increasingly accepted as the de facto standard for data privacy in the research community. Recently, techniques for satisfying differential privacy (DP) in the local setting, which we call LDP, have been deployed. Such techniques enable the gathering of statistics while preserving the privacy of every user, without relying on trust in a single data curator. Companies such as Google, Apple, and Microsoft have deployed techniques for collecting user data while satisfying LDP. In this talk, we will discuss the state of the art of LDP. We survey recent developments for LDP, and discuss protocols for estimating frequencies of different values under LDP, and for computing marginals when each user has multiple attributes. Finally, we discuss limitations and open problems of LDP.

  

Big Data, AI, and HI, What is the Next?

  Lei Chen

  

Department of Computer Science and Engineering, Hong Kong University

of Science and Technology

leichen@cse.ust.hk

Abstract. Recently, AI has become quite popular and attractive, not only to academia but also to industry. The success stories of AI on Alpha-go and Texas hold ’em games have raised significant public interest in AI. Meanwhile, human intelligence is turning out to be more sophisticated, and Big Data technology is everywhere to improve our life quality. The question we all want to ask is “what is the next?”. In this talk, I will discuss DHA, a new computing paradigm, which combines big Data, Human intelligence, and AI. First, I will briefly explain the motivation of DHA. Then I will present some challenges and possible solutions to build this new paradigm.

  

Contents – Part I

Gang Chen, Yue Peng, and Chongjun Wang
Zenan Xu, Yetao Fu, Xingming Chen, Yanghui Rao, Haoran Xie, Fu Lee Wang, and Yang Peng
Xuewen Shi, Heyan Huang, Ping Jian, and Yi-Kun Tang
Jianjun Cheng, Longjie Li, Haijuan Yang, Qi Li, and Xiaoyun Chen
Jian Xu, Xiaoyi Fu, Liming Tu, Ming Luo, Ming Xu, and Ning Zheng
Menghao Zhang, Binbin Hu, Chuan Shi, Bin Wu, and Bai Wang
Zhijian Zhang, Ling Liu, Kun Yue, and Weiyi Liu
Guowang Du, Lihua Zhou, Lizhen Wang, and Hongmei Chen
Cairong Yan, Yan Huang, Qinglong Zhang, and Yan Wan
Zhang Chuanyan, Hong Xiaoguang, and Peng Zhaohui
Xiaotian Han, Chuan Shi, Lei Zheng, Philip S. Yu, Jianxin Li, and Yuanfu Lu
Feifei Li, Hongyan Liu, Jun He, and Xiaoyong Du
Hongzhi Liu, Yingpeng Du, and Zhonghai Wu
Zhenhua Huang, Chang Yu, Jiujun Cheng, and Zhixiao Wang
Mohammad Hossein Namaki, Yinghui Wu, and Xin Zhang
Xin Ding, Yuanliang Zhang, Lu Chen, Yunjun Gao, and Baihua Zheng
Yan Fan, Xinyu Liu, Shuni Gao, Zhaohua Zhang, Xiaoguang Liu, and Gang Wang
Yang Song, Yu Gu, and Ge Yu
Zhongmin Zhang, Jiawei Chen, and Shengli Wu
Anzhen Zhang, Jinbao Wang, Jianzhong Li, and Hong Gao
Wenjing Wei, Xiaoyi Jia, Yang Liu, and Xiaohui Yu
Yi Zhao, Yong Huang, and Yanyan Shen
Linlin Gao, Haiwei Pan, Fujun Liu, Xiaoqin Xie, Zhiqiang Zhang, Jinming Han, and the Alzheimer’s Disease Neuroimaging Initiative
Bofang Li, Tao Liu, Zhe Zhao, and Xiaoyong Du
Fan Feng, Jikai Wu, Wei Sun, Yushuang Wu, HuaKang Li, and Xingguo Chen
Zherong Zhang, Wenge Rong, Yuanxin Ouyang, and Zhang Xiong
Jianhui Ding, Shiheng Ma, Weijia Jia, and Minyi Guo
Yongjian You, Shaohua Zhang, Jiong Lou, Xinsong Zhang, and Weijia Jia
Zichao Huang, Bo Li, and Jian Yin
Qiang Xu, Xin Wang, Jianxin Li, Ying Gan, Lele Chai, and Junhu Wang
Yongxin Shen, Zhixu Li, Wenling Zhang, An Liu, and Xiaofang Zhou
Guozheng Rao, Bo Zhao, Xiaowang Zhang, Zhiyong Feng, and Guohui Xiao
Peizhong Yang, Tao Zhang, and Lizhen Wang
Yuan Fang, Lizhen Wang, Teng Hu, and Xiaoxuan Wang
Zichen Wang, Tian Li, Yingxia Shao, and Bin Cui
Xiaoli Wang, Chuchu Gao, Jiangjiang Cao, Kunhui Lin, Wenyuan Du, and Zixiang Yang
Chaozhou Yang, Xin Wang, Qiang Xu, and Weixi Li
Yajie Zhu, Feng Xiong, Qing Xie, Lin Li, and Yongjian Liu

  

Contents – Part II

He Chen, Xiuxia Tian, and Cheqing Jin
Jian Li, An Liu, Weiqi Wang, Zhixu Li, Guanfeng Liu, Lei Zhao, and Kai Zheng
Huan Zhou, Jinwei Guo, Ouya Pei, Weining Qian, Xuan Zhou, and Aoying Zhou
Xiaolan Zhang, Yeting Li, Fei Tian, Fanlin Cui, Chunmei Dong, and Haiming Chen
Wei Chit Tan
Jiaxun Hua, Yu Liu, Yibin Shen, Xiuxia Tian, and Cheqing Jin
Rong Kang, Chen Wang, Peng Wang, Yuting Ding, and Jianmin Wang
Guiling Wang, Xiaojiang Zuo, Marc Hesenius, Yao Xu, Yanbo Han, and Volker Gruhn
Wentao Wang, Chunqiu Zeng, and Tao Li
Hongbo Sun, Chenkai Guo, Jing Xu, Jingwen Zhu, and Chao Zhang
Rong Liu, Guanglin Cong, Bolong Zheng, Kai Zheng, and Han Su
Wentao Wang, Bo Tang, and Min Zhu
Na Ta, Jiuqi Wang, and Guoliang Li
Chen Yang, Wei Chen, Bolong Zheng, Tieke He, Kai Zheng, and Han Su
Xibo Zhou, Qiong Luo, Dian Zhang, and Lionel M. Ni
Minxi Li, Jiali Mao, Xiaodong Qi, Peisen Yuan, and Cheqing Jin
Meiling Zhu, Chen Liu, and Yanbo Han
Zhaoan Dong, Ju Fan, Jiaheng Lu, Xiaoyong Du, and Tok Wang Ling
Guohai Xu, Chengyu Wang, and Xiaofeng He
Haifeng Zhu, Pengpeng Zhao, Zhixu Li, Jiajie Xu, Lei Zhao, and Victor S. Sheng
Yuan Fang, Lizhen Wang, and Teng Hu
Lihua Zhou, Guowang Du, Qing Xiao, and Lizhen Wang
Xin Ding, Yuanliang Zhang, Lu Chen, Keyu Yang, and Yunjun Gao
Muxi Leng, Yajun Yang, Junhu Wang, Qinghua Hu, and Xin Wang
Guohui Li, Qi Chen, Bolong Zheng, and Xiaosong Zhao
Wenyan Chen, Zheng Liu, Wei Shi, and Jeffrey Xu Yu
Zhefan Zhong, Xin Lin, Liang He, and Yan Yang
Yifeng Luo, Junshi Guo, and Shuigeng Zhou
Yin Zhang, Huiping Liu, Cheqing Jin, and Ye Guo
Kun Hao, Junchang Xin, Zhiqiong Wang, Zhuochen Jiang, and Guoren Wang
An Zhang and Kunlong Zhang
Dayu Jia, Junchang Xin, Zhiqiong Wang, Wei Guo, and Guoren Wang
Wen Zhao and Xiaoying Wu

  Text Analysis

  

Abstractive Summarization with the Aid of Extractive Summarization

Yangbin Chen, Yun Ma, Xudong Mao, and Qing Li

City University of Hong Kong, Hong Kong SAR, China
{robinchen2-c,yunma3-c,xdmao2-c}@my.cityu.edu.hk, qing.li@cityu.edu.hk

Abstract. Currently the abstractive method and the extractive method are the two main approaches for automatic document summarization. To fully integrate the relatedness and advantages of both approaches, we propose in this paper a general framework for abstractive summarization which incorporates extractive summarization as an auxiliary task. In particular, our framework is composed of a shared hierarchical document encoder, an attention-based decoder for abstractive summarization, and an extractor for sentence-level extractive summarization. Learning these two tasks jointly with the shared encoder allows us to better capture the semantics in the document. Moreover, we constrain the attention learned in the abstractive task by the salience estimated in the extractive task to strengthen their consistency. Experiments on the CNN/DailyMail dataset demonstrate that both the auxiliary task and the attention constraint contribute to improving the performance significantly, and our model is comparable to the state-of-the-art abstractive models.

Keywords: Abstractive document summarization · Sequence-to-sequence · Joint learning

1 Introduction

Automatic document summarization has been studied for decades. The target of document summarization is to generate a shorter passage from the document in a grammatically and logically coherent way, meanwhile preserving the important information. There are two main approaches for document summarization: extractive summarization and abstractive summarization. The extractive method first extracts salient sentences or phrases from the source document and then groups them to produce a summary without changing the source text. Graph-based ranking models are typical models for extractive summarization. However, the extractive method unavoidably includes secondary or redundant information and is far from the way humans write summaries.


The abstractive method, in contrast, produces generalized summaries, conveying information in a concise way, and eliminating the limitations to the original words and sentences of the document. This task is more challenging since it needs advanced language generation and compression techniques. Discourse structures are most commonly used by researchers for generating abstractive summaries.

Recently, the Recurrent Neural Network (RNN)-based sequence-to-sequence model with attention mechanism has been applied to abstractive summarization, due to its great success in machine translation. However, there are still some challenges. First, the RNN-based models have difficulties in capturing long-term dependencies, making summarization of long documents much tougher. Second, different from machine translation, which has strong correspondence between the source and target words, an abstractive summary corresponds to only a small part of the source document, making its attention difficult to learn.

We adopt hierarchical approaches for the long-term dependency problem, which have been used in many tasks such as machine translation and document classification, but few of them have been applied to abstractive summarization. In particular, we encode the input document in a hierarchical way from the word level to the sentence level. There are two advantages. First, it captures both the local and global semantic representations, resulting in better feature learning. Second, it improves the training efficiency because the time complexity of the RNN-based model can be reduced by splitting the long document into short sentences.

The attention mechanism is widely used in sequence-to-sequence tasks. However, for abstractive summarization, it is difficult to learn the attention since only a small part of the source document is important to the summary. In this paper, we propose two methods to learn a better attention distribution. First, we use a hierarchical attention mechanism, which means that attention is applied at both the word and sentence levels. Similar to the hierarchical approach in encoding, the advantage of using hierarchical attention is to capture both the local and global semantic representations. Second, we use the salience scores of the auxiliary task (i.e., the extractive summarization) to constrain the sentence-level attention.

In this paper, we present a novel technique for abstractive summarization which incorporates extractive summarization as an auxiliary task. Our framework consists of three parts: a shared document encoder, a hierarchical attention-based decoder, and an extractor. As shown in Fig. 1, the document is first encoded hierarchically (Fig. 1 (1) and (2)) in order to address the long-term dependency problem. Then the learned document representations are shared by the extractor (Fig. 1 (3)) and the attention-based decoder (Fig. 1 (4) and (5)). The extractor and the decoder are jointly trained, which can capture better semantics of the document. Furthermore, as both the sentence salience scores in the extractor and the sentence-level attention in the decoder indicate the importance of source sentences, we constrain the learned attention (Fig. 1 (4)) with the extracted sentence salience in order to strengthen their consistency.

Fig. 1. General framework of our proposed model with 5 components: (1) word-level encoder encodes the sentences word-by-word independently, (2) sentence-level encoder encodes the document sentence-by-sentence, (3) sentence extractor makes binary classification for each sentence, (4) hierarchical attention calculates the word-level and sentence-level context vectors for decoding steps, (5) decoder decodes the output sequential word sequence with a beam-search algorithm.

We have conducted experiments on a news corpus, the CNN/DailyMail dataset. The results demonstrate that adding the auxiliary extractive task and constraining the attention are both useful to improve the performance of the abstractive task, and our proposed joint model is comparable to the state-of-the-art abstractive models.

2 Neural Summarization Model

In this section we describe the framework of our proposed model, which consists of five components. As illustrated in Fig. 1, the hierarchical document encoder, which includes both the word-level and the sentence-level encoders, reads the input word sequences and generates shared document representations. On one hand, the shared representations are fed into the sentence extractor, which is a sequence labeling model that calculates salience scores. On the other hand, the representations are used to generate abstractive summaries by a GRU-based language model, with the hierarchical attention including the sentence-level attention and word-level attention. Finally, the two tasks are jointly trained.


2.1 Shared Hierarchical Document Encoder

We encode the document in a hierarchical way. In particular, the word sequences are first encoded by a bidirectional GRU network in parallel, and a sequence of sentence-level vector representations called sentence embeddings is generated. Then the sentence embeddings are fed into another bidirectional GRU network to obtain the document representations. Such an architecture has two advantages. First, it can reduce the negative effects during the training process caused by the long-term dependency problem, so that the document can be represented from both local and global aspects. Second, it helps improve the training efficiency, as the time complexity of an RNN-based model increases with the sequence length.

Formally, let V denote the vocabulary, which contains D tokens, and each token is embedded as a d-dimensional vector. Given an input document X containing m sentences {X_i, i ∈ 1, ..., m}, let n_i denote the number of words in X_i.

Word-level Encoder reads a sentence word-by-word until the end, using a bidirectional GRU network as in the following equations:

\overleftrightarrow{h}^w_{i,j} = [\overrightarrow{h}^w_{i,j}; \overleftarrow{h}^w_{i,j}]    (1)

\overrightarrow{h}^w_{i,j} = \mathrm{GRU}(x_{i,j}, \overrightarrow{h}^w_{i,j-1})    (2)

\overleftarrow{h}^w_{i,j} = \mathrm{GRU}(x_{i,j}, \overleftarrow{h}^w_{i,j+1})    (3)

where x_{i,j} represents the embedding vector of the jth word in the ith sentence, and \overleftrightarrow{h}^w_{i,j} is the concatenation of the forward hidden state \overrightarrow{h}^w_{i,j} and the backward hidden state \overleftarrow{h}^w_{i,j}. H is the size of the hidden state.

Furthermore, the ith sentence is represented by a non-linear transformation of the word-level hidden states as follows:

s_i = \tanh\Big(W \cdot \frac{1}{n_i} \sum_{j=1}^{n_i} \overleftrightarrow{h}^w_{i,j} + b\Big)    (4)

where s_i is the sentence embedding and W, b are learnable parameters.

Sentence-level Encoder reads a document sentence-by-sentence until the end, using another bidirectional GRU network, as depicted by the following equations:

\overleftrightarrow{h}^s_i = [\overrightarrow{h}^s_i; \overleftarrow{h}^s_i]    (5)

\overrightarrow{h}^s_i = \mathrm{GRU}(s_i, \overrightarrow{h}^s_{i-1})    (6)

\overleftarrow{h}^s_i = \mathrm{GRU}(s_i, \overleftarrow{h}^s_{i+1})    (7)

where \overleftrightarrow{h}^s_i is the concatenation of the forward hidden state \overrightarrow{h}^s_i and the backward hidden state \overleftarrow{h}^s_i. The concatenated vectors \overleftrightarrow{h}^s_i are the document representations shared by the two tasks, which will be introduced next.
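To make the hierarchical encoding concrete, the following is a minimal PyTorch sketch of Eqs. (1)-(7); the paper's own implementation is in TensorFlow, and all class, variable, and dimension names here are ours (padding and masking are omitted for brevity).

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Shared hierarchical document encoder: word-level GRU (Eqs. 1-3),
    sentence embedding (Eq. 4), and sentence-level GRU (Eqs. 5-7)."""

    def __init__(self, vocab_size=50000, emb_dim=300, hidden=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Word-level bidirectional GRU reads each sentence word by word.
        self.word_gru = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        # Non-linear transformation of the averaged word states (W, b in Eq. 4).
        self.sent_proj = nn.Linear(2 * hidden, 2 * hidden)
        # Sentence-level bidirectional GRU reads the document sentence by sentence.
        self.sent_gru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, doc_ids):
        # doc_ids: (m, n) word ids of one document, m sentences padded to length n.
        x = self.embed(doc_ids)                                   # (m, n, emb_dim)
        word_states, _ = self.word_gru(x)                         # (m, n, 2H): [forward; backward]
        s = torch.tanh(self.sent_proj(word_states.mean(dim=1)))   # (m, 2H), Eq. (4)
        sent_states, _ = self.sent_gru(s.unsqueeze(0))            # (1, m, 2H), Eqs. (5-7)
        return word_states, sent_states.squeeze(0)                # shared document representations
```

The returned word-level and sentence-level states play the role of the shared representations consumed by both the extractor and the decoder described next.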

  


2.2 Sentence Extractor

The sentence extractor can be viewed as a sequential binary classifier. We use a logistic function to calculate a score between 0 and 1, which is an indicator of whether or not to keep the sentence in the final summary. The score can also be considered as the salience of a sentence in the document. Let p_i denote the score and q_i ∈ {0, 1} denote the result of whether or not to keep the sentence. In particular, p_i is calculated as follows:

p_i = P(q_i = 1 \mid \overleftrightarrow{h}^s_i) = \sigma(W_{extr} \cdot \overleftrightarrow{h}^s_i + b_{extr})    (8)

where W_{extr} is the weight and b_{extr} is the bias, which can be learned. The sentence extractor generates a sequence of probabilities indicating the importance of the sentences. As a result, the extractive summary is created by selecting sentences with a probability larger than a given threshold τ. We set τ = 0.5 in our experiment. We choose the cross entropy as the extractive loss function, i.e.,

E_{se} = -\frac{1}{m} \sum_{i=1}^{m} \big[ q_i \log p_i + (1 - q_i) \log(1 - p_i) \big]    (9)
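A sketch of the extractor head, again with our own naming: it scores the shared sentence states with a logistic unit (Eq. (8)), thresholds them at τ = 0.5, and computes the cross-entropy loss of Eq. (9) when extractive labels are available.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceExtractor(nn.Module):
    """Sequential binary classifier over the shared sentence representations."""

    def __init__(self, hidden=200):
        super().__init__()
        self.scorer = nn.Linear(2 * hidden, 1)   # W_extr, b_extr in Eq. (8)

    def forward(self, sent_states, labels=None, tau=0.5):
        # sent_states: (m, 2H) shared states; labels: (m,) 0/1 extractive labels q_i.
        p = torch.sigmoid(self.scorer(sent_states)).squeeze(-1)   # salience p_i, Eq. (8)
        keep = p > tau                 # sentences selected for the extractive summary
        loss = None
        if labels is not None:
            # Mean of -(q log p + (1 - q) log(1 - p)) over the m sentences, Eq. (9).
            loss = F.binary_cross_entropy(p, labels.float())
        return p, keep, loss
```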

2.3 Decoder

Our decoder is a unidirectional GRU network with hierarchical attention. We use the attention to calculate the context vectors, which are weighted sums of the hidden states of the hierarchical encoders. The equations are given below:

c^s_t = \sum_{i=1}^{m} \alpha_{t,i} \cdot \overleftrightarrow{h}^s_i    (10)

c^w_t = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \beta_{t,i,j} \cdot \overleftrightarrow{h}^w_{i,j}    (11)

where c^s_t is the sentence-level context vector and c^w_t is the word-level context vector at decoding time step t. Specifically, \alpha_{t,i} denotes the attention value on the ith sentence and \beta_{t,i,j} denotes the attention value on the jth word of the ith sentence.

The input of the GRU-based language model at decoding time step t contains three vectors: the word embedding of the previously generated word \hat{y}_{t-1}, the sentence-level context vector of the previous time step c^s_{t-1}, and the word-level context vector of the previous time step c^w_{t-1}. They are transformed by a linear function and fed into the language model as follows:

\tilde{h}_t = \mathrm{GRU}(\tilde{h}_{t-1}, f_{in}(\hat{y}_{t-1}, c^s_{t-1}, c^w_{t-1}))    (12)


where \tilde{h}_t is the hidden state at decoding time step t, and f_{in} is the linear transformation function with W_{dec} as the weight and b_{dec} as the bias.

The hidden states of the language model are used to generate the output word sequence. The conditional probability distribution over the vocabulary at the tth time step is:

P(\hat{y}_t \mid \hat{y}_1, \ldots, \hat{y}_{t-1}, x) = g(f_{out}(\tilde{h}_t, c^s_t, c^w_t))    (13)

where g is the softmax function and f_{out} is a linear function with W_{soft} and b_{soft} as learnable parameters.

The negative log likelihood loss is applied as the loss of the decoder, i.e.,

E_y = -\frac{1}{T} \sum_{t=1}^{T} \log(y_t)    (14)

where T is the length of the target summary.
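The decoding step of Eqs. (10)-(14) can be sketched as below. The attention weights alpha and beta are assumed to come from the hierarchical attention module of Sect. 2.4; as before, this is an illustrative PyTorch sketch with our own names rather than the authors' code.

```python
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    """One GRU decoding step with sentence- and word-level context vectors."""

    def __init__(self, vocab_size=50000, emb_dim=300, hidden=200):
        super().__init__()
        ctx = 2 * hidden                                      # size of bidirectional encoder states
        self.f_in = nn.Linear(emb_dim + 2 * ctx, emb_dim)     # W_dec, b_dec in Eq. (12)
        self.cell = nn.GRUCell(emb_dim, hidden)
        self.f_out = nn.Linear(hidden + 2 * ctx, vocab_size)  # W_soft, b_soft in Eq. (13)

    def forward(self, y_prev_emb, h_prev, c_s_prev, c_w_prev,
                alpha, beta, sent_states, word_states):
        # Context vectors for the current step (Eqs. 10-11).
        c_s = torch.einsum('i,ih->h', alpha, sent_states)     # (2H,)
        c_w = torch.einsum('ij,ijh->h', beta, word_states)    # (2H,)
        # GRU input combines the previous word embedding and previous contexts (Eq. 12).
        inp = self.f_in(torch.cat([y_prev_emb, c_s_prev, c_w_prev], dim=-1))
        h_t = self.cell(inp.unsqueeze(0), h_prev.unsqueeze(0)).squeeze(0)
        # Distribution over the vocabulary (Eq. 13); its NLL on the target word gives Eq. (14).
        log_probs = torch.log_softmax(self.f_out(torch.cat([h_t, c_s, c_w], dim=-1)), dim=-1)
        return log_probs, h_t, c_s, c_w
```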

2.4 Hierarchical Attention

The hierarchical attention mechanism consists of a word-level attention reader and a sentence-level attention reader, so as to take full advantage of the multi-level knowledge captured by the hierarchical document encoder. The sentence-level attention indicates the salience distribution over the source sentences. It is calculated as follows:

\alpha_{t,i} = \frac{e^s_{t,i}}{\sum_{k=1}^{m} e^s_{t,k}}    (15)

e^s_{t,i} = \exp\{V_1^{sT} \cdot \tanh(W_1^{dec} \cdot \tilde{h}_t + W_1^s \cdot \overleftrightarrow{h}^s_i + b_1^s)\}    (16)

where V_1^s, W_1^{dec}, W_1^s, and b_1^s are learnable parameters.

The word-level attention indicates the salience distribution over the source words. As the hierarchical encoder reads the input sentences independently, our model has two distinctions. First, the word-level attention is calculated within a sentence. Second, we multiply the word-level attention by the sentence-level attention of the sentence to which the word belongs. The word-level attention calculation is shown below:

\beta_{t,i,j} = \alpha_{t,i} \frac{e^w_{t,i,j}}{\sum_{l=1}^{n_i} e^w_{t,i,l}}    (17)

e^w_{t,i,j} = \exp\{V_2^{wT} \cdot \tanh(W_2^{dec} \cdot \tilde{h}_t + W_2^w \cdot \overleftrightarrow{h}^w_{i,j} + b_2^w)\}    (18)

where V_2^w, W_2^{dec}, W_2^w, and b_2^w are learnable parameters for the word-level attention calculation.
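Equations (15)-(18) amount to additive attention applied twice, with the word-level weights renormalised within each sentence and rescaled by the sentence-level weight of that sentence. A minimal sketch follows (our own naming; masking of padded positions omitted).

```python
import torch
import torch.nn as nn

class HierarchicalAttention(nn.Module):
    """Sentence-level attention (Eqs. 15-16) and word-level attention (Eqs. 17-18)."""

    def __init__(self, hidden=200):
        super().__init__()
        ctx = 2 * hidden
        # Parameters of the sentence-level score e^s_{t,i} (Eq. 16).
        self.W1_dec = nn.Linear(hidden, ctx, bias=False)
        self.W1_s = nn.Linear(ctx, ctx)
        self.v1 = nn.Linear(ctx, 1, bias=False)
        # Parameters of the word-level score e^w_{t,i,j} (Eq. 18).
        self.W2_dec = nn.Linear(hidden, ctx, bias=False)
        self.W2_w = nn.Linear(ctx, ctx)
        self.v2 = nn.Linear(ctx, 1, bias=False)

    def forward(self, h_dec, sent_states, word_states):
        # h_dec: (hidden,) decoder state; sent_states: (m, 2H); word_states: (m, n, 2H).
        e_s = self.v1(torch.tanh(self.W1_dec(h_dec) + self.W1_s(sent_states))).squeeze(-1)  # (m,)
        alpha = torch.softmax(e_s, dim=0)            # Eq. (15): softmax = exp / sum of exp
        e_w = self.v2(torch.tanh(self.W2_dec(h_dec) + self.W2_w(word_states))).squeeze(-1)  # (m, n)
        # Normalise within each sentence, then scale by that sentence's alpha (Eq. 17).
        beta = alpha.unsqueeze(-1) * torch.softmax(e_w, dim=-1)
        return alpha, beta
```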

The abstractive summary of a long document can be viewed as a new expression of the most salient sentences of the document, so that a well-learned sentence extractor and a well-learned attention distribution should both be able to detect the important sentences of the source document. Motivated by this, we design a constraint on the sentence-level attention, which is an L2 loss, as follows:

E_a = \frac{1}{mT} \sum_{i=1}^{m} \sum_{t=1}^{T} (p_i - \alpha_{t,i})^2    (19)

As p_i is calculated by the logistic function, which is trained simultaneously with the decoder, it is not suitable to constrain the attention with the inaccurate p_i. In our experiment, we use the labels of the sentence extractor to constrain the attention.
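Eq. (19) is an L2 penalty pulling the sentence-level attention toward the extractive salience; following the remark above, the fixed extractive labels q_i are used in place of the predicted p_i. A short sketch of this term (function and argument names are ours):

```python
import torch

def attention_constraint(alpha, q):
    # alpha: (T, m) sentence-level attention over m sentences at each of T decoding steps.
    # q: (m,) extractive labels used in place of p_i, cf. Eq. (19).
    return ((q.unsqueeze(0) - alpha) ** 2).mean()   # averages over both t and i, i.e. divides by m*T
```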

2.5 Joint Learning

We combine the three types of loss functions mentioned above to train our proposed model: the negative log likelihood loss E_y for the decoder, the cross entropy loss E_{se} for the extractor, and the L2 loss E_a as the attention constraint, which performs as a regularizer. Hence,

E = E_y + \lambda \cdot E_{se} + \gamma \cdot E_a    (20)

The parameters are trained to minimize the joint loss function. In the inference stage, we use the beam search algorithm to select the word which approximately maximizes the conditional probability.
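The joint objective of Eq. (20) simply combines the three losses; the weights shown below are the values reported later in Sect. 3.2, and the function name is ours.

```python
def joint_loss(E_y, E_se, E_a, lam=100.0, gamma=0.5):
    # E = E_y + lambda * E_se + gamma * E_a  (Eq. 20)
    return E_y + lam * E_se + gamma * E_a
```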

3 Experimental Setup

3.1 Dataset

We adopt the news dataset collected from the websites of CNN and DailyMail. It was originally prepared for the task of machine reading by Hermann et al.; later work added labels to the sentences for the task of extractive summarization. The corpus contains pairs of news content and human-generated highlights for training, validation, and test. Table 1 lists the details of the dataset.

  

Table 1. The statistics of the CNN/DailyMail dataset. S.S.N. indicates the average number of sentences in the source document. S.S.L. indicates the average length of the sentences in the source document. T.S.L. indicates the average length of the sentences in the target summary.

Dataset          Train     Valid    Test     S.S.N   S.S.L   T.S.L
CNN/DailyMail    277,554   13,367   11,443   26.9    27.3    53.8


3.2 Implementation Details

In our implementation, we set the vocabulary size D to 50K and the word embedding size d to 300. The word embeddings are not pretrained, as the training corpus is large enough to train them from scratch. We cut off documents at a maximum of 35 sentences and truncate sentences to a maximum of 50 words. We also truncate the target summaries to a maximum of 100 words. The word-level encoder and the sentence-level encoder each correspond to one layer of bidirectional GRU, and the decoder is also a single layer of unidirectional GRU. All three networks have a hidden size H of 200. For the loss function, λ is set to 100 and γ is set to 0.5. During the training process, we use the Adagrad optimizer with a learning rate of 0.15 and an initial accumulator value of 0.1. The mini-batch size is 16. We implement the model in TensorFlow and train it using a GTX-1080Ti GPU. The beam search size for decoding is 5. We use ROUGE scores to evaluate the summarization models.
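For reference, the hyper-parameters stated in this subsection can be collected into one configuration; the dictionary below is our own summary of the reported values, not code released by the authors.

```python
# Hyper-parameters as reported in Sect. 3.2 (key names are ours).
CONFIG = {
    "vocab_size": 50_000,             # D
    "embedding_dim": 300,             # d, trained from scratch
    "hidden_size": 200,               # H, for both encoders and the decoder
    "max_source_sentences": 35,
    "max_sentence_length": 50,        # words per source sentence
    "max_target_length": 100,         # words per target summary
    "lambda_extractive": 100,         # weight of E_se in Eq. (20)
    "gamma_attention": 0.5,           # weight of E_a in Eq. (20)
    "optimizer": "Adagrad",
    "learning_rate": 0.15,
    "initial_accumulator_value": 0.1,
    "batch_size": 16,
    "beam_size": 5,
}
```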

  4 Experimental Results

4.1 Comparison with Baselines

We compare the full-length Rouge-F1 scores on the entire CNN/DailyMail test set. We use the fundamental sequence-to-sequence attentional model and the words-lvt2k-hieratt model as baselines.

  

  

Table 2. Performance comparison of various abstractive models on the entire CNN/DailyMail test set using full-length F1 variants of Rouge.

Method                Rouge-1   Rouge-2   Rouge-L
seq2seq+attn          33.6      12.3      31.0
words-lvt2k-hieratt   35.4      13.3      32.6
Our method            35.8      13.6      33.4

From Table 2 we can see that our model performs the best in Rouge-1, Rouge-2, and Rouge-L. Compared with the vanilla sequence-to-sequence attentional model, our proposed model performs considerably better, and compared with the hierarchical model, our model performs better in Rouge-L, which is due to the incorporation of the auxiliary task.

4.2 Evaluation of Proposed Components

To verify the effectiveness of our proposed model, we conduct an ablation study by removing the corresponding parts, i.e., the auxiliary extractive task, the attention constraint, and the combination of them, in order to compare their effects. We choose the full-length Rouge-F1 score on the test set for evaluation. The results are shown in Table 3.

Table 3. Performance comparison of removing the components of our proposed model on the entire CNN/DailyMail test set using full-length F1 variants of Rouge.

Method          Rouge-1   Rouge-2   Rouge-L
Our method      35.8      13.6      33.4
w/o extr        34.3      12.6      31.6
w/o attn        34.7      12.8      32.2
w/o extr+attn   34.2      12.5      31.6