Yi Cai, Yoshiharu Ishikawa, Jianliang Xu (Eds.)
LNCS 10987
Web and Big Data
Second International Joint Conference, APWeb-WAIM 2018 Macau, China, July 23–25, 2018 Proceedings, Part I
Lecture Notes in Computer Science 10987
Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison Lancaster University, Lancaster, UK
Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler University of Surrey, Guildford, UK
Jon M. Kleinberg Cornell University, Ithaca, NY, USA
Friedemann Mattern ETH Zurich, Zurich, Switzerland
John C. Mitchell Stanford University, Stanford, CA, USA
Moni Naor Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos University of California, Los Angeles, CA, USA
Doug Tygar University of California, Berkeley, CA, USA
Gerhard Weikum Max Planck Institute for Informatics, Saarbrücken, Germany
Yi Cai, Yoshiharu Ishikawa, Jianliang Xu (Eds.)
Web and Big Data
Second International Joint Conference, APWeb-WAIM 2018, Macau, China, July 23–25, 2018, Proceedings, Part I

Editors:
Yi Cai, South China University of Technology, Guangzhou, China
Yoshiharu Ishikawa, Nagoya University, Nagoya, Japan
Jianliang Xu, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
ISSN 0302-9743
ISSN 1611-3349 (electronic) Lecture Notes in Computer Science
ISBN 978-3-319-96889-6
ISBN 978-3-319-96890-2 (eBook) https://doi.org/10.1007/978-3-319-96890-2 Library of Congress Control Number: 2018948814 LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI © Springer International Publishing AG, part of Springer Nature 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
Preface
This volume (LNCS 10987) and its companion volume (LNCS 10988) contain the proceedings of the second Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data, called APWeb-WAIM. This joint conference aims to attract participants from different scientific communities as well as from industry, and not merely from the Asia-Pacific region, but also from other continents. The objective is to enable the sharing and exchange of ideas, experiences, and results in the areas of the World Wide Web and big data, thus covering Web technologies, database systems, information management, software engineering, and big data. The second APWeb-WAIM conference was held in Macau during July 23–25, 2018. As an Asia-Pacific flagship conference focusing on research, development, and applications in relation to Web information management, APWeb-WAIM builds on the successes of APWeb and WAIM: APWeb was previously held in Beijing (1998), Hong Kong (1999), Xi'an (2000), Changsha (2001), Xi'an (2003), Hangzhou (2004), Shanghai (2005), Harbin (2006), Huangshan (2007), Shenyang (2008), Suzhou (2009), Busan (2010), Beijing (2011), Kunming (2012), Sydney (2013), Changsha (2014), Guangzhou (2015), and Suzhou (2016); and WAIM was held in Shanghai (2000), Xi'an (2001), Beijing (2002), Chengdu (2003), Dalian (2004), Hangzhou (2005), Hong Kong (2006), Huangshan (2007), Zhangjiajie (2008), Suzhou (2009), Jiuzhaigou (2010), Wuhan (2011), Harbin (2012), Beidaihe (2013), Macau (2014), Qingdao (2015), and Nanchang (2016). The first joint APWeb-WAIM conference was held in Beijing (2017). With the fast development of Web-related technologies, we expect that APWeb-WAIM will become an increasingly popular forum that brings together outstanding researchers and developers in the field of the Web and big data from around the world.
The high-quality program documented in these proceedings would not have been possible without the authors who chose APWeb-WAIM for disseminating their findings. Out of 168 submissions, the conference accepted 39 regular research papers (23.21%), 31 short research papers, and six demonstrations. The contributed papers address a wide range of topics, such as text analysis, graph data processing, social networks, recommender systems, information retrieval, data streams, knowledge graphs, data mining and applications, query processing, machine learning, database and Web applications, big data, and blockchain. The technical program also included keynotes by Prof. Xuemin Lin (The University of New South Wales, Australia), Prof. Lei Chen (The Hong Kong University of Science and Technology, Hong Kong, SAR China), and Prof. Ninghui Li (Purdue University, USA) as well as industrial invited talks by Dr. Zhao Cao (Huawei Blockchain) and Jun Yan (YiDu Cloud). We are grateful to these distinguished scientists for their invaluable contributions to the conference program. As a joint conference, teamwork was particularly important for the success of APWeb-WAIM. We are deeply thankful to the Program Committee members and the external reviewers for lending their time and expertise to the conference. Special thanks go to the local Organizing Committee led by Prof. Zhiguo Gong.
Thanks also go to the workshop co-chairs (Leong Hou U and Haoran Xie), demo co-chairs (Zhixu Li, Zhifeng Bao, and Lisi Chen), industry co-chair (Wenyin Liu), tutorial co-chair (Jian Yang), panel chair (Kamal Karlapalem), local arrangements chair (Derek Fai Wong), and publicity co-chairs (An Liu, Feifei Li, Wen-Chih Peng, and Ladjel Bellatreche). Their efforts were essential to the success of the conference. Last but not least, we wish to express our gratitude to the treasurer (Andrew Shibo Jiang) and the Webmaster (William Sio) for all their hard work, and to our sponsors, who generously supported the smooth running of the conference. We hope you enjoy the exciting program of APWeb-WAIM 2018 as documented in these proceedings.

June 2018
Yi Cai
Yoshiharu Ishikawa
Jianliang Xu

Organization

Organizing Committee
Honorary Chair
Lionel Ni, University of Macau, SAR China

General Co-chairs
Zhiguo Gong, University of Macau, SAR China
Qing Li, City University of Hong Kong, SAR China
Kam-fai Wong, Chinese University of Hong Kong, SAR China

Program Co-chairs
Yi Cai, South China University of Technology, China
Yoshiharu Ishikawa, Nagoya University, Japan
Jianliang Xu, Hong Kong Baptist University, SAR China

Workshop Chairs
Leong Hou U, University of Macau, SAR China
Haoran Xie, Education University of Hong Kong, SAR China

Demo Co-chairs
Zhixu Li, Soochow University, China
Zhifeng Bao, RMIT, Australia
Lisi Chen, Wollongong University, Australia

Tutorial Chair
Jian Yang, Macquarie University, Australia

Industry Chair
Wenyin Liu, Guangdong University of Technology, China

Panel Chair
Kamal Karlapalem, IIIT, Hyderabad, India

Publicity Co-chairs
An Liu, Soochow University, China
Feifei Li, University of Utah, USA
Wen-Chih Peng, National Taiwan University, China
Ladjel Bellatreche, ISAE-ENSMA, Poitiers, France

Treasurers
Leong Hou U, University of Macau, SAR China
Andrew Shibo Jiang, Macau Convention and Exhibition Association, SAR China

Local Arrangements Chair
Derek Fai Wong, University of Macau, SAR China

Webmaster
William Sio, University of Macau, SAR China
Senior Program Committee
Bin Cui, Peking University, China
Byron Choi, Hong Kong Baptist University, SAR China
Christian Jensen, Aalborg University, Denmark
Demetrios Zeinalipour-Yazti, University of Cyprus, Cyprus
Feifei Li, University of Utah, USA
Guoliang Li, Tsinghua University, China
K. Selçuk Candan, Arizona State University, USA
Kyuseok Shim, Seoul National University, South Korea
Makoto Onizuka, Osaka University, Japan
Reynold Cheng, The University of Hong Kong, SAR China
Toshiyuki Amagasa, University of Tsukuba, Japan
Walid Aref, Purdue University, USA
Wang-Chien Lee, Pennsylvania State University, USA
Wen-Chih Peng, National Chiao Tung University, Taiwan
Wook-Shin Han, Pohang University of Science and Technology, South Korea
Xiaokui Xiao, National University of Singapore, Singapore
Ying Zhang, University of Technology Sydney, Australia
Program Committee
Alex Thomo, University of Victoria, Canada
An Liu, Soochow University, China
Baoning Niu, Taiyuan University of Technology, China
Bin Yang, Aalborg University, Denmark
Bo Tang, Southern University of Science and Technology, China
Zouhaier Brahmia, University of Sfax, Tunisia
Carson Leung, University of Manitoba, Canada
Chih-Chien Hung, Tamkang University, China
Chih-Hua Tai, National Taipei University, China
Cuiping Li, Renmin University of China, China
Daniele Riboni, University of Cagliari, Italy
Defu Lian, Big Data Research Center, University of Electronic Science and Technology of China, China
Dejing Dou, University of Oregon, USA
Dimitris Sacharidis, Technische Universität Wien, Austria
Ganzhao Yuan, Sun Yat-sen University, China
Giovanna Guerrini, Università di Genova, Italy
Guanfeng Liu, The University of Queensland, Australia
Guoqiong Liao, Jiangxi University of Finance and Economics, China
Guanling Lee, National Dong Hwa University, China
Haibo Hu, Hong Kong Polytechnic University, SAR China
Hailong Sun, Beihang University, China
Han Su, University of Southern California, USA
Haoran Xie, The Education University of Hong Kong, SAR China
Hiroaki Ohshima, University of Hyogo, Japan
Hong Chen, Renmin University of China, China
Hongyan Liu, Tsinghua University, China
Hongzhi Wang, Harbin Institute of Technology, China
Hongzhi Yin, The University of Queensland, Australia
Hua Wang, Victoria University, Australia
Ilaria Bartolini, University of Bologna, Italy
James Cheng, Chinese University of Hong Kong, SAR China
Jeffrey Xu Yu, Chinese University of Hong Kong, SAR China
Jiajun Liu, Renmin University of China, China
Jialong Han, Nanyang Technological University, Singapore
Jianbin Huang, Xidian University, China
Jian Yin, Sun Yat-sen University, China
Jiannan Wang, Simon Fraser University, Canada
Jianting Zhang, City College of New York, USA
Jianxin Li, Beihang University, China
Jianzhong Qi, University of Melbourne, Australia
Jinchuan Chen, Renmin University of China, China
Ju Fan, Renmin University of China, China
Jun Gao, Peking University, China
Junhu Wang, Griffith University, Australia
Kai Zeng, Microsoft, USA
Kai Zheng, University of Electronic Science and Technology of China, China
Karine Zeitouni, Université de Versailles Saint-Quentin, France
Lei Zou, Peking University, China
Leong Hou U, University of Macau, SAR China
Liang Hong, Wuhan University, China
Lisi Chen, Wollongong University, Australia
Lu Chen, Aalborg University, Denmark
Maria Damiani, University of Milan, Italy
Markus Endres, University of Augsburg, Germany
Mihai Lupu, Vienna University of Technology, Austria
Mirco Nanni, ISTI-CNR Pisa, Italy
Mizuho Iwaihara, Waseda University, Japan
Peiquan Jin, University of Science and Technology of China, China
Peng Wang, Fudan University, China
Qin Lu, University of Technology Sydney, Australia
Ralf Hartmut Güting, Fernuniversität in Hagen, Germany
Raymond Chi-Wing Wong, Hong Kong University of Science and Technology, SAR China
Ronghua Li, Shenzhen University, China
Rui Zhang, University of Melbourne, Australia
Sanghyun Park, Yonsei University, South Korea
Sanjay Madria, Missouri University of Science and Technology, USA
Shaoxu Song, Tsinghua University, China
Shengli Wu, Jiangsu University, China
Shimin Chen, Chinese Academy of Sciences, China
Shuai Ma, Beihang University, China
Shuo Shang, King Abdullah University of Science and Technology, Saudi Arabia
Takahiro Hara, Osaka University, Japan
Tieyun Qian, Wuhan University, China
Tingjian Ge, University of Massachusetts, Lowell, USA
Tom Z. J. Fu, Advanced Digital Sciences Center, Singapore
Tru Cao, Ho Chi Minh City University of Technology, Vietnam
Vincent Oria, New Jersey Institute of Technology, USA
Wee Ng, Institute for Infocomm Research, Singapore
Wei Wang, University of New South Wales, Australia
Weining Qian, East China Normal University, China
Weiwei Sun, Fudan University, China
Wen Zhang, Wuhan University, China
Wolf-Tilo Balke, Technische Universität Braunschweig, Germany
Wookey Lee, Inha University, South Korea
Xiang Zhao, National University of Defence Technology, China
Xiang Lian, Kent State University, USA
Xiangliang Zhang, King Abdullah University of Science and Technology, Saudi Arabia
Xiangmin Zhou, RMIT University, Australia
Xiaochun Yang, Northeastern University, China
Xiaofeng He, East China Normal University, China
Xiaohui (Daniel) Tao, The University of Southern Queensland, Australia
Xiaoyong Du, Renmin University of China, China
Xin Cao, The University of New South Wales, Australia
Xin Huang, Hong Kong Baptist University, SAR China
Xin Wang, Tianjin University, China
Xingquan Zhu, Florida Atlantic University, USA
Xuan Zhou, Renmin University of China, China
Yafei Li, Zhengzhou University, China
Yanghua Xiao, Fudan University, China
Yanghui Rao, Sun Yat-sen University, China
Yang-Sae Moon, Kangwon National University, South Korea
Yaokai Feng, Kyushu University, Japan
Yi Cai, South China University of Technology, China
Yijie Wang, National University of Defense Technology, China
Yingxia Shao, Peking University, China
Yongxin Tong, Beihang University, China
Yu Gu, Northeastern University, China
Yuan Fang, Institute for Infocomm Research, Singapore
Yunjun Gao, Zhejiang University, China
Zakaria Maamar, Zayed University, United Arab Emirates
Zhaonian Zou, Harbin Institute of Technology, China
Zhiwei Zhang, Hong Kong Baptist University, SAR China
Keynotes
Graph Processing: Applications, Challenges, and Advances
Xuemin Lin
School of Computer Science and Engineering,
University of New South Wales, Sydney
lxue@cse.unsw.edu.au
Abstract. Graph data are key parts of Big Data and widely used for modelling complex structured data with a broad spectrum of applications. Over the last decade, tremendous research efforts have been devoted to many fundamental problems in managing and analyzing graph data. In this talk, I will cover various applications, challenges, and recent advances. We will also look to the future of the area.
Differential Privacy in the Local Setting
Ninghui Li
Department of Computer Sciences, Purdue University
ninghui@cs.purdue.edu
Abstract. Differential privacy has been increasingly accepted as the de facto standard for data privacy in the research community. Recently, techniques for satisfying differential privacy (DP) in the local setting, which we call LDP, have been deployed. Such techniques enable the gathering of statistics while preserving the privacy of every user, without relying on trust in a single data curator. Companies such as Google, Apple, and Microsoft have deployed techniques for collecting user data while satisfying LDP. In this talk, we will discuss the state of the art of LDP. We survey recent developments in LDP, and discuss protocols for estimating the frequencies of different values under LDP, and for computing marginals when each user has multiple attributes. Finally, we discuss the limitations and open problems of LDP.
Big Data, AI, and HI, What is the Next?
Lei Chen
Department of Computer Science and Engineering, Hong Kong University
of Science and Technology
leichen@cse.ust.hk
Abstract. Recently, AI has become quite popular and attractive, not only in academia but also in industry. The success stories of AI in Alpha-go and Texas hold 'em games have raised significant public interest in AI. Meanwhile, human intelligence is turning out to be more sophisticated, and Big Data technology is everywhere, improving our quality of life. The question we all want to ask is: "What is next?" In this talk, I will discuss DHA, a new computing paradigm which combines big Data, Human intelligence, and AI. First I will briefly explain the motivation of DHA. Then I will present some challenges and possible solutions for building this new paradigm.
Text Analysis
Abstractive Summarization with the Aid of Extractive Summarization

Yangbin Chen, Yun Ma, Xudong Mao, and Qing Li

City University of Hong Kong, Hong Kong SAR, China
{robinchen2-c,yunma3-c,xdmao2-c}@my.cityu.edu.hk, qing.li@cityu.edu.hk
Abstract. Currently, the abstractive method and the extractive method are the two main approaches to automatic document summarization. To fully integrate the relatedness and advantages of both approaches, we propose in this paper a general framework for abstractive summarization which incorporates extractive summarization as an auxiliary task. In particular, our framework is composed of a shared hierarchical document encoder, an attention-based decoder for abstractive summarization, and an extractor for sentence-level extractive summarization. Learning these two tasks jointly with the shared encoder allows us to better capture the semantics in the document. Moreover, we constrain the attention learned in the abstractive task by the salience estimated in the extractive task to strengthen their consistency. Experiments on the CNN/DailyMail dataset demonstrate that both the auxiliary task and the attention constraint contribute to improving the performance significantly, and that our model is comparable to the state-of-the-art abstractive models.

Keywords: Abstractive document summarization · Sequence-to-sequence · Joint learning
1 Introduction
Automatic document summarization has been studied for decades. The target of document summarization is to generate a shorter passage from the document in a grammatically and logically coherent way, meanwhile preserving the important information. There are two main approaches to document summarization: extractive summarization and abstractive summarization. The extractive method first extracts salient sentences or phrases from the source document and then groups them to produce a summary without changing the source text. Graph-based ranking models are typical models for extractive summarization. However, the extractive method unavoidably includes secondary or redundant information and is far from the way humans write summaries.
The abstractive method, in contrast, produces generalized summaries, conveying information in a concise way and eliminating the limitation to the original words and sentences of the document. This task is more challenging since it needs advanced language generation and compression techniques. Discourse structures are most commonly used by researchers for generating abstractive summaries.

Recently, Recurrent Neural Network (RNN)-based sequence-to-sequence models with an attention mechanism have been applied to abstractive summarization, owing to their great success in machine translation. However, there are still some challenges. First, RNN-based models have difficulty capturing long-term dependencies, making summarization of long documents much tougher. Second, different from machine translation, which has a strong correspondence between source and target words, an abstractive summary corresponds to only a small part of the source document, making its attention difficult to learn.
We adopt hierarchical approaches for the long-term dependency problem; such approaches have been used in many tasks such as machine translation and document classification, but few have been applied to abstractive summarization. In particular, we encode the input document in a hierarchical way from the word level to the sentence level. There are two advantages. First, it captures both the local and global semantic representations, resulting in better feature learning. Second, it improves training efficiency, because the time complexity of an RNN-based model can be reduced by splitting a long document into short sentences.
The attention mechanism is widely used in sequence-to-sequence tasks. However, for abstractive summarization it is difficult to learn the attention, since only a small part of the source document is important to the summary. In this paper, we propose two methods to learn a better attention distribution. First, we use a hierarchical attention mechanism, which means that the attention is applied at both the word and sentence levels. Similar to the hierarchical approach in encoding, the advantage of using hierarchical attention is to capture both the local and global semantic representations. Second, we use the salience scores of the auxiliary task (i.e., the extractive summarization) to constrain the sentence-level attention.
In this paper, we present a novel technique for abstractive summarization which incorporates extractive summarization as an auxiliary task. Our framework consists of three parts: a shared document encoder, a hierarchical attention-based decoder, and an extractor. We first encode the source document in a hierarchical way (components (1) and (2) in Fig. 1) in order to address the long-term dependency problem. The learned document representations are then shared by the extractor and the decoder (Fig. 1). The extractor and the decoder are jointly trained, which captures better semantics of the document. Furthermore, as both the sentence salience scores in the extractor and the sentence-level attention in the decoder indicate the
Fig. 1. General framework of our proposed model with five components: (1) the word-level encoder encodes the sentences word by word independently; (2) the sentence-level encoder encodes the document sentence by sentence; (3) the sentence extractor makes a binary classification for each sentence; (4) the hierarchical attention calculates the word-level and sentence-level context vectors for the decoding steps; (5) the decoder decodes the output word sequence with a beam-search algorithm.

importance of source sentences, we constrain the learned attention (component (4) in Fig. 1) with the extracted sentence salience in order to strengthen their consistency.
We have conducted experiments on a news corpus, the CNN/DailyMail dataset. The results demonstrate that adding the auxiliary extractive task and constraining the attention are both useful for improving the performance of the abstractive task, and that our proposed joint model is comparable to the state-of-the-art abstractive models.
2 Neural Summarization Model
In this section we describe the framework of our proposed model, which consists of five components. As illustrated in Fig. 1, the hierarchical document encoder, which includes both the word-level and the sentence-level encoders, reads the input word sequences and generates shared document representations. On one hand, the shared representations are fed into the sentence extractor, a sequence labeling model, to calculate salience scores. On the other hand, the representations are used to generate abstractive summaries by a GRU-based language model with hierarchical attention, comprising sentence-level and word-level attention. Finally, the two tasks are trained jointly.
2.1 Shared Hierarchical Document Encoder

We encode the document in a hierarchical way. In particular, the word sequences are first encoded by a bidirectional GRU network in parallel, and a sequence of sentence-level vector representations, called sentence embeddings, is generated. Then the sentence embeddings are fed into another bidirectional GRU network to obtain the document representations. Such an architecture has two advantages. First, it reduces the negative effects during training caused by the long-term dependency problem, so that the document can be represented from both local and global aspects. Second, it helps improve training efficiency, as the time complexity of an RNN-based model increases with the sequence length.
Formally, let $V$ denote the vocabulary, which contains $D$ tokens, each embedded as a $d$-dimensional vector. Given an input document $X$ containing $m$ sentences $\{X_i, i \in 1, ..., m\}$, let $n_i$ denote the number of words in $X_i$.

Word-level Encoder reads a sentence word by word until the end, using a bidirectional GRU network as in the following equations:

$\overleftrightarrow{h}^w_{i,j} = [\overrightarrow{h}^w_{i,j}; \overleftarrow{h}^w_{i,j}]$   (1)
$\overrightarrow{h}^w_{i,j} = GRU(x_{i,j}, \overrightarrow{h}^w_{i,j-1})$   (2)
$\overleftarrow{h}^w_{i,j} = GRU(x_{i,j}, \overleftarrow{h}^w_{i,j+1})$   (3)

where $x_{i,j}$ represents the embedding vector of the $j$-th word in the $i$-th sentence, and $\overleftrightarrow{h}^w_{i,j}$ is the concatenation of the forward hidden state $\overrightarrow{h}^w_{i,j}$ and the backward hidden state $\overleftarrow{h}^w_{i,j}$. $H$ is the size of the hidden state.

Furthermore, the $i$-th sentence is represented by a non-linear transformation of the word-level hidden states as follows:

$s_i = \tanh\left(W \cdot \frac{1}{n_i} \sum_{j=1}^{n_i} \overleftrightarrow{h}^w_{i,j} + b\right)$   (4)

where $s_i$ is the sentence embedding and $W, b$ are learnable parameters.

Sentence-level Encoder reads a document sentence by sentence until the end, using another bidirectional GRU network as depicted by the following equations:

$\overleftrightarrow{h}^s_i = [\overrightarrow{h}^s_i; \overleftarrow{h}^s_i]$   (5)
$\overrightarrow{h}^s_i = GRU(s_i, \overrightarrow{h}^s_{i-1})$   (6)
$\overleftarrow{h}^s_i = GRU(s_i, \overleftarrow{h}^s_{i+1})$   (7)

where $\overleftrightarrow{h}^s_i$ is the concatenation of the forward hidden state $\overrightarrow{h}^s_i$ and the backward hidden state $\overleftarrow{h}^s_i$. The concatenated vectors $\overleftrightarrow{h}^s_i$ are the document representations shared by the two tasks, which will be introduced next.
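The hierarchical encoding of Eqs. (1)–(7) can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the GRU parameterization, the toy dimensions, and the random initialization are assumptions made only for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, H = 8, 6                       # word-embedding size and GRU hidden size (toy values)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_gru(in_dim, hid):
    # One random parameter set per direction/level; names are illustrative only.
    mats = {k: rng.normal(0, 0.1, (hid, in_dim if k[0] == "W" else hid))
            for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
    return mats | {b: np.zeros(hid) for b in ("bz", "br", "bh")}

def gru_step(x, h, P):
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h + P["bz"])        # update gate
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h + P["br"])        # reset gate
    h_cand = np.tanh(P["Wh"] @ x + P["Uh"] @ (r * h) + P["bh"])
    return (1 - z) * h + z * h_cand

def bigru(seq, Pf, Pb):
    """Run forward and backward GRUs and concatenate states (Eqs. 1-3 / 5-7)."""
    hf, hb, fwd, bwd = np.zeros(H), np.zeros(H), [], []
    for x in seq:
        hf = gru_step(x, hf, Pf); fwd.append(hf)
    for x in reversed(seq):
        hb = gru_step(x, hb, Pb); bwd.append(hb)
    return [np.concatenate([f, b]) for f, b in zip(fwd, reversed(bwd))]

# Toy document: 3 sentences with 5, 7, and 4 words.
doc = [[rng.normal(size=d) for _ in range(n)] for n in (5, 7, 4)]

# Word level: encode each sentence independently, then pool (Eq. 4).
Pw_f, Pw_b = make_gru(d, H), make_gru(d, H)
W, b = rng.normal(0, 0.1, (2 * H, 2 * H)), np.zeros(2 * H)
sent_embs = [np.tanh(W @ np.mean(bigru(s, Pw_f, Pw_b), axis=0) + b) for s in doc]

# Sentence level: encode the sentence-embedding sequence (Eqs. 5-7).
Ps_f, Ps_b = make_gru(2 * H, H), make_gru(2 * H, H)
doc_reprs = bigru(sent_embs, Ps_f, Ps_b)
print(len(doc_reprs), doc_reprs[0].shape)   # 3 sentence representations, each 2H-dim
```

Note how splitting the document into sentences keeps every GRU run short, which is the efficiency argument made above.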
2.2 Sentence Extractor

The sentence extractor can be viewed as a sequential binary classifier. We use a logistic function to calculate a score between 0 and 1, which indicates whether or not to keep the sentence in the final summary. The score can also be considered as the salience of a sentence in the document. Let $p_i$ denote the score and $q_i \in \{0, 1\}$ denote the decision of whether or not to keep the sentence. In particular, $p_i$ is calculated as follows:

$p_i = P(q_i = 1 \mid \overleftrightarrow{h}^s_i) = \sigma(W^{extr} \cdot \overleftrightarrow{h}^s_i + b^{extr})$   (8)

where $W^{extr}$ is the weight and $b^{extr}$ is the bias, both of which are learned. The sentence extractor generates a sequence of probabilities indicating the importance of the sentences. As a result, the extractive summary is created by selecting the sentences with a probability larger than a given threshold $\tau$. We set $\tau = 0.5$ in our experiments. We choose the cross entropy as the extractive loss function, i.e.,

$E_{se} = -\frac{1}{m} \sum_{i=1}^{m} \left[ q_i \log p_i + (1 - q_i) \log(1 - p_i) \right]$   (9)
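Eqs. (8) and (9) amount to per-sentence logistic regression over the shared representations. A minimal NumPy sketch, where the weights and the sentence representations are random stand-ins for the learned quantities:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
H2, m = 12, 4                          # representation size and number of sentences
h_s = rng.normal(size=(m, H2))         # stand-ins for the shared encoder outputs

# Eq. (8): logistic scoring of each sentence.
W_extr = rng.normal(0, 0.1, H2)
b_extr = 0.0
p = sigmoid(h_s @ W_extr + b_extr)     # salience score per sentence, in (0, 1)

tau = 0.5
extract_mask = p > tau                 # sentences kept in the extractive summary

# Eq. (9): mean binary cross-entropy against toy gold labels q_i.
q = np.array([1, 0, 1, 0])
E_se = -np.mean(q * np.log(p) + (1 - q) * np.log(1 - p))
print(p.round(3), extract_mask, round(float(E_se), 3))
```

During training the loss in Eq. (9) is what drives the scores; the threshold $\tau$ is only used when actually emitting an extractive summary.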
2.3 Decoder Our decoder is a unidirectional GRU network with hierarchical attention. We use the attention to calculate the context vectors which are weighted sums of the hidden states of the hierarchical encoders. The equations are given as below: m s s ← → c = α t,i · h (10) t i i m =1 n i w w ← → s w c = β t,i,j · h (11) t i,j i =1 j =1 where c is the sentence-level context vector and c is the word-level context t t vector at decoding time step t. Specifically, α t,i denotes the attention value on the ith sentence and β t,i,j denotes the attention value on the jth word of the i th sentence.
The input of the GRU-based language model at decoding time step $t$ contains three vectors: the word embedding of the previously generated word $\hat{y}_{t-1}$, the sentence-level context vector of the previous time step $c^{s}_{t-1}$, and the word-level context vector of the previous time step $c^{w}_{t-1}$. They are transformed by a linear function and fed into the language model as follows:

$$\tilde{h}_t = GRU(\tilde{h}_{t-1}, f_{in}(\hat{y}_{t-1}, c^{s}_{t-1}, c^{w}_{t-1})) \quad (12)$$
Y. Chen et al.
where $\tilde{h}_t$ is the hidden state at decoding time step $t$, and $f_{in}$ is the linear transformation function with $W^{dec}$ as the weight and $b^{dec}$ as the bias.
The hidden states of the language model are used to generate the output word sequence. The conditional probability distribution over the vocabulary at the $t$th time step is:

$$P(\hat{y}_t \mid \hat{y}_1, \dots, \hat{y}_{t-1}, x) = g(f_{out}(\tilde{h}_t, c^{s}_{t}, c^{w}_{t})) \quad (13)$$

where $g$ is the softmax function and $f_{out}$ is a linear function with $W^{soft}$ and $b^{soft}$ as learnable parameters.
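The softmax $g$ in Eq. (13) can be sketched as follows, with logits standing in for the linear output $f_{out}(\cdot)$; the function name and values are illustrative:

```python
import numpy as np

def output_distribution(logits):
    """Eq. (13): softmax over the vocabulary logits, computed with the
    max-shift trick for numerical stability."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

The probability assigned to a reference token $y_t$ at step $t$ is then simply the $y_t$th entry of the $t$th row.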
The negative log likelihood loss is applied as the loss of the decoder, i.e., T
1 E y = −log(y ) (14) t T t
=1
where T is the length of the target summary.2.4 Hierarchical Attention The hierarchical attention mechanism consists of a word-level attention reader and a sentence-level attention reader, so as to take full advantage of the multi- level knowledge captured by the hierarchical document encoder. The sentence- level attention indicates the salience distribution over the source sentences. It is calculated as follows: s e t,i
α t,i = (15) m s k =1 t,k e s sT dec s s s ← → e h h t,i i = exp{V · tanh(W · ˜ t + W · + b )} (16) s dec s s
1
1
1 where V , W , W and b are learnable parameters.
1
1
The word-level attention indicates the salience distribution over the source words. As the hierarchical encoder reads the input sentences independently, our model has two distinctions. First, the word-level attention is calculated within a sentence. Second, we multiply the word-level attention by the sentence-level attention of the sentence to which the word belongs. The word-level attention is calculated as follows:

$$\beta_{t,i,j} = \alpha_{t,i} \frac{e^{w}_{t,i,j}}{\sum_{l=1}^{n_i} e^{w}_{t,i,l}} \quad (17)$$
$$e^{w}_{t,i,j} = \exp\{V_2^{wT} \cdot \tanh(W_2^{dec} \cdot \tilde{h}_t + W_2^{w} \cdot \overleftrightarrow{h}^{w}_{i,j} + b_2^{w})\} \quad (18)$$

where $V_2^{w}$, $W_2^{dec}$, $W_2^{w}$ and $b_2^{w}$ are learnable parameters for the word-level attention calculation.
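Eqs. (15)–(18) can be sketched as follows, taking as input the unnormalized scores (the tanh terms of Eqs. 16 and 18 before the exponential); the function names and data are illustrative:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # shift for numerical stability
    return e / e.sum()

def hierarchical_attention(sent_scores, word_scores):
    """sent_scores: (m,) unnormalized sentence scores (Eq. 16).
    word_scores: list of m arrays, one per sentence (Eq. 18).
    Returns alpha (Eq. 15) and beta (Eq. 17): each sentence's
    within-sentence word attention is rescaled by that sentence's
    attention alpha_i."""
    alpha = softmax(sent_scores)
    beta = [alpha[i] * softmax(word_scores[i])
            for i in range(len(word_scores))]
    return alpha, beta
```

Because each within-sentence distribution sums to 1 and is scaled by $\alpha_{t,i}$, the word-level attention over the whole document also sums to 1.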
The abstractive summary of a long document can be viewed as a new expression of the most salient sentences of the document, so a well-learned sentence extractor and a well-learned attention distribution should both be able to detect the important sentences of the source document. Motivated by this, we design a constraint on the sentence-level attention in the form of an L2 loss:

$$E_a = \frac{1}{mT}\sum_{i=1}^{m}\sum_{t=1}^{T}(p_i - \alpha_{t,i})^2 \quad (19)$$

As $p_i$ is calculated by the logistic function, which is trained simultaneously with the decoder, it is not suitable to constrain the attention with the inaccurate $p_i$. In our experiments, we therefore use the labels of the sentence extractor to constrain the attention.
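A sketch of the constraint in Eq. (19), where the per-sentence targets may be the scores $p_i$ or, as in our experiments, the extractor labels; shapes and data are illustrative:

```python
import numpy as np

def attention_constraint(alpha, targets):
    """Eq. (19): mean squared difference between the sentence-level
    attention alpha (shape T x m, one row per decoding step) and the
    per-sentence targets (shape m,)."""
    alpha = np.asarray(alpha)
    return np.mean((targets[None, :] - alpha) ** 2)
```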
2.5 Joint Learning

We combine the three loss functions mentioned above to train our proposed model: the negative log likelihood loss $E_y$ for the decoder, the cross entropy loss $E_{se}$ for the extractor, and the L2 loss $E_a$ as the attention constraint, which acts as a regularizer. Hence,

$$E = E_y + \lambda \cdot E_{se} + \gamma \cdot E_a \quad (20)$$

The parameters are trained to minimize the joint loss function. In the inference stage, we use the beam search algorithm to select the word that approximately maximizes the conditional probability.
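The joint objective of Eq. (20) and beam-search decoding can be sketched as follows. The step function, toy vocabulary, and beam handling here are illustrative assumptions; the defaults $\lambda = 100$, $\gamma = 0.5$, and beam size 5 follow Sect. 3.2:

```python
import math

def joint_loss(e_y, e_se, e_a, lam=100.0, gamma=0.5):
    """Eq. (20): weighted sum of decoder, extractor, and constraint losses."""
    return e_y + lam * e_se + gamma * e_a

def beam_search(step_fn, start, eos, beam_size=5, max_len=20):
    """step_fn(prefix) -> {token: prob}. Keeps the beam_size prefixes
    with the highest accumulated log-probability."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix[-1] == eos:          # finished hypotheses survive as-is
                candidates.append((prefix, score))
                continue
            for tok, p in step_fn(prefix).items():
                candidates.append((prefix + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
        if all(p[-1] == eos for p, _ in beams):
            break
    return beams[0][0]
```

Log-probabilities are summed rather than probabilities multiplied, which avoids underflow on long summaries.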
3 Experimental Setup
3.1 Dataset

We adopt the news dataset collected from the websites of CNN and DailyMail. It was originally prepared for the task of machine reading by Hermann et al. [ ], and [ ] later added labels to the sentences for the task of extractive summarization. The corpus contains pairs of news content and human-generated highlights for training, validation and test. Table 1 lists the details of the dataset.
Table 1. The statistics of the CNN/DailyMail dataset. S.S.N. indicates the average number of sentences in the source document. S.S.L. indicates the average length of the sentences in the source document. T.S.L. indicates the average length of the sentences in the target summary.

Dataset         Train     Valid    Test     S.S.N.   S.S.L.   T.S.L.
CNN/DailyMail   277,554   13,367   11,443   26.9     27.3     53.8
3.2 Implementation Details

In our implementation, we set the vocabulary size $D$ to 50K and the word embedding size $d$ to 300. The word embeddings are not pretrained, as the training corpus is large enough to train them from scratch. We cut off the documents at a maximum of 35 sentences and truncate the sentences at a maximum of 50 words. We also truncate the target summaries at a maximum of 100 words. The word-level encoder and the sentence-level encoder each correspond to a layer of bidirectional GRU, and the decoder is a layer of unidirectional GRU. All three networks have a hidden size $H$ of 200. For the loss function, $\lambda$ is set to 100 and $\gamma$ to 0.5. During training, we use the Adagrad optimizer with a learning rate of 0.15 and an initial accumulator value of 0.1. The mini-batch size is 16. We implement the model in TensorFlow and train it on a GTX-1080Ti GPU. The beam search size for decoding is 5. We use ROUGE scores [ ] to evaluate the summarization models.
4 Experimental Results
4.1 Comparison with Baselines

We compare the full-length Rouge-F1 scores on the entire CNN/DailyMail test set. We use the fundamental sequence-to-sequence attentional model and the words-lvt2k-hieratt model as baselines.

Table 2. Performance comparison of various abstractive models on the entire CNN/DailyMail test set using full-length F1 variants of Rouge.

Method                Rouge-1   Rouge-2   Rouge-L
seq2seq+attn          33.6      12.3      31.0
words-lvt2k-hieratt   35.4      13.3      32.6
Our method            35.8      13.6      33.4

From Table 2 we can see that our model performs best on Rouge-1, Rouge-2 and Rouge-L. Compared with the vanilla sequence-to-sequence attentional model, our proposed model performs considerably better. Compared with the hierarchical model, our model performs better on Rouge-L, which is due to the incorporation of the auxiliary task.
4.2 Evaluation of Proposed Components

To verify the effectiveness of our proposed model, we conduct an ablation study by removing the corresponding parts, i.e., the auxiliary extractive task, the attention constraint, and the combination of the two, in order to compare their effects. We choose the full-length Rouge-F1 score on the test set for evaluation. The results are shown in Table 3.

Table 3. Performance comparison of removing the components of our proposed model on the entire CNN/DailyMail test set using full-length F1 variants of Rouge.

Method          Rouge-1   Rouge-2   Rouge-L
Our method      35.8      13.6      33.4
w/o extr        34.3      12.6      31.6
w/o attn        34.7      12.8      32.2
w/o extr+attn   34.2      12.5      31.6