Yi Cai, Yoshiharu Ishikawa, Jianliang Xu (Eds.)

LNCS 10988

Web and Big Data

Second International Joint Conference, APWeb-WAIM 2018
Macau, China, July 23–25, 2018
Proceedings, Part II

  

Lecture Notes in Computer Science 10988

  Commenced Publication in 1973
  Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

  Editorial Board

  David Hutchison Lancaster University, Lancaster, UK

  Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA

  Josef Kittler University of Surrey, Guildford, UK

  Jon M. Kleinberg Cornell University, Ithaca, NY, USA

  Friedemann Mattern ETH Zurich, Zurich, Switzerland

  John C. Mitchell Stanford University, Stanford, CA, USA

  Moni Naor Weizmann Institute of Science, Rehovot, Israel

  C. Pandu Rangan Indian Institute of Technology Madras, Chennai, India

  Bernhard Steffen TU Dortmund University, Dortmund, Germany

  Demetri Terzopoulos University of California, Los Angeles, CA, USA

  Doug Tygar University of California, Berkeley, CA, USA

  Gerhard Weikum Max Planck Institute for Informatics, Saarbrücken, Germany


Editors
Yi Cai, South China University of Technology, Guangzhou, China
Jianliang Xu, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
Yoshiharu Ishikawa, Nagoya University, Nagoya, Japan

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-96892-6
ISBN 978-3-319-96893-3 (eBook)
https://doi.org/10.1007/978-3-319-96893-3
Library of Congress Control Number: 2018948814
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
© Springer International Publishing AG, part of Springer Nature 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

  

Preface

  This volume (LNCS 10987) and its companion volume (LNCS 10988) contain the proceedings of the second Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data, called APWeb-WAIM. This joint conference aims to attract participants from different scientific communities as well as from industry, and not merely from the Asia Pacific region, but also from other continents. The objective is to enable the sharing and exchange of ideas, experiences, and results in the areas of World Wide Web and big data, thus covering Web technologies, database systems, information management, software engineering, and big data.

  The second APWeb-WAIM conference was held in Macau during July 23–25, 2018. As an Asia-Pacific flagship conference focusing on research, development, and applications in relation to Web information management, APWeb-WAIM builds on the successes of APWeb and WAIM: APWeb was previously held in Beijing (1998), Hong Kong (1999), Xi’an (2000), Changsha (2001), Xi’an (2003), Hangzhou (2004), Shanghai (2005), Harbin (2006), Huangshan (2007), Shenyang (2008), Suzhou (2009), Busan (2010), Beijing (2011), Kunming (2012), Sydney (2013), Changsha (2014), Guangzhou (2015), and Suzhou (2016); and WAIM was held in Shanghai (2000), Xi’an (2001), Beijing (2002), Chengdu (2003), Dalian (2004), Hangzhou (2005), Hong Kong (2006), Huangshan (2007), Zhangjiajie (2008), Suzhou (2009), Jiuzhaigou (2010), Wuhan (2011), Harbin (2012), Beidaihe (2013), Macau (2014), Qingdao (2015), and Nanchang (2016). The first joint APWeb-WAIM conference was held in Beijing (2017). With the fast development of Web-related technologies, we expect that APWeb-WAIM will become an increasingly popular forum that brings together outstanding researchers and developers in the field of the Web and big data from around the world.

  The high-quality program documented in these proceedings would not have been possible without the authors who chose APWeb-WAIM for disseminating their findings. Out of 168 submissions, the conference accepted 39 regular (23.21%), 31 short research papers, and six demonstrations. The contributed papers address a wide range of topics, such as text analysis, graph data processing, social networks, recommender systems, information retrieval, data streams, knowledge graph, data mining and application, query processing, machine learning, database and Web applications, big data, and blockchain. The technical program also included keynotes by Prof. Xuemin Lin (The University of New South Wales, Australia), Prof. Lei Chen (The Hong Kong University of Science and Technology, Hong Kong, SAR China), and Prof. Ninghui Li (Purdue University, USA) as well as industrial invited talks by Dr. Zhao Cao (Huawei Blockchain) and Jun Yan (YiDu Cloud). We are grateful to these distinguished scientists for their invaluable contributions to the conference program.

  As a joint conference, teamwork was particularly important for the success of APWeb-WAIM. We are deeply thankful to the Program Committee members and the external reviewers for lending their time and expertise to the conference. Special thanks go to the local Organizing Committee led by Prof. Zhiguo Gong.

  Thanks also go to the workshop co-chairs (Leong Hou U and Haoran Xie), demo co-chairs (Zhixu Li, Zhifeng Bao, and Lisi Chen), industry co-chair (Wenyin Liu), tutorial co-chair (Jian Yang), panel chair (Kamal Karlapalem), local arrangements chair (Derek Fai Wong), and publicity co-chairs (An Liu, Feifei Li, Wen-Chih Peng, and Ladjel Bellatreche). Their efforts were essential to the success of the conference. Last but not least, we wish to express our gratitude to the treasurer (Andrew Shibo Jiang) and the Webmaster (William Sio) for all the hard work, and to our sponsors who generously supported the smooth running of the conference. We hope you enjoy the exciting program of APWeb-WAIM 2018 as documented in these proceedings.

June 2018

  Yi Cai
  Jianliang Xu
  Yoshiharu Ishikawa

Organization

  Organizing Committee

  Honorary Chair
  Lionel Ni, University of Macau, SAR China

  General Co-chairs
  Zhiguo Gong, University of Macau, SAR China
  Qing Li, City University of Hong Kong, SAR China
  Kam-fai Wong, Chinese University of Hong Kong, SAR China

  Program Co-chairs
  Yi Cai, South China University of Technology, China
  Yoshiharu Ishikawa, Nagoya University, Japan
  Jianliang Xu, Hong Kong Baptist University, SAR China

  Workshop Chairs
  Leong Hou U, University of Macau, SAR China
  Haoran Xie, Education University of Hong Kong, SAR China

  Demo Co-chairs
  Zhixu Li, Soochow University, China
  Zhifeng Bao, RMIT, Australia
  Lisi Chen, Wollongong University, Australia

  Tutorial Chair
  Jian Yang, Macquarie University, Australia

  Industry Chair
  Wenyin Liu, Guangdong University of Technology, China

  Panel Chair
  Kamal Karlapalem, IIIT, Hyderabad, India

  Publicity Co-chairs
  An Liu, Soochow University, China
  Feifei Li, University of Utah, USA
  Wen-Chih Peng, National Taiwan University, China
  Ladjel Bellatreche, ISAE-ENSMA, Poitiers, France

  Treasurers
  Leong Hou U, University of Macau, SAR China
  Andrew Shibo Jiang, Macau Convention and Exhibition Association, SAR China

  Local Arrangements Chair
  Derek Fai Wong, University of Macau, SAR China

  Webmaster
  William Sio, University of Macau, SAR China

  Senior Program Committee

  Bin Cui, Peking University, China
  Byron Choi, Hong Kong Baptist University, SAR China
  Christian Jensen, Aalborg University, Denmark
  Demetrios Zeinalipour-Yazti, University of Cyprus, Cyprus
  Feifei Li, University of Utah, USA
  Guoliang Li, Tsinghua University, China
  K. Selçuk Candan, Arizona State University, USA
  Kyuseok Shim, Seoul National University, South Korea
  Makoto Onizuka, Osaka University, Japan
  Reynold Cheng, The University of Hong Kong, SAR China
  Toshiyuki Amagasa, University of Tsukuba, Japan
  Walid Aref, Purdue University, USA
  Wang-Chien Lee, Pennsylvania State University, USA
  Wen-Chih Peng, National Chiao Tung University, Taiwan
  Wook-Shin Han, Pohang University of Science and Technology, South Korea
  Xiaokui Xiao, National University of Singapore, Singapore
  Ying Zhang, University of Technology Sydney, Australia

  Program Committee

  Alex Thomo, University of Victoria, Canada
  An Liu, Soochow University, China
  Baoning Niu, Taiyuan University of Technology, China
  Bin Yang, Aalborg University, Denmark
  Bo Tang, Southern University of Science and Technology, China
  Zouhaier Brahmia, University of Sfax, Tunisia
  Carson Leung, University of Manitoba, Canada
  Cheng Long, Queen’s University Belfast, UK
  Chih-Chien Hung, Tamkang University, China
  Chih-Hua Tai, National Taipei University, China
  Cuiping Li, Renmin University of China, China
  Daniele Riboni, University of Cagliari, Italy
  Defu Lian, Big Data Research Center, University of Electronic Science and Technology of China, China
  Dejing Dou, University of Oregon, USA
  Dimitris Sacharidis, Technische Universität Wien, Austria
  Ganzhao Yuan, Sun Yat-sen University, China
  Giovanna Guerrini, Università di Genova, Italy
  Guanfeng Liu, The University of Queensland, Australia
  Guoqiong Liao, Jiangxi University of Finance and Economics, China
  Guanling Lee, National Dong Hwa University, China
  Haibo Hu, Hong Kong Polytechnic University, SAR China
  Hailong Sun, Beihang University, China
  Han Su, University of Southern California, USA
  Haoran Xie, The Education University of Hong Kong, SAR China
  Hiroaki Ohshima, University of Hyogo, Japan
  Hong Chen, Renmin University of China, China
  Hongyan Liu, Tsinghua University, China
  Hongzhi Wang, Harbin Institute of Technology, China
  Hongzhi Yin, The University of Queensland, Australia
  Hua Wang, Victoria University, Australia
  Ilaria Bartolini, University of Bologna, Italy
  James Cheng, Chinese University of Hong Kong, SAR China
  Jeffrey Xu Yu, Chinese University of Hong Kong, SAR China
  Jiajun Liu, Renmin University of China, China
  Jialong Han, Nanyang Technological University, Singapore
  Jianbin Huang, Xidian University, China
  Jian Yin, Sun Yat-sen University, China
  Jiannan Wang, Simon Fraser University, Canada
  Jianting Zhang, City College of New York, USA
  Jianxin Li, Beihang University, China
  Jianzhong Qi, University of Melbourne, Australia
  Jinchuan Chen, Renmin University of China, China
  Ju Fan, Renmin University of China, China
  Jun Gao, Peking University, China
  Junhu Wang, Griffith University, Australia
  Kai Zeng, Microsoft, USA
  Kai Zheng, University of Electronic Science and Technology of China, China
  Karine Zeitouni, Université de Versailles Saint-Quentin, France
  Lei Zou, Peking University, China
  Leong Hou U, University of Macau, SAR China
  Liang Hong, Wuhan University, China
  Lianghuai Yang, Zhejiang University of Technology, China

  Lisi Chen, Wollongong University, Australia
  Lu Chen, Aalborg University, Denmark
  Maria Damiani, University of Milan, Italy
  Markus Endres, University of Augsburg, Germany
  Mihai Lupu, Vienna University of Technology, Austria
  Mirco Nanni, ISTI-CNR Pisa, Italy
  Mizuho Iwaihara, Waseda University, Japan
  Peiquan Jin, University of Science and Technology of China, China
  Peng Wang, Fudan University, China
  Qin Lu, University of Technology Sydney, Australia
  Ralf Hartmut Güting, Fernuniversität in Hagen, Germany
  Raymond Chi-Wing Wong, Hong Kong University of Science and Technology, SAR China
  Ronghua Li, Shenzhen University, China
  Rui Zhang, University of Melbourne, Australia
  Sanghyun Park, Yonsei University, South Korea
  Sanjay Madria, Missouri University of Science and Technology, USA
  Shaoxu Song, Tsinghua University, China
  Shengli Wu, Jiangsu University, China
  Shimin Chen, Chinese Academy of Sciences, China
  Shuai Ma, Beihang University, China
  Shuo Shang, King Abdullah University of Science and Technology, Saudi Arabia
  Takahiro Hara, Osaka University, Japan
  Tieyun Qian, Wuhan University, China
  Tingjian Ge, University of Massachusetts, Lowell, USA
  Tom Z. J. Fu, Advanced Digital Sciences Center, Singapore
  Tru Cao, Ho Chi Minh City University of Technology, Vietnam
  Vincent Oria, New Jersey Institute of Technology, USA
  Wee Ng, Institute for Infocomm Research, Singapore
  Wei Wang, University of New South Wales, Australia
  Weining Qian, East China Normal University, China
  Weiwei Sun, Fudan University, China
  Wen Zhang, Wuhan University, China
  Wolf-Tilo Balke, Technische Universität Braunschweig, Germany
  Wookey Lee, Inha University, South Korea
  Xiang Zhao, National University of Defence Technology, China
  Xiang Lian, Kent State University, USA
  Xiangliang Zhang, King Abdullah University of Science and Technology, Saudi Arabia
  Xiangmin Zhou, RMIT University, Australia
  Xiaochun Yang, Northeast University, China
  Xiaofeng He, East China Normal University, China
  Xiaohui (Daniel) Tao, The University of Southern Queensland, Australia
  Xiaoyong Du, Renmin University of China, China
  Xike Xie, University of Science and Technology of China, China

  Xin Cao, The University of New South Wales, Australia
  Xin Huang, Hong Kong Baptist University, SAR China
  Xin Wang, Tianjin University, China
  Xingquan Zhu, Florida Atlantic University, USA
  Xuan Zhou, Renmin University of China, China
  Yafei Li, Zhengzhou University, China
  Yanghua Xiao, Fudan University, China
  Yanghui Rao, Sun Yat-sen University, China
  Yang-Sae Moon, Kangwon National University, South Korea
  Yaokai Feng, Kyushu University, Japan
  Yi Cai, South China University of Technology, China
  Yijie Wang, National University of Defense Technology, China
  Yingxia Shao, Peking University, China
  Yongxin Tong, Beihang University, China
  Yu Gu, Northeastern University, China
  Yuan Fang, Institute for Infocomm Research, Singapore
  Yunjun Gao, Zhejiang University, China
  Zakaria Maamar, Zayed University, United Arab Emirates
  Zhaonian Zou, Harbin Institute of Technology, China
  Zhiwei Zhang, Hong Kong Baptist University, SAR China

Keynotes

  

Graph Processing: Applications, Challenges,

and Advances

  Xuemin Lin

  

School of Computer Science and Engineering,

University of New South Wales, Sydney

lxue@cse.unsw.edu.au

Abstract. Graph data are key parts of Big Data and widely used for modelling complex structured data with a broad spectrum of applications. Over the last decade, tremendous research efforts have been devoted to many fundamental problems in managing and analyzing graph data. In this talk, I will cover various applications, challenges, and recent advances. We will also look to the future of the area.

  

Differential Privacy in the Local Setting

  Ninghui Li

  

Department of Computer Sciences, Purdue University

ninghui@cs.purdue.edu

Abstract. Differential privacy has been increasingly accepted as the de facto standard for data privacy in the research community. Recently, techniques for satisfying differential privacy (DP) in the local setting, which we call LDP, have been deployed. Such techniques enable the gathering of statistics while preserving privacy of every user, without relying on trust in a single data curator. Companies such as Google, Apple, and Microsoft have deployed techniques for collecting user data while satisfying LDP. In this talk, we will discuss the state of the art of LDP. We survey recent developments for LDP, and discuss protocols for estimating frequencies of different values under LDP, and for computing marginals when each user has multiple attributes. Finally, we discuss limitations and open problems of LDP.

  

Big Data, AI, and HI, What is the Next?

  Lei Chen

  

Department of Computer Science and Engineering, Hong Kong University

of Science and Technology

leichen@cse.ust.hk

Abstract. Recently, AI has become quite popular and attractive, not only to academia but also to industry. The success stories of AI in AlphaGo and Texas hold ’em games have raised significant public interest in AI. Meanwhile, human intelligence is turning out to be more sophisticated, and Big Data technology is everywhere to improve our quality of life. The question we all want to ask is “what is the next?”. In this talk, I will discuss DHA, a new computing paradigm, which combines big Data, Human intelligence, and AI. First I will briefly explain the motivation of DHA. Then I will present some challenges and possible solutions to build this new paradigm.

  

Contents – Part II

   He Chen, Xiuxia Tian, and Cheqing Jin

   Jian Li, An Liu, Weiqi Wang, Zhixu Li, Guanfeng Liu, Lei Zhao, and Kai Zheng

   Huan Zhou, Jinwei Guo, Ouya Pei, Weining Qian, Xuan Zhou, and Aoying Zhou

   Xiaolan Zhang, Yeting Li, Fei Tian, Fanlin Cui, Chunmei Dong, and Haiming Chen

   Wei Chit Tan

   Jiaxun Hua, Yu Liu, Yibin Shen, Xiuxia Tian, and Cheqing Jin

   Rong Kang, Chen Wang, Peng Wang, Yuting Ding, and Jianmin Wang

   Guiling Wang, Xiaojiang Zuo, Marc Hesenius, Yao Xu, Yanbo Han, and Volker Gruhn

  

  Wentao Wang, Chunqiu Zeng, and Tao Li

  Hongbo Sun, Chenkai Guo, Jing Xu, Jingwen Zhu, and Chao Zhang

  Rong Liu, Guanglin Cong, Bolong Zheng, Kai Zheng, and Han Su

  Wentao Wang, Bo Tang, and Min Zhu

  Na Ta, Jiuqi Wang, and Guoliang Li

  Chen Yang, Wei Chen, Bolong Zheng, Tieke He, Kai Zheng, and Han Su

  Xibo Zhou, Qiong Luo, Dian Zhang, and Lionel M. Ni

  Minxi Li, Jiali Mao, Xiaodong Qi, Peisen Yuan, and Cheqing Jin

  Meiling Zhu, Chen Liu, and Yanbo Han

  Zhaoan Dong, Ju Fan, Jiaheng Lu, Xiaoyong Du, and Tok Wang Ling

  Guohai Xu, Chengyu Wang, and Xiaofeng He

  Haifeng Zhu, Pengpeng Zhao, Zhixu Li, Jiajie Xu, Lei Zhao, and Victor S. Sheng

  Yuan Fang, Lizhen Wang, and Teng Hu


   Lihua Zhou, Guowang Du, Qing Xiao, and Lizhen Wang

  

  Xin Ding, Yuanliang Zhang, Lu Chen, Keyu Yang, and Yunjun Gao

  Muxi Leng, Yajun Yang, Junhu Wang, Qinghua Hu, and Xin Wang

  Guohui Li, Qi Chen, Bolong Zheng, and Xiaosong Zhao

  Wenyan Chen, Zheng Liu, Wei Shi, and Jeffrey Xu Yu

  Zhefan Zhong, Xin Lin, Liang He, and Yan Yang

  Yifeng Luo, Junshi Guo, and Shuigeng Zhou

  Yin Zhang, Huiping Liu, Cheqing Jin, and Ye Guo

  Kun Hao, Junchang Xin, Zhiqiong Wang, Zhuochen Jiang, and Guoren Wang

  An Zhang and Kunlong Zhang

  Dayu Jia, Junchang Xin, Zhiqiong Wang, Wei Guo, and Guoren Wang

  Wen Zhao and Xiaoying Wu

  

Contents – Part I

   Gang Chen, Yue Peng, and Chongjun Wang

   Zenan Xu, Yetao Fu, Xingming Chen, Yanghui Rao, Haoran Xie, Fu Lee Wang, and Yang Peng

   Xuewen Shi, Heyan Huang, Ping Jian, and Yi-Kun Tang

   Jianjun Cheng, Longjie Li, Haijuan Yang, Qi Li, and Xiaoyun Chen

   Jian Xu, Xiaoyi Fu, Liming Tu, Ming Luo, Ming Xu, and Ning Zheng

   Menghao Zhang, Binbin Hu, Chuan Shi, Bin Wu, and Bai Wang

   Zhijian Zhang, Ling Liu, Kun Yue, and Weiyi Liu

  

  Guowang Du, Lihua Zhou, Lizhen Wang, and Hongmei Chen

  Cairong Yan, Yan Huang, Qinglong Zhang, and Yan Wan

  Zhang Chuanyan, Hong Xiaoguang, and Peng Zhaohui

  Xiaotian Han, Chuan Shi, Lei Zheng, Philip S. Yu, Jianxin Li, and Yuanfu Lu

  Feifei Li, Hongyan Liu, Jun He, and Xiaoyong Du

  Hongzhi Liu, Yingpeng Du, and Zhonghai Wu

  Zhenhua Huang, Chang Yu, Jiujun Cheng, and Zhixiao Wang

  Mohammad Hossein Namaki, Yinghui Wu, and Xin Zhang

  Xin Ding, Yuanliang Zhang, Lu Chen, Yunjun Gao, and Baihua Zheng

  Yan Fan, Xinyu Liu, Shuni Gao, Zhaohua Zhang, Xiaoguang Liu, and Gang Wang

  Yang Song, Yu Gu, and Ge Yu

  Zhongmin Zhang, Jiawei Chen, and Shengli Wu


   Anzhen Zhang, Jinbao Wang, Jianzhong Li, and Hong Gao

  

  Wenjing Wei, Xiaoyi Jia, Yang Liu, and Xiaohui Yu

  Yi Zhao, Yong Huang, and Yanyan Shen

  Linlin Gao, Haiwei Pan, Fujun Liu, Xiaoqin Xie, Zhiqiang Zhang, Jinming Han, and the Alzheimer’s Disease Neuroimaging Initiative

   Bofang Li, Tao Liu, Zhe Zhao, and Xiaoyong Du

   Fan Feng, Jikai Wu, Wei Sun, Yushuang Wu, HuaKang Li, and Xingguo Chen

   Zherong Zhang, Wenge Rong, Yuanxin Ouyang, and Zhang Xiong

  

  Jianhui Ding, Shiheng Ma, Weijia Jia, and Minyi Guo

  Yongjian You, Shaohua Zhang, Jiong Lou, Xinsong Zhang, and Weijia Jia

  Zichao Huang, Bo Li, and Jian Yin

  Qiang Xu, Xin Wang, Jianxin Li, Ying Gan, Lele Chai, and Junhu Wang

  Yongxin Shen, Zhixu Li, Wenling Zhang, An Liu, and Xiaofang Zhou

   Guozheng Rao, Bo Zhao, Xiaowang Zhang, Zhiyong Feng, and Guohui Xiao

  

  Peizhong Yang, Tao Zhang, and Lizhen Wang

  Yuan Fang, Lizhen Wang, Teng Hu, and Xiaoxuan Wang

  Zichen Wang, Tian Li, Yingxia Shao, and Bin Cui

  Xiaoli Wang, Chuchu Gao, Jiangjiang Cao, Kunhui Lin, Wenyuan Du, and Zixiang Yang

  Chaozhou Yang, Xin Wang, Qiang Xu, and Weixi Li

  Yajie Zhu, Feng Xiong, Qing Xie, Lin Li, and Yongjian Liu


  Database and Web Applications

  

Fuzzy Searching Encryption with Complex Wild-Cards Queries on Encrypted Database

  He Chen¹, Xiuxia Tian²(✉), and Cheqing Jin¹

  ¹ School of Data Science and Engineering, East China Normal University, Shanghai, China
  watch ch@163.com, cqjin@dase.ecnu.edu.cn
  ² College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai, China
  xxtian@fudan.edu.cn

Abstract. Achieving fuzzy searching encryption (FSE) can greatly enrich the basic functions over cipher-texts, especially on encrypted databases (like CryptDB). However, most proposed schemes are based on centralized inverted indexes, which cannot handle complicated queries with wild-cards. In this paper, we present a well-designed FSE schema that uses Locality-Sensitive-Hashing and Bloom-Filter algorithms to generate two types of auxiliary columns respectively. Furthermore, an adaptive rewriting method is described to satisfy queries with wild-cards, such as percent and underscore. Besides, security enhanced improvements are provided to avoid extra message leakage. The extensive experiments show the effectiveness and feasibility of our work.

Keywords: Fuzzy searching encryption · Wild-cards searching · CryptDB

Supported by the National Key Research and Development Program of China (No. 2016YFB1000905), NSFC (Nos. 61772327, 61532021, U1501252, U1401256 and 61402180), Project of Shanghai Science and Technology Committee Grant (No. 15110500700).

© Springer International Publishing AG, part of Springer Nature 2018. Y. Cai et al. (Eds.): APWeb-WAIM 2018, LNCS 10988, pp. 3–18, 2018.

1 Introduction

  Cloud database is a prevalent paradigm for data outsourcing. In consideration of data security and commercial privacy, both individuals and enterprises prefer outsourcing their data in encrypted form. CryptDB is a typical outsourced encrypted database (OEDB) which supports executing SQL statements on cipher-texts. Its transparency essentially relies on the design of splitting attributions and rewriting queries on proxy middle-ware. Under this proxy-based encrypted framework, several auxiliary columns are extended with different encryptions, and query semantics are preserved through modifying or appending SQL statements.

  To enrich basic functions on cipher-texts, searchable symmetric encryption (SSE) is proposed for keyword searching with encrypted inverted indexes, and then dynamic SSE (DSSE) achieves alterations on various centralized indexes to enhance applicability. Besides, the studies about exact searching with boolean expressions are extended in this field to increase accuracy. Furthermore, similar searching among documents or words has been widely discussed through introducing locality sensitive hashing algorithms. However, these proposed schemes are not applicable to the OEDB scenario because of their centralized index design, and they cannot handle complex fuzzy searching with wild-cards.

  [Figure 1: client-proxy-database architecture. Flow labels: Queries, Rewritten SQLs, Matched DET values, Plaintext results. Different encryptions: DET (deterministic cipher-texts), OPE (preserving order of numbers), HOM (homomorphism for sum, avg), FSE (fuzzy searching with wild-cards).]

  Fig. 1. The client-proxy-database framework synthesizes various encryptions together: the deterministic encryption (DET) preserves the symmetric character for en/decryption, the order-preserving encryption (OPE) preserves order among numeric values, the fuzzy searching encryption (FSE) handles queries on text, and the homomorphic encryption (HOM) achieves aggregation computing.

  Therefore, it is meaningful and necessary to achieve fuzzy searching encryption over outsourced encrypted database. As shown in Fig. 1, the specific framework accomplishes transparency and homomorphism by rewriting SQL statements on auxiliary columns. In this paper, we focus on resolving the functionality of ‘like’ queries with wild-cards (‘%’ and ‘_’). Our contributions are summarized as follows:

  • We propose a fuzzy searching encryption with complex wild-cards queries on encrypted database, which extends the functionality of client-proxy-database frameworks like CryptDB.
  • We present an adaptive rewriting method to handle different query cases on two types of auxiliary columns. The former column works for similar searching by locality sensitive hashing, and the latter multiple columns work for maximum substring matching by designed bloom-filter vectors.
  • We evaluate the efficiency, correctness rate and space overhead by adjusting the parameters in auxiliary columns. Besides, security enhanced improvements are provided to avoid extra message leakage. The extensive experiments also indicate the effectiveness and feasibility of our work.
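As a point of reference for the wild-card semantics targeted above, the following is a minimal plaintext sketch, not the encrypted scheme of this paper. It assumes the standard SQL meaning of the two wild-cards (percent matches any run of characters, underscore matches exactly one character) and ignores the optional ESCAPE clause and collation rules. The helper names `like_to_regex` and `like` are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into an equivalent regular expression.

    Assumption: '%' matches any (possibly empty) run of characters and
    '_' matches exactly one character; no ESCAPE clause is supported.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Every other character must match literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("secure", "sec%"))    # True
print(like("secure", "s_cure"))  # True
print(like("secure", "sec_"))    # False
```

Any encrypted realization of ‘like’ has to reproduce this behaviour over cipher-texts, which is why the two wild-cards are treated as separate query cases.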

  The rest of the paper is organized as follows. Section 2 reviews related work, and Section 3 introduces preliminaries. A subsequent section describes our schema, including initialization of auxiliary columns, adaptive rewriting of queries, and security enhanced improvements.

  2 Related Work

  In recent years, many proposed schemes have attempted to achieve fuzzy searching encryption with the help of similarity. Most of them introduce locality sensitive hashing (LSH) to map similar items together and bloom-filter to change the method of measuring. Wang et al.’s work was one of the first works to present fuzzy searching. They encode every word in each file into the same large bloom-filter space as a vector and evaluate the similarity of target queries by computing the inner product among vectors for top-k results. Kuzu et al.’s work generates similar feature vectors by embedding keyword strings into the Euclidean space, which approximately preserves the relative edit distance. Fu et al.’s work proposes an efficient multi-keyword fuzzy ranked search schema which is suitable for common spelling mistakes. It benefits from counting uni-gram among keywords and transvection sorting to obtain ranked candidates. Wang et al.’s work generates a high-dimensional feature vector by LSH to support large-scale similarity search over encrypted feature-rich multimedia data. It stores encrypted inverted file identifier vectors as indexes while mapping similar objects into the same or neighboring keyword-buckets by LSH based on Euclidean distance. In contrast to sparse vectors from bi-gram mapping, their work eliminates the sparsity and improves the correctness as well. However, there are many problems in existing schemes, including insufficient metric conversion, coarse-grained similarity comparison, extreme dependency on assistant programs, and the neglect of wild-card queries.

  Meanwhile, the proposal of CryptDB has attracted world-wide attention because it provides a practical way to combine various attribution-preserving encryptions over encrypted database. Since then, many analogous studies have examined its security definitions, feasible frameworks, extensible functions and optimizations. Chen et al. consider these encrypted databases as a client-proxy-database framework and present a symmetric column for en/decryption and auxiliary columns for supporting executions. This framework helps execute SQL statements directly over cipher-texts through appending auxiliary columns with different encryptions. It also benefits from the transparency of en/decryption processes and combines various functional encryptions together. Therefore, it is meaningful to achieve efficient fuzzy searching with complex wild-cards queries on proxy-based encrypted database.

  3 Preliminaries

  3.1 Basic Concepts

  A. N-gram. In the fields of computational linguistics and probability, the n-gram method is proposed for measurement by generating a contiguous sequence of items from given strings. Essentially, it converts texts to fragment sets for vectorization while preserving some connotative connections. As shown in Table 1, various n-gram methods are utilized to preserve different implicit inner relations from the original strings.


  

Table 1. Various n-gram forms in our scheme

  N-gram methods          Value                        Description
  String                  secure                       The original keyword
  Counting uni-gram [8]   s1, e1, c1, u1, r1, e2       Preserve repetitions
  Bi-gram                 #s, se, ec, cu, ur, re, e#   Preserve adjacent letters
  Tri-gram                sec, ecu, cur, ure           Preserve triple adjacent letters
  Prefix and suffix       @s, e@                       Beginning and ending of sentence

  In general, bi-gram is the most common converting method, which maintains the connotative information between adjacent letters. However, changing a single letter alters two bi-grams, which reduces the matching probability. The counting uni-gram preserves repetitions and benefits letter-confusion cases, such as misspelling a letter, missing or adding a letter, and reversing the order of two letters. However, it loosens the constraint and therefore increases false positives. The tri-gram is a stricter method which only suits specific scenarios such as existence judgment. The prefix and suffix preserve the beginning and ending of data to support edge-searching.
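The n-gram forms of Table 1 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the padding symbols passed as defaults (`#` for bi-grams, `@` for prefix and suffix) are taken from the table values, while everything else is an assumption rather than the authors' implementation.

```python
from collections import Counter


def counting_unigram(word: str) -> list:
    """Label each letter with its running occurrence count (s1, e1, ..., e2)."""
    seen = Counter()
    out = []
    for ch in word:
        seen[ch] += 1
        out.append(f"{ch}{seen[ch]}")
    return out


def bigram(word: str, pad: str = "#") -> list:
    """Adjacent letter pairs over the word padded on both sides."""
    padded = f"{pad}{word}{pad}"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def trigram(word: str) -> list:
    """Adjacent letter triples, without padding."""
    return [word[i:i + 3] for i in range(len(word) - 2)]


def prefix_suffix(text: str, mark: str = "@") -> list:
    """Markers for the first and last character of the text."""
    if not text:
        return []
    return [f"{mark}{text[0]}", f"{text[-1]}{mark}"]


print(counting_unigram("secure"))  # ['s1', 'e1', 'c1', 'u1', 'r1', 'e2']
print(bigram("secure"))            # ['#s', 'se', 'ec', 'cu', 'ur', 're', 'e#']
print(trigram("secure"))           # ['sec', 'ecu', 'cur', 'ure']
print(prefix_suffix("secure"))     # ['@s', 'e@']
```

Replacing the letter "c" with "k" changes exactly two of the seven bi-grams ("ec" and "cu"), which illustrates why a single-letter change lowers the bi-gram matching probability.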

  B. Bloom-Filter. The Bloom-filter is a compact structure reflecting whether specific elements exist in a prepared set. In our schema, we introduce this algorithm to judge the existence of maximized substring fragments and represent the sparse vector through decimal numbers in separated columns. Given a word-fragment set $S = \{e_1, \ldots, e_{\#e}\}$, a bloom-filter maps each element $e_i$ into the same $l$-bit sparse array by $k$ independent hash functions. A positive answer is provided only if all bits of the matched positions are true.
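A minimal Bloom-filter sketch follows, assuming the $k$ hash functions are simulated by salting SHA-256 with the function index (an illustrative choice, not the construction used in the paper). The $l$-bit array is kept as a single Python integer, which mirrors the idea of storing the vector as a decimal number; the class and helper names are hypothetical.

```python
import hashlib


class BloomFilter:
    """An l-bit Bloom filter stored as one integer, with k salted hashes."""

    def __init__(self, l: int = 64, k: int = 3):
        self.l = l        # length of the bit array
        self.k = k        # number of hash functions
        self.bits = 0     # the bit array, as an integer

    def _positions(self, element: str):
        # Assumption: hash i is SHA-256 over "i:element", reduced modulo l.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{element}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.l

    def add(self, element: str) -> None:
        for pos in self._positions(element):
            self.bits |= 1 << pos

    def might_contain(self, element: str) -> bool:
        # Positive only if all k matched positions are set.
        return all((self.bits >> pos) & 1 for pos in self._positions(element))

    def covers(self, other: "BloomFilter") -> bool:
        # Bitwise test: every bit set in `other` is also set here.
        return (self.bits & other.bits) == other.bits


def build(fragments, l: int = 64, k: int = 3) -> BloomFilter:
    bf = BloomFilter(l, k)
    for fragment in fragments:
        bf.add(fragment)
    return bf


stored = build(["se", "ec", "cu", "ur", "re"])  # fragments of "secure"
query = build(["cu", "ur"])                     # fragments of "cur"
print(stored.covers(query))  # True: a Bloom filter has no false negatives
```

The `covers` check can answer positively for fragments that were never inserted (a false positive), so a positive result indicates a candidate match rather than a guaranteed one.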

  C. Locality Sensitive Hashing.

  The locality sensitive hashing (LSH) algorithm helps reduce the dimension of high-dimensional data. In our schema, we introduce this algorithm to map similar items together with high probability. The concrete form of the algorithm differs under different distance measures. However, no such method is available for the Levenshtein distance between texts, so a common practice is to convert texts into fragment sets with n-gram methods.

  Definition 1 (Locality sensitive hashing). Given a distance metric function $D$, a hash function family $H = \{h_i : \{0,1\}^d \to \{0,1\}^t \mid i = 1, \ldots, M\}$ is $(r_1, r_2, p_1, p_2)$-sensitive if any $s, t \in \{0,1\}^d$ and any $h_i \in H$ satisfy:

  if $D(s, t) \le r_1$ then $\Pr[h_i(s) = h_i(t)] \ge p_1$;

  if $D(s, t) \ge r_2$ then $\Pr[h_i(s) = h_i(t)] \le p_2$.

  For nearest neighbor searching, $p_1 > p_2$ and $r_1 < r_2$ are needed. Practically, feasible permutations are generated through surjective hashing functions with our security parameter $\lambda$. The minhash algorithm then maps the fragment set of every separated word to signatures, which achieves similar searching.
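  As a worked instance of Definition 1, the standard collision property of minhash can be written out. The symbols $A$, $B$ (two fragment sets), $\pi$ (a uniformly random permutation of the fragment universe), $J$ and $D_J$ are introduced here only for illustration and do not appear in the original text.

```latex
\Pr_{\pi}\bigl[\min \pi(A) = \min \pi(B)\bigr] \;=\; J(A, B) \;=\; \frac{|A \cap B|}{|A \cup B|},
\qquad D_J(A, B) \;=\; 1 - J(A, B).

% Consequently, for thresholds r_1 < r_2 on the Jaccard distance:
D_J(A, B) \le r_1 \;\Rightarrow\; \Pr_{\pi}[\text{collision}] \ge 1 - r_1 = p_1,
\qquad
D_J(A, B) \ge r_2 \;\Rightarrow\; \Pr_{\pi}[\text{collision}] \le 1 - r_2 = p_2 .
```

  Under these assumptions minhash is $(r_1, r_2, 1 - r_1, 1 - r_2)$-sensitive for the Jaccard distance, and $r_1 < r_2$ directly gives $p_1 > p_2$ as required for nearest neighbor searching.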

  Fuzzy Searching Encryption with Complex Wild-Cards Queries


3.2 Functional Model

  Let $D = (d_1, \ldots, d_{\#D})$ be the sensitive row data, where each line contains some words, $d_i = \{w_j^i\}_{j=1}^{|d_i|}$, and let $C = \{c_{det}, c_{lsh}, c_{bf}\}$ be the corresponding cipher-texts. Two types of indexing methods are enforced. The first achieves similar searching among words through dimension reduction with locality sensitive hashing (let $m$ be the dimension of LSH, $n$ the tolerance, and $L_m^n$ its conversion). The second achieves maximum substring matching through bit operations with a bloom-filter (let $l$ be the length of the vector space, $k$ the number of hash functions, and $B_l^k$ its conversion). We take the LSH token set $T_i = \{L_m^n(G_{ss}(w_j^i))\}_{j=1}^{|d_i|}$ as the elementary ciphers for $c_{lsh}$, and the BF vector $V_i = B_l^k(\{G_{msm}(w_j^i)\}_{j=1}^{|d_i|})$ as the cipher of the whole continuous sequence for $c_{bf}$.

  Here, $G$ denotes the n-gram method used for similar searching ($G_{ss}$) or for maximum substring matching ($G_{msm}$).

  Definition 2 (Fuzzy searching encryption). A proxy-based encrypted database implements fully fuzzy searching by rewriting SQL statements through the following polynomial-time algorithms:

  $(K_{det}, L_m^n, B_l^k) \leftarrow \mathrm{KeyGen}(\lambda, m, n, l, k)$: Given the security parameter $\lambda$, the dimension $m$ of LSH and the tolerance $n$, the vector length $l$ of BF and the number of hash functions $k$, it outputs a primary key $K_{det}$ for deterministic encryption, $L_m^n$ for LSH, and $B_l^k$ for BF. The security parameter $\lambda$ helps initialize the hash functions and the randomization processes.

  $(c_{det}, T_i, V_i) \leftarrow \mathrm{Index}(d_i, L_m^n, B_l^k)$: Given the LSH function $L_m^n$ and the BF function $B_l^k$, the plain-text $d_i$ is encrypted to the deterministic cipher-text $c_{det}$, the ciphers $T_i$ for similar searching, and the ciphers $V_i$ for maximum substring matching, respectively.

  $(c_{det} \,\|\, T_i \,\|\, V_i) \leftarrow \mathrm{Trapdoor}(expression)$: Given the query expression parsed from the 'like' clause, the adaptive rewriting method generates the representative elements according to the wild-card conditions. The deterministic cipher-texts are returned in the next step over the encrypted database, and $K_{det}$ is used for decryption.

  As the definition of fuzzy searching encryption shows, we mainly emphasize transformation processes such as building, indexing and executing. Other functional methods, such as updating and deleting, make our schema dynamic. The schema is applicable to an outsourced encrypted database through rewriting SQL statements, including 'create', 'insert', 'select' and so on.

3.3 Security Notions

  Our security definition follows the widely-accepted security frameworks in this field. For fuzzy queries over an encrypted database, the overall security relies on the cryptographic assurance of the indexes and trapdoors. In our schema, we store extra functional ciphers as indexes and rewrite queries as trapdoors. The security guarantee means that no additional information is leaked other than the functional results of the fuzzy query.


4 Proposed Fuzzy Searching Encryption

4.1 Two Types of Functional Auxiliary Columns

  The multiple-attributions-splitting design in cloud database synthesizes various encryptions to preserve query semantics. As shown in Table 2, two types of auxiliary columns (c-LSH and c-BF) are appended to the cloud database along with a symmetric deterministic column (DET).

Table 2. Storage pattern of multiple functional columns in database

c_det                       c_lsh (m = 4, wid = 2)         c_bf(1)      ...   c_bf(l/32)
0x1234 (“I love apple”)     19030024, 01000409, 00020412   1077036627   ...   1957741388
0x3456 (“lave banana”)      01000409, 00020303             1079642851   ...   625017556
0x5678 (“I love coconut”)   19030024, 01000409, 06000700   1626500087   ...   1687169793

  This schema aims at handling queries with wild-cards on cipher-texts. To that end, several appended columns store different functional ciphers produced by various encryptions: deterministic encryption (DET) of the data for equality, locality sensitive hashing (LSH) of word fragments for similar searching, and bloom-filter (BF) over lines for maximum substring matching.

  A. c-LSH. The c-LSH column, which stores the locality sensitive hashing values of each sentence, represents a message digest after dimensionality reduction. It helps map similar items together with a probability that equals the Jaccard similarity between their inverted position arrays (IPA for short).

  [Figure 2: the word "lave" is split into the bi-grams #l, la, av, ve, e# and turned into a sparse vector with inverted position array IPA(lave) = {1,2,4,7,10,11,12,14,17,20}. MinHash under four permutations p1 to p4 yields the signatures h1 to h4, which are linked into one token per word. The permutations, in the order they appear in the extracted figure, are {16,13,18,9,1,5,2,4,10,12,14,8,3,6,15,7,17,19,20,11}, {19,1,3,6,12,9,2,4,7,10,11,13,20,15,14,16,5,17,18,8}, {13,5,15,3,18,6,9,2,4,16,7,10,12,14,19,17,11,8,1,20} and {8,3,1,6,9,5,2,4,7,12,13,14,15,10,11,16,18,19,17,20}. The encrypted data in the database are: DET Enc(I love apple too) with c-LSH 1262, 3592, 5114, 4225; DET Enc(I lave apple) with c-LSH 1262, 3582, 5114.]

Fig. 2. A sample with the bi-gram method (counting uni-gram as well) to show the transforming process: (1) split the sentences in a line into multiple words; (2) transform each word into fragments with n-gram and build the inverted position array; (3) execute dimension reduction with LSH and get m features; (4) link the features into a token for each word; (5) combine the tokens of a line with commas.

  During the transforming process, n-gram methods (such as bi-gram and counting uni-gram) are used to divide texts into fragments and finally into sparse vectors (IPAs). As shown in Fig. 2, the transforming process maps every row to separate signature collections step by step. This process changes the measurement from Levenshtein distance on texts to Jaccard similarity on IPAs, so that the minhash algorithm can reduce the dimensions of the numeric features for each subject (word). Finally, each word is converted to a linked sequence as a token, and c-LSH stores the token set, separated by commas, to represent the data of the whole line.
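  Steps (3) to (5) of the transforming process can be sketched with the data of the Fig. 2 sample. This is an illustrative reading of the figure, not the authors' code: the function names are hypothetical, the signature is assumed to be the 1-based rank of the first permuted position that occurs in the IPA, and the order of the four permutations is an assumption chosen so that the result matches the token 3582 stored for "lave" in the figure.

```python
def minhash_signature(ipa, permutation):
    """1-based rank, in permutation order, of the first position contained in the IPA."""
    for rank, position in enumerate(permutation, start=1):
        if position in ipa:
            return rank
    raise ValueError("the IPA shares no position with the permutation")


def lsh_token(ipa, permutations, width=1):
    """Link the m signatures of one word into a token, each zero-padded to `width` digits."""
    return "".join(str(minhash_signature(ipa, p)).zfill(width) for p in permutations)


def line_tokens(ipas, permutations, width=1):
    """Combine the tokens of all words in a line with commas."""
    return ", ".join(lsh_token(ipa, permutations, width) for ipa in ipas)


# Inverted position array of the word "lave", as given in Fig. 2.
IPA_LAVE = {1, 2, 4, 7, 10, 11, 12, 14, 17, 20}

# The four permutations of Fig. 2; their assignment to h1..h4 is an assumption.
PERMUTATIONS = [
    [8, 3, 1, 6, 9, 5, 2, 4, 7, 12, 13, 14, 15, 10, 11, 16, 18, 19, 17, 20],
    [16, 13, 18, 9, 1, 5, 2, 4, 10, 12, 14, 8, 3, 6, 15, 7, 17, 19, 20, 11],
    [13, 5, 15, 3, 18, 6, 9, 2, 4, 16, 7, 10, 12, 14, 19, 17, 11, 8, 1, 20],
    [19, 1, 3, 6, 12, 9, 2, 4, 7, 10, 11, 13, 20, 15, 14, 16, 5, 17, 18, 8],
]

token = lsh_token(IPA_LAVE, PERMUTATIONS)  # "3582"
```

  With width set to 2, the same word yields 03050802, which has the shape of the eight-digit tokens in a c-LSH column configured with m = 4 and wid = 2.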

  B. c-BF.