Cognitive Radio Oriented Wireless Networks

  Dominique Noguet, Klaus Moessner, Jacques Palicot (Eds.)

  Cognitive Radio Oriented Wireless Networks 11th International Conference, CROWNCOM 2016 Grenoble, France, May 30 – June 1, 2016 Proceedings

  Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 172

Editorial Board

  Ozgur Akan

  Middle East Technical University, Ankara, Turkey

  Paolo Bellavista

  University of Bologna, Bologna, Italy

  Jiannong Cao

  Hong Kong Polytechnic University, Hong Kong, Hong Kong

  Falko Dressler

  University of Erlangen, Erlangen, Germany

  Domenico Ferrari

  Università Cattolica Piacenza, Piacenza, Italy

  Mario Gerla

  UCLA, Los Angeles, USA

  Hisashi Kobayashi

  Princeton University, Princeton, USA

  Sergio Palazzo

  University of Catania, Catania, Italy

  Sartaj Sahni

  University of Florida, Florida, USA

  Xuemin (Sherman) Shen

  University of Waterloo, Waterloo, Canada

  Mircea Stan

  University of Virginia, Charlottesville, USA

  Jia Xiaohua

  City University of Hong Kong, Kowloon, Hong Kong

  Albert Zomaya

  University of Sydney, Sydney, Australia

  Geoffrey Coulson

  Lancaster University, Lancaster, UK

  More information about this series at


  Editors

  Dominique Noguet, CEA-LETI, Grenoble, France
  Klaus Moessner, University of Surrey, Guildford, UK
  Jacques Palicot, CentraleSupélec – Campus de Rennes, Institut d’Electronique et de Télécommunications de Rennes, UMR CNRS 6164, Cesson-Sévigné, France

ISSN 1867-8211
ISSN 1867-822X (electronic)
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
ISBN 978-3-319-40351-9
ISBN 978-3-319-40352-6 (eBook)
DOI 10.1007/978-3-319-40352-6
Library of Congress Control Number: 2016940880

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

  Printed on acid-free paper

  This Springer imprint is published by Springer Nature

  

CROWNCOM 2016

Preface

  The 11th EAI International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM 2016) was hosted by CEA-LETI and was held in Grenoble, France, the capital of the Alps. 2016 was the 30th anniversary of wireless activities at CEA-LETI, but Grenoble has a more ancient prestigious scientific history, still very fresh in the signal processing area as Joseph Fourier worked there on his fundamental research, and established the Imperial Faculty of Grenoble in 1810, now the Joseph Fourier University.

  This year, the main themes of the conference centered on the application of cognitive radio to 5G and to the Internet of Things (IoT). According to the current trend, the requirements of 5G and the IoT will increase demands on the wireless spectrum, further accelerating the spectrum scarcity problem. Both academic and regulatory bodies have focused on dynamic spectrum access and/or dynamic spectrum usage to optimize scarce spectrum resources. Cognitive radio, with the capability to flexibly adapt its parameters, has been proposed as the enabling technology for unlicensed secondary users to dynamically access the licensed spectrum owned by legacy primary users on a negotiated or an opportunistic basis. It is now perceived in a much broader paradigm that will contribute toward solving the resource allocation problem that 5G requirements raise.

  The program of the conference was structured to address these issues from the perspectives of industry, regulation bodies, and academia. In this transition period, where several visions of 5G and spectrum usage coexist, the CROWNCOM Committee decided to have a very strong presence of keynote speeches from key stakeholders. We had the pleasure of welcoming keynotes from industry 5G leaders such as Huawei and Qualcomm, the IoT network operator Sigfox, and the European Commission with perspectives on policy and regulation. We were also honored to welcome prestigious academic views from the Carnegie Mellon University and Zhejiang University. Along with the keynote speeches, we had the opportunity to debate the impact of massive IoT deployments in future spectrum use with a panel gathering high-profile experts in the field, thanks to the help of the conference panel chair.

  The committee also wanted to emphasize how cognitive radio techniques could be applied in different areas of wireless communication in a pragmatic way, in a context where cognitive radio is often perceived as a theoretical approach. With this in mind, a significant number of demonstrations and exhibits were presented at CROWNCOM. Seven demonstrations and two exhibition booths were present during the conference, showcasing the maturity level of the technology. In this regard, we would like to express our gratitude to the conference demonstration and exhibit chair for his outstanding role in the organization of the demonstration area in conjunction with the conference local chairs. In addition, we decided to integrate a series of workshops that can be seen as special sessions where specific aspects of cognitive radio regarding applications, regulatory frameworks, and research were discussed. Four such workshops were organized as part of the conference, embracing topics such as modern spectrum management, 5G technology enablers, software-defined networks and virtualization, and cloud technologies. This year, delegates could also attend three tutorials organized as part of the conference. We are thankful to the tutorial chair for organizing these tutorials.

  Of course, the core of the conference was composed of regular paper presentations covering the seven tracks of the conference topics. We received a fair number of submissions, out of which 62 high-quality papers were selected. The paper selection was the result of a rigorous and high-standard review process involving more than 150 Technical Program Committee (TPC) members, with a rejection rate of 25%. We are grateful to all TPC members and the 14 track co-chairs for providing high-quality reviews. This year, the committee decided to reward the best papers of the conference in two ways. First, the highest-ranked presented papers were invited to submit an extended version of their work to a special issue of the EURASIP Journal on Wireless Communications and Networking (Springer) on “Dynamic Spectrum Access and Cognitive Techniques for 5G.” This special issue is planned for publication at the end of 2016. Second, a smaller number of papers with the highest review scores competed for the conference “Best Paper Award,” sponsored by Orange. The committee would like to thank the authors for submitting high-quality papers to the conference, thereby maintaining CROWNCOM as a key conference in the field.

  Of course the organization of this event would have been impossible without the constant guidance and support of the European Alliance for Innovation (EAI). We would like to warmly thank the organization team at EAI and in particular Anna Horvathova, the conference manager of CROWNCOM 2016. We would also like to express our thanks to the team at CEA: our conference secretary and the conference webchair for their steady support and our local chairs for setting up all the logistics.

  Finally, the committee would like to express its gratitude to the conference sponsors. In 2016, CROWNCOM was very pleased by the support of a large number of prestigious sponsors, stressing the importance of the conference for the wireless community. Namely, we express our warmest thanks to Keysight Technologies, Huawei, Nokia, National Instruments, Qualcomm, and the European ForeMont project.

  We hope you will enjoy the proceedings of CROWNCOM 2016.

  April 2016

  Dominique Noguet
  Klaus Moessner
  Jacques Palicot

  

Organization

Steering Committee

  Imrich Chlamtac, Create-Net, EAI, Italy
  Thomas Hou, Virginia Tech, USA
  Abdur Rahim Biswas, Create-Net, Italy
  Tao Chen, VTT – Technical Research Centre of Finland, Finland
  Tinku Rasheed, CREATE-NET, Italy
  Athanasios Vasilakos, Kuwait University, Kuwait

  Organizing Committee

  General Chair: Dominique Noguet, CEA-LETI, France
  Conference Secretary: Sandrine Bertola, CEA-LETI, France
  Technical Program Chair: Klaus Moessner, University of Surrey, UK
  Publicity and Social Media Chairs: Emilio Calvanese-Strinati, CEA-LETI, France; Masayuki Ariyoshi, NEC, Japan; Apurva Mody, BAE systems, USA
  Publication Chair: Jacques Palicot, Centrale Supélec, France
  Keynote and Panel Chair: Paulo Marques, Instituto de Telecomunicacoes, Portugal
  Workshop Chair: Jordi Pérez-Romero, UPC, Spain
  Tutorial Chair: Sofie Polin, KU Leuven, Belgium
  Exhibition and Demonstration Chair: Benoit Miscopein, CEA-LETI, France
  Local Chairs: Pascal Conche, CEA, France; Audrey Scaringella, CEA, France
  Web Chair: Yoann Roth, CEA-LETI, France

  Technical Program Committee Chairs

  Track 1: Dynamic Spectrum Access/Management and Database Co-chairs
    Oliver Holland, King’s College, London, UK
    Kaushik Chowdhury, Northeastern University, USA
  Track 2: Networking Protocols for Cognitive Radio Co-chairs
    Luca De Nardis, Sapienza University of Rome, Italy
    Christophe Le Martret, Thales Communications & Security, France
  Track 3: PHY and Sensing Co-chairs
    Friedrich Jondral, Karlsruhe Institute of Technology, Germany
    Stanislav Filin, NICT, Japan
  Track 4: Modelling and Theory Co-chairs
    Panagiotis Demestichas, University of Piraeus, Greece
    Danijela Cabric, UCLA, USA
  Track 5: Hardware Architecture and Implementation Co-chairs
    Seungwon Choi, Hanyang University, Korea
    Olivier Sentieys, Inria, France
  Track 6: Next Generation of Cognitive Networks Co-chairs
  Track 7: Standards, Policies, and Business Models Co-chairs
    Martin Weiss, University of Pittsburgh, USA
    Baykas Tuncer, Istanbul Medipol University, Turkey

  Technical Program Committee Members

  Hamed Ahmadi, University College Dublin, Ireland
  Adnan Aijaz, Toshiba Research European Labs, UK
  Ozgur Barış Akan, Koc University, Turkey
  Anwer Al-Dulaimi, University of Toronto, Canada
  Mohammed Altamimi, Communications and IT Commission, Saudi Arabia
  Onur Altintas, Toyota, Japan
  Osama Amin, KAUST, Saudi Arabia
  Xueli An, European Research Centre, Huawei Technologies, Germany
  Peter Anker, Netherlands Ministry of Economic Affairs, The Netherlands
  Angelos Antonopoulos, Telecommunications Technological Centre of Catalonia (CTTC), Spain
  Ludovic Apvrille, Telecom-ParisTech, France
  Masayuki Ariyoshi, NEC, Japan
  Stefan Aust, NEC, Japan
  Faouzi Bader, Centrale Supélec Rennes, France
  Arturo Basaure, Aalto University, Finland
  Bernd Bochow, Fraunhofer Focus, Germany
  Tadilo Bogale, Institut National de recherche Scientifique, Canada
  Doug Brake, Information Technology and Innovation Foundation, USA
  Omer Bulakci, Huawei, Germany
  Carlos E. Caicedo, Syracuse University, USA
  Emilio Calvanese-Strinati, CEA-LETI, France
  Chan-Byoung Chae, Yonsei University, Korea
  Ranveer Chandra, Microsoft, USA
  Periklis Chatzimisios, Alexander TEI of Thessaloniki, Greece
  Pravir Chawdhry, Joint Research Center EC, Italy
  Luiz Da Silva, Virginia Tech, USA
  Antonio De Domenico, CEA-LETI, France
  Luca De Nardis, Sapienza University of Rome, Italy
  Thierry Defaix, DGA-MI, France
  Simon Delaere, iMinds, Belgium
  Panagiotis Demestichas, University of Piraeus, Greece
  Marco Di Felice, University of Bologna, Italy
  Rogério Dionisio, Instituto de Telecomunicações, Portugal

  Marc Emmelmann, Fraunhofer FOKUS, Germany
  Ozgur Ergul, Koc University, Turkey
  Serhat Erkucuk, Kadir Has University, Turkey
  Takeo Fujii, University of Electro-Communications, Japan
  Piotr Gajewski, Military University of Technology, Poland
  Yue Gao, Queen Mary University of London, UK
  Matthieu Gautier, IRISA, France
  Liljana Gavrilovska, Ss Cyril and Methodius University of Skopje, Republic of Macedonia
  Andrea Giorgetti, WiLAB, University of Bologna, Italy
  Michael Gundlach, Nokia, Germany
  Hiroshi Harada, Kyoto University, Japan
  Ali Hassa, NUST School of Electrical Engineering and Computer Science (SEECS), Pakistan
  Stefano Iellamo, Crete (at ICS-FORTH), Greece
  Florian Kaltenberger, Eurecom, France
  Du Ho Kang, Ericsson, Sweden
  Pawel Kaniewski, Military Communication Institute, Poland
  Mika Kasslin, Nokia, Finland
  Shuzo Kato, Tohoku University, Japan
  Seong-Lyun Kim, Yonsei University, Korea
  Adrian Kliks, Poznan University of Technology, Poland
  Heikki Kokkinen, Fairspectrum, Finland
  Kimon Kontovasilis, NCSR Demokritos, Greece
  Pawel Kryszkiewicz, Poznan University of Technology, Poland
  Samson Lasaulce, L2S-CNRS, France
  Vincent Le Nir, Royal Military Academia, Belgium
  Didier Le Ruyet, CNAM, France
  Per H. Lehne, Telenor, Norway
  William Lehr, MIT, USA
  Janne Lehtomäki, University of Oulu, Finland
  Zexian Li, Nokia Networks, Finland
  Dan Lubar, RelayServices, USA
  Irene MacAluso, Trinity College Dublin, Ireland
  Allen Brantley MacKenzie, Virginia Tech, USA
  Milind Madhav Buddhikot, Alcatel Lucent Bell Labs, USA
  Petri Mähönen, RWTH Aachen University, Germany
  Vuk Marojevic, Virginia Tech, USA
  Paulo Marques, Instituto de Telecomunicacoes, Portugal
  Torleiv Maseng, FFI, Norway
  Daniel Massicotte, UQTR, Canada
  Raphaël Massin, Thales Communications & Security, France
  Marja Matinmikko, VTT, Finland
  Arturas Medeisis, ITU Representative, Saudi Arabia
  Albena Mihovska, Aalborg University, Denmark

  Markus Mueck, Intel Deutschland, Germany
  Dominique Noguet, CEA-LETI, France
  Keith Nolan, Intel, Ireland
  Jacques Palicot, Centrale Supélec, France
  Dorin Panaitopol, Thales Communications & Security, France
  Stephane Paquelet, BCOM, France
  Kathick Parashar, IMEC, Belgium
  Przemyslaw Pawelczak, Tu Delft, The Netherlands
  Milica Pejanovic-Djurisic, University of Montenegro, Montenegro
  Jordi Pérez-Romero, UPC, Spain
  Marina Petrova, RWTH Aachen University, Germany
  Sofie Polin, KU Leuven, Belgium
  Zhijin Qin, Queen Mary University of London, UK
  Igor Radusinovic, University of Montenegro, Montenegro
  Mubashir Rehmani, COMSATS Institute of Information Technology, Pakistan
  Alex Reznik, InterDigital, USA
  Tanguy Risset, INSA Lyon, France
  Olivier Ritz, iMinds, Belgium
  Henning Sanneck, Nokia Networks, Germany
  Naotaka Sato, Sony, Japan
  Berna Sayrac, Orange Lab, France
  Shahriar Shahabuddin, University of Oulu, Finland
  Hongjian Sun, Durham University, UK
  Wuchen Tang, TAMU, Qatar
  Terje Tjelta, Telenor, Norway
  Dionysia Triantafyllopoulou, University of Surrey, UK
  Theodoros Tsiftsis, Industrial Systems Institute, Greece
  Fernando Velez, Instituto de Telecomunicações-DEM, Portugal
  Christos Verikoukis, CTTC, Spain
  Guillaume Villemaud, INSA Lyon, France
  Anna Vizziello, University of Pavia, Italy
  Ji Wang, Virginia Tech, USA
  Honggang Wang, UMass Dartmouth, USA
  Gerhard Wunder, Fraunhofer HHI, Germany
  Alex Wyglinski, Worcester Polytechnic Institute, USA
  Seppo Yrjöla, Nokia, Finland
  Jens Zander, KTH, Sweden
  Yonghong Zeng, I2R A-STAR, Singapore
  Hans-Jurgen Zepernick, Blekinge Institute of Technology, Sweden


  

Contents

  Dynamic Spectrum Access/Management and Database

  Markku Jokinen, Marko Mäkeläinen, Tuomo Hänninen, Marja Matinmikko, and Miia Mustonen

  Sumit J. Darak, Amor Nafkha, Christophe Moy, and Jacques Palicot

  Andrey Garnaev, Shweta Sagari, and Wade Trappe

  Pawan Dhakal, Shree K. Sharma, Symeon Chatzinotas, Björn Ottersten, and Daniel Riviello

  Lifeng Hao, Sixing Yin, and Zhaowei Qu

  Feten Slimeni, Bart Scheers, Vincent Le Nir, Zied Chtourou, and Rabah Attia

  Yiteng Wang, Youping Zhao, Xin Guo, and Chen Sun

  Navikkumar Modi, Christophe Moy, Philippe Mary, and Jacques Palicot

  Suzan Bayhan, Gopika Premsankar, Mario Di Francesco, and Jussi Kangasharju

  

  Charles Jumaa Katila, Melchiorre Danilo Abrignani, and Roberto Verdone

  

  Yongjae Kim, Yonghoon Choi, and Youngnam Han

  

  Anirudh Agarwal, Shivangi Dubey, Ranjan Gangopadhyay, and Soumitra Debnath

  

  Fatima Zohra Kaddour, Dimitri Kténas, and Benoît Denis

  

  Adrian Kliks and Paweł Kryszkiewicz

  Networking Protocols for Cognitive Radio

  Van-Thiep Nguyen, Matthieu Gautier, and Olivier Berder

  

  M. Ranjeeth and S. Anuradha

  

  Marcela M. Gomez and Martin B.H. Weiss

  

  Maryam Riaz, Seiamak Vahid, and Klaus Moessner

  PHY and Sensing

  Yoann Roth, Jean-Baptiste Doré, Laurent Ros, and Vincent Berg

  

  Shaojie Liu, Zhiyong Feng, Yifan Zhang, Sai Huang, and Dazhi Bao


  

  Hussein Kobeissi, Youssef Nasser, Amor Nafkha, Oussama Bazzi, and Yves Louët

  

  Kalyana Gopala and Dirk Slock

  

  Jean-Baptiste Doré and Vincent Berg

  

  Dominique Noguet and Jean-Baptiste Doré

  D.K. Patel and Y.N. Trivedi

  

  Natalia Y. Ermolova and Olav Tirkkonen

  

  Hussein Kobeissi, Amor Nafkha, Youssef Nasser, Oussama Bazzi, and Yves Louët

  

  Abbass Nasser, Ali Mansour, Koffi-Clement Yao, Hussein Charara, and Mohamad Chaitou

  

  Deep Chandra Kandpal, Vaibhav Kumar, Ranjan Gangopadhyay, and Soumitra Debnath

  Modelling and Theory

  Alexandre Marquet, Cyrille Siclet, Damien Roque, and Pierre Siohan

  

  Juwendo Denis, Mylene Pischella, and Didier Le Ruyet

  

  

  Vincent Berg and Jean-Baptiste Doré

  Essia Ben Abdallah, Serge Bories, Dominique Nicolas, Alexandre Giry, and Christophe Delaveaud

  

  Alexandre Debard, Patrick Rosson, David Dassonville, and Vincent Berg

  

  Hanna Becker, Ankit Kaushik, Shree Krishna Sharma, Symeon Chatzinotas, and Friedrich Jondral

  

  Lin Li, Amanullah Ghazi, Jani Boutellier, Lauri Anttila, Mikko Valkama, and Shuvra S. Bhattacharyya

  

  Hardware Architecture and Implementation

  June Hwang, Jinho Choi, Riku Jäntti, and Seong-Lyun Kim

  Victor Quintero, Samir M. Perlaza, Iñaki Esnaola, and Jean-Marie Gorce

  

  Eva Perez, Karl-Josef Friederichs, Andreas Lobinger, Bernhard Wegmann, and Ingo Viering

  

  Alexandros Palaios, Vanya M. Miteva, Janne Riihijärvi, and Petri Mähönen

  

  Vincent Savaux, Apostolos Kountouris, Yves Louët, and Christophe Moy

  


  

  Mai-Thanh Tran, Matthieu Gautier, and Emmanuel Casseau

  

  Robin Gerzaguet, Laurent Ros, Fabrice Belvéze, and Jean-Marc Brossier

  

  Marko Höyhtyä, Juha Korpi, and Mikko Hiivala

  Next Generation of Cognitive Networks

  Jessica Oueis and Emilio Calvanese Strinati

  

  Zaheer Khan and Janne Lehtomäki

  

  Prasanth Karunakaran and Wolfgang Gerstacker

  

  Rémi Bonnefoi, Christophe Moy, and Jacques Palicot

  

  Francesco Guidi, Anna Guerra, Antonio Clemente, Davide Dardari, and Raffaele DErrico

  

  Mouhcine Mendil, Antonio De Domenico, Vincent Heiries, Raphaël Caire, and Nouredine Hadj-said

  

  Hangqi Li, Xiaohui Zhao, and Yongjun Xu

  

  Dazhi Bao, Hao Zhou, Hao Chen, Shaojie Liu, Yifan Zhang, and Zhiyong Feng

  Standards, Policies and Business Models

  Seppo Yrjölä, Petri Ahokangas, and Pekka Talmola

  

  Petri Ahokangas, Kari Horneman, Marja Matinmikko, Seppo Yrjölä, Harri Posti, and Hanna Okkonen

  

  Michal Szydelko and Marcin Dryjanski

  Workshop Papers

  Cyril Jouanlanne, Christophe Delaveaud, Yolanda Fernández, and Adrián Sánchez

  

  Slavica Tomovic and Igor Radusinovic

  

  Bessem Sayadi, Marco Gramaglia, Vasilis Friderikos, Dirk von Hugo, Paul Arnold, Marie-Line Alberi-Morel, Miguel A. Puente, Vincenzo Sciancalepore, Ignacio Digon, and Marcos Rates Crippa

  

  Niccolò Iardella, Giovanni Stea, Antonio Virdis, Dario Sabella, and Antonio Frangioni

  Author Index

  Dynamic Spectrum Access/Management and Database

  

A New Evaluation Criteria for Learning Capability in OSA Context

  Navikkumar Modi^1(B), Christophe Moy^1, Philippe Mary^2, and Jacques Palicot^1

1 CentraleSupelec/IETR, Avenue de la Boulaie, 35576 Cesson Sevigne, France
{navikkumar.modi,christophe.moy,jacques.palicot}@centralesupelec.fr
2 INSA de Rennes, IETR, UMR CNRS 6164, 35043 Rennes, France
philippe.mary@insa-rennes.fr

  

Abstract. The activity pattern of different primary users (PUs) in the spectrum bands has a severe effect on the ability of multi-armed bandit (MAB) policies to exploit spectrum opportunities. In order to apply the MAB paradigm to opportunistic spectrum access (OSA), we must first find out whether the target channel set contains sufficient structure, over an appropriate time scale, to be identified by MAB policies. In this paper, we propose a criterion for analyzing the suitability of MAB learning policies for the OSA scenario. This new criterion, referred to as the Optimal Arm Identification (OI) factor, evaluates the structure of random samples measured over time. The OI factor reflects the difficulty associated with the identification of the optimal channel for opportunistic access. We found in particular that the ability of a secondary user to learn the activity of the PUs spectrum is highly correlated with the OI factor but not really with the well-known LZ complexity measure. Moreover, in the case of a very high OI factor, MAB policies achieve very little improvement compared to the random channel selection (RCS) approach.

  

Keywords: Cognitive radio · Opportunistic spectrum access · Reinforcement learning · Multi-armed bandit · Lempel-Ziv (LZ) complexity · Optimal Arm Identification (OI) factor

1 Introduction

  Spectrum learning and decision making are core functions of cognitive radio (CR), giving access to underutilized spectrum when it is not occupied by licensed, or primary, users (PUs). In particular, we deal with the multi-armed bandit (MAB) paradigm, in which an unlicensed, or secondary, user (SU) selects a channel to transmit on when no PU is using it and, in the long run, learns which channel is optimal, i.e. the channel with the highest probability of being vacant.

  The PU activity pattern, i.e. the presence or absence of a PU signal in the spectrum band, can be modeled as a 2-state Markov process. When an SU tries to learn the probability that a channel is vacant, the success of a MAB policy is affected by the amount of structure in the PUs activity pattern in these channels. Several works address opportunistic spectrum access (OSA) by learning and predicting opportunities, which requires the CR to discover the PUs activity pattern. In this paper, we address the following fundamental questions affecting MAB performance: (i) When is it advantageous to apply the MAB learning framework to the problem of opportunistic access for an SU? (ii) Is the use of MAB policies for OSA justified over a simple non-intelligent approach?

  In almost all cases, the performance of MAB learning policies has been studied with respect to the PUs activity level, i.e. the probability that PUs occupy radio channels. In some cases, spectrum utilization is modeled as an independent and identically distributed (i.i.d.) process, which does not take into account the likely sequential activity patterns of PUs in the spectrum. To address the first question raised above, the Lempel-Ziv (LZ) complexity has previously been introduced to characterize the PU activity pattern for the general reinforcement learning (RL) problem. However, the MAB paradigm is a special kind of RL game in which the SU maximizes its long-term reward by taking actions to learn about the optimal channel, as opposed to the general RL framework in which the SU interacts with a system by taking actions and learns about the underlying structure of the system. Therefore, in this paper, we propose the Optimal Arm Identification (OI) factor to quantify the difficulty of predicting the optimal channel, i.e. the one with the highest probability of being vacant, from the set of channels. Finally, the second question raised above is answered by comparing the performance of MAB policies against the random channel selection (RCS) approach (a non-intelligent approach).

  We found that, for several spectrum utilization patterns, MAB policies can be beneficial compared to a non-intelligent approach, but the percentage of improvement is highly correlated with the level of the OI factor and only slightly affected by the level of LZ complexity. This result underlines that the performance of MAB policies in the OSA framework depends strongly on the OI factor associated with the selected channel set. The work presented in this paper can help answer the question, raised in several papers, about the effectiveness of the MAB framework for OSA scenarios. The remainder of this paper is organized as follows. In Sect. 2, we introduce the MAB framework and the RL policies used to verify the effect of the PUs activity pattern on spectrum learning performance. Section 3 relates the PUs activity pattern to the difficulty of prediction. Numerical results giving the efficiency of MAB policies w.r.t. the output of the LZ and OI criteria are then presented, and the final section concludes the paper.

  2 System Model and RL Policies

  We consider a secondary transceiver pair (Tx-Rx) and a set of channels $\mathcal{K} = \{1, \cdots, K\}$. The SU can access one of the $K$ channels if it is not occupied by PUs. The $i$-th channel is modeled by an irreducible and aperiodic discrete-time Markov chain with finite state space $S^i$. $P^i = \{p^i_{kl}\}$, $k, l \in \{0, 1\}$, denotes the state transition probability matrix of the $i$-th channel, where 0 and 1 are the Markov states, i.e. occupied and free respectively. Let $\pi^i$ be the stationary distribution of the Markov chain, defined as:

  $$\pi^i = [\pi^i_0, \pi^i_1] = \left[ \frac{p^i_{10}}{p^i_{10} + p^i_{01}}, \frac{p^i_{01}}{p^i_{10} + p^i_{01}} \right]. \qquad (1)$$

  $S^i(t)$ is the state of channel $i$ at time $t$ and $r^i(t) \in \mathbb{R}$ is the reward associated with band $i$. Without loss of generality, we can assume that $r^i(t) = S^i(t)$, i.e. $S^i(t) = 1$ if sensed free and $S^i(t) = 0$ if sensed occupied. The stationary mean reward $\mu^i$ of the $i$-th channel under stationary distribution $\pi^i$ is given by $\mu^i = \pi^i_1$. A channel is said to be optimal when it has the highest mean reward $\mu^{i^*}$, such that $\mu^{i^*} > \mu^i$ for all $i \neq i^*$, $i \in \{1, \cdots, K\}$, i.e. it is the channel with the highest probability of being vacant. The mean reward optimality gap is defined as $\Delta^i = \mu^{i^*} - \mu^i$. The regret $R(t)$ of a MAB policy up to time $t$ is defined as the reward loss due to selecting sub-optimal channels:

  $$R(t) = t \mu^{i^*} - \sum_{m=0}^{t} r(m). \qquad (2)$$
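  The channel model of Eq. (1) and the regret of Eq. (2) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the argument order (`p01` is the occupied-to-free transition probability, `p10` the free-to-occupied one) and the simulation loop are hypothetical and not taken from the paper, and the regret helper counts one reward per slot rather than reproducing the exact summation limits of Eq. (2).

```python
import random


def stationary_distribution(p01, p10):
    """Return [pi_0, pi_1] of a 2-state chain, following Eq. (1).

    State 0 = occupied, state 1 = free. Requires p01 + p10 > 0.
    """
    total = p01 + p10
    return [p10 / total, p01 / total]


def simulate_channel(p01, p10, n, rng, state=0):
    """Generate n successive states of the 2-state Markov chain."""
    states = []
    for _ in range(n):
        states.append(state)
        # Probability of leaving the current state in this slot.
        leave = p01 if state == 0 else p10
        if rng.random() < leave:
            state = 1 - state
    return states


def regret(mu_star, rewards):
    """Reward lost w.r.t. always using the optimal channel (cf. Eq. (2))."""
    return len(rewards) * mu_star - sum(rewards)


if __name__ == "__main__":
    rng = random.Random(0)
    pi = stationary_distribution(0.2, 0.3)
    trace = simulate_channel(0.2, 0.3, 50_000, rng)
    # The empirical vacancy rate should be close to pi_1 (the mean reward).
    print(pi, sum(trace) / len(trace))
```

  Since $r^i(t) = S^i(t)$, the empirical mean of a simulated trace approximates the mean reward $\mu^i = \pi^i_1$ of that channel.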

  2.1 Reinforcement Learning (RL) Approaches

  We consider two different reinforcement learning (RL) strategies, i.e. UCB1 and Thompson Sampling (TS), in order to evaluate the learning efficiency of MAB policies on channel sets containing different PUs activity patterns. These policies have been proposed as an approach to solving the MAB problem, and they attempt to identify the most vacant channel in order to maximize their long-term reward. Figure 1 illustrates a realization of the random process ‘occupancy of spectrum bands by PUs’. In this figure, not all channels have the same occupancy ratio, and it seems intuitively clear that the more the channel occupancies differ, the easier the learning.

  Upper Confidence Bound (UCB) Policy. It has been shown previously that UCB1 allows spectrum learning and decision making in the OSA context in order to maximize transmission opportunities. UCB1 is an RL-based policy that learns about the optimal channel from previously observed rewards, starting from scratch, i.e. without any a priori knowledge of the activity within the set of channels. At each time $t$, the UCB1 policy updates indices named $B_{t,i,T^i(t)}$, where $T^i(t)$ is the number of times the $i$-th channel has been sensed up to time $t$, and returns the channel index $a_t = i$ with the maximum UCB1 index. UCB1 is detailed in Algorithm 1, where $\alpha$ is the exploration-exploitation coefficient. If $\alpha$ increases, the bias $A_{t,i,T^i(t)}$ dominates and the UCB1 policy explores new channels. Otherwise, if $\alpha$ decreases, the index computation is governed by $\bar{X}_{i,T^i(t)}$ and the policy tends to exploit the previously observed optimal channel.

  Fig. 1. PUs activity pattern

  Algorithm 1. UCB1 policy

  Input: $K$, $\alpha$
  Output: $a_t$
  for $t = 1$ to $n$ do
    if $t \le K$ then
      $a_t = t + 1$
    else
      $T^i(t) = \sum_{m=0}^{t-1} \mathbb{1}_{a_m = i}, \ \forall i$
      $\bar{X}_{i,T^i(t)} = \frac{\sum_{m=0}^{t-1} S^i(m) \mathbb{1}_{a_m = i}}{T^i(t)}$, $A_{t,i,T^i(t)} = \sqrt{\frac{\alpha \ln t}{T^i(t)}}, \ \forall i$
      $B_{t,i,T^i(t)} = \bar{X}_{i,T^i(t)} + A_{t,i,T^i(t)}, \ \forall i$
      $a_t = \arg\max_i \left( B_{t,i,T^i(t)} \right)$
    end if
  end for
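  The UCB1 steps of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under assumptions, not the authors' implementation: the function name `ucb1`, the input format (one pre-generated 0/1 state sequence per channel) and the default `alpha` are hypothetical. Channels and slots are 0-based here, so the initialization phase senses channel `t` in slot `t`, and `ln(t + 1)` is used in the bias term to mirror a 1-based slot counter.

```python
import math


def ucb1(channels, n, alpha=2.0):
    """Run a UCB1-style channel selection for n slots.

    channels: list of K state sequences (1 = free, 0 = occupied), each of
              length >= n (hypothetical input format).
    Returns the list of selected channel indices (0-based).
    """
    K = len(channels)
    counts = [0] * K   # T^i(t): times channel i has been sensed so far
    free = [0] * K     # sum of rewards S^i(m) observed on channel i
    choices = []
    for t in range(n):
        if t < K:
            a = t  # initialization: sense every channel once
        else:
            # B = empirical mean (X bar) + exploration bias (A)
            index = [
                free[i] / counts[i]
                + math.sqrt(alpha * math.log(t + 1) / counts[i])
                for i in range(K)
            ]
            a = max(range(K), key=index.__getitem__)
        counts[a] += 1
        free[a] += channels[a][t]
        choices.append(a)
    return choices


if __name__ == "__main__":
    n = 1000
    always_free, always_busy = [1] * n, [0] * n
    picks = ucb1([always_free, always_busy], n)
    print(picks.count(0), picks.count(1))
```

  With one channel always free and one always occupied, the policy should spend almost all slots on the free channel while still revisiting the occupied one occasionally because of the bias term.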

  Thompson Sampling (TS) Policy. Detailed in Algorithm 2, TS selects the channel having the highest index $J_{t,i,T^i(t)}$, computed with the $\beta$ function w.r.t. two arguments, i.e. $G_{i,T^i(t)} = \sum_{m=0}^{t-1} S^i(m) \mathbb{1}_{a_m = i}$ and $F_{i,T^i(t)} = T^i(t) - G_{i,T^i(t)}$, where $T^i(t)$ has the same meaning as previously. The former argument is the total number of free states observed up to time $t$ for channel $i$ and the latter is the total number of occupied states. At the start, no prior knowledge of the mean reward of each channel is assumed, i.e. the distribution is uniform, and hence the index for all channels is set to $\beta(1, 1)$. The TS policy updates the distribution on the mean reward $\mu^i$ as $\beta\left(G_{i,T^i(t)} + 1, F_{i,T^i(t)} + 1\right)$.

  


  Algorithm 2. Thompson Sampling (TS) policy

  Input: $K$, $G_{i,1} = 0$, $F_{i,1} = 0$
  Output: $a_t$
  for $t = 1$ to $n$ do
    $J_{t,i,T^i(t)} = \beta\left(G_{i,T^i(t)} + 1, F_{i,T^i(t)} + 1\right)$
    Sense channel $a_t = \arg\max_i J_{t,i,T^i(t)}$
    Observe state $S^i(t)$
    $T^i(t) = \sum_{m=0}^{t-1} \mathbb{1}_{a_m = i}, \ \forall i$, $G_{i,T^i(t)} = \sum_{m=0}^{t-1} S^i(m) \mathbb{1}_{a_m = i}, \ \forall i$
    $F_{i,T^i(t)} = T^i(t) - G_{i,T^i(t)}, \ \forall i$
  end for
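  Algorithm 2 can be sketched in Python in the same way. The sketch assumes that the index $J$ is a random draw from the Beta distribution with parameters $(G + 1, F + 1)$, which is the usual Thompson Sampling step but is an interpretation of the "$\beta$ function" wording above. The function name, the input format and the optional `rng` argument are hypothetical.

```python
import random


def thompson_sampling(channels, n, rng=None):
    """Run a Thompson-Sampling-style channel selection for n slots.

    channels: list of K state sequences (1 = free, 0 = occupied), each of
              length >= n (hypothetical input format).
    Returns the list of selected channel indices (0-based).
    """
    rng = rng or random.Random()
    K = len(channels)
    free = [0] * K       # G: free observations per channel
    occupied = [0] * K   # F: occupied observations per channel
    choices = []
    for t in range(n):
        # Assumed step: draw one sample per channel from Beta(G + 1, F + 1);
        # with G = F = 0 this is the uniform prior Beta(1, 1).
        samples = [
            rng.betavariate(free[i] + 1, occupied[i] + 1) for i in range(K)
        ]
        a = max(range(K), key=samples.__getitem__)
        if channels[a][t] == 1:
            free[a] += 1
        else:
            occupied[a] += 1
        choices.append(a)
    return choices


if __name__ == "__main__":
    n = 1000
    picks = thompson_sampling([[1] * n, [0] * n], n, random.Random(1))
    print(picks.count(0), picks.count(1))
```

  Unlike UCB1, no explicit initialization phase is needed: the uniform prior already gives every channel a chance of being sensed in the first slots.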

3 PUs Activity Pattern vs Difficulties of Prediction

  In general, multi-armed bandit (MAB) algorithms are evaluated against the PUs traffic load, which characterizes the occupancy of the spectrum band. Intuitively, the higher the occupancy of a channel by PUs, the more difficult opportunistic access will be for the SU. However, the PUs traffic load alone is not sufficient for evaluating the efficiency of MAB policies. In fact, the performance of MAB policies depends on the structure of the PUs activity pattern and also on the difficulty of identifying the optimal channel, i.e. the channel with the optimal mean reward $\mu^i$. The ON/OFF PUs activity model approximates the spectrum usage pattern as depicted in Fig. 1. Moreover, if the separation between the mean rewards of the optimal channel and of a sub-optimal channel is large, the SU should be able to converge to the optimal channel faster, and thus achieve a higher number of opportunistic accesses. Therefore, estimating the amount of structure present in the PUs activity pattern is of essential interest for applying machine learning strategies to OSA.

  3.1 Lempel-Ziv (LZ) Complexity

  Lempel-Ziv (LZ) complexity was proposed in [ ] as a measure for characterizing the randomness of sequences. It has been widely adopted in several research areas such as biomedical signal analysis, data compression and pattern recognition. Lempel and Ziv, in [ ], associated with every sequence a complexity c, which is estimated by scanning the sequence and incrementing c every time a new substring of consecutive symbols is encountered. Then c is normalized by its asymptotic limit $n/\log_2(n)$, where n is the length of the sequence. LZ complexity is a property of individual sequences and it can be estimated regardless of any assumptions about the underlying process that generated the data. In [ ], the authors applied the LZ definition to the production rate of new patterns in Markovian processes. This is of particular interest when the PUs activity is modeled as a Markov process to evaluate the efficiency of MAB policies. For an ergodic source, LZ complexity equals the entropy rate of the source, which for a Markov chain S is given by [ ]:


  $$h(S) = -\sum_{k,l} \pi_k\, p_{k,l} \log p_{k,l}, \quad k, l \in \{0, 1\}, \qquad (3)$$

  where $\pi_k$ is the stationary probability of state k and $p_{k,l}$ is the transition probability from state k to state l. A system with LZ complexity equal to 1 has a very high production rate of new patterns, which makes it difficult for the learning policy to predict the next sequence. For example, in Fig. [ ], channels 1 to 4 have different PUs activity patterns, characterized by normalized LZ complexities of 0.05, 0.30, 0.60 and 0.66, respectively. Clearly, predicting the next vacancy is an easy task for channel 1, which has the lowest LZ complexity, whereas it becomes more and more difficult for channel 4, which has the highest LZ complexity.
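  Both quantities of this subsection can be sketched in Python. The parsing below assumes the exhaustive-history (1976) Lempel-Ziv scheme, in which a new phrase ends as soon as it can no longer be copied from the preceding text. The entropy rate assumes a base-2 logarithm and a two-state chain with state 0 = occupied and state 1 = vacant. All function names are hypothetical.

```python
import math


def lz_complexity(seq):
    """Count the phrases of an exhaustive Lempel-Ziv parsing (illustrative)."""
    s = "".join(str(x) for x in seq)
    n = len(s)
    i, c = 0, 0
    while i < n:
        length = 1
        # Extend the phrase while it still occurs in the text preceding its last symbol
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        c += 1  # a new phrase has been produced
        i += length
    return c


def normalized_lz(seq):
    """Normalize c by the asymptotic limit n / log2(n)."""
    n = len(seq)
    return lz_complexity(seq) / (n / math.log2(n))


def entropy_rate(p01, p10):
    """Entropy rate of a two-state Markov chain as in (3), base-2 log assumed."""
    pi = [p10 / (p01 + p10), p01 / (p01 + p10)]  # stationary distribution
    P = [[1.0 - p01, p01], [p10, 1.0 - p10]]     # transition matrix
    h = 0.0
    for k in range(2):
        for l in range(2):
            if P[k][l] > 0.0:  # skip zero entries, 0 * log 0 is taken as 0
                h -= pi[k] * P[k][l] * math.log2(P[k][l])
    return h
```

  For short sequences the normalized value can exceed 1, since n / log2(n) is only an asymptotic limit.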

  3.2 Optimal Arm Identification (OI) Factor

  As stated before, the performance of a MAB policy applied to the OSA context also depends on the separation between the mean reward distributions of the optimal and sub-optimal channels. Here, we define another criterion to characterize the difficulty for a MAB policy to learn the PUs spectrum occupancy. The MAB policy learns, based on past observations, which channel is optimal in terms of mean reward distribution in the long run. The optimal arm identification for the MAB framework has been studied since the 1950s under the name ‘ranking and identification problems’ [ ].

  In recent advances in MAB context, an important focus was set on a different perspective, in which each observation is considered as a reward: the user tries to maximize his cumulative reward. Equivalently, its goal is to minimize the expected regret R(t), as defined in

  , regret

  R(t), defined as the reward loss due to the selection of sub-optimal channels, up to time t is upper bounded uniformly by a logarithmic function:

  $$R(t) \le a \sum_{i:\,\mu^i < \mu^{i^*}} \frac{\ln t}{\mu^{i^*} - \mu^i} + b \sum_{i:\,\mu^i < \mu^{i^*}} \left(\mu^{i^*} - \mu^i\right), \qquad (4)$$

  where a and b are constants independent from channel parameters and time t.

  As stated in [ ], the upper bound on the regret of a MAB policy scales with the mean reward optimality gap $\Delta_i = (\mu^{i^*} - \mu^i)$. Intuitively, decreasing $\Delta_i$ makes the upper bound looser and thus increases the uncertainty on the performance of MAB policies. In this paper, we propose the OI factor $H_1$ as a measure of the difficulty associated with finding an optimal channel among several other channels:

  $$H_1 = 1 - \sum_{i=1}^{K} \frac{\mu^{i^*} - \mu^i}{K}, \qquad (5)$$

  where $\mu^i$ and $\mu^{i^*}$ are the mean rewards of the sub-optimal and optimal channels respectively, K is the number of channels and $i^*$ is the index of the optimal channel. $H_1$ measures how close the mean rewards of all sub-optimal channels are to the mean reward of the optimal channel. If $H_1$ is close to 1, then all channels have very closely distributed mean rewards, and it becomes almost impossible for the learning policy to identify the optimal channel from the set of channels.
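  As a small worked example, the OI factor of (5) can be computed directly from a list of mean rewards. The function name `oi_factor` and the sample values are hypothetical.

```python
def oi_factor(mu):
    """OI factor H_1 = 1 - (1/K) * sum_i (mu_best - mu_i), illustrative sketch."""
    K = len(mu)
    best = max(mu)  # mean reward of the optimal channel
    return 1.0 - sum(best - m for m in mu) / K


# Two hypothetical channel sets: well separated versus nearly identical means
separated = oi_factor([0.9, 0.1])
close = oi_factor([0.50, 0.49, 0.51])
```

  The second set yields a value much closer to 1 than the first, which matches the reading that nearly identical channels are harder to tell apart.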

  


  

4 Impact of LZ Complexity and OI Factor on Prediction Accuracy

  In this section, two MAB policies, i.e. UCB1 and Thompson-Sampling (TS), are investigated and the performance they achieve is correlated with the information given by LZ complexity and the OI factor $H_1$. Markov chains with several levels of stationary distribution $\pi = [\pi_0, \pi_1]$, LZ complexity and $H_1$ factor are generated for further numerical analysis. For simulation convenience, some parameters need to be set. Indeed, ( ) is an underdetermined system with two unknowns $p_{01}$ and $p_{10}$. Therefore, as a side step, we considered 9 different levels of $\pi_1$, i.e. the probability of being vacant: 0.1, 0.2, ..., 0.9. For these values of $\pi_1$, we obtained 45 different transition probability matrices P, each corresponding to a different LZ complexity. A total of $\binom{45}{5}$ combinations are obtained by considering the 45 different transition probability matrices and K = 5 channels, and those correspond to various values of the $H_1$ factor. Finally, MAB policies are applied to 2000 combinations randomly selected from the total of $\binom{45}{5}$ combinations. Every point in each figure corresponds to one realization of the MAB policies. For each realization, the policy is executed over $10^2$ iterations of $10^4$ time slots each. Moreover, the exploitation-exploration coefficient in UCB1 is set to α = 0.5, which has been shown to be efficient for maintaining a good tradeoff between exploration and exploitation [ ].
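  The size of this experiment can be checked with a short Python sketch. It assumes that a combination is an unordered set of 5 distinct matrices out of 45 and that the 2000 selected combinations are distinct. Neither assumption is stated explicitly in the text, and all names are hypothetical.

```python
import math
import random

N_MATRICES = 45  # transition probability matrices
K = 5            # channels per combination
N_DRAWS = 2000   # combinations actually simulated


def sample_channel_sets(n_draws=N_DRAWS, n_matrices=N_MATRICES, k=K, seed=0):
    """Draw distinct k-subsets of matrix indices uniformly at random."""
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < n_draws:
        # sorted tuple so that the same subset is never counted twice
        chosen.add(tuple(sorted(rng.sample(range(n_matrices), k))))
    return sorted(chosen)


total = math.comb(N_MATRICES, K)  # number of possible combinations
channel_sets = sample_channel_sets()
```

  Under these assumptions the 2000 simulated sets cover well under 1 % of the possible combinations.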

  4.1 Probability of Success

  The probability of success $P_{Succ}$ is computed as the number of times a vacant channel is explored over the number of iterations. The success probability depends on the probability $P_f$ that there exists at least one free channel in the set of channels K. Considering that the channel occupation is independent from one channel to another, we have [ ]:

  $$P_f = 1 - \prod_{i=1}^{K} \pi_0^i, \qquad (6)$$

  where $\pi_0^i$ is the probability that the i-th channel is occupied.
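  Equation (6) translates into a one-line computation. The sketch below uses the hypothetical name `prob_at_least_one_free` and relies on the same independence assumption as the text.

```python
import math


def prob_at_least_one_free(pi_occupied):
    """P_f = 1 - product of per-channel occupancy probabilities (independent channels)."""
    return 1.0 - math.prod(pi_occupied)


# Hypothetical example: five channels, each occupied half of the time
p_f = prob_at_least_one_free([0.5] * 5)
```

  Even heavily loaded channels give a high $P_f$ once K grows, because all channels must be occupied simultaneously for an access to be impossible.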

  Figure [ ](a) and (b) depict the probability of success $P_{Succ}$ of the UCB1 and TS policies, i.e. the probability that these policies access a free channel, as a function of the probability that at least one channel is free, i.e. $P_f$, and of LZ complexity. In both figures, the success probability increases with $P_f$ for a given level of LZ complexity. However, in Fig. [ ](a) and (b), for a given $P_f$, several values of LZ complexity lead to the same level of performance for the UCB1 and TS algorithms. This reveals that LZ complexity is not really related to the ability of the UCB1 and TS policies to learn the scenario. For instance, in Fig. [ ](a) and (b), the SU is able to achieve more than 90 % of probability of success on a channel set with LZ complexity of 0.2 and $P_f = 0.98$, whereas it only achieves 75 % of probability of