Cognitive Radio Oriented Wireless Networks
Dominique Noguet, Klaus Moessner, Jacques Palicot (Eds.)
Cognitive Radio Oriented Wireless Networks 11th International Conference, CROWNCOM 2016 Grenoble, France, May 30 – June 1, 2016 Proceedings
Lecture Notes of the Institute for Computer Sciences, Social Informatics
and Telecommunications Engineering 172
Editorial Board
Ozgur Akan
Middle East Technical University, Ankara, Turkey
Paolo Bellavista
University of Bologna, Bologna, Italy
Jiannong Cao
Hong Kong Polytechnic University, Hong Kong, Hong Kong
Falko Dressler
University of Erlangen, Erlangen, Germany
Domenico Ferrari
Università Cattolica Piacenza, Piacenza, Italy
Mario Gerla
UCLA, Los Angeles, USA
Hisashi Kobayashi
Princeton University, Princeton, USA
Sergio Palazzo
University of Catania, Catania, Italy
Sartaj Sahni
University of Florida, Florida, USA
Xuemin (Sherman) Shen
University of Waterloo, Waterloo, Canada
Mircea Stan
University of Virginia, Charlottesville, USA
Jia Xiaohua
City University of Hong Kong, Kowloon, Hong Kong
Albert Zomaya
University of Sydney, Sydney, Australia
Geoffrey Coulson
Lancaster University, Lancaster, UK
More information about this series at
Editors
Dominique Noguet, CEA-LETI, Grenoble, France
Jacques Palicot, CentraleSupélec – Campus de Rennes, Institut d’Electronique et de Télécommunications de Rennes, UMR CNRS 6164, Cesson-Sévigné, France
Klaus Moessner, University of Surrey, Guildford, UK
ISSN 1867-8211
ISSN 1867-822X (electronic) Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
ISBN 978-3-319-40351-9
ISBN 978-3-319-40352-6 (eBook)
DOI 10.1007/978-3-319-40352-6
Library of Congress Control Number: 2016940880
© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
CROWNCOM 2016
Preface
The 11th EAI International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM 2016) was hosted by CEA-LETI and was held in Grenoble, France, the capital of the Alps. 2016 was the 30th anniversary of wireless activities at CEA-LETI, but Grenoble has a more ancient prestigious scientific history, still very fresh in the signal processing area as Joseph Fourier worked there on his fundamental research, and established the Imperial Faculty of Grenoble in 1810, now the Joseph Fourier University.
This year, the main themes of the conference centered on the application of cognitive radio to 5G and to the Internet of Things (IoT). According to the current trend, the requirements of 5G and the IoT will increase demands on the wireless spectrum, further aggravating the spectrum scarcity problem. Both academic and regulatory bodies have focused on dynamic spectrum access and/or dynamic spectrum usage to optimize scarce spectrum resources. Cognitive radio, with the capability to flexibly adapt its parameters, has been proposed as the enabling technology for unlicensed secondary users to dynamically access the licensed spectrum owned by legacy primary users on a negotiated or an opportunistic basis. It is now perceived as a much broader paradigm that will contribute toward solving the resource allocation problem that 5G requirements raise.
The program of the conference was structured to address these issues from the perspectives of industry, regulation bodies, and academia. In this transition period, where several visions of 5G and spectrum usage coexist, the CROWNCOM Committee decided to have a very strong presence of keynote speeches from key stakeholders. We had the pleasure of welcoming keynotes from industry 5G leaders such as Huawei and Qualcomm, the IoT network operator Sigfox, and the European Commission with perspectives on policy and regulation. We were also honored to welcome prestigious academic views from Carnegie Mellon University and Zhejiang University. Along with the keynote speeches, we had the opportunity to debate the impact of massive IoT deployments on future spectrum use with a panel gathering high-profile experts in the field, thanks to the help of the conference panel chair.
The committee also wanted to emphasize how cognitive radio techniques could be applied in different areas of wireless communication in a pragmatic way, in a context where cognitive radio is often perceived as a theoretical approach. With this in mind, a significant number of demonstrations and exhibits were presented at CROWNCOM. Seven demonstrations and two exhibition booths were present during the conference, showcasing the maturity level of the technology. In this regard, we would like to express our gratitude to the conference demonstration and exhibit chair for his outstanding role in the organization of the demonstration area in conjunction with the conference local chairs. In addition, we decided to integrate a series of workshops that can be seen as special sessions where specific aspects of cognitive radio regarding applications, regulatory frameworks, and research were discussed. Four such workshops were organized as part of the conference, embracing topics such as modern spectrum management, 5G technology enablers, software-defined networks and virtualization, and cloud technologies. This year, delegates could also attend three tutorials organized as part of the conference. We are thankful to the tutorial chair for organizing these tutorials.
Of course, the core of the conference was composed of regular paper presentations covering the seven tracks of the conference topics. We received a fair number of submissions, out of which 62 high-quality papers were selected. The paper selection was the result of a rigorous and high-standard review process involving more than 150 Technical Program Committee (TPC) members with a rejection rate of 25 %. We are grateful to all TPC members and the 14 track co-chairs for providing high-quality reviews. This year, the committee decided to reward the best papers of the conference in two ways. First, the highest-ranked presented papers were invited to submit an extended version of their work to a special issue of the EURASIP Journal on Wireless Communications and Networking (Springer) on “Dynamic Spectrum Access and Cognitive Techniques for 5G.” This special issue is planned for publication at the end of 2016. Then, a smaller number of papers with the highest review scores competed for the conference “Best Paper Award,” sponsored by Orange. The committee would like to thank the authors for submitting high-quality papers to the conference, thereby maintaining CROWNCOM as a key conference in the field.
Of course the organization of this event would have been impossible without the constant guidance and support of the European Alliance for Innovation (EAI). We would like to warmly thank the organization team at EAI and in particular Anna Horvathova, the conference manager of CROWNCOM 2016. We would also like to express our thanks to the team at CEA: our conference secretary and the conference webchair for their steady support and our local chairs for setting up all the logistics.
Finally, the committee would like to express its gratitude to the conference sponsors. In 2016, CROWNCOM was very pleased by the support of a large number of prestigious sponsors, stressing the importance of the conference for the wireless community. Namely, we express our warmest thanks to Keysight Technologies, Huawei, Nokia, National Instruments, Qualcomm, and the European ForeMont project.
We hope you will enjoy the proceedings of CROWNCOM 2016.
April 2016
Dominique Noguet
Klaus Moessner
Jacques Palicot
Organization
Steering Committee
Imrich Chlamtac, Create-Net, EAI, Italy
Thomas Hou, Virginia Tech, USA
Abdur Rahim Biswas, Create-Net, Italy
Tao Chen, VTT – Technical Research Centre of Finland, Finland
Tinku Rasheed, CREATE-NET, Italy
Athanasios Vasilakos, Kuwait University, Kuwait
Organizing Committee
General Chair
Dominique Noguet, CEA-LETI, France
Conference Secretary
Sandrine Bertola, CEA-LETI, France
Technical Program Chair
Klaus Moessner, University of Surrey, UK
Publicity and Social Media Chairs
Emilio Calvanese-Strinati, CEA-LETI, France
Masayuki Ariyoshi, NEC, Japan
Apurva Mody, BAE systems, USA
Publication Chair
Jacques Palicot, Centrale Supélec, France
Keynote and Panel Chair
Paulo Marques, Instituto de Telecomunicacoes, Portugal
Workshop Chair
Jordi Pérez-Romero, UPC, Spain
Tutorial Chair
Sofie Polin, KU Leuven, Belgium
Exhibition and Demonstration Chair
Benoit Miscopein, CEA-LETI, France
Local Chairs
Pascal Conche, CEA, France
Audrey Scaringella, CEA, France
Web Chair
Yoann Roth, CEA-LETI, France
Technical Program Committee Chairs
Track 1: Dynamic Spectrum Access/Management and Database Co-chairs
Oliver Holland, King’s College, London, UK
Kaushik Chowdhury, Northeastern University, USA
Track 2: Networking Protocols for Cognitive Radio Co-chairs
Luca De Nardis, Sapienza University of Rome, Italy
Christophe Le Martret, Thales Communications & Security, France
Track 3: PHY and Sensing Co-chairs
Friedrich Jondral, Karlsruhe Institute of Technology, Germany
Stanislav Filin, NICT, Japan
Track 4: Modelling and Theory Co-chairs
Panagiotis Demestichas, University of Piraeus, Greece
Danijela Cabric, UCLA, USA
Track 5: Hardware Architecture and Implementation Co-chairs
Seungwon Choi, Hanyang University, Korea
Olivier Sentieys, Inria, France
Track 6: Next Generation of Cognitive Networks Co-chairs
Track 7: Standards, Policies, and Business Models Co-chairs
Martin Weiss, University of Pittsburgh, USA
Baykas Tuncer, Istanbul Medipol University, Turkey
Technical Program Committee Members
Hamed Ahmadi, University College Dublin, Ireland
Adnan Aijaz, Toshiba Research European Labs, UK
Ozgur Barış Akan, Koc University, Turkey
Anwer Al-Dulaimi, University of Toronto, Canada
Mohammed Altamimi, Communications and IT Commission, Saudi Arabia
Onur Altintas, Toyota, Japan
Osama Amin, KAUST, Saudi Arabia
Xueli An, European Research Centre, Huawei Technologies, Germany
Peter Anker, Netherlands Ministry of Economic Affairs, The Netherlands
Angelos Antonopoulos, Telecommunications Technological Centre of Catalonia (CTTC), Spain
Ludovic Apvrille, Telecom-ParisTech, France
Masayuki Ariyoshi, NEC, Japan
Stefan Aust, NEC, Japan
Faouzi Bader, Centrale Supélec Rennes, France
Arturo Basaure, Aalto University, Finland
Bernd Bochow, Fraunhofer Focus, Germany
Tadilo Bogale, Institut National de recherche Scientifique, Canada
Doug Brake, Information Technology and Innovation Foundation, USA
Omer Bulakci, Huawei, Germany
Carlos E. Caicedo, Syracuse University, USA
Emilio Calvanese-Strinati, CEA-LETI, France
Chan-Byoung Chae, Yonsei University, Korea
Ranveer Chandra, Microsoft, USA
Periklis Chatzimisios, Alexander TEI of Thessaloniki, Greece
Pravir Chawdhry, Joint Research Center EC, Italy
Luiz Da Silva, Virginia Tech, USA
Antonio De Domenico, CEA-LETI, France
Luca De Nardis, Sapienza University of Rome, Italy
Thierry Defaix, DGA-MI, France
Simon Delaere, iMinds, Belgium
Panagiotis Demestichas, University of Piraeus, Greece
Marco Di Felice, University of Bologna, Italy
Rogério Dionisio, Instituto de Telecomunicações, Portugal
Marc Emmelmann, Fraunhofer FOKUS, Germany
Ozgur Ergul, Koc University, Turkey
Serhat Erkucuk, Kadir Has University, Turkey
Takeo Fujii, University of Electro-Communications, Japan
Piotr Gajewski, Military University of Technology, Poland
Yue Gao, Queen Mary University of London, UK
Matthieu Gautier, IRISA, France
Liljana Gavrilovska, Ss Cyril and Methodius University of Skopje, Republic of Macedonia
Andrea Giorgetti, WiLAB, University of Bologna, Italy
Michael Gundlach, Nokia, Germany
Hiroshi Harada, Kyoto University, Japan
Ali Hassa, NUST School of Electrical Engineering and Computer Science (SEECS), Pakistan
Stefano Iellamo, Crete (at ICS-FORTH), Greece
Florian Kaltenberger, Eurecom, France
Du Ho Kang, Ericsson, Sweden
Pawel Kaniewski, Military Communication Institute, Poland
Mika Kasslin, Nokia, Finland
Shuzo Kato, Tohoku University, Japan
Seong-Lyun Kim, Yonsei University, Korea
Adrian Kliks, Poznan University of Technology, Poland
Heikki Kokkinen, Fairspectrum, Finland
Kimon Kontovasilis, NCSR Demokritos, Greece
Pawel Kryszkiewicz, Poznan University of Technology, Poland
Samson Lasaulce, L2S-CNRS, France
Vincent Le Nir, Royal Military Academia, Belgium
Didier Le Ruyet, CNAM, France
Per H. Lehne, Telenor, Norway
William Lehr, MIT, USA
Janne Lehtomäki, University of Oulu, Finland
Zexian Li, Nokia Networks, Finland
Dan Lubar, RelayServices, USA
Irene MacAluso, Trinity College Dublin, Ireland
Allen Brantley MacKenzie, Virginia Tech, USA
Milind Madhav Buddhikot, Alcatel Lucent Bell Labs, USA
Petri Mähönen, RWTH Aachen University, Germany
Vuk Marojevic, Virginia Tech, USA
Paulo Marques, Instituto de Telecomunicacoes, Portugal
Torleiv Maseng, FFI, Norway
Daniel Massicotte, UQTR, Canada
Raphaël Massin, Thales Communications & Security, France
Marja Matinmikko, VTT, Finland
Arturas Medeisis, ITU Representative, Saudi Arabia
Albena Mihovska, Aalborg University, Denmark
Markus Mueck, Intel Deutschland, Germany
Dominique Noguet, CEA-LETI, France
Keith Nolan, Intel, Ireland
Jacques Palicot, Centrale Supélec, France
Dorin Panaitopol, Thales Communications & Security, France
Stephane Paquelet, BCOM, France
Kathick Parashar, IMEC, Belgium
Przemyslaw Pawelczak, Tu Delft, The Netherlands
Milica Pejanovic-Djurisic, University of Montenegro, Montenegro
Jordi Pérez-Romero, UPC, Spain
Marina Petrova, RWTH Aachen University, Germany
Sofie Polin, KU Leuven, Belgium
Zhijin Qin, Queen Mary University of London, UK
Igor Radusinovic, University of Montenegro, Montenegro
Mubashir Rehmani, COMSATS Institute of Information Technology, Pakistan
Alex Reznik, InterDigital, USA
Tanguy Risset, INSA Lyon, France
Olivier Ritz, iMinds, Belgium
Henning Sanneck, Nokia Networks, Germany
Naotaka Sato, Sony, Japan
Berna Sayrac, Orange Lab, France
Shahriar Shahabuddin, University of Oulu, Finland
Hongjian Sun, Durham University, UK
Wuchen Tang, TAMU, Qatar
Terje Tjelta, Telenor, Norway
Dionysia Triantafyllopoulou, University of Surrey, UK
Theodoros Tsiftsis, Industrial Systems Institute, Greece
Fernando Velez, Instituto de Telecomunicações-DEM, Portugal
Christos Verikoukis, CTTC, Spain
Guillaume Villemaud, INSA Lyon, France
Anna Vizziello, University of Pavia, Italy
Ji Wang, Virginia Tech, USA
Honggang Wang, UMass Dartmouth, USA
Gerhard Wunder, Fraunhofer HHI, Germany
Alex Wyglinski, Worcester Polytechnic Institute, USA
Seppo Yrjöla, Nokia, Finland
Jens Zander, KTH, Sweden
Yonghong Zeng, I2R A-STAR, Singapore
Hans-Jurgen Zepernick, Blekinge Institute of Technology, Sweden
Contents
Dynamic Spectrum Access/Management and Database
Markku Jokinen, Marko Mäkeläinen, Tuomo Hänninen, Marja Matinmikko, and Miia Mustonen
Sumit J. Darak, Amor Nafkha, Christophe Moy, and Jacques Palicot
Andrey Garnaev, Shweta Sagari, and Wade Trappe
Pawan Dhakal, Shree K. Sharma, Symeon Chatzinotas, Björn Ottersten, and Daniel Riviello
Lifeng Hao, Sixing Yin, and Zhaowei Qu
Feten Slimeni, Bart Scheers, Vincent Le Nir, Zied Chtourou, and Rabah Attia
Yiteng Wang, Youping Zhao, Xin Guo, and Chen Sun
Navikkumar Modi, Christophe Moy, Philippe Mary, and Jacques Palicot
Suzan Bayhan, Gopika Premsankar, Mario Di Francesco, and Jussi Kangasharju
Charles Jumaa Katila, Melchiorre Danilo Abrignani, and Roberto Verdone
Yongjae Kim, Yonghoon Choi, and Youngnam Han
Anirudh Agarwal, Shivangi Dubey, Ranjan Gangopadhyay, and Soumitra Debnath
Fatima Zohra Kaddour, Dimitri Kténas, and Benoît Denis
Adrian Kliks and Paweł Kryszkiewicz
Networking Protocols for Cognitive Radio
Van-Thiep Nguyen, Matthieu Gautier, and Olivier Berder
M. Ranjeeth and S. Anuradha
Marcela M. Gomez and Martin B.H. Weiss
Maryam Riaz, Seiamak Vahid, and Klaus Moessner
PHY and Sensing
Yoann Roth, Jean-Baptiste Doré, Laurent Ros, and Vincent Berg
Shaojie Liu, Zhiyong Feng, Yifan Zhang, Sai Huang, and Dazhi Bao
Hussein Kobeissi, Youssef Nasser, Amor Nafkha, Oussama Bazzi, and Yves Louët
Kalyana Gopala and Dirk Slock
Jean-Baptiste Doré and Vincent Berg
Dominique Noguet and Jean-Baptiste Doré
D.K. Patel and Y.N. Trivedi
Natalia Y. Ermolova and Olav Tirkkonen
Hussein Kobeissi, Amor Nafkha, Youssef Nasser, Oussama Bazzi, and Yves Louët
Abbass Nasser, Ali Mansour, Koffi-Clement Yao, Hussein Charara, and Mohamad Chaitou
Deep Chandra Kandpal, Vaibhav Kumar, Ranjan Gangopadhyay, and Soumitra Debnath
Modelling and Theory
Alexandre Marquet, Cyrille Siclet, Damien Roque, and Pierre Siohan
Juwendo Denis, Mylene Pischella, and Didier Le Ruyet
Vincent Berg and Jean-Baptiste Doré
Essia Ben Abdallah, Serge Bories, Dominique Nicolas, Alexandre Giry, and Christophe Delaveaud
Alexandre Debard, Patrick Rosson, David Dassonville, and Vincent Berg
Hanna Becker, Ankit Kaushik, Shree Krishna Sharma, Symeon Chatzinotas, and Friedrich Jondral
Lin Li, Amanullah Ghazi, Jani Boutellier, Lauri Anttila, Mikko Valkama, and Shuvra S. Bhattacharyya
Hardware Architecture and Implementation
June Hwang, Jinho Choi, Riku Jäntti, and Seong-Lyun Kim
Victor Quintero, Samir M. Perlaza, Iñaki Esnaola, and Jean-Marie Gorce
Eva Perez, Karl-Josef Friederichs, Andreas Lobinger, Bernhard Wegmann, and Ingo Viering
Alexandros Palaios, Vanya M. Miteva, Janne Riihijärvi, and Petri Mähönen
Vincent Savaux, Apostolos Kountouris, Yves Louët, and Christophe Moy
Mai-Thanh Tran, Matthieu Gautier, and Emmanuel Casseau
Robin Gerzaguet, Laurent Ros, Fabrice Belvéze, and Jean-Marc Brossier
Marko Höyhtyä, Juha Korpi, and Mikko Hiivala
Next Generation of Cognitive Networks
Jessica Oueis and Emilio Calvanese Strinati
Zaheer Khan and Janne Lehtomäki
Prasanth Karunakaran and Wolfgang Gerstacker
Rémi Bonnefoi, Christophe Moy, and Jacques Palicot
Francesco Guidi, Anna Guerra, Antonio Clemente, Davide Dardari, and Raffaele D’Errico
Mouhcine Mendil, Antonio De Domenico, Vincent Heiries, Raphaël Caire, and Nouredine Hadj-said
Hangqi Li, Xiaohui Zhao, and Yongjun Xu
Dazhi Bao, Hao Zhou, Hao Chen, Shaojie Liu, Yifan Zhang, and Zhiyong Feng
Standards, Policies and Business Models
Seppo Yrjölä, Petri Ahokangas, and Pekka Talmola
Petri Ahokangas, Kari Horneman, Marja Matinmikko, Seppo Yrjölä, Harri Posti, and Hanna Okkonen
Michal Szydelko and Marcin Dryjanski
Workshop Papers
Cyril Jouanlanne, Christophe Delaveaud, Yolanda Fernández, and Adrián Sánchez
Slavica Tomovic and Igor Radusinovic
Bessem Sayadi, Marco Gramaglia, Vasilis Friderikos, Dirk von Hugo, Paul Arnold, Marie-Line Alberi-Morel, Miguel A. Puente, Vincenzo Sciancalepore, Ignacio Digon, and Marcos Rates Crippa
Niccolò Iardella, Giovanni Stea, Antonio Virdis, Dario Sabella, and Antonio Frangioni
Author Index
Dynamic Spectrum Access/Management and Database
A New Evaluation Criteria for Learning Capability in OSA Context
Navikkumar Modi¹, Christophe Moy¹, Philippe Mary², and Jacques Palicot¹
1 CentraleSupelec/IETR, Avenue de la Boulaie, 35576 Cesson Sevigne, France
{navikkumar.modi,christophe.moy,jacques.palicot}@centralesupelec.fr
2 INSA de Rennes, IETR, UMR CNRS 6164, 35043 Rennes, France
philippe.mary@insa-rennes.fr
Abstract. The activity pattern of different primary users (PUs) in the spectrum bands has a severe effect on the ability of multi-armed bandit (MAB) policies to exploit spectrum opportunities. In order to apply the MAB paradigm to opportunistic spectrum access (OSA), we must first find out whether the target channel set contains sufficient structure, over an appropriate time scale, to be identified by MAB policies. In this paper, we propose a new criterion for analyzing the suitability of MAB learning policies for the OSA scenario. The criterion, referred to as the Optimal Arm Identification (OI) factor, evaluates the structure of random samples measured over time. The OI factor refers to the difficulty associated with the identification of the optimal channel for opportunistic access. We found in particular that the ability of a secondary user to learn the activity of the PUs spectrum is highly correlated with the OI factor but not really with the well-known LZ complexity measure. Moreover, in the case of a very high OI factor, MAB policies achieve very little improvement compared to the random channel selection (RCS) approach.
Keywords: Cognitive radio · Opportunistic spectrum access · Reinforcement learning · Multi-armed bandit · Lempel-Ziv (LZ) complexity · Optimal Arm Identification (OI) factor
1 Introduction
Spectrum learning and decision making is a core part of cognitive radio (CR), used to get access to underutilized spectrum when it is not occupied by licensed or primary users (PUs). In particular, we deal with the multi-armed bandit (MAB) paradigm, which allows an unlicensed or secondary user (SU) to select a free channel to transmit on when no PUs are using it, and eventually to learn the optimal channel, i.e. the channel with the highest probability of being vacant, in the long run.
The PU activity pattern, i.e. the presence or absence of a PU signal in the spectrum band, can be modeled as a 2-state Markov process. When an SU tries to learn the probability for a channel to be vacant, the success of the MAB policy is affected by the amount of structure contained in the PUs activity pattern in these channels. There exist several pieces of work on opportunistic spectrum access (OSA) by means of learning and predicting an opportunity, in which the CR tries to find out the PUs activity pattern. In this paper, we address the following fundamental questions affecting MAB performance: (i) When is it advantageous to apply the MAB learning framework to address the problem of opportunistic access for an SU? (ii) Is the use of MAB policies for OSA justified over a simple non-intelligent approach?
In almost all cases, the performance of MAB learning policies has been studied with respect to the PUs activity level, i.e. the probability that PUs occupy radio channels. In some cases, spectrum utilization is modeled as an independent and identically distributed (i.i.d.) process, which does not take into account the likely sequential activity patterns of PUs in the spectrum. To address the first question raised above, the Lempel-Ziv (LZ) complexity was introduced in earlier work to characterize the PU activity pattern for the general reinforcement learning (RL) problem. However, the MAB paradigm is a special kind of RL game in which the SU maximizes its long-term reward by taking actions to learn the optimal channel, as opposed to the general RL framework, where the SU interacts with a system by taking actions and learns the underlying structure of the system. Moreover, in this paper, we propose the Optimal Arm Identification (OI) factor to quantify the difficulty associated with predicting the optimal channel, i.e. the one with the highest probability of being vacant, from the set of channels. Finally, the last question raised above is answered by comparing the performance of MAB policies against the random channel selection (RCS) approach (a non-intelligent approach).
We found that, for several spectrum utilization patterns, MAB policies can be beneficial compared to the non-intelligent approach, but the percentage of improvement is highly correlated with the level of the OI factor and very little affected by the level of LZ complexity. This result emphasizes the aphorism that the performance of MAB policies in the OSA framework depends strongly on the OI factor associated with the selected channel set. The work presented in this paper can be the answer to the question raised in several papers about the effectiveness of the MAB framework for the OSA scenario. The remainder of this paper is organized as follows. In Sect. 2, we introduce the MAB framework and the RL policies which are used to verify the effect of the PUs activity pattern on spectrum learning performance. Section 3 relates the PUs activity pattern to the difficulty of prediction. Numerical results giving the efficiency of MAB policies w.r.t. the output of the LZ and OI criteria are then presented, and the final section concludes the paper.
2 System Model and RL Policies
We consider a secondary transceiver pair (Tx-Rx) and a set of channels $\mathcal{K} = \{1, \cdots, K\}$. The SU can access one of the $K$ channels if it is not occupied by PUs. The $i$-th channel is modeled by an irreducible and aperiodic discrete-time Markov chain with finite state space $S^i$. $P^i = \{p^i_{kl}\}$, $k, l \in \{0, 1\}$, denotes the state transition probability matrix of the $i$-th channel, where 0 and 1 are the Markov states, i.e. occupied and free, respectively. Let $\pi^i$ be the stationary distribution of the Markov chain, defined as:
$$\pi^i = [\pi^i_0, \pi^i_1] = \left[\frac{p^i_{10}}{p^i_{10} + p^i_{01}}, \frac{p^i_{01}}{p^i_{10} + p^i_{01}}\right]. \tag{1}$$
Let $S^i(t)$ be the state of channel $i$ at time $t$ and $r^i(t) \in \mathbb{R}$ the reward associated with band $i$. Without loss of generality, we can assume that $r^i(t) = S^i(t)$, i.e. $S^i(t) = 1$ if the channel is sensed free and $S^i(t) = 0$ if it is sensed occupied. The stationary mean reward $\mu^i$ of the $i$-th channel under the stationary distribution $\pi^i$ is given by $\mu^i = \pi^i_1$. A channel is said to be optimal when it has the highest mean reward $\mu^{i^*}$, such that $\mu^{i^*} > \mu^i$ for all $i \neq i^*$, $i \in \{1, \cdots, K\}$, i.e. it is the channel with the highest probability of being vacant. The mean reward optimality gap is defined as $\Delta^i = \mu^{i^*} - \mu^i$. The regret $R(t)$ of a MAB policy up to time $t$ is defined as the reward loss due to selecting sub-optimal channels with mean reward $\mu^i$:
$$R(t) = t\mu^{i^*} - \sum_{m=0}^{t} r(m). \tag{2}$$
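As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes the stationary distribution of a 2-state channel, simulates its occupancy, and evaluates the regret of a reward sequence. The function names and the example transition probabilities are hypothetical and chosen only for this sketch; the regret helper multiplies the optimal mean reward by the number of collected rewards, which is one reading of the time index in Eq. (2).

```python
import random


def stationary_distribution(p01, p10):
    """Return (pi_0, pi_1) of a 2-state chain as in Eq. (1).

    p01 is the probability of moving from occupied (0) to free (1),
    p10 the probability of moving from free (1) to occupied (0).
    """
    total = p01 + p10
    return p10 / total, p01 / total


def simulate_channel(p01, p10, steps, rng):
    """Simulate channel states (0 = occupied, 1 = free) for `steps` slots."""
    # Draw the first state from the stationary distribution.
    state = 1 if rng.random() < stationary_distribution(p01, p10)[1] else 0
    states = []
    for _ in range(steps):
        states.append(state)
        if state == 0:
            state = 1 if rng.random() < p01 else 0
        else:
            state = 0 if rng.random() < p10 else 1
    return states


def regret(rewards, mu_star):
    """Reward loss relative to always sensing the optimal channel."""
    return len(rewards) * mu_star - sum(rewards)


if __name__ == "__main__":
    # Hypothetical (p01, p10) pairs for two channels.
    channels = [(0.1, 0.3), (0.4, 0.2)]
    means = [stationary_distribution(p01, p10)[1] for p01, p10 in channels]
    mu_star = max(means)
    rng = random.Random(0)
    # Rewards collected by a user that always senses the first channel.
    rewards = simulate_channel(*channels[0], steps=1000, rng=rng)
    print("mean rewards:", means)
    print("regret of always sensing channel 1:", regret(rewards, mu_star))
```

Because the mean reward of a channel is $\pi^i_1$, the optimality gap $\Delta^i$ can be read directly from the transition probabilities without running a simulation.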
2.1 Reinforcement Learning (RL) Approaches
We consider two different reinforcement learning (RL) strategies, i.e. UCB1 and Thompson Sampling (TS), in order to evaluate the learning efficiency of MAB policies on channel sets containing different PUs activity patterns. These policies have been proposed as approaches to solve the MAB problem, and they attempt to identify the most vacant channel in order to maximize their long-term reward. Figure 1 illustrates a realization of the random process ‘occupancy of spectrum bands by PUs’. In this figure, the channels do not all have the same occupancy ratio, and it seems intuitively clear that the more the channel occupations differ, the easier the learning.
Upper Confidence Bound (UCB) Policy. It has been shown previously that UCB1 allows spectrum learning and decision making in the OSA context in order to maximize the transmission opportunities. UCB1 is an RL-based policy that learns the optimal channel from previously observed rewards, starting from scratch, i.e. without any a priori knowledge of the activity within the set of channels. At each time $t$, the UCB1 policy updates indices $B_{t,i,T^i(t)}$, where $T^i(t)$ is the number of times the $i$-th channel has been sensed up to time $t$, and returns the channel $a_t = i$ with the maximum UCB1 index. UCB1 is detailed in Algorithm 1, where $\alpha$ is the exploration-exploitation coefficient. If $\alpha$ increases, the bias $A_{t,i,T^i(t)}$ dominates and the UCB1 policy explores new channels. Otherwise, if $\alpha$ decreases, the index computation is governed by the empirical mean $\bar{X}_{i,T^i(t)}$ and the policy tends to exploit the previously observed optimal channel.
Fig. 1. PUs activity pattern
Algorithm 1. UCB1 policy
Input: $K$, $\alpha$
Output: $a_t$
for $t = 1$ to $n$ do
    if $t \leq K$ then
        $a_t = t$
    else
        $T^i(t) = \sum_{m=0}^{t-1} \mathbb{1}_{a_m = i}$, $\forall i$
        $\bar{X}_{i,T^i(t)} = \frac{\sum_{m=0}^{t-1} S^i(m) \mathbb{1}_{a_m = i}}{T^i(t)}$, $A_{t,i,T^i(t)} = \sqrt{\frac{\alpha \ln t}{T^i(t)}}$, $\forall i$
        $B_{t,i,T^i(t)} = \bar{X}_{i,T^i(t)} + A_{t,i,T^i(t)}$, $\forall i$
        $a_t = \arg\max_i B_{t,i,T^i(t)}$
    end if
end for
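The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: channel indices are 0-based, the helper names `ucb1_select` and `run_ucb1` are hypothetical, and each channel is simulated as an i.i.d. Bernoulli process with a given vacancy probability, a simplification of the Markov model used in the paper.

```python
import math
import random


def ucb1_select(t, counts, free_counts, alpha):
    """Return the channel (0-based) to sense at time t >= 1.

    counts[i] is T^i(t), the number of times channel i was sensed;
    free_counts[i] is the number of times it was observed free.
    """
    num_channels = len(counts)
    if t <= num_channels:
        # Initial round: sense every channel once.
        return t - 1
    indices = []
    for i in range(num_channels):
        mean = free_counts[i] / counts[i]                  # empirical mean
        bias = math.sqrt(alpha * math.log(t) / counts[i])  # exploration bias
        indices.append(mean + bias)                        # UCB1 index
    return max(range(num_channels), key=indices.__getitem__)


def run_ucb1(p_free, horizon, alpha=2.0, seed=0):
    """Run UCB1 on i.i.d. Bernoulli channels (an assumption of this sketch)."""
    rng = random.Random(seed)
    counts = [0] * len(p_free)
    free_counts = [0] * len(p_free)
    for t in range(1, horizon + 1):
        channel = ucb1_select(t, counts, free_counts, alpha)
        state = 1 if rng.random() < p_free[channel] else 0  # 1 = sensed free
        counts[channel] += 1
        free_counts[channel] += state
    return counts


if __name__ == "__main__":
    # Hypothetical vacancy probabilities for three channels.
    print(run_ucb1([0.2, 0.5, 0.9], horizon=2000))
```

Raising `alpha` inflates the bias term relative to the empirical mean, so sub-optimal channels are revisited more often; lowering it makes the policy commit earlier to the channel that currently looks best.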
Thompson Sampling (TS) Policy. Detailed in Algorithm 2, TS selects the channel having the highest index $J_{t,i,T^i(t)}$, drawn from a Beta distribution $\beta(\cdot, \cdot)$ with two arguments, i.e. $G_{i,T^i(t)} = \sum_{m=0}^{t-1} S^i(m) \mathbb{1}_{a_m = i}$ and $F_{i,T^i(t)} = T^i(t) - G_{i,T^i(t)}$, where $T^i(t)$ has the same meaning as previously. The former argument is the total number of free states observed up to time $t$ for channel $i$, and the latter is the total number of occupied states. At the start, no prior knowledge of the mean reward of each channel is assumed, i.e. the prior is uniform, and hence the index for all channels is set to $\beta(1, 1)$. The TS policy updates the distribution on the mean reward $\mu^i$ as $\beta(G_{i,T^i(t)} + 1, F_{i,T^i(t)} + 1)$.
Algorithm 2. Thompson Sampling (TS) policy
Input: $K$, $G_{i,1} = 0$, $F_{i,1} = 0$
Output: $a_t$
for $t = 1$ to $n$ do
    $J_{t,i,T^i(t)} \sim \beta(G_{i,T^i(t)} + 1, F_{i,T^i(t)} + 1)$, $\forall i$
    Sense channel $a_t = \arg\max_i J_{t,i,T^i(t)}$
    Observe state $S^i(t)$
    $T^i(t) = \sum_{m=0}^{t-1} \mathbb{1}_{a_m = i}$, $G_{i,T^i(t)} = \sum_{m=0}^{t-1} S^i(m) \mathbb{1}_{a_m = i}$, $\forall i$
    $F_{i,T^i(t)} = T^i(t) - G_{i,T^i(t)}$, $\forall i$
end for
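A minimal Python sketch of Algorithm 2 is given below, using the standard library's `random.betavariate` to draw the Beta samples. As assumptions of this sketch only, channels are indexed from 0, the names `ts_select` and `run_ts` are hypothetical, and channel occupancy is simulated as i.i.d. Bernoulli rather than with the Markov model of the paper.

```python
import random


def ts_select(free_counts, busy_counts, rng):
    """Draw one Beta sample per channel and return the arg max (0-based)."""
    samples = [
        rng.betavariate(free + 1, busy + 1)  # Beta(G + 1, F + 1)
        for free, busy in zip(free_counts, busy_counts)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run_ts(p_free, horizon, seed=0):
    """Run Thompson Sampling on i.i.d. Bernoulli channels (sketch assumption)."""
    rng = random.Random(seed)
    free_counts = [0] * len(p_free)  # G: free observations per channel
    busy_counts = [0] * len(p_free)  # F: occupied observations per channel
    for _ in range(horizon):
        channel = ts_select(free_counts, busy_counts, rng)
        state = 1 if rng.random() < p_free[channel] else 0  # 1 = sensed free
        free_counts[channel] += state
        busy_counts[channel] += 1 - state
    return free_counts, busy_counts


if __name__ == "__main__":
    # Hypothetical vacancy probabilities for three channels.
    free_counts, busy_counts = run_ts([0.2, 0.5, 0.9], horizon=2000)
    print([g + f for g, f in zip(free_counts, busy_counts)])
```

With zero observations every channel is sampled from Beta(1, 1), i.e. the uniform prior, which matches the initialization described for the TS policy.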
3 PUs Activity Pattern vs Difficulties of Prediction
In general, multi-armed bandit (MAB) algorithms are evaluated with respect to the PUs traffic load, which characterizes the occupancy of the spectrum band. Intuitively, the higher the occupancy of the channel by PUs, the more difficult opportunistic access will be for the SU. However, the traffic load of the PUs is not sufficient for evaluating the efficiency of MAB policies. In fact, the performance of MAB policies depends on the structure of the PUs activity pattern and also on the difficulty of identifying the optimal channel, i.e. the channel with the optimal mean reward $\mu^{i^*}$. The ON/OFF PUs activity model approximates the spectrum usage pattern as depicted in Fig. 1. Moreover, if the separation between the mean rewards of the optimal and a sub-optimal channel is large, the SU should be able to converge to the optimal channel faster, and thus achieve a higher number of opportunistic accesses. Therefore, estimating the amount of structure present in the PUs activity pattern is of essential interest for applying machine learning strategies to OSA.
3.1 Lempel-Ziv (LZ) Complexity
Lempel-Ziv (LZ) complexity was proposed in ] as a measure for characterizing the randomness of sequences. It has been widely adopted in several research areas such as biomedical signal analysis, data compression and pattern recognition. Lempel and Ziv, in ], associate with every sequence a complexity $c$, which is estimated by scanning the sequence and incrementing $c$ every time a new substring of consecutive symbols is encountered. Then $c$ is normalized via the asymptotic limit $n/\log_2(n)$, where $n$ is the length of the sequence. LZ complexity is a property of individual sequences and it can be estimated regardless of any assumptions about the underlying process that generated the data. In ], the authors have applied the LZ definition to the production rate of new patterns in Markovian processes. This is of particular interest when PUs activity is modeled as a Markov process to evaluate the efficiency of MAB policies. For an ergodic source, LZ complexity equals the entropy rate of the source, which for a Markov chain $S$ is given by ]:
$$h(S) = -\sum_{k,l} \pi_k \, p_{k,l} \log p_{k,l}, \quad k, l \in \{0, 1\}, \tag{3}$$
where $p_{k,l}$ is the transition probability between states $k$ and $l$, and $\pi_k$ is the stationary probability of state $k$. A system with LZ complexity equal to 1 has a very high rate of new pattern production, which makes it difficult for the learning policy to predict the next sequence. For example in Fig. channels 1 to 4 have different PUs activity patterns characterized by normalized LZ complexity of 0.05, 0.30, 0.60 and 0.66, respectively.
It is clear that predicting the next vacancy is an easy task for channel 1, which has the lowest LZ complexity, whereas it becomes increasingly difficult up to channel 4, which has the highest LZ complexity.
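To make the two quantities of this section concrete, the sketch below counts phrases with the exhaustive parsing of the 1976 Lempel-Ziv definition (assumed here to be the variant intended), normalizes the count by $n/\log_2(n)$, and evaluates the entropy rate (3) for a two-state chain. The base-2 logarithm in the entropy rate, the convention that `p01` is the transition probability from state 0 to state 1, and all function names are assumptions made for illustration.

```python
import math


def lz76_phrase_count(seq):
    """Number of phrases in the LZ76 exhaustive parsing of a string."""
    n = len(seq)
    i = 0  # start of the current phrase
    c = 0  # complexity counter
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier
        # symbols (the copy may overlap the phrase itself).
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c


def normalized_lz(seq):
    """Phrase count normalized by n / log2(n); requires len(seq) >= 2."""
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two symbols")
    return lz76_phrase_count(seq) / (n / math.log2(n))


def markov_entropy_rate(p01, p10):
    """Entropy rate (3), in bits, of a two-state Markov chain."""
    P = [[1.0 - p01, p01], [p10, 1.0 - p10]]
    # Stationary distribution of the two-state chain.
    pi = [p10 / (p01 + p10), p01 / (p01 + p10)]
    h = 0.0
    for k in range(2):
        for l in range(2):
            if P[k][l] > 0.0:  # 0 * log(0) is taken as 0
                h -= pi[k] * P[k][l] * math.log2(P[k][l])
    return h
```

For short sequences the normalized value can exceed 1, since $n/\log_2(n)$ is only an asymptotic limit. A chain with `p01 = p10 = 0.5` reaches the maximum entropy rate of 1 bit per symbol, while small transition probabilities give long ON and OFF runs and a low entropy rate.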
3.2 Optimal Arm Identification (OI) Factor
As stated before, the performance of a MAB policy applied to the OSA context also depends on the separation between the mean reward distributions of the optimal and sub-optimal channels. Here, we define another criterion to characterize the difficulty for a MAB policy to learn the PUs spectrum occupancy. The MAB policy learns, based on past observations, which channel is optimal in terms of mean reward distribution in the long run. The optimal arm identification for the MAB framework has been studied since the 1950s under the name ‘ranking and identification problems’.
In recent advances in MAB context, an important focus was set on a different perspective, in which each observation is considered as a reward: the user tries to maximize his cumulative reward. Equivalently, its goal is to minimize the expected regret R(t), as defined in
, regret
$R(t)$, defined as the reward loss due to the selection of sub-optimal channels, up to time $t$ is upper bounded uniformly by a logarithmic function:
$$R(t) \le a \sum_{i:\,\mu^{i} < \mu^{i^*}} \frac{\ln t}{\mu^{i^*} - \mu^{i}} + b \sum_{i:\,\mu^{i} < \mu^{i^*}} (\mu^{i^*} - \mu^{i}), \tag{4}$$
where $a$ and $b$ are constants independent of channel parameters and time $t$.
As stated in
, the upper bound on the regret of a MAB policy is scaled by the change in mean reward optimality gap $\Delta_i = (\mu^{i^*} - \mu^{i})$. Intuitively, decreasing $\Delta_i$ makes the upper bound looser and thus increases the uncertainty on the performance of MAB policies. In this paper, we propose the OI factor $H_1$ as a measure of the difficulty associated with finding an optimal channel among several other channels:
$$H_1 = 1 - \sum_{i=1}^{K} \frac{\mu^{i^*} - \mu^{i}}{K}, \tag{5}$$
where $\mu^{i}$ and $\mu^{i^*}$ are the mean reward distributions of the sub-optimal and optimal channels respectively, $K$ is the number of channels and $i^*$ is the index of the optimal channel. $H_1$ measures how close the mean rewards of all sub-optimal channels are to the mean reward of the optimal channel. If $H_1$ is close to 1, then all channels have very closely distributed mean rewards, and it becomes almost impossible for the learning policy to identify the optimal channel from the set of channels.
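As a small worked illustration of (4) and (5), the sketch below computes the optimality gaps, the OI factor $H_1$, and the shape of the logarithmic regret bound for a list of mean rewards. The constants `a` and `b` are left as parameters because their values are not specified here. The function names are hypothetical.

```python
import math


def optimality_gaps(means):
    """Gaps between the best mean reward and each channel's mean reward."""
    best = max(means)
    return [best - m for m in means]


def oi_factor(means):
    """OI factor H_1 of (5): one minus the average optimality gap."""
    return 1.0 - sum(optimality_gaps(means)) / len(means)


def regret_bound(means, t, a, b):
    """Right-hand side of (4) for given constants a and b (placeholders)."""
    # Only strictly sub-optimal channels enter the two sums.
    gaps = [g for g in optimality_gaps(means) if g > 0.0]
    return a * sum(math.log(t) / g for g in gaps) + b * sum(gaps)
```

For mean rewards 0.9, 0.8, 0.7, 0.6 and 0.5, the gaps sum to 1.0, so $H_1 = 1 - 1.0/5 = 0.8$. With identical mean rewards on all channels, every gap vanishes and $H_1 = 1$.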
4 Impact of LZ Complexity and OI Factor on Prediction Accuracy
In this section, two MAB policies, i.e. UCB1 and Thompson Sampling (TS), are investigated, and the performance they achieve is correlated with the information given by LZ complexity and the OI factor $H_1$. Markov chains with several levels of stationary distribution $\pi = [\pi_0, \pi_1]$, LZ complexity and $H_1$ factor are generated for further numerical analysis. For simulation convenience, some parameters need to be set. Indeed, is an underdetermined system with two unknowns $p_{01}$ and $p_{10}$. Therefore, as a side step, we considered 9 different levels of $\pi_1$, i.e. the probability of being vacant, as 0.1, 0.2, ..., 0.9. For these values of $\pi_1$, we obtained 45 different transition probability matrices $P$, each corresponding to a different LZ complexity. A total of $\binom{45}{5}$ combinations are obtained by considering 45 different transition probability matrices and $K = 5$ channels, and those correspond to various $H_1$ factors. Finally, MAB policies are applied to 2000 randomly selected combinations from the total of $\binom{45}{5}$ combinations. Every point in each figure corresponds to one realization of MAB policies. For each realization, the policy is executed over $10^2$ iterations of $10^4$ time slots each. Moreover, the exploitation-exploration coefficient in UCB1 is set to $\alpha = 0.5$, which is proved to be efficient for maintaining a good tradeoff between exploration and exploitation ].
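The setup described above can be sketched in Python under a few stated assumptions: state 1 denotes a vacant channel and state 0 an occupied one, `p01` is the probability of moving from state 0 to state 1, and the stationary vacancy probability of the two-state chain is $\pi_1 = p_{01}/(p_{01} + p_{10})$. How the 45 matrices are spread over the 9 levels of $\pi_1$ is not specified here, so the sketch only shows how one free parameter fixes the other for a target $\pi_1$. All function names are hypothetical.

```python
import math
import random


def stationary_vacancy(p01, p10):
    """Stationary probability of state 1 (vacant) for a two-state chain."""
    return p01 / (p01 + p10)


def p10_for_target(pi1, p01):
    """Pick p10 so that the chain has vacancy probability pi1.

    The system is underdetermined: any admissible p01 can be chosen
    freely, and p10 then follows from the stationarity condition.
    """
    p10 = p01 * (1.0 - pi1) / pi1
    if not 0.0 < p10 <= 1.0:
        raise ValueError("no valid p10 for this (pi1, p01) pair")
    return p10


def simulate_channel(p01, p10, n_slots, rng):
    """Generate an ON/OFF occupancy sequence (1 = vacant, 0 = occupied)."""
    state = 1 if rng.random() < stationary_vacancy(p01, p10) else 0
    states = []
    for _ in range(n_slots):
        states.append(state)
        if state == 0:
            state = 1 if rng.random() < p01 else 0
        else:
            state = 0 if rng.random() < p10 else 1
    return states


def sample_combination(rng, n_matrices=45, n_channels=5):
    """Draw one set of distinct matrix indices for the K channels."""
    return sorted(rng.sample(range(n_matrices), n_channels))


# Number of ways to choose 5 matrices out of 45.
n_combinations = math.comb(45, 5)  # 1221759
```

Drawing 2000 such index sets covers only a small fraction of the 1,221,759 possible combinations, which is why the selection is randomized.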
4.1 Probability of Success
Probability of success $P_{Succ}$ is computed as the number of times a vacant channel is explored over the number of iterations. The success probability depends on the probability $P_f$ that there exists at least one free channel in the set of $K$ channels. Considering that the channel occupation is independent from one channel to another, we have:
$$P_f = 1 - \prod_{i=1}^{K} \pi^{i}, \tag{6}$$
where $\pi^{i}$ is the probability that the i-th channel is occupied (the superscript is a channel index, not an exponent).
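As a quick numerical check of (6), the following sketch evaluates $P_f$ from a list of per-channel occupancy probabilities, assuming independent channels as stated above. The function name is hypothetical.

```python
import math


def prob_at_least_one_free(occupied_probs):
    """P_f of (6): one minus the probability that all channels are occupied."""
    return 1.0 - math.prod(occupied_probs)
```

For instance, five channels that are each occupied half of the time give $P_f = 1 - 0.5^5 = 0.96875$.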
Figure a) and (b) depict the probability of success $P_{Succ}$ of the UCB1 and TS policies, i.e. the probability that these policies access a free channel, as a function of the probability that at least one channel is free, i.e. $P_f$, and of LZ complexity. In both figures, success probability increases with $P_f$ for a given level of LZ complexity. However, in Fig. a) and (b), for a given $P_f$, several values of LZ complexity lead to the same level of performance for the UCB1 and TS algorithms. This reveals that LZ complexity is not really related to the ability of the UCB1 and TS policies to learn the scenario. For instance in Fig. a) and (b), the SU is able to achieve more than 90 % of probability of success on a channel set with LZ complexity of 0.2 and $P_f = 0.98$, whereas it only achieves 75 % of probability of