  1. Dynamic spectrum access under partial observations: A restless bandit approach
     Nima Akbarzadeh, Aditya Mahajan
     McGill University, Electrical and Computer Engineering Department
     June 3, 2019

  2. Restless Bandits Example

  3. Channel Scheduling Problem
     At which time, which channel, and which resource should be used?
     Features: time-varying channels; partially observable environment; resource allocation.
     Examples: cognitive radio networks; resource-constrained jamming.

  4. Model (Channel)
     n finite-state Markov channels, N = {1, ..., n}.
     State space: a finite ordered set S^i, i ∈ N.
     Markov state process: {S^i_t}_{t≥0}; transition probability matrix: P^i.
     Resources (rate, power, bandwidth, etc.): R = {∅, r_1, ..., r_k}.
     Payoff: ρ^i(s, r), s ∈ S^i, r ∈ R, with ρ^i(s, r) = 0 if r = ∅.
     Example: S^i = {s_bad, s_good}, R = {r_low, r_high},
       ρ^i(s, r) = r_low   if r = r_low,
                   r_high  if r = r_high and s = s_good,
                   0       if r = r_high and s = s_bad.
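For concreteness, here is a minimal Python sketch of this example payoff. The numeric values used for r_low and r_high (1.0 and 4.0) are illustrative assumptions, not values from the slides.

```python
# Two-state channel payoff from the example above.
# State labels and resource values are illustrative placeholders (not from the slides).
S_BAD, S_GOOD = 0, 1                     # ordered state space S^i = {s_bad, s_good}
R_NONE, R_LOW, R_HIGH = None, 1.0, 4.0   # R = {∅, r_low, r_high}

def payoff(s, r):
    """rho^i(s, r): the low rate always pays off, the high rate only in the good state."""
    if r is R_NONE:
        return 0.0                       # rho^i(s, ∅) = 0
    if r == R_LOW:
        return R_LOW
    if r == R_HIGH:
        return R_HIGH if s == S_GOOD else 0.0
    raise ValueError("unknown resource")

assert payoff(S_BAD, R_HIGH) == 0.0 and payoff(S_GOOD, R_HIGH) == R_HIGH
```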


  5. Model (Transmitter)
     Two decisions to make at each time t:
     Select L channels, indexed by L_t; A^i_t = 1 if i ∈ L_t and 0 otherwise.
     Select resources, denoted by R^i_t; R^i_t = ∅ if i ∉ L_t.
     Observation process (E denotes "not observed"):
       Y^i_t = S^i_t  if A^i_t = 1,
               E      if A^i_t = 0.
     Strategies:
       A_t = f_t(Y_{0:t−1}, R_{0:t−1}, A_{0:t−1}),
       R_t = g_t(Y_{0:t−1}, R_{0:t−1}, A_{0:t−1}, A_t).
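A small sketch of this observation model under the same illustrative setup: selected channels reveal their state, unselected channels return the blank symbol E (represented here by None). The channel count and the selected set are placeholders.

```python
import random

def observe(states, selected):
    """Y^i_t = S^i_t if A^i_t = 1, and the blank symbol E (here None) if A^i_t = 0."""
    return [s if i in selected else None for i, s in enumerate(states)]

# Illustrative: n = 4 channels, the transmitter activates L = 2 of them.
states = [random.choice([0, 1]) for _ in range(4)]  # hidden channel states S^i_t
selected = {0, 3}                                   # L_t, chosen by f_t from past data
print(observe(states, selected))                    # e.g. [1, None, None, 0]
```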


  6. Model (Optimization Problem)
     Problem: Given a discount factor β ∈ (0, 1), a set of resources R, and the state space,
     transition probability, and reward function (S^i, P^i, ρ^i)_{i∈N} for all channels,
     choose a communication strategy (f, g) to maximize
       J(f, g) = E[ Σ_{t=0}^∞ β^t Σ_{i∈N} ρ^i(S^i_t, R^i_t) A^i_t ].
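This objective can be estimated by Monte Carlo simulation. The following is a rough sketch, not the paper's evaluation code: the strategy below is a memoryless random one used only to exercise the interface, and the transition matrices, payoffs, horizon, and run counts are illustrative assumptions.

```python
import random

def simulate_J(P, rho, select, assign, beta=0.9, T=200, runs=500):
    """Monte Carlo estimate of J(f, g) = E[ sum_t beta^t sum_i rho^i(S^i_t, R^i_t) A^i_t ].
    P[i] is channel i's transition matrix, rho(s, r) its payoff,
    select(n) returns the activated set L_t, assign(i) the resource R^i_t."""
    n = len(P)
    total = 0.0
    for _ in range(runs):
        s = [0] * n                                    # initial channel states (illustrative)
        for t in range(T):
            L_t = select(n)                            # channel selection A_t
            for i in L_t:
                total += (beta ** t) * rho(s[i], assign(i))
            # Markov transition of every channel
            s = [random.choices(range(len(P[i])), weights=P[i][s[i]])[0] for i in range(n)]
    return total / runs

# Illustrative: three identical two-state channels and a random strategy.
P = [[[0.8, 0.2], [0.3, 0.7]]] * 3
rho = lambda s, r: r if (r == 1.0 or s == 1) else 0.0  # low rate 1.0 always, high rate 4.0 in good state
select = lambda n: set(random.sample(range(n), 2))     # activate L = 2 channels each slot
assign = lambda i: random.choice([1.0, 4.0])           # pick a resource at random
print(simulate_J(P, rho, select, assign))
```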

  7. Literature Review and Approaches
     Partially observable Markov decision process (POMDP).
     POMDP models suffer from the curse of dimensionality: the state space size is exponential
     in the number of channels.
     Simplified modelling assumptions in prior work: two-state Gilbert-Elliott channels;
     multi-state but identical channels; fully observable Markov decision process (MDP).

  8. Our contributions
     Multi-state, non-identical channels.
     Restless bandit approach.
     Convert the POMDP into a countable-state MDP.
     Finite-state approximation of the MDP.

  9. POMDP (Belief State)
     Belief state: Π^i_t(s) = P(S^i_t = s | Y^i_{0:t−1}, R^i_{0:t−1}, A^i_{0:t−1}).
     Proposition: Let Π_t denote (Π^1_t, ..., Π^n_t). Then, without loss of optimality,
       A_t = f_t(Π_t),  R_t = g_t(Π_t, A_t).
     Recall: f is the channel selection policy and g is the resource selection policy.
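A numpy sketch of the per-channel belief recursion, following the evolution rule made explicit on the restless bandit slide below; the transition matrix values are illustrative placeholders.

```python
import numpy as np

def belief_update(pi, P, y=None):
    """One step of the per-channel belief recursion.
    If the channel was observed (y is its state), the belief collapses to delta_y;
    otherwise it is propagated through the transition matrix: pi' = pi @ P."""
    if y is not None:                 # channel was selected, Y^i_t = S^i_t
        new_pi = np.zeros(P.shape[0])
        new_pi[y] = 1.0
        return new_pi
    return pi @ P                     # unobserved: prior propagation only

P = np.array([[0.8, 0.2], [0.3, 0.7]])   # illustrative two-state channel
pi = np.array([0.5, 0.5])
print(belief_update(pi, P))              # passive channel: [0.55, 0.45]
print(belief_update(pi, P, y=1))         # observed in state 1: [0., 1.]
```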

  10. Optimal Resource Allocation Strategy
      No need for joint optimization of (f, g). Let
        ρ̄^i(π) := max_{r∈R} Σ_{s∈S^i} π(s) ρ^i(s, r),
        r^{i,*}(π) := arg max_{r∈R} Σ_{s∈S^i} π(s) ρ^i(s, r).
      Proposition: Define g^{i,*} : Δ(S^i) × {0, 1} → R as
        g^{i,*}(π, 0) = ∅,  g^{i,*}(π, 1) = r^{i,*}(π).
      For any channel selection policy, (g^*, g^*, ...) is an optimal resource allocation strategy.
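A short sketch of g^{i,*}(π, 1): compute each resource's expected payoff under the current belief and take the maximizer. The payoff vectors reuse the illustrative two-state values from the earlier example.

```python
import numpy as np

def best_resource(pi, payoffs):
    """Compute rho_bar^i(pi) and r^{i,*}(pi): the resource maximizing expected payoff
    under belief pi. payoffs[r] is the vector (rho^i(s, r))_{s in S^i}."""
    expected = {r: float(pi @ np.asarray(v)) for r, v in payoffs.items()}
    r_star = max(expected, key=expected.get)
    return expected[r_star], r_star

# Illustrative two-state example: low rate 1.0 always, high rate 4.0 only in the good state.
payoffs = {"r_low": [1.0, 1.0], "r_high": [0.0, 4.0]}
print(best_resource(np.array([0.9, 0.1]), payoffs))   # (1.0, 'r_low')  -- likely bad channel
print(best_resource(np.array([0.2, 0.8]), payoffs))   # (3.2, 'r_high') -- likely good channel
```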

  11. Restless Bandit Model
      (1) Each {Π^i_t}_{t≥0}, i ∈ N, is a bandit process.
      (2) The transmitter can activate L of these processes.
      (3) Belief state evolution:
            Π^i_{t+1} = δ_{S^i_t}     if process i is activated (A^i_t = 1),
                        Π^i_t · P^i   if process i is passive (A^i_t = 0).
      (4) Expected reward:
            ρ̄^i_t = ρ̄^i(Π^i_t)   if process i is activated (A^i_t = 1),
                    0             if process i is passive (A^i_t = 0).
      Dynamics within time t:
        ... → Π^i_t --f--> A^i_t --g^*--> R^i_t → Y^i_t → ρ^i_t → Π^i_{t+1} → ...
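A sketch of one step of this per-channel bandit process, combining the belief evolution and the expected reward above. Here rho_bar plays the role of ρ̄ from the previous slide, hard-coded for the illustrative two-state payoff values.

```python
import numpy as np

def bandit_step(pi, P, rho_bar, active, rng):
    """One step of the per-channel bandit process: returns (expected reward, next belief).
    Active:  reward rho_bar(pi); the state is observed, so the next belief is delta_{S^i_t}.
    Passive: reward 0; the belief is propagated, pi' = pi @ P."""
    if active:
        s = rng.choice(len(pi), p=pi)        # realized state S^i_t ~ pi
        return rho_bar(pi), np.eye(len(pi))[s]
    return 0.0, pi @ P

# Illustrative two-state channel reusing the earlier payoff values.
P = np.array([[0.8, 0.2], [0.3, 0.7]])
rho_bar = lambda pi: max(1.0, 4.0 * pi[1])   # best of r_low and r_high in expectation
rng = np.random.default_rng(0)
print(bandit_step(np.array([0.3, 0.7]), P, rho_bar, active=True, rng=rng))
print(bandit_step(np.array([0.3, 0.7]), P, rho_bar, active=False, rng=rng))
```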


  12. Restless Bandit Solution
      The main idea is to decompose the coupled n-channel optimization problem into n independent
      one-channel problems. When Whittle indexability is satisfied, one may use a Whittle index
      policy: the L channels with the largest indices are selected. Index strategies perform
      close to optimal in many applications reported in the literature.
      Goal: We provide an efficient algorithm to check indexability and compute the Whittle index.
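As a sketch of how such an index policy is applied once per-channel indices are available (the index functions and beliefs below are arbitrary placeholders, not the Whittle indices computed in the paper):

```python
def whittle_policy(beliefs, index_fns, L):
    """Select the L channels whose current indices are largest.
    index_fns[i] maps the belief of channel i to its (precomputed) index value."""
    indices = [(index_fns[i](pi), i) for i, pi in enumerate(beliefs)]
    indices.sort(reverse=True)
    return {i for _, i in indices[:L]}

# Illustrative placeholder index functions and beliefs for three two-state channels.
index_fns = [lambda pi: 2.0 * pi[1], lambda pi: 0.5 + pi[1], lambda pi: 1.5 * pi[1]]
beliefs = [[0.4, 0.6], [0.9, 0.1], [0.2, 0.8]]
print(whittle_policy(beliefs, index_fns, L=2))   # e.g. {0, 2}
```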

  13. Problem Decomposition
      Modified per-step reward: (ρ̄^i(π) − λ) a^i, where λ can be viewed as the cost for
      transmitting over channel i.
      Problem: Given channel i ∈ N, the discount factor β ∈ (0, 1), the cost λ ∈ ℝ, and the
      belief state space, transition probability, reward function tuple (Δ(S^i), P^i, ρ^i),
      choose a policy f^i : Δ(S^i) → {0, 1} to maximize
        J^i_λ(f^i) := E[ Σ_{t=0}^∞ β^t (ρ̄^i(Π^i_t) − λ) A^i_t ].

  14. Dynamic Programming (Belief State)
      Theorem: Let V^i_λ : Δ(S^i) → ℝ be the unique fixed point of the equation
        V^i_λ(π) = max_{a∈{0,1}} Q^i_λ(π, a), where
        Q^i_λ(π, 0) = β V^i_λ(π · P^i),
        Q^i_λ(π, 1) = ρ̄^i(π) − λ + β Σ_{s∈S^i} π(s) V^i_λ(δ_s).
      Let f^i_λ(π) = 1 if Q^i_λ(π, 1) ≥ Q^i_λ(π, 0), and f^i_λ(π) = 0 otherwise.
      Then f^i_λ is optimal for the decomposed single-channel problem.
      Challenge: continuous state space!
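The continuous belief space is the stated challenge. One common workaround, in the spirit of the finite-state approximation mentioned in the contributions, is to discretize the belief simplex and run value iteration on the grid. A rough sketch for a two-state channel with illustrative parameters (not the paper's algorithm):

```python
import numpy as np

# Illustrative two-state channel; the belief is summarized by p = P(state = s_good).
P = np.array([[0.8, 0.2], [0.3, 0.7]])     # transition matrix P^i
beta, lam = 0.9, 1.2                       # discount factor and transmission cost lambda
rho_bar = lambda p: max(1.0, 4.0 * p)      # best expected payoff under belief (1 - p, p)

grid = np.linspace(0.0, 1.0, 201)                    # discretized belief space
snap = lambda p: int(round(p * (len(grid) - 1)))     # nearest grid index
step = lambda p: (1 - p) * P[0, 1] + p * P[1, 1]     # passive update: good-state entry of pi @ P^i

V = np.zeros(len(grid))
for _ in range(400):                       # value iteration towards the fixed point
    V_new = np.empty_like(V)
    for k, p in enumerate(grid):
        q_passive = beta * V[snap(step(p))]
        q_active = rho_bar(p) - lam + beta * ((1 - p) * V[0] + p * V[-1])  # V at delta_bad, delta_good
        V_new[k] = max(q_passive, q_active)
    V = V_new

# On which grid beliefs is transmitting (a = 1) optimal?
active = [rho_bar(p) - lam + beta * ((1 - p) * V[0] + p * V[-1]) >= beta * V[snap(step(p))]
          for p in grid]
print(f"transmitting is optimal on {sum(active)} of {len(grid)} grid beliefs")
```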

