Wednesday 19 February 2014

A reinforcement learning ticket-based probing path discovery scheme for MANETs

Abstract:

In this paper, a path discovery scheme which supports QoS routing in mobile ad hoc networks (MANETs) in the presence of imprecise information is investigated. The aim is to increase the probability of success in finding feasible paths and to reduce the average path cost of a previously proposed ticket-based probing (TBP) path discovery scheme. The proposed scheme integrates the original TBP scheme with a reinforcement learning method called the on-policy first-visit Monte Carlo (ONMC) method. We investigate the performance of the ONMC method in the presence of imprecise information. Our numerical study shows that, with respect to a flooding-based algorithm, a reduction in message overhead can be achieved with only a marginal difference in path search ability and modest additional computational and storage requirements. When the average message overhead of the ONMC method is reduced to the same order of magnitude as that of the original TBP, the ONMC method gains a 28% improvement in success ratio and a 7% reduction in average path cost over the original TBP.

1. Introduction
A mobile ad hoc network (MANET) consists of a set of mobile nodes (hosts) which are equipped
with transmitters and receivers that allow them to communicate without the need for a wire-based
infrastructure. Most existing routing protocols in MANETs have focused only on best-effort data traffic. Routing schemes which can support connections with QoS requirements have only recently begun to receive attention. There are two key aspects to supporting QoS routing, namely feasible route search and resource reservation [2–4,12,15]. Feasible route search can be done by distributed routing or source routing. In distributed routing, nodes other than the source node are involved in searching for and computing feasible paths by identifying their neighboring nodes as the next-hop router. In source routing, on the other hand, a feasible path is computed solely at the source node. An alternative method is to perform flooding but have the source node exercise some measure of control over the amount of flooding. This mixture of distributed and source routing is the underlying concept of the Ticket-Based Probing (TBP) scheme [2,3], where the amount of flooding is controlled by issuing a limited number of logical tickets at the source node. The work reported in this paper builds on the earlier work of [2,3] in that it integrates the reinforcement learning (RL) framework into the TBP scheme. Despite several attractive features, which will be presented later on, the TBP scheme has some outstanding challenges. One such issue relates to the restricted flooding method: the computation of a suitable number of logical tickets to issue at the source node. More specifically, the original TBP scheme relies on a heuristic rule for ticket computation.
We enhance the original TBP scheme by using an RL technique. The RL-based TBP scheme learns a good rule for issuing tickets by interacting directly with the environment or through simulation.
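To make the learning step concrete, the following sketch shows on-policy first-visit Monte Carlo control, the general RL method behind ONMC, in Python. The environment interface (reset/step), the reward, and the undiscounted return are placeholder assumptions for illustration, not the paper's definitions; in the TBP setting, an action would correspond to a ticket count and the reward to the outcome of the path search.

import random
from collections import defaultdict

def onmc_control(env, actions, episodes=10000, epsilon=0.1):
    # On-policy first-visit Monte Carlo control with an epsilon-greedy
    # policy over (observation, action) pairs. `env` is a placeholder
    # episodic environment: reset() -> obs, step(a) -> (obs, reward, done).
    Q = defaultdict(float)       # value look-up table, one entry per pair
    visits = defaultdict(int)    # visit counts for incremental averaging

    def greedy(obs):
        return max(actions, key=lambda a: Q[(obs, a)])

    for _ in range(episodes):
        # Generate one episode under the current epsilon-greedy policy.
        trajectory, obs, done = [], env.reset(), False
        while not done:
            a = random.choice(actions) if random.random() < epsilon else greedy(obs)
            obs_next, reward, done = env.step(a)
            trajectory.append((obs, a, reward))
            obs = obs_next

        # Accumulate returns backward, then update Q only at the first
        # visit of each (observation, action) pair in the episode.
        G, returns = 0.0, []
        for o, a, r in reversed(trajectory):
            G += r                         # undiscounted episodic return
            returns.append((o, a, G))
        seen = set()
        for o, a, G in reversed(returns):  # forward order again
            if (o, a) not in seen:
                seen.add((o, a))
                visits[(o, a)] += 1
                Q[(o, a)] += (G - Q[(o, a)]) / visits[(o, a)]
    return Q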
Note that there are other works which apply the RL framework to MANET routing: [16] proposed a multicast routing approach based on the Q-learning concept, and [13] suggested a possible application of their multiagent routing scheme in MANETs. However, these works do not deal explicitly with QoS requirements or message overhead. More recently, a power-aware routing algorithm was presented in [8], where the authors extend their work on the cognitive packet networks (CPN) routing protocol
which provides intelligent QoS driven routing for peer-to-peer connections (see also [7,9]). The CPN
protocol uses smart packets to discover and maintain lists of neighbor nodes, and to dynamically
set up source-to-destination routes. Smart packets select routes using a RL algorithm rather
than flooding. In this paper, we gather further experimental evidence on the advantages and
limitations of RL techniques when employed to solve the underlying problem of QoS routing in
MANETs using flooding-based strategies [17]. The underlying aim of the scheme presented in this paper is to maximize the probability of success in finding feasible routes in dynamic-topology networks in the presence of inaccurate information.

2. A POMDP model for MANETs
Partial observability can occur when the topology of the MANET is highly dynamic. In such a network, each mobile node acts as a router, since there is no fixed infrastructure for routing support. Every mobile node is free to move and can enter or leave the network at any instant. In order to maintain up-to-date routing information at other mobile nodes, message exchanges between mobile nodes are required. These information exchanges are done periodically or when a topology change is detected. But even so, imprecise information can still arise due to delayed or lost update messages and restricted transmission of update messages.
Furthermore, within MANETs which support quality-of-service (QoS) routing, residual resource
information in the network is critical. In particular, each mobile node maintains a state of the
network, i.e., the delay and bandwidth information to all other destinations in the network. Note
that such information depends on the route and consequently, on the current topology of the network.
The information is propagated through the MANET according to some updating protocol.
Since an accurate view of such information is difficult to obtain, each mobile node is faced with
only an "observation" of its environment, which is most likely incomplete and inaccurate. Based on
its current network observation, each mobile node acts as an agent which must make certain decisions, e.g., how many control messages are needed to find a feasible path for some new connection arrival, when and how to perform path maintenance if an existing path is about to break, etc.
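To illustrate how such an observation might be formed in practice, the sketch below quantizes an imprecise end-to-end delay estimate into a small set of discrete observation indices relative to a connection's delay bound. The ratio-based encoding and the bin edges are assumptions for illustration; the paper's observation space is likewise built from quantized delay intervals, but the exact intervals are not reproduced here.

def quantize_delay(est_delay, delay_bound, edges=(0.25, 0.5, 0.75, 1.0)):
    # Map an imprecise end-to-end delay estimate to a discrete
    # observation index, relative to the connection's delay bound.
    # The bin edges here are illustrative, not the paper's values.
    ratio = est_delay / delay_bound
    for i, edge in enumerate(edges):
        if ratio <= edge:
            return i
    return len(edges)  # the estimate already exceeds the delay bound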

In this paper, we study a delay-constrained least-cost routing problem. For this constrained routing problem, there are two tasks. Firstly, we need to determine a suitable number of tickets (M0). In [2,3], M0 = Y0 + G0, where Y0 and G0 are the number of yellow and green tickets, respectively. These two types of tickets have different purposes: the yellow tickets are for maximizing the chances of finding feasible paths, while the green tickets are for maximizing the chances of finding low-cost paths.
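As a schematic illustration of the kind of heuristic involved, a source node might issue more yellow tickets as the delay bound tightens relative to its (imprecise) delay estimate, and a few green tickets to bias the search toward cheaper paths. The thresholds and maxima below are invented placeholders, not the published rule from [2,3].

def issue_tickets(delay_bound, est_delay, y_max=4, g_max=2):
    # Hypothetical ticket-issuing rule in the spirit of TBP: the tighter
    # the delay bound relative to the delay estimate, the more yellow
    # tickets are issued to widen the search. Values are placeholders.
    if est_delay <= 0.5 * delay_bound:
        y = 1            # loose requirement: a single probe may suffice
    elif est_delay <= delay_bound:
        y = y_max        # tight requirement: widen the search
    else:
        y = 0            # the estimate already violates the bound
    g = g_max if y > 0 else 0    # green tickets favor low-cost paths
    return y, g, y + g           # M0 = Y0 + G0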

6. Conclusion
In this paper, the TBP scheme based on the ONMC method studied in [17] is applied to support
QoS routing in a MANET environment. The reinforcement learning (RL)-based ONMC method
relies on a look-up table representation which stores a value function for every observation-action pair. The simulation study shows that the TBP scheme based on the ONMC method can achieve good ticket-issuing policies, in terms of the accumulated reward per episode, higher success ratio and lower average path cost, when compared to the original heuristic TBP scheme and a flooding-based TBP scheme. The RL-based TBP path discovery scheme (based on the ONMC method) proposed here is flexible enough to accommodate various other objective and cost functions proposed in the recent literature. In the present version of the RL-based TBP algorithm, decisions are made only once, at the source node, as new call requests are offered to the network.
Preliminary numerical results reported here suggest that the ONMC scheme can control the amount of flooding in the network. More specifically, it achieves a 22.1–58.4% reduction in the average number of search messages compared to the flooding-based TBP scheme, with a marginal reduction of 0.5–1.7% in success ratio. In addition, the ONMC scheme can attain a 13–24.3% higher success ratio than the original heuristic TBP scheme at the expense of higher average message overhead. However, when the maximum number of allowable tickets is reduced to a level at which the average message overheads of the ONMC and the original TBP schemes are of the same magnitude, the ONMC scheme still gains a 28% higher success ratio and a 7% lower average path cost over the original heuristic TBP scheme.
In terms of implementation, the savings in the amount of generated search messages obtained by
the RL-based TBP scheme come at the expense of reasonable storage and computational requirements
for the on-line decision parameters. The storage requirements grow linearly with the number of nodes in the network. The ONMC method requires O(|O||A|) iterations, where |O| and |A| are the sizes of the observation and action spaces. Note that |O| depends on the granularity of the quantized delay intervals, whereas |A| depends on the number of tickets.
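As a rough back-of-the-envelope check of that look-up table cost (the granularities below are assumed for illustration, not taken from the experiments):

def table_entries(num_delay_bins, num_ticket_levels):
    # One stored value per (observation, action) pair: |O| * |A| entries.
    return num_delay_bins * num_ticket_levels

# e.g. 16 quantized delay intervals and 8 allowable ticket counts
print(table_entries(16, 8))  # -> 128 stored values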
From the results of our experimental work gathered so far, it can be said that RL techniques
can play an important role in controlling search message overhead in environments in which the
outcome of a decision is only partially observable. It is important to note that the parameters of the
algorithm and the granularity of, for example, the action set A are important, and further investigation
of these issues is being carried out at present. We are currently investigating methods to further reduce the average number of search messages, as well as the integration of TBP schemes with other
POMDP RL approaches, which will be reported in a forthcoming paper.

