Designing States, Actions, and Rewards for Using POMDP in Session Search
Authors
Abstract
Session search is an information retrieval task in which a user issues a sequence of queries to satisfy a complex information need. It is characterized by rich user-system interaction and by temporal dependencies between queries and between consecutive user behaviors. Recent work has modeled session search with the Partially Observable Markov Decision Process (POMDP). To make the best use of a POMDP model, it is crucial to find suitable definitions for its fundamental elements: states, actions, and rewards. This paper investigates how best to design these three components. Drawing on a variety of related work, we lay out the available design options for each component and experiment with combinations of these options on the TREC 2012 and 2013 Session Track datasets. We report our findings along two evaluation dimensions, retrieval accuracy and efficiency, and recommend practical design choices for applying POMDPs to session search.
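To make the design space concrete, the following is a minimal sketch of a discrete POMDP for session search in Python. Everything here is illustrative: the class name SessionPOMDP, the two hidden user states, the two query-change actions, and the click-based observations are hypothetical placeholders for the kinds of state, action, and reward definitions the paper compares, not the paper's actual implementation.

```python
class SessionPOMDP:
    """Minimal discrete POMDP sketch for session search (hypothetical design).

    States:  hidden user decision states, e.g. still exploring vs. exploiting.
    Actions: system-side retrieval decisions, e.g. how to adjust the query model.
    Rewards: a relevance-based signal, e.g. nDCG of the returned result page.
    """

    def __init__(self):
        self.states = ["exploring", "exploiting"]  # hidden user states
        self.actions = ["increase_term_weights", "add_new_terms"]
        # T[s][a][s']: probability of moving to state s' after action a in s
        self.T = {
            "exploring":  {"increase_term_weights": {"exploring": 0.4, "exploiting": 0.6},
                           "add_new_terms":         {"exploring": 0.7, "exploiting": 0.3}},
            "exploiting": {"increase_term_weights": {"exploring": 0.2, "exploiting": 0.8},
                           "add_new_terms":         {"exploring": 0.5, "exploiting": 0.5}},
        }
        # O[s'][a][o]: probability of observing user behavior o in s' after a
        self.O = {
            "exploring":  {a: {"reformulate_query": 0.7, "click_result": 0.3}
                           for a in self.actions},
            "exploiting": {a: {"reformulate_query": 0.2, "click_result": 0.8}
                           for a in self.actions},
        }

    def reward(self, state, action):
        # Placeholder: in practice this would be a retrieval metric such as nDCG.
        return 1.0 if state == "exploiting" and action == "increase_term_weights" else 0.0
```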
Similar Papers
Using Rewards for Belief State Updates in Partially Observable Markov Decision Processes
Partially Observable Markov Decision Processes (POMDPs) provide a standard framework for sequential decision making in stochastic environments. In this setting, an agent takes actions and receives observations and rewards from the environment. Many POMDP solution methods are based on computing a belief state, which is a probability distribution over possible states in which the agent could be. T...
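The belief state mentioned in this abstract is maintained with the standard POMDP Bayes filter, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). Below is a generic sketch of that update, reusing the T and O table layout from the sketch above; it is the textbook filter, not the specific method of the cited paper.

```python
def belief_update(belief, action, observation, T, O):
    """Standard POMDP belief update: b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s).

    belief : dict mapping each state to its probability
    T      : T[s][a][s'] transition probabilities
    O      : O[s'][a][o] observation probabilities
    """
    new_belief = {}
    for s_next in belief:
        prior = sum(T[s][action][s_next] * belief[s] for s in belief)
        new_belief[s_next] = O[s_next][action][observation] * prior
    norm = sum(new_belief.values())
    if norm == 0.0:
        raise ValueError("Observation has zero probability under the current belief.")
    return {s: p / norm for s, p in new_belief.items()}
```

Both the input and the returned dictionary map states to probabilities; the final normalization makes the result a proper distribution again.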
Delayed reward-based genetic algorithms for partially observable Markov decision problems
Reinforcement learning often assumes the Markov property. However, the agent cannot always observe the environment completely, and in such cases different states are observed as the same state. In this research, the authors develop a Delayed Reward-based Genetic Algorithm for POMDP (DRGA) as a means to solve a partially observable Markov decision problem (POMDP) which has such per...
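The truncated abstract does not reveal DRGA's actual operators, so the sketch below is only a generic genetic-algorithm policy search in which fitness is the delayed, end-of-episode reward; it illustrates the general idea rather than the authors' algorithm. The helper evaluate_episode is an assumed callback that runs one episode in the POMDP and returns its total reward.

```python
import random

def evolve_policies(observations, actions, evaluate_episode,
                    pop_size=20, generations=50, mutation_rate=0.1):
    """Generic GA over reactive policies (observation -> action tables).

    Fitness is delayed: evaluate_episode(policy) must run a full episode
    and return its total reward, so credit is assigned only at episode end.
    """
    def random_policy():
        return {o: random.choice(actions) for o in observations}

    def mutate(policy):
        return {o: (random.choice(actions) if random.random() < mutation_rate else a)
                for o, a in policy.items()}

    def crossover(p1, p2):
        return {o: random.choice((p1[o], p2[o])) for o in observations}

    population = [random_policy() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate_episode, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=evaluate_episode)
```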
Exploiting Fully Observable and Deterministic Structures in Goal POMDPs
When parts of the states in a goal POMDP are fully observable and some actions are deterministic, it is possible to take advantage of these properties to efficiently generate approximate solutions. Actions that deterministically affect the fully observable component of the world state can be abstracted away and combined into macro actions, permitting a planner to converge more quickly. This proc...
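As a generic illustration of the abstraction described here (not the cited paper's planner), a macro action can be built by composing deterministic steps that act on the fully observable part of the state; the planner then treats the composition as a single action.

```python
def make_macro_action(deterministic_steps):
    """Compose deterministic actions on the fully observable state component
    into one macro action. Each step is a function: observable_state -> observable_state.
    """
    def macro(observable_state):
        for step in deterministic_steps:
            observable_state = step(observable_state)
        return observable_state
    return macro
```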
Spatial and Temporal Abstractions in POMDPS: Learning and Planning
Introduction: A popular approach to artificial intelligence is to model an agent and its interaction with its environment through actions, perceptions, and rewards [1]. Intelligent agents should choose actions after every perception, such that their long-term reward is maximized. A well-defined framework for this interaction is the partially observable Markov decision process (POMDP) model. Unf...
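The "long-term reward" objective mentioned here is conventionally the expected discounted return; in standard notation,

$$\pi^{*} = \arg\max_{\pi}\ \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right], \qquad 0 \le \gamma < 1,$$

where $r_t$ is the reward received at step $t$ and $\gamma$ is the discount factor that trades off immediate against future reward.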
PUMA: Planning Under Uncertainty with Macro-Actions
Planning in large, partially observable domains is challenging, especially when a long-horizon lookahead is necessary to obtain a good policy. Traditional POMDP planners that plan a different potential action for each future observation can be prohibitively expensive when planning many steps ahead. An efficient solution for planning far into the future in fully observable domains is to use temp...