Abstract
A partially observed Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. The significant applied potential of such processes remains largely unrealized, due to a historical lack of tractable solution methodologies. This paper reviews some of the current algorithmic alternatives for solving discrete-time, finite POMDPs over both finite and infinite horizons. The major impediment to exact solution is that, even with a finite set of internal system states, the set of possible information states is uncountably infinite. Finite algorithms are theoretically available for exact solution of the finite-horizon problem, but these are computationally intractable for even modest-sized problems. Several approximation methodologies are reviewed that have the potential to generate computationally feasible, high-precision solutions.
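To make the point about information states concrete, the sketch below (a minimal illustration, with a hypothetical two-state model whose numbers are not taken from the paper) implements the Bayesian belief update that maps a prior over the finite internal states, together with an observation, into a posterior. Because this posterior ranges over the continuous probability simplex, the set of reachable information states is uncountable even though the underlying state set is finite.

```python
import numpy as np

# Illustrative two-state model (e.g., 0 = "good", 1 = "worn").
# Matrices are hypothetical, for a single fixed action.
T = np.array([[0.9, 0.1],    # P(s' | s): rows = s, cols = s'
              [0.0, 1.0]])
O = np.array([[0.8, 0.2],    # P(z | s'): rows = s', cols = z
              [0.3, 0.7]])

def belief_update(b, z):
    """Bayes update of belief b after acting and observing z."""
    b_pred = T.T @ b            # predict: push the belief through the dynamics
    b_new = O[:, z] * b_pred    # correct: weight by the observation likelihood
    return b_new / b_new.sum()  # renormalize onto the probability simplex

b = np.array([0.5, 0.5])        # uniform prior over the internal states
for z in [0, 0, 1]:             # a hypothetical observation sequence
    b = belief_update(b, z)
    print(b)
```

The belief vector b is the information state: it summarizes the entire observation history, and the exact finite-horizon algorithms surveyed here exploit the fact that the optimal value function is piecewise linear and convex over this simplex.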