Open Access Article

Hierarchical Reinforcement Learning in Complex Learning Problems: A Survey

S. Mahajan

Section: Survey Paper, Product Type: Journal Paper
Volume-2, Issue-5, Page no. 72-78, May-2014

Online published on May 31, 2014

Copyright © S. Mahajan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: S. Mahajan, “Hierarchical Reinforcement Learning in Complex Learning Problems: A Survey,” International Journal of Computer Sciences and Engineering, Vol.2, Issue.5, pp.72-78, 2014.

MLA Style Citation: S. Mahajan. "Hierarchical Reinforcement Learning in Complex Learning Problems: A Survey." International Journal of Computer Sciences and Engineering 2.5 (2014): 72-78.

APA Style Citation: S. Mahajan (2014). Hierarchical Reinforcement Learning in Complex Learning Problems: A Survey. International Journal of Computer Sciences and Engineering, 2(5), 72-78.

BibTex Style Citation:
@article{Mahajan_2014,
author = {S. Mahajan},
title = {Hierarchical Reinforcement Learning in Complex Learning Problems: A Survey},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {May 2014},
volume = {2},
issue = {5},
month = {May},
year = {2014},
issn = {2347-2693},
pages = {72-78},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=162},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=162
TI - Hierarchical Reinforcement Learning in Complex Learning Problems: A Survey
T2 - International Journal of Computer Sciences and Engineering
AU - S. Mahajan
PY - 2014
DA - 2014/05/31
PB - IJCSE, Indore, INDIA
SP - 72-78
IS - 5
VL - 2
SN - 2347-2693
ER -

Views: 3732 | PDF downloads: 3447 | XML downloads: 3576

Abstract

Reinforcement Learning (RL) is an active area of machine learning research based on the mechanism of learning from rewards. RL has been applied successfully to a variety of tasks and works well for relatively small problems, but as complexity grows, standard RL methods become increasingly inefficient due to large state spaces. This paper surveys Hierarchical Reinforcement Learning (HRL) as an alternative approach for coping with complex problems and improving the efficiency of reinforcement learning. HRL is the subfield of RL concerned with discovering and/or exploiting the underlying structure of a complex problem and solving it with reinforcement learning by breaking it into smaller sub-problems. The paper introduces HRL and discusses its basic concepts, algorithms, approaches, and related work. Finally, it briefly contrasts flat RL with HRL along with the pros and cons of each, and concludes with the research scope of HRL for complex problems.
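
The contrast between flat RL and HRL can be made concrete with two tabular update rules. The sketch below is illustrative only and does not come from the paper: the function names and the toy task are assumptions, but the first update is the standard Q-learning rule, and the second is the SMDP-style rule commonly used in HRL when a temporally extended subtask (an "option") runs for k primitive steps before returning control.

from collections import defaultdict

# Flat, tabular Q-learning: one entry per (state, action) pair, so the
# table grows with the full state space -- the source of inefficiency
# the survey attributes to standard RL on large problems.
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# HRL-style SMDP update: a subtask (option) ran for k primitive steps
# and collected the discounted return R along the way, so the backup
# discounts the successor value by gamma**k instead of gamma.
def smdp_q_update(Q, s, o, R, k, s_next, options, alpha=0.1, gamma=0.99):
    best_next = max(Q[(s_next, o2)] for o2 in options)
    Q[(s, o)] += alpha * (R + gamma**k * best_next - Q[(s, o)])

# Toy usage on a hypothetical two-room navigation task.
Q = defaultdict(float)
smdp_q_update(Q, s="room_A", o="go_to_door", R=0.5, k=7,
              s_next="room_B", options=["go_to_door", "go_to_goal"])

Because each option is learned and reused as a unit, the high-level learner backs up values over far fewer decision points than a flat learner choosing one primitive action at a time; this is the efficiency argument on which hierarchical decomposition rests.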

Key-Words / Index Term

Machine Learning; Reinforcement Learning; Hierarchical Reinforcement Learning
