Reinforcement learning


  1. Faculty of Informatics, Eötvös Loránd University
     Hippocampal formation breaks combinatorial explosion for reinforcement learning: A conjecture
     Andras Lorincz, Department of Information Systems, Eötvös Loránd University

  2. Support and collaborators
     Eötvös Loránd University, Faculty of Informatics
     Support:
     - AFOSR Information Directorate: on reinforcement learning
     - EU Framework Program: on multiagent systems
     Collaborators:
     - Barnabas Poczos
     - Zoltan Szabo
     - Gabor Szirtes
     - Istvan Szita
     Combinatorial Explosion, AAAI FSS BICA 2008

  3. Motivation: Symbols and symbol manipulation
     [Figure labels: control, dynamical system, mixed observation, independent driving components]

  4. Problem statement
     - Artificial Intelligence started from computations
     - Computations work by manipulating symbols
     - The symbol grounding problem emerges
     - Grounding of symbols: connect the symbols to experiences
     - Symbols represent parts (components) of (in) the world and their relations
       - symbol grounding corresponds to graph matching
       - it is exponentially hard
     - It seems necessary to focus on polynomial-time learning tasks
     - Then the symbol learning problem emerges (Lorincz, 2008)
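To make the "exponentially hard" remark concrete, here is a minimal sketch of brute-force graph matching. It is an illustration only, not a method from the talk: the function name `brute_force_match` and the adjacency-matrix encoding are assumptions. It tries every vertex relabeling, so the search space grows as n! with the number of vertices n.

```python
from itertools import permutations
from math import factorial

def brute_force_match(adj_a, adj_b):
    """Return a vertex mapping i -> perm[i] under which graph A equals graph B.

    Tries up to n! permutations, which is why this naive search does not scale.
    Returns None if no relabeling matches.
    """
    n = len(adj_a)
    if len(adj_b) != n:
        return None
    for perm in permutations(range(n)):
        if all(adj_a[i][j] == adj_b[perm[i]][perm[j]]
               for i in range(n) for j in range(n)):
            return perm
    return None

# Two drawings of the same 3-vertex path; the centre is vertex 1 in A, vertex 0 in B.
path_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
path_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
mapping = brute_force_match(path_a, path_b)

# Number of candidate relabelings for a graph with 10 vertices.
candidates_for_10 = factorial(10)
```

Any valid mapping must send the centre of one path to the centre of the other, which is what the search recovers here.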

  5. The symbol learning task
     Find high-entropy variables, or symbols, $x_i$ ($i = 1, 2, \ldots, k$)
     and low-entropy random variables, or manifestations for the symbols, $z_{i,j_i}$ ($i = 1, 2, \ldots, k$; $j_i = 1, 2, \ldots, K_i$; $K_i \gg 1$ for all $i$)
     such that the transition probability between the low-entropy variables $z_{i,j_i}$ and $z_{l,j_l}$, i.e., $P(z_{l,j_l} \mid z_{i,j_i})$,
     is roughly determined by the transition probability between the high-entropy variables $x_i$ and $x_l$, i.e., by $P(x_l \mid x_i)$,
     for almost all manifestations.
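The condition on this slide can be illustrated with a toy Markov chain. This is a sketch under stated assumptions, not the construction used in the talk: every symbol gets the same number `K` of manifestations, each symbol-level probability is spread evenly over the manifestations of the target symbol, and an optional small perturbation stands in for "roughly determined". The names `manifestation_chain` and `symbol_level_view` are hypothetical.

```python
import numpy as np

def manifestation_chain(P_x, K, noise=0.0, seed=0):
    """Build a (k*K x k*K) transition matrix over manifestations z_{i,j}.

    Block (i, l) equals P_x[i, l] / K, so every manifestation of symbol i
    moves to the manifestations of symbol l with total probability P_x[i, l].
    """
    P_z = np.kron(P_x, np.full((K, K), 1.0 / K))
    if noise > 0.0:
        # Assumption: a small random perturbation, then renormalise each row.
        rng = np.random.default_rng(seed)
        P_z = P_z + noise * rng.random(P_z.shape)
        P_z = P_z / P_z.sum(axis=1, keepdims=True)
    return P_z

def symbol_level_view(P_z, k, K):
    """Sum over the manifestations of each target symbol: P(x_l | z_{i,j})."""
    return P_z.reshape(k * K, k, K).sum(axis=2)

P_x = np.array([[0.1, 0.9, 0.0],
                [0.0, 0.2, 0.8],
                [0.7, 0.0, 0.3]])
K = 4
P_z = manifestation_chain(P_x, K, noise=0.01)

# Largest gap between P(x_l | z_{i,j}) and P(x_l | x_i) over all manifestations.
deviation = np.abs(symbol_level_view(P_z, 3, K) - np.repeat(P_x, K, axis=0)).max()
```

With `noise=0.0` the gap is zero for every manifestation; with a small perturbation it stays small, which is the sense in which the manifestation-level transitions are governed by the symbol-level ones.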

  6. The symbol learning task
     - The symbol learning task is possible: Tao (2005) rephrased the famous Szemerédi Regularity Lemma of extremal graph theory in terms of information theory
     - The symbol learning task is polynomial: Frieze and Kannan (1999)

  7. If we have the symbols
     - Reinforcement learning is still exponential
     - BUT IF variables factorize ('complementarity'), e.g., [color and shape], [position and speed], [where and what]
       - then factored RL is polynomial with a novel sampling technique (I. Szita and A. Lorincz, 2008)
     - No general method to find variables that factorize
     - No solution to the factored symbol learning task
     - Exception: control (position, speed, acceleration, force) in linear approximation
       - Autoregressive Moving Average (ARMA) processes
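The gain from factorization can be sketched by comparing table sizes and by writing the joint transition as a product of small local tables. This is only an illustration of the factored representation; it is not the sampling technique of Szita and Lorincz (2008). The function names, the binary variables, and the parent sets are assumptions made for the example.

```python
import itertools
import numpy as np

def flat_table_size(n_vars, n_values):
    """Entries in a full state-to-state transition table."""
    n_states = n_values ** n_vars
    return n_states * n_states

def factored_table_size(n_vars, n_values, max_parents):
    """Entries when each variable depends on at most max_parents variables."""
    return n_vars * (n_values ** max_parents) * n_values

def factored_transition_prob(s, s_next, parents, cpts):
    """P(s_next | s) as a product of per-variable conditional tables."""
    p = 1.0
    for i, (pa, cpt) in enumerate(zip(parents, cpts)):
        parent_vals = tuple(s[j] for j in pa)
        p *= cpt[parent_vals][s_next[i]]
    return p

# Hypothetical model: 3 binary variables, each with one or two parents.
rng = np.random.default_rng(0)
parents = [(0,), (0, 1), (1, 2)]
# cpts[i] has one axis per parent plus a last axis over the next value.
cpts = [rng.dirichlet(np.ones(2), size=(2,) * len(pa)) for pa in parents]

# The product of local tables is a proper distribution over next states.
total = sum(factored_transition_prob((0, 1, 1), s_next, parents, cpts)
            for s_next in itertools.product(range(2), repeat=3))
```

For ten binary variables with at most two parents each, the flat table has 2^20 entries while the factored one has 80, which is the kind of reduction that makes polynomial-time methods conceivable.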

  8. ARMA processes
     Steps:
     1. Remove temporal dependencies (ARMA removal, Gaussian assumption)
     2. Compute ARMA innovations := driving causes of ARMA processes
     3. Analyze the causes; they should be independent
     4. Find the hidden independences: Independent Subspace Analysis
     5. Learn the hidden processes driven by the hidden causes
     - Independent Process Analysis: polynomial-time algorithm (Poczos, Szabo, Lorincz, 2006-2007)
     - Putting the steps into an ANN and insisting on Hebbian learning at each step, one receives an architecture that is similar to the hippocampal formation. The HC
       - is responsible for declarative memory (planning aspect)
       - holds representations of position and direction in rodents
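Steps 1 and 2 can be sketched for the simplest case. The assumptions are loud ones: a first-order autoregressive hidden process with no moving-average part, a square invertible mixing matrix `A`, and an ordinary least-squares fit. None of this is the authors' algorithm, and the function names are hypothetical. The point is that the innovations of the observed signal are a mixture of the independent driving causes, which is what steps 3 and 4 (Independent Subspace Analysis, not implemented here) would then separate.

```python
import numpy as np

def simulate_mixed_ar1(F, A, e):
    """Hidden process s[t] = F s[t-1] + e[t], observed as x[t] = A s[t]."""
    T, d = e.shape
    s = np.zeros((T, d))
    for t in range(1, T):
        s[t] = F @ s[t - 1] + e[t]
    return s @ A.T

def fit_ar1(x):
    """Least-squares estimate of M in x[t] ~ M x[t-1]."""
    M_T, *_ = np.linalg.lstsq(x[:-1], x[1:], rcond=None)
    return M_T.T

def ar1_innovations(x, M):
    """Part of x[t] that is not predicted from x[t-1]."""
    return x[1:] - x[:-1] @ M.T

rng = np.random.default_rng(0)
F = np.array([[0.5, 0.2], [-0.1, 0.3]])       # hidden dynamics (assumed)
A = np.array([[1.0, 0.5], [0.3, 1.0]])        # mixing matrix (assumed)
e = rng.uniform(-1.0, 1.0, size=(5000, 2))    # independent, non-Gaussian causes

x = simulate_mixed_ar1(F, A, e)
M_hat = fit_ar1(x)
innov = ar1_innovations(x, M_hat)

# In this model the innovations should approximate the mixed causes A e[t].
mixed_causes = e[1:] @ A.T
```

In this model the observed dynamics are M = A F A^{-1}, so the fit recovers M rather than F; unmixing the innovations is left to the separation step.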

  9. Comparison:
     1. Hebbian architecture for Autoregressive Independent Process Analysis
     versus
     2. hippocampal formation

  10. The architecture we get
     [Figure labels: architecture, hippocampal formation]
     - with additional CA3-dentate gyrus loops serving moving average compensation

  11. Conjecture repeated
     Hippocampal formation breaks combinatorial explosion for reinforcement learning

  12. Thank you!

  13. Supplementary materials and references

  14. Grids and place cells
     [Figure labels: inputs, hexagonal grids, place fields]
     - grids and place fields emerge together in the model (Lorincz, Kiszlinger, Szirtes, 2008)

  15. Independent Process Analysis
     [Equations not recovered; slide labels: observed, input of ISA, estimated]

  16. References-1
     - Christian Jutten, Jeanny Hérault: Blind separation of sources: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1-10, 1991.
     - Pierre Comon: Independent component analysis, a new concept? Signal Processing, 36(3):287-314, 1994.
     - Jean-Francois Cardoso: Multidimensional independent component analysis. ICASSP'98, volume 4, 1941-1944.
     - Zoltán Szabó, Barnabás Póczos, András Lőrincz: Undercomplete blind subspace deconvolution. Journal of Machine Learning Research, 8(May):1063-1095, 2007.

  17. References-2
     - Aapo Hyvarinen: Independent component analysis for time-dependent stochastic processes. ICANN'98, 541-546.
     - Barnabás Póczos, Bálint Takács, András Lőrincz: Independent subspace analysis on innovations. ECML-2005, 698-706.
     - Barnabás Póczos, András Lőrincz: D-optimal Bayesian interrogation for parameter and noise identification of recurrent neural networks, 2008 (submitted). Available at http://arxiv.org/abs/0801.1883
     - Zoltán Szabó, András Lőrincz: Towards independent subspace analysis in controlled dynamical systems. ICARN-2008 (accepted).
