Applying Formal Verification to Reflective Reasoning


  1. Applying Formal Verification to Reflective Reasoning. R. Kumar (Data61, CSIRO and UNSW; ramana@intelligence.org), B. Fallenstein (Machine Intelligence Research Institute; benya@intelligence.org). Artificial Intelligence for Theorem Proving, Obergurgl 2017

  2. Who am I? Ramana Kumar. PhD, University of Cambridge. Researcher, Data61, CSIRO. Theorem proving in HOL.

  3. Context: Beneficial AI Source: Future of Humanity Institute, Oxford. See also: https://intelligence.org/why-ai-safety/

  4. Context: Beneficial AI Technical Agenda

  5. Context: Beneficial AI Technical Agenda Highly Reliable Agent Design

  6. Context: Beneficial AI Technical Agenda Highly Reliable Agent Design ◮ Foundations ◮ Basic problems lacking in-principle solutions

  7. Context: Beneficial AI Technical Agenda Highly Reliable Agent Design ◮ Foundations ◮ Basic problems lacking in-principle solutions (Note: This is not MIRI’s only research agenda.)

  8. One problem within MIRI’s 2014 agenda seemed to align with my expertise: theorem proving and self-verification.

  9. Problem Statement

  10. Problem Statement Design a system that ◮ always satisfies some safety property, ◮ but is otherwise capable of arbitrary self-improvement.

  11. Problem of Self-Trust. Too little self-trust: cannot make simple self-modifications. Too much self-trust: unsound reasoning about successors.

  12. Overview Reflective Reasoning ◮ Self-Modifying Agents ◮ Vingean Reflection ◮ Suggester-Verifier Architecture ◮ Problem and Partial Solutions Implementation ◮ Botworld ◮ Formalisation in HOL

  13. Reflective Reasoning

  14. The Agent Framework. Diagram: the agent (π) sends an action to the environment and receives an observation+reward in return. π(oa_{1:n}) = a_{n+1}

  15. The Agent Framework. Diagram: the agent (π) sends an action to the environment and receives an observation+reward in return. π(oa_{1:n}) = a_{n+1}. Cartesian boundary: ◮ the agent is computed outside the environment

  16. Reality is not Cartesian. Diagram: the agent sits inside the environment.

  17. Reality is not Cartesian. Diagram: the agent sits inside the environment. π_n(o_n) = (a_{n+1}, ⌜π_{n+1}⌝)
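
The two framings can be summarised as type signatures. A minimal sketch in OCaml (illustrative only; the talk's formalisation uses HOL and CakeML, and none of these names come from it):

    type observation = Obs of string
    type action = Act of string
    type source = Src of string   (* quoted program text, playing the role of ⌜π⌝ *)

    (* Cartesian framework (slide 14): the agent maps the whole interaction
       history to the next action, π(oa_{1:n}) = a_{n+1}. *)
    type cartesian_policy = (observation * action) list -> action

    (* Embedded framework (slide 17): the step-n policy returns the next
       action together with the source code of its successor policy. *)
    type embedded_policy = observation -> action * source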

  18. Vingean Principle. One can reason only abstractly about a stronger reasoner.

  19. Vingean Principle. One can reason only abstractly about a stronger reasoner. Relevance: a self-improving system must reason about programs it cannot run, namely its successors.

  20. Vingean Principle. One can reason only abstractly about a stronger reasoner. Relevance: a self-improving system must reason about programs it cannot run, namely its successors. Approach: formal logic as a model of abstract reasoning.

  21. Suggester-Verifier Architecture

  22. Suggester-Verifier Architecture. Diagram: the suggester (sophisticated, untrusted) takes the observation and proposes (π, a) together with a proof; the verifier passes on (π, a) or falls back to the default.

  23. Suggester-Verifier Architecture. Diagram: the suggester (sophisticated, untrusted) takes the observation and proposes (π, a) together with a proof; the verifier passes on (π, a) or falls back to the default. Verify: ⊢ u(h(π, a)) ≥ u(h(default))

  24. Suggester-Verifier Architecture. Diagram: the suggester (sophisticated, untrusted) takes the observation and proposes (π, a) together with a proof; the verifier passes on (π, a) or falls back to the default. Verify: ⊢ u(h(π, a)) ≥ u(h(default)) (≈ Safe(a))
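
A minimal OCaml sketch of the verifier's decision rule (illustrative only; proves_bound stands in for a trusted checker confirming that the supplied object really proves ⊢ u(h(π, a)) ≥ u(h(default)), and is not part of the talk's code):

    type policy = Policy of string
    type action = Action of string
    type proof  = Proof  of string

    (* Accept the suggestion only if its proof checks out; otherwise
       fall back to the default policy and action. *)
    let verifier
          ~(proves_bound : proof -> policy * action -> bool)
          ~(default : policy * action)
          (suggestion : (policy * action * proof) option)
        : policy * action =
      match suggestion with
      | Some (pi, a, thm) when proves_bound thm (pi, a) -> (pi, a)
      | _ -> default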

  25. Problem with Self-Modification. Argument for Safety of Successor ◮ To create a successor, the parent must prove that the successor's actions will be safe ◮ If the successor follows the s-v architecture, it will only take actions it has proven to be safe ◮ However, concluding from the existence of such a proof that an action is actually safe is problematic.

  26. Problem with Self-Modification. Argument for Safety of Successor ◮ To create a successor, the parent must prove that the successor's actions will be safe ◮ If the successor follows the s-v architecture, it will only take actions it has proven to be safe ◮ However, concluding from the existence of such a proof that an action is actually safe is problematic. This principle, T ⊢ □_T⌜ϕ⌝ ⟹ ϕ, is inconsistent (Gödel/Löb).
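
Why it is inconsistent, in one step: Löb's theorem says that if T ⊢ □_T⌜ϕ⌝ ⟹ ϕ for a particular sentence ϕ, then T ⊢ ϕ. A theory containing the reflection schema for every ϕ therefore proves every sentence, including ⊥, so it is inconsistent.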

  27. Partial Solutions. Descending Trust: T_100 ⊢ □_{T_99}⌜ϕ⌝ ⟹ ϕ, T_99 ⊢ □_{T_98}⌜ϕ⌝ ⟹ ϕ, . . .

  28. Partial Solutions. Descending Trust: T_100 ⊢ □_{T_99}⌜ϕ⌝ ⟹ ϕ, T_99 ⊢ □_{T_98}⌜ϕ⌝ ⟹ ϕ, . . . Model Polymorphism: 0 < κ, T ⊢ ∀n. □_T⌜ϕ(n̄)⌝ ⟹ ϕ[κ − 1/κ](n)

  29. Model Polymorphism: 0 < κ, T ⊢ ∀n. □_T⌜ϕ(n̄)⌝ ⟹ ϕ[κ − 1/κ](n)

  30. Model Polymorphism: 0 < κ, T ⊢ ∀n. □_T⌜ϕ(n̄)⌝ ⟹ ϕ[κ − 1/κ](n). If Safe(a) ≡ ∀n. Safe(a, n), take ϕ(n) ≡ n ≤ κ ⟹ Safe(a, n)

  31. Model Polymorphism: 0 < κ, T ⊢ ∀n. □_T⌜ϕ(n̄)⌝ ⟹ ϕ[κ − 1/κ](n). If Safe(a) ≡ ∀n. Safe(a, n), take ϕ(n) ≡ n ≤ κ ⟹ Safe(a, n). ∀a. □⌜∀n ≤ t + 1 + κ. Safe(ā, n)⌝ ⟹ ∀n ≤ t + κ. Safe(a, n)
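
Roughly how the schema is used (a reconstruction of the intended reading, not verbatim from the slides): ϕ[κ − 1/κ] substitutes κ − 1 for κ, so with ϕ(n) ≡ n ≤ κ ⟹ Safe(a, n) the schema lets the parent pass from a proof of Safe(ā, n) for all n ≤ (t + 1) + κ, produced by a successor created at time t + 1, to Safe(a, n) for all n ≤ (t + 1) + (κ − 1) = t + κ, which is exactly the safety horizon the parent itself needs; the only assumption on the fresh constant κ is 0 < κ.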

  32. Implementation

  33. Botworld: Concrete Framework for Embedded Agents Robots can construct/inspect/destroy/program other robots

  34. Botworld Formalisation Semantics ◮ step : state → state

  35. Botworld Formalisation Semantics ◮ step : state → state ◮ Robots run policies in CakeML

  36. Botworld Formalisation Semantics ◮ step : state → state ◮ Robots run policies in CakeML Counterfactuals ◮ state-with-hole for proposed action

  37. Botworld Formalisation. Semantics: ◮ step : state → state ◮ Robots run policies in CakeML. Counterfactuals: ◮ state-with-hole for proposed action ◮ steph : s-w-h → a → (obs, state) option
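
As OCaml-style signatures, the two functions look roughly as follows (a sketch for orientation only; the actual definitions are HOL functions in the Botworld formalisation, and the reason steph is partial is an assumption here):

    (* Interface sketch of the Botworld semantics from slide 37. *)
    module type BOTWORLD = sig
      type state            (* a full Botworld grid state *)
      type state_with_hole  (* a state with one robot's action left open *)
      type action
      type observation

      (* deterministic step of the whole world *)
      val step : state -> state

      (* counterfactual step: plug a proposed action into the hole,
         returning the robot's observation and the successor state,
         or None if the proposal does not apply *)
      val steph : state_with_hole -> action -> (observation * state) option
    end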

  38. Suggester-Verifier Implementation. sv(π_default, σ, obs): 1. (π, a) = run π_default 2. (π′, a′, thm) = run σ (obs, π, a) 3. check that thm has the correct form 4. write (π, a) or (π′, a′) accordingly

  39. Suggester-Verifier Implementation. sv(π_default, σ, obs): 1. (π, a) = run π_default 2. (π′, a′, thm) = run σ (obs, π, a) 3. check that thm has the correct form 4. write (π, a) or (π′, a′) accordingly. Reflection Library: automation for ⊢ ⌜LCA k̄ ⟹ P⌝ implies LCA(k + 1) ⟹ P
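
For concreteness, the same four steps written as an OCaml function (a sketch under stated assumptions: the real agent runs CakeML policies inside HOL, and run_policy, run_suggester, has_correct_form and write_output are hypothetical stand-ins for its primitives):

    (* Suggester-verifier step: try the untrusted suggestion, keep the
       default whenever the suggester fails or its theorem does not check. *)
    let sv ~run_policy ~run_suggester ~has_correct_form ~write_output
           pi_default sigma obs =
      (* 1. run the default policy to obtain the fallback (π, a) *)
      let (pi, a) = run_policy pi_default obs in
      (* 2. run the untrusted suggester on the observation and the fallback *)
      match run_suggester sigma (obs, pi, a) with
      (* 3. accept only if the returned theorem has the correct form *)
      | Some (pi', a', thm) when has_correct_form thm (pi', a') ->
          (* 4. write out the suggested policy and action *)
          write_output (pi', a')
      | _ ->
          write_output (pi, a)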

  40. Implementation Challenge Project Proposal Build a Botworld agent that self-modifies into a provably safe agent of the same architecture.

  41. Implementation Challenge Project Proposal Build a Botworld agent that self-modifies into a provably safe agent of the same architecture. Eventual Project Discover how far theorem proving technology is from implementing the above...

  42. Outlook Implementing a Self-Improving Botworld Agent ◮ Looks possible, but with more effort than anticipated ◮ I would estimate 4 person-years.

  43. Outlook Implementing a Self-Improving Botworld Agent ◮ Looks possible, but with more effort than anticipated ◮ I would estimate 4 person-years. (building on > 25 in prereqs)

  44. Outlook Implementing a Self-Improving Botworld Agent ◮ Looks possible, but with more effort than anticipated ◮ I would estimate 4 person-years. (building on > 25 in prereqs) ◮ Improvements on model polymorphism would be nice!

  45. Outlook Implementing a Self-Improving Botworld Agent ◮ Looks possible, but with more effort than anticipated ◮ I would estimate 4 person-years. (building on > 25 in prereqs) ◮ Improvements on model polymorphism would be nice! Theorem Proving for AI ◮ Specifications Needed!

  46. Outlook Implementing a Self-Improving Botworld Agent ◮ Looks possible, but with more effort than anticipated ◮ I would estimate 4 person-years. (building on > 25 in prereqs) ◮ Improvements on model polymorphism would be nice! Theorem Proving for AI ◮ Specifications Needed! ◮ Novel Architectures for AI Systems, e.g., improve on Suggester-Verifier to support logical induction and non-proof-based reasoning

  47. Outlook Implementing a Self-Improving Botworld Agent ◮ Looks possible, but with more effort than anticipated ◮ I would estimate 4 person-years. (building on > 25 in prereqs) ◮ Improvements on model polymorphism would be nice! Theorem Proving for AI ◮ Specifications Needed! ◮ Novel Architectures for AI Systems, e.g., improve on Suggester-Verifier to support logical induction and non-proof-based reasoning ◮ Reducing Problems to Functional Correctness
