
Content-based encoding of mathematical and code libraries
Josef Urban
Institute for Computing and Information Sciences
Radboud University, Nijmegen
August 27, 2011

Overview
◮ Introduction: Formal math libraries and wikis
◮ Motivation: naming problems and their implications
◮ Content-based naming methods
◮ Proposed usage in math libraries
◮ Limitations and extensions
◮ Feedback is appreciated!

Introduction: Formal math libraries and wikis
◮ Mathematics can be expressed fully formally
◮ This allows detailed computer understanding
◮ Similar to code libraries
◮ Proof verification (analogous to code compilation) is then possible
◮ Strong computer assistance is possible: automated reasoning, semantic search
◮ Large formal libraries arise, similar to code libraries: Mizar, Coq, Isabelle, HOL
◮ Some problems are very similar to those of software library management
◮ Actually, we do not know a crisp boundary between code and formal math (Prolog is clearly both)

Motivation: naming problems and their implications
◮ Bolzano-Weierstrass theorem or just Weierstrass theorem?
◮ Solomonoff vs. Kolmogorov vs. Chaitin complexity vs. algorithmic entropy?
◮ In a formal library: relation composition(R,S) or compose(R,S) or R*S?
◮ many more (additive vs. multiplicative groups, operations on all kinds of numbers ...)

Motivation: naming problems and their implications
◮ Renaming: Weierstrass gets renamed to Bolzano-Weierstrass
◮ Moving: CoRN.algebra.Basics.iterateN becomes CoRN.utilities.iterateN
◮ Merging: Chaitin complexity and Kolmogorov complexity are found to be the same thing
◮ All these operations cause syntactic changes in the dependent proofs and theorems

Motivation: naming problems and their implications
◮ However, the changes are purely syntactic; there is no semantic difference
◮ How do we align two different concept spaces with each other?
◮ How do we use various searching and automated reasoning tools modulo the different syntactic concept hierarchies?
◮ One use-case: a new user comes with their own vocabulary and does not know the concepts in a large library

Current naming methods
◮ serial numbering of theorems in textbooks and in Mizar: CARD 1:def 1
◮ module-based paths in Coq: CoRN.algebra.Basics.iterateN or CoRN.utilities.iterateN
◮ possibly somewhat more descriptive names: commutativity of plus
◮ name mangling: types of arguments added explicitly to the name
◮ none of these depends strictly on the semantics (contents) of the items

Content-based naming methods
◮ Gödel numbering
◮ Recursive term sharing
◮ Recursive cryptographic hashing

Content-based naming methods: Gödel numbering
◮ basic logic objects are assigned natural numbers
◮ complicated objects are modelled as sequences of less complicated ones
◮ a one-to-one encoding of finite sequences to numbers
◮ thus, every mathematical object is uniquely assigned a (very large) number based purely on its contents
◮ this gives us (theoretically) purely content-based identifiers
◮ however, this does not seem to be practically usable: the numbers will be very large
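
A minimal Python sketch of the idea, using the classic prime-power encoding of sequences. The term representation (a symbol string, or a tuple of a head followed by its arguments) and the symbol codes in SYMBOL_CODES are assumptions made for illustration; the slides do not fix a particular encoding.

```python
from itertools import count

# Hypothetical codes for the basic symbols. They are odd, so they can never
# collide with codes of non-empty sequences, which always contain a factor 2.
SYMBOL_CODES = {"a": 3, "f": 5, "g": 7}


def primes():
    """Yield 2, 3, 5, ... by trial division (enough for short sequences)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def godel_number(term):
    """Encode a term as a natural number based purely on its contents.

    A symbol maps to its code; a sequence (t1, ..., tk) maps to
    2**code(t1) * 3**code(t2) * ... over the first k primes.
    """
    if isinstance(term, str):
        return SYMBOL_CODES[term]
    number = 1
    for prime, part in zip(primes(), term):
        number *= prime ** godel_number(part)
    return number


# g(a) is the sequence (g, a): 2**7 * 3**3
print(godel_number(("g", "a")))  # 3456

# f(g(a)) is already 2**5 * 3**3456, a number with over a thousand digits.
print(len(str(godel_number(("f", ("g", "a"))))))
```

Even a term of depth two produces an identifier far too long to use as a name, which is the practical objection raised on the slide.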

Content-based naming methods: Recursive term sharing
◮ automated/interactive theorem provers (ATPs), Prolog
◮ exhaustive sharing of terms is used to achieve space/time efficiency
◮ example: f(g(a)), g(g(a)) is represented as:
  ◮ a -> *0, g(*0) -> *1, f(*1) -> *2, g(*1) -> *3
◮ difference from Gödel numbering: objects are numbered serially as they come
◮ this makes the scheme fragile
◮ in some sense, it is not perfectly content-based, since it also depends on ordering
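
The sharing scheme can be sketched in Python as follows. The TermBank class and the tuple-based term representation are hypothetical names chosen for this illustration and are not taken from any particular prover.

```python
class TermBank:
    """Shares structurally equal subterms and numbers them serially."""

    def __init__(self):
        # Maps (head, id of arg1, id of arg2, ...) to a serial number.
        self.index = {}

    def intern(self, term):
        """Return the serial number of a term, numbering new subterms on the way."""
        if isinstance(term, str):
            key = (term,)
        else:
            head, *args = term
            key = (head, *(self.intern(arg) for arg in args))
        if key not in self.index:
            self.index[key] = len(self.index)
        return self.index[key]


bank = TermBank()
print(bank.intern(("f", ("g", "a"))))  # 2  (a -> 0, g(*0) -> 1, f(*1) -> 2)
print(bank.intern(("g", ("g", "a"))))  # 3  (g(*1) -> 3; g(a) is shared)

# Presenting the same two terms in the opposite order changes their numbers.
other = TermBank()
print(other.intern(("g", ("g", "a"))))  # 2
print(other.intern(("f", ("g", "a"))))  # 3
```

The second bank shows the fragility: f(g(a)) is number 2 in one run and number 3 in the other, although its content is identical.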

Content-based naming methods: Recursive cryptographic hashing
◮ Gödel numbering results in impractically large identifiers
◮ Recursive term sharing is too fragile
◮ Is there something usable?
◮ Minimal perfect hashing? Not really feasible for math objects
◮ Cryptographic hashing! SHA1, SHA256; SHA1 is used in git
◮ Conflicts are extremely unlikely
◮ SHA1 results in 40-character (hexadecimal) identifiers - this is feasible!

Content-based naming of formal mathematics
◮ The initial library items get an SHA1 value (e.g. the SHA1 of their string form) that does not change between library versions
◮ A suitable semantic form (XML) is defined for terms, formulas, etc.
◮ The SHA1 of the semantic form (a tree or DAG of the items' SHA1 values) is used as the content-based identifier
◮ This is very similar to the way git recursively computes file/directory names
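
A minimal Python sketch of such recursive hashing, using the standard hashlib module. Several things here are assumptions for illustration: a plain string serialization stands in for the XML semantic form, terms are symbol strings or tuples, the definitions table maps a user-chosen name to its defining term, and definitions are assumed to be acyclic.

```python
import hashlib


def sha1_hex(text):
    """SHA1 of a string, as a 40-character hexadecimal identifier."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def content_id(term, definitions):
    """Compute a content-based identifier for a term.

    Primitive symbols are hashed as strings. A defined name is replaced by
    the identifier of its definition, so the name itself never enters the
    hash. A compound term is hashed over the identifiers of its parts.
    """
    if isinstance(term, str):
        if term in definitions:
            return content_id(definitions[term], definitions)
        return sha1_hex("primitive:" + term)
    part_ids = [content_id(part, definitions) for part in term]
    return sha1_hex("node:" + ",".join(part_ids))


# Two hypothetical libraries naming the same concept differently.
definitions = {
    "compose": ("rel_comp", "R", "S"),
    "composition": ("rel_comp", "R", "S"),
}

id_a = content_id(("assoc", "compose"), definitions)
id_b = content_id(("assoc", "composition"), definitions)
print(len(id_a))     # 40
print(id_a == id_b)  # True: the statements differ only in naming
```

Renaming or moving a definition leaves every identifier unchanged, while any change to a definition's content changes the identifiers of everything that depends on it.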

Proposed use
◮ See how much naming-based duplication is inside the libraries
◮ Multiplicative vs. additive versions of algebraic structures
◮ Tracking the items’ histories during wiki-like refactoring:
  ◮ Where were items moved, how were they renamed (semantic diff)
◮ Name-independent automated reasoning/search tools over the libraries:
  ◮ Should be useful particularly for new users that do not know the canonical concept names

Limitations and extensions
◮ A Wikipedia article typically keeps its name for a long time, even though its content changes
◮ This gives rise to an equivalence class of SHA1 hashes
◮ Such equivalence classes need to be propagated using some kind of congruence closure algorithm
◮ Semiformal libraries: take the SHA1 only of the formal content (skip the comments)
◮ An interesting issue is normalization:
  ◮ Alternative versions of associative-commutative operations should be normalized into the same semantic form before the SHA1 is computed
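
The normalization step for associative-commutative operations could look roughly like the Python sketch below. The operator set AC_OPERATORS, the tuple-based terms, and the choice of flattening followed by sorting as the normal form are assumptions for illustration; other canonical forms would serve equally well.

```python
import hashlib

# Hypothetical set of operators declared associative and commutative.
AC_OPERATORS = {"plus", "times"}


def normalize(term):
    """Flatten nested applications of an AC operator and sort its arguments."""
    if isinstance(term, str):
        return term
    head, *args = term
    args = [normalize(arg) for arg in args]
    if head in AC_OPERATORS:
        flat = []
        for arg in args:
            # Arguments are already normalized, so one level of flattening is enough.
            if isinstance(arg, tuple) and arg[0] == head:
                flat.extend(arg[1:])
            else:
                flat.append(arg)
        args = sorted(flat, key=repr)
    return (head, *args)


def normalized_id(term):
    """SHA1 of the normalized form of a term."""
    return hashlib.sha1(repr(normalize(term)).encode("utf-8")).hexdigest()


left = ("plus", "a", ("plus", "b", "c"))
right = ("plus", ("plus", "c", "a"), "b")
print(normalize(left))                             # ('plus', 'a', 'b', 'c')
print(normalized_id(left) == normalized_id(right)) # True
```

Operators outside AC_OPERATORS keep their argument order, so minus(a, b) and minus(b, a) still receive different identifiers.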
