C++0x Regular Expressions, by Simon Andreas Frimann Lund


  1. Outline: Regular Expressions, C++0x, Sources. C++0x Regular Expressions. Simon Andreas Frimann Lund, Datalogisk Institut, Københavns Universitet. May 16, 2008.

  3. Regular Expressions. Regular expression, regex or regexp for short: "A set of characters, metacharacters, and operators that define a string or group of strings in a search pattern."
     - "regex" (a simple regex matching the text "regex")
     - "[-+]?([0-9]*.[0-9]+|[0-9]+)" (a simple regular expression matching... what?)

     The set of metacharacters, operators and other features is usually called a regex flavor.
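One way to answer the slide's "matching... what?" question is to try the pattern. The sketch below assumes the standard `<regex>` header and whole-string matching with `std::regex_match`; the helper names `matches_slide_pattern` and `matches_decimal` are hypothetical and not part of the talk. It suggests that the pattern accepts optionally signed integers and decimals, but also that the unescaped `.` matches any character, so a second variant with an escaped dot is shown for comparison.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches the pattern
// exactly as written on the slide (note the unescaped '.').
bool matches_slide_pattern(const std::string& text) {
    static const std::regex pattern("[-+]?([0-9]*.[0-9]+|[0-9]+)");
    return std::regex_match(text, pattern);
}

// Hypothetical variant: the same pattern with the dot escaped, so that
// only a literal decimal point is accepted between the digit groups.
bool matches_decimal(const std::string& text) {
    static const std::regex pattern("[-+]?([0-9]*\\.[0-9]+|[0-9]+)");
    return std::regex_match(text, pattern);
}
```

With the slide's pattern, "42", "-3.14" and "+.5" all match, but so does "1x5", because `.` stands for any character. The escaped variant rejects "1x5" while still accepting the numeric inputs.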

  9. Flavors. There exist 15+ popular regex flavors in various languages and tools, of which only two are standardized:
     - The POSIX standard: Basic Regex (BRE) and Extended Regex (ERE).
     - GNU BRE / ERE, GNU extensions of the standard used in GNU tools such as grep.
     - The languages D, Haskell, .NET, Java, ECMA (JavaScript), Python and Ruby all have their own flavors.
     - The languages Perl and Tcl have their own flavors as built-in language constructs.
     - Libraries such as PCRE (used in PHP), Boost.Regex, Boost.Xpressive and Qt's QRegExp each have their own flavor.
     - And the list goes on...

     How are these tasty flavors implemented?

  10. Implementations. Basically all the different flavors are implemented with an NFA (non-deterministic finite automaton) or a DFA (deterministic finite automaton). The table gives the machine size for an expression of M characters and the pattern-recognition complexity for an N-character sequence on a machine of S states.

      Algorithm                            Machine size      Complexity
      DFA                                  O(2^M)            O(N)
      Bit-parallel non-backtracking NFA    O(M) or O(2^M)    O((1 + S/B) N)
      Non-backtracking NFA                 O(M) or O(2^M)    O(SN)
      Backtracking NFA                     O(M)              O(2^N)

      Many different implementations for C++ currently exist, some procedural, others object-oriented. They support various flavors, but most are simply object-oriented wrappers for C libraries.
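To make the "bit-parallel non-backtracking NFA" row more concrete, here is a minimal sketch of the classic Shift-And technique. This is a swapped-in illustration, not something the slides name: it handles literal patterns only (no metacharacters), and it assumes that B in the table is the machine word width, taken here to be 64 bits. The function name `shift_and_find` is hypothetical. Each bit of `state` stands for one NFA state, so all states advance together with a shift and a mask per input character, and no input character is ever re-read.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the start index of the first occurrence of
// `pattern` in `text`, or -1 if there is none.
// Assumes 1 <= pattern.size() <= 64 so the whole state set fits in one word.
long shift_and_find(const std::string& text, const std::string& pattern) {
    const std::size_t m = pattern.size();
    if (m == 0 || m > 64) return -1;

    // mask[c] has bit i set when pattern[i] == c.
    std::array<std::uint64_t, 256> mask{};
    for (std::size_t i = 0; i < m; ++i)
        mask[static_cast<unsigned char>(pattern[i])] |= std::uint64_t{1} << i;

    const std::uint64_t accept = std::uint64_t{1} << (m - 1);
    std::uint64_t state = 0;  // bit i set: pattern[0..i] matches ending here

    for (std::size_t j = 0; j < text.size(); ++j) {
        // Advance every active state by one position, start a fresh attempt
        // at bit 0, then keep only the states consistent with text[j].
        state = ((state << 1) | 1) & mask[static_cast<unsigned char>(text[j])];
        if (state & accept) return static_cast<long>(j + 1 - m);
    }
    return -1;
}
```

Because the pattern fits in a single word here, each input character costs a constant number of word operations. A longer pattern would need several words per step, which is one reading of the S/B term in the table.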

  17. Flavor. The regex support in TR1 is an extension of std based on Boost.Regex, with the following proposed changes/consequences:
      - ECMAScript syntax by default.
      - Optional support for POSIX BRE/ERE/awk/grep/egrep/sed syntax.
      - The localization features of POSIX are required, since ECMAScript is not capable of localization.
      - Performance is low, due to the rich expression features.
      - NO performance guarantees are given.
      - Boost has a way to monitor the runtime complexity of expressions and stop them.
      - The expression syntax can be customized with traits classes. Nice!
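The default and optional syntaxes listed above can be selected per expression. The sketch below assumes the interface as it was later standardized in `<regex>`, where the grammar is chosen with a flag such as `std::regex::basic` or `std::regex::extended`; the three function names are hypothetical. It shows the same grouping idea written in ECMAScript, POSIX BRE and POSIX ERE syntax.

```cpp
#include <regex>
#include <string>

// ECMAScript is the default grammar: ( ) groups and + means "one or more".
bool match_ecmascript(const std::string& text) {
    static const std::regex re("(ab)+");
    return std::regex_match(text, re);
}

// POSIX BRE: groups are written \( \), so the backslashes are doubled in
// the C++ string literal. * means "zero or more".
bool match_basic(const std::string& text) {
    static const std::regex re("\\(ab\\)*", std::regex::basic);
    return std::regex_match(text, re);
}

// POSIX ERE: groups are written ( ) as in ECMAScript.
bool match_extended(const std::string& text) {
    static const std::regex re("(ab)+", std::regex::extended);
    return std::regex_match(text, re);
}
```

All three accept "abab" and reject "abc". Only the BRE version, which uses `*`, also accepts the empty string.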

  19. Implementation. A full implementation in C++, not a wrapper! Available in the header file <regex>.

      Representation:
      - basic_regex, holder of expressions, looks like a basic_string.
      - match_results, a collection holding the results of a match (the whole match and each sub-match).

      Methods:
      - bool regex_match(basic_string, basic_regex)
      - bool regex_search(basic_string, match_results, basic_regex)
      - basic_string regex_replace(basic_string, basic_regex, basic_string)
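The three functions listed above can be sketched as follows, assuming the standardized `std::` names and the `std::string` specializations (`std::regex`, `std::smatch`). The date pattern and the helper names `date_regex`, `is_iso_date`, `first_year` and `mask_dates` are illustrative assumptions, not taken from the talk.

```cpp
#include <regex>
#include <string>

// Hypothetical shared pattern: a date written as YYYY-MM-DD, three groups.
const std::regex& date_regex() {
    static const std::regex re("([0-9]{4})-([0-9]{2})-([0-9]{2})");
    return re;
}

// regex_match: true only if the WHOLE string matches the expression.
bool is_iso_date(const std::string& s) {
    return std::regex_match(s, date_regex());
}

// regex_search: finds the first match anywhere in the string and fills
// match_results; m[0] is the whole match, m[1] the first group (the year).
std::string first_year(const std::string& s) {
    std::smatch m;
    if (std::regex_search(s, m, date_regex()))
        return m[1].str();
    return "";
}

// regex_replace: returns a copy with every match replaced by the format text.
std::string mask_dates(const std::string& s) {
    return std::regex_replace(s, date_regex(), "<date>");
}
```

For example, `is_iso_date("2008-05-16")` holds while `is_iso_date("on 2008-05-16")` does not, which is the practical difference between `regex_match` and `regex_search`.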

  20. C++0x Example.

          #include <cstdlib>
          #include <regex>
          #include <string>
          #include <iostream>

          using namespace std;

          regex expression("([0-9]+)(\\-| |$)(.*)");

          // process_ftp: on success returns the ftp response code, and fills
          // msg with the ftp response message.
          int process_ftp(const char* response, std::string* msg)
          {
              cmatch what;
              if (regex_match(response, what, expression))
              {
                  // what[0] contains the whole string
                  // what[1] contains the response code
                  // what[2] contains the separator character
                  // what[3] contains the text message.
                  if (msg)
                      msg->assign(what[3].first, what[3].second);
                  return std::atoi(what[1].first);
              }
              // failure: did not match
              if (msg)
                  msg->erase();
              return -1;
          }

      How is C++0x different from C++?
