
Regular Expressions in .NET


  1. Regular Expressions in .NET
     By: Nasser Alshammari
     College of Science, Department of Computer Science
     Old Dominion University

  2. Outline
     - Regular Expressions?
     - Why do we need them?
     - RE language
     - RE in .NET
     - Conclusion

  3. Regular Expressions?
     - A regular expression (regex) is a special string that describes a search pattern.
     - Regexes provide flexible and easy string matching.
     - Syntax:
       - POSIX BRE (basic regular expressions)
       - POSIX ERE (extended regular expressions)
       - POSIX character classes
     - Example:
         \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
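The e-mail pattern on this slide can be tried directly in C#. The following is a minimal sketch, not part of the original deck: the class and method names (`EmailPatternSketch`, `FindFirstAddress`) are made up for illustration, and `RegexOptions.IgnoreCase` is added on the assumption that lowercase addresses should match, since the pattern itself lists only `A-Z`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPatternSketch
{
    // Pattern copied from the slide. It only lists A-Z, so IgnoreCase is
    // added here (an assumption) to accept lowercase addresses as well.
    private static readonly Regex EmailLike = new Regex(
        @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",
        RegexOptions.IgnoreCase);

    // Returns the first e-mail-like substring of text.
    // Match.Value is the empty string when nothing matched.
    public static string FindFirstAddress(string text)
    {
        return EmailLike.Match(text).Value;
    }
}
```

For example, `EmailPatternSketch.FindFirstAddress("Contact: jane.doe@example.edu today")` yields `jane.doe@example.edu`. Because of `{2,4}`, top-level domains longer than four letters are not matched by this pattern.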

  4. Why do we need them?
     - Security
       - Buffer overflow attack.
       - SQL injection attack.
       - Cross-site scripting attack.
     - Easy and powerful
     - Flexibility

  5. RE language
     - Metacharacter: a character that has a special meaning.

       Metacharacter               Description                               Example
       --------------------------  ----------------------------------------  ----------------------------------------
       .                           Any single character.                     a.c matches "abc".
       [ ]                         A single character contained within the   [abc] matches "a", "b", or "c";
                                   brackets.                                 [a-z] matches any character in that range.
       [^ ]                        A single character that is not contained  [^abc] matches any character other than
                                   within the brackets.                      "a", "b", or "c".
       ^                           Starting position of the string.
       $                           Ending position of the string.
       \( \) (BRE), ( ) (ERE)      Defines a group or subexpression.
       *                           Matches the preceding element zero or     ab*c matches "ac", "abc", "abbc", etc.
                                   more times.
       \{m,n\} (BRE), {m,n} (ERE)  Matches the preceding element at least m  a{3,5} matches only "aaa", "aaaa", and
                                   and not more than n times.                "aaaaa".

  6. Continue.
     - More examples
       - .at matches any three-character string ending with "at", including "hat", "cat", and "bat".
       - [hc]at matches "hat" and "cat".
       - [^b]at matches all strings matched by .at except "bat".
       - ^[hc]at matches "hat" and "cat", but only at the beginning of the string or line.
       - [hc]at$ matches "hat" and "cat", but only at the end of the string or line.
       - \[.\] matches any single character surrounded by "[" and "]", since the brackets are escaped; for example, "[a]" and "[b]".
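The examples above can be checked with a small helper. This is only a sketch: `MetacharacterSketch.Filter` is a hypothetical name, not something from the slides or from .NET. Note that `Regex.IsMatch` succeeds when the pattern matches anywhere in the input, so unanchored patterns also accept longer strings that merely contain a match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MetacharacterSketch
{
    // Returns, in their original order, the candidates in which the
    // pattern matches somewhere (Regex.IsMatch does not require the
    // whole string to match).
    public static string[] Filter(string pattern, params string[] candidates)
    {
        return candidates.Where(c => Regex.IsMatch(c, pattern)).ToArray();
    }
}
```

For instance, `MetacharacterSketch.Filter("[^b]at", "hat", "bat", "cat")` keeps "hat" and "cat", while `MetacharacterSketch.Filter(@"\[.\]", "[a]", "a", "[]")` keeps only "[a]". .NET follows the ERE-style spelling, so groups and counts are written `( )` and `{m,n}` without backslashes.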

  7. Continue.
     - POSIX ERE metacharacters
       - ?  Matches the preceding element zero or one time. For example, ba? matches "b" or "ba".
       - +  Matches the preceding element one or more times. For example, ba+ matches "ba", "baa", "baaa", and so on.
       - |  The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator.
     - Examples
       - [hc]+at matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", and so on, but not "at".
       - [hc]?at matches "hat", "cat", and "at".
       - cat|dog matches "cat" or "dog".
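One detail worth checking when writing alternations in .NET: whitespace inside a pattern is literal unless `RegexOptions.IgnorePatternWhitespace` is set, so spaces around `|` become part of the alternatives. The sketch below illustrates this along with the `+` and `?` quantifiers; the class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationSketch
{
    // Alternation without spaces: the alternatives are "cat" and "dog".
    public static bool Compact(string input) =>
        Regex.IsMatch(input, "^(cat|dog)$");

    // With spaces: the alternatives are "cat " and " dog".
    public static bool Spaced(string input) =>
        Regex.IsMatch(input, "^(cat | dog)$");

    // Same pattern text, but unescaped pattern whitespace is ignored.
    public static bool SpacedIgnoringWhitespace(string input) =>
        Regex.IsMatch(input, "^(cat | dog)$", RegexOptions.IgnorePatternWhitespace);

    // One or more of h/c before "at".
    public static bool OneOrMore(string input) =>
        Regex.IsMatch(input, "^[hc]+at$");

    // Zero or one of h/c before "at".
    public static bool ZeroOrOne(string input) =>
        Regex.IsMatch(input, "^[hc]?at$");
}
```

Here `Compact("cat")` is true, `Spaced("cat")` is false because the first alternative expects a trailing space, and `SpacedIgnoringWhitespace("cat")` is true again.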

  8. Continue.
     - POSIX character classes
       - Many ranges of characters depend on the locale settings (i.e., in some settings letters are organized as abc...zABC...Z, while in some others as aAbBcC...zZ).

       POSIX      .NET  ASCII            Description
       ---------  ----  ---------------  --------------------------------
       [:alnum:]        [A-Za-z0-9]      Alphanumeric characters
       [:word:]   \w    [A-Za-z0-9_]     Alphanumeric characters plus "_"
       [:alpha:]        [A-Za-z]         Alphabetic characters
       [:blank:]        [ \t]            Space and tab
       [:digit:]  \d    [0-9]            Digits
       [:space:]  \s    [ \t\r\n\v\f]    Whitespace characters
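In .NET the shorthand classes are Unicode-aware by default, so `\d` accepts more than the ASCII range `[0-9]` shown in the table; `RegexOptions.ECMAScript` narrows `\d`, `\w`, and `\s` to their ASCII equivalents. The sketch below shows the difference using Arabic-Indic digits (U+0663, U+0664); the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassSketch
{
    // Default behavior: \d matches any Unicode decimal digit.
    public static bool IsDigitsUnicode(string s) =>
        Regex.IsMatch(s, @"^\d+$");

    // ECMAScript-compatible behavior: \d is equivalent to [0-9].
    public static bool IsDigitsAscii(string s) =>
        Regex.IsMatch(s, @"^\d+$", RegexOptions.ECMAScript);

    // \w covers letters, digits, and the underscore.
    public static bool IsWord(string s) =>
        Regex.IsMatch(s, @"^\w+$");
}
```

`IsDigitsUnicode("\u0663\u0664")` is true while `IsDigitsAscii("\u0663\u0664")` is false; both accept "2024". When input must be plain ASCII digits, writing `[0-9]` explicitly is another option.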

  9. RE in .NET
     - .NET's regex flavor is feature-rich.
     - RegularExpressionValidator server control.
     - System.Text.RegularExpressions namespace.
     - Examples:

       Pattern                   Sample input      Description
       ------------------------  ----------------  ------------------------------------------------------
       ^[a-zA-Z''-'\s]{1,40}$    Ravi Mukkamala,   Allows 1 to 40 uppercase or lowercase letters,
                                 O'Dell            apostrophes, or whitespace characters.
       ^\d{3}-\d{2}-\d{4}$       000-11-2222       Allows only digits, in groups of 3, 2, and 4,
                                                   separated by hyphens.
       ^\d+$                     0, 123            Non-negative integer (zero is accepted).
       ^\d+(\.\d\d)?$            5.00              Positive currency amount. If there is a decimal point,
                                                   it requires 2 numeric characters after the decimal
                                                   point. For example, 3.00 is valid but 3.1 is not.
       ^(-)?\d+(\.\d\d)?$        -2.44             Positive or negative currency amount.
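The validation patterns listed on this slide can be wrapped in small helper methods. This is a sketch under the assumption that the input arrives as a plain string; the class, constant, and method names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InputValidationSketch
{
    // Patterns copied from the slide.
    private const string SsnPattern = @"^\d{3}-\d{2}-\d{4}$";
    private const string IntegerPattern = @"^\d+$";
    private const string CurrencyPattern = @"^\d+(\.\d\d)?$";
    private const string SignedCurrencyPattern = @"^(-)?\d+(\.\d\d)?$";

    public static bool IsSsn(string s) => Regex.IsMatch(s, SsnPattern);

    // Accepts "0" as well, since \d+ only requires one or more digits.
    public static bool IsInteger(string s) => Regex.IsMatch(s, IntegerPattern);

    // Either no decimal part or exactly two digits after the point.
    public static bool IsCurrency(string s) => Regex.IsMatch(s, CurrencyPattern);

    // Same as IsCurrency with an optional leading minus sign.
    public static bool IsSignedCurrency(string s) =>
        Regex.IsMatch(s, SignedCurrencyPattern);
}
```

`IsCurrency("3.00")` is true and `IsCurrency("3.1")` is false, as the slide describes. In .NET, `$` also matches just before a final newline, so `IsInteger("123\n")` is true; replacing `$` with `\z` is one way to reject that input if it matters.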

  10. Continue.
     - Using the RegularExpressionValidator server control

         <%@ language="C#" %>
         <form ID="form1" runat="server">
           <asp:TextBox ID="txtName" runat="server"/>
           <asp:Button ID="btnSubmit" runat="server" Text="Submit" />
           <asp:RegularExpressionValidator ID="regexpName" runat="server"
               ErrorMessage="This expression does not validate."
               ControlToValidate="txtName"
               ValidationExpression="^[a-zA-Z'.\s]{1,40}$" />
         </form>

  11. Continue.
     - Using System.Text.RegularExpressions
     - The Regex class
       - IsMatch: returns true if a match is found.
       - Match: returns a Match object for the first match; its Success property tells whether a match was found.
       - Matches: returns a MatchCollection object containing all matches.
       - Replace: replaces the matched strings with another string.
       - Split: splits a string at the positions matched by the regex and returns a String[].

         // Instance method:
         Regex reg = new Regex(@"^[a-zA-Z'.]{1,40}$");
         Response.Write(reg.IsMatch(txtName.Text));

         // Static method:
         if (!Regex.IsMatch(txtName.Text, @"^[a-zA-Z'.]{1,40}$"))
         {
             // Name does not match
         }

  12. Continue.
     - Regex.Replace

         using System;
         using System.Text.RegularExpressions;

         public class Example
         {
             public static void Main()
             {
                 // Runs of extra whitespace that the pattern should collapse.
                 string input = "This is   text with   far  too   much   " +
                                "whitespace.";
                 // Verbatim string, so that \s reaches the regex engine unchanged.
                 string pattern = @"\s+";
                 string replacement = " ";
                 Regex rgx = new Regex(pattern);
                 string result = rgx.Replace(input, replacement);

                 Console.WriteLine("Original String: {0}", input);
                 Console.WriteLine("Replacement String: {0}", result);
             }
         }
         // The example displays the following output:
         //    Original String: This is   text with   far  too   much   whitespace.
         //    Replacement String: This is text with far too much whitespace.

  13. Continue.
     - Regex.Split

         string input = @"07/14/2007";
         string pattern = @"(-)|(/)";
         Regex regex = new Regex(pattern);
         foreach (string result in regex.Split(input))
         {
             Console.WriteLine("'{0}'", result);
         }
         // Under .NET 1.0 and 1.1, the method returns an array of
         // 3 elements, as follows:
         //    '07'
         //    '14'
         //    '2007'
         // Under .NET 2.0 and later, the method returns an array of
         // 5 elements, because the captured delimiters are included:
         //    '07'
         //    '/'
         //    '14'
         //    '/'
         //    '2007'
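The extra elements in the newer behavior come from the capturing groups in the pattern: `Regex.Split` adds captured text to the result array. A minimal sketch of the contrast follows, with hypothetical class and method names; the character class `[-/]` is a substitute pattern chosen here, not one from the slides.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Pattern from the slide: the delimiters are captured, so they
    // appear in the returned array.
    public static string[] WithCaptures(string input) =>
        Regex.Split(input, "(-)|(/)");

    // Character class without capturing groups: only the pieces
    // between delimiters are returned.
    public static string[] WithoutCaptures(string input) =>
        Regex.Split(input, "[-/]");
}
```

`SplitSketch.WithoutCaptures("07/14/2007")` returns the three elements "07", "14", and "2007", which is usually what a date parser wants.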

  14. Continue.
     - Comments

         Regex regex = new Regex(@"
             ^            # anchor at the start
             (?=.*\d)     # must contain at least one numeric character
             (?=.*[a-z])  # must contain one lowercase character
             (?=.*[A-Z])  # must contain one uppercase character
             .{8,10}      # from 8 to 10 characters in length
             \s           # followed by one whitespace character
             $            # anchor at the end",
             RegexOptions.IgnorePatternWhitespace);
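The commented pattern can be exercised as follows. This is a sketch with invented names. As written on the slide, the `\s` element consumes one whitespace character, so a password matches only when it is followed by whitespace; the second pattern, which drops `\s`, is a hypothetical variant added here for comparison and is not from the deck.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordPatternSketch
{
    // The slide's pattern. IgnorePatternWhitespace lets the pattern span
    // several lines and carry # comments.
    private static readonly Regex SlidePattern = new Regex(@"
        ^            # anchor at the start
        (?=.*\d)     # lookahead: at least one digit
        (?=.*[a-z])  # lookahead: at least one lowercase letter
        (?=.*[A-Z])  # lookahead: at least one uppercase letter
        .{8,10}      # 8 to 10 characters
        \s           # one whitespace character
        $            # anchor at the end",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical variant without the trailing \s (an assumption about
    // what a stand-alone password check would need).
    private static readonly Regex VariantPattern = new Regex(
        @"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,10}$");

    public static bool MatchesSlidePattern(string s) => SlidePattern.IsMatch(s);

    public static bool MatchesVariant(string s) => VariantPattern.IsMatch(s);
}
```

`MatchesSlidePattern("Passw0rd ")` is true and `MatchesSlidePattern("Passw0rd")` is false, whereas `MatchesVariant("Passw0rd")` is true. The lookaheads check each requirement without consuming input, which is why they can all start at the same position.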

  15. Continue.
     - Match class: immutable and has no public constructor.

         // Search for a pattern that is not found in the input string.
         string pattern = "dog";
         string input = "The cat saw the other cats playing in the back yard.";
         Match match = Regex.Match(input, pattern);
         if (match.Success)
             // Report position as a one-based integer.
             Console.WriteLine("'{0}' was found at position {1} in '{2}'.",
                               match.Value, match.Index + 1, input);
         else
             Console.WriteLine("The pattern '{0}' was not found in '{1}'.",
                               pattern, input);

  16. Continue.
     - MatchCollection class: immutable and has no public constructor.

         static void Main(string[] args)
         {
             string input = "The quick brown dog jumps over the lazy dog";
             string pattern = "dog";
             MatchCollection matches = Regex.Matches(input, pattern);
             foreach (Match match in matches)
                 Console.WriteLine("'{0}' found at position '{1}'",
                                   match.Value, match.Index);
         }

       Output:
         'dog' found at position '16'
         'dog' found at position '40'

  17. Continue.
     - More examples

         // Create a Regex that accepts all URLs containing the host fragment www.odu.edu.
         Regex myRegex = new Regex(@"http://www\.odu\.edu/.*");
         // Create a WebPermission that gives permissions to all the hosts containing the same host fragment.
         WebPermission myWebPermission = new WebPermission(NetworkAccess.Connect, myRegex);
         // Check whether all callers higher in the call stack have been granted the permission.
         myWebPermission.Demand();

  18. Conclusion
     - Regex is your friend; use it.
     - Regex offers additional security measures.
     - .NET supports many regex features.

     Thank You

  19. References
     - http://www.regular-expressions.info
     - http://msdn.microsoft.com
