Regular expressions in Java
Regular expressions are used to search Strings for patterns, that is, specific sequences of characters. They are widely used to validate user input in web pages and to extract specific words or numbers from a much larger text. Enough talk... the best way to learn regex is to look at examples.
Example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String s = "555-535655";
        // three digits, a dash, then six digits, and nothing else
        String regex = "^[\\d]{3}-[\\d]{6}$";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(s);
        // find() looks for the pattern anywhere in s; ^ and $ pin it to the whole string
        if (m.find()) {
            System.out.println("Found");
        } else {
            System.out.println("Not found");
        }
    }
}
Rules
Characters
x - the character x
\\ - backslash character
\xhh - the character with hexadecimal value 0xhh
\u0009 - tab character in unicode
\t - tab character (\x09)
\n - newline (line feed) character (\x0A)
\r - the carriage-return character (\x0D)
Character classes
[abc] - a, b or c
[^abc] - any character except a, b or c (negation)
[a-zA-Z] - from a to z or A to Z (range)
Predefined character classes
. - any character (may or may not match line terminators)
\d - digit [0-9]
\D - non-digit [^0-9]
\s - whitespace character [ \t\n\x0B\f\r]
\S - non-whitespace character [^\s]
\w - word character [a-zA-Z_0-9]
\W - non-word character [^\w]
Quantifiers
X? - X, once or not at all
X* - X, zero or more times
X+ - X, one or more times
X{n} - X, exactly n times
X{n,m} - X, at least n times but no more than m times
Boundary matchers
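^ - beginning of line (beginning of the whole input unless Pattern.MULTILINE is set)
$ - end of line (end of the whole input unless Pattern.MULTILINE is set)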
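To see the quantifiers from the list above in action, here is a minimal sketch. The class name QuantifierDemo and the helper fullMatch are made up for illustration; Pattern.matches is the standard library call that requires the whole input to match.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: true only if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));   // true:  X? makes the "u" optional
        System.out.println(fullMatch("ab*c", "ac"));         // true:  X* accepts zero "b"
        System.out.println(fullMatch("ab+c", "ac"));         // false: X+ needs at least one "b"
        System.out.println(fullMatch("\\d{3}", "123"));      // true:  X{n}, exactly three digits
        System.out.println(fullMatch("\\w{2,4}", "abcde"));  // false: X{n,m}, five characters is too many
    }
}
```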
In Java string literals the backslash is itself an escape character, so every backslash in a regular expression has to be written as a double backslash ‘\\’ in the String. The regex \d, for example, becomes "\\d" in code. Curly braces ‘{}’ are reserved characters in regular expressions, where they give the number of repetitions. To search for a literal ‘{’ the regex needs ‘\{’, and since a single ‘\’ cannot be written as-is in a Java String, you end up with ‘\\{’.
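The escaping rules can be tried out with a short sketch like the one below. The class name EscapeDemo and the helper found are invented for this example; Pattern.quote is a standard method that escapes a whole string so it is matched literally, which can be handy when the text to search for is full of special characters.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The Java literal "\\{" is the two-character regex \{ , a literal brace.
        System.out.println(found("\\{", "a{b"));                       // true
        // "\\\\" is the regex \\ , which matches one literal backslash.
        System.out.println(found("\\\\", "C:\\temp"));                 // true
        // Pattern.quote escapes the whole string, so a{2} is searched for as text.
        System.out.println(found(Pattern.quote("a{2}"), "x a{2} y"));  // true
        // Unquoted, a{2} means "a twice", which the text does not contain.
        System.out.println(found("a{2}", "x a{2} y"));                 // false
    }
}
```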
Basic examples
String regex = "abc";
Matches:
abc
aaabccc
Doesn’t match:
aab
acb
String regex = "a.c";
Matches:
axc
a4c
aaa-c
Doesn’t match:
affc
String regex = "\\d\\d-\\d\\d\\d";
Matches:
22-333
a23-456789yyyy
Doesn’t match:
a2-333
22a-333
55_777
String regex = "\\D\\d-\\D\\d";
Matches:
a2-P7
aaa4-Q8ssss
Doesn’t match:
7a-b4
x2-PP2
String regex = "...\\s+...";
Matches:
aaa bb9
123 p9p9p9p9p9
Doesn’t match:
asdfasdf
ddd-bbb
String regex = "a[xyzQ]c";
Matches:
axc
ayc
aQc
aaazccc
Doesn’t match:
acc
axyc
aaa Qccc
String regex = "[a-fB-F][0-5]";
Matches:
a2
a320
F4
Doesn’t match:
xx
A1
b737
String regex = "[^aA]G";
Matches:
bG
123Grrrr
Doesn’t match:
aG
AG
String regex = "\\d{1}[abc]{2}";
Matches:
1ab
11ccx
Doesn’t match:
a1
1a
1xa
String regex = "\\d{2}-?\\d{2}-?.";
Matches:
11-22-a
1122-aaa
12345678
Doesn’t match:
1-2-3
String regex = "((\\d{3})\\s|(\\(\\d{3}\\)\\s?))?\\d{3}-\\d{3}";
Matches:
(555) 555-123
555 555-123
555-123
aa 555-123
Doesn’t match:
555 555
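The "Matches" lists above assume the pattern may be found anywhere inside the text, which is what Matcher.find() does. A few of the examples can be checked with a small sketch like this one (the class name BasicExamplesCheck and the helper found are made up for illustration):

```java
import java.util.regex.Pattern;

public class BasicExamplesCheck {
    // Hypothetical helper: true if the regex is found anywhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("abc", "aaabccc"));                      // true:  "abc" sits in the middle
        System.out.println(found("a.c", "affc"));                         // false: two characters between a and c
        System.out.println(found("\\d\\d-\\d\\d\\d", "a23-456789yyyy"));  // true:  "23-456"
        System.out.println(found("[a-fB-F][0-5]", "b737"));               // false: no listed letter followed by 0-5
        System.out.println(found("\\d{2}-?\\d{2}-?.", "12345678"));       // true:  both dashes are optional

        String phone = "((\\d{3})\\s|(\\(\\d{3}\\)\\s?))?\\d{3}-\\d{3}";
        System.out.println(found(phone, "(555) 555-123"));                // true
        System.out.println(found(phone, "555 555"));                      // false: the dash is required
    }
}
```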
Boundary matchers
With boundary matchers you can specify that a string must
begin and/or end with the given regular expression. An
expression like \d\d will find a match in 12 and in A12A as
well, while ^\d\d$ says that the string must consist of
exactly two digits. In this case A12A will fail to validate.
The ‘^’ sign means beginning of string, and ‘$’ means end of
string.
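The difference between the anchored and the unanchored pattern can be sketched as follows. BoundaryDemo and found are hypothetical names; Pattern.matches is the standard call that tests the whole input, so it behaves like the anchored pattern even without ‘^’ and ‘$’.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("\\d\\d", "12"));              // true
        System.out.println(found("\\d\\d", "A12A"));            // true:  "12" is found inside
        System.out.println(found("^\\d\\d$", "12"));            // true
        System.out.println(found("^\\d\\d$", "A12A"));          // false: letters around the digits
        System.out.println(found("^\\d\\d$", "123"));           // false: one digit too many
        // matches() must consume the whole input, anchors or not
        System.out.println(Pattern.matches("\\d\\d", "A12A"));  // false
    }
}
```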
Allow only numbers in the range from 0 to 125
String regex = "^([0-9]|[1-9][0-9]|1[0-1][0-9]|12[0-5])$";
Matches:
any number from 0 to 125; the four alternatives cover 0 - 9, 10 - 99, 100 - 119 and 120 - 125
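One way to gain confidence in a range pattern like this is to compare it against a plain numeric check. The sketch below does that for 0 to 200; the class name RangeCheck and the helper inRange are invented for the example.

```java
import java.util.regex.Pattern;

public class RangeCheck {
    static final Pattern RANGE =
            Pattern.compile("^([0-9]|[1-9][0-9]|1[0-1][0-9]|12[0-5])$");

    // Hypothetical helper: true if s is a number from 0 to 125 without leading zeros.
    static boolean inRange(String s) {
        return RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        int mismatches = 0;
        for (int n = 0; n <= 200; n++) {
            // the regex verdict should agree with the numeric comparison
            if (inRange(String.valueOf(n)) != (n <= 125)) {
                mismatches++;
            }
        }
        System.out.println("mismatches: " + mismatches);  // mismatches: 0
        System.out.println(inRange("007"));                // false: leading zeros are not allowed
    }
}
```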
Validating date with regular expression
Date format: dd/mm/yyyy
String regex = "^([1-9]|[0][1-9]|[12][0-9]|3[01])/([1-9]|[0][1-9]|1[012])/(19|20)\\d\\d$";
Days can have values from 1 to 9 or 01 - 09 or 10 - 29 or
30 and 31. Then comes the delimiter ‘/’.
Months can have values from 1 to 9 or 01 - 09 or 10 - 12,
followed by delimiter ‘/’.
Years at the end require four digits: first two digits must
be 19 or 20 followed by any two digits at the end.
Matches:
13/4/2011
04/08/1900
31/02/2099 (the pattern checks the format only, not whether the day exists in that month)
Doesn’t match:
0/1/2000
5/123/2001
15/12/3333
31-12-1999
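Since the pattern only checks the format, a date like 31/02/2099 still gets through. If that matters, one possible approach (an assumption of this sketch, not part of the pattern above) is to hand the parsed numbers to java.time.LocalDate, which rejects days that do not exist. The class name DateCheck and both helpers are made up for illustration.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Pattern;

public class DateCheck {
    static final Pattern DATE = Pattern.compile(
            "^([1-9]|[0][1-9]|[12][0-9]|3[01])/([1-9]|[0][1-9]|1[012])/(19|20)\\d\\d$");

    // Hypothetical helper: format check only, using the regex from the text.
    static boolean hasDateFormat(String s) {
        return DATE.matcher(s).matches();
    }

    // Hypothetical helper: format check plus a calendar check.
    static boolean isRealDate(String s) {
        if (!hasDateFormat(s)) {
            return false;
        }
        String[] parts = s.split("/");  // day, month, year
        try {
            LocalDate.of(Integer.parseInt(parts[2]),
                         Integer.parseInt(parts[1]),
                         Integer.parseInt(parts[0]));
            return true;
        } catch (DateTimeException e) {
            return false;               // e.g. 31 February
        }
    }

    public static void main(String[] args) {
        System.out.println(hasDateFormat("13/4/2011"));   // true
        System.out.println(hasDateFormat("31-12-1999"));  // false: wrong delimiter
        System.out.println(hasDateFormat("31/02/2099"));  // true:  format is fine
        System.out.println(isRealDate("31/02/2099"));     // false: February has no day 31
    }
}
```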
Validating email
Email format: firstname.lastname@company.something
Validating IP address
IP format: nnn.nnn.nnn.nnn