Regular expressions in Java

Regular expressions describe patterns, specific sequences of characters, that you can search for in Strings. They are widely used to validate user input in web pages, or to extract specific words or numbers from a much larger text. Cut the crap... the best way to learn regex is to look at examples.

Example:

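import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {

        String s = "555-535655";
        // Exactly three digits, a dash, then exactly six digits.
        String regex = "^[\\d]{3}-[\\d]{6}$";

        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(s);

        // find() looks for the pattern anywhere in s; the ^ and $ anchors
        // force the match to cover the whole string.
        if (m.find()) {
            System.out.println("Found");
        } else {
            System.out.println("Not found");
        }
    }

}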

Rules

Characters

x - the character x
\\ - backslash character
\xhh - the character with hexadecimal value 0xhh
\uhhhh - the character with hexadecimal value 0xhhhh (e.g. \u0009 is the tab character)
\t - tab character (\x09)
\n - newline (line feed) character (\x0A)
\r - the carriage-return character (\x0D)
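Several of these escapes name the same character, which is easy to check against a tab. A minimal sketch; the class name EscapeDemo is illustrative. Note that every regex backslash is doubled inside the Java string literals.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True if the regex matches the entire input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String tab = "\t";
        // Three regex spellings of the same tab character (0x09).
        System.out.println(fullMatch("\\t", tab));     // true
        System.out.println(fullMatch("\\x09", tab));   // true
        System.out.println(fullMatch("\\u0009", tab)); // true
        // The Java literal "\\\\" is the regex \\, which matches one backslash.
        System.out.println(fullMatch("\\\\", "\\"));   // true
    }
}
```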

Character classes

[abc] - a, b or c
[^abc] - any character except a, b or c (negation)
[a-zA-Z] - from a to z or A to Z (range)
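Character classes are handy with String.replaceAll and Matcher.find as well as for validation. A small sketch; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassDemo {
    // Counts the characters in s that belong to the class [abc].
    static int countAbc(String s) {
        int n = 0;
        Matcher m = Pattern.compile("[abc]").matcher(s);
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Deletes every character matched by the negated class [^a-zA-Z],
    // leaving only ASCII letters.
    static String lettersOnly(String s) {
        return s.replaceAll("[^a-zA-Z]", "");
    }

    public static void main(String[] args) {
        System.out.println(countAbc("abracadabra"));    // 8
        System.out.println(lettersOnly("R2-D2 & C-3PO")); // RDCPO
    }
}
```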

Predefined character classes

. - any character (by default it does not match line terminators; it does when the DOTALL flag is set)
\d - digit [0-9]
\D - non-digit [^0-9]
\s - whitespace character [ \t\n\x0B\f\r]
\S - non-whitespace character [^\s]
\w - word character [a-zA-Z_0-9]
\W - non-word character [^\w]
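A common use of the predefined classes is stripping unwanted characters, for example pulling the digits out of a phone number. A sketch with illustrative names; it relies only on String.replaceAll.

```java
class PredefinedDemo {
    // \D matches any non-digit, so deleting every \D leaves only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \W matches any non-word character; the underscore counts as a word character.
    static String wordCharsOnly(String s) {
        return s.replaceAll("\\W", "");
    }

    // \s matches space, tab, newline, vertical tab, form feed and carriage return.
    static String noWhitespace(String s) {
        return s.replaceAll("\\s", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-535655"));     // 555535655
        System.out.println(wordCharsOnly("hello, world_42!")); // helloworld_42
        System.out.println(noWhitespace("a b\tc\nd"));         // abcd
    }
}
```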

Quantifiers

X? - X, once or not at all
X* - X, zero or more times
X+ - X, one or more times
X{n} - X, exactly n times
X{n,m} - X, at least n times but no more than m times
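To see the quantifiers at work, it helps to collect every match in a text. A sketch; findAll is a hypothetical helper, not a JDK method. Quantifiers are greedy by default, which is why \d{2,3} takes three digits first and leaves the last two for a second match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collects every non-overlapping match of regex in text, left to right.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("colou?r", "color colour"));          // [color, colour]
        System.out.println(findAll("ab*", "a ab abbb"));                 // [a, ab, abbb]
        System.out.println(findAll("\\d+", "Order 66 shipped 3 items")); // [66, 3]
        System.out.println(findAll("\\d{2,3}", "12345"));                // [123, 45]
    }
}
```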

Boundary matchers

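^ - beginning of line (without the MULTILINE flag: the beginning of the whole input)
$ - end of line (without the MULTILINE flag: the end of the whole input)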

In Java some characters already have a special meaning, so you will have to use ‘escape characters’ to construct your regular expression properly. Inside a Java String literal the backslash is itself an escape character, so every single backslash ‘\’ in a regex must be written as a double backslash ‘\\’. Curly braces ‘{}’ are special for a different reason: they are reserved in regular expressions to indicate how many times something repeats. To search for a literal ‘{’ the regex needs ‘\{’, and because that backslash must in turn be doubled in a Java String, you end up writing ‘\\{’.
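The difference between a quantifier and a literal brace can be checked directly. A minimal sketch with an illustrative class name; it also shows Pattern.quote, a JDK method that turns any text into a literal pattern so you do not have to escape it by hand.

```java
import java.util.regex.Pattern;

class BraceDemo {
    // True if the regex matches the entire string.
    static boolean fullMatch(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        // Unescaped, {3} is a quantifier: the regex a{3} means "aaa".
        System.out.println(fullMatch("a{3}", "aaa"));                  // true
        // Escaped, the regex a\{3\} matches the literal text a{3}.
        System.out.println(fullMatch("a\\{3\\}", "a{3}"));             // true
        // Pattern.quote makes every character of its argument literal.
        System.out.println(fullMatch(Pattern.quote("a{3}"), "a{3}"));  // true
        System.out.println(fullMatch(Pattern.quote("a{3}"), "aaa"));   // false
    }
}
```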

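Basic examples

In these examples a string “matches” if the pattern occurs anywhere in it, which is what Matcher.find() checks in the first example.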

String regex = "abc";

Matches:
abc
aaabccc

Doesn’t match:
aab
acb
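Matcher.matches() is stricter than find(): it succeeds only when the entire string fits the pattern. A sketch comparing the two on the "abc" examples; the helper names are illustrative.

```java
import java.util.regex.Pattern;

class FindVsMatches {
    // True if regex occurs anywhere in s.
    static boolean found(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    // True only if regex matches all of s.
    static boolean matchesWhole(String regex, String s) {
        return Pattern.compile(regex).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(found("abc", "aaabccc"));        // true
        System.out.println(matchesWhole("abc", "aaabccc")); // false
        System.out.println(matchesWhole("abc", "abc"));     // true
    }
}
```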

String regex = "a.c";

Matches:
axc
a4c
aaa-c

Doesn’t match:
affc

String regex = "\\d\\d-\\d\\d\\d";

Matches:
22-333
a23-456789yyyy

Doesn’t match:
a2-333
22a-333
55_777
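When the pattern is found inside a longer string, Matcher.group() returns just the part that matched, which is how you extract numbers from a larger text. A sketch; firstMatch is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatchDemo {
    // Returns the part of s matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d\\d-\\d\\d\\d", "a23-456789yyyy")); // 23-456
        System.out.println(firstMatch("\\d\\d-\\d\\d\\d", "a2-333"));         // null
    }
}
```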

String regex = "\\D\\d-\\D\\d";

Matches:
a2-P7
aaa4-Q8ssss

Doesn’t match:
7a-b4
x2-PP2

String regex = "...\\s+...";

Matches:
aaa bb9
123   p9p9p9p9p9

Doesn’t match:
asdfasdf
ddd-bbb

String regex = "a[xyzQ]c";

Matches:
axc
ayc
aQc
aaazccc

Doesn’t match:
acc
axyc
aaa Qccc

String regex = "[a-fB-F][0-5]";

Matches:
a2
a320
F4

Doesn’t match:
xx
A1
b737

String regex = "[^aA]G";

Matches:
bG
123Grrrr

Doesn’t match:
aG
AG

String regex = "\\d{1}[abc]{2}";

Matches:
1ab
11ccx

Doesn’t match:
a1
1a
1xa

String regex = "\\d{2}-?\\d{2}-?.";

Matches:
11-22-a
1122-aaa
12345678

Doesn’t match:
1-2-3

String regex = "((\\d{3})\\s|(\\(\\d{3}\\)\\s?))?\\d{3}-\\d{3}";

Matches:
(555) 555-123
555 555-123
555-123
aa 555-123

Doesn’t match:
555 555
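The parentheses in the phone pattern also create capturing groups, numbered by their opening parenthesis: group 1 is the optional prefix, group 2 the three digits without brackets, group 3 the bracketed form. A sketch (illustrative names) that reports which groups took part; a group that did not participate returns null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneGroups {
    static final Pattern PHONE =
            Pattern.compile("((\\d{3})\\s|(\\(\\d{3}\\)\\s?))?\\d{3}-\\d{3}");

    // Returns {whole match, group 1, group 2, group 3}, or null if nothing is found.
    static String[] groups(String s) {
        Matcher m = PHONE.matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(), m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] inputs = {"(555) 555-123", "555 555-123", "aa 555-123", "555 555"};
        for (String s : inputs) {
            String[] g = groups(s);
            System.out.println(s + " -> " + (g == null ? "no match" : String.join(" | ", g)));
        }
    }
}
```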

Boundary matchers

With boundary matchers you can require that a string begins and/or ends with a given regular expression. The expression \d\d matches both 12 and A12A, because it only needs two digits somewhere in the string, whereas ^\d\d$ requires the string to consist of exactly two digits, so A12A fails to validate.
The ‘^’ sign means beginning of string, and ‘$’ means end of string.
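The effect of the anchors can be checked with find(), which otherwise accepts a match anywhere in the string. A minimal sketch with an illustrative class name.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // True if regex occurs anywhere in s (Matcher.find semantics).
    static boolean found(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(found("\\d\\d", "A12A"));   // true: "12" occurs inside
        System.out.println(found("^\\d\\d$", "A12A")); // false: must be exactly two digits
        System.out.println(found("^\\d\\d$", "12"));   // true
    }
}
```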

Only numbers in the range 0 to 125

String regex = "^([0-9]|[1-9][0-9]|1[0-1][0-9]|12[0-5])$";

Matches:
any number from 0 to 9, 10 to 99, 100 to 119 or 120 to 125, written without leading zeros
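A regex for a numeric range is easy to get subtly wrong, so it is worth cross-checking it against plain integer comparison. A sketch (class name illustrative) that tests every number from 0 to 200; it also shows that leading zeros such as 007 are rejected.

```java
import java.util.regex.Pattern;

class RangeDemo {
    static final Pattern UP_TO_125 =
            Pattern.compile("^([0-9]|[1-9][0-9]|1[0-1][0-9]|12[0-5])$");

    static boolean inRange(String s) {
        return UP_TO_125.matcher(s).find();
    }

    public static void main(String[] args) {
        // Compare the regex with n <= 125 for every n in 0..200.
        for (int n = 0; n <= 200; n++) {
            if (inRange(Integer.toString(n)) != (n <= 125)) {
                throw new AssertionError(n);
            }
        }
        System.out.println("0..200 agree with n <= 125");
        System.out.println(inRange("007")); // false: leading zeros are rejected
    }
}
```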

Validating date with regular expression

Date format: dd/mm/yyyy

String regex = "^([1-9]|[0][1-9]|[12][0-9]|3[01])/([1-9]|[0][1-9]|1[012])/(19|20)\\d\\d$";

Days can be 1 to 9, 01 to 09, 10 to 29, 30 or 31, followed by the delimiter ‘/’.
Months can be 1 to 9, 01 to 09 or 10 to 12, followed by the delimiter ‘/’.
The year must have four digits: the first two must be 19 or 20, followed by any two digits.
The regex checks only the shape of the date, not the calendar, so an impossible date such as 31/02/2099 still matches.

Matches:
13/4/2011
04/08/1900
31/02/2099

Doesn’t match:
0/1/2000
5/123/2001
15/12/3333
31-12-1999
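Since a regex cannot know how many days each month has, one option is to check the format with the regex and then let java.time confirm that the date actually exists. This is a sketch under that assumption: the class name DateDemo and the two-step design are illustrative, and the strict java.time parsing is a swapped-in technique, not part of the regex approach above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Pattern;

class DateDemo {
    static final Pattern DATE = Pattern.compile(
            "^([1-9]|[0][1-9]|[12][0-9]|3[01])/([1-9]|[0][1-9]|1[012])/(19|20)\\d\\d$");

    // d and M also accept the one-digit forms the regex allows; uuuu is the year.
    // ResolverStyle.STRICT rejects days that do not exist, such as 31 February.
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("d/M/uuuu").withResolverStyle(ResolverStyle.STRICT);

    // Step 1: the shape check from the regex alone.
    static boolean formatOk(String s) {
        return DATE.matcher(s).matches();
    }

    // Step 2: the shape check plus a calendar check.
    static boolean isRealDate(String s) {
        if (!formatOk(s)) {
            return false;
        }
        try {
            LocalDate.parse(s, FMT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"13/4/2011", "04/08/1900", "31/02/2099", "31-12-1999"}) {
            System.out.println(s + ": format " + formatOk(s) + ", real date " + isRealDate(s));
        }
    }
}
```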

Validating email

Email format: firstname.lastname@company.something
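For the simplified format above, a pattern can be assembled from the pieces already covered. This is only a sketch under stated assumptions: names and the domain label use ASCII letters (the domain may also contain digits and hyphens), and it is not a validator for real-world email addresses. The class name EmailDemo is illustrative.

```java
import java.util.regex.Pattern;

class EmailDemo {
    // Assumed shape: letters "." letters "@" domain "." letters.
    // Covers only firstname.lastname@company.something, not every valid email.
    static final Pattern EMAIL =
            Pattern.compile("^[a-zA-Z]+\\.[a-zA-Z]+@[a-zA-Z0-9-]+\\.[a-zA-Z]+$");

    static boolean isValid(String s) {
        return EMAIL.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("john.smith@company.com")); // true
        System.out.println(isValid("john@company.com"));       // false: no dot in the name
    }
}
```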

Validating IP address

IP format: nnn.nnn.nnn.nnn
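Each nnn part of an IPv4 address is a number from 0 to 255, so the same alternation idea used for the 0 to 125 range works here. A sketch assuming decimal dotted notation without leading zeros; the class and constant names are illustrative.

```java
import java.util.regex.Pattern;

class IpDemo {
    // One part: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
    static final String OCTET = "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])";
    // Four parts separated by literal dots.
    static final Pattern IPV4 = Pattern.compile("^" + OCTET + "(\\." + OCTET + "){3}$");

    static boolean isValid(String s) {
        return IPV4.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("192.168.0.1")); // true
        System.out.println(isValid("256.1.1.1"));   // false: 256 is out of range
    }
}
```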