Java Regex – Regular Expression Examples

Java Regex, also known as Regular Expression, is a powerful tool used for manipulating and managing strings. It consists of a sequence of characters that form a pattern, which can be used to match and locate specific strings within a larger text. In Java, regular expression functionality is provided through the java.util.regex package, which includes three classes and one interface.

  • The Pattern class – It serves as a central component of Java’s regular expression support. The Pattern class does not have any public constructors. Instead, we obtain an object of the Pattern class by using its static method compile().
  • The Matcher class – It is used for performing match operations against an input string based on a specified pattern. Similar to the Pattern class, the Matcher class does not provide public constructors. Instead, we obtain a Matcher object by invoking the matcher() method on a Pattern object.
  • The PatternSyntaxException class – It is an unchecked exception that indicates a syntax error in a regular expression pattern. If there is any issue with the syntax of a regular expression, a PatternSyntaxException is thrown, providing information about the specific syntax error encountered.
  • The MatchResult interface – The MatchResult interface represents the result of a match operation performed by a Matcher object. This interface provides methods to retrieve information about the matched text, as well as capturing groups within the regular expression.
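
As a minimal sketch of how these four types fit together, the snippet below compiles a pattern, catches a PatternSyntaxException for a malformed one, and takes a MatchResult snapshot from a Matcher. The class name RegexTypesDemo, its helper methods, and the sample strings are illustrative assumptions, not part of the original tutorial.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexTypesDemo {

    // Returns true if the regex compiles; a malformed regex raises PatternSyntaxException.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Finds the first run of digits and returns an immutable snapshot of that match.
    static MatchResult firstNumber(String input) {
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        if (!matcher.find()) {
            throw new IllegalArgumentException("no digits in: " + input);
        }
        return matcher.toMatchResult();
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+")); // true
        System.out.println(isValidRegex("[a-z"));   // false (unclosed character class)

        MatchResult result = firstNumber("order 42 shipped");
        System.out.println(result.group() + " at " + result.start() + "-" + result.end()); // 42 at 6-8
    }
}
```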

Regex Pattern Class

The Pattern class in Java is a fundamental component of the java.util.regex package, offering a rich set of functionality for working with regular expressions. This class provides methods to compile and manipulate regular expressions, as well as perform matching and searching operations on input strings. Let’s explore some of the key methods provided by the Pattern class:

  1. compile(String regex): This static method is used to compile a regular expression pattern into a Pattern object.
    String regex = "\\d+"; // Matches one or more digits
    Pattern pattern = Pattern.compile(regex);
  2. compile(String regex, int flags): Similar to the previous compile method, this variant allows you to specify additional flags or modifiers for the regular expression.
    String regex = "java"; // With CASE_INSENSITIVE, also matches "Java" and "JAVA"
    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    
  3. matcher(CharSequence input): Returns a Matcher object that matches the given input against the pattern.
    String input = "12345";
    Matcher matcher = pattern.matcher(input);
    
  4. split(CharSequence input): Splits the input sequence around occurrences of the pattern and returns an array of strings.
    String input = "apple,banana,orange";
    String[] fruits = Pattern.compile(",").split(input); // ["apple", "banana", "orange"]
    
  5. split(CharSequence input, int limit): Similar to the previous split method, but a positive limit caps the length of the returned array, so the pattern is applied at most limit - 1 times.
    String input = "apple,banana,orange";
    String[] fruits = Pattern.compile(",").split(input, 2); // ["apple", "banana,orange"]
    
  6. matches(): Strictly speaking a Matcher method, reached here through pattern.matcher(input). It attempts to match the entire input sequence against the pattern and returns true only if the whole input matches. (Pattern itself offers the static shortcut Pattern.matches(String regex, CharSequence input).)
    String input = "Hello, World!";
    boolean isMatch = pattern.matcher(input).matches();
    
  7. find(): Also a Matcher method. It scans the input sequence for the next subsequence that matches the pattern and returns true if one is found.
    String input = "The quick brown fox";
    boolean isMatch = pattern.matcher(input).find();
    
  8. replaceAll(String replacement): Another Matcher method. It replaces every occurrence of the pattern in the input sequence with the specified replacement string and returns the modified string.
    String input = "Hello, World!";
    String modified = pattern.matcher(input).replaceAll("Hi");
    
  9. quote(String s): Returns a literal pattern String for the specified input string, treating any special characters as literals.
    String literalPattern = Pattern.quote("[Java]");
    
  10. pattern(): Returns the regular expression pattern used to create the Pattern object.
    String patternString = pattern.pattern();
    
  11. flags(): Returns the match flags that were specified when the pattern was compiled.
    int flags = pattern.flags();
    
  12. toString(): Returns the string representation of the Pattern object.
    String patternString = pattern.toString();
    

These are just a few of the methods provided by the Pattern class. Each method serves a specific purpose, enabling developers to perform various operations on regular expressions and input strings. Understanding and utilizing these methods effectively can greatly enhance your regex-based string manipulation tasks.
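
The following sketch pulls several of the Pattern methods above into one runnable program. The class name PatternMethodsDemo, the helper names, and the sample inputs are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternMethodsDemo {

    private static final Pattern COMMA = Pattern.compile(",");

    // Splits on every comma.
    static String[] splitAll(String csv) {
        return COMMA.split(csv);
    }

    // Splits into at most 'limit' pieces; the last piece keeps any remaining commas.
    static String[] splitAtMost(String csv, int limit) {
        return COMMA.split(csv, limit);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAll("apple,banana,orange")));       // [apple, banana, orange]
        System.out.println(Arrays.toString(splitAtMost("apple,banana,orange", 2))); // [apple, banana,orange]

        Pattern anyCase = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        System.out.println(anyCase.matcher("JAVA").matches());           // true
        System.out.println(anyCase.flags() == Pattern.CASE_INSENSITIVE); // true
        System.out.println(anyCase.pattern());                           // java

        String literal = Pattern.quote("[Java]");
        System.out.println(Pattern.matches(literal, "[Java]"));  // true: brackets are treated literally
        System.out.println(Pattern.matches("[Java]", "[Java]")); // false: unquoted, it is a one-character class
    }
}
```

Compiling the comma pattern once and storing it in a constant avoids recompiling it on every call, which is the usual reason to prefer a Pattern object over the one-off static helpers.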

Regex Matcher Class

The Matcher class in Java complements the Pattern class by providing methods to perform matching operations against input strings based on a specified pattern. This class allows you to apply regular expressions and perform various operations such as matching, finding, and replacing strings. Let’s explore some of the key methods provided by the Matcher class:

  1. matches(): Attempts to match the entire input sequence against the pattern associated with the Matcher object and returns a boolean value indicating a match or not.
    Matcher matcher = pattern.matcher("Hello, World!");
    boolean isMatch = matcher.matches(); // no arguments: the input was supplied to matcher()
    
  2. find(): Scans the input sequence to find the next occurrence of the pattern associated with the Matcher object and returns a boolean value indicating a match or not.
    Matcher matcher = pattern.matcher("The quick brown fox");
    boolean isMatch = matcher.find(); // no arguments: the input was supplied to matcher()
    
  3. group(): Returns the input subsequence matched by the previous match operation. The overload group(int group) returns the text captured by a specific capturing group.
    String match = matcher.group();
    
  4. start(): Returns the start index of the previous match.
    int startIndex = matcher.start();
    
  5. end(): Returns the end index (exclusive) of the previous match.
    int endIndex = matcher.end();
    
  6. replaceFirst(String replacement): Replaces the first occurrence of the pattern in the input sequence with the specified replacement string and returns the modified string.
    String modified = matcher.replaceFirst("Hi");
    
  7. replaceAll(String replacement): Replaces every occurrence of the pattern in the input sequence with the specified replacement string and returns the modified string.
    String modified = matcher.replaceAll("Hi");
    
  8. reset(CharSequence input): Resets the Matcher object with a new input sequence, allowing you to reuse the same Matcher instance for multiple match operations.
    String newInput = "New input string";
    matcher.reset(newInput);
    
  9. reset(): Resets the Matcher object, clearing any state information, allowing you to reuse the same Matcher instance for another match operation.
    matcher.reset();
    
  10. groupCount(): Returns the number of capturing groups in the pattern associated with the Matcher object.
    int groupCount = matcher.groupCount();
    

These are some of the essential methods offered by the Matcher class, allowing you to perform pattern matching and manipulation operations on input strings. Understanding and utilizing these methods effectively can greatly enhance your regular expression-based tasks.
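
To see these Matcher methods working together, here is a small sketch. The class name MatcherMethodsDemo, the helper describeMatches, and the sample strings are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {

    // Collects "text@start-end" for every match of the regex in the input.
    static List<String> describeMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("\\d+", "a1 bb22 ccc333")); // [1@1-2, 22@5-7, 333@11-14]

        Matcher matcher = Pattern.compile("(\\w+)@(\\w+)").matcher("alice@home bob@work");
        System.out.println(matcher.groupCount());        // 2
        System.out.println(matcher.replaceFirst("X"));   // X bob@work
        System.out.println(matcher.replaceAll("X"));     // X X

        // Reuse the same Matcher with a new input.
        matcher.reset("carol@gym");
        System.out.println(matcher.matches() + " " + matcher.group(1) + " " + matcher.group(2)); // true carol gym
    }
}
```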

The following sections work through code examples that use several of the methods described above, with an explanation of each example's behavior and output.

Using Regex in Java – Examples

Example 1: Matching Single Character Using Dot (.)

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

  public static void main(String[] args) {

    Pattern p = Pattern.compile(".d"); // . (DOT) is for single character
    Matcher matcher = p.matcher("zd");
    System.out.println(matcher.matches());

  }
}

Output:

true
  • The code snippet demonstrates how to use the dot (.) in a regular expression pattern to match a single character.
  • The regular expression pattern ".d" is created using the Pattern.compile() method. The dot (.) in the pattern represents any single character, while the letter ‘d’ represents a literal ‘d’ character.
  • The matcher object is created by invoking the p.matcher("zd") method, which matches the input string “zd” against the pattern.
  • The matcher.matches() method is called to perform a match operation on the entire input string. It returns a boolean value indicating whether the entire input sequence matches the pattern or not.
  • In this example, the pattern “.d” matches the input string “zd” because the dot (.) matches any character, and the second character in the input string is ‘d’.
  • The output of matcher.matches() is printed, which will be true in this case, indicating a successful match.

This code example showcases how to use the dot (.) in a regular expression to match a single character. It demonstrates the basic usage of the Pattern and Matcher classes to compile a pattern, create a matcher object, and perform a match operation on an input string.
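
A slightly broader sketch of the same pattern shows where the dot stops matching. It assumes a hypothetical helper matchesDotD in a class named DotDemo, and uses the standard Pattern.DOTALL flag to let the dot match line terminators.

```java
import java.util.regex.Pattern;

class DotDemo {

    // Matches the whole input against ".d", optionally in DOTALL mode.
    static boolean matchesDotD(String input, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile(".d", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesDotD("zd", false));  // true
        System.out.println(matchesDotD("zzd", false)); // false: three characters, the pattern needs exactly two
        System.out.println(matchesDotD("\nd", false)); // false: the dot skips line terminators by default
        System.out.println(matchesDotD("\nd", true));  // true: DOTALL lets the dot match the newline
    }
}
```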

Example 2: Extracting Numbers from a String

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

  public static void main(String[] args) {

    String stringToBeMatched = "London is the number 1 city in UK";
    String pattern = "(.*)(\\d+)(.*)";

    Matcher matcher = Pattern.compile(pattern).matcher(stringToBeMatched);

    if (matcher.find()) {
      System.out.println("group 0: " + matcher.group(0));
      System.out.println("group 1: " + matcher.group(1));
      System.out.println("group 2: " + matcher.group(2));
    } else {
      System.out.println("No match found!");
    }

  }
}

Output:

group 0: London is the number 1 city in UK 
group 1: London is the number 
group 2: 1
  • This code example demonstrates how to use regular expressions to extract numbers from a given string.
  • The string to be matched is “London is the number 1 city in UK”.
  • The regular expression pattern is "(.*)(\\d+)(.*)":
    • (.*) captures any characters (except line terminators) zero or more times, greedily: it takes as much as it can and gives characters back only until the rest of the pattern can match.
    • (\\d+) captures one or more digits.
    • (.*) captures whatever remains after the digits, again greedily.
  • The Matcher object is created by compiling the pattern and invoking matcher() on it, passing the string to be matched.
  • The find() method is called to perform the match operation.
  • If a match is found:
    • matcher.group(0) returns the entire matched substring, which is “London is the number 1 city in UK”.
    • matcher.group(1) returns the substring captured by the first capturing group (.*), which is “London is the number ”.
    • matcher.group(2) returns the substring captured by the second capturing group (\\d+), which is “1”.
  • If no match is found, the “No match found!” message is printed.

This code example demonstrates the usage of capturing groups in regular expressions. It shows how to extract specific portions of a string based on the defined pattern. In this case, it extracts the number “1” from the given string.
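
One caveat is worth a sketch: the greedy (.*) works here only because the number has a single digit. With a multi-digit number it leaves just the last digit for (\\d+). The example below contrasts the greedy and reluctant forms and shows a find() loop as an alternative; the class name NumberExtractor, its helpers, and the sample sentences are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {

    // Returns what capturing group 2 holds after the first successful find(), or null.
    static String secondGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(2) : null;
    }

    // Collects every run of digits in the input, in order of appearance.
    static List<String> extractNumbers(String input) {
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        List<String> numbers = new ArrayList<>();
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String input = "Room 101 is free";
        System.out.println(secondGroup("(.*)(\\d+)(.*)", input));  // 1   (greedy .* keeps "Room 10")
        System.out.println(secondGroup("(.*?)(\\d+)(.*)", input)); // 101 (reluctant .*? stops early)
        System.out.println(extractNumbers("Room 101, floor 3"));   // [101, 3]
    }
}
```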

Example 3: Finding All Occurrences of a Word in a Text

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFinder {

  public static void main(String[] args) {

    String text = "Java is a powerful programming language. Java is widely used in software development.";
    String word = "Java";

    String pattern = "\\b" + word + "\\b";

    Pattern wordPattern = Pattern.compile(pattern);
    Matcher matcher = wordPattern.matcher(text);

    int count = 0;
    while (matcher.find()) {
      count++;
      System.out.println("Match found at index " + matcher.start());
    }

    System.out.println("Total occurrences: " + count);

  }
}

Output:

Match found at index 0
Match found at index 41
Total occurrences: 2
  • This code example demonstrates how to use regular expressions and the Matcher class to find all occurrences of a specific word in a given text.
  • The text to be searched is “Java is a powerful programming language. Java is widely used in software development.”
  • The word we want to find is “Java”.
  • The regular expression pattern is \\bJava\\b:
    • The leading \\b is a word boundary; here it matches at the start of the word.
    • Java is the word we want to find.
    • The trailing \\b is a word boundary that matches at the end of the word, so “JavaScript” would not count.
  • The Pattern object is created by compiling the pattern.
  • The Matcher object is created by invoking matcher() on the Pattern object, passing the text.
  • The find() method is called in a loop to find each occurrence of the word.
  • Each time a match is found, the start index of the match is printed.
  • The total number of occurrences is counted and displayed.

This code example demonstrates how to use regular expressions to find all occurrences of a specific word in a given text and retrieve their indices.
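
Because the search word is concatenated straight into the pattern, any regex metacharacters it contains are interpreted rather than matched literally. A possible safeguard, sketched below under the assumption of a hypothetical helper countWord in a class named WordCounter, is to wrap the word in Pattern.quote().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCounter {

    // Counts whole-word occurrences; when 'quote' is true the word is matched literally.
    static int countWord(String text, String word, boolean quote) {
        String core = quote ? Pattern.quote(word) : word;
        Matcher matcher = Pattern.compile("\\b" + core + "\\b").matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Java is a powerful programming language. Java is widely used in software development.";
        System.out.println(countWord(text, "Java", true));          // 2

        System.out.println(countWord("1.5 and 125", "1.5", false)); // 2: the unescaped dot also matches the '2' in 125
        System.out.println(countWord("1.5 and 125", "1.5", true));  // 1: only the literal 1.5
    }
}
```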

Example 4: Validating Phone Numbers

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneNumberValidator {

  public static void main(String[] args) {

    String phoneNumber = "+1 (555) 123-4567";
    String pattern = "^\\+\\d{1,3} \\(\\d{3}\\) \\d{3}-\\d{4}$";

    Matcher matcher = Pattern.compile(pattern).matcher(phoneNumber);

    if (matcher.matches()) {
      System.out.println("Valid phone number");
    } else {
      System.out.println("Invalid phone number");
    }

  }
}

Output:

Valid phone number
  • This code example demonstrates how to use regular expressions to validate phone numbers with a specific format.
  • The phone number to be validated is “+1 (555) 123-4567”.
  • The regular expression pattern is "^\\+\\d{1,3} \\(\\d{3}\\) \\d{3}-\\d{4}$":
    • ^ represents the start of the string.
    • \\+ matches the plus symbol.
    • \\d{1,3} matches one to three digits.
    • A single space character matches a literal space.
    • \\( matches the opening parenthesis.
    • \\d{3} matches three digits.
    • \\) matches the closing parenthesis.
    • A single space character matches a literal space.
    • \\d{3} matches three digits.
    • - matches the hyphen.
    • \\d{4} matches four digits.
    • $ represents the end of the string.
  • The Matcher object is created by compiling the pattern and invoking matcher() on it, passing the phone number.
  • The matches() method is called to perform the match operation.
  • If the phone number matches the pattern, “Valid phone number” is printed. Otherwise, “Invalid phone number” is printed.

This code example demonstrates the use of regular expressions to validate phone numbers based on a specific format.
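
If the check is needed in more than one place, one option is to compile the pattern once and wrap it in a method. The sketch below assumes a hypothetical class PhoneFormatCheck with a helper isValid; it also drops ^ and $, since matches() already requires the entire input to match.

```java
import java.util.regex.Pattern;

class PhoneFormatCheck {

    // Compiled once and reused; the anchors are omitted because matches() tests the whole input.
    private static final Pattern PHONE =
            Pattern.compile("\\+\\d{1,3} \\(\\d{3}\\) \\d{3}-\\d{4}");

    static boolean isValid(String phoneNumber) {
        return PHONE.matcher(phoneNumber).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("+1 (555) 123-4567"));    // true
        System.out.println(isValid("+44 (555) 123-4567"));   // true
        System.out.println(isValid("1 (555) 123-4567"));     // false: missing plus sign
        System.out.println(isValid("+1 (555) 123-45678"));   // false: five digits at the end
        System.out.println(isValid("+1234 (555) 123-4567")); // false: four-digit country code
    }
}
```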

Regular Character Classes

Character classes, also known as character sets, are a feature of regular expressions that let you specify a set of characters, any one of which may match a single character at a particular position in a string. Custom classes are enclosed within square brackets [ ], while predefined classes such as \d, \s, and \w are shorthand for commonly used sets.

Here are a few common types of regular character classes:

. (DOT)      Any character (matches line terminators only in DOTALL mode)
\d           A digit: [0-9]
\D           A non-digit: [^0-9]
\s           A whitespace character: [ \t\n\x0B\f\r]
\S           A non-whitespace character: [^\s]
\w           A word character: [a-zA-Z_0-9]
\W           A non-word character: [^\w]
[abc]        a, b, or c
[^abc]       Any character except a, b, or c
[a-zA-Z]     a through z or A through Z, inclusive
[a-d[m-p]]   a through d, or m through p (union): [a-dm-p]

Regular character classes provide a concise way to specify a set of characters to match against, allowing you to define patterns with flexibility and precision. By utilizing character classes, you can create regular expressions that efficiently capture specific character combinations or ranges based on your requirements.

Please note that regular character classes are just one aspect of regular expressions, and there are various other constructs and features available for more advanced pattern matching and manipulation.
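
The entries in the table can be checked quickly with Pattern.matches(). The sketch below assumes a hypothetical helper named single in a class named CharacterClassDemo; each call tests a one-character input against one class.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {

    // True if the whole input matches the given character class.
    static boolean single(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        System.out.println(single("\\d", "7"));         // true: a digit
        System.out.println(single("\\D", "7"));         // false: \D excludes digits
        System.out.println(single("\\s", " "));         // true: a space is whitespace
        System.out.println(single("\\w", "_"));         // true: underscore is a word character
        System.out.println(single("\\W", "_"));         // false
        System.out.println(single("[^abc]", "d"));      // true: anything except a, b, or c
        System.out.println(single("[a-d[m-p]]", "n"));  // true: n lies in m through p
        System.out.println(single("[a-d[m-p]]", "e"));  // false: e is in neither range
    }
}
```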

Using Regular Character Classes – Examples

Example 1: Matching Characters with a Character Class

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        System.out.println(Pattern.matches("[dzb]", "olkm"));       // false (four characters, none of them d, z, or b)
        System.out.println(Pattern.matches("[dzb]", "b"));         // true (exactly one character, and it is b)
        System.out.println(Pattern.matches("[dzb]", "dazbaze"));   // false (the class matches one character, the input has seven)
    }
}

Output:

false
true
false

In this code example, we are using the Pattern.matches() method to test whether a given input string matches a specified pattern. The pattern we are using is a character class defined within square brackets [ ]. Here’s a breakdown of the behavior and results for each line:

  • Pattern.matches("[dzb]", "olkm"):
    • The pattern [dzb] matches exactly one character, which must be ‘d’, ‘z’, or ‘b’.
    • The input string “olkm” has four characters and none of them is in the class, so the result is false.
  • Pattern.matches("[dzb]", "b"):
    • The pattern [dzb] matches exactly one character, which must be ‘d’, ‘z’, or ‘b’.
    • The input string “b” is a single character from the class, so the result is true.
  • Pattern.matches("[dzb]", "dazbaze"):
    • The pattern [dzb] matches exactly one character, which must be ‘d’, ‘z’, or ‘b’.
    • The input string “dazbaze” is seven characters long, and Pattern.matches() requires the entire input to match. A class without a quantifier can never match more than one character, so the result is false regardless of how often ‘z’ appears.

Because Pattern.matches() tests the whole input, the pattern [dzb] succeeds only when the input is exactly one character long and that character is ‘d’, ‘z’, or ‘b’. To accept longer strings made of those characters, add a quantifier such as [dzb]+.
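
The difference between matching the whole input and finding a match somewhere inside it can be sketched as follows. The class name MatchesVersusFind and its two helpers are illustrative assumptions.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {

    // True only if the entire input matches the regex.
    static boolean wholeInputMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if any part of the input matches the regex.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("[dzb]", "dazbaze"));  // false: seven characters, class matches one
        System.out.println(containsMatch("[dzb]", "dazbaze"));      // true: the first character 'd' is in the class
        System.out.println(containsMatch("[dzb]", "olkm"));         // false: no d, z, or b anywhere
        System.out.println(wholeInputMatches("[dzb]+", "dzb"));     // true: every character is in the class
        System.out.println(wholeInputMatches("[dzb]+", "dazbaze")); // false: 'a' and 'e' are not in the class
    }
}
```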

Example 2: Validating Characters Using Pattern.matches()

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        System.out.println(Pattern.matches("\\D", "a"));   // true - provided a non-digit
        System.out.println(Pattern.matches(".", "ab"));   // false - two characters provided
        System.out.println(Pattern.matches("\\s", "a"));  // false - provided a non-whitespace character
    }
}

Output:

true
false
false
  • The code example demonstrates the usage of Pattern.matches() method to validate patterns against input strings.
  • Pattern.matches(pattern, input) returns a boolean value indicating whether the input string matches the specified regular expression pattern.
  • In the given code:
    • Pattern.matches("\\D", "a") checks if the input string “a” matches the pattern “\D”. Here, “\D” represents a non-digit character. The method returns true because “a” is indeed a non-digit character.
    • Pattern.matches(".", "ab") checks if the input string “ab” matches the pattern “.”. The dot (.) represents any character. Since the input string has two characters, the method returns false because the entire string doesn’t match the pattern.
    • Pattern.matches("\\s", "a") verifies if the input string “a” matches the pattern “\s”. The “\s” represents a whitespace character. As “a” is not a whitespace character, the method returns false.

This code example demonstrates the use of Pattern.matches() to validate specific patterns against input strings.

Example 3: Pattern.matches() – Validating String Patterns

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        System.out.println(Pattern.matches("[a-zA-Z0-9]{5}", "abcde"));
        System.out.println(Pattern.matches("[a-zA-Z0-9]{5}", "12bce"));
        System.out.println(Pattern.matches("[a-zA-Z0-9]{5}", "practise"));
        System.out.println(Pattern.matches("[a-zA-Z0-9]{5}", "count"));
    }
}

Output:

true
true
false
true

The given code demonstrates the usage of the Pattern.matches() method to validate string patterns against a regular expression. In this example, the regular expression “[a-zA-Z0-9]{5}” is used, which matches any sequence of exactly five characters consisting of uppercase letters, lowercase letters, or digits (0-9).

  • Pattern.matches("[a-zA-Z0-9]{5}", "abcde"):
    • The string “abcde” is matched against the regular expression.
    • Since “abcde” consists of five lowercase letters, the output will be true.
  • Pattern.matches("[a-zA-Z0-9]{5}", "12bce"):
    • The string “12bce” is matched against the regular expression.
    • “12bce” has a length of five characters, and every one of them is a digit or a letter, so it satisfies the pattern.
    • Therefore, the output will be true.
  • Pattern.matches("[a-zA-Z0-9]{5}", "practise"):
    • The string “practise” is matched against the regular expression.
    • As “practise” has a length of eight characters, it does not satisfy the pattern requirement of exactly five characters.
    • Thus, the output will be false.
  • Pattern.matches("[a-zA-Z0-9]{5}", "count"):
    • The string “count” is matched against the regular expression.
    • “count” satisfies the pattern condition of exactly five alphanumeric characters.
    • Hence, the output will be true.

Example 4: Validating 5-Digit Numbers That Start with 1, 7, or 9

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[179]{1}[0-9]{4}");
        
        System.out.println(pattern.matcher("72135").matches());
        System.out.println(pattern.matcher("123a5").matches());
        System.out.println(pattern.matcher("abc47").matches());
        System.out.println(pattern.matcher("93214").matches());
        System.out.println(pattern.matcher("abcndkoupz").matches());
    }
}

Output:

true 
false 
false 
true 
false
  • The provided code example demonstrates the use of regular expressions to validate whether a given 5-digit number starts with 1, 7, or 9.
  • Let’s break down the code and its output to understand the functionality:
    • Pattern Compilation:
      • The line Pattern pattern = Pattern.compile("[179]{1}[0-9]{4}"); compiles a regular expression pattern using the compile() method of the Pattern class.
      • The pattern [179]{1}[0-9]{4} specifies the desired criteria:
        • [179]{1} matches a single character that is either 1, 7, or 9. The {1} quantifier ensures exactly one occurrence.
        • [0-9]{4} matches four characters in the range of 0 to 9. The {4} quantifier ensures exactly four occurrences.
    • Pattern Matching and Output:
      • The code then proceeds to perform multiple match operations using the matcher() method and matches() method of the Matcher class.
      • pattern.matcher("72135").matches() matches the pattern against the input string “72135” and returns true. The output is printed accordingly.
      • pattern.matcher("123a5").matches() matches the pattern against the input string “123a5” and returns false. The output is printed accordingly.
      • pattern.matcher("abc47").matches() matches the pattern against the input string “abc47” and returns false. The output is printed accordingly.
      • pattern.matcher("93214").matches() matches the pattern against the input string “93214” and returns true. The output is printed accordingly.
      • pattern.matcher("abcndkoupz").matches() matches the pattern against the input string “abcndkoupz” and returns false. The output is printed accordingly.
    • Output Explanation:
      • The first number “72135” matches the pattern because it starts with 7 and is followed by four digits. Hence, the output is true.
      • The second input “123a5” starts with 1, but the letter ‘a’ is not a digit, so [0-9]{4} fails. Hence, the output is false.
      • The third input “abc47” does not start with 1, 7, or 9, so the output is false.
      • The fourth number “93214” starts with 9 and is followed by four digits, so the output is true.
      • The fifth input “abcndkoupz” contains no digits and is ten characters long, so the output is false.

This code example demonstrates how regular expressions can be utilized with Java’s Pattern and Matcher classes to validate and match specific patterns within strings. In this case, it ensures that a given 5-digit number starts with either 1, 7, or 9.

Example 5: Validating Email Addresses

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailValidator {

  public static void main(String[] args) {

    String emailAddress = "user@example.com"; // placeholder sample address
    String emailPattern = "^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+$";

    Pattern pattern = Pattern.compile(emailPattern);
    Matcher matcher = pattern.matcher(emailAddress);

    if (matcher.matches()) {
      System.out.println("Valid email address!");
    } else {
      System.out.println("Invalid email address!");
    }

  }
}

Output:

Valid email address!
  • This code example showcases how to use regular expressions to validate email addresses.
  • The regular expression pattern used for email validation is ^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+$.
  • The pattern breaks down as follows:
    • ^ asserts the start of the string.
    • [A-Za-z0-9+_.-]+ matches one or more occurrences of alphanumeric characters, plus signs, underscores, periods, or hyphens for the email username.
    • @ matches the “@” symbol.
    • [A-Za-z0-9.-]+ matches one or more occurrences of alphanumeric characters, periods, or hyphens for the domain name.
    • $ asserts the end of the string.
  • The Matcher object is created by compiling the pattern and invoking matcher() on it, passing the email address to be validated.
  • The matches() method is called to perform the validation operation.
  • If the email address matches the pattern, it is considered valid, and the “Valid email address!” message is printed.
  • If the email address does not match the pattern, the “Invalid email address!” message is printed.

This code example demonstrates how to use regular expressions and the Matcher class to validate the format of email addresses. It allows you to check if an email address provided by a user or entered into a form follows the expected pattern.
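
The pattern above checks only the rough shape of an address. The sketch below probes what it accepts and rejects; the class name EmailPatternLimits, the helper looksLikeEmail, and the sample addresses are assumptions for illustration.

```java
import java.util.regex.Pattern;

class EmailPatternLimits {

    private static final Pattern EMAIL =
            Pattern.compile("^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+$");

    static boolean looksLikeEmail(String candidate) {
        return EMAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user@example.com"));      // true
        System.out.println(looksLikeEmail("user@"));                 // false: empty domain
        System.out.println(looksLikeEmail("@example.com"));          // false: empty user name
        System.out.println(looksLikeEmail("user name@example.com")); // false: space is not allowed
        System.out.println(looksLikeEmail("user@localhost"));        // true: no dot is required in the domain
        System.out.println(looksLikeEmail("user@-"));                // true: a lone hyphen passes as a domain
    }
}
```

The last two results suggest treating this pattern as a first-pass format check rather than proof that an address is deliverable.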

Conclusion

In conclusion, this tutorial provides a comprehensive overview of using regular expressions (Regex) in Java. It covers essential topics such as the Pattern and Matcher classes, enabling developers to perform powerful string manipulation and validation tasks. The tutorial also includes a variety of practical examples that demonstrate the application of regular expressions in Java.

By understanding the concepts and techniques presented in this tutorial, developers can harness the power of regular expressions to effectively handle complex string patterns and improve their Java programming skills.

Frequently asked questions

  • What is the difference between the matches() and find() methods in the Matcher class?
    The matches() method attempts to match the entire input sequence against the pattern, while the find() method scans the input sequence to find the next occurrence of the pattern.
  • How can I ignore case sensitivity when using regular expressions in Java?
    You can use the Pattern.CASE_INSENSITIVE flag when compiling the pattern to make it case-insensitive.
  • Can I reuse a Matcher object for multiple match operations?
    Yes, you can reset the Matcher object using the reset() method, allowing you to reuse it for subsequent match operations.
  • How can I match a specific number of occurrences of a character or pattern?
    You can use quantifiers such as * (zero or more), + (one or more), ? (zero or one), or specify a specific number using {}. For example, a{3} matches exactly three consecutive ‘a’ characters.
  • How do I escape special characters in regular expressions?
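    Special characters in regular expressions, such as . or +, need to be escaped with a backslash \ to treat them as literal characters. For example, to match a literal dot, use \. in your regex pattern. Because the backslash is itself an escape character in Java string literals, write it as "\\." in source code, or let Pattern.quote() escape an entire string for you.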
  • Is there a way to replace specific parts of a string based on a regular expression pattern?
    Yes, the Matcher class provides methods like replaceAll() and replaceFirst() that allow you to replace specific parts of a string based on a regular expression pattern.
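
The answers above can be tried out with a short sketch. The class name RegexFaqDemo, its helper methods, and the sample inputs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFaqDemo {

    // Whole-input match that ignores letter case.
    static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Replaces digits with '#': only the first one, or all of them.
    static String maskDigits(String input, boolean onlyFirst) {
        Matcher matcher = Pattern.compile("\\d").matcher(input);
        return onlyFirst ? matcher.replaceFirst("#") : matcher.replaceAll("#");
    }

    public static void main(String[] args) {
        // matches() versus find()
        Matcher fox = Pattern.compile("fox").matcher("The quick brown fox");
        System.out.println(fox.matches()); // false: the pattern does not cover the whole input
        System.out.println(fox.find());    // true: "fox" occurs inside the input

        System.out.println(matchesIgnoringCase("hello", "HeLLo")); // true

        // Quantifier with an exact count
        System.out.println(Pattern.matches("a{3}", "aaa")); // true
        System.out.println(Pattern.matches("a{3}", "aa"));  // false

        // Escaped dot: "\\." in Java source is the regex \.
        System.out.println(Pattern.matches("a\\.b", "a.b")); // true
        System.out.println(Pattern.matches("a\\.b", "axb")); // false

        System.out.println(maskDigits("a1b22", true));  // a#b22
        System.out.println(maskDigits("a1b22", false)); // a#b##
    }
}
```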
