Simplifying regular expression?

In summary, the regular expression of (a*bc*bc*|c*)*b can be simplified by factoring out the same suffix on both words and disallowing the empty case with a branch. This results in the expression ((a*bc*b|c)c*)*b, but to avoid backtracking issues, the group (a*bc*b|c) must be made atomic. However, the last * still needs to be able to backtrack, making the expression more complex. Overall, regular expressions can be difficult to understand and learn, and resources such as the PHP manual's 'PCRE' section can be helpful.
  • #1
KataKoniK
1,347
0
Does anyone know if this regular expression can be simplified?
(a*bc*bc*|c*)*b

I tried (a|b|c)*b, but it's not correct.
 
Technology news on Phys.org
  • #2
Well in that group you have the same suffix on both words, so you can factor it out (1). Then notice that the content in that group might be empty which is a problem because it will run away on failure, so we must disallow that that empty case. We can do that with a branch (2).

#0: (a*bc*bc*|c*)*b
#1: ((a*bc*b)?c*)*b
#2: ((a*bc*b)c*|c+)*b

You can reduce the number of operators by one by factoring out c*:

#3: ((a*bc*b|c)c*)*b

Now you have a problem with the c's which is difficult to explain. The clash is between the last c* and the )* (it's because that group can simplify to c+ in certain cases). The way around that is to force that c* to group atomically so that it won't backtrack.

#4: ((a*bc*b|c)c*+)*b
OR ((a*bc*b|c)(?>c*))*b

Now that should work well enough but to be pedantic those first 2 *'s don't need to backtrack:

#5: ((a*+bc*+b|c)c*+)*b

Unfortunately that last * does need to be able to backtrack because there might be an even number of b's being matched against and it would fail in that case if that group was atomic.

Needless to say, regexes are difficult. I learned them from the PHP manual 'PCRE' section. You can find it online.
 
Last edited:
  • #3
Hmm, so it cannot be simplified to something really short ie. (a|b|c)*b (which is wrong) as it seems.

Thanks.
 

Related to Simplifying regular expression?

What is a regular expression?

A regular expression, or regex, is a sequence of characters that specifies a search pattern. It is commonly used in computer science and programming to search, validate, and manipulate text.

Why do we need to simplify regular expressions?

Simplifying regular expressions can improve code readability, reduce the chances of errors, and make it easier to maintain and update the code. It also helps to make the code more efficient and faster to execute.

What are some common techniques for simplifying regular expressions?

Some common techniques for simplifying regular expressions include using character classes, quantifiers, and grouping to match multiple characters with a single expression. Another technique is to use anchors to specify the position of the pattern in the text.

How do I test my regular expression?

You can test your regular expression using an online regex tester or a text editor that supports regex. These tools allow you to input your regex and a sample text, and then highlight the matches or provide the results of the search.

Can regular expressions be used in all programming languages?

Most programming languages support regular expressions, but the syntax and capabilities may differ slightly. It is important to check the documentation for the specific language you are using to ensure compatibility.

Similar threads

  • Programming and Computer Science
Replies
4
Views
1K
  • Programming and Computer Science
Replies
5
Views
579
  • Programming and Computer Science
Replies
14
Views
2K
  • Programming and Computer Science
Replies
1
Views
2K
  • Precalculus Mathematics Homework Help
Replies
3
Views
462
  • Programming and Computer Science
Replies
34
Views
2K
  • Introductory Physics Homework Help
Replies
4
Views
2K
  • MATLAB, Maple, Mathematica, LaTeX
Replies
4
Views
1K
  • Programming and Computer Science
Replies
1
Views
936
  • Calculus and Beyond Homework Help
Replies
6
Views
831
Back
Top