regex – Java String.replaceAll() with back reference

I'll try to explain what's happening in the regex.

str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");

$1 represents first group which is (^\*)
$2 represents 2nd group (\*$)

When you call str.replaceAll, every * is matched, but only a * at the start or at the end of the string is captured in a group. Each match is then replaced with whatever was captured in the two groups.

Example: *abc**def* --> *abcdef*

The regex first finds a * at the start of the string and captures it in $1. It then keeps looking: a * at the end of the string is captured in $2, and any other * is matched without being captured. On replacement, every * is eliminated except the one stored in $1 or $2.
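As a minimal sketch of the call above, the snippet below wraps it in a helper. The class name BackReferenceDemo and the method stripInnerStars are made up for illustration; only String.replaceAll comes from the question.

```java
class BackReferenceDemo {
    // Keeps a '*' at the very start or very end, drops every other '*'.
    static String stripInnerStars(String str) {
        // In a Java string literal, "\\*" is the regex \* (a literal asterisk).
        return str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(stripInnerStars("*abc**def*")); // prints *abcdef*
        System.out.println(stripInnerStars("a*b*c"));      // prints abc
    }
}
```

For an inner *, neither group takes part in the match, and replaceAll substitutes nothing for such a group, so the * simply disappears.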

For more information see Capture Groups

You can use lookarounds in your regex:

String repl = str.replaceAll("(?<!^)\\*+(?!$)", "");

RegEx Demo

RegEx Breakup:

(?<!^)   # If previous position is not line start
\*+     # match 1 or more *
(?!$)    # If next position is not line end
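The breakup above can be tried out with a short sketch like the following. The class and method names are hypothetical; the pattern is the one from this answer.

```java
class LookaroundDemo {
    // Removes every run of '*' that is neither at the start nor at the end.
    static String stripInnerStars(String str) {
        // (?<!^) : not preceded by the start of the input
        // \*+    : one or more asterisks
        // (?!$)  : not followed by the end of the input
        return str.replaceAll("(?<!^)\\*+(?!$)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripInnerStars("*abc**def*")); // prints *abcdef*
        System.out.println(stripInnerStars("**abc**"));    // prints *abc*
    }
}
```

No capture groups are needed here, because the outer * characters are never matched in the first place and the replacement is just the empty string.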

OP's regex is:

(^\*)|(\*$)|\*

It uses two capturing groups, one for a * at the start and another for a * at the end, and uses back-references in the replacement. That might work here, but it will be much slower on larger strings, as is evident from the number of steps taken in this demo: 209 steps versus 48 using lookarounds.

Another smaller improvement to OP's regex is to use a quantifier, so that a run of inner * is removed in a single match:

(^\*)|(\*$)|\*+
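A quick comparison sketch (class and method names are hypothetical) suggests the quantifier variant is not a drop-in replacement in every case. When a run of * reaches the end of the string, \*+ consumes the whole run, including the final *, before the (\*$) alternative gets a chance at it:

```java
class QuantifierVariantDemo {
    // OP's original pattern: one match per asterisk.
    static String original(String str) {
        return str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
    }

    // Variant with a quantifier on the last alternative.
    static String withQuantifier(String str) {
        return str.replaceAll("(^\\*)|(\\*$)|\\*+", "$1$2");
    }

    public static void main(String[] args) {
        // Same result when the trailing '*' stands alone.
        System.out.println(original("*abc**def*"));       // prints *abcdef*
        System.out.println(withQuantifier("*abc**def*")); // prints *abcdef*

        // Different result when several '*' end the string.
        System.out.println(original("abc**"));            // prints abc*
        System.out.println(withQuantifier("abc**"));      // prints abc
    }
}
```

If inputs can end in more than one *, the original pattern or the lookaround version is the safer choice.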

Well, let's first take a look at your regex (^\*)|(\*$)|\*. It matches every *: if the * is at the start of the string, it is captured into group 1; if it is at the end, it is captured into group 2; every other * is matched but not put into any group.

The replacement pattern $1$2 replaces every single match with the content of group 1 followed by the content of group 2. For a * at the beginning or the end of the string, one of the groups contains that * itself, so the * is replaced by itself. For all the other matches, both groups are empty, so the matched * is replaced with the empty string.
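To see what each match captures, a small sketch like the one below walks the matches with Matcher.find(). The class and method names are hypothetical. Note that Matcher.group(n) returns null for a group that did not take part in the match; replaceAll substitutes nothing for such a group, which is why it behaves like an empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupInspectionDemo {
    // Lists, for every match, its start index and the content of both groups.
    static List<String> describeMatches(String input) {
        Matcher m = Pattern.compile("(^\\*)|(\\*$)|\\*").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.start() + ": group1=" + m.group(1) + ", group2=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        // 0: group1=*, group2=null
        // 2: group1=null, group2=null
        // 4: group1=null, group2=*
        for (String line : describeMatches("*a*b*")) {
            System.out.println(line);
        }
    }
}
```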

Your problem was probably that $1$2 is not a literal replacement, but a pair of backreferences to the captured groups.
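If a literal $1$2 is ever what you want in the output, one option is Matcher.quoteReplacement, which escapes the $ signs for you. The sketch below is only an illustration with hypothetical class and method names, not something the original question asked for.

```java
import java.util.regex.Matcher;

class LiteralReplacementDemo {
    // Replaces every '*' with the literal text "$1$2".
    static String replaceStarsLiterally(String str) {
        // quoteReplacement escapes '$' and '\' so they lose their special meaning.
        return str.replaceAll("\\*", Matcher.quoteReplacement("$1$2"));
    }

    public static void main(String[] args) {
        System.out.println(replaceStarsLiterally("a*b")); // prints a$1$2b
    }
}
```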
