regex – Extracting a string between two other strings in R

You may use str_match with the pattern "STR1 (.*?) STR2" (note that the spaces are meaningful). If you just want to match anything between STR1 and STR2, use "STR1(.*?)STR2"; to trim surrounding whitespace from the value you need, use "STR1\\s*(.*?)\\s*STR2". If you have multiple occurrences, use str_match_all.

Also, if the text you need to match spans line breaks/newlines, add (?s) at the start of the pattern: "(?s)STR1(.*?)STR2" or "(?s)STR1\\s*(.*?)\\s*STR2".

library(stringr)
a <- " anything goes here, STR1 GET_ME STR2, anything goes here "
res <- str_match(a, "STR1\\s*(.*?)\\s*STR2")
res[,2]
#[1] "GET_ME"
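To see what (?s) changes, here is a minimal sketch in base R rather than stringr, so it runs without extra packages. It assumes regexec with perl = TRUE (PCRE), where . does not match a newline unless the s flag is set; the sample text is made up for illustration.

```r
# Sample text (made up): the value between STR1 and STR2 spans a line break
txt <- "before STR1 first line\nsecond line STR2 after"

# Without (?s), "." cannot cross "\n", so the pattern finds no match
plain <- regmatches(txt, regexec("STR1\\s*(.*?)\\s*STR2", txt, perl = TRUE))[[1]]
length(plain)
#[1] 0

# With (?s), "." also matches "\n", so the capture covers both lines
dotall <- regmatches(txt, regexec("(?s)STR1\\s*(.*?)\\s*STR2", txt, perl = TRUE))[[1]]
dotall[2]
#[1] "first line\nsecond line"
```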

Another way, using base R's regexec (which returns only the first match):

test <- " anything goes here, STR1 GET_ME STR2, anything goes here STR1 GET_ME2 STR2 "
pattern <- "STR1\\s*(.*?)\\s*STR2"
result <- regmatches(test, regexec(pattern, test))
result[[1]][2]
#[1] "GET_ME"
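Since regexec stops at the first match, here is one possible base R way to collect every occurrence (roughly what str_match_all returns in its second column). It is a sketch, not the only approach: gregexpr finds all full matches, then sub keeps only the capture group of each. It reuses the same pattern with perl = TRUE.

```r
test <- " anything goes here, STR1 GET_ME STR2, anything goes here STR1 GET_ME2 STR2 "
pattern <- "STR1\\s*(.*?)\\s*STR2"

# All full matches, e.g. "STR1 GET_ME STR2" and "STR1 GET_ME2 STR2"
full <- regmatches(test, gregexpr(pattern, test, perl = TRUE))[[1]]

# Replace each full match with its first capture group
captured <- sub(pattern, "\\1", full, perl = TRUE)
captured
#[1] "GET_ME"  "GET_ME2"
```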

Here's another way, using base R's gsub:

a <- " anything goes here, STR1 GET_ME STR2, anything goes here "

gsub(".*STR1 (.+) STR2.*", "\\1", a)

Output:

[1] "GET_ME"
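One caveat with the gsub approach: when a string does not contain the boundaries, gsub returns it unchanged instead of signalling a missing match. A minimal sketch of one way to guard against that, assuming you want NA for non-matching elements (the second sample string is made up):

```r
pattern <- ".*STR1 (.+) STR2.*"
x <- c(" anything goes here, STR1 GET_ME STR2, anything goes here ",
       "no markers in this string")

# Without a guard, the second element comes back as-is
gsub(pattern, "\\1", x)

# Keep the capture only where the pattern actually matches; NA otherwise
extracted <- ifelse(grepl(pattern, x), gsub(pattern, "\\1", x), NA_character_)
extracted
#[1] "GET_ME" NA
```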

Another option is to use qdapRegex::ex_between to extract strings between left and right boundaries:

qdapRegex::ex_between(a, "STR1", "STR2")[[1]]
#[1] "GET_ME"

It also works with multiple occurrences:

a <- "anything STR1 GET_ME STR2, anything goes here, STR1 again get me STR2"

qdapRegex::ex_between(a, "STR1", "STR2")[[1]]
#[1] "GET_ME"       "again get me"

Or with multiple left and right boundaries:

a <- "anything STR1 GET_ME STR2, anything goes here, STR4 again get me STR5"
qdapRegex::ex_between(a, c("STR1", "STR4"), c("STR2", "STR5"))[[1]]
#[1] "GET_ME"       "again get me"

The first capture is between STR1 and STR2, whereas the second is between STR4 and STR5.
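If qdapRegex is not available, a rough base R counterpart for paired boundaries could look like the sketch below. It is not how ex_between is implemented; it simply pastes each left/right pair into the same lazy regexec pattern used earlier, so it assumes the boundaries contain no regex metacharacters (escape them otherwise).

```r
a <- "anything STR1 GET_ME STR2, anything goes here, STR4 again get me STR5"
lefts  <- c("STR1", "STR4")
rights <- c("STR2", "STR5")

# For each left/right pair, take the first capture between them (NA if absent)
pairs <- mapply(function(left, right) {
  m <- regmatches(a, regexec(paste0(left, "\\s*(.*?)\\s*", right), a, perl = TRUE))[[1]]
  if (length(m) > 0) m[2] else NA_character_
}, lefts, rights, USE.NAMES = FALSE)
pairs
#[1] "GET_ME"       "again get me"
```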
