I'm pretty sure I'm missing something obvious here, but I cannot make R use non-greedy regular expressions:
> library(stringr)
> str_match('xxx aaaab yyy', "a.*?b")
     [,1]
[1,] "aaaab"
Base functions behave the same way:
> regexpr('a.*?b', 'xxx aaaab yyy')
[1] 5
attr(,"match.length")
[1] 5
attr(,"useBytes")
[1] TRUE
I would expect the match to be just "ab", as per the 'greedy' comment in http://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html:
By default repetition is greedy, so the maximal possible number of repeats is used. This can be changed to ‘minimal’ by appending ? to the quantifier. (There are further quantifiers that allow approximate matching: see the TRE documentation.)
Could someone please explain to me what's going on?
Update. What's crazy is that in some other cases non-greedy patterns behave as expected:
> str_match('xxx <a href="abc">link</a> yyy <h1>Header</h1>', '<a.*>')
     [,1]
[1,] "<a href=\"abc\">link</a> yyy <h1>Header</h1>"
> str_match('xxx <a href="abc">link</a> yyy <h1>Header</h1>', '<a.*?>')
     [,1]
[1,] "<a href=\"abc\">"
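One possible reading of the two results, offered as a sketch rather than a definitive answer: regex engines generally report the leftmost match, and the ? modifier only shortens the repetition once the starting position is fixed. In 'xxx aaaab yyy' the match starts at the first "a" (position 5, as regexpr reports), so ".*?" still has to consume "aaa" before "b" can match. The snippet below uses base R only; the perl = TRUE argument and the alternative pattern 'a[^a]*?b' are assumptions added for illustration and are not part of the original question.

```r
x <- 'xxx aaaab yyy'

# Lazy quantifier: the match is still anchored at the first "a",
# so the shortest match from that position is "aaaab".
m_lazy <- regmatches(x, regexpr('a.*?b', x, perl = TRUE))

# Excluding "a" from the repeated part makes every start position
# except the last "a" fail, so only "ab" can match.
m_short <- regmatches(x, regexpr('a[^a]*?b', x, perl = TRUE))

print(m_lazy)
print(m_short)
```

The same reasoning would explain the '<a.*?>' case: the leftmost "<a" is the only candidate start, and from there the lazy quantifier stops at the first ">".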