Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I wrote this regex to match all href and src links in an HTML page (I know I should be using a parser; this is just an experiment):

/((href|src)=").*?"/ # Without look-behind

It works fine, but when I try to modify the first portion of the expression as a look-behind pattern:

/(?<=(href|src)=").*?"/ # With look-behind

It throws an error stating 'invalid look-behind pattern'. Any ideas what's going wrong with the look-behind?



1 Answer

Lookbehind has restrictions:

   (?<=subexp)        look-behind
   (?<!subexp)        negative look-behind

                      Subexp of look-behind must be fixed character length.
                      But different character length is allowed in top level
                      alternatives only.
                      ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.

                      In negative-look-behind, captured group isn't allowed, 
                      but shy group(?:) is allowed.

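Alternatives of different lengths are allowed only at the top level of a (negative) lookbehind. In your pattern the alternation (href|src) is nested inside a group, so the lookbehind is rejected.

Put the alternatives at the top level instead. You also don't need to escape some of the characters that you escaped.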

/(?<=href="|src=").*?"/
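
As a quick check, here is a minimal Ruby sketch of the top-level form. The sample HTML string and the variable names are made up for illustration and are not from the question. The nested form is built with Regexp.new so that, on Ruby versions whose regex engine enforces the restriction quoted above, the error can be rescued at runtime rather than stopping the script when it is parsed.

```ruby
# Hypothetical sample input, not taken from the original question.
html = '<a href="/index.html"><img src="logo.png"></a>'

# Alternatives of different lengths sit at the top level of the lookbehind.
top_level = /(?<=href="|src=").*?"/

# With no capture groups, scan returns the matched substrings.
# Each match still ends with the closing quote, as in the original pattern.
matches = html.scan(top_level)
puts matches.inspect

# Try to compile the nested form at runtime. If the engine rejects it,
# nested_error holds the message; otherwise it stays nil.
nested_error =
  begin
    Regexp.new('(?<=(href|src)=").*?"')
    nil
  rescue RegexpError => e
    e.message
  end
puts "nested form: #{nested_error || 'compiled without error'}"
```

Note that the lookbehind only keeps href=" or src=" out of the match. The trailing quote is still part of each result, so strip it if you need the bare URL.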
