Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I am trying to replace certain patterns in a string with different replacement patterns.

Example:

string test = "test replacing \"these characters\"";

What I want to do is replace every ' ' with '_' and remove all other characters that are not letters or digits. I have created the following regex, and it seems to tokenize correctly, but I am not sure how (or whether it is possible) to perform a conditional replace using regex_replace.

string test = "test replacing \"these characters\"";
regex reg("(\\s+)|(\\W+)");

The expected result after the replace would be:

string result = "test_replacing_these_characters";

EDIT: I cannot use Boost, which is why I left it out of the tags, so please no answers that include Boost. I have to do this with the standard library. It may be that a different regex would accomplish the goal, or that I am just stuck doing two passes.

EDIT2: I did not remember which characters were included in \w at the time of my original regex; after looking it up, I have further simplified the expression. Again, the goal is that anything matching \s+ should be replaced with '_' and anything matching \W+ should be replaced with an empty string.


1 Answer

The C++ (C++0x, C++11, TR1) regular expressions do not really work in every implementation (see Stack Overflow; for GCC, look up the phrase "regex" on its status page), so it is better to use Boost for the time being.

You can check whether your compiler supports the regular expressions needed:

#include <string>
#include <iostream>
#include <regex>

using namespace std;

int main(int argc, char * argv[]) {
    // Inner quotes and regex backslashes must be escaped in a C++ string literal.
    string test = "test replacing \"these characters\"";
    regex reg("[^\\w]+");  // one or more non-word characters
    test = regex_replace(test, reg, "_");
    cout << test << endl;
}

The above works in Visual Studio 2012 RC.

Edit 1: As for replacing with two different strings in one pass, depending on the match, I don't think this will work here. In Perl, this could easily be done with an evaluated replacement expression (the /e switch).

Therefore, you'll need two passes, as you already suspected:

 ...
 string test = "test replacing \"these characters\"";
 test = regex_replace(test, regex("\\s+"), "_");  // whitespace runs become '_'
 test = regex_replace(test, regex("\\W+"), "");   // remaining non-word characters are dropped
 ...
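
For completeness, here is the two-pass idea as a small function you can compile on its own. This is only a sketch using the standard library; the name sanitize_two_pass is made up for illustration and does not come from the question. It relies on '_' being a word character, so the underscores produced by the first pass survive the second one.

```cpp
#include <regex>
#include <string>

// Hypothetical helper name, chosen for this sketch only.
std::string sanitize_two_pass(const std::string& input) {
    // Pass 1: turn each run of whitespace into a single underscore.
    static const std::regex whitespace("\\s+");
    // Pass 2: drop every remaining non-word character. '_' counts as a
    // word character, so the underscores from pass 1 are kept.
    static const std::regex non_word("\\W+");

    const std::string underscored = std::regex_replace(input, whitespace, "_");
    return std::regex_replace(underscored, non_word, "");
}
```

Calling sanitize_two_pass("test replacing \"these characters\"") gives test_replacing_these_characters.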

Edit 2:

If it were possible to pass a callback function tr() to regex_replace, you could choose the substitution there, like:

 string output = regex_replace(test, regex("\\s+|\\W+"), tr);

with tr() doing the replacement work:

 string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }

then the problem would be solved. Unfortunately, the C++11 standard library has no such overload, but Boost has one. The following works with Boost and uses one pass:

...
#include <boost/regex.hpp>
using namespace boost;
...
string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }
...

string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+|\\W+"), tr);   // <= works in Boost
...

Maybe some day this will work with C++11 or whatever number comes next.
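
Until then, a callback-style single pass can be emulated with the standard library alone by walking the matches with std::sregex_iterator and building the output by hand. The following is a rough sketch, not a drop-in replacement for the Boost overload: replace_with_callback and sanitize_one_pass are names invented here, and the pattern is my own variation. It uses (\s+)|[^\w\s]+ instead of \s+|\W+ because \W also matches whitespace, so a run such as a quote followed by a space would otherwise be swallowed as a single non-word match and the space would be lost.

```cpp
#include <regex>
#include <string>

// Hypothetical helper standing in for a callback-taking regex_replace.
template <typename Callback>
std::string replace_with_callback(const std::string& input,
                                  const std::regex& pattern,
                                  Callback callback) {
    std::string output;
    auto last = input.cbegin();  // end of the previous match
    for (std::sregex_iterator it(input.cbegin(), input.cend(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        output.append(last, m[0].first);  // copy the text between matches
        output += callback(m);            // let the callback pick the replacement
        last = m[0].second;
    }
    output.append(last, input.cend());    // copy the tail after the last match
    return output;
}

// Group 1 participates only when the match is a whitespace run.
std::string tr(const std::smatch& m) {
    return m[1].matched ? "_" : "";
}

// Hypothetical wrapper name, chosen for this sketch only.
std::string sanitize_one_pass(const std::string& input) {
    static const std::regex pattern("(\\s+)|[^\\w\\s]+");
    return replace_with_callback(input, pattern, tr);
}
```

With the test string from the question, sanitize_one_pass returns test_replacing_these_characters. Checking m[1].matched rather than the first character of the match means tabs and newlines are also treated as whitespace.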

Regards

rbo

