Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have a data set in the following format:

ID     Content
1      2020-01-01 Car1 Technician1  Inspected parts
2      2020-01-01 Car2 Technician1   Inspected wipers
3      2020-05-01 Car5 Technician2   Fixed wipers 
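To make the example reproducible, here is a sketch that builds the table as a data frame. The name dat and the column Content match the answer's code. The exact spacing inside Content is copied from the table above and is an assumption, since the original whitespace may have been altered when the question was posted.

```r
# Hypothetical reconstruction of the question's table as a data frame.
# `dat` matches the name used in the answer; the spacing inside Content
# is copied from the posted table and may differ from the real data.
dat <- data.frame(
  ID = 1:3,
  Content = c(
    "2020-01-01 Car1 Technician1  Inspected parts",
    "2020-01-01 Car2 Technician1   Inspected wipers",
    "2020-05-01 Car5 Technician2   Fixed wipers "
  ),
  stringsAsFactors = FALSE
)
str(dat)
```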

I want to clean this up so that each time I encounter "Technician", I remove it together with the preceding 15 characters. My result would then look like:

ID     Content
1      1 Inspected parts
2      1 Inspected wipers
3      2 Fixed wipers 

I have been trying some wildcard matching but haven't been successful yet. Any thoughts?



1 Answer

I think your expected output is inconsistent, but removing a fixed number of preceding characters can be done with the {m,n} "limiting repetition" quantifier of regular expressions.

gsub(".{15}Technician", "", dat$Content)
# [1] "21  Inspected parts"   "21   Inspected wipers" "22   Fixed wipers "   

Note that {15} only works when there are at least 15 characters before "Technician". If you ask for more characters than exist, the pattern cannot match and nothing is removed, as here:

gsub(".{17}Technician", "", dat$Content)
# [1] "2020-01-01 Car1 Technician1  Inspected parts"   "2020-01-01 Car2 Technician1   Inspected wipers"
# [3] "2020-05-01 Car5 Technician2   Fixed wipers "   
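Counting the characters in front of "Technician" explains both results. The sketch below uses base R's regexpr() with fixed = TRUE to find where "Technician" starts in each string (the strings are copied from the question so the block runs on its own). Every row has 16 characters before it, so .{15} matches starting at the second character (which is why a leading "2" survives in the first output) and .{17} cannot match at all.

```r
# Strings copied from the question's table (spacing assumed as posted).
content <- c(
  "2020-01-01 Car1 Technician1  Inspected parts",
  "2020-01-01 Car2 Technician1   Inspected wipers",
  "2020-05-01 Car5 Technician2   Fixed wipers "
)

# regexpr() returns the 1-based start position of the first match
# (or -1 if there is none); subtracting 1 gives the number of
# characters that precede "Technician".
n_before <- as.integer(regexpr("Technician", content, fixed = TRUE)) - 1L
print(n_before)
# [1] 16 16 16
```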

If you need "up to" 15 characters, use both slots of {m,n}; an empty slot means the minimum (0) or the maximum (unlimited), so these two lines are equivalent:

gsub(".{0,15}Technician", "", dat$Content)
gsub(".{,15}Technician", "", dat$Content)
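If the goal is exactly the output listed in the question, one option (an assumption about intent, not the method given in this answer) is to anchor the pattern at the start of the string with ^ and allow up to 16 characters, since the prefix before "Technician" is 16 characters long in every row, and then collapse repeated spaces. A sketch, with the strings copied from the question:

```r
content <- c(
  "2020-01-01 Car1 Technician1  Inspected parts",
  "2020-01-01 Car2 Technician1   Inspected wipers",
  "2020-05-01 Car5 Technician2   Fixed wipers "
)

# "^" anchors the match at the start of the string, so the whole prefix
# (date, car, space) plus "Technician" is removed instead of a partial
# match that leaves the first character behind.
stripped <- sub("^.{0,16}Technician", "", content)

# Collapse runs of spaces to a single space, then trim both ends.
cleaned <- trimws(gsub(" +", " ", stripped))
print(cleaned)
# cleaned is c("1 Inspected parts", "1 Inspected wipers", "2 Fixed wipers")
```

Anchoring with ^ is what makes the character count matter: without it, the engine is free to start the match later in the string. Note that trimws() also drops the trailing space in the third row; leave it out if that space is significant.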
