Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I was wondering whether it is possible to extract nouns and verbs separately with the R package openNLP. I use the tagPOS function, which tags the sentence, but what should I do if I want to extract the verbs and the nouns separately?


1 Answer

Here is an example that extracts the words tagged /VB or /VBx, where x is any single character:

library("openNLP")

acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."

acqTag <- tagPOS(acq)

sapply(strsplit(acqTag, "[[:punct:]]*/VB.?"), function(x) sub("(^.*\\s)(\\w+$)", "\\2", x))

     [,1]                           
[1,] "said"                         
[2,] "sold"                         
[3,] "engaged"                      
[4,] "said"                         
[5,] "is"                           
[6,] "did"                          
[7,] " not/RB explain./NN Reuter./."

OK, my regular expression needs some improvement to get rid of the last row in the result, which is the text left over after the final /VB match.
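One way to avoid a leftover row is to match the verbs directly instead of splitting on their tags. This is a minimal sketch, assuming the tagged text has the word/TAG form seen in the output; the sample string is a hand-written stand-in, not real tagPOS output.

```r
# Hand-written sample in word/TAG form (an assumption, not real tagger output).
tagged <- "which/WDT it/PRP did/VBD not/RB explain/VB ./."

# Match runs of word characters that are immediately followed by "/VB".
# The lookahead (?=...) needs perl = TRUE and keeps the tag out of the match.
verbs <- regmatches(tagged, gregexpr("\\w+(?=/VB)", tagged, perl = TRUE))[[1]]

print(verbs)
```

Because only the matched words are returned, nothing after the last verb ends up in the result.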

EDIT

An alternative is to drop the elements that still contain a whitespace character:

sapply(strsplit(acqTag, "[[:punct:]]*/VB.?"), function(x) {res <- sub("(^.*\\s)(\\w+$)", "\\2", x); res[!grepl("\\s", res)]})
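Since the question asks for nouns as well as verbs, the same idea can be generalized by splitting the tagged text into tokens and filtering on the tag. The following is a rough sketch under a few assumptions: the input is a single word/TAG string, the sample text and its tags are made up for illustration, and extract_by_tag is a hypothetical helper name, not part of openNLP.

```r
# Hand-written sample in word/TAG form (illustrative tags, not real tagger output).
tagged <- "The/DT company/NN said/VBD the/DT sale/NN is/VBZ subject/JJ to/TO adjustments/NNS ./."

# Hypothetical helper: return the words whose tag matches tag_pattern.
extract_by_tag <- function(tagged, tag_pattern) {
  tokens <- unlist(strsplit(tagged, "\\s+"))
  # Split each token at its last "/" so a word that contains "/" stays intact.
  words <- sub("/[^/]*$", "", tokens)
  tags  <- sub("^.*/", "", tokens)
  words[grepl(tag_pattern, tags)]
}

verbs <- extract_by_tag(tagged, "^VB")  # VB, VBD, VBZ, ...
nouns <- extract_by_tag(tagged, "^NN")  # NN, NNS, NNP, ...

print(verbs)
print(nouns)
```

Filtering on the tag rather than on the surrounding text means any other tag family can be pulled out by changing only the pattern.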
