Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I want to parallelize my code so that I can use all the cores, so I want to replace the for loop with a foreach loop. As I am a beginner in R, I could not understand how the different posts on this topic address the issue. It would be great if somebody could help me with it step by step (with a comment on each line, so that I can understand it). Below is the for loop that I want to replace with foreach:

# Jensen-Shannon divergence of two vectors; called inside the nested for loop below
JensShanDiver = function(a,b) {
        m = 0.5 * (a + b)
        LRa = ifelse(a > 0, log2(a/m), 0)
        LRb = ifelse(b > 0, log2(b/m), 0)
        JSD = 0.5 * (sum(a * LRa) + sum(b * LRb))
        return(JSD)
}

# An empty data frame with the same dimensions as the input
output <- data.frame(matrix(NA, nrow = nrow(input), ncol = ncol(input)))

# A zero vector with one entry per column of the input
v2 <- numeric(ncol(input))

for (j in seq_len(nrow(input))) {
  # Take row j of the input as a numeric vector
  v1 <- as.numeric(input[j, ])
  for (i in seq_along(v1)) {
    # Turn v2 into the unit vector for position i
    v2[i] <- 1
    # Use 1 where v1[i] is 0; otherwise the divergence between v1 and v2
    output_vec <- if (v1[i] == 0) 1 else JensShanDiver(v1, v2)
    # Reset position i to 0 for the next iteration
    v2[i] <- 0
    # Write the value to cell [j, i] of the output data frame
    output[j, i] <- output_vec
  }
}

A sample of the input data frame is given below:

dput(input)
structure(c(0, 0.5, 0.5, 1, 0.333333333333333, 0.333333333333333, 
0.333333333333333, 0, 0, 1, 0, 0.5, 0.5, 0, 0.333333333333333, 
0.333333333333333, 0.333333333333333, 0.5, 0.5, 0, 1, 0, 0, 0, 
0.333333333333333, 0.333333333333333, 0.333333333333333, 0.5, 
0.5, 0), .Dim = c(10L, 3L), .Dimnames = list(NULL, c("ranges_in_X51214", 
"ranges_in_X56499", "ranges_in_X6383")))

Here is the expected output for the given input:

> dput(output)
structure(list(X1 = c(1, 0.311278124459133, 0.311278124459133, 
0, 0.459147917027245, 0.459147917027245, 0.459147917027245, 1, 
1, 0), X2 = c(1, 0.311278124459133, 0.311278124459133, 1, 0.459147917027245, 
0.459147917027245, 0.459147917027245, 0.311278124459133, 0.311278124459133, 
1), X3 = c(0, 1, 1, 1, 0.459147917027245, 0.459147917027245, 
0.459147917027245, 0.311278124459133, 0.311278124459133, 1)), .Names = c("X1", 
"X2", "X3"), row.names = c(NA, 10L), class = "data.frame")

Your help will be much appreciated.
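
For reference, a single entry of the expected output can be checked by hand. The snippet below is only a sanity check, not part of the loop: it repeats the JensShanDiver function so that it runs on its own, and compares row 2 of the sample input with the unit vector for position 1. The names v1, v2, and jsd_check are just for illustration.

```r
# Same function as in the question, repeated so the snippet is self-contained
JensShanDiver <- function(a, b) {
  m <- 0.5 * (a + b)
  LRa <- ifelse(a > 0, log2(a / m), 0)
  LRb <- ifelse(b > 0, log2(b / m), 0)
  0.5 * (sum(a * LRa) + sum(b * LRb))
}

v1 <- c(0.5, 0.5, 0)   # row 2 of the sample input
v2 <- c(1, 0, 0)       # unit vector for position i = 1

jsd_check <- JensShanDiver(v1, v2)
print(jsd_check)       # about 0.3112781, the value of X1[2] in the expected output
```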


1 Answer

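Here is a first pass which removes the inner loop.
The ifelse call in the question was not constructed correctly: ifelse is vectorized and returns a value, so the assignment belongs outside the call rather than inside its arguments. Toggling v2[i] to 1 and back to 0 two steps later is also more bookkeeping than the computation needs.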
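
To illustrate the ifelse point, here is a small sketch with a made-up vector x (not taken from the question). ifelse works element by element and returns a vector, while a plain if / else handles one condition and returns one value, which is what a single cell [j, i] needs.

```r
x <- c(0, 0.5, 0.5)   # made-up values for illustration

# ifelse() tests every element and returns a vector of the same length
vec_result <- ifelse(x == 0, 1, x * 10)
print(vec_result)      # 1 5 5

# if / else tests a single condition and returns a single value
scalar_result <- if (x[1] == 0) 1 else x[1] * 10
print(scalar_result)   # 1
```

The first pass itself follows.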

input<-read.table(header=TRUE, text ="ranges_in_X51214 ranges_in_X56499 ranges_in_X6383
0.0              0.0               1
0.5              0.5               0
0.5              0.5               0")

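# Empty data frame with the same dimensions as the input
output <- data.frame(matrix(NA, nrow = nrow(input), ncol = ncol(input)))

# Zero vector with one entry per input column; position i is set to 1 on the fly
v2 <- numeric(ncol(input))

for (j in seq_len(nrow(input))) {
  # Take row j of the input as a numeric vector
  v1 <- as.numeric(input[j, ])
  # One value per position: 1 where v1 is 0, otherwise the divergence
  # between v1 and the unit vector for that position
  # (JensShanDiver is the function defined in the question)
  output_vec <- sapply(seq_along(v1), function(i) {
    if (v1[i] == 0) 1 else JensShanDiver(v1, replace(v2, i, 1))
  })
  # Write the whole row j at once
  output[j, ] <- output_vec
}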

This output matches the output of the original code. As the comments above say, there is additional optimization that can be done.
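
Since the question asks for foreach specifically, here is a minimal sketch of the outer loop in that form, assuming the foreach and doParallel packages are installed. The worker count of 2 and the three-row stand-in for input are assumptions for illustration only; the per-row logic is the same as in the question. Whether this is actually faster depends on the size of the input, since starting the workers has a cost of its own.

```r
library(foreach)      # provides foreach() and %dopar%
library(doParallel)   # provides registerDoParallel(); attaches parallel for makeCluster()

# Jensen-Shannon divergence, as defined in the question
JensShanDiver <- function(a, b) {
  m <- 0.5 * (a + b)
  LRa <- ifelse(a > 0, log2(a / m), 0)
  LRb <- ifelse(b > 0, log2(b / m), 0)
  0.5 * (sum(a * LRa) + sum(b * LRb))
}

# Stand-in for the real input: the first three rows of the sample
# (matrix() fills column by column)
input <- matrix(c(0, 0.5, 0.5,
                  0, 0.5, 0.5,
                  1, 0,   0), nrow = 3)

cl <- makeCluster(2)     # start 2 worker processes (assumed count)
registerDoParallel(cl)   # tell foreach to run %dopar% on these workers

# Each iteration handles one row j and returns a numeric vector;
# .combine = rbind stacks the returned vectors into a matrix
result <- foreach(j = seq_len(nrow(input)), .combine = rbind) %dopar% {
  v1 <- as.numeric(input[j, ])           # row j as a numeric vector
  sapply(seq_along(v1), function(i) {
    v2 <- numeric(length(v1))            # zero vector ...
    v2[i] <- 1                           # ... with a 1 at position i
    if (v1[i] == 0) 1 else JensShanDiver(v1, v2)
  })
}

stopCluster(cl)          # always shut the workers down when finished

rownames(result) <- NULL
output <- as.data.frame(result)
print(output)
```

Each iteration returns one row and .combine = rbind stacks them, so nothing is written to a shared output object from inside the loop. For the real data, drop the stand-in and use the full input; parallel::detectCores() reports how many cores are available.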

