Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am trying to exclude only the empty rows that are at the end of a data.table. Is there a packaged and fast way of doing it?

EDIT1: to be precise, the selection criterion is: drop every row that is empty (NA in all columns) AND whose subsequent rows are all empty as well (or that is itself the last row).

I came up with the solution below, which works but is too slow (I am using this function on thousands of tables), probably because of the while loop.

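library(data.table)
library(magrittr)  # provides %>%

## Aux function to remove NA rows below table
remove_empty_row_last <- function(dt){
  # flag rows where every column is NA (note: := adds row_empty to dt by reference)
  dt[, row_empty := rowSums(is.na(dt)) == ncol(dt)]
  # drop the last row repeatedly while it is flagged as empty
  while (dt[.N, row_empty] == TRUE) {
    dt <- dt[1:(.N-1)]
  }
  dt %>% return()
}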

d <- data.table(a=c(1,NA,3,NA,5,NA,NA),b=c(1,NA,3,4,5,NA,NA))
remove_empty_row_last(d)

#EDIT2: adding more test cases
d2 <- data.table(A=c(1,NA,3,NA,5,1 ,NA),B=c(1,NA,3,4,5,NA,NA))
remove_empty_row_last(d2)
d3 <- data.table(A=c(1,NA,3,NA,5,NA,NA),B=c(1,NA,3,4,5,1,NA))
remove_empty_row_last(d3)

#EDIT3: adding a test case with no empty rows
d4 <- data.table(A=c(1,2,3,NA,5,NA,NA),B=c(1,2,3,4,5,1,7))
d4 %>% remove_empty_row_last()


1 Answer

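Maybe this will be fast enough? It flags the empty rows once, then drops the last run of rows only when that run is itself empty, so a table whose last row holds data is left untouched:

empty <- rowSums(is.na(d)) == ncol(d)
d[!(empty & rleid(empty) == max(rleid(empty)))]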
    a  b
1:  1  1
2: NA NA
3:  3  3
4: NA  4
5:  5  5
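
Another loop-free way to express the same selection criterion is to find the last row that holds at least one value and keep everything up to it. The following is only a sketch: the helper name `drop_trailing_empty_rows` is hypothetical (it does not come from the question or from data.table), and it assumes "empty" means NA in every column. It has not been benchmarked against the approaches above.

```r
library(data.table)

# Hypothetical helper (name is an assumption, not part of data.table):
# drop only the trailing run of all-NA rows, without a loop and without
# adding a helper column to the input table.
drop_trailing_empty_rows <- function(dt) {
  if (nrow(dt) == 0L) return(dt)
  # TRUE for rows where every column is NA
  empty <- rowSums(is.na(dt)) == ncol(dt)
  # index of the last row with at least one value; 0 if every row is empty
  last_filled <- if (all(empty)) 0L else max(which(!empty))
  # keep rows 1..last_filled (zero rows when last_filled is 0)
  dt[seq_len(last_filled)]
}

d <- data.table(a = c(1, NA, 3, NA, 5, NA, NA), b = c(1, NA, 3, 4, 5, NA, NA))
drop_trailing_empty_rows(d)  # interior empty row 2 is kept
```

Because the empty-row flag is computed once as a plain vector, the input table is not modified by reference, which may matter when the same function is applied to thousands of tables.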
