r - Remove duplicates based on 2nd column condition


asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I am trying to remove duplicate rows from a data frame based on the maximum value in a different column.

So, for the data frame:

 df <- data.frame(rbind(c("a",2,3), c("a",3,4), c("a",3,5), c("b",1,3), c("b",2,6), c("r",4,5)))
 colnames(df) <- c("id","val1","val2")

  id val1 val2
  a    2    3
  a    3    4
  a    3    5
  b    1    3
  b    2    6
  r    4    5
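Because `c("a",2,3)` mixes a string with numbers, R coerces the whole vector to character, so after `rbind()` every column of `df` is character (or factor, in R versions before 4.0.0). If you control how the data is built, a minimal sketch like the following creates the same table with numeric `val1` and `val2` from the start; the name `df_typed` is hypothetical and only used for illustration:

```r
# c() with a string and numbers coerces everything to character.
is.character(c("a", 2, 3))

# Hypothetical alternative: build the same data column by column so that
# val1 and val2 stay numeric and no later conversion is needed.
df_typed <- data.frame(
  id   = c("a", "a", "a", "b", "b", "r"),
  val1 = c(2, 3, 3, 1, 2, 4),
  val2 = c(3, 4, 5, 3, 6, 5)
)
str(df_typed)
```

With `df_typed`, the `as.numeric(as.character(...))` step in the answer below becomes unnecessary.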

I would like to remove all duplicate rows by id, keeping only the row that has the maximum value of val2 within each id.

Thus the data frame should become:

  a    3    5
  b    2    6
  r    4    5

In other words, remove all duplicate "a" rows but keep the row with the maximum df$val2 within subset(df, df$id == "a"), and likewise for every other id.


1 Answer

深蓝 · Answer 1 · 2021-10-17T00:50:15+0000

Using base R. Here, the columns are factors (or character vectors in R 4.0.0 and later, where stringsAsFactors defaults to FALSE), so convert val2 to numeric first:

 df$val2 <- as.numeric(as.character(df$val2))
 df[with(df, ave(val2, id, FUN=max)==val2),]
 #  id val1 val2
 #3  a    3    5
 #5  b    2    6
 #6  r    4    5
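To see why the base R one-liner works, it can help to look at the intermediate vectors. A minimal sketch, assuming standalone vectors that mirror `df$id` and the already-converted numeric `df$val2` (the names `id`, `val2`, `group_max`, `keep` and `f` are hypothetical): `ave()` applies `FUN` within each group and returns a vector of the same length, so comparing it element-wise with `val2` yields the logical row index. The last lines show why `as.character()` is needed before `as.numeric()` on a factor.

```r
# Hypothetical standalone vectors mirroring df$id and the numeric df$val2.
id   <- c("a", "a", "a", "b", "b", "r")
val2 <- c(3, 4, 5, 3, 6, 5)

# ave() replaces each element with its group's max, keeping the length.
group_max <- ave(val2, id, FUN = max)
print(group_max)        # 5 5 5 6 6 5

# Element-wise comparison gives the logical index used in the answer.
keep <- group_max == val2
print(which(keep))      # 3 5 6

# as.numeric() on a factor returns the internal level codes, not the values.
f <- factor(c("10", "5"))   # levels are "10", "5"
print(as.numeric(f))                # 1 2
print(as.numeric(as.character(f)))  # 10 5
```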

Or, using dplyr (after the same numeric conversion of val2):

 library(dplyr)
 df %>% 
    group_by(id) %>% 
    filter(val2==max(val2))
 #   id val1 val2
 #1  a    3    5
 #2  b    2    6
 #3  r    4    5
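Both approaches above keep every row that ties for the maximum val2 within an id. If you need exactly one row per id, a possible base R sketch sorts by id and descending val2, then drops later occurrences with `duplicated()`. The tie data set `df_tie` below is hypothetical and only illustrates the difference; since `order()` leaves unresolved ties in their original order, the first tied row is the one kept.

```r
# Hypothetical data with a tie: both "a" rows share the maximum val2.
df_tie <- data.frame(
  id   = c("a", "a", "b"),
  val1 = c(1, 2, 7),
  val2 = c(9, 9, 4)
)

# The max-filter from the answer keeps all tied rows.
all_max <- df_tie[with(df_tie, ave(val2, id, FUN = max) == val2), ]
print(nrow(all_max))     # 3

# One row per id: sort by id, then by val2 descending, and keep the
# first occurrence of each id.
ord <- df_tie[order(df_tie$id, -df_tie$val2), ]
one_per_id <- ord[!duplicated(ord$id), ]
print(nrow(one_per_id))  # 2
```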
