Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

In my data, there exist observations for some IDs in some months and not for others, e.g.

dat <- data.frame(c(1, 1, 1, 2, 3, 3, 3, 4, 4, 4), c(rep(30, 2), rep(25, 5), rep(20, 3)), c('2017-01-01', '2017-02-01', '2017-04-01', '2017-02-01', '2017-01-01', '2017-02-01', '2017-03-01', '2017-01-01',
                    '2017-02-01', '2017-04-01'))
colnames(dat) <- c('id', 'value', 'date')

I would like to, for each id value, insert a row that includes the month(s) missing for that id and NA for value.

Is there a way to (somewhat) concisely do this for all months in seq(min(as.Date(dat$date)), max(as.Date(dat$date)), by = 'months')? I often use tidyverse and data.table, but am open to any approach.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
418 views
Welcome To Ask or Share your Answers For Others

1 Answer

tidyr::complete() fills missing values

add id and date as the columns (...) to expand for

library(tidyverse)

complete(dat, id, date)


# A tibble: 16 x 3
      id date       value
   <dbl> <date>     <dbl>
 1  1.00 2017-01-01  30.0
 2  1.00 2017-02-01  30.0
 3  1.00 2017-03-01  NA  
 4  1.00 2017-04-01  25.0
 5  2.00 2017-01-01  NA  
 6  2.00 2017-02-01  25.0
 7  2.00 2017-03-01  NA  
 8  2.00 2017-04-01  NA  
 9  3.00 2017-01-01  25.0
10  3.00 2017-02-01  25.0
11  3.00 2017-03-01  25.0
12  3.00 2017-04-01  NA  
13  4.00 2017-01-01  20.0
14  4.00 2017-02-01  20.0
15  4.00 2017-03-01  NA  
16  4.00 2017-04-01  20.0

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...