Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I'm trying to run a fixed effects regression using the plm package. The regression code is as follows:

fixed <- plm(hp ~ crime, index = c('year', 'country'), data = data, model = 'within')

which returns the following error:

Error in pdim.default(index[[1]], index[[2]]) : duplicate couples (id-time)

I have searched the web, including Stack Overflow. What I understand is that plm can only handle two index variables, so if you have more than two IDs you have to 'cheat' plm by merging some of them before indexing. However, my data consists only of the columns country, year, hp and crime, so I do not understand how this is possible.

Essentially, what I'm asking is: am I doing something wrong? Do I still need to merge these two IDs, or is the fault duplicated rows in my data? If that is the case, is it possible to find the duplicates programmatically? (I have manually looked through my panel data for duplicated IDs, i.e. several house-price values for year 1 in country 1.)

If I run

any(table(data$country, data$year) != 1)

I get TRUE. As I understand it, this shows that there are no duplicated country-year combinations.
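As a side note, `any(table(...) != 1)` returning TRUE does not show that the data are free of duplicates: it means at least one country-year cell does not hold exactly one row, which can be a count of 0 (a missing year, i.e. an unbalanced panel) or a count of 2 or more (a duplicate). A minimal sketch separating the two cases, using a small made-up data frame in place of the real `data` (the values are purely illustrative):

```r
# Hypothetical toy panel standing in for the real `data`:
# country "A" has two rows for 2001 (a duplicate), country "B" has no row for 2001.
data <- data.frame(
  country = c("A", "A", "A", "B"),
  year    = c(2000, 2001, 2001, 2000),
  hp      = c(100, 105, 106, 90),
  crime   = c(5, 6, 6, 4)
)

# Number of rows in each country-year cell
tab <- table(data$country, data$year)
tab

any(tab != 1)  # TRUE if any cell is 0 or > 1, so on its own it proves nothing about duplicates
any(tab > 1)   # TRUE only if some country-year pair occurs more than once
any(tab == 0)  # TRUE only if some country-year pair is missing (unbalanced panel)

# List the duplicated country-year pairs, if any
dups <- as.data.frame(tab)
dups[dups$Freq > 1, ]
```

An unbalanced panel (cells with 0) is fine for plm; only cells with a count above 1 trigger the "duplicate couples" error.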


1 Answer

Consider the following well-formed panel data, with exactly one row per (id, time) pair:

set.seed(42)
(d1 <- transform(expand.grid(id=1:2, time=1:2), X=rnorm(4), y=rnorm(4)))
#   id time          X           y
# 1  1    1  1.3709584  0.40426832
# 2  2    1 -0.5646982 -0.10612452
# 3  1    2  0.3631284  1.51152200
# 4  2    2  0.6328626 -0.09465904

library(plm)
plm(y ~ X, index=c("id", "time"), d1)
# works

Now let's duplicate the last row to simulate a flaw in the data:

(d1 <- rbind(d1, d1[nrow(d1), ]))
#    id time          X           y
# 1   1    1  1.3709584  0.40426832
# 2   2    1 -0.5646982 -0.10612452
# 3   1    2  0.3631284  1.51152200
# 4   2    2  0.6328626 -0.09465904
# 41  2    2  0.6328626 -0.09465904  ## duplicated (X and y may be different though)

Fitting the same model now fails:

plm(y ~ X, index=c("id", "time"), d1)
# Error in pdim.default(index[[1]], index[[2]]) : 
#   duplicate couples (id-time)
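To find the offending rows programmatically, here is a sketch using base R's `duplicated()` on the index columns. It rebuilds `d1` exactly as above; the names `key` and `dup_rows` are only illustrative:

```r
# Rebuild d1 with one duplicated (id, time) row, as in the answer
set.seed(42)
d1 <- transform(expand.grid(id = 1:2, time = 1:2), X = rnorm(4), y = rnorm(4))
d1 <- rbind(d1, d1[nrow(d1), ])

key <- d1[c("id", "time")]
# duplicated() flags only the 2nd, 3rd, ... occurrence of a key;
# OR-ing it with fromLast = TRUE also flags the first occurrence.
dup_rows <- duplicated(key) | duplicated(key, fromLast = TRUE)
d1[dup_rows, ]
```

With the real data, replacing `d1` by `data` and `c("id", "time")` by `c("country", "year")` should list every country-year pair that occurs more than once.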

Similarly, we get an error if the data contain id, time and an additional condition variable, so that each id-time pair appears once per condition:

(d2 <- transform(expand.grid(id=1:2, time=1:2, cond=0:1), X=rnorm(4), y=rnorm(4)))
#   id time cond          X          y
# 1  1    1    0  2.0184237 -1.3888607
# 2  2    1    0 -0.0627141 -0.2787888
# 3  1    2    0  1.3048697 -0.1333213
# 4  2    2    0  2.2866454  0.6359504
# 5  1    1    1  2.0184237 -1.3888607
# 6  2    1    1 -0.0627141 -0.2787888
# 7  1    2    1  1.3048697 -0.1333213
# 8  2    2    1  2.2866454  0.6359504


plm(y ~ X, index=c("id", "time"), d2)
# Error in pdim.default(index[[1]], index[[2]]) : 
#   duplicate couples (id-time)
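A quick way to see why `d2` fails is to count the rows per (id, time) pair. This sketch rebuilds a `d2` with the same structure (its random values will differ from the printout above, which does not matter here) and tabulates it:

```r
set.seed(42)
d2 <- transform(expand.grid(id = 1:2, time = 1:2, cond = 0:1), X = rnorm(4), y = rnorm(4))

# Count rows per (id, time) pair; the named arguments become column names
counts <- as.data.frame(table(id = d2$id, time = d2$time))
counts
```

Every pair appears twice, once for each value of `cond`, so (id, time) alone cannot identify an observation.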

To get around this, we can technically combine the two indices into one (whether that makes sense statistically is another question):

(d2 <- transform(d2, id2=apply(d2[c("id", "cond")], 1, paste, collapse=".")))
#   id time cond          X          y id2
# 1  1    1    0  2.0184237 -1.3888607 1.0
# 2  2    1    0 -0.0627141 -0.2787888 2.0
# 3  1    2    0  1.3048697 -0.1333213 1.0
# 4  2    2    0  2.2866454  0.6359504 2.0
# 5  1    1    1  2.0184237 -1.3888607 1.1
# 6  2    1    1 -0.0627141 -0.2787888 2.1
# 7  1    2    1  1.3048697 -0.1333213 1.1
# 8  2    2    1  2.2866454  0.6359504 2.1

plm(y ~ X, index=c("id2", "time"), d2)
# works
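The row-wise `apply()` call works, but the same combined index can also be built with a vectorized `paste()` on the two columns. A sketch, assuming a `d2` built as above; `id2_apply` and `id2_paste` are illustrative names:

```r
set.seed(42)
d2 <- transform(expand.grid(id = 1:2, time = 1:2, cond = 0:1), X = rnorm(4), y = rnorm(4))

# Row-wise apply(), as in the answer
id2_apply <- apply(d2[c("id", "cond")], 1, paste, collapse = ".")
# Vectorized alternative: paste the two columns element by element
id2_paste <- paste(d2$id, d2$cond, sep = ".")

identical(unname(id2_apply), id2_paste)
```

If a factor is preferred, `interaction(d2$id, d2$cond)` gives the same labels, since its default separator is also `"."`.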

In the end, the following stopifnot() check must pass without an error, where c("id", "time") stands for whatever columns you pass to plm(..., index=c("id", "time")). For the duplicated d1 from above, it fails:

stopifnot(!any(duplicated(d1[c("id", "time")])))
# Error: !any(duplicated(d1[c("id", "time")])) is not TRUE
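If the check fails because rows were accidentally copied, one possible fix is to keep only the first row per (id, time) pair. This is a sketch under the assumption that the repeated rows are genuine copies; if X or y differ between them, you need to decide which observation is correct (or aggregate them) rather than dropping rows blindly:

```r
set.seed(42)
d1 <- transform(expand.grid(id = 1:2, time = 1:2), X = rnorm(4), y = rnorm(4))
d1 <- rbind(d1, d1[nrow(d1), ])   # one duplicated (id, time) row

key <- d1[c("id", "time")]
# Keep only the first row for each (id, time) pair.
# Assumption: the repeated rows are true copies of each other.
d1_clean <- d1[!duplicated(key), ]

stopifnot(!any(duplicated(d1_clean[c("id", "time")])))  # now passes
nrow(d1_clean)
```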
