Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am starting with 3 large data tables (named A1,A2,A3). Each table has 4 data columns (V1-V4), 1 "Date" column that is constant across all three tables, and thousands of rows.

Here is some dummy data that approximates my tables.

A1.V1<-c(1,2,3,4)
A1.V2<-c(2,4,6,8)
A1.V3<-c(1,3,5,7)
A1.V4<-c(1,2,3,4)


A2.V1<-c(1,2,3,4)
A2.V2<-c(2,4,6,8)
A2.V3<-c(1,3,5,7)
A2.V4<-c(1,2,3,4)


A3.V1<-c(1,2,3,4)
A3.V2<-c(2,4,6,8)
A3.V3<-c(1,3,5,7)
A3.V4<-c(1,2,3,4)

Date<-c(2001,2002,2003,2004)

DF<-data.frame(Date, A1.V1,A1.V2,A1.V3,A1.V4,A2.V1,A2.V2,A2.V3,A2.V4,A3.V1,A3.V2,A3.V3,A3.V4)

So this is what my data frame ends up looking like:

  Date A1.V1 A1.V2 A1.V3 A1.V4 A2.V1 A2.V2 A2.V3 A2.V4 A3.V1 A3.V2 A3.V3 A3.V4
1 2001     1     2     1     1     1     2     1     1     1     2     1     1
2 2002     2     4     3     2     2     4     3     2     2     4     3     2
3 2003     3     6     5     3     3     6     5     3     3     6     5     3
4 2004     4     8     7     4     4     8     7     4     4     8     7     4

My goal is to calculate the row mean for each of the matching columns from each data table. So in this instance, I would want row means for all columns ending in V1, all columns ending in V2, all columns ending in V3 and all columns ending in V4.

The end result would look like this:

      V1  V2  V3  V4
2001   1   2   1   1
2002   2   4   3   2
2003   3   6   5   3
2004   4   8   7   4

So my question is: how do I go about calculating row means based on a partial match in the column name?

Thanks


1 Answer

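# Suffixes shared by the matching columns (named "suffixes" to avoid masking base::colnames)
suffixes <- c("V1", "V2", "V3", "V4")
# For each suffix, average the columns whose names end in ".<suffix>"
res <- sapply(suffixes, function(x) rowMeans(DF[, grep(paste0("\\.", x, "$"), names(DF))]))
rownames(res) <- DF$Date
res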
     V1 V2 V3 V4
2001  1  2  1  1
2002  2  4  3  2
2003  3  6  5  3
2004  4  8  7  4

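The R grep function returns an integer vector holding the positions of the column names that match the pattern. That vector is used to index the larger data frame and "pull" only the columns for one "V" suffix, which rowMeans then averages row by row. Because grep matches anywhere in the name, a bare pattern like "V1" would also match a column such as "V10"; anchoring the pattern (for example "\\.V1$") avoids that when names can overlap.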
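
To make the indexing step concrete, here is a minimal sketch that rebuilds the question's dummy data and inspects what grep returns for a single suffix. The helper names vals, idx and v1_means are only illustrative and do not come from the original post.

```r
# Rebuild the question's dummy data frame so the block runs on its own
vals <- list(V1 = c(1, 2, 3, 4), V2 = c(2, 4, 6, 8),
             V3 = c(1, 3, 5, 7), V4 = c(1, 2, 3, 4))
DF <- data.frame(Date = c(2001, 2002, 2003, 2004))
for (tbl in c("A1", "A2", "A3")) {
  for (v in names(vals)) {
    DF[[paste(tbl, v, sep = ".")]] <- vals[[v]]
  }
}

# Integer positions of the columns whose names contain "V1"
idx <- grep("V1", names(DF))
idx

# The column names found at those positions
names(DF)[idx]

# Row means over just those columns
v1_means <- rowMeans(DF[, idx])
v1_means
```

Printing idx and names(DF)[idx] before calling rowMeans is a quick way to check that a pattern selects exactly the columns you expect.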

If you needed to generate the names automagically:

> unique(sapply(strsplit(names(DF)[-1], ".", fixed=TRUE), "[", 2) )
[1] "V1" "V2" "V3" "V4"
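
Putting the two pieces together, the following is one possible end-to-end sketch: it derives the suffixes from the column names and then computes the row means per suffix. It assumes every value column is named "<table>.<variable>" with exactly one dot. The names value_cols, suffixes and cols are illustrative, and endsWith (base R 3.3.0 or later) is swapped in for grep here so the match is a literal suffix rather than a regular expression.

```r
# Rebuild the question's dummy data frame so the block runs on its own
vals <- list(V1 = c(1, 2, 3, 4), V2 = c(2, 4, 6, 8),
             V3 = c(1, 3, 5, 7), V4 = c(1, 2, 3, 4))
DF <- data.frame(Date = c(2001, 2002, 2003, 2004))
for (tbl in c("A1", "A2", "A3")) {
  for (v in names(vals)) {
    DF[[paste(tbl, v, sep = ".")]] <- vals[[v]]
  }
}

# Every column except Date holds values to be averaged
value_cols <- names(DF)[-1]

# Take the part after the dot in each name and keep the distinct suffixes
suffixes <- unique(sapply(strsplit(value_cols, ".", fixed = TRUE), "[", 2))

# For each suffix, average the columns whose names end in ".<suffix>"
res <- sapply(suffixes, function(s) {
  cols <- value_cols[endsWith(value_cols, paste0(".", s))]
  rowMeans(DF[, cols, drop = FALSE])  # drop = FALSE keeps a one-column match as a data frame
})
rownames(res) <- DF$Date
res
```

Matching on the literal ".V1" suffix means a column such as "A1.V10" would not be pulled into the V1 group.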
