I have a large data.frame
where the first three columns contain information about a marker. The remaining columns are of numeric type for that marker in each individual. Each individual has three columns. The dataset looks as follows:
marker alleleA alleleB X818 X818.1 X818.2 X345 X345.1 X345.2 X346 X346.1 X346.2
1 kgp5209280_chr3_21902067 T A 0.0000 1.0000 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000
2 chr3_21902130_21902131_A_T A T 0.8626 0.1356 0.0018 0.7676 0.2170 0.0154 0.8626 0.1356 0.0018
3 chr3_21902134_21902135_T_C T C 0.6982 0.2854 0.0164 0.5617 0.3749 0.0634 0.6982 0.2854 0.0164
That is, for each marker (row), each individual has three values, one in each column.
I want to create a new data.frame
which has all the same rows as in the original, but only one column per individual. In the one column for each individual I want the value out of the three for each individual which is greater than 0.8. If no value is greater than 0.8 then I want to print NA. For instance, in the data set I have given for the first row I would want the second value for 818 (1.0000), and the first value for 345 (1.0000). In the second row, I want the first value for 818 (0.8626), and for 345 none of the values are above 0.8 so I want NA to be printed and so on. The new data set would therefore look like this:
marker alleleA alleleB X818 X345
1 kgp5209280_chr3_21902067 T A 1.0000 1
2 chr3_21902130_21902131_A_T A T 0.8626 NA
I have been trying to use if/else
statements, along the lines of if [, 4] > 0.8 then [, 4], else...
however it doesn't seem to give me what I want, and I would also have to loop this command so it doesn't just do it for one individual in the first three columns but for all columns.
Any help would be appreciated! Thanks in advance.
See Question&Answers more detail:os