Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I need to draw a large number of random values from a sequence of integers, and I use the following code: result<-sample(x=c(2:50), size=10e6, replace=T). As I increase the length of the result vector (up to a length of 10^6), the values no longer look uniformly distributed when the length of the vector x is odd. When I plot a histogram of result, the column for the first number in the sequence (the '2' in this example) is always taller than the other columns, so it appears to contain more elements. If x=c(1:50), so that the length of x is even, the random generator seems to behave correctly. Is there a known issue with R's random number generators that explains this strange result? I use R 3.0.1 under Ubuntu 13.10.

1 Answer

As I mentioned in my comment above, this has absolutely nothing to do with random number generators.

Consider:

set.seed(123)
result <- sample(x=c(2:50), size=10e4, replace=TRUE)
x <- hist(result)

[Image: histogram of result with the default breaks; the first bar is noticeably taller than the others]

Something looks wrong, eh? But look closer:

> x$breaks
 [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
> x$counts
 [1] 6132 3971 4179 4115 4108 4002 4145 4073 4192 4117 4123 4099 4054 4013 4067 4055 4073 4082 4095
[20] 4088 4044 4050 4027 4096

versus...

> table(result)
result
   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21 
1979 2100 2053 1978 1993 2152 2027 2058 2057 2074 2034 1991 2011 2075 2070 2067 2006 2047 2145 2019 
  22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39   40   41 
2098 2060 2063 2099 2000 2016 2038 1990 2023 1976 2091 2060 1995 2061 2012 2003 2079 2008 2087 2036 
  42   43   44   45   46   47   48   49   50 
2052 1989 2055 2044 2006 2001 2026 2062 2034 

Note that the first bin from hist includes all of the 2, 3 and 4 values, while every other bin covers only two values. This is because the default binning strategy employed by hist adds some "fuzziness" to the bin boundaries, which results in the first two break points being slightly less than 2.0 and slightly more than 4.0. Combine that with the intervals being right-closed (and with the default include.lowest = TRUE, which puts values equal to the lowest break into the first bin), and you get the resulting histogram.
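To check this numerically rather than by eye, the bin counts can be compared with the per-value counts. This is only a sketch: the names h, tab, first_bin and second_bin are introduced here for illustration, and because newer R versions sample differently from R 3.0.1, the exact counts may not match the output shown above even with the same seed. The relationship between the two sets of counts should hold either way.

```r
set.seed(123)
result <- sample(x = 2:50, size = 1e5, replace = TRUE)

# Compute the histogram without drawing it
h <- hist(result, plot = FALSE)

# Count how often each integer occurs
tab <- table(result)

# The first bin is [2, 4], so it collects three distinct values
first_bin <- sum(tab[c("2", "3", "4")])

# The second bin is (4, 6], so it collects only two
second_bin <- sum(tab[c("5", "6")])

# Both comparisons should be TRUE
h$counts[1] == first_bin
h$counts[2] == second_bin
```

So the taller first bar reflects how many distinct integers fall in that bin, not how often the value 2 was drawn.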

Compare with:

hist(result, breaks = 1:50)

[Image: histogram of result with breaks = 1:50]
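For integer data, another option is to put the breaks halfway between the integers, so that no value sits on a bin boundary. A minimal sketch, assuming the same result vector as above; the names half_breaks and h2 are made up for this example:

```r
set.seed(123)
result <- sample(x = 2:50, size = 1e5, replace = TRUE)

# Breaks at 1.5, 2.5, ..., 50.5: each bin holds exactly one integer
half_breaks <- seq(1.5, 50.5, by = 1)
h2 <- hist(result, breaks = half_breaks, plot = FALSE)

# The bin counts should now match the per-value counts
all(h2$counts == as.vector(table(result)))
```

Dropping plot = FALSE draws the histogram. Since the values are discrete, barplot(table(result)) is arguably a more natural display and avoids the binning question altogether.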

