I have two methods for generating m distinct random integers in the range [0..n-1].
Method 1:
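// C++ (assumes m <= n, otherwise the inner loop never terminates)
#include <algorithm>
#include <cstdlib>
#include <vector>

std::vector<int> method1(int m, int n)
{
    std::vector<int> result(m);
    for (int i = 0; i < m; ++i)
    {
        int r;
        do
        {
            r = std::rand() % n;
            // retry while r is already among the i values chosen so far (indices 0..i-1)
        } while (std::find(result.begin(), result.begin() + i, r) != result.begin() + i);
        result[i] = r;
    }
    return result;
}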
Method 2:
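// C++ (std::random_shuffle was removed in C++17, so std::shuffle is used here)
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>
std::vector<int> method2(int m, int n)
{
    std::vector<int> arr(n);
    std::iota(arr.begin(), arr.end(), 0);  // arr[i] = i
    std::shuffle(arr.begin(), arr.end(), std::mt19937{std::random_device{}()});
    return std::vector<int>(arr.begin(), arr.begin() + m);  // first m elements of arr
}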
The first method is more efficient when n is much larger than m, whereas the second is more efficient otherwise. But "much larger" isn't that strict a notion, is it? :)
Question: What formula in n and m should I use to determine which of the two methods will be more efficient (in terms of the expected running time)?