Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Consider the following method:

public static boolean isPrime(int n) {
    return ! (new String(new char[n])).matches(".?|(..+?)\1+");
}

I've never been a regular expression guru, so can anyone fully explain how this method actually works? Furthermore, is it efficient compared to other possible methods for determining whether an integer is prime?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
711 views
Welcome To Ask or Share your Answers For Others

1 Answer

First, note that this regex applies to numbers represented in a unary counting system, i.e.

1       is 1
11      is 2
111     is 3
1111    is 4
11111   is 5
111111  is 6
1111111 is 7

and so on. Really, any character can be used (hence the .s in the expression), but I'll use "1".

Second, note that this regex matches composite (non-prime) numbers; thus negation detects primality.


Explanation:

The first half of the expression,

.?

says that the strings "" (0) and "1" (1) are matches, i.e. not prime (by definition, though arguable.)

The second half, in simple English, says:

Match the shortest string whose length is at least 2, for example, "11" (2). Now, see if we can match the entire string by repeating it. Does "1111" (4) match? Does "111111" (6) match? Does "11111111" (8) match? And so on. If not, then try it again for the next shortest string, "111" (3). Etc.

You can now see how, if the original string can't be matched as a multiple of its substrings, then by definition, it's prime!

BTW, the non-greedy operator ? is what makes the "algorithm" start from the shortest and count up.


Efficiency:

It's interesting, but certainly not efficient, by various arguments, some of which I'll consolidate below:

  • As @TeddHopp notes, the well-known sieve-of-Eratosthenes approach would not bother to check multiples of integers such as 4, 6, and 9, having been "visited" already while checking multiples of 2 and 3. Alas, this regex approach exhaustively checks every smaller integer.

  • As @PetarMinchev notes, we can "short-circuit" the multiples-checking scheme once we reach the square root of the number. We should be able to because a factor greater than the square root must partner with a factor lesser than the square root (since otherwise two factors greater than the square root would produce a product greater than the number), and if this greater factor exists, then we should have already encountered (and thus, matched) the lesser factor.

  • As @Jesper and @Brian note with concision, from a non-algorithmic perspective, consider how a regular expression would begin by allocating memory to store the string, e.g. char[9000] for 9000. Well, that was easy, wasn't it? ;)

  • As @Foon notes, there exist probabilistic methods which may be more efficient for larger numbers, though they may not always be correct (turning up pseudoprimes instead). But also there are deterministic tests that are 100% accurate and far more efficient than sieve-based methods. Wolfram's has a nice summary.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...