News:

MyKidsDiary.in :: Capture your kids magical moment and create your Online Private Diary for your kids

Main Menu

Prime numbers : C# in vb.net

Started by dhilipkumar, Dec 11, 2008, 08:43 PM

Previous topic - Next topic

dhilipkumar

This article will explore my "challenge" math library, talking about prime generation, primality testing, prime factorization and some other useful functions too.

Topics:

►Introduction
►The basics
►Prime generation
►Primality test
►Prime factorization
►Other stuff

Part 1: Introduction
Welcome! If you're searching the code to implement AKS in C# to factor RSA-2048 and win $200.000 (no it's not a php variable...) then you are in the wrong place. We're talking much simpler here. No AKS, no elliptic curves. Just something very tricky that everybody can understand but nonetheless very useful for many purposes.
I am addicted to challenges and when I play on maths challenge sites I use a math library I built challenge after challenge. I'm going to share this library with you in this article.


Part 2: The basics
So, let's talk about primes.
A prime number is an integer number that can be divided by 1 and itself and no others. This means that dividing a prime number for any number except itself or 1 will yield some non-integer result. The smallest prime number is number 2. Hey! You would complain about number 1 right? In fact it is divisible by itself and by 1 so it should be considered a prime number. What can I say... yes... you're right, and in fact it has been considered as a prime for many many years long time ago but not today since its primality is considered to be trivial.

Prime numbers were known in ancient Greece (to name just one) and are indeed the very basics of maths as we know it today. It is amazing to see how many times prime numbers are related to something we would never expect to be related to them. They are magical, absolutely non trivial and probably one of the most studied subjects today all over the world. If you can use you credit card when you buy some stuff in internet you should thank prime numbers, since your security depends on them.
In fact, one of the 7 most difficult math problems today is still the "Riemann hypotesis", which is considered to be the problem of the millennium.
So, let's begin making an extension of the prime definition:

A prime number is an integer number greater than 1 which is divisible just by itself (yielding 1) and 1 (yielding itself).


Part 3: Prime generation and primality testing
Let's see some methods that we could use to generate primes.
We could start with the very stupidest one. you start from 2 and check for any number if it is divisible by itself or 1. I mean, something like this:

int our_desired_limit=100;

for (int n=2;n<our_desired_limit;n++) {
    bool isPrime=true;
    for (int j=2;j<n;j++) {
        if ((n%j)==0) {
            isPrime=false;
            break;
        }
    }
    if (isPrime) {
        Console.WriteLine("{0} is prime.", n);
    }
}   


This little piece of code will show you all the primes from 2 to 100.
Can we improve it? yeah, a lot.
First of all consider this:

Let's say n is a number. You divide it by number a. The result is number b (a and b are always < n) hence n=a*b. What can you notice doing this operation? Well, I say that when a<sqrt(n) then b>sqrt(n) and when b<sqrt(n) then a must be >sqrt(n). Why? Let's call t=sqrt(n). Hence t*t=n. If a<sqrt(n) -> a<t hence a*t<t*t=n so a*t<n. To get to n you must multiply a for a number b which is surely > t. The vice versa is the same.
So, if I were checking the primality of number n, should I check until n-1 to be sure it's prime (as we did in the first snippet of code) or maybe I can stop somewhere before? I can stop at sqrt(n). In fact, checking from 2 to sqrt(n) is perfectly enough to be sure for primality. The reason is simple but let's analyse it.


►First case: n not a square number.

If n is not a square number and you find a number a that divides n, then you get some number b such that n=a*b. If a<sqrt(n) then b>sqrt(n). If a>sqrt(n), checking from 2 to sqrt(n) means that you should have found some b<sqrt(n) such that n=a*b. So if you didn't find any b<sqrt(n), you cannot find any b>sqrt(n).

►Second case: n is a square number.

Now let's see if n is a square number. Let's imagine n=p*p where p is a prime number. This means that the only number that divides n other than itself and 1 is p. If you go from 2 to p=sqrt(n) you find p and you're done, n is not prime. If n is not the square of a prime number, you're going to find a when you encounter the smallest divisor of n and this surely will happen before you get to sqrt(n).

Ok then, so we could change the code of the first snippet to:

int our_desired_limit=100;

for (int n=2;n<our_desired_limit;n++) {
    bool isPrime=true;
    for (int j=2;j<=(int)Math.Sqrt(n)+1;j++) {
        if ((n%j)==0) {
            isPrime=false;
            break;
        }
    }
    if (isPrime) {
        Console.WriteLine("{0} is prime.", n);
    }
}


And we are sure that the result will be correct anyway. But doing this we have reduced a lot the running time of the algorithm since now we stop our check at sqrt(n) and not at n.
Note that +1 in the for clause. Its meaning is because we must ensure to check until sqrt(n) and the sqrt() of a non square number gets truncated by int conversion.
So we just add 1 to be sure we check correctly.

Can we improve this again? Sure! What about this thougth? Every even number is divisible by 2. So, once n%2==1 (odd number) why to bother trying to divide n by any even number? No reason, since we know that being n an odd number it isn't divisible by any even number.
So, let's just check if n is divisible by 2. If not so, let's divide it by 3,5,7 and so on.

The code will look like this:

int our_desired_limit=100;

for (int n=2;n<our_desired_limit;n++) {
    bool isPrime=true;
    if (n==2) {
        Console.WriteLine("2 is prime");
        continue;
    } else {
        if ((n%2==0)) {
            // even number and not 2 => not prime
            continue;
        }
    }

    // now we know that n is not 2 and is not even so we adjust the check like this:

    for (int j=3;j<=(int)Math.Sqrt(n)+1;j+=2) {
        if ((n%j)==0) {
            isPrime=false;
            break;
        }
    }
    if (isPrime) {
        Console.WriteLine("{0} is prime.", n);
    }
}




dhilipkumar

so now the snippet just looks if n==2. If so then output that 2 is a prime number.
If not, check if it is even. If so then it is divisible by 2 hence is not a prime. If not, it is an odd number and hence we can test it just using odd numbers 3,5,7, ... , sqrt(n).

Nice, we have gone further reducing by (almost) a half the running time of the algorithm, since we now check just one number out of 2 (foreach couple of numbers j,j+1 where j is even, we just check with j+1).

Ok, you could think we're done now. The algorithm is fast enough. I disagree! There is a lot more that can be done. With the even number trick we just check 1500 numbers instead of 3000 (1 out of 2) but I can check just 1000 instead of 3000 (which is 1 out of 3).
Let's see why and how.

First consider this:

Every integer number n can be written like this: n=6*k+a where a belongs to [-1..4] and k is an integer number belonging to the Z set (positive and negative integer numbers with 0).
For example number 3 is written like 6*0+3 so k=0, a=3.
Number 18 is 6*3+0 so k=3, a=0.
Number 23 is 6*4-1 so k=4, a=-1.

Why this should help? Because:

If a=0 then 6*k is divisible by 2 and 3 hence not prime.
If a=2 then 6*k+2=2*(3*k+1) is divisible by 2 hence not prime.
If a=3 then 6*k+3=3*(2*k+1) is divisible by 3 hence not prime.
If a=4 then 6*k+4=2(3*k+2) is divisible by 2 hence not prime.

So, we have just realised that a prime number is necessarly of the form 6*k+1 or 6*k-1.

This means that when we check for primality we can discard any number of the form 6*k+a if a is 0,2,3 or 4.
So, -1..4 is six numbers. We have discarded 4 of them hence we check 2 out of 6 numbers which is 1 out of 3 I promised some lines before.

Let see how our algorithm becomes now:
(I have added an IsPrime() method to simplify the code)

class Program {

    public static bool IsPrime(long n) {
        // 2 and 3 are primes
            if (n==2 || n==3) return true;
            // check if even, divisible by 3, less than 2
            if ((n&0x1)==0 || n%3==0 || n<2) return false;
            long root=(long)System.Math.Sqrt(n)+1L;
            // we check just numbers of the form 6*k+1 and 6*k-1
            for (long k=6;k<=root;k+=6) {
                if (n%(k-1)==0) return false;
                if (n%(k+1)==0) return false;
            }
            return true;
        }

        static void Main(string[] args) {



            int our_desired_limit=100;

            for (int n=2;n<our_desired_limit;n++) {
                if (IsPrime(n)) Console.WriteLine("{0} is prime", n);
            }

            Console.ReadLine();

        }
    }


Check the IsPrime() method. It's commented to be understood.
Ok then, this example won't go any further because it's quite improved since the first version.

The best thing we could do to improve it a little more is to divide n just by prime numbers since they only are needed to check for primality. In fact, any number can be either prime or not prime (called composite) and when a number is composite you can always write it in a form such this:

n=p1e1 * p2e2 * ... * pheh where p1..ph are prime numbers and e1.. eh are their exponents.

To find all the prime factors of a number is called to factorize it.
For example number 200 can be factorized like this:

200 = 8*25 = 23 * 52

So p1=2, e1=3, p2=5, e2=2.


So, we need prime numbers, not just some of them, we want all of them in a given interval like [2..pn]. Ok, to divide just by prime numbers we must have an array of primes to use. Let say we want the first 1000 primes stored in an int[1000] array p.

We set the array (filled by default with 0's), then we put p[0]=2 and p[1]=3, which are the first two prime numbers. Then we just start from the first useful 6*k-1 number which is 5. Now, since our array is filled with the last tot primes we have found, we just use our array to check the primality of our number n. If it is prime then we put it in the array, else we check the next number. Obviously we stop our check at the root of the number we're checking.

Let's make an example:
We just have p[0]=2 and p[1]=3.
Next number is 6*1-1=5. 5 is divisible by 2 or 3? no. sqrt(5)=2 so we're done => p[2]=5.
Next is 6*1+1=7. Sqrt(7)=2 and 7 is not divisible by 2 or 3 so p[3]=7.
Next one is 6*2-1=11. Sqrt(11) is 3 so, since 11/2, 11/3 are not integer p[4]=11.
Next is 6*2+1=13. Sqrt(13)=3. 13/2, 13/3 are not integer so p[5]=13.
... and so on ...

This technique combines three tricks together.

►We just check for the primality of numbers which can be written as 6*k+1 or 6*k-1 for some k.

►We just divide our checked number by primes.

►We stop at sqrt(n) (eventually +1 to avoid int truncation when casting).

The C# code to do all this is wrapped in this method:

public static int[] GeneratePrimes(int dim) {
            if (dim<=0) {
                throw new ArgumentOutOfRangeException("Dim must be > 0");
            } else if (dim==1) {
                return new int[] { 2 };
            } else {
                int[] p=new int[dim];
                int root;
                p[0]=2;
                p[1]=3;
                int k=6;
                int n;
                int needle=-1;
                int index=2;
                bool prime=true;
                while (index<dim) {
                    n=k+needle;
                    root=(int)System.Math.Sqrt(n)+1;
                    for (int j=0;j<index && p[j]<=root;j++) {
                        if (n%p[j]==0) {
                            // not prime
                            prime=false;
                            break;
                        }
                    }

                    if (prime) {
                        // prime number
                        p[index++]=n;
                    } else {
                        prime=true;
                    }
                    k=(needle==1?k+6:k);
                    needle*=-1;
                }
                return p;
            }
        }


With this method I can generate the first 1000000 prime numbers in about 10 seconds on my p4 2.5 Ghz machine which I think is a nice result. Obviously, there are many other methods to do it, surely some others much quicker, but the aim of this article is to keep simplicity because this library will be for everybody, not just for who has studied high level maths. If you're asking yourself if this method will fail for any dim the answer is no. In fact, when we check primes we are sure that our array will be enough to test any n from 2 to sqrt(n). This is assured by some math theorems that I'm not going to talk about here.

Ok, we have our prime array (p from now on).
Let's see how can we change the IsPrime() method in order to use it and hence to speed itself up.

public static bool IsPrime(long num , int[] p) {
            if (num<2) return false;
            int i;
            long root=(long)System.Math.Sqrt(num)+1L;
            for (i=0; i<p.Length && p<root; i++) {
                if (num%p==0) {
                    return false;
                }
            }
            if (i==p.Length) {
                throw new ArgumentOutOfRangeException("p["+p.Length+"] is not enough to factorize "+num);
            }
            return true;
}   



Very easy. If num<2 it can't be prime. After that we just do the check of that number using our array. What if the smallest prime factor of our number is e.g. 97 and we have p which length is say 10? p[9]=29 and we can't reach 97 with our short p array. The for cycle exits without returning false and we would get a true result. Note that I avoid this possibility with the last check before returning. If we've come to the end of p and no prime factor has been found, maybe p is not long enough to check n so I throw an exception. This check could also be done at first but it would be not good for 2 reasons:

If you were to check 220 for example, you could just use p[0] which means p with length 1. And you could say "false! it is a composite number", but making the check before the for() cycle would yield an exception since p is too small to factorize a so large number. Also, I use this library to solve challenges and when you're talking about mathschallenge challenges you need speed. So I make a study about p length before starting my solver and I'm sure that p will fit the requirements hence I don't get in troubles with exceptions and I save time. Most of the time with a well designed algorithm you can solve a problem in 20 msecs while a trivial algorithm could take years!

In the file there are some IsPrime() overload. One with p array but for n int and not long, and another one designed when you don't have the primes array. They come in handy some times so I left them where they are. Let's move forward and see how primes factorization works.

dhilipkumar


Part 4: Prime factorization
As I quickly said before, prime factorization is the process that given a number n returns the factorization of n with prime numbers. For many reasons it could be useful to get the prime factorization of a number. Many math formulas work with factorization so I wrote my own algorithm to factorize a number.

Now, since I factorize one number at a time I chose to put in my Zm class three static fields. A long[] array Fact, representing the prime factors of a number, an int[] array Exp, representing the corresponding exponents and an int count field, sorrounded by a Count accessor.

If you were factorizing a number n e.g.
n= 23 * 52 * 114
then Fact would be like {2,5,11,*,*,*, ...,*}, Exp would be like {3,2,4,*,*,*,...,*} and count would be number 3.

This way you just cycle for(int x=0; x<count; x++) and get all the factors and exponents you just calculated. Notice that I don't care if from count to end of Fact and Exp there are other numbers given for example from a previous factorization. Count variable will always refer to last factorization.

As for the IsPrime() method I have written a couple of Factorize() methods. The first one takes the number n to be factorized and an array of primes whilst the second one takes just the number n to be factorized and works using the 6k+1,6k-1 trick.
I'll explain just the first one since I suppose it's the one you'll use most of the times.


public static void Factorize(long n, int[] p) {
            if (n<2) throw new ArgumentOutOfRangeException("n must be a positive number > 1");
            try {
                count=0;
                long root=(long)System.Math.Sqrt(n)+1L;
                for (int i=0; p<=root ;i++) {
                    if (n%p==0) {
                        n/=p;
                        Fact[count]=p;
                        Exp[count]=1;
                        while (n%p==0) {
                            n/=p;
                            Exp[count]++;
                        }
                        count++;
                        root=(long)System.Math.Sqrt(n)+1;
                    }
                }   
                if (n!=1) {
                    Fact[count]=n;
                    Exp[count++]=1;
                }   
            } catch {
                throw new ArgumentOutOfRangeException("P not long enough to factorize "+n);
            }
}


As you can see, the first thing is to set count=0. Then we start trying to see if number n is divisible by our prime numbers stored in p[]. When I encounter a prime that divides n, I put it in the Fact array and I put a 1 in the Exp array at the current position. Then I try to divide n again and while I get 0 as result I increase the exponent. Then the cycle goes ahead, increasing count and updating the root of the number we're factorizing, until p>root.

The cycle stops and we have to make one last check here, to see if n!=1.
If that is true it means that n is a prime number then it get pushed in the Fact array and its corresponding exponent set to 1 in the Exp array. If p is not long enough to factorize n, the for cycle will increase i until it goes out of p bounds. An exception will then be raised and catched by the code and then I'll raise my own "p not long enough" exception.

Note that Fact and Exp dimensions are set to 20. Maybe you would expect much more since maybe there could be long numbers that once factorized will yield more than 20 factors.
Well, this doesn't happen, because the product of the first 16 prime number exceeds long.MaxValue constant so you're sure that, being that the minimum factorization with all exponent set to 1, there is no other factorization longer than that one for any other long number. I just set 20 because I like round numbers.

If you are going to put this code for example in a java application with BigIntegers then you have to increase the length of Fact and Exp and consider making a check in the Factorize() method.



Part 5: Other stuff
The other stuff I left in the file I'm publishing here is just a bunch of functions I use sometimes and I consider quite useful. They are the Least Common Multiple (Lcm) function, the Greatest Common Divisor (gcd) function, the combinatorial nCr functions (gives the number of ways you can choose r different objects within a set of n distinct objects and it is n!/(r!*(n-r)!) but the calculation takes place with a smart semplification).

There is also the Phi function, AKA Euler Totient Function, which I needed some times playing on mathschallenge.net. They are all of immediate comprehension so I won't waste your time here. I'll just say one thing about the first Lcm() method. It doesn't use the classic Euclid algorithm (the second one does), but instead it uses prime factorization to fast calculate the the Lcm. That's a trick I learned on mathworld which is a site that I strongly suggest you to visit if you are interest in learning some maths.


A little disclaimer
The Zm class code is available here. Even if this code has been used for about one year to solve tens (or maybe hundreds) of challenges and it never has given troubles to me, please be aware if you're planning to use it in your own applications. I suggest you to read it, understand it and eventually correct bugs I didn't see.

I hope you liked the journey and know your knowledge about primes is deeper than before.


Curiosities
I want to leave you with a nice true story about prime numbers and how they are related to the most beautiful and strange situations in real life.
I'll tell you about cicadas.

These insects have a very strange life. They spend most of their life in the ground. At some point they get out. They have fun, eat and party for some weeks, then the females plant a bunch of eggs in the wood of the trees. After some weeks again the eggs hatch up and the babies get out. They immediately find their way down and dig into the ground ready to spend many years underground.

After some years they get out again and start another life cycle.
Now, as many other insects, they are killed (eaten) by periodic predators. There are for example the cicada killer wasps. These predators come out of the ground and eat the cicadas every two years so the cicadas had to find out a method to reduce the chances that the killer wasps had to meet them. How did they do? Simple! They started getting out from the ground every p years where p is typically 7, 13 or even 17 years. 7, 13 and 17 are prime numbers and this strongly reduces the possibilities for the two insects to find each other off the ground.

Suppose in fact that the cicadas get out every 12 years. If the predator would come out every 2, 3, 4 or 6 years, they will always find some cicadas off the ground. But if the cicadas would get out every 13 years, then they will meet just once every 26 years (or 39 or 52 or even 78 years, accordingly to how often the predators get out of the ground)!


surgency

In "other info" you talked about a file ("the other stuff I left in the file").
So, i liked your post and all this stuff, but i cannot find the file. Where is the file?
EDIT: sorry to bump though, your inbox is full