The Magic Sieve of Eratosthenes
Surely everyone who reads this post more than once used, or at least heard about the sieve of Eratosthenes  a method of finding prime numbers. The very problem of obtaining prime numbers occupies a key place in mathematics; some cryptographic algorithms, such as RSA, are based on it. There are quite a few approaches to this problem, but in this article I will focus on some modifications of the simplest of them  the sieve of Eratosthenes.
The principle of the sieve is simple: let us need to find primes in the interval from unity to some N <= 10 ^ 6 . We start the array with N elements and fill it with true. Then we successively go through it to the root of N, and meeting true, cross out all numbers with this step to N. The algorithm looks compact and simple, I quote it in java.
 Arrays.fill(isPrime,true);
 isPrime[1] = false;
 for (int i=2; i*i < N; i++)
 if (isPrime[i])
 for (int j=i*i; j < N; j+=i)
 isPrime[j] = false;

The algorithm works for O (N * log (log (N))) , so it is quite suitable for small numbers. Consider the case when we need to find primes not from one to N, but from n to m. Let us have the restrictions 1 <= m <= n <= 10 ^ 9; nm <= 10 ^ 6 . Here we need to apply an algorithm called a double sieve. It consists in finding primes to the root of n, then storing them in a separate array and in the same way “deleting” these primes with a certain step, but from the interval [m, n] we need. Briefly, this algorithm will look like this:
 int primes = new int[P]; // заполняем его простыми до корня из n
 boolean sieve = new boolean[nm+1]; // вторичное решето
 Arrays.fill(sieve, true);
 for (int i= 0; i
 int h = m % primes[i];
 int j = h == 0 ? 0 : primes[i]  h;
 for (; j<=nm; j+=primes[i])
 sieve[j] = false;

So you can cope with ranges far enough from zero. Now we come to the most interesting point: what if we need to print prime numbers from 0 to N = 10 ^ 8 ? If we recall the asymptotics, at first glance it seems real even for a regular sieve, but looking at the memory costs: 10 ^ 8 * 4 = 400MB, we see that the task is not so trivial, especially if we have strict time limits. To solve this problem, two approaches can be distinguished, namely:
 sieve packaging
 cache count
Let's consider them a little more in detail. The packaging is this: in modern programming languages, the boolean type takes on average one to four bytes, although it can be encoded with just one bit. Therefore, we can create an array of integers and work with each of them as a bitmask, thereby saving memory by 830 times. The advantages are immediately visible, but now let us dwell a little more on the second idea, which is guaranteed to speed up the sieve several times
Many people know that in our computer there is a cache memory between the processor and RAM, work with it is much faster than with RAM, but its size is limited. For example, when working with a large array, the processor loads some part of it into the cache, works with it, then transfers it back to the operational one, loads another, and so on. And now let's recall our sieve algorithm: we deleted each prime number from the entire array, going through it from beginning to end. Therefore, the processor will load different segments of the array into the cache many times and the speed at this will be greatly lost. This approach suggests minimizing the cost of copying an array from one memory to another. It’s easy to do this if our entire gap is divided into pieces, up to 3 * 10 ^ 4 elements, which is approximately equal to the size of the cache and work with them in order. Then we will deal with the entire array for the minimum number of downloads. It will look something like this:
 int CACHE = 30000; // размер кэша
 int M = (int)Math.sqrt(N)+1;

 int primes = new int[P]; // массив простых чисел до корня из N
 boolean segment = new boolean[CACHE]; // вторичное решето
 for (int I=M1; I < N; I+=CACHE) {
 Arrays.fill(segment, true);
 for (int i= 0; i < P; i++) {
 int h = I % primes[i];
 int j = h > 0 ? primes[i]  h : 0;
 for (; j < CACHE; j+=primes[i])
 segment[j] = false;
 }
 for (int i= 0; i
 if (segment[i] && (i + I < N)) {
 out.println(i+I); // выводим простое число на экран
 }
 }
 }
Using this method, we greatly accelerate our sieve, practically without complicating it, for many tasks this helps significantly, for example, one of them: TDPRIMES , in this task you can practice and see well that a regular sieve does not fit in 10s, and a segmented one passes in 2.8.