
Do you know arrays?
I suspect that few of those preparing for their first interview for a (pre-)junior programmer position would answer this question in the negative, or even hesitate before answering yes. After all, it is such a simple data structure, with direct access by index — no tricks! True, in some languages like JavaScript or PHP arrays are implemented in very interesting ways and are in fact much more than just arrays. But this is not about that; it is about the "traditional" implementation of an array as a contiguous block of memory. There, the address is simply computed from the index and the element size, and the corresponding value is accessed. What could be complicated about that?
Let's figure it out, using Java as an example. We ask an unsuspecting applicant to create an n × n array of integers. They confidently write something like:
int g[][] = new int[n][n];
Excellent. Now we ask them to initialize the array elements with something — ones, say, or the sum of the indices. We get:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        g[i][j] = i + j;
    }
}
More often they even write:
for (int i = 0; i < g.length; i++) {
    for (int j = 0; j < g[i].length; j++) {
        g[i][j] = i + j;
    }
}
which is also worth a conversation, but right now we are after something else. We are trying to find out what the person knows and to see how they think. So we point out that the values are laid out symmetrically and ask them to save on loop iterations. Indeed, why run through all the index values when you can walk only the lower triangle? The subject usually agrees readily and, wisely singling out the main diagonal, carefully writes something like:
for (int i = 0; i < n; i++) {
    g[i][i] = 2 * i;
    for (int j = 0; j < i; j++) {
        g[j][i] = g[i][j] = i + j;
    }
}
Instead of
g[i][i] = 2 * i;
people often write g[i][i] = i + i;
or g[i][i] = i << 1;
and that, too, is worth talking about. But we go further and ask the key question: how much faster will the program run? The usual reasoning goes: almost half as many index computations, almost half as many value computations (the additions), and the same number of assignments — so roughly 30 percent faster. An applicant with a good mathematical schooling can even count the exact number of operations saved and give a better-founded estimate of the optimization's effect. Now is the time for the main blow. We run both versions of the code for some fairly large n (on the order of several thousand), for example like this .
The timed code
class A {
    public static void main(String[] args) {
        int n = 8000;
        int g[][] = new int[n][n];
        long st, en;
        // variant one: full row-major pass
        st = System.nanoTime();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                g[i][j] = i + j;
            }
        }
        en = System.nanoTime();
        System.out.println("\nOne time " + (en - st) / 1000000.d + " ms");
        // variant two: the triangular "optimization"
        st = System.nanoTime();
        for (int i = 0; i < n; i++) {
            g[i][i] = i + i;
            for (int j = 0; j < i; j++) {
                g[j][i] = g[i][j] = i + j;
            }
        }
        en = System.nanoTime();
        System.out.println("\nTwo time " + (en - st) / 1000000.d + " ms");
    }
}
What do we see? The "optimized" version runs 10–100 times slower! Now is the moment to watch the applicant's reaction to an unusual (or rather, quite usual in a developer's practice) stressful situation. If their face lights up with excitement and they start pressing keys, temporarily forgetting about your existence, that is a good sign — to a certain extent. You don't want to hire a researcher who doesn't care about the project's results? Then don't ask them "Why?". Ask them to rework the second version so that it really does run faster than the first.
Now you can safely go about your own business for a while. In half an hour you will have enough material to evaluate the applicant's main personal and professional qualities.
By the way, when I briefly described this problem on my work site, the most popular comment was "that's just your crooked Java." Especially for those commenters, I am posting the code in the Great and Free. The lucky owners of Free Pascal for Windows can take a peek
under the spoiler
program Time;

uses
  Windows;

var
  start, finish, res: int64;
  n, i, j: Integer;
  g: array of array of Integer;

begin
  n := 10000;
  SetLength(g, n, n);
  QueryPerformanceFrequency(res);
  QueryPerformanceCounter(start);
  for i := 1 to n - 1 do
    for j := 1 to n - 1 do
      g[i, j] := i + j;
  QueryPerformanceCounter(finish);
  writeln('Time by rows: ', (finish - start) / res, ' sec');
  QueryPerformanceCounter(start);
  for i := 1 to n - 1 do
    for j := 1 to n - 1 do
      g[j, i] := i + j;
  QueryPerformanceCounter(finish);
  writeln('Time by cols: ', (finish - start) / res, ' sec');
end.
In the Pascal code above I removed the "confounding" details and left only the essence of the problem — if it can even be called a problem.
So, what questions do we have for the applicant?
1. Why did it become slower? In as much detail as possible, please...
2. How can the initialization be made faster?
If there is a need to dig deeper into the Java implementation, ask the applicant to observe the execution times for small values of n. For example, on ideone.com with n = 117 the "optimized" version runs twice as slow, but at the very next value, n = 118, it is already 100 (one hundred) times faster than the non-optimized one! Suggest experimenting on a local machine and let them play with the JVM settings.
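One possible answer to question 2 — a sketch, not the only option — is to give up column-wise writes entirely and keep every store row-major, hoisting the row reference into a local variable (the class and method names here are mine, for illustration):

```java
// Sketch: row-major initialization with the row reference hoisted.
// This abandons the "half the work" idea but keeps every write sequential.
class FastInit {
    static int[][] init(int n) {
        int[][] g = new int[n][n];
        for (int i = 0; i < n; i++) {
            int[] gi = g[i];          // one row-object load per row, not per element
            for (int j = 0; j < n; j++) {
                gi[j] = i + j;        // strictly sequential writes into one int[]
            }
        }
        return g;
    }
}
```

Counterintuitively, doing "twice the arithmetic" this way tends to beat the triangular version, because the memory access pattern dominates the cost of the additions.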
By the way, does everyone understand what is happening?
A few words in justification
I want to say a few words in defense of this method of interviewing. Yes, I do not test knowledge of language syntax or of data structures. Perhaps in a civilized labor market all of that works. But in our conditions of total shortage of qualified personnel, we have to evaluate the applicant's likely fitness for the work they will actually face: the ability to learn, to dig in, to figure things out, to get things done.
In spirit, this resembles the "interview" used when recruiting legionnaires in ancient Rome. The future warrior was given a good scare, and the recruiters watched whether he blushed or turned pale. If he turned pale, then under stress the blood drains from his head and he is prone to a passive reaction — fainting, for example. If he blushed, then blood rushes to his head: he is inclined to act, to rush into the fight. Such a man was considered fit.
And one last thing. Why am I telling everyone about this problem instead of continuing to use it in interviews? Simple: potential applicants have already "learned" this one, and I have to use others.
Actually, I noticed this effect in connection with a real image-processing task. The situation was somewhat tangled, and I did not immediately understand why my fps had dropped so low after refactoring. Everyone probably has plenty of such wonderful moments of their own.
So far the leading theory is that the processor cache is "to blame": sequential access in the first variant works within the cache, which is refreshed only when crossing a certain boundary, whereas column-wise access forces the cache to be refreshed constantly, and that takes a lot of time. Let's test this theory in its pure form. Take an array and compare which is faster: processing all its elements in order, or processing the same number of elements at random indices. The program: ideone.com/tMaR2S . For 100,000 array elements, random access usually turns out noticeably faster. What does that tell us?
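For readers who cannot reach the link, here is my rough reconstruction of that experiment (the sizes, names, and repetition count are my own; the original may differ). Note that the random variant also pays for generating each random index, which muddies the comparison:

```java
import java.util.Random;

// Rough reconstruction of the sequential-vs-random access experiment.
class SeqVsRand {
    public static void main(String[] args) {
        final int n = 100_000, reps = 100;
        int[] a = new int[n];
        Random rnd = new Random(1);
        long sink = 0;

        long st = System.nanoTime();
        for (int r = 0; r < reps; r++)
            for (int i = 0; i < n; i++)
                sink += a[i];                  // strictly sequential reads
        long seq = System.nanoTime() - st;

        st = System.nanoTime();
        for (int r = 0; r < reps; r++)
            for (int i = 0; i < n; i++)
                sink += a[rnd.nextInt(n)];     // same count, random indices
        long rand = System.nanoTime() - st;

        System.out.println("sequential: " + seq / 1e6 + " ms");
        System.out.println("random:     " + rand / 1e6 + " ms");
        System.out.println(sink);              // keep the JIT from eliding the loops
    }
}
```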
Here Big_Lebowski rightly pointed out to me that swapping the loops shifts the results in favor of the sequential version. For the purity of the experiment I had to add a warm-up loop, and I also made several repetitions in order to report the average running time, as leventov advised. The result: ideone.com/yN1H4g . That is, random access to the elements of a large array is ~10% slower than sequential access. So the cache may indeed play a role — but in the original situation performance sagged far more dramatically. So something else is going on.
Gradually, the theory about extra work when moving from one row of the array to another is taking the lead. And it is correct. It remains to figure out what exactly happens there.
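Part of the answer is visible in the structure itself: in Java a "two-dimensional" array is an array of references to independent row objects, so a column-wise pass chases a different reference (and usually a different region of the heap) on every step. A minimal sketch:

```java
// Sketch: int[n][n] in Java is an array of n separate row objects.
class RowObjects {
    public static void main(String[] args) {
        int[][] g = new int[3][3];
        System.out.println(g.getClass().getName());    // [[I  - an array of int[]
        System.out.println(g[0].getClass().getName()); // [I   - a plain int[] row
        // Rows are independent objects and may sit anywhere on the heap:
        g[0] = g[1];           // perfectly legal - rows are just references
        g[1] = new int[7];     // rows need not even have the same length
        System.out.println(g[1].length);               // 7
    }
}
```

This is why g[j][i] in the inner loop must load a fresh row reference each iteration, while g[i][j] stays inside one contiguous int[].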