Java job interview. Collections
Recently, I have had a persistent idea that professional development has slowed down a lot and I want to somehow fix it. Yes, I read books, listen to courses, but at the same time, it also comes to understanding that maybe it's time to change jobs, it seems like everything has been studied here, we are gradually moving into a routine. This idea encouraged me to send my resume to several companies - market leaders. After completing the interview in 3 of them, I decided how to bring my 5 kopecks into the coverage of the extensive topic of the interview, namely technical issues on Java collections that I have to deal with. Yes, I know, the reader will say: “collections are a hackneyed topic as much as possible,” but I asked part of the questions below to my acquaintances who are developers who occupy precisely the positions of developers (“strong middle peasants,” by the standards of a remote place not far from Moscow, which are confidently doing their work in practice, but in theory, let’s say there are gaps, because the work does not require solving some non-trivial tasks, and because not everyone is interested in studying inside data structure works), caused confusion. I think that the material discussed will not be very interesting to developers above the Junior level (I will ask them to comment, supplement and criticize the material presented here), but Junior’s sure will find in this article interesting for themselves.
Frankly, during the interview I didn’t know the answers to some of the questions below, although it seems like the junior stage has already passed. This is doubly offensive, given the position in those companies where sympathy arose from everything from communicating with HR and ending with a possible future area of activity, it was not possible to receive an offer and there were questions about collections that I could not handle (I’m sure they have made their negative contribution). But where everything went quite well from the point of view of the interview, the proposed field of activity and communication with future colleagues as a whole left a negative, so the law of "meanness" is in all its glory. As a result, with this topic I want to fill in the gaps found in my head + systematize this knowledge on paper.
In the article, I will consider not only the questions that caused me difficulties at the last interviews, but also the questions that I was asked for all my practice of interviewing. Well, I think it's time to move on to the questions:
1. What is the difference between ArrayList and LinkedList?
In my ranking, this is one of the two most popular questions about the collection, they ask in 90% of cases. It caused me a problem at my first interview at Junior Developer. In short, the answer to this question boils down to the following: ArrayList is an array-based list, and LinkedList is a classic linked list based on objects with links between them.
Advantages of ArrayList: in the ability to access an arbitrary element by index for a constant time (since it is an array), the minimum overhead when storing such a list, insertion at the end of the list on average is also done for a constant time. Averagebecause the array has a certain initial size n (in the code this is the capacity parameter), by default n = 10, when writing n + 1 elements, a new array of size (n * 3) / 2 + 1 will be created, all will be placed in it elements from the old array + new, added element. As a result, we get that when adding an element if it is necessary to expand the array, the time for adding will be much longer than when writing an element to a finished empty cell. However, on average, the insertion time of an item at the end of the list is constant. Removal of the last element occurs in constant time. The disadvantages of ArrayList appear when inserting / deleting an element in the middle of the list - this causes overwriting all elements placed “to the right” in the list one position to the left, in addition, when deleting elements, the size of the array does not decrease,
LinkedList, on the contrary, can insert / delete elements in the list for a constant time (namely, insert and delete, searching for the position of the insert and delete is not included here). Access to an arbitrary element is carried out in linear time (but access to the first and last element of the list is always carried out in constant time - links are constantly stored to the first and last, so adding an element to the end of the list does not mean that you have to iterate over the entire list in search of the last item). In general, LinkedList in absolute terms loses ArrayList both in memory consumption and in speed of operations. LinkedList is preferably used when there is active work (insertion / deletion) with the middle of the list or in cases where a guaranteed time is required to add an item to the list.
For in-depth and at the same time express training, I highly recommend reading the wonderful articles by tarzan82 about ArrayList and LinkedList . I also recommend an article from lany on memory consumption by collections - very informative .
2. What do you usually use (ArrayList or LinkedList)? Why?
This question is a slightly disguised version of the previous one, since the answer to this question will lead to a gradual presentation of the answer to the previous question. In 90% of cases, ArrayList will be faster and more economical than LinkedList, so they usually use ArrayList, but nevertheless there are always 10% of cases for LinkedList. I say that I usually use ArrayList, referring to the tests and the last paragraph from the previous question, but I do not forget about LinkedList (in what cases? Also the last paragraph of the previous question helps).
3. What works faster ArrayList or LinkedList?
Another disguised version of the first question. It is trickier than the above options that posing a question implies a monosyllabic answer with the choice of one of the proposed options, which, as I understand it, should immediately identify a person with a shallow knowledge in collections. The correct action will be the counter question of what actions will be performed on the structure. As a result, the dialogue smoothly proceeds to answer the first question.
4. It is necessary to add 1 million. element, what structure are you using?
Also a pretty popular hidden version of the first question. Also, the statement involves the selection of one of the proposed options, although in fact there is no information for an unambiguous choice. Additional questions need to be asked: in which part of the list are items added? Is there any information about what will happen next to the list items? any restrictions on memory or speed of execution? In general, this is the same first question, but on a slightly different side: through additional questions, you show the depth of understanding of the work of Array and Linked List.
Once I myself “pecked” at this hook, thinking to myself what to add was to “insert” at the end of the list and intensively promoted ArrayList, although I did not know (and did not try to find out) about further actions with this list and possible restrictions.
5. How are elements removed from an ArrayList? How does the size of the ArrayList change in this case?
Again, the answer to question 1 contains the answer to this question. When you remove an arbitrary element from the list, all elements located to the “right” are shifted one cell to the left and the actual size of the array (its capacity, capacity) does not change in any way. There is a mechanism for automatic “expansion” of the array, but there is no automatic “compression”, you can only explicitly perform “compression” with the trimToSize () command.
6. Suggest an effective algorithm for removing several adjacent elements from the middle of the list implemented by ArrayList.
An unbroken, by my standards, question I met only once, when I did not know the mechanism for removing elements from an ArrayList. As a result, it caused me serious difficulties. In fact, everything is quite simple and obvious when you know how one element is deleted. Suppose you want to remove n elements from position m in the list. Instead of deleting one element n times (each time shifting by 1 position the elements that are “to the right” in the list), you need to shift all elements that are “to the right” of the n + m position by n elements to the left to the top of the list. Thus, instead of performing n iterations of moving list items, everything is done in 1 pass.
7. How is the HashMap arranged?
This is the second from a list of the most popular collection questions. I don’t even remember if there was a case when this question was not asked to me.
In short, a HashMap consists of “baskets”. From a technical point of view, “baskets” are array elements that store links to item lists. When adding a new key-value pair, it calculates the hash code of the key, based on which the basket number (cell number of the array), into which the new element gets, is calculated. If the basket is empty, then the link to the newly added element is saved in it, if there is already an element there, then the links are followed sequentially between the elements in the chain, in search of the last element, from which the link to the newly added element is placed. If an item with the same key was found in the list, then it is replaced. Adding, searching and deleting elements is performed in constant time. Everything seems to be cool, with one caveat, hash functions should evenly distribute items across baskets,
In general, this answer is quite enough for the question posed, then most likely a HashMap dialogue will begin, with an in-depth understanding of the processes and subtleties.
Again, I recommend reading the article tarzan82 on HashMap .
8. What is the initial number of baskets in a HashMap?
A rather unexpected question, again, he once made me guess the number of baskets when using the default constructor.
The answer here is 16. Answering, it is worth noting that it is possible using constructors with parameters: through the capacity parameter, set your initial number of baskets.
9. What is the estimate of the time complexity of selecting an item from a HashMap? Does HashMap guarantee the specified complexity of fetching an element?
The answer to the first part of the question can be found in the answer to question 7 - constant time is necessary for selecting an element. Here on the second part of the question, I recently got confused. And the HashMap device knew and also knew about the hash function, but wasn’t ready for such a question, it rushed in the other direction in general and focused on the structure of the HashMap, pushing back the problem of the hash code, which in my head was always used to consider a hash code with uniform distribution. In fact, the answer is quite simple and follows from the answer to question 7.
If you take a hash function that constantly returns the same value, then the HashMap will turn into a linked list, with disgusting performance. Then, even if you use a hash function with a uniform distribution, in the extreme case only the time complexity of log N will be guaranteed. So, the answer to the second part of the question is no, it is not guaranteed.
10. The role of equals and hashCode in a HashMap?
The answer to this question follows from the answer to question 7, although it is clearly not spelled out there. hashCode allows you to define a basket to search for an item, and equals is used to compare the keys of the items in the list inside the basket and the key to find.
11. The maximum number of hashCode () values?
Everything is pretty simple here, just remember the method signature: int hashCode (). That is, the number of values is equal to a range of type int - 2 ^ 32 (the exact range was never asked, such an answer was enough).
12. How and when does the number of baskets in HashMap increase?
This is a pretty subtle question. As my mini-survey showed, if the essence of the HashMap device is understood by many more or less clearly, then this question often confused the interlocutor.
In addition to capacity, the HashMap also has the loadFactor parameter, based on which the maximum number of busy baskets is calculated (capacity * loadFactor). By default, loadFactor = 0.75. Upon reaching the limit value, the number of baskets increases by 2 times. For all stored items, a new “location” is calculated taking into account the new number of baskets.
13. In what case can an element be lost in a HashMap?
This interesting question was sent to me by LeoCcoder, they didn’t ask me like that and honestly admit that after reading it right away I couldn’t come up with a script for losing an element. Everything, again, turned out to be quite simple, although not so explicit: for example, an object with several fields is used as a key, not a primitive. After adding an element to the HashMap, the object that acts as the key changes one field that is involved in the calculation of the hash code. As a result, when you try to find this element by the source key, the correct basket will be accessed, but equals (after all, equals and hashCode must work with the same set of fields) will not find the specified key in the list of elements. Nevertheless, even if equals is implemented in such a way that changing the given field of the object does not affect the result, then after increasing the size of the baskets and recalculating the hash codes of the elements,
14. Why can't you use byte [] as a key in a HashMap?
Another question from LeoCcoder. As usual, everything turned out to be quite simple - the hash code of the array does not depend on the elements stored in it, but is assigned when creating the array (the method for calculating the hash code of the array is not overridden and is calculated using the standard Object.hashCode () based on the address of the array). Also, arrays do not override equals and perform pointer comparisons. This leads to the fact that you cannot access the element stored with the array key when using another array of the same size and with the same elements, access can be done only in one case - when using the same array reference that was used to save item. For the answer to this question, a special thanks goes to the user @dark_dimius.
15. What are the differences between TreeSet and HashSet?
To begin with, Set is a set (also called a “set”). Set does not allow storage of two identical elements. Formally speaking, the term "multitude" and so it means a collection of different elements, it is very important that it is different elements, since this is the main property of Set. Given this definition, an explanation about the storage of identical elements is not required, but in everyday life, the concept of “set” has lost its strict meaning regarding the uniqueness of the elements included in it, so nevertheless specify this property of the set separately.
TreeSet provides orderly storage of elements in the form of a red-black tree. The complexity of performing basic operations in TreeSet lg N. A HashSet uses the same approach for storing elements as a HashMap, with the difference that in the HashSet the element itself acts as a key, in addition, the HashSet (like the HashMap) does not support ordered storage of elements and provides temporary complexity of operations like HashMap.
16. TreeSet device?
This question is asked instead of question 14, and here is a short answer that the TreeSet is based on a red-black tree. As a rule, this is enough and the interlocutor immediately proceeds to the next question; I have never been asked the mechanism of balancing the tree or other details of its implementation.
For express deepening knowledge of red and black wood, I recommend this article .
17. What happens if you add items to the TreeSet in ascending order?
Usually, the interlocutor precedes this question with the phrase that the TreeSet is based on a binary tree and if we add elements in ascending order, how they will be distributed across the tree.
If there is no exact idea about the TreeSet device, but there is a general understanding that this is a binary tree (which the interlocutor additionally assures us of), then this question can lead to an interesting result: all elements after being added to a regular binary tree will be in one branch with a length of N elements, which negates all the advantages of such a structure as a tree (in fact, a list is obtained). In fact, as mentioned above, TreeSet is based on a red-black tree that can balance itself. As a result, TreeSet doesn’t care in what order you add elements to it, the advantages of this data structure will be preserved.
Conclusion
I hope the questions discussed will be useful to the habrayuzer. I also ask you to forgive me for some possible naivete that the above questions require such a detailed consideration, but in due time such an article would seriously help me. I am sure that there are inaccuracies in the article - I ask for comments, in addition, I hope that more experienced comrades in the comments will actively share questions from their practice and, if the article is favorably received by the habrasociety, then it is quite possible to continue the review of technical issues for Java interviews.
PS A bit of mercantile interest: the search for a new job continues and if one of the habrayuzers is in the process of searching for a Java developer in a company with a modern approach to development and interesting tasks, or may just recommend taking a closer look at any suitable vacancy - I will be grateful, please PM.
Frankly, during the interview I didn’t know the answers to some of the questions below, although it seems like the junior stage has already passed. This is doubly offensive, given the position in those companies where sympathy arose from everything from communicating with HR and ending with a possible future area of activity, it was not possible to receive an offer and there were questions about collections that I could not handle (I’m sure they have made their negative contribution). But where everything went quite well from the point of view of the interview, the proposed field of activity and communication with future colleagues as a whole left a negative, so the law of "meanness" is in all its glory. As a result, with this topic I want to fill in the gaps found in my head + systematize this knowledge on paper.
In the article, I will consider not only the questions that caused me difficulties at the last interviews, but also the questions that I was asked for all my practice of interviewing. Well, I think it's time to move on to the questions:
1. What is the difference between ArrayList and LinkedList?
In my ranking, this is one of the two most popular questions about the collection, they ask in 90% of cases. It caused me a problem at my first interview at Junior Developer. In short, the answer to this question boils down to the following: ArrayList is an array-based list, and LinkedList is a classic linked list based on objects with links between them.
Advantages of ArrayList: in the ability to access an arbitrary element by index for a constant time (since it is an array), the minimum overhead when storing such a list, insertion at the end of the list on average is also done for a constant time. Averagebecause the array has a certain initial size n (in the code this is the capacity parameter), by default n = 10, when writing n + 1 elements, a new array of size (n * 3) / 2 + 1 will be created, all will be placed in it elements from the old array + new, added element. As a result, we get that when adding an element if it is necessary to expand the array, the time for adding will be much longer than when writing an element to a finished empty cell. However, on average, the insertion time of an item at the end of the list is constant. Removal of the last element occurs in constant time. The disadvantages of ArrayList appear when inserting / deleting an element in the middle of the list - this causes overwriting all elements placed “to the right” in the list one position to the left, in addition, when deleting elements, the size of the array does not decrease,
LinkedList, on the contrary, can insert / delete elements in the list for a constant time (namely, insert and delete, searching for the position of the insert and delete is not included here). Access to an arbitrary element is carried out in linear time (but access to the first and last element of the list is always carried out in constant time - links are constantly stored to the first and last, so adding an element to the end of the list does not mean that you have to iterate over the entire list in search of the last item). In general, LinkedList in absolute terms loses ArrayList both in memory consumption and in speed of operations. LinkedList is preferably used when there is active work (insertion / deletion) with the middle of the list or in cases where a guaranteed time is required to add an item to the list.
For in-depth and at the same time express training, I highly recommend reading the wonderful articles by tarzan82 about ArrayList and LinkedList . I also recommend an article from lany on memory consumption by collections - very informative .
2. What do you usually use (ArrayList or LinkedList)? Why?
This question is a slightly disguised version of the previous one, since the answer to this question will lead to a gradual presentation of the answer to the previous question. In 90% of cases, ArrayList will be faster and more economical than LinkedList, so they usually use ArrayList, but nevertheless there are always 10% of cases for LinkedList. I say that I usually use ArrayList, referring to the tests and the last paragraph from the previous question, but I do not forget about LinkedList (in what cases? Also the last paragraph of the previous question helps).
3. What works faster ArrayList or LinkedList?
Another disguised version of the first question. It is trickier than the above options that posing a question implies a monosyllabic answer with the choice of one of the proposed options, which, as I understand it, should immediately identify a person with a shallow knowledge in collections. The correct action will be the counter question of what actions will be performed on the structure. As a result, the dialogue smoothly proceeds to answer the first question.
4. It is necessary to add 1 million. element, what structure are you using?
Also a pretty popular hidden version of the first question. Also, the statement involves the selection of one of the proposed options, although in fact there is no information for an unambiguous choice. Additional questions need to be asked: in which part of the list are items added? Is there any information about what will happen next to the list items? any restrictions on memory or speed of execution? In general, this is the same first question, but on a slightly different side: through additional questions, you show the depth of understanding of the work of Array and Linked List.
Once I myself “pecked” at this hook, thinking to myself what to add was to “insert” at the end of the list and intensively promoted ArrayList, although I did not know (and did not try to find out) about further actions with this list and possible restrictions.
5. How are elements removed from an ArrayList? How does the size of the ArrayList change in this case?
Again, the answer to question 1 contains the answer to this question. When you remove an arbitrary element from the list, all elements located to the “right” are shifted one cell to the left and the actual size of the array (its capacity, capacity) does not change in any way. There is a mechanism for automatic “expansion” of the array, but there is no automatic “compression”, you can only explicitly perform “compression” with the trimToSize () command.
6. Suggest an effective algorithm for removing several adjacent elements from the middle of the list implemented by ArrayList.
An unbroken, by my standards, question I met only once, when I did not know the mechanism for removing elements from an ArrayList. As a result, it caused me serious difficulties. In fact, everything is quite simple and obvious when you know how one element is deleted. Suppose you want to remove n elements from position m in the list. Instead of deleting one element n times (each time shifting by 1 position the elements that are “to the right” in the list), you need to shift all elements that are “to the right” of the n + m position by n elements to the left to the top of the list. Thus, instead of performing n iterations of moving list items, everything is done in 1 pass.
7. How is the HashMap arranged?
This is the second from a list of the most popular collection questions. I don’t even remember if there was a case when this question was not asked to me.
In short, a HashMap consists of “baskets”. From a technical point of view, “baskets” are array elements that store links to item lists. When adding a new key-value pair, it calculates the hash code of the key, based on which the basket number (cell number of the array), into which the new element gets, is calculated. If the basket is empty, then the link to the newly added element is saved in it, if there is already an element there, then the links are followed sequentially between the elements in the chain, in search of the last element, from which the link to the newly added element is placed. If an item with the same key was found in the list, then it is replaced. Adding, searching and deleting elements is performed in constant time. Everything seems to be cool, with one caveat, hash functions should evenly distribute items across baskets,
In general, this answer is quite enough for the question posed, then most likely a HashMap dialogue will begin, with an in-depth understanding of the processes and subtleties.
Again, I recommend reading the article tarzan82 on HashMap .
8. What is the initial number of baskets in a HashMap?
A rather unexpected question, again, he once made me guess the number of baskets when using the default constructor.
The answer here is 16. Answering, it is worth noting that it is possible using constructors with parameters: through the capacity parameter, set your initial number of baskets.
9. What is the estimate of the time complexity of selecting an item from a HashMap? Does HashMap guarantee the specified complexity of fetching an element?
The answer to the first part of the question can be found in the answer to question 7 - constant time is necessary for selecting an element. Here on the second part of the question, I recently got confused. And the HashMap device knew and also knew about the hash function, but wasn’t ready for such a question, it rushed in the other direction in general and focused on the structure of the HashMap, pushing back the problem of the hash code, which in my head was always used to consider a hash code with uniform distribution. In fact, the answer is quite simple and follows from the answer to question 7.
If you take a hash function that constantly returns the same value, then the HashMap will turn into a linked list, with disgusting performance. Then, even if you use a hash function with a uniform distribution, in the extreme case only the time complexity of log N will be guaranteed. So, the answer to the second part of the question is no, it is not guaranteed.
10. The role of equals and hashCode in a HashMap?
The answer to this question follows from the answer to question 7, although it is clearly not spelled out there. hashCode allows you to define a basket to search for an item, and equals is used to compare the keys of the items in the list inside the basket and the key to find.
11. The maximum number of hashCode () values?
Everything is pretty simple here, just remember the method signature: int hashCode (). That is, the number of values is equal to a range of type int - 2 ^ 32 (the exact range was never asked, such an answer was enough).
12. How and when does the number of baskets in HashMap increase?
This is a pretty subtle question. As my mini-survey showed, if the essence of the HashMap device is understood by many more or less clearly, then this question often confused the interlocutor.
In addition to capacity, the HashMap also has the loadFactor parameter, based on which the maximum number of busy baskets is calculated (capacity * loadFactor). By default, loadFactor = 0.75. Upon reaching the limit value, the number of baskets increases by 2 times. For all stored items, a new “location” is calculated taking into account the new number of baskets.
13. In what case can an element be lost in a HashMap?
This interesting question was sent to me by LeoCcoder, they didn’t ask me like that and honestly admit that after reading it right away I couldn’t come up with a script for losing an element. Everything, again, turned out to be quite simple, although not so explicit: for example, an object with several fields is used as a key, not a primitive. After adding an element to the HashMap, the object that acts as the key changes one field that is involved in the calculation of the hash code. As a result, when you try to find this element by the source key, the correct basket will be accessed, but equals (after all, equals and hashCode must work with the same set of fields) will not find the specified key in the list of elements. Nevertheless, even if equals is implemented in such a way that changing the given field of the object does not affect the result, then after increasing the size of the baskets and recalculating the hash codes of the elements,
14. Why can't you use byte [] as a key in a HashMap?
Another question from LeoCcoder. As usual, everything turned out to be quite simple - the hash code of the array does not depend on the elements stored in it, but is assigned when creating the array (the method for calculating the hash code of the array is not overridden and is calculated using the standard Object.hashCode () based on the address of the array). Also, arrays do not override equals and perform pointer comparisons. This leads to the fact that you cannot access the element stored with the array key when using another array of the same size and with the same elements, access can be done only in one case - when using the same array reference that was used to save item. For the answer to this question, a special thanks goes to the user @dark_dimius.
15. What are the differences between TreeSet and HashSet?
To begin with, Set is a set (also called a “set”). Set does not allow storage of two identical elements. Formally speaking, the term "multitude" and so it means a collection of different elements, it is very important that it is different elements, since this is the main property of Set. Given this definition, an explanation about the storage of identical elements is not required, but in everyday life, the concept of “set” has lost its strict meaning regarding the uniqueness of the elements included in it, so nevertheless specify this property of the set separately.
TreeSet provides orderly storage of elements in the form of a red-black tree. The complexity of performing basic operations in TreeSet lg N. A HashSet uses the same approach for storing elements as a HashMap, with the difference that in the HashSet the element itself acts as a key, in addition, the HashSet (like the HashMap) does not support ordered storage of elements and provides temporary complexity of operations like HashMap.
16. TreeSet device?
This question is asked instead of question 14, and here is a short answer that the TreeSet is based on a red-black tree. As a rule, this is enough and the interlocutor immediately proceeds to the next question; I have never been asked the mechanism of balancing the tree or other details of its implementation.
For express deepening knowledge of red and black wood, I recommend this article .
17. What happens if you add items to the TreeSet in ascending order?
Usually, the interlocutor precedes this question with the phrase that the TreeSet is based on a binary tree and if we add elements in ascending order, how they will be distributed across the tree.
If there is no exact idea about the TreeSet device, but there is a general understanding that this is a binary tree (which the interlocutor additionally assures us of), then this question can lead to an interesting result: all elements after being added to a regular binary tree will be in one branch with a length of N elements, which negates all the advantages of such a structure as a tree (in fact, a list is obtained). In fact, as mentioned above, TreeSet is based on a red-black tree that can balance itself. As a result, TreeSet doesn’t care in what order you add elements to it, the advantages of this data structure will be preserved.
Conclusion
I hope the questions discussed will be useful to the habrayuzer. I also ask you to forgive me for some possible naivete that the above questions require such a detailed consideration, but in due time such an article would seriously help me. I am sure that there are inaccuracies in the article - I ask for comments, in addition, I hope that more experienced comrades in the comments will actively share questions from their practice and, if the article is favorably received by the habrasociety, then it is quite possible to continue the review of technical issues for Java interviews.
PS A bit of mercantile interest: the search for a new job continues and if one of the habrayuzers is in the process of searching for a Java developer in a company with a modern approach to development and interesting tasks, or may just recommend taking a closer look at any suitable vacancy - I will be grateful, please PM.