From Physics to Data Science (From Moving Science Forward to Office Plankton)
Not so long ago, twelve months ago to be exact, I began my last year of graduate school in the physics department of the University of California, Davis. A legitimate question arose: what next? Between teaching, pushing science forward and other entertainments, the year would pass very quickly, so I had to decide in advance. The main plan was to find a postdoc position somewhere in Tokyo, Rio de Janeiro or Singapore, so that it would be sort of like traveling and sort of like working. In theory, everything was ready for this: papers, contacts, and passable knowledge in certain areas of condensed matter physics. I began actively googling the websites of universities in geographically interesting parts of the world, wrote a science-oriented CV, subscribed to newsletters that publish postdoc vacancies, and hinted to all my friends that if they heard of anything, they should tell me first. I even chatted on Skype with some professors about working in their research groups. In general, things were moving along.
At about the same time, a friend of mine came to town. He had also graduated from our valiant department, but a couple of years earlier. For the last couple of years he had bounced around various offices and had finally found a job with the title Data Scientist. We sat in a bar and chatted. What he did at work did not particularly hook me (when every day you try to figure out what to quantize, and where, to describe the properties of nanomaterials, stories about how something gets aggregated somewhere in a database and why that matters for sales of office supplies do not really catch your interest), but the salary did. For reference, here are the figures in the United States, gross, that is, before taxes:
And my acquaintance, with no specialized education and no work experience, was hired right away at $100k. That is, he neatly overtook the whole crowd of postdocs. (As it turned out later, he had also sold himself short: he should have asked for 130-150k, which would have put him ahead of the professors.)
But happiness is not in money, and not even in the amount of it. Money is, after all, a tool and nothing more.
My friend left, and I plunged back into the abyss of academia. That quarter I was lecturing, which meant preparing lectures and quizzes and answering students' emails. I got completely caught up in it. But when the quarter ended, I again began to think about where to go after graduation.
What always bothered me about academia was how ossified it was, that is, how little dynamism it had. Everyone sits in their comfort zone and refuses to leave it. Boring. Judging by movies about Silicon Valley, though, everything there is dynamic and young. But I didn't want to become a programmer: first, it is not interesting to me, and second, nobody would hire me. All my near-programming knowledge is self-taught; I have no formal education in this area. Then another acquaintance turned up who had just graduated and had also landed a Data Scientist position. Either he was better at spinning a story, or his work really was more interesting, but this time I was hooked.
I started googling: nothing was clear. Data Science is mentioned in completely different contexts, and the requirements for a Data Scientist position differ fundamentally from job to job. Lots of pretty words about big data and artificial intelligence, but they do not add up to a big picture.
I had to start somewhere, and my first step was to sign up for a Data Science specialization on Coursera. The specialization consists of 9 courses, each a month long, so in theory it stretches over 9 months, but I did not have that much time. It was January, and I was going to graduate in June. That left almost no time both to learn a field that was completely new to me and to find a job. So I took the 9 courses three at a time. Sometimes it was hard, but overall it is doable.
What I learned from this specialization: Data Science is like dark matter, in the sense that everyone tries to stretch these two words over everything that has anything to do with data. But it became clear that a well-rounded Data Scientist should know statistics, understand machine learning, and be able to write code in R, Python, Java or Scala.
It was March. Some structure had appeared in my head, but since this specialization is very basic, and the lecturers, in terms of both their teaching and the general organization of the courses, are frankly mediocre, I did not get much out of it. But! One of the courses mentioned a site where you can practice your machine learning skills, namely kaggle.com. In my case, learning that this site existed helped enormously with my later job search. I poked around, failed miserably in a couple of competitions, but then got hooked, and for many months afterwards, despite a chronic lack of time, I took part in every competition.
In parallel, I wrote the first version of my resume, tried to improve my LinkedIn profile, and even got a couple of interviews. But overall time passed, I was not looking for work very actively, and before I knew it, it was June: I had defended my dissertation, and the job search could not be postponed any longer. So I rolled up my sleeves and got to work.
Up to this point it was just storytelling. From here on I'll try to write in a more structured way, because finding a job is a serious matter.
The interview process for a Data Scientist position in the United States typically consists of the following stages:
- Your resume reaches a recruiter.
- If the recruiter likes your resume, you move on to the next step: a phone call with the recruiter.
- If that call goes well, the next step is a conversation with a member of the Data Science team.
- If that conversation goes well, the next step is a technical phone interview, usually in a shared Google Doc, collabedit or a similar tool.
- As a rule, if there are no doubts about your technical skills, you will be invited to an onsite interview, where many different people interview you for several hours, with a lunch break.
- If that stage goes well, you receive a job offer and start working out the details (negotiation).
This is the standard set, but the question of how to hire the right Data Scientist is very pressing, so each company takes a slightly different approach. For example, after the technical interview they may give you a dataset to analyze at home and present at the onsite interview (for example, Pivotal, Bidgely, Uptake), or the take-home task may come before the technical interview (for example, Capital One). They may ask you to solve puzzles on HackerRank (again, Capital One). Or they may skip the technical interview and invite you straight to onsite (for example, Affirm).
Anywhere from one day to several weeks can pass between steps, so you need to start early! At large companies like LinkedIn or Google, you can safely apply 9 months before graduation. (This was one of my serious miscalculations: I did not expect finding a job to take so much time.)
Each of these steps requires different skills, so let's take them one at a time.
LinkedIn Resume / Profile
First, you need to look good on paper, meaning your LinkedIn profile and your resume. (Anyone interested can visit my LinkedIn and copy whatever they like from it. It worked for me; maybe it will help you too.)
A common mistake people make when writing a resume or filling in LinkedIn is to include what they are proud of rather than what actually needs to be there. For example, in academia the measure of how cool you are is your papers (author order matters; it's best to be first, that is terribly fashionable), conference talks and other achievements that nobody outside your closed world cares about. They are dear to you, and you have thought hard about them for many years, but you do not need to write about them. At most, mention them briefly.
Your resume should contain whatever sells for this particular vacancy. In theory, it is better to write a separate resume for each kind of vacancy, but that is very tedious. The main goal of the resume is to get the recruiter to contact you and schedule a phone call.
All my work experience is industrial rope-access climbing, teaching and research at the university, and military service. None of that sells.
Scientific publications and conference talks on topics not directly related to Data Science do not sell either.
Education sells, but poorly. I have the strong impression that in the San Francisco Bay Area nobody wants to look at your resume unless you have experience, a PhD in something, or at least a master's degree in Computer Science. This is complicated further by the fact that fresh graduates are divided into first-class people (Stanford and UC Berkeley graduates) and everyone else. It is common and expected that you won't get a phone screen because you don't have a PhD, and even if you have one, you still won't get it because you are not from Stanford. (Many startups have a strict rule of recruiting only from top schools. I don't know about big companies, but I think they approach the process more sensibly and suffer less from this kind of nonsense.)
It is valued highly if what you did during your studies is related to data analysis, especially if the recruiter can understand, at least at the level of the idea, how this knowledge could be applied at the company. (Here you can embellish a little, but not too much.)
Several lines of my resume are devoted to my results in the machine learning competitions mentioned above. I spent a lot of time on Kaggle, so it fits well into my resume.
An important but non-obvious part of a resume is Communication and Leadership. The idea is that academia warps people, in the sense that it is hard to communicate with "nerds", and they often do not know how to work in a team. Here my teaching came in handy, at least as a line on the resume saying that I can explain technically complex topics to people who know little about them.
There was still a lot of free space. I filled it with the names of the Data Science-related online courses I took on Coursera and edX, under a subsection called Independent Coursework. Americans love the word Independent, and Coursework sounds good.
That's actually all. In fact, it is just a scientific degree, Kaggle and a lot of filler. But so be it. The purpose of the resume is to get a phone screen.
My resume turned out roughly like this.
How do you get your resume to a recruiter?
- LinkedIn: a list of vacancies posted on LinkedIn itself, plus vacancies that LinkedIn pulls from other sources. The downside is that the list is easy to access, so there are a lot of applicants: 300-1,000 applicants per vacancy is normal. The upside is that there are many vacancies, and you can apply en masse to everything possible.
- dice.com: there are some vacancies there, but I did not get a single interview through it.
- monster.com: also some vacancies, but I registered there quite late.
- The jobs section on kaggle.com
- Friends who work somewhere can refer you. (That is how I got interviews with Google and Pebble.)
- Friends who received a job offer and turned it down can recommend you instead. (That is how I got interviews with Uptake and Bidgely.)
- The career fair at UC Davis was useless, and you can't get into the career fairs at Stanford or Berkeley without the corresponding student ID. But I realized this late; had my brain switched on earlier, maybe I could have come up with something.
- Meetups: in the Bay Area, meetups on Data Science topics are held almost every day. At the very least you can meet people there, and at best you can impress someone. (A recent example: I had to wait out a traffic jam, so I went to a nearby meetup called Deep Learning in Natural Language Processing. I had worked with neural networks and with NLP separately, but when I combined them the results were mediocre, so I went in hoping to get enlightened. I guessed wrong: they were all inexperienced, so I spent two hours at the whiteboard telling them what I knew on the subject, and the next day a couple of the attendees wrote to me that they had a vacancy at work that was just right for me. But that is rather the exception. And I don't like meetups where I don't learn anything.)
It may seem that you have a wonderful resume and send it out, but nobody answers. One problem is that large companies with experienced recruiters have many candidates, many of them with referrals, while you are on your own. You get lost in the crowd. But the recruiters there are reasonable. (On the positive side, I note Google, Pivotal and LinkedIn. I especially mention Mikhail Obukhov at LinkedIn: I don't know what he wrote in his interview report, but he asked good questions, strictly to the point.)
The situation with startups is different: their recruiters are young and inexperienced and don't really know what they want to see in a resume. Job postings at large companies are short but specific, while at many small startups they are a sea of requirements. For instance, one startup wanted the following from a potential Data Scientist:
- Expert-level knowledge of machine learning algorithms.
- Expert-level knowledge of statistics.
- Expert-level knowledge of genetics.
- Ability to write production-quality code.
- Ability to work with all kinds of databases.
- And, naturally, a PhD in a technical field.
Plus a whole page of other requirements. Moreover, they were not offering candidates a job but a low-paid contract for several months, after which you might be moved to a full-time position. Finding a candidate who meets these criteria and also agrees to work for peanuts is unrealistic.
My point is that you should send your resume everywhere, even to vacancies that don't interest you. Every interview is interview experience, and for someone who doesn't know how to interview, that experience is worth its weight in gold.
Phone call with the recruiter
A recruiter calls you. What does he want? He (more often she) wants to supplement your resume with comments:
- Why do you want to work in our company?
- Have you finished your studies, and if not, when will you finish?
- Are you interviewing with other companies?
- What is your visa status, and when will your visa allow you to start work?
- What is your experience with data?
- A bunch of questions about your resume. Your answers must convince her that she won't get in trouble for wasting someone's precious time when she passes your resume, with her comments, on to the person who should see it next.
Everything is straightforward here. The better your resume, the fewer odd questions you will be asked. And by a good resume I don't mean that you are well presented in it; I mean that the recruiter likes how you are presented. Essentially, your resume should be tailored to her or his expectations of you. This step usually leads to the next one without problems, although there are exceptions: I was dropped once because the office works with classified data, and I could not be granted access because of my Russian citizenship.
Conversation with a member of the Data Science team
This conversation is similar to the one with the recruiter, but more technical. They no longer ask about your visa.
Here things get murky. People come to Data Science from many directions: Computer Science, Statistics, Physics, Math, Economics, Biology, etc. Moreover, they usually start interviewing others almost immediately. That is, they have no real work experience and no interviewing experience, but they have ideas and a desire to practice. And then you come along...
They want a lot of different things from you:
- Give an example of your work with data.
- Here is a problem; how would you approach it?
- What problems would we run into if we took this data and this algorithm and tried to answer this question?
This is where Kaggle paid off. After half a year of working on various problems in this area, I can talk about them for hours; without Kaggle I would have floundered badly here. On the one hand, the range of questions is narrow: it is about data and your experience. On the other hand, it is immense, because they can ask about anything in Machine Learning, Statistics or use cases, and the questions are not necessarily basic. They also jump between topics: Natural Language Processing, Credit Card Fraud Detection, Recommender Systems. And there is no guarantee that they themselves understand the topic at all. They often like to ask questions they don't know the answer to and struggle with at work themselves. You are practicing interviewing, and they are practicing interviewing people on you, and as everyone knows, stupid questions are easier to ask than to answer.
Here is one such case. At Pebble, a guy asked me: "Geeks use our products, but how would we start promoting our watches outside the geek community?" I replied: "I can tell you without any Data Science: fire your designer. No self-respecting president will declare World War III while wearing your watch, even if he wants to. He would simply be ashamed to show your product on his wrist." The next morning I received an email saying I was not a fit for them. But so be it.
It helps to go to Glassdoor, look at the questions asked of Data Analysts, Data Scientists and Software Developers, and solve all of them. It is not a panacea, but interviews often include puzzles you have already seen somewhere.
Don't rely on this knowledge alone; you need to think for yourself. For example, one question was: how would you reproduce the Swype algorithm? My Kaggle experience helped: I generated ideas like a fountain, and, as it turned out, my interviewer was very impressed.
Again, at larger companies or large startups the questions are more sensible and the interviewers more reasonable. On the positive side, I note LinkedIn, Google, Pivotal, Bidgely and Affirm; on the negative side, Pivotal, Pebble, Turn, Workday and Leap Motion. (Pivotal appears twice because I went through this stage with them twice. Once I ended up with a self-confident interviewer who was not very bright, and we did not click.)
Technical phone interview
You will be interviewed by a member of the Data Science team. A Google Doc, collabedit or something similar is shared between the two of you while you talk on the phone, so you need to speak and type at the same time. A headset comes in handy.
The questions vary:
- Probability problems, especially Bayes' theorem problems.
- Programming, usually in Python, R or Java.
- Machine Learning theory.
- Algorithms and data structures.
The range of questions is immense. If your interviewer studied statistics at university, or the vacancy requires deep statistical knowledge, you will be grilled hard on statistics. If he has a Computer Science background, you will be grilled on that, and so on.
To prepare, solve problems from Glassdoor and strengthen your background in all of these areas.
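Bayes' theorem problems are a staple of this stage. As a hedged illustration, here is the classic "positive medical test" question worked in Python; the numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) and the helper names `posterior` and `simulate` are my own hypothetical choices, not taken from any specific interview. The first function applies Bayes' theorem exactly with fractions; the second is a quick Monte Carlo sanity check, which is a handy habit when you are not sure your formula is right.

```python
import random
from fractions import Fraction


def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) by Bayes' theorem.

    prior: P(condition)
    sensitivity: P(positive | condition)
    false_positive_rate: P(positive | no condition)
    """
    # Law of total probability: positives come from the sick and the healthy.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


def simulate(prior, sensitivity, false_positive_rate, n=200_000, seed=0):
    """Estimate the same posterior by simulating n hypothetical patients."""
    rng = random.Random(seed)
    positives = 0
    sick_and_positive = 0
    for _ in range(n):
        sick = rng.random() < prior
        p_test = sensitivity if sick else false_positive_rate
        if rng.random() < p_test:
            positives += 1
            if sick:
                sick_and_positive += 1
    return sick_and_positive / positives


exact = posterior(Fraction(1, 100), Fraction(99, 100), Fraction(5, 100))
print(exact)  # prints: 1/6
print(simulate(0.01, 0.99, 0.05))  # should land close to 0.1667
```

The point interviewers usually look for is that even a very accurate test gives only about a 1-in-6 chance of actually being sick here, because healthy people vastly outnumber sick ones and their false positives dominate.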
Onsite interview
This is a marathon of several hours at the company's office. If you come from another city, they pay for your flight and hotel. (That is how I had a pleasant trip to Chicago.)
A bunch of different people talk to you, for half an hour to an hour each. Lunch is usually in the middle.
Usually they come one at a time, but LinkedIn does it nicely, in pairs: an experienced interviewer grills you, while the second, a recent hire, learns from the senior colleague, although he sometimes asks questions too.
There is writing code on the whiteboard, "how would you tackle this problem", theory questions, and just small talk about life.
Onsite is a murky affair, and the introversion developed during graduate school is a hindrance. They test your technical skills, try to gauge your intelligence and way of thinking, and in general whether they can work with you. Being invited to onsite does not mean you will get an offer, but at a minimum it means you are being seriously considered. Statistically, about half of the candidates who reach this stage receive a job offer.
Finding a Data Science job in the San Francisco Bay Area is not easy, especially if you leave it to the last moment, as I did. It is a nerve-racking process that takes a lot of time and effort, largely because the process itself is long. At the same time, you are interviewing with many companies in parallel: two or three interviews a day is normal. It is exhausting at first, and then you get used to it. I graduated in June and received a job offer only in October. Yes, it is a first job, and finding one is hard anyway. But every day of those months kills nerve cells that cannot be restored, and not only yours but also those of your friends, family and everyone who cares about you.
Is it possible to cut corners and skip all these stages, or at least simplify them? Yes, it is. There are organizations that take talented graduates, train them and help them find jobs (for example, the Insight Fellowship and the Data Science Incubator). But! The number of places is very limited, the number of applicants is huge, and on paper they almost certainly look better than you. Still, I know a few people who were selected for Insight and had no problems finding a job. So I recommend that all my friends facing this whole Data Science job-search saga actively apply to these organizations.
Another way to cut corners is an internship at some company. If I had been smarter, I would have tried to squeeze into an internship every summer while I studied at UC Davis. Life would have been much simpler.
The question arises: was it worth it, and what has changed overall?
The alternative to a Data Science job search was a postdoc position. Pros: the same familiar workflow as in graduate school and a sea of free time. Cons: postdoc vacancies are very limited, and finding one is much harder than finding a job, so you have almost no choice of where to live and work. Financially, everything is sad, and the prospects are unclear. About 3% of those who go on to a postdoc after graduate school land a professor's position after 5-10 years of postdocs, and again, it is wherever they are offered the position, not wherever they want. As a rule, almost all the postdocs I know (there are sincerely encouraging exceptions) are doing research while simultaneously looking for jobs, many of them in Data Science.
Data Scientist Position:
- Money is nicer. For example, throughout graduate school I always lived in some kind of dump, sharing it with other people, usually strangers or people I had no interest in. Now I rent a one-bedroom apartment, and when I come home it does not disgust me. It is a home, even if a temporary one.
- Time is less pleasant. I don't like selling my time for money; I prefer to sell it for knowledge. And there is no guarantee that during the workday you will be doing what interests you. That is a sad fact. It is partly offset by the fact that working hours go not only to work but also to self-education. Still, after many years of working on fundamental problems, hunting for bugs in undocumented code feels like turning into office plankton. But there is hope that after the onboarding period the tasks will become more interesting.
- At this stage, a lot of the new knowledge is about psychology. For many years I spun around in academia, and I know what students, graduate students and professors live by. I live and breathe their value system; they are like family to me. It's boring. But what people in Silicon Valley live by, how everything there is arranged and works, and who all these people are, I didn't know, and now knowledge in this direction is flowing in like a wide river.
- During my job search I was held back by having no connections and no line on my resume saying I had work experience. Now I have both. Presumably, if I want to change jobs, life will be a little easier.
- Again, it's easier to choose where to work and live. If I get bored in Silicon Valley, I'll go, as planned in the first paragraph, to Tokyo, Rio or Singapore.
What has changed? Essentially, not much. As in graduate school, I sit at the computer all day, staring at the monitor and tapping at the keyboard, and in my free time there is climbing, dancing, beer, snowboarding in winter and other entertainments.
I don't know how things will be a year from now, but so far everything is going fine.