Internship at JetBrains and how I almost managed to get on it

  • Tutorial
image

Like many young developers, when there is a desire to find a job / internship, I look in the direction of cool IT companies.

Recently I tried to get into the ranks of JetBrains and under the cut I am ready to share my experience.

Why did “almost” succeed?


Surely you immediately have such a question.

In my opinion, I have a good resume with a bunch of achievements and a good skill, which I have been improving the last 8-9 years day after day.

I completed the test task (and it seems to me good), previously visited the office of JB, which is located in my city, talked to HH and some developers of the company and, as a result, was refused an internship without any comments.

Most likely, the reason lies in the fact that JetBrains selects students exclusively for the internship, and at the moment I just graduated from the 11th and pass exams one after another.

Well, this is an occasion for another whole year to pull yourself up and apply for the next year.

Analysis of the test task


The deadlines for submitting applications for internships and testing test assignments are over, which means that everyone who solved them, including me, can post an analysis of these assignments so that next year any interested student could familiarize themselves with the approximate level of assignments before starting JB internships, with which he will have to face and in which case to pull up his knowledge.

I applied for an internship with the Corotin debugger development team for Kotlin.

The task of this team during an internship for those who hit it this year will be to finalize this part of the debugger and its integration with the IDE.

The task was a little expected - to write a debugger for a small PL.

I would not say that it is complex, rather the opposite. It does not require any in-depth knowledge of the theory of constructing translators and a cool skill. But nevertheless, those who apply for an internship in this area should at least have these basics and cope with this task without any problems. I was surprised when I decided to search on github for keywords of the solutions of my “competitors” and found 1-2 more or less working-looking solutions against about 6-7 empty repositories or with a couple of pieces of code after which people gave up. Maybe I was looking badly, but nevertheless, the results did not please me. If this post will be read by people who abandoned this task - no need to do this in the future. In an extreme case, it was enough to sit on the task for a couple of days and I am sure that you would deal with it.

The text of the quest itself
Objective: implement step-by-step code execution for the trivial programming language Guu.

Attention: in the description below, some significant points are deliberately omitted. As a rule, they remain at your discretion. If it is completely incomprehensible, write to (here is the mail that I decided to remove).

A Guu program consists of a set of procedures. Each procedure begins with the line sub (subname) and ends with the declaration of another procedure (or the end of the file if the procedure in the file is the last). Execution begins with sub main.

The body of a procedure is a set of instructions, each of which is on a separate line. Insignificant tabs or spaces may occur at the beginning of a line. Blank lines are ignored. There are no comments on Guu.

Guu has only three operators: - set (varname) (new value) - setting a new integer value for the variable. - call (subname) - call the procedure. Calls can be recursive. - print (varname) - print the value of the variable on the screen.

Variables in Guu have a global scope. The program below will display the line a = 2.

sub main
set a 1
call foo
print a

sub foo
set a 2

But the simplest program with infinite recursion:

sub main
call main You

need to write a step-by-step interpreter for Guu. When it starts, the debugger should stop on the line with the first instruction in sub main and wait for commands from the user. Minimum required set of debugger commands:

i - step into, the debugger goes inside call (subname).
o - step over, the debugger does not go inside the call.
trace - print stack trace execution with line numbers starting with main ...
var - print the values ​​of all declared variables.

The user’s communication format with the debugger is left to the discretion above. You can choose either a minimalistic GDB-like interface or a console or graphical UI. The names of the debugger commands can be changed if desired.

To solve this problem, you can use any programming language from TIOBE TOP 50 and an open-source compiler / interpreter.

When evaluating the work will be evaluated:

The overall performance of the program;
The quality of the source code and the availability of tests;
Easy to expand functionality (for example, support for new language statements or debugger instructions).
A solution with instructions for building it must be published to the Git repository (for example, on GitHub or BitBucket). In the response you need to specify a link to the repository. A link to a private GitHub repository is also suitable, only you will need to add me to it.

I write in C ++, Java and Object Pascal.

At first there were thoughts to write everything in my same MPS, but I thought that it would not be very convenient for JB to check it, and I submitted the application 2 days before the closing of the submission (exams all the same ...), and it was already evening outside the window - I decided to quickly write everything in more well-known languages.

In my opinion, Pascal is most suitable for solving the problem, at least because of the most convenient implementation of strings ...

At least for me. In addition, it is in TIOBE TOP 50, so I boldly launched the IDE, namely Lazarus, because he is not commercial :) and set about solving the problem.

Despite the fact that they give JB as much as 7 days, it took me about an hour to complete the project, and the project turned out to be about 500 lines of code.

Where to begin?


First of all, you need to imagine how code debugging will work in the end.

We need to implement step-by-step code execution - that means each instruction should be presented in the form of a structure / class and, in general, the instructions should look like a list of these classes or, as in my implementation, refer to each other forming a sequence (I will write down why I did it later).

To get this sequence, our debugger needs to process the code in the proposed language, which means we also need to implement a small parser, as well as syntactic and semantic analysis of the code.

Let's start with the implementation of the parser. Because Since the Guu language consists of a set of tokens, separated by a space, it is logical to write a small and simple tokenizer first:

function GetToken(s: string; tokenNum: word): string;
var
  p: word;
begin
  s := Trim(s);
  s := StringReplace(s, '  ', ' ', [rfReplaceAll]);
  while tokenNum > 1 do
   begin
     p := Pos(' ', s);
     if p > 0 then
       Delete(s, 1, p)
     else
       begin
         s := '';
         break;
       end;
     dec(tokenNum);
   end;
  p := Pos(' ', s);
  if p > 0 then
    Delete(s, p, Length(s));
  Result := s;
end;

Next, declare enum from tokens:

type
  TGuuToken = (opSub, opSet, opCall, opPrint, opUnknown);
const
  GuuToken: array[opSub..opPrint] of string = (
    'sub', 'set', 'call', 'print'
  );

And the instruction class itself, into which we will parse the lines of code:

type
  TGuuOp = class
    public
      OpType         : TGuuToken;
      OpArgs         : TStringList;
      OpLine         : Cardinal;
      OpUnChangedLine: string;
      NextOp         : TGuuOp;
      OpReg          : Pointer;
      function    Step(StepInto: boolean; CallBacks: TList; Trace: TStringList): TGuuOp;
      constructor Create(LineNum: Cardinal; Line:string);
      destructor  Destroy; override;
  end;

In OpType the instruction will be stored, in OpArgs - the rest of the construction.
OpLine, OpUnChangedLine - information for the debugger.

NextOp is a pointer to the next statement. If it is equal to nil (null in Pascal), then there are no further instructions and you need to complete the code or return via the callback stack.

OpReg is a small pointer-register, which will be used later for a small optimization of code execution.

After the class declaration was written - I decided that the most compact and beautiful solution would be to add the parser and a little parsing in its constructor, which I did next:

constructor TGuuOp.Create(LineNum: Cardinal; Line:string);
(*
 * That method parse code line.
 *)
var
  s: string;
  w: word;
begin
  inherited Create;
  OpArgs := TStringList.Create;
  OpLine := LineNum;
  OpUnChangedLine := Line;
  NextOp    := nil;
  OpReg     := nil;
  s := GetToken(Line, 1);
  OpType := TGuuToken(AnsiIndexStr(s, GuuToken));
  case OpType of
    opSub  : begin // sub 
               s := GetToken(Line, 2);
               if Length(s) > 0 then
                OpArgs.Add(s)
               else
                begin
                  writeln('[Syntax error]: Invalid construction "sub" at line ', OpLine, '.');
                  halt;
                end;
               if Length(GetToken(Line, 3)) > 0 then
                begin
                  writeln('[Syntax error]: Invalid construction "', Line, '" at line ', OpLine, '.');
                  halt;
                end;
             end;
    opSet  : begin // set 
               OpArgs.Add(GetToken(Line, 2));
               OpArgs.Add(GetToken(Line, 3));
               w := 1;
               while w < Length(OpArgs[1]) + 1 do
                begin
                  if not (OpArgs[1][w] in ['0'..'9']) then
                   begin
                     writeln('[Syntax error]: Invalid variable assigment "', Line, '" at line ', OpLine, '.');
                     halt;
                   end;
                  inc(w);
                end;
               if (Length(OpArgs[0]) = 0) or (Length(OpArgs[1]) = 0) or
                  (Length(GetToken(Line, 4)) > 0) then
                begin
                  writeln('[Syntax error]: Invalid construction "', Line, '" at line ', OpLine, '.');
                  halt;
                end
             end;
    opCall : begin // call 
               s := GetToken(Line, 2);
               if Length(s) > 0 then
                OpArgs.Add(s)
               else
                begin
                  writeln('[Syntax error]: Invalid construction "call" at line ', OpLine, '.');
                  halt;
                end;
               if Length(GetToken(Line, 3)) > 0 then
                begin
                  writeln('[Syntax error]: Invalid construction "', Line, '" at line ', OpLine, '.');
                  halt;
                end;
             end;
    opPrint: begin // print 
               s := GetToken(Line, 2);
               if Length(s) > 0 then
                OpArgs.Add(s)
               else
                begin
                  writeln('[Syntax error]: Invalid construction "print" at line ', OpLine, '.');
                  halt;
                end;
               if Length(GetToken(Line, 3)) > 0 then
                begin
                  writeln('[Syntax error]: Invalid construction "', Line, '" at line ', OpLine, '.');
                  halt;
                end;
             end;
    else
      begin
        writeln('[Syntax error]: Invalid token "', s, '" at line ', OpLine, '.');
        halt;
      end;
  end;
end;
destructor  TGuuOp.Destroy;
begin
  FreeAndNil(OpArgs);
  inherited;
end;

Here we essentially check the beginning of the construction (i.e. the first word), and then look at the remaining tokens and their number. If something is clearly wrong with the code, we display an error.

In the main piece of code, we simply read the code in the TStringList from the file, call the TGuuOp constructors line by line, and save the pointers to the class instances in GuuOps: TList.

Ads:

var
  LabelNames: TStringList;
  GuuOps, GuuVars: TList;
  SubMain: TGuuOp = nil;

Together with code parsing, it would be nice to perform a couple more actions:

procedure ParseNext(LineNum: Cardinal; Line: string);
(*
 * Parsing code lines and define variables and labels.
 *)
var
  Op: TGuuOp;
  GV: TGuuVar;
  c: cardinal;
begin
  if Trim(Line) <> '' then
   begin
     Op := TGuuOp.Create(LineNum, Line);
     GuuOps.Add(Op);
     case Op.OpType of
      opSet: begin // define variable and/or optimisation var calling
               GV := nil;
               c := 0;
               while c < GuuVars.Count do
                begin
                  if TGuuVar(GuuVars[c]).gvName = Op.OpArgs[0] then
                   begin
                     GV := TGuuVar(GuuVars[c]);
                     break;
                   end;
                  inc(c);
                end;
               if GV = nil then
                begin
                  GV := TGuuVar.Create(Op.OpArgs[0]);
                  GuuVars.Add(GV);
                end;
               Op.OpReg := GV;
             end;
      opSub: begin // Check for label dublicade declaration
               if Op.OpArgs[0] = 'main' then
                SubMain := Op;
               if LabelNames.IndexOf(Op.OpArgs[0]) <> -1 then
                begin
                  writeln('[Error]: Dublicate sub "', Op.OpArgs[0], '" declaration at line ', Op.OpLine, '.');
                  halt;
                end
               else
                LabelNames.Add(Op.OpArgs[0]);
             end;
     end;
   end;
end;

At this stage, you can check the entry points at the time of redefinition and think of OpReg - I used it to store a pointer to a Guu variable.

Speaking of variables, I took this small piece of code into a separate unit:

unit uVars;
{$mode objfpc}{$H+}
interface
uses
  Classes, SysUtils;
type
  TGuuVar = class
    public
      gvName: string;
      gvVal: variant;
      constructor Create(VarName: string);
  end;
implementation
constructor TGuuVar.Create(VarName: string);
begin
  inherited Create;
  gvName := VarName;
  gvVal := 0;
end;
end.

Now we have parsed code that seems to be correct in syntax. It remains to analyze it and you can begin to perform the most important thing - debugging.

Next, you need to implement a little semantic analysis and simultaneously prepare everything for code execution and debugging:

procedure CheckSemantic;
(*
 * Semantic analyse and calls optimisation.
 *)
var
  c, x: cardinal;
  op: TGuuOp;
begin
  if GuuOps.Count > 0 then
   begin
     if TGuuOp(GuuOps[0]).OpType <> opSub then
      begin
        writeln('[Error]: Operation outside sub at line ', TGuuOp(GuuOps[0]).OpLine, '.');
        halt;
      end;
     c := 0;
     while c < GuuOps.Count do
      begin
        case TGuuOp(GuuOps[c]).OpType of
          opSub:;
          opCall: begin
                    TGuuOp(GuuOps[c - 1]).NextOp := TGuuOp(GuuOps[c]);
                    x := 0;
                    op := nil;
                    while x < GuuOps.Count do
                     begin
                       if TGuuOp(GuuOps[x]).OpType = opSub then
                       if TGuuOp(GuuOps[x]).OpArgs[0] = TGuuOp(GuuOps[c]).OpArgs[0] then
                        begin
                          op := TGuuOp(GuuOps[x]);
                          break;
                        end;
                      inc(x);
                    end;
                   if op <> nil then
                    TGuuOp(GuuOps[c]).OpReg := op
                   else
                    begin
                      writeln('[Error]: Calling to not exist sub "', TGuuOp(GuuOps[c]).OpArgs[0],
                              '" at line ', TGuuOp(GuuOps[c]).OpLine, '.');
                      halt;
                    end;
                 end;
          opPrint: begin
                     TGuuOp(GuuOps[c - 1]).NextOp := TGuuOp(GuuOps[c]);
                     x := 0;
                     while x < GuuVars.Count do
                      begin
                        if TGuuVar(GuuVars[x]).gvName = TGuuOp(GuuOps[c]).OpArgs[0] then
                         begin
                           TGuuOp(GuuOps[c]).OpReg := TGuuVar(GuuVars[x]);
                           break;
                         end;
                        inc(x);
                      end;
                     if TGuuOp(GuuOps[c]).OpReg = nil then
                      begin
                        writeln('[Error]: Variable "', TGuuOp(GuuOps[c]).OpArgs[0],
                                '" for print doesn''t exist at line ', TGuuOp(GuuOps[c]).OpLine, '.');
                      end;
                   end;
          else
            TGuuOp(GuuOps[c - 1]).NextOp := TGuuOp(GuuOps[c]);
        end;
        inc(c);
      end;
   end;
end;

In TGuuOp.NextOp of each token, write a pointer to the next token.
For the call opcode, we’re doing it tricky and simple - in NextOp we write a pointer to the called entry point.

We also check the output variables through the print statement ...

Maybe they were not declared before the output?

Now you need to implement code execution. Go back to the TGuuOp class and implement the Step method:

function TGuuOp.Step(StepInto: boolean; CallBacks: TList; Trace: TStringList): TGuuOp;
(*
 * That method execute instruction.
 *)
var
  Op: TGuuOp;
  CBSize: Cardinal;
begin
  case OpType of
    opSub: begin
             Trace.Add('-> Sub "' + OpArgs[0] + '"');
             Result := NextOp;
           end;
    opCall: begin
              if StepInto then
               begin
                 if NextOp <> nil then
                  CallBacks.Add(NextOp);
                 Result := TGuuOp(OpReg);
               end
              else
               begin
                 Op := TGuuOp(OpReg);
                 CBSize := CallBacks.Count;
                 while ((Op <> nil) or (CallBacks.Count > CBSize)) and (Trace.Count < STACK_SIZE) do
                  begin
                    if Op = nil then
                     begin
                       Op := TGuuOp(CallBacks[CallBacks.Count - 1]);
                       CallBacks.Delete(CallBacks.Count - 1);
                       Trace.Delete(Trace.Count - 1);
                     end;
                    Op := Op.Step(StepInto, CallBacks, Trace);
                  end;
                 Result := NextOp;
               end;
            end;
    opPrint: begin
               writeln(TGuuVar(OpReg).gvName, ' = ', TGuuVar(OpReg).gvVal);
               Result := NextOp;
             end;
    opSet: begin
             TGuuVar(OpReg).gvVal := OpArgs[1];
             Result := NextOp;
           end;
  end;
end;

To avoid access violation in the event of a loop - it is better to limit the stack, which I did.
The constant STACK_SIZE = 2048, declared above is just responsible for this.

Now it’s finally time to write the main code of our debugger:

var
  code: TStringList;
  c: Cardinal;
  cmd: string;
  CallBacks: TList;
  Trace: TStringList;
  DebugMode: boolean = true;
begin
  if ParamCount > 0 then
    begin
      // Initialisation
      if not FileExists(ParamStr(1)) then
       begin
         writeln('[Error]: Can''t open file "', ParamStr(1), '".');
         halt;
       end;
      if ParamCount > 1 then
       if LowerCase(ParamStr(2)) = '/run' then
        DebugMode := false;
      code := TStringList.Create;
      code.LoadFromFile(ParamStr(1));
      GuuOps  := TList.Create;
      GuuVars := TList.Create;
      // Parsing and preparing
      LabelNames := TStringList.Create;
      c := 0;
      while c < code.Count do
       begin
         ParseNext(c + 1, Trim(code[c]));
         inc(c);
       end;
      FreeAndNil(LabelNames);
      CheckSemantic;
      if SubMain = nil then
       begin
         writeln('[Error]: Sub "main" doesn''t exist!');
         halt;
       end;
      // Start code execution
      CurrentOp := SubMain;
      CallBacks := TList.Create;
      Trace := TStringList.Create;
      if DebugMode then
       begin
         //Out code and features
         ClrScr;
         writeln('Code for debugging:');
         writeln('.....');
         c := 0;
         while c < code.Count do
          begin
            writeln(FillSpaces(IntToStr(c + 1), 4), '| ', code[c]);
            inc(c);
          end;
         writeln('"""""');
         FreeAndNil(code);
         writeln(sLineBreak,
                 'Features:', sLineBreak,
                 '* i     - step into.', sLineBreak,
                 '* o     - step over.', sLineBreak,
                 '* trace - print stack trace.', sLineBreak,
                 '* var   - print variables list.', sLineBreak,
                 '* x     - exit.', sLineBreak);
         // Execution loop
         while ((CurrentOp <> nil) or (CallBacks.Count > 0)) and (Trace.Count < STACK_SIZE) do
          begin
            write('Line ', CurrentOp.OpLine, ' ~> ');
            readln(cmd);
            // Execute commands
            if cmd = 'i' then
             CurrentOp := CurrentOp.Step(true, CallBacks, Trace)
            else
            if cmd = 'o' then
             CurrentOp := CurrentOp.Step(false, CallBacks, Trace)
            else
            if cmd = 'trace' then
             begin
               writeln('| Trace:');
               c := 0;
               while c < Trace.Count do
                begin
                  writeln('| ', Trace[c]);
                  inc(c);
                end;
               writeln('| -> Line ', CurrentOp.OpLine, ': "', CurrentOp.OpUnChangedLine, '".')
             end
            else
            if cmd = 'var' then
             begin
               writeln('| Variables list:');
               c := 0;
               while c < GuuVars.Count do
                begin
                  writeln('| ', TGuuVar(GuuVars[c]).gvName, ' = ', TGuuVar(GuuVars[c]).gvVal);
                  inc(c);
                end;
             end
            else
            if cmd = 'x' then
             halt;
            // Check for method end & make callback
            if (CurrentOp = nil) and (CallBacks.Count > 0) then
             begin
               CurrentOp := TGuuOp(CallBacks[CallBacks.Count - 1]);
               CallBacks.Delete(CallBacks.Count - 1);
               Trace.Delete(Trace.Count - 1);
             end;
          end;
       end
      else
       begin
         // Only run mode (/run)
         FreeAndNil(code);
         while ((CurrentOp <> nil) or (CallBacks.Count > 0)) and (Trace.Count < STACK_SIZE) do
          begin
            CurrentOp := CurrentOp.Step(false, CallBacks, Trace);
            if (CurrentOp = nil) and (CallBacks.Count > 0) then
             begin
               CurrentOp := TGuuOp(CallBacks[CallBacks.Count - 1]);
               CallBacks.Delete(CallBacks.Count - 1);
               Trace.Delete(Trace.Count - 1);
             end;
          end;
       end;
      if Trace.Count >= STACK_SIZE then
       writeln('[Runtime error]: Stack overflow!');
      FreeAndNil(CallBacks);
      FreeAndNil(Trace);
    end
  else
    writeln(
      'Guu debugger v1.0.', sLineBreak,
      'Author: Pavel Shiryaev (@RoPi0n).', sLineBreak,
      'Run: svmc guu_debugger.vmc  [arg]', sLineBreak,
      'Args:', sLineBreak,
      ' /run - Run Guu code.'
    );
end.

By the condition of the job, the interface can be implemented as you like.

It would be possible to implement a full-fledged UI, fasten SynEdit to the project, but in my opinion it’s a waste of work that will not reflect the skill, and also, which will not be paid for :)

So I limited myself to a small console UI.

The code above is not complicated, so you can leave it without comment. In it, we take ready-made TGuuOp's and call them Step.

Screenshots of the solved problem:

image

image

Displaying error information:

image

image

Link to the repository of my solution: click

Summary


There are no particular results. I’ll have to devote most of the summer to a busy vacation and search for a university (well, in case I pass the exam well, of course), instead of two months of work and training in the JetBrains team.

Perhaps next year a new post will appear on Habré, already describing the internship process in JB or in another company interesting to me :)

Also popular now: