Newtoo: developing a full browser engine from scratch in 2018?


Hello! My name is Dmitry Kozichev.

Today I will tell you about my attempt to create my own modern web browser engine from scratch.

My engine is called Newtoo.

What is Newtoo

So, Newtoo. Why did I create it?

It so happens that there are only four popular browser engines in the world. They are so complex that their own developers do not know even half of the code base, and so technologically advanced that trying to catch up with them seems like a waste of time.

Is that really so? My project was created to repeat the feats of modern browser engines and to see how realistic it is to create a worthy alternative to large projects whose history begins in the nineties. My engine is written from scratch, which means its story begins today.

The idea behind Newtoo is to show the page faster than other engines do.

How Newtoo Works Faster

As I said earlier, the major browser engines have been in development for many years. Mistakes made in the early stages of development stay in a project to the end. The most striking example is smart pointers in C++: they bring more complex syntax and, in my view, a large overhead when creating, using, and destroying them. In addition, there are many kinds of smart pointers, and you need to know which one to use, because each has its own nuances and surprises. Look at this file from WebKit. When you see code like this, full of smart pointer syntax, you try to calm down and breathe evenly, but all of WebKit looks like this from head to toe. There are no such flaws in my engine.

What's in the box

Let's see what Newtoo is made of.

At the moment, the following parts of the project are implemented:

  • HTML parser
  • HTML serializer
  • CSS parser (selectors, rules and properties)
  • CSS serializer
  • Basic DOM API 1

The remaining parts of the project that are not yet implemented:

  • CSS cascade (style calculation)
  • Layout engine
  • Renderer
  • JS virtual machine and events
  • Event handler and interactive page selection

HTML parser

My HTML parser can be called modern. To begin with, it is built on the HTML5 standard. It tolerates mistakes in your markup.

For example, if you forget the quotes around an attribute value, the engine will still understand you, as long as the value is written without spaces.
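To illustrate how unquoted values can be tolerated, here is a minimal sketch in Python. It is not Newtoo's actual code: the function name `parse_attributes` and the regular expression are my own assumptions for illustration. It accepts double-quoted, single-quoted, and unquoted values, and an unquoted value ends at the first space or `>`.

```python
import re

# Hypothetical pattern: a name, optionally followed by "=" and a value that is
# double-quoted, single-quoted, or unquoted (no whitespace allowed in the last case).
ATTRIBUTE = re.compile(r'''([^\s=/>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?''')

def parse_attributes(source: str) -> dict:
    """Parse the attribute part of a tag, e.g. 'class=note id="main"'."""
    attributes = {}
    for match in ATTRIBUTE.finditer(source):
        name = match.group(1)
        # Pick whichever of the three value groups matched; no value means "".
        value = next((g for g in match.groups()[1:] if g is not None), "")
        attributes[name.lower()] = value
    return attributes
```

Because the pattern does not require whitespace between attributes, input such as `src="a.png"alt="Hi"` is also split into two attributes.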

You can omit a closing tag where it is not required.

<div><p>First line
   <p>Second line
   <img src="ru/images/2019.png" alt="Happy New Year!"><p>Third line <br> Last line

The parser also supports tag prefixes:

<myprefix:span>Hello, world!</myprefix:span>

To turn page elements back into code, I wrote an HTML serializer. I think you can guess what it does.

How the HTML parser works

First, the parser cuts the HTML code into pieces and determines the type of each piece.

For example, this:

<!doctype html><html><head><title>Lorem ipsum</title></head></html>

Turns into this:

<!doctype html>   - doctype token
<html>            - tag token
<head>            - tag token
<title>           - tag token
Lorem ipsum       - text token
</title>          - close tag token
</head>           - close tag token
</html>           - close tag token

These pieces are called tokens.

Tokens are divided into 6 types:

  • Tag
  • Closing tag
  • Text
  • Comment
  • Document Type (doctype)
  • JavaScript or CSS code
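The splitting step can be sketched roughly as follows. This is a simplified Python illustration, not Newtoo's implementation: the function name `tokenize`, the token type strings, and the `RAW_TEXT_TAGS` set are assumptions, and the sketch does not handle a `>` inside a quoted attribute value.

```python
import re
from typing import List, Tuple

# Assumption: the contents of these tags are treated as raw code.
RAW_TEXT_TAGS = {"script", "style"}

def tokenize(html: str) -> List[Tuple[str, str]]:
    """Split HTML into (type, source_text) pairs, reading left to right."""
    tokens = []
    i, n = 0, len(html)

    def tag_end(start: int) -> int:
        # Index just past the next '>', or end of input if there is none.
        end = html.find(">", start)
        return n if end == -1 else end + 1

    while i < n:
        if html.startswith("<!--", i):
            end = html.find("-->", i + 4)
            end = n if end == -1 else end + 3
            tokens.append(("comment", html[i:end]))
        elif html[i:i + 9].lower() == "<!doctype":
            end = tag_end(i)
            tokens.append(("doctype", html[i:end]))
        elif html.startswith("</", i):
            end = tag_end(i)
            tokens.append(("close tag", html[i:end]))
        elif html[i] == "<" and i + 1 < n and html[i + 1].isalpha():
            end = tag_end(i)
            source = html[i:end]
            tokens.append(("tag", source))
            name = re.match(r"<([^\s/>]+)", source).group(1).lower()
            if name in RAW_TEXT_TAGS:
                # Raw code runs until the matching closing tag, so '<' is allowed inside.
                closing = re.compile("</" + name, re.IGNORECASE).search(html, end)
                stop = closing.start() if closing else n
                if stop > end:
                    tokens.append(("code", html[end:stop]))
                end = stop
        else:
            # Plain text runs until the next '<'.
            end = html.find("<", i + 1)
            end = n if end == -1 else end
            tokens.append(("text", html[i:end]))
        i = end
    return tokens
```

Calling `tokenize` on the doctype example above yields one doctype token, three tag tokens, one text token, and three close tag tokens, in that order.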

The parser reads tokens from left to right. Each token type has its own parsing approach.

While the parser reads the contents of a tag, the tag itself is registered in the hierarchy (the chain of currently open tags, from child up to parent). When the parser has finished reading the contents of the tag, it removes the tag from the hierarchy.

If the token is an opening tag, the parser reads its tag name and attributes. Then, if the tag is a paragraph and there is already a paragraph in the hierarchy, the parser removes the existing paragraph from the hierarchy. The new tag is added to the hierarchy unless it is a single tag (a tag without a closing tag). If the token is a closing tag, the parser removes the last tag from the hierarchy, and if that last tag was a paragraph, it removes the last two. If the token is code, special characters are allowed inside it.

Using this method of parsing tokens, you can write <p> without a closing tag.
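The hierarchy logic described above can be sketched with a stack of open elements. This is a hedged Python illustration, not Newtoo's code: `Node`, `build_tree`, `VOID_TAGS`, and the simplified `(kind, value)` token shape are all assumptions, and the sketch only checks the innermost open tag for an unclosed paragraph.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumption: a small sample of "single" tags that never have a closing tag.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

@dataclass
class Node:
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

def build_tree(tokens: List[Tuple[str, str]]) -> Node:
    """Build a tree from simplified tokens: ("tag", name), ("close", name), ("text", data)."""
    root = Node("#document")
    stack = [root]  # the hierarchy; the innermost open element is last
    for kind, value in tokens:
        if kind == "text":
            stack[-1].children.append(Node("#text", text=value))
        elif kind == "tag":
            if value == "p" and stack[-1].name == "p":
                stack.pop()  # a new <p> implicitly closes the open one
            node = Node(value)
            stack[-1].children.append(node)
            if value not in VOID_TAGS:
                stack.append(node)  # single tags are never pushed
        elif kind == "close":
            if stack[-1].name == "p" and value != "p":
                stack.pop()  # drop the unclosed <p> first ("removes the last two")
            if len(stack) > 1:
                stack.pop()
    return root
```

With tokens for `<div><p>First line<p>Second line</div>`, both paragraphs end up as siblings inside the `div` rather than nested in each other.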

CSS parser

At the moment, the engine can only parse CSS style rules, for example:

.flex[alignment="right"] { font-weight: light; color: #999 }

Supporting style rules alone is already enough to display the desktop version of a site properly.

Unlike other engines, Newtoo supports single-line '//' comments in CSS code and does not remove them when you interact with the CSS via JavaScript.
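As a rough illustration of parsing one style rule while keeping its comments, here is a minimal Python sketch. It is not how Newtoo does it: `parse_rule` and its return shape are assumptions, and the sketch would misread a `//` that appears inside a string or a `url(...)` value.

```python
import re
from typing import Dict, List, Tuple

def parse_rule(css: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Parse one 'selector { declarations }' rule.

    Returns (selector, declarations, comments). Comments are collected
    rather than thrown away, so they could be written back out later.
    """
    comments: List[str] = []

    def collect(match: "re.Match") -> str:
        comments.append(match.group(0))
        return ""

    # Match both /* ... */ comments and single-line // comments.
    body = re.sub(r"/\*.*?\*/|//[^\n]*", collect, css, flags=re.DOTALL)

    selector, _, rest = body.partition("{")
    block = rest.rpartition("}")[0] if "}" in rest else rest

    declarations: Dict[str, str] = {}
    for item in block.split(";"):
        name, separator, value = item.partition(":")
        if separator:
            declarations[name.strip()] = value.strip()
    return selector.strip(), declarations, comments
```

The values are stored as plain strings; validating them against the CSS property definitions would be a separate step.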

CSS selector parser

To determine which HTML elements of the page should be styled by a CSS rule, a selector language was invented. You probably already know it.

The selector parser supports all combinators, two kinds of quotes, tag selectors, classes, attributes, and multiple selectors and classes.

Here is a complete list of all supported selectors:

#Mix#ed.Selec[tor=s]
"Quotes"
'Alternative quotes'
#descendant #child
#parent > #child
#previous + #this
#other ~ #this
.multi, .selectors

The engine does not yet support level 4 selectors, but I am working on it.
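To show what parsing such selectors involves, here is a small Python sketch. It is only an illustration under my own assumptions, not Newtoo's parser: `parse_compound` and `parse_selector` are hypothetical names, the combinators are the standard CSS ones (a space, `>`, `+`, `~`), and whitespace, commas, or combinator characters inside `[...]` are not handled.

```python
import re
from typing import Dict, List, Tuple

def parse_compound(compound: str) -> Dict[str, object]:
    """Split a compound selector such as 'div#id.cls[attr=v]' into its parts."""
    result = {"tag": None, "ids": [], "classes": [], "attributes": []}
    pattern = r"([#.]?)([\w-]+)|\[([^\]]+)\]"
    for prefix, name, attribute in re.findall(pattern, compound):
        if attribute:
            result["attributes"].append(attribute)
        elif prefix == "#":
            result["ids"].append(name)
        elif prefix == ".":
            result["classes"].append(name)
        else:
            result["tag"] = name
    return result

def parse_selector(selector: str) -> List[List[Tuple[str, dict]]]:
    """Return one chain per comma-separated selector.

    Each chain is a list of (combinator, compound) pairs. The first pair
    has an empty combinator; a single space means "descendant".
    """
    parsed = []
    for complex_selector in selector.split(","):
        # re.split keeps the captured combinator; it is None for plain whitespace.
        parts = re.split(r"\s*([>+~])\s*|\s+", complex_selector.strip())
        chain = [("", parse_compound(parts[0]))]
        for i in range(1, len(parts) - 1, 2):
            combinator = parts[i] or " "
            chain.append((combinator, parse_compound(parts[i + 1])))
        parsed.append(chain)
    return parsed
```

For example, `parse_selector(".multi, .selectors")` returns two chains with one compound selector each.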


When my HTML parser reads the code, it creates a Document Object Model (DOM). The DOM looks like a tree of nodes: the root is the browser window, the document branches from it, and the page elements branch from the document. You can interact with all DOM nodes from JavaScript using the DOM API.

My engine supports any changes to the DOM. For example, you can replace the HTML code of any element:

document.getElementById("article").innerHTML = "The article is gone. <b>Boom!</b>";

I will not list every function for working with elements, documents, text, and selection here; believe me, there are a lot of them!

The JavaScript virtual machine has not been written yet, but the API is already there and works well.

Future of the project

I cannot say anything about the project's prospects; that is up to you.
If you like my engine, then I did my job well.

Newtoo on GitHub
