un1t April 9, 2013 at 08:14

bookradar.org - book search service

I want to introduce my project, which I have been working on for the past few months, and to tell you how it came about. bookradar.org is a search engine for books in online stores. The service is designed for people who like to read. Using the site, they can find out where to buy the book they need and also save money on the purchase. The more expensive the book, the more its price varies between stores.

For example, Phil Rosenzweig’s book “The Halo Effect ... and the other eight illusions that mislead managers” costs from 537 to 885 rubles depending on the store. The difference is quite substantial.

The path from idea to result is described below.

Idea

I like to read books, and I buy them often. Honestly, I buy them faster than I have time to read them. Some books may lie on my shelf for 3 years before I get to them. I have Kindle Paperwhite and Nook Simple Touch readers, and I also read a lot of books on my computer as PDFs. E-books undoubtedly have advantages: they can be bought quickly and read in the dark. Even so, I prefer paper ones.

In mid-November 2012 I signed up for the online course “MongoDB for Developers”. A couple of weeks had passed since it started. Although I did not yet understand all the features of MongoDB, I already really liked the technology, and I wanted to apply the new knowledge in practice. That is when I got the idea to create a book search site.

At that time I had already seen one such site, but to be honest it was not very convenient and did not always give reliable results. Perhaps it was worth looking for other sites, but at that moment I did not. I started looking at other sites only after I launched my own. There turned out to be far more of them than I expected: more than a dozen. However, this did not bother me at all. I can offer users a more convenient search, and in the future, I hope, a more accurate and broader one. And in general, there is no shortage of ideas :)

Implementation

At first I decided that a couple of weekends would be enough to carry out my plan, but as often happens, it took a lot more time. At work I use Django, but frankly, after a couple of years of working with it, I had grown somewhat bored of it. Django is a great framework, but I wanted something new, so I decided to build the project on Flask. Why Flask exactly? A friend happened to send me a link to a tutorial on building a blog with Flask + MongoDB and said that he had been using Flask in his projects for a long time. It was interesting to try.

I asked my wife to draw a design (hello, Polina!), specifying that the design should be as simple as possible, without any shadows or gradients. This saved me time on layout and made changes easier.

A month passed. To be honest, I was very tired of devoting all my evenings and weekends to programming, and my enthusiasm was rapidly dying. I urgently needed feedback from real people, so I published the project in a minimally working form. The project already worked and really did search for books in stores, but it suffered from many minor bugs and shortcomings. All of this was left in intentionally in order to speed up the release of the first version. It lacked even such elementary things as handling of 404 and 500 errors, not to mention extras like the History API.

Having received a positive assessment from my colleagues, I was inspired to continue working. The next month was devoted mainly to polishing, that is, to fixing those same small flaws.
In addition, it turned out that the real data did not match what I had initially prepared for. I had to change the schema of the documents in the database, split collections, and change algorithms.

Real data is full of surprises. For example, I had assumed that the ISBN could serve as a unique identifier for a book. In fact, it cannot. One book can have many ISBNs. I don’t know who fills in the databases at the stores, but not only can ISBNs be invalid, the ISBN field can contain anything at all, from arbitrary numbers to phrases in Russian. The letter “O” may be typed instead of the digit zero, and the Cyrillic letter “Х” (“Ha”) instead of the Latin “X”.
Moreover, two completely different books may have the same ISBN. In theory this is impossible, but publishers bundle some books into sets, and such a supposed single book then carries the ISBNs and the authors of two different books.
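
The ISBN problems described above can be partly handled by normalizing the field before validating it. Below is a minimal sketch in Python, not the code actually used in the project. The function names and the table of look-alike characters are assumptions made for illustration; the checksum rules are the standard ones for ISBN-10 and ISBN-13.

```python
import re

# Assumed table of look-alike characters: the letter "O" (Latin or Cyrillic)
# typed instead of zero, and Cyrillic "Х" typed instead of Latin "X".
LOOKALIKES = str.maketrans({
    "O": "0", "o": "0",
    "\u041e": "0", "\u043e": "0",  # Cyrillic О, о
    "\u0425": "X", "\u0445": "X",  # Cyrillic Х, х
    "x": "X",
})


def _isbn10_ok(code):
    """ISBN-10 checksum: weights 10..1, a final 'X' counts as 10, sum % 11 == 0."""
    total = 0
    for position, char in enumerate(code):
        value = 10 if char == "X" else int(char)
        total += (10 - position) * value
    return total % 11 == 0


def _isbn13_ok(code):
    """ISBN-13 checksum: alternating weights 1 and 3, sum % 10 == 0."""
    total = sum(int(char) * (3 if position % 2 else 1)
                for position, char in enumerate(code))
    return total % 10 == 0


def normalize_isbn(raw):
    """Return a cleaned ISBN-10 or ISBN-13 string, or None if it is not valid."""
    cleaned = raw.translate(LOOKALIKES)
    cleaned = re.sub(r"[\s\-]", "", cleaned)  # drop spaces and hyphens
    if re.fullmatch(r"\d{9}[\dX]", cleaned) and _isbn10_ok(cleaned):
        return cleaned
    if re.fullmatch(r"\d{13}", cleaned) and _isbn13_ok(cleaned):
        return cleaned
    return None
```

With this sketch, `normalize_isbn("O-3O6-4O615-2")` returns a cleaned ISBN-10, while a Russian phrase or a number with a wrong checksum gives `None`. It does nothing about the bundle problem: a valid ISBN shared by two different books still passes.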

And then I ran into a performance problem. Python, like any other dynamic language, is not as fast as we would like. For web applications this usually does not matter, because if your site is slow, the bottleneck is normally the database, disk operations or the network. I profiled the code for a long time and optimized the algorithms. I came to the conclusion that the algorithms were fine and that the slowdown came from Python itself and from the database.

A substantial part of the project had to be rewritten in a faster language, preferably a statically typed one. That is how Scala, a programming language I had not known before, appeared in the project :)

Why Scala? I was choosing between C, C++ and Scala. The first Segmentation Fault made me cross C off the list. C is a good language, but obviously not optimal for this task. Of course I looked at language benchmarks, but honestly I did not believe that the speed of Java / Scala is close to the speed of C++. So I wrote my own performance test. I took a small piece of the parser and implemented it in Python, Scala and C++.

Here are the results of parsing a 1.5 GB file:

CPython 4 min 12 sec
PyPy 2 min 48 sec
Scala 57 sec
C++ 47 sec

The algorithm is the same everywhere, and only string operations are used for parsing. In Scala I also tried a standard XML parser, but it was much slower.
As you can see, Scala’s speed really is close to that of C++. And Scala is easier to write and debug. In addition, I had someone to consult with when I got stuck (hi, Ivan!).
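
To give an idea of what parsing with string operations only can look like, here is a small sketch in Python. It is not the project's parser: the tag names `offer`, `name` and `price` and both function names are hypothetical, and the sketch assumes tags without attributes and does not decode XML entities.

```python
def extract_tag(chunk, tag, start=0):
    """Return (text, end_index) for the first <tag>...</tag> at or after start.

    Returns (None, -1) if the tag is not found. Uses only str.find and slicing.
    """
    open_tag = "<" + tag + ">"
    close_tag = "</" + tag + ">"
    begin = chunk.find(open_tag, start)
    if begin == -1:
        return None, -1
    begin += len(open_tag)
    end = chunk.find(close_tag, begin)
    if end == -1:
        return None, -1
    return chunk[begin:end], end + len(close_tag)


def parse_offers(text):
    """Yield one dict per <offer>...</offer> block (hypothetical feed layout)."""
    position = 0
    while True:
        body, position = extract_tag(text, "offer", position)
        if body is None:
            break
        name, _ = extract_tag(body, "name")
        price, _ = extract_tag(body, "price")
        yield {"name": name, "price": price}
```

The trade-off is the usual one: this approach skips everything a real XML parser does (validation, attributes, entities, encodings), which is why it can be faster and also why it only suits feeds with a known, regular layout.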

When writing code in Scala, I often caught myself thinking: “this code looks wrong, I need to figure out how to do it right”. Such perfectionism could have slowed development down significantly. I told myself: “Dude, you don’t know this language, so you can’t write it properly right away. Just write it so that it works!” It was psychologically difficult to force myself to write code that merely works; I wanted to write it beautifully. But in the end I pulled myself together and wrote code that works.

TDD. From the very beginning, all parsers were covered with tests, both in Python and in Scala. This is a case where tests speed up development right away. On the other hand, there are still no tests on the frontend.

Monetization

On the very first day after the launch, my colleagues asked me when I was going to make the search paid. I was not going to. Monetization is simple and straightforward: the stores’ affiliate programs. I do not plan to show ads.

The present

A whole mountain of the shortcomings present at launch has already been fixed, although a noticeable number remain. One of the interesting problems I had to face is merging ("gluing") records for the same book from different sources. The merging algorithm already works quite well, but it still fails sometimes. If you run into this, write to me.
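
The article does not describe how the gluing algorithm works, so the following is only one possible approach, sketched in Python under stated assumptions. It groups records that share at least one ISBN using a union-find structure. The record layout (a dict with an `isbns` list) and the function name are hypothetical.

```python
def merge_by_shared_isbn(records):
    """Group record indexes so that records sharing any ISBN end up together.

    `records` is a list of dicts with an "isbns" list (assumed layout).
    Returns a sorted list of groups, each a list of record indexes.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # ISBN -> index of the first record that carried it
    for index, record in enumerate(records):
        for isbn in record["isbns"]:
            if isbn in first_seen:
                union(first_seen[isbn], index)
            else:
                first_seen[isbn] = index

    groups = {}
    for index in range(len(records)):
        groups.setdefault(find(index), []).append(index)
    return sorted(groups.values())
```

Because one book can have many ISBNs, grouping is transitive: a record with two ISBNs links records that each carry only one of them. The weak point is the bundle case mentioned earlier, where two different books share an ISBN; a sketch like this would wrongly merge them, so a real implementation would also need to compare titles or authors.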

The frontend is written in Python / Flask, the backend is in Scala, and the database is MongoDB.

Future plans

There are a lot of ideas, but in the near future I plan to work on search quality and on fixing minor flaws. New features are cool, and they will certainly appear, but a little later. By the way, your comments can affect the order in which they appear.

I hope you like my service.
I will be extremely grateful for advice, criticism and suggestions.

Come and visit: www.bookradar.org
