Django-nonrel Google App Engine Python Website
In this article I want to talk a little about the development of my project, egaxegax.appspot.com.
Since I am a big fan of the Python language, I decided to build my site on the popular Django framework. To run it on the free appspot.com hosting, I adapted the code to the NoSQL version of Django (django-nonrel) and the Google App Engine platform.
The site has existed since 2012. I use it as a live platform for learning Django and App Engine. It is also interesting to study its statistics in Google Webmaster Tools: search indexes and queries. For example, I learned that Google indexes the title tags of pages for search, not the contents of meta tags.
It all started with the Articles section: small notes on software topics such as scripts, configs, and usage examples. But the articles quickly ran out, and writing new ones in any volume did not work out. Something more was needed.
Somewhere on the Web I downloaded an archive of files with song lyrics and chords. I added a couple of dozen of my own chord arrangements and decided to put everything on the site: about 25,000 files in total. That is too many to load manually, so I wrote a song_conv.bat script that converts the text files into dumps for loading table data into the GAE DataStore.
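The conversion step can be sketched in Python (the real script is a .bat file, and the per-file layout and dump format shown here are assumptions, not the actual song_conv output):

```python
import csv
import os

def songs_to_dump(src_dir, out_csv):
    """Convert a directory of song text files into one CSV dump.

    Assumes (hypothetically) that each file's first line is the song
    title and the remaining lines are the lyrics with chords.
    """
    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["title", "text"])  # header row for the loader
        for name in sorted(os.listdir(src_dir)):
            if not name.endswith(".txt"):
                continue  # skip anything that is not a song file
            with open(os.path.join(src_dir, name), encoding="utf-8") as f:
                lines = f.read().splitlines()
            if lines:
                writer.writerow([lines[0], "\n".join(lines[1:])])
```

A dump in a flat format like this is easy to split into portions and feed to a bulk loader.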
The data loading had to be split into several stages because of the DataStore's limit on the number of write operations per day. It worked out to about 700-800 records (files) per day.
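Splitting a dump into daily portions is straightforward; a minimal sketch (the default of 750 records per batch is simply the middle of the 700-800 range above):

```python
def daily_batches(records, per_day=750):
    """Yield successive slices of `records`, each small enough to be
    loaded in one day without exceeding the DataStore write quota."""
    for start in range(0, len(records), per_day):
        yield records[start:start + per_day]
```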
In this way I uploaded the first portion of texts, a volume of about 11,000 files (records in the DataStore). After that I wrote the song_upload.py script to upload data via HTTP POST requests. It simulates filling out the fields of the input form, so the processing goes through the same controller as manual entry. Upload speed decreased, but I can now debug the data insertion locally.
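Building such a request can be sketched as follows (the URL and the form field names are assumptions; the real ones are defined by the site's input form and controller):

```python
import urllib.parse
import urllib.request

FORM_URL = "http://localhost:8080/songs/add/"  # assumed local dev-server URL

def build_song_request(title, text):
    """Build a POST request that mimics submitting the song input form.

    The field names 'title' and 'text' are hypothetical; the point is
    that the payload is urlencoded exactly as a browser form POST.
    """
    data = urllib.parse.urlencode({"title": title, "text": text}).encode("utf-8")
    return urllib.request.Request(FORM_URL, data=data, method="POST")
```

Sending the request with `urllib.request.urlopen` then exercises the same controller code path as a user filling out the form.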
Some time after the data was loaded, the error 503 Server Error: Over Quota began to appear more and more often when opening pages. Having studied the server logs, I found that the main users of my site were googlebot and yandexbot, which request pages every 2-3 minutes. The error occurs when the daily limit on DataStore read operations is exceeded.
After looking through the App Engine documentation and examples, I realized that I was not using the cache module (namely memcache) at all: every page load triggered a database call through a QuerySet. In the new scheme, I convert QuerySet results into lists of dictionaries, store them in the cache, and read them from there on repeated requests. This solved the problem of quickly exhausting the read quota.
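The pattern looks roughly like this (a sketch only: a plain dict stands in for App Engine's memcache, and the model fields and key format are assumptions):

```python
# In the real app the cache is google.appengine.api.memcache;
# here a module-level dict stands in for it in this sketch.
_cache = {}

def get_songs(page, query):
    """Return one page of songs as plain dictionaries, caching the result.

    `query(page)` represents the expensive QuerySet call; its entities
    are converted to dicts so they can be stored and served from cache.
    """
    key = "songs:%d" % page  # hypothetical cache-key scheme
    rows = _cache.get(key)
    if rows is None:  # cache miss: hit the DataStore once
        rows = [{"title": s.title, "text": s.text} for s in query(page)]
        _cache[key] = rows  # later requests for this page skip the DataStore
    return rows
```

Storing plain dicts rather than live QuerySet objects matters: dicts serialize cleanly into the cache and carry no connection state.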
Later I added the Photos and News sections. Each section is designed as a separate application (app). Data is stored in DataStore tables; the Photos section also uses the BlobStore file storage. All applications use the cache when fetching data.
By analogy with the Chords section, I filled the Books section, where I publish the texts of electronic books. I extract the texts by unpacking *.epub files with the book_conv_up.py script from the /media/scripts directory. Unlike song lyrics, books are much larger and cannot be displayed on a page in full. Moreover, a whole book cannot even be added to the cache, because it exceeds the cache memory limit. So I read, cache, and display books chapter by chapter.
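An .epub file is essentially a ZIP archive of XHTML chapter files, so the extraction can be sketched like this (the chapter-selection rule here is an assumption about what the real book_conv_up.py does):

```python
import zipfile

def epub_chapters(src):
    """Return the chapter texts of an .epub file, one entry per chapter.

    Every (x)html member of the archive is treated as one chapter; the
    real script's selection and ordering logic may differ.
    """
    chapters = []
    with zipfile.ZipFile(src) as book:
        for name in sorted(book.namelist()):
            if name.endswith((".xhtml", ".html", ".htm")):
                chapters.append(book.read(name).decode("utf-8", "ignore"))
    return chapters
```

Each chapter can then be cached and rendered separately, which keeps every cache entry under the size limit.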
To fill the Photos and Books sections, I wrote the photo_upload.py and book_upload.py scripts, similar to the one for the lyrics.
The site has Django's built-in user authentication and registration of new users with Captcha verification.
If you are interested, visit the project page in the GitHub repository: django-egaxegax.