Own search on distributions of rutracker.org - implementation on Yii2

    Inspired by this post.

    Here it is described how to implement a search for distributions of rutracker.org on your own hosting / localhost.



    Preliminary agreement:
    • all operations are conducted in a unix-like environment. Unfortunately, the nuances for windows are unknown to me;
    • you are supposed to have basic knowledge of Unix shell, Yii2, git
    • personally, I see quite a few scenarios for using this (local search by hand) solution;
    • the implementation on yii2 advanced template is redundant in this case, but I'm used to it;
    • I see spinx for the first time in my life, so there may be strange things in the config;
    • in some places the decisions are quite controversial (I will be grateful for the prompts “how to correctly”).



    After reading the previous topic on this topic, to be honest, I was slightly disappointed with the implementation that the author offers. Actually, that's why I did everything myself.

    The whole project is on github , the whole code can be viewed there, here I will give only excerpts to understand the essence.

    The project implements automatic import of csv-files from this distribution (it is launched from the console), and search by name / category / subcategory of distribution.

    Details

    If you want to use the entire project as is, here is a brief instruction:

    1. clone the repository (git clone github.com/andrew72ru/rutracker-yii2.git )
    2. go to the project folder, install the components (composer install)
    3. initialize the environment (./init)
    4. create a database, configure access to it in common / config / main-local.php
    5. start the migration (./yii migrate)
    6. configure your web server to access the project (root directory - frontend / web)
    7. download distribution
    8. create directory frontend / runtime / csv
    9. put the latest version of the files from the distribution in this directory. All distribution is divided into folders, they are called dates, I took a folder with the last date
    10. run in the console ./yii import / import


    On my server, the import lasted about six hours - there are more than one and a half million entries in the distribution table, do not be surprised.

    DB schema
    Table for categories:

    CREATE TABLE `categories` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `category_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
      `file_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=37 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
    


    Subcategory Table:

    CREATE TABLE `subcategory` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `forum_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=1239 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
    


    Distribution table:

    CREATE TABLE `torrents` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `forum_id` int(11) DEFAULT NULL,
      `forum_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
      `topic_id` int(11) DEFAULT NULL,
      `hash` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
      `topic_name` text COLLATE utf8_unicode_ci,
      `size` bigint(20) DEFAULT NULL,
      `datetime` int(11) DEFAULT NULL,
      `category_id` int(11) NOT NULL,
      `forum_name_id` int(11) NOT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `topic_id` (`topic_id`),
      UNIQUE KEY `hash` (`hash`),
      KEY `category_torrent_fk` (`category_id`),
      KEY `torrent_subcat_id` (`forum_name_id`),
      CONSTRAINT `category_torrent_fk` FOREIGN KEY (`category_id`) REFERENCES `categories` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
      CONSTRAINT `torrent_subcat_id` FOREIGN KEY (`forum_name_id`) REFERENCES `subcategory` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
    ) ENGINE=InnoDB AUTO_INCREMENT=1635590 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
    


    The table with distributions is somewhat redundant (the forum_name column is no longer needed, it is implemented as a link), but I didn’t delete it so that I could access it directly and not use JOIN.


    Models

    Models are used generated through gii with virtually no changes. I don’t think it’s worth bringing them all here (see github), except for one used for searching through Sphinx.

    TorrentSearch.php
    namespace common\models;
    use Yii;
    use yii\helpers\ArrayHelper;
    use yii\sphinx\ActiveDataProvider; // для работы
    use yii\sphinx\ActiveRecord;          // используется расширение yii2-sphinx
    /**
     * This is the model class for index "torrentz".
     *
     * @property integer $id
     * @property string $size
     * @property string $datetime
     * @property integer $id_attr
     * @property integer $size_attr
     * @property integer $datetime_attr
     * @property string $topic_name
     * @property string $topic_id
     * @property integer $topic_id_attr
     * @property integer $category_attr
     * @property string $category_id
     * @property string $name_attr
     * @property integer $forum_name_id_attr
     */
    class TorrentSearch extends ActiveRecord
    {
        /**
         * @inheritdoc
         */
        public static function indexName()
        {
            return '{{%torrentz}}';
        }
        /**
         * @inheritdoc
         */
        public function rules()
        {
            return [
                [['id'], 'required'],
                [['id'], 'unique'],
                [['id'], 'integer'],
                [['id_attr'], 'integer'],
                [['topic_name', 'topic_id', 'category_id'], 'string'],
                [['name_attr'], 'string'],
                [['id', 'size_attr', 'datetime_attr', 'id_attr', 'topic_id_attr', 'category_attr', 'forum_name_id_attr'], 'integer'],
                [['size', 'datetime', 'topic_name', 'name_attr'], 'string']
            ];
        }
        /**
         * @inheritdoc
         */
        public function attributeLabels()
        {
            return [
                'id_attr' => Yii::t('app', 'ID'),
                'name_attr' => Yii::t('app', 'Topic Name'),
                'id' => Yii::t('app', 'ID'),
                'size' => Yii::t('app', 'Size'),
                'datetime' => Yii::t('app', 'Datetime'),
                'topic_name' => Yii::t('app', 'Topic Name'),
                'size_attr' => Yii::t('app', 'Size'),
                'datetime_attr' => Yii::t('app', 'Torrent Registered Date'),
                'category_attr' => Yii::t('app', 'Category Name'),
                'forum_name_id_attr' => Yii::t('app', 'Forum Name'),
            ];
        }
        /**
         * Функция для поиска
         * 
         * @param $params
         * @return ActiveDataProvider
         */
        public function search($params)
        {
            $query = self::find();
            $dataProvider = new ActiveDataProvider([
                'query' => $query,
            ]);
            $this->load($params);
            $query->match($this->name_attr);
            $query->filterWhere(['category_attr' => $this->category_attr]);
            $query->andFilterWhere(['forum_name_id_attr' => $this->forum_name_id_attr]);
            $dataProvider->sort = [
                'defaultOrder' => ['category_attr' => SORT_ASC, 'datetime_attr' => SORT_DESC],
            ];
            return $dataProvider;
        }
        /**
         * Возвращает массив подкатегорий (forum_name) для переданной категории
         *
         * @param null|integer $id
         * @return array
         */
        public static function subsForCat($id = null)
        {
            $query = Subcategory::find();
            if ($id != null && ($cat = Categories::findOne($id)) !== null)
            {
                $subcatsArr = array_keys(self::find()
                    ->where(['category_attr' => $id])
                    ->groupBy('forum_name_id_attr')
                    ->indexBy('forum_name_id_attr')
                    ->limit(10000)
                    ->asArray()
                    ->all());
                $query->andWhere(['id' => $subcatsArr]);
            }
            return ArrayHelper::map($query->asArray()->all(), 'id', 'forum_name');
        }
        /**
         * Возвращает массив с одной категорией, если передана подкатегория
         *
         * @param null|integer $id
         * @return array
         */
        public static function catForSubs($id = null)
        {
            $query = Categories::find();
            if($id != null && ($subCat = Subcategory::findOne($id)) !== null)
            {
                /** @var TorrentSearch $category */
                $category = self::find()->where(['forum_name_id_attr' => $id])->one();
                $query->andWhere(['id' => $category->category_attr]);
            }
            return ArrayHelper::map($query->asArray()->all(), 'id', 'category_name');
        }
    }
    



    Import

    The main idea is to import categories first (category_info.csv file), then distributions (category _ *. Csv files), in the process of import distributing we take subcategories from them and write in a separate model.

    Import controller
    namespace console\controllers;
    use common\models\Categories;
    use common\models\Subcategory;
    use common\models\Torrents;
    use Yii;
    use yii\console\Controller;
    use yii\helpers\Console;
    use yii\helpers\VarDumper;
    /**
     * Импорт раздач и категорий из csv-файлов
     *
     * Class ImportController
     * @package console\controllers
     */
    class ImportController extends Controller
    {
        public $color = true;
        /**
         * Инструкция
         * @return int
         */
        public function actionIndex()
        {
            $this->stdout("Default: import/import [file_path]. \nDefault file path is frontend/runtime/csv\n\n");
            return Controller::EXIT_CODE_NORMAL;
        }
        /**
         * Основная функция импорта
         *
         * @param string $path
         * @return int
         */
        public function actionImport($path = 'frontend/runtime/csv')
        {
            $fullPath = Yii::getAlias('@' . $path);
            if(!is_dir($fullPath))
            {
                $this->stderr("Path '{$fullPath}' not found\n", Console::FG_RED);
                return Controller::EXIT_CODE_ERROR;
            }
            if(is_file($fullPath . DIRECTORY_SEPARATOR . 'category_info.csv'))
                $categories = $this->importCategories($fullPath);
            else
            {
                $this->stderr("File 'category_info.csv' not found\n", Console::FG_RED);
                return Controller::EXIT_CODE_ERROR;
            }
            if($categories === false)
            {
                $this->stderr("Categories is NOT imported", Console::FG_RED);
                return Controller::EXIT_CODE_ERROR;
            }
            /** @var Categories $cat */
            foreach ($categories as $cat)
            {
                if(!is_file($fullPath . DIRECTORY_SEPARATOR . $cat->file_name))
                    continue;
                $this->importTorrents($cat, $path);
            }
            return Controller::EXIT_CODE_NORMAL;
        }
        /**
         * Импорт торрентов
         *
         * @param \common\models\Categories $cat
         * @param                           $path
         */
        private function importTorrents(Categories $cat, $path)
        {
            $filePath = Yii::getAlias('@' . $path . DIRECTORY_SEPARATOR . $cat->file_name);
            $row = 0;
            if (($handle = fopen($filePath, "r")) !== FALSE)
            {
                while (($data = fgetcsv($handle, 0, ";")) !== FALSE)
                {
                    $row++;
                    $model = Torrents::findOne(['forum_id' => $data[0], 'topic_id' => $data[2]]);
                    if($model !== null)
                        continue;
                    // Subcategory
                    $subcat = $this->importSubcategory($data[1]);
                    if(!($subcat instanceof Subcategory))
                    {
                        $this->stderr("Error! Unable to import subcategory!");
                        $this->stdout("\n");
                        continue;
                    }
                    $this->stdout("Row {$row} of category \"{$cat->category_name}\" ");
                    $this->stdout("and subcategory \"{$subcat->forum_name}\": \n");
                    if($model === null)
                    {
                        if(isset($data[4]))
                        $data[4] = str_replace('\\', '/', $data[4]);
                        // Здесь надо проверить, определились ли поля, а то с этим бывают проблемы
                        // Можно поподробнее распарсить название и убрать оттуда все подозрительные символы, 
                        // но я решил пропускать, если возникает ошибка 
                        if(!isset($data[0]) || !isset($data[1]) || !isset($data[2]) || !isset($data[3]) || !isset($data[4]) || !isset($data[5]) || !isset($data[6]))
                        {
                        $this->stderr("Error! Undefined Field!\n", Console::FG_RED);
                        \yii\helpers\VarDumper::dump($data);
                        $this->stdout("\n");
                        continue;
                        }
                        $model = new Torrents([
                            'forum_id' => $data[0],
                            'forum_name' => $data[1],
                            'topic_id' => $data[2],
                            'hash' => $data[3],
                            'topic_name' => $data[4],
                            'size' => $data[5],
                            'datetime' => strtotime($data[6]),
                            'category_id' => $cat->id,
                        ]);
                    }
                    $model->forum_name_id = $subcat->id;
                    if($model->save())
                    {
                        $this->stdout("Torrent \t");
                        $this->stdout($model->topic_name, Console::FG_YELLOW);
                        $this->stdout(" added\n");
                    }
                    $this->stdout("\n");
                }
            }
        }
        /**
         * Создание подкатегории (forum_name)
         *
         * @param string $subcat_name
         * @return bool|Subcategory
         */
        private function importSubcategory($subcat_name)
        {
            $model = Subcategory::findOne(['forum_name' => $subcat_name]);
            if($model === null)
                $model = new Subcategory(['forum_name' => $subcat_name]);
            if($model->save())
                return $model;
            else
            {
                VarDumper::dump($model->errors);
            }
            return false;
        }
        /**
         * Импорт категорий
         *
         * @param $path
         * @return array|\yii\db\ActiveRecord[]
         */
        private function importCategories($path)
        {
            $file = $path . DIRECTORY_SEPARATOR . 'category_info.csv';
            $row = 1;
            if (($handle = fopen($file, "r")) !== FALSE)
            {
                while (($data = fgetcsv($handle, 0, ";")) !== FALSE)
                {
                    $row++;
                    $this->stdout("Row " . $row . ":\n");
                    $model = Categories::findOne($data[0]);
                    if($model === null)
                    {
                        $model = new Categories([
                            'id' => $data[0],
                            'category_name' => $data[1],
                            'file_name' => $data[2]
                        ]);
                    }
                    if($model->save())
                        $this->stdout("Category {$model->id} with name '{$model->category_name}' imported\n");
                    $this->stdout("\n");
                }
            } else
                return false;
            return Categories::find()->all();
        }
    }
    



    It is better to run the import in screen so that you can close the console. You can, of course, redirect the output to a file and read later, at your leisure.

    Sphinx

    For debian - apt-get install sphinxsearch
    I have installed Sphinx 2.2.9

    /etc/sphinxsearch/sphinx.conf
    source torrentz {
            type = mysql
            sql_host = localhost
            sql_user = webmaster # логин в MySQL
            sql_pass = webmaster # пароль в MySQL 
            sql_db = rutracker # измените на название вашей БД 
            sql_port = 3306
            sql_query_pre = SET NAMES utf8
            sql_query_pre = SET CHARACTER SET utf8
            sql_query = SELECT id, id AS id_attr, \
                    size, size AS size_attr, \
                    datetime, datetime as datetime_attr, \
                    topic_name, topic_name AS name_attr, \
                    topic_id, topic_id AS topic_id_attr, \
                    category_id, category_id AS category_attr, \
                    forum_name_id, forum_name_id AS forum_name_id_attr \
                    FROM torrents
            sql_attr_string = name_attr
            sql_attr_uint = id_attr
            sql_attr_uint = size_attr
            sql_attr_uint = datetime_attr
            sql_attr_uint = topic_id_attr
            sql_attr_uint = category_attr
            sql_attr_uint = forum_name_id_attr
    }
    index torrentz {
            source = torrentz
            path = /var/lib/sphinxsearch/data/
            docinfo = extern
            morphology = stem_enru
            min_word_len = 2
            charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42C->U+430..U+44C, U+42E..U+42F->U+44E..U+44F, U+430..U+44C, U+44E..U+44F, U+0401->U+0435, U+0451->U+0435, U+042D->U+0435, U+044D->U+0435
            min_infix_len = 2
    }
    indexer {
            mem_limit = 512M
    }
    searchd {
            listen = 0.0.0.0:9306:mysql41
            log = /var/log/sphinxsearch/searchd.log
            query_log = /var/log/sphinxsearch/query.log
            read_timeout = 5
            max_children = 30
            pid_file = /var/run/sphinxsearch/searchd.pid
    }
    



    Indexing is triggered by the command

    indexer --config /etc/sphinxsearch/sphinx.conf --all # для первой индексации
    

    indexer --config /etc/sphinxsearch/sphinx.conf --rotate --all # переиндексация при запущенном демоне
    


    That's all.
    In the web interface - standard Yii2 GridView, search - through standard filters.

    In order to complete

    it, you can develop it endlessly, if you want. First of all, you can make selective import of categories / subcategories, a more correct dependent list of categories / subcategories in GridView, the API for remote queries, and then all that comes to mind.

    Maybe I'll do it at my leisure.

    PS I really welcome comments and additions to the code, but please do not bother to write "php sucks, write in ... <insert any other language>" - we have all discussed this a long time ago.
    Comments / additions on the sphinx config are also welcome, and I want to remind you once again - I saw it for the first time in my life and used it only because the author of the original topic wrote about it. Well, for the experiment, of course, but what about :)

    Also popular now: