Distributed File Storage Network on Gmail.com

    Everyone has long known about the excellent gmail.com mail service and its ability to store more than 7 gigabytes of mail. Most people have also heard of plugins such as GMail Drive, which let you store files in your account. This post is not about those, though. I want to describe a working system that stores a practically unlimited number of files on gmail.com in a distributed and redundant manner. The problem I had to solve three years ago was where to keep an ever-growing archive of files, many of which I would not need for a long time. Since I am skeptical of paid services, I decided to build a free one. The choice fell on gmail.com, which already provided enough storage space for mail.

    However, the following problems had to be solved:
    1. three years ago, the maximum size of a single message on gmail.com was 10 megabytes
    2. if you upload more than 600 megabytes to an account in a short period, the account is blocked
    3. if you download more than 600 megabytes from an account in a short period, the account is also blocked
    The solution suggested itself: 100 mail accounts were created, and a storage system was written.

    The system consists of the following parts:

    1. Uploading a file to the system

    Since gmail.com did not accept anything larger than 10 megabytes, each file is split into chunks and uploaded piece by piece. Tables were created in MySQL (see the appendix at the bottom of the post), the accounts were added to them, and a script for uploading to gmail was written.
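
    The splitting step can be sketched as follows. This is not the original upload script, which is not shown in this post; it is a minimal Python illustration. The names `plan_chunks`, `read_chunks` and `CHUNK_LIMIT` are hypothetical, and the offsets and sizes it produces correspond to the `FILE_CHUNK_OFFSET` and `FILE_CHUNK_SIZE` columns of the chunk table. In practice the chunk size would probably have to stay somewhat below the 10 megabyte limit, because mail attachments grow when they are encoded.

```python
from typing import Iterator, Tuple

# Assumed ceiling per chunk, in bytes (the 10 megabyte limit from the text).
CHUNK_LIMIT = 10 * 1024 * 1024


def plan_chunks(file_size: int, chunk_limit: int = CHUNK_LIMIT) -> Iterator[Tuple[int, int]]:
    """Yield (offset, size) pairs that together cover file_size bytes."""
    if chunk_limit <= 0:
        raise ValueError("chunk_limit must be positive")
    offset = 0
    while offset < file_size:
        size = min(chunk_limit, file_size - offset)
        yield offset, size
        offset += size


def read_chunks(path: str, chunk_limit: int = CHUNK_LIMIT) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs read sequentially from the file at path."""
    offset = 0
    with open(path, "rb") as handle:
        while True:
            data = handle.read(chunk_limit)
            if not data:
                break
            yield offset, data
            offset += len(data)
```

    Each yielded pair would become one message in some account and one row in the chunk table.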

    2. Providing redundancy

    For this, a script counts how many places each chunk is stored in. If there are fewer than 2 such places (for example, because an account was blocked or the file is new), the script copies the chunk to another account.
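
    The counting logic might look roughly like the sketch below. It is an assumption about how such a script could work, not the original code. The function names are hypothetical, the input tuples mirror the `FILE_ID`, `FILE_CHUNK_OFFSET` and `ACCOUNT_ID` columns of the chunk table, and the rule "prefer the account with the most free space" is my own illustrative choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

MIN_COPIES = 2  # threshold stated in the text

ChunkKey = Tuple[int, int]  # (FILE_ID, FILE_CHUNK_OFFSET)


def find_under_replicated(
    chunk_rows: Iterable[Tuple[int, int, int]],
    banned: Set[int],
    min_copies: int = MIN_COPIES,
) -> Dict[ChunkKey, Set[int]]:
    """Map each chunk with too few live copies to the accounts still holding it.

    chunk_rows holds (FILE_ID, FILE_CHUNK_OFFSET, ACCOUNT_ID) tuples.
    Copies on banned accounts are not counted.
    """
    holders: Dict[ChunkKey, Set[int]] = defaultdict(set)
    for file_id, offset, account_id in chunk_rows:
        live = holders[(file_id, offset)]  # registers the chunk even with no live copy
        if account_id not in banned:
            live.add(account_id)
    return {key: live for key, live in holders.items() if len(live) < min_copies}


def pick_target(
    free_space: Dict[int, int],
    holders: Set[int],
    banned: Set[int],
    chunk_size: int,
) -> Optional[int]:
    """Pick an account for a new copy, or None if no account qualifies.

    Assumed policy: skip banned accounts and current holders, then take
    the account with the most free space that can fit the chunk.
    """
    eligible = [
        account
        for account, free in free_space.items()
        if account not in holders and account not in banned and free >= chunk_size
    ]
    return max(eligible, key=lambda account: free_space[account], default=None)
```

    A chunk whose only copies sit on blocked accounts shows up with an empty holder set, which is the case where the file can no longer be fully restored.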

    3. Downloading a file

    To download a file, an account is chosen at random for each chunk from among those that store it, and the chunks are then loaded into the system one after another.
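
    A rough sketch of this selection and reassembly is given below. It is an illustration under assumptions, not the original code: `plan_download`, `assemble` and `verify_md5` are hypothetical names, and `fetch` stands in for whatever actually retrieves a chunk from an account. The MD5 check is my guess at how the `FILE_MD5` column of the file table could be used.

```python
import hashlib
import random
from typing import Callable, Dict, List, Optional, Tuple


def plan_download(
    replicas: Dict[int, List[int]],
    rng: Optional[random.Random] = None,
) -> List[Tuple[int, int]]:
    """Return (offset, account_id) pairs in ascending offset order.

    replicas maps a chunk offset to the accounts that hold a copy of it.
    One account is picked at random for every chunk.
    """
    rng = rng or random.Random()
    plan = []
    for offset in sorted(replicas):
        accounts = replicas[offset]
        if not accounts:
            raise LookupError(f"no available copy for chunk at offset {offset}")
        plan.append((offset, rng.choice(accounts)))
    return plan


def assemble(plan: List[Tuple[int, int]], fetch: Callable[[int, int], bytes]) -> bytes:
    """Fetch the chunks in plan order and concatenate them.

    fetch(account_id, offset) is a placeholder for the real transfer.
    """
    return b"".join(fetch(account_id, offset) for offset, account_id in plan)


def verify_md5(data: bytes, expected_hex: str) -> bool:
    """Compare the MD5 of the assembled data with a stored hex digest."""
    return hashlib.md5(data).hexdigest() == expected_hex.lower()
```

    Picking accounts at random spreads the traffic, which helps keep any single account under the download limit mentioned earlier.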

    4. Account verification

    A dedicated script periodically checks whether each account has been blocked and saves this information in the database.
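
    One possible shape of such a check is sketched below. The post does not describe how the original script detected a block, so `probe` is a hypothetical callable, and the handling of errors is an assumption. The fields of `Account` loosely mirror the `ACCOUNT_BANNED`, `ACCOUNT_ERRORS` and `ACCOUNT_UPDATE_DATETIME` columns.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class Account:
    account_id: int
    banned: bool = False
    errors: int = 0
    updated_at: Optional[datetime] = None


def check_account(account: Account, probe: Callable[[int], bool], now: datetime) -> Account:
    """Update one account record from the result of a probe.

    probe(account_id) is a placeholder: it should return True when the
    account is usable and False when it looks blocked.
    """
    try:
        usable = probe(account.account_id)
    except OSError:
        # Assumed policy: a network failure is counted, but it does not
        # by itself mark the account as banned.
        account.errors += 1
    else:
        account.banned = not usable
        if usable:
            account.errors = 0
    account.updated_at = now
    return account
```

    The updated records would then be written back to the account table, so that the upload, download and redundancy scripts skip blocked accounts.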

    5. Downloading from other servers

    To avoid generating a large number of logins, a central server was set up that logs into the accounts and stores the cookies. Using a special protocol, the other servers can request these cookies and start downloading a file immediately, skipping the login stage.
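
    The protocol itself is not described in this post, so the sketch below only illustrates the caching idea behind it: log in once per account and hand the same cookies to every server that asks. `CookieBroker`, the `login` callable and the one-hour lifetime are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class CookieBroker:
    """Logs in at most once per account within ttl seconds and reuses the cookies."""

    def __init__(
        self,
        login: Callable[[int], Dict[str, str]],
        ttl: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._login = login  # placeholder for the real login routine
        self._ttl = ttl      # assumed cookie lifetime in seconds
        self._clock = clock
        self._cache: Dict[int, Tuple[float, Dict[str, str]]] = {}

    def cookies_for(self, account_id: int) -> Dict[str, str]:
        """Return cached cookies, logging in again only when they have expired."""
        now = self._clock()
        cached = self._cache.get(account_id)
        if cached is not None and now - cached[0] < self._ttl:
            return dict(cached[1])
        cookies = self._login(account_id)
        self._cache[account_id] = (now, dict(cookies))
        return dict(cookies)
```

    A download server would call something like `cookies_for` over the network and attach the returned cookies to its own requests.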

    Thus, a system was created that allows you to store an almost unlimited number of files and provides a distributed download network.

    Account Table
    CREATE TABLE IF NOT EXISTS `account` (
      `ACCOUNT_ID` int(11) NOT NULL auto_increment,
      `ACCOUNT_LOGIN` varchar(32) NOT NULL default '',
      `ACCOUNT_PASSWORD` varchar(32) NOT NULL default '',
      `ACCOUNT_SIZE` int(11) NOT NULL default '0',
      `ACCOUNT_MAX_SIZE` int(11) NOT NULL default '0',
      `ACCOUNT_UPLOAD_ENABLED` tinyint(4) NOT NULL default '0',
      `ACCOUNT_DOWNLOAD_ENABLED` tinyint(4) NOT NULL default '0',
      `ACCOUNT_BANNED` tinyint(4) NOT NULL default '0',
      `ACCOUNT_ERRORS` int(11) NOT NULL default '0',
      `ACCOUNT_INVITES` int(11) NOT NULL default '0',
      `ACCOUNT_UPDATE_DATETIME` datetime default NULL,
      PRIMARY KEY (`ACCOUNT_ID`),
      UNIQUE KEY `ACCOUNT_LOGIN` (`ACCOUNT_LOGIN`),
      KEY `ACCOUNT_SIZE` (`ACCOUNT_SIZE`),
      KEY `ACCOUNT_DOWNLOAD_ENABLED` (`ACCOUNT_DOWNLOAD_ENABLED`),
      KEY `ACCOUNT_BANNED` (`ACCOUNT_BANNED`),
      KEY `ACCOUNT_ERRORS` (`ACCOUNT_ERRORS`),
      KEY `ACCOUNT_INVITES` (`ACCOUNT_INVITES`),
      KEY `ACCOUNT_UPDATE_DATETIME` (`ACCOUNT_UPDATE_DATETIME`)
    );

    File table
    CREATE TABLE IF NOT EXISTS `file` (
      `FILE_ID` int(11) NOT NULL auto_increment,
      `FILE_NAME` varchar(255) NOT NULL default '',
      `FILE_SIZE` int(10) unsigned NOT NULL default '0',
      `FILE_MD5` varchar(32) default NULL,
      `FILE_DOWNLOAD_REQUEST_COUNT` int(10) unsigned NOT NULL default '0',
      `FILE_DAMAGED` tinyint(4) NOT NULL default '0',
      `FILE_NONREMOVABLE` tinyint(4) NOT NULL default '0',
      `FILE_ACCESS_DATETIME` datetime default NULL,
      `FILE_DOWNLOAD_RATE` float NOT NULL default '0',
      PRIMARY KEY (`FILE_ID`),
      KEY `FILE_SIZE` (`FILE_SIZE`),
      KEY `FILE_DOWNLOAD_COUNT` (`FILE_DOWNLOAD_REQUEST_COUNT`),
      KEY `FILE_NONREMOVABLE` (`FILE_NONREMOVABLE`),
      KEY `FILE_ACCESS_DATETIME` (`FILE_ACCESS_DATETIME`),
      KEY `FILE_DOWNLOAD_RATE` (`FILE_DOWNLOAD_RATE`),
      KEY `FILE_MD5` (`FILE_MD5`),
      KEY `FILE_DAMAGED` (`FILE_DAMAGED`,`FILE_NONREMOVABLE`)
    );

    Chunk table
    CREATE TABLE IF NOT EXISTS `file_chunk` (
      `FILE_CHUNK_ID` int(11) NOT NULL auto_increment,
      `FILE_ID` int(11) NOT NULL default '0',
      `FILE_CHUNK_OFFSET` int(11) NOT NULL default '0',
      `FILE_CHUNK_SIZE` int(11) NOT NULL default '0',
      `ACCOUNT_ID` int(11) NOT NULL default '0',
      `FILE_THREAD_ID` varchar(16) NOT NULL default '0',
      `FILE_ATTACH_ID` decimal(2,1) NOT NULL default '0.0',
      PRIMARY KEY (`FILE_CHUNK_ID`),
      UNIQUE KEY `ACCOUNT_ID` (`ACCOUNT_ID`,`FILE_THREAD_ID`,`FILE_ATTACH_ID`),
      KEY `FILE_ID` (`FILE_ID`,`FILE_CHUNK_OFFSET`)
    );
