On reinventing the wheel in the field of email

    As fate would have it, I look after a mail server. It is small, about 20 users. It runs stably, and changing the software is undesirable. Nothing would have needed doing, but one day the backup logs hinted unambiguously: carry on like this and a full backup will take the whole night. And the cause is the size of the user mailboxes.


    The problem is identified, so it has to be solved. The head-on approach, buying more powerful hardware, is not to my taste, and the budget is not made of rubber. The obvious option is quotas. In practice they do not help much. Solemn assurances of "I cleaned everything out" turn, on closer inspection, into cat pictures, funny images and family photo archives (in corporate mail, yes). And the number of cries of "urgent, everything is on fire, nothing works, fix it now" grows by an order of magnitude. At this rate one could lose faith in people.

    Fortunately, I am not a psychologist, a coach or a mentor. My business is technology. So let's approach it from the technical side.

    The first idea was self-destructing messages. Roughly speaking, everything without an "important" flag gets deleted after N days. To my taste, this ought to be built into the email standards. But so far it is not, and implementing it myself looked like too large an undertaking.

    The second idea concerned copies (CC). You know, those messages where you are not the main addressee and the letter reaches you just for information. Some of these could be deleted automatically. But here the users unexpectedly split into two camps: "I need all of them, how dare you" and "what is that?". I could not come up with an automatic sorting algorithm under such conditions.

    Well, if not delete, then deduplicate! Take all the copies and replace them with symbolic links. A quick analysis showed that handling only FULL duplicates this way saves a THIRD of the storage. But, but, but. Unfortunately this path is a dead end because of numerous technical limitations.

    Details for the curious, under the spoiler:
    - not all archivers understand symlinks;
    - the server software goes crazy in places;
    - organizational difficulties and access-rights issues.
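
    For reference, the "quick analysis" of full duplicates can be approximated in a few lines. This is only a sketch in Python, not the tooling actually used here: it assumes the mailboxes are stored as one file per message under some directory, and the function names are made up for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_full_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group regular files under root by content; keep groups of 2 or more."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(duplicates: dict[str, list[Path]]) -> int:
    """Bytes that would be freed if every group kept a single copy."""
    return sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

    Comparing reclaimable_bytes() with the total size of the store gives the kind of "a third" estimate mentioned above. The sketch only finds the duplicates; turning them into symlinks is exactly the step that ran into the limitations listed.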

    By the way, the settings my mail server offers for general backups and for per-user archive storage are very meagre. So there was little room for manoeuvre.

    What is left? I looked sadly at the cat pictures and was already contemplating a simple neural network that would clean up the mail for the user. And then... Hold on, excuse me, but what are cat pictures doing inside a letter at all? I recall that a letter with an attachment weighs almost a third more than the attachment alone! So why not move the attachment out?..
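
    The "third more" figure is easy to check. Base64 turns every 3 bytes into 4 characters, and MIME adds a line break after every 76 characters. A minimal Python check (the helper name is mine, and the payload is just zero bytes standing in for a photo):

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Relative size increase when raw bytes are Base64-encoded MIME-style."""
    # encodebytes() wraps the output at 76 characters per line, as mail does.
    encoded = base64.encodebytes(raw)
    return len(encoded) / len(raw) - 1


# A 3 MB stand-in for an attached photo; the content does not affect the size.
print(f"overhead: {base64_overhead(bytes(3_000_000)):.1%}")
```

    The result comes out a little over a third, before counting the headers and the second copy of the letter text.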

    Thus began a journey full of "many wonderful discoveries". Had I known... well, you understand. A drop of ignorance and courage leads us to victories!

    So: we are going to store attachments separately from letters.

    The main mistake you can make here is to open an eml file in a text editor and decide that it is plain text. That is what I did. And I was delighted: I'll knock out a batch file right now. Command-line utilities for extracting attachments whole do exist: github.com/erikvdv1/eml-attachments or github.com/maiken2051/uudeview, offhand. There are problems with encodings, but that is not the main thing.

    The main thing: pulling the file out and creating a link to it is trivial. But pushing that link back into the original letter... Because that is not text in there. That is MIME.
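
    To show how trivial the "pull the file out" half is once a real MIME parser does the work, here is a sketch using Python's standard email package. It is not the utility described in this post; the function name and the fallback file name are invented for the example.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def extract_attachments(raw_message: bytes, out_dir: Path) -> list[Path]:
    """Save every attachment of one message into out_dir; return the paths."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, part in enumerate(msg.iter_attachments()):
        payload = part.get_payload(decode=True)  # undoes Base64 / quoted-printable
        if payload is None:  # e.g. an attached message/rfc822 container
            continue
        # Keep only the last path component so a hostile name cannot escape out_dir.
        name = Path(part.get_filename() or f"attachment-{index}.bin").name
        target = out_dir / name
        target.write_bytes(payload)
        saved.append(target)
    return saved
```

    The sketch overwrites files with the same name and ignores inline images; a real tool would have to decide what to do with both.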

    The experienced reader is, of course, laughing at the hapless author by now. The author, meanwhile, was discovering the delights of the "standard". The most important thing I learned: you do not need fly agaric to go berserk.

    Examples and swearing are under the spoiler:

    charset=UTF-8
    charset = "UTF-8"
    charset="UTF-8"
    charset=UTF-8;
    charset="UTF-8";
    charset = "UTF-8";
    To them, these are all one and the same.
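
    A tolerant parser does treat all six spellings alike. As an illustration only (Python's standard email package with its default, lenient compat32 policy; the helper name is mine), each variant can be fed through as a Content-Type parameter:

```python
import email

# The six spellings from the list above, as they might follow "text/plain; ".
VARIANTS = [
    'charset=UTF-8',
    'charset = "UTF-8"',
    'charset="UTF-8"',
    'charset=UTF-8;',
    'charset="UTF-8";',
    'charset = "UTF-8";',
]


def charset_of(parameter: str) -> str:
    """Parse a one-header message and return its normalized charset."""
    raw = f"Content-Type: text/plain; {parameter}\n\nbody\n"
    msg = email.message_from_string(raw)
    # get_content_charset() unquotes the value and lowercases it.
    return msg.get_content_charset()


for variant in VARIANTS:
    print(f"{variant!r:24} -> {charset_of(variant)}")
```

    Writing that normalization by hand with string splitting is where the swearing starts.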

    Line breaks in the middle of a Base64 stream. Where they come from is still a mystery to me.

    And the opposite: no \r\n\r\n after the header section.

    Within the header itself, the fields come in whatever order anyone pleases.

    Old letters allow a line length of no more than 80 characters, including service characters.

    File names can contain line breaks (in the body of the letter, not in the name itself).

    In general, line breaks can be anywhere, even though the standard declares a line break to be the end of the current parameter.

    The text of the letter itself is encoded. Exactly how it is encoded is left to the conscience of the particular server, and there is a whole (stinking) heap of options.

    Oh, and a letter almost always has an HTML part as well. That is, if you send "Hello" and there is a br or p tag in it, the letter will always contain TWO sections: one with plain text and one with tags. And the text is duplicated. So this is where they "saved" computing power... It is a menagerie with Frankenstein in it.

    File names come like this: filename="=?encoding?type?; or like this: filename*0*=encoding'' (WHAT??!!). The second is the newer standard, RFC5987 (for mail headers the same syntax, including the *0* continuations, is defined in RFC 2231). The standard states outright that filename*0*=ENC and filename="=? are one and the same. At this point I was finally convinced they were mocking us. How this can be processed sanely, I do not know.
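
    For what it is worth, both spellings can be decoded with library help. The sketch below uses Python's standard email package and is an illustration, not the code of the utility described here; the helper name and the sample headers are made up. The RFC 2231 form is handled by get_filename() itself, the encoded-word form needs an extra decode_header() pass.

```python
import email
from email.header import decode_header, make_header


def attachment_filename(content_disposition: str):
    """Return the decoded file name from a Content-Disposition value, or None."""
    msg = email.message_from_string(f"Content-Disposition: {content_disposition}\n\n")
    # get_filename() joins filename*0*, filename*1*, ... and applies the
    # charset named before the two apostrophes.
    name = msg.get_filename()
    if name is not None and "=?" in name:
        # The older spelling: an encoded word such as =?UTF-8?B?...?=
        name = str(make_header(decode_header(name)))
    return name


# The same Cyrillic name, "kot.jpg" in Russian letters, in both spellings.
old_style = 'attachment; filename="=?UTF-8?B?0LrQvtGCLmpwZw==?="'
new_style = "attachment; filename*0*=UTF-8''%D0%BA%D0%BE%D1%82; filename*1*=.jpg"
print(attachment_filename(old_style) == attachment_filename(new_style))
```

    This covers only well-formed headers; names broken across lines in non-standard ways still need the kind of manual handling complained about above.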

    Apple, as usual, distinguished itself separately. They seem to have a standard all of their own. Looking ahead: long attempts to process their output led to the only right solution: "Error: Apple mail is not supported."

    Thunderbird copes, though. In despair I dug into its source code, but could not find the relevant section in one and a half gigabytes of code written in a mix of Python and dialects of Java. I went to their IRC, where I was kindly told where to look, but I still did not find it.

    But I did not lose heart. Don't read the docs @ write the code, and done. No, seriously, something had to be done to bring the end of MIME closer.

    The batch script did not work out. The result is a command-line utility written in C# on .NET.

    The utility has two modes of operation:
    First: it simply extracts attachments. It handles encodings correctly under Windows.

    Second: this is where the real fun is. Now we can finally keep mail attachments separate from the mail! The utility creates a new letter in place of the old one: the attachment is cut out, and the letter is reformatted as plain HTML in UTF encoding with unlimited line length. The text/plain section is taken as the basis. If the html section contains tables, they are carried over with the formatting inside the table preserved, though this works so-so. At the end of the text of the current letter (if it is a reply or a forward), links to network resources are inserted, with the path to the extracted files in file:/// and ftp:// formats.
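
    The second mode can be pictured roughly like this. The sketch is Python rather than the C# of the actual utility, and it is heavily simplified: no table transfer, the links go at the very end rather than before the quoted text, and the function name, header list and ftp_base parameter are all assumptions made for the example.

```python
import html
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from pathlib import Path
from urllib.parse import quote


def detach(raw_message: bytes, store_dir: Path, ftp_base: str) -> bytes:
    """Return a new HTML-only message with attachments replaced by links."""
    old = BytesParser(policy=policy.default).parsebytes(raw_message)
    body = old.get_body(preferencelist=("plain",))  # text/plain is the basis
    text = body.get_content() if body is not None else ""

    store_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for index, part in enumerate(old.iter_attachments()):
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        name = Path(part.get_filename() or f"attachment-{index}.bin").name
        target = store_dir / name
        target.write_bytes(payload)
        links.append(target.resolve().as_uri())               # file:///...
        links.append(f"{ftp_base.rstrip('/')}/{quote(name)}")  # ftp://...

    new = EmailMessage()
    for header in ("From", "To", "Cc", "Subject", "Date", "Message-ID"):
        if old[header] is not None:
            new[header] = str(old[header])
    anchors = "".join(
        f'<p><a href="{html.escape(url)}">{html.escape(url)}</a></p>' for url in links
    )
    page = f"<html><body><pre>{html.escape(text)}</pre>{anchors}</body></html>"
    new.set_content(page, subtype="html")  # UTF-8 by default
    return new.as_bytes()
```

    A call such as detach(raw, Path("/srv/attachments"), "ftp://files.example.com/mail") (both locations hypothetical) returns the bytes of the replacement letter; putting them back into the mailbox in place of the original is server-specific and is not shown.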


    The system was tested on 10,000+ letters and deployed on the existing infrastructure.

    Advantages found:
    + Backup time. Before:
    Backup
    was started at 01:00:08
    and ended successfully 03:26:32

    After:
    Backup
    was started at 01:00:09
    and successfully completed 01:40:36
    That is 2 h 26 min before against about 40 min after.

    + Saved 30+% of storage: the files leave heavy Base64 and its ilk for the normal format of the file system, plus many duplicates turned up even within individual mailboxes.

    + Mailboxes are processed faster, both by the server and by mail clients.

    + The "I opened it straight from the mail, edited it for 10 hours and it did not save" problem disappears.

    + Quotas can be dropped.

    + An attachment can still be found through the mail, unlike with a simple move to file storage.

    + The end of MIME draws nearer. Repent, authors!

    Cons of the solution:

    - some letters (but not attachments) still come out broken. Mostly not in themselves, but when viewed in certain clients;
    - some devils keep trying to break into the ftp;
    - not all email clients support opening via file:///

    Debatable points:

    ? Apple Mail is not supported. As far as I am concerned, Buddha be with it;
    ? Letters with complex formatting get broken. These are usually Booker flyers or advertisements;
    ? If the ftp server is on a non-standard port, there may be access problems. I solved this with a mail bot.

    So the problem was solved, by a thorny path.

    Thanks for your attention!
