Google, where is my GMail space going? Do you know exactly how labels work in GMail?


    I began to notice that of the 15 gigabytes of free space Google provides, my mail already takes up almost 12. I do not like this trend.
    On the other hand, I use Thunderbird as my email client, with full synchronization. That is, all messages should be downloaded. Yet the Thunderbird folder with all messages and indexes takes up only 3 gigabytes. Logically, its size should not merely match the space used on GMail but exceed it, because Thunderbird does not compress messages: it stores them as is and builds indexes to speed up search.
    The problem is obvious! Let's get to the bottom of it.

    I started by opening the “All mail” label (yes, in GMail the right word is label, not folder; details here) and saw that I have a little over 500 thousand messages. The situation was complicated by the fact that I have about 100 labels! And GMail labels appear as ordinary folders in Thunderbird. I did not find a quick way to count the total number of messages in Thunderbird. But looking ahead, I can say there are about 200 thousand of them. That explains why it takes less space on disk.
    But the question remains: what are those 300 thousand messages that are not visible in Thunderbird but take up space on GMail?

    An inquisitive mind + a wish not to sleep at night + a wish to try Go on a real task led me to a decision: take the Go compiler, study the GMail API and look under the hood of GMail.
    A Brief Summary of Go Impressions
    Only the laziest have not written about error handling in Go. That is the only thing I looked at more closely.
    Otherwise:
    • I started writing code the very next evening
    • Just another language
    • If life forces me to, I will write in Go
    • For me, C/C++, Python, Java (and PHP too) are likewise languages for their own niches.
    • I guess I am just omnivorous

    And the article is not about Go.

    As I noted above, I have about a hundred labels. A message usually has one label. I wanted to find out how many messages carry each label and how much space they take up in total.
    I did not find a way to see label sizes (the total volume of messages carrying a given label) in the GMail web interface.
    I rolled up my sleeves, installed the Go compiler, spun up MongoDB in a Docker container (yes, I am that kind of pervert! But this is my pet project and I use whatever I want, especially for learning purposes) and started churning out code.
    From here on I will refer to this project of mine.
    I fetch all my labels from GMail with Users.labels: list and store them in the database:
    GMailMessagesSize -importLabels -mongoConnectionString 10.211.55.5
    Imported labels: 112
    

    I fetch the IDs of all messages in the mailbox with Users.messages: list:
    GMailMessagesSize -mongoConnectionString 10.211.55.5 -importMessages
    Processed 100 messages
    Processed 200 messages
    Processed 300 messages
    .......
    Processed 523100 messages
    Processed 523115 messages
    

    Of course, this is not fast, but I could not find a way to parallelize it (the API does not allow it).
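    One likely reason the listing cannot be parallelized is that it is paginated: each response carries the token needed to request the next page. Below is a minimal sketch of that loop. The `page` type, the `fetchPage` signature and `fakeFetch` are hypothetical stand-ins so the example runs offline; they are not the project's code or the real client library.

```go
package main

import "fmt"

// page is a simplified stand-in for one Users.messages: list response.
type page struct {
	IDs           []string
	NextPageToken string // empty on the last page
}

// fetchPage is a hypothetical signature that hides the real API call.
type fetchPage func(pageToken string) (page, error)

// listAllIDs walks the pages one by one. The token for the next page is
// only known once the current page has arrived, so the loop is sequential.
func listAllIDs(fetch fetchPage) ([]string, error) {
	var ids []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return ids, err
		}
		ids = append(ids, p.IDs...)
		if p.NextPageToken == "" {
			return ids, nil
		}
		token = p.NextPageToken
	}
}

// fakeFetch serves three canned pages (made-up data) instead of GMail.
func fakeFetch(token string) (page, error) {
	pages := map[string]page{
		"":   {IDs: []string{"a", "b"}, NextPageToken: "t1"},
		"t1": {IDs: []string{"c", "d"}, NextPageToken: "t2"},
		"t2": {IDs: []string{"e"}},
	}
	p, ok := pages[token]
	if !ok {
		return page{}, fmt.Errorf("unknown page token %q", token)
	}
	return p, nil
}

func main() {
	ids, err := listAllIDs(fakeFetch)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(ids), "message IDs:", ids)
}
```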
    So far we only have a list of message IDs, while for each message we need its labels and size. There is a Users.messages: get method for this. But it is not fast either, even though the request names exactly the fields I am interested in (internalDate, labelIds, sizeEstimate). Somehow I did not find an implementation of Batching Requests.
    But I am writing in Go, and it would be a sin not to use goroutines! No sooner said than done. We pull the information in several threads (as many as we like, but I set a limit of 50). If the connection is fast and the computer is decent, we quickly run into Google's request rate limit. The script can be stopped and resumed, or you can just wait it out: when the limit is hit, the goroutines sleep for 5 seconds and then go back to tormenting Google. Yes, the sleep time could be increased on every retry, say doubled, with an upper bound. But in this case a plain 5 seconds is good enough.
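    The doubling variant mentioned above could look roughly like this. It is a sketch, not the project's code: `errRateLimited`, `backoffDelay` and `processAll` are names invented for illustration, and the fake `flaky` function stands in for the real Users.messages: get call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errRateLimited is a hypothetical error standing in for a rate-limit reply.
var errRateLimited = errors.New("rate limit exceeded")

// backoffDelay doubles base once per previous attempt and caps it at max.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

// processAll runs fetch for every ID on a fixed number of goroutines and
// returns how many IDs were fetched successfully.
func processAll(ids []string, workers int, base, max time.Duration, fetch func(id string) error) int {
	jobs := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				for attempt := 0; ; attempt++ {
					err := fetch(id)
					if errors.Is(err, errRateLimited) {
						// Sleep longer on every consecutive failure.
						time.Sleep(backoffDelay(attempt, base, max))
						continue
					}
					if err == nil {
						mu.Lock()
						done++
						mu.Unlock()
					}
					break // success or a non-retryable error
				}
			}
		}()
	}
	for _, id := range ids {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	return done
}

func main() {
	var mu sync.Mutex
	seen := map[string]bool{}
	// flaky is a fake fetch: the first call for each ID hits the "limit".
	flaky := func(id string) error {
		mu.Lock()
		defer mu.Unlock()
		if !seen[id] {
			seen[id] = true
			return errRateLimited
		}
		return nil
	}
	ids := []string{"a", "b", "c", "d", "e"}
	n := processAll(ids, 3, time.Millisecond, 8*time.Millisecond, flaky)
	fmt.Println("processed", n, "messages")
}
```

    With a 5-second base and, say, a 60-second cap, the waits would go 5, 10, 20, 40, 60, 60 and so on.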
    All in all, my 500 thousand messages took about 3 hours to process, I think. A reasonable time overall.
    GMailMessagesSize -mongoConnectionString 10.211.55.5 -processMessages -procNum 20
    ............................Procecced 100 messages
    ............................Procecced 200 messages
    ............................Procecced 300 messages
    ....
    ............................Processed 523100 messages
    ............................Processed 523115 messages
    

    It is not only dots that show up there. If the rate limit is hit, an S (sleep) is printed instead of a dot; if the message has already been deleted, NF (NotFound) is printed.
    As a result of all this suffering, MongoDB holds a collection of labels and a collection of messages:
    { 
        "SizeEstimate" : NumberLong(63422), 
        "_id" : ObjectId("5677188d2afd90a80e5e06f2"), 
        "id" : "136b83b1ff739dec", 
        "internaldate" : ISODate("2012-04-15T22:47:51.000+0000"), 
        "labelids" : [
            "CATEGORY_PROMOTIONS"
        ], 
        "processed" : true
    }
    

    Now you have all the data at hand to start analyzing it.
    First, I decided to export the labels, their message counts and their total sizes to CSV.
    GMailMessagesSize -mongoConnectionString 10.211.55.5 -showSizes
    LabelId;Label name;Messages size;Messages count
    Label_11;Archives;21279;4
    Label_12;Archives/2012;18684;3
    CATEGORY_FORUMS;CATEGORY_FORUMS;519396295;30038
    CATEGORY_PERSONAL;CATEGORY_PERSONAL;5040188875;268116
    CATEGORY_PROMOTIONS;CATEGORY_PROMOTIONS;2990655727;36508
    CATEGORY_SOCIAL;CATEGORY_SOCIAL;205976374;6553
    CATEGORY_UPDATES;CATEGORY_UPDATES;2769764066;180729
    CHAT;CHAT;0;0
    DRAFT;DRAFT;82817;6
    IMPORTANT;IMPORTANT;6600492209;159268
    INBOX;INBOX;40306538;334
    UNREAD;UNREAD;479586429;11678
    .....
    Label_97;INBOX/Coursera;6021524;151
    Label_77;INBOX/Временная;1077571;28
    Label_63;INBOX/Ответить!!!;6195999;12
    Label_67;INBOX/Поездка в США;1693366;11
    

    This is CSV, which I found convenient to open in Excel and explore (sort and filter).
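    The per-label totals behind such a report can be computed in one pass over the messages. This is a sketch under assumptions: the `message` and `labelStat` types and the sample data are made up for illustration, and the real tool may well do the aggregation inside MongoDB instead.

```go
package main

import (
	"encoding/csv"
	"os"
	"sort"
	"strconv"
)

// message holds the two fields of a stored message that matter here.
type message struct {
	LabelIDs     []string
	SizeEstimate int64
}

// labelStat accumulates the totals for one label.
type labelStat struct {
	Size  int64
	Count int
}

// sizesByLabel adds each message to every label it carries.
func sizesByLabel(msgs []message) map[string]labelStat {
	stats := make(map[string]labelStat)
	for _, m := range msgs {
		for _, l := range m.LabelIDs {
			s := stats[l]
			s.Size += m.SizeEstimate
			s.Count++
			stats[l] = s
		}
	}
	return stats
}

func main() {
	// Made-up sample data, not taken from the real mailbox.
	msgs := []message{
		{LabelIDs: []string{"CATEGORY_PROMOTIONS"}, SizeEstimate: 63422},
		{LabelIDs: []string{"CATEGORY_PROMOTIONS", "IMPORTANT"}, SizeEstimate: 1000},
		{LabelIDs: []string{"INBOX", "UNREAD"}, SizeEstimate: 2500},
	}
	stats := sizesByLabel(msgs)

	// Sort label IDs so the output order is stable.
	ids := make([]string, 0, len(stats))
	for id := range stats {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	w := csv.NewWriter(os.Stdout)
	w.Comma = ';' // semicolon-separated, like the report above
	w.Write([]string{"LabelId", "Messages size", "Messages count"})
	for _, id := range ids {
		s := stats[id]
		w.Write([]string{id, strconv.FormatInt(s.Size, 10), strconv.Itoa(s.Count)})
	}
	w.Flush()
}
```

    Note that a message with several labels is counted under each of them, so the per-label sizes add up to more than the mailbox actually occupies.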

    And at this point I got seriously puzzled. What are these 6 gigs of important messages (with the IMPORTANT label)? What are these 11678 unread messages (with the UNREAD label)? I have (or so I thought) read all my messages! Even if you type label:unread into the GMail search bar, it shows only 106 unread messages! What is going on?

    Googling led me to forums where others were asking why messages deleted in Thunderbird are not deleted in GMail. There are many different cases. I will tell you about the one I find saddest.

    At this point, those who use GMail exclusively in the browser may regret reading this article. BUT!!! You may also read mail on a phone, and perhaps you have a non-GMail client there. In that case, you may have the same problem I do!

    I will not keep you in suspense any longer and will tell you what is going on.
    Watch closely. The sequence of events is as follows:
    1. A message arrives in GMail
    2. The message is assigned the labels INBOX, UNREAD and (this is the important part) possibly some additional label, for example CATEGORY_PROMOTIONS
    3. You open the message in the mail client. The UNREAD label is removed.
    4. You delete the message in the mail client
    5. Drum roll: the INBOX label is removed. And... that is all, nothing else happens
    6. The message still has the CATEGORY_PROMOTIONS label

    Messages labeled CATEGORY_PROMOTIONS are shown if you type category:promotions into the search bar. Do you do that often?
    In short, the messages are simply not deleted! I delete them, and they stay on GMail.
    This is a good moment to recall archiving messages. And it looks like that is exactly what happens!
    Whether deletion in Thunderbird is configured as “Mark for deletion” followed by “Compact”,

    or with the checkbox set to move deleted messages to the Trash,

    the result is ALWAYS the same as archiving!
    Bottom line: the messages go to the archive. And from GMail's point of view, the archive is messages that have no visible labels and are not in the Trash.
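    That definition can be turned into a small check and replayed on the sequence of events described above. This is a sketch under assumptions: which system labels count as “visible” (the `folderLike` set) is my guess, and treating every ID that starts with `Label_` as a user label is inferred from the CSV output shown earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// folderLike lists system labels that place a message in a regular GMail
// view. The exact set is an assumption made for this sketch.
var folderLike = map[string]bool{
	"INBOX": true, "SENT": true, "DRAFT": true,
	"CHAT": true, "SPAM": true, "TRASH": true,
}

// isArchived reports whether a message has no folder-like system label
// and no user label (assumed here to have an ID starting with "Label_").
func isArchived(labelIDs []string) bool {
	for _, l := range labelIDs {
		if folderLike[l] || strings.HasPrefix(l, "Label_") {
			return false
		}
	}
	return true
}

func main() {
	// Label sets after each step of the sequence of events.
	steps := []struct {
		name   string
		labels []string
	}{
		{"arrived", []string{"INBOX", "UNREAD", "CATEGORY_PROMOTIONS"}},
		{"opened", []string{"INBOX", "CATEGORY_PROMOTIONS"}},
		{"deleted in client", []string{"CATEGORY_PROMOTIONS"}},
	}
	for _, s := range steps {
		fmt.Printf("%-18s archived=%v\n", s.name, isArchived(s.labels))
	}
}
```

    Only the last step reports the message as archived: the client has “deleted” it, yet it still sits in the mailbox with its category label.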
    On the one hand, that is fine: the messages can always be found through search.
    But what if I do not want that? What should I do now?
    How do you find and delete all archived messages? Here is a good answer. But I did not dare delete everything at once.
    By the way, I did not find a way to make the search bar show messages that have only one specific label. That is, suppose I decided to delete all messages that have the CATEGORY_PROMOTIONS label and no other. I definitely do not need these advertising messages in the archive. By the way, how many are there?
    GMailMessagesSize -mongoConnectionString 10.211.55.5 -showSizes -l CATEGORY_PROMOTIONS -onlyThisLabel
    LabelId;Label name;Messages size;Messages count
    CATEGORY_PROMOTIONS;CATEGORY_PROMOTIONS;1197364170;14618
    

    More than a gigabyte of them has accumulated there.
    -onlyThisLabel is an important option: it finds only those messages that carry exactly the given label and no others.
    GMailMessagesSize -mongoConnectionString 10.211.55.5 -showSizes -l CATEGORY_PROMOTIONS -l IMPORTANT -onlyThisLabel
    LabelId;Label name;Messages size;Messages count
    CATEGORY_PROMOTIONS;CATEGORY_PROMOTIONS;1197364170;14618
    

    Yes, I have one and a half gigabytes of “important advertising” messages :) Please note that this is in addition to just a gigabyte of unimportant advertising.
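    As I read it, -onlyThisLabel selects messages whose label set is exactly the set given with -l, no more and no less. A minimal sketch of such a check follows; `hasOnlyLabels` is a name made up for illustration and is not necessarily how the tool does it.

```go
package main

import "fmt"

// hasOnlyLabels reports whether msgLabels is exactly the wanted set:
// every wanted label is present and no other label is.
func hasOnlyLabels(msgLabels, wanted []string) bool {
	want := make(map[string]bool, len(wanted))
	for _, l := range wanted {
		want[l] = true
	}
	seen := make(map[string]bool, len(msgLabels))
	for _, l := range msgLabels {
		if !want[l] {
			return false // the message has an extra label
		}
		seen[l] = true
	}
	return len(seen) == len(want) // no wanted label is missing
}

func main() {
	wanted := []string{"CATEGORY_PROMOTIONS", "IMPORTANT"}

	// Same set, different order: matches.
	fmt.Println(hasOnlyLabels([]string{"IMPORTANT", "CATEGORY_PROMOTIONS"}, wanted))
	// IMPORTANT is missing: no match.
	fmt.Println(hasOnlyLabels([]string{"CATEGORY_PROMOTIONS"}, wanted))
	// INBOX is extra: no match.
	fmt.Println(hasOnlyLabels([]string{"CATEGORY_PROMOTIONS", "IMPORTANT", "INBOX"}, wanted))
}
```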
    My hands immediately itched to delete it all!
    GMailMessagesSize -mongoConnectionString 10.211.55.5 -deleteMessages -l CATEGORY_PROMOTIONS -l IMPORTANT -onlyThisLabel -procNum 10
    

    In fact, the messages are not deleted but moved to the Trash. There they either disappear for good after 30 days, or you can go in and empty it manually.

    SUMMARY: If you delete messages not through the GMail web interface but through a third-party client (possibly a mobile one), there is a chance that your messages are not deleted but archived. For some people that is even a good thing. For others it means the mailbox simply swells to an indecent size.
    And it is not even about the 2 bucks a month, for which you can get 100 gigs or more. I just wanted to get to the bottom of the issue.

    ATTENTION!!! I wrote this project for myself. It is my first Go program. I am not responsible for the safety of your mail! But as long as you do not use the -deleteMessages option, nothing will happen to your mailbox.
    What do you need to make the application work?
    • Use this wizard to create or select a project in the Google Developers Console and automatically turn on the API. Click Continue, then Go to credentials.
    • At the top of the page, select the OAuth consent screen tab. Select an Email address, enter a Product name if not already set, and click the Save button.
    • Select the Credentials tab, click the Add credentials button and select OAuth 2.0 client ID.
    • Select the application type Other, enter the name "Gmail API Quickstart", and click the Create button.
    • Click OK to dismiss the resulting dialog.
    • Click the (Download JSON) button to the right of the client ID.
    • Move this file to your working directory and rename it client_secret.json.


    Tell me, is GMail mailbox size a real problem for you? Should I polish my little tool so that you can selectively clean up your archived messages?

    • 61.5% (1123 votes): 15 gigabytes of free space will last me two lifetimes, because I get one message a day
    • 3.2% (59 votes): I bought 30 terabytes from Google and the size of my GMail mailbox does not bother me
    • 30.3% (553 votes): I would like to understand what is stored in my GMail mailbox and how, and a convenient tool would be useful to me
    • 4.8% (89 votes): I do not have a GMail account
