Mailing lists: collecting and cleaning address lists in a civilized way

    This post would fit the "Spam and anti-spam" hub better, but on the whole it still belongs in the off-topic hub "I am PR".

    Recently I had to collect the email addresses of schools in order to invite them to take part in interscholastic competitions. Habr has had several posts about collecting email addresses from commercial and non-commercial sites, but I have not seen a single truly effective and civilized automatic or semi-automatic way of doing it, although the need comes up regularly. 99% of the tools are email generators and "databases for sale", or buggy desktop software that nobody wants to use.

    A lyrical digression. The line between spam and anti-spam is a very fine one, so I will give a definition right away: a civilized (or delicate) method is one that respects the people who will receive the mailing. Compiling the list by hand is the best option, but the pace of modern life forces you to automate everything you can, because the purpose of any mailing is to inform a large number of people in the shortest possible time.

    A couple of weeks ago I was contacted by the developer of the spider-post.com service, which addresses this problem. He suggested that I test the service and post a review on Habr. I agreed, because the topic interests me and I had not found any similar tools. I will be glad to see links to other services in the comments.

    I will pass all your questions on to the developers, and their answers will appear in the comments.

    I pictured a compromise solution to the email collection task as follows:
    • select sites related to your business according to certain criteria;
    • collect email addresses from them;
    • check the addresses for validity;
    • manually remove the addresses that look doubtful;
    • send a trial mailing with an offer to subscribe on an ongoing basis.
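
    The collection and pre-cleaning steps above can be sketched in Python roughly as follows. This is only an illustration under my own assumptions, not the way any particular service works: the regular expression is deliberately simple (it is not a full RFC 5322 parser), and the names `extract_emails`, `is_doubtful`, `split_for_review` and the list of "doubtful" heuristics are hypothetical. Downloading the pages and checking whether a mailbox is alive are left out.

```python
import re

# Deliberately simple pattern: it misses some valid addresses
# and accepts some invalid ones; it is not RFC 5322 complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Hypothetical heuristics for addresses that deserve a manual look.
ROLE_PREFIXES = ("noreply", "no-reply", "abuse", "postmaster")
FILE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg")


def extract_emails(text):
    """Return unique lower-cased addresses in first-seen order."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


def is_doubtful(address):
    """Flag addresses that look machine-generated or are not mailboxes at all."""
    local, domain = address.rsplit("@", 1)
    digits = sum(ch.isdigit() for ch in local)
    return (
        local.startswith(ROLE_PREFIXES)      # service mailboxes
        or domain.endswith(FILE_SUFFIXES)    # e.g. "logo@2x.png" from image names
        or len(local) > 30                   # suspiciously long local part
        or digits > len(local) / 2           # mostly digits
    )


def split_for_review(addresses):
    """Split addresses into (clean, doubtful); the second list goes to manual review."""
    clean = [a for a in addresses if not is_doubtful(a)]
    doubtful = [a for a in addresses if is_doubtful(a)]
    return clean, doubtful
```

    The doubtful list is not thrown away automatically: it is exactly the part that is worth looking through by hand before the trial mailing.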


    Spider Post takes a similar approach.

    • you select a region and specify lists of key phrases that describe your business;
    • the service finds sites in the search engines that match these parameters and collects email addresses from them. A finished list is available within a few hours. According to the developer, the service analyzes the part of the address after "@" and checks whether the site and the mailbox are alive, how old the site is, and whether it is commercial;
    • after that, the lists can be downloaded and cleaned manually (the report also contains the addresses of the source sites, which makes cleaning more efficient).
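
    Since the report pairs each address with the site it was found on, one way to speed up manual cleaning is to group the addresses by the part after "@" and mark whether that domain matches the source site. The Python sketch below is my own assumption about how such a check might look, not a description of what Spider Post does; `host_of` and `group_by_domain` are hypothetical names.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host_of(url):
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def group_by_domain(records):
    """Group (email, source_url) pairs by the domain after '@'.

    Each entry also carries a flag telling whether the email domain
    matches the site it was found on (or is a parent domain of it).
    """
    groups = defaultdict(list)
    for email, url in records:
        domain = email.rsplit("@", 1)[-1].lower()
        site = host_of(url)
        same_site = domain == site or site.endswith("." + domain)
        groups[domain].append((email, url, same_site))
    return dict(groups)
```

    Addresses on a public mail domain that does not match the site are not necessarily garbage (many schools use free mail services), so the flag is only a hint about where to look first.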


    I tested the service on several topics I know something about (high schools, phosphors, perimeter security). The results and conclusions are below.

    Screenshot of the completed order page:


    Detailed information on the results:


    My impression is mixed.
    1. I used narrowly specialized queries to minimize the chance of garbage ending up in the final list.
    2. In every case the resulting database contained tens of thousands of addresses, a large number of them "strange looking".
    3. Cleaning such a list by hand is simply unrealistic; besides, most B2B markets can hardly boast that many participants and, accordingly, that many email addresses.


    A few suggestions for the developers. Here is what I would like to see added:
    1. The ability to use the search engine's query language, which would make it possible to narrow down the set of sites to collect from.
    2. Collection of additional information: besides the site address, its title and description from Yandex.Catalog or from the search engine.
    3. The ability to specify exactly which sites to collect addresses from (for my task with schools, for example, there are federal resources).


    These simple additions would reduce the share of garbage in the lists and make them easier to clean.
