D'Artagnan and the Internet, or Working on the Problem of Broken Links

    Picture 1
    Gentlemen, enough of treating links purely in terms of their number, their purchase and sale, and the PR of the site where they are placed. It is time to take care of people, not robots. Working with the Internet is becoming unbearable. Farms of auto-generated sites with crap texts (ladies, forgive me) bloom and multiply. Because of them it is impossible to find even technical material, let alone ordinary material. But I would not worry so much about searching for technical material if it contained correct links. Links die like flies, and when you read a forum or blog post from a year ago, there is almost no hope that its links still work.

    I consider broken links a very big problem of the modern Internet, although somehow nobody talks or thinks about it. I think it is time to do at least something. We are already doing something, and I will tell you about it. I hope the example will inspire others to take care of their users too.



    There are so many broken links on the Internet that it is hard even to pick one as an example. Everyone has had the experience: you read something interesting, confidently click a link, and end up nowhere. And the target is usually not a dead site but a perfectly live one. So live, in fact, that it keeps being reorganized with no thought for redirecting users who arrive from external resources. Often the owners do not even care about links from their own pages. A good example of this is an article on MSDN.

    Someone will object that there is nothing wrong with material moving somewhere else, since it can always be found easily on Google. First, even when that is possible, it wastes a huge amount of time. And this is a big problem. A single useful resource, moved at the whim of a site administrator, takes time from thousands, and in some cases millions, of people. Each of them is forced to search for the material and click through links.

    Second, in other cases finding the material is extremely difficult, or the person who needs it is simply not able to do it. I'll give two examples where "go to Google" does not help.

    The first example. To release a plug-in for Microsoft Visual Studio, you have to obtain a special key (PLK) for each version on the Microsoft website. For several years this key was issued on the page msdn.microsoft.com/en-us/vsx/cc655795.aspx (the link no longer works). A couple of months ago someone decided that the perfectly well named "vsx" section was wrong and renamed it "vstudio", so the new link became http://msdn.microsoft.com/en-us/vstudio/cc655795.aspx . But EVERYWHERE, including Microsoft's own sites, the links were still the old ones. A Google search also returned only the old link, since the new one was not mentioned anywhere. Help came from the Microsoft forum, where someone pointed to the new page. The question is: did anyone feel better because the link changed? How many people around the world were forced to hunt for the answer? And if you really want to change a link, is it really so difficult to set up a redirect?
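
    For a rename like this one, a redirect can be a few lines of code. Below is a minimal sketch, assuming a hypothetical front controller that sees the requested path. The names `redirectTargetFor` and `$renamedSections` are illustrative and say nothing about how the MSDN site is actually built:

```php
<?php
// Hypothetical map of renamed sections: old path prefix => new path prefix.
$renamedSections = [
    '/en-us/vsx/' => '/en-us/vstudio/',
];

// Returns the new path for a request that hits a renamed section,
// or null when the path is not affected by any rename.
function redirectTargetFor(string $path, array $renamedSections): ?string
{
    foreach ($renamedSections as $old => $new) {
        if (strncmp($path, $old, strlen($old)) === 0) {
            return $new . substr($path, strlen($old));
        }
    }
    return null;
}

// In a front controller the path would come from the request, for example:
//   $path   = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
//   $target = redirectTargetFor($path, $renamedSections);
//   if ($target !== null) { header('Location: ' . $target, true, 301); exit; }
```

    With such a map in place, the old PLK address would keep working and would send the visitor to the renamed section.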

    And here is another, more emotional example. There is a book called "C # for schoolchildren", released with the support of Microsoft and aimed at children aged 12 to 16.
    Picture 2
    Personally, I am not sure that studying C # at that age is a sensible idea, but the book as a whole makes a very good impression. It owes much of that to its very funny explanatory pictures, of which there are a great many.
    Picture 3
    Now imagine how much effort people put into creating such a book. Someone came up with a Microsoft promotional initiative to introduce children to C # while they are still at school; one person wrote the book, another translated it, an artist redrew the pictures so that the text in them was in Russian and, probably, in other languages. A lot of money and time was spent. And what is the result? I am sure there is none!

    I very much doubt that a child will get past "Part 1. First acquaintance", because that is where they are told to download and install Microsoft Visual C # 2008 Express Edition. I have no doubts about the students' abilities. They download and install Starcraft 2 without any help, and they know their way around all kinds of iPhones better than I do. The reason is more banal. The book simply tells them to download it from an address that no longer exists:
    Picture 4
    The result of following the link:
    Picture 5
    So the question is: why create this book at all if everything breaks down because content is mindlessly moved around the site? I very much doubt that a thirteen-year-old will then go to Google to search for a download of the mysterious beast "Microsoft Visual C # 2008 Express Edition". With a 90% probability, learning C # will end at this chapter.

    Yes, it might seem that I am criticizing Microsoft. I am not: other sites are no better, these just happen to be the examples I came across.

    What are my conclusions from all this?


    It is very easy to ruin all your work, be it a blog post, a service, a book or any other project, when someone else (or you yourself) changes the address of a resource you link to. After that, the value of your creation, if it does not drop to zero, at least becomes much lower, since your readers and users have to spend time and nerves searching for the right link on their own.

    How do we solve this problem?


    We write technical articles and often refer to documentation, tools, and third-party blog posts. As a result, we regularly run into materials and articles being moved on third-party sites. For some reason, the sites of large companies such as Microsoft, Intel or AMD are especially guilty of this. They move entire sections, and as a result, looking for help in articles by Microsoft / Intel employees that are at least a year old becomes a very thankless task. Whichever link you click, you end up nowhere. I think many programmers will understand my feelings.

    I am sure many people do not care: a link leads nowhere, well, so be it. And in a way they are right, since there are so many dead links on the Internet. However, we write articles for people, not for search engines. And I say this with pride. We may not have earned millions yet, but at least for a moment I want to feel like d'Artagnan.

    So it matters to us that our articles contain correct links not only to materials on our own site but also to external sites. This means we have to fix the links that start leading nowhere. The task is complicated by the fact that we publish our articles on many other sites. Naturally, we have neither the energy nor, in some cases, the technical ability to edit the links there.

    The natural solution is to create a redirect system. I will describe how it works for us; perhaps someone will want to build something similar. I really do hope someone gets interested, because I am so tired of roads to nowhere!

    The system consists of a database that stores pairs: a short link and a link to an external resource. The user interface for adding links is quite simple and is shown in the figure below.
    Picture 6
    You just enter the link to the external resource and get a short link to insert into articles, blogs and so on. If the address of the external resource is already in the database, the previously created short link is returned:
    Picture 7
    If there is no such link in the database, a new pair is created and a new short link is generated:
    Picture 8
    Technically, a record is stored in the links table of the database and consists of the following fields:
    • id - primary key
    • num - link number; this is the number that determines which link the qwerty.php script fetches from the database
    • link - the text of the link itself
    • link_category_id - the number of the category the link belongs to; this field does not matter for the script, link categories exist only for the convenience of the user
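
    The record layout above can be sketched as a table definition. The example uses an in-memory SQLite database through PDO only to stay self-contained. The column types and the UNIQUE constraints are assumptions, not the actual viva64.com schema, and the category id in the sample row is made up:

```php
<?php
// In-memory SQLite stands in for the real database (an assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec(
    'CREATE TABLE links (
        id               INTEGER PRIMARY KEY,      -- primary key
        num              INTEGER NOT NULL UNIQUE,  -- number used in qwerty.php?url=<num>
        link             TEXT    NOT NULL UNIQUE,  -- the external address itself
        link_category_id INTEGER                   -- optional, only for the convenience of the user
    )'
);

// One sample row; the category id 1 is invented for illustration.
$pdo->prepare('INSERT INTO links (num, link, link_category_id) VALUES (?, ?, ?)')
    ->execute([341, 'http://msdn.microsoft.com/en-us/isv/bb190527.aspx', 1]);
```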

    When the "Generate" button is clicked, a request containing the address of the link to be added is sent to viva64.com. The script that processes the request looks something like this:
    // $add_url is assumed to be escaped already (e.g. with mysql_real_escape_string).
    $sql = "select * from links where link = '" . $add_url . "'";
    $link = mysql_query($sql);
    if (mysql_num_rows($link)) {
        // The address is already known: reuse its number.
        $row = mysql_fetch_array($link, MYSQL_ASSOC);
        $new_url = "http://www.viva64.com/qwerty.php?url=" . $row['num'];
    }
    else {
        // New address: take the largest existing number and add one.
        $sql = "select * from links order by num desc";
        $link = mysql_query($sql);
        $row = mysql_fetch_array($link, MYSQL_ASSOC);
        $last_num = $row['num'] + 1;
        $sql = "insert into links (num, link) values
               (" . $last_num . ", '" . $add_url . "')";
        $link = mysql_query($sql);
        $new_url = "http://www.viva64.com/qwerty.php?url=" . $last_num;
    }

    The script receives this address in the $add_url variable and checks whether it is already in the database:
    $sql = "select * from links where link = '" . $add_url . "'";
    $link = mysql_query($sql);

    If it is, a link that calls the redirect script with the number of the address taken from the database is simply written to the $new_url variable:
    if (mysql_num_rows($link)) {
        $row = mysql_fetch_array($link, MYSQL_ASSOC);
        $new_url = "http://www.viva64.com/qwerty.php?url=" . $row['num'];
    }

    If the address is not found, the script takes the largest link number in the links table, adds a new record with that number incremented by one, and then writes the new link that calls the redirect script to the $new_url variable:
    else {
        $sql = "select * from links order by num desc";
        $link = mysql_query($sql);
        $row = mysql_fetch_array($link, MYSQL_ASSOC);
        $last_num = $row['num'] + 1;
        $sql = "insert into links (num, link) values
               (" . $last_num . ", '" . $add_url . "')";
        $link = mysql_query($sql);
        $new_url = "http://www.viva64.com/qwerty.php?url=" . $last_num;
    }

    Either way, the user receives a redirect link, whether a new address was added to the database or an existing one was found.
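
    The same get-or-create logic can also be written with prepared statements, so that the address never becomes part of the SQL text. This is only a sketch under assumptions: PDO with an in-memory SQLite table instead of our MySQL database, and a hypothetical helper named `shortLinkFor`:

```php
<?php
// Returns the short link for $url, creating a new record if needed.
function shortLinkFor(PDO $pdo, string $url): string
{
    $base = 'http://www.viva64.com/qwerty.php?url=';

    // Reuse the existing number if the address is already stored.
    $find = $pdo->prepare('SELECT num FROM links WHERE link = ?');
    $find->execute([$url]);
    $num = $find->fetchColumn();

    if ($num === false) {
        // Otherwise take the next free number and store the new pair.
        $num = (int) $pdo->query('SELECT COALESCE(MAX(num), 0) + 1 FROM links')->fetchColumn();
        $pdo->prepare('INSERT INTO links (num, link) VALUES (?, ?)')->execute([$num, $url]);
    }
    return $base . $num;
}

// Demo on a throwaway in-memory table (the schema is an assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE links (id INTEGER PRIMARY KEY, num INTEGER UNIQUE, link TEXT UNIQUE)');

$first  = shortLinkFor($pdo, 'http://msdn.microsoft.com/en-us/vstudio/cc655795.aspx');
$second = shortLinkFor($pdo, 'http://msdn.microsoft.com/en-us/vstudio/cc655795.aspx');
```

    In the demo both calls return the same short link, because the second call finds the record created by the first. The UNIQUE constraints are there so that two simultaneous requests cannot silently create duplicate records.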

    Redirection mechanism


    The redirect script on viva64.com is not complicated. All it does is take the link number as a parameter, fetch the link with that number from the database, and redirect the visitor to it. In code, it looks like this:
    // Take the link number from the query string.
    $s = isset($_GET['url']) ? substr($_GET['url'], 0, 15) : '';
    // Fallback target if the number is unknown or the database is unavailable.
    $u = "http://www.viva64.com/";
    $isConnect = mysql_connect($sqlserver, $sqluser, $sqlpassword);
    if ($isConnect) {
        $isSelectDatabase = mysql_select_db($database);
        if ($isSelectDatabase) {
            // Anything that is not a number becomes 0 and matches no record.
            $currentLink = intval($s);
            $sql = "SELECT * FROM links WHERE num = '" . $currentLink . "'";
            $link = mysql_query($sql);
            if ($link && mysql_num_rows($link)) {
                $row = mysql_fetch_array($link, MYSQL_ASSOC);
                $u = $row['link'];
            }
        }
    }
    header('Location: ' . $u);
    exit;
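
    The lookup can be separated from the HTTP part, which makes it easy to try out in isolation. Here is a sketch that assumes PDO and an in-memory SQLite table in place of the real database; `resolveLink` is a hypothetical name. Anything that is not a plain number, or is not in the table, falls back to the main page of the site, as in the script above:

```php
<?php
// Returns the stored address for link number $raw, or $fallback.
function resolveLink(PDO $pdo, string $raw, string $fallback): string
{
    // Accept only plain decimal numbers; everything else goes to the fallback.
    if (!ctype_digit($raw)) {
        return $fallback;
    }
    $stmt = $pdo->prepare('SELECT link FROM links WHERE num = ?');
    $stmt->execute([(int) $raw]);
    $link = $stmt->fetchColumn();
    return $link === false ? $fallback : $link;
}

// Demo on a throwaway in-memory table (the schema is an assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE links (num INTEGER PRIMARY KEY, link TEXT NOT NULL)');
$pdo->exec("INSERT INTO links (num, link) VALUES (341, 'http://msdn.microsoft.com/en-us/isv/bb190527.aspx')");

$home    = 'http://www.viva64.com/';
$known   = resolveLink($pdo, '341', $home);
$unknown = resolveLink($pdo, '999', $home);
$garbage = resolveLink($pdo, "1' OR '1'='1", $home);

// In the redirect script the result would be sent to the browser, for example:
//   header('Location: ' . resolveLink($pdo, $_GET['url'] ?? '', $home)); exit;
```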


    Finding and fixing broken links


    Broken links are found by the Fast Link Checker program. It crawls all pages of the site and tries to follow every link it finds. The results are then filtered, and an email with the list of broken links is sent to a preset address. The program is launched automatically, so the links are checked once a week.

    Once a broken link has been detected, the material it pointed to is searched for manually. Usually the new address of the material is easy to find. Sites such as Microsoft, Intel and AMD are very fond of simply moving material to another section.

    If neither this resource nor a nearly identical one can be found, which is extremely rare, the link is removed from the articles on our site. On external sites the link in our article will point nowhere, but nothing can be done about that. Once some material or site has disappeared, it has disappeared.

    When the new address is found, it is entered into the database, and the link works again in all our articles, wherever they are published.
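
    What such a checker does can be sketched in a few lines. This is not how Fast Link Checker works internally, only an illustration. The helpers `httpStatus` and `findBrokenLinks` are hypothetical, and treating every status of 400 and above (or no response at all) as broken is an assumption. The addresses in the demo are placeholders and the status function is a stub, so no network is needed to try the logic:

```php
<?php
// Returns the HTTP status for $url, or 0 if no response was received.
// Uses the cURL extension; redirects are followed to the final page.
function httpStatus(string $url): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request: headers are enough
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Returns the broken links from $urls as an array of url => status.
// $fetchStatus is injectable so the logic can be tried without a network.
function findBrokenLinks(array $urls, ?callable $fetchStatus = null): array
{
    $fetchStatus = $fetchStatus ?? 'httpStatus';
    $broken = [];
    foreach ($urls as $url) {
        $status = $fetchStatus($url);
        if ($status === 0 || $status >= 400) {
            $broken[$url] = $status;
        }
    }
    return $broken;
}

// Demo with a stub instead of real requests.
$stub = function (string $url): int {
    return $url === 'http://example.com/gone' ? 404 : 200;
};
$broken = findBrokenLinks(['http://example.com/ok', 'http://example.com/gone'], $stub);
```

    Some servers answer HEAD requests differently from GET requests, so a real checker would probably repeat a failed check with GET before reporting the link.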

    When a link is changed through the administrator's interface, a query of the following form is executed:
    UPDATE `links` SET
     `link` = 'http://msdn.microsoft.com/en-us/isv/bb190527.aspx'
    WHERE
     `links`.`num` = 341 LIMIT 1;
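
    Written with parameters, the same update looks roughly like this. It is a sketch with PDO and an in-memory SQLite table standing in for our database; `updateLink` is a hypothetical name, and the old address in the demo row is invented:

```php
<?php
// Points link number $num at a new address; returns true if a record changed.
function updateLink(PDO $pdo, int $num, string $newUrl): bool
{
    $stmt = $pdo->prepare('UPDATE links SET link = ? WHERE num = ?');
    $stmt->execute([$newUrl, $num]);
    return $stmt->rowCount() === 1;
}

// Demo on a throwaway in-memory table (the schema is an assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE links (num INTEGER PRIMARY KEY, link TEXT NOT NULL)');
$pdo->exec("INSERT INTO links (num, link) VALUES (341, 'http://example.com/old-location')");

$changed = updateLink($pdo, 341, 'http://msdn.microsoft.com/en-us/isv/bb190527.aspx');
$current = $pdo->query('SELECT link FROM links WHERE num = 341')->fetchColumn();
```

    Since articles contain only the short link with the number, this single update is enough to repair the link everywhere it was published.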

    I have not described how the system works in great detail. To be honest, I am a user of the system, not its developer. But if readers are interested, my colleague Anton Dubrovin will describe everything in more detail and answer questions.

    Initiative for Intel


    I am not an Intel employee myself, but I know that many of the company's employees read this blog. That is why I am writing here: I want to propose an initiative. I know that Intel constantly runs various programs and summer schools where students do internships and work on interesting tasks. If any Habrahabr readers are not aware of them, here are a few links on the topic: 1, 2, 3, 4.

    I would like to propose, as one of those tasks, thinking through and implementing a system that keeps the existing links on the Intel website in a healthy state. Unfortunately, there are probably no fewer broken links on the Intel site than on the Microsoft site. You could start small, for example by supporting the Russian-language part of ISN (articles, forums, blogs). What I have described in this article is still just a hack that solves one narrow problem. The problem of broken links deserves more serious research and work.

    Thanks in advance to those who also want to improve the world a little.
