Google Cache Browser: browsing the cache without the pain

    Sometimes you need to browse a site that has suddenly gone down or been shut down for good, and Google has long helped here with its search cache. The one problem is that "walking" through a site this way is painful: you look at the page, copy the address of the link you want to follow, paste it into the search bar and add the "cache:" prefix. That is too many steps for what should be a single click on a link. For the impatient, here is a link to the solution: GCB 2.0.

    Google Cache Browser 1.0 and its problems


    Several years ago I already tried to solve this problem and even built a small service, Google Cache Browser, that worked as a proxy: it downloaded the cached page, rewrote all the links in it so that they led back to the service itself, and served the result to the user's browser. It had several significant drawbacks, however:
    1. It used a fair amount of traffic.
    2. It regularly got banned by Google.
    3. To a lesser extent, but still noticeably, it loaded the CPU (applying regular expressions to large pages is a thankless task).
    4. Despite all my tricks, it did not rewrite every link. On some sites the links were written in ways that leave you amazed; valid HTML was out of the question there.
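
    To illustrate the fourth point, here is a minimal sketch of regex-based link rewriting. It is not the code of the original service: the proxy address `PROXY_PREFIX` and the function name `rewriteLinksWithRegex` are made up for the example. It only shows how easily a pattern that looks reasonable misses links written in less tidy HTML.

```javascript
// Hypothetical proxy address; the original service's URL scheme is not given in the post.
const PROXY_PREFIX = "http://gcb.example/?url=";

// Naive rewrite: only matches double-quoted, absolute href values
// with no whitespace around the "=" sign.
function rewriteLinksWithRegex(html) {
  return html.replace(/href="(https?:\/\/[^"]+)"/gi, (match, url) =>
    `href="${PROXY_PREFIX}${encodeURIComponent(url)}"`
  );
}

const sample = [
  '<a href="http://example.com/a">double quotes</a>',
  "<a href='http://example.com/b'>single quotes</a>",
  "<a href=http://example.com/c>no quotes</a>",
  '<a HREF = "http://example.com/d">spaces around =</a>',
].join("\n");

// Only the first of the four links ends up pointing at the proxy.
const rewritten = rewriteLinksWithRegex(sample);
console.log(rewritten);
```

    Each missed case can be patched with a more elaborate pattern, but every patch makes the expression slower and harder to trust, which is roughly where the first version ended up.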

    As a result, the service gradually died out, and one day I simply did not renew the domain.

    Google Cache Browser 2.0 and JS Fu


    After that my thoughts kept returning to the problems that had sunk the service, and they mostly revolved around the idea that it would be nice to move all this processing to the client side: a browser is far better at manipulating the contents of a web page than regular expressions are. And just recently I found a way to do it!

    The main obstacle was that my JavaScript had to run in the context of the webcache.googleusercontent.com domain. About a week ago I noticed that cached pages still load and execute their scripts, and that these are not cached copies but the current versions from the original site. From there, all that remained was to get a suitable page that includes my JS into Google's cache and start working in the context of Google's domain.
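
    Once a script is running inside the cached page, rewriting links becomes a matter of walking the DOM instead of parsing HTML by hand. The sketch below is an illustration rather than the actual GCB 2.0 source: the names `toCacheUrl` and `rewriteLinks` are hypothetical, and the cache address format in `CACHE_ENDPOINT` is an assumption that the post does not spell out.

```javascript
// Assumed format of a cache address; the post only names the domain.
const CACHE_ENDPOINT = "http://webcache.googleusercontent.com/search?q=cache:";

// Build the cache address for an absolute page URL.
function toCacheUrl(pageUrl) {
  return CACHE_ENDPOINT + encodeURIComponent(pageUrl);
}

// Point every http(s) link in `doc` at the cached copy of its target.
// `originalUrl` is the address of the page the cached copy was taken from;
// relative hrefs are resolved against it. Returns the number of links changed.
function rewriteLinks(doc, originalUrl) {
  let count = 0;
  for (const a of doc.querySelectorAll("a[href]")) {
    const href = a.getAttribute("href");
    if (href.startsWith("#")) continue; // in-page anchors keep working as they are
    let target;
    try {
      target = new URL(href, originalUrl);
    } catch (e) {
      continue; // leave hrefs that the URL parser rejects untouched
    }
    // Skip mailto:, javascript: and other non-web schemes.
    if (target.protocol !== "http:" && target.protocol !== "https:") continue;
    a.setAttribute("href", toCacheUrl(target.href));
    count++;
  }
  return count;
}
```

    In a browser this would be called as `rewriteLinks(document, "http://example.com/some/page")`. The browser has already parsed the markup, so unquoted attributes and odd spacing stop being a problem, and the work happens on the visitor's machine rather than on a server.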

    All this coincided rather nicely with SOPA and the temporary blackout of good sites like Wikipedia, so last night I sat down and finished the service: it now runs entirely in the browser (not a single server-side script) and works in the latest versions of Firefox, Chrome and Opera, and in IE8. I did not have time to test other browsers, so send bug reports! :-)

    And yes, one last treat: I have published all of the service's source code on GitHub under the GPLv3. Feel free to fork!

    Summary


    Humanity gets to read Wikipedia today, and I got a lot of pleasure from exercising my half-forgotten JS-fu, because at work I spend most of my time on the server side.

    Todo


    As usual, there are still plenty of improvements that would round the service out nicely. Here are the most interesting ones:

    • Make a bookmarklet for the service. The bookmarklet could both jump to the cached version of the current page and add the service's functionality to a cached page that is already open.
    • Fix some sporadic layout glitches.
    • Guard against the entry-point page dropping out of the search index.
    • Do more thorough cross-browser testing.
    • Possibly move the service to a separate, more respectable domain.
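
    The two jobs of the bookmarklet from the first item could be told apart by looking at the current address. This is only a sketch of that decision, not existing code: `bookmarkletAction` is a hypothetical name, and the cache address format is again an assumption.

```javascript
const CACHE_HOST = "webcache.googleusercontent.com";
// Assumed format of a cache address; the post only names the domain.
const CACHE_ENDPOINT = "http://" + CACHE_HOST + "/search?q=cache:";

// Decide what the bookmarklet should do for the page at `href`:
// enhance a cached page that is already open, or jump to the cached copy.
function bookmarkletAction(href) {
  const current = new URL(href);
  if (current.hostname === CACHE_HOST) {
    return { type: "enhance" };
  }
  return { type: "redirect", url: CACHE_ENDPOINT + encodeURIComponent(href) };
}

// Possible wiring in the browser (not executed here):
//   const action = bookmarkletAction(location.href);
//   if (action.type === "redirect") location.href = action.url;
//   else enhanceCachedPage(); // hypothetical function that rewrites the links
```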

    By the way, you are welcome to lend a hand with any of this: the source is on GitHub ;-)
