Implementing PubSubHubbub Subscription in an App Engine Java Application

    While working on the topic named in the title, I found that it is covered rather poorly on the Russian-language web (RuNet), even though a lot of time has passed since the protocol was introduced. I want to fill this gap a little by sharing my experience.
    Let me briefly remind you that PubSubHubbub (PuSH) is a protocol proposed by Google and designed to make delivering data from publishers to subscribers via RSS and Atom feeds more efficient. The central role in the protocol is given to independent hubs, which act as intermediaries between the data sources and their final recipients. The hub notifies every subscriber registered with it for a channel as soon as new data appears, and delivers the new portion of data together with the notification.
    Thus, if you are creating an application that processes feeds in RSS or Atom format, you can make your life noticeably easier by offloading the grunt work to the hub. Specific advantages of such a scheme:
    • the ability to "integrate" many external channels into a single data stream of a common format, coming to the application input: the hub can take care of this;
    • no need to separate new data from old: the hub will deliver only new data;
    • no need to constantly monitor the channel for new data: the hub itself will inform you when necessary;
    • minimum time from the moment of publication to the moment of notification of your application.

    In other words, you get prompt data delivery while significantly reducing both incoming traffic and the CPU time used by the application. For App Engine applications, which are limited by quotas, these points may be critical. In addition, you save your own time, since you have to write less code, and the code is simple.
    Below are the minimally necessary fragments of Java code that I have successfully tested on one of the hubs. There is very little code and it is simple.


    So, this is a subscriber application that will receive data from a hub. According to the protocol, the interaction between the subscriber and the hub goes as follows:
    1. a subscription request is sent to the hub with the channel address and the subscriber address;
    2. the hub checks the channel and sends the subscriber a request to confirm the subscription;
    3. the subscriber confirms the subscription;
    4. the hub notifies the subscriber and delivers new data to it as the data appears on the channel;
    5. after a certain time, the hub asks the subscriber to confirm the subscription again.

    This scenario means that our minimal application must implement a servlet capable of:
    1. confirming the subscription in response to the hub's request;
    2. accepting each delivery with a portion of new data.

    In addition, it may have a function that implements the actual procedure for requesting a subscription.

    Subscription Request


    Since the hubs that I tried let you request a subscription "manually" through the service's web interface, the application does not strictly have to implement this step itself.
    When requesting a subscription, you must give the hub the values of four required parameters:
    1. subscriber URL (hub.callback): the address of the application servlet through which the hub will interact with the application;
    2. type of request (hub.mode): the desired action, either subscribing or unsubscribing (subscribe / unsubscribe);
    3. subscribed channel URL (hub.topic): the address of the channel whose messages you want to receive;
    4. verification method (hub.verify): tells the hub whether it must request confirmation of the subscription immediately (synchronously) or may do so later (sync / async).

    In addition, the hub may support optional parameters, such as:
    • subscription time (hub.lease_seconds): the duration in seconds for which we want to receive channel messages;
    • secret string (hub.secret): passed if the messages received by the subscriber must be authenticated (the hub uses it to generate an HMAC code for the transmitted content and signs its messages with that code);
    • verification token (hub.verify_token): if specified, it is passed back as a parameter in the confirmation request so that the subscriber application can check that it is confirming a subscription it actually asked for.
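
    The required and optional parameters all travel in the same form-encoded body. A minimal sketch of assembling them, assuming a hypothetical helper class PshbRequestBody (it is not part of the code in this article); the lease time, secret and token values are placeholders:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original article's code.
class PshbRequestBody {

  // Joins name/value pairs into an application/x-www-form-urlencoded body.
  static String encode(Map<String, String> params) {
    StringBuilder body = new StringBuilder();
    try {
      for (Map.Entry<String, String> p : params.entrySet()) {
        if (body.length() > 0) {
          body.append('&');
        }
        body.append(URLEncoder.encode(p.getKey(), "UTF-8"))
            .append('=')
            .append(URLEncoder.encode(p.getValue(), "UTF-8"));
      }
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is guaranteed to be available on every JVM.
      throw new IllegalStateException(e);
    }
    return body.toString();
  }

  public static void main(String[] args) {
    Map<String, String> params = new LinkedHashMap<String, String>();
    // Required parameters.
    params.put("hub.callback", "http://myapp.appspot.com/subscribe");
    params.put("hub.mode", "subscribe");
    params.put("hub.topic", "http://habrahabr.ru/rss/blogs/java");
    params.put("hub.verify", "sync");
    // Optional parameters (placeholder values).
    params.put("hub.lease_seconds", "86400");
    params.put("hub.secret", "my-secret");
    params.put("hub.verify_token", "my-token");
    System.out.println(encode(params));
  }
}
```

    A LinkedHashMap keeps the parameters in insertion order, which only makes the body easier to read in logs; the order should not matter to the hub.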

    If you are satisfied with the “manual” subscription mode, then you can proceed to the next section.
    However, the application may need to subscribe on its own. Here is an example function that implements this procedure:

    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.util.Base64; // Java 8+; used here instead of the repackaged App Engine Base64 class

    // ..

    public static void pshbSubscribe(String callback, String mode, String topic, String verify) throws IOException {

      // Build the form-encoded body from the four required parameters.
      callback = URLEncoder.encode("hub.callback", "UTF-8") + "=" + URLEncoder.encode(callback, "UTF-8");
      mode = URLEncoder.encode("hub.mode", "UTF-8") + "=" + URLEncoder.encode(mode, "UTF-8");
      topic = URLEncoder.encode("hub.topic", "UTF-8") + "=" + URLEncoder.encode(topic, "UTF-8");
      verify = URLEncoder.encode("hub.verify", "UTF-8") + "=" + URLEncoder.encode(verify, "UTF-8");
      String body = callback + "&" + mode + "&" + topic + "&" + verify;

      // Placeholder hub endpoint; the URL must include the scheme.
      URL url = new URL("http://myhub.com/hubbub");
      HttpURLConnection connection = (HttpURLConnection) url.openConnection();
      connection.setDoOutput(true);
      connection.setRequestMethod("POST");
      connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

      // HTTP Basic Authentication: "Basic", a space, then base64("user:password").
      connection.setRequestProperty("Authorization",
        "Basic " + Base64.getEncoder().encodeToString("myname:mypwd".getBytes("UTF-8")));

      OutputStreamWriter writer = new OutputStreamWriter(connection.getOutputStream(), "UTF-8");
      writer.write(body);
      writer.close();

      // Anything other than 204 (verified) or 202 (accepted for async verification) is a failure.
      int status = connection.getResponseCode();
      if (status != HttpURLConnection.HTTP_NO_CONTENT && status != HttpURLConnection.HTTP_ACCEPTED) {

        // ..
      }
    }


    According to the protocol, a subscription request is a POST request to the address provided by the hub ("myhub.com/hubbub" in the example) in the standard format used to submit form values (the "Content-Type" is "application/x-www-form-urlencoded"). The body of the message carries the parameters described above.
    The hub on which I tested the code requires prior registration and an authenticated subscription request (HTTP Basic Authentication). Hence the "Authorization" header with the username and password ("myname:mypwd") of the hub user. As I understand it, this is a feature of that particular hub.
    If the subscription succeeds, the hub must return 204 ("No Content"), or 202 ("Accepted") in the case of asynchronous verification (if hub.verify was set to "async").
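
    The two success codes mean different things to the caller: 204 says the subscription is already verified, while 202 only says the request was accepted and verification will follow. A small sketch of telling them apart, assuming a hypothetical PshbResponse class and Outcome enum introduced here for illustration:

```java
import java.net.HttpURLConnection;

// Hypothetical helper for illustration; the names are not from the original article.
class PshbResponse {

  enum Outcome { VERIFIED, PENDING, FAILED }

  // Maps the hub's HTTP status for a subscription request to an outcome.
  static Outcome classify(int status) {
    switch (status) {
      case HttpURLConnection.HTTP_NO_CONTENT: // 204: subscription verified
        return Outcome.VERIFIED;
      case HttpURLConnection.HTTP_ACCEPTED:   // 202: verification will happen later
        return Outcome.PENDING;
      default:                                // anything else: request not accepted
        return Outcome.FAILED;
    }
  }

  public static void main(String[] args) {
    System.out.println(classify(204));
    System.out.println(classify(202));
    System.out.println(classify(401));
  }
}
```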
    A call that requests a subscription might look like this:

    pshbSubscribe("http://myapp.appspot.com/subscribe", "subscribe", "http://habrahabr.ru/rss/blogs/java", "sync");

    The first parameter is the address of the application servlet. Next, let's look at how this servlet works.

    Subscription Confirmation


    After receiving a subscription request, the hub must request confirmation by sending a GET request to the callback address it was given. In our example, this is "myapp.appspot.com/subscribe". At this address, the application must provide a servlet that responds to the request:

    import java.io.IOException;
    import javax.servlet.http.*;
    // ..

    @SuppressWarnings("serial")
    public class SubscribeServlet extends HttpServlet {
    // ..

    public void doGet(HttpServletRequest req, HttpServletResponse resp)
          throws IOException {

      resp.setContentType("text/plain");
      resp.setStatus(200);

      // A verification request from the hub always carries hub.mode.
      if (req.getParameter("hub.mode") != null)
      {
        // Confirm by echoing hub.challenge back in the response body.
        resp.getOutputStream().print(req.getParameter("hub.challenge"));
        resp.getOutputStream().flush();
      }
    }
    // ..


    In the request, the hub passes several parameters whose meaning is the same as in the subscription request:
    • hub.mode: request type (subscribe / unsubscribe);
    • hub.topic: URL of the subscribed channel;
    • hub.verify_token: verification token (present if it was passed in the subscription request).

    If the parameter values are acceptable (they match what was requested), then to confirm the subscription (or unsubscription) the subscriber must return a 2xx code and put the value of one more parameter, hub.challenge, in the response body.
    If we do not want to confirm the request, we return 404 ("Not Found").
    If the subscriber returns any other code (3xx, 4xx, 5xx), the hub will decide that something is wrong on our side and treat the verification as failed.
    If the response body differs from the value of hub.challenge, the hub will also treat the verification as failed.
    If the asynchronous method is used, then on failure (a 3xx, 4xx or 5xx code, or a response body that does not match hub.challenge) the hub should try to request confirmation again.
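
    These rules can be collected into one decision: echo the challenge if the request matches what we asked for, otherwise refuse. A minimal sketch, assuming a hypothetical PshbVerification class and assuming the application remembers the topic and token it used when subscribing:

```java
// Hypothetical helper for illustration; not part of the original article's code.
class PshbVerification {

  // Returns the body to send with a 2xx response when the request should be
  // confirmed, or null when the servlet should answer 404 instead.
  static String challengeResponse(String mode, String topic, String verifyToken,
                                  String challenge,
                                  String expectedTopic, String expectedToken) {
    boolean knownMode = "subscribe".equals(mode) || "unsubscribe".equals(mode);
    boolean topicMatches = expectedTopic != null && expectedTopic.equals(topic);
    // If no token was sent with the subscription request, none is expected back.
    boolean tokenMatches = expectedToken == null || expectedToken.equals(verifyToken);

    if (knownMode && topicMatches && tokenMatches && challenge != null) {
      return challenge; // must be echoed back unchanged
    }
    return null;
  }

  public static void main(String[] args) {
    String body = challengeResponse("subscribe", "http://habrahabr.ru/rss/blogs/java",
        "my-token", "abc123", "http://habrahabr.ru/rss/blogs/java", "my-token");
    System.out.println(body);
  }
}
```

    In doGet, a non-null result would be written to the response with status 200, and a null result would translate into resp.setStatus(404).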

    Receiving Data from the Hub


    When the hub discovers that it has new data for the subscriber, it sends a POST request to the callback address the subscriber provided. The request body carries the data in RSS or Atom format (the "Content-Type" will be "application/rss+xml" or "application/atom+xml"). To process the request, our servlet has the following method:

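    // Rome classes; the com.sun.syndication package names assume a pre-1.5 Rome release.
    import com.sun.syndication.feed.synd.SyndContent;
    import com.sun.syndication.feed.synd.SyndEntry;
    import com.sun.syndication.feed.synd.SyndFeed;
    import com.sun.syndication.io.FeedException;
    import com.sun.syndication.io.SyndFeedInput;
    import com.sun.syndication.io.XmlReader;
    import java.net.URL;
    import java.util.List;
    // ..

    public void doPost(HttpServletRequest req, HttpServletResponse resp)
           throws IOException {

      SyndFeedInput input = new SyndFeedInput();
      SyndFeed feed;
      try {
        feed = input.build(new XmlReader(req.getInputStream()));
      } catch (FeedException e) {
        // Malformed feed: rethrow so the container answers with an error status.
        throw new IOException(e);
      }

      @SuppressWarnings("unchecked")
      List<SyndEntry> entriesList = feed.getEntries();

      for (SyndEntry entry : entriesList)
      {
        String title = entry.getTitle();
        String author = entry.getAuthor();
        URL url = new URL(entry.getLink());

        @SuppressWarnings("unchecked")
        List<SyndContent> contentsList = entry.getContents();
        // ..

      }
      // ..

      // Tell the hub the delivery was received.
      resp.setStatus(204);
    }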


    In this example, we parse the data with the classes of the Rome library, which is intended for working with feeds (SyndFeedInput, SyndFeed, SyndEntry, ...). An example of similar code used to solve a specific problem (sending data received from the hub through XMPP) can be found here.
    If the hub.secret parameter was set during the subscription, the request will arrive with an "X-Hub-Signature" header whose value has the form "sha1=signature", where "signature" is the HMAC code (SHA1 signature) generated for the content of the request body. To verify the authenticity of the message, the application must itself calculate the HMAC code for the request body using the hub.secret known to it. If the result matches "signature", the message is genuine. More details here.
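
    The check described above can be sketched with the standard javax.crypto classes. The PshbSignature class and its method names are assumptions made for this example, not part of the original servlet:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Locale;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration; not part of the original article's code.
class PshbSignature {

  // Computes the HMAC-SHA1 of the raw request body as lowercase hex.
  static String hmacSha1Hex(byte[] body, String secret) {
    try {
      Mac mac = Mac.getInstance("HmacSHA1");
      mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
      byte[] digest = mac.doFinal(body);
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("HmacSHA1 is not available", e);
    }
  }

  // Checks an X-Hub-Signature header value of the form "sha1=<hex digest>".
  static boolean isValid(String header, byte[] body, String secret) {
    if (header == null || !header.startsWith("sha1=")) {
      return false;
    }
    byte[] expected = hmacSha1Hex(body, secret).getBytes(StandardCharsets.US_ASCII);
    byte[] actual = header.substring("sha1=".length())
        .toLowerCase(Locale.ROOT)
        .getBytes(StandardCharsets.US_ASCII);
    // MessageDigest.isEqual is used instead of String.equals for the comparison.
    return MessageDigest.isEqual(expected, actual);
  }

  public static void main(String[] args) {
    byte[] body = "<feed/>".getBytes(StandardCharsets.UTF_8);
    String header = "sha1=" + hmacSha1Hex(body, "my-secret");
    System.out.println(isValid(header, body, "my-secret"));
  }
}
```

    In doPost, the raw request body would have to be read into a byte array first and then used both for this check and for feed parsing, because the servlet input stream can only be consumed once.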
    If the message is received successfully, you need to return a 2xx code, regardless of the result of the "X-Hub-Signature" check. If any other code is returned, the hub should retry the request within a reasonable time until it receives a success code.
