How to get data on 5 million companies through the LinkedIn REST API, or why an OAuth request token should be usable only once

    Introduction


    Using the LinkedIn Company Lookup API, you can get information about a company registered on LinkedIn. You can search by keyword or by ID. For example, the GET request
    http://api.linkedin.com/v1/companies::(1337)
    returns information about LinkedIn itself. Conveniently, several IDs can be specified at once, separated by commas. Keep in mind, however, that a single request cannot carry too many IDs, because the size of the request itself is limited.
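
    The batching described above can be sketched as follows. This is only an illustration: the helper names and the 2000-character cap are assumptions, since the article does not state the actual request size limit.

```python
from typing import Iterable, Iterator, List

BASE_URL = "http://api.linkedin.com/v1/companies::"
# Assumed cap on URL length; the real limit is not given in the article.
MAX_URL_LENGTH = 2000


def build_company_url(company_ids: Iterable[int]) -> str:
    """Return a lookup URL for one or more IDs, e.g. companies::(1337,1441)."""
    return f"{BASE_URL}({','.join(str(i) for i in company_ids)})"


def batch_company_urls(
    company_ids: Iterable[int], max_length: int = MAX_URL_LENGTH
) -> Iterator[str]:
    """Yield URLs, each packing as many IDs as fit within max_length characters."""
    batch: List[int] = []
    for company_id in company_ids:
        # Flush the current batch if adding one more ID would exceed the cap.
        if batch and len(build_company_url(batch + [company_id])) > max_length:
            yield build_company_url(batch)
            batch = []
        batch.append(company_id)
    if batch:
        yield build_company_url(batch)


if __name__ == "__main__":
    for url in batch_company_urls(range(1, 1001)):
        print(len(url), url[:60] + "...")
```

    Checking the length of the finished URL, rather than counting IDs, keeps the batches valid even when IDs have different numbers of digits.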

    There are also limits on the number of requests per day, set both per application and per individual user. For company lookups, the limits are as follows:



    Thus, as the figure above shows, LinkedIn does not allow a single user to retrieve information about more than 500 companies per day. Note that the limit counts the companies requested, not the requests (as mentioned above, one request can ask for several companies).

    How it should work


    LinkedIn uses OAuth to authorize requests, and both versions are currently supported: OAuth 2.0 as well as the older OAuth 1.0a. With version 1.0a, the mechanism can be briefly described as follows:

    1. We make an API request to get a link for authorizing the application; a request_token is created at the same time.
    2. The link must be opened in a browser, and access for the application must be granted manually.
    3. The previously generated request_token is exchanged for an access_token (by calling the appropriate API method).

    These steps need to be performed only once; the access_token should then be stored in a safe place and used for all subsequent API calls. Working this way, a single user cannot retrieve more than 500 companies per day.

    How it really worked


    In practice, instead of obtaining the access_token, I tried calling the API methods with the request_token generated in step 1 (step 2 is required in any case, otherwise the request_token is unusable). It turned out that the API methods worked with the request_token for the first 5 minutes, until its lifetime expired. Moreover, the per-user limit did not apply in this case, only the per-application limit of 100,000 companies per day, and that limit can be worked around by creating several applications.

    The obvious drawback of this approach is that the application has to be authorized again every 5 minutes. With enough automation and browser integration, however, this was not a big problem. As a result, after a couple of days of unhurried work by some simple code, a database of about 5 million companies stored on LinkedIn had been downloaded.
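
    To put the numbers above side by side, here is a rough back-of-the-envelope calculation. The 500 and 100,000 daily limits and the 5 million total come from the text; the two-day target is an assumption standing in for "a couple of days", and the calculation assumes every quota is used in full.

```python
import math

TOTAL_COMPANIES = 5_000_000      # approximate figure from the article
PER_USER_DAILY_LIMIT = 500       # companies per day for one user
PER_APP_DAILY_LIMIT = 100_000    # companies per day for one application
ASSUMED_TARGET_DAYS = 2          # assumption: "a couple of days" taken as 2


def days_needed(total: int, daily_limit: int) -> int:
    """Days required to fetch `total` companies at `daily_limit` per day."""
    return math.ceil(total / daily_limit)


def apps_needed(total: int, daily_limit: int, days: int) -> int:
    """Applications required to fetch `total` companies within `days` days."""
    return math.ceil(total / (daily_limit * days))


if __name__ == "__main__":
    print("one user, intended flow:", days_needed(TOTAL_COMPANIES, PER_USER_DAILY_LIMIT), "days")
    print("one application, request_token trick:", days_needed(TOTAL_COMPANIES, PER_APP_DAILY_LIMIT), "days")
    print("applications for the assumed target:", apps_needed(TOTAL_COMPANIES, PER_APP_DAILY_LIMIT, ASSUMED_TARGET_DAYS))
```

    Under these assumptions, the per-user limit alone would stretch the download over 10,000 days, which is why bypassing it mattered so much.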

    Recommendation


    API methods (other than the one that issues the access_token) should not be callable with a temporary request_token. This recommendation is spelled out in section 6 of the OAuth 1.0a specification.
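
    A minimal in-memory sketch of such a check on the provider side might look like this. It is a toy model, not LinkedIn's implementation: the ToyProvider class and its method names are hypothetical, and request signing is left out entirely. It only shows the token-type check and the one-time exchange.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Token:
    kind: str                 # "request" or "access"
    authorized: bool = False  # set once the user grants access


@dataclass
class ToyProvider:
    """Hypothetical provider that tracks the type of every issued token."""
    tokens: Dict[str, Token] = field(default_factory=dict)

    def issue_request_token(self) -> str:
        # Step 1: a temporary request token is created.
        key = secrets.token_hex(8)
        self.tokens[key] = Token(kind="request")
        return key

    def authorize(self, request_token: str) -> None:
        # Step 2: the user grants access in the browser.
        self.tokens[request_token].authorized = True

    def exchange_for_access_token(self, request_token: str) -> str:
        # Step 3: the request token is consumed and cannot be reused.
        token = self.tokens.get(request_token)
        if token is None or token.kind != "request" or not token.authorized:
            raise PermissionError("request token is missing or not authorized")
        del self.tokens[request_token]
        key = secrets.token_hex(8)
        self.tokens[key] = Token(kind="access", authorized=True)
        return key

    def call_api(self, token_key: str) -> str:
        # Only access tokens may reach API methods.
        token = self.tokens.get(token_key)
        if token is None or token.kind != "access":
            raise PermissionError("only access tokens may call API methods")
        return "ok"
```

    With this check in place, an authorized request_token passed to call_api raises PermissionError, so the only thing it can be used for is the exchange in step 3.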

    PS: LinkedIn has since fixed the issue in line with this recommendation.
