We use the undocumented site API captionbot.ai
- Tutorial
In this article we will discuss how to get and use the site API if there is no documentation on it or it is not yet officially open. The guide is written for beginners who have not yet tried to revert a simple API. For those who were engaged in similar things, there is nothing new here.
We will analyze using the API service https://www.captionbot.ai/ that Microsoft recently opened (thanks to them for this). Many could read about him in an article on Geektimes . The site uses ajax requests in JSON format, so copying them will be easy and pleasant. Go!
We analyze requests
First of all, we open the developer tools and analyze the requests that the site sends to the server.
In our case, all requests that interest us have a base URL https://www.captionbot.ai/api
Initialization
When you first open the site there is a GET request for /api/init
no parameters.
The answer is Content-Type: application/json
, while in the body of the answer we get just a line of the form:
"54cER5HILuE"
Remember this and move on.
URL Submission
We have two ways to upload an image: through the URL and through the file upload. For the test, we take the URL of the Lena image from the wiki and send it. In network activity, a POST request appears /api/message
with the following parameters:
{
"conversationId": "54cER5HILuE",
"waterMark": "",
"userMessage": "https://upload.wikimedia.org/wikipedia/ru/2/24/Lenna.png"
}
Yeah, we tell ourselves, it means the method init
returned us a string for conversationId
, and userMessage
our link hit. What is waterMark
still unclear. We look at the response data:
"{\"ConversationId\":null,\"WaterMark\":\"131071012038902294\",\"UserMessage\":\"I am not really confident, but I think it's a woman wearing a hat\\nand she seems . \",\"Status\":null}"
For some reason, they encoded JSON twice, but oh well. In human form, it looks like this:
{
"ConversationId": null,
"WaterMark": "131071012038902294",
"UserMessage": "I am not really confident, but I think it's a woman wearing a hat\\nand she seems .",
"Status": null
}
All parameters along the way changed the way of writing, but these are trifles of life. So, some value was returned to us, for some WaterMark
reason it’s empty ConversationId
, actually the signature for the photo in the field UserMessage
and some empty status.
Image upload
Further, without closing the tab, we try the same operation with loading photos from a local file. We see the POST request /api/upload
in the format multipart/form-data
with the field name file
:
-----------------------------50022246920687
Content-Disposition: form-data; name="file"; filename="Lenna.png"
Content-Type: image/png
In response, we get the URL string of our downloaded file, we can go through it and verify this:
"https://captionbot.blob.core.windows.net/images-container/2ogw3q4m.png"
Then we send a request we are already familiar with /api/message
:
{
"conversationId": "54cER5HILuE",
"waterMark": "131071012038902294",
"userMessage": "https://captionbot.blob.core.windows.net/images-container/2ogw3q4m.png"
}
That came in handy waterMark
from the previous answer, and the URL is the one that the method returned to us upload
. The response data is similar to the previous ones.
Writing a wrapper
To use this knowledge with convenience, we make a simple wrapper in your favorite programming language. I will do it in Python. For requests to the site I use requests, as it is convenient and it has sessions that store cookies for me. The site uses SSL, but by default requests will swear on the certificate:
hostname 'www.captionbot.ai' doesn't match either of '*.azurewebsites.net', '*.scm.azurewebsites.net', '*.azure-mobile.net', '*.scm.azure-mobile.net'
This is solved by setting the verify = False flag for each request.
import json
import mimetypes
import os
import requests
import logging
logger = logging.getLogger("captionbot")
class CaptionBot:
BASE_URL = "https://www.captionbot.ai/api/"
def __init__(self):
self.session = requests.Session()
url = self.BASE_URL + "init"
resp = self.session.get(url, verify=False)
logger.debug("init: {}".format(resp))
self.conversation_id = json.loads(resp.text)
self.watermark = ''
def _upload(self, filename):
url = self.BASE_URL + "upload"
mime = mimetypes.guess_type(filename)[0]
name = os.path.basename(filename)
files = {'file': (name, open(filename, 'rb'), mime)}
resp = self.session.post(url, files=files, verify=False)
logger.debug("upload: {}".format(resp))
return json.loads(resp.text)
def url_caption(self, image_url):
data = json.dumps({
"userMessage": image_url,
"conversationId": self.conversation_id,
"waterMark": self.watermark
})
headers = {
"Content-Type": "application/json"
}
url = self.BASE_URL + "message"
resp = self.session.post(url, data=data, headers=headers, verify=False)
logger.debug("url_caption: {}".format(resp))
if not resp.ok:
return None
res = json.loads(json.loads(resp.text))
self.watermark = res.get("WaterMark")
return res.get("UserMessage")
def file_caption(self, filename):
upload_filename = self._upload(filename)
return self.url_caption(upload_filename)
The source code is on Github , plus the finished package in pip .