How to clone the anyroom.io service in a few lines of JS, without a backend



    A month ago a post about the AnyRoom service made it to the top of Hacker News: a simple Go backend that lets you create telephone conferences. More than a hundred points, a discussion in the comments, source code on GitHub, subscriptions at $50 per month: in short, everything like the grown-ups do. After the initial surprise (“Does anyone really need this?!”) I googled around for “cheap voice conference without registration and SMS”, marveled at how expensive the existing applications are, and realized that yes, there is a need. On Voximplant, such a thing can be assembled in half an hour with a dozen lines of JavaScript. Want to build a startup and promote it on Hacker News? Read on and I'll show you how.

    How does AnyRoom work?


    It runs on Go and Twilio. From the user's point of view it looks like this: the conference organizer sends the participants a phone number (one number for the entire service) and the few digits of the conference number. Every participant calls in, enters the conference number, and can then talk to the others. Like Skype, except it just works. Under the hood, the Go backend talks to Twilio over HTTP, returning XML documents that say what to do with each call.
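    To make the "XML that says what to do with a call" idea concrete, here is a rough sketch of the two responses such a backend might return. It is written in JavaScript only for consistency with the rest of this post; AnyRoom's real backend is in Go and its actual responses are not shown here. The Gather, Say, Dial and Conference elements are standard Twilio markup, while the function names, the action URL and the "room-" prefix are assumptions made up for the illustration.

```javascript
// Illustrative only: not AnyRoom's actual code.

// Escape text for safe use in XML attribute values and element bodies.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// First response: greet the caller and collect digits until "#" is pressed.
// actionUrl is a hypothetical endpoint that will receive the digits.
function promptForCode(actionUrl) {
    return [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<Response>",
        '  <Gather action="' + escapeXml(actionUrl) + '" finishOnKey="#">',
        "    <Say>Please enter your conference code, then press pound.</Say>",
        "  </Gather>",
        "</Response>"
    ].join("\n");
}

// Second response: put the caller into the room named after the digits.
// The "room-" prefix is an arbitrary choice for this sketch.
function joinConference(digits) {
    return [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<Response>",
        "  <Dial>",
        "    <Conference>" + escapeXml("room-" + digits) + "</Conference>",
        "  </Dial>",
        "</Response>"
    ].join("\n");
}
```

    Every step of the call costs one HTTP round trip to the backend, which is exactly the part the approach described next avoids.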

    How to replicate it on Voximplant?


    Our cloud has a distinctive feature: call-handling logic is defined by JavaScript code that runs alongside the call. There are no round-trip delays to a backend, and for most scenarios no backend is needed at all. The JavaScript can be written in our Web IDE and debugged in the Web Debugger: you call the number, or the cloud places a call wherever you tell it to, and then you step through the JavaScript right in the browser and watch what happens to the calls. Scenarios can also be uploaded via the HTTP API, for example from GitLab as part of Continuous Integration.

    To replicate AnyRoom in its minimal form, it is enough to attach a JavaScript scenario to the phone number that does the following:

    • Answers the call. (A telephony detail: until the call is answered it is not billed, and you can, for example, play a ringback tone, but no voice stream from the caller is available.)
    • Synthesizes a greeting and asks the caller to enter a conference code.
    • Reads the digits the user presses on the keypad (this is called DTMF, or "tone dialing").
    • When a special key is pressed, for example the pound key (#), sends the user to a conference whose name consists of the entered digits.
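    The digit-collecting part of these steps can be sketched independently of any telephony platform. The snippet below is a minimal illustration, not the scenario from this post: it assumes the platform reports key presses one tone at a time, and the function name, the option names and the "conf_" prefix default are choices made for the example.

```javascript
// Collects DTMF tones for one call until the terminator key arrives.
// All names here are hypothetical; wire onTone to whatever tone event
// your platform provides.
function createCodeCollector(onComplete, options) {
    var terminator = (options && options.terminator) || "#";
    var prefix = (options && options.prefix) || "conf_";
    var digits = "";
    var done = false;

    // Feed one tone ("0".."9", "*", "#") at a time.
    return function onTone(tone) {
        if (done) {
            return; // ignore key presses after the code was submitted
        }
        if (tone === terminator) {
            if (digits.length === 0) {
                return; // nothing entered yet, keep waiting
            }
            done = true;
            onComplete(prefix + digits);
        } else if (/^[0-9]$/.test(tone)) {
            digits += tone;
        }
        // Any other tone (for example "*") is ignored in this sketch.
    };
}

// Usage: the caller presses 1, 1, 1 and then the pound key.
var handleTone = createCodeCollector(function (conferenceName) {
    console.log(conferenceName); // prints "conf_111"
});
["1", "1", "1", "#"].forEach(function (tone) {
    handleTone(tone);
});
```

    Keeping the collector as a closure per call means two simultaneous callers never share the digits they typed.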

    This implementation is an MVP: it is already usable, but leaves plenty of room for improvement. What happens if two users want different conferences but pick the same number, say “111”? A web interface with a conference number generator would not hurt. Still, the MVP scenario can be attached either to a number rented on Voximplant (send me a private message and I’ll top up your balance so you can experiment) or to a debugging number in Gotham City, which costs one cent per month to rent and lets you test incoming calls without paying for a real number.
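    As for the “111” collision, one simple fix is to hand out conference numbers instead of letting users invent them. Here is a minimal sketch of such a generator. It is not part of the MVP, and every name in it is hypothetical. It also uses Math.random by default, so the codes are guessable; treat them as room names, not as passwords.

```javascript
// Generates a numeric conference code that is not already in use and
// records it in activeCodes (a Set of codes currently taken).
// length defaults to 6 digits; random can be swapped in for testing.
function generateConferenceCode(activeCodes, length, random) {
    var rnd = random || Math.random;
    var size = length || 6;
    var maxAttempts = 1000; // give up instead of looping forever

    for (var attempt = 0; attempt < maxAttempts; attempt++) {
        var code = "";
        for (var i = 0; i < size; i++) {
            code += Math.floor(rnd() * 10); // append one decimal digit
        }
        if (!activeCodes.has(code)) {
            activeCodes.add(code);
            return code;
        }
    }
    throw new Error("could not find a free conference code");
}
```

    The generated code is what the organizer sends to the participants along with the phone number.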


    Note that the scenario forwards the call to a conference. The conference is handled by a separate JavaScript scenario, which must be added to the same application as the incoming-call scenario.

    The routing rule for the conference scenario must contain the regular expression 'conf_.*' so that it fires only for calls into a conference. Place this rule before the rule for incoming calls, which matches every call.
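    Why the order matters is easiest to see in a toy model of rule matching. The sketch below assumes that rules are checked from top to bottom, that the first rule whose pattern matches the whole destination wins, and that the catch-all pattern is '.*'. It is a model for illustration with made-up names, not the platform's routing code.

```javascript
// Each rule has a pattern and the name of the scenario it launches.
var rules = [
    { pattern: "conf_.*", scenario: "conference" }, // must come first
    { pattern: ".*", scenario: "incoming" }         // catch-all for everything else
];

// Return the scenario of the first rule whose pattern matches the whole
// destination, or null when no rule matches.
function route(ruleList, destination) {
    for (var i = 0; i < ruleList.length; i++) {
        var re = new RegExp("^(?:" + ruleList[i].pattern + ")$");
        if (re.test(destination)) {
            return ruleList[i].scenario;
        }
    }
    return null;
}
```

    With the rules swapped, a call to "conf_111" would be caught by the catch-all rule and never reach the conference scenario.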



    The conference scenario itself is primitive in this example. Conference scenarios are the only scenarios that can receive more than one incoming call, so a single conference mixer is created and all calls mix their voice streams through it. A conference mixer differs from an ordinary mixer in that each participant receives every voice stream except their own.
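    The "everything except your own voice" behaviour is sometimes called mix-minus, and the arithmetic is simple enough to show on plain arrays. The sketch below only illustrates the idea. It assumes every participant supplies a frame of the same length, uses a made-up function name, and skips the clipping and normalization a real mixer needs.

```javascript
// inputs maps a participant id to an array of audio samples for one frame.
// Returns, for each participant, the sum of everyone else's samples.
function mixMinus(inputs) {
    var ids = Object.keys(inputs);
    var frameLength = ids.length ? inputs[ids[0]].length : 0;

    // Sum all streams once...
    var total = new Array(frameLength).fill(0);
    ids.forEach(function (id) {
        for (var i = 0; i < frameLength; i++) {
            total[i] += inputs[id][i];
        }
    });

    // ...then subtract each participant's own contribution.
    var outputs = {};
    ids.forEach(function (id) {
        outputs[id] = total.map(function (sum, i) {
            return sum - inputs[id][i];
        });
    });
    return outputs;
}
```

    Summing once and subtracting per participant keeps the work linear in the number of callers, instead of building a separate mix for each of them.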


    What can be done better?


    As I said, this is a very simple clone of a popular service. There is plenty to add: a web interface for generating the conference ID, calling from a website (we have a Web SDK), a mobile application (we have Android and iOS SDKs), music in the conference (via the Player object), synthesized greetings, announcements when users join and leave, payment integration (via the HTTP API), and much more.

    The point I wanted to make is that there are many unmet needs around telephony and video communication, and all of this can be assembled quickly and very cheaply on cloud communication platforms. You can test ideas quickly and make changes and pivots just as quickly. And, of course, use JavaScript for it, because any web developer knows JavaScript.
