I recently had the chance to work on a prototype for a video chat service. It was an excellent opportunity to become more closely acquainted with WebRTC concepts and to try them out in practice. As a rule, when people talk about WebRTC, they mean setting up audio and video connections, but this technology can be used for other interesting things as well. I decided to try to make a peer-to-peer game and to share my experience in creating it. Scroll down to see a video of the result and the details of how I did it.
The game engine, ImpactJS, costs money. I had bought it a couple of years ago but hadn’t done anything useful with it; now, finally, it would come in handy. I should say that the process of creating a game with this engine is very absorbing in itself and, for people like me who want to feel like serious ‘game makers’ quickly and inexpensively, it is just what you need. Having decided on which communication technology and game engine to use, you can move on to the implementation stage. As for me, I started with the game rooms.
How does a player get into the game and how can they invite their friends? Lots of online games use what are called game rooms or channels, so that players can play against one another. This requires a server which allows you to create the rooms in question and add or remove users. It is a pretty simple set-up: when the user launches the game (in our case, opens the game’s URL in the browser window), the following happens:
- A new player communicates to the server the name of the room in which they would like to play;
- The server responds by sending back a list of players in the room in question;
- The other players receive a message that a new participant has appeared.
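The three steps above could be sketched as a small in-memory room registry. This is a minimal sketch under stated assumptions, not the project's actual server: the `RoomRegistry` class, its method names and the event names in the wiring comment are all hypothetical.

```javascript
// Hypothetical in-memory registry of game rooms (not the article's actual server code).
class RoomRegistry {
  constructor() {
    this.rooms = new Map(); // room name -> Set of player ids
  }

  // Adds a player to a room and returns the players who were already there.
  // The caller sends this list to the newcomer and notifies each listed player.
  join(roomName, playerId) {
    if (!this.rooms.has(roomName)) {
      this.rooms.set(roomName, new Set());
    }
    const room = this.rooms.get(roomName);
    const others = [...room].filter((id) => id !== playerId);
    room.add(playerId);
    return others;
  }

  // Removes a player and returns the players who remain in the room.
  leave(roomName, playerId) {
    const room = this.rooms.get(roomName);
    if (!room) return [];
    room.delete(playerId);
    if (room.size === 0) this.rooms.delete(roomName);
    return [...room];
  }
}

// With socket.io the wiring might look roughly like this (event names are assumptions):
//   io.on("connection", (socket) => {
//     socket.on("join", (roomName) => {
//       const others = registry.join(roomName, socket.id);
//       socket.join(roomName);
//       socket.emit("players", others);                       // list for the newcomer
//       socket.to(roomName).emit("player_joined", socket.id); // notify the others
//     });
//   });
```

Keeping the room bookkeeping separate from the transport makes it easy to check on its own, and the same server can later relay signalling messages between the players in a room.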
All this is pretty simple to implement, for example using node.js + socket.io. You can see here how it turned out. After the player has joined the game room, they have to set up a peer-to-peer connection with each of the players present in the room. However, before we move on to implementing peer-to-peer data, I suggest we have a think about what, in principle, this data will be.
The format and content of the messages sent between players very much depend on what happens in the game. In our case, it is a simple 2D shooting game in which players run around and shoot one another. So, in the first instance, you need to know the position of the players on the map:
When you receive this message, you will know where a player is positioned, but you cannot know what they look like at the present time. So, for a full picture, you can add information on what animation the player has switched on at the present time, what frame it is in and which way they are looking:
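As a plain object, such a message might look like the sketch below. The field names (`x`, `y`, `anim`, `frame`, `flipped`) and the shape of the `player` object are assumptions for illustration, not the project's actual format.

```javascript
// Hypothetical shape of the per-player state message.
// `player` is assumed to expose a position, the current animation and its frame.
function makeStateMessage(player) {
  return {
    x: Math.round(player.pos.x),      // position on the map
    y: Math.round(player.pos.y),
    anim: player.animId,              // which animation is currently playing
    frame: player.animFrame,          // current frame of that animation
    flipped: Boolean(player.flipped), // true if the sprite is mirrored
  };
}
```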
Excellent! What other messages do we need? Depending on what you are planning to do in the game, you will have your own set of messages. Here, basically, is my set:
- Player dies ();
- Player is born ( int16 x, int16 y );
- Player shoots ( int16 x, int16 y, boolean flipped );
- Player selects weapon ( int8 weapon_id).
Standardised fields in messages
As you may have noticed, each field in these messages has its own data type, for example int16 for the fields which specify coordinates. Let’s look into this first of all, and along the way I will tell you a little bit about the WebRTC API. To transfer data between peers, an RTCDataChannel object is used, which, in turn, can send data as a USVString, Blob, ArrayBuffer or ArrayBufferView. To use an ArrayBufferView, you need to be clear about what format the data will be in.
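One way to give every field a fixed type is to write the message into an ArrayBuffer through a DataView. The sketch below does this for the "player shoots" message (int16 x, int16 y, boolean flipped). The byte layout, the leading message-type byte, the `MSG_SHOOT` id and the function names are assumptions, not the project's actual encoding.

```javascript
const MSG_SHOOT = 3;  // hypothetical message-type id
const SHOOT_SIZE = 6; // uint8 type + int16 x + int16 y + uint8 flipped

// Packs a "player shoots" message into a 6-byte ArrayBuffer.
function encodeShoot(x, y, flipped) {
  const view = new DataView(new ArrayBuffer(SHOOT_SIZE));
  view.setUint8(0, MSG_SHOOT);
  view.setInt16(1, x); // DataView writes big-endian unless told otherwise
  view.setInt16(3, y);
  view.setUint8(5, flipped ? 1 : 0);
  return view.buffer;
}

// Reads the same layout back into a plain object.
function decodeShoot(buffer) {
  const view = new DataView(buffer);
  if (view.getUint8(0) !== MSG_SHOOT) {
    throw new Error("not a shoot message");
  }
  return {
    x: view.getInt16(1),
    y: view.getInt16(3),
    flipped: view.getUint8(5) === 1,
  };
}
```

A DataView fixes the byte order explicitly, so both peers read the same values regardless of platform; the resulting buffer can be passed to the data channel's `send` method as it is.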
Right, having described all the messages, we are ready to move on to organising the actual interaction between peers. Here I will try to describe the technical side as briefly as I can. Discussing every aspect of WebRTC in detail would be a long and complicated process, particularly since Ilya Grigorik’s book is freely available online and is a real treasure trove of information on this and other aspects of network interaction. My aim, as I have already stated, is to describe in brief the basic workings of WebRTC, which is the starting point for everyone.
Setting up a connection
What do users A and B need to set up a peer-to-peer connection between themselves? Well, each of the users needs to know at least the address and port where their opponent is listening and is able to receive incoming data. But how can A and B communicate this information to one another if the connection has not yet been set up? To transfer this information, a server is required. In WebRTC jargon this is called a signalling server. And since a server has already been set up for the game rooms, this same server may also be used as a signalling server.
Also, besides addresses and ports, A and B must agree the parameters of the session to be set up (for example, in respect of the use of various codecs and their parameters in the case of audio and video connections). The format of the data describing all sorts of different connection characteristics is called SDP — Session Description Protocol. You can find out more about this at webrtchacks.com. Right, based on what we have said above, the procedure for data exchange via signalling is as follows:
- User A sends a request for connection to user B;
- User B confirms the request from A;
- Having received confirmation, user A identifies their IP, port, any session parameters and sends these to user B;
- User B responds by sending their address, port and session parameters to user A.
Once these operations have been completed, both users know each other’s address and parameters and can start exchanging data. However, before moving on to the implementation stage, it is worth finding out some more about identifying IP address + port pairings.
Address identification and verifying accessibility
When each of the users is available via a public IP address or if both are on a single subnet — everything is simple. If this is the case, they can each request their own IP from the operating system and send it via signalling to their opponent. But what do you do if the user is not available directly, but is behind a NAT, and they have two addresses: one local, on the subnet (192.168.1.1), and a second, namely the address of the NAT (22.214.171.124)? In this case, they have to somehow identify their public address and port.
The idea for solving this quandary is quite simple: you need a publicly available server which, on receiving a request from you, will respond by sending the public address and port we need.
These servers are called STUN (Session Traversal Utilities for NAT) servers. There are ready-to-use solutions, such as coTURN, which you can deploy as your own STUN server. But, even simpler, you can use servers which are already deployed and publicly accessible, such as those from Google.
In this way, each one may obtain their own address and send it to their opponent. However, this is not sufficient, since, after having received an address from an opponent, you still need to check whether they can be reached at the address in question.
Fortunately, the ICE (Interactive Connectivity Establishment) framework, which is integrated into the browser, assumes the task of interacting with STUN and verifying accessibility. All that we need to do is to process the events of this framework. Right, let’s move on to the implementation stage …
Setting up a connection
Initially, it might seem that the process of setting up a connection is quite complex. However, fortunately, the complexity is limited to the RTCPeerConnection interface and in practice everything is simpler than it might appear at first glance. You can view the full code of the class which sets up peer-to-peer connection here. I will now go on to explain it.
As I have already said, setting up, monitoring and closing down a connection, and also working with SDP and ICE candidates — all this is done via RTCPeerConnection. You can obtain more detailed information about the configuration here. However, in terms of configuration, we only need the address of the Google STUN server which I spoke about earlier.
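A minimal configuration with a single STUN server might look like the sketch below. The address is the commonly cited public Google STUN server, used here as an assumption, and `createPeerConnection` is a hypothetical helper; the constructor is passed in so the function can also be exercised outside a browser.

```javascript
// Assumed address of a public Google STUN server; any reachable STUN server would do.
const RTC_CONFIG = {
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
};

// Creates a connection with the configuration above.
// In a browser the default constructor is the global RTCPeerConnection.
function createPeerConnection(PeerConnection = globalThis.RTCPeerConnection) {
  return new PeerConnection(RTC_CONFIG);
}
```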
RTCPeerConnection offers a range of call-backs for various events in the life cycle of the connection – of which we need the following:
- icecandidate — for processing the candidate found;
- iceconnectionstatechange — for monitoring the state of the connection;
- datachannel — for processing the open data channel.
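Subscribing to these three events could be sketched as follows. The `attachConnectionHandlers` function and the `handlers` callback names are hypothetical; the event names and the `candidate`, `iceConnectionState` and `channel` properties are the standard ones.

```javascript
// Wires the three life-cycle events to caller-supplied callbacks.
function attachConnectionHandlers(pc, handlers) {
  pc.addEventListener("icecandidate", (event) => {
    // event.candidate is null once candidate gathering has finished.
    if (event.candidate) handlers.onCandidate(event.candidate);
  });

  pc.addEventListener("iceconnectionstatechange", () => {
    // Typical values include "checking", "connected", "disconnected" and "failed".
    handlers.onStateChange(pc.iceConnectionState);
  });

  pc.addEventListener("datachannel", (event) => {
    // Fired on the side that did not create the channel itself.
    handlers.onChannel(event.channel);
  });
}
```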
Sending a connection request
The first two points on the list of operations for a connection were sending a request for setting up a connection and confirmation of that request. Let’s simplify the process a bit, and let’s say that if the user knows the address of the game room, then someone gave them the link, and so the request for setting up a connection is not required, and you can move straight on to exchanging session data and addresses.
Identifying session parameters
To obtain session parameters, RTCPeerConnection has the createOffer method, which the initiating party uses to create an offer, and the createAnswer method, which the responding party uses to create an answer. These methods generate data in SDP format, which must be sent to the opponent via signalling. RTCPeerConnection stores both the local session description and the remote session description received via signalling from the opponent. These are set with the setLocalDescription and setRemoteDescription methods. Okay, let’s say that client A initiates a connection. The list of operations would be as follows:
1. Client A creates an SDP offer, sets a local session description in their RTCPeerConnection, after which they send it to client B:
2. Client B receives an offer from client A and sets a remote session description. After this they create an SDP answer, set it as a local session description and send it to client A:
3. After client A has received an SDP answer from client B, they also set it as a remote session description. As a result, each of the clients has set a local session description and a remote session description received from their opponent:
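The three steps above might be sketched like this. The `signalling` object with its `send(type, payload)` method is a hypothetical wrapper around the signalling server, and the function names are assumptions; the RTCPeerConnection methods are the standard promise-based ones.

```javascript
// Step 1 (client A): create an offer, store it locally, send it to client B.
async function sendOffer(pc, signalling) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signalling.send("offer", pc.localDescription);
}

// Step 2 (client B): store A's offer, create an answer, store it, send it to A.
async function handleOffer(pc, offer, signalling) {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  signalling.send("answer", pc.localDescription);
}

// Step 3 (client A): store B's answer as the remote description.
async function handleAnswer(pc, answer) {
  await pc.setRemoteDescription(answer);
}
```

In a data-only setup, the initiating side would normally create its data channel before calling `createOffer`, so that the channel is included in the offer.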
Collecting ICE candidates
Each time an ICE agent from client A finds a new IP+port pairing which can be used for a connection, RTCPeerConnection triggers an icecandidate event. The candidate’s data looks like this:
This is what we can glean from this data:
- udp: if the ICE agent opts to use this candidate for a connection, then udp transport will be used for the connection;
- typ srflx — this is a candidate obtained by requesting the STUN server to identify the NAT address;
- 126.96.36.199 60478 — NAT address and port which will be used for the connection;
- raddr 192.168.1.157 rport 60478 — address and port inside NAT.
You can read up in more detail about the ICE candidates’ description protocol here.
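To make the fields listed above concrete, here is a small parser for a candidate line. It is a sketch that assumes the usual field order (foundation, component, transport, priority, address, port, `typ`, type, then optional key/value pairs). The sample line reuses the addresses and ports from the list above, while its foundation and priority values are placeholders.

```javascript
// Parses the leading fields of an ICE candidate line plus optional raddr/rport.
function parseCandidate(line) {
  const parts = line.replace(/^candidate:/, "").trim().split(/\s+/);
  const [foundation, component, transport, priority, address, port, typKeyword, type, ...rest] = parts;
  if (typKeyword !== "typ") {
    throw new Error("unexpected candidate format");
  }
  // The remaining tokens come in key/value pairs, e.g. "raddr 192.168.1.157".
  const extras = {};
  for (let i = 0; i + 1 < rest.length; i += 2) {
    extras[rest[i]] = rest[i + 1];
  }
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type,
    relatedAddress: extras.raddr ?? null,
    relatedPort: extras.rport !== undefined ? Number(extras.rport) : null,
  };
}

// Placeholder foundation ("1") and priority ("1"); addresses and ports as in the text.
const sampleCandidate =
  "candidate:1 1 udp 1 126.96.36.199 60478 typ srflx raddr 192.168.1.157 rport 60478";
```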
This data needs to be transferred via signalling to client B, so that they can add them to their RTCPeerConnection. Client B does exactly the same thing when they discover their own IP+port pairings:
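The exchange of candidates might be sketched as follows. As before, `signalling.send(type, payload)` is a hypothetical wrapper around the signalling server and the function names are assumptions; `addIceCandidate` is the standard RTCPeerConnection method.

```javascript
// Local side: forward every candidate the ICE agent discovers to the other client.
function forwardLocalCandidates(pc, signalling) {
  pc.addEventListener("icecandidate", (event) => {
    if (event.candidate) {
      signalling.send("candidate", event.candidate);
    }
  });
}

// Remote side: add a candidate that arrived via signalling to our own connection.
async function handleRemoteCandidate(pc, candidate) {
  await pc.addIceCandidate(candidate);
}
```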
Creating a data channel
The final thing to draw attention to is RTCDataChannel. This interface offers us an API for transferring arbitrary data, and also for configuring the data transfer settings:
- Full or partial guarantee for message delivery;
- Ordered or non-ordered message delivery.
You can find out more details about the RTCDataChannel configuration here, for example. For now, it will be sufficient to set ordered = false, so that a delayed message does not hold up the ones behind it, which brings us close to UDP semantics. Note that delivery remains reliable by default; for lossy, fully UDP-like delivery you would also set maxRetransmits = 0. Like RTCPeerConnection, RTCDataChannel offers a range of events describing the life cycle of a data channel. Of these, open, close and message are required for opening and closing a channel and receiving a message, respectively:
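Creating a channel and subscribing to its events could look like the sketch below. The `openGameChannel` function, the `"game"` label and the `handlers` callbacks are hypothetical; `createDataChannel`, the `ordered` option and the event names are standard.

```javascript
// Creates an unordered data channel and wires its life-cycle events.
function openGameChannel(pc, handlers, label = "game") {
  const channel = pc.createDataChannel(label, { ordered: false });
  channel.binaryType = "arraybuffer"; // deliver binary messages as ArrayBuffer

  channel.addEventListener("open", () => handlers.onOpen(channel));
  channel.addEventListener("close", () => handlers.onClose(channel));
  channel.addEventListener("message", (event) => handlers.onMessage(event.data));

  return channel;
}
```

The responding side does not create a channel of its own: it receives it through the connection's datachannel event and can attach the same three listeners there.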
And, finally, once a data channel has successfully been opened between players, they can start exchanging game messages.
We have considered how to set up a connection between two players and this is basically enough, if you are playing one-on-one. But what if we want there to be several players in a given room? What does that change? In actual fact, it doesn’t change anything; it’s just that every pair of players has to have their own connection. This means that, if you are playing in a room with 3 other players, you have to have 3 peer-to-peer connections, one for each of them. You can view the full code of the class responsible for interaction with all the opponents in the room here.
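Keeping one connection per opponent can be sketched with a map keyed by player id. The `PeerMesh` class and the `createConnection` factory are hypothetical names and are not taken from the project's code.

```javascript
// Tracks one peer-to-peer connection per opponent in the room.
class PeerMesh {
  // createConnection is a hypothetical factory: (peerId) => connection object.
  constructor(createConnection) {
    this.createConnection = createConnection;
    this.peers = new Map(); // peer id -> connection
  }

  // Returns the existing connection for a peer or creates a new one.
  add(peerId) {
    if (!this.peers.has(peerId)) {
      this.peers.set(peerId, this.createConnection(peerId));
    }
    return this.peers.get(peerId);
  }

  // Closes and forgets the connection when a player leaves the room.
  remove(peerId) {
    const connection = this.peers.get(peerId);
    if (connection) {
      connection.close();
      this.peers.delete(peerId);
    }
  }

  get size() {
    return this.peers.size;
  }
}
```

A newcomer would call `add` once for every id in the player list returned by the server, and each existing player would call it once when notified about the newcomer.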
Right, so the signalling server with the rooms is ready, and we have discussed the message format and how to deliver the messages. Now, based on all that, how do we make sure the players can see one another?
The idea of synchronisation is quite simple: once per fixed time interval you send your opponents your coordinates, and then, based on those coordinates, they can display your current location.
How often do you need to send synchronised messages? Ideally the opponent should see updates as often as the player themselves, i.e. if the game is operating at a frame rate of 30-60 frames per second, then messages should be sent at that same frequency. However, this is a rather naïve solution, and in the end a lot depends on the dynamic of the game itself. For example, is it worth sending coordinates so frequently, if they only change every 10-20 seconds? If that’s the case, it’s probably not worth it. In my case, the animation and the position of the players change relatively frequently, and so I opted for the simple answer: sending a message with coordinates for every frame.
Sending a synchronised message:
Receiving a synchronised message:
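Both directions might be sketched as follows. For brevity the wire format here is just two int16 values (x, then y) in platform byte order, which is an assumption, as are the function names; the receiving channel is assumed to have `binaryType` set to `"arraybuffer"`.

```javascript
// Called once per frame: sends the local player's position to every open channel.
function broadcastPosition(channels, player) {
  const payload = new Int16Array([player.x, player.y]);
  for (const channel of channels) {
    if (channel.readyState === "open") {
      channel.send(payload);
    }
  }
}

// Called from a channel's message handler: applies a received position.
// `data` is the ArrayBuffer delivered in the message event.
function applyPosition(remotePlayer, data) {
  const [x, y] = new Int16Array(data);
  remotePlayer.x = x;
  remotePlayer.y = y;
}
```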
Unfortunately, it turned out that this only works without lag as long as you don’t start playing with a real person who is sitting at another computer and is not on the same network as you. Because in that case it starts working like this:
The thing is, that for the image to be uninterrupted, the messages need to be delivered at a consistent frequency – with the same frequency as they are being sent. It is practically impossible to achieve this under real-world conditions, and so the time gaps between incoming messages are constantly changing, creating an effect which is unpleasant for the eyes. This can be overcome using coordinate extrapolation.
To start off with, you need to get to the bottom of how the delay with the messages has an effect on the quality of the image which the player sees. In order for the movement of the image to be uninterrupted, messages need to arrive at an even interval which is also close to the rate at which the frames are updated in the game:
In practice, it works out differently. The intervals between the messages are distributed unevenly which makes the animation ‘jump’ and the coordinates change:
Looking at the second diagram, you can see what happens when the delay between messages increases: first of all the player sees the image freeze, and then the image jumps. This is what produces the unpleasant effect.
The movement would be much smoother if, when the messages are delayed, the player’s coordinates kept changing proportionally, even if they are not always reliably accurate:
And, actually, if you analyse the players’ movement, you realise that they don’t usually suddenly change direction and that means that, if, at a given moment, the following coordinate message has not been received, then we can estimate the coordinates on the basis of, for example, the speed at which they were travelling in the previous frame. To do this, you either need to calculate the speed on the receiving end or simply send it along with the coordinates. As usual, I go for the simplest option and send the speed along with the coordinates. And now, if in a given frame, there was no message updating the coordinates, then the coordinates can be calculated based on the speed at which the player was travelling in the previous frame:
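The per-frame update with extrapolation can be sketched like this. The field names (`x`, `y`, `vx`, `vy`) and the convention that velocity is expressed in map units per frame are assumptions for this sketch.

```javascript
// Per-frame update of a remote player.
// `message` is the latest sync message, or null if nothing arrived this frame.
function updateRemotePlayer(remote, message) {
  if (message) {
    // Fresh data: take the position and velocity as sent.
    remote.x = message.x;
    remote.y = message.y;
    remote.vx = message.vx;
    remote.vy = message.vy;
  } else {
    // No update: assume the player kept moving at the last known velocity.
    remote.x += remote.vx;
    remote.y += remote.vy;
  }
  return remote;
}
```

When the next real message arrives, the position snaps to the reported coordinates, so the estimate only has to hold for the frames in between.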
And this is what it looks like after extrapolation:
Of course, this method has lots of drawbacks and, if the connection is particularly slow, then this can happen:
However, refining the extrapolation any further is far beyond the scope of the present article, and so I suggest that we stop here.
Other game actions
Besides moving around on the map, it would also be good to get some ammunition and shoot someone. What I mean by that, is that there is a whole range of actions which the player performs in the game, and they also relate to the issue of synchronisation. Fortunately, this presents far fewer problems than in the case of movement synchronisation: it is sufficient simply to reproduce the event received via a message. That is why I am not going to go into detail, but will simply direct you to the project code.
How it all worked out
You can view the code (apart from the source code of ImpactJS itself) and instructions for launching it on github.
I will take the risk of giving out the link where you can try to play it here. I don’t know what will happen to my single-core Droplet, but que sera, sera =)
If you have read this right to the end – thank you! That means my work has not been wasted and you have found something interesting for yourself. Feel free to write any questions, feedback and suggestions in the comments section.
Alexander Gutnikov, Frontend developer.