
Video Conferencing (Video Chat activity)


This page describes how the video conferencing (VideoChat activity) in FROG works. Before getting into the activity, it is necessary to explain WebRTC, which is used to create real-time video conferencing applications.


WebRTC

WebRTC is a relatively new technology whose idea is that two users communicate via a peer-to-peer (p2p) connection and exchange media streams (video/audio) and even arbitrary data (files, text messages, etc.). The best way to learn this technology is to implement a mini video chat web application. Here is a list of useful resources for learning WebRTC:

The first two links should cover all the basics of WebRTC, and the last two contain blog posts which can be quite useful and interesting.

After going through these tutorials, you should know and understand how RTCPeerConnection works, how getUserMedia works, how the exchange of Offers, Answers and ICE candidates works, etc.

Note: while implementing a WebRTC application, be careful when you add ICE candidates to an RTCPeerConnection object. You must first set the local and remote descriptions of the RTCPeerConnection; only after that can you add ICE candidates.
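
As a refresher, here is a minimal sketch of the caller side of an offer/answer exchange, including the candidate ordering mentioned in the note. The `signaling` object is a hypothetical wrapper around whatever signalling channel you use (e.g. a WebSocket); it is not part of the WebRTC API.

```js
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

// Remote ICE candidates that arrive before the remote description is set
// must be queued, otherwise addIceCandidate() will fail.
const pendingCandidates = [];

navigator.mediaDevices
  .getUserMedia({ video: true, audio: true })
  .then(stream => {
    stream.getTracks().forEach(track => pc.addTrack(track, stream));
    return pc.createOffer();
  })
  .then(offer => pc.setLocalDescription(offer))
  .then(() => signaling.send({ type: 'offer', sdp: pc.localDescription }));

// Send our own candidates to the other peer as they are gathered.
pc.onicecandidate = event => {
  if (event.candidate) signaling.send({ type: 'candidate', candidate: event.candidate });
};

signaling.onMessage(async message => {
  if (message.type === 'answer') {
    await pc.setRemoteDescription(message.sdp);
    // Only now is it safe to add the candidates that arrived earlier.
    for (const c of pendingCandidates) await pc.addIceCandidate(c);
    pendingCandidates.length = 0;
  } else if (message.type === 'candidate') {
    if (pc.remoteDescription) await pc.addIceCandidate(message.candidate);
    else pendingCandidates.push(message.candidate);
  }
});
```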

TURN server

In the WebRTC basics, you should learn about STUN and TURN. STUN is used to retrieve a peer's public IP address, and TURN is used when peers cannot establish a direct p2p connection and have to relay the data through an additional server (the TURN server).
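
Both kinds of servers are passed to the browser in the RTCPeerConnection configuration. The snippet below is only an illustration with placeholder URLs and credentials; FROG's actual values live in webrtc-config.

```js
// Placeholder STUN/TURN entries; replace with your own servers.
const rtcConfiguration = {
  iceServers: [
    { urls: 'stun:stun.example.org:3478' },
    {
      urls: 'turn:turn.example.org:3478',
      username: 'frog',
      credential: 'secret'
    }
  ]
};

const pc = new RTCPeerConnection(rtcConfiguration);
```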

For FROG, the coturn TURN server is used. The documentation for this server is at the provided link, but there is also a lot of information on Stack Overflow.

Note: when you deploy your TURN server, always check that it works by using the Trickle ICE test.

Media server

A media server is used to support video conferencing (many-to-many and one-to-many). In FROG, it is used in those two cases, which we call group mode (many-to-many) and webinar mode (one-to-many). There are several options available when choosing the media server:

For FROG we decided to use Kurento, as it seemed the best fit.

Kurento is simple to use and it also allows the creation of modules for analysis on the media server. Janus also offers plugins (modules).

Kurento can easily be installed from the terminal. After installation, the STUN and TURN servers must be added to the KMS configuration (all the instructions are on the official website).

Kurento Media Server (KMS) acts like a peer, and the job of the signalling server is to connect each user with the media server. KMS has Media Elements and Media Pipelines. Media Pipelines can be considered as Rooms, and the Media Elements are WebRtcEndpoints (there are many kinds of Media Elements, but WebRtcEndpoints are the most used ones in FROG).

Example: there are 4 participants who want to have a conversation. It is necessary to create one Media Pipeline that acts as a Room. Using the pipeline object, we create 4 WebRtcEndpoints that act as recv-only peers (they only receive streams from the participants). We must use the signalling server to connect the participants with these endpoints. Then, for each participant, 3 more WebRtcEndpoints are created that act as send-only peers (they send the streams from the other 3 participants). The sketch below shows this topology in code.
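
The following is a rough sketch of that topology using the Node.js kurento-client library (promise style). The function and variable names (createRoom, participants, publishEndpoint, ...) are illustrative, not the actual signalling-server code.

```js
const kurentoClient = require('kurento-client');

async function createRoom(kmsUri, participants) {
  const client = await kurentoClient(kmsUri);             // connect to KMS
  const pipeline = await client.create('MediaPipeline');  // the "Room"

  // One recv-only endpoint per participant: it receives that
  // participant's stream into the pipeline.
  for (const p of participants) {
    p.publishEndpoint = await pipeline.create('WebRtcEndpoint');
  }

  // For every participant, one send-only endpoint per *other* participant,
  // fed from that other participant's publishing endpoint.
  for (const viewer of participants) {
    viewer.receiveEndpoints = {};
    for (const speaker of participants) {
      if (speaker === viewer) continue;
      const sendEndpoint = await pipeline.create('WebRtcEndpoint');
      await speaker.publishEndpoint.connect(sendEndpoint);
      viewer.receiveEndpoints[speaker.id] = sendEndpoint;
    }
  }

  return pipeline;
}
```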

The signalling server is the one that takes care of KMS (creating pipelines and elements). This is done by connecting to KMS over a web socket and exchanging JSON messages. It is not necessary to do this manually, because Kurento has a library called KurentoClient whose API models the objects on KMS and abstracts away the communication. KurentoClient exists for Java and JavaScript. Since FROG is written in JavaScript technologies, it was decided to have the signalling server also written in JavaScript. Kurento provides most of its tutorials for Java; there are some JavaScript tutorials that show some ways of using Kurento.

Signalling server

The signalling server is located in the frog-signalling-server repository. The Wiki pages in that repository explain how the signalling server works.

Summary: the signalling server connects to the KMS. Web clients (users/participants) are represented by UserSessions on the signalling server. For each UserSession, the signalling server creates one WebRtcEndpoint that receives the stream from that participant, and N-1 WebRtcEndpoints that send the streams from the other participants to the user. More details are on the Wiki pages in the repository.
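
As a rough idea (the real implementation is in frog-signalling-server and may differ), a UserSession can be pictured like this:

```js
// Simplified sketch of a UserSession on the signalling server.
class UserSession {
  constructor(id, name, roomName, webSocket) {
    this.id = id;
    this.name = name;
    this.roomName = roomName;
    this.webSocket = webSocket;          // connection to the browser
    this.incomingEndpoint = null;        // WebRtcEndpoint receiving this user's stream
    this.outgoingEndpoints = new Map();  // userId -> WebRtcEndpoint sending that user's stream to us
  }

  sendMessage(message) {
    this.webSocket.send(JSON.stringify(message));
  }
}
```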

Video chat activity - intro

The video chat activity enables users to talk to each other. The source code consists of:

  • ActivityRunner
  • analytics - used for analysing MediaStream objects
  • lib - 3rd party libraries for voice activity detection (hark.js) and emotion recognition (clmtrackr.js)
  • webrtc-config - configuration of WebRTC (rtcConfiguration, signalling server URL, media constraints)
  • Dashboard - dashboard displaying how much time each user talked in a group
  • config - config file of the activity (just like in all activities)
  • index - index file of the activity (just like in all activities)

Video chat activity - ActivityRunner

The activity runner consists of a few classes:

  • index
  • participant - represents a participant in the conversation - contains and manages the RTCPeerConnection object
  • Header - not really a class - it is used to display the title and information from the activity's config
  • VideoLayout - renders the local and remote videos and buttons for managing the video elements
  • Video - represents HTML video
  • ParticipantsView - renders a list of participants and information about them (whether a participant is streaming or not, or if the participant has raised hand)

The activity is used for conference calls. The appropriate state for the activity consists of:

  • local stream (representing the user running the activity)
  • remote streams (each consisting of a remote participant's MediaStream object, the participant's name and id)
  • participants - used by ParticipantsView

When the activity mounts, we retrieve information about the local participant (name, id, etc.; this information is located in config) and create a roomId (from the activityId, sessionId and groupingValue). Then we create a WebSocket connection to the signalling server (the URL is in webrtc-config).
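
A condensed sketch of that mount logic is shown below; the exact prop and config names (SIGNALING_SERVER_URL, groupingValue, ...) are assumptions, not the actual source.

```js
componentDidMount() {
  // Local participant info comes from the activity config / props (names assumed).
  const { activityId, sessionId, groupingValue, userInfo } = this.props;

  // The room is unique per activity, session and group.
  this.roomId = `${activityId}-${sessionId}-${groupingValue}`;

  // URL taken from webrtc-config.
  this.socket = new WebSocket(SIGNALING_SERVER_URL);
  this.socket.onopen = () => this.joinRoom(this.roomId, userInfo);
  this.socket.onmessage = event => this.handleMessage(JSON.parse(event.data));
}
```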

When the connection is established, we request joining a room (with the roomId we created before). The signalling server will create a UserSession for the local participant. Depending on the role of the local participant, we send the local stream to the media server. (NOTE: Safari has some issues; in order to receive a video stream, the Safari user must allow access to media devices.)

When a new participant joins a room, we get information about that user (name, id and role). Depending on that user's role, we request that user's media. We get the user's name and role only in the message indicating a new (or existing) user; in all other messages, the user's ID is used.
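
An illustrative message handler is sketched below. The message ids and fields are placeholders; the real ones are defined by frog-signalling-server.

```js
handleMessage(message) {
  switch (message.id) {
    case 'existingParticipants':
      // Name, id and role are only available here and in 'newParticipant'.
      message.participants.forEach(p => this.addParticipant(p));
      break;
    case 'newParticipant':
      this.addParticipant(message.participant);
      break;
    case 'iceCandidate':
      // Later messages reference the participant only by id.
      this.participants[message.userId].addIceCandidate(message.candidate);
      break;
    default:
      console.warn('Unknown message', message);
  }
}
```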

A few important notes:

  • The WebRTC API might change in the future (so the API used in this activity might become deprecated or obsolete)
  • Only Firefox and Chrome are fully functional
  • Safari has some issues, like the problem with allowing media devices in order to have a recv-only connection
  • Edge has some problems with ICE candidates
  • If something is not working, check whether the signalling server, the TURN server and the KMS are working

Video chat activity - analytics

This directory consists of StreamAnalysis.js and a components directory. The components directory contains classes that do some sort of analysis on a MediaStream. The idea is that the ActivityRunner gives the MediaStream to StreamAnalysis.js, and StreamAnalysis.js passes the MediaStream object to the component classes from the components directory.

StreamAnalysis imports all the classes from the components directory. It has a method called analyzeStream which takes a MediaStream object and an options object. This method is called from the ActivityRunner. When it is called, StreamAnalysis passes the arguments to the classes in the components directory.

There is also an onVAD method. It is not used to create log messages for the dashboard, but to provide a callback to the ActivityRunner when voice is detected on the media stream.

Currently there are two components in the components directory. The first one is VoiceActivityDetectionAnalysis.js, which uses the hark library to detect voice. When the user starts or stops speaking, a log message is created containing the user's id, name and a boolean indicating whether the user is speaking or not. The second component is VideoEmotionAnalysis.js, which is made from the official example of the clmtrackr library. It did not work very well in FROG.
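
For orientation, a minimal sketch of how such a voice-activity component could wrap hark is shown below; createLogger stands in for FROG's logging callback, and the exact shape of the log payload is paraphrased from the description above.

```js
import hark from 'hark';

export const analyzeVoiceActivity = (stream, { userId, userName, createLogger }) => {
  const speechEvents = hark(stream, { interval: 100 });

  speechEvents.on('speaking', () =>
    createLogger({ payload: { id: userId, name: userName, speaking: true } })
  );
  speechEvents.on('stopped_speaking', () =>
    createLogger({ payload: { id: userId, name: userName, speaking: false } })
  );

  // The caller can call .stop() when the stream goes away.
  return speechEvents;
};
```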

All in all, creating a new analysis of the media stream does not require any knowledge of WebRTC. It is only necessary to create a new analytics component which does the work, and import that component in StreamAnalysis (and call the component's analyze method from StreamAnalysis' analyzeStream method).
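
In other words, wiring in a new component boils down to something like the following (class and method names are paraphrased from the description above, not copied from the source; MyNewAnalysis is hypothetical):

```js
import VoiceActivityDetectionAnalysis from './components/VoiceActivityDetectionAnalysis';
import MyNewAnalysis from './components/MyNewAnalysis'; // hypothetical new component

export default class StreamAnalysis {
  analyzeStream(stream, options) {
    new VoiceActivityDetectionAnalysis().analyze(stream, options);
    new MyNewAnalysis().analyze(stream, options); // one extra line per new component
  }
}
```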

Video chat activity - Dashboard

Dashboard.js creates the dashboard for the video chat activity by using the log messages from the analysis part. As said in the analytics section, a log message contains a payload object with the user's id, name and speaking boolean. Other properties besides payload which are used in the dashboard are instanceId, timestamp and userId.

The dashboard uses a bar chart to display how much time each user spent talking in a group.

The dashboard has a state which is a map of groups. Each group has an id, starting time, participants and other properties required by the bar chart (tickValues, tickFormat, data). A participant has an id, index, name and an array of time ranges (intervals). Each interval has a start property (when the user started speaking) and an end property (when the user stopped speaking; if end is null, the user is still speaking).
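
An illustrative shape of that state (property names paraphrased from the description, not copied from Dashboard.js):

```js
const state = {
  'group-1': {
    id: 'group-1',
    startTime: 1535700000000,   // timestamp of the first log message
    tickValues: 3,              // number of participants + 2
    tickFormat: ['Alice'],      // participants' names
    data: [],                   // filled in by prepareData
    participants: {
      'user-1': {
        id: 'user-1',
        index: 1,
        name: 'Alice',
        intervals: [{ start: 12.4, end: 15.9 }, { start: 31.0, end: null }]
      }
    }
  }
};
```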

In the mergeLog method, we first create a new group in the state (when a new log message arrives). If the log message is from a new user, we create an empty participant and add it to the group (this log message comes from the activityDidMount message). Then we check the log's payload. If the payload's speaking boolean is true, we first check/set the last interval's end property and then push a new time interval. If the payload's speaking boolean is false, we set the last time interval's end property.
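
The interval handling can be condensed to something like this (a sketch, not the actual Dashboard.js code):

```js
function handleSpeakingLog(group, log) {
  const participant = group.participants[log.userId];
  const seconds = (log.timestamp - group.startTime) / 1000;
  const last = participant.intervals[participant.intervals.length - 1];

  if (log.payload.speaking) {
    // Close a dangling interval first, then open a new one.
    if (last && last.end === null) last.end = seconds;
    participant.intervals.push({ start: seconds, end: null });
  } else if (last && last.end === null) {
    // User stopped speaking: close the last interval.
    last.end = seconds;
  }
}
```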

In the prepareData method we set everything up for the bar chart. We have a state which is a map of groups, and each group has participants with their time intervals. The tick values are set to the number of participants + 2; this makes the graph look nice. For the tick format, we use the participants' names. Then we create the data for each group, consisting of the participants' indices and time intervals.

In the Viewer method we create a bar chart for each group.