“The protocol governing the transmission of voice over the Internet is known as the Voice over Internet Protocol (VoIP). It employs a sub-protocol for real-time streaming of media, called the Real-time Transport Protocol (RTP).” That being said, what in the world is a protocol? How do two people living on separate continents talk to each other, practically in person, as though their computers were connected through a wormhole? First, some basics.
Internet Protocol and Packet Switching
The Internet is the most efficient mailing system ever devised. It moves information by mailing it in small envelopes known as packets. Some packets line up like carriages in a train, while others take to the air in digital airplanes. They deliver their messages at train stations and airports.
These “stations” are our modern computers, while the modes of transport resemble the media through which the information travels. The train routes are fiber optics and telephone lines, while the flight routes are satellite links.
However, packets don’t carry information the way you might expect. Unlike a standard Amazon delivery, a book isn’t delivered whole but is instead broken down into multiple pieces. The envelopes are stuffed with these chunks of pages and sent through a medium. The packets travel independently and are reassembled into the original copy at their destination. This is known as packet switching.
The splitting of information and the journey of the resulting packets are governed by a set of rules, or protocol, known as the Internet Protocol (IP). These are agreed-upon methods that every computer participating in the communication must comply with.
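To make the envelope idea concrete, here is a minimal sketch in Python. It is only an illustration of the principle, not how IP itself is implemented: the function names, the 16-byte chunk size and the sample text are all assumptions made for this example.

```python
import random

def packetize(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut a message into numbered chunks: (sequence number, chunk)."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Put the chunks back in order, however they arrived."""
    return b"".join(chunk for _, chunk in sorted(packets, key=lambda p: p[0]))

book = b"Call me Ishmael. Some years ago, never mind how long precisely..."
packets = packetize(book, size=16)
random.shuffle(packets)  # packets travel independently, so arrival order is not guaranteed
assert reassemble(packets) == book
```

The sequence number written on each envelope is what lets the destination rebuild the book even when the pages arrive out of order.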
Servers, Clients, P2P and VoIP
Another set of rules comes from the server-client and Peer-to-Peer (P2P) models, which establish the relationship between two or more participating devices.
The server-client model decides who the client is and who the server is on the basis of who initiates the connection and who responds to it, respectively. The server is a cistern that contains the information, such as a website that a client wants to access. The limitation of this model is that the flow of information is rarely bidirectional. This is unfair to the server, which might not necessarily be a website, but possibly another user. In this case, the client doesn’t return the favor by sharing data of its own.
The alternative to this is the Peer-to-Peer model, which establishes a bidirectional connection between a sender and a receiver. In P2P, everyone is a server! Therefore, rather than just pinching files, you can also give them back. This is evident in file-sharing applications, such as Napster or BitTorrent. Unlike the server-client model, where more users degrade network performance, P2P gets better with more users, because coveted files are available from many of them. If a server communicates at 100 Mbps, then 100 users sharing it each get about 1 Mbps. With 1,000 or more users, each would get 0.1 Mbps or less!
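The arithmetic behind that squeeze is a simple division. The snippet below assumes an idealized server that splits its capacity evenly among everyone connected, which real networks only approximate; the function name is made up for this example.

```python
def per_user_mbps(server_capacity_mbps: float, users: int) -> float:
    """Idealized even split of a server's capacity among simultaneous users."""
    return server_capacity_mbps / users

for users in (100, 1000):
    print(users, "users share 100 Mbps:", per_user_mbps(100, users), "Mbps each")
```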
P2P allows for the simultaneous download of a file from multiple users in the form of snippets. The individual snippets are then reattached in sequence upon completion, which makes for faster transfers.
Naturally, one might wonder how these packets know where to go. The participating train stations, or devices, are identified by unique addresses. Besides the user’s actual data, the protocol requires each packet to be labelled with the address of the source that sends it and the address of the receiver where it is supposed to end up. It also carries other data, but that doesn’t concern us right now.
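Here is a rough sketch of fetching snippets from several peers at once. The peers are plain Python functions standing in for remote computers, and the round-robin assignment is an assumption chosen for simplicity; real file-sharing clients use far more elaborate strategies.

```python
from concurrent.futures import ThreadPoolExecutor

FILE = b"It was the best of times, it was the worst of times."
CHUNK = 8  # bytes per snippet (illustrative)

def make_peer(data: bytes):
    """A hypothetical peer that holds a full copy and serves snippets by index."""
    def serve(index: int) -> bytes:
        return data[index * CHUNK:(index + 1) * CHUNK]
    return serve

def download(peers, n_chunks: int) -> bytes:
    def fetch(index: int):
        peer = peers[index % len(peers)]  # spread requests across peers, round-robin
        return index, peer(index)
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        pieces = dict(pool.map(fetch, range(n_chunks)))
    # Reattach the snippets in sequence.
    return b"".join(pieces[i] for i in range(n_chunks))

peers = [make_peer(FILE) for _ in range(3)]
n_chunks = -(-len(FILE) // CHUNK)  # ceiling division
assert download(peers, n_chunks) == FILE
```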
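As a sketch of that labelling, the snippet below writes a source and a destination address in front of the data. It is deliberately simplified: a real IP header carries many more fields, and the 8-byte layout, the function names and the example addresses (taken from ranges reserved for documentation) are assumptions of this illustration.

```python
import ipaddress
import struct

def make_packet(src: str, dst: str, payload: bytes) -> bytes:
    """Prefix the payload with a toy header: 4 bytes of source, 4 bytes of destination."""
    header = struct.pack("!4s4s",
                         ipaddress.IPv4Address(src).packed,
                         ipaddress.IPv4Address(dst).packed)
    return header + payload

def read_packet(packet: bytes) -> tuple[str, str, bytes]:
    """Recover the two addresses and the payload from a toy packet."""
    src, dst = struct.unpack("!4s4s", packet[:8])
    return str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst)), packet[8:]

packet = make_packet("192.0.2.1", "198.51.100.7", b"hello")
print(read_packet(packet))
```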
The information exchanged between two devices isn’t limited to text messages or images. With the development of new protocols and improved hardware, it became possible to send digitally encoded sound and video in real time. The protocol governing the transmission of voice over the Internet is known as the Voice over Internet Protocol (VoIP). It employs a sub-protocol for real-time streaming of media, called the Real-time Transport Protocol (RTP).
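For the curious, every RTP packet begins with a 12-byte fixed header holding a version number, a payload type, a sequence number, a timestamp and a source identifier. The sketch below packs and unpacks just those fields, leaving the padding, extension and contributing-source options at zero. The function names are made up for this example, and it is a teaching aid rather than a complete RTP implementation.

```python
import struct

def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = 0, marker: bool = False) -> bytes:
    """Pack a minimal 12-byte RTP fixed header (version 2, no padding, extension or CSRCs)."""
    byte0 = 2 << 6  # version 2 in the top two bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def parse_rtp_header(data: bytes) -> dict:
    """Unpack the fields written by rtp_header."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}

header = rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD)
print(parse_rtp_header(header))
```

The sequence number lets the receiver spot missing or reordered packets, while the timestamp tells it when each chunk of sound should be played.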
How VoIP Applications Work
Now, we’re familiar with the basics and can move forward to the crux of the article. How does an application like Skype actually work?
Skype is a proprietary VoIP system that uses its own protocol, known as the Skype protocol, but it is loosely based on P2P networking: it implements direct communication between two computers on the Internet in much the same way as file sharing. This makes it largely decentralized and distributed, with no “Skype central servers or systems” carrying your calls. But how does it keep track of data?
Keeping track of data means maintaining tables or indexes that match entries in pairs, usually one public address mapped to one private LAN address. Such a table is what translates addresses when packets cross the boundary between private and public networks, a process known as Network Address Translation (NAT). When you sign on to Skype, your device becomes one node in a network of equal peers.
Skype then uses algorithms that pick out “super-nodes” among them to handle indexing and NAT. The nodes are selected without the knowledge of their users. The algorithm seems to recognize a super-node by evaluating its uptime and the congestion it encounters. This saves Skype from having to set up and maintain centralized servers.
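A translation table of this kind can be sketched as a pair of dictionaries. This toy version tells private devices apart by handing each one its own public port, which is one common flavor of NAT but an assumption here; the class name, starting port and addresses are all illustrative.

```python
class Nat:
    """Toy NAT table: maps (private address, port) pairs to ports on one public address."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outgoing = {}  # (private_ip, private_port) -> public_port
        self.incoming = {}  # public_port -> (private_ip, private_port)

    def outbound(self, private_ip: str, private_port: int) -> tuple[str, int]:
        """Translate a private source into the public address the outside world sees."""
        key = (private_ip, private_port)
        if key not in self.outgoing:
            self.outgoing[key] = self.next_port
            self.incoming[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outgoing[key]

    def inbound(self, public_port: int):
        """Find which private device a reply belongs to (None if unknown)."""
        return self.incoming.get(public_port)

nat = Nat("203.0.113.5")
print(nat.outbound("192.168.1.10", 5060))
print(nat.inbound(40000))
```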
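Skype’s actual selection algorithm is proprietary, so the following is purely a guess at the general shape of such a ranking, using the two signals mentioned above. The scoring formula, field names and sample values are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    uptime_hours: float
    congestion: float  # 0.0 means idle, 1.0 means saturated

def score(node: Node) -> float:
    """Hypothetical score: long uptime is good, congestion is bad."""
    return node.uptime_hours * (1.0 - node.congestion)

def pick_super_nodes(nodes: list[Node], count: int = 1) -> list[Node]:
    """Return the highest-scoring nodes."""
    return sorted(nodes, key=score, reverse=True)[:count]

nodes = [Node("laptop", 10, 0.9), Node("desktop", 8, 0.1), Node("phone", 2, 0.0)]
print([n.name for n in pick_super_nodes(nodes)])
```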
The Flow of Information
Initially, connecting external devices such as a webcam and a microphone to a desktop was a necessity for video calling, but the relentless scaling down of silicon technology led to the birth of devices with these features built in. Nowadays, video calling isn’t a novelty. Every device from laptops to mobile phones has a front camera and a microphone to make audio-visual input a breeze.
To initiate a connection, a VoIP application typically uses a signaling protocol that runs on top of the Internet Protocol (IP), such as the Session Initiation Protocol (SIP), to locate and connect with a device. SIP is only involved in signaling, that is, in setting up and terminating the communication.
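SIP messages are plain text, a bit like HTTP requests. The sketch below only builds the text of a bare-bones INVITE, the message that asks another device to start a call, and never sends it. The function name, users and hosts are placeholders, and a real INVITE would normally also carry a body describing the media on offer.

```python
import uuid

def build_invite(caller: str, callee: str, caller_host: str) -> str:
    """Build the text of a minimal SIP INVITE request (illustrative, not sent anywhere)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:10]  # branch IDs start with this fixed prefix
    tag = uuid.uuid4().hex[:8]
    call_id = uuid.uuid4().hex
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {caller_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}@{caller_host}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF, and a blank line closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"

print(build_invite("alice@example.com", "bob@example.org", "pc33.example.com"))
```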
After the connection is established and authenticated, the hardware modules come into play. They convert analog signals such as light and sound into digital signals. The protocols then stuff this digital data into the envelopes and send them on their journey. Upon arrival, the envelopes are unpacked back into digital data, which is finally converted back to analog output that we can comprehend. The connection is then awkwardly terminated when you finally run out of things to talk about.
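The analog-to-digital step can be imitated in a few lines. In this sketch a pure tone stands in for the microphone; it is sampled 8,000 times a second, rounded to 16-bit integers, and cut into 20-millisecond frames ready to be put in envelopes. Those figures are common telephony choices but are assumptions here, as are the function names.

```python
import math
import struct

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_MS = 20       # milliseconds of sound per envelope (assumed)

def sample_tone(freq_hz: float, duration_s: float) -> list[float]:
    """Stand-in for a microphone: sample a sine wave, values between -1 and 1."""
    count = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(count)]

def quantize(samples: list[float]) -> bytes:
    """Round each sample to a signed 16-bit integer (little-endian PCM)."""
    return struct.pack(f"<{len(samples)}h", *(round(s * 32767) for s in samples))

def to_frames(pcm: bytes) -> list[bytes]:
    """Cut the byte stream into fixed-length frames, one per envelope."""
    frame_bytes = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 2 bytes per sample
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]

frames = to_frames(quantize(sample_tone(440, 0.1)))
print(len(frames), "frames of", len(frames[0]), "bytes")
```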
Ideally, the receiving end reassembles the packets in sequence, but as you might have guessed, things rarely go this smoothly. Most public networks are prone to congestion during peak hours, and in the hustle and bustle of heavy traffic, some packets lose their way or arrive late. Without them, the receiver is unable to reconstruct the complete message accurately, which is why live conversations over VoIP are prone to stutter and lag.
Private networks are usually better, but slow connections and hogged networks are the reason why conversations sometimes look like the video in The Ring, or like the classic Hollywood alien threat played out across televisions around the world.
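Here is a small sketch of what the receiver is up against. It sorts whatever arrived by sequence number and plugs each hole with silence, which is one simple way to cope with loss; real applications use smarter concealment. The function name and frame size are invented for this example.

```python
def play_out(received: list[tuple[int, bytes]], total: int, frame_size: int):
    """Rebuild the stream in order, substituting silence for packets that never arrived."""
    by_seq = dict(received)
    silence = b"\x00" * frame_size
    stream, lost = [], []
    for seq in range(total):
        if seq in by_seq:
            stream.append(by_seq[seq])
        else:
            stream.append(silence)  # a gap the listener hears as a stutter
            lost.append(seq)
    return b"".join(stream), lost

# Five frames were sent; number 2 got lost and the rest arrived out of order.
arrived = [(3, b"DDDD"), (0, b"AAAA"), (4, b"EEEE"), (1, b"BBBB")]
audio, lost = play_out(arrived, total=5, frame_size=4)
print(audio, lost)
```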