KRPC Protocol: The Language of Torrent Peers

Abstract
BitTorrent is a peer-to-peer file sharing protocol used to distribute large amounts of data over the internet. It works by allowing users to download small pieces of a file from multiple sources, rather than downloading the entire file from a single source. This makes it more efficient than traditional download methods and allows for faster download speeds.

BitTorrent has gained notoriety for how it is used, while fewer people know about the fascinating aspects of its underlying technology. In this blog we will explore how clients implementing the BitTorrent protocol communicate with each other in a tracker-less environment, and in particular how that communication is implemented using the KRPC protocol.

Tracker vs Tracker-less

BitTorrent is a peer-to-peer file sharing method in which each participant downloads pieces of the file from other peers in the network, following certain rules (such as the choking algorithm). In the original version of BitTorrent, although peers download files in a decentralized fashion, the implementation is not truly decentralized.

BitTorrent Tracker architecture.

Early BitTorrent implementations required clients to connect to a centralized server called a tracker. The tracker responds with a list of other participating peers that the client can connect to. The client can then download file chunks from those peers. The tracker was an essential part of the BitTorrent protocol, as it helped coordinate the activity of peers in the network.

As with any centralized service, there are several potential flaws with the use of BitTorrent trackers:

Centralization: Since the tracker is the central orchestrator of the swarm, it can be a single point of failure. If the tracker goes offline, the swarm will be unable to communicate, and the download will stop.
Scalability: As the number of seeds and peers in the swarm increases, the tracker can become overwhelmed and unable to handle the load. This can lead to slowdowns or even failure of the tracker.
Privacy: The tracker maintains a list of seeds and peers, which could potentially be used to identify the IP addresses of users downloading and sharing torrents. This can be a concern for users who want to maintain their privacy.

To address these issues, some BitTorrent clients now use a decentralized (Tracker-less) tracking system known as DHT (Distributed Hash Table), which does not rely on a central tracker and can be more resilient and scalable.

DHT
Each BitTorrent peer uses a "distributed sloppy hash table" (DHT) for storing peer contact information in a "tracker-less" environment, i.e., each peer becomes a tracker, and no centralized tracker server is needed.

Simplified DHT Architecture [1]

Any participating node in the network can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to peers’ information. The key-value mapping responsibility is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continuous node arrivals, departures, and failures.

Kademlia
Kademlia is a Distributed Hash Table implementation that distributes the key-value stores across nodes in a network and retrieves them without any central authority/database.

Every node in the Kademlia network is identified by a unique 160-bit (20-byte) ID, the same size as a SHA-1 hash. SHA-1 is also used to reduce keys to 20-byte IDs, so nodes and keys share a single ID space.

The node closest to the key is the one that owns the key and is responsible for holding it. To define which node is closest, we need a distance metric that quantifies the distance between two entities. Kademlia uses XOR as its distance metric: the distance between two entities is the bitwise XOR of their IDs, interpreted as an unsigned integer.
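The XOR metric can be illustrated with a few lines of Python. This is a minimal sketch: the function name `xor_distance` and the sample IDs are made up for illustration, and the IDs differ only in their last byte to keep the numbers small.

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: XOR the two IDs and read the result as an unsigned integer."""
    if len(a) != len(b):
        raise ValueError("IDs must have the same length")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


# Hypothetical 20-byte IDs that differ only in the last byte.
key = bytes.fromhex("00" * 19 + "05")
node_a = bytes.fromhex("00" * 19 + "04")
node_b = bytes.fromhex("00" * 19 + "0c")

# The node with the smallest XOR distance to the key is responsible for it.
closest = min([node_a, node_b], key=lambda node: xor_distance(node, key))
```

Here `node_a` is at distance 1 from the key and `node_b` at distance 9, so `node_a` is the closer of the two.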

Because there is no central entity, nodes must know how to route requests among themselves such that they always converge to the right node.
To ensure this, every node in the network keeps track of a few nodes in its routing table. These are not random, but very strategic.
Every node knows at least one node in each subtree that it is not part of. This means that the routing table may not have the address of the desired node, but it can lead us to one of the nodes present in its subtree.
By following a greedy approach, the nodes can route us, step by step, to the desired node. This is a classic overlay network with its own routing.
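The greedy routing idea can be sketched on a toy network. Everything here is an assumption made for illustration: the 4-bit integer IDs, the `network` table, and the function names. A real Kademlia lookup also queries several nodes in parallel and tracks the K closest candidates, which this sketch leaves out.

```python
def bucket_index(own_id: int, other_id: int) -> int:
    """Position of the highest bit in which the IDs differ, i.e. which subtree the other node is in."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not store itself in its routing table")
    return distance.bit_length() - 1


def greedy_lookup(network: dict[int, list[int]], start: int, target: int) -> list[int]:
    """Hop to the known contact closest (by XOR) to the target until no contact is closer."""
    path = [start]
    current = start
    while True:
        candidates = network[current] + [current]
        best = min(candidates, key=lambda node: node ^ target)
        if best == current:
            return path
        current = best
        path.append(current)


# Toy network with 4-bit IDs: each node knows only a few contacts.
network = {
    0b0001: [0b0100, 0b1000],
    0b0100: [0b0001, 0b1000],
    0b1000: [0b0001, 0b1101],
    0b1101: [0b1000, 0b1111],
    0b1111: [0b1101, 0b1000],
}

path = greedy_lookup(network, start=0b0001, target=0b1111)
```

Node `0b0001` does not know `0b1111` directly, but each hop moves into a closer subtree, giving the path `0b0001 → 0b1000 → 0b1101 → 0b1111`.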

KRPC protocol
The nodes in the DHT (Kademlia) communicate through an RPC (Remote Procedure Call) mechanism called KRPC. The KRPC protocol is a simple RPC mechanism consisting of bencoded dictionaries sent over UDP. A single query packet is sent out and a single packet is sent in response. There are three message types: query, response, and error. For the DHT protocol, there are four queries: ping, find_node, get_peers, and announce_peer. Below, we take a closer look at each of these queries and at what they look like in actual network packets.

Ping

The most basic query is a ping. It is used to check whether a node is reachable.

A ping query has a single argument, "id", whose value is a 20-byte string containing the sender's node ID in network byte order.

arguments:  {"id" : "<querying nodes id>"}

Packet Structure Example

ping Query = {"t":"aa", "y":"q", "q":"ping", "a":{"id":"abcdefghij0123456789"}}
bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe

Ping Query packet
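To see where the bencoded string comes from, here is a minimal bencode encoder sketch. The function name `bencode` is our own, and the encoder handles only the four bencode types (integers, byte strings, lists, dictionaries); it is meant as an illustration rather than a production implementation.

```python
def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists, and dicts using bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = {k.encode("utf-8") if isinstance(k, str) else k: v for k, v in value.items()}
        body = b"".join(bencode(k) + bencode(items[k]) for k in sorted(items))
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


ping_query = {"t": "aa", "y": "q", "q": "ping", "a": {"id": "abcdefghij0123456789"}}
encoded = bencode(ping_query)
```

Because dictionary keys are sorted, the "a" key is emitted first, which is why the bencoded form starts with `d1:ad2:id20:` even though "t" comes first in the readable form.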

 
response: {"id" : "<queried nodes id>"}

Packet Structure Example

Response = {"t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re

Ping Response Packet
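On the receiving side, the packet has to be decoded back into a dictionary. The following is a minimal decoder sketch; the names `bdecode` and `_decode` are hypothetical, and byte strings are returned as `bytes` because node IDs are raw binary rather than text.

```python
def _decode(data: bytes, i: int):
    """Decode one bencoded value starting at offset i; return (value, next_offset)."""
    lead = data[i:i + 1]
    if lead == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {i}")


def bdecode(data: bytes):
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


response = bdecode(b"d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re")
is_response = response[b"y"] == b"r"
responder_id = response[b"r"][b"id"]
```

The "y" key tells us the message type ("r" for a response), and the "t" transaction ID lets the querying node match the response to its original ping.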

Find node

Find node is used to find the contact information for a node from its ID.

A find_node query has two arguments, "id" containing the node ID of the querying node, and "target" containing the ID of the node sought by the querier.

arguments:  {"id" : "<querying nodes id>", "target" : "<id of target node>"}

Packet Structure Example

find_node Query = {"t":"aa", "y":"q", "q":"find_node", "a": {"id":"abcdefghij0123456789", "target":"mnopqrstuvwxyz123456"}}
bencoded = d1:ad2:id20:abcdefghij01234567896:target20:mnopqrstuvwxyz123456e1:q9:find_node1:t2:aa1:y1:qe

Find Node Query Packet

When a node receives a find_node query, it responds with a find_node response. The response is a bencoded dictionary containing a key "nodes", whose value is a string holding the compact node info for the target node or for the K (8) closest good nodes in the queried node's routing table.

response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"}

Packet Structure Example

Response = {"t":"aa", "y":"r", "r": {"id":"0123456789abcdefghij", "nodes": "def456..."}}
bencoded = d1:rd2:id20:0123456789abcdefghij5:nodes9:def456...e1:t2:aa1:y1:re

Find Node Response Packet
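The "nodes" string is a concatenation of fixed-size records. In the DHT specification (BEP 5), each compact node info record is 26 bytes: a 20-byte node ID followed by a 4-byte IPv4 address and a 2-byte port, both in network byte order. The sketch below parses such a string; the function name and the sample record (which uses a documentation IP address) are made up for illustration.

```python
import ipaddress
import struct

NODE_RECORD_SIZE = 26  # 20-byte node ID + 4-byte IPv4 address + 2-byte port


def parse_compact_nodes(blob: bytes) -> list[tuple[bytes, str, int]]:
    """Split a compact node info string into (node_id, ip, port) tuples."""
    if len(blob) % NODE_RECORD_SIZE != 0:
        raise ValueError("compact node info must be a multiple of 26 bytes")
    nodes = []
    for offset in range(0, len(blob), NODE_RECORD_SIZE):
        node_id = blob[offset:offset + 20]
        # "!" selects network byte order: 4 raw address bytes, then an unsigned 16-bit port.
        ip_raw, port = struct.unpack_from("!4sH", blob, offset + 20)
        nodes.append((node_id, str(ipaddress.IPv4Address(ip_raw)), port))
    return nodes


# Hypothetical record: node ID, then 192.0.2.1, then port 6881.
sample = b"mnopqrstuvwxyz123456" + bytes([192, 0, 2, 1]) + (6881).to_bytes(2, "big")
nodes = parse_compact_nodes(sample)
```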

Get peers

The get_peers query retrieves the peers associated with a torrent infohash. The queried node responds with the list of peers it knows for that infohash, if any.
A get_peers query has two arguments, "id" containing the node ID of the querying node, and "info_hash" containing the infohash of the torrent.

arguments:  {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>"}

Packet Structure Example

get_peers Query = {"t":"aa", "y":"q", "q":"get_peers", "a": {"id":"abcdefghij0123456789", "info_hash":"mnopqrstuvwxyz123456"}}
bencoded = d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz123456e1:q9:get_peers1:t2:aa1:y1:qe

Get Peers Query Packet

The get_peers response is sent by the queried node. If the queried node has peers for the infohash, they are returned in a key "values" as a list of strings, where each string contains "compact" format peer information for a single peer. If the queried node has no peers for the infohash, a key "nodes" is returned containing the K nodes in the queried node's routing table closest to the infohash supplied in the query. In either case, a "token" key is also included in the return value. The token value is a required argument for a future announce_peer query, and should be a short binary string.

response: {"id" : "<queried nodes id>", "token" :"<opaque write token>", "values" : ["<peer 1 info string>", "<peer 2 info string>"]}

Packet Structure Example

Response with peers = {"t":"aa", "y":"r", "r": {"id":"abcdefghij0123456789", "token":"aoeusnth", "values": ["axje.u", "idhtnm"]}}
bencoded = d1:rd2:id20:abcdefghij01234567895:token8:aoeusnth6:valuesl6:axje.u6:idhtnmee1:t2:aa1:y1:re

Get Peers Response Packet
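Each entry in "values" is a 6-byte compact peer record: a 4-byte IPv4 address followed by a 2-byte port, both in network byte order. As a sketch, the placeholder strings from the example above can be run through a small parser (the function name is our own). The strings are only placeholders that happen to be 6 bytes long, so the resulting addresses are not meaningful.

```python
import ipaddress
import struct


def parse_compact_peer(entry: bytes) -> tuple[str, int]:
    """Decode one 6-byte compact peer record into (ip, port)."""
    if len(entry) != 6:
        raise ValueError("compact peer info must be exactly 6 bytes")
    ip_raw, port = struct.unpack("!4sH", entry)
    return str(ipaddress.IPv4Address(ip_raw)), port


peers = [parse_compact_peer(value) for value in (b"axje.u", b"idhtnm")]
```

For instance, "axje.u" is read byte by byte: "a", "x", "j", "e" are the address bytes 97, 120, 106, 101, and ".u" is the port 46 × 256 + 117 = 11893.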

Announce peers

This query is used to announce that the peer controlling the querying node is downloading a torrent on a port.
An announce_peer query has four required arguments: "id" containing the node ID of the querying node, "info_hash" containing the infohash of the torrent, "port" containing the port as an integer, and the "token" received in response to a previous get_peers query.

arguments:  {"id" : "<querying nodes id>",
  "implied_port": <0 or 1>,
  "info_hash" : "<20-byte infohash of target torrent>",
  "port" : <port number>,
  "token" : "<opaque token>"}

Packet Structure Example

announce_peer Query = {"t":"aa", "y":"q", "q":"announce_peer", "a": {"id":"abcdefghij0123456789", "implied_port": 1, "info_hash":"mnopqrstuvwxyz123456", "port": 6881, "token": "aoeusnth"}}
bencoded = d1:ad2:id20:abcdefghij012345678912:implied_porti1e9:info_hash20:mnopqrstuvwxyz1234564:porti6881e5:token8:aoeusnthe1:q13:announce_peer1:t2:aa1:y1:qe

Announce Peers query packet

Before responding to an announce_peer query, the queried node must verify that the token was previously sent to the same IP address as the querying node. The queried node should then store the IP address of the querying node and the supplied port number under the infohash in its store of peer contact information.
The optional argument implied_port takes the value 0 or 1. If it is present and non-zero, the port argument should be ignored and the source port of the UDP packet should be used as the peer's port instead. This is useful for peers behind a NAT that may not know their external port, and for peers supporting uTP, which accept incoming connections on the same port as the DHT port.
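The token check and the implied_port rule can be sketched as follows. This is only one possible approach, under stated assumptions: the token is taken to be a SHA-1 hash of the requester's IP address concatenated with a secret that the node rotates periodically (the scheme the DHT specification describes for the reference implementation), and the function names, the `secrets` list, and the `store` dictionary are all hypothetical.

```python
import hashlib
import hmac


def make_token(ip: str, secret: bytes) -> bytes:
    """Assumed token scheme: SHA-1 of the requester's IP concatenated with a secret."""
    return hashlib.sha1(ip.encode("ascii") + secret).digest()


def handle_announce(args: dict, source: tuple[str, int], secrets: list[bytes], store: dict) -> bool:
    """Validate the token, then record (ip, port) under the infohash. Returns False if rejected."""
    ip, source_port = source
    # Accept tokens issued to this IP under the current or the previous secret.
    if not any(hmac.compare_digest(args["token"], make_token(ip, s)) for s in secrets):
        return False
    # A non-zero implied_port means: ignore "port", use the UDP source port.
    port = source_port if args.get("implied_port", 0) else args["port"]
    store.setdefault(args["info_hash"], set()).add((ip, port))
    return True


secrets = [b"current-secret", b"previous-secret"]  # made-up values
store = {}
args = {
    "id": b"abcdefghij0123456789",
    "implied_port": 1,
    "info_hash": b"mnopqrstuvwxyz123456",
    "port": 6881,
    "token": make_token("192.0.2.7", secrets[0]),
}

accepted = handle_announce(args, ("192.0.2.7", 51413), secrets, store)
rejected = handle_announce(args, ("198.51.100.9", 51413), secrets, store)
```

Because the token is bound to the IP address that requested it, the same announce replayed from a different address is rejected, and with implied_port set the stored port is the UDP source port (51413) rather than the advertised 6881.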

response: {"id" : "<queried nodes id>"}

Packet Structure Example

Response = {"t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re

Announce Peers Response packet

BitTorrent in BPS

The Keysight ATI (Application and Threat Intelligence) team has covered a wide variety of BitTorrent use cases.

BitTorrent Superflows in Keysight BreakingPoint.

BreakingPoint offers highly customizable BitTorrent traffic to test your network equipment performance with high-fidelity simulated network traffic scenarios.
BPS offers niche capabilities such as mixing BitTorrent traffic with thousands of other types of application traffic to create a real-world traffic simulation that flows through your network equipment.
For more details about Keysight BreakingPoint, and to test your network equipment against the most up-to-date network traffic found on the internet, visit BreakingPoint.
