Kademlia Node Usage #

In this section, we see how to create your Kademlia nodes. Make sure you have read the abstraction layer section first.

Introduction #

This library has two Kademlia-related interfaces.

  • KademliaNodeAPI: Main implementation is the KademliaNode class. It only handles bootstrapping the network and keeping it alive.
  • DHTKademliaNodeAPI: Extends the KademliaNodeAPI interface. Main implementation is the DHTKademliaNode class, which decorates any KademliaNodeAPI.

Therefore, to create a DHTKademliaNode you first need an instance of KademliaNodeAPI (KademliaNode).

It’s not as complicated as it may sound. Read below.

Kademlia Node #

If you only need network bootstrapping and continuous routing-table updates, you can use the default KademliaNodeAPI (interface) implementation: KademliaNode (implementation).

import io.ep2p.kademlia.node.KademliaNodeAPI;
import io.ep2p.kademlia.node.KademliaNode;

KademliaNodeAPI<ID, C> node = new KademliaNode<>(id, connectionInfo, routingTable, messageSenderAPI, nodeSettings);

You can start the node by calling the start() method, or you can bootstrap it by calling the bootstrap(bootstrapNode) method, passing the Node<ID, C> of the bootstrap node as the parameter.

You can stop the node by calling the stop() method, which gracefully stops the node and its executor services, or by calling the stopNow() method, which immediately shuts down the running jobs.
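
For example, a minimal start/stop flow could look like the snippet below (bootstrapNode here is assumed to be a Node<ID, C> of a node you already know about):

node.start();                     // start as the very first node of a new network
// or
node.bootstrap(bootstrapNode);    // join an existing network through a known Node<ID, C>

// ... later, when shutting down:
node.stop();                      // graceful shutdown
// or: node.stopNow();            // immediately shut down the running jobs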

Generics #

Note that KademliaNodeAPI and all its implementations are Generic (KademliaNodeAPI<ID, C>).

The first generic type is the ID of the node. You can choose between Integer, Long, and BigInteger, and you should decide which one to use based on your GUID space size.

For example, in other software I am developing, BigInteger is used as the ID since node IDs are SHA-1 hashes of cryptographic public keys.

You can choose the appropriate RoutingTable implementation based on this key size. There is an implementation available for each of the three supported ID types.

The second generic type is the type of your class that implements ConnectionInfo. You are not limited in how you implement this class. Its purpose is to tell other nodes how to connect to the current node.

For example, this could simply be the TCP (or UDP, if you prefer) IP/port of a node. In our Kademlia Netty implementation, for instance, we use “host” and “port”.
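
As a rough sketch, a host/port based ConnectionInfo could look like the class below. The class and field names are just an illustration (not part of the library), and it assumes ConnectionInfo is the simple marker interface from the io.ep2p.kademlia.connection package:

import io.ep2p.kademlia.connection.ConnectionInfo;

// Hypothetical example: exposes the host and port other nodes should connect to
public class HostAndPortConnectionInfo implements ConnectionInfo {
    private final String host;
    private final int port;

    public HostAndPortConnectionInfo(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public String getHost() { return host; }
    public int getPort() { return port; }
}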

DHT Kademlia Node #

By default, KademliaNodeAPI does not cover the [D]istributed [H]ash [T]able functionality. In case you want to use DHT, you should consider using the DHTKademliaNodeAPI interface. The main implementation is the DHTKademliaNode class.

This class decorates a KademliaNodeAPI, as explained earlier, so you can instantiate it in two ways.

A. Using constructor:

new DHTKademliaNode<>(kademliaNodeAPI, keyHashGenerator, kademliaRepository);

B. Using builder:

new DHTKademliaNodeBuilder<>(id, connectionInfo, routingTable, messageSenderAPI, keyHashGenerator, kademliaRepository).build();
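
Putting it together, a minimal sketch could look like the snippet below, assuming routingTable, messageSenderAPI, keyHashGenerator and kademliaRepository are the implementations you prepared while reading the abstraction layer section, and MyConnectionInfo is your own ConnectionInfo class:

DHTKademliaNodeAPI<BigInteger, MyConnectionInfo, String, String> dhtNode =
        new DHTKademliaNodeBuilder<>(id, connectionInfo, routingTable, messageSenderAPI, keyHashGenerator, kademliaRepository).build();

dhtNode.start();                 // or dhtNode.bootstrap(bootstrapNode)
dhtNode.store("key", "value");   // pushes the <K, V> pair toward the closest node (see Performance below)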

Performance #

The general performance of this library depends on how the abstraction layer is implemented. However, a couple of performance improvement techniques were kept in mind.

Pushing/Pulling Values for store() call #

By default, when the store method is called on a DHTKademliaNode, it keeps pushing the <K, V> pair toward the closest node. This is a fine approach when the value is not large.

Alternatively, you can choose the pulling mechanism, which pushes only the key to find the closest node; the node closest to the key then pulls the value from the requester node.

DHTStoreServiceAPI is in charge of handling the store mechanism. In order to override the default behavior, you need to change the factory (DHTStoreServiceFactory) of your DHTKademliaNode.

The code snippet below shows you how to do so:

// Creating a factory to return a PullingDHTStoreService instance
DHTStoreServiceFactory<I, C, K, V> dhtStoreServiceFactory = new DHTStoreServiceFactory<I, C, K, V>() {
    @Override
    public DHTStoreServiceAPI<I, C, K, V> getDhtStoreService(DHTKademliaNodeAPI<I, C, K, V> kademliaNodeAPI) {
        // Example of an ExecutorService to use: Executors.newFixedThreadPool((int) Math.ceil(kademliaNodeAPI.getNodeSettings().getDhtExecutorPoolSize() / 2d))
        return new PullingDHTStoreService<>(kademliaNodeAPI, YOUR_DESIRED_EXECUTOR_SERVICE_HERE);
    }
};

new DHTKademliaNodeBuilder<>(...)
        .setDhtStoreServiceFactory(dhtStoreServiceFactory)
        .build();

Settings #

You can configure your node by passing an instance of NodeSettings to KademliaNode or its subclasses. By default, they use the configuration from NodeSettings.Default, which has static fields. You can edit NodeSettings.Default if all the nodes you want to have in a single application share the same configuration.

Node Settings explanation:

  • IDENTIFIER_SIZE: GUID space, AKA routing table buckets count
  • BUCKET_SIZE: Max size of each bucket in the routing table
  • FIND_NODE_SIZE: Size of the results list when finding close nodes
  • MAXIMUM_LAST_SEEN_AGE_TO_CONSIDER_ALIVE: Value in seconds. If a node is older than this age, we will ping it before sending messages
  • PING_SCHEDULE_TIME_VALUE: Value in PING_SCHEDULE_TIME_UNIT to send pings in the scheduler
  • PING_SCHEDULE_TIME_UNIT: TimeUnit for the ping scheduler
  • DHT_EXECUTOR_POOL_SIZE: Size of the ExecutorService in DHT for the default constructor
  • ENABLED_FIRST_STORE_REQUEST_FORCE_PASS: Enables force pass on the first store loop. Has a negative impact on performance but makes sure another node also tries to store. Useful in situations where a node doesn’t know much about other nodes in the network
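
For example, if every node in your application should share the same configuration, you can edit these static defaults once before creating any node (the values below are arbitrary examples, not recommendations):

import java.util.concurrent.TimeUnit;

// Adjust the shared defaults once, before any node is created
NodeSettings.Default.BUCKET_SIZE = 20;
NodeSettings.Default.FIND_NODE_SIZE = 20;
NodeSettings.Default.PING_SCHEDULE_TIME_VALUE = 5;
NodeSettings.Default.PING_SCHEDULE_TIME_UNIT = TimeUnit.SECONDS;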

Republishing Keys #

In a DHT we often need to republish keys. For example, when a node is shutting down, there should be a mechanism for it to republish its keys (and values) to other nodes before it stops, so there won’t be much data loss. Another scenario is when a new node joins the network and we (as an individual node) want to pass it the data it should hold. Therefore, we should periodically republish our data so that closer nodes hold it.

Since version 4.0.0-RELEASE, kademlia-api does not implement key republishing. This mechanism can be very tricky and is very dependent on how each developer implements their KademliaRepository. For example, when a single key can hold a large amount of data, we would probably want to find the closer node first and then pass the data to it in chunks.
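
As a very rough illustration, a periodic republishing loop built on top of this API could look like the sketch below. Nothing in it is part of kademlia-api: listKeys() is a hypothetical method on your own KademliaRepository implementation, dhtNode is the DHTKademliaNodeAPI from earlier, and the schedule and any chunking strategy are entirely up to you.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: periodically re-store every key the local repository holds
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
    for (String key : myRepository.listKeys()) {       // listKeys() is your own extension, not part of KademliaRepository
        dhtNode.store(key, myRepository.get(key));     // re-run the normal store flow for each key
    }
}, 1, 1, TimeUnit.HOURS);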