What happens when we have 25 channels and 90 peers in a Fabric Network?

DAVID VIEJO
February 10, 2024

Hi there!

Last week, I ran into an issue where these warning logs were being thrown:

Haven't heard from [203 3 128 37 188 112 224 13 135 56 248 228 68 169 189 74 196 244 83 13 195 237 130 172 181 124 8 154 82 72 229 217] for 27.167205923s
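The bracketed numbers look like a 32-byte peer identifier printed as decimal bytes. A minimal Python sketch, assuming the log format shown above, turns it into a hex string that is easier to compare across the logs of different peers. `parse_gossip_warning` is a hypothetical helper name, not part of Fabric, and the pattern only handles durations written in plain seconds.

```python
import re

# Matches the warning shown above; hypothetical helper, not a Fabric API.
# Assumption: the duration is printed in plain seconds (e.g. "27.167205923s").
WARNING_RE = re.compile(r"Haven't heard from \[([\d ]+)\] for ([\d.]+)s")


def parse_gossip_warning(line):
    """Return (peer_id_hex, seconds_silent), or None if the line does not match."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    # Each space-separated number is one byte of the identifier.
    peer_id = bytes(int(b) for b in match.group(1).split())
    return peer_id.hex(), float(match.group(2))


line = (
    "Haven't heard from [203 3 128 37 188 112 224 13 135 56 248 228 68 169 "
    "189 74 196 244 83 13 195 237 130 172 181 124 8 154 82 72 229 217] "
    "for 27.167205923s"
)
peer_hex, silent_seconds = parse_gossip_warning(line)
print(peer_hex, silent_seconds)
```

Grouping the warnings by the hex identifier makes it easier to see whether a few peers or all of them are affected.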

Troubleshooting

After each warning, the connection between the two peers was closed, and this repeated every 30 seconds.

The network kept working correctly, but these warnings told me something was wrong.

This is the line where the warning was being logged:

These messages come from the discovery service, which is in charge of discovering peers of other organizations in order to exchange data with them. That data can be:

Private data
Dissemination of blocks

More information can be found in the official documentation.

For additional context: each peer contacts the anchor peers, which are listed in the channel's configuration block, so that every peer learns about the other peers available in the network.

There’s a heartbeat interval at which the peers send a “ping” to each other. If a peer is inactive or doesn’t send any data for a period of time, it is removed from the “alive” peers list to avoid contacting peers that are down.
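The heartbeat and expiration idea can be sketched as a toy model. This is not Fabric's implementation: the `AliveTracker` class, the peer names, and the 25-second timeout are all illustrative assumptions.

```python
class AliveTracker:
    """Toy model of an 'alive' peers list; not Fabric's actual implementation."""

    def __init__(self, expiration_timeout):
        self.expiration_timeout = expiration_timeout  # seconds
        self.last_seen = {}  # peer name -> time of its last heartbeat (seconds)

    def heartbeat(self, peer, now):
        """Record that a 'ping' from this peer arrived at time `now`."""
        self.last_seen[peer] = now

    def expire(self, now):
        """Remove and return the peers that were silent longer than the timeout."""
        dead = [
            peer
            for peer, seen in self.last_seen.items()
            if now - seen > self.expiration_timeout
        ]
        for peer in dead:
            del self.last_seen[peer]
        return dead


tracker = AliveTracker(expiration_timeout=25.0)
tracker.heartbeat("peer1", now=0.0)
tracker.heartbeat("peer2", now=0.0)
tracker.heartbeat("peer2", now=20.0)  # peer2 pings again, peer1 stays silent
expired = tracker.expire(now=27.0)
print(expired)
```

In this model a peer that is perfectly healthy still gets dropped if its heartbeat is merely late, which is the situation the rest of this post is about.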

But this wasn’t our case: the peers were alive the whole time (in Kubernetes), none of them restarted, and all communication happened inside the cluster.

Solution

So, what was going on?

It seems we had many peers and channels; in total we had:

80+ peers
25+ channels

After I talked to one of the maintainers of the gossip protocol in Hyperledger Fabric, he suggested modifying these variables:

# Alive check interval(unit: second)
aliveTimeInterval: 5s
# Alive expiration timeout(unit: second)
aliveExpirationTimeout: 25s
# Reconnect interval(unit: second)
reconnectInterval: 25s

The variables above live in the peer.gossip section of the peer's core.yaml.
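Since our peers run in Kubernetes, it is often more convenient to set these values as environment variables than to edit the file. The sketch below assumes the usual Fabric convention of a CORE_ prefix followed by the upper-cased key path with dots replaced by underscores; `gossip_env_var` is a hypothetical helper, and the resulting names are worth checking against your Fabric version.

```python
def gossip_env_var(key):
    """Map a config key such as 'peer.gossip.aliveTimeInterval' to an env var name.

    Assumption: settings are overridable as CORE_<KEY PATH IN UPPER CASE>.
    """
    return "CORE_" + key.replace(".", "_").upper()


settings = {
    "peer.gossip.aliveTimeInterval": "5s",
    "peer.gossip.aliveExpirationTimeout": "25s",
    "peer.gossip.reconnectInterval": "25s",
}

env_lines = [f"{gossip_env_var(key)}={value}" for key, value in settings.items()]
for env_line in env_lines:
    print(env_line)
```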

In a normal network, the connections are bidirectional, so if there are three peers in the same network, the connections opened will be:

Peer1 → Peer2
Peer2 → Peer3
Peer1 → Peer3

If there are two channels, there will be six connections opened.

Each channel was independent, but two organizations with 3 peers had access to each channel.

So, if we have 25 channels with 9 peers each, the two organizations that are joined to all the channels have:

9 nodes × 25 channels = at least 225 connections!
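The counting above can be reproduced in a few lines of Python. The `peer_pairs` helper is hypothetical, and the last figure rests on an extra assumption that does not come from our measurements: that every pair of the 9 peers connects in every channel. It is only there to show why 225 is a lower bound.

```python
from itertools import combinations


def peer_pairs(peers):
    """Return every unordered pair of peers, one bidirectional connection each."""
    return list(combinations(peers, 2))


# Three peers in one channel: Peer1-Peer2, Peer1-Peer3, Peer2-Peer3.
pairs = peer_pairs(["Peer1", "Peer2", "Peer3"])
one_channel = len(pairs)
two_channels = one_channel * 2

# The estimate from the text: one connection per peer per channel.
lower_bound = 9 * 25

# Assumption for illustration only: a full mesh of 9 peers in each of 25 channels.
full_mesh = len(peer_pairs(range(9))) * 25

print(one_channel, two_channels, lower_bound, full_mesh)
```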

After increasing the values several times, these were the final values for our network:

# Alive check interval(unit: second)
aliveTimeInterval: 40s
# Alive expiration timeout(unit: second)
aliveExpirationTimeout: 130s
# Reconnect interval(unit: second)
reconnectInterval: 130s
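One way to compare the two sets of values is the ratio between the expiration timeout and the alive interval, which is roughly how many heartbeats can go missing before a peer is declared dead. This is my own reading of the parameters rather than something the maintainers stated, and the helpers below are hypothetical and only understand durations written in whole or fractional seconds.

```python
def parse_seconds(value):
    """Parse a duration such as '40s' into a float; only plain seconds are supported."""
    if not value.endswith("s") or value.endswith("ms"):
        raise ValueError(f"unsupported duration: {value!r}")
    return float(value[:-1])


def heartbeats_before_expiry(alive_interval, expiration_timeout):
    """How many alive intervals fit into one expiration timeout."""
    return parse_seconds(expiration_timeout) / parse_seconds(alive_interval)


initial_ratio = heartbeats_before_expiry("5s", "25s")
final_ratio = heartbeats_before_expiry("40s", "130s")
print(initial_ratio, final_ratio)
```

With the final values, heartbeats are sent much less often and each peer gets far more absolute time to answer, at the cost of detecting a peer that is really down more slowly.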

Conclusion

Scaling a Hyperledger Fabric network is hard, especially when the network throws warnings you haven’t seen before or when you go beyond the usual Fabric use case of one orderer organization and fewer than five peer organizations.

The gossip protocol scales, but specific parameters must be tuned for it to work correctly in a network of this size.

I hope you learnt something new this Saturday!

Regards

— David