Etcd: cluster bring-up details

Understanding etcd cluster formation


Etcd uses the Raft algorithm for cluster formation and consensus. In this post we will look at the details of the Etcd cluster bring-up process.

Overview of Raft Algorithm


In a distributed data store cluster, data consistency is achieved through consensus among a majority of the cluster members. Raft is a leader-based consensus algorithm.
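
As a quick illustration of what "majority" means here: for a cluster of n voting members, the quorum is floor(n/2) + 1. The helper names below are my own and are not part of Etcd; this is only a sketch of the arithmetic.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of members that forms a majority."""
    if cluster_size < 1:
        raise ValueError("cluster must have at least one member")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Members that can be lost while a majority is still reachable."""
    return cluster_size - quorum_size(cluster_size)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

For the 3-node cluster used in this post the quorum is 2, which matches the `[q:2]` shown in the Raft log lines further down, and the cluster can lose one node.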

A consensus-based system can be built using either a leader-based (asymmetric) or a leaderless (symmetric) approach. Leader-based systems are simpler to implement and maintain in normal operation. Raft decomposes the problem of achieving consensus into two parts:

  • Normal Operation state

A cluster node participating in Raft can be in one of three roles: leader, follower, or candidate. The leader is the node with which clients communicate, and it is responsible for maintaining the consistency of data. All other nodes are followers and follow commands from the leader.

From the client's point of view, the system behaves exactly like a single-node server. The leader maintains consistency by making sure that every change to the data is made only through the leader and is replicated to, and acknowledged by, a majority of cluster nodes.
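
The majority rule for normal operation can be sketched as follows. This is a simplified model and not Etcd's code: `match_index` is a hypothetical map from node name to the highest log index known to be stored on that node (the leader included). Real Raft additionally requires the entry to belong to the leader's current term before it is committed; that check is left out here.

```python
from typing import Mapping


def committed_index(match_index: Mapping[str, int]) -> int:
    """Highest log index that is stored on a majority of nodes."""
    quorum = len(match_index) // 2 + 1
    # Sort descending: the quorum-th largest value is held by at least
    # `quorum` nodes, so every index up to it is on a majority.
    ordered = sorted(match_index.values(), reverse=True)
    return ordered[quorum - 1]


if __name__ == "__main__":
    # Node_3 lags behind, but two of three nodes already hold index 10.
    print(committed_index({"Node_1": 10, "Node_2": 10, "Node_3": 7}))
```

A follower that lags behind does not block progress as long as a majority has stored the entry.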

  • Leader Election state

If the follower nodes detect the loss of the leader (that is, they stop receiving heartbeats from it), they transition to the candidate role. A candidate contests a leader election and tries to become the leader by asking the other nodes for votes. If one of the candidates receives votes from a majority, it becomes the leader; if the vote is split, a new election is started and the process repeats. Every time an election is started, the election term is incremented.
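
The election bookkeeping described above can be modelled in a few lines. This is a toy model under my own naming (`Candidate`, `start_election`, `outcome`), not Etcd's implementation, and it ignores timers and networking: it only shows the term increment, the self-vote, and the majority check.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    term: int
    cluster_size: int
    granted: int = 0
    rejected: int = 0

    def start_election(self) -> None:
        """Begin a new election: bump the term and vote for ourselves."""
        self.term += 1
        self.granted = 1
        self.rejected = 0

    def outcome(self) -> str:
        """Return 'won', 'lost' or 'pending' for the current term."""
        quorum = self.cluster_size // 2 + 1
        if self.granted >= quorum:
            return "won"
        if self.rejected >= quorum:
            return "lost"
        return "pending"


if __name__ == "__main__":
    c = Candidate(term=63, cluster_size=3)
    c.start_election()   # term 64, one vote (its own)
    c.rejected += 1      # one peer rejects, the other is unreachable
    print(c.term, c.outcome())
```

An election that is still pending when the election timeout expires is simply retried with the next term, which is why the term number can climb quickly while a node is alone.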

We will not cover the details of Raft in this blog. There are many good resources on the net that describe Raft.

A 3-Node cluster example


In my previous blog we looked at bringing up a 3-node Etcd cluster based on namespaces. Let's now examine the details of cluster bootstrap and operation. Complete logs are available for reference.

We will look at the following operations:

  1. Initial cluster formation
  2. Re-Election after leader loss

The following sections describe the working of Etcd/Raft in detail.

Initial cluster formation


  • Node_1 starts and requests votes from the other cluster members. As none of the other members are alive, quorum cannot be reached.
  • Node_1 starts a new election after incrementing the term.
  • This process continues until term 63 is reached, when Node_2 joins the cluster.
  • Election terms 63 and 64 fail to elect a leader, but in term 65 Node_2 is elected leader.
  • Node_3 joins the cluster later. It receives a heartbeat message from the leader and assumes the follower role.
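
The reason Node_2 rejects Node_1's vote requests in terms 63 and 64 is visible in the logs below: Node_1's log ends at [logterm: 15, index: 2215], while Node_2's ends at [logterm: 16, index: 30883]. Raft only lets a node vote for a candidate whose log is at least as up to date as its own. A minimal sketch of that comparison follows; the function name is mine, and the separate check that the voter has not already voted in the same term is omitted.

```python
def candidate_log_is_up_to_date(
    cand_last_term: int,
    cand_last_index: int,
    voter_last_term: int,
    voter_last_index: int,
) -> bool:
    """True if the candidate's log is at least as up to date as the voter's."""
    # The log whose last entry has the higher term wins.
    if cand_last_term != voter_last_term:
        return cand_last_term > voter_last_term
    # Same last term: the longer (or equally long) log wins.
    return cand_last_index >= voter_last_index


if __name__ == "__main__":
    # Node_1 asking Node_2 for a vote (terms 63 and 64 in the logs).
    print(candidate_log_is_up_to_date(15, 2215, 16, 30883))
    # Node_2 asking Node_1 for a vote (term 65 in the logs).
    print(candidate_log_is_up_to_date(16, 30883, 15, 2215))
```

With the values from the logs, the first call evaluates to False and the second to True, which is consistent with Node_2 winning the election in term 65.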

Node_1

...
...

2015/05/23 09:20:15 raft: a19202a30ffa8431 [logterm: 15, index: 2215] sent vote request to c9899e68a83f6626 at term 63
2015/05/23 09:20:15 raft: a19202a30ffa8431 [logterm: 15, index: 2215] sent vote request to 14fb055fe2856bf7 at term 63
2015/05/23 09:20:15 sender: the connection with 14fb055fe2856bf7 became active
2015/05/23 09:20:15 raft: a19202a30ffa8431 received vote rejection from 14fb055fe2856bf7 at term 63
2015/05/23 09:20:15 raft: a19202a30ffa8431 [q:2] has received 1 votes and 1 vote rejections
2015/05/23 09:20:16 etcdserver: publish error: etcdserver: request timed out
2015/05/23 09:20:16 raft: a19202a30ffa8431 is starting a new election at term 63
2015/05/23 09:20:16 raft: a19202a30ffa8431 became candidate at term 64
2015/05/23 09:20:16 raft: a19202a30ffa8431 received vote from a19202a30ffa8431 at term 64
2015/05/23 09:20:16 raft: a19202a30ffa8431 [logterm: 15, index: 2215] sent vote request to c9899e68a83f6626 at term 64
2015/05/23 09:20:16 raft: a19202a30ffa8431 [logterm: 15, index: 2215] sent vote request to 14fb055fe2856bf7 at term 64
2015/05/23 09:20:16 raft: a19202a30ffa8431 received vote rejection from 14fb055fe2856bf7 at term 64
2015/05/23 09:20:16 raft: a19202a30ffa8431 [q:2] has received 1 votes and 1 vote rejections
2015/05/23 09:20:17 raft: a19202a30ffa8431 [term: 64] received a MsgVote message with higher term from 14fb055fe2856bf7 [term: 65]
2015/05/23 09:20:17 raft: a19202a30ffa8431 became follower at term 65
2015/05/23 09:20:17 raft: a19202a30ffa8431 [logterm: 15, index: 2215, vote: 0] voted for 14fb055fe2856bf7 [logterm: 16, index: 30883] at term 65
2015/05/23 09:20:17 raft: a19202a30ffa8431 [logterm: 0, index: 30883] rejected msgApp [logterm: 16, index: 30883] from 14fb055fe2856bf7
2015/05/23 09:20:17 raft.node: a19202a30ffa8431 elected leader 14fb055fe2856bf7 at term 65

Node_2

2015/05/23 09:20:14 etcd: no data-dir provided, using default data-dir ./Node_2.etcd
...
...
2015/05/23 09:20:14 etcdserver: advertise client URLs = http://1.1.1.2:2379,http://1.1.1.2:4001
2015/05/23 09:20:14 etcdserver: loaded cluster information from store: Node_1=http://1.1.1.1:2380,Node_2=http://1.1.1.2:2380,Node_3=http://1.1.1.3:2380
2015/05/23 09:20:14 etcdserver: restart member 14fb055fe2856bf7 in cluster bb883d6508fab334 at commit index 30883
2015/05/23 09:20:14 raft: 14fb055fe2856bf7 became follower at term 16
2015/05/23 09:20:14 raft: newRaft 14fb055fe2856bf7 [peers: [14fb055fe2856bf7,a19202a30ffa8431,c9899e68a83f6626], term: 16, commit: 30883, applied: 30003, lastindex: 30883, lastterm: 16]
2015/05/23 09:20:15 raft: 14fb055fe2856bf7 [term: 16] received a MsgVote message with higher term from a19202a30ffa8431 [term: 63]
2015/05/23 09:20:15 raft: 14fb055fe2856bf7 became follower at term 63
2015/05/23 09:20:15 raft: 14fb055fe2856bf7 [logterm: 16, index: 30883, vote: 0] rejected vote from a19202a30ffa8431 [logterm: 15, index: 2215] at term 63
2015/05/23 09:20:16 raft: 14fb055fe2856bf7 [term: 63] received a MsgVote message with higher term from a19202a30ffa8431 [term: 64]
2015/05/23 09:20:16 raft: 14fb055fe2856bf7 became follower at term 64
2015/05/23 09:20:16 raft: 14fb055fe2856bf7 [logterm: 16, index: 30883, vote: 0] rejected vote from a19202a30ffa8431 [logterm: 15, index: 2215] at term 64
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 is starting a new election at term 64
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 became candidate at term 65
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 received vote from 14fb055fe2856bf7 at term 65
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 [logterm: 16, index: 30883] sent vote request to a19202a30ffa8431 at term 65
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 [logterm: 16, index: 30883] sent vote request to c9899e68a83f6626 at term 65
2015/05/23 09:20:17 sender: error posting to c9899e68a83f6626: dial tcp 1.1.1.3:2380: connection refused
2015/05/23 09:20:17 sender: the connection with c9899e68a83f6626 became inactive
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 received vote from a19202a30ffa8431 at term 65
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 [q:2] has received 2 votes and 0 vote rejections
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 became leader at term 65
2015/05/23 09:20:17 raft.node: 14fb055fe2856bf7 elected leader 14fb055fe2856bf7 at term 65

Node_3

2015/05/23 09:20:46 etcd: no data-dir provided, using default data-dir ./Node_3.etcd
2015/05/23 09:20:46 etcd: already initialized as member before, starting as etcd member...
2015/05/23 09:20:46 etcd: listening for peers on http://1.1.1.3:2380
...
...
2015/05/23 09:20:46 etcdserver: advertise client URLs = http://1.1.1.3:2379,http://1.1.1.3:4001
2015/05/23 09:20:46 etcdserver: loaded cluster information from store: Node_1=http://1.1.1.1:2380,Node_2=http://1.1.1.2:2380,Node_3=http://1.1.1.3:2380
2015/05/23 09:20:46 etcdserver: restart member c9899e68a83f6626 in cluster bb883d6508fab334 at commit index 30883
2015/05/23 09:20:46 raft: c9899e68a83f6626 became follower at term 16
2015/05/23 09:20:46 raft: newRaft c9899e68a83f6626 [peers: [14fb055fe2856bf7,a19202a30ffa8431,c9899e68a83f6626], term: 16, commit: 30883, applied: 30003, lastindex: 30883, lastterm: 16]
2015/05/23 09:20:46 raft: c9899e68a83f6626 [term: 16] received a MsgHeartbeat message with higher term from 14fb055fe2856bf7 [term: 65]
2015/05/23 09:20:46 raft: c9899e68a83f6626 became follower at term 65
2015/05/23 09:20:46 raft.node: c9899e68a83f6626 elected leader 14fb055fe2856bf7 at term 65
2015/05/23 09:20:46 rafthttp: starting client stream to 14fb055fe2856bf7 at term 65
2015/05/23 09:20:46 etcdserver: published {Name:Node_3 ClientURLs:[http://1.1.1.3:2379 http://1.1.1.3:4001]} to cluster bb883d6508fab334
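
Node_3's log above shows how Raft treats a stale term: the node restarts at term 16, receives a heartbeat carrying term 65, adopts that term and follows the sender. A simplified sketch of that rule is shown here; the `Node` class and its fields are illustrative names of my own and do not mirror Etcd's internal types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: str
    term: int
    role: str = "follower"
    leader: Optional[str] = None

    def on_heartbeat(self, sender: str, sender_term: int) -> bool:
        """Handle a leader heartbeat; return True if it was accepted."""
        if sender_term < self.term:
            # Heartbeat from an out-of-date leader: ignore it.
            return False
        # Same or higher term: adopt the term and follow the sender.
        self.term = sender_term
        self.role = "follower"
        self.leader = sender
        return True


if __name__ == "__main__":
    node_3 = Node(node_id="c9899e68a83f6626", term=16)
    node_3.on_heartbeat("14fb055fe2856bf7", 65)
    print(node_3.role, node_3.term, node_3.leader)
```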

Re-Election after leader loss


  • The leader node, Node_2, is stopped to simulate a node failure.
  • The loss of the leader is detected by Node_1 and Node_3.
  • Node_1 starts a new election at term 66 and sends out vote requests.
  • Node_1 receives the vote from Node_3 and becomes the leader.

Node_1

2015/05/23 20:57:56 rafthttp: client streaming to 14fb055fe2856bf7 at term 65 has been stopped
2015/05/23 20:57:58 raft: a19202a30ffa8431 is starting a new election at term 65
2015/05/23 20:57:58 raft: a19202a30ffa8431 became candidate at term 66
2015/05/23 20:57:58 raft: a19202a30ffa8431 received vote from a19202a30ffa8431 at term 66
2015/05/23 20:57:58 raft: a19202a30ffa8431 [logterm: 65, index: 60433] sent vote request to c9899e68a83f6626 at term 66
2015/05/23 20:57:58 raft: a19202a30ffa8431 [logterm: 65, index: 60433] sent vote request to 14fb055fe2856bf7 at term 66
2015/05/23 20:57:58 raft.node: a19202a30ffa8431 lost leader 14fb055fe2856bf7 at term 66
2015/05/23 20:57:58 sender: error posting to 14fb055fe2856bf7: dial tcp 1.1.1.2:2380: connection refused
2015/05/23 20:57:58 sender: the connection with 14fb055fe2856bf7 became inactive
2015/05/23 20:57:58 raft: a19202a30ffa8431 received vote from c9899e68a83f6626 at term 66
2015/05/23 20:57:58 raft: a19202a30ffa8431 [q:2] has received 2 votes and 0 vote rejections
2015/05/23 20:57:58 raft: a19202a30ffa8431 became leader at term 66
2015/05/23 20:57:58 raft.node: a19202a30ffa8431 elected leader a19202a30ffa8431 at term 66

Node_2

Node is stopped.

Node_3

2015/05/23 20:57:56 rafthttp: client streaming to 14fb055fe2856bf7 at term 65 has been stopped
2015/05/23 20:57:58 raft: c9899e68a83f6626 [term: 65] received a MsgVote message with higher term from a19202a30ffa8431 [term: 66]
2015/05/23 20:57:58 raft: c9899e68a83f6626 became follower at term 66
2015/05/23 20:57:58 raft: c9899e68a83f6626 [logterm: 65, index: 60433, vote: 0] voted for a19202a30ffa8431 [logterm: 65, index: 60433] at term 66
2015/05/23 20:57:58 raft.node: c9899e68a83f6626 lost leader 14fb055fe2856bf7 at term 66
2015/05/23 20:57:58 raft.node: c9899e68a83f6626 elected leader a19202a30ffa8431 at term 66
2015/05/23 20:57:58 rafthttp: starting client stream to a19202a30ffa8431 at term 66
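
To follow role changes across the three log files without reading every line, the "became <role> at term <n>" lines can be pulled out with a short script. The regular expression is written against the log format shown in this post; other Etcd versions may format these lines differently, so treat it as a starting point only.

```python
import re

# Matches lines such as:
#   ... raft: a19202a30ffa8431 became leader at term 66
ROLE_CHANGE = re.compile(
    r"raft: (?P<node>[0-9a-f]{16}) became "
    r"(?P<role>follower|candidate|leader) at term (?P<term>\d+)"
)


def role_changes(log_text: str):
    """Return (node_id, role, term) for every role change in the log text."""
    return [
        (m.group("node"), m.group("role"), int(m.group("term")))
        for m in ROLE_CHANGE.finditer(log_text)
    ]


sample = """\
2015/05/23 20:57:58 raft: a19202a30ffa8431 is starting a new election at term 65
2015/05/23 20:57:58 raft: a19202a30ffa8431 became candidate at term 66
2015/05/23 20:57:58 raft: a19202a30ffa8431 became leader at term 66
"""

if __name__ == "__main__":
    for node, role, term in role_changes(sample):
        print(node, role, term)
```

Run against the Node_1 log from the re-election above, this lists a19202a30ffa8431 (Node_1) becoming candidate and then leader at term 66.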

Conclusion


In this blog we saw the cluster formation and leader election process with Etcd and Raft. We also looked at the process of re-election. I will update the blog with more test scenarios and analyze the behavior of Etcd and Raft in the coming days.

Published by

Chandan Dutta Chowdhury

Software Engineer
