Test Driving Etcd – part 3 (Service Discovery and Proxy)

This blog is part of a 3-part series on etcd

Part1: Cluster Setup

Part2: API and HA Application

Part3: Service Discovery and Proxy

Service Discovery

In a clustered environment, services can be dynamically created and moved around for reasons like node failure, maintenance or load rebalancing. In each of these cases the client needs to discover the addresses of the services running on the cluster. This problem is normally solved by using a service registry with which each service registers itself once started.

Etcd, being a distributed and highly available data-store, can be used as a service registry to maintain records of the services available in the cluster. The directory APIs exposed by Etcd provide a convenient way to group related configuration records.

Let's take a brief look at these APIs and how they can be used to provide a service discovery mechanism.

The Directory APIs

Etcd stores data in a file-system-like hierarchy: key-value pairs can be stored in a directory, and a directory can be nested within another. Directories are automatically created if they are part of the path of a key.


client > ./etcdctl set /Dir1/Dir2/key value
value
client > ./etcdctl ls /Dir1
/Dir1/Dir2
client > ./etcdctl ls /Dir1/Dir2
/Dir1/Dir2/key

Directories can also be created explicitly

client > ./etcdctl mkdir /dir3

Directories can have TTL values associated with them; when the timer expires, the directory and its contents are recursively removed

client > ./etcdctl mkdir /dir4 --ttl 10

We can list and delete the contents of a directory recursively

client > ./etcdctl ls /Dir1 --recursive=True
/Dir1/Dir2
/Dir1/Dir2/key

To create a hidden directory (or key), its name should start with an “_”

client > ./etcdctl mkdir /_dir5
client > ./etcdctl ls -p /
/server
/testing
/Dir1/
/dir3/

Another interesting directory feature is support for ordered keys. This API allows the directory to record the order in which key-value pairs are added to it.

client > curl http://1.1.1.1:2379/v2/keys/job_queue -XPOST -d value=Job1
{"action":"create","node":{"key":"/job_queue/2667","value":"Job1","modifiedIndex":2667,"createdIndex":2667}}

client > curl http://1.1.1.1:2379/v2/keys/job_queue -XPOST -d value=Job2                  
{"action":"create","node":{"key":"/job_queue/2669","value":"Job2","modifiedIndex":2669,"createdIndex":2669}}             

client > curl http://1.1.1.1:2379/v2/keys/job_queue -XPOST -d value=Job3
{"action":"create","node":{"key":"/job_queue/2670","value":"Job3","modifiedIndex":2670,"createdIndex":2670}} 

Let's do a sorted listing to view the keys

client > curl http://1.1.1.1:2379/v2/keys/job_queue?sorted=true
{"action":"get","node":{"key":"/job_queue","dir":true,"nodes":[{"key":"/job_queue/2667","value":"Job1","modifiedIndex":2667,"createdIndex":2667},{"key":"/job_queue/2669","value":"Job2","modifiedIndex":2669,"createdIndex":2669},{"key":"/job_queue/2670","value":"Job3","modifiedIndex":2670,"createdIndex":2670}],"modifiedIndex":2667,"createdIndex":2667}}

Using Directory APIs for discovery

As seen in the previous section, the directory APIs can be used to group together related records and create a hierarchy of records. These features of the data-store can be used to store information about services, and the hierarchy of these records helps in creating a dependency-tree-like structure. As etcd is able to maintain the order of entries added to a directory, it also helps in creating a queue-like structure.

[Figure: service discovery using the etcd directory hierarchy]

Let's try to create a hypothetical service record using these APIs. For our example, let's assume that all services will be registered under the “_services” directory. When the mysql service is deployed in the cluster, the deployment process will create a service record for the database service as a directory “db” with records such as “mysql:host”, set to the mysql server host, and “mysql:port”, set to the port used by mysql, and so on.

Similarly other services deployed in the cluster should register themselves by creating directories under the “_services” directory.

_services/
├── db
│   ├── mysql:host
│   ├── mysql:password
│   ├── mysql:port
│   └── mysql:user
└── web
    ├── apache:host
    └── apache:port

For a client to discover all the services available on the cluster, all it needs to do is list the contents of the “_services” directory.
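Continuing the hypothetical layout above, registration and discovery could look like this with etcdctl (the host and port values here are only illustrative):

client > ./etcdctl set /_services/db/mysql:host 1.1.1.20
1.1.1.20
client > ./etcdctl set /_services/db/mysql:port 3306
3306
client > ./etcdctl ls /_services --recursive
/_services/db
/_services/db/mysql:host
/_services/db/mysql:port

Note that since “_services” is hidden, it will not show up in a listing of “/”, but it can be listed directly by its full path.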

Etcd Proxy

With service discovery we solved the problem of publishing the services available on various nodes of the cluster. But as etcd is a distributed service itself, the client needs to know which nodes to send the Etcd commands to, and what happens if the node that the client was communicating with fails.

When a client tries to connect to an Etcd cluster, it must know about the cluster nodes and try connecting to each of them till a connection succeeds. If the cluster is augmented with more nodes at a later time, the clients need to be updated about the new nodes.

[Figure: etcd proxy]

Etcd proxy tries to resolve this problem by running an instance of etcd locally on the client node. The Etcd instance on the client node acts as a proxy to the real Etcd cluster and forwards all calls to the real cluster. The proxy instance does not participate in quorum formation or replication of configuration data.

This means that the client need not know about the cluster nodes or which node in the cluster is actually active and which one has failed.

To start an etcd instance as a proxy, the --proxy flag should be set to “on” or “readonly”. In addition, the proxy needs to be told about the initial cluster members with the --initial-cluster flag. More details can be found in the documentation.
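As a rough sketch, starting a local read/write proxy on the client node of our test setup could look like the following (the loopback listen URL is an assumption; the member list matches the cluster used in this series):

client > ./etcd --proxy on \
    --listen-client-urls http://127.0.0.1:2379 \
    --initial-cluster Node_1=http://1.1.1.1:2380,Node_2=http://1.1.1.2:2380,Node_3=http://1.1.1.3:2380

Local clients can then point etcdctl or curl at http://127.0.0.1:2379 and let the proxy pick a reachable cluster member to forward the request to.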

Conclusion

In this post we looked at the Directory APIs provided by Etcd and how those APIs can be used to provide a service discovery mechanism. We also looked at the proxy mode in Etcd, which helps in simplifying client configuration.


Test Driving Etcd – part 2 (API and HA Application)

This blog is part of a 3-part series on etcd

Part1: Cluster Setup

Part2: API and HA Application

Part3: Service Discovery and Proxy

In my previous blog, we looked at Etcd, a highly available, distributed configuration data-store. In this post we will look at the API exposed by Etcd. We will use our previous setup to try these APIs. Finally, we will develop a very simple HA application based on the Etcd APIs and test service failover.

Etcd data API Overview


Etcd exposes REST based APIs for accessing the data-store, cluster configuration and statistics. We will mostly focus on the data-store API for this post.

Storing data in Etcd

Etcd stores data in key-value pairs; in the previous blog we have already looked at using etcdctl to store a key-value pair in the data-store. In this post we will use curl to interact with etcd. To store a key-value pair of “my_key”, “my_value” we use the following command.


client $ curl -L http://1.1.1.1:4001/v2/keys/my_key -XPUT -d value="my_value"

{"action":"set","node":{"key":"/my_key","value":"my_value","modifiedIndex":446,"createdIndex":446}}

Retrieving data

To read the value of a key stored in the data-store, use HTTP GET. Here is an example


client $ curl -v -L http://1.1.1.1:4001/v2/keys/my_key

{"action":"get","node":{"key":"/my_key","value":"my_value","modifiedIndex":476,"createdIndex":476}}

Deleting the data

Deleting data is as simple as using the DELETE verb with HTTP.


client $ curl -L http://1.1.1.1:4001/v2/keys/my_key -XDELETE

{"action":"delete","node":{"key":"/my_key","modifiedIndex":457,"createdIndex":446},"prevNode":{"key":"/my_key","value":"my_value","modifiedIndex":446,"createdIndex":446}}

Apart from the normal read/write/delete operations, Etcd also supports some interesting data manipulation operations, which can be used to build cluster-aware applications. Let's look at some of these operations.

Storing data with a TTL value

Etcd allows storage of a key-value pair with an associated timeout. Here is an example usage.

client $ curl -L http://1.1.1.1:4001/v2/keys/my_key -XPUT -d value="my_value" -d ttl=10

{"action":"set","node":{"key":"/my_key","value":"my_value","expiration":"2015-05-27T06:02:06.385658433Z","ttl":10,"modifiedIndex":693,"createdIndex":693}}

This means once the data is stored in the data-store it will only be available for a certain period of time (till the timeout value expires), following which the key-value pair is automatically removed from the data-store.

client $ curl -L http://1.1.1.1:4001/v2/keys/my_key 

{"errorCode":100,"message":"Key not found", "cause":"/my_key","index":701}

Conditional update of data

Etcd allows conditional updates of key-value pairs. This means a data update is allowed only if given criteria are met.

For example, the compare-and-swap operation takes two values, the previous value (P) and the new value (N), and allows updating the value associated with a key only if the previous value (P) matches the value of the key in the data-store.

Let's take an example to explain this

client $ curl -L http://1.1.1.1:4001/v2/keys/my_key -XPUT -d value="my_value"

{"action":"set","node":{"key":"/my_key","value":"my_value","modifiedIndex":787,"createdIndex":787}}

client $ curl -L http://1.1.1.1:4001/v2/keys/my_key -XPUT -d value="my_value1" -d prevValue="my_value"

{"action":"compareAndSwap","node":{"key":"/my_key","value":"my_value1","modifiedIndex":796,"createdIndex":787},"prevNode":{"key":"/my_key","value":"my_value","modifiedIndex":787,"createdIndex":787}}

client $ curl -L http://1.1.1.1:4001/v2/keys/my_key -XPUT -d value="my_value1" -d prevValue="my_value"

{"errorCode":101,"message":"Compare failed","cause":"[my_value != my_value1]","index":797}

Another case of compare-and-update is the creation of a new key. To make sure that the key did not exist in the data-store before the operation, the prevExist attribute of the key should be set to “false”.

client $ curl -L http://1.1.1.1:4001/v2/keys/my_key -XPUT -d value="my_value1" -d prevExist="false"

{"errorCode":105,"message":"Key already exists","cause":"/my_key","index":819}

The same logic applies to the compare-and-delete operation, which deletes the key-value pair if the old value supplied to the operation matches the one stored in the data-store.

client $ curl -L http://1.1.1.1:4001/v2/keys/my_key -XDELETE -d prevValue="my_value1"
{"action":"delete","node":{"key":"/my_key","modifiedIndex":824,"createdIndex":787},"prevNode":{"key":"/my_key","value":"my_value1","modifiedIndex":796,"createdIndex":787}}

We have looked at only a few of the data-store APIs; for a complete description of the Etcd APIs, have a look at the documentation.

A Highly Available application using Etcd

Now that we have looked at the basic data-store APIs of Etcd, let us use these APIs to build a simple HA application. We will use the Etcd cluster setup that we built in the previous blog and extend it to include two server nodes which will be running our HA application. Following is the cluster setup for this test.

[Figure: etcd HA application setup]

The application simply makes the shared service IP address of 1.1.1.50 highly available.

The application code is available here. The code will be running on both the servers (Server_1 and Server_2); one of them will act as the master node and host the service (the shared service IP). In case of the master node's failure, the service will be moved to the backup node.

The following diagram shows the working of the application.

[Figure: etcd HA application flowchart]

The Client node constantly tries to use the service (ping the Shared Service IP). Here is an example of service disruption during node switchover.

[Figure: service disruption during failover]

The key to running a shared service on the two-node cluster in our example is the lock acquisition and the heartbeat mechanism. Let's look at the application code and see how the Etcd API provides these clustering mechanisms.

function acquire_lock()
{
    # Create the lock key only if it does not already exist; mk fails
    # if another node currently holds /my_service.
    $ETCDCTL mk /my_service $NODE --ttl $RES_TIME
    return $?
}

function heartbeat()
{
    # Refresh the TTL, but only if the stored value still matches our
    # node name, i.e. we are still the lock holder.
    $ETCDCTL set /my_service $NODE --swap-with-value $NODE --ttl $RES_TIME
    return $?
}

The application uses the compare-and-update operation to set up a new key (my_service); it then uses data updates with a TTL value to keep holding the key. This prevents the second instance of the application from starting the service. Etcd thus provides a distributed locking mechanism for the clustered application.
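The full application code is linked above; the following is only a minimal sketch of how the two functions might be tied together in a control loop (start_service, stop_service and HEARTBEAT_INTERVAL are placeholder names, not part of the actual code):

# Main loop: keep trying to become, or stay, the master.
while true; do
    if acquire_lock || heartbeat; then
        # We hold the lock: make sure the shared service IP is configured locally.
        start_service      # placeholder, e.g. add 1.1.1.50 to the local interface
    else
        # Another node holds the lock: make sure the service is not running here.
        stop_service       # placeholder, e.g. remove 1.1.1.50 from the local interface
    fi
    sleep $HEARTBEAT_INTERVAL
done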

Conclusion

In this post, we briefly looked at some of the interesting data manipulation APIs exposed by Etcd. We used these APIs to develop a simple HA application. In the following post, we will look at Etcd and service discovery.

Etcd: cluster bring-up details

Understanding etcd cluster formation


Etcd uses the Raft algorithm for cluster formation and consensus building. In this post we will look at the details of the cluster bring-up process of Etcd.

Overview of Raft Algorithm


In a distributed data store cluster, data consistency is achieved by consensus within the majority of the cluster members (for example, in a 3-node cluster at least 2 nodes must agree). Raft is a leader-based consensus-building algorithm.

Consensus-based systems can be developed using either a leader-based (asymmetric) or leader-less (symmetric) approach. Leader-based systems are simpler to implement and maintain in normal operation. Raft decomposes the problem of achieving consensus into two parts:

  • Normal Operation state

A cluster node participating in Raft can be in one of three roles: leader, follower or candidate. The leader is the node with which the clients communicate, and it is responsible for maintaining the consistency of data. All other nodes are followers and follow the commands from the leader.

From the point of view of the client, the system behaves exactly like a single-node server. The cluster leader maintains consistency of data by making sure that any changes in data are made only by the leader and acknowledged (and replicated) by a majority of cluster nodes.

  • Leader Election state

If the follower nodes in the cluster detect loss of the leader node (if they do not receive heartbeats from the leader), they transition to the candidate role. The candidate nodes then contest a leader election and try to become the leader by asking for votes from the other nodes. If one of the candidates receives a majority, it is chosen as the leader; otherwise, if there is a split vote, a new election is started and the process repeats. Every time an election is started, the election term is incremented.

We will not cover the details of Raft in this blog. There are many good resources on the net that describe Raft.

A 3-Node cluster example


In my previous blog we looked at bringing up a 3-node Etcd cluster based on name-spaces. Let's now examine the details of cluster bootstrap and operation. Complete logs are available for reference.

We will look at the following operations:

  1. Initial cluster formation
  2. Re-Election after leader loss

The following sections describe the working of Etcd/Raft in detail.


Initial cluster formation


  • Node_1 starts and waits for votes from other cluster members; as none of the other members are alive, quorum cannot be reached.
  • Node_1 goes for re-election after incrementing the term.
  • This process continues till term 63 is reached, when Node_2 joins the cluster.
  • Election terms 63 and 64 are unable to elect a leader, but in term 65 Node_2 is elected the leader.
  • Node_3 later joins the cluster. It receives a heartbeat message from the leader and assumes Follower role.

Node_1

...
...

2015/05/23 09:20:15 raft: a19202a30ffa8431 [logterm: 15, index: 2215] sent vote request to c9899e68a83f6626 at term 63
2015/05/23 09:20:15 raft: a19202a30ffa8431 [logterm: 15, index: 2215] sent vote request to 14fb055fe2856bf7 at term 63
2015/05/23 09:20:15 sender: the connection with 14fb055fe2856bf7 became active
2015/05/23 09:20:15 raft: a19202a30ffa8431 received vote rejection from 14fb055fe2856bf7 at term 63
2015/05/23 09:20:15 raft: a19202a30ffa8431 [q:2] has received 1 votes and 1 vote rejections
2015/05/23 09:20:16 etcdserver: publish error: etcdserver: request timed out
2015/05/23 09:20:16 raft: a19202a30ffa8431 is starting a new election at term 63
2015/05/23 09:20:16 raft: a19202a30ffa8431 became candidate at term 64
2015/05/23 09:20:16 raft: a19202a30ffa8431 received vote from a19202a30ffa8431 at term 64
2015/05/23 09:20:16 raft: a19202a30ffa8431 [logterm: 15, index: 2215] sent vote request to c9899e68a83f6626 at term 64
2015/05/23 09:20:16 raft: a19202a30ffa8431 [logterm: 15, index: 2215] sent vote request to 14fb055fe2856bf7 at term 64
2015/05/23 09:20:16 raft: a19202a30ffa8431 received vote rejection from 14fb055fe2856bf7 at term 64
2015/05/23 09:20:16 raft: a19202a30ffa8431 [q:2] has received 1 votes and 1 vote rejections
2015/05/23 09:20:17 raft: a19202a30ffa8431 [term: 64] received a MsgVote message with higher term from 14fb055fe2856bf7 [term: 65]
2015/05/23 09:20:17 raft: a19202a30ffa8431 became follower at term 65
2015/05/23 09:20:17 raft: a19202a30ffa8431 [logterm: 15, index: 2215, vote: 0] voted for 14fb055fe2856bf7 [logterm: 16, index: 30883] at term 65
2015/05/23 09:20:17 raft: a19202a30ffa8431 [logterm: 0, index: 30883] rejected msgApp [logterm: 16, index: 30883] from 14fb055fe2856bf7
2015/05/23 09:20:17 raft.node: a19202a30ffa8431 elected leader 14fb055fe2856bf7 at term 65

Node_2

2015/05/23 09:20:14 etcd: no data-dir provided, using default data-dir ./Node_2.etcd
...
...
2015/05/23 09:20:14 etcdserver: advertise client URLs = http://1.1.1.2:2379,http://1.1.1.2:4001
2015/05/23 09:20:14 etcdserver: loaded cluster information from store: Node_1=http://1.1.1.1:2380,Node_2=http://1.1.1.2:2380,Node_3=http://1.1.1.3:2380
2015/05/23 09:20:14 etcdserver: restart member 14fb055fe2856bf7 in cluster bb883d6508fab334 at commit index 30883
2015/05/23 09:20:14 raft: 14fb055fe2856bf7 became follower at term 16
2015/05/23 09:20:14 raft: newRaft 14fb055fe2856bf7 [peers: [14fb055fe2856bf7,a19202a30ffa8431,c9899e68a83f6626], term: 16, commit: 30883, applied: 30003, lastindex: 30883, lastterm: 16]
2015/05/23 09:20:15 raft: 14fb055fe2856bf7 [term: 16] received a MsgVote message with higher term from a19202a30ffa8431 [term: 63]
2015/05/23 09:20:15 raft: 14fb055fe2856bf7 became follower at term 63
2015/05/23 09:20:15 raft: 14fb055fe2856bf7 [logterm: 16, index: 30883, vote: 0] rejected vote from a19202a30ffa8431 [logterm: 15, index: 2215] at term 63
2015/05/23 09:20:16 raft: 14fb055fe2856bf7 [term: 63] received a MsgVote message with higher term from a19202a30ffa8431 [term: 64]
2015/05/23 09:20:16 raft: 14fb055fe2856bf7 became follower at term 64
2015/05/23 09:20:16 raft: 14fb055fe2856bf7 [logterm: 16, index: 30883, vote: 0] rejected vote from a19202a30ffa8431 [logterm: 15, index: 2215] at term 64
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 is starting a new election at term 64
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 became candidate at term 65
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 received vote from 14fb055fe2856bf7 at term 65
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 [logterm: 16, index: 30883] sent vote request to a19202a30ffa8431 at term 65
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 [logterm: 16, index: 30883] sent vote request to c9899e68a83f6626 at term 65
2015/05/23 09:20:17 sender: error posting to c9899e68a83f6626: dial tcp 1.1.1.3:2380: connection refused
2015/05/23 09:20:17 sender: the connection with c9899e68a83f6626 became inactive
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 received vote from a19202a30ffa8431 at term 65
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 [q:2] has received 2 votes and 0 vote rejections
2015/05/23 09:20:17 raft: 14fb055fe2856bf7 became leader at term 65
2015/05/23 09:20:17 raft.node: 14fb055fe2856bf7 elected leader 14fb055fe2856bf7 at term 65

Node_3

2015/05/23 09:20:46 etcd: no data-dir provided, using default data-dir ./Node_3.etcd
2015/05/23 09:20:46 etcd: already initialized as member before, starting as etcd member...
2015/05/23 09:20:46 etcd: listening for peers on http://1.1.1.3:2380
...
...
2015/05/23 09:20:46 etcdserver: advertise client URLs = http://1.1.1.3:2379,http://1.1.1.3:4001
2015/05/23 09:20:46 etcdserver: loaded cluster information from store: Node_1=http://1.1.1.1:2380,Node_2=http://1.1.1.2:2380,Node_3=http://1.1.1.3:2380
2015/05/23 09:20:46 etcdserver: restart member c9899e68a83f6626 in cluster bb883d6508fab334 at commit index 30883
2015/05/23 09:20:46 raft: c9899e68a83f6626 became follower at term 16
2015/05/23 09:20:46 raft: newRaft c9899e68a83f6626 [peers: [14fb055fe2856bf7,a19202a30ffa8431,c9899e68a83f6626], term: 16, commit: 30883, applied: 30003, lastindex: 30883, lastterm: 16]
2015/05/23 09:20:46 raft: c9899e68a83f6626 [term: 16] received a MsgHeartbeat message with higher term from 14fb055fe2856bf7 [term: 65]
2015/05/23 09:20:46 raft: c9899e68a83f6626 became follower at term 65
2015/05/23 09:20:46 raft.node: c9899e68a83f6626 elected leader 14fb055fe2856bf7 at term 65
2015/05/23 09:20:46 rafthttp: starting client stream to 14fb055fe2856bf7 at term 65
2015/05/23 09:20:46 etcdserver: published {Name:Node_3 ClientURLs:[http://1.1.1.3:2379 http://1.1.1.3:4001]} to cluster bb883d6508fab334

Re-Election after leader loss


  • The leader node, Node_2, is stopped to simulate a node failure.
  • Leader loss is detected by Node_1 and Node_3.
  • Node_1 starts a new election for term 66 and sends out vote requests.
  • Node_1 receives the vote from Node_3 and becomes the leader.

Node_1

2015/05/23 20:57:56 rafthttp: client streaming to 14fb055fe2856bf7 at term 65 has been stopped
2015/05/23 20:57:58 raft: a19202a30ffa8431 is starting a new election at term 65
2015/05/23 20:57:58 raft: a19202a30ffa8431 became candidate at term 66
2015/05/23 20:57:58 raft: a19202a30ffa8431 received vote from a19202a30ffa8431 at term 66
2015/05/23 20:57:58 raft: a19202a30ffa8431 [logterm: 65, index: 60433] sent vote request to c9899e68a83f6626 at term 66
2015/05/23 20:57:58 raft: a19202a30ffa8431 [logterm: 65, index: 60433] sent vote request to 14fb055fe2856bf7 at term 66
2015/05/23 20:57:58 raft.node: a19202a30ffa8431 lost leader 14fb055fe2856bf7 at term 66
2015/05/23 20:57:58 sender: error posting to 14fb055fe2856bf7: dial tcp 1.1.1.2:2380: connection refused
2015/05/23 20:57:58 sender: the connection with 14fb055fe2856bf7 became inactive
2015/05/23 20:57:58 raft: a19202a30ffa8431 received vote from c9899e68a83f6626 at term 66
2015/05/23 20:57:58 raft: a19202a30ffa8431 [q:2] has received 2 votes and 0 vote rejections
2015/05/23 20:57:58 raft: a19202a30ffa8431 became leader at term 66
2015/05/23 20:57:58 raft.node: a19202a30ffa8431 elected leader a19202a30ffa8431 at term 66

Node_2

Node is stopped.

Node_3

2015/05/23 20:57:56 rafthttp: client streaming to 14fb055fe2856bf7 at term 65 has been stopped
2015/05/23 20:57:58 raft: c9899e68a83f6626 [term: 65] received a MsgVote message with higher term from a19202a30ffa8431 [term: 66]
2015/05/23 20:57:58 raft: c9899e68a83f6626 became follower at term 66
2015/05/23 20:57:58 raft: c9899e68a83f6626 [logterm: 65, index: 60433, vote: 0] voted for a19202a30ffa8431 [logterm: 65, index: 60433] at term 66
2015/05/23 20:57:58 raft.node: c9899e68a83f6626 lost leader 14fb055fe2856bf7 at term 66
2015/05/23 20:57:58 raft.node: c9899e68a83f6626 elected leader a19202a30ffa8431 at term 66
2015/05/23 20:57:58 rafthttp: starting client stream to a19202a30ffa8431 at term 66

Conclusion


In this blog we saw the cluster formation and leader election process with Etcd and Raft. We also looked at the process of re-election. I will update the blog with more test scenarios and analyze the behavior of Etcd and Raft in the coming days.

Test Driving Etcd – part 1 (Cluster Setup)

This blog is part of a 3-part series on etcd

Part1: Cluster Setup

Part2: API and HA Application

Part3: Service Discovery and Proxy

What is Etcd?


In a multi-node cluster hosting a service, each node needs to have a consistent view of the shared service configuration. At the same time, it is important that the configuration remains highly available and should be able to survive node failure. Etcd is a highly available, distributed configuration data store designed for a clustered environment.

For a distributed system to remain consistent, its data needs to be in agreement with at least the majority of the cluster members. Consensus building is the process of bringing about an agreement within the majority of the cluster members. A cluster must constantly strive to be in consensus for its data to remain consistent.

Etcd uses the Raft algorithm for building and maintaining consensus in the cluster.

Etcd Test Setup


To try out Etcd we will need a cluster setup. For our testing, we will create a very simple cluster setup using Linux network name-spaces and a separate data-store directory per node on an Ubuntu 14.10 Virtual Machine. The following diagram describes our setup.

[Figure: cluster setup]

The setup has three cluster nodes and a client node for testing. All the nodes are implemented with network name-spaces and connect to a bridge on an OpenVSwitch instance (a Linux bridge can be used as well) using veth pairs.

Node     Name Space    IP
Node1    NS1           1.1.1.1/24
Node2    NS2           1.1.1.2/24
Node3    NS3           1.1.1.3/24
Client   NS4           1.1.1.4/24

Setup scripts


The scripts for bringing up a simple etcd cluster based on namespaces are available here. The script setup.sh brings up a 2-node name-space based setup.

Additional nodes can be added with the add_node.sh script.

  1. Run setup.sh to bring up 2 nodes.

      $ sudo bash setup.sh


chandan@chandan-VirtualBox:~/TA/etcd_prebuilt/etcd/setup$ sudo bash setup.sh
+ cleanup
+ ip netns add ns1
+ ip netns add ns2
+ ip link add veth1-ovs type veth peer name veth1-ns
+ ip link add veth2-ovs type veth peer name veth2-ns
+ ip link set veth1-ns netns ns1
+ ip link set veth2-ns netns ns2
+ ip netns exec ns1 ifconfig veth1-ns 1.1.1.1/24 up
+ ip netns exec ns2 ifconfig veth2-ns 1.1.1.2/24 up
+ ifconfig veth1-ovs up
+ ifconfig veth2-ovs up
+ ovs-vsctl add-br br-ns
+ ovs-vsctl set bridge br-ns datapath_type=netdev
+ ovs-vsctl add-port br-ns veth1-ovs
+ ovs-vsctl add-port br-ns veth2-ovs

  2. Run add_node.sh to add the 3rd node.

     $ sudo bash add_node.sh 3


chandan@chandan-VirtualBox:~/TA/etcd_prebuilt/etcd/setup$ sudo bash add_node.sh 3
+ node_no=3
+ [[ 3 == '' ]]
+ ip netns add ns3
+ ip link add veth3-ovs type veth peer name veth3-ns
+ ip link set veth3-ns netns ns3
+ ip netns exec ns3 ifconfig veth3-ns 1.1.1.3/24 up
+ ovs-vsctl add-port br-ns veth3-ovs
+ ifconfig veth3-ovs up

  3. Run add_node.sh to add a 4th node to act as the client.

     $ sudo bash add_node.sh 4

Start Etcd on cluster nodes


Download a pre-built version of etcd from github and extract the etcd and etcdctl binaries.
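For example (the release version below is only illustrative; any recent release from the project's GitHub releases page will do):

$ curl -L -O https://github.com/coreos/etcd/releases/download/v2.0.13/etcd-v2.0.13-linux-amd64.tar.gz
$ tar xzf etcd-v2.0.13-linux-amd64.tar.gz
$ cp etcd-v2.0.13-linux-amd64/etcd etcd-v2.0.13-linux-amd64/etcdctl .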

Etcd provides flags to configure its runtime behavior; for ease of use, I have wrapped the flag settings into a script, etcd_start.sh. This script assumes the etcd binary is in the current directory.
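The wrapper itself is not reproduced in this post; a minimal sketch of the kind of flags it sets, shown here for Node_1 with the addresses from this setup (the other nodes change the name and IPs accordingly), might be:

./etcd --name Node_1 \
    --initial-advertise-peer-urls http://1.1.1.1:2380 \
    --listen-peer-urls http://1.1.1.1:2380 \
    --listen-client-urls http://1.1.1.1:2379,http://1.1.1.1:4001 \
    --advertise-client-urls http://1.1.1.1:2379,http://1.1.1.1:4001 \
    --initial-cluster Node_1=http://1.1.1.1:2380,Node_2=http://1.1.1.2:2380,Node_3=http://1.1.1.3:2380 \
    --initial-cluster-state new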

You can use screen or terminator to open multiple terminals to monitor the formation of the cluster. The setup script in the previous section created a namespace per node. On 3 different terminals, start a shell in the namespaces using the following command, replacing <node_number> with the actual node number.

$ sudo ip netns exec ns<node_number> bash

This will start a root shell in the namespace.

Use the etcd_start.sh wrapper to start etcd as follows

# bash etcd_start.sh

Cluster formation on my test setup is captured in the following screenshot.

[Screenshot: cluster formation]

Testing basic operation


Etcdctl is the client that can be used to interact with the cluster. Etcdctl needs to be told about the cluster endpoints.


export ETCDCTL_PEERS=http://1.1.1.1:2379,http://1.1.1.2:2379,http://1.1.1.3:2379

  • Let's check the cluster status

Client $ ./etcdctl cluster-health
cluster is healthy
member 14fb055fe2856bf7 is healthy
member a19202a30ffa8431 is healthy
member c9899e68a83f6626 is healthy

Client $ ./etcdctl member list
14fb055fe2856bf7: name=Node_2 peerURLs=http://1.1.1.2:2380 clientURLs=http://1.1.1.2:2379,http://1.1.1.2:4001
a19202a30ffa8431: name=Node_1 peerURLs=http://1.1.1.1:2380 clientURLs=http://1.1.1.1:2379,http://1.1.1.1:4001
c9899e68a83f6626: name=Node_3 peerURLs=http://1.1.1.3:2380 clientURLs=http://1.1.1.3:2379,http://1.1.1.3:4001

Cluster status is healthy

  • Let's try to set a data item and retrieve its value.

Client $ ./etcdctl set name "Chandan Dutta Chowdhury"
Chandan Dutta Chowdhury
Client $
Client $ ./etcdctl get name
Chandan Dutta Chowdhury

Cluster is functional

  • Let's stop the leader and check if the data survived the loss of the leader

Client $ ./etcdctl cluster-health
cluster is healthy
member 14fb055fe2856bf7 is healthy
member a19202a30ffa8431 is unhealthy
member c9899e68a83f6626 is healthy
Client $ ./etcdctl get name
Chandan Dutta Chowdhury
Client $ ./etcdctl set name chandan
chandan

Cluster is still functional after the loss of one member (the leader)

  • Let's stop another member and check the cluster

client $ ./etcdctl get name
chandan
client $ ./etcdctl set name "chandan dutta chowdhury"
Error:  501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
client $

With only one member, the cluster is in a read-only state; a new key-value pair cannot be set

Conclusion


In this post, we looked at Etcd, a distributed, highly available configuration data store for clustered environments. We also looked at the cluster behavior on loss of minority and majority members. We now have a 3-node etcd cluster to try basic etcd features. In the following posts we will look at details of etcd APIs, service discovery and creating a simple application using etcd.