Cisco AppDynamics Community

Mohammed.Rayan · 12-13-2017

Once in a while, you may come across an instance where your events-service cluster gets stuck during startup and its health state doesn't change for a long time.

There are many possible causes; however, one of the most likely is a translog issue.

You can easily check whether a cluster is stuck on a translog issue by running the following command:

curl -s -XGET http://<Events-service-hostname>:9200/_cat/recovery | grep translog | awk '{ print $1 }'

Note: Port 9200 might be disabled. If it is, enable it first by following the article Events-Service-debugging-port-disabled-by-default-in-version-4-3, and then run the command above.
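An index with several recovering shards shows up once per shard, so the one-liner above can repeat index names. A slightly more robust sketch, assuming the Elasticsearch version bundled with your Events Service supports the standard `h` column selector on `_cat` APIs, requests only the `index` and `stage` columns and de-duplicates the result. The function name `translog_indices` is hypothetical and is not part of any AppDynamics tooling.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads "_cat/recovery?h=index,stage" output on stdin
# (one shard per line: <index> <stage>) and prints every index that has at
# least one shard in the "translog" stage, each name only once.
translog_indices() {
  awk '$2 == "translog" { print $1 }' | sort -u
}
```

Usage: `curl -s "http://<Events-service-hostname>:9200/_cat/recovery?h=index,stage" | translog_indices`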

This prints the names of the indexes that are stuck in the translog stage. The fix is straightforward: set the index's replica count (number_of_replicas) to 0, and then set it back to your original replication factor (for example, 1 or 2). Use the following steps:

To find the current replication factor of an index, run:

curl -XGET http://<Events-service-hostname>:9200/<index_name>/_settings
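The `_settings` response is JSON, so the replica count is nested inside it. A minimal sketch for extracting just the number, assuming GNU or BSD `grep -E` and that the value is reported under the `"number_of_replicas"` key (Elasticsearch returns index setting values as quoted strings such as `"1"`). The function name `replicas_from_settings` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads a _settings response on stdin and prints the
# number_of_replicas value. Accepts quoted or unquoted values and both
# compact and ?pretty output; prints one line per matching index.
replicas_from_settings() {
  grep -oE '"number_of_replicas"[[:space:]]*:[[:space:]]*"?[0-9]+' \
    | grep -oE '[0-9]+$'
}
```

Usage: `curl -s http://<Events-service-hostname>:9200/<index_name>/_settings | replicas_from_settings`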

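To update it, run the following command twice: first with <replication factor> set to 0, and then again with your original replication factor: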

curl -XPUT http://<Events-service-hostname>:9200/<index_name>/_settings -H 'Content-Type: application/json' -d '{
    "index" : {
        "number_of_replicas" : <replication factor>
    }
}'
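The two updates can be wrapped in a small Bash script. This is a sketch, not an official AppDynamics procedure: `ES_HOST`, `DRY_RUN`, `set_replicas`, and `reset_replicas` are hypothetical names. It sends exactly the two requests described above (replicas to 0, then back to the original value) and does not wait between them. The `Content-Type` header is required by newer Elasticsearch releases and should be ignored by older ones.

```shell
#!/usr/bin/env bash
set -euo pipefail

ES_HOST="${ES_HOST:-localhost}"   # hypothetical: your Events Service host
DRY_RUN="${DRY_RUN:-0}"           # 1 = print requests instead of sending them

# PUT a new number_of_replicas value for a single index.
set_replicas() {
  local index="$1" replicas="$2"
  case "$replicas" in
    ''|*[!0-9]*) echo "invalid replica count: $replicas" >&2; return 1 ;;
  esac
  local body="{\"index\": {\"number_of_replicas\": ${replicas}}}"
  local url="http://${ES_HOST}:9200/${index}/_settings"
  if [ "$DRY_RUN" = "1" ]; then
    echo "PUT ${url} ${body}"
  else
    curl -s -XPUT "$url" -H 'Content-Type: application/json' -d "$body"
    echo   # curl's JSON reply has no trailing newline
  fi
}

# Drop the index to 0 replicas, then restore its original replication factor.
reset_replicas() {
  local index="$1" original="$2"
  set_replicas "$index" 0
  set_replicas "$index" "$original"
}
```

Usage: `ES_HOST=<Events-service-hostname> reset_replicas <index_name> 1`. Running it first with `DRY_RUN=1` prints the two requests so you can check them before anything is sent.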


Why is the Events-service cluster stuck in a translog state?