Cisco AppDynamics Community

Mohammed.Rayan · ‎12-11-2017

If an Events Service cluster is stuck in a RED state because of unassigned shards, you need to identify the shards, determine the cause, and reallocate them. The health check output looks similar to this:

"events-service-api-store / elasticsearch-singlenode-module" : {
    "healthy" : false,
    "message" : "Current [appdynamics-events-service-cluster] cluster state: [RED], data nodes: [1], nodes: [1], active shards: [3], relocating shards: [0], initializing shards: [0], unassigned shards: [8], timed out: [false]"
  }
}

How do I find the cause of the issue?

Use the following command to find the root cause of the issue:

curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

Elasticsearch gives the following reasons for a shard to be in an unassigned state:

`INDEX_CREATED`	Unassigned as a result of an API creation of an index.
`CLUSTER_RECOVERED`	Unassigned as a result of a full cluster recovery.
`INDEX_REOPENED`	Unassigned as a result of opening a closed index.
`DANGLING_INDEX_IMPORTED`	Unassigned as a result of importing a dangling index.
`NEW_INDEX_RESTORED`	Unassigned as a result of restoring into a new index.
`EXISTING_INDEX_RESTORED`	Unassigned as a result of restoring into a closed index.
`REPLICA_ADDED`	Unassigned as a result of explicit addition of a replica.
`ALLOCATION_FAILED`	Unassigned as a result of a failed allocation of the shard.
`NODE_LEFT`	Unassigned as a result of the node hosting it leaving the cluster.
`REROUTE_CANCELLED`	Unassigned as a result of explicit cancel reroute command.
`REINITIALIZED`	When a shard moves from started back to initializing, for example, with shadow replicas.
`REALLOCATED_REPLICA`	A better replica location is identified and causes the existing replica allocation to be canceled.
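
To see at a glance which of these reasons apply, the output of the _cat/shards command can be tallied per reason. This is only a minimal sketch, assuming the five-column format requested above (index, shard, prirep, state, unassigned.reason); the function name count_unassigned_by_reason is made up for illustration.

```shell
# Hypothetical helper: reads `_cat/shards` rows on stdin
# (columns: index shard prirep state unassigned.reason)
# and prints "<reason> <count>" for each unassigned reason, sorted by reason.
count_unassigned_by_reason() {
  awk '$4 == "UNASSIGNED" { count[$5]++ }
       END { for (r in count) print r, count[r] }' | sort
}

# Usage against a live cluster (assumes port 9200 is reachable):
#   curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | count_unassigned_by_reason
```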

Use the following health check, shards, and indices commands to determine the number of shards and which indices are in a RED state:

curl 'http://<events_service_machine>:9081/healthcheck?pretty=true'
curl 'http://<events_service_machine>:9200/_cat/shards?v'
curl 'http://<events_service_machine>:9200/_cat/indices?v'

Output from the above health check commands:

 "events-service-api-store / elasticsearch-singlenode-module" : {
    "healthy" : false,
    "message" : "Current [appdynamics-events-service-cluster] cluster state: [RED], data nodes: [1], nodes: [1], active shards: [3], relocating shards: [0], initializing shards: [0], unassigned shards: [12], timed out: [false]"
  }
}


appdynamics_meters_v2                0 p UNASSIGNED CLUSTER_RECOVERED 
appdynamics_meters_v2                1 p UNASSIGNED CLUSTER_RECOVERED 
appdynamics_api_keys_v1              0 p UNASSIGNED CLUSTER_RECOVERED 
appdynamics_api_keys_v1              1 p UNASSIGNED CLUSTER_RECOVERED 
appdynamics_accounts                 0 p UNASSIGNED CLUSTER_RECOVERED 
appdynamics_accounts                 1 p UNASSIGNED CLUSTER_RECOVERED 
event_type_metadata                  0 p UNASSIGNED CLUSTER_RECOVERED 
event_type_metadata                  1 p UNASSIGNED CLUSTER_RECOVERED 
event_type_extracted_fields          0 p UNASSIGNED CLUSTER_RECOVERED 
event_type_extracted_fields          1 p UNASSIGNED CLUSTER_RECOVERED
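
Each row in the listing above carries the index name, the shard number, and a p (primary) or r (replica) flag. A small filter, sketched here under the assumption that the input has exactly these leading columns, pulls out the index/shard pairs for one kind of shard. The name unassigned_pairs is illustrative and not part of any tool.

```shell
# Hypothetical helper: prints "<index> <shard>" for every UNASSIGNED row
# whose prirep column matches $1 ("p" for primary, "r" for replica).
# Expects rows of: index shard prirep state [reason] on stdin.
unassigned_pairs() {
  awk -v kind="$1" '$3 == kind && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Usage against a live cluster (assumes port 9200 is reachable):
#   curl -s 'localhost:9200/_cat/shards' | unassigned_pairs p
```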

How do I reallocate the shards?

The solution is to manually allocate the shards using the reroute API provided by Elasticsearch.

The example shard listing above shows 10 unassigned shards; each one must be allocated separately.

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [ {
    "allocate" : {
      "index" : "appdynamics_accounts",
      "shard" : 0,
      "node" : "node-12345798-7867-4ea1-b546-557a6bc546afg",
      "allow_primary" : true
    }
  } ]
}'
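
Since the request body is the same for every shard apart from four values, it may be convenient to generate it. The following is a rough sketch rather than an official tool: reroute_body is a hypothetical function name, and the body it prints mirrors the allocate command shown above.

```shell
# Hypothetical helper: prints the JSON body for one "allocate" reroute command.
# Arguments: index name, shard number, node name or IP, allow_primary (true|false).
reroute_body() {
  printf '{"commands":[{"allocate":{"index":"%s","shard":%s,"node":"%s","allow_primary":%s}}]}\n' \
    "$1" "$2" "$3" "$4"
}

# Usage (replace my-node with the name or IP of your own node):
#   curl -XPOST 'localhost:9200/_cluster/reroute' -d "$(reroute_body appdynamics_accounts 0 my-node true)"
```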

Replace the node value with the IP address or name of your own node, and then run the appropriate script below.

Without Primary (allow_primary set to false):

# List the unassigned shards, keep only the index name and shard number, and reroute each one.
curl 'http://localhost:9200/_cat/shards?v' 2>/dev/null | grep UNASSIGNED | sed -e 's/\([^ ]*\) *\([^ ]*\) *.*/\1 \2/' | while read -r line
do
  indexname=$(echo "$line" | cut -d" " -f1)
  shardid=$(echo "$line" | cut -d" " -f2)
  echo "$indexname $shardid"
  # Replace 10.90.3.125 with the IP address or name of your node.
  curl -XPOST 'http://localhost:9200/_cluster/reroute' -d "{
    \"commands\": [{
      \"allocate\": {
        \"index\": \"$indexname\",
        \"shard\": $shardid,
        \"node\": \"10.90.3.125\",
        \"allow_primary\": false
      }
    }]
  }"
done

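For Primary shards (allow_primary set to true):

# Same loop, but allow_primary forces allocation of primary shards.
# Data loss is possible if the target node holds no copy of the shard.
curl 'http://localhost:9200/_cat/shards?v' 2>/dev/null | grep UNASSIGNED | sed -e 's/\([^ ]*\) *\([^ ]*\) *.*/\1 \2/' | while read -r line
do
  indexname=$(echo "$line" | cut -d" " -f1)
  shardid=$(echo "$line" | cut -d" " -f2)
  echo "$indexname $shardid"
  # Replace 10.90.3.125 with the IP address or name of your node.
  curl -XPOST 'http://localhost:9200/_cluster/reroute' -d "{
    \"commands\": [{
      \"allocate\": {
        \"index\": \"$indexname\",
        \"shard\": $shardid,
        \"node\": \"10.90.3.125\",
        \"allow_primary\": true
      }
    }]
  }"
done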
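
Before running either loop against a live cluster, it can help to print what would be sent. The sketch below is a dry run only and sends nothing; plan_reroutes is a hypothetical name, and it assumes the first four columns of the _cat/shards output are index, shard, prirep, and state.

```shell
# Hypothetical dry-run helper: reads `_cat/shards` rows on stdin and prints,
# without sending anything, one line per unassigned shard describing the
# reroute request that would be issued.
# Arguments: node name or IP, allow_primary (true|false).
plan_reroutes() {
  awk -v node="$1" -v primary="$2" '$4 == "UNASSIGNED" {
    printf "index=%s shard=%s node=%s allow_primary=%s\n", $1, $2, node, primary
  }'
}

# Usage (assumes port 9200 is reachable; 10.90.3.125 is the example node from the scripts):
#   curl -s 'http://localhost:9200/_cat/shards?v' | plan_reroutes 10.90.3.125 false
```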

Anonymous · ‎03-27-2018

Here is a PowerShell equivalent if you are running on Windows and don't want to install curl:

$postParams=@'
{
	"commands": [{
		"allocate": {
			"index": "appdynamics_accounts",
			"shard": 0,
			"node": "node-fa635229-8114-4ff8-a3b6-eee4449fkdsaaaa",
			"allow_primary": true
		}
	}]
}
'@

Invoke-WebRequest -Uri http://localhost:9200/_cluster/reroute -ContentType "application/json" -Method POST -Body $postParams

Anonymous · ‎04-04-2019

Don't forget: you may need to enable the HTTP property (ad.es.node.http.enabled=true) in the "events-service-api-store.properties" file under events-service/processor/conf to allow access on port 9200.
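
A quick way to check that setting is to grep the properties file. This is a minimal sketch; http_enabled is a made-up function name, and it assumes the property sits on its own uncommented line.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given properties file
# contains an uncommented ad.es.node.http.enabled=true line.
http_enabled() {
  grep -Eq '^[[:space:]]*ad\.es\.node\.http\.enabled[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"
}

# Usage (path relative to the directory that contains events-service):
#   http_enabled events-service/processor/conf/events-service-api-store.properties && echo "HTTP enabled"
```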