Observability of a Chaos Engineering experiment in AppDynamics


Chaos Engineering & Observability

Estimated Reading time: 4 mins


Steady-state metrics are not the only thing you should observe when running your experiments. Observability in chaos engineering extends to the experiments themselves. One important implication of observability is that once you observe deviations in the application's steady-state metrics, you should also be able to correlate them with the events that could have caused them.

The Event Browser in AppDynamics is one place where you can discover many of the events that agents publish or applications create. Examples include health rule violations, application restarts, JVM crashes, and any custom events that a developer chooses to publish.


Read more about AppDynamics Events at



Gremlin is a popular tool for bringing chaos engineering practices and culture into your organization. It can help you design, run, and analyze chaos engineering experiments and collaborate on them. You can read more about this platform on its website.


Read more about Gremlin CE platform at

Use case

If you design and run your chaos experiments using Gremlin's platform and observe your application's steady-state metrics using AppDynamics, there is no straightforward way to observe both the experiment and the metrics in a single monitoring pane. We try to solve this problem using AppDynamics Events.



Publishing Gremlin Experiment into AppDynamics Events

Once initiated, a Gremlin experiment goes through various stages that describe its lifecycle. When an experiment is in the 'RUNNING' stage, the attack is being run on the target. You can query the Gremlin attack API and, if the attack is found to be RUNNING, publish a corresponding event in AppDynamics. Here is how to do it with a short snippet in Python 3.6.


Poll Gremlin Experiment Status


import time

import requests

def pollExperimentStatus(guid):
    # The Gremlin attack API base URL is left blank here, as in the original listing
    url = '' + str(guid)
    headers = {"Authorization": "Key xxxxxxxx"}
    published = False  # ensures the AppDynamics event is raised only once
    try:
        # Poll experiment status and identify STAGE from the response
        stage = requests.get(url, headers=headers).json()['stage']
        while stage != "Successful":
            if stage == "Running" and not published:
                raiseAppdEvent(guid)  # publish the event in AppDynamics
                published = True
            time.sleep(2)  # poll every 2 seconds
            stage = requests.get(url, headers=headers).json()['stage']
    except Exception as err:
        print(f'Error occurred: {err}')
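
The loop above exits only when the stage becomes "Successful", so an experiment that ends in any other way would be polled indefinitely. A minimal sketch of a more defensive variant follows. The helper name `wait_for_stage`, its parameters, and the timeout default are hypothetical and not part of Gremlin's or AppDynamics' APIs; only the "Running" and "Successful" stage names come from the listing above, and any further terminal stages you pass in are your own assumption about the API.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_stage(
    fetch_stage: Callable[[], str],
    on_running: Callable[[], None],
    terminal_stages: Iterable[str] = ("Successful",),
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll fetch_stage() until a terminal stage or the timeout is reached.

    Calls on_running() once, the first time the stage is "Running".
    Returns the terminal stage, or None if the timeout expired first.
    """
    terminal = set(terminal_stages)
    deadline = time.monotonic() + timeout
    notified = False
    while True:
        stage = fetch_stage()
        if stage == "Running" and not notified:
            on_running()  # e.g. publish the AppDynamics event
            notified = True
        if stage in terminal:
            return stage
        if time.monotonic() >= deadline:
            return None  # gave up waiting; the caller decides what to do
        sleep(interval)
```

With the functions from this article, `fetch_stage` could be a small lambda that performs the `requests.get(...)` call and returns the `'stage'` field, and `on_running` could be `lambda: raiseAppdEvent(guid)`. Passing the stage fetcher and the sleep function in as arguments also makes the loop easy to exercise without touching the network.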



Publish Event in AppDynamics


import requests

def raiseAppdEvent(guid):
    try:
        appdConfig, a_range = getConfig('appdSettings')  # custom function to get url, etc.
        eventsURL = appdConfig['eventsURL']
        token = getAppdToken()  # custom function for OAuth token
        eventsHeaders = {'Authorization': 'Bearer {}'.format(token)}
        eventsURL = eventsURL + "?" + "severity=INFO&summary=Gremlin Experiment {}&eventtype=CUSTOM&customeventtype=Gremlin".format(guid)
        # Post the custom event to the AppDynamics events endpoint
        response = requests.post(eventsURL, headers=eventsHeaders)
        response.raise_for_status()  # surface HTTP errors in the handler below
    except Exception as err:
        print(f'ERROR: event : {err}')
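
The query string built in raiseAppdEvent contains spaces in the summary text. As a small sketch, the same parameters can be encoded explicitly with the standard library before the request is sent. The helper name `build_event_url` and the example controller URL are hypothetical; the parameter names and values are the ones used in the listing above.

```python
from urllib.parse import urlencode


def build_event_url(events_url: str, guid: str) -> str:
    """Append URL-encoded query parameters for a custom Gremlin event."""
    params = {
        "severity": "INFO",
        "summary": "Gremlin Experiment {}".format(guid),
        "eventtype": "CUSTOM",
        "customeventtype": "Gremlin",
    }
    # urlencode escapes spaces and other reserved characters in the values
    return events_url + "?" + urlencode(params)


# Placeholder URL for illustration only; use your controller's events URL
example = build_event_url("https://controller.example.com/events", "abc-123")
```

An alternative with the same effect is to pass the dictionary to `requests.post` through its `params` argument and let `requests` do the encoding.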



AppDynamics complements SRE practice

This simple mechanism of querying experiment data (in this case, the experiment GUID) from Gremlin and using it to publish an event in AppDynamics proves really helpful when observing your application during experiments.

By registering your experiment details in AppDynamics, you can use a single pane of monitoring to observe your experiments. Beyond observing, you can also create policies that trigger custom actions, such as notifying your service owners about the start and end of an experiment, by leveraging your existing monitoring infrastructure and investment.


Happy Experimenting!