Confusion with Alerts setup and data reported by the metric browser

Hello. 

I created an internal ticket for this, but I thought I would also post it here to see whether other people have run into similar issues.

We are trying to tune our alerts so that we get meaningful notifications.

We used the metric browser to set up our alerts.

Here is an example that happened to us last night.

Our alert is set up for two of our business transactions (BTs). The way we understand it is as follows:
Use the last 10 minutes of data.
If the average is greater than 20 times the baseline StDev for the last 30 days, trigger a warning alert.
If the average is greater than 30 times the baseline StDev for the last 30 days, trigger a critical alert.

We received an alert that says:
Average Response Time (ms)'s value 135.0 was greater than baseline-based calculated value 3.0 by 30.0 standard deviation(s) 0.6. Baseline used here is 'All data - Last 30 days' for the last 10 minutes
So the critical threshold is 30 * 0.6 + 3 = 21 ms.
Based on that, we would expect an alert, and we did get one.
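
To make that arithmetic explicit, here is a minimal Python sketch of how we read the alert message. It assumes the threshold is the baseline value plus the multiplier times the standard deviation, which is our interpretation of the message and not something we have confirmed with AppDynamics. The function names are ours.

```python
def threshold(baseline_ms, stdev_ms, multiplier):
    """Baseline plus N standard deviations (our assumed reading of the alert message)."""
    return baseline_ms + multiplier * stdev_ms


def classify(observed_ms, baseline_ms, stdev_ms, warn_mult=20, crit_mult=30):
    """Return 'critical', 'warning', or 'ok' for an observed average."""
    if observed_ms > threshold(baseline_ms, stdev_ms, crit_mult):
        return "critical"
    if observed_ms > threshold(baseline_ms, stdev_ms, warn_mult):
        return "warning"
    return "ok"


# Values taken from the alert message: baseline 3.0 ms, StDev 0.6 ms, observed 135.0 ms.
print(round(threshold(3.0, 0.6, 30), 1))  # critical threshold in ms
print(classify(135.0, 3.0, 0.6))
```

With these inputs the critical threshold comes out at 21 ms, so 135 ms is classified as critical, which matches the alert we received.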

 

However, when we use the metric browser, we get a totally different picture.

In the metric browser, with the 'All data - Last 30 days' baseline selected and baseline shading enabled, we estimate that 1x StDev is about 5 ms and 5x StDev is about 8.2 ms.
By that estimate, 30 * 5 = 150 ms, which is above the reported 135 ms, so we would have expected a warning, not a critical alert.
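
Another way to look at the mismatch is to ask how many standard deviations above the baseline the reported 135 ms sits under each StDev figure. This is only a rough sketch: it assumes the alert compares (value - baseline) / StDev against the multiplier, and it reuses the baseline of 3.0 ms from the alert message. The helper name is ours.

```python
def deviations_above_baseline(observed_ms, baseline_ms, stdev_ms):
    """Number of standard deviations the observed value sits above the baseline."""
    return (observed_ms - baseline_ms) / stdev_ms


observed_ms = 135.0  # value reported in the alert message
baseline_ms = 3.0    # baseline reported in the alert message

# StDev candidates: 0.6 ms from the alert message,
# about 5 ms estimated from the metric browser shading.
for label, stdev_ms in [("alert message", 0.6), ("metric browser estimate", 5.0)]:
    n = deviations_above_baseline(observed_ms, baseline_ms, stdev_ms)
    print(f"{label}: {n:.1f} standard deviations above baseline")
```

With a StDev of 0.6 ms the value is roughly 220 deviations out, far past the 30x critical rule. With 5 ms it is about 26.4, which falls between the 20x warning rule and the 30x critical rule.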

On top of that, the metric browser is giving confusing information:
At 12:40 the observed latency is 3 ms.
At 12:50 it is 10 ms.
At 01:00 it is 3 ms again.

In addition, the metric browser says that the StDev is 0.848 when we highlight the baseline.

And when we look at the values produced around the time the alert was raised, we get an average of 10 ms, which is not even close to the 135 ms reported by AppDynamics.

So we are totally confused about which numbers to use to set our alerts properly.

Does anyone have information on this?

 

Attachments: nine screenshots taken on 2019-01-16 between 9:32 AM and 9:56 AM.
