Time-based health rules not possible?

Stewart_Berg
Adventurer

Why doesn't AppD have time settings for health rules?

Examples:

"trigger alert only if threshold is breached every polling minute for X minutes"

"trigger alert if threshold is breached X number of times in the last X minutes"

 

For business transactions, I'm using the standard baseline settings for average response time health rules, but my business transactions are filling up my inbox with violations that are not really useful. I can't turn these rules off or I'll miss any serious issues that show up (a continuously sick node or worse).

 

I need health rules to only trigger if average response time, for a specific node, crosses a threshold AND also stays above a threshold for x minutes/seconds. There is no need to alert out for a single transaction that succeeds but takes a long time, when 99.9% are within limits. I can't find a simple way to filter this noise out without watering down detection across the larger pool of nodes.
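
As a minimal sketch of the two time-based conditions described above (plain Python for illustration only, not AppDynamics configuration): the function names, the assumption of one average-response-time sample per minute, and the sample values are all hypothetical.

```python
from typing import Sequence


def breached_every_minute(samples: Sequence[float], threshold: float, minutes: int) -> bool:
    """True only if each of the last `minutes` one-minute samples is above the threshold."""
    if minutes <= 0 or len(samples) < minutes:
        return False
    return all(value > threshold for value in samples[-minutes:])


def breached_n_times(samples: Sequence[float], threshold: float,
                     window: int, min_breaches: int) -> bool:
    """True if at least `min_breaches` of the last `window` samples are above the threshold."""
    if window <= 0:
        return False
    recent = samples[-window:]
    return sum(1 for value in recent if value > threshold) >= min_breaches


# Hypothetical per-minute average response times (ms) for one node.
art_ms = [120, 130, 900, 125, 128, 950, 940, 960]

print(breached_every_minute(art_ms, threshold=500, minutes=3))            # True
print(breached_n_times(art_ms, threshold=500, window=5, min_breaches=4))  # False
```

With these made-up numbers, the isolated 900 ms spike does not satisfy either condition on its own; only the sustained run at the end does.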

 

The only way I can see to do this is to play around with the baseline, which then raises the ART and reduces effectiveness for large scale problems. Please enlighten me if I'm missing something.

 

This time-based option is standard on most other APMs I've used. I see you can build custom rules, but there seems to be little documentation on how to do this (no examples that I can find).

 

TIA.

5 REPLIES

gurmitsa
Architect

Hi Stewart,

 

To your question:

I need health rules to only trigger if average response time, for a specific node, crosses a threshold AND also stays above a threshold for x minutes/seconds. There is no need to alert out for a single transaction that succeeds but takes a long time, when 99.9% are within limits. I can't find a simple way to filter this noise out without watering down detection across the larger pool of nodes.

>>>

You can achieve this by using the configuration "Use the last xx minutes of data"; see point 5 at this link:

https://docs.appdynamics.com/display/PRO44/Configure+Health+Rules#ConfigureHealthRules-UsetheHealthR...

 

Let me know if the above helps.

 

Also, in the policy, have you turned on or checked the box for Slow transactions under Other events?

If so, you may want to disable it, as it will send out an email for slow transactions every minute.

 

Thanks,

Gurmit.

Thanks Gurmit.

 

This solution does not work, though. It would still trigger alerts regardless of whether the evaluation time is 1 minute, 30 minutes, or 360 minutes. It only takes one transaction going beyond the threshold momentarily during the evaluation time window, and an alert is triggered. The only way to prevent this, that I can see, is to use the BT average across all nodes (not "per node" -> if any node has a spike... send an alert). The downside is that 1 sick node is likely to be hidden by the other 50 nodes in the pool. There doesn't seem to be a nice middle ground.
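
The trade-off described here, where a pool-wide average hides a single sick node, can be illustrated with some quick arithmetic. This is only a sketch in plain Python; the node names, response times, and the 500 ms threshold are invented for the example.

```python
from statistics import mean


def pool_average(per_node_art: dict) -> float:
    """Average response time across every node in the pool."""
    return mean(list(per_node_art.values()))


def nodes_over(per_node_art: dict, threshold: float) -> list:
    """Names of the nodes whose own average is above the threshold."""
    return sorted(name for name, art in per_node_art.items() if art > threshold)


# Hypothetical pool: 50 healthy nodes at 200 ms and one sick node at 2000 ms.
nodes = {f"node-{i:02d}": 200.0 for i in range(1, 51)}
nodes["node-51"] = 2000.0

threshold_ms = 500.0
print(pool_average(nodes) > threshold_ms)  # False: the pool average is about 235 ms
print(nodes_over(nodes, threshold_ms))     # ['node-51']
```

The pool-wide check stays quiet even though one node is ten times slower than the rest, while the per-node check catches it (and, by the same logic, would also catch every momentary spike).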

 

These are specific BTs, not the default BT health alert that includes all transactions. So, I cannot select/un-select slow transactions.

 

Thanks.

I'm having a similar issue with a different metric. A quick 1-minute spike causes the average to go above 3 standard deviations and generates an alert. By the time I can check out the alert, the spike is gone and everything is back to normal. The application is running normally.
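
To make the "3 standard deviations" situation concrete, here is a small sketch in plain Python. It is not how AppDynamics computes its baselines; the history values, the use of the population standard deviation, and the function name are assumptions made for illustration.

```python
from statistics import mean, pstdev


def upper_bound(history, sigmas: float = 3.0) -> float:
    """Mean of the history plus `sigmas` population standard deviations."""
    return mean(history) + sigmas * pstdev(history)


# Hypothetical per-minute metric values during normal operation.
history = [100, 102, 98, 101, 99, 100, 103, 97]
bound = upper_bound(history)  # roughly 105.6 for these values

spike = 140
print(spike > bound)  # True: one sample is enough to cross the bound
```

When the metric is normally very stable, the standard deviation is small, so even a brief and harmless spike lands well outside the 3-sigma band.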

 

Is there any solution to this?

Do we have any solution for this issue?

 

E.g., I have an alert set up. It is in the started state.

Once it is triggered, it goes to Ongoing critical and continues to alert for as long as it stays in that state.

 

I want the alert to trigger as follows.

Once the alert starts, there should be an alert.

After a certain amount of time, say 4 hours, I should get another alert. I should not get any alerts in between.

 

Hi,

For this you can use the option "Wait time after violation" to make sure no
alert is generated for that many minutes for the same health rule.
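
The behaviour asked for above (one alert at the start, then nothing until a wait period has passed) can be sketched in plain Python. This only illustrates the idea of a wait time between notifications; it is not the implementation of the "Wait time after violation" option, and the class and method names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


class RepeatSuppressor:
    """Lets a notification through only if `wait` has elapsed since the last one sent."""

    def __init__(self, wait: timedelta) -> None:
        self.wait = wait
        self._last_sent: Optional[datetime] = None

    def should_notify(self, now: datetime) -> bool:
        if self._last_sent is None or now - self._last_sent >= self.wait:
            self._last_sent = now
            return True
        return False


# Hypothetical timeline for a rule that stays in violation.
start = datetime(2024, 1, 1, 0, 0)
suppressor = RepeatSuppressor(wait=timedelta(hours=4))
checks = [start,
          start + timedelta(hours=1),
          start + timedelta(hours=3, minutes=59),
          start + timedelta(hours=4)]
print([suppressor.should_notify(t) for t in checks])  # [True, False, False, True]
```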

Thank you,
Gurmit.