I'm going crazy with this, even though we already have several weeks of data in our controller. A lot of our health rules fire because of edge cases where the baseline is 0 or the standard deviation is 0. The math is awful:
When the baseline is 0 (or very close to it), even a lax rule such as "1000% above baseline" fires on a very small value. In some cases we can add conditions that require a minimum volume of calls, errors, or slow calls, or a minimum response time, but those don't completely solve the issue.
When the baseline standard deviation is 0 (or very low), any rule, even one as generous as 10 standard deviations above baseline, is bound to violate as soon as any value above the baseline comes along. Again, other conditions can sometimes be added, but they don't completely solve the issue, because for some rule types the extra options are extremely limited (for example, only a hard minimum on the value in question).
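To make the two edge cases concrete, here is a minimal sketch of the arithmetic behind the two baseline-relative conditions. The helper names (`percent_threshold`, `stddev_threshold`, `violates`) are hypothetical and only mirror the comparisons described above; this is not how the controller is implemented internally.

```python
# Hypothetical helpers that mirror the arithmetic of the two baseline-relative
# conditions described above. This is an illustration, not AppDynamics code.


def percent_threshold(baseline: float, percent_over: float) -> float:
    """Threshold for 'value is more than percent_over % above baseline'."""
    return baseline * (1 + percent_over / 100)


def stddev_threshold(baseline: float, stddev: float, k: float) -> float:
    """Threshold for 'value is more than k standard deviations above baseline'."""
    return baseline + k * stddev


def violates(value: float, threshold: float) -> bool:
    return value > threshold


# Busy metric: a lax rule leaves plenty of headroom (thresholds 550.0 and 130.0).
print(percent_threshold(50.0, 1000), stddev_threshold(50.0, 8.0, 10))

# Idle metric (baseline 0, std dev 0): both thresholds collapse to 0.0,
# so a single reading of 1 violates even the laxest rule.
print(violates(1.0, percent_threshold(0.0, 1000)))
print(violates(1.0, stddev_threshold(0.0, 0.0, 10)))
```

Because both thresholds are multiples of (or offsets from) the baseline statistics, no choice of percentage or number of deviations can move them away from 0 once those statistics are 0.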
Are there other solutions I'm missing?
I've seen posts here in the community going back to 2015 that ask about this, with lackluster responses from staff. Is there any chance of getting this seemingly simple feature onto the roadmap?
Jeannie, thanks for the general tips, but we need a real solution for this. Has there been any progress on this specific issue? Has anyone come up with a workaround?
@Mac.Newbold, @John.Panelli, and @Moe.Saidi: thank you for weighing in on this health rules topic. Let's continue to track the issue here.
Here are a few general comments and tips on health rules. They don't necessarily solve your particular problem, but I'm posting them for the benefit of the Community, and you may find some value in them as well:
- Tuning a large number of health rules is hard.
- Values near 0 get tricky.
- The root of the problem may be a large number of business transactions.
- Consider focusing your health rules: create fewer of them, or apply them only to a smaller set of critical metrics.
- Creating health rules is an iterative process. It's okay if things aren't perfect at first, or not everything has granular metrics, as long as we continue to improve them.
Workaround suggested by an AppDynamics consultant:
For this reason, I generally avoid alerting purely on baselines. One option is to create a composite health rule with:
1. Load as a criterion: Calls/Min > some meaningful value
2. Baseline: value is more than a meaningful number (x) of standard deviations above the baseline
3. Slowness indicator: (Slow Calls + Very Slow Calls) / Calls per Minute * 100. This lets you say, "Alert me only when a breach of the baseline impacts x% of the users."
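As a rough sketch of how the three criteria above combine, the snippet below ANDs them together. `MinuteSample`, `composite_violates`, and the default thresholds (100 calls/min, 3 deviations, 5% slow) are hypothetical placeholders for illustration, not AppDynamics settings.

```python
from dataclasses import dataclass


@dataclass
class MinuteSample:
    """Hypothetical one-minute snapshot of a single business transaction."""
    calls_per_min: float
    avg_response_ms: float
    slow_calls: int
    very_slow_calls: int


def slowness_percent(s: MinuteSample) -> float:
    """(slow + very slow calls) / calls per minute * 100."""
    if s.calls_per_min <= 0:
        return 0.0
    return (s.slow_calls + s.very_slow_calls) / s.calls_per_min * 100


def composite_violates(s: MinuteSample, baseline_ms: float, stddev_ms: float,
                       min_calls: float = 100, k: float = 3,
                       min_slow_pct: float = 5.0) -> bool:
    load = s.calls_per_min > min_calls                        # 1. meaningful load
    above = s.avg_response_ms > baseline_ms + k * stddev_ms   # 2. x deviations
    impact = slowness_percent(s) > min_slow_pct               # 3. x% of users hit
    return load and above and impact


# Baseline and std dev are both 0 and one call out of 500 is slow:
# condition 2 is trivially true, but only 0.2% of calls are affected.
print(composite_violates(MinuteSample(500, 1.0, 1, 0), 0.0, 0.0))          # False
# A real slowdown: 80 of 500 calls (16%) are slow or very slow.
print(composite_violates(MinuteSample(500, 900.0, 60, 20), 200.0, 50.0))   # True
```

The design point is that condition 3 does not depend on the baseline at all, so it still holds a single outlier back when the baseline-relative condition has degenerated to "anything above 0".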
This issue appears to be quite popular, given the number of times it has been asked about on the community forums.
I'm requesting an enhancement to AppDynamics health rules so that a rule cannot violate when the baseline is 0. This could be a checkbox, or the ability to add a condition such as "baseline not equal to 0".
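A minimal sketch of what the requested option could look like, assuming a simple "k standard deviations above baseline" rule. `baseline_rule_violates`, `skip_zero_baseline`, and `epsilon` are hypothetical names; no such setting is claimed to exist in the product today.

```python
def baseline_rule_violates(value: float, baseline: float, stddev: float,
                           k: float = 3.0, *, skip_zero_baseline: bool = True,
                           epsilon: float = 1e-9) -> bool:
    """Sketch of the requested option: never violate while the baseline is 0.

    skip_zero_baseline and epsilon are hypothetical knobs, not real
    AppDynamics settings.
    """
    if skip_zero_baseline and abs(baseline) <= epsilon:
        return False  # the proposed "baseline not equal to 0" condition
    return value > baseline + k * stddev


print(baseline_rule_violates(1.0, 0.0, 0.0))                            # False
print(baseline_rule_violates(1.0, 0.0, 0.0, skip_zero_baseline=False))  # True
print(baseline_rule_violates(400.0, 200.0, 50.0))                       # True
```

Note that a guard on the baseline alone does not cover a non-zero baseline with a zero standard deviation: `baseline_rule_violates(200.1, 200.0, 0.0)` still returns `True`, so a separate check on the standard deviation would be needed for that case.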
Has anyone tried using a different baseline?
I recall that we had a similar issue in the past, and I was able to define a new baseline with a shorter timespan (or maybe a longer one, in your case) so that the baseline was never 0.
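To illustrate why the window length matters, here is a simplified sketch that computes a flat trailing mean and standard deviation over two window sizes. The data is made up, and the controller's trend-based baselines are computed differently; this only shows the general effect that a window containing no activity yields a baseline and std dev of 0.

```python
import statistics


def trailing_baseline(history: list[float], window: int) -> tuple[float, float]:
    """Mean and population std dev of the last `window` samples (simplified)."""
    recent = history[-window:]
    return statistics.fmean(recent), statistics.pstdev(recent)


# Hypothetical hourly call counts: some activity, then ten quiet hours.
history = [5, 0, 3, 0, 4, 0, 0, 2, 0, 6] + [0] * 10

print(trailing_baseline(history, 10))  # (0.0, 0.0): every reading looks anomalous
print(trailing_baseline(history, 20))  # mean 1.0 with a non-zero std dev
```

Whether a shorter or longer window helps depends entirely on where the quiet periods fall in your data.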
We're having the same issue, and a minor revolt, because people are getting woken up by false positives. We really need a solution to this from AppDynamics. Yogesh, can you please figure out a solution?
Hi,
We understand that the baseline and the standard deviation show 0 in the metric browser, so the health rule is violated and the email is triggered.
As of now, the available options are only < baseline or > baseline (there is nothing for = baseline) that would let you avoid these alerts. Can you try adding a second condition that uses an absolute value comparison, combined with the baseline-based condition through an AND clause, to avoid such alerts? Please find the attached screenshot.
We also understand that you have already configured multiple conditions in the health rule configuration. To achieve the above, you may need to configure the conditions separately for different kinds of metrics.
Thanks,
Yogesh
Oh, I completely understand why the rules are firing, and it's correct that they fire.
My complaint is that I can't add a condition to the health rule so that it doesn't fire when the baseline is 0 or when the std dev is 0. Those cases cause false alarms for us every single time, and they're very sensitive, since a single value above the baseline is enough to make the rule fire.
Ideally, I'd like to be able to add one condition that compares against the value of a metric's baseline, and a separate one that compares against the value of the baseline's standard deviation.
Short of that, the most viable workaround I've found is to require a minimum level of traffic (calls/minute), a minimum number of std deviations above baseline, and a minimum percentage above baseline. But even that still fails when there's sufficient traffic and the baseline and std dev are both 0: a single value above 0 is then both infinitely many std deviations and an infinite percentage above the baseline. And then a bunch of us get useless alerts waking us up in the middle of the night.
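A minimal sketch of the combined workaround described above shows why it still fires: the traffic floor is independent of the baseline and is satisfied, while the other two conditions both collapse to "value > 0". `combined_workaround_violates` and its defaults (100 calls/min, 10 deviations, 1000%) are hypothetical illustrations, not real rule settings.

```python
def combined_workaround_violates(calls_per_min: float, value: float,
                                 baseline: float, stddev: float,
                                 min_calls: float = 100, k: float = 10,
                                 min_pct: float = 1000) -> bool:
    """Traffic floor AND k std devs above AND min_pct % above baseline."""
    enough_traffic = calls_per_min > min_calls
    above_stddev = value > baseline + k * stddev
    above_percent = value > baseline * (1 + min_pct / 100)
    return enough_traffic and above_stddev and above_percent


# Healthy traffic, baseline and std dev both 0, one reading of 1:
# both baseline-relative thresholds are 0, so the rule still fires.
print(combined_workaround_violates(500, 1.0, 0.0, 0.0))     # True
# Same traffic with a normal baseline: 100 is far below the 440 percentage threshold.
print(combined_workaround_violates(500, 100.0, 40.0, 5.0))  # False
```

Adding more baseline-relative conditions cannot help here, because they all degenerate together; only a condition that does not derive from the baseline (such as an absolute floor on the value itself) changes the outcome.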
Hi,
Can you please verify what value the baseline shows in the controller UI?
To make this clearer, let's take one of the alerts you received as an example:
"Average Response Time (ms)'s value 1 was greater than baseline-based calculated value 0.0 by 3.0 standard deviation(s) 0.0. Baseline used here is "Daily trend - Last 30 days "
Here, 0.0 is the standard deviation.
The threshold is: baseline value + (3 * standard deviation). Whenever the average response time (here, 1 ms) is greater than 0 (0.0 + 3 * 0.0), an alert is raised.
I hope that clears it up. If you still have concerns, please share screenshots of the health rule configuration and of the baseline data line.
Thanks,
Yogesh
"N > 0 by X%" is true for any N > 0.
"N > 0 by X std devs", when the std dev is 0, is true for any N > 0.
Any N > 0 is more than 1000% greater than 0.
Hi Mac,
Sorry to hear about your frustration. Please share only non-sensitive company information and screenshots in the community. If you have sensitive information to share, please open a support ticket and send the files over secure support channels.
FYI, you can also message Yogesh privately through the community.
To send a private message to Yogesh, hover over his name in the thread and click 'send message'.
Lastly, as you uncover helpful information, workarounds, or solutions, please share them with the community. We strive to be a closed-loop community, and responsiveness is important to us.
Thank you for your diligence and for raising the question.
Hi,
Can you please share screenshots depicting the issue, so we can understand it more clearly?
For more information, please see the following docs:
https://docs.appdynamics.com/display/PRO44/Configure+Health+Rules
https://docs.appdynamics.com/display/PRO44/Dynamic+Baselines
Thanks,
Yogesh