I want to be able to alert when my servers meet the following conditions:
1) Disk Space on any drive is <10GB
2) Disk Space on any drive is <10%
The first is easy:
Create a Health Rule of Type "Node Health - Hardware, JVM, CLR (cpu, heap, disk I/O, etc)"
Set it to "Nodes" / "All Nodes" / "Nodes within the specified tiers" and choose the "Machine Agent" tier
Then create a condition using a relative path metric of "Hardware Resources|Disks|*|Space Available" and make it "<Specific Value" of 10000000 (the metric appears to be reported in KB, so 10000000 is roughly 10 GB).
The above works flawlessly and lets you alert on disk space under a hard limit for any drive on any node within your application.
Unfortunately I haven't been able to get percentage-based working using wildcards...
Again, it's easy for a SPECIFIC drive... Use a relative path like:
Hardware Resources|Disks|c-|Space Available
If you use a relative wildcard path though like:
Hardware Resources|Disks|*|Space Available
Then AppD only evaluates the first node it finds and leaves all the other nodes unevaluated...
Has anyone conquered this problem and been able to use a percentage-based alerting that covers all drives (even if each node has a different number of drive letters...) across all nodes?
You can refer to the thread below, which explains how to alert based on a percentage.
However, wildcards are not supported in Metric Expressions, so you need to set it up manually for each drive.
I found a solution and wanted to share it with everyone!
If you want to monitor all the drives on your servers (using basic machine agent) and alert if they are below a certain percentage of disk space AND if they are below a certain bytes free level, you can do it as follows:
1) Set Type to "Node Health - Hardware, JVM, CLR (cpu, heap, disk I/O, etc)" and set "Use the last xxx minutes of data when evaluating the Health Rule" to something like 5. That way a temporary process won't cause an alert, but if the disk stays below your thresholds for 5 or more minutes the rule will trigger:
2) Choose the options as shown below. Make sure to choose just the "Machine Agent" tier (this is so you don't get multiple alarms per server for every tier on that server):
3) Next set up the "Percentage" condition. In my example I'm telling it to go critical if the drive is over 90% full:
Click Edit Expression to add your formulas
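As a rough sketch of the arithmetic behind a percent-full expression: assuming the machine agent reports both a "Space Used" and a "Space Available" metric for each disk (the "Space Used" metric name and the function and variable names below are assumptions, not taken from the steps above), the percentage works out to used / (used + available) * 100. In Python:

```python
def percent_full(space_used_kb: float, space_available_kb: float) -> float:
    """Return how full a drive is, as a percentage of its total size.

    Both arguments are hypothetical stand-ins for the per-disk
    "Space Used" and "Space Available" metric values (assumed to be in KB).
    """
    total_kb = space_used_kb + space_available_kb
    if total_kb <= 0:
        raise ValueError("total disk size must be positive")
    return 100.0 * space_used_kb / total_kb


# A drive with 90 units used and 10 units free is 90% full.
print(percent_full(90, 10))
```

The condition in step 3 would then compare this value against 90.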
4) Next add another condition to also check for a hard byte limit. In my example I want to know if the drive space drops below 5 GB (5000000):
NOTE: Notice that I set the "ALL" setting, indicating that BOTH of these conditions must trigger for the health rule to violate. I did this because I only want to be alerted if the drive is both BELOW 5 GB free and under 10% free. If I didn't do this, I would get some false positives that I don't care about. For example:
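A minimal sketch of the "ALL" logic, using made-up drive sizes (the function name, parameter names, and numbers are hypothetical, and values are assumed to be in KB), shows the kind of false positive this avoids:

```python
def violates(space_used_kb: float, space_available_kb: float,
             max_percent_full: float = 90.0,
             min_free_kb: float = 5_000_000) -> bool:
    """Mimic a health rule where ALL conditions must be true to violate."""
    total_kb = space_used_kb + space_available_kb
    percent_full = 100.0 * space_used_kb / total_kb
    too_full = percent_full > max_percent_full          # percentage condition
    too_little_free = space_available_kb < min_free_kb  # hard-limit condition
    return too_full and too_little_free


# Hypothetical 20 GB drive with 3 GB free: under 5 GB, but only 85% full.
print(violates(17_000_000, 3_000_000))        # False
# Hypothetical 2 TB drive with 100 GB free: 95% full, but plenty of space left.
print(violates(1_900_000_000, 100_000_000))   # False
# Hypothetical 50 GB drive with 2 GB free: 96% full and under 5 GB.
print(violates(48_000_000, 2_000_000))        # True
```

With "ANY" instead of "ALL", the first two drives would both raise alerts.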
FINAL IMPORTANT NOTE: You need to create a rule like the above for EVERY drive letter used on any of your servers. You can't just keep adding additional conditions per drive. You need a NEW "Health Rule" PER drive. The above example is for the C: drive.
In our environment, the servers in our application always have C and D drives, but can also have drives anywhere from E to Z. So I created 24 health rules. For the servers that don't have one of these drives, it ignores them (because the "Evaluate to true on no data" option is NOT checked). That's perfect and exactly what you want.
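Since the 24 rules differ only by drive letter, it can help to generate the list of metric paths rather than typing each one. The sketch below is only a helper for producing the strings to paste into each rule; it assumes the drive naming seen earlier in this thread ("c-"), and the function name is hypothetical. It does not call any AppDynamics API.

```python
import string


def drive_metric_paths(first: str = "c", last: str = "z") -> dict[str, str]:
    """Build one relative metric path per drive letter from first to last."""
    letters = string.ascii_lowercase
    start, end = letters.index(first), letters.index(last)
    return {
        letter: f"Hardware Resources|Disks|{letter}-|Space Available"
        for letter in letters[start:end + 1]
    }


paths = drive_metric_paths()
print(len(paths))   # 24 drive letters, C through Z
print(paths["c"])   # Hardware Resources|Disks|c-|Space Available
```

Adjust the path template if your drives show up under a different name in the metric browser.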
COMMENT: Hopefully in the future AppD allows us to use wildcards in the expressions for conditions. This would allow us to create ONE health rule instead of 24 health rules. I've submitted this as a feature request.
I hope this guide was useful to you! Enjoy!