In a recent prod incident, a server crashed and the sysadmins rebooted it, but they did not restart our application on it because they did not know how.
We have configured the machine agent on this app so that it comes up as part of the application bootup.
We want to catch this scenario, where the app remained down for an extended period of time.
We are thinking we could use a "machine agent down" health rule.
Am I on the right path?
Also, how can I set up just a machine agent down health rule?
I tried the App_availability metric, but that reports whenever the app agent is not responding, which seems to happen every so often on our app, so the app agent is at 95%.
I would like a health rule that evaluates just the machine agent being down, that is, its availability being at 0% when ideally it is 100%.
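
To make the condition I have in mind concrete, here is a rough Python sketch of the evaluation logic only. It is not AppDynamics configuration or an AppDynamics API: the function names, the list of 0/1 availability samples (assumed one per minute), and the window length are all hypothetical, chosen just to illustrate "fire only at 0%, not on occasional blips".

```python
from typing import Sequence


def availability_percent(samples: Sequence[int]) -> float:
    """Percentage of samples in which the agent reported (1 = up, 0 = down)."""
    if not samples:
        raise ValueError("need at least one sample")
    return 100.0 * sum(samples) / len(samples)


def machine_agent_down(samples: Sequence[int], window: int) -> bool:
    """Return True only if availability over the last `window` samples is 0%.

    `samples` and `window` are hypothetical inputs for illustration;
    they do not correspond to any real AppDynamics setting.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        # Not enough history yet, so do not raise a violation.
        return False
    return availability_percent(samples[-window:]) == 0.0


# Occasional blips (roughly 95% availability) should not trigger the rule.
blippy = [1] * 19 + [0]
# A sustained outage over the whole window should trigger it.
outage = [1] * 10 + [0] * 30

print(machine_agent_down(blippy, window=20))
print(machine_agent_down(outage, window=30))
```

The point of comparing against 0% over a full window, rather than "less than 100%", is that a short gap in reporting does not count as the app being down.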