Sometimes errors may appear in an appserver log, but not in AppDynamics. Or you may run your own specific test suites and see that known errors are not being detected by AppDynamics. Here are some things to check.
Your appserver may be using an unsupported logging framework. AppDynamics App Agent for Java supports the following logging frameworks:
AppDynamics App Agent for .NET supports the following logging frameworks:
Version 4.0 added support for SLF4J, for Logback (which implements SLF4J under the covers), and for Log4j 2.
The support extends to the following features of these logging libraries:
We support instrumenting classes that implement the slf4j interface. Logback uses slf4j natively, so we support logback also.
We do not support SLF4J error calls that pass objects as arguments, for example, error(java.lang.String, java.lang.Object...).
We instrument out of the box anything that implements the log4j2 Logger interface. Specifically, we support:
Also, fatal variants of all of the above are supported.
Note that we don't support logger.logMessage(), log(), or any calls with Object... params (that is, a variable-length list of Object arguments). Likewise, the log() and logMessage() methods from ExtendedLogger are not supported.
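To see which call sites end up on an Object... overload, it can help to look at how the Java compiler binds a logging call. The following is a minimal sketch, not AppDynamics or SLF4J code: RecordingLogger is a hypothetical stand-in that mirrors a subset of the error(...) overloads on org.slf4j.Logger and records which one the compiler selected.

```java
import java.util.ArrayList;
import java.util.List;

final class OverloadSketch {
    /**
     * Hypothetical stand-in mirroring a subset of the error(...) overloads
     * declared on org.slf4j.Logger. It only records which overload was bound.
     */
    static final class RecordingLogger {
        final List<String> bound = new ArrayList<>();

        void error(String msg) { bound.add("error(String)"); }
        void error(String format, Object arg) { bound.add("error(String, Object)"); }
        void error(String format, Object arg1, Object arg2) { bound.add("error(String, Object, Object)"); }
        void error(String format, Object... arguments) { bound.add("error(String, Object...)"); }
        void error(String msg, Throwable t) { bound.add("error(String, Throwable)"); }
    }

    public static void main(String[] args) {
        RecordingLogger log = new RecordingLogger();

        // Plain message: binds to error(String).
        log.error("payment failed");

        // Message plus exception: error(String, Throwable) is the most specific match.
        log.error("payment failed", new IllegalStateException("declined"));

        // Three or more format arguments: only the varargs overload applies,
        // which is the signature the text above lists as unsupported.
        log.error("order {} failed for {} at {}", "1001", "alice", "checkout");

        log.bound.forEach(System.out::println);
    }
}
```

In this sketch, the third call is the kind that would go undetected; a custom logger configuration is the documented remedy.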
The solution is to configure a custom logger. See Configure a Custom Logger.
There is an agent metric limit of 5000 metrics that can be registered per Node and an agent limit of 500 ADDs (Application Diagnostic Data - this includes async threads, errors and exception registration, snapshots and so on).
If this limit is reached and the agent attempts to create metrics beyond this threshold, you see an AGENT_METRIC_REG_LIMIT_REACHED alert in the event list. You can increase the default limit, but that might increase overhead. Hitting this limit can indicate a misconfiguration in your application. Hitting this limit and a similar limit in the Controller can indicate that you have hit the business transaction or backend limits, in which case you may need to change the default discovery rules.
What is a metric?
A metric is an identifier used to uniquely identify a particular statistic.
All of the above are individual metrics registered from the node, against which the corresponding statistics data is collected and reported to the controller. At any particular point in time, the metric name remains the same but the value changes and that value is captured and reported.
This particular concept of a metric is internal to AppDynamics. However, it is helpful to understand how it works because of the self-imposed limits on the number of metrics that can be discovered. The limits help to minimize the AppDynamics footprint and overhead impact on an application. One limit is the maximum number of metrics that the agent creates. Once the limit is reached, the agent does not create new metrics.
Q: What is the impact of exceeding this 5000 limit?
A: This limit is per agent. Once the limit is reached, no new metrics are created, so no new activity is tracked. Any additional endpoints that are discovered are not tracked.
Q: If that is true then does restarting the agent from the console reset this limit and hopefully get new endpoints monitored while perhaps not picking up some old defunct ones that were working towards the 5000 limit?
A: Once a metric is registered, it remains registered for that agent whether or not there is load on that metric.
For example, once metrics corresponding to an HTTP backend are registered, they always count against the limit, whether or not there are calls to that backend. In a case such as this, you can increase the maximum metric limit or delete the backends that are not being used to free up those metrics. You may also need to revise your backend configuration to avoid registering so many backends.
Once you increase the limit (or free up metrics by deleting unneeded backends or components), the new endpoints are not guaranteed to become visible, because other statistics may be detected first and use up the added metric capacity.
For example, if there are async calls in the application, but the agent was not able to create metrics for them due to the limit being reached, once there is metric capacity, those async-call-related metrics might be created first before the new endpoints are detected.
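The behavior described in these answers can be sketched as a toy model. This is not agent code: MetricRegistry, its methods, and the metric names are hypothetical, and the limit is set to 2 instead of 5000 to keep the example short. It shows that a registered metric keeps its slot regardless of load, and that freed capacity goes to whichever metric registers next.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of a per-node metric registry with a hard cap (hypothetical, not agent internals). */
final class MetricRegistry {
    private final Set<String> registered = new LinkedHashSet<>();
    private int limit;

    MetricRegistry(int limit) {
        this.limit = limit;
    }

    /** Returns true if the metric is registered after the call, false if the cap blocked it. */
    boolean register(String name) {
        if (registered.contains(name)) {
            return true; // already holds a slot, whether or not it sees any load
        }
        if (registered.size() >= limit) {
            return false; // limit reached: no new metric is created
        }
        registered.add(name);
        return true;
    }

    /** Models deleting an unused backend, which frees the slot its metric held. */
    boolean remove(String name) {
        return registered.remove(name);
    }

    /** Models raising the configured limit; never lowers it. */
    void raiseLimit(int newLimit) {
        limit = Math.max(limit, newLimit);
    }

    int size() {
        return registered.size();
    }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry(2);
        registry.register("Backend A|Calls per Minute");
        registry.register("Backend B|Calls per Minute");

        // The cap is reached, so the new endpoint is not tracked.
        System.out.println(registry.register("Endpoint X|Calls per Minute")); // false

        // Deleting an unused backend frees one slot...
        registry.remove("Backend B|Calls per Minute");

        // ...but an async-call metric happens to be detected first and takes it.
        System.out.println(registry.register("Async Y|Calls per Minute"));    // true
        System.out.println(registry.register("Endpoint X|Calls per Minute")); // false
    }
}
```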
Verify that you are not exceeding other limits such as backend limits and BT limits. Hitting the metric limit can be a warning sign of a configuration problem specific to your environment.
For example, HTTP backends are discovered automatically using the Host and Port properties. If the configuration were changed to use the entire URL, you might rapidly exceed the backend limits and cause the metric limit to be hit, when the real problem is the HTTP backend configuration.
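The effect of the naming choice can be illustrated with a small sketch. The class, method names, and example URLs below are hypothetical and do not reflect how the agent implements backend discovery; the sketch only counts how many distinct backend names each naming scheme would produce for the same calls.

```java
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class BackendNamingSketch {
    /** Names a backend by host and port only, as in the default discovery described above. */
    static String byHostAndPort(String url) {
        URI uri = URI.create(url);
        return uri.getHost() + ":" + uri.getPort();
    }

    /** Names a backend by the entire URL, including path and query string. */
    static String byEntireUrl(String url) {
        return url;
    }

    /** Counts the distinct backend names that a list of outbound calls would register. */
    static int countDistinct(List<String> urls, boolean useEntireUrl) {
        Set<String> names = new TreeSet<>();
        for (String url : urls) {
            names.add(useEntireUrl ? byEntireUrl(url) : byHostAndPort(url));
        }
        return names.size();
    }

    public static void main(String[] args) {
        List<String> calls = List.of(
            "http://orders.example.com:8080/order/1001",
            "http://orders.example.com:8080/order/1002",
            "http://orders.example.com:8080/order/1003?expand=items");

        System.out.println(countDistinct(calls, false)); // 1 backend
        System.out.println(countDistinct(calls, true));  // 3 backends
    }
}
```

Because each backend registers its own set of metrics, a naming scheme that yields one backend per distinct URL multiplies the metric count accordingly.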
The limit of 5000 suffices much of the time, but it can be increased if you believe calls are missing or some functionality is not being captured by the agent. On average, agents register 800 metrics across applications. The lower end is 300, and some applications produce 1500 metrics per agent. If agents need more than 5000 metrics, something else is often wrong, and raising the limit only obscures the underlying problem.
To increase the limit, see the Metrics Limits documentation.
NOTE: Before increasing the metric limit, be sure you have verified no other limits are being hit.
If there are errors or exceptions that are well-known and don't need to be monitored, you can exclude them from detection and free up metric capacity. Review the documentation here: Configure Exceptions and Log Messages to Ignore.
There is a similar metric limit at the controller level. When this limit is reached, the controller issues the CONTROLLER_ERROR_ADD_REG_LIMIT_REACHED event.
The recommended solution is to fine-tune the default error detection rules, for example exclude the ones you're not interested in. Review the documentation here: Configure Exceptions and Log Messages to Ignore.
Increase the default limit to 4000, for example:
a. Log in to the admin page at http://<controller ip>:<port>/controller/admin.html
b. Enter the root password (default value is changeme)
c. Change the value of the property 'error.registration.limit' accordingly (see attached screenshot)
Note: Increasing the limit incurs additional overhead so be sure to verify that you need to monitor all the discovered errors and exceptions.
If exclude rules are misconfigured, exceptions might be missed. Review the error detection configuration in your application: in the Controller UI, select Configure -> Instrumentation, then select the Error Detection tab. For more details, see Monitor Errors and Exceptions.
AppDynamics reports error codes when the sendError method is used to report the error code. However, some implementations of HttpServletResponse send some HTTP errors using setStatus. In this case, set the capture-set-status node property (Java agent only) to true to capture these HTTP errors. For more details, see the node property reference documentation: App Agent Node Properties Reference.