Cisco AppDynamics Community

Anonymous · ‎03-05-2015

Understanding the AppDynamics error detection life cycle can help you to troubleshoot missing errors and related error metrics. Use agent log files to search for evidence of each phase of error detection and reporting.

Errors and exceptions are counted as Application Diagnostic Data metrics (ADDs) internally and they count towards the limits that are in place for ADDs.

The error detection process has four phases:

Phase 1 - Error Detection
Phase 2 - Error Registration
Phase 3 - Error Metrics Registration
Phase 4 - Error Metrics Reporting

Each phase must complete successfully before the next phase begins. When you are troubleshooting, verify that each phase completed successfully.

Determining If Error Detection and Reporting Are Successful

Use the Agent Log Files

Generate and retrieve the agent logs at Debug level (logger level=debug). You may need both the agent logs and the REST logs.

For the best results, search all the log files at the same time. Registration of higher-level objects such as BTs, backends, and errors appears in the agent.log files. Metric registration and metric uploads appear in the REST log.
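Searching every file at once can be scripted. The following is a minimal sketch, assuming the retrieved logs sit together under one directory; the function name search_logs and the directory layout are illustrative and are not part of any AppDynamics tool. The search strings are the ones used in the phases described in this article.

```python
from pathlib import Path

# Search strings used in the phases described in this article.
SEARCH_STRINGS = [
    "Sending ADDs to register",
    "Error Objects registered with controller",
    "Application Diagnostic Data",
    "Errors per Minute",
]


def search_logs(log_dir, needles=SEARCH_STRINGS):
    """Return {needle: [(file name, line number, line text), ...]}.

    Every regular file under log_dir is scanned, so agent logs and
    REST logs are searched in the same pass.
    """
    hits = {needle: [] for needle in needles}
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has odd bytes.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                for needle in needles:
                    if needle in line:
                        hits[needle].append((path.name, number, line.rstrip("\n")))
    return hits
```

Calling search_logs on the directory that holds your retrieved logs returns, for each search string, the files and line numbers where it appears. An empty list for a string suggests that the corresponding phase left no trace in the logs.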

Phase 1 - Error Detection

AppDynamics reports errors that occur during the execution of business transactions. It also reports error messages that the application server logs outside the context of a business transaction (called application server exceptions).

Typical examples of errors are:

Logged Exceptions or Messages
Errors based on HTTP Return Codes
Errors based on Redirect Pages

Conditions that result in error detection:

Unhandled Exceptions

An unhandled exception that occurs during the execution of a business transaction is reported as an error.
An unhandled exception that occurs during an exit call (for example, a call to a database, web service, or message queue server) is also reported as an error.

HTTP error codes

See the Supported Environments and Versions for your app agent to determine if the loggers you use are recognized by default by AppDynamics.

Phase 2 - Confirm Error Registration

When an error is detected, the agent sends a registration request to the controller and the controller assigns an ID to the error. The ID is used for communication between the agent and controller to identify exactly which objects are being reported and stored in the controller database.

To confirm error registration, get the agent log files and look for the registration request and response entries and find the ID assigned to the error.

1. Find error registration request log entries.

Use search string = "Sending ADDs to register"

You should see entries similar to this one for the initial error registration request:

[AD Thread Pool-Global0] 07 Aug 2014 16:44:07,255 INFO ErrorProcessor - Sending ADDs to register [ApplicationDiagnosticData{key='org.apache.tomcat.dbcp.dbcp.SQLNestedException:java.sql.SQLException:',
name=SQLNestedException : SQLException,
diagnosticType=ERROR,
configEntities=null, summary='org.apache.tomcat.dbcp.dbcp.SQLNestedException caused by java.sql.SQLException'}]

The string between the brackets [] identifies the type of object (Application Diagnostic Data or ADD), and contains values for key, name, diagnosticType, and summary.

In this case, diagnosticType=ERROR and key=org.apache.tomcat.dbcp.dbcp.SQLNestedException:java.sql.SQLException: (the trailing colon is part of the key).
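Pulling the key, name, and diagnosticType out of a registration request can be automated. This is a sketch that assumes the entry has the same layout as the example above, where the fields may wrap across several lines; the pattern and the function name parse_add_requests are illustrative only.

```python
import re

# Matches the fields shown in the example entry. The layout is assumed
# from that single example and may differ between agent versions.
ADD_PATTERN = re.compile(
    r"ApplicationDiagnosticData\{key='(?P<key>[^']*)',\s*"
    r"name=(?P<name>.*?),\s*"
    r"diagnosticType=(?P<diagnosticType>\w+),",
    re.DOTALL,
)


def parse_add_requests(log_text):
    """Return one dict (key, name, diagnosticType) per ADD found in log_text."""
    return [match.groupdict() for match in ADD_PATTERN.finditer(log_text)]
```

The key value returned here is the string to search for when looking for the ID that the controller assigned.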

Using that specific error key as a search string, you can find the log entry that contains the ID assigned to this error. In this case the ID is 53 as shown in this log example.

[AD Thread Pool-Global0] 07 Aug 2014 16:44:07,340  INFO ErrorProcessor - Error Objects registered with controller :{org.apache.tomcat.dbcp.dbcp.SQLNestedException:java.sql.SQLException:=53}

2. Find error registration response log entries

Use search string = "Error Objects registered with controller"

Alternatively, use the more general search string "Error Objects registered with controller" to find all the error objects that were registered during your logging session. Each matching entry lists error keys together with the IDs assigned to them.

If you do not see a successful registration request and response, look in the Controller Logs for exceptions related to error registration.
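The request and response checks can be combined into one pass that lists the registered IDs and flags any requested key that never received one. This is a rough sketch under two assumptions that the examples above do not confirm: that a response with several errors separates them with ", ", and that keys contain no "}" character. The function names are illustrative.

```python
import re

REQUEST_KEY = re.compile(r"ApplicationDiagnosticData\{key='([^']*)'")
RESPONSE_BODY = re.compile(
    r"Error Objects registered with controller\s*:\{([^}]*)\}"
)


def registered_errors(log_text):
    """Map each registered error key to the ID the controller assigned."""
    ids = {}
    for body in RESPONSE_BODY.findall(log_text):
        # Assumption: multiple entries are separated by ", ".
        for entry in body.split(", "):
            # Keys contain ':' and may contain '=', so split on the last '='.
            key, sep, value = entry.rpartition("=")
            if sep and value.strip().isdigit():
                ids[key] = int(value)
    return ids


def unregistered_keys(log_text):
    """Return requested error keys that have no ID in any response entry."""
    return set(REQUEST_KEY.findall(log_text)) - set(registered_errors(log_text))
```

A non-empty result from unregistered_keys points to errors that were detected but for which no registration response was logged, which is the case where the Controller Logs are worth checking.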

Phase 3 - Confirm Metric Registration

After successful error registration, the agent keeps an error map in memory with the ID, type, and name for each error. The next time the error is detected, the agent is ready to report metrics. A similar process of request and response between the agent and controller enables the controller to assign each metric an ID.

The errors per minute metrics are reported at the following levels of aggregation:

Application level - Total counts for the error across the entire application. The log looks similar to:

BTM|Application Diagnostic Data|Error:61|Errors per Minute

Tier level - Total counts for the error across the tier. The log looks similar to:

BTM|Application Summary|Component:1|Errors per Minute

BTM|Application Summary|Component:2|Exit Call:HTTP|To:3|Errors per Minute

Business transaction level - Totals by BT and exit calls:

BTM|BTs|BT:101|Component:1|Errors per Minute

BTM|BTs|BT:102|Component:1|Exit Call:JDBC|To:{[UNRESOLVED][4]}|Errors per Minute

Note: In the logs, Component=tier.
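When a search returns many metric names, the aggregation level can be read from the second segment of the name. The sketch below encodes only the three levels listed above; the function name aggregation_level is illustrative, and any other prefix is reported as unknown rather than guessed.

```python
# Second path segment -> aggregation level, as listed in this article.
LEVELS = {
    "Application Diagnostic Data": "application",
    "Application Summary": "tier",
    "BTs": "business transaction",
}


def aggregation_level(metric_name):
    """Classify a metric name such as 'BTM|BTs|BT:101|Component:1|Errors per Minute'."""
    parts = metric_name.split("|")
    if len(parts) < 2 or parts[0] != "BTM":
        return "unknown"
    return LEVELS.get(parts[1], "unknown")
```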

1. Find metric registration request log entries.

Use search string = Application Diagnostic Data

Using this search string should give search results similar to the following:

Line 1366: <metric time-rollup-type="AVERAGE" name="BTM|Application Diagnostic Data|Error:61|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>

Line 1392: <metric time-rollup-type="AVERAGE" name="BTM|Application Diagnostic Data|Error:71|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>

Line 1402: <metric time-rollup-type="AVERAGE" name="BTM|Application Diagnostic Data|Error:53|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>

Line 1403: <metric time-rollup-type="AVERAGE" name="BTM|Application Diagnostic Data|Error:70|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>

...

Line 1561: <metric id="2450" name="BTM|Application Diagnostic Data|Error:61|Errors per Minute"/>
Line 1575: <metric id="2539" name="BTM|Application Diagnostic Data|Error:71|Errors per Minute"/>
Line 1515: <metric id="2463" name="BTM|Application Diagnostic Data|Error:53|Errors per Minute"/>
Line 1568: <metric id="2546" name="BTM|Application Diagnostic Data|Error:70|Errors per Minute"/>

Use search string = Errors per Minute

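Results similar to the following show metrics at the various rollup levels. The level is identified by the second segment of the metric name (for example, BTs, Application Summary, or Application Diagnostic Data):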

Line 1394: <metric time-rollup-type="AVERAGE" name="BTM|BTs|BT:131|Component:7|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>
 
Line 1395: <metric time-rollup-type="AVERAGE" name="BTM|Application Summary|Component:7|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>
 
Line 1402: <metric time-rollup-type="AVERAGE" name="BTM|Application Diagnostic Data|Error:53|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>

Line 1403: <metric time-rollup-type="AVERAGE" name="BTM|Application Diagnostic Data|Error:70|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>
 
Line 1442: <metric time-rollup-type="AVERAGE" name="BTM|Application Diagnostic Data|Error:69|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>
 
Line 1445: <metric time-rollup-type="AVERAGE" name="BTM|Backends|Component:{[UNRESOLVED][13]}|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>
 
Line 1460: <metric time-rollup-type="AVERAGE" name="BTM|Application Diagnostic Data|Error:76|Errors per Minute" hole-fill-type="RATE_COUNTER" cluster-rollup-type="COLLECTIVE"/>
...
Line 1536: <metric id="2475" name="BTM|Backends|Component:{[UNRESOLVED][13]}|Errors per Minute"/>
Line 1546: <metric id="2342" name="BTM|BTs|BT:131|Component:7|Errors per Minute"/>
Line 1491: <metric id="2512" name="BTM|Application Summary|Component:7|Errors per Minute"/>
 
Line 1515: <metric id="2463" name="BTM|Application Diagnostic Data|Error:53|Errors per Minute"/>
Line 1531: <metric id="2456" name="BTM|Application Diagnostic Data|Error:69|Errors per Minute"/>
Line 1496: <metric id="2521" name="BTM|Application Diagnostic Data|Error:76|Errors per Minute"/>
Line 1568: <metric id="2546" name="BTM|Application Diagnostic Data|Error:70|Errors per Minute"/>

2. Find metric registration response log entries.

In the above search results, you can see both the metric registration request and the response containing the metric ID for several errors at the application level.

The entries beginning with "<metric id=" show the ID assigned to each separate error's metrics.

You can see the metric ID assigned at line 1515 to Error:53 is 2463.
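Looking up the metric ID for a given error ID can also be scripted. This sketch assumes the response entries look exactly like the "<metric id=" lines in the search results above; the function names are illustrative.

```python
import re

# Matches response entries such as:
# <metric id="2463" name="BTM|Application Diagnostic Data|Error:53|Errors per Minute"/>
REGISTERED_METRIC = re.compile(r'<metric id="(\d+)" name="([^"]+)"/>')


def metric_ids(log_text):
    """Map each registered metric name to its metric ID."""
    return {name: int(metric_id)
            for metric_id, name in REGISTERED_METRIC.findall(log_text)}


def error_metric_id(log_text, error_id):
    """Return the application-level metric ID for an error ID, or None."""
    name = f"BTM|Application Diagnostic Data|Error:{error_id}|Errors per Minute"
    return metric_ids(log_text).get(name)
```

Request entries are skipped because they carry no id attribute, so only the controller's responses contribute to the result.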

If you do not see a successful request and response for the metric, look in the Controller Logs for exceptions related to error metric registration.

Phase 4 - Error Metrics Reporting

After successful metric registration, metrics are reported to the controller every minute. Using the metric ID, you can search the REST log for the metric upload.

Find error metric upload log entries:

Use the metric ID that you found when you verified metric registration.

Look for log entries showing error metric reporting uploads. This example shows search results from using the metric ID value of "2463" to search the REST logs:

Line 1515: <metric id="2463" name="BTM|Application Diagnostic Data|Error:53|Errors per Minute"/>
Line 1686: <metric id='2463', value[sum=1, count=1, min=1, max=1, current=1]>
Line 2511: <metric id='2463', value[sum=0, count=1, min=0, max=0, current=0]>
Line 3207: <metric id='2463', value[sum=0, count=1, min=0, max=0, current=0]>
Line 3919: <metric id='2463', value[sum=0, count=1, min=0, max=0, current=0]>
Line 4578: <metric id='2463', value[sum=0, count=1, min=0, max=0, current=0]>

If you do not see successful error metric data uploads, look in the Controller Logs for exceptions related to metric data upload.
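To review the uploads for one metric without reading each line, the values can be extracted programmatically. This sketch assumes upload entries have exactly the layout shown in the example above; the pattern and the function name uploads_for are illustrative.

```python
import re

# Matches upload entries such as:
# <metric id='2463', value[sum=1, count=1, min=1, max=1, current=1]>
UPLOAD = re.compile(
    r"<metric id='(?P<id>\d+)', value\[sum=(?P<sum>-?\d+), "
    r"count=(?P<count>-?\d+), min=(?P<min>-?\d+), "
    r"max=(?P<max>-?\d+), current=(?P<current>-?\d+)\]>"
)


def uploads_for(log_text, metric_id):
    """Return the uploaded values for metric_id, in log order."""
    rows = []
    for match in UPLOAD.finditer(log_text):
        if int(match.group("id")) == metric_id:
            rows.append({name: int(value)
                         for name, value in match.groupdict().items()
                         if name != "id"})
    return rows
```

An empty list means no upload was logged for that metric ID. A list whose sums are all zero means the metric was reported but the error did not occur during those intervals.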

Notes on Error Registration Counts and ADD limits

Application Diagnostic Data metrics include errors, exceptions, and async threads (and a few other things, such as snapshots).

Error/Thread ADD registration limit counts behave as follows:

If an Error/Thread ADD is deleted or excluded, the relevant ADD registration limit count is decremented.
If an Error/Thread ADD is unexcluded, the relevant ADD registration limit count is incremented.
You cannot exclude an already excluded ADD or unexclude an already unexcluded ADD. This keeps the ADD registration limit counts in a good state even if the exclude/unexclude APIs are used incorrectly.
Limiting ERROR ADD registration prevents STACK_TRACE ADD registration as well.
When a tier is deleted, the Error/Thread ADD registration limit counts are decremented accordingly.
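The counting rules above can be pictured with a small model. This is purely illustrative and is not controller code: the class AddLimitCounter and its methods are hypothetical, and the handling of deleting an already excluded ADD (no second decrement) is an assumption.

```python
class AddLimitCounter:
    """Toy model of one ADD registration limit count (hypothetical)."""

    def __init__(self):
        self.count = 0
        self.excluded = {}  # ADD id -> True if currently excluded

    def register(self, add_id):
        if add_id not in self.excluded:
            self.excluded[add_id] = False
            self.count += 1

    def exclude(self, add_id):
        # Excluding an already excluded ADD is a no-op, so repeated
        # calls cannot push the count out of step.
        if self.excluded.get(add_id) is False:
            self.excluded[add_id] = True
            self.count -= 1

    def unexclude(self, add_id):
        # Likewise, unexcluding an ADD that is not excluded does nothing.
        if self.excluded.get(add_id) is True:
            self.excluded[add_id] = False
            self.count += 1

    def delete(self, add_id):
        # Assumption: only an ADD that is still counted is decremented.
        if self.excluded.pop(add_id, True) is False:
            self.count -= 1
```

In this model, calling exclude twice on the same ADD lowers the count once, which mirrors the statement that incorrect use of the exclude/unexclude APIs leaves the counts in a good state.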