On 04-07-2021 08:36 PM - edited on 07-20-2022 11:57 AM by Claudia.Landiva
Applications are central to our businesses, or they are the actual business. We rely heavily on them to perform user-requested actions and downstream business processing. Any application performance problem inevitably becomes a business problem. AppDynamics provides a holistic view of an application’s performance and the tools to narrow a problem down to the specific scope that’s experiencing service degradation.
To narrow a problem down to its root cause, follow these AppDynamics best practices:
Be aware of the exact time of the issue.
Understand the actual scope of the issue.
Due to the unique fit of each application to your business needs, only you can determine a problem’s context. Proactively and continuously adjust your AppDynamics application configuration to the needs of your business.
These rules will result in more streamlined and successful troubleshooting sessions, so you can resolve issues before they impact your business severely.
No matter how technical you are, you are troubleshooting your business.
Because any troubleshooting process may be complicated, it’s essential to use a streamlined approach. The troubleshooting procedure outlined here follows AppDynamics best practices and consists of three parts:
This successful troubleshooting strategy builds on your “configured-to-measure” Application Performance Monitoring foundation, as well as a broad understanding of the application’s environment.
SEE
Triaging and troubleshooting application problems, so you can find and describe the right problems within the right scope and timeframe.
Since Dashboards present the status of the application environment visually, and in real time, they are the starting point of the troubleshooting process. A yellow or red status on an emerging issue is the trigger for the analysis.
When a problem is revealed, many tools in the AppDynamics platform enable you to determine its root cause.
NOTE: See the detailed process for successful troubleshooting below, under the Seeing the problem: Troubleshooting Principles section.
ACT
The target for application performance is to eliminate problems before they can impact customers.
To that end, the Alerting Model informs teams early about any abnormal behavior, at any layer of the application. As a result, relevant teams can take remedial action immediately. In the best cases, AppDynamics triggers fully automated actions that prevent the application from running into critical issues.
You can apply these principles in a testing environment, to catch problems before they’re promoted to production.
KNOW
The last step of the troubleshooting session should result in a determination of the event’s impact on the organization.
The time between a performance bottleneck’s emergence and its resolution is also the time during which business was disrupted. An affected application cannot offer services to the end user, so it adds to lost business opportunities. In the long run, poor user experience leads to loss of customers and loss of revenue.
The AppDynamics platform’s capability to determine impact is called Business iQ. It is provided through the set of AppDynamics products that correlate your application performance with business metrics (revenue, active users, product type, payment ID, etc.) in any given timeframe.
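As a rough illustration of what relating an incident window to a business metric can look like, the sketch below estimates the revenue shortfall during a slowdown. It is not Business iQ or any AppDynamics API. The function name `estimate_revenue_shortfall`, the per-minute revenue samples, and the assumption that pre-incident revenue represents "normal" are all hypothetical.

```python
from statistics import mean


def estimate_revenue_shortfall(revenue_per_minute, incident_start, incident_end):
    """Estimate revenue lost during [incident_start, incident_end).

    revenue_per_minute: list of (minute, revenue) tuples (hypothetical data).
    Assumes the average revenue before the incident is the "normal" rate.
    """
    normal = [r for t, r in revenue_per_minute if t < incident_start]
    during = [r for t, r in revenue_per_minute
              if incident_start <= t < incident_end]
    if not normal or not during:
        return 0.0  # not enough data to compare against
    expected = mean(normal) * len(during)
    # A negative difference would mean revenue was above normal; report 0.
    return max(0.0, expected - sum(during))


# Made-up samples: five normal minutes, then three degraded minutes.
samples = [(0, 100), (1, 100), (2, 100), (3, 100), (4, 100),
           (5, 40), (6, 30), (7, 50)]
print(estimate_revenue_shortfall(samples, incident_start=5, incident_end=8))
```

The value of such an estimate depends entirely on how precisely the incident’s start and end were determined, which is why the time of the issue matters so much in the troubleshooting steps.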
There are two main principles of troubleshooting: time and scope.
Well-defined Health Rules reveal where an issue appears, giving you a picture of the current application status. When a Health Rule is violated, AppDynamics informs you either by sending a notification or by illustrating the problem in a Dashboard. This is one of many possible starting points. Others include an end-user complaint or a ticket from another team.
Once your application experiences problems, what should you do? A notification or Dashboard visualization as discussed above gives you the issue’s rough scope and starting time.
The next step is to triage the exact time that the problem arose in the application, and then to identify the specific component that was affected at the very beginning of the issue, which points you to the root cause. You accomplish this by iteratively following all impacted components and metrics in a given timeframe, which will ultimately lead to the scope and time of the problem.
When troubleshooting a timeframe, the best practice is to locate the beginning of the issue. The beginning is usually visible as a change of behavior in one or more metrics. Since a problem becomes harder to read the longer it exists, it’s important to start at the beginning.
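The idea of locating the beginning of an issue from a change in metric behavior can be sketched as follows. This is a generic illustration, not how AppDynamics computes baselines. The function name `find_issue_start`, the three-sigma limit, and the requirement of three consecutive deviating samples are assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_issue_start(samples, baseline_size, threshold_sigmas=3.0,
                     min_consecutive=3):
    """Return the timestamp where a sustained deviation begins, or None.

    samples: list of (timestamp, value) tuples in time order.
    The first `baseline_size` samples are treated as normal behavior.
    A deviation counts only if `min_consecutive` samples in a row exceed
    baseline mean + threshold_sigmas * standard deviation, so that a
    single spike is not mistaken for the start of the issue.
    """
    baseline = [value for _, value in samples[:baseline_size]]
    limit = mean(baseline) + threshold_sigmas * stdev(baseline)

    run_start = None
    run_length = 0
    for timestamp, value in samples[baseline_size:]:
        if value > limit:
            if run_length == 0:
                run_start = timestamp
            run_length += 1
            if run_length >= min_consecutive:
                return run_start
        else:
            run_start = None
            run_length = 0
    return None


# Made-up response times (ms) per minute: minute 6 is an isolated spike,
# while the sustained degradation begins at minute 8.
response_times = [(0, 100), (1, 102), (2, 98), (3, 101), (4, 99),
                  (5, 100), (6, 250), (7, 101), (8, 240), (9, 260),
                  (10, 270), (11, 300)]
print(find_issue_start(response_times, baseline_size=5))  # prints 8
```

Requiring several consecutive deviating samples is a design choice: it trades a slightly later detection for fewer false starts caused by one-off spikes.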
Starting there also gives you the ability to compare what’s happening now with what was normal in the past by:
Scope is a representative component, function, metric, group of components, configuration, etc.
Because problems often don’t show up in a single scope, it’s important to understand all involved and/or impacted scopes that might be related to the problem, and to triage the scope that is affecting all the others.
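One simple way to picture triaging the scope that affects all the others is to walk the call dependencies between components. The sketch below is a heuristic under stated assumptions, not an AppDynamics feature: the component names, the `calls` map, and the function `root_cause_candidates` are hypothetical, and it assumes that degradation propagates from a called component up to its callers.

```python
def root_cause_candidates(calls, unhealthy):
    """Return unhealthy components whose own dependencies all look healthy.

    calls: dict mapping a component to the components it calls.
    unhealthy: set of components currently showing degraded metrics.
    A component that is unhealthy while everything it calls is healthy
    is a likely origin. Components that call an unhealthy dependency
    may only be suffering from it.
    """
    candidates = []
    for component in sorted(unhealthy):
        downstream = calls.get(component, [])
        if not any(dep in unhealthy for dep in downstream):
            candidates.append(component)
    return candidates


# Hypothetical application flow: web -> checkout -> inventory / orders-db.
calls = {
    "web": ["checkout"],
    "checkout": ["inventory", "orders-db"],
    "inventory": ["orders-db"],
    "orders-db": [],
}
unhealthy = {"web", "checkout", "orders-db"}
print(root_cause_candidates(calls, unhealthy))  # prints ['orders-db']
```

The result is only a list of candidates to investigate first. Confirming the root cause still requires checking that the candidate’s metrics changed at the beginning of the issue.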
A Health Rule event shows the time, the scope, and the issue it detected. Whether that event is the issue’s root cause is something you can only find out by triaging the problem as described above.
Iteratively triage the Average Response Time of transactions, Events, Errors and resource issues, taking notes at every stage of the troubleshooting session. By analyzing poorly performing components and their performance indicators in a given timeframe, you’ll be able to understand the affected scope and gather relevant information for further remedial actions.
When you see that an EUM Page experiences poor performance, keep in mind that the root of the problem can lie in any component of your application environment.
This should give you an understanding of the issue: scope of affected components and time of the problem.
At some point, you may need more data (for example expanding the visibility by adding more agents) or to collaborate with other teams (such as Infrastructure, Development, Business, etc.) who know the subject matter in detail, and who will understand why some of the metrics you discovered in AppDynamics had poor performance.
A troubleshooter’s skill lies not only in troubleshooting and understanding the details of a given scenario, but also in knowing their limitations and being able to escalate problems outside of their expertise.
Consequently, you must understand the application environment. AppDynamics will make detailed measurements and show baselined performance status, but it will not explain the full context of the problem. The context may be technical, such as an understanding of your database characteristics, but it can also be non-technical. For instance, only you can determine which Business Transaction is business-critical for your organization. Therefore, a deeper understanding of the application environment and the application “context” is essential.
AppDynamics provides various tools for troubleshooting different application issues. Official documentation includes several pages dedicated to troubleshooting different application components.
SCENARIO: Imagine what you might do in a situation where... | RESOURCES
Your service desk calls, telling you that there are complaints about slow logins to your application. | What do you do? And by the way, have you checked Analytics to see how the issue impacted your business?
An alert reported a higher than usual error rate for certain Business Transactions, and we simultaneously spotted a sudden increase of exceptions in the application dashboard. | How do we analyze it?
While analyzing a Business Transaction, you discovered a lot of issues between Tiers and towards the shared backend. | Instead of escalating to the network team, can you look at Network Visibility yourself to troubleshoot further?
You notice some bottlenecks on the side of the Java Virtual Machines, and your team of developers asks for more detail about the problems you discovered in AppDynamics. |
The application that you monitor takes advantage of multithreading solutions. | Do you know how to discover potential problems in the AppD Controller?
One of the tools provided by the AppDynamics SaaS platform is Anomaly Detection. AIOps supports AppDynamics SaaS customers with automatic troubleshooting that replaces the manual process described in the troubleshooting section above.
Machine Learning-based algorithms leverage the data in the AppDynamics platform (such as transaction metrics, components, their relations, and events), try to correlate abnormal behaviors, and inform AppDynamics users about possible issues in the application with a given Business Transaction.