Discussion Feed
11-23-2020
09:20 PM
What are the recommended considerations and steps for patching nodes in Events Service?
This article provides recommendations on how to safely patch an Events Service node. It includes an example in which you must effectively stop a node for an extended period of time, and then return it to the cluster using either old index information or a fresh, clean node.
This article references the official Elasticsearch guidelines for Rolling Upgrades. As mentioned in Step 2 of these guidelines, due to time constraints, the administrator is asked to stop non-essential indexing as the node is being stopped. However, this step is not recommended in a heavily active production cluster.
Additionally, from the Events Service index management API perspective, setting cluster.routing.allocation.enable to "none" may lead to unintended consequences. For example, if time-consuming index creation and management tasks happen to occur between the "none" and "all" settings, indices may fail to be created.
Table of Contents
What should I consider before patching nodes in Events Service?
What is the recommended process for rotating nodes?
Considerations
Steps for patching, upgrading, or removing nodes
Additional Recommendations
Troubleshooting
What should I consider before patching nodes in the Events Service?
When patching nodes in Events Service, consider the following practical limitations:
Are the upgrade time windows sufficient, both in policy and in practice?
If the cluster is actively ingesting production monitoring data, you may not be able to stop indexing.
If a node is brought back with stale metadata and stale data, Elasticsearch synchronization errors (split-brain) may occur.
Shard rebalancing may affect performance on active nodes that are handling ingestion and search at the same time.
A specific node may not return because the patch process failed.
Tests of recovery speed and operational safety showed that changing the following settings (in combinations of disable and enable) may cause a split-brain scenario:
"cluster.routing.allocation.enable": "primaries" | "none" | "all"
"cluster.routing.rebalance.enable": "primaries" | "none" | "all"
"indices.recovery.max_bytes_per_sec": "1000mb"
"cluster.routing.allocation.node_initial_primaries_recoveries": 1-10
"cluster.routing.allocation.cluster_concurrent_rebalance": 2-8
"cluster.routing.allocation.node_concurrent_recoveries": 2-8
"indices.recovery.concurrent_streams": 1-6
"cluster.routing.allocation.exclude._ip"
What is the recommended process for rotating nodes?
The AppDynamics Analytics team has adopted and recommends the following practice when rotating nodes in or out.
Considerations
After identifying the nodes to replace or upgrade in-place, consider the following, one node at a time:
If you remove more than one node from the cluster, you must temporarily remove shard allocation restrictions on all indices. The Events Service default is 3. Note: Do not leave the Events Service without shard allocation restrictions for an extended period of time. Re-apply any shard restrictions immediately upon completing the patch or upgrade.
curl -XPUT 'localhost:9200/*/_settings' -d '
{
"index": {
"index.routing.allocation.total_shards_per_node" : -1
}
}'
(Optional) You can increase the rebalancing and recovery speed:
curl -XPUT 'localhost:9200/_cluster/settings' -d '
{ "transient": {
"indices.recovery.max_bytes_per_sec": "1000mb",
"cluster.routing.allocation.node_initial_primaries_recoveries": 1,
"cluster.routing.allocation.cluster_concurrent_rebalance": 2,
"cluster.routing.allocation.node_concurrent_recoveries": 2,
"indices.recovery.concurrent_streams": 6
} }'
Steps for patching, upgrading, or removing nodes
For each node in the cluster to patch, upgrade, or remove:
Retrieve the IP address of the node: curl -s 'http://localhost:9200/_cat/nodes?v'
Exclude the single node from the cluster. Given the volume of data stored on a single node, this may take a significant amount of time to complete. curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
"transient" :{
"cluster.routing.allocation.exclude._ip" : "W.X.Y.Z"
}
}'
Wait for the process to complete and then verify that the excluded node has 0 shards. curl -s 'http://localhost:9200/_cat/allocation?v'
From the Platform Admin or on the node itself, stop the node: <$PLATFORM_PATH>/product/events-service/processor/bin/events-service.sh stop
Patch or replace the node.
Remove the temporary node exclusion: curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
"transient" :{ "cluster.routing.allocation.exclude._ip" : "" } }'
From the Enterprise Console UI or on the node itself, restart the node: nohup $PLATFORM_DIRECTORY/product/events-service/processor/bin/events-service.sh start -p $PLATFORM_DIRECTORY/product/events-service/processor/conf/events-service-api-store.properties &
Wait for the shard migration to complete and for the cluster indicator to turn green.
Repeat this process (steps 1-8) for the next node to patch, as needed.
After you have completed patching, downsizing, or migrating all of the nodes, if total_shards_per_node was set to -1, then re-apply the total_shards_per_node default limit:
curl -XPUT 'localhost:9200/*/_settings' -d '
{
"index": {
"index.routing.allocation.total_shards_per_node" : 3
}
}'
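The drain-and-verify part of the steps above can be sketched in Python. This is a minimal illustration, not part of the official procedure: the function names and the sample `_cat/allocation?v` text are assumptions made for the example, and real output may contain additional columns, which is why the parser looks columns up by header name.

```python
import json


def exclusion_payload(ip):
    """Build the transient cluster-settings body that drains one node.

    Pass an empty string to clear the exclusion once the node has rejoined.
    """
    return json.dumps(
        {"transient": {"cluster.routing.allocation.exclude._ip": ip}}
    )


def shards_on_node(cat_allocation_text, ip):
    """Return the shard count reported for `ip` in `_cat/allocation?v` output."""
    lines = [ln for ln in cat_allocation_text.splitlines() if ln.strip()]
    header = lines[0].split()
    shards_col = header.index("shards")
    ip_col = header.index("ip")
    for line in lines[1:]:
        fields = line.split()
        # Rows such as UNASSIGNED carry fewer columns; skip them.
        if len(fields) > ip_col and fields[ip_col] == ip:
            return int(fields[shards_col])
    raise ValueError(f"node {ip} not found in allocation output")


# Illustrative output only (hypothetical hosts and sizes).
SAMPLE_ALLOCATION = """\
shards disk.used disk.avail disk.total disk.percent host     ip       node
    12      40gb       60gb      100gb           40 10.0.0.1 10.0.0.1 node-1
     0       5gb       95gb      100gb            5 10.0.0.2 10.0.0.2 node-2
"""
```

A node is safe to stop only when `shards_on_node(...)` returns 0 for its IP address; the payload string can be passed as the `-d` body of the `_cluster/settings` request shown in the steps.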
Additional Recommendations
When performing a rolling restart of the Elasticsearch data nodes during a minor update, our test results determined that a 30-60 minute window is required (excluding the upgrade or patch time).
Note: AppDynamics Analytics does not recommend stopping Events Service (Elasticsearch) nodes for an extended period of time. As a result, we perform this type of operation in 3-6 month intervals with 60 minutes allocated for each node of the Elasticsearch cluster.
Troubleshooting
To analyze why rebalancing or allocation is not working correctly, you can run the following troubleshooting commands:
curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
You can retrieve unassigned.reason descriptions from https://www.elastic.co/guide/en/elasticsearch/reference/2.4/cat-shards.html
If a per cluster allocation restriction exists, you can check the existing cluster settings under cluster.routing:
curl -XGET 'localhost:9200/_cluster/settings?pretty=true'
Make sure that no cluster-based allocation restrictions are in place:
curl -XPUT 'localhost:9200/_cluster/settings' -d' { "transient": { "cluster.routing.allocation.enable" : "all" } } '
Make sure that no cluster-based relocation restrictions are in place:
curl -XPUT 'localhost:9200/_cluster/settings' -d' { "transient": { "cluster.routing.rebalance.enable" : "all" } } '
Make sure that no cluster-based rebalance restrictions are in place:
curl -XPUT 'localhost:9200/_cluster/settings' -d' { "transient": { "cluster.routing.allocation.allow_rebalance" : "always" } } '
To enable faster rebalancing before and after removing the node, or adding the node:
curl -XPUT 'localhost:9200/_cluster/settings' -d' { "transient": { "cluster.routing.allocation.cluster_concurrent_rebalance" : 10 } } '
Verify by retrieving the problematic index settings under index.routing.allocation:
curl -XGET 'localhost:9200/<index_name_goes_here>/_settings?pretty=true'
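When many shards are unassigned, it can help to summarize the `_cat/shards` output by reason rather than reading it line by line. The following is a small sketch under stated assumptions: the function name and the sample index names are hypothetical, and the input is expected to be the output of the `_cat/shards?h=index,shard,prirep,state,unassigned.reason` request shown above.

```python
from collections import Counter


def unassigned_reasons(cat_shards_text):
    """Count UNASSIGNED shards by their unassigned.reason value.

    Columns are expected in the order:
    index, shard, prirep, state, unassigned.reason
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            # The reason column may be empty for some rows.
            reason = fields[4] if len(fields) > 4 else "UNKNOWN"
            counts[reason] += 1
    return counts


# Illustrative output only (hypothetical index names).
SAMPLE_SHARDS = """\
example_index_a 0 p STARTED
example_index_a 0 r UNASSIGNED NODE_LEFT
example_index_a 1 r UNASSIGNED NODE_LEFT
example_index_b 0 p UNASSIGNED INDEX_CREATED
"""
```

A large `NODE_LEFT` count right after stopping a node is expected; other reasons are worth checking against the cluster and index settings described in this section.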
02-28-2020
03:41 PM
What are the steps to set up and configure APM for Events Service, when deploying Analytics on-prem?
Overview
The Events Service should have its own Java Agent and Machine Agent.
Step 1. How do I set up Machine Agents for Events Service nodes?
Step 2. Set up Application Agents for the Events Service
Step 3. Set Up an Application Agent for Elasticsearch
Step 4. Begin Monitoring
Troubleshooting
Where do I view logs?
How do I troubleshoot common problems?
Step 1. How do I set up Machine Agents for Events Service Nodes?
To monitor the hardware health of the Events Service nodes, this example uses a Machine Agent installation. For more details, see the AppDynamics Machine Agent installation documentation.
Link each Machine Agent with either:
the Events Service's API, or
the Events Service's Elasticsearch Java Agents
The unique host ID configuration option enables the linking.
Complete the following procedure:
Decide whether you are linking the Machine Agent to the Events Service API or the Events Service Elasticsearch.
Note the relevant uniqueHostId value that you configured in the vmoptions file. For example:
-Dappdynamics.agent.uniqueHostId=EventsService01
Edit <machine-agent-home>/conf/controller-info.xml such that <unique-host-id>EventsService01</unique-host-id> corresponds to that value.
Verify that the Java Agent and Machine Agent are associated by viewing the Applications > Tiers and Nodes tab in the Controller.
Step 2. Set up Application Agents for the Events Service
Common Application Agent Configurations
Locate your Events Service and Java Agent directories. For this example, use the following:
<Events-Service-Home> = /opt/appdynamics/events-service
<AppAgent-Home> = /opt/appdynamics/AppAgent
Collect Controller information by using the information in the following table:
Item | Example Data
hostname | <controller_host>
port | <controller_port>
account name | <account_name>
access key | <access_key>
name of the APM application that monitors the Events Service | EventsService
Edit <AppAgent-Home>/conf/controller-info.xml and set your values for <controller_host>, <controller_port>, <account_name>, and <access_key>.
Edit the events-service.vmoptions file under <Events-Service-Home>/processor/config
Add -javaagent:<AppAgent-Home>/javaagent.jar to the events-service.vmoptions file.
Stop the Events Service: ./bin/events-service.sh stop
Restart the Events Service to pick up the new configuration: ./bin/events-service.sh start -p ./conf/events-service-api-store.properties &
Step 3. Set Up an Application Agent for Elasticsearch
Add a security policy that allows the AppAgent files to be accessed from within the Elasticsearch process. Create the <Events-Service-Home>/conf/elasticsearch.policy file:
grant codeBase "file:<AppAgent-Home>/-" { permission java.security.AllPermission; };
Create and edit a new <Events-Service-Home>/conf/elasticsearch.vmoptions file.
Copy the existing <Events-Service-Home>/conf/events-service.vmoptions file.
Enable a separate vmoptions file for Elasticsearch by editing the <Events-Service-Home>/conf/events-service-api-store.properties property as follows: ad.es.jvm.options.name=elasticsearch.vmoptions
Append the following new properties to the new elasticsearch.vmoptions file:
-javaagent:<AppAgent-Home>/javaagent.jar
-Dappdynamics.socket.collection.bci.enable=true
-Dappdynamics.agent.tierName=EventsService-Elasticsearch
-Dappdynamics.agent.nodeName=Elasticsearch01
-Djava.security.policy=<Events-Service-Home>/conf/elasticsearch.policy
Specify an absolute (not relative) path in -Djava.security.policy.
Append the following new properties to events-service.vmoptions:
-javaagent:<AppAgent-Home>/javaagent.jar
-Dappdynamics.socket.collection.bci.enable=true
-Dappdynamics.agent.uniqueHostId=EventsService01
-Dappdynamics.agent.tierName=EventsService-API
-Dappdynamics.agent.nodeName=API01
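If you maintain several nodes, the Elasticsearch agent options can be generated rather than typed by hand. The sketch below is only an illustration: the function name is hypothetical, the option names are the ones listed in this step, and the absolute-path check simply enforces the requirement on -Djava.security.policy stated above.

```python
import posixpath


def elasticsearch_vmoptions(agent_home, policy_path, node_name):
    """Return the agent-related lines to append to elasticsearch.vmoptions."""
    # The policy path must be absolute, as required for -Djava.security.policy.
    if not posixpath.isabs(policy_path):
        raise ValueError("java.security.policy must be an absolute path")
    return [
        f"-javaagent:{posixpath.join(agent_home, 'javaagent.jar')}",
        "-Dappdynamics.socket.collection.bci.enable=true",
        "-Dappdynamics.agent.tierName=EventsService-Elasticsearch",
        f"-Dappdynamics.agent.nodeName={node_name}",
        f"-Djava.security.policy={policy_path}",
    ]


# Example using the directories assumed earlier in this article.
EXAMPLE_LINES = elasticsearch_vmoptions(
    "/opt/appdynamics/AppAgent",
    "/opt/appdynamics/events-service/conf/elasticsearch.policy",
    "Elasticsearch01",
)
```

Writing `"\n".join(EXAMPLE_LINES)` to the end of the file gives one option per line.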
Step 4. Begin Monitoring
Manually restart the single Events Service process that you are trying to instrument:
bin/events-service.sh stop
bin/events-service.sh start -p conf/events-service-api-store.properties &
Through the Controller UI, verify that both Application Agents are reporting.
To prevent the job from being terminated when you log off, detach the job that you just started from the active terminal by using the disown command:
# [1] bin/events-service.sh start -p conf/events-service-api-store.properties
disown %1
Troubleshooting
Where do I view logs?
How do I troubleshoot common problems?
Where do I view logs?
View the logs from their typical locations, shown in the following table:
Log | Location
API store | <Events-Service-Home>/logs/events-service-api-store.log
Elasticsearch | <Events-Service-Home>/logs/appdynamics-events-service-cluster.log
Java Application Agent | <AppAgent-Home>/ver<Your App Agent Version>/logs/<Your agent.nodeName>/agent<Latest App Agent Start Time>.log
How do I troubleshoot common problems?
Run troubleshooting commands when you encounter common problems.
Problem: When Elasticsearch is starting, you see vm.max_map_count errors
Command: sudo sysctl -w vm.max_map_count=262144
Problem: The system starts running out of file descriptors. Check /etc/security/limits.conf
Command (substitute the name of the user running Elasticsearch for the asterisk "*"):
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
02-27-2020
11:46 PM
Considerations for configuring rules to enforce Analytics trade-offs
Overview
Create the configuration rules for your on-premise Controller based on your analytics resource priorities and strategic trade-offs.
Note: These configuration rules are described in the Browser RUM (BRUM), Mobile RUM (MRUM), and IoT Monitoring documentation.
Analytics Trade-offs
As you add new applications to your Analytics deployment, you can control resource consumption by making strategic trade-offs.
Understanding Analytics trade-offs begins with this concept:
Anytime we acquire or generate more information to analyze, that costs something. Conversely, by choosing not to acquire or generate unimportant information, we can realize savings.
To make a trade-off, you take an action that either increases or decreases the amount of information that Analytics must process. Ask yourself:
Does the potential benefit of the action justify the cost, for the customer's particular situation?
Costs or savings can be in terms of:
Licenses
Hardware
Energy use
The following table shows important trade-offs to consider:
Action | Benefit | Cost | When is the cost justified?
Discover Ajax calls | More information about activity in the browser | Many more requests for the system to handle | When there is a web front-end that depends heavily on Ajax
Restrict discovery to pages of a specific type or pages that satisfy particular rules | Limit burden on the Events Service | Less information about activity in the browser | When knowing about the excluded information adds little value, and/or unnecessarily complicates your picture of application behavior
Restrict Analytics BT Events | Limit burden on the Events Service | Less information about BTs | When knowing about the excluded information adds little value, and/or unnecessarily complicates your picture of application behavior
Restrict Analytics log events | Limit burden on the Events Service | Less log information to search when troubleshooting | When knowing about the excluded information adds little value, and/or unnecessarily complicates your picture of application behavior
For more details, see Configuring Application Analytics.
02-27-2020
10:59 PM
The EUM Server and Events Service T-shirt Sizing Guide
Overview
The following tables contain data based on EUM Processor and Events Service load testing combinations under synthetic load. The tables have been normalized.
Once the EUM traffic profile has been estimated, you can use these maximum load measurement results to establish which T-shirt size the EUM Processor and Events Service should be.
How do I account for multiple traffic types?
If you plan to consume more than one type of traffic, then you should add EUM loads using beacons, and add Events Service loads using Normalized Performance Events.
Normalized EUM Processor and Events Service Load Testing Tables
Node Configuration and System Architecture
Events Service T-shirt Sizes
Shard Replication in Elasticsearch
Normalized EUM Processor and Events Service Load Testing Tables
Use the following maximum load measurement results to establish the EUM Processor and Events Service T-shirt sizes.
Web
Mobile
IoT
EUM Processor T-shirt Sizes
Events Service T-shirt Sizes
Web Maximum Load Testing Results
EUM Server Size | Events Svc. Size | EUM beacons/min | Max Browser Record events/min | Max Browser Session events/min | Max Normalized Performance Events/min
Small | Small | 60K | 60K | 3K | 60k browser records/min * 0.33 Normalized Performance Events/browser record event + 3k browser session events/min * 5 Normalized Performance Events/session event = 35k Normalized Performance Events/min
Medium | Medium | 120K | 120K | 6K | 120k browser records/min * 0.33 Normalized Performance Events/browser record event + 6k browser session events/min * 5 Normalized Performance Events/session event = 70k Normalized Performance Events/min
Large | Large | 300K | 300K | 12K | 300k browser records/min * 0.33 Normalized Performance Events/browser record event + 12k browser session events/min * 5 Normalized Performance Events/session event = 160k Normalized Performance Events/min
Mobile Maximum Load Testing Results
EUM Server Size | Events Svc. Size | Max EUM beacons/min | Max Mobile Snapshot events/min | Max Mobile Session events/min | Max Normalized Performance Events/min
Small | Large | 90K | 70K | 40K | 70k Mobile snapshots/min * 0.33 Normalized Performance Events/Mobile snapshot event + 40k Mobile session events/min * 5 Normalized Performance Events/session event = 223.3k Normalized Performance Events/min
Medium | XLarge | 130K | 110K | 61K | 110k Mobile snapshots/min * 0.33 Normalized Performance Events/Mobile snapshot event + 61k Mobile session events/min * 5 Normalized Performance Events/session event = 341.7k Normalized Performance Events/min
Large | XXLarge | 550K | 1.35M | 118K | 3.5M Mobile snapshots/min * 0.33 Normalized Performance Events/Mobile snapshot event + 118k Mobile session events/min * 5 Normalized Performance Events/session event = 1756.7k Normalized Performance Events/min
IoT Maximum Load Testing Results
EUM Server Size | Events Service Size | Maximum EUM beacons/min | Maximum Events Service IoT Records/minute | Normalized Performance Events/minute
Small | Medium | 110K | 500K | 500k IoT records/min * 0.33 Normalized Performance Events/IoT record event = 166.7k Normalized Performance Events/min
Medium | Large | 300K | 600K | 600k IoT records/min * 0.33 Normalized Performance Events/IoT record event = 200k Normalized Performance Events/min
Large | XLarge | 600K | 1M | 1M IoT records/min * 0.33 Normalized Performance Events/IoT record event = 333.3k Normalized Performance Events/min
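The conversion used throughout the load testing tables can be written as a short Python sketch. The function name is hypothetical. Note one assumption: the tables print the record weight as 0.33, but the published totals (for example 35k and 223.3k) only match when the weight is treated as exactly one third, so the sketch uses 1/3.

```python
# Weight of one record/snapshot/IoT event, in Normalized Performance Events.
# Printed as 0.33 in the tables; the totals match 1/3 (assumption).
RECORD_WEIGHT = 1 / 3
# Weight of one session event, in Normalized Performance Events.
SESSION_WEIGHT = 5


def normalized_events_per_min(records_per_min, sessions_per_min=0):
    """Convert record and session rates to Normalized Performance Events/min."""
    return records_per_min * RECORD_WEIGHT + sessions_per_min * SESSION_WEIGHT


# Rows taken from the tables in this guide.
web_small = normalized_events_per_min(60_000, 3_000)
mobile_small = normalized_events_per_min(70_000, 40_000)
iot_small = normalized_events_per_min(500_000)

# When you consume more than one traffic type, add the normalized loads.
combined = web_small + iot_small
```

IoT traffic has no session events, so only the record term applies there.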
EUM Processor T-shirt Sizes
T-shirt Size | EUM Recommended Instance Type | EUM Processor JVM Heap Size
Small | 4 core, 16GB RAM, disk 300GB 600IOPS (m4.xlarge) | 11GB
Medium | 8 core, 32GB RAM, disk 300GB 600IOPS (m4.2xlarge) | 30GB
Large | 16 core, 64GB RAM, disk 300GB 600IOPS (m4.4xlarge) | 50GB
Node Configuration and System Architecture
The Events Service node network setup used in this example has speeds of >= 1 GBPs. Latencies are similar to a switched network, such that:
Average SSD latencies are < 1.2ms per read/write operation
Average NVMe latencies are < 0.4ms per read/write operation
If SaaS deployment is not an option, consider splitting the deployment on the Application level into multiple accounts and multiple controllers.
AppDynamics strongly recommends using SSD-backed instances for Analytics—SAN is not recommended. This is because AppDynamics follows the official Elasticsearch hardware guidelines, configurations vary widely, and AppDynamics cannot guarantee that a particular SAN configuration is supported.
Finally, you should avoid network-attached storage (NAS). A NAS solution is often slower, displays larger latencies with a wider deviation in average latency, and is a single point of failure.
Events Service T-shirt Sizes
The following table shows the recommended number of nodes and node configuration for each T-shirt size.
See Stipulate throughput by license type in the Quick Method.
T-Shirt Size | Number of Nodes | Normalized Performance Events/minute | Recommended Node Configuration | SaaS Recommended or Required?
X-Small | 1 | 50000 | 4 core, SSD (ideally as NVMe) or HDD 7,200 RPM | No (Proof of Concept, Dev, and Demo)
Small | 3 | 100000 | 4 core, SSD (ideally as NVMe) (i2.xlarge) | No
Medium | 3 | 195761 | 8 core, SSD (ideally as NVMe) (i2.2xlarge) | No
Large | 5 | 284000 | 8 core, SSD (ideally as NVMe) (i3.2xlarge) | Recommended sometimes
XLarge | 10 | 438000 | 8-16 core, NVMe (i3.2xlarge) | Recommended
XXLarge | 20 | 600000 | 8-16 core, NVMe (i3.2xlarge). Depending on query load and other factors that impact performance, larger nodes may be more suitable | Required. On-premises deployments of this size are not supported. The overall deployment should be structured as multiple smaller deployments, each with its own Controller
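Choosing a size from the table above amounts to finding the first row whose throughput covers your estimated peak. The following is a minimal sketch; the function name is hypothetical, and the capacities and node counts are copied from the table.

```python
# (size, number of nodes, max Normalized Performance Events/minute)
EVENTS_SERVICE_SIZES = [
    ("X-Small", 1, 50_000),
    ("Small", 3, 100_000),
    ("Medium", 3, 195_761),
    ("Large", 5, 284_000),
    ("XLarge", 10, 438_000),
    ("XXLarge", 20, 600_000),
]


def pick_events_service_size(normalized_events_per_min):
    """Return (size, nodes) for the smallest size that covers the load.

    Returns None when the load exceeds the largest listed size, in which
    case the deployment should be split as described in the table notes.
    """
    for size, nodes, capacity in EVENTS_SERVICE_SIZES:
        if normalized_events_per_min <= capacity:
            return size, nodes
    return None
```

For example, a load of 223,300 Normalized Performance Events/min exceeds the Medium capacity of 195,761 and therefore maps to Large with 5 nodes.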
Once you determine your T-shirt size, refer to the non-virtual hardware specifications that correspond to the relevant EC2 instance size:
AWS EC2 i2
AWS EC2 i3
The Events Service should be on separate, dedicated server(s).
Shard Replication in Elasticsearch
We assume a replication factor of 1 for Elasticsearch. Performance tests show that there is an upper limit to the average CPU performance of a cluster. Due to the replication and synchronization of Elasticsearch node segments, the limit decreases as the number of nodes increases.
Risks and Benefits
Not enabling Elasticsearch replication has both risks and benefits:
Risks
The lack of redundancy highly increases the likelihood of data loss if any nodes in the Events Service cluster go down.
Benefits
CPU ingestion utilization decreases by approximately 55%
Data drive storage requirements decrease by approximately 50%
Data Replication
Since Elasticsearch builds in redundancy with replicas, there is no need for RAID configurations other than RAID 0. For this reason, Elasticsearch recommends using RAID 0 to increase write throughput.
If RAID replication (RAID 1, 3, or 5) is selected, AppDynamics does not provide support for disk performance or data integrity issues.
02-11-2020
02:22 PM
When planning instrumentation for deploying Analytics on-prem, how do I build the traffic profile?
Table of Contents
How do I understand my traffic for sizing?
How do I consider EUM usage when building my Events Service traffic profile?
How do I understand my traffic for sizing?
Understanding the characteristics of your traffic is the foundation for successful sizing. To build a traffic profile, answer the following questions.
If I have Business Transactions (BTs) in an APM application I want to enable Analytics on:
Is it a new or an existing APM application?
Do I have Log Analytics licenses?
Additional estimate methods
Am I seeking to enable Analytics on an existing or new APM application?
Existing APM application
Identify the BTs you need to instrument from the APM application page.
For each BT, count how many tiers, and on average how many requests are going through each tier for a single origin BT. Each tier for a single BT produces at least one event if a request passes through the tier and node.
Identify the peak number of BT calls per minute across all BTs, and sum the event counts for those BTs to estimate the peak events per minute. This enables you to size for the number of nodes and CPU.
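The estimate for an existing APM application can be sketched as a simple sum. Everything in the example data is hypothetical (BT names, call rates, and event counts); the events-per-call value stands for the number of tier segments a single originating request produces.

```python
def peak_events_per_minute(business_transactions):
    """Sum peak calls/min times events per call over all instrumented BTs."""
    total = 0
    for bt in business_transactions:
        total += bt["peak_calls_per_min"] * bt["events_per_call"]
    return total


# Hypothetical BTs for illustration only.
EXAMPLE_BTS = [
    {"name": "Checkout", "peak_calls_per_min": 1200, "events_per_call": 4},
    {"name": "Search", "peak_calls_per_min": 5000, "events_per_call": 2},
]

example_peak = peak_events_per_minute(EXAMPLE_BTS)
```

Summing each BT's own peak gives a conservative upper bound when the BTs do not all peak in the same minute.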
New APM application
For a more accurate estimate, instrument the application and then follow the steps for an existing application.
If the application is not instrumented, try to assess the number of requests coming through each tier by using:
count of log lines/timestamps
existing database aggregations
Q/A with application stakeholders
Figure 1 shows an example of how many BT events per tier are generated. Figure 2 shows the estimated number of events per APM BT.
A tier can be called multiple times by originating BT, thus producing multiple events. If the request performs asynchronous calls on a single node then it's possible that a single request on a single tier could output multiple segments (events). Each async thread call becomes its own segment in the Java Agent. In .NET, results from an async call could be merged into one or more segments (events).
Figure 1: Example of how many BT events per tier are generated
Figure 2: Estimated number of events per APM BT
Do I have Log Analytics licenses?
If so, then:
Collect log samples for a period of time per source.
Determine the start and end times of each log
Determine the file sizes, which determine the required disk size for the Events Service footprint (approximately file size * 1.7)
Determine the number of lines or timestamps. Each is counted as one event
Calculate the average number of events per minute for each log source
Avg number of lines per source number 1 = ((file 1 num of lines / (end time file 1 - start time file 1)) + … + (file N num of lines / (end time file N - start time file N))) / number of files in source number 1
Then, sum the averages for an overall average number of log events
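The log calculation above can be sketched in Python as follows. The function names and the sample values are hypothetical; the 1.7 multiplier is the approximate footprint factor given in this list.

```python
from datetime import datetime


def file_events_per_min(num_lines, start, end):
    """Events per minute for one log file (one line or timestamp = one event)."""
    minutes = (end - start).total_seconds() / 60
    return num_lines / minutes


def source_average(files):
    """Average the per-file rates for one log source.

    `files` is a list of (num_lines, start, end) tuples.
    """
    rates = [file_events_per_min(n, s, e) for n, s, e in files]
    return sum(rates) / len(rates)


def overall_log_events_per_min(sources):
    """Sum the per-source averages for the overall log event rate."""
    return sum(source_average(files) for files in sources)


def disk_footprint_bytes(total_file_bytes, factor=1.7):
    """Approximate Events Service disk footprint for the sampled logs."""
    return total_file_bytes * factor


# Hypothetical samples: two files in one source, one file in another.
t0 = datetime(2020, 1, 1, 0, 0)
SOURCE_1 = [(6000, t0, datetime(2020, 1, 1, 1, 0)),
            (3000, t0, datetime(2020, 1, 1, 0, 30))]
SOURCE_2 = [(1200, t0, datetime(2020, 1, 1, 0, 10))]
```

With these samples, source 1 averages 100 events/min and source 2 averages 120 events/min, so the overall rate is 220 events/min.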
Logs
You can also use these additional estimation methods:
What licenses do you have?
The limits on traffic that licenses specify serve as useful benchmarks when you estimate events later in the Instrument, Run, and Refine process.
What is the shape of traffic over time?
Is your traffic consistent or spiky?
What times of the day, week, month, season, or year produce changes?
You should plan your deployment to perform well at peak loads, or at least at the 90th percentile loads.
What are typical numbers for concurrent users?
Does this vary over time, and if so, what are the patterns?
What are typical patterns of user behavior?
Are users doing the exact same thing repeatedly, or taking diverse actions?
How are the patterns concentrated or spread out over time?
Mobile app crash rate
If any, what is the typical crash rate for your mobile apps?
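To plan for the 90th percentile load mentioned above, you can compute it from per-minute event counts. This sketch uses the nearest-rank definition of a percentile, which is one common choice and an assumption here; the sample counts are hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical events/minute over ten sampled minutes, with one spike.
EVENTS_PER_MIN = [1000, 1200, 900, 5000, 1100, 1300, 950, 1050, 1150, 1250]

p90_load = percentile_nearest_rank(EVENTS_PER_MIN, 90)
peak_load = max(EVENTS_PER_MIN)
```

In spiky traffic such as this sample, the 90th percentile and the absolute peak differ widely, which is why the shape of traffic over time matters for sizing.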
How do I consider End User Monitoring (EUM) usage when building my Events Service traffic profile?
Unlike Business Transaction (BT) or Log Analytics licenses, the number of EUM licenses does not translate into a good basis to estimate what size machine can support a particular deployment. EUM performance depends on the number of beacons.
To accurately estimate expected beacons, you must understand:
The millions of browser page views and/or IoT events allowed by the licenses can be distributed in many ways over the year. Temporal distribution of traffic is key.
Mobile licenses, by contrast, allow some thousands of mobile agents to connect per month—but any of those agents can send an unlimited number of requests. User behavior is key.
When building the traffic profile, you need detailed and accurate information.
The following will help you determine the projected EUM Analytics load:
Browser Real User Monitoring (BRUM)
EUM license types and their unit-to-beacon relationships
Browser Real User Monitoring (BRUM)
How many page views?
How many beacons per page view? Typically, it is one. Typically, each beacon would produce one session event and one browser request event.
Mobile Real User Monitoring (MRUM)
How many session events per user event? Typically, it is one.
How many MobileSnapshots per user event? Typically, it is one.
How many crashes in a worst-case scenario? Review the MobileCrashReport.
Internet of Things (IoT)
How many devices and device types?
What is a typical transmission rate per minute per device type?
This table lists the unit-to-beacon relationship for each EUM license type.
EUM License Type | Units | Unit-to-Beacon Relationship
BRUM | one million page views/year | One to one
IoT | TBD million IoT events/year | One to one
MRUM | 5K mobile agents/month can connect | No unit-beacon relationship can be derived because each mobile agent can send unlimited requests per month, and the total number of mobile beacons is not limited.
EUM license types and their unit-to-beacon relationships
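For license types with a one-to-one unit-to-beacon relationship, the license gives only an annual total, so a per-minute figure depends on how traffic is distributed. The sketch below shows the arithmetic; the function names are hypothetical, and the peak-to-average ratio is an input you would have to measure from your own traffic, not a value provided by the license.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def average_beacons_per_min(units_per_year, beacons_per_unit=1):
    """Average beacon rate if licensed units were spread evenly over a year."""
    return units_per_year * beacons_per_unit / MINUTES_PER_YEAR


def peak_beacons_per_min(units_per_year, peak_to_average, beacons_per_unit=1):
    """Scale the even-spread average by an observed peak-to-average ratio."""
    return average_beacons_per_min(units_per_year, beacons_per_unit) * peak_to_average


# One BRUM unit: one million page views/year, one beacon per page view.
brum_unit_average = average_beacons_per_min(1_000_000)
```

One million page views per year averages fewer than two beacons per minute, so real sizing is driven almost entirely by the peak-to-average ratio of your traffic.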
Note: For up-to-date licensing information for EUM Analytics, click here.
Note: By default, each EUM Analytics event type is limited to 50GB per day. You can adjust that limit with the ad.es.event.maxDailyEventTypeBytesQuota property in each Events Service node's properties file: <Events-Service-Home>/conf/events-service-api-store.properties
01-17-2020
09:12 PM
Overview
To deploy analytics successfully, you should understand the following concepts:
EUM and Analytics Events
How do Events Flow Through the System?
Beacons
Resource Timing Snapshots
Performance Considerations
Analytics Trade-offs
EUM and Analytics Events
In the AppDynamics model, each major service produces its own types of events. That includes services not relevant here such as Application Monitoring, Database Visibility, and Server Visibility.
Note: Sizing analytics for database visibility is treated as a special case. If you have questions, contact your AppDynamics representative.
In this example, we'll look at End User Monitoring and Analytics services, and their respective events.
Service | Event Definition
End User Monitoring | Change in the state of a web, mobile, or IoT front-end
Analytics | Change in the state of a Business Transaction, an application log, or a custom-defined event created for analytics purposes
Sometimes it makes sense to distinguish between EUM and Analytics events. However, in this discussion, we'll treat both types of events as Analytics events.
Event Categorization by Behavior
Another way to categorize events is to consider how they behave in the system. This yields two broad categories of event types described in the following table:
Event Category | Event Type | How the Event Behaves | Examples
Upsert, also known as Update | Transaction Analytics, Browser Session, Mobile Session | Creates something new that is modifiable, or modifies that thing; often events with multiple segments or components | An Upsert could modify or create a Business Transaction (BT), which has multiple segments, each associated with a different tier; or a browser session, which has multiple variables, such as one that is modified when the user clicks a radio button
Publish | Log, Snapshot, Crash Report, IoT, Custom API | Creates immutable artifacts | A Publish event could create a log entry
How the Events Service is provisioned in terms of hardware, OS tuning, and network tuning, determines the limits of Events Service's ability to handle incoming events, and the quality of its performance. Hardware includes cores (CPUs), memory, and disk storage.
How do Events flow through the system?
What Flows into the Events Service from the EUM Server?
What Flows into the Events Service from the Analytics Agent?
Controller Queries
Events come to the Events Service in two flows:
from the EUM Server
from the Analytics Agent
The Events Service also handles queries from the Controller. Even though Controller queries are not events, they still contribute to load on the Events Service.
What flows into the Events Service from the EUM Server?
Demand on the EUM Server consists of the sum total of clicks and other actions that occur in browsers, or on mobile or IoT devices. We cannot limit that demand because we have no control over what flows into the EUM Server.
However, we do control what flows out of the EUM Server, and thus can limit the demand that the Events Service and other parts of the system must handle.
The process is:
AppDynamics agents in browsers, and on mobile and IoT devices, generate a snippet of data called a beacon for every click or other action.
The agents send the beacons to the EUM Server.
The EUM Server determines whether the beacons are valid, based on configurable rules.
Based on beacon content, the EUM Server generates snapshots, session information, and metadata and then sends these to the Events Service.
Based on configurable rules, the EUM Server decides whether the event that the beacon represents should be discovered or not. Events that are not discovered are dropped from the system; the Events Service is never aware of them.
What flows into the Events Service from the Analytics Agent?
The Analytics Agent:
Forwards Business Transaction events to the Events Service
Generates Log events and sends them to the Events Service
Controller Queries
The Controller persists metadata, which includes license and account information, and event schemas. It then synchronizes the metadata with the Events Service. This enables AppDynamics to process licensing rules and limits, honor expirations, and authenticate publish/query calls using the account name and access key.
When the user creates Analytics queries in ADQL, the Controller sends them to the Events Service. These are Controller Queries.
Controller Queries also originate from EUM and DBMon dashboards and screens.
Beacons
Beacons, Events, and Sessions
Beacons from Browser Actions
Beacons from Mobile and IoT Actions
When an AppDynamics Agent, or SDK instrumented code, detects an action in the browser or on a mobile or IoT device, it sends a beacon (a JSON file which contains keys and values) to the EUM Server.
Beacons, Events, and Sessions
A beacon is a network request from an EUM agent that includes an agent ID and metadata about the events or activities that were reported.
A session consists of a series of beacons that share the same agent ID and which occurred within a configured period of time. The default configured period is five minutes. A session ends and a new one is created if there was no activity during that configured period of time.
For example, if beacons with the same agent ID keep arriving with gaps shorter than the default configured period of five minutes, the events reported in those beacons are included in the same session. If the next event arrives after the configured period of inactivity has elapsed, however, a new session is created.
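The session rule can be sketched as follows. This is a minimal illustration, assuming each beacon is reduced to a hypothetical `(agent_id, timestamp_in_seconds)` pair; the function name is made up, and treating a gap of exactly five minutes as still belonging to the same session is also an assumption.

```python
from collections import defaultdict

SESSION_TIMEOUT_SECONDS = 5 * 60  # default inactivity period described above


def group_into_sessions(beacons, timeout=SESSION_TIMEOUT_SECONDS):
    """Group (agent_id, timestamp) pairs into per-agent sessions.

    A new session starts when the gap since the agent's previous
    beacon is longer than the inactivity timeout.
    """
    sessions = defaultdict(list)  # agent_id -> list of sessions (lists of timestamps)
    for agent_id, ts in sorted(beacons, key=lambda b: (b[0], b[1])):
        agent_sessions = sessions[agent_id]
        if agent_sessions and ts - agent_sessions[-1][-1] <= timeout:
            agent_sessions[-1].append(ts)   # still within the active session
        else:
            agent_sessions.append([ts])     # inactivity gap exceeded: new session
    return dict(sessions)


beacons = [("agent-1", 0), ("agent-1", 120), ("agent-1", 500), ("agent-2", 60)]
sessions = group_into_sessions(beacons)
print(sessions["agent-1"])  # [[0, 120], [500]]
```

The beacon at 500 seconds arrives 380 seconds after the previous one, which exceeds the 300-second timeout, so it opens a second session for `agent-1`.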
Beacons from Browser Actions
The JavaScript Agent generates a beacon for every browser action. The EUM Server determines which page type the beacon represents. Page types are virtual page, base page, iFrame, and Ajax call.
According to the default page naming rules:
Virtual and base pages and iFrames are discovered
Ajax calls are not discovered, because they generate more requests than other page types
When you want to constrain which pages are discovered, work with page naming rules. For example, to direct the system to discover only pages from the host server1, you would write the rule hostname = server1.
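The default discovery behavior and a host restriction can be modeled in a few lines. This is only a conceptual sketch: the real rules are configured in the product rather than written as code, and the function and parameter names here are hypothetical.

```python
# Page types discovered by the default page naming rules described above.
DEFAULT_DISCOVERED_PAGE_TYPES = {"virtual page", "base page", "iFrame"}


def is_discovered(page_type, hostname, allowed_hostname=None):
    """Apply the default page-type rule, optionally restricted to one host."""
    if page_type not in DEFAULT_DISCOVERED_PAGE_TYPES:
        return False  # Ajax calls are not discovered by default
    if allowed_hostname is not None and hostname != allowed_hostname:
        return False  # excluded by the hostname rule
    return True


print(is_discovered("base page", "server1"))             # True
print(is_discovered("Ajax call", "server1"))             # False
print(is_discovered("base page", "server2", "server1"))  # False
```

The third call shows the effect of a hostname rule: a page type that is normally discovered is excluded because it does not come from server1.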
For more details, see:
Configure Page Identification and Naming for its description of virtual and base pages, and iFrames
Beacons from Mobile and IoT Actions
The Mobile Agent generates a beacon for every action or event on a mobile device. The IoT Java Agent generates a beacon for every action on an IoT device.
Use the IoT C/C++ SDK, the IoT Java SDK, or the IoT REST API to specify the events to send.
For more details, see:
Instrument Applications with the IoT Java SDK
Instrument Applications with the IoT C/C++ SDK
Instrument Applications with the IoT REST APIs
The EUM Server inspects the beacons and applies rules based on the characteristics and content of the network requests.
Resource Timing Snapshots
AppDynamics Agents collect load times for each resource on a web page and send the load times to the EUM Server. The EUM Server packages up a percentage of the page-load events as a Resource Timing Snapshot and sends that to Analytics. Analytics uses Resource Timing Snapshots to tell AppDynamics users which resources use the most time to load.
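To illustrate the kind of question Resource Timing Snapshots help answer, the sketch below ranks resources by average load time. It is not how Analytics computes its results: the snapshot shape (a dict mapping a resource URL to a load time in milliseconds), the function name, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def slowest_resources(snapshots, top_n=3):
    """Rank resources by average load time across snapshots.

    Each snapshot is a dict mapping a resource URL to its load time in ms.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for snapshot in snapshots:
        for resource, load_ms in snapshot.items():
            totals[resource] += load_ms
            counts[resource] += 1
    averages = {r: totals[r] / counts[r] for r in totals}
    # Sort by average load time, slowest first, and keep the top entries.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top_n]


snapshots = [
    {"/app.js": 420, "/logo.png": 80, "/styles.css": 60},
    {"/app.js": 380, "/logo.png": 120},
]
print(slowest_resources(snapshots, top_n=2))
# [('/app.js', 400.0), ('/logo.png', 100.0)]
```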
Performance considerations
For browser traffic, the more resources a web page contains, the more CPU the EUM processor uses.
Upserts are more expensive on the Events Service side.
Analytics Trade-offs
As you add new applications to your Analytics deployment, you can control resource consumption by making strategic trade-offs.
Understanding Analytics trade-offs begins with the following concept:
Anytime we acquire or generate more information to analyze, that costs something. Conversely, by choosing not to acquire or generate unimportant information, we can realize savings.
To make a trade-off, you take an action that either increases or decreases the amount of information that Analytics must process.
Ask yourself:
Does the potential benefit of the action justify the cost, for the customer's particular situation?
Costs or savings can be in terms of:
Licenses
Hardware
Energy use
The following table shows important trade-offs to consider:
Action | Benefit | Cost | When is the cost justified? |
Discover Ajax calls | More information about activity in the browser | Many more requests for the system to handle | When there is a web front-end that depends heavily on Ajax |
Restrict discovery to pages of a specific type or pages that satisfy particular rules | Limit burden on the Events Service | Less information about activity in the browser | When knowing about the excluded information adds little value, and/or unnecessarily complicates your picture of application behavior |
Restrict Analytics BT events | Limit burden on the Events Service | Less information about BTs | |
Restrict Analytics log events | Limit burden on the Events Service | Less log information to search when troubleshooting | |
For more details, see Configuring Application Analytics.