Cisco AppDynamics Community

Georgiy.Chigrichenko · ‎02-27-2020

The EUM Server and Events Service T-shirt Sizing Guide

Overview

The following tables contain data based on EUM Processor and Events Service load testing combinations under synthetic load. The tables have been normalized.

Once the EUM traffic profile has been estimated, you can use these maximum load measurement results to establish which T-shirt size the EUM Processor and Events Service should be.

How do I account for multiple traffic types?

If you plan to consume more than one type of traffic, add up the EUM loads in beacons per minute, and add up the Events Service loads in Normalized Performance Events per minute.

Normalized EUM Processor and Events Service Load Testing Tables
Node Configuration and System Architecture
Events Service T-shirt Sizes
Shard Replication in Elasticsearch

Normalized EUM Processor and Events Service Load Testing Tables

Use the following maximum load measurement results to establish the EUM Processor and Events Service T-shirt sizes.

Web
Mobile
IoT
EUM Processor T-shirt Sizes
Events Service T-shirt Sizes

Web Maximum Load Testing Results

EUM Server Size	Events Service Size	Max EUM beacons/min	Max Browser Record events/min	Max Browser Session events/min	Max Normalized Performance Events/min
Small	Small	60K	60K	3K	60k browser records/min * 0.33 Normalized Performance Events/browser record event + 3k browser session events/min * 5 Normalized Performance Events/session event = 35k Normalized Performance Events/min
Medium	Medium	120K	120K	6K	120k browser records/min * 0.33 Normalized Performance Events/browser record event + 6k browser session events/min * 5 Normalized Performance Events/session event = 70k Normalized Performance Events/min
Large	Large	300K	300K	12K	300k browser records/min * 0.33 Normalized Performance Events/browser record event + 12k browser session events/min * 5 Normalized Performance Events/session event = 160k Normalized Performance Events/min

Mobile Maximum Load Testing Results

EUM Server Size	Events Service Size	Max EUM beacons/min	Max Mobile Snapshot events/min	Max Mobile Session events/min	Max Normalized Performance Events/min
Small	Large	90K	70K	40K	70k Mobile snapshots/min * 0.33 Normalized Performance Events/Mobile snapshot event + 40k Mobile session events/min * 5 Normalized Performance Events/session event = 223.3k Normalized Performance Events/min
Medium	XLarge	130K	110K	61K	110k Mobile snapshots/min * 0.33 Normalized Performance Events/Mobile snapshot event + 61k Mobile session events/min * 5 Normalized Performance Events/session event = 341.7k Normalized Performance Events/min
Large	XXLarge	550K	1.35M	118K	3.5M Mobile snapshots/min * 0.33 Normalized Performance Events/Mobile snapshot event + 118k Mobile session events/min * 5 Normalized Performance Events/session event = 1756.7k Normalized Performance Events/min

IoT Maximum Load Testing Results

EUM Server Size	Events Service Size	Max EUM beacons/min	Max IoT Records/min	Max Normalized Performance Events/min
Small	Medium	110K	500K	500k IoT records/min * 0.33 Normalized Performance Events/IoT record event = 166.7k Normalized Performance Events/min
Medium	Large	300K	600K	600k IoT records/min * 0.33 Normalized Performance Events/IoT record event = 200k Normalized Performance Events/min
Large	XLarge	600K	1M	1M IoT records/min * 0.33 Normalized Performance Events/IoT record event = 333.3k Normalized Performance Events/min
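The conversions used in the three load-testing tables can be sketched in a few lines of Python. This is only an illustration, not an AppDynamics tool: the function name and the mixed traffic profile are hypothetical. The tables print a factor of 0.33 for record-type events, but their totals match 1/3, so the sketch assumes 1/3.

```python
# Conversion factors as read from the load-testing tables.
# Assumption: the printed 0.33 is a rounded 1/3, since the published
# totals (for example 223.3k and 166.7k) only match 1/3.
RECORD_FACTOR = 1 / 3   # browser record, mobile snapshot, or IoT record
SESSION_FACTOR = 5      # browser or mobile session event


def normalized_events(records_per_min, sessions_per_min=0):
    """Return Normalized Performance Events/min for one traffic type."""
    return records_per_min * RECORD_FACTOR + sessions_per_min * SESSION_FACTOR


# Hypothetical mixed profile: web and IoT traffic on one Events Service.
web = normalized_events(60_000, 3_000)   # Small web row
iot = normalized_events(500_000)         # Small IoT row (IoT has no sessions)
total = web + iot                        # loads from several traffic types add up

print(round(web), round(iot), round(total))
```

Summing the per-type results is how the multiple-traffic-type guidance at the top of this guide applies to the Events Service side. EUM beacons per minute would be summed separately.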

EUM Processor T-shirt Sizes

T-shirt Size	EUM Recommended Instance Type	EUM Processor JVM Heap Size
Small	4 core, 16GB RAM, disk 300GB 600IOPS (m4.xlarge)	11GB
Medium	8 core, 32GB RAM, disk 300GB 600IOPS (m4.2xlarge)	30GB
Large	16 core, 64GB RAM, disk 300GB 600IOPS (m4.4xlarge)	50GB

Node Configuration and System Architecture

The Events Service nodes in this example are connected by a network with speeds of at least 1 Gbps. Latencies are similar to a switched network, such that:

Average SSD latencies are < 1.2 ms per read/write operation
Average NVMe latencies are < 0.4 ms per read/write operation

If a SaaS deployment is not an option, consider splitting the deployment at the application level into multiple accounts and multiple Controllers.

AppDynamics strongly recommends using SSD-backed instances for Analytics; SAN is not recommended. AppDynamics follows the official Elasticsearch hardware guidelines, and because SAN configurations vary widely, AppDynamics cannot guarantee that a particular SAN configuration is supported.

Finally, you should avoid network-attached storage (NAS). A NAS solution is often slower, displays larger latencies with a wider deviation in average latency, and is a single point of failure.

Events Service T-shirt Sizes

The following table shows the recommended number of nodes and node configuration for each T-shirt size.

See Stipulate throughput by license type in the Quick Method.

T-shirt Size	Number of Nodes	Normalized Performance Events/minute	Recommended Node Configuration	SaaS Recommended or Required?
X-Small	1	50000	4 core, SSD (ideally NVMe) or HDD 7,200 RPM	No. For Proof of Concept, Dev, and Demo.
Small	3	100000	4 core, SSD (ideally NVMe) (i2.xlarge)	No
Medium	3	195761	8 core, SSD (ideally NVMe) (i2.2xlarge)	No
Large	5	284000	8 core, SSD (ideally NVMe) (i3.2xlarge)	Sometimes recommended
XLarge	10	438000	8-16 core, NVMe (i3.2xlarge)	Recommended
XXLarge	20	600000	8-16 core, NVMe (i3.2xlarge). Depending on query load and other factors that impact performance, larger nodes may be more suitable.	Required. On-premises deployments of this size are not supported. The overall deployment should be structured as multiple smaller deployments, each with its own Controller.
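As a rough illustration, the table above can be treated as a lookup from a Normalized Performance Events/minute load to the smallest size that covers it. The function name and the include_poc flag are hypothetical, and the sketch looks only at the events-per-minute column. The traffic-specific load-testing tables sometimes pair a load with a larger size, so the result is best read as a lower bound to cross-check against them.

```python
# (size, nodes, max Normalized Performance Events/min), copied from the table.
EVENTS_SERVICE_SIZES = [
    ("X-Small", 1, 50_000),
    ("Small", 3, 100_000),
    ("Medium", 3, 195_761),
    ("Large", 5, 284_000),
    ("XLarge", 10, 438_000),
    ("XXLarge", 20, 600_000),
]


def pick_events_service_size(events_per_min, include_poc=False):
    """Return (size, nodes) for the smallest size covering the load, else None.

    X-Small is skipped unless include_poc is True, because the table lists
    it for Proof of Concept, Dev, and Demo use.
    """
    for size, nodes, capacity in EVENTS_SERVICE_SIZES:
        if size == "X-Small" and not include_poc:
            continue
        if events_per_min <= capacity:
            return size, nodes
    return None  # above the XXLarge figure: no single row of the table fits


print(pick_events_service_size(223_333))   # Small mobile profile
print(pick_events_service_size(166_667))   # Small IoT profile
```

A None result means the load exceeds the largest row, which is the case where the table calls for multiple smaller deployments, each with its own Controller.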

Once you determine your T-shirt size, refer to the non-virtual hardware specifications that correspond to the relevant EC2 instance size.

The Events Service should be on separate, dedicated server(s).

Shard Replication in Elasticsearch

We assume a replication factor of 1 for Elasticsearch. Performance tests show that there is an upper limit to the average CPU performance of a cluster. Due to the replication and synchronization of Elasticsearch node segments, the limit decreases as the number of nodes increases.

Risks and Benefits

Not enabling Elasticsearch replication has both risks and benefits:

Risks

The lack of redundancy greatly increases the likelihood of data loss if any node in the Events Service cluster goes down.

Benefits

CPU utilization during ingestion decreases by approximately 55%
Data drive storage requirements decrease by approximately 50%
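The storage figure follows from how replicas work: with a replication factor of 1, every shard is stored twice, once as a primary and once as a replica. A minimal sketch of that arithmetic, using a hypothetical 2,000 GB of primary data and ignoring index overhead and free-space headroom:

```python
def total_storage_gb(primary_data_gb, replicas=1):
    """Disk needed across the cluster: one primary copy plus `replicas` copies."""
    return primary_data_gb * (1 + replicas)


# Hypothetical data volume, for illustration only.
with_replica = total_storage_gb(2_000, replicas=1)
without_replica = total_storage_gb(2_000, replicas=0)
saving = 1 - without_replica / with_replica

print(with_replica, without_replica, f"{saving:.0%}")
```

The CPU figure of approximately 55% comes from the load tests and cannot be derived the same way.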

Data Replication

Because Elasticsearch builds in redundancy with replicas, there is no need for RAID configurations other than RAID 0. For this reason, Elasticsearch recommends using RAID 0 to increase write throughput.

If a RAID level that replicates data (RAID 1, 3, or 5) is selected, AppDynamics does not provide support for disk performance or data integrity issues.