I am receiving the following notification on the Controller.
This message indicates that the data buffer used to store the metrics is full before these can be flushed to the underlying data store. The buffer is an in-memory cache used to store metrics temporarily in the Controller Appserver and is emptied periodically after the metric data is written to the persistence store. If the buffer becomes full before it is emptied, then the metric data buffer overflow error is generated. When this occurs, the new metrics are dropped by the Controller until space is available in the buffer.
The buffer can be intermittently or consistently full. The main determining factors that affect this metric flow are the metric ingestion rate and I/O throughput (and latency) of the underlying storage system. The buffers are sized according to the Controller profile but can vary depending upon the environment. If the buffers are consistently full, then it mostly means the incoming metrics rate is high and buffers are not sized accordingly. If the buffers are intermittently full, it means there is a sudden spike in the metrics rate and/or the I/O throughput of storage is not sufficient enough to flush the metrics in a timely manner. It’s usually the latter and is often seen in SAN-based storages.
Let's say the metrics data buffer is sized as 300MB and it can hold approximately 1 million metric data points.
1) If each minute metrics rate is <= 1 million/min and the throughput to write metrics to storage is 1+ million/min, then the buffers will not overflow. [Ideal situation]
2) If the incoming metrics rate is 2 million/min, then the buffer will be full every minute because buffers can only hold 1 million at any given time and extra metrics will potentially be dropped. [Buffer not sized correctly]
3) If the incoming metrics rate is ~1 million/min and the disk write throughput is not fast enough, then buffers would still be required to hold data worth more than 1 minute and potentially will get full since they’re not getting flushed to keep up with the incoming rate. [Slow disk write]
1) If it’s determined that the metrics buffer is not sized properly, increasing the buffer size will fix the problem. The approximate calculation we use for metrics buffer size is a 300-400 MB per one million metrics/min metrics rate. This size considers extra space to hold metrics data for 1+ minute(s) worth of data. The buffer uses the Controller's heap memory; therefore, it’s important that the Controller's host has enough RAM (reserved) available that can be allocated to the Controller’s heap if required.
If you have an on-prem Controller, log in to the admin.jsp page of the Controller by logging out of the existing account and going to url <host>/controller/admin.jsp. Set the value for the Controller setting = "metrics.buffer.size" to a higher value and restart the Controller server. The buffers are sized at the Controller startup, so any change in buffer requires a Controller Appserver restart.
If you have a SaaS Controller, the buffer sizes are usually set appropriately but a sudden spike can lead to buffer overflow. If you notice overflow notifications, contact the AppDynamics Support team.
2) If the metrics buffer size is correctly set, then most likely the underlying cause of buffer overflow is slow disk I/O throughput. Check the Controller profile, sizing and disk I/O requirements outlined here: Controller System Requirements
If you need further assistance, contact AppDynamics Support team.