Anonymous
Not applicable

Updated 7/23/18

 

Issue

I am receiving the following notification on the Controller.

 

CONTROLLER_METRIC_DATA_BUFFER_OVERFLOW 

 

Explanation

This message indicates that the buffer used to hold metric data filled up before its contents could be flushed to the underlying data store. The buffer is an in-memory cache in the Controller Appserver that stores metrics temporarily; it is emptied periodically, once the metric data has been written to the persistence store. If the buffer becomes full before it is emptied, the metric data buffer overflow error is generated. When this occurs, the Controller drops new metrics until space is available in the buffer.
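
The drop-on-full behavior described above can be illustrated with a minimal sketch. The class and method names (`MetricBuffer`, `offer`, `flush`) are hypothetical and are not the Controller's actual implementation; the sketch only models a fixed-capacity in-memory buffer that rejects new data points until a flush frees space.

```python
from collections import deque


class MetricBuffer:
    """Hypothetical fixed-capacity buffer; not the Controller's real code."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.points = deque()
        self.dropped = 0  # data points rejected because the buffer was full

    def offer(self, point):
        """Store a data point, or drop it if the buffer is already full."""
        if len(self.points) >= self.capacity:
            self.dropped += 1
            return False
        self.points.append(point)
        return True

    def flush(self, max_points):
        """Remove up to max_points, as if they were written to storage."""
        written = min(max_points, len(self.points))
        for _ in range(written):
            self.points.popleft()
        return written
```

In this model, a data point offered while the buffer is full is lost, and new points are accepted again only after `flush` has removed some of the stored ones.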

 

The buffer can be intermittently or consistently full. The main factors that affect metric flow are the metric ingestion rate and the I/O throughput (and latency) of the underlying storage system. The buffers are sized according to the Controller profile, but the appropriate size can vary depending on the environment. If the buffers are consistently full, the incoming metrics rate is most likely too high for the configured buffer size. If the buffers are intermittently full, either there was a sudden spike in the metrics rate or the storage I/O throughput is insufficient to flush the metrics in a timely manner, or both. It’s usually the latter, which is often seen with SAN-based storage.

 

Example:

Let's say the metrics data buffer is sized at 300 MB and can hold approximately 1 million metric data points.

 

1) If the incoming metrics rate is <= 1 million/min and the throughput for writing metrics to storage is 1+ million/min, then the buffers will not overflow. [Ideal situation]

2) If the incoming metrics rate is 2 million/min, then the buffer will fill up every minute because it can hold only 1 million data points at any given time, and the extra metrics will potentially be dropped. [Buffer not sized correctly]

3) If the incoming metrics rate is ~1 million/min and the disk write throughput is not fast enough, then the buffers have to hold more than 1 minute's worth of data and can fill up because they are not being flushed fast enough to keep up with the incoming rate. [Slow disk write]
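
The three cases above can be compared with a rough per-minute model. This is only a sketch under stated assumptions: the function name `simulate` is hypothetical, each minute's data is assumed to arrive before that minute's flush, and the 800,000/min flush rate used for the slow-disk case is an illustrative number that does not come from the article.

```python
def simulate(capacity, ingest_per_min, flush_per_min, minutes):
    """Return the total data points dropped under a coarse per-minute model."""
    backlog = 0   # data points still in the buffer at the start of a minute
    dropped = 0
    for _ in range(minutes):
        free = capacity - backlog
        accepted = min(ingest_per_min, free)      # only what fits is kept
        dropped += ingest_per_min - accepted      # the rest is dropped
        written = min(flush_per_min, backlog + accepted)
        backlog = backlog + accepted - written    # carried into the next minute
    return dropped


CAPACITY = 1_000_000  # buffer holds ~1 million data points, as in the example

ideal = simulate(CAPACITY, 1_000_000, 1_000_000, minutes=5)       # case 1
undersized = simulate(CAPACITY, 2_000_000, 2_000_000, minutes=5)  # case 2
slow_disk = simulate(CAPACITY, 1_000_000, 800_000, minutes=5)     # case 3 (assumed flush rate)

print(ideal, undersized, slow_disk)
```

Under this model the first case drops nothing, the second drops data every minute regardless of write speed, and the third starts dropping data once the unflushed backlog eats into the space needed for the next minute's metrics.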

 

Solution

1) If it’s determined that the metrics buffer is not sized properly, increasing the buffer size will fix the problem. As an approximate rule, we size the metrics buffer at 300-400 MB per one million metrics/min of incoming metrics rate. This figure includes extra space to hold more than one minute's worth of metric data. The buffer uses the Controller's heap memory; therefore, it’s important that the Controller's host has enough (reserved) RAM available to allocate to the Controller’s heap if required.
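
The rule of thumb above amounts to simple arithmetic, sketched here. The function name `estimate_buffer_size_mb` and the 2.5 million/min input are hypothetical; the result is only a starting estimate in MB, and the unit expected by the Controller setting itself is not stated in this article, so verify it before applying a value.

```python
def estimate_buffer_size_mb(metrics_per_min, mb_per_million=(300, 400)):
    """Return a (low, high) buffer size estimate in MB.

    Uses the approximate rule of 300-400 MB per one million metrics/min.
    """
    low, high = mb_per_million
    millions = metrics_per_min / 1_000_000
    return low * millions, high * millions


# Example: a Controller receiving an assumed 2.5 million metrics per minute.
low_mb, high_mb = estimate_buffer_size_mb(2_500_000)
print(f"Estimated buffer size: {low_mb:.0f}-{high_mb:.0f} MB")
```

Because the buffer lives in the Controller's heap, the heap and the host's RAM need headroom for whatever value is chosen.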

 

If you have an on-prem Controller, log out of your existing account and log in to the Controller's admin.jsp page at the URL <host>/controller/admin.jsp. Set the Controller setting "metrics.buffer.size" to a higher value and restart the Controller server. The buffers are sized at Controller startup, so any change to the buffer size requires a Controller Appserver restart.

 

If you have a SaaS Controller, the buffer sizes are usually set appropriately but a sudden spike can lead to buffer overflow. If you notice overflow notifications, contact the AppDynamics Support team.

 

2) If the metrics buffer size is correctly set, then most likely the underlying cause of buffer overflow is slow disk I/O throughput. Check the Controller profile, sizing and disk I/O requirements outlined here: Controller System Requirements

 

If you need further assistance, contact the AppDynamics Support team.

Comments
Harish.Kumar
AppDynamics Team

Pamela,

 

Nice article! Regarding "This change does not require server restart.", it's not a hot property and requires an appserver restart. Please update the article accordingly.

 

Thanks,

Harish

Anonymous
Not applicable

Thanks Harish! We appreciate the correction :) 

 

I've updated the article to say "This change requires a server restart."

 

 

Simon.Dixon
Discoverer

Hi,

 

Am I correct in assuming that the "Server restart" refers to the Controller?

Or is it the Agent Host?

 

We are getting the following on our servers:

2018-01-22 06:56:18.5601 117288 w3wp 2 28 Warn MetricPoller Coordinator connection problem: System.IndexOutOfRangeException: Index was outside the bounds of the array.

 

Cheers

Simon

Harish.Kumar
AppDynamics Team

Hi Simon, 

 

Yes, your understanding is correct. This property change requires a Controller restart.

 

Thanks,

Harish

Anonymous
Not applicable

Hi 

The article is informative, and thanks for the detailed explanation.

 

VInodkumarmvn
Developer

 
Nina.Wolinsky
Community Manager

Thanks for the feedback @Anonymous! 

Version history
Last update:
‎09-14-2018 11:26 AM