Cisco AppDynamics Community

Noopur.Tibdiwal · 09-06-2023

This article discusses some of the most common issues faced when using the Linux-based Private Synthetic Agent (PSA).

What are the prerequisites for debugging Linux private Synthetic Agent issues?
How do I capture PSA logs to further troubleshoot issues?
What errors arise from unsupported Kubernetes versions?
How do I install PSA on a machine without an internet connection?
How do I resolve a recurring ‘Test Agent Failed to Post Result’ error?
How do I resolve the 'DNS resolution failed (ERROR)'?
How do I resolve the error thrown when cluster-level permissions are missing?
How do I resolve a Heimdall log error?
How do I resolve a Heimdall error on Docker-based PSA?

What are the prerequisites for debugging Linux private Synthetic Agent issues?

Make sure PSA is deployed on an officially supported platform and that the prerequisites and hardware requirements are met:

See Install the Private Synthetic Agent (Web and API Monitoring) in the documentation, under End User Monitoring > Synthetic Monitoring.
Currently, the only kernel architecture supported for installing PSA (Web Monitoring and API Monitoring) is x86-64, which is also referred to as x64, x86_64, AMD64, and Intel 64.

Back to TOC

How do I capture PSA logs to further troubleshoot issues?

To capture PSA logs, save the pod details to separate files with the commands below, following the notes after them:

kubectl get pods --namespace <namespace> > {YOUR_PREFERRED_PATH}/pods-status.txt

kubectl get pods -o wide --all-namespaces > {YOUR_PREFERRED_PATH}/pods-status_wide.txt

kubectl describe pod -n <namespace> <pod-name> > {YOUR_PREFERRED_PATH}/describe-pod-<pod-name>.txt

kubectl logs <pod-name> --namespace <namespace> > {YOUR_PREFERRED_PATH}/logs-pod-<pod-name>.txt

Notes:

Replace <pod-name> and <namespace> with your own values.
By default, <namespace> is usually measurement.
The first command lists every <pod-name>.
Run commands 3 and 4 for each <pod-name> listed by command 1, and write each output to a separate file so that one pod's output does not overwrite another's.
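
The four commands above can be combined so that every pod gets its own describe file and log file. The following is a minimal sketch, not an official AppDynamics script: the function name capture_psa_logs and its two arguments (namespace and output directory) are assumptions made for illustration.

```shell
# Hypothetical helper (not shipped with PSA): capture pod status, descriptions, and logs.
# Usage: capture_psa_logs <namespace> <output-dir>
capture_psa_logs() {
  ns="$1"
  out="$2"
  mkdir -p "$out" || return 1

  # Commands 1 and 2: pod status for the namespace and for the whole cluster.
  kubectl get pods --namespace "$ns" > "$out/pods-status.txt"
  kubectl get pods -o wide --all-namespaces > "$out/pods-status_wide.txt"

  # Commands 3 and 4: one describe file and one log file per pod.
  # "-o name" prints entries such as "pod/<pod-name>"; strip the "pod/" prefix.
  for entry in $(kubectl get pods --namespace "$ns" -o name); do
    pod="${entry#pod/}"
    kubectl describe pod -n "$ns" "$pod" > "$out/describe-pod-$pod.txt"
    kubectl logs "$pod" --namespace "$ns" > "$out/logs-pod-$pod.txt"
  done
}
```

For example, `capture_psa_logs measurement ./psa-logs` would write all files under ./psa-logs, assuming the namespace is measurement.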

Back to TOC

What errors arise from unsupported Kubernetes versions?

Below are some of the errors reported when an unsupported K8s version is used.

Kubectl version | Insufficient resources provided to Kubernetes | CrashLoopBackOff image pulling error | Low resource allocation to the Chrome/API Agent

Kubectl version

You can check the installed kubectl version using "kubectl version". When an unsupported version is in use, errors like the following are reported in the logs:

INFO 1 --- [or-http-epoll-1] c.a.s.heimdall.client.ReactiveWebClient  : [34927359]  Response: Status: 500 

Cache-Control:no-store 
Pragma:no-cache 
Content-Type:application/json 
X-Content-Type-Options:nosniff 
X-Frame-Options:DENY 
X-XSS-Protection:1 ; mode=block 
Referrer-Policy:no-referrer 
content-length:226 

ERROR 1 --- [or-http-epoll-1] c.a.s.h.service.MeasurementService: Failed to submit measurement with id : 8b71c4f4-7541-41f8-9f6a-8e762502d117~02b75cbc-5aaf-43f6-9d1d-30e20a634977 

[SEVERE][main][TcpDiscoverySpi] Failed to get registered addresses from IP finder (retrying every 2000ms; change 'reconnectDelay' to configure the frequency of retries) [maxTimeout=0] 

class org.apache.ignite.spi.IgniteSpiException: Failed to retrieve Ignite pods IP addresses.

The pod events below (also included in the attached txt file) are seen as well:

Warning  Unhealthy  23m (x4 over 25m)     kubelet            Readiness probe failed: Get "http://10.244.0.3:8080/ignite?cmd=probe": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Normal   Killing    23m (x2 over 25m)     kubelet            Container ignite failed liveness probe, will be restarted

Warning  Unhealthy  23m (x2 over 25m)     kubelet            Readiness probe failed: Get "http://10.244.0.3:8080/ignite?cmd=probe": EOF

Warning  Unhealthy  23m (x3 over 25m)     kubelet            Readiness probe failed: Get "http://10.244.0.3:8080/ignite?cmd=probe": dial tcp 10.244.0.3:8080: connect: connection refused

Normal   Pulled     23m (x2 over 25m)     kubelet            Container image "apacheignite/ignite:2.14.0-jdk11" already present on machine

Warning  Unhealthy  5m43s (x25 over 25m)  kubelet            Liveness probe failed: Get "http://10.244.0.3:8080/ignite?cmd=version": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Warning  BackOff    92s (x52 over 16m)    kubelet            Back-off restarting failed container ignite in pod synth-ignite-psa-0_ignite(1cba5f54-7723-4be4-a7ba-ce48fc6eacaf)
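
To compare the installed versions against the supported versions listed in the documentation, it can help to print only the version strings. This is a small sketch; the function name show_k8s_versions is an assumption for illustration, and it relies on the JSON output of "kubectl version", which reports a gitVersion field for the client and, when the cluster is reachable, for the server.

```shell
# Hypothetical helper: print the kubectl client and Kubernetes server versions.
show_k8s_versions() {
  # "kubectl version -o json" contains clientVersion and serverVersion objects,
  # each with a gitVersion field; keep only those lines.
  kubectl version -o json | grep '"gitVersion"'
}
```

If only one gitVersion line is printed, kubectl most likely could not reach the cluster to read the server version.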

Back to Errors from Unsupported K8s versions | Back to TOC

Insufficient resources provided to Kubernetes

The error below is reported when the K8s environment is given fewer resources than the CPU and memory defined in values.yaml (for example, when starting minikube).

Events:
Type     Reason            Age                  From               Message
Warning  FailedScheduling  5m (x863 over 3d3h)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod

To quickly check the current resources that minikube is running with, use the following:

cat ~/.minikube/config.json | grep "Memory\|CPUs"

NOTE | If there is no output, use the config.json under the profile with which minikube was started (typically ~/.minikube/profiles/<profile-name>/config.json).
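
On clusters other than minikube, the same question (how much CPU and memory is actually available) can be answered through kubectl. The sketch below is an illustration only: the function name check_psa_capacity is an assumption, and it simply prints what each node reports as allocatable, followed by the pods in the given namespace that are still waiting to be scheduled.

```shell
# Hypothetical helper: show node capacity and the pods that are still Pending.
# Usage: check_psa_capacity <namespace>
check_psa_capacity() {
  ns="$1"
  # Allocatable CPU and memory per node, as reported by the Kubernetes API.
  kubectl get nodes -o custom-columns='NODE:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
  # Pods that the scheduler has not been able to place on any node yet.
  kubectl get pods --namespace "$ns" --field-selector=status.phase=Pending
}
```

Compare the allocatable values with the totals requested in values.yaml; a FailedScheduling event is expected whenever the requests exceed what the nodes can offer.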

Back to Errors from Unsupported K8s versions | Back to TOC

CrashLoopBackOff image pulling error

If you see a CrashLoopBackOff or Back-off pulling image error on a Minikube-based PSA, set heimdall > pullPolicy to Never in values.yaml and redeploy PSA. This resolves the error.

For other platforms, please refer to our documentation for specific instructions by platform: Deploy the Web Monitoring PSA and API Monitoring PSA.

Events:
Type    Reason   Age                      From     Message
Normal  BackOff  3m7s (x18915 over 3d3h)  kubelet  Back-off pulling image "sum-heimdall:<heimdall-tag>"

Back to Errors from Unsupported K8s versions | Back to TOC

Low resource allocation to the Chrome/API Agent

Slow job execution, or a high session duration to complete the jobs, is mainly caused by low resources allocated to the PSA, specifically to the Chrome/API agent.

Try increasing the resources (CPU and memory) for the Chrome/API agent in values.yaml and re-deploy the PSA.

chromeAgentResources:
  min_cpu: "1"
  max_cpu: "2"
  min_mem: 1024Mi
  max_mem: 8192Mi

Back to Errors from Unsupported K8s versions | Back to TOC

How do I install PSA on a machine without an internet connection?

Use the attached document “Install PSA with minikube on an offline machine.pdf”.

You can use any machine with an active internet connection as a temporary machine. Build the PSA components on that temporary machine, then export them to the target server machine that has no internet connection.

PLEASE NOTE | The steps in the provided PDF have not been tested in-house by Cisco AppDynamics Support.

NOTE | Linux PSA versions 22.9 and later don't need a Postgres DB. Please refer to EUM > Synthetics > Install the Private Synthetic Agent (Web and API Monitoring) in our documentation.

Back to TOC

How do I resolve a recurring ‘Test Agent Failed to Post Result’ error?

If you periodically or intermittently see a ‘Test Agent Failed to Post Result’ error, update the Heimdall and Chrome agent resources in values.yaml to the recommended values below, then redeploy PSA:

heimdallResources:
  min_cpu: "3"
  max_cpu: "3"
  min_mem: 5Gi
  max_mem: 5Gi

chromeAgentResources:
  min_cpu: "1"
  max_cpu: "2"
  min_mem: 2048Mi
  max_mem: 3072Mi

Back to TOC

How do I resolve the 'DNS resolution failed (ERROR)'?

If a job fails with the error below:
DNS resolution failed [ERROR] WebDriverException: unknown error: net::ERR_NAME_NOT_RESOLVED

Then:

Log into the Heimdall pod with the command below:
kubectl exec -it <heimdall-pod-name> -n <namespace> -- /bin/bash
After logging in to the Heimdall pod, run the command below to check whether the pod can connect to the <url>:
curl <url>

NOTE | The curl command is available only on the Heimdall pod. To check or debug anything related to the Chrome agent pod, log into it using the command below:
kubectl exec -it <chrome-pod-name> -n <namespace> -- /bin/sh

NOTE | To use any tool available for Alpine in the Chrome agent pod, either remove the USER instruction from the Chrome agent Dockerfile or add the tool's install command to it, then rebuild the image and redeploy the PSA. If you remove the USER instruction, the pod is created with root permissions, and you can install any tool after logging in to the Chrome agent pod.
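
The connectivity check from the Heimdall pod can also be run without opening an interactive shell. The sketch below is an illustration under stated assumptions: the function name check_url_from_heimdall and its arguments are hypothetical, and it relies on curl being present in the Heimdall pod, as noted above.

```shell
# Hypothetical helper: run curl inside the Heimdall pod without an interactive shell.
# Usage: check_url_from_heimdall <heimdall-pod-name> <namespace> <url>
check_url_from_heimdall() {
  pod="$1"
  ns="$2"
  url="$3"
  # -sS: no progress bar, but still print errors; -o /dev/null: discard the body;
  # -w: print only the HTTP status code. curl exits non-zero when the host
  # name cannot be resolved, and kubectl exec passes that exit code through.
  if kubectl exec "$pod" -n "$ns" -- curl -sS -o /dev/null -w '%{http_code}\n' "$url"; then
    echo "reachable from $pod: $url"
  else
    echo "NOT reachable from $pod: $url" >&2
    return 1
  fi
}
```

A failure here, combined with a successful curl to the same URL from the host machine, points to DNS or network configuration inside the cluster rather than to the monitored site.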

Back to TOC

How do I resolve the error thrown when cluster-level permissions are missing?

The error below is thrown when cluster-level permissions are missing, because PSA needs cluster-level permissions to function properly:

io.fabric8.kubernetes.client.KubernetesClientException: Operation: [list] for kind: [Pod] with name: [null] in namespace: [measurement] failed.

...

Caused by: java.net.SocketTimeoutException: connect timed out at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0_362]

PSA service accounts and roles are configured with cluster-level permissions so that they can perform certain operations at the Helm level. Cluster-level permissions imply that the Agent requires access to different namespaces in the cluster.

Refer to Create the Kubernetes Cluster in the documentation. Its instructions include creating a cluster, so they assume you have permission to create one. With only namespace-level permissions, an individual won’t be able to create a cluster.

Apply the steps below to fix the issue:

TIP | If you want to grant only namespace-level permissions instead of cluster-level permissions, we suggest you use the attached role.yaml file.

Unpack Helm chart:

cd <Unzipped-PSA-directory> 
tar xf sum-psa-heimdall.tgz

Replace sum-psa-heimdall/templates/role.yaml with the attached role.yaml.
Repack the chart using the following:
```
helm package sum-psa-heimdall 
```
Finally, redeploy the PSA using the newly packed sum-psa-heimdall.tgz.
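
The unpack, replace, and repack steps above can be chained so that a failure in one step stops the rest. This is a minimal sketch rather than a supported script: the function name repack_psa_chart and its arguments are assumptions, and the role.yaml path should be absolute because the function changes directory before copying.

```shell
# Hypothetical helper: swap in the namespace-level role.yaml and repackage the chart.
# Usage: repack_psa_chart <unzipped-PSA-directory> <absolute-path-to-attached-role.yaml>
repack_psa_chart() {
  psa_dir="$1"
  role_file="$2"
  # Run in a subshell so the caller's working directory is left unchanged;
  # "&&" stops the chain at the first step that fails.
  (
    cd "$psa_dir" &&
    tar xf sum-psa-heimdall.tgz &&
    cp "$role_file" sum-psa-heimdall/templates/role.yaml &&
    # helm package names the archive from the chart's name and version
    # in Chart.yaml and prints the path of the file it wrote.
    helm package sum-psa-heimdall
  )
}
```

Check the path printed by helm package and use that archive when redeploying.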

Back to TOC

How do I resolve a Heimdall log error?

If you see the warning below in your Heimdall logs, try increasing the RAM on the PSA host machine, or decreasing the memory assigned to minikube and in values.yaml:

2023-05-30 20:43:38.768 WARN 1 --- [ main] org.apache.ignite.internal.IgniteKernal: Nodes started on local machine require more than 80% of physical RAM what can lead to significant slowdown due to swapping (please decrease JVM heap size, data region size or checkpoint buffer size) [required=2262MB, available=5120MB]
[20:43:38] Nodes started on local machine require more than 80% of physical RAM that can lead to significant slowdown due to swapping (please decrease JVM heap size, data region size or checkpoint buffer size) [required=2262MB,

Back to TOC

How do I resolve a Heimdall error on Docker-based PSA?

For Docker-based PSA, make sure the "docker ps" command lists both the Heimdall and Ignite containers.

To capture Heimdall logs, use the command below:

# Writes the last <last-n-lines> lines of the Heimdall container logs to a text file.
# To get <HEIMDALL_CONTAINER-ID>, run "docker ps".
# "2>&1" is added because docker logs writes the container's stderr output to stderr.

docker logs -n <last-n-lines> <HEIMDALL_CONTAINER-ID> > heimdall-<HEIMDALL_CONTAINER-ID>.txt 2>&1
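
Looking up the container ID and capturing the logs can be done in one step. The sketch below is illustrative only: the function name capture_heimdall_docker_logs is hypothetical, and the name filter assumes that the Heimdall container's name contains "heimdall", which may not match your deployment.

```shell
# Hypothetical helper: save the last N log lines of each running Heimdall container.
# Usage: capture_heimdall_docker_logs <last-n-lines>
capture_heimdall_docker_logs() {
  lines="$1"
  # Assumption: the container name contains "heimdall"; adjust the filter if yours differs.
  # "-q" prints only the container IDs.
  for id in $(docker ps -q --filter "name=heimdall"); do
    # docker logs replays the container's stderr on stderr, so redirect both streams.
    docker logs -n "$lines" "$id" > "heimdall-$id.txt" 2>&1
  done
}
```

For example, `capture_heimdall_docker_logs 500` writes one heimdall-<container-id>.txt file per matching container into the current directory.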

Back to TOC

How do I debug common Linux Private Synthetic Agent issues?
