
Machine agent stopping without any error after starting successfully.

areddy
Adventurer


Any idea why?

 

Below is the log file.

[Redacted]

 

^ Post edited by @Ryan.Paredez to remove the log file. Please do not share or attach log files to community posts for security and privacy reasons.

4 REPLIES

gurmit.arora
Architect

Do you see anything in the nohup.out file that is generated?

Antti.Elonheimo
Producer

Hi,

 

We are seeing something similar after upgrading our machine agents from 4.1.7.1 to 4.2.15.0. Our systems run AIX 7.1, and we have not yet identified the cause. One thing you could try is enabling debug logging for the machine agent using the instructions below. You could of course let the agent run until it stops unexpectedly rather than for only 5 minutes.

 

We are going to try this ourselves, and I'll update here if we find the cause. We start the agents with the same command we used for the v4.1 agent, which might be causing problems because that command sets a maximum Java heap size.

 

Steps for enabling machine agent debug mode:

  • Open the log4j.xml file located in <MachineAgent-installation-dir>\conf\logging
  • Change the level of the logger below to "debug":

<logger name="com.singularity" additivity="false"> 
<level value="debug"/> 
<appender-ref ref="FileAppender"/> 
</logger>

  • Allow the machine agent to run for at least 5 minutes
  • After the agent has run for 5 minutes, zip the entire "<MachineAgent-installation-dir>\logs" folder and attach it to this ticket
  • Revert the logging level to "info" in log4j.xml
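For anyone who prefers to script the level change, here is a minimal sketch. It assumes the logger block in log4j.xml looks like the one quoted above, with the <level> element on its own line. The helper name set_singularity_level and the MA_HOME variable are hypothetical and not part of the agent. It keeps a .bak copy before rewriting the file.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the agent): set the level of the
# com.singularity logger in a log4j.xml file.
# Usage: set_singularity_level <path-to-log4j.xml> <debug|info>
set_singularity_level() {
    file=$1
    level=$2
    # Keep a backup so the original file can be restored by hand.
    cp "$file" "$file.bak" || return 1
    # Limit the substitution to the lines between the opening
    # <logger name="com.singularity" ...> tag and its closing </logger>.
    sed '/<logger name="com.singularity"/,/<\/logger>/ s/<level value="[A-Za-z]*"\/>/<level value="'"$level"'"\/>/' \
        "$file.bak" > "$file"
}

# Example (MA_HOME is an assumed variable pointing at the install dir):
#   set_singularity_level "$MA_HOME/conf/logging/log4j.xml" debug
#   ... reproduce the problem ...
#   set_singularity_level "$MA_HOME/conf/logging/log4j.xml" info
```

Other loggers in the same file are left untouched because the sed address range ends at the first </logger> after the com.singularity opening tag.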

 

-Antti

The sigar library that we use to collect OS-related metrics is not supported on AIX 7.1.

 

As a workaround, can you try the following?

 

In the MachineAgent/lib folder, create a symbolic link so that the AIX 5 version of the library is also picked up as the AIX 7 version. Ensure the file permissions are the same.
ln -s libsigar-ppc64-aix-5.so libsigar-ppc64-aix-7.so

 

Another option:

 

  1. Stop the Machine Agent

  2. Switch to the Hardware Monitor as follows:

a. Edit monitor.xml from <machine_agent_install_directory>/monitors/HardwareMonitor/
b. Change <enabled>false</enabled> to <enabled>true</enabled>
c. Save the file
d. Edit monitor.xml from <machine_agent_install_directory>/monitors/JavaHardwareMonitor/
e. Change <enabled>true</enabled> to <enabled>false</enabled>
f. Save the file

 

  3. Restart Machine Agent
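Steps a to f can also be scripted. The sketch below assumes each monitor.xml contains a single <enabled>...</enabled> element on one line. The helper name set_monitor_enabled and the MA_HOME variable are hypothetical. Stop the Machine Agent before running it and restart it afterwards, as in steps 1 and 3.

```shell
#!/bin/sh
# Hypothetical helper: set the <enabled> flag in a monitor.xml file.
# Usage: set_monitor_enabled <path-to-monitor.xml> <true|false>
set_monitor_enabled() {
    file=$1
    value=$2
    # Keep a backup so the change can be undone by hand.
    cp "$file" "$file.bak" || return 1
    # Rewrite whatever value is currently inside <enabled>...</enabled>.
    sed 's|<enabled>[a-z]*</enabled>|<enabled>'"$value"'</enabled>|' \
        "$file.bak" > "$file"
}

# Example (MA_HOME is an assumed variable pointing at the install dir):
#   set_monitor_enabled "$MA_HOME/monitors/HardwareMonitor/monitor.xml" true
#   set_monitor_enabled "$MA_HOME/monitors/JavaHardwareMonitor/monitor.xml" false
```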

Hi,

 

Actually, I'm not sure of the current AIX level, but I think the systems are on 7.1.

 

Anyway, the problem was very simple: we forgot to put "nohup" in front of the command when starting the agent from a shell without using the machine-agent script. Normally these agents are started from inittab at system boot, and we had just copied the command from there. The working form is:

"nohup <commandFromInittab> &"

 

Without nohup, the agents stop when the user logs off. This wasn't needed on all servers, probably because on some servers the WAS user is always logged on, which is why I thought something else was wrong.
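As a rough sketch of the same idea, the wrapper below starts any command under nohup in the background and sends its output to a log file of your choice instead of nohup.out. The name start_detached is hypothetical, and the actual agent command is whatever your inittab entry contains.

```shell
#!/bin/sh
# Hypothetical helper: run a command so that it survives logoff.
# Usage: start_detached <logfile> <command> [args...]
start_detached() {
    log=$1
    shift
    # nohup makes the command ignore the hangup signal sent at logoff;
    # redirecting stdout and stderr keeps the output out of nohup.out.
    nohup "$@" > "$log" 2>&1 &
    # Print the PID of the background job so it can be checked later.
    echo $!
}

# Example: replace the sleep with the command copied from inittab.
#   pid=$(start_detached /tmp/machine-agent.out sleep 600)
#   ps -p "$pid"
```

If the agent still stops, the log file passed as the first argument is the place to look, in the same way nohup.out would be.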

 

-Antti