Note: The most recent product update for 4.5.9 does not mention these significant issues. I am wondering whether an update is available that would provide more guidance on managing a 4.5.9.x HA controller pair without the EConsole. Should we start over and build the HA controller pair without the EConsole? We will also be building a 3-node Events Service cluster as soon as we get some answers. Can this be done properly without the EConsole?
We just built a new HA controller pair on 126.96.36.199, utilizing the Enterprise Console for the first time. The EC runs on a stand-alone server. As we tested the HA failover from the GUI, everything appeared to work properly, but at the CLI we could see that the watchdog process was not starting on the passive node. We expected to see the watchdog running as it does on our 188.8.131.52 production system. The EC GUI does not provide this level of detail and shows everything is healthy; however, we do not have confidence that our HA is working properly so we opened a support ticket. Shown below is what AppD Support is advising me to do:
"Please be aware that there are known issues in Enterprise Console that affect its high availability (HA) and service life cycle functionalities:
ECONSOLE-2711: If EC is down, then EC-based controller HA failover won't happen
ECONSOLE-3853: "Start database" job does not start a down database on the primary controller host
The issues noted above can prevent HA fail-overs from happening when they are needed, and increase the difficulty of restarting controller services after a host crash, causing preventable controller down-time.
To work around these issues, we recommend the following steps:
1) If you are using Enterprise Console's automatic failover feature, disable it.
2) Avoid using Enterprise Console to manage the lifecycle of controller components, establish or re-establish MySQL replication, or fail over between nodes in an HA controller pair.
3) Please install or upgrade and use the legacy HA toolkit and its bundled init scripts for the following functions:
- Starting and stopping the controller and its database
- Establishing or re-establishing MySQL database replication
- Manual and automatic failover between controller nodes"
I can find no 'known issues' or AppD publications online that would indicate that the EC is not ready for prime time. I am reaching out to the Community to see if anyone else is having trouble with the 184.108.40.206 EC, or is aware of the EC problems noted above.
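For anyone who wants to repeat the CLI check described above (whether a watchdog process is actually running on the passive node), here is a minimal sketch in Python. It only filters the output of `ps -eo pid,args` for a substring. The pattern `"watchdog"`, the function names, and the sample paths are assumptions for illustration, not names confirmed by AppD documentation; adjust the pattern to whatever the process looks like on your own hosts.

```python
import subprocess


def matching_processes(ps_output: str, pattern: str) -> list[tuple[int, str]]:
    """Return (pid, command line) pairs whose command line contains `pattern`.

    Expects text in the format produced by `ps -eo pid,args`.
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # Skip the header row and any line that does not start with a PID.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = int(parts[0]), parts[1]
        # Kernel threads are shown in brackets (e.g. "[watchdogd]") and are
        # not the application-level process we are looking for.
        if args.startswith("["):
            continue
        if pattern in args:
            matches.append((pid, args))
    return matches


def check_local(pattern: str) -> list[tuple[int, str]]:
    """Run `ps` on this host and return the processes matching `pattern`."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return matching_processes(result.stdout, pattern)
```

Running `check_local("watchdog")` on the passive node and getting an empty list back would match what we saw at the CLI. A non-empty list only shows that a process with that name exists, not that failover itself will work.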