02-01-2010 01:07 PM
I've been running BES HA in a test environment for about the past week and a half and, to be honest, I'm a little disappointed in it. It's not as seamless and stable as I would have expected. For instance, automated failover works, but once the service has failed over to the standby and the old primary comes back up, it will not fail back if there is then a problem with the standby (now active), even after a good hour or more of sync time and a good status displayed in the HA status screens.
Likewise, it seems easy for the servers to get out of sync with the mail server, and it's as if both servers hang each other up. Say, for instance, I fail over one server manually and services come up on the standby. I then restart the primary (for an update, let's say), bring it up, and let its status go all green and available. When I then try to fail back manually, it takes ages, far too long for the failover to be worthwhile.
I wanted to provide 100% availability for our BES environment, but as I see it I'm better off just leaving the server be, performing updates only once every so often and taking a snapshot before I do (ESX environment). There is less interruption, and the services come back up quicker on a standalone server than when failing the services back a second time.
We don't have a huge BES base, 65 or so BES users, but it's a fairly important service. I'm wondering how others are handling this, and whether you're moving forward with HA just because it's now available or sticking with a single server.
02-01-2010 07:44 PM
I am not using HA for automatic failover, just manual failovers when I need to run Windows Updates, etc.
It works like a dream for us. We've got over 400 Domino mail users on each BES server and there is a 4-5 second delay in dispatching emails while switching (failover and back).
We are in an ESX VM environment too. The main benefit for us over a third-party product, e.g. Neverfail, is that BES is already running on both servers, so there is no need to shut down one before the other starts up. It typically took 20 minutes to do a manual failover with Neverfail.
As far as we're concerned BES HA is a massive bonus and means we can do maintenance during business hours with no outages to users.
02-02-2010 09:39 AM
It's worth mentioning that once an automatic failover has occurred, the automatic failover setting is tripped to manual. You have to go back into the BAS and reset the automatic failover switch. This is done because if there were two different issues with the primary and standby, we wouldn't want them failing over back and forth because both of them are in fail states. Mind you, the system is pretty good at telling if a standby server is OK to become the primary. It's more of a precaution so you can ensure the former primary is in good working order before failing back.
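To make the trip-to-manual behaviour concrete, here is a toy Python model of it. This is only a sketch of the logic described above: the class, method, and field names are made up for illustration and are not part of any BES or BAS interface, and whether a manual failover touches the automatic setting is not modelled.

```python
from dataclasses import dataclass


@dataclass
class HAPair:
    """Toy model of a primary/standby pair (hypothetical, not the BES API)."""
    active: str = "primary"
    standby: str = "standby"
    automatic: bool = True  # stands in for the automatic failover setting

    def _swap(self) -> None:
        # Exchange the active and standby roles.
        self.active, self.standby = self.standby, self.active

    def on_active_failure(self, standby_healthy: bool) -> bool:
        """Handle a failure of the active node. Returns True if roles swapped."""
        if not self.automatic or not standby_healthy:
            return False  # nothing happens until an admin steps in
        self._swap()
        self.automatic = False  # the setting is tripped to manual
        return True

    def reset_automatic(self) -> None:
        # Stands in for the admin re-enabling automatic failover.
        self.automatic = True


pair = HAPair()
first = pair.on_active_failure(standby_healthy=True)   # fails over, trips to manual
second = pair.on_active_failure(standby_healthy=True)  # ignored: setting is manual
pair.reset_automatic()
third = pair.on_active_failure(standby_healthy=True)   # fails over again
print(first, second, third, pair.active)
```

In this model the second failure is ignored, which matches the "will not fail back" symptom from the first post: without the reset, the pair never swaps roles a second time on its own.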
HA isn't a bad thing to get into, but it needs to be done right. Some people seem to ignore the database when getting the whole thing set up. If your database server explodes, all the failovers in the world won't save you.
As for the failover taking a long time to happen, I wouldn't call that typical, so it would warrant some further investigation.
02-03-2010 10:44 AM
Thanks for the replies guys.
CerealByPass, quick question. Is there an unsupported way to change the automatic failover behaviour so that a single automatic failover does not switch the setting over to manual?
Also, I wanted to state that it appears the policy settings may have been contributing to the lag we were experiencing during failover of the node, even though we set the policy settings to manual. We will be spending additional time testing this before we roll into production, but ultimately we will be moving forward with the HA functionality for our production environment vs. relying on a third-party product to provide similar functionality.
It would still be nice, though, to have the ability to keep the automatic failover policy enabled regardless of whether one failover or several have occurred.