Datacenter Activation Coordination (DAC) Mode in Exchange Server
Datacenter Activation Coordination (DAC) mode is a property of a database availability group (DAG). It works using a protocol called the Datacenter Activation Coordination Protocol (DACP).
DAC mode is disabled by default, but it should be enabled for all DAGs with two or more members that use continuous replication, in order to avoid the “split brain syndrome” that can occur in certain situations following a datacenter switchover.
Split brain syndrome: Split brain syndrome is a condition in which a database copy is mounted as the active copy on two members of the same DAG that are unable to communicate with each other.
Example: Consider a setup where the primary datacenter (A) contains two DAG members and the witness server, and a secondary datacenter (B) contains two other DAG members. In this scenario, DAC mode is not enabled for the DAG. The primary datacenter goes down due to a power outage, and the administrator activates the DAG in the secondary datacenter. Eventually, power is restored in the primary datacenter, and the DAG members there, which had quorum before the power failure, start up and mount their databases. Because the primary datacenter was restored without network connectivity to the secondary datacenter, and because DAC mode was not enabled for the DAG, the active databases within the DAG enter a split brain condition.
NOTE: DAC mode shouldn’t be enabled for DAGs that use third-party replication mode unless the third-party vendor instructs you to do so.
Because DAC mode is intended to prevent split brain syndrome, DAG members in a DAG with DAC mode enabled will not automatically mount databases even if they have quorum. Instead, the DACP bit is used to determine the current state of the DAG, and Active Manager attempts to mount the databases based on the DACP value.
Let us take an example to see how DAC works:
In my example I have two sites, one in India and the other in the US, with two Exchange servers in each site and a file share witness server in the India site.
In general, if a database fails on one DAG member, a copy of that database is mounted on another DAG member. The Best Copy Selection process chooses which database copy to activate. If the databases on Server1 fail, the passive copies on Server2 are activated automatically.
Let’s say the entire India site fails due to a power failure. Because the two DAG members and the file share witness server in the India site are offline, quorum cannot be maintained, and as a result the database copies in the US site go offline as well.
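The vote counting behind this can be sketched in a few lines of PowerShell. This is a simplified model for illustration, not how the cluster service is implemented: it assumes one vote per DAG member plus one vote for the file share witness, and the function name Test-HasQuorum is made up for this example.

```powershell
# Simplified model: quorum needs a strict majority of all configured votes.
# Test-HasQuorum is a hypothetical helper, not an Exchange or cluster cmdlet.
function Test-HasQuorum {
    param(
        [int]$TotalVotes,      # every voter: DAG members plus the witness
        [int]$ReachableVotes   # voters that are online and can see each other
    )
    return (($ReachableVotes * 2) -gt $TotalVotes)
}

# 4 DAG members + 1 file share witness = 5 votes in total.
$usOnly    = Test-HasQuorum -TotalVotes 5 -ReachableVotes 2   # two US members alone
$indiaOnly = Test-HasQuorum -TotalVotes 5 -ReachableVotes 3   # two India members + witness
"US site alone has quorum:    $usOnly"
"India site alone has quorum: $indiaOnly"
```

With five votes, the two US members on their own fall short of a majority, while the two India members together with the witness reach it.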
When an entire site fails, users cannot access their mailboxes. The administrator has to perform the datacenter switchover manually by activating an alternate file share witness server in the secondary site (the US datacenter in my case) and bringing the databases online to maintain business continuity.
Process involved in a datacenter switchover:
- Check the health and readiness of the secondary (disaster recovery) site infrastructure.
- If the messaging infrastructure at the primary site is not completely down, i.e. some servers are still running but the databases cannot be mounted, the DAG members in the primary datacenter must be marked as stopped to prevent Active Manager on them from mounting the databases.
- Activate the file share witness server at the secondary site in order to restore quorum.
- Make any further configuration changes that are required, such as changing the MX records that point to the failed datacenter's servers and the DNS records for the Client Access server (CAS), Hub Transport, and Unified Messaging (UM) servers.
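On a DAG that already has DAC mode enabled, these steps roughly map onto the built-in site resilience cmdlets. The following is only a sketch, run from the Exchange Management Shell in the secondary site. The names DAG1, India, US, US-HUB1, US-MBX1 and the witness directory are placeholders assumed for this example, and the exact sequence should be checked against Microsoft's datacenter switchover documentation for your Exchange version.

```powershell
# Placeholder names (DAG1, India, US, US-HUB1, US-MBX1, C:\DAG1FSW) are assumptions.

# 1. Mark the DAG members in the failed primary site as stopped.
#    -ConfigurationOnly updates Active Directory only, for servers that cannot be reached.
Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite "India" -ConfigurationOnly

# 2. On each DAG member in the secondary site, stop the Cluster service.
Stop-Service -Name ClusSvc

# 3. Activate the DAG in the secondary site with an alternate witness server.
Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite "US" `
    -AlternateWitnessServer US-HUB1 -AlternateWitnessDirectory "C:\DAG1FSW"

# 4. Check the state of the database copies on a secondary-site member.
Get-MailboxDatabaseCopyStatus -Server US-MBX1
```

If the primary-site servers are still reachable, Stop-DatabaseAvailabilityGroup would normally be run against them without -ConfigurationOnly first, so that the servers themselves also record that they are stopped.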
Eventually, power is restored in the India site. The DAG members and the file share witness come back online in India, but the WAN connection remains offline, preventing communication between the DAG members in the two sites.
Because the DAG members and file share witness in India have enough votes to achieve quorum, the databases are brought online in the India site. At this stage we have an active copy of the same database in both the India and US datacenters, since the DAG members in each site cannot communicate with each other. This condition is called “split brain syndrome”.
In order to avoid this split brain condition, we need to enable DAC mode. When DAC mode is enabled, it changes the Active Manager startup process: each DAG member starts up with a DACP bit of 0. Until a member can communicate with a DAG member that has a DACP bit of 1, or with every other member of the DAG, it will not attempt to mount its database copies, even if quorum exists with enough DAG members to mount the databases.
NOTE: DAC uses DACP, which is represented as a bit (0 or 1) stored in memory.
When DAC mode is configured in advance, the DAG members in the India site start up with a DACP bit of 0, and they are unable to communicate with the US DAG members because the WAN link is down. Because they start up with a DACP bit of 0 and cannot reach a member whose bit is 1, they do not mount the databases in the India site.
Once the WAN connection is restored, the India DAG members are able to communicate with the US DAG members, and their DACP bit is set from 0 to 1. Since the databases are already mounted in the US site, the database copies in India become passive copies.
NOTE: In DAC mode, if the DAG member that is contacted responds with a DACP bit of 0, the server that is starting up will not mount its databases. If it responds with a DACP bit of 1, the starting server can mount its databases.
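The startup rule can be modelled as a small decision function. This is an illustrative sketch of the logic as this article describes it, not Exchange's actual Active Manager code; Test-CanMountDatabases and its parameters are hypothetical names.

```powershell
# Hypothetical model of the DAC mode startup check described in this article.
function Test-CanMountDatabases {
    param(
        [int[]]$ReachablePeerBits = @(),  # DACP bits of the other members this server can reach
        [int]$OtherMemberCount            # how many other members the DAG has
    )
    # Rule 1: some reachable member already has its DACP bit set to 1.
    $peerHasBitOne = $ReachablePeerBits -contains 1
    # Rule 2: every other member of the DAG is reachable.
    $reachedAllPeers = $ReachablePeerBits.Count -eq $OtherMemberCount
    return ($peerHasBitOne -or $reachedAllPeers)
}

# India server after power returns, WAN still down: it only sees its India peer (bit 0).
Test-CanMountDatabases -ReachablePeerBits @(0) -OtherMemberCount 3

# WAN restored: it now also sees the two US members, whose bit is 1.
Test-CanMountDatabases -ReachablePeerBits @(0, 1, 1) -OtherMemberCount 3
```

The first call models the India servers staying passive while the WAN is down; the second models them being allowed to proceed once a US member with a DACP bit of 1 is reachable.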
PowerShell commands to enable/disable DAC mode:
To Enable DAC:
Set-DatabaseAvailabilityGroup -Identity <DAG name> -DatacenterActivationMode DagOnly
To Disable DAC:
Set-DatabaseAvailabilityGroup -Identity <DAG name> -DatacenterActivationMode Off
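To confirm which mode a DAG is currently in, the setting can be read back with Get-DatabaseAvailabilityGroup. A small sketch, assuming a DAG named DAG1 (a placeholder):

```powershell
# DAG1 is a placeholder name. DatacenterActivationMode is reported as Off or DagOnly.
Get-DatabaseAvailabilityGroup -Identity DAG1 | Format-List Name, DatacenterActivationMode
```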
In addition, DAC mode enables the use of the built-in site resilience cmdlets that are used to perform a datacenter switchover: Stop-DatabaseAvailabilityGroup, Restore-DatabaseAvailabilityGroup, and Start-DatabaseAvailabilityGroup.
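Once the primary site is healthy again and connectivity between the sites is restored, Start-DatabaseAvailabilityGroup is the site resilience cmdlet used to bring its members back into the DAG. The following is a rough sketch of that direction, assuming the switchover was done with Stop-DatabaseAvailabilityGroup and Restore-DatabaseAvailabilityGroup. DAG1, India, DB1, and IN-MBX1 are placeholder names, and the full procedure should be taken from Microsoft's documentation.

```powershell
# Placeholder names (DAG1, India, DB1, IN-MBX1) are assumptions for this sketch.

# 1. Bring the DAG members in the recovered primary site back into the DAG.
Start-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite "India"

# 2. Re-run Set-DatabaseAvailabilityGroup so the DAG re-evaluates its quorum and witness settings.
Set-DatabaseAvailabilityGroup -Identity DAG1

# 3. Once the copies in India are healthy, move an active database back.
Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer IN-MBX1
```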
Ratish Nair
Microsoft MVP | Exchange Server
Team @MSExchangeGuru
April 26th, 2016 at 9:08 am
Ratish, thanks for the explanation of DAC. Please publish how quorum works in an Exchange DAG. Example: a cluster with 4 nodes and 1 FSW. How will it work if the nodes fail one by one? How does quorum voting work with an FSW?
Is an FSW mandatory? If not, how does quorum work without an FSW?
Thanks
June 13th, 2016 at 3:39 am
I guess this will help
https://blogs.msdn.microsoft.com/clustering/2011/05/27/understanding-quorum-in-a-failover-cluster/
July 11th, 2016 at 9:18 pm
Very good explanation of the DAC concept. I had been looking for this for a long time. Thank you so much.
April 28th, 2017 at 6:39 am
I have an issue related to switching back to the production site. Could you share how to switch back properly? Thanks.