Datacenter Activation Coordination (DAC)
Datacenter Activation Coordination (DAC) mode is a property setting for a database availability group (DAG). DAC mode is disabled by default and should be enabled for all DAGs with two or more members that use continuous replication. DAC mode shouldn't be enabled for DAGs in third-party replication mode unless specified by the third-party vendor.
If a catastrophic failure occurs that affects the DAG (for example, a complete failure of one of the datacenters), DAC mode is used to control the startup database mount behavior of a DAG. When DAC mode isn't enabled, and a failure occurs that affects multiple servers in the DAG, when a majority of the DAG members are restored after the failure, the DAG will restart and attempt to mount databases. In a multi-datacenter configuration, this behavior could cause split brain syndrome, a condition that occurs when all networks fail, and DAG members can't receive heartbeat signals from each other. Split brain syndrome can also occur when network connectivity is severed between the datacenters. Split brain syndrome is prevented by always requiring a majority of the DAG members (and in the case of DAGs with an even number of members, the DAG's witness server) to be available and interacting for the DAG to be operational. When a majority of the members are communicating, the DAG is said to have quorum.
For example, consider a scenario where the first datacenter contains two DAG members and the witness server, and the second datacenter contains two other DAG members. If the first datacenter loses power and you activate the DAG in the second datacenter (for example, by activating the alternate witness server in the second datacenter), if the first datacenter is restored without network connectivity to the second datacenter, the active databases within the DAG may enter a split brain condition.
Datacenter Activation Coordination (DAC) mode is a property setting for a database availability group (DAG). DAC mode is disabled by default and should be enabled for all DAGs with two or more members that use continuous replication. DAC mode shouldn't be enabled for DAGs in third-party replication mode unless specified by the third-party vendor.
If a catastrophic failure occurs that affects the DAG (for example, a complete failure of one of the datacenters), DAC mode is used to control the startup database mount behavior of a DAG. When DAC mode isn't enabled, and a failure occurs that affects multiple servers in the DAG, when a majority of the DAG members are restored after the failure, the DAG will restart and attempt to mount databases. In a multi-datacenter configuration, this behavior could cause split brain syndrome, a condition that occurs when all networks fail, and DAG members can't receive heartbeat signals from each other. Split brain syndrome can also occur when network connectivity is severed between the datacenters. Split brain syndrome is prevented by always requiring a majority of the DAG members (and in the case of DAGs with an even number of members, the DAG's witness server) to be available and interacting for the DAG to be operational. When a majority of the members are communicating, the DAG is said to have quorum.
For example, consider a scenario where the first datacenter contains two DAG members and the witness server, and the second datacenter contains two other DAG members. If the first datacenter loses power and you activate the DAG in the second datacenter (for example, by activating the alternate witness server in the second datacenter), if the first datacenter is restored without network connectivity to the second datacenter, the active databases within the DAG may enter a split brain condition.
DAC mode is designed to prevent split brain from occurring by including a protocol called Datacenter Activation Coordination Protocol (DACP). After a catastrophic failure, when the DAG recovers, it won't automatically mount databases even though the DAG has a quorum. Instead DACP is used to determine the current state of the DAG and whether Active Manager should attempt to mount the databases.
You might think of DAC mode as an application level of quorum for mounting databases. To understand the purpose of DACP and how it works, it's important to understand the primary scenario it's intended to deal with. Consider the two-datacenter scenario. Suppose there is a complete power failure in the primary datacenter. In this event, all of the servers and the WAN are down, so the organization makes the decision to activate the standby datacenter. In almost all such recovery scenarios, when power is restored to the primary datacenter, WAN connectivity is typically not immediately restored. This means that the DAG members in the primary datacenter will power up, but they won’t be able to communicate with the DAG members in the activated standby datacenter. The primary datacenter should always contain the majority of the DAG quorum voters, which means that when power is restored, even in the absence of WAN connectivity to the DAG members in the standby datacenter, the DAG members in the primary datacenter have a majority and therefore have quorum. This is a problem because with quorum, these servers may be able to mount their databases, which in turn would cause divergence from the actual active databases that are now mounted in the activated standby datacenter.
DACP was created to address this issue. Active Manager stores a bit in memory (either a 0 or a 1) that tells the DAG whether it's allowed to mount local databases that are assigned as active on the server. When a DAG is running in DAC mode (which would be any DAG with three or more members), each time Active Manager starts up the bit is set to 0, meaning it isn't allowed to mount databases. Because it's in DAC mode, the server must try to communicate with all other members of the DAG that it knows to get another DAG member to give it an answer as to whether it can mount local databases that are assigned as active to it. The answer comes in the form of the bit setting for other Active Managers in the DAG. If another server responds that its bit is set to 1, it means servers are allowed to mount databases, so the server starting up sets its bit to 1 and mounts its databases.
But when you recover from a primary datacenter power outage where the servers are recovered but WAN connectivity has not been restored, all of the DAG members in the primary datacenter will have a DACP bit value of 0; and therefore none of the servers starting back up in the recovered primary datacenter will mount databases, because none of them can communicate with a DAG member that has a DACP bit value of 1.
You might think of DAC mode as an application level of quorum for mounting databases. To understand the purpose of DACP and how it works, it's important to understand the primary scenario it's intended to deal with. Consider the two-datacenter scenario. Suppose there is a complete power failure in the primary datacenter. In this event, all of the servers and the WAN are down, so the organization makes the decision to activate the standby datacenter. In almost all such recovery scenarios, when power is restored to the primary datacenter, WAN connectivity is typically not immediately restored. This means that the DAG members in the primary datacenter will power up, but they won’t be able to communicate with the DAG members in the activated standby datacenter. The primary datacenter should always contain the majority of the DAG quorum voters, which means that when power is restored, even in the absence of WAN connectivity to the DAG members in the standby datacenter, the DAG members in the primary datacenter have a majority and therefore have quorum. This is a problem because with quorum, these servers may be able to mount their databases, which in turn would cause divergence from the actual active databases that are now mounted in the activated standby datacenter.
DACP was created to address this issue. Active Manager stores a bit in memory (either a 0 or a 1) that tells the DAG whether it's allowed to mount local databases that are assigned as active on the server. When a DAG is running in DAC mode (which would be any DAG with three or more members), each time Active Manager starts up the bit is set to 0, meaning it isn't allowed to mount databases. Because it's in DAC mode, the server must try to communicate with all other members of the DAG that it knows to get another DAG member to give it an answer as to whether it can mount local databases that are assigned as active to it. The answer comes in the form of the bit setting for other Active Managers in the DAG. If another server responds that its bit is set to 1, it means servers are allowed to mount databases, so the server starting up sets its bit to 1 and mounts its databases.
But when you recover from a primary datacenter power outage where the servers are recovered but WAN connectivity has not been restored, all of the DAG members in the primary datacenter will have a DACP bit value of 0; and therefore none of the servers starting back up in the recovered primary datacenter will mount databases, because none of them can communicate with a DAG member that has a DACP bit value of 1.
DAGs with two members have inherent limitations that prevent the DACP bit alone from fully protecting against application-level split brain syndrome. For DAGs with only two members, DAC mode also uses the boot time of the DAG's alternate witness server to determine whether it can mount databases on startup. The boot time of the alternate witness server is compared to the time when the DACP bit was set to 1.
- If the time the DACP bit was set is earlier than the boot time of the alternate witness server, the system assumes that the DAG member and witness server were rebooted at the same time (perhaps because of power loss in the primary datacenter), and the DAG member isn't permitted to mount databases.
- If the time that the DACP bit was set is more recent than the boot time of the alternate witness server, the system assumes that the DAG member was rebooted for some other reason (perhaps a scheduled outage in which maintenance was performed or perhaps a system crash or power loss isolated to the DAG member), and the DAG member is permitted to mount databases.
Important: |
---|
Because the alternate witness server's boot time is used to determine whether a DAG member can mount its active databases on startup, you should never restart the alternate witness server and the sole DAG member at the same time. Doing so may leave the DAG member in a state where it cannot mount databases on startup. If this happens, you must run the Restore-DatabaseAvailabilityGroup cmdlet on the DAG. This resets the DACP bit and permits the DAG member to mount databases. |
In addition to preventing split brain syndrome at the application level, DAC mode also enables the use of the built-in site resilience cmdlets used to perform datacenter switchovers. These include the following:
Performing a datacenter switchover for DAGs that are not in DAC mode involves using a combination of Exchange tools and cluster management tools.
For more information about datacenter switchovers, see Datacenter Switchovers.
Performing a datacenter switchover for DAGs that are not in DAC mode involves using a combination of Exchange tools and cluster management tools.
For more information about datacenter switchovers, see Datacenter Switchovers.
DAC mode can be enabled only by using the Exchange Management Shell. Specifically, you can use the Set-DatabaseAvailabilityGroup cmdlet to enable and disable DAC mode, as illustrated in the following example.
Set-DatabaseAvailabilityGroup -Identity DAG2 -DatacenterActivationMode DagOnly
No comments:
Post a Comment