Exchange 2010 Cross Site DAG Disaster Recovery: Data Center/AD site failure Part 2
This is a continuation of my article series on cross-site disaster recovery with an Exchange 2010 DAG.
Exchange 2010 Cross Site DAG Disaster Recovery: Data Center/AD Site failure Part 1: https://msexchangeguru.com/2012/10/25/exchange-2010-dag-dr/
This procedure works when the DAG is stretched across at least 2 data centres, one of which is the primary (Production) site and the other the DR site.
Part 2 – Switchback to the Production Site
Switchback scenarios:
- Production site is still intact.
- There was only a network outage, and it has been fixed.
- The data centre was being moved, and the new location is now up with the same servers.
- Ensure that at least 2 domain controllers with the Global Catalog are available in the Production site.
- Ensure that at least 4 Production servers are back, assuming the 4 DR servers are already up.
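If you prefer to check the Global Catalog requirement from PowerShell, the sketch below counts GC domain controllers registered in a given AD site. It assumes the ActiveDirectory module (Get-ADDomainController, available on Windows Server 2008 R2 and later) and a placeholder site name; Test-ProductionGcCount is a hypothetical helper name, not part of any module. It only reflects what AD knows about the DCs in the current domain, so still confirm that each of them is actually online.

```powershell
# Hypothetical helper: returns $true when at least $Minimum Global Catalog
# domain controllers are registered in $Site. Input objects are expected to
# look like Get-ADDomainController output (Site, IsGlobalCatalog).
function Test-ProductionGcCount {
    param($DomainControllers, [string]$Site, [int]$Minimum = 2)
    $gcs = @($DomainControllers | Where-Object {
        $_.Site -eq $Site -and $_.IsGlobalCatalog
    })
    return ($gcs.Count -ge $Minimum)
}

# Usage on a Production DC ('Production' is a placeholder AD site name):
# Import-Module ActiveDirectory
# $dcs = Get-ADDomainController -Filter *
# Test-ProductionGcCount -DomainControllers $dcs -Site 'Production'
```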
Estimated Time:
Normally the Production site should be up and running within 3 to 4 hours.
Once the Production site is up and running:
If servers were rebuilt or a new data centre was built, we audit them to ensure that patching and the Exchange-level and server-level parameters are configured correctly.
Then we switch back from DR to Production.
Switchback Steps:
- Start the Production data centre domain controllers.
- Replicate Active Directory and wait 10-15 minutes for replication to complete.
-
Transfer FSMO roles: transfer the FSMO roles to DPDC02. Ensure DNS is working and resolving all required servers. Run the command below to check the current FSMO role holders:
Netdom query FSMO
- Log on to Production Domain Controller.
- Add your account to the Enterprise Admins, Schema Admins and Domain Admins groups.
- Log off from the Production Domain Controller so that the new group memberships take effect at the next logon.
- Log back on to the Production Domain Controller.
- Open a command prompt with Run as administrator.
-
Type ntdsutil, and then press ENTER.
-
Type roles, and then press ENTER.
-
Type connections, and then press ENTER.
- Type “connect to server DPDC02”, and then press ENTER.
-
At the server connections prompt, type q, and then press ENTER.
- Type “Transfer rid master” then press ENTER.
- Type “Transfer domain naming master” then press ENTER (on Windows Server 2008 and later domain controllers, type “Transfer naming master” instead)
- Type “Transfer infrastructure master” then press ENTER
- Type “Transfer PDC” then press ENTER
-
Type “Transfer schema master” then press ENTER
Each transfer displays a confirmation dialog asking whether you want to perform it. Click Yes.
- At the fsmo maintenance prompt, type q and then press ENTER.
- Type q, and then press ENTER to quit the Ntdsutil utility.
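To confirm the transfer, rerun Netdom query FSMO and check that every role now points at DPDC02. The PowerShell below is a minimal sketch of that check; ConvertFrom-NetdomFsmo and Test-FsmoOnServer are hypothetical helper names, and the parsing assumes netdom's usual two-column layout (role label, two or more spaces, holder FQDN).

```powershell
# Hypothetical helper: converts the text printed by "netdom query fsmo"
# into a hashtable of role name -> role holder. Role lines are a label,
# two or more spaces, then the holder; other lines are skipped.
function ConvertFrom-NetdomFsmo {
    param([string[]]$Lines)
    $roles = @{}
    foreach ($line in $Lines) {
        # Single-space sentences such as the final status line stay in one piece.
        $parts = $line.Trim() -split '\s{2,}'
        if ($parts.Count -eq 2) {
            $roles[$parts[0]] = $parts[1]
        }
    }
    return $roles
}

# Returns $true only if five roles were found and every holder is $Server
# (short name or FQDN, case-insensitive).
function Test-FsmoOnServer {
    param([hashtable]$Roles, [string]$Server)
    if ($Roles.Count -ne 5) { return $false }
    $pattern = '^' + [regex]::Escape($Server) + '(\.|$)'
    foreach ($holder in $Roles.Values) {
        if ($holder -notmatch $pattern) { return $false }
    }
    return $true
}

# Usage on the Production DC:
# $roles = ConvertFrom-NetdomFsmo -Lines (netdom query fsmo)
# $roles
# Test-FsmoOnServer -Roles $roles -Server 'DPDC02'
```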
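- Start the Production Exchange servers
- Bring up the DAG
Run the following cmdlet to mark the Production mailbox servers as started in the DAG configuration. The -ConfigurationOnly switch updates only the DAG's Active Directory properties; the cluster on each server is started in the next step.
Log on to any Production server and open the Exchange Management Shell with Run as administrator.
Start-DatabaseAvailabilityGroup -Identity DAGName -ActiveDirectorySite "ProductionADSiteName" -ConfigurationOnly
This should complete without errors.
Force AD replication and wait 30 minutes. This is very important, because the next steps rely on the updated DAG configuration having replicated.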
-
Log on to each Production Exchange server in turn, open the Exchange Management Shell with Run as administrator and run the following commands:
- Cluster node ProdServerName /forcecleanup
- Start-DatabaseAvailabilityGroup -Identity DAGName -MailboxServer ProdMailboxServerName
-
On one of the Production Exchange servers, run the following cmdlet:
Set-DatabaseAvailabilityGroup -Identity DAGName -WitnessDirectory WitnessDirectoryPath -WitnessServer "FQDNofWitnessServer"
You should now see that the Production database copies are healthy.
- Check the failover cluster: all nodes should be added, and the cluster resources, including the File Share Witness resource for the Production site FSW, should be online.
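The same check can be scripted with the FailoverClusters module that ships with Windows Server 2008 R2. The sketch below, built around a hypothetical helper named Get-ClusterProblems, reports nodes that are not Up and a File Share Witness resource that is not Online. It deliberately ignores the other resources, because in a multi-subnet DAG one of the cluster IP address resources is expected to stay offline.

```powershell
# Hypothetical helper: takes Get-ClusterNode and Get-ClusterResource output
# and emits one string per problem; no output means all nodes are Up and the
# File Share Witness resource is Online. Other resources are not checked.
function Get-ClusterProblems {
    param($Nodes, $Resources)
    $problems = @()
    foreach ($node in @($Nodes | Where-Object { $_ })) {
        if ("$($node.State)" -ne 'Up') {
            $problems += "Node $($node.Name) is $($node.State)"
        }
    }
    $fsw = @($Resources | Where-Object { $_ -and $_.ResourceType.Name -eq 'File Share Witness' })
    if ($fsw.Count -eq 0) {
        $problems += 'No File Share Witness resource found'
    }
    foreach ($res in $fsw) {
        if ("$($res.State)" -ne 'Online') {
            $problems += "Resource $($res.Name) is $($res.State)"
        }
    }
    return $problems
}

# Usage on a Production DAG member (DAGName is a placeholder):
# Import-Module FailoverClusters
# Get-ClusterProblems -Nodes (Get-ClusterNode -Cluster DAGName) -Resources (Get-ClusterResource -Cluster DAGName)
```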
-
Move the Primary Active Manager (PAM) to a Production DAG member by running the following command in the Exchange Management Shell:
cluster.exe dagfqdn group "Cluster Group" /moveto:nameoftheProductionSitePAM
-
Check the DAG status by running the following cmdlet in the Exchange Management Shell:
Get-DatabaseAvailabilityGroup -Status | fl
The DAG status should show that the primary FSW is in use (WitnessShareInUse: Primary) and that all servers are part of the DAG.
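This check can also be scripted. Below is a minimal sketch with a hypothetical helper, Test-DagSwitchback, that reads the WitnessShareInUse, StoppedMailboxServers and PrimaryActiveManager properties returned by Get-DatabaseAvailabilityGroup -Status and lists anything that does not match the switchback target. The Production server names are placeholders.

```powershell
# Hypothetical helper: compares Get-DatabaseAvailabilityGroup -Status output
# with the switchback target and emits one string per problem found.
function Test-DagSwitchback {
    param($Dag, [string[]]$ProductionServers)
    $problems = @()
    if ("$($Dag.WitnessShareInUse)" -ne 'Primary') {
        $problems += "WitnessShareInUse is '$($Dag.WitnessShareInUse)', expected 'Primary'"
    }
    $stopped = @($Dag.StoppedMailboxServers | Where-Object { $_ })
    if ($stopped.Count -gt 0) {
        $problems += "Stopped mailbox servers: $($stopped -join ', ')"
    }
    # PrimaryActiveManager may be an AD object id rather than a string; use its Name if present.
    $pam = $Dag.PrimaryActiveManager
    if ($pam -and $pam.Name) { $pam = $pam.Name }
    if ($ProductionServers -notcontains "$pam") {
        $problems += "PrimaryActiveManager is '$pam', not a Production server"
    }
    return $problems
}

# Usage in EMS (server names are placeholders):
# $dag = Get-DatabaseAvailabilityGroup -Identity DAGName -Status
# Test-DagSwitchback -Dag $dag -ProductionServers 'PRODMBX01','PRODMBX02'
```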
-
Move active mailbox databases to the Production data centre: once “Copy Queue Length” and “Replay Queue Length” are both 0, move the active mailbox databases to the Production data centre.
Get-MailboxDatabaseCopyStatus * can be used to get the current copy status.
In the Exchange Management Console, right-click each database and select Move Active Mailbox Database → select the Production server in “Mailbox server to host the active database copy” → ensure None is selected in “Override automatic database mount dial setting on the target mailbox server” → click Move.
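From the Exchange Management Shell, the same move can be gated on the queue lengths. The sketch below assumes a hypothetical helper, Get-CopiesNotReady, that filters Get-MailboxDatabaseCopyStatus output for copies that are neither Healthy nor Mounted or that still have a copy or replay queue; the commented usage then calls Move-ActiveMailboxDatabase with -MountDialOverride None, which corresponds to the None choice in the wizard. DB1 and PRODMBX01 are placeholders.

```powershell
# Hypothetical helper: returns the database copies that are NOT ready for
# the switchback, i.e. copies that are neither Healthy nor Mounted, or that
# still have a non-zero copy or replay queue.
function Get-CopiesNotReady {
    param($CopyStatus)
    $CopyStatus | Where-Object {
        ("$($_.Status)" -ne 'Healthy' -and "$($_.Status)" -ne 'Mounted') -or
        $_.CopyQueueLength -ne 0 -or
        $_.ReplayQueueLength -ne 0
    }
}

# Usage in EMS (DB1 and PRODMBX01 are placeholders):
# $notReady = @(Get-CopiesNotReady -CopyStatus (Get-MailboxDatabaseCopyStatus -Identity DB1))
# if ($notReady.Count -eq 0) {
#     Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer PRODMBX01 -MountDialOverride None
# } else {
#     $notReady | Format-Table Name, Status, CopyQueueLength, ReplayQueueLength
# }
```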
-
Move the OAB to the Production site: move the OAB generation server from DR to Production.
- Log on to the DR server
- Open the Exchange Management Console and go to Organization Configuration → Mailbox → Offline Address Book tab.
- Right-click Default Offline Address Book, select Move, select the Production server and click Move.
- Change the internal DNS host record for CAS to point to the Production site.
- If you have just one send connector, change its source servers to the Production servers. Normally we configure 2 send connectors: one with the Production servers as source servers and the other with the DR servers.
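To find which send connectors still use DR servers as source servers, a small filter over Get-SendConnector output is enough. Get-ConnectorsUsingServers is a hypothetical helper, and the server and connector names are placeholders; Set-SendConnector with -SourceTransportServers is the cmdlet that actually changes the source servers.

```powershell
# Hypothetical helper: emits the names of send connectors whose source
# servers include any of the given (DR) transport servers.
function Get-ConnectorsUsingServers {
    param($Connectors, [string[]]$Servers)
    foreach ($connector in @($Connectors | Where-Object { $_ })) {
        # Source servers are AD object ids in EMS; fall back to the string value.
        $sources = @($connector.SourceTransportServers | ForEach-Object {
            if ($_.Name) { $_.Name } else { "$_" }
        })
        foreach ($source in $sources) {
            if ($Servers -contains $source) {
                $connector.Name
                break  # one match is enough for this connector
            }
        }
    }
}

# Usage in EMS (server and connector names are placeholders):
# Get-ConnectorsUsingServers -Connectors (Get-SendConnector) -Servers 'DRMBX01','DRMBX02'
# Set-SendConnector -Identity 'Internet' -SourceTransportServers 'PRODMBX01','PRODMBX02'
```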
-
Change the CAS Array site:
- Log on to a Production server
- Open the Exchange Management Shell
-
Run the following cmdlet to get the current site:
Get-ClientAccessArray -Identity CASArrayName | fl Site
-
Run the following cmdlet to change the CAS array to the Production site:
Set-ClientAccessArray -Identity CASArrayName -Site "ProductionADSiteName"
-
Run the following cmdlet to verify the change:
Get-ClientAccessArray -Identity CASArrayName | fl Site
- Change the public DNS records
Point the CAS and MX host records at the Production site IP addresses instead of the DR site.
-
Change the Public Folder server: change the public folder database to the Production public folder database.
- Log on to a Production server
- Open the Exchange Management Shell.
-
Run the following cmdlet, which sets the Production public folder database as the default on every mailbox database:
Get-MailboxDatabase | Set-MailboxDatabase -PublicFolderDatabase "Name of the Production PF DB"
- Perform forest-wide Active Directory replication: replicate across the forest so that all DNS and AD servers receive the updated information and all clients connect to the correct mailbox servers.
-
Back up the following using DPM:
- System State backup.
- Exchange backup.
-
Perform forest-wide Active Directory replication: replicate across the forest so that all DNS and AD servers receive the updated information and all clients connect to the correct mailbox servers.
- Log on to a Production AD site Domain Controller
- Open Active Directory Sites and Services
- Go to Sites → (each site) → Servers → (each server) → NTDS Settings
- Right-click each connection and select Replicate Now, then repeat this on the other domain controllers
The Production site should now be up and running. :)
Please let me know your feedback, if any.
Prabhat Nigam (Wizkid)
Team@MSExchangeGuru
Please note that if you need assistance performing disaster recovery, we can help. Send an email to prabhat@msexchangeguru.com and cc ratish@msexchangeguru.com, and one of us will get back to you.
June 5th, 2013 at 11:40 am
Excellent article. Just wanted to understand a scenario should the FSW be in a 3rd site.
June 5th, 2013 at 12:52 pm
3rd site is for 2013
August 15th, 2013 at 11:55 am
Thank you very much, well written and helpful. My question is on switchback to the primary site:
1- when you restart the DAG in the production site (Start-DatabaseAvailabilityGroup -Identity DAGName -ActiveDirectorySite “Production AD Site DN” –ConfigurationOnly) is it still running at the failover site?
2- When you move the Primary Active Manager, is that when reseeding of the production site database copies begins? This should be “lossless”?
I appreciate you writing the article and taking the time to respond.
D Byron
August 15th, 2013 at 1:32 pm
@D Byron
1. Yes
2. No reseeding is required. The current DB replication continues unless all servers were rebuilt because of a fire or flood. And yes, there should not be any log file loss.
May 6th, 2014 at 10:39 pm
Hi,
Good day to you. I have one Exchange 2013 SP1 DAG with 2 members, each in its own AD site, plus a witness server.
The problem is that when I cut the WAN connection, the databases fail over to the DR site even though DAC mode is enabled. Do you have any idea why?
All databases are active on site A; when I cut the WAN link, the databases on site B become active.
I don't want to use Suspend-MailboxDatabaseCopy with the -ActivationOnly parameter, blocking, or anything else.
Sorry for posting in the wrong place.
Regards
May 7th, 2014 at 12:54 am
@Sajid
2013 DR will work almost the same way as 2010.
You can't have 2 FSW servers in use at the same time, so it depends on whether the FSW is reachable from the other AD site's servers; if it is, the DB will mount there.
In your example I assume FSW was in the site B.
If FSW and DB servers are down then other site can’t activate the databases.
With 2013 we have the option of 3 sites, in which case the databases stay up as long as any 2 sites are up.
Watch my video to learn more – https://www.youtube.com/watch?v=5bMh5aJ5WT8
May 7th, 2014 at 1:20 am
Dear Prabhat,
I have two sites. In site A (head office), my domain controller hosts the primary file share witness; in site B (DR site), my additional domain controller hosts the secondary file share witness. All DBs are mounted in site A, DAC mode is configured and everything is working fine.
When the WAN link goes down, the DR site databases come up automatically.
Hope I explained it well this time.
Thanks for sharing your video.
Regards
May 7th, 2014 at 1:33 am
In addition, if I shut down my whole environment and then start my primary site Exchange server, it doesn't mount the databases in the primary site, even though the file share witness is up and running and shows as the primary file share witness.
It only works after I start the DR site Exchange server and move the databases to the primary site Exchange server.
thanks
May 7th, 2014 at 4:12 am
The alternate FSW does not come into use until we activate DR. I have not tested a real DR with 2013, but since it is the same technology it should behave the same way.
I would like to see where your PAM (Primary Active Manager) is. It should be in the primary AD site.
If you are running Exchange 2013 on Windows Server 2012, Windows Server 2012 has dynamic quorum, so we might need to check whether it is causing the problem. Check whether disabling dynamic quorum helps.
September 6th, 2014 at 6:34 am
But if your main site comes back online, you will end up with 2 DCs holding the schema master role, which will cause a conflict,
so you first need to take a backup of AD in the DR site, restore it in the main site and then turn off the DR AD.
September 6th, 2014 at 7:34 pm
@Faris
Very good question!
If the old schema master is healthy, then you need to format the server, clean up its metadata, rejoin it to the domain, install AD on it and then transfer the role back to the old schema master. This is the right way to bring back the DC after disaster recovery, and it will not take more than 2-3 hours depending on the hardware.
May 7th, 2015 at 8:09 am
Hello Prabhat
We have 2 AD sites, and each site contains 3 Exchange 2010 SP2 servers
Prod Site
==========
PROD-EX1
PROD-EX2
PROD-EX3
DR Site
==========
DR-EX1
DR-EX2
DR-EX3
Each server has the MBX, CAS and HT roles installed on it
We have a DAG in DAC mode
Last week we had a storage outage in the primary site, so we had to fail over to the DR site, but AD in the primary site is functional
Now everything is up in the DR site and we are planning to fail back to the primary site
Kindly let me know if I am missing anything in the points below
1. Force cluster cleanup in the primary data center
cluster Node /forcecleanup
NOTE: when I ran start-DatabaseAvailabilityGroup | fl
I found all 6 servers in StartedMailboxServers
The cluster service on the primary site servers is in the Started state, and the cluster service startup type is Automatic
So my question is: do I still need to run the command below to clean up the cluster in the primary site?
cluster Node /forcecleanup
Do I have to set the cluster service to Disabled?
2. Start DAG
Start-DatabaseAvailabilityGroup –identity –ActiveDirectorySite “Prod Site name”
3. Set DAG
Set-DatabaseAvailabilityGroup –Identity DAG1 –WitnessServer -AlternateWitnessServer
4. Reseed failed database to PRIMARY Site
5. Move PAM to Primary site
6. Move active mailbox databases to the PRIMARY site
Kindly let me know your inputs
Thanks,
Sandip
May 9th, 2015 at 4:03 am
-If only the databases and logs were on that storage, your Exchange servers' OS should still be up and running. Can you clarify this part?
-If the Exchange servers are up, then just do the reseeding