MSExchangeGuru.com

Learn Exchange the Guru way !!!

 

Exchange 2010 Cross Site DAG Disaster Recovery: Data Center/AD site failure Part 2

This is a continuation of my article series on cross site disaster recovery with Exchange 2010 DAG.

Exchange 2010 Cross Site DAG Disaster Recovery: Data Center/AD Site failure Part 1: https://msexchangeguru.com/2012/10/25/exchange-2010-dag-dr/

This procedure applies when the DAG is stretched across at least two data centres, one acting as the primary site and the other as the DR site.

Part 2 – Switchback to the Production Site

Switchback scenarios:

  1. The production site is still intact.
  2. There was only a network outage, and it has been fixed.
  3. The data centre was being moved, and the new location is now up with the same servers.

Prerequisites:

  1. At least two domain controllers with the Global Catalog are available in the production site.
  2. At least four production Exchange servers are back online, assuming four DR servers are already up.
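The prerequisites above amount to a simple pre-flight gate. The Python below is a minimal sketch, not part of the original procedure: the `Server` record, the site label "Production" and all server names are hypothetical, and in practice the inventory would come from your own AD and Exchange tooling rather than hard-coded lists.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical inventory record for a domain controller or Exchange server."""
    name: str
    site: str
    online: bool
    is_global_catalog: bool = False


def switchback_blockers(dcs, exchange_servers, prod_site="Production",
                        min_gcs=2, min_exchange=4):
    """Return the unmet prerequisites; an empty list means ready to switch back."""
    blockers = []
    # Prerequisite 1: enough online Global Catalog DCs in the production site.
    gcs = [d for d in dcs
           if d.site == prod_site and d.online and d.is_global_catalog]
    if len(gcs) < min_gcs:
        blockers.append(f"{len(gcs)} online GC(s) in {prod_site}, need {min_gcs}")
    # Prerequisite 2: enough production Exchange servers back online.
    ex = [s for s in exchange_servers if s.site == prod_site and s.online]
    if len(ex) < min_exchange:
        blockers.append(
            f"{len(ex)} online Exchange server(s) in {prod_site}, need {min_exchange}")
    return blockers


if __name__ == "__main__":
    dcs = [Server("DPDC01", "Production", True, True),
           Server("DPDC02", "Production", True, True)]
    # PRODEX4 is still offline in this made-up example.
    exchange = [Server(f"PRODEX{i}", "Production", i != 4) for i in range(1, 5)]
    print(switchback_blockers(dcs, exchange))
    # ['3 online Exchange server(s) in Production, need 4']
```

The thresholds are parameters so the same gate can follow your own DAG layout if it differs from the four-plus-four example.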

Estimated Time:

Normally the production site should be up and running within 3 to 4 hours.

Once the production site is up and running:

If new servers or a new data centre were built, we perform audits to ensure that patching and the Exchange-level and server-level parameters are configured correctly.

Then we switch back from DR to Production.

Switchback Steps:

  1. Start Production Data Centre Domain Controllers
  2. Replicate Active Directory and wait 10-15 minutes for replication to complete.
  3. Transfer FSMO roles: transfer the FSMO roles to DPDC02. Ensure DNS is working and resolving all required servers. Run the command below to check the current FSMO role holders:

    Netdom query FSMO

    1. Log on to the production domain controller.
    2. Add your account to the Enterprise Admins, Schema Admins and Domain Admins groups.
    3. Log off from the production domain controller.
    4. Log back on to the production domain controller so that the new group memberships take effect.
    5. Open a command prompt with Run as administrator.
    6. Type ntdsutil, and then press ENTER.
    7. Type roles, and then press ENTER.
    8. Type connections, and then press ENTER.
    9. Type "connect to server DPDC02", and then press ENTER.
    10. At the server connections prompt, type q, and then press ENTER.
    11. Type "transfer rid master", and then press ENTER.
    12. Type "transfer domain naming master", and then press ENTER. (On Windows Server 2008 and later, the command is "transfer naming master".)
    13. Type "transfer infrastructure master", and then press ENTER.
    14. Type "transfer PDC", and then press ENTER.
    15. Type "transfer schema master", and then press ENTER.

      Each transfer displays a confirmation dialog asking whether you want to perform the transfer. Click Yes.

    16. At the fsmo maintenance prompt, type q, and then press ENTER.
    17. Type q, and then press ENTER to quit the Ntdsutil utility.
  4. Start Exchange Servers
  5. Bring up the DAG
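The FSMO transfer in step 3 can also be replayed non-interactively and then verified. The Python below is a sketch under stated assumptions: it only builds the ntdsutil argument list from the steps above (ntdsutil accepts its commands as quoted command-line arguments) and parses `netdom query fsmo` output, assuming each role is printed on its own line with the holder's FQDN as the last token. The role labels, the sample output and the names DRDC01 and contoso.com are illustrative, and the transfers may still prompt for confirmation.

```python
ROLES = ("Schema master", "Domain naming master", "PDC",
         "RID pool manager", "Infrastructure master")


def ntdsutil_transfer_args(target_dc, naming_cmd="transfer naming master"):
    """Replay the interactive ntdsutil steps as one argument list.

    naming_cmd: Windows Server 2008 and later use 'transfer naming master';
    Windows Server 2003 used 'transfer domain naming master'.
    """
    return [
        "roles",
        "connections",
        f"connect to server {target_dc}",
        "q",                               # back to the fsmo maintenance prompt
        "transfer rid master",
        naming_cmd,
        "transfer infrastructure master",
        "transfer pdc",
        "transfer schema master",
        "q",                               # leave fsmo maintenance
        "q",                               # quit ntdsutil
    ]


def as_command_line(args):
    """Quote each ntdsutil command so the line can be pasted into cmd.exe."""
    return "ntdsutil " + " ".join(f'"{a}"' for a in args)


def parse_netdom_fsmo(output):
    """Map each role label to the last token on its line of netdom output."""
    holders = {}
    for line in output.splitlines():
        for role in ROLES:
            if line.startswith(role):
                holders[role] = line.split()[-1]
    return holders


def roles_not_on(holders, target_dc):
    """List roles whose holder's short host name is not target_dc (or is missing)."""
    return [role for role in ROLES
            if holders.get(role, "").split(".")[0].upper() != target_dc.upper()]


if __name__ == "__main__":
    print(as_command_line(ntdsutil_transfer_args("DPDC02")))
    sample = (
        "Schema master               DPDC02.contoso.com\n"
        "Domain naming master        DPDC02.contoso.com\n"
        "PDC                         DPDC02.contoso.com\n"
        "RID pool manager            DRDC01.contoso.com\n"
        "Infrastructure master       DPDC02.contoso.com\n"
        "The command completed successfully.\n"
    )
    print(roles_not_on(parse_netdom_fsmo(sample), "DPDC02"))  # ['RID pool manager']
```

Running `netdom query fsmo` again after the transfer and feeding its output to `roles_not_on` should return an empty list once every role is on DPDC02.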

Run the following cmdlet to start the DAG (clustering) on the production Exchange servers.

Log on to any production Exchange server and open the Exchange Management Shell with Run as administrator.

Start-DatabaseAvailabilityGroup -Identity DAGName -ActiveDirectorySite "ProductionADSiteName" -ConfigurationOnly

This should complete without errors.

Force AD replication and wait 30 minutes. This is very important.

  1. Log on to each production Exchange server in turn, open the Exchange Management Shell with Run as administrator and run the following commands:
    1. cluster node ProdServerName /forcecleanup
    2. Start-DatabaseAvailabilityGroup -Identity DAGName -MailboxServer ProdMailboxServerName
  2. On one of the production Exchange servers, run the following cmdlet:

    Set-DatabaseAvailabilityGroup -Identity DAGName -WitnessDirectory WitnessDirectoryPath -WitnessServer "FQDNofWitnessServer"
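With several production servers, the per-server ordering above is easy to get wrong. As a sketch, this Python builds the runbook as (host, command) pairs in the order the steps run them; it only generates text and executes nothing. DAG1, PRODEX1/PRODEX2, the witness FQDN and the witness directory are hypothetical placeholders.

```python
def dag_restart_plan(dag_name, prod_servers, witness_server, witness_dir):
    """Return (host, command) pairs in the order the steps above run them."""
    plan = []
    for server in prod_servers:
        # Step 1 runs locally on each production server, one server at a time.
        plan.append((server, f"cluster node {server} /forcecleanup"))
        plan.append((server, "Start-DatabaseAvailabilityGroup "
                             f"-Identity {dag_name} -MailboxServer {server}"))
    # Step 2 runs once, from any one production server (here the first).
    plan.append((prod_servers[0],
                 f"Set-DatabaseAvailabilityGroup -Identity {dag_name} "
                 f'-WitnessDirectory "{witness_dir}" '
                 f'-WitnessServer "{witness_server}"'))
    return plan


if __name__ == "__main__":
    for host, command in dag_restart_plan(
            "DAG1", ["PRODEX1", "PRODEX2"], "fsw01.contoso.com", r"C:\DAG1FSW"):
        print(f"[{host}] {command}")
```

Keeping the host next to each command makes it clear which steps are per-server and which one runs only once.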

The production database copies should now show as healthy.

  1. Check the failover cluster: all nodes should be joined, and the cluster resources, including the file share witness resource on the production-site FSW, should be online.
  2. Move the Primary Active Manager (PAM) to a production DAG member with the following command:

    cluster.exe DAGFQDN group "Cluster Group" /moveto:ProductionSitePAMServerName

  3. Check the DAG status with the following cmdlet in the Exchange Management Shell:

    Get-DatabaseAvailabilityGroup -Status | fl

    The DAG status should show that the primary FSW is in use (WitnessShareInUse: Primary) and that all servers are part of the DAG.
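Whether the cluster can stay online depends on majority voting, which is why the FSW resource matters here. The function below is a simplified model, not Exchange's own logic: it assumes every DAG member has one vote, that an even member count uses Node and File Share Majority (the FSW adds one vote) and an odd count uses Node Majority, and it ignores DAC mode and Windows Server 2012 dynamic quorum.

```python
def dag_quorum(total_members, members_up, witness_up):
    """Simplified majority check for a DAG's underlying failover cluster.

    Returns (has_quorum, votes_available, votes_needed).
    """
    uses_witness = total_members % 2 == 0          # Node and File Share Majority
    votes = total_members + (1 if uses_witness else 0)
    needed = votes // 2 + 1                        # strict majority of all votes
    have = members_up + (1 if uses_witness and witness_up else 0)
    return have >= needed, have, needed


if __name__ == "__main__":
    # Hypothetical 8-member DAG, 4 servers per site.
    print(dag_quorum(8, 4, True))   # (True, 5, 5): one site plus the FSW
    print(dag_quorum(8, 4, False))  # (False, 4, 5): one site without the FSW
```

In this model, with four members per site, one site plus a reachable FSW holds exactly the five votes needed, while the same four servers without the witness do not.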

  1. Move active mailbox databases to the production data centre: once "Copy Queue Length" and "Replay Queue Length" are both 0, move the active mailbox databases to the production data centre.

    Get-MailboxDatabaseCopyStatus * can be used to check the current copy status.

    In the Exchange Management Console, right-click the database and select Move Active Mailbox Database > select the production server in "Mailbox server to host the active database copy" > ensure None is selected in "Override automatic database mount dial setting on the target mailbox server" > click Move. Repeat for all databases.

  2. Move the OAB to the production site: move the OAB generation server from DR to Production.
    1. Log on to the DR server.
    2. Open the Exchange Management Console and go to Organization Configuration > Mailbox > Offline Address Book tab.
    3. Right-click Default Offline Address Book, select Move, select the production server and click Move.
  3. Change the internal DNS host record for the CAS to point to the production site.
  4. If you have just one send connector, change its source servers to the production servers. Normally we configure two send connectors: one with the production source servers and the other with the DR source servers.
  5. Change the CAS array site:
    1. Log on to a production server.
    2. Open the Exchange Management Shell.
    3. Run the following cmdlet to get the current site:

      Get-ClientAccessArray -Identity CASArrayName | fl Site

    4. Run the following cmdlet to change the CAS array to the production site:

      Set-ClientAccessArray -Identity CASArrayName -Site "Production AD Site"

    5. Run the following cmdlet to verify the change:

      Get-ClientAccessArray -Identity CASArrayName | fl Site

  6. Change public DNS records: point the CAS host record and the MX records from the DR site IP to the production site IP.
  7. Change the public folder database: point the mailbox databases at the production public folder database.
    1. Log on to the production server.
    2. Open the Exchange Management Shell.
    3. Run the following cmdlet (it updates every mailbox database; use Set-MailboxDatabase -Identity to target a single database instead):

      Get-MailboxDatabase | Set-MailboxDatabase -PublicFolderDatabase "Name of the Production PF DB"

  8. Back up the following using DPM:
    1. System state.
    2. Exchange databases.
  9. Perform forest-wide Active Directory replication so that all DNS and AD servers are updated and all clients connect to the correct mailbox servers.
    1. Log on to a domain controller in the production AD site.
    2. Open Active Directory Sites and Services.
    3. Go to Sites > each site > Servers > each server > NTDS Settings.
    4. Right-click each connection and select Replicate Now; repeat on the other domain controllers.
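The database move in step 1 is gated on both queues reaching 0. As a hedged sketch, this Python picks, for each database, a healthy production copy with empty copy and replay queues and emits the shell counterpart of the console move, `Move-ActiveMailboxDatabase` with `-MountDialOverride None`. The `CopyStatus` record is a hypothetical shape for rows taken from `Get-MailboxDatabaseCopyStatus *`, and all database and server names are made up. A database whose active copy is already mounted in production is reported as blocked here, so review that list rather than treating it as an error.

```python
from dataclasses import dataclass


@dataclass
class CopyStatus:
    """One row of Get-MailboxDatabaseCopyStatus output (hypothetical shape)."""
    database: str
    server: str
    status: str          # e.g. "Mounted", "Healthy"
    copy_queue: int      # CopyQueueLength
    replay_queue: int    # ReplayQueueLength


def plan_moves(copies, prod_servers):
    """Pick, per database, the first healthy production copy with both queues at 0.

    Returns (moves, blocked): moves maps database -> target server, blocked
    lists databases with no production copy ready yet.
    """
    moves = {}
    for c in copies:
        if (c.server in prod_servers and c.status == "Healthy"
                and c.copy_queue == 0 and c.replay_queue == 0):
            moves.setdefault(c.database, c.server)
    blocked = sorted({c.database for c in copies} - set(moves))
    return moves, blocked


def move_commands(moves):
    """Shell equivalent of the console move with the mount dial override set to None."""
    return [f"Move-ActiveMailboxDatabase -Identity {db} "
            f"-ActivateOnServer {server} -MountDialOverride None"
            for db, server in sorted(moves.items())]


if __name__ == "__main__":
    copies = [
        CopyStatus("DB01", "DREX1", "Mounted", 0, 0),
        CopyStatus("DB01", "PRODEX1", "Healthy", 0, 0),
        CopyStatus("DB02", "DREX2", "Mounted", 0, 0),
        CopyStatus("DB02", "PRODEX2", "Healthy", 12, 0),   # still copying logs
    ]
    moves, blocked = plan_moves(copies, {"PRODEX1", "PRODEX2"})
    print(move_commands(moves))
    print("wait for:", blocked)
```

Re-running the check until the blocked list only contains databases already active in production mirrors the "wait for both queues to reach 0" rule.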

The production site should now be up and running :)

Please let me know if you have any feedback.

Prabhat Nigam (Wizkid)
Team@MSExchangeGuru

Please note that if you need assistance in performing disaster recovery, we can help. Send an email to prabhat@msexchangeguru.com with ratish@msexchangeguru.com in copy, and one of us will get back to you.

14 Responses to “Exchange 2010 Cross Site DAG Disaster Recovery: Data Center/AD site failure Part 2”

  1. Exchange 2010 Cross Site DAG Disaster Recovery: Data Center/AD Site failure Part 1 « MSExchangeGuru.com Says:

    […] Exchange 2010 Cross Site DAG Disaster Recovery: Data Center/AD site failure Part 2 « MSExchang… Says: October 30th, 2012 at 1:16 pm […]

  2. DJ Says:

    Excellent article. I just wanted to understand the scenario where the FSW is in a third site.

  3. Prabhat Nigam Says:

    The third-site scenario is for Exchange 2013.

  4. D Byron Says:

    Thank you very much, well written and helpful. My question is on switchback to the primary site:

    1- when you restart the DAG in the production site (Start-DatabaseAvailabilityGroup -Identity DAGName -ActiveDirectorySite “Production AD Site DN” –ConfigurationOnly) is it still running at the failover site?

    2- When you move the Primary Active Manager, is that when reseeding of the production site database copies begins? This should be “lossless”?

    I appreciate you writing the article and taking the time to respond.

    D Byron

  5. Prabhat Nigam Says:

    @D Byron
    1. Yes
    2. No reseeding is required. The current database replication continues, unless all servers were rebuilt because of a fire or flood. And yes, there should not be any log file loss.

  6. movi Says:

    Hi,
    Good day to you. I have one Exchange 2013 SP1 DAG with 2 members, each in its own AD site, plus a witness server.
    The problem is that when I cut the WAN connection, the databases fail over to the DR site even though DAC mode is activated. Do you have any idea why?
    All databases are on site A; when I cut the WAN link, the databases on site B become active.
    I don't want to use Suspend-MailboxDatabaseCopy with the -ActivationOnly parameter, activation blocking or anything else.
    Sorry for posting in the wrong place.
    Regards

  7. Prabhat Nigam Says:

    @Sajid
    2013 DR will work almost the same way as 2010.
    You can't have 2 FSW servers in use at once, so it depends on whether the FSW is reachable from the servers in the other AD site; if it is, the databases will mount there.
    In your example, I assume the FSW was in site B.
    If the FSW and the DB servers are down, the other site can't activate the databases.

    With 2013 we have the option of 3 sites, in which the databases stay up as long as any 2 sites are up.
    Watch my video to learn more – https://www.youtube.com/watch?v=5bMh5aJ5WT8

  8. movi Says:

    Dear Prabhat,
    I have two sites. In site A (head office), my domain controller hosts the primary file share witness; in site B (DR site), my additional domain controller hosts the secondary file share witness. All DBs are mounted in site A, DAC mode is configured and everything is working fine.
    When the WAN link goes down, the DR site databases come up automatically.
    I hope I have explained it better this time.
    Thanks for sharing your video.
    Regards

  9. movi Says:

    In addition, if I shut down my whole environment and start my primary site Exchange server, it doesn't mount the databases in the primary site, even though the file share witness is up and running and shows as the primary file share witness,
    until I start the DR site Exchange server and move the databases to the primary site Exchange server.
    Thanks

  10. Prabhat Nigam Says:

    The alternate FSW does not come into use until we activate DR. I have not tested a real 2013 DR, but since it is the same technology, it should behave the same way.

    I would like to see where your PAM (Primary Active Manager) is. It should be in the primary AD site.

    If you are running Exchange 2013 on Windows 2012, note that Windows 2012 has dynamic quorum, so we should check whether it is causing the problem. Check if disabling dynamic quorum helps.

  11. Faris Mal Says:

    But if your main site comes online again, you will end up with 2 DCs holding the schema master role, which will cause a conflict.
    So you first need to take a backup of AD in the DR site, restore it in the main site and then turn off the DR AD.

  12. Prabhat Nigam Says:

    @Faris
    Very good question!
    If the old schema master is healthy, you need to format the server, clean up its metadata, rejoin it to the domain, install AD on it and then transfer the role back to the old schema master. This is the right way to bring back the DC after disaster recovery, and it will not take more than 2-3 hours depending on the hardware.

  13. Sandip Says:

    Hello Prabhat

    We have 2 AD sites, and each site contains 3 Exchange 2010 SP2 servers.

    Prod Site
    ==========
    PROD-EX1
    PROD-EX2
    PROD-EX3

    DR Site
    ==========
    DR-EX1
    DR-EX2
    DR-EX3

    Each server has the MBX, CAS and HT roles installed.

    We have a DAG in DAC mode.

    Last week we had a storage outage in the primary site, so we had to fail over to the DR site, but AD in the primary site is functional.

    Now everything is up on the DR site, and we are planning to fail back to the primary site.

    Kindly let me know if I am missing anything in the points below.

    1. Force cluster cleanup in the primary data center

    cluster Node /forcecleanup

    NOTE: when I ran start-DatabaseAvailabilityGroup | fl

    I found all 6 servers in StartedMailboxServers.

    The Cluster service on the primary site servers is in the started state, and its startup type is Automatic.

    So my question is: do I still need to run the command below to clean up the cluster in the primary site?

    cluster Node /forcecleanup

    Do I have to change the Cluster service status to Disabled?

    2. Start DAG

    Start-DatabaseAvailabilityGroup –identity –ActiveDirectorySite “Prod Site name”

    3. Set DAG

    Set-DatabaseAvailabilityGroup –Identity DAG1 –WitnessServer -AlternateWitnessServer

    4. Reseed the failed databases to the primary site

    5. Move the PAM to the primary site

    6. Move the active mailbox databases to the primary site

    Kindly let me know your inputs.

    Thanks,
    Sandip

  14. Prabhat Nigam Says:

    - If the storage held only the databases and logs, your Exchange servers' OS should be up and running. Can you clarify this part?

    - If the Exchange servers are up, then just do the reseeding.
