High Availability and Disaster Recovery in SQL Server

Avoid working without a net; implement a SQL Server HADR option

We’re often asked about SQL Server High Availability and Disaster Recovery (HADR) options to help customers protect their data and improve their uptime. No surprise there. When a SQL Server provides a critical component of your daily operations you need those databases to be available. Simply put, when a user needs information from the database application, whether it’s an operational, financial, customer relationship management (CRM), or some other system, you need the application and the database to be available.

You also need to be able to recover from an unexpected disaster quickly. Whether due to corruption, theft, or a natural disaster, you need to be able to recover the data and database systems quickly. (A process that should be clearly documented and periodically practiced.)

The last thing you want is for your business to grind to a halt while the IT team attempts to figure out what happened and how to respond. Your customers will be waiting while your employees are idled despite wanting to do their jobs. Running a database system without considering HADR is like walking a tightrope without a net.

The Cost of Unplanned Downtime

Unplanned downtime costs companies an average of $5,600 per minute according to a 2014 Gartner study. A more recent 2016 study from the Ponemon Institute put the figure at $8,850 per minute. That’s over a half million dollars an hour of lost revenue, decreased productivity, and incident response costs. Other studies suggest that many unplanned outages involved database servers. Ouch!

So, it goes without saying that good HADR is important, but it’s not a replacement for performing Daily Health Checks and actively monitoring the SQL Server to ensure their health.

4 High Availability and Disaster Recovery Options in SQL Server

Although High Availability and Disaster Recovery is a much broader topic than the database servers, the following SQL Server options should be considered as part of the larger discussion.

1. SQL Backup and Restore

Database and log backups are the foundation of any HADR implementation. It’s the basic blocking and tackling of any database implementation. Every SQL Server should be backed up periodically in addition to any other HADR solutions you may have implemented.

We recommended using native SQL Server backups. Of course, to recover from a catastrophic failure, you’ll have to (re)create a new SQL Server system to restore the backups. Taking frequent snapshots of the virtual server can help with the system recovery but those are no substitute for true database backups.

Backing up is important. But it’s really the ability to restore that important. Periodically restore selected databases to another system and run integrity checks on those databases. Document the process so that in a true disaster recovery scenario, you’ll have a proven protocol.

2. Logshipping in SQL Server

Log shipping leverages the native SQL Server log backup processes and creates a more controlled and automated recovery story. By automatically sending transaction log backups to a secondary SQL Server instance, you’ll have a standby server available with as little as a few minutes of lost data. That’s good news.

Should a disaster strike, client applications and other systems will have to be adjusted to point to the new SQL Server instance. This typically requires manual intervention and will not be seamless for the end users. They’ll know that a failure has occurred, however, recovery times are typically faster and the recovery points more granular than with an ordinary backup approach.

3. Failover Clustered Instances (FCIs)

Failover Clustered Instances takes your HADR game up several levels. Leveraging Windows Server Failover Clusters (WSFC), a single instance of SQL Server can be installed across multiple WSFC nodes. Should one of the nodes experience a hardware or software failure, key services such as SQL Server, are automatically shifted to another node.

With FCIs, SQL Server appears to client applications as one instance running on one Windows Server. Since services are automatically shifted to another node during an unexpected failure or planned maintenance cycle, client applications are automatically redirected to the active node. They typically do not notice any disruption in service.

The FCI is online as long as its underlying WSFC cluster is in good quorum health. This instance level redundancy provides substantially reduced potential for noticeable downtime for clients.

4. Availability Groups (AGs)

With Always On Availability Groups, transaction logs are synchronized from the primary replica to one or more secondary replicas. Should the primary replica become unavailable, either due to a failure or scheduled maintenance, a secondary replica transitions to the primary role. Clients, which connect to a listener, are unaware of any changes to the actual location of the primary replica.

Availability Groups further decrease the likelihood of a noticeably disruptive event.

Looking for Additional HADR Information?

Of course, this is a high-level overview of some options for High Availability and Disaster Recovery options in SQL Server. There are many other factors to consider as well, including backup methodologies, cloud strategies, and virtualization techniques. You can even implement Distributed Availability Groups or Failover Clustered Instances across multiple subnets. These are topics for another post.

Additionally, you’ll need to fully understand the business requirements of your stakeholders for Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). The approaches mentioned here obviously come with varying levels of RTO and RPO capabilities and with different licensing price points.

If you’re looking for additional information, check out Microsoft’s Business Continuity and Database Recovery on SQL Docs. Additionally Luke Campbell (@luke7621) has a series on Creating Failover Clustered Instances in Azure.

Want to work with The Sero Group?

Want to learn more about how SERO Group helps organizations take the guesswork out of managing their SQL Servers? It’s easy and there is no obligation.

Schedule a call with us to get started.

Clustering Clusters Database Database Administration Database Development IT Manager Microsoft Azure Sero Sero Group Shared Disks SQL SQL Assessment SQL Audit SQL Consultant SQL Security SQL Server SQL Server Consultant SQL Server Management The Sero Group

4 Responses

Do I Really Need SQL Server Enterprise Edition? - The SERO Group says:

January 28, 2021 at 10:38 am

[…] part of your HADR strategy rather than implementing Availability Groups. For more information, see High Availability and Disaster Recovery in SQL Server and How to Create SQL Server 2019 Failover Clustered Instances in […]

Reply
Script: How Long Until My SQL Server Backup/Restore Completes? - The SERO Group says:

January 28, 2021 at 11:14 am

[…] High Availability and Disaster Recovery in SQL Server […]

Reply
Do I Still Need a SQL Server Health Check? - The SERO Group says:

January 25, 2023 at 1:07 pm

[…] Configurations that don’t support the organization’s RTOs and RPOs – see High Availability and Disaster Recovery in SQL Server […]

Reply
HADR for Crying Out Loud - The SERO Group says:

May 9, 2023 at 11:10 am

[…] High Availability and Disaster Recovery in SQL Server – The SERO Group […]

Reply

High Availability and Disaster Recovery in SQL Server