What’s the Difference in SQL Server FCIs and AGs?

What’s the Difference in SQL Server FCIs and AGs?

SQL Server FCI or AG?

“What’s the difference in SQL Server FCIs and AGs?” That question comes up frequently when talking High Availability and Disaster Recovery (HADR). It’s especially important when “Everything stops and people can’t do their jobs if SQL Server is unavailable.”

There are a lot of how-to articles on the internet for setting up Always On Failover Clustered Instances (FCIs) and Always On Availability Groups (AGs). In fact, Luke Campbell created a couple of nice guides – How to Create SQL Server 2019 Failover Clustered Instances in Azure and Azure shared disks – Failover Clustered Instance – SQL Server 2016.

So, I won’t deep dive into the technologies. Rather I’ll share some of their key, high-level differences. The goal is to help you identify the better choice for your environment. Even with that narrow scope, this is a big topic. I’ll touch on the basics here and elaborate on the more advanced aspects in the future.

Keep in mind that there are lots of ways to combine different technologies to improve your HADR capabilities. For example, you can use a FCI to host one of the AG replicas. You can use Distributed AGs in different datacenters. And many other combinations.

There’s a TLDR; summary table comparing the two at the end of the post.

Comparing SQL Server FCIs and AGs

FCIs and AGs aren’t exactly apples and oranges. They have similarities. Both are native to SQL Server. And both can greatly reduce planned and unplanned downtime. However, there are some key differences in their requirements and the way they are implemented. These differences may influence what’s best in your environment. Let’s have a look.

What are the hardware and software requirements for FCIs and AGs?

Let’s begin by considering some of the aspects that may make one of these technologies a non-starter for you – the hardware and software requirements.

Windows Server Failover Clustering (WSFC) serves as the underlying foundation for both FCIs and AGs. At least in the Windows world. Pacemaker is used for clustering Linux instances. Here, I’ll refer to them collectively as WSFC.

FCI requirements and operation

With an FCI you’re essentially installing one instance of SQL Server across two or more nodes of a WSFC. Only one node is active at a time, the primary node. The secondary nodes are available, but not active. Clients connect to the Virtual Network Name and do not need to know which node is currently serving as the primary.

Should the primary WSFC node become unavailable, the SQL Server services are started on another node and it becomes the primary node. All databases, logins, jobs, etc, that are part of that SQL Server instance are available on the new primary node. That’s good. FCIs provide instance-level protection.

To make this possible, FCIs require shared storage. The primary WSFC node owns the shared resource. When it fails, another node will take ownership of the resource and then start its SQL services. The SQL binaries reside on each WSFC node, not the shared storage.

FCIs are available in both Standard Edition and Enterprise Edition of SQL Server. From a licensing perspective, FCIs are a single instance of SQL Server since only one node is actively serving clients. (Note: a qualified licensing expert should be consulted to ensure you fully understand the SQL Server licensing. It can get quite complex.)

SQL Server FCI

AG requirements and operation

AS with FCIs, SQL Server is installed on each WSFC node that is part of an AG. However, unlike FCIs, the SQL Server service runs continually on each node. One of the nodes is designated as the primary replica and all other nodes are secondary replicas. Clients connect to a Listener that directs the connection to the primary replica.

Each WSFC node has its own copy of the AG databases. No shared storage is required. When changes are made to the primary replica, those changes are propagated to the secondary replicas. The increases the total storage capacity required. A two-node AG doubles your storage requirements. A four-node AG increases your storage requirements by a factor of four.

AG can be configured for HA (more on that later). Secondary replicas in HA mode typically do not require licenses.

When the primary replica in an AG configured for HA goes down, one of the secondary replicas becomes the new primary. The Listener directs all connections to the new primary replica.

AGs provide database-level redundancy. This means that all databases that are part of the AG will failover to the new primary replica at the same time. Instance-level objects are not part of the AG. So, logins, SQL agent jobs, etc, are not automatically available on the secondary replicas.

Basic Availability Groups are available in SQL Server Standard Edition. Basic AGs are limited to a two-node system with a single database. AGs in SQL Server Enterprise Edition can have multiple databases that failover together and up to 8 replicas per AG.

SQL Server AG

Is it HA, DR, or both?

There can be some overlap in what’s considered High Availability (HA) and what’s considered Disaster Recovery (DR). Though few would argue that the terms are interchangeable, some technologies can span across both terms depending on how they are implemented.

Generally speaking, HA means the databases remain online and fielding requests despite what would otherwise be a disruptive event. So, users don’t experience noticeable downtime due to unexpected events such as a hardware failure, or during normally scheduled maintenance windows.

DR, on the other hand, is typically a response to a more catastrophic event that could affect more than the SQL Servers. Fires, Tornados, flooding, hurricanes, etc, can hit a datacenter or even an entire region. Ransomware can make its way into a network. The question is how quickly can your team respond to such events?

FCIs and HADR

FCIs are generally considered an HA technology. They are designed to keep the databases online during unexpected events. FCIs can span multiple subnets, datacenters, etc, to help with DR, but they are primarily used for HA purposes.

AGs and HADR

AGs have couple of modes that are frequently used for different purposes. If the nodes are configured for Synchronous-commit mode, each transaction is held open until the secondary replicas are updated. This keeps the replicas up to date at the possible expense of performance. This is good for HA scenarios.

With asynchronous-commit modes, the transactions complete as soon as the log file is written on the primary replica. This allows for some lag between the primary and secondary replicas. Should the primary unexpectedly go offline, some data loss can occur. Asynchronous-commit mode is often used in DR scenarios, when secondary nodes are in a different datacenter or region.

What about RTO and RPO?

No HADR discussion is complete without Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs). How do FCIs and AGs stack up on RTOs and RPOs? (See Where to Start with Disaster Recovery in SQL Server for a description of RTOs and RPOs)

FCIs and RPOs / RTOs

Recall that each WSFC node has access to a shared storage resource. Only one node at a time owns the resource, but all nodes have access to it should they become the primary node. The system and user databases are located on the shared storage resource. As a result every transaction committed on the primary node persists after a failover to a new primary. There should be no loss of data during failovers.

During a planned failover, dirty buffers are written to disk on the primary before the role is transitioned to a new primary. During an unplanned failover, committed transactions that never made it to the data file are rolled forward during the start-up recovery process. Failovers are frequently very fast, however for busy systems, it can time some time.

AGs and RPOs / RTOs

As mentioned before, AGs can be configured in synchronous-commit or asynchronous-commit modes. Synchronous-commit mode can further be divided into “planned manual failover only” or “automatic failover”.

AGs configured with Synchronous-commit with automatic failover will, as the name implies, automatically failover to a secondary replica should the primary become unavailable. This process generally doesn’t take too long. Of course, you’d want to test in your environment to know what to expect.

For secondary replicas configured for Synchronous-commit with planned manual failover only, failover requires manual intervention. Once initiated, the failover typically occurs relatively quickly.

Asynchronous-commit replicas also require manual intervention and could experience some data loss since the secondary replica may be slightly out of sync with the primary.

How difficult is it to manage FCIs and AGs?

FCIs provide instance-level protection. Once installed and configured, FCIs are essentially a single instance of SQL Server spread across a WSFC. As a result, you don’t have to worry about ensuring logins, jobs, etc, are consistent among the WSFC nodes. If you’d like to add or remove nodes, use the SQL Server Setup program.

AGs, on the other hand, require a bit more effort. Since AGs provide database-level protection, you’ll need make sure that important instance-level objects that AG databases depend on are available on the secondary replicas. For example, primary replica logins are not automatically created on the secondary replicas. You’ll need to do the same for maintenance jobs. We typically add a step to the beginning of each job to determine whether the job is on the primary replica or a secondary replica. If the job is running on the primary, continue. Otherwise, don’t do anything.

What about patching FCIs and AGs?

Both FCIs and AGs can minimize downtime due to patching. Although the exact details vary, you can apply Windows and/or SQL patches to secondary nodes/replicas, manually failover the services so that the newly patched system becomes the primary node/replica, and then apply the patches to the remaining nodes/replicas.

This approach limits downtime to the time it takes to failover.

Are secondary nodes/replicas available for production use?

It can be interesting to think about using the secondary nodes/replicas to offload work from the primary node/replica. With FCIs this is unfortunately not possible. In fact the SQL Server service isn’t even running on the secondary nodes. So, the secondary nodes aren’t available for other purposes.

With AGs, you can configure secondary replicas for read-only access. This allows actions such as backups to be performed on the secondary replica. Additionally, you can offload some resources-intensive reporting queries to the secondary replica. Note, additional licenses are required if the secondary replicas are used for production proposes.

How do FCIs and AGs compare?

So, what’s the Difference in SQL Server FCIs and AGs? The following chart summarizes the differences.

FeatureFCIsAGs
Licenses*Single license requiredSingle license required for HA mode (with Software Assurance), additional licenses required for actively used secondary replicas
EditionStandard and Enterprise EditionsBasic AG on Standard Edition, full AGs on Enterprise Edition
Storage ConfigurationShared storage, eg SAN, Storage Spaces Direct, etc.No shared storage needed, each replica has its own
Storage RequirementsNo additional store space required for user databasesEach replica stores its own copy of the database
ProtectionInstance-levelDatabase-level
Typical configurationHA, with DR capabilities if combined with other technologiesHA or DR
RPO: Potential data lossTypically no data loss during failoverTypically no data loss during failover for synchronous-commit; potential data loss for asynchronous-commit
RTO: Time to failoverPotentially noticeable for large and active systemsTypically fast and may be unnoticeable on many systems
AdministrationOnce configured, manage the instance. Use SQL Server Setup program to add/remove nodesOnce configured, additional care must be taken for instance-level objects such as logins and jobs.
Patching and maintenanceReduces downtime to failover timeReduces downtime to failover time

* See the SQL Server 2019 Licensing Guide available on the Microsoft SQL Server licensing page.

Which is better, FCIs or AGs?

Is one of these two technologies better than the other? Yes, almost certainly. But, which one? That’s really going to depend on what your objectives are, on what technologies and licenses you have in place or can afford, and on how your SQL Servers fit into the broader HADR conversation.

The important thing is to take the first step. Start having the conversations.

In High Availability and Disaster Recovery in SQL Server, I also share a couple of other basic DR options – Backups and Log Shipping.

Want to work with The SERO Group?

Want to learn more about how SERO Group helps organizations take the guesswork out of managing their SQL Servers? It’s easy and there is no obligation. 

Schedule a call with us to get started.

 

3 Responses

  1. […] High Availability. How much does it cost your organization if SQL Server is down for a hour? For a day? For a week? What can be done to prevent that from happening? Would implementing a Failover Clustered Instance help? Do you need an Availability Group? See What’s the Difference in SQL Server FCIs and AGs? […]

  2. […] Setup a Failover Cluster Instance or a High Availability Group to help keep systems available. See What’s the Difference in SQL Server FCIs and AGs? and How to Create SQL Server 2019 Failover Clustered Instances in […]

  3. […] What’s the Difference in SQL Server FCIs and AGs? – The SERO Group […]

Leave a Reply

Your email address will not be published. Required fields are marked *