Archiving and Deletion Strategy…KonMari for Data Management?

Archiving and Deletion Strategy…KonMari for Data Management?

archiving data in sql server

Welcome to the third and final post in our series on Data Lifecycle Management (DLM), where we will talk about archiving and purging company data.

In the last post, we talked about applying the Kaizen approach to data management to achieve a culture of continuous improvement on our data teams.

In this post, we will use the KonMari method of simplification, recently made famous by Marie Kondo, as a lens for considering what to keep and what to purge in our business data repositories.

As always, we will also give some best practices for how to establish policies around archiving and purging your company data.

Data Lifecycle Management: Archiving and Deletion

As mentioned in previous posts, the final phases of Data Lifecycle Management are Archiving and Deletion.

These phases help to ensure that we keep and maintain only that data which is required for our business. But how do we determine what to keep active, what to archive, and what to purge?

We can apply some concepts from the KonMari simplification method to our data strategy here to help us decide.

Rule 1: Make the Commitment

Kondo states that the first step in KonMari is committing to achieving your goal. This may seem like an obvious first step for any endeavor, but many companies fail to establish a data retention strategy.

It is not uncommon for businesses to take the approach that keeping all data (sometimes even in a “hot”, or readily accessible, repository) is the way to go. This tactic usually stems from either:

  1. an explicit belief that you cannot go wrong with keeping too much historical data
    or
  2. from having no capacity to prioritize a retention strategy.

Either way, the “keep everything” strategy is misguided for 3 reasons.

First, the more data you keep, the more time it will take to recover in the event of a crisis.

Crisis can take the form of:

  • a lawsuit or an audit that requires retrieval of specific information

or

  • a natural disaster, human error, criminal activity or another event that demands restoration of data to a particular point in time.

In any case, restoration time is critical during these events. The more time it takes to retrieve the required data from your archive, the longer it will take for the business to recover.

Second, while data retention is a necessity, it is also a liability and entails responsibility.

Businesses must take the responsibility to respect consumer privacy rights very seriously. Part of this responsibility entails keeping consumer data for no longer than is required or for any purpose other than that for which the consumer gave consent. Even if your company does not fall under the regulatory jurisdiction of privacy laws like GDPR, CCPA, or PIPEDA, the business is nevertheless liable for securely and responsibly maintaining its consumer data.

With any data that is retained comes the possibility that it could be stolen, leaked, or misused. This risk is unavoidable, but preserving unnecessary archives of historical data is a liability that ought to be avoided.

Third, “Data stores don’t grow on trees…”

A well-crafted data strategy can reduce the financial cost of maintaining your data repositories, but increasingly large data stores cost the business money nevertheless.

There are also performance, time, system resource, and opportunity costs associated with maintaining large data stores.

So, in short – make the commitment to tidy up your unwieldy data repositories!

Rule 2: Imagine Your Ideal

Compliance regulations do some of the work of envisioning the ideal for us in the data world. Still, take the time to consider the ideal composition of your data repositories. Doing this can help you to think strategically about what should be kept, how and where to keep it, and for how long.

As mentioned in the previous post in this series, consider both regulatory requirements and the needs of the business for reporting, analytics, and strategic planning when determining what to keep. Consult business leaders and business analysts about what data is needed and for how long. You can even create a formal data contract for critical data elements in your business.

Rule 3: Finish Discarding First

Obviously, deleting data should always be done with extreme caution and forethought.

Nevertheless, once you have performed an audit of your data repositories and have determined your retention strategy, you should begin implementation by purging unnecessary data. We will discuss more about how to perform this action safely and according to best practices below.  

Rules 4 and 5: Progress by Category and in the Right Order

For the purposes of purging and archiving data, we should be thinking in a criticality/age matrix like the one below – beginning in the upper left corner and working down and to the right.

You should make incremental passes across departments in these stages, beginning with the oldest and least important data in each departmental area.

Rule 6: “Does it Spark Joy?”

Ok, ok – I admit this iconic final question from Kondo is much less suitable to data retention. However, there still may be an important (if less-than-perfectly-measurable) question that should be asked by data teams before coding a delete.

Does a business leader strongly prefer to retain certain data despite the lack of any clear regulatory or business-driven reason for doing so?

If so, keep it…unless doing so presents a serious risk or concern. If you feel there is a serious risk, continue to voice your concerns. Otherwise, just wait and continue to get clarification.

Now for some best practices…

Keep in mind that the practices listed here do not include the critical practice of backing up your production data, which has been discussed in previous posts (here, here, and here). Always make sure that backups are in place before beginning to archive or delete data.

1.     Replicate and archive data “in-flight”.

Archiving and/or replicating your data at various points in your pipeline is a best practice. Process failures in data pipelines are not uncommon, and you need to be able to recover data from earlier stages of the pipeline if you need to reprocess the data.

3 common examples of this practice, which should include purging data after an established retention period, are:

  • Moving imported files to an archive folder
  • Replicating transactional databases to a staging database before further processing by downstream systems
  • Staging imported API data in its own table or database before integrating with internal systems.

2.     Archive in cold storage and protect the archive.

Consider how you will store data that has served its immediate purpose and has been determined to be a candidate for long-term storage. There are pros and cons to each method, and a combination of archiving methods may be appropriate for your different data sets. Here are some options and considerations.

  • Onsite physical storage
    • Pros: ease of access in an emergency, familiar technology
    • Cons: vulnerable to tampering, theft, and physical damage/natural disasters
  • Offsite storage (tape, optical disk, magnetic hard drives)
    • Pros: well-established, relatively inexpensive, usually more secure than onsite
    • Cons: slower recovery times
  • Cold cloud storage
    • Pros: speed and ease of recovery, alleviated burden of maintenance, built-in security, inexpensive
    • Cons: potentially less familiar to established IT departments than traditional methods
  • Data lake
    • Pros: accessibility, speed and ease of recovery, suitable for use with emerging technologies, built-in security
    • Cons: less-established, steeper learning curve, potentially expensive, requires careful governance of user access

3.     Don’t forget critical data that is managed by third parties.

Over time, businesses may find that a significant body of company data resides in data stores managed by third parties. Since maintenance has often been delegated to vendors in these cases, archiving this data is sometimes overlooked.

While there may be an ability to set a retention policy, it can be very beneficial for many reasons to work with your vendors to set up an extraction process to archive a copy of your data from these systems into your own repositories as well.

4.     Establish retention schedules and procedures for purging data.

Business, regulatory, and legislative needs dictate what should be saved and for how long, and these may differ between data sets. Establishment of policies and procedures for deleting data will ultimately be the responsibility of data owners.

These policies should address the following areas:

  • Who is authorized to purge data?
  • In what maintenance windows can this process occur since deletion can go slowly and data processing jobs/replication must be disabled?
  • How will notification be given to the business?
  • What validation and integrity checks must be in place?
  • What rollback procedures will be used if necessary?

Just the tip of the iceberg…

There is much, much more to say about all these topics.

If you have made it as far as committing to cleaning up your data but the rest seems overwhelming, never fear! There are many vendors that are happy to help with all levels of assistance.

If you have a good handle on your archiving and deletion processes but would like assistance with a SQL Server implementation of them, reach out! We are here to help.

 

Leave a Reply

Your email address will not be published. Required fields are marked *