Archiving vs. Backup: Understanding the Difference

Anyone who has had the misfortune to lose family photos, an MP3 collection or an entire work in progress knows how important it is to be able to retrieve data after an incident or a handling error. Recording, backup, archiving: these terms are not synonymous, yet they are often used interchangeably. Clarifying these concepts is essential to building a secure, resilient and compliant data infrastructure.

The more companies rely on data to operate and create value, the more costly it becomes to lose it, whether permanently or only temporarily. And yet the risk is omnipresent, because data is exposed to countless threats that cannot be prevented with certainty: hardware malfunctions, cyber attacks, deliberate or accidental actions by users, and disasters such as fire or flooding. To protect against data loss caused by such events, there are various approaches, from recording data as it is created to very long-term archiving. These approaches are distinguished above all by where they sit in the life cycle of the data.

What is the data life cycle?

The life cycle of data extends from its creation, for a particular purpose, to its destruction, once it is no longer useful. Over this period, the data is recorded, processed and archived under conditions of accessibility and security that depend on its nature, its value for the company and its various uses.

At the beginning of the cycle, the data is said to be active, or hot, and it becomes increasingly cold as its usefulness decreases. Its retention period depends on its usefulness, the legal and regulatory requirements that apply to it and, in the case of personal data, the consent of the data subject. As a result, some data stays hot for a very long time (the plans of a building) while other data goes cold almost immediately (a queue number).

Backup: preserving hot data

Backup concerns active data, i.e. data that users need to have immediate access to in order to perform their work. The objective is to be able to quickly restore data in case of loss in order to ensure business continuity.

To do this, a copy is automatically saved on a medium that is as isolated as possible from the original source: removable hard disk, remote data center, cloud, etc. This secondary medium must itself be sufficiently durable, secure and controlled to guarantee that the copy will be available when needed. The backup can be complete, also called full (all the data is saved every time), incremental (only the data that has changed since the previous backup, of any type, is saved) or differential (only the data that has changed since the last complete backup is saved).
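The difference between the three backup types comes down to the reference point used to decide what has changed. The following Python sketch illustrates this under simplifying assumptions: files are described only by a path and a last-modification timestamp, and the names FileInfo and select_files are hypothetical, not part of any real backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    modified_at: float  # last-modification time, as a simple timestamp


def select_files(files, mode, last_full_at, last_backup_at):
    """Return the paths that a backup of the given mode would copy.

    last_full_at:   time of the last complete (full) backup
    last_backup_at: time of the most recent backup of any type
    """
    if mode == "full":
        # A complete backup copies everything, changed or not.
        return [f.path for f in files]
    if mode == "incremental":
        # Reference point: the previous backup, whatever its type.
        cutoff = last_backup_at
    elif mode == "differential":
        # Reference point: the last complete backup.
        cutoff = last_full_at
    else:
        raise ValueError(f"unknown backup mode: {mode}")
    return [f.path for f in files if f.modified_at > cutoff]


# Illustrative data: full backup at t=20, incremental backup at t=30.
files = [
    FileInfo("plans.dwg", 10.0),
    FileInfo("report.docx", 25.0),
    FileInfo("notes.txt", 35.0),
]
print(select_files(files, "incremental", last_full_at=20.0, last_backup_at=30.0))
print(select_files(files, "differential", last_full_at=20.0, last_backup_at=30.0))
```

In this example the incremental run copies only notes.txt, while the differential run copies both report.docx and notes.txt. The trade-off is visible: incremental backups are smaller, but a restore needs the last complete backup plus every increment since, whereas a differential restore needs only the last complete backup and the latest differential.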

The type, frequency and scope of backups are chosen according to the importance of the data, but also its volume and the speed with which it must be restored. Backup plans are generally associated with business continuity and disaster recovery plans (BCP/DRP), which define the resources and procedures to be implemented in the event of an incident.

Archiving: preserving cold data

It is often the case that certain data is no longer of immediate use to the company, but must still be retained in case of future investigations or audits. This is known as intermediate storage. In France, for example, companies are required to keep their employment-related documents, such as pay slips, for five years, their tax documents for six years, and certain accounting or commercial documents that could be used in a lawsuit or insurance dispute for up to ten years. Specific regulations also apply in certain sectors (banking, gambling, etc.) and for certain categories of companies, such as operators of vital importance (OIV) and operators of essential services (OSE). If the data is not destroyed at the end of this period of potential usefulness, archiving becomes permanent and serves a purely historical purpose.
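Retention rules of this kind lend themselves to a simple lookup table. The sketch below uses the three French retention periods cited above; the category names, the function names and the assumption that the period starts on the document's creation date are all illustrative, since the actual starting point depends on the applicable regulation.

```python
from datetime import date

# Retention periods in years, taken from the examples in the text.
# The category keys are hypothetical labels chosen for this sketch.
RETENTION_YEARS = {
    "pay_slip": 5,
    "tax": 6,
    "accounting": 10,
}


def retention_end(category, created_on):
    """Return the date on which the retention period ends.

    Assumes the period starts on the creation date, which is a
    simplification: the legal starting point varies by document type.
    """
    years = RETENTION_YEARS[category]
    try:
        return created_on.replace(year=created_on.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return created_on.replace(year=created_on.year + years, day=28)


def retention_elapsed(category, created_on, today):
    """True once the document may be destroyed or moved to permanent archiving."""
    return today >= retention_end(category, created_on)


print(retention_end("pay_slip", date(2020, 3, 15)))
print(retention_elapsed("accounting", date(2020, 3, 15), date(2026, 1, 1)))
```

A document whose retention period has elapsed is then a candidate either for destruction or for permanent archiving, a decision that remains a matter of policy rather than code.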

Implementing an archiving solution involves ensuring that data is stored in a neutral format, that the security, integrity and authenticity of the data are maintained over time, and that access rights and procedures are defined.
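One common way to check integrity over time is to record a cryptographic fingerprint of each document when it enters the archive and to recompute it periodically. The following is a minimal sketch using SHA-256 from the Python standard library; the function names and the in-memory dictionaries are assumptions made for illustration, and a real archive would read files from storage and protect the manifest itself.

```python
import hashlib


def fingerprint(content):
    """Return the SHA-256 digest of a document's bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(documents):
    """Record a fingerprint for each archived document (name -> digest)."""
    return {name: fingerprint(content) for name, content in documents.items()}


def find_altered(documents, manifest):
    """Return the names of documents that are missing or no longer match."""
    return sorted(
        name
        for name, digest in manifest.items()
        if name not in documents or fingerprint(documents[name]) != digest
    )


# At archiving time: store the manifest alongside (or apart from) the data.
archive = {"contract.pdf": b"signed contract", "invoice.pdf": b"invoice 2021"}
manifest = build_manifest(archive)

# Later: simulate silent corruption of one document, then verify.
archive["invoice.pdf"] = b"invoice 2O21"
print(find_altered(archive, manifest))
```

A fingerprint of this kind detects alteration but does not by itself prove who produced the document; authenticity generally calls for additional measures such as digital signatures or trusted timestamps.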

Due to the substantial differences between archiving and backup requirements, it is not advisable to force a backup solution to cover archival needs (e.g. by requiring backup tapes to be retained for 10 years as a substitute for a proper archiving solution).

Common obligations

Whether it is duplicate data (backup) or original data (archiving), data at rest is subject to the same regulatory requirements, in particular the GDPR for personal data. Note that these obligations also extend to the technical data held in backup and archiving systems, which may itself contain personal data.

To reconcile the usability of data with respect for confidentiality and the rights of individuals, several approaches are possible:

Anonymization: all elements that could identify individuals (including by cross-checking) are eliminated from the dataset, either by deleting them, by randomly switching them around, or by making them insufficiently precise;

Pseudonymization: explicit identifiers are replaced by codes in the dataset. Unlike anonymization, pseudonymization is a reversible operation, and therefore less protective;

Encryption: the entire dataset is encrypted and unintelligible to anyone without the appropriate rights.
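To make the reversibility of pseudonymization concrete, the sketch below replaces an identifier field with a random code and keeps the correspondence in a separate lookup table. The function and field names are hypothetical, and the approach shown (random codes plus a table) is only one possible technique; the protection it offers depends entirely on storing the lookup table apart from the dataset and restricting access to it.

```python
import secrets


def pseudonymize(records, id_field):
    """Replace the identifier in each record with a random code.

    Returns the pseudonymized records and the lookup table
    (code -> original identifier) needed to reverse the operation.
    """
    code_for = {}  # original identifier -> code, so one person keeps one code
    lookup = {}    # code -> original identifier, to be stored separately
    output = []
    for record in records:
        original = record[id_field]
        if original not in code_for:
            code = secrets.token_hex(8)
            while code in lookup:  # extremely unlikely collision, handled anyway
                code = secrets.token_hex(8)
            code_for[original] = code
            lookup[code] = original
        output.append({**record, id_field: code_for[original]})
    return output, lookup


def reidentify(records, lookup, id_field):
    """Reverse the pseudonymization for holders of the lookup table."""
    return [{**r, id_field: lookup[r[id_field]]} for r in records]


patients = [
    {"name": "Alice Martin", "visit": "2023-01-10"},
    {"name": "Bob Durand", "visit": "2023-01-11"},
    {"name": "Alice Martin", "visit": "2023-02-02"},
]
coded, table = pseudonymize(patients, "name")
print(coded)
print(reidentify(coded, table, "name") == patients)
```

Anyone holding the table can restore the original identifiers, which is precisely why pseudonymized data is still treated as personal data. Destroying the table does not by itself make the data anonymous, since the remaining fields may still allow re-identification by cross-checking.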

In addition to these techniques, the effectiveness of which must be regularly monitored, mechanisms and procedures must also be implemented to allow individuals to exercise their rights to access and rectify their personal data, as well as their right to erasure (the "right to be forgotten").