Big Archive Blog

End-to-End Data Integrity with User-Supplied Checksums

Versity Storage Manager (VSM) is an archiving platform intended for long-term data storage and preservation. Files are often stored in VSM to protect against data loss resulting from failures within tier-one enterprise storage systems. Data integrity is therefore an essential element of a well-constructed enterprise archival storage system.

VSM has always supported internal file checksums, and recently added support for additional checksum algorithms. See our blog post on the additional checksum support here: https://www.versity.com/blog/data-integrity-checksums.

Having a system automatically generate and verify checksums is very powerful. However, there is a window of vulnerability when relying upon VSM or any other system to generate its own checksums. To understand the window of vulnerability, consider the following workflow:

  1. Data copied to archive filesystem cache
  2. Data archived to archival media and checksum generated
  3. Data released from filesystem cache
  4. Data staged from archival media verifying the checksum
  5. Data retrieved from archive file system cache to primary storage

Since the checksum is calculated by VSM in step 2, there is no checksum protecting the data interactions in step 1 and step 5. These areas of vulnerability include the data source, the network transmission, and the VSM disk cache storage hardware. Any one of these elements could corrupt the data without the checksum detecting the corruption. To protect against these windows of vulnerability, Versity has now added the ability for the user or the application to supply the file checksum to VSM during step 1.
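The window can be illustrated with a small simulation. This is only a sketch, not VSM code: SHA-256 stands in for whichever algorithm is configured, and the byte strings stand in for a file as it moves through steps 1 to 4. It shows that a checksum generated after the copy faithfully protects data that is already corrupt, while a checksum computed at the source catches the problem.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption chosen for illustration only.
    return hashlib.sha256(data).hexdigest()


# The file as it exists on primary storage.
original = b"archive me" * 1000

# Step 1: simulate a single flipped bit while copying into the cache.
corrupted = bytearray(original)
corrupted[42] ^= 0x01
cached = bytes(corrupted)

# Step 2: the archive generates its checksum from what it received.
archive_sum = checksum(cached)

# Step 4: staging verifies against the archive-generated checksum.
staged = cached
archive_check_passes = checksum(staged) == archive_sum

# A checksum computed at the source, before step 1, exposes the corruption.
user_sum = checksum(original)
user_check_passes = checksum(staged) == user_sum

print(archive_check_passes)  # True: the corruption goes unnoticed
print(user_check_passes)     # False: the corruption is detected
```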

The user or application computes the checksum before the file is copied into the VSM filesystem cache. After the file is copied into VSM, the user or application supplies the algorithm and the generated checksum with the ssum command or the sam_ssum API. VSM immediately verifies the checksum and returns a completion status to the ssum call. If the user includes the “u” option in the ssum call, the checksum is also verified each time the file is staged from the archival media. When the user retrieves the file from VSM back to primary storage, the user can validate the checksum again, resulting in true end-to-end data integrity. The checksum of the file on VSM can be displayed with the sls -E command.
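The client side of this workflow might look like the following sketch. The helper names (file_checksum, copy_and_record, verify) are hypothetical, SHA-256 is an assumed algorithm, and the temporary directory merely stands in for primary storage and the archive cache. The call that hands the checksum to VSM (ssum or sam_ssum) is deliberately left out, since its exact syntax is not reproduced here.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_record(source, destination, algorithm="sha256"):
    """Checksum the source first, then copy it toward the archive."""
    expected = file_checksum(source, algorithm)
    shutil.copyfile(source, destination)
    # At this point the checksum would be supplied to the archive.
    return expected


def verify(path, expected, algorithm="sha256"):
    """Return True if the file at path still matches the recorded digest."""
    return file_checksum(path, algorithm) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "source.dat"
        archived = Path(workdir) / "archived.dat"   # stand-in for the archive cache
        restored = Path(workdir) / "restored.dat"   # stand-in for primary storage

        source.write_bytes(b"valuable data" * 4096)
        expected = copy_and_record(source, archived)

        # Later: retrieve the file and validate it against the original digest.
        shutil.copyfile(archived, restored)
        print("restored file intact:", verify(restored, expected))
```

Computing the digest before the copy is the important design choice: the value then describes the data as it existed at the source, so any corruption introduced afterward shows up as a mismatch.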

Example CLI Use:

Example API Use:

The sls -E command will display the checksum value for a file if it exists:

True and complete end-to-end data integrity validation is now possible with VSM. We feel that this level of data protection is an important feature for our customers who rely upon archival storage for the long-term preservation of their most valuable data assets.
