Cloud storage can save users a lot of money compared with local storage solutions. Unfortunately, users relinquish a considerable degree of control over their data once it has been offloaded to the cloud, and generally have only a Service Level Agreement (SLA) as assurance that their files are being stored at all. To address this problem, RSA Labs has developed Proofs of Retrievability (PORs). A POR allows a user to query the storage provider and obtain proof that the file is being correctly stored.
A naïve approach would be to simply download the file at regular intervals. This solution is suboptimal, as cloud storage providers generally charge for bandwidth. PORs are designed for this constrained environment. Instead of checking the entire file as in the naïve approach, a POR samples the file. Because the provider does not know ahead of time which pieces of the file will be asked for, it must store the file in its entirety in order to respond correctly to any and all challenges.
The limitation of a sampling approach is that it cannot reliably detect small errors. Unfortunately, even a single-bit corruption can render a file useless.
To protect against this, PORs append error-correcting information to a file before it is uploaded to the cloud. This redundant information makes the file resilient to small errors, and errors too large to be corrected by the redundant information are easily detected by sampling.
As an example, adding 10% redundant information to a file makes it resilient to 5% corruption (roughly, every two units of redundancy can correct one corrupted unit). Said another way, more than 5% of the file must be corrupted before the original contents can no longer be retrieved. Once more than 5% of the file is corrupted, each random sample lands on a corrupted piece with probability greater than 1/20, so one would expect to detect the damage within about 20 samples. The addition of redundant information amplifies the effectiveness of sampling, transforming it from a cost linear in the size of the file to a constant cost.
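A quick calculation makes this amplification concrete. The sketch below assumes each sample is drawn independently and uniformly at random (with replacement), which simplifies how a real POR chooses its samples; the function names are illustrative only. Keep in mind that 20 samples is the expected wait for the first hit, and detecting the damage with high confidence takes more.

```python
import math

def detection_probability(corrupted_fraction: float, samples: int) -> float:
    """Chance that at least one of `samples` independent, uniformly random
    samples lands on a corrupted block (sampling with replacement)."""
    return 1.0 - (1.0 - corrupted_fraction) ** samples

def samples_needed(corrupted_fraction: float, target: float) -> int:
    """Smallest sample count whose detection probability reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - corrupted_fraction))

# With 10% redundancy the file survives up to 5% corruption, so an attacker
# must corrupt more than 5% of it. The expected wait for the first hit is 1/0.05.
print("expected samples until first hit:", round(1 / 0.05))
print("P(detect) with 20 samples:", round(detection_probability(0.05, 20), 3))
print("samples for 99% confidence:", samples_needed(0.05, 0.99))

# Without redundancy, one bad block in a million can already ruin the file,
# and 20 samples almost never land on it.
print("P(detect) of 1 bad block in 1e6:", detection_probability(1e-6, 20))
```

Under these assumptions, about 64% of 20-sample challenges catch 5% corruption, and roughly 90 samples push the detection probability past 99%, independent of the file's size.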
Sampling is extremely bandwidth efficient compared with downloading the entire file, and it can be improved further through aggregation. Sample-requesting messages (called challenges) don't need to identify each sample explicitly. Instead, they can provide a seed to a pseudorandom function, from which the provider generates the sample locations locally. Once those samples are collected, they can be combined before being sent back in the response. This combination is done through a keyed aggregation function, with the key provided by the challenger as part of his challenge.
As a result, the challenge/response protocol only requires a few hundred bytes of bandwidth, yet can provide the challenger with strong assurances that his file is being correctly stored.
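The following sketch illustrates a seeded challenge with a keyed, aggregated response. It swaps in a construction not described above: a private-key homomorphic-tag scheme in the style of Shacham and Waters' compact PORs, where each block m_i carries a tag alpha*m_i + f(i) mod p, so an aggregate of sampled blocks can be checked against an aggregate of their tags. HMAC-SHA256 stands in for the pseudorandom function, and the modulus, block size, and every function name are hypothetical choices for illustration, not the POR construction itself.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1   # hypothetical field modulus (a Mersenne prime)
BLOCK_SIZE = 15  # hypothetical block size in bytes; 120-bit blocks fit below P

def prf(key: bytes, label: bytes, i: int) -> int:
    """HMAC-SHA256 used as a pseudorandom function, reduced into the field."""
    mac = hmac.new(key, label + i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % P

def split_blocks(data: bytes) -> list:
    return [int.from_bytes(data[i:i + BLOCK_SIZE], "big")
            for i in range(0, len(data), BLOCK_SIZE)]

def make_tags(blocks, alpha, tag_key):
    """Client, before upload: one tag per block under secret (alpha, tag_key)."""
    return [(alpha * m + prf(tag_key, b"tag", i)) % P for i, m in enumerate(blocks)]

def sample_indices(seed, n_blocks, n_samples):
    """Both sides expand the seed into sample positions (slight modulo bias is ignored)."""
    return [prf(seed, b"idx", j) % n_blocks for j in range(n_samples)]

def respond(blocks, tags, seed, agg_key, n_samples):
    """Provider: fold the sampled blocks and tags into two field elements."""
    mu = sigma = 0
    for j, i in enumerate(sample_indices(seed, len(blocks), n_samples)):
        nu = prf(agg_key, b"coef", j)  # aggregation coefficient from the challenger's key
        mu = (mu + nu * blocks[i]) % P
        sigma = (sigma + nu * tags[i]) % P
    return mu, sigma

def verify(mu, sigma, alpha, tag_key, seed, agg_key, n_blocks, n_samples):
    """Verifier: sigma must equal alpha*mu + sum(nu_j * f(i_j)) without seeing any block."""
    expected = alpha * mu
    for j, i in enumerate(sample_indices(seed, n_blocks, n_samples)):
        expected += prf(agg_key, b"coef", j) * prf(tag_key, b"tag", i)
    return sigma == expected % P

# Usage: tag a random file, then run one 20-sample challenge.
data = secrets.token_bytes(100_000)
blocks = split_blocks(data)
alpha, tag_key = secrets.randbelow(P), secrets.token_bytes(32)
tags = make_tags(blocks, alpha, tag_key)

seed, agg_key = secrets.token_bytes(16), secrets.token_bytes(16)  # the challenge
mu, sigma = respond(blocks, tags, seed, agg_key, 20)              # the response
print(verify(mu, sigma, alpha, tag_key, seed, agg_key, len(blocks), 20))
```

In this sketch the challenge is two 16-byte values and the response is two field elements of at most 16 bytes each, regardless of file size or sample count, which is in line with the few-hundred-byte exchange described above.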
PORs provide an effective mechanism for forcing a cloud storage provider to prove correct behavior. However, they offer no remedy if a provider does not live up to his SLA and the file becomes unrecoverable. In fact, in a single-provider model, there is no way to prevent the provider from deleting a file. Protecting against provider failure requires additional measures. HAIL applies the techniques of a POR in a multi-provider setting, enabling file recovery even when a provider fails.
HAIL does this by dispersing the file, along with additional redundant information, across multiple cloud storage providers. Using a POR-like mechanism, HAIL checks that each provider is storing his piece of the file correctly; in the event of an error, it uses the redundant pieces stored by the other providers to reconstruct the information on the failed provider. In this way, HAIL can not only identify faulty providers but also recover from their failure without any loss of information.
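The dispersal-and-recovery idea can be sketched with the simplest possible code: split the file into equal shards, add one XOR parity shard, and place each shard with a different provider. This is a deliberate simplification, assumed only for illustration; HAIL uses a stronger dispersal code that tolerates more than one failure, and the per-provider POR-like check that flags the bad provider is omitted here. All function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def disperse(data: bytes, n_data: int) -> List[bytes]:
    """Split data into n_data equal, zero-padded shards plus one XOR parity shard."""
    shard_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(shard_len * n_data, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(n_data)]
    return shards + [reduce(xor_bytes, shards)]

def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single lost shard (None): all shards XOR to zero, so it is
    the XOR of the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) != 1:
        raise ValueError("this single-parity sketch can rebuild exactly one shard")
    repaired = list(shards)
    repaired[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return repaired

def reassemble(shards: List[bytes], n_data: int, original_len: int) -> bytes:
    return b"".join(shards[:n_data])[:original_len]

# Usage: five shards for five hypothetical providers; provider 2 fails its check.
data = b"HAIL spreads a file across several cloud providers."
stored: List[Optional[bytes]] = disperse(data, n_data=4)
stored[2] = None
print(reassemble(recover(stored), 4, len(data)) == data)
```

With a single parity shard, any one provider can fail without loss; tolerating more simultaneous failures requires a code with more redundancy, which is the trade-off HAIL's dispersal layer makes.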
HAIL does for cloud storage what RAID did for hard drives. By combining cheap, somewhat reliable components in an intelligent way, HAIL creates a system with strong assurances and high reliability. HAIL enables users to take advantage of the cost savings of cloud storage, without forcing them to pay the costs associated with weakened control over their data.
- Ari Juels and Burt Kaliski. PORs: Proofs of Retrievability for Large Files. In Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS '07), pages 584–597, 2007.
- Kevin D. Bowers, Ari Juels, and Alina Oprea. HAIL: A high-availability and integrity layer for cloud storage. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09), 2009.
- Kevin D. Bowers, Ari Juels, and Alina Oprea. Proofs of retrievability: Theory and implementation. In Proceedings of the ACM Cloud Computing Security Workshop (CCSW '09), 2009.