Azure Blob Storage and storage hierarchy


After seeing how a number of other applications deal with Azure Blob storage, I’ve discovered a couple of common practices for creating a storage hierarchy, along with some performance and security considerations. I haven’t found many insights on this topic, so I decided to write something down.

Azure Blob Storage: A bit about containers

As you probably know, a storage container to some degree resembles a base directory off the root of a hard drive. However, storage containers can’t be nested, nor can they contain folders. A storage hierarchy therefore has to be mimicked, which can be done in various ways.

A storage hierarchy is most often accomplished by adding a prefix to the name of the blob itself. Let me show an example to make it all a bit more tangible.

Let’s say you have a storage container for storing raw footage.

Object structure: Storage account » Container » Blob name
Example: sa » [blob.core.windows.net] » raw » picture1.rif

NOTE: There is one exception to this rule. Azure Blob storage supports using a root container which serves as a default container for a storage account. When using a root container, a blob stored in the root container may be addressed without referencing the root container name, so that a blob can be addressed at the top level of the storage account hierarchy.
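
To make the object structure above concrete, here is a minimal sketch of how the three levels map onto a blob URL. The function name blob_url is hypothetical, and the sketch assumes the public endpoint blob.core.windows.net and that the root container is named $root, as I understand it to be.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Compose the HTTPS URL of a blob from account, container and blob name."""
    host = f"{account}.blob.core.windows.net"
    # Assumption: the root container is called "$root". A blob stored there
    # may be addressed without the container name in the path.
    if container == "$root":
        return f"https://{host}/{quote(blob_name)}"
    # quote() leaves "/" untouched, so prefixed blob names keep their shape.
    return f"https://{host}/{quote(container)}/{quote(blob_name)}"


print(blob_url("sa", "raw", "picture1.rif"))
print(blob_url("sa", "$root", "picture1.rif"))
```

The first call addresses the blob through its container; the second shows the root container exception, where the blob appears at the top level of the storage account.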

This all looks fine, but now you also want to store uncompressed audio files and videos. Many applications add a prefix to the name of the blob to build a storage structure, as demonstrated below.

sa » [blob.core.windows.net] » raw » pictures/picture1.rif
sa » [blob.core.windows.net] » raw » audio/recording1.pcm

In this example “pictures/picture1.rif” and “audio/recording1.pcm” are the names of the blob entries. There isn’t really a problem from a performance perspective, given that blobs aren’t partitioned based on containers. However, doing so will require some additional logic when using filters or iterating over the entire collection.
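
The additional logic mentioned above might look something like the following sketch. It works on plain lists of blob names rather than on a real storage client, and the helper names split_prefix and group_by_prefix are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_prefix(blob_name: str, delimiter: str = "/") -> Tuple[str, str]:
    """Split a blob name into (prefix, leaf) at the last delimiter."""
    prefix, _, leaf = blob_name.rpartition(delimiter)
    return prefix, leaf


def group_by_prefix(blob_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group leaf names by the virtual folder encoded in each blob name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in blob_names:
        prefix, leaf = split_prefix(name)
        groups[prefix].append(leaf)
    return dict(groups)


names = ["pictures/picture1.rif", "audio/recording1.pcm"]
print(group_by_prefix(names))
```

Every consumer that iterates over the container has to repeat this kind of parsing or filtering, which is the overhead the container-per-category approach tries to avoid.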

I personally don’t want to deal with parsing logic on a blob level, and therefore include the hierarchy as part of the container name instead. But there is one restriction: container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character (http://msdn.microsoft.com/en-us/library/azure/dd135715.aspx). I will therefore be using the dash as a delimiter. This results in the following:
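
As a rough illustration of the dash-delimited convention, the sketch below joins hierarchy segments into a container name and checks it against the naming rules. The function container_name is hypothetical. Besides the rules quoted above, it also assumes that names must be lowercase and between 3 and 63 characters long, which is my reading of the linked naming documentation rather than something stated in this post.

```python
import re

# Each hierarchy segment is limited to lowercase letters and digits so that
# the dash can only ever appear as the delimiter between segments.
_SEGMENT = re.compile(r"[a-z0-9]+")


def container_name(*segments: str) -> str:
    """Join hierarchy segments with dashes into a container name."""
    parts = [segment.lower() for segment in segments]
    for part in parts:
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid segment: {part!r}")
    name = "-".join(parts)
    # Assumed length limits (3 to 63 characters); not taken from this post.
    if not 3 <= len(name) <= 63:
        raise ValueError(f"container name has invalid length: {name!r}")
    return name


print(container_name("raw", "pictures"))
print(container_name("raw", "audio"))
```

Keeping dashes out of the individual segments is a design choice of this sketch: it means a container name can always be split back into its hierarchy levels without ambiguity.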

sa » [blob.core.windows.net] » raw-pictures » picture1.rif
sa » [blob.core.windows.net] » raw-audio » recording1.pcm

In this case, I’m using two separate storage containers. This simplifies the use of filters when retrieving blobs, and it lets me iterate through all blobs in a container without needing a content type check. It also helps when, for example, you want to host a container in a different storage account for performance reasons. Another benefit is that you can specify different security settings for each part of your hierarchy.

Conclusion

If you don’t care about having dashes within the blob endpoints and CDN assets, using the dash as a delimiter is an excellent way of keeping ugly naming logic out of your Blob names.
