After seeing how many applications deal with Azure Blob storage, I've noticed a couple of common practices for creating a storage hierarchy, along with some performance and security considerations. I haven't found much insight on this topic, so I decided to write something down.
Azure Blob Storage: A bit about containers
As you probably know, a storage container to some degree resembles the root directory of a hard drive. However, storage containers can't be nested, nor can they contain folders. A storage hierarchy therefore has to be mimicked, which can be done in various ways.
A storage hierarchy is most often created by adding a prefix to the name of the blob itself. Let me show an example to make this a bit more tangible.
Let’s say you have a storage container for storing raw footage.
Object structure: Storage account » Container » Blob name
Example: sa.blob.core.windows.net » raw » picture1.rif
NOTE: There is one exception to this rule. Azure Blob storage supports a root container, named $root, which serves as a default container for a storage account. A blob stored in the root container may be addressed without referencing the container name, so that it can be addressed at the top level of the storage account hierarchy.
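To make the addressing concrete, here is a minimal Python sketch that composes a blob URL from the three levels above. It assumes the default public endpoint (https://<account>.blob.core.windows.net) and drops the container segment when the container is $root; the function name blob_url is hypothetical, and the code only builds strings, it does not talk to Azure.

```python
from urllib.parse import quote

ROOT_CONTAINER = "$root"  # reserved name of the optional root container


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Compose a blob URL for the default public endpoint (illustrative only).

    Blobs in the root container may be addressed without the container
    name, so the container segment is omitted for "$root".
    """
    base = f"https://{account}.blob.core.windows.net"
    # Keep "/" unescaped so prefixed names such as "pictures/picture1.rif"
    # still read as a virtual path in the URL.
    path = quote(blob_name, safe="/")
    if container == ROOT_CONTAINER:
        return f"{base}/{path}"
    return f"{base}/{container}/{path}"


print(blob_url("sa", "raw", "picture1.rif"))    # https://sa.blob.core.windows.net/raw/picture1.rif
print(blob_url("sa", "$root", "picture1.rif"))  # https://sa.blob.core.windows.net/picture1.rif
```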
This all looks fine, but now you want to store uncompressed audio files and videos as well. Many applications add a prefix to the blob name to build a storage structure, as demonstrated here:
sa.blob.core.windows.net » raw » pictures/picture1.rif
sa.blob.core.windows.net » raw » audio/recording1.pcm
In this example, “pictures/picture1.rif” and “audio/recording1.pcm” are the names of the blobs. This isn't a problem from a performance perspective, given that Blob storage partitions on the full blob name (account, container, and blob name combined) rather than on containers alone. However, it does require some additional logic when filtering or iterating over the entire collection.
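The extra logic is small, but it has to live somewhere. Below is a minimal Python sketch of it that works on an in-memory list of blob names, standing in for a listing of the raw container; the helper names are hypothetical and no Azure SDK is involved. It filters on a virtual folder and groups names by their first path segment.

```python
from collections import defaultdict


def blobs_in_folder(blob_names, folder):
    """Return the blob names stored under a virtual folder such as "pictures"."""
    prefix = folder.rstrip("/") + "/"
    return [name for name in blob_names if name.startswith(prefix)]


def group_by_folder(blob_names):
    """Group flat blob names by their first path segment ("" if unprefixed)."""
    groups = defaultdict(list)
    for name in blob_names:
        folder, sep, _ = name.partition("/")
        groups[folder if sep else ""].append(name)
    return dict(groups)


raw = ["pictures/picture1.rif", "audio/recording1.pcm", "pictures/picture2.rif"]
print(blobs_in_folder(raw, "pictures"))  # ['pictures/picture1.rif', 'pictures/picture2.rif']
print(group_by_folder(raw))
```

The List Blobs REST operation also accepts prefix and delimiter parameters, which can push part of this filtering to the service, but the application still has to know and apply the naming convention.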
I don't want to deal with parsing logic at the blob level, so I include the hierarchy in the container name instead. There is one restriction, though: container names must start with a letter or number and can contain only lowercase letters, numbers, and the dash (-) character (http://msdn.microsoft.com/en-us/library/azure/dd135715.aspx). I will therefore use the dash as a delimiter. This results in the following:
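Because the dash now carries meaning, it helps to build and check container names in one place. Below is a minimal Python sketch; besides the rules above, it assumes the further documented constraints of 3 to 63 characters and no leading, trailing, or consecutive dashes. The helper names are hypothetical. It rejects a hierarchy level that itself contains a dash, since that would make the name ambiguous to split later.

```python
import re

# Lowercase letters, digits and single dashes; starts and ends with a letter
# or digit; 3 to 63 characters in total (rules assumed from the naming docs).
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def is_valid_container_name(name: str) -> bool:
    return _CONTAINER_NAME.fullmatch(name) is not None


def container_name(*levels: str) -> str:
    """Join hierarchy levels with "-" and validate the result."""
    for level in levels:
        # A dash inside a level would be indistinguishable from the delimiter.
        if not level or "-" in level:
            raise ValueError(f"level {level!r} must be non-empty and dash-free")
    name = "-".join(level.lower() for level in levels)
    if not is_valid_container_name(name):
        raise ValueError(f"{name!r} is not a valid container name")
    return name


print(container_name("raw", "pictures"))  # raw-pictures
print(container_name("Raw", "Audio"))     # raw-audio
```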
sa.blob.core.windows.net » raw-pictures » picture1.rif
sa.blob.core.windows.net » raw-audio » recording1.pcm
In this case, I'm using two separate storage containers, which simplifies the use of filters when retrieving blobs. It also allows easy iteration through all blobs of one kind without the need for a content type check, and it makes it easy to host a container in a different storage account, for example for performance reasons. Another benefit is that access settings, such as the public access level and stored access policies, are specified per container, so they can differ per level of your hierarchy.
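With the hierarchy in the container name, selecting a branch becomes a matter of comparing name segments, and each container can be pointed at its own account. The Python sketch below assumes a hypothetical PLACEMENT table that maps each container to the storage account hosting it (sa2 is an invented second account); it does not call Azure.

```python
# Hypothetical placement: which storage account hosts each container.
PLACEMENT = {
    "raw-pictures": "sa",
    "raw-audio": "sa2",  # e.g. moved to its own account for throughput
}


def levels(container: str) -> list:
    """Split a dash-delimited container name back into hierarchy levels."""
    return container.split("-")


def containers_under(names, *prefix_levels):
    """Return the containers whose leading levels equal prefix_levels."""
    n = len(prefix_levels)
    return [c for c in names if levels(c)[:n] == list(prefix_levels)]


def container_endpoint(container: str) -> str:
    """Build the container URL on whichever account currently hosts it."""
    account = PLACEMENT[container]
    return f"https://{account}.blob.core.windows.net/{container}"


for name in containers_under(PLACEMENT, "raw"):
    print(container_endpoint(name))
```

Comparing whole segments rather than using a plain string prefix keeps a container such as "rawx-audio" from being picked up when you ask for everything under "raw".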
Conclusion
If you don't mind dashes in your blob endpoints and CDN asset URLs, using the dash as a delimiter in container names is an excellent way of keeping ugly naming logic out of your blob names.