Storage Monkeys Blogs

Rants and Raves from the community
BasRaayman

To start at the beginning: I'm a regular reader over at Gestalt IT. "So what is Gestalt IT?" you may wonder. Well, on their website you will find the following:

We are collecting the best analysis and commentary from leaders in the fields of virtualization, networking, storage, and desktop engineering.
...
We work with independent experts, bloggers, and writers to generate content focused on IT infrastructure topics. Many of our articles and posts are syndicated from the blogs of their authors, meaning that they select their best and most relevant work and transmit it to us using an RSS feed, just like Google Reader and other feed readers use. Posts are then formatted and edited for publication here.

So, you could call it an aggregate of posts that are collected and submitted by people who know their stuff when it comes to IT.

So, I read that they were organizing something called a "Tech Field Day", which is basically an event where they invite bloggers from all over the place to come to San Francisco and see and test new products by various vendors. You can use the stuff they introduce, punch holes in their product and exchange ideas and opinions.

The big difference from a regular conference? It's not sponsored by a big company. The main point is not profit. The people there are free to say what they want and can write about the things that they find interesting. So, after reading about it I asked the initiator of this idea, Stephen Foskett, when we would be seeing something similar in Europe. Shortly afterwards I received a short tweet from him asking whether I would consider attending.

So I did, and last week we visited several companies. One of them shares its name with an "ancient flute-like wind instrument" and, instead of being a windbag, actually does some pretty nifty things:


Ocarina Networks, or Ocarina as they are usually called, is a company that specializes in data deduplication and compression. Basically, you can think of deduplication as finding all the data that is stored more than once and replacing every duplicate with a pointer to a single original copy. This can be done on multiple levels, and the simplest example is a corporate mailbox. Say you send a mail to 5 colleagues with a PowerPoint presentation you want them to review. Normally each recipient will have a copy of this file in his or her mailbox, and each copy consumes the space of the attached file. A deduplication solution could, for example, notice that the same file exists 5 times. It saves one copy and has the others just point to that one file.
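To make the mailbox example a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the "store once, point to it" idea, not how Ocarina or any real mail server does it; the names `dedupe_files` and `mailboxes` are made up for this example.

```python
import hashlib


def dedupe_files(mailboxes):
    """Store each distinct attachment once; every mailbox keeps only a pointer.

    `mailboxes` maps a recipient name to the raw bytes of the attachment.
    The "pointer" here is simply the SHA-256 digest of the content.
    """
    store = {}      # digest -> attachment bytes, kept exactly once
    pointers = {}   # recipient -> digest of the attachment they received
    for recipient, attachment in mailboxes.items():
        digest = hashlib.sha256(attachment).hexdigest()
        store.setdefault(digest, attachment)  # only the first copy is stored
        pointers[recipient] = digest
    return store, pointers


# Hypothetical data: the same presentation mailed to 5 colleagues.
presentation = b"slide data " * 1000
mailboxes = {f"colleague{i}": presentation for i in range(1, 6)}

store, pointers = dedupe_files(mailboxes)
raw_bytes = sum(len(data) for data in mailboxes.values())
stored_bytes = sum(len(data) for data in store.values())
print(f"{len(pointers)} mailboxes, {len(store)} stored copy, "
      f"{raw_bytes} bytes before, {stored_bytes} bytes after")
```

Five mailboxes end up pointing at one stored copy, so the space for the attachment is paid only once.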

Now, you could try to do the same thing on different layers. One of those layers is, for example, the storage system. There, the various vendors look for similar chunks of data, check for matching patterns, and then use the same pointer technique. There is one drawback to doing it at that level, though. As soon as one of the people changes their copy of the presentation, the on-disk footprint of the file changes in a way that defeats deduplication. That is quite odd, considering that they probably just edited some small things and a lot of slides, logos and pictures remain unchanged.
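One way to see why a small edit can hurt this much is to look at fixed-size block deduplication. The sketch below is my own simplified model, not any vendor's implementation: the 4 KB block size and the helper names are assumptions. It compares a one-byte overwrite with a one-byte insert.

```python
import hashlib
import random

BLOCK = 4096  # assumed block size, purely for illustration


def block_digests(data, size=BLOCK):
    """Split data into fixed-size blocks and fingerprint each block."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def shared_blocks(a, b):
    """Count the distinct block fingerprints that occur in both inputs."""
    return len(set(block_digests(a)) & set(block_digests(b)))


# 16 blocks of pseudo-random bytes stand in for the original file.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(16 * BLOCK))

changed = bytes([original[100] ^ 0xFF])                  # guaranteed different byte
overwritten = original[:100] + changed + original[101:]  # same length
inserted = original[:100] + changed + original[100:]     # one byte longer

print("blocks shared after overwrite:", shared_blocks(original, overwritten))
print("blocks shared after insert:   ", shared_blocks(original, inserted))
```

An overwrite only disturbs the block it lands in, but an insert shifts every later byte across the block boundaries, so none of the old fingerprints match any more even though almost all of the content is unchanged.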

Ocarina actually found a way around that by working on a different layer. This also provides some other benefits. Fortunately one of the other attendees, Simon Seagrave of TechHead, brought along his Flip camera (I forgot mine) and recorded Ocarina's CTO Goutham Rao as he explained what their product does and where its advantages lie.

 

Now, as you have heard, this is actually an optimizer that is content aware. To pick up on the example above, the optimizers created by Ocarina look at the files. They will actually go into files and check their content for duplicate chunks. Think of the example that Goutham mentioned of a corporate logo that appears in various unrelated files. The Ocarina optimizers are actually able to find such examples and effectively reduce the total footprint by combining deduplication and compression.
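The logo example can be imitated with nothing but the Python standard library. This is a rough sketch of the general idea of looking inside container files, and certainly not Ocarina's actual algorithm: I use zip archives as stand-ins for office documents, and the file names and the fake logo bytes are invented for the example.

```python
import hashlib
import io
import zipfile


def make_document(parts):
    """Build an in-memory zip archive, standing in for an office document."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()


def member_digests(document):
    """Open the container and fingerprint every embedded object."""
    with zipfile.ZipFile(io.BytesIO(document)) as zf:
        return {name: hashlib.sha256(zf.read(name)).hexdigest()
                for name in zf.namelist()}


# Hypothetical corporate logo embedded in two otherwise unrelated files.
logo = b"fake-logo-bytes" + bytes(range(256)) * 8
report = make_document({"media/logo.png": logo,
                        "body.xml": b"<report>Q3 numbers</report>"})
slides = make_document({"images/brand.png": logo,
                        "slides.xml": b"<deck>Roadmap</deck>"})

whole_file_match = hashlib.sha256(report).digest() == hashlib.sha256(slides).digest()
shared = set(member_digests(report).values()) & set(member_digests(slides).values())
print("whole files identical:", whole_file_match)
print("embedded objects in common:", len(shared))
```

Looked at as whole files the two documents have nothing in common, but once you open them up the shared logo is easy to spot, and it only needs to be stored once.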

For a rough drill-down into compression and deduplication, I would recommend you set aside some time and watch the following video. Consider yourself warned, though: it's roughly 40 minutes long. It's absolutely worth it!

 

And yes, you did hear that right. One of the first compression algorithms was Morse code. For more on that and a further intro to compression, you can find some more information here.
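If the Morse code remark sounds odd, a small sketch may help. The table below is the standard international Morse alphabet for A to Z; the helper names and sample phrases are my own. The point is that common letters such as E and T get the shortest codes, which is the same trick many modern compressors use.

```python
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def morse_encode(text):
    """Encode the letters of `text`, skipping anything not in the table."""
    return " ".join(MORSE[ch] for ch in text.upper() if ch in MORSE)


def symbols_per_letter(text):
    """Average number of dots and dashes needed per encoded letter."""
    letters = [ch for ch in text.upper() if ch in MORSE]
    return sum(len(MORSE[ch]) for ch in letters) / len(letters)


print(morse_encode("tee"))
print(symbols_per_letter("meet me there"))  # mostly frequent letters
print(symbols_per_letter("jazzy quiz"))     # mostly rare letters
```

Text made of frequent letters needs noticeably fewer dots and dashes per letter than text made of rare ones.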

Now, all of this technology is packed into rack-mountable housings called "optimizers". You will currently find two versions of these optimizers: the 2400 and the 3400. The main difference is the number of CPUs, which is only natural when you take into account the amount of number crunching that is being done. Other differences include the amount of RAM, the size (1U vs. 2U) and the built-in disks.

 

Ocarina optimizer compared to NetApp FAS

Now, Ocarina actually makes some pretty big claims as to how they perform. If you read along on Twitter you will already have seen the following picture, which shows the dedupe and compressed dedupe results compared to a NetApp FAS. My apologies for the bad quality of the picture, by the way. I didn't bring a decent camera along and only had my cellphone handy at the time.

All of the above was crammed into a few hours, combined with some hands-on time and a challenge which I already wrote about. The challenge actually showed us some interesting things about the optimizers.

First and foremost, this stuff actually works, and works quite well! Because you reduce the footprint of the data going over the line, you actually use less space in all areas. I have seen a reduction in footprint of up to 70%, which can make a lot of people very happy. Your storage, network and backup admins will probably be first in line to thank you for using such a product.

Second, it does have its weaknesses. Depending on, for example, the existence of duplicate files, encryption and the dictionary used, your results may vary. One of the attendees brought along a small USB stick with 2GB of data on it consisting of ESX install ISO files. The compression rate on them? None whatsoever. Yes, that's right. None at all. But that might be due to the fact that we did not have duplicate files, and we simply didn't have a dictionary for ISO files. One of the advantages is that since we are dealing with software, the chances of Ocarina adding such support are not too bad. Especially since they will probably mull over the results of our datasets.
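To show how much the input matters, here is a small sketch using Python's built-in `zlib`. It has nothing to do with Ocarina's own compressors and is not an explanation of the ISO result above; the random bytes are just my stand-in for data that is encrypted or already compressed, and the helper name `compression_ratio` is made up.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive data, e.g. the same small pattern over and over.
repetitive = b"corporate logo " * 10_000

# Random bytes: a stand-in for encrypted or already compressed content.
random_like = os.urandom(len(repetitive))

print(f"repetitive data:  {compression_ratio(repetitive):.4f}")
print(f"random-like data: {compression_ratio(random_like):.4f}")
```

The repetitive input shrinks to a tiny fraction of its size, while the random-like input does not shrink at all, which is why your results depend so heavily on what you feed into any product of this kind.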

All in all I have to say that this was one of the best presentations during the GestaltIT Tech Field Days, and it's probably something that can be used as an example for future similar events.

My guess is we will be seeing a lot more from Ocarina Networks in the future, and since this technology allows us to save on almost all fronts, I would assume that it won't be too long before we see similar systems created by other companies. I'm looking forward to seeing the potential of this technology unfold further and would love to see some of your comments on the product.

Oh, and last but not least a big thank you to Simon for letting me use his footage! :)

P.S. Just as a small note since I'm publishing it on a US site: the presenting sponsors for this event (and some non-presenting sponsors) paid for the flight and accommodations for the attendees. We are free to write what we want about the events and the presented products, and this is my impression of the event and product. :)


Comments (1)
sunshinemug
Great post--hope others followed the comments on yr blog
written by sunshinemug, November 30, 2009
Lots of good discussion of this on Renegade's Technical Diatribe where it first appeared: http://renegade.tweakblogs.net...tion.html.
