07 February 2012

Isilon vs. SONAS Part1: Hardware Architecture


Here is my first blog post in 2012, just a few weeks after I started my new job at EMC. Until December 2011 I worked 16 years for IBM, and in my last job there I was in the Advanced Technical Skills team for SONAS. So the first thing I started at EMC is to dive into Isilon, an exciting solution in the high-end arena of network attached storage. You can imagine that, as I am learning, I constantly compare things with SONAS in my mind. My idea is to share these thoughts with the community topic by topic in blog form rather than waiting until the end and then writing a larger paper; things may have changed by then, so I start right now. I have in mind covering several topics, such as:

  • Hardware and general Architecture
  • Data layout, capacity and efficiency
  • Growth options
  • Supported protocols
  • Availability
  • Security
  • Management
  • and others which may come into my mind while studying Isilon
I’ll do this one after the other and start right here.

Hardware Architecture: High-Level View

Isilon and SONAS both target the high-end segment of the NAS market. Both are based on a parallel filesystem which allows scaling within a single namespace and parallel access from hundreds or even thousands of client users or processes. Both are scale-out systems using a high-bandwidth, low-latency InfiniBand network for internal communication and data distribution across the cluster nodes.

One first obvious and major difference is that Isilon uses nodes with internal storage, whereas SONAS sticks with a less flexible and more expensive Fiber Channel architecture to attach storage to its nodes. Also, in SONAS the IO nodes (front-end nodes) are separated from the storage (back-end) nodes, although the underlying GPFS would allow running the Network Shared Disk (NSD) server and client processes on the same node (much like Isilon does). The separation is probably there to prevent the back-end nodes from being over-utilized in terms of CPU power; in that case IO could time out and the application might get into trouble. I have seen this in raw GPFS installations where all nodes were configured as both client and server nodes. The more official reason might be that this separation allows for independent scaling of IO and storage nodes. Well, there might be cases where this is useful, such as high-throughput workloads (e.g. massive video scaling). In Isilon these scenarios can be covered with so-called accelerator nodes. These nodes do not have storage, so they are very much comparable with the IO nodes in SONAS. Furthermore, accelerator nodes can be used for backup purposes, as they can be attached directly to an FC SAN for direct IO to other media like tape or VTLs.

Unlike OneFS, GPFS does not combine RAID, LVM and filesystem into one layer, so IBM had to stick with the somewhat outdated RAID type of data protection. Furthermore, a Fiber Channel network connects the storage nodes with the storage controllers (from DDN) and adds another layer of complexity and latency to the IO path. Consider that SONAS release updates typically contain all or many software components: the Linux OS of the nodes, GPFS, firmware for the storage controllers, the Fiber Channel switches, the Ethernet switches (which are mandatory for the internal management network), the IB switches, etc.

Fig.1: SONAS Components and Networks





As you can see from the picture above, the data traffic in SONAS goes from Gigabit Ethernet (1 or 10 Gb) -> InfiniBand -> FC -> SAS. Is that smart? No. It is not flexible, but it is complex and expensive. The processes of the management node shown have in the meantime been shifted into the GPFS nodes (with SONAS 1.3), so that no separate node is required anymore.

The next picture shows the high-level Isilon architecture. Since OneFS combines all the features of the RAID, LVM and filesystem layers, the overall architecture is much smarter; Isilon doesn't need a heavy-weight Fiber Channel SAN since it can use internal storage with greater and more flexible protection levels.

Fig.2: Isilon and its components


Currently SONAS only supports RAID6 on its disks, in an 8+P+Q configuration (which means 0 spare disks!). So as soon as a disk fails, the administrator should be alerted to react immediately and replace the failed disk. Another not-so-nice feature. Rebuild times for RAID6 arrays are long (they can take up to days) and become unacceptable as disk sizes grow; a rebuild also impacts performance for that array dramatically.
There is much more flexibility with Isilon: not only can the administrator choose the protection level much more flexibly in an N+M configuration like 9+1, 12+2, 15+5 or 16+4 (see a good explanation here), this can also be set at the file and directory level. In addition, there are neither dedicated parity nor spare disks. All disks are used during normal operations, and data and parity information (more on the data distribution later) are distributed across stripes. This is not only more efficient, but the rebuild times in case of a disk failure are also much faster, as many disks participate in rebuilding the missing data. (IBM folks know this kind of rebuild technique well from XIV, but it has not made its way into GPFS or SONAS.)
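To put rough numbers behind this, here is a minimal sketch in Python (illustrative only, not vendor code) that compares the protection overhead of the fixed 8+P+Q RAID6 layout with some of the flexible N+M levels mentioned above. The layouts come from the text; the simple data-plus-protection arithmetic is my own assumption and ignores metadata, spare capacity and other real-world details.

    # Illustrative comparison of protection overhead (assumption: each layout
    # is simply N data units plus M protection units, nothing else).

    def protection_overhead(n_data, m_protection):
        """Fraction of raw capacity consumed by parity/protection information."""
        return m_protection / (n_data + m_protection)

    layouts = {
        "SONAS RAID6 8+P+Q": (8, 2),   # fixed per array
        "Isilon 9+1":        (9, 1),   # selectable per file or directory
        "Isilon 12+2":       (12, 2),
        "Isilon 15+5":       (15, 5),
        "Isilon 16+4":       (16, 4),
    }

    for name, (n, m) in layouts.items():
        print(f"{name:<20} overhead: {protection_overhead(n, m):5.1%}, "
              f"survives {m} simultaneous failure(s)")

Note that 8+P+Q and 16+4 both spend 20% of raw capacity on protection, but the wider Isilon stripe survives four failures instead of two, and because data and parity are spread across all disks, a rebuild can draw on many spindles instead of the handful in a single RAID6 array.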

Now you may ask why SONAS, with all its components like Samba, CTDB, GPFS, Tivoli FlashCopy Manager, TSM, and IO and storage nodes, is so much more complex compared to Isilon. Well, I think it is due to the way IBM tends to develop things: the starting point is looking at what parts they already have in their basket, rather than the more basic question "what does the system need to look like from the customer/user perspective?". Isilon, as a startup years ago, obviously went the latter way and really developed a much more integrated solution from scratch.

The next topic will cover the data protection mechanisms of SONAS and Isilon.
