22 February 2016

Transparent Storage Tiering to the Cloud Using Isilon CloudPools

Isilon Storage Tiering

The Isilon storage tiering functionality (aka SmartPools) has been around for many years. It allows you to send data to specific storage pools (a storage pool is a group of nodes with the same node type or density), so that data can be stored cost-effectively at the lowest price level that still meets its requirements. For example (see Figure 1), you may create a policy that stores all new data on Pool1, which is built from fast S210 nodes. Response times will be extremely good, but the price point is also higher (faster disks, faster CPUs, etc.). Then you create another policy that moves all data that has not been accessed for 30 days into Pool2. Pool2 may contain X410 nodes: much more capacity (36 SATA disks per node) but somewhat longer response times than Pool1. Further, you may have a third pool, Pool3, for data that has not been touched for a year. This data is hosted on HD400 nodes (very dense: 59 SATA drives + 1 SSD per 4U chassis), with potentially longer response times than Pool2. However, since this tier is only used for rarely accessed data, it would not impact the user experience significantly (this may, of course, vary from use case to use case). The data movement defined by these policies is performed by the OneFS Job Engine. It happens in the background and does not impact users. The logical location of the files (the path) does not change. This means that a single directory could contain files that reside on three different storage technologies.
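To make the policy logic above concrete, here is a minimal Python sketch that maps a file's idle time to one of the three example pools. It is not how OneFS implements SmartPools: `TierRule`, `RULES`, and `target_pool` are hypothetical names, and the sketch assumes that the last-access time is the only criterion.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TierRule:
    """Hypothetical rule: files idle for at least `min_idle` belong in `target_pool`."""
    min_idle: timedelta
    target_pool: str


# Ordered from the longest idle threshold to the shortest, so the first match wins.
RULES = [
    TierRule(min_idle=timedelta(days=365), target_pool="Pool3"),  # HD400 nodes
    TierRule(min_idle=timedelta(days=30), target_pool="Pool2"),   # X410 nodes
]
DEFAULT_POOL = "Pool1"  # S210 nodes: new and actively used data


def target_pool(last_access: datetime, now: datetime) -> str:
    """Return the pool a file should reside in, based only on its idle time."""
    idle = now - last_access
    for rule in RULES:
        if idle >= rule.min_idle:
            return rule.target_pool
    return DEFAULT_POOL


if __name__ == "__main__":
    now = datetime(2016, 2, 22)
    print(target_pool(now - timedelta(days=2), now))    # Pool1
    print(target_pool(now - timedelta(days=45), now))   # Pool2
    print(target_pool(now - timedelta(days=400), now))  # Pool3
```

Ordering the rules from the coldest threshold to the warmest keeps the evaluation simple: the first matching rule decides, and anything that matches no rule stays on the fast default pool.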



Figure 1: Policy-Based Storage Tiering with SmartPools

 


Storage Pools and Storage Tiers

As discussed above, a pool consists of nodes of similar types (nodes can differ slightly, but that will not be discussed here; see the storage pool compatibility rules in the manual [2]). If you have a very large cluster, you may decide to introduce tiers. A tier contains one or more storage pools. However, in many cases this is not required. The policies you set up can be applied to pools or tiers.
 
Figure 2: One Filesystem across Different Pools/Tiers
 

Policies

The policies that determine where data is stored are simple to set up via the WebUI, CLI or API. Your policies are not limited to file access times. You can also take file type, location, ownership or any other available file attribute into account (you may even store arbitrary attributes for files in OneFS and use them in policies). The WebUI already contains some nice templates to start with. For example, there is an Archive template that contains rules to move older data to lower-cost storage, and an ExtraProtect template that protects files with a specific attribute value at a higher protection level (e.g., N+3 instead of N+2). The WebUI is quite intuitive (see the following screenshot).
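Policies can combine several file attributes. The following sketch assumes that a policy criterion is simply a boolean predicate over a file's properties, and shows how type, location, owner and a custom attribute could be composed, and how an ExtraProtect-style rule might choose N+3 over N+2. All names (`FileInfo`, `all_of`, `protection_for`, the `importance` attribute) are hypothetical and do not reflect the OneFS API.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Callable, Dict


@dataclass
class FileInfo:
    """Hypothetical view of the file properties a policy can look at."""
    path: str
    owner: str
    attributes: Dict[str, str] = field(default_factory=dict)  # custom attributes


Predicate = Callable[[FileInfo], bool]


def has_suffix(suffix: str) -> Predicate:
    """Match on file type, approximated here by the file name extension."""
    return lambda f: PurePosixPath(f.path).suffix == suffix


def under(directory: str) -> Predicate:
    """Match files anywhere below `directory`."""
    base = PurePosixPath(directory)
    return lambda f: base in PurePosixPath(f.path).parents


def owned_by(user: str) -> Predicate:
    """Match on file ownership."""
    return lambda f: f.owner == user


def attribute_equals(key: str, value: str) -> Predicate:
    """Match on a custom attribute stored with the file."""
    return lambda f: f.attributes.get(key) == value


def all_of(*preds: Predicate) -> Predicate:
    """Combine criteria with a logical AND."""
    return lambda f: all(p(f) for p in preds)


def protection_for(f: FileInfo, extra_protect: Predicate) -> str:
    """ExtraProtect-style rule: matching files get N+3 instead of the default N+2."""
    return "N+3" if extra_protect(f) else "N+2"


if __name__ == "__main__":
    rule = all_of(under("/ifs/data/projects"), attribute_equals("importance", "critical"))
    plan = FileInfo("/ifs/data/projects/plan.docx", "stefan", {"importance": "critical"})
    notes = FileInfo("/ifs/data/projects/notes.txt", "stefan")
    print(protection_for(plan, rule))   # N+3
    print(protection_for(notes, rule))  # N+2
```

Building policies from small predicates keeps each criterion testable on its own, which is roughly what a template gives you in the WebUI: a predefined combination you can adjust.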
  
Figure 3: Create Policy Wizard
 

New in OneFS 8.0: CloudPools

The new and cool feature that comes with OneFS 8.0 is a new pool type: the cloud pool.
With the initial OneFS 8.0 release, available since February 2016, four types of cloud storage (accessed via object APIs) are supported: two public cloud providers and two EMC systems, Isilon and ECS. Isilon and ECS can be used to implement a private cloud archive, because both support the required object REST APIs. I expect support for other cloud/storage providers, such as Google and Virtustream, to follow. There might also be service providers that use ECS or Isilon to host CloudPools target storage.
 
Figure 4: One Filesystem Extended to the Cloud with CloudPools


 

Secure Stub to Cloud

Assume you have set up an appropriate policy that defines which files need to be moved to the cloud. For example, all files that meet all of the following criteria:
  1. The files are larger than 5 MB AND
  2. The files reside in the directory /ifs/data/stefan AND
  3. The files have not been touched for 3 months
will be moved. The policy runs according to the schedule of the SmartPools job. As with all other jobs in OneFS, job scheduling and execution are controlled centrally by the Job Engine. Jobs can be given a priority and a schedule, and can be paused and resumed as needed. The next picture is a screenshot of the Job Types tab of the Job Engine.
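The three example criteria combine with a logical AND. The sketch below assumes that "5 MB" means 5 * 1024 * 1024 bytes, that "3 months" can be approximated as 90 days, that "touched" refers to a single last-touched timestamp, and that files in subdirectories of /ifs/data/stefan are in scope too. `should_move_to_cloud` is a hypothetical helper, not part of OneFS.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath

# Example policy from the text; the exact interpretations are assumptions.
MIN_SIZE_BYTES = 5 * 1024 * 1024            # "larger than 5 MB", assuming binary megabytes
POLICY_DIR = PurePosixPath("/ifs/data/stefan")
MIN_IDLE = timedelta(days=90)               # "not touched for 3 months", approximated as 90 days


def should_move_to_cloud(path: str, size: int, last_touched: datetime, now: datetime) -> bool:
    """Hypothetical check: True only if all three criteria hold (logical AND)."""
    is_large = size > MIN_SIZE_BYTES
    # Assumes files in subdirectories of POLICY_DIR are in scope as well.
    in_scope = POLICY_DIR in PurePosixPath(path).parents
    is_cold = now - last_touched >= MIN_IDLE
    return is_large and in_scope and is_cold


if __name__ == "__main__":
    now = datetime(2016, 2, 22)
    old = now - timedelta(days=120)
    print(should_move_to_cloud("/ifs/data/stefan/video.mp4", 8 * 1024 * 1024, old, now))  # True
    print(should_move_to_cloud("/ifs/data/stefan/small.txt", 1024, old, now))            # False
    print(should_move_to_cloud("/ifs/data/other/video.mp4", 8 * 1024 * 1024, old, now))  # False
```

Comparing path components rather than string prefixes avoids accidentally matching a sibling directory such as /ifs/data/stefanie.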
 
Figure 5: Job Types of the Job Engine
 
Once the SmartPools job is kicked off (e.g., every day at 22:00), it examines all directories and files in the scope of the policy, and if the configured criteria are met, the files are moved to the cloud. Within the policy, you can also configure whether the cloud data will be compressed and/or encrypted. The encryption keys are stored on the Isilon cluster, which means that nobody can read your data in the cloud, even if someone gains access to your cloud storage account.
A stub file remains in the local filesystem. It contains three things:
  1. The file metadata (as usual: creation time, last access time, size, …)
  2. A ‘link’ to the cloud data
  3. Possibly some of the original data, cached locally

Figure 6: Content of a Stub File
As mentioned above, from the application or user point of view, there is no visible difference between a normal file and one that has been stubbed to the cloud (you can, of course, figure it out, but more on that in a later post).
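The three parts of a stub file can be modelled roughly as follows. This is a simplified, hypothetical sketch (not the OneFS on-disk format): `StubFile`, `read_chunk`, `CHUNK_SIZE` and the fetch callback signature are all assumptions. It shows why a read on a stub looks like a normal read: data is served from the local cache when present and fetched through the cloud link otherwise.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

CHUNK_SIZE = 4  # unrealistically small, so the example is easy to follow

# Hypothetical signature: (cloud object id, chunk index) -> chunk bytes
FetchFn = Callable[[str, int], bytes]


@dataclass
class StubFile:
    # 1. File metadata, kept locally as usual
    name: str
    size: int
    # 2. Reference ("link") to the content stored in the cloud
    cloud_object_id: str
    # 3. Locally cached parts of the original data: chunk index -> bytes
    cache: Dict[int, bytes] = field(default_factory=dict)


def read_chunk(stub: StubFile, index: int, fetch: FetchFn) -> bytes:
    """Serve a chunk from the local cache, fetching it from the cloud on a miss."""
    if index not in stub.cache:
        stub.cache[index] = fetch(stub.cloud_object_id, index)
    return stub.cache[index]


if __name__ == "__main__":
    cloud = {"obj-42": b"hello world!"}  # stand-in for the cloud object store
    calls = []

    def fake_fetch(object_id: str, index: int) -> bytes:
        calls.append(index)
        data = cloud[object_id]
        return data[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE]

    stub = StubFile(name="report.pdf", size=12, cloud_object_id="obj-42")
    print(read_chunk(stub, 1, fake_fetch))  # b'o wo' (fetched from the "cloud")
    print(read_chunk(stub, 1, fake_fetch))  # b'o wo' (served from the cache)
    print(calls)                            # [1]
```

The caller never needs to know whether a chunk came from the cache or the cloud, which mirrors the transparency described above.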
 

Recalling Files from a CloudPool and the Local Cache

When a stubbed file is accessed, its content is retrieved from the cloud and cached locally (on SSD or HDD). However, it will *not* be stored permanently in the filesystem. If it were, any regular user could fill up the local filesystem with very few commands, since the files archived in the cloud could amount to many times the capacity of the local filesystem. Therefore, a permanent recall of files can only be performed by the administrator or a user with the appropriate privileges. The behavior of the local cache can be modified in the CloudPools settings. For example, we can tell the cluster:
  • Whether or not to cache recalled data locally
  • Whether to use cache read-ahead only for the accessed data or for full files
  • The cache expiration time (from one second to years)
  • The writeback frequency (how often the cluster writes modified, locally cached data out to the cloud)
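The four cache settings can be thought of as a small configuration object plus a few decisions derived from it. The sketch below is a hypothetical model, not the OneFS configuration interface: `CacheSettings`, `ReadAhead`, and the helper functions are illustrative names, and the values used in the usage example are arbitrary, not product defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class ReadAhead(Enum):
    PARTIAL = "partial"  # fetch only the data that is actually accessed
    FULL = "full"        # fetch the whole file on first access


@dataclass(frozen=True)
class CacheSettings:
    """Hypothetical model of the four cache settings listed above."""
    enabled: bool
    read_ahead: ReadAhead
    expiration: timedelta
    writeback_frequency: timedelta


def bytes_to_fetch(file_size: int, requested: int, s: CacheSettings) -> int:
    """How much data a read triggers, depending on the read-ahead setting."""
    if s.read_ahead is ReadAhead.FULL:
        return file_size
    return min(requested, file_size)


def keep_in_cache(cached_at: datetime, now: datetime, s: CacheSettings) -> bool:
    """Cached data is kept only if caching is enabled and it has not expired yet."""
    return s.enabled and now - cached_at < s.expiration


def writeback_due(last_writeback: datetime, now: datetime, dirty: bool, s: CacheSettings) -> bool:
    """Modified cached data is written back once the writeback interval has passed."""
    return dirty and now - last_writeback >= s.writeback_frequency


if __name__ == "__main__":
    # Arbitrary example values, not OneFS defaults.
    s = CacheSettings(True, ReadAhead.PARTIAL, timedelta(days=1), timedelta(hours=1))
    now = datetime(2016, 2, 22, 12, 0)
    print(bytes_to_fetch(1000, 100, s))                          # 100
    print(keep_in_cache(now - timedelta(hours=2), now, s))       # True
    print(writeback_due(now - timedelta(hours=2), now, True, s)) # True
```

Full read-ahead trades extra cloud traffic on the first access for faster follow-up reads; partial read-ahead keeps recalls small when applications only touch part of a file.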


Retention Time

The retention time you can configure with each policy defines how long archived data remains in the cloud after the local stub file has been deleted. The default is one week. After that period, the corresponding data in the cloud will be deleted. In addition, you can define
  • A backup retention period for incremental NDMP backups and SyncIQ. This period defines how long cloud data will be kept if it has been synchronized by SyncIQ to another location or backed up with an incremental NDMP job. The default is 5 years. This means that if someone deletes the local stub file, it can be restored by an NDMP or SyncIQ job, and the data can still be accessed for the period of time configured here.
  • A backup retention period for full NDMP backups. This works like the previous setting, but for full NDMP backups.
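The text does not spell out how the retention periods interact, so the following is only a sketch under a stated assumption: the cloud data is kept until the longest applicable period, counted from the deletion of the stub, has elapsed. `cloud_data_deletable_after` is a hypothetical helper, and "5 years" is approximated as 5 * 365 days.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_CLOUD_RETENTION = timedelta(weeks=1)             # default stated in the text
DEFAULT_INCREMENTAL_RETENTION = timedelta(days=5 * 365)  # "5 years", ignoring leap days


def cloud_data_deletable_after(
    stub_deleted: datetime,
    cloud_retention: timedelta = DEFAULT_CLOUD_RETENTION,
    incremental_retention: Optional[timedelta] = None,
    full_backup_retention: Optional[timedelta] = None,
) -> datetime:
    """Earliest time the cloud data may be removed after its stub was deleted.

    Pass a backup retention period only if the file is covered by that kind of
    backup (incremental NDMP / SyncIQ, or full NDMP).
    Assumption: the longest applicable period wins.
    """
    periods = [cloud_retention]
    for extra in (incremental_retention, full_backup_retention):
        if extra is not None:
            periods.append(extra)
    return stub_deleted + max(periods)


if __name__ == "__main__":
    deleted = datetime(2016, 3, 1)
    print(cloud_data_deletable_after(deleted))  # 2016-03-08 00:00:00
    print(cloud_data_deletable_after(deleted, incremental_retention=DEFAULT_INCREMENTAL_RETENTION))
```

Under this assumption, a file that was never backed up or replicated loses its cloud data after the one-week default, while a file covered by an incremental NDMP backup or SyncIQ keeps it for years, so a later restore of the stub still finds its content.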

 

Summary

With CloudPools, the Isilon scale-out NAS system now has a cool feature that allows transparent tiering to an external storage layer. Right now, two cloud providers and two external systems (Isilon, ECS) are supported. There might be more options going forward. The data movement is transparent to clients and can be secured through AES-256 encryption. By using CloudPools, one can implement a scalable multi-protocol system with fast response times that can grow almost without limit, to cloud scale.
There are some more aspects to cover, though. For example: What about performance? What happens to stub files during backup and replication? What about disaster recovery, access to CloudPools data from different sites, or a step-by-step approach to setting up CloudPools? Stay tuned; I will come back to these questions as my day-to-day job allows.

 

Upcoming Webcast on CloudPools

I’ll discuss CloudPools in a webcast on April 12th, 2016. Feel free to register here to join the session: http://bit.ly/1PYTR0P

References

[1] Isilon OneFS 8.0 Documentation, EMC Community Network:
https://community.emc.com/docs/DOC-51646
[2] Current and older Isilon and OneFS documentation:
https://support.emc.com/products/15209_Isilon-OneFS
[3] OneFS Technical Overview (white paper):
http://www.emc.com/collateral/hardware/white-papers/h10719-isilon-onefs-technical-overview-wp.pdf
[4] Official EMC Isilon CloudPools site:
http://www.emc.com/en-gb/storage/isilon/cloudpools.htm

 

Feedback

Any feedback is appreciated, good or helpful :-). Feel free to leave a comment and/or connect with me on LinkedIn: https://de.linkedin.com/in/drstefanradtke

 

Acknowledgements

Many thanks to Matthias Radtke for the review and correction of my numerous typos.








































