
Joe Arnold's Blog

Founder / CEO SwiftStack

There is more than one way to do cloud billing. Many deployments out there use the Amazon way of billing — no commits, pay-per-use. But it’s worth remembering that pay-per-use billing isn’t the only way.

For example, bandwidth providers typically bill at the 95th percentile, meaning the customer pays for capacity whether or not they use it. Customers are expected to pay for a bandwidth pipe of some size, over some time interval.
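Concretely, the provider samples utilization at a fixed interval (commonly every five minutes), throws away the top 5% of samples for the billing period, and bills on the highest remaining sample. Here is a minimal sketch of that arithmetic; the sampling interval and per-Mbps rate are illustrative assumptions.

```python
# Minimal sketch of 95th-percentile bandwidth billing.
# Assumes 5-minute utilization samples in Mbps and an illustrative $/Mbps rate.

def ninety_fifth_percentile(samples_mbps):
    """Return the 95th-percentile value of a list of utilization samples."""
    ordered = sorted(samples_mbps)
    # Drop the top 5% of samples; bill on the highest remaining one.
    index = max(0, int(len(ordered) * 0.95) - 1)
    return ordered[index]

def monthly_charge(samples_mbps, rate_per_mbps=10.0):
    """Charge = billable Mbps at the 95th percentile times a per-Mbps rate."""
    return ninety_fifth_percentile(samples_mbps) * rate_per_mbps

# Example: a month of mostly-idle traffic with a few short spikes.
samples = [20] * 8000 + [900] * 200 + [50] * 440   # ~30 days of 5-minute samples
print(monthly_charge(samples))                     # spikes above the 95th percentile are free
```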

Elasticity comes at a cost: providers need to provision for peaks, while customers only pay for usage. That may sound ideal for users of cloud services, but it means higher costs for providers.

Providers need to provision for peaks while only getting paid for the shaded area (actual usage).

Not every customer’s workload is elastic, and elasticity isn’t the only feature of infrastructure clouds. Ease of deployment, available tooling, and services like managed databases and storage are all valuable components of an infrastructure cloud.

As OpenStack matures, more boutique firms will come online that offer unique services. In this context it may be perfectly reasonable to ask for commits over a longer time-frame. The provider then only needs to provision for the committed workloads and, in turn, can offer better pricing to its customers.

For the upcoming OpenStack meetup the theme is ‘Corporate IT’. This got me thinking about what a small-scale Object Storage (Swift) cluster would look like.

At Cloudscaling, we have already done two of the early large-scale OpenStack Object Storage deployments outside of Rackspace. These deployments were for service providers at the petabyte scale.
Petabyte Deployment of OpenStack Object Storage (Swift)

We had 80-100 TB staging environments, but even those can be a big entry point for some shops.

I wanted something small, in the tens-of-terabytes range, that would be useful for corporate IT or for web/app shops that, for whatever reason, don’t use public clouds. There is a lot of great tooling available for object storage systems that private deployments can take advantage of. So the challenge was to design a Swift cluster that could start out with a single node (4-16 TB) and expand up to 4 nodes (32-144 TB).

Why is this a challenge? — Zones

Zones
Swift is designed for large-scale deployments. The mechanisms for replication and data distribution are built on the concept that data is distributed across isolated failure boundaries. These isolated failure boundaries are called zones.

Unlike RAID systems, data isn’t chopped up and striped across drives. Whole files are distributed throughout the system, and each copy of the data resides in a different zone.

As there are 3 copies of the data, at least 4 zones are required. Preferably 5 zones (so that 2 zones can fail).
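To make the zone idea concrete, here is a toy placement sketch (not Swift’s actual ring algorithm): each of an object’s three replicas lands in a distinct zone, so in a five-zone layout even two failed zones leave at least one copy readable.

```python
# Toy illustration of replica-to-zone placement (not Swift's real ring algorithm).
# Each object gets 3 replicas, and each replica lands in a different zone.
import hashlib

ZONES = [1, 2, 3, 4, 5]
REPLICAS = 3

def replica_zones(object_name):
    """Deterministically pick 3 distinct zones for an object's replicas."""
    start = int(hashlib.md5(object_name.encode()).hexdigest(), 16) % len(ZONES)
    return [ZONES[(start + i) % len(ZONES)] for i in range(REPLICAS)]

zones = replica_zones("photos/cat.jpg")
print(zones)                        # three distinct zones, e.g. [4, 5, 1]
failed_zones = {2, 3}               # even with two zones down...
print(set(zones) - failed_zones)    # ...at least one replica is still reachable
```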

Racks or Nodes as Zones
In the big clusters, failure boundaries can be separate racks with their own networking components.

In medium deployments, a physical node can represent a zone.

Drives as Zones
For smaller deployments with fewer than 4 nodes, drives need to be grouped together to form pseudo failure boundaries. A grouping of drives is simply declared a zone.

Here is a scheme for starting small and growing the cluster bit-by-bit (well.. terabyte-by-terabyte).

1 Storage Node

For a single storage node, the minimum configuration would have 4 drives for data plus 1 boot drive.
If a single drive fails, its data will be replicated to the remaining 3 drives in the system.

The system would grow 4 disks at a time (one in each zone) until the chassis was full.
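As a rough sketch of that layout, here is a hypothetical device-to-zone mapping for the single node; the device names are illustrative and this is not a real ring-builder invocation.

```python
# Hypothetical device-to-zone layout for a single storage node.
# Device names are illustrative; this is not a real ring-builder invocation.
# Start with 4 data drives, one per zone, then grow 4 drives at a time, one into each zone.
node1 = {"sdb": 1, "sdc": 2, "sdd": 3, "sde": 4}   # initial 4 data drives, zones 1-4

def add_four_drives(node, new_drives):
    """Expand the node by one drive per zone so the zones stay balanced."""
    for zone, drive in zip((1, 2, 3, 4), new_drives):
        node[drive] = zone
    return node

add_four_drives(node1, ["sdf", "sdg", "sdh", "sdi"])   # second batch of 4 drives
print(sorted(node1.items()))   # 8 data drives, 2 per zone (plus the separate boot drive)
```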

2 Storage Nodes

The strategy here is to split the zones evenly across the two nodes.

Adding a second node increases availability (assuming load balancing is configured), but it does not create a master-slave configuration. If one of the nodes is down, half of your zones are unavailable.

The good news is that if one of the nodes is down (half of your zones), data is still accessible, because at least one of the zones holding each object will still be up on the remaining node.

The bad news is that there is still a 1 in 2 chance that writes will fail, because at least two of the three zones need to be written to for the write to be considered successful.
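That 1-in-2 figure checks out with a quick enumeration. Assuming four zones split two per node and three replicas, each in a distinct zone, a write fails exactly when the surviving node holds only one of the object’s three replica zones; a minimal sketch:

```python
# Back-of-the-envelope check of the 1-in-2 write-failure figure for 2 nodes.
# Assumes 4 zones split 2 per node and 3 replicas, each in a distinct zone.
from itertools import combinations

zones_on_node = {"A": {1, 2}, "B": {3, 4}}
down_node = "A"                                   # one of the two nodes is offline
up_zones = zones_on_node["B"]

placements = list(combinations([1, 2, 3, 4], 3))  # possible zone sets for an object's replicas
failed = [p for p in placements if len(set(p) & up_zones) < 2]  # need 2 of 3 replicas writable

print(len(failed), "of", len(placements), "placements fail")    # 2 of 4 -> a 1-in-2 chance
```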

3 Storage Nodes

The addition of a third node further distributes zones across the nodes. Something slightly unusual is going on here: whole zones are placed on each node, but zone 4 is broken into thirds and distributed across the three nodes. This is done to enable smoother rebalancing when going to 4 nodes.

Again, if a single node is down, data will be available, but there will be a 1 in 5 chance that a write would fail.

4 Storage Nodes

The point of breaking zone 4 into thirds in the 3-node layout is to make this transition easier. The cluster can be configured with zone 4 entirely on the new server, and then the remaining zones can slowly be rebalanced to fold in the newly vacated drives on their nodes.
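Spelled out as a before-and-after map of zones to nodes, the transition might look roughly like this (the exact split is an illustration, not a prescribed layout):

```python
# Illustrative zone-to-node layout before and after adding the fourth node.
# With 3 nodes, zone 4 is split in thirds; the new server takes zone 4 whole,
# and the freed drives on nodes 1-3 are folded back into zones 1-3.
three_node_layout = {
    "node1": ["zone1", "zone4 (1/3)"],
    "node2": ["zone2", "zone4 (1/3)"],
    "node3": ["zone3", "zone4 (1/3)"],
}

four_node_layout = {
    "node1": ["zone1"],   # former zone-4 drives rebalanced into zone 1
    "node2": ["zone2"],
    "node3": ["zone3"],
    "node4": ["zone4"],   # the new server carries zone 4 entirely
}

for node, zones in sorted(four_node_layout.items()):
    print(node, "->", zones)
```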

Now, if a single node fails, writes will be successful, as at least two zones will be available.

Why Small-Scale Swift?
Using OpenStack Object Storage is a private-cloud alternative to S3, CloudFiles, etc. This enables private cloud builders to start out with a single machine in their own data center and scale up as their needs grow.

Why not use RAID?
Why not use a banana? :) It’s a different storage system, used for different purposes. Going with a private deployment of Object Storage gives something that looks and feels just like Rackspace Cloud Files. App developers don’t need to attach a volume to use the storage system and assets can be served directly to end users or to a CDN.

The bottom line is that a small deployment can transition smoothly into a larger deployment. The great thing about OpenStack being open-source software is that it gives us the freedom to build and design systems however we see fit.

OpenStack Summit Spring 2011
I’m wrapping up my notes from the OpenStack Conference with a rundown of some of the other sessions I attended that didn’t fit neatly in the areas I was focusing on during my time at the conference (Networking, Block/Volumes, Object Storage). Here are my remaining notes.

Unified Identity System
This is a welcome proposal. There is a need to unify the interfaces for authentication between Nova, Swift, and other upcoming services. The proposal is to rip identity out into a separate system so that new OpenStack services can be written without having to re-implement an authentication service. Basically, the proposal is to adopt an architecture similar to Swift’s, where a separate service (called Keystone) is responsible for responding to a series of API calls for authentication/authorization. This is a huge change, as ~30% of the code will need to be touched. As such, they will be doing the refactoring in phases. I’m looking forward to this change as it will help out with the integrations we’re doing.
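To picture the Swift-style pattern being proposed, here is a hypothetical sketch of WSGI-style middleware that delegates token validation to a standalone identity service; the endpoint path, response shape, and service URL are my own assumptions for illustration, not the eventual Keystone API.

```python
# Hypothetical sketch: services delegate token validation to a standalone identity service
# instead of each implementing its own auth. Endpoint path and payload are illustrative.
import json
import urllib.request

IDENTITY_URL = "http://identity.example.com"   # assumed address of the shared auth service

class TokenAuthMiddleware:
    """WSGI middleware that checks X-Auth-Token against the identity service."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        token = environ.get("HTTP_X_AUTH_TOKEN")
        if not token or not self._validate(token):
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"Authentication required"]
        return self.app(environ, start_response)

    def _validate(self, token):
        # Ask the shared identity service whether the token is valid (illustrative endpoint).
        try:
            with urllib.request.urlopen(IDENTITY_URL + "/tokens/" + token) as resp:
                return json.load(resp).get("valid", False)
        except Exception:
            return False
```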
http://plansthis.com/auth

Atlas Load Balancing
Rackspace is developing a load balancing service. Zeus is the backend. Something unusual about the project is that it’s a Java project that integrates with Glassfish, rather than the usual Python stack. Naturally, Citrix is working on a solution based on NetScaler as well. An open-source version based on HAProxy is also in the works.
http://wiki.openstack.org/Atlas-LB

Burrow
Burrow is an event notification system that is an AWS SNS equivalent.
http://etherpad.openstack.org/notifications

Reference Architecture
There was a little coming down to earth on Friday morning when reference architectures were discussed. It was a refreshing break from days spent discussing features and adding layers to the existing architecture. I imagine what becomes a reference architecture is going to be decided by whoever rolls up their sleeves and builds a stable CI system that goes through frequent builds & test runs. Likely a contingent of Rackspace, Dell and Citrix.
http://etherpad.openstack.org/configs
https://blueprints.launchpad.net/nova/+spec/reference-architectures

Scalr
Sebastian Stadil gave a short description of Scalr. He proposed Scalr as a tool for end users to manage their OpenStack resources.
http://wiki.openstack.org/Scalr

That about does it for me. There were a ton of sessions I didn’t get a chance to attend, and there is a ton of ground to cover over the next few months. I know I’m looking forward to the future of OpenStack.

See Previous:
OpenStack Conference Spring 2011 Object Storage (Swift)
OpenStack Conference Spring 2011 Block Storage
OpenStack Conference Spring 2011 Networking

OpenStack Summit Spring 2011

Continuing to get down my notes from the Spring 2011 OpenStack Conference. This time I’m covering a topic dear to my heart, as I’ve been involved with deploying Swift for our clients (Deploying Petabytes with OpenStack, KT’s Storage Cloud). As always, it was great to spend time at the conference with the core Swift team and to swap stories on our respective deployments.

Commercialization of OpenStack Object Storage
I gave a talk on Commercialization of OpenStack Object Storage (Swift). See my slides in a previous post. It was great to give the talk with our partners at KT. I’ll link to the video as soon as Stephen Spector gets back from his summer vacation and uploads it to Vimeo.

Container Sync
Gregory Holt gave a presentation on his thoughts on providing container replication in Swift. Sensibly, Gregory proposed an incremental approach to introducing this feature — begin with simple replication of containers between Swift clusters. It will work like this: a user configures a container in each cluster to ‘point’ at the other (by setting a shared key and the respective Swift container URLs), and the two containers then sync with each other. This is a great feature to incorporate into Swift. It will provide a useful solution for cross-site replication of data without introducing too much complexity. The downside is that you can’t specify the number of replicas at each site, but expect that as an item on the roadmap. Look out, EMC Atmos!
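As a sketch of that ‘point at each other’ setup: the X-Container-Sync-To and X-Container-Sync-Key header names below follow Swift’s container sync as it eventually shipped, while the cluster URLs, tokens, and shared key are placeholders.

```python
# Hedged sketch of configuring two containers to sync with each other.
# Header names follow Swift's container sync; URLs, tokens, and the key are placeholders.
import urllib.request

def point_container(storage_url, auth_token, container, remote_container_url, sync_key):
    """POST container metadata so this container syncs to the remote one."""
    req = urllib.request.Request(
        "%s/%s" % (storage_url, container),
        method="POST",
        headers={
            "X-Auth-Token": auth_token,
            "X-Container-Sync-To": remote_container_url,  # full URL of the peer container
            "X-Container-Sync-Key": sync_key,             # shared secret both sides agree on
        },
    )
    urllib.request.urlopen(req)

# Each cluster points its container at the other; the shared key must match on both sides.
point_container("https://cluster1.example.com/v1/AUTH_acct", "token1",
                "photos", "https://cluster2.example.com/v1/AUTH_acct/photos", "secret")
point_container("https://cluster2.example.com/v1/AUTH_acct", "token2",
                "photos", "https://cluster1.example.com/v1/AUTH_acct/photos", "secret")
```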
http://www.slideshare.net/openstackcommgr/swift-container-sync

Upcoming Swift Features
(FYI, the Swift team is moving quickly ahead; as of this writing they have already released 1.4.0 https://launchpad.net/swift/+milestone/1.4.0 with many refinements and bug fixes.)

Discussion on the last day was about what features should be worked on in core Swift.
Highlights included:

  • Access log delivery
  • Self-destructing objects
  • Tiny URLs
  • Server-side chunking for large-file uploads
  • The most controversial topic was Josh McKenty discussing overloading a Swift configuration component (the Ring) for managing compute zones. Interesting…

http://etherpad.openstack.org/swift-features-ods-d

Next up: OpenStack Conference Spring 2011 Misc Sessions
See Previous:
OpenStack Conference Spring 2011 Block Storage
OpenStack Conference Spring 2011 Networking

OpenStack Summit Spring 2011

I’m continuing the several-weeks-late rundown on the OpenStack Conference. Previously, I walked through my OpenStack Conference Networking discussion notes. This time I’ll be sharing my notes from the block storage / volume sessions I attended. I hopped through most of the sessions as best I could. Here is my summary.


XenAPI Volume Driver

Renuka Apte of Citrix proposed adding a new Xen-specific volume driver in Nova (alongside the existing iSCSI and AoE drivers) that would make callouts to XenAPI. The proposal would basically allow Nova to piggyback off of Citrix’s existing XenAPI Storage Manager. This would add support for NFS, NetApp, various iSCSI / FC, EqualLogic, etc. It does add an additional layer of abstraction to volume management, but it opens the door to broader storage support for those using XenServer.
http://wiki.openstack.org/xenapi-sm-volume-driver

Gluster
Gluster talked us through their clustered filesystem and shared how various companies are using their solution. Their OpenStack proposal was to provide Swift-compatible APIs and to provide image storage. It could potentially be an interesting blend of an object store with filesystem access. Mostly it felt like they wanted to share the work they have been doing on their open-source (with a commercial offering) clustered filesystem and introduce themselves to the OpenStack community.
http://www.slideshare.net/openstackcommgr/gluster-open-stack-dev-summit-042011

Virtual Storage Arrays
A new company in stealth mode made its appearance at the conference. Their goal is to create virtual volumes that end users can carve up into partitions however they would like, with the performance characteristics that they would like. For example, compute instances are available with many different performance characteristics (sizing is along the dimensions of RAM, CPU cores, IO capabilities, etc.). Zadara wants to provide the ability to specify performance characteristics (IO, durability, etc.) for block storage so that a cloud can provide storage blocks along more dimensions than just gigabytes.
http://wiki.openstack.org/NovaVsaSpec

Lunr
Chuck Thier of Rackspace presented the block-storage service called Lunr. Presumably, we can expect to see the results of this soon from Rackspace Cloud Servers. Lunr is a separate block storage service with its own set of OpenStack APIs. Below is the blueprint for its integration into Nova. https://blueprints.launchpad.net/nova/+spec/integrate-block-storage

Snapshot/Clone/Boot from Volume
Isaku Yamahata of VA Linux (Japan) and Kazutaka Morita of NTT are also working on an EBS-equivalent system. They modified nova-volume itself to allow for snapshots, cloning, and boot-from-volume. They’ve implemented a proof of concept based on the EC2 APIs using just LVM2. The Lunr project will probably supersede this effort.
http://www.slideshare.net/openstackcommgr/snapshot-clonebootpresentationfinal

In summary, expect to see Lunr API support in Diablo. Watch for software storage vendors to provide out-of-the-box support for the Lunr API. I expect the next OpenStack Conference to have a session discussing where to take Lunr next to support more features.

Next up: OpenStack Conference Spring 2011 Object Storage (Swift)

See Previous: OpenStack Conference Spring 2011 Networking

OpenStack Summit Spring 2011

I know this is a couple of months late… But, hey, we’ve been busy at Cloudscaling. Besides the talk I gave (Commercializing OpenStack Object Storage), I wanted to distill my notes from the Spring 2011 OpenStack Conference / Design Summit and give a rundown of where things are heading. There are a ton of topics from the conference to cover. So to start, here are my notes from the networking sessions I was able to attend. I’ll be covering more topics in future posts.


There was a cacophony of proposals related to networking abstractions and interfaces. I attended the tail end of an impromptu session on the first evening and about 40% of the 3-hour power-session on Thursday. One thing was clear — everyone participating in the discussions thought of networking as a first-order element in OpenStack (rightfully so!), just like compute and storage. NaaS (Networking-as-a-Service) was an often-used phrase, as were references to Amazon’s CloudFormation and DCaaS (Data-Center-as-a-Service).

Dan Wendlandt from Nicira presented a proposal to abstract network services for OpenStack (called Quantum). This proposal was primarily focused on providing network connectivity-as-a-service. Additional network services (like load balancers, VPNs and the like) would connect into a Quantum resource just like a VM. http://www.slideshare.net/danwent/quantum-diablo-summary

Donabe was proposed by Ram Durairaj of Cisco. Lew Tucker and Ram Durairaj have a vision they really want to see through with network services. This proposal is a big vision of where networking in OpenStack can go. Donabe is a method of grouping network resources into manageable hunks that could be offered (…take a guess…) as-a-service. The idea is to contain other services within a meta-service. For example, a service provider could offer a ‘web application’ bundle which would contain load balancing services, a caching service, a front-end network for the application servers, and a high-speed back-end network for data. These containers would have metering and pricing attached to them somehow. Check out the proposal: http://www.slideshare.net/ramdurairaj/donabe

The IP management proposal Melange was also discussed, but I didn’t see a presentation on it first-hand. The blueprint has it scheduled for the Diablo release. This would change existing Nova IP assignment. http://wiki.openstack.org/Melange

There was a lot of deliberation. One comment was memorable, “trying to get everyone to agree on a single monolithic network service is trying to boil the ocean”. I have to hand it to all involved with the networking discussions. Everyone worked hard to hash out issues and communicate their perspectives throughout the conference.

In summary, the target for networking in Diablo is to have in experimental mode:

  1. L2 services (code-named Quantum)
  2. Network container abstraction (code-named Donabe)
  3. IP address management (aka IPAM, code-named Melange)
  4. The Nova refactoring to support the above

Next up: OpenStack Conference Block Storage

At the Symposium on Massive Storage Systems and Technologies, Geoff Arnold from Yahoo! calls out the major trends he is seeing in storage.

  • From enterprise-class to consumer hardware
  • From filers and arrays to commodity JBODs
  • From disk to RAM (or SSD if necessary)
  • From RAID to decentralized erasure coding
  • From SANs to converged networks
  • From FC (and FCoE) to Ethernet and iSCSI
  • From NFS to REST
  • From POSIX & SQL to objects & NoSQL (key-value)
  • From transaction to eventual consistency
  • From schema & XML to open content & JSON
  • From synchronous to asynchronous operations
  • From MTBF to MTTR
  • From predictable business models to chaotic innovation
  • From human operator processes to autonomics and self-service

Link to the full presentation here: http://storageconference.org/2011/Presentations/Tutorial/1.Arnold.pdf

I think these trends are spot-on.
