Thursday, October 6, 2011

With a mind on the cloud, eyes on virtualization and feet planted firmly on the (datacenter) ground...


Hindsight is great! Too bad it comes from mistakes.

Not too long ago in human time but light-years in IT time, there were mainframes and terminals. It only made sense; keep the power of processing and management in a single location (or a limited set of distributed locations) and give users access only to what they need. Which back then wasn't much at all by today's standards. Admins/programmers (the two were virtually indistinguishable) had it all figured out. Mainframes would only get more powerful and applications more complex, but essentially the model would stay the same. Why wouldn't it?

Then along came the Personal Computer. At first it didn't seem like a ground-shaker; it was perceived more as an expensive toy. However, it got consistently better at doing, well, everything. Then came the need for PC users to work together in groups, then the groups of users had to work with each other, enter the Internet, and, well, here we are.

We have essentially been stuck with the client-server model since the early nineties. Since then there have been a number of technological trends: thin clients, blade servers, virtualization, and now 'the Cloud'. And every time there's a new trend, the market goes through the same predictable loop. First there's the talk: expos, conferences, blogs, newsletters and the like. "Technology X is the word." "It's the new style." "It's the way of the future." Then along come the sales people with scare-mongering and blitzkrieg tactics: "Why are you still stuck with the old model?" "Do you realize that your competition is ahead of you?" "Research shows that 99.99% of companies that moved to the Technology X model are already seeing a 400% reduction in costs and a gazillion% increase in productivity." Vendors release optimized hardware, evangelists preach the gospel of change and everyone waits for the dawn of a new era...

...that simply never comes. It's the same story over and over again. As IT gets ever more complex, new technologies solve problems in one context and cause them in another. Take Virtualization, for example. It's a fantastic bit of technology that will play an ever-growing part in our lives for years to come. It allows businesses to save on hardware and electricity, consolidate their infrastructure and better provision for the future. But think for a minute that it's the answer to all your problems and you're in for a world of pain.

A great example of this is an old client of mine; they had 30+ mission-critical servers. Then along came a major hardware vendor that put the Virtualization spell on the management. The servers were aging, they said. Think of the electricity bill. Think of all the hardware support contracts. Think of the management overhead. The competition has already virtualized their entire infrastructure. What are you waiting for? And they went for it! The vendor consolidated all 30+ servers into a virtualization cluster with 2 physical hosts and a storage solution. The new equipment was an absolute beast performance- and storage-wise, with plenty of both to spare. Everyone was happy, with plenty of smiles and pats on the back to go around.

Then performance began degrading. At first it was unnoticeable, but it progressively got worse, and within nine months of the system's introduction it all went pear-shaped. Performance was so slow the whole system became unworkable. The in-house IT department could not figure out why this was happening. When I first got involved, I was mortified! Storage performance indicators were off the chart; everything was in the red zone. The whole thing seemed to be set up horribly wrong. I fought the urge to ask the age-old IT Pro question: "Who set this up?". On second glance it became clear to me that the problem wasn't the initial design and set-up. It was the lack of design and provisioning, and the absence of a distinct road-map for how that initially virtualized infrastructure would be allowed to grow inside the available hardware. The chaotic way the data grew was what caused the storage subsystem to choke to death.

Now I'm at a crossroads. I can either start the tech-talk and risk losing my non-technical audience, or skip the tech-talk and get raised eyebrows from the IT Professionals reading this. I'll opt for the latter and promise to come back in a future blog post to analyse this case study in detail.

The bottom line is that data tends to grow, and it frequently grows exponentially. Now, the most prevalent argument in favour of Virtualization is that when you buy a new physical server you usually buy more than you need: more processing power, more memory, more disk space. The fact is that you also buy relative peace of mind: the knowledge that your data, and the processing power it requires, can expand unimpeded, at least up to a certain point. Then again, another pro-Virtualization argument is that you can add processing power, memory and storage at will. But where will these resources come from? How do you manage growth in a virtual machine environment? How do you plan for the future? You can easily reach a point where provisioning generously for a Virtualization environment costs as much as keeping a non-virtualized one.
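To put some numbers on that, here's a quick back-of-the-envelope sketch in Python. The figures are invented purely for illustration (they're not from the case study), but the compounding effect is exactly what bit my client:

    # Months until a storage pool fills up, assuming compound monthly growth.
    # All figures are illustrative, not taken from any real environment.
    def months_until_full(used_tb, capacity_tb, monthly_growth):
        months = 0
        while used_tb < capacity_tb:
            used_tb *= 1 + monthly_growth
            months += 1
        return months

    # 12 TB used out of 20 TB, growing 5% a month: about 11 months of headroom.
    print(months_until_full(12, 20, 0.05))
    # At 10% a month, the very same pool chokes in roughly half that time.
    print(months_until_full(12, 20, 0.10))

Double the growth rate and the headroom doesn't politely halve; it collapses. That's what exponential means in practice.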

The truth is that properly maintaining and provisioning a Virtualized environment not only carries a larger administrative overhead, it also requires more knowledgeable staff: something the case-study client mentioned above hadn't planned for. In such a radical infrastructure change, the management needs to factor in the essential training and the costs it entails. And then there's the issue of availability. If any one of the aforementioned 30+ non-Virtualized mission-critical servers goes down, it's bad enough. What if, in the Virtualized environment, the entire storage unit gives up the ghost? Granted, it's unlikely, but still a possibility. If and when it happens, it's an absolute disaster. So now you need a second storage system to mirror the first. Let's face it, the math sometimes just doesn't add up.
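For the sceptics, the mirroring argument is simple probability. Assuming, purely for illustration, a single storage unit with 99.5% availability:

    # Toy availability math for the mirrored-storage point above.
    # The 99.5% figure is an assumption for illustration only.
    single = 0.995
    # Two independent mirrored units are both down only if both fail at once:
    mirrored = 1 - (1 - single) ** 2          # 0.999975

    hours_per_year = 24 * 365
    print((1 - single) * hours_per_year)      # ~43.8 hours of downtime a year
    print((1 - mirrored) * hours_per_year)    # ~0.22 hours a year

The availability improves dramatically, but so does the invoice: you've bought, powered and supported the storage twice over. Hence "the math sometimes just doesn't add up".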

There's a simple reality behind all this; The larger the infrastructure, the more cost-effective Virtualization is. Could it be then, that Virtualization is not THE way forward for everyone? Could it be that there's just no ONE technological trend to fast forward us to the future?

I know a lot of Cloud advocates who would strongly disagree. The Cloud offers true 'resource on demand'. It charges only for what you use, when you use it. It allows for rapid expansion of any infrastructure, followed by rapid downsizing, without losing investment in equipment. It's always available and has a reduced management overhead. It offers state-of-the-art security fail-safes. It's a businessman's dream, destined to render IT Professionals obsolete. Or is it?

What is this 'Cloud'? Well, it's nothing more than a fully virtualized infrastructure with mechanisms that automatically allocate resources on demand or according to predefined rules. Nothing more and nothing less. So, in theory, the Cloud delivers all the advantages of Virtualization with none of the disadvantages... but in reality, although it pieces together many aspects of the IT jigsaw puzzle, it raises new issues.
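To make "predefined rules" concrete, here's a deliberately naive sketch of the kind of logic every Cloud platform runs behind the scenes. The provider object and its methods are hypothetical stand-ins; real platforms expose the same idea behind their own APIs:

    # A deliberately naive rule-based allocator: the essence of 'the Cloud'.
    # 'provider' and its methods are hypothetical, for illustration only.
    SCALE_UP_AT = 0.80     # add capacity above 80% average load
    SCALE_DOWN_AT = 0.30   # release capacity below 30%
    MIN_INSTANCES, MAX_INSTANCES = 2, 20

    def autoscale(provider):
        load = provider.average_cpu_load()    # 0.0 - 1.0 across all instances
        count = provider.instance_count()
        if load > SCALE_UP_AT and count < MAX_INSTANCES:
            provider.add_instance()           # resources appear on demand...
        elif load < SCALE_DOWN_AT and count > MIN_INSTANCES:
            provider.remove_instance()        # ...and the meter stops running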

Where is the Cloud? Nobody really knows, and there are numerous Cloud providers out there. And anyway, location is supposedly irrelevant, since a Cloud infrastructure is, or rather should be, everywhere. What I can tell you is where it's not: inside your company/organization. Do you care? Should you? Well, Cloud advocates say no. It's a seamless solution, where a Cloud provider takes all the necessary steps to provide you with ample resources, absolute availability, total control, lightning-fast speeds, world-wide distribution, etc., etc. So what could go wrong? What issues could possibly stem from a move to the Cloud?

The Cloud is nothing more than an elaborate sum of components and sub-systems commonly found in any IT infrastructure. The cascading levels of redundancy, the intricate net of failover mechanisms and the wide physical distribution of locations are what make any Cloud what it is. Cut enough strings from that net of redundancy and resources and the whole thing comes crumbling down. It's happening all over the place; even a superficial Google search produces plenty of evidence: AWS (Amazon Web Services), Google and the Google App Engine, the Sony PlayStation Network, the list goes on and on. Security-wise, there's the issue of attack surface. If you run an average-sized business with an average IT budget and an average datacenter, you can put enough security measures in place to deter potential attackers, simply because the money and effort needed to hack your infrastructure far exceed what your data is worth. In the case of the Cloud, though, you've got thousands or hundreds of thousands of such bundles of data in virtually one place. Hacking the Cloud pays, hence the numerous occurrences of such incidents.

Location is also a major factor. In many countries a 100 Mbps pipeline to the Internet costs more than equipping and maintaining a datacenter for years. What's the point of having your local data on the Internet if you cannot access it at LAN speeds? What if that pipeline goes down? Now you need another one for failover. The productivity and cost-effectiveness equation is found wanting when it comes to moving all data to the cloud. Businesses tend to need some data close by for a variety of reasons: speed of access, security policies/protocols, sheer size, cost of Internet access and, yes, psychological reasons. I know many an old-timer CEO and CFO who would rather die than have their ERP data in any place they cannot see and touch. Call it a gross exaggeration, call it ignorant, call it backward, call it what you want; it still makes no difference. Sometimes the benefits of caution outweigh those of cost-effectiveness in the long run.
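The LAN-speed point is plain arithmetic. Using illustrative numbers, here's moving a 10 GB dataset over a gigabit LAN versus that 100 Mbps pipeline:

    # Transfer time: 10 GB over a 1 Gbps LAN vs a 100 Mbps Internet pipe.
    # Illustrative numbers; real links add protocol overhead and contention.
    size_bits = 10 * 8 * 10**9   # 10 GB expressed in bits

    lan_bps = 10**9              # 1 Gbps LAN
    wan_bps = 100 * 10**6        # 100 Mbps pipeline

    print(size_bits / lan_bps / 60)   # ~1.3 minutes on the LAN
    print(size_bits / wan_bps / 60)   # ~13.3 minutes over the pipe

And that's with the pipe all to yourself; at midday, shared with every other user and service, the real figure is far worse, as my next client found out.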

Another client of mine had the misfortune of hitting both snags in one go. They opted for a total move to the Cloud after some coaxing by a major Cloud provider in South-East Europe. They just went for it, all-out. I didn't have a say in it, since my support contract fees were, among other things, part of the overall cost saving the move would bring. The move went great, everything ran smoothly; plenty of hand-shakes, smiles and pats on the back to go around. However, they soon found out that they had miscalculated their overall peak data usage, especially regarding file sharing. The bandwidth required at midday far exceeded that of their Internet pipeline. So some services had to move back to their datacenter, which, by the way, had just been refurbished into a meeting room. But anyway, the proper balance was struck, the management was happy, the users were as happy as users can be after a major infrastructure change, and all seemed well.

Then, on a sunny summer morning (June 2011), it all went pear-shaped. Right now, I am actually looking at the email the Cloud provider's Operations Director sent to all customers. The subject reads: "Update for the Cloud infrastructure failure event dated [...]". Due to ethical and possibly legal implications I will not quote the body of the email. He describes the reasons for this failure, which were attributed to the inability of the redundancy mechanisms to maintain data integrity and physical server connectivity. This caused performance issues and occasional loss of service. Finally, he attributes the failure to the hardware vendor and promises swift resolution. To my knowledge the issue has still not been fully resolved, and it's October the 6th.

Having worked extensively with storage systems myself, I cannot help but sympathize. Such failures are more common than one might think, and every precaution should be taken to plan for every contingency. But customers don't care who's to blame. The damage is done, and no SLA, no amount of compensation, can make up for it. Extensive infrastructure downtime can hurt a business in so many ways that it would take a whole new blog post even to scratch the surface of the matter. The moral of the story is that the severity of any infrastructure failure rises in proportion to the level of consolidation that has been achieved. And as the Cloud is the epitome of consolidation, total reliance on it brings great risk. Don't get me wrong, I am not saying that there's a high chance of failure. Quite the contrary: the chances of any major public Cloud infrastructure going down are low. But it does happen, and if you're totally reliant on it, it's out of your hands, and therein lies the risk. Now just pause and think. What are the chances of a major datacenter disaster? How often do we see one? You must admit, it's pretty rare as well...
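One way to see the consolidation trade-off is a toy expected-impact model. Every probability and cost below is invented purely for illustration:

    # Toy model: 30 independent servers vs one consolidated platform.
    # All probabilities and impact figures are invented for illustration.
    servers = 30
    p_server_down = 0.01        # assumed monthly failure chance per server
    p_platform_down = 0.002     # assumed (much lower) chance for the platform

    # Expected impact per month, measured in 'services lost':
    distributed = servers * p_server_down * 1    # 0.30 -- frequent, small events
    consolidated = p_platform_down * servers     # 0.06 -- looks better on paper

    print(distributed, consolidated)
    # But compare the worst single event each design allows:
    print("worst case:", 1, "service vs", servers, "services at once")

On paper the consolidated platform wins on averages. It's the tail, the one rare event that takes all thirty services down at once, that the averages hide. That's the risk I'm talking about.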

There are various degrees of commitment to any type of infrastructure change. Some businesses start slow and some go all-out. Learn a lesson from the ones that got burnt, and take it slow. There's a distinct pattern emerging here. In the end, too much consolidation costs far more than no consolidation at all.

Trends and technologies might come and go, but my verdict remains decidedly fixed in its fluidity.

There is no single answer, nor will there ever be one, to our IT woes. It's a delicate balance that must be continuously struck to achieve the optimal mixture of available technologies, which changes dynamically depending on business demand. Given the current worldwide market conditions, long-term plans are measured in months or weeks, not years.

It's time for IT Departments and decision-makers to forget about technological trends and fight the urge to overhaul production environments. We cannot afford to generalize; every business, and every individual business need, must have its own tailor-made solution. Individual adaptations of new technologies should be tested in labs and pilot environments until they mature. It's not backward thinking, it's plain common sense.

In one sentence:

Set your mind to the future, your eyes to the present and plant your feet firmly on the past. It's a winning recipe.

Yours,

the IT Guy.

3 comments:

RichBos said...

Having escaped the corporate LAN all those years ago I enjoy reading posts like this as it reminds me how different the web world I occupy these days is when it comes to resource delivery, where 'the cloud' is in fact the best option for virtual (web) businesses requiring dynamic scalability.

That said, just because a web server platform is delivered from a perceptually robust virtual source, it doesn't mean to say it's 100% resilient, and anyone who doesn't design their systems with fault tolerance can't really blame the likes of Amazon when problems arise. As you say, technology fails, at all levels.

I deliver all my solutions from the AWS platform (they're all purely web-facing) and for me there couldn't be a better option. The bonus isn't in its resilience, but in its ability to facilitate designed resilience, coupled, obviously, with speed of deployment and unlimited scalability. It's also a killer for R&D. I can fire off a virtual server farm in minutes, run it for a few hours, or a few days, and then dump it, only paying for the time it was up.

But I digress from the essence of this post, and I agree: at corporate LAN level the cloud perhaps isn't the fix-all answer (although I do prefer Google Apps over Exchange), and yes, neither is complete virtualisation. My brother has recently taken on systems management for a small finance company who run everything from MS SBS on a SAN, and I do mean everything: email, file storage, DHCP, it even runs the VPN. And guess what, it's dying.

the ITGuy said...

Don't get me wrong, the Cloud's great, as a concept. In practice it depends on the Cloud provider, AWS and Google Apps have had minimal downtime over the years. The one I used in my example absolutely sucked!

Then it's a matter of what you're trying to achieve. If you've got a website that needs hosting, putting it on the Cloud is ace! Minimal investment, no hassle, pay as you go, 99.999% uptime, the lot!

But say, for instance, you want to transfer your entire infrastructure to the Cloud. I'm not talking just corporate environments, either; take, for example, a small 25-user company. Let's say everything goes peachy: your stuff's on the Cloud and everything's working perfectly as far as the Cloud is concerned. You still need an Internet connection to access it, right? And if you need LAN-like performance, the Internet line needs to run at LAN-like speeds, correct? Couldn't the cost of owning a 100 Meg pipeline exceed what you're saving from your move to the Cloud? What if the cleaning lady spills some tea on the router and fries it? Now you need redundant equipment too!

Similarly, there's no need to build a datacenter with web farms to host a site. It's a waste of effort and money. AWS, like you said, would be faster to set up and scale, and cheaper.

It's all about design, design, design!

RichBos said...

Exactly, and hiring experts like us to do it :-)
