By Wil Cunningham, head of WilPower IT Consulting, a former RBS senior IT manager and now a contractor with Lloyds Banking Group and other banks.
Wil Cunningham, an independent technology consultant, oversaw RBS’s Data Centre Optimisation (EDCO) programme in Scotland and has spent the last two years as a senior contractor working on the integration of Lloyds TSB and Halifax Bank of Scotland (HBOS). Here he shares his best-practice guidelines for optimising your data centre. The HBOS Live command, control and execution project included integration work and the Lloyds Banking Group (LBG) disaster recovery excellence programme in London. This latest implementation, recently completed, has reinforced the primacy of the ‘make the most of what you have’ approach, which he believes is crucial to any data centre optimisation project.
It’s a sad fact that as individuals or professionals we never really think about optimising our budgets, until some unplanned external factor forces us to reassess and reprioritise. The banking crisis of 2008 was one such huge reality check that forced banks to ‘sweat’ existing Information, Communication & Technology (ICT) resources in a time of budgetary constraint.
Following the long-running aftermath of the economic downturn, we are now in the midst of the eurozone crisis, so ICT budgets remain constrained, yet demand for space and power in data centres is ever increasing. With less money to fund continued data centre expansion and buy ourselves out of trouble, the question arises: what can be done?
There are solutions, and I’d like to share some of the things I learnt while working on the RBS Data Centre Optimisation programme and the HBOS Live disaster recovery initiative. Both, I believe, solved the ‘sweating your assets’ conundrum by focusing on the following fundamentals, which others can use as a guideline:
Step 1: The process – Challenge policies and standards
Policies are rarely formed proactively; most evolve from past problems and the approaches of existing staff. Why we do things today is usually the result of an historic problem yesterday, whether internal or mandated by external forces. Some historic policies were shaped by previous technology constraints, or by what was perceived as best at a certain point in time, but do we ever really review or challenge these assumptions? I continually challenged policies on the EDCO programme, asking why we were co-locating test and production systems, why we were allowing critical and non-critical systems to be located within the same data centres or shared infrastructure, and why we were providing disaster recovery services for minor systems.
The result of this approach was senior stakeholder support to change the policies and create a new ‘placement policy strategy’. Mandating that test beds, production support beds and production should never share the most valuable, critical data centre space allowed the team I led to optimise the scope and spending power of the project and extend the life of the data centre.
Additionally, I challenged the standards: reducing the number of backup copies taken, removing the need for minor systems to have disaster recovery, and stopping new systems being given six years’ growth capacity, which was no longer required nor risk-appropriate given the levels of built-in resiliency in the new technologies and servers being deployed, especially in the ‘zero growth world’ in which we find ourselves. We were then able to re-use the freed-up storage rather than buying more every year. In addition, we applied power-capping limits to ‘idling’ servers, which also reduced power wastage, rejuvenating the data centre and sweating its assets for very little extra cost.
Step 2: Premises
The RBS optimisation programme had already created a global dedicated test centre from a moth-balled, then re-commissioned, data centre. This allowed all test and production support environments to be transferred from our critical data centres to the new facility, freeing up re-usable production space. The programme ‘lifted and shifted’ kit where appropriate and provided some consolidation and re-stacking opportunities, optimised with technology upgrades. Finally, we employed multiple weekend moves, with the business units prioritising the beds and the technical proving teams checking out the facilities. Flexible premises and working hours helped immensely.
Step 3: The Technologies
The programme applied virtualisation technology at ratios of circa 20:1 to most Wintel and Citrix infrastructure, depending on the age and type of the applications. Vendors helped us here, as they had tools that scanned our estate both for the ‘low hanging fruit’ and, more importantly, to check out and assess application compatibility with the new virtualisation technologies. For other mid-range platforms we employed consolidation, which proved a great way of reducing power and space and allowed a re-stacking of systems against their individual criticalities. Policy now dictated that critical systems should not be stacked with non-critical systems, as by default we would be giving management information (MI) systems the same level of disaster recovery (DR), resilience and support as payments systems, which is not cost-effective in the long run and is unnecessary.
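To make the consolidation arithmetic concrete, the sketch below estimates the servers, power and rack space freed by virtualising at a given ratio. All figures (watts per server, servers per rack, estate size) are hypothetical illustrations, not values from the RBS programme.

```python
# Illustrative sketch: estimated savings from consolidating physical
# servers onto virtualisation hosts at a given ratio (e.g. circa 20:1).
# The per-server wattage and rack density below are assumed examples.

def consolidation_savings(physical_servers, ratio=20,
                          watts_per_server=400, servers_per_rack=30):
    hosts_needed = -(-physical_servers // ratio)  # ceiling division
    servers_removed = physical_servers - hosts_needed
    power_saved_kw = servers_removed * watts_per_server / 1000
    racks_freed = servers_removed // servers_per_rack
    return hosts_needed, power_saved_kw, racks_freed

hosts, kw, racks = consolidation_savings(600)
print(hosts, kw, racks)  # 30 hosts, 228.0 kW saved, 19 racks freed
```

Even with conservative assumptions, a 20:1 ratio removes roughly 95 per cent of the physical footprint of the estate being virtualised, which is where the space and power headroom comes from.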
Step 4: People
I made platform owners individually accountable for their part in the data centre estate optimisation projects to maximise success. This covered decommissioning and potential re-use (as part of new installations); adoption of new technology that benefited the data centres; opportunities via standards or process; and finally general waste and cost reduction, for example holding gold stocks with third parties at reduced cost. We then created inter-platform competition by actively supporting and funding the platforms with the best ideas for power and space efficiency, generating a real sense of healthy competition. Every platform had its own initiatives; success came from choosing the right ones and rewarding and recognising the best efforts.
We created a central log of the opportunities and generically sized them against four weighted criteria: cost, benefit, risk and do-ability (could the platform easily implement and support the change?).
We kept it simple, with a process like a cut-down Request For Information (RFI). It had to be simple because the platform owners didn’t want to spend too much time on opportunities that might not ‘fly’. The platform reps and data centre staff felt empowered to challenge each other; however, the weighted scores revealed the most efficient opportunities. We plotted the opportunities onto a matrix, and the top-right best opportunities were the ones we invested in.
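The weighted scoring described above can be sketched as follows. The weights, the 1–5 rating scale and the sample opportunities are hypothetical illustrations; the article does not specify the actual weightings used.

```python
# Minimal sketch of weighted opportunity scoring across the four
# criteria (cost / benefit / risk / do-ability). Weights are assumed.
# Cost and risk are rated so a HIGHER number means cheaper / lower risk.

WEIGHTS = {"cost": 0.2, "benefit": 0.4, "risk": 0.2, "doability": 0.2}

def score(opportunity):
    # Weighted sum of the platform reps' 1-5 ratings.
    return sum(opportunity[c] * w for c, w in WEIGHTS.items())

opportunities = [
    {"name": "power-cap idle servers",  "cost": 5, "benefit": 3, "risk": 4, "doability": 5},
    {"name": "virtualise Wintel estate", "cost": 3, "benefit": 5, "risk": 3, "doability": 3},
    {"name": "new immersion cooling",    "cost": 1, "benefit": 4, "risk": 2, "doability": 2},
]

# Rank highest-scoring first; the top entries correspond to the
# 'top right' of the investment matrix.
for opp in sorted(opportunities, key=score, reverse=True):
    print(f"{opp['name']}: {score(opp):.2f}")
```

The point of the weighting is that a cheap, low-risk, easily deliverable idea can legitimately outrank a higher-benefit but riskier one, which is what makes the challenge between platform reps objective rather than a shouting match.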
We challenged all internal and external vendors to join the team and get behind our goals, and we only chose partners who could demonstrate proven delivery capability.
All external vendors were obviously keen to sell us new technology but by driving them through the same process as the platform owners, we knew we were funding the right initiatives. The major caveat here was that at the ‘coal face’ the platforms needed to support whatever technologies were recommended by the vendors, so they sponsored the vendor initiatives and ultimately signed off against the standards and roadmaps.
Step 5: Funding
The principle of being ‘vendor agnostic’ encouraged the vendors to bid for each other’s business, generating savings in the form of ‘loss leaders’. It meant we could get new kit and support at greatly reduced prices, create opportunities to ‘bundle up’ deals with business-as-usual (BAU) spend, and benefit from the experience vendors had gained on other installations. We extended the same simple opportunity sizing and prioritisation model to all infrastructure spend, both BAU and for specific projects. We introduced a capital expenditure (Capex) board that brought these funding elements together and made sure we got the best vendor deals in the context of ‘bundling’.
Step 6: Processes
On both the EDCO and LBG optimisations in recent years I recommended making decommissioning part of the project lifecycle, so that all programme funding estimates had to contain a percentage of the new infrastructure spend as a decommission cost in the estimate and business case presentations. By capturing decommissioning estimates at the pre-commencement or commencement stage, the Data Centre Management (DCM) teams could get the earliest possible overview. Additionally, platform forecast spend and capacity plans could be worked out, and future data centre demand set and plans laid accordingly. This data, along with actual usage from meter readings, gave the teams the best chance of planning space and power use efficiently, rather than just being reactive. Finally, I developed the challenge and mantra for all DCM staff that ‘nothing goes in, unless something comes out’. This reinforced that decommissioning remained mandatory. Leaving old power-hungry systems and redundant DR solutions in place, doing nothing or replicating effort, was simply not an option.
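The funding rule above — every estimate must carry a decommission line as a percentage of new infrastructure spend — can be sketched like this. The 10 per cent rate is an illustrative assumption; the article does not state the percentage actually mandated.

```python
# Hedged sketch: building a decommissioning line into every programme
# funding estimate as a fixed share of new infrastructure spend.
# The 10% rate below is an assumed example, not a figure from EDCO/LBG.

def project_estimate(infrastructure_spend, decommission_rate=0.10):
    decommission = infrastructure_spend * decommission_rate
    return {
        "infrastructure": infrastructure_spend,
        "decommissioning": decommission,
        "total": infrastructure_spend + decommission,
    }

print(project_estimate(1_000_000))
# {'infrastructure': 1000000, 'decommissioning': 100000.0, 'total': 1100000.0}
```

Because the decommission cost is captured at the business-case stage, it can never be quietly dropped later as a "savings" exercise, which is what keeps the ‘nothing goes in, unless something comes out’ mantra enforceable.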