Capacity-on-Demand (Part 2 of 2)

It’s important to acknowledge that not all servers in a data centre run at full capacity. Peak loads are rarely sustained, and they seldom consume the extra headroom some system administrators provision to preempt server crashes during load spikes. What if we could harvest and repurpose 20% of such idle capacity from a 1,000-server farm while enhancing service levels and adding value?

Server Virtualization

Many daily activities in a data centre involve moving (servers), adding (servers), and changing (server configurations), commonly known as MAC (Move, Add, Change) operations. These seemingly routine tasks become increasingly prevalent and complex in many large enterprises with a growing array of operating systems, databases, web and application services, and geographically dispersed data centres.

Virtualization turns hardware setup into software configuration by slicing physical hardware into multiple programmable servers, each with its own CPU, memory, and I/O. Once automated, this software-defined work incurs little additional labour cost, allowing MAC activities to scale swiftly, accurately, and cost-effectively, largely free of physical boundaries.
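As an illustration only, the sketch below models MAC operations as plain software calls against an in-memory inventory; the class and function names are hypothetical stand-ins for whatever API or tooling a real virtualization platform exposes.

```python
# Illustrative only: a hypothetical in-memory inventory showing how MAC
# (Move, Add, Change) operations become simple software calls once servers
# are defined in software rather than racked hardware.
from dataclasses import dataclass

@dataclass
class VirtualServer:
    name: str
    vcpus: int
    memory_gb: int
    datacentre: str

inventory: dict[str, VirtualServer] = {}

def add(name: str, vcpus: int, memory_gb: int, datacentre: str) -> None:
    """Add: provision a new virtual server from the shared resource pool."""
    inventory[name] = VirtualServer(name, vcpus, memory_gb, datacentre)

def change(name: str, *, vcpus: int | None = None, memory_gb: int | None = None) -> None:
    """Change: resize CPU or memory without touching physical hardware."""
    server = inventory[name]
    server.vcpus = vcpus or server.vcpus
    server.memory_gb = memory_gb or server.memory_gb

def move(name: str, new_datacentre: str) -> None:
    """Move: relocate the workload to another site as a metadata update."""
    inventory[name].datacentre = new_datacentre

add("web-01", vcpus=4, memory_gb=16, datacentre="DC-East")
change("web-01", vcpus=8)          # scale up ahead of a load peak
move("web-01", "DC-West")          # relocate during maintenance
```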

Virtualization underpins a significant shift in data centre operations:

Firstly, we no longer need to oversize servers, knowing that CPU, memory, and storage resources can be dynamically adjusted. This doesn’t diminish the importance of proper capacity sizing, but it does eliminate the psychological “more is better” effect.

Secondly, we no longer need to panic when a server suffers from the infamous “crash of unknown cause.” A hot or cold standby server, utilizing harvested resources, can quickly minimize user impact.

Thirdly, cloning a server becomes effortless, especially when enforcing the same security settings across all servers, minimizing human oversights.

Fourthly, virtualization provides a kill switch during a suspected cyberattack: a snapshot of the server and its memory map can be taken for forensic purposes before the machine is shut down to contain the exposure.
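As a minimal sketch of this fourth point, the snippet below shows only the order of operations: capture evidence first, then power off. The hypervisor client and its method names are hypothetical placeholders for whatever snapshot and power-off calls a real platform provides.

```python
# Illustrative only: the "kill switch" sequence described above. The hypervisor
# client and its method names are hypothetical stand-ins.
def contain_suspected_compromise(hypervisor, vm_name: str) -> str:
    # 1. Capture disk state and memory map first, while evidence is still live.
    snapshot_id = hypervisor.snapshot(vm_name, include_memory=True,
                                      label="forensic-" + vm_name)
    # 2. Only then cut the server off to contain the exposure.
    hypervisor.power_off(vm_name, force=True)
    # 3. Hand the snapshot reference to the incident-response team.
    return snapshot_id
```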

Workstation Enablement

High-end workstations are typically reserved for desktop power users who work with large datasets in tasks like data modelling, analytics, simulation, and gaming. Thanks to significant advancements in chip technology, virtualization has gained substantial traction in high-performance computing (HPC). This allows more desktop users to have workstation capabilities and provides ready-to-use specialized HPC software, such as MATLAB, SPSS, and AutoCAD, maintained centrally without the hassle of per-unit installation. Both CPU- and GPU-intensive workloads are processed in the data centre, with screen updates, for example, transmitted back to the user on a lightweight desktop computer. Achieving decent performance largely depends on sufficient desktop bandwidth, with a minimum of 1 Gbit/s based on my experience, assuming the enterprise has ample bandwidth within the data centre.
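A rough back-of-envelope, using assumed figures for resolution, frame rate, and compression, suggests why roughly 1 Gbit/s to the desktop is a comfortable floor for this kind of remote workstation use:

```python
# Back-of-envelope only: the resolution, frame rate, and compression ratio
# below are illustrative assumptions, not measured figures.
width, height = 1920, 1080      # full-HD screen
bits_per_pixel = 24
frames_per_second = 30
compression_ratio = 10          # assumed ratio for a remote-display protocol

raw_bps = width * height * bits_per_pixel * frames_per_second
compressed_bps = raw_bps / compression_ratio

print(f"Uncompressed screen stream: {raw_bps / 1e9:.2f} Gbit/s")
print(f"With {compression_ratio}:1 compression: {compressed_bps / 1e6:.0f} Mbit/s")
# ~1.49 Gbit/s raw, ~150 Mbit/s compressed: a 1 Gbit/s desktop link leaves
# headroom for multiple displays, peripherals, and other traffic.
```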

Network Virtualization

Computer networking primarily involves switching and routing data packets from source to destination. It seems simple, except when addressing MAC activities such as firewalling a group of servers at dispersed locations for a business unit dealing with sensitive data or filtering malicious traffic among desktops. The proliferation of IoT devices and surveillance cameras with delayed security patches only exacerbates the situation.

By creating logical boundaries at Layer 2 for data switching or Layer 3 for data routing among the servers in the data centre, users’ desktops, or specialized devices, one can easily insert either a physical or software-based firewall into the data path to protect workloads.
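Purely as an illustration, the toy policy below captures the idea: group dispersed servers and devices into logical zones, switch freely within a zone, and steer cross-zone traffic through a firewall. The zone names and rules are hypothetical; a real deployment would express this in the network or firewall platform’s own policy language.

```python
# Illustrative only: a toy policy table showing logical boundaries.
ZONE_OF = {
    "hr-db-01": "sensitive",      # dispersed servers grouped by role, not location
    "hr-db-02": "sensitive",
    "web-01": "general",
    "cam-lobby-03": "iot",        # cameras and IoT devices kept in their own zone
}

# Cross-zone traffic is steered through a firewall; same-zone traffic switches freely.
INSPECT = {("general", "sensitive"), ("iot", "general"), ("iot", "sensitive")}

def path_for(src: str, dst: str) -> str:
    src_zone, dst_zone = ZONE_OF[src], ZONE_OF[dst]
    if src_zone == dst_zone:
        return "switch directly (Layer 2)"
    if (src_zone, dst_zone) in INSPECT or (dst_zone, src_zone) in INSPECT:
        return "route via firewall (Layer 3)"
    return "route directly (Layer 3)"

print(path_for("web-01", "hr-db-01"))      # route via firewall (Layer 3)
print(path_for("cam-lobby-03", "web-01"))  # route via firewall (Layer 3)
```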

Crucial Requirement

While the Cloud and virtualization offer similar agility within modern IT, staff expertise in network and system architecture remains the most crucial requirement for successful implementation and for realizing the benefits. It is timely for enterprises to incorporate Generative AI into their technology workforce, allowing staff and the tools to learn and grow together, promoting knowledge retention and transfer.

Capacity-on-Demand (Part 1 of 2)

Digital agility is of utmost importance in modern business, encompassing speed and responsiveness. For instance, a rapid turnaround to address an infrastructure bottleneck, a quick resolution of erroneous code, a prompt diagnosis of user-reported issues, or an immediate response to contain a cyberattack would undoubtedly be appealing. Nonetheless, achieving agility in a large enterprise is no easy task, and these efforts can be hampered by a risk-averse corporate culture, untimely policies, and gaps in staff competency.

I define Capacity-on-Demand as an organization’s ability to scale up digital capacity, specifically focusing on infrastructure capacity in this post, as and when it is required. A highly versatile, high-performing, and secure infrastructure is a crucial asset for any enterprise, with strict uptime and performance requirements often committed as service levels to their business partners by Enterprise IT.

However, this arrangement works well only while the operating environment remains unchanged. As usage increases, businesses modernize, technologies become obsolete, and maintenance costs for aging equipment escalate, many enterprise technology chiefs face the due diligence of upgrading their infrastructure approximately once every five years to keep up with user demand and application workloads.

But what alternatives exist when this upgrade entails an intensive capital outlay for a system likely to be useful for only 60 months? Even with the blessing of new investment, the epic effort of commissioning the major upgrade, including technical design, prototyping, specifications, installation, and other administrative overheads, may amount to a woeful 18 months or more. The Return on Investment (ROI) in such a scenario is utterly inefficient!
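A quick calculation with the figures above, assuming the 18-month commissioning effort eats into the 60-month useful life, shows how little of each investment cycle actually delivers value:

```python
# Using the figures above: a 60-month useful life with an 18-month
# commissioning effort (assumed to overlap with that life).
useful_life_months = 60
commissioning_months = 18

productive_months = useful_life_months - commissioning_months
utilisation = productive_months / useful_life_months

print(f"Productive months per cycle: {productive_months}")                 # 42
print(f"Share of the cycle actually delivering value: {utilisation:.0%}")  # 70%
```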

Cloud Storage

From mirror copies to backup and archival copies of enterprise data, meeting operational and legal requirements necessitates provisioning nearly triple the storage capacity for every unit increase in data volume. In a large enterprise, this total can amount to tens of petabytes or more. Dealing with such large-scale and unpredictable demands often leads us to consider Cloud storage. It offers elasticity and helps reduce the data centre footprint. However, it also assumes there are no legal implications for data residency, and the organization must be willing to accept less desirable contract terms on service levels, data privacy, liability, indemnity, security safeguards, and exit clauses.
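The multiplier effect is easy to quantify. Taking an assumed 10 PB of annual primary data growth purely for illustration:

```python
# Every unit of primary data growth pulls along mirror, backup, and archival
# copies. The 10 PB growth figure is an illustrative assumption.
primary_growth_pb = 10              # assumed annual growth of primary data
copies = {"primary": 1.0, "mirror": 1.0, "backup + archive": 1.0}

total_pb = primary_growth_pb * sum(copies.values())
print(f"Raw capacity to provision for {primary_growth_pb} PB of new data: "
      f"~{total_pb:.0f} PB")        # ~30 PB, i.e. nearly triple
```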

Storage Leasing

Storage leasing presents a viable alternative if you possess economies of scale, a mid- to long-term horizon, and a fairly accurate but non-committal year-by-year growth prediction over the contract period. These considerations are crucial for a cost-effective proposal.

Similar to Cloud storage, storage leasing helps alleviate capital constraints and smooths out lumpy expenses in the budget plan over the years, an approach preferred by some finance chiefs. Additionally, you have the option to choose between a finance lease with asset ownership and an operating lease that saves the tedious effort of asset keeping.
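As a simple illustration of the smoothing effect, with an assumed equipment cost, lease term, and financing premium:

```python
# Illustrative only: how leasing smooths a lumpy capital outlay. All figures
# (cost, term, financing uplift) are assumptions for the sake of the example.
purchase_cost = 5_000_000          # one-off capital outlay, year 1
lease_term_years = 5
lease_uplift = 1.10                # assume ~10% financing premium over the term

annual_lease_payment = purchase_cost * lease_uplift / lease_term_years

print(f"Capital purchase: ${purchase_cost:,.0f} in year 1, then $0")
print(f"Operating lease:  ${annual_lease_payment:,.0f} per year for "
      f"{lease_term_years} years")
```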

Sporadic Demands

Despite the forecasted storage growth rate, addressing urgent demands at short notice necessitates pre-provisioning spare capacity onsite without activating it. I used to include such requirements in the leasing contract at a fraction of the total cost, which provided the option to turn the capacity on and off as needed or to normalize it as part of the forecasted growth; the latter approach prevailed in my previous environment.

Access Speed

Does access speed to the Cloud differ from onsite storage? It is a rather complex assessment. Apart from factors like drive technologies, data transfer protocols, and cache size, onsite storage in an end-user environment, where users and employees are mostly located within the enterprise, generally provides a better user experience because speed is not limited by Internet bandwidth. Additionally, we should consider the nature of the data, which nowadays is predominantly machine-generated, such as transaction logs, user access records, and security events. These voluminous, real-time data streams are latency-sensitive and consume much of the Internet bandwidth, so the workloads that generate them are best located closest to the storage.
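A back-of-envelope comparison, with an assumed daily log volume and assumed link speeds, shows the scale of the difference before latency even enters the picture:

```python
# Back-of-envelope only: why keeping voluminous machine-generated data close to
# its storage matters. The daily log volume and link speeds are assumptions.
daily_logs_tb = 1.0
internet_gbps = 1.0        # assumed shared Internet breakout to Cloud storage
datacentre_gbps = 25.0     # assumed in-data-centre storage network

def hours_to_transfer(volume_tb: float, link_gbps: float) -> float:
    bits = volume_tb * 8e12            # 1 TB ~ 8e12 bits
    return bits / (link_gbps * 1e9) / 3600

print(f"Over the Internet link: {hours_to_transfer(daily_logs_tb, internet_gbps):.1f} h/day")
print(f"Inside the data centre: {hours_to_transfer(daily_logs_tb, datacentre_gbps):.2f} h/day")
# ~2.2 hours of a fully saturated 1 Gbit/s Internet link per day, versus minutes
# on the internal storage network.
```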

Storage Operations

Equipping the workforce with the necessary expertise and knowledge of proprietary tools to manage and operate Cloud or onsite storage is crucial. Cloud storage offers ease of provisioning and management, including storage provisioning, backup and recovery, and site redundancy. However, I am hesitant about operating a black box in a heterogeneous environment without understanding its internal dynamics and without a plan for skill transfer. Storage is a significant component of the entire enterprise technology stack, and highly committed, collaborative effort from the storage provider is essential for planning and successfully executing drills and post-reviews, and for avoiding a “not my problem” syndrome.

Onsite storage entails more technical management overhead than the Cloud. One can include the required expertise and make provisions in the contract to support the adopted solution; the service provider, backed by the principal, will have the most experienced personnel to support your organization. Once again, we must not overlook the importance of having a plan for skill transfer.

Mutual Trust

Technology leasing is not a novel concept. The key is to customize a contract to bridge the gap left by the Cloud. The initial journey may encounter challenges, but with shared goals and mutual trust, it can lead to a long-term win-win partnership. Throughout my experience, I have utilized both Cloud and onsite storage, ranging from file storage to block and object storage, and transitioning from SCSI to Fibre Channel connectivity and finally to all-flash drives, to meet my needs. At the end of each contract, there was a comprehensive review of overall service performance and upcoming technologies, resulting in reduced data centre space and energy footprint, as well as a lower per-terabyte cost for the next phase of development. This approach also provides the right opportunity to give a new lease of life to the storage infrastructure.

Next Post

On-demand provisioning is far from complete without the agile provisioning of server and network capacity, which I will cover in the next post.

*Post is copyedited by ChatGPT, https://chat.openai.com/chat