Drills For Cybersecurity Fitness

Not just in personal finance but also in cybersecurity, the adage about making hay while the sun shines holds true. Cyber threats are persistent yet often remain concealed until they strike. In such unfortunate moments, the safeguards you expected to work have failed, and your secure coding checklist was not followed. To make matters worse, a standby service has failed to activate, the log files that would have offered crucial insights into the attack have gone missing, and reporters who have caught wind of the incident keep reaching out for details. Such an episode could begin with a seemingly benign web defacement and escalate into a massive SQL Injection attack, compromising sensitive data, causing prolonged service outages, and leaving the timeline and sources of the attack uncertain despite painstaking investigation.

Many companies invest heavily in cybersecurity but pay less attention to its ongoing operations. This is not by choice but due to prioritization constraints. Tech staff often face hectic schedules and long working hours. One moment, we are racing against time to meet project deadlines; the next, we are scrambling to recover from system outages. On quieter days, we conduct training sessions, attend sales pitches, experiment with emerging technologies, and endure the monotony of meetings. If any time remains, we work on procurement tenders, technical documentation, and reports, and assist with recruitment. After getting home late, we clear urgent emails, get a few hours of rest, and repeat the routine the next day. As a result, aspects of cybersecurity that are not immediately pressing are often neglected. Over time, we let our guard down, allowing busyness to take over our minds.

Just as regular exercise is essential for maintaining physical health, consistent drills are necessary to ensure cybersecurity fitness. These drills help confirm that all safeguards, processes, and alerts remain effective and operate as intended. Most importantly, they allow businesses, staff, users, and vendors to identify missteps, validate assumptions made since the last review, and be ready to act should a crisis erupt.

Penetration Test

The idea behind a penetration test (pentest) is to identify our own vulnerabilities before our adversaries do. The tools allow us to scan the entire network, mapping out hosts, operating systems, protocols, services, and versions in use, and critically, to uncover any shadow IT operating without our knowledge. With scripting, we can simulate common attacks like brute-force attempts on websites and databases. The tools can also check hosts against a list of Common Vulnerabilities and Exposures (CVEs), covering weaknesses such as SQL Injection and Cross-Site Scripting, and alert us if our hosts are indeed vulnerable. Over time, completed pentests and their checklists can serve as a key performance indicator for the enterprise, tracked yearly.
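To make this concrete, here is a minimal sketch in Python of the kind of service discovery a pentest tool automates; the target hosts and port list are illustrative, and you should only ever scan networks you are authorized to test.

```python
# Minimal sketch of the service discovery a pentest tool automates.
# Hosts and ports are illustrative placeholders; scan only networks
# you are authorized to test.
import socket

TARGETS = ["10.0.0.10", "10.0.0.11"]  # hypothetical in-scope hosts
COMMON_PORTS = {22: "ssh", 80: "http", 443: "https", 3306: "mysql"}

def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in TARGETS:
    found = [name for port, name in COMMON_PORTS.items() if probe(host, port)]
    # Hosts answering on unexpected ports are candidates for shadow IT.
    print(host, "->", found or "no common services responding")
```

Dedicated scanners such as Nmap layer OS fingerprinting and version detection on top of this basic connect probe.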

Red Teaming

Unlike penetration testing, which aims to uncover vulnerabilities, Red Teaming simulates a threat actor’s thought process to target specific assets, whether personal data, intellectual property, critical services, or privileged access for financial gain. This exercise tests various methods and pathways to bypass corporate defenses, evade surveillance systems, and exploit both system and human vulnerabilities within an organization, all while leaving no trace.

For example, knowing that remote reconnaissance might be blocked by corporate firewalls, the Red Team could exploit exposed remote access services to establish a covert foothold. This allows them to bypass perimeter defenses and push the attack further, such as harvesting login credentials and creating shadow accounts with elevated privileges to exfiltrate sensitive data. In another scenario, the Red Team might study the organization’s hierarchy, impersonating a procurement officer to submit fraudulent purchase orders to suppliers or posing as a newly hired senior executive to trick employees for financial gain.

From denial-of-service attacks to DLL sideloading, DNS poisoning, identity theft, ransomware, social engineering, spear phishing, and SQL injection, Red Teaming adapts its attack vectors to bypass specific defenses. As cyber threats evolve rapidly, it is advisable to engage external Red Team services, as these vendors bring a wealth of experience and up-to-date industry knowledge.

Phishing Drills

Phishing attacks are inexpensive to launch but highly effective. They exploit human emotions such as fear, greed, empathy, and curiosity. A single inadvertent click or screen touch can lead to disastrous consequences for an organization, and fingertip access to Generative AI and deepfakes has only made things worse. While technology can provide some level of protection, regular drills and user education are far more effective in mitigating human error. But is there supporting data?

A well-designed phishing drill can help test several assumptions. First, does the relevance of the drill’s theme affect the likelihood of falling prey? For instance, general staff may be quick to click on an announcement about pay structures, while healthcare professionals might be more concerned with changes in patient care regulations. Second, do regular reminders from corporate leadership help reduce phishing click rates? Third, are employees who are subjected to regular drills less susceptible than those in a control group?

The results from previous drills I’ve experienced were encouraging. Staff members were particularly vulnerable to phishing emails related to organizational matters, with a 24% fall-prey rate compared to just 8% for other themes. The drills themselves were highly effective, reducing the click rate to 15% for those who received two rounds of practice, compared to 18% for the control group. However, management intervention, such as reminders from corporate leadership, did not significantly reduce the click rate.
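For those who want to check whether differences like these are statistically meaningful, a two-proportion z-test is a reasonable first tool. Here is a minimal sketch using only Python's standard library; the group sizes are hypothetical, since only the rates are reported above.

```python
# Two-proportion z-test sketch for comparing phishing click rates.
# Group sizes below are hypothetical; the post reports only the rates.
from math import sqrt, erf

def z_test(clicks_a: int, n_a: int, clicks_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in click rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal tail
    return z, p_value

# E.g., 15% of 1,000 drilled staff clicked vs 18% of 1,000 controls.
z, p = z_test(150, 1000, 180, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # small p (say < 0.05) suggests a real effect
```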

That said, phishing drills aren’t without challenges. They can lead to resentment or erode trust among staff. Still, they remain a worthwhile exercise as they address the reality that individuals are often the weakest link in an organization’s security.

Tabletop Exercise

A cyber breach can cause far-reaching damage to an organization beyond just its infrastructure and systems. This includes business disruption, potential privacy violations, financial and reputational losses, legal claims, regulatory penalties, and more. With so much at stake, incident response should not be confined to the tech team alone but must also involve business partners and corporate leadership, including the heads of communications and legal.

In the event of a breach, time is of the essence. The tech team must sift through vast amounts of data and devices to identify the source of the attack and neutralize it. The situation can quickly become chaotic, with team members rushing into action from all directions, calling for additional resources, deciding on the best course of action, issuing public communications, and updating users and board executives, all while new findings and hypotheses continue to emerge.

Infrequent though they may be, cyber breaches can leave both tech and corporate leadership unprepared. Some team members may be unclear about their roles, while others could be distracted by irrelevant system issues. This is where a tabletop exercise becomes invaluable. By working through realistic scenarios, the interdisciplinary incident response team can familiarize themselves with their roles, actions, procedures, and responses in the event of a breach. Only when our response becomes as automatic as a muscle reflex can we contain an attack and minimize damage as quickly as possible.

Finally, with regular drills, we will be prepared to defend, recover quickly, and minimize losses in the event of a breach.


*Copyedit: ChatGPT

Cyber Safeguards Against Human Lapses

Whether you are a novice or a seasoned professional, one of the great advantages of working in cybersecurity is the availability of well-established industry frameworks and standards. These include concepts like Zero Trust Security, Secure by Design, Defense in Depth, and guidelines from the National Institute of Standards and Technology (NIST). These frameworks offer excellent guidance on security strategies, designs, and practices. However, selecting the most effective safeguards for your specific organization requires significant on-the-job experience and, possibly, lessons learned from past headline-making incidents.

Some organizations, driven by a fear of cyber threats, may choose to implement as many safeguards as their budget allows. Others might assess their risk profile, quantifying the likelihood and impact of various attack scenarios to determine the essential protections. However, neither approach is ideal. An excess of defensive solutions can create a false sense of security, expand the attack surface, and unnecessarily inconvenience users. Conversely, focusing solely on risks from potential attacks can overlook the critical human factor.

Humans are often the weakest link in cybersecurity. End users with poor cyber hygiene, who are unaware of phishing attempts and security patches, or who continue to use outdated software, pose a constant threat. However, it’s important to recognize that even the most seasoned tech staff can be part of this vulnerability. Network engineers, server administrators, and software developers are only human, and a momentary lapse in judgment could lead to dire consequences. These individuals often hold privileged credentials, which, if compromised, could allow attackers to create backdoors into servers, inject malware, or crack user passwords across the organization.

In my decades of experience as a tech chief, I have seen that most malicious attacks require exploiting three key assets: the corporate network, a login identity, and a device. Unfortunately, many organizations fall victim to attacks because the human element, which serves as the first line of defense, is inadvertently compromised.

Capitalizing on Private IP Addresses

Personal computers, by design, are often viewed as devices that individuals can use with minimal restrictions. This includes the freedom to share folders and files, and to run freeware and shareware. However, the influx of thousands of “install-and-forget” Internet-of-Things (IoT) sensors and personal smart gadgets into corporate networks has made it increasingly difficult to strike a balance between mitigating endpoint exposures and providing a user-friendly experience.

One effective protection against attacks, particularly Zero-Day threats from the Internet, is the use of private IP addresses. These addresses are not routable on the public Internet, meaning that servers, applications, desktops, IoT devices, and other resources within this address space are not directly reachable from the outside. This effectively blocks malicious probes and connection requests from ever reaching them.
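Python's standard ipaddress module makes this property easy to audit. A minimal sketch, with an illustrative address list:

```python
# Sketch: flag endpoints exposed on publicly routable addresses.
# The address list is illustrative.
import ipaddress

endpoints = ["10.1.2.3", "172.16.0.7", "192.168.1.20", "8.8.8.8"]

for addr in endpoints:
    ip = ipaddress.ip_address(addr)
    # RFC 1918 ranges (10/8, 172.16/12, 192.168/16) report is_private == True.
    if ip.is_private:
        print(addr, "-> private, unreachable from the Internet")
    else:
        print(addr, "-> publicly routable, review its exposure")
```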

Enforcing Network Admission Control

Internally, the combination of user lapses and the sheer scale of desktop computers presents significant risks. It’s not uncommon to find misconfigured folders with open access to sensitive data, outdated software with known vulnerabilities, or desktops lacking the anti-malware provisions that should have been in place from day one.

With Internet ingress heavily guarded by firewalls and virtual private networks, adversaries often target users’ desktops as a soft entry point into the enterprise. Known as lateral movement in cybersecurity, this tactic allows attackers to conduct reconnaissance, exploit identities, escalate privileges, and eventually target high-value resources.

To mitigate user lapses, it’s crucial to limit users’ rights to make indiscriminate changes to their desktops. If this isn’t feasible administratively, Network Admission Control (NAC) should be adopted to enforce compliance before allowing any desktop to connect to the network. The enrollment process should ensure that all legitimate and authenticated devices are registered centrally. Upon user login, NAC checks the device against a pre-qualified list of cybersecurity requirements, flagging violations such as excessive rights or signs of infection. This is particularly valuable in complex environments with multiple operating systems, hardware, and software profiles.
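As a rough illustration of that compliance gate, here is a sketch in Python; the device attributes and policy checks are hypothetical, not any vendor's actual NAC rule set.

```python
# Sketch of the compliance gate a NAC system applies at connection time.
# The device record and policy checks are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class Device:
    mac: str
    registered: bool       # enrolled in the central registry
    patch_current: bool    # OS and anti-malware signatures up to date
    admin_rights: bool     # user holds local admin rights
    infection_flags: int   # alerts raised by the endpoint agent

def admit(dev: Device) -> bool:
    """Return True only if the device passes every pre-qualified check."""
    return all([
        dev.registered,            # unknown devices are quarantined
        dev.patch_current,         # stale patches are a soft entry point
        not dev.admin_rights,      # excessive rights violate policy
        dev.infection_flags == 0,  # signs of infection block admission
    ])

laptop = Device(mac="aa:bb:cc:dd:ee:ff", registered=True,
                patch_current=False, admin_rights=False, infection_flags=0)
print("admitted" if admit(laptop) else "quarantined to a remediation VLAN")
```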

Automating Security Patches and Configurations

A moment of human error can be costly for an enterprise. Relying too heavily on memory, written standard operating procedures (SOPs), or common practices often falls short when it comes to addressing anomalies. Server administrators are inundated with software updates, bug fixes, security patches, and configuration changes daily. A missed patch on one of thousands of servers might go unnoticed until it’s too late, especially if that server was supposed to be taken offline months ago but becomes the initial point of entry for a lateral movement attack.

With frequent server additions, removals, and configuration changes, it’s essential for server administrators to maintain continuous visibility of all servers, be promptly alerted to security patches and dubious changes, and have confidence in an accurate asset list for remediation.

Patch and Configuration Management (PCM) automates asset tracking, checks for pending software updates and security patches, and applies remediation. As with any automation, it’s crucial to establish a process with identified control points before implementing the tools around it. In the case of PCM, ensuring that the enterprise keeps an up-to-date server inventory is pivotal to the overall cybersecurity operation.
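At its core, that pivotal inventory check reduces to reconciling two sets: what we believe we run versus what a discovery scan actually finds. A minimal sketch, with illustrative server names:

```python
# Sketch: reconcile the authoritative server inventory against discovery.
# Server names are illustrative.
inventory = {"app-01", "app-02", "db-01", "db-02"}       # what we think we run
discovered = {"app-01", "app-02", "db-01", "legacy-07"}  # what the scan found

unknown = discovered - inventory  # e.g. a server meant to be decommissioned
missing = inventory - discovered  # assets offline or hidden from the scan

print("investigate (possible rogue or forgotten servers):", sorted(unknown))
print("verify (inventory entries not responding):", sorted(missing))
```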

Locking Up Privileged Credentials

Most user access to corporate resources is now protected by two-factor authentication (2FA). While not perfect due to risks like phishing and lifelike login pages, 2FA is still a reasonable safeguard for general user logins. However, when it comes to privileged credentials with full control and access over databases, log files, memory dumps, and the ability to spawn new processes across all servers, the stakes are much higher.

Integrating 2FA for privileged credentials in a heterogeneous environment, with a mix of third-party cloud applications, proprietary core business management software, and network and security appliances, is not always straightforward. Furthermore, the human factor often comes under scrutiny during audits. For example, should admin credentials be disabled when idle? Are there improper uses when there is no record of access? Should credentials be changed after each use? With staff turnover, disgruntled employees, and operational lapses, audits rightfully highlight the need for action.

Like a bank managing deposits and withdrawals, organizations should use automation and tools to secure privileged credentials and allow access only upon approval. These tools can enforce audit trails, check out privileged credentials, set time limits for use, check them in upon expiry, and change passwords without the tech staff’s knowledge. Effectively, nobody should have access to privileged credentials unless cleared through the control process.
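A minimal sketch of that check-out/check-in cycle follows. Real privileged access management products add approval workflows, session recording, and tamper-proof audit trails, so treat this purely as an illustration of the control process.

```python
# Sketch of a privileged-credential vault's check-out/check-in cycle.
# Everything here is a simplified, hypothetical model.
import secrets
import time

class Vault:
    def __init__(self):
        self._passwords = {}  # account -> current secret
        self._leases = {}     # account -> (user, expiry epoch)
        self.audit = []       # append-only audit trail

    def check_out(self, account: str, user: str, ttl_s: int = 900) -> str:
        if account in self._leases:
            raise PermissionError("credential already checked out")
        self._leases[account] = (user, time.time() + ttl_s)  # time-limited lease
        self.audit.append((time.time(), user, "check_out", account))
        return self._passwords.setdefault(account, secrets.token_urlsafe(16))

    def check_in(self, account: str, user: str) -> None:
        self._leases.pop(account, None)
        # Rotate the password so the value the human saw is now useless.
        self._passwords[account] = secrets.token_urlsafe(16)
        self.audit.append((time.time(), user, "check_in+rotate", account))

vault = Vault()
pw = vault.check_out("root@db-01", "alice", ttl_s=600)
# ... alice performs the approved maintenance ...
vault.check_in("root@db-01", "alice")
```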

Final Thoughts

From the boardroom to the executive suite, very few would argue against investing in cybersecurity. However, one provocative thought I encountered is that even the top companies by market capitalization, despite significant cybersecurity investments, still get hacked. My response? The key to success isn’t just how much you spend, but the people on the job—those who can make or break your security efforts.

Ultimately, the effectiveness of cybersecurity lies not just in technology but in the people who implement, manage, and use it.





*Copyedit: ChatGPT

Just Too Many Digital Chiefs

Just as a medical specialist provides in-depth, expert care in a specific area, the tech industry has seen a similar specialization shake-up in recent times, resulting in a plethora of high-sounding titles such as Chief Analytics Officer (CAO), Chief Artificial Intelligence Officer (CAIO), Chief Data Officer (CDO), Chief Digital Transformation Officer (CDTO), Chief Information Officer (CIO), Chief Information Security Officer (CISO), Chief Knowledge Officer (CKO), Chief Machine Learning Officer (CMLO), and Chief Technology Officer (CTO). This trend is ongoing, as evidenced by the myriad of executive programs offered by Ivy League colleges and training schools for those keen to qualify.

Rapid tech advancement has caught many enterprises off guard. The surge of chief titles like CAIO and CMLO appears to be a knee-jerk reaction to the phenomenal growth of generative AI. In the past few years, many CISO appointments were fast-tracked to comply with regulatory mandates in some parts of the world requiring a dedicated chief for cybersecurity amidst escalating cyber breaches and privacy invasions. On the other hand, the once in-demand CKO hiring of the late 1990s is fast fading, likely ousted by the CDO and CAO amid a shifting focus to big data and analytics. Lastly, the de facto tech chief, the CIO, has seen its technology portfolio largely taken over by the CTO, often to allow a dedicated focus on technology.

Obviously, we do not need a management professor to tell us that too many chiefs without a chief of the chiefs would be a grave mistake in corporate governance. For instance, should the CISO be accountable for the security of an AI system? Intuitively, yes, provided the CISO has veto power over the AI, because accountability requires control. From frivolous data to business insights and invaluable knowledge, should the CKO be rejuvenated and made responsible for all these seemingly discrete domains, thus offloading responsibilities from, and right-sizing, the CIO and CDO? Ironically, does the CDTO really fit the bill of a digital chief with goals to transform the business? Realistically, must all the chiefs bear the same titles and compensation if their job sizes differ?

Nobody would argue if the Chief Executive Officer (CEO) were to be the overall digital chief, given how tech has been transforming industries and businesses. Being a level closer to the head of the organization allows for more direct communication, peer-level brainstorming, and faster decision-making. However, this is impractical given the CEO’s day-to-day management chores. For non-tech, non-profit, and end-user enterprises, IT is mostly a tool rather than a strategy, and an expense rather than an investment, which hardly creeps into the CEO’s KPIs (Key Performance Indicators). It also takes more than a tech-savvy CEO to oversee the work among the digital chiefs, dealing with operational issues and personnel conflicts.

It is an opportune time to rehash the chiefs’ departments if you have close to double digits of digital chiefs, especially when some have no direct reports. The CIO role debuted in 1980 and the CTO in 1990, by which time the first batch of CIOs had already been functioning well for a decade before relinquishing their technology function to the CTO. The CIO nomenclature has suffered from a birth defect: a missing specific, Technology, despite it being a substantial part of the role. Given the continuous advancement of, and escalating reliance on, technology, it makes perfect sense for a new chief function, the Chief Information Technology Officer (CITO), to take on both portfolios. In fact, the CITO role has emerged in recent years as a response to the increasing importance of technology in organizations, likely evolving from the CIO and CTO roles.

There are CISOs reporting to an independent entity, such as the Board, CEO, or a corporate chief on risk management, citing autonomy without being undermined by the CIO or any other chief. Unlike audits, the CISO is not an inspect-and-control function; it is the inherent cybersecurity knowledge and skills that are most valued. The CISO should be an integral part of the CIO department, incorporating security design and operating requirements into any tech development. The CISO should also be the party to endorse tech implementation and operational changes. Checks and balances can be achieved through independent audits, external consultancy, and certifications like ISO 27001 Information Security Management System.

Data does not lie but stops short of saying anything if it is not clean. Like clean water to humans, pristine data is the lifeline of AI and of the CAIO, CDO, CAO, and CMLO, even though each takes a different spin on it. The CDO should define the relevant policies for data ownership, cleansing, protection, sharing, and retention, and should govern and coordinate efforts among the business units to ensure compliance and resolve disputes. Separately, the CAO focuses on data analytics, using tools like Excel, Python, SQL, and SPSS to justify business actions and decisions and subsequently measure performance. Raw data is akin to unrefined ore: abundant and holding potential value, but lacking clarity and insight in its unprocessed state. Combining the CDO and CAO functions into a Chief Data and Analytics Officer (CDAO) provides oversight and management controls for transforming raw data into valuable insights.

The CMLO, equipped with strong mathematics, statistics, and coding knowledge, builds algorithmic models for applications such as generative AI, behavior analysis, and pattern recognition. The CAIO, with a similar background, spearheads AI direction, strategies, ethical use, and staff training across the entire enterprise. It is an ecosystem where the chiefs interact and work to embed AI seamlessly in all business functions.

In the context of the CDTO, the latest kid on the block, Tech and Digital are not interchangeable. As the name implies, digital transformation aims to modernize the business by leveraging progressive tech advancements. Transformation is disruptive, often requiring mindset changes, new learning, and critical thinking to debureaucratize the organization. Besides possessing the necessary business acumen, having a clear mandate and the authority to make decisions is crucial for effectively addressing and overcoming objections. The emergence of the CDTO is timely, fueled by attainable technologies such as Cloud, RPA (Robotic Process Automation), and next-generation ERP (Enterprise Resource Planning), and by the prevalence of BPO (Business Process Outsourcing), which enable businesses to own their transformation.

Except for the CDTO, all tech chiefs have either a share of operational duties or a high stake in them. In a unified approach, tech-related activities such as strategic planning, manpower forecasting, and budgeting should be integrated and coordinated across the enterprise, rather than being siloed among separate digital chiefs. This collaborative approach ensures alignment, efficiency, and effective resource allocation, enabling the organization to achieve its goals and business priorities cohesively and strategically. As the saying goes, “A house divided against itself cannot stand.” By working together, we can build a strong and resilient organization that thrives in today’s fast-paced and competitive landscape.

Merging the CIO and CTO functions into CITO and combining the CDO and CAO into CDAO are pivotal steps prior to integrating the CAIO, CMLO, and CISO functions into the same CITO office. Partnership hinges on individuals, but an integrated system, once built, will be long-lasting regardless of personnel changes and how technology evolves. Transformation is not a transient function, and the CDTO, primarily a business function, should stay abreast of technological changes and continue to lead the effort.

With the optimized hierarchy, the CITO, combining the functions of CIO, CTO, CAIO, CMLO, and CISO, will report to the CEO or their deputy, as will the CDTO and the CDAO, which combines the functions of CDO and CAO. Knowledge will be generated on the fly, with proper safeguards, as generative AI becomes more intelligent and widespread, further diminishing the CKO’s role.

Organizational changes are risky. Dealing with potentially inflated titles, re-designations, and job resizing may unsettle many incumbents. It is reminiscent of those heated debates over centralizing versus decentralizing tech functions in a large enterprise. Ultimately, organizations that persevere through these changes will reap benefits ranging from agility and cost savings to clarity of ownership, accountability, less politicking, and a healthier workplace, finally emerging as leaders in their industry.



*Copyedited by ChatGPT, https://chat.openai.com/chat

IT Helpdesk Who Needs Help

Once, users commented: “It is the Helpdesk who needs help, not us.” Out of frustration? Maybe. But it certainly served as a wake-up call at a time when technology was already an integral part of every enterprise function, and yet support services could not live up to expectations. This sentiment resonated with my own experiences of below-par customer service in various verticals, suggesting that IT helpdesks in most enterprises are perceived as peripheral and non-strategic. Can we turn this around, and how?

Begin With The Technology Leader

In my previous organization, the IT Helpdesk provided a single point of contact for problem reports, general inquiries, complaints, and requests for resources. With a captive user base of 38,000, the majority being digital natives or self-proclaimed IT literates who could argue with any advice we offered, the demand for support was high. Dealing with two major Enterprise Resource Planning (ERP) suites, multiple coding platforms, hundreds of Cloud and bespoke applications, over 2,000 wireless access points, 120,000 endpoints, servers, network devices, and an average of 8,000 user tickets monthly, the job was demanding and thankless. Annual surveys showed little to be proud of, and staff morale was low. In the mid-2000s, an outsourcing trend emerged, promising cost savings and improved service levels. I was skeptical but dove in, since expanding in-house staffing was considered a fixed cost.

A technology leader, understandably overwhelmed by business politics and digital intricacies, can only afford direct oversight of a few strategic functions like business relationships, applications, infrastructure, and cybersecurity, but not the helpdesk, despite its significant impact on user experience. This disconnect makes it hard for the technology leader to intervene in disastrous situations. Intervening on the basis of filtered reports, without real insight, only prolongs systemic issues and prevents them from being resolved at the root cause.

Enterprise IT tends to be highly compartmentalized by function, with each function led by its own head. The helpdesk, to be frank, is not glamorous. It commands the least respect in the enterprise hierarchy and has no authority over the priorities of the engineers and functional heads responsible for the fixes users desperately need, which fuels further user frustration.

Next, in terms of workforce, morale, and commitment, we cannot expect the helpdesk’s personnel challenges to diminish with outsourcing. In the unfortunate event of being stuck with a slacking third party, you need the technology chief to pull their weight for prompt remediation. It is essential for the chief to commit undivided attention to helpdesk operations: cultivating a strategic relationship with the service provider, regularly reviewing service levels, and keeping a pulse on the ground for concerns and expectations, for sustainable improvement and success.

Obsess to Serve and Services

The Helpdesk is a people business, where the users’ experience heavily impacts the perceived performance of the entire IT organization. While technical competency is essential, what differentiates an exceptional helpdesk from a mediocre one is a deep sense of urgency, empathy, passion to serve, and committed leadership. These factors go a long way toward user satisfaction, even when practical resolutions are not immediately possible at times.

Organizations that are truly obsessed with customer service not only act upon users’ feedback but proactively seek service enhancement. Here are some practices:

1. First-Call Resolution

Many of us have had the poor experience of making repeated calls to the helpdesk for the same issue when the advice given initially doesn’t work. First-Call Resolution measures the rate of problem resolution at the first contact. It is a sensible performance target for customer service, judging the accuracy of advice and attention to detail. It is also an indicator to watch closely for the growing technical maturity of the helpdesk, since reported issues range from technically trivial to complex and sophisticated. The higher the rate, the more technically capable the helpdesk is (see the sketch after this list).

2. Minimal Referrals

An enterprise helpdesk has to deal with a vast amount and variety of problems and user enquiries daily. Certainly not all issues can be addressed by the agents at the front desk, and after exercising due care, some cases may be referred to the backend engineers for advice. However, an effective helpdesk should function as a cushion for engineers. Excessive referrals without proper diagnostics can be a sign of incompetence or negligence on the helpdesk’s part.

3. Call Listening

Despite the well-intended note that “your call will be recorded for service improvement purposes,” users are more concerned with immediate resolution than with a future step-up in service. Given that the first impression is likely a long-lasting one, every effort should be made to ensure a delightful user experience on the first call. Call listening allows supervisors to join calls selectively, monitoring conversations in real time and intervening where necessary with advice and immediate solutions.

4. Personal Touch

Despite the growing intelligence of AI-powered chatbots, nothing beats a personal touch with attentive and empathic listening for greater user satisfaction. Agents can identify themselves before interacting with the caller, leave a contact for a return call, verify their understanding of reported issues, share the possible causes behind the recommended actions, and offer a personalized experience.

5. Mystery Self-Audit

Many enterprise IT teams have experienced audit fatigue, and yet another self-inflicted audit could push them to the verge of burnout. Unlike most regularized audits, a mystery audit is impromptu and specific, conducted without prior notice to the helpdesk. It involves trained users appearing unexpectedly to assess the helpdesk’s listening, communication, and problem-solving skills, and its ability to manage difficult users and unreasonable demands. Among many other systemic issues, it helps flag errant agents for specific coaching. A mystery audit is lightweight and practical because it does not involve energy-sapping tasks like documentation and board reports.

6. Self-Experiencing

Giving tech specialists the opportunity to serve as user-support agents allows them to experience digital services as users do, enhancing the design and friendliness of products. It happens all too often that missing details and miscommunication between the helpdesk and the tech specialists cause unnecessary delays to fixes. It is also hard for tech specialists to appreciate the poor experiences users may have with digital services without interacting with them directly.
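To make targets like first-call resolution and referral rate concrete, here is a minimal sketch of how they might be computed from a ticket log; the ticket fields are hypothetical.

```python
# Sketch: compute first-call resolution (FCR) and referral rates from a
# ticket log. The ticket fields are hypothetical illustrations.
tickets = [
    {"id": 1, "contacts": 1, "referred": False},  # solved on first call
    {"id": 2, "contacts": 3, "referred": True},   # escalated to engineers
    {"id": 3, "contacts": 1, "referred": False},
    {"id": 4, "contacts": 2, "referred": False},  # user had to call back
]

total = len(tickets)
fcr = sum(t["contacts"] == 1 and not t["referred"] for t in tickets) / total
referral = sum(t["referred"] for t in tickets) / total

print(f"first-call resolution: {fcr:.0%}")  # higher -> more capable desk
print(f"referral rate: {referral:.0%}")     # watch for excessive escalation
```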

In a highly competitive services sector, the players who emerge and stay at the top are those who differentiate themselves through support services, not products alone.

*Copyedited by ChatGPT, https://chat.openai.com/chat

Capacity-on-Demand (Part 2 of 2)

It’s important to acknowledge that not all servers in a data centre run at full capacity. Peak loads rarely sustain, let alone consume, the extra resources some system administrators provision to preempt server crashes during spikes. What if we could harness and repurpose 20% of such idle capacity from a 1,000-server farm while enhancing service levels and adding value?

Server Virtualization

Many daily activities in a data centre involve moving (servers), adding (servers), and changing (server configurations), commonly known as MAC (Move, Add, Change) operations. These seemingly routine tasks become increasingly prevalent and complex in many large enterprises with a growing array of operating systems, databases, web and application services, and geographically dispersed data centres.

From hardware setup to software configuration, virtualization slices physical hardware into multiple programmable servers, each with its own CPU, memory, and I/O. Strictly speaking, once automated, software work incurs no labour cost, allowing MAC activities to scale swiftly and cost-effectively, with precise accuracy and no physical boundaries.

Virtualization underpins a significant shift in data centre operations:

Firstly, we no longer need to oversize servers, knowing that CPU, memory, and storage resources can be dynamically adjusted. This, however, doesn’t diminish the importance of proper capacity sizing, but it eliminates the psychological “more is better” effect.

Secondly, we no longer need to panic when a server suffers from the infamous “crash of unknown cause.” A hot or cold standby server, utilizing harvested resources, can quickly minimize user impact.

Thirdly, cloning a server becomes effortless, especially when enforcing the same security settings across all servers, minimizing human oversights.

Fourthly, it serves as a kill switch during a suspicious cyberattack by taking a snapshot of the server and its memory map for forensic purposes before shutting it down to contain the exposure.
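A sketch of that kill-switch sequence follows; `HypervisorClient` and its methods are hypothetical stand-ins for whatever API your virtualization platform actually exposes, not a real library.

```python
# Sketch of the "kill switch" during a suspected compromise: snapshot the
# VM and its memory for forensics, then power it off to contain exposure.
from datetime import datetime, timezone

class HypervisorClient:
    """Hypothetical stand-in; substitute your platform's real client."""
    def snapshot(self, vm, name, include_memory):
        print(f"snapshot {vm} -> {name} (memory={include_memory})")
    def power_off(self, vm, graceful):
        print(f"power off {vm} (graceful={graceful})")
    def tag(self, vm, label):
        print(f"tag {vm}: {label}")

def contain(hv: HypervisorClient, vm_name: str) -> str:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    snap = f"{vm_name}-forensic-{stamp}"
    hv.snapshot(vm_name, name=snap, include_memory=True)  # preserve RAM state
    hv.power_off(vm_name, graceful=False)  # hard stop denies the malware a clean-up
    hv.tag(vm_name, "quarantined")         # exclude from routine MAC operations
    return snap

contain(HypervisorClient(), "web-07")  # "web-07" is an illustrative VM name
```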

Workstation Enablement

High-end workstations are typically reserved for power users who work with large datasets in tasks like data modelling, analytics, simulation, and gaming. Thanks to significant advancements in chip technology, virtualization has gained substantial traction in high-performance computing (HPC). This allows more desktop users to have workstation capabilities and provides ready-to-use specialized HPC software, such as MATLAB, SPSS, and AutoCAD, maintained centrally without the hassle of per-unit installation. Both CPU- and GPU-intensive workloads are processed at the data centre, with screen updates transmitted back to the user on a lightweight desktop computer. Achieving decent performance largely depends on sufficient desktop bandwidth, with a minimum of 1 Gbit/s based on my experience, assuming the enterprise has ample bandwidth within the data centre.

Network Virtualization

Computer networking primarily involves switching and routing data packets from source to destination. It seems simple, except when addressing MAC activities such as firewalling a group of servers at dispersed locations for a business unit dealing with sensitive data or filtering malicious traffic among desktops. The proliferation of IoT devices and surveillance cameras with delayed security patches only exacerbates the situation.

By creating logical boundaries at layer two for data switching or layer three for data routing among the servers in the data centre, users’ desktops, or specialized devices, one can easily insert either a physical or software-based firewall into the data path to protect workloads.

Crucial Requirement

While both the Cloud and Virtualization offer similar capabilities in agility within modern IT, the staff’s expertise in network and system architecture remains the most crucial requirement for the successful implementation and realization of the benefits. It is timely for enterprises to incorporate Generative AI into their technology workforce, allowing them to learn and grow together, promoting knowledge retention and transfer.

Capacity-on-Demand (Part 1 of 2)

Digital agility is of utmost importance in modern business, encompassing speed and responsiveness. For instance, a rapid turnaround to address an infrastructure bottleneck, a quick resolution to erroneous code, a prompt diagnosis of user-reported issues, or an immediate response to contain a cyberattack would undoubtedly be appealing. Nonetheless, achieving agility in a large enterprise is no easy task, and these efforts can be hampered by a risk-averse corporate culture, untimely policies, and staff competency.

I define Capacity-on-Demand as an organization’s ability to scale up digital capacity, specifically focusing on infrastructure capacity in this post, as and when it is required. A highly versatile, high-performing, and secure infrastructure is a crucial asset for any enterprise, with strict uptime and performance requirements often committed as service levels to their business partners by Enterprise IT.

However, this system works well only when the operating environment remains unchanged. As usage increases, businesses modernize, technologies become obsolete, and maintenance costs for aging equipment escalate, many enterprise technology chiefs are faced with the due diligence of upgrading their infrastructures approximately once every five years to keep up with user demand and application workloads.

But what alternatives exist when this upgrade entails intensive capital outlay for a system likely to be useful for only 60 months? Even with the blessing of new investment, the epic effort to commission the major upgrade, including technical design, prototyping, specifications, installation, and other administrative overheads, may amount to a woeful 18 months or more. The Return on Investment (ROI) in such a scenario is utterly inefficient!
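The arithmetic behind that complaint is easy to sketch:

```python
# Back-of-envelope sketch: an 18-month commissioning effort consumes a
# large slice of an asset's ~60-month useful life.
life_months = 60           # typical refresh cycle cited above
commissioning_months = 18  # design, prototyping, specifications, installation

effective = life_months - commissioning_months
print(f"effective service life: {effective} months "
      f"({effective / life_months:.0%} of the cycle)")
# -> effective service life: 42 months (70% of the cycle)
```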

Cloud Storage

From mirror copies to backup and archival copies of enterprise data, meeting operational and legal requirements necessitates provisioning nearly triple the storage capacity for every unit increase in data volume. In a large enterprise, this total can amount to tens of petabytes or even more. Dealing with such large-scale and unpredictable demands often leads us to consider Cloud storage. It offers elasticity and helps reduce the data centre footprint. However, it also assumes there are no legal implications on data residency, and the organization must be willing to accept less desirable contract terms on service levels, data privacy, liability, indemnity, security safeguards, and exit clauses.
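A back-of-envelope sketch of that multiplier, with an assumed growth figure and retention mix:

```python
# Sketch of the ~3x provisioning multiplier: every new terabyte of primary
# data drags along a mirror copy plus backup/archival copies. The growth
# figure and retention mix are illustrative assumptions.
primary_growth_tb = 500  # hypothetical yearly growth in primary data
extra_copies = {"mirror": 1.0, "backup_and_archive": 1.0}

total_tb = primary_growth_tb * (1 + sum(extra_copies.values()))
print(f"{primary_growth_tb} TB of new data -> provision ~{total_tb:,.0f} TB")
# -> 500 TB of new data -> provision ~1,500 TB
```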

Storage Leasing

Storage leasing presents a viable alternative if you possess economies of scale, a mid- to long-term horizon, and a fairly accurate but non-committal year-by-year growth prediction during the contract period. These considerations are crucial for a cost-effective proposal.

Similar to Cloud storage, storage leasing helps alleviate capital constraints and smooths out lumpy expenses in the budget plan over the years, an approach preferred by some finance chiefs. Additionally, you have the option to choose between a finance lease, with asset ownership, and an operating lease, which saves the tedious effort of asset keeping.

Sporadic Demands

Despite the forecasted storage growth rate, addressing urgent demands at short notice necessitates pre-provisioning spare capacity onsite without activating it. I used to include such requirements in the leasing contract at a fraction of the total cost, enabling the option to turn capacity on and off as needed or to normalize it as part of the forecasted growth, although the latter approach prevailed in my previous environment.

Access Speed

Does the access speed to the Cloud differ from onsite storage? It is a rather complex assessment. Apart from factors like drive technologies, data transfer protocols, and cache size, onsite storage in an end-user environment, where users and employees are mostly located within the enterprise, provides a better user experience since the speed is not limited by Internet bandwidth. Additionally, we should consider the nature of the data, which nowadays is predominantly machine-generated: transaction logs, user access records, security events, and the like. These voluminous, real-time data are latency-sensitive and consume much of the Internet bandwidth, making it advisable to keep their sources closest to the storage.

Storage Operations

Equipping the workforce with the necessary expertise and knowledge of proprietary tools to manage and operate Cloud or onsite storage is crucial. Cloud storage offers ease of provisioning and management, including storage provisioning, backup and recovery, and site redundancy. However, I am hesitant about operating a black box in a heterogeneous environment without understanding its internal dynamics and having a plan for skill transfer. Storage is a significant component of the entire enterprise technology stack, and highly committed and collaborative efforts from the storage provider are essential for planning and successfully executing drills and post-reviews, and for avoiding the “not my problem” syndrome.

Onsite storage will entail more technical management overheads compared to the Cloud. One can include the required expertise and make provisions in the contract to support the adopted solution. The service provider, backed by the principal, will have the most experienced personnel to support your organization. Once again, we must not overlook the importance of having a plan for skill transfer.

Mutual Trust

Technology leasing is not a novel concept. The key is to customize a contract to bridge the gap left by the Cloud. The initial journey may encounter challenges, but with shared goals and mutual trust, it can lead to a long-term win-win partnership. Throughout my experience, I have utilized both Cloud and onsite storage, ranging from file storage to block and object storage, and transitioned from SCSI to Fibre Channel connectivity and finally to all-flash drives to meet my needs. At the end of each contract, there was a comprehensive review of overall service performance and upcoming technologies, resulting in reduced data centre space and energy footprint, as well as a lower per-terabyte cost for the next phase of development. This approach also provides the right opportunity to give a new lease of life to the storage infrastructure.

Next Post

On-demand provisioning is far from complete without the agile provisioning of server and network capacity, which I will cover in the next post.

*Post is copyedited by ChatGPT, https://chat.openai.com/chat

Let Us Define IT Quality

The consequences of technology failures, such as a system crash, a cyber breach, or sluggish app performance, can be devastating. They affect businesses, operations, customers, and users. They could even be a matter of life or death in the event of a disrupted surgical operation or a breached IoT sensor in an autonomous vehicle.

In my early years in management, I struggled with the performance of IT. Not because we performed poorly as a team, but because our positive results did not necessarily resonate with the business. Could it be a stereotype in the community I supported, an excuse from the responsible party, or indeed substandard IT work?

In the storm of digital transformation today, Business and IT are so tightly coupled that one’s performance depends on the other. For instance, an ill-formed data-driven workflow cannot benefit from automation alone if we fail to provide an integrated data source, and incoherent data sets will frustrate customers who receive duplicated marketing materials, regardless of system performance. In such situations, when both parties’ performance is at stake, the result can be many unpleasant arguments, fault-finding, and finger-pointing in the project room.

Performance is not measurable without defined indicators, commonly known in many organizations as Key Performance Indicators (KPIs). Because technology cuts across a variety of business functions, the defined KPIs should align with the business goals and thus encourage co-ownership. They should also appeal to watchful stakeholders, like the funding and risk management entities across the enterprise.

The Eight Performance Indicators

From a business perspective, a high-quality system shall deliver accuracy, performance, security, and stability, and be user-friendly.

Accuracy – this refers to the precision of the constructed system in meeting the specified business requirements and ensuring the important aspects of data integrity, authenticity, and correctness in data processing and the presentation of information. From my experience, data quality is make-or-break for your project. It is also the hardest to deal with in an environment with dispersed data sets and multiple ownerships.

Performance – a high-performing system provides good, consistent response times to support the designed workload as agreed between IT and the Business. Typically, this requires adequately sized system capacity, optimized database queries, and rigorous load-testing across various usage scenarios before release for general use.

Security – system security is of utmost importance. It concerns the defensive tools, best practices, controls, methodologies, and threat intelligence deployed to safeguard digital assets, personal data, and privacy. Security events, real-time alerts, and triggers to the intended responders should be defined prior to system development. Beyond technical means, continuous training and user education are essential to mitigate the risk from humans, commonly regarded as the weakest link in cyber defence.

Stability – a stable system has minimal unscheduled disruption, typically quantified by the uptime per year committed by IT in agreement with the Business (a worked conversion appears after this list). Numerous mechanisms, like standby hardware, a secondary site, and dual transmission links, ensure continuous operations should any failure befall the primary resource. The extent of redundancy is often a trade-off between cost and business criticality, and again a joint decision of IT and the Business.

User-Friendliness – last but certainly not least is user-friendliness: a poor user interface with inconsistent layouts, misleading error messages, and cluttered clickable actions will simply annoy users. Web design is a professional skill distinct from IT, and some organizations have resorted to external help on design thinking to address the issue.

As IT is mostly concerned with the continuous operation of, and changes to, the system at the end of the day, there are additional quality attributes to care for: Scalability, Maintainability, and Supportability. Unless we make further efforts to formalize them across the enterprise, these KPIs are, unfortunately, less known to the business or perceived as lower in priority.

Scalability – this gauges IT’s ability to commission additional system resources to cater for projected increases in workload within weeks, or sooner if possible. The solution goes beyond just the Cloud, as one may need to review the corresponding scale-up of in-house supporting services like firewalls, intrusion detection, load balancers, Internet bandwidth, transaction logging, and data backup capacity.

Maintainability – a hard-to-maintain system will be too rigid to adapt to, or simply unable to cope with, business changes without a major overhaul or huge investment. Addressing the issue requires a combination of supporting technologies, modular software design, and technical skills and experience.

Supportability – IT is a knowledge economy. Knowledge, skills, and expertise (KSEs) prevail in a high-quality system. When a coding error, configuration mistake, or operational oversight can have dire consequences, do we ever ask whether we have the required KSEs and human capital to support and sustain the continued operation of a newly adopted technology?
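To make the uptime commitment behind the Stability indicator tangible, here is a small sketch converting committed uptime percentages into allowed unscheduled downtime per year; the tiers shown are illustrative.

```python
# Sketch: translate committed uptime (the Stability KPI) into allowed
# unscheduled downtime per year. The tiers are illustrative.
MINUTES_PER_YEAR = 365 * 24 * 60

for uptime in (99.0, 99.9, 99.99):
    allowed = MINUTES_PER_YEAR * (1 - uptime / 100)
    print(f"{uptime:.2f}% uptime -> {allowed:,.0f} minutes "
          f"(~{allowed / 60:.1f} hours) of downtime per year")
# 99.00% -> ~87.6 h; 99.90% -> ~8.8 h; 99.99% -> ~0.9 h per year
```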

In summary, IT performance is a collective effort of the organization. It must be defined from the perspectives of both IT and the Business, with the objectives below:

1. Bring clarity to digital performance, objective assessment, and harmony across the enterprise in the pursuit of business transformation.

2. Nurture a digitally literate community for further work on technology governance and strategies to achieve the performance goals.

3. Make clear to stakeholders the essential IT investments and priorities for overall organizational performance, rather than technology performance alone.

Quick-Win For Accuracy, Not Speed

“Quick-Win” in software development is a down-to-earth version of the Agile methodology commonly employed in many organizations today. It means delivering project results quickly by breaking a sizable software project into several small, manageable modules. The quick turnaround also enables the business to spread overwhelming activities, like functional testing and data preparation, across the modules, and provides timely feedback to IT for quick fixes. Quick-Win is business jargon that resonates with many stakeholders, to the extent that it sometimes overshadows accuracy and quality, the critical attributes of any technical work.

A large enterprise tends to have many projects in progress and in planning at any point in time. For projects in progress, it is vital to commission them per the committed timelines without distraction. The stakes of missing targets are high, with implications for cost, the business, and, most concerning, perceived staff competency. For projects in planning, the various business owners are eager to commence and commission the work in the shortest possible timeframe for, understandably, the first-mover advantage.

I often caught a quick frown from project sponsors at proposed project timelines, whether in weeks, months, or, for mega projects, years. Frankly, there is no standard approach to deciding the optimal project schedule, as situations vary. One usually assesses the required design and coding effort, technical complexity, hardware readiness, and human resources, but these factors are far less significant, and far less deliberated, than the extent of business ownership, data readiness, and the clarity and stability of the business requirements. How can Quick-Win be tweaked to tackle these issues and deliver results not only with speed but with accuracy in project timelines?

Among all the factors affecting project timelines, the most inconspicuous is the extent of business and process ownership involved in a project. Seeking consensus on diverging business requirements and harmonizing processes that cut across multiple owners in a large enterprise simply takes time, if it is not politically challenging. Sensitive matters like refused ownership, transferred workload, and added responsibilities among business units do not command much attention until confronted with the details that Quick-Win surfaces at the module level.

Separately, we know that functional specifications in software development are a high-level abstraction of the system design and coding logic, derived by IT based on its best understanding. What happens if the business has overstated the requirements? What about a workflow that looks simple to the business but is out of proportion to the intense IT effort proposed; what could be the causes? What is less known about Quick-Win is that the technical considerations and IT effort estimates it exposes allow the business and Enterprise IT to come to terms at the fitting moments, before the technical work begins.

I have had pleasant experiences with Quick-Win. As the business once commented, “we did not realize that the required data source for the function is missing; let us take a different approach”. As for IT, “we can optimize the IT effort since we have a better understanding of the requirements now”. Yet the most astonishing came from a project sponsor: “I do not think this function is essential to the business since it takes so much effort; let us defer it”. The details exposed by Quick-Win force meaningful discussion and sensible compromise between the business and IT on the project timeline without jeopardizing project outcomes. That’s a win-win for all.

Enterprise & IT

Let me contextualize “Enterprise” and “Enterprise IT” for the subjects to be written about in this space. In my view, a large “Enterprise” tends to have tens to hundreds of thousands of digitally enabled employees and users, and a plethora of business functions, processes, and digital solutions and services. Striving hard to stay ahead of their peers, many of them have been investing aggressively in technologies for business growth, innovation, and service excellence. Such investments are often business-driven, time-sensitive, and cyclic in nature. Depending on the digital maturity of the organization, many unit-level business decisions on technology investments may not thoroughly consider the impact on the enterprise-wide infrastructure, software interoperability, or the availability of technical competency for continued operations. This sets out the greatest challenges and concerns for “Enterprise IT”, the central office responsible for the governance, planning, management, and operations of the entire IT landscape across the enterprise.

Besides specific thoughts to address the challenges above, you may wonder why many mothers and fathers I know who work in IT do not recommend that their children follow in their footsteps. Why is there a stereotype that techies are poor communicators? Which discipline in IT is future-proof: mobile app development, network engineering, AI, or cybersecurity? What does it take to become a high-performing IT professional? Why do many technologies fail to gain a foothold in large enterprises? These are just a few subjects I have in mind to share in the future. Feel free to suggest.