Friday, April 25, 2025

    Learning from the Microsoft CrowdStrike Outage: Conversation with Abhishek Tripathi, S&P Global 

    In an era where digital security is paramount, the Microsoft CrowdStrike outage of July 2024 sent shockwaves across industries worldwide. The incident showed how much disruption can follow when a seemingly routine security software update goes wrong on Windows machines, affecting organizations across all sectors.

    To shed light on this critical event, Tech Achieve Media recently spoke to Abhishek Tripathi, Director of Information Security at S&P Global, for his expert insights. With over two decades of experience in cybersecurity and process optimization, Abhishek has held leading roles in prominent companies, making him well-versed in navigating complex security challenges.

    Both Microsoft and CrowdStrike have released details of the technical flaw that precipitated the incident and have issued apologies for the inconvenience caused. However, beyond these statements, the conversation aimed to take a deeper dive into understanding the implications of this Microsoft CrowdStrike outage and the lessons that can be learned to prevent such occurrences in the future.

    TAM: What is your analysis and take on the CrowdStrike Outage? 

    Abhishek Tripathi: This incident has to be seen alongside other recent events with significant impact. Consider SolarWinds, which affected even the U.S. Department of Defense and illustrated the far-reaching consequences of a breach at a major vendor. The Log4j vulnerability, by contrast, originated in a much smaller component yet also had widespread impact.

    As digitalization continues, the potential damage from incidents is increasing, especially when they involve industry leaders. CrowdStrike, for example, is a top EDR provider globally and is trusted by many organizations. When that trust is compromised, the result can be significant losses. According to an estimate by Parametrix, an insurance firm, losses amounted to around $5.4 billion.

    Beyond financial impact, the reputational damage to companies like CrowdStrike and Microsoft is substantial. These incidents can trigger a wave of phishing campaigns and new social engineering tactics, affecting the global landscape in the long run. This situation highlights the importance of robust patching and risk management methodologies.

    While it’s beneficial to treat vendors as partners, they shouldn’t overshadow the business itself. Ironically, this imbalance is what contributed to the current scenario. In the early days of IT, every update was thoroughly tested before going live. In this case, however, a channel file update, the kind of content update used to detect malware behavior, went out and caused widespread issues.

    Effective release management and rigorous testing are essential before rolling out updates to production. Many organizations practice a staging strategy, keeping updates in a pre-production environment for a week to observe their effects. Unfortunately, that did not happen here, and organizations are now calling for such measures.

    Additionally, Microsoft’s decision to open its platform to third-party antivirus providers, made under pressure from the European Union, inadvertently exposed vulnerabilities. CrowdStrike’s access to the Windows kernel is an example of the complexities involved.

    While some aspects of these incidents are controllable, others are not. As I mentioned, the larger the vendor, the greater the potential impact. Nevertheless, by refining our processes, we can aim to mitigate the effects as much as possible.

    TAM: What are some of the long-term impacts of the CrowdStrike outage on enterprise operations and end-users?

    Abhishek Tripathi: When considering the long-term impact, one significant issue is the erosion of trust in vendors. This will persist for a long time and will prompt organizations to strengthen their third-party processes. They will need to ask vendors more probing questions, moving beyond theoretical exercises in third-party risk management (TPRM). They will need to delve deeper into technical aspects, such as the level of access vendors require to their infrastructure and systems, whether general access is necessary, how the vendors’ supply chain pipelines operate, and the specifics of their release processes. Vendors will face increased scrutiny in the future.

    Being a vendor will be challenging for a while. From an organizational perspective, this situation is similar to how COVID-19 underscored the importance of business continuity planning (BCP) beyond just theoretical concepts. I recall reading in the Harvard Business Review about two companies in the automotive industry that both needed sensors for their digital operations.

    The differences in their business continuity and disaster recovery planning significantly impacted their production. After COVID-19, even in India, there was a shortage of car sensors, leading to delayed deliveries. Mahindra responded by reducing the number of sensors in their cars, sacrificing some features to manage the shortage.

    One company had a backup plan and managed to navigate the situation, even if it couldn’t fully supply the sensors and chips. The other company struggled and couldn’t cope. This highlights the importance of robust business continuity planning. Leading airlines, which have traditionally run on Windows systems, might consider diversifying to Linux systems or Macs for better business continuity and disaster recovery planning. In India, the return to issuing handwritten boarding passes during the outage is an example of adapting to such challenges.

    There will also be increased emphasis on legal and compliance aspects. Governments worldwide, particularly those responsible for critical infrastructure like airlines, will tighten regulations and demand greater compliance.

    Lastly, cyber insurance will become more challenging. With losses reaching $5.4 billion, insurers will ask more questions and conduct thorough assessments of infrastructure. Companies may face higher premiums if their processes and controls aren’t robust.

    TAM: What lessons can be drawn from the CrowdStrike outage regarding the integration of cybersecurity solutions with core operating systems? Are there specific vulnerabilities or risk factors that organizations should be aware of? What best practices would you recommend for organizations to enhance their resilience against such incidents?

    Abhishek Tripathi: It’s important to scrutinize the permissions third-party vendors need on our infrastructure. Do they really require kernel access, or can Microsoft and other operating system providers offer alternative methods that still ensure fairness? This issue arose when antivirus providers requested kernel access from Microsoft, so perhaps a mechanism could be developed to provide the necessary information without compromising security.

    Otherwise, there might not be much more that can be done on this front. However, third-party risk management will play a more significant role, asking tougher questions when vendors are involved. Secondly, organizations will need to improve their change management processes.

    Organizations must ensure that all changes go through proper Change Advisory Boards (CABs) and that even standard changes undergo thorough risk assessment to understand their potential impact. Regardless of whether CrowdStrike or other vendors conduct sufficient testing, organizations must conduct their own tests. They should implement patches and channel files in lower environments to identify and address any issues. Effective problem management is also crucial to diagnose and resolve any failures.
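    As a rough illustration of testing updates in lower environments before production, the sketch below promotes an update one stage at a time and halts as soon as a stage exceeds its failure budget. It is a minimal sketch under stated assumptions, not a description of how CrowdStrike or any real tool works: the `Stage` type, the `promote` function, the host names, and the `simulated_apply` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """Hypothetical rollout stage, e.g. dev, staging, production."""
    name: str
    hosts: List[str]
    max_failure_rate: float  # fraction of hosts allowed to fail in this stage


def promote(update_id: str,
            stages: List[Stage],
            apply_update: Callable[[str, str], bool]) -> List[str]:
    """Apply an update stage by stage, lowest environment first.

    Stops at the first stage whose failure rate exceeds its budget and
    returns the names of the stages that passed.
    """
    passed: List[str] = []
    for stage in stages:
        failures = sum(
            1 for host in stage.hosts if not apply_update(update_id, host)
        )
        rate = failures / len(stage.hosts) if stage.hosts else 0.0
        if rate > stage.max_failure_rate:
            break  # halt the rollout; later stages are never touched
        passed.append(stage.name)
    return passed


# Illustrative data only: host names and the failing update are made up.
STAGES = [
    Stage("dev", ["dev-1", "dev-2"], max_failure_rate=0.0),
    Stage("staging", ["stg-1", "stg-2", "stg-3"], max_failure_rate=0.0),
    Stage("production", [f"prod-{i}" for i in range(10)], max_failure_rate=0.01),
]


def simulated_apply(update_id: str, host: str) -> bool:
    """Pretend the update 'bad-channel-file' crashes every staging host."""
    return not (update_id == "bad-channel-file" and host.startswith("stg-"))


if __name__ == "__main__":
    print(promote("good-update", STAGES, simulated_apply))
    print(promote("bad-channel-file", STAGES, simulated_apply))
```

    In this sketch a faulty update is caught in staging and never reaches the production hosts. A real process would also include an observation period in each stage and the CAB approval described above.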

    These are the technical steps that must be taken to tackle these challenges in the future. Improved disaster recovery (DR) and business continuity planning (BCP) will be essential. These elements are integral to the NIST Cybersecurity Framework, which emphasizes identifying, protecting, detecting, responding, and recovering.

    Recovery is always the final step, so the recovery process needs to be robust enough to restore infrastructure in such scenarios. For example, having a Linux backup when running Windows systems or a Windows backup when using Linux can enhance resilience.

    (The views and opinions expressed in this article are solely those of the speaker, Abhishek Tripathi, and do not necessarily reflect the official policy or position of S&P Global or any other organization he is affiliated with.)
