
CrowdStrike IT Outage Explained by a Windows Developer

Understanding the CrowdStrike IT Outage: Insights from a Former Windows Developer

Introduction 

Hey, I’m Dave. Welcome to my shop.

I’m Dave Plummer, a retired software engineer from Microsoft, going back to the MS-DOS and Windows 95 days. Thanks to my time as a Windows developer, today I’m going to explain what the CrowdStrike issue actually is, the key difference in kernel mode, and why these machines are bluescreening, as well as how to fix it if you come across one.

Now, I’ve got a lot of experience waking up to bluescreens and having them set the tempo of my day, but this Friday was a little different. First off, I’m retired now, so I don’t debug a lot of daily blue screens. And second, I was traveling in New York City, which left me temporarily stranded as the airlines sorted out the digital carnage.

But that downtime gave me plenty of time to pull out the old MacBook and figure out what was happening to all the Windows machines around the world. As far as we know, the CrowdStrike bluescreens that we have been seeing around the world for the last several days are the result of a bad update to the CrowdStrike software. But why? Today I want to help you understand three key things.

Key Points

  • Why the CrowdStrike software is on the machines at all.
  • What happens when a kernel driver like CrowdStrike fails.
  • Precisely why the CrowdStrike code faults and brings the machines down, and how and why this update caused so much havoc.

Handling Crashes at Microsoft 

As systems developers at Microsoft in the 1990s, handling crashes like this was our bread and butter. Every dev at Microsoft, at least in my area, had two machines. For example, when I started in Windows NT, I had a Gateway 486 DX2/50 as my main dev machine, and then some old 386 box as the debug machine. Normally you would run your test or debug bits on the debug machine while connected to it as the debugger from your good machine.

NT Stress Process 

On nights and weekends, however, we did something far more interesting. We ran a process called NT Stress. NT Stress was a bundle of tests that would automatically download to the test machines and run under the debugger. So every night, every test machine, along with all the machines in the various labs around campus, would run NT Stress and be put through the gauntlet.

The stress tests were normally written by our test engineers, who were software developers specially employed back in those days to find and catch bugs in the system. For example, they might write a test to simply allocate and use as many GDI brush handles as possible. If doing so causes the drawing subsystem to become unstable or causes some other program to crash, then it would be caught and stopped in the debugger immediately.

The following day, all of the crashes and assertions would be tabulated and assigned to an individual developer based on the area of code in which the problem occurred. As the developer responsible, you would then use something like Telnet to connect to the target machine, debug it, and sort it out.

Debugging in Assembly Language 

All this debugging was done in assembly language, whether it was Alpha, MIPS, PowerPC, or x86, and with minimal symbol table information. So it’s not like we had Visual Studio connected. Still, it was enough information to sort out most crashes, find the code responsible, and either fix it or at least enter a bug to track it in our database.

Kernel Mode versus User Mode 

The hardest issues to sort out were the ones that took place deep inside the operating system kernel, which executes at ring zero on the CPU. The operating system uses a ring system to bifurcate code into two distinct modes: kernel mode for the operating system itself and user mode, where your applications run. Kernel mode does tasks such as talking to the hardware and the devices, managing memory, scheduling threads, and all of the really core functionality that the operating system provides.

Application code never runs in kernel mode, and kernel code never runs in user mode. Kernel mode is more privileged, meaning it can see the entire system memory map and what’s in memory at any physical page. User mode only sees the memory map pages that the kernel wants you to see. So if you’re getting the sense that the kernel is very much in control, that’s an accurate picture.

Even if your application needs a service provided by the kernel, it won’t be allowed to just run down inside the kernel and execute it. Instead, your user thread will reach the kernel boundary and then raise an exception and wait. A kernel thread on the kernel side then looks at the specified arguments, fully validates everything, and then runs the required kernel code. When it’s done, the kernel thread returns the results to the user thread and lets it continue on its merry way.

Why Kernel Crashes Are Critical 

There is one other substantive difference between kernel mode and user mode. When application code crashes, the application crashes. When kernel mode crashes, the system crashes. It crashes because it has to. Imagine a case where you had a really simple bug in the kernel that freed memory twice. When the kernel code detects that it’s about to free already freed memory, it can detect that this is a critical failure, and when it does, it blue screens the system, because the alternatives could be worse.

Consider a scenario where this double-free code path is allowed to continue, maybe with an error message, maybe even allowing you to save your work. The problem is that things are so corrupted at this point that saving your work could do more damage, erasing or corrupting the file beyond repair. Worse, since it’s the kernel itself that’s experiencing the issue, application programs are no longer protected from one another in the usual way. The last thing you want is Solitaire triggering a kernel bug that damages your Git enlistment.

And that’s why when an unexpected condition occurs in the kernel, the system is just halted. This is not a Windows thing by any stretch. It is true for all modern operating systems like Linux and macOS as well. In fact, the biggest difference is the color of the screen when the system goes down. On Windows, it’s blue, but on Linux it’s black, and on macOS, it’s usually pink. But as on all systems, a kernel issue is a reboot at a minimum.

What Runs in Kernel Mode 

Now that we know a bit about kernel mode versus user mode, let’s talk about what specifically runs in kernel mode. And the answer is very, very little. The only things that go in the kernel mode are things that have to, like the thread scheduler and the heap manager and functionality that must access the hardware, such as the device driver that talks to a GPU across the PCIe bus. And so the totality of what you run in kernel mode really comes down to the operating system itself and device drivers.

And that’s where CrowdStrike enters the picture with their Falcon sensor. Falcon is a security product, and while it’s not just simply an antivirus, it’s not that far off the mark to look at it as though it’s really anti-malware for the server. But rather than just looking for file definitions, it analyzes a wide range of application behavior so that it can try to proactively detect new attacks before they’re categorized and listed in a formal definition.

CrowdStrike Falcon Sensor 

To be able to see that application behavior from a clear vantage point, that code needed to be down in the kernel. Without getting too far into the weeds of what CrowdStrike Falcon actually does, suffice it to say that it has to be in the kernel to do it. And so CrowdStrike wrote a device driver, even though there’s no hardware device that it’s really talking to. But by writing their code as a device driver, it lives down with the kernel in ring zero and has complete and unfettered access to the system data structures and the services that they believe it needs to do its job.

Everybody at Microsoft and probably at CrowdStrike is aware of the stakes when you run code in kernel mode, and that’s why Microsoft offers the WHQL certification, which stands for Windows Hardware Quality Labs. Drivers labeled as WHQL certified have been thoroughly tested by the vendor and then have passed the Windows Hardware Lab Kit testing on various platforms and configurations and are signed digitally by Microsoft as being compatible with the Windows operating system. By the time a driver makes it through the WHQL lab tests and certifications, you can be reasonably assured that the driver is robust and trustworthy. And when it’s determined to be so, Microsoft issues that digital certificate for that driver. As long as the driver itself never changes, the certificate remains valid.

CrowdStrike’s Agile Approach 

But what if you’re CrowdStrike and you’re agile, ambitious, and aggressive, and you want to ensure that your customers get the latest protection as soon as new threats emerge? Every time something new pops up on the radar, you could make a new driver and put it through the Hardware Quality Labs, get it certified, signed, and release the updated driver. And for things like video cards, that’s a fine process. I don’t actually know what the WHQL turnaround time is like, whether that’s measured in days or weeks, but it’s not instant, and so you’d have a time window where a zero-day attack could propagate and spread simply because of the delay in getting an updated CrowdStrike driver built and signed.

Dynamic Definition Files 

What CrowdStrike opted to do instead was to include definition files that are processed by the driver but not actually included with it. So when the CrowdStrike driver wakes up, it enumerates a folder on the machine looking for these dynamic definition files, and it does whatever it is that it needs to do with them. But you can already perhaps see the problem. Let’s speculate for a moment that the CrowdStrike dynamic definition files are not merely malware definitions but complete programs in their own right, written in a p-code that the driver can then execute.

In a very real sense, then the driver could take the update and actually execute the p-code within it in kernel mode, even though that update itself has never been signed. The driver becomes the engine that runs the code, and since the driver hasn’t changed, the cert is still valid for the driver. But the update changes the way the driver operates by virtue of the p-code that’s contained in the definitions, and what you’ve got then is unsigned code of unknown provenance running in full kernel mode.

All it would take is a single little bug like a null pointer reference, and the entire temple would be torn down around us. Put more simply, while we don’t yet know the precise cause of the bug, executing untrusted p-code in the kernel is risky business at best and could be asking for trouble.
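To make the speculation above concrete, here is a minimal Python sketch of a fixed "engine" whose behavior is decided entirely by the data it loads. The opcodes, the `run_pcode` function, and the program format are all invented for illustration; nothing here reflects how the CrowdStrike driver actually works.

```python
# Hypothetical p-code: a list of (opcode, operand) pairs. These opcodes are
# invented for illustration only.
PUSH, ADD, MUL = "PUSH", "ADD", "MUL"


def run_pcode(program):
    """Execute a tiny stack-based p-code program and return the top of stack."""
    stack = []
    for opcode, operand in program:
        if opcode == PUSH:
            stack.append(operand)
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")
    return stack[-1]


# The engine above never changes, yet each "update" changes what it does.
update_a = [(PUSH, 2), (PUSH, 3), (ADD, None)]
update_b = [(PUSH, 2), (PUSH, 3), (MUL, None)]
result_a = run_pcode(update_a)
result_b = run_pcode(update_b)

# A malformed update (an ADD with nothing on the stack) makes the unchanged
# engine fail. In user mode that is just an exception we can catch.
try:
    run_pcode([(ADD, None)])
    malformed_failed = False
except IndexError:
    malformed_failed = True
```

In a user-mode process the malformed program is a recoverable error. The equivalent fault in ring zero has nowhere to go but a bug check.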

Post-Mortem Debugging 

We can get a better sense of what went wrong by doing a little post-mortem debugging of our own. First, we need to access a crash dump report, the kind you were used to getting in the good old NT days but that is now hidden behind the happy face blue screen. Depending on how your system is configured, though, you can still get the crash dump info, so there was no real shortage of dumps around to look at. Here’s an example from Twitter, so let’s take a look. About a third of the way down, you can see the offending instruction that caused the crash.

It’s an attempt to move data into register R9 by loading it from the memory pointer held in register R8. Couldn’t be simpler. The only problem is that the pointer in R8 is garbage. It’s not a memory address at all but a small integer, 9C hex (0x9C), which is likely the offset of the field that they’re actually interested in within the data structure. But they almost certainly started with a null pointer, then added 0x9C to it, and then just dereferenced it.
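The arithmetic behind that guess can be sketched in Python with `ctypes`. The structure layout below is purely hypothetical (the real data structure is not public); it only assumes that the field of interest sits 0x9C bytes from the start of the structure.

```python
import ctypes


class HypotheticalRecord(ctypes.Structure):
    # Invented layout: 0x9C bytes of other fields, then the 32-bit field
    # the code wants to read. The real structure is not public.
    _fields_ = [
        ("other_fields", ctypes.c_uint8 * 0x9C),
        ("wanted_field", ctypes.c_uint32),
    ]


def field_address(base_address, field_name):
    """Return the address a load would use: base pointer plus field offset."""
    return base_address + getattr(HypotheticalRecord, field_name).offset


null_base = 0  # a NULL pointer
bad_address = field_address(null_base, "wanted_field")
print(hex(bad_address))  # prints 0x9c
```

With a valid base pointer the same computation yields a real address inside the structure. With a null base it yields the bare offset, which is exactly the kind of small integer seen in the faulting register.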

CrowdStrike driver woes

Now, debugging something like this is often an incremental process where you wind up establishing, “Okay, so this bad thing happened, but what happened upstream beforehand to cause the bad thing?” And in this case, it appears that the cause is the dynamic data file, downloaded as a .sys file. Instead of containing p-code or a malware definition or whatever was supposed to be in the file, it was all just zeros.

We don’t know yet how or why this happened, as CrowdStrike hasn’t publicly released that information. What we do know to a near certainty at this point, however, is that the CrowdStrike driver that processes and handles these updates is not very resilient and appears to have inadequate error checking and parameter validation.

Parameter validation means checking to ensure that the data and arguments being passed to a function, and in particular to a kernel function, are valid and good. If they’re not, it should fail the function call, not cause the entire system to crash. But in the CrowdStrike case, they’ve got a bug they don’t protect against, and because their code lives in ring zero with the kernel, a bug in CrowdStrike will necessarily bug check the entire machine and deposit you into the very dreaded recovery bluescreen.
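As a rough illustration of that kind of validation, here is a Python sketch that checks a definition file before anything tries to parse it. The file format (a `DEFS` magic value, an entry count, fixed-size entries) and the function name `validate_definition_blob` are assumptions invented for this example, not the real CrowdStrike format.

```python
import struct

# Hypothetical file format, invented for illustration: a 4-byte magic value,
# a little-endian 32-bit entry count, then 8 bytes per entry.
MAGIC = b"DEFS"
HEADER = struct.Struct("<4sI")
ENTRY_SIZE = 8


def validate_definition_blob(blob):
    """Return (ok, reason) instead of trusting the file contents."""
    if len(blob) < HEADER.size:
        return False, "file too short for header"
    if not any(blob):
        return False, "file is all zeros"
    magic, entry_count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        return False, "bad magic value"
    if len(blob) != HEADER.size + entry_count * ENTRY_SIZE:
        return False, "entry count does not match file size"
    return True, "ok"


good = HEADER.pack(MAGIC, 1) + bytes(ENTRY_SIZE)
zeroed = bytes(4096)
print(validate_definition_blob(good))    # (True, 'ok')
print(validate_definition_blob(zeroed))  # (False, 'file is all zeros')
```

The point of the design is that a bad file produces a failed call and a reason, so the caller can skip the update rather than dereference whatever the file happened to contain.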

Windows Resilience 

Even though this isn’t a Windows issue or a fault with Windows itself, many people have asked me why Windows itself isn’t just more resilient to this type of issue. For example, if a driver fails during boot, why not try to boot next time without it and see if that helps?

And Windows, in fact, does offer a number of facilities like that, going back as far as booting NT with the last known good registry hive. But there’s a catch, and that catch is that CrowdStrike marked their driver as what’s known as a boot-start driver. A boot-start driver is a device driver that must be installed to start the Windows operating system.

Most boot-start drivers are included in driver packages that are in the box with Windows, and Windows automatically installs these boot-start drivers during the first boot of the system. My guess is that CrowdStrike decided they didn’t want you booting at all without the protection provided by their system, but when it crashes, as it does now, your system is completely borked.

Fixing the Issue 

Fixing a machine with this issue is fortunately not a great deal of work, but it does require physical access to the machine. To fix a machine that’s crashed due to this issue, you need to boot it into safe mode, because safe mode only loads a limited set of drivers and mercifully can still get by without this boot-start driver.

You’ll still be able to get into at least a limited system. Then, to fix the machine, use the console or the file manager and go to the Windows directory, and then to System32\drivers\CrowdStrike. In that folder, find the file matching the pattern C-00000291*.sys (a C, then a bunch of zeros, then 291, with a .sys extension) and delete that file, or anything that’s got the 291 in it after a bunch of zeros. When you reboot, your system should come up completely normal and operational.

The absence of the update file fixes the issue and does not cause any additional ones. It’s a fair bet that the update 291 won’t ever be needed or used again, so you’re fine to nuke it.
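If you have many machines to check, the manual steps above could be scripted along these lines. This is only a sketch: the default directory and the `C-00000291*.sys` pattern are my reading of the description above, the function name is hypothetical, and it assumes Python is available in your recovery environment. It defaults to a dry run so nothing is deleted until you ask for it.

```python
from pathlib import Path

# Assumed default location and file pattern; adjust both for your environment.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_update_files(directory=DEFAULT_DIR, dry_run=True):
    """Find files matching PATTERN; delete them only when dry_run is False."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    matches = sorted(directory.glob(PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # List what would be removed without touching anything.
    for path in remove_update_files(dry_run=True):
        print(f"would delete: {path}")
```

Run it once as a dry run, confirm the listed files are the ones you expect, and only then call `remove_update_files(dry_run=False)`.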

Conclusion 

Further references 

CrowdStrike IT Outage Explained by a Windows Developer. YouTube, Dave’s Garage (13 minutes, 40 seconds).

Leveraging SFIA for Objective Downsizing: Safeguarding Your Digital Team’s Future

Utilizing the Skills Framework for the Information Age to Strategically Reduce Staff: Protecting the Future of Your Digital Workforce

In an ever-evolving digital landscape, organizations are continuously faced with the challenge of aligning their workforce capabilities with the strategic objectives and technological demands of the market. This occasionally necessitates the difficult decision of downsizing. 

However, when approached with a strategic framework such as the Skills Framework for the Information Age (SFIA), downsizing can be managed in a way that not only reduces the workforce but also strategically refines it, ensuring that the remaining team is more aligned with future goals. 

i. Understanding SFIA

The Skills Framework for the Information Age (SFIA) provides a comprehensive model for the identification of skills and competencies required in the digital era. It categorizes skills across various levels and domains, offering a structured approach to workforce development, assessment, and strategic alignment. By mapping out competencies in detail, SFIA allows organizations to objectively assess the skills available within their teams against those required to achieve their strategic goals.

ii. SFIA: A Framework for Fair and Transparent Downsizing

SFIA offers a standardized way to assess and compare employee skill sets. By leveraging SFIA, organizations can:

o Identify critical skills: Pinpoint the skills essential for current and future digital initiatives.

o Evaluate employee capabilities: Assess employees objectively based on their SFIA profiles, ensuring data-driven decisions.

o Maintain a strong digital core: Retain top talent with the most crucial skill sets to safeguard the team’s future.

iii. Strategic Downsizing with SFIA: A Guided Approach

A. Analyzing Current and Future Skill Requirements

The first step in leveraging SFIA for downsizing involves a thorough analysis of the current skill sets within the organization against the backdrop of the future skills required to meet evolving digital strategies. This diagnostic phase is critical in identifying not just surplus roles but also areas where the organization is at risk of skill shortages.

B. Objective Assessment and Decision Making

With SFIA, the assessment of each team member’s skills and competencies becomes data-driven and objective, mitigating biases that can often cloud downsizing decisions. This framework enables managers to make informed decisions about which roles are essential for future growth and which are redundant or can be merged with others for efficiency.

C. Skill Gaps and Redeployment

Identifying skill gaps through SFIA provides insights into potential areas for redeployment within the organization. Employees whose roles have been identified as redundant might possess other skills that are underutilized or could be valuable in other departments. This not only minimizes job losses but also strengthens other areas of the business.

D. Future-proofing Through Upskilling

SFIA also helps organizations to future-proof their remaining workforce through targeted upskilling. By understanding the precise skills that will be needed, companies can implement training programs that are highly relevant and beneficial, ensuring that their team is not only lean but also more capable and aligned with future digital challenges.

E. Communication and Support Structures

Effective communication is crucial during downsizing. Using the insights gained from the SFIA framework, leaders can better articulate the reasons behind the restructuring decisions, focusing on the strategic realignment towards future goals. Additionally, offering support structures for both departing and remaining employees, such as career counseling or upskilling opportunities, can help in maintaining morale and trust.

iv. Benefits of Leveraging SFIA for Downsizing

A. Objective Skills Assessment:

   o SFIA facilitates an objective assessment of employees’ skills and competencies, enabling organizations to identify redundancies, skill gaps, and areas of expertise within the digital team.

   o By basing downsizing decisions on skills rather than job titles or seniority, organizations can ensure alignment with strategic objectives and retain critical capabilities.

B. Strategic Workforce Planning:

   o SFIA supports strategic workforce planning by providing insights into the current skill landscape, future skill requirements, and potential areas for development within the digital team.

   o Organizations can use this information to align workforce capabilities with evolving business needs, anticipate skill shortages, and proactively address talent gaps.

C. Efficient Resource Allocation:

   o By leveraging SFIA to identify redundancies or underutilized skills, organizations can optimize resource allocation and streamline the digital team’s structure.

   o This ensures that resources are allocated effectively to high-priority projects and initiatives, maximizing productivity and return on investment.

D. Retaining Critical Capabilities:

   o SFIA enables organizations to identify and retain employees with critical skills and expertise essential for the success of digital initiatives.

   o By offering redeployment opportunities, upskilling programs, or knowledge transfer initiatives, organizations can retain valuable talent and maintain continuity in project delivery and innovation.

E. Enhancing Employee Engagement:

   o Involving employees in the skills assessment process and offering opportunities for redeployment or skills development demonstrates a commitment to employee development and engagement.

   o This approach fosters a positive organizational culture, enhances morale, and mitigates the negative impact of downsizing on remaining staff.

v. Beyond Downsizing: Building a Future-Proof Digital Team

While SFIA can aid in objective downsizing, it also promotes long-term digital team development:

o Skills gap analysis: Identify skill deficiencies across the team and implement training programs to bridge those gaps.

o Targeted upskilling: Invest in upskilling initiatives aligned with SFIA to prepare your team for future digital challenges.

o Succession planning: Leverage SFIA data to develop succession plans and cultivate future digital leaders.

vi. Conclusion

Downsizing, especially within digital and tech teams, poses the risk of eroding an organization’s competitive edge if not handled with foresight and precision. 

By employing the SFIA framework, businesses can approach this delicate process objectively, ensuring that decisions are made with a clear understanding of the skills and competencies that will drive future success. 

This not only helps in retaining a robust digital capability amidst workforce reduction but also aligns employee growth with the evolving needs of the organization. 

Ultimately, leveraging SFIA for objective downsizing serves as a strategic maneuver to safeguard your digital team’s future, ensuring the organization emerges stronger and more resilient in the face of challenges.


Building An Effective Crisis Management Team

Building an Effective Crisis Management Team: Preparing for the Unexpected

In today’s unpredictable world, businesses are constantly exposed to potential crises. These can range from public relations disasters and data breaches to natural disasters and supply chain disruptions. To navigate them, forming an effective crisis management team (CMT) is indispensable.

Having a well-prepared and effective crisis management team in place is crucial to navigating these tumultuous times successfully and protecting your organization’s reputation, operations, and employees.

i. Understanding the Role of a Crisis Management Team

A crisis management team is a group of individuals tasked with preparing for, responding to, and recovering from any emergency or crisis. This team is responsible not just for immediate response, but also for strategic planning to minimize the impact of crises on the organization’s operations, reputation, and stakeholders.

ii. Key Steps to Building an Effective Crisis Management Team

A. Selecting the Right Team Members

The composition of the team is critical. Members should be selected based on their expertise, decision-making abilities, and leadership skills. It’s essential to have a diverse group that includes representatives from various departments (e.g., HR, IT, operations, finance, and legal) to ensure all aspects of the organization are considered in crisis planning and response.

B. Defining Roles and Responsibilities

Clearly defined roles prevent confusion during a crisis. Each member should know their specific responsibilities, how they fit into the larger response effort, and who they report to or collaborate with within the team.

o Team Leader: Appoint a clear leader to guide the team’s overall response and ensure all members are informed and aligned.

o Communication Specialist: Designate a dedicated individual to manage external communications, including media relations and messaging to stakeholders.

o Internal Communications: Assign someone to handle internal communications, keeping employees informed, managing anxiety, and maintaining morale.

o Subject Matter Experts: Identify specific team members with expertise relevant to the potential crisis scenarios, who can offer specific guidance and support.

C. Training and Preparedness

Training is a cornerstone of an effective CMT. Regular drills and simulation exercises should be conducted to prepare the team for various crisis scenarios. This not only helps in refining response strategies but also in identifying potential gaps in preparedness. Continuous education on crisis management best practices is also vital.

D. Developing a Comprehensive Crisis Management Plan

A well-crafted crisis management plan (CMP) is the team’s playbook. It should outline the procedures for different types of crises, communication strategies, stakeholder management, and recovery processes.

o Identify Potential Risks: Conduct a thorough risk assessment to identify potential vulnerabilities and the likelihood of different crisis scenarios.

o Develop Response Protocols: Create detailed protocols for various crisis scenarios, outlining communication strategies, decision-making processes, and resource allocation plans.

o Regular Training and Drills: Regularly conduct training exercises and simulations to ensure the team is familiar with the plan, can work effectively together, and has practiced its roles under pressure.

E. Effective Communication

Communication during a crisis must be clear, consistent, and transparent. The CMT should establish protocols for internal and external communications, including predefined templates for public statements. It’s also crucial to identify a spokesperson skilled in media relations to ensure the organization speaks with one voice.

F. Stakeholder Engagement

Identifying and engaging stakeholders is critical before, during, and after a crisis. Understanding stakeholders’ expectations and concerns can guide the crisis response and communication strategy, helping to maintain trust and confidence in the organization.

G. Review and Learn

Post-crisis, the team should conduct a thorough review of the response to identify successes and areas for improvement. This should involve feedback from all levels of the organization and, where appropriate, from external stakeholders. Lessons learned should inform future revisions of the CMP.

H. Crisis Communication Tools

Invest in communication tools and platforms that facilitate efficient information sharing within the team and with stakeholders.

I. Continuous Improvement

Regularly review and update your crisis management plan and protocols to reflect evolving risks and lessons learned from past experiences.

iii. Conclusion

Building an effective crisis management team takes time, careful planning, and ongoing refinement. Such a team becomes the organization’s anchor during crises, providing direction, reducing chaos, and enabling a more resilient organization. By prioritizing the development of a skilled and prepared CMT, businesses can navigate crises with confidence, safeguarding their operations, reputation, and future.

Remember, a well-prepared team can help mitigate the impact of a crisis, protect your reputation, and ensure the continued success of your organization.


How can you manage a crisis in a highly regulated industry?

Managing a crisis in a highly regulated industry presents unique challenges and requires a calculated, swift, and compliant response. With a proactive and meticulous approach, it can be navigated successfully.

Effectively managing a crisis requires a comprehensive and well-coordinated approach that involves multiple stakeholders and a clear understanding of the regulatory landscape.

Here are some steps to consider:

A. Preparation and Prevention:

a. Establish a crisis management plan that outlines roles, responsibilities, communication protocols, and response procedures.

b. Conduct regular risk assessments to identify potential crisis scenarios and develop mitigation strategies.

c. Implement strong internal controls and risk management practices to prevent crises from occurring.

B. Understand Regulatory Obligations: Quickly assess and clearly understand the regulatory obligations that apply to the specific crisis situation. This includes legal requirements, reporting obligations, and any industry-specific regulations.

C. Early Detection and Response:

a. Establish clear channels for reporting and escalating potential crises.

b. Monitor industry news, social media, and internal sources for signs of emerging issues.

c. Respond promptly and decisively to crisis situations, taking initial steps to contain the situation and protect public safety.

D. Response Procedure: Establish a clear procedure about what steps to follow and who to notify during a crisis. It should assign responsibilities, provide guidance on decision-making regulations, and include steps for external and internal communication.

E. Stakeholder Engagement:

a. Engage with key stakeholders, including regulators, industry bodies, and community leaders, to seek support and cooperation.

b. Listen to concerns and feedback from stakeholders and incorporate their perspectives into the crisis response strategy.

c. Demonstrate a commitment to collaboration and transparency to build trust and maintain relationships.

F. Communication Strategy: Develop a comprehensive communication strategy that addresses both internal and external stakeholders. Clearly communicate the steps being taken to manage the crisis, comply with regulations, and ensure transparency.

G. Establish a Crisis Management Team: 

a. Form a cross-functional team, led by senior management, that includes representatives from key departments such as legal, compliance, communications, and operations.

b. The team should be responsible for making swift decisions, coordinating responses, and communicating with stakeholders (including regulatory bodies). 

H. Regulatory Compliance:

a. Thoroughly understand the applicable laws, regulations, and reporting requirements related to crisis management in your industry.

b. Work closely with regulatory agencies to ensure compliance with all requirements and maintain open communication channels.

c. Seek guidance from legal counsel to navigate complex regulatory issues and potential liability concerns.

I. Documented Evidence: Maintain well-documented evidence of every action taken during the crisis. This will not only aid in regulatory compliance but also provide valuable insights for future reference.

J. Compliance with Reporting Requirements: Ensure timely and accurate reporting to relevant regulatory authorities as required by law. This may involve notifying regulatory bodies of incidents, providing updates, and collaborating transparently throughout the crisis.

K. Communication and Transparency:

a. Communicate openly and transparently with stakeholders, including customers, employees, partners, media, regulatory bodies, and the public.

b. Provide accurate and timely information to address concerns and prevent misinformation from spreading.

c. Establish a designated spokesperson to represent the organization and convey its message consistently.

d. Use all available channels, such as press conferences, emails, social media, and your company’s website.

L. Liaise with Regulatory Bodies: 

a. In a highly regulated industry, cooperate fully with regulatory authorities to ensure that your crisis response complies with the rules and regulations that govern your industry.

b. Designate a point of contact to liaise with regulatory authorities. This individual should be well-versed in regulatory requirements and be able to communicate effectively with regulators during the crisis.

c. Keep regulators informed, submit the necessary reports, and follow the guidelines and procedures they provide.

d. Cooperate with regulators; they are often perceived as adversaries, but in a crisis, they can offer valuable advice and support.

M. Legal Counsel: Engage legal counsel early in the crisis response to provide guidance on legal implications, regulatory compliance, and potential liabilities. Legal experts can help navigate complex regulatory landscapes.

N. Comprehensive Documentation and Record Keeping: 

a. Maintain thorough records of all actions taken during the crisis along with their rationale. 

b. This includes communication records, decision-making processes, and compliance efforts. Accurate records are essential for regulatory inquiries and investigations.

c. Good records help you meet legal and regulatory requirements and provide valuable information for a post-crisis review.

O. Training and Preparedness:

a. Regularly train employees on crisis management procedures and ensure they are aware of their roles during a crisis. 

b. Preparedness helps streamline the response and ensures a more effective compliance strategy.

P. Regulatory Updates:

a. Stay informed about any updates or changes in regulations related to the crisis. 

b. Regulatory requirements may evolve, and staying current is crucial for maintaining compliance throughout the crisis management process.

Q. Internal Investigations: Conduct thorough internal investigations to determine the root cause of the crisis. This may involve collaboration between internal teams, external experts, and regulatory bodies, if necessary.

R. Engage External Experts: If the crisis requires specialized knowledge, consider engaging external experts or consultants who can provide insights into compliance issues, regulatory expectations, or specific industry challenges.

S. Collaborate with Industry Associations: Work collaboratively with industry associations to share best practices, insights, and lessons learned. Industry peers can offer valuable perspectives and support during challenging times.

T. Scenario Planning and Simulation: Conduct scenario planning and simulation exercises to prepare for potential crises. This helps identify gaps in the crisis response plan and ensures that teams are well-equipped to manage a crisis within regulatory constraints.

U. Recovery and Evaluation: 

a. Post-crisis, conduct a thorough evaluation to analyze the effectiveness of the crisis management strategy. 

b. This review provides valuable learnings and the opportunity to refine your plan, making it more robust for future scenarios. 

c. Review to understand the root cause, learn lessons, identify improvements that can be made to prevent future occurrences, and update your crisis management plan accordingly.

d. Share learnings across the organization to enhance crisis preparedness and response capabilities.

V. Continuous Improvement: After the crisis is resolved, conduct a thorough debriefing to assess the response and identify areas for improvement. Use the lessons learned to enhance crisis management procedures and ensure ongoing compliance readiness.

Crisis management in highly regulated industries requires a proactive and comprehensive approach that emphasizes prevention, early detection, effective communication, compliance, and stakeholder engagement. 

By implementing these key steps, organizations can effectively navigate crisis situations and minimize their impact on public safety, regulatory compliance, and brand reputation.

By combining regulatory expertise, effective communication, and a proactive approach, organizations in highly regulated industries can navigate crises while meeting compliance requirements and protecting their reputation.

https://www.reuters.com/legal/legalindustry/best-practices-crisis-management-preparation-2023-06-13/

https://www.bcg.com/capabilities/risk-management-and-compliance/compliance-and-crisis-management