Dark hours – post-incident recovery without procedures and documentation

In post-incident recovery, actions, procedures, and documentation are the key elements that allow an organization to get back on its feet. Reality, however, very often brutally exposes badly prepared plans. Below are two real-life scenarios in which many things went wrong, along with what should have been done instead.

SCENARIO I

A big global company in the chemical industry was attacked by cybercriminals and its data in branches across the world was encrypted. The organization refused to pay the ransom and decided to restore its infrastructure using data backups and paper documentation (which the law required the company to keep in its archive). The company accepted the risk that some of the data might be permanently lost. The operational technology was not infected; it had no direct connection to the IT infrastructure.

We were asked to help with the post-incident recovery by the attacked company’s business partner. On-site, we expected to receive proper documentation and procedures, but quickly realized there were none. There were technically competent people at the UK and US headquarters of the attacked company, but they were not prepared for an event affecting so many countries and regions around the world at the same time, on this scale. Their first idea for recovery did not work well in the branches. We were supposed to scan systems with a tool they provided and flag the healthy ones. If even one system was unhealthy, all the systems in that network had to be reinstalled, but there was no procedure describing how to do it, especially for such a vast number of systems and without proper documentation. We did it manually, as no installation automation existed. A much better approach would have been to set up all the systems at the same time from a previously prepared deployment server.

Halfway through the work, headquarters decided that it needed a different, customized system, and we had to start from the beginning. After installation, we realized there was no way to enter a login and password to get into the systems: they were not connected to Active Directory, and no admin account was accessible to us or anyone else. Long story short, HQ had made a mistake while preparing the new images and there was no time left for them to prepare and deliver new ones. We managed to deal with this problem. We also wrote some scripts that sped up our work. Finally, procedures were created in cooperation with us, to be implemented later in the other locations.
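To give an idea of the kind of ad hoc scripting involved, the following is a minimal sketch, not the actual tooling used in this engagement: it runs a vendor-supplied scanner (the binary path, the host list format, and the exit-code convention are all assumptions) against a list of hosts and records which ones come back clean, so that reinstallation decisions can be made per network segment.

```python
#!/usr/bin/env python3
"""Hypothetical triage helper: scan a list of hosts with a vendor-supplied
scanner and record which ones come back clean. Paths, file names, and the
scanner's exit-code convention are placeholders, not the real engagement's."""

import csv
import subprocess
from datetime import datetime, timezone

SCANNER = r"C:\IR\vendor_scanner.exe"   # assumed path to the scanner binary
HOSTS_FILE = "hosts.txt"                # one hostname or IP address per line
REPORT_FILE = "triage_report.csv"


def scan_host(host: str) -> str:
    """Return 'healthy' or 'unhealthy' based on the scanner's exit code
    (assumed convention: 0 = clean, anything else = findings or error)."""
    try:
        result = subprocess.run(
            [SCANNER, "--target", host],
            capture_output=True, text=True, timeout=600,
        )
    except subprocess.TimeoutExpired:
        return "unhealthy"  # treat a hung scan as a finding, to be safe
    return "healthy" if result.returncode == 0 else "unhealthy"


def main() -> None:
    with open(HOSTS_FILE, encoding="utf-8") as f:
        hosts = [line.strip() for line in f if line.strip()]

    with open(REPORT_FILE, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["host", "status", "checked_at"])
        for host in hosts:
            status = scan_host(host)
            writer.writerow(
                [host, status, datetime.now(timezone.utc).isoformat()]
            )
            print(f"{host}: {status}")


if __name__ == "__main__":
    main()
```

Even a small helper like this turns a manual, error-prone checklist into a repeatable record of which systems were flagged and when, which is exactly what a prepared procedure should have provided from the start.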

One of the reasons the company had so many problems was serious technological debt. It was not an issue with the IT of a particular branch, but with the entire organization, particularly at headquarters. There were 80 domain administrators, so the attack surface was extensive. The company was running old, unsupported systems such as Windows Server 2003. The attack vector was probably classic phishing followed by privilege escalation. The main servers were a mess, with lots of unnecessary software installed; they looked like regular workstations. Fortunately, not all company locations had been encrypted.
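This kind of technical debt is easy to measure before an incident. The sketch below, a hypothetical example rather than anything used in this case, counts the direct members of the Domain Admins group over LDAP; the server name, base DN, and credentials are placeholders, and nested group membership is deliberately ignored to keep the example short.

```python
#!/usr/bin/env python3
"""Hypothetical audit helper: count direct members of the Domain Admins group.

A minimal sketch using the ldap3 library; the domain controller, base DN, and
audit account are placeholders. Only direct members are counted; nested
groups are not resolved.
"""

import os

from ldap3 import Connection, Server, NTLM, SUBTREE

DC_HOST = "dc01.example.local"    # assumed domain controller
BASE_DN = "DC=example,DC=local"   # assumed directory base DN
AUDIT_USER = "EXAMPLE\\auditor"   # assumed read-only audit account


def count_domain_admins() -> int:
    server = Server(DC_HOST)
    conn = Connection(
        server,
        user=AUDIT_USER,
        password=os.environ["AUDIT_PASSWORD"],  # supplied via environment
        authentication=NTLM,
        auto_bind=True,
    )
    conn.search(
        search_base=BASE_DN,
        search_filter="(&(objectCategory=group)(cn=Domain Admins))",
        search_scope=SUBTREE,
        attributes=["member"],
    )
    if not conn.entries:
        return 0
    return len(conn.entries[0].member.values)


if __name__ == "__main__":
    admins = count_domain_admins()
    print(f"Direct members of Domain Admins: {admins}")
    if admins > 5:  # arbitrary threshold, purely for illustration
        print("Warning: consider reducing the number of privileged accounts.")
```

A periodic check like this, reviewed by someone outside the IT team, makes a figure such as "80 domain administrators" visible long before an attacker exploits it.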

SCENARIO II

A global technology company was attacked by a ransomware group and lost access to its encrypted data. In this case, the board decided to pay, and the criminals delivered the decryptor. Even after the files were decrypted, many systems still did not work properly and the company did not regain access to all of its resources. The documentation existed but was encrypted. The company also had backups, but no one could log into the backup systems because the authentication server itself was encrypted.

As in the previous case, we were asked for help by our partner. The main consulting company hired by the attacked organization was one of the “Big Four.” There was a procedure created by this large consulting company, but it did not fully work in the field. We received the hardware after the decryptor had been run, and our task was to bring everything back into operation. Headquarters was unable to start many processes, and the preconfigured device that was delivered to us turned out to be inoperative. We spent dozens of hours working with HQ specialists to solve those problems and to support them whenever the standard operating procedure did not work for them. Finally, we recovered all the systems and brought their production sites back to a fully operational state. During this time, we even reverse engineered the malware and the decryptor to understand how to decrypt files that at first looked like they might be lost forever.

Main issue
In post-incident recovery, actions, procedures, and documentation are the key elements that allow an organization to get back on its feet. Problems in such situations result from neglecting to rehearse catastrophic-level event scenarios and the rebuilding of the organization from an environment that no longer exists. It is very common for organizations to focus on the idea of having backups, while there is no fundamental analysis of ransomware-related risks and no decent business impact analysis. In our cases, the average time to recover from a catastrophic event turned out to be two weeks. During that time, the organization cannot produce goods, or its logistics department is not operational. In the first scenario, the company could not print the barrel labels that are legally required for this particular commodity to circulate. Production could run at full scale and the trucks were waiting, but nothing moved because the print servers were down.
Another source of problems is the narrowly scoped contracts that organizations sign with cybersecurity providers. In the first case, the provider was responsible only for cleaning the systems and getting rid of the malware. It did not care whether the company was operational once it had fulfilled its obligations; that was not the service it was paid for. It just wanted to clear the site and move on to the next one as soon as possible.

Companies’ reactions
After an incident, the company’s budget for the cybersecurity department usually becomes more generous. Unfortunately, memories are short, and after a few months security loses its importance again, especially when the proposed changes have an uncomfortable impact on the business, e.g., more standardization, less flexibility, or more restrictions on employees processing sensitive data. Sometimes companies fire the CISO or CTO and hire a new person for the position. This is always a mistake, especially just after an incident or, even worse, during one, because a new CISO needs a long time to understand the environment he or she has stepped into. A new person arrives with their own experience and very often replaces old solutions with completely new ones; replacement is not a way to fix the underlying issues. In many organizations there is no CISO position at all; the person responsible for security is an employee responsible for maintaining production, who does not want to complicate their life in the name of security.

Solution
Permanent cooperation with a managed security service provider (MSSP) could prevent such scenarios from developing. It is a cost-effective option for organizations without an in-house security operations centre. However, this solution has limitations: the service is generic and there is not much customization for a particular environment. Cooperation with an MSSP allows the organization to prepare for incidents and to introduce short-term and long-term post-incident strategies. It is necessary to consider different scenarios, even if only on paper. Playbooks appear in every security framework; they are standard, but the procedures very often have little in common with reality, or they are simply ignored.

Comments