Security problems in Industrial Control Systems (ICS), and in the industries that rely on them, have existed for nearly 15 years, but their impact and frequency keep increasing as more systems become IP-based and compute-based automation is introduced. Additionally, because ICSs are specialized, there is limited industry expertise spanning ICS, IT, and security. In terms of mitigation techniques, most vendors and organizations leverage generic security methodologies, resulting in a reactive approach.
The Triton/Trisis malware, discovered in late 2017, was significant in that it effectively targeted ICS environments and had broad penetration across multiple regions. It is noteworthy that, for the first time, malware targeted a specific vendor based on very specific knowledge of the underlying technology: Schneider Electric's Triconex Safety Instrumented System (SIS).
The common question in ICS-oriented blogs, social media outlets, and open forums is where and how ICS vendors should build better security into their products. The mitigation cannot rely solely on the deploying organization to build security around the deployment, nor can it be a reactive approach of fixing vulnerabilities in production as they are found. It begins with the ICS vendors building security in; however, as with most IT systems and applications, this will evolve over time. Thus, for the foreseeable future, the best operational outcomes must be planned: precisely planned as a phased approach and diligently executed, rather than chasing the rabbit. RSA's experience with customers and participation in industry events led to the development of the following framework/process as a good high-level representation of the stages to plan:
1. Discovery and Planning
Specific to ICS, the initial discovery and planning should allow the organization to:
Quantify the risk to the organization, specifically the risk impacting ICS. This is not to say that business systems should not be in scope; rather, a focused approach on ICS is required (business systems can be analyzed in parallel or as a separate initiative). Quantifying risk is critical to having a positive impact on the organization in the most efficient way possible, because it identifies the most important aspects of the organization.
Prioritize systems based on business impact. A key issue is trying to protect all systems at the same level at the same time. For most organizations there are not enough resources to accomplish this. Within the risk quantification approach you can better understand what systems need to be prioritized based on the risk to revenues/mission, organizational reputation, and regulatory non-compliance. A common approach is to follow the NIST FIPS 199 standard for security categorization defining how to perform a Confidentiality, Integrity and Availability impact analysis.
Gap analysis of where you are today and where you need to be to provide the optimal level of security to achieve the organization's acceptable risk tolerance. This includes technology, processes, and people.
Realistic, achievable planning that can be executed within the organization. The best evaluation and the best fact-based planning are worthless if the organization cannot take action to make improvements. This includes recognizing the all-important ability to apply resources and achieve improvements within defined timelines to meet the acceptable business risk tolerance level.
Ownership and roles model (more commonly known as RACI) for making sure the correct people within an organization know their roles, what they are responsible and accountable for, and who should be consulted and informed during the project.
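The prioritization step above can be sketched in a few lines. This is a minimal illustration, assuming each system has already been given a LOW, MODERATE, or HIGH rating for confidentiality, integrity, and availability in the style of FIPS 199. The system names, the ratings, and the tie-break order are hypothetical, and ranking by the highest of the three ratings is a simplification chosen for this sketch.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class SystemRating:
    name: str
    confidentiality: Impact
    integrity: Impact
    availability: Impact

    def overall(self) -> Impact:
        # High-water mark: rate the system by its worst-case objective.
        return max(self.confidentiality, self.integrity, self.availability)


def prioritize(systems):
    # Highest overall impact first; ties are broken by availability, then
    # integrity (the tie-break order is an assumption of this sketch).
    return sorted(
        systems,
        key=lambda s: (s.overall(), s.availability, s.integrity),
        reverse=True,
    )


# Hypothetical systems and ratings, for illustration only.
systems = [
    SystemRating("historian", Impact.MODERATE, Impact.MODERATE, Impact.LOW),
    SystemRating("safety-controller", Impact.LOW, Impact.HIGH, Impact.HIGH),
    SystemRating("engineering-workstation", Impact.MODERATE, Impact.HIGH, Impact.MODERATE),
]

for s in prioritize(systems):
    print(f"{s.name}: {s.overall().name}")
```

In practice the ratings would come out of the risk quantification exercise, and the ordering would feed the gap analysis and planning items above.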
Specific steps in this phase include network (air gap, partitioned/flat, protocol/service assessment) and systems (confidentiality, integrity, and availability) assessments. The idea is to map the complete network and analyze layers zero through five. Each system can be prioritized based on its impact to the organization by leveraging traditional risk quantification techniques. The initial assessment should identify the current monitoring capability, along with high-level gaps that may create network blind spots. Some common areas discovered during this assessment phase relate to access.
If the assessment reveals permanent remote connectivity to the network, then steps need to be taken to tightly configure and monitor it. Any change control should be stringently scrutinized and contain multiple checkpoints to deter a successful breach.
Internet connections with the sole purpose of downloading new firmware should be closely monitored, and the hash of any downloaded firmware (MD5, or preferably a stronger digest such as SHA-256 where the vendor publishes one) should be verified against a separately provided list from the vendor. Ensuring that ICS engineering Human-Machine Interfaces (HMIs) are segregated and that non-critical internet connections are blocked will help prevent direct incursion and reduce the human vulnerability element of the attack surface.
Any device used to transfer data between the Internet/IT network and the ICS network should be encrypted and presented to an air-gapped 'sheep dip' machine prior to being used on the ICS network.
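The firmware check described above could look something like the following. It is a sketch under stated assumptions: the vendor's digest list is modeled as a plain mapping from file name to hex digest, the file name is hypothetical, and SHA-256 is used by default (passing `algorithm="md5"` would match a vendor list that only carries MD5 values).

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=65536):
    # Hash the file in chunks so large firmware images are not loaded whole.
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def firmware_is_trusted(path, vendor_digests, algorithm="sha256"):
    # vendor_digests: {file name: hex digest}, obtained out of band from the
    # vendor (the format is an assumption of this sketch).
    expected = vendor_digests.get(Path(path).name)
    if expected is None:
        return False  # an image the vendor never listed is not trusted
    actual = file_digest(path, algorithm)
    return hmac.compare_digest(actual, expected.strip().lower())


# Demo with a throwaway file standing in for a downloaded image.
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "controller_fw.bin"  # hypothetical file name
    image.write_bytes(b"example firmware bytes")
    vendor_digests = {"controller_fw.bin": file_digest(image)}  # stand-in list
    ok = firmware_is_trusted(image, vendor_digests)
    image.write_bytes(b"tampered firmware bytes")
    tampered_ok = firmware_is_trusted(image, vendor_digests)

print(ok, tampered_ok)
```

The important property is that the reference digests arrive through a different channel than the firmware itself; a digest downloaded alongside the image proves little.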
If earlier questionnaire assessments highlight a flat network topology, then it's imperative to proceed with extreme caution. A flat network is difficult to monitor: there are no choke points on which to install network security monitoring devices and no areas of the network that are natively difficult to access, so an attacker is free to scan fully and move laterally, unimpeded by even simple firewalls. The first recommendation is to redesign the network architecture, zoning the network and creating monitored choke points with appropriate controls that can be used to both block and detect suspicious traffic between the zones.
If a network map cannot be provided, create a generic network map from the collected information to show a high-level before-and-after network topology and design. This helps explain the architectural changes required to monitor the network effectively.
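To make the zoning recommendation above more concrete, the sketch below models zones as subnets and choke points as an explicit allowlist of zone-to-zone paths. The zone names, subnets, and allowed paths are all hypothetical; a real design would come from the network assessment and would be enforced on firewalls rather than in a script.

```python
import ipaddress

# Hypothetical zones and subnets, for illustration only.
ZONES = {
    "enterprise": ipaddress.ip_network("10.10.0.0/16"),
    "dmz": ipaddress.ip_network("10.20.0.0/24"),
    "control": ipaddress.ip_network("10.30.0.0/24"),
}

# Allowed (source zone, destination zone) paths. Direction matters in this
# sketch: the enterprise zone may reach the DMZ, but never the control zone.
ALLOWED_PATHS = {("enterprise", "dmz"), ("dmz", "control")}


def zone_of(address):
    ip = ipaddress.ip_address(address)
    for name, network in ZONES.items():
        if ip in network:
            return name
    return "unknown"


def check_flow(src, dst):
    src_zone, dst_zone = zone_of(src), zone_of(dst)
    if src_zone == dst_zone and src_zone != "unknown":
        return "intra-zone"  # never crosses a choke point
    if (src_zone, dst_zone) in ALLOWED_PATHS:
        return "allowed"
    return "blocked"  # block and alert at the choke point


print(check_flow("10.10.5.1", "10.20.0.5"))
print(check_flow("10.10.5.1", "10.30.0.7"))
```

On a flat network every flow would fall into the intra-zone case, which is exactly why there is nothing to block or monitor.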
2. Intelligence Collection and Storage
Threat intelligence is the most neglected domain within the ICS environment. Today, there is no shortage of intelligence collection; rather, there is so much intelligence available that it can be difficult to discern which of it applies. What organizations lack is actionable intelligence. They need to understand, specific to their environment and circumstances, what intelligence is needed and whether it is actionable. For example, who is attacking the organization, for what gain, and what are their primary attack vectors and targets?
Actionable Intelligence is the output of collected information leading to data that once analyzed can be utilized in the defense and contextualization of events on the network. In other words, in all the noise you were able to discern actionable steps to reduce threats to the organization, reduce dwell time, or mitigate the impact.
Tools can be configured to collect a multitude of data; once it is collected, analysts can extract and create intelligence. It's important to understand that intelligence cannot be automated; only gathering and filtering can be assisted by automation and services. Only correlated and carefully vetted data becomes intelligence. If the data leaves open-ended questions, it's not intelligence. Intelligence is derived only when the questions have been answered.
The organization needs to define parameters to tailor intelligence for the following categories, internal and external:
Internal intelligence goes far beyond the initial project discovery phase. It holds the keys the adversary needs to successfully achieve an objective. Likewise, in order to successfully monitor and protect a network, the defense team also needs up-to-date and accurate internal intelligence.
Internal intelligence is to the defender what external intelligence is to the adversary (TTPs). Neither side wants the opposition to have those details. Therefore, internal intelligence is arguably the most important data set when considering cybersecurity.
Defense teams usually consider external threat intelligence to be most important, but this information is useless – even when an attack is found – if you cannot determine what the attack is targeting. NAT, flat, or unmonitored segments on a network may lead to an alert on the perimeter of a network with no viable capability of determining which internal asset the attack is targeting.
- People – Who owns the asset?
- Process – What should the asset be doing?
- Technology – What is the asset? (Client, Server, HMI, IED, PLC, etc.)
It's also critical to validate data as a network map may be out of date and contain incorrect information. Asset versions may also be wrong. Validation of this data should be fluid and constant in response to ad-hoc lookups and scheduled data verification tasks.
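One way to picture the internal intelligence described above is as a per-asset record covering people, process, and technology, with a timestamp that drives re-validation. The sketch below is illustrative only: the field names, the sample assets, and the 90-day re-verification window are assumptions, not values from any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Asset:
    address: str
    owner: str             # People: who owns the asset?
    expected_role: str     # Process: what should the asset be doing?
    asset_type: str        # Technology: client, server, HMI, IED, PLC, ...
    firmware_version: str
    last_verified: datetime


def lookup(inventory, address):
    # Ad-hoc lookup: which internal asset is an alert actually about?
    for asset in inventory:
        if asset.address == address:
            return asset
    return None


def stale_assets(inventory, now, max_age=timedelta(days=90)):
    # Scheduled verification: records older than max_age need re-checking.
    return [a for a in inventory if now - a.last_verified > max_age]


# Hypothetical inventory entries.
inventory = [
    Asset("10.30.0.7", "plant-ops", "pump control", "PLC", "2.1.4",
          datetime(2024, 1, 10)),
    Asset("10.30.0.20", "controls-eng", "operator display", "HMI", "7.0",
          datetime(2024, 5, 2)),
]

now = datetime(2024, 6, 1)
print([a.address for a in stale_assets(inventory, now)])
print(lookup(inventory, "10.30.0.20").owner)
```

A lookup that returns nothing is itself a finding: traffic to or from an address that is not in the inventory means the map is out of date or something unexpected is on the network.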
External intelligence is, by definition, the attacker's internal intelligence. This is the data the adversary wants to keep safe and away from defenders. An attacker with high-value malware will typically use multi-staged infection tactics so that the high-value component can be removed after it has exploited the vulnerability, leaving behind only the commodity malware that provides the command and control (C2) or remote access Trojan (RAT) functions. The key to obtaining this intelligence lies in rapid response and forensic analysis of an asset following infection. The quicker this can be achieved, the more likely you are to retrieve the high-value malware before the data is overwritten.
Attack groups usually reuse similar TTPs, much like a defense group always follows a set of processes. These TTPs can be gathered over time by collecting and storing data following attacks and breaches, regardless of how successful the attack was. Over time these TTPs can lead to advanced use cases that can be used to prevent breaches and potentially uncover targeted attack reconnaissance activity before the attack starts.
3. Monitoring & Detection
The primary ICS network monitoring method is via Network Security Monitoring (NSM). A response strategy should include the capability to install a monitoring device on the inner segments of the network during an incident or to conclude an investigation following a series of suspicious events. Monitoring should be based on:
Monitoring devices can be complex installations or simple ad-hoc devices. Some IDS solutions have ICS rules available as ICS becomes more mainstream and well-known to attackers and defenders alike. These monitoring devices must be installed as passive collection devices in order to prevent any degradation of traffic on the network.
Passive packet monitoring should be utilized to build a baseline of activities over time and later serve as a reference to identify abnormal activity, such as unusual or uncommon ports and protocols. This type of internal information should also be used to build the internal intelligence database.
In a well-baselined environment, abnormal activity is easy to pinpoint. For example, .exe and .dll files should never be seen transferred across an ICS network, and firmware files in a well-managed environment will only be sent from a specific engineering asset or group of assets. Any transfer or installation of firmware from an unusual device may indicate an attack or breach.
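The baselining idea above can be sketched as a set of known-good flows plus a couple of content rules. Everything specific here is an assumption for illustration: the `Flow` record stands in for whatever the passive sensor actually emits, the addresses are made up, and the firmware file extensions would differ by vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    protocol: str
    dst_port: int
    filename: str = ""  # set when a file transfer was seen in the capture


FORBIDDEN_EXTENSIONS = (".exe", ".dll")
FIRMWARE_EXTENSIONS = (".bin", ".fw")  # assumption: varies by vendor


def build_baseline(flows):
    # The baseline is simply the set of flow tuples seen during normal operation.
    return {(f.src, f.dst, f.protocol, f.dst_port) for f in flows}


def findings(flow, baseline, engineering_assets):
    results = []
    if (flow.src, flow.dst, flow.protocol, flow.dst_port) not in baseline:
        results.append("flow not in baseline")
    name = flow.filename.lower()
    if name.endswith(FORBIDDEN_EXTENSIONS):
        results.append("executable transfer")
    if name.endswith(FIRMWARE_EXTENSIONS) and flow.src not in engineering_assets:
        results.append("firmware from non-engineering asset")
    return results


# Hypothetical addresses: an HMI polling a PLC over Modbus/TCP (port 502).
normal = [Flow("10.30.0.20", "10.30.0.7", "tcp", 502)]
baseline = build_baseline(normal)
engineering_assets = {"10.30.0.5"}

suspect = Flow("10.30.0.44", "10.30.0.7", "tcp", 445, "update.dll")
print(findings(suspect, baseline, engineering_assets))
print(findings(normal[0], baseline, engineering_assets))
```

Anything flagged this way is a prompt for review with the ICS engineering team rather than proof of compromise, since a baseline captured over too short a period will miss legitimate but infrequent activity.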
As the majority of ICS devices are diskless, system logs may not always be available. However, some newer devices do produce system logs, which may be collectable.
Log collection assists in correlating network events with device events. When an asset is discovered or on-boarded, it should be determined whether logs are available and which types of events they can produce, in order to assist in monitoring for security events.
Understanding the difference between normal and abnormal events at this level requires baselining and creating metric reports. During baselining, frequent meetings with the ICS engineering team should be held to help interpret the data and to agree on which abnormal activity is acceptable and which is not.
Feeding Logs, Packets, Endpoint and parsed ICS OT events into a robust monitoring suite creates a fully contextualized flow of events occurring across the entire ICS network for true defense in depth and complete visibility into the critical infrastructure, as seen in Figure 2.
4. Incident Response
ICS Incident Response primarily focuses on the safety and visibility of the infrastructure; loss of visibility of critical devices can lead to devastation. Rapid response to ensure operational and safe running of the infrastructure is paramount.
The primary objective of incident response on an ICS infrastructure is the continuous, visible, and safe operation of the environment. During response, containment, and remediation, all details should flow through to the Content Creation and Intelligence teams so they can develop rapid detection content and research additional IOCs and TTPs. This allows the analyst to monitor for additional assets or threats which have not yet been identified.
Unlike traditional Incident Response, forensic analysis must take place following containment and remediation. Full forensic analysis should be conducted to determine how the breach occurred and uncover previously unknown vulnerabilities to allow a plan to be developed to further strengthen and protect the network.
A full and detailed analysis report should be produced at the end of the forensic analysis phase. It should identify any previously undetected indicators so that the assets can be fully remediated, the intelligence databases updated, and the detection content revised by content management to catch future compromise attempts.
"What's Measured Improves" Peter Drucker
Metrics are ultimately just numbers (cardinal numbers!). It is the context surrounding them and how we use them that drives improvement into our security programs. Stopping to address exactly what we want to achieve and then collecting the metrics to support the mission statement will drive RESULTS. Eventually we need to evaluate our security against what matters to us, from both the business and security sides. Knowing there are millions of EP's is all well and good, but what do they tell us? Perhaps the signatures are very loose or we need to do more on false positive reduction – filter the white noise?
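As one small example of putting context around a raw count, the sketch below turns triaged alerts into a false-positive rate per signature, which points at the rules that are too loose. The signature names, the verdict labels, and the sample data are hypothetical.

```python
from collections import Counter


def false_positive_rates(triaged_alerts):
    # triaged_alerts: iterable of (signature, verdict) pairs, where verdict is
    # "false_positive" or "true_positive" (labels are an assumption here).
    totals = Counter()
    false_positives = Counter()
    for signature, verdict in triaged_alerts:
        totals[signature] += 1
        if verdict == "false_positive":
            false_positives[signature] += 1
    return {sig: false_positives[sig] / totals[sig] for sig in totals}


# Hypothetical triage results.
triaged = [
    ("unusual-port", "false_positive"),
    ("unusual-port", "false_positive"),
    ("unusual-port", "false_positive"),
    ("unusual-port", "true_positive"),
    ("firmware-transfer", "true_positive"),
    ("firmware-transfer", "true_positive"),
]

rates = false_positive_rates(triaged)
for signature, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{signature}: {rate:.0%} false positives")
```

A signature that fires often and is almost always dismissed is a candidate for tuning; the same total alert count with a low false-positive rate tells a very different story.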
Security and risk management in today's fast paced world must be continuous. Changes in the threat landscape, technology, your own environment, business risk tolerance, and many other variables can make all your planning ineffective, or in the extreme, completely incorrect. Systematically collecting, evaluating, and adapting to information leveraging your existing approach is critical to success. Automation must be part of your strategy as no organization has enough people to manually manage a risk program, nor perform continuous security event and incident monitoring.
Reducing risk within an organization and aligning it to business priorities is critical. This convergence of security and risk is RSA® Business-Driven Security. Continually reducing the risk and securing ICS environments takes effort, time, and resources. The most effective way to accomplish anything in security is to leverage facts (intelligence), have a plan, and establish a sustainable and continuous operational program. There are quick wins, but there is no fast track to continually reducing the risk to your ICS environment and securing it at a 100% level. Risk reduction and security are a continual process, and the recommended risk mitigation strategy above, or one like it, is essential to an organization's success in dealing with risk across the ICS environment. The key is to implement a proactive approach that helps you minimize breach and cyber incident risk exposure over time. Success is never built in a day, but rather systematically, in a way that is sustainable and measurable over time.