An Analyst’s Dilemmas
Written by: Sai Molige
Have you ever watched the clip where Charlie Chaplin works in a factory? If you have not, give it a look: it is called “Modern Times-Factory Work.” To summarize, Chaplin has to work on metal pieces very fast, without missing a single item, before each one reaches another person for the next phase, and the chain continues until the product reaches more machinery. I mention this because in some Security Operations Center (SOC) environments the analysts are like Chaplin, having to work on parts (alerts) FAST, WITHOUT ERROR, and PASS THEM ON TO OTHERS.
We will take a look at some of the issues SOCs face from these moving parts that may hinder an analyst’s efficiency and consistency in their triaging process. More importantly, we will also discuss what an analyst may do to overcome these issues. Remember, many things are above the analyst’s pay grade, so we’ll only discuss what we as analysts have control over.
I have divided this blog into four parts: The Usual, Architectural, Log Augmentation, and Alerts. Later we will discuss some of the things an analyst may do to help themselves.
The number of alerts some SOCs have to deal with is just shy of 25,000/day. Because of this, two issues arise: analyst burnout and tribal knowledge, or unwritten knowledge not commonly known within a team. As the number of alerts grows, an analyst’s already heavy workload spikes. Now add “less efficient/skilled staff” to the mix. Together, these two factors increase the resources (time, energy, etc.) required from an already drained analyst. Tribal knowledge will not scale well if it is not transferred to the rest of the team. If you do not think it is an actual problem, read the “Phoenix Project” book and you will understand its effect at the team/organizational level.
To better understand how each component hinders an analyst’s work, let’s use an example. An analyst (let’s call him Chaplin) wants to work on an alert, but he has neither the rule the alert triggered on nor the context required around the triggered alert. So, he takes whatever info he has from the alert and searches the SIEM’s storage using the SIEM’s search engine. The info he wants does not come back right away, as there is too much data in the SIEM’s storage and the SIEM is not designed to handle significant amounts of writing, indexing, and querying at the same time. So, Chaplin trims his search to concentrate on one particular data set, and after it crawls for a while he sees some results related to the alert. The documents or logs returned are not enriched, have no common naming convention with which he can correlate them with other log sources, and, on top of that, the results are skewed because of time format/zone issues.
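The time format/zone problem in Chaplin’s search can be illustrated with a minimal sketch. It assumes two hypothetical log sources (`firewall` and `proxy`) whose timestamp formats and UTC offsets are known in advance; the source names, formats, and offsets are made up for illustration and are not taken from any particular SIEM.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-source rules: (timestamp format, offset of the source clock from UTC).
SOURCE_TIME_RULES = {
    "firewall": ("%Y-%m-%d %H:%M:%S", timedelta(hours=-5)),  # logs in local time, UTC-5
    "proxy": ("%d/%m/%Y %H:%M:%S", timedelta(hours=0)),      # logs already in UTC
}


def to_utc(source: str, raw_timestamp: str) -> datetime:
    """Parse a source-specific timestamp string and return it as an aware UTC datetime."""
    fmt, offset = SOURCE_TIME_RULES[source]
    local = datetime.strptime(raw_timestamp, fmt).replace(tzinfo=timezone(offset))
    return local.astimezone(timezone.utc)


# The same moment, written two different ways by two different sources.
firewall_time = to_utc("firewall", "2021-03-04 09:15:00")
proxy_time = to_utc("proxy", "04/03/2021 14:15:00")
print(firewall_time == proxy_time)
```

Once every source is converted to UTC at onboarding, events that happened at the same moment line up in a search instead of appearing hours apart.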
SIEM is full of logs:
Some of the questions one might ask are: what is the problem, why is it so important, and how does it affect an analyst? Let’s address them one by one:
When the log onboarding and enrichment process is not well understood, we end up with:
- “blind spots” in what we see and protect
- inconsistent field naming conventions
- ugly blocks of data that aren’t searchable
- slower query search and query return times
- missed opportunities to use out-of-the-box SIEM alerts and to create custom alerts
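The field naming problem in the list above can be sketched as a small normalization step applied at onboarding. The alias table and the shared field names (`source_ip`, `destination_ip`, `username`) are hypothetical; a real deployment would follow whatever common schema the team has agreed on.

```python
# Hypothetical mapping from source-specific field names to one shared schema.
FIELD_ALIASES = {
    "src": "source_ip",
    "src_ip": "source_ip",
    "SourceAddress": "source_ip",
    "dst": "destination_ip",
    "dest_ip": "destination_ip",
    "user": "username",
    "UserName": "username",
}


def normalize_fields(event: dict) -> dict:
    """Rename known aliases to the shared schema; leave unknown fields untouched."""
    return {FIELD_ALIASES.get(key, key): value for key, value in event.items()}


firewall_event = normalize_fields({"src": "10.0.0.5", "dst": "203.0.113.7"})
windows_event = normalize_fields({"SourceAddress": "10.0.0.5", "UserName": "alice"})

# After normalization both sources can be correlated on the same field name.
print(firewall_event["source_ip"] == windows_event["source_ip"])
```

With one name per concept, a single query on `source_ip` covers both sources, rather than one query per vendor spelling.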
This gives us a nice segue into our last set of problems in a SOC: alerts.
Alerts, around which the life of an analyst revolves:
This is one of the places an analyst spends a lot of time in a SIEM. An analyst is normally hired, appraised, and even fired based on the number of alerts they triage. Like logs, alerts can be generated from various places, and some of the most common ones are Intrusion Detection/Prevention Systems, firewalls, and Anti-Virus (AV) engines. Though you will get some information from these point products, they don’t give an analyst the full picture of what happened. Some of the issues an analyst faces with alerts generated by point products are:
- Too many technologies generate alerts, which forces an analyst to learn each of them in order to triage
- Lack of context on alerts makes an analyst’s work cumbersome
- Missing rule code leaves the analyst hanging on the question “what is the reason the alert was triggered?”
- Not knowing what they are protecting will thwart an analyst’s ability to prioritize alerts
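The context and prioritization issues above can be sketched as a lookup against an asset inventory. Everything here is an assumption made for illustration: the inventory contents, the `host_ip` and `severity` alert fields, and the scoring rule (severity multiplied by asset criticality) are hypothetical and not a recommended formula.

```python
# Hypothetical asset inventory keyed by IP address (criticality: 1 = low, 3 = high).
ASSET_INVENTORY = {
    "10.0.0.5": {"role": "domain controller", "criticality": 3},
    "10.0.1.20": {"role": "test workstation", "criticality": 1},
}
UNKNOWN_ASSET = {"role": "unknown", "criticality": 2}


def enrich_alert(alert: dict) -> dict:
    """Return a copy of the alert with asset context and a simple priority score."""
    asset = ASSET_INVENTORY.get(alert["host_ip"], UNKNOWN_ASSET)
    enriched = dict(alert)
    enriched["asset_role"] = asset["role"]
    enriched["priority"] = alert["severity"] * asset["criticality"]
    return enriched


def prioritize(alerts: list) -> list:
    """Enrich every alert and sort the result from highest to lowest priority."""
    return sorted((enrich_alert(a) for a in alerts), key=lambda a: a["priority"], reverse=True)


queue = prioritize([
    {"id": 1, "host_ip": "10.0.1.20", "severity": 3},
    {"id": 2, "host_ip": "10.0.0.5", "severity": 2},
])
print([(a["id"], a["asset_role"], a["priority"]) for a in queue])
```

In this sketch a medium-severity alert on a domain controller is ranked ahead of a higher-severity alert on a test workstation, which is the kind of decision an analyst cannot make without knowing what is being protected.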
Due to the above-mentioned problems, you are making an analyst work for the SIEM instead of the SIEM working for an analyst (sorry Justin Henderson, I stole your quote).
Anything we can do?
- Automate some of the common processes/steps an analyst performs during an investigation to help with short staffing. It need not be complex; it can be as simple as using tools like ThreatMiner.
- Documentation is an important part of any organization, and in a SOC it will help reduce the training period and help analysts follow set policies and procedures.
- Trim the fat: drop logs that provide little or no value, to improve storage and search speed.
- Understand the business policies behind collecting ALL data; that requirement usually comes from misunderstanding the framework. Keeping only relevant, value-rich data will improve search speeds.
- If the data is not correlated in the logs/alerts, the best place to start is to get familiar with where the product sits in the organization’s architecture. This way, an analyst can see the purpose of the asset and what other data sources the logs can be correlated with.
- Another thing that can help when searching logs is creating some predefined questions about what you want to see in your logs, such as determining the goal of the search.
- To better convey the importance of these, don’t be afraid to do a small PoC and present it to the management.
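As an example of how simple the automation mentioned above can be, here is a minimal sketch that pulls indicators out of raw alert text so they can be pasted into a lookup tool. The function name and the sample alert text are hypothetical, and the patterns are deliberately loose (the IPv4 pattern does not check that each octet is 255 or less).

```python
import re

# Loose patterns for a first pass; they favor recall over strict validation.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")


def extract_indicators(text: str) -> dict:
    """Return the de-duplicated, sorted indicators found in a block of alert text."""
    return {
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "md5": sorted(set(MD5_RE.findall(text))),
        "sha256": sorted(set(SHA256_RE.findall(text))),
    }


alert_text = (
    "Blocked connection from 203.0.113.7 to 198.51.100.23; "
    "file hash d41d8cd98f00b204e9800998ecf8427e seen again from 203.0.113.7"
)
indicators = extract_indicators(alert_text)
print(indicators)
```

A helper like this also doubles as a small PoC: it is easy to show how much copy-and-paste time it removes from each investigation.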
This is a TL;DR version of a series of blogs. If you like this and want to know more, please check the blog and watch the associated webinar.