How object storage can contribute to cybersecurity analytics [Q&A]

Data volumes are growing exponentially year after year. This means huge amounts of log data that security teams struggle to collect, analyze and act on in a timely manner.

As a result, security teams are inundated with data that is fragmented across locations and platforms. We spoke to Ugur Tigli, CTO of MinIO, to discuss how modern object storage can be used to build automated cybersecurity analytics pipelines that break down these barriers and enable security teams to act quickly on the information stored in log files.

BN: Why has object storage become the default storage choice for cybersecurity log analytics?

UT: To start with, modern object storage provides a unique suite of capabilities. It is cloud native and infinitely scalable. Certain object stores deliver performance, and a subset of those deliver performance at scale. It is also cost-effective.

Given that log analytics data can grow by 250TB per day, these are hard requirements.
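To put 250TB per day in perspective, here is a small Python calculation. It assumes decimal units (1TB = 10^12 bytes) and a perfectly even ingest rate; real log traffic is bursty, so peak write rates will be higher than the average shown.

```python
# Decimal (SI) units, as storage capacity is usually quoted: 1 TB = 10**12 bytes.
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_profile(tb_per_day: float) -> dict:
    """Convert a daily log volume into weekly/yearly totals and average throughput."""
    bytes_per_day = tb_per_day * TB
    return {
        "pb_per_week": bytes_per_day * 7 / PB,
        "pb_per_year": bytes_per_day * 365 / PB,
        # Average write rate the storage layer must sustain, in GB/s.
        "gb_per_second": bytes_per_day / SECONDS_PER_DAY / 10**9,
    }


profile = ingest_profile(250)
print(f"{profile['pb_per_week']:.2f} PB/week")   # 1.75 PB/week
print(f"{profile['pb_per_year']:.2f} PB/year")   # 91.25 PB/year
print(f"{profile['gb_per_second']:.2f} GB/s")    # 2.89 GB/s
```

Even before any queries run, the storage layer has to absorb nearly 3GB/s around the clock at this volume.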

Second, object storage provides a more flexible data model, enabling easy access to and analysis of log data. Because cybersecurity log data comes in various formats and object storage excels at storing unstructured data, it is a natural choice.

Object storage also supports metadata, which can be used to tag and categorize log data, making it easier to search and analyze. Optimally designed systems write the metadata atomically with the object, rather than relying on a third-party metadata database, which can collapse at scale.
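As a sketch of what "metadata written with the object" looks like on the wire, the function below builds the arguments for an S3-compatible PutObject request. User metadata travels as `x-amz-meta-*` headers in the same request as the data, and object tags are sent as a URL-encoded query string. The bucket, key layout and tag names (`log-source`, `severity`, `host`) are hypothetical examples, not a MinIO convention.

```python
from urllib.parse import urlencode


def build_put_params(bucket: str, key: str, body: bytes,
                     source: str, severity: str, host: str) -> dict:
    """Build keyword arguments for an S3-compatible PutObject call.

    Metadata is stored with the object itself (sent as x-amz-meta-* headers);
    Tagging must be a URL-encoded query string such as "k1=v1&k2=v2".
    Field names here are illustrative only.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "Metadata": {"log-source": source, "host": host},
        "Tagging": urlencode({"severity": severity, "source": source}),
    }


params = build_put_params("security-logs", "firewall/2024/01/01/part-00000.jsonl.gz",
                          b"...", source="firewall", severity="high", host="fw-01")
# With boto3 this would be sent as:
#   boto3.client("s3", endpoint_url=...).put_object(**params)
print(params["Tagging"])  # severity=high&source=firewall
```

Because metadata and tags arrive in the same request as the object body, there is no separate index to keep in sync with the data.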

Finally, object storage is designed to be highly available and durable. Given the regulatory and audit requirements surrounding cybersecurity log data, immutability, object locking, legal holds and governance-mode retention are must-have capabilities.
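The deletion rules behind these capabilities can be sketched as a simplified model of S3-style Object Lock, which S3-compatible stores such as MinIO expose. In GOVERNANCE mode a suitably privileged user can override the retention period; in COMPLIANCE mode nobody can delete the object version before its retain-until date; a legal hold blocks deletion until it is explicitly removed. The `LockState` type and `can_delete` helper are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LockState:
    mode: Optional[str]              # "GOVERNANCE", "COMPLIANCE" or None
    retain_until: Optional[datetime]
    legal_hold: bool = False


def can_delete(lock: LockState, now: datetime, bypass_governance: bool = False) -> bool:
    """Return True if an object version may be deleted under these simplified rules."""
    if lock.legal_hold:
        return False  # a legal hold blocks deletion until it is removed
    if lock.mode is None or lock.retain_until is None or now >= lock.retain_until:
        return True   # no retention configured, or retention has expired
    if lock.mode == "GOVERNANCE":
        return bypass_governance  # privileged users may explicitly override
    return False      # COMPLIANCE: no one can delete before retain_until


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
audit_log = LockState("COMPLIANCE", now + timedelta(days=365))
print(can_delete(audit_log, now))  # False
```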

BN: How does object storage help to reduce the risk of data loss or corruption when storing cybersecurity log data and what are the regulatory implications?

UT: Object storage is immutable by definition, but there are a number of techniques that can protect data from a range of catastrophic events.

Modern object stores use inline erasure coding to protect the data. Each object is split into data and parity shards that are distributed across multiple nodes in a cluster. This distribution provides redundancy: if a node fails, the object can still be reconstructed and read from the shards on the remaining nodes.
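The idea behind erasure coding can be illustrated with the simplest possible code: one XOR parity shard, which can rebuild any single lost shard. This is a deliberate simplification; production object stores typically use Reed-Solomon codes, which tolerate several lost shards at once. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal data shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def decode(shards: list, k: int, length: int) -> bytes:
    """Rebuild the original data when at most one shard (data or parity) is None."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost shard")
    if missing:
        shards = list(shards)
        # XOR of all surviving shards equals the missing one.
        shards[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return b"".join(shards[:k])[:length]


record = b'{"event":"login_failed","src":"10.0.0.7"}'
stored = encode(record, k=4)   # 4 data shards + 1 parity, e.g. one per node
stored[2] = None               # simulate losing the node holding shard 2
print(decode(stored, k=4, length=len(record)) == record)  # True
```

Note that the storage overhead here is 5/4 of the original size, far less than keeping full copies, which is why erasure coding is preferred inside a cluster.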

Given the mission-critical nature of log files in the modern enterprise, this level of resiliency is a hard requirement.

Object storage systems also typically use replication to ensure that data is stored in more than one location. Replication provides additional protection against data loss or corruption, as multiple copies of the data exist in the system. Replication can be set up across regions and even across clouds, and it should be strictly consistent to avoid data loss.
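One way to read "strictly consistent" is that a write is acknowledged only once every replica holds it, so no acknowledged data exists in just one place. The toy model below shows that all-or-nothing behavior with in-memory replicas and a rollback on failure. The `Replica` and `replicated_put` names are hypothetical, and the model ignores concurrent writers, crashes during rollback and network partitions, which real replication engines must handle.

```python
class ReplicaUnavailable(Exception):
    """Raised when a replication target cannot accept a write."""


class Replica:
    """In-memory stand-in for one replication target (node, site or cloud)."""

    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online
        self.objects: dict = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.online:
            raise ReplicaUnavailable(self.name)
        self.objects[key] = data


def replicated_put(replicas: list, key: str, data: bytes) -> None:
    """Return (acknowledge) only after every replica has stored the write.

    If any replica fails, restore the replicas already written to their
    previous state and re-raise, so a failed write never leaves them disagreeing.
    """
    written = []  # (replica, previous value, or None if the key was new)
    try:
        for replica in replicas:
            previous = replica.objects.get(key)
            replica.put(key, data)
            written.append((replica, previous))
    except ReplicaUnavailable:
        for replica, previous in written:
            if previous is None:
                del replica.objects[key]
            else:
                replica.objects[key] = previous
        raise


site_a, site_b = Replica("site-a"), Replica("site-b")
replicated_put([site_a, site_b], "logs/fw/part-00000.jsonl.gz", b"v1")
```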

Data retention requirements are becoming more significant and burdensome, so designing a system that meets the needs of the enterprise from both an analytics and a compliance perspective can be challenging. Lifecycle management ensures that active data is readily accessible while infrequently accessed data is stored economically, even at PB scale. Further, for compliance reasons the system has to support features like object locking, legal holds and governance-mode retention. For object storage, this is a natural extension of the concept of immutability.
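Lifecycle management is usually expressed as declarative rules attached to a bucket. The dictionary below follows the shape of an S3 lifecycle configuration (the structure boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`); the rule ID, `logs/` prefix, `COLD-TIER` tier name and day counts are hypothetical. The `planned_action` helper is an illustrative sketch of how such rules map an object's age to an action, not how any particular server evaluates them.

```python
LIFECYCLE = {
    "Rules": [
        {
            "ID": "tier-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            # Move objects to a cheaper tier 30 days after creation...
            "Transitions": [{"Days": 30, "StorageClass": "COLD-TIER"}],
            # ...and delete them once the (hypothetical) retention period ends.
            "Expiration": {"Days": 365},
        }
    ]
}


def planned_action(key: str, age_days: int, config: dict = LIFECYCLE) -> str:
    """Return the action the first matching enabled rule implies for an object."""
    for rule in config["Rules"]:
        prefix = rule.get("Filter", {}).get("Prefix", "")
        if rule["Status"] != "Enabled" or not key.startswith(prefix):
            continue
        expire_days = rule.get("Expiration", {}).get("Days")
        if expire_days is not None and age_days >= expire_days:
            return "expire"
        # Apply the latest transition the object has already reached.
        for t in sorted(rule.get("Transitions", []), key=lambda t: t["Days"], reverse=True):
            if age_days >= t["Days"]:
                return f"transition:{t['StorageClass']}"
        return "keep-hot"
    return "no-rule"


print(planned_action("logs/firewall/part-00000.jsonl.gz", 45))  # transition:COLD-TIER
```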

BN: What role does performance play in cybersecurity log data?

UT: Performance is a critical factor for cybersecurity log data. The key is to understand performance and how it relates to scale. Performance is easy when dealing with tens or hundreds of TBs, but log data does not stay at that size. It gets big, fast -- think PBs per week fast.

A system whose performance degrades after a week or two of data is not performant. It needs to deliver high throughput, and that throughput needs to scale linearly as the cluster grows. Anything less can lead to delays in detecting and responding to cybersecurity threats.

Object storage clusters can be expanded horizontally, allowing organizations to add more nodes to the cluster as data volumes increase. This keeps data storage and retrieval fast, even when dealing with large volumes of log data.
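When throughput scales linearly, capacity planning reduces to simple arithmetic: aggregate throughput is roughly node count times per-node throughput. The sketch below assumes that ideal; the 1GB/s per-node figure and 30% headroom are hypothetical placeholders, not vendor benchmarks. If scaling were sublinear, each added node would contribute less and the node count would climb faster than this estimate.

```python
import math


def nodes_needed(target_gbps: float, per_node_gbps: float, headroom: float = 0.3) -> int:
    """Nodes required to sustain target_gbps, assuming perfectly linear scaling.

    headroom reserves spare capacity (0.3 = plan for 30% above target) for
    ingest bursts, rebuilds after a node failure and concurrent queries.
    """
    required = target_gbps * (1 + headroom)
    return math.ceil(required / per_node_gbps)


# 250TB/day expressed as an average rate in GB/s (decimal units).
target = 250 * 10**12 / 86_400 / 10**9
print(nodes_needed(target, per_node_gbps=1.0))  # 4
```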

BN: What tools and technologies are typically used in conjunction with object storage for cybersecurity log analytics, and how do they enhance data processing and analysis?

UT: Cybersecurity log analytics is better thought of as a data pipeline, with the object store as a foundational component. As a result, a number of different tools are required to deliver insight at scale.

First, there are data ingestion tools for log files and network traffic, such as Logstash and Fluentd.
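Ingestion tools typically buffer events and write them to the object store in batches rather than one object per event. The sketch below shows one common pattern, grouping records by UTC hour into compressed JSON Lines objects under time-partitioned keys. The key layout, record fields (`ts`, `event`) and function names are hypothetical; a real shipper would also roll to `part-00001`, `part-00002` and so on as batches fill.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(source: str, ts: datetime, part: int) -> str:
    """Hive-style, time-partitioned object key (layout is illustrative)."""
    ts = ts.astimezone(timezone.utc)
    return (f"logs/source={source}/dt={ts:%Y-%m-%d}/hour={ts:%H}/"
            f"part-{part:05d}.jsonl.gz")


def batch_to_objects(source: str, records: list) -> dict:
    """Group records by UTC hour and gzip each group as JSON Lines.

    Returns {object_key: compressed_bytes}; each entry would become one PUT.
    """
    by_hour = defaultdict(list)
    for rec in records:
        ts = datetime.fromtimestamp(rec["ts"], tz=timezone.utc)
        by_hour[ts.replace(minute=0, second=0, microsecond=0)].append(rec)
    objects = {}
    for hour, recs in sorted(by_hour.items()):
        body = "\n".join(json.dumps(r, sort_keys=True) for r in recs).encode()
        objects[partition_key(source, hour, 0)] = gzip.compress(body)
    return objects


records = [
    {"ts": 1704067200, "event": "login_failed"},   # 2024-01-01 00:00 UTC
    {"ts": 1704067260, "event": "login_ok"},
    {"ts": 1704070800, "event": "login_failed"},   # 2024-01-01 01:00 UTC
]
objects = batch_to_objects("firewall", records)
```

Partitioning keys by source and time lets downstream query engines skip whole prefixes that fall outside a search window.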

Next are data processing and analysis tools. This is by far the biggest area from a solutions perspective: these tools power the real-time analysis required to detect and contain cybersecurity threats. Examples include Elasticsearch and Splunk, but also tools like CloudFabrix and Cribl.
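Much of this real-time analysis boils down to correlation rules over event streams. As an illustration only, here is a simplified brute-force detection rule: flag any source IP with five or more failed logins inside a 60-second sliding window. The event fields, threshold and window are hypothetical, and this is not how Elasticsearch, Splunk or the other tools named above implement their rules.

```python
from collections import defaultdict, deque


def brute_force_alerts(events, threshold: int = 5, window_s: int = 60) -> set:
    """Flag source IPs with >= threshold failed logins within window_s seconds.

    events: iterable of dicts with "ts" (epoch seconds), "src" and "event" keys,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # src -> timestamps of failures still in the window
    alerts = set()
    for e in events:
        if e["event"] != "login_failed":
            continue
        q = recent[e["src"]]
        q.append(e["ts"])
        # Drop failures that are now older than the sliding window.
        while q and e["ts"] - q[0] >= window_s:
            q.popleft()
        if len(q) >= threshold:
            alerts.add(e["src"])
    return alerts


burst = [{"ts": t, "src": "10.0.0.7", "event": "login_failed"} for t in range(0, 50, 10)]
slow = [{"ts": t, "src": "10.0.0.9", "event": "login_failed"} for t in range(0, 500, 100)]
events = sorted(burst + slow, key=lambda e: e["ts"])
print(brute_force_alerts(events))  # {'10.0.0.7'}
```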

Visualizing log data in a way that is easy to understand helps the investigation and detection process, as it makes patterns and trends easier to identify. Examples include Kibana and Grafana.

Machine learning and AI are at the forefront of the ongoing arms race that is cybersecurity. They analyze log data and detect anomalous behavior. Examples include Apache Spark, Presto/Trino and TensorFlow.
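Anomaly detection models learn what "normal" looks like and flag deviations. A much simpler statistical stand-in makes the idea concrete: compare each hour's event count with the mean and standard deviation of the preceding 24 hours and flag large z-scores. This is a swapped-in baseline technique for illustration, not a machine learning model of the kind Spark or TensorFlow would train; the thresholds and data are hypothetical.

```python
from statistics import mean, pstdev


def anomalous_hours(counts: list, z_threshold: float = 3.0, baseline: int = 24) -> list:
    """Return indexes of hours whose event count is a z-score outlier.

    Each hour is compared with the mean and population standard deviation of
    the preceding `baseline` hours; hours without a full baseline are skipped.
    """
    flagged = []
    for i in range(baseline, len(counts)):
        window = counts[i - baseline:i]
        mu, sigma = mean(window), pstdev(window)
        if sigma == 0:
            if counts[i] != mu:  # perfectly flat baseline: any change is unusual
                flagged.append(i)
            continue
        if abs(counts[i] - mu) / sigma >= z_threshold:
            flagged.append(i)
    return flagged


hourly = [100, 102, 98, 101, 99, 100] * 4 + [100, 450, 101]
print(anomalous_hours(hourly))  # [25]
```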

These tools are all part of an overall solution. An enterprise will need every category, and often multiple vendors within each one. The better the data store (in this case, object storage), the better the tools work.

BN: What is the best cloud to run large scale cybersecurity analytics?

UT: There is no 'best' cloud for cybersecurity analytics. If there were, it would be an amazing attack surface. The truth is that the best 'cloud' is the cloud operating model. Companies that adopt this model will succeed at a higher rate than their peers.

The cloud operating model involves several key elements:

  • Infrastructure as Code (IaC): The use of IaC tools and techniques to manage infrastructure and automate the deployment of resources. IaC allows for the rapid and consistent deployment of infrastructure, and helps to reduce the risk of errors and inconsistencies. Software-defined everything is the goal here.
  • Containers: The use of containerization technology such as Docker, orchestrated with Kubernetes, to deploy and manage applications. Containers provide a lightweight and portable way to package and deploy applications, and can help to improve consistency and isolate dependencies.
  • DevOps: The integration of development and operations teams to enable faster and more efficient development and deployment of applications. DevOps practices such as continuous integration and continuous deployment (CI/CD) can help to streamline the development process and reduce time to market.
  • Microservices: The use of microservices architecture to develop and deploy applications as a set of small, independent services. Microservices allow for greater scalability and flexibility, and can help to reduce complexity and improve maintainability.
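The core idea behind IaC is declarative: you describe the desired state, and tooling diffs it against what actually exists and applies only the changes. The toy "plan" step below captures that loop for a set of storage buckets. The resource names, settings and `plan` function are hypothetical and do not mirror any specific IaC tool's API.

```python
def plan(desired: dict, actual: dict) -> list:
    """Diff desired vs actual state and return ordered create/update/delete actions.

    Both arguments map a resource name to its settings dict, for example
    {"security-logs": {"versioning": True, "object_lock": True}}.
    """
    actions = []
    for name in sorted(desired.keys() - actual.keys()):
        actions.append(("create", name, desired[name]))
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            actions.append(("update", name, desired[name]))
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name, None))
    return actions


desired = {
    "security-logs": {"versioning": True, "object_lock": True},
    "ml-features": {"versioning": True, "object_lock": False},
}
actual = {
    "security-logs": {"versioning": True, "object_lock": False},
    "scratch": {"versioning": False, "object_lock": False},
}
for action in plan(desired, actual):
    print(action)
```

Running `plan` again after the changes are applied returns an empty list, which is what makes declarative deployments repeatable and consistent across environments.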

Overall, the cloud operating model is designed to enable organizations to take advantage of the scalability, flexibility, and cost-effectiveness of cloud resources. By leveraging IaC, DevOps, microservices, and containers, organizations can streamline their development and deployment processes, reduce infrastructure management overhead, and improve agility and scalability.

These principles apply everywhere: the public cloud, the private cloud, colocation facilities and the edge. Enterprises that adhere to them will have greater optionality and better outcomes.



© 1998-2024 BetaNews, Inc. All Rights Reserved.