How to build a secure and scalable anomaly detection system? – Part two

Wiktor Franus

In the first part of the article, I tried to explain what knowledge is needed to build the system. In this part, I will describe the tools without which the system could not work.


Central data repository (CDR)

Within Smart RDM, we most often use the OSIsoft PI data repository, a solution designed for and proven in manufacturing and utility industries around the world. In the CDR, we mainly use two of its components: the Asset Framework (AF), where we build the data structure, and the Data Archive, where we store time series. Additionally, we provide data visualization using the Smart RDM visualization module. This component, however, goes beyond the scope of the Big Data module. If you are curious, you can find more information at

We have ensured that the data is stored in a secure, certified repository. Now it is time for the next step in the analysis process: the computing environment.

Kubernetes – what is it and why do we use it?

Put simply, Kubernetes is a platform for managing, automating, and scaling containerized applications. It works with many container tools, and most cloud providers also support it.

A container is a virtually isolated slice of resources with specific parameters, such as the number of CPU cores, the amount of RAM, and the operating system image. In such an isolated environment we can perform operations related to, in our case, starting the processes that perform data analysis. What is the advantage of such a container? Probably the biggest is that it exists only for a certain amount of time: once the operation is completed, it is removed and its resources are released. Second, it is isolated from other environments and thus safe. Third, it makes it possible to minimize resource costs, for example in the cloud, precisely because it runs only for the time needed to perform the calculation.
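The article does not show the actual configuration, but the idea of a short-lived, resource-capped analysis container maps naturally onto a Kubernetes Job. Below is a minimal sketch of such a Job manifest built as a Python dict; the image name, resource limits, and TTL are my illustrative assumptions, not Smart RDM's real setup:

```python
# Illustrative sketch only: the image name, resource limits, and TTL
# are assumptions, not the actual Smart RDM configuration.

def make_analysis_job(name: str, image: str) -> dict:
    """Build a Kubernetes Job manifest for a one-off analysis container."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Delete the Job (and its pod) shortly after it finishes,
            # releasing the resources, as described in the article.
            "ttlSecondsAfterFinished": 60,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "analysis",
                        "image": image,
                        # Hard caps on CPU cores and RAM for the container.
                        "resources": {
                            "limits": {"cpu": "2", "memory": "4Gi"},
                        },
                    }],
                }
            },
        },
    }

job = make_analysis_job("anomaly-detection-run",
                        "example.azurecr.io/analysis:latest")
```

A manifest like this, submitted to the cluster, gives exactly the behaviour described above: the container runs once, is cleaned up automatically, and cannot exceed its declared CPU and memory budget.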

I mention the cloud, and the question of data security immediately arises.

I asked our experts about it, and I can say with full confidence: yes, the data is safe. I can say this because the cloud is only one deployment option: the analyses can be performed either on local resources separated from the network or in the cloud, and the data is not stored in the cloud.

The analysis process begins locally, in the central data repository (CDR), which in our case is by definition located on the client's resources. The process of retrieving the data needed for the analysis from the CDR runs on a local machine separated from the OT infrastructure. The repository and local network security guarantee security at this stage.
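For an OSIsoft PI repository, the local export step can read time series through the public PI Web API REST interface. The sketch below only builds the request URL for recorded values of one PI point; the host name, WebId, and time range are placeholders, not Smart RDM's actual environment:

```python
from urllib.parse import urlencode

# Hypothetical sketch: the endpoint shape follows the public PI Web API
# ("streams/{webId}/recorded"), but the host, WebId, and time range
# below are placeholders.

def recorded_values_url(base_url: str, web_id: str,
                        start: str, end: str) -> str:
    """Build the REST URL for reading recorded time-series values
    of one PI point over a given time window."""
    query = urlencode({"startTime": start, "endTime": end})
    return f"{base_url}/piwebapi/streams/{web_id}/recorded?{query}"

url = recorded_values_url(
    "https://pi.example.local", "P0abc123",
    "2023-01-01T00:00:00Z", "2023-01-02T00:00:00Z",
)
# The local export process would GET this URL over HTTPS (with the
# site's chosen authentication) and write the JSON response to files.
```

The key point for security is that this request never leaves the local network: only the exported files move on to the next stage.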

Next, the data is sent using a secure protocol to the Azure cloud, where it is saved as files. The container in which the analysis is performed is created within the same cloud. Then the files with the analysis results are sent back to the local machine, again over a secure protocol. As I wrote earlier, the container is then closed and all its data is deleted. Along the entire path, the data is secured in several ways. Whether the cloud or local resources are used to perform the analysis is, in my opinion, mostly a matter of the organisation's security policy, because the level of security is comparable.
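The file round trip described above can be sketched in a few lines. The upload itself is out of scope here, but computing a SHA-256 digest alongside each export file illustrates one common way to verify that data arrived unchanged after a transfer; the helper names are mine, not Smart RDM's:

```python
import hashlib
from pathlib import Path

# Hypothetical sketch: helper names and the metadata layout are
# illustrative, not part of the actual Smart RDM pipeline.

def sha256_of(path: Path) -> str:
    """Digest used to verify that a file arrived unchanged."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def prepare_upload(path: Path) -> dict:
    """Package a local export file together with its checksum.

    In the real pipeline, the bytes would be sent over TLS to the
    cloud (e.g. Azure Blob Storage), and the receiving side would
    recompute the digest and compare it before starting the analysis.
    """
    return {
        "name": path.name,
        "size": path.stat().st_size,
        "sha256": sha256_of(path),
    }
```

The same check can be applied in the opposite direction when the result files come back from the cloud to the local machine.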

Phew, that is not all I learned from Karolina and Wiktor; there is still enough material for a few more articles. I hope that what I wrote from a layman's perspective shows how vital expertise and proper tools are. Once again, I invite you to the remaining blog articles and to the Smart RDM website.

Interview conducted by: Jakub Ładyński