Data flow monitoring has existed for many years. My first experience was around 10 to 12 years ago when talking to members of the security team at a large US networking company. 

While monitoring the corporate network, they discovered that large flows of data were being transferred each night from one side of the US to the other.

On further investigation, several other similar flows were also identified. These were made up of 2GB compressed files, which was suspicious in itself, as this is typically the approach attackers take to exfiltrate data (2GB typically being the maximum compressed file size, and the maximum file size for most common 32-bit operating systems at the time).

As it transpired, the activity was not malicious, but instead the result of local offices backing up data to larger central offices. What it did confirm, however, is that identifying and analysing data flows could be used to detect malicious activity.

The team developed their own monitoring tools to capture metadata from the headers of network traffic (for example, source and destination IP addresses, service port numbers, data size and date/time), and other tools to analyse the resulting data.
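As a rough illustration of what such a tool produces, the sketch below groups packet header metadata into per-flow totals. It is not the team's actual tooling: the `PacketHeader` fields, the `summarise_flows` function and the sample addresses are all made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PacketHeader:
    """Metadata taken from one packet header (hypothetical field names)."""
    timestamp: datetime
    src_ip: str
    dst_ip: str
    dst_port: int
    size_bytes: int


def summarise_flows(packets):
    """Group headers by (source, destination, port) and total their sizes."""
    flows = defaultdict(
        lambda: {"bytes": 0, "packets": 0, "first_seen": None, "last_seen": None}
    )
    for p in packets:
        flow = flows[(p.src_ip, p.dst_ip, p.dst_port)]
        flow["bytes"] += p.size_bytes
        flow["packets"] += 1
        if flow["first_seen"] is None or p.timestamp < flow["first_seen"]:
            flow["first_seen"] = p.timestamp
        if flow["last_seen"] is None or p.timestamp > flow["last_seen"]:
            flow["last_seen"] = p.timestamp
    return dict(flows)


# Made-up sample data: two offices writing to one central server.
sample_packets = [
    PacketHeader(datetime(2018, 5, 1, 1, 0, 0), "10.1.1.5", "10.2.0.9", 445, 1400),
    PacketHeader(datetime(2018, 5, 1, 1, 0, 1), "10.1.1.5", "10.2.0.9", 445, 1400),
    PacketHeader(datetime(2018, 5, 1, 1, 0, 2), "10.1.1.6", "10.2.0.9", 445, 600),
]
flow_summary = summarise_flows(sample_packets)
```

Only header metadata is kept, not payloads, which is what makes flow data cheap enough to store and analyse at scale.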

This approach has developed over the years: data flow collectors are now readily available, and flow analysis is used in analytic and anomaly detection systems.

Typically, we would use data flow as part of our development of analytic use cases to detect malicious activity based on known advanced persistent threat (APT) habits, such as the transfer of large files to an internet destination.
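A minimal sketch of one such use case, flagging large transfers from an internal host to an internet destination, might look like the following. The 1GB threshold, the `INTERNAL_NETWORKS` range and the flow record layout are assumptions for illustration; real values depend on your own network and what counts as normal on it.

```python
import ipaddress

# Assumed threshold: flag any single flow of 1GB or more.
LARGE_TRANSFER_BYTES = 1 * 1024**3

# Assumed internal address space; replace with your own ranges.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_internal(ip):
    """Return True if the address falls inside a known internal range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETWORKS)


def large_outbound_flows(flows, threshold=LARGE_TRANSFER_BYTES):
    """Select flows that leave the internal network and meet the size threshold."""
    return [
        f for f in flows
        if is_internal(f["src_ip"])
        and not is_internal(f["dst_ip"])
        and f["bytes"] >= threshold
    ]


# Made-up flow records; 203.0.113.7 is a documentation-only address.
sample_flows = [
    {"src_ip": "10.1.1.5", "dst_ip": "10.2.0.9", "bytes": 2 * 1024**3},
    {"src_ip": "10.1.1.5", "dst_ip": "203.0.113.7", "bytes": 2 * 1024**3},
    {"src_ip": "10.1.1.6", "dst_ip": "203.0.113.7", "bytes": 4096},
]
alerts = large_outbound_flows(sample_flows)
```

In this sample only the second record is flagged: the first is large but stays inside the network (much like the office backups described earlier), and the third leaves the network but is small.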

Retention of data flow information over a period of 60 to 90 days is also very valuable in investigating an incident, as it provides the history of an intrusion and helps identify the attacker’s initial foothold.
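To show why the history matters, here is a sketch that prunes stored flow records to a retention window and then looks up the earliest record involving a suspect address. The record layout, function names and addresses are hypothetical, and a real collector would query a database rather than a Python list.

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 90  # upper end of the 60 to 90 day range


def prune_history(records, now, retention_days=RETENTION_DAYS):
    """Keep only records that fall inside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r["time"] >= cutoff]


def first_contact(records, suspect_ip):
    """Return the earliest record involving suspect_ip, or None if there is none."""
    matches = [r for r in records if suspect_ip in (r["src_ip"], r["dst_ip"])]
    return min(matches, key=lambda r: r["time"], default=None)


# Made-up history; 198.51.100.23 is a documentation-only address.
history = [
    {"time": datetime(2018, 1, 1, 9, 0), "src_ip": "10.1.1.5", "dst_ip": "10.2.0.9"},
    {"time": datetime(2018, 6, 10, 2, 30), "src_ip": "10.1.1.7", "dst_ip": "198.51.100.23"},
    {"time": datetime(2018, 5, 2, 23, 15), "src_ip": "198.51.100.23", "dst_ip": "10.1.1.7"},
]
retained = prune_history(history, now=datetime(2018, 6, 30))
earliest = first_contact(retained, "198.51.100.23")
```

If the retention window were much shorter, the earliest record could already have been discarded, and the investigation would start from a later point in the intrusion.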

To make the most of data flow information, it is essential that you know your network, hosts, servers and gateways, the value of your data, where it is stored and what data flows are expected. Experience shows that very few organisations track this, yet an attacker will discover it during the reconnaissance phase of an attack in order to identify what to target.

Creating and maintaining configuration control of your system is an essential first step in exploiting data flow. While asset discovery can be done manually, there are a number of tools available, both active and passive. Tool-supported discovery is the only realistic way to create and maintain an asset inventory, although it should be augmented with manual reviews that cover disconnected or dormant hosts.
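One simple way to combine the two sources is to compare the hosts a discovery tool has observed against the recorded inventory. The sketch below assumes both are available as plain lists of IP addresses, which is a simplification; the function name and sample addresses are invented for the example.

```python
def compare_inventory(inventory, observed):
    """Compare recorded assets with hosts actually seen on the network."""
    inventory, observed = set(inventory), set(observed)
    return {
        # Seen on the network but missing from the inventory.
        "unknown": sorted(observed - inventory),
        # Recorded in the inventory but not seen: candidates for manual review.
        "dormant": sorted(inventory - observed),
    }


# Made-up example data.
recorded_assets = ["10.1.1.5", "10.1.1.6", "10.2.0.9"]
observed_hosts = ["10.1.1.5", "10.2.0.9", "10.1.1.99"]
inventory_report = compare_inventory(recorded_assets, observed_hosts)
```

Hosts in the "dormant" list are the ones a tool cannot confirm, which is where the manual review earns its keep.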

The second step is identifying the data you really care about, where it is stored and whether it is encrypted. While there are tools to help with this, only you know what is valuable to your organisation. Once you know both your assets and your data, you can start identifying valid and, more importantly, invalid data flows.
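Expected flows can be written down as simple rules and every observed flow checked against them. The following is a sketch under assumed rules: the `EXPECTED_FLOWS` entries, networks and ports are invented for illustration and would need to reflect your own assets and data stores.

```python
import ipaddress

# Assumed rules: (source network, destination network, destination port).
EXPECTED_FLOWS = [
    ("10.1.0.0/16", "10.2.0.9/32", 445),  # local offices to the central backup server
    ("10.0.0.0/8", "10.2.0.53/32", 53),   # internal hosts to the internal DNS server
]


def is_expected(src_ip, dst_ip, dst_port, rules=EXPECTED_FLOWS):
    """Return True if the flow matches at least one expected-flow rule."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for src_net, dst_net, port in rules:
        if (
            src in ipaddress.ip_network(src_net)
            and dst in ipaddress.ip_network(dst_net)
            and dst_port == port
        ):
            return True
    return False


# Made-up observations: the last two match no rule and deserve a closer look.
observed = [
    ("10.1.1.5", "10.2.0.9", 445),
    ("10.1.1.5", "10.2.0.9", 3389),
    ("10.1.1.5", "203.0.113.7", 443),
]
invalid_flows = [flow for flow in observed if not is_expected(*flow)]
```

An allowlist like this starts out noisy, but each false alarm is a prompt either to add a legitimate rule or to question why the flow exists.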

Recently, data flow has taken on regulatory importance, as knowing the disposition of your sensitive data and how it flows through your network is part of complying with the General Data Protection Regulation (GDPR).

The new data law requires that organisations know what personal data they are collecting, and the source and method of collection; where and in what format it is stored, such as hard copy, digital, desktop, cloud or mobile; how it is moved and shared internally and externally; who is accountable for it at each stage in the business process; and who has access to it.

Organisations may also have similar obligations under the Payment Card Industry Data Security Standard (PCI DSS) for the protection of credit card numbers and associated information, and under ISO 27001.

Many companies have to do a significant amount of work to meet their regulatory requirements, so where possible the data flow information collected should be used to protect your systems as well as to satisfy regulation. The need for regulatory compliance should, however, help justify the budget for the work involved.
