The Importance of NetFlow Reliability
2020-11-02 | 9 min read
Network security, monitoring and analysis tools can be classified in many ways, including by how they ingest traffic: some ingest raw data, that is, the full packet streams, while others ingest a stripped-down version of it. This stripped-down data is often called metadata and contains only the basic information about the raw traffic that the tools need to do their job. Many tools would be overwhelmed by the sheer bandwidth and processing power needed to ingest raw data and have “adapted” to live off metadata just fine.
NetFlow has been an industry standard feature for generating, exporting and ingesting metadata, ever since it was developed by Cisco and adopted by most networking vendors. NetFlow version 9 and version 10, which is also called IPFIX, are the most popular implementations, and they are used by a significant number of forensics, compliance, SIEM and monitoring tools, and even for re-building real network traffic patterns, such as in the case of Keysight’s TrafficREWIND.
NetFlow v9 uses a list of standard, fixed-size flow fields, while IPFIX made it easy to introduce custom, variable-size fields, meaning that anyone can export any information they want about the traffic flows, with virtually no limitation. Keysight leveraged this in AppStack’s IxFlow, a v10 implementation that supports 130+ (and counting) custom flow fields in addition to the standard ones.
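To make the "variable-size fields" point concrete, here is a minimal sketch of how IPFIX encodes a variable-length field value on the wire, following the length-prefix rule in RFC 7011 (section 7). The function names `encode_varlen` and `decode_varlen` are made up for this illustration, and the sketch says nothing about how IxFlow or any particular exporter implements it.

```python
import struct


def encode_varlen(value: bytes) -> bytes:
    """Encode one variable-length IPFIX field value (RFC 7011, section 7)."""
    n = len(value)
    if n < 255:
        # Short form: a single length octet precedes the value.
        return bytes([n]) + value
    if n > 0xFFFF:
        raise ValueError("value too long for a 16-bit length")
    # Long form: marker octet 255, then a 16-bit big-endian length.
    return b"\xff" + struct.pack("!H", n) + value


def decode_varlen(buf: bytes, offset: int = 0):
    """Return (value, next_offset) for the field starting at `offset`."""
    n = buf[offset]
    offset += 1
    if n == 255:
        # Long form: the real length is in the next two octets.
        (n,) = struct.unpack_from("!H", buf, offset)
        offset += 2
    end = offset + n
    if end > len(buf):
        raise ValueError("truncated field")
    return buf[offset:end], end
```

In the template, such a field is declared with the reserved field length 65535, which tells the collector to read the length from the data record itself rather than from the template.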
The specialized tools I mentioned above rely solely on metadata to perform critical tasks such as monitoring QoS and network health or detecting breaches and attacks. If the metadata exported to these tools is incomplete or plain wrong, then the tools’ work is compromised or severely degraded; as they say, “garbage in – garbage out”.
In its latest report, the Tolly Group compares Keysight Vision X and Gigamon GigaVUE-HC3, including their NetFlow implementations, and checks them for performance and accuracy. You can read about this testing report here: Vision X KOs Gigamon GigaVUE-HC3 in Tolly Tests.
So, what do the results mean?
Accurate NetFlow records require protocols and applications to be identified accurately. Vision X meets this requirement with flying colors, under both light and high load. GigaVUE-HC3, with a failure rate of nearly 50% in identifying protocols under higher load, clearly falls short of it.
Consequently, 100% of the filtered application traffic arrives at the NetFlow engine on Vision X, while only about 50% of it arrives on GigaVUE-HC3’s Metadata engine.
The immediate consequence of this behavior is that Vision X exports NetFlow and IxFlow records for 100% of traffic, but GigaVUE-HC3 misses about 50%, resulting in poor-quality flow records. HC3’s case is undesirable for several reasons:
- Tools receive only a fraction of the information about the network. Instead of knowing the reality, for example, that there are 100 active users with a total bandwidth of 1 Gbps, the tools think there are only 50 users with a bandwidth of 500 Mbps. This can have serious implications for QoS monitoring and capacity planning. Network admins might rest assured thinking that the backbone link is lightly utilized, when in fact it is almost full.
- The info about protocol distribution is severely skewed, affecting the ability to assess what applications are used and how much. For example, Facebook and YouTube might show up as only using 10% of bandwidth, when they’re actually taking up a much higher ratio.
- Traffic reconstruction from the NetFlow records is far from reflecting the realistic mix on the production network. Thus, using reconstruction tools to test the network yields irrelevant results.
- Suspicious and malicious apps and activities get lost in the large amount of “unknown” traffic which is not identified. Unknown traffic is not reported as metadata, so the info about what it contains is lost to the tools. Attacks and breaches are not detected in due time, leading to data and financial loss.
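The capacity-planning example in the first bullet can be worked through in a few lines. This is only a sketch: the function name `reported_view` is invented here, the 1 Gbps link capacity is assumed so that the link is "almost full", and the missing flows are assumed to be spread evenly across users and bandwidth, which real traffic will not do exactly.

```python
def reported_view(actual_users, actual_mbps, link_capacity_mbps, export_ratio):
    """What a collector sees when only `export_ratio` of flows are exported.

    Simplifying assumption: the missing flows are spread evenly across
    users and bandwidth.
    """
    seen_mbps = actual_mbps * export_ratio
    return {
        "users": round(actual_users * export_ratio),
        "mbps": seen_mbps,
        "utilization": seen_mbps / link_capacity_mbps,
    }


# 100 users pushing 1 Gbps over an (assumed) 1 Gbps backbone link.
full = reported_view(100, 1000, 1000, export_ratio=1.0)
half = reported_view(100, 1000, 1000, export_ratio=0.5)
```

With complete export the tool sees a saturated link; with half of the records missing it sees 50 users, 500 Mbps and 50% utilization, which is exactly the false comfort described above.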
Next, let’s look at performance. The table below summarizes the rounded maximum rate at which the two products can generate and export flow records, reliably, without loss or packet drops. The results are given per CPU, for basic single-application HTTP or DNS bi-directional flows, and the units of measurement are Connections Per Second (CPS) for TCP and Frames Per Second (FPS) for UDP.
| Test | Vision X | GigaVUE-HC3 |
|---|---|---|
| NetFlow v9, HTTP/TCP | 300k CPS | 25k CPS |
| NetFlow v10, HTTP/TCP | 100k CPS | 75k CPS |
| NetFlow v10, DNS/UDP | 125k FPS | 125k FPS |
Reliability isn’t only about generating flow records, but also about transporting the exported flows. Most visibility network links are reliable, so NetFlow can easily be transported over User Datagram Protocol (UDP), which is the most common practice. However, when the transport network is unreliable, reliable methods and protocols are needed. Luckily, Vision X is able to export flows over Stream Control Transmission Protocol (SCTP), a reliable, message-oriented, congestion-aware protocol, which also happens to be good with jumbo frames containing lots of information fields.
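UDP gives no delivery guarantee, so loss on the export path is silent unless the collector looks for it. One common way to notice it is the sequence number in the export packet header. The sketch below assumes the 20-byte NetFlow v9 header layout from RFC 3954, where the sequence number increases by one per export packet; `parse_v9_header` and `LossTracker` are hypothetical names for this illustration, not part of any product or library. (IPFIX counts data records rather than packets in its sequence field, so this logic would need adjusting there.)

```python
import struct

# NetFlow v9 header (RFC 3954): version, count, sysUptime, unixSecs,
# sequence number, source ID. All fields big-endian, 20 bytes in total.
V9_HEADER = struct.Struct("!HHIIII")


def parse_v9_header(datagram: bytes) -> dict:
    if len(datagram) < V9_HEADER.size:
        raise ValueError("datagram shorter than a NetFlow v9 header")
    version, count, _uptime, _secs, seq, source_id = V9_HEADER.unpack_from(datagram)
    if version != 9:
        raise ValueError(f"not a NetFlow v9 packet (version={version})")
    return {"count": count, "sequence": seq, "source_id": source_id}


class LossTracker:
    """Counts export packets that never arrived, per source ID.

    Simplification: a real collector would key on (exporter address,
    source ID) and handle exporter restarts.
    """

    def __init__(self):
        self._last = {}   # source_id -> last sequence number seen
        self.missing = 0  # total export packets presumed lost

    def observe(self, header: dict) -> int:
        sid, seq = header["source_id"], header["sequence"]
        last = self._last.get(sid)
        if last is None:
            self._last[sid] = seq
            return 0
        gap = (seq - last - 1) % 2**32  # the counter wraps at 32 bits
        if gap >= 2**31:
            # Sequence went backwards: treat as a late or duplicate packet.
            return 0
        self._last[sid] = seq
        self.missing += gap
        return gap
```

A tracker like this only detects loss after the fact; it cannot recover the records, which is why a reliable transport such as SCTP matters when the export path itself is lossy.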
Let’s take the example of UDP, supported by both Vision X and GigaVUE-HC3, and see what the actual export looks like on the wire:
It looks like the Gigamon exporter, by default, creates huge NetFlow packets and then breaks them into fragments when transmitting on the wire, while the Keysight exporter sends standard-length, non-fragmented frames. I’m not a big fan of fragmentation, for two reasons:
- When a single one of the IP fragments is lost in transit, the original jumbo packet cannot be re-assembled, and the info is lost. This is particularly annoying with connectionless protocols such as UDP which have no way to re-transmit failed packets.
- Collector tools need to perform the task of fragment re-assembly. Some tools may have no issue with that, but others might not support re-assembly or might have their performance degraded. You never know, so it’s best to stay away from fragmentation altogether.
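The first point is easy to quantify. The sketch below estimates how many IPv4 fragments a UDP export datagram becomes and how likely the whole datagram is to be lost. The payload size, MTU and loss rate are illustrative assumptions, not measurements from the Tolly report, and the model assumes a 20-byte IPv4 header without options and independent loss per fragment.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def fragment_count(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram."""
    ip_payload = udp_payload + UDP_HEADER
    max_per_packet = mtu - IPV4_HEADER
    if ip_payload <= max_per_packet:
        return 1  # fits in a single, unfragmented packet
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = max_per_packet // 8 * 8
    return 1 + math.ceil((ip_payload - max_per_packet) / per_fragment)


def datagram_loss_probability(fragments: int, packet_loss: float) -> float:
    """The datagram is lost if any one of its fragments is lost."""
    return 1.0 - (1.0 - packet_loss) ** fragments
```

For example, `fragment_count(8000)` is 6 at a 1500-byte MTU, and with 1% packet loss `datagram_loss_probability(6, 0.01)` is about 5.9%, compared with 1% for a datagram that fits in a single frame.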
NetFlow accuracy and reliability under pressure are essential for a company’s network monitoring and security posture.
Vision X has demonstrated the ability to export accurate and rich flow records with high-performance identification of application traffic.
GigaVUE-HC3 struggles to keep up on performance, and the accuracy of its flow records for high-bandwidth application traffic is severely impaired. The consequences for the business could range from missed QoS/KPI targets and angry customers to security breaches and compliance issues.