Wireshark comes with a wide range of cool analysis tools, but I have yet to find a way to measure jitter and delay across a network node that works with any kind of traffic. You might be able to use Cisco's IP SLA and graph the results using SNMP. That requires two Cisco routers, one on each side of the node in question, with no other nodes in between, which might not match your topology. You also won't measure the real traffic, as IP SLA generates its own traffic for measurement. With tap devices and the method described below, you can measure directly on the TCP or UDP traffic going through the node.


The setup

In my setup, which is actually a GNS3 lab, I have two networks linked together by a router. On each interface of the router, I have attached a tap[1] device that sends all traffic on that link to my analyzer PC. Each tap is connected to its own network card on the analyzer PC, named tap1 and tap2.


Packet order misalignment

The first thing worth noticing when capturing from two interfaces at once is out-of-order packets:

stdbuf -o 0 tshark -i tap1 -i tap2 -T fields -e frame.time_epoch -e ip.src -e ip.dst -e tcp.checksum 'tcp port 8000'


The capture on each interface is handled by separate, unsynchronized threads in the operating system kernel. This is a known issue with Wireshark / Tshark. Misordered packets can be fixed with reordercap if the captures are saved to files. But since I want to display the results live using feedgnuplot, I need the packets from the two interfaces to arrive in the right order.

A solution for this is to combine the two interfaces into a virtual bridge and capture from the bridge instead.

A bridge is not supposed to reorder packets. So capturing from the bridge captures from both interfaces at once, and the packets will be in order.

Since the bridge is an additional device, even if a virtual one, I was worried that the timestamps of captured packets would be affected when they pass from the capture interface into the bridge. This is not the case: the timestamp of a captured packet is preserved when it enters the virtual bridge. The commands below create a bridge named tapbr0 and add the interfaces tap1 and tap2 to it:

m-lund@mlu-Latitude-E5520:~$ brctl addbr tapbr0
m-lund@mlu-Latitude-E5520:~$ brctl addif tapbr0 tap1
m-lund@mlu-Latitude-E5520:~$ brctl addif tapbr0 tap2
m-lund@mlu-Latitude-E5520:~$ brctl show
bridge name     bridge id               STP enabled     interfaces
tapbr0          8000.86d22f1edfc5       no              tap1
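
On newer distributions brctl (from bridge-utils) is often not installed by default, and the same bridge can be built with ip from iproute2. The sketch below only prints the commands so they can be reviewed first. The function name bridge_commands is made up for illustration, and bringing the interfaces up is an assumption about what your setup needs.

```shell
#!/bin/sh
# Print the iproute2 commands that build a capture bridge.
# Usage: bridge_commands BRIDGE IFACE...
# (bridge_commands is a hypothetical helper, not part of any tool.)
bridge_commands() {
    bridge=$1
    shift
    echo "ip link add name $bridge type bridge"
    for iface in "$@"; do
        # Attach the capture interface to the bridge and bring it up.
        echo "ip link set dev $iface master $bridge"
        echo "ip link set dev $iface up"
    done
    echo "ip link set dev $bridge up"
}

# Review the output first, then pipe it to "sudo sh" to apply it.
bridge_commands tapbr0 tap1 tap2
```

Printing instead of executing keeps the snippet harmless to run as an ordinary user; the commands themselves need root privileges.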

Keep in mind that a virtual bridge is in fact a bridge. In my GNS3 setup, I have effectively created a layer 2 bridge across the node in question. This should not be an issue with real taps though, as they do not forward traffic sent to the monitor port.

Another issue, which I cannot overcome at the moment, is that the bridge forwards traffic between the two capture interfaces. Each interface thus has to deal with broadcast traffic from both links. Whether this is negligible depends on your network traffic.


How to identify packets across the link

First of all, you need to specify a PCAP filter that only captures the traffic in question. It could be the TCP traffic between two end devices on a specific port.

To identify the same packet on both sides of the node, we need something to identify it by. For TCP packets, tcp.checksum can be used, as it will most likely differ between two sequential packets. Even sequential TCP packets carrying the same data will have different checksums, as the TCP sequence number changes from packet to packet.

With UDP packets, however, the checksum does not change if the data field remains the same in sequential packets. So you will have to rely on the uniqueness of each packet, or combine the checksum with the TTL in the IP header, which is decremented by one when the packet passes the layer 3 node. In fact, why not use the TTL in addition to the checksum for TCP packets as well? That way a match always consists of one packet from each side of the node, which also makes false matches from checksum collisions less likely.

Your monitoring session will look something like this for a TCP session. Notice how the checksums come in pairs? Each pair is the same packet traversing the node, with the TTL decremented by one by the node.


m-lund@mlu-Latitude-E5520:~$ stdbuf -o 0 tshark -i tapbr0 -T fields -e frame.time_epoch -e tcp.checksum -e ip.ttl 'tcp port 8000'
Capturing on 'tapbr0'
1518616973.420394300    0x000036b6      64
1518616973.420602625    0x000036b6      63
1518616973.421926118    0x000036b0      64
1518616973.422103967    0x000036b0      63
1518616973.422947663    0x00007a75      64
1518616973.423102321    0x00007a75      63
1518616973.521698351    0x0000366e      64


For UDP it is a little different. To illustrate the above issue with UDP and checksums, I have embedded the same data in all my UDP packets, so the checksum remains the same. But the TTL is still counted down by the node.

m-lund@mlu-Latitude-E5520:~$ stdbuf -o 0 tshark -i tapbr0 -T fields -e frame.time_epoch -e udp.checksum -e ip.ttl 'udp port 8000'
Capturing on 'tapbr0'
1518616283.387078169    0x000071aa      64
1518616283.387614734    0x000071aa      63
1518616283.488870251    0x000071aa      64
1518616283.489190599    0x000071aa      63
1518616283.589671744    0x000071aa      64
1518616283.589892542    0x000071aa      63
1518616283.691113026    0x000071aa      64


Calculating the delay added by the node

With the output created by Tshark, awk can compare checksum values and calculate delay times. To avoid the UDP checksum issue, we also need to make sure that the TTL has in fact been reduced by one as well.

You might wonder: does the script really have to be this complex? Can't we just match line by line? We can, if the output always looks like the examples above. In the real world, however, packets might be buffered in the router before being sent out again. In that case, the packets will not show up as neat in/out pairs like above.

In the script below I have taken care of that by introducing a tolerance. The tolerance is the number of packets we allow ourselves to skip while looking for the matching packet. Skipped packets will not be measured.

# File: calculate_delay.awk
# Input: one packet per line with the fields "timestamp checksum ttl".

BEGIN {
    # NOTE: the default of 4 is an assumed value, not a recommendation.
    # Override it with: awk -v initialtolerance=N -f calculate_delay.awk
    if (initialtolerance == "") initialtolerance = 4
    tolerance = initialtolerance
    readnext = 1
    skipped = 0
}

{
    # If we are told to read the next packet, store the values of the
    # packet in timestamp, checksum and ttl for later comparison.
    if (readnext == 1) {
        timestamp = $1
        checksum = $2
        ttl = $3
        print "DEBUG: Reading checksum: ", checksum, " at ", timestamp
        readnext = 0
        skipped = 0
        # Do not compare the stored packet with itself.
        next
    }

    # Compare the active packet with the stored packet.
    if (checksum == $2 && (ttl - 1) == $3) {
        # We have a match!
        # Print out the timestamp and the difference in timestamp between
        # the two packets, which is the node delay.
        print $1, $1 - timestamp
        readnext = 1
    } else {
        # If we don't have a packet match, increment the skip counter.
        skipped = skipped + 1
        # If the skip counter reaches the tolerance, we assume we will never
        # see the partner of the stored packet. Instead, we want to go ahead
        # and read in the next packet.
        if (skipped >= tolerance) {
            # Randomly toggle between an even and an odd offset for reading
            # the next packet by modifying the tolerance. Otherwise we might
            # end up always storing a packet from the outbound interface
            # and thus never be able to calculate the delay time.
            tolerance = initialtolerance + int(rand() + .5)
            readnext = 1
            skipped = 0
        }
    }
}
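
One caveat on precision: awk normally stores numbers as double-precision floats, which at the magnitude of an epoch timestamp resolve only to about a quarter of a microsecond, so the last digits from tshark are lost in the subtraction $1-timestamp. If that matters for your measurements, the seconds and the nanoseconds can be subtracted separately. The following is a minimal sketch. The helper name delay_ns is made up for illustration, and it assumes that the timestamps always carry exactly nine decimals, as in the output above.

```shell
#!/bin/sh
# Subtract two epoch timestamps without losing the nanosecond digits.
# Usage: delay_ns EARLIER LATER   (both as "seconds.nanoseconds")
# (delay_ns is a hypothetical helper; it assumes exactly nine decimals.)
delay_ns() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # Split each timestamp into whole seconds and nanoseconds.
        split(a, t1, ".")
        split(b, t2, ".")
        # Both differences are small integers, so nothing is rounded.
        printf "%d\n", (t2[1] - t1[1]) * 1000000000 + (t2[2] - t1[2])
    }'
}

# First TCP packet pair from the capture above, result in nanoseconds:
delay_ns 1518616973.420394300 1518616973.420602625
```

The same split could be done inside calculate_delay.awk itself if nanosecond resolution is needed in the plot.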

Plotting the data

In the end we want to plot the delay as a function of time using feedgnuplot. The output from the above script fits directly into feedgnuplot, so the combined command line will look something like this:

stdbuf -o 0 tshark -i tapbr0 -T fields -e frame.time_epoch -e tcp.checksum -e ip.ttl "tcp port 8000" 2> /dev/null | stdbuf -o 0 awk -f /tmp/calculate_delay.awk  | stdbuf -o 0 grep -v "DEBUG" |  feedgnuplot --timefmt %s --domain --lines --stream --exit

And the result will look something like this:

Please note that I'm running this in a virtual lab in GNS3, which causes extreme delay and a lot of jitter. A physical device that forwards traffic using ASICs should add much less delay and jitter.
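
Since the awk script outputs a timestamp and a delay per line, a jitter series can be derived from it as well. The sketch below uses one simple definition of jitter, the absolute difference between two consecutive delay samples. The function name jitter is hypothetical, and other definitions (such as the smoothed estimate in RFC 3550) may suit you better.

```shell
#!/bin/sh
# Reads "timestamp delay" lines on stdin and prints "timestamp jitter",
# where jitter is |delay - previous delay|. DEBUG lines are ignored.
# (jitter is a hypothetical helper name used for illustration.)
jitter() {
    awk '
        /DEBUG/ { next }
        seen {
            d = $2 - prev
            if (d < 0) d = -d
            printf "%s %.9f\n", $1, d
            fflush()
        }
        { prev = $2; seen = 1 }
    '
}

# Example with three made-up delay samples:
printf '%s\n' '1 0.000208' '2 0.000178' '3 0.000155' | jitter
```

Placed between the delay script and feedgnuplot in the pipeline, this would plot jitter instead of delay. The first sample produces no output, as there is nothing to compare it with.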


Final words

So what can this be used for? My first thought is benchmarking:

  • Measure router / firewall performance
  • Measure impact of added firewall rules
  • Measure impact of microbursts

You'll need high-quality taps to avoid adding latency. Your network card should definitely support hardware timestamping. Most cards do. For higher accuracy, go with a card with a high-precision oscillator. Look for cards designed for use with the Precision Time Protocol (PTP).

If you do not know feedgnuplot yet, please refer to Plot performance counters in realtime with feedgnuplot, an article I wrote a few weeks ago.