5.5.1 sanitize
If you are particularly sensitive to privacy or security concerns, you may want to consider sanitize, a collection of five Bourne shell scripts that reduce or condense tcpdump trace files and eliminate
confidential information. The scripts renumber host entries and select classes of packets, eliminating all others. This has two primary uses. First, it reduces the size of the files you must deal with,
hopefully focusing your attention on a subset of the original traffic that still contains the traffic of interest. Second, it gives you data that can be distributed or made public for debugging or network
analysis without compromising individual privacy or revealing too much specific information about your network. Clearly, these scripts won't be useful for everyone. But if internal policies constrain
what you can reveal, these scripts are worth looking into.
The five scripts included in sanitize are sanitize-tcp, sanitize-syn-fin, sanitize-udp, sanitize-encap, and sanitize-other. Each script filters out the traffic it does not handle and reduces the remaining traffic. For
example, sanitize-tcp removes all non-TCP packets and reduces the remaining TCP traffic to six fields: an unformatted timestamp, a renumbered source address, a renumbered destination address,
the source port, the destination port, and the number of data bytes in the packet. Thus, the packet that tcpdump displays as
934303014.772066 205.153.63.30.1174 205.153.63.238.23: . ack 3259091394 win 8647 DF
4500 0028 b30c 4000 8006 2d84 cd99 3f1e cd99 3fee 0496 0017 00ff f9b3 c241 c9c2
5010 21c7 e869 0000 0000 0000 0000
would be reduced to
934303014.772066 1 2 1174 23 0
Notice that the IP addresses have been replaced with 1 and 2, respectively. The renumbering is consistent across packets, so you can still compare addresses within a single trace. The actual data reported
varies from script to script. Here is an example of the syntax:
bsd1$ sanitize-tcp tracefile
This runs sanitize-tcp over the tcpdump trace file tracefile. Apart from the name of the trace file, there are no arguments.
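To make the reduction concrete, here is a minimal Python sketch that turns the raw packet from the example above into the same six fields. It is not how sanitize works (sanitize is a set of Bourne shell scripts); the names `Renumberer` and `reduce_tcp` are hypothetical, and the sketch assumes an IPv4 packet starting at the IP header. Note that the data-byte count comes from the IP total-length field, so the six bytes of Ethernet padding at the end of the hex dump are correctly ignored.

```python
import ipaddress
import struct


class Renumberer:
    """Assign each distinct address a small integer, in order of first appearance."""

    def __init__(self):
        self._ids = {}

    def __call__(self, addr):
        if addr not in self._ids:
            self._ids[addr] = len(self._ids) + 1
        return self._ids[addr]


def reduce_tcp(timestamp, packet, renumber):
    """Reduce a raw IPv4 packet to six fields in the style of sanitize-tcp.

    Returns None for anything that is not IPv4/TCP, mimicking the way
    non-TCP traffic is dropped.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4 or packet[9] != 6:
        return None  # not IPv4, or the IP protocol field is not 6 (TCP)
    ip_hlen = (packet[0] & 0x0F) * 4                 # IHL is counted in 32-bit words
    total_len = struct.unpack("!H", packet[2:4])[0]  # excludes link-layer padding
    src = ipaddress.IPv4Address(packet[12:16])
    dst = ipaddress.IPv4Address(packet[16:20])
    tcp = packet[ip_hlen:]
    sport, dport = struct.unpack("!HH", tcp[0:4])
    tcp_hlen = (tcp[12] >> 4) * 4                    # TCP data offset, also in words
    data_bytes = total_len - ip_hlen - tcp_hlen
    return (timestamp, renumber(src), renumber(dst), sport, dport, data_bytes)


# The packet from the tcpdump example, including trailing Ethernet padding.
hexdump = ("4500 0028 b30c 4000 8006 2d84 cd99 3f1e cd99 3fee 0496 0017"
           " 00ff f9b3 c241 c9c2 5010 21c7 e869 0000 0000 0000 0000")
renumber = Renumberer()
fields = reduce_tcp("934303014.772066", bytes.fromhex(hexdump), renumber)
print(*fields)  # 934303014.772066 1 2 1174 23 0
```

Because one `Renumberer` instance is shared across all packets, a reply from 205.153.63.238 back to 205.153.63.30 would come out as `2 1`, which is what keeps addresses comparable within a trace.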
5.5.2 tcpdpriv
The program tcpdpriv is another tool for removing sensitive information from tcpdump files. There are several major differences between tcpdpriv and sanitize. First, because sanitize is a set of shell scripts, it
should run on almost any Unix system. Because tcpdpriv is a compiled program, this is not true of it. On the other hand, tcpdpriv can capture data directly as well as process existing files. The
captured packets are written as a tcpdump file, which can be processed later.
Also, tcpdpriv gives you some control over how much of the original data is removed or scrambled. For example, it can scramble an IP address but retain its class designation.
If the -C4 option is chosen, an IP address such as 205.153.63.238 might be replaced with 193.0.0.2. Notice that the address class is preserved: a class C address is replaced with a class C address.
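The idea of class-preserving scrambling can be illustrated with a short Python sketch. This is a hypothetical scheme, not tcpdpriv's algorithm (tcpdpriv mapped 205.153.63.238 to 193.0.0.2, which this sketch does not reproduce): it keeps the leading bits that define the classful class (0 for A, 10 for B, 110 for C, 1110 for D) and replaces the remaining bits with a per-class sequence number. The names `class_bits` and `ClassPreservingScrambler` are illustrative only.

```python
import ipaddress


def class_bits(addr):
    """Number of leading bits that fix the classful class: A=0, B=10, C=110, D/E=111x."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        return 1
    if first_octet < 192:
        return 2
    if first_octet < 224:
        return 3
    return 4


class ClassPreservingScrambler:
    """Hypothetical illustration: keep the class bits, renumber the rest sequentially."""

    def __init__(self):
        self._mapping = {}   # original address -> scrambled address
        self._counters = {}  # (class bit count, class prefix) -> last number used

    def __call__(self, text):
        addr = ipaddress.IPv4Address(text)
        if addr not in self._mapping:
            n = class_bits(addr)
            prefix = int(addr) >> (32 - n)
            key = (n, prefix)
            count = self._counters.get(key, 0) + 1
            if count >= 1 << (32 - n):
                raise OverflowError("no scrambled addresses left in this class")
            self._counters[key] = count
            self._mapping[addr] = ipaddress.IPv4Address((prefix << (32 - n)) | count)
        return self._mapping[addr]


scramble = ClassPreservingScrambler()
print(scramble("205.153.63.238"))  # 192.0.0.1 (class C stays class C)
print(scramble("205.153.63.30"))   # 192.0.0.2
print(scramble("205.153.63.238"))  # 192.0.0.1 again: the mapping is consistent
```

Sequential numbering like this hides the real network numbers entirely; it does not keep two addresses on the same real subnet looking related, which is one reason a tool may offer several scrambling levels.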
A variety of command-line options control how data is rewritten, and several of them are mandatory. Many of the command-line options will look familiar to tcpdump users. The program will
not write its output to a terminal, so the output must be written directly to a file or redirected. Although tcpdpriv is a useful program, the number of required command-line options can be annoying. There is also some
concern that if the options are not chosen properly, it may be possible to reconstruct the original data from the scrambled data. In practice, this should be a minor concern.
As an example of using tcpdpriv, the following command will scramble the file tracefile:
bsd1$ tcpdpriv -P99 -C4 -M20 -r tracefile -w outfile
The -P99 option preserves (doesn't scramble) the port numbers, -C4 preserves the class identity of the IP addresses, and -M20 preserves multicast addresses. If you want the output displayed on your terminal,
you can pipe it to tcpdump:
bsd1$ tcpdpriv -P99 -C4 -M20 -r tracefile -w- | tcpdump -r-
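The trailing -w- and -r- options look a little strange, but they work: a file name of - tells tcpdpriv to write to standard output and tcpdump to read from standard input.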
5.5.3 tcpflow
Another useful tool is tcpflow, written by Jeremy Elson. This program allows you to capture individual TCP flows or sessions. If the traffic you are looking at includes, say, three different Telnet
sessions, tcpflow will separate the traffic into three different files so you can examine each individually. The program can reconstruct data streams regardless of out-of-order packets or
retransmissions but does not understand fragmentation.
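Reconstructing a stream despite out-of-order delivery and retransmission comes down to placing each segment's bytes by sequence-number offset. The Python sketch below shows that idea for one direction of a connection; it is a simplified model, not tcpflow's implementation. It assumes the sequence number of the first data byte (`first_seq`) is already known, and the function name `reassemble` is hypothetical.

```python
def reassemble(first_seq, segments):
    """Rebuild one direction of a TCP byte stream.

    first_seq: sequence number of the first data byte (assumed known).
    segments:  iterable of (seq, payload) pairs, in capture order.
    """
    received = {}  # stream offset -> byte value
    for seq, payload in segments:
        # Offset from the start of the stream; modulo 2**32 handles sequence wraparound.
        offset = (seq - first_seq) % 2**32
        for i, byte in enumerate(payload):
            # Keep the first copy seen; a retransmission just repeats the same bytes.
            received.setdefault(offset + i, byte)
    stream = bytearray()
    while len(stream) in received:  # stop at the first missing byte
        stream.append(received[len(stream)])
    return bytes(stream)


# Out-of-order arrival plus one retransmitted segment.
segments = [(1006, b"world"), (1000, b"hello "), (1006, b"world")]
print(reassemble(1000, segments))  # b'hello world'
```

Because bytes are keyed by position rather than by arrival order, the capture order of the segments does not matter, and duplicates are harmless.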
tcpflow stores each flow in a separate file with a name based on the source and destination addresses and ports. For example, SSH traffic (port 22) between 172.16.2.210 and 205.153.63.30 might have the
filename 172.016.002.210.00022-205.153.063.030.01071, where 1071 is the ephemeral port created for the session.
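The naming pattern in that example is easy to reproduce, which helps when scripting around tcpflow's output. The Python sketch below builds names in the same format: each address octet zero-padded to three digits, each port to five, source endpoint first. The format is inferred from the single example above and other tcpflow versions may name files differently; `flow_filename` is a hypothetical helper. Since tcpflow records each direction of a connection separately, one session typically yields a pair of such files.

```python
import ipaddress


def flow_filename(src_ip, src_port, dst_ip, dst_port):
    """Build a tcpflow-style name: octets padded to 3 digits, ports to 5, source first."""
    def endpoint(ip, port):
        octets = ipaddress.IPv4Address(ip).packed  # 4 bytes, most significant first
        return ".".join(f"{o:03d}" for o in octets) + f".{port:05d}"
    return f"{endpoint(src_ip, src_port)}-{endpoint(dst_ip, dst_port)}"


# Server-to-client half of the SSH session from the text.
print(flow_filename("172.16.2.210", 22, "205.153.63.30", 1071))
# Client-to-server half.
print(flow_filename("205.153.63.30", 1071, "172.16.2.210", 22))
```

The fixed-width padding means that a plain alphabetical sort of the file names also groups flows by address and port.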
Since tcpflow uses libpcap, the same packet capture library tcpdump uses, capture filters are constructed in exactly the same way and with the same syntax. It can be used in a number of ways.
For example, you could see what cookies are being sent during an HTTP session. Or you might use it to see whether SSH is really encrypting your data. Of course, you could also use it to capture passwords or
read email, so be sure to set permissions appropriately to control who can run it.
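As an illustration of the cookie example, the Python sketch below pulls the values of `Cookie:` request headers out of a client-to-server flow, such as the bytes of one tcpflow output file read with `open(path, "rb").read()`. It is a deliberately naive, hypothetical helper (`find_cookies`): it assumes unencrypted HTTP/1.x and scans line by line, so it would also match a body line that happens to start with "cookie:".

```python
import re


def find_cookies(flow_bytes):
    """Return the values of all Cookie: request headers in an HTTP client-to-server stream."""
    text = flow_bytes.decode("latin-1")  # maps every byte to a character, never fails
    # Multiline, case-insensitive: header lines that start with "Cookie:".
    # "Set-Cookie:" (server responses) is not matched because of the ^ anchor.
    return re.findall(r"(?im)^cookie:[ \t]*(.*?)\r?$", text)


flow = (b"GET / HTTP/1.1\r\nHost: example.com\r\n"
        b"Cookie: session=abc123; theme=dark\r\n\r\n"
        b"GET /logo.png HTTP/1.1\r\nHost: example.com\r\n"
        b"cookie: session=abc123\r\n\r\n")
print(find_cookies(flow))  # ['session=abc123; theme=dark', 'session=abc123']
```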
5.5.4 tcp-reduce