SSH as File Transport
Wed 13 October 2010
You may know about scp, the copy-over-SSH program, or sftp, the FTP-like remote client. You may even know about sshfs, which provides a FUSE module to mount a remote filesystem over SSH. Nifty! But what if you have something generating a stream of data and want to send it over? All of those tools work on files that already exist in the filesystem. Luckily, there is a way. Let's say you're taking a forensic copy of a disk with dd and want to move it to another machine. You could make an NFS share and copy it there, or use the aforementioned sshfs, but the simplest method is to use SSH itself: ssh forwards its standard input to the remote command, so you can pipe dd straight into it. It's actually a lot faster than you would expect. For example:
dd if=/dev/sda bs=1M | ssh user@forensic.box.com 'cat - > forensic-image.dd'
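The same pipeline can be wrapped in a small bash function so that a failure anywhere in it (an unreadable source, a dropped connection) shows up in the exit status. With `set -o pipefail`, a pipeline's status is that of the last command to fail, not just ssh's. This is a sketch rather than the author's script: the helper name `send_stream` is hypothetical, and the transport is passed in as arguments so you can try it locally with `sh -c` standing in for `ssh user@host`, since both hand the command string to a shell.

```shell
#!/usr/bin/env bash
# Sketch: stream a device or file into a file on a remote host.
# Hypothetical helper; the transport is normally "ssh user@host".
set -o pipefail  # a failing dd or ssh makes the whole pipeline fail

# send_stream SRC DEST TRANSPORT...
send_stream() {
    local src=$1 dest=$2
    shift 2
    # printf %q escapes the destination name for the remote shell.
    dd if="$src" bs=1M | "$@" "cat > $(printf '%q' "$dest")"
}
```

Usage: `send_stream /dev/sda forensic-image.dd ssh user@forensic.box.com`, then check `$?` before trusting the image.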
Monitoring the transfer
If you have my [pipestat][] program installed (it is also packaged in Fedora) or another pipe monitor such as *pv*, you can add it to the pipeline on the remote side to see how much data is coming through:
dd if=/dev/sda bs=1M | ssh user@forensic.box.com 'pipestat > forensic-image.dd'
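If you use pv instead of pipestat on the remote side, note that pv normally draws nothing when its standard error is not a terminal, which is the case for a command run as `ssh host 'command'` without `-t`. Its `-f` option forces the display, and ssh relays that stderr back to you. A sketch of that variant, assuming pv is installed on the receiving machine; `send_monitored` and the `MONITOR` override are hypothetical names, and `sh -c` can stand in for ssh when testing locally.

```shell
#!/usr/bin/env bash
# Sketch: the remote-side monitoring pipeline with pv in place of pipestat.
set -o pipefail

# send_monitored SRC DEST TRANSPORT...  (hypothetical helper)
# MONITOR overrides the remote monitor; the default assumes pv is installed.
send_monitored() {
    local src=$1 dest=$2
    shift 2
    local monitor=${MONITOR:-pv -f}  # -f: draw progress even without a tty
    dd if="$src" bs=1M | "$@" "$monitor > $(printf '%q' "$dest")"
}
```

Usage: `send_monitored /dev/sda forensic-image.dd ssh user@forensic.box.com`.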
The output of the above command would look something like this:
Transferring 49.76 MB/sec , overall 57.04 MB/sec, total 328.19 MB
Wrote 335.58 MB bytes in 5 seconds
687278+0 records in
687278+0 records out
351886336 bytes (352 MB) copied, 5.919 s, 59.5 MB/s
md5 : 4fe869ee987a340198fb0d54c55c47f1
sha1: 1de7bacb4fbbd7b6d391a69abfe174c2509ec303
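The two totals don't match (352 MB from dd, 335.58 MB on pipestat's "Wrote" line) because pipestat uses mebibytes (1,048,576 bytes, or 1024 × 1024, what most people call a megabyte) while dd displays megabytes (1,000,000 bytes, what hard drive manufacturers call a megabyte). The 328.19 MB on the first line is pipestat's running total at its last progress update, just before the transfer finished. That first line, with the snapshot transfer rate, overall rate, and total transferred, is pipestat's output, as are the "Wrote" line and the checksums; the three lines in between come from dd.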
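The arithmetic behind the mismatch, using the byte count dd reported above. Bash arithmetic is integer-only, so this sketch hands the division to awk; `to_units` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print a byte count in decimal megabytes and binary mebibytes.
to_units() {
    # LC_ALL=C keeps "." as the decimal separator.
    LC_ALL=C awk -v b="$1" 'BEGIN {
        printf "%.2f MB\n", b / 1000000     # 1 MB  = 10^6 bytes (dd)
        printf "%.2f MiB\n", b / 1048576    # 1 MiB = 2^20 bytes (pipestat)
    }'
}

to_units 351886336
```

This prints 351.89 MB, which dd rounds to 352, and 335.58 MiB, matching pipestat's "Wrote" line.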
Inline checksums
Computing the md5 and sha1 values inline can save a lot of time: you get a known-good checksum without a second pass over the data. On a several-hundred-gigabyte image, that can save you an hour or more of running md5sum/sha1sum.
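pipestat computes the checksums wherever it runs, which in the pipeline above is the receiving end. To record them on the sending side as well (and compare the two afterwards), or to do without pipestat, `tee` can copy the stream into md5sum and sha1sum through named pipes, so the disk is still read only once. A sketch under those assumptions, using GNU coreutils; `send_with_sums` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: single-pass transfer that also checksums the stream locally.
set -o pipefail

# send_with_sums SRC DEST PREFIX TRANSPORT...  (hypothetical helper)
# Writes PREFIX.md5 and PREFIX.sha1 locally; DEST is written remotely.
send_with_sums() {
    local src=$1 dest=$2 prefix=$3 dir rc
    shift 3
    dir=$(mktemp -d) || return
    mkfifo "$dir/md5" "$dir/sha1" || return
    # Start the readers first; each blocks until tee opens its pipe.
    md5sum < "$dir/md5" > "$prefix.md5" &
    local md5_pid=$!
    sha1sum < "$dir/sha1" > "$prefix.sha1" &
    local sha1_pid=$!
    # tee copies every block to both FIFOs and on to the transport.
    dd if="$src" bs=1M | tee "$dir/md5" "$dir/sha1" |
        "$@" "cat > $(printf '%q' "$dest")"
    rc=$?
    wait "$md5_pid" "$sha1_pid"  # make sure both sum files are complete
    rm -rf "$dir"
    return "$rc"
}
```

Usage: `send_with_sums /dev/sda forensic-image.dd sda ssh user@forensic.box.com` leaves `sda.md5` and `sda.sha1` locally; if they match the values printed on the receiving end, the image arrived intact.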
Compression
You can add compression as well. gzip and bzip2 are the two commonly used tools for inline compression; lzma generally compresses better than gzip and, at its lower levels, can be faster than bzip2. Depending on the system you are transferring from and the speed of your network, you may want to use different compression levels to get the CPU/network tradeoff you are comfortable with. Examples:
dd if=/dev/sda bs=1M | gzip -c | ssh user@forensic.box.com 'gunzip -c | pipestat > forensic-image.dd'
This compresses the data inline and decompresses it on arrival. pipestat shows the "actual" (uncompressed) data, not the compressed stream.
dd if=/dev/sda bs=1M | pipestat | gzip -c | ssh user@forensic.box.com 'cat - > forensic-image.dd.gz'
Same as above, but running pipestat locally and saving the file on the remote machine as a gzip file.
lzma (unlzma) and bzip2 (bunzip2) work the same way with the -c option. I usually use gzip because lzma isn't available on all systems, and bzip2 will often max out the CPU well before the network connection is filled.
[pipestat]: http://code.google.com/p/pipestat/
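To tune the CPU/network tradeoff, the compression level can be made a parameter. A sketch using gzip, since it is the most widely available; levels run from 1 (fastest) to 9 (smallest output), and `send_compressed` is a hypothetical helper. Swapping in `xz -c` / `unxz -c` or `bzip2 -c` / `bunzip2 -c` follows the same pattern.

```shell
#!/usr/bin/env bash
# Sketch: compressed transfer with a selectable gzip level.
set -o pipefail

# send_compressed SRC DEST LEVEL TRANSPORT...  (hypothetical helper)
# LEVEL: 1 (fastest, least CPU) .. 9 (smallest output, most CPU).
send_compressed() {
    local src=$1 dest=$2 level=$3
    shift 3
    # Compress locally, decompress remotely: DEST holds the raw image.
    dd if="$src" bs=1M | gzip -"$level" -c |
        "$@" "gunzip -c > $(printf '%q' "$dest")"
}
```

Usage: `send_compressed /dev/sda forensic-image.dd 1 ssh user@forensic.box.com`. Starting at a low level and raising it while the sending CPU keeps up with the network is one way to find the balance.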