Solved Encrypt large file (tar archive)

Hi,

I would like to know which tool(s) you would recommend for encrypting a large file (archive). The objective is personal backup.

Backup scenario:
  • archive folders to one file
  • encrypt archived file
  • send to online backup
Recover:
  • download file from backup
  • decrypt, extract from archive
I would also like to automate the steps for both backup and recovery. I guess symmetric encryption would be best? Any advice on how to store the key?

Thank you!
 
I once had such a backup solution; it even worked "on the fly", requiring no temporary files, e.g. tar -> openssl (AES) -> ssh. Restore worked the same way in reverse, also on the fly. I did that when I didn't have control over the backup server's file system. But it was rather flaky, as hiccups in the network connection required restarting the entire process. There was also no way to checksum the data and test for integrity, unless, of course, you first tar, then encrypt into another file, then transfer that over.

Now I'm content with rsnapshot over ssh from a server under my control with geli-encrypted disks, which is better as it keeps a hardlink-based history of snapshots. That way, if the "backupee" got compromised, it couldn't affect the backups, since it was passive (called from the backup server). I also keep the encrypted partition separate from root and have to manually mount and unlock it after each reboot, but that happens rarely and I don't mind.
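A minimal sketch of that kind of on-the-fly pipeline; the host name, paths, and passphrase file are illustrative, and `-pbkdf2` assumes OpenSSL 1.1.1 or newer:

```shell
# Back up: stream tar through OpenSSL AES encryption straight to the remote
# host, with no temporary file on either end.
tar -cf - /home/user/docs \
  | openssl enc -aes-256-cbc -pbkdf2 -salt -pass file:/root/backup.pass \
  | ssh backuphost 'cat > backup.tar.enc'

# Restore: the same pipeline reversed.
ssh backuphost 'cat backup.tar.enc' \
  | openssl enc -d -aes-256-cbc -pbkdf2 -pass file:/root/backup.pass \
  | tar -xf - -C /restore
```

As the post notes, a dropped connection anywhere in this pipe means starting over from scratch, and there is no built-in integrity check on the stored stream.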
 
I think the GPG encryption utility would work well in this case. It will use keys you have generated to encrypt a file. Alternatively, OpenSSL's enc command does something similar. Either could be used in a script after the tar command runs and before the upload occurs.
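For example, a sketch of both options using passphrase-based (symmetric) encryption; the archive name and paths are illustrative:

```shell
# GnuPG, symmetric mode: prompts for a passphrase, writes backup.tar.gz.gpg
tar -czf backup.tar.gz /home/user/docs
gpg --symmetric --cipher-algo AES256 backup.tar.gz
gpg --decrypt backup.tar.gz.gpg > backup.tar.gz      # restore

# The OpenSSL equivalent (-pbkdf2 assumes OpenSSL 1.1.1+):
openssl enc -aes-256-cbc -pbkdf2 -salt -in backup.tar.gz -out backup.tar.gz.enc
openssl enc -d -aes-256-cbc -pbkdf2 -in backup.tar.gz.enc -out backup.tar.gz
```

For unattended scripts, the passphrase can be supplied from a root-only file (`gpg --passphrase-file`, `openssl -pass file:...`) rather than typed interactively.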
 
Hi,

Thanks for your responses. I now have some options for encrypting my files. In the long run, I should probably use sysutils/rsnapshot over ssh as blackflow suggested.

But I have a question here: the servers I'm backing up to, even if they are nominally under my control, are either VPSes or cloud instances, and as I understand it, geli does not offer protection if an intruder gains access to the server while it is running. So if someone gains root, they will be able to access my files, and geli would only protect me if the disks themselves were stolen. Is that correct?

How could I still use rsync/rsnapshot so that I benefit from incremental backups while having the data encrypted on the backup server? The data does not need to be immediately accessible; it just needs to sit there protected. Or should I rather focus on securing the backup server?

Thank you!
 
scrypt has another purpose, namely password hashes that are hard to brute-force. The targeted problem is that there are gazillions of password databases out there using md5 (and similarly insecure) hashes.

Unless one is prepared to use (and probably build oneself) more exotic encryption utilities (NaCl might come to mind as a base), simply using AES-based encryption should do the trick. Some people also love Blowfish (and its descendants/derivatives), but considering how fast AES is, it probably offers the best cost/speed trade-off among the more secure encryption implementations. Moreover, it's supported by most hardware encryption schemes (e.g. VIA PadLock).

As for the transport mechanism for the backups, there are many available and well-proven options, but not all of them are offered by a given web space provider.

Hint: no matter what scheme is chosen, I would strongly suggest also hashing the archive file or, even better, hashing backup sub-elements (e.g. directories) or even each file, and keeping those hashes locally available (e.g. on a CD-ROM). It's often overlooked, but transferring data, particularly over not necessarily reliable media, storing them who knows where on this planet, and transferring them back always incurs a risk of changed/corrupted/tampered data.
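For instance, using SHA-256 checksums (`sha256sum` is the GNU coreutils name; on FreeBSD the equivalent base tool is sha256(1); the filenames are illustrative):

```shell
# Fingerprint the encrypted archive, and optionally each source file,
# before uploading; keep the hash lists locally or offline.
sha256sum backup.tar.gz.gpg > backup.sha256
find /home/user/docs -type f -exec sha256sum {} + > files.sha256

# After downloading the backup again, verify nothing changed in transit
# or in storage:
sha256sum -c backup.sha256
```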
 
But I have a question here: the servers I'm backing up to, even if they are nominally under my control, are either VPSes or cloud instances, and as I understand it, geli does not offer protection if an intruder gains access to the server while it is running. So if someone gains root, they will be able to access my files, and geli would only protect me if the disks themselves were stolen. Is that correct?

The primary reason to encrypt is of course physical access to the disks. As for the VPS, not all companies will wipe your disk image clean after you're done with it. So unless you encrypt your virtual disks, another customer could read raw blocks from the virtual disks if their image ends up reusing blocks your image used.

As far as I know (and I hope more security-oriented people will correct me if I'm wrong), no one can access a running VPS unless the hypervisor controls, or even the guest kernel, have somehow been modified. Whether someone logged into the host could read parts of your VPS's RAM or disk image file is another story, but attaching to the image and entering its process space in order to read the decrypted disks is not possible with regular (unmodified hypervisor and guest kernel) tools, as far as I know, without knowing your root password.

Host-based intrusion detection like AIDE or Tripwire might help discover traces if someone did. Remote syslog logging will also help if you're really that paranoid: if anyone logs in, it will be syslogged, and the "attacker" won't be able to cover their tracks.
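A minimal syslog.conf forwarding rule for that (the loghost name is illustrative):

```
# /etc/syslog.conf on the monitored box: ship everything to a remote loghost
*.*    @loghost.example.com
```

The remote copy is what matters: even if the attacker scrubs the local logs, the forwarded entries have already left the machine.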

Or should I rather focus on securing the backup server?

If the data is really very sensitive, then by all means make sure you have physical access to the server as close to 100% under your control as possible.
 
Are we talking average Joe's security or are we looking at a really sensitive case?

If the latter ... I wouldn't say "forget Tripwire and similar", but I would definitely say "forget Tripwire and similar unless you have a profound understanding of the OS and security". The problem is that whoever has physical access to a machine basically (p)owns it. If anything, this is even more true for a VPS. Simple as that.

You have basically two ways to choose from: either a very high-security solution (which I won't even go into, because having to ask questions of the kind you asked strongly indicates that's not the right way for you to go), or a standard (as in "available standard packages") way that shifts security issues to (or keeps them at) where you are in control, i.e. on your box.

So, back up your files any way you see fit (and ideally create hashes/fingerprints along the way), then push/pull that file/those files to/from your backup system in the cloud/internet, and that's about it.

Sure, you can use an encrypted file system on your VPS, but why? Why encrypt again, and why do it on a system that's not (guaranteed to be) under your control? Better to send and store encrypted backups.

Generally speaking, the story is this: if you have to store something on a machine you don't control (i.e. anywhere on the internet), treat the material you store as if you had to entrust it to your worst enemy. -> Encrypt, encrypt wisely, and encrypt properly.

Second, keep hashes for your file(s) locally (or at least elsewhere) so that you can tell when your file(s) have been tampered with, be it on the VPS or during transmission.

Finally, a bit of general advice:

Don't overdo encryption. Rather, try to understand well what you're doing and how things work. "Encryption" (or what people usually mean by that) is not one thing but rather a toolbox with different tools for different jobs. Concrete example: you basically have two jobs (in my mind's eye). One, you don't want your data to be accessed/read by someone else; that's a classical encryption scenario. Two, you want to be sure that the data you get back one day are exactly the data you sent/stored today; that's a classical hashing/fingerprinting scenario. Possibly a third: you might want to be sure that your data survive for any (realistic) amount of time. This is about your VPS provider going down (just think "Lavabit" or "Megaupload"). So, if your data are very important to you, store them at two (or even more) different sites, possibly geographically, politically, and legally distant/dispersed.

Hint: you do not access backups continuously or highly frequently, and your backups are "somewhere in the internet" - that's a very different scenario from continuously accessed (pseudo-)local data, as, for instance, on your local box or a web server. Accordingly, the approach to choose is different.
 
Asymmetric encryption like security/gnupg is the way to go. You don't need the secret key to encrypt, because you are, in a way, sending a "message" to yourself. Only you can then decrypt the encrypted file, once you dig up that secret key from the concrete bunker.
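A sketch of that flow; the key ID and filenames are illustrative, and a keypair is assumed to already exist:

```shell
# Encrypt to your own public key; the secret key is not needed (and need
# not even be present) on the machine doing the backup.
gpg --encrypt --recipient backup@example.com \
    --output backup.tar.gz.gpg backup.tar.gz

# Later, on a machine that holds the secret key:
gpg --decrypt --output backup.tar.gz backup.tar.gz.gpg
```

This is also why it suits unattended backup scripts: nothing secret has to live on the box being backed up.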
 
Hello,

Thank you for your answers.
rmoe, indeed the data is not extremely sensitive. Nevertheless, since I'm uploading my backup somewhere to a VPS or the cloud, I want it to be encrypted.
So for now, I'll go with making an archive, or a few of them, hashing them, encrypting them, and uploading them. I was interested in what tools are used for encryption nowadays, and I got my answers, thank you!
 
There is already a program in ports that will do all of that for you: sysutils/duplicity, which can easily be driven via its "frontend", sysutils/duply. The program uses GPG as a back-end.
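A rough sketch of how a duplicity run might look; the target URL, paths, and key ID are all illustrative:

```shell
# Full backup, encrypted with your GPG key, pushed over SFTP:
duplicity full /home/user/docs sftp://user@backuphost/backups \
    --encrypt-key BACKUP_KEY_ID

# Subsequent runs upload only encrypted increments:
duplicity incremental /home/user/docs sftp://user@backuphost/backups \
    --encrypt-key BACKUP_KEY_ID

# Restore the latest state:
duplicity restore sftp://user@backuphost/backups /restore/docs
```

duply wraps the same commands behind named profiles, which makes cron-driven automation simpler.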
 
Oko, thanks, that's exactly what I was looking for. Indeed, having a single tar of all your backup content, and having to encrypt and upload it every time you want to make a backup, is not a pretty solution if we're talking hundreds of GB. So I was also searching for a way to back up incrementally and encrypt the incremental data that is sent over the network. And although rsync works wonderfully for incremental backups, I haven't seen an obvious way to add encryption to it.

So I will definitely experiment with automating the backups with sysutils/duplicity; it seems the perfect tool for that.

I also think ZFS snapshots are a good tool for the job. You save the initial snapshot to a file (zfs send), and you can also save the "incremental streams" (that is, the diff between the initial snapshot and the current state, or intermediate states). You can encrypt those files and back them up. Then, with zfs receive, you can restore using the initial snapshot and the incremental streams.
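A sketch of that approach, assuming a dataset named tank/data and symmetric GPG encryption; all pool, dataset, and file names are illustrative:

```shell
# Initial snapshot, streamed into an encrypted file:
zfs snapshot tank/data@base
zfs send tank/data@base | gpg --symmetric --cipher-algo AES256 -o base.zfs.gpg

# Later: snapshot again and send only the difference since @base:
zfs snapshot tank/data@today
zfs send -i tank/data@base tank/data@today | gpg --symmetric -o incr.zfs.gpg

# Restore: decrypt and replay the streams in order.
gpg -d base.zfs.gpg | zfs receive tank/restored
gpg -d incr.zfs.gpg | zfs receive tank/restored
```

The streams must be replayed in the order they were taken, so keeping snapshot names dated helps.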

Now I will play with both options (ZFS snapshots and sysutils/duplicity) to see what works best in my case.
 