
I'm planning to take offsite and offline backups of photographs - 700GB of them


Question

I'm planning to take offsite and offline backups of photographs - 700GB of them - to Blu-ray disks. Selecting and cataloging files manually is too time-consuming. I'm trying to find a solution that:

Stores raw files instead of a custom format. This is important for archiving: the tool might no longer be available in 5-10 years, but I still want to be able to access the files. As photographs are usually <50MB, the disk space lost by not splitting files is negligible.
Automatically catalogs files per disk. I want to be able to find the disk holding a specific file.
Automatically detects and chooses modified/new files to be backed up.
Preferably stores checksums to detect corrupted files.
Supports disks with different sizes: some data might go to CD-Rs (700MB) or DVDs. Or to some next-generation disk with greater capacity.
Runs on Windows, Linux and/or OS X. It doesn't have to support all of those.
Either CLI or GUI.
Alerts ("There's this many files that are not backed up yet") are a bonus.

Does such a tool exist? I couldn't find anything even close.

I'm aware optical disks decay over time (and that M-Disc is one solution for that). Also, people often seem to recommend HDDs for offsite backups. The problems with HDDs are a) fragility and b) updating the contents. To back up new files, (one of) the disk(s) must be carried back home, updated and then stored offsite again.

I have checked similar questions, including this, this, this and this with no luck.

Explanation / Answer

You can use standard Unix tools to do this. They are available on Linux (and, as far as I know, also on OS X, but you would have to test that).

You should also have a look at cdbackup, which sounds like a match, but I haven't tried it yet.

The script I have in mind uses

bash to process the script
find to locate the files
touch to store date values in files
cpio to do the backup. The files are stored in cpio's crc format, which adds checksums to the archive. Don't worry about cpio not being available in the future: the crc file format is simple enough and well enough documented that you could re-implement an extraction routine if all else fails (and I would be really surprised if there were no cpio in 20 years).
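
To make the combination of find, touch and cpio a bit more concrete, here is a minimal sketch, not a finished tool. The function names and the stamp-file argument are hypothetical, it assumes GNU cpio (for -H crc), and it assumes file names contain no newlines. The stamp file records when the last backup ran, so find -newer only picks up files modified since then.

```bash
#!/usr/bin/env bash
# Sketch only: function names and argument layout are made up for illustration.

# list_changed SRC_DIR STAMP_FILE
# Print regular files under SRC_DIR modified after STAMP_FILE.
# If STAMP_FILE does not exist yet (first run), print every file.
list_changed() {
    if [ -e "$2" ]; then
        find "$1" -type f -newer "$2" -print
    else
        find "$1" -type f -print
    fi
}

# backup_changed SRC_DIR STAMP_FILE ARCHIVE
# Write the changed files to ARCHIVE in cpio's crc format, then advance
# the stamp. Assumes GNU cpio and that STAMP_FILE lives outside SRC_DIR.
backup_changed() {
    # Take the new timestamp before scanning, so files modified while
    # the backup runs are picked up again next time.
    touch "$2.new"
    list_changed "$1" "$2" | cpio -o -H crc > "$3" && mv "$2.new" "$2"
}
```

You would call it as something like backup_changed ~/photos ~/.photo-backup.stamp /path/to/archive.cpio, where all three paths are placeholders. With GNU cpio, running cpio -i --only-verify-crc < archive.cpio should check the stored checksums without extracting anything.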

If you are interested in a shell-based solution, I could give it a shot and write such a script.

You would get a tool that prompts you to switch media (cpio does that, at least for magnetic tapes; it should be the same for DVD-R or any other block device), plus a file listing all archived files together with the timestamp of the corresponding backup run and the number of the DVD used. To recover a file, you would open that list, search for the file name, and then search upwards for the DVD number and date. Not the handiest solution, but it would work.
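
The "search for the file name, then search up for the DVD number" step could be automated with a few lines of awk. This is only a sketch under an assumed catalog layout that I am making up here: each backup run writes a header line of the form "# disk <number> <date>", followed by one archived path per line. The function name find_disk is hypothetical as well.

```bash
#!/usr/bin/env bash
# Sketch only: the catalog layout ("# disk <n> <date>" headers followed by
# one path per line) is an assumption, not something cpio produces itself.

# find_disk CATALOG NAME
# For every catalog entry containing NAME, print the header of the
# backup run (disk number and date) it belongs to, then the entry.
find_disk() {
    awk -v name="$2" '
        /^# disk /      { header = $0; next }        # remember latest header
        index($0, name) { print header ": " $0 }     # plain substring match
    ' "$1"
}
```

For example, find_disk catalog.txt IMG_1234.jpg would print one line per disk holding a copy of that file, so a file that was backed up again after being modified shows up once per run.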

I have used all these tools in the past, although not for this specific purpose.