Monday, December 17, 2007

Recovering data from the EXT2 Filesystem

EXT2 Undelete: A pretty good tutorial for recovering data from ext2 filesystems:

http://fedora.linuxsir.org/doc/ext2undelete/Ext2fs-Undeletion.html

4 comments:

joshuat said...

Ironic that you post this today. On Friday I had to recover deleted data, though on an ext3 filesystem. Since ext3 doesn't have recovery tools available, I used grep -a and perl (with binmode()). Fortunately it was an HTML file and easy enough to find.

cmihai said...

I'm glad you've managed to get your data back... Feel free to post any interesting (perl) scripts you may have used.

If there's one thing I've learned about data recovery, it's that there are many ways to recover files if they haven't been overwritten. So the best approach when data loss occurs is to sync the disks and halt the system (power off). You can then mount it read-only (or image it), and dig around for your data ;-).

Knowing a thing or two about what you're looking for helps a lot with file carving (signatures for example). Also, having a filesystem specification handy helps. Or you could just use some undelete tool specific to the filesystem.
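
To make the signature idea concrete, here is a minimal Python sketch of carving by known start and end markers. It is only an illustration, not a real undelete tool: the function name carve and the markers in the usage note are hypothetical, and it assumes the data (ideally an image of the partition rather than the live device) fits in memory or is memory-mapped.

```python
def carve(data, start, end):
    """Yield (offset, chunk) for every region of `data` that begins with
    the `start` signature and finishes with the `end` signature.

    `data` can be a bytes object or an mmap of a disk image; both
    support find() and slicing.
    """
    pos = 0
    while True:
        begin = data.find(start, pos)
        if begin == -1:
            return  # no further start signature
        stop = data.find(end, begin + len(start))
        if stop == -1:
            return  # start found, but the region is never closed
        stop += len(end)  # include the end signature in the chunk
        yield begin, data[begin:stop]
        pos = stop  # continue searching after this region
```

For example, list(carve(open("image.dd", "rb").read(), b"<html>", b"</html>")) would return every candidate HTML region in a hypothetical image file image.dd, together with its byte offset. Fragmented files will come out incomplete, since this only sees contiguous bytes.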

PS: Comments are moderated so they may take a while to show up.

joshuat said...

The file in question was a tiddlywiki.html file, which had dozens of backups created, so I had a LOT of duplicate data to sort through. The command that got the bulk of the data I needed was this:

perl -wnl -e 'BEGIN{binmode STDIN; binmode STDOUT}; /^beginning regexp/ .. /^ending regexp/ and print;' /dev/sda5

To filter out raw data, control characters and wide/unicode characters from the range in the div, I also used the following command:

perl -wnl -e 'BEGIN{binmode STDIN; binmode STDOUT}; /^beginning regexp/ .. /^ending regexp/ and print unless /[\x00-\x1F\x7F]/;' /dev/sda5

I used [\x80-\xff] and [\ca-\cz] in the 'unless' regexp as well. In both commands, the 'beginning regexp' and 'ending regexp' were HTML div tags, but Blogger didn't like those tags. Just imagine they're in the code.
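
For anyone who would rather not use Perl, the same range-and-filter approach can be roughly sketched in Python. This is an approximation of the commands above, not what was actually run: the function name extract_ranges is made up for illustration, the begin and end patterns are placeholders you supply yourself, and the reject class simply folds the control-character and high-byte ranges mentioned above into one pattern.

```python
import re

# Bytes treated as "raw data": control characters, DEL, and anything >= 0x80.
CONTROL = re.compile(rb"[\x00-\x1f\x7f-\xff]")

def extract_ranges(lines, begin, end, reject=CONTROL):
    """Yield the lines from each one matching `begin` through the next one
    matching `end` (inclusive), skipping lines that match `reject`.

    `lines` is any iterable of bytes lines, such as a file opened in "rb"
    mode. `begin`, `end` and `reject` are compiled bytes patterns.
    """
    inside = False
    for raw in lines:
        line = raw.rstrip(b"\n")  # like perl -l, drop the trailing newline
        if not inside and begin.search(line):
            inside = True  # range opens on this line
        if inside:
            if not reject.search(line):
                yield line
            if end.search(line):
                inside = False  # range closes after this line
```

Usage would be something like extract_ranges(open("image.dd", "rb"), re.compile(rb"^<div"), re.compile(rb"^</div>")), where image.dd and both patterns are stand-ins. Keep in mind that a raw partition can contain very long stretches without a newline, so individual "lines" may be large.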