Recover Deleted Data Using Scalpel

Almost everyone has at some point run rm -rf on a directory they should not have touched. Backups help, but what if there are none? On Linux there is Scalpel, a utility that recovers deleted files by matching configurable patterns, including regular expressions.

Scalpel is a fork of the Foremost project (version 0.69) that began in 2005. It has its own GitHub repository and is faster and more efficient at recovery than Foremost. As for the difference between the two projects, versions of Foremost released after 0.69 gained new semantic recovery techniques. For example, when restoring a JPEG file, Foremost uses the file header to work out the corresponding image body, whereas Scalpel simply takes the data between the configured start and end signatures of the image file. In short, Foremost can recover lost data more accurately, while Scalpel does it much faster.
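The signature-based approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Scalpel's actual implementation: the function name carve_between_signatures and the toy image are hypothetical, and real carvers stream the image in chunks instead of holding it in memory.

```python
def carve_between_signatures(data: bytes, header: bytes, footer: bytes,
                             max_size: int) -> list[bytes]:
    """Return every span that starts with `header` and ends with the
    first `footer` found within `max_size` bytes of that header."""
    carved = []
    pos = data.find(header)
    while pos != -1:
        window_end = min(pos + max_size, len(data))
        # Look for the footer only inside the allowed size window.
        end = data.find(footer, pos + len(header), window_end)
        if end != -1:
            carved.append(data[pos:end + len(footer)])
        pos = data.find(header, pos + 1)
    return carved


# Toy "disk image": two JPEG-like blobs surrounded by unrelated bytes.
image = (b"junk" + b"\xff\xd8\xffAAAA\xff\xd9"
         + b"more junk" + b"\xff\xd8\xffBB\xff\xd9" + b"tail")
files = carve_between_signatures(image, b"\xff\xd8\xff", b"\xff\xd9", max_size=64)
print(len(files))
```

Nothing here inspects the file contents between the two signatures, which is why this style of carving is fast but can return truncated or over-long files.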

Features that Scalpel provides:

  1. File System Independent Recovery
  2. Setting the minimum and maximum sizes of the restored file
  3. Using multithreading on multicore systems
  4. Asynchronous I/O, which speeds up pattern searching
  5. TRE regular expressions for matching the start and end of a file
  6. Ability to recover from nested data structures
  7. Optional GPU acceleration for enthusiasts, available only on Linux; it requires a pre-installed NVIDIA CUDA SDK and small modifications to the source code (regular-expression search does not work on the GPU)

Scalpel is usually available as a package in your Linux distribution, but you can also build it from source, which is available in the GitHub repository.

The application is configured in /etc/scalpel/scalpel.conf, which defines the file search patterns. It ships with ready-made presets, for example for images or doc files. To recover lost data, uncomment the relevant patterns and run the application.

If the configuration file has no pattern for the file type you need, or you are looking for some specific XML format, for example, you have to write your own pattern, following the same format as the rules below
Type  Case sensitive  Size range  Header                                    Footer                                    Search option
avi   y               50000000    RIFF????AVI
doc   y               10000000    \xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00  \xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00  NEXT
pdf   y               500000      %PDF                                      %EOF\x0d                                  REVERSE
pdf   y               500000      %PDF                                      %EOF\x0a                                  REVERSE
tex   y               300:50000   /%.{1,20}\.tex/                           /%.{1,20}\.tex\sEnd/

Briefly, about the columns

  • Type - a label that matters only to the program itself and has nothing to do with the content of the recovered files; it is used only in the log and in the names of the output directories. It may be omitted, in which case put NONE
  • Case sensitive - whether the given pattern is matched case-sensitively
  • Size range - a single number is the maximum size of the file to search for; two numbers separated by a colon are the minimum and maximum file size (min:max)
  • Header, Footer - patterns for finding the beginning and end of a file. TRE regular expressions are allowed. \s stands for a space, and you can also use hexadecimal and octal representations of the data you need, for example \x[0-f][0-f] or \[0-3][0-7][0-7]
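As a rough sketch of how the Size range and Header/Footer columns might be interpreted, the Python below parses a size field and decodes the \x, octal, and \s escapes into raw bytes. The helper names parse_size_range and decode_pattern are hypothetical, the minimum of 0 for a single number is an assumption, and regular-expression patterns (those wrapped in slashes) are deliberately left out.

```python
import re


def parse_size_range(field: str) -> tuple[int, int]:
    """'500000' -> (0, 500000); '300:50000' -> (300, 50000).
    A minimum of 0 for the single-number form is an assumption."""
    if ":" in field:
        low, high = field.split(":", 1)
        return int(low), int(high)
    return 0, int(field)


# \xHH (hex), \OOO (octal, first digit 0-3), or \s (space).
_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})|\\([0-3][0-7]{2})|\\s")


def decode_pattern(field: str) -> bytes:
    """Turn a literal header/footer field into the bytes it stands for."""
    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        if match.group(2) is not None:
            return chr(int(match.group(2), 8))
        return " "
    # latin-1 maps code points 0-255 one-to-one onto byte values.
    return _ESCAPE.sub(replace, field).encode("latin-1")


print(parse_size_range("300:50000"))
print(decode_pattern(r"%EOF\x0d"))
```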

Note that the last field, Search option, accepts the following values:

  • REVERSE - use this when the footer pattern can occur several times within one file, as in PDF files or in PHP files that contain several separately delimited script blocks
  • NEXT - use this to take the data between the header and the first footer that follows it; if no footer is found, a region of the size given in the size field is taken
  • FORWARD_NEXT - the default behavior, so specifying it is optional. Here the pattern is checked strictly: if the header is found but no footer is found within the specified size, the region is excluded from the search results. The exception is when the -b option is given at startup: then a region of the specified size starting at the header is taken, and the application's log (audit.txt) records that the file was chopped (Chop column)
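The three search options can be compared with a small Python sketch. It reflects one reading of the descriptions above and is not Scalpel's code: find_carve_end is a hypothetical helper, REVERSE is assumed to pick the last footer inside the size window, and NEXT is assumed to stop just before the footer (in the doc rule the footer is the header of the next file).

```python
def find_carve_end(data: bytes, start: int, header_len: int, footer: bytes,
                   max_size: int, mode: str = "FORWARD_NEXT",
                   keep_chopped: bool = False):
    """Return the end offset (exclusive) of the file that starts at `start`,
    or None if the candidate is discarded."""
    limit = min(start + max_size, len(data))
    body_start = start + header_len
    if mode == "REVERSE":
        # Assumption: take the LAST footer that fits in the size window.
        hit = data.rfind(footer, body_start, limit)
        return hit + len(footer) if hit != -1 else None
    hit = data.find(footer, body_start, limit)
    if mode == "NEXT":
        # Assumption: footer excluded; fall back to the maximum size.
        return hit if hit != -1 else limit
    # FORWARD_NEXT: strict check, unless chopped files are kept (-b).
    if hit != -1:
        return hit + len(footer)
    return limit if keep_chopped else None


pdf = b"%PDF body %EOF\n more %EOF\n" + b"garbage"
print(find_carve_end(pdf, 0, 4, b"%EOF\n", 100))             # first footer
print(find_carve_end(pdf, 0, 4, b"%EOF\n", 100, "REVERSE"))  # last footer
```

With several footers in one file, the default mode stops at the first one and would cut the PDF short, which is the case REVERSE is meant for.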

If you need to match a literal question mark "?" in a header or footer, you have to redefine the wildcard character, which is the question mark by default. To do so, put the following line at the beginning of the configuration file
wildcard S
where S is the new wildcard character for search expressions. Alternatively, write the question mark in its hexadecimal or octal form, \x3f or \077
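To see why the wildcard matters, here is a minimal Python sketch of wildcard-aware matching. The function matches_at is hypothetical and only mimics the behavior described above; it is not taken from Scalpel.

```python
def matches_at(data: bytes, offset: int, pattern: bytes,
               wildcard: bytes = b"?") -> bool:
    """True if `pattern` matches `data` at `offset`, where the wildcard
    byte in the pattern matches any single byte of data."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    wc = wildcard[0]
    window = data[offset:offset + len(pattern)]
    return all(p == wc or p == d for p, d in zip(pattern, window))


# The avi rule: four arbitrary size bytes between "RIFF" and "AVI".
print(matches_at(b"RIFF\x10\x20\x30\x40AVI LIST", 0, b"RIFF????AVI"))

# With the default wildcard, "<?php" also matches "<xphp";
# after redefining the wildcard to "*", only a real "?" matches.
print(matches_at(b"<xphp", 0, b"<?php"))
print(matches_at(b"<xphp", 0, b"<?php", wildcard=b"*"))
```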

Now for some practice

Suppose we have deleted files that match the patterns in the table above. We write these patterns into the configuration file /etc/scalpel/scalpel.conf (columns are separated by tab characters) and start the recovery (I prepared an image with data to recover in advance)

root# scalpel MyDrive.img -o recover
Written by Golden G. Richard III, based on Foremost 0.69.
Opening target "/home/username/Documents/repair_files/test/MyDrive.img"
Image file pass 1/2.
MyDrive.img: 100.0% |*****************************************************|  500.0 MB    00:00 ETA 
Allocating work queues...
Work queues allocation complete. Building carve lists...
Carve lists built.  Workload:
avi with header "\x52\x49\x46\x46\x3f\x3f\x3f\x3f\x41\x56\x49" and footer "" --> 1 files
doc with header "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00" and footer "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00" --> 2 files
pdf with header "\x25\x50\x44\x46" and footer "\x25\x45\x4f\x46\x0d" --> 33 files
pdf with header "\x25\x50\x44\x46" and footer "\x25\x45\x4f\x46\x0a" --> 19 files
php with header "\x3c\x3f\x70\x68\x70" and footer "\x3f\x3e" --> 8 files
Carving files from image.
Image file pass 2/2.
MyDrive.img: 100.0% *****************************************************|  500.0 MB    00:00 ETA
Processing of image file complete. Cleaning up...
Scalpel is done, files carved = 63, elapsed = 6 seconds.

When the run completes, the output directory contains the recovered files and audit.txt, which holds brief information about each file found, similar to the following

Scalpel version 1.60 audit file
Started at Wed Jan  7 12:50:52 2015
Command line:
scalpel MyDrive.img -o recover 
Output directory: /home/username/Documents/repair_files/test/recover
Configuration file: /etc/scalpel/scalpel.conf
Opening target "/home/username/Documents/repair_files/test/MyDrive.img"
The following files were carved:
File		  Start			Chop		Length		Extracted From
00000003.pdf       549888		NO             4162		MyDrive.img
00000055.php      1227776		NO            99954		MyDrive.img
00000001.doc      8916992		YES        10000000		MyDrive.img
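The table in audit.txt is easy to post-process, for instance to list the files that were chopped. The Python sketch below assumes the column layout shown above and file names without spaces; parse_audit_rows is a hypothetical helper, not part of Scalpel.

```python
def parse_audit_rows(text: str) -> list[dict]:
    """Extract the rows of the 'files were carved' table from audit.txt."""
    rows = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("File") and "Chop" in line:
            in_table = True  # column header line; data rows follow
            continue
        parts = line.split()
        if (in_table and len(parts) == 5
                and parts[1].isdigit() and parts[3].isdigit()):
            rows.append({
                "name": parts[0],
                "start": int(parts[1]),
                "chopped": parts[2] == "YES",
                "length": int(parts[3]),
                "source": parts[4],
            })
    return rows


sample = """\
The following files were carved:
File		  Start			Chop		Length		Extracted From
00000003.pdf       549888		NO             4162		MyDrive.img
00000055.php      1227776		NO            99954		MyDrive.img
00000001.doc      8916992		YES        10000000		MyDrive.img
"""
chopped = [row["name"] for row in parse_audit_rows(sample) if row["chopped"]]
print(chopped)
```

A chopped file hit the maximum size without a footer being found, so it is the first candidate to check by hand.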

A few of the other available options are also worth noting
-p preview mode: no files are restored, but an audit file is created that shows which files would be restored
-q with this option scalpel scans only the beginning of each cluster of the given size for a matching file header
-v verbose mode
-o the directory where the recovered data will be written

Good luck with your data recovery!

Useful links
  1. GitHub Scalpel repository
  2. Scalpel: A Frugal, High Performance File Carver. Golden G. Richard III, Vassil Roussev
  3. SANS Institute InfoSec reading room: data carving concepts
  4. TRE Regex Syntax
