====== Hex-Edit your way through the volume ======
This how-to explains how to find a file manually on an NTFS volume.
You will need
* Free time.
* A calculator that supports Hexadecimal display.
* The will to do this.
* And, ahm, a hex-editor.
===== Finding the $MFT =====
Open the volume using your favourite hex-editor. The hex listings in this document were generated by the standard ''hexdump''.
At the very beginning (offset 0), you should see something similar to:
# hexdump -Cn 16 /dev/hda1
00000000 eb 52 90 4e 54 46 53 20 20 20 20 00 02 08 00 00 |.R.NTFS    .....|
The first 3 bytes are a jump instruction in x86 assembly. It is the first instruction executed after the boot sector receives control from the MBR (let's say, from GRUB or LILO). Right after that you find the ''NTFS'' magic. If you don't see that magic value - you're not looking at an NTFS volume.
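If you prefer to let a script do the squinting, here is a minimal sketch of the same check. The bytes are copied from the listing above; the function name ''is_ntfs'' is made up for this example.

```python
# First 16 bytes of the volume, copied from the hexdump listing above.
boot = bytes.fromhex("eb52904e544653202020200002080000")

def is_ntfs(boot_sector: bytes) -> bool:
    """Return True if the 8-byte magic at offset 3 is 'NTFS' padded with spaces."""
    return boot_sector[3:11] == b"NTFS    "

print(is_ntfs(boot))
```

On a real volume you would read the first sector of the device instead of pasting the bytes in.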
Since this is the boot sector, let's open the [[:documentation|NTFS Technical documentation]] on the $Boot entry (in the files chapter), and see what information we can extract here.
Well, it does say that offset 0 has a 3-byte jump and an 8-byte magic, so I'm right this far. But it also says something about "LCN of VCN 0 of the $MFT", what is that?
The MFT is the Master File Table. It is a table (ya right) of entries (that's what a table is about), where each entry is called a "file record". This is the table that lists every file/directory/meta-data on the volume. We are obviously interested in that (unless all you want is to find the volume serial number or some other trivial stuff).
And what are those LCN/VCN abbreviations? These are Logical/Virtual Cluster Numbers. If you don't know what a cluster is, check out the ''cluster'' entry in the concepts chapter of the [[:documentation|NTFS Technical documentation]]. The cluster size is determined by the "Bytes per sector" and "Sectors per cluster" fields of $Boot.
Let's return to our example:
So we need to find the "LCN of VCN 0 of the $MFT".
# hexdump -C -s 0x30 -n 8 /dev/hda1
00000030 0a 00 00 00 00 00 00 00 |........|
(Almost) All the NTFS structures are in [[wp>Little_endian|Little-Endian]] format. In simple and practical terms, the bytes of a number are stored least significant first, so you read them backwards. Therefore, the LCN we need is 0x0A, or in decimal: 10.
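If reversing bytes in your head is not your thing, Python's ''int.from_bytes'' does it for you. A small sketch, using the 8 bytes from the listing above:

```python
# The 8 bytes at offset 0x30, copied from the listing above.
raw = bytes.fromhex("0a00000000000000")

# "little" tells Python that the least significant byte comes first.
mft_lcn = int.from_bytes(raw, "little")
print(mft_lcn)  # 10
```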
The value at offset 0x0B in $Boot ("Bytes per sector") is 0x0200 (see the first listing, and don't forget that it is in [[wp>Little_endian|Little-Endian]] format), and the value at offset 0x0D ("Sectors per cluster") is 0x08. That means that each cluster (on my system) is 512*8=4096 (=0x1000) bytes.
Now we know where to find the $MFT: at offset 4096*10=40960 (=0xA000). Let's dump the very beginning of it:
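The same arithmetic as a rough sketch. The bytes come from the first listing and the LCN from the second one; the variable names are my own and not taken from the documentation.

```python
import struct

# First 16 bytes of $Boot, copied from the first listing.
boot = bytes.fromhex("eb52904e544653202020200002080000")

# Offset 0x0B: bytes per sector (2 bytes), offset 0x0D: sectors per cluster (1 byte).
bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", boot, 0x0B)
cluster_size = bytes_per_sector * sectors_per_cluster

mft_lcn = 10  # "LCN of VCN 0 of the $MFT", read from offset 0x30
mft_offset = mft_lcn * cluster_size

print(cluster_size, hex(mft_offset))
```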
# hexdump -C -s 0xa000 -n 32 /dev/hda1
0000a000 46 49 4c 45 2a 00 03 00 41 c7 b4 c6 00 00 00 00 |FILE*...A.......|
0000a010 01 00 01 00 30 00 01 00 c0 01 00 00 00 04 00 00 |....0...........|
The ''FILE'' magic indicates that this is a ''file record''. There is a special entry for that in the [[:documentation|NTFS Technical documentation]].
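To get a feel for what those 32 bytes mean, here is a hedged sketch that unpacks them. The field layout is my reading of the ''file record'' entry in the documentation, and the variable names are invented for this example, so check both against the documentation before trusting them.

```python
import struct

# The first 32 bytes of the file record, copied from the listing above.
header = bytes.fromhex(
    "46494c452a00030041c7b4c600000000"
    "0100010030000100c001000000040000"
)

# Assumed layout: magic, update sequence offset and size, $LogFile sequence
# number, sequence number, hard link count, offset to the first attribute,
# flags, bytes in use, bytes allocated.
(magic, usa_offset, usa_count, lsn, seq, links,
 attrs_offset, flags, bytes_used, bytes_allocated) = struct.unpack_from(
    "<4sHHQHHHHII", header, 0)

print(magic, hex(attrs_offset), bytes_used, bytes_allocated)
```

The offset to the first attribute (0x30 here) is the field you need later when walking the attribute list.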
Congratulations, you (me) have found the (first entry in the) $MFT file.
===== Finding a specific MFT record =====
The $MFT is a stream of MFT records, a.k.a ''file record''s.
The size of each MFT record is given by ''Clusters per MFT Record'', the field at offset 0x40 in $Boot.
# hexdump -C -s 0x40 -n 4 /dev/hda1
00000040 f6 00 00 00 |....|
On my volume, the first byte is 0xF6, which is -10 when read as a signed 8-bit number (if you don't know how to recognize negative numbers, this is the time to learn). Looking again at the $Boot entry, we find out that a negative value means the record size is 2 to the power of its absolute value, so each MFT record contains exactly 2^10 = 1024 bytes.
Let's suppose we want to find MFT record #1234. We can find it at offset 1234 * 1024 in the $MFT.
A regular guy would try to add the $MFT offset on the volume and would conclude that the record is at 1234 * 1024 + 40960 = 1304576 (=0x13E800) from the start of the volume. This is wrong because the $MFT can become fragmented. Its VCNs are then not laid out sequentially on the volume, so the LCN of VCN x is not always x + 10 (10 comes from the example).
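Here is a small sketch of that rule. The function name ''mft_record_size'' is hypothetical; the logic is the one described in the $Boot entry: a positive value counts clusters, a negative value is a power of two in bytes.

```python
import struct

# The 4 bytes at offset 0x40, copied from the listing above.
raw = bytes.fromhex("f6000000")

# Only the first byte matters, and it is signed ("b" = signed 8-bit).
(value,) = struct.unpack_from("<b", raw, 0)

def mft_record_size(value: int, cluster_size: int) -> int:
    """Translate 'Clusters per MFT Record' into a size in bytes."""
    if value < 0:
        return 1 << -value        # -10 means 2**10 bytes
    return value * cluster_size   # otherwise a whole number of clusters

print(value, mft_record_size(value, 4096))
```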
The real question is "Where is $MFT offset 1234 * 1024 on the volume?"
===== Seeking an offset in a file =====
Suppose you have a ''file record''. It contains some data. That data is stored in attributes. One of these attributes is 0x80 ($DATA), and more specifically, the unnamed one.
Most attributes are resident, however, some can become nonresident. A resident attribute is an attribute that stores its data inside the ''file record''. When the data grows too large to fit inside a ''file record'', its contents are moved somewhere else on the volume, and the ''file record'' only points to that location. The pointer to that location is in the form of a run-list.
To find an offset in the file, we need to find what VCN this offset is in. In the example, this is in the middle of VCN 308 (1234 * 1024 / 4096 = 308.5). Then we look at the run-list to map it to an LCN. Don't forget to remove the fixup before searching through the run-list - this is one of the most common mistakes.
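The mapping can be sketched roughly as follows. The run-list bytes here are invented for illustration (they are NOT from my volume), the function names are my own, and the encoding is my reading of the run-list description in the documentation: each run starts with a header byte whose low nibble is the size of the length field and whose high nibble is the size of the (signed, relative) LCN offset field, and a zero header ends the list.

```python
def decode_runlist(data: bytes):
    """Return a list of (start_vcn, start_lcn, length_in_clusters) tuples."""
    runs, pos, vcn, lcn = [], 0, 0, 0
    while pos < len(data) and data[pos] != 0:
        len_size = data[pos] & 0x0F   # low nibble: bytes in the length field
        off_size = data[pos] >> 4     # high nibble: bytes in the offset field
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size:
            # The offset is signed and relative to the previous run's LCN.
            lcn += int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            start = lcn
        else:
            start = None              # no offset field: a sparse run
        pos += off_size
        runs.append((vcn, start, length))
        vcn += length
    return runs

def vcn_to_lcn(runs, vcn: int):
    """Map a VCN to its LCN, or None if the VCN lies in a sparse run."""
    for start_vcn, start_lcn, length in runs:
        if start_vcn <= vcn < start_vcn + length:
            return None if start_lcn is None else start_lcn + (vcn - start_vcn)
    raise ValueError("VCN is beyond the end of the run-list")

# Made-up run-list: 256 clusters at LCN 10, then 256 clusters at LCN 10+5000.
runs = decode_runlist(bytes.fromhex("1200010a220001881300"))

cluster_size = 4096
file_offset = 1234 * 1024
vcn, within = divmod(file_offset, cluster_size)
volume_offset = vcn_to_lcn(runs, vcn) * cluster_size + within
print(vcn, within, volume_offset)
```

With this pretend run-list, VCN 308 falls in the second run, which is exactly the case where simply adding the $MFT start offset gives the wrong answer.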
==== Finding the run-list ====
Look at the MFT record, read offset 0x14 (Offset to the first Attribute), and go through that list by using offset 0x4 of each attribute (Length) until you find the one you are looking for (usually 0x80).
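A minimal sketch of that walk, run against a tiny fake record built in the script (the record contents are made up; only the offsets 0x14 and 0x4 come from the text). It assumes that each attribute starts with a 4-byte type followed by a 4-byte length, and that a type of 0xFFFFFFFF marks the end of the list. It returns the first attribute of the wanted type and does not check whether it is the unnamed one.

```python
import struct

def find_attribute(record: bytes, wanted_type: int):
    """Return the offset of the first attribute of wanted_type, or None."""
    (offset,) = struct.unpack_from("<H", record, 0x14)  # offset to first attribute
    while offset + 8 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, offset)
        if attr_type == 0xFFFFFFFF:   # end-of-attributes marker
            break
        if attr_type == wanted_type:
            return offset
        if attr_len == 0:             # corrupt record, avoid looping forever
            break
        offset += attr_len            # Length at offset 0x4 leads to the next one
    return None

# Build a fake record: header, attribute 0x10, attribute 0x80, end marker.
record = bytearray(0x30)
record[0:4] = b"FILE"
struct.pack_into("<H", record, 0x14, 0x30)
record += struct.pack("<II", 0x10, 0x18) + bytes(0x10)
record += struct.pack("<II", 0x80, 0x20) + bytes(0x18)
record += struct.pack("<I", 0xFFFFFFFF) + bytes(4)

print(hex(find_attribute(bytes(record), 0x80)))
```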
==== Removing the fixup ====
There's a long description on that in the [[:documentation|NTFS Technical documentation]]. No need to duplicate it here.
==== Finding a specific MFT record (cont) ====
So by now, we know where the $MFT starts. We have the first file record. By looking in the ''files'' chapter, we find out that this is the ''file record'' that describes the $MFT itself.
And since we have the ''file record'' of the $MFT, we can find the run-list of the $MFT, and find any other MFT record.
Notice that this is almost cheating - to find the file record of $MFT we don't look it up like every other file record (using the run-list). This works because, on this volume, a cluster (4096 bytes) is larger than a file record (1024 bytes), and thus the ''file record'' of $MFT is guaranteed to be contiguous.
Of course, we know how to find an offset inside the $MFT because we know how to use that run-list, and we know how to calculate that offset, so we're set.
Now all that we need is to find the MFT record number of the file we need.
===== Translating a filename to its MFT record number =====
MFT record #5 is called . (dot). This is the root directory.
You will need to split the path leading to your file into its directory components. Then, you will need to scan each directory, starting with ".", until you find your file.
But since this how-to has become large enough, and has fulfilled its goal of helping you get started, you will have to use another how-to for that.
===== Conclusion =====
You have gained experience in using a hex-viewer/editor and looked (logically) at your hard-disk from a very low level.
You have also gained experience in reading the [[:documentation|NTFS Technical documentation]]. That means you now have the tools (and are starting to develop the skill) to read every little detail that resides on [[:NTFS]] volumes.
Congratulations, you have reached the end of this how-to. I hope you enjoyed your flight.
The Linux-NTFS team.