NAME

FileSystem::LL::FAT - Perl extension for low-level access to FAT partitions

SYNOPSIS

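  use FileSystem::LL::FAT qw(MBR_2_partitions interpret_bootsector check_bootsector);

  my ($fields, @partitions) = MBR_2_partitions($sector) or die "Not an MBR";
  my $b = interpret_bootsector($bootsector);
  check_bootsector($b);         # die()s unless minimal sanity checks hold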

DESCRIPTION

MBR_2_partitions($sector)

  ($fields, @partitions) = MBR_2_partitions($sector) or die "Not an MBR";

Takes the first sector as a string and extracts the partition info and other information. Currently the only fields in the hash referenced by $fields are bootcode (a string of length 446) and signature (0xAA55).

Each element of @partitions is a hash reference with fields

  raw is_active start_head start_sec_trac type end_head end_sec_track
  start_lba sectors start_sec end_sec start_trac end_track

Returns an empty list unless signature is correct.
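
For orientation, the standard MBR layout can be decoded with a few unpack() calls. The following is a minimal sketch of that technique, not the module's implementation: the function name sketch_parse_mbr and the keys start_cyl and end_cyl are hypothetical, and only the signature check described above is performed.

```perl
use strict;
use warnings;

# Illustrative only: decode the four primary partition entries of an MBR.
# Layout: 446 bytes of boot code, 4 entries of 16 bytes, 2-byte signature.
sub sketch_parse_mbr {
    my ($sector) = @_;
    return unless length($sector) >= 512;
    my ($bootcode, $table, $signature) = unpack 'a446 a64 v', $sector;
    return unless $signature == 0xAA55;    # stored on disk as 0x55 0xAA
    my @partitions;
    for my $i (0 .. 3) {
        my $raw = substr $table, 16 * $i, 16;
        # status, CHS start (3 bytes), type, CHS end (3 bytes), LBA, size
        my ($status, $h1, $st1, $c1, $type, $h2, $st2, $c2, $lba, $sectors)
            = unpack 'C C C C C C C C V V', $raw;
        push @partitions, {
            raw        => $raw,
            is_active  => ($status & 0x80) ? 1 : 0,
            type       => $type,
            start_head => $h1,
            start_sec  => $st1 & 0x3F,                  # low 6 bits
            start_cyl  => (($st1 & 0xC0) << 2) | $c1,   # hypothetical key
            end_head   => $h2,
            end_sec    => $st2 & 0x3F,
            end_cyl    => (($st2 & 0xC0) << 2) | $c2,   # hypothetical key
            start_lba  => $lba,
            sectors    => $sectors,
        };
    }
    return ({ bootcode => $bootcode, signature => $signature }, @partitions);
}
```

As with MBR_2_partitions(), the sketch returns an empty list when the signature does not match, so the result can be tested with "or die".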

interpret_bootsector($bootsector)

Takes a string containing a 512-byte bootsector; returns a hash reference with decoded fields. The keys include

  jump oem sector_size sectors_in_cluster FAT_table_off num_FAT_tables
  root_dir_entries total_sectors1 media_type sectors_per_FAT16
  sectors_per_track heads hidden_sectors total_sectors2
  machine_code FS_type boot_signature volume_label physical_drive
  ext_boot_signature serial_number raw

  bpb_ext_boot_signature guessed_FAT_flavor
  total_sectors sectors_per_FAT pre_sectors last_cluster sector_of_cluster0

(The last group contains values calculated from the other entries; guessed_FAT_flavor is one of 12, 16, 32, and bpb_ext_boot_signature is the ext_boot_signature calculated assuming the FAT12 or FAT16 layout of the bootsector.)

Additional flavor-dependent keys: in the FAT32 case

  sectors_per_FAT32 FAT_flags version rootdir_start_cluster
  fsi_sector_sector bootcopy_sector_sector reserved1 reserved2

otherwise

  extended_bpb head___dirty_flags
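
To show how the calculated entries relate to the raw ones, here is a minimal sketch that decodes the flavor-independent fields and derives the layout numbers. It is not the module's code: the function name sketch_decode_bpb is hypothetical, the mapping of key names to on-disk fields is inferred from their order in the list above, and the flavor guess uses the conventional cluster-count thresholds (4085 and 65525), which may differ from what interpret_bootsector() actually does.

```perl
use strict;
use warnings;

# Illustrative only: decode the common bootsector fields and derive the
# numbers needed for cluster arithmetic.
sub sketch_decode_bpb {
    my ($bs) = @_;
    my %f;
    @f{qw(jump oem sector_size sectors_in_cluster FAT_table_off
          num_FAT_tables root_dir_entries total_sectors1 media_type
          sectors_per_FAT16 sectors_per_track heads hidden_sectors
          total_sectors2)} = unpack 'a3 A8 v C v C v v C v v v V V', $bs;
    die "Implausible bootsector"
        unless $f{sector_size} and $f{sectors_in_cluster};
    $f{total_sectors} = $f{total_sectors1} || $f{total_sectors2};
    # Assumption: a zero 16-bit FAT size means that the 32-bit FAT32
    # field at byte offset 36 is the one to use.
    $f{sectors_per_FAT} = $f{sectors_per_FAT16} || unpack('x36 V', $bs);
    # The fixed-size root directory (absent on FAT32) holds 32-byte records.
    my $root_sectors = int(($f{root_dir_entries} * 32 + $f{sector_size} - 1)
                           / $f{sector_size});
    $f{pre_sectors} = $f{FAT_table_off}
                    + $f{num_FAT_tables} * $f{sectors_per_FAT} + $root_sectors;
    my $clusters = int(($f{total_sectors} - $f{pre_sectors})
                       / $f{sectors_in_cluster});
    $f{last_cluster}       = $clusters + 1;    # clusters are numbered from 2
    $f{sector_of_cluster0} = $f{pre_sectors} - 2 * $f{sectors_in_cluster};
    $f{guessed_FAT_flavor} = $clusters < 4085 ? 12 : $clusters < 65525 ? 16 : 32;
    return \%f;
}
```

For a 1.44MB floppy (512-byte sectors, 1 reserved sector, two FATs of 9 sectors, 224 root directory entries) this gives pre_sectors equal to 1 + 18 + 14 = 33.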

check_bootsector($fields)

Takes a hash reference with decoded fields of a bootsector; returns TRUE if minimal sanity checks hold; die()s otherwise.

interpret_directory($dir, $is_fat32, [$keep_del, [$keep_dots, [$keep_labels]]])

  ($res, $files) = interpret_directory($dir, $is_FAT32);
  $files = interpret_directory($dir, $is_FAT32);

Takes the concatenation of the directory cluster(s) as a string and extracts information about the files in the directory. Each element of the array referenced by $files is a hash reference with keys

  raw basename ext attrib name_ext_case creation_01sec time_create
  date_create date_access cluster_high time_write date_write cluster_low
  size cluster name dos_name time_creation
  is_readonly is_hidden is_system is_volume_label is_subdir is_archive is_device

and possibly lfn_name, lfn_name_UTF16, lfn_raw (if applicable). (The last row lists flags extracted from attrib.)

basename and ext are parts of the "DOS name" (lowercased if indicated by the flags); time_create has 0.01sec granularity (while time_creation has 2sec granularity). Entries for deleted files are filtered out unless $keep_del is TRUE; . and .. are filtered out unless $keep_dots is TRUE; records representing volume labels are filtered out unless $keep_labels is TRUE. If not filtered out, hashes for deleted files have an extra key deleted with a true value.
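
The raw keys listed above follow the order of the fields in a 32-byte short-name directory record, so a single record can be decoded as in the sketch below. This is an illustration of the on-disk format, not the module's code: the function names are hypothetical, dates and times are left as packed 16-bit words (the module may present them differently), and LFN records are not handled.

```perl
use strict;
use warnings;

# Illustrative only: decode one 32-byte short-name directory record.
sub sketch_parse_dir_entry {
    my ($raw, $is_fat32) = @_;
    my %e = (raw => $raw);
    @e{qw(basename ext attrib name_ext_case creation_01sec time_create
          date_create date_access cluster_high time_write date_write
          cluster_low size)} = unpack 'A8 A3 C C C v v v v v v v V', $raw;
    # Only FAT32 uses the high 16 bits of the starting cluster.
    $e{cluster} = $e{cluster_low} | ($is_fat32 ? $e{cluster_high} << 16 : 0);
    # Attribute bits 0..6, in the order of the flag keys listed above.
    my @flags = qw(is_readonly is_hidden is_system is_volume_label
                   is_subdir is_archive is_device);
    $e{ $flags[$_] } = ($e{attrib} >> $_) & 1 for 0 .. $#flags;
    $e{deleted} = 1 if ord($raw) == 0xE5;   # first byte marks deletion
    return \%e;
}

# Packed date: 7 bits of year since 1980, 4 bits of month, 5 bits of day.
sub sketch_decode_date {
    my $d = shift;
    return (1980 + ($d >> 9), ($d >> 5) & 0x0F, $d & 0x1F);
}

# Packed time: 5 bits of hours, 6 bits of minutes, 5 bits of seconds/2
# (hence the 2sec granularity).
sub sketch_decode_time {
    my $t = shift;
    return ($t >> 11, ($t >> 5) & 0x3F, 2 * ($t & 0x1F));
}
```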

lfn_raw contains an array reference with all the fractional entries which contain the Long File Name. Each of them is a hash reference with keys

  raw seq_number name_chars_1 attrib nt_reserved checksum_dosname
  name_chars_2 cluster_low name_chars_3 name_chars

$res is 'end' if an end-of-directory entry is encountered; it is 'mid' if the directory ends in the middle of LFN info. Otherwise $res is not defined.

FAT_2array($fat, $s, $w [, $offset [, $lim ] ] )

Takes a reference $s to a string which contains, at offset $offset, the string representation of the FAT table; the length of the FAT table in bytes is assumed to be $lim. $offset defaults to 0, $lim defaults to go to the end of the string.

Appends to the array referenced by $fat a numeric array representing the FAT. $w is the bitwidth of the field (one of 12, 16, 32).
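
The conversion depends on the bitwidth; the 12-bit case is the least obvious, since two entries share three bytes. The sketch below illustrates the unpacking under stated assumptions: the function name sketch_fat_to_array is hypothetical, it returns a list instead of appending to an existing array, and it masks FAT32 entries to their low 28 bits, which FAT_2array() may or may not do.

```perl
use strict;
use warnings;

# Illustrative only: decode a packed FAT into a list of numbers.
# $w is the entry width in bits: 12, 16 or 32.
sub sketch_fat_to_array {
    my ($sref, $w, $offset, $lim) = @_;
    $offset = 0 unless defined $offset;
    $lim = length($$sref) - $offset unless defined $lim;
    my $raw = substr $$sref, $offset, $lim;
    return unpack('v*', $raw) if $w == 16;
    # Assumption: only the low 28 bits of a FAT32 entry are kept.
    return map { $_ & 0x0FFFFFFF } unpack('V*', $raw) if $w == 32;
    die "Unsupported FAT width $w" unless $w == 12;
    my @fat;
    # FAT12: every 3 bytes hold two 12-bit entries, little-endian.
    for (my $i = 0; $i + 3 <= length($raw); $i += 3) {
        my ($b0, $b1, $b2) = unpack 'C3', substr($raw, $i, 3);
        push @fat, $b0 | (($b1 & 0x0F) << 8), ($b1 >> 4) | ($b2 << 4);
    }
    return @fat;
}
```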

check_FAT_array($fat, $b [, $offset ])

$fat is a reference to a numeric array, or to the string containing the representation of FAT at $offset (which defaults to 0). $b is a hash reference with keys guessed_FAT_flavor, media_type (e.g., the result of interpret_bootsector()).

Returns TRUE if the first two clusters satisfy the FAT conventions; otherwise die()s.
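
The usual FAT conventions for the first two entries are: entry 0 holds the media type in its low byte with all higher bits set, and entry 1 holds an end-of-chain value (on FAT16 and FAT32 its two top bits serve as volume-state flags). The sketch below checks exactly these two conventions on a numeric array; the function name sketch_check_fat_head is hypothetical, and the checks performed by check_FAT_array() may be stricter or looser.

```perl
use strict;
use warnings;

# Illustrative only: check the first two entries of a numeric FAT array.
sub sketch_check_fat_head {
    my ($fat, $w, $media_type) = @_;
    my %bits = (12 => 12, 16 => 16, 32 => 28);   # FAT32 uses 28 bits
    my $bits = $bits{$w} or die "Unsupported FAT width $w";
    my $mask = (1 << $bits) - 1;
    # Entry 0: media type in the low byte, all higher bits set.
    ($fat->[0] & $mask) == (($mask & ~0xFF) | $media_type)
        or die "Entry 0 does not match the media type";
    # Entry 1: an end-of-chain value; on FAT16/FAT32 the two top bits
    # are volume-state flags, so force them on before comparing.
    my $flags = $w == 12 ? 0 : 3 << ($bits - 2);
    (($fat->[1] & $mask) | $flags) >= $mask - 7
        or die "Entry 1 is not an end-of-chain marker";
    return 1;
}
```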

cluster_chain($cluster, $maxc, $fat, $b [, $compress [, $offset ] ])

  ($total, $chain) = cluster_chain($cluster, $maxc, $fat, $b, $compress, $offset);

$fat is a reference to a numeric array, or to the string containing the representation of the FAT at $offset (which defaults to 0). $cluster is the start cluster, $maxc is the maximal number of clusters to look for (0 meaning no limit). $b is a hash reference with keys guessed_FAT_flavor, last_cluster (e.g., the result of interpret_bootsector()).

$chain is an array reference with the clusters in the chain. $total is FALSE if no end-of-a-chain marker was seen; otherwise it contains the total number of clusters.

If $compress is TRUE (defaults to FALSE), the cluster chain is run-compressed: each contiguous run of clusters is converted to a pair of numbers: the starting cluster number, and the length in clusters. If $compress is a subroutine reference, then it is called with these numbers as arguments; otherwise these numbers are pushed into $chain.
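
The run-compression can be pictured with the following sketch, which follows a chain in a numeric FAT array and emits (start, length) pairs. It is a simplified illustration, not the module's code: the function name sketch_cluster_chain is hypothetical, it always compresses, and it treats any value above $last_cluster as an end-of-chain marker without distinguishing bad-cluster markers.

```perl
use strict;
use warnings;

# Illustrative only: follow a chain in a numeric FAT array and
# run-compress it into (start, length) pairs.
sub sketch_cluster_chain {
    my ($cluster, $maxc, $fat, $last_cluster, $on_run) = @_;
    my (@runs, $start, $len, %seen);
    my ($total, $ended) = (0, 0);
    my $flush = sub {
        return unless defined $start;
        if ($on_run) { $on_run->($start, $len) }
        else         { push @runs, $start, $len }
    };
    while (!$maxc or $total < $maxc) {
        last unless defined $cluster;      # FAT array is too short
        # Simplification: anything above $last_cluster ends the chain.
        if ($cluster > $last_cluster) { $ended = 1; last }
        last if $cluster < 2;              # free or reserved: broken chain
        last if $seen{$cluster}++;         # loop in a damaged FAT
        if (defined $start and $cluster == $start + $len) {
            $len++;                        # extends the current run
        } else {
            $flush->();
            ($start, $len) = ($cluster, 1);
        }
        $total++;
        $cluster = $fat->[$cluster];
    }
    $flush->();
    return (($ended ? $total : 0), \@runs);
}
```

With a callback as the last argument, the pairs are passed to the callback instead of being collected, which keeps memory usage flat for very long chains.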

read_FAT_data($fh, $how [, $offset, $b, $FAT ])

  $hash = read_FAT_data($fh, $how [, $offset, $b, $FAT ]);

Extracts one or more of MBR, bootsector, FAT table, root directory from a file $fh containing the "contents of a disk". $fh may be a reference to a file handle, or a name of the file. The optional argument $offset is the offset inside the file of the first entry to extract, or of the bootsector (default 0).

The hash reference $how contains extraction instructions. If the values of keys do_MBR, do_bootsector, do_FAT, do_rootdir are defined, the corresponding parts of the filesystem are read. If do_MBR's value is 'maybe' and do_bootsector's is defined, the sector read as the MBR is checked to determine whether it is an actual MBR or a bootsector. The actual value of the key do_FAT chooses the copy of the FAT to work with.

The value of key partition governs which partition of 0..3 to choose (only primaries are currently supported); if not defined, and the number of valid partitions differs from 1, the call die()s.

If the value of key FAT_separate is TRUE, $offset is the offset of the start of (the first) FAT in the file; otherwise it is the offset of the MBR or bootsector (offsets of other parts are calculated as needed). If the value of key rootdir_is_standalone is TRUE, rootdir is assumed to be the whole content of the file.

If the values of keys parse_MBR, parse_bootsector, parse_FAT, parse_rootdir are defined (or this is needed for processing of remaining parts to extract), the corresponding read parts are interpreted as in MBR_2_partitions(), interpret_bootsector(), FAT_2array(), interpret_directory().

The corresponding parsed values are put into $hash->{MBR}, $hash->{bootsector}; if not parsed, the values are hash references {raw => STRING}. $hash->{FAT} is suitable for argument of cluster_chain(): it is either a reference to string representation of the FAT, or to array representation of FAT.

(To avoid overflowing the memory) the FAT is converted to array only if parse_FAT is defined, AND the number of clusters is below a certain limit. The limit is the value of parse_FAT unless 0; if 0, the default value 3000000 is used (the corresponding memory usage for array FAT representation is about 60MB).

When bootsector is read, $hash->{bootsector_offset} is the actual offset of bootsector (useful if $offset is actually referencing an MBR). Finally, if parse_rootdir's value is defined, $hash->{rootdir_files} is a reference to array of files in the root directory, $hash->{rootdir_ended} is true if end-of-directory marker was seen (i.e., the directory ends before the end of the allocated space); anyway, $hash->{rootdir_raw} is string representation of the root directory.

The keys keep_del, keep_dots, keep_labels are passed as the corresponding arguments to interpret_directory(). If the value of raw_FAT is TRUE, or the value of parse_FAT is undefined, $hash->{FAT_raw} contains a reference to the string representation of the FAT.

write_dir($fh, $o_root, $d, $b, $FAT, [$how, $depth, $offset, $exists])

Recursively extracts the content of the directory $d (a reference to the raw string representation of the directory as stored on disk). A $depth of zero means no extraction of subdirectories (give undef or an insanely large number, e.g., 1e100, for unlimited depth). $fh should be a file handle representing the disk content with the bootsector at $offset. $o_root is the output directory: the files in $d will be put there.

If $exists is TRUE, $o_root exists. (The parent of $o_root should always exist.)

$how is an optional hash reference, with values for keys keep_del, keep_dots, keep_labels giving arguments for interpret_directory() call.

write_file($fh, $dir, $file, $b, $FAT [, $offset ] )

Extract $file (should be a hash reference representing a record from a directory) into a directory $dir. $fh should be a file handle representing the disk content with bootsector at $offset.

EXPORT

None by default.

EXAMPLES

  perl -MFileSystem::LL::FAT=interpret_directory -wle "
    {local $/; binmode STDIN; $s = <STDIN>}
    (undef,$f) = interpret_directory $s, 1;
    print qq($_->{is_subdir} $_->{cluster}\t$_->{size}\t$_->{name}) for @$f"
        < dir-clusters

outputs the content of a "directory converted to a file" (which may be created by a disastrous chkdsk run), including the starting cluster of each entry.

Given the number of "pre-cluster sectors" and the size of a cluster, one can convert the starting cluster number to the starting sector number. Then one can extract the files by a raw read of the disk partition:

  $sector = $bootsec->{pre_sectors}
          + ($cluster - 2)*$bootsec->{sectors_in_cluster};
  # or, equivalently,
  $sector = $bootsec->{sector_of_cluster0}
          + $cluster * $bootsec->{sectors_in_cluster};
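
A raw read based on this arithmetic might look roughly like the sketch below. It assumes that the sector number is counted from the bootsector, that the file is stored in contiguous clusters, and that $bootsec has the keys pre_sectors, sectors_in_cluster and sector_size; the function name sketch_read_clusters is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Illustrative only: read $count contiguous clusters starting at $cluster
# from a filehandle opened on a disk or partition image.
sub sketch_read_clusters {
    my ($fh, $bootsec, $cluster, $count, $offset) = @_;
    $offset ||= 0;    # byte offset of the bootsector inside the file
    my $sector = $bootsec->{pre_sectors}
               + ($cluster - 2) * $bootsec->{sectors_in_cluster};
    my $bytes = $count * $bootsec->{sectors_in_cluster}
              * $bootsec->{sector_size};
    seek $fh, $offset + $sector * $bootsec->{sector_size}, 0
        or die "seek failed: $!";
    my $got = read $fh, my $buf, $bytes;
    die "short read" unless defined $got and $got == $bytes;
    return $buf;
}
```

The caller would still need to truncate the result to the size recorded in the directory entry, since the last cluster of a file is usually only partly used.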

Likewise, one can inspect a bootsector via

  perl -MFileSystem::LL::FAT=interpret_bootsector,check_bootsector -wle "
    {local $/; binmode STDIN; $s = <STDIN>}
    $b = interpret_bootsector $s; check_bootsector $b;
    print qq($_\t=>\t$b->{$_}) for sort keys %$b"
       < disk.bootsector

On DOSish systems one can read bootsector of drive d: by reading the first 512 bytes of the file \\.\d:. E.g., with dd one could do it as

  dd if=//./d: bs=512 count=1 of=disk.bootsector

On UNIXish systems one needs to find the corresponding device file (by calling mount or /sbin/mount?), and do

  dd if=/dev/hda3 bs=512 count=1 of=disk.bootsector

Other DOSish conventions (see also diskext, bootpart, mkbt programs):

  \\?\Device\Harddisk0\Partition0       # Partition0 is entire disk
  //./physicaldrive0
  /dev/fd0                              # Floppy 0 under CygWin
  /dev/sdc                              # Physical HD No. 2 (=c) under CygWin
  /dev/sdc1                             # Same, partition 1

Other programs may be used too:

  D:\mkbt20>mkbt -x -c e: c:\bootstrap-e2
  * Expert mode (-x)
  * Copy bootsector mode (-c)

  dd if=//./e: of=c:/bootstrap-e-dd count=16
  dd --list

CygWin's dd may be flaky; you may want to try http://www.chrysocome.net/dd. You may need elevated privileges under Vista.

BUGS

When lowercasing non-LFN names, which codepage should one use (and how)?

We ignore LFN records with seq-number > 0x7F, unless 0xE5. When do they appear?

How to follow logical partitions?

Test suite is practically absent...

When recursing into a directory without a FAT table present, we assume that subdirs have the size of one cluster. To do otherwise, one needs to check that subsequent clusters are not directories; how to do it?

And how often are directories contiguous on disk?

SEE ALSO

See http://en.wikipedia.org/wiki/Fat32.

AUTHOR

Ilya Zakharevich, <ilyaz@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2009 by Ilya Zakharevich

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.