Finding Fat (The Quest For FAT)

When developing a certain hardware and software complex, it was necessary to create a client device, which for other devices should look like a regular USB flash drive, or, more formally, a USB Mass Storage Device. The device is unusual in that it must simulate the FAT file system with files of a sufficiently large size (2GB or more) for the outside world, despite the fact that the files themselves on the device, of course, are absent and are on the network. Anyway, these are not files, but some kind of audio streams.

The task, at first glance, is simple: for each request to read a block (SCSI command) we give the contents of this block. A block can either belong to any of the “files” or contain FAT overhead information.

The first thought was, of course, to pack the FAT image with, for example, bzip2 and unpack it on the device as needed. Three problems arise immediately:

A compressed image of the file system from the heel of 100 megabyte files and several directories takes something around 125Kb, which exceeds the amount of RAM on the device.
Bzip2, apparently, does not know anything about sectors and works at the thread level, therefore, it will be necessary to pack sector-by-sector, therefore, the image size will be extremely prohibitive: as you know, packing with small blocks is unprofitable - the overhead is large compared to the gain from compression.
Image preparation (mounting an empty file, creating a file system, packing) takes a lot of time and is hardly acceptable for implementation on a server with estimated thousands of users. And the image itself in the process of creation requires some fairly noticeable disk space.

Well, not to mention the fact that you have to port bzip2 to the microcontroller.

So something else had to be come up.

The problem can be posed as follows: it is necessary to write a code that will take some description of the file system as an input in some form, and return its contents for each request of the sector number. The content is either service information or file data, which are taken from the corresponding audio stream at a given URL.

This formulation of the question leads us to a system of rules:



Номер сектора => Содержимое сектора

Note that we are talking about sectors, not “clusters,” since a cluster is a concept of the FAT file system itself. Devices work at the block level, they are also sectors. Let's say our “playlist” contains 10 “files” of 2Gb each (2Gb is such a practical approach to infinity). If each rule has a size of one byte, which, of course, is impossible, then we get it. byte for all the rules. Somewhat more reasonable. But, of course, the rules are not unique to each sector. We will set the rules for the ranges of sectors. This leads us to a set of rules:

 2*1024*1024*1024 * 10 / 512 = 41 943 040

 Номер сектора (A) => Содержимое сектора

 Диапазон секторов (A,B) => Содержимое сектора

We will also try to pack the sectors themselves. Since we are not faced with the task of compressing data - the data themselves are not available in the device and are taken from the Web, you just need to somehow provide more or less compact presentation of the service data of the file system itself. At first glance, there are many repeating sequences in this data, so we will encode as follows: repetitive sequences are presented as Non- repeating sequences , presented as In addition, those sequences that we have already encoded, or it would be nice not to re-insert them, but refer to them. Probably, we have another sequence. Perhaps in the process of implementation, other sequences may appear for a more compact representation of the file system structures.

 (Признак RLE, Количество, Символ)

 (Признак Sequence, Последовательность символов)

 (Вызов последовательности, идентификатор)

All this is very similar to the command system of a virtual machine, and since there are calls, that is, a stack. The simplest known virtual machine is one of the varieties of a fort. In fact, this is a reverse
Polish entry on steroids, with an added stack of return addresses from calls, which eliminates the need to bother with organizing function frames: everything is extremely simple - when returning from a call, remove the
top word from the R stack and go to the address to which it points .

In addition, the token threaded code (and this will be it) for the two-stack machine has a very good density, which, in this case, suits us very well.

Interpreting such a code is fast, it turns out just an average of five times slower than the native code, and also very simple.

So, we have some kind of coding system, a system of rules and some kind of virtual machine on which these rules should be played.

It remains to generate these rules from a certain description, get a bytecode and implement a machine for its interpretation. And only then we will see what happened.

With the implementation of a virtual machine, the situation is simple: it will work on a microcontroller, respectively, so far there are no C options here. True, it is possible that there is nothing to write there - it will turn out to be generated in some
way.

All that remains is the generation of the rule system from the description, the description itself, the code generation and the description of the commands of this code. In addition, it would be good to check the rules not sequentially, but somehow more reasonably: organize checks in the form
comparison tree, so that the number of comparisons per sector would be of the order of the binary logarithm of the number of comparisons.

We’ve finished the initial analysis, we would have to make a prototype and see what we get.

We will need to generate and, possibly, read binary data of various dimensions and endianesses (FAT service data are written in low-endian format) and work with nested data structures.

What would it be implemented on? C, C ++ or maybe Python? Or Ruby? Joke.
Of course, we will do it at Haskell: the task is not the simplest, it requires some kind of performance, and we have little time. Well, anyway, the server that will call this code is also implemented on
Haskell, so the choice is quite natural.

Let's get started.

The central thing in the system is the “rules”. They convert the description of the file system, the code is generated from them. We describe them: In addition, there is a description of the file system itself, which consists of directories and files, with some specifics of the FAT itself. Here we dwell in more detail. The strange constructors DirDor and DirDotDot are nothing more than the '.' Directories and '..', which - here's a surprise - are top-notch, physically present directory entries. Fortunately, they are only links and do not require the allocation of clusters. Otherwise, everything is pretty obvious: the first attribute of type constructors is a unique identifier. It obviously can be useful to us in order to understand the firmware from which “file” the data was requested.

data Rule = REQ Int [Chunk] | RANGE Int Int [Chunk] deriving Show

data Chunk = SEQ BS.ByteString

 | RLE Int Word8 

 deriving (Eq, Ord)

data Entry =  DirRoot Int [Entry]
            | DirDot     Int
            | DirDotDot  Int
            | Dir Int String [Entry]
            | File Int String Int BS.ByteString
    deriving (Eq, Ord, Data, Typeable, Show)

The second attribute is the file name. In the case of a file, we also add its size and data. This, of course, is not the data of the file itself, but some indication of the firmware of the device where to get this data. There you can write, for example, a syssh structure or stream URL. Therefore ByteString.

Now we need to somehow construct the Entry, taking into account the requirements of the file system: each directory, except the root, must contain the entries '.' and '..', they must refer to the corresponding directories, should not be
identical record names, names should not contain forbidden characters and so on and so forth. It can be seen that manually creating this structure is difficult, in addition, the user of the API must deal with this, and he will definitely mix something up and everything will break, but this is a serious matter. So it’s better to prohibit the import of Entry-type content from our module, and provide the user with some more convenient and error-protected solution. Something like: It looks good, even not knowing the language can understand what is described here. It’s easy to implement: to generate something, there is already a ready-made Monad Writer.

fileContents = ... 
fatSample2 = filesystem $ do
    file "file0" (16384) fileContents 
    dir "A" $ do
      file "file1" (megs 100) fileContents 
      dir "C" $ do
        file "file3" (megs 100) fileContents 
        file "file4" (megs 100) fileContents 
        file "file5" (megs 100) fileContents 
        dir "E" $ emptyDir 
     dir "B" $ do
       file "file2" (megs 50) emptyFile

In addition, we will need to distribute unique identifiers, so State is also useful, where we put some kind of counter. Since we want to cross State and Writer, the monad transformer will not hurt us. Something like this: Each function takes parameters like name, size, and another monadic value for constructing nested records. Each such calculation will be run in a separate Writer, and the State will be dragged along to ensure that the identifiers are unique. So, we have set the directory structure, now we need to get the rules out of it somehow. To do this, somehow place the data files and directories on the "disk". We assume that they are placed sequentially, first directories, then files:

newtype EntryIdT m a = EntryIdT {
    runF :: (WriterT [Entry] (StateT (Int, Int) m)) a 
} deriving (Monad, MonadWriter [Entry], MonadState (Int, Int))
type EntryIdM = EntryIdT Identity
runEntryIdM :: (Int, Int) -> EntryIdM a -> ([Entry], (Int, Int))
runEntryIdM init f = runState (execWriterT (runF f)) init
filesystem :: EntryIdM () -> Entry
filesystem f = DirRoot 0 dirs
  where dirs = fst $ runEntryIdM (1,0) f
dir :: String -> EntryIdM () -> EntryIdM ()
file :: String -> Int -> (EntryInfo -> BS.ByteString) -> EntryIdM ()

data AllocEntry = AllocEntry { beginSect :: Int
                             , endSect   :: Int
                             , entry     :: Entry
                             } deriving (Show)
allocate :: ClustSize32 -> Int -> Entry -> [AllocEntry]
allocate cl from = eFix . eAlloc . eOrder . filter eFilt . universe
  where eFilt (File _ _ _ _) = True
        eFilt (Dir _ _ _)    = True
        eFilt (DirRoot _ _)  = True
        eFilt _            = False
        eOrder = uncurry (++) . partition (not.isFile)
        eAlloc = reverse . snd . foldl fentry (from, [])
        fentry (n, xs) e =
          let sectors = entryLen cl e `div` fatSectLen
              begin = n
              end   = begin + sectors - 1
              n'    = n + sectors
              allocated = AllocEntry begin end e
          in (n', allocated : xs)
        eFix = id

The code as a whole is quite obvious: we take all the records, remove the '.' and '..' which do not have their own clusters, but only point to foreign ones, make directories go first, then files (there is no difference, but it’s more logical,
and the volume table will be faster to read), we select sectors (it’s more convenient for us work with sectors, “clusters” is an artificial concept) and that’s it.

It is worth noting the universe function from the uniplate module. It allows you to list all the elements of a nested structure in a list (if desired, with list comprehension) to avoid the routine writing of recursive traversal functions.

It is for her sake that we declared the type Entry deriving (Data, Typeable) above.

Having files placed by sectors, it costs us nothing to generate rules for them:

generateData :: Maybe CalendarTime -> ClustSize32 -> [AllocEntry] -> [Rule]
generateData ct cl es = mergeRules $ execWriter $ do
  forM_ es $ \(AllocEntry {beginSect = a, endSect = b, entry = e}) -> do
    case e of
      DirRoot _ es  -> writeEntries a b es
      Dir _ _ es    -> writeEntries a b es
      File _ _ _ bs -> tell [RANGE a b (encodeBlock (BS.take (fatSectLen) bs))]
  where
    ...

the encodeBlock function here can encode a ByteString into a sequence of rules, writeEntries generates directory entries and encodes them, and mergeRule tries to combine the range of sectors of successive rules.

The generation of one directory entry looks something like this: Here we use the exceptionally useful PutM monad from Data.Binary.Put, which allows you to output data of any capacity and endianity to a lazy byte string. So, we have the directory structure of the FAT volume, we have their allocation by sector and the corresponding rules. What do we have left? Here you need to step back a little and remember the FAT device. If you do not go into unnecessary details that are widely available on the Web and in literature, then FAT32 is designed like this:

entryRecordShort :: String
                 -> Int
                 -> Int
                 -> Maybe CalendarTime
                 -> [ATTR] 
                 -> BS.ByteString
entryRecordShort nm size clust clk a = runPut $ do
putNameASCII nm -- Name
putWord8 (fatAttrB a) -- Attr
putWord8 0      -- NTRes
putWord8 0      -- CrtTimeTenth
putWord16le cT  -- CrtTime
putWord16le cD  -- CrtDate
putWord16le cD  -- LstAccDate
putWord16le cHi -- FstClusHI
putWord16le cT  -- WrtTime
putWord16le cD  -- WrdDate
putWord16le cLo -- FstClusLO
putWord32le (fromIntegral size) -- FileSize
where ...

    | BootSect | FAT32 Info | FAT1 | FAT2 | DATA |

So far, we only have rules for DATA. FAT1 and FAT2 are cluster allocation tables. Each file or directory (which is also a file) occupies a chain of clusters in the data area, and each cluster in the data area is represented by a 32-bit value in FAT1 and FAT2 (they are identical).

Each FAT cell contains the number of the next cluster of the file, the last cluster is marked with a special value. The number of the first cluster of the file is indicated in the directory entry. The data we have is placed sequentially, so that in each cell of the chain the number N + 1 will be written, where N is the previous value.

Here the first problem arises: for our calculated 10 x 20Gb, this table will occupy as many as 655360 32-bit values, which again exceeds the available RAM. However, these rules cannot be compressed.
our primitive RLE packing algorithm, since there are no duplicate values. However, since we were able to generate this sequence once, we will probably be able to generate it again, already on the device.

A closer look showed that the values in one sector of the allocation table depend on the maximum value in the previous one, and in general, the sequence is determined by the expression:

    Na = BASE + (Nsect - M) * STEP
    Ni <- [Na, Na + 1 ..]

where Na is the first value for this sector, Nsect is the number of the requested sector (it will be at the top of the stack of our fort machine), M, BASE and STEP are the constants calculated statically, Ni is the i-th number of the sequence, and in total in the sector, obviously 512/4.

Thus, we have acquired a new sequence that generates a series of values based on dynamic data (sector numbers). We add types for this sequence and adjacent ones: Looking ahead, we will add another rule for the callback, which should be called after the generation of the file data sector so that the device firmware takes a buffer and fills it with real data.

data Chunk = SEQ BS.ByteString
           | RLE Int Word8 
           | SER Word32 Word32
           | NSER Word32 Int Word32 -- base offset step
           | CALLBACK Word8
           deriving (Eq, Ord)

It would be possible to immediately generate a table in the form of a set of rules, but for some reason I needed it in binary form, in addition there is already a debugged function for encoding binary strings, and it is easy to
make mistakes in direct generation .

This table is quite large, and in the case of a large data area and a small cluster size, poor Haskell has a hard time.

At some point, from a large lazy Word32 list, the application felt really bad, so I had to quickly rewrite it to lazy byte lines and use runPut / runGet to put 32-bit values there and retrieve them.

Surprisingly, this led to an acceleration of about ten times and everything began to work at an acceptable speed, although, of course, you should rewrite it in such a way as to immediately generate rules and not create data.
But for the concept, it will.

We omit the generation functions of the table and the rules for it, they are quite large, but quite obvious: The table encoding function first associates each sector with one REQ a rule (NSER _ _ _), then considers sectors in pairs and if two sectors form a common sequence of values, then the rule for the sector is replaced by the rule for the range of sectors, the result is compact enough to bring it here:

type ClusterTable = BS.ByteString 
genFAT :: ClustSize32 -> Int -> [AllocEntry] -> ClusterTable
encodeFAT :: Int -> ClusterTable -> [Rule]

REQ 32 [SEQ [F8], RLE 2 255, SEQ [0F], RLE 3 255, SEQ [0F],
        RLE 3 255, SEQ [0F], RLE 3 255, SEQ [0F], RLE 3 255,
        SEQ [0F], RLE 3 255, SEQ [0F], RLE 3 255, SEQ [0F],
        SEQ [08], RLE 3 0, SEQ [09], RLE 3 0, SEQ [0A],
        RLE 3 0, RLE 3 255, SEQ [0F], SER 12 128]
RANGE 33,231 [NSER 129 33 128]
REQ 232 [SER 25601 25610, RLE 3 255, SEQ [0F], SER 25612 25728]
RANGE 233 431 [NSER 25729 233 128]
REQ 432 [SER 51201 51210, RLE 3 255, SEQ [0F], SER 51212 51328]
RANGE 433 631 [NSER 51329 433 128]
REQ 632 [SER 76801 76810, RLE 3 255, SEQ [0F], SER 76812 76928]
RANGE 633 831 [NSER 76929 633 128]
REQ 832 [SER 102401 102410, RLE 3 255, SEQ [0F], SER 102412 102528]
RANGE 833 931 [NSER 102529 833 128]
REQ 932 [SER 115201 115210, RLE 3 255, SEQ [0F], RLE 468 0]
RANGE 933 1056 [RLE 512 0]

It looks promising, clearly better than a two megabyte piece of data.
The second copy of the table is accurate to constants, so in the future you can replace this sequence by subtracting the constant from the offset and calling the first table. But that later.

So, we have FAT1, FAT2 and DATA. It remains to get only BootSect and FAT32 Info. This is static binary data, so we again use Data.Binary.Put, and then we pack it into the rules.

These two modules (Put and Get) are literally indispensable and personally I quote them higher than the binary patterns in Erlang, although this is subjective. We put our packer on the result, merge the rules into ranges, and get the final list of rules that describes our entire file system.

fatGenBoot32 :: FAT32GenInfo -> BS.ByteString 
fatGenBoot32 info = addRsvd $ runPut $ do
                                -- BOOT AREA   sect0
  putBytes [0xEB, 0x58, 0x90]   --  0 JmpBoot
  putBytes bsOEMName            --    OEMName
  putWord16le bps               --    BytesPerSec
  putWord8 spc                  --    SectPerClust
  putWord16le rsvd              --    ReservedSecCnt
  putWord8 2                    --    NumFATs
  putWord16le 0                 --    RootEntCnt
  putWord16le 0                 --    TotSec16
  putWord8 0xF8                 --    Media
  putWord16le 0                 --    FAT16Sz
  putWord16le 0x3F              --    SectPerTract
  putWord16le 0xFF              --    NumHeads
  putWord32le 0                 --    HiddSec
  putWord32le sectNum           --    TotSec32
                                -- FAT32 Structure
  putWord32le fsect             --    FATSz32
  -- ...
  -- и так далее

So, we have a set of rules. It remains to generate a comparison tree for them and
compile it all in bytecode.

Let's start with the tree: It may not be the best option, but the rules turned out to be less than a hundred, so far you should not worry. It is up to the virtual machine, the set of commands, and the compiler: Alas, here a simple Haskell type system starts to be missed: I want to set compile-time invariants for the commands and their classes, so that, for example, it would be impossible to create a command with the wrong opcode. But you just can’t do this, but you don’t want to introduce a separate type for each opcode, an existential data type for a command and still do not want to use metaprogramming to generate opcodes.

data CmpTree = GEQ Int CmpTree CmpTree | CODE [Rule]
  deriving (Show)
mkCmpTree :: [Rule] -> CmpTree
mkCmpTree r = mkTree' rulemap
  where rulemap = M.fromList $ map (\x -> (fsect x, x)) r
        splitGeq n m =
          let (a, b, c) = M.splitLookup n m
          in (a, c `M.union` (maybe M.empty (M.singleton n) b))
        mkTree' xs | M.null xs     = CODE [] 
                   | M.size xs < 3 = CODE (map snd (M.toList xs))
                   | otherwise =
          let ks = map fst $ M.toAscList xs
              n = ks !! (length ks `div` 2)
              (le, geq) = splitGeq n xs
          in GEQ n (mkTree' le) (mkTree' geq)

- классификатор команд, команды могут кодироваться разнообразно
- для достижения максимальной компактности
class OpcodeCL a where
  isRLE  :: a -> Bool
  arity0 :: a -> Bool
  arity1 :: a -> Bool
  arity2 :: a -> Bool
  arity3 :: a -> Bool
  firstCode :: a
  lastCode  :: a
data Opcode =  DUP | DROP
             | CONST | CRNG
             | JNZ | JZ | JGQ | JNE | JMP | CALLT | CALL | RET
             | NOT | EQ | NEQ | GT | LE | GQ | LQ | RNG
             | LOADS2 | LOADS3 | LOADS4 | LOADS5 | LOADS6 | LOADS7
             | LOADS8 | LOADS9 | LOADS10 | LOADSN
             | SER | NSER | NSER128
             | RLE1 | RLE2 | RLE3 | RLE4 | RLE5 | RLE6 | RLE7 | RLE8
             | RLE16 | RLE32 | RLE64 | RLE128 | RLE256 | RLE512 | RLEN
             | OUTLE | OUTBE | OUTB
             | NOP
             | CALLN
             | DEBUG
             | EXIT
  deriving (Eq, Ord, Enum, Show)
data CmdArg = W32 Word32 | W16 Word16 | W8 Word8 | ADDR Addr
data Addr = ALabel Label | AOffset Int
data Cmd =  Cmd0 Opcode
          | CmdConst Word32
          | Cmd1 Opcode CmdArg
          | Cmd2 Opcode CmdArg CmdArg
          | Cmd3 Opcode CmdArg CmdArg CmdArg
          | CmdJmp Opcode Addr
          | CmdCondJmp Opcode Addr
          | CmdLabel Label
          | RawByte Word8
type Label = Int
type Block = (Label, [Cmd])

Put it off until better times, let’s get by with what we have. Anyway, for the implementation of the virtual machine, you will have to write tests, so the errors that come up will pop up there.

So, there is a virtual machine command system, now you need to compile a comparison tree into it, built from our rules: Here we use the favorite way to generate all sorts of things using eDSL built on top of the Writer monad. The generation of flat code from the comparison tree leads to a lot of “snot”, for example, to long chains of exit from blocks:

mkVMCode :: CmpTree -> [Block]
mkVMCode xs = normalize maxl code
	-- 	skip
    scanT :: CmpTree -> GenM ()
    scanT (GEQ n left right) = do
      s <- newLabel
      l <- runGen' (scanT left)  >>= withLabel
      r <- runGen' (scanT right) >>= withLabel
      _ex <- newLabel
      label s
      dup
      cnst n
      jgq (labelOf r)
      block l >> jmp _ex
      block r >> label _ex
    scanT (CODE [])    = op0 EXIT
    scanT (CODE rules) = mapM_ scanR rules
    scanR :: Rule -> GenM ()
    scanR ( REQ n code ) = do
      s <- newLabel
      code' <- runGen' (scanmC code) >>= withLabel
      ex <- newLabel
      label s
      dup
      cnst n
      jne ex
      block code'
      label ex
    scanR ( RANGE a b code ) = do
      s <- newLabel
      code' <- runGen' (scanmC code) >>= withLabel
      ex <- newLabel
      label s
      dup
      crng a b
      jz ex
      block code'
      label ex
	--  skip

L1:
    ...
    JMP L2
L2:
    JMP L3
L3:
    JMP L4
L4:
    EXIT

jumping into the next blocks and so on. normalize eliminates these disgraces, and breaks the code into blocks, each of which begins with a label, and ends with an unconditional jump to the next block command. There are no conditional or unconditional jump commands inside the block; they are valid only at the end. We need such blocks to calculate label offsets. After it, you can merge blocks, getting rid of unnecessary transitions completely.

We will write the Show instance for our bytecode for beautiful fort printing and see what we get after optimizing the blocks: Not perfect, but there are no snot, the general code is partially allocated to the procedures, there is a branch tree. It will do. It remains to run it on something, for this we need to implement, finally, the virtual machine itself.

...
L215:
    DUP
    CONST 2122
    JGQ L220
    DUP
    CRNG 00000843 00000849
    JZ L235
    RLE512 00
    EXIT
L220:
    DUP
    CRNG 0000084A 000C8869
    JZ L223
    LOADS2
    BYTE 48
    BYTE 45
    RLE2 4C
    LOADS7
    BYTE 4F
    BYTE 20
    BYTE 57
    BYTE 4F
    BYTE 52
    BYTE 4C
    BYTE 44
    RLE2 21
    CALLN 00
    EXIT
L223:
    DUP
    CRNG 000C886A 000E1869
    JZ L235
    RLE512 00
    CALLN 00 ;; а вот и вызов коллбэка прошивки --- значит, это сектор данных
    EXIT     ;; файла
L235:
    EXIT
...
L0:
    LOADS5
    BYTE 02
    BYTE 08
    BYTE 20
    BYTE 00
    BYTE 02
    RET
...

It can be simply written in C, since only the opcodes change significantly, but experience has shown that it is better to generate it all than to monitor the consistency of the opcodes and C code later. There is no way to verify this, and the situation when the compiler produces one, and vm wants to interpret something completely different, is quite likely. So it’s better to generate everything. Again, we outline the mini-eDSL for generating C, so as not to bother closing the brackets, indents and semicolons.

Again Writer, no variety ... Let's see what we got:

stubs :: String
stubs = 
  envFile $ do
    comment "top of the file"
    put "#include <stdint.h>"
    put "#include \"emufatstubs.h\""
    defines
...
  stmt (pt codeType ++ op `assign` "code")
    endl
    push a "n"
    put "for(;;)"
    braces $ indented $ do
      put "switch(*op)"
      braces $ do
        forM_ codes $ \op -> do
          put (printf "case %s:" (show op))
          indented $ decode op
          endl
        put "default:"
        indented $ exit
  exitLabel
  indented $ stmt "return 0"
...
decode (CRNG)    = do
  skip "1"
  stmt (tmp0 `assign` pop a)
  stmt (tmp1 `assign` decode32) >> skip "4"
  stmt (tmp2 `assign` decode32) >> skip "4"
  push a ( _and (tmp0 `gq` tmp1) (tmp0 `lq` tmp2) )
  next
decode (CALL)    = do
  skip "1"
  stmt (tmp0 `assign` decode32) >> skip "4"
  stmt (push' r pc')
  jump tmp0
...

#define DEFSTACK(n, t, l) ...
#define RESET(a) ...
#define PTOP(a) ...
#define TOP(a) ...
#define POP(a) ...
#define PUSH(a,v) ...
#define NEXT(x) ...
#define JUMP(x, b, o) ...
#define SKIP(x, n) ...
#define PC(x, b) ...
#define DECODE32(op) ...
#define DECODE8(op) ...
    ...
    DEFSTACK(a, uint32_t, 16);
    DEFSTACK(r, uint32_t, 8);
    uint32_t tmp0;
    uint32_t tmp1;
    uint32_t tmp2;
    uint32_t tmp3;
    ...
    uint8_t *op = code;
    PUSH(a, n);
    for(;;)
    {
        switch(*op)
        {
    ...
        case CRNG:
            SKIP(op, (1));
            tmp0 = POP(a);
            tmp1 = DECODE32(op);
            SKIP(op, (4));
            tmp2 = DECODE32(op);
            SKIP(op, (4));
            PUSH(a, ((tmp0 >= tmp1) && (tmp0 <= tmp2)));
            NEXT(op);
    ...
        case CALL:
            SKIP(op, (1));
            tmp0 = DECODE32(op);
            SKIP(op, (4));
            PUSH(r, PC(op, code));
            JUMP(op, code, tmp0);
    ...	
        case EXIT:
            goto _exit;
        default:
            goto _exit;
        }
    }
_exit:
    return 0;
    ...

Well, that should be. An important nuance: in order for switch to be compiled into the transition table, it is necessary that the values of its labels go sequentially and do not have holes. And probably fit in bytes. In case of violation of these heuristics, C compilers can generate a comparison tree, which in this case does not suit us at all. We provided the opcode sequence with the definition of the Enum instance for our Opcode type (see above).

What a pity that such a low-level, it would seem, C does not have a standard way to navigate to a variable address, even though GCC supports such an extension. Only not for all interesting platforms there is GCC, so we restrict ourselves to switch-based interpretation.

Our virtual machine is ready. Let's write tests for her. This is easy - let the test VM take a bytecode stream as input, generate the contents of the buffer as a result of their interpretation and send it to the output stream. Each test case, therefore, will be considered passed if the contents of the buffer ultimately meet expectations.

We’ll write tests ... ... and test cases: and a shell to run them: Run, fix all the problems and crashes in the core (surprisingly few) and run everything together: Everything works as expected: the image of the file system is generated, checked and mounted. The content is as described on our eDSL.

testJne = makeTest $ do
  [l1, l2] <- replicateM 2 newLabel
  cnst 1
  cnst 2
  jne l1
  exit
  label l1
  cnst 0xCAFEBABE -- 1
  outle
  cnst 1
  cnst 1
  jne l2
  cnst 0xCAFEBABE -- 2
  outle
  exit
  label l2
  cnst 0xFFFFFFFF
  outle

tests = testSuite $ do
  ...
  test "testJne"   testJne (assert $ do
                              a <- getWord32le
                              b <- getWord32le
                              return $ a == 0xCAFEBABE && b == 0xCAFEBABE)

runTest :: String -> Test -> IO Bool 
runTest path (T{tname=nm, tcode=code, tcheck = tc})= do
  let bin = toBinary code
  (inp,out,err,pid) <- runInteractiveProcess path [] Nothing Nothing
  BS.hPut inp bin
  hClose inp
  res <- BS.hGetContents out
  let r = tc res
  hPutStrLn stderr (printf "test %-24s : %s" nm (if r then "PASSED" else "FAILED !"))
  return r
...
case args of
    ...
    ... -> mapM_ (runTest path) tests
    ...
...

...
test testJgq                  : PASSED
test testJne                  : PASSED
test testCallRet1             : PASSED
...

...
helloFile = const $ BS8.pack "HELLO WORLD!!"
fatSample2 = filesystem $ do
  file "file0" (16384) helloFile 
  dir "A" $ do
    file "file1" (megs 100) helloFile 
    dir "C" $ do
      file "file3" (megs 100) helloFile 
      file "file4" (megs 100) helloFile 
      file "file5" (megs 100) helloFile 
      dir "E" $ emptyDir 
  dir "B" $ do
    file "file2" (megs 50) emptyFile
...
$ ./FatGen bin | cbits/genfat 1000000 > fat.img
  521106 / 1000000      ( 13027 kb/s) 
$ fsck.vfat ./fat.img         
dosfsck 3.0.9, 31 Jan 2010, FAT32, LFN
Free cluster summary uninitialized (should be 15863)
./fat.img: 10 files, 115209/131072 clusters
$ sudo mount -o loop ./fat.img /mnt/test2/
$ find /mnt/test2/
/mnt/test2/
/mnt/test2/FILE0
/mnt/test2/A
/mnt/test2/A/FILE1
/mnt/test2/A/C
/mnt/test2/A/C/FILE3
/mnt/test2/A/C/FILE4
/mnt/test2/A/C/FILE5
/mnt/test2/A/C/E
/mnt/test2/B
/mnt/test2/B/FILE2

The size of the compiled rules file in this case is a little over 2Kb and lends itself to further optimization, 2Kb is quite an acceptable size for dynamic downloads even via GSM / EDGE, not to mention 3G.

Fort performance also lends itself to optimization, not to mention the fact that in the most extreme case it can be compiled in C and then into the native processor code.

Here's a short story about the benefits of Haskell in the national economy.

Tags:

Finding Fat (The Quest For FAT)

Also popular now: