1.4. What's file "journaling"?

Journaling is related to transactions: file journaling allows a program to "commit" or "rollback" changes performed against data.

Programmers accustomed to transactional databases such as PostgreSQL™ and recent releases of MySQL™ are already familiar with terms like "commit", "rollback", "data integrity" and so on.

I'll briefly explain these concepts. I'll assume you are a C programmer, because libjf is a C library, and I'll tend to use C standard I/O instead of POSIX I/O functions, because libjf can be ported to non-POSIX systems too.

Take a look at this piece of code:

...
FILE *file1, *file2;
...
fprintf(file1, "Hello ");
fprintf(file2, "world!\n");
...
    

Yeah, I mean, you'd prefer to write "Hello world!" to the same file (stream), but imagine you have to write 2 records to 2 different files and want to be sure that exactly one of these conditions is true: either both files have been updated, or neither file has been updated.

Why should you want this "strange thing"?

You need this behavior quite often: whenever related information is stored in two or more files, you have to deal with a "data integrity issue".

1.4.1. Data integrity issue examples

File1 contains a "bit map" of the used/free records of file2: if file2 contains billions of records, a bit map can dramatically reduce the time spent looking for a free record. The contents of file1 and file2 are strongly related, and the "data integrity issue" must be solved before your application reaches "production status".

File1 contains "ack pending packets", file2 contains "OK packets": when an ack arrives, a record must be deleted from file1 and a record must be stored in file2. File1 and file2 are strongly related.
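To make the first example more concrete, here is a minimal sketch of such a bit map, kept in memory. All the names (bitmap_test, bitmap_set, bitmap_clear, bitmap_find_free) are hypothetical and are not part of libjf; the sketch assumes one bit per record of file2, where 1 means "used" and 0 means "free".

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical in-memory image of file1: one bit per record of file2,
   1 = used, 0 = free. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Return 1 if record `rec` is marked as used, 0 if it is free. */
int bitmap_test(const unsigned long *map, size_t rec)
{
    return (int)((map[rec / BITS_PER_WORD] >> (rec % BITS_PER_WORD)) & 1UL);
}

/* Mark record `rec` as used. */
void bitmap_set(unsigned long *map, size_t rec)
{
    map[rec / BITS_PER_WORD] |= 1UL << (rec % BITS_PER_WORD);
}

/* Mark record `rec` as free. */
void bitmap_clear(unsigned long *map, size_t rec)
{
    map[rec / BITS_PER_WORD] &= ~(1UL << (rec % BITS_PER_WORD));
}

/* Return the index of the first free record, or nrec if none is free. */
size_t bitmap_find_free(const unsigned long *map, size_t nrec)
{
    size_t nwords = (nrec + BITS_PER_WORD - 1) / BITS_PER_WORD;
    size_t w, b;

    for (w = 0; w < nwords; w++) {
        if (map[w] == ULONG_MAX)
            continue;               /* every record in this word is used */
        for (b = 0; b < BITS_PER_WORD; b++) {
            size_t rec = w * BITS_PER_WORD + b;
            if (rec >= nrec)
                break;              /* padding bits past the last record */
            if (!bitmap_test(map, rec))
                return rec;
        }
    }
    return nrec;
}
```

Allocating a record then means setting a bit in file1 and writing the record in file2: two updates that must either both happen or both not happen.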

Sometimes the "data integrity issue" can be worked around with a different data organization; sometimes every workaround exposes one or more "race conditions" and the "data integrity issue" must be solved.

Let's come back to our code:

...
fprintf(file1, "Hello ");
fprintf(file2, "world!\n");
...
      

C standard I/O supplies a function that can be used to flush the application buffers and tell the operating system that a write must be performed: fflush. Please note there is no guarantee that the data are stored on the block device (hard disk) when fflush returns (we will explain this later).
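The two levels of buffering can be sketched as follows. This is only an illustration, assuming a POSIX system: fileno and fsync are POSIX functions, not standard C, and the helper name flush_to_disk is hypothetical (it is not a libjf function).

```c
#define _POSIX_C_SOURCE 200809L   /* expose fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push the data of `fp` as far as the program can.
   Returns 0 on success, -1 on failure. */
int flush_to_disk(FILE *fp)
{
    /* Step 1: application buffer -> operating system. */
    if (fflush(fp) != 0)
        return -1;
    /* Step 2: operating system cache -> block device. */
    if (fsync(fileno(fp)) != 0)
        return -1;
    return 0;
}
```

After step 1 alone, the data may still sit in the operating system cache; only step 2 asks for them to be written to the device.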

We can try to insert fflush in our program:

        ...
[A]
        fprintf(file1, "Hello ");
[B]
        fprintf(file2, "world!\n");
[C]
        fflush(file1);
[D]
        fflush(file2);
[E]
        ...
      

Now we can analyze what happens if the program is interrupted, for example by a POSIX signal, at step [A], [B], ... [E]:

  1. at [A], we are 100% sure the files have not been updated

  2. at [B], file2 has not been updated, file1 might have been updated (it depends on the buffer status: buffer flushing can start without a call to fflush)

  3. at [C], file1 might have been updated, file2 might have been updated

  4. at [D], file1 has been updated, file2 might have been updated

  5. at [E], file1 and file2 have been updated.

Even if we assumed that data were flushed only by fflush (this is generally not true!), we could not work around the integrity issue at step D.

What about POSIX I/O?

Using POSIX I/O means using write instead of fprintf and fsync/fdatasync instead of fflush. These functions do not perform the same actions, because POSIX I/O does not use an application-side buffer and fsync/fdatasync guarantee that the data are stored on the block device (hard disk); but the "data integrity issue", in the event of a system crash, is the same:

        ...
        int fd1, fd2;
        const char *str1 = "Hello ";
        const char *str2 = "world!\n";
        ...
[A]
        write(fd1, str1, strlen(str1));
[B]
        write(fd2, str2, strlen(str2));
[C]
        fsync(fd1);
[D]
        fsync(fd2);
[E]
        ...
      

The "data integrity issue" at step D follows the same pattern as in the example based on C standard I/O.
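The following sketch makes an interruption reproducible. It performs the same writes as the listing above, but returns early at a chosen step to mimic a process killed at that point. The enum and the function name write_pair_until are hypothetical and exist only for this illustration; short writes are not handled, to keep the sketch small.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <unistd.h>

/* Hypothetical step labels matching [A]..[E] in the listing above. */
enum step { STEP_A, STEP_B, STEP_C, STEP_D, STEP_E };

/* Write the two records, but return early when `stop` is reached,
   mimicking a process interrupted at that step.
   Returns 0 on success (or simulated interruption), -1 on I/O error. */
int write_pair_until(int fd1, int fd2, enum step stop)
{
    const char *str1 = "Hello ";
    const char *str2 = "world!\n";

    if (stop == STEP_A)
        return 0;
    if (write(fd1, str1, strlen(str1)) < 0)
        return -1;
    if (stop == STEP_B)
        return 0;
    if (write(fd2, str2, strlen(str2)) < 0)
        return -1;
    if (stop == STEP_C)
        return 0;
    if (fsync(fd1) != 0)
        return -1;
    if (stop == STEP_D)
        return 0;
    if (fsync(fd2) != 0)
        return -1;
    return 0;
}
```

Calling write_pair_until(fd1, fd2, STEP_B) on two empty files leaves "Hello " in the first file and nothing in the second: exactly the half-done state we want to avoid.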

Our examples show there are two types of issue:

1.4.2. Data integrity issue with only one file

Our previous examples showed us some typical data integrity issues that need a transactional tool to be solved.

There are situations that can benefit from transactional support even when only one file is used; the best examples are text editors, office applications (word processors, spreadsheets, ...), configuration editors, and more...

All around the world there are programs that write many copies of the same file and check file integrity at start-up, to ensure the text/document/configuration is consistent and has not been affected by an application/system crash. All that stuff might be replaced by a transactional tool like libjf.
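One common form of the "many copies" approach is to write the new contents to a temporary copy and then rename it over the original. The sketch below is an illustration of that idiom, not of libjf: the function name save_with_copy and the ".tmp" suffix are hypothetical, and it assumes a system (such as POSIX) where rename replaces an existing file atomically.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: replace the contents of `path` with `text` by
   writing a temporary copy first. Returns 0 on success, -1 on failure. */
int save_with_copy(const char *path, const char *text)
{
    char tmp[FILENAME_MAX];
    FILE *fp;

    /* Build the temporary name "<path>.tmp", checking it fits. */
    if (strlen(path) + sizeof ".tmp" > sizeof tmp)
        return -1;
    strcpy(tmp, path);
    strcat(tmp, ".tmp");

    fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF || fflush(fp) != 0) {
        fclose(fp);
        remove(tmp);
        return -1;
    }
    if (fclose(fp) != 0) {
        remove(tmp);
        return -1;
    }
    /* On POSIX systems rename() replaces `path` atomically; standard C
       leaves the behaviour implementation-defined when `path` exists. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

If the program is interrupted before rename, the original file is untouched and only a stale ".tmp" copy is left behind. A system crash can still lose data, because nothing here forces the data onto the block device.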