Journaling is related to transactions: file journaling allows a program to "commit" or "roll back" changes performed against data.
Programmers used to dealing with transactional databases such as PostgreSQL™ and recent releases of MySQL™ are already familiar with terms like "commit", "rollback", "data integrity" and so on.
I'll briefly explain these concepts. I'll assume you are a C programmer, because libjf is a C library, and I'll tend to use C standard I/O instead of POSIX I/O functions, because libjf can be ported to non-POSIX systems too.
Take a look at this piece of code:
    ...
    FILE *file1, *file2;
    ...
    fprintf(file1, "Hello ");
    fprintf(file2, "world!\n");
    ...
Yes, you would normally write "Hello world!" to a single file (stream), but imagine you have to write two records to two different files and want to be sure that exactly one of these conditions is true:
both records are stored in file1 and file2
no record is stored in file1 or file2
Why should you want this "strange thing"?
Many times you need this behavior: every time you have related information stored in two or more files you have to deal with a "data integrity issue".
File1 contains a "bit map" of the used/free records of file2: if file2 contains billions of records, a bit map can dramatically reduce the time spent looking for a free record. The contents of file1 and file2 are strongly related, and the "data integrity issue" must be solved before your application reaches "production status".
File1 contains "ack pending packets", file2 contains "OK packets": when an ack arrives, a record must be deleted from file1 and a record must be stored in file2. File1 and file2 are strongly related.
Sometimes the "data integrity issue" can be worked around with a different data organization; at other times every workaround exposes one or more "race conditions", and the "data integrity issue" must be solved.
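To make the bit map example concrete, here is a minimal sketch in C standard I/O. It does not use libjf; the function name store_record, the fixed RECORD_SIZE and the MAX_RECORDS capacity are hypothetical choices made only for this illustration. The point is the window between the two updates.

```c
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 16   /* hypothetical fixed record size, in bytes */
#define MAX_RECORDS 64   /* hypothetical capacity: 8 bytes of bit map */

/* Find the first free slot in the bit map, store `record` in that slot of
 * the data file, then mark the slot as used in the bit map file.
 * Both streams must be open for update in binary mode, and the bit map
 * file must already contain MAX_RECORDS / 8 bytes.
 * Returns the slot index, or -1 if no slot is free or on I/O error. */
long store_record(FILE *bitmap, FILE *data, const char *record)
{
    unsigned char map[MAX_RECORDS / 8];
    char slot_buf[RECORD_SIZE] = {0};
    long slot;

    if (fseek(bitmap, 0L, SEEK_SET) != 0 ||
        fread(map, 1, sizeof map, bitmap) != sizeof map)
        return -1;

    for (slot = 0; slot < MAX_RECORDS; slot++)
        if ((map[slot / 8] & (1u << (slot % 8))) == 0)
            break;
    if (slot == MAX_RECORDS)
        return -1;                       /* no free record */

    /* First update: the record itself (file2 in the text). */
    strncpy(slot_buf, record, RECORD_SIZE - 1);
    if (fseek(data, slot * RECORD_SIZE, SEEK_SET) != 0 ||
        fwrite(slot_buf, 1, RECORD_SIZE, data) != RECORD_SIZE)
        return -1;

    /* An interruption here leaves a stored record that the bit map
     * still reports as free: this is the "data integrity issue". */

    /* Second update: the bit map (file1 in the text). */
    map[slot / 8] |= (unsigned char)(1u << (slot % 8));
    if (fseek(bitmap, 0L, SEEK_SET) != 0 ||
        fwrite(map, 1, sizeof map, bitmap) != sizeof map)
        return -1;

    if (fflush(data) != 0 || fflush(bitmap) != 0)
        return -1;
    return slot;
}
```

Swapping the order of the two updates does not help: the bit map would then mark as used a record that may never have been written.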
Let's come back to our code:
    ...
    fprintf(file1, "Hello ");
    fprintf(file2, "world!\n");
    ...
C standard I/O supplies a function, fflush, that can be used to flush the application buffers and tell the operating system that a write must be performed. Please note that there is no guarantee that the data are stored on the block device (hard disk) when fflush returns (we will explain this later).
We can try to insert fflush in our program:
    ...
    [A] fprintf(file1, "Hello ");
    [B] fprintf(file2, "world!\n");
    [C] fflush(file1);
    [D] fflush(file2);
    [E] ...
Now we can analyze what happens if the program is interrupted, for example by a POSIX signal, at step [A], [B], ... [E]:
[A] we are 100% sure the files have not been updated
[B] file2 has not been updated; file1 might have been updated (it depends on the buffer status: calling fflush is not required to start buffer flushing)
[C] file1 might have been updated, file2 might have been updated
[D] file1 has been updated, file2 might have been updated
[E] file1 and file2 have been updated.
Even if we supposed that data are flushed only by fflush (this is generally not true!), we could not work around the integrity issue of step D.
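The claim that data may reach the file before fflush is called can be observed with a small experiment. The sketch below is only an illustration: the function names file_size and observe_early_flush are hypothetical, and the 64 byte stream buffer is chosen deliberately small so that it fills up long before fflush is reached.

```c
#include <stdio.h>

/* Return the current size in bytes of the file at `path`, or -1 on error.
 * The file is opened through a second, independent stream, so the result
 * reflects what has already been handed over to the operating system. */
long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    long size;

    if (f == NULL)
        return -1;
    if (fseek(f, 0L, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    size = ftell(f);
    fclose(f);
    return size;
}

/* Write `total` bytes to `path` through a fully buffered stream whose
 * buffer is much smaller than `total`. Store in *before the file size
 * observed before fflush is called and in *after the size observed after
 * fflush has returned. Returns 0 on success, -1 on error. */
int observe_early_flush(const char *path, size_t total,
                        long *before, long *after)
{
    static char small_buf[64];          /* deliberately tiny stream buffer */
    FILE *f = fopen(path, "wb");
    size_t i;

    if (f == NULL)
        return -1;
    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(f, small_buf, _IOFBF, sizeof small_buf) != 0) {
        fclose(f);
        return -1;
    }
    for (i = 0; i < total; i++) {
        if (fputc('x', f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    *before = file_size(path);          /* fflush has not been called yet */
    if (fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    *after = file_size(path);
    return fclose(f) == 0 ? 0 : -1;
}
```

When `total` is larger than the stream buffer, the size observed before fflush is typically already greater than zero: the library started writing as soon as the buffer filled up, without waiting for fflush.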
What about POSIX I/O?
Using POSIX I/O implies using write instead of fprintf, and fsync/fdatasync instead of fflush.
These functions do not perform the same actions, because POSIX I/O does not use an application-side buffer and fsync/fdatasync guarantee that the data are stored on the block device (hard disk), but the "data integrity issue", in the event of a system crash, is the same:
    ...
    int fd1, fd2;
    const char *str1 = "Hello ";
    const char *str2 = "world!\n";
    ...
    [A] write(fd1, str1, strlen(str1));
    [B] write(fd2, str2, strlen(str2));
    [C] fsync(fd1);
    [D] fsync(fd2);
    [E] ...
The "data integrity issue" at step D follows the same pattern as in the example based on C standard I/O.
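The POSIX sequence above can be written as a small function with error checking. This is a sketch only: write_all and store_pair are hypothetical helper names, not part of libjf or of POSIX. The markers in the comments correspond to the steps [A] ... [E] of the listing.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

/* Write all `len` bytes of `buf` to `fd`, retrying after short writes
 * and after EINTR. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Store two related records in two different files and sync both.
 * Returns 0 only if every call succeeded, -1 otherwise. */
int store_pair(int fd1, const char *rec1, int fd2, const char *rec2)
{
    /* [A] nothing has been written yet */
    if (write_all(fd1, rec1, strlen(rec1)) != 0)
        return -1;
    /* [B] */
    if (write_all(fd2, rec2, strlen(rec2)) != 0)
        return -1;
    /* [C] */
    if (fsync(fd1) != 0)
        return -1;
    /* [D] a system crash here leaves rec1 on disk while rec2 may be lost */
    if (fsync(fd2) != 0)
        return -1;
    /* [E] both records have been synced */
    return 0;
}
```

Checking every return value tells the program that something failed, but it does not remove the window at step [D]: no ordering of these calls makes the two updates atomic.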
Our examples show there are two kinds of problems:
I/O functions (fprintf, write, ...) do not specify when the file update happens: it may be at any time between fprintf (write) and fflush (fsync/fdatasync)
there is no atomic "flush"/"sync" function for two or more streams/file descriptors
Our previous examples showed us some typical data integrity issues that need a transactional tool to be solved.
There are situations that can benefit from transactional support even when only one file is used; the best examples are text editors, office applications (word processors, spreadsheets, ...), configuration editors, and more.
All around the world there are programs that write many copies of the same file and check file integrity at start-up to ensure that the text/document/configuration is consistent and is not affected by the consequences of an application/system crash. All that stuff might be replaced by a transactional tool like libjf.
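As an illustration of the kind of hand-written code meant here, the sketch below shows one common technique on POSIX systems: write a second copy of the file, sync it, then rename it over the original. It is an assumption about what such programs typically do, not something taken from libjf; the function name save_via_copy and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace the contents of `path` with `text` by writing a temporary copy
 * at `tmp_path` first and renaming it over `path` once it is complete.
 * Returns 0 on success, -1 on error (the old file is then left as is). */
int save_via_copy(const char *path, const char *tmp_path, const char *text)
{
    size_t len = strlen(text);
    size_t done = 0;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    while (done < len) {
        ssize_t n = write(fd, text + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        done += (size_t)n;
    }
    /* Make sure the new copy is on disk before it replaces the old one. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }
    /* On POSIX systems rename() replaces `path` atomically. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

This works for a single file that is rewritten as a whole. It does not help with the two-file examples above, and it still leaves the application responsible for cleaning up stale temporary copies at start-up.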