Chapter 4. Diving into libjf

In the previous chapters we discussed transactions and recovery, but we did not address the topic of data synchronization announced in the introduction. It's time to explore this intriguing land.

4.1. Synchronization types

Rule number one: every operating system handles data synchronization somewhat differently. libjf is meant to be portable across many environments, and it is difficult to take advantage of operating-system-specific features when the software must remain portable.

Rule number two: the documentation provided by the standards is very weak; to get an idea of what "very weak" means, take a look at what IEEE Std 1003.1-2001 says about data synchronization.

libjf supplies two types of synchronization: "fast" and "safe".

4.1.1. libjf fast synchronization

This type of synchronization prevents data loss in case of an application crash, but it does not provide any guarantee in case of a system crash.

Fast synchronization uses the fflush function to move the content of the application's stdio buffers to the operating system: if the application crashes, the operating system closes all its open file descriptors and keeps the data it has already received queued for writing, so the flushed data is saved anyway. That data, however, may still sit in the operating system's cache, which is why a system crash can lose it.

4.1.2. libjf safe synchronization

This type of synchronization prevents data loss in case of a system crash.

Safe synchronization uses the fdatasync function (or fsync when fdatasync is not available) to force the data down to the storage device.

4.1.3. How can an application choose the type of synchronization?

An application may hard-code the type of synchronization by specifying the flag JF_JOURNAL_PROP_SYNC_SAFE or JF_JOURNAL_PROP_SYNC_FAST when it calls jf_journal_open:

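jf_journal_t j;
struct jf_journal_opts_s jopts;
int rc;

/* start from the library defaults, then force safe synchronization */
jf_set_default_journal_opts(&jopts);
jopts.flags |= JF_JOURNAL_PROP_SYNC_SAFE;
rc = jf_journal_open(&j, "jf_tut_foo-journal", 2, &jopts);
if (JF_RC_OK != rc) {
        /* handle the error */
}
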
This method has all the benefits and disadvantages of "hard-wired" parameters. Alternatively, libjf lets you choose the type of synchronization at run time; this is the default behavior, but you may also request it explicitly:

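jf_journal_t j;
struct jf_journal_opts_s jopts;
int rc;

/* start from the library defaults, then ask for run-time selection */
jf_set_default_journal_opts(&jopts);
jopts.flags |= JF_JOURNAL_PROP_SYNC_ENV_VAR;
rc = jf_journal_open(&j, "jf_tut_foo-journal", 2, &jopts);
if (JF_RC_OK != rc) {
        /* handle the error */
}
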
An application that uses JF_JOURNAL_PROP_SYNC_ENV_VAR reads the environment variable JF_JOURNAL_SYNC_TYPE to decide which type of synchronization must be used:

  • the environment variable is defined and its value is "0": fast synchronization is adopted

  • the environment variable is defined and its value is "1": safe synchronization is adopted

  • otherwise: JF_JOURNAL_PROP_SYNC_SUGGESTED synchronization is adopted (see the "API reference guide")

4.1.4. Playing with synchronization types

Demonstrating the effects of the different synchronization types is a hard job, beyond the scope of this tutorial, but an example that empirically shows the performance gap is easy to build.

Example 4-1. many_hello_world.c

#include <stdio.h>
#include <jf_file.h>

int main()
{
        int rc, i;
        jf_file_t jf;
        size_t write;

        /* open (create/truncate) the journaled file "jf_tut_foo" */
        rc = jf_file_open(&jf, NULL, "jf_tut_foo", "w", NULL);
        if (JF_RC_OK != rc)
                return 1;

        for (i = 0; i < 10000; ++i) {
                /* each iteration writes one row... */
                rc = jf_file_printf(&jf, &write, "%s",
                                    "Hello world!\n");
                if (JF_RC_OK != rc)
                        return 1;

                /* ...and commits it as a separate transaction */
                rc = jf_file_commit(&jf);
                if (JF_RC_OK != rc)
                        return 1;
        } /* for (i = 0; i < 10000; ++i) */

        rc = jf_file_close(&jf);
        if (JF_RC_OK != rc)
                return 1;

        printf("Many hello world program is OK!\n");
        return 0;
}


many_hello_world.c is like hello_world.c, but it performs 10000 transactions instead of only one. We do not specify JF_JOURNAL_PROP_SYNC_ENV_VAR because it is the default option. To compile many_hello_world.c, use this command:

libtool --mode=link gcc -Wall -I/opt/libjf/include \
        -o many_hello_world many_hello_world.c -L/opt/libjf/lib -ljf

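Then execute it, first with fast synchronization (JF_JOURNAL_SYNC_TYPE=0), then with safe synchronization (JF_JOURNAL_SYNC_TYPE=1), and finally with the variable unset: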
tiian@linux:~/tutorial> rm jf_tut_foo*
tiian@linux:~/tutorial> export JF_JOURNAL_SYNC_TYPE=0
tiian@linux:~/tutorial> time ./many_hello_world
Many hello world program is OK!

real    0m0.499s
user    0m0.149s
sys     0m0.345s

tiian@linux:~/tutorial> rm jf_tut_foo*
tiian@linux:~/tutorial> export JF_JOURNAL_SYNC_TYPE=1
tiian@linux:~/tutorial> time ./many_hello_world
Many hello world program is OK!

real    0m3.478s
user    0m0.130s
sys     0m0.390s

tiian@linux:~/tutorial> rm jf_tut_foo*
tiian@linux:~/tutorial> unset JF_JOURNAL_SYNC_TYPE
tiian@linux:~/tutorial> time ./many_hello_world
Many hello world program is OK!

real    0m0.507s
user    0m0.173s
sys     0m0.331s
      
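The second execution takes about 7 times as long as the first (3.478 s versus 0.499 s of real time), which is roughly 350 µs instead of 50 µs per transaction. The user and sys CPU times, on the other hand, are nearly the same in all three runs, so the extra time is most likely spent waiting for the storage device to complete the synchronous writes. The third execution is very close to the first: this means that the current value of JF_JOURNAL_PROP_SYNC_SUGGESTED is JF_JOURNAL_PROP_SYNC_FAST, although it might change in the future. To check that the journaled file contains 10000 rows, issue this command: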
tiian@linux:~/tutorial> wc -l jf_tut_foo
10000 jf_tut_foo
      
To check that the journal contains 10000 commits, issue this command:
tiian@linux:~/tutorial> jf_report -j jf_tut_foo.jf | grep commit | wc -l
10000
      
Please note that many_hello_world is not a benchmark program! To measure libjf performance, the utility program jf_bench is supplied, but that is another story.

4.1.5. How is synchronization tested?

To test synchronization, crashes must be reproduced. An application crash is easy to simulate: a division by zero, a segmentation fault, and so on. Simulating a system crash is a much more difficult task; a realistic simulation is probably impossible without hacking the operating system kernel. Nevertheless, some kinds of tests must be performed against a "journaled files library".

libjf implements a "crash simulation feature" that stresses the library with crashes at all the interesting code steps: this simulation should be close enough to a real crash to state that "libjf should be a safe journaling tool". Nothing is set in stone, and some of this might change in the future.