Journaled File LIBrary (LIBJF) tutorial: "from the ground up"

4.2. Journaling and caching

Designing a very safe journaling tool without considering performance is a useless academic exercise: no one would use a very slow "safe journaling tool" instead of the standard I/O libraries. libjf is not yet optimized, and future code review will probably improve its performance; from an architectural point of view, however, the library adopts some strategies to limit performance degradation compared with standard I/O libraries.

The most important feature is a high-level cache that can be explained in a few words: every time the application updates a journaled file, the change is not propagated to the underlying file, but simply kept in the cache managed by libjf. Data is copied to the file when the cache reaches its maximum size or when the application requests a commit. If the cache is large enough, the underlying file is not touched until the commit point and, in case of rollback, no file is touched at all. Managing an additional level of cache is expensive in terms of CPU and virtual memory, but updating files before commit or rollback dramatically increases elapsed time, because every time a bit is changed, its undo record must first be saved and synchronized in a safe place (calling it "journal", "log" or "rollback tablespace" does not alter the concept).
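The behavior described above can be illustrated with a toy model. The following is only a sketch under stated assumptions: it does not use libjf and does not reflect its internal data structures. All names (toy_cache, toy_write, toy_commit and so on) are hypothetical, the "file" is an in-memory array, and the limit counts buffered updates rather than bytes. A forced flush here simply writes the pending updates; a real journaling library would first have to save undo records, which this toy does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_FILE_SIZE   64  /* size of the simulated underlying file */
#define TOY_MAX_PENDING 16  /* capacity of the pending-update table */

/* One buffered update: a single byte destined for a file offset. */
struct toy_update {
    size_t offset;
    unsigned char value;
};

/* Hypothetical write-back cache; NOT libjf's real layout. */
struct toy_cache {
    unsigned char file[TOY_FILE_SIZE];          /* stands in for the file */
    struct toy_update pending[TOY_MAX_PENDING]; /* updates not yet written */
    size_t pending_count;
    size_t limit;          /* buffered updates allowed before a forced flush */
    unsigned file_touches; /* how many times the "file" was written */
};

void toy_init(struct toy_cache *c, size_t limit)
{
    assert(limit > 0 && limit <= TOY_MAX_PENDING);
    memset(c, 0, sizeof *c);
    c->limit = limit;
}

/* Copy every pending update to the simulated file, oldest first. */
static void toy_flush(struct toy_cache *c)
{
    size_t i;
    for (i = 0; i < c->pending_count; i++)
        c->file[c->pending[i].offset] = c->pending[i].value;
    c->pending_count = 0;
    c->file_touches++;
}

/* Buffer an update; the file is written only if the cache is already full. */
void toy_write(struct toy_cache *c, size_t offset, unsigned char value)
{
    assert(offset < TOY_FILE_SIZE);
    if (c->pending_count == c->limit)
        toy_flush(c);
    c->pending[c->pending_count].offset = offset;
    c->pending[c->pending_count].value = value;
    c->pending_count++;
}

/* Reads see the newest buffered value first, then fall back to the file. */
unsigned char toy_read(const struct toy_cache *c, size_t offset)
{
    size_t i;
    assert(offset < TOY_FILE_SIZE);
    for (i = c->pending_count; i > 0; i--)
        if (c->pending[i - 1].offset == offset)
            return c->pending[i - 1].value;
    return c->file[offset];
}

/* Commit: propagate buffered updates to the file, if there are any. */
void toy_commit(struct toy_cache *c)
{
    if (c->pending_count > 0)
        toy_flush(c);
}

/* Rollback: discard buffered updates; the file is never touched. */
void toy_rollback(struct toy_cache *c)
{
    c->pending_count = 0;
}
```

With a limit of 4, two calls to toy_write followed by toy_rollback leave file_touches at 0, while a fifth consecutive toy_write forces one flush before the commit point. That is the trade-off the cache size limit controls.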

If libjf were implemented in the kernel at the filesystem level, its performance would be closer to that of native file access operations, but several big issues would have to be solved:

GNU/Linux has 4 different "official" filesystems (ext3 and ext2, reiser, xfs, jfs), so libjf would need 4 different implementations for GNU/Linux alone
proprietary UNIX systems are not so easy to hack: there might be license-related problems, and some proprietary UNIX systems do not supply kernel source code, so a modification would be nearly impossible
not to mention the Microsoft Windows operating system families...

An efficient kernel-level implementation of libjf would probably never exist, and we would not be discussing libjf at all...

The maximum size of the cache allocated for each journaled file can be specified by setting the cache_size_limit field, reached through struct jf_journal_opts_s inside struct jf_file_open_opts_s (fopts.journal_opts.journal_file_opts.cache_size_limit in the example). Take a look at this sample program:

Example 4-2. cache_size.c

     1	#include <jf_file.h>
       
     2	int main()
     3	{
     4	        int rc;
     5	        jf_journal_t j;
     6	        jf_file_t jf1, jf2, jf3;
     7	        struct jf_journal_opts_s jopts;
     8	        struct jf_file_open_opts_s fopts;
       
     9	        jf_set_default_journal_opts(&jopts);
    10	        jopts.flags |= JF_JOURNAL_PROP_OPEN_O_CREAT |
    11	                JF_JOURNAL_PROP_OPEN_O_EXCL;
    12	        rc = jf_journal_open(&j, "jf_tut_foo-journal", 2, &jopts);
    13	        if (JF_RC_OK != rc) {
    14	                printf("%d/%s\n", rc, jf_strerror(rc));
    15	                return 1;
    16	        }
       
    17	        jf_set_default_file_open_opts(&fopts);
    18	        fopts.join_the_journal = TRUE;
    19	        fopts.journal_opts.journal_file_opts.cache_size_limit = 123400;
       
    20	        rc = jf_file_open(&jf1, &j, "jf_tut_foo-data1", "w", &fopts);
    21	        if (JF_RC_OK != rc) {
    22	                printf("%d/%s\n", rc, jf_strerror(rc));
    23	                return 1;
    24	        }
    25	        printf("Cache limit for first journaled file: "
    26	               JF_OFFSET_T_FORMAT "\n",
    27	               jf_file_get_cache_limit(&jf1));
       
    28	        fopts.journal_opts.journal_file_opts.cache_size_limit = -1;
       
    29	        rc = jf_file_open(&jf2, &j, "jf_tut_foo-data2", "w", &fopts);
    30	        if (JF_RC_OK != rc) {
    31	                printf("%d/%s\n", rc, jf_strerror(rc));
    32	                return 1;
    33	        }
    34	        printf("Cache limit for second journaled file: "
    35	               JF_OFFSET_T_FORMAT "\n",
    36	               jf_file_get_cache_limit(&jf2));
       
    37	        rc = jf_file_open(&jf3, NULL, "jf_tut_foo-data3", "w", NULL);
    38	        if (JF_RC_OK != rc) {
    39	                printf("%d/%s\n", rc, jf_strerror(rc));
    40	                return 1;
    41	        }
    42	        printf("Cache limit for third journaled file: "
    43	               JF_OFFSET_T_FORMAT "\n",
    44	               jf_file_get_cache_limit(&jf3));
       
    45	        rc = jf_file_close(&jf1);
    46	        if (JF_RC_OK != rc) {
    47	                printf("%d/%s\n", rc, jf_strerror(rc));
    48	                return 1;
    49	        }
    50	        rc = jf_file_close(&jf2);
    51	        if (JF_RC_OK != rc) {
    52	                printf("%d/%s\n", rc, jf_strerror(rc));
    53	                return 1;
    54	        }
    55	        rc = jf_file_close(&jf3);
    56	        if (JF_RC_OK != rc) {
    57	                printf("%d/%s\n", rc, jf_strerror(rc));
    58	                return 1;
    59	        }
    60	        rc = jf_journal_close(&j);
    61	        if (JF_RC_OK != rc) {
    62	                printf("%d/%s\n", rc, jf_strerror(rc));
    63	                return 1;
    64	        }
       
    65	        printf("two_files program ended OK!\n");
    66	        return 0;
    67	}

cache_size.c source code explanation

Rows 19-20: set the cache size limit to 123400 bytes for journaled file jf1
Row 27: retrieve the cache size limit associated with journaled file jf1
Rows 28-29: set the cache size limit to the default value for journaled file jf2
Row 36: retrieve the cache size limit associated with journaled file jf2
Row 37: open journaled file jf3 with default values
Row 44: retrieve the cache size limit associated with journaled file jf3

4.2.1. Compilation and execution

To compile the cache_size program you can use this command:

libtool --mode=link gcc -Wall -I/opt/libjf/include -L/opt/libjf/lib -ljf \
        -o cache_size cache_size.c

Then execute it:

tiian@linux:~/src/tutorial> rm jf_tut_foo*
tiian@linux:~/src/tutorial> export JF_JOURNALED_FILE_CACHE_SIZE=765000
tiian@linux:~/src/tutorial> ./cache_size
Cache limit for first journaled file: 123400
Cache limit for second journaled file: 262144
Cache limit for third journaled file: 765000
two_files program ended OK!
tiian@linux:~/src/tutorial> rm jf_tut_foo-*
tiian@linux:~/src/tutorial> export JF_JOURNALED_FILE_CACHE_SIZE=437900
tiian@linux:~/src/tutorial> ./cache_size
Cache limit for first journaled file: 123400
Cache limit for second journaled file: 262144
Cache limit for third journaled file: 437900
two_files program ended OK!

the cache size limit of the first journaled file is 123400 bytes in both executions, because that value is explicitly coded in the program
the cache size limit of the second journaled file is 262144 bytes in both executions, because that is the default value
the cache size limit of the third journaled file varies according to the value of JF_JOURNALED_FILE_CACHE_SIZE; the same behavior can be obtained by not setting cache_size_limit explicitly in the jf_file_open_opts_s struct.
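How an environment variable such as JF_JOURNALED_FILE_CACHE_SIZE can supply a size may be sketched as follows. This is not libjf's actual parsing code, and it does not claim to reproduce the precedence rules libjf applies between explicit options, the environment, and the built-in default. The function names and the TOY_DEFAULT_CACHE_LIMIT macro are hypothetical; the value 262144 is simply the default reported in the run above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Default reported by the sample run above; the macro name is hypothetical. */
#define TOY_DEFAULT_CACHE_LIMIT 262144L

/*
 * Convert a decimal string to a positive size.
 * Returns `fallback` when the text is missing, empty, not a clean
 * decimal number, out of range for long, or not strictly positive.
 */
long parse_cache_limit(const char *text, long fallback)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0)
        return fallback;
    return value;
}

/* Look the limit up in the environment, falling back when it is unset. */
long cache_limit_from_env(const char *name, long fallback)
{
    assert(name != NULL);
    return parse_cache_limit(getenv(name), fallback);
}
```

A caller could write cache_limit_from_env("JF_JOURNALED_FILE_CACHE_SIZE", TOY_DEFAULT_CACHE_LIMIT) to obtain 765000 or 437900 in the two runs shown, and 262144 when the variable is not exported. Rejecting malformed values instead of using a partial parse avoids silently running with a tiny cache because of a typo.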

4.2.2. How cache size limit can be tuned

After you have developed your application, you can try increasing the cache size limit and measuring elapsed times: increasing the limit is recommended only if performance improves significantly. For most applications, the default value should be fine.
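One way to take such measurements is to wrap the journaled workload in a small timing helper and run the program several times with different limits. The sketch below assumes a POSIX system providing clock_gettime with CLOCK_MONOTONIC; elapsed_seconds, workload_fn and sample_workload are hypothetical names, and sample_workload is only a placeholder for the part of your application that updates journaled files and commits.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Signature of the code under measurement; the name is hypothetical. */
typedef void (*workload_fn)(void *arg);

/* Run `work(arg)` once and return the wall-clock time it took, in seconds. */
double elapsed_seconds(workload_fn work, void *arg)
{
    struct timespec start, stop;

    assert(work != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    work(arg);
    clock_gettime(CLOCK_MONOTONIC, &stop);
    return (double)(stop.tv_sec - start.tv_sec)
         + (double)(stop.tv_nsec - start.tv_nsec) / 1e9;
}

/*
 * Placeholder workload: increments a counter 100000 times.
 * Replace it with the journaled updates and commits of your application.
 */
void sample_workload(void *arg)
{
    volatile unsigned long *counter = arg;
    unsigned long i;

    for (i = 0; i < 100000UL; i++)
        (*counter)++;
}
```

Running the same workload under a few different values of JF_JOURNALED_FILE_CACHE_SIZE and comparing the returned times gives a first indication of whether a larger limit is worthwhile. Repeating each measurement several times helps separate real differences from noise.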

The parameter is a cache size limit: the application allocates only the memory it actually needs.