Combinatorial Transforms : Applications in Lossless Image Compression

Combinatorial Transforms : Applications in

Lossless Image Compression

3 E. Syahrul , J. Dubois , V. Vajnovszki

LE2I - Universit´e de Bourgogne

B.P. 47 870, 21078 Dijon Cedex - France

{elfitrin.syahrul , julien.dubois , vvajnov }@u-bourgogne.fr

Abstract

Common image compression standards are usually based on fre-

quency transform such as Discrete Cosine Transform. We present a

different approach for lossless image compression, which is based on

a combinatorial transform. The main transform is Burrows Wheeler

Transform (BWT) which tends to reorder symbols according to their

following context. It becomes one of promising compression approach

based on context modeling. BWT was initially applied for text com-

pression software such as BZIP2; nevertheless it has been recently ap-

plied to the image compression field. Compression schemes based on

the Burrows Wheeler Transform have been usually lossless; therefore

we implement this algorithm in medical imaging in order to recon-

struct every bit. Many variants of the three stages which form the

original compression scheme can be found in the literature. We pro-

pose an analysis of the latest methods and the impact of their associa-

tion and present an alternative compression scheme with a significant

improvement over the current standards such as JPEG and JPEG2000.

Keywords: BWT, Lossless (image) compression, combinatorial transform.

1 Introduction

BWT was created by Burrows and Wheeler in 1994. Since then its improve- ments have been considered by many authors, as well as its application not only for text compression but also for image compression [1–6]. This paper presents the improvement and analysis of BWT methods particularly imple- mented for image compression. We analyze the effects of each stage of BWT pre- and post-processing. The results are compared with JPEG and JPEG2000.

2 Original method based on BWT

A typical Burrows Wheeler Transform method that has been proposed by Burrows and Wheeler for lossless text compression consists of 3 stages as seen in Figure 1 [7], where

BWT is the Burrows Wheeler Transform itself that tends to group similar characters together,
GST is the Global Structure Transform that transforms the output of BWT local structure redundancy to a global redundancy using a ranking list. It produces sequences of contiguous zeros, • DC is a Data Coding.

Input _{✲ ✲ ✲ ✲} Output

BW T GST DC

data

... data Figure 1: Original scheme of BWT method.

The stages are processed sequentially from left to right. The output of a stage becomes the input of the next stage. BWT as a main transform rearranges the input data using a sorting algorithm. The output content is exactly the same with the input, except the ordering. Figure 2 gives a simple example how BWT works in a small image. Pixels are encoded by a pair of hexadecimal values. The image is generated to assume that scanning the data left to right where there are not two identical consecutive symbols. BWT as a context based transform tends to group similar pixels together as seen in Figure 2(b). This transform does not reduce the data size; by contrast it adds a few bytes information as a primary index to decode the data. The detailed work of BWT and other related transforms will be explained below.

BWT Figure 3(a) and (b) show how the forward BWT works. In this example, we consider the first 16 bytes of the image represented by the array in Fig- ure 2(a). BWT makes all possible cyclic rotations of the input data as seen in Figure 3(a). Then, it sorts the rotations [8]. The obtained matrix is ff 0f dc 07 c8 0f dc ff 0f ff 0f

07 05 0f dc

⇐

10 ff 0f 07 05 0f dc07 ff 0f dc07 c8 0f dc ff 0f 15 dc07 ff 0f dc07 c8 0f dc ff 0f ff 0f 0705 0f 1 ^{ff 0fdc07c80fdc ff 0f ff 0f07050fdc 07 ← 15} 16 07 ff 0f dc 07c8 0f dc ff 0f ff 0f 0705 0f dc 8 ff 0f ff 0f 0705 0f dc07 ff 0f dc 07 c8 0f dc (a) Rotated BWT input. (b) Sorted BWT rotations.

15 dc 07 ff 0f dc07 c8 0f dc ff 0f ff 0f 0705 0f 13 05 0f dc 07 ff 0f dc07c8 0f dc ff 0f ff 0f 07 7 dc ff 0f ff 0f 0705 0f dc07 ff 0f dc07 c8 0f 14 0f dc07 ff 0f dc07 c8 0f dc ff 0f ff 0f 07 05

6 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc07 c8 9 0f ff 0f 07 05 0f dc07 ff 0f dc07 c8 0f dc ff 9 0f ff 0f 07 05 0f dc07 ff 0f dc07 c8 0f dc ff 10 ff 0f 07 05 0f dc07 ff 0f dc07 c8 0f dc ff 0f 5 c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc 07 11 0f 0705 0f dc07 ff 0f dc07c8 0f dc ff 0f ff 3 dc 07 c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f 12 0705 0f dc 07 ff 0f dc07c8 0f dc ff 0f ff 0f

2 0f dc07 c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 7 dc ff 0f ff 0f 0705 0f dc07 ff 0f dc07 c8 0f 14 0f dc07 ff 0f dc07 c8 0f dc ff 0f ff 0f 07 05 8 ff 0f ff 0f 0705 0f dc07 ff 0f dc 07 c8 0f dc

4 07 c8 0f dc ff 0f ff 0f 0705 0f dc 07 ff 0f dc 4 07c8 0f dc ff 0f ff 0f 0705 0f dc 07 ff 0f dc 16 07 ff 0f dc 07c8 0f dc ff 0f ff 0f 0705 0f dc 5 c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc 07 11 0f 0705 0f dc07 ff 0f dc07c8 0f dc ff 0f ff 6 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc07 c8

1 ^{ff 0fdc07c80fdc ff 0f ff 0f07050fdc07} 13 05 0f dc 07 ff 0f dc07c8 0f dc ff 0f ff 0f 07 2 0f dc07 c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 12 07 05 0f dc 07 ff 0f dc07c8 0f dc ff 0f ff 0f 3 dc07 c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f

Rotated BWT input ^{F L}

(d) The output of RLE0 (c) The output of MTF Figure 2: Example how BWCA works in a small image 8 × 8.

01 .

07 07 dc dc dc 0f dc dc dc dc 05 c8 0f dc ff 0f dc dc dc 0f 0f dc dc dc dc ff ff ff ff ff

07 07 ff dc dc dc dc dc dc dc 05 c8 0f dc ff 0f dc ff 0f dc

04 07 dc

07 dc

(a) Image 8x8 ⇓

07 c8 0f dc ff (b) The output of BWT

07 07 dc 07 ff 0f 07 dc

07 0f 0f 0f 0f 0f 0f 0f

07 07 0f 0f

07 07 0f dc 05 c8 0f dc ff 0f dc 0f 0f

ff c8 c8 c8 c8 c8 ff

⇒

05 ff ff 07 dc 07 ff 0f 07 dc

05 00 ca

00 04 ff

00 03 ff

00 02 ca

Figure 3: The forward of BWT.

Position input context link ₀₇ ₀₅

1 _0f ₀₇

2 _dc ₀₇

3 _dc ₀₇

4 _{ff 0f}

5 _{ff 0f}

6 _{05 0f}

7 _{c8 0f}

8 _{ff 0f}

9 _{07 c8}

10 _{0f dc}

11 _{0f dc}

12 _{0f dc}

13 _{0f ff}

16 _→

14 _{07 ff}

5 15 6 ← _{dc ff}

9 Figure 4: The BWT inverse transform. represented in Figure 3(b). The position of original data is named primary index (equals to 15 in our example). This information is required for the reconstruction process. The last column (L) of this matrix and the primary index are the BWT output. As it can be seen in this example, BWT output tends to groups similar pixels together.

The reverse BWT [9] is principally just another permutation of the original data. Figure 4 shows how this process works. The second column of this figure is the BWT output of the data in Figure 3. The third column (called here context) is obtained from the second one by sorting its elements in ascending order. The link in the fourth column refers to the position of the context in the input (the second column). For repeated symbols the _th _th i occurrence of a context symbol corresponds to the i occurrence of the same symbol in the data input. The process is started from position 15 (the primary index) where the first pixel value of the original data is placed. It refers to the context f f (as the first pixel of original data) and also refers to link 6, as a clue of next position to the second element of original data. So, the next position is 6 which refers to 0f and gives the next link to the next output. Therefore each step gives the permuted pixel value as output and will process the whole file, because of the cyclic rotations.

GST Common GST that has been used by Burrows and Wheeler is Move-To- Front (MTF) from Bentley et al. [10]. As the second stage of original BWT chain, MTF does not shrink data but it can reduce redundancy. MTF uses an update list as an index of MTF input. The list consists of 256 symbols since the input of grey level image are 256 pixels for 8 bits per pixel. It processes the input symbols sequentially. Every input of MTF is moved to the front of the list, so the input symbols that occur often are transformed into small indices. Runs of repeated symbols are transformed into zeros. Some authors proposed other GST transforms such as Move One From Front (M1FF), Move One From Front Two (M1FF2), Frequency Count (FC), etc. These transforms will be discussed in Section 6.

Data Coding A data coding essentially based on Entropy Coding. Burrows and Wheeler propose to use Run Length Encoding Zeros (RLE0) after MTF since it pro- duces a lot of zeros, and then Huffman Coding or Arithmetic Coding to increase data compression [7]. Some approaches use Arithmetic Coding, which offers the best compression rates [11]. Further, Arithmetic Coding translates the entire data into numbers represented in certain base rather than trans- lating each data symbol into a series of digit in certain base. Therefore, AC approach is often more optimal than Huffman Coding.

Since the MTF output tends to produce a lot of small pixels value, Fen- wick proposes an adaptive Arithmetic Coding that follows the skew distribution of MTF output [9]. More details on this process will be discussed in Section 7.

3 Corpus

The BWT compression method has been applied for lossless text compression and do not take into account the data’s nature. Nevertheless, it can be applied successfully to image, as we will discuss in the next section. The lossless approach is obviously appropriate to medical image compression, which is expected to be lossless. Our experiments use 100 medical images selected from IRMA (Image Retrieval in Medical Applications) database [12] and Lukas Corpus [13]. The images are extracted randomly. IRMA database consists of primary and secondary digitized X-ray films in portable network graphics (PNG) and tagged image file format (TIFF), 8 bits per pixel (8 bpp), examples of images are shown in Figure 5. The image sizes are between 101 KB and 4684 KB. And for the Lukas Corpus, we are using two dimensional 8 bit radiographs in TIFF format.

Lehmann et al. [4] and Syahrul et al. [6] present the implementation of the BWT method in medical images which give an interesting result. They use medical images from IRMA database. These papers have not considered BWT preprocessing nor other variants of GST. Below, we consider different Figure 5: Example of tested images. From left to right: hand; head; pelvis; chest frontal and chest lateral. ways to scan the images as a BWT pre-processing step.

4 Linearization Scheme

BWT method is used to compress two-dimensional images, but the input of BWT is a one dimensional sequence. Thus, the image has to be converted from two dimensional image into one dimensional sequence. This conversion is referred to as linearization or scan path. Some codings, as Arithmetic Coding or BWT itself, depend on the relative order of gray scale values, they are therefore sensitive to the linearization method used.

(a) Left (b) Left-Right (c) Up-Down (d) Spiral (e) Zigzag

Figure 6: Linearization methods.

Some of the popular linearization schemes are given in Figure 6. We have tested 8 such different methods; namely: scanning image from left to right (L), left to right then right to left (LR), up to down then down to up (UD), zigzag (ZZ), spiral, divide image into small blocks 8 × 8, small blocks 8 × 8 covering the image in zigzag, and small blocks 3 × 3 covering the image in zigzag.

Our preliminary test uses BWT original method (Figure 1) and also adds different image scan methods to see this pre-processing impact in compression performance. We use MTF as GST stage, simple Run Length Encoding zero (RLE0) and Arithmetic Coding (AC) for EC stage. Table 1 shows the results of these tests for 8 different image scans. It shows that scan path

Table 1: Image compression ratios for different type of scan path. _{Image L LR UD Spiral ZZ 8x8 8x8ZZ 3x3zz Jpeg J2K} _{Hand3 2.114 2.123 2.253 2.221 1.991 1.933 1.994 2.049 2.136 2.733} _{Hand2 2.260 2.251 2.390 2.382 2.091 2.052 2.073 2.138 2.205 2.769} _{Hand1 2.372 2.366 2.547 2.536 2.257 2.204 2.262 2.336 2.249 2.994}

_{Head4 2.726 2.721 2.721 2.764 2.544 2.502 2.542 2.622 2.399 2.548}

_{Head3 2.566 2.565 2.527 2.563 2.350 2.303 2.349 2.422 2.363 2.932}

_{Head2 2.481 2.480 2.527 2.538 2.366 2.337 2.392 2.466 2.210 2.938}

_{Head1 2.219 2.216 2.274 2.273 2.155 2.108 2.174 2.220 1.992 2.554}

Hand4 2.685 2.679 2.830 2.802 2.551 2.411 2.510 2.557 2.189 2.909

_{Pelvis2 1.850 1.848 1.890 1.876 1.806 1.791 1.829 1.863 1.797 2.105}

_{Pelvis1 1.808 1.808 1.842 1.835 1.760 1.750 1.782 1.814 1.725 2.038}

_{Av. 100 2.516 2.515 2.577 2.575 2.357 2.315 2.364 2.442 2.280 2.924}

AV. 10 2.308 2.306 2.380 2.379 2.187 2.139 2.190 2.249 2.109 2.665

influences BWT methods performance. Image scan vertically (up-down) is slightly better than spiral method. Over the 100 images tested, 49 of them give the better result using up-down scan and 47 using spiral scan. The worst compression ratios are obtained using scan image in a block 8 × 8 (85 images), and zigzag mode (14 images). This preprocessing stage can increase the compression ratios around 4% than conventional scanning. In other words, the neighborhood of pixels influences the permutation yields by the BWT.

This result also shows that BWT method is better than JPEG but inferior than JPEG2000. For all images tested, 91 give better CR than JPEG, and among them 10 are better than JPEG2000. This preliminary result shows that scanning the image up-down and in spiral improves compression ratios. Therefore our future tests will use spiral or up-down stage.

5 BWT and its improvement

BWT is based on a sorting algorithm. There are several methods to improve the performance of sorting process, but they do not influence BWT results. Burrows and Wheeler suggested suffix tree to improve sorting process [7]. Other authors suggest suffix array or their own sorting algorithm [14–16].

Figure 7 shows the relationship between BWT and suffix array for the same input given in Figure 3, and we consider the input data is Im, then n is the length of Im, and so in this example n = 16.

The first column of Figure 7 consists of the given suffix. The third column is the sorted suffixes in ascending order and the fourth column is their suffix array (SA). We can see the correlations between sorted rotations (the usual BWT in Figure 3(b)) and sorted suffixes in the third column of Figure 7. These matrixes are similar. The second column in Figure 3(b) is the same with the third column of Figure 7 if the added symbols to create the rotations

Suffixes

ID Sorted suffixes SA ff0fdc07c80fdcff0fff0f07050fdc07 1 05 0f dc07

13 0fdc07c80fdcff0fff0f07050fdc07 2 0705 0f dc07

12 dc07c80fdcff0fff0f07050fdc07 3 07c8 0f dc ff 0f ff 0f 0705 0f dc07

4 07c80fdcff0fff0f07050fdc07

16 c80fdcff0fff0f07050fdc07 5 0f 0705 0f dc07

11 0fdcff0fff0f07050fdc07 6 0f dc07c8 0f dc ff 0f ff 0f 0705 0f dc07

2 dcff0fff0f07050fdc07 7 0f dc07

14 ff0fff0f07050fdc07 8 0f dc ff 0f ff 0f 0705 0f dc07

6 0fff0f07050fdc07 9 0f ff 0f 0705 0f dc07

9 ff0f07050fdc07 10 c8 0f dc ff 0f ff 0f 0705 0f dc07

5 0f07050fdc07 11 dc07c8 0f dc ff 0f ff 0f 0705 0f dc07

3 07050fdc07 12 dc07

15 050fdc07 13 dc ff 0f ff 0f 0705 0f dc07

7 0fdc07 14 ff 0f 0705 0f dc07

10 dc07 15 ff 0f dc07c8 0f dc ff 0f ff 0f 0705 0f dc07

07 16 ff 0f ff 0f 0705 0f dc07

8 Figure 7: Relationship between BWT and suffix arrays (SA). matrix of Figure 3(a) are ignored. For example the first line of Figure 3(b) was placed in line 13 of the rotated original data, so the rotations matrix for _{th th} this line begins with the 13 symbol till the 16 symbol then followed by _{st th} the 1 till the 12 symbols to complete the sorted rotations matrix. If the added symbols are omitted, this first line is equal to the first line of sorted suffixes (first line in the third column of Figure 7).

The BWT output can be computed using suffix array (SA) as: ½ Im[SA[i] − 1], if SA[i] 6= 1

L (i) =

(1) Im [n], otherwise.

Hence, it does not need to create the sorted rotations matrix to obtain BWT output (Figure 3(a) and (b)). Nevertheless, finding suffix sorting algorithms that run in linear worst case time is still an open problem. Figure 8 shows a few different methods for BWT computation [17]. We use the BWT of Yuta Mori that is based on SA-IS algorithm to construct the BWT output [18]. It reduces processing time and decrease memory requirements.

6 GST and its modifications

The most common GST algorithm is MTF. The interest of MTF is two- fold: runs of similar symbols are changed into runs of zeros, and the grey level image distribution is modified in the sense that the probability of lower input values increase (and so the probability of the higher values decreases). Therefore the Entropy Coding is more efficient on the output of MTF.

Method complexity time space

Worst-case Avg-case Avg-case Ukkonen’s suffix tree construction Ô (n) Ô (n) - McCreight’s suffix tree construction O (n) O (n) - Kurtz-Balkenhol’s suffix tree construction Ô

(n) Ô (n) 10n Farach’s suffix tree construction Ô (nlogn) Ô (nlogn) - Manber-Myers’s suffix array construction Ô

(nlogn) ^O (nlogn) 8n Sadakane’s suffix array construction O (nlogn) O (nlogn) 9n Larson-Sadakane’s suffix array construction ^O

(nlogn) ^O (nlogn) 8n Itoh-Tanaka’s suffix array construction ^{> O} (nlogn) ^O (nlogn) 5n Nong’s SA-IS O (nlogn) O (nlogn) 6n Burrows-Wheeler’s sorting O (n ² ^{logn ) - -}

Bentley-Sedgewick’s sorting ^O (n ² ) ^O (nlogn) 5n + stack Sedward’s sorting O (n ² ^{logn ) - -} Figure 8: Different sorting algorithms used for BWT.

MTF maintains the list of all possible symbols (MTF list), which order is modified during the process. The list can be considered as a stack and is used to obtain the output. Each input value is coded with its rank in the list. This stack is then updated: the input value is pushed at the top of the list. Therefore, the rank of this input symbol becomes zero. Consequently, a run of N identical symbols is then coded with the symbol followed by N − 1 zeros.

Most references show the efficiency of MTF, especially for text compression. Table 2 and Table 3 in the 2 ^nd and the 3 ^rd column give the results of BWT method for image compression without and with MTF. Here we can see that MTF part is significant. The compression ratios decrease for 6 images. Nevertheless, using scan image up-down, the average of compression ratios increase of approximately 3% for 10 images presented and 4% for the total database. And for scan image spiral, the compression ratio increases around 2% for 10 images and also 4% for total image database.

Table 2: The comparative results for MTF and its variants with up-down scan image. _{Image no-MTF MTF M1FF M1FF2 Ts(0) Bx3 Bx5 Bx6 FC WFC AWFC} _IFC _{Hand1 2.616 2.547 2.541 2.538 2.624 2.649 2.666 2.670 2.663 2.659 2.706 2.656} _{Hand2 2.196 2.390 2.387 2.387 2.449 2.466 2.476 2.477 2.470 2.484 2.513 2.475} _{Hand3 1.669 2.253 2.247 2.246 2.319 2.345 2.363 2.368 2.375 2.352 2.410 2.347} _{Hand4 2.589 2.830 2.825 2.824 2.920 2.939 2.946 2.945 2.938 2.979 3.000 2.954}

Head1 2.251 2.274 2.263 2.262 2.329 2.349 2.364 2.367 2.357 2.348 2.400 2.359

_{Head2 2.561 2.527 2.518 2.518 2.590 2.613 2.632 2.637 2.631 2.611 2.672 2.614} _{Head3 2.554 2.527} _{2.52 2.519 2.606 2.632 2.650 2.654 2.646 2.645 2.695 2.635}

_{Head4 2.751 2.721 2.714 2.714 2.801 2.824 2.841 2.843 2.832 2.857 2.902 2.839}

_{Pelvis1 1.978 1.842 1.835 1.835 1.888 1.905 1.919 1.922 1.911 1.898 1.908 1.909}

_{Pelvis2 2.026 1.890 1.881 1.880 1.936 1.952 1.966 1.969 1.961 1.951 1.950 1.960}

_{Av.10 2.319 2.380 2.373 2.372 2.446 2.467 2.482 2.485 2.478 2.478 2.516 2.475}

_{Av.100 2.475 2.577 2.570 2.570 2.654 2.679 2.696 2.699 2.694 2.694 2.724 2.687} Table 3: The comparative results for MTF and its variants with spiral scan image. _{Hand4 2.587 2.802 2.800 2.798 2.898 2.923 2.937 2.939 2.926 2.954 2.985 2.927} _{Hand3 1.657 2.221 2.214 2.214 2.284 2.307 2.326 2.330 2.338 2.319 2.375 2.312} _{Hand2 2.193 2.382 2.380 2.380 2.453 2.472 2.486 2.488 2.479 2.481 2.517 2.476} _{Hand1 2.616 2.536 2.530 2.527 2.614 2.639 2.657 2.662 2.655 2.648 2.701 2.646} _{Image no-MTF MTF M1FF M1FF2 Ts(0) Bx3 Bx5 Bx6 FC WFC AWFC} _IFC

_{Av.10 2.323 2.379 2.372 2.371 2.447 2.470 2.487 2.491 2.482 2.479 2.519 2.476}

_{Pelvis2 2.018 1.876 1.867 1.867 1.922 1.939 1.954 1.957 1.948 1.937 1.932 1.946}

_{Head4 2.784 2.764 2.757 2.756 2.848 2.874 2.894 2.898 2.885 2.905 2.957 2.887}

_{Head3 2.578 2.563 2.555 2.553 2.642 2.669 2.689 2.693 2.685 2.684 2.736 2.674}

_{Head2 2.576 2.538 2.528 2.528 2.602 2.626 2.646 2.651 2.645 2.623 2.689 2.627}

Head1 2.251 2.273 2.263 2.263 2.329 2.350 2.365 2.370 2.358 2.347 2.399 2.359

_{Av.100 2.477 2.575 2.568 2.567 2.651 2.677 2.694 2.698 2.692 2.691 2.721 2.685}

_{Pelvis1 1.974 1.835 1.828 1.828 1.882 1.899 1.913 1.917 1.905 1.892 1.902 1.903}

Some authors change the MTF with other Global Structure Transform.

We will test and analyze the impact of some of these transforms.

Balkenhol proposed the modification of MTF called Move One From Front (M1FF) [19]. This algorithm changes the work of the M1FF list. The input symbol from the second position in the M1FF list is moved to the first position; meanwhile the input from higher positions is moved to the second position. Furthermore, Balkenhol gives a modification of M1FF called Move One From Front Two (M1FF2). The symbol from the second position is moved to the first position of the M1FF2 list only when the previous transformed symbol was at the first position. Unfortunately, the results of our tests for these transforms decrease compression performance as seen in the _{th th} column 4 and 5 of Table 2 and Table 3.

Albers also presented other list based algorithms, called Time Stamp (TS) [20]. The deterministic version of this algorithm is TimeStamp (0) or TS (0). While MTF use 256 symbols in its list, this transform uses a double length list. So the list contains 512 symbols and each symbol occurs twice. When the input data is processed, the position of an item is one plus the number of double symbols in front of that input symbol. Then the list is updated by moving the second symbol to the front. This method is also called “Best 2 of 3” algorithm based on the Chapin’s algorithm called a “Best x of 2x − 1” algorithm [21], because TS (0) uses 2 × 256 symbols in the list.

Therefore, the list of a “Best x of 2x−1” algorithm contains x×256 symbols. Here, we have tested a “Best x of 2x−1” algorithm for x = 3, 5, and 6 (called Bx

3, Bx5 and Bx6, respectively for short) to see its impact in the BWT update methods (the original method with up-down or spiral image scan). _{th th} The results of these transforms are shown in column 6 to 9 in the Table 2 and Table 3. The compression ratios using these transforms are better than those using MTF algorithm. The average compression performance increases till 9% for a “Bx6” algorithm using image scan up-down for the whole tested images.

Other variant of GST is Frequency Counting (FC). This transform is quite different with the previous GST. There is a computation to count the symbols ranking. It gives the highest rank to the symbol with the highest frequency. This transform is not very effective since it takes time to favoring symbols [8]. The Weighted Frequency Count (WFC) improves FC transform by defining a function based on symbol frequencies [17]. It also counts the distance of symbols within a sliding window. The most frequent symbol has a higher weight. This approach does not give better compression ratios than _{th th} FC in our tests. It can be seen in the column 10 and 11 of Table 2 and Table 3. The improvement compression performance is obtained by a WFC modified called Advanced Weighted Frequency Count (AWFC). It counts a weight of symbol distribution. It is more complex than WFC, but it gives _th better compression ratios. These results can be seen in the column 12 of Table 2 and Table 3.

Abel presents other GST method called Incremental Frequency Count (IFC) [11]. It is quite similar to WFC, but it is less complex and the CR is less than that obtained with WFC. The results of this transform can be seen _th in column 13 in the Table 2 and Table 3.

Table 2 and Table 3 show that AWFC is slightly better than a “Bx6”. But AWFC is more complex than a “Bx6” transform. It takes more time to count the distance of symbol. Therefore, for the next stage, we will use two of these transforms to analyze the Entropy Coding stage and to improve BWT method performances.

7 EC and its modifications

Burrows and Wheeler original paper uses RLE0 as the first step of EC, since there are a lot of zeros after MTF [7]. The function of RLE is to support the probability estimation of the next stage. A long run of zeros tends to overestimate the global symbol probability [19].

Our previous best results use AWFC or a “Bx6” algorithm for GST stage, Up-down or spiral image scan method for BWT pre-processing and a simple RLE0 followed by Arithmetic Coding for the EC stage. This compression _{nd rd} scheme is referred as method M1 in 2 and 3 column of Table 4.

Lehmann et al. propose other RLE called Run Length Encoding two symbols (RLE2S) [4]. This coding separates the data stream and the runs so it does not interfere with the main data coding. It codes only two or more consecutive symbols into two symbols and the length of runs are placed in other data stream. This coding also decreases compression performance in _th our tests. The results of this second method (M2) are shown in the 6 and _th 7 column of Table 4.

Table 4: CR results for different EC methods. _{M1 : M2 : M3 : M4 : M5 :} _{Image UD-BWT-GST UD-BWT- UD-BWT-GST UD-BWT- UD-BWT-} _JPEG _{RLE0-AC RL2S-GST-AC modRLE-modAC GST-AC GST-modAC} ₂₀₀₀ _{Bx6 AWFC Bx6 AWFC Bx6 AWFC Bx6 AWFC Bx6 AWFC} _{Hand1 2.670 2.706 2.850 2.861 2.989 2.977 2.964 2.940 3.030 2.996 2.994}

Hand2 2.477 2.513 2.591 2.600 2.733 2.710 2.640 2.611 2.759 2.709 2.776

_{Hand3 2.368 2.410 2.510 2.525 2.537 2.552 2.489 2.496 2.551 2.560 2.733}

_{Hand4 2.945 3.000 3.108 3.120 3.289 3.280 3.173 3.156 3.329 3.297 2.909}

_{Head1 2.367 2.400 2.483 2.480 2.512 2.505 2.502 2.490 2.520 2.506 2.554}

_{Head2 2.637 2.672 2.786 2.793 2.833 2.834 2.827 2.816 2.848 2.836 2.937}

_{Head3 2.654 2.695 2.828 2.837 2.921 2.922 2.900 2.890 2.952 2.940 2.932}

_{Head4 2.843 2.902 3.046 3.058 3.206 3.187 3.164 3.128 3.254 3.202 2.673}

_{Pelvis1 1.922 1.908 2.009 1.974 2.019 1.981 2.017 1.976 2.024 1.982 2.038}

Pelvis2 1.969 1.950 2.069 2.027 2.075 2.030 2.081 2.034 2.081 2.032 2.105

_{Av.10 2.485 2.516 2.628 2.628 2.711 2.698 2.676 2.654 2.735 2.706 2.665}

_{Av.100 2.699 2.724 2.860 2.849 2.955 2.930 2.931 2.890 2.987 2.939 2.924}

Another RLE0 that never expands the file has been presented by Wheeler.

This coding encodes the data as a binary value. Reference [9] discusses more details about this coding.

For the last coding, as stated above, Burrows and Wheeler suggest to use Arithmetic Coding. As mentioned by Abel in [11] that a specific Arith- metic Coder should be used, we cannot use a simple arithmetic coder with a common order-n context. GST stage increases the probability of lower input values (and so it decreases the probability of the higher values).

Therefore, Fenwick proposes an Arithmetic Coder with hierarchical coding model for this skew distribution. We also tested this coding, referred as M3 in Table 5. The results of this method are presented in the 7 ^th and 8 ^th column of Table 4. In contrast to the common methods related in literature, we propose to omit RLE stage on this compression scheme. Two chains are then proposed. They are respectively similar to the M1 and M3 schemes but do not include any RLE stage. Therefore, the first one uses a traditional AC and the second one, a modified AC.

Table 5: CR results for different BWT methods. _Method _Bx6 _JPEG2000 _AWFC _JPEG2000 _{Nb. Image CR CR Nb. Image CR CR} _M1 _{> JPEG2000} _{(group 1)} _{29 3.132 2.486} _{13 3.196 2.633} _{< JPEG2000} _{(group 2)} _{71 2.522 3.103} _{87 2.653 2.967}

Total 100 2.699 2.924 100 2.724 2.924 _M2 _{> JPEG2000} _{(group 1)}

_{17 3.324 2.741}

_{17 3.329 2.741} _{< JPEG2000} _{(group 2)}

_{83 2.765 2.961}

_{83 2.750 2.961} _{Total 100 2.860 2.924 100 2.849 2.924} _M3 _{> JPEG2000} _{(group 1)}

_{24 3.435 2.868}

_{21 3.433 2.821} _{< JPEG2000} _{(group 2)}

_{76 2.804 2.941}

_{79 2.796 2.951} _{Total 100 2.955 2.924 100 2.930 2.924} _M4 _{> JPEG2000} _{(group 1)}

_{20 3.449 2.815}

_{18 3.416 2.768} < JPEG2000 _{(group 2)}

_{80 2.801 2.951}

_{82 2.775 2.961} _{Total 100 2.931 2.924 100 2.890 2.924} _M5 _{> JPEG2000} _{(group 1)}

_{30 3.401 2.898}

_{25 3.375 2.845} _{< JPEG2000} _{(group 2)}

_{70 2.809 2.935}

_{75 2.794 2.950} _{Total 100 2.987 2.924 100 2.939 2.924}

The results presented in Table 4 show that the performance of M5 can be slightly improved in average and therefore to be considered. We propose to analyze the results by splitting the image in two classes. The images which provide better compression ratios than JPEG2000 (>JPEG2000), referred as group 1, and others (<JPEG2000), referred as group 2. This analysis is presented in Table 5. All the methods provide, for the group 1, a significant improvement of the average compression ratio over JPEG2000 standard. The method using a “Bx6” algorithm, provides a compression ratio improvement equals to 17% in comparison with JPEG2000. It is specially significant as it concerns the higher number of images (30%). For the group 2, the modifications on the BWT schemes (M2, M3, M4 and M5) enable the difference on the average CR between BWT method and JPEG2000 to be decreased. As seen in Table 4, M5 using “Bx6” gives slightly better average CR than M4, and the lowest value of difference of average CR between BWT method and JPEG2000 for group 2. The same tendencies are observed for AWFC methods.

8 Conclusion and Perspectives

We presented the state of the art of the application of BWT in lossless image compression. We analyzed the impact of each stage that has important role to improve compression performance. We also showed the BWT implementation in image is not similar to that in text. We should consider pre-processing, where this stage improves CR till 4%.

On the 100 medical images of the database, the average compression rate obtained with the proposed method is slightly better than with JPEG2000 standard. Nevertheless, this new approach based on combinatorial method provides much higher compression rate on some images. This fact has been observed on 30% of the tested images. For these images, the CR is in average 17% higher than the JPEG2000 standard. Meanwhile, the other images (70%) provide a CR which is only lower than 4%. These promising results offer two interesting perspectives. Firstly, we currently investigate the modification of EC stage to be more adapted with the BWT stage. A preliminary result shows that the merging of RLE and AC stage should be consider. Secondly, we consider developing an automatic image classification technique based on the image features (as the image texture) to split the images in two classes: those which provide the best results with the Burrows Wheeler compression scheme and those where the standard JPEG2000 is more efficient. Therefore, a specific codec which embeds the two algorithms would be developed. References

[1] M. Ciavarella and A. Moffat, “Lossless image compression using pixel reordering,” Proceedings of the twenty-seventh Australasian Computer Science Conference , pp. 125–132, 2004. [2] N. R. Jalumuri, “A Study of Scanning Paths for BWT Based Image Compression,” Master’s thesis, 2004.

[3] X. Bai, J. S. Jin, and D. Feng, “Segmentation-based multilayer diagnosis lossless medical image compression,” in VIP ’05: Proceedings of the Pan- Sydney area workshop on Visual information processing , (Darlinghurst, Australia, Australia), pp. 9–14, Australian Computer Society, Inc., 2004.

[4] T. M. Lehmann, J., and C. Weis, “The impact of lossless image compression to radiographs,” Proceeding. SPIE, vol. 6145, pp. 290–297, March 2006. [5] Y. Wiseman, “Burrows-Wheeler based JPEG,” Data Science Journal, vol. 6, pp. 19–27, 2007.

[6] E. Syahrul, J. Dubois, V. Vajnovszki, T. Saidani, and M. Atri, “Loss- less image compression using Burrows Wheeler Transform (methods and techniques),” in SITIS ’08, pp. 338–343, 2008. [7] M. Burrows and D. J. Wheeler, “A block-sorting lossless data compression algorithm,” tech. rep., System Research Center (SRC) California,

May 10, 1994. [8] D. Adjeroh, T. Bell, and A. Mukherjee, The Burrows-Wheeler Trans- form: Data Compression, Suffix Arrays, and Pattern Matching .

Springer US, june 2008. [9] P. M. Fenwick, “Burrows–Wheeler compression: Principles and reflec- tions,” Theor. Comput. Sci., vol. 387, no. 3, pp. 200–219, 2007.

[10] J. L. Bentley, D. D. Sleator, R. E. Tarjan, and V. K. Wei, “A locally adaptive data compression scheme,” Commun. ACM, vol. 29, no. 4, pp. 320–330, 1986. [11] J. Abel, “Incremental frequency count—a post BWT–stage for the

Burrows-Wheeler compression algorithm,” Softw. Pract. Exper., vol. 37, no. 3, pp. 247–265, 2007.

[12] T. M. Lehmann, M. O. G¨ uld, C. Thies, B. Fischer, K. Spitzer, D. Key- sers, H. Ney, M. Kohnen, H. Schubert, and B. B. Wein, “Content-based image retrieval in medical applications,” 2004. [13] “Lukas corpus.” http://www.data-compression.info/Corpora/LukasCorpus/index.htm. [14] S. Kurtz and B. Balkenhol, “Space efficient linear time computation of the Burrows and Wheeler-Transformation,” in complexity, Festschrift in honour of Rudolf Ahlswede’s 60th Birthday , pp. 375–384, 1999. [15] G. Manzini, “Two space saving tricks for linear time lcp array computation,” in Proc. SWAT. Volume 3111 of Lecture Notes in Computer

Science , pp. 372–383, Springer, 2004. [16] P. Ferragina, R. Giancarlo, G. Manzini, and M. Sciortino, “Boosting textual compression in optimal linear time,” ACM, vol. 52, pp. 688–713,

July 2005. [17] S. Deorowicz, Universal lossless data compression algorithms. PhD thesis, Silesian University of Technology Faculty of Automatic Control,

Electronics and Computer Science Institute of Computer Science, 2003. [18] G. Nong, S. Zhang, and W. H. Chan, “Linear suffix array construction by almost pure induced-sorting,” Data Compression Conference, vol. 0, pp. 193–202, 2009. [19] B. Balkenhol and Y. M. Shtarkov, “One attempt of a compression algorithm using the BWT.” SFB343: Discrete Structures in Mathematics,

Preprint, Faculty of Mathematics, University of Bielefeld, 1999. [20] S. Albers, “Improved randomized on-line algorithms for the list update problem,” SIAM J. Comput., vol. 27, no. 3, pp. 682–693, 1998.

[21] B. Chapin, “Switching between two on-line list update algorithms for higher compression of Burrows-Wheeler transformed data,” in Data Compression Conference , pp. 183–192, 2000.

Combinatorial Transforms : Applications in Lossless Image Compression

1 Introduction

2 Original method based on BWT

3 Corpus

4 Linearization Scheme

5 BWT and its improvement

6 GST and its modifications

7 EC and its modifications

8 Conclusion and Perspectives

Dokumen yang terkait

Pengolahan : manipulasi dari data ke dalam bentuk yang lebih

Pokok Bahasan : Mengelola File Deskripsi singkat : Dalam pertemuan ini akan mempelajari tentang file,

Efficient Implementation of Mean Formula for Image Processing using FPGA Device

Analisa Perkiraan Jumlah Penerimaan Mahasiswa Baru Dengan Menggunakan Metode Kuadrat Terkecil dan SPSS17.0 StudiKasus : STMIK JAKARTA (STIK)

Efficient Implementation of Mean, Variance and Skewness Statistic Formula for Image Processing Using FPGA Device

Organisasi dan Arsitektur Komputer : Perancangan Kinerja (William Stallings)

Sumber : Buku Pemrograman Web karangan Abdul Kadir Pengantar Java Script di http:www.ilmukomputer.com20060819pengantar- java-script

Pasal 1 Dalam Undang-undang ini yang dimaksud dengan : 1. Telekomunikasi adalah setiap pemancaran, pengiriman, dan atau penerimaan dan setiap informasi dalam bentuk tanda-tanda, isyarat, tulisan, gambar, suara, dan bunyi melalui sistem kawat, optik, radio

SATUAN ACARA PERKULIAHAN MATA KULIAH : Interaksi Manusia dan Komputer KODE SKS : 4IA( KK045306

Mata Kuliah : Konsep Teknologi Informasi B

Dukungan

Links

Combinatorial Transforms : Applications in Lossless Image Compression

1 Introduction

2 Original method based on BWT

3 Corpus

4 Linearization Scheme

5 BWT and its improvement

6 GST and its modifications

7 EC and its modifications

8 Conclusion and Perspectives

Dokumen yang terkait

Pengolahan : manipulasi dari data ke dalam bentuk yang lebih

Pokok Bahasan : Mengelola File Deskripsi singkat : Dalam pertemuan ini akan mempelajari tentang file,

Efficient Implementation of Mean Formula for Image Processing using FPGA Device

Analisa Perkiraan Jumlah Penerimaan Mahasiswa Baru Dengan Menggunakan Metode Kuadrat Terkecil dan SPSS17.0 StudiKasus : STMIK JAKARTA (STIK)

Efficient Implementation of Mean, Variance and Skewness Statistic Formula for Image Processing Using FPGA Device

Organisasi dan Arsitektur Komputer : Perancangan Kinerja (William Stallings)

Sumber : Buku Pemrograman Web karangan Abdul Kadir Pengantar Java Script di http:www.ilmukomputer.com20060819pengantar- java-script

Pasal 1 Dalam Undang-undang ini yang dimaksud dengan : 1. Telekomunikasi adalah setiap pemancaran, pengiriman, dan atau penerimaan dan setiap informasi dalam bentuk tanda-tanda, isyarat, tulisan, gambar, suara, dan bunyi melalui sistem kawat, optik, radio

SATUAN ACARA PERKULIAHAN MATA KULIAH : Interaksi Manusia dan Komputer KODE SKS : 4IA( KK045306

Mata Kuliah : Konsep Teknologi Informasi B

Dokumen yang Anda mencari sudah siap untuk unduhkan