The relation between color spaces and compositional data analysis demonstrated with magnetic resonance image processing applications

Images of living human brains can be acquired non-invasively by using magnetic resonance imaging (MRI). Different scanning parameters weight the image contrast to different tissue properties. A few examples of these differently weighted images are; T1 weighted (T1w) images to maximize the contrast between white matter and gray matter tissues, proton density weighted (PDw) for measuring concentration of hydrogen atoms and T2* weighted (T2*w) for creating a contrast highlighting the iron content. These images are commonly combined pairwise (using e.g. simple ratio images) to mitigate intensity inhomogeneities or to reveal specific tissue properties such as gray matter myelination. A principled way to combine more than two images at once is to consider multi-modal MRI data as compositions. The present study relates color space based image fusion to compositional data analysis and applies this concept to multi-modal MRI data in order to simultaneously reduce artefactual intensity inhomogeneities, enhance color balance and highlight specific tissue characteristics. To this end, brain images with three different contrasts (T1w, PD, T2*w) were acquired in-vivo at ultra high field (7 Tesla) MRI scanner and compositional data analysis methods were used to create virtual MR contrasts similar to conventional ratio images. In addition, simplicial centering was used to improve color balance and isometric logratio transformed coordinates of the tissue compositions were explored with two-dimensional transfer functions to probe meaningful compositional characteristics of brain tissues.


Introduction
Compositional data analysis can be applied to color images in the context of image processing and analysis. Color images are stored as triplets of non-negative integers representing the additive primary colors red, green and blue (referred as RGB channels). Once the color image is formed, the transformation from RGB to hue, saturation and intensity (HSI) coordinate system is an often used first step to improve the image visualization (Smith 1978;Ledley, Buas, and Golab 1990;Grasso 1993) (for the mathematical expression of this transformation see Appendix A.2). HSI color space is commonly used in computer applications because of the commonly accepted property of its components relating to human color perception (Joblove and Greenberg 1978;Levkowitz and Herman 1993;Pohl and van Genderen 2016). Hue relates to the dominant component of the primary colors (red, green, blue) in the mixture, saturation relates to the distance from an equal mixture (gray), and intensity relates to distance from total darkness (black to white). In this article, inspiration is drawn from RGB to HSI color space transformation to propose an analogous method based on compositional data analysis. During this process, a simple vector decomposition is outlined to justify the use of compositional data analysis (Aitchison 1982;Pawlowsky-Glahn, Egozcue, and Tolosana-Delgado 2015) and the simplex space is leveraged to manipulate vector fields to perform image enhancement. The proposed method is demonstrated using a magnetic resonance imaging (MRI) dataset. The results are also visualized using a digital color photograph to provide a general intuition. Due to the interdisciplinary nature of this work a glossary is provided in Appendix A.1 to accompany readers from different fields.

MRI data acquisition and preprocessing
Whole head T1 weighted (T1w), proton density weighted (PDw) and T2* weighted (T2*w) images at 0.7 mm isotropic volumetric cubic element (voxel) resolution were acquired in one male participant using a three dimensional magnetization prepared rapid acquisition gradient echo sequence with a 32-channel head coil (Nova Medical) on a 7 Tesla whole-body scanner (Siemens) in Maastricht Brain Imaging Center. These measurements reflect different intrinsic properties of the tissues, for instance T1w image shows the optimal contrast between white matter and gray matter, PDw image shows the density of the hydrogen atoms, T2*w image shows the iron content. For the detailed report of the data acquisition parameters see Appendix A.3. For a review on ultra high field MRI ((≥ 7 Tesla)) see Ugurbil (2014).
As a standard preprocessing step, a masking operation was performed based on the PDw image using FSL-BET software (version 2.1; Smith (2002); Smith, Jenkinson, and Woolrich (2004)). The resulting volumetric mask was applied to all images in order to discard parts containing non-brain tissues (e.g. bones, muscles, skin, air). This masking step was necessary to reduce the overall processing time in the following analyses by reducing the total number of voxels from 26214400 (320 × 320 × 256) to ∼ 4.5 million (4543582 to be exact). Measurements (T1w, PDw and T2*w) stored at each one of the ∼ 4.5 million voxels are considered as components of three separate scalar fields constituting a vector field when combined.

Barycentric decomposition
The image analysis illustrated in this paper starts from the following vector decomposition, where the real space of n dimensions is indicated with R n and the simplex space is indicated with S n symbols: where R n >0 indicates positive real numbers, k is an arbitrary scalar and the letter C stands for the closure operation used in compositional data analysis (Aitchison 2002;Pawlowsky-Glahn et al. 2015): The letter s is another scalar that is the sum of the vector components: Note that k disappears from the expression in Eq. 1 and 2 when selected as one. The decomposition ( Eq. 1) of the vector v into its barycentric coordinates (C( v)) and a scalar (s) allows the application of the compositional data analysis to the barycentric coordinates. Historically, the closure operation was used by August Ferdinand Mobius (Fauvel, Flood, and Wilson 1993) and the resulting vector was interpreted as relating to the barycenter (center of mass) of a simplex, hence giving the name to the vector decomposition demonstrated here.

Compositional image analysis
In this section, operations performed on the the barycentric coordinates are laid out. These operations are mostly adapted from Pawlowsky-Glahn et al. (2015) to fit the context of the analyzed vector field. Tensor notation is used for explicit formulations: Let A be a tensor with three dimensions, A x,y,z , where the coordinates x, y, z of an element represents the spatial location. As mentioned the in Section 2.1, there are ∼ 4.5 million tensors (i.e. voxels) of interest in total.
T stands for the transpose operator and t ∈ T 1w, P Dw, T 2 * w.
Concatenation denoted by is used to matricize the vectorized tensors as the next step: where number of rows of the matrix V is x × y × z (n =∼ 4.5 million), and the number of columns is 3. These operations are done for convenient indexing in the following equations.
Let V indicate the set of voxels where the vector v i is the composition of T1w, PDw and T2*w measurements: At this stage, V ∈ R 3 . Voxel-wise (i.e. row-wise) closure is applied to V to acquire the barycentric coordinates of every composition: The set of compositions X was centered by finding the sample center and perturbing each composition with the inverse of the sample center: where ⊕ denotes the perturbation operator (analogous to addition in real space): x ⊕ y = C[x T 1w y 1 , x P Dw y 2 , x T 2 * w y 3 ].
Multipliers of the components are indicated with y 1 , y 2 and y 3 . The term cen(X) is a vector that stands for the component-wise (i.e. column-wise) geometric mean across all voxels: After centering, the data is standardized: where symbol stands for the power operator (analogous to scaling in real space). The exponent p is applied to every component: and the total variance is computed by: where d 2 a indicates squared Aitchison distance: In the current example D = 3 because of operating on a set of three part compositions X (see Equation 7).
At this stage some visual intuition could be gained with regards to the vector field (MR images) that is decomposed and processed via compositional data analysis methods. For illustration purposes, this is done by generating virtual image contrasts computed through metrics of simplex space (see Figure 2).
The norm in S 3 image in Figure 2 is generated by computing Aitchison norm voxel-wise: The subscript a stands for the vector norm defined in simplex space (analogous to Euclidean norm in real space). Following Pawlowsky-Glahn et al. (2015), Aitchison norm is defined as: where D = 3 considering three different MRI measurements (T1w, PDw, T2*w).
It should be noted that this norm image is analogous to saturation dimension in the aforementioned RGB to HSI color space transformation (see Appendix A.2).
The angular difference in S 3 image in Figure 2 is the voxel-wise angular difference (∠) between the set of compositional vectors (X) and a reference vector (r): where x a stands for Aitchison norm of the vector x as defined in Equation 16 and x, r a stands for inner product in simplex space: Here, D = 3 again and it should be noted that the choice of the reference vector r is arbitrary. In this example the reference vector is selected as r = [0.05, 0.9, 0.05]. This angle difference image is similar to hue dimension in RGB to HSI color space transformation (see Appendix A.2).
Recognizing the similarity of norm and angular difference in simplex space to saturation and hue dimensions of HSI color space at this stage leads to the proposal of a color balance algorithm which is used together with centering (Eq. 8) and standardization (Eq. 11) to generate the color image in Figure 3 lower row: This method is inspired from other simple color balance methods that makes use of the dynamic range of RGB channels of rendered color images (Limare, Lisani, Morel, Petro, and Sbert 2011). Truncate function is useful when when there are a small amount of outliers compositions. In essence this functions pulls the compositions that are too far away from the center of the simplex space towards the center. In the current work, distance threshold was arbitrarily chosen λ = 3 based on visual inspection of the resulting color enhancement. It is conceivable that other operations based on percentiles can also be used to the make the decision less arbitrary (e.g. 99 th percentile of Aitchison norm distribution consisting of all vectors of X).
The ilr transformation (Egozcue, Pawlowsky-Glahn, Mateu-Figueras, and Barceló-Vidal 2003;Pawlowsky-Glahn et al. 2015) was performed voxel-wise to acquire real space coordinates (ilr coordinates) of the set compositions to generate Figure 3 right column: where the ilr transformation is defined as: H indicates the Helmert sub-matrix, chosen by following Tsagris, Preston, and Wood (2011);Lancaster (1965). In this case H consists of 3 rows and 2 columns and selected as the following: To explore the usefulness of the ilr coordinates and probe physically interpretable compositional characteristics, a real-time interactive visualization method is used with joint ilrcoordinates and image space representations of the data. This method is similar to using pre-defined modulation transfer functions for mapping the bins of a 2D histogram data representation to 3D image space used in volume rendering software (Kniss, Kindlmann, and Hansen 2005). Major brain tissues such as gray matter, white matter, cerebrospinal fluid arteries and sinuses are identified manually by the author and delineated in both ilr-coordinates and brain images using neuro-anatomical expertise.
All analysis steps demonstrated in this work are implemented in a free and open source Python package (available at https://github.com/ofgulban/compoda; Gulban (2018) 3. Results Figure 1 shows the similarity between HSI and the analogous metrics demonstrated in methods section. When comparing Fig. 1 rows 2-3 column 1 to column 3, it can be seen that hue and angular difference in simplex space are both invariant to intensity gradient visible in the sky (brightness change from upper right corner to lower left). The same observation can also be made for the saturation and norm in simplex space images (compare Fig. 1  to the underlying multiplicative scalar field (i.e. brightness). These image contrasts based on compositional metrics are useful in object recognition tasks, see how the clouds are invisible in hue and angular difference images or the well-defined tip of the rocket in saturation and norm images. Figure 2 depicts two different slices of the 3D brain images visualized in gray scale depicting different image contrasts. It can be seen that the smooth, artefactual multiplicative scalar field (referred as bias field in MRI literature, see arrows with asterisks in Fig. 2) is separated from the compositional components (compare Fig. 2 rows 1-2 column 2 with row 3 column 2 in both panels). This is similar to the separation of the brightness gradient in the sky in Fig. 1 as mentioned in the previous paragraph. The compositional image contrasts can be considered as virtual contrasts which enhance the visual appearance by being invariant to certain artefactual features. It should be mentioned that other image contrasts can be generated by changing the reference vector in angular difference images (see Eq. 18) to highlight specific tissues. Similarly, the compositions can also be perturbed (see Eq.8) differently to generate different contrasts in norm images. The generation of taskspecific virtual contrasts could potentially be useful in medical imaging applications since the input channels and the virtual contrasts are physically interpretable.
For the physical interpretation of the compositional vectors, a joint representation of color images and ilr coordinates of compositional vectors are presented in Figure 3. The difference between the rows in left column demonstrates the effect of color balance (see Equations 8 and 11). It can be seen that the image contrast is enhanced to the level of easily recognizing major brain tissues by associating them to different colors. This is a more convenient visualization compared to inspecting T1w, PDw, T2*w image contrasts individually considering that the color image contains information from all three inputs at the same time. Right column in Fig. 3 shows the 2D histograms of the ilr coordinates (see Eq. 22) of the vector compositions. The effect of color balance becomes apparent when the compositions are interpreted with regards to the embedded RGB color cube primary axes (real space axes). For instance most of the clusters are on the left hand side of the center, close to the green arrow (PDw). This indicates that PDw measurements were initially dominating the compositions causing the green heavy color image. After the color balance, compositions spread more equally along the primary color axes, indicating that color contrast in the image will be richer (i.e. less dominated by a single component). The color balanced visualization is more useful for human observers in terms of intuitively recognizing the tissues. For instance the arteries have mostly reddish-white colors and the sinuses appear in green, which corresponds to the areas delineated for these tissues in ilr coordinates when the positions of clusters are considered relative to the embedded primary axes of RGB color cube. Although both of these are blood vessels, the difference between arteries and sinuses is meaningful because sinuses contain mostly deoxygenated hemoglobin, leading to a rapid decay of the MRI signal. In contrast, arteries contain oxygenated blood with slower MR signal decay. The difference in signal decay times of blood vessels effects T2*w and PDw measurements separating the compositional properties in relation of T1w images. Similarly the compositional change from white matter to gray matter to cerebrospinal fluid can be seen as an approximately straight line which corresponds to the change from red-heavy color of white matter to cyan of cerebrospinal fluid. It can be seen that these tissues are not dispersed towards PDw or T2*w axes, which can be intepreted as PDw and T2*w measurements not revealing different compositions in relation to T1w measurements when white matter, gray matter and cerebrospinal fluid is considered. T2*w Nr. voxels Figure 3: MRI measurements rendered as a color image. Red channel is assigned to T1w, green to PDw and blue to T2*w measurements. Left column shows the 2D histograms of the corresponding ilr coordinates (Eq. 22). The projection of primary axes of RGB color cube (in R 3 ) to ilr coordinates are embedded to provide an intuitive reference for the characteristics of the compositions. The effect of centering and standardization inside the simplex (Eq. 8, 11) and the compositional truncation (Eq. 20) is visible as color balance improvement in the rendered brain slice. The labels pointing to the tissues in brain image and circles in 2D ilr coordinate histograms shows the relation of the compositional characteristics with the coloration. The circles in 2D histograms are the edges of the 2D transfer functions used to manually probe the tissue-ilr coordinate relationship.

Discussion
In this work, compositional data analysis methods are used to reformulate RGB-HSI color space transformation. It is visually demonstrated that the compositional metrics such as angular difference and norm in simplex space relate to hue and saturation concepts in color space literature. This reformulation of hue and saturation would be advantageous for having a well-principled framework when operating on images with n-dimensions. For instance, a potential future application would be to analyze MRI datasets with more than three types of measurements by adding cerebral blood volume measurements (Uludaǧ and Blinder 2018) or multi-echo echo planar imaging (Poser and Norris 2009) to T1, T2 and PD weighted measurements at ultra high fields. Another application area would be processing of multispectral images for image fusion purposes (Pohl and van Genderen 2016). In cases where more than three measurements are acquired for each element in an image, dimensionality reduction methods such as principal component analysis in simplex space (Wang, Shangguan, Guan, and Billard 2015) would become relevant to maximize the visualized information content of color images. This is the disadvantage of being limited to three primary additive colors for color image rendering. However the virtual image contrasts (scalar images) could still be explored without dimensionality reduction. For instance the norm in simplex space can still be straightforwardly computed for n-dimensional compositions or the angular difference images can be explored can be explored by selecting different reference vectors to track the positions of n-dimensional compositions on an n-sphere.
The disadvantages of compositional metrics presented here should also be mentioned. If the signal to noise ratio of the acquired images are low, compositional image contrasts would become less useful. For instance, the parts of images where only the measurement noise is recorded (e.g. thermal noise) the compositional metrics would carry no meaning. In such cases the angular difference and norm in simplex space would return enhanced noise patterns therefore the local brightness (i.e. intensity) should be taken into account before the compositional image analysis. The domain-specific exploration of the parameter spaces are also needed. For instance what is the extent of the usefulness of the compositional image processing when the noise properties are dissimilar between different types of measurements or when the measurements have different spatial scales (e.g. images with different resolutions). Further investigation is necessary to establish the relevance of the proposed application of compositional data analysis specifically to MRI data and generally to image processing and image fusion.

Acknowledgements
The data was acquired thanks to Federico De Martino, under the project supported by NWO VIDI grant 864-13-012. The author O.F.G. was also supported by the same grant. I thank Ingo Marquardt for language editing in the initial version of the manuscript and Alberto Cassese for the advice on mathematical notations in Section 2.2. In addition, I wish to express my appreciation for the comments and suggestions of the anonymous reviewers which I believe have helped to improve the manuscript.

A.1. Glossary
Barycentric coordinates: Center of mass-centric coordinates. Used in relation to the center of mass of an n-simplex.
Color balance: Used to indicate centering and standardizing compositional vectors in this work.
HSI: A three dimensional space with named axes relating to human color perception: hue, saturation, intensity.

MRI: Magnetic resonance imaging.
PDw: Proton density weighted MRI signal acquisition. Indicates density of the hydrogen atoms in different tissues.

RGB:
A three dimensional space with named axes relating to color cube: red, green, blue Simplex: Generalized geometrical notion of triangles. For example, 0-simplex is a point, 1-simplex is a line segment, 2-simplex is a triangle, 3-simplex is a tetrahedron etc.
T1w: T1 weighted MRI signal acquisition. Optimized to give the highest image contrast between white matter and gray matter brain tissues. T2*w: T2* weighted MRI signal acquisition. Optimized to give the highest image contrast related to iron concentration between brain tissues.
Voxel: Volumetric cubic element, in other words a three dimensional pixel.