Guided block matching and 4-D transform domain filter projection denoising method for dynamic PET image reconstruction

Algorithm

Overview of GBM4D

The noise in dynamic PET sinograms is well modeled by a Poisson distribution, whereas BM3D and BM4D are designed for Gaussian noise. Thus, the generalized Anscombe transform is first applied to the sinogram. The general procedure of GBM4D is shown in Fig. 1.

Fig. 1 Flowchart of the guided block matching and 4-D transform domain filter projection denoising method for dynamic PET image reconstruction. Steps 1 and 2 are conducted repeatedly for each block

The final estimate is obtained by the inverse Anscombe transformation of the GBM4D-filtered sinogram. The algorithm consists of two steps: a hard-thresholding step and a Wiener filtering step. Each step consists of block matching and collaborative filtering by shrinkage in a 4-D transform domain, as follows:

Find blocks that are similar to the reference block in the cumulative PET sinogram. 2-D blocks at the corresponding spatial position in each scanning frame are stacked together to generate 3-D sinogram blocks, and all similar 3-D blocks are stacked together to form a 4-D array (group).

Perform collaborative filtering of the group, then aggregate the sinograms by returning the filtered 3-D blocks to their original positions.

Details of the Anscombe transformation and of denoising Steps 1 and 2 in Fig. 1 are described in the following sections.

Guided block matching and grouping in Step 1

Consider Poisson noise in the noisy sinogram \(z: X \times T \to \mathbb{R}\) of the form

$$z(x,t) = P\big(y(x,t)\big), \quad x \in X,\ t \in T$$

(1)

where \(P(\cdot)\) denotes an independent Poisson random variable and \(y\) is the true sinogram. The Anscombe transformation was first conducted on the sinogram before Step 1; in this case, only pure Poisson noise is considered. After the denoising process (both Step 1 and Step 2 in Fig. 1), the exact unbiased inverse Anscombe transformation was conducted using a validated database to avoid a biased inverse in low-count Poisson images [22].
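For illustration, the following is a minimal sketch of the Anscombe variance-stabilizing transform and its unbiased inverse. The closed-form algebraic approximation of the exact unbiased inverse used below is a stand-in assumption for the validated database used in this work, and the function names are illustrative:

```python
import numpy as np

def anscombe(z):
    """Anscombe variance-stabilizing transform: Poisson data -> approx. unit-variance Gaussian."""
    return 2.0 * np.sqrt(np.asarray(z, dtype=float) + 3.0 / 8.0)

def inverse_anscombe_unbiased(d):
    """Closed-form approximation of the exact unbiased inverse transform.
    The paper uses a validated database [22]; this algebraic approximation
    is shown only as an illustrative stand-in."""
    d = np.asarray(d, dtype=float)
    return (0.25 * d**2
            + 0.25 * np.sqrt(1.5) / d
            - (11.0 / 8.0) / d**2
            + (5.0 / 8.0) * np.sqrt(1.5) / d**3
            - 0.125)
```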

After the generalized Anscombe transformation, it is reasonable to assume that the noisy sinogram \(z: X \times T \to \mathbb{R}\) has the form

$$z(x,t) = y(x,t) + \eta(x,t), \quad x \in X,\ t \in T$$

(2)

where \(x\) is the 2-D spatial position in the sinogram domain \(X \subset \mathbb{Z}^2\), \(t\) is the temporal position of the dynamic sinograms, \(T \subset \mathbb{Z}\), and \(\eta(\cdot) \sim N(0, \sigma^2)\). In the block-matching process, the similarity of two blocks is measured by the inverse of the \(\ell^2\)-distance. If the true image \(y\) were available, the block distance would be measured as:

$$d\big(Z_{x_R}^{t}, Z_{x}^{t}\big) = \frac{\big\| Y_{x_R}^{t} - Y_{x}^{t} \big\|_2^2}{N^2}$$

(3)

where \(\| \cdot \|_2\) denotes the \(\ell^2\)-norm, the blocks \(Z_{x_R}^{t}\) and \(Z_{x}^{t}\) in \(z\) are located at the reference position \(x_R\) and at \(x \in X\) at time \(t \in T\), the blocks \(Y_{x_R}^{t}\) and \(Y_{x}^{t}\) are located at the same positions in \(y\), and \(N\) denotes the block size. In realistic situations, only the noisy blocks \(Z_{x_R}^{t}\) and \(Z_{x}^{t}\) are available. Therefore, the distance is estimated as:

$$\hat{d}\big(Z_{x_R}^{t}, Z_{x}^{t}\big) = \frac{\big\| Z_{x_R}^{t} - Z_{x}^{t} \big\|_2^2}{N^2}$$

(4)
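Before analyzing the statistics of this estimate, note that Eq. (4) is simply a mean squared difference between two noisy blocks; a minimal sketch (a hypothetical helper, not the authors' code):

```python
import numpy as np

def block_distance(block_a, block_b):
    """l2 block distance of Eq. (4): squared difference normalized by N^2."""
    n = block_a.shape[0]                       # blocks are N x N
    return np.sum((block_a - block_b) ** 2) / n**2
```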

The distance is a non-central chi-square random variable with expectation

$$E\left\{ \hat{d}\big(Z_{x_R}^{t}, Z_{x}^{t}\big) \right\} = d\big(Z_{x_R}^{t}, Z_{x}^{t}\big) + 2\sigma^2$$

(5)

and variance

$$\mathrm{var}\left\{ \hat{d}\big(Z_{x_R}^{t}, Z_{x}^{t}\big) \right\} = \frac{8\sigma^4}{N^2} + \frac{8\sigma^2\, d\big(Z_{x_R}^{t}, Z_{x}^{t}\big)}{N^2}$$

(6)

The variance grows asymptotically with \(O(\sigma^4)\). For dynamic PET sinograms, the noise in each frame is relatively large compared with the accumulated sinogram because of the lower photon count per frame. For larger \(\sigma\), the probability densities of different \(\hat{d}\big(Z_{x_R}^{t}, Z_{x}^{t}\big)\) may overlap heavily. Such mismatches worsen the sparsity of the 4-D groups, which reduces the efficiency of the collaborative filtering process. Previous work used coarse prefiltering, realized by a linear transform on the blocks followed by hard-thresholding, to avoid such mismatches [8, 9]. In this work, coarse prefiltering was applied together with a guide image (the accumulated transformed PET sinogram) to avoid mismatches. The distance after coarse prefiltering can be written as:

$$\hat{d}\big(Z_{x_R}, Z_{x}\big) = \frac{\big\| \Upsilon'\big(\mathcal{T}_{2\mathrm{D}}\big(Z_{x_R}^{\mathrm{guide}}\big)\big) - \Upsilon'\big(\mathcal{T}_{2\mathrm{D}}\big(Z_{x}^{\mathrm{guide}}\big)\big) \big\|_2^2}{N^2}$$

(7)

where \(\Upsilon'\) is the hard-thresholding operator and \(\mathcal{T}_{2\mathrm{D}}\) is the linear transformation, and

$$Z_{x_R}^{\mathrm{guide}} = \mathop{\sum}\limits_{t \in T} Z_{x_R}^{t} \quad \mathrm{and} \quad Z_{x}^{\mathrm{guide}} = \mathop{\sum}\limits_{t \in T} Z_{x}^{t}$$

(8)

As stated previously, the transformed accumulated PET sinogram contains less noise than any single dynamic PET frame. Therefore, the accumulated transformed PET sinogram was used as the guide image. For Step 1, the matching set at \(x_R\), \(S_{x_R}\), generated by block matching contains the positions \(x\) whose guide-image blocks \(Z_{x}^{\mathrm{guide}}\) are similar to \(Z_{x_R}^{\mathrm{guide}}\):

$$S_{x_R} = \left\{ x \in X : \hat{d}\big(Z_{x_R}^{\mathrm{guide}}, Z_{x}^{\mathrm{guide}}\big) \le \tau_{\mathrm{match}} \right\}$$

(9)

where \(\tau_{\mathrm{match}}\) is the maximum distance \(\hat{d}\big(Z_{x_R}, Z_{x}\big)\) for which a block is considered similar to the reference block. The block group is formed based on \(S_{x_R}\) by stacking the blocks \(Z_{S_{x_R}}\) into a 4-D array of size \(N \times N \times \left| T \right| \times \left| S_{x_R} \right|\).
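The following sketch illustrates Eqs. (7)-(9): block matching on the accumulated guide image with coarse prefiltering, here a 2-D DCT followed by hard thresholding. The search step, the thresholds, and the function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.fft import dctn

def match_blocks(guide, x_ref, n, step, tau_match, lam_pre, sigma):
    """Return the matching set S_xR (Eq. 9) found on the guide image."""
    def prefiltered(block):
        # Coarse prefiltering: 2-D linear transform + hard threshold (Eq. 7)
        spec = dctn(block, norm='ortho')
        spec[np.abs(spec) <= lam_pre * sigma] = 0.0
        return spec

    r0, c0 = x_ref
    ref = prefiltered(guide[r0:r0 + n, c0:c0 + n])
    matches = []
    for r in range(0, guide.shape[0] - n + 1, step):
        for c in range(0, guide.shape[1] - n + 1, step):
            cand = prefiltered(guide[r:r + n, c:c + n])
            # By Parseval's theorem the l2 distance can be computed in the DCT domain
            if np.sum((ref - cand) ** 2) / n**2 <= tau_match:
                matches.append((r, c))
    return matches
```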

Collaborative filtering using hard-thresholding in Step 1

The collaborative filtering of \(Z_{S_{x_R}}\) is conducted in the 4-D domain using hard-thresholding (Step 1 in Fig. 1). This filtering maintains good sparsity while capturing the correlations (1) between the pixels of a single block, (2) between the pixels at corresponding spatial positions in the grouped blocks, and (3) between the pixels at corresponding temporal positions in the grouped blocks.

Similar 3-D patches are stacked to form 4-D groups for collaborative filtering. For BM3D, denoising exploits the sparsity of the spectrum of groups of similar blocks. As demonstrated in Fig. 2, the sparsity of the block spectrum is enhanced by introducing kinetic information, since GBM4D also considers the temporal correlation of the signals. The hard-thresholding filtering in the 4-D domain is expressed as:

$$\hat{Y}_{S_{x_R}} = \mathcal{T}_{4\mathrm{D}}^{-1}\left( \Upsilon\left( \mathcal{T}_{4\mathrm{D}}\big( Z_{S_{x_R}} \big) \right) \right)$$

(10)

Fig. 2 a Example of a 3-D spectrum of a group in the BM3D method, performed on a single sinogram frame of dynamic PET and transformed by a 3-D linear transform. b The 4-D spectrum of the group in the GBM4D method, performed on a dynamic PET sinogram and transformed by a 4-D linear transform. The 4-D spectrum is sparser than the 3-D spectrum

\(\mathcal{T}_{4\mathrm{D}}\) and \(\mathcal{T}_{4\mathrm{D}}^{-1}\) are the normalized 4-D linear transform and its inverse. In this work, a 3-D DCT in the spatial and temporal domains, followed by a 1-D DCT along the grouping direction, and their inverses are applied. \(\Upsilon\) denotes the hard-thresholding operator in Step 1 in Fig. 1:

$$\Upsilon(\upsilon) = \left\{ \begin{array}{ll} \upsilon, & \left| \upsilon \right| > \lambda_{4\mathrm{D}}\,\sigma \\ 0, & \mathrm{otherwise} \end{array} \right.$$

(11)

Here, \(\lambda_{4\mathrm{D}}\) is set to 2.8 based on a previous study [24]. After aggregation by weighted averaging (detailed in Sect. 2.1.5), the filtered blocks are returned to their original positions to form the basic estimate of the sinogram, \(\hat{y}^{\mathrm{basic}}(x,t)\), in Step 1 in Fig. 1.
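A minimal sketch of Eqs. (10)-(11): because the DCT is separable, the 3-D spatial-temporal DCT followed by a 1-D DCT along the grouping direction is applied here as a single 4-D `dctn`. The count of retained coefficients, commonly used for the aggregation weight, is also returned; this is an illustrative sketch, not the authors' code:

```python
import numpy as np
from scipy.fft import dctn, idctn

def collaborative_hard_threshold(group, sigma, lam_4d=2.8):
    """4-D transform, hard thresholding (Eq. 11), inverse transform (Eq. 10).
    `group` is an N x N x |T| x |S_xR| array."""
    spec = dctn(group, norm='ortho')             # separable, normalized 4-D DCT
    spec[np.abs(spec) <= lam_4d * sigma] = 0.0   # hard thresholding
    n_retained = np.count_nonzero(spec)          # later used for aggregation weight
    return idctn(spec, norm='ortho'), n_retained
```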

Grouping and collaborative Wiener filtering in Step 2

Step 1 gives a basic estimate of the true dynamic PET sinogram, \(\hat{y}^{\mathrm{basic}}(x,t)\). By accumulating \(\hat{y}^{\mathrm{basic}}(x,t)\) over time, the guide image in Step 2 is calculated as:

$$\hat{y}^{\mathrm{guide}}(x) = \mathop{\sum}\limits_{t \in T} \hat{y}^{\mathrm{basic}}(x,t)$$

(12)

The denoising is further improved by performing the grouping in Step 2 in Fig. 1 using the basic estimate and applying Wiener filtering.

As stated previously, the noise in the accumulated basic estimate, i.e., the guide image, is significantly attenuated, which helps to find more accurate block groups. The matched blocks in Step 2 are found as:

$$S_{x_R}^{\mathrm{wie}} = \left\{ x \in X : \frac{\big\| \hat{Y}_{x_R}^{\mathrm{guide}} - \hat{Y}_{x}^{\mathrm{guide}} \big\|_2^2}{N^2} \le \tau_{\mathrm{match}}^{\mathrm{wie}} \right\}$$

(13)

Denote \(\hat{Y}_{S_{x_R}^{\mathrm{wie}}}^{\mathrm{basic}}\) as the stack of grouped basic-estimate blocks and \(Z_{S_{x_R}^{\mathrm{wie}}}\) as the stack of grouped noisy sinogram blocks. The Wiener-filtered group is calculated as:

$$\hat{Y}_{S_{x_R}^{\mathrm{wie}}} = \mathcal{T}_{4\mathrm{D}}^{-1}\left( W_{S_{x_R}^{\mathrm{wie}}}\, \mathcal{T}_{4\mathrm{D}}\big( Z_{S_{x_R}^{\mathrm{wie}}} \big) \right)$$

(14)

where

$$W_{S_{x_R}^{\mathrm{wie}}} = \frac{\left| \mathcal{T}_{4\mathrm{D}}\big( \hat{Y}_{S_{x_R}^{\mathrm{wie}}}^{\mathrm{basic}} \big) \right|^2}{\left| \mathcal{T}_{4\mathrm{D}}\big( \hat{Y}_{S_{x_R}^{\mathrm{wie}}}^{\mathrm{basic}} \big) \right|^2 + \sigma^2}$$

(15)

Through Wiener filtering, the power spectrum of the basic estimate is used to filter the groups by minimizing the least-squares difference between the modeled and filtered signals. After aggregation by weighted averaging, the Wiener-filtered blocks are returned to their original positions to form the final estimate of the sinogram.
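A matching sketch of Eqs. (14)-(15), again using a separable 4-D DCT as the transform: the shrinkage coefficients are computed from the basic-estimate group and applied to the noisy group (illustrative, under the same assumptions as the previous sketch):

```python
import numpy as np
from scipy.fft import dctn, idctn

def collaborative_wiener(group_noisy, group_basic, sigma):
    """Empirical Wiener shrinkage in the 4-D transform domain."""
    spec_basic = dctn(group_basic, norm='ortho')
    w = spec_basic**2 / (spec_basic**2 + sigma**2)    # shrinkage weights, Eq. (15)
    filtered = idctn(w * dctn(group_noisy, norm='ortho'), norm='ortho')
    return filtered, w                                 # filtered group, Eq. (14)
```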

Aggregation by weighted average in Step 1 and Step 2

By returning the filtered blocks to their original positions, the estimates \(\hat{y}^{\mathrm{basic}}\) and \(\hat{y}^{\mathrm{final}}\) can be calculated for Step 1 and Step 2 in Fig. 1, respectively; this process is called aggregation. Weighted-average aggregation was adopted in this work:

$$\hat{y}(x,t) = \frac{\mathop{\sum}\nolimits_{x_R \in X} \mathop{\sum}\nolimits_{x_m \in S_{x_R}} \omega_{x_R}\, \hat{Y}_{x_m}^{x_R}(x,t)}{\mathop{\sum}\nolimits_{x_R \in X} \mathop{\sum}\nolimits_{x_m \in S_{x_R}} \omega_{x_R}\, \chi_{x_m}(x,t)}, \quad \forall x \in X$$

(16)

where \(\chi_{x_m}: X \to \{0,1\}\) is the characteristic function of the block located at \(x_m\) and \(\omega_{x_R}\) is the weight function based on [7]. A Kaiser window is also included in the weights to reduce border effects [7, 25].
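The sketch below illustrates Eq. (16) for a single 2-D frame: each filtered block is added back at its position with its group weight, and the accumulated weights (the weighted characteristic functions) normalize the result. The Kaiser window and the [7]-style weights are omitted for brevity, and all names are illustrative:

```python
import numpy as np

def aggregate(filtered_blocks, positions, weights, frame_shape, n):
    """Weighted-average aggregation of filtered N x N blocks (Eq. 16)."""
    num = np.zeros(frame_shape)
    den = np.zeros(frame_shape)
    for blk, (r, c), w in zip(filtered_blocks, positions, weights):
        num[r:r + n, c:c + n] += w * blk   # weighted block estimate
        den[r:r + n, c:c + n] += w         # weight times characteristic function
    den[den == 0] = 1.0                    # pixels covered by no block stay zero
    return num / den
```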

Experimental setup

Computer simulation

We performed computer simulations on the Shepp–Logan phantom (SLP) [26] and a digital brain phantom developed by Martin A. Belzunce et al. [27]. For the SLP, only physical decay was considered when generating the sinogram, and the reconstructed image size was 8 × 128 × 128 × 128. For the digital brain phantom, the TACs of gray matter, white matter, and tumor tissue were calculated with a compartment model. The pharmacokinetic parameters of gray matter, white matter, and tumor were \(K_{1}\) = 0.1104, 0.0622, 0.0640 mL/min/mL; \(k_{2}\) = 0.1910, 0.1248, 0.0890 1/min; and \(k_{3}\) = 0.1024, 0.0070, 0.0738 1/min. The fractional blood volume \(F_{v}\) was set to 0. The input function was extracted from previous work [28]. According to a previous study [2], a dynamic PET acquisition of 8 \(\times\) 6 min was simulated. The tumor size was 4 × 4 × 4 pixels, and the size of the sinograms was 8 × 128 × 128 × 128. The sinograms were generated by forward radon transformation using the Python scikit-image toolkit. After generating the noise-free sinogram, Poisson noise was added assuming a total photon count of \(5 \times 10^{8}\) according to previous simulation work [8]. The dynamic PET images were then reconstructed using 2D-OSEM with twenty iterations, eight subsets, a matrix size of 128 × 128, and no post-filter. The update equation of OSEM can be briefly described as:

$$\hat{f}_j^{(n,b+1)} = \frac{\hat{f}_j^{(n,b)}}{\sum\nolimits_{i \in S_b} H_{ij}} \mathop{\sum}\limits_{i \in S_b} H_{ij} \frac{p_i}{\sum\nolimits_{k} H_{ik}\, \hat{f}_k^{(n,b)}}$$

where \(f\) is the image under reconstruction, \(j\) and \(k\) are voxel indices, \(n\) is the iteration number, \(b\) is the subset number, \(i\) is the sinogram index, and \(S_b\) is subset \(b\). \(p\) is the measured sinogram, and \(H\) is the system matrix generated by the inverse radon transform using the Python scikit-image toolkit. The size of the reconstructed image was 8 × 128 × 128 × 128, and the voxel size was 1.5 × 1.5 × 1.5 mm³. All simulations and reconstructions were performed in Python.
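For illustration, the OSEM update above can be sketched with scikit-image's `radon`/`iradon` as forward and back projectors; the angle subsetting scheme and the small floor against division by zero are implementation assumptions, not the authors' exact code:

```python
import numpy as np
from skimage.transform import radon, iradon

def osem(sinogram, theta, img_shape=(128, 128), n_iter=20, n_subsets=8):
    """OSEM reconstruction sketch; `sinogram` has shape (detectors, angles)
    and `theta` is the array of projection angles in degrees."""
    f = np.ones(img_shape)
    subsets = [np.arange(s, len(theta), n_subsets) for s in range(n_subsets)]
    for _ in range(n_iter):
        for idx in subsets:
            th = theta[idx]
            fp = radon(f, theta=th)                           # forward projection
            ratio = sinogram[:, idx] / np.maximum(fp, 1e-12)  # measured / modeled
            back = iradon(ratio, theta=th, filter_name=None)  # unfiltered backprojection
            norm = iradon(np.ones_like(ratio), theta=th,
                          filter_name=None)                   # sensitivity term
            f *= back / np.maximum(norm, 1e-12)               # multiplicative update
    return f
```

Because the same backprojector is used for the ratio and the sensitivity term, its internal scaling cancels in the update.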

To validate the performance of GBM4D against other algorithms, total variation, wavelet, non-local means (NLM), and BM3D denoising were applied to the sinograms; all except BM3D were implemented with the scikit-image toolkit in Python. Total variation denoising seeks an image with a minimal total variation norm; the weight of the total variation term was set to 0.1 [29]. The non-local means algorithm replaces the value of a pixel with an average of a selection of similar non-local pixel values; the patch size and search area of NLM were set to 5 × 5 and 13 × 13 pixels [30]. Wavelet denoising uses the wavelet representation of the image to remove noise by shrinking all coefficients toward zero by a given amount; soft thresholding and BayesShrink were adopted [31]. During denoising, the robust wavelet-based estimator of the noise standard deviation was applied based on a previous study [32]. Before denoising, the generalized Anscombe transformation was performed on all sinograms, since all of these methods assume Gaussian rather than Poisson noise. The exact unbiased inverse of the Anscombe transformation was then applied to the denoised sinograms before reconstruction. To exclude the effect of the reconstruction algorithm, the ground-truth images were reconstructions of the noiseless sinograms.
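A sketch of the baseline comparison setup using scikit-image's restoration module with the parameters reported above; the NLM filtering strength `h` is an assumption (scikit-image requires it, but the text does not specify a value), and all functions operate on a variance-stabilized 2-D sinogram frame:

```python
from skimage.restoration import (denoise_nl_means, denoise_tv_chambolle,
                                 denoise_wavelet, estimate_sigma)

def denoise_baselines(sino_vst):
    """Apply TV, NLM, and wavelet denoising to an Anscombe-transformed frame."""
    sigma = estimate_sigma(sino_vst)   # robust wavelet-based sigma estimate [32]
    tv = denoise_tv_chambolle(sino_vst, weight=0.1)      # TV weight 0.1 [29]
    nlm = denoise_nl_means(sino_vst, patch_size=5,       # 5 x 5 patches
                           patch_distance=6,             # 13 x 13 search window
                           h=0.8 * sigma, sigma=sigma)   # h is an assumed value
    wav = denoise_wavelet(sino_vst, method='BayesShrink',
                          mode='soft')                   # sigma estimated internally
    return tv, nlm, wav
```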

For quantitative evaluation of the different denoising methods, the structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) were calculated. SSIM measures the similarity of the ground-truth and denoised images based on the degradation of structural information [33]:

$$\mathrm{SSIM} = \frac{\left( 2\mu_x \mu_y + c_1 \right)\left( 2\sigma_{xy} + c_2 \right)}{\left( \mu_x^2 + \mu_y^2 + c_1 \right)\left( \sigma_x^2 + \sigma_y^2 + c_2 \right)}$$

(17)

where \(\mu_x\) and \(\mu_y\) are the means of the ground-truth image and the denoised image, \(\sigma_x^2\) and \(\sigma_y^2\) are the variances of the ground-truth image and the denoised image, \(\sigma_{xy}\) is their covariance, and \(c_1 = c_2 = 0.01^2\). PSNR was calculated in this work to measure image quality at the pixel level:

$$\mathrm{PSNR} = 20\log_{10} \frac{\mathrm{peak}}{\mathrm{RMSE}}\;\mathrm{dB}$$

(18)

where RMSE is the root mean square error between the ground-truth image and the denoised image, and peak is the peak value of the ground-truth image. SSIM and PSNR were measured in each time frame. The TACs of the different tissues were measured in volumes of interest of 4 × 4 × 4 pixels; the region of interest (ROI) positions are shown in Fig. 7. To quantitatively measure the temporal smoothing performance of GBM4D, the RMSEs of the TACs of different tissues measured from the denoised images were calculated against the ground truth.
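The evaluation can be sketched with scikit-image's metric functions, which implement Eqs. (17) and (18); note that `peak_signal_noise_ratio` takes a `data_range` argument, so passing the ground-truth peak reproduces the definition above only when the image minimum is zero (an assumption of this sketch):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_frame(gt, denoised):
    """Per-frame SSIM (Eq. 17) and PSNR (Eq. 18) against the ground truth."""
    peak = gt.max()                # peak value of the ground-truth image
    ssim = structural_similarity(gt, denoised, data_range=peak)
    psnr = peak_signal_noise_ratio(gt, denoised, data_range=peak)
    return ssim, psnr
```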

Real patient data

The real patient data in this retrospective study come from an openly accessible dynamic PET list-mode dataset acquired on a Siemens Biograph mMR using the amyloid tracer 18F-florbetapir, provided by Avid Radiopharmaceuticals, Inc., a wholly owned subsidiary of Lilly [34, 35]. Data extraction and reconstruction of the dynamic PET data were performed offline using NiftyPET [36]. The reconstruction was performed in histogram mode with an image matrix size of 344 × 344 and no post-filter. The frame setting was also 8 \(\times\) 6 min. The dynamic PET images were then reconstructed using OSEM with four iterations and eight subsets. The reconstructed PET image size was 8 × 127 × 344 × 344. The direct and oblique sinograms were denoised separately. The denoising methods and parameters were the same as stated in the previous section.
