J Innov Med Technol 2024; 2(2): 61-68
Published online November 30, 2024
https://doi.org/10.61940/jimt.240007
© Korean Innovative Medical Technology Society
Jun-Ha Park1, Young Jae Kim2, Kwang Gi Kim1,3,4,5
1Department of Bio-Health Medical Engineering, Gachon University, Seongnam, Korea; 2Gachon Biomedical & Convergence Institute, Gachon University Gil Medical Center, Incheon, Korea; 3Department of Biomedical Engineering, College of Medicine, Gachon University Gil Medical Center, Incheon, Korea; 4Medical Devices R&D Center, Gachon University Gil Medical Center, Incheon, Korea; 5Department of Biomedical Engineering, College of IT Convergence, Gachon University, Seongnam, Korea
Correspondence to: Kwang Gi Kim
Department of Biomedical Engineering, College of Medicine, Gachon University Gil Medical Center, 38-13 Dokjeom-ro 3beon-gil, Namdong-gu, Incheon 21565, Korea
e-mail kimkg@gachon.ac.kr
https://orcid.org/0000-0001-9714-6038
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: Because computer-aided diagnoses can identify features that are not easily visible, their application to chest X-ray images is continuously increasing. However, most chest X-ray computer-aided diagnosis studies have been conducted using frontal projection radiographs, such as posterior-anterior or anterior-posterior, because other projections that might show otherwise obscured characteristic features are not widely accessible. Accordingly, we investigated rib semantic segmentation in chest X-ray images acquired in frontal and oblique projections.
Materials and Methods: Chest X-rays were captured in frontal and oblique projections, and three UNet-based models were employed for efficient segmentation. Chest X-ray images may exhibit overlapping tissues, which can negatively affect the results. To overcome this issue, rib enhancement preprocessing was carried out using a publicly available deep-learning rib suppression model.
Results: The U2Net model was found to be the most effective model when working with oblique data, achieving a Dice coefficient of 0.89 and significantly outperforming UNet and attention UNet with P-values of 0.03 and 0.00, respectively.
Conclusion: The performance difference between the oblique and frontal projections was hypothesized to be less pronounced than that between other projections. Despite employing only a limited amount of oblique projection data alongside the frontal projection data, we expected a significant performance improvement. This hypothesis was confirmed, revealing a significant difference at P<0.05. Further research on different projections may serve as a foundation for offering more diversified information to radiologists and researchers.
Keywords: Diagnosis, computer-assisted; X-ray; Deep learning; Ribs
A chest X-ray (CXR) is a widely used radiological method for diagnosing various diseases and injuries such as pneumonia, lung nodules, and rib fractures1,2. However, CXR loses accuracy due to overlapping tissues, which hinders the diagnostic process2,3. Because computer-aided diagnosis (CAD) identifies features that are imperceptible to the naked eye, it has become an attractive solution for CXR4,5. To facilitate this process, the National Institutes of Health released more than 100,000 anonymized CXR images in 2017, which now comprise the ChestX-ray8 dataset1. Wang et al.1, who provided the report released with ChestX-ray8, classified eight pathology classes with area under the curve values ranging from 0.56 to 0.81 using the ChestX-ray8 dataset and ResNet50. Subsequently, ChestX-ray8 has been utilized in numerous studies as a pre-training dataset for CXR classification networks, yielding superior performance compared to networks pre-trained with ImageNet2,6.
CXR serves as a routine diagnostic method for rib fractures, which result in significant morbidity, often accompany major trauma, and cause complications in the surrounding soft tissue7,8. Thus, identifying rib fractures serves as an indicator of trauma and a predictor of morbidity8. Although deep-learning (DL) based CAD enables the effective prediction of thoracic diseases using CXR, the diagnosis of rib fractures still requires refinement9. Tsai et al.9 trained two convolutional neural networks, namely UNet and EfficientNet-b0, to detect lung fields and classify the presence of fractures. To create a dataset, 4,366 CXR radiographs were gathered, including 3,411 normal images and 955 rib fracture images, in frontal and oblique (OBL) projections. They segmented the lung region of interest with a Dice coefficient of up to 0.95 and classified fractures with an area under the receiver operating characteristic curve of up to 0.97 in frontal images and 0.86 in OBL images.
Owing to the coronavirus disease 2019 (COVID-19) pandemic, CAD has become necessary to meet the increased demand for CXR and prevent contact between infected individuals and radiologists10-12. To diagnose COVID-19, Ismael and Şengür11 utilized convolutional neural networks for feature extraction and a support vector machine for deep feature classification. For fine-tuning, they adopted a dataset comprising 180 COVID-19 and 200 normal CXR radiographs. By applying a fine-tuned ResNet50 and a support vector machine with a linear kernel function, an accuracy of 94.7% was achieved. Tahir et al.12 proposed a collaborative human-machine approach to localizing and grading the severity of COVID-19 infections, with UNet, UNet++, and a feature pyramid network used to separate the lung and COVID-19 pneumonia masks. They segmented lung and infection regions with Dice coefficient scores of up to 0.98 and 0.88, respectively, and determined the degree of COVID-19 infection by calculating the percentage overlap between the two pixel masks.
These studies exemplify the utility of DL for CXR-based diagnoses. Although the rib projection in CXR can aid in identifying the relative locations of abnormalities, it can also obscure the identification of overlapping soft tissue13. Therefore, enhancing or suppressing the rib projection can facilitate manual diagnostic processes as well as data preprocessing for DL models. DL-based image segmentation has been found to be effective in this regard14,15. Oliveira et al.14 utilized 3D data obtained from computed tomography to generate 2D images resembling CXR, which were then used as rib segmentation training data. Their pipeline garnered an area under the curve higher than 0.86 on OpenIST and 0.93 on the Japanese Society of Radiological Technology database. They reported that rib segmentation is a crucial preprocessing step to ensure the diverse applicability of DL models. Wang et al.15 proposed a framework to enhance rib segmentation accuracy for limited labeled samples by utilizing unpaired sample augmentation and multi-scale networks. The framework achieved an accuracy of 88.03% in rib segmentation on CXR images. The authors emphasized that rib segmentation is a crucial step in computer-aided lung cancer diagnostic systems, facilitating the quantitative analysis of diverse lung diseases.
Although DL diagnostics using CXR images are effective, most CXR CAD studies involving rib bone segmentation utilize frontal projection radiographs, such as posterior-anterior (PA) and anterior-posterior (AP) projections, because other projections are often inaccessible16. Furthermore, CAD performance deteriorates across projections, resulting in significant variability17. However, other projections may be important to achieve a diagnosis, as they show characteristic features that are not visible in the frontal projection18-21. Accordingly, our study was conducted to assess model performance when supplementing frontal projection datasets with CXR images in the OBL projection.
We collected CXR images in frontal and OBL projections and employed three UNet-based models for semantic segmentation. By comparing the segmentation results of CXR images taken from both frontal and OBL projections, we elucidated the potential applications of CXR within CAD systems.
Our dataset consists of CXR images and corresponding mask images. A total of 127 CXR images in DICOM format were provided by Gachon University Gil Medical Center (GUGMC). To evaluate the networks' rib segmentation performance, we relied on masks derived from expert annotations. This retrospective study was approved by the Institutional Review Board of GUGMC, and the requirement for informed patient consent was waived (approval number: GBIRB-2019-337).
The CXR images vary in size and are typically between 2,000 and 3,000 pixels in width and height. To prevent memory resource issues and ensure an adequate batch size, all images were resized to a resolution of 512×512 pixels. The overall dataset includes images taken from the PA, AP, and OBL projections. Two subsets were utilized for training: one including only the PA and AP projections and the other encompassing all projections. The projection of each image was ascertained by analyzing DICOM headers. All mask images had binary values of 0 or 255 and possessed the same resolution as the CXR images; they were resized while still retaining their binary values. Each mask only identifies whether a pixel belongs to a rib, with no indication of the location or number of individual ribs. Fig. 1 illustrates an example of the data. A minimal loading sketch is shown below.
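The following sketch illustrates this loading step under stated assumptions: it presumes pydicom and OpenCV are available and that the projection is recorded in the standard DICOM View Position tag (0018,5101), which the paper does not explicitly name. Nearest-neighbor interpolation is used for the masks so that they retain their binary values.

```python
# Hypothetical loading/resizing sketch; the tag name and interpolation choices
# are assumptions, not the authors' confirmed pipeline.
import cv2
import numpy as np
import pydicom

TARGET_SIZE = (512, 512)

def load_cxr(dicom_path: str):
    """Read a CXR DICOM, resize it to 512x512, and report its projection."""
    ds = pydicom.dcmread(dicom_path)
    img = ds.pixel_array.astype(np.float32)
    img = cv2.resize(img, TARGET_SIZE, interpolation=cv2.INTER_AREA)
    projection = getattr(ds, "ViewPosition", "")  # e.g., "PA" or "AP"; assumed tag
    return img, projection

def resize_mask(mask: np.ndarray) -> np.ndarray:
    """Nearest-neighbor resizing keeps mask values strictly binary (0 or 255)."""
    return cv2.resize(mask, TARGET_SIZE, interpolation=cv2.INTER_NEAREST)
```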
Our objective was to enhance the performance of the segmentation networks by integrating OBL data with frontal projection images. However, only 22 of the 127 images represented OBL projection data. To ensure a sufficient quantity of validation data, we conducted k-fold validation with k=5, as sketched below.
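A minimal sketch of such a split follows; the paper does not describe how folds were stratified, so a plain shuffled five-fold split over image paths is assumed here.

```python
# Assumed 5-fold split; stratification by projection is not described in the
# paper, so a plain shuffled KFold is used purely for illustration.
import glob
from sklearn.model_selection import KFold

image_paths = sorted(glob.glob("cxr/*.dcm"))  # hypothetical data location

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kfold.split(image_paths)):
    train_set = [image_paths[i] for i in train_idx]
    val_set = [image_paths[i] for i in val_idx]
    # one model is trained per fold; validation Dice scores are aggregated
```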
Because various tissues may overlap in CXR images, it is essential to preprocess the images by enhancing or suppressing identifiable tissues. Extensive research has been conducted on the suppression of rib bones in CXR using DL-based image generators4,22. Such generators can produce rib suppression images that enhance performance when combined with raw image data. We utilized this approach to supplement faint features that may have been obscured in the original data, assigning each image to one RGB channel. To evaluate the impact of preprocessing, we compared the original data with data enhanced using a ResNet-based bone suppression network, as suggested by Rajaraman et al.4 Histogram equalization was applied to each image. Fig. 2 illustrates the preprocessing sequence, and Fig. 3 presents an example result.
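The exact channel recipe is not spelled out in the text, so the sketch below shows one plausible composition under that caveat: the original image, the suppression network's output, and a rib-enhanced image (original minus suppressed) are equalized and stacked as RGB channels.

```python
# Plausible channel composition, not the authors' confirmed recipe: stack the
# original, bone-suppressed, and rib-enhanced images as RGB channels after
# histogram equalization.
import cv2
import numpy as np

def to_uint8(img: np.ndarray) -> np.ndarray:
    """Min-max normalize to the 0-255 range required by cv2.equalizeHist."""
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    return (img * 255).astype(np.uint8)

def compose_channels(original: np.ndarray, suppressed: np.ndarray) -> np.ndarray:
    enhanced = np.clip(original.astype(np.float32) - suppressed, 0, None)
    channels = [cv2.equalizeHist(to_uint8(c)) for c in (original, suppressed, enhanced)]
    return np.stack(channels, axis=-1)  # (H, W, 3) network input
```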
For the rib segmentation task, we employed UNet, attention UNet, and U2Net, as implemented in the “keras-unet collection” by Sha23 on GitHub. UNet is a semantic segmentation network commonly used as a primary tool in medical imaging. Consisting of a symmetric encoder and decoder, UNet is able to generate detailed segmentation maps with limited training data, making it particularly valuable for medical imaging24.
We compared performance between UNet and its two variants: attention UNet and U2Net. The attention UNet, proposed by Oktay et al.25 in 2018, integrates the attention mechanism, known as the attention gate, into the skip connection. These attention gates ensure that the network focuses on target structures of varying shapes and sizes, thereby improving prediction performance. U2Net was proposed by Qin et al.26 in 2020 to capture more contextual information at different scales and increase the depth of the overall architecture without significantly increasing the computational cost. Fig. 4 illustrates the structures of these networks.
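As an illustration, the three networks can be constructed from the cited keras-unet-collection package roughly as follows; the filter counts and activations here are assumptions, and keyword names may differ between package versions.

```python
# Illustrative model construction with the cited keras-unet-collection
# package; hyperparameters are assumed, not taken from the paper.
from keras_unet_collection import models

input_size = (512, 512, 3)  # (512, 512, 1) for the single-channel experiments

unet = models.unet_2d(input_size, filter_num=[64, 128, 256, 512],
                      n_labels=1, output_activation='Sigmoid')
att_unet = models.att_unet_2d(input_size, filter_num=[64, 128, 256, 512],
                              n_labels=1, output_activation='Sigmoid')
u2net = models.u2net_2d(input_size, n_labels=1,
                        filter_num_down=[64, 128, 256, 512],
                        output_activation='Sigmoid')
```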
Because the masks do not isolate individual bones, the segmentation task is a binary classification problem for each pixel. Network performance was evaluated by comparing the mask pixels predicted by the networks with the ground truth. We adopted the Jaccard index, also known as intersection over union (IoU), and the Dice coefficient as performance metrics27. We compared performance under three distinct conditions: the network utilized, the implementation of preprocessing, and the integration of OBL data.
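For reference, both metrics reduce to simple set operations over binary masks, as in this minimal NumPy sketch.

```python
# Dice coefficient and Jaccard index (IoU) over binary masks.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + eps)

def jaccard(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / (union + eps)
```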
To verify the impact of including oblique data, we compared the results of the three segmentation networks on three datasets. Performance was assessed by computing Dice coefficients between the predictions and radiographer-labeled masks. The resulting coefficients are presented in Table 1 and illustrated through box plots in Fig. 5.
Table 1. Results of each network and dataset in terms of the Dice coefficient

| Network | Including oblique data | Dataset-A | Dataset-B | Dataset-C | Network mean |
|---|---|---|---|---|---|
| UNet | True | 0.88 | 0.88 | 0.88 | 0.88 |
| UNet | False | 0.88 | 0.88 | 0.88 | 0.88 |
| U2Net | True | 0.89 | 0.89* | 0.89 | 0.89 |
| U2Net | False | 0.89 | 0.88 | 0.89 | 0.88 |
| Attention UNet | True | 0.87 | 0.87 | 0.87 | 0.87 |
| Attention UNet | False | 0.87 | 0.89 | 0.87 | 0.86 |
| Dataset mean | True | 0.88 | 0.88 | 0.88 | |
| Dataset mean | False | 0.87 | 0.87 | 0.87 | |
Dataset-A: only histogram equalization, Dataset-B: bone enhancement with histogram equalization, Dataset-C: RGB images.
*P<0.05.
The U2Net model was demonstrated to be the most effective when working with OBL data, achieving a Dice coefficient of 0.89. It significantly outperformed UNet and attention UNet, yielding P-values of 0.03 and 0.00, respectively, on the same dataset. Preprocessing was found to be ineffective.
The Dice coefficients for the OBL images were lower than those for the frontal images by 0.05 to 0.11 for the mean and by 0.06 to 0.14 for the bottom quartile. These differences are attributed to the data projection and the smaller number of samples. However, a 20% proportion of OBL data was associated with a performance improvement of 0.06 for the average and 0.08 for the bottom quartile. The average and quartiles of the Dice coefficients for each projection are shown in Table 2.
Table 2. Average and quartiles of the Dice coefficient for each projection, in U2Net with single-channel data and preprocessing

| Projection | Inclusion of oblique | Average | 25th percentile | Median | 75th percentile |
|---|---|---|---|---|---|
| Frontal | True | 0.90 | 0.89 | 0.91 | 0.93 |
| Oblique | True | 0.85 | 0.83 | 0.87 | 0.89 |
| Frontal | False | 0.90 | 0.89 | 0.91 | 0.92 |
| Oblique | False | 0.79 | 0.75 | 0.81 | 0.87 |
The performance enhancement was more discernible for results with lower scores than for those with higher scores. Although we anticipated improved performance for frontal projections due to the additional information provided by OBL data, no such improvement was observed. To determine whether there was a significant difference in performance based on the inclusion of OBL data, we calculated P-values between the inference results of the models; the results are shown in Table 3.
Table 3. P-values based on inclusion of oblique data

| Network | Dataset-A | Dataset-B | Dataset-C |
|---|---|---|---|
| UNet | 0.41 | 0.41 | 0.12 |
| U2Net | 0.09 | 0.02* | 0.07 |
| Attention UNet | 0.37 | 0.43 | 0.52 |
*P<0.05.
We computed P-values to determine whether the inclusion of OBL data yields a statistically significant improvement in performance. A significant difference at P<0.05 was observed when using U2Net with single-channel data and preprocessing. Fig. 6 presents the corresponding results.
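The text does not state which significance test produced these P-values; as one hedged possibility, a paired test over per-image Dice scores could be computed as follows (scipy.stats.wilcoxon would be the non-parametric alternative).

```python
# Assumed significance test: the paper does not name one, so a paired t-test
# over per-image Dice scores of the two training conditions is sketched here.
from scipy import stats

def compare_conditions(dice_with_obl, dice_without_obl):
    """Paired comparison of Dice scores for the same validation images."""
    t_stat, p_value = stats.ttest_rel(dice_with_obl, dice_without_obl)
    return p_value
```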
Our study assessed the effect on model performance of supplementing frontal projection data with OBL-projected images. To compare model performance with and without OBL data, we created two datasets representing the two cases. These datasets were used to train three models: UNet, U2Net, and attention UNet. Performance was evaluated by comparing the Jaccard indexes and Dice scores obtained by the models.
Because OBL data provide additional information on overlapping tissues20, we anticipated that such data would offer significant input for these tissues and prove comparable to frontal-projected data. Furthermore, we expected that even a small amount of data could have a considerable impact.
We found U2Net to outperform the other two networks, likely because its nested structure permits sufficient training with less data. Because our dataset comprises 127 images, only 22 of which represent OBL data, it is crucial to validate these findings on a more extensive range of data.
We also employed a DL-based bone-suppression preprocessing procedure. The suppression model uses ResNet-BS, as proposed by Rajaraman et al.4, with the network obtained from their GitHub repository and applied at the proposed input and output size of 224×224. Consequently, our preprocessing tool reduced the resolution of the bone suppression images. This reduction in resolution, along with the limited dataset, may have impaired performance, especially for cases where the bones are already clearly distinguishable. Therefore, a thorough review with a larger dataset is necessary.
In this study, we assessed the effect of augmenting frontal projection data with CXR images from OBL projections on model performance. Our results reveal that including OBL data increased the average performance on OBL images by 0.06, with the lower quartile exhibiting an increase of 0.08. We anticipated improved performance for frontal projections due to the additional information about overlapping tissues, but no such improvement was observed. Studies of different projections will provide a greater understanding of CXR applications at various projections in CAD systems.
None.
No potential conflict of interest relevant to this article was reported.
This work was supported by the GRRC program of Gyeonggi province [GRRC-Gachon2023(B01), Development of AI-based medical imaging technology].