Original Article

J Innov Med Technol 2024; 2(1): 11-19

Published online May 30, 2024

https://doi.org/10.61940/jimt.240002

© Korean Innovative Medical Technology Society

Semantic segmentation networks of organs in minimally invasive surgery

Jun-Ha Park1 , Young Jae Kim2 , Kwang Gi Kim1,3,4,5

1Department of Bio-Health Medical Engineering, Gachon University Gil Medical Center, Incheon, Korea, 2Gachon Biomedical & Convergence Institute, Gachon University Gil Medical Center, Incheon, Korea, 3Medical Devices R&D Center, Gachon University Gil Medical Center, Incheon, Korea, 4Department of Biomedical Engineering, Gachon University Gil Medical Center, Incheon, Korea, 5Department of Health Sciences & Technology, Gachon Advanced Institute for Health Sciences & Technology (GAIHST), Gachon University, Lee Gil Ya Cancer and Diabetes Institute, Incheon, Korea

Correspondence to : Kwang Gi Kim
Department of Biomedical Engineering, Gachon University Gil Medical Center, 38-13 Dokjeom-ro 3beon-gil, Namdong-gu, Incheon 21565, Korea
e-mail kimkg@gachon.ac.kr
https://orcid.org/0000-0001-9714-6038

Received: April 26, 2024; Accepted: April 29, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: Minimally invasive surgery (MIS) and robot-assisted surgery have gained recognition as procedures that are safer than traditional laparotomy and facilitate faster patient recovery. However, MIS limits the surgeon's sensory perception. Therefore, a computer-assisted algorithm is proposed to assist in such surgery. With the advent of convolutional neural networks, machine vision technology has become an attractive option.
Materials and Methods: We use four networks, TernausNet, TernausResNet, LinkNet, and DeepLab V3+, to predict organ segments in endoscopy images. Furthermore, endoscopy images have several issues such as noise, hemorrhage, and shading. Therefore, we perform preprocessing and compare the results obtained with and without preprocessing.
Results: The network with the lowest performance is TernausNet; the performances of the other three networks show marginal differences. The most significant factor affecting prediction performance is the encoder network. All networks demonstrate reliable performance, with a minimum intersection over union score of 0.68 for TernausNet.
Conclusion: The segmentation of organs in images can be used for the quantitative evaluation of surgery and to help surgeons understand anatomy.

Keywords Deep learning; Artificial intelligence; Diagnosis, computer-assisted; Minimally invasive surgical procedures

Introduction

Minimally invasive surgery (MIS) and robot-assisted surgery (RAS) have gained recognition for their ability to provide safer procedures and facilitate faster patient recovery than traditional laparotomy, resulting in reduced hospitalization durations1,2. However, surgeons encounter challenges in effectively manipulating surgical tools and obtaining a comprehensive understanding of the tissues at the surgical site due to their reliance on screens and endoscopic instruments for information gathering3,4. To overcome these challenges, researchers have actively conducted studies aimed at providing valuable feedback information by attaching sensors to instruments or employing computer vision technology to supplement the visual information5-8. Notably, computer vision-assistive technology has emerged as a promising solution, capitalizing on the rapid advancements in deep learning and convolutional neural networks (CNNs)7-9.

Even before the advent of deep learning and CNNs, researchers have proposed algorithms that leverage classical computer vision techniques to assist with MIS. In a notable 2003 study, Lo et al.10 introduced an algorithm aimed at evaluating tissue–instrument interactions through image analysis. Their approach involved instrument segmentation and tracking using color segmentation while quantitatively assessing tissue deformation caused by instruments through optical flow and shape-from-shading techniques. In a separate study conducted in 2006, Bilodeau et al.11 proposed an algorithm to segment the cavity and thereby aid surgeons in performing thoracic laminectomies; to validate their method, they segmented the cavity using laparoscopic images of a surgery performed on a pig. Their technique involved splitting and merging the cavity into meaningful regions using a multilevel graph approach known as the recursive shortest spanning tree. Both studies are significant because they utilize computer vision techniques to track instruments and aid surgeons performing MIS. These methods offer benefits such as lower computational costs and interpretability of the segmentation process when compared to deep learning approaches. However, they exhibit notably lower generalization capabilities than deep learning methods, making their application in endoscopic environments challenging due to the presence of multiple variables12.

Deep learning-based semantic segmentation has demonstrated its superiority over classical methods in achieving more accurate segmentation, particularly exhibiting enhanced generalization capabilities, which render it well-suited for endoscopic images encompassing multiple variances like illuminance, blurring, light spillage, hemorrhage, and overshadowing12-15. In 2018, Shvets et al.14 proposed a deep-learning-based semantic segmentation method for robotic instrument detection and tracking. They conducted binary and multiclass classifications of instruments in porcine surgical images acquired using the da Vinci Xi surgical system. The authors employed U-net-based networks, specifically TernausNet16 and LinkNet34, achieving a binary classification intersection over union (IoU) score of 0.66 and a multiclass classification IoU score of 0.35 using TernausNet16. Despite the effectiveness of deep-learning-based semantic segmentation in laparoscopic images, it exhibits insufficient multiclass classification performance. In 2020, Scheikl et al.15 proposed the application of semantic segmentation to assist surgeons in scene understanding during laparoscopic surgery. They trained segmentation networks, including U-net, TernausNet, LinkNet, FCN, and SegNet, using various encoder networks. Three loss functions, namely soft-Jaccard, generalized Dice, and cross entropy, were cross-applied to laparoscopic cholecystectomy images using labeled data categories such as image exterior, liver, gallbladder, instrument, fat, and others. The network trained with TernausNet11 using the soft-Jaccard function achieved a maximum IoU score of 0.78 for image segmentation.

In 2021, Sun et al.16 proposed a lightweight segmentation network for the real-time detection and tracking of surgical instruments in RAS. Their approach used a lightweight encoder, obtained by applying the Ghost module to MobileNetV3, for real-time image segmentation and employed Lite R-ASPP as the decoder. The network was trained using image sequences obtained from the da Vinci Xi system provided by the MICCAI Endoscopic Vision Challenge 2017. The segmentation achieved a speed of approximately 37.0 frames per second (FPS) with an accuracy of 0.70; notably, this real-time speed was accomplished with a minimal compromise in accuracy. In 2019, Ni et al.17 conducted surgical instrument segmentation using RASNet, incorporating a decoder with an attention mechanism. They utilized RAS images from the MICCAI Endoscopic Vision Challenge 2017 dataset and achieved an IoU score of 0.90. To address the issue of background class imbalance in surgical images, they implemented the global attention upsample, which focused on the features of surgical instruments. This approach resulted in a 7.58% improvement in the IoU score compared with that of the baseline model. By directing attention to the instruments through the attention mechanism, the challenge of class imbalance was effectively mitigated, leading to a substantial enhancement in the multiclass classification performance for various surgical instruments. However, it is important to note that both studies focused specifically on segmenting surgical instruments and did not address other elements, such as organs, within the surgical images.

In MIS, the primary objective of image segmentation is to delineate surgical instruments accurately and assist surgeons in understanding the precise positioning of these instruments during the procedure. However, it is equally important to identify and analyze organs within MIS images, which play a crucial role in computer-aided diagnosis15,18. The accurate segmentation of organs in endoscopy images relies on the effective handling of variables, and multiple segmentations of organs are required. In this context, deep-CNN-based multiclass segmentation has emerged as a compelling solution. Therefore, this study proposes an image segmentation method for organs that employs a deep CNN as an encoder within a multiclass segmentation network and whose primary goal is to provide surgical assistance. This study provides a detailed description of the proposed solution, including the architecture of the semantic segmentation network, the learning process of the network, and a comprehensive analysis of the results obtained by segmenting organs in real surgical images.

Our study focuses on organ semantic segmentation during MIS. Therefore, we utilize several semantic segmentation networks, including TernausNet, TernausResNet, LinkNet, and DeepLab V3+ for MIS images. Endoscopy images used in MIS may demonstrate noise, hemorrhage, and shading issues. To address this problem, we apply preprocessing techniques and compare the results obtained using the datasets with and without preprocessing; as a result, the most influential factor is observed to be the encoder network, whereas the decoder network and preprocessing have marginal variances. Organ-semantic segmentation in MIS images has a different purpose: enabling the quantitative evaluation of surgery and assisting surgeons in understanding anatomical structures.

Fig. 1A illustrates the training sequence of the segmentation networks. The networks are implemented using TensorFlow 2.6.0 with CUDA 11.3 and cuDNN 8.2.1. The Adam optimizer is employed, with the learning rate, beta1, and beta2 set to 0.001, 0.9, and 0.999, respectively. The number of epochs is set to a maximum of 1,000, and early stopping is applied with a patience of 20. During training, the networks are evaluated using a validation dataset and the top-performing networks are saved.
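The stopping rule described above can be illustrated with a minimal, framework-independent sketch. It assumes that a higher validation score is better and that the score is checked once per epoch; the monitored quantity is not stated in the text, and the function name `train_with_early_stopping` and the `val_scores` list (a stand-in for evaluating the network after each epoch) are hypothetical.

```python
def train_with_early_stopping(val_scores, max_epochs=1000, patience=20):
    """Simulate early stopping over a sequence of per-epoch validation scores.

    Returns (best_epoch, best_score, last_epoch_run). `val_scores` stands in
    for evaluating the network on the validation dataset after each epoch.
    """
    best_score = float("-inf")
    best_epoch = -1
    last_epoch = -1
    wait = 0  # epochs elapsed since the last improvement
    for epoch, score in enumerate(val_scores[:max_epochs]):
        last_epoch = epoch
        if score > best_score:
            # A real training loop would save the network weights here.
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_score, last_epoch


# Toy validation curve (assumed values): improves for three epochs, then plateaus.
scores = [0.5, 0.6, 0.7] + [0.65] * 30
print(train_with_early_stopping(scores))
```

In this toy run the best checkpoint is the one from the third epoch, and training halts once 20 further epochs pass without improvement, well before the 1,000-epoch ceiling.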

Figure 1. (A) Flowchart of the research. (B) Flowchart of data preprocessing. IoU: intersection over union, CLAHE: contrast-limited adaptive histogram equalization.

Dataset

This retrospective study was approved by the Institutional Review Board of Gachon University Gil Hospital, and the requirement for patient informed consent was waived (approval number: GDIRB2020-346). The raw data comprise PNG images with a resolution of 1,920×1,080 pixels and RGB channels, along with XML annotations containing polygon masks that outline each organ. The mask is encoded as a one-hot vector comprising four binary images of the same size as the original image. To reduce the training time, the image resolution is reduced to 512×288, which maintains the 16:9 aspect ratio of the image. Table 1 presents the components of the data, including the number of classes. Each image contains one or more classes, and the total number of images is 2,244.

Table 1 Components of the data, including the corresponding number of classes

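| Raw data category | Liver | Gallbladder | Spleen | Total |
|---|---|---|---|---|
| Each | 231 | 60 | 847 | 1,138 |
| Liver–gallbladder | 936 | 936 | 0 | 936 |
| Liver–spleen | 160 | 0 | 160 | 160 |
| Liver–gallbladder–spleen | 10 | 10 | 10 | 10 |
| Total | 1,337 | 1,006 | 1,017 | 2,244 |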

Each image contains one or more classes, with a total of 2,244 images.
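As an illustration of the mask encoding described in the Dataset section, the following sketch converts an integer label map into four binary images with NumPy. The class order (background, liver, gallbladder, spleen) is an assumption, since the text states only that the mask comprises four binary images; `to_one_hot` and the toy label map are hypothetical.

```python
import numpy as np

# Assumed channel order; the source only specifies "four binary images".
CLASS_NAMES = ("background", "liver", "gallbladder", "spleen")


def to_one_hot(label_map, num_classes=len(CLASS_NAMES)):
    """Convert an (H, W) integer label map into an (H, W, C) stack of binary masks."""
    return np.eye(num_classes, dtype=np.uint8)[label_map]


# Toy label map at the reduced training resolution (288 rows x 512 columns).
label_map = np.zeros((288, 512), dtype=np.int64)
label_map[100:200, 50:250] = 1    # rectangular stand-in for a liver region
label_map[120:160, 300:360] = 2   # rectangular stand-in for a gallbladder region

mask = to_one_hot(label_map)
print(mask.shape)

# The original and reduced resolutions share the same 16:9 aspect ratio.
assert 1920 * 288 == 1080 * 512
```

Each pixel is set in exactly one of the four channels, so the channels can be compared directly with a four-channel softmax output.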


Preprocessing is intended to prevent noise and unwanted features from affecting network training; its effect is evaluated by comparing networks trained on the preprocessed dataset with networks trained on the dataset without preprocessing. Normalization, contrast-limited adaptive histogram equalization (CLAHE), and Gaussian blur are applied as preprocessing steps19. Fig. 1B depicts the preprocessing process.
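Two of the preprocessing steps named above, normalization and Gaussian blur, can be sketched with NumPy alone. This is an illustrative sketch rather than the pipeline used in the study: the kernel size and sigma are assumed values that the text does not specify, and CLAHE is omitted because it is normally taken from an image-processing library.

```python
import numpy as np


def normalize(image):
    """Scale 8-bit pixel values to the range [0, 1]."""
    return image.astype(np.float32) / 255.0


def gaussian_kernel(size=5, sigma=1.0):
    """One-dimensional Gaussian kernel that sums to one (size should be odd)."""
    ax = np.arange(size) - (size - 1) / 2.0
    kernel = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image, size=5, sigma=1.0):
    """Separable Gaussian blur of an (H, W, C) float image with edge padding."""
    kernel = gaussian_kernel(size, sigma)
    pad = size // 2
    height, width = image.shape[:2]
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Filter along the vertical axis, then along the horizontal axis.
    vertical = sum(kernel[i] * padded[i:i + height, :, :] for i in range(size))
    return sum(kernel[j] * vertical[:, j:j + width, :] for j in range(size))


# Toy RGB frame with assumed random content.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(288, 512, 3), dtype=np.uint8)
blurred = gaussian_blur(normalize(frame))
print(blurred.shape, float(blurred.min()), float(blurred.max()))
```

Because the kernel sums to one, the blur preserves the overall intensity range while suppressing pixel-level noise.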

Network architecture

The network architectures are constructed based on the U-net model, which comprises an encoder (or backbone network) and a decoder. The architecture includes skip connections that facilitate the recovery of resolution by transporting low-level features to the decoder20, as illustrated in Fig. 2A. Two variants of TernausNet, LinkNet, and DeepLab V3+ are used for image segmentation of organs, with each network’s encoder pre-trained on the ImageNet dataset14,21-24.

Figure 2. (A) Network architecture of U-net with an encoder and decoder. (B) Decoder block of TernausNet. (C) Decoder block of LinkNet. (D) Network architecture of DeepLab V3+. DCNN: deep convolutional neural network.

TernausNet is a semantic segmentation network proposed by Iglovikov and Shvets21 in 2018. It utilizes a VGG11 encoder, and its architecture is inspired by U-net. The network concatenates the low-level and decoded features. In our study, VGG16 and ResNet50 are used instead of VGG11; hereafter, these networks are referred to as TernausNet and TernausResNet, respectively21,25,26. Fig. 2B illustrates the decoder block of TernausNet. LinkNet is a semantic segmentation network proposed by Chaurasia and Culurciello22 in 2017. It is based on U-net and applies a ResNet18 encoder, and it builds its skip connections as residual connections between the low-level and decoded features. Fig. 2C illustrates the decoder block of LinkNet. DeepLab V3+ is a semantic segmentation network proposed by the Google Brain team in 2018 as an enhancement of DeepLab V3, which was introduced in 2017. By using atrous spatial pyramid pooling (ASPP), the DeepLab V3+ encoder processes high-level features at multiple scales23. DeepLab V3+ has an encoder-decoder structure: the backbone is Xception, and the mask is decoded by the ASPP and decoding modules23. The architecture is illustrated in Fig. 2D.

Loss function

The loss function is composed of two components: cross entropy, which evaluates the pixel-wise classification accuracy over the entire image, and the IoU loss, which evaluates the overlap of each label prediction. The IoU loss is defined as the negative logarithm of the IoU, and the loss function is based on the study by Shvets et al.14. Equation (1) represents the loss function:

$$\mathrm{Loss} = \mathrm{Cross\ Entropy} - \log(\mathrm{IoU}) \tag{1}$$
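The combined loss described above, cross entropy plus the negative logarithm of the IoU, can be sketched numerically with NumPy. This is a minimal illustration under stated assumptions, not the training code of the study: the soft (differentiable) IoU, the averaging over classes, and the `eps` stabilizer are assumed choices, and the function names are hypothetical.

```python
import numpy as np


def cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean pixel-wise categorical cross entropy; both arrays are (H, W, C)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=-1)))


def soft_iou(y_true, y_pred, eps=1e-7):
    """Soft IoU computed per class from probabilities, then averaged over classes."""
    intersection = np.sum(y_true * y_pred, axis=(0, 1))
    union = np.sum(y_true + y_pred, axis=(0, 1)) - intersection
    return float(np.mean((intersection + eps) / (union + eps)))


def combined_loss(y_true, y_pred):
    """Cross entropy minus the logarithm of the IoU, following Equation (1)."""
    return cross_entropy(y_true, y_pred) - np.log(soft_iou(y_true, y_pred))


# Toy 2x2 image with four classes (assumed example data).
labels = np.array([[0, 1], [2, 3]])
y_true = np.eye(4)[labels]
y_uniform = np.full_like(y_true, 0.25)  # an uninformative prediction

print(combined_loss(y_true, y_true))     # perfect prediction
print(combined_loss(y_true, y_uniform))  # uninformative prediction
```

A perfect prediction drives both terms to zero, whereas an uninformative prediction is penalized by both the cross-entropy term and the IoU term.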

Data preprocessing uses normalization by default, with histogram equalization through CLAHE and Gaussian blur as options. To evaluate the effectiveness of each preprocessing method, the networks were trained using distinct preprocessed datasets. The network predictions were validated by comparing each predicted area with the labels assigned by the surgeons. The IoU score and Dice coefficient are calculated to compare the performance of each network14,15,23. These coefficients range from zero to one, where a value closer to one indicates a higher similarity between the two areas. Table 2 lists the IoU score and Dice coefficient based on the type of preprocessing and network architecture.

Table 2 IoU score and Dice coefficient based on the type of preprocessing and network architecture

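| Network | Preprocessing | IoU | Dice | Latency (ms) |
|---|---|---|---|---|
| TernausNet | ○ | 0.68±0.25 | 0.75±0.25 | 85 |
| TernausResNet | ○ | 0.76±0.23 | 0.82±0.23 | 103 |
| LinkNet | ○ | 0.74±0.25 | 0.80±0.24 | 85 |
| DeepLab V3+ | ○ | 0.75±0.23 | 0.81±0.23 | 87 |
| TernausNet | × | 0.69±0.25 | 0.75±0.25 | 86 |
| TernausResNet | × | 0.74±0.25 | 0.80±0.24 | 102 |
| LinkNet | × | 0.73±0.25 | 0.79±0.25 | 88 |
| DeepLab V3+ | × | 0.74±0.23 | 0.80±0.22 | 86 |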

Values are presented as mean±standard deviation.

IoU: intersection over union.
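For reference, the two overlap metrics reported in Table 2 can be computed for a pair of binary masks as in the following sketch. The handling of two empty masks (returning 1.0) is an assumed convention that the text does not specify, and the function names are hypothetical.

```python
import numpy as np


def iou_score(pred, target):
    """Intersection over union of two binary masks."""
    pred, target = np.asarray(pred, dtype=bool), np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(np.logical_and(pred, target).sum() / union)


def dice_coefficient(pred, target):
    """Dice coefficient of two binary masks."""
    pred, target = np.asarray(pred, dtype=bool), np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2 * np.logical_and(pred, target).sum() / total)


# Toy masks: the prediction covers two pixels, one of which matches the label.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(iou_score(pred, target), dice_coefficient(pred, target))
```

For the same pair of masks the Dice coefficient is never lower than the IoU, which is consistent with the Dice column exceeding the IoU column in every row of Table 2.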


TernausNet has the lowest performance among the networks, showing an IoU score approximately 8.58% lower and a Dice coefficient approximately 7.11% lower than the average. In contrast, TernausResNet has the highest IoU score and Dice coefficient among the networks, although the margin is small. Networks trained using preprocessed datasets perform slightly better than those trained using unprocessed datasets. Examples of the network predictions are shown in Fig. 3.

Figure 3. Examples of predictions by networks trained on preprocessed datasets. Blue: liver, green: gallbladder, red: spleen.

TernausResNet, trained on preprocessed datasets, exhibits the best performance among the networks. To assess the performance within each class, the IoU scores and Dice coefficients are calculated for each class in TernausResNet, which has been trained on preprocessed datasets. The gallbladder class has an IoU score of 0.77, indicating the highest performance; by contrast, the liver class has an IoU score of 0.72, which represents a lower but nonetheless reliable performance (Fig. 4).

Figure 4. Box plot of the intersection over union (IoU) score and Dice coefficient for each class in TernausResNet.

Fig. 5 shows three cases of poor performance. Case 1 shows an image including a hemorrhage, case 2 shows overshadowing, and case 3 shows edge fading owing to light spillage. In these cases, the endoscopy image has noise, hemorrhage, and shading issues; these problems confuse the network and result in erroneous class predictions27. To address these problems and improve the network performance, a follow-up study should collect additional images that correspond to these cases and develop automated exclusion systems or generalization techniques.

Figure 5. Examples of poor prediction cases. Case 1 includes hemorrhage, case 2 includes excessive shadows on the object class, and case 3 includes unintentional blurring between instrument and organ.

In studies by Shvets et al.14 and Scheikl et al.15, TernausNet16 has exhibited good performance, and shallow networks such as TernausNet11 and TernausNet16 outperform deep networks such as LinkNet34 and LinkNet50; this is different from our results, where TernausNet exhibits poor performance. We infer that the reason for the differences in the results is the use of a deeper encoder network than that used by Scheikl et al.15 Scheikl et al.15 reported that TernausNet16 exhibited the highest performance when considering only the best performance from seven runs; however, when considering all the runs, TernausNet16 was excluded from the ranking. Therefore, the differences in the results can be attributed to variations in the number and quality of the datasets, as well as differences in the training methods.

The network with the lowest performance is TernausNet; the performances of the other three networks show marginal differences. These findings can be attributed to differences in the network encoders. Specifically, TernausNet employs VGG16 as its encoder, whereas the other networks utilize ResNet50 as their encoder. In our study, we observe that the choice of decoder does not have a significant impact on performance. Consequently, it is necessary to optimize the decoder and utilize various decoders to improve the results. In our study, TernausNet and LinkNet use encoders that differ from those of the original models21-23. While these modified models perform reliably, it is essential to conduct a performance comparison with the original models and analyze the influence of the encoder modifications. The network latencies are as follows: TernausResNet, 85 ms; LinkNet, 85 ms; and DeepLab V3+, 87 ms. When converted to FPS, the approximate range is 10–11 FPS. The network with the longest latency is TernausNet because its encoder, VGG16, requires a longer inference time than ResNet5026. The segmentation of organs in images can be used for the quantitative evaluation of surgery and to help surgeons understand anatomy.
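The conversion from per-frame latency to frame rate mentioned above is a simple reciprocal, sketched below. The helper name `latency_ms_to_fps` is hypothetical, and the sketch assumes that frames are processed one at a time, so that throughput is the inverse of latency.

```python
def latency_ms_to_fps(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latencies quoted in the paragraph above, in milliseconds.
for latency in (85, 87):
    print(f"{latency} ms -> {latency_ms_to_fps(latency):.1f} FPS")
```

Under this assumption, a latency of about 33 ms or less per frame would be needed to keep pace with a 30 FPS video stream.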

Our study aims to alleviate the diminished sensory perception in minimally invasive and robot-assisted surgeries through the application of deep learning-based computer-aided techniques. We used deep learning for laparoscopic image segmentation, achieving a Dice coefficient of 0.82 at approximately 11 FPS. Our results need refinement, especially in enhancing inference speed and diversifying data. Employing deep learning in MIS, as in our approach, stands as a promising solution.

No potential conflict of interest relevant to this article was reported.

This work was supported by the GRRC program of Gyeonggi province [GRRC-Gachon2023(B01), Development of AI-based medical imaging technology] and by the Technology Innovation Program (K_G012001185601, Building Data Sets for Artificial Intelligence Learning) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).

  1. Longmore SK, Naik G, Gargiulo GD. Laparoscopic robotic surgery: current perspective and future directions. Robotics 2020;9:42. https://doi.org/10.3390/robotics9020042.
  2. Omisore OM, Han S, Xiong J, Li H, Li Z, Wang L. A review on flexible robotic systems for minimally invasive surgery. IEEE Trans Syst Man Cybern Syst 2020;52:631-644. https://doi.org/10.1109/TSMC.2020.3026174.
  3. Bandari N, Dargahi J, Packirisamy M. Tactile sensors for minimally invasive surgery: a review of the state-of-the-art, applications, and perspectives. IEEE Access 2020;8:7682-7708. https://doi.org/10.1109/ACCESS.2019.2962636.
  4. Zhang YX, Wei XY, Yue WC, Zhu CJ, Ju F. A dual-mode tactile hardness sensor for intraoperative tumor detection and tactile imaging in robot-assisted minimally invasive surgery. Smart Mater Struct 2021;30:085041. https://doi.org/10.1088/1361-665X/ac112b.
  5. Zhang X, Ji X, Wang J, Fan Y, Tao C. Renal surface reconstruction and segmentation for image-guided surgical navigation of laparoscopic partial nephrectomy. Biomed Eng Lett 2023;13:165-174. https://doi.org/10.1007/s13534-023-00263-1.
  6. Lee DH, Kim U, Gulrez T, Yoon WJ, Hannaford B, Choi HR. A laparoscopic grasping tool with force sensing capability. IEEE ASME Trans Mechatron 2016;21:130-141. https://doi.org/10.1109/TMECH.2015.2442591.
  7. Pirie K, Myles PS, Riedel B. A survey of neuraxial analgesic preferences in open and laparoscopic major abdominal surgery amongst anaesthetists in Australia and New Zealand. Anaesth Intensive Care 2020;48:314-317. https://doi.org/10.1177/0310057X20937315.
  8. Padoy N. Machine and deep learning for workflow recognition during surgery. Minim Invasive Ther Allied Technol 2019;28:82-90. https://doi.org/10.1080/13645706.2019.1584116.
  9. Chen LC, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. arXiv [Preprint]. 2017 [cited 2023 May 8].
    Available from: https://doi.org/10.48550/arXiv.1706.05587.
  10. Lo BP, Darzi A, Yang GZ. Episode classification for the analysis of tissue/instrument interaction with multiple visual cues. Proceedings of Medical Image Computing and Computer-Assisted Intervention - MICCAI 2003: 6th International Conference, 2003 Nov 15-18; Montréal, Canada.
  11. Bilodeau GA, Shu Y, Cheriet F. Multistage graph-based segmentation of thoracoscopic images. Comput Med Imaging Graph 2006;30:437-446. https://doi.org/10.1016/j.compmedimag.2006.07.003.
  12. Kawaguchi K, Bengio Y, Kaelbling L. Generalization in deep learning. In: Grohs P, Kutyniok G, eds. Mathematical aspects of deep learning. Cambridge University Press, pp 112-148, 2022.
  13. AlBadawy EA, Saha A, Mazurowski MA. Deep learning for segmentation of brain tumors: impact of cross-institutional training and testing. Med Phys 2018;45:1150-1158. https://doi.org/10.1002/mp.12752.
  14. Shvets AA, Rakhlin A, Kalinin AA, Iglovikov VI. Automatic instrument segmentation in robot-assisted surgery using deep learning. Proceedings of 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018 Dec 17-20; Orlando, USA.
  15. Scheikl PM, Laschewski S, Kisilenko A, et al. Deep learning for semantic segmentation of organs and tissues in laparoscopic surgery. Curr Dir Biomed Eng 2020;6:20200016. https://doi.org/10.1515/cdbme-2020-0016.
  16. Sun YW, Pan B, Fu YL. Lightweight deep neural network for real-time instrument semantic segmentation in robot assisted minimally invasive surgery. IEEE Robot Autom Lett 2021;6:3870-3877. https://doi.org/10.1109/LRA.2021.3066956.
  17. Ni ZL, Bian GB, Xie XL, Hou ZG, Zhou XH, Zhou YJ. RASNet: segmentation for tracking surgical instruments in surgical videos using Refined Attention Segmentation Network. Annu Int Conf IEEE Eng Med Biol Soc 2019;2019:5735-5738. https://doi.org/10.1109/EMBC.2019.8856495.
  18. Hu P, Wu F, Peng J, Bao Y, Chen F, Kong D. Automatic abdominal multi-organ segmentation using deep convolutional neural network and time-implicit level sets. Int J Comput Assist Radiol Surg 2017;12:399-411. https://doi.org/10.1007/s11548-016-1501-5.
  19. Aurangzeb K, Aslam S, Alhussein M, Naqvi RA, Arsalan M, Haider SI. Contrast enhancement of fundus images by employing modified PSO for improving the performance of deep learning models. IEEE Access 2021;9:47930-47945. https://doi.org/10.1109/ACCESS.2021.3068477.
  20. Siddique N, Paheding S, Elkin CP, Devabhaktuni V. U-net and its variants for medical image segmentation: a review of theory and applications. IEEE Access 2021;9:82031-82057. https://doi.org/10.1109/ACCESS.2021.3086020.
  21. Iglovikov V, Shvets A. TernausNet: U-Net with VGG11 encoder pre-trained on ImageNet for image segmentation. arXiv [Preprint]. 2018 [cited 2023 May 8].
    Available from: https://doi.org/10.48550/arXiv.1801.05746.
  22. Chaurasia A, Culurciello E. LinkNet: exploiting encoder representations for efficient semantic segmentation. Proceedings of 2017 IEEE Visual Communications and Image Processing (VCIP), 2017 Dec 10-13; Petersburg, USA.
  23. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), 2018 Sep 8-14; Munich, Germany.
  24. Rakshit S. Multiclass semantic segmentation using DeepLabV3+ [Internet]. Keras; 2021 [cited 2023 Apr 5].
    Available from: https://keras.io/examples/vision/deeplabv3_plus/.
  25. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv [Preprint]. 2014 [cited 2023 May 17].
    Available from: https://doi.org/10.48550/arXiv.1409.1556.
  26. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016 Jun 26-Jul 1; Las Vegas, USA.
  27. Xu Y, Li Z, Li W, et al. Dual-channel residual network for hyperspectral image classification with noisy labels. IEEE Trans Geosci Remote Sens 2022;60:5502511. https://doi.org/10.1109/TGRS.2021.3057689.

Article

Original Article

J Innov Med Technol 2024; 2(1): 11-19

Published online May 30, 2024 https://doi.org/10.61940/jimt.240002

Copyright © Korean Innovative Medical Technology Society.

Semantic segmentation networks of organs in minimally invasive surgery

Jun-Ha Park1 , Young Jae Kim2 , Kwang Gi Kim1,3,4,5

1Department of Bio-Health Medical Engineering, Gachon University Gil Medical Center, Incheon, Korea, 2Gachon Biomedical & Convergence Institute, Gachon University Gil Medical Center, Incheon, Korea, 3Medical Devices R&D Center, Gachon University Gil Medical Center, Incheon, Korea, 4Department of Biomedical Engineering, Gachon University Gil Medical Center, Incheon, Korea, 5Department of Health Sciences & Technology, Gachon Advanced Institute for Health Sciences & Technology (GAIHST), Gachon University, Lee Gil Ya Cancer and Diabetes Institute, Incheon, Korea

Correspondence to:Kwang Gi Kim
Department of Biomedical Engineering, Gachon University Gil Medical Center, 38-13 Dokjeom-ro 3beon-gil, Namdong-gu, Incheon 21565, Korea
e-mail kimkg@gachon.ac.kr
https://orcid.org/0000-0001-9714-6038

Received: April 26, 2024; Accepted: April 29, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: Minimally invasive surgery (MIS) and robot-assisted surgery have gained recognition as procedures safer than traditional laparotomy which facilitate faster patient recovery. However, MIS limits the sense of the surgeon. Therefore, a computer-assisted algorithm is proposed to assist in this surgery. With the advent of convolutional neural networks, machine vision technology has become an attractive option.
Materials and Methods: We use four networks, TernausNet, TernausResNet, LinkNet, and DeepLab V3+, to predict organ segments in endoscopy images. Furthermore, endoscopy images have several issues such as noise, hemorrhage, and shading. Therefore, we perform preprocessing and draw parallels between the images with and without preprocessing.
Results: The network with the lowest performance is TernausNet; the performances of the other three networks show marginal differences. The most significant factor for predicting performance is the encoder network. All networks demonstrate reliable performance with a minimum intersection over union score of 0.68 in TernausNet.
Conclusion: The segmentation of organs in images can be used for the quantitative evaluation of surgery and to help surgeons understand anatomy.

Keywords: Deep learning, Artificial intelligence, Diagnosis, computer-assisted, Minimally invasive surgical procedures

Introduction

Minimally invasive surgery (MIS) and robot-assisted surgery (RAS) have gained recognition for their ability to provide safer procedures and facilitate faster patient recovery than traditional laparotomy, resulting in reduced hospitalization durations1,2. However, surgeons encounter challenges in effectively manipulating surgical tools and obtaining a comprehensive understanding of the tissues at the surgical site due to their reliance on screens and endoscopic instruments for information gathering3,4. To overcome these challenges, researchers have actively conducted studies aimed at providing valuable feedback information by attaching sensors to instruments or employing computer vision technology to supplement the visual information5-8. Notably, computer vision-assistive technology has emerged as a promising solution, capitalizing on the rapid advancements in deep learning and convolutional neural networks (CNNs)7-9.

Even before the advent of deep learning and CNNs, researchers have proposed algorithms that leverage classical computer vision techniques to assist with MIS. In a notable 2003 study, Lo et al.10 introduced an algorithm aimed at evaluating tissue–instrument interactions through image analysis. Their approach involved instrument segmentation and tracking using color segmentation while quantitatively assessing tissue deformation caused by instruments through optical flow and shape-from-shading techniques. In a separate study conducted in 2006, Bilodeau et al.11 proposed an algorithm to segment the cavity and thereby aid surgeons in performing thoracic laminectomies; to validate their method, they segmented the cavity using laparoscopic images of a surgery performed on a pig. Their technique involved splitting and merging the cavity into meaningful regions using a multilevel graph approach known as the recursive shortest spanning tree. Both studies are significant because they utilize computer vision techniques to track instruments and aid surgeons performing MIS. These methods offer benefits such as lower computational costs and interpretability of the segmentation process when compared to deep learning approaches. However, they exhibit notably lower generalization capabilities than deep learning methods, making their application in endoscopic environments challenging due to the presence of multiple variables12.

Deep learning-based semantic segmentation has demonstrated its superiority over classical methods in achieving more accurate segmentation, particularly exhibiting enhanced generalization capabilities, which render it well-suited for endoscopic images encompassing multiple variances like illuminance, blurring, light spillage, hemorrhage, and overshadowing12-15. In 2018, Shvets et al.14 proposed a deep-learning-based semantic segmentation method for robotic instrument detection and tracking. They conducted binary and multiclass classifications of instruments in porcine surgical images acquired using the da Vinci Xi surgical system. The authors employed U-net-based networks, specifically TernausNet16 and LinkNet34, achieving a binary classification intersection over union (IoU) score of 0.66 and a multiclass classification IoU score of 0.35 using TernausNet16. Despite the effectiveness of deep-learning-based semantic segmentation in laparoscopic images, it exhibits insufficient multiclass classification performance. In 2020, Scheikl et al.15 proposed the application of semantic segmentation to assist surgeons in scene understanding during laparoscopic surgery. They trained segmentation networks, including U-net, TernausNet, LinkNet, FCN, and SegNet, using various encoder networks. Three loss functions, namely soft-Jaccard, generalized Dice, and cross entropy, were cross-applied to laparoscopic cholecystectomy images using labeled data categories such as image exterior, liver, gallbladder, instrument, fat, and others. The network trained with TernausNet11 using the soft-Jaccard function achieved a maximum IoU score of 0.78 for image segmentation.

In 2021, Sun et al.16 proposed a lightweight segmentation network for the real-time detection and tracking of surgical instruments in RAS. Their approach used MobileNetV3 with the Ghost Module applied as a lightweight encoder for real-time image segmentation and Lite R-ASPP as the decoder. The network was trained using image sequences obtained from the da Vinci Xi system provided by the MICCAI Endoscopic Vision Challenge 2017. The segmentation achieved a speed of approximately 37.0 frames per second (FPS) with an accuracy of 0.70; notably, this real-time speed was accomplished with a minimal compromise in accuracy. In 2019, Ni et al.17 conducted surgical instrument segmentation using RASNet, which incorporates a decoder with an attention mechanism. They utilized RAS images from the MICCAI Endoscopic Vision Challenge 2017 dataset and achieved an IoU score of 0.90. To address the issue of background class imbalance in surgical images, they implemented the global attention upsample, which focused on the features of surgical instruments. This approach resulted in a 7.58% improvement in the IoU score compared with that of the baseline model. By directing attention to the instruments through the attention mechanism, the challenge of class imbalance was mitigated, leading to a substantial enhancement in the multiclass classification performance for various surgical instruments. However, both studies focused specifically on segmenting the surgical instruments and did not encompass other elements, such as organs, within the surgical images.

In MIS, the primary objective of image segmentation is to delineate surgical instruments accurately and assist surgeons in understanding the precise positioning of these instruments during the procedure. However, it is equally important to identify and analyze organs within MIS images, which play a crucial role in computer-aided diagnosis15,18. The accurate segmentation of organs in endoscopy images relies on the effective handling of variables, and multiple segmentations of organs are required. In this context, deep-CNN-based multiclass segmentation has emerged as a compelling solution. Therefore, this study proposes an image segmentation method for organs that employs a deep CNN as an encoder within a multiclass segmentation network and whose primary goal is to provide surgical assistance. This study provides a detailed description of the proposed solution, including the architecture of the semantic segmentation network, the learning process of the network, and a comprehensive analysis of the results obtained by segmenting organs in real surgical images.

Our study focuses on organ semantic segmentation during MIS. Therefore, we apply several semantic segmentation networks, including TernausNet, TernausResNet, LinkNet, and DeepLab V3+, to MIS images. Endoscopy images used in MIS may exhibit noise, hemorrhage, and shading issues. To address this problem, we apply preprocessing techniques and compare the results obtained using the datasets with and without preprocessing; as a result, the most influential factor is observed to be the encoder network, whereas the decoder network and preprocessing have only marginal effects. Unlike instrument segmentation, organ semantic segmentation in MIS images serves to enable the quantitative evaluation of surgery and to assist surgeons in understanding anatomical structures.

Materials and Methods

Fig. 1A illustrates the training sequence of the segmentation networks. The networks are implemented using TensorFlow 2.6.0 with CUDA 11.3 and cuDNN 8.2.1. The Adam optimizer is employed, with the learning rate, beta1, and beta2 set to 0.001, 0.9, and 0.999, respectively. Training runs for a maximum of 1,000 epochs, and early stopping is applied with a patience of 20. During training, the networks are evaluated on a validation dataset, and the top-performing networks are saved.
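The stopping rule described above can be sketched in plain Python. This is a framework-independent illustration rather than the training code used in the study; the function name `train_with_early_stopping` and the toy validation scores are hypothetical, and the sketch assumes that the monitored quantity is a validation score for which higher is better.

```python
def train_with_early_stopping(val_scores, max_epochs=1000, patience=20):
    """Return (best_epoch, best_score, last_epoch) for a stream of validation scores.

    val_scores: iterable yielding one validation score per epoch (higher is better).
    Training stops after `patience` consecutive epochs without improvement,
    or after `max_epochs` epochs, whichever comes first.
    """
    best_score = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0
    last_epoch = -1

    for epoch, score in enumerate(val_scores):
        if epoch >= max_epochs:
            break
        last_epoch = epoch
        if score > best_score:
            # New best network: this is the point where a checkpoint would be saved.
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_score, last_epoch


# Toy scores (assumed): improvement for five epochs, then a plateau.
toy_scores = [0.50, 0.60, 0.70, 0.72, 0.73] + [0.70] * 100
result = train_with_early_stopping(toy_scores)
```

With these toy scores, the best epoch is epoch 4 and training halts 20 epochs later, long before the 1,000-epoch limit is reached.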

Figure 1. (A) Flowchart of the research. (B) Flowchart of data preprocessing. IoU: intersection over union, CLAHE: contrast-limited adaptive histogram equalization.

Dataset

This retrospective study was approved by the Institutional Review Board of Gachon University Gil Hospital, and the requirement for patient informed consent was waived (approval number: GDIRB2020-346). The raw data comprise PNG images with a resolution of 1,920×1,080 pixels and RGB channels, along with XML annotations containing polygon masks that record each organ. The mask is encoded as a one-hot vector comprising four binary images of the same size as the original image. To reduce the training time, the image resolution is reduced to 512×288, which maintains the 16:9 aspect ratio of the image. Table 1 presents the components of the data, including the number of classes. Each image contains one or more classes, and the total number of images is 2,244.
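The mask encoding described above can be illustrated with a small NumPy sketch. It assumes that the four binary images correspond to a background channel plus the three organ classes, and the channel order, the helper name `one_hot_mask`, and the toy regions are all hypothetical; rasterizing the XML polygons into a label map is not shown.

```python
import numpy as np

# Assumed channel order; the source states only that there are four binary images.
CLASS_NAMES = ("background", "liver", "gallbladder", "spleen")


def one_hot_mask(label_map, num_classes=len(CLASS_NAMES)):
    """Convert an (H, W) integer label map into an (H, W, num_classes) one-hot mask."""
    label_map = np.asarray(label_map)
    return (label_map[..., None] == np.arange(num_classes)).astype(np.uint8)


# Toy label map at the reduced training resolution (height 288, width 512).
labels = np.zeros((288, 512), dtype=np.int64)
labels[50:150, 100:300] = 1    # hypothetical liver region
labels[160:200, 320:380] = 2   # hypothetical gallbladder region

mask = one_hot_mask(labels)

# 1,920 x 1,080 and 512 x 288 share the same 16:9 aspect ratio.
same_aspect = (1920 * 288 == 1080 * 512)
```

Each pixel is assigned to exactly one channel, so the four binary images sum to one at every position.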

Table 1. Components of the data, including the corresponding number of classes.

Raw data category           Liver   Gallbladder   Spleen   Total
Each (single organ)           231            60      847   1,138
Liver–gallbladder             936           936        0     936
Liver–spleen                  160             0      160     160
Liver–gallbladder–spleen       10            10       10      10
Total                       1,337         1,006    1,017   2,244

Each image contains one or more classes, with a total of 2,244 images.
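The totals in Table 1 can be checked with a few lines of Python. The sketch below assumes that the "Each" row counts images containing a single organ (231 liver, 60 gallbladder, and 847 spleen); the dictionary layout and the helper name `images_per_class` are illustrative only.

```python
# Number of images per raw-data category, taken from Table 1.
category_counts = {
    ("liver",): 231,
    ("gallbladder",): 60,
    ("spleen",): 847,
    ("liver", "gallbladder"): 936,
    ("liver", "spleen"): 160,
    ("liver", "gallbladder", "spleen"): 10,
}


def images_per_class(counts):
    """Sum, for each organ, the images of every category that contains it."""
    totals = {}
    for organs, n_images in counts.items():
        for organ in organs:
            totals[organ] = totals.get(organ, 0) + n_images
    return totals


per_class = images_per_class(category_counts)
total_images = sum(category_counts.values())
```

Under this reading, the per-class sums reproduce the bottom row of the table (1,337 liver, 1,006 gallbladder, and 1,017 spleen images), and the categories add up to 2,244 images.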



Preprocessing is intended to prevent network training from being affected by noise and unwanted features; to evaluate its effect, networks trained on the preprocessed dataset are compared with networks trained on the dataset without preprocessing. Normalization, contrast-limited adaptive histogram equalization (CLAHE), and Gaussian blur are applied as preprocessing steps19. Fig. 1B depicts the preprocessing process.
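Two of the preprocessing steps named above, normalization and Gaussian blur, can be sketched with NumPy alone. The source does not state the normalization range, kernel size, or sigma, so scaling to [0, 1], `sigma=1.0`, and `radius=2` are assumptions, and the function names are hypothetical. CLAHE is omitted here; in practice it is usually taken from an image library (for example, OpenCV provides `cv2.createCLAHE`).

```python
import numpy as np


def normalize(image):
    """Scale 8-bit pixel values to the range [0, 1] (assumed normalization)."""
    return np.asarray(image, dtype=np.float32) / 255.0


def gaussian_kernel_1d(sigma, radius):
    """Return a 1-D Gaussian kernel of length 2 * radius + 1 that sums to one."""
    x = np.arange(-radius, radius + 1, dtype=np.float32)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(channel, sigma=1.0, radius=2):
    """Separable Gaussian blur of a 2-D array, using edge padding."""
    kernel = gaussian_kernel_1d(sigma, radius)
    padded = np.pad(channel, radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


# Toy single-channel image with one bright pixel (an isolated noise spike).
toy = np.zeros((6, 8), dtype=np.uint8)
toy[3, 4] = 255
blurred = gaussian_blur(normalize(toy))
```

The blur spreads the isolated spike over its neighbours while preserving the overall intensity, which is the noise-suppression effect the preprocessing aims for.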

Network architecture

The network architectures are constructed based on the U-net model, which comprises an encoder (or a backbone network) and a decoder. These include skip connections that facilitate the recovery of resolution through the transport of low-level features20. This is illustrated in Fig. 2A. Two variants of TernausNet, LinkNet, and DeepLab V3+ are used for image segmentation of organs, with each network’s encoder pre-trained with the ImageNet dataset14,21-24.

Figure 2. (A) Network architecture of U-net with an encoder and decoder. (B) Decoder block of TernausNet. (C) Decoder block of LinkNet. (D) Network architecture of DeepLab V3+. DCNN: deep convolutional neural network.

TernausNet is a semantic segmentation network proposed by Iglovikov and Shvets21 in 2018. It utilizes a VGG11 encoder, and its architecture is inspired by U-net. The network concatenates the low-level and decoded features. In our study, VGG16 and ResNet50 are used instead of VGG11; hereafter, these networks are referred to as TernausNet and TernausResNet, respectively21,25,26. Fig. 2B illustrates the decoder block of TernausNet. LinkNet is a semantic segmentation network proposed by Chaurasia and Culurciello22 in 2017. It is based on U-net and applies a ResNet18 encoder; its skip connections are constructed as residual connections between the low-level and decoded features. Fig. 2C illustrates the decoder block of LinkNet. DeepLab V3+ is a semantic segmentation network proposed by the Google Brain team in 2018 as an enhancement of DeepLab V3, which was proposed in 2017. By using atrous spatial pyramid pooling (ASPP), the DeepLab V3+ encoder processes high-level features at multiple scales23. DeepLab V3+ has an encoder-decoder structure with an Xception backbone, and the mask is decoded by the ASPP and decoding modules23. This is illustrated in Fig. 2D.
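The difference between the two skip-connection styles described above can be shown on toy feature maps. The sketch below is only an illustration of the tensor shapes involved, not of the actual decoder blocks; the function names and the feature-map sizes are hypothetical.

```python
import numpy as np


def concat_skip(encoder_features, decoder_features):
    """U-net/TernausNet-style skip: stack feature maps along the channel axis."""
    return np.concatenate([encoder_features, decoder_features], axis=-1)


def additive_skip(encoder_features, decoder_features):
    """LinkNet-style residual skip: element-wise sum, channel count unchanged."""
    return encoder_features + decoder_features


# Hypothetical (height, width, channels) feature maps at one decoder stage.
enc = np.random.rand(36, 64, 128)
dec = np.random.rand(36, 64, 128)

concatenated = concat_skip(enc, dec)   # channels double: 128 + 128
summed = additive_skip(enc, dec)       # channels stay at 128
```

Concatenation doubles the number of channels that the following convolution must process, whereas the additive form keeps the channel count fixed, which is one reason residual-style skips are associated with lighter decoders.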

Loss function

The loss function is composed of two components: cross entropy, which evaluates the accuracy over all pixels, and the IoU loss, which evaluates the prediction of each label. The IoU loss is defined as the negative logarithm of the IoU, and the loss function is based on the study by Shvets et al.14 Equation (1) represents the loss function:

$$\text{Loss} = \text{Cross Entropy} - \log(\text{IoU}) \qquad (1)$$
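The loss described above, cross entropy plus the negative logarithm of the IoU, can be sketched for a single class in plain Python. This is a simplified binary illustration, not the multiclass TensorFlow implementation used in the study; the soft IoU form (sums of products of labels and probabilities) is one common differentiable variant and is an assumption here, as are the function name `combined_loss` and the smoothing constant `eps`.

```python
import math


def combined_loss(y_true, y_prob, eps=1e-7):
    """Binary cross entropy minus the log of a soft IoU, for flat lists of pixels.

    y_true: ground-truth labels (0 or 1).
    y_prob: predicted foreground probabilities in (0, 1).
    """
    n = len(y_true)
    cross_entropy = -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for t, p in zip(y_true, y_prob)
    ) / n
    # Soft IoU: probabilities stand in for hard predictions (assumed form).
    intersection = sum(t * p for t, p in zip(y_true, y_prob))
    union = sum(y_true) + sum(y_prob) - intersection
    soft_iou = (intersection + eps) / (union + eps)
    return cross_entropy - math.log(soft_iou)


labels = [1, 1, 0, 0]
good_loss = combined_loss(labels, [0.9, 0.9, 0.1, 0.1])
bad_loss = combined_loss(labels, [0.4, 0.4, 0.6, 0.6])
```

A confident, correct prediction lowers both terms at once, so `good_loss` is much smaller than `bad_loss`; the IoU term adds a penalty that depends on region overlap rather than on per-pixel accuracy alone.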

Results

Data preprocessing uses normalization by default, with histogram equalization through CLAHE and Gaussian blur as options. To evaluate the effectiveness of each preprocessing method, the networks were trained using distinct preprocessed datasets. The areas predicted by the networks were validated by comparing them with the labels assigned by the surgeons. The IoU and Dice coefficients are calculated to compare the performance of each network14,15,23. These coefficients range from zero to one, where a value closer to one indicates a higher similarity between the two areas. Table 2 lists the IoU score and Dice coefficient according to the type of preprocessing and network architecture.
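The two coefficients can be computed from a pair of binary masks as sketched below. The helper name `iou_and_dice` and the toy masks are hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the source.

```python
import numpy as np


def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0
    return float(intersection / union), float(2 * intersection / total)


# Toy 4x4 masks: prediction covers rows 0-1, ground truth covers rows 1-2.
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, :] = True
target = np.zeros((4, 4), dtype=bool)
target[1:3, :] = True

iou, dice = iou_and_dice(pred, target)
```

Here the masks share 4 of 12 covered pixels, so the IoU is 1/3 and the Dice coefficient is 0.5. For a single pair of masks, the two are linked by Dice = 2·IoU / (1 + IoU), which is why the Dice coefficient is never lower than the IoU.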

Table 2. IoU score and Dice coefficient according to the type of preprocessing and network architecture.

Network          Preprocessing   IoU         Dice        Latency (ms)
TernausNet       Yes             0.68±0.25   0.75±0.25   85
TernausResNet    Yes             0.76±0.23   0.82±0.23   103
LinkNet          Yes             0.74±0.25   0.80±0.24   85
DeepLab V3+      Yes             0.75±0.23   0.81±0.23   87
TernausNet       No              0.69±0.25   0.75±0.25   86
TernausResNet    No              0.74±0.25   0.80±0.24   102
LinkNet          No              0.73±0.25   0.79±0.25   88
DeepLab V3+      No              0.74±0.23   0.80±0.22   86

Values are presented as mean±standard deviation.

IoU: intersection over union.



TernausNet has the lowest performance among the networks, showing an IoU score approximately 8.58% lower and a Dice coefficient approximately 7.11% lower than the average. In contrast, TernausResNet has the highest IoU score and Dice coefficient among the networks, although by a marginal difference. Networks trained on preprocessed datasets show higher performance than those trained on unprocessed datasets, although the differences are marginal. Examples of the network predictions are shown in Fig. 3.

Figure 3. Examples of predictions by networks trained on preprocessed datasets. Blue: liver, green: gallbladder, red: spleen.

TernausResNet, trained on preprocessed datasets, exhibits the best performance among the networks. To assess the performance within each class, the IoU scores and Dice coefficients are calculated for each class in TernausResNet, which has been trained on preprocessed datasets. The gallbladder class has an IoU score of 0.77, indicating the highest performance; by contrast, the liver class has an IoU score of 0.72, which represents a lower but nonetheless reliable performance (Fig. 4).

Figure 4. Box plot of the intersection over union (IoU) score and Dice coefficient for each class in TernausResNet.

Discussion

Fig. 5 shows three cases of poor performance. Case 1 is an image that includes a hemorrhage, case 2 shows overshadowing, and case 3 shows edge fading owing to light spillage. In these cases, the endoscopy images have noise, hemorrhage, and shading issues; these problems confuse the network and result in erroneous class predictions27. To address these problems and improve the network performance, a follow-up study should collect additional images corresponding to these cases and develop automated exclusion systems or generalization techniques.

Figure 5. Examples of poor prediction cases. Case 1 includes hemorrhage, case 2 includes excessive shadows on the object class, and case 3 includes unintentional blurring between instrument and organ.

In studies by Shvets et al.14 and Scheikl et al.15, TernausNet16 has exhibited good performance, and shallow networks such as TernausNet11 and TernausNet16 outperform deep networks such as LinkNet34 and LinkNet50; this is different from our results, where TernausNet exhibits poor performance. We infer that the reason for the differences in the results is the use of a deeper encoder network than that used by Scheikl et al.15 Scheikl et al.15 reported that TernausNet16 exhibited the highest performance when considering only the best performance from seven runs; however, when considering all the runs, TernausNet16 was excluded from the ranking. Therefore, the differences in the results can be attributed to variations in the number and quality of the datasets, as well as differences in the training methods.

The network with the lowest performance is TernausNet; the performances of the other three networks show marginal differences. These findings can be attributed to discrepancies in network encoders. Specifically, TernausNet employs VGG16 as its encoder, whereas the other networks utilize ResNet50 as their encoder. In our study, we observe that the choice of decoder does not have a significant impact on performance. Consequently, it is necessary to optimize the decoder and utilize various decoders to improve the results. In our study, TernausNet and LinkNet have encoders that differ from those of the original models21-23. While these modified models perform reliably, it is essential to conduct a performance comparison with the original models and analyze the influence of the encoder modifications. The network latencies are as follows: TernausResNet, 85 ms; LinkNet, 85 ms; and DeepLab V3+, 87 ms. When converted to FPS, the approximate range is 10–11 FPS. The network with the longest latency is TernausNet because its encoder, VGG16, requires a longer inference time than ResNet5026. The segmentation of organs in images can be used for the quantitative evaluation of surgery and to help surgeons understand anatomy.
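The conversion from per-frame latency to frame rate mentioned above is a single division, sketched below for the latency values that appear in Table 2 for the preprocessed datasets. The function name `latency_ms_to_fps` is hypothetical, and the sketch assumes that frames are processed one at a time with no batching or pipelining.

```python
def latency_ms_to_fps(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


fps_by_latency = {ms: latency_ms_to_fps(ms) for ms in (85, 87, 103)}

for ms, fps in fps_by_latency.items():
    print(f"{ms} ms -> {fps:.1f} FPS")
```

A latency of 85 ms corresponds to about 11.8 FPS and 103 ms to about 9.7 FPS, which is roughly in line with the approximate range quoted above.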

Our study aims to alleviate the diminished sensory perception in minimally invasive and robot-assisted surgeries through the application of deep learning-based computer-aided techniques. We used deep learning for laparoscopic image segmentation, achieving a Dice coefficient of 0.82 at approximately 11 FPS. Our results need refinement, especially in terms of inference speed and data diversity. Employing deep learning in MIS, as in our approach, stands as a promising solution.

Acknowledgments

None.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Funding

This work was supported by the GRRC program of Gyeonggi province [GRRC-Gachon2023(B01), Development of AI-based medical imaging technology], and by the Technology Innovation Program (K_G012001185601, Building Data Sets for Artificial Intelligence Learning) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).

References

  1. Longmore SK, Naik G, Gargiulo GD. Laparoscopic robotic surgery: current perspective and future directions. Robotics 2020;9:42. https://doi.org/10.3390/robotics9020042.
  2. Omisore OM, Han S, Xiong J, Li H, Li Z, Wang L. A review on flexible robotic systems for minimally invasive surgery. IEEE Trans Syst Man Cybern Syst 2020;52:631-644. https://doi.org/10.1109/TSMC.2020.3026174.
  3. Bandari N, Dargahi J, Packirisamy M. Tactile sensors for minimally invasive surgery: a review of the state-of-the-art, applications, and perspectives. IEEE Access 2020;8:7682-7708. https://doi.org/10.1109/ACCESS.2019.2962636.
  4. Zhang YX, Wei XY, Yue WC, Zhu CJ, Ju F. A dual-mode tactile hardness sensor for intraoperative tumor detection and tactile imaging in robot-assisted minimally invasive surgery. Smart Mater Struct 2021;30:085041. https://doi.org/10.1088/1361-665X/ac112b.
  5. Zhang X, Ji X, Wang J, Fan Y, Tao C. Renal surface reconstruction and segmentation for image-guided surgical navigation of laparoscopic partial nephrectomy. Biomed Eng Lett 2023;13:165-174. https://doi.org/10.1007/s13534-023-00263-1.
  6. Lee DH, Kim U, Gulrez T, Yoon WJ, Hannaford B, Choi HR. A laparoscopic grasping tool with force sensing capability. IEEE ASME Trans Mechatron 2016;21:130-141. https://doi.org/10.1109/TMECH.2015.2442591.
  7. Pirie K, Myles PS, Riedel B. A survey of neuraxial analgesic preferences in open and laparoscopic major abdominal surgery amongst anaesthetists in Australia and New Zealand. Anaesth Intensive Care 2020;48:314-317. https://doi.org/10.1177/0310057X20937315.
  8. Padoy N. Machine and deep learning for workflow recognition during surgery. Minim Invasive Ther Allied Technol 2019;28:82-90. https://doi.org/10.1080/13645706.2019.1584116.
  9. Chen LC, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. arXiv [Preprint]. 2017 [cited 2023 May 8]. Available from: https://doi.org/10.48550/arXiv.1706.05587.
  10. Lo BP, Darzi A, Yang GZ. Episode classification for the analysis of tissue/instrument interaction with multiple visual cues. Proceedings of Medical Image Computing and Computer-Assisted Intervention - MICCAI 2003: 6th International Conference, 2003 Nov 15-18; Montréal, Canada.
  11. Bilodeau GA, Shu Y, Cheriet F. Multistage graph-based segmentation of thoracoscopic images. Comput Med Imaging Graph 2006;30:437-446. https://doi.org/10.1016/j.compmedimag.2006.07.003.
  12. Kawaguchi K, Bengio Y, Kaelbling L. Generalization in deep learning. In: Grohs P, Kutyniok G, eds. Mathematical aspects of deep learning. Cambridge University Press, pp 112-148, 2022.
  13. AlBadawy EA, Saha A, Mazurowski MA. Deep learning for segmentation of brain tumors: impact of cross-institutional training and testing. Med Phys 2018;45:1150-1158. https://doi.org/10.1002/mp.12752.
  14. Shvets AA, Rakhlin A, Kalinin AA, Iglovikov VI. Automatic instrument segmentation in robot-assisted surgery using deep learning. Proceedings of 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018 Dec 17-20; Orlando, USA.
  15. Scheikl PM, Laschewski S, Kisilenko A, et al. Deep learning for semantic segmentation of organs and tissues in laparoscopic surgery. Curr Dir Biomed Eng 2020;6:20200016. https://doi.org/10.1515/cdbme-2020-0016.
  16. Sun YW, Pan B, Fu YL. Lightweight deep neural network for real-time instrument semantic segmentation in robot assisted minimally invasive surgery. IEEE Robot Autom Lett 2021;6:3870-3877. https://doi.org/10.1109/LRA.2021.3066956.
  17. Ni ZL, Bian GB, Xie XL, Hou ZG, Zhou XH, Zhou YJ. RASNet: segmentation for tracking surgical instruments in surgical videos using Refined Attention Segmentation Network. Annu Int Conf IEEE Eng Med Biol Soc 2019;2019:5735-5738. https://doi.org/10.1109/EMBC.2019.8856495.
  18. Hu P, Wu F, Peng J, Bao Y, Chen F, Kong D. Automatic abdominal multi-organ segmentation using deep convolutional neural network and time-implicit level sets. Int J Comput Assist Radiol Surg 2017;12:399-411. https://doi.org/10.1007/s11548-016-1501-5.
  19. Aurangzeb K, Aslam S, Alhussein M, Naqvi RA, Arsalan M, Haider SI. Contrast enhancement of fundus images by employing modified PSO for improving the performance of deep learning models. IEEE Access 2021;9:47930-47945. https://doi.org/10.1109/ACCESS.2021.3068477.
  20. Siddique N, Paheding S, Elkin CP, Devabhaktuni V. U-net and its variants for medical image segmentation: a review of theory and applications. IEEE Access 2021;9:82031-82057. https://doi.org/10.1109/ACCESS.2021.3086020.
  21. Iglovikov V, Shvets A. TernausNet: U-Net with VGG11 encoder pre-trained on ImageNet for image segmentation. arXiv [Preprint]. 2018 [cited 2023 May 8]. Available from: https://doi.org/10.48550/arXiv.1801.05746.
  22. Chaurasia A, Culurciello E. LinkNet: exploiting encoder representations for efficient semantic segmentation. Proceedings of 2017 IEEE Visual Communications and Image Processing (VCIP), 2017 Dec 10-13; Petersburg, USA.
  23. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), 2018 Sep 8-14; Munich, Germany.
  24. Rakshit S. Multiclass semantic segmentation using DeepLabV3+ [Internet]. Keras; 2021 [cited 2023 Apr 5]. Available from: https://keras.io/examples/vision/deeplabv3_plus/.
  25. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv [Preprint]. 2014 [cited 2023 May 17]. Available from: https://doi.org/10.48550/arXiv.1409.1556.
  26. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016 Jun 26-Jul 1; Las Vegas, USA.
  27. Xu Y, Li Z, Li W, et al. Dual-channel residual network for hyperspectral image classification with noisy labels. IEEE Trans Geosci Remote Sens 2022;60:5502511. https://doi.org/10.1109/TGRS.2021.3057689.