Colorectal cancer (CRC) is a common malignant tumor that seriously threatens human health, and its indistinct boundaries make accurate identification difficult. With the widespread adoption of convolutional neural networks (CNNs) in image processing, using CNNs for automatic classification and segmentation holds great potential for improving the efficiency of CRC recognition and reducing treatment costs. This paper examines the need for CNNs in the clinical diagnosis of CRC and provides a detailed overview of research advances in CNNs and their improved variants for CRC classification and segmentation. It further summarizes common ideas and methods for optimizing network performance, and discusses the challenges CNNs face and future development trends in CRC classification and segmentation, with the aim of promoting their use in clinical diagnosis.
Cardiovascular disease (CVD) is one of the leading causes of death worldwide, and heart sound classification plays a key role in its early detection; however, the differences between normal and abnormal heart sounds are subtle. To improve the accuracy of heart sound classification, this paper proposes a heart sound feature extraction method based on bispectral analysis and combines it with a convolutional neural network (CNN) to classify heart sounds. Bispectral analysis allows the model to effectively suppress Gaussian noise and extract heart sound features without relying on accurate segmentation of the heart sound signal, while the strong classification performance of the CNN enables accurate classification. According to the experimental results, the proposed algorithm achieves an accuracy of 0.910, a sensitivity of 0.884 and a specificity of 0.940 under the same data and experimental conditions. Compared with other heart sound classification algorithms, it shows a significant improvement and strong robustness and generalization ability, and is therefore expected to be applicable to the auxiliary detection of congenital heart disease.
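A minimal sketch (not the authors' code) of the pipeline described above: a direct FFT-based bispectrum estimate turns a heart-sound segment into a two-dimensional map, which a small CNN then classifies. Frame length, FFT size, sampling rate and the network depth are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

def bispectrum(x, frame_len=128, nfft=128):
    """Direct (FFT-based) bispectrum estimate of a 1-D signal.

    The signal is split into non-overlapping frames and
    B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)] is averaged over frames.
    Returns an (nfft//2, nfft//2) magnitude map used as a 2-D "image".
    """
    n_frames = len(x) // frame_len
    half = nfft // 2
    acc = np.zeros((half, half), dtype=complex)
    for k in range(n_frames):
        frame = x[k * frame_len:(k + 1) * frame_len]
        frame = frame - frame.mean()
        X = np.fft.fft(frame, nfft)
        for f1 in range(half):
            for f2 in range(half):
                acc[f1, f2] += X[f1] * X[f2] * np.conj(X[f1 + f2])
    return np.abs(acc) / max(n_frames, 1)

class HeartSoundCNN(nn.Module):
    """Small CNN that classifies the bispectrum map as normal/abnormal."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(32 * 4 * 4, 2)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Usage on a synthetic 2 s segment sampled at an assumed 2 kHz.
pcg = np.random.randn(4000)
B = bispectrum(pcg)
inp = torch.tensor(B, dtype=torch.float32)[None, None]   # (1, 1, 64, 64)
logits = HeartSoundCNN()(inp)
```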
Objective To develop a deep learning system for CT images to assist in the diagnosis of thoracolumbar fractures and to analyze the feasibility of its clinical application. Methods A total of 1256 CT images of thoracolumbar fractures, collected from West China Hospital of Sichuan University between January 2019 and March 2020, were annotated to a unified standard using the LabelImg image annotation system. All CT images were classified according to the AO Spine thoracolumbar spine injury classification. For diagnosing the A, B and C fracture types, the deep learning system was trained and validated on 1039 CT images (1004 for training and 35 for validation); the remaining 217 CT images were used as the test set to compare the deep learning system with clinicians' diagnoses. For subtyping type A fractures, the system was trained and validated on 581 CT images (556 for training and 25 for validation); the remaining 104 CT images were used as the test set for the same comparison. Results The accuracy and Kappa coefficient of the deep learning system in diagnosing the A, B and C fracture types were 89.4% and 0.849 (P<0.001), respectively; for subtyping type A fractures they were 87.5% and 0.817 (P<0.001), respectively. Conclusions The deep learning system classifies thoracolumbar fractures with high accuracy. It can be used to assist in the intelligent diagnosis of CT images of thoracolumbar fractures and to improve the current manual and complex diagnostic process.
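A minimal sketch (with hypothetical labels, not the study's data) of how agreement between the system's predictions and the reference AO Spine A/B/C labels can be summarized on the test set with accuracy, Cohen's kappa and a confusion matrix, as reported above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

# Hypothetical reference labels and model predictions for the A/B/C task.
y_true = np.array(["A", "A", "B", "C", "A", "B", "C", "C", "A", "B"])
y_pred = np.array(["A", "A", "B", "C", "A", "C", "C", "C", "A", "B"])

print("accuracy:", accuracy_score(y_true, y_pred))
print("kappa   :", cohen_kappa_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred, labels=["A", "B", "C"]))
```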
The processing mechanism of the human brain for speech information is an important source of inspiration for speech enhancement technology. Attention and lateral inhibition are key mechanisms in auditory information processing that can selectively enhance specific information. Building on this, the study introduces a dual-branch U-Net that integrates lateral inhibition and a feedback-driven attention mechanism. The noisy speech signal was input into the first branch of the U-Net, and time-frequency units with high confidence were selectively fed back. The resulting activation-layer gradients, combined with the lateral inhibition mechanism, were used to compute attention maps. These maps were then concatenated into the second branch of the U-Net, directing the network's focus and achieving selective enhancement of the speech signal. Speech enhancement performance was evaluated with five metrics, including the perceptual evaluation of speech quality (PESQ), and the method was compared with five other methods: Wiener filtering, SEGAN, PHASEN, Demucs and GRN. Experimental results showed that, across multiple performance metrics, the proposed method improved speech enhancement by 18% to 21% over the baseline network in various noise scenarios, with a particularly marked advantage over the other methods under low signal-to-noise ratio conditions. The speech enhancement technique based on lateral inhibition and a feedback-driven attention mechanism holds significant potential for auditory speech enhancement and is suitable for clinical applications related to cochlear implants and hearing aids.
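A minimal sketch of the two-stage idea (layer sizes, the 0.8 confidence threshold and the lateral-inhibition kernel are illustrative assumptions, not the paper's exact network): branch 1 estimates a time-frequency mask, high-confidence units are fed back as a gradient-based attention map, a center-surround kernel sharpens it, and branch 2 enhances the spectrogram with that map concatenated as an extra input channel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_unet(in_ch):
    """Tiny single-scale stand-in for one U-Net branch (for brevity)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
    )

branch1, branch2 = small_unet(1), small_unet(2)

# Lateral inhibition: center-surround kernel that suppresses a unit's
# neighbours, sharpening local peaks in the attention map.
li_kernel = -torch.ones(1, 1, 3, 3) / 8.0
li_kernel[0, 0, 1, 1] = 1.0

spec = torch.rand(1, 1, 64, 128)          # |STFT| of a noisy utterance

# Branch 1: mask estimate; keep only high-confidence T-F units (>0.8 assumed).
feat = branch1[:-2](spec)                 # activations before the last conv
mask = torch.sigmoid(branch1[-2](feat))
confident = (mask > 0.8).float()

# Feedback: gradients of the confident units w.r.t. the activation layer,
# channel-averaged into a Grad-CAM-style attention map.
grads, = torch.autograd.grad((mask * confident).sum(), feat, retain_graph=True)
attn = F.relu((grads * feat).mean(dim=1, keepdim=True))
attn = F.relu(F.conv2d(attn, li_kernel, padding=1))   # lateral inhibition
attn = attn / (attn.amax() + 1e-8)

# Branch 2: enhancement guided by the concatenated attention map.
enhanced = branch2(torch.cat([spec, attn], dim=1)) * spec
```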
This study aims to optimize surface electromyography (sEMG)-based gesture recognition, focusing on the impact of muscle fatigue on recognition performance. The paper proposes a real-time analysis algorithm that extracts muscle fatigue features on the fly and fuses them into the gesture recognition process. Using self-collected data, the paper applies convolutional neural networks and long short-term memory networks to analyze muscle fatigue feature extraction in depth and compares the impact of fatigue features on the performance of sEMG-based gesture recognition. The results show that, by fusing muscle fatigue features in real time, the proposed algorithm improves gesture recognition accuracy at different fatigue levels and also raises the average recognition accuracy across subjects. In summary, the algorithm not only improves the adaptability and robustness of the gesture recognition system, but also provides new insights for the development of gesture recognition technology in biomedical engineering.
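A minimal sketch (window length, channel count, sampling rate and network widths are illustrative assumptions) of one way to fuse real-time fatigue descriptors of sEMG, such as median frequency and RMS, which drift with fatigue, into a CNN plus LSTM gesture classifier.

```python
import numpy as np
import torch
import torch.nn as nn

def fatigue_features(window, fs=1000):
    """Median frequency and RMS of one sEMG window (channels x samples)."""
    spec = np.abs(np.fft.rfft(window, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(window.shape[-1], d=1.0 / fs)
    csum = np.cumsum(spec, axis=-1)
    mdf = freqs[np.argmax(csum >= 0.5 * csum[..., -1:], axis=-1)]
    rms = np.sqrt((window ** 2).mean(axis=-1))
    return np.concatenate([mdf, rms])              # (2 * n_channels,)

class FatigueAwareNet(nn.Module):
    """CNN over the raw window, LSTM over CNN frames, fatigue features
    fused into the final classification layer."""
    def __init__(self, n_channels=4, n_classes=8):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(n_channels, 32, 7, stride=2),
                                 nn.ReLU(),
                                 nn.Conv1d(32, 32, 5, stride=2), nn.ReLU())
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.fc = nn.Linear(64 + 2 * n_channels, n_classes)

    def forward(self, x, fatigue):
        h = self.cnn(x).transpose(1, 2)            # (B, T', 32)
        _, (hn, _) = self.lstm(h)
        return self.fc(torch.cat([hn[-1], fatigue], dim=1))

# Usage on one 200 ms window of 4-channel sEMG sampled at an assumed 1 kHz.
win = np.random.randn(4, 200)
fat = torch.tensor(fatigue_features(win), dtype=torch.float32)[None]
logits = FatigueAwareNet()(torch.tensor(win, dtype=torch.float32)[None], fat)
```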
Objective To establish a machine learning-based risk prediction model for chronic obstructive pulmonary disease (COPD) combined with lung cancer, in order to explore high-risk factors for lung cancer in COPD patients and lay a foundation for early detection of lung cancer risk in this population. Methods A total of 154 patients from the Second Hospital of Dalian Medical University from 2010 to 2021 were retrospectively analyzed, including 99 patients in the COPD group and 55 patients in the COPD with lung cancer group. Chest high resolution computed tomography (HRCT) scans and pulmonary function tests were acquired for each patient. The main analyses were as follows: (1) testing the statistical differences in basic information (such as age, body mass index and smoking index), laboratory test results, pulmonary function parameters and quantitative chest HRCT parameters between the two groups; (2) analyzing high-risk indicators for lung cancer in COPD patients using univariate analysis and binary logistic regression (LR); and (3) establishing machine learning models (LR and Gaussian process) for COPD patients with lung cancer. Results Based on the statistical analysis and LR methods, decreased body mass index, increased whole lung emphysema index, increased whole lung mean density, increased forced vital capacity and increased prothrombin time percentage activity were risk factors for lung cancer in COPD patients. For the machine learning prediction models, both LR and Gaussian process achieved an area under the receiver operating characteristic curve of 0.88 using prothrombin time percentage activity, whole lung emphysema index, whole lung mean density and forced vital capacity, combined with neuron-specific enolase and soluble fragments of cytokeratin 19, as features. Conclusion The machine learning-based prediction model for COPD combined with lung cancer can be used for early detection of lung cancer risk in COPD patients.
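A minimal sketch (with synthetic data; the feature columns are placeholders for the clinical, HRCT and tumour-marker variables listed above) of fitting logistic regression and a Gaussian process classifier and reporting the cross-validated area under the ROC curve with scikit-learn.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(154, 6))        # 6 selected features, 154 patients
y = rng.integers(0, 2, size=154)     # 0 = COPD, 1 = COPD with lung cancer

for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("GP", GaussianProcessClassifier())]:
    model = make_pipeline(StandardScaler(), clf)
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.2f} +/- {auc.std():.2f}")
```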
The finite element method is a relatively new approach for studying the mechanism of brain injury caused by blunt instruments, but its application is hindered by long computation times and the high level of expertise it requires. In this study, a rapid quantitative evaluation method based on a convolutional neural network and the finite element method was developed to analyze craniocerebral injury induced by blunt stick impacts. The velocity curve of the stick impact and the maximum principal strain of brain tissue (cerebrum, corpus callosum, cerebellum and brainstem) from the finite element simulations were used as the input and output of the convolutional neural network, which was trained and optimized using 10-fold cross-validation. For the finally selected model, the mean absolute error (MAE), mean square error (MSE) and goodness of fit (R²) for predicting the maximum principal strain were 0.084, 0.014 and 0.92 for the cerebrum, 0.062, 0.007 and 0.90 for the corpus callosum, and 0.075, 0.011 and 0.94 for the cerebellum and brainstem, respectively. These results show that the developed convolutional neural network can quickly and accurately assess local brain injury caused by stick blows and has important application value for the quantitative evaluation and understanding of such injuries. At the same time, this technique improves computational efficiency and provides a reference for shifting current acceleration-based brain injury research toward the study of local brain injury.
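A minimal sketch (layer sizes, curve length and the synthetic data are illustrative assumptions) of the surrogate idea: a 1-D CNN maps the stick-impact velocity curve to the maximum principal strain of one brain region, evaluated with 10-fold cross-validation and the MAE/MSE/R² metrics reported above.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold

class StrainCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

# Synthetic stand-ins for FE simulation data: velocity curves -> peak strain.
X = torch.randn(200, 1, 100)        # 200 impacts, 100-sample velocity curves
y = torch.rand(200)                 # max principal strain of one region

preds, trues = [], []
for tr, te in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = StrainCNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(50):             # short training loop for the sketch
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X[tr]), y[tr])
        loss.backward()
        opt.step()
    with torch.no_grad():
        preds.append(model(X[te]))
        trues.append(y[te])

p, t = torch.cat(preds).numpy(), torch.cat(trues).numpy()
print("MAE", mean_absolute_error(t, p),
      "MSE", mean_squared_error(t, p),
      "R2", r2_score(t, p))
```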
Manual segmentation of coronary arteries in computed tomography angiography (CTA) images is inefficient, and existing deep learning segmentation models often exhibit low accuracy on coronary artery images. Inspired by the Transformer architecture, this paper proposes a novel segmentation model, the double parallel encoder u-net with transformers (DUNETR). This network employed a dual-encoder design integrating Transformers and convolutional neural networks (CNNs). The Transformer encoder transformed three-dimensional (3D) coronary artery data into a one-dimensional (1D) sequential problem, effectively capturing global multi-scale feature information. Meanwhile, the CNN encoder extracted local features of the 3D coronary arteries. The complementary features extracted by the two encoders were fused through the noise reduction feature fusion (NRFF) module and passed to the decoder. Experimental results on a public dataset demonstrated that the proposed DUNETR model achieved a Dice similarity coefficient of 81.19% and a recall rate of 80.18%, representing improvements of 0.49% and 0.46%, respectively, over the next best model in comparative experiments. These results surpassed those of other conventional deep learning methods. The integration of Transformers and CNNs as dual encoders enables the extraction of rich feature information, significantly enhancing the effectiveness of 3D coronary artery segmentation. Additionally, this model provides a novel approach for segmenting other vascular structures.
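A minimal sketch of the dual-encoder idea (patch size, embedding width and a plain 1x1-convolution fusion standing in for the NRFF module and full decoder are simplifying assumptions): a Transformer branch treats the 3-D volume as a 1-D token sequence to capture global context, a CNN branch keeps local 3-D features, and the fused map is decoded into a voxel-wise coronary mask.

```python
import torch
import torch.nn as nn

class DualEncoderSeg(nn.Module):
    def __init__(self, emb=64, patch=8):
        super().__init__()
        # Transformer branch: non-overlapping 3-D patches -> token sequence.
        self.embed = nn.Conv3d(1, emb, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=emb, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # CNN branch: local 3-D features at the same (downsampled) resolution.
        self.cnn = nn.Sequential(
            nn.Conv3d(1, emb, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(emb, emb, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(emb, emb, 3, stride=2, padding=1), nn.ReLU())
        # Simple fusion + decoder (stand-in for NRFF + the U-shaped decoder).
        self.fuse = nn.Conv3d(2 * emb, emb, 1)
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=8, mode="trilinear", align_corners=False),
            nn.Conv3d(emb, 1, 3, padding=1))

    def forward(self, x):
        b = x.shape[0]
        tok = self.embed(x)                          # (B, emb, D', H', W')
        d, h, w = tok.shape[2:]
        seq = self.transformer(tok.flatten(2).transpose(1, 2))
        glob = seq.transpose(1, 2).reshape(b, -1, d, h, w)
        local = self.cnn(x)                          # same spatial size
        return self.decoder(self.fuse(torch.cat([glob, local], dim=1)))

# Usage on a toy 64^3 CTA patch (assumed input size).
mask_logits = DualEncoderSeg()(torch.randn(1, 1, 64, 64, 64))
print(mask_logits.shape)                             # (1, 1, 64, 64, 64)
```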
The counting and recognition of white blood cells in blood smear images play an important role in the diagnosis of blood diseases, including leukemia. Traditional manual testing is easily affected by many factors, so an automatic leukocyte analysis system is needed to provide doctors with auxiliary diagnosis, and blood leukocyte segmentation is the basis of such automatic analysis. In this paper, we improved the U-Net model and proposed a leukocyte image segmentation algorithm based on a dual-path network and atrous spatial pyramid pooling. First, the dual-path network was introduced into the feature encoder to extract multi-scale leukocyte features, and atrous spatial pyramid pooling was used to enhance the network's feature extraction ability. Then a feature decoder composed of convolution and deconvolution layers restored the segmented target to the original image size, achieving pixel-level segmentation of blood leukocytes. Finally, qualitative and quantitative experiments were carried out on three leukocyte datasets to verify the effectiveness of the algorithm. The results showed that, compared with other representative algorithms, the proposed blood leukocyte segmentation algorithm achieved better segmentation results, with mIoU values above 0.97. It is hoped that the method will facilitate the automatic auxiliary diagnosis of blood diseases in the future.
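A minimal sketch (dilation rates and channel width are typical choices, not necessarily the paper's) of an atrous spatial pyramid pooling (ASPP) block of the kind inserted into the U-Net encoder to enlarge the receptive field and capture multi-scale context.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        # Parallel atrous convolutions with different dilation rates.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])
        # Image-level pooling branch for global context.
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.pool(x), size=x.shape[2:], mode="bilinear",
                          align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))

# Usage: bottleneck feature map of a leukocyte U-Net (sizes assumed).
y = ASPP(256, 256)(torch.randn(1, 256, 32, 32))
print(y.shape)                                   # (1, 256, 32, 32)
```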
High resolution (HR) magnetic resonance imaging (MRI) or computed tomography (CT) images can provide clearer anatomical details of the human body, facilitating early diagnosis of disease. However, limitations of the imaging system, the imaging environment and human factors make it difficult to obtain clear high-resolution images. In this paper, we proposed a novel medical image super resolution (SR) reconstruction method based on a multi-scale information distillation (MSID) network in the non-subsampled shearlet transform (NSST) domain, termed the NSST-MSID network. We first designed an MSID network consisting mainly of a series of stacked MSID blocks to fully exploit image features and effectively restore low resolution (LR) images to HR images. In addition, most previous methods predict HR images in the spatial domain, producing over-smoothed outputs and losing texture details. We therefore formulated the medical image SR task as the prediction of NSST coefficients, which enables the MSID network to preserve richer structural detail than prediction in the spatial domain. Finally, experimental results on our constructed medical image datasets demonstrated that the proposed method achieved better peak signal to noise ratio (PSNR), structural similarity (SSIM) and root mean square error (RMSE) values than other state-of-the-art methods, while better preserving global topological structure and local texture detail, thus achieving good medical image reconstruction.
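A minimal sketch (the channel split ratio, kernel sizes and number of distillation steps are assumptions; the paper's exact MSID block may differ) of a multi-scale information distillation block: at each step part of the channels is distilled and kept, the rest is refined by parallel 3x3 and 5x5 convolutions, and all distilled features are finally fused by a 1x1 convolution with a residual connection.

```python
import torch
import torch.nn as nn

class MSIDBlock(nn.Module):
    def __init__(self, ch=64, keep=16, steps=3):
        super().__init__()
        self.keep, self.steps = keep, steps
        self.refine3, self.refine5 = nn.ModuleList(), nn.ModuleList()
        c = ch
        for _ in range(steps):
            c_rem = c - keep
            self.refine3.append(nn.Conv2d(c_rem, c_rem // 2, 3, padding=1))
            self.refine5.append(nn.Conv2d(c_rem, c_rem - c_rem // 2, 5, padding=2))
            c = c_rem
        self.fuse = nn.Conv2d(keep * steps + c, ch, 1)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        distilled, rest = [], x
        for k in range(self.steps):
            keep_part, rest = rest[:, :self.keep], rest[:, self.keep:]
            distilled.append(keep_part)          # distilled channels are kept
            rest = self.act(torch.cat([self.refine3[k](rest),
                                       self.refine5[k](rest)], dim=1))
        return x + self.fuse(torch.cat(distilled + [rest], dim=1))

# Usage: a stack of MSID blocks applied to NSST subband feature maps
# (64 feature channels and a 48x48 LR patch are assumed).
feat = torch.randn(1, 64, 48, 48)
out = nn.Sequential(MSIDBlock(), MSIDBlock())(feat)
print(out.shape)                                 # (1, 64, 48, 48)
```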