Objective To propose a heart sound segmentation method based on a multi-feature fusion network. Methods Data were obtained from the CinC/PhysioNet 2016 Challenge dataset (a total of 3 153 recordings from 764 patients, about 91.93% of whom were male, with an average age of 30.36 years). First, features were extracted in the time domain and the time-frequency domain respectively, and redundant features were removed through feature dimensionality reduction. Then, feature selection was applied to pick the optimal features separately from the two best-performing feature spaces. Next, multi-feature fusion was performed through multi-scale dilated convolution, cooperative fusion, and a channel attention mechanism. Finally, the fused features were fed into a bidirectional gated recurrent unit (BiGRU) network to obtain the heart sound segmentation results. Results The proposed method achieved precision, recall and F1 score of 96.70%, 96.99%, and 96.84%, respectively. Conclusion The multi-feature fusion network proposed in this study achieves strong heart sound segmentation performance and can provide high-accuracy segmentation support for the design of automatic heart-sound-based analysis of heart diseases.
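As a quick consistency check on the reported metrics, recall that the F1 score is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the stated precision and recall; the function name `f1_from_pr` is illustrative and does not come from the study.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f1_from_pr(96.70, 96.99)
print(round(f1, 2))  # agrees with the reported 96.84
```

The recomputed value matches the reported F1 score to two decimal places, so the three figures are mutually consistent.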
During lower limb rehabilitation training, fatigue estimation is of great significance for improving the accuracy of intention recognition and avoiding secondary injury. However, most existing methods consider only surface electromyography (sEMG) features and ignore electrocardiogram (ECG) features when estimating fatigue, which leads to low and unstable recognition rates. To address this problem, a method that uses fused ECG and sEMG features to estimate fatigue during lower limb rehabilitation was proposed, together with an improved particle swarm optimization-support vector machine classifier (improved PSO-SVM) that identifies the fused feature vector. The three states of relaxation, transition and fatigue were recognized accurately, with recognition rates of 98.5%, 93.5%, and 95.5%, respectively. Comparative experiments showed that the average recognition rate of this method was 4.50% higher than that obtained with sEMG features alone, and 13.66% higher than that obtained with combined ECG and sEMG features without feature fusion. These results show that fusing ECG and sEMG features during lower limb rehabilitation training allows fatigue to be recognized more accurately.
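To illustrate the optimization step, the following is a minimal sketch of standard particle swarm optimization, not the improved variant proposed in the study (whose details the abstract does not give). In a PSO-SVM setup the objective would typically be a cross-validated classification error over SVM hyperparameters; here a toy quadratic `toy_error` stands in for it. All names, bounds and coefficients are assumptions made for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO over a box-constrained search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(params):
    """Hypothetical stand-in for a cross-validated SVM error surface."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, err = pso_minimize(toy_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print(best, err)
```

Replacing `toy_error` with a function that trains and validates an SVM for the given hyperparameters would turn this into a basic PSO-SVM search.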
Current studies on electroencephalogram (EEG) emotion recognition primarily concentrate on discrete stimulus paradigms under controlled laboratory settings, which cannot adequately represent the dynamic transition characteristics of emotional states during multi-context interactions. To address this issue, this paper proposes a novel method for emotion transition recognition that leverages a cross-modal feature fusion and global perception network (CFGPN). Firstly, an experimental paradigm encompassing six types of emotion transition scenarios was designed, and EEG and eye movement data were simultaneously collected from 20 participants, each annotated with dynamic continuous emotion labels. Subsequently, deep canonical correlation analysis integrated with a cross-modal attention mechanism was employed to fuse features from EEG and eye movement signals, resulting in multimodal feature vectors enriched with highly discriminative emotional information. These vectors are then input into a parallel hybrid architecture that combines convolutional neural networks (CNNs) and Transformers. The CNN is employed to capture local time-series features, whereas the Transformer leverages its robust global perception capabilities to effectively model long-range temporal dependencies, enabling accurate dynamic emotion transition recognition. The results demonstrate that the proposed method achieves the lowest mean square error in both valence and arousal recognition tasks on the dynamic emotion transition dataset and a classic multimodal emotion dataset. It exhibits superior recognition accuracy and stability when compared with five existing unimodal and six multimodal deep learning models. The approach enhances both adaptability and robustness in recognizing emotional state transitions in real-world scenarios, showing promising potential for applications in the field of biomedical engineering.
To address current problems in medical equipment maintenance, this study proposed an intelligent fault diagnosis method for medical equipment based on a long short-term memory (LSTM) network. First, without circuit drawings or knowledge of the signal direction on the circuit board, the fault symptoms and port electrical signals of 7 different fault categories were collected and preprocessed through feature coding, normalization, fusion and screening. Then, an intelligent fault diagnosis model was built on LSTM, and the fused and screened multi-modal features were used in fault classification and identification experiments. The results were compared with those obtained using port electrical signals alone, fault symptoms alone, and the unscreened fusion of the two. In addition, the fault diagnosis algorithm was compared with a BP neural network (BPNN), a recurrent neural network (RNN) and a convolutional neural network (CNN). The results show that, with the fused and screened multi-modal features, the average classification accuracy of the LSTM model reaches 0.970 9, which is higher than that obtained with port electrical signals alone, fault symptoms alone, or the fusion of the two. The model is also more accurate than BPNN, RNN and CNN, and provides a feasible new approach to intelligent fault diagnosis of similar equipment.
Speech feature learning is central to speech-based recognition of mental illness. Deep feature learning can extract speech features automatically, but it is limited by small sample sizes. Traditional feature extraction (original features) avoids the impact of small samples, but it relies heavily on experience and adapts poorly. To solve this problem, this paper proposes a deep embedded hybrid feature sparse stack autoencoder manifold ensemble algorithm. First, psychotic speech features are extracted on the basis of prior knowledge to construct the original features. Second, the original features are embedded in the sparse stack autoencoder (deep network), and the output of the hidden layer is filtered to enhance the complementarity between the deep features and the original features. Third, an L1 regularization feature selection mechanism is designed to reduce the dimensionality of the mixed feature set composed of deep features and original features. Finally, a weighted locality preserving projection algorithm and an ensemble learning mechanism are designed, and a manifold projection classifier ensemble model is constructed, which further improves the classification stability of feature fusion under small samples. In addition, this paper designs, for the first time, a medium-to-large-scale psychotic speech collection program, and collects and constructs a large-scale Chinese psychotic speech database for verifying psychotic speech recognition algorithms. The experimental results show that the main innovations of the algorithm are effective and that its classification accuracy is higher than that of other representative algorithms, with a maximum improvement of 3.3%. In conclusion, this paper proposes a new method of psychotic speech recognition based on an embedded mixed sparse stack autoencoder and a manifold ensemble, which effectively improves the recognition rate of psychotic speech.
Existing emotion recognition research is typically limited to static laboratory settings and has not fully addressed the changes in emotional states that occur in dynamic scenarios. To address this problem, this paper proposes a method for dynamic continuous emotion recognition based on electroencephalography (EEG) and eye movement signals. First, an experimental paradigm was designed to cover six dynamic emotion transition scenarios: happy to calm, calm to happy, sad to calm, calm to sad, nervous to calm, and calm to nervous. EEG and eye movement data were collected simultaneously from 20 subjects to fill the gap in current multimodal dynamic continuous emotion datasets. In the valence-arousal two-dimensional space, emotion ratings for the stimulus videos were given every five seconds on a scale of 1 to 9, and the resulting dynamic continuous emotion labels were normalized. Subsequently, frequency band features were extracted from the preprocessed EEG and eye movement data. A cascade feature fusion approach was used to combine the EEG and eye movement features into an information-rich multimodal feature vector. This feature vector was input into four regression models (support vector regression with a radial basis function kernel, decision tree, random forest, and K-nearest neighbors) to build the dynamic continuous emotion recognition model. The results showed that the proposed method achieved the lowest mean square error for valence and arousal across the six dynamic continuous emotions. This approach can accurately recognize various emotion transitions in dynamic situations and offers higher accuracy and robustness than using either EEG or eye movement signals alone, making it well suited to practical applications.
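The pipeline described above (label normalization, cascade fusion, regression, mean square error) can be sketched in a few lines. This is a minimal illustration under stated assumptions: cascade fusion is taken to mean feature concatenation, labels are assumed to be min-max scaled from the 1 to 9 range to [0, 1], and only the K-nearest neighbors regressor is shown. The feature values and function names are invented for the example.

```python
import math


def normalize_rating(rating, lo=1.0, hi=9.0):
    """Assumed min-max scaling of a 1-to-9 rating into [0, 1]."""
    return (rating - lo) / (hi - lo)


def cascade_fuse(eeg_feats, eye_feats):
    """Concatenate each sample's EEG and eye-movement feature vectors."""
    return [e + m for e, m in zip(eeg_feats, eye_feats)]


def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the labels of the k nearest training samples."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (hypothetical): 2 EEG band features and 1 eye-movement feature per sample.
eeg = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
eye = [[0.0], [0.1], [1.0], [0.9]]
valence = [0.2, 0.3, 0.8, 0.9]  # already normalized labels

fused = cascade_fuse(eeg, eye)
pred = knn_predict(fused, valence, query=[0.15, 0.15, 0.05], k=2)
print(fused[0], pred)
```

Swapping `knn_predict` for another regressor leaves the fusion and evaluation steps unchanged, which is how the four models in the abstract can be compared on equal terms.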
Effective classification of multi-task motor imagery electroencephalogram (EEG) signals helps to achieve accurate multi-dimensional human-computer interaction, and exploiting the strong subject-specific characteristics of EEG in the frequency domain can improve classification accuracy and robustness. Therefore, this paper proposed a multi-task EEG signal classification method based on an adaptive time-frequency common spatial pattern (CSP) combined with a convolutional neural network (CNN). The subjects' personalized rhythm characteristics were extracted by adaptive spectrum awareness, the spatial characteristics were calculated using one-versus-rest CSP, and composite time-domain characteristics were then extracted to construct spatial-temporal-frequency multi-level fusion features. Finally, the CNN was used to perform accurate and robust four-task classification. The algorithm was verified on a self-collected dataset containing 10 subjects (33 ± 3 years old, inexperienced) and on dataset 2a of the 2008 Brain-Computer Interface Competition IV (BCI competition Ⅳ-2a). The average accuracy of the proposed algorithm for four-task classification reached 93.96% and 84.04%, respectively. Compared with other advanced algorithms, the average classification accuracy of the proposed algorithm was significantly improved, and the range of accuracy across subjects was significantly reduced on the public dataset. The results show that the proposed algorithm performs well in multi-task classification and can effectively improve classification accuracy and robustness.
Colorectal polyps are important early markers of colorectal cancer, and their early detection is crucial for cancer prevention. Although existing polyp segmentation models have achieved certain results, they still face challenges such as diverse polyp morphology, blurred boundaries, and insufficient feature extraction. To address these issues, this study proposes a parallel coordinate fusion network (PCFNet), aiming to improve the accuracy and robustness of polyp segmentation. PCFNet integrates parallel convolutional modules and a coordinate attention mechanism, enabling the preservation of global feature information while precisely capturing detailed features, thereby effectively segmenting polyps with complex boundaries. Experimental results on Kvasir-SEG and CVC-ClinicDB demonstrate the outstanding performance of PCFNet across multiple metrics. Specifically, on the Kvasir-SEG dataset, PCFNet achieved an F1-score of 0.897 4 and a mean intersection over union (mIoU) of 0.835 8; on the CVC-ClinicDB dataset, it attained an F1-score of 0.939 8 and an mIoU of 0.892 3. Compared with other methods, PCFNet shows significant improvements across all performance metrics, particularly in multi-scale feature fusion and spatial information capture, demonstrating its innovativeness. The proposed method provides a more reliable AI-assisted diagnostic tool for early colorectal cancer screening.
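For readers unfamiliar with the two reported metrics, the sketch below computes the F1-score (equivalent to the Dice coefficient for binary masks) and the intersection over union for a single pair of binary masks. It is a minimal illustration only: the masks are invented, and the abstract does not state how mIoU was averaged (for example over images or over classes), so no averaging is shown.

```python
def f1_and_iou(pred_mask, true_mask):
    """Pixel-wise F1 (Dice) and IoU for two binary masks of equal shape."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred_mask, true_mask):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1      # polyp pixel predicted as polyp
            elif p and not t:
                fp += 1      # background predicted as polyp
            elif t and not p:
                fn += 1      # polyp pixel missed
    total = tp + fp + fn
    if total == 0:
        return 1.0, 1.0      # both masks empty: treat as perfect agreement
    return 2 * tp / (2 * tp + fp + fn), tp / total


# Toy 2x3 masks (hypothetical): 1 marks a polyp pixel.
pred = [[1, 1, 0],
        [0, 1, 0]]
true = [[1, 0, 0],
        [0, 1, 1]]

f1, iou = f1_and_iou(pred, true)
print(f1, iou)
```

For a single mask pair the two metrics are linked by IoU = F1 / (2 - F1); that identity does not carry over to dataset-level averages, which is why the reported F1 and mIoU values need not satisfy it.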
Lung nodules are the main manifestation of early lung cancer, so their accurate detection is of great significance for the early diagnosis and treatment of lung cancer. However, rapid and accurate detection of pulmonary nodules is challenging because of the complex background and large detection range of pulmonary computed tomography (CT) images and the varied sizes and shapes of the nodules. Therefore, this paper proposes a multi-scale feature fusion algorithm for the automatic and accurate detection of pulmonary nodules. First, a three-tier modular lung nodule detection model was designed on the basis of VGG16, a deep convolutional network for large-scale image recognition. The first-tier module of the network extracts the features of pulmonary nodules in CT images and roughly estimates their location. The second-tier module fuses multi-scale image features to further enhance the details of pulmonary nodules. The third-tier module fuses and analyzes the features of the first-tier and second-tier modules to obtain multi-scale candidate boxes for pulmonary nodules. Finally, the multi-scale candidate boxes were filtered by non-maximum suppression to obtain the final location of the pulmonary nodules. The algorithm was validated on pulmonary nodule data from the public LIDC-IDRI dataset, where the average detection accuracy was 90.9%.
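The final filtering step can be illustrated with a minimal sketch of greedy non-maximum suppression on axis-aligned boxes. The box format `(x1, y1, x2, y2)`, the IoU threshold of 0.5 and the example boxes are assumptions for illustration; the abstract does not specify the settings used in the study.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical candidate boxes: the first two cover the same nodule.
boxes = [(10, 10, 30, 30), (12, 12, 32, 32), (50, 50, 70, 70)]
scores = [0.9, 0.8, 0.7]

kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap either and is kept.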
Remote photoplethysmography is susceptible to motion artifacts and individual physiological variations in complex environments. This paper proposes a remote heart rate estimation method based on frequency regulation and multi-scale spatio-temporal modeling. To address artifact noise issues, a frequency-regulated normalization module is designed to emphasize the dominant heart rate frequency while suppressing noise. To address the issue of individual physiological variations, the proposed method introduces a multi-level spatio-temporal feature fusion module to comprehensively capture physiological information through multi-scale convolutions and cross-layer integration. Subsequently, a dynamic weighting spatio-temporal feature module is introduced during spatio-temporal modeling to enhance long-term dependency modeling. Experimental results demonstrate that the proposed method achieves superior performance in cross-dataset evaluation. When trained on the PURE dataset and tested on the UBFC-rPPG dataset, the mean absolute error decreases from 1.31 to 1.28. Conversely, when trained on the UBFC-rPPG dataset and tested on the PURE dataset, the mean absolute error further decreases from 0.97 to 0.82. These results significantly outperform existing state-of-the-art methods, demonstrating the strong generalization capability and outstanding performance of our model across datasets. From the perspectives of frequency-regulated and multi-scale spatio-temporal modeling, this work enriches the modeling methodology for remote photoplethysmography pulse wave-based heart rate estimation, enhancing the stability and usability of remote heart rate estimation under complex interference and cross-scenario conditions.
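To make the notions of "dominant heart rate frequency" and mean absolute error concrete, the sketch below estimates heart rate from a synthetic pulse signal by locating the strongest spectral peak in a plausible heart rate band, then scores estimates with MAE. This is a generic signal-processing illustration, not the network proposed in the paper; the 42 to 240 bpm band, the 30 Hz sampling rate and the synthetic signal are assumptions.

```python
import cmath
import math


def dominant_bpm(signal, fs, bpm_min=42, bpm_max=240):
    """Return the integer bpm whose frequency carries the most spectral power."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]  # remove the DC component
    best_bpm, best_power = bpm_min, -1.0
    for bpm in range(bpm_min, bpm_max + 1):
        f = bpm / 60.0  # candidate frequency in Hz
        coeff = sum(x * cmath.exp(-2j * math.pi * f * n / fs)
                    for n, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic 10 s signal at 30 Hz: a 1.2 Hz pulse (72 bpm) plus a weaker
# 0.3 Hz component that lies outside the searched heart rate band.
fs = 30
pulse = [math.sin(2 * math.pi * 1.2 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 0.3 * n / fs)
         for n in range(fs * 10)]

estimate = dominant_bpm(pulse, fs)
print(estimate)  # 72
print(mean_absolute_error([70, 80], [72, 79]))  # 1.5
```

Restricting the search to a physiological band is a simple way to ignore out-of-band interference; real recordings with motion artifacts inside the band are the harder case that the proposed method targets.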