The rapid development of high-throughput chromatin conformation capture (Hi-C) technology provides rich data on interactions between chromosomal loci for chromatin structure analysis. However, existing methods for identifying topologically associating domains (TADs) from Hi-C data suffer from low accuracy and sensitivity to parameters. To address this, this paper designs and implements a TAD identification method based on spatial density clustering. The method first preprocesses the raw Hi-C data to obtain a normalized Hi-C contact matrix. It then computes the distance matrix between loci, generates a reachability graph from the core distance and reachability distance of each locus, and extracts clusters. Finally, it derives TAD boundaries from the clustering results. The method identifies TAD structures with higher coherence, and the resulting TAD boundaries are enriched with more ChIP-seq factors. Experimental results demonstrate that the method offers higher accuracy and practical value in TAD identification.
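The core-distance and reachability-distance steps described in this abstract match the general OPTICS scheme, so a minimal sketch of that scheme is given here for orientation. It is not the authors' implementation. The contact-to-distance conversion `1 / (1 + contacts)`, the function names, and the fixed-threshold cluster extraction are all assumptions made for illustration.

```python
import math

def contact_to_distance(contacts):
    """Hypothetical conversion: more contacts means a smaller distance."""
    n = len(contacts)
    return [[0.0 if i == j else 1.0 / (1.0 + contacts[i][j]) for j in range(n)]
            for i in range(n)]

def optics_order(dist, min_pts):
    """Return the OPTICS visiting order and the reachability of each locus."""
    n = len(dist)

    def core_distance(p):
        # Distance to the min_pts-th nearest neighbour (the locus itself counts).
        return sorted(dist[p])[min_pts - 1]

    reach = [math.inf] * n
    processed = [False] * n
    order = []
    for start in range(n):
        if processed[start]:
            continue
        current = start
        while current is not None:
            processed[current] = True
            order.append(current)
            cd = core_distance(current)
            # Reachability of q from current is max(core distance, distance).
            for q in range(n):
                if not processed[q]:
                    reach[q] = min(reach[q], max(cd, dist[current][q]))
            # Visit next the unprocessed locus with the smallest reachability.
            seeds = [q for q in range(n)
                     if not processed[q] and reach[q] < math.inf]
            current = min(seeds, key=lambda q: (reach[q], q)) if seeds else None
    return order, reach

def split_by_threshold(order, reach, threshold):
    """Simplified extraction: start a new cluster at each reachability peak."""
    clusters = []
    for p in order:
        if reach[p] > threshold or not clusters:
            clusters.append([])
        clusters[-1].append(p)
    return clusters
```

In this sketch, the first locus of each extracted cluster is where a candidate domain boundary would sit. A real pipeline would also need to handle noise loci and choose the threshold from the data.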
Simulations of deoxyribonucleic acid (DNA) damage with an atomic-level geometric model rely on a traversal algorithm that is time-consuming, converges slowly and requires high-performance computing. This work therefore presents a density-based spatial clustering of applications with noise (DBSCAN) algorithm that operates on the spatial distributions of energy depositions and hydroxyl radicals (·OH). Combined with probability and statistics, the algorithm quickly gives the DNA strand break yields and helps to study the variation pattern of clustered DNA damage. First, we used Geant4-DNA, a Monte Carlo simulation toolkit for radiobiology, to simulate the transport of protons and secondary particles through the nucleus as well as the ionization and excitation of water molecules, and obtained the distributions of energy depositions and hydroxyl radicals. We then applied damage probability functions to obtain the spatial distribution of DNA damage points in a simplified geometric model. DBSCAN clustering based on the density of damage points was used to determine the single-strand break (SSB) and double-strand break (DSB) yields. Finally, we analyzed how the strand break yields vary with particle linear energy transfer (LET) and summarized the variation pattern of damage clusters. The simulation results show that the new algorithm is faster than the traversal algorithm while retaining good precision, and that its results are consistent with other experiments and simulations. This work provides more precise information on clustered DNA damage induced by proton radiation at the molecular level at high speed, and thus offers an essential and powerful method for studying the mechanisms of radiation-induced biological damage.
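The clustering step in this abstract can be illustrated with a small, self-contained DBSCAN applied to damage points. This is a sketch under stated assumptions, not the paper's code: the classification rule (a cluster containing damage on both strands counts as one DSB, everything else as an SSB), the strand labels, and the default radius `eps=3.4` nm (roughly 10 base pairs at 0.34 nm per base pair) are hypothetical choices.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbours(j)
            if len(nbj) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbj)
    return labels

def count_breaks(points, strands, eps=3.4, min_pts=2):
    """Assumed rule: a cluster touching both strands is a DSB, else an SSB."""
    labels = dbscan(points, eps, min_pts)
    ssb = sum(1 for lab in labels if lab == -1)  # isolated damage points
    dsb = 0
    for c in set(labels) - {-1}:
        hit = {strands[i] for i, lab in enumerate(labels) if lab == c}
        if len(hit) == 2:
            dsb += 1
        else:
            ssb += 1
    return ssb, dsb
```

Each point is an `(x, y, z)` position in nanometres and `strands` holds `1` or `2` for the strand that was hit. The brute-force neighbour search is quadratic, which is acceptable for a sketch but would be replaced by a spatial index for realistic track sizes.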
The incidence of Parkinson’s disease (PD) is gradually increasing, which seriously affects patients’ quality of life and adds to the burden of diagnosis and treatment. However, early intervention is difficult because means of early monitoring are limited. To find an effective biomarker of PD, this work extracted the correlation between each pair of electroencephalogram (EEG) channels in each frequency band using weighted symbolic mutual information and k-means clustering. The results showed that State1 of the beta band (P = 0.034) and State5 of the gamma band (P = 0.010) could differentiate healthy controls from off-medication PD (PD-off) patients. These findings indicate significant differences in resting-state channel-wise correlation states between PD patients and healthy subjects. However, no significant differences were found between on-medication PD (PD-on) and PD-off patients, or between PD-on patients and healthy controls. These results may provide a reference for the clinical diagnosis of PD.
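To make the channel-pair measure in this abstract more concrete, the sketch below computes a plain symbolic mutual information between two signals using ordinal patterns. It deliberately omits the weighting that distinguishes the weighted variant, and the pattern order, lag, and function names are illustrative assumptions, not settings reported by the study.

```python
import math
from collections import Counter

def symbolize(signal, order=3, lag=1):
    """Map each window of `order` samples to the rank pattern of its values."""
    span = (order - 1) * lag
    symbols = []
    for t in range(len(signal) - span):
        window = [signal[t + k * lag] for k in range(order)]
        # The symbol is the permutation that sorts the window.
        symbols.append(tuple(sorted(range(order), key=lambda k: window[k])))
    return symbols

def symbolic_mutual_information(x, y, order=3, lag=1):
    """Unweighted mutual information (in nats) between two symbol sequences."""
    sx, sy = symbolize(x, order, lag), symbolize(y, order, lag)
    n = min(len(sx), len(sy))
    sx, sy = sx[:n], sy[:n]
    joint = Counter(zip(sx, sy))
    px, py = Counter(sx), Counter(sy)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts.
        mi += p_ab * math.log(count * n / (px[a] * py[b]))
    return mi
```

Evaluating this for every channel pair in a time window would give one connectivity vector per window, and those vectors are the kind of input a k-means step could group into recurring states.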
Objective To predict the total hospitalization expenses of bronchopneumonia inpatients in a tertiary hospital of Sichuan Province using back-propagation (BP) neural network and support vector machine models, and to analyze the influencing factors. Methods The medical record front page information of 749 bronchopneumonia cases discharged from a tertiary hospital of Sichuan Province in 2017 was collected and compiled. The BP neural network model and the support vector machine model were built with SPSS 20.0 and Clementine, respectively, to predict the total hospitalization expenses and analyze the influencing factors. Results The accuracy of the BP neural network model in predicting the total hospitalization expenses was 81.2%, and the top three influencing factors and their importance values were length of hospital stay (0.477), age (0.154), and discharge department (0.083). The accuracy of the support vector machine model was 93.4%, and the top three influencing factors and their importance values were length of hospital stay (0.215), age (0.196), and marital status (0.172). However, after stratified analysis by the Mantel-Haenszel method, the correlation between marital status and total hospitalization expenses was not statistically significant (χ2=0.137, P=0.711). Conclusions The BP neural network model and the support vector machine model can both be applied to predict the total hospitalization expenses of patients with bronchopneumonia and to analyze the influencing factors. In this study, the support vector machine predicted better than the BP neural network. Length of hospital stay is an important influencing factor of the total hospitalization expenses of bronchopneumonia patients, so shortening the length of hospital stay can significantly lighten the economic burden on these patients.
Ventricle segmentation in echocardiography yields ventricular volume parameters and helps to evaluate cardiac function. However, ultrasound images are noisy and difficult to segment, so manual segmentation of the object region is a heavy workload, while existing automatic segmentation techniques cannot guarantee accuracy. To solve this problem, a novel algorithm framework is proposed to segment the ventricle. First, a faster region-based convolutional neural network (Faster R-CNN) locates the object to obtain the region of interest. Second, K-means is used to pre-segment the image, and a mean shift algorithm with an adaptive kernel bandwidth is proposed to segment the region of interest. Finally, a region growing algorithm extracts the object region. With this framework, the ventricle is obtained automatically without manual localization. Experiments show that the framework segments the object accurately, and that in quantitative evaluation the adaptive mean shift is more stable and accurate than mean shift with a fixed bandwidth. These results show that the proposed method is helpful for automatic segmentation of the left ventricle in echocardiography.
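The final step of the framework in this abstract, region growing, can be sketched in a few lines. This is a generic illustration rather than the authors' code: the 4-connected neighbourhood, the comparison against the seed intensity, and the `tolerance` parameter are assumptions chosen to keep the example small.

```python
from collections import deque

def region_grow(image, seed, tolerance):
    """Collect 4-connected pixels whose intensity is close to the seed's."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if (inside and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

Here `image` is a list of rows of grey values and `seed` is a `(row, column)` pair. Pixels with a similar intensity that are not connected to the seed stay outside the region, which is the property that lets a localized seed isolate a single chamber.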
Under the minimum free energy model, it is important to predict the RNA secondary structure accurately and efficiently from the suboptimal foldings. Applying clustering techniques to the suboptimal structures can effectively improve prediction accuracy. This paper proposes an improved k-medoids clustering method that uses the RBP score and an incremental candidate set of medoids matrix to improve accuracy. The algorithm optimizes the initial medoids by gradually expanding the medoid candidate set. The results indicate that the algorithm achieves a higher CH value and significantly shortens the time needed to cluster RNA folding structures.
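For readers unfamiliar with k-medoids, the sketch below shows the baseline alternating version on dot-bracket structures. It does not reproduce the improved method of this abstract: the base-pair distance is swapped in as a stand-in for the RBP score, and the incremental candidate set is replaced by initial medoids supplied by the caller. All function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs

def bp_distance(a, b):
    """Stand-in distance: number of base pairs present in only one structure."""
    return len(base_pairs(a) ^ base_pairs(b))

def assign(dist, medoids):
    """Index of the nearest medoid for every item."""
    return [min(range(len(medoids)), key=lambda k: dist[i][medoids[k]])
            for i in range(len(dist))]

def k_medoids(dist, medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for k, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(old)
                continue
            # The new medoid minimizes total distance to its cluster members.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)
```

Because only a distance matrix is required, any structure-level score can be substituted for `bp_distance` without touching the clustering loop.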
Our team proposed and constructed an Expert-knowledge and Data-driven Comprehensive Evaluation Model of Chinese Patent Medicine (EDCEM-CPM) using machine learning algorithms. The model can technically improve the comprehensive evaluation system for Chinese patent medicine and provide measurement tools suited to its characteristics. It evaluates the multi-dimensional value of Chinese patent medicine through data pre-treatment, clustering algorithms, and data training steps such as automatic learning of weights. The evaluation model is already in practical use. This paper describes how the model was established and its calculation process for reference.
Objective To investigate the dietary patterns of rural residents in high-incidence areas of esophageal cancer (EC), and to explore the clustering of risk factors associated with high-incidence characteristics and the factors influencing it. Methods A purpose-designed structured questionnaire was used in a face-to-face survey of the dietary patterns of rural residents in Yanting county of Sichuan Province from July to August 2021. Univariate and multivariate logistic regression models were used to analyze the factors influencing the clustering of EC risk factors. Results There were 838 valid questionnaires in this study. A total of 90.8% of rural residents used clean water such as tap water. In the past year, the proportions of people who frequently ate fruits and vegetables, soybean products, and onions and garlic were 69.5%, 32.8% and 74.5%, respectively; the proportions who rarely ate kimchi, pickled vegetables, sauerkraut, barbecue, hot food and moldy food were 59.2%, 79.6%, 68.2%, 90.3%, 80.9% and 90.3%, respectively. Clustering of EC risk factors was found in 73.3% of residents, and the aggregation of two risk factors was the most common mode (28.2%), among which the combination of tumor history and preserved food was the main clustering pattern (4.6%). The logistic regression model revealed that gender, age, marital status and occupation were independent influencing factors for the clustering of EC risk factors (P<0.05). Conclusion A majority of rural residents in high-incidence areas of EC in Yanting county have good eating habits, but the clustering of some risk factors remains at a high level. Gender, age, marital status, and occupation are influencing factors for the clustering of EC risk factors.
To develop safe training intensities and methods for a passive balance rehabilitation training system, this paper proposes a mathematical model of human standing balance adjustment based on the Takagi-Sugeno (T-S) fuzzy identification method. The model takes the acceleration of a multidimensional motion platform as its input and human joint angles as its outputs. We used the artificial bee colony optimization algorithm to improve the fuzzy C-means clustering algorithm, which made identification of the antecedent parameters more efficient. Data from 9 subjects were collected experimentally and used for model training and validation. Based on the mean square error and cross-correlation between the simulated and measured data, we concluded that the model is accurate and reasonable.
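The fuzzy C-means step mentioned in this abstract can be sketched in its standard form. The artificial bee colony improvement is not reproduced here: the initial centers are simply passed in by the caller, and the fuzzifier `m=2.0`, the iteration limit, and the tolerance are assumed defaults rather than values from the paper.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, n_iter=100, tol=1e-9):
    """Standard fuzzy C-means; returns (centers, membership matrix)."""
    centers = [list(c) for c in centers]
    dim = len(points[0])
    u = []
    for _ in range(n_iter):
        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        u = []
        for x in points:
            d = [math.dist(x, c) for c in centers]
            if 0.0 in d:
                # The point coincides with a center: give it full membership.
                row = [1.0 if di == 0.0 else 0.0 for di in d]
            else:
                row = [di ** (-2.0 / (m - 1.0)) for di in d]
            total = sum(row)
            u.append([v / total for v in row])
        # Center update: mean of the points weighted by u_ik ** m.
        new_centers = []
        for i in range(len(centers)):
            w = [u[k][i] ** m for k in range(len(points))]
            total = sum(w)
            new_centers.append(
                [sum(w[k] * points[k][j] for k in range(len(points))) / total
                 for j in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

In a T-S fuzzy model, the resulting centers and memberships would typically parameterize the antecedent membership functions. The returned memberships come from the last iteration performed, so they match the final centers to within the tolerance.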
Accurate segmentation of pulmonary nodules is an important basis for doctors to diagnose lung cancer. To address incorrect segmentation of pulmonary nodules, in particular the difficulty of separating adhesive nodules connected to the chest wall or blood vessels, this paper proposes an improved random walk method to segment difficult pulmonary nodules accurately. The innovation is to introduce geodesic distance to redefine the weights in the random walk, combining the coordinates of the nodes and seed points in the image with the spatial distance. Computed tomography (CT) images of 17 patients with different types of pulmonary nodules were selected for segmentation experiments, and the results were compared with those of the traditional random walk method and several methods from the literature. The experiments show that the proposed method segments pulmonary nodules with good accuracy: accuracy exceeds 88% with a segmentation time of less than 4 seconds. The results could assist doctors in diagnosing benign and malignant pulmonary nodules and improve clinical efficiency.