Image segmentation is a crucial step in many image analysis pipelines and is one of the cutting-edge areas of digital pathology. The advent of quantitative analysis has enabled the evaluation of millions of individual cells in tissues, allowing morphological features, biomarker expression, and spatial context to be assessed together. The recorded cells can be described as a spatial point pattern. However, classical statistical approaches to point patterns prove unreliable in this context because the domain contains multiple irregularly shaped, cell-devoid interstitial spaces whose coordinates are unknown. These spaces correspond to anatomical features (e.g., vessels, lipid vacuoles, glandular lumina) or to tissue artefacts (e.g., tissue fractures), and they impede accurate calculation of the domain area, which biases clustering measurements. Moreover, mistakenly including empty regions of the domain can directly affect the results of hypothesis testing. To date, the literature offers no bias-correction method that accounts for cell-devoid interstitial spaces. To address this gap, we propose novel resampling methods for testing spatial randomness and for evaluating relationships among different cell populations. Our methods obviate the need for domain-area estimation and provide unbiased clustering measurements. We created the SpaceR software (https://github.com/GBertolazzi/SpaceR) to make our methods more accessible.
Giorgio Bertolazzi, Michele Tumminello, Gaia Morello, Beatrice Belmonte, Claudio Tripodo
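The abstract does not detail SpaceR's resampling procedures. As a hedged illustration of why a resampling test can avoid domain-area estimation, the following Python sketch implements a generic Monte Carlo random-labelling test (a standard technique, not necessarily the authors' method): cell-type labels are shuffled over the observed cell coordinates, so the unknown cell-devoid spaces never enter the null model. The test statistic (mean distance from each type-A cell to its nearest type-B cell) and the function names `mean_cross_nn_distance` and `random_labelling_test` are hypothetical choices made for this example.

```python
import math
import random


def mean_cross_nn_distance(points_a, points_b):
    """Mean distance from each point in A to its nearest neighbour in B."""
    total = 0.0
    for ax, ay in points_a:
        total += min(math.hypot(ax - bx, ay - by) for bx, by in points_b)
    return total / len(points_a)


def random_labelling_test(points, labels, type_a, type_b, n_perm=999, seed=0):
    """Monte Carlo random-labelling test for attraction of type A to type B.

    Only the observed cell coordinates are used: the null model shuffles
    cell-type labels over those locations, so no domain area (and no map of
    cell-free spaces) is required. type_a and type_b must be distinct labels.
    Returns (observed statistic, one-sided p-value; small = attraction).
    """
    rng = random.Random(seed)

    def stat(lbls):
        a = [p for p, lab in zip(points, lbls) if lab == type_a]
        b = [p for p, lab in zip(points, lbls) if lab == type_b]
        return mean_cross_nn_distance(a, b)

    observed = stat(labels)
    shuffled = list(labels)
    as_extreme = 0  # permutations at least as clustered as the observed data
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(shuffled) <= observed:
            as_extreme += 1
    p_value = (as_extreme + 1) / (n_perm + 1)
    return observed, p_value
```

Because the null distribution is built only from locations where cells actually are, the test conditions on the observed tissue layout, which is what makes the domain area unnecessary. The brute-force nearest-neighbour search is quadratic; for millions of cells a spatial index (e.g., a k-d tree) would be needed.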
Biomaterials with surface nanostructures effectively enhance protein secretion and stimulate tissue regeneration. When nanoparticles (NPs) enter a living system, they quickly interact with proteins in body fluids, forming a protein corona (PC). Accurately predicting PC composition is critical for analyzing the osteoinductivity of biomaterials and for guiding the reverse design of NPs, but it remains a significant challenge. Although several machine learning (ML) models, such as Random Forest (RF), have been used for PC prediction, they often fail to account for extreme values in the abundance range of PC adsorption and struggle to improve accuracy because the data distribution is imbalanced. In this study, resampling embedding was introduced to address the imbalanced distribution of PC data. Various ML models were evaluated, and the RF model was ultimately selected for prediction, yielding good coefficient of determination (R²) and root-mean-square deviation (RMSE) values. Ablation experiments showed that the proposed method achieved an R² of 0.68, an improvement of approximately 10%, and an RMSE of 0.90, a reduction of approximately 10%. Furthermore, in verification with label-free quantification of four NPs (hydroxyapatite (HA), titanium dioxide (TiO₂), silicon dioxide (SiO₂), and silver (Ag)), Random Oversampling achieved a prediction performance of R² > 0.70. Additionally, feature analysis revealed that PC composition is most strongly influenced by the incubation plasma concentration, the polydispersity index (PDI), and surface modification.
Rong Liao, Yan Zhuang, Xiangfeng Li, Ke Chen, Xingming Wang, Cong Feng, Guangfu Yin, Xiangdong Zhu, Jiangli Lin, Xingdong Zhang
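The abstract names Random Oversampling but does not say how it is adapted to a continuous target such as PC abundance. The Python sketch below assumes one simple scheme: targets are grouped into equal-width bins, and rows from under-populated bins are duplicated at random until every non-empty bin matches the largest one. It also gives the standard R² and RMSE formulas used for evaluation. The binning scheme and the names `random_oversample_regression` and `r2_and_rmse` are hypothetical, and the RF model itself is omitted.

```python
import random
from collections import defaultdict


def random_oversample_regression(X, y, n_bins=5, seed=0):
    """Balance a regression dataset by randomly duplicating rows from sparse target bins.

    The target range is split into equal-width bins; rows from every
    non-empty bin are resampled with replacement until each bin holds as
    many rows as the most populated bin.
    """
    rng = random.Random(seed)
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant target
    bins = defaultdict(list)
    for i, value in enumerate(y):
        b = min(int((value - lo) / width), n_bins - 1)  # maximum goes in the last bin
        bins[b].append(i)
    target = max(len(idx) for idx in bins.values())
    new_X, new_y = list(X), list(y)
    for idx in bins.values():
        for _ in range(target - len(idx)):
            j = rng.choice(idx)
            new_X.append(X[j])
            new_y.append(y[j])
    return new_X, new_y


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination R^2 and root-mean-square error."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, (ss_res / n) ** 0.5
```

Oversampling of this kind should be applied only to the training split, so that duplicated rows do not leak into the test set and inflate R².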
Epistasis is a ubiquitous phenomenon in genetics and is considered one of the main factors in current efforts to uncover the missing heritability of complex diseases. Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies (GWAS). Existing simulators typically suffer from two limitations: they do not support high-order epistasis models involving multiple single nucleotide polymorphisms (SNPs), and they cannot generate simulated SNP data independently. In this study, we propose SimHOEPI, a simulator that calculates penetrance tables of high-order epistasis models from either prevalence or heritability and uses a resampling strategy to generate simulated data independently. The highlights of SimHOEPI are the preservation of realistic minor allele frequencies in the sampled data, the accurate calculation and embedding of high-order epistasis models, and acceptable simulation time. A series of experiments was carried out to verify these properties from different perspectives. The experimental results show that SimHOEPI can independently generate simulated SNP data with high-order epistasis models, suggesting that it may serve as an alternative simulator for GWAS.
Yahan Li, Xinrui Cai, Junliang Shang, Yuanyuan Zhang, Jin-Xing Liu
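SimHOEPI's actual algorithm is not given in the abstract. The Python sketch below is a hedged illustration of the general idea, assuming independent SNPs in Hardy-Weinberg equilibrium: a relative-risk shape over all 3^k genotype combinations is scaled so that the population prevalence K = sum over g of P(g) f(g) matches a target, and case and control genotypes are then drawn by rejection sampling. The function names, the shape function, and the sampling scheme are hypothetical and are not SimHOEPI's API; heritability-based calibration is not shown.

```python
import itertools
import math
import random


def hwe_genotype_freqs(maf):
    """Hardy-Weinberg frequencies of carrying 0, 1 or 2 minor alleles."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]


def scale_penetrance(shape, mafs, prevalence):
    """Scale a relative-risk shape so that sum_g P(g) * f(g) equals the target prevalence.

    `shape` maps a genotype tuple such as (0, 2, 1) to a non-negative weight.
    SNPs are assumed independent and in Hardy-Weinberg equilibrium.
    """
    freqs = [hwe_genotype_freqs(m) for m in mafs]
    combos = itertools.product(range(3), repeat=len(mafs))
    prob = {g: math.prod(freqs[i][a] for i, a in enumerate(g)) for g in combos}
    base = sum(p * shape(g) for g, p in prob.items())
    scale = prevalence / base
    table = {g: scale * shape(g) for g in prob}
    if max(table.values()) > 1:
        raise ValueError("target prevalence is too high for this shape")
    return table


def simulate_case_control(table, mafs, n_cases, n_controls, seed=0):
    """Draw genotypes from HWE, assign status by penetrance, keep until quotas are met."""
    rng = random.Random(seed)
    freqs = [hwe_genotype_freqs(m) for m in mafs]
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        g = tuple(rng.choices((0, 1, 2), weights=f)[0] for f in freqs)
        if rng.random() < table[g]:
            if len(cases) < n_cases:
                cases.append(g)
        elif len(controls) < n_controls:
            controls.append(g)
    return cases, controls
```

For instance, a threshold-style three-SNP model that assigns weight 5 to genotypes carrying at least one minor allele at every SNP, and weight 1 otherwise, can be passed as `lambda g: 5.0 if all(a > 0 for a in g) else 1.0`. Rejection sampling keeps allele frequencies in controls close to the HWE input, at the cost of extra draws when prevalence is low.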
When building a classification model, the situation in which one class has significantly more samples than the other is called data imbalance. Data imbalance biases the trained classifier toward the majority class (usually defined as the negative class), which can reduce accuracy on the minority class (usually defined as the positive class) and thus degrade the overall performance of the model. This article proposes MSHR-FCSSVM, a method for imbalanced data classification based on a new hybrid resampling approach (MSHR) and a new fine cost-sensitive support vector machine (CS-SVM) classifier (FCSSVM). MSHR measures the separability of each negative sample by its Silhouette value, computed from Mahalanobis distances between samples. On this basis, so-called pseudo-negative samples are screened out; they are used to generate new positive samples by linear interpolation (the over-sampling step) and are finally deleted (the under-sampling step). By replacing pseudo-negative samples with newly generated positive samples one by one, this approach clears up inter-class overlap at the borderline without changing the overall size of the dataset. FCSSVM is an improved version of the traditional CS-SVM. It simultaneously considers the influence of both sample-count imbalance and class distribution on classification, and it finely tunes the class cost weights to accurately adjust the classification borderline, using the rime-ice (RIME) algorithm, an efficient optimization algorithm based on the physical phenomenon of rime ice, with cross-validation accuracy as the fitness function. To verify the effectiveness of the proposed method, a series of experiments was carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced ones. The experimental results show that MSHR-FCSSVM outperforms the comparison methods in most cases, and that both MSHR and FCSSVM played significant roles.
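The MSHR step can be sketched in Python as follows, under stated assumptions: a single covariance matrix pooled over all samples defines the Mahalanobis distance, a negative sample is flagged as pseudo-negative when its Silhouette value is below 0, and each pseudo-negative is replaced by a point interpolated at a random position between it and its nearest positive sample. The threshold, the interpolation partner, and the function names (`hybrid_resample` and its helpers) are hypothetical choices, not the authors' code; the FCSSVM classifier and RIME tuning are not shown.

```python
import math
import random


def _inverse(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _covariance(X):
    """Sample covariance matrix (n - 1 denominator)."""
    n, d = len(X), len(X[0])
    mu = [sum(x[k] for x in X) / n for k in range(d)]
    return [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in X) / (n - 1)
             for j in range(d)] for i in range(d)]


def mahalanobis(u, v, inv_cov):
    """Mahalanobis distance between u and v for a given inverse covariance."""
    diff = [a - b for a, b in zip(u, v)]
    q = sum(diff[i] * inv_cov[i][j] * diff[j]
            for i in range(len(diff)) for j in range(len(diff)))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative rounding errors


def hybrid_resample(neg, pos, seed=0):
    """Replace each pseudo-negative (Silhouette < 0) with one interpolated positive."""
    rng = random.Random(seed)
    inv_cov = _inverse(_covariance(neg + pos))

    def dist(u, v):
        return mahalanobis(u, v, inv_cov)

    kept_neg, new_pos = [], []
    for i, x in enumerate(neg):
        a = sum(dist(x, o) for j, o in enumerate(neg) if j != i) / (len(neg) - 1)
        b = sum(dist(x, p) for p in pos) / len(pos)
        s = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        if s >= 0:
            kept_neg.append(x)
            continue
        # Pseudo-negative: synthesize a positive between x and its nearest positive, then drop x.
        nearest = min(pos, key=lambda p: dist(x, p))
        lam = rng.random()
        new_pos.append([xi + lam * (ni - xi) for xi, ni in zip(x, nearest)])
    return kept_neg, pos + new_pos
```

Because each pseudo-negative is swapped for exactly one synthetic positive, the total sample count is unchanged, consistent with the description of MSHR above.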