Found 1,791 articles related to "RESAMPLING"
Resampling approaches for the quantitative analysis of spatially distributed cells
2024
Image segmentation is a crucial step in various image analysis pipelines and constitutes one of the cutting-edge areas of digital pathology. The advent of quantitative analysis has enabled the evaluation of millions of individual cells in tissues, allowing for the combined assessment of morphological features, biomarker expression, and spatial context. The recorded cells can be described as a point pattern process. However, the classical statistical approaches to point pattern processes prove unreliable in this context due to the presence of multiple irregularly shaped interstitial cell-devoid spaces in the domain, which correspond to anatomical features (e.g., vessels, lipid vacuoles, glandular lumina) or tissue artefacts (e.g., tissue fractures), and whose coordinates are unknown. These interstitial spaces impede the accurate calculation of the domain area, resulting in biased clustering measurements. Moreover, the mistaken inclusion of empty regions of the domain can directly impact the results of hypothesis testing. The literature currently lacks a bias correction method that addresses interstitial cell-devoid spaces. To address this gap, we propose novel resampling methods for testing spatial randomness and evaluating relationships among different cell populations. Our methods obviate the need for domain area estimation and provide unbiased clustering measurements. We created the SpaceR software (https://github.com/GBertolazzi/SpaceR) to enhance the accessibility of our methodologies.
Giorgio Bertolazzi, Michele Tumminello, Gaia Morello, Beatrice Belmonte, Claudio Tripodo
Keywords: RESAMPLING
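The abstract above motivates resampling tests that do not require an estimate of the domain area. One generic way to obtain that property is a random-relabeling test: cell positions stay fixed and only the cell-type labels are permuted, so cell-devoid regions never enter the calculation. The sketch below is an illustration under that assumption and is not the SpaceR implementation; the function names and the choice of statistic (mean nearest-neighbour distance) are hypothetical.

```python
import math
import random

def mean_nn_distance(points, labels, src, dst):
    """Mean distance from each `src` cell to its nearest `dst` cell."""
    src_pts = [p for p, l in zip(points, labels) if l == src]
    dst_pts = [p for p, l in zip(points, labels) if l == dst]
    return sum(min(math.dist(a, b) for b in dst_pts) for a in src_pts) / len(src_pts)

def relabel_test(points, labels, src, dst, n_perm=999, seed=0):
    """One-sided permutation test: are `src` cells closer to `dst` cells
    than expected if labels were assigned at random to the same positions?"""
    rng = random.Random(seed)
    observed = mean_nn_distance(points, labels, src, dst)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # positions fixed, labels permuted
        if mean_nn_distance(points, shuffled, src, dst) <= observed:
            hits += 1
    # The +1 terms count the observed labeling as one of the permutations.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value
```

Because the null distribution is built from the observed positions only, no window or area term appears anywhere in the computation.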
Unveiling protein corona composition: predicting with resampling embedding and machine learning
2024
Biomaterials with surface nanostructures effectively enhance protein secretion and stimulate tissue regeneration. When nanoparticles (NPs) enter the living system, they quickly interact with proteins in the body fluid, forming the protein corona (PC). The accurate prediction of the PC composition is critical for analyzing the osteoinductivity of biomaterials and guiding the reverse design of NPs. However, achieving accurate predictions remains a significant challenge. Although several machine learning (ML) models such as Random Forest (RF) have been used for PC prediction, they often fail to consider the extreme values in the abundance region of PC absorption and struggle to improve accuracy due to the imbalanced data distribution. In this study, resampling embedding was introduced to resolve the issue of imbalanced distribution in PC data. Various ML models were evaluated, and the RF model was ultimately used for prediction, yielding good correlation coefficient (R^2) and root-mean-square deviation (RMSE) values. Our ablation experiments demonstrated that the proposed method achieved an R^2 of 0.68, an improvement of approximately 10%, and an RMSE of 0.90, a reduction of approximately 10%. Furthermore, in a verification using label-free quantification of four NPs, hydroxyapatite (HA), titanium dioxide (TiO2), silicon dioxide (SiO2), and silver (Ag), we achieved a prediction performance with an R^2 value > 0.70 using Random Oversampling. Additionally, the feature analysis revealed that the composition of the PC is most significantly influenced by the incubation plasma concentration, PDI, and surface modification.
Rong Liao, Yan Zhuang, Xiangfeng Li, Ke Chen, Xingming Wang, Cong Feng, Guangfu Yin, Xiangdong Zhu, Jiangli Lin, Xingdong Zhang
Keywords: NANOPARTICLES
SimHOEPI: A resampling simulator for generating single nucleotide polymorphism data with a high-order epistasis model
2024
Epistasis is a ubiquitous phenomenon in genetics and is considered one of the main factors in current efforts to unveil the missing heritability of complex diseases. Simulation data is crucial for evaluating epistasis detection tools in genome-wide association studies (GWAS). Existing simulators normally suffer from two limitations: absence of support for high-order epistasis models containing multiple single nucleotide polymorphisms (SNPs), and inability to generate simulation SNP data independently. In this study, we propose a simulator, SimHOEPI, which can calculate penetrance tables of high-order epistasis models from either prevalence or heritability, and which uses a resampling strategy to generate simulation data independently. Highlights of SimHOEPI are the preservation of realistic minor allele frequencies in sampling data, the accurate calculation and embedding of high-order epistasis models, and acceptable simulation time. A series of experiments were carried out to verify these properties from different aspects. Experimental results show that SimHOEPI can generate simulation SNP data independently with high-order epistasis models, implying that it might be an alternative simulator for GWAS.
Yahan Li, Xinrui Cai, Junliang Shang, Yuanyuan Zhang, Jin-Xing Liu
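A comparative study of resampling algorithms for imbalanced microbial data
2024
The core idea of resampling algorithms is to process the original dataset by undersampling, oversampling, or hybrid sampling to produce a more balanced dataset, so that classical classification algorithms can then be applied to the class imbalance problem. In disease diagnosis, microbial datasets differ considerably from other imbalanced datasets because of their high sparsity. Existing resampling algorithms have been validated in other fields, but few studies in disease diagnosis have compared their effectiveness and applicability in depth. This study therefore compares existing resampling algorithms across different microbial datasets and classifiers. The experimental results are analyzed from three angles: the sampling effect of the resampling algorithms, the classification performance of each classifier across datasets, and the classification performance of different classifiers on each dataset. From this analysis, the most suitable resampling algorithm under each evaluation metric is identified. The results confirm that resampling algorithms are effective on imbalanced microbial datasets, help address imbalanced classification, and help researchers in disease diagnosis quickly choose a suitable resampling algorithm and classifier.
温柳英, 谢潇楠
Keywords: disease diagnosis; imbalanced data; classifier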
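The abstract above describes balancing a dataset by undersampling or oversampling before classification. As a minimal sketch of the two simplest variants (random oversampling and random undersampling), assuming rows are held in plain Python lists, the following may help; the function name `random_resample` and its parameters are illustrative and do not come from the study.

```python
import random

def random_resample(X, y, mode="over", seed=0):
    """Balance classes by random oversampling ("over") or undersampling ("under")."""
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    sizes = [len(rows) for rows in by_class.values()]
    # Oversampling grows every class to the largest size;
    # undersampling shrinks every class to the smallest size.
    target = max(sizes) if mode == "over" else min(sizes)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            chosen = rng.sample(rows, target)  # draw without replacement
        else:
            # keep all originals, then add duplicates drawn with replacement
            chosen = rows + rng.choices(rows, k=target - len(rows))
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out
```

Resampling of this kind is normally applied to the training split only, so that evaluation data keeps the original class ratio.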
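Multi-step-ahead wind power forecasting based on data resampling and a GRU neural network
2024
Accurately forecasting wind power at different time scales is essential for the reliable operation of energy management systems. To address the problem that current forecasting methods cannot maintain high accuracy as the number of steps increases, a multi-step-ahead wind power forecasting method that combines data resampling with a GRU neural network is proposed. The original wind power time series is resampled to obtain a new wind power time series; a GRU neural network then makes single-step-ahead predictions on the resampled series, which yields multi-step-ahead predictions of the original series. Experiments using 2022 and 2023 data from a wind farm in Australia show that, compared with existing methods, the proposed method reduces the mean absolute percentage error and the root mean square error by at least 1.94% and 6.13, respectively, giving better forecasts.
胡珈宁, 王旭, 周振雄
Keywords: wind power forecasting; multi-step forecasting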
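The abstract above turns a multi-step forecast into a single-step forecast on a resampled series. The abstract does not give the exact resampling scheme, so the sketch below assumes the simplest reading: keep every h-th sample, so that "one step ahead" on the coarse series is "h steps ahead" on the original. The one-step model here is a plain linear extrapolation standing in for the GRU; all names are hypothetical.

```python
def resample_for_horizon(series, h):
    """Keep every h-th sample, aligned so the most recent observation is retained."""
    start = (len(series) - 1) % h
    return series[start::h]

def linear_extrapolation(seq):
    """Placeholder one-step model (NOT a GRU): continue the last difference."""
    return seq[-1] + (seq[-1] - seq[-2])

def predict_h_steps_ahead(series, h, one_step_model=linear_extrapolation):
    """A one-step prediction on the stride-h series is an h-step prediction
    on the original series."""
    coarse = resample_for_horizon(series, h)
    return one_step_model(coarse)
```

For example, with `series = [2 * t for t in range(20)]` and `h = 3`, the coarse series ends at the last observation (t = 19) and the returned value is the forecast for t = 22.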
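A resampling technique combining integer decimation with fractional interpolation
2024
Direct RF sampling has been widely applied in recent years in radar, communications, electronic countermeasures, and especially integrated reception. However, the RF sampling data rate is extremely high, which makes subsequent signal transmission and real-time processing more difficult. Digital resampling can lower the sampling rate in the digital domain and thereby ease the signal processing burden imposed by direct RF sampling. This paper studies digital resampling and proposes a resampling method that combines integer decimation with fractional interpolation; a theoretical model is derived, a recursive solution method for the corresponding parameters is given, and simulations verify the method's effectiveness. The proposed method relieves the real-time digital signal processing burden of performing fractional interpolation directly at high RF sampling rates and, while preserving signal quality, meets the need for precise sampling-rate conversion in special scenarios, thus enabling decimation by arbitrary fractional factors over a wide range.
单长胜, 尹曙明, 郑哲, 郝利云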
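The two-stage idea in the abstract above (integer decimation first, fractional interpolation second) can be sketched as follows. The paper's actual filters and parameter recursion are not given in the abstract, so this sketch makes its own assumptions: block averaging as a crude anti-alias filter, linear interpolation for the fractional stage, and a split of the total ratio into an integer part `m` and a residual factor `r` with 1 <= r < 2. All function names are hypothetical.

```python
def split_ratio(ratio):
    """Split a total down-sampling ratio into integer m and residual r, ratio = m * r."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    m = int(ratio)
    return m, ratio / m

def integer_decimate(x, m):
    """Average each block of m samples (crude low-pass) and keep one value per block."""
    n_blocks = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]

def fractional_resample(x, step):
    """Linearly interpolate x at positions 0, step, 2*step, ... (step may be non-integer)."""
    out = []
    k = 0
    while True:
        pos = k * step
        i = int(pos)
        if i + 1 >= len(x):  # need a right-hand neighbour to interpolate
            break
        frac = pos - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
        k += 1
    return out

def resample(x, ratio):
    """Cheap integer stage at the high rate, fractional stage at the reduced rate."""
    m, r = split_ratio(ratio)
    return fractional_resample(integer_decimate(x, m), r)
```

The point of the ordering is cost: the interpolation, which is the more expensive per-sample operation, runs only after the integer stage has already cut the data rate by a factor of `m`.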
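A high-resolution forward-looking imaging algorithm based on bidirectional resampling
2024
The high-speed maneuvering and bistatic forward-looking geometry of bistatic SAR on high-speed platforms pose severe challenges for high-resolution imaging. In this configuration, the transmitter sends signals in side-looking mode while the fast-moving receiving platform collects echoes in forward-looking mode. High velocity and large acceleration make range migration, two-dimensional coupling, and spatial variance in the SAR echoes more severe, so the traditional "stop-go-stop" model no longer applies. To solve these problems, a "non-stop-go-stop" slant range equation and echo model suited to high-speed platforms are proposed. Then, by analyzing the spatially variant components of the signal and their effect on the echo phase, an imaging algorithm based on bidirectional resampling is proposed. The algorithm effectively compensates the spatially variant phase errors of the SAR echo in range and azimuth and improves the forward-looking focusing performance of bistatic SAR on high-speed platforms. Simulations verify the effectiveness of the proposed algorithm.
郑志瀛, 谭鸽伟, 蒋丁一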
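A study of the effect of resampling methods on class-imbalanced datasets
2024
To evaluate the effect of resampling methods on class-imbalanced datasets, the widely used Wisconsin breast cancer diagnostic dataset from the United States is studied. Experiments use three machine learning algorithms (logistic regression, support vector machine, and random forest), and four resampling methods (random oversampling, random undersampling, SMOTE, and ADASYN) are analyzed using the F1 score and AUC. The results show that all four resampling methods improve model performance, with random undersampling proving more effective on the class-imbalanced dataset.
丁浩杰
Keywords: support vector machine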
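Of the four methods compared in the abstract above, SMOTE differs from random oversampling in that it creates new minority points by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only that core interpolation step, under the assumption of numeric feature vectors; it is a simplified stand-in, not the reference SMOTE implementation and not the code used in the study, and `smote_like` is a hypothetical name.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Every synthetic point lies on a line segment between two existing minority samples, which is why the method can blur class boundaries when the minority class overlaps the majority class.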
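A whole-period resampling method for rotor vibration based on differentiation of the key-phase signal
2024
A key step in rotor vibration signal processing is whole-period sampling of the vibration signal, but speed variation complicates this process and hinders spectral analysis of the rotor vibration signal. To solve this problem, a software method built around a photoelectric sensor is developed to achieve whole-period sampling of the vibration signal. For the key-phase signal acquired by the photoelectric sensor together with a photoelectric target, a method combining moving-average filtering and signal differentiation is proposed to obtain the speed trigger signal, giving more accurate speed values and start markers for each rotation-frequency period, unaffected by speed variation. To verify the method, 32 whole periods of rotor vibration signal were acquired and their spectrum was obtained by Fourier transform. The results show that the spectrum of the whole-period-resampled signal has clear lines and no spectral leakage. The method can serve as a reference for key-phase signal processing in related systems.
陶理, 王晓宇
Keywords: rotor vibration; resampling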
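The procedure in the abstract above has two parts: find one trigger per revolution from the key-phase signal (moving average, then differentiation), and resample the vibration signal so every revolution has the same number of points. The sketch below is one possible reading under simplifying assumptions: an idealized pulse-shaped key-phase signal, a fixed derivative threshold, and linear interpolation. The function names, the window length, and the threshold are hypothetical and are not taken from the paper.

```python
def moving_average(x, w):
    """Average over windows x[i : i + w]; output has len(x) - w + 1 samples."""
    return [sum(x[i:i + w]) / w for i in range(len(x) - w + 1)]

def rising_edges(keyphase, w=3, threshold=0.1):
    """Indices where the smoothed key-phase derivative first exceeds the threshold."""
    smooth = moving_average(keyphase, w)
    diff = [b - a for a, b in zip(smooth, smooth[1:])]
    edges = []
    for i in range(1, len(diff)):
        if diff[i] > threshold and diff[i - 1] <= threshold:
            # For an ideal step, the derivative rises w samples before the pulse.
            edges.append(i + w)
    return edges

def resample_whole_periods(signal, edges, n_per_rev):
    """Resample each revolution (edge to edge) to exactly n_per_rev points.
    Assumes the last edge index is smaller than len(signal)."""
    revolutions = []
    for start, end in zip(edges, edges[1:]):
        step = (end - start) / n_per_rev  # samples per output point, this revolution
        rev = []
        for k in range(n_per_rev):
            pos = start + k * step
            i = int(pos)
            frac = pos - i
            rev.append((1 - frac) * signal[i] + frac * signal[i + 1])
        revolutions.append(rev)
    return revolutions
```

Since every revolution contributes the same number of samples regardless of its duration, a Fourier transform over a whole number of resampled revolutions places the rotation-frequency components on exact frequency bins.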
An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine
2024
When building a classification model, the scenario where the samples of one class significantly outnumber those of the other class is called data imbalance. Data imbalance biases the trained classification model toward the majority class (usually defined as the negative class), which may harm the accuracy on the minority class (usually defined as the positive class) and lead to poor overall performance of the model. This article proposes a method called MSHR-FCSSVM for imbalanced data classification, based on a new hybrid resampling approach (MSHR) and a new fine cost-sensitive support vector machine (CS-SVM) classifier (FCSSVM). The MSHR measures the separability of each negative sample through its Silhouette value, calculated from the Mahalanobis distance between samples. On this basis, the so-called pseudo-negative samples are screened out, used to generate new positive samples through linear interpolation (over-sampling step), and finally deleted (under-sampling step). This approach replaces pseudo-negative samples with generated new positive samples one by one to clear up the inter-class overlap on the borderline, without changing the overall scale of the dataset. The FCSSVM is an improved version of the traditional CS-SVM. It considers the influence of both sample-count imbalance and class distribution on classification, and finely tunes the class cost weights using the rime-ice (RIME) optimization algorithm, with cross-validation accuracy as the fitness function, to accurately adjust the classification borderline. To verify the effectiveness of the proposed method, a series of experiments are carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced datasets. The experimental results show that the MSHR-FCSSVM method performs better than the comparison methods in most cases, and that both the MSHR and the FCSSVM play significant roles.
Bo Zhu, Xiaona Jing, Lan Qiu, Runbo Li
