With the rapid development of Internet technology and the advent of the era of big data,more and more cyber security texts are provided on the Internet.These texts include not only security concepts,incidents,tools,guidelines,and policies,but also risk management approaches,best practices,assurances,technologies,and more.Through the integration of large-scale,heterogeneous,unstructured cyber security information,the identification and classification of cyber security entities can help handle cyber security issues.Due to the complexity and diversity of texts in the cyber security domain,it is difficult to identify security entities in the cyber security domain using the traditional named entity recognition(NER)methods.This paper describes various approaches and techniques for NER in this domain,including the rule-based approach,dictionary-based approach,and machine learning based approach,and discusses the problems faced by NER research in this domain,such as conjunction and disjunction,non-standardized naming convention,abbreviation,and massive nesting.Three future directions of NER in cyber security are proposed:(1)application of unsupervised or semi-supervised technology;(2)development of a more comprehensive cyber security ontology;(3)development of a more comprehensive deep learning model.
为了提高软件缺陷预测的准确率,利用布谷鸟搜索(cuckoo search,CS)算法的寻优能力和人工神经网络(artificial neural network,ANN)算法的非线性计算能力,提出了基于CS-ANN的软件缺陷预测方法。此方法首先使用基于关联规则的特征选择算法降低数据的维度,去除了噪声属性;然后利用布谷鸟搜索算法寻找神经网络算法的权值,使用权值和神经网络算法构建出预测模型;最后使用此模型完成缺陷预测。使用公开的NASA数据集进行仿真实验,结果表明该模型降低了误报率,并提高了预测的准确率,综合评价指标AUC(area under the ROC curve)、F1值和G-mean都优于现有模型。