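With the explosive growth of online text, it has become increasingly difficult for readers to extract specific, useful information from the sheer volume of news. In big-data processing, text clustering is widely used as the main means of news topic extraction and trend tracking. Considering the strengths and weaknesses of agglomerative hierarchical clustering and of K-Means for text clustering, a new optimized clustering algorithm for news text is proposed: QH-K (K-Means based on Quick Hierarchical Clustering). First, a word2vec model is trained on the text to obtain word vectors. Second, an optimized agglomerative hierarchical clustering algorithm clusters the texts, and the cluster validity index ST proposed with the algorithm yields the initial number of clusters and the initial cluster centers. Finally, K-Means is applied to refine the clustering result and improve the final clustering quality. Experiments show that QH-K improves precision, recall, and F-measure to some extent over the traditional algorithms, and that its running time is also reduced.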
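The two-stage idea in this abstract (hierarchical clustering to seed K-Means) can be illustrated with a minimal sketch. It is not the authors' QH-K implementation. The abstract does not define the ST index or the "quick" optimization, so the sketch assumes the number of clusters `k` is given, uses plain centroid-linkage merging, and takes document vectors as ready-made lists of floats. The function names `agglomerate` and `kmeans` are hypothetical.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def agglomerate(vectors, k):
    """Merge the two clusters with the closest centroids until k remain.

    Assumption: k is supplied by the caller; the paper derives it from its
    ST validity index, which the abstract does not define.
    Returns the centroids of the k remaining clusters.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        centroids = [mean([vectors[x] for x in c]) for c in clusters]
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroids[i], centroids[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean([vectors[x] for x in c]) for c in clusters]


def kmeans(vectors, initial_centers, max_iter=100):
    """Lloyd's iterations starting from the given centers (not random ones)."""
    centers = [list(c) for c in initial_centers]  # do not mutate the input
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = mean(members)
    return labels, centers


# Toy "document vectors": two well-separated groups.
docs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [0.05, 0.05]]
seeds = agglomerate(docs, k=2)
labels, centers = kmeans(docs, seeds)
print(labels)
```

Seeding K-Means with hierarchical centroids removes its dependence on random initialization, which is the motivation the abstract gives for combining the two methods. The pairwise search here is quadratic per merge and is only suitable for small inputs.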
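Traditional social network recommendation generally relies on friend relationships between users, but friend relationships are not formed on the basis of shared interests. To address this, the emotional interests expressed by user tags are used to extend users' friend relationships, yielding a hybrid recommendation based on both friend relationships and shared interests. An explicit social network is built from direct friend relationships between users, and an implicit social network is built from tag data. The proposed SNA_SPFA (Social Networks Algorithm Based on Shortest Path Faster Algorithm) algorithm is applied to the explicit and implicit social network graphs separately to obtain recommendation results, and the two results are finally blended according to given weights. Experiments show that the method outperforms traditional collaborative filtering and social network recommendation.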
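The pipeline in this abstract can be sketched as follows. This is not the authors' SNA_SPFA, whose details the abstract does not give. The sketch uses the standard Shortest Path Faster Algorithm (queue-based Bellman-Ford) and makes several assumptions: edge weights are non-negative distances, a candidate's score is the reciprocal of its shortest-path distance, and the two score sets are combined with a single weight `alpha`. The names `recommend` and `blend` and the sample graphs are hypothetical.

```python
from collections import deque


def spfa(graph, source):
    """Shortest-path distances from source via queue-based Bellman-Ford (SPFA).

    graph maps each node to a dict {neighbor: edge_weight}.
    Assumes there are no negative cycles.
    """
    dist = {source: 0.0}
    queue = deque([source])
    in_queue = {source}
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        for v, w in graph.get(u, {}).items():
            nd = dist[u] + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd  # relax the edge
                if v not in in_queue:  # re-queue v only if not already waiting
                    queue.append(v)
                    in_queue.add(v)
    return dist


def recommend(graph, user, exclude):
    """Score reachable users by 1 / distance (hypothetical scoring rule)."""
    dist = spfa(graph, user)
    return {
        v: 1.0 / d
        for v, d in dist.items()
        if v != user and v not in exclude and d > 0
    }


def blend(explicit_scores, implicit_scores, alpha=0.5):
    """Weighted mix of the two score sets; returns users ranked best first."""
    users = set(explicit_scores) | set(implicit_scores)
    merged = {
        u: alpha * explicit_scores.get(u, 0.0)
        + (1 - alpha) * implicit_scores.get(u, 0.0)
        for u in users
    }
    return sorted(merged, key=lambda u: (-merged[u], u))


# Explicit network: direct friendships, unit distance per hop.
explicit = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
# Implicit network: smaller weight = more similar tags (made-up values).
implicit = {
    "a": {"d": 0.5, "c": 2.0},
    "c": {"a": 2.0},
    "d": {"a": 0.5},
}

friends_of_a = set(explicit["a"])
ranking = blend(
    recommend(explicit, "a", friends_of_a),
    recommend(implicit, "a", friends_of_a),
    alpha=0.5,
)
print(ranking)
```

In this toy data, user `d` is three hops from `a` in the friendship graph but very close in the tag graph, so the blended ranking places `d` ahead of `c`. This is the effect the hybrid approach aims for.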
To address the problem that traditional anonymity models for location-based services are not applicable to continuous queries, a quasi-real-time cloak algorithm (QR-TCA) is proposed and a non-delay cloak model (N-DCM) is established. After a comprehensive analysis of privacy protection models for location-based continuous queries, a model that solves the user service delay problem is proposed, providing users with quasi-real-time location-based services. The experiments evaluate N-DCM along multiple dimensions, such as service response time and service quality, on standard datasets. The results show that the method is suitable for location privacy protection in continuous queries and can effectively protect the user’s location privacy.