This paper presents a plagiarism-detection algorithm for scientific papers based on local word-frequency fingerprints. Sentences are treated as the basic building blocks of a document: effective keywords are extracted from each sentence, sorted, and reassembled, and a sentence fingerprint is obtained by combining the keywords' codes with their word frequencies. Text similarity is then computed from these fingerprints. Experiments on SOGOU-T, a reduced collection of news web pages, show that the algorithm overcomes, to some extent, the low detection accuracy of existing plagiarism-detection algorithms for papers, and that it detects quickly.
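The abstract above does not give the details of the fingerprint, so the following is only a minimal sketch of the general idea, under stated assumptions: words are split with a simple regular expression (Chinese text would need a real word segmenter), the stopword list is illustrative, the fingerprint is a SHA-256 hash of the sorted keywords joined with their counts, and document similarity is the Jaccard overlap of sentence fingerprints. The names `sentence_fingerprint`, `document_fingerprints`, `similarity`, and `top_k` are hypothetical and not taken from the paper.

```python
import hashlib
import re
from collections import Counter

# Illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "on"}


def split_sentences(text):
    """Naive sentence splitter on Western and full-width end punctuation."""
    parts = re.split(r"[.!?。!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_fingerprint(sentence, top_k=5):
    """Hash the sorted keywords of a sentence together with their counts."""
    words = [w for w in re.findall(r"\w+", sentence.lower())
             if w not in STOPWORDS]
    if not words:
        return None
    counts = Counter(words)
    # Keep the top_k most frequent words (ties broken alphabetically),
    # then sort them so that word order in the sentence does not matter.
    top = sorted(counts, key=lambda w: (-counts[w], w))[:top_k]
    encoded = "|".join(f"{w}:{counts[w]}" for w in sorted(top))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def document_fingerprints(text, top_k=5):
    """Set of sentence fingerprints for a whole document."""
    fps = (sentence_fingerprint(s, top_k) for s in split_sentences(text))
    return {fp for fp in fps if fp is not None}


def similarity(text_a, text_b, top_k=5):
    """Jaccard overlap of the two documents' sentence fingerprints."""
    a = document_fingerprints(text_a, top_k)
    b = document_fingerprints(text_b, top_k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Because the keywords are sorted before hashing, a sentence whose words were merely reordered maps to the same fingerprint, which is one plausible reason for the sort-and-reconstruct step the abstract mentions.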
This paper compares and evaluates four foreign comprehensive full-text e-journal databases (Blackwell Synergy, ScienceDirect OnSite, SpringerLINK, Wiley Interscience) in terms of retrieval functions, handling of retrieval results, and personalized services, and closes with several suggestions.
Drawing on academic retrieval work carried out for professional-title evaluation materials at universities, this paper describes, from the perspectives of author and content, the various forms that a high content coincidence rate takes in papers and the circumstances under which a paper is judged to be plagiarized. It demystifies the content coincidence rate and offers readers guidance on avoiding a high content coincidence rate when writing papers.
This paper presents a plagiarism-detection model for scientific papers based on sentence similarity. The Local Word-Frequency Fingerprint (LWFF) algorithm is used to screen large document collections quickly and find suspected plagiarized documents. Sentence similarity is then computed from the Longest Sorted Common Subsequence (LSCS) between source and target texts, and the model marks the plagiarized passages and gives the supporting evidence. Experiments on the standard Chinese dataset SOGOU-T show that the model has a strong ability to mine local information and overcomes, to some extent, the limited accuracy of existing plagiarism-detection algorithms for papers.
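The abstract above does not define the Longest Sorted Common Subsequence, so the sketch below substitutes the classic longest common subsequence (LCS) over word tokens as a stand-in for the second, fine-grained stage. The normalisation `2 * LCS / (len(a) + len(b))`, the whitespace tokenisation, and the names `lcs_length`, `sentence_similarity`, `find_matches`, and `threshold` are all assumptions made for illustration, not the paper's method.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sentence_similarity(s1, s2):
    """Assumed score in [0, 1]: twice the LCS length over the total token count."""
    a, b = s1.lower().split(), s2.lower().split()
    if not a or not b:
        return 0.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def find_matches(source_sentences, target_sentences, threshold=0.8):
    """Return (source index, target index, score) for sentence pairs at or above threshold."""
    matches = []
    for i, src in enumerate(source_sentences):
        for j, tgt in enumerate(target_sentences):
            score = sentence_similarity(src, tgt)
            if score >= threshold:
                matches.append((i, j, score))
    return matches
```

The index pairs returned by `find_matches` are one simple way to mark which sentences matched and to keep the score as evidence; in a two-stage design it would only be run on the documents that the fingerprint screening had already flagged.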