Thesis Information

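Chinese title:	Research on a Monitoring Model for Fraudulent Crowdfunding Projects Based on Text Analysis and Cost-Sensitive Support Vector Machines
Name:	Zhang Yaxuan
Discipline:	Computer Science and Technology
Student type:	Bachelor's
Degree:	Bachelor of Engineering
University:	Renmin University of China
School:	School of Information
Major:	Computer Science and Technology
Primary advisor:	Xu Wei
Completion date:	2016-05-12
Submission date:	2016-05-12
English title:	Fraud Detection in Crowdfunding Based on Text Analysis and Cost-Sensitive SVM
Chinese keywords:	crowdfunding; fraud detection; persuasion theory; latent Dirichlet allocation; cost-sensitive support vector machine
English keywords:	Crowdfunding; fraud detection; Theory of Persuasion; LDA; Cost-Sensitive SVM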
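Chinese abstract:	In recent years, a growing number of investors and entrepreneurs have turned their attention to crowdfunding. In China, small, medium, and micro enterprises have long had difficulty obtaining financing, while private capital has lacked smooth investment channels; crowdfunding, as a new approach to financial innovation, addresses the urgent needs of both investors and entrepreneurs. As crowdfunding has grown popular, many researchers have taken a strong interest in the field and studied crowdfunding platforms, projects, and participants from multiple angles, with in-depth work and some results in the exploration of development models, the analysis of driving factors, the prediction of project success rates, and project recommendation. As the field has drawn wide attention, project fraud has emerged as well. Multiple fraud cases have occurred in China and abroad, involving several forms of crowdfunding and affecting many well-known platforms; they have caused considerable losses to investors and sounded an alarm for the regulation of crowdfunding. However, research on fraud in crowdfunding remains extremely limited. Most studies in China and abroad discuss the problem only from the perspective of legal regulation and do not start from the crowdfunding projects themselves to offer effective measures for controlling fraud. Building on the Veblen effect and persuasion theory, this thesis uses an LDA model to extract the topic distribution of project descriptions, uses a sentiment analysis model to extract sentiment values before the project end date, combines these with the projects' own numerical and statistical information to construct a feature set, and then trains a cost-sensitive support vector machine to establish a crowdfunding fraud monitoring model. A crawler built with the urllib, re, and beautifulsoup modules in Python collected detailed project data from Zhongchou (http://www.zhongchou.cn), the largest crowdfunding platform in China. An empirical analysis on these data demonstrated the feasibility and practicality of the model and verified the effectiveness of the features.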
English abstract:	In recent years, an increasing number of entrepreneurs, investors, and aspirants have turned their attention to crowdfunding. Many researchers have contributed to a thorough, multi-dimensional study of crowdfunding, covering development patterns, driving-factor analysis, success rate prediction, and project recommendation. Overly rapid development breeds fraud, which brings heavy losses to investors and platforms. There have been several fraud cases in China and in America, involving various categories and platforms. However, studies on fraud detection in crowdfunding are currently limited in both countries. This paper studies fraudulent projects on online crowdfunding platforms and proposes an automatic fraud detection model that uses text mining and machine learning, based on persuasion theory and social influence theory. Using a web spider written in Python, the study crawls project details from Zhongchou.com (http://www.zhongchou.com) and assesses the proposed model on these data. To a degree, the model can capture fraudulent projects and give risk warnings. In particular, this paper proposes features based on psychological and sociological theory and verifies their effect through comparative experiments.
Total pages:	33
Release date:	2016-05-13
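
Illustrative note: the abstract describes training a cost-sensitive support vector machine on a feature set built from topic distributions, sentiment values, and numerical project attributes. The sketch below is not the thesis's implementation; it only illustrates the general idea of cost-sensitive learning with a linear SVM, where misclassifying a fraudulent project is penalized more heavily than misclassifying a legitimate one. It minimizes a class-weighted hinge loss by plain subgradient descent using only the Python standard library. The function names, the cost values, the hyperparameters, and the two-dimensional toy features are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def train_cost_sensitive_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],          # labels in {-1, +1}; +1 marks a fraudulent project
    cost_pos: float = 5.0,     # hypothetical cost of missing a fraudulent project
    cost_neg: float = 1.0,     # hypothetical cost of a false alarm
    lam: float = 0.01,         # L2 regularization strength (assumed)
    lr: float = 0.1,           # constant step size (assumed)
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimize lam/2 * ||w||^2 + (1/n) * sum_i c(y_i) * max(0, 1 - y_i * (w.x_i + b))."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        # Start from the gradient of the regularization term.
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:
                # Margin violation: add the hinge subgradient, scaled by the class cost.
                c = cost_pos if yi == 1 else cost_neg
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j] / n
                grad_b -= c * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (flag as fraudulent) or -1 (treat as legitimate)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, made-up data: two hypothetical features per project, with the
# fraudulent class (+1) deliberately in the minority.
X_toy = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.1, 0.1]]
y_toy = [1, 1, -1, -1, -1, -1]

w_toy, b_toy = train_cost_sensitive_svm(X_toy, y_toy)
predictions = [predict(w_toy, b_toy, x) for x in X_toy]
```

Raising cost_pos relative to cost_neg shifts the decision boundary toward flagging more projects, which suits settings where fraudulent projects are rare and a missed fraud is more costly than a false alarm. In practice the feature vectors would come from the text-analysis steps named in the abstract rather than from hand-written numbers.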
