| 中文题名: | 大模型自动化提示词攻击实证研究 |
| 姓名: | |
| 学科名称: | 工学 - 计算机类 - 计算机科学与技术 |
| 学科代码: | 080901 |
| 学生类型: | 学士 |
| 学位名称: | 工学学士 |
| 学校: | 中国人民大学 |
| 院系: | |
| 专业: | |
| 第一导师姓名: | |
| 完成日期: | 2025-05-16 |
| 提交日期: | 2025-05-27 |
| 中文关键词: | |
| 外文关键词: | Large language models ; Prompt injection attacks ; Model security ; Domestic models ; Automated generation |
| 中文摘要: |
随着大语言模型的应用越来越广泛,大模型安全也越来越受到关注。在众多攻击方式中,提示词攻击是最具威胁性的种类之一,它具有复杂性高、形式多样、攻击门槛低、难以防范等众多特点,因此具有较高的研究价值。 本文首先对提示词攻击涉及到的一些背景知识进行说明,随后重点研究并介绍了实验中四种攻击涉及到的多种提示词攻击方式的定义、原理以及示例,为接下来实验对完整实验流程介绍进行铺垫。 在紧接着的下一个小节中,本文对实验目的进行了说明,并基于这个目的完成了对实验对象的选择:混合攻击、语言学变异、角色扮演、DAN攻击四种攻击方式,以及来自DeepSeek、Qwen、ChatGLM三个系列的六款大模型。随后,本文开始详细描述完整的实验流程,重点介绍了自动化生成提示词以及自动化判断攻击结果的研究方法,并展示了四种攻击使用的提示词和产生的输出结果。在实验部分的最后,本文根据实验结果的统计数据进行了分析,并总结出实验的结论:DeepSeek系列的模型安全性能较差、语言学变异的攻击方式成功率相对较高、参数规模较大的大模型的安全防护性能通常比同系列小型模型更强。 最后,本文对这次实验进行了归纳与总结,并指出了局限性以及对未来研究的展望。该实验的数据和结论可为其他对实验选取模型或其他小型模型安全性的研究提供一定参考。
关键词:大语言模型,提示词攻击,模型安全,国产模型,自动化生成 |
| 外文摘要: |
With the increasingly widespread application of large language models, the security of these models has drawn growing attention. Among various attack methods, prompt injection attacks are one of the most threatening types, characterized by high complexity, diverse forms, low attack threshold, and difficulty in prevention. Therefore, they have significant research value. This paper first explains some background knowledge related to prompt injection attacks, and then focuses on studying and introducing the definitions, principles, and examples of multiple prompt injection attack methods involved in the four attacks in the experiment, laying the groundwork for the subsequent introduction of the complete experimental process. In the following section, the paper clarifies the experimental purpose and, based on this purpose, selects the experimental subjects: four attack methods (hybrid attack, linguistic variation, role-playing, and DAN attack) and six large models from three series (DeepSeek, Qwen, and ChatGLM). Subsequently, the paper elaborates on the complete experimental process, highlighting the research methods for automatically generating prompts and automatically judging attack results, and presents the prompts used in the four attacks and the resulting outputs. At the end of the experimental section, the paper analyzes the statistical data of the experimental results and concludes that models from the DeepSeek series have relatively poor security performance, linguistic variation attacks have a relatively high success rate, and large models with larger parameter scales generally have stronger security protection performance than smaller models in the same series. Finally, the paper summarizes and concludes the experiment, points out its limitations, and looks forward to future research. The data and conclusions of this experiment can provide certain references for other studies on the security of the selected models or other small models.
Keywords: Large language models, Prompt injection attacks, Model security, Domestic models, Automated generation |
| 论文分类号: | TP3 |
| 总页码: | 40 |
| 参考文献: |
[1] 王磊,张超. 基于Prompt的安全性测试与评估[J]. 信息安全研究,2023, 9(2): 105–112. [2] 赵鑫,李军毅,周昆,唐天一,文继荣. 大语言模型[M]. 北京:高等教育出版社,2024:3-8. [3] OpenAI. GPT-4 Technical Report[EB/OL]. https://openai.com/research/gpt-4, 2023-03-15. [7] OWASP. OWASP Top 10 for LLM Applications 2025[R]. California: OWASP, 2025. [9] 陈卓. 大语言模型的越狱攻击与防御综述[J]. 网络空间安全,2024, 12(4): 34–42. [13] 张谧,潘旭东,杨珉.JADE-DB:基于靶向变异的大语言模型安全通用基准测试集[J].计算机研究与发展,2024,61(05):1113-1127. [20] 非凡产研. 全球 AI 应用榜单(100AIApps)[EB/OL]. https://100aiapps.cn/, 2025-05-16. |
| 开放日期: | 2025-05-28 |
| 主修/辅修: | 主修 |