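Introduction
Interacting with LLMs today runs into a few common problems:
- We expect a stable output format, but the model is not fully reliable. We keep revising the prompt to tell the model which format to return, yet it still occasionally misbehaves and returns extra content.
  - In LangChain, the prompt that specifies the output format is generated from a fixed template.
  - If part of the format is wrong, LangChain provides retry and fix methods to reduce bad cases in the model output.
- Most interactions between Feed Server RD and large models today are single-turn, for applications such as text rewriting and summarization. Recently model applications have become more complex, for example evaluate-then-rewrite, where several prompts work together. Combining multiple prompts gives better results and makes more complex applications possible.
  - LangChain provides Chain to implement "sequential execution" (SequentialChain) and "branch execution" (RouterChain) in model applications.
  - LangChain can combine Agent with Tool so that the model can "keep thinking" and "have hands and feet", which makes it possible to build fairly complex LLM applications.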
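*What is LangChain:*
LangChain is an open-source framework for building applications based on large language models (LLMs). LLMs are large deep learning models pre-trained on massive amounts of data that can generate responses to user queries, for example answering questions or creating images from text-based prompts. LangChain provides tools and abstractions that improve the customization, accuracy, and relevance of the information the models generate.
*What this talk covers:*
- LangChain core components
- Two hands-on cases
  - A private knowledge base
  - Generating SQL automatically from a stated requirement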
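Component overview
Component introduction
Model IO
Language models (Model)
LangChain supports three broad categories of models:
- Large language models (LLM), also called Text Models: they take a text string as input and return a text string as output.
- Chat models (Chat Model): represented mainly by OpenAI's ChatGPT series, Ernie-bot, and similar models.
- Text embedding models (Embedding Model): they take text as input and return a list of floats, that is, an embedding.
  - Embedding models turn documents into vectors so that the documents can be stored in a vector database.
Both OpenAI and Baidu Qianfan provide all of these services; see the Qianfan large-model platform API documentation.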
Prompt templates (Prompts)
PromptTemplate
The most commonly used prompt template:
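A prompt template is at heart a string with named placeholders that get filled in at call time. The following is a minimal plain-Python sketch of that idea, not LangChain's implementation; the class name `SimplePromptTemplate` and the sample text are hypothetical.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template: text with named placeholders."""

    def __init__(self, template: str):
        self.template = template
        # Collect the placeholder names that appear in the template.
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **values: str) -> str:
        # Fail early with a clear message if a placeholder has no value.
        missing = [v for v in self.input_variables if v not in values]
        if missing:
            raise KeyError(f"missing variables: {missing}")
        return self.template.format(**values)


prompt = SimplePromptTemplate("Write a one-line ad for {flower} priced at {price} yuan.")
text = prompt.format(flower="roses", price="50")
print(text)
```

Keeping the variable list on the template makes it possible to validate inputs before any model call is made.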
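ChatPromptTemplate
The commonly used chat prompt template. It combines message templates for the different roles and passes the result to a chat model (Chat Model).
A chat model lets you assign the model a role and supply the conversation history.
The concrete message templates are ChatMessagePromptTemplate, HumanMessagePromptTemplate, AIMessagePromptTemplate, and SystemMessagePromptTemplate.
Formatted result: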
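To make the role-based composition concrete, here is a small plain-Python sketch, assuming a chat model that accepts a list of role/content messages. `MessageTemplate`, `format_messages`, and the sample texts are hypothetical names for illustration, not LangChain classes.

```python
from dataclasses import dataclass


@dataclass
class MessageTemplate:
    role: str      # "system", "human", or "ai"
    template: str  # text with {placeholders}

    def format(self, **values) -> dict:
        return {"role": self.role, "content": self.template.format(**values)}


def format_messages(templates, **values):
    """Render every role template into the message list a chat model expects."""
    return [t.format(**values) for t in templates]


chat_prompt = [
    MessageTemplate("system", "You are a florist who answers in {language}."),
    MessageTemplate("human", "{question}"),
]
messages = format_messages(
    chat_prompt, language="English", question="How do I keep roses fresh?"
)
for message in messages:
    print(message)
```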
FewShotPromptTemplate
*What is few-shot?*
The concepts of few-shot, one-shot, and the corresponding zero-shot all originate in machine learning.
We can use FewShotPromptTemplate to supply a few examples:
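A few-shot prompt is assembled from a prefix, a handful of rendered examples, and a suffix that holds the new input. The sketch below shows that assembly in plain Python; the function `build_few_shot_prompt` and the flower examples are hypothetical and only illustrate the structure.

```python
def build_few_shot_prompt(examples, example_template, prefix, suffix, **values):
    """Render each example, then join prefix, examples, and the final question."""
    shots = [example_template.format(**example) for example in examples]
    return "\n\n".join([prefix, *shots, suffix.format(**values)])


examples = [
    {"flower": "rose", "occasion": "love"},
    {"flower": "carnation", "occasion": "Mother's Day"},
]
few_shot_prompt = build_few_shot_prompt(
    examples,
    example_template="Flower: {flower}\nOccasion: {occasion}",
    prefix="Name the occasion each flower suits.",
    suffix="Flower: {flower}\nOccasion:",
    flower="lily",
)
print(few_shot_prompt)
```

The suffix deliberately ends with an open "Occasion:" so that the model continues the pattern set by the examples.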
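PipelinePromptTemplate
In everyday business development we often extract common modules into standalone parts and then compose them in the business code. The same idea applies to LLM application development: for example, shared prompt templates can be factored out so that they are easier to reuse, which cuts unnecessary code and keeps both code and logic simple.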
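The reuse idea can be sketched in a few lines: shared fragments are rendered first and then slotted into a final template. This is a plain-Python illustration under assumed names (`ROLE`, `FORMAT`, `compose`), not the LangChain class itself.

```python
# Shared fragments, written once and reused by several final prompts.
ROLE = "You are a {role}."
FORMAT = "Answer in at most {max_words} words."


def compose(final_template, pipeline, **values):
    """Render each named fragment, then slot the results into the final template."""
    rendered = {name: template.format(**values) for name, template in pipeline}
    return final_template.format(**{**values, **rendered})


final = "{role_part}\n{format_part}\nQuestion: {question}"
pipeline_prompt = compose(
    final,
    [("role_part", ROLE), ("format_part", FORMAT)],
    role="florist",
    max_words=30,
    question="How often should I water an orchid?",
)
print(pipeline_prompt)
```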
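Output parsing (Parse)
LangChain's output parsing makes it easier to extract structured information from model output, which greatly speeds up application development on top of language models.
StructuredOutputParser
Structured output parser: this parser handles complex, structured output. It fits applications that need the model to produce complex answers with a specific structure.
Returned result: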
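A structured output parser has two halves: format instructions that go into the prompt, and a parse step that pulls the structure back out of the reply. Below is a minimal sketch of both halves, assuming the model answers with a JSON object inside a markdown code block; the function names and the sample reply are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence, built this way to keep the listing readable


def format_instructions(schema: dict) -> str:
    """Text appended to the prompt to tell the model which shape to return."""
    fields = "\n".join(f'  "{name}": string  // {desc}' for name, desc in schema.items())
    return f"Return a markdown code block tagged json with this shape:\n{{\n{fields}\n}}"


def parse_structured(text: str, schema: dict) -> dict:
    """Extract the JSON object from the reply and check that every key is present."""
    match = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    payload = match.group(1) if match else text
    data = json.loads(payload)
    missing = [key for key in schema if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


schema = {
    "description": "a short description of the flower",
    "reason": "why it suits the occasion",
}
# Hypothetical model reply with some chatter around the JSON block.
reply = f'Sure!\n{FENCE}json\n{{"description": "red rose", "reason": "symbol of love"}}\n{FENCE}'
parsed = parse_structured(reply, schema)
print(parsed)
```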
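Pydantic (JSON)
What it is: this parser handles model output that should be a JSON object in a specific format. It uses Pydantic, a data validation library that can define complex data models and ensure that the model's output matches the expected data model.
Case:
Output: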
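The validation step can be illustrated without the Pydantic dependency. The sketch below swaps in a standard-library dataclass with a hand-written type check as a stand-in for what Pydantic automates; `FlowerDescription`, its fields, and `parse_flower` are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class FlowerDescription:
    flower_type: str
    price: int
    description: str

    def __post_init__(self):
        # Check each field against its annotated type, roughly what Pydantic automates.
        for f in fields(self):
            value = getattr(self, f.name)
            if type(value) is not f.type:
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )


def parse_flower(text: str) -> FlowerDescription:
    """Parse the model's JSON reply into a validated object."""
    return FlowerDescription(**json.loads(text))


flower = parse_flower('{"flower_type": "rose", "price": 50, "description": "a red rose"}')
print(flower)
```

A reply whose `price` is the string "fifty" raises `TypeError` here, which is the signal a caller would use to trigger a fix or retry step.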
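Auto-Fixing Parser
What it is: this parser can automatically fix certain common errors in model output. For example, if the output should be valid JSON but the model returns malformed JSON, the auto-fixing parser can have it corrected.
Case: fixing a JSON reply returned in the wrong format
How the automatic fix is implemented:
Internally, OutputFixingParser calls the original PydanticOutputParser. If parsing succeeds, the result is returned. If it fails, the malformed output and the format instructions are passed to the LLM, which is asked to repair the output.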
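The try-then-repair flow described above can be sketched as follows. This is an illustration with a stubbed model, not LangChain's code: `fixing_parse` and `fake_llm` are hypothetical, and sending the error message along with the bad output is an assumption of this sketch.

```python
import json


def fixing_parse(text, parse, llm, format_instructions):
    """Try the wrapped parser first; on failure ask the LLM once for a repaired output."""
    try:
        return parse(text)
    except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
        repair_prompt = (
            f"Instructions:\n{format_instructions}\n\n"
            f"Completion:\n{text}\n\n"
            f"Error:\n{err}\n\n"
            "Please return a corrected completion that satisfies the instructions."
        )
        return parse(llm(repair_prompt))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: "repairs" the output by fixing the quotes.
    bad = prompt.split("Completion:\n")[1].split("\n\nError:")[0]
    return bad.replace("'", '"')


bad_output = "{'name': 'rose', 'colors': ['red', 'white']}"
fixed = fixing_parse(bad_output, json.loads, fake_llm, "Return valid JSON.")
print(fixed)
```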
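RetryWithErrorOutputParser
This parser is used when the model's first output does not meet expectations: it tries to fix the output or to generate a new one. For example, if the output should be a date but the model returns a string that is not a valid date, the retry parser can prompt the model again to produce the correct date format.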
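A retry differs from a pure format fix in that the original prompt goes back to the model together with the failed completion and the error, so the model can answer the question again. The loop below is a minimal sketch of that behavior with a stubbed model; `retry_parse`, `parse_date`, and the sample values are hypothetical.

```python
from datetime import date


def parse_date(text: str) -> date:
    # Raises ValueError when the text is not an ISO date such as 2024-01-19.
    return date.fromisoformat(text.strip())


def retry_parse(prompt, completion, parse, llm, max_retries=2):
    """Re-ask the model with prompt, bad completion, and error until parsing works."""
    for attempt in range(max_retries + 1):
        try:
            return parse(completion)
        except ValueError as err:
            if attempt == max_retries:
                raise
            completion = llm(
                f"Prompt:\n{prompt}\n\nCompletion:\n{completion}\n\n"
                f"That completion failed with: {err}\nPlease answer the prompt again."
            )


def fake_llm(retry_prompt: str) -> str:
    # Stand-in for a real model call; always answers with a valid ISO date.
    return "2024-01-19"


delivery = retry_parse(
    "When will the bouquet be delivered? Answer as YYYY-MM-DD.",
    "next Friday",
    parse_date,
    fake_llm,
)
print(delivery)
```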
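Chain
Sequential chain (SequentialChain)
A sequential chain connects the inputs and outputs of multiple prompts in series.
Case: write a social-media post based on a flower's introduction and review
Step 1. First LLMChain: generate an introduction of the flower
Step 2. Second LLMChain: write a review of the flower based on the introduction
Step 3. Third LLMChain: write a social-media post based on the introduction and the review; it needs the results of steps 1 and 2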
https://github.com/huangjia2019/langchain/blob/main/08_%E9%93%BE%E4%B8%8A/04_SequentialChain.py
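The three steps above can be sketched without any framework: each step formats its prompt from a shared state and writes its output back under a new key, so later steps can read earlier results. `run_sequential`, the templates, and the stubbed `fake_llm` are hypothetical and only mirror the structure of the linked example.

```python
def run_sequential(steps, inputs, llm):
    """Run prompts in order; each output is stored so later steps can use it."""
    state = dict(inputs)
    for template, output_key in steps:
        state[output_key] = llm(template.format(**state))
    return state


calls = []


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; records the prompt and returns a tagged reply.
    calls.append(prompt)
    return f"reply-{len(calls)}"


steps = [
    ("Introduce the flower {name}.", "introduction"),
    ("Write a review based on this introduction:\n{introduction}", "review"),
    ("Write a social-media post from:\n{introduction}\n{review}", "post"),
]
result = run_sequential(steps, {"name": "rose"}, fake_llm)
print(result)
```

The third prompt contains both `reply-1` and `reply-2`, which is the dependency on steps 1 and 2 that the case describes.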
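Router chain (RouterChain)
A router chain essentially lets the model perform an if-else.
Given two scenarios, each with its own prompt for its own task, we first write a prompt that lets the model decide which of the two should run, and then use the selected prompt to get the result.
Case: a smart customer-service bot that chooses a different prompt depending on the user's question. Questions fall into two classes: [how to grow flowers] and [how to do flower decoration].
Step 1. Write the prompt for scenario 1
Step 2. Write the prompt for scenario 2
Step 3. Write a prompt that lets the model decide which prompt to choose
Code: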
https://github.com/huangjia2019/langchain/blob/main/09_%E9%93%BE%E4%B8%8B/Rounter_Chain.py
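The routing steps can be sketched as two model calls: one to pick a destination, one to answer with the chosen prompt. The code below is a plain-Python illustration with a stubbed model; `route`, the destination names, and the keyword logic inside `fake_llm` are hypothetical.

```python
def route(question, destinations, llm, default_template="{question}"):
    """Ask the model to pick a destination, then answer with that prompt."""
    menu = "\n".join(f"{name}: {desc}" for name, (desc, _) in destinations.items())
    router_prompt = (
        "Pick the best destination name for the question.\n"
        f"{menu}\nQuestion: {question}"
    )
    choice = llm(router_prompt).strip()
    # Fall back to the default prompt when the model names no known destination.
    template = destinations[choice][1] if choice in destinations else default_template
    return llm(template.format(question=question))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: routes on a keyword, otherwise echoes the prompt.
    if prompt.startswith("Pick the best destination"):
        question = prompt.rsplit("Question: ", 1)[1]
        return "flower_decoration" if "bouquet" in question else "flower_care"
    return "ANSWER FOR: " + prompt


destinations = {
    "flower_care": ("questions about growing flowers", "You are a gardening expert. {question}"),
    "flower_decoration": ("questions about flower decoration", "You are a floral designer. {question}"),
}
answer = route("How do I arrange a wedding bouquet?", destinations, fake_llm)
print(answer)
```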
Memory
How is it used?
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())
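Among the memory types, ConversationSummaryBufferMemory (conversation summary buffer memory) is a hybrid model that combines the other memory mechanisms, including features of ConversationSummaryMemory and ConversationBufferWindowMemory. It aims to summarize the earlier interactions in a conversation while keeping the original content of the most recent interactions as far as possible.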
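The hybrid behavior can be sketched as a buffer with a token budget: recent turns stay verbatim, and turns that no longer fit are folded into a running summary. This is a simplified illustration, not the LangChain class; `SummaryBufferMemory`, the word-count token estimate, and `fake_summarize` (which stands in for an LLM summarization call) are assumptions.

```python
class SummaryBufferMemory:
    """Hypothetical sketch: recent turns verbatim, older turns folded into a summary."""

    def __init__(self, summarize, max_tokens=12):
        self.summarize = summarize   # callable(old_summary, old_turns) -> new summary
        self.max_tokens = max_tokens
        self.summary = ""
        self.buffer = []             # most recent turns, kept word for word

    @staticmethod
    def _tokens(turns):
        # Crude token count: whitespace-separated words.
        return sum(len(turn.split()) for turn in turns)

    def save(self, human, ai):
        self.buffer += [f"Human: {human}", f"AI: {ai}"]
        overflow = []
        # Evict the oldest turns until the verbatim buffer fits the budget.
        while self._tokens(self.buffer) > self.max_tokens and len(self.buffer) > 2:
            overflow.append(self.buffer.pop(0))
        if overflow:
            self.summary = self.summarize(self.summary, overflow)

    def load(self):
        head = [f"Summary: {self.summary}"] if self.summary else []
        return "\n".join(head + self.buffer)


def fake_summarize(old_summary, turns):
    # Stand-in for an LLM summarization call: just concatenates the old turns.
    return (old_summary + " " + " / ".join(turns)).strip()


memory = SummaryBufferMemory(fake_summarize, max_tokens=12)
memory.save("My sister's birthday is tomorrow", "Roses are a classic choice")
memory.save("She likes pink", "Then pick pink roses")
print(memory.load())
```

With a real summarizer the summary stays short, so the prompt size levels off instead of growing with every turn.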
Memory after the first conversation:
Chart of tokens consumed as the number of interactions grows:
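Agent
On translation
In most contexts Agent is translated into Chinese as 代理 ("proxy" or "representative"), but it is occasionally rendered as 智能体 ("intelligent agent"); a search with these two keywords returns results for both.
Personally, I feel that 智能体 fits the meaning better in the LLM context, because the way an Agent executes closely resembles the human thinking process.
How an Agent works
There are many agent types; this section covers the ReAct agent type.
Tools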
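Hands-on
Building your own domain-specific Q&A bot? What is RAG?
Related demo code:
Components used
- Document loading: document_loaders
- Text splitting: RecursiveCharacterTextSplitter
- Vector database
- Vector-based retrieval: MultiQueryRetriever
- RetrievalQA chain
This is in fact retrieval-augmented generation: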
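What is RAG? The full name is Retrieval-Augmented Generation. It combines retrieval with generation and brings external knowledge into text generation tasks. RAG pairs a conventional language generation model with a large external knowledge base, so that the model can dynamically retrieve relevant information from the knowledge base while generating a response. The aim is to strengthen generation so that the output is richer, more accurate, and better grounded, especially where specific details or external facts are needed.
How RAG works can be summarized in a few steps.
- Retrieval: for a given input (the question), the system first uses a retrieval system to find relevant documents or passages in a large document collection. The retrieval system is usually based on dense vector search, for example with vector stores such as ChromaDB or Faiss.
- Context encoding: once the relevant documents or passages are found, they are encoded together with the original input (the question).
- Generation: using the encoded context, the model generates the output (the answer). This step is normally done by the large model.
A key feature of RAG is that it does not rely only on the information in the training data; it can also retrieve information from a large external knowledge base. This makes RAG particularly suitable for questions that never appeared in the training data.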
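The retrieve-then-generate steps can be sketched end to end with the standard library. To stay self-contained, this sketch swaps the embedding model for bag-of-words counts and the vector database for an in-memory list; `embed`, `retrieve`, `answer`, the sample chunks, and the echoing `fake_llm` are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts. A real system would call an embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks, k: int = 1):
    """Return the k chunks most similar to the question."""
    query = embed(question)
    return sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)[:k]


def answer(question: str, chunks, llm) -> str:
    """Put the retrieved context and the question into one prompt for the model."""
    context = "\n".join(retrieve(question, chunks))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: echoes the prompt it received.
    return prompt


chunks = [
    "Roses should be watered twice a week.",
    "Refunds are accepted within seven days of delivery.",
    "Our shop opens at nine in the morning.",
]
question = "How often should roses be watered?"
top = retrieve(question, chunks)
print(answer(question, chunks, fake_llm))
```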
Connecting an LLM to a database? Generating SQL?
Without an Agent:
# Import LangChain utilities and related modules
Execution result:
Why?
You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question. [background]
This prompt is handed to the model to generate the SQL:
SELECT COUNT(DISTINCT "Name") AS "Number of Flowers" FROM "Flowers"
The query result then has to be interpreted. This step also sends a request to the model:
You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Round 2 response:
有5种不同的鲜花。 (There are 5 different kinds of flowers.)
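The two rounds above can be reproduced in miniature: round 1 asks the model for SQL, the SQL runs against the database, and round 2 asks the model to phrase the result. The sketch below uses an in-memory SQLite table and a stubbed model; `ask_database`, `fake_llm`, the prompt wording, and the sample rows are hypothetical.

```python
import re
import sqlite3


def ask_database(question, conn, llm):
    """Two model calls: one to write the SQL, one to turn the rows into an answer."""
    schema = "\n".join(
        row[0] for row in conn.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    )
    base = f"You are a SQLite expert.\nSchema:\n{schema}\nQuestion: {question}\n"
    sql = llm(base + "SQLQuery:")                                        # round 1
    rows = conn.execute(sql).fetchall()
    return llm(base + f"SQLQuery: {sql}\nSQLResult: {rows}\nAnswer:")    # round 2


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call, scripted for this one question.
    if prompt.endswith("SQLQuery:"):
        return 'SELECT COUNT(DISTINCT "Name") FROM "Flowers"'
    result = prompt.split("SQLResult: ")[1].split("\n")[0]
    count = re.search(r"\d+", result).group()
    return f"There are {count} different flowers."


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Flowers" ("Name" TEXT, "StockQuantity" INTEGER)')
conn.executemany(
    'INSERT INTO "Flowers" VALUES (?, ?)',
    [("Rose", 100), ("Tulip", 150), ("Lily", 80), ("Orchid", 50), ("Daisy", 60)],
)
reply = ask_database("How many kinds of flowers are there?", conn, fake_llm)
print(reply)
```

Note that the full schema of every table goes into both prompts, which is exactly what stops scaling once the database has many tables.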
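The problem: once there are many tables, say dozens, do we really have to give the model the schema of every table? The token budget cannot take that. The solution is to let the model decide which tables it needs.
Query using an Agent:
Code
from langchain.utilities import SQLDatabase
ReAct: output
How does LangChain implement this? How does it guide the model and combine it with tools? Rather than debugging the source code, it is quicker to capture the traffic and look at the final exchanges with the model to see which prompts LangChain generated.
This query sent 4 requests to the model.
In the first request, the model is asked to think about which tool it needs to call.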
Every request starts with the same prompt: the rules, the tool descriptions, the ReAct format, and the question 哪种鲜花的存货数量最少? ("Which flower has the lowest stock quantity?"). Only the trailing scratchpad grows from round to round. The shared prompt is:
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct sqlite query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most 10 results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
Only use the below tools. Only use the information returned by the below tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.
DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
If the question does not seem related to the database, just return "I don't know" as the answer.
sql_db_query: Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.
sql_db_schema: Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3
sql_db_list_tables: Input is an empty string, output is a comma separated list of tables in the database.
sql_db_query_checker: Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with sql_db_query!
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [sql_db_query, sql_db_schema, sql_db_list_tables, sql_db_query_checker]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: 哪种鲜花的存货数量最少?
Thought: I should look at the tables in the database to see what I can query. Then I should query the schema of the most relevant tables.
Round | Note | Request (text appended to the shared prompt) | Result |
---|---|---|---|
1 | The LLM is given the tools and asked to decide which tool to pick and which action to take to solve the problem. It chooses sql_db_list_tables to see which tables exist first. | (nothing appended) | Action: sql_db_list_tables Action Input: "" |
2 | The tool is called and returns all table names; the model considers which table to use. | Round 1 result, then Observation: Flowers Thought: | I should query the schema of the Flowers table to see what columns are available. Action: sql_db_schema Action Input: Flowers |
3 | The model looks at the table names and picks the Flowers table; the agent calls the tool to get the table schema and queries the model again. | Round 2 result, then Observation: (the schema and sample rows listed below the table) Thought: | I can query the Flowers table to find the flower with the minimum stock quantity. Action: sql_db_query Action Input: SELECT Name, StockQuantity FROM Flowers ORDER BY StockQuantity ASC LIMIT 1 |
4 | Get the final answer. | Round 3 result, then Observation: [('Orchid', 50)] Thought: | The flower with the minimum stock quantity is Orchid. Final Answer: Orchid |
The schema observation returned by sql_db_schema in round 3:
CREATE TABLE "Flowers" (
    "ID" INTEGER,
    "Name" TEXT NOT NULL,
    "Type" TEXT NOT NULL,
    "Source" TEXT NOT NULL,
    "PurchasePrice" REAL,
    "SalePrice" REAL,
    "StockQuantity" INTEGER,
    "SoldQuantity" INTEGER,
    "ExpiryDate" DATE,
    "Description" TEXT,
    "EntryDate" DATE DEFAULT CURRENT_DATE,
    PRIMARY KEY ("ID")
)
/*
3 rows from Flowers table:
ID Name Type Source PurchasePrice SalePrice StockQuantity SoldQuantity ExpiryDate Description EntryDate
1 Rose Flower France 1.2 2.5 100 10 2023-12-31 A beautiful red rose 2024-01-17
2 Tulip Flower Netherlands 0.8 2.0 150 25 2023-12-31 A colorful tulip 2024-01-17
3 Lily Flower China 1.5 3.0 80 5 2023-12-31 An elegant white lily 2024-01-17
*/
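The four rounds traced above follow one loop: send the scratchpad to the model, parse an Action out of the reply, run the matching tool, append the Observation, and repeat until a Final Answer appears. The sketch below replays that loop against an in-memory SQLite table. It is a simplified illustration, not LangChain's agent code: the long instruction prompt is omitted, `run_agent` is hypothetical, and `scripted_llm` is a stub that walks through the same steps as the captured trace.

```python
import re
import sqlite3


def run_agent(question, tools, llm, max_steps=5):
    """Minimal ReAct loop: Thought/Action -> tool call -> Observation -> repeat."""
    scratchpad = f"Question: {question}\nThought:"
    for _ in range(max_steps):
        reply = llm(scratchpad)
        final = re.search(r"Final Answer:\s*(.*)", reply)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\s*Action Input:\s*(.*)", reply, re.DOTALL)
        if not action:
            raise ValueError(f"could not parse model reply: {reply!r}")
        name, arg = action.group(1), action.group(2).strip().strip('"')
        observation = tools[name](arg)
        scratchpad += f" {reply}\nObservation: {observation}\nThought:"
    raise RuntimeError("agent did not finish within max_steps")


def scripted_llm(prompt: str) -> str:
    # Stand-in for a real model: chooses the next step from the observations so far.
    step = prompt.count("Observation:")
    if step == 0:
        return 'I should list the tables first.\nAction: sql_db_list_tables\nAction Input: ""'
    if step == 1:
        return "I should check the schema.\nAction: sql_db_schema\nAction Input: Flowers"
    if step == 2:
        return (
            "I can query the Flowers table.\nAction: sql_db_query\nAction Input: "
            "SELECT Name, StockQuantity FROM Flowers ORDER BY StockQuantity ASC LIMIT 1"
        )
    name = re.findall(r"\('(\w+)',", prompt)[-1]  # read the name from the last result
    return f"I now know the final answer.\nFinal Answer: {name}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flowers (Name TEXT, StockQuantity INTEGER)")
conn.executemany(
    "INSERT INTO Flowers VALUES (?, ?)",
    [("Rose", 100), ("Tulip", 150), ("Lily", 80), ("Orchid", 50)],
)
tools = {
    "sql_db_list_tables": lambda _: ", ".join(
        row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    ),
    "sql_db_schema": lambda table: conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?", (table,)
    ).fetchone()[0],
    "sql_db_query": lambda sql: str(conn.execute(sql).fetchall()),
}
final_answer = run_agent("Which flower has the lowest stock quantity?", tools, scripted_llm)
print(final_answer)
```

Because the model only asks for the schema of the table it has chosen, the prompt carries one table's schema instead of all of them, which is how the agent approach addresses the token problem raised earlier.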
How to build a personal assistant?
[Video by 林亦LYi] https://v.douyin.com/iLDk6FcB/
References
https://github.com/huangjia2019/langchain
Related material
Appendix
Large language model parameters