Phi-3-mini-128k-instruct实战教程：基于vLLM API封装REST接口供Web端调用

张开发

• 2026/6/19 9:37:07 • 15 分钟阅读

分享文章

Phi-3-mini-128k-instruct实战教程基于vLLM API封装REST接口供Web端调用1. 模型简介Phi-3-Mini-128K-Instruct是一个38亿参数的轻量级开放模型属于Phi-3系列的最新成员。这个模型经过精心训练特别擅长理解和执行各种指令任务。模型的主要特点包括支持长达128K tokens的上下文处理能力在常识推理、语言理解、数学计算和编程任务上表现优异经过监督微调和直接偏好优化确保响应质量和安全性相比同类模型在参数规模小于130亿的模型中性能领先2. 环境准备与部署验证2.1 检查模型服务状态部署完成后可以通过以下命令验证服务是否正常运行cat /root/workspace/llm.log如果看到类似下面的输出说明模型已成功加载并准备好接收请求Loading model weights... Model loaded successfully vLLM API server started on port 80002.2 使用Chainlit进行初步测试Chainlit提供了一个简单的前端界面可以快速测试模型功能。2.2.1 启动Chainlit界面在终端运行以下命令启动Chainlitchainlit run app.py这将打开一个本地Web界面您可以直接与模型交互。2.2.2 测试模型响应在Chainlit界面中输入问题例如请用简单的语言解释量子计算的基本概念模型应该会返回一个结构清晰、易于理解的回答展示其指令遵循能力。3. 封装REST API接口3.1 创建FastAPI应用我们将使用FastAPI来封装vLLM的原始API提供更友好的REST接口。首先安装必要的依赖pip install fastapi uvicorn requests然后创建api_server.py文件from fastapi import FastAPI from pydantic import BaseModel import requests app FastAPI() class PromptRequest(BaseModel): prompt: str max_tokens: int 512 temperature: float 0.7 app.post(/generate) async def generate_text(request: PromptRequest): vllm_url http://localhost:8000/v1/completions headers {Content-Type: application/json} payload { prompt: request.prompt, max_tokens: request.max_tokens, temperature: request.temperature } response requests.post(vllm_url, jsonpayload, headersheaders) return response.json()3.2 启动API服务运行以下命令启动FastAPI服务uvicorn api_server:app --host 0.0.0.0 --port 5000现在您可以通过http://localhost:5000/generate访问封装后的API。4. Web前端集成4.1 创建简单的前端页面创建一个index.html文件!DOCTYPE html html head titlePhi-3 Mini 交互界面/title style body { font-family: Arial, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px; } #response { margin-top: 20px; white-space: pre-wrap; } textarea { width: 100%; height: 100px; } button { margin-top: 10px; padding: 8px 16px; } /style /head body h1Phi-3 Mini 128K Instruct/h1 textarea idprompt placeholder输入您的问题或指令.../textarea button onclickgenerateText()生成回答/button div idresponse/div script async function generateText() { const prompt document.getElementById(prompt).value; const responseDiv document.getElementById(response); responseDiv.textContent 正在生成回答...; try { const response await fetch(http://localhost:5000/generate, { method: POST, headers: { Content-Type: application/json }, body: JSON.stringify({ prompt: prompt }) }); const data await response.json(); responseDiv.textContent data.choices[0].text; } catch (error) { responseDiv.textContent 请求出错: error.message; } } /script /body /html4.2 测试完整流程确保vLLM服务运行在端口8000确保FastAPI服务运行在端口5000在浏览器中打开index.html文件输入问题并点击生成回答按钮观察模型返回的结果5. 进阶配置与优化5.1 添加身份验证为了保护API我们可以添加简单的API密钥验证。修改api_server.pyfrom fastapi import FastAPI, HTTPException, Header from pydantic import BaseModel import requests app FastAPI() API_KEY your-secret-key # 替换为实际密钥 class PromptRequest(BaseModel): prompt: str max_tokens: int 512 temperature: float 0.7 app.post(/generate) async def generate_text( request: PromptRequest, authorization: str Header(None) ): if authorization ! fBearer {API_KEY}: raise HTTPException(status_code403, detailInvalid API key) vllm_url http://localhost:8000/v1/completions headers {Content-Type: application/json} payload { prompt: request.prompt, max_tokens: request.max_tokens, temperature: request.temperature } response requests.post(vllm_url, jsonpayload, headersheaders) return response.json()5.2 前端添加API密钥修改前端JavaScript代码async function generateText() { const prompt document.getElementById(prompt).value; const responseDiv document.getElementById(response); responseDiv.textContent 正在生成回答...; try { const response await fetch(http://localhost:5000/generate, { method: POST, headers: { Content-Type: application/json, Authorization: Bearer your-secret-key }, body: JSON.stringify({ prompt: prompt }) }); const data await response.json(); responseDiv.textContent data.choices[0].text; } catch (error) { responseDiv.textContent 请求出错: error.message; } }6. 总结本教程详细介绍了如何将Phi-3-mini-128k-instruct模型通过vLLM部署并使用FastAPI封装为REST接口供Web前端调用。主要内容包括模型的基本特性和优势使用Chainlit进行初步测试和验证通过FastAPI创建RESTful接口开发简单的前端交互界面添加安全措施和优化建议这套方案可以轻松扩展到其他应用场景如构建知识问答系统开发智能客服接口创建内容生成工具搭建编程辅助平台通过本教程您应该已经掌握了将大型语言模型集成到Web应用中的完整流程。下一步可以尝试添加更多功能如对话历史记录、多轮对话支持或更复杂的前端界面。获取更多AI镜像想探索更多AI镜像和应用场景访问 CSDN星图镜像广场提供丰富的预置镜像覆盖大模型推理、图像生成、视频生成、模型微调等多个领域支持一键部署。