I wonder how many people are like me: needing to use multiple models, not for research, but because the programming language I work in is simply too ancient. I tried a few open-source projects, but at first none of them fully worked out for me:
ChatAll 
ChatHub 
Byzer-llm 
 
1. Deployment goals
Integrate OpenAI, Zhipu, and Tongyi Qianwen.
Zhipu is currently giving away a few months' worth of free tokens, and my business only needs copy generation, so OpenAI is not a hard requirement.
 
 
All keys are maintained externally and never stored on the platform; it acts purely as a relay.
Start automatically on boot.
 
My situation: I'm unfamiliar with Python deployment and with Linux. The whole process took nearly 4 days, including one all-nighter. So if your level is about the same as mine, reading this should save you some fumbling around.
2. Environment preparation
The company has two machines on Alibaba Cloud, one Windows and one Debian. I tried deploying on Windows first and failed; after talking to the author I learned it had only been tested on macOS and Linux, so I let that go.
Zhipu keys currently come with a free-token promotion, valid for roughly 3 months. The sign-up page for Tongyi Qianwen is hard to find; it lives on Alibaba Cloud, and the name certainly has personality. For OpenAI I proxy through a Cloudflare Worker.
 
2.1 Install the Python environment

wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
chmod +x Anaconda3-2023.09-0-Linux-x86_64.sh
./Anaconda3-2023.09-0-Linux-x86_64.sh
One small gotcha: at the very end you have to press ESC to continue.
 
Since I start everything as a non-root user and habitually install applications under /opt, the richardson user needs permissions there:

# as the root user
mkdir /opt/condaproj
chown richardson:richardson /opt/condaproj
2.2 Create a virtual environment

# back as the richardson user
cd /opt/condaproj/
~/anaconda3/bin/conda create --name Byzer-LLM python=3.10.11 anaconda
2.3 Install byzer-llm
Activate the environment and install:

. ~/anaconda3/bin/activate Byzer-LLM
git clone https://hub.fgit.cf/allwefantasy/byzer-llm.git
cd byzer-llm
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
pip install -U byzerllm -i https://mirrors.aliyun.com/pypi/simple/
ray start --head
Note that the clone URL here is not GitHub itself but a third-party mirror, which may go stale at any time. I went through five or six mirrors before one worked, mostly because I'm not very good with git...
 
The "GPU" talk here refers to deploying a large model locally and running it on your own hardware. My need is remote models like OpenAI's, which Byzer calls "saas" models.
 
When Ray starts here, it binds to 127.0.0.1 by default. In other words, if the Linux box has no graphical interface, port 8265 (the dashboard) is unreachable from outside. The fix is described below.
 
This looks breezy written down, but it actually cost me a lot of time, because there are many large dependencies to install. My recommendation is to install screen. Think of it as a remote desktop: the session keeps running even after you disconnect from it, so a long install doesn't die halfway.
 
sudo apt install screen -y
screen
Just press Enter and a new session starts. Listing sessions and reattaching:
screen -ls
screen -r xxxxx
3. Start the service

ray start --head --dashboard-host=172.16.225.209
The IP here is the ECS instance's private address, again for safety. The default is 127.0.0.1; I only changed the bind address because the other Windows machine needs to view the dashboard. Putting nginx or Caddy in front as a reverse proxy would also be easy.
4. My integration
My requirement is to talk to all these models through Byzer: the existing application keeps making plain HTTP requests and no longer cares about the implementation underneath. For looser coupling, the model keys are passed in from outside, and Byzer provides a deploy mechanism, so as long as everything is initialized once up front, later calls don't need to carry those parameters.
The file structure looks roughly like this:
app.py 
llm_zhipu.py 
llm_openai.py 
llm_tongwen.py 
…… 
 
With app.py doing the dispatching, the overall structure stays clean, and more models can be bolted on at any time.
One pitfall: don't let a file name collide with an existing package. I initially created an openai.py, and it took a whole day of debugging before renaming it to llm_openai.py fixed everything. What can I even say about that!!!
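To see why this breaks: Python puts the script's directory first on sys.path, so a local file named openai.py shadows the installed openai package. A minimal demonstration (file and path names are illustrative):

# run this in a directory that also contains a file named openai.py
import openai
# prints the local ./openai.py path instead of .../site-packages/openai/__init__.py,
# so every attribute lookup against the real SDK fails mysteriously
print(openai.__file__)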
 
4.1 app.py

from flask import Flask
from llm_zhipu import zhipu_app
from llm_openai import openai_app
from llm_tongwen import tongwen_app

app = Flask(__name__)
app.register_blueprint(zhipu_app)
app.register_blueprint(openai_app)
app.register_blueprint(tongwen_app)
# quick check that all three blueprints registered
print(app.blueprints)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8099)
4.2 llm_openai.py

import ray
from byzerllm.utils.client import ByzerLLM
from flask import request, jsonify, Blueprint

openai_app = Blueprint('openai', __name__)
# attach to the running Ray cluster; ignore_reinit_error lets several
# blueprint modules call ray.init() without conflict
ray.init(address="auto", namespace="default", ignore_reinit_error=True)
llm = ByzerLLM(verbose=True)

@openai_app.route('/openai/deploy', methods=['POST'])
def openai_deploy():
    data = request.get_json()
    api_key = data.get('api_key')
    model_name = data.get('model_name')
    model_type = data.get('model_type')
    chat_name = data.get('chat_name')
    http_proxy = data.get('http_proxy')
    workers = data.get('workers')
    # saas models run remotely, so no GPU is needed on our side
    llm.setup_num_workers(workers).setup_gpus_per_worker(0)
    llm.deploy(model_path="",
               pretrained_model_type=model_type,
               udf_name=chat_name,
               infer_params={
                   "saas.api_key": api_key,
                   "saas.model": model_name,
                   "saas.base_url": http_proxy,
               })
    return jsonify({"ret": "ok"})

@openai_app.route('/openai/chat', methods=['POST'])
def openai_chat():
    print("Receiving openai chat request...")
    data = request.get_json()
    content = data.get('content')
    chat_name = data.get('chat_name')
    v = llm.chat_oai(model=chat_name, conversations=[{
        "role": "user",
        "content": content,
    }])
    results = v[0].output
    return jsonify(results)
The proxy I use doesn't seem to need anything fancy; setting base_url is enough. The official OpenAI examples include a proxies setting, which felt like overkill.
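For comparison, a rough sketch of calling OpenAI directly through such a relay with the official Python SDK (v1.x) — the Worker URL below is a placeholder, and the point is that only base_url changes, no proxies dict needed:

# direct SDK call through a forwarding proxy; base_url is hypothetical
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://your-worker.example.workers.dev/v1",  # Cloudflare Worker relay
)
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)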
 
4.3 llm_zhipu.py

import ray
from byzerllm.utils.client import ByzerLLM
from flask import request, jsonify, Blueprint

zhipu_app = Blueprint('zhipu', __name__)
ray.init(address="auto", namespace="default", ignore_reinit_error=True)
llm = ByzerLLM(verbose=True)

@zhipu_app.route('/zhipu/deploy', methods=['POST'])
def zhipu_deploy():
    data = request.get_json()
    api_key = data.get('api_key')
    model_name = data.get('model_name')
    chat_name = data.get('chat_name')
    model_type = data.get('model_type')
    workers = data.get('workers')
    llm.setup_num_workers(workers).setup_gpus_per_worker(0)
    llm.deploy(model_path="",
               pretrained_model_type=model_type,
               udf_name=chat_name,
               infer_params={
                   "saas.api_key": api_key,
                   "saas.model": model_name,
               })
    return jsonify({"ret": "ok"})

@zhipu_app.route('/zhipu/chat', methods=['POST'])
def zhipu_chat():
    print("Receiving zhipu chat request...")
    data = request.get_json()
    content = data.get('content')
    chat_name = data.get('chat_name')
    v = llm.chat_oai(model=chat_name, conversations=[{
        "role": "user",
        "content": content,
    }])
    results = v[0].output
    return jsonify(results)
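llm_tongwen.py isn't shown in this post; here is a minimal sketch under the assumption that it mirrors llm_zhipu.py one-for-one (the Tongyi Qianwen model_type string, like everything else, arrives in the deploy request, so nothing provider-specific is hard-coded):

# llm_tongwen.py — hypothetical, mirroring llm_zhipu.py above
import ray
from byzerllm.utils.client import ByzerLLM
from flask import request, jsonify, Blueprint

tongwen_app = Blueprint('tongwen', __name__)
ray.init(address="auto", namespace="default", ignore_reinit_error=True)
llm = ByzerLLM(verbose=True)

@tongwen_app.route('/tongwen/deploy', methods=['POST'])
def tongwen_deploy():
    data = request.get_json()
    llm.setup_num_workers(data.get('workers')).setup_gpus_per_worker(0)
    llm.deploy(model_path="",
               pretrained_model_type=data.get('model_type'),
               udf_name=data.get('chat_name'),
               infer_params={
                   "saas.api_key": data.get('api_key'),
                   "saas.model": data.get('model_name'),
               })
    return jsonify({"ret": "ok"})

@tongwen_app.route('/tongwen/chat', methods=['POST'])
def tongwen_chat():
    data = request.get_json()
    v = llm.chat_oai(model=data.get('chat_name'), conversations=[{
        "role": "user",
        "content": data.get('content'),
    }])
    return jsonify(v[0].output)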
4.4 Startup and upgrades
To upgrade later, rerun the pip upgrade from the installation step (pip install -U byzerllm -i https://mirrors.aliyun.com/pypi/simple/) in the byzer-llm directory.
Test script:

curl -X POST -H "Content-Type: application/json" -d '{
    "api_key": "9d3c9fxxxxxxx",
    "chat_name": "zhipu_chat",
    "model_name": "glm-4",
    "model_type": "saas/zhipu",
    "workers": 4
}' http://127.0.0.1:8099/zhipu/deploy

Chat:

curl -X POST -H "Content-Type: application/json" -d '{
    "chat_name": "zhipu_chat",
    "content": "你好,你是谁"
}' http://127.0.0.1:8099/zhipu/chat
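If curl isn't handy, the same smoke test from Python; this assumes the relay is running locally on port 8099 as configured in app.py, with a placeholder key:

# smoke_test.py — deploy once, then chat
import requests

base = "http://127.0.0.1:8099"

r = requests.post(f"{base}/zhipu/deploy", json={
    "api_key": "9d3c9fxxxxxxx",
    "chat_name": "zhipu_chat",
    "model_name": "glm-4",
    "model_type": "saas/zhipu",
    "workers": 4,
})
print(r.json())  # expect {"ret": "ok"}

r = requests.post(f"{base}/zhipu/chat", json={
    "chat_name": "zhipu_chat",
    "content": "你好,你是谁",
})
print(r.json())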
 
 
The service log for the chat request:

2024-01-19 11:09:29,479 INFO worker.py:1489 -- Connecting to existing Ray cluster at address: 172.16.225.209:6379...
2024-01-19 11:09:29,488 INFO worker.py:1664 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265
Send to model[zhipu_chat]:['{"instruction": "", "meta": true}']
Send to model[zhipu_chat]:['{"instruction": "你好,你是谁", "history": []}']
(UDFWorker pid=3677479) MODEL[zhipu_chat] Init Model,It may take a while.
(UDFWorker pid=3677479) MODEL[zhipu_chat] Successful to init model, time taken:0.025364160537719727s
你好,我是一个人工智能助手,很高兴为您提供帮助。请问有什么问题我可以解答或者协助您解决吗?
4.5 Proper deployment
Since I couldn't get uwsgi working, I switched to gunicorn as a stopgap:

conda install gunicorn
gunicorn -w 4 -b 0.0.0.0:8099 app:app
5. Failure log
Installing dependencies took a very long time, and many packages died halfway through, even inside screen; the only remedy was rerunning the command.
Installing uwsgi failed. The author suggested creating a separate conda environment and installing uwsgi there with pip.
But that new environment also needs ray and related packages; pip install ray, pyjava and byzerllm turned out not to be enough.
Still more packages were required, so I gave up.
 
 
 
The uwsgi failure in detail:
It got stuck at the final compile step, complaining that libpython3.10.a does not exist.
uwsgi itself could be installed, but connecting to the Ray cluster from a conda environment set up differently from Byzer-LLM's raises the mismatch described at https://discuss.ray.io/t/runtimeerror-version-mismatch-when-using-conda/13191:
RuntimeError: Version mismatch: The cluster was started with:
    Ray: 2.8.0
    Python: 3.10.8
This process on node 10.42.0.23 was started with:
    Ray: 2.8.0
    Python: 3.10.13
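A quick sanity check you can run in each conda environment before starting anything, to catch this mismatch early (nothing byzer-specific, just the interpreter and ray versions, which must match what the cluster was started with):

# version_check.py
import sys
import ray

print(sys.version.split()[0])  # e.g. 3.10.8 — must match the cluster's Python
print(ray.__version__)         # e.g. 2.8.0 — must match the cluster's Ray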
In short, I never got it working. For the record, the uwsgi config I was trying:

[uwsgi]
http-socket = 0.0.0.0:8099
base = /opt/condaproj/llm_adapter
chdir = /opt/condaproj/llm_adapter
pidfile = ./uwsgi.pid
wsgi-file = app.py
master = true
processes = 4
threads = 2
callable = app
py-autoreload = 1
enable-threads = true
logto = /var/log/llm/%n.log
vacuum = true
die-on-term = true
6. Follow-up plans
Byzer looks seriously capable; I plan to dig into it properly.
7. Getting Gemini working (added 2024-01-30, hacked together on my own)
Why does the chat endpoint in the code below take all those parameters? Because I never figured out how to set them globally: the assignments inside the deploy handler don't stick. I also tried llm.sys_conf, which didn't seem to work either, so this stopgap will do for now; a sketch of the global-variable fix follows the code.
 
from byzerllm.utils.client import ByzerLLM
from flask import request, jsonify, Blueprint
import requests
import json

gemini_app = Blueprint('gemini', __name__)
llm = ByzerLLM(verbose=True)

gemini_key = ''
gemini_proxy = ''
gemini_model = ''

@gemini_app.route('/gemini/deploy', methods=['POST'])
def gemini_deploy():
    data = request.get_json()
    # without a `global` declaration these assignments only create locals,
    # so the module-level values above are never actually updated
    gemini_key = data.get('api_key')
    gemini_proxy = data.get('http_proxy')
    gemini_model = data.get('model_name')
    print(gemini_model)
    return jsonify({"ret": "ok"})

@gemini_app.route('/gemini/chat', methods=['POST'])
def gemini_chat():
    print("Receiving gemini chat request...")
    data = request.get_json()
    content = data.get('content')
    chat_name = data.get('chat_name')
    # workaround: take key/proxy/model from every request instead of globals
    gemini_key = data.get('api_key')
    gemini_proxy = data.get('http_proxy')
    gemini_model = data.get('model_name')
    print(f'content: {content}')
    headers = {'Content-Type': 'application/json'}
    datas = {
        "contents": [
            {"parts": [{"text": content}]}
        ],
        "generationConfig": {
            "temperature": 0.9,
            "topK": 1,
            "topP": 1,
            "maxOutputTokens": 2048,
            "stopSequences": []
        },
        "safetySettings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
            {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
            {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
            {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
        ]
    }
    print(json.dumps(datas))
    response = requests.post(
        f'{gemini_proxy}/v1beta/models/{gemini_model}:generateContent?key={gemini_key}',
        headers=headers,
        data=json.dumps(datas)
    )
    print(response.json())
    results = response.json()['candidates'][0]['content']['parts'][0]['text']
    return jsonify(results)
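And here is the promised sketch of the global-variable fix, as a minimal self-contained example independent of Flask and Byzer: without a global declaration, the assignment in a function creates a local name and the module-level value stays empty.

# global_demo.py — why the deploy handler above never updates the module state
gemini_key = ''

def deploy_without_global(data):
    gemini_key = data.get('api_key')  # local variable; module value unchanged

def deploy_with_global(data):
    global gemini_key
    gemini_key = data.get('api_key')  # updates the module-level value

deploy_without_global({'api_key': 'k1'})
print(gemini_key)  # -> '' (still empty)
deploy_with_global({'api_key': 'k2'})
print(gemini_key)  # -> 'k2'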