
Intel NeuralChat

  • Pull the model

    ollama run neural-chat
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 89fa737d3b85 · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 89fa737d3b85 · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.1
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 73940af9fe02 · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit f4c6a8e532e8 · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.3
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 89fa737d3b85 · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q4_0
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 73940af9fe02 · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q4_1
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit e94062f979ad · 4.6GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q5_0
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit b31c14a4bfcf · 5.0GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q5_1
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit 874ed3640927 · 5.4GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q8_0
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 8-bit 2965939336f6 · 7.7GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q2_K
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 2-bit 620f8a81424b · 3.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q3_K_S
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 3-bit 7825c6bd7656 · 3.2GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q3_K_M
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 3-bit cf3dc2d014a5 · 3.5GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q3_K_L
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 3-bit acf7b66e2aee · 3.8GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q4_K_S
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 7d8b92a60fab · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q4_K_M
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit d2fbc68137fc · 4.4GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q5_K_S
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit d2c083566569 · 5.0GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q5_K_M
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit f36c54b7e769 · 5.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-q6_K
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 6-bit 824a1e326c89 · 5.9GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.1-fp16
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization F16 1b3251040972 · 14GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q4_0
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit f4c6a8e532e8 · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q4_1
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 7a06e87d4696 · 4.6GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q5_0
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit 070678810598 · 5.0GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q5_1
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit 379f3f601243 · 5.4GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q8_0
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 8-bit 6e958fe1fd6e · 7.7GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q2_K
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 2-bit 103bcb6b16f2 · 3.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q3_K_S
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 3-bit aac750b8a41e · 3.2GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q3_K_M
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 3-bit f719737f98a2 · 3.5GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q3_K_L
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 3-bit 66750b08787d · 3.8GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q4_K_S
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 4a8f50c4525a · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q4_K_M
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 33172df918d2 · 4.4GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q5_K_S
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit 79ae554db28d · 5.0GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q5_K_M
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit 4f8475c82182 · 5.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-q6_K
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 6-bit 9c0c2ea9c046 · 5.9GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.2-fp16
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization F16 f5256ca4757e · 14GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["<|im_start|>","<|im_end|>"]} 74B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q4_0
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 89fa737d3b85 · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q4_1
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit e081f623106e · 4.6GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q5_0
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit f5372fe561c2 · 5.0GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q5_1
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit b8ba685c1303 · 5.4GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q8_0
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 8-bit fc8c63cac8c7 · 7.7GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q2_K
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 2-bit 85dca14ccac6 · 3.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q3_K_S
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 3-bit 645fb17707c9 · 3.2GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q3_K_M
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 3-bit 189d5ae31c17 · 3.5GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q3_K_L
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 3-bit 5b9c964d9ac8 · 3.8GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q4_K_S
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit 07c6aaab950b · 4.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q4_K_M
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 4-bit b1a7b3b85566 · 4.4GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q5_K_S
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit fb872be705fd · 5.0GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q5_K_M
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 5-bit 6e7f6242bbec · 5.1GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-q6_K
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization 6-bit 3098763b2a1d · 5.9GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
  • Pull the model

    ollama run neural-chat:7b-v3.3-fp16
    
  • Model info (model)

    Manifest Info Size
    model arch llama parameters 7B quantization F16 7b7e1b58d372 · 14GB
    template ### System: {{ .System }} ### User: {{ .Prompt }} ### Assistant: 67B
    params {"num_ctx":4096,"stop":["</s>","<|im_start|>","<|im_end|>"]} 91B
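
All v3.3 tags above share the same template and params and differ only in quantization and download size. As a rough way to choose among them, here is a minimal sketch of a hypothetical helper (not part of Ollama) that stores the sizes from the listing and returns the largest tag that fits a memory budget with an assumed 1 GB of headroom. Real memory use also depends on context length and runtime overhead, so treat the answer as a starting point only.

```python
from typing import Optional

# Download sizes in GB for the neural-chat:7b-v3.3-* tags, copied from the listing.
V33_SIZES_GB = {
    "7b-v3.3-q2_K": 3.1,
    "7b-v3.3-q3_K_S": 3.2,
    "7b-v3.3-q3_K_M": 3.5,
    "7b-v3.3-q3_K_L": 3.8,
    "7b-v3.3-q4_0": 4.1,
    "7b-v3.3-q4_K_S": 4.1,
    "7b-v3.3-q4_K_M": 4.4,
    "7b-v3.3-q4_1": 4.6,
    "7b-v3.3-q5_0": 5.0,
    "7b-v3.3-q5_K_S": 5.0,
    "7b-v3.3-q5_K_M": 5.1,
    "7b-v3.3-q5_1": 5.4,
    "7b-v3.3-q6_K": 5.9,
    "7b-v3.3-q8_0": 7.7,
    "7b-v3.3-fp16": 14.0,
}


def pick_tag(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Hypothetical helper: largest tag whose size plus headroom fits the budget."""
    fitting = {tag: size for tag, size in V33_SIZES_GB.items()
               if size + headroom_gb <= budget_gb}
    if not fitting:
        return None
    # max() keeps the first tag on ties, i.e. the one listed earlier in the dict.
    return "neural-chat:" + max(fitting, key=fitting.get)


print(pick_tag(8.0))  # neural-chat:7b-v3.3-q6_K
```

The 1 GB headroom is an arbitrary assumption; raise it if you plan to use a longer context window.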

Model Details

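This model is a 7B-parameter LLM fine-tuned on the Intel Gaudi 2 processor from Intel/neural-chat-7b-v3-1 on the meta-math/MetaMathQA dataset. The model was aligned using the Direct Preference Optimization (DPO) method with Intel/orca_dpo_pairs. Intel/neural-chat-7b-v3-1 was originally fine-tuned from mistralai/Mistral-7B-v0.1. For more information, refer to the blog post The Practice of Supervised Fine-tuning and Direct Preference Optimization on Intel Gaudi2.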
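
DPO trains directly on preference pairs (a chosen and a rejected answer per prompt, as in Intel/orca_dpo_pairs) against a frozen reference model, with no separate reward model. The sketch below computes the standard per-example DPO loss, -log σ(β · margin), from summed sequence log-probabilities. The default `beta=0.1` is an assumption for illustration, not the value Intel used.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss, -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: how much more likely the policy makes it
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_reward - rejected_reward)
    # Numerically stable -log(sigmoid(z)) for both signs of z.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: the margin is 0 and the loss is log(2), about 0.6931.
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
```

The loss falls below log(2) once the policy raises the chosen answer's likelihood relative to the rejected one, which is the signal DPO optimizes.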

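Model Detail | Description
Model Authors - Company | Intel. The NeuralChat team with members from DCAI/AISE/AIPT. Core team members: Kaokao Lv, Liang Lv, Chang Wang, Wenxin Zhang, Xuhui Ren, and Haihao Shen.
Date | December 2023
Version | v3-3
Type | 7B Large Language Model
Paper or Other Resources | Medium Blog
License | Apache 2.0
Questions or Comments | Community Tab and Intel DevHub Discord

Intended Use | Description
Primary intended uses | You can use the fine-tuned model for several language-related tasks. Check out the LLM Leaderboard to see how this model performs.
Primary intended users | Anyone doing inference on language-related tasks.
Out-of-scope uses | In most cases this model will need to be fine-tuned for your particular task. The model should not be used to intentionally create hostile or alienating environments for people.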

How to Use

Context length for this model: 8192 tokens (same as https://huggingface.co/mistralai/Mistral-7B-v0.1)
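
The card quotes an 8192-token context, while the Ollama params for every tag listed above set num_ctx to 4096. Here is a minimal sketch, assuming a local Ollama server on its default port 11434, that raises num_ctx per request through Ollama's /api/generate endpoint and passes a system prompt for the "### System:" slot, using only the Python standard library. The function names are illustrative, and this page does not confirm how output quality holds up at the full 8192 tokens.

```python
import json
import urllib.request
from typing import Optional

# Default address of a local Ollama server (assumption: standard install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, system: Optional[str] = None,
                           model: str = "neural-chat",
                           num_ctx: int = 8192) -> dict:
    """Build a non-streaming /api/generate payload with a larger context window."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Per-request override of the Modelfile's num_ctx (4096 for these tags).
        "options": {"num_ctx": num_ctx},
    }
    if system is not None:
        # Fills {{ .System }} in the template's "### System:" slot.
        payload["system"] = system
    return payload


def generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Usage (needs a running Ollama server with the model pulled):
# print(generate(build_generate_request("calculate 100 + 520 + 60",
#                                       system="You are a math expert assistant.")))
```

Setting num_ctx per request leaves the pulled model unchanged. A larger window also increases memory use for the KV cache.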

Reproduce the Model

Here is the sample code to reproduce the model: GitHub sample code. Here are the documented steps to reproduce the model build:

git clone https://github.com/intel/intel-extension-for-transformers.git
cd intel-extension-for-transformers

docker build --no-cache ./ --target hpu --build-arg REPO=https://github.com/intel/intel-extension-for-transformers.git --build-arg ITREX_VER=main -f ./intel_extension_for_transformers/neural_chat/docker/Dockerfile -t chatbot_finetuning:latest

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host chatbot_finetuning:latest

# after entering the Docker container
cd examples/finetuning/finetune_neuralchat_v3

We select the latest pretrained model, mistralai/Mistral-7B-v0.1, and the open-source dataset Open-Orca/SlimOrca for the experiment.

The script below launches training on 8 Gaudi2 cards with DeepSpeed ZeRO-2. In finetune_neuralchat_v3.py, the defaults use_habana=True, use_lazy_mode=True, device="hpu" target Gaudi2. To run on NVIDIA GPUs, set them to use_habana=False, use_lazy_mode=False, device="auto".

deepspeed --include localhost:0,1,2,3,4,5,6,7 \
    --master_port 29501 \
    finetune_neuralchat_v3.py
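
For readers unfamiliar with ZeRO-2, here is a minimal sketch of what a DeepSpeed ZeRO stage 2 configuration for an 8-card run can look like. The batch sizes are hypothetical values chosen for illustration; the configuration actually used by finetune_neuralchat_v3.py is not reproduced on this page and may differ.

```python
import json

# Hypothetical values for illustration; the real settings live in the
# example's own scripts and may differ.
NUM_CARDS = 8          # matches --include localhost:0,1,2,3,4,5,6,7
MICRO_BATCH = 1        # per-card batch size (assumption)
GRAD_ACCUM_STEPS = 8   # gradient accumulation steps (assumption)

ds_config = {
    "train_micro_batch_size_per_gpu": MICRO_BATCH,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
    # DeepSpeed expects train_batch_size == micro batch * accumulation * world size.
    "train_batch_size": MICRO_BATCH * GRAD_ACCUM_STEPS * NUM_CARDS,
    "bf16": {"enabled": True},  # assumption: bf16 training
    "zero_optimization": {
        "stage": 2,  # ZeRO-2: partition optimizer states and gradients across cards
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# Serialized as JSON, this is the kind of file a training script hands to DeepSpeed.
print(json.dumps(ds_config, indent=2))
```

ZeRO-2 keeps a full copy of the weights on every card and shards only optimizer states and gradients, which is usually enough for full fine-tuning of a 7B model across 8 cards.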

Merge the LoRA weights:

python apply_lora.py \
    --base-model-path mistralai/Mistral-7B-v0.1 \
    --lora-model-path finetuned_model/ \
    --output-path finetuned_model_lora
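
The internals of apply_lora.py are not shown here. Conceptually, merging LoRA folds each low-rank update back into its base weight as W' = W + (alpha / r) · B · A. The pure-Python sketch below illustrates that standard formula on tiny matrices. The function names are hypothetical, and a real merge operates on framework tensors rather than nested lists.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x k) and a (k x n) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix,
               alpha: float, rank: int) -> Matrix:
    """Hypothetical helper: W' = W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# A 2x2 zero base weight with a rank-1 update (A is 1x2, B is 2x1).
print(merge_lora([[0.0, 0.0], [0.0, 0.0]], [[3.0, 4.0]], [[1.0], [2.0]], alpha=2.0, rank=1))
# [[6.0, 8.0], [12.0, 16.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged output directory can be loaded like an ordinary checkpoint.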

Use the Model

FP32 inference with Transformers

import transformers


model_name = 'Intel/neural-chat-7b-v3-3'
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

def generate_response(system_input, user_input):

    # Format the input using the provided template
    prompt = f"### System:\n{system_input}\n### User:\n{user_input}\n### Assistant:\n"

    # Tokenize and encode the prompt
    inputs = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=False)

    # Generate a response
    outputs = model.generate(inputs, max_length=1000, num_return_sequences=1)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract only the assistant's response
    return response.split("### Assistant:\n")[-1]


# Example usage
system_input = "You are a math expert assistant. Your mission is to help users understand and solve various math problems. You should provide step-by-step solutions, explain reasonings and give the correct answer."
user_input = "calculate 100 + 520 + 60"
response = generate_response(system_input, user_input)
print(response)

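# Expected response
"""
To calculate the sum of 100, 520, and 60, we will follow these steps:

1. Add the first two numbers: 100 + 520
2. Add the result from step 1 to the third number: (100 + 520) + 60

Step 1: Add 100 and 520
100 + 520 = 620

Step 2: Add the result from step 1 to the third number (60)
(620) + 60 = 680

So, the sum of 100, 520, and 60 is 680.
"""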
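
The Ollama params for the v3.3 tags stop generation at </s>, <|im_start|> and <|im_end|>, while the Transformers example above only splits on the "### Assistant:" marker. Below is a small post-processing sketch (a hypothetical helper, not part of Transformers) that trims a decoded response at the earliest of those strings. With skip_special_tokens=True, </s> is normally removed already, so the trimming mainly matters if the model emits the other two markers as plain text.

```python
from typing import Iterable

# Stop sequences taken from the Ollama params for the v3.3 tags.
STOP_SEQUENCES = ("</s>", "<|im_start|>", "<|im_end|>")


def truncate_at_stop(text: str, stops: Iterable[str] = STOP_SEQUENCES) -> str:
    """Hypothetical helper: cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


# Usage with the example above:
# response = truncate_at_stop(generate_response(system_input, user_input))
```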

BF16 inference with Intel Extension for Transformers and Intel Extension for PyTorch

from transformers import AutoTokenizer, TextStreamer
import torch
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
import intel_extension_for_pytorch as ipex

model_name = "Intel/neural-chat-7b-v3-3"
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model = ipex.optimize(model.eval(), dtype=torch.bfloat16, inplace=True, level="O1", auto_kernel_selection=True)

outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)

INT4 inference with Transformers and Intel Extension for Transformers

from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig
model_name = "Intel/neural-chat-7b-v3-3"

# for INT8, set weight_dtype="int8"
config = WeightOnlyQuantConfig(compute_dtype="bf16", weight_dtype="int4")
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=config)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
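
Weight-only quantization mainly shrinks the memory taken by the weights. Here is a back-of-the-envelope sketch of that effect, assuming roughly 7.24 billion parameters for the Mistral-7B base. The 8.5 and 4.5 effective bits per weight assume llama.cpp-style q8_0/q4_0 blocks (32 weights plus one fp16 scale per block). They apply to the Ollama tags, not to the ITREX INT4 format used above.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB (params * bits / 8 bytes).

    Ignores activations, KV cache, and runtime overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Approximate parameter count of Mistral-7B-v0.1 (assumption for illustration).
NUM_PARAMS = 7.24e9

# Effective bits per weight for each storage format.
for label, bits in [("fp16/bf16", 16), ("q8_0", 8.5), ("q4_0", 4.5), ("plain int4", 4)]:
    print(f"{label:>10}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the fp16, q8_0 and q4_0 estimates land close to the 14GB, 7.7GB and 4.1GB listed for the corresponding Ollama tags.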
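
Factors | Description
Groups | More details about the dataset and annotations can be found at meta-math/MetaMathQA, the project page https://meta-math.github.io/, and the associated paper https://arxiv.org/abs/2309.12284.
Instrumentation | The performance of the model can vary depending on the inputs to the model. In this case, the prompts provided can drastically change the predictions of the language model.
Environment | The model was trained on the Intel Gaudi 2 processor (8 cards).
Card Prompts | Deploying the model on different hardware and software will change model performance. The model evaluation factors are from the Hugging Face LLM Leaderboard: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K (see Quantitative Analyses below).

Metrics | Description
Model performance measures | The model performance was evaluated against other LLMs according to the measures on the LLM Leaderboard. These were selected because they have become the standard for measuring LLM performance.
Decision thresholds | No decision thresholds were used.
Approaches to uncertainty and variability | -

Training and Evaluation Data | Description
Datasets | The training data are from meta-math/MetaMathQA, which is augmented from the GSM8k and MATH training sets. There is no contamination from the GSM8k test set, as it was left out during training.
Motivation | -
Preprocessing | -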

Quantitative Analyses

Results from the Open LLM Leaderboard are available at https://huggingface.co/datasets/open-llm-leaderboard/details_Intel__neural-chat-7b-v3-3. The individual metrics are:

Metric | Value
Avg. | 69.83
ARC (25-shot) | 66.89
HellaSwag (10-shot) | 85.26
MMLU (5-shot) | 63.07
TruthfulQA (0-shot) | 63.01
Winogrande (5-shot) | 79.64
GSM8K (5-shot) | 61.11
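
The Avg. row is consistent with a plain unweighted mean of the six benchmark scores, as this quick check shows.

```python
# Leaderboard scores copied from the table.
scores = {
    "ARC (25-shot)": 66.89,
    "HellaSwag (10-shot)": 85.26,
    "MMLU (5-shot)": 63.07,
    "TruthfulQA (0-shot)": 63.01,
    "Winogrande (5-shot)": 79.64,
    "GSM8K (5-shot)": 61.11,
}

# Unweighted mean of the six benchmark scores.
average = sum(scores.values()) / len(scores)
print(f"Avg.: {average:.2f}")  # prints "Avg.: 69.83"
```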

Ethical Considerations and Limitations

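Neural-chat-7b-v3-3 can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the fine-tuning datasets, this model may generate lewd, biased, or otherwise offensive outputs.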

Therefore, before deploying any application that uses neural-chat-7b-v3-3, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

Here are a couple of useful links to learn more about Intel's AI software:

  • Intel Neural Compressor
  • Intel Extension for Transformers

Disclaimer

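The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.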

Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

Metric | Value
Avg. | 69.83
AI2 Reasoning Challenge (25-shot) | 66.89
HellaSwag (10-shot) | 85.26
MMLU (5-shot) | 63.07
TruthfulQA (0-shot) | 63.01
Winogrande (5-shot) | 79.64
GSM8k (5-shot) | 61.11