Welcome to BMInf's documentation!
Introduction
BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs). It has the following features:
Hardware Friendly. At its minimum requirement, BMInf can run models with more than 10 billion parameters on a single NVIDIA GTX 1060 GPU. Running on better GPUs yields better performance. Even when the GPU memory is large enough for direct inference (e.g., on a V100 or A100), BMInf still delivers a significant performance improvement over the existing PyTorch implementation.
Open. The parameters of models are open. Users can access large models locally with their own machines without applying or accessing an online API.
Comprehensive Ability. BMInf supports the generative model CPM1 [2], the general language model CPM2.1 [1], and the dialogue model EVA [3]. Together, these models cover text completion, text generation, and dialogue generation.
Upgraded Model. Based on CPM2 [1], the newly upgraded model CPM2.1 is currently supported. Thanks to continual learning, the text generation ability of CPM2.1 is greatly improved over CPM2.
Convenient Deployment. With BMInf, it is fast and convenient to develop interesting downstream applications.
Supported Models
BMInf currently supports these models:
CPM2.1. CPM2.1 is an upgraded version of CPM2 [1], a general Chinese pre-trained language model with 11 billion parameters. Based on CPM2, CPM2.1 introduces a generative pre-training task and was trained via the continual learning paradigm. In experiments, CPM2.1 shows better generation ability than CPM2.
CPM1. CPM1 [2] is a generative Chinese pre-trained language model with 2.6 billion parameters. The architecture of CPM1 is similar to that of GPT [4], and it can be used for various NLP tasks such as conversation, essay generation, cloze tests, and language understanding.
EVA. EVA [3] is a Chinese pre-trained dialogue model with 2.8 billion parameters. EVA performs well on many dialogue tasks, especially in multi-turn human-bot interactions.
Besides these models, we are now working on adding more PLMs, especially large-scale PLMs. We welcome contributors to add their models to this project by opening an issue.
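As a quick illustration of how these models are used, here is a minimal sketch modeled on the scripts in the `examples` folder. The constructor `bminf.models.CPM2` and the `fill_blank` call with these sampling parameters follow the pattern of `examples/fill_blank.py`, but the exact names and signatures may differ between BMInf versions, so consult that script for the definitive usage:

```python
import bminf

# Load CPM2.1; on first use the weights are downloaded and cached
# (by default under ~/.cache/bigmodels).
# NOTE: the constructor and method below mirror examples/fill_blank.py
# and may differ between BMInf versions.
cpm2 = bminf.models.CPM2()

# Ask the model to fill the blank marked by "<span>".
text = "今天天气很好,我们一起去<span>吧。"  # "The weather is nice today, let's go ... together."
for result in cpm2.fill_blank(text, top_p=1.0, top_n=5, temperature=0.5):
    print(result)
```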
Performances
Here we report the speeds of the CPM2 encoder and decoder measured on different platforms. You can also run `benchmark/cpm2/encoder.py` and `benchmark/cpm2/decoder.py` to test the speed on your machine!
| Implementation | GPU | Encoder Speed (tokens/s) | Decoder Speed (tokens/s) |
|---|---|---|---|
| BMInf | NVIDIA GeForce GTX 1060 | 718 | 4.4 |
| BMInf | NVIDIA GeForce GTX 1080Ti | 1200 | 12 |
| BMInf | NVIDIA GeForce RTX 2080Ti | 2275 | 19 |
| BMInf | NVIDIA Tesla V100 | 2966 | 20 |
| BMInf | NVIDIA Tesla A100 | 4365 | 26 |
| PyTorch | NVIDIA Tesla V100 | - | 3 |
| PyTorch | NVIDIA Tesla A100 | - | 7 |
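To reproduce numbers like these, the benchmark scripts above are the authoritative tool. For reference, the sketch below shows the general shape of such a tokens-per-second measurement; `step_fn` is a hypothetical placeholder for one decoding (or encoding) step of a loaded model, not a real BMInf API:

```python
import time

def measure_speed(step_fn, n_steps=64, warmup=8):
    """Rough tokens/s estimate for a callable that produces one token per call."""
    for _ in range(warmup):      # warm up GPU kernels and caches first
        step_fn()
    start = time.time()
    for _ in range(n_steps):
        step_fn()
    return n_steps / (time.time() - start)

# Usage (illustrative only): measure_speed(lambda: model.decode_step())
```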
Contributing
Here is the QR code of our WeChat user community, and we welcome contributions that follow our contributing guidelines.
License
The package is released under the Apache 2.0 License.
References
[1] CPM-2: Large-scale Cost-efficient Pre-trained Language Models. Zhengyan Zhang, Yuxian Gu, Xu Han, Shengqi Chen, Chaojun Xiao, Zhenbo Sun, Yuan Yao, Fanchao Qi, Jian Guan, Pei Ke, Yanzheng Cai, Guoyang Zeng, Zhixing Tan, Zhiyuan Liu, Minlie Huang, Wentao Han, Yang Liu, Xiaoyan Zhu, Maosong Sun.
[2] CPM: A Large-scale Generative Chinese Pre-trained Language Model. Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
[3] EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training. Hao Zhou, Pei Ke, Zheng Zhang, Yuxian Gu, Yinhe Zheng, Chujie Zheng, Yida Wang, Chen Henry Wu, Hao Sun, Xiaocong Yang, Bosi Wen, Xiaoyan Zhu, Minlie Huang, Jie Tang.
[4] Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever.
Installation
From pip (Recommended)
```
pip install bminf
```
From Source
```
git clone https://github.com/OpenBMB/BMInf.git
cd BMInf
python setup.py install
```
From Docker
```
docker run -it --gpus 1 -v $HOME/.cache/bigmodels:/root/.cache/bigmodels --rm openbmb/bminf python3 examples/fill_blank.py
```
After installation, you can run a script in the `examples` folder to verify that BMInf is installed correctly.

```
python examples/fill_blank.py
```
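For an even quicker check that avoids downloading model weights, importing the package is usually enough (the `__version__` attribute is an assumption and may not exist in every release):

```python
# Smoke test: a successful import confirms the package and its dependencies
# are installed; model weights are typically fetched only when a model
# object is first constructed.
import bminf

print(getattr(bminf, "__version__", "bminf imported successfully"))
```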
Hardware Requirement
Here we list the minimum and recommended configurations for running BMInf.
| | Minimum Configuration | Recommended Configuration |
|---|---|---|
| Memory | 16GB | 24GB |
| GPU | NVIDIA GeForce GTX 1060 6GB | NVIDIA Tesla V100 16GB |
| PCI-E | PCI-E 3.0 x16 | PCI-E 3.0 x16 |
GPUs with compute capability 6.1 or higher are supported by BMInf. Refer to NVIDIA's compute capability table to check whether your GPU is supported.
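If PyTorch happens to be installed (it is optional for BMInf itself; see the next section), you can query the compute capability directly:

```python
import torch

# BMInf supports GPUs with compute capability 6.1 or higher.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: {major}.{minor}")
print("Supported by BMInf:", (major, minor) >= (6, 1))
```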
Software Requirement
BMInf requires CUDA >= 10.1. All the dependencies below are installed automatically during the installation process:
python >= 3.6
requests
tqdm
jieba
numpy
cpm_kernels >= 1.0.9
If you want to use the backpropagation function with PyTorch, make sure `torch` is installed on your device.
About Us
BMInf is developed and maintained by OpenBMB (Open Lab for Big Model Base). OpenBMB was founded and is supported by the Beijing Academy of Artificial Intelligence (BAAI) and Tsinghua University.
The goal of OpenBMB is to build the model base and toolkits for large-scale pre-trained language models. We aim to accelerate the process of training, tuning, and inference for big models (with more than 10 billion parameters) and to lower the barriers to using them. On this basis, we further aim to build an open-source community that promotes the open-source ecosystem of pre-trained language models, builds the AI infrastructure, and defines the application paradigm of the intelligent era.
Our Team
Manager

Zhiyuan Liu

Members

Guoyang Zeng, Chao Jia, Huadong Wang, Xu Han, Ning Ding, Zhengyan Zhang, Yujia Qin, Xiangrong Yin
Contact Us
Join our WeChat community to contact us.
Demo Introduction
BMInf-Demos includes application examples built on the models supported by BMInf. These examples are:
Fill Blank. A use case based on CPM2.1. Given an arbitrary paragraph containing blanks, it generates appropriate content for each blank according to the context.
Generate Story. An example based on CPM1. You only need to write the beginning of a paragraph, and it creates a coherent essay for you.
Dialogue. An example based on the EVA model, with which you can talk freely with the machine.
Demonstration
Fill Blank

Generate Story

Dialogue

Install
Run the following command after installing nvidia-docker2:

```
$ docker run -it --gpus 1 -v $HOME/.cache/bigmodels:/root/.cache/bigmodels -p 0.0.0.0:8000:8000 --rm openbmb/bminf-demos
```
Open your browser and visit http://localhost:8000/ to access the demo.