Install the CUDA driver on the host server first.
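Preparing the image
Download this image (either with docker pull or by downloading the .tar file directly), and make sure the selected ARCH is amd64.
If you downloaded the .tar file, import the image with docker load -i.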
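Once the image has been downloaded, register the NVIDIA runtime with Docker. This requires sudo privileges: add the following to /etc/docker/daemon.json (create the file first if it does not exist), then restart the Docker daemon, for example with sudo systemctl restart docker, so that the change takes effect:

    {
      "runtimes": {
        "nvidia": {
          "path": "/usr/bin/nvidia-container-runtime",
          "runtimeArgs": []
        }
      }
    }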
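If daemon.json already exists with other settings, the runtime entry has to be merged into it rather than pasted over the file. A minimal sketch of that merge, assuming the file content has already been read as text; the helper name add_nvidia_runtime and the mirror URL in the demo are hypothetical and not part of Docker:

```python
import json

# The runtime entry from the daemon.json snippet above.
NVIDIA_RUNTIME = {
    "path": "/usr/bin/nvidia-container-runtime",
    "runtimeArgs": [],
}


def add_nvidia_runtime(daemon_json_text: str) -> str:
    """Return daemon.json content with the nvidia runtime registered.

    Hypothetical helper: existing keys (including other runtimes) are kept.
    An empty string is treated as a missing file.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    runtimes = config.setdefault("runtimes", {})
    runtimes["nvidia"] = dict(NVIDIA_RUNTIME)
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder existing configuration, for illustration only.
    existing = '{"registry-mirrors": ["https://mirror.example.com"]}'
    print(add_nvidia_runtime(existing))
```

Writing the result back to /etc/docker/daemon.json still needs root privileges, so this sketch only produces the text.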
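Creating the container
Run:

    docker run --restart=on-failure --runtime=nvidia --network host -it {{your image name}} /usr/bin/sh

Replace {{your image name}} with the name of the image, which docker image ls lists. The name must include the tag, that is, the REPOSITORY and TAG columns joined by a colon, for example:

    nvidia/cuda:11.7.0-cudnn8-devel-ubuntu20.04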
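The requirement that the image name carries a tag is easy to get wrong when the command is assembled by a script. A minimal sketch, assuming the same flags as the docker run command above; build_run_command is a hypothetical helper, not a Docker API:

```python
def build_run_command(image: str) -> list:
    """Build the docker run argument list used above for a given image.

    Hypothetical helper: raises ValueError when the image name has no tag.
    """
    # The tag follows the last ':' of the final path component, so a
    # registry port such as localhost:5000/cuda is not mistaken for a tag.
    last_component = image.rsplit("/", 1)[-1]
    if ":" not in last_component:
        raise ValueError(f"image name needs a tag: {image!r}")
    return [
        "docker", "run",
        "--restart=on-failure",
        "--runtime=nvidia",
        "--network", "host",
        "-it", image,
        "/usr/bin/sh",
    ]


if __name__ == "__main__":
    print(" ".join(build_run_command("nvidia/cuda:11.7.0-cudnn8-devel-ubuntu20.04")))
```

The returned list can be passed to subprocess.run unchanged, which avoids shell quoting issues.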
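Once inside the container, run:

    nvidia-smi

If it prints the usual table of GPU and driver information, the CUDA-capable Docker container has been created successfully.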
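The same check can be scripted rather than read off the table. The sketch below assumes nvidia-smi's CSV query mode (--query-gpu with --format=csv,noheader); the helper names are hypothetical, and the function simply returns an empty list when nvidia-smi is missing or fails:

```python
import shutil
import subprocess


def parse_gpu_csv(text: str) -> list:
    """Parse 'name, driver_version' CSV lines into (name, driver) tuples."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name is kept.
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, driver))
    return gpus


def visible_gpus() -> list:
    """Return the GPUs nvidia-smi reports, or [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    print(visible_gpus() or "no GPU visible in this container")
```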
Build preparation
Install the required software, after first switching the apt mirror:

    sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list

Then install the packages:

    apt update
    apt upgrade
    apt update
    apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential libedit-dev libxml2-dev libssl-dev unzip vim wget git
Installing cmake 3.20

    wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0.tar.gz
    tar -xvf cmake-3.20.0.tar.gz
    cd cmake-3.20.0
    ./configure
    make -j6
    make install
    cp bin/cmake /usr/bin/

When it finishes, run cmake -version to check that the installation succeeded.
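To make the version check explicit, the output can be parsed and compared against 3.20. This is a sketch that assumes the first line has the form "cmake version X.Y.Z"; the sample text is written by hand in that format, not captured from a real run, and parse_cmake_version is a hypothetical helper:

```python
import re


def parse_cmake_version(output: str) -> tuple:
    """Extract (major, minor, patch) from cmake's version output."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognised cmake version output")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    # Hand-written sample in the format cmake prints.
    sample = "cmake version 3.20.0\n"
    version = parse_cmake_version(sample)
    print(version, "ok" if version >= (3, 20, 0) else "too old")
```

Comparing tuples of integers avoids the string-comparison trap where "3.9" sorts after "3.20".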
Installing LLVM
Run:

    apt install lsb-release wget software-properties-common gnupg -y
    wget https://apt.llvm.org/llvm.sh
    chmod u+x llvm.sh
    ./llvm.sh 14 all
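Building TVM
Get the source code in one of two ways:
- Download it from GitHub:

      git clone --recursive https://github.com/apache/tvm tvm

- Download the .zip, then copy it from the local machine into the Docker container and unpack it there. On the local machine, run:

      docker cp ~/tvm.zip {{container id}}:/root/

  replacing {{container id}} with the container's ID, which docker ps -a lists. Then, inside the container, run:

      unzip tvm.zip

  This produces a tvm folder, which is then built as follows.
Note: download version v0.1.0, otherwise TVM may not run correctly.
From the directory that contains the tvm folder, prepare the build directory:

    cd tvm
    mkdir -p build
    cd build
    cp ../cmake/config.cmake .
    vim config.cmake  # edit the configuration here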
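Make the following changes:
- CUDA: change set(USE_CUDA OFF) to set(USE_CUDA ON) to enable the CUDA backend. To use another backend, such as OpenGL, enable the corresponding option.
- LLVM: change set(USE_LLVM OFF) to set(USE_LLVM ON).
- IR debugging: set set(USE_RELAY_DEBUG ON), and also set the TVM_LOG_DEBUG environment variable by adding this line to ~/.bashrc:

      export TVM_LOG_DEBUG="ir/transform.cc=1,relay/ir/transform.cc=1"

  Then reload the environment:

      source ~/.bashrc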
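The config.cmake edits listed above can also be applied by a script instead of by hand in vim. A minimal sketch, assuming the options appear as set(NAME OFF) lines as in the text; enable_options is a hypothetical helper and the sample content is illustrative:

```python
import re


def enable_options(config_text: str, options: list) -> str:
    """Switch each set(<OPTION> OFF) line in config.cmake text to ON.

    Hypothetical helper; options that are absent, commented out, or
    already ON are left alone.
    """
    for name in options:
        pattern = re.compile(
            r"^(\s*set\(\s*" + re.escape(name) + r"\s+)OFF(\s*\))",
            re.MULTILINE,
        )
        config_text = pattern.sub(r"\g<1>ON\g<2>", config_text)
    return config_text


if __name__ == "__main__":
    # Illustrative excerpt, not the full config.cmake.
    sample = "set(USE_CUDA OFF)\nset(USE_LLVM OFF)\nset(USE_RELAY_DEBUG OFF)\n"
    print(enable_options(sample, ["USE_CUDA", "USE_LLVM", "USE_RELAY_DEBUG"]))
```

Anchoring the pattern at the start of the line keeps commented-out examples in config.cmake untouched.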
Then, in the build folder, run:

    cmake -DCMAKE_BUILD_TYPE=Debug ..
    make -j8

and wait for libtvm.so to finish building.
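For the Python package, set the PYTHONPATH environment variable to tell Python where to find the library. For example, if tvm was cloned into /path/to/tvm, add the following lines to ~/.bashrc. Changes then take effect as soon as you pull the code and rebuild the project (there is no need to call setup again).

    export TVM_HOME=/path/to/tvm
    export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}

Then, in the tvm directory, run:

    cd python; python3 setup.py install --user; cd ..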
Bug
Installing scipy fails with an error that requires Python 3.9 or later, so some packages have to be installed manually:

    apt install -y python3-pip
    pip uninstall numpy
    pip install "numpy<=1.23" decorator attrs psutil 'xgboost<1.6.0' cloudpickle ml_dtypes scipy
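Since the error comes down to the interpreter version, it can help to check that version before installing anything. A small sketch, assuming the 3.9 threshold mentioned in the error message; meets_minimum is a hypothetical helper:

```python
import sys


def meets_minimum(version_info, minimum=(3, 9)) -> bool:
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum(sys.version_info):
        print("Python is 3.9 or newer")
    else:
        print("Python is older than 3.9; install the pinned packages manually")
```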
Enabling the C++ tests (do this in the ~ directory)

    git clone https://github.com/google/googletest
    cd googletest
    mkdir build
    cd build
    cmake -DBUILD_SHARED_LIBS=ON ..
    make -j6
    make install

Then, in the tvm directory, run:

    make cpptest -j6

If the build finishes without errors, it succeeded.
Testing
The test follows the "Compile PyTorch Models" tutorial.
Run:

    pip install torch==1.7.0
    pip install torchvision==0.8.1

Then create a file named torch_test.py with the following content:

    import time
    import tvm
    from tvm import relay
    import numpy as np
    from tvm.contrib.download import download_testdata
    import torch
    import torchvision
    from scipy.special import softmax

    # device = torch.device("cpu")
    model_name = "resnet18"
    model = getattr(torchvision.models, model_name)(pretrained=True)
    model = model.eval()

    # We grab the TorchScripted model via tracing
    input_shape = [1, 3, 224, 224]
    input_data = torch.randn(input_shape)
    scripted_model = torch.jit.trace(model, input_data).eval()

    from PIL import Image

    img_url = "https://github.com/dmlc/mxnet.js/blob/main/data/cat.png?raw=true"
    img_path = download_testdata(img_url, "cat.png", module="data")
    print(img_path)
    img = Image.open(img_path).resize((224, 224))

    # Preprocess the image and convert to tensor
    from torchvision import transforms

    my_preprocess = transforms.Compose(
        [
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ]
    )
    img = my_preprocess(img)
    img = np.expand_dims(img, 0)

    ######################################################################
    # Import the graph to Relay
    # -------------------------
    # Convert PyTorch graph to Relay graph. The input name can be arbitrary.
    input_name = "input0"
    shape_list = [(input_name, img.shape)]
    mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)

    ######################################################################
    # Relay Build
    # -----------
    # Compile the graph to llvm target with given input specification.
    target = "llvm"
    target_host = "llvm"
    dev = tvm.cpu(0)
    with tvm.transform.PassContext(opt_level=7):
        lib = relay.build(mod, target=target, target_host=target_host, params=params)

    ######################################################################
    # Execute the portable graph on TVM
    # ---------------------------------
    # Now we can try deploying the compiled model on target.
    from tvm.contrib import graph_executor

    m = graph_executor.GraphModule(lib["default"](dev))

    tvm_time_spent = []
    torch_time_spent = []
    n_warmup = 5
    n_time = 10
    # tvm_t0 = time.process_time()
    for i in range(n_warmup + n_time):
        dtype = "float32"
        # Set inputs
        m.set_input(input_name, tvm.nd.array(img.astype(dtype)))
        tvm_t0 = time.time()
        # Execute
        m.run()
        # Get outputs
        tvm_output = m.get_output(0)
        tvm_time_spent.append(time.time() - tvm_t0)
    # tvm_t1 = time.process_time()

    #####################################################################
    # Look up synset name
    # -------------------
    # Look up prediction top 1 index in 1000 class synset.
    synset_url = "".join(
        [
            "https://raw.githubusercontent.com/Cadene/",
            "pretrained-models.pytorch/master/data/",
            "imagenet_synsets.txt",
        ]
    )
    synset_name = "imagenet_synsets.txt"
    synset_path = download_testdata(synset_url, synset_name, module="data")
    with open(synset_path) as f:
        synsets = f.readlines()

    synsets = [x.strip() for x in synsets]
    splits = [line.split(" ") for line in synsets]
    key_to_classname = {spl[0]: " ".join(spl[1:]) for spl in splits}

    class_url = "".join(
        [
            "https://raw.githubusercontent.com/Cadene/",
            "pretrained-models.pytorch/master/data/",
            "imagenet_classes.txt",
        ]
    )
    class_name = "imagenet_classes.txt"
    class_path = download_testdata(class_url, class_name, module="data")
    with open(class_path) as f:
        class_id_to_key = f.readlines()

    class_id_to_key = [x.strip() for x in class_id_to_key]

    # Get top-1 result for TVM
    top1_tvm = np.argmax(tvm_output.asnumpy()[0])
    tvm_class_key = class_id_to_key[top1_tvm]

    # Convert input to PyTorch variable and get PyTorch result for comparison
    # torch_t0 = time.process_time()
    # torch.set_num_threads(1)
    for i in range(n_warmup + n_time):
        with torch.no_grad():
            torch_img = torch.from_numpy(img)
            torch_t0 = time.time()
            output = model(torch_img)
            torch_time_spent.append(time.time() - torch_t0)
            # Get top-1 result for PyTorch
            top1_torch = np.argmax(output.numpy())
            torch_class_key = class_id_to_key[top1_torch]
    # torch_t1 = time.process_time()

    # tvm_time = tvm_t1 - tvm_t0
    # torch_time = torch_t1 - torch_t0
    tvm_time = np.mean(tvm_time_spent[n_warmup:]) * 1000
    torch_time = np.mean(torch_time_spent[n_warmup:]) * 1000
    tvm_output_prob = softmax(tvm_output.asnumpy())
    output_prob = softmax(output.numpy())
    print("Relay top-1 id: {}, class name: {}, class probability: {}".format(top1_tvm, key_to_classname[tvm_class_key], tvm_output_prob[0][top1_tvm]))
    print("Torch top-1 id: {}, class name: {}, class probability: {}".format(top1_torch, key_to_classname[torch_class_key], output_prob[0][top1_torch]))
    print('Relay time(ms): {:.3f}'.format(tvm_time))
    print('Torch time(ms): {:.3f}'.format(torch_time))

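The script times TVM and PyTorch with the same pattern: run the workload n_warmup + n_time times, discard the warm-up runs, and average the rest in milliseconds. A stand-alone sketch of that pattern follows; note that it swaps in time.perf_counter for the script's time.time, and that mean_latency_ms and the lambda workload are hypothetical stand-ins for m.run() or model(torch_img):

```python
import time


def mean_latency_ms(fn, n_warmup=5, n_time=10) -> float:
    """Call fn n_warmup + n_time times and average the last n_time runs (ms)."""
    samples = []
    for _ in range(n_warmup + n_time):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # Drop the warm-up runs, as the script does with [n_warmup:].
    timed = samples[n_warmup:]
    return sum(timed) / len(timed) * 1000


if __name__ == "__main__":
    # Placeholder workload standing in for a model invocation.
    print("{:.3f} ms".format(mean_latency_ms(lambda: sum(range(100_000)))))
```

Discarding the first few runs keeps one-off costs such as lazy initialisation and cache warm-up out of the reported mean.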
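Note that the code in the tutorial starts with import set_env. set_env sets up a temporary Python environment for the current run; to use it, add the following function before running the code:

    def set_env(num, current_path='.'):
        '''
        num is the number of parent levels between current_path and the project root
        '''
        import sys
        from pathlib import Path

        ROOT = Path(current_path).resolve().parents[num]
        sys.path.extend([str(ROOT/'src')])  # set up the `tvm_book` environment
        from tvm_book.config.env import set_tvm
        # set up the TVM environment
        set_tvm(TVM_ROOT)

set_tvm (and the TVM_ROOT it receives, which this snippet does not define) has to be configured by yourself to match your machine.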
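The path handling inside set_env can be tried on its own, without tvm_book installed. The sketch below isolates the two steps, resolving an ancestor directory with parents[num] and appending its src folder to sys.path; project_src and add_to_sys_path are hypothetical names, and the directory in the demo is made up:

```python
import sys
from pathlib import Path


def project_src(num: int, current_path: str = ".") -> Path:
    """Return <ancestor>/src, where the ancestor is num + 1 levels above current_path."""
    root = Path(current_path).resolve().parents[num]
    return root / "src"


def add_to_sys_path(path: Path) -> None:
    """Append path to sys.path once so packages under it become importable."""
    if str(path) not in sys.path:
        sys.path.append(str(path))


if __name__ == "__main__":
    # /work/book/doc/chapter is a made-up location for illustration.
    src = project_src(1, "/work/book/doc/chapter")
    add_to_sys_path(src)
    print(src)
```

Because parents[0] is already the immediate parent, num=1 climbs two levels, which is easy to misjudge by one.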
With CUDA, the script is the same as the CPU version above except for the Relay build section, which switches the target to cuda and the device to the GPU:

    ######################################################################
    # Relay Build
    # -----------
    target = "cuda"
    target_host = "llvm"
    dev = tvm.gpu(0)
    with tvm.transform.PassContext(opt_level=7):
        lib = relay.build(mod, target=target, target_host=target_host, params=params)

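Then simply run the script.
Running it prints a lot of log output. I have not yet found a way to suppress it; it is probably caused by enabling USE_RELAY_DEBUG.
A screenshot of a successful run:
If a second run fails with an error such as
Module Not Found: "tvm" is not found
reinstall TVM's Python environment.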
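Before reinstalling, it may be worth checking whether Python can locate tvm at all and, if so, from where. A minimal diagnostic sketch using the standard library's importlib.util.find_spec; module_origin is a hypothetical helper name:

```python
import importlib.util


def module_origin(name: str):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    origin = module_origin("tvm")
    if origin is None:
        print("tvm is not importable; check PYTHONPATH or reinstall the package")
    else:
        print("tvm resolves to", origin)
```

If the reported location is not under the expected tvm/python directory, a stale copy elsewhere is probably shadowing the current build.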