I downloaded the official ascend_mindspore image and started a container with the following command:
docker run -it --name lkx_mindspore2 --ipc=host --network host --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/common -v /usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/driver/lib64/driver -v /etc/ascend_install.info:/etc/ascend_install.info -v /etc/vnpu.cfg:/etc/vnpu.cfg -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info -v /home/lkx:/home/lkx --entrypoint=/bin/bash 930poc:py310_cannRC3_mindieT65_torch_npu_dev20240929
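When I then start training inside the container, I get the result shown in the screenshot below:
Specifically, a series of warnings is printed and then training exits on its own. No error is reported at any point, so I have nothing to go on to track down the problem. The same environment and code train on the host without any issue. I have tried at least a dozen official ascend_mindspore images on Atlas 800T and Atlas 800I machines, and all of them behave this way.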
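Since the only difference between the two runs is the container, one cheap check is whether every device node and host path passed to `docker run` is actually visible from inside it. The following is a minimal sketch of that check, not a fix. The path list is copied from the `--device` and `-v` flags in the command above, and the helper name `find_missing` is hypothetical. A missing path would not by itself explain the silent exit, but it would narrow down where to look.

```python
import os

# Paths copied from the --device and -v flags of the docker run command above.
EXPECTED_PATHS = [
    "/dev/davinci0",
    "/dev/davinci1",
    "/dev/davinci2",
    "/dev/davinci3",
    "/dev/davinci_manager",
    "/dev/devmm_svm",
    "/dev/hisi_hdc",
    "/usr/local/dcmi",
    "/usr/local/bin/npu-smi",
    "/usr/local/Ascend/driver/lib64/common",
    "/usr/local/Ascend/driver/lib64/driver",
    "/etc/ascend_install.info",
    "/etc/vnpu.cfg",
    "/usr/local/Ascend/driver/version.info",
    "/home/lkx",
]


def find_missing(paths, exists=os.path.exists):
    """Return the paths that are not visible in the current environment.

    `exists` is injectable so the helper can be exercised without real devices.
    """
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_PATHS)
    if missing:
        print("Not visible in this environment:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected device nodes and mounts are visible.")
```

Running the script once on the host and once inside the container, then comparing the two outputs, shows whether any device or mount failed to come through.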