
Hi,

does anyone here have experience with https://github.com/openai/whisper? I tried to transcribe a 13:35 min MP3 on different hardware, with the following results:

I used the LARGE model:

1) AC922, GPU 1xV100: utilization (U) = 30 %, transcription time (TS) = 5:38 min

2) AC922, 32xP9 CPU: U = 90+ %, TS = 80 min

3) 1050, py311, pytorch-cpu 2.1.1, 8 CPUs: TS = 28 min, U = more or less 4.5 cores (only a few threads work)

4) 1050, py311, pytorch-cpu 2.1.1, 12 CPUs: TS = 25 min

In 4) I used:

export OPENBLAS_NUM_THREADS=12
export GOTO_NUM_THREADS=12
export OMP_NUM_THREADS=12

I found out that by increasing the number of threads (export OPENBLAS_NUM_THREADS, etc.) I can increase CPU utilization, but this does not shorten the transcription time; rather, it increases it.
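For reference, the same thread cap can also be set from inside Python via PyTorch itself (a minimal sketch; the count of 8 is illustrative, not a recommendation):

```python
import torch

# Equivalent to OMP_NUM_THREADS for torch's intra-op parallelism
# (matmuls, convolutions); must be called before the heavy work starts.
torch.set_num_threads(8)

print(torch.get_num_threads())  # -> 8
```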

Is it possible to get closer to the V100 with some tuning, or is this the best I can expect from the P10 with this quite large model?

Is it normal that with pytorch-cpu 2.1.2 py311_1 (rocketce) I get this message?

/data/miniconda3/lib/python3.11/site-packages/whisper/transcribe.py:126: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Error in cpuinfo: processor architecture is not supported in cpuinfo
Error in cpuinfo: processor architecture is not supported in cpuinfo
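As an aside: the FP16 warning appears because Whisper defaults to fp16=True and falls back to FP32 on CPU. It can be silenced by requesting FP32 explicitly, e.g. via the CLI (a sketch; the audio filename and thread count are illustrative):

```shell
# Run on CPU in FP32 explicitly and cap torch's CPU threads from the CLI
whisper audio.mp3 --model large --device cpu --fp16 False --threads 8
```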

Thanks,

Tomas



------------------------------
Tomas Kovacik
Rocket Forum Shared Account
------------------------------


Hi Tomas,

Thanks for reaching out.

We don't have a direct Whisper measurement for comparison at present.

In general, some tweaks are required to get the best performance out of a P10 LPAR.

Let me investigate and come back to you.

Could you please answer the following questions?

  • How was Whisper compiled for P9 and P10? Were any specific flags used?
  • What is the configuration of the P10 LPAR from a CPU perspective?



------------------------------
Suyog Jadhav
Rocket Internal - All Brands
------------------------------


Hi,

this is an example of how the libraries were installed in the GPU environment:

conda install pytorch=2.1.2=cuda12.2_py310_1
pip install llvmlite
conda install tiktoken=0.6.0
pip install numba
pip install -U openai-whisper

The other environments were installed similarly; the library and Python versions are visible in the listings:

#P10 GPU 

#conda list | egrep -w  "openai-whisper|numba|pytorch|llvmlite|tiktoken"

llvmlite                  0.43.0                   pypi_0    pypi
numba                     0.60.0                   pypi_0    pypi
openai-whisper            20231117                 pypi_0    pypi
pytorch                   2.1.2           cuda12.2_py310_1    https://ftp.osuosl.org/pub/open-ce/current
pytorch-base              2.1.2           cuda12.2_py310_pb4.21.12_7    https://ftp.osuosl.org/pub/open-ce/current
tiktoken                  0.6.0           py310ha2369f3_0    https://ftp.osuosl.org/pub/open-ce/current

python --version
Python 3.10.13


# P9 CPU
conda list | egrep -w  "openai-whisper|numba|pytorch|llvmlite|tiktoken"


llvmlite                  0.43.0                   pypi_0    pypi
numba                     0.60.0                   pypi_0    pypi
openai-whisper            20231117                 pypi_0    pypi
pytorch                   1.13.1          cpu_py310hc26b713_0
tiktoken                  0.6.0           py310ha2369f3_0    https://ftp.osuosl.org/pub/open-ce/current

python --version
Python 3.10.13

#P10 CPU MMA
(base) [root@sk06qmn50v ~]# conda list | egrep -w  "openai-whisper|numba|pytorch|llvmlite|tiktoken"

llvmlite                  0.43.0                   pypi_0    pypi
numba                     0.60.0                   pypi_0    pypi
openai-whisper            20240927                 pypi_0    pypi
pytorch-base              2.1.2           cpu_py311_pb4.21.12_7    rocketce
pytorch-cpu               2.1.2                   py311_1    rocketce
tiktoken                  0.6.0           py311ha2369f3_0    rocketce

python --version
Python 3.11.5

P10 1050 hardware: 72 cores, 40 activated
LPAR configuration was 8 or 12 CPUs in dedicated mode, 16 or 32 GB RAM

Thanks,
Tomas



------------------------------
Tomas Kovacik
Rocket Forum Shared Account
------------------------------