
HELP: "No matching distribution found for torch==2.1.0+cu121" error while installing maga_transformer from the .whl in release 0.2.0 #91

Open
HuXinjing opened this issue Jul 21, 2024 · 7 comments


@HuXinjing

I'm following the instructions in the Startup example. `pip3 install -r ./open_source/deps/requirements_torch_gpu_cuda12.txt` completed successfully, but when I installed maga_transformer-0.2.0+cuda121-cp310-cp310-manylinux1_x86_64.whl, an error occurred:
```
Processing ./maga_transformer-0.2.0+cuda121-cp310-cp310-manylinux1_x86_64.whl
Requirement already satisfied: filelock==3.13.1 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (3.13.1)
Requirement already satisfied: jinja2 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (3.1.4)
Requirement already satisfied: sympy in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.13.1)
Requirement already satisfied: typing-extensions in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (4.12.2)
Requirement already satisfied: importlib_metadata in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (8.0.0)
Requirement already satisfied: transformers==4.39.3 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (4.39.3)
Requirement already satisfied: sentencepiece==0.1.99 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.1.99)
Requirement already satisfied: fastapi==0.108.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.108.0)
Requirement already satisfied: uvicorn==0.21.1 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.21.1)
Requirement already satisfied: dacite in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.8.1)
Requirement already satisfied: pynvml in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (11.5.3)
Requirement already satisfied: thrift in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.20.0)
Requirement already satisfied: numpy==1.24.1 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.24.1)
Requirement already satisfied: psutil in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (6.0.0)
Requirement already satisfied: tiktoken==0.7.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.7.0)
Requirement already satisfied: lru-dict in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.3.0)
Requirement already satisfied: py-spy in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.3.14)
Requirement already satisfied: safetensors in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.4.3)
Requirement already satisfied: cpm_kernels in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.0.11)
Requirement already satisfied: pyodps in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.11.6.1)
Requirement already satisfied: Pillow in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (10.4.0)
Requirement already satisfied: protobuf==3.20.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (3.20.0)
Collecting torchvision==0.16.0 (from maga-transformer==0.2.0+cuda121)
  Using cached torchvision-0.16.0-cp310-cp310-manylinux1_x86_64.whl.metadata (6.6 kB)
Requirement already satisfied: einops in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.8.0)
Requirement already satisfied: prettytable in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (3.10.2)
Requirement already satisfied: pydantic==2.5.3 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (2.5.3)
Requirement already satisfied: timm==0.9.12 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.9.12)
Requirement already satisfied: sentence-transformers==2.7.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (2.7.0)
Requirement already satisfied: grpcio==1.62.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.62.0)
Collecting xfastertransformer_devel_icx==1.6.0.0 (from maga-transformer==0.2.0+cuda121)
  Using cached xfastertransformer_devel_icx-1.6.0.0-py3-none-any.whl.metadata (16 kB)
Requirement already satisfied: decord==0.6.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.6.0)
Requirement already satisfied: accelerate==0.25.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.25.0)
INFO: pip is looking at multiple versions of maga-transformer to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement torch==2.1.0+cu121 (from maga-transformer) (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1)
ERROR: No matching distribution found for torch==2.1.0+cu121
```
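A note on the error: wheels carrying a `+cu121` local version tag are published on the PyTorch wheel index, not on PyPI, which is why pip only offers the plain `2.x` versions listed above. A hedged sketch of the usual workaround (assuming the PyTorch index still hosts this build):

```shell
# Sketch: fetch the CUDA 12.1 builds from the PyTorch wheel index before
# installing the maga_transformer wheel. The "+cu121" local tag does not
# exist on PyPI, so a plain `pip install torch==2.1.0+cu121` cannot resolve.
pip3 install torch==2.1.0+cu121 torchvision==0.16.0+cu121 \
    --index-url https://download.pytorch.org/whl/cu121
```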

HuXinjing (Author) commented Jul 21, 2024

It seems I misunderstood the CUDA version maga_transformer needs: my nvcc version is CUDA 11.8, but the runtime is 12.2.

So, will maga_transformer-0.1.9+cuda118-cp310-cp310-manylinux1_x86_64.whl become available?

@HuXinjing (Author)

I have tried both the cuda11 and cuda12 images in Docker, but `sudo sh ./create_container.sh rtp registry.cn-hangzhou.aliyuncs.com/havenask/rtp_llm:deploy_image_cuda12` (and the cuda11 equivalent) failed with `docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]`. Do I need to install NVIDIA's container support before running this script?

@HuXinjing (Author)


I ran the `nvidia/cuda:12.4.1-runtime-ubuntu22.04` image with `nvidia-smi` and it worked.
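For reference, the `could not select device driver "" with capabilities: [[gpu]]` error usually means Docker cannot find the NVIDIA container runtime on the host. A hedged sketch of the usual fix on Ubuntu (package names assume NVIDIA's apt repository is already configured):

```shell
# Install the NVIDIA Container Toolkit so Docker can pass GPUs through.
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Register the runtime in /etc/docker/daemon.json and restart Docker.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Smoke test: nvidia-smi inside a CUDA base image should list the GPUs.
sudo docker run --rm --gpus all nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi
```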

HuXinjing (Author) commented Jul 21, 2024

But I got:

```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 82, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 76, in main
    return local_rank_start()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 35, in local_rank_start
    raise e
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 32, in local_rank_start
    app.start()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_app.py", line 38, in start
    self.inference_server.start()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_server.py", line 59, in start
    self._inference_worker = InferenceWorker()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_worker.py", line 55, in __init__
    self.model = ModelFactory.create_from_env()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 173, in create_from_env
    model = ModelFactory.from_model_config(normal_model_config, sp_model_config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 71, in from_model_config
    model = ModelFactory._create_model(model_config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 50, in _create_model
    model = model_cls.from_config(config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 169, in from_config
    return cls(config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 137, in __init__
    self.load()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 192, in load
    self._load_weights(self.config.ref_model, device)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 205, in _load_weights
    database = CkptDatabase(self.config.ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/database.py", line 196, in __init__
    ckpt.set_metadata(self._load_meta(ckpt.file_name))
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/database.py", line 237, in _load_meta
    with safe_open(file, framework="pt") as f_:
FileNotFoundError: No such file or directory: "/LLMs/Qwen/Qwen2-1.5B-Instruct-MLX/model.safetensors"
```

after performing

```
TOKENIZER_PATH=/LLMs/Qwen/Qwen2-1.5B-Instruct-MLX/ CHECKPOINT_PATH=/LLMs/Qwen/Qwen2-1.5B-Instruct-MLX/ MODEL_TYPE=qwen_2 FT_SERVER_TEST=1 python3 -m maga_transformer.start_server
```

where /LLMs/... is mounted from my host. The tokenizer and model.safetensors are symlinks; does that mean I need to copy all the weights into the container?
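The `FileNotFoundError` above is consistent with a dangling symlink: `huggingface_hub` stores snapshot files as symlinks into the `.cache/huggingface/` blob store, so bind-mounting only the model folder leaves the link targets missing inside the container. A small, self-contained demonstration of the failure mode (hypothetical paths, no Hugging Face code involved):

```python
import os
import tempfile

# Recreate the situation: a "model.safetensors" that is only a symlink
# to a blob stored elsewhere (as huggingface_hub does with its cache).
with tempfile.TemporaryDirectory() as model_dir:
    blob = os.path.join(model_dir, "blobs", "abc123")  # hypothetical blob name
    os.makedirs(os.path.dirname(blob))
    with open(blob, "w") as f:
        f.write("weights")

    link = os.path.join(model_dir, "model.safetensors")
    os.symlink(blob, link)
    print(os.path.exists(link))   # True: the link target is reachable

    # Simulate mounting the model folder without the cache: the blob is
    # gone, the symlink dangles, and opening it raises FileNotFoundError.
    os.remove(blob)
    print(os.path.exists(link))   # False: dangling symlink
```

So either the cache directory must be mounted alongside the model folder, or the links must be dereferenced into plain files first (e.g. `cp -rL`).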

@HuXinjing (Author)


I found that the files in my LLMs folder are symlinks, so I also mounted .cache/huggingface/ into the container and that solved it.

HuXinjing (Author) commented Jul 21, 2024

But I got another error:
[root][07/21/2024 08:51:36][start_server.py:local_rank_start():34][ERROR] start server error: module 'torch' has no attribute 'uint32', trace: multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 254, in _load_layer_weight
raise e
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 241, in _load_layer_weight
tensor = self._load_and_convert_tensor(weight, ref_model=ref_model, layer_id=layer_id)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 399, in _load_and_convert_tensor
before_merge_tensors.append(ckpt_weight.merge_fun(self.load_tensor(ckpt_weight.tensor_name(layer_id), datatype)))
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 379, in load_tensor
return self._database.load_tensor(name, datatype)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/database.py", line 252, in load_tensor
tensors.append(self._load(name, ckpt_file, datatype))
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/database.py", line 275, in _load
return f.get_tensor(name).to(datatype)
File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1833, in __getattr__
raise AttributeError(f"module '{__name__}' has no attribute '{name}'")
AttributeError: module 'torch' has no attribute 'uint32'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 32, in local_rank_start
app.start()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_app.py", line 38, in start
self.inference_server.start()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_server.py", line 59, in start
self._inference_worker = InferenceWorker()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_worker.py", line 55, in __init__
self.model = ModelFactory.create_from_env()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 173, in create_from_env
model = ModelFactory.from_model_config(normal_model_config, sp_model_config)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 71, in from_model_config
model = ModelFactory._create_model(model_config)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 50, in _create_model
model = model_cls.from_config(config)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 169, in from_config
return cls(config)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 137, in __init__
self.load()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 192, in load
self._load_weights(self.config.ref_model, device)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 213, in _load_weights
self.weight = model_weights_loader.load_weights_from_scratch(num_process=load_parallel_num)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 61, in load_weights_from_scratch
all_results = pool.starmap(
File "/usr/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
raise self._value
AttributeError: module 'torch' has no attribute 'uint32'

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 254, in _load_layer_weight
raise e
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 241, in _load_layer_weight
tensor = self._load_and_convert_tensor(weight, ref_model=ref_model, layer_id=layer_id)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 399, in _load_and_convert_tensor
before_merge_tensors.append(ckpt_weight.merge_fun(self.load_tensor(ckpt_weight.tensor_name(layer_id), datatype)))
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 379, in load_tensor
return self._database.load_tensor(name, datatype)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/database.py", line 252, in load_tensor
tensors.append(self._load(name, ckpt_file, datatype))
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/database.py", line 275, in _load
return f.get_tensor(name).to(datatype)
File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1833, in __getattr__
raise AttributeError(f"module '{__name__}' has no attribute '{name}'")
AttributeError: module 'torch' has no attribute 'uint32'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 82, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 76, in main
return local_rank_start()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 35, in local_rank_start
raise e
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 32, in local_rank_start
app.start()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_app.py", line 38, in start
self.inference_server.start()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_server.py", line 59, in start
self._inference_worker = InferenceWorker()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_worker.py", line 55, in __init__
self.model = ModelFactory.create_from_env()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 173, in create_from_env
model = ModelFactory.from_model_config(normal_model_config, sp_model_config)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 71, in from_model_config
model = ModelFactory._create_model(model_config)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 50, in _create_model
model = model_cls.from_config(config)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 169, in from_config
return cls(config)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 137, in __init__
self.load()
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 192, in load
self._load_weights(self.config.ref_model, device)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 213, in _load_weights
self.weight = model_weights_loader.load_weights_from_scratch(num_process=load_parallel_num)
File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 61, in load_weights_from_scratch
all_results = pool.starmap(
File "/usr/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
raise self._value
AttributeError: module 'torch' has no attribute 'uint32'

I don't think I can figure this one out on my own.
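A note on the `AttributeError`: `torch.uint32` only exists in newer PyTorch releases (it appeared around 2.3), while the container here ships an older torch; the `-MLX` checkpoint apparently stores tensors in a uint32-packed format that safetensors then asks torch to materialize. A torch-free sketch of probing for a dtype safely instead of letting the lookup raise (class and function names are illustrative):

```python
def dtype_or_none(module, name):
    """Return a dtype attribute if the library defines it, else None,
    instead of letting AttributeError escape as it does above."""
    return getattr(module, name, None)

# Stand-in for an older torch build that predates torch.uint32.
class OldTorch:
    float16 = "float16"

print(dtype_or_none(OldTorch, "float16"))  # float16
print(dtype_or_none(OldTorch, "uint32"))   # None -> unsupported dtype
```

When the probe returns `None`, the practical options are to upgrade torch or re-export the checkpoint in a dtype the installed torch supports (e.g. float16).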

@ySingularity (Collaborator)

Could you please check where the compute type is set to uint32?
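One way to check is to inspect the checkpoint itself rather than the loader: the safetensors header records each tensor's dtype, so a quick scan shows whether the file really contains `U32` tensors. A hedged sketch using only the stdlib (assumes the standard safetensors layout: an 8-byte little-endian header length followed by a JSON header):

```python
import json
import struct

def safetensors_dtypes(path):
    """Map tensor name -> dtype string (e.g. 'U32') from a .safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte LE header size
        header = json.loads(f.read(header_len))         # JSON tensor metadata
    return {name: meta["dtype"]
            for name, meta in header.items()
            if name != "__metadata__"}

# Usage sketch against the checkpoint from this thread:
# dtypes = safetensors_dtypes("/LLMs/Qwen/Qwen2-1.5B-Instruct-MLX/model.safetensors")
# print({k: v for k, v in dtypes.items() if v == "U32"})
```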
