Support intern-s1 #14875
name = name.replace(r".lambda_2", r".ls2")
name = name.replace(r".layernorm_before.", r".norm1.")
name = name.replace(r".layernorm_after.", r".norm2.")
return name
Use self.map_tensor_name(name) with a proper mapping.
This name remapping is only for Intern-S1, so I believe it is better to put the logic in a function that maps the names to the InternVLChat model weight names.
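A minimal sketch of the standalone remapping function described in this comment. Only the three substitutions come from the diff above; the helper name `remap_intern_s1_name`, the table name, and the example tensor name are hypothetical and not taken from the PR.

```python
# Hypothetical helper: collects the three renames shown in the diff into one
# function so the Intern-S1 specific logic lives in a single place.
_INTERN_S1_RENAMES = (
    (".lambda_2", ".ls2"),
    (".layernorm_before.", ".norm1."),
    (".layernorm_after.", ".norm2."),
)


def remap_intern_s1_name(name: str) -> str:
    """Rewrite an Intern-S1 style tensor name into the InternVL style."""
    for old, new in _INTERN_S1_RENAMES:
        name = name.replace(old, new)
    return name


# The tensor name below is made up for illustration.
example = remap_intern_s1_name("encoder.layer.0.layernorm_before.weight")
```

Names that contain none of the three substrings pass through unchanged, so the function can be applied to every tensor before the regular name mapping runs.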
Please at least add the vision_tower/vision_model mappings to gguf-py/gguf/tensor_mapping.py.
OK, will update later.
@CISC hi, could you tell me how to fix this error? It doesn't seem reasonable to me.
Running llama.cpp/convert_hf_to_gguf.py Lines 3002 to 3005 in 5eba3e3
@@ -2998,7 +2999,12 @@ def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iter
@ModelBase.register("InternVisionModel")
class InternVisionModel(MmprojModel):
    def set_gguf_parameters(self):
        if isinstance(self.hparams_vision['image_size'], list):
Suggested change
- if isinstance(self.hparams_vision['image_size'], list):
+ assert self.hparams_vision is not None
+ if isinstance(self.hparams_vision['image_size'], list):
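The suggestion narrows an optional attribute before it is subscripted, which is what a static type checker expects. The sketch below illustrates the pattern in isolation; the function name `image_size_from` is hypothetical, and taking the first element of a list-valued `image_size` is an assumption made only for illustration, since the diff does not show what the PR does in that branch.

```python
from typing import Any, Optional


def image_size_from(hparams_vision: Optional[dict[str, Any]]) -> int:
    """Read image_size from an optional hparams dict."""
    # Without this assert, a type checker reports that an Optional value
    # is being subscripted; at runtime it fails early with a clear error.
    assert hparams_vision is not None
    size = hparams_vision["image_size"]
    if isinstance(size, list):
        # Assumption for illustration only: use the first entry.
        size = size[0]
    return int(size)


from_list = image_size_from({"image_size": [448, 448]})
from_int = image_size_from({"image_size": 448})
```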
names_map = {
    "model.multi_modal_projector.layer_norm.bias": "mlp1.0.bias",
    "model.multi_modal_projector.layer_norm.weight": "mlp1.0.weight",
    "model.multi_modal_projector.linear_1.bias": "mlp1.1.bias",
    "model.multi_modal_projector.linear_1.weight": "mlp1.1.weight",
    "model.multi_modal_projector.linear_2.bias": "mlp1.3.bias",
    "model.multi_modal_projector.linear_2.weight": "mlp1.3.weight",
}
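As a rough illustration of how a lookup table like this can be applied, the sketch below renames the six projector tensors and leaves every other name untouched. The dictionary entries are copied from the diff; the function name `to_internvl_name` and the pass-through behaviour are assumptions, not code from the PR.

```python
# Entries copied from the diff; everything else in this block is a sketch.
NAMES_MAP = {
    "model.multi_modal_projector.layer_norm.bias": "mlp1.0.bias",
    "model.multi_modal_projector.layer_norm.weight": "mlp1.0.weight",
    "model.multi_modal_projector.linear_1.bias": "mlp1.1.bias",
    "model.multi_modal_projector.linear_1.weight": "mlp1.1.weight",
    "model.multi_modal_projector.linear_2.bias": "mlp1.3.bias",
    "model.multi_modal_projector.linear_2.weight": "mlp1.3.weight",
}


def to_internvl_name(name: str) -> str:
    """Return the InternVL style name, or the input if it is not listed."""
    return NAMES_MAP.get(name, name)


renamed = to_internvl_name("model.multi_modal_projector.linear_2.weight")
```

Using `dict.get` with the original name as the default keeps the remap limited to these six tensors, which matches the concern that a global mapping could affect conversion of other models.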
Hmm, OK, I think the mapping of these 6 tensors can't be added to tensor_mapping.py, as it would mess up conversion for other models. So it's OK to keep these 6 tensors here for now.
But one thing I'm not sure about: why is the mapped name you are using mlp1.%d.%s? I think it should be mm.model.mlp.%d.%s to match the original InternVL model.
It just maps the Intern-S1 weight names to the InternVL weight names:
llama.cpp/gguf-py/gguf/tensor_mapping.py
Line 1078 in 00131d6
"mlp1.{bid}", # InternVL
gguf-py/gguf/tensor_mapping.py
Outdated
@@ -1190,6 +1205,7 @@ class TensorNameMap:

MODEL_TENSOR.V_MM_INP_NORM: (
    "multi_modal_projector.norm",
    "model.multi_modal_projector.layer_norm", # Intern-S1
This should be removed, as the norm is already mapped to mlp.0. This is an artifact from InternVL: https://huggingface.co/OpenGVLab/InternVL3-8B-Instruct/blob/a34d3e4e129a5856abfd6aa6de79776484caa14e/modeling_internvl_chat.py#L79
"model.multi_modal_projector.layer_norm", # Intern-S1
Support internlm/Intern-S1