Question about the training pipeline of R_Transformer #98
Comments
Hi, the logits returned by self.trans_forward are not in the dimension of
(B, L, Vocab_size); self.output_project is a linear layer which
projects the logits into the discrete distribution space.
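If it helps, here is a minimal sketch of that projection step (my own illustration with assumed shapes and variable names, not the repo's exact code): the transformer body produces latent features, and a per-quantizer-layer weight matrix, selected by the active layer id, maps them to logits over the codebook.

import torch

# Minimal sketch (assumed shapes, not the repo's exact code): latent features from
# the transformer body are projected to logits over the codebook, using a different
# projection matrix for each residual quantizer layer.
B, code_dim, seq_len = 4, 512, 49
num_layers, vocab_size = 6, 512

feats = torch.randn(B, code_dim, seq_len)                    # output of the transformer body
proj_weight = torch.randn(num_layers, vocab_size, code_dim)  # one projection matrix per layer
qids = torch.randint(0, num_layers, (B,))                    # which layer each sample is predicting

W = proj_weight[qids]                                        # (B, vocab_size, code_dim)
logits = torch.einsum('bnc, bcs->bns', W, feats)             # (B, vocab_size, seq_len)
print(logits.shape)                                          # torch.Size([4, 512, 49])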
Thank you Dr. Guo, thanks for your reply!!!!
Thanks again and Happy New Year! I finally understand the projection thanks to your reply and the T2M-GPT pipeline. But I still cannot understand why we can share part of the embedding parameters.
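One way to see why the sharing is possible (this is my own reading of the quoted statement, not the authors' explanation): a matrix of shape (vocab_size, code_dim) can play both roles, an embedding table (row lookup by code index) and an output projection (matmul against a code_dim feature), so the head that predicts one layer's codes and the table that embeds those same codes can be tied, much like tied input/output embeddings in language models. A toy illustration:

import torch

# Toy illustration of the tying idea (my own sketch, not the MoMask code):
# one (vocab_size, code_dim) parameter serves as both embedding table and prediction head.
vocab_size, code_dim = 512, 512
shared = torch.nn.Parameter(torch.randn(vocab_size, code_dim) * 0.02)

codes = torch.randint(0, vocab_size, (4, 49))  # token indices of some quantizer layer
embeddings = shared[codes]                     # used as an embedding table: (4, 49, code_dim)

hidden = torch.randn(4, 49, code_dim)          # transformer features
logits = hidden @ shared.t()                   # used as a prediction head: (4, 49, vocab_size)
print(embeddings.shape, logits.shape)          # torch.Size([4, 49, 512]) torch.Size([4, 49, 512])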
Hello Dear Authors, thanks for sharing the code of MoMask! You're so great!!! But I have a question...
In the paper, it says: "All the tokens in the preceding layers are summed as the token embeddings...the residual transformer is trained to predict the j-th layer tokens...We also share the parameters of the j-th prediction layer and the (j + 1)-th motion token embedding layer"
In the code of class ResidualTransformer, in the function forward:
logits = self.trans_forward(history_sum, active_q_layers, cond_vector, ~non_pad_mask, force_mask) # 64,49,512
logits = self.output_project(logits, active_q_layers-1)
I can't understand: it looks like we have already got the j-th prediction (the first logits) through the sum of layers 0 to j-1, so what does self.output_project mean?
I know we use a new (motion_idx --> motion embedding) mapping in the function process_embed_proj_weight:
self.output_proj_weight = torch.cat([self.embed_proj_shared_weight, self.output_proj_weight_], dim=0)
self.token_embed_weight = torch.cat([self.token_embed_weight_, self.embed_proj_shared_weight], dim=0)
and in self.output_project:
output_proj_weight = self.output_proj_weight[qids]
output = torch.einsum('bnc, bcs->bns', output_proj_weight, logits)
I know we use the new mapping to represent the "qids", but what does the second line mean...
Any response will be appreciated !!!
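For reference, the second quoted line is just a batched matrix multiplication: for each sample b, it multiplies that sample's layer-specific projection output_proj_weight[qids][b] of shape (vocab_size, code_dim) with that sample's transformer features of shape (code_dim, seq_len), giving logits over the codebook. A quick check of the equivalence (the shapes here are my assumption, read off the einsum string rather than the repo):

import torch

# Quick check (assumed shapes): the quoted einsum is a per-sample matrix multiply,
# equivalent to torch.bmm, mapping (B, code_dim, seq_len) features to
# (B, vocab_size, seq_len) logits with a different projection matrix per sample.
B, vocab_size, code_dim, seq_len = 4, 512, 512, 49
output_proj_weight = torch.randn(B, vocab_size, code_dim)  # already indexed by qids
feats = torch.randn(B, code_dim, seq_len)                  # transformer output

out_einsum = torch.einsum('bnc, bcs->bns', output_proj_weight, feats)
out_bmm = torch.bmm(output_proj_weight, feats)

print(out_einsum.shape)                     # torch.Size([4, 512, 49])
print(torch.allclose(out_einsum, out_bmm))  # True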