Question about the training pipeline of R_Transformer #98

Open
1311894932 opened this issue Dec 29, 2024 · 3 comments
1311894932 commented Dec 29, 2024

Hello dear authors, thanks for sharing the code of MoMask! You're so great!!! But I have a question...

In the paper, it says: "All the tokens in the preceding layers are summed as the token embeddings... the residual transformer is trained to predict the j-th layer tokens... We also share the parameters of the j-th prediction layer and the (j + 1)-th motion token embedding layer."
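
Just to make sure I read that sentence right, here is my toy understanding of "summed as the token embeddings" (all names, shapes, and values below are mine for illustration, not the repo's actual code):

```python
import torch

# My toy reading of "all the tokens in the preceding layers are summed as the token embeddings".
B, T, D, V = 64, 49, 512, 512        # batch, sequence length, code dim, codebook size (all made up)
j = 3                                # the layer whose tokens the residual transformer should predict

token_embed = torch.randn(j, V, D)       # one embedding table per preceding layer (toy values)
codes = torch.randint(0, V, (j, B, T))   # ground-truth code indices for layers 0 .. j-1

# Sum the embeddings of all preceding layers -> the "history_sum" fed to the transformer,
# which is then asked to predict the layer-j code indices.
history_sum = sum(token_embed[k][codes[k]] for k in range(j))    # (B, T, D)
```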

In the code of class ResidualTransformer, in the forward function:
logits = self.trans_forward(history_sum, active_q_layers, cond_vector, ~non_pad_mask, force_mask) # 64,49,512
logits = self.output_project(logits, active_q_layers-1)
I can't understand this: it looks like we have already got the j-th prediction (the first logits) through the sum of the (0, ..., j-1)-th layers, so what does self.output_project mean?

I know we use a new (motion_idx --> motion embedding) mapping in the function process_embed_proj_weight:
self.output_proj_weight = torch.cat([self.embed_proj_shared_weight, self.output_proj_weight_], dim=0)
self.token_embed_weight = torch.cat([self.token_embed_weight_, self.embed_proj_shared_weight], dim=0)
and in self.output_project:
output_proj_weight = self.output_proj_weight[qids]
output = torch.einsum('bnc, bcs->bns', output_proj_weight, logits)
I know we use the new mapping to look up the "qids", but what does the second line mean?
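
To make my question concrete, here is a toy reproduction of the shapes involved (the numbers come from the comments in the snippets above; the tensors are random, and none of this is the repo's actual code):

```python
import torch

B, D, T, V = 64, 512, 49, 513       # batch, latent dim, sequence length, vocab size (from the comments)
Q = 6                               # total number of quantizer layers (my guess at the config)

features = torch.randn(B, D, T)                     # what reaches the einsum, per the 64,512,49 comment
                                                    # (so presumably permuted from the (B, T, D) output)
output_proj_weight = torch.randn(Q - 1, V, D)       # one (V, D) projection matrix per residual layer
qids = torch.randint(0, Q - 1, (B,))                # which residual layer each sample predicts

W = output_proj_weight[qids]                        # (B, V, D): the layer-specific matrix per sample
logits = torch.einsum('bnc,bcs->bns', W, features)  # (64,513,512) x (64,512,49) -> (64,513,49)

# For every time step, the 512-d feature is dotted against each of the 513 codebook rows,
# giving unnormalized scores over the vocabulary of that quantizer layer.
print(logits.shape)                                 # torch.Size([64, 513, 49])
```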

Any response will be appreciated!!!

@EricGuo5513
Owner

EricGuo5513 commented Dec 29, 2024 via email

@1311894932
Author

Thank you Dr. Guo, thanks for your reply!!!!

@1311894932
Author

1311894932 commented Dec 30, 2024

Hi, the logits returned by self.trans_forward are not in the dimension of (B, L, Vocab_size); self.output_project is a linear layer which projects them into the discrete distribution space.

Thanks again and Happy New Year! I finally understand the projection from your reply and the T2M-GPT pipeline. But I still cannot understand why we can share part of the embedding parameters:
self.output_proj_weight = torch.cat([self.embed_proj_shared_weight, self.output_proj_weight_], dim=0)
self.token_embed_weight = torch.cat([self.token_embed_weight_, self.embed_proj_shared_weight], dim=0)
In T2M-GPT, it uses two independent parameters:
self.tok_emb = nn.Embedding(num_vq + 2, embed_dim)
self.head = nn.Linear(embed_dim, num_vq + 1, bias=False)
My feeling is that maybe it is because they both represent the same motion? So there are 6 layers in total, and layers 2-5 are shared? But why can we use that representation to do:
output = torch.einsum('bnc, bcs->bns', output_proj_weight, logits) # 64,513,512 * 64,512,49 --> 64,513,49
I'm confused...
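
To make my confusion concrete, here is a toy version of the two cat lines (the sizes and layer counts are my guess; only the indexing matters):

```python
import torch

Q, V, D = 6, 513, 512                                   # quantizer layers, vocab size, code dim (guessed)

token_embed_weight_      = torch.randn(1, V, D)         # embedding used only by the first layer
output_proj_weight_      = torch.randn(1, V, D)         # projection used only by the last layer
embed_proj_shared_weight = torch.randn(Q - 2, V, D)     # the block shared by everything in between

# The two cat lines quoted above:
output_proj_weight = torch.cat([embed_proj_shared_weight, output_proj_weight_], dim=0)  # (Q-1, V, D)
token_embed_weight = torch.cat([token_embed_weight_, embed_proj_shared_weight], dim=0)  # (Q-1, V, D)

# Because of the ordering of the two cats, output_proj_weight[j] is exactly token_embed_weight[j + 1]:
for j in range(Q - 2):
    assert torch.equal(output_proj_weight[j], token_embed_weight[j + 1])
```

If I read this right, that identity is what the paper means by sharing the j-th prediction layer with the (j + 1)-th token embedding layer, and since the einsum just scores features against codebook rows, the output head for layer j and the input embedding for layer j + 1 seem to live in the same codebook space.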
