a-b-test-on-re-importing-tf-after-gpt-baseline #176

Open
david-thrower opened this issue Apr 13, 2025 · 0 comments

Comments

@david-thrower (Owner)

TLDR:

  • Deleting the duplicative import tensorflow as tf that ran after we train the baseline GPT model and before we train the Cerebros model appears to lower val_binary_accuracy on the Cerebros NLP model from 0.97 to 0.94.
  • We need to run 2 A/B tests re-applying this duplicative import, with and without the zero mask.
  • This issue tracks the test without the zero mask and with the duplicative import re-applied.

Details:

For some reason, ever since we deleted the re-import of tensorflow between the GPT baseline test and the Cerebros NLP trial it is compared to, all Cerebros runs without the re-import are getting a lower val_binary_accuracy in this configuration: 0.94 without re-importing tensorflow, versus 0.97 on 2 of 3 trials with the re-import and 0.95 on the 3rd such trial. With 3 trials after deleting the re-import (2 without the zero mask on the embedding and one with it) all landing at about 0.94 val_binary_accuracy, this is probably not a spurious finding, though it could be.

There are several plausible reasons:

1. TensorFlow Session or Graph Reset

Re-importing TensorFlow between the two training tasks might reset the TensorFlow session or graph, potentially clearing any accumulated state or variables from the first model. This could lead to a "clean slate" for the second model, allowing it to train more effectively.

  • When TensorFlow is imported, it creates a default graph. If the first model is built on this graph and not properly cleared, it might interfere with the second model's construction.
  • Re-importing TensorFlow could reset the graph, eliminating any potential conflicts or memory leaks.
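If graph state is the culprit, a cleaner test than re-importing the module is to reset Keras's global state explicitly between the two training runs. A minimal sketch (the toy Sequential model below is a hypothetical stand-in for the real GPT baseline):

```python
import tensorflow as tf

# Hypothetical stand-in for the GPT baseline training step; the real
# baseline is a GPT model, not this toy network.
baseline = tf.keras.Sequential([tf.keras.layers.Dense(1)])
baseline.compile(optimizer="adam", loss="mse")

# Explicitly reset Keras's global state instead of relying on a re-import.
# This clears the default graph state and the layer-name counters
# accumulated while building the baseline model.
tf.keras.backend.clear_session()

# The Cerebros model would then be constructed on a clean slate here.
```

Comparing a trial that calls clear_session() against one that re-imports tensorflow would isolate whether graph/session reset is the mechanism behind the accuracy gap.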

2. Memory Management and Garbage Collection

Re-importing TensorFlow might trigger a more thorough garbage collection or memory cleanup, which could help alleviate memory constraints.

  • The first model's memory allocation might not be fully released until the TensorFlow module is re-imported, potentially reducing memory fragmentation or other issues that could affect the second model's performance.
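To test the memory-management hypothesis without a re-import, the baseline model's Python references can be dropped and a collection forced explicitly. A sketch (again using a toy model as a stand-in):

```python
import gc

import tensorflow as tf

# Hypothetical stand-in for the trained baseline model.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# Drop the Python reference, clear Keras's global state, then force a
# garbage-collection pass so the baseline's objects are actually freed
# before the second model is built.
del model
tf.keras.backend.clear_session()
collected = gc.collect()  # number of unreachable objects collected
```

If this combination reproduces the 0.97 result, the effect is about freed state rather than the import itself.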

3. Random Seed and Initialization

The initialization of TensorFlow or its components might be affected by the re-import.

  • If the random seed is not explicitly set, re-importing TensorFlow could result in a different initialization, potentially influencing the model's performance.
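The seed hypothesis is easy to rule in or out by pinning all three RNGs (Python, NumPy, TensorFlow) before each trial, which makes initialization identical regardless of whether tensorflow was re-imported. The seed value 42 below is arbitrary:

```python
import tensorflow as tf

# Pin Python, NumPy, and TensorFlow seeds in one call so every trial
# starts from the same initialization.
tf.keras.utils.set_random_seed(42)
a = tf.random.uniform((3,))

# Re-pinning the same seed reproduces the same draw.
tf.keras.utils.set_random_seed(42)
b = tf.random.uniform((3,))
```

With seeds pinned this way, any remaining accuracy difference between the re-import and no-re-import trials cannot be attributed to initialization randomness.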

4. Optimizer Initialization

The optimizer or variable initialization might be affected by the re-import, for example if optimizer state (such as Adam's moment estimates) or variable initializers carry over between the two training runs.
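One way to control for this is to make sure each model gets its own freshly constructed optimizer instance rather than sharing one. A hypothetical sketch (toy models stand in for the GPT baseline and the Cerebros model):

```python
import tensorflow as tf

def build_and_compile():
    # Each call constructs a brand-new model AND a brand-new optimizer,
    # so the second model cannot inherit the first optimizer's slot
    # variables (e.g. Adam's moment estimates).
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="mse",
    )
    return model

baseline = build_and_compile()
cerebros_like = build_and_compile()
```

If the two configurations still diverge with independent optimizer instances, shared optimizer state can be eliminated as the cause.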
