CLIP finetune: SAE-informed adversarial training 💥🤖💫

  • ⚠️ This is EXPERIMENTAL code / a repo for messing with CLIP + Sparse Autoencoders (SAE)
  • For 'good, known-working' code (and more scripts + info), please see zer0int/CLIP-fine-tune!

Changes 19/DEC/2024:


🔨

  • Contains the code used to fine-tune my model HF: zer0int/CLIP-SAE-ViT-L-14 🤗
  • See the "attack" folder to obtain datasets required / used in 'a1-finetune.py'
  • Gradients will be very large throughout training. Comment out 'monitor_gradient_norms' as needed
  • Use a2 to convert the GmP model back to .weight after fine-tuning -> a normal CLIP model (usable in any 'import clip' downstream task)
  • Use a4 to quickly zero-shot test the 3 typographic attack test images provided (a minimal zero-shot sketch follows this list)
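For orientation, here is a minimal zero-shot sketch in the spirit of a4, using the standard 'import clip' API on a converted (plain .weight) model. The checkpoint path, image file name, and label prompts are placeholders, not the repo's actual files:

```python
# Hedged sketch: zero-shot check of one typographic-attack image with OpenAI's clip package.
# Paths and labels below are placeholders -- substitute your own.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)
# Optionally load the converted fine-tuned state dict (hypothetical path):
# model.load_state_dict(torch.load("clip-sae-ft-converted.pt", map_location=device))

labels = ["a photo of an apple", "a photo of a handwritten note", "a photo of an iPod"]
image = preprocess(Image.open("typographic_attack_test.png")).unsqueeze(0).to(device)
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)          # similarity logits, shape [1, len(labels)]
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

for label, p in zip(labels, probs):
    print(f"{p:.3f}  {label}")
```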

🔎

  • The attack dataset was curated via the SAE (see the retrieval sketch after this list)
  • Selected for typographic attack salience (i.e. CLIP's 'text obsession': the model misclassifies an image because the text in it is highly salient to the model)
  • Fine-tune: Geometric Parametrization (GmP) + scaling of 'text-salient' neurons' top-stimulating images (via SAE)
  • For details about GmP, see my other repo: zer0int/CLIP-fine-tune
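This is not the repo's curation code, only a rough sketch of the idea: rank images by how strongly they activate a chosen SAE latent, and keep the top hits as candidate 'text-salient' attack images. `sae.encode`, `clip_image_embeds`, `image_paths`, and `latent_idx` are all assumed/hypothetical names:

```python
# Hedged sketch of SAE-based dataset curation (NOT the repo's actual script).
# Assumes a trained SAE with an encode() that maps CLIP embeddings to latent activations.
import torch

@torch.no_grad()
def top_stimulating_images(sae, clip_image_embeds, image_paths, latent_idx, k=32):
    # clip_image_embeds: [N, d] precomputed CLIP image embeddings (one row per image)
    latents = sae.encode(clip_image_embeds)    # [N, n_latents] sparse activations
    scores = latents[:, latent_idx]            # activation of the chosen 'text-salient' latent
    top = torch.topk(scores, k=min(k, scores.numel()))
    return [(image_paths[i], scores[i].item()) for i in top.indices.tolist()]
```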

🔬

  • Info: Toy Models of Superposition | Perturbing a single feature
  • Reasoning: brute-force snap those geometric bonds, hoping to force the CLIP model to find a better (less text-obsessed) solution 😅
  • ...At least until I learn / find out what I am actually doing here (with regard to Sparse Autoencoders). =)
  • Sparse Autoencoder inspiration:
  • Anthropic.AI research "Golden Gate Claude" + SAE details
  • OpenAI: Top-K activation function (replaces ReLU in Sparse Autoencoders), arxiv

💡❓

  • My SAE: Encoder-Decoder, tied weights + Top-K (puzzled together from the above; see the minimal sketch after this list)
  • Is this a good autoencoder for CLIP? I don't know. 🤔
  • Small hidden dimension + low Top-K => very sparse -> learns concepts from CLIP that (with SAE-reconstructed embeds) retrieve images of very narrow concepts, e.g. ONLY stop signs.
  • Huge hidden dimension (e.g. 8192) -> not so sparse, accuracy drops, and more (seemingly) random encoded concepts (judging by image retrieval)
  • Intermediate -> learns complex, surprising, but meaningful concepts that are 'totally an AI-thing to encode'
  • So: the SAE is empirically shown to be 'working', but is it good? What is BEST? 🤔
  • Should I be using projection? Going 'back up' in the model with pinv? Hooking into the residual stream? I don't (yet) know! 🤷
  • I will publish the code for the SAE once I am more confident that I know what I am actually doing (and have cleaned up the mess of code 😂).
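Until then, here is only a minimal sketch of the recipe described above (encoder-decoder with tied weights plus a Top-K activation), NOT the repo's implementation; the dimensions, initialization, and bias handling are assumptions:

```python
# Minimal sketch of a tied-weight, Top-K sparse autoencoder (NOT the repo's code).
# d_model / n_latents / k are illustrative values, not the settings actually used.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKSAE(nn.Module):
    def __init__(self, d_model=768, n_latents=4096, k=64):
        super().__init__()
        self.k = k
        self.W = nn.Parameter(torch.randn(n_latents, d_model) * 0.02)  # tied: used for encode and decode
        self.b_enc = nn.Parameter(torch.zeros(n_latents))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x):
        # Pre-activations, then keep only the k largest latents per sample (Top-K).
        pre = F.linear(x - self.b_dec, self.W, self.b_enc)        # [B, n_latents]
        topk = torch.topk(pre, self.k, dim=-1)
        return torch.zeros_like(pre).scatter_(-1, topk.indices, topk.values)

    def decode(self, z):
        # Tied decoder: transpose of the encoder weight.
        return F.linear(z, self.W.t()) + self.b_dec               # [B, d_model]

    def forward(self, x):
        z = self.encode(x)
        return self.decode(z), z

# Usage sketch: reconstruct a batch of CLIP embeddings and measure the MSE.
# sae = TopKSAE(); x_hat, z = sae(clip_embeds); loss = F.mse_loss(x_hat, clip_embeds)
```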

🤪 For now, here's a fun concept of "things on the back of other things" in CLIP ViT-L/14 that the SAE learned:


An example of the effect of the images the SAE had chosen as salient typographic attacks for CLIP:


And zero-shot results via script a4:

