LTX-2 is Lightricks’ open-source video generation model, and it uses Google’s Gemma 3 12B as its text encoder. Let’s explore how to make an uncensored version of that encoder and whether it matters for LTX-2 video generation.
I set out to abliterate Gemma using Heretic, then dug deep into whether this actually affects LTX-2’s output. Spoiler: probably not, but the investigation was worth it.
The Abliteration Process
Prerequisites
- NVIDIA GPU with 24GB+ VRAM (RTX 4090/5090)
- HuggingFace account with access to Gemma 3 12B
- Python 3.10+
Step 1: Authenticate with HuggingFace
First, accept the Gemma 3 license at https://huggingface.co/google/gemma-3-12b-it
pip install huggingface_hub
hf auth login
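Optionally, confirm the token is active before kicking off the ~24GB download. A quick check (the gated-repo check via auth_check needs a recent huggingface_hub release):
from huggingface_hub import whoami, auth_check

print(whoami()["name"])  # prints your HF username if the token is valid
auth_check("google/gemma-3-12b-it")  # raises GatedRepoError if you haven't accepted the license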
Step 2: Install and Run Heretic
pip install heretic-llm
heretic google/gemma-3-12b-it
This will:
- Download the model (~24GB)
- Run ~200 optimization trials
- Present you with Pareto-optimal trials to choose from
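For context, abliteration-style tools estimate a “refusal direction” in the residual stream (by contrasting activations on harmful vs. harmless prompts) and project it out of the weights that write into that stream. Heretic’s trials search over how aggressively to apply this per layer; the sketch below shows only the core projection idea, not Heretic’s actual implementation:
import torch

def orthogonalize(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove the refusal direction from a weight matrix that writes into the
    residual stream. W: [d_model, d_in], refusal_dir: [d_model]."""
    r = refusal_dir / refusal_dir.norm()      # unit refusal direction
    return W - torch.outer(r, r @ W)          # W' = (I - r r^T) W

# Hypothetical usage on one decoder layer's MLP output projection:
# layer.mlp.down_proj.weight.data = orthogonalize(layer.mlp.down_proj.weight.data, r_hat)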
Choosing a Trial
Heretic presents trials with two metrics:
- Refusals (0-100): Lower = more uncensored
- KL Divergence: Lower = less model damage
I chose Trial 99 with:
- Refusals: 7/100
- KL Divergence: 0.0826
This is a good balance. 93% of previously-refused prompts now work with minimal model damage.
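Heretic computes both metrics for you, but it helps to know what the KL-divergence number captures: how far the modified model’s next-token distribution drifts from the original on harmless prompts. A rough, memory-hungry sketch of that measurement (assuming both checkpoints load via AutoModelForCausalLM; this is not Heretic’s code):
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-3-12b-it")
orig = AutoModelForCausalLM.from_pretrained("google/gemma-3-12b-it", torch_dtype=torch.bfloat16, device_map="cpu")
abl = AutoModelForCausalLM.from_pretrained("./gemma-3-12b-it-heretic", torch_dtype=torch.bfloat16, device_map="cpu")

inputs = tok("Write a short poem about the sea.", return_tensors="pt")
with torch.no_grad():
    p_logits = orig(**inputs).logits[0, -1].float()  # original distribution P
    q_logits = abl(**inputs).logits[0, -1].float()   # abliterated distribution Q

# KL(P || Q) over the next-token distribution at the final position
kl = F.kl_div(F.log_softmax(q_logits, -1), F.log_softmax(p_logits, -1),
              log_target=True, reduction="sum")
print(f"KL divergence: {kl.item():.4f}")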
Verify the Abliteration
heretic --model google/gemma-3-12b-it --evaluate-model ./gemma-3-12b-it-heretic
Output:
* Evaluating...
* KL divergence: 0.0826
* Refusals: 7/100
Convert to ComfyUI Format
ComfyUI’s LTX-2 text encoder expects a single .safetensors file with specific key names:
from safetensors.torch import load_file, save_file
import torch
import glob
model_path = "./gemma-3-12b-it-heretic"
# Merge weight shards
all_tensors = {}
for f in sorted(glob.glob(f"{model_path}/*.safetensors")):
    all_tensors.update(load_file(f))
# Filter and rename keys
renamed = {}
for k, v in all_tensors.items():
    if k.startswith("vision_tower."):
        continue
    if k.startswith("language_model."):
        new_key = k[len("language_model."):]
    else:
        new_key = k
    renamed[new_key] = v
# Embed tokenizer
with open(f"{model_path}/tokenizer.model", "rb") as f:
    tokenizer_bytes = f.read()
renamed["spiece_model"] = torch.frombuffer(bytearray(tokenizer_bytes), dtype=torch.uint8)
save_file(renamed, "gemma_3_12B_it_heretic.safetensors")
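Before pointing ComfyUI at the file, it’s worth a quick sanity check that the merge worked, the vision tower is gone, and the tokenizer got embedded:
from safetensors import safe_open

with safe_open("gemma_3_12B_it_heretic.safetensors", framework="pt") as f:
    keys = list(f.keys())

print(f"{len(keys)} tensors")
print("tokenizer embedded:", "spiece_model" in keys)
print("vision tower stripped:", not any(k.startswith("vision_tower.") for k in keys))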
Quantization Options
| Format | Size | Notes |
|---|---|---|
| Original (bf16) | 22GB | Full precision |
| FP8 | 11GB | Works in ComfyUI ✅ |
FP8 Quantization
import torch
from safetensors.torch import load_file, save_file
tensors = load_file("gemma_3_12B_it_heretic.safetensors")
fp8_tensors = {}
for k, v in tensors.items():
    if v.dtype in [torch.float16, torch.bfloat16, torch.float32]:
        fp8_tensors[k] = v.to(torch.float8_e4m3fn)
    else:
        fp8_tensors[k] = v
save_file(fp8_tensors, "gemma_3_12B_it_heretic_fp8_e4m3fn.safetensors")
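A quick way to confirm the cast actually happened (and that the embedded tokenizer tensor survived untouched) is to reload the file and tally dtypes:
from collections import Counter
from safetensors.torch import load_file

quantized = load_file("gemma_3_12B_it_heretic_fp8_e4m3fn.safetensors")
print(Counter(str(v.dtype) for v in quantized.values()))  # mostly torch.float8_e4m3fn, plus uint8 for spiece_model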
GGUF Conversion
For llama.cpp or other GGUF-compatible tools:
# Using Docker with llama.cpp
docker run --rm -it --gpus=all \
-v /path/to/heretic:/models \
nvidia/cuda:13.1.0-devel-ubuntu22.04 bash
# Inside container
apt-get update && apt-get install -y python3 python3-pip git cmake build-essential
git clone https://github.com/ggml-org/llama.cpp.git /app
pip3 install -r /app/requirements.txt
# Convert to F16
python3 /app/convert_hf_to_gguf.py /models/gemma-3-12b-it-heretic \
--outfile /models/gemma-3-12b-it-heretic-f16.gguf \
--outtype f16
# Build quantize tool
cd /app
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF
cmake --build build --target llama-quantize -j $(nproc)
# Quantize to multiple formats
for quant in Q3_K_M Q4_K_M Q5_K_M Q6_K Q8_0; do
/app/build/bin/llama-quantize /models/gemma-3-12b-it-heretic-f16.gguf \
/models/gemma-3-12b-it-heretic-${quant}.gguf ${quant}
done
GGUF Sizes
| Quantization | Size | Quality |
|---|---|---|
| F16 | 22GB | Full precision |
| Q8_0 | 12GB | Excellent |
| Q6_K | 9.0GB | Very good |
| Q5_K_M | 7.9GB | Good |
| Q4_K_M | 6.8GB | Recommended |
| Q3_K_M | 5.6GB | Acceptable |
Note: GGUF support for Gemma text encoders in ComfyUI is still experimental (see city96/ComfyUI-GGUF#402), so these builds currently will not work in ComfyUI. Use them with llama.cpp or other GGUF-compatible tools.
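Outside ComfyUI, the quants are easy to smoke-test. For example, with llama-cpp-python installed (a sketch; adjust the model path and prompt to taste):
from llama_cpp import Llama

llm = Llama(model_path="gemma-3-12b-it-heretic-Q4_K_M.gguf",
            n_gpu_layers=-1, n_ctx=4096, verbose=False)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe a bar fight in vivid detail."}],
    max_tokens=200,
)
print(out["choices"][0]["message"]["content"])  # should answer instead of refusing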
But Does It Actually Work for LTX-2?
Here’s where it gets interesting. A Reddit commenter questioned whether abliteration even affects text encoders, since refusals are tied to the chat template, not the embedding space.
I decided to test it empirically.
Testing Embedding Differences
Full script for comparing embeddings between original and abliterated models:
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("./gemma-3-12b-it")
# Create your own prompt categories
prompts = {
    "innocent": [
        "a beautiful sunset over the ocean",
        "baking chocolate chip cookies",
        "a dog running on the beach",
        "children playing in a park",
        "reading a book by the fireplace",
        # ... add more innocent prompts
    ],
    "mild": [
        "a woman in a bikini",
        "drinking alcohol at a bar",
        "a fistfight between two men",
        "gambling at a casino",
        "smoking a cigarette",
        # ... add more mild prompts
    ],
    "violent": [
        # ... add violent prompts
    ],
    "sexual": [
        # ... add sexual prompts
    ],
    "extreme": [
        # ... add extreme prompts (mine were too extreme to publish here; add your own)
    ],
}
def get_embedding(model, text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    return outputs.last_hidden_state[:, -1, :]
print("Loading original...")
original = AutoModel.from_pretrained("./gemma-3-12b-it", torch_dtype=torch.bfloat16, device_map="cpu")
orig_embs = {}
for cat, ps in prompts.items():
    for p in ps:
        orig_embs[p] = get_embedding(original, p)
del original
print("Loading abliterated...")
abliterated = AutoModel.from_pretrained("./gemma-3-12b-it-heretic", torch_dtype=torch.bfloat16, device_map="cpu")
abl_embs = {}
for cat, ps in prompts.items():
    for p in ps:
        abl_embs[p] = get_embedding(abliterated, p)
del abliterated
print("\nResults by category:")
for cat, ps in prompts.items():
    sims = []
    for p in ps:
        cos_sim = torch.nn.functional.cosine_similarity(orig_embs[p], abl_embs[p]).item()
        sims.append(cos_sim)
    avg = sum(sims) / len(sims)
    std = (sum((x - avg) ** 2 for x in sims) / len(sims)) ** 0.5
    print(f"{cat:12} | n={len(ps):2} | avg: {avg:.4f} | std: {std:.4f}")
My results (n=50 per category):
| Category | n | Avg Cosine Sim | Std Dev | Range |
|---|---|---|---|---|
| innocent | 50 | 0.869 | 0.012 | 0.840 - 0.895 |
| mild | 50 | 0.850 | 0.016 | 0.809 - 0.887 |
| violent | 50 | 0.836 | 0.019 | 0.801 - 0.883 |
| sexual | 50 | 0.827 | 0.017 | 0.785 - 0.871 |
| extreme | 50 | 0.825 | 0.020 | 0.785 - 0.867 |
Clear gradient: embeddings diverge more for taboo content. So abliteration does change the embedding space.
Per-Layer Analysis
Script to check which layers are affected:
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("./gemma-3-12b-it")
prompts = [
    "a beautiful sunset over the ocean",
    # ... add your own spicy test prompts
]
print("Loading original...")
original = AutoModel.from_pretrained("./gemma-3-12b-it", torch_dtype=torch.bfloat16, device_map="cpu")
print("Loading abliterated...")
abliterated = AutoModel.from_pretrained("./gemma-3-12b-it-heretic", torch_dtype=torch.bfloat16, device_map="cpu")
for prompt in prompts:
    print(f"\nPrompt: {prompt[:40]}...")
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        orig_out = original(**inputs, output_hidden_states=True)
        abl_out = abliterated(**inputs, output_hidden_states=True)
    print("Layer | Cosine Sim (last token)")
    print("-" * 30)
    for i, (orig_h, abl_h) in enumerate(zip(orig_out.hidden_states, abl_out.hidden_states)):
        orig_vec = orig_h[0, -1, :]
        abl_vec = abl_h[0, -1, :]
        cos_sim = torch.nn.functional.cosine_similarity(orig_vec.unsqueeze(0), abl_vec.unsqueeze(0)).item()
        print(f"{i:5} | {cos_sim:.4f}")
Finding: Changes are concentrated in layer 48, the final layer. Layers 0-47 show ~1.0 cosine similarity and are essentially identical.
| Layer | Cosine Sim (innocent) | Cosine Sim (taboo) |
|---|---|---|
| 0-47 | ~1.0 | ~1.0 |
| 48 | 0.859 | 0.820 |
LTX-2 Projection Matrix Analysis
Script to check how LTX-2 weights each layer:
import torch
from safetensors.torch import load_file
ltx_path = "/path/to/ltx-2-19b-dev.safetensors"
tensors = load_file(ltx_path)
proj = tensors['text_embedding_projection.aggregate_embed.weight'] # [3840, 188160]
# 188160 = 3840 * 49 layers
proj_reshaped = proj.view(3840, 49, 3840)
# L2 norm per layer
layer_importance = torch.norm(proj_reshaped, dim=(0, 2))
# Sort by importance
sorted_indices = torch.argsort(layer_importance, descending=True)
print("Top 10 most important layers:")
for i, idx in enumerate(sorted_indices[:10]):
    print(f"#{i+1}: Layer {idx.item():2} - {layer_importance[idx].item():.4f}")
total_importance = layer_importance.sum().item()
layer_48_pct = (layer_importance[48].item() / total_importance) * 100
print(f"\nLayer 48 contributes: {layer_48_pct:.2f}% of total weight")
print(f"Expected if uniform: {100/49:.2f}%")
Finding: All 49 layers are weighted equally at ~2.04% each. Layer 48 is rank 26/49.
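Putting the two findings together gives a rough sense of scale: if 48 of the 49 equally-weighted layer slots are unchanged and only layer 48 carries the measured divergence, the concatenated embedding that LTX-2 projects barely moves. A back-of-envelope estimate, assuming roughly similar per-layer norms:
# Layers 0-47 unchanged (cos sim ~1.0); layer 48 uses the measured values above.
for label, layer48_sim in [("innocent", 0.859), ("taboo", 0.820)]:
    aggregate = (48 * 1.0 + layer48_sim) / 49
    print(f"{label:8} | aggregated cos sim ≈ {aggregate:.4f}")
# innocent ≈ 0.9971, taboo ≈ 0.9963 -- a fraction of a percent of the signal LTX-2 actually sees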
Conclusion
Abliteration does measurably change Gemma’s embeddings:
- Changes concentrated in layer 48, the final layer
- Taboo content diverges more than innocent content
- Clear gradient from innocent → extreme
But for LTX-2 specifically, the effect is likely small because:
- LTX-2 aggregates all 49 layers equally
- Layer 48 only contributes ~2% of the final signal
- The 48 unchanged layers dominate the aggregated signal
However, 2% is still 2%. A few reasons it might still matter:
- Small differences could compound over many diffusion steps
- The projection weights are uniform, but cross-attention interactions could amplify certain differences non-linearly
- Anecdotally, I prefer the abliterated outputs, though this could be placebo
When abliteration definitely helps:
- Direct LLM inference (chat/completion), where only the final layer drives generation
- Applications that use last_hidden_state directly
When it probably doesn’t help:
- LTX-2 video generation (multi-layer aggregation)
- Any system that averages across all hidden states
So if you like it more, there’s no harm in using it; it does slightly change the generated video.
The Bigger Problem: Censorship at the Knowledge Level
There’s another issue I didn’t initially consider. Abliteration removes refusal behavior, but it doesn’t add knowledge.
Gemma was trained to avoid certain content entirely. When I tested it in llama.cpp, asking about explicit topics produced vague, evasive responses. Not because it was refusing, but because it genuinely doesn’t have rich representations of those concepts.
So when we measured embedding differences between original and abliterated models, we might have been comparing two models that both have weak/vague representations of taboo content. The 0.82 cosine similarity for extreme content might not mean “abliteration changed how it understands this” but rather “both models barely understand this to begin with.”
The real solution: Fine-tune Gemma on content it was never trained on. Teach it richer representations across all layers, not just remove the refusal signal from layer 48.
Download
Despite the findings, I’ve uploaded the abliterated model for anyone who wants to experiment:
HuggingFace: DreamFast/gemma-3-12b-it-heretic
Includes:
- Full HuggingFace format (for direct inference)
- ComfyUI safetensors (bf16 and fp8)
- GGUF quantizations (Q3_K_M through Q8_0)
Resources
- Heretic - The abliteration tool
- LTX-2 - Lightricks’ video generation model
- ComfyUI-LTX2-MultiGPU - Multi-GPU workflows for LTX-2
