LTX-2 is Lightricks’ open-source video generation model, and it uses Google’s Gemma 3 12B as its text encoder. Let’s explore how to make an uncensored version of that encoder and whether it matters for LTX-2 video generation.
I set out to abliterate Gemma using Heretic, then dug deep into whether this actually affects LTX-2’s output. Spoiler: probably not, but the investigation was worth it.
The Abliteration Process
Prerequisites
- NVIDIA GPU with 24GB+ VRAM (RTX 4090/5090)
- HuggingFace account with access to Gemma 3 12B
- Python 3.10+
Step 1: Authenticate with HuggingFace
First, accept the Gemma 3 license at https://huggingface.co/google/gemma-3-12b-it
pip install huggingface_hub
hf auth login
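Optionally, confirm the token is active before kicking off the ~24GB download. A quick check (the gated-repo check via auth_check needs a recent huggingface_hub release):
from huggingface_hub import whoami, auth_check

print(whoami()["name"])  # prints your HF username if the token is valid
auth_check("google/gemma-3-12b-it")  # raises GatedRepoError if you haven't accepted the license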
Step 2: Install and Run Heretic
pip install heretic-llm
heretic google/gemma-3-12b-it
This will:
- Download the model (~24GB)
- Run ~200 optimization trials
- Present you with Pareto-optimal trials to choose from
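For context, abliteration-style tools estimate a “refusal direction” in the residual stream (by contrasting activations on harmful vs. harmless prompts) and project it out of the weights that write into that stream. Heretic’s trials search over how aggressively to apply this per layer; the sketch below shows only the core projection idea, not Heretic’s actual implementation:
import torch

def orthogonalize(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove the refusal direction from a weight matrix that writes into the
    residual stream. W: [d_model, d_in], refusal_dir: [d_model]."""
    r = refusal_dir / refusal_dir.norm()      # unit refusal direction
    return W - torch.outer(r, r @ W)          # W' = (I - r r^T) W

# Hypothetical usage on one decoder layer's MLP output projection:
# layer.mlp.down_proj.weight.data = orthogonalize(layer.mlp.down_proj.weight.data, r_hat)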
Choosing a Trial
Heretic presents trials with two metrics:
- Refusals (0-100): Lower = more uncensored
- KL Divergence: Lower = less model damage
I chose Trial 99 with:
- Refusals: 7/100
- KL Divergence: 0.0826
This is a good balance. 93% of previously-refused prompts now work with minimal model damage.
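Heretic computes both metrics for you, but it helps to know what the KL-divergence number captures: how far the modified model’s next-token distribution drifts from the original on harmless prompts. A rough, memory-hungry sketch of that measurement (assuming both checkpoints load via AutoModelForCausalLM; this is not Heretic’s code):
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-3-12b-it")
orig = AutoModelForCausalLM.from_pretrained("google/gemma-3-12b-it", torch_dtype=torch.bfloat16, device_map="cpu")
abl = AutoModelForCausalLM.from_pretrained("./gemma-3-12b-it-heretic", torch_dtype=torch.bfloat16, device_map="cpu")

inputs = tok("Write a short poem about the sea.", return_tensors="pt")
with torch.no_grad():
    p_logits = orig(**inputs).logits[0, -1].float()  # original distribution P
    q_logits = abl(**inputs).logits[0, -1].float()   # abliterated distribution Q

# KL(P || Q) over the next-token distribution at the final position
kl = F.kl_div(F.log_softmax(q_logits, -1), F.log_softmax(p_logits, -1),
              log_target=True, reduction="sum")
print(f"KL divergence: {kl.item():.4f}")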
Verify the Abliteration
heretic --model google/gemma-3-12b-it --evaluate-model ./gemma-3-12b-it-heretic
Output:
* Evaluating...
* KL divergence: 0.0826
* Refusals: 7/100
Convert to ComfyUI Format
ComfyUI’s LTX-2 text encoder expects a single .safetensors file with specific key names:
from safetensors.torch import load_file, save_file
import torch
import glob
model_path = "./gemma-3-12b-it-heretic"
# Merge weight shards
all_tensors = {}
for f in sorted(glob.glob(f"{model_path}/*.safetensors")):
    all_tensors.update(load_file(f))
# Filter and rename keys
renamed = {}
for k, v in all_tensors.items():
    if k.startswith("vision_tower."):
        continue
    if k.startswith("language_model."):
        new_key = k[len("language_model."):]
    else:
        new_key = k
    renamed[new_key] = v
# Embed tokenizer
with open(f"{model_path}/tokenizer.model", "rb") as f:
    tokenizer_bytes = f.read()
renamed["spiece_model"] = torch.frombuffer(bytearray(tokenizer_bytes), dtype=torch.uint8)
save_file(renamed, "gemma_3_12B_it_heretic.safetensors")
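Before pointing ComfyUI at the file, it’s worth a quick sanity check that the merge worked, the vision tower is gone, and the tokenizer got embedded:
from safetensors import safe_open

with safe_open("gemma_3_12B_it_heretic.safetensors", framework="pt") as f:
    keys = list(f.keys())

print(f"{len(keys)} tensors")
print("tokenizer embedded:", "spiece_model" in keys)
print("vision tower stripped:", not any(k.startswith("vision_tower.") for k in keys))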
Quantization Options
| Format | Size | Notes |
|---|---|---|
| Original (bf16) | 22GB | Full precision |
| FP8 | 11GB | Works in ComfyUI ✅ |
FP8 Quantization
import torch
from safetensors.torch import load_file, save_file
tensors = load_file("gemma_3_12B_it_heretic.safetensors")
fp8_tensors = {}
for k, v in tensors.items():
    if v.dtype in [torch.float16, torch.bfloat16, torch.float32]:
        fp8_tensors[k] = v.to(torch.float8_e4m3fn)
    else:
        fp8_tensors[k] = v
save_file(fp8_tensors, "gemma_3_12B_it_heretic_fp8_e4m3fn.safetensors")
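A quick way to confirm the cast actually happened (and that the embedded tokenizer tensor survived untouched) is to reload the file and tally dtypes:
from collections import Counter
from safetensors.torch import load_file

quantized = load_file("gemma_3_12B_it_heretic_fp8_e4m3fn.safetensors")
print(Counter(str(v.dtype) for v in quantized.values()))  # mostly torch.float8_e4m3fn, plus uint8 for spiece_model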
GGUF Conversion
For llama.cpp or other GGUF-compatible tools:
# Using Docker with llama.cpp
docker run --rm -it --gpus=all \
-v /path/to/heretic:/models \
nvidia/cuda:13.1.0-devel-ubuntu22.04 bash
# Inside container
apt-get update && apt-get install -y python3 python3-pip git cmake build-essential
git clone https://github.com/ggml-org/llama.cpp.git /app
pip3 install -r /app/requirements.txt
# Convert to F16
python3 /app/convert_hf_to_gguf.py /models/gemma-3-12b-it-heretic \
--outfile /models/gemma-3-12b-it-heretic-f16.gguf \
--outtype f16
# Build quantize tool
cd /app
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF
cmake --build build --target llama-quantize -j $(nproc)
# Quantize to multiple formats
for quant in Q3_K_M Q4_K_M Q5_K_M Q6_K Q8_0; do
/app/build/bin/llama-quantize /models/gemma-3-12b-it-heretic-f16.gguf \
/models/gemma-3-12b-it-heretic-${quant}.gguf ${quant}
done
GGUF Sizes
| Quantization | Size | Quality |
|---|---|---|
| F16 | 22GB | Full precision |
| Q8_0 | 12GB | Excellent |
| Q6_K | 9.0GB | Very good |
| Q5_K_M | 7.9GB | Good |
| Q4_K_M | 6.8GB | Recommended |
| Q3_K_M | 5.6GB | Acceptable |
Note: GGUF support for Gemma text encoders in ComfyUI is still experimental (see city96/ComfyUI-GGUF#402), so these builds currently will not work in ComfyUI. Use them with llama.cpp or other GGUF-compatible tools.
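Outside ComfyUI, the quants are easy to smoke-test. For example, with llama-cpp-python installed (a sketch; adjust the model path and prompt to taste):
from llama_cpp import Llama

llm = Llama(model_path="gemma-3-12b-it-heretic-Q4_K_M.gguf",
            n_gpu_layers=-1, n_ctx=4096, verbose=False)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe a bar fight in vivid detail."}],
    max_tokens=200,
)
print(out["choices"][0]["message"]["content"])  # should answer instead of refusing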
But Does It Actually Work for LTX-2?
Here’s where it gets interesting. A Reddit commenter questioned whether abliteration even affects text encoders, since refusals are tied to the chat template, not the embedding space.
I decided to test it empirically.
Testing Embedding Differences
Full script for comparing embeddings between original and abliterated models:
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("./gemma-3-12b-it")
# Create your own prompt categories
prompts = {
    "innocent": [
        "a beautiful sunset over the ocean",
        "baking chocolate chip cookies",
        "a dog running on the beach",
        "children playing in a park",
        "reading a book by the fireplace",
        # ... add more innocent prompts
    ],
    "mild": [
        "a woman in a bikini",
        "drinking alcohol at a bar",
        "a fistfight between two men",
        "gambling at a casino",
        "smoking a cigarette",
        # ... add more mild prompts
    ],
    "violent": [
        # ... add violent prompts
    ],
    "sexual": [
        # ... add sexual prompts
    ],
    "extreme": [
        # ... add extreme prompts (mine were too extreme to publish here; add your own)
    ],
}
def get_embedding(model, text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    return outputs.last_hidden_state[:, -1, :]
print("Loading original...")
original = AutoModel.from_pretrained("./gemma-3-12b-it", torch_dtype=torch.bfloat16, device_map="cpu")
orig_embs = {}
for cat, ps in prompts.items():
    for p in ps:
        orig_embs[p] = get_embedding(original, p)
del original
print("Loading abliterated...")
abliterated = AutoModel.from_pretrained("./gemma-3-12b-it-heretic", torch_dtype=torch.bfloat16, device_map="cpu")
abl_embs = {}
for cat, ps in prompts.items():
    for p in ps:
        abl_embs[p] = get_embedding(abliterated, p)
del abliterated
print("\nResults by category:")
for cat, ps in prompts.items():
    sims = []
    for p in ps:
        cos_sim = torch.nn.functional.cosine_similarity(orig_embs[p], abl_embs[p]).item()
        sims.append(cos_sim)
    avg = sum(sims) / len(sims)
    std = (sum((x - avg) ** 2 for x in sims) / len(sims)) ** 0.5
    print(f"{cat:12} | n={len(ps):2} | avg: {avg:.4f} | std: {std:.4f}")
My results (n=50 per category):
| Category | n | Avg Cosine Sim | Std Dev | Range |
|---|---|---|---|---|
| innocent | 50 | 0.869 | 0.012 | 0.840 - 0.895 |
| mild | 50 | 0.850 | 0.016 | 0.809 - 0.887 |
| violent | 50 | 0.836 | 0.019 | 0.801 - 0.883 |
| sexual | 50 | 0.827 | 0.017 | 0.785 - 0.871 |
| extreme | 50 | 0.825 | 0.020 | 0.785 - 0.867 |
Clear gradient: embeddings diverge more for taboo content. So abliteration does change the embedding space.
Per-Layer Analysis
Script to check which layers are affected:
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("./gemma-3-12b-it")
prompts = [
    "a beautiful sunset over the ocean",
    # ... add your own spicy test prompts
]
print("Loading original...")
original = AutoModel.from_pretrained("./gemma-3-12b-it", torch_dtype=torch.bfloat16, device_map="cpu")
print("Loading abliterated...")
abliterated = AutoModel.from_pretrained("./gemma-3-12b-it-heretic", torch_dtype=torch.bfloat16, device_map="cpu")
for prompt in prompts:
    print(f"\nPrompt: {prompt[:40]}...")
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        orig_out = original(**inputs, output_hidden_states=True)
        abl_out = abliterated(**inputs, output_hidden_states=True)
    print("Layer | Cosine Sim (last token)")
    print("-" * 30)
    for i, (orig_h, abl_h) in enumerate(zip(orig_out.hidden_states, abl_out.hidden_states)):
        orig_vec = orig_h[0, -1, :]
        abl_vec = abl_h[0, -1, :]
        cos_sim = torch.nn.functional.cosine_similarity(orig_vec.unsqueeze(0), abl_vec.unsqueeze(0)).item()
        print(f"{i:5} | {cos_sim:.4f}")
Finding: Changes are concentrated in layer 48, the final layer. Layers 0-47 show ~1.0 cosine similarity and are essentially identical.
| Layer | Cosine Sim (innocent) | Cosine Sim (taboo) |
|---|---|---|
| 0-47 | ~1.0 | ~1.0 |
| 48 | 0.859 | 0.820 |
LTX-2 Projection Matrix Analysis
Script to check how LTX-2 weights each layer:
import torch
from safetensors.torch import load_file
ltx_path = "/path/to/ltx-2-19b-dev.safetensors"
tensors = load_file(ltx_path)
proj = tensors['text_embedding_projection.aggregate_embed.weight'] # [3840, 188160]
# 188160 = 3840 * 49 layers
proj_reshaped = proj.view(3840, 49, 3840)
# L2 norm per layer
layer_importance = torch.norm(proj_reshaped, dim=(0, 2))
# Sort by importance
sorted_indices = torch.argsort(layer_importance, descending=True)
print("Top 10 most important layers:")
for i, idx in enumerate(sorted_indices[:10]):
    print(f"#{i+1}: Layer {idx.item():2} - {layer_importance[idx].item():.4f}")
total_importance = layer_importance.sum().item()
layer_48_pct = (layer_importance[48].item() / total_importance) * 100
print(f"\nLayer 48 contributes: {layer_48_pct:.2f}% of total weight")
print(f"Expected if uniform: {100/49:.2f}%")
Finding: All 49 layers are weighted equally at ~2.04% each. Layer 48 is rank 26/49.
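Putting the two findings together gives a rough sense of scale: if 48 of the 49 equally-weighted layer slots are unchanged and only layer 48 carries the measured divergence, the concatenated embedding that LTX-2 projects barely moves. A back-of-envelope estimate, assuming roughly similar per-layer norms:
# Layers 0-47 unchanged (cos sim ~1.0); layer 48 uses the measured values above.
for label, layer48_sim in [("innocent", 0.859), ("taboo", 0.820)]:
    aggregate = (48 * 1.0 + layer48_sim) / 49
    print(f"{label:8} | aggregated cos sim ≈ {aggregate:.4f}")
# innocent ≈ 0.9971, taboo ≈ 0.9963 -- a fraction of a percent of the signal LTX-2 actually sees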
Conclusion
Abliteration does measurably change Gemma’s embeddings:
- Changes concentrated in layer 48, the final layer
- Taboo content diverges more than innocent content
- Clear gradient from innocent → extreme
But for LTX-2 specifically, the effect is likely small because:
- LTX-2 aggregates all 49 layers equally
- Layer 48 only contributes ~2% of the final signal
- The 48 unchanged layers dominate the aggregated signal
However, 2% is still 2%. A few reasons it might still matter:
- Small differences could compound over many diffusion steps
- The projection weights are uniform, but cross-attention interactions could amplify certain differences non-linearly
- Anecdotally, I prefer the abliterated outputs, though this could be placebo
When abliteration definitely helps:
- Direct LLM inference (chat/completion), where only the final layer drives generation
- Applications that use last_hidden_state directly
When it probably doesn’t help:
- LTX-2 video generation (multi-layer aggregation)
- Any system that averages across all hidden states
So if you like it more, there’s no harm in using it; it does slightly change the generated video.
The Bigger Problem: Censorship at the Knowledge Level
There’s another issue I didn’t initially consider. Abliteration removes refusal behavior, but it doesn’t add knowledge.
Gemma was trained to avoid certain content entirely. When I tested it in llama.cpp, asking about explicit topics produced vague, evasive responses. Not because it was refusing, but because it genuinely doesn’t have rich representations of those concepts.
So when we measured embedding differences between original and abliterated models, we might have been comparing two models that both have weak/vague representations of taboo content. The 0.82 cosine similarity for extreme content might not mean “abliteration changed how it understands this” but rather “both models barely understand this to begin with.”
The real solution: Fine-tune Gemma on content it was never trained on. Teach it richer representations across all layers, not just remove the refusal signal from layer 48.
Download
Despite the findings, I’ve uploaded the abliterated model for anyone who wants to experiment:
HuggingFace: DreamFast/gemma-3-12b-it-heretic
Includes:
- Full HuggingFace format (for direct inference)
- ComfyUI safetensors (bf16 and fp8)
- GGUF quantizations (Q3_K_M through Q8_0)
Resources
- Heretic - The abliteration tool
- LTX-2 - Lightricks’ video generation model
- ComfyUI-LTX2-MultiGPU - Multi-GPU workflows for LTX-2
