feat(gemma4): add Gemma 4 27B (MoE) LoRA recipe by aniruddh-alt · Pull Request #2492 · oumi-ai/oumi

aniruddh-alt · 2026-06-04T17:15:00Z

Add Gemma 4 27B (MoE) LoRA recipe

Adds a LoRA SFT recipe for Gemma 4 27B (google/gemma-4-26B-A4B-it — Mixture-of-Experts, 26.5B total / ~4B active, image+text) under configs/recipes/gemma4/sft/27b_lora/. Builds on #2479 (now merged), which added the lora_exclude_modules support these recipes rely on.

What's in this PR

configs/recipes/gemma4/sft/27b_lora/train.yaml — FSDP (FULL_SHARD) LoRA SFT on alpaca-cleaned; same text-transformer scoping (lora_exclude_modules: .*vision_tower.*, .*multi_modal_projector.*) and transformer_layer_cls: Gemma4TextDecoderLayer as the 31B recipe.
configs/recipes/gemma4/sft/27b_lora/gcp_job.yaml — SkyPilot GCP job (A100:8, FSDP via oumi distributed torchrun).
configs/recipes/gemma4/README.md — mark 27B as "LoRA config available" + launch example.
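The text-transformer scoping described above can be illustrated with a small sketch. It assumes each exclusion pattern is applied as a full-match regex against a dotted module name; the real matching is done inside oumi/PEFT and may differ in detail. The module names below are hypothetical and are not taken from the actual checkpoint.

```python
import re

# Exclusion patterns quoted in the recipe description.
EXCLUDE_PATTERNS = [r".*vision_tower.*", r".*multi_modal_projector.*"]

# Hypothetical module names, for illustration only.
MODULE_NAMES = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.vision_tower.encoder.layers.0.self_attn.q_proj",
    "model.multi_modal_projector.linear",
]


def is_excluded(name, patterns=EXCLUDE_PATTERNS):
    """Return True if any pattern fully matches the dotted module name."""
    return any(re.fullmatch(p, name) is not None for p in patterns)


# Only modules outside the vision tower and projector remain LoRA candidates.
kept = [n for n in MODULE_NAMES if not is_excluded(n)]
print(kept)
```

Under these assumptions only the language-model attention projection survives the filter, which is the intent of scoping LoRA to the text transformer.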

Validation

Validated end-to-end in oumi's OSS environment (torch 2.10.0+cu128, transformers 5.7.0, peft 0.19.1, trl 1.4.0) on H100s with FSDP FULL_SHARD. google/gemma-4-26B-A4B-it loads and LoRA training runs to completion with 9,292,800 trainable params (~0.035%, attention projections only; see the MoE note). Loss descends to <0.3 and the adapter is saved (TRAIN_DONE rc=0).
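As a sanity check on the trainable-parameter figure, the percentage can be recomputed from the numbers quoted in this PR. This is a minimal sketch: the total is the rounded "26.5B" from the description, and the rank and layer dimensions passed to `lora_param_count` are hypothetical values chosen only to show how a LoRA adapter's size is computed, not the recipe's actual settings.

```python
TRAINABLE_PARAMS = 9_292_800  # reported in the validation run
TOTAL_PARAMS = 26.5e9         # "26.5B total" from the PR description (rounded)


def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


pct = 100 * TRAINABLE_PARAMS / TOTAL_PARAMS
print(f"{pct:.3f}% of parameters are trainable")

# Hypothetical dimensions, for illustration only.
print(lora_param_count(d_in=4096, d_out=4096, rank=16))
```

The recomputed share rounds to the ~0.035% quoted above.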

Pointing the recipe's LoRA setup at a task's training split (in place of the shipped alpaca-cleaned default) gives a measurable downstream gain on the MoE, evaluated via oumi's NATIVE engine:

Task	Base	+ LoRA
pubmedqa (n=100)	57.0%	76.0%
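With n=100 the accuracies carry noticeable sampling noise, so a rough error estimate helps put the gain in context. The sketch below treats each accuracy as an independent binomial estimate, which is an assumption; if both runs scored the same 100 examples, a paired comparison would be more appropriate.

```python
import math

N = 100                  # evaluation set size from the table
BASE, LORA = 0.57, 0.76  # accuracies from the table


def standard_error(acc, n):
    """Binomial standard error of an accuracy measured on n examples."""
    return math.sqrt(acc * (1 - acc) / n)


gain_points = (LORA - BASE) * 100
print(f"gain: {gain_points:.1f} points")
print(f"base SE: {100 * standard_error(BASE, N):.1f} points")
print(f"LoRA SE: {100 * standard_error(LORA, N):.1f} points")
```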

MoE note: the standard gate_proj/up_proj/down_proj targets do not match this model's fused expert MLPs, so LoRA currently adapts the attention projections only. Adapting the experts would need their specific module names — follow-up; the recipe comment documents this.
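The mismatch in the MoE note can be sketched as a name-matching problem. This assumes target names are matched against the last dotted component of each module name (the usual behaviour for a list of target names in PEFT, stated here as an assumption); the module names are hypothetical, including the fused `gate_up_proj` name, and the real expert weights may not be separate modules at all.

```python
# The standard MLP targets mentioned in the MoE note.
TARGETS = ["gate_proj", "up_proj", "down_proj"]


def matches_target(module_name, targets=TARGETS):
    """Suffix match: the final dotted component must equal a target name."""
    return any(
        module_name == t or module_name.endswith("." + t) for t in targets
    )


# Hypothetical names, for illustration only.
dense_mlp = "model.layers.0.mlp.gate_proj"
fused_experts = "model.layers.0.mlp.experts.gate_up_proj"

print(matches_target(dense_mlp))      # a dense MLP projection is picked up
print(matches_target(fused_experts))  # a fused expert name is not
```

Adapting the experts would therefore need targets written against the model's actual module names, as the note says.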

Eval note: native HF evaluation of this MoE OOMs on long prompts — the default batched_mm expert kernel copies expert weights per token-expert pair (~25.6 GiB for a ~900-token prompt at batch 1, independent of device_map sharding). Short-prompt tasks (pubmedqa) evaluate fine; long-prompt ones (e.g. banking77's 77-label prompt) don't. The grouped_mm kernel avoids this; wiring it through for this nested MoE is a follow-up. Training is unaffected.
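Because the copy happens per token-expert pair, the overhead should grow roughly linearly with prompt length. The sketch below extrapolates from the single data point reported above (~25.6 GiB at ~900 tokens, batch 1); linear scaling is an assumption, and both reference numbers are approximate.

```python
REPORTED_GIB = 25.6    # approximate figure from the eval note
REPORTED_TOKENS = 900  # approximate prompt length from the eval note


def estimated_copy_gib(num_tokens):
    """Linearly scale the reported expert-weight copy size to another prompt length."""
    return REPORTED_GIB * num_tokens / REPORTED_TOKENS


for tokens in (100, 900, 1800):
    print(tokens, round(estimated_copy_gib(tokens), 1))
```

This is consistent with short-prompt tasks fitting in memory while long-prompt ones do not.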

Related issues

N/A — config-only addition. Builds on #2479 (merged).

Before submitting

This PR only changes documentation. (You can ignore the following checks in that case)
Did you read the contributor guideline Pull Request guidelines?
Did you link the issue(s) related to this PR in the section above?
Did you add / update tests where needed?

gitar-bot · 2026-06-05T19:17:53Z

README LoRA prose claimed the recipes exclude .*audio_tower.*, but the larger image+text models (31B/27B) have no audio tower and exclude .*multi_modal_projector.* instead; generalize the prose to cover both families. Remove ddp_find_unused_parameters from 27b_lora/train.yaml: it is a no-op under FSDP (which this recipe always enables; distributed.py routes the flag only to the DDP wrapper) and its comment was misleading. Reword the header exclusion rationale to match the e4b sibling (Gemma4ClippableLinear).

Base automatically changed from gemma4-oumi-onboarding to main June 4, 2026 17:39

aniruddh-alt added 2 commits June 5, 2026 12:16

feat(gemma4): add Gemma 4 27B MoE LoRA recipe (editable-install test)

c6769b0

fix(gemma4): finalize 27B MoE LoRA recipe (released install form, accurate transformers/MoE notes)

38eb807

aniruddh-alt force-pushed the gemma4-27b-lora branch from c1925c4 to 38eb807 on June 5, 2026 19:17

aniruddh-alt added 2 commits June 5, 2026 14:06

fix(gemma4): correct 27B LoRA license note (Gemma ToU, gated)

bb80994

Gemma 4 is under the Gemma Terms of Use and gated on HF, not apache-2.0/ungated. Match the rest of the repo's wording. Same liberate-bot fix as the sibling 31B PR.

aniruddh-alt wants to merge 4 commits into main from gemma4-27b-lora
