Refactor VLM detection in studio by danielhanchen · Pull Request #70 · unslothai/unsloth-staging-1

danielhanchen · 2026-05-06T12:18:32Z

Staging mirror of unslothai#5245

Original PR: unslothai#5245
Author: Datta0

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.

Original description

~~Fixes: unslothai#5239~~. This turned out not to be an issue.

~~Add gemma4 as VLM to studio models list~~
Refactor the is-VLM checks into a single place.
Add a fallback that checks the vision config for the same purpose.

This PR tracks the moving review branch (pr-5245-head). Iteration fix commits land here directly. Review-added tests are in a separate PR.

Changed files:

studio/backend/utils/models/model_config.py
studio/backend/tests/test_vision_cache.py

danielhanchen · 2026-05-06T12:21:07Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors the Vision Language Model (VLM) detection logic by centralizing it into a new _is_vlm helper function and introducing a fallback mechanism, _raw_config_has_vision_config, which inspects the config.json directly if subprocess checks fail. It also adds support for gemma4 models and includes unit tests for these changes. Feedback was provided to use a more idiomatic Python approach for checking model type prefixes in the detection logic.
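As a rough illustration of the fallback described in this summary, the sketch below reads config.json from a local model directory and reports whether it carries a vision_config entry. The function name, its signature, and the error handling are assumptions made for this example; only the final membership-and-truthiness expression mirrors the diff line quoted later in this review, and the real _raw_config_has_vision_config may differ.

```python
import json
from pathlib import Path


def raw_config_has_vision_config_sketch(model_dir: Path) -> bool:
    """Hypothetical stand-in for the PR's fallback, not the actual implementation."""
    config_path = model_dir / "config.json"
    if not config_path.is_file():
        # No raw config to inspect, so the fallback cannot confirm a VLM.
        return False
    try:
        config = json.loads(config_path.read_text())
    except (OSError, ValueError):
        # Unreadable or malformed JSON (JSONDecodeError subclasses ValueError).
        return False
    # Same expression as the line under review: key present and value truthy.
    return (
        isinstance(config, dict)
        and "vision_config" in config
        and bool(config["vision_config"])
    )
```

A caller would try the regular detection first and consult a helper like this only when that path fails.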

gemini-code-assist · 2026-05-06T12:26:12Z

+        or hasattr(config, "image_token_index")
+        or (
+            model_type is not None
+            and any(model_type.startswith(vlm_type) for vlm_type in _VLM_MODEL_TYPES)


The startswith method in Python can accept a tuple of strings directly. Using model_type.startswith(tuple(_VLM_MODEL_TYPES)) would be more efficient and idiomatic than using a generator expression with any().

Suggested change

-            and any(model_type.startswith(vlm_type) for vlm_type in _VLM_MODEL_TYPES)
+            and model_type.startswith(tuple(_VLM_MODEL_TYPES))
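To make the suggestion above concrete, here is a small sketch comparing the two forms. The prefix set is a made-up example (the real contents of _VLM_MODEL_TYPES are not shown in this thread), and both helper names are hypothetical. Note that str.startswith accepts a str or a tuple of str only, so the tuple() conversion is required if the collection is a set or list.

```python
from typing import Optional

# Hypothetical prefixes for illustration only; not the PR's actual list.
VLM_MODEL_TYPES_EXAMPLE = {"gemma4", "llava"}


def matches_with_any(model_type: Optional[str]) -> bool:
    # Form currently in the diff: one startswith call per prefix.
    return model_type is not None and any(
        model_type.startswith(prefix) for prefix in VLM_MODEL_TYPES_EXAMPLE
    )


def matches_with_tuple(model_type: Optional[str]) -> bool:
    # Suggested form: a single startswith call with a tuple of prefixes.
    return model_type is not None and model_type.startswith(
        tuple(VLM_MODEL_TYPES_EXAMPLE)
    )
```

Both forms agree on every input, including an empty prefix collection, where any() over nothing and startswith(()) both yield False.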

danielhanchen · 2026-05-06T13:02:48Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors the Vision Language Model (VLM) detection logic by centralizing the identification criteria into a reusable _is_vlm helper function and introducing a fallback mechanism that inspects the raw config.json file when standard methods fail. It also adds support for gemma4 models and includes corresponding unit tests. Review feedback suggests improving the robustness of configuration checks by explicitly verifying that attributes are not None and handling empty configuration dictionaries more effectively.

gemini-code-assist · 2026-05-06T13:08:28Z

+                )
+            )
+        config = json.loads(config_path.read_text())
+        return "vision_config" in config and bool(config["vision_config"])


The use of bool(config["vision_config"]) might incorrectly return False if a model uses a default vision configuration represented as an empty dictionary {} in the config.json. A safer check is to verify that the key exists and its value is not None.

Suggested change

return "vision_config" in config and bool(config["vision_config"])

return config.get("vision_config") is not None
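The difference between the two return expressions only shows up for falsy-but-present values. The sketch below applies both to a few hand-written config dictionaries; the helper names and sample dictionaries are invented for illustration and do not come from the PR.

```python
def has_vision_config_truthy(config: dict) -> bool:
    # Expression currently in the diff: key present and value truthy.
    return "vision_config" in config and bool(config["vision_config"])


def has_vision_config_not_none(config: dict) -> bool:
    # Suggested expression: key present with any value other than None.
    return config.get("vision_config") is not None


# Hypothetical parsed config.json contents.
samples = {
    "populated": {"vision_config": {"hidden_size": 1024}},
    "empty_dict": {"vision_config": {}},
    "explicit_null": {"vision_config": None},
    "missing": {"model_type": "llama"},
}

for name, cfg in samples.items():
    print(name, has_vision_config_truthy(cfg), has_vision_config_not_none(cfg))
```

Only the empty-dict case separates the two: the truthiness check rejects it, while the not-None check accepts it. Which behavior is wanted depends on whether an empty vision_config should count as a vision model.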

gemini-code-assist · 2026-05-06T13:08:28Z

+        or hasattr(config, "vision_config")
+        or hasattr(config, "img_processor")
+        or hasattr(config, "image_token_index")


Using hasattr on an AutoConfig object can be misleading because it returns True even if the attribute value is None. While this preserves the previous behavior, it's generally safer to check if the attribute exists and is not None to avoid false positives for models that might have these attributes initialized to None.

Suggested change

-        or hasattr(config, "vision_config")
-        or hasattr(config, "img_processor")
-        or hasattr(config, "image_token_index")
+        or getattr(config, "vision_config", None) is not None
+        or getattr(config, "img_processor", None) is not None
+        or getattr(config, "image_token_index", None) is not None
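The point about hasattr can be shown without loading any real model config. The sketch below uses types.SimpleNamespace as a stand-in for a config object; it makes no claim about how AutoConfig populates these attributes for any particular model, and the helper names are hypothetical.

```python
from types import SimpleNamespace

VISION_ATTRS = ("vision_config", "img_processor", "image_token_index")


def has_vision_attr(config) -> bool:
    # Behavior in the diff: True whenever the attribute exists, even if None.
    return any(hasattr(config, name) for name in VISION_ATTRS)


def has_non_none_vision_attr(config) -> bool:
    # Suggested behavior: the attribute must exist and hold a non-None value.
    return any(getattr(config, name, None) is not None for name in VISION_ATTRS)


# Stand-in config objects, invented for illustration.
text_only = SimpleNamespace(model_type="llama")
none_valued = SimpleNamespace(model_type="example", vision_config=None)
token_zero = SimpleNamespace(model_type="example", image_token_index=0)
```

With these stand-ins, none_valued is the only object the two helpers disagree on. Comparing against None rather than testing truthiness also keeps a legitimate image_token_index of 0 counted as present.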

Datta0 and others added 6 commits May 1, 2026 09:52

Gemma4 as vision model for studio

f86bbcf

cleanup

d1b23e1

Simplify and add fallback onto config

71ffe2b

consolidate VLM checks

4ba6a25

[pre-commit.ci] auto fixes from pre-commit.com hooks

be47403

for more information, see https://pre-commit.ci

Merge branch 'main' into studio_gemma4_vlm

9e55909

gemini-code-assist Bot reviewed May 6, 2026

danielhanchen force-pushed the main branch 3 times, most recently from e128c6f to 1555c15 Compare May 18, 2026 03:46

danielhanchen force-pushed the main branch from 9f47625 to b9dd7cf Compare June 7, 2026 10:11

danielhanchen wants to merge 6 commits into main from pr-5245-head
