Skip to content

Refactor VLM detection in studio#70

Open
danielhanchen wants to merge 6 commits into
mainfrom
pr-5245-head
Open

Refactor VLM detection in studio#70
danielhanchen wants to merge 6 commits into
mainfrom
pr-5245-head

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Staging mirror of unslothai#5245

Original PR: unslothai#5245
Author: Datta0

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.


Original description

Fixes : unslothai#5239. This seems to have not been an issue.

  • Add gemma4 as VLM to studio models list
  • Refactor is VLM checks to single place
  • Add a fallback to check vision config for the same.

This PR tracks the moving review branch (pr-5245-head). Iteration fix commits land here directly. Review-added tests are in a separate PR.

Changed files:

  • studio/backend/utils/models/model_config.py
  • studio/backend/tests/test_vision_cache.py

@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Vision Language Model (VLM) detection logic by centralizing it into a new _is_vlm helper function and introducing a fallback mechanism, _raw_config_has_vision_config, which inspects the config.json directly if subprocess checks fail. It also adds support for gemma4 models and includes unit tests for these changes. Feedback was provided to use a more idiomatic Python approach for checking model type prefixes in the detection logic.

or hasattr(config, "image_token_index")
or (
model_type is not None
and any(model_type.startswith(vlm_type) for vlm_type in _VLM_MODEL_TYPES)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The startswith method in Python can accept a tuple of strings directly. Using model_type.startswith(tuple(_VLM_MODEL_TYPES)) would be more efficient and idiomatic than using a generator expression with any().

Suggested change
and any(model_type.startswith(vlm_type) for vlm_type in _VLM_MODEL_TYPES)
and model_type.startswith(tuple(_VLM_MODEL_TYPES))

@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Vision Language Model (VLM) detection logic by centralizing the identification criteria into a reusable _is_vlm helper function and introducing a fallback mechanism that inspects the raw config.json file when standard methods fail. It also adds support for gemma4 models and includes corresponding unit tests. Review feedback suggests improving the robustness of configuration checks by explicitly verifying that attributes are not None and handling empty configuration dictionaries more effectively.

)
)
config = json.loads(config_path.read_text())
return "vision_config" in config and bool(config["vision_config"])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of bool(config["vision_config"]) might incorrectly return False if a model uses a default vision configuration represented as an empty dictionary {} in the config.json. A safer check is to verify that the key exists and its value is not None.

Suggested change
return "vision_config" in config and bool(config["vision_config"])
return config.get("vision_config") is not None

Comment on lines +513 to +515
or hasattr(config, "vision_config")
or hasattr(config, "img_processor")
or hasattr(config, "image_token_index")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Using hasattr on an AutoConfig object can be misleading because it returns True even if the attribute value is None. While this preserves the previous behavior, it's generally safer to check if the attribute exists and is not None to avoid false positives for models that might have these attributes initialized to None.

Suggested change
or hasattr(config, "vision_config")
or hasattr(config, "img_processor")
or hasattr(config, "image_token_index")
or getattr(config, "vision_config", None) is not None
or getattr(config, "img_processor", None) is not None
or getattr(config, "image_token_index", None) is not None

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Studio doesn't detect gemma4-E4B as vision model

2 participants