cmip7: prefer variant variable term over root for ATTR004 long_name#47

Open
JanStreffing wants to merge 1 commit into ESGF:master from JanStreffing:fix/long-name-variant-fallback

@JanStreffing JanStreffing commented Jun 17, 2026

Contributor

Sister fix for WCRP-CMIP/WCRP-universe#191.

Several CMIP7 variables have tile-specific or operation-specific variants in the variable data descriptor (cveggrass / cvegshrub / cvegtree, hursmin / hursmax / tasmincrop, gpp/npp/ra/rh × grass/shrub/tree, ...). All variants share the same variable_id global attribute (e.g. variable_id = "cVeg" for all four cVeg variants).

The current lookup in plugins/cmip7/cmip7.py picks the bare root term:

var_terms = find_terms_in_data_descriptor(
    expression=var_id_lower,
    data_descriptor_id="variable",
    selected_term_fields=["long_name"],
)
if var_terms:
    for term in var_terms:
        if getattr(term, "id", None) == var_id_lower:
            expected_var = term
            break

After WCRP-universe#191 generalizes the root's long_name and populates per-variant long_names on cveggrass/cvegshrub/cvegtree (etc.), this plugin still needs to pick the right candidate. Picking the root means the ATTR004 long_name check emits a MEDIUM finding on every non-root variant, because the file's variant-specific long_name doesn't match the generic root long_name. ATTR004 long_name is advisory per CMIP7 guidance (see the WCRP-universe#190 discussion), so this PR improves the accuracy of the warning rather than unblocking publication.

This PR takes the variant whose registered long_name matches the file's long_name attribute and falls back to the root term if no variant matches. The single-value comparison semantics of the check stay the same; only the term resolution changes.
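For reviewers, the resolution described above could be sketched roughly as follows. This is a minimal illustration, not the actual diff in this PR: `resolve_expected_term`, the `SimpleNamespace` stand-ins for registry terms, and the long_name strings are all hypothetical, and the sketch assumes the lookup has already returned the root and its variants as candidates.

```python
from types import SimpleNamespace


def resolve_expected_term(var_terms, var_id_lower, file_long_name):
    """Hypothetical helper: prefer a variant whose long_name matches the file.

    Returns the first non-root candidate whose registered long_name equals
    the file's long_name, otherwise the root term (or None if no root exists).
    """
    root = None
    for term in var_terms or []:
        if getattr(term, "id", None) == var_id_lower:
            root = term  # remember the bare root term as the fallback
            continue
        if file_long_name is not None and getattr(term, "long_name", None) == file_long_name:
            return term  # variant-specific match wins over the root
    return root


# Illustrative candidates only; these long_name values are made up.
terms = [
    SimpleNamespace(id="cveg", long_name="Carbon Mass in Vegetation"),
    SimpleNamespace(id="cveggrass", long_name="Carbon Mass in Vegetation on Grass Tiles"),
    SimpleNamespace(id="cvegtree", long_name="Carbon Mass in Vegetation on Tree Tiles"),
]

matched = resolve_expected_term(terms, "cveg", "Carbon Mass in Vegetation on Grass Tiles")
fallback = resolve_expected_term(terms, "cveg", "Some Unregistered Long Name")
```

In this sketch `matched` is the cveggrass term and `fallback` is the root, so the downstream ATTR004 comparison still sees a single expected term in either case.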

No test infra changes. The existing ATTR004 path covers both branches.

several CMIP7 variables have tile-specific or operation-specific
variants in the WCRP-universe 'variable' data descriptor (cveggrass,
cvegshrub, cvegtree; hursmin, hursmax, tasmincrop; gpp/npp/ra/rh ×
grass/shrub/tree; ...). all variants share the same root variable_id
(e.g. variable_id="cVeg" for all four cVeg variants).

the current lookup picks the bare root term, so a file with the
per-variant long_name fails the ATTR004 registry expected-term check
against the root's long_name on every non-matching variant.

after the WCRP-universe registry fix (#190 / #191) populates per-variant
long_names and generalizes the root long_name, this plugin still needs
to pick the right candidate. take the variant whose registered
long_name matches the file's long_name, fall back to the root if no
variant matches. same single-value comparison semantics for the check,
the resolution just picks the right term.

no test infra changes; the existing ATTR004 path covers both branches.