
TIME003: accept CMIP7 sub-daily time-range tokens #46

Open
JanStreffing wants to merge 1 commit into
ESGF:master from
JanStreffing:fix/time003-cmip7-subhourly-tokens


@JanStreffing

Contributor

the regex in _extract_time_range_from_filename only accepts 6 or 8 digit tokens (YYYYMM or YYYYMMDD). CMIP7 DRS allows more: YYYY (yr), YYYYMM (mon), YYYYMMDD (day), YYYYMMDDhh (6hr/3hr point), YYYYMMDDhhmm (sub-hourly point), YYYYMMDDhhmmss (sub-min). spec-compliant sub-daily files get flagged "no time range token found in filename".

example failure from a CMIP7 AWI-ESM3-4-2-veg-HR run:

[TIME003] No time range token found in filename, but frequency='6hr' requires a time range

filename was psl_tpt-u-hxy-u_6hr_glb_g113_AWI-ESM3-4-2-veg-HR_piControl_r1i1p1f1_185001010900-185112312100.nc — the token is 12-digit YYYYMMDDhhmm per CMIP7 DRS, but the regex bails before reaching the coverage check.

looks like sync drift between two regexes in the same package: the sister check check_time_squareness.py already accepts \d{4,14}, only this one is stuck on \d{6}|\d{8}.

this PR:

  • extends _TIME_RANGE_RE to all six valid lengths (4/6/8/10/12/14)
  • _tuple_from_datestr returns the right precision tuple for each
  • _coverage_from_time always returns the full (Y,M,D,h,m,s) and the caller truncates to the filename's precision so comparisons line up
  • accepts the optional -clim suffix for climatology files
  • error messages mention the new layouts
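As a rough illustration of the precision handling in the list above, the sketch below splits a token into a tuple whose length reflects its precision, then truncates a full (Y, M, D, h, m, s) coverage tuple to that precision before comparing. The names `tuple_from_datestr` and `truncate_to_precision` are stand-ins and are not copied from the diff; the real helpers may look different.

```python
VALID_LENGTHS = (4, 6, 8, 10, 12, 14)  # yr, mon, day, hour, minute, second


def tuple_from_datestr(token: str) -> tuple:
    """Split a time-range token into (Y[, M[, D[, h[, m[, s]]]]])."""
    if not (token.isascii() and token.isdigit()) or len(token) not in VALID_LENGTHS:
        raise ValueError(f"unsupported time token: {token!r}")
    # The first four digits are the year; each following pair is one more field.
    fields = [int(token[:4])]
    fields += [int(token[i:i + 2]) for i in range(4, len(token), 2)]
    return tuple(fields)


def truncate_to_precision(full: tuple, precision: int) -> tuple:
    """Cut a full (Y, M, D, h, m, s) tuple down to the filename's precision."""
    return tuple(full[:precision])


start = tuple_from_datestr("185001010900")  # 12-digit token, 5 fields
# Assumed for illustration: the first time value in the file decodes to this.
coverage_start = (1850, 1, 1, 9, 0, 0)
matches = truncate_to_precision(coverage_start, len(start)) == start
```

Truncating the coverage rather than padding the filename token keeps the comparison at the granularity the filename actually claims.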

verified the regex on representative cases including _185001010900-185112312100, _18501231235959-18501231235959, _1850-1859, and the climatology suffix _185001-185012-clim. tested against a real HR CMIP7 6hr file: token parses to (1851,1,1,9,0) and (1851,12,31,21,0) as expected.
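For reference, a pattern along these lines reproduces the representative cases above. It is only a sketch: the actual `_TIME_RANGE_RE` in the PR may be written differently, and `find_time_range` is a hypothetical helper that works on the filename without its `.nc` extension.

```python
import re

# Illustrative only: one alternative per valid token length (14/12/10/8/6/4).
_TOKEN = r"(?:\d{14}|\d{12}|\d{10}|\d{8}|\d{6}|\d{4})"
TIME_RANGE_RE = re.compile(rf"_({_TOKEN})-({_TOKEN})(-clim)?$")


def find_time_range(stem: str):
    """Return (start, end, is_climatology) for a filename stem, or None."""
    m = TIME_RANGE_RE.search(stem)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3) is not None


cases = [
    "psl_6hr_185001010900-185112312100",
    "psl_subhr_18501231235959-18501231235959",
    "tas_yr_1850-1859",
    "tas_mon_185001-185012-clim",
    "tas_bad_18500-18501",  # 5 digits: not a valid layout, should be rejected
]
results = {stem: find_time_range(stem) for stem in cases}
```

Listing the six lengths explicitly, instead of a range such as `\d{4,14}`, means odd-length tokens are rejected rather than parsed.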

regex only accepted 6 or 8 digit tokens. CMIP7 DRS uses 4 (yr), 6 (mon),
8 (day), 10 (6hr/3hr point), 12 (sub-hourly point), 14 (sub-min point).
files like *_185001010900-185112312100.nc were flagged "no time range
token" even though the filename was spec-compliant.

extend the regex to all six lengths, return precision so coverage is
compared at the right granularity, accept the optional -clim suffix.
@JanStreffing

Contributor Author

I have put this on draft. I'm not sure about it anymore: the pattern may be legal, but I think the way the time mid-points are computed needs another pass from me.

@JanStreffing

JanStreffing commented Jun 20, 2026

Contributor Author

Reopening with a tighter scope.

On the pycmor side I have worked around this locally by encoding tokens at whatever precision the current _TIME_RANGE_RE (\d{6}|\d{8}) accepts, distorting as little as possible in each direction. Yearly files (naturally 4-digit) expand UP to 6-digit YYYYMM-YYYYMM, and sub-daily files (naturally 12-digit) clamp DOWN to 8-digit YYYYMMDD-YYYYMMDD; since we always split yearly, we keep day precision. Both pass TIME003 today, so we do not strictly need this PR for our files. That is the "today" position.
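The workaround amounts to something like the sketch below. This is not the pycmor code: `fit_to_legacy_regex` is a hypothetical name, and padding yearly tokens with January and December is an assumption about how the expansion is done.

```python
def fit_to_legacy_regex(start: str, end: str) -> tuple:
    """Re-encode a time range so both tokens are 6 or 8 digits long."""
    if len(start) != len(end):
        raise ValueError("start and end tokens differ in precision")
    n = len(start)
    if n == 4:
        # Yearly: expand up to YYYYMM. January/December padding is assumed.
        return start + "01", end + "12"
    if n in (6, 8):
        # Monthly and daily tokens already pass the current regex.
        return start, end
    if n in (10, 12, 14):
        # Sub-daily: clamp down to YYYYMMDD by dropping hh[mm[ss]].
        return start[:8], end[:8]
    raise ValueError(f"unsupported token length: {n}")


yearly = fit_to_legacy_regex("1850", "1859")
subdaily = fit_to_legacy_regex("185001010900", "185012312100")
```

Clamping loses the time-of-day information in the filename, which is why this is a workaround and not a replacement for accepting the longer tokens.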

The underlying gap stands. CMIP7 DRS allows 4/6/8/10/12/14-digit time-range tokens (yr / mon / day / 6hr-3hr / sub-hourly point / sub-min), and the current regex accepts only 6 and 8 digits. To check this against reality I sampled 2000 files from /pool/data/CMIP6/data/CMIP on DKRZ Levante: 615 daily files use 8-digit tokens, 326 monthly use 6-digit tokens with 10y spans, 265 yearly use 4-digit, and 256 sub-daily use 12-digit, plus monthly files with 100y and 165y spans. Two of the four lengths in active CMIP6 use (4 and 12) sit outside the current regex window. Going by these counts, a CMIP7 archive that mirrors CMIP6 practice would have 521 of 2000 files, roughly a quarter, falsely flagged under TIME003 today.
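A tally of this kind can be reproduced with a few lines. The sketch below is not the script used for the sample: the helper names are hypothetical, the filenames are made up for illustration, and the accepted lengths {6, 8} simply mirror the current `\d{6}|\d{8}` window.

```python
import re
from collections import Counter

# Loose pattern: grab any digit run on either side of the dash.
_RANGE_RE = re.compile(r"_(\d+)-(\d+)(?:-clim)?\.nc$")
ACCEPTED = {6, 8}  # token lengths the current TIME003 regex lets through


def tally_token_lengths(filenames):
    """Count files per start-token length; None means no time range found."""
    counts = Counter()
    for name in filenames:
        m = _RANGE_RE.search(name)
        counts[len(m.group(1)) if m else None] += 1
    return counts


def share_outside(counts, accepted=ACCEPTED):
    """Fraction of files with a time token whose length is not accepted."""
    with_token = sum(n for k, n in counts.items() if k is not None)
    rejected = sum(n for k, n in counts.items()
                   if k is not None and k not in accepted)
    return rejected / with_token if with_token else 0.0


# Made-up filenames, one per layout, plus one fixed field without a range.
sample = [
    "tas_day_x_18500101-18501231.nc",
    "tas_mon_x_185001-185912.nc",
    "tas_yr_x_1850-1859.nc",
    "tas_6hr_x_185001010000-185012311800.nc",
    "orog_fx_x.nc",
]
counts = tally_token_lengths(sample)
fraction = share_outside(counts)
```

Files without any time range (for example fixed fields) are left out of the denominator, since TIME003 would not expect a token for them.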

@JanStreffing

Contributor Author

Note: this is no longer blocking for me. If you can confirm that the narrower range is intended, even though it is quite different from CMIP6, this PR may be closed. If the narrower range was accidental, then please continue with it.
