TIME003: accept CMIP7 sub-daily time-range tokens #46
regex only accepted 6 or 8 digit tokens. CMIP7 DRS uses 4 (yr), 6 (mon), 8 (day), 10 (6hr/3hr point), 12 (sub-hourly point), 14 (sub-min point). files like *_185001010900-185112312100.nc were flagged "no time range token" even though the filename was spec-compliant. extend the regex to all six lengths, return precision so coverage is compared at the right granularity, accept the optional -clim suffix.
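For illustration, the six token lengths listed above could be matched with a pattern along these lines. This is a minimal sketch, not the pattern used in this PR: the names `TIME_RANGE_SKETCH` and `find_time_range` are hypothetical, and the sketch assumes the token sits directly before the `.nc` extension.

```python
import re

# Hypothetical pattern for illustration; not the package's actual regex.
# One token is 4 digits (year) followed by zero to five 2-digit groups,
# which gives exactly the lengths 4, 6, 8, 10, 12 and 14.
TOKEN = r"\d{4}(?:\d{2}){0,5}"
TIME_RANGE_SKETCH = re.compile(
    rf"_(?P<start>{TOKEN})-(?P<end>{TOKEN})(?P<clim>-clim)?\.nc$"
)


def find_time_range(filename):
    """Return (start, end, is_climatology), or None if no token is found."""
    match = TIME_RANGE_SKETCH.search(filename)
    if match is None:
        return None
    return match["start"], match["end"], match["clim"] is not None


# The sub-daily filename form that was previously rejected.
print(find_time_range("psl_6hr_185001010900-185112312100.nc"))
```

Odd-length tokens such as `18500-18501` do not match, because every group after the year must be exactly two digits.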
I've moved this to draft. I'm not sure about this one anymore: the pattern may be legal, but the way time mid-points are computed needs another pass from me.
Reopening with a tighter scope. On the pycmor side I have worked around this locally by encoding tokens at whatever precision the current [...] The underlying gap stands. CMIP7 DRS allows 4/6/8/10/12/14-digit time-range tokens (yr / mon / day / 6hr-3hr / sub-hourly point / sub-min), and the current regex hard-caps at 8. To check this against reality I sampled 2000 files from [...]
Note that this is no longer blocking for me. If you can confirm that the narrower range is intentional, even though it differs considerably from CMIP6, this can be closed. If the narrower range was accidental, then please continue.
the regex in `_extract_time_range_from_filename` only accepts 6 or 8 digit tokens (`YYYYMM` or `YYYYMMDD`). CMIP7 DRS allows more: `YYYY` (yr), `YYYYMM` (mon), `YYYYMMDD` (day), `YYYYMMDDhh` (6hr/3hr point), `YYYYMMDDhhmm` (sub-hourly point), `YYYYMMDDhhmmss` (sub-min). spec-compliant sub-daily files get flagged "no time range token found in filename".

example failure from a CMIP7 AWI-ESM3-4-2-veg-HR run:

filename was `psl_tpt-u-hxy-u_6hr_glb_g113_AWI-ESM3-4-2-veg-HR_piControl_r1i1p1f1_185001010900-185112312100.nc`. the token is 12-digit `YYYYMMDDhhmm` per CMIP7 DRS, but the regex bails before reaching the coverage check.

looks like sync drift between two regexes in the same package: the sister check `check_time_squareness.py` already accepts `\d{4,14}`, only this one is stuck on `\d{6}|\d{8}`.

this PR:
- extends `_TIME_RANGE_RE` to all six valid lengths (4/6/8/10/12/14)
- `_tuple_from_datestr` returns the right precision tuple for each
- `_coverage_from_time` always returns the full `(Y,M,D,h,m,s)` and the caller truncates to the filename's precision so comparisons line up
- accepts the optional `-clim` suffix for climatology files

verified the regex on representative cases including `_185001010900-185112312100`, `_18501231235959-18501231235959`, `_1850-1859`, and the climatology suffix `_185001-185012-clim`. tested against a real HR CMIP7 6hr file: token parses to `(1851,1,1,9,0)` and `(1851,12,31,21,0)` as expected.
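A rough sketch of the precision handling described above, for readers who want to see the idea in isolation. The helper names `tuple_from_token` and `truncate_to_precision` are hypothetical and are not the functions in this PR; the sketch only assumes that a token of length 4 to 14 maps to between one and six integer fields.

```python
VALID_LENGTHS = (4, 6, 8, 10, 12, 14)


def tuple_from_token(token):
    """Split a time token into integer fields, keeping only its own precision.

    Hypothetical helper: "185001" -> (1850, 1), "185001010900" -> (1850, 1, 1, 9, 0).
    """
    if len(token) not in VALID_LENGTHS or not token.isdigit():
        raise ValueError(f"not a valid time token: {token!r}")
    fields = [int(token[:4])]  # 4-digit year
    # Every remaining field (month, day, hour, minute, second) is 2 digits.
    fields += [int(token[i:i + 2]) for i in range(4, len(token), 2)]
    return tuple(fields)


def truncate_to_precision(full_tuple, n_fields):
    """Cut a full (Y, M, D, h, m, s) tuple down to the filename's precision."""
    return full_tuple[:n_fields]


# Compare coverage at the filename's granularity (minutes in this case).
start = tuple_from_token("185001010900")
coverage_start = (1850, 1, 1, 9, 0, 0)  # assumed full-precision value from the time axis
print(truncate_to_precision(coverage_start, len(start)) == start)
```

Truncating on the caller's side keeps a single full-precision representation for the time axis, so only the comparison step needs to know about the filename's precision.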