Removing obsolete dataset statuses #1809

Open
ilongin wants to merge 3 commits into main from ilongin/1801-remove-obsolete-dataset-statuses

ilongin (Contributor) commented Jun 8, 2026

No description provided.

ilongin marked this pull request as draft June 8, 2026 22:37
ilongin linked an issue Jun 8, 2026 that may be closed by this pull request
4 tasks
cloudflare-workers-and-pages Bot commented Jun 8, 2026

Deploying datachain with Cloudflare Pages

Latest commit: 6674909
Status: ✅  Deploy successful!
Preview URL: https://4afb5c20.datachain-2g6.pages.dev
Branch Preview URL: https://ilongin-1801-remove-obsolete.datachain-2g6.pages.dev

codecov Bot commented Jun 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

ilongin marked this pull request as ready for review June 11, 2026 13:38
shcheklein requested a review from Copilot June 16, 2026 22:42

Copilot AI (Contributor) left a comment

Pull request overview

This PR removes obsolete dataset status constants and updates dataset-version cleanup logic/docs to no longer reference the removed statuses.

Changes:

  • Removed PENDING and STALE from DatasetStatus.
  • Updated “final status” detection to no longer treat STALE as final.
  • Updated GC/cleanup documentation and query filters to drop STALE from “versions to clean” selection.
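
As a rough illustration of the changes listed above, the sketch below shows the status set after the removal together with a final-status check. The numeric values are the ones visible in the diff on this PR; the free function `is_final_status` is a hypothetical stand-in for the check in src/datachain/dataset.py, whose real name and signature are not shown in this conversation.

```python
class DatasetStatus:
    """Status constants after this PR; PENDING (2) and STALE (6) are gone."""

    CREATED = 1
    FAILED = 3
    COMPLETE = 4
    REMOVING = 7


def is_final_status(status: int) -> bool:
    """Hypothetical helper: True when no further transition is expected.

    Mirrors the list in the diff with STALE dropped.
    """
    return status in (
        DatasetStatus.FAILED,
        DatasetStatus.COMPLETE,
        DatasetStatus.REMOVING,
    )


if __name__ == "__main__":
    print(is_final_status(DatasetStatus.COMPLETE))
    print(is_final_status(DatasetStatus.CREATED))
```

Under this sketch, a row that still carries the legacy value 6 would no longer count as final, which is the situation the suppressed Copilot comment below asks about.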

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
src/datachain/dataset.py | Removes obsolete DatasetStatus values and updates final-status detection accordingly.
src/datachain/data_storage/metastore.py | Updates dataset-version GC docstring and the SQLAlchemy predicate for selecting versions to clean.
src/datachain/catalog/catalog.py | Updates cleanup API docstring to reflect the new status set.
Comments suppressed due to low confidence (1)

src/datachain/data_storage/metastore.py:350

  • If legacy PENDING/STALE statuses are still supported for GC (to avoid leaking old versions), the docstring should reflect that these statuses may be returned as eligible for cleanup; otherwise readers will assume only CREATED/FAILED/REMOVING are considered.
        - Status CREATED, FAILED where either:
          - the associated job has finished, or
          - there is no associated job (job_id is NULL) and the version is
            older than STALE_CREATED_THRESHOLD_HOURS
        - Status REMOVING: marked for deletion
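
The eligibility rule quoted in this docstring can be sketched in plain Python as follows. This is only an illustration of the stated conditions, not the SQLAlchemy query in metastore.py: the `Version` dataclass, the `job_finished` flag, the function name, and the threshold value of 24 hours are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Status values as shown in the diff on this PR.
CREATED, FAILED, COMPLETE, REMOVING = 1, 3, 4, 7

# Assumed value for illustration; the real threshold is not shown in this PR.
STALE_CREATED_THRESHOLD_HOURS = 24


@dataclass
class Version:
    """Hypothetical, simplified view of a dataset version row."""

    status: int
    created_at: datetime
    job_id: Optional[str] = None
    job_finished: bool = False  # stands in for a lookup of the job's state


def eligible_for_cleanup(v: Version, now: datetime) -> bool:
    """Apply the docstring's rules to a single version."""
    if v.status == REMOVING:
        # Already marked for deletion.
        return True
    if v.status in (CREATED, FAILED):
        if v.job_id is not None:
            # Has a job: eligible only once that job has finished.
            return v.job_finished
        # No job: eligible once older than the threshold.
        age = now - v.created_at
        return age > timedelta(hours=STALE_CREATED_THRESHOLD_HOURS)
    return False


if __name__ == "__main__":
    now = datetime(2026, 6, 8, tzinfo=timezone.utc)
    orphan = Version(CREATED, now - timedelta(hours=25))
    print(eligible_for_cleanup(orphan, now))
```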

Comment thread src/datachain/dataset.py
Comment on lines 261 to 265
  class DatasetStatus:
      CREATED = 1
-     PENDING = 2
      FAILED = 3
      COMPLETE = 4
-     STALE = 6
      REMOVING = 7
Comment thread src/datachain/dataset.py
Comment on lines 369 to 373
  return self.status in [
      DatasetStatus.FAILED,
      DatasetStatus.COMPLETE,
-     DatasetStatus.STALE,
      DatasetStatus.REMOVING,
  ]
Comment on lines 1836 to 1840
  dv.c.status.in_(
      [
          DatasetStatus.CREATED,
          DatasetStatus.FAILED,
-         DatasetStatus.STALE,
          DatasetStatus.REMOVING,
Comment on lines 1153 to 1155
  Removes dataset versions that:
- - Have status CREATED, FAILED, STALE, or REMOVING
+ - Have status CREATED, FAILED, or REMOVING
  - Belong to completed/failed/canceled jobs (not running)

dreadatour (Contributor) left a comment

❤️

Development

Successfully merging this pull request may close these issues.

Remove unused DatasetStatus.PENDING and DatasetStatus.STALE

4 participants