-
Notifications
You must be signed in to change notification settings - Fork 18
fix: Forward storage_options to parquet metadata reads #353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 9 commits
66445eb
b5645a8
bb3b0ab
ceac55b
34d3b83
1f621c1
5df2685
48793a5
f8e717a
e33a50a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -44,6 +44,42 @@ def s3_tmp_path(s3_server: str, s3_bucket: str, monkeypatch: pytest.MonkeyPatch) | |
| return f"{s3_bucket}/{str(uuid.uuid4())}" | ||
|
|
||
|
|
||
| @pytest.fixture() | ||
| def s3_isolated( | ||
| s3_server: str, monkeypatch: pytest.MonkeyPatch | ||
| ) -> tuple[str, dict[str, str]]: | ||
| """A freshly-named bucket that is only reachable via the returned ``storage_options``. | ||
|
|
||
| Polars caches object stores per bucket, and these caches live Rust-side and are not | ||
| cleared by ``monkeypatch.delenv``. A bucket configured once from ``AWS_*`` env vars | ||
| (e.g. by :func:`s3_tmp_path`) therefore stays reachable without ``storage_options``, | ||
| which would let a read silently succeed even if ``storage_options`` was dropped. A | ||
| unique bucket has no such cached store, so reaching it requires forwarding | ||
| ``storage_options`` to every read. | ||
| """ | ||
| for var in ( | ||
| "AWS_ENDPOINT_URL", | ||
| "AWS_ACCESS_KEY_ID", | ||
| "AWS_SECRET_ACCESS_KEY", | ||
| "AWS_ALLOW_HTTP", | ||
| "AWS_S3_ALLOW_UNSAFE_RENAME", | ||
| "AWS_DEFAULT_REGION", | ||
| "AWS_REGION", | ||
| ): | ||
| monkeypatch.delenv(var, raising=False) | ||
| bucket = f"isolated-{uuid.uuid4()}" | ||
| boto3.client( | ||
| "s3", endpoint_url=s3_server, aws_access_key_id="", aws_secret_access_key="" | ||
| ).create_bucket(Bucket=bucket) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will keep the bucket around indefinitely which is not ideal for local development & debugging. Could you
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, done. Note: I didn't use a paginator and technically |
||
| return f"s3://{bucket}", { | ||
| "aws_access_key_id": "testing", | ||
| "aws_secret_access_key": "testing", | ||
| "aws_endpoint_url": s3_server, | ||
| "aws_region": "us-east-1", | ||
| "aws_allow_http": "true", | ||
| } | ||
|
|
||
|
|
||
| @pytest.fixture() | ||
| def any_tmp_path(request: pytest.FixtureRequest) -> str: | ||
| return str(request.getfixturevalue(request.param)) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this has the desired effect. Polars likely only reads these variables at startup and does not re-read them for every read. Could we check that?
I don't have a good alternative idea but I think all of those tests are moot if I'm right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! So yes, the tests were indeed moot as Polars caches an object store per bucket which isn't cleared by this monkeypatch. Other tests use the same bucket so it stayed reachable even without
storage_options. This is per bucket though so I fixed this by creating a fresh bucket that is never env configured hence no cache and so requiresstorage_optionsforwarding. Ran the tests without the fix which then failed as expected. Let me know what you think.