Skip to content

gh-88339: enable fast seeking of uncompressed unencrypted zipfile.ZipExtFile#27737

Merged
gpshead merged 7 commits into
python:mainfrom
JuniorJPDJ:JuniorJPDJ/zipfile-store-seek
Aug 6, 2022
Merged

gh-88339: enable fast seeking of uncompressed unencrypted zipfile.ZipExtFile#27737
gpshead merged 7 commits into
python:mainfrom
JuniorJPDJ:JuniorJPDJ/zipfile-store-seek

Conversation

@JuniorJPDJ

@JuniorJPDJ JuniorJPDJ commented Aug 12, 2021

Copy link
Copy Markdown
Contributor

At the moment stored ZipExtFile is being read to the place of seek like all other compressed variants.
It's not needed as it's possible to freely seek uncompressed file inside zip without this penalty.

Lots of apps depend on ZipExtFile seeking ability and this patch would increase performance and lower IO penalty significantly.

It disables CRC checking after first seek as it's impossible to check CRC if we are not reading whole file.

I've been using patched zipfile for almost a year now and I can say it works good ;)

https://bugs.python.org/issue44173 / #88339

redo of #26227 after a little fail

@gpshead gpshead left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the test_zipfile.py unittest explicitly cover this case of seeking within a ZIP_STORED entry? If so, mention that and drop a link to where in a comment here.

Comment thread Lib/zipfile.py Outdated
@bedevere-bot

Copy link
Copy Markdown

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@gpshead gpshead left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding an explicit seek + read test case to Lib/test/test_zipfile.py's StoredTestsWithSourceFile class that checks that the data read matches what is expected at the seek'ed to offset?

@JuniorJPDJ

JuniorJPDJ commented Aug 19, 2021

Copy link
Copy Markdown
Contributor Author

I don't get why this test fails. Do you maybe have any idea?
EDIT: fixed

@JuniorJPDJ

Copy link
Copy Markdown
Contributor Author

How about adding an explicit seek + read test case to Lib/test/test_zipfile.py's StoredTestsWithSourceFile class that checks that the data read matches what is expected at the seek'ed to offset?

Will try to do, thanks!

@serhiy-storchaka

Copy link
Copy Markdown
Member

There is a unittest explicitly covering this case. The one which fails.

@JuniorJPDJ

JuniorJPDJ commented Aug 26, 2021

Copy link
Copy Markdown
Contributor Author

There is a unittest explicitly covering this case. The one which fails.

You are right! It's that unit test. I added one assert to it.

There were problems with read buffer which I fixed now. I rebased and added fixup! commit without amending previous one to make review easier. Just please, squash them when merging.

Please run CI and tell me if I'm still missing something. ;)

BTW. Thanks for your work guys!

@JuniorJPDJ JuniorJPDJ requested a review from gpshead August 26, 2021 01:44
@JuniorJPDJ JuniorJPDJ changed the title bpo-44173: better approach for seeking in non-compressed ZipExtFile bpo-44173: [zipfile] better approach for seeking in non-compressed ZipExtFile Aug 26, 2021
@JuniorJPDJ JuniorJPDJ changed the title bpo-44173: [zipfile] better approach for seeking in non-compressed ZipExtFile bpo-44173: enable fast seeking of uncompressed unencrypted zipfile.ZipExtFile Aug 27, 2021
@JuniorJPDJ

JuniorJPDJ commented Aug 27, 2021

Copy link
Copy Markdown
Contributor Author

Previous tests passed (some asyncio on windows x64 failed, but it's probably not my fault).
I just rebased and added news entry - from my side it's ready to merge. Am I missing something?

@JuniorJPDJ

JuniorJPDJ commented Sep 19, 2021

Copy link
Copy Markdown
Contributor Author

Can I ask for review please?

EDIT: Just rebased my branch.

@JuniorJPDJ JuniorJPDJ changed the title bpo-44173: enable fast seeking of uncompressed unencrypted zipfile.ZipExtFile gh-88339: enable fast seeking of uncompressed unencrypted zipfile.ZipExtFile Apr 17, 2022
@JuniorJPDJ

Copy link
Copy Markdown
Contributor Author

I have made the requested changes; please review again

(test is already here, I added some tell asserts)

@bedevere-bot

Copy link
Copy Markdown

Thanks for making the requested changes!

@gpshead: please review the changes made to this pull request.

Comment thread Lib/zipfile.py Outdated
@JuniorJPDJ JuniorJPDJ requested a review from gpshead April 22, 2022 12:48
Comment thread Lib/zipfile.py Outdated
Comment thread Lib/zipfile.py Outdated
Comment thread Lib/zipfile.py Outdated
Comment thread Lib/zipfile.py Outdated
Comment thread Lib/zipfile.py Outdated
@JuniorJPDJ

Copy link
Copy Markdown
Contributor Author

It looks like I fixed all issues you guys found! :D

@JuniorJPDJ

Copy link
Copy Markdown
Contributor Author

Oh and I again forgot to tag you guys: @gpshead, @serhiy-storchaka

@JuniorJPDJ

Copy link
Copy Markdown
Contributor Author

I have made the requested changes; please review again

@bedevere-bot

Copy link
Copy Markdown

Thanks for making the requested changes!

@gpshead: please review the changes made to this pull request.

@JuniorJPDJ

Copy link
Copy Markdown
Contributor Author

Guys, what's wrong. It's over a year here now, I fixed everything you mentioned. Can I do something more to make this faster?

@gpshead gpshead left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could use more explicit testing around all possible seek edge cases, but that is true in general of the zipfile module and not specific to this PR. It appears to do the right thing and the existing tests that hopefully represent the more likely usages continue to pass.

@gpshead

gpshead commented Jun 15, 2022

Copy link
Copy Markdown
Member

If I haven't merged this within a few days, please ping me again. I'm giving Serhiy some time to respond if desired.

@JuniorJPDJ

JuniorJPDJ commented Jun 22, 2022

Copy link
Copy Markdown
Contributor Author

If I haven't merged this within a few days, please ping me again.

@gpshead ;)

Also I've a question - merging it now to main branch means this changes will be available from python 3.12, not 3.11?

@JuniorJPDJ

Copy link
Copy Markdown
Contributor Author

If I haven't merged this within a few days, please ping me again.

@gpshead ;)

@JuniorJPDJ

Copy link
Copy Markdown
Contributor Author

ping @gpshead

@JuniorJPDJ

Copy link
Copy Markdown
Contributor Author

ping @gpshead

@JuniorJPDJ

Copy link
Copy Markdown
Contributor Author

bump @gpshead

@gpshead gpshead merged commit 330f1d5 into python:main Aug 6, 2022
@gpshead

gpshead commented Aug 6, 2022

Copy link
Copy Markdown
Member

yep, this is in 3.12. the feature freeze window for a release closes upon the first beta.

@gpshead

gpshead commented Aug 6, 2022

Copy link
Copy Markdown
Member

thanks for the improvement!

@JuniorJPDJ JuniorJPDJ deleted the JuniorJPDJ/zipfile-store-seek branch August 7, 2022 18:43
iritkatriel pushed a commit to iritkatriel/cpython that referenced this pull request Aug 11, 2022
…le.ZipExtFile (pythonGH-27737)

Avoid reading all of the intermediate data in uncompressed items in a zip file when the user seeks forward.

Contributed by: @JuniorJPDJ
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants