Always default to .tar.gz sdists on any platform #748
It doesn't make much sense to have different sdist defaults for different platforms. Instead, we'll always use .tar.gz on every platform for greater consistency.
|
Note: This PR stems from my desire to have PyPI (eventually) start rejecting all sdists except those that end in |
|
dumb question: why not all zip instead? (I know Python can decompress those, but you won't be able to, say, double-click on the archive as with a regular |
|
Currently on PyPI there are 444,338 |
|
my point was just that zip can be decompressed on any OS, including Windows or obscure linux distros, without needing to install extra software, which is not the case with tar.gz on Windows. |
|
Sure, and if we were starting from nothing that would probably be enough to push |
|
OK, I see. Thanks for your reply. |
+1 for consistency
I don't think that necessarily speaks to the popularity of the format as much as the popularity of the operating system on which the package was produced.
Good point. I do prefer tar.gz files as they stream better through a pipe (generally speaking, a zip file must be entirely loaded into memory to be expanded, at least under Python's zipfile module). There are other arguments in favor of zip though:
At first, I was ambivalent, but on further consideration, I feel fairly strongly that zip is the better format for sdists.
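The streaming point raised above can be sketched with the standard library's stream mode for tarfile (`"r|gz"`), which reads an archive strictly front to back. This is a minimal sketch, not anything from the PR itself: `build_sample_targz`, `ForwardOnlyReader`, and `stream_members` are hypothetical helper names, and the reader class only stands in for a pipe or socket that cannot seek.

```python
import io
import tarfile


def build_sample_targz() -> bytes:
    """Build a small .tar.gz in memory so the example is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tgz:
        for name, data in [("pkg/a.txt", b"alpha"), ("pkg/b.txt", b"beta")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tgz.addfile(info, io.BytesIO(data))
    return buf.getvalue()


class ForwardOnlyReader:
    """Hypothetical stand-in for a pipe or socket: exposes read() only, no seek()."""

    def __init__(self, data: bytes) -> None:
        self._buf = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._buf.read(size)


def stream_members(fileobj) -> dict:
    """Read every regular file from a gzip-compressed tar stream, in order."""
    contents = {}
    # "r|gz" selects stream mode: members are processed sequentially and the
    # underlying file object only needs a read() method.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tgz:
        for member in tgz:
            if member.isfile():
                contents[member.name] = tgz.extractfile(member).read()
    return contents


streamed = stream_members(ForwardOnlyReader(build_sample_targz()))
```

In stream mode each member has to be consumed as it is encountered, which is why the read happens inside the loop rather than after it.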
It seems like this motivation is the primary one. What if instead, PyPI were to display a warning banner for packages publishing the undesirable format? This could instigate the individual package maintainers to use later versions of setuptools or update their project config to match the recommendation (even if they use another packaging mechanism like distutils). |
|
To be clear, I'm expressing my preference and concerns, but this project will follow whatever consensus is reached in PyPA. |
Right, I'm not saying that people are explicitly picking
Hmm, here's something I threw together really quickly for creating a tarfile completely in memory using only in-memory files. If the files are already on the filesystem then this is even easier since you can just use

    import io
    import tarfile

    # Build the archive in an in-memory buffer.
    tarobj = io.BytesIO()
    with tarfile.open(fileobj=tarobj, mode="w:gz") as tgz:
        data = b"This is an example file."
        t = tarfile.TarInfo("example/file.txt")
        t.size = len(data)  # the size must be set before calling addfile()
        tgz.addfile(t, io.BytesIO(data))

    # Write the finished archive to disk once the tarfile has been closed.
    with open("example.tar.gz", "wb") as fp:
        fp.write(tarobj.getvalue())

Similarly, you can extract all of the files of a

    import io
    import tarfile

    file_data = {}
    tarobj = io.BytesIO(b"... Tar data goes here ...")
    with tarfile.open(fileobj=tarobj, mode="r") as tgz:
        for member in tgz.getmembers():
            # extractfile() returns None for directories, so read regular files only.
            if member.isfile():
                file_data[member.name] = tgz.extractfile(member).read()
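For the on-disk case mentioned above, the exact call the comment referred to did not survive in this copy of the thread. One standard-library option is `TarFile.add`, which recurses into directories; the sketch below assumes that is the kind of thing meant, and the temporary directory layout is made up for illustration.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a small source tree to archive (illustrative layout).
    src_dir = os.path.join(workdir, "example")
    os.mkdir(src_dir)
    with open(os.path.join(src_dir, "file.txt"), "wb") as fp:
        fp.write(b"This is an example file.")

    archive_path = os.path.join(workdir, "example.tar.gz")
    with tarfile.open(archive_path, mode="w:gz") as tgz:
        # add() recurses into directories; arcname sets the path stored
        # inside the archive instead of the absolute temp path.
        tgz.add(src_dir, arcname="example")

    # Read the archive back to see what was stored.
    with tarfile.open(archive_path, mode="r:gz") as tgz:
        archived_names = sorted(tgz.getnames())
```

Passing `arcname` keeps machine-specific paths out of the archive, which matters for anything meant to be uploaded.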
It's one of the primary motivators. But whereas Windows users are generally free to upgrade the Python or setuptools installations on their machines more or less at will, Linux/macOS developers have them shipped as part of their OS, which makes them hard to upgrade. On macOS, for example, upgrading setuptools currently requires disabling "rootless" (which requires a reboot and makes your computer less secure), forcibly upgrading it, and then rebooting again to re-enable rootless. On Linux, the system Python (and setuptools) is integrated with the entire OS, so upgrading them risks breaking the OS. It's also about the number of people who have to change here: the more people who experience a change, the greater the chance that the change will break something (e.g. https://xkcd.com/1172/). I believe that changing to Is Oh, and TIL that anyone who has Python 3.4+ installed on their system has a tool to create and extract |
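The tool mentioned at the end of that comment is not named in the surviving text. One candidate that fits the description is the command-line interface of the `tarfile` module, available since Python 3.4 (`-c` to create, `-e` to extract). The sketch below assumes that is what was meant and drives it through `subprocess` so it runs on any platform; the file names are illustrative.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Illustrative source tree.
    os.mkdir(os.path.join(workdir, "example"))
    with open(os.path.join(workdir, "example", "file.txt"), "w") as fp:
        fp.write("This is an example file.")

    # Equivalent to running: python -m tarfile -c example.tar.gz example
    # The .gz suffix makes the CLI choose gzip compression.
    subprocess.run(
        [sys.executable, "-m", "tarfile", "-c", "example.tar.gz", "example"],
        cwd=workdir,
        check=True,
    )

    # Equivalent to running: python -m tarfile -e example.tar.gz out
    out_dir = os.path.join(workdir, "out")
    os.mkdir(out_dir)
    subprocess.run(
        [sys.executable, "-m", "tarfile", "-e", "example.tar.gz", out_dir],
        cwd=workdir,
        check=True,
    )

    with open(os.path.join(out_dir, "example", "file.txt")) as fp:
        extracted_text = fp.read()
```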
Thanks for that. It worked like a charm. I don't know what I was doing wrong that I couldn't come up with something like that.
If zip is the nicer (optimal) format, I'd prefer we accept the greater challenge of moving to it, rather than, in three years, explaining why we've managed to move everyone to an inferior format. @ncoghlan, @qwcode - any opinions on which format is optimal for sdists? Irrespective of the format chosen, and thinking about the implementation, since this format selection actually exists in distutils, I'd rather see the sanctioned format committed to the stdlib, at which point it's straightforward to add forward compatibility in setuptools. I think I'd go as far as essentially disabling platform-specific formats by having initialize_options set the default for formats to |
The thing is, I don't really think it is the optimal format; I think it greatly depends on what things you're optimizing for. For instance, a .zip compresses each file individually (possibly each file differently!) which makes it great for random access (something important for things like Python's zip-import) but this means that it compresses to a larger size than something like IOW, my "maybe" was really saying, it greatly depends on what things you're trying to optimize for and for our specific use case, it's really easy to arbitrarily pick one side or the other and argue for it on its technical merits because on the technical side, they're basically equal for sdists. |
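The size trade-off described above can be checked with a small experiment. This is only a sketch under made-up inputs: `files`, `zip_size`, and `targz_size` are hypothetical names, and the data is 200 small, similar files, which is the case where per-file compression costs the most. Real sdists will vary.

```python
import io
import tarfile
import zipfile

# 200 small, similar source files (synthetic data for illustration only).
files = {
    f"pkg/module_{i:03d}.py": f"# module {i}\nVALUE = {i}\n".encode() * 4
    for i in range(200)
}


def zip_size(files: dict) -> int:
    """Size of a zip archive where each member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def targz_size(files: dict) -> int:
    """Size of a tar archive compressed as one gzip stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tgz:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tgz.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


print("zip:   ", zip_size(files), "bytes")
print("tar.gz:", targz_size(files), "bytes")
```

With a handful of large, dissimilar files the gap narrows, which is consistent with the point that the two formats are close to equal for typical sdists.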
|
Changing the default sdist output format in distutils for 3.6 would definitely be possible (albeit needing to be done before the first beta next month), as the standard library has For sdist, I think I think zip is a better fit for our built formats though, so it makes sense to continue to require that for both wheels and eggs (which is already explicitly the case for wheels, and implicitly the case for eggs). |
|
I've created http://bugs.python.org/issue27819 and assigned it to myself. Once that's in place, I'll add forward compatibility into Setuptools. |
|
Thank you! |