build_zipmanifest results should be memoized #154

@ghost

Description

Originally reported by: wickman (Bitbucket: wickman, GitHub: wickman)


As far as I can tell, the results of build_zipmanifest are not cached.

From a recent profile:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.942    4.942 /Users/wickman/clients/science/dist/aurora_client.pex/.bootstrap/_twitter_common_python/pex.py:124(_execute_internal)
        1    0.000    0.000    3.830    3.830 /Users/wickman/clients/science/dist/aurora_client.pex/.bootstrap/_twitter_common_python/environment.py:114(activate)
        1    0.001    0.001    3.823    3.823 /Users/wickman/clients/science/dist/aurora_client.pex/.bootstrap/_twitter_common_python/environment.py:121(_activate)
        1    0.000    0.000    3.737    3.737 /Users/wickman/clients/science/dist/aurora_client.pex/.bootstrap/_twitter_common_python/environment.py:105(update_candidate_distributions)
       56    0.003    0.000    3.731    0.067 /Users/wickman/clients/science/dist/aurora_client.pex/.bootstrap/_twitter_common_python/environment.py:84(load_internal_cache)
        1    0.032    0.032    3.728    3.728 /Users/wickman/clients/science/dist/aurora_client.pex/.bootstrap/_twitter_common_python/environment.py:63(write_zipped_internal_cache)
       55    0.002    0.000    2.947    0.054 /Users/wickman/clients/science/dist/aurora_client.pex/.bootstrap/_twitter_common_python/util.py:46(distribution_from_path)
       57    0.005    0.000    2.945    0.052 /Users/wickman/clients/science/dist/aurora_client.pex/.bootstrap/pkg_resources.py:1703(__init__)
       57    0.183    0.003    2.938    0.052 /Users/wickman/clients/science/dist/aurora_client.pex/.bootstrap/pkg_resources.py:1452(build_zipmanifest)

This is the profile for starting up a PEX file (a zipped Python environment, see https://mail.python.org/pipermail/distutils-sig/2014-January/023727.html ) with a number of exploded eggs inside. build_zipmanifest is called with the same archive every time we construct a Distribution via EggMetadata:

    def __init__(self, module):
        EggProvider.__init__(self,module)
        self.zipinfo = build_zipmanifest(self.loader.archive)
        self.zip_pre = self.loader.archive+os.sep

It's not an unreasonable assumption that each new Distribution is either on disk or inside its own zip archive, in which case these calls would not be duplicates.

In our case, all eggs live in a single zip. Running this Python application, which has 57 egg dependencies, therefore costs 57 calls of about 50ms each (roughly 2.9s cumulative in the profile above) instead of a single 50ms call.

The proposal is to cache the results of build_zipmanifest per archive path, perhaps invalidating an entry when the archive's os.stat mtime changes.
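To make the proposal concrete, here is a minimal sketch of what such a cache could look like. It is not the pkg_resources implementation: the build_zipmanifest below is a simplified stand-in written for this example, and cached_zipmanifest and _manifest_cache are hypothetical names. The cache is keyed on the archive path and rebuilt only when the mtime reported by os.stat differs from the one recorded.

```python
import os
import zipfile


def build_zipmanifest(path):
    """Simplified stand-in for pkg_resources.build_zipmanifest (assumption).

    Maps each member name, with os.sep as the separator, to its ZipInfo.
    """
    with zipfile.ZipFile(path) as zf:
        return {
            name.replace('/', os.sep): zf.getinfo(name)
            for name in zf.namelist()
        }


# Hypothetical module-level cache: normalized path -> (mtime, manifest).
_manifest_cache = {}


def cached_zipmanifest(path):
    """Return the manifest for path, rebuilding it only if the mtime changed."""
    path = os.path.normpath(path)
    mtime = os.stat(path).st_mtime
    entry = _manifest_cache.get(path)
    if entry is None or entry[0] != mtime:
        # First request for this archive, or the archive was modified.
        entry = (mtime, build_zipmanifest(path))
        _manifest_cache[path] = entry
    return entry[1]
```

With something along these lines, EggMetadata.__init__ could call cached_zipmanifest(self.loader.archive), so the 57 constructions against one archive would read the zip directory once and pay only an os.stat per subsequent call. Comparing mtime alone can miss a rewrite that lands within the filesystem's timestamp resolution; also comparing st_size would be a cheap extra guard.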

