Skip to content

bpo-38091: Import deadlock detection causes deadlock#17518

Merged
miss-islington merged 7 commits into
python:masterfrom
arigo:master
Mar 3, 2020
Merged

bpo-38091: Import deadlock detection causes deadlock#17518
miss-islington merged 7 commits into
python:masterfrom
arigo:master

Conversation

@arigo

@arigo arigo commented Dec 9, 2019

Copy link
Copy Markdown
Contributor

@pitrou

pitrou commented Dec 21, 2019

Copy link
Copy Markdown
Member

The fix here is more of a hack. A proper fix could be something like the below. At least it fixes the reproducer posted by @rlamy . Is there a real-world reproducer you can test it on?

diff --git a/Lib/importlib/_bootstrap.py b/Lib/importlib/_bootstrap.py
index 8de0e9ee79..7b2bfefdec 100644
--- a/Lib/importlib/_bootstrap.py
+++ b/Lib/importlib/_bootstrap.py
@@ -71,9 +71,20 @@ class _ModuleLock:
             lock = _blocking_on.get(tid)
             if lock is None:
                 return False
-            tid = lock.owner
-            if tid == me:
+            owner_tid = lock.owner
+            if owner_tid == tid:
+                # Lock owner is temporarily blocking on a lock it
+                # already owns.  This means it's not actually blocking.
+                # (this can happen at the beginning of acquire() below
+                #  -- see bpo 38091)
+                return False
+            if owner_tid == me:
+                # Lock owner is blocking on a lock we own, but we're blocking
+                # on the lock it owns => deadlock.
                 return True
+            # Move along the chain and find out whether the owner is blocking
+            # on another lock.
+            tid = owner_tid
 
     def acquire(self):
         """

@arigo

arigo commented Dec 22, 2019

Copy link
Copy Markdown
Contributor Author

Maybe a safer way to think about it would be to ask ourselves when has_deadlock() is really supposed to return True. I think it means to return True if the chain _blocking_on[_blocking_on[...[tid]...]] eventually reaches the value me, and False in all other cases. You are fixing one such case by detecting when the chain enters a fixpoint, i.e. a loop of length 1. I'm not convinced that longer loops are impossible; maybe we should also detect them and return False in these cases. Something like that (untested code):

     def has_deadlock(self):
         # Deadlock avoidance for concurrent circular imports.
         me = _thread.get_ident()
         tid = self.owner
+        seen = set()
         while True:
             lock = _blocking_on.get(tid)
             if lock is None:
                 return False
             tid = lock.owner
             if tid == me:
                 return True
+            if tid in seen:
+                # bpo 38091: the chain of tid's we encounter here
+                # eventually leads to a fixpoint or a cycle, but
+                # does not reach 'me'.  This means we would not
+                # actually deadlock.  This can happen if other
+                # threads are at the beginning of acquire() below.
+                return False    
+            seen.add(tid)

@pitrou

pitrou commented Dec 22, 2019

Copy link
Copy Markdown
Member

@arigo Yes, that fix would probably work. Perhaps @rlamy can check :-)

@rlamy

rlamy commented Dec 23, 2019

Copy link
Copy Markdown
Contributor

I think that this should fix the issue. It caused some intermittent test failures in PyPy, so I guess we should just merge the change there and monitor the CI results.

@pitrou

pitrou commented Jan 1, 2020

Copy link
Copy Markdown
Member

@rlamy Did you test this fix on PyPy?

@brettcannon brettcannon left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking as rejected until we here back from PyPy that the proposed fix in the PR discussion worked for them.

@bedevere-bot

Copy link
Copy Markdown

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@arigo

arigo commented Jan 23, 2020

Copy link
Copy Markdown
Contributor Author

I have made the requested changes; please review again

@bedevere-bot

Copy link
Copy Markdown

Thanks for making the requested changes!

@brettcannon: please review the changes made to this pull request.

@arigo

arigo commented Jan 23, 2020

Copy link
Copy Markdown
Contributor Author

I have updated the PR to include the test from https://bugs.python.org/issue38091 , which fails (deadlocks) without the fix and passes with the fix.

@arigo arigo reopened this Jan 23, 2020

@brettcannon brettcannon left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importlib.h wasn't regenerated and included in this PR. I believe that can be done with make regen-all (or something like that).

@arigo

arigo commented Feb 14, 2020

Copy link
Copy Markdown
Contributor Author

I have made the requested changes; please review again

@bedevere-bot

Copy link
Copy Markdown

Thanks for making the requested changes!

@brettcannon: please review the changes made to this pull request.

@codecov

codecov Bot commented Feb 15, 2020

Copy link
Copy Markdown

Codecov Report

Merging #17518 into master will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #17518     +/-   ##
=========================================
  Coverage   82.11%   82.12%             
=========================================
  Files        1955     1954      -1     
  Lines      588906   584027   -4879     
  Branches    44429    44458     +29     
=========================================
- Hits       483572   479607   -3965     
+ Misses      95679    94777    -902     
+ Partials     9655     9643     -12     
Impacted Files Coverage Δ
Lib/distutils/tests/test_bdist_rpm.py 30.00% <0.00%> (-65.00%) ⬇️
Lib/distutils/command/bdist_rpm.py 7.63% <0.00%> (-56.88%) ⬇️
Lib/test/test_urllib2net.py 76.92% <0.00%> (-13.85%) ⬇️
Lib/test/test_smtpnet.py 78.57% <0.00%> (-7.15%) ⬇️
Lib/ftplib.py 63.85% <0.00%> (-6.06%) ⬇️
Lib/test/test_ftplib.py 87.11% <0.00%> (-4.72%) ⬇️
Tools/scripts/db2pickle.py 17.82% <0.00%> (-3.97%) ⬇️
Tools/scripts/pickle2db.py 16.98% <0.00%> (-3.78%) ⬇️
Lib/test/test_socket.py 71.94% <0.00%> (-3.77%) ⬇️
Lib/distutils/tests/test_bdist_msi.py 56.25% <0.00%> (-3.75%) ⬇️
... and 359 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bec4186...1b2dbe4. Read the comment docs.

Comment thread Lib/test/test_import/__init__.py Outdated
Comment thread Misc/NEWS.d/next/Core and Builtins/2020-02-14-23-10-07.bpo-38091.pwR0K7.rst Outdated
@miss-islington

Copy link
Copy Markdown
Contributor

@arigo: Status check is done, and it's a failure ❌ .

@miss-islington

Copy link
Copy Markdown
Contributor

@arigo: Status check is done, and it's a success ✅ .

@miss-islington miss-islington merged commit 6daa37f into python:master Mar 3, 2020
@miss-islington

Copy link
Copy Markdown
Contributor

Thanks @arigo for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒⛏🤖

@miss-islington

Copy link
Copy Markdown
Contributor

Sorry, @arigo, I could not cleanly backport this to 3.8 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 6daa37fd42c5d5300172728e8b4de74fe0b319fc 3.8

@miss-islington miss-islington self-assigned this Mar 3, 2020
@miss-islington

Copy link
Copy Markdown
Contributor

Sorry @arigo, I had trouble checking out the 3.7 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 6daa37fd42c5d5300172728e8b4de74fe0b319fc 3.7

@brettcannon

Copy link
Copy Markdown
Member

@arigo any interest in manually backporting this to 3.8 and 3.7? (Chances are it's importlib.h.)

@arigo

arigo commented Mar 5, 2020

Copy link
Copy Markdown
Contributor Author

My git-fu is very limited (in general and on github). Do you expect two other pull requests for the 3.7 and 3.8 branches?

@brettcannon

Copy link
Copy Markdown
Member

@arigo yep, but don't worry about it too much. I can try to get around to it eventually myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants