This project is archived and is in readonly mode.
Deadlock in 2.4.0.
Reported by ronin-150602 (at lighthouseapp) | May 25th, 2011 @ 09:25 AM
rel-2.4.0
os-linux
In the particular setup implemented in the attached file, psycopg ends up deadlocking.
The setup: all the threads share a single connection. Cursors are not shared across threads. Eventually, psycopg ends up in a deadlock. It may only happen after several hundred thousands SELECTs, though.
It does seem that at least three threads are required to trigger the bug. With intermittent concurrency problems it's never easy to tell for sure.
This is with psycopg 2.4.0 on Linux x86 using Python 2.5.4.
Thank you for your help.
Comments and changes to this ticket
-
ronin-150602 (at lighthouseapp) May 25th, 2011 @ 09:33 AM
Side note: sometimes psycopg seems to raise a SystemError exception instead of deadlocking, hinting at other underlying concurrency issues:
Exception in thread Thread-2: Traceback (most recent call last): File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner self.run() File "pg_thread.py", line 20, in run CONNECTION.cursor().execute("SELECT 1") SystemError: null argument to internal routine
-
Daniele Varrazzo May 27th, 2011 @ 09:22 PM
Thank you for the report and for the test case: will definitely investigate!
-
Daniele Varrazzo May 27th, 2011 @ 11:35 PM
- State changed from new to open
I can reproduce the deadlock, even if not consistently, with the current devel too. Sometimes instead of deadlocks I have had loss of sync with the server and also a segfault - unfortunately I don't have a core.
Will try to look more into the issue, thanks.
-
Daniele Varrazzo May 28th, 2011 @ 08:54 AM
Stack trace of the three threads in a deadlock:
Thread 2:
[#0](/projects/62710/tickets/0 "Ticket #0") 0x00110416 in __kernel_vsyscall () [#1](/projects/62710/tickets/1 "Ticket #1") 0x00191169 in __lll_lock_wait () from /lib/libpthread.so.0 [#2](/projects/62710/tickets/2 "Ticket #2") 0x0018c5cb in _L_lock_748 () from /lib/libpthread.so.0 [#3](/projects/62710/tickets/3 "Ticket #3") 0x0018c3f1 in pthread_mutex_lock () from /lib/libpthread.so.0 [#4](/projects/62710/tickets/4 "Ticket #4") 0x00eec9d0 in pq_execute (curs=0x9257034, query=0x918569c "SELECT 1", async=0) at psycopg/pqpath.c:764 [#5](/projects/62710/tickets/5 "Ticket #5") 0x00ef568b in _psyco_curs_execute (self=0x9257034, operation=0x0, vars= 0x0, async=0) at psycopg/cursor_type.c:419 [#6](/projects/62710/tickets/6 "Ticket #6") 0x00ef59db in psyco_curs_execute (self=0x9257034, args=('SELECT 1',), kwargs=0x0) at psycopg/cursor_type.c:475 [#7](/projects/62710/tickets/7 "Ticket #7") 0x081b4a03 in PyCFunction_Call (func= <built-in method execute of psycopg2._psycopg.cursor object at remote 0x9257034>, arg=('SELECT 1',), kw=0x0) at ../Objects/methodobject.c:85 [#8](/projects/62710/tickets/8 "Ticket #8") 0x080fb2ce in call_function (pp_stack=0xb5e80928, oparg=1) at ../Python/ceval.c:3750
Thread 3:
[#0](/projects/62710/tickets/0 "Ticket #0") 0x00110416 in __kernel_vsyscall () [#1](/projects/62710/tickets/1 "Ticket #1") 0x00c65df6 in poll () from /lib/libc.so.6 [#2](/projects/62710/tickets/2 "Ticket #2") 0x007dc556 in ?? () from /usr/lib/libpq.so.5 [#3](/projects/62710/tickets/3 "Ticket #3") 0x007dc661 in ?? () from /usr/lib/libpq.so.5 [#4](/projects/62710/tickets/4 "Ticket #4") 0x007dc6e3 in ?? () from /usr/lib/libpq.so.5 [#5](/projects/62710/tickets/5 "Ticket #5") 0x007db7b2 in PQgetResult () from /usr/lib/libpq.so.5 [#6](/projects/62710/tickets/6 "Ticket #6") 0x007dbaa8 in ?? () from /usr/lib/libpq.so.5 [#7](/projects/62710/tickets/7 "Ticket #7") 0x00eecaf4 in pq_execute (curs=0xb7844db4, query=0x918569c "SELECT 1", async=0) at psycopg/pqpath.c:778 [#8](/projects/62710/tickets/8 "Ticket #8") 0x00ef568b in _psyco_curs_execute (self=0xb7844db4, operation=0x0, vars= 0x0, async=0) at psycopg/cursor_type.c:419 [#9](/projects/62710/tickets/9 "Ticket #9") 0x00ef59db in psyco_curs_execute (self=0xb7844db4, args=('SELECT 1',), kwargs=0x0) at psycopg/cursor_type.c:475 [#10](/projects/62710/tickets/10 "Ticket #10") 0x081b4a03 in PyCFunction_Call (func= <built-in method execute of psycopg2._psycopg.cursor object at remote 0xb7844db4>, arg=('SELECT 1',), kw=0x0) at ../Objects/methodobject.c:85
Thread 4:
[#0](/projects/62710/tickets/0 "Ticket #0") 0x00110416 in __kernel_vsyscall () [#1](/projects/62710/tickets/1 "Ticket #1") 0x00191169 in __lll_lock_wait () from /lib/libpthread.so.0 [#2](/projects/62710/tickets/2 "Ticket #2") 0x0018c5cb in _L_lock_748 () from /lib/libpthread.so.0 [#3](/projects/62710/tickets/3 "Ticket #3") 0x0018c3f1 in pthread_mutex_lock () from /lib/libpthread.so.0 [#4](/projects/62710/tickets/4 "Ticket #4") 0x00eec9d0 in pq_execute (curs=0xb7844d24, query=0x918569c "SELECT 1", async=0) at psycopg/pqpath.c:764 [#5](/projects/62710/tickets/5 "Ticket #5") 0x00ef568b in _psyco_curs_execute (self=0xb7844d24, operation=0x0, vars= 0x0, async=0) at psycopg/cursor_type.c:419 [#6](/projects/62710/tickets/6 "Ticket #6") 0x00ef59db in psyco_curs_execute (self=0xb7844d24, args=('SELECT 1',), kwargs=0x0) at psycopg/cursor_type.c:475 [#7](/projects/62710/tickets/7 "Ticket #7") 0x081b4a03 in PyCFunction_Call (func= <built-in method execute of psycopg2._psycopg.cursor object at remote 0xb7844d24>, arg=('SELECT 1',), kw=0x0) at ../Objects/methodobject.c:85 [#8](/projects/62710/tickets/8 "Ticket #8") 0x080fb2ce in call_function (pp_stack=0xb7625928, oparg=1) at ../Python/ceval.c:3750
-
Daniele Varrazzo May 29th, 2011 @ 09:30 AM
The above traceback doesn't seem a deadlock: thread 3 is stuck in a poll.
-
ronin-150602 (at lighthouseapp) May 29th, 2011 @ 09:54 AM
Hi Daniele,
Thank you for your help. :)
If it helps, we have noticed that at the time the poll() call occurs, the PostgreSQL server thinks that all its client connections are closed. Meaning the poll() call may be waiting on a socket that's not connected anymore, although I have no idea how that is even possible.
-
Daniele Varrazzo May 29th, 2011 @ 10:23 PM
In my test the connection to the server was still open: with a pg_terminate_backend() the client is released (with an exception of course). So it looks like another manifestation of loss of sync with the protocol, but I guess this can only be verified debugging with wireshark. Doesn't sound easy :-/
-
ronin-150602 (at lighthouseapp) May 30th, 2011 @ 07:47 AM
Aw, darn.
Given the other symptoms (traceback, etc) that sometimes occur instead of the deadlock, I was hoping for a 'simple' concurrency issue, perhaps a missing lock somewhere, with the inconsistent connection state being merely a side effect. That would presumably be easier to fix (for an admittedly loose definition thereof) than a loss of sync at the protocol level...
If there is something I can do to help, please let me know.
-
Daniele Varrazzo June 4th, 2011 @ 12:58 AM
- Tag set to multithread
- State changed from open to resolved
The problem should be fixed.
We were processing notifies after releasing the connection lock, together with the result process. But, while the result is in a read-only structure, processing notifies implies parsing the connection, possibly disrupting the communication for other threads.
Attached a patch that has been useful to follow the debug trace in many threads. Not committing it as not portable.
-
ronin-150602 (at lighthouseapp) June 6th, 2011 @ 09:18 AM
Wonderful! Thank you, Daniele. Is there a branch I can pull to test the fix here?
-
ronin-150602 (at lighthouseapp) June 6th, 2011 @ 09:36 AM
Wait, I found it, sorry for the noise.
I've compiled devel from git and run my test script on it, and can confirm that the bug no longer occurs here. :)
-
Garrett Cooper January 21st, 2014 @ 11:10 PM
I wasn't able to reproduce this issue with python 2.7.3 / psycopg2 on Debian Lenny 2.6 at least. Was this resolved on/after v2.4.5?
Here's my test program (it's a more complicated version based on what was provided above).
$ python -c 'import psycopg2 as pg; print(pg.__version__)' 2.4.5 (dt dec pq3 ext) $ cat test_issue_55.py #!/usr/bin/env python #from multiprocessing import Process from multiprocessing import Queue, TimeoutError try: from queue import Empty except ImportError: from Queue import Empty import sys from threading import Thread import psycopg2 CONNECTION = None COUNT = 1000000 def done(i): return "%d: here's a bone!" % (i, ) def pat(i): return '%d: pat pat' % (i, ) def db_select_loop(watchdog_queue, i): for j in xrange(COUNT): CONNECTION.cursor().execute("SELECT 1") watchdog_queue.put(pat(i)) watchdog_queue.put(done(i)) def watchdog_loop(watchdog_queue, i, dproc): epat_msgs = { 'done': done(i), 'pat': pat(i), } j = 0 while True: try: pat_msg = watchdog_queue.get(timeout=60) if pat_msg not in epat_msgs.values(): raise AssertionError('The pat message did not match one of ' 'the expected messages: `%s` != `%s`' % (pat_msg, ','.join(['"%s"' % (m, ) for m in epat_msgs]), )) if pat_msg == epat_msgs['done']: sys.stderr.write('DB worker %d is done\n' % (i, )) return elif j == 1000: sys.stderr.write('DB worker %d gave me a few pats on the head ' ';)..\n' % (i, )) j = 0 else: j += 1 except (AssertionError, Empty, TimeoutError) as e: sys.stderr.write('Watchdog (%s) timed out waiting for pat: %s\n' % (dproc.name, str(e), )) while dproc.is_alive(): if hasattr(dproc, 'terminate'): dproc.terminate() else: dproc.join() break def main(): global CONNECTION conn_dict = dict( host='XXXXXX', port='5432', user='XXXXXX', password='XXXXXX', database='XXXXXX', ) CONNECTION = psycopg2.connect(**conn_dict) # Won't work with multiprocessing.Process by design ;).. #cls = Process cls = Thread procs = [] for i in xrange(150): queue = Queue() dproc = cls(target=db_select_loop, args=(queue, i, )) dproc.start() procs.append(dproc) wproc = cls(target=watchdog_loop, args=(queue, i, dproc, )) wproc.start() procs.append(wproc) while procs: for proc in procs: if proc.is_alive(): proc.join(10) if not proc.is_alive(): procs.remove(proc) if __name__ == '__main__': main()
-
Garrett Cooper January 22nd, 2014 @ 02:37 AM
Yeah... 2.2.2 is definitely bugged in an obvious way (severe database contention, poor performance). Thanks for the bug report PVaret and the fix Daniele!
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile ยป
<b>WARNING:</b> the informations in this tracker are archived. Please submit new tickets or comments to <a href="https://github.com/psycopg/psycopg2/issues">the new tracker</a>.
<br/>
Psycopg is the most used PostgreSQL adapter for the Python programming language. At the core it fully implements the Python DB API 2.0 specifications. Several extensions allow access to many of the features offered by PostgreSQL.