This project is archived and is in readonly mode.

#78 ✓invalid
Psycopg website

Memory issue

Reported by Psycopg website | November 22nd, 2011 @ 02:35 PM

Submitted by: Morten Goodwin

We are experiencing some memory issues behavior with psycopg2 which have are not find a rational explanation for. Our application, which uses psycopg2, generating huge amounts of leaked data. We have tried examining the leaked data with various python memory tools.

We have tried with the following versions of psycopg:
2.4.1 (dt dec pq3 ext),
2.0.7 (dec mx dt ext pq3)

With the following versions of python:
2.5.2, 2.7.2, (and 3.1.4).

postgreSQL versions:
7.4.23, 8.4.7, 9.1.1

It is run on Linux #1 SMP Wed Apr 29 08:06:31 UTC 2009 x86_64 GNU/Linux

(Note: Not all combinations of python, psycopg and postgres were tested).

The following code, reproduces the error:
import gc
from psycopg2 import connect
print 'Reproducing the memory error'
acon = connect(...) #db, password disclosed.
areadcur = acon.cursor()
areadcur.execute("select pagecontent from searchdb.webpage where siteid=3;",locals()) #pagecontent is a bytea.
data = areadcur.fetchall()
del data
areadcur.close()
acon.close()
del areadcur
del acon
print gc.collect()
del gc.garbage[:]
del connect

With our data, this code generates about 410 MB in leaked data. As far as we can see, there should be close to no data which takes up memory at the end of this script: cursors and connections are closed, an everything related is deleted.

Note that the del gc-statements are not needed to reproduce the error. However, it should ensure that the memory is not held up elsewhere.

We have also tried with:
- fetchone, instead of fetchmany. - server side cursors, - using itersize - running multiple queries (assuming that some buffer is reset).

We are not able to find a solution for this. To us it looks like a bug in psycopg2.
If this is not a bug, but something we have missed or are doing wrong, we would appreciate the feedback.

This is part of a bigger project with a python script which runs constantly, with several threads and connections. Over time this script generates up to 8GB of memory, which is close to the maximum available physical memory. We now have a script that resets the python script when it gets close to the maximum value. Naturally, we would like a better solution for this.

Comments and changes to this ticket

  • Daniele Varrazzo

    Daniele Varrazzo November 22nd, 2011 @ 03:59 PM

    I've tried to reproduce the issue, but without success.

    My setup is: Ubuntu 11.10 64 bit, Python 2.7.2, Postgres 9.0.5, testing psycopg2 2.4.1

    I've created a test dataset with 100 records with 10 MB of bytea each, with:

    create table test (siteid integer, pagecontent bytea);
    insert into test select 3, repeat('1234567890', 1024 * 1024)::bytea from generate_series (1,100);
    

    I've modified your test script to run in a loop and report the vm size of the process:

    # bug78.py
    import gc
    print 'Reproducing the memory error'
    for i in range(100):
        from psycopg2 import connect
        acon = connect('')
        areadcur = acon.cursor()
        areadcur.execute("select pagecontent from test where siteid=3;",locals()) #pagecontent is a bytea.
        data = areadcur.fetchall()
    
        print "loaded", i
        for l in open('/proc/self/status'):
            if l.startswith('VmSize'):
                print l.rstrip()
    
        del data
        areadcur.close()
        acon.close()
        del areadcur
        del acon
        gc.collect()
        del gc.garbage[:]
        del connect
    
        print "cleared", i
        for l in open('/proc/self/status'):
            if l.startswith('VmSize'):
                print l.rstrip()
    

    The result of the run is:

    $ python bug78.py 
    Reproducing the memory error
    loaded 0
    VmSize:  3217672 kB
    cleared 0
    VmSize:    79332 kB
    loaded 1
    VmSize:  3216980 kB
    cleared 1
    VmSize:    79404 kB
    loaded 2
    VmSize:  3216980 kB
    cleared 2
    VmSize:    79404 kB
    loaded 3
    VmSize:  3216980 kB
    cleared 3
    ...
    

    after which the output doesn't change: before and after each cleanup the memory taken by the process is the same.

    Running the script step by step, the memory is cleaned up on the cursor delete. It can be argued that the memory could be freed when the cursor is closed instead, but this doesn't explain your leak.

    What happens if you run the script modified as above?

    The usual suspect is that you have an objects loop in your program. In order to detect refcount bugs in psycopg, I've written a script running the test suite a number of time and checking the difference in the number of objects alive. The script is available at https://github.com/dvarrazzo/psycopg/blob/devel/scripts/refcounter.py. You can try and modify it by running some code of yours and see if it shows any type of object whose number of instances fails to be reset.

  • Morten Goodwin

    Morten Goodwin November 22nd, 2011 @ 05:14 PM

    Thanks for following up.

    The output is as following.

    Reproducing the memory error
    loaded 0
    VmSize: 2239800 kB
    cleared 0
    VmSize: 158228 kB
    loaded 1
    VmSize: 2239112 kB
    cleared 1
    VmSize: 158300 kB
    loaded 2
    VmSize: 2239112 kB
    cleared 2
    VmSize: 158300 kB
    loaded 3
    VmSize: 2239112 kB
    cleared 3
    VmSize: 158300 kB
    ....

    This, as you say, looks fine. I don't understand why we have 158MB of used memory, while you have 79MB. However, that is not important.

    I was surprised by this result. It was not what I experiences earlier. Therefore I ran the same script towards my real data. Then I get the following:

    loaded 0
    VmSize: 403344 kB
    cleared 0
    VmSize: 396944 kB
    loaded 1
    VmSize: 401800 kB
    cleared 1
    VmSize: 397824 kB
    loaded 2
    VmSize: 405384 kB
    cleared 2
    VmSize: 405384 kB
    loaded 3
    VmSize: 405384 kB
    cleared 3
    VmSize: 405384 kB
    loaded 4
    VmSize: 405384 kB
    cleared 4
    VmSize: 397420 kB
    loaded 5
    VmSize: 403332 kB
    cleared 5
    VmSize: 403332 kB

    I don't understand what is the difference.

    Why is there 250MB difference in memory usage at 'cleared' between the test data and real data? There should, as far as I can see, be no data at this point.
    Further, why is the script able to clear up >2GB in one case, not 400MB in the other.
    At least it is not increasing over time

    I am not so concerned whether the memory is freed up at closing the cursor, as long as it happens :) I agree it is not related to out problems.

    Thanks for referencing to the memory-allocation script. I actually already used it for our memory. I was unfortunately not successful in locating the problem. From this I concluded that our memory leak was not related to refcount to python objects. I could of course be mistaken.

    Regards.
    Morten

  • Daniele Varrazzo

    Daniele Varrazzo November 22nd, 2011 @ 05:35 PM

    • State changed from “new” to “invalid”

    Your alloc/dealloc pattern is very strange: why on the earth cleared 3 doesn't clear anything?

    Could it be that you have some resident tool messing up with the objects life cycle? Something such as a remote debugger... For your reference, this is what I get in a clean python process:

    $ python -c "import sys; print '\n'.join(['%s: %s' % (k, v) for k, v in sorted(sys.modules.items())])"
    UserDict: <module 'UserDict' from '/usr/lib/python2.7/UserDict.pyc'>
    __builtin__: <module '__builtin__' (built-in)>
    __main__: <module '__main__' (built-in)>
    _abcoll: <module '_abcoll' from '/usr/lib/python2.7/_abcoll.pyc'>
    _codecs: <module '_codecs' (built-in)>
    _sre: <module '_sre' (built-in)>
    _warnings: <module '_warnings' (built-in)>
    _weakref: <module '_weakref' (built-in)>
    _weakrefset: <module '_weakrefset' from '/usr/lib/python2.7/_weakrefset.pyc'>
    abc: <module 'abc' from '/usr/lib/python2.7/abc.pyc'>
    apport_python_hook: <module 'apport_python_hook' from '/usr/lib/python2.7/dist-packages/apport_python_hook.pyc'>
    codecs: <module 'codecs' from '/usr/lib/python2.7/codecs.pyc'>
    copy_reg: <module 'copy_reg' from '/usr/lib/python2.7/copy_reg.pyc'>
    encodings: <module 'encodings' from '/usr/lib/python2.7/encodings/__init__.pyc'>
    encodings.__builtin__: None
    encodings.aliases: <module 'encodings.aliases' from '/usr/lib/python2.7/encodings/aliases.pyc'>
    encodings.codecs: None
    encodings.encodings: None
    encodings.utf_8: <module 'encodings.utf_8' from '/usr/lib/python2.7/encodings/utf_8.pyc'>
    errno: <module 'errno' (built-in)>
    exceptions: <module 'exceptions' (built-in)>
    genericpath: <module 'genericpath' from '/usr/lib/python2.7/genericpath.pyc'>
    linecache: <module 'linecache' from '/usr/lib/python2.7/linecache.pyc'>
    os: <module 'os' from '/usr/lib/python2.7/os.pyc'>
    os.path: <module 'posixpath' from '/usr/lib/python2.7/posixpath.pyc'>
    posix: <module 'posix' (built-in)>
    posixpath: <module 'posixpath' from '/usr/lib/python2.7/posixpath.pyc'>
    re: <module 're' from '/usr/lib/python2.7/re.pyc'>
    signal: <module 'signal' (built-in)>
    site: <module 'site' from '/usr/lib/python2.7/site.pyc'>
    sitecustomize: <module 'sitecustomize' from '/usr/lib/python2.7/sitecustomize.pyc'>
    sre_compile: <module 'sre_compile' from '/usr/lib/python2.7/sre_compile.pyc'>
    sre_constants: <module 'sre_constants' from '/usr/lib/python2.7/sre_constants.pyc'>
    sre_parse: <module 'sre_parse' from '/usr/lib/python2.7/sre_parse.pyc'>
    stat: <module 'stat' from '/usr/lib/python2.7/stat.pyc'>
    sys: <module 'sys' (built-in)>
    sysconfig: <module 'sysconfig' from '/usr/lib/python2.7/sysconfig.pyc'>
    traceback: <module 'traceback' from '/usr/lib/python2.7/traceback.pyc'>
    types: <module 'types' from '/usr/lib/python2.7/types.pyc'>
    warnings: <module 'warnings' from '/usr/lib/python2.7/warnings.pyc'>
    zipimport: <module 'zipimport' (built-in)>
    

    Closing unless further evidence it is a psycopg bug is provided.

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile ยป

<b>WARNING:</b> the informations in this tracker are archived. Please submit new tickets or comments to <a href="https://github.com/psycopg/psycopg2/issues">the new tracker</a>.
<br/>
Psycopg is the most used PostgreSQL adapter for the Python programming language. At the core it fully implements the Python DB API 2.0 specifications. Several extensions allow access to many of the features offered by PostgreSQL.

Shared Ticket Bins

Tags

Pages