This project is archived and is in readonly mode.

#91 ✓invalid
Psycopg website

double free or corruption

Reported by Psycopg website | January 11th, 2012 @ 01:59 PM

Submitted by: BlendsInWell

Here's the error:
*** glibc detected *** /usr/bin/python: double free or corruption (!prev): 0x0000000001c24710 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x78a96)[0x7fe3aaa43a96]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7fe3aaa47d7c]
/usr/lib/python2.7/dist-packages/psycopg2/psycopg.so(+0xddc6)[0x7fe3a9aa2dc6]
/usr/lib/python2.7/dist-packages/psycopg2/psycopg.so(+0x13f9b)[0x7fe3a9aa8f9b]
/usr/lib/python2.7/dist-packages/psycopg2/psycopg.so(+0x146c6)[0x7fe3a9aa96c6]
/usr/bin/python(PyEval_EvalFrameEx+0x2f9)[0x4b6569]
/usr/bin/python(PyEval_EvalFrameEx+0xb07)[0x4b6d77]
======= Memory map: ========
00400000-00633000 r-xp 00000000 08:06 1901460 /usr/bin/python2.7
00832000-00833000 r--p 00232000 08:06 1901460 /usr/bin/python2.7
00833000-0089c000 rw-p 00233000 08:06 1901460 /usr/bin/python2.7
0089c000-008ae000 rw-p 00000000 00:00 0
018e5000-01e24000 rw-p 00000000 00:00 0 [heap]
7fe39c000000-7fe39c0e0000 rw-p 00000000 00:00 0
7fe39c0e0000-7fe3a0000000 ---p 00000000 00:00 0
7fe3a2807000-7fe3a281c000 r-xp 00000000 08:06 1754089 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe3a281c000-7fe3a2a1b000 ---p 00015000 08:06 1754089 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe3a2a1b000-7fe3a2a1c000 r--p 00014000 08:06 1754089 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe3a2a1c000-7fe3a2a1d000 rw-p 00015000 08:06 1754089 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe3a2a1d000-7fe3a2a1e000 ---p 00000000 00:00 0
7fe3a2a1e000-7fe3a321e000 rw-p 00000000 00:00 0
7fe3a321e000-7fe3a321f000 ---p 00000000 00:00 0
7fe3a321f000-7fe3a3a1f000 rw-p 00000000 00:00 0
7fe3a3a1f000-7fe3a3a20000 ---p 00000000 00:00 0
7fe3a3a20000-7fe3a4220000 rw-p 00000000 00:00 0
7fe3a4220000-7fe3a4221000 ---p 00000000 00:00 0
7fe3a4221000-7fe3a4a21000 rw-p 00000000 00:00 0
7fe3a4a21000-7fe3a4a22000 ---p 00000000 00:00 0
7fe3a4a22000-7fe3a5222000 rw-p 00000000 00:00 0
7fe3a5222000-7fe3a5223000 ---p 00000000 00:00 0
7fe3a5223000-7fe3a5a23000 rw-p 00000000 00:00 0
7fe3a5a23000-7fe3a5a24000 ---p 00000000 00:00 0
7fe3a5a24000-7fe3a6224000 rw-p 00000000 00:00 0
7fe3a6224000-7fe3a6225000 ---p 00000000 00:00 0
7fe3a6225000-7fe3a6a25000 rw-p 00000000 00:00 0
7fe3a6a25000-7fe3a6a31000 r-xp 00000000 08:06 1754109 /lib/x86_64-linux-gnu/libnss_files-2.13.so
7fe3a6a31000-7fe3a6c30000 ---p 0000c000 08:06 1754109 /lib/x86_64-linux-gnu/libnss_files-2.13.so
7fe3a6c30000-7fe3a6c31000 r--p 0000b000 08:06 1754109 /lib/x86_64-linux-gnu/libnss_files-2.13.so
7fe3a6c31000-7fe3a6c32000 rw-p 0000c000 08:06 1754109 /lib/x86_64-linux-gnu/libnss_files-2.13.so
7fe3a6c32000-7fe3a6c3c000 r-xp 00000000 08:06 1754113 /lib/x86_64-linux-gnu/libnss_nis-2.13.so
7fe3a6c3c000-7fe3a6e3c000 ---p 0000a000 08:06 1754113 /lib/x86_64-linux-gnu/libnss_nis-2.13.so
7fe3a6e3c000-7fe3a6e3d000 r--p 0000a000 08:06 1754113 /lib/x86_64-linux-gnu/libnss_nis-2.13.so
7fe3a6e3d000-7fe3a6e3e000 rw-p 0000b000 08:06 1754113 /lib/x86_64-linux-gnu/libnss_nis-2.13.so
7fe3a6e3e000-7fe3a6e55000 r-xp 00000000 08:06 1754103 /lib/x86_64-linux-gnu/libnsl-2.13.so
7fe3a6e55000-7fe3a7054000 ---p 00017000 08:06 1754103 /lib/x86_64-linux-gnu/libnsl-2.13.so
7fe3a7054000-7fe3a7055000 r--p 00016000 08:06 1754103 /lib/x86_64-linux-gnu/libnsl-2.13.so
7fe3a7055000-7fe3a7056000 rw-p 00017000 08:06 1754103 /lib/x86_64-linux-gnu/libnsl-2.13.so
7fe3a7056000-7fe3a7058000 rw-p 00000000 00:00 0
7fe3a7058000-7fe3a7060000 r-xp 00000000 08:06 1754105 /lib/x86_64-linux-gnu/libnss_compat-2.13.so
7fe3a7060000-7fe3a725f000 ---p 00008000 08:06 1754105 /lib/x86_64-linux-gnu/libnss_compat-2.13.so
7fe3a725f000-7fe3a7260000 r--p 00007000 08:06 1754105 /lib/x86_64-linux-gnu/libnss_compat-2.13.so
7fe3a7260000-7fe3a7261000 rw-p 00008000 08:06 1754105 /lib/x86_64-linux-gnu/libnss_compat-2.13.so
7fe3a7261000-7fe3a7280000 r-xp 00000000 08:06 1976866 /usr/lib/python2.7/lib-dynload/_ctypes.so
7fe3a7280000-7fe3a7480000 ---p 0001f000 08:06 1976866 /usr/lib/python2.7/lib-dynload/_ctypes.so
7fe3a7480000-7fe3a7481000 r--p 0001f000 08:06 1976866 /usr/lib/python2.7/lib-dynload/_ctypes.so
7fe3a7481000-7fe3a7485000 rw-p 00020000 08:06 1976866 /usr/lib/python2.7/lib-dynload/_ctypes.so
7fe3a7485000-7fe3a7486000 rw-p 00000000 00:00 0
7fe3a7486000-7fe3a748d000 r-xp 00000000 08:06 1754132 /lib/x86_64-linux-gnu/librt-2.13.so
7fe3a748d000-7fe3a768c000 ---p 00007000 08:06 1754132 /lib/x86_64-linux-gnu/librt-2.13.so
7fe3a768c000-7fe3a768d000 r--p 00006000 08:06 1754132 /lib/x86_64-linux-gnu/librt-2.13.so
7fe3a768d000-7fe3a768e000 rw-p 00007000 08:06 1754132 /lib/x86_64-linux-gnu/librt-2.13.so
7fe3a768e000-7fe3a769c000 r-xp 00000000 08:06 1982536 /usr/lib/python2.7/dist-packages/mx/DateTime/mxDateTime/mxDateTime.so
7fe3a769c000-7fe3a789c000 ---p 0000e000 08:06 1982536 /usr/lib/python2.7/dist-packages/mx/DateTime/mxDateTime/mxDateTime.so
7fe3a789c000-7fe3a789d000 r--p 0000e000 08:06 1982536 /usr/lib/python2.7/dist-packages/mx/DateTime/mxDateTime/mxDateTime.so
7fe3a789d000-7fe3a789e000 rw-p 0000f000 08:06 1982536 /usr/lib/python2.7/dist-packages/mx/DateTime/mxDateTime/mxDateTime.so
7fe3a789e000-7fe3a795f000 rw-p 00000000 00:00 0
7fe3a795f000-7fe3a7962000 r-xp 00000000 08:06 1754095 /lib/x86_64-linux-gnu/libgpg-error.so.0.8.0
7fe3a7962000-7fe3a7b61000 ---p 00003000 08:06 1754095 /lib/x86_64-linux-gnu/libgpg-error.so.0.8.0
7fe3a7b61000-7fe3a7b62000 r--p 00002000 08:06 1754095 /lib/x86_64-linux-gnu/libgpg-error.so.0.8.0
7fe3a7b62000-7fe3a7b63000 rw-p 00003000 08:06 1754095 /lib/x86_64-linux-gnu/libgpg-error.so.0.8.0
7fe3a7b63000-7fe3a7b73000 r-xp 00000000 08:06 8518 /usr/lib/x86_64-linux-gnu/libtasn1.so.3.1.11
7fe3a7b73000-7fe3a7d72000 ---p 00010000 08:06 8518 /usr/lib/x86_64-linux-gnu/libtasn1.so.3.1.11
7fe3a7d72000-7fe3a7d73000 r--p 0000f000 08:06 8518 /usr/lib/x86_64-linux-gnu/libtasn1.so.3.1.11
Aborted

Here's the code:
import threading
from socket import *
import re
import time
import select
import zlib
import sys

import protocol
import Listen

myHost = '192.168.1.64'#gethostbyname("querier.saves-the-whales.com")#gethostname()
print myHost
myPort = 2007

MAX_QUEUED_CONNEXIONS = 10

import psycopg2

""" DB Schema

CREATE TABLE google_hits(
ngram varchar NOT NULL,
seed_word varchar NOT NULL,
neighborhood int NOT NULL,
hits bigint DEFAULT -1,
in_field boolean NOT NULL DEFAULT false,
ts timestamp without time zone NOT NULL DEFAULT now(),
PRIMARY KEY (ngram, seed_word, neighborhood)
); CREATE INDEX in_field_ts ON google_hits (in_field, ts);
"""

# Example result strings these patterns are meant to match:
#   "About 69,300,000 results"
#   "Page 100 of about 983,000,000 results"
results_regex = re.compile('.*?([\d,]+) results')
other_results_regex = re.compile('Page \d+ of (?:about )?([\d,]+) results')

def set_proc_name(newname):

from ctypes import cdll, byref, create_string_buffer
libc = cdll.LoadLibrary('libc.so.6')
buff = create_string_buffer(len(newname)+1)
buff.value = newname
libc.prctl(15, byref(buff), 0, 0, 0)

def get_proc_name():

from ctypes import cdll, byref, create_string_buffer
libc = cdll.LoadLibrary('libc.so.6')
buff = create_string_buffer(128)
# 16 == PR_GET_NAME from <linux/prctl.h>
libc.prctl(16, byref(buff), 0, 0, 0)
return buff.value

def extract_hits_from_page(page):

results = results_regex.findall(page)
if len(results) == 0:
    results = other_results_regex.findall(page)
    if len(results) == 0:
        if 'did not match any documents' in page or 'No results found for' in page:
            return 0
        else:
            print 'WARNING:  RETURNING NONE FOR HITS!!!'
            return None
hits = int( ''.join(re.findall('\d',results[0])) )
return hits

def make_query_url(ngram, seed_word, neighborhood):

#ngram = ''.join(map(lambda c: '%' + hex(ord(c)).split('x')[-1], ngram))
if len(ngram)*len(seed_word) != 0:
    assert neighborhood > 0
    q = '"%s" AROUND(%s) "%s"' % (ngram, neighborhood, seed_word)
    q2 = ''.join(map(lambda c: '%' + hex(ord(c)).split('x')[-1], q))
    url = 'http://www.google.com./search?gcx=w&sourceid=chrome&client=ubuntu&channel=cs&ie=UTF-8&q=%s&filter=0&num=10&start=990' % (q2)
else:
    q = '"' + ngram + seed_word + '"'
    q2 = ''.join(map(lambda c: '%' + hex(ord(c)).split('x')[-1], q))
    url = 'http://www.google.com./search?gcx=w&sourceid=chrome&client=ubuntu&channel=cs&ie=UTF-8&q=%s&filter=0&num=10&start=990' % (q2)
url = re.sub(' {2,}', ' ', url)
url = url.replace(' ', '+')

return url, q

class LockedSet:

def __init__(self):
    self.elems = []
    self.lock = threading.Lock()

def has(self, elem):
    self.lock.acquire()
    result = elem in self.elems
    self.lock.release()
    return result

def add(self, elem):
    self.lock.acquire()
    if elem in self.elems:
        self.lock.release()
        return
    self.elems.append(elem)
    self.lock.release()

def remove(self, elem):
    self.lock.acquire()
    if elem not in self.elems:
        self.lock.release()
        return
    self.elems.remove(elem)
    self.lock.release()

class MessageLogger:

def __init__(self, fd):
    self.fd = fd
    self.lock = threading.Lock()
def log_and_display(self, message):
    t = time.ctime()
    text = t + ': ' + message
    self.lock.acquire()
    print text
    #file_msg = '\n' + text
    #self.fd.write(file_msg)
    #del t
    #del text
    #del file_msg
    self.lock.release()

class DatabaseInteractor:

def __init__(self):
    self.conn = psycopg2.connect("dbname='3query_hits_db'")
    self.db_lock = threading.Lock()
    self.cur = self.conn.cursor()
def get_next_query(self):
    #self.db_lock.acquire()
    ngram = ''
    seed_word = ''
    neighborhood = ''
    try:
        #Choose the query that was updated longest ago that isn't in the field. If all are in the field, return the one that was updated longest ago.
        self.cur.execute("SELECT ngram, seed_word, neighborhood, hits, ts FROM google_hits ORDER BY in_field ASC, ts ASC LIMIT 1;")
        #self.cur.execute("SELECT ngram, seed_word, neighborhood, hits, ts FROM google_hits WHERE ngram ~ E'^evolu\.{2}o$' LIMIT 1;")
        row = self.cur.fetchone()
        if row != None:
            ngram = row[0]
            seed_word = row[1]
            neighborhood = int(row[2])
            hits = int(row[3])
            ts = row[4]
            sql = "UPDATE google_hits SET in_field='t' WHERE ngram=%s AND seed_word=%s AND neighborhood=%s;"
            self.cur.execute(sql, (ngram, seed_word, neighborhood,))
            #self.cur.execute("UPDATE google_hits SET in_field='t' WHERE ngram=E'%s' AND seed_word='%s' AND neighborhood=%s;" % (ngram.replace("'", "\\'"), seed_word, neighborhood) )
            return_result = ngram, seed_word, neighborhood, ts
            self.conn.commit()
            #del hits
            #del ts
        else:
            return_result = 'none', 'none', 0, 0
        #del self.cur
        self.cur = self.conn.cursor()
        #del row
        #del ngram
        #del seed_word
        #del neighborhood
        #self.db_lock.release()         
        return return_result
    except:
        raise
        try:
            #self.db_lock.release()
            #self.cancel_query(ngram, seed_word, neighborhood)
            #print
            pass
        except:
            pass
        return None

def update_hits(self, ngram, seed_word, neighborhood, hits):
    #self.db_lock.acquire()
#   start = time.time()
    SQL = "UPDATE google_hits SET hits=%s, in_field='f', ts=now() WHERE ngram=%s AND seed_word=%s AND neighborhood=%s;"
    self.cur.execute( SQL, (hits, ngram, seed_word, neighborhood, ) )
    self.conn.commit()
    #del ngram
    #del seed_word
    #del neighborhood
    #del hits
#   end = time.time()
#   print 'Updating hits took %s seconds.' % (end-start)


def cancel_query(self, ngram, seed_word, neighborhood, ts):
    self.cur.execute("UPDATE google_hits SET in_field='f', ts='%s'::timestamp without time zone WHERE ngram=E'%s' AND seed_word='%s' AND neighborhood=%s;" % (ts, ngram.replace("'", "\\'"), seed_word, neighborhood) )
    self.conn.commit()

def close(self):
    #self.cur.execute("UPDATE google_hits SET in_field=false;")
    self.cur.close()
    self.conn.commit()
    self.conn.close()


#put the headers in too, and encrypt everything

def recv_all(s):

msg = ''
while True:
    data = s.recv(4096)
    if data != '':
        msg += data
    else:
        break
return msg

def decrypt_message(message):

chunks = []
i = 0
while True:
    next_chunk = message[i:i+CHUNK_SIZE]
    if next_chunk == '':
        break
    chunks.append(next_chunk)
    i += CHUNK_SIZE

decrypt_chunks = map(lambda chunk: rsa.pkcs1.decrypt(chunk, PRIVATE_KEY), chunks)

return ''.join(decrypt_chunks)

class SimpleServeClient(threading.Thread):

def __init__(self, conn, addr, db_machine, logger, listener, ip_set):
    threading.Thread.__init__(self)
    self.conn = conn
    self.addr = addr
    self.conn.settimeout(30)
    self.logger = logger
    self.db_machine = db_machine
    self.listener = listener
    self.ip_set = ip_set
def run(self):

    self.logger.log_and_display('Serving a client %s in %s' % (self.addr, self.getName()))

    while self.listener.is_alive():
        try:
            #Get the query URL for the client
            result = self.db_machine.get_next_query()
            if result == None:
                continue
            ngram, seed_word, neighborhood, ts = result


            #del result


            if not self.listener.is_alive(): break

            query_url, q = make_query_url(ngram, seed_word, neighborhood)
            message = protocol.package_for_protocol(query_url)

            if not self.listener.is_alive(): break

            self.conn.sendall(message)
            #del message
            if not self.listener.is_alive(): break

            #Get response from client (which should be a web page).
            message = '\1'
            start_time = time.time()
            while True:
                if not self.listener.is_alive(): break
                data = self.conn.recv(4096)
                message += data
                if protocol.protocol_condition(message[1:]) or time.time() - start_time > 20:
                    break
                #if time.time() - start_time > 25:
                #   print '\nLeaving early.  Bad things may happen!\n'
                #   break
            #del data

            if not protocol.protocol_condition(message[1:]): #NEW: this may mean the client disconnected
                #del ngram
                #del seed_word
                #del neighborhood
                #del message
                self.logger.log_and_display('WTF? ' + str(self.addr))
                break

            if not self.listener.is_alive():
                #del ngram
                #del seed_word
                #del neighborhood
                #del message

                break

            message = message[protocol.hash_len()+1:]
            message = zlib.decompress(message)
            message = eval(message)

            if message['query'] != query_url:
                logger.log_and_display('Wrong page sent back ' + str(self.addr) + query_url)
                continue
            #del query_url

            page = message['page']

            #del message

            if not self.listener.is_alive():
                #del page
                break

            hits = extract_hits_from_page(page)
            if hits == None:
                break
            #   open('./pages/'+str(time.time()), 'w').write(page)

            if not self.listener.is_alive():
                #del page
                #del hits
                break

            if hits == None:
                #self.db_machine.cancel_query(ngram,seed_word, neighborhood, ts)
                pass
            else:
                self.db_machine.update_hits(ngram, seed_word, neighborhood, hits)
                log_message = "Updated ngram='%s', seed_word='%s', neighborhood='%s', hits='%s'" % (ngram, seed_word, neighborhood, hits)
                self.logger.log_and_display(log_message + ' ' + str(self.addr))
            #del hits
            #del ngram
            #del seed_word
            #del neighborhood
        except:
            self.logger.log_and_display('Crap ' + str(self.addr))
            break


    self.logger.log_and_display(  'Connexion closed!' )
    #self.ip_set.remove(self.addr[0])

def __del__(self):
    try:
        self.conn.shutdown(SHUT_RDWR)
    except:
        pass
    try:
        self.conn.close()
    except:
        pass
    #del self.conn
    #del self.addr

def recv_all(s):

#s.settimeout(2)
msg = s.recv(4096)
while True:
    data = s.recv(4096)
    if data != '':
        msg += data
    else:
        break
return msg

def listen_for_clients():

s = socket(AF_INET, SOCK_STREAM)    # create a TCP socket
s.bind((myHost, myPort))        # bind it to the server port
s.listen(MAX_QUEUED_CONNEXIONS)     # allow MAX_CONNEXIONS simultaneous
                    # pending connections
db_machine = DatabaseInteractor()
listener = Listen.Listen(end_message='Listener stopping...')
listener.start()

ip_set = 1#LockedSet()

fd = open('log.txt', 'a')
logger = MessageLogger(fd)
logger.log_and_display('Starting up server...')

thread_list = []
s.settimeout(0.1)
while listener.is_alive():
    try:
    # wait for next client to connect
        conn, addr = s.accept() # connection is a new socket
    except:
        continue

    if listener.is_alive():
        if True:#not ip_set.has(addr[0]):
            client_server = SimpleServeClient(conn, addr, db_machine, logger, listener, ip_set)
            client_server.isdaemon = True
            #ip_set.add(addr[0])
            client_server.start()

print 'About to close the listener socket.'
s.close()
logger.log_and_display('Shutting down...')

#logger.log_and_display('Waiting for the kids to stop playing...')
#map(lambda kid: kid.join(), thread_list)

logger.log_and_display('Setting in_field to false for all queries...')
db_machine.close()
#fd.close()

if __name__ == '__main__':

set_proc_name('QuerierServer')
listen_for_clients()

Comments and changes to this ticket

  • Daniele Varrazzo

    Daniele Varrazzo January 11th, 2012 @ 03:44 PM

    Is this example self-contained?

  • Daniele Varrazzo

    Daniele Varrazzo January 11th, 2012 @ 03:58 PM

    With some google-fu I've found: http://bugs.python.org/issue13741 with the additional info:

    This program crashes after 12-24 hours of running.  My OS is Ubuntu 11.10, I'm using Python 2.7.2, and gcc 4.6.1.
    

    Now, it looks like this is a server receiving messages from remote clients. Any chance to have a simulation of the same workload? Or a core dump with the symbols?

  • Andrew Nystrom

    Andrew Nystrom January 13th, 2012 @ 10:09 AM

    Yes, the message on http://bugs.python.org/issue13741 was mine. I seem to have fixed the problem though. In the version that crashed, all the clients (there's a thread for each client) were sharing a cursor with a mutex lock. I made it so every thread has its own connection, and now it doesn't seem to crash anymore (it's been running for 34 hours without crashing).

    I'm wondering if, when a thread closed and the garbage collector cleaned up, it would sometimes try to delete the cursor even though other threads had a handle on it. If so, the next thread that did this might cause the double free problem.
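
    A minimal sketch of the per-thread-connection arrangement described above (not the reporter's actual code; the DSN and query are placeholders): each thread opens its own psycopg2 connection and cursor, so no cursor object is ever shared with, or garbage-collected from, another thread.

    import threading
    import psycopg2

    class Worker(threading.Thread):
        """One connection and one cursor per thread; nothing DB-related is shared."""
        def __init__(self, dsn):
            threading.Thread.__init__(self)
            self.dsn = dsn  # placeholder DSN, e.g. "dbname=3query_hits_db"

        def run(self):
            conn = psycopg2.connect(self.dsn)
            try:
                cur = conn.cursor()
                # placeholder query; the real server picks the next query to dispatch
                cur.execute("SELECT ngram, seed_word FROM google_hits LIMIT 1;")
                row = cur.fetchone()
                cur.close()
                conn.commit()
            finally:
                conn.close()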

  • Daniele Varrazzo

    Daniele Varrazzo January 13th, 2012 @ 11:45 AM

    • State changed from “new” to “invalid”

    Ok, nice you have fixed the problem.

    The cursors are not thread safe: you did good to use a different one for each thread. It would be nice if they didn't crash though. I wonder if it's easy to write a check in execute()/fetch*() to verify that the cursor is only used in a single thread.
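
    Purely as an illustration (this is not an existing psycopg2 feature), such a check could be a thin wrapper that remembers the thread that created the cursor and rejects execute()/fetch*() calls coming from any other thread:

    import threading

    class SingleThreadCursor(object):
        """Delegates to a DB-API cursor, but raises if used outside its owner thread."""
        def __init__(self, cursor):
            self._cursor = cursor
            self._owner = threading.current_thread()

        def _check(self):
            if threading.current_thread() is not self._owner:
                raise RuntimeError("cursor owned by %r used from %r" % (
                    self._owner.name, threading.current_thread().name))

        def execute(self, sql, args=None):
            self._check()
            return self._cursor.execute(sql, args)

        def fetchone(self):
            self._check()
            return self._cursor.fetchone()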

    Closing the bug as it was caused by unsafe use of the library.


WARNING: the information in this tracker is archived. Please submit new tickets or comments to the new tracker: https://github.com/psycopg/psycopg2/issues
