This project is archived and is in readonly mode.

#92 ✓invalid
Dave Vitek

mogrify silently truncates on null characters

Reported by Dave Vitek | January 18th, 2012 @ 04:58 PM

I'm using version 2.3.2 with postgres 9.0. I apologize if this has been addressed already.

I'm making use of the NOTIFY command and was recently doing some testing to determine what characters could be in the payload. postgres seems to work great with everything except null characters (\0), which cause: (DataError) invalid byte sequence for encoding "SQL_ASCII": 0x00. So far so good.

Here's the problem:
print cursor.mogrify('NOTIFY abc, %s', ('foo\0bar',))
NOTIFY abc, E'foo'

I'm thinking an exception would be preferable to silently changing the payload... and maybe this problem isn't unique to NOTIFY at all. I mention NOTIFY because the payload is not subject to the encoding IIRC, which might make it an exceptional case, but I think mogrify is probably oblivious to this.

I ran things in gdb to see exactly what was responsible for truncating the string, and it's happening inside libpq. Here's where PQescapeStringInternal (not your code...) is truncating things:

2974 while (remaining > 0 && *source != '\0')

#0 PQescapeStringInternal (conn=0x7ffff08ccc00,

to=0x7ffff11071fa "\020\361\377\177", from=0x7ffff328acb4 "foo",
length=7, error=0x7fffffffcaf8, encoding=0, std_strings=0 '\000')
at fe-exec.c:2974

#1 0x00007ffff191ac2a in PQescapeStringConn (conn=0x7ffff08ccc00,

to=0x7ffff11071fa "\020\361\377\177", from=0x7ffff328acb4 "foo",
length=7, error=0x7fffffffcaf8) at fe-exec.c:3049

#2 0x00007ffff19105f7 in psycopg_escape_string (obj=0x7ffff1c0cb50,

from=0x7ffff328acb4 "foo", len=7,
to=0x7ffff11071f8 "\340q\020\361\377\177", tolen=0x7fffffffcb48)
at psycopg/utils.c:58

#3 0x00007ffff190d19f in qstring_quote (self=0x7fffefff54b0)

at psycopg/adapter_qstring.c:100

#4 0x00007ffff190d2e2 in qstring_str (self=0x7fffefff54b0)

at psycopg/adapter_qstring.c:130

#5 0x00007ffff190d327 in qstring_getquoted (self=0x7fffefff54b0, args=0x0)

at psycopg/adapter_qstring.c:139

#6 0x00000000004aa4b1 in PyCFunction_Call (func=0x7ffff019af38,

arg=0x7ffff7e9a050, kw=0x0) at Objects/methodobject.c:90

#7 0x000000000047ec94 in PyObject_Call (func=0x7ffff019af38,

arg=0x7ffff7e9a050, kw=0x0) at Objects/abstract.c:2492

#8 0x000000000047edb9 in call_function_tail (callable=0x7ffff019af38,

args=0x7ffff7e9a050) at Objects/abstract.c:2524

#9 0x000000000047f410 in _PyObject_CallMethod_SizeT (o=0x7fffefff54b0,

name=0x7ffff192b61a "getquoted", format=0x0) at Objects/abstract.c:2640

#10 0x00007ffff1900e79 in microprotocol_getquoted (obj=0x7ffff328ac90,

conn=0x7ffff1c0cb50) at psycopg/microprotocols.c:241

#11 0x00007ffff190516e in _mogrify (var=0x7ffff32969d0, fmt=0x7ffff328fbc8,

conn=0x7ffff1c0cb50, new=0x7fffffffceb0) at psycopg/cursor_type.c:232

#12 0x00007ffff1905fcd in _psyco_curs_mogrify (self=0x7ffff00b6e60,

operation=0x7ffff328fbc8, vars=0x7ffff32969d0)
at psycopg/cursor_type.c:587

#13 0x00007ffff1906165 in psyco_curs_mogrify (self=0x7ffff00b6e60,

args=0x7ffff00bc878, kwargs=0x0) at psycopg/cursor_type.c:627

It's not clear to me whether PQescapeStringConn is supposed to function like this or not.

Comments and changes to this ticket

  • Daniele Varrazzo

    Daniele Varrazzo January 19th, 2012 @ 01:04 AM

    • State changed from “new” to “invalid”

    I think it an expected behaviour: PostgreSQL internally uses mostly - or maybe uniquely - zero-terminated strings when dealing with text: only bytea can store zeros. It is forbidden to use zero into a string literal (see http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html), and trying to produce one results in an error:

    piro=> select chr(0);
    ERROR:  null character not permitted
    

    Python strings don't have this limitation, so there is a mismatch between the two objects: not all Python strings can be represented as PostgreSQL text.

    Because PQescapeString takes a C string as input, the first 0 is used as string delimiter: this is not surprising. There is no plan to work around using this function: it is the function we rely for the safety (against SQL injections) and if even it was able to encode a 0 in a string, something else downstream would most likely fail.

    The NOTIFY payload is a string, so as such it would be wise to expect all the limitations of the PostgreSQL text. I see the notify payload is not decoded in python2: that was probably a bad choice (it is, in python3), but all psycopg2 on python2 is octet-oriented, so at least it is a consistent one. I'm afraid we'll have to live with this until some psycopg3.

    I think inspecting all the strings passed by psycopg looking for zeros would have a noticeable performance effect: all the text arguments and derived are filtered through PQescapeString. Although raising a DataError would be a reasonable behaviour, I don't think dealing with \0 in strings is such a common case to deserve a dent in the general performance. It's true the argument is silently truncated, but that argument was created by a bug, otherwise you should have been using bytea data type and binary arguments in first place.

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<b>WARNING:</b> the informations in this tracker are archived. Please submit new tickets or comments to <a href="https://github.com/psycopg/psycopg2/issues">the new tracker</a>.
<br/>
Psycopg is the most used PostgreSQL adapter for the Python programming language. At the core it fully implements the Python DB API 2.0 specifications. Several extensions allow access to many of the features offered by PostgreSQL.

Shared Ticket Bins

Pages