This project is archived and is in readonly mode.
Invalid json is returned for json column type
Reported by Psycopg website | January 14th, 2014 @ 03:58 AM
Submitted by: Oleg Anashkin
For columns with type json, psycopg returns string with single quotes, like this:
{'Url': 'http://www.amazon.com'}
This is invalid json and it can't be parsed using Python's json.loads() function because it requires double quotes.
Comments and changes to this ticket
-
Oleg Anashkin January 14th, 2014 @ 04:00 AM
Sorry, somehow bug post got screwed up. What I meant is this: {'Url': 'blablabla'} - uses single quotes.
-
Daniele Varrazzo January 14th, 2014 @ 11:15 AM
Psycopg only forwards what received by postgres. How are you generating this data?
From a quick test postgres can't generate "wrong" data:
piro=# select '{''Url'': ''blablabla''}'::json; ERROR: invalid input syntax for type json LINE 1: select '{''Url'': ''blablabla''}'::json; ^ DETAIL: Token "'" is invalid. CONTEXT: JSON data, line 1: {'... piro=# select '{"Url": "blablabla"}'::json; json ---------------------- {"Url": "blablabla"} (1 row)
and psycopg shouldn't mess with it:
>>> cur.execute("""select '{"Url": "blablabla"}'::json""") >>> cur.fetchone() ({u'Url': u'blablabla'},)
Roundtripping doesn't seem a problem either:
>>> from psycopg2.extras import Json >>> cur.execute("select %s", [Json({'Url': u'blablabla'})]) >>> cur.fetchone()[0] '{"Url": "blablabla"}'
so how can I reproduce the bug?
-
Oleg Anashkin January 14th, 2014 @ 03:23 PM
I'm using psycopg through https://github.com/coleifer/peewee ORM, but it doesn't mess with the data. The column in question is defined as a simple text field. When I run "select properties from book where id=313;" from psql.exe I get this:
{"Binding": "Kindle Edition", "FileSize": 687, "Format": "Kindle eBook", "FullImage": "http://ecx.images-amazon.com/images/I/81z9cxJP02L.jpg", "Lending": true, "SoldBy": "Amazon Digital Services, Inc.", "TTS": true, "Url":"http://www.amazon.com/Love-Is-Strange-Paranormal-Romance-ebook/dp/B00ASBPAWY"}
But when I do using python:
book = Book.get(Book.id == 313) print(book.properties)
I get this:
{'Adult': False, 'Lending': True, 'Binding': 'Kindle Edition', 'SoldBy': 'Amazon Digital Services, Inc.', 'Url': 'http://www.amazon.com/Love-Is-Strange-Paranormal-Romance-ebook/dp/B00ASBPAWY', 'FileSize': 687, 'TTS': True, 'Format': 'Kindle eBook', 'FullImage': 'http://ecx.images-amazon.com/images/I/81z9cxJP02L.jpg'}
To repro, you can maybe try to not do conversion to json in cur.execute, but use existing json column. In the meantime I'm trying to come up with a shorter repro case.
-
Daniele Varrazzo January 14th, 2014 @ 03:38 PM
I'm using psycopg through https://github.com/coleifer/peewee ORM, but it doesn't mess with the data. The column in question is defined as a simple text field. When I run "select properties from book where id=313;" from psql.exe I get this:
{"Binding": "Kindle Edition", "FileSize": 687, "Format": "Kindle eBook", "FullImage": "http://ecx.images-amazon.com/images/I/81z9cxJP02L.jpg", "Lending": true, "SoldBy": "Amazon Digital Services, Inc.", "TTS": true, "Url":"http://www.amazon.com/Love-Is-Strange-Paranormal-Romance-ebook/dp/B00ASBPAWY"}
Well, this is not JSON. It remotely look so but it's not valid. It's not in a Postgres JSON field and trying to put it there fails. You cannot expect psycopg to parse invalid data as it was JSON when neither Postgres nor Python do that.
But when I do using python:
book = Book.get(Book.id == 313) print(book.properties)
This must be peewee doing something "intelligent" for you, such as eval()'ing the text as it was a Python expression. If you want psycopg to do that you can write your own typecaster.
To repro, you can maybe try to not do conversion to json in cur.execute, but use existing json column.
That column is not JSON.
In the meantime I'm trying to come up with a shorter repro case.
Please: if you can reproduce a problem with VALID JSON I'd be happy to fix psycopg. Otherwise I'll close this bug.
-
Oleg Anashkin January 14th, 2014 @ 03:45 PM
I'm sorry, but this is json and I got it from postgres. Attaching screenshot. Column type is json. Here is the table schema:
ebooktracker=# \d+ book;
Table "public.book" Column | Type | Modifiers | Storage | Stats target | Description -------------------+--------------------------------+---------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('book_id_seq'::regclass) | plain | | merchant_id | integer | | plain | | market_id | integer | | plain | | group_id | integer | | plain | | publisher_id | integer | | plain | | binding_id | integer | | plain | | merchant_book_id | character varying(20) | not null | extended | | isbn | character varying(255) | default NULL::character varying | extended | | title | text | not null | extended | | description | text | | extended | | language | character varying(255) | default NULL::character varying | extended | | publication_date | timestamp(0) without time zone | default NULL::timestamp without time zone | plain | | number_of_pages | integer | | plain | | image_url | character varying(255) | | extended | | rating | integer | | plain | | number_of_reviews | integer | | plain | | rank | integer | | plain | | price | integer | | plain | | last_checked_at | timestamp(0) without time zone | not null | plain | | added_date | timestamp(0) without time zone | not null | plain | | properties | json | | extended | | price_change | integer | | plain | | price_changed_at | timestamp(0) without time zone | default NULL::timestamp without time zone | plain | | -
Oleg Anashkin January 14th, 2014 @ 03:45 PM
Sorry, fixed formatting:
ebooktracker=# \d+ book; Table "public.book" Column | Type | Modifiers | Storage | Stats target | Description -------------------+--------------------------------+---------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('book_id_seq'::regclass) | plain | | merchant_id | integer | | plain | | market_id | integer | | plain | | group_id | integer | | plain | | publisher_id | integer | | plain | | binding_id | integer | | plain | | merchant_book_id | character varying(20) | not null | extended | | isbn | character varying(255) | default NULL::character varying | extended | | title | text | not null | extended | | description | text | | extended | | language | character varying(255) | default NULL::character varying | extended | | publication_date | timestamp(0) without time zone | default NULL::timestamp without time zone | plain | | number_of_pages | integer | | plain | | image_url | character varying(255) | | extended | | rating | integer | | plain | | number_of_reviews | integer | | plain | | rank | integer | | plain | | price | integer | | plain | | last_checked_at | timestamp(0) without time zone | not null | plain | | added_date | timestamp(0) without time zone | not null | plain | | properties | json | | extended | | price_change | integer | | plain | | price_changed_at | timestamp(0) without time zone | default NULL::timestamp without time zone | plain | |@@@
-
Oleg Anashkin January 14th, 2014 @ 03:50 PM
And here is repro using python:
import psycopg2 conn = psycopg2.connect(database='ebooktracker') cur = conn.cursor() cur.execute("select properties from book where id=313") print(cur.fetchone())
Here is its result:
({'Lending': True, 'Adult': False, 'SoldBy': 'Amazon Digital Services, Inc.', 'Binding': 'Kindle Edition', 'FileSize': 687, 'Url': 'http://www.amazon.com/Love-Is-Strange-Paranormal-Romance-ebook/dp/B...;, 'FullImage': 'http://ecx.images-amazon.com/images/I/81z9cxJP02L.jpg', ' Format': 'Kindle eBook', 'TTS': True},)
No peewee in the picture here. Purely psycopg2.
-
Federico Di Gregorio January 14th, 2014 @ 03:58 PM
That's just how Python formats a dictionary in a "print". If you try to access it you'll see that it is a perfectly valid:
record = cur.fetchone() print record[0]["Binding"] print record[0]["Url"]
-
Oleg Anashkin January 14th, 2014 @ 04:03 PM
Ok, so as I understand the problem is that the json record is actually eval'd by psycopg instead of being converted to text, right? Similar to what Daniele was claiming peewee was doing, but it's actually psycopg, not peewee.
Is there any way to suppress this eval without changing the sql? Some config/flag in psycopg?
-
Federico Di Gregorio January 14th, 2014 @ 04:52 PM
@extesy, yes, I forgot that the result is a tuple even when you select just one column. See my edited answer.
-
Oleg Anashkin January 14th, 2014 @ 04:55 PM
I peeked into psycopg's code and found that inside register_json it always creates typecasters to perform loads/dumps on the column data, therefore unpacking the data into the dict. Since most ORM's don't support json column type this causes problems when they try to interpret it as text, because python's default serialization of dict produces json-looking string which is not actually json. It would be really nice if there was a way to suppress this loads/dumps inside psycopg using some kind of configuration.
-
Federico Di Gregorio January 14th, 2014 @ 05:20 PM
Just create and register a type-caster that returns JSON as text:
JSONTEXT = psycopg2.extensions.new_type((114,), "JSONTEXT", (lambda value, curs: value)) psycopg2.extensions.register_type(JSONTEXT)
and you'll get
json
columns (oid 114) as text.BTW, your ORM is broken.
-
Daniele Varrazzo January 14th, 2014 @ 05:22 PM
Yes. The easiest is:
>>> psycopg2.extras.register_default_json(loads=lambda x: x) >>> cur.execute("select %s::json", [Json({u'Url': u'€'})]) >>> cur.fetchone()[0] '{"Url": "\\u20ac"}'
This works globally. You can have it working locally to a connection or cursor, see the docs
I will add some notes in the docs about not parsing json as it's the second time the argument crops up.
-
Daniele Varrazzo January 14th, 2014 @ 05:26 PM
@fog: I was going to give Oleg your suggestion, and I just realized that
psycopg2.extras.register_default_json(loads=lambda x: x)
works as well but is much more concise. Is the new_type/register_type way in the docs or it just came out from the previous discussion in the ML? -
Federico Di Gregorio January 14th, 2014 @ 05:55 PM
@Daniele: it is just that not all SQL types provide a
register_default_xxx
, only the new type-casters. So, I better like to use the low-level method in example code. -
Daniele Varrazzo January 14th, 2014 @ 06:01 PM
Well, even less, only
register_default_json()
exist :) That's because I've expected people wanting to customize the way it is parsed more often than with other data types.Anyway, I agree, for educational purpose, that the more generic solution is worth to be known too (and it's probably already shown in several examples, e.g.
DEC2FLOAT
). -
Daniele Varrazzo January 14th, 2014 @ 06:39 PM
- State changed from new to resolved
-
Oleg Anashkin January 14th, 2014 @ 07:07 PM
Thank you guys for your quick responses and for helping with this issue.
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile ยป
<b>WARNING:</b> the informations in this tracker are archived. Please submit new tickets or comments to <a href="https://github.com/psycopg/psycopg2/issues">the new tracker</a>.
<br/>
Psycopg is the most used PostgreSQL adapter for the Python programming language. At the core it fully implements the Python DB API 2.0 specifications. Several extensions allow access to many of the features offered by PostgreSQL.