This project is archived and is in readonly mode.
Mogrify performance on arrays
Reported by Psycopg website | August 28th, 2013 @ 02:18 PM
Submitted by: ChrisB
If I run cursor.mogrify on a 64000 double array it is taking 350ms. If I just do a 'ARRAY[' + ','.join(map(str, array)) + ']' and put it into a string format myself it only takes 55ms. Although the documentation tells you not to do this, it is bringing the insert time of my app down massively.
Comments and changes to this ticket
-
Daniele Varrazzo September 20th, 2013 @ 11:28 AM
- State changed from new to invalid
Thank you for inventing the SQL injection.
Of course feel free to do it in your app if you know where the input is coming from, but don't even think to suggest it as a good idea.
-
Federico Di Gregorio September 20th, 2013 @ 12:35 PM
- State changed from invalid to open
- Tag cleared.
- Assigned user set to Federico Di Gregorio
Daniele, wait. He was talking about a simple benchmark and I can see where the bottleneck is. The mogrification of arrays is slow, because the arguments are checked one by one: we can probably make this faster by caching the adapter when the list is know to contain only objects of the same type. I'll investigate this.
-
Daniele Varrazzo May 5th, 2014 @ 11:26 PM
I think the speedup you have in mind is not easy to implement; OTOH it can be generalized. The adaptation codepath is not straightforward as it depends on several choices: is the object
__conform__
, is there aprepare()
to be called etc. We can cache a mapping class -> choices made and reuse it instead of exploring the adaptation tree for each object: this would speed up not only array adaptation but any operation such asexecute(SQL, ["s1", "s2", "s3"])
: we wouldn't have to check how to adapt the string object 3 times but only once.The cache should be short-lived as changing the adapters map should invalidate it and because the number of classes could be unbound (if they are created dynamically, e.g. namedtuples). But creating a new cache for every
execute()
/executemany()
would be already a nice speedup and there should be no side effect (as there should be no program relying on a change of the adapters map during a concurrentexecute()
...).
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile ยป
<b>WARNING:</b> the informations in this tracker are archived. Please submit new tickets or comments to <a href="https://github.com/psycopg/psycopg2/issues">the new tracker</a>.
<br/>
Psycopg is the most used PostgreSQL adapter for the Python programming language. At the core it fully implements the Python DB API 2.0 specifications. Several extensions allow access to many of the features offered by PostgreSQL.