This project is archived and is in readonly mode.
2.4beta: server-side ("named") cursors: fetchone should really just fetch one
Reported by Jon | February 13th, 2011 @ 01:38 AM
Basically, a change in 2.4beta is an "optimization" such that a fetch of one row from a server-side (named) cursor results in the fetch of (up to) 2,000 rows. IMO, it's inappropriate for the underlying library to be making decisions like this - if I say I want one row, I really only want one row to be fetched. PostgreSQL even includes the "cursor_tuple_fraction" config option, which defaults to 0.1, indicating that the expected number of rows to be fetched from a server-side cursor is about 10% - and the optimizer makes decisions based on that information. Applications themselves can start using the fetch-more-than-one-row API if that's what they want (and what they should be doing) -- SQLAlchemy, for example, does just that. Making the decision to fetch /more/ rows than asked for is not something that should be done at this level, IMO -- I asked for a server-side cursor for a reason.
Thanks for hearing me out.
Comments and changes to this ticket
-
Jon February 13th, 2011 @ 01:44 AM
I think I may have goofed. I'm not sure, now, under what circumstances this comes into play. I don't see cursor.next() being used in SQLAlchemy. However, I still feel as though it is unwise to arbitrarily assume that 2,000 rows is an appropriate value when people are /expecting/ single-row iteration from a server-side cursor.
-
Daniele Varrazzo February 15th, 2011 @ 09:22 AM
- State changed from new to invalid
- Tag set to named cursor, iter, rel-2.4.0-beta2
Hello Jon,
If you want single rows from the database, you can use cur.fetchone(). The roundtrip optimization only involves iter(cur) (unless I've broken fetchone, which would be a bug indeed). This is no different from iter(file), which doesn't read the file line by line but in blocks, even though iteration then proceeds per line: this kind of buffering is not even disabled when running "python -u" (unbuffered). The only way to iterate line-by-line over a file is "while 1: line = file.readline(); if not line: break".
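The file analogy above can be sketched in plain Python. This is only an illustration of the two iteration styles being contrasted: iter(file) is block-buffered under the hood but still yields one line at a time, while the explicit readline() loop is the pattern Daniele quotes.

```python
import os
import tempfile

# Create a small file to iterate over.
path = tempfile.mkstemp()[1]
with open(path, "w") as f:
    f.write("one\ntwo\nthree\n")

# iter(file): the underlying stream is read in blocks, yet the caller
# still receives one line per step of the loop.
with open(path) as f:
    assert list(f) == ["one\n", "two\n", "three\n"]

# The explicit readline() loop -- the only way to force strictly
# line-at-a-time reads, per the comment above.
lines = []
with open(path) as f:
    while True:
        line = f.readline()
        if not line:
            break
        lines.append(line)
assert lines == ["one\n", "two\n", "three\n"]

os.unlink(path)
```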
I think it is easier for an application really requiring record-by-record iteration to switch from a "for record in cur:" pattern to a "while 1: record = cur.fetchone(); if not record: break" pattern, whereas going the other way round is clumsy. Before 2.4, switching to a named cursor in an iter pattern implied a hidden heavy cost, requiring the clumsy fetchmany workaround and raising the usage bar for an otherwise very interesting feature.
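The fetchone() pattern recommended above can be shown against a minimal stand-in cursor (hypothetical, for illustration only) that honours the DB-API fetchone() contract: one row per call, None when exhausted.

```python
class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor, for illustration only."""

    def __init__(self, rows):
        self._rows = iter(rows)

    def fetchone(self):
        # DB-API contract: return the next row, or None when no more rows.
        return next(self._rows, None)

cur = FakeCursor([(1,), (2,), (3,)])

# Record-by-record iteration via fetchone(), as suggested in the comment:
seen = []
while True:
    record = cur.fetchone()
    if record is None:
        break
    seen.append(record)

assert seen == [(1,), (2,), (3,)]
```

With a real psycopg2 named cursor, each fetchone() call fetches exactly one row from the server, which is the behaviour the reporter is after.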
I have detailed the rationale here: http://archives.postgresql.org/psycopg/2011-01/msg00020.php. I would appreciate it if you could bring your opinion to the mailing list to allow other people to consider the option. I'm not convinced that 2000 (settable using arraysize) is a perfect default, so ideas are welcome. I'm also interested in any issue you may know of about the possible interaction of this feature with other libraries.
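The roundtrip optimization under discussion can be sketched as a generator that fetches rows from the server in batches of a configurable size (2000 by default here, mirroring the arraysize default mentioned above) while still yielding single rows to the caller. All names below are illustrative, not psycopg2's actual internals.

```python
def iter_in_batches(fetchmany, batch_size=2000):
    """Yield rows one at a time while fetching batch_size rows per roundtrip."""
    while True:
        batch = fetchmany(batch_size)
        if not batch:
            return
        for row in batch:
            yield row

# Simulated server-side result set of 5 rows, handed out in chunks.
rows = [(i,) for i in range(5)]

def fake_fetchmany(n, _state={"pos": 0}):
    chunk = rows[_state["pos"]:_state["pos"] + n]
    _state["pos"] += n
    return chunk

# Iteration yields every row individually even though only three
# "roundtrips" (of up to 2 rows each) were made to the fake server.
assert list(iter_in_batches(fake_fetchmany, batch_size=2)) == rows
```

This is the trade-off being debated: fewer network roundtrips during iteration, at the cost of the server materializing more rows than the caller may consume.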
Thank you very much.
<b>WARNING:</b> the information in this tracker is archived. Please submit new tickets or comments to <a href="https://github.com/psycopg/psycopg2/issues">the new tracker</a>.