This project is archived and is in readonly mode.

#37 ✓invalid
Federico Di Gregorio

copy_from does not stop reading after an error

Reported by Federico Di Gregorio | February 1st, 2011 @ 10:39 AM

It looks like copy_from does not stop reading after an error. When the input file is short, it is okay. But when the input file is very long, it is really boring to wait for the entire file to be read just to discover there is an error on the 10th row.

Given the same input file, it looks like psql \copy command behaves correctly and stops just after the incorrect row, without reading the entire file. I have checked that just by looking at the command execution time that seems proportional to the number of processed rows.

Here is a script to reproduce this bug (just create a database "test" and run the script):

https://gist.github.com/805669

Comments and changes to this ticket

  • Nicolas Grilly

    Nicolas Grilly February 1st, 2011 @ 11:40 AM

    I have ran the same script with pg8000, and it does not stop reading after an error either... Maybe it is not a bug, and just a limitation of the PostgreSQL protocol? Maybe the copy from protocol is not designed to return errors in the middle of the data stream, and I just have to split my data stream into many chunks and call copy_from for each chunk?

  • Nicolas Grilly

    Nicolas Grilly February 1st, 2011 @ 12:57 PM

    I have tested again the \copy command of psql and, contrary to what I wrote before, it looks like psql does not stop reading after an error either, and has to read the complete file before reporting the error.

    Conclusion: psycopg2, pg8000 and psql have the same behavior regarding the command "copy from stdin". The input data file is read entirely, even if there is incorrect data at the start of the file, and errors are reported only after having read the complete file.

    Therefore it is probably not a bug in psycopg2, and just a "limitation" of PostgreSQL protocol. Here is the protocol official documentation:
    http://www.postgresql.org/docs/9.0/static/protocol-flow.html#PROTOC...

    I understand we have to "end" the copy before having a chance to retrieve PostgreSQL backend response and know if our data are correct, or not. Do you confirm this analysis?

    It means copy_from is not designed to send a 10 gigabytes stream to PostgreSQL, with just one "copy from stdin" command. Maybe I have to split my input stream into smaller chunks and execute a "copy from stdin" command for each of them. Do you confirm this is the only (and adequate) solution?

    Thanks for you help and advice.

  • Nicolas Grilly
  • Daniele Varrazzo

    Daniele Varrazzo February 15th, 2011 @ 11:30 PM

    • State changed from “new” to “invalid”

    From the discussion emerged it seems this is an issue to be solved into the libpq.

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile ยป

<b>WARNING:</b> the informations in this tracker are archived. Please submit new tickets or comments to <a href="https://github.com/psycopg/psycopg2/issues">the new tracker</a>.
<br/>
Psycopg is the most used PostgreSQL adapter for the Python programming language. At the core it fully implements the Python DB API 2.0 specifications. Several extensions allow access to many of the features offered by PostgreSQL.

Shared Ticket Bins

Attachments

Pages