20 Jan

PostgreSQL – BULK INSERTING from a delimited file, Most common errors in Windows

PostgreSQL – BULK INSERTING from a delimited file, Most common errors in Windows

PostgreSQL alternative for MS SQL Server BULK INSERT is the equally simple COPY command. In this article, we are going to take a step by step look at how to use this and possible errors. So I tried below command, which is completely correct. I faced a couple of issues when fixed it worked fine.

COPY part FROM '...Desktop\TPCH_001\pg_part.tbl' WITH (DELIMITER  '|')

Errors

ERROR: could not open file "file.tbl" for reading: Permission denied.
HINT:  COPY FROM instructs the PostgreSQL server process to read a file.  You may want a client-side facility such as psql's \copy. 
SQL state: 42501

To resolve this error, you need to update the permission of the file so that PostgreSQL can read them. Get to the folder/file, right-click and get to properties. Go to the Security tab. You need to add “Everyone” to the list. More info

ERROR:  extra data after last expected column data ending with dilimiter

This mean you have more columns the CSV than expected. As example you might have 4 column in CSV file and only 3 columns in the table. If it was other way around (table having more columns than file) we can define the needed columns in the query like below.

COPY part (p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment) FROM '...Desktop\TPCH_001\pg_part.tbl' WITH (DELIMITER  '|')

In my case, I used a simple python file to edit the files and write a new file. Yes! the only option is to edit the file. In my case I had a extra delimiter at the end of each line.

import os
from os import listdir
from os.path import isfile, join


data_location = "path_to_folder"
files = [f for f in listdir(data_location) if isfile(join(data_location, f))]
output_folder = "edited"

if not os.path.exists(os.path.join(data_location, output_folder)):
    os.makedirs(os.path.join(data_location, output_folder))

for file in files:
    with open(os.path.join(data_location, file), 'r+') as input_file, open(os.path.join(data_location, output_folder, file), 'w+') as output_file:
        line = input_file.readline()
        while line:
            output_file.write(line[:-2]+'\n')
            line = input_file.readline()

After these 2 fixes, my command ran without any issue.

Last updated by .

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.