Imagine we have a small CSV file:

name,enroll_time
robin,2021-01-15 09:50:33
tony,2021-01-14 01:50:33
jaime,2021-01-13 00:50:33
tyrion,2021-2-15 13:22:17
bran,2022-3-16 14:00:01

Let’s try to load it into a pandas DataFrame and upload it to a BigQuery table:

import pandas as pd
from google.cloud import bigquery

# name becomes the index; enroll_time is parsed as a datetime column
df = pd.read_csv("test.csv", parse_dates=["enroll_time"], index_col=0)

schema = []
schema.append(bigquery.SchemaField("name", "STRING"))
schema.append(bigquery.SchemaField("enroll_time", "DATE"))

job_config = bigquery.LoadJobConfig(schema=schema)
bq_client = bigquery.Client()

table = "project.dataset.test_table"
job = bq_client.load_table_from_dataframe(
    df, table, job_config=job_config
)
job.result()  # wait for the load job to finish
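For context, `parse_dates` gives the column pandas’ default nanosecond-precision dtype, which is what later trips up the load. A minimal sketch, using an inline one-row CSV as a stand-in for `test.csv`:

```python
import io

import pandas as pd

# A one-row stand-in for test.csv
csv_text = "name,enroll_time\nrobin,2021-01-15 09:50:33\n"
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["enroll_time"], index_col=0)

# parse_dates produces pandas' default nanosecond-precision timestamps
print(df["enroll_time"].dtype)  # datetime64[ns]
```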

But it reports an error:

  File "pyarrow/array.pxi", line 176, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to date32[day] would lose data: 1610704233000000000

The number 1610704233000000000 is the first row’s timestamp expressed in nanoseconds since the Unix epoch, and pyarrow refuses to cast it to a day-precision date (`date32[day]`) because truncating away the time of day would lose data. I also tried dividing the value by 1e9, but that failed as well.
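To see where that number comes from, here is a quick check that it is simply robin’s enroll time as nanoseconds since the Unix epoch (treating the naive timestamp as UTC):

```python
import pandas as pd

# robin's enroll time from the CSV, as a timezone-naive timestamp
ts = pd.Timestamp("2021-01-15 09:50:33")

# .value is the timestamp in nanoseconds since the Unix epoch
print(ts.value)  # 1610704233000000000
```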

Actually, all we need to do is use TIMESTAMP instead of DATE as the type of the enroll_time column:

schema.append(bigquery.SchemaField("name", "STRING"))
schema.append(bigquery.SchemaField("enroll_time", "TIMESTAMP"))

and the BigQuery library can load the column correctly, even with nanosecond precision.
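Alternatively, if the table really should use DATE and the time of day is disposable, the column can be truncated to dates before loading. A sketch, reusing the enroll_time column from the example above:

```python
import datetime

import pandas as pd

df = pd.DataFrame({"enroll_time": pd.to_datetime(["2021-01-15 09:50:33"])})

# Drop the time of day so the values fit a day-precision DATE column
df["enroll_time"] = df["enroll_time"].dt.date

print(df["enroll_time"].iloc[0])  # 2021-01-15
```

With the values reduced to `datetime.date` objects, the lossy timestamp-to-date cast no longer happens during the load.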