BigQuery is a serverless, scalable, multi-cloud data warehouse.
Storage
- First 10 GiB are free
- Anything beyond that costs $0.02/GiB per month
Queries
- First 1024 GiB of query data processed per month is free
- Anything beyond that costs $0.005/GiB
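Since queries are billed by bytes processed, it can be worth estimating a query before running it. The bq CLI (covered below) has a --dry_run flag for this; a minimal example against a placeholder table:
# Reports the bytes the query would process, without running it (dry runs are free)
bq query --dry_run --use_legacy_sql=false \
'SELECT * FROM `my-project.my-dataset.my-table`'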
Loading
- Loading data into BigQuery is free
The bq CLI
In the examples below, I'll sometimes use command-line arguments to specify the project and dataset, and other times I won't; you're also allowed to specify the project ID, dataset ID, and table ID all in one string, in the following format:
PROJECT_ID:DATASET_ID.TABLE_ID
show
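For example, to inspect a table (the table reference uses the format described above; swap in your own identifiers):
# Prints the schema, row count, size, and other metadata for the table
bq show 'my-project:my-dataset.my-table'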
ls
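For example, to list the datasets in a project or the tables in a dataset (again using placeholder identifiers):
# List datasets in a project
bq ls --project_id 'my-project'
# List tables in a dataset
bq ls 'my-project:my-dataset'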
mk
Creating a new dataset using the mk subcommand:
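# --dataset is optional here when the reference is unambiguous, but makes the intent explicit
bq mk --dataset 'my-project:my-dataset'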
Output
Dataset 'my-project:my-dataset' successfully created.
Creating a transfer config that loads data from Google Cloud Storage, link to documentation:
bq mk \
--transfer_config \
--project_id 'my-project' \
--data_source 'google_cloud_storage' \
--display_name 'name' \
--target_dataset 'dataset' \
--params='parameters'
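The value of --params is a JSON object whose keys depend on the data source. For the google_cloud_storage source it ends up looking roughly like this (parameter names from memory, so double-check them against the linked documentation):
bq mk \
--transfer_config \
--project_id 'my-project' \
--data_source 'google_cloud_storage' \
--display_name 'my-gcs-transfer' \
--target_dataset 'my-dataset' \
--params='{"data_path_template":"gs://my-bucket/path/*.csv","destination_table_name_template":"my-table","file_format":"CSV"}'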
load
The load subcommand creates or updates a table and loads data in a single step.
- By default, newly loaded data is appended to the table. Use the --replace flag to overwrite the existing table instead.
- By default, data is assumed to be encoded in UTF-8.
Load a CSV file my-data.csv with JSON schema my-schema.json into project my-project, dataset my-dataset, table my-table, replacing the contents of the table if it already exists:
bq \
--location 'US' \
--project_id 'my-project' \
--dataset_id 'my-dataset' \
load \
--source_format 'CSV' \
--skip_leading_rows 1 \
--schema './my-schema.json' \
--replace \
'my-table' \
'my-data.csv'
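Equivalently, you can drop --project_id and --dataset_id and fold everything into the table reference, using the PROJECT_ID:DATASET_ID.TABLE_ID format from earlier:
bq \
--location 'US' \
load \
--source_format 'CSV' \
--skip_leading_rows 1 \
--schema './my-schema.json' \
--replace \
'my-project:my-dataset.my-table' \
'my-data.csv'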
rm
Delete a single table, my-table, in the dataset my-dataset:
# With confirmation
bq rm -t 'my-project:my-dataset.my-table'
# Without confirmation
bq rm -f -t 'my-project:my-dataset.my-table'
Delete the dataset my-dataset and every table in it:
# With confirmation
bq rm -r -d 'my-dataset'
# Without confirmation
bq rm -r -f -d 'my-project:my-dataset'
cp
Resurrect a recently-deleted view/table
You can restore a table/view by copying a version of it from a previous point in time. Specify how far back with @-MILLISECONDS, where MILLISECONDS is the number of milliseconds in the past at which to capture the snapshot of the table/view's state.
Example:
bq cp DATASET_NAME.TABLE_NAME@-3600000 DATASET_NAME.TABLE_NAME
Note: -3600000 happens to be 1 hour ago in milliseconds.
Alternatively, you can use shell arithmetic expansion, $((...)), to make the offset into the past easier to calculate. For example, to go 24 hours into the past…
bq cp DATASET_NAME.TABLE_NAME@$((-1*1000*60*60*24)) DATASET_NAME.TABLE_NAME
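If you'd rather look at the snapshot before overwriting anything, you can also copy it to a new table name (the _restored suffix is just an arbitrary name picked for this example):
# Copy the hour-old snapshot to a new table instead of over the original name
bq cp DATASET_NAME.TABLE_NAME@-3600000 DATASET_NAME.TABLE_NAME_restored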
Checking two tables to see if they are the same (the query below returns zero rows if the two tables contain the same set of rows):
WITH
result1 AS (SELECT * FROM `PROJECT_ID._scriptdc28a578fc5cb3548deb8f6ddc73c6b9a8b36ae1.result1`),
result2 AS (SELECT * FROM `PROJECT_ID._scriptbd5e92955041f2d41494d5dafb66127b8005d9da.result2`)
(SELECT * FROM result1
EXCEPT DISTINCT
SELECT * FROM result2)
UNION ALL
(SELECT * FROM result2
EXCEPT DISTINCT
SELECT * FROM result1)