Google BigQuery is a powerhouse for managing and analyzing large-scale datasets. Among its standout features are nested and repeated fields, which allow you to effectively represent complex, hierarchical, or denormalized data. This article dives into the details of working with these fields, explores practical examples, and highlights best practices.
What Are Nested and Repeated Fields in BigQuery?
BigQuery’s support for nested and repeated fields enables you to represent structured data hierarchies efficiently. These features reduce the need for traditional normalization while optimizing query performance for hierarchical or parent-child relationships.
1. Nested Fields
Nested fields in BigQuery are created using the STRUCT
(or RECORD
) type. A STRUCT
is a container that groups multiple fields into a single column.
- Example of a Nested Field:
A table where customer contact information is grouped under a single column calledcontact_info
.
CREATE TABLE `project.dataset.nested_table` (
customer_id INT64,
customer_name STRING,
contact_info STRUCT<
email STRING,
phone STRING
>
);
2. Repeated Fields
Repeated fields allow you to store arrays of values in a single column. These fields are denoted with the ARRAY
type.
- Example of a Repeated Field:
A table where a customer has multiple orders stored in a single column.
CREATE TABLE `project.dataset.repeated_table` (
customer_id INT64,
order_history ARRAY<
STRUCT<
order_id INT64,
order_date DATE,
total_amount FLOAT64
>
>
);
3. Combined Nested and Repeated Fields
BigQuery also supports a combination of nested and repeated fields, allowing for highly complex data structures.
- Example:
NestedSTRUCT
fields that contain repeatedARRAY
fields for customer order history.
CREATE TABLE `project.dataset.combined_table` (
customer_id INT64,
customer_name STRING,
contact_info STRUCT<
email STRING,
phone STRING
>,
order_history ARRAY<
STRUCT<
order_id INT64,
order_date DATE,
total_amount FLOAT64
>
>
);
Why Use Nested and Repeated Fields?
Advantages
- Denormalization for Performance
- BigQuery excels with denormalized data, minimizing the need for JOIN operations and improving query speed.
- Hierarchical Relationships
- Ideal for parent-child relationships like orders and customers.
- Efficient Storage
- Compact storage for structured data, reducing redundancy.
Best Practices
- Use nested and repeated fields for frequently queried hierarchical data.
- Avoid nesting more than 15 levels, as this is BigQuery’s schema limit.
- Use these fields to maintain denormalization without impacting query performance.
Loading Data with Nested and Repeated Fields
Loading JSON Data
To load data with nested and repeated fields, ensure your JSON file is formatted appropriately and a schema file is defined.
Example JSON File (sample_data.json
):
{
"customer_id": 1,
"customer_name": "John Doe",
"contact_info": {
"email": "john.doe@example.com",
"phone": "123-456-7890"
},
"order_history": [
{
"order_id": 101,
"order_date": "2023-11-01",
"total_amount": 250.75
},
{
"order_id": 102,
"order_date": "2023-11-05",
"total_amount": 100.50
}
]
}
Schema File (schema.json
):
[
{"name": "customer_id", "type": "INT64"},
{"name": "customer_name", "type": "STRING"},
{"name": "contact_info", "type": "RECORD", "fields": [
{"name": "email", "type": "STRING"},
{"name": "phone", "type": "STRING"}
]},
{"name": "order_history", "type": "RECORD", "mode": "REPEATED", "fields": [
{"name": "order_id", "type": "INT64"},
{"name": "order_date", "type": "DATE"},
{"name": "total_amount", "type": "FLOAT64"}
]}
]
Command to Load Data
bq load --source_format=NEWLINE_DELIMITED_JSON \
--schema=schema.json \
project.dataset.table_name \
sample_data.json
For detailed guidance, check out the BigQuery Loading JSON Documentation.
- 7 Powerful Strategies to Secure Your Next Job in 4 Months
- How Remote Work is Reshaping the Software Engineering Industry
- Overcoming Challenges of Cross-Cultural Remote Teams: Best Practices and Real-World Insights
- The Impact of Remote Work on Career Development in 2025
- Smart Financial Tips for Remote Engineers: Master Taxes, Savings, and Budgeting
Querying Nested and Repeated Fields
Flattening Repeated Fields with UNNEST
BigQuery requires the UNNEST
function to work with repeated fields.
Example Query:
Retrieve all orders for each customer:
SELECT
customer_id,
customer_name,
order_history.order_id,
order_history.total_amount
FROM `project.dataset.combined_table`,
UNNEST(order_history) AS order_history;
Filtering Specific Orders:
SELECT
customer_id,
customer_name,
order_history.order_id,
order_history.total_amount
FROM `project.dataset.combined_table`,
UNNEST(order_history) AS order_history
WHERE order_history.order_id = 101;
For more examples, see the BigQuery Nested and Repeated Fields Documentation.
Best Practices for Nested and Repeated Fields
- Optimize for Read Performance
Use nested and repeated fields for hierarchical data that is often queried together. - Avoid Over-Normalization
Normalize only when absolutely necessary to prevent redundancy. - Leverage
UNNEST
for Analysis
Simplify data flattening during queries.
Conclusion
Using nested and repeated fields in BigQuery can dramatically improve the efficiency of your queries and data management. These features allow for highly structured data storage while maintaining performance. By following best practices and leveraging tools like UNNEST
, you can harness the full power of BigQuery for your analytics needs.
For more information, explore:
- BigQuery Nested and Repeated Fields Documentation
- Best Practices for BigQuery Schema Design
- BigQuery JSON Data Loading