- Query JSON files in place with READ_JSON if you want to explore or process files without defining any schema up front.
- Load JSON into a fixed schema if your JSON data has a stable set of fields with shallow nesting.
- Transform the input during load if your table must always contain certain fields.
- Store JSON as text if you need only specific fields on demand or if the table structure changes frequently.
Sample JSON dataset
The following JSON data shows two session records for a website, where each line represents a single JSON object. This sample data is used in each of the examples in this document.Query JSON files in place with READ_JSON
The READ_JSON table-valued function reads JSON files directly from object storage — no table definition required. By default it infers the schema from the data, turning nested objects intoSTRUCT columns and arrays into ARRAY columns:
PARSE_AS_JSON => TRUE to receive each document as one row in a single JSON-typed column named json, and extract what you need with JSON functions:
READ_JSON accepts newline-delimited JSON, single (including pretty-printed) documents, and top-level JSON arrays, with optional compression. To persist the data, combine it with INSERT INTO ... SELECT. The loading approaches below remain the right choice when you ingest repeatedly into a stable schema and want the data contract to be explicit.
Load JSON into a fixed schema
If your JSON data has a stable set of fields with shallow nesting, you can load it into a table with a fixed schema to simplify queries. Missing keys are assigned default values. This method allows you to query columns directly without additional parsing, making queries faster and easier to write. Extra keys that are not explicitly mapped are excluded from structured tables, making this approach less flexible for changing data. If stored separately in aTEXT column, they remain accessible for later extraction.
The following code example uses the previously created doc_visits_source table to define columns that map directly to known keys:
| id | start_time | duration | tags |
|---|---|---|---|
| 1 | 1/6/2020 17:00 | 450 | [“summer-sale”, “sports”] |
| 2 | 1/5/2020 12:00 | 959 | [“gadgets”, “audio”] |
- The mandatory scalar fields,
id,start_time, andduration, are stored in separate columns, which makes it easier to filter, sort, or join by these fields. - Each column maps directly to a known JSON key, allowing for simpler queries without the need for JSON functions.
- Default values ensure that the table loads even if some fields are missing or additional keys appear. Extra JSON fields such as
user_agentwithagent,platform, andresolutionare ignored and not stored in the table. - Array columns are used to store
tags, which supports arbitrary numbers of values without schema changes.
Transform the input during load
Parsing JSON data during ingestion eliminates the need for subsequent query-time parsing, simplifying and accelerating queries. However, transforming data during load also requires well-defined JSON paths that remain consistent. If the JSON paths change, the load might fail. The following code example uses the previously createddoc_visits_source table to parse JSON data as it loads and inserts extracted fields into a Firebolt table named visits_transformed. It shows how to use JSON_POINTER_EXTRACT_KEYS and JSON_POINTER_EXTRACT_VALUES to store a dynamic key-value pair – agent_props_keys and agent_props_vals – from a nested object:
| id | start_time | duration | tags | agent_props_keys | agent_props_vals |
|---|---|---|---|---|---|
| 1 | 1/6/2020 17:00 | 450 | [“summer-sale”,“sports”] | [“agent”, “platform”, “resolution”] | [“Mozilla/5.0”, “Windows NT 6.1”, “1024x4069”] |
| 2 | 1/5/2020 12:00 | 959 | [“gadgets”,“audio”] | [“agent”, “platform”] | [“Safari”, “iOS 14”] |
- The
user_agentobject is stored in two arrays:agent_props_keysandagent_props_vals. TheJSON_POINTER_EXTRACT_KEYSfunction extracts the keys from theuser_agentobject into theagent_props_keysarray. TheJSON_POINTER_EXTRACT_VALUESfunction extracts the corresponding values into theagent_props_valsarray. Storing keys and values in parallel arrays offers flexibility when theuser_agentmap changes and avoids schema updates for new or removed fields.
NULL values cannot be cast to INT. For example, the following query attempts to extract a non-existent field /unknown_field and cast it to INT, which results in an error:
| id | start_time | duration | tags |
|---|---|---|---|
| 0 | NULL | NULL | NULL |
| 0 | NULL | NULL | NULL |
| 1 | 1/6/2020 17:00 | 450 | [“summer-sale”, “sports”] |
| 2 | 1/5/2020 12:00 | 959 | [“gadgets”, “audio”] |
Store JSON as text
You can store JSON as a single text column if the data structure changes frequently or if you only need certain fields in some queries. This approach simplifies ingestion since no parsing occurs during loading, but it requires parsing fields at query time, which can make queries more complex if you need to extract many fields regularly. The following code example uses the previously created intermediarydoc_visits_source table to create a permanent table that stores raw JSON, allowing you to parse only what you need on demand:
| raw_json |
|---|
{"id": 1, "StartTime": "2020-01-06 17:00:00", "Duration": 450, "tags": ["summer-sale", "sports"], "user_agent": {"agent": "Mozilla/5.0", "platform": "Windows NT 6.1", "resolution": "1024x4069"}} |
{"id": 2, "StartTime": "2020-01-05 12:00:00", "Duration": 959, "tags": ["gadgets", "audio"], "user_agent": {"agent": "Safari", "platform": "iOS 14"}} |
- The
id,start_time,durations, andtagscolumns follow the same purpose as in the previous table example. - Each row in the previous table contains a complete JSON object stored in a single
TEXTcolumn, rather than being parsed into separate fields. This approach is beneficial when the required fields are unknown at ingestion or the JSON structure changes frequently, allowing for flexible data storage without modifying the schema. Fields can be extracted dynamically at query time using Firebolt’s JSON functions, though frequent parsing may increase query complexity and cost. - Parsing occurs at query time, which can save upfront processing when data is loaded, but it might increase query complexity and cost if you need to parse many fields frequently.
- Subsequent queries need to extract fields manually with JSON functions as needed.
Lenient JSON parsing
When Firebolt reads JSON values into typed columns — bothREAD_JSON with an inferred or fixed schema and JSON external tables — the field-level parser is intentionally more permissive than the JSON standard. This is for backward compatibility with data that has historically loaded successfully; it is not a claim of strict conformance. The following non-standard constructs are accepted:
| Construct | Example | Standard JSON |
|---|---|---|
| Curly braces around array values | {"a": {1, 2, 3}} | [1, 2, 3] |
| Unquoted boolean words, any case | {"b": yes}, {"b": TRUE} | true / false |
| Non-finite floats | {"f": inf}, {"f": -inf} | (not representable) |
Leading + on integers | {"n": +5} | 5 |
| Trailing comma in an array | {"a": [1, 2,]} | [1, 2] |
| Quoted number for a numeric column | {"n": "42"} | 42 (unquoted) |
nan floats and unquoted strings ({"s": bare_word}).
Two paths remain strictly standards-conformant and reject all of the above:
READ_JSONschema inference validates the sampled file with a conformant parser before deciding column types, so a malformed sampled file fails the query rather than being read leniently.PARSE_AS_JSON => TRUEvalidates every document as well-formed JSON on ingestion.
PARSE_AS_JSON => TRUE.