JSON¶

Zuar Runner’s JSON plugin pipes data from any JSON file into a relational database. The input JSON files can be either JSON or JSONL.

Example use case¶

Let’s take a simple JSON file with nested data and convert the data into tables in a relational database.

Example JSON file¶

Here’s what the source JSON file (people_pets.json) looks like:

[
  {
    "id": 1,
    "name": "Justin",
    "pets": [
      {
        "id": 1,
        "name": "Bear",
        "pet": {
          "type": "dog",
          "breed": "Goldendoodle"
        }
      },
      {
        "id": 2,
        "name": "Birdie",
        "pet": {
          "type": "dog",
          "breed": "Goldendoodle"
        }
      }
    ]
  },
  {
    "id": 2,
    "name": "Matt",
    "pets": []
  },
  {
    "id": 3,
    "name": "Ben",
    "pets": [
      {
        "id": 3,
        "name": "Zuca",
        "pet": {
          "type": "dog",
          "breed": "Cavapoo"
        }
      }
    ]
  }
]

Expected Relational Database Results¶

Because the JSON file has nested data in the pets array ([ ]), we will need to create two tables in the database.

Table 1 will include top level key/value pairs: id, name. The resulting table should have 3 rows.
Table 2 will include data in the pets array (id, name, and pet object’s data). Table 2 will also include data from the parent object (id) so we can join the two database tables. The resulting table should have 3 rows.

Table 1 expected result:

id	name
1	Justin
2	Matt
3	Ben

Table 2 expected result:

id	name	pet__type	pet__breed	person_id
1	Bear	dog	Goldendoodle	1
2	Birdie	dog	Goldendoodle	1
3	Zuca	dog	Cavapoo	3

Create a Zuar Runner JSON Job¶

The JSON file(s) must exist in Zuar Runner’s file system. Use the file manager to manually add a JSON file to Zuar Runner.

The JSON input requires creating a job manually, so create a job using the Generic plugin. Set the job’s type to io.

Example Zuar Runner Job Configurations¶

Table 1 Zuar Runner job configuration:

{
    "input": {
        "use": "flatfile.iov2#JsonInput",
        "source": "/var/mitto/data/people_pets.json"
    },
    "output": {
        "tablename": "people",
        "use": "call:mitto.iov2.db#todb",
        "schema": "json",
        "dbo": "postgresql://db/analytics"
    },
    "steps": [
        {
            "use": "mitto.iov2.steps#Input",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#ExtraColumnsTransform"
                },
                {
                    "use": "mitto.iov2.transform#ColumnsTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CreateTable"
        },
        {
            "use": "mitto.iov2.steps#Output",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#FlattenTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CollectMeta"
        }
    ]
}

This example job take the source JSON file (people_pets.json) and creates a database table (json.people) in Zuar Runner’s internal PostgreSQL database (localhost). It creates columns from all of the “top level” keys excluding the arrays ([ ]).

Table 2 Zuar Runner job configuration:

{
    "input": {
        "source": "/var/mitto/data/people_pets.json",
        "use": "flatfile.iov2#JsonInput"
    },
    "output": {
        "dbo": "postgresql://db/analytics",
        "schema": "json",
        "tablename": "people__pets",
        "use": "call:mitto.iov2.db#todb"
    },
    "steps": [
        {
            "use": "mitto.iov2.steps#Input",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#PluckV2Transform",
                    "jpath": "$.pets[*]",
                    "members": [
                        {
                            "name": "person_id",
                            "value": "$.id"
                        }
                    ]
                },
                {
                    "use": "mitto.iov2.transform#ExtraColumnsTransform"
                },
                {
                    "use": "mitto.iov2.transform#ColumnsTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CreateTable"
        },
        {
            "use": "mitto.iov2.steps#Output",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#FlattenTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CollectMeta"
        }
    ]
}

This example job take the source JSON file (people_pets.json) and creates a database table (json.people__pets) in Zuar Runner’s internal PostgreSQL database (localhost).

Note that this job configuration includes an additional transform, the PluckV2Transform. This transform takes content from the JSON array and extracts that content into a separate database table. Based on the jpath and members parameters in the steps, it creates columns from all the keys inside the pets array and the “top level” id key.

Jpath¶

The jpath parameter let’s you define exactly what section of the source JSON file to pull from. This is necessary for Table 2 because of the nested data in the pets array. Learn more about JSONPath expressions below.

Members¶

The members parameter let’s you add additional columns from the source JSON file to the resulting database table. In our case, we need to add the id of each person to the people__pets table in order for us to join the two tables together. members also uses jpath in it’s value key, and you can name the resulting column by adjusting the name key.

JSONPath Expressions¶

To understand how to use JSONPath to pick specific sections of data out of a JSON object you can visit: https://goessner.net/articles/JsonPath/

To interactively learn how to use JSONPath syntax you can visit: https://jsonpath.com/

RegEx¶

Like the other flat file types, Runner jobs can use RegEx for the source file selection. An example of a JSON job input section using RegEx is below:

{
  "input": {
    "base": "flatfile.iov2#JsonInput",
    "use": "flatfile.iov2#RegexInput",
    "regex": "/var/mitto/data/people_pets_.*.json$"
  }

This example JSON job will read data from an arbitrary collection of JSON files named in a way that matches the regular expression pattern defined. See RegEx for more information.

JSONL¶

Runner also supports JSONL files; these files are commonly used in applications that deliver log type data, or applications that produce a large sets of records.

In JSONL files, each line is a JSON record itself, followed by a specific non-printing character used to delimit the lines.

The people_pets.json sample data can be represented as JSONL with some changes; it would have this content:

{"id":1,"name":"Justin","pets":[{"id":1,"name":"Bear","pet":{"type":"dog","breed":"Goldendoodle"}},{"id":2,"name":"Birdie","pet":{"type":"dog","breed":"Goldendoodle"}}]}
{"id":2,"name":"Matt","pets":[]}
{"id":3,"name":"Ben","pets":[{"id":3,"name":"Zuca","pet":{"type":"dog","breed":"Cavapoo"}}]}

If we need to process JSONL files, all of the previous discussion is applicable, with one modification to input section of the job, changing the use directive to JsonlInput:

{
    "input": {
        "source": "/var/mitto/data/people_pets.json",
        "use": "flatfile.iov2#JsonlInput"
    },