
File Format

Data types come in a variety of formats. The first that come to mind are JSON, CSV, XML, and Syslog. Each file format calls for a different parsing strategy, and identifying it correctly is critical for Hunters to structure the data type successfully.

In this section, we'll go over the supported file formats:

NDJSON

Newline-delimited JSON (NDJSON) is a common format used to store multiple JSON objects in a single file, one object per line.

Sample:

{"text": "hello and welcome to Hunters.AI!", "name": "Hunters"}
{"text": "this is an NDJSON", "name": "example"}

As you can see from the above example, we have two JSON objects delimited by a newline. Note that even though there are multiple JSON objects in the file, they should all contain the same object structure.
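As a rough illustration of how such a file can be read, the following Python sketch parses NDJSON text line by line and checks that all objects share the same top-level keys. The function names `parse_ndjson` and `has_uniform_structure` are hypothetical helpers written for this example, not part of any Hunters API, and comparing only top-level keys is an assumed, simplified reading of "same object structure".

```python
import json


def parse_ndjson(text):
    """Parse NDJSON text into a list of objects, one per non-empty line."""
    records = []
    # Split on "\n" only: JSON strings may legally contain other Unicode
    # line separators, which str.splitlines() would also split on.
    for line_number, line in enumerate(text.split("\n"), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. after a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
    return records


def has_uniform_structure(records):
    """Return True if every record is an object with the same top-level keys.

    Assumption: "same structure" is approximated by the set of top-level keys.
    """
    if not all(isinstance(record, dict) for record in records):
        return False
    return len({frozenset(record) for record in records}) <= 1


sample = (
    '{"text": "hello and welcome to Hunters.AI!", "name": "Hunters"}\n'
    '{"text": "this is an NDJSON", "name": "example"}\n'
)
records = parse_ndjson(sample)
```

Here `records` holds two dictionaries, and `has_uniform_structure(records)` can be used as a quick sanity check before uploading a file.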

CSV with Header

Comma-separated values, also known as CSV, is a tabular format similar to a database table. It contains columns and rows, where each column represents a single field, usually of a specific type. CSV files may or may not have a header row, which is typically the first row of the file and specifies the name of each column.
At Hunters, we currently support only CSV files which contain a header as the first row.

Additionally, although the name suggests a comma, a CSV file may use other separators, such as '|' or '^'. When a field value itself contains the separator (for example, a row where the name is 'sandra,'), the field must be escaped, traditionally by wrapping it in double quotes, to denote that the separator is part of the value. At Hunters we support any CSV spec compliant separator.

Sample:

name, age, height
david, 35, 180
sandra, 30, 180
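To make the header row, separator, and quoting rules concrete, here is a minimal Python sketch using the standard library `csv` module. The helper name `parse_csv_with_header` is hypothetical, and the sketch only illustrates the format; it is not a description of how Hunters parses CSV internally. It assumes the first row is the header and, like the sample above, tolerates a space after each separator.

```python
import csv
import io


def parse_csv_with_header(text, delimiter=","):
    """Parse CSV text whose first row is a header into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        skipinitialspace=True,  # ignore the space after each separator
    )
    return list(reader)


# Comma-separated, with a quoted field that itself contains a comma.
comma_sample = 'name, age, height\ndavid, 35, 180\n"sandra,", 30, 180\n'
comma_rows = parse_csv_with_header(comma_sample)

# The same idea with '|' as the separator.
pipe_sample = 'name|age|height\n"a|b"|30|180\n'
pipe_rows = parse_csv_with_header(pipe_sample, delimiter="|")
```

Every value comes back as a string (for example `"35"`, not `35`), so any type conversion is a separate step.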

AWS Format

AWS uses a custom JSON layout for its services that emit log data, such as AWS Config and CloudTrail. The format looks as follows:

{
    "Records": [{
        "eventVersion": "1.0",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "EX_PRINCIPAL_ID",
            "arn": "arn:aws:iam::123456789012:user/Alice",
            "accessKeyId": "EXAMPLE_KEY_ID",
            "accountId": "123456789012",
            "userName": "Alice"
        },
        "eventTime": "2014-03-06T21:22:54Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "StartInstances",
        "awsRegion": "us-east-2",
        "sourceIPAddress": "205.251.233.176",
        "userAgent": "ec2-api-tools 1.6.12.2",
        "requestParameters": {
            "instancesSet": {
                "items": [{
                    "instanceId": "i-ebeaf9e2"
                }]
            }
        },
        "responseElements": {
            "instancesSet": {
                "items": [{
                    "instanceId": "i-ebeaf9e2",
                    "currentState": {
                        "code": 0,
                        "name": "pending"
                    },
                    "previousState": {
                        "code": 80,
                        "name": "stopped"
                    }
                }]
            }
        }
    }]
}

The format contains the key "Records" as a single top-level field, whose value is a JSON array containing JSON objects of various structures.
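The layout described above can be unpacked with a few lines of Python. This is a sketch under stated assumptions: `extract_records` and `to_ndjson` are hypothetical helper names, and the shortened sample document reuses only a few fields from the example above.

```python
import json


def extract_records(document_text):
    """Return the list stored under the top-level "Records" key."""
    document = json.loads(document_text)
    if not isinstance(document, dict) or not isinstance(document.get("Records"), list):
        raise ValueError('expected a JSON object with a "Records" array')
    return document["Records"]


def to_ndjson(records):
    """Serialize each record on its own line (NDJSON)."""
    return "\n".join(json.dumps(record) for record in records)


# Shortened sample, reusing field names from the example above.
aws_sample = """
{
    "Records": [
        {"eventName": "StartInstances", "awsRegion": "us-east-2"},
        {"eventName": "StopInstances", "awsRegion": "us-east-2"}
    ]
}
"""
aws_records = extract_records(aws_sample)
aws_ndjson = to_ndjson(aws_records)
```

Flattening the array this way yields one event per line, which is the same shape as the NDJSON format described earlier.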