gnat_store
Synopsis
Stores or uploads Parquet files locally or to a supported cloud service.
Description
The gnat_store tool is primarily used to store data in a repository where it can be easily queried.
This tool can store data in a local directory or upload it to a cloud service such as MotherDuck, AWS S3, or Google Cloud Storage.
It is designed to work with Parquet files, which are a columnar storage format that is efficient for analytical queries.
This tool implements the gnat command line interface and shares the same required and optional arguments as other GNAT tools.
Required Arguments
--output <output directory>
The --output argument specifies the location where the results of the GNAT tool will be saved.
This argument is required for all GNAT tools; the tool will not run without it.
Paths with an 'md:' prefix will upload the data to MotherDuck, while paths with an 's3://' prefix will upload the data to S3.
Other paths will store the data locally in a directory structure that is compatible with Hive partitioning.
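The prefix rules above can be sketched as a small shell function. This is only an illustration of how an --output value maps to a destination; classify_output is a hypothetical helper, not part of gnat_store, and it covers only the two prefixes documented here.

```shell
# Hypothetical helper (not part of gnat_store): mirrors the documented
# prefix rules for the --output argument.
classify_output() {
    case "$1" in
        md:*)   echo "motherduck" ;;  # 'md:' prefix uploads to MotherDuck
        s3://*) echo "s3" ;;          # 's3://' prefix uploads to S3
        *)      echo "local" ;;       # anything else is a local directory
    esac
}

classify_output md:flow              # prints "motherduck"
classify_output s3://mybucket/flow   # prints "s3"
classify_output /var/hive            # prints "local"
```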
Environment Variables
For MotherDuck, set the environment variable motherduck_token, then set --output to md:<table_name>.
For AWS S3, set the environment variables s3_region, s3_endpoint, s3_access_key_id, and s3_secret_access_key, then set --output to s3://<bucket>/<path>.
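Before starting an upload, it can help to confirm that the variables named above are set. The following is a minimal sketch of such a check; missing_vars is a hypothetical helper, not something gnat_store provides. It only tests that each variable is non-empty in the current shell and does not verify that it has been exported.

```shell
# Hypothetical helper (not part of gnat_store): print the name of every
# listed variable that is unset or empty in the current shell.
missing_vars() {
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

# Check the S3 variables listed in this section and warn on stderr.
missing=$(missing_vars s3_region s3_endpoint s3_access_key_id s3_secret_access_key)
if [ -n "$missing" ]; then
    echo "unset variables:" $missing >&2
fi
```

The same function can be called with motherduck_token before uploading to MotherDuck.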
Examples
1) Store Parquet files locally in Hive partition format
$ gnat_store --input /var/spool/input --output /var/hive --interval minute
2) Upload data to MotherDuck every minute
$ motherduck_token=<token>
$ export motherduck_token
$ gnat_store --input /var/spool/input --output md:flow --interval minute
3) Upload data to an S3 bucket in Parquet format every minute
$ s3_region=<region>
$ s3_endpoint=<endpoint>
$ s3_access_key_id=<access_key_id>
$ s3_secret_access_key=<secret_access_key>
$ export s3_region s3_endpoint s3_access_key_id s3_secret_access_key
$ gnat_store --input /var/spool/input --output s3://mybucket/flow --interval minute