Datasets API

Provides access to datasets that tutorials and production applications can consume easily.

Install

pip install --upgrade 'cratedb-toolkit[datasets]'

Synopsis

from cratedb_toolkit.datasets import load_dataset

dataset = load_dataset("tutorial/weather-basic")
print(dataset.ddl)

Usage

Built-in datasets

Load example datasets into CrateDB database tables.

from cratedb_toolkit.datasets import load_dataset

# Weather data example.
dataset = load_dataset("tutorial/weather-basic")
dataset.dbtable(dburi="crate://crate@localhost/", table="weather_data").load()

from cratedb_toolkit.datasets import load_dataset

# UK wind farm data example.
dataset = load_dataset("tutorial/windfarm-uk-info")
dataset.dbtable(dburi="crate://crate@localhost/", table="windfarms").load()

dataset = load_dataset("tutorial/windfarm-uk-data")
dataset.dbtable(dburi="crate://crate@localhost/", table="windfarm_output").load()
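
When several built-in datasets are needed at once, the calls above can be wrapped in a loop. The following is a minimal sketch, not part of the toolkit: `TUTORIAL_DATASETS` and `load_tutorial_datasets` are hypothetical names, and only the `load_dataset(...).dbtable(...).load()` chain shown above is used. It assumes a CrateDB instance is reachable at the given `dburi`.

```python
# Hypothetical mapping of dataset identifiers to target table names,
# taken from the examples above.
TUTORIAL_DATASETS = {
    "tutorial/weather-basic": "weather_data",
    "tutorial/windfarm-uk-info": "windfarms",
    "tutorial/windfarm-uk-data": "windfarm_output",
}


def load_tutorial_datasets(dburi="crate://crate@localhost/", datasets=None):
    """Load each dataset into its table (hypothetical helper)."""
    # Imported lazily so the mapping can be inspected without the package installed.
    from cratedb_toolkit.datasets import load_dataset

    datasets = TUTORIAL_DATASETS if datasets is None else datasets
    for name, table in datasets.items():
        # Same call chain as in the individual examples.
        load_dataset(name).dbtable(dburi=dburi, table=table).load()
```

Calling `load_tutorial_datasets()` with no arguments would target the local database used throughout this page.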

Kaggle

To access datasets on Kaggle, you need a Kaggle account.

Authentication

Either create a configuration file ~/.kaggle/kaggle.json in JSON format,

{"username":"acme","key":"134af98bdb0bd0fa92078d9c37ac8f78"}

or, alternatively, set these environment variables.

export KAGGLE_USERNAME=acme
export KAGGLE_KEY=134af98bdb0bd0fa92078d9c37ac8f78

Acquisition

Load a dataset from Kaggle into a CrateDB database table.

from cratedb_toolkit.datasets import load_dataset

dataset = load_dataset("kaggle://guillemservera/global-daily-climate-data/daily_weather.parquet")
dataset.dbtable(dburi="crate://crate@localhost/", table="kaggle_daily_weather").load()
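
The `kaggle://` identifier above reads as owner, dataset name, and file name. The following sketch splits such a URL with the standard library to make that structure explicit. `split_kaggle_url` is a hypothetical helper, and the three-part interpretation is an assumption based on the example URL; how the toolkit parses the identifier internally is not described here.

```python
from urllib.parse import urlparse


def split_kaggle_url(url):
    """Split a kaggle:// URL into owner, dataset, and file (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "kaggle":
        raise ValueError(f"Not a kaggle:// URL: {url}")

    # The host position carries the owner; the path carries dataset and file.
    owner = parsed.netloc
    dataset, _, filename = parsed.path.lstrip("/").partition("/")
    return {"owner": owner, "dataset": dataset, "file": filename}


parts = split_kaggle_url(
    "kaggle://guillemservera/global-daily-climate-data/daily_weather.parquet"
)
print(parts)
```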

In Practice

Please refer to those notebooks to learn how load_dataset works in practice.