Datasets API
Provides access to datasets that tutorials and production applications can consume easily.
Install
pip install --upgrade 'cratedb-toolkit[datasets]'
Synopsis
from cratedb_toolkit.datasets import load_dataset
dataset = load_dataset("tutorial/weather-basic")
print(dataset.ddl)
Usage
Built-in datasets
Load example datasets into CrateDB database tables.
from cratedb_toolkit.datasets import load_dataset
# Weather data example.
dataset = load_dataset("tutorial/weather-basic")
dataset.dbtable(dburi="crate://crate@localhost/", table="weather_data").load()
from cratedb_toolkit.datasets import load_dataset
# UK wind farm data example.
dataset = load_dataset("tutorial/windfarm-uk-info")
dataset.dbtable(dburi="crate://crate@localhost/", table="windfarms").load()
dataset = load_dataset("tutorial/windfarm-uk-data")
dataset.dbtable(dburi="crate://crate@localhost/", table="windfarm_output").load()
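When several built-in datasets should be loaded in one go, the calls above can be driven from a small mapping. The following is a minimal sketch, not part of the toolkit: the names `DATASETS` and `load_all` are hypothetical, while the dataset identifiers, table names, and database URI are the ones used in the examples above. Running `load_all()` assumes that `cratedb-toolkit[datasets]` is installed and that CrateDB is reachable on localhost.

```python
# Hypothetical mapping of dataset identifier -> target table name,
# using the identifiers and table names from the examples above.
DATASETS = {
    "tutorial/weather-basic": "weather_data",
    "tutorial/windfarm-uk-info": "windfarms",
    "tutorial/windfarm-uk-data": "windfarm_output",
}


def load_all(dburi="crate://crate@localhost/", datasets=None):
    """Load each dataset into its table (hypothetical helper)."""
    # Imported lazily so that the mapping can be inspected
    # without the package being installed.
    from cratedb_toolkit.datasets import load_dataset

    for name, table in (datasets or DATASETS).items():
        # Same call chain as in the individual examples above.
        load_dataset(name).dbtable(dburi=dburi, table=table).load()
```

Calling `load_all()` would then issue the same three `load()` calls as the individual examples.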
Kaggle
To access datasets on Kaggle, you need an account on the platform.
Authentication
Either create a configuration file ~/.kaggle/kaggle.json in JSON format,
{"username":"acme","key":"134af98bdb0bd0fa92078d9c37ac8f78"}
or, alternatively, set these environment variables.
export KAGGLE_USERNAME=acme
export KAGGLE_KEY=134af98bdb0bd0fa92078d9c37ac8f78
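Before starting a longer import, it can help to check that one of the two credential sources is actually in place. The following is a minimal sketch using only the Python standard library. The function name `find_kaggle_credentials` is hypothetical, and the lookup order (environment variables first, then the configuration file) is an assumption of this sketch, not a statement about how the Kaggle client resolves credentials.

```python
import json
import os
from pathlib import Path


def find_kaggle_credentials(env=None, config_path=None):
    """Return (username, key) if credentials are configured, else None.

    Hypothetical helper: checks KAGGLE_USERNAME / KAGGLE_KEY first,
    then falls back to ~/.kaggle/kaggle.json (order is an assumption).
    """
    env = os.environ if env is None else env
    if config_path is None:
        config_path = Path.home() / ".kaggle" / "kaggle.json"
    config_path = Path(config_path)

    # Source 1: environment variables.
    username, key = env.get("KAGGLE_USERNAME"), env.get("KAGGLE_KEY")
    if username and key:
        return username, key

    # Source 2: JSON configuration file with "username" and "key" fields.
    if config_path.is_file():
        data = json.loads(config_path.read_text())
        if data.get("username") and data.get("key"):
            return data["username"], data["key"]

    return None
```

A script could call `find_kaggle_credentials()` up front and abort with a clear message when it returns `None`.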
Acquisition
Load a dataset from Kaggle into a CrateDB database table.
from cratedb_toolkit.datasets import load_dataset
dataset = load_dataset("kaggle://guillemservera/global-daily-climate-data/daily_weather.parquet")
dataset.dbtable(dburi="crate://crate@localhost/", table="kaggle_daily_weather").load()
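The `kaggle://` identifier in the example appears to follow the pattern owner, dataset name, file name. The following sketch takes such an identifier apart with the standard library, which can be handy for validating user input before calling `load_dataset`. Both the function name `split_kaggle_uri` and the three-part interpretation are assumptions inferred from the single example above; how the toolkit itself parses the identifier is not documented here.

```python
from urllib.parse import urlsplit


def split_kaggle_uri(uri):
    """Split a kaggle:// identifier into (owner, dataset, filename).

    Hypothetical helper; the owner/dataset/file layout is an assumption
    based on the example identifier, not a documented format.
    """
    parts = urlsplit(uri)
    if parts.scheme != "kaggle":
        raise ValueError(f"Not a kaggle:// identifier: {uri!r}")
    owner = parts.netloc
    # The path looks like "/<dataset>/<filename>".
    dataset, _, filename = parts.path.lstrip("/").partition("/")
    return owner, dataset, filename


owner, dataset_name, filename = split_kaggle_uri(
    "kaggle://guillemservera/global-daily-climate-data/daily_weather.parquet"
)
print(owner, dataset_name, filename)
```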
In Practice
Please refer to these notebooks to learn how load_dataset works in practice.