Main Backlog

Iteration +1

  • Table Loader: Refactor URL dispatcher, use fsspec

  • Table Loader/Docs: Advertise using OCI image

  • MongoDB: Load table with querying by single object id

  • MongoDB: Multi-phase BulkProcessor batch size adjustments

  • MongoDB: Report byte sizes (cur/avg/total) in progress bar

  • Documentation:

    The procedure employed by CTK uses the catch-all OBJECT(DYNAMIC) storage strategy, which stores each source record/document in a single column in CrateDB.

    The transformation recipe outlines a few features provided by Zyp Transformations; in this case, it exclusively applies transformations described by expressions written in jqlang.

  • MongoDB/Docs: Describe usage of mongoimport and mongoexport.

    mongoimport --uri 'mongodb+srv://MYUSERNAME:SECRETPASSWORD@mycluster-ABCDE.azure.mongodb.net/test?retryWrites=true&w=majority'
    
  • MongoDB: Convert dates like "date": "Sep 18 2015", see testdrive.city_inspections.

  • Table Loader: Propagate offset/limit to progress bar

  • Multi-file import from Tar files? https://github.com/crate/crate/issues/17770
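
A minimal sketch for the MongoDB date conversion item above, assuming the values always follow the "Sep 18 2015" pattern (abbreviated English month, day, four-digit year). The function name parse_inspection_date is hypothetical and not part of CTK; it only illustrates the conversion using the Python standard library.

```python
from datetime import datetime


def parse_inspection_date(value: str) -> datetime:
    """Parse a string like 'Sep 18 2015' into a naive datetime at midnight.

    Note: `%b` depends on the process locale; this sketch assumes
    English month abbreviations.
    """
    return datetime.strptime(value, "%b %d %Y")


# Hypothetical record, shaped like the example in the backlog item.
record = {"date": "Sep 18 2015"}
record["date"] = parse_inspection_date(record["date"])
print(record["date"].isoformat())
```

Values that do not match the pattern raise ValueError, so a real loader would need to decide whether to skip, log, or pass such records through unchanged.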

Iteration +2

  • Address fix_job_info_table_name

  • Add more items about ctk load table to examples/ folder

    • Python, Bash

  • Cloud: Parallelize import jobs?

  • Bug: Use CRATEDB_USERNAME=admin from cluster-info

  • Cloud: Tests for uploading a local file

  • Cloud: Use .ini file and keyring for storing CrateDB Cloud Cluster ID and credentials

  • Cloud: List RUNNING/FAILED/SUCCEEDED jobs

  • Cloud: Sanitize file name yc.2019.07-tiny.parquet to be accepted as table name

  • ctk load table: Accept offset/limit and start/stop options

  • UX: Unlock testdata:// data sources from influxio

  • UX: Do not display stack traces on cratedb_toolkit.util.croud.CroudException: 401 - Unauthorized

  • UX: Explain cratedb_toolkit.util.croud.CroudException: Another cluster operation is currently in progress, please try again later.

  • UX: Explain cratedb_toolkit.util.croud.CroudException: Resource not found. when accessing unknown cluster id.

  • UX: Make ctk list-jobs respect "status": "SUCCEEDED" etc.

  • UX: Improve textual report from ctk load table

  • UX: Accept alias --format {jsonl,ndjson} for --format json_row

  • Catch recursion errors:

    CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/
    
  • CLI: Verify exit codes.

  • UX: Rename ctk cluster info to ctk status cluster --id=foo-bar-baz

  • UX: Add ctk start cluster --id=foo-bar-baz

  • UX: Provide Bash/zsh completion

  • Beautify list-jobs output

  • ctk list-clusters

  • Store CRATEDB_CLOUD_CLUSTER_ID into cratedb_toolkit.constants

  • Cloud Tests: Verify file uploads

  • Docs: Add examples in more languages: Java, JavaScript, Lua, PHP

  • Docs:

  • Kafka:

  • CTK INFO/CFR

  • Migrate / I/O adapter
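
A possible approach to the "Cloud: Sanitize file name" item above, as a sketch only. The helper table_name_from_filename is hypothetical, and the rule it applies (lowercase letters, digits, and underscores only) is a deliberately conservative assumption, not a statement of CrateDB's actual naming rules.

```python
import re
from pathlib import Path


def table_name_from_filename(filename: str) -> str:
    """Derive a conservative table name from a file name.

    Assumption: only lowercase letters, digits, and underscores are kept.
    """
    # Drop the directory part and the last suffix: "yc.2019.07-tiny".
    stem = Path(filename).stem
    # Replace each run of other characters with a single underscore.
    name = re.sub(r"[^a-z0-9_]+", "_", stem.lower()).strip("_")
    # Avoid empty names and names starting with a digit.
    if not name or name[0].isdigit():
        name = "t_" + name
    return name


print(table_name_from_filename("yc.2019.07-tiny.parquet"))
```

Different file names can map to the same table name under this rule, so a real implementation would probably need to report or resolve collisions.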

Iteration +2.5

  • Retention: Improve retention subsystem CLI API.

    ctk retention create-policy lalala
    ctk materialized create lalala
    ctk schedule add lalala
    
  • Retention: Make --cutoff-day optional, use today() as default.

  • Retention: Refactor “partition”-based strategies into subfamily/category, in order to make room for other types of strategies not necessarily using partitioned tables.

  • Retention: Add examples/retention_tags.py.
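
One way the optional --cutoff-day item above could behave, sketched with argparse from the standard library to keep the example self-contained. The actual CLI may be built differently; the program name retention-sketch and the function parse_args are assumptions made for illustration.

```python
import argparse
from datetime import date


def parse_args(argv=None) -> argparse.Namespace:
    """Parse CLI arguments; --cutoff-day is optional and defaults to today."""
    parser = argparse.ArgumentParser(prog="retention-sketch")
    parser.add_argument(
        "--cutoff-day",
        type=date.fromisoformat,  # expects ISO format, e.g. 2023-06-27
        default=None,
        help="Cutoff day in ISO format; defaults to the current day.",
    )
    args = parser.parse_args(argv)
    # Resolve the default at call time, not when the parser is defined.
    if args.cutoff_day is None:
        args.cutoff_day = date.today()
    return args


print(parse_args(["--cutoff-day", "2023-06-27"]).cutoff_day)
print(parse_args([]).cutoff_day)
```

Resolving the default inside the function keeps long-running processes from reusing a stale date computed at import time.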

Iteration +3

Iteration +4

  • Add two non-partition-based strategies. Category: timerange.

Iteration +5

Iteration +6

  • Review SQL queries: What about details like ORDER BY 5 ASC?

  • Use SQLAlchemy as query builder, to prevent SQL injection (S608), see render_delete.py spike.

  • Improve configurability by offering to configure schema names and such.

  • Document how to run multi-tenant operations using “tags”.

  • Add an audit log ("ext"."jobs_log"), which records events when retention policy rules are changed or executed.

  • Add Webhooks, to connect to other systems

  • Document usage with Kubernetes, and Nomad/Waypoint.

  • Job progress

Iteration +7

  • More packaging: Use fpm

  • More packaging: What about an Ubuntu Snap, a Helm chart, or a Nomad Pack?

  • Clarify how to interpret the --cutoff-day option.

  • Add policy rule editor UI.

  • Is “day” granularity fine for all use cases, or should it be generalized?

  • Currently, the test for the reallocate strategy apparently does not remove any records, probably because the scenario can’t easily be simulated on a single-node cluster.

  • Ship more package variants: rpm, deb, snap, buildpack?

  • Verify Docker setup on Windows

Done

  • Use a dedicated schema for retention policy tables, other than doc.

  • Refactoring: Manifest the “retention policy” as code entity, using dataclasses, or SQLAlchemy.

  • Document how to connect to CrateDB Cloud

  • Add DatabaseAddress entity, with .safe property to omit passwords, if present

  • Document library and Docker use

  • README: Add a good header, with links to relevant resources

  • Naming things: Use “toolkit” instead of “manager”.

  • Document the layout of the retention policy entity, and the meaning of its attributes.

  • CI: Rename OCI workflow build steps.

  • Move strategy column to first position of retention policy table, and update all corresponding occurrences.

  • Add “tags” to data model, for grouping, multi-tenancy, and more.

  • Improve example

  • Introduce database and CLI API for editing records

  • List all tags

  • Examples: Add “full” example to basic.py, rename to full.py

  • Improve tests by using generate_series

  • Document compact invocation, after applying an alias and exporting an environment variable: cratedb-retention rm --tags=baz

  • Default value for "${CRATEDB_URI}" aka. dburi argument

  • Add a check whether the data table(s) exist.

  • Dissolve JOIN-based retention task gathering: when the application does not discover any retention policy job, it cannot distinguish between “no retention policy” and “no data”, and thus cannot report accordingly.

  • CLI: Provide --dry-run option

  • Docs: Before running the examples, cratedb-retention setup --schema=examples needs to be invoked

  • For testing the snapshot strategy, provide an embedded MinIO S3 instance to the test suite.

  • Improve SNAPSHOT testing: Microsoft Azure Blob Storage

  • Improve SNAPSHOT testing: Filesystem

  • UX: Refactoring towards cratedb-toolkit.

  • UX: ctk load: Clearly disambiguate between loading data into RDBMS database tables, blob tables, or filesystem objects.

    ctk load table https://s3.amazonaws.com/my.import.data.gz
    
    ctk load blob /path/to/image.png
    
    ctk load object /local/path/to/image.png /dbfs/assets
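
To illustrate the DatabaseAddress item in the list above, here is a sketch of how a .safe property could mask a password in a connection URI. It is not the actual CTK implementation: the dataclass layout, the REDACTED placeholder, and the use of urllib.parse are assumptions chosen to keep the example within the standard library.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class DatabaseAddress:
    """Hypothetical stand-in for an address entity wrapping a database URI."""

    uri: str

    @property
    def safe(self) -> str:
        """Return the URI with the password masked, if one is present."""
        parts = urlsplit(self.uri)
        if parts.password is None:
            return self.uri
        # Rebuild the network location without the secret.
        # Note: `hostname` is lowercased and IPv6 brackets are dropped.
        host = parts.hostname or ""
        if parts.port is not None:
            host += f":{parts.port}"
        netloc = f"{parts.username or ''}:REDACTED@{host}"
        return urlunsplit(parts._replace(netloc=netloc))


address = DatabaseAddress("crate://admin:secret@localhost:4200/")
print(address.safe)
```

Logging address.safe instead of address.uri keeps credentials out of log output and error reports.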