Natural language (NLSQL)

Talk to your data in natural language.

The CrateDB NLSQL package helps agents turn natural language into database queries. It is similar to Vanna AI or Google’s QueryData, but tailored to CrateDB.

About

NLSQL provides a straightforward way to turn natural language into executable SQL by combining an LLM with explicit database context. It positions itself as an execution layer for data agents: agents handle reasoning and orchestration, while the NLSQL layer reliably generates, checks, and runs SQL against databases, returning results for downstream actions.

The trade-off is explicit: you shift effort from prompt tuning to context engineering and maintenance, but gain near-100% accuracy, stronger guardrails, and production reliability, especially for multistep or mission-critical workflows where probabilistic errors are unacceptable.

Install

uv pip install --upgrade 'cratedb-toolkit[nlsql]'

Synopsis

ctk query nlsql \
    --cluster-url="crate://crate@localhost:4200/?ssl=false" \
    --llm-provider="<provider-name>" \
    --llm-name="<model-name>" \
    --llm-api-key="<your-api-key>" \
    "What is the average value for sensor 1?"

Coverage

Providers

NLSQL supports a range of providers: Amazon Bedrock (+ Converse), Anthropic, Azure OpenAI, Google AI, Hugging Face Inference API, llamafile, Mistral, Ollama, OpenAI, OpenRouter, and Runpod Serverless (OpenAI-compatible).

Models

You can select any model offered by the providers listed above. We recommend Gemini, Gemma 3, Llama 3.1, Qwen 2.5, or later models, for example Gemma-3-1B, Llama-3.2-1B-Instruct, or Qwen3.5-0.8B.

Details

The NLSQL interface works by wrapping a SQL database and exposing a query interface where plain-language questions are translated into SQL, executed, and returned as answers. Developers configure the engine with a database connection and a bounded set of tables, ensuring the model generates queries only within a known schema and avoids context overflow.

The procedure follows a schema-grounded approach: the engine injects table structure (and optionally examples or retrieved context) into the prompt so the LLM can synthesize accurate queries instead of guessing. It can also integrate with retrieval components to dynamically select relevant tables or augment prompts at query time for more complex setups.
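The schema-grounded approach described above can be illustrated with a minimal sketch. It is not the package's actual prompt template: the `Column`, `Table`, `render_schema`, and `build_prompt` names, the prompt wording, and the `sensor_readings` table are hypothetical and chosen only for illustration. The sketch shows how a bounded set of tables is rendered into the prompt so the model sees only the schema it is allowed to query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple


def render_schema(tables):
    """Render each table as one compact line of column names and types."""
    lines = []
    for table in tables:
        cols = ", ".join(f"{c.name} {c.type}" for c in table.columns)
        lines.append(f"Table {table.name} ({cols})")
    return "\n".join(lines)


def build_prompt(question, tables, allowed=None):
    """Build a prompt that contains only the permitted tables' structure."""
    if allowed is not None:
        # Bounding the table set keeps the prompt small and the schema known.
        tables = [t for t in tables if t.name in allowed]
    return (
        "Translate the question into a single CrateDB SQL SELECT statement.\n"
        "Use only the tables and columns listed below.\n\n"
        f"Schema:\n{render_schema(tables)}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )


# Hypothetical tables, for illustration only.
tables = [
    Table("sensor_readings", (Column("sensor_id", "INTEGER"), Column("value", "DOUBLE PRECISION"))),
    Table("employees", (Column("name", "TEXT"),)),
]
prompt = build_prompt(
    "What is the average value for sensor 1?", tables, allowed={"sensor_readings"}
)
print(prompt)
```

Restricting `allowed` to the tables relevant for a question is one simple way to avoid context overflow; a retrieval component could compute that set dynamically.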

The engine acts as a thin orchestration layer for Text-to-SQL purposes, and for building NLSQL systems: it handles prompt construction, query generation, execution, and result formatting, while leaving control, safety (e.g., read-only roles), and schema design to the developer.
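As a rough sketch of such a thin orchestration layer, the following example chains prompt construction, query generation, execution, and result formatting. It rests on several assumptions: the `answer` and `fake_llm` functions are hypothetical, the LLM is replaced by a stub that returns a fixed statement, and an in-memory SQLite database stands in for CrateDB so that the example runs without a server. It does not reflect the package's internal design.

```python
import sqlite3


def answer(question, generate_sql, connection):
    """Run one question through prompt -> SQL -> execution -> formatted result."""
    prompt = f"Translate into SQL: {question}"       # prompt construction
    sql = generate_sql(prompt)                        # query generation (LLM call)
    cursor = connection.execute(sql)                  # execution
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    return {"question": question, "sql": sql, "columns": columns, "rows": rows}


def fake_llm(prompt):
    """Stub standing in for a real LLM; always returns the same statement."""
    return "SELECT AVG(value) AS average FROM sensor_readings WHERE sensor_id = 1"


# SQLite is used here only as a stand-in for a CrateDB connection.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sensor_readings (sensor_id INTEGER, value REAL)")
connection.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 99.0)],
)

result = answer("What is the average value for sensor 1?", fake_llm, connection)
print(result["columns"], result["rows"])
```

Passing the generator and the connection in as arguments keeps control with the developer: the same function works with any model, and the connection can belong to a restricted, read-only role.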

Security

Anyone building a Text-to-SQL application should be aware that executing arbitrary SQL queries is a security risk. Take precautions as needed, such as using restricted roles, read-only databases, or sandboxing.
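One way to set up a restricted role is sketched below. The helper only builds the SQL statements; it does not execute them. It assumes CrateDB's `CREATE USER ... WITH (password = ...)` and `GRANT DQL ON TABLE ... TO ...` syntax, so verify both against the documentation for your CrateDB version. The function name, the `nlsql_reader` user, and the `doc.sensor_readings` table are hypothetical.

```python
import re

# Accept only plain lowercase identifiers to keep the generated SQL safe.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def _check_identifier(name):
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def read_only_role_statements(user, password, tables):
    """Return SQL statements that create a user with query-only privileges."""
    user = _check_identifier(user)
    escaped = password.replace("'", "''")  # escape single quotes in the literal
    statements = [f"CREATE USER {user} WITH (password = '{escaped}')"]
    for table in tables:
        # Tables must be schema-qualified, e.g. "doc.sensor_readings".
        schema, _, name = table.partition(".")
        qualified = f"{_check_identifier(schema)}.{_check_identifier(name)}"
        # DQL covers read queries only; no DML or DDL privileges are granted.
        statements.append(f"GRANT DQL ON TABLE {qualified} TO {user}")
    return statements


for statement in read_only_role_statements(
    "nlsql_reader", "s3cr'et", ["doc.sensor_readings"]
):
    print(statement)
```

Run the resulting statements as an administrative user, then point the NLSQL connection at the restricted user instead.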

While we recommend using a dedicated read-only user or role to enforce safety at the database level, CrateDB NLSQL also guards against Prompt-to-SQL injection by default: it classifies each generated SQL statement and permits only SELECT statements.
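To make the idea of statement classification concrete, here is a deliberately conservative sketch. It is not the classifier the package uses: `is_select_only` is a hypothetical helper based on a simple keyword check, whereas a production-grade guard would rely on a real SQL parser. The sketch rejects anything that is not a single statement starting with SELECT.

```python
import re

_STARTS_WITH_SELECT = re.compile(r"select\b", re.IGNORECASE)


def is_select_only(sql):
    """Return True only for a single statement that begins with SELECT."""
    statement = sql.strip()
    # Allow one trailing semicolon, then refuse any other semicolon.
    if statement.endswith(";"):
        statement = statement[:-1].rstrip()
    if not statement or ";" in statement:
        # Empty input or stacked statements such as "SELECT 1; DROP TABLE t".
        return False
    return _STARTS_WITH_SELECT.match(statement) is not None


print(is_select_only("SELECT AVG(value) FROM sensor_readings WHERE sensor_id = 1;"))
print(is_select_only("SELECT 1; DROP TABLE sensor_readings"))
print(is_select_only("DROP TABLE sensor_readings"))
```

The check errs on the side of rejection: statements with a leading comment, a `WITH` clause, or a semicolon inside a string literal are refused even when they are harmless. That is an acceptable trade-off for a guard, but it is another reason to combine it with a read-only role.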

To relax that default and permit all statement types, set the permit_all_statements API argument or the NLSQL_PERMIT_ALL_STATEMENTS environment variable to a true boolean value. Only enable this flag when you are sure you want this behaviour.
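How such a flag might be resolved is sketched below. Only the names `permit_all_statements` and `NLSQL_PERMIT_ALL_STATEMENTS` come from the text above; the `resolve_permit_all_statements` helper, the accepted spellings of boolean values, and the rule that an explicit argument takes precedence over the environment are assumptions made for illustration.

```python
import os

# Assumed spellings; the package may accept a different set.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def resolve_permit_all_statements(permit_all_statements=None, environ=None):
    """Resolve the flag from an explicit argument, else from the environment."""
    if permit_all_statements is not None:
        return bool(permit_all_statements)
    environ = os.environ if environ is None else environ
    raw = environ.get("NLSQL_PERMIT_ALL_STATEMENTS", "").strip().lower()
    if raw in _TRUE:
        return True
    if raw in _FALSE:
        # An unset or empty variable keeps the safe default: SELECT only.
        return False
    raise ValueError(f"not a boolean value: {raw!r}")


print(resolve_permit_all_statements(environ={}))
print(resolve_permit_all_statements(environ={"NLSQL_PERMIT_ALL_STATEMENTS": "true"}))
```

Failing loudly on an unrecognized value, rather than silently treating it as true, keeps a typo from disabling the guard.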

Examples

The NLSQL with sensor data example demonstrates a basic database inquiry, using the question »What is the average value for sensor 1?« to retrieve information from a single table.

The NLSQL with employee data, NLSQL with product orders, and NLSQL with weather data examples demonstrate other kinds of queries.