
In a previous article, we explored the idea of chat‑first access to data and how Queryless makes it possible to interact with datasets through conversation (see: https://datahub.io/blog/data-portals-to-chat-first-access).
In this article, we focus on a specific part of that system: how Queryless analyzes raw data files.
Queryless lets anyone explore data using natural language—like asking "What were our sales last month?" in chat. Behind the scenes, it generates analytical SQL queries and runs them on DuckDB to turn files into fast, interactive insights. But before looking at how Queryless analyzes raw files, it helps to understand the challenge teams face when working with them.
👉 Learn more and try Queryless: visit https://querylessai.com.
The Problem
Organizations accumulate data in many everyday formats that often become the starting point for analysis, for example:
- CSV exports from tools like Stripe, HubSpot, or Shopify
- Excel or spreadsheet files shared between teams
- analytics exports from internal tools
- log or event data
These files often end up stored in cloud drives, data folders, or shared storage. Extracting insights from them usually requires loading the data into a database, writing SQL queries, or building dashboards.
Queryless approaches this differently by allowing teams to query those files directly.
Talking to Your Data Files
Instead of manually loading files into databases or building dashboards first, Queryless lets anyone ask questions about their data directly through chat interfaces such as Telegram.
When the data source is a raw file, Queryless turns a question into an answer through a simple pipeline:
- A user asks a question about the data
- Queryless determines which dataset is relevant
- The file is retrieved from its storage location (for example S3, Google Drive, or another file system)
- The file is downloaded and cached locally
- Queryless reads the file metadata (such as schema, column names, and types)
- A SQL query is generated from the user's question
- The query is executed using DuckDB
- Results are returned as tables, summaries, or charts
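The download, cache, and metadata steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Queryless's actual implementation: a local file copy stands in for the download from S3 or Google Drive, the type guessing is deliberately crude, and the names `cache_file`, `guess_type`, and `read_csv_schema` are hypothetical.

```python
import csv
import hashlib
import shutil
from pathlib import Path


def cache_file(source: Path, cache_dir: Path) -> Path:
    """Copy `source` into `cache_dir` once and reuse the copy afterwards.

    Assumption: a local copy stands in for the real remote download.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cached copy by a hash of the source location.
    key = hashlib.sha256(str(source).encode("utf-8")).hexdigest()[:16]
    cached = cache_dir / f"{key}{source.suffix}"
    if not cached.exists():
        shutil.copyfile(source, cached)
    return cached


def guess_type(values: list[str]) -> str:
    """Guess a column type from sample values: INTEGER, DOUBLE, or VARCHAR."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"
    for type_name, cast in (("INTEGER", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "VARCHAR"


def read_csv_schema(path: Path, sample_rows: int = 100) -> dict[str, str]:
    """Return {column name: guessed type} from the header and a row sample."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Read at most `sample_rows` rows; the header is consumed first.
        rows = [row for _, row in zip(range(sample_rows), reader)]
        columns = reader.fieldnames or []
    return {c: guess_type([r[c] or "" for r in rows]) for c in columns}
```

Calling `read_csv_schema(cache_file(source, cache_dir))` returns a mapping of column names to guessed types, which is the kind of metadata the query-generation step needs. In this sketch dates are left as VARCHAR; a real engine would detect them as well.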
Raw datasets can live in many storage systems—such as Amazon S3, Google Drive, other cloud storage services, or internal file servers—and Queryless can work with common file formats like CSV, Excel (XLSX), Parquet, and JSON.
This allows Queryless to run analytical queries directly on the retrieved files without loading them into a separate data warehouse or pipeline. But how does Queryless actually turn a question into a SQL query?

Query Generation and Execution
Once the dataset is available, Queryless converts user questions into SQL queries and executes them using DuckDB.
For example, a user might ask:
What were the total sales in March?
Queryless first identifies the relevant dataset, then inspects its metadata (such as column names and schema) to generate the query needed to answer the question.
Example:
SELECT
  SUM(sales) AS total_sales
FROM transactions
WHERE date >= '2024-03-01'
  AND date < '2024-04-01';
This query is then executed by DuckDB, the analytical engine used by Queryless.
DuckDB is well suited to this workload because it:
- executes analytical SQL quickly, thanks to columnar, vectorized execution
- can query data files directly
- requires minimal infrastructure
This allows Queryless to analyze raw datasets efficiently while keeping the architecture lightweight.
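To make the date filter in the example query concrete, here is a small sketch of how "March" can become a half-open range (first day of the month up to, but not including, the first day of the next month). Several things here are assumptions for illustration: Python's built-in `sqlite3` module stands in for DuckDB so the sketch runs without extra dependencies, the sample rows are invented, the year 2024 is taken from the example query, and `month_range` and `total_sales` are hypothetical helper names.

```python
import sqlite3
from datetime import date


def month_range(year: int, month: int) -> tuple[str, str]:
    """Return ISO dates: first day of the month, first day of the next month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


def total_sales(conn: sqlite3.Connection, year: int, month: int) -> float:
    """Sum `sales` for one calendar month using a half-open date filter."""
    start, end = month_range(year, month)
    row = conn.execute(
        "SELECT SUM(sales) AS total_sales FROM transactions "
        "WHERE date >= ? AND date < ?",
        (start, end),
    ).fetchone()
    # SUM over zero rows yields NULL, so fall back to 0.0.
    return row[0] or 0.0


# Hypothetical sample rows; dates are stored as ISO text, so string
# comparison matches chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (date TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("2024-02-29", 10.0),
        ("2024-03-01", 100.0),
        ("2024-03-31", 50.5),
        ("2024-04-01", 7.0),
    ],
)
print(total_sales(conn, 2024, 3))  # only the two March rows are summed
```

The half-open range avoids having to know how many days the month has, and it still behaves correctly if the column later holds timestamps rather than plain dates.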
Debugging Queryless Answers and Charts
Once answers are generated, it is often useful to understand how they were produced.
If a result or chart looks unexpected, you can ask Queryless to show the SQL behind it.
Example prompt:
Show the SQL used for this result.
Queryless might return:
SELECT
  DATE_TRUNC('month', order_date) AS month,
  SUM(revenue) AS total_revenue
FROM orders
GROUP BY 1
ORDER BY 1;
This transparency allows users to verify results or reuse the query elsewhere.

Queryless Can Learn Facts About Your Data
While inspecting queries helps users understand results, Queryless can also learn from user feedback.
Datasets often contain ambiguous column names, such as:
- value
- amount
- score
Clarifying what such columns mean improves the accuracy of future query generation.
If a result looks incorrect, users can ask Queryless how it produced the answer:
Explain how you calculated this result.
Queryless can describe which columns, aggregations, and filters were used.
If the user notices a mistake, they can correct the definition. For example:
When I say revenue, I mean the column "total_amount".
Please remember that definition.
Queryless can store this information and apply it to future queries, gradually learning how the dataset should be interpreted.
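One way to picture this kind of memory is a small glossary that maps the words a user says to the columns they mean, persisted so it survives between conversations. The sketch below is an assumption about how such a store could look, not a description of Queryless internals; the `Glossary` class, its methods, and the JSON file format are all hypothetical.

```python
import json
from pathlib import Path


class Glossary:
    """Maps user terms to column names and persists them as JSON."""

    def __init__(self, path: Path):
        self.path = path
        self.terms: dict[str, str] = {}
        if path.exists():
            self.terms = json.loads(path.read_text(encoding="utf-8"))

    def remember(self, term: str, column: str) -> None:
        """Store a definition and write it to disk immediately."""
        self.terms[term.strip().lower()] = column
        self.path.write_text(json.dumps(self.terms, indent=2), encoding="utf-8")

    def resolve(self, term: str) -> str:
        """Return the remembered column, or the term itself if unknown."""
        return self.terms.get(term.strip().lower(), term)

    def as_context(self) -> str:
        """Render the definitions as text that can accompany a question."""
        return "\n".join(
            f'"{term}" means column "{column}"'
            for term, column in sorted(self.terms.items())
        )
```

After `glossary.remember("revenue", "total_amount")`, a later `glossary.resolve("Revenue")` returns `total_amount`, and `as_context()` produces a short description that could be supplied alongside the schema when the next query is generated.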
Conclusion
Queryless combines several technologies to make raw data easier to explore:
- natural language interfaces
- automated SQL generation
- DuckDB for fast analytical queries
- flexible access to raw data files
Together these components allow teams to move from raw files to insights through a chat‑first approach to data access—echoing the broader idea explored in our earlier article. Instead of building pipelines and dashboards first, teams can start by simply asking questions and iterating from there.
👉 To learn more or try it yourself, visit https://querylessai.com.