FAQ

How does BlendSQL execute a query?

BlendSQL traverses the query's SQL abstract syntax tree (AST) and creates temporary tables as needed to execute it. Because this logic lives in BlendSQL rather than inside the database engine, BlendSQL is DBMS-agnostic and can be extended to SQLite, PostgreSQL, and other DBMSs.
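To make the temporary-table idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not BlendSQL's implementation. The ingredient output is a hard-coded dict standing in for an LLM call, and the temp table name `ingredient_0` and its schema are hypothetical. The point is that once ingredient results are materialized, the rest of the query is ordinary SQL that any DBMS can run.

```python
import sqlite3

# Hypothetical ingredient output; in BlendSQL this would come from an LLM.
ingredient_output = {"Celtics": "Massachusetts", "Lakers": "California"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE w (team TEXT, num_championships INTEGER)")
conn.executemany(
    "INSERT INTO w VALUES (?, ?)",
    [("Celtics", 18), ("Lakers", 17), ("Raptors", 1)],
)

# Step 1: materialize the ingredient's output as a temporary table
# (the name ingredient_0 is an assumption for this sketch).
conn.execute("CREATE TEMP TABLE ingredient_0 (team TEXT PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO ingredient_0 VALUES (?, ?)", ingredient_output.items())

# Step 2: run a rewritten, plain-SQL query that references the temp table
# wherever the ingredient expression used to be.
rows = conn.execute(
    """
    SELECT w.team, ingredient_0.value
    FROM w JOIN ingredient_0 ON w.team = ingredient_0.team
    WHERE w.num_championships > 3
    ORDER BY ingredient_0.value
    """
).fetchall()
print(rows)
```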

Why not just implement BlendSQL as a user-defined function in SQLite?

LLMs are expensive, both in monetary cost and in compute time. When applying them to SQLite databases, we want to take special care that we never apply them to contexts where they aren't required. This is not easily achievable with user-defined functions (UDFs), even when they are marked as deterministic.
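The sketch below illustrates one limitation of the UDF route, using Python's `sqlite3.Connection.create_function` (the `deterministic` flag requires Python 3.8+). The function `fake_llm_map` is a hypothetical stand-in for an LLM call that simply records its inputs. SQLite hands a scalar UDF one value per call, so even with `deterministic=True` there is no natural hook for batching the qualifying rows into a single prompt.

```python
import sqlite3

calls = []  # records every value the UDF is invoked with

def fake_llm_map(team):
    # Hypothetical stand-in for an LLM call; a real one costs money and time.
    calls.append(team)
    return {"Celtics": "Massachusetts", "Lakers": "California"}.get(team, "Unknown")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE w (team TEXT, num_championships INTEGER)")
conn.executemany(
    "INSERT INTO w VALUES (?, ?)",
    [("Celtics", 18), ("Lakers", 17), ("Raptors", 1)],
)
# deterministic=True marks the function as pure for SQLite's planner,
# but SQLite still calls it separately for each row, one value at a time.
conn.create_function("llm_map", 1, fake_llm_map, deterministic=True)

rows = conn.execute(
    "SELECT team, llm_map(team) FROM w WHERE num_championships > 3"
).fetchall()
print(rows)
print(calls)  # one entry per qualifying row: two separate "LLM" calls
```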

BlendSQL is specifically designed to enforce an order of operations that 1) executes vanilla SQL operations first, and 2) caches results from LLM ingredients so they don't need to be recomputed. For example:
SELECT {{LLMMap('What state is this NBA team from?', 'w::team')}} FROM w
   WHERE num_championships > 3
   ORDER BY {{LLMMap('What state is this NBA team from?', 'w::team')}}
BlendSQL makes sure to pass to the LLM only the team values from rows that satisfy the condition num_championships > 3. Additionally, since we assume the function is deterministic, we make a single LLM call and cache the results, even though the ingredient appears twice in the query.
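The order of operations described above can be sketched in plain Python. This is an illustration of the idea, not BlendSQL's internals: `fake_llm_batch` is a hypothetical batched LLM call, and keying the cache on the question plus the sorted input values is an assumption made for this sketch. Vanilla SQL runs first to filter rows, the surviving values are sent in one batch, and the second use of the same ingredient is served from the cache.

```python
import sqlite3

batch_log = []  # one entry per (fake) LLM request
cache = {}      # (question, sorted input values) -> {value: answer}

def fake_llm_batch(question, values):
    """Hypothetical batched LLM call: answers every value in one request."""
    batch_log.append(tuple(values))
    lookup = {"Celtics": "Massachusetts", "Lakers": "California"}
    return {v: lookup.get(v, "Unknown") for v in values}

def llm_map(question, values):
    # Reuse an earlier result when the same question and inputs reappear.
    key = (question, tuple(sorted(values)))
    if key not in cache:
        cache[key] = fake_llm_batch(question, values)
    return cache[key]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE w (team TEXT, num_championships INTEGER)")
conn.executemany(
    "INSERT INTO w VALUES (?, ?)",
    [("Celtics", 18), ("Lakers", 17), ("Raptors", 1)],
)

# 1) Vanilla SQL first: only rows passing the WHERE clause reach the LLM.
teams = [r[0] for r in conn.execute(
    "SELECT DISTINCT team FROM w WHERE num_championships > 3")]

question = "What state is this NBA team from?"
# 2) The ingredient is used twice (SELECT and ORDER BY); the second is a cache hit.
select_values = llm_map(question, teams)
order_values = llm_map(question, teams)

ordered = sorted(teams, key=lambda t: order_values[t])
states = [select_values[t] for t in ordered]
print(states)
print(len(batch_log))  # number of LLM requests actually made
```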

So I get how to write BlendSQL queries. But why would I use this over vanilla SQLite?

Certain ingredients, like LLMJoin, will likely give seasoned SQL experts a headache at first. However, BlendSQL's real strength comes from its use as an intermediate representation for reasoning over structured and unstructured data with LLMs. Some examples of this can be found here.