LLMJoin

Usage

LLMJoin

Bases: JoinIngredient

from_args(model=None, use_skrub_joiner=True, few_shot_examples=None, k=None) classmethod

Creates a partial class with predefined arguments.

Parameters:

few_shot_examples (List[dict], default: None)
    A list of AnnotatedJoinExamples dictionaries for few-shot learning. If not specified, will use default_examples.json as default.

use_skrub_joiner (bool, default: True)
    Whether to use the skrub joiner.

k (Optional[int], default: None)
    Determines the number of few-shot examples to use for each ingredient call. If None, all few-shot examples are used on all calls. If specified, a haystack-based DPR retriever is initialized to filter the examples.

Returns:

Type[JoinIngredient]: A partial class of JoinIngredient with predefined arguments.
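
One way to picture a "partial class with predefined arguments" is a classmethod that returns a new subclass with the chosen values baked in as class attributes. The sketch below illustrates that general pattern only. `JoinIngredientSketch` is a hypothetical stand-in, and nothing here is taken from BlendSQL's actual implementation of `from_args`.

```python
from typing import Any, Optional, Type


class JoinIngredientSketch:
    """Hypothetical stand-in for a join ingredient; not BlendSQL's class."""

    # Class-level defaults that `from_args` overrides on the returned subclass.
    use_skrub_joiner: bool = True
    few_shot_examples: Optional[list] = None
    k: Optional[int] = None

    @classmethod
    def from_args(cls, **preset: Any) -> Type["JoinIngredientSketch"]:
        # Build a new subclass whose class attributes hold the preset values,
        # so the caller gets back a class (not an instance).
        return type(cls.__name__, (cls,), dict(preset))


# The configured class carries the presets; the base class is left untouched.
Configured = JoinIngredientSketch.from_args(k=2, use_skrub_joiner=False)
```

Returning a class rather than an instance lets a framework decide later when and how to instantiate the ingredient, while the caller's configuration travels with the class.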

Examples:

from blendsql import blend, LLMJoin
from blendsql.ingredients.builtin import DEFAULT_JOIN_FEW_SHOT

ingredients = {
    LLMJoin.from_args(
        few_shot_examples=[
            *DEFAULT_JOIN_FEW_SHOT,
            {
                "join_criteria": "Join the state to its capital.",
                "left_values": ["California", "Massachusetts", "North Carolina"],
                "right_values": ["Sacramento", "Boston", "Chicago"],
                "mapping": {
                    "California": "Sacramento",
                    "Massachusetts": "Boston",
                    "North Carolina": "-"
                }
            }
        ],
        # Will fetch the `k` most relevant few-shot examples using an embedding-based retriever
        k=2
    )
}
# `blendsql` (the query string), `db`, and `model` are assumed to be defined elsewhere.
smoothie = blend(
    query=blendsql,
    db=db,
    ingredients=ingredients,
    default_model=model,
)
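
To make the effect of `k` concrete, the following sketch selects the `k` examples whose `join_criteria` is most similar to the incoming question. It deliberately swaps in a toy token-overlap (Jaccard) score in place of the embedding-based retriever described above, so it shows only the top-k filtering idea. The helper names `tokenize`, `jaccard`, and `top_k_examples` are hypothetical and are not part of BlendSQL.

```python
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    # Lowercase, drop periods, and split on whitespace.
    return set(text.lower().replace(".", "").split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Size of the intersection divided by size of the union.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_examples(question: str, examples: List[Dict], k: int) -> List[Dict]:
    # Rank examples by similarity to the question and keep the best k.
    q = tokenize(question)
    ranked = sorted(
        examples,
        key=lambda ex: jaccard(q, tokenize(ex["join_criteria"])),
        reverse=True,
    )
    return ranked[:k]


examples = [
    {"join_criteria": "Join the state to its capital."},
    {"join_criteria": "Join the fruit to its color."},
    {"join_criteria": "Match each player with their team."},
]
selected = top_k_examples("Align state to capital", examples, k=2)
```

With `k=None` the analogous behavior would be to skip the ranking step and pass every example on each call.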

run(model, left_values, right_values, question=None, few_shot_retriever=None, **kwargs)

Description

This ingredient handles the logic of semantic JOIN clauses between tables.

In other words, it creates a custom mapping between a pair of value sets. Behind the scenes, this mapping is used to build an auxiliary table, which is then used to carry out an INNER JOIN.
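
The mapping-to-auxiliary-table idea can be sketched with Python's built-in `sqlite3` module. The table name `join_map` and its column names are assumptions made for illustration, and the mapping is hard-coded here where an LLM would normally produce it. This is not BlendSQL's internal code, only a minimal picture of how a value mapping can drive an INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE States (name TEXT);
    CREATE TABLE Capitals (name TEXT);
    INSERT INTO States VALUES ('California'), ('Massachusetts'), ('North Carolina');
    INSERT INTO Capitals VALUES ('Sacramento'), ('Boston'), ('Chicago');
    -- Hypothetical auxiliary table holding the left-to-right value mapping.
    CREATE TABLE join_map (left_value TEXT, right_value TEXT);
    """
)

# Stand-in for the model's output; "-" is treated here as "no match".
mapping = {
    "California": "Sacramento",
    "Massachusetts": "Boston",
    "North Carolina": "-",
}

# Only matched pairs go into the auxiliary table.
conn.executemany(
    "INSERT INTO join_map VALUES (?, ?)",
    [(left, right) for left, right in mapping.items() if right != "-"],
)

# Two ordinary INNER JOINs through the auxiliary table link the tables.
rows = conn.execute(
    """
    SELECT States.name, Capitals.name
    FROM States
    INNER JOIN join_map ON States.name = join_map.left_value
    INNER JOIN Capitals ON Capitals.name = join_map.right_value
    ORDER BY States.name
    """
).fetchall()
```

Because the join is an INNER JOIN, rows without a mapped partner (North Carolina and Chicago in this toy data) simply drop out of the result.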

For example:

SELECT Capitals.name, States.name FROM Capitals
    JOIN {{
        LLMJoin(
            'Align state to capital',
            left_on='States::name',
            right_on='Capitals::name'
        )
    }}

The above example hints at a database schema that would make E. F. Codd very angry: why do we have two separate tables, States and Capitals, with no foreign key to join the two?

BlendSQL was built to interact with tables "in the wild", and many of them (such as those on Wikipedia) lack the convenient properties of well-designed relational models.

For this reason, we can leverage the internal knowledge of a pre-trained LLM to do the JOIN operation for us.