QAIngredient

Description

Sometimes, simply selecting data from a given database is not enough to sufficiently answer a user's question.

The QAIngredient is designed to return data of variable types, and is best used when we need one of the following:

1) Unstructured, free-text responses ("Give me a summary of all my spending on coffee")
2) Complex, unintuitive relationships extracted from table subsets ("How many consecutive days did I spend on coffee?")
3) Multi-hop reasoning over unstructured data, grounded in a structured schema (using the options arg)

Formally, this is an aggregate function which transforms a table subset into a single value.

The following query demonstrates usage of the builtin LLMQA ingredient.

{{
    LLMQA(
        'How many consecutive days did I buy stocks in Financials?', 
        (
            SELECT account_history."Run Date", account_history.Symbol, constituents."Sector"
              FROM account_history
              LEFT JOIN constituents ON account_history.Symbol = constituents.Symbol
              WHERE Sector = "Financials"
              ORDER BY "Run Date" LIMIT 5
        )
    )
}} 
This is slightly more complicated than the rest of the ingredients.

Behind the scenes, we wrap the call to LLMQA in a SELECT clause, ensuring that the ingredient's output gets returned.

SELECT {{QAIngredient}}
The LLM receives both the question asked and the subset of the SQL database fetched by our subquery.

"Run Date"  Symbol  Sector
2022-01-14  HBAN    Financials
2022-01-20  AIG     Financials
2022-01-24  AIG     Financials
2022-01-24  NTRS    Financials
2022-01-25  HBAN    Financials

From examining this table, we see that we bought stocks in the Financials sector 2 consecutive days (2022-01-24, and 2022-01-25). The LLM answers the question in an end-to-end manner, returning the result 2.
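The consecutive-day reasoning the LLM performs end-to-end can be checked by hand. The following is a minimal sketch in plain Python (not part of BlendSQL), using the dates from the table subset above:

```python
from datetime import date, timedelta

# Purchase dates from the table subset above
run_dates = [
    date(2022, 1, 14),
    date(2022, 1, 20),
    date(2022, 1, 24),
    date(2022, 1, 24),
    date(2022, 1, 25),
]

def longest_consecutive_run(dates):
    """Length of the longest streak of back-to-back calendar days."""
    unique_days = sorted(set(dates))
    best = streak = 1
    for prev, curr in zip(unique_days, unique_days[1:]):
        streak = streak + 1 if curr - prev == timedelta(days=1) else 1
        best = max(best, streak)
    return best

print(longest_consecutive_run(run_dates))  # 2 (2022-01-24 and 2022-01-25)
```

This matches the model's answer of 2.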

The QAIngredient can be used as a standalone end-to-end QA tool, or as a component within a larger BlendSQL query.

For example, the BlendSQL query below translates to the valid (but rather confusing) question:

"Show me stocks in my portfolio, whose price is greater than the number of consecutive days I bought Financial stocks multiplied by 10. Only display those companies which offer a media streaming service."

 SELECT Symbol, "Last Price" FROM portfolio WHERE "Last Price" > {{
  LLMQA(
        'How many consecutive days did I buy stocks in Financials?', 
        (
            SELECT account_history."Run Date", account_history.Symbol, constituents."Sector"
              FROM account_history
              LEFT JOIN constituents ON account_history.Symbol = constituents.Symbol
              WHERE Sector = "Financials"
              ORDER BY "Run Date" LIMIT 5
        )
    )
  }} * 10
  AND {{LLMMap('Offers a media streaming service?', 'portfolio::Description')}} = 1

Constrained Decoding with options

Perhaps we want the answer to the above question in a different format. We can call our LLM ingredient in a constrained setting by passing an options argument, providing either semicolon-separated options or a reference to a column.

{{
    LLMQA(
        'How many consecutive days did I buy stocks in Financials?', 
        (
            SELECT account_history."Run Date", account_history.Symbol, constituents."Sector"
              FROM account_history
              LEFT JOIN constituents ON account_history.Symbol = constituents.Symbol
              WHERE Sector = "Financials"
              ORDER BY "Run Date" LIMIT 5
        ),
        options='one consecutive day!;two consecutive days!;three consecutive days!'
    )
}}

Running the above BlendSQL query, we get the output two consecutive days!.
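Conceptually, a semicolon-separated options string is split into a set of candidate answers that constrains generation. A minimal sketch of that parsing step (hypothetical, outside BlendSQL):

```python
raw = 'one consecutive day!;two consecutive days!;three consecutive days!'

# Split on semicolons into a set of candidate answer strings
options = {opt.strip() for opt in raw.split(";") if opt.strip()}
print(sorted(options))
```

The model's output is then restricted to one member of this set.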

This options argument can also be a reference to a given column.

For example (from the HybridQA dataset):

 SELECT capacity FROM w WHERE venue = {{
        LLMQA(
            'Which venue is named in honor of Juan Antonio Samaranch?',
            (SELECT title, content FROM documents WHERE content MATCH 'venue'),
            options='w::venue'
        )
}}

Or, from our running example:

{{
  LLMQA(
      'Which did I buy the most?',
      (
        SELECT account_history."Run Date", account_history.Symbol, constituents."Sector"
          FROM account_history
          LEFT JOIN constituents ON account_history.Symbol = constituents.Symbol
          WHERE Sector = "Financials"
          ORDER BY "Run Date" LIMIT 5
      ),
      options='account_history::Symbol'
  )
}}

The above BlendSQL query will yield the result AIG, since the options constraint guarantees the answer is a value appearing in the Symbol column of account_history.

QAProgram

Bases: Program

Source code in blendsql/ingredients/builtin/qa/main.py
class QAProgram(Program):
    def __call__(
        self,
        model: Model,
        question: str,
        context: Optional[pd.DataFrame] = None,
        options: Optional[Set[str]] = None,
        long_answer: Optional[bool] = False,
        table_title: Optional[str] = None,
        max_tokens: Optional[int] = None,
        **kwargs,
    ) -> Tuple[str, str]:
        if isinstance(model, LocalModel):
            m: guidance.models.Model = model.model_obj
        else:
            m: str = ""
        serialized_db = context.to_string() if context is not None else ""
        with guidance.system():
            m += "Answer the question for the table. "
            options_alias_to_original = {}
            if long_answer:
                m += "Make the answer as concrete as possible, providing more context and reasoning using the entire table.\n"
            else:
                m += "Keep the answers as short as possible, without leading context. For example, do not say 'The answer is 2', simply say '2'.\n"
        if options is not None:
            # Add in title case, since this helps with selection
            options_with_aliases = copy.deepcopy(options)
            # Below we check to see if our options have a unique first word
            # sometimes, the model will generate 'Frank' instead of 'Frank Smith'
            # We still want to align that, in this case
            add_first_word = False
            if len(set([i.split(" ")[0] for i in options])) == len(options):
                add_first_word = True
            for option in options:
                option = str(option)
                for option_alias in [option.title(), option.upper()]:
                    options_with_aliases.add(option_alias)
                    options_alias_to_original[option_alias] = option
                if add_first_word:
                    option_alias = option.split(" ")[0]
                    options_alias_to_original[option_alias] = option
                    options_with_aliases.add(option_alias)
        with guidance.user():
            m += f"\n\nQuestion: {question}"
            if table_title is not None:
                m += f"\n\nContext: \n Table Description: {table_title} \n {serialized_db}"
            else:
                m += f"\n\nContext: \n {serialized_db}"
            if options and not isinstance(model, LocalModel):
                m += f"\n\nFor your answer, select from one of the following options: {options}"
            m += "\n\nAnswer:\n"
        if isinstance(model, LocalModel):
            with guidance.assistant():
                prompt = m._current_prompt()
                if options is not None:
                    response = (
                        m
                        + guidance.capture(
                            guidance.select(options=options_with_aliases),
                            name="response",
                        )
                    )._variables["response"]
                else:
                    response = (
                        m
                        + guidance.capture(
                            guidance.gen(max_tokens=max_tokens or 50), name="response"
                        )
                    )._variables["response"]
        else:
            prompt = m
            if model.tokenizer is not None:
                max_tokens = (
                    max(
                        [
                            len(model.tokenizer.encode(alias))
                            for alias in options_alias_to_original
                        ]
                    )
                    if options
                    else max_tokens
                )
            response = generate(
                model,
                prompt=prompt,
                options=options,
                max_tokens=max_tokens,
                stop_at=["\n"],
            )
        # Map from modified options to original, as they appear in DB
        response: str = options_alias_to_original.get(response, response)
        if options and response not in options:
            print(
                Fore.RED
                + f"Model did not select from a valid option!\nExpected one of {options}, got {response}"
                + Fore.RESET
            )
        return (response, prompt)
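The option-aliasing loop in QAProgram above can be illustrated in isolation. This is a standalone sketch mirroring that logic (the function name build_option_aliases is hypothetical, not part of BlendSQL):

```python
import copy

def build_option_aliases(options):
    """Map title-case, upper-case, and (when unambiguous) first-word
    aliases back to the original option strings, as QAProgram does."""
    options_with_aliases = copy.deepcopy(options)
    alias_to_original = {}
    # Only add first-word aliases when every option starts with a distinct word
    add_first_word = len({o.split(" ")[0] for o in options}) == len(options)
    for option in options:
        option = str(option)
        for alias in (option.title(), option.upper()):
            options_with_aliases.add(alias)
            alias_to_original[alias] = option
        if add_first_word:
            first = option.split(" ")[0]
            options_with_aliases.add(first)
            alias_to_original[first] = option
    return options_with_aliases, alias_to_original

opts, mapping = build_option_aliases({"Frank Smith", "Jane Doe"})
print(mapping["Frank"])  # Frank Smith
```

This is why a model that generates 'Frank' instead of 'Frank Smith' is still aligned back to the original database value.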

LLMQA

Bases: QAIngredient

Source code in blendsql/ingredients/builtin/qa/main.py
class LLMQA(QAIngredient):
    DESCRIPTION = """
    If mapping to a new column still cannot answer the question with valid SQL, turn to an end-to-end solution using the aggregate function:
        `{{LLMQA('question', (blendsql))}}`
        Optionally, this function can take an `options` argument to restrict its output to an existing SQL column.
        For example: `... WHERE column = {{LLMQA('question', (blendsql), options='table::column')}}`
    """

    def run(
        self,
        model: Model,
        question: str,
        options: Optional[Set[str]] = None,
        context: Optional[pd.DataFrame] = None,
        value_limit: Optional[int] = None,
        table_to_title: Optional[Dict[str, str]] = None,
        long_answer: bool = False,
        **kwargs,
    ) -> Union[str, int, float]:
        if model is None:
            raise IngredientException(
                "LLMQA requires a `Model` object, but nothing was passed!\nMost likely you forgot to set the `default_model` argument in `blend()`"
            )
        if context is not None:
            if value_limit is not None:
                context = context.iloc[:value_limit]
        result = model.predict(
            program=QAProgram,
            options=options,
            question=question,
            context=context,
            long_answer=long_answer,
            table_title=None,
            **kwargs,
        )
        # Post-process language model response
        return "'{}'".format(single_quote_escape(result.strip()))
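The final line wraps the model's answer in single quotes so it can be spliced back into the surrounding SQL as a string literal. A plausible stand-in for single_quote_escape (hypothetical; the real helper is imported in main.py) simply doubles embedded quotes, per standard SQL escaping:

```python
def single_quote_escape(value: str) -> str:
    # Double any single quotes so the string is a valid SQL literal body
    return value.replace("'", "''")

result = "O'Reilly"
print("'{}'".format(single_quote_escape(result.strip())))  # 'O''Reilly'
```

Without this step, an answer containing a single quote would break the rewritten query.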

run(model, question, options=None, context=None, value_limit=None, table_to_title=None, long_answer=False, **kwargs)

Source code in blendsql/ingredients/builtin/qa/main.py
def run(
    self,
    model: Model,
    question: str,
    options: Optional[Set[str]] = None,
    context: Optional[pd.DataFrame] = None,
    value_limit: Optional[int] = None,
    table_to_title: Optional[Dict[str, str]] = None,
    long_answer: bool = False,
    **kwargs,
) -> Union[str, int, float]:
    if model is None:
        raise IngredientException(
            "LLMQA requires a `Model` object, but nothing was passed!\nMost likely you forgot to set the `default_model` argument in `blend()`"
        )
    if context is not None:
        if value_limit is not None:
            context = context.iloc[:value_limit]
    result = model.predict(
        program=QAProgram,
        options=options,
        question=question,
        context=context,
        long_answer=long_answer,
        table_title=None,
        **kwargs,
    )
    # Post-process language model response
    return "'{}'".format(single_quote_escape(result.strip()))