Asking Solr Questions in Natural Language

July 31, 2024

With the recent advancements of AI/ML, many tasks that were once unapproachable have become not. One of these tasks is asking questions to computers in a natural language and getting accurate and reasonable answers. Indeed, doing this task today is enabled by large language models that are notable for their ability to achieve general-purpose language generation and other natural language processing tasks.(1)

One approach to asking Solr questions in natural language is leveraging the SQL query interface and integrating a text-to-SQL model. Two models are experimented with here: t5-small-awesome-text-to-sql2 based on Google t5-small and Natural SQL based on Llama. Though it’s easier to run the t5-small models, the accuracy of Llama in this experiment exceeds its counterpart by far, given the size and release date of each.

Introducing Quest

Quest is a containerized application built to capture user questions and display respective answers based on the connected database. It is composed of:

By installing Quest, the models can be run against data indexed in Solr. For more details on Quest, please check here.

Setup

Books Sales and Ratings dataset was used after renaming columns to match the Solr dynamic field naming convention.

The model used is Natural SQL 7B and the schema is set as follows:

  CREATE TABLE books (

    publishing_year_i INTEGER,

    book_name_t: TEXT,

    author_s: VARCHAR(100),

    language_code_s:VARCHAR(10),

    author_rating_s:VARCHAR(100),

    book_average_rating_f: FLOAT,

    book_ratings_count_i":INTEGER,

    genre_s: TEXT,

    gross_sales_f: FLOAT,

    publisher_revenue_f:FLOAT,

    sale_price_f:FLOAT,

    sales_rank_i:INTEGER,

    publisher_s:VARCHAR(100),

    units_sold_i:INTEGER

  ); 

Examples

Below is a sample of 2 questions and their respective answers given by Natural SQL model:

Q1: Show me the book with the highest sales price

A1:

SELECT book_name_t, sale_price_f

FROM books

ORDER BY sale_price_f DESC

LIMIT 1;

Q2: Which publisher have total revenue higher than $100k published fiction book?

A2:

SELECT publisher_s

FROM books

WHERE genre_s LIKE '%fiction%'

GROUP BY publisher_s

HAVING SUM(publisher_revenue_f) > 100000

LIMIT 1000

Demo

SolrQuest3.webm