Conversational Queries with YugabyteDB Using The LangChain Framework

Balachandar Seetharaman

In this blog post, we’ll walk through how to query an application’s data—stored in YugabyteDB—using natural language utilizing the LangChain Model.

Key Roles of LLMs in AI with Database

The emergence of Gen AI (Generative Artificial Intelligence) has caused a substantial technological paradigm shift. Large Language Models (LLMs), like GPT-3, have a significant role in the field of artificial intelligence (AI) and natural language processing (NLP). Here are some key roles and contributions:

  • Natural Language Interface to Databases (NLIDB): LLMs can serve as natural language interfaces to databases, allowing users to query databases using plain language rather than complex SQL queries. This simplifies data retrieval for non-technical users, thereby improving accessibility.
  • Recommendation Systems: LLMs can improve recommendation systems by analyzing user behavior and preference data stored in databases. They can generate personalized recommendations for products, content, or services.
  • Chatbots for Database Queries: LLM-powered chatbots can interact with users through natural language, responding to inquiries, offering insights, and pulling information from databases. This simplifies user access to database content.
  • Knowledge Discovery: LLMs can analyze databases to uncover hidden patterns, relationships, and insights that may otherwise go unnoticed. This offers valuable support for research, business intelligence, and decision-making.

Key Benefits of LLM Integration with YugabyteDB

YugabyteDB, a fully consistent distributed SQL database, facilitates the development of cloud native, transactional applications characterized by low latency, continuous availability, and globally distributed scalability.

The integration of Large Language Models (LLMs) like GPT-3 with YugabyteDB can significantly enhance the capabilities of these applications, offering a host of benefits and opportunities. Below are some compelling reasons to consider LLM integration with YugabyteDB:

  1. Improve data entry and querying for application developers, business analysts, or auditors.
    For example, a financial analyst wants to retrieve recent transactions for a specific customer’s account stored in YugabyteDB’s Table (e.g. cust_trans_dtl). The natural language interaction that is typed in is, “Show me all transactions for customer id 98675432 since the beginning of this month.” The output from SQL LangChain DB Model converts it to the following SQL query:SQL-like Query:

    SELECT * FROM cust_trans_dtl WHERE customer_id = 98675432 AND transaction_date >= '2023-10-01';

    This query will then return the results from the YugabyteDB’s table.

  2. Widely used for eCommerce and retail applications, YugabyteDB often requires integration with chatbots or virtual assistants through Natural Language Processing to support a diverse range of applications. For example, in an eCommerce platform that uses SQL-like LangChain DB and LLM integration to validate and secure product orders, a chatbot/customer interaction may play out as the example below:User: “I’d like to place an order for a laptop.”
    LLM: “Of course! To ensure the security of your order, please provide your account details.”
    User: “My account email is ‘user@example.com‘.”
    LLM: “Thank you. To further verify your identity, please provide your one-time password (OTP).”
    User: “My OTP is ‘123456’.”
    LLM: “Thank you for the verification. Let me generate a secure SQL-like transaction query for your order.”

    -- Ensure that the user is authorized and the OTP is valid SELECT * FROM user_accounts WHERE email = 'user@example.com' AND otp = '123456'; -- OTP verification -- If the user is authorized and the OTP is valid, place the order INSERT INTO orders (user_id, product_id, quantity, order_status) VALUES ( (SELECT user_id FROM user_accounts WHERE email = 'user@example.com'), (SELECT product_id FROM products WHERE product_name = 'Laptop'), 1, -- Quantity of 1 laptop 'Pending' -- Initial order status );

    While integrating LLMs with YugabyteDB can yield many advantages, It’s important to note the associated security, privacy, and data governance challenges. Organizations must carefully consider these factors and implement appropriate safeguards to protect sensitive data and ensure compliance with data regulations when implementing such integrations.

High-level solution architecture

The diagram below (Figure 1) shows the integration architecture of the Q&A application using YugabyteDB with the LangChain Model.

YugabyteDB LangChain end to end architecture
Figure 1: End-to-End Architecture

The table below shows the data flow sequences with their operations and tasks performed.

Data flow seq#Operations/TasksComponent Involved
1Question/answer app written in PythonPython/Flask
2Pass the data through LangChain which invokes OpenAI callLangChain Framework
3YugabyteDB stores the application data. As part of our example, we will use the Northwind sample database.YugabyteDB

How to Set Up

  1. Install YugabyteDB

    There are several options to install or deploy YugabyteDB if you don’t have one already available. Note: If you’re running a Windows Machine then you can leverage Docker on Windows with YugabyteDB

  2. Load Northwind Data in YugabyteDB

    Create the Northwind database in YugabyteDB and load the data.

  3. Install and set up LangChain

    Follow the instructions given in pypi for installing LangChain on your local machine.

  4. Create OpenAI Account and get your API Key

    Create your OpenAI account and get your API key.

  5. Setup the Sample application

    1. Environment ConfigurationCreate a folder named “cfg” in your working directory and copy the contents of the file below. Save it as “env.ini”.
      [LOCAL_DB]
      LOCALDBHOST = 10.31.16.2
      LOCALDBPORT = 5433
      LOCALDBNAME = northwind
      LOCALDBUSRNM = yugabyte
      LOCALDBPWD = xxxxxxx
      LOCALDBSCHNM = public
      
      # LOCALDB_URL_STRING = (
      #     "postgresql+psycopg2://"
      #     + LOCALDBUSRNM
      #     + ":"
      #     + LOCALDBPWD
      #     + "@"
      #     + LOCALDBHOST
      #     + ":"
      #     + LOCALDBPORT
      #     + "/"
      #     + LOCALDBNAME
      # )
      
      [OPEN_AI_VARS]
      OPEN_AI_KEY=xxxxxxxxxxxx
      OPEN_AI_ORG=personal
    2. Sample ApplicationCreate app.py file. Copy the content below and save.
      from datetime import datetime
      from flask import render_template, request, redirect, make_response
      #from app import app
      import pandas as pd
      import numpy as np
      import openai
      import os
      import configparser
      import langchain
      # Langchain specific libraries
      #from langchain_experimental.sql import SQLDatabase
      from langchain.utilities import SQLDatabase
      from langchain.llms import OpenAI
      from langchain_experimental.sql import SQLDatabaseChain
      parser = configparser.ConfigParser()
      
      # ######################################################################################
      # Get the environment variables stored in env.ini
      # ###################################################################################### 
      parser.read("cfg/env.ini")
      parser.sections()
      localdbusrnm = parser.get("LOCAL_DB", "LOCALDBUSRNM")
      localdbpwd = parser.get("LOCAL_DB", "LOCALDBPWD")
      localdbhost = parser.get("LOCAL_DB", "LOCALDBHOST")
      localdbport = parser.get("LOCAL_DB", "LOCALDBPORT")
      lcoaldbname = parser.get("LOCAL_DB", "LOCALDBNAME")
      
      LOCALDB_URL_STRING = (
      "postgresql+psycopg2://"
      + localdbusrnm
      + ":"
      + localdbpwd
      + "@"
      + localdbhost
      + ":"
      + localdbport
      + "/"
      + lcoaldbname
      )
      
      # open AI Keys
      
      os.environ["OPENAI_API_KEY"] = parser.get("OPEN_AI_VARS", "OPEN_AI_KEY")
      os.environ["MY_OPENAI_ORG"] = parser.get("OPEN_AI_VARS", "OPEN_AI_ORG")
      
      # Enter the Query here
      
      nw_qry = input("Please enter your Query in plain English: ")
      print(nw_qry)
      
      # Include tables should contain the list of tables required for your queries 
      
      db = SQLDatabase.from_uri(
       LOCALDB_URL_STRING,
       include_tables=[
         "categories",
         "customer_customer_demo",
         "customers",
         "customer_demographics",
         "employee_territories",
         "employees",
         "order_details",
         "orders",
         "products",
         "region",
         "suppliers",
         "territories",
         "us_states",
         ],
      )
      llm = OpenAI(temperature=0)
      db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
      nw_ans = db_chain.run(nw_qry)
      # Output of SQL Query with results
      
      print(nw_ans)

      Sample Output #1

      Request the customer details for a specific product using natural language.

      azureuser@baladevybplatform:~$ python ldbtest.py
      Please enter your Query in plain English: Give me the customer who ordered product Konbu
      Give me the customer who ordered product Konbu
      
      > Entering new SQLDatabaseChain chain...
      Give me the customer who ordered product Konbu
      SQLQuery:SELECT customers.customer_id, customers.company_name
      FROM customers
      INNER JOIN orders
      ON customers.customer_id = orders.customer_id
      INNER JOIN order_details
      ON orders.order_id = order_details.order_id
      INNER JOIN products
      ON order_details.product_id = products.product_id
      WHERE products.product_name = 'Konbu'
      LIMIT 5;
      SQLResult: [('HUNGC', 'Hungry Coyote Import Store'), ('LAUGB', 'Laughing Bacchus Wine Cellars'), ('HANAR', 'Hanari Carnes'), ('HILAA', 'HILARION-Abastos'), ('BSBEV', "B's Beverages")]

      The same output exists from the YugabyteDB client too.

      northwind=# SELECT customers.customer_id, customers.company_name
      northwind-# FROM customers
      northwind-# INNER JOIN orders
      northwind-# ON customers.customer_id = orders.customer_id
      northwind-# INNER JOIN order_details
      northwind-# ON orders.order_id = order_details.order_id
      northwind-# INNER JOIN products
      northwind-# ON order_details.product_id = products.product_id
      northwind-# WHERE products.product_name = 'Konbu'
      northwind-# LIMIT 5;
       customer_id |         company_name
      -------------+-------------------------------
       HUNGC       | Hungry Coyote Import Store
       LAUGB       | Laughing Bacchus Wine Cellars
       HANAR       | Hanari Carnes
       HILAA       | HILARION-Abastos
       BSBEV       | B's Beverages
      (5 rows)

      Sample Output#2

      Request the metadata information about customers’ tables.

      azureuser@baladevybplatform:~$ python ldbtest.py
      Please enter your Query in plain English: Give the tables that information on customers
      Give the tables that information on customers
      
      > Entering new SQLDatabaseChain chain...
      Give the tables that information on customers
      SQLQuery:SELECT table_name FROM information_schema.tables WHERE table_name LIKE '%customer%';
      SQLResult: [('customer_demographics',), ('customers',), ('customer_customer_demo',)]

      The same output exists from YugabyteDB client too.

      northwind=# SELECT table_name FROM information_schema.tables WHERE table_name LIKE '%customer%';
             table_name
      ------------------------
       customer_demographics
       customers
       customer_customer_demo
      (3 rows)

Alternative Large Language Models

For integration with YugabyteDB, you can use OpenAI’s GPT-3 language model with LangChainDB OR  Hugging Face’s Transformers library, which serves as an alternative to GPT-3 for various natural language processing tasks. The Transformers library offers access to an extensive selection of pre-trained language models, including but not limited to GPT-2, BERT, and RoBERTa.

Conclusion and Summary

In this blog post, we explored the use of Large Language Models within the LangChain framework to enhance the experience of business and technical users working with YugabyteDB. This integration offers several advantages:

  • Seamless integration with AI services
  • Streamlined data accessibility
  • Accelerated decision-making processes
  • Optimized data retrieval
  • Enhanced data exploration capabilities
  • Simplification of complex querying

These benefits cater to the diverse needs of both business and technical users within the YugabyteDB ecosystem.

Balachandar Seetharaman

Related Posts

Explore Distributed SQL and YugabyteDB in Depth

Discover the future of data management.
Learn at Yugabyte University
Get Started
Browse Yugabyte Docs
Explore docs
PostgreSQL For Cloud Native World
Read for Free