Chat with your Data in the Database without writing SQL


AI Tech Circle

Stay Ahead in AI with Weekly AI Roundup; read and listen on AITechCircle:

Welcome to the weekly AI Newsletter. After a three-week summer break, I hope you also had a well-deserved pause from the busy schedule of a job or of running your own business. This newsletter serves me and many community members as a go-to source for practical, actionable ideas, and I'm here to give you tips you can apply to your job and business immediately.

Before we start, share this week’s updates with a friend or a colleague:

Today at a Glance:

  • Chatting with DB in natural language
  • Generative AI Use case: Summarizing Legislative Documents
  • AI Weekly news and updates covering newly released LLMs
  • Open Tech Talk Podcast, the latest episode on career Growth Strategies with Executive Coach Vladimir Baranov

A shift from writing SQL queries to writing in natural language

My first interaction with SQL was during my university master's program, and I went deeper when I did the Oracle certification. From that point onwards, SQL/PLSQL became part of my everyday job. Now I am experiencing a different way of interacting with the database, one that is not as tricky as learning SQL and memorizing syntax you keep forgetting over time.

As LLMs sweep through the IT industry and touch every aspect of our ecosystem, their impact has reached databases as well, and a question started coming to mind over a year ago:

How can we talk to the database rather than writing SQL queries to interact with it?

Oracle has released Database version 23ai, which was covered in an earlier newsletter edition that introduced the basics of its new features. I suggest heading to that earlier edition, “AI comes to the Database at the core of your data,” and reading it before moving forward.

In this edition, you will gain practical knowledge of interacting with the Oracle database the way you already interact with generative AI tools today: in natural language.

Select AI is the central feature that enables you to chat with the data in your Oracle database.

Select AI

LLMs now allow you to interact with your database using natural language, such as querying it in plain English.

With Select AI, the Oracle Autonomous Database handles the conversion of natural language into SQL, so you can provide a natural language prompt instead of writing SQL code to access your data.

Select AI has revolutionized productivity for users and developers. It allows those with limited SQL knowledge to extract valuable insights without needing to understand complex data structures or technical languages, and with no more memorizing of syntax.

Now, the question comes to mind: How do I get started? That’s what I did this weekend.

The DBMS_CLOUD_AI package in Oracle Autonomous Database facilitates the integration of a user-specified LLM for generating SQL queries based on natural language prompts. This package helps the LLM understand the database schema and guides it in creating SQL queries that align with that schema.

DBMS_CLOUD_AI is compatible with LLM providers such as Oracle Cloud Infrastructure Generative AI, OpenAI, Cohere, and Azure OpenAI Service.

To use Select AI in your database for the first time, follow the steps in Configuring Select AI for First Use (oracle.com).

Set up the Autonomous Database to access OCI Generative AI

A credential is used to sign LLM API requests. We will use the Autonomous Database resource principal to access the OCI Generative AI/LLM.
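
This step can be sketched in PL/SQL. A minimal sketch, run as the ADMIN user:

```sql
-- Enable the database's resource principal so it can sign requests
-- to OCI Generative AI without storing API keys.
BEGIN
  DBMS_CLOUD_ADMIN.ENABLE_RESOURCE_PRINCIPAL();
END;
/
```

This creates the OCI$RESOURCE_PRINCIPAL credential, which AI profiles can then reference as their credential name.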

Create AI profiles to access the LLM.

AI profiles are created to facilitate and configure access to an LLM and to set up the generation of SQL statements from natural language prompts. Profiles capture the properties of your LLM provider and the tables and views you want to enable for natural language queries. You have the option to create multiple profiles (e.g., for different LLMs); however, only one is active for the current session.

Let’s get started. You’ll need to:

  • Create a profile that describes your LLM provider and the metadata (schemas, tables, views, etc.) that can be used for natural language queries.
  • Set the profile for your session. Because we’re accessing a single LLM, create a LOGON trigger that sets the profile for your session.
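
Putting the two steps together, a minimal PL/SQL sketch is below. The profile name, credential, schema, and table names are assumptions for illustration; adjust them to your environment:

```sql
-- Create an AI profile pointing at OCI Generative AI, listing the
-- tables/views the LLM may use for natural language queries.
BEGIN
  DBMS_CLOUD_AI.CREATE_PROFILE(
    profile_name => 'GENAI',
    attributes   => '{"provider":        "oci",
                      "credential_name": "OCI$RESOURCE_PRINCIPAL",
                      "object_list":     [{"owner": "ADMIN", "name": "NATALITY"}]}'
  );
END;
/

-- Activate the profile automatically for every session of this schema.
CREATE OR REPLACE TRIGGER set_ai_profile_on_logon
AFTER LOGON ON SCHEMA
BEGIN
  DBMS_CLOUD_AI.SET_PROFILE(profile_name => 'GENAI');
END;
/
```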

More information is available in the DBMS_CLOUD_AI Package documentation (oracle.com).

Healthcare Dataset for exploration with Select AI

I have taken a dataset from the National Center for Health Statistics (2021): Vital Statistics Natality Birth Data. The NBER data file is named nat2021us, and it is available from the Natality Data of the National Vital Statistics System of the National Center for Health Statistics (USA), which provides demographic and health data for births.

I quickly loaded the files into the Oracle Autonomous Database through the Data Load utility.

Now, let’s start chatting with the data that I have uploaded.

When I asked it to show the overall count by mother’s age, the LLM understood the question and suggested how to get the result.

You can run all sorts of questions and answers against the data. As another example, I asked for the male and female infants’ figures. My prompt contained a typo, but even then it returned the correct results, something that would simply fail with a hand-written SQL query.

In this example, I asked about the maximum birth weight of the infant.

Now, the same query with the chat function: you can see the results, and it also explained why ‘9999’ was ignored.
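
The kinds of prompts above map onto the Select AI actions (running the generated SQL is the default, plus showsql, narrate, explainsql, and chat). A sketch, assuming an active AI profile and an illustrative NATALITY table:

```sql
SELECT AI how many births are there by mother age;        -- default: generate and run SQL
SELECT AI showsql what is the maximum birth weight;       -- show the generated SQL only
SELECT AI narrate count male and female infants;          -- answer in natural language
SELECT AI chat why would a birth weight value of 9999 be ignored;  -- general chat
```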

So I have learned a new way to interact with the database and explore the data without writing SQL queries. I am leaving an action for you: if you want to try it yourself, head over to this lab, ‘Chat with your data in Autonomous Database using generative AI.’

Weekly News & Updates…

Last week’s AI breakthroughs marked another leap forward in the tech revolution.

  1. Safety Modes: Cohere has announced the beta launch of Safety Modes, a new feature in the Cohere Chat API that allows customers to tailor model outputs to meet their specific safety needs. link
  2. CogVideoX 5B is an open-source version of the video generation model originating from QingYing. link
  3. Create Gems for customized help — from coding to career advice. Gems is a new feature that lets you customize Gemini to create personal AI experts on any topic you want. link
  4. Alibaba releases Qwen2-VL, open-sourcing Qwen2-VL-2B and Qwen2-VL-7B under an Apache 2.0 license. It can understand videos over 20 minutes long for high-quality video-based question answering, dialog, content creation, etc. link

The Cloud: the backbone of the AI revolution

  • Unlocking AI potential: NVIDIA NIM on OCI, link
  • Designing Generative AI Solutions: Key Lessons Learned, link
  • NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Debut link

Gen AI Use Case of the Week:

Generative AI use cases in the Government and Public Sector:

Summarizing legislative documents in the legislative administration area; this use case is derived from Deloitte.

This use case involves using large language models (LLMs) to automate the summarization of government and public-sector legislative documents.

Business Challenges

  1. Volume of Documents: Legislative bodies generate massive documentation, making it difficult for staff to read and summarize all content efficiently.
  2. Complex Language: Legislative documents often contain complex legal language, requiring specialized knowledge to interpret and summarize accurately.
  3. Time Sensitivity: Timely access to summarized information is crucial for decision-making and legislative actions.

AI Solution Description

Legislative document summarization can be automated using large language models (LLMs). The LLM is trained on diverse legal and legislative texts, enabling it to understand and process complex legal jargon. When a new legislative document is uploaded, the model automatically generates a concise summary, highlighting the legislation’s key points, implications, and potential impact. Legislative staff can then review and refine this summary, ensuring accuracy and relevance.

Expected Impact/Business Outcome

  • Revenue: By reducing the time and resources needed to summarize documents, costs associated with manual labor are lowered, indirectly contributing to cost efficiency.
  • User Experience: Legislative staff can access summaries quickly, improving their ability to make informed decisions and increasing overall productivity.
  • Operations: Streamlined document processing and summarization lead to more efficient legislative operations.
  • Process: Automating the summarization process reduces the likelihood of human error and ensures consistency in the presentation of legislative summaries.
  • Cost: Reduces the need for extensive manual labor and time investment in document review, leading to significant cost savings.

Required Data Sources

  • A comprehensive dataset of existing legislative documents, including bills, laws, and amendments.
  • Training data from legal and legislative texts to fine-tune the LLM for accurate summarization.

Strategic Fit and Impact

This solution has a high strategic fit and impact due to its potential to significantly improve the efficiency of legislative operations, reduce costs, and enhance the accuracy and accessibility of legislative information.

Rating: High Impact & strategic fit

Favorite Tip Of The Week:

Here’s my favorite resource of the week.

  • LM-class is an introductory educational resource for contemporary language modeling, broadly construed. It assumes a prior introductory-level understanding of machine learning and neural networks, plus undergraduate-level programming, probability theory, and linear algebra. The materials were developed for Cornell Tech’s CS 5740 Natural Language Processing.

Potential of AI

  • Andrew Ng has noted that LLM prices are down 79% annually. “Following a recent price reduction by OpenAI, GPT-4o tokens are now priced at $4 per million, based on a blended rate assuming 80% input tokens and 20% output tokens. At its initial launch in March 2023, GPT-4 tokens were priced at $36 per million tokens. Over 17 months, this price decrease represents approximately a 79% annual reduction in cost (calculated as 4/36 = (1 – p)^{17/12}).”
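
Since this edition’s examples are in SQL, the annualized figure in the quote can be sanity-checked directly in the database:

```sql
-- Solve (1 - p)^(17/12) = 4/36 for p, the annual price reduction:
--   1 - p = (4/36)^(12/17)
SELECT ROUND((1 - POWER(4/36, 12/17)) * 100, 1) AS annual_drop_pct
FROM dual;
-- returns about 78.8, i.e. roughly a 79% annual reduction
```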

Things to Know…

U.S. AI Safety Institute Signs Agreements Regarding AI Safety Research, Testing, and Evaluation With Anthropic and OpenAI

The U.S. Artificial Intelligence Safety Institute, part of the National Institute of Standards and Technology (NIST) at the U.S. Department of Commerce, has announced agreements for formal collaboration with both Anthropic and OpenAI on AI safety research, testing, and evaluation. Each company’s Memorandum of Understanding sets the stage for the U.S. AI Safety Institute to gain access to significant new models from these companies before and after they are publicly released. These agreements will facilitate joint research on assessing capabilities, identifying safety risks, and developing strategies to mitigate those risks.

The Opportunity…

Podcast:

  • This week’s Open Tech Talks episode 141, “Career Growth Strategies with Executive Coach Vladimir Baranov,” features the founder and certified executive coach of Human Interfaces. As an entrepreneur and the business leader behind several successful tech companies, he knows what it takes to survive and blossom in today’s chaotic business landscape.

Apple | Spotify | YouTube

Courses to attend:

  • Generative AI with Large Language Models from DeepLearning.AI & AWS. Gain foundational knowledge, practical skills, and a functional understanding of generative AI.
  • CS324 – Large Language Models. In this course, students will learn the fundamentals of modeling, theory, ethics, and systems aspects of large language models and gain hands-on experience working with them.

Events:

Tech and Tools…

  • kotaemon: An open-source, clean & customizable RAG UI for chatting with your documents.
  • Firecrawl: Crawl and convert any website into LLM-ready markdown or structured data

Data Sets…

  • Visual Commonsense Reasoning (VCR) is a new task and large-scale dataset for cognition-level visual understanding. The dataset consists of image/metadata files and annotations. Link
  • YouCook2 from the University of Michigan is one of the largest task-oriented instructional video datasets in the vision community. It contains 2,000 long, untrimmed videos covering 89 cooking recipes; each distinct recipe has 22 videos on average. Link

Other Technology News

Want to stay on the cutting edge?

Here’s what else is happening in Information Technology you should know about:

  • Oprah Winfrey To Host ‘AI And The Future Of Us’ ABC Special With Open AI CEO Sam Altman, Bill Gates, FBI Director Christopher Wray, More as reported by Deadline
  • Doomed to fail? Most AI projects flop within 12 months, with billions of dollars being wasted, a story covered by TechRadarPro

Join a mini email course on Generative AI …

Introduction to Generative AI for Newbies

The opinions expressed here are solely my conjecture based on experience, practice, and observation. They do not represent the thoughts, intentions, plans, or strategies of my current or previous employers or their clients/customers. The objective of this newsletter is to share and learn with the community.