Weekly Update: New LLM Models and the Basics of RAG


AI Tech Circle

Stay Ahead in AI with Weekly AI Roundup; read and listen on AITechCircle:

Welcome to the weekly AI Newsletter, your go-to source for practical and actionable ideas. I’m here to give you tips to apply to your job and business immediately.

Before we start, share this week’s updates with a friend or a colleague:

Today at a Glance:

  • Start your learning journey to RAG
  • Generative AI Use Case: Simulating Urban Planning Scenarios (Urban Planning/Future of Cities)
  • AI Weekly news and updates covering newly released LLMs
  • Open Tech Talks Podcast, the latest episode on Career Growth Strategies with Executive Coach Vladimir Baranov

RAG Basics: A Beginner’s Guide to Retrieval-Augmented Generation

If you are new to this topic, I would suggest going through these two earlier editions of the newsletter before you start reading:

  1. Build Your Business Specific LLMs Using RAG: this covers the fundamentals
  2. Chat with Knowledge Base through RAG: technical code to build RAG-based chatbots

This week, we will start with a very basic RAG built from scratch, based on the repository available from Mistral AI. The goal is to clarify your understanding of RAG’s internal workings and equip you with the foundational knowledge needed to construct a RAG pipeline using minimal dependencies.

Let’s start with installing the required packages:
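A minimal setup, assuming the dependencies used in the Mistral cookbook (faiss-cpu is one choice of vector store; any other would work just as well):

    pip install mistralai faiss-cpu requests numpy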

Now, to get the data from an article, document, or other web source:

    import requests

    # Download the full text of Romeo and Juliet from Project Gutenberg
    response = requests.get('https://www.gutenberg.org/cache/epub/1513/pg1513.txt')
    text = response.text

Then, split the data into chunks: In a Retrieval-Augmented Generation (RAG) system, breaking the document into smaller chunks is essential for efficiently identifying and retrieving the most relevant information during the retrieval process. In this example, we split the text by characters, grouping 2048 characters into each chunk.
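As a sketch, fixed-size character chunking is a one-liner, with chunk_size as the only knob:

    # Split the raw text into fixed-size chunks of 2048 characters each
    chunk_size = 2048
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]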

Key points:

Chunk size: To achieve optimal performance in RAG, we may need to customize or experiment with different chunk sizes and overlaps based on the specific use case. Smaller chunks can be more beneficial for retrieval processes, as larger chunks often contain filler text that can obscure semantic representation. Using smaller chunks allows the RAG system to identify and extract relevant information more effectively and accurately. However, be mindful of the trade-offs, such as increased processing time and computational resources, that come with using smaller chunks.

How to split: The simplest method is to split the text by character, but other options are based on the use case and document structure. To avoid exceeding token limits in API calls, you might need to split the text by tokens. Consider splitting the text into sentences, paragraphs, or HTML headers to maintain chunk cohesiveness. When working with code, it’s often best to split by meaningful code chunks, such as using an Abstract Syntax Tree (AST) parser.

Creation of embeddings for each text chunk:

Text embeddings convert text into numeric representations in a vector, enabling the model to understand semantic relationships between words. Words with similar meanings will be closer in this space, which is crucial for tasks like information retrieval and semantic search.

To generate these embeddings, we use Mistral AI’s embeddings API endpoint with the mistral-embed model. We create a function called get_text_embedding to retrieve the embedding for a single text chunk. Then, we use list comprehension to apply this function to all text chunks and obtain their embeddings efficiently.
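A sketch of that function, assuming the current mistralai Python client (method names have changed between SDK versions, so verify against the version you installed):

    import os
    import numpy as np
    from mistralai import Mistral

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

    def get_text_embedding(text_chunk):
        # Ask the mistral-embed model for the embedding of a single chunk
        response = client.embeddings.create(model="mistral-embed", inputs=text_chunk)
        return response.data[0].embedding

    # Apply the function to every chunk; each embedding becomes one row
    text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])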

Loading into a Vector Database: once the embeddings are in place, we need to store them in a vector database.
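Here is a minimal sketch using FAISS as an in-memory vector store (my assumption for this example; the same idea applies to any vector database):

    import faiss

    # Create a flat L2 index matching the embedding dimensionality
    d = text_embeddings.shape[1]
    index = faiss.IndexFlatL2(d)
    # FAISS expects float32 vectors
    index.add(text_embeddings.astype("float32"))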

The user’s question must also be converted into an embedding, which is then used to retrieve the most similar chunks from the vector DB.

To perform a search on the vector database, we use the index.search method, which requires two arguments: the vector of the question embeddings and the number of similar vectors to retrieve. This method returns the distances and indices of the most similar vectors to the question vector in the database. Using these indices, we can then retrieve the corresponding relevant text chunks.
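Sketched out, continuing from the snippets above (the example question is invented for illustration):

    # Embed the user's question the same way as the chunks
    question = "Who are the two feuding families in the play?"
    question_embedding = np.array([get_text_embedding(question)]).astype("float32")

    # Retrieve the 2 most similar chunks; search returns distances and indices
    distances, indices = index.search(question_embedding, 2)
    retrieved_chunks = [chunks[i] for i in indices[0]]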

There are some common methods:

  1. Similarity Search with Embeddings: This method uses embeddings to find similar text chunks based on their vector representations. It’s a straightforward approach that directly compares the vector distances.
  2. Filtering with Metadata: If metadata is available, it can be beneficial to filter the data based on this metadata before performing the similarity search. This can narrow the search space and improve the relevance of the results.
  3. Statistical Retrieval Methods: TF-IDF (Term Frequency-Inverse Document Frequency) evaluates the importance of a term in a document relative to a collection of documents. It uses the frequency of terms to identify relevant text chunks. BM25 is a ranking function based on term frequency and document length, which provides a more nuanced approach to identifying relevant text chunks compared to TF-IDF. A short BM25 sketch follows this list.
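As an illustration of the statistical route, here is a small sketch using the rank_bm25 package (my choice for the example; it is not part of the Mistral repository):

    from rank_bm25 import BM25Okapi

    # Naive whitespace tokenization; production systems use real tokenizers
    tokenized_chunks = [chunk.lower().split() for chunk in chunks]
    bm25 = BM25Okapi(tokenized_chunks)

    # Rank all chunks against the question and keep the top 2
    top_chunks = bm25.get_top_n(question.lower().split(), chunks, n=2)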

Combine Context and Question in a Prompt to Generate a Response:

Lastly, we can use the retrieved text chunks as context within the prompt to generate a response.
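A sketch of this last step, again assuming the current mistralai client (the model name is illustrative):

    # Stitch the retrieved chunks into the prompt as context
    context = "\n---\n".join(retrieved_chunks)
    prompt = (
        "Context information is below.\n"
        f"{context}\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer:"
    )

    chat_response = client.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content": prompt}],
    )
    print(chat_response.choices[0].message.content)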

Prompting Techniques for Developing a RAG System: In developing a Retrieval-Augmented Generation (RAG) system, various prompting techniques can significantly enhance the model’s performance and the quality of its responses.

Here are some key techniques that can be applied:

  • Few-Shot Learning: Few-shot learning involves providing the model with a few task examples to guide its responses. By including these examples in the prompt, the model can better understand the desired format and context, leading to more accurate and relevant answers. Example: Suppose you are building an RAG system to answer questions about historical events. The prompt could include a few examples of questions and answers to show the model how to respond appropriately.
  • Explicit Instructions: Explicitly instructing the model to format its answers in a specific way can help standardize the output, making it more consistent and easier to interpret. This can be especially useful for tasks that require a specific structure, such as generating reports or summaries. Example: If you need the model to provide responses in bullet points or a numbered list, you can include these instructions in the prompt to ensure the output follows the desired format. An illustrative prompt combining both techniques follows this list.
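To make these concrete, here is an illustrative prompt that combines a few-shot example with an explicit formatting instruction (the content is invented for illustration):

    Answer the question using only the provided context. Respond in bullet points.

    Example
    Question: When did the French Revolution begin?
    Answer:
    • The French Revolution began in 1789.

    Question: {user question}
    Answer: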

Head over to this link, and you can try building your first simple RAG.

Weekly News & Updates…

Last week’s AI breakthroughs marked another leap forward in the tech revolution.

  1. Llama 3.1 is a new version of Meta’s model with improved reasoning capabilities, a larger 128K-token context window, and support for eight languages
  2. FLUX.1 suite of models from Black Forest Labs that push the frontiers of text-to-image synthesis
  3. GitHub Models: you can access different models via a built-in playground on GitHub
  4. Meta Segment Anything Model 2 (SAM 2), the first unified model for real-time, promptable object segmentation in images & videos.
  5. Prompt Tuner from Cohere uses customizable optimization and evaluation loops to refine prompts for generative language use cases.
  6. Minitron is a family of small language models (SLMs) obtained by pruning NVIDIA’s Nemotron-4 15B model

The Cloud: the backbone of the AI revolution

  • Deploy Llama 3.1 405B in OCI Data Science Link
  • Designing Generative AI Solutions: Key Lessons Learned Link
  • ‘Everybody Will Have an AI Assistant,’ NVIDIA CEO Tells SIGGRAPH Audience Link

Gen AI Use Case of the Week:

Generative AI use cases in the Government and Public Sector:

This use case, derived from Deloitte, utilizes large language models (LLMs) for simulating urban planning scenarios (Urban Planning/Future of Cities).

Business Challenges

  1. Complexity of Urban Planning: Urban planning involves numerous variables, including demographics, infrastructure, environmental impact, and economic factors. Integrating these into a coherent plan is complex.
  2. Time-Consuming Processes: Traditional urban planning methods are time-consuming and require extensive manual labor and iteration.
  3. Resource Constraints: Limited access to real-time data and advanced tools can hinder effective planning and decision-making.
  4. Public Participation: Ensuring community engagement and feedback in the planning process can be challenging.

AI Solution Description

Using large language models (LLMs), generative AI can simulate urban planning scenarios by processing vast amounts of data and generating multiple design concepts.

Here’s how it can be done:

Data Integration: The AI model ingests various data sources, including demographic data, environmental reports, infrastructure details, and economic statistics.

Scenario Generation: The LLM processes this data to generate multiple urban planning scenarios. It can create detailed descriptions, visualizations, and potential outcomes for each scenario.

Simulation and Optimization: The generated scenarios are then simulated to predict their impacts. The AI model optimizes these scenarios based on predefined goals, such as sustainability, economic growth, and livability.

Expected Impact/Business Outcome

  • Revenue: More efficient planning processes can lead to cost savings and better resource allocation, ultimately boosting economic development.
  • User Experience: Improved urban design and infrastructure enhance the quality of life for residents and increase public satisfaction.
  • Operations: Streamlined planning processes reduce the time and effort required for urban development projects.
  • Process: Automating data analysis and scenario generation speeds up the planning process and improves decision-making.
  • Cost: Reducing manual labor and errors in planning processes can significantly cut down costs.

Required Data Sources

  • Environmental impact reports
  • Infrastructure data
  • Economic statistics
  • Historical urban planning data
  • Public feedback and survey results

Strategic Fit and Impact

Implementing generative AI in urban planning aligns well with the strategic goals of modernizing infrastructure, improving public services, and fostering sustainable development. The high impact rating reflects its potential to transform urban planning processes, leading to more efficient and effective development outcomes.

Rating: High Impact & strategic fit

Favorite Tip Of The Week:

Here’s my favorite resource of the week.

  • NVIDIA founder and CEO Jensen Huang and Meta founder and CEO Mark Zuckerberg explore how fundamental research drives AI breakthroughs. They highlight how generative AI and open-source software are empowering developers and creators. Additionally, they discuss the role of generative AI in creating virtual worlds and the potential of these worlds in advancing the next generation of AI and robotics.

Potential of AI

  • Try experimental demos featuring the latest AI research from Meta, AI Demos

Things to Know…

This week, I liked the resources on AI and Generative AI from Georgetown University on how to use Gen AI and cite it in your articles, papers, and research.

“To cite the informational product generated by ChatGPT or other AI, the recommendation is for the Methodology and/or Introduction of your paper to specify the following:

  • The prompt you used when utilizing ChatGPT; and
  • The text that the chatbot produced in response. If the response from ChatGPT is lengthy, please include it in the form of an Appendix.

Please remember that if AI connects you to another resource, you need to cite that resource, just as you would in a literature review.”

The Opportunity…

Podcast:

  • This week’s Open Tech Talks episode 141 is “Career Growth Strategies with Executive Coach Vladimir Baranov,” Founder and Certified Executive Coach of Human Interfaces. As an entrepreneur and the business leader behind several successful tech companies, he knows what it takes to survive and blossom in today’s chaotic business landscape. Tune in to gain valuable insights and actionable strategies to advance your career, communicate effectively, sell your products, and navigate common challenges in the technology field.

Apple | Spotify | YouTube

Courses to attend:

  • Prompt Compression and Query Optimization: In this course, you’ll learn to integrate traditional database features with vector search capabilities to optimize the performance and cost-efficiency of large-scale RAG applications.
  • IBM: AI for Everyone: Master the Basics: Understand AI, its applications and use cases, and how it transforms our lives. Explain terms like Machine Learning, Deep Learning, and Neural Networks.
  • Embedding Models: From Architecture to Implementation: This course covers the details of the architecture and capabilities of embedding models, which are used in many AI applications to capture the meaning of words and sentences.

Tech and Tools…

  • Zerox OCR: A simple way of OCR-ing a document for AI ingestion
  • MIMIC-CXR Database: The MIMIC Chest X-ray (MIMIC-CXR) Database v2.0.0 is a large publicly available dataset of chest radiographs in DICOM format with free-text radiology reports. The dataset contains 377,110 images corresponding to 227,835 radiographic studies

Data Sets…

  • GOT-10k: Generic Object Tracking: The dataset contains over 10,000 video segments of real-world moving objects and over 1.5 million manually labeled bounding boxes
  • ExecuTorch enables on-device inference capabilities across mobile and edge devices, including wearables, embedded devices, and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices.

Other Technology News

Want to stay on the cutting edge?

Here’s what else is happening in Information Technology you should know about:

  • AI Startup Anthropic Faces Backlash for Excessive Web Scraping, as reported by Techopedia
  • Has the AI bubble burst? Wall Street wonders if artificial intelligence will ever make money, a story covered by CNN

Join a mini email course on Generative AI …

Introduction to Generative AI for Newbies

The opinions expressed here are solely my conjecture based on experience, practice, and observation. They do not represent the thoughts, intentions, plans, or strategies of my current or previous employers or their clients/customers. The objective of this newsletter is to share and learn with the community.