Are we all prompting wrong? Balancing Creativity and Consistency in RAG.


This content originally appeared on DEV Community and was authored by Simon Risman

For a Boston native like myself, there are few things more heartwarming than Artificial Intelligence understanding the brilliance of Good Will Hunting. A few cursory prompts reveal that it views it as a "must-watch tale of redemption and self discovery".

Chat Will Hunting

But a slightly closer look reveals what many users of LLMs have accepted as a given - slight variations across responses to an otherwise identical prompt. This is the result of Stochastic Generation.

Stochastic generation 🤖

This is a fairly common term: from online bootcamps to college lectures, students of AI are familiar with the concept. For those who need a quick refresher, here is the 3-step generation loop that many LLMs follow.

LLMs are trained using a next-token prediction task, where the model predicts the next token in a sequence based on the previous tokens. This process involves:

  1. Tokenized Input: The input text is converted into a sequence of numbers (tokens).
  2. Probability Distribution: The model generates a probability distribution over the possible next tokens.
  3. Sampling Algorithm: This distribution is passed through a sampling algorithm to select the next token.
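The three steps above can be sketched in plain Python. The toy vocabulary and logits below are made up for illustration - they are not from any real model - but the softmax-then-sample flow is the same one real LLMs run at every generation step:

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Step 2: turn raw model scores into a probability distribution.
    # Dividing by temperature before normalizing controls randomness.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, temperature=1.0):
    # Step 3: draw the next token index from the distribution.
    probs = softmax(logits, temperature)
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Step 1 (tokenization) is stubbed out: pretend these four tokens
# are the model's vocabulary and these are its scores for each.
vocab = ["the", "a", "movie", "Boston"]
logits = [2.0, 1.0, 0.5, 0.1]
print(vocab[sample_next_token(logits)])
```

Run it a few times and you will see different tokens come back - that per-step randomness is exactly where the "slight variations" in LLM output come from.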

The probabilistic elements that this process introduces enable LLMs to generate more captivating dialogue, novel images, and creatively praise award-winning films.

Randomness and RAG 🎰

When building RAG-based applications, we are often not as concerned with creativity as we are with facts. When dealing with facts, we want as little probability involved as possible. In other words, instead of sampling a probability distribution, it's beneficial to just take the token with the maximum likelihood every time.
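Taking the maximum-likelihood token every time is just an argmax over the distribution, which makes decoding deterministic. A minimal sketch (the scores here are made up for illustration):

```python
def greedy_next_token(logits):
    # Deterministic decoding: always pick the index of the
    # highest-scoring token, no sampling involved.
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 1.0, 0.5, 0.1]
print(greedy_next_token(logits))  # always index 0, run after run
```

Because there is no random draw, the same prompt always yields the same token sequence - which is the consistency you usually want when the answer is supposed to be a fact pulled from your documents.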

LLMWARE allows you to explore how random your generated results are, as well as adjust how random you want them to be. Here's a quick demonstration:

Demo 🙌

Load the model

from llmware.models import ModelCatalog

model = ModelCatalog().load_model("bling-stablelm-3b-tool",
                                  sample=True,
                                  temperature=0.3,
                                  get_logits=True,
                                  max_output=123)

In the load_model method, we make a few important selections. The bling-stablelm-3b-tool model is one of our newest and highest performing models.

Setting the sample attribute to True or False will allow you to change between a stochastic approach and a top-token model.

The temperature can be an important tool to control the randomness of the output, with lower values making responses more focused and higher values increasing diversity in the generated text.
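Mechanically, temperature divides the logits before the softmax, so a low temperature sharpens the distribution around the top token while a high temperature flattens it. A small standalone sketch (toy logits, not from the model above):

```python
import math

def softmax(logits, temperature):
    # Lower temperature -> sharper (more deterministic) distribution;
    # higher temperature -> flatter (more diverse) distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
for t in (0.3, 1.0, 2.0):
    probs = softmax(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.3 nearly all of the probability mass sits on the top token, so sampling behaves almost like greedy decoding; at 2.0 the lower-ranked tokens get a real chance of being picked.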

These key settings will allow you to see what kind of approach you want to take when it comes to the probabilistic nature of your model.

Run a simple inference on some sample text

# 'sample' holds the passage text to analyze, e.g. loaded from a document
response = model.inference("What is a list of the key points?", add_context=sample)

This step is where your model is doing the heavy lifting, analyzing and summarizing the loaded-in documents.

Run a sampling analysis

sampling_analysis = ModelCatalog().analyze_sampling(response)
print("sampling analysis: ", sampling_analysis)

Now you get to see the analytics - giving you a better idea of how heavily your model samples from the lower-probability side of the distribution.

This analysis will include what percentage of the tokens selected by the model were also the highest-probability output, and will note cases where a token other than the top choice was selected.

In cases where the top token was not selected, the below code will print out the exact entries of the outputs, including their token rank.

for i, entries in enumerate(sampling_analysis["not_top_tokens"]):
    print("sampled choices: ", i, entries)

All these tools can help you make an informed decision on whether you want your model to think a little outside the box, or stick to the most likely answer. To see this process in action, check out our YouTube video on consistent LLM output generation.

The full code for this example can be found in our GitHub repo.

If you have any questions, or would like to learn more about LLMWARE, come to our Discord community. Click here to join. See you there!🚀🚀🚀
