Introduction
When we think about "improving AI response accuracy," two approaches are unavoidable: Fine-tuning and Prompt Engineering.
Both approaches pursue the same goal: combining "private data + large models" so that machines understand human intent and provide responses that closely match human expectations.
Fine-tuning or Prompt Engineering?
Fine-tuning can be thought of as small-scale training of a large model. Apart from the dataset, it doesn't fundamentally differ from pretraining in its technical process. However, this approach isn't suitable for every business, for several reasons. First, it's expensive: although fine-tuning is cheaper than pretraining, it's far more expensive than Prompt Engineering. Second, it may not even be as accurate as Prompt Engineering. Ideally, when the private dataset is large enough and closely resembles the pretraining data, fine-tuning works well. The reality, however, is:
1. Fine-tuning requires well-annotated datasets; raw private data cannot be used directly (a sketch of the expected format follows this list).
2. Fine-tuning is prone to overfitting. If your dataset is similar to the model's training data but too small, fine-tuning can significantly hurt the model's generalization and lead to overfitting.
3. Fine-tuning can make the model "forgetful." In contrast to point 2, if the dataset differs significantly from the original training data, the model tends to forget old tasks while learning new ones, a problem known as "catastrophic forgetting."
4. Enterprise private data often changes quickly, demanding frequent fine-tuning on new "small data," which is time-consuming and resource-intensive.
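To make point 1 concrete, here is a minimal sketch of what "well-annotated" means in practice. Fine-tuning services typically expect curated prompt–response pairs in a structured format; the chat-style JSONL below follows a common convention, and the records themselves are invented purely for illustration.

```python
import json

# Hypothetical, hand-annotated examples: each record pairs a user question
# with the exact answer we want the fine-tuned model to learn.
training_examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a customer-support assistant."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Go to Settings > Security and click 'Reset password'."},
        ]
    },
    # ... hundreds or thousands more annotated examples are usually needed
]

# Fine-tuning services commonly ingest one JSON object per line (JSONL).
with open("train.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```

Producing this kind of dataset from raw documents is itself a significant annotation effort, which is part of the cost discussed above.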
Understanding Prompt Engineering
Under current conditions, we adopt Prompt Engineering in the practical deployment of AI applications. This strategy lets us achieve excellent results with limited resources. Below, you'll see how effective prompt engineering taps into the capabilities of large models.
Human-Generated Prompts
Many people narrowly understand prompts as the questions or demonstrations typed into various "GPT" products. These are, of course, entirely human-generated prompts. A prompt is generally constructed from two elements: "context + instruction." Elements such as role prompts and few-shot examples can be considered part of the "context." You can provide an initial setup in the prompt and use it in your application:
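As a minimal sketch of that "context + instruction" structure, the snippet below assembles a prompt from a role setup and a couple of few-shot examples before appending the user's instruction. The wording of the setup and the helper names are illustrative, not taken from any particular product.

```python
# The role prompt plus few-shot examples form the "context";
# the user's question is the "instruction".
ROLE_PROMPT = (
    "You are a friendly customer-service assistant for an online store. "
    "Answer concisely and only about our products and policies."
)

FEW_SHOT_EXAMPLES = [
    ("What is your return policy?", "You can return any item within 30 days for a full refund."),
    ("Do you ship internationally?", "Yes, we ship to most countries; delivery takes 7-14 days."),
]

def build_prompt(user_instruction: str) -> str:
    parts = [ROLE_PROMPT]
    for question, answer in FEW_SHOT_EXAMPLES:
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {user_instruction}\nA:")
    return "\n\n".join(parts)

print(build_prompt("How long does a refund take to arrive?"))
```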
If you want tighter control over the entire consultation process, you can add process information to the prompt as well. Here, we'll take a hotel-booking AI scenario as an example:
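A hedged sketch of what such "process information" might look like for a hotel-booking assistant is shown below; the specific steps, wording, and function names are invented for illustration.

```python
# The context now also describes the consultation process the assistant
# should follow, giving tighter control over the conversation flow.
BOOKING_PROCESS_PROMPT = """
You are a hotel-booking assistant. Follow this process strictly:
1. Greet the guest and ask for the destination city and travel dates.
2. Ask for the number of guests and the preferred room type.
3. Confirm the budget range per night.
4. Summarize the collected requirements and ask for confirmation.
5. Only after confirmation, say that you will proceed with the booking.
Never skip a step, and ask for one piece of information at a time.
""".strip()

def build_booking_prompt(conversation_history: list[str], user_message: str) -> str:
    # Earlier turns are included so the model knows which step it is on.
    history = "\n".join(conversation_history)
    return (
        f"{BOOKING_PROCESS_PROMPT}\n\n"
        f"Conversation so far:\n{history}\n"
        f"Guest: {user_message}\nAssistant:"
    )
```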
Human-Machine Collaboratively Generated Prompts
However, the "context" you give the large model cannot grow without limit. A long context consumes many tokens, and every model has a ceiling (even Claude only supports up to 100,000 tokens).
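One practical consequence is that you need to measure how many tokens your context actually consumes before sending it. Here is a minimal sketch using the tiktoken tokenizer; the cl100k_base encoding is the one used by several OpenAI chat models, other vendors tokenize differently, and the file name is just a placeholder.

```python
import tiktoken  # pip install tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    # cl100k_base approximates OpenAI chat-model tokenization; other models
    # (e.g. Claude) use their own tokenizers, so treat the count as a rough guide.
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

document = open("company_handbook.txt").read()  # placeholder document
print(f"Document consumes roughly {count_tokens(document)} tokens")
# If this number approaches the model's context limit, stuffing the raw
# document into the prompt will not work and we need retrieval (see below).
```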
For example, if you have a 180,000-word document, how do you get the large model to learn from it? This is where Retrieval-Augmented Generation (RAG) comes in. In simple terms, it lets us retrieve data that lives outside the base model and inject the relevant pieces into the context, enriching the prompt to ensure response quality and accuracy.
Specifically, we use embeddings and the similarity-search capability of a vector database to preprocess the 180,000-word text, distill the key information, and combine it with the human question (the instruction) to form the final version of the prompt for the model. The model then responds with much richer context. The following flowchart explains how humans and machines collaboratively generate prompts (we are about to launch a productized tool for creating AI applications):
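The sketch below walks through that pipeline end to end: split the document into chunks, embed each chunk, find the chunks most similar to the question, and assemble the final prompt. The toy `embed` function is a stand-in so the example runs on its own; in a real system you would call an embedding model and store the vectors in a vector database, and the prompt template is illustrative rather than Momen's actual implementation.

```python
import math
from collections import Counter

def embed(text: str) -> dict[str, float]:
    # Toy stand-in: a normalized bag-of-words vector. In production, replace
    # this with a call to your embedding model and persist vectors in a
    # vector database instead of recomputing them on every query.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}

def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    return sum(a[w] * b[w] for w in a if w in b)

def chunk(document: str, size: int = 500) -> list[str]:
    # Split the long document into fixed-size chunks.
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # Rank chunks by similarity to the question and keep the best matches.
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q_vec, embed(c)), reverse=True)
    return ranked[:top_k]

def build_final_prompt(question: str, document: str) -> str:
    # The retrieved passages become the "context"; the question is the "instruction".
    context = "\n---\n".join(retrieve(question, chunk(document)))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In production, the chunk embeddings are computed once and stored in the vector database, so only the question needs to be embedded at query time.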
Here's an example to make the process concrete. Suppose you are a newly hired employee who wants to know how annual leave is calculated without digging through the handbook. You can simply ask the AI administrative assistant, "How should I calculate annual leave?" It searches the vector database based on your question and retrieves the following content:
At this point, the search results become the context. Combined with the user's question, they form the final prompt given to the large model, and based on this prompt the assistant can give you a precise response:
Some Ongoing Use Cases
Based on the principles above, we have already shipped examples in several vertical scenarios. Here you can compare the results of talking to a large model directly with answers driven by Prompt Engineering.
Take this e-commerce shopping assistant as an example (the blue-purple background images show results using the methods described above, while the others show the result of using GPT-4 directly):
Responses are more service-oriented and less redundant:
For specific questions such as outfit combinations, the results grounded in our own enterprise's private data match expectations better, with shorter, less verbose responses. When it comes to technology selection, the biggest value of large models lies in turning a broad foundational model into our own "small models" at minimal cost, simply by adjusting prompts. These often outperform fine-tuned models, so we prioritize the Prompt Engineering approach.
About Momen
Momen is a no-code web app builder that allows users to build fully customizable web apps, marketplaces, social networks, AI apps, enterprise SaaS, and much more.
Momen AI is our new feature that enables you to build your own AI apps with ease. Beyond LLM integration, Momen AI's data pre-processing lets you shape your GPTs so they respond with context and reasoning. All of this can be done within a few minutes!