Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves

University of California, Los Angeles

Abstract

Misunderstandings arise not only in interpersonal communication but also between humans and Large Language Models (LLMs). Such discrepancies can make LLMs interpret seemingly unambiguous questions in unexpected ways, yielding incorrect responses. While it is widely acknowledged that the quality of a prompt, such as a question, significantly impacts the quality of the response provided by LLMs, a systematic method for crafting questions that LLMs can better comprehend is still underdeveloped. In this paper, we present a method named `Rephrase and Respond' (RaR), which allows LLMs to rephrase and expand questions posed by humans and provide responses in a single prompt. This approach serves as a simple yet effective prompting method for improving performance. We also introduce a two-step variant of RaR, where a rephrasing LLM first rephrases the question and then passes the original and rephrased questions together to a different responding LLM. This facilitates the effective utilization of rephrased questions generated by one LLM with another. Our experiments demonstrate that our methods significantly improve the performance of different models across a wide range of tasks. We further provide a comprehensive comparison between RaR and the popular Chain-of-Thought (CoT) methods, both theoretically and empirically. We show that RaR is complementary to CoT and can be combined with CoT to achieve even better performance. Our work not only contributes to enhancing LLM performance efficiently and effectively but also sheds light on a fair evaluation of LLM capabilities.

Rephrase and Respond (RaR)


Demonstration of RaR. One-step RaR: a single prompt asks the LLM to rephrase, expand, and respond. Two-step RaR: the question is first rephrased, and then the original and rephrased questions are used together to improve the response quality.

(One-step) RaR

In interpersonal communication, rephrasing is a well-known technique. People rephrase another person's question as part of the process of understanding it, ensuring clarity and coherence in their response. Such a communication strategy can be similarly applied to an LLM, letting it generate a rephrased question first and provide an answer subsequently. Following this intuition, we propose RaR, which asks the LLM to Rephrase and Respond to the question within a single query. This approach can be viewed as a strategy to directly enhance the quality of the LLM's response.


        "{question}"
        Rephrase and expand the question, and respond.
      
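The one-step template above is easy to wire up in code. Below is a minimal sketch: `build_rar_prompt` is a hypothetical helper (not part of the paper's release), and the actual call to an LLM API is left abstract since any chat-completion endpoint would work.

```python
def build_rar_prompt(question: str) -> str:
    """Wrap a question in the one-step RaR template: quote the question,
    then append the rephrase-and-respond instruction."""
    return f'"{question}"\nRephrase and expand the question, and respond.'

# Example: the full prompt that would be sent to the LLM.
prompt = build_rar_prompt("Was Mother Teresa born on an even month?")
```

The returned string is then sent as a single user message; the model's one response contains both the rephrased question and the answer.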

Two-step RaR

To further leverage the quality improvement of questions rephrased by stronger models, such as GPT-4, we introduce a variant of RaR called Two-step RaR. Intuitively, even among humans, a more detailed and precise question elicits more accurate and decisive responses. Two-step RaR follows this intuition with a two-step procedure to improve question quality: in the first step, given a query question, we generate a self-rephrased query rephrased_question by prompting a rephrasing LLM with the following prompt; in the second step, the original and rephrased questions are passed together to a responding LLM, which produces the final answer:


        "{question}"
        Given the above question, rephrase and expand it to help you do better answering. Maintain all information in the original question.
      
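The two-step procedure can be sketched as a small pipeline. In this sketch, `two_step_rar` and the wording of the combined second-step prompt are assumptions for illustration; the LLM calls are injected as plain callables so that the rephraser (e.g. GPT-4) and the responder (e.g. Vicuna) can be different models.

```python
from typing import Callable

# Step-1 template, taken from the prompt above.
REPHRASE_TEMPLATE = (
    '"{question}"\n'
    "Given the above question, rephrase and expand it to help you do better "
    "answering. Maintain all information in the original question."
)

def two_step_rar(
    question: str,
    rephrasing_llm: Callable[[str], str],
    responding_llm: Callable[[str], str],
) -> str:
    """Step 1: ask the rephrasing LLM for an expanded question.
    Step 2: give the responding LLM both questions and ask for the answer.
    The combined-prompt wording below is an illustrative assumption."""
    rephrased = rephrasing_llm(REPHRASE_TEMPLATE.format(question=question))
    combined = (
        f"(original) {question}\n"
        f"(rephrased) {rephrased}\n"
        "Use your answer to the rephrased question to answer the original question."
    )
    return responding_llm(combined)

# Usage with stand-in callables; real usage would call GPT-4, Vicuna, etc.
echo = lambda prompt: prompt
answer = two_step_rar(
    "Was Mother Teresa born on an even month?",
    rephrasing_llm=echo,
    responding_llm=echo,
)
```

Because the two steps are decoupled, a question rephrased once by a strong model can be cached and reused with any number of responding models.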

Motivating Example

Why are we interested in RaR? Consider the following motivating example. When posed the query "Was Mother Teresa born on an even month?", GPT-4 might mistakenly assert that August is an odd month. We take a step further to investigate the intrinsic reason for the LLM's difficulty in answering such questions. As shown in the other three conversations in the figure, when GPT-4 explains its reasoning, it appears that the model harbors several ambiguities about the question. For example, it may consider February odd because of its irregular number of days, and it sometimes interprets an even/odd month as a month with an even/odd number of days.

In this paper, we highlight an often-overlooked aspect of LLM studies: the disparity between human and LLM frames of thought. To tackle this problem, we propose letting the LLM rephrase the question and incorporate additional details for better answering. Once rephrased by the LLM itself, the newly generated question is more detailed and has a clearer format, as presented in the figure. This self-rephrasing technique leads to a significant improvement in accuracy, as shown in the bar plot.

Results

We investigate the performance of RaR by comparing the accuracy of GPT-4 with One-step RaR and Two-step RaR. Both One-step RaR and Two-step RaR significantly improve GPT-4's accuracy. Notably, on tasks that GPT-4 originally finds highly challenging, RaR yields remarkable improvements, in some cases approaching 100% accuracy. Indeed, as in human communication, rephrasing and elaborating a question before answering it is an effective approach. In summary:

  • (One-step) RaR provides a universal, plug-and-play black-box prompt that allows for efficient and effective performance improvement of LLMs on general tasks.
  • Two-step RaR provides a universal method for LLMs to improve the question quality autonomously by rephrasing the question.
  • Examining the question quality is pivotal when evaluating the LLM performance on QA tasks.

Performance across Various LLMs

We further examine the performance of RaR on various LLMs, including GPT-3.5 and Vicuna. In particular, we employ Two-step RaR to investigate (1) if all these LLMs can provide consistent response improvement by rephrasing the questions; and (2) if the GPT-4-rephrased questions can improve the performance of other LLMs.

  • All models can benefit from rephrasing questions, with more advanced models expected to gain a larger improvement.
  • The rephrased questions are transferable: questions rephrased by GPT-4 can improve the response quality of Vicuna.

Comparison with Chain-of-Thought

We compare RaR with CoT. We present mathematical formulations of RaR and CoT and compare them against each other. We also present experimental results showing that (1) RaR offers improvements in scenarios where zero-shot CoT is ineffective; and (2) RaR addresses and corrects shortcomings inherent in few-shot CoT.
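Since RaR is complementary to CoT, the two prompts can simply be concatenated. The sketch below shows one plausible combination; the exact combined wording is an assumption, not the paper's verbatim prompt, though "Let's think step by step." is the standard zero-shot CoT trigger.

```python
RAR_INSTRUCTION = "Rephrase and expand the question, and respond."
COT_TRIGGER = "Let's think step by step."  # standard zero-shot CoT trigger

def build_rar_cot_prompt(question: str) -> str:
    """Combine RaR with zero-shot CoT by appending the CoT trigger after
    the RaR instruction. The combined wording is an illustrative assumption."""
    return f'"{question}"\n{RAR_INSTRUCTION} {COT_TRIGGER}'

combined_prompt = build_rar_cot_prompt("Was Mother Teresa born on an even month?")
```

The model first restates the question in its own, less ambiguous terms, and the CoT trigger then encourages step-by-step reasoning over the clarified question.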

BibTeX

@misc{deng2023rephrase,
        title={Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves}, 
        author={Yihe Deng and Weitong Zhang and Zixiang Chen and Quanquan Gu},
        year={2023},
        eprint={2311.04205},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
      }