Retrieval

In this section, we will discuss how to enhance ChatPLUG with a search engine or local knowledge base, using the Bing search engine as an example.

BING_SEARCH_API

To use Bing search, we need to obtain a subscription key by following the official documentation provided in the links for an overview and to apply for the API. Once we have the key, we can export it using the provided command.

export BING_SEARCH_API=<subscription_key>

OpenWeb

The OpenWeb class provides a wrapper for the Bing search engine to return a list of text snippets.

class OpenWeb(object):
 
    def __init__(self, search_engine=None):
        if search_engine is None:
            search_engine = BingSearch()
        self.search_engine = search_engine
    
    @lru_cache
    def search(self, query) -> (List[Snippet], bool):
        """
        search with cache.

        Args:
            query: search_query

        Returns:
            snippets: list of snippets
            is_special_card: True/False
        """
        return self.search_engine.search(query)

Learn2Search

To improve performance, we can implement a query classifier and query rewriter model to determine whether a question needs to be searched and rewrite it as a suitable search query.

For simplicity, we will use text_is_question as the query classifier and the last utterance as the search query. For better performance, we may need to build our own query classifier and query rewriter.

class BaseLearn2Search(object):
    def __init__(self):
        print(f'| skip query_classifier.')    
        self.query_classifier = None

    def need_search(self, query: str) -> Tuple[bool, str]:
        return text_is_question(query) and not is_persona_question(query), CHITCHAT_QUERY
    
    def get_search_query(self, query: str, history: List[HistoryItem]):
        # only use the last query
        return query

Config

To configure ChatPLUG to use Bing search, we need to enable the use of the OpenWeb class in the chatplug_3.7B_sftv2.6.0_instruction.hjson file. We also need to specify the directory and provider for the utterance rewriter and provide the path for the learn2search query classifier.

openweb_use: true

# rewrite
utterance_rewriter_save_dir: ""
utterance_rewriter_is_onnx: false
utterance_rewriter_quantized: false
utterance_rewriter_provider: cuda

# learn2search
learn2search_query_classifier_path: ""

Overall, these steps will allow us to enhance ChatPLUG with a search engine or local knowledge base, providing users with more accurate and helpful responses.