ask_youtube_playlists.question_answering package

Submodules

ask_youtube_playlists.question_answering.extractive module

Contains the functionality to perform extractive question answering.

ask_youtube_playlists.question_answering.extractive.get_extractive_answer(question: str, context: str, model_name: str = 'deepset/roberta-base-squad2') → str

Returns the answer to a question using extractive question answering.

Parameters:
  • question (str) – The question.

  • context (str) – The context.

  • model_name (str, optional) – The model name. Defaults to “deepset/roberta-base-squad2”.

Returns:

A dictionary with the ‘answer’ as a string, the ‘score’ as a float and the ‘start’ and ‘end’ as integers.
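
Example (a minimal usage sketch; the question and context below are placeholders for real transcript text):

    from ask_youtube_playlists.question_answering.extractive import (
        get_extractive_answer,
    )

    # Placeholder question and context; any transcript passage can be used.
    question = "Who created Python?"
    context = "Python was created by Guido van Rossum and first released in 1991."

    # Uses the default model 'deepset/roberta-base-squad2'.
    answer = get_extractive_answer(question, context)
    print(answer)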

ask_youtube_playlists.question_answering.generative module

Contains the functionality to answer a question using generative models.

class ask_youtube_playlists.question_answering.generative.LLMSpec(model_name: str, model_type: str, max_tokens: int)

Bases: object

Class to store the information of a language model.

model_name

The name of the language model.

Type:

str

model_type

The class or method used to load the language model.

Type:

str

max_tokens: int
model_name: str
model_type: str
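
Example (an illustrative construction; the field values are placeholders, since real specifications are normally obtained through get_model_spec()):

    from ask_youtube_playlists.question_answering.generative import LLMSpec

    # Placeholder values; not a model guaranteed to be supported.
    spec = LLMSpec(
        model_name="example-model",
        model_type="example-loader",
        max_tokens=1024,
    )
    print(spec.model_name, spec.model_type, spec.max_tokens)
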
ask_youtube_playlists.question_answering.generative.get_generative_answer(question: str, relevant_documents: List[Document], model_name: str, temperature: float, max_length: int) → str

Returns the answer to the question as a string.

Parameters:
  • question (str) – The question asked by the user.

  • relevant_documents (List[Document]) – The list of relevant documents.

  • model_name (str) – The name of the language model.

  • temperature (float) – The temperature used to generate the answer.

  • max_length (int) – The maximum length of the generated answer.

Returns:

The answer to the question.

Return type:

str
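
Example (a sketch only; the document and the model name are placeholders, and in practice the relevant documents come from the retriever module):

    from langchain.schema import Document

    from ask_youtube_playlists.question_answering.generative import (
        get_generative_answer,
    )

    # Placeholder document; real documents are produced by Retriever.retrieve().
    docs = [Document(page_content="Python was created by Guido van Rossum.")]

    answer = get_generative_answer(
        question="Who created Python?",
        relevant_documents=docs,
        model_name="some-supported-model",  # placeholder; must be a supported model
        temperature=0.7,
        max_length=256,
    )
    print(answer)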

ask_youtube_playlists.question_answering.generative.get_model_spec(model_name: str) → LLMSpec

Returns the language model specification.

Parameters:

model_name (str) – The name of the language model.

Returns:

The language model specification.

Return type:

LLMSpec

Raises:

ValueError – If the language model is not available.
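
Example (the model name is a placeholder; the except branch relies on the documented ValueError):

    from ask_youtube_playlists.question_answering.generative import get_model_spec

    try:
        spec = get_model_spec("some-supported-model")  # placeholder name
        print(spec.model_name, spec.model_type, spec.max_tokens)
    except ValueError:
        print("This language model is not available.")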

ask_youtube_playlists.question_answering.generative.load_model(model_name: str, temperature: float = 0.7, max_length: int = 1024) → BaseLLM

Loads the language model.

Parameters:
  • model_name (str) – The language model name.

  • temperature (float, optional) – The temperature used to generate the answer. The higher the temperature, the more “creative” the answer will be. Defaults to 0.7.

  • max_length (int, optional) – The maximum length of the generated answer. Defaults to 1024.

Returns:

The language model.

Return type:

llms.base.BaseLLM
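
Example (a sketch; the model name is a placeholder and must be one of the models accepted by get_model_spec()):

    from ask_youtube_playlists.question_answering.generative import load_model

    # Returns a langchain BaseLLM instance.
    llm = load_model("some-supported-model", temperature=0.7, max_length=1024)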

ask_youtube_playlists.question_answering.retriever module

Contains the functionality used to retrieve the most relevant documents for a given question.

class ask_youtube_playlists.question_answering.retriever.DocumentInfo(document: Document, score: float, playlist_name: str)

Bases: NamedTuple

Class to store information about a document.

document

The document text or content.

Type:

langchain.schema.Document

score

The relevance score of the document. The higher the score, the more relevant the document is. It is in the range [0, 1].

Type:

float

playlist_name

The name of the playlist to which the document belongs.

Type:

str

document: Document

Alias for field number 0

playlist_name: str

Alias for field number 2

score: float

Alias for field number 1
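
Example (an illustrative instance; DocumentInfo objects are normally produced by Retriever.retrieve(), so the values below are placeholders):

    from langchain.schema import Document

    from ask_youtube_playlists.question_answering.retriever import DocumentInfo

    info = DocumentInfo(
        document=Document(page_content="Some transcript chunk."),
        score=0.87,
        playlist_name="example-playlist",
    )
    print(info.playlist_name, info.score)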

class ask_youtube_playlists.question_answering.retriever.Retriever(retriever_directory: Path, config_filename: str = 'hyperparams.yaml')

Bases: object

Class to retrieve the most relevant documents for a given question.

static cosine_distance(question_embedding: ndarray, document_embedding: ndarray) → float

Calculates the cosine distance between two vectors.

Parameters:
  • question_embedding (np.ndarray) – The embedding of the question.

  • document_embedding (np.ndarray) – The embedding of the document.

Returns:

The cosine distance between the two vectors.

Return type:

float
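
Example (toy vectors for illustration; real embeddings come from the embedding model configured for the retriever):

    import numpy as np

    from ask_youtube_playlists.question_answering.retriever import Retriever

    question_embedding = np.array([1.0, 0.0, 0.0])
    document_embedding = np.array([0.5, 0.5, 0.0])

    distance = Retriever.cosine_distance(question_embedding, document_embedding)
    print(distance)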

classmethod retrieve(retrievers: List[Retriever], question: str, n_documents: int) → List[DocumentInfo]

Retrieves the most relevant documents with their score and the playlist they belong to.

This function retrieves documents in two steps:

  1. Extracts the most relevant documents from each retriever in the list.

  2. Ranks the retrieved documents from all retrievers and returns the most relevant ones, along with their scores and the playlists they belong to.

Parameters:
  • retrievers (List[Retriever]) – A list of retrievers.

  • question (str) – The question posed by the user.

  • n_documents (int) – The number of documents to retrieve.

Returns:

A list of named tuples, each containing the document, its score and the playlist it belongs to. The list is sorted in descending order by relevance score.

Return type:

list
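
Example (a sketch; the directory paths are placeholders and are assumed to contain the playlist data and 'hyperparams.yaml' produced when the embeddings were created):

    from pathlib import Path

    from ask_youtube_playlists.question_answering.retriever import Retriever

    retrievers = [
        Retriever(Path("data/playlist_a")),  # placeholder directories
        Retriever(Path("data/playlist_b")),
    ]

    results = Retriever.retrieve(retrievers, question="Who created Python?", n_documents=5)
    for info in results:
        print(info.playlist_name, round(info.score, 3))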

retrieve_from_playlist(question: str, n_documents: int) → List[DocumentInfo]

Retrieves the most relevant documents with their relevance score.

Parameters:
  • question (str) – The question posed by the user.

  • n_documents (int) – The number of documents to retrieve.

property total_number_of_documents: int

Returns the total number of documents.
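
Example (a sketch over a single playlist; the directory path is a placeholder with the same assumptions as above):

    from pathlib import Path

    from ask_youtube_playlists.question_answering.retriever import Retriever

    retriever = Retriever(Path("data/playlist_a"))  # placeholder directory
    print(retriever.total_number_of_documents)

    results = retriever.retrieve_from_playlist(question="Who created Python?", n_documents=3)
    for info in results:
        print(info.score, info.document.page_content[:80])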

Module contents

Implements the question answering system.

It consists of three components:

  1. Retrieval: retrieves the most relevant documents for a given question.

  2. Extractive: extracts the most relevant sentences from the retrieved documents.

  3. Generative: generates an answer to the question from the extracted sentences.
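
Example (an end-to-end sketch tying the three components together; the playlist directory and the generative model name are placeholders):

    from pathlib import Path

    from ask_youtube_playlists.question_answering.extractive import get_extractive_answer
    from ask_youtube_playlists.question_answering.generative import get_generative_answer
    from ask_youtube_playlists.question_answering.retriever import Retriever

    question = "Who created Python?"

    # 1. Retrieval: placeholder playlist directory.
    retriever = Retriever(Path("data/playlist_a"))
    documents = Retriever.retrieve([retriever], question, n_documents=5)

    # 2. Extractive: read the answer span from the most relevant document.
    extractive_answer = get_extractive_answer(
        question, documents[0].document.page_content
    )

    # 3. Generative: generate an answer from the retrieved documents.
    generative_answer = get_generative_answer(
        question=question,
        relevant_documents=[info.document for info in documents],
        model_name="some-supported-model",  # placeholder
        temperature=0.7,
        max_length=256,
    )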