# Problem 1: Solution

## Objective

To validate the data type of each column (e.g. binary labels, ordinal labels) so that the correct feature transformations can be applied.

## Problem

* BigQuery AI can be applied to a data column efficiently, but its output is not consistent. The same problem appears when running language models equipped with tools or function-decorator wrappers. - *many catalysts but not enough time.*
* This stage matters because the correct data type transformations must be applied for the later stages to proceed without unwanted surprises.

## Solution

* Use `sentence-transformers` with sample data representing each data type:
  * continuous
  * binary labels
  * ordinal labels
  * multi-label categories
  * short text e.g. names, emails
  * long text e.g. chat logs
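As a rough sketch of how the sample data could be turned into one embedding per data type: the helper below takes a mapping from type label to example strings plus an `encode` callable, and returns a unit-length mean vector per label. The names `load_encoder` and `build_prototypes`, the model name `all-MiniLM-L6-v2`, and the choice to average the sample embeddings are assumptions for illustration, not the project's confirmed setup.

```python
from typing import Callable, Dict, List, Sequence

import numpy as np


def load_encoder(model_name: str = "all-MiniLM-L6-v2") -> Callable[[Sequence[str]], np.ndarray]:
    """Return an encode function backed by sentence-transformers (model name is an assumption)."""
    # Imported lazily so build_prototypes can be used with any encoder.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(list(texts), normalize_embeddings=True)


def build_prototypes(
    samples: Dict[str, List[str]],
    encode: Callable[[Sequence[str]], np.ndarray],
) -> Dict[str, np.ndarray]:
    """Embed each label's sample values and average them into one unit vector."""
    prototypes = {}
    for label, texts in samples.items():
        vectors = np.asarray(encode(texts), dtype=float)
        mean = vectors.mean(axis=0)
        # Guard against a zero vector before normalizing.
        prototypes[label] = mean / max(np.linalg.norm(mean), 1e-12)
    return prototypes
```

With the real model, a call would look like `build_prototypes({"binary labels": ["yes", "no"]}, load_encoder())`; passing the encoder in as an argument keeps the helper easy to test with a stub.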

Assuming no dependency or hierarchical relations hold, the formula reduces to the canonical cosine similarity form. The target $$\hat{t}$$ is the one that maximizes cosine similarity over a normalized set $$\{q_i\}$$:

$$\hat{t} = \arg\max_{t} \; \frac{\langle t, q_i \rangle}{\|t\| \, \|q_i\|}$$

In short, the steps in pseudocode are:

1. First compute $$\bar{q}$$ from the normalized set $$\{q_i\}$$.
2. Then pick $$\hat{t}$$ as the target with the highest cosine similarity to $$\bar{q}$$.
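The two steps above can be sketched with NumPy. This is a minimal illustration, assuming $$\bar{q}$$ is the mean of the unit-normalized vectors $$q_i$$ and that each candidate target $$t$$ is a single embedding vector; the function names and the toy 2-D vectors are hypothetical.

```python
import numpy as np


def normalized_mean(vectors: np.ndarray) -> np.ndarray:
    """Step 1: unit-normalize each row q_i, then average the rows into q_bar (assumed definition)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit.mean(axis=0)


def pick_target(targets: dict, q_bar: np.ndarray) -> str:
    """Step 2: return the label whose vector has the highest cosine similarity to q_bar."""
    def cosine(t: np.ndarray) -> float:
        return float(np.dot(t, q_bar) / (np.linalg.norm(t) * np.linalg.norm(q_bar)))

    return max(targets, key=lambda label: cosine(targets[label]))


# Toy 2-D vectors standing in for real embeddings.
targets = {
    "binary labels": np.array([1.0, 0.0]),
    "long text": np.array([0.0, 1.0]),
}
queries = np.array([[2.0, 0.2], [3.0, 0.1]])
best = pick_target(targets, normalized_mean(queries))
```

Because cosine similarity divides by both norms, the scale of the target vectors does not affect which label is picked.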

The data samples representing each data column label are stored under the directory path `core/data/sample/*.csv`, along with other assumptions that are used to configure the agent.

