# Stage V: The End!

## Running Workflows

Subsets of a pre-configured sample size, N, are drawn from the mean-pooled tabular datasets. Each subset contains only the raw values (column names are excluded) and is then projected into a lower-dimensional latent space aligned to the hidden block size of the sentence-transformer module.
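The sampling step could be sketched as follows. This is a minimal illustration only: the helper name `sample_raw_rows`, the toy table, sampling without replacement, and the space-separated serialization are all assumptions, not details confirmed by this pipeline.

```python
import random


def sample_raw_rows(table, n, seed=0):
    """Draw up to n rows at random and serialize only their raw values.

    Hypothetical helper: `table` is a list of rows whose first row is the header.
    """
    header, *rows = table  # the header (column names) is split off and never used
    rng = random.Random(seed)
    picked = rng.sample(rows, k=min(n, len(rows)))  # assumption: no replacement
    # Assumption: each row becomes one space-separated string for the embedder.
    return [" ".join(str(value) for value in row) for row in picked]


# Toy data, made up for illustration.
table = [
    ["name", "age", "city"],
    ["Ada", 36, "London"],
    ["Linus", 54, "Portland"],
    ["Grace", 85, "Arlington"],
]
texts = sample_raw_rows(table, n=2)
```

Each string in `texts` would then be handed to the sentence-transformer module for projection into the latent space.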

From there, the transformed data is passed to scikit-learn or BigQuery PCA to reduce its dimensionality further, so that the number of features falls below 384 (the default embedding size of the model in use). PCA uses a numerical method, singular value decomposition (SVD), to retain the components that explain the most variance.
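To show how SVD-based PCA keeps the highest-variance components, here is a minimal sketch that uses NumPy directly in place of scikit-learn or BigQuery PCA. The function name `pca_svd`, the random stand-in for the embeddings, and the target of 32 components are assumptions chosen for illustration.

```python
import numpy as np


def pca_svd(X, n_components):
    """Project X onto its top principal components via SVD (hypothetical helper)."""
    X_centered = X - X.mean(axis=0)  # PCA requires mean-centered features
    # Thin SVD: rows of Vt are principal directions, sorted by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    reduced = X_centered @ Vt[:n_components].T  # keep the leading directions only
    return reduced, ratio[:n_components]


# Random data standing in for 384-dimensional sentence-transformer embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 384))
reduced, ratio = pca_svd(embeddings, n_components=32)
```

`reduced` has 32 columns instead of 384, and `ratio` lists the fraction of total variance explained by each retained component, in descending order.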

## Database Schema

* End-of-stage reports for each previous stage in this pipeline.

## User Interaction - Frontend

At the end of this workflow, the user is presented with the following options:

1. Download the cleaning report (lowest priority: this requires connecting to the subnet IP address of the user's database, which is not available for some datastores and can be harder to implement)
2. Proceed to the next stage:
   1. Generate the database's current quality scores, OR
   2. Data insights & significance

