# Problem Summary

**Point 1**

Text datasets for emotion / behaviour analysis are often poorly labelled, for example:

* <https://www.kaggle.com/datasets/mmmarchetti/tweets-dataset>
* <https://www.kaggle.com/datasets/shivamb/go-emotions-google-emotions-dataset>

The main problem is that the labels are often incorrect and carry bias. Generative models may not solve this completely on their own, but running many of them concurrently and combining their outputs with Bayesian statistical methods just might.
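A minimal sketch of how the labels from several models might be combined with Bayes' rule, assuming a simple "one-coin" noise model: each model is right with a known accuracy and, when wrong, picks uniformly among the remaining labels, and models are treated as independent given the true label. The model names, accuracies and emotion labels below are hypothetical placeholders, not measured values.

```python
import math
from typing import Dict, List, Optional


def posterior_label(
    votes: Dict[str, str],
    accuracy: Dict[str, float],
    labels: List[str],
    prior: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Posterior P(true label | model votes) under a one-coin noise model.

    Assumes each model is correct with probability accuracy[model]
    (strictly between 0 and 1) and, when wrong, picks one of the other
    len(labels) - 1 labels uniformly at random. Models are treated as
    conditionally independent given the true label.
    """
    k = len(labels)
    if prior is None:
        prior = {label: 1.0 / k for label in labels}  # uniform prior
    log_scores = {}
    for label in labels:
        score = math.log(prior[label])
        for model, vote in votes.items():
            p = accuracy[model]
            # Likelihood of this vote if `label` were the true label.
            score += math.log(p if vote == label else (1.0 - p) / (k - 1))
        log_scores[label] = score
    # Normalise in log space (log-sum-exp) to avoid underflow with many models.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {label: math.exp(s - top) / total for label, s in log_scores.items()}


# Hypothetical example: three generative models label the same tweet.
labels = ["joy", "anger", "sadness"]
votes = {"model_a": "joy", "model_b": "joy", "model_c": "anger"}
accuracy = {"model_a": 0.8, "model_b": 0.7, "model_c": 0.6}
posterior = posterior_label(votes, accuracy, labels)
print({label: round(p, 3) for label, p in posterior.items()})
# → {'joy': 0.903, 'anger': 0.073, 'sadness': 0.024}
```

The accuracies here are simply assumed; in practice they would have to be estimated, for example from a small hand-checked subset or with an EM-style method such as Dawid-Skene, before the posterior can be trusted.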

**Point 2**

Working in data teams, every time I get a fun project I have to do the boring tasks first, for example:

* Data scraping (XML)
* Data Analytics
* Debugging ETL pipelines

Sometimes there are surprises, such as a workflow crashing on a Monday morning. These failures may never be eliminated entirely, but they could be reduced by a bot that mimics exactly what I do.

A data pipeline that is built incorrectly can lead to many costly mistakes, but that does not necessarily mean the programmer is at fault, and no one can know for certain. One can imagine that if a programmer were as devoted to the problem as he is to his wife at home, surely the problem would stop surprising him with another man (open ports).

The current solution entails running multiple daemon processes with never-ending life cycles; without these, Apple / Mac devices wouldn't be here today. If daemon processes could think, adapt, learn and make decisions on their own, then maybe we would have more time to think about whether we are truly alive.
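A daemon that "adapts" in even the most basic sense might start with a supervisor loop like the one below. This is a minimal sketch, assuming a hypothetical pipeline step passed in as `job`, that retries a crashed step with capped exponential backoff instead of giving up on the first error. The function name, parameters and delays are illustrative and not an existing tool.

```python
import logging
import time
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline-supervisor")


def supervise(
    job: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `job`, retrying on failure with capped exponential backoff.

    Returns True if the job eventually succeeds, False if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            log.info("job succeeded on attempt %d", attempt)
            return True
        except Exception as exc:  # broad on purpose: any crash triggers a retry
            if attempt == max_attempts:
                log.error("attempt %d failed (%s); giving up", attempt, exc)
                return False
            # Wait 1s, 2s, 4s, ... (scaled by base_delay), never more than max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            log.warning("attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            sleep(delay)
    return False


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_extract() -> None:
        # Hypothetical ETL step that fails on its first two runs.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("upstream XML feed unavailable")

    supervise(flaky_extract, base_delay=0.1)
```

Passing `sleep` in as a parameter keeps the loop easy to test without real waiting. A real deployment would more likely run under a process manager such as systemd or launchd and raise an alert once retries are exhausted.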

I once asked the world why this buffoonery of tools had not been automated, and everyone assumed I was just a Padawan who did not know any better. Now that I am older and a little taller, I am going to ask the question again: *“How has this not been automated?”*

One way of gaining market exposure is by leveraging social platforms such as Reddit, Facebook and Instagram. Out of boredom, I have attempted to manipulate their algorithms through *botnets*, i.e. temporary test users. I am well aware that this is not an ethical solution, but I do it anyway and I am proud of it. That being said, I am also proud to be part of the rare percentile of users who have been flagged, banned or revoked from platforms, including League of Legends.

I treat these failures as lessons and use them as feedback to improve my programming skills. Sometimes the consequences get worse, but on good days my botnets survive a little longer. I am an amateur, so I make errors such as leaving out control variables like the peak hours of active users: should I analyze the causal effect of this feature or apply correlation-based inference methods to it? The answer is simple: just do it, i.e. *pick one -> negate one -> repeat the cycle until you are left with one option.*