
Validate persona description is sufficiently different #1225


Merged: aponcedeleonch merged 1 commit into main from prevent-similar-personas on Mar 5, 2025


aponcedeleonch (Contributor)

Closes: #1218

Check whether the description of a new persona is sufficiently different from the descriptions of the existing personas. This is done so that personas can be reliably told apart.
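A minimal sketch of this check, assuming each description is turned into an embedding vector and that the distance is cosine distance. The review excerpt only shows a `distance` compared against `_persona_diff_desc_threshold`, so the metric and the helper names `cosine_distance` and `is_description_distinct` are hypothetical. A new persona is rejected as soon as any existing description lies closer than the threshold:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_description_distinct(
    new_embedding: Sequence[float],
    existing_embeddings: Sequence[Sequence[float]],
    threshold: float,
) -> bool:
    """Hypothetical helper: False if the new description is too close to any existing one."""
    for existing in existing_embeddings:
        # If the distance is less than the threshold, the description is too similar
        if cosine_distance(new_embedding, existing) < threshold:
            return False
    return True
```

For example, with existing embeddings [[1.0, 0.0], [0.0, 1.0]] and a threshold of 0.1, the vector [0.9, 0.1] is rejected (distance of about 0.006 to [1.0, 0.0]), while [1.0, 1.0] is accepted (distance of about 0.293 to both).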

@aponcedeleonch aponcedeleonch requested review from JAORMX and ptelang March 5, 2025 11:21
@aponcedeleonch aponcedeleonch force-pushed the prevent-similar-personas branch from befb7b4 to 490e355 Compare March 5, 2025 11:21
    )
    # If the distance is less than the threshold, the persona description is too similar
    if persona_distance.distance < self._persona_diff_desc_threshold:
        return False
Contributor

How expensive is it to parallelize this?

Contributor Author

I tried 2 approaches to get the distances:

  1. Direct query to sqlite and get the distances
  2. Query to get all the personas, then use numpy matrices operations to get the distance
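Approach 2 can be sketched roughly as follows, assuming numpy, cosine distance, and that the stored persona embeddings are stacked into an (n, d) matrix; the function names are hypothetical. Normalizing the rows once turns all n comparisons into a single matrix-vector product, and checking the minimum distance against the threshold is equivalent to the early-return loop in the diff:

```python
import numpy as np


def min_cosine_distance(new_embedding: np.ndarray, existing: np.ndarray) -> float:
    """Smallest cosine distance between one vector and every row of a matrix.

    new_embedding: shape (d,); existing: shape (n, d), one row per persona.
    """
    # Normalize so a single matrix-vector product yields all cosine similarities
    new_unit = new_embedding / np.linalg.norm(new_embedding)
    existing_unit = existing / np.linalg.norm(existing, axis=1, keepdims=True)
    similarities = existing_unit @ new_unit  # shape (n,)
    return float(np.min(1.0 - similarities))


def is_description_distinct(
    new_embedding: np.ndarray, existing: np.ndarray, threshold: float
) -> bool:
    """Hypothetical helper: True if no stored persona is closer than threshold."""
    if existing.shape[0] == 0:
        return True  # no personas yet, nothing to compare against
    return min_cosine_distance(new_embedding, existing) >= threshold
```

The trade-off is memory for loop overhead: all n embeddings must be loaded at once, which is why, as noted in the reply, it only pays off when the number of personas is large.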

The result of the experiment was that it didn't matter: both approaches performed practically on par.

For this specific check, which only compares distances against a threshold, parallelizing it with matrix operations probably makes no difference. I don't expect anyone to have 1000 different personas in their DB; if that happens, then yes, we would need to optimize. With a sensible number of personas (<10), it really makes no difference.

Contributor

Let's take an extreme but not unreasonable example: 100 personas. Would we start seeing issues in this case?
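One rough way to reason about the 100-persona case: a linear scan computes one dot product and two norms per stored persona, so n personas with d-dimensional embeddings cost about 3·n·d multiply-adds. Assuming 384-dimensional embeddings (a hypothetical size), 100 personas come to about 3 × 100 × 384 = 115,200 multiply-adds per new persona. The sketch below times that scan in pure Python on random vectors; the dimension, seed, and helper names are assumptions, and it measures only the distance computation, not the database query:

```python
import math
import random
import time
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def time_linear_check(num_personas: int = 100, dim: int = 384, seed: int = 0) -> float:
    """Seconds spent comparing one new embedding against num_personas stored ones.

    dim=384 is an assumed embedding size, not taken from the project.
    """
    rng = random.Random(seed)
    existing = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_personas)]
    new = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    start = time.perf_counter()
    # One distance per stored persona, as in the loop under review
    for emb in existing:
        cosine_distance(new, emb)
    return time.perf_counter() - start
```

Running `time_linear_check()` locally, and again with `num_personas=1000`, gives a concrete number to compare against the latency budget for creating a persona.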

@aponcedeleonch aponcedeleonch merged commit da69ec0 into main Mar 5, 2025
11 checks passed
@aponcedeleonch aponcedeleonch deleted the prevent-similar-personas branch March 5, 2025 12:33
Development: successfully merging this pull request may close [Task]: Create Personas with descriptions
2 participants