Introducing Statsmodels Guru on Gurubase.io #9429

kursataktas · 2024-11-20T21:06:11Z

Hello team,

I'm the maintainer of Anteon. We have created Gurubase.io with the mission of building a centralized, open-source tool-focused knowledge base. Essentially, each "guru" is equipped with custom knowledge to answer user questions based on collected data related to that tool.

I wanted to let you know that I've manually added the Statsmodels Guru to Gurubase. Statsmodels Guru uses data from this repo and from the docs to answer questions with an LLM.

This PR adds a "Statsmodels Guru" badge, which tells users that Statsmodels now has an AI assistant available to help with their questions. Please let me know your thoughts on this contribution.

Additionally, if you want me to disable Statsmodels Guru in Gurubase, just let me know; that's totally fine.

Signed-off-by: Kursat Aktas <kursat.ce@gmail.com>

josef-pkt · 2024-11-20T22:04:25Z

I don't think we should advertise anything like this yet.

First, we don't really want to pick one of the competitors for it.
I was recently looking at Copilot to see how good the answers are (#9318).

Second, from what I have seen so far, the answers are good for very common problems or where our documentation is very good, e.g. OLS regression.
However, the answers are very often wrong or incomplete for less common questions.
Quick check: the example under "Example calculations" does not define the effect size correctly.
https://gurubase.io/g/statsmodels/calculate-power-for-proportion-in-statsmodels

Copilot is better in this case and uses
effect_size = sms.proportion_effectsize(0.5, 0.6) # Example proportions (baseline and target)

But Copilot also often gives examples that will not work or will not do what is required.
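To make the effect-size point concrete, here is a minimal sketch using only the Python standard library. It assumes the effect size wanted for a proportion power calculation is Cohen's h (the difference of arcsine-square-root transformed proportions), which is what `proportion_effectsize` is documented to return. The helpers `cohens_h` and `two_sample_power`, the group size of 200, and alpha = 0.05 are hypothetical choices for illustration. The power formula is the textbook normal approximation for two equal-sized groups, not statsmodels' own code, and the "raw difference" case is just one plausible way to get the effect size wrong, not necessarily what the linked answer did.

```python
import math
from statistics import NormalDist


def cohens_h(prop1: float, prop2: float) -> float:
    """Cohen's h: difference of arcsine-square-root transformed proportions."""
    return 2 * (math.asin(math.sqrt(prop1)) - math.asin(math.sqrt(prop2)))


def two_sample_power(effect_size: float, nobs_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test with two equal-sized groups."""
    norm = NormalDist()
    crit = norm.inv_cdf(1 - alpha / 2)
    # Shift of the standardized test statistic under the alternative.
    shift = abs(effect_size) * math.sqrt(nobs_per_group / 2)
    # Probability of landing in either rejection region.
    return norm.cdf(shift - crit) + norm.cdf(-shift - crit)


# Same example proportions as in the comment above (baseline and target).
h = cohens_h(0.5, 0.6)
raw_diff = 0.5 - 0.6  # NOT a standardized effect size

power_h = two_sample_power(h, nobs_per_group=200)
power_raw = two_sample_power(raw_diff, nobs_per_group=200)

print(f"Cohen's h: {h:.4f}, approximate power: {power_h:.3f}")
print(f"raw difference: {raw_diff:.4f}, approximate power: {power_raw:.3f}")
```

With these numbers, passing the raw difference instead of Cohen's h gives a much lower power figure for the same design, which is why the definition of the effect size matters when checking an AI-generated answer.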

kursataktas · 2024-11-21T08:00:34Z

Hi @josef-pkt

Thanks for the review. You are right, it works pretty well for well-written documents and the questions that can be extracted from them. However, for questions with very limited supporting context, it sometimes hallucinates.

We are building a "Trust Score" feature, which will show users a score indicating the level of supporting context for the answer.

As for the Copilot side, I have no comment. We position Gurubase as a Learning Assistant rather than a Coding Assistant. Perhaps you can use both tools.

josef-pkt · 2024-11-21T15:23:06Z

> We position Gurubase as a Learning Assistant rather than a Coding Assistant

That's what I was looking at with Copilot when it showed up on my Windows computer.
I don't have it integrated with my IDE. (I'm not using Visual Studio or another Copilot-compatible IDE.)

Since the latest Windows update, Copilot is no longer a trial version, but its performance has decreased in some respects. I no longer automatically get references (links to websites).

josef-pkt · 2024-11-21T15:32:04Z

BTW, in issue #9318 I was looking in both directions:

Can we use Copilot and similar tools to improve our docs, or to improve them faster?
How and where can we improve our docs so that Copilot and similar tools give better answers?

One task is to add more example notebooks where we have gaps or where the existing notebooks are not descriptive enough.

When I looked at the trial version of Copilot, I often got reference links to blog articles that described a statsmodels feature with examples. Those Copilot answers were usually pretty good and mostly correct.

josef-pkt · 2024-11-21T17:51:25Z

Another problem that I just found: delay in getting newer features.

Example: "two sample test for comparing proportion in statsmodels"
shows only the old function and not the newer, much better functions released in 2020.
This applies to both Copilot and Gurubase.

The answers themselves look fine (no obvious errors when skimming), but they are outdated.

I guess it takes some time for docs, blog posts and similar to create enough material to get into LLMs.

(Aside:
Google search also finds mainly posts about proportions_ztest, but at least it includes, as the second item, the link to the appropriate new function in the docs.
It looks like there are not many pages on the internet that use more than the simple z-test and chi-square test.
)

kursataktas · 2024-11-25T15:10:38Z

> Can we use copilot and similar to improve or speed up improving our docs?

We have a similar idea planned for the future. The idea is to compare the codebase with the documentation to identify missing parts and inconsistencies.

> Another problem that I just found: delay in getting newer features.
>
> Example: "two sample test for comparing proportion in statsmodels"
> shows only the old function and not the newer, much better functions released in 2020.
> This applies to both Copilot and Gurubase.
>
> The answers themselves look fine (no obvious errors when skimming), but they are outdated.
>
> I guess it takes some time for docs, blog posts and similar to create enough material to get into LLMs.
>
> (Aside:
> Google search also finds mainly posts about proportions_ztest, but at least it includes, as the second item, the link to the appropriate new function in the docs.
> It looks like there are not many pages on the internet that use more than the simple z-test and chi-square test.
> )

I manually added the latest version of the documentation as a data source for Statsmodels Guru. The issue might be that I missed some parts of it, or the system didn’t perform well in finding the relevant context. I’ll look into this.

Additionally, I want to update you on the release of the Maintainer Panel feature on Gurubase. With this panel, you can add, remove, or update data sources, change the logo, and more. You can find the details here.

kursataktas added 2 commits November 20, 2024 23:48

Introducing Statsmodels Guru on Gurubase.io

e140f57

Signed-off-by: Kursat Aktas <kursat.ce@gmail.com>

fix

676e128
