
T378118 [Spike] Check with Search Platform about traffic impact of recommendations A/B test

Closed, Resolved | Public | 2 Estimated Story Points | Spike

Description

Question we are trying to answer

  • If we roll out an A/B test on mobile that utilizes MoreLike to make recommendations for treatment users, this will generate a lot of traffic to APIs that aren't currently handling this volume of traffic. There are a few parts to this:
    • What volume of traffic are we expecting? (Web to investigate/determine)
    • Is the current Search Platform infrastructure sufficient for this volume of traffic? (Web to talk to Search Platform about this)
    • If not, what would be necessary to support this, and how long would this take? (Search Platform to weigh in on this, document the results here)

Acceptance Criteria

  • We have gotten good estimates of potential traffic volume based on relevant page view data
  • We have met with Search Platform to discuss this and its implications
  • We have documented the outcome of this sync and have an idea of what might be necessary to make this work
  • We have discussed with the search team whether this work should be reflected in a hypothesis from the side of the search team

Event Timeline

ovasileva updated the task description.
ovasileva removed the point value for this task.
NBaca-WMF set the point value for this task to 2. (Nov 13 2024, 5:50 PM)

MoreLike currently stays fairly busy at 80-100M requests per day, primarily through the RelatedArticles extension. From the search perspective, our main concern is to ensure that Related Articles and this functionality use the same shape of API requests and thus share the same caches. This will help keep the load down on the search side and give users better response times by drawing on an already well-populated cache.
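A minimal sketch of why request shape matters for cache sharing, assuming the cache is keyed on the normalized request parameters (the actual keying is not described in this task). The parameter sets below are illustrative only and are not the exact requests sent by RelatedArticles or the new feature; `canonical_query` is a hypothetical helper.

```python
from urllib.parse import urlencode


def canonical_query(params: dict) -> str:
    """Return a query string that does not depend on parameter order."""
    return urlencode(sorted(params.items()))


# Illustrative parameter sets (assumptions, not the real requests).
related_articles = {
    "action": "query",
    "generator": "search",
    "gsrsearch": "morelike:Example",
    "gsrlimit": 3,
}
# Same parameters in a different order: same canonical key, so a shared hit.
new_feature = {
    "gsrlimit": 3,
    "gsrsearch": "morelike:Example",
    "generator": "search",
    "action": "query",
}
# One value differs: a different key, so it cannot reuse the cached entry.
different_shape = dict(new_feature, gsrlimit=10)

shares_cache = canonical_query(related_articles) == canonical_query(new_feature)
misses_cache = canonical_query(related_articles) != canonical_query(different_shape)
```

Under this assumption, any difference in a parameter value (for example the result limit) produces a separate cache entry, even for the same page title.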

In terms of total load the search side can handle, it's hard to say for sure. If this has a similar distribution of pages (cacheability) to Related Articles, then I wouldn't start to worry about additional load until we are talking about adding 10-20M+ requests per day. Even then it's not necessarily a worry, but something we would at least keep a closer eye on.
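To put those figures in proportion, here is a rough calculation of the relative increase that 10-20M added requests per day would represent over the 80-100M baseline. It uses only the numbers quoted in this comment; the variable names are made up for the sketch.

```python
# Figures quoted in the comment above (requests per day).
BASELINE_PER_DAY = (80_000_000, 100_000_000)
WATCH_THRESHOLD_PER_DAY = (10_000_000, 20_000_000)


def relative_increase(added: int, baseline: int) -> float:
    """Added load as a fraction of the existing baseline."""
    return added / baseline


# Smallest case: low end of the threshold against the high end of the baseline.
smallest = relative_increase(WATCH_THRESHOLD_PER_DAY[0], BASELINE_PER_DAY[1])
# Largest case: high end of the threshold against the low end of the baseline.
largest = relative_increase(WATCH_THRESHOLD_PER_DAY[1], BASELINE_PER_DAY[0])

print(f"{smallest:.0%} to {largest:.0%}")  # prints "10% to 25%"
```

In other words, the point at which the search side would start watching more closely corresponds to roughly a 10-25% increase over current MoreLike traffic.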

Just met with @EBernhardson and @dcausse to discuss this. Since we are testing based on session ID, we should see a similar distribution of pages to Related Articles, so for the A/B test the Search Platform team mainly needs to be told when it starts and on which wikis. EBernhardson will be out in January, but David and others on the team will be around.

David raised a concern that MoreLike could be slower for some languages, but we don't have good numbers on it. I will clarify the deployment schedule for the different wikis and see if we can spread out the deployments.

Lastly, I'm going to follow up on getting an estimate of traffic for starting a search in general, which should be available in the search satisfaction schema in the data lake.

Regarding traffic estimates, I took a quick stab at the data in Superset: https://superset.wikimedia.org/sqllab/?savedQueryId=986

This reports the number of page loads that submitted at least one autocomplete event. This seems like a reasonable proxy for the minimum number of autocomplete opens we should expect.

dt          num_pages
2024-11-19  4,079,768
2024-11-20  3,966,064
2024-11-21  3,899,241
2024-11-22  3,653,168
2024-11-23  2,843,423
2024-11-24  3,055,215
2024-11-25  3,973,376
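As a rough sanity check, the daily counts above can be summarized and compared with the 10M requests per day figure mentioned earlier in the thread. This is a sketch under stated assumptions: `treatment_share` and `requests_per_page` are hypothetical knobs that are not given anywhere in this task, and the counts are described only as a proxy for the minimum number of autocomplete opens, so real traffic could be higher.

```python
from statistics import mean

# Page loads with at least one autocomplete event, copied from the table above.
num_pages = {
    "2024-11-19": 4_079_768,
    "2024-11-20": 3_966_064,
    "2024-11-21": 3_899_241,
    "2024-11-22": 3_653_168,
    "2024-11-23": 2_843_423,
    "2024-11-24": 3_055_215,
    "2024-11-25": 3_973_376,
}

# Lower bound of the "keep a closer eye on it" range from the thread.
WATCH_THRESHOLD_PER_DAY = 10_000_000

daily_mean = mean(num_pages.values())
peak_day = max(num_pages, key=num_pages.get)


def extra_requests(pages_per_day, treatment_share, requests_per_page=1):
    """Estimate added MoreLike requests per day (both knobs are assumptions)."""
    return pages_per_day * treatment_share * requests_per_page


# Worst case for this week of data: every qualifying page load is in treatment.
worst_case = extra_requests(num_pages[peak_day], treatment_share=1.0)
below_threshold = worst_case < WATCH_THRESHOLD_PER_DAY

print(f"mean per day: {daily_mean:,.0f}, peak: {peak_day}")
```

With one request per qualifying page load, even the busiest day in this sample stays under the 10M figure; a higher `requests_per_page` or a broader set of wikis would change that.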

Thank you, that's helpful! I also took a look at the web team's search dashboard, which shows raw counts of desktop and mobile autocomplete actors.

image.png (490×2 px, 125 KB)

@bwang - curious if we can collect the documentation for this in a single place. Maybe on-wiki? A sub-page of https://www.mediawiki.org/wiki/Reading/Web/Content_Discovery_Experiments might work.








