Install Matomo Custom Reports Plugin for wikimediafoundation.org
Closed, ResolvedPublic

Description

Name of team
Integrated Marketing, External Communications

Description of the project
As we continue to enhance our analytics capabilities with Matomo, including recent updates to components like GeoIP databases and the deployment of new plugins such as the Tag Manager and Campaign Reporting, we have purchased the Custom Reports plugin. The plugin will enable us to closely monitor our digital campaigns and gain deeper insights into our website traffic. We believe that this plugin will allow us to create customized reports tailored to our specific needs, thereby improving our ability to analyze and act on website data. We have included this tool for our upcoming campaign in September 2024 and need it to be ready by that time.

We would like to request the SRE team to help us install and troubleshoot this plugin to ensure it is fully operational before our campaign launch. To activate the license, enter the license key in your Matomo (Piwik) under Administration.

Matomo version: 4.x

Access information: We have the plugin license key and are happy to share it in the format/way that is best for you.

Contact person: @SCampos-WMF or @Ospingou.

Please let me know if you need any additional information. Thank you in advance!

Event Timeline

I approve the install of the plugin. The Matomo software has passed the security review; it is not distributed as part of MediaWiki and is used for the analytics purposes stated above.

BTullis triaged this task as High priority.

I have sent a message to the vendor, asking how we can obtain the code so that we can package it.
The approved method depends on the marketplace plugin being enabled, which requires the internet_features_enabled option to be set, as well as an outbound proxy.

I might be able to get this to work, but I would rather be able to download the code and create a Debian package from it, then make this available via the wikimedia-private apt repository, if possible.

Thank you Ben. Sharing here for vis, that I just shared with you the access to our Matomo account where the plugin .zip file and license key are located.

Change #1062069 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Enable the CustomReports plugin

https://gerrit.wikimedia.org/r/1062069

I have built a Debian package of the plugin locally on my workstation and uploaded it to matomo1003.eqiad.wmnet

This installed cleanly and I enabled the plugin from the UI, so that @SCampos-WMF can now work on testing the functionality.

A required configuration change to matomo is here: https://gerrit.wikimedia.org/r/1062069

I will follow up with the Debian packaging files, which will be in a repository here: https://gitlab.wikimedia.org/repos/data-engineering/matomo/plugin-customreports
The package build script will depend on the local availability of a zip file containing the code, which can only be downloaded from the Matomo shop here: https://shop.matomo.org/my-account/downloads/

In addition to that, I will put the Debian package in our private APT repository, and then configure Matomo to install the package from that location.

Change #1062069 merged by Btullis:

[operations/puppet@production] Enable the CustomReports plugin

https://gerrit.wikimedia.org/r/1062069

Change #1062401 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add a non-free component to the apt private repo

https://gerrit.wikimedia.org/r/1062401

Hi! I am Luca from the SRE Infrastructure Foundations team, I had a chat with Ben today about https://gerrit.wikimedia.org/r/c/operations/puppet/+/1062401 and the need to deploy this extra plugin.

My understanding after reading the task: the new custom reports plugin is binary only, non open source and protected with a license key. It will be called by websites like https://wikimediafoundation.org, but potentially from any other website tracked by Matomo. The tool is needed to have better data starting from September.
The https://wikimediafoundation.org website is not hosted by the Wikimedia infrastructure, but the open-source tracking software is (Matomo).

The SRE team has always warned every team in the Foundation about why we take a strong stand regarding deploying non-open-source software:

  • Lack of transparency, debuggability and openness.
  • Software security concerns related to the point stated above. Closed software may contain nasty bugs and we'd rely only on upstream for fixes.
  • Any community member/group that wants to try to set up a similar stack will be blocked, because part of it is behind a paywall.

I am aware that the SRE team allows some non-free software use cases, but they are confined, critical and not replaceable with anything else. The biggest example is the code running on routers and switches, but we cannot really avoid it since without that equipment the whole production stack would not exist (and the companies that are able to write their own network equipment software have thousands of employees :). A more recent use case came with AI, since we chose a vendor (AMD) that focused on releasing all their stack open-source, avoiding the more competitive and widespread competitors (more info https://wikitech.wikimedia.org/wiki/Machine_Learning/AMD_GPU#Do_we_have_Nvidia_GPUs%3F).

Having said this, and after a chat with my manager @joanna_borun, we'd ask to avoid the deployment of this software and to find an open-source alternative. We completely understand that this will create some delays and issues in the upcoming campaign in September, and we are sorry about it, but this is an important topic and we stand for the aforementioned values.

Hi! I am Luca from the SRE Infrastructure Foundations team, I had a chat with Ben today about https://gerrit.wikimedia.org/r/c/operations/puppet/+/1062401 and the need to deploy this extra plugin.

My understanding after reading the task: the new custom reports plugin is binary only, non open source and protected with a license key.

Minor correction: it is not a binary. It is a collection of plain-text scripts, written in PHP, JavaScript, and TypeScript (for Vue).
There is no DRM or other obfuscation on the files. While there is a licence key provided, it does not have to be used in order to access the functionality.

However, it's completely correct that it is not open source. The code is only accessible from the vendor's web shop, after purchase.

Hi @elukey, @joanna_borun, the reason the plugin was chosen is that there are no other equivalent open source options. The plugin underwent an internal security review and signoff as part of the provisioning process, which should provide sufficient assurances with respect to the security concerns above. Additionally, see Ben's note about the codebase. If anyone wishes to reuse our configurations for Matomo they can disable the corresponding plugin. Does that address your concerns? I would like to request that we proceed with the install.

Hi! I am Luca from the SRE Infrastructure Foundations team, I had a chat with Ben today about https://gerrit.wikimedia.org/r/c/operations/puppet/+/1062401 and the need to deploy this extra plugin.

My understanding after reading the task: the new custom reports plugin is binary only, non open source and protected with a license key. It will be called by websites like https://wikimediafoundation.org, but potentially from any other website tracked by Matomo. The tool is needed to have better data starting from September.
The https://wikimediafoundation.org website is not hosted by the Wikimedia infrastructure, but the open-source tracking software is (Matomo).

The SRE team has always warned every team in the Foundation about why we take a strong stand regarding deploying non-open-source software:

  • Lack of transparency, debuggability and openness.
  • Software security concerns related to the point stated above. Closed software may contain nasty bugs and we'd rely only on upstream for fixes.
  • Any community member/group that wants to try to set up a similar stack will be blocked, because part of it is behind a paywall.

I am aware that the SRE team allows some non-free software use cases, but they are confined, critical and not replaceable with anything else. The biggest example is the code running on routers and switches, but we cannot really avoid it since without that equipment the whole production stack would not exist (and the companies that are able to write their own network equipment software have thousands of employees :). A more recent use case came with AI, since we chose a vendor (AMD) that focused on releasing all their stack open-source, avoiding the more competitive and widespread competitors (more info https://wikitech.wikimedia.org/wiki/Machine_Learning/AMD_GPU#Do_we_have_Nvidia_GPUs%3F).

Well, this has been codified (a long time ago) in the guiding principles, particularly focused around our main projects and content, where we should strive to use open source tools over proprietary ones, although (as also indicated by your examples) not at all costs: "we use proprietary or closed tools [...] where there is currently no open-source tool that will effectively meet our needs." This would benefit from an update or a specific policy better exploring what that means in practice, and with better guidelines. It has proven unrealistic for WMF to rely exclusively on open-source software for all its needs (e.g. for a lot of corporate or externally hosted services), and the landscape has also arguably become more difficult for open-source software with the rise of cloud and SaaS over the past decade+, leading to fewer sustainable open source options and more proprietary and open-core options (including this one, Piwik/Matomo). Clearly open-source is and remains as important as ever for our project, but given very limited resources we should be more strategic about how we evaluate these decisions (when no good options exist), on where it matters the most. Where our use of open-source software is important for our content/reuse, forks, independence, and generally, where we best support the open-source ecosystem we rely on and are part of.

But, unfortunately, that can't realistically be the case for everything we need and use, and it's getting more difficult over time. In a fair number of situations we've spent considerable effort and staff time on a workaround or our own implementation, with mixed results. Let's continue to do that where it really matters or really helps - but that won't and can't be always and everywhere. This plugin, which does help our organization towards its mission, but is not by itself serving an essential and very core part of our projects, is an example where that tradeoff and judgement needed to be made, which is what @odimitrijevic has done: the decision was to stick with open-core Matomo and purchase/install/host the proprietary plugin.

Until now, for the most part any non-fully-FLOSS software in use by WMF has been hosted externally, usually as SaaS, and not within our own infrastructure (with some exceptions like you mentioned). There are indeed also some practical considerations to take into account, and we are not well set up for hosting proprietary software (like you indicated). But, although we don't love it, it seems feasible and of limited risk in this case, with the source available (not a blob). We can debug and troubleshoot issues, we can fix critical issues ourselves if we really have to, and in the very worst case, it can be turned off. Definitely not ideal, but it seems a reasonable compromise given the circumstances.

Hi @elukey, @joanna_borun, the reason the plugin was chosen is that there are no other equivalent open source options. The plugin underwent an internal security review and signoff as part of the provisioning process, which should provide sufficient assurances with respect to the security concerns above.

Why is this all happening privately? How much code are we actually talking about here? Why can't the WMF just pay someone a little more to create an open source plugin, which is what we've done for the rest of wikimediafoundation.org when there were gaps? Surely that would be a far bigger win in the long term, both practically and morally.

Clearly open-source is and remains as important as ever for our project, but given very limited resources we should be more strategic about how we evaluate these decisions (when no good options exist), on where it matters the most. Where our use of open-source software is important for our content/reuse, forks, independence, and generally, where we best support the open-source ecosystem we rely on and are part of.

I'm disappointed but not surprised that Wikimedia is retreating from its robust open source principles, but I'm genuinely shocked that it's doing so for the purpose of better tracking of users.

Change #1062401 merged by Btullis:

[operations/puppet@production] Add a matomo_plugins component to the apt private repo

https://gerrit.wikimedia.org/r/1062401

...I'm genuinely shocked that it's doing so for the purpose of better tracking of users.

I would just like to add a small note here, for the purposes of clarity in how Matomo is used at the Foundation.
This plugin does not change the user tracking behaviour of Matomo at all. It just enables creating reports based on filtering our existing data by different parameters.

We anonymise all visitor data in Matomo by removing the last two octets of the visitor's IP address.
This setting is applied globally across Matomo and, to the best of my knowledge, it always has been.
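To make concrete what removing the last two octets means, here is a minimal sketch in Python. It is an illustration only and not Matomo's own implementation. The function name is hypothetical, and it handles IPv4 only. How IPv6 addresses are masked is not described in this task, so it is not covered.

```python
import ipaddress


def mask_last_two_octets(ip: str) -> str:
    """Return the IPv4 address with its last two octets set to zero.

    Hypothetical helper for illustration; not taken from Matomo.
    """
    addr = ipaddress.IPv4Address(ip)
    # Keep the upper 16 bits (first two octets), zero the lower 16 bits.
    masked = int(addr) & 0xFFFF0000
    return str(ipaddress.IPv4Address(masked))


if __name__ == "__main__":
    # 192.0.2.123 is a documentation address (RFC 5737), used as sample input.
    print(mask_last_two_octets("192.0.2.123"))  # 192.0.0.0
```

After masking, every visitor within the same /16 range shares one stored address, so the stored value no longer identifies an individual host.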

Why can't the WMF just pay someone a little more to create an open source plugin, which is what we've done for the rest of wikimediafoundation.org when there were gaps?

I will refrain from commenting personally on any policy matters.

@mark @odimitrijevic thanks a lot for the explanations and the rationale, I am not 100% happy with the outcome but I will not oppose anymore :) In the future we'd love to get involved sooner (we == SRE team) rather than later in these kinds of discussions, to help as much as possible and provide more options (if available).

After a chat with my team, we came up with a proposal - would it be possible to review this use case again in 3/6 months and decide whether or not we need to keep this plugin installed? Looping in @SCampos-WMF and @Ospingou to get their feedback.

Moreover, for all future references - as stated above very clearly this is an exception that we took for a limited/confined use case, since there was no better open-source alternative to use. The use of non-open-source software at the Foundation is not under debate, every other use case (if any, in the future) will need to be reviewed by the SRE team beforehand.

I have added the package to the private apt repository.

btullis@apt1002:~$ sudo -i private_reprepro -C matomo_plugins --ignore=wrongdistribution include bookworm-wikimedia-private `pwd`/matomo-plugin-customreports_4.1.8-1_amd64.changes
.changes put in a distribution not listed within it!
Ignoring as --ignore=wrongdistribution given.
Exporting indices...

Verified that it is visible:

btullis@apt1002:~$ sudo -i private_reprepro -C matomo_plugins list bookworm-wikimedia-private
bookworm-wikimedia-private|matomo_plugins|amd64: matomo-plugin-customreports 4.1.8-1
bookworm-wikimedia-private|matomo_plugins|source: matomo-plugin-customreports 4.1.8-1
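The same visibility check could be scripted rather than read by eye. The following is a minimal sketch in Python that parses listing lines in the format shown above. The helper names are hypothetical, and the line layout (`distribution|component|architecture: package version`) is inferred solely from the two output lines in this comment.

```python
from typing import NamedTuple


class RepoEntry(NamedTuple):
    distribution: str
    component: str
    architecture: str
    package: str
    version: str


def parse_list_line(line: str) -> RepoEntry:
    """Parse one 'dist|component|arch: package version' line (assumed layout)."""
    location, _, rest = line.strip().partition(": ")
    distribution, component, architecture = location.split("|")
    package, version = rest.split()
    return RepoEntry(distribution, component, architecture, package, version)


def has_package(output: str, package: str, version: str, architecture: str) -> bool:
    """Return True if the listing contains the package at that version and architecture."""
    entries = [parse_list_line(l) for l in output.splitlines() if l.strip()]
    return any(
        e.package == package and e.version == version and e.architecture == architecture
        for e in entries
    )


if __name__ == "__main__":
    # Sample input copied from the listing output above.
    sample = (
        "bookworm-wikimedia-private|matomo_plugins|amd64: matomo-plugin-customreports 4.1.8-1\n"
        "bookworm-wikimedia-private|matomo_plugins|source: matomo-plugin-customreports 4.1.8-1\n"
    )
    print(has_package(sample, "matomo-plugin-customreports", "4.1.8-1", "amd64"))  # True
```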

I will now proceed to update the puppet manifests for matomo to install the plugin.

Change #1067362 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add the matomo-plugin-customreports package to Matomo

https://gerrit.wikimedia.org/r/1067362

Change #1067362 merged by Btullis:

[operations/puppet@production] Add the matomo-plugin-customreports package to Matomo

https://gerrit.wikimedia.org/r/1067362

The installation and configuration of this plugin is now complete.
I will leave the ticket open for a while, in case others still wish to share their points of view or reply to any open questions above.

...I'm genuinely shocked that it's doing so for the purpose of better tracking of users.

I would just like to add a small note here, for the purposes of clarity in how Matomo is used at the Foundation.
This plugin does not change the user tracking behaviour of Matomo at all. It just enables creating reports based on filtering our existing data by different parameters.

Sure, that's still making the existing tracking of users better. Alternative scary bold text: WMF deploys proprietary analytics software.

After a chat with my team, we came up with a proposal - would it be possible to review this use case again in 3/6 months and decide whether or not we need to keep this plugin installed? Looping in @SCampos-WMF and @Ospingou to get their feedback.

Moreover, for all future references - as stated above very clearly this is an exception that we took for a limited/confined use case, since there was no better open-source alternative to use. The use of non-open-source software at the Foundation is not under debate, every other use case (if any, in the future) will need to be reviewed by the SRE team beforehand.

Thanks for suggesting this. I think planning for a review in the next few months is important to actually establish "there was no better open-source alternative to use". There are currently no evaluation criteria outlined on this ticket; it would be good to have them documented, as was done for similar requests in T317274#8219765 and T201045.

Thank you all for your input throughout this process. We value both your support and concerns, and appreciate you bringing them forward.

We would like to clarify that we intend to use this tool specifically to measure the performance of our external communications campaigns that aim to drive traffic to the Foundation’s WordPress website (wikimediafoundation.org). We have chosen to work with a reputable and trusted partner, known for its commitment to privacy, security, and reliability, and that is already part of our data platform catalog. We selected this plugin because it does not introduce any new data collection methods, capture personally identifiable information, or require us to collect new data points. Instead, it allows us to compare existing data points by generating custom reports and applying filters.

@elukey thank you for suggesting a more flexible approach. We are open to reviewing this use case again in 3/6 months to decide whether we should keep it. I’ll follow up within this timeframe.

I appreciate everyone's feedback on the subject, and I want to make clear that my team (Data Platform SRE) is firmly committed to open source principles, and that my colleagues spent a considerable amount of effort attempting to find an open-source alternative in accordance with the guiding principles.

As @mark pointed out, "This would benefit from an update or a specific policy better exploring what that means in practice, and with better guidelines." I welcome an update that has clarity around who decides whether the guidelines have been followed, and how they decide it. I don't think we want to allow teams to arbitrarily enforce their interpretation of the guidelines simply because they are hosting the content, as that is likely to have unintended negative consequences (if you consider use of closed-source and SaaS to be negative, which I definitely do).

I appreciate everyone's feedback on the subject, and I want to make clear that my team (Data Platform SRE) is firmly committed to open source principles, and that my colleagues spent a considerable amount of effort attempting to find an open-source alternative in accordance with the guiding principles.

Definitely, I don't think that anybody questioned Data Platform's SRE in any way :) I had several chats with Ben before writing to the task, to understand the context and the initial requirements, but I felt that some discussion needed to happen to figure out a good path forward (that happened, we are all happy about the end result as far as I understand).

As @mark pointed out "This would benefit from an update or a specific policy better exploring what that means in practice, and with better guidelines." I welcome an update that has clarity around who decides whether the guidelines have been followed, and how they decide it.

It makes sense, having something in writing could help for future use cases, +1!

I don't think we want to allow teams to arbitrarily enforce their interpretation of the guidelines simply because they are hosting the content, as that is likely to have unintended negative consequences (if you consider use of closed-source and SaaS to be negative, which I definitely do).

About this particular use case: my only suggestion (which I added in the task) is to reach out to SRE sooner next time, to collaborate as early as possible in finding a shared solution. My team (Infra Foundations) would have liked to be aware of this request before it was already approved and basically ready to deploy, since we are responsible (among other things) for the overall application security of our prod infrastructure and this is a very different use case from the average one that we manage. We all collaborate to achieve our end results, and the more we do it, the easier it will be in the future to adapt to new use cases.
