Commons talk:WMF support for Commons/Commons community calls

This is the talk page for discussing improvements to Commons:WMF support for Commons/Commons community calls.

Priorities from the perspective of a frequent user and re-user (inside and outside Wikimedia projects)

Posting here just in case I miss tomorrow's call. I am very grateful for this opportunity; thank you for listening and considering <3 !

My perspective: I very frequently edit Wikimedia Commons, with the focus of describing the media there as accurately and reliably as possible, and making the media there usable and re-usable by the world (not just Wikimedia projects), in full agreement with the Wikimedia movement strategy.

Professionally I also currently lead a project by a government agency which frequently re-uses media from Wikimedia Commons (probably often media which is not used in Wikipedia at all). You can see some of the usage here. Besides this visible re-use, we also rely on search and querying of Wikimedia Commons and Wikidata to find more media and data, which is harder to track down. As project manager I can say our usage and data retrieval runs into the tens to hundreds of thousands of Wikidata items and Commons files.

I have worked on media databases (broadly speaking) professionally since the early 2000s. My native language is Dutch and I am very aware that the majority of the world doesn't speak a word of English. We have the tools in Wikimedia projects to serve this majority of the world if we decide to leverage them.

High-level wishes from these perspectives:

  • In terms of content organization, multilingual discoverability and ease of re-use, structured data is vastly superior. A part of the Wikimedia Commons community is very attached to Wikitext and categories, and I heard that they matter for discoverability too; therefore I still use them, mainly as duplicate work on top of adding structured data. I would be able to use my scarce volunteer time more efficiently if this were not needed. For re-using and searching, structured data is the way to go. Commons should be a structured database just like any other contemporary digital asset management system.
  • Commons is a knowledge platform, not a stock images website. If I want to find a free picture of a dog or a rainbow, I will use a generic search engine. The unique strength of Wikimedia Commons is that we describe and contextualize very specific things (a specific church at a certain point in time, a specific occurrence of an animal or plant in a specific location...). Generic search engines can't help searching for such specific things at specific times and spaces, but we can build (and partly already have) the unique and very helpful infrastructure to achieve that. We should further develop search and browsing for discovery of such specificity. For discovery of media related to general topics, IMO it's better to e.g. work with general-purpose search engines, perhaps focusing on mission-aligned ones (e.g. DuckDuckGo?), to make our general-scope media more generally discoverable there.
  • Generally make structured data more visible to contributors so that there is more incentive to improve it.
  • Design updates to SDC should encourage editors to edit with precision and accuracy (sourced, correct, not generic but specific).

More specific wishes and requests I'm currently thinking of:

  • Remove authentication from WCQS so that Wikimedians, and cultural and other knowledge organizations around the world can perform federated and shareable Wikimedia Commons queries.
  • Improve MediaSearch so that it shows (structured) metadata of each file by default (not needing a click).
  • Add faceted search to MediaSearch.
  • Persistent faceted search results can become new-style galleries. It would be great to be able to have persistent URIs for specific faceted search results, multilingually ("Korenmolen de Distilleerketel in de 19de eeuw")
  • Show structured data on file pages by default and more prominent than Wikitext (not in a separate tab anymore)
  • In order to be able to re-use gadgets and scripts from Wikidata, and to provide a unified experience, make sure SDC has the generic Wikibase/Wikidata design (i.e. revert the decision to have Commons-specific UI for SDC).

Thanks! Spinster (talk) 09:39, 20 November 2024 (UTC)[reply]
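
The request for persistent, multilingual URIs for faceted search results can be illustrated with a small sketch. It is not an existing Commons or MediaSearch feature: the base URL and the entity id `Q_EXAMPLE_MILL` are hypothetical placeholders, and the range notation for the date facet is an assumption. The idea is only that facets expressed as language-neutral property and item ids, serialized in a fixed order, give the same URI for the same selection in any interface language.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; Commons has no such URL today.
BASE_URL = "https://commons.example.org/facets"


def facet_uri(facets: dict[str, list[str]]) -> str:
    """Build a canonical URI for a set of facet selections.

    Properties and values are sorted, so the same selection always
    produces the same URI regardless of the order in which the user
    picked the facets.
    """
    pairs = [
        (prop, value)
        for prop in sorted(facets)
        for value in sorted(facets[prop])
    ]
    return f"{BASE_URL}?{urlencode(pairs)}"


# P180 is "depicts" and P571 is "inception" on Wikidata;
# Q_EXAMPLE_MILL stands in for the item of a specific windmill.
selection = {"P571": ["1800..1899"], "P180": ["Q_EXAMPLE_MILL"]}
print(facet_uri(selection))
```

Because the URI carries ids rather than labels, a Dutch and an English reader following it would see the same result set, each with labels in their own language.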

As someone who works on Wikidata scripts/gadgets a lot, the biggest problem for those (by far) is the lack of JavaScript hooks. I can adapt scripts to support different HTML structures, but they won't work if they don't run at the right time.
Also, links to some relevant tickets:
  • phab:T327076 - UI for structured data on Commons should have the same Javascript hooks as Wikidata
  • phab:T341781 - Show structured data by default
  • phab:T297995 - Remove authentication from Wikimedia Commons Query Services (WCQS)
  • phab:T337106 - Faceted, structured data-based MediaSearch on Wikimedia Commons
- Nikki (talk) 16:13, 20 November 2024 (UTC)[reply]
"In terms of content organization, multilingual discoverability and ease of re-use, structured data is vastly superior." Strongly disagree. It's basically redundant to categories and just duplicates the work. Most files do not have structured data, and those that do lack most major subjects and have fewer things set than the categories. Most of the SD that has been set was set using the categories. It's wishful thinking, and SD is a resource sink without much need for it when it comes to subjects depicted. Moreover, categories can also be multilingual – it's just one of many cases where people think SD is needed or better when it's not. See Add machine translated category titles on WMC.
"Improve MediaSearch so that it shows (structured) metadata of each file by default" Also strongly oppose – instead make it show the categories, which unlike SD are well-maintained, usually fairly complete and not polluted with unrelated or vandal depicts data.
"make structured data more visible to contributors so that there is more incentive to improve it" just wastes precious scarce volunteer time to duplicate work that has already been done via file categories.
"For discovery of media related to general topics, IMO it's better to e.g. work with general-purpose search engines, perhaps focusing on mission-aligned ones (e.g. DuckDuckGo?), to make our general-scope media more generally discoverable there." People also search for relatively niche things with Web search engines (e.g. a specific river from space at sunlight) and the problem is that WMC is not well indexed there. Videos are not showing in DuckDuckGo Videos at all, for example. See Do something about Google & DuckDuckGo search not indexing media files and categories on Commons.
Please accept the reality of structured data and categories. Prototyperspective (talk) 19:22, 20 November 2024 (UTC)[reply]
The category system is broken in a lot of ways. It doesn't scale well to the size of Commons, and is causing stability issues. Tiny intersection categories ("Red apples with green spots sitting on blue plates in November 2024") are common but make actually using the category system to find every picture of a red apple difficult. All of this and more is solved by structured data, but migrating all of the existing category-based data to structured data absolutely is a challenge. The tools to work with structured data are often barely functional, and WCQS has been an afterthought since it was introduced. But that doesn't mean we should look backward instead of forward. AntiCompositeNumber (talk) 15:46, 21 November 2024 (UTC)[reply]
  1. It's not broken at all.
  2. For scaling you seem to be referring to phab:T343131 which can be addressed in various ways such as maybe better caching or removing redundant meta-categories (or moving these to SD since they are not about the content).
  3. "[overspecific intersection categories] make actually using the category system to find every picture of a red apple difficult." 1. Not an issue of categories. 2. Not addressed with structured data. 3. Addressed with the Deepcat gadget, which would be greatly improved if the deepcategory search operator issues like phab:T376440 were fixed and could be improved upon (e.g. specify depth or exclude certain subcats of Red apples like "Red apples in fiction"), and with this highly supported wish.
  4. Those overspecific categories are, if anything, a problem; often they get upmerged, and if not you could propose that. But there should also be a category the user would navigate to that contains more of these files, instead of many deep overspecific cats. Moreover, many of these by-date categories should be made redundant by enabling users to sort, search and/or filter (also see phab:T329961 & phab:T329961) by content in the {{Information}} template, like the date= field. This is quite overdue, as there is so much useful metadata in there that it should be searchable / part of filters.
  5. "All of this and more is solved by structured data" That is denying reality and wishful thinking. None of these things have been solved, or solved to any notable degree.
  6. "that doesn't mean we should look backward instead of forward" Just because something is new doesn't make it better. When it comes to subjects depicted, forward is categories; putting one's head in the sand and arguing from what one ideologically wishes were true is structured data.
Prototyperspective (talk) 16:25, 21 November 2024 (UTC)[reply]
... everyone breathe :)
  • an image gallery including subcats is a great bandaid.
  • If we're redesigning things to make more sense: combination categories are a bit of a misuse of the theoretical concept of cats. "X in fiction" should be in categories "X" + "in fiction". Then <adjective> <adjective> <adjective> <noun> <in context> <in context> would be in six atomic categories, with a large number of possible combination categories. Then we need indexes and views that allow seeing all of the "red, decaying, food, on flatware" which will show red apples with green mold spots on blue plates.
--SJ+ 14:40, 24 November 2024 (UTC)[reply]
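
The "atomic categories plus indexes and views" idea in the comment above can be sketched as an inverted index with set intersection. This is only an illustration of the concept, not how MediaWiki stores categories; the file names and the `build_index` / `find` helpers are hypothetical.

```python
from collections import defaultdict


def build_index(files: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each atomic category to the set of files tagged with it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, categories in files.items():
        for category in categories:
            index[category].add(name)
    return index


def find(index: dict[str, set[str]], *categories: str) -> set[str]:
    """Return the files that carry every one of the given atomic categories."""
    if not categories:
        return set()
    # .get() avoids creating empty entries for unknown categories.
    matches = [index.get(category, set()) for category in categories]
    return set.intersection(*matches)


# Hypothetical files, each tagged only with atomic categories.
files = {
    "Mouldy apple on plate.jpg": {"red", "decaying", "food", "on flatware"},
    "Fresh apple.jpg": {"red", "food"},
    "Empty plate.jpg": {"blue", "on flatware"},
}
index = build_index(files)

print(find(index, "red", "decaying", "food", "on flatware"))
print(find(index, "red", "food"))
```

The combination view ("red, decaying, food, on flatware") is computed at query time, so no intersection category has to be created or maintained by hand, and the broad query ("red", "food") still returns every matching file.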

Perennial needs

Commons:Requests for comment/Technical needs survey. RoyZuo (talk) 11:34, 20 November 2024 (UTC)[reply]

@RoyZuo Thanks, we already discussed internally the result of this survey, and we tried to include as much as possible its findings into our roadmap. Sannita (WMF) (talk) 10:26, 25 November 2024 (UTC)[reply]
And also Commons-related Community Wishlist proposals. I hope both are being discussed in the community calls and looked into instead of the CC kind of sidelining/duplicating these – if I was able to attend I would only bring up these two resources and various specific already-existing proposals in them and ask for increasing technical development as described here. Prototyperspective (talk) 15:32, 10 December 2024 (UTC)[reply]

summary of calls

how did the two sessions yesterday go? Arlo James Barnes 20:17, 22 November 2024 (UTC)[reply]

@Arlo Barnes Thanks for the question. The calls went well, we will publish the notes in the next days. Please have a bit of patience, because we need to give them a bit of structure. Sannita (WMF) (talk) 18:02, 23 November 2024 (UTC)[reply]
Do these meetings ever have etherpads or collective notes that the attendees can contribute to? That makes some community meetings easier to follow --SJ+ 14:40, 24 November 2024 (UTC)[reply]
@Sj We collected feedback on an internal document, but I'll ask if we can move to Etherpad for the next calls. Sannita (WMF) (talk) 10:25, 25 November 2024 (UTC)[reply]
Did I miss them or are the notes from the session still not published? GPSLeo (talk) 15:40, 8 December 2024 (UTC)[reply]
@GPSLeo Still not published, sorry we're so behind on this, we're trying to summarise them (also for internal use). Sannita (WMF) (talk) 15:41, 8 December 2024 (UTC)[reply]
@Sannita (WMF) I suspect other editors may have mentioned this in the past, but it is rather ironic that Wikimedia is an information platform used by tens of thousands of community members every day to discuss and resolve issues, yet the Foundation so often seems unwilling or unable to use this platform. The Foundation wouldn't be struggling to summarize and publish the discussions if those discussions had simply happened on-wiki.
The fact that those discussions aren't already accessible is a transparency problem. The fact that the discussions are to be "summarized" needlessly raises further transparency and trust issues. I want to clarify that "trust" issue - I'm confident your people will do their best to summarize the discussion fairly and accurately. The issue is that many problems are rooted in miscommunication and misunderstandings. Some wiki-cultural or wiki-contextual subtleties have been notoriously difficult to communicate across the community-foundation interface. Non-transparent processes attempting to "summarize" discussions with the community are only liable to escalate any miscommunications or misinterpretations.
For some editors, having to sign up for a live chat at an appointed time on some arbitrary other platform is a burdensome or prohibitive constraint on who is permitted to participate. Some contributors have real life commitments and can't attend at the assigned time. Some contributors have less predictable lives and can't commit to a scheduled time. Some contributors find live chat too stressful or too constraining. Some of our contributors are attracted to wiki work exactly because the wiki allows them to contribute and to respond in whatever time and manner makes them most comfortable. The Foundation has noted many times that a substantial percentage of contributors are de facto excluded, for whatever reason, if people are required to go off-wiki to participate. Alsee (talk) 11:23, 16 December 2024 (UTC)[reply]
@Alsee I know, and I take full blame for it. In my defense, I had also other projects to follow and to close before the end of the year, while also organising the December call. I will post the summary of the calls during the week, and let you know about it. Sannita (WMF) (talk) 14:03, 16 December 2024 (UTC)[reply]

@Arlo Barnes, Sj, GPSLeo, and Alsee: The summary of the November conversation is now available on a subpage. We're working on the December call's summary, and we expect to publish it in early January. Sorry for keeping you waiting, we hope to speed up the process for the next calls. Sannita (WMF) (talk) 14:34, 18 December 2024 (UTC)[reply]

Thank you, very helpful. :) Honoring the comment by @Alsee, I would appreciate any upgrade to the "real-time meeting workflow" that leads to automatic publication of summary transcripts, even if they are updated later with a more accurate or more useful one. In addition to being more inclusive, asynchronous discussions mediated by text also seem easier to search through and less expensive to organize, translate, multitask around, and sustain over long periods of time. So maybe we could aim for a certain ratio of asynch vs real-time Q&A: a few rounds of asynch for each real-time call... --SJ+ 14:55, 6 January 2025 (UTC)[reply]

My comments based on Sandra's and Nikki's comments - Jane023

Quick reorganisation of Sandra’s and Nikki’s comments to be able to refer to these issues by number:

1) Remove authentication from WCQS so that Wikimedians, and cultural and other knowledge organizations around the world can perform federated and shareable Wikimedia Commons queries. phab:T297995 - Remove authentication from Wikimedia Commons Query Services (WCQS)

2) Improve MediaSearch so that it shows (structured) metadata of each file by default (not needing a click).

3) Add faceted search to MediaSearch. phab:T337106 - Faceted, structured data-based MediaSearch on Wikimedia Commons

4) Persistent faceted search results can become new-style galleries. It would be great to be able to have persistent URIs for specific faceted search results, multilingually ("Korenmolen de Distilleerketel in de 19de eeuw"). This is a popular windmill today that was a ruin in the early 20th century - see nl:De Distilleerketel

5) Show structured data on file pages by default and more prominent than Wikitext (not in a separate tab anymore) phab:T341781 - Show structured data by default

6) In order to be able to re-use gadgets and scripts from Wikidata, and to provide a unified experience, make sure SDC has the generic Wikibase/Wikidata design (i.e. revert the decision to have Commons-specific UI for SDC). phab:T327076 - UI for structured data on Commons should have the same Javascript hooks as Wikidata

On categories: I am going to skip the category discussion because, though I love Commons (and Wikipedia) categories and use HotCat and Cat-a-lot quite a bit on Commons categories for heritage sites and artist categories, I have given up on the “category or item” discussion in favour of both when and if possible. On 2) I feel that Commons categories are much less efficient for search than structured data, but because of all the restrictions on practical use of WCQS (it’s so well hidden!) I prefer Wikidata search. My main issue with categories these days is that when I go to track down painting files in some language Wikipedia I don’t speak or read, I am shocked that when I click on the file it doesn’t take me to my “normal” Commons UI, but takes me by default instead to some UI that doesn’t give me any Commons categories at all. My main comment on 1) is that this authentication feature is the reason I don’t use WCQS at all. My main comment on 5) is that I occasionally get confused when I update the Wikidata item in a Commons file but the file is still showing the data from the old Q number and I have to go in and change the Q number there too. This hasn’t happened recently, so I have no idea if it has been fixed. For point 6) I agree, but I would wish to keep my default setting across all Wikimedia projects based on my default language version on that project (currently it seems I get the “not logged in” version as soon as I leave one and enter another).

On copyright files: All of that being said, one thing I really like is the effort to improve multi-lingual copyright labels outside of the complicated templating that we have had since 2010. As a paintings enthusiast with a fondness for Dutch 17th-century art, I am happiest with high-resolution images of such paintings and always eager to see the best version in use. I am a big fan of detail images of paintings and recognise the challenges when all we have are details and are missing an image of the whole painting. Occasionally I stray outside of my safe “PD-old-expired-100” and I find it very confusing at times to see that we are using any image at all, and most weirdly an image of the signature, for a copyrighted painting with no indication of the reason.

On "Commons file gaps": As a member of a gendergap workgroup I am also always surprised by the lack of any gender categories (though with various gender discussions these are as problematic as ethnic or melanin-toned categories). In the case of missing paintings, I have looked for ways to show the gap, and of course this is currently only possible on Wikidata. For popular modern artists there is currently no way to show a commons gallery of paintings in a catalog except to show a numbered series of File:Noimage.svg. Instead of deleting these regularly when they get uploaded by misinformed or unsuspecting Commonists, it would be nice if copyrighted images just default automatically to the “no image” option based on the Wikidata item information, which can be passed through to Commons.

On "modern art on Commons": I do find it logical to look for modern art on Commons, even if we insiders know it’s not there. Most people will look for a painter or sculptor name without considering copyright at all. With a true “global sign-in” to give me my customised UI, I could possibly use some structured data flag to a file held in some non-English Wikipedia, and enable auto-delete for copyright files based on the death date of the creator (though possibly trumped by Freedom of Panorama), all using some “No image” structured data artefact, so that if the artist gets reattributed or his/her death date passes the 70-year cutoff, the “undelete” would be semi-automatic, and if there was no previous upload, then maybe an auto-upload link can be added to the artefact.

On "ghost uploads": I think the precision of our artist death dates is one reason we don’t have more modern art on Commons. As Commonists come and go, their “ghost uploads” for modern art slowly snowball into huge black holes, especially as exhibitions that those Commonists attended join the other “great exhibitions we all forgot ever happened”. Jane023 (talk) 10:15, 27 November 2024 (UTC)[reply]
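
The death-date-based "No image" fallback suggested in the comment above could look roughly like the sketch below. It is an assumption-laden illustration, not an existing Commons mechanism: the `display_file` helper and the example file name are hypothetical, the death year is assumed to come from the creator's Wikidata item, and the rule used is a plain life-plus-70-years term that ignores Freedom of Panorama, reattribution, and jurisdictions with different terms.

```python
from datetime import date
from typing import Optional

# Placeholder file named in the discussion above.
PLACEHOLDER = "File:Noimage.svg"


def display_file(
    uploaded_file: Optional[str],
    creator_death_year: Optional[int],
    today: date,
    term_years: int = 70,
) -> str:
    """Pick the real file or the placeholder for a catalogue entry.

    Assumes protection runs through the end of the calendar year that is
    `term_years` after the creator's death, so the work is treated as
    free from 1 January of the following year.
    """
    if creator_death_year is None:
        # Unknown death date: stay on the safe side.
        return PLACEHOLDER
    is_free = today.year > creator_death_year + term_years
    if is_free and uploaded_file is not None:
        return uploaded_file
    return PLACEHOLDER


today = date(2025, 1, 1)
print(display_file("File:Example painting.jpg", 1950, today))
print(display_file("File:Example painting.jpg", 1973, today))
```

With this shape, the "undelete" becomes a matter of the condition flipping on 1 January of the relevant year rather than a manual request, which is the semi-automatic behaviour the comment describes.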

@Jane023 Thanks, much appreciated! Sannita (WMF) (talk) 12:30, 27 November 2024 (UTC)[reply]
As one of the easiest requests to fulfil: what are the obstacles to temporarily addressing #1 to see the impact on usage and overhead? That's more a policy than a technical question. And I haven't seen any specific claims about what the negative outcomes might be. --SJ+ 14:57, 6 January 2025 (UTC)[reply]
@Sj Regarding WCQS, we are still addressing the issue internally, and I feel it might be a topic for the upcoming conversation regarding tools. The discussion is also ongoing on Phabricator, and I'm monitoring it closely. Sannita (WMF) (talk) 14:03, 7 January 2025 (UTC)[reply]
[edit]

Where are they?   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 15:49, 12 December 2024 (UTC)[reply]

At the event on meta m:Event:Commons community discussion - 12 December 2024 16:00 UTC. GPSLeo (talk) 16:04, 12 December 2024 (UTC)[reply]
@GPSLeo: I see now under "More details", thanks. Also in my email after registration.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 17:08, 12 December 2024 (UTC)[reply]
@Jeff G. Sorry I missed the comment. For the next calls I'll be sure to make them more visible. Sannita (WMF) (talk) 17:08, 12 December 2024 (UTC)[reply]
@Sannita (WMF): Thanks! Not in the VP, though.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 17:20, 12 December 2024 (UTC)[reply]
@Jeff G. I was thinking the event page :) Sannita (WMF) (talk) 18:06, 12 December 2024 (UTC)[reply]

Thoughts on tool investment priority questions

As I am reluctant to join a call that is using proprietary technology (there's a priority for you), I'll answer the two questions that were shared on Telegram here.

  1. I think it makes more sense that WMF is working on core tools rather than supporting community-developed tools. That being said, I think a lot of the tools created by the community should be core functionality. By that, I mean that it is not just a maintainer for the existing tool that is needed, but that the idea is brought into core and made to fit natively in the existing workflow/ecosystem. Sometimes it is completely missing though, as the possibility to record and upload video in a free format should be a core functionality, but there is no clear "core" for that to add it to. In that case, helping out on the current Commons app may be the right thing (or perhaps it would be an even cleaner separation to have a separate app for it). Another possible aspect of this question is that WMF can help highlight which tools are in most dire need of active maintainers, and in general make efforts to help ease onboarding of new ones.
  2. I would love to see a video recording/upload tool, as that seems to be something that is hard for the community to build.

Ainali (talk) 15:12, 6 January 2025 (UTC)[reply]

  • I would like the Foundation to invest in supporting any much-used tool, no matter who has created it. First of all I think of Cat-a-lot, a much-used tool (at least in Commons, but perhaps also in EN-WP), with which there were problems last year and it took quite a while to solve most of them, because the creator was not available anymore and WMF was reluctant to help because it was not a tool made by them. I think this is one of the tools created by the community that should be core functionality. --JopkeB (talk) 15:36, 6 January 2025 (UTC)[reply]
Very much looking forward to the new 2025 calls. The ones I attended so far have been very well organized and moderated and it was really interesting to hear everyone's perspective.
  1. I think that it's preferable for WMF / Wikimedia entities to develop and maintain (often newly built) functionalities rather than taking over maintenance of community tools.
    1. It should be a long-term engagement with maintenance planned for at least the next decade.
  2. In my view/experience, metrics, batch contribution and batch import functionalities are crucial to build and maintain. Not only for the GLAM use case, but in general. Batch contribution to and batch upload of files is important not just for Wikimedians but is very valuable for anyone working with a MediaWiki wiki that includes large(r) amounts of media files. Metrics are interesting not just for partners, but to display the impact of the contributions of any Wikimedian.
  3. As I mentioned above and expressed in earlier calls, structured data should be prioritized, not categories and wikitext.
Spinster (talk) 15:45, 6 January 2025 (UTC)[reply]

I may or may not be able to make the call, but would like to suggest that for a lot of community-maintained tools, what WMF could best provide is program management and coordination of volunteers. We don't necessarily lack volunteer developers willing to maintain these tools, but the PM side of the process (keeping track of whether there are maintainers signed up for each tool; raising the flag when there is priority work; getting word out when there are changes coming that are likely to break existing tools and tracking whether someone has taken responsibility for checking each tool to make sure it is ready to deal with the change; creating a path for people to get involved in this work; etc.) is an extremely difficult task to accomplish on an unpaid volunteer basis. I suspect that if this coordination were better done, we would have a lot more volunteers to do this. - Jmabel ! talk 18:49, 6 January 2025 (UTC)[reply]

I support this suggestion of Jmabel. It would already be very helpful to know who (or where) to turn to for questions about a tool, especially when there is no answer on the talk page of a tool (including templates). JopkeB (talk) 06:53, 7 January 2025 (UTC)[reply]
One area where there could be more collaboration is with initiatives like d:Wikidata:Wiki Mentor Africa and m:Global Majority Wikimedia Technology Priorities. Wiki Mentor Africa specifically is focusing on technical skill development, and it could be useful for them to find suitable tools for long term development. I think that they also have coordinators, but in the tooling context, I think that the biggest bottleneck is the number of technically skilled mentors (i.e., persons who already know how the tools, Wikimedia infrastructure, etc. work under the hood) who can answer questions. --Zache (talk) 08:40, 7 January 2025 (UTC)[reply]

I add my two cents on a WMF-maintained video upload and conversion feature (either as a core mediawiki feature or a dedicated WMF infra/tool). I (reluctantly) adopted video2commons last year, only because I really needed it for my own project and no one stepped up when the tool was broken for several months. I managed to patch it up and it's more stable now, but still there are plenty of bugs to solve, new features to address. I lack both time and skills to maintain it properly, so I'd really love to see the WMF invest resources in this topic. vip (talk) 21:12, 8 January 2025 (UTC)[reply]