Wikidata is a free and open knowledge base that anyone can edit. It is a sister project of Wikipedia and serves as a central repository for structured data. Rather than filling pages with text, it stores data in a structured format that can be queried and reused across different platforms.
One of the key features of Wikidata is its ability to handle deletion requests, known as RFDs (Requests for Deletion); a similar process exists on Wikipedia. These requests allow users to propose the removal of items from the database that are deemed unnecessary, incorrect, or otherwise unsuitable for inclusion.
I was recently asked if there was currently any “tracking of the amount of deletion requests on WD over time”, with a specific focus on promotional editing, number of requests, and administrator burden. I was not aware of any such tracking, so I decided to investigate the data and see what insights could be gleaned from it, and possibly help out with whatever ends up happening as part of T429036 [Analytics] [Request] Baseline data for Item deletions, which looks like it will happen soon.
Approach
All of the requests for deletion go via the RFD page on Wikidata. This page is treated as a talk page, with each section being a request for deletion. Each section has a title, which is the item, or items being requested for deletion, and a body, which contains the reason and any discussion around the request. The page is often maintained by bots in terms of marking when deletions occur, and when requests are closed, so the page is a good source of data for analysis. And like many other talk pages, it is also archived, with older requests being moved to archive pages. The main RFD page has been around for a while, and the archive pages go back to 2012.
Data Gathering
I’m trying out marimo for my data gathering this time, where I would normally use a standard IPython notebook. It’s self-described as “a next generation Python notebook”.
So first off, I started by iterating through the archive pages, and downloading them all into local .wiki files, to speed up further processing.
You can find the notebook in this gist, and this resulted in a bunch of files that look something like this…
{{Archive|category=Archived requests for deletion}}
=== [[Q259]] ===
This item is a duplicate of [[Q35]]. --[[User:Hydriz|Hydriz]] ([[User talk:Hydriz|talk]]) 11:45, 30 October 2012 (UTC)
:{{done}} (as staff) --[[User:Denny Vrandečić (WMDE)|Denny Vrandečić (WMDE)]] ([[User talk:Denny Vrandečić (WMDE)|talk]]) 12:27, 30 October 2012 (UTC)
=== [[Q292]] ===
Duplicate of [[Q2]] (Earth). [[User:Emijrp|Emijrp]] ([[User talk:Emijrp|talk]]) 12:21, 30 October 2012 (UTC)
:Oh I did it as a steward, does someone know if am I allowed to use my tools here? --[[User:Vituzzu|Vituzzu]] ([[User talk:Vituzzu|talk]]) 12:25, 30 October 2012 (UTC)
:: Yes, you are. The project has no admins of its own (only some staff who help out right now). If the stewards take over, staff would be happy to step down from that task.
:: And ideally, the users will soon have their own admins and bureaucrats to deal with it :) --[[User:Denny Vrandečić (WMDE)|Denny Vrandečić (WMDE)]] ([[User talk:Denny Vrandečić (WMDE)|talk]]) 12:27, 30 October 2012 (UTC)
:::Yep, it's quite common for new wiki but this is a special one ;)
:::Anyway I'm quite interested in helping so if needed do not hesitate to poke me.
:::--[[User:Vituzzu|Vituzzu]] ([[User talk:Vituzzu|talk]]) 12:29, 30 October 2012 (UTC)
=== [[Q304]] ===
And [[Q254]]. Mozart. [[User:Emijrp|Emijrp]] ([[User talk:Emijrp|talk]]) 12:35, 30 October 2012 (UTC)
:{{done}} by Vituzzu. --[[User:Hydriz|Hydriz]] ([[User talk:Hydriz|talk]]) 13:48, 30 October 2012 (UTC)
In total this is around 185MB of text.
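Splitting one of these files into individual requests can be sketched with the standard library alone. This is a minimal sketch rather than the notebook's actual code: it assumes every request starts with a level-3 heading (`=== ... ===`) as in the sample above, and `split_sections` is a hypothetical helper name.

```python
import re

# Level-3 headings such as "=== [[Q259]] ===" open each request (an assumption
# based on the sample above; other heading levels are not handled here).
HEADING_RE = re.compile(r"^===[ \t]*(.*?)[ \t]*===[ \t]*$", re.MULTILINE)
ITEM_RE = re.compile(r"\[\[(Q\d+)\]\]")


def split_sections(wikitext: str) -> list[dict]:
    """Split archive wikitext into one record per request section."""
    matches = list(HEADING_RE.finditer(wikitext))
    sections = []
    for i, match in enumerate(matches):
        # A section body runs from the end of its heading to the next heading.
        body_start = match.end()
        body_end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        title = match.group(1)
        sections.append({
            "title": title,
            "items": ITEM_RE.findall(title),  # item IDs named in the heading
            "body": wikitext[body_start:body_end].strip(),
        })
    return sections


sample = """{{Archive|category=Archived requests for deletion}}
=== [[Q259]] ===
This item is a duplicate of [[Q35]].
:{{done}} (as staff)
=== [[Q292]] ===
Duplicate of [[Q2]] (Earth).
"""

for section in split_sections(sample):
    print(section["items"], "->", section["body"].splitlines()[0])
```

Anything before the first heading, such as the `{{Archive}}` template, is skipped by this approach.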
Analysis
Next, some actual analysis of the data. I used a combination of regexes and the mwparserfromhell library to parse the wiki text when iterating through the files.
Signals
The script analyzes the initial_reason and section heading using regular expressions to detect specific themes. It categorizes discussions by searching for keywords related to:
- Promotional content: (e.g., spam, marketing, self-promotion).
- Notability: (e.g., lack of references/sources).
- Duplicate/Vandalism: Identifying specific policy-based reasons for deletion.
Specifically using these patterns:
SHARED_SIGNAL_PATTERNS = {
    "is_promotional_signal": r"\b(?:promo|promotion|promotional|advert|advertisement|advertising|marketing|brand|company|business|self[- ]?promo|coi|spam|hoax|vandal)\b",
    "is_notability_signal": r"\b(?:notable|notability|reference|references|source|sources)\b",
    "is_duplicate_signal": r"\b(?:duplicate)\b",
    "is_vandalism_signal": r"\b(?:vandal|vandalism)\b",
}
These were extracted using some more code, which looked at the most common words that appeared in the RFDs. (notebook code)
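Applying the patterns to a single request might look roughly like the following. This is a sketch under assumptions: the post does not say which regex flags were used, so case-insensitive matching is a guess, and `detect_signals` is a hypothetical helper name.

```python
import re

# Patterns copied from the post. Compiling them case-insensitively is an
# assumption; the post does not state which flags the script uses.
SHARED_SIGNAL_PATTERNS = {
    "is_promotional_signal": r"\b(?:promo|promotion|promotional|advert|advertisement|advertising|marketing|brand|company|business|self[- ]?promo|coi|spam|hoax|vandal)\b",
    "is_notability_signal": r"\b(?:notable|notability|reference|references|source|sources)\b",
    "is_duplicate_signal": r"\b(?:duplicate)\b",
    "is_vandalism_signal": r"\b(?:vandal|vandalism)\b",
}
COMPILED = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in SHARED_SIGNAL_PATTERNS.items()
}


def detect_signals(heading: str, initial_reason: str) -> dict[str, bool]:
    """Return one boolean flag per signal for a heading plus initial reason."""
    text = f"{heading}\n{initial_reason}"
    return {name: bool(regex.search(text)) for name, regex in COMPILED.items()}


print(detect_signals("[[Q259]]", "This item is a duplicate of [[Q35]]."))
print(detect_signals("[[Q1]]", "Spam, no references"))
```

Because of the `\b` word boundaries, the word "vandal" on its own sets both the promotional and vandalism flags, while "vandalism" only sets the vandalism flag.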
Outcomes
Since administrative outcomes are often recorded in varying ways, the script uses a tiered approach to determine the result:
- Template Detection: Primarily looks for specific Wiki-templates (e.g., {{deleted}}, {{kept}}).
- Heuristic Fallback: If templates are missing, it searches for text strings in the comment history to “guess” the outcome (e.g., “not deleted,” “on hold”).
- Timeline Mapping: It uses the last identified outcome in a discussion to set the final state for that RfD.
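The three tiers above can be sketched as follows. This is an illustration, not the script itself: the template names come from the post and the sample archive (`{{deleted}}`, `{{kept}}`, `{{done}}`), while the phrase-to-outcome mapping and the name `guess_outcome` are assumptions made for the example.

```python
import re
from typing import Optional

# Tier 1: outcome templates, optionally with parameters, e.g. {{done}}.
TEMPLATE_RE = re.compile(
    r"\{\{\s*(deleted|kept|done)\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE
)

# Tier 2: fallback phrases (hypothetical mapping). "not deleted" is listed
# first so that it is not mistaken for a plain "deleted".
HEURISTIC_PHRASES = [
    ("not deleted", "kept"),
    ("on hold", "on hold"),
    ("deleted", "deleted"),
]


def guess_outcome(comments: list[str]) -> Optional[str]:
    """Return the last outcome found in a discussion, or None if there is none."""
    template_hits = []
    for comment in comments:
        template_hits.extend(m.group(1).lower() for m in TEMPLATE_RE.finditer(comment))
    if template_hits:
        return template_hits[-1]  # Tier 3: the last outcome wins

    heuristic_hits = []
    for comment in comments:
        lowered = comment.lower()
        for phrase, outcome in HEURISTIC_PHRASES:
            if phrase in lowered:
                heuristic_hits.append(outcome)
                break  # only the first matching phrase counts per comment
    return heuristic_hits[-1] if heuristic_hits else None


print(guess_outcome(["Duplicate of [[Q2]] (Earth).", ":{{done}} by Vituzzu."]))
print(guess_outcome(["Please remove.", ":Not deleted, it has sitelinks."]))
```

In this sketch, heuristic matches are only consulted when no template is found anywhere in the discussion, which is one possible reading of "fallback".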
Things get a little messy here: the outcome is not always clear, sometimes there are duplicate outcomes, and sometimes the outcome is not recorded at all. The script handles this as best it can, and although some individual cases remain ambiguous, for the most part it is possible to get a good general idea of what happened through the years.
Overall, the raw data summarized looks something like this, but the graphs below are far more interesting!
Aggregation & Visualization
So, what can we see? (You can run the notebook yourself too, and see the code)
Looking at RFDs over time, there is a high number from the early years, which slightly skews the perspective of the last 10, and it should also be noted that we are only halfway through 2026 right now…
I really don’t know what happened back in 2013 and 2014 for sure, but these spikes were spread out throughout the months of those years, and there were up to 40k RFDs in June 2014 for example. One of the peak days was June 18th, where I see lots of listings that show Merged with [[Q12345]], via The Game which seems to imply there was a tool aiding these deletion requests, and that this was prior to merging being a functionality of Wikibase on Wikidata.
So if we zoom in on the more stable data, and also project the second half of this year, we get a clearer picture showing an upward trend since 2019, with around 14k RFDs predicted this year, which is around 38 a day, and double the number back in 2018 and 2019.
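One simple way to produce such a projection is to scale the year-to-date count linearly to a full year. The sketch below is not necessarily how the notebook does it, and the year-to-date count of 7,000 is a hypothetical figure chosen only to be in line with the roughly 14k quoted above.

```python
from datetime import date


def project_full_year(count_so_far: int, as_of: date) -> float:
    """Linearly scale a year-to-date count up to the full calendar year."""
    year_start = date(as_of.year, 1, 1)
    next_year_start = date(as_of.year + 1, 1, 1)
    days_elapsed = (as_of - year_start).days + 1  # count as_of itself
    days_in_year = (next_year_start - year_start).days
    return count_so_far * days_in_year / days_elapsed


# Hypothetical year-to-date figure, not taken from the data.
projected = project_full_year(7000, date(2026, 7, 1))
print(round(projected), "per year,", round(projected / 365, 1), "per day")
```

Dividing 14,000 by 365 gives roughly 38, which matches the per-day figure above.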
In general this is between 7k and 15k per year, and if I had to guess, we would see merge edits from 2015 onward replacing the merge-related RFDs of 2013 and 2014.
And if we have a quick look at the outcomes, we can see that most are deleted or done, generally around 85-90%, with a small sliver of other outcomes.
On to the signals! The chart below tracks the percentage of deletion requests flagged with promotional or notability-related keywords over time.
Several clear patterns emerge from the data:
- Long-term Upward Trend: There has been a steady, significant increase in signal-bearing deletion requests since 2012. What was once a relatively quiet process, potentially one that used other words, has become increasingly dominated by these specific types of issues.
- The 2020 Shift: A notable “step change” occurs around 2020. Before this, the rates were lower and more erratic. Since 2020, both promotional and notability signals have stabilized at a much higher baseline, rarely dipping back to pre-2020 levels. 2020 also aligns with the larger increase in baseline RFDs being recorded, but remember, this graph is a % rate anyway…
- Promotional vs. Notability: Both signals have trended upward, and they often move in tandem. This suggests that the issues driving deletions on Wikidata frequently overlap: many items flagged for notability concerns are simultaneously flagged for promotional content, indicating a clear intersection between these two types of problematic editing.
- Recent Volatility: In the most recent months (2025–2026), we see higher volatility, and higher overall rates.
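The percentage series behind a chart like this can be computed with a simple group-by on month. The following is a minimal sketch, assuming each parsed request is a dict with an ISO `date` string and boolean signal flags; the record layout and the name `monthly_signal_rate` are hypothetical.

```python
from collections import defaultdict


def monthly_signal_rate(records: list[dict], signal: str) -> dict[str, float]:
    """Percentage of requests per month whose `signal` flag is set."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for record in records:
        month = record["date"][:7]  # "YYYY-MM" prefix of an ISO date
        totals[month] += 1
        if record[signal]:
            flagged[month] += 1
    return {month: 100.0 * flagged[month] / totals[month] for month in sorted(totals)}


# Hypothetical records, for illustration only.
records = [
    {"date": "2020-01-03", "is_promotional_signal": True},
    {"date": "2020-01-19", "is_promotional_signal": False},
    {"date": "2020-02-02", "is_promotional_signal": True},
]
print(monthly_signal_rate(records, "is_promotional_signal"))
```

Because this is a rate, a month with very few requests can swing sharply, which is worth keeping in mind when reading the more volatile parts of the chart.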
Now, what does this actually mean in terms of administrative load on the project? The graph below is interactive, and starts off with the total closed RFDs hidden.
We can see that the 2013/2014 period again stands out, and that the large number of RFDs being created and closed during that period led to the number of RFDs that an admin would close on average skyrocketing. This also highlights another interesting month, May 2017, which also has a spike in RFDs closed per closing admin. One of the largest days was May 22nd, and it looks like many items that were empty were found and reported for deletion.
If we again zoom into the time period after 2015, we can see a fairly consistent set of data in terms of unique closing admins per month, and also average RFDs closed by an admin per month. Note this is only an average, not an exact calculation on a per admin basis.
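The two per-month measures mentioned here can be derived from a list of closures. This is a sketch under assumptions: it takes one `(month, admin)` pair per closed RFD, and both the input shape and the admin names are hypothetical.

```python
from collections import defaultdict


def admin_load_by_month(closures: list[tuple[str, str]]) -> dict[str, dict]:
    """Summarise closed RFDs, unique closing admins and their ratio per month."""
    closed = defaultdict(int)
    admins = defaultdict(set)
    for month, admin in closures:
        closed[month] += 1
        admins[month].add(admin)
    return {
        month: {
            "closed": closed[month],
            "unique_admins": len(admins[month]),
            # A plain mean: it hides how unevenly the work may be shared.
            "avg_per_admin": closed[month] / len(admins[month]),
        }
        for month in sorted(closed)
    }


# Hypothetical closures, for illustration only.
closures = [
    ("2017-05", "AdminA"),
    ("2017-05", "AdminA"),
    ("2017-05", "AdminB"),
    ("2017-06", "AdminA"),
]
for month, stats in admin_load_by_month(closures).items():
    print(month, stats)
```

In the hypothetical May data the mean is 1.5, even though one admin closed two requests and the other closed one, which is the sense in which the figure is only an average.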
Taking a quick look at the users and bots that have interacted with the most distinct RFDs, the top 10 are:
| User | RFDs |
| BeneBot* | 245716 |
| DeltaBot | 98131 |
| Succu | 15156 |
| Marcol-it | 14230 |
| Calak | 8442 |
| Ary29 | 8423 |
| Lymantria | 6676 |
| AttoRenato | 4875 |
| GZWDer | 4482 |
| Dorades | 4445 |
I’m down at number 202, with only 454 RFDs myself.
Further thoughts
This really only scratches the surface of what could be determined from this treasure trove of archived discussions around deletions. A few things would be well worth trying to determine, in my opinion:
- There are many things deleted on Wikidata that do not end up having an RFD entry, as admins just go and delete them, so that data should really be pulled in to get a full picture of the deletion rate.
- In terms of “overload” of the system, the time that a bad item exists before it is deleted might be a very good indicator. This would likely require non-public data sets, however, to determine the dates of deleted revisions of these deleted items.
- I decided to limit the signal analysis to the initial comment and/or reason, and I imagine that if the entire conversation around deletion was checked you’d end up with slightly inflated rates when it comes to the signals.
So, to whoever gets to look at T429036 [Analytics] [Request] Baseline data for Item deletions, good luck, and have fun!
And a note on marimo: it’s not terrible. I quite like it, and the automatic sandboxing and VS Code integration are rather neat.



