October 21, 2021
2021 marks Andy’s 18th anniversary of working with Wikimedia projects. He has been working with Wikidata since its inception in 2012 when it was initially conceived as central data storage for Wikimedia wiki sites. Andy is an amateur ornithologist, a freelance consultant, and has been a Wikimedian-in-Residence (WiR) with a number of organizations, including ORCID and the Royal Society of Chemistry. He writes regularly on wiki and natural history topics at his blog. (Wikidata Profile: Q15136093)
Andy’s wiki interests include:
Wikipedia: templating content to make it machine-readable [microformats and HTML mark-up],
Microformats for taxonomic names used in Wikipedia infoboxes and elsewhere (see Andy's blog for more information, and the sketch after this list), and
the goal of one Wikidata Q record for everything that has a Wikipedia article.
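As a rough illustration of what that machine-readable mark-up enables, here is a minimal Python sketch that reads the taxonomic microformat out of a Wikipedia taxobox. The "biota" and "binomial" class names are assumptions based on the taxobox mark-up; check the live HTML before relying on them.

```python
# Minimal sketch: read the taxonomic microformat from a Wikipedia taxobox.
# Assumes the taxobox table carries the "biota" class and the species name
# a "binomial" class -- these class names may differ in the live mark-up.
import requests
from bs4 import BeautifulSoup

resp = requests.get(
    "https://en.wikipedia.org/wiki/Parus_major",   # example article
    headers={"User-Agent": "bhl-notes-example/0.1"},
    timeout=30,
)
soup = BeautifulSoup(resp.text, "html.parser")

taxobox = soup.find("table", class_="biota")       # microformat root element
if taxobox:
    binomial = taxobox.find(class_="binomial")     # machine-readable species name
    print(binomial.get_text(strip=True) if binomial else "no binomial class found")
```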
Wikispecies is maintained mostly by a relatively small community of professional taxonomists, and its content is largely unstructured. Andy wants to promote more structure and advocates automating data exchange between Wikidata and Wikispecies.
Andy is running this project right now, but with little community help; he could use more.
Integration is primarily a cultural and social issue, not a technical one:
Initially, there was significant pushback from the Wikispecies community; data quality control and divergent data models were the concerns.
Data model issues boil down to a "Bonnie and Clyde" problem: a single Wikipedia article can cover several entities (e.g. a duo), so the one-article-to-one-item mapping breaks down.
The divergent taxonomic models need to be reconciled. Also, some folks in Wikidata are not the easiest to work with and have been banned from Wikipedia.
Wikidata makes Wikispecies a redundant project. What is needed is a front-end to Wikidata similar to the Scholia interface, but for species: https://scholia.toolforge.org.
Scholia can display information about a taxon; Wikispecies should mimic this project (see the sketch below).
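For a sense of how a Scholia-style species front-end would get its data, here is a hedged sketch of a taxon lookup against the public Wikidata SPARQL endpoint. "Parus major" is just an example taxon; P225 (taxon name), P105 (taxon rank), P171 (parent taxon), and P18 (image) are the standard Wikidata taxonomy properties.

```python
# Sketch: fetch the core facts a Scholia-like taxon page would display.
import requests

QUERY = """
SELECT ?taxon ?rankLabel ?parentLabel ?image WHERE {
  ?taxon wdt:P225 "Parus major" .                 # taxon name (string match)
  OPTIONAL { ?taxon wdt:P105 ?rank . }            # taxon rank
  OPTIONAL { ?taxon wdt:P171 ?parent . }          # parent taxon
  OPTIONAL { ?taxon wdt:P18 ?image . }            # image
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "bhl-notes-example/0.1"},
    timeout=60,
)
for row in resp.json()["results"]["bindings"]:
    print({k: v["value"] for k, v in row.items()})
```

Walking the full parent-taxon hierarchy (the tree complexity mentioned below) is a matter of following P171 repeatedly, e.g. with the SPARQL property path wdt:P171*.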
Existing content has grandfather rights, but the community needs to wean itself off Wikispecies: make material available in Wikidata and Wikimedia Commons, which in turn makes the data available to Wikispecies. This data flow should be encouraged.
Institutions
Collections
Taxa. Note: images are being pulled in from Wikidata; there is no consensus here yet to build a template. It is complicated by the taxonomic tree hierarchy.
People. There is an "authority control" template that automatically pulls in all author IDs, and a template for an image of the person. The community seems mostly happy with this.
Journals
Publishers
Articles. Journal articles are next on the development docket.
DOIs for historic papers are awesome – incredibly useful!
If DOIs can't be assigned, associate the papers with Wikidata items instead.
Show the Wikidata QID alongside creator IDs on the BHL website (see the sketch below).
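A hedged sketch of how that creator-ID-to-QID mapping could be fetched in bulk, assuming P4081 is the "BHL creator ID" property on Wikidata (worth verifying on wikidata.org before relying on it):

```python
# Sketch: map BHL creator IDs to Wikidata QIDs via the SPARQL endpoint.
import requests

QUERY = """
SELECT ?creator ?bhlId WHERE {
  ?creator wdt:P4081 ?bhlId .   # P4081 assumed to be "BHL creator ID"
}
LIMIT 50
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "bhl-notes-example/0.1"},
    timeout=60,
)
for row in resp.json()["results"]["bindings"]:
    qid = row["creator"]["value"].rsplit("/", 1)[-1]   # e.g. Q15136093
    print(row["bhlId"]["value"], "->", qid)
```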
At the point of digitization, add the item to Wikidata; maybe build a bot or an automated tool to do this (see the sketch below)? A Wikidata volunteer might be willing to take this on. (Andy is interested in serving in an advisory capacity for BHL as a Wikidatan-in-Residence.)
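A very rough sketch of what such a bot could do, using the Wikidata API's wbeditentity action. The login step is glossed over (a real bot needs an authenticated session, e.g. via OAuth), and the label and P4081 claim are placeholders for illustration, not BHL's actual workflow:

```python
# Sketch: create a Wikidata item for a newly digitized creator via wbeditentity.
import json
import requests

API = "https://www.wikidata.org/w/api.php"
session = requests.Session()   # a real bot must log in on this session first

# Fetch a CSRF token (only valid for editing once logged in).
token = session.get(API, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

new_item = {
    "labels": {"en": {"language": "en", "value": "Example Creator"}},  # placeholder
    "claims": [{
        "mainsnak": {
            "snaktype": "value",
            "property": "P4081",                         # BHL creator ID (assumed)
            "datavalue": {"value": "12345", "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }],
}

resp = session.post(API, data={
    "action": "wbeditentity", "new": "item", "format": "json",
    "data": json.dumps(new_item), "token": token,
})
print(resp.json())   # will report an error unless the session is logged in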
BHL needs a Wikimedian-in-residence to:
develop staff and librarian skills in Wikidata,
engage with BHL volunteers,
do bulk uploads (see the sketch after this list), and
run public engagement events and campaigns.
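One low-tech way a Wikimedian-in-Residence might run the bulk uploads mentioned above is to generate a QuickStatements batch (https://quickstatements.toolforge.org). A minimal sketch, with made-up rows and P4081 (BHL creator ID) assumed as the target property:

```python
# Sketch: build a QuickStatements v1 batch. Each command is one tab-separated
# line; CREATE makes a new item and LAST refers to the item just created.
rows = [
    {"label": "Example Creator One", "bhl_creator_id": "101"},   # sample data
    {"label": "Example Creator Two", "bhl_creator_id": "102"},
]

commands = []
for row in rows:
    commands.append("CREATE")
    commands.append(f'LAST\tLen\t"{row["label"]}"')              # English label
    commands.append(f'LAST\tP4081\t"{row["bhl_creator_id"]}"')   # BHL creator ID (assumed)

print("\n".join(commands))   # paste the output into the QuickStatements UI
```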
Wikidata is working as intended, especially with the smaller language communities. There is a bit of resistance from English Wikipedia, which has 5.5 million articles to administer; that is too many to maintain by hand, so automation is needed.
See the categories for taxa and species, and the categories for a specific cultivar.
BHL ought to focus effort on Wikidata, Wikimedia Commons, and Wikisource; these are the backend infrastructure of all Wikimedia sites.
Wikimedia Commons also has templates to pull in metadata about images from Wikidata
Keep pushing images to Flickr, but as CC0 or openly licensed. Has the licensing been changed/updated? JJ: yes, the licensing is CC0 now.
There are Commons people monitoring the BHL Flickr stream who pull new content in on an ad-hoc basis, such as Fae (whom Andy has not heard from in quite some time).
Can the taxonomic machine tags be pulled in? Sometimes they are used to sort images into categories. We definitely don't want to lose that metadata (see the sketch below).
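A hedged sketch of reading those machine tags via the Flickr API's flickr.photos.search method. The API key and BHL account ID are placeholders, and the "taxonomy:" namespace is an assumption about BHL's tagging convention; verify it against the live photostream.

```python
# Sketch: list BHL Flickr photos carrying taxonomy machine tags.
import requests

resp = requests.get("https://api.flickr.com/services/rest/", params={
    "method": "flickr.photos.search",
    "api_key": "YOUR_FLICKR_API_KEY",      # placeholder
    "user_id": "BHL_FLICKR_NSID",          # placeholder for BHL's account ID
    "machine_tags": "taxonomy:",           # any tag in the taxonomy namespace
    "extras": "machine_tags",              # return the tags with each photo
    "format": "json",
    "nojsoncallback": 1,
}, timeout=30)

for photo in resp.json().get("photos", {}).get("photo", []):
    print(photo["id"], photo.get("machine_tags"))
```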
Wikimedia Commons has the Wikibase extension installed to store structured data about the images. Search runs on the Wikibase structured data; there is also an older mechanism that searches for strings in the pages using Elasticsearch.
They installed Wikibase a year or two ago; see https://commons.wikimedia.org/wiki/Commons:Structured_data for more information, and the sketch below.
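A small sketch contrasting the two search paths just described: structured-data search via the haswbstatement keyword (using P180, "depicts"), and the older full-text string search. The QID is a placeholder for illustration.

```python
# Sketch: query Commons search two ways through the standard MediaWiki API.
import requests

API = "https://commons.wikimedia.org/w/api.php"

def search(srsearch):
    resp = requests.get(API, params={
        "action": "query", "list": "search",
        "srsearch": srsearch,
        "srnamespace": 6,      # File: namespace
        "format": "json",
    }, timeout=30)
    return [hit["title"] for hit in resp.json()["query"]["search"]]

# Structured-data search: files whose "depicts" (P180) statement points at
# a given item (Q42262 is a placeholder QID).
print(search("haswbstatement:P180=Q42262"))

# Older string-based search over page text (Elasticsearch full-text).
print(search("Parus major"))
```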
Wikisource is restricted to what is out of copyright in the US, or openly licensed; a slightly different approach to copyright than Wikimedia Commons.
Wikisource uses DjVu (preferred) or PDF files as the basis for a proofread text version of the document.
Interfaces are easy to start with, but finesse takes learning; the last 5% of the work takes a fair bit of it. Everything needs to be looked at by at least two people, so the QA process can be a bottleneck. (This is the case with other transcription platforms as well.)
Wikidata items will link to the corresponding Wikisource texts. See:
Good news! I believe Wikisource is getting more resources and support from the Wikimedia Foundation for technical development.
The Wikisource community is very welcoming; they run transcribe-a-thons to engage volunteers.
The work on Wikisource is massively labor-intensive, but it is very fun and relaxing.
No, but they can be put into Wikimedia Commons, and from there a CSV can be created; this is rarely done, though, and will only be allowed if it is useful to other Wikimedia projects or has an educational purpose.
Other tidbits
There is a new data "firehose" service for Facebook, Google, etc.; they get the data for their purposes with guaranteed uptime. BHL could get the enterprise service as well; the Internet Archive is the first non-profit to do so. Service here: https://enterprise.wikimedia.com/