March 23, 2022
Andra is a bioinformatician and semantic “webby,” and works on Gene Wiki professionally. Gene Wiki is a project that began by creating Wikipedia articles to expose and preserve, for the public, gene data created during research funding cycles (the nature of funding cycles meant this data was getting lost). Gene Wiki is now moving its data into Wikidata because, while Wikipedia was useful, researchers wanted to expose Gene Wiki knowledge in other languages (2012). Andra has been involved with Wikidata since 2014. He takes information from public databases and pushes it into Wikidata. Andra uses the taxonomic world as a playground to explore and invent, and is an active organizer of and participant in iNaturalist’s Wikidata project: Wikidata:WikiProject_Biodiversity. (Wikidata profile: Q19845625)
It started at Wikimania 2018, where a group of Wikimedians and Andra decided to create an app that would link to iNaturalist images. To date, 44,000 images from iNaturalist have made their way into Wikipedia articles. Andra wanted to do the same with GBIF images. One complication is that GBIF does not annotate image licenses separately from the metadata license.
During his work, Andra has found interesting things, such as:
Openly licensed images of frogs are difficult to find, while images of dragonflies are very easy to find. The underlying thread is that open image availability reflects the particular values of each community and its level of “openness.”
Yes: follow up by sending Andra an example link to machine tags in Flickr, plus a blog post with more information. We will discuss this soon, possibly in an upcoming Wikipedia Weekly: Biodiversity edition.
I really don’t have enough time to work on Wikispecies. If you are interested in getting BHL images into Wikispecies, then you have to add 300+ language pages manually. I think there is value in the community and the work, but my interest is data reuse, which is where Wikidata comes in.
Natively, Wikidata is a relational database that stores JSON blobs. It works much like Wikipedia, which is built on MediaWiki and stores one article as one record; Wikidata does the same thing, but instead of storing text, each record holds a JSON blob.
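To make the “one record, one JSON blob” idea concrete, here is a minimal Python sketch that reads a trimmed item record in the Wikibase JSON layout and pulls out statement values. The item ID Q00000, its label, and the BHL name ID value 12345 are hypothetical; real records include many more fields.

```python
import json

# A trimmed, hypothetical item record in the Wikibase JSON layout. The item ID
# "Q00000" and the BHL name ID "12345" are made up for illustration; real
# records also carry descriptions, aliases, sitelinks, qualifiers and references.
RECORD_JSON = """
{
  "id": "Q00000",
  "type": "item",
  "labels": {"en": {"language": "en", "value": "Example taxon"}},
  "claims": {
    "P8724": [
      {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
          "snaktype": "value",
          "property": "P8724",
          "datatype": "external-id",
          "datavalue": {"type": "string", "value": "12345"}
        }
      }
    ]
  }
}
"""


def statement_values(record: dict, prop: str) -> list:
    """Return the main value of every statement for `prop` in one item record."""
    values = []
    for claim in record.get("claims", {}).get(prop, []):
        snak = claim["mainsnak"]
        # "somevalue" and "novalue" snaks have no datavalue, so skip them.
        if snak.get("snaktype") == "value":
            values.append(snak["datavalue"]["value"])
    return values


item = json.loads(RECORD_JSON)  # the whole item is one JSON document
print(item["labels"]["en"]["value"])    # Example taxon
print(statement_values(item, "P8724"))  # ['12345']
```

Everything about the item (labels, statements) lives in that single document, which is why editing one statement in Wikibase means rewriting one record.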
In 2015, Blazegraph was deployed as an additional query layer. This means the data is copied from Wikibase into the Blazegraph triplestore, and if you look closely you will see a lag between edits and what the Wikidata Query Service returns.
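A rough sketch of what that copy step involves: flattening an item's JSON statements into “truthy” RDF triples using the wd:/wdt: namespaces the Query Service uses. This is a simplified illustration under stated assumptions (string and item values only, no qualifiers or references), not the actual Wikidata RDF export code; the item and its values are hypothetical.

```python
import json

WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"

# Hypothetical item: the ID and the BHL name ID value are made up.
ITEM = {
    "id": "Q00000",
    "claims": {
        "P8724": [  # BHL name ID, per the property list in these notes
            {"rank": "normal",
             "mainsnak": {"snaktype": "value",
                          "datavalue": {"type": "string", "value": "12345"}}},
        ],
        "P31": [  # instance of -> taxon (Q16521)
            {"rank": "normal",
             "mainsnak": {"snaktype": "value",
                          "datavalue": {"type": "wikibase-entityid",
                                        "value": {"entity-type": "item", "id": "Q16521"}}}},
        ],
    },
}


def truthy_triples(record):
    """Yield simplified N-Triples lines for an item's "truthy" statements."""
    subject = f"<{WD}{record['id']}>"
    for prop, claims in record.get("claims", {}).items():
        # Truthy = preferred-rank statements if any exist, else normal-rank ones;
        # deprecated statements are never included.
        best = [c for c in claims if c.get("rank") == "preferred"]
        best = best or [c for c in claims if c.get("rank") == "normal"]
        for claim in best:
            snak = claim["mainsnak"]
            if snak.get("snaktype") != "value":
                continue
            dv = snak["datavalue"]
            if dv["type"] == "string":
                # JSON string escaping is also valid N-Triples literal escaping.
                obj = json.dumps(dv["value"], ensure_ascii=False)
            elif dv["type"] == "wikibase-entityid":
                obj = f"<{WD}{dv['value']['id']}>"
            else:
                continue  # time, quantity, coordinates, ... omitted in this sketch
            yield f"{subject} <{WDT}{prop}> {obj} ."


for line in truthy_triples(ITEM):
    print(line)
```

Because the triplestore holds a derived copy like this, updates have to be propagated after each edit, which is where the visible lag comes from.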
Also note: Andra’s Semantic Web peers don’t love Wikibase/Wikidata because it is not a true triplestore and uses its own namespaces that don’t conform to W3C RDF specifications.
However, the unique selling point of Wikibase is the ability to edit a single statement: you don’t need a PhD in computer science to add a statement. To add a single statement to an RDF database, you would need to perform an insert using APIs, etc.
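For contrast, a minimal sketch of what adding one statement looks like in a generic RDF store: a SPARQL 1.1 Update INSERT DATA request, carried in the form-encoded update parameter of a POST. The IRIs are hypothetical placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode


def insert_single_triple(subject_iri: str, predicate_iri: str, literal: str) -> str:
    """Build a SPARQL 1.1 Update that adds exactly one triple with a string literal."""
    # Escape characters that would otherwise break a double-quoted literal.
    escaped = (literal.replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n")
                      .replace("\r", "\\r"))
    return f'INSERT DATA {{ <{subject_iri}> <{predicate_iri}> "{escaped}" . }}'


def update_request_body(update: str) -> str:
    """URL-encoded POST body carrying the update in the SPARQL Protocol 'update' field."""
    return urlencode({"update": update})


# Hypothetical IRIs; a real store would also need an update endpoint and credentials.
update = insert_single_triple("http://example.org/taxon/1",
                              "http://example.org/prop/bhlNameId",
                              "12345")
print(update)
print(update_request_body(update))
```

In Wikibase, the same change is a single edit in the web interface, with no query language or endpoint configuration involved.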
What Wikidata brings to the table is that it removes (some of) the need for setup, deployment, and infrastructure maintenance, so users can focus primarily on content rather than ICT issues. Running your own Wikibase does mean dealing with maintenance, but because Wikibase uses the same infrastructure as Wikidata, the learning curve is less steep.
People already know how to use Wikidata, so it is more intuitive to use the same interface for other datasets. Essentially, it allows anyone to interact with the Semantic Web.
Some Wikibase limitations:
Because BIBFRAME is an RDF ontology, it is not easy to deploy on Wikibase right now.
It can’t handle millions of statements (e.g., BHL’s data); there will be a workshop in March about increasing how much data Wikibase can ingest. Andra to send an invite.
The Wikibase Stakeholder Group (WBSG) brings together institutions that discuss requirements and need similar functionality. They pool financial resources to hire developers to build out features. It is also about surfacing institutional needs to WMDE.
BHL should join the WBSG. It’s for institutes, libraries, museums, etc. (it’s a hangout place; they meet once per month), and it has a central fund that everyone chips in to for development costs. Membership fees are currently $0, but that may change later.
Look into bots. Gene Wiki bots: http://jenkins.sulab.org.
Andra is happy to work with us on creating BHL bots: he will work on one bot for a specific dataset, with the caveat that he will not maintain it. He needs to pass it off to the BHL Technical Team for data curation and maintenance.
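As a starting point for a BHL bot, a hypothetical sketch of its core step: compare the IDs already on Wikidata with those in a BHL export and plan one wbcreateclaim API call per missing statement. The function name and input shapes are assumptions for illustration; a real bot would also need login, a CSRF token, rate limiting, and references on each statement.

```python
import json


def plan_bhl_name_claims(current: dict, source: dict) -> list:
    """Plan wbcreateclaim requests for BHL name IDs (P8724) missing on Wikidata.

    current: item ID -> set of P8724 values the item already has
    source:  item ID -> BHL name ID taken from a BHL dataset export
    """
    planned = []
    for item_id, bhl_id in sorted(source.items()):
        if bhl_id in current.get(item_id, set()):
            continue  # statement already exists; don't add a duplicate
        planned.append({
            "action": "wbcreateclaim",
            "entity": item_id,
            "property": "P8724",
            "snaktype": "value",
            # The API takes the value as JSON, so a string ID becomes "\"...\"".
            "value": json.dumps(bhl_id),
            "format": "json",
            # A CSRF token (action=query&meta=tokens) must be added before POSTing.
        })
    return planned


# Hypothetical inputs: Q1 already has its ID, Q2 does not.
plan = plan_bhl_name_claims({"Q1": {"100"}}, {"Q1": "100", "Q2": "200"})
print(plan)
```

Keeping the “plan” step separate from the step that sends requests makes it easier to hand the bot off: whoever maintains it can review planned edits before anything is written.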
We have:
BHL bibliography ID (P4327) (books/journals)
BHL creator ID (P4081) (authors)
BHL Page ID (P687) (pages) → he uses page IDs
BHL part ID (P6535) (articles)
BHL name ID (P8724) (taxa) ← let’s work on this one.
Not in Wikidata:
BHL Item ID → not represented
BHL contributor ID → not represented
BHL collection ID → not represented
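To see how far BHL name ID (P8724) coverage already goes, one could query the Wikidata Query Service. A minimal sketch, assuming the public endpoint and the standard SPARQL JSON results format: it builds the request URL and flattens a response. The sample response is made up, and the network call itself is left out (Wikimedia asks clients to send a descriptive User-Agent).

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Items that carry a BHL name ID (P8724). WDQS predefines the wdt: prefix.
QUERY = """
SELECT ?item ?bhlNameId WHERE {
  ?item wdt:P8724 ?bhlNameId .
}
LIMIT 10
"""


def wdqs_url(query: str) -> str:
    """GET URL that asks the Wikidata Query Service for SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def parse_bindings(results: dict) -> list:
    """Flatten SPARQL 1.1 JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


# Hypothetical response in the SPARQL JSON results format (values are made up).
sample = {
    "head": {"vars": ["item", "bhlNameId"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q00000"},
         "bhlNameId": {"type": "literal", "value": "12345"}},
    ]},
}
print(wdqs_url(QUERY)[:60])
print(parse_bindings(sample))
```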
JJ sent Andra all of our datasets. He is going to look at them more.
He is semantic webby by training and quickly realized that setting up semantic infrastructure is very expensive. Wikibase is essentially a kickstarter for any linked open data (LOD) venture and helps get folks on the LOD bandwagon.
Also check out https://triplydb.com: a free triplestore tier, with paid tiers starting at $700 per month.
Similar to the BnF project, Luxembourg is using Wikibase for its library data, but with CIDOC, and has built a Wikibase that has mapping rules. They have a large technical team working on:
CIDOC mapping
An additional SSL layer, built because Wikibase doesn't have strong security layers.
Here is a recent talk about this work: https://www.youtube.com/watch?v=MDjyiYrOWJQ