James Hare is a consultant who has been involved in the Wikidata project since 2013 and a Wikimedia volunteer for over 15 years. He is currently working with the Internet Archive as a developer and product manager on two Wikibase initiatives. Both Internet Archive Wikibase instances use WBStack, an invite-only, cloud-based Wikibase solution. James is always open to new opportunities. (Wikidata profile: Q23041486)
The project idea grew out of InternetArchiveBot, which checks for broken Internet Archive links across the Wikimedia ecosystem (100+ sites) and replaces them with valid ones. The project expanded from there: take every citation in Wikipedia and create a Wikibase Q item for each, then attempt to connect that citation to the textual resource (if one exists) and interlink it with other relevant entities.
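This is not InternetArchiveBot's actual implementation, but the kind of check it performs can be sketched against the public Wayback Machine Availability API (the function and constant names here are illustrative, not from the bot's code):

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public Internet Archive endpoint that reports the closest archived snapshot for a URL.
WAYBACK_API = "https://archive.org/wayback/available"

def availability_url(target_url: str) -> str:
    """Build a Wayback Machine Availability API request URL for target_url."""
    return WAYBACK_API + "?" + urllib.parse.urlencode({"url": target_url})

def latest_snapshot(target_url: str) -> Optional[str]:
    """Return the URL of the closest available archived snapshot, or None."""
    with urllib.request.urlopen(availability_url(target_url)) as resp:
        data = json.load(resp)
    snapshot = data.get("archived_snapshots", {}).get("closest")
    return snapshot["url"] if snapshot and snapshot.get("available") else None
```

A bot like this would replace a dead citation URL with the snapshot URL returned by `latest_snapshot`, if one exists.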
To tackle the project, James is taking a layered approach:
1) Adding all Wikipedia citations in
2) Adding in identifiers
3) Adding in links to source materials
4) Creating interlinkages
An example entry: https://wikipediacitations.wiki.opencura.com/wiki/Item:Q67964
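Items like the example above can be fetched programmatically. Assuming the instance exposes the standard Wikibase Special:EntityData endpoint (the default in stock Wikibase; the function names below are illustrative), a minimal sketch:

```python
import json
import urllib.request
from typing import Optional

# Citations instance mentioned in these notes.
BASE = "https://wikipediacitations.wiki.opencura.com"

def entity_data_url(qid: str, base: str = BASE) -> str:
    """Standard Wikibase endpoint serving an item's full JSON."""
    return f"{base}/wiki/Special:EntityData/{qid}.json"

def fetch_label(qid: str, lang: str = "en") -> Optional[str]:
    """Fetch an item's label in the given language, if present."""
    with urllib.request.urlopen(entity_data_url(qid)) as resp:
        data = json.load(resp)
    label = data["entities"][qid].get("labels", {}).get(lang)
    return label["value"] if label else None
```

For example, `fetch_label("Q67964")` would return the English label of the citation item linked above, assuming the endpoint is enabled on this instance.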
The project runs on WBStack (there are capacity constraints, but these are evaporating with further development from WMDE). WBStack will soon become Wikibase Cloud; it is not clear whether there will be migration work to be done, but it will probably be pretty straightforward.
Most folks know IA metadata is mediocre. My approach is to improve the data outside of the current system which is closed off from volunteer editors. Creating a Wikibase instance allows IA to share the data with external curators, inviting other people to work on the data without impacting IA or requiring any new functionality to be built.
The goal is to be able to ask important questions of the data that are not possible today, and to yield new insights.
No, there are two:
For bibliographic citations - https://wikipediacitations.wiki.opencura.com
For news articles - https://iagraph.wiki.opencura.com/
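The "important questions" above would be asked through SPARQL. The sketch below targets the public Wikidata Query Service as a stand-in (the same pattern would apply to a query service in front of these instances); the query counts works with a DOI (property P356), and the helper names are illustrative:

```python
import urllib.parse
import urllib.request

# Public Wikidata Query Service SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT (COUNT(*) AS ?n) WHERE {
  ?work wdt:P356 ?doi .   # P356 = DOI
}
"""

def sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a GET request for a SPARQL endpoint, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # The query service asks clients to identify themselves via User-Agent.
    return urllib.request.Request(url, headers={"User-Agent": "citation-notes-example/0.1"})
```

Passing the request to `urllib.request.urlopen` would return the count as SPARQL JSON results.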
An important thing to note is that Wikidata is reaching capacity. Wikibase is the response: we need a proliferation of Wikibase instances.
Options for federation:
Property federation (right now this is an all-or-nothing deal): to federate with Wikidata you must import Wikidata's properties wholesale.
You can also create your own properties as you need them, then map each one to its Wikidata counterpart using a "same as" property.
Another non-technical route is to go to the community for the creation of a new property in Wikidata
Property creation is a regulated process. A group of super volunteers (sysops) decides which properties are in Wikidata. Their goal is to maximize the expressiveness with as few properties as possible. The application process is intended to weed out duplicate ideas.
WBStack – it's currently going from a side project to a core project for Wikimedia Deutschland. It's going to become Wikibase Cloud, and it's supposed to scale to handle very large data sets. Look to Adam Shoreland and his blog for development news. (https://addshore.com/)
I take a pluralist approach. There are many ways of expressing things – there isn’t one way of modeling relationships. Data modeling is going to be nuanced given a community’s culture/language/background and that diversity is going to be reflected in Wikidata.
If you expose your data for other people to consume, then other people can improve that data for you. New knowledge can then be synthesized out of existing knowledge. (See chemical data example.) Putting your data in a format that other people recognize and expect (like semantic linked open data) makes it actionable and usable.