December 2, 2021
Diana was formerly the Technical Services Librarian at the Marie Louise Rosenthal Library at the Field Museum of Chicago and also served as Chair of the BHL Cataloging Committee. She is now an active BHL and Wikidata volunteer and brings a deep bench of metadata expertise to her work in disambiguating BHL Author names. She uses Wikidata as a research database and source of URIs for Author name records in the BHL and has merged thousands of these names to facilitate better search and indexing for BHL’s end users. She is known to invent and iterate on workflows in OpenRefine to enable reconciliation between the two knowledge bases: BHL and Wikidata.
Duplicate author names associated with title records have always been a problem for the BHL. With the addition of article-level metadata, this problem has only grown. Data providers are not required to submit author name data with an accompanying URI, so the BHL Cataloging and Metadata Committee has much work to do to disambiguate duplicate names in the BHL. Without an identifier for these names, BHL Catalogers must use external systems such as VIAF, the LC Name Authority File, Wikidata, and Google searches to do painstaking research and reconciliation work.
(See also Diana’s OpenRefine / Wikidata author name reconciliation process)
Article creation has created a lot of issues for us. Names in BHL have become a nightmare; ideally we would have one record per unique creator but we are very far off from this ideal.
Data providers are giving us different variations of names
Names attached to titles
Rules for formatting names are all different across partners resulting in duplicates
Names attached to articles (even bigger problem)
Pensoft sometimes supplies ORCID iDs, which is very helpful, but not always
Biostor is problematic because authors don’t have any URIs
Diacritics can create problems too, by creating duplicate entries
VIAF and Wikidata have duplicate entries as well, so consulting these databases does not necessarily resolve the problem of duplicates; you still have to do a lot of cross-investigative work
At this time, VIAF is more comprehensive than Wikidata and better for older content
Alternative names in Wikidata give us synonyms to search on which is great
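As a rough illustration of how diacritics and punctuation differences produce duplicate entries, the sketch below groups name strings that collapse to the same normalized key. It is not Diana's actual OpenRefine workflow; the function names `name_key` and `group_candidates` and the sample names are hypothetical, chosen only for this example.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Build a comparison key: strip diacritics, case, and punctuation."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks so "ü" compares equal to "u".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase aggressively and replace punctuation with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", stripped.casefold())
    # Collapse runs of whitespace.
    return " ".join(cleaned.split())


def group_candidates(names):
    """Return groups of distinct strings that share the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return [variants for variants in groups.values() if len(variants) > 1]


# Made-up sample data, not taken from BHL.
sample = ["Müller, Fritz", "Muller, Fritz", "MULLER, Fritz.", "Darwin, Charles"]
candidates = group_candidates(sample)
```

A shared key only flags candidates. Two different people can share a name, so each group still needs the research against VIAF, Wikidata, and other sources described above before any merge.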
Summer of 2020
I pull down spreadsheets of around 1000 names to work on and just keep working through the alphabet
Once the spreadsheet is cleaned up, volunteers go in and make sure each entity is matched to the right Wikidata Q entry; then they merge the duplicates and add the Wikidata ID to the BHL record
Typically, a full day, probably more
Diana Shih
Becky W
Susan Lynch
Diane Shaw (doing a lot of work on the Wikidata-end)
Certainly improved searching, correct metadata, and full publication indexes for any given author.
Users do write in and notice the work
As far as I know, no work is being done for geographies and/or subjects. I don’t even think you can use the Wikidata Q number for these records in the BHL database at this time.
Document Diana's process and create an OpenRefine recipe for other staff to use. We need more people on this.
Load BHL author names data into Wikidata as a project and then use the Mix'n'match tool to crowdsource the tsunami of reconciliation work
Speak to Rod Page about his current workflow and the addition of URIs for author names. Could we use Wikidata as a staging area? Deposit citations into Wikidata first, disambiguate author names over there, then bring them into Biostor for eventual harvest to BHL? Think about data flows.
Talk to Crossref and ask if it's possible to enforce author name URIs – do we have any sway, can we make some type of appeal?
For author names lacking a URI completely, generate one by adding it to Wikidata
Ask BHL-TECH to allow the Wikidata Q number to be used for other entity types (geographies, articles, subjects etc.)
Tech development: allow the cataloging group to edit subject headings too. This needs a conversion project to FAST headings or some RDF vocabulary.
We need more discussion and a policy on which name authority database BHL should prefer. Should the preferred name be pulled from VIAF, LC Authority, Wikidata, or elsewhere?