September 15, 2021
Katie Mika is a former National Digital Stewardship Resident (NDSR) for Foundations to Actions: Extending Innovations in Digital Libraries in Partnership with NDSR Learners. Currently, she is a Data Services Librarian at Harvard Library’s Research Data Management Program & the Institute for Quantitative Social Science (IQSS) where she provides data-sharing consultations, data curation services, and research data management support for researchers, librarians, and domain specialists. She consults on programs and research projects to support the creation, use, reuse, and archiving of FAIR data objects and partners with libraries across campus to develop Collections as Data services for libraries and special collections that contribute to the long-term preservation of data artifacts with enduring research value.
Yes: open collections content as computational assets. Linked open data is a path toward greater interoperability, and it feels like the most important thing we can do to turn all of that text into structured information that is accessible to users.
Let’s think about low-hanging fruit:
One direction that seemed like an immediate pathway was creating a user workflow to access the text content more easily.
Project idea: there was an R package that served as an API wrapper; maybe build a Python wrapper for the text content?
More demos, more proofs-of-concept. High-impact visualization projects that demonstrate the value of using collections data.
Adding taxonomic names to page content from transcribed OCR would be simple and extremely powerful.
Turning image data into a dataset that could be ingested by computer vision tools.
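To make the Python-wrapper idea a little more concrete, here is a minimal sketch of what such a wrapper might look like. It assumes the BHL API v3 endpoint (`https://www.biodiversitylibrary.org/api3`) with an `op` operation name, an `apikey`, and `format=json`, and it assumes that a `GetPageMetadata` call with `ocr=t` and `names=t` returns the page's OCR text in an `OcrText` field. These names reflect our reading of the BHL API and should be checked against its documentation; the class and function names (`BHLTextClient`, `extract_ocr_text`) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: base URL and parameter names follow our reading of the BHL API v3
# documentation; verify them before relying on this sketch.
BHL_API_BASE = "https://www.biodiversitylibrary.org/api3"


class BHLTextClient:
    """Hypothetical minimal wrapper for fetching BHL page text."""

    def __init__(self, api_key: str, base_url: str = BHL_API_BASE) -> None:
        self.api_key = api_key
        self.base_url = base_url

    def build_url(self, op: str, **params) -> str:
        # Every request names an operation, carries the API key, and asks for JSON.
        query = {"op": op, **params, "apikey": self.api_key, "format": "json"}
        return f"{self.base_url}?{urlencode(query)}"

    def page_text_url(self, page_id: int) -> str:
        # ASSUMPTION: ocr="t" requests OCR text and names="t" requests taxonomic names.
        return self.build_url("GetPageMetadata", pageid=page_id, ocr="t", names="t")

    def fetch_json(self, url: str) -> dict:
        # Performs a real network request.
        with urlopen(url) as response:
            return json.load(response)

    def get_page_text(self, page_id: int) -> str:
        return extract_ocr_text(self.fetch_json(self.page_text_url(page_id)))


def extract_ocr_text(payload: dict) -> str:
    """Join the OCR text of every page in a response payload.

    ASSUMPTION: responses look like {"Status": "ok", "Result": [{"OcrText": ...}]}.
    """
    if payload.get("Status") != "ok":
        raise ValueError(payload.get("ErrorMessage") or "BHL API request failed")
    return "\n".join(page.get("OcrText", "") for page in payload.get("Result", []))
```

For example, `BHLTextClient("YOUR_KEY").page_text_url(123)` only builds the request URL, so the URL logic can be checked without a network call; `get_page_text(123)` performs the actual request. Keeping URL building and response parsing separate from the HTTP call makes the wrapper easy to test and to adapt if the API's field names differ from these assumptions.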
From personal experience, disambiguating content to port BHL content to Wikidata was very manual. Balancing high-quality and lower-quality metadata is necessary. Named entity recognition is improving every year.
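As a rough illustration of adding taxonomic names to OCR'd page text, here is a deliberately naive sketch of dictionary-based name matching. It is not named entity recognition and not what BHL actually uses: it assumes you already have a list of known scientific names (the `known_names` input is hypothetical), and it only finds exact matches, tolerating the line breaks that OCR often inserts between words. Dedicated name-finding tools handle misspellings, abbreviations, and unknown names far better.

```python
import re
from typing import Iterable, List, Pattern, Tuple


def build_name_pattern(names: List[str]) -> Pattern[str]:
    """Compile one regex matching any known name, allowing OCR line breaks between words."""
    # Longest names first, so a longer name (e.g. a trinomial) wins over its binomial prefix.
    ordered = sorted(set(names), key=len, reverse=True)
    # Escape each word and accept any run of whitespace (including newlines) between words.
    alternatives = [r"\s+".join(re.escape(word) for word in name.split()) for name in ordered]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")


def find_names(ocr_text: str, known_names: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (normalized name, start offset, end offset) for each known name in the text."""
    names = [name for name in known_names if name.strip()]
    if not names:
        return []
    pattern = build_name_pattern(names)
    # Collapse internal whitespace so "Quercus\nalba" is reported as "Quercus alba".
    return [(" ".join(m.group(0).split()), m.start(), m.end()) for m in pattern.finditer(ocr_text)]
```

The character offsets are kept so that matches could later be linked back to positions on the page, for example to attach a name record to the OCR text it came from.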
Gamification is most effective when it is extremely simple. It needs to be built into the platform; don’t send people offsite. Participants are motivated by helping to improve the BHL. Don’t remove people from the content. The game shouldn’t be the motivation; the content is.
Contrary to the old idea that crowdsourcing is some kind of hack, the work done by the crowd is not actually FREE. Community building, standards, guides, videos, etc. are required. Incentivizing individuals is also needed. Moderation and curation require more staff capacity, not less.
Likely not. But for an example of the code, look to the National Library of Wales and its work leveraging IIIF.
I looked at Wikisource but didn’t pitch it in the NDSR report: it didn’t make sense to use unless you are working within the Wikiverse. If you are in the Wikiverse, though, Wikisource is the best platform to use.
Wikisource at the time was very basic and didn’t have the bells and whistles that the other platforms had.
Wikisource needs more features; for one, a more sophisticated image viewer. Things have likely changed for the better since 2017, though.
That being said, the exports from Wikisource are super flat and highly interoperable: they export to JSON/JSON-LD, and the output is extremely usable.
Probably did not happen.
There was a meeting in St. Louis to discuss this. The key takeaway then was that there weren’t enough resources to continue extending functionality.