Day of DH: Data Data Data – CDRH Development

[This post is also available on the Day of DH’s blog]

Hello! My name is Jessica Dussault and I am a programmer at the Center for Digital Research in the Humanities (CDRH) at the University of Nebraska-Lincoln (UNL). Let me tell you a little about my Day of DH 2017, which was, as you can tell from the title, filled with data.

Hawking Lewis and Clark

My morning started off with conversation and some brainstorming. In late 2016, we redid almost everything about the Journals of the Lewis and Clark Expedition website, updating the TEI XML and redesigning the navigation, appearance, and technology. Because it is one of our highest-traffic sites and an important educational resource, we wanted to make sure it would remain accessible to all users in the future. Now, with that work behind us, we find ourselves in the “marketing” phase of the project. How do we let people know about the site’s redesign and new features? Should we queue up tweets for each day’s journal entries and tag the Lewis and Clark National Historic Trail? Perhaps post a recipe for boudin blanc to social media? Create a choose-your-own-adventure Lewis and Clark game in the style of the Field Museum’s account? After an informal discussion in the morning, we came away with plenty of ideas. Stay tuned to see how it turns out!

Independent Research Time

The CDRH has been kind enough to put 10% of the dev team’s time towards independent research. This could be entirely tech related, DH related, or, in my case, local history related. As an undergraduate at UNL, I was a member of the Cornhusker Marching Band and became interested in its history. As a staff member, I’ve been able to pursue that interest, and recently CDRH and the University Archives and Special Collections (SPEC) digitized three of ten vulnerable reels of band footage from the 1940s to 1960s. My coworker Sara Roberts and I also just finished teaching an OLLI class on the history of the marching band at UNL, so yesterday I took a bit of time to tie up loose ends: promising to clean up and publish the presentations, corresponding with guest speakers from the class, and tackling a few of the somewhat incomplete Zotero references we’ve been collecting. It’s so easy to click the “cite” button in a browser and move on, rather than immediately filling in more information about the source. Sara and I have a lot of dream plans, like locating funding to repair and digitize the remaining reels, writing up something more formal about the band’s history, creating a digital media project, and recreating drill routines from the 20th century. There’s a lot to do, but at a few hours a week, we’re cranking closer and closer to the next big project.

One API to Rule Them All

Most of my day was spent working on the Center’s API effort. Karin Dalziel dreamed up a center-wide API a few years ago, but we’ve only started going all-in on it in the past few months. Most of our projects are in TEI, and many of them are ingested into Solr for project-specific search. It makes a lot of sense to be able to search across all of them at the same time, right? Unfortunately, as each project was created over the past ten years, the Solr fields it used were similar to those of other projects but never standardized. Here’s the current API plan:

1. Create a set of standard fields and fieldtypes for the API
2. Identify the search platform
3. Ingest TEI (and other formats) into search platform
4. Design API
5. Build API
6. Build frontend to stand up websites per project quickly
7. TEST TEST TEST
8. Start using it for real
9. Party
10. See if other people want to use it, too?
11. Speaking tour of the country talking about how awesome it is
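
To make the first step in the plan a little more concrete, here is a minimal Python sketch of what mapping each project's own Solr fields onto one shared set could look like. Everything in it is invented for illustration: the project names, the field names, and the `standardize` helper are assumptions, not the CDRH's actual schema or code.

```python
# Hypothetical standard field names; not taken from a real CDRH schema.
STANDARD_FIELDS = {"identifier", "title", "creator", "date", "text"}

# Hypothetical per-project maps: project-specific Solr field -> standard field.
PROJECT_FIELD_MAPS = {
    "project_a": {
        "id": "identifier",
        "titleMain": "title",
        "author": "creator",
        "dateNormalized": "date",
        "fulltext": "text",
    },
    "project_b": {
        "doc_id": "identifier",
        "title_s": "title",
        "creator_s": "creator",
        "date_dt": "date",
        "body_txt": "text",
    },
}


def standardize(project, doc):
    """Return a copy of doc keyed by standard field names.

    Fields with no entry in the project's map are dropped, so every
    standardized document only ever uses names from STANDARD_FIELDS.
    """
    mapping = PROJECT_FIELD_MAPS[project]
    standardized = {}
    for field, value in doc.items():
        standard_name = mapping.get(field)
        if standard_name is not None:
            standardized[standard_name] = value
    return standardized
```

With a map like this in place, documents from two differently shaped projects come out with the same keys, which is what makes searching across all of them at once possible.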

Right now we’re mostly on steps 5 and 6 (step 7 is technically ALL STEPS), but yesterday I doubled back to work on step 3. Maybe two years ago I had designed something to take TEI and prep it for Solr, but since choosing Elasticsearch I had mostly abandoned the Solr ingest scripts for the new “data repository” as we’re calling it. However, since many of our old sites are still powered by Solr, it’s important to be able to quickly manage those indexes (creating, schema-fying, populating, clearing, etc.), so I worked yesterday on plugging the functionality back in. I also got to delete some old code that is no longer being used, which was super satisfying. Nothing quite so nice as removing code that reminds you of how you once were a terrible (I mean, how much you’ve grown as a) programmer.

This plugging back in and then tearing out took me most of the day. I will probably be doing more of it today, as well, with a particular emphasis on writing unit tests!
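
For readers who haven't worked with Solr, two of the index chores mentioned above (populating and clearing) boil down to POSTing JSON to a core's update handler. The sketch below is a rough illustration in Python, not the scripts described in this post. The base URL, the core name, and the function names are assumptions, and the functions only build the requests rather than sending them. Creating a core and setting up its schema are left out.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def update_request(base_url, core, payload, commit=True):
    """Build (but do not send) a POST to a Solr core's JSON update handler."""
    url = f"{base_url.rstrip('/')}/{quote(core)}/update"
    if commit:
        # Ask Solr to commit right away so the change becomes searchable.
        url += "?commit=true"
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def populate_request(base_url, core, docs):
    """Populating: a JSON array of documents adds them to the index."""
    return update_request(base_url, core, list(docs))


def clear_request(base_url, core):
    """Clearing: a delete-by-query matching every document empties the index."""
    return update_request(base_url, core, {"delete": {"query": "*:*"}})
```

To run one of these against a real Solr instance, pass the result to `urllib.request.urlopen`. Keeping the request building separate from the sending also makes these functions easy to cover with unit tests, since no running Solr is needed to check the URL and payload.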

Endangered Data Week

Endangered Data Week is in full swing! It’s described as “a new, collaborative effort, coordinated across campuses, nonprofits, libraries, citizen science initiatives, and cultural heritage institutions, to shed light on public datasets that are in danger of being deleted, repressed, mishandled, or lost.” Yesterday I went to a session held at the UNL Library that introduced the concept of endangered data and looked at some current strategies being pursued. As a group, we thought up some ideas for what we can do on a more local level. Although I think Endangered Data Week is not targeted towards data like the marching band film reels I previously mentioned, I kept thinking about them. In some ways, the UNL Library is already trying to “save” some data which is in danger of becoming unplayable. At the same time, people raised really great points about how we could be helpful by requesting information from Nebraska state government agencies, identifying agencies whose documentation and data may not always be available, and taking an overall survey of what information is available online for Nebraska organizations.

What’s Next?

More unit tests! My Friday probably contains some code review and cleanup of ongoing API components. It sounds like I’m also going to be helping put together a Poetry Printer with some of my coworkers, which should be a fun Friday project. I also need to send out a reminder email that the next Dish with DH event is on Monday, which will give folks on campus interested in Digital Humanities a chance to meet up over lunch and chat about what they’re working on. I hope you enjoyed this brief cross-section of all the programming and non-programming I do at my job at the CDRH!