Carnegie Hall Data Lab

Carnegie Hall Data Lab: First Year in Review

Note: This post was first published on January 29, 2021, as part of our original CH Data Lab site.

The Carnegie Hall Data Lab launched one year ago, on January 31, 2020.

After forming an official Data Lab team of three archivists from the Carnegie Hall Susan W. Rose Archives, we made our GitHub Pages site live and published four experiments for public access and use. We gave the site its public debut at the LODLAM Summit at the Getty Center in Los Angeles during the first week of February and began reaching out to other information professionals to promote the site and encourage potential data partnerships and collaborative projects. We started off 2020 with a lot of momentum and grand expectations for the year to come.

Less than two months later, everything changed for all of us. The COVID-19 pandemic forced the closure of Carnegie Hall, and we moved to virtual workspaces and virtual interactions. At first, we saw this break from "normal" archives activities as an opportunity to focus on data cleanup and on upgrading our Data Lab user interface and experiment functionality. When the Hall closed, we knew the Rose Archives already had several strong digital resources to work with (the Data Lab, Performance History as LOD, the Digital Collections, Performance History Search, and OPAS, our internal performance history database), and that we would have plenty of proactive data work to focus on despite limited access to the physical office. We quickly realized, however, that the break from the norm also meant a heavier reliance on the Rose Archives to produce (or dig up) interesting historical content for a performance hall that could not hold events. We received more requests than ever before for content and archival assets to supplement the material produced by the Hall's social media, artistic planning, and marketing teams. Our Data Lab posts and experiments moved to the back burner as internal and external Archives requests ramped up.

When we (virtually) sat down to discuss this reflective post on our first full year as the CH Data Lab, we expected to write a quick post expressing our disappointment in how things played out while outlining our hopes and plans for the future. Despite the chaos and uncertainty of the past year, however, we soon realized how much we had accomplished to move the Data Lab forward.

2020 in a Nutshell

Although few new events took place at Carnegie Hall, between February 1, 2020, and January 25, 2021, we cleaned up performance history data and:

Added over 1,500 new works.
Added more than 3,200 new names (performers and composers).
Created records for 828 new or previously undocumented events.
Released 9 experiments and 5 blog posts.

Considering that the Hall has been closed since March 13, 2020, these numbers are notable and reflect the considerable effort we put into filling gaps in our history using archival materials and outside sources. We also created many non-public event records for virtual activities related to the Hall's education and fundraising departments.

We worked hard to generate internal interest in the Data Lab and forged partnerships with many Carnegie Hall departments that may not have had time to pay attention to our project before the pandemic brought the Hall's activities to a halt. More colleagues began to see the value of using data to tell stories creatively, which is especially important since teams were (and still are) looking for more ways to reach people virtually and through new forms of technology. The social media team started asking for permission to share our Data Lab posts and experiments, and the web team began investigating the use of our URIs as embedded metadata to improve SEO on the main Carnegie Hall website. Any conversation we can create around LOD and SEO is a huge success!

Our work with other teams further challenged us to look at our own data differently, helping us find flexibility in our data curation and in our phased processes. We implemented a weekly automatic update of our linked open dataset, an important goal ever since we first released our LOD in June 2017. By adding descriptions, explanations, and sources to our performance history database, and by correcting errors in it, we deepened the usefulness and authoritativeness of the information we offer. For example, when asked whether we could produce information about non-classical events, we strengthened the records for rock concerts by incorporating set lists and beefed up performer records with biographical information from outside sources (and, of course, linked to authority and Wikidata URIs whenever possible).

2021 is Looking Up!

We do have some Data Lab experiments and other work in motion! Since most of the records in our performance history database were created from printed materials (programs, flyers, etc.) in our physical collections, we revisited our Booking Ledgers collection as a source of performance data for the many events for which we have no other documentation. We are gradually reviewing each digitized page to create event records for these performances. Once this large task is complete, we hope to incorporate our findings into a blog post and/or find inspiration for a data experiment.

The Data Lab site is also ready to be improved. GitHub Pages' simple markdown structure doesn't allow for active scripting, which limits the kinds of experiments we can offer. To address this, we have been building a new Data Lab site using Django, a Python-based web framework that will support live SPARQL queries and interactive experiments and will also include better blogging capabilities. Our tech skills are limited on this front; we are learning as we go but hope to recruit some external support to realize our website dreams.

We continue to chug along with data creation and Wikidata editing, and we have aligned several thousand additional Wikidata items to Carnegie Hall agent and work IDs. This alignment continues to improve the completeness and accuracy of our experiment outputs, giving us tangible benefits throughout the data management cycle.

Ahead for Carnegie Hall is its Voices of Hope: Artists in Times of Oppression festival, "an exploration of humankind's capacity for hope, courage, and resistance in the face of unimaginable injustice." We look forward to creating an experiment or post inspired by this festival, specifically exploring our role as information stewards and the ways technology and data can be used to proactively address archival silences.

Working Together

While we deprioritized external collaborations in the past year (recognizing that others had to do the same), we hope to revisit some of our earlier conversations with interested parties and institutions. In the spring of last year, we started working with representatives from the crowdsourcing platform FromThePage to transcribe some materials in our collections. We uploaded a few sample pages from early Carnegie Hall accounting ledgers and began transcribing them and creating keywords for them. We plan to continue this work and to further the conversation around crowdsourcing the transcription of our materials and uncovering historical data. Early last year, we also had a virtual meeting with a few external colleagues to brainstorm collaborative data projects between the Carnegie Hall Data Lab, Semantic Lab at Pratt, and the Weeksville Heritage Center, and we hope to restart these conversations soon.

Making “Lemonade out of Lemons”

Since the Data Lab's founding in January 2020, our perspective on its purpose has changed. We initially thought of the Data Lab as a space to play around with data and make cool projects for others to enjoy, use, or replicate. While this is still a core tenet of the CH Data Lab, our work has evolved to be more inclusive of outside requests, influences, and considerations. We now enjoy basing experiments on content that other teams ask for, such as data about a specific performer or musical genre. We still have the autonomy to create interesting experiments and to research and write about topics important to the Rose Archives team, but now we also get to cross-post our work alongside content shared by other Carnegie Hall teams. This brings more eyes to our project and increases interest among people who might not otherwise find our Data Lab site.

Another important takeaway: don't let perfect be the enemy of the good. Some of the requests from this past year really pushed our tech skills and required us to think creatively about specific experiments and data presentations, often in ways that felt uncomfortable because the results were not perfect or fully realized. Ultimately, the experience led us to expand our knowledge of data manipulation and to accept that our experiments are called experiments for a reason: they need not be polished products to be shared.

The increased reliance on, and demand for, our data and digital materials throughout this difficult year led us to reconsider our data cleanup, update our processes, and step up our efforts to promote the use of performance data, a win-win for all. We now have more evidence that data work is prep work rather than busy work, since we know our data will serve many purposes. We identified many data gaps and, at the same time, outlined ways to improve completeness iteratively. We continue to work on data advocacy and on building trust within Carnegie Hall to produce better data and projects with long-term value and use. We are seeing the downstream effects of our emphasis on data and are open to taking more ownership of future institution-wide data efforts.

Posted on Jul 29 2021 by archives