Carnegie Hall Data Lab

Welcome to our new Carnegie Hall Data Lab site!

The team here at Carnegie Hall Data Lab is thrilled to share with you our new and improved Data Lab website! This new platform combines our old data.carnegiehall.org site, which offered basic data querying and URI dereferencing ("About" pages for our Event, Name, and other URIs), with our previous Data Lab site, which was hosted on GitHub and offered blog posts and data experiments.

Why build a new site?

The new site sprang from the intense pressure and uncertainty of the early days of the Covid-19 pandemic. (We discuss some of the challenges forced upon us by the closure of Carnegie Hall in our previous post.) Out of a desire to keep some momentum in the Data Lab during the closure, and in keeping with our DIY ethos, we decided to learn more about the inner workings of our existing data.carnegiehall.org application, with the hope of implementing some improvements and updates on our own. In general, we wanted to better integrate our public-facing LOD activities.

So, how did we start out wanting to learn how to self-manage our existing data endpoint and end up with a completely new website? When we launched the Data Lab in late January 2020, we decided to publish our website – consisting then of blog posts and data experiments – using GitHub Pages, a service that turns Markdown files into a website and hosts it for free on the internet. It was simple, quick, and, best of all for a small operation like ours with a limited budget, tech skills, and support, free. The setup served us well for 18 months, but it had two significant drawbacks:

  1. our public-facing offerings were spread across two separate applications with their own websites; and
  2. GitHub Pages' Markdown-based structure, while easy to use, convenient, and fast, did not allow for any server-side scripting at runtime.

The second point wasn't an issue for experiments like our recent CH's Rock Explosion of 1971-72, which presents a fixed, unchanging set of data. But experiments like Whose Birthday is Today?, which relies on live queries for data that changes daily, weren't possible on GitHub Pages. We mitigated this to an extent by querying Wikidata instead, taking advantage of the Carnegie Hall Agent ID (P4101) property and the map display template on the Wikidata Query Service to create an iFrame embed for the resulting map. But this limited us to those performers and composers from our history who both have existing Wikidata items and whom we'd successfully aligned with Wikidata – only around 17,000 of the more than 100,000 names in our performance data.
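To give a flavor of that workaround: the Wikidata Query Service can render query results as a map (via its `#defaultView:Map` directive) and serve the result at an embeddable URL. The sketch below is our own illustration, not the experiment's actual query – the query wording, variable names, and `embed_url` helper are hypothetical, though P4101 (Carnegie Hall Agent ID), P19 (place of birth), P569 (date of birth), and P625 (coordinates) are real Wikidata properties.

```python
from urllib.parse import quote

# Hypothetical SPARQL sketch: find people with a Carnegie Hall Agent ID
# (P4101) whose birthplace has coordinates, suitable for the Wikidata
# Query Service map view. Not the experiment's actual query.
BIRTHDAY_MAP_QUERY = """#defaultView:Map
SELECT ?person ?personLabel ?born ?coords WHERE {
  ?person wdt:P4101 ?chAgentId ;   # has a Carnegie Hall Agent ID
          wdt:P569 ?born ;         # date of birth
          wdt:P19 ?birthplace .    # place of birth
  ?birthplace wdt:P625 ?coords .   # birthplace coordinates (for the map)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
"""

def embed_url(query: str) -> str:
    """Build a Wikidata Query Service embed URL suitable for an iFrame."""
    return "https://query.wikidata.org/embed.html#" + quote(query)

print(embed_url(BIRTHDAY_MAP_QUERY)[:60])
```

Because the map is generated entirely on Wikidata's side, an embed like this works even on a static host such as GitHub Pages – which is exactly why it was our stopgap.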

The original data.carnegiehall.org site, quickly built for us by our friend and colleague Matt Miller when we first published Carnegie Hall's performance history as linked open data in 2017, used a platform called Django. The Django folks describe it as: "a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source." Since we already had a decent working knowledge of Python, we decided that learning how to update and improve the site (by learning Django) might be within our grasp. Needing a distraction during those disorienting, frightening early days of the pandemic, what began with following a few Django tutorials ended, two months and many rabbit holes later, with a prototype for a new Data Lab website, combining a blog, data experiments, and a new SPARQL query engine all in one package.

Moving towards a production-ready site

As Alexander Pope wrote, a little learning is a dangerous thing. While we had enough chops to hack together a working local prototype for a new site, there was a substantial gulf between our capabilities and what it would take to make a production-ready, fully deployable version of that site. A short list of the items in that gulf:

  • hiding database credentials (the endpoint URL and login for the triplestore that manages our LOD) using environment variables
  • fully installing the new Yasgui ("Yet Another SPARQL GUI") SPARQL query engine into the site – we'd been including it in the prototype web pages using a content delivery network (CDN), which was lightweight and easy, but less fully integrated and configurable than we needed (read more from Yasgui's documentation)
  • general QC to ensure the prototype would function smoothly when deployed
  • actually replacing the old data.carnegiehall.org site with the new one
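The first item on that list is the most mechanical: rather than hard-coding credentials in version-controlled files, the app reads them from the environment at startup. A minimal sketch of the idea – the variable names here are our own invention, not the site's actual configuration:

```python
import os

# Hypothetical settings fragment: pull triplestore credentials from
# environment variables instead of committing them to the repository.
# All names below are illustrative, not the real configuration.
SPARQL_ENDPOINT = os.environ.get("TRIPLESTORE_ENDPOINT", "http://localhost:9999/sparql")
SPARQL_USER = os.environ.get("TRIPLESTORE_USER", "")
SPARQL_PASSWORD = os.environ.get("TRIPLESTORE_PASSWORD", "")

if not SPARQL_PASSWORD:
    # Warn loudly rather than silently querying unauthenticated.
    print("warning: TRIPLESTORE_PASSWORD is not set")
```

On a platform like Heroku, these values would be supplied as config vars, so the secrets never touch the codebase.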

Engaging professional support and addressing challenges

In short, we needed professional help to deploy the new site. The happy ending to the story is that we got that help by once again engaging the supreme development skills of Matt Miller, and now we have a new site! The more difficult chapters in between showed us we'd made some noob assumptions about the amount of work it would take to deploy the site as a replacement for our two existing sites (the old data.carnegiehall.org and our old Data Lab site on GitHub Pages).

The first major hurdle Matt pointed out was tied to Heroku, the cloud platform (for testing, scaling, and deploying apps) that hosts data.carnegiehall.org. To keep deployments lightweight, Heroku's filesystem is ephemeral: every redeploy or reboot rebuilds the entire stack from scratch. Our new site uses Wagtail, a content management system (CMS) that lets you upload content (e.g., blog images) to the server. Given Heroku's ephemerality, though, uploaded images and documents can't persist across deployments, because each rebuild erases the filesystem and deletes any uploaded files. Matt solved this problem by storing the static files and media in an Amazon Web Services (AWS) S3 bucket, so they live outside the site itself and persist across deployments.
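In Django terms, routing uploads to external storage comes down to a few settings. The fragment below is a hedged sketch of one common way to do this (the django-storages package's S3 backend); the post doesn't specify the exact packages or names Matt used, and the bucket and variable names here are illustrative:

```python
import os

# Hypothetical Django settings fragment for persistent media on Heroku:
# hand uploads to an S3 bucket instead of the dyno's ephemeral filesystem.
# Bucket and variable names are illustrative, not the site's real config.
AWS_STORAGE_BUCKET_NAME = os.environ.get("AWS_STORAGE_BUCKET_NAME", "datalab-media")
AWS_S3_CUSTOM_DOMAIN = f"{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com"

# One common choice: the django-storages S3 backend handles the uploads.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
MEDIA_URL = f"https://{AWS_S3_CUSTOM_DOMAIN}/media/"
```

With a setup along these lines, a blog image uploaded through Wagtail's admin lands in the bucket rather than on the dyno, so a redeploy no longer wipes it out.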

The second challenge we didn't fully appreciate was the amount of work involved to replace the existing data.carnegiehall.org application with the new, integrated Data Lab site. In order to retain all the query functionality and dereferencing capability ("About" pages) for our URIs, the applications had to be merged, meaning the resulting site, rather than being simply a merger of two sites, actually contains three applications:

  • an updated version of the "data" application (the SPARQL query interface, all the code that handles the queries, and the dereferencing pages);
  • a "blog" app, with an admin/editing interface for creating and managing blog posts; and
  • a "pages" app for creating data experiments and "lab reports".

Lower down, at a level that wouldn't prevent us from launching, were a few issues we hoped to solve but could live with if we couldn't:

  • The original, tutorial-driven blog site included a sidebar search widget; we weren't able to get this working properly and removed it.
  • We had trouble correctly integrating the mapping capability (e.g. Whose Birthday is Today?). We'd used Folium, a Python library that facilitates embedding Leaflet maps in a site built with Python, like ours. Using it correctly within the Django template framework proved challenging, however: rather than drawing the map within an existing page template, we could only get it to render as a separate HTML file – with no headers (navigation, etc.) or footers – and our attempts to pull that file into a template as an iFrame embed were unsuccessful.
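The iFrame workaround described in that last bullet can be pictured as two steps: render the Folium map to a standalone HTML file (Folium's `save()` method does this), then point an `<iframe>` in a page template at that file. Here is a stdlib-only sketch of the embed step; the file path, helper name, and dimensions are all illustrative, not our actual code:

```python
# Sketch of the iFrame workaround: assume the map has already been rendered
# to a standalone HTML file (e.g. via a Folium map's save() method), and
# embed that file in a page template. Path and sizes are hypothetical.
MAP_FILE = "static/maps/birthday_map.html"  # illustrative location

def iframe_embed(src: str, width: str = "100%", height: str = "480") -> str:
    """Return an <iframe> tag pointing at a pre-rendered map file."""
    return (f'<iframe src="/{src}" width="{width}" height="{height}" '
            f'style="border:0"></iframe>')

snippet = iframe_embed(MAP_FILE)
```

The trade-off is visible in the sketch: the iFrame's content is a self-contained page, so the map itself carries none of the surrounding site's navigation or footer – which is exactly the limitation we ran into.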

Outcome

Matt solved the two major structural challenges, without which we couldn't have deployed. As for the lower-level issues, he quickly implemented a basic, functional search for the blog posts, repositioned the map as an embedded iFrame, set up new URL paths for each section of the new site, and generally cleaned up the backend folder and site structure.

The end result is a working, stable site tying together all our experimental data presentations and offerings on a single platform that we can manage and update ourselves.

We sincerely hope you enjoy the new site! If you have any questions about the site or how we put it together, or if you'd like to share some friendly constructive criticism, please contact us at archives@carnegiehall.org.

Header image: Construction of Carnegie Hall, 1890. Courtesy of Carnegie Hall Rose Archives and available publicly on the Carnegie Hall Digital Collections.