The team here at the Carnegie Hall Data Lab is thrilled to share our new and improved Data Lab website! The new platform combines two earlier sites: our old data.carnegiehall.org site, which offered basic data querying and URI dereferencing ("About" pages for the URIs of our Events, Names, and other entities), and our previous Data Lab site, which was hosted on GitHub and offered blog posts and data experiments.
The new site sprang from the intense pressure and uncertainty of the early days of the Covid-19 pandemic. (We discuss some of the challenges forced upon us by the closure of Carnegie Hall in our previous post.) Out of a desire to keep some momentum in the Data Lab during the closure, and in keeping with our DIY ethos, we decided to learn more about the inner workings of our existing data.carnegiehall.org application, hoping to implement some improvements and updates on our own. More broadly, we wanted to better integrate our public-facing linked open data (LOD) activities.
So how did we go from wanting to learn how to self-manage our existing data endpoint to building a completely new website? When we launched the Data Lab in late January 2020, we decided to publish our website, which then consisted of blog posts and data experiments, using GitHub Pages, a service that turns Markdown files into a website and hosts it for free on the internet. It was simple, quick, and, best of all for a small operation like ours with a limited budget, limited tech skills, and limited support, free. The setup served us well for 18 months, but it had two significant drawbacks:
The second drawback wasn't an issue for experiments like our recent CH's Rock Explosion of 1971-72, which presents a fixed, unchanging set of data. But experiments like Whose Birthday is Today?, which relies on live queries against data that changes daily, weren't possible on GitHub Pages alone. We worked around this to an extent by querying Wikidata instead, using the Carnegie Hall Agent ID (P4101) property and the Wikidata Query Service's map display to create an iFrame embed of the resulting map. But this limited us to the performers and composers from our history who both have Wikidata items and have been successfully aligned with Wikidata: only around 17,000 of the more than 100,000 names in our performance data.
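To make the Wikidata workaround a little more concrete, here is a minimal Python sketch of how a birthday map query might be assembled into an embeddable URL. It is an illustration under assumptions, not the actual query behind Whose Birthday is Today?: apart from P4101, the properties it relies on (P569 date of birth, P19 place of birth, P625 coordinate location) and the function names are our choices for the example. The `#defaultView:Map` comment asks the Wikidata Query Service to render results as a map, and appending the percent-encoded query to `https://query.wikidata.org/embed.html#` yields a URL that can serve as an iFrame `src`.

```python
from urllib.parse import quote

# Base URL of the Wikidata Query Service embed view; the query goes in the fragment.
WDQS_EMBED = "https://query.wikidata.org/embed.html#"


def birthday_map_query(month: int, day: int) -> str:
    """Build a SPARQL query (hypothetical example) for people with a Carnegie Hall
    agent ID (P4101) born on the given month/day, with birthplace coordinates so
    the Query Service can plot them on a map."""
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("month must be 1-12 and day must be 1-31")
    return f"""#defaultView:Map
SELECT ?person ?personLabel ?coords WHERE {{
  ?person wdt:P4101 ?chAgentId ;
          wdt:P569 ?dob ;
          wdt:P19 ?birthplace .
  ?birthplace wdt:P625 ?coords .
  FILTER(MONTH(?dob) = {month} && DAY(?dob) = {day})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def embed_url(query: str) -> str:
    """Percent-encode the whole query (including '#', ':' and newlines) into the
    URL fragment of the WDQS embed view."""
    return WDQS_EMBED + quote(query, safe="")
```

For a daily page, something like `embed_url(birthday_map_query(today.month, today.day))` with `today = datetime.date.today()` would produce a fresh map URL each day. Note the built-in limitation described above: only people who have a Wikidata item carrying P4101 can ever appear on such a map.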
The original data.carnegiehall.org site, which our friend and colleague Matt Miller quickly built for us when we first published Carnegie Hall's performance history as linked open data in 2017, used a web framework called Django. The Django folks describe it as "a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source." Since we already had a decent working knowledge of Python, we figured that learning Django well enough to update and improve the site might be within our grasp. We also needed a distraction during those disorienting, frightening early days of the pandemic, and what began with following a few Django tutorials ended, two months and many rabbit holes later, with a prototype for a new Data Lab website that combined a blog, data experiments, and a new SPARQL query engine in one package.
As Alexander Pope wrote, "a little learning is a dangerous thing." We had enough chops to hack together a working local prototype of a new site, but there was a substantial gulf between our capabilities and what it would take to make a production-ready, fully deployable version of it. A short list of items in that gulf included:
In short, we needed professional help to deploy the new site. The happy ending is that we got that help by once again enlisting the supreme development skills of Matt Miller, and now we have a new site! The more difficult chapters in between showed us that we'd made some noob assumptions about how much work it would take to deploy the site as a replacement for our two existing sites (the old data.carnegiehall.org and our old Data Lab site on GitHub Pages).
The first major hurdle Matt pointed out involved Heroku, the cloud platform for developers to test, scale, and deploy apps, which hosts data.carnegiehall.org. Heroku's filesystem is ephemeral: whenever an app is redeployed or restarted, it starts fresh from the deployed code, and any files written to disk in the meantime are discarded (Heroku also restarts apps automatically at least once a day). Our new site uses Wagtail, a content management system (CMS) built on Django that lets you upload content such as blog images to the server. Given Heroku's ephemeral filesystem, though, uploaded images and documents can't persist on the server across deployments, because Heroku wipes the filesystem and deletes any uploaded files. Matt solved this problem by storing the site's static files and media in an Amazon Web Services (AWS) S3 bucket, so they live outside the site itself and persist.
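For readers curious what this kind of fix can look like, here is a hypothetical excerpt from a Django/Wagtail `settings.py` that points uploaded media and static files at S3. It is a sketch under stated assumptions, not Matt's actual configuration: it assumes the third-party django-storages package (with boto3), Django 4.2 or later (which introduced the `STORAGES` setting), and placeholder bucket and region values read from Heroku config vars.

```python
"""Hypothetical settings.py excerpt: keep Wagtail uploads and static files on S3.

Assumes django-storages (with boto3) is installed and Django >= 4.2, which
introduced the STORAGES setting. The fallback values below are placeholders.
"""
import os

# On Heroku, config vars reach the running app as environment variables.
AWS_STORAGE_BUCKET_NAME = os.environ.get("AWS_STORAGE_BUCKET_NAME", "example-datalab-bucket")
AWS_S3_REGION_NAME = os.environ.get("AWS_S3_REGION_NAME", "us-east-1")
# Credentials are deliberately not set here; boto3's default credential lookup
# picks up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment.

# Route both storage roles away from the dyno's ephemeral disk:
# "default" handles user uploads (e.g., images added in the Wagtail admin),
# "staticfiles" handles the CSS/JS gathered by `collectstatic`.
STORAGES = {
    "default": {
        "BACKEND": "storages.backends.s3boto3.S3Boto3Storage",
        "OPTIONS": {"location": "media"},  # key prefix for uploads inside the bucket
    },
    "staticfiles": {
        "BACKEND": "storages.backends.s3boto3.S3Boto3Storage",
        "OPTIONS": {"location": "static"},  # key prefix for static assets
    },
}
```

Because the files now live in the bucket rather than on the dyno, a redeploy or an automatic restart rebuilds the app without touching anything that was uploaded through the CMS.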
The second challenge we hadn't fully appreciated was the amount of work involved in replacing the existing data.carnegiehall.org application with the new, integrated Data Lab site. To retain all the query functionality and the dereferencing capability ("About" pages) for our URIs, the applications had to be merged, so rather than simply combining two sites, the resulting site actually contains three applications:
Lower on the list, and not serious enough to prevent us from launching, were a few issues we hoped to solve but could live with if we couldn't:
Matt solved the first two major, structural challenges, which would otherwise have kept us from deploying. As for the lower-level issues, he quickly implemented a basic, functional search for the blog posts, repositioned the map as an embedded iFrame, set up new URL paths for each section of the new site, and generally cleaned up the backend folder and site structure.
The end result is a working, stable site tying together all our experimental data presentations and offerings on a single platform that we can manage and update ourselves.
We sincerely hope you enjoy the new site! If you have any questions about the site or how we put it together, or if you'd like to share some friendly constructive criticism, please contact us at archives@carnegiehall.org.
Header image: Construction of Carnegie Hall, 1890. Courtesy of Carnegie Hall Rose Archives and available publicly on the Carnegie Hall Digital Collections.