March means I've been museum focused for a year. Visiting. Volunteering. Interacting with guests, helping collections management. Classes on the skills and technology applied.
I've learned a lot. The museum approach to helping people learn, and helping people desire to learn, is amazing and deep. Museums do not have captive audiences. They have to market learning, make it desirable. And they do. And I think that data scientists should do so too.
Museums don't do PowerPoint. They don't do five-page whitepapers. They start from dry collections (literally, 30% to 50% humidity, and well documented), cabinets and shelves with row after row after row of almost indistinguishable bugs and maps and art, inscrutable machines and furniture, exotic clothes and natural curiosities hidden in storage. Millions of objects indexed by accession number, specific instances of creation reduced to a database record. All tracked and counted, found and organized, by the registrars - registrars have a whole different level of "object-oriented data". Analyzed and understood by the curators, magically pulling meaning and understanding from tens of thousands of specific artifacts, scattered pieces of human and natural creation. Then they tell the stories they found, they draw you in, they make you curious, they make you want to learn. They make a story, and ask you to be the next chapter.
I've processed petabytes. I made a career of building systems that do not blink at understanding millions of rows of data per day. Or per hour. They can react, informing decisions faster than a web page downloads. I curated data, and made models, sometimes predicting, sometimes explaining. The explanatory models were strong, by business standards. But by museum standards? Weak. Boring.
Admittedly the business owners, the clients, they were rapt at stories of what and who and how, as I summarized in meetings the magic of how we identified 10,000, sometimes 100,000 potential new customers for them. They wanted to understand so much: not just how we found them, but what the new people wanted, and why I told them there were only so many new customers they did not already have. What did those people want to buy right now, and what would their next buy be? What should the client be offering... what were they missing? They heard the story, they thirsted for the story, because they wanted to write the next chapter.
But internally, and for the client's employees that had to act on the discoveries? It was still reports, not stories. This year has highlighted how awful that is.
I met my earlier goal of smoothly processing petabytes and billions of records, clearly defining the system domain and range, and leaving just the interesting, thoughtful decisions for the humans. My new goal? Make the discoveries I find interesting and approachable beyond the highly invested audience, and excite audiences with only passing connections. Make stories, human stories.
Curation is not just selecting, it's understanding, seeing the intertwined stories, and helping people see the storied threads that together build the fabric of understanding.
Data curation and development of data exhibitions: that's the archetype I now wish to fulfill, built on the foundation of data science I have lived.