Story Characters Before Analysis: a Data Story Mistake

I saw some advice on "How to Craft a Compelling Data Narrative," and it was - I'll call it misguided. It put story craft before data analysis. Specifically: "Characters: [examples including specific customer groups omitted] This doesn’t need to be part of your presentation, but you should define the key players for yourself beforehand." Choosing your characters before you analyze will bias and weaken your analysis. It will also make your story boring.

So let's talk about how to do better analysis, and tell a more powerful story.

There are two analytic processes that identify characters. The first is structural analysis of the process (formerly a "systems analyst" job, back in the white-shirt-black-tie/ankle-skirt-and-sweater era of computing), which identifies the process roles. The second is classification or clustering of the actors within each role. "Define key players beforehand" might have been meant as structural analysis, since that guides the data gathered (and the data storage model) and the analytical methods used. But that's not what a busy executive would hear.

In ordinary language, picking "characters" would mean something like "pick audience segments that you can imagine." And human imagination is usually bound by the experience of the person imagining. A good analyst tries to avoid such limitations. Avoiding them improves the analysis and lets it better illuminate the dimly understood corners of the real world.

Analytically, and for stories, classification and clustering are most powerful when they provide the most information in the fewest classes/characters. Because of this, I have a taste for using prior-experience/following-behavior data, and then doing factor extraction or EM (expectation-maximization) modeling. Both aim to tell the most story (explain the most data variation) with the fewest groups possible, by allowing the groups to vary internally in the "unimportant" ways. In comparison, K-means has no "unimportant" variation: shoe size and wallet size are treated the same by the algorithm. Neural nets emphasize the variations and groups that were emphasized by the trainer: this looks fantastic on everything already understood, but the dimly understood corners are filled with training biases and model "hallucinations," not so different from a person trying to understand someone foreign to them.
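To make the shoe-size/wallet-size contrast concrete, here is a minimal sketch in plain Python. It does not fit any model. It only scores two candidate groupings of the same people under the K-means objective and under a Gaussian likelihood with a separate variance per group and per feature (the kind of model EM is commonly used to fit). The data, the feature units, and every function name are invented for illustration, and hard group assignments stand in for the soft ones a real EM fit would use.

```python
import math

# Hypothetical toy data: (shoe_size, wallet_in_thousands) for eight people.
people = [(6, 1.0), (8, 1.1), (10, 0.9), (12, 1.0),
          (6, 2.0), (8, 2.1), (10, 1.9), (12, 2.0)]

# Two candidate ways to split the same people into two groups.
by_wallet = [0, 0, 0, 0, 1, 1, 1, 1]  # thin wallets vs. fat wallets
by_shoe = [0, 0, 1, 1, 0, 0, 1, 1]    # small shoes vs. large shoes


def groups(labels):
    """Collect the points that share each label."""
    out = {}
    for point, label in zip(people, labels):
        out.setdefault(label, []).append(point)
    return list(out.values())


def sum_sq(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def kmeans_cost(labels):
    """K-means objective: within-group squared distance, all features weighted alike."""
    return sum(sum_sq(col) for g in groups(labels) for col in zip(*g))


def gaussian_loglik(labels):
    """Best log-likelihood when each group gets its own variance per feature."""
    total = 0.0
    for g in groups(labels):
        n = len(g)
        total += n * math.log(n / len(people))  # group-size (mixing weight) term
        for col in zip(*g):
            var = sum_sq(col) / n  # maximum-likelihood variance for this feature
            total += -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return total


for name, labels in [("by_wallet", by_wallet), ("by_shoe", by_shoe)]:
    print(name, round(kmeans_cost(labels), 2), round(gaussian_loglik(labels), 2))
```

On this toy data the K-means cost is about 10 for the shoe split and about 40 for the wallet split, so K-means prefers to group people by shoe size simply because shoe sizes span bigger numbers. The Gaussian score prefers the wallet split: it lets shoe size vary freely inside each group and rewards the tight wallet clusters. Rescaling the features would change the K-means answer, which is the point.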

As a story... if you pick characters you already know, you will get character interaction that you already know, like a stale sitcom. A new character - especially one representing a new "factor" of behavior or experience - increases our understanding and clarifies our view of the world. (But they need to be relatable: "18-24 year old environmentalist" is not a character; "Sam, green-haired store clerk, DoorDash driver, and Audubon member living in a shared apartment downtown" is a character that _might_ be representative. Check your data.)
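One simple way to "check your data" is to count how many records in the segment actually match the character. The sketch below is only an illustration: the records, the field names, and the helper functions are all hypothetical, and a real check would run against your own segment data.

```python
# Hypothetical records for one segment; field names are illustrative only.
segment = [
    {"job": "store clerk", "housing": "shared apartment", "audubon_member": True},
    {"job": "store clerk", "housing": "shared apartment", "audubon_member": False},
    {"job": "student", "housing": "shared apartment", "audubon_member": True},
    {"job": "store clerk", "housing": "with parents", "audubon_member": True},
    {"job": "store clerk", "housing": "shared apartment", "audubon_member": True},
]

# The traits that define the candidate character ("Sam").
persona = {"job": "store clerk", "housing": "shared apartment", "audubon_member": True}


def trait_shares(records, persona):
    """Share of records matching each persona trait, taken one trait at a time."""
    return {
        trait: sum(r.get(trait) == value for r in records) / len(records)
        for trait, value in persona.items()
    }


def full_match_share(records, persona):
    """Share of records matching every persona trait at once."""
    hits = sum(
        all(r.get(trait) == value for trait, value in persona.items())
        for r in records
    )
    return hits / len(records)


print(trait_shares(segment, persona))
print(full_match_share(segment, persona))
```

In this made-up segment each trait fits 80% of the records, but the full combination fits only 40%. A character assembled from individually common traits can still describe a minority of the people it is supposed to represent.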

I will give credit: at least the article I'm picking on remembered that stories have characters. I have read other "data story" advice that amounts to little more than "write a speech with simple summaries and present data visualizations while telling it," or that boldly claims infographics are stories. Bullet points are not a story. They can be effective communication, to the right audience.

Leaders need a broad and clear understanding to make the best decisions, and have the broadest impact (or capture the most market). Don't let your data stories limit them.

