The Goal
To create a canonical representation of stories, using music annotation as a foundation. The project explores different ways to annotate raw stories and derives new versions dynamically through configurable settings.
Different prompts extract different predefined structured-data schemas (internally represented as JSON).
This project combines AI prompting and procedural development, with a bit of Logistic Regression testing.
Version 00.0: Canonical breakdown
Each scene is broken down to represent key elements such as actors, actions, emotions, locations, and world objects.
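As a minimal sketch of what the internal JSON representation could look like, here is a hypothetical per-scene schema built from the element types above, with a small validator for the prompt's raw output (field names and sample values are illustrative, not the project's actual schema):

```python
import json

# Hypothetical schema for one scene; field names are illustrative.
SCENE_SCHEMA = {
    "actors": list,
    "actions": list,
    "emotions": list,
    "locations": list,
    "world_objects": list,
}

def validate_scene(raw_json):
    """Parse a prompt's JSON output and check it against the schema."""
    scene = json.loads(raw_json)
    for field, expected_type in SCENE_SCHEMA.items():
        if not isinstance(scene.get(field), expected_type):
            raise ValueError(f"missing or malformed field: {field}")
    return scene

# A hand-written sample extraction, standing in for real model output.
sample = json.dumps({
    "actors": ["Cinderella", "stepmother"],
    "actions": ["cleans", "scolds"],
    "emotions": ["sadness"],
    "locations": ["kitchen"],
    "world_objects": ["broom"],
})
scene = validate_scene(sample)
```

Validating the output this way keeps malformed prompt responses from silently entering the pipeline.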
The image below illustrates how AI prompting is used to extract structured components from the "Cinderella" tale. Pre-defined Rules are then applied to map these components into a structured representation (labels are shown for reference).
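A pre-defined Rule can be sketched as a lookup table from element type to a notation glyph and a vertical lane, loosely echoing staves in music annotation (the symbols and lane assignments here are hypothetical, not the project's actual mapping):

```python
# Hypothetical pre-defined Rule: element type -> glyph and lane.
RULE = {
    "actor": {"symbol": "●", "lane": 0},
    "action": {"symbol": "→", "lane": 1},
    "emotion": {"symbol": "~", "lane": 2},
    "location": {"symbol": "▢", "lane": 3},
    "world_object": {"symbol": "◆", "lane": 4},
}

def apply_rule(elements, rule=RULE):
    """Map extracted (type, text) pairs to positioned, labeled glyphs."""
    glyphs = []
    for i, (etype, text) in enumerate(elements):
        mapping = rule[etype]
        glyphs.append({
            "x": i,                      # horizontal position follows story order
            "lane": mapping["lane"],     # vertical lane per element type
            "symbol": mapping["symbol"],
            "label": text,
        })
    return glyphs

glyphs = apply_rule([
    ("actor", "Cinderella"),
    ("action", "runs"),
    ("location", "palace"),
])
```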
Version 00.1: Inspired by Roman Haubenstock-Ramati patterns
Just as Haubenstock-Ramati painted intricate patterns, a new Rule has been defined to generate elements inspired by his work (minus the canvas).
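One way such a Rule could generate a pattern layer is to scatter arcs, circles, and lines around the canonical glyphs; the sketch below is a hypothetical stand-in for the actual "Ramati" Rule, with a seed so a given version stays reproducible:

```python
import math
import random

def ramati_layer(glyph_count, seed=0):
    """Generate a decorative layer of shapes around the canonical
    glyphs, loosely inspired by Haubenstock-Ramati's graphic scores.
    (Shape kinds and value ranges are illustrative assumptions.)"""
    rng = random.Random(seed)
    shapes = []
    for i in range(glyph_count):
        shapes.append({
            "kind": rng.choice(["arc", "circle", "line"]),
            "x": i + rng.uniform(-0.3, 0.3),   # jitter around each glyph
            "y": rng.uniform(0.0, 4.0),         # spread across the lanes
            "size": rng.uniform(0.2, 1.0),
            "rotation": rng.uniform(0.0, 2 * math.pi),
        })
    return shapes

layer = ramati_layer(12, seed=7)
```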
The image below builds on the previous Pipeline, incorporating the "Ramati" Rule to process text from "The Little Pigs" and add a fresh creative layer.

The image below is a view from "Cinderella".

And finally, this is how "The Little Pigs" starts:
"Once upon a time there was an old mother pig who had three little pigs and not enough food to feed them. So when they were old enough, she sent them out into the world to seek their fortunes ..."
From here, prompts extract structured data.
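For that opening sentence, a hand-written sketch of the extracted structure might look like this (the values are my own illustration of a plausible extraction, not actual model output):

```python
import json

# Illustrative extraction for the opening of "The Little Pigs".
extracted = json.loads("""
{
  "actors": ["old mother pig", "three little pigs"],
  "actions": ["sent out into the world", "seek their fortunes"],
  "emotions": ["hopeful"],
  "locations": ["the world"],
  "world_objects": ["food"]
}
""")
```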
And a piece of a news article published in New York in 2024:
Version 00.2: Randomizing the Canonical
So far each Rule has used a pre-defined mapping; let's randomize things for fun.
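Randomizing can be as simple as shuffling the symbol and lane assignments instead of fixing them, keeping a seed so any particular randomized version can be regenerated (the element types and symbols below mirror the earlier hypothetical mapping):

```python
import random

ELEMENT_TYPES = ["actor", "action", "emotion", "location", "world_object"]
SYMBOLS = ["●", "→", "~", "▢", "◆"]

def random_rule(seed=None):
    """Build a Rule with shuffled symbol and lane assignments.

    Passing a seed makes a given randomized Rule reproducible."""
    rng = random.Random(seed)
    symbols = SYMBOLS[:]
    lanes = list(range(len(ELEMENT_TYPES)))
    rng.shuffle(symbols)
    rng.shuffle(lanes)
    return {
        etype: {"symbol": sym, "lane": lane}
        for etype, sym, lane in zip(ELEMENT_TYPES, symbols, lanes)
    }

rule = random_rule(seed=42)
```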

Version 00.3: Logistic Regression
I tried to train a model to learn from the canonical representation and generate a Rule that predicts element types and locations, replacing the pre-mapped Rules entirely.
I used a multi-variable dataset, with PCA to pre-process the source variables.
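The PCA-then-classify setup can be sketched as a scikit-learn pipeline; the synthetic data below stands in for the real features derived from the canonical representation, and the dimensions are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for features extracted from the canonical JSON.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# PCA reduces the multi-variable source features before the
# logistic-regression classifier predicts element types.
model = Pipeline([
    ("pca", PCA(n_components=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
```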
I tried optimizing it with a RandomForestRegressor and played around with different models (Linear, Decision Tree, Random Forest), but the performance still wasn't perfect. Guess I need to feed it more data and have a bit more patience. 😅
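The model comparison can be sketched like this, again on synthetic stand-in data; cross-validated R² makes the three regressors directly comparable (sample counts and feature sizes are assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for predicting element positions from features.
X, y = make_regression(n_samples=200, n_features=8,
                       noise=10.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Mean cross-validated R² per model (higher is better).
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
```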
Anyway, this is the model trying to predict element types and locations from the same "Cinderella" text.
