
prj#02

Finding Fit

05/2025

The Goal

 

Take an existing piece of art, break it down into its core edges, and let a Reinforcement Learning agent go wild.

 

The mission? Learn how to reposition those edges into brand new compositions.

 

How? By extracting edge shapes from the original image and dropping them into a custom training environment, with rewards for tasteful placement.

At its core, this project is an exercise in Reinforcement Learning.

Training the Network

I trained a PPO agent using a custom-built ObjectLayout environment.


The job? To take a handful of edge shapes and figure out where to place them—step by step—to hit target locations and earn rewards.


Rewards were simple but strict: get close to the target spot, get rewarded; drift off course, get nada.
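
For the curious, here's a minimal sketch of what that kind of environment might look like with Gymnasium. The class name, observation layout, step size, and distance threshold below are illustrative assumptions rather than the exact project code:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ObjectLayoutEnv(gym.Env):
    """Toy layout environment: the agent nudges shape centroids toward target spots."""

    def __init__(self, n_shapes=4, step_size=0.05, max_steps=200):
        super().__init__()
        self.n_shapes = n_shapes
        self.step_size = step_size
        self.max_steps = max_steps
        # Observation: current (x, y) of each shape plus its target (x, y), all in [0, 1].
        self.observation_space = spaces.Box(0.0, 1.0, shape=(n_shapes * 4,), dtype=np.float32)
        # Action: a small (dx, dy) nudge for every shape.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(n_shapes * 2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = self.np_random.random((self.n_shapes, 2)).astype(np.float32)
        self.target = self.np_random.random((self.n_shapes, 2)).astype(np.float32)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        delta = action.reshape(self.n_shapes, 2) * self.step_size
        self.pos = np.clip(self.pos + delta, 0.0, 1.0)
        self.steps += 1
        dist = np.linalg.norm(self.pos - self.target, axis=1)
        # Simple but strict: a shape only scores when it sits close to its target.
        reward = float(np.sum(dist < 0.05)) - 0.01  # tiny step cost to discourage dawdling
        terminated = bool(np.all(dist < 0.05))
        truncated = self.steps >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return np.concatenate([self.pos, self.target], axis=1).reshape(-1).astype(np.float32)
```

The idea is the same as in the project: small nudges every step, and reward only lands when a shape sits close enough to its target.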

The neural net structure I used is shown below: a dual-path network where the actor predicts action probabilities and the critic estimates state value. Both start from a shared input layer, funnel through a couple of Linear + ReLU layers, and then branch out into their respective heads.
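
In PyTorch terms, that diagram roughly translates into something like the snippet below. The hidden size and layer count are placeholders; SB3's MlpPolicy builds its own actor-critic pair internally, so this is only here to illustrate the shape of the network:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared trunk that branches into an actor head (action parameters) and a critic head (state value)."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.actor = nn.Linear(hidden, act_dim)  # parameters of the action distribution
        self.critic = nn.Linear(hidden, 1)       # estimated value of the current state

    def forward(self, obs):
        h = self.shared(obs)
        return self.actor(h), self.critic(h)
```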


While I initially set out to train the network from scratch, reality (and my sanity) nudged me toward using Stable-Baselines3 (SB3). It streamlined experimentation and made it easier to hook into a clean actor-critic architecture.
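
Wiring the custom environment into SB3 then takes only a few lines. The hyperparameters below are SB3's stock defaults for PPO, and the log path and timestep budget are made up, not the tuned setup I actually ran:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

env = ObjectLayoutEnv()
check_env(env)  # sanity-check the custom env against the Gym API

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    n_steps=2048,
    clip_range=0.2,
    tensorboard_log="./objectlayout_tb/",
    verbose=1,
)
model.learn(total_timesteps=200_000)
model.save("ppo_objectlayout")
```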

PPO (Proximal Policy Optimization) is designed to change the policy only a little at each update, but finding the balance between local and global rewards was not a simple task. I expected the clipped objective function to help here, yet it still took a lot of manual iterations on reward-system tuning.
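
For reference, this is the clipped surrogate objective that caps how far any single update can move the policy:

$$
L^{\text{CLIP}}(\theta)=\mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big],
\qquad r_t(\theta)=\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_\text{old}}(a_t\mid s_t)}
$$

The probability ratio measures how far the new policy drifts from the old one; clipping it to [1-ε, 1+ε] is what keeps each change small.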

By default this uses GAE (Generalized Advantage Estimation), which computes a smoother estimate called the advantage and tells the agent: “Was this action better than expected?” (Honestly, I didn’t play with the GAE parameters; by this point I was already far beyond the basic Reinforcement Learning knowledge I had when I started prj#02.)
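
For completeness, the advantage GAE builds up from the per-step TD errors (γ and λ left at SB3's defaults):

$$
\hat{A}_t=\sum_{l=0}^{\infty}(\gamma\lambda)^{l}\,\delta_{t+l},
\qquad \delta_t=r_t+\gamma V(s_{t+1})-V(s_t)
$$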

RL_diagram-01.jpg
RL_networks-020need-improvement.jpg

 

To track progress, I turned to the TensorBoard graphs and ChatGPT to help me decipher what was actually going on under the hood. Despite some head-scratching, it became clear the model was learning something, though it had a habit of overgeneralizing and treating distinct target locations as interchangeable.

 

Reinforcement learning, like art, doesn’t always follow the rules.

training_object_1_vederict.jpg

 

Let the Fun Begin

First, I built tools to extract edges and break reference images down into a library of reusable shapes, kind of.
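
Something along these lines handles the extraction step, assuming an OpenCV Canny-plus-contours pipeline; the thresholds, the minimum contour area, and the input filename are made-up examples:

```python
import cv2

def extract_edge_shapes(image_path, canny_lo=50, canny_hi=150, min_area=50.0):
    """Pull contour shapes out of a reference image so they can be re-placed later."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(image_path)
    edges = cv2.Canny(img, canny_lo, canny_hi)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep only contours large enough to read as a distinct shape.
    return [c for c in contours if cv2.contourArea(c) > min_area]

shapes = extract_edge_shapes("kandinsky_input.jpg")
print(f"extracted {len(shapes)} reusable shapes")
```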

 

Then came the baseline model setup and environment tuning: adding logic and injecting a reward system that could tell good layouts from questionable ones.

 

I had to teach the agent not just where to move shapes, but also how to avoid crashing into others.
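
One way to get that across is a reward-shaping term that penalizes overlap between shape bounding boxes. This is a hypothetical sketch, not the exact penalty used in the project:

```python
def overlap_penalty(bboxes, weight=1.0):
    """Hypothetical shaping term: subtract the total pairwise bounding-box overlap area.
    Each bbox is (x0, y0, x1, y1) in the same normalized coordinates as the layout."""
    total = 0.0
    for i in range(len(bboxes)):
        for j in range(i + 1, len(bboxes)):
            ax0, ay0, ax1, ay1 = bboxes[i]
            bx0, by0, bx1, by1 = bboxes[j]
            ix = max(0.0, min(ax1, bx1) - max(ax0, bx0))  # horizontal overlap
            iy = max(0.0, min(ay1, by1) - max(ay0, by0))  # vertical overlap
            total += ix * iy
    return -weight * total  # added to the step reward
```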

 

I started exploring multi-goal reward systems—symmetry, spacing, repeated motifs... Turns out, juggling those rewards takes extra care (and a lot more training cycles).
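
Conceptually it looks like the sketch below: each extra objective becomes its own weighted term stacked on top of the target-hitting reward. The symmetry and spacing scores here are purely illustrative assumptions:

```python
import numpy as np

def composite_reward(pos, w_sym=0.3, w_space=0.3):
    """Hypothetical multi-goal bonus for an (n, 2) array of shape centers in [0, 1]^2."""
    # Symmetry: how well the layout mirrors itself across the vertical center line.
    mirrored = pos.copy()
    mirrored[:, 0] = 1.0 - mirrored[:, 0]
    sym = -np.mean(np.min(np.linalg.norm(pos[:, None] - mirrored[None], axis=-1), axis=1))
    # Spacing: reward shapes for keeping a comfortable gap from their nearest neighbor.
    d = np.linalg.norm(pos[:, None] - pos[None], axis=-1)
    np.fill_diagonal(d, np.inf)
    space = np.mean(np.minimum(d.min(axis=1), 0.2))
    return w_sym * sym + w_space * space  # added on top of the target reward
```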

combined-input-and-edge.jpg

 

Watching the Chaos (aka Training Visualization)

To make sense of the madness, I visualized the training in color: each shape tagged with its own hue, tracing the path from “what even is this?” to “hey, that almost looks deliberate.”

 

The screenshot shows a snapshot of those early trajectories—where shapes landed, re-landed, and eventually started to make better decisions.

 

RL_training_20_02.jpg
RL_training_line_20_03.jpg
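
The plots came from logged shape positions, roughly along these lines (the helper name and the input format are assumptions for illustration):

```python
import matplotlib.pyplot as plt

def plot_trajectories(trajectories):
    """Draw each shape's path over an episode, one hue per shape.
    `trajectories` is a list of (T, 2) position arrays logged during rollouts (assumed)."""
    for i, path in enumerate(trajectories):
        color = plt.cm.tab10(i % 10)
        plt.plot(path[:, 0], path[:, 1], color=color, alpha=0.7, label=f"shape {i}")
        plt.scatter(path[-1, 0], path[-1, 1], color=color, marker="x")  # final placement
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    plt.legend(loc="upper right", fontsize=8)
    plt.title("shape trajectories during training")
    plt.show()
```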

 

I also recorded a video of the shapes in motion. Watching them dance, nudge, and occasionally crash their way to a better composition is oddly satisfying.

 

Inspiring(?) Output

Using a set of basic shapes and a sprinkle of heuristic logic, I generated a few compositions. With a Kandinsky piece as input, the goal wasn’t to impress but to deconstruct and explore.

 

So no offense to art lovers—this was more about the journey than the gallery showing.

 

Because in the end, the most exciting part wasn’t the final image in this case; it was the process of trying to teach a machine how to place things with intention.

test-karnd-40-processed-10.jpg