Predictably Unpredictable: Can LLMs generate “art”?

While reading about traditional Japanese textile design, I stumbled across something called hitomezashi patterns — and I immediately fell down a rabbit hole. These patterns, often found in sashiko stitching and other folk crafts, are built from repeating geometric motifs that follow incredibly simple mathematical rules.

At a glance, hitomezashi patterns look like intricate, hand-drawn designs. But there’s more going on beneath the surface. The key idea is that they’re generated on a square grid, where lines are drawn between points according to specific rules. These rules are usually chosen at random, and yet they produce complex structures, often with apparent symmetry.

LLMs are often dismissed as being unable to generate true, unique art. But could I get an LLM to generate rules for a hitomezashi pattern that could be considered artistic?

To test this, I used a simple rule-based approach:

If the letter in the word is a vowel, connect the grid point; if it’s a consonant, don’t.

So to generate a pattern, all I needed from the model was a string of words. The actual visual complexity would emerge from applying this rule to the letters in the generated text.
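As a sketch of this rule (the function names here are my own, not from the original setup), mapping each letter of a generated word to a 0/1 stitch offset might look like:

```python
def letter_to_bit(letter: str) -> int:
    """Vowels connect the grid point (1); consonants don't (0)."""
    return 1 if letter.lower() in "aeiou" else 0

def word_to_bits(word: str) -> list[int]:
    """Turn a word into the on/off offsets for one axis of the grid."""
    return [letter_to_bit(ch) for ch in word if ch.isalpha()]

print(word_to_bits("pattern"))  # → [0, 1, 0, 0, 1, 0, 0]
```

Any string of words the model produces can be fed through `word_to_bits` to seed the grid.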

But then another question emerged — one that sits right at the intersection of computation and creativity:

Can LLMs be truly random?

This is an interesting question because humans are not inherently random, so the corpus of training data is unlikely to contain much genuine randomness. If you ask a model to give just one random number, it will tend to give the same number, or a very small subset of numbers, again and again. This is because LLMs given the same prompt are mostly repeatable, so if you ask an LLM to ‘give a random number between 1 and 100’, the response will always be similar. This is what we see when we run this experiment for 1000 iterations.

We can see that when the model only has to predict one number at a time, 73, 57 and 42 are the most likely answers, with a very narrow range of numbers actually selected.
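The tallying side of this experiment can be sketched as below; `ask_for_number` is a stand-in for the actual API call, which isn’t reproduced here:

```python
from collections import Counter

def run_experiment(ask_for_number, iterations=1000):
    """Tally how often each 'random' number comes back over many calls."""
    counts = Counter()
    for _ in range(iterations):
        counts[ask_for_number()] += 1
    return counts

# Stand-in for the LLM: always answers 42, like a model stuck on one token.
counts = run_experiment(lambda: 42, iterations=100)
print(counts.most_common(3))  # → [(42, 100)]
```

Swapping the lambda for a real model call reproduces the histogram discussed above.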

So why does an LLM struggle with randomness in the first place?

The answer lies in something called temperature — a parameter that controls the entropy (or unpredictability) of a language model’s output.

When generating text, an LLM predicts the next word (or token) by assigning probabilities to all the possible options. A low temperature (e.g., 0.2) makes the model more deterministic — it will almost always pick the most likely next word. This is great for factual or consistent writing but terrible for randomness. A higher temperature (e.g., 0.8 or 1.0) flattens the probability distribution, making the model more adventurous in its choices. Words with lower probabilities become more likely to be picked, and the results become more varied — and often weirder.

In simple terms:

  • Low temperature = safe, predictable, repetitive

  • High temperature = creative, chaotic, more “random”
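The effect is easy to see with a toy softmax. In this sketch the logits are made up for illustration: dividing them by the temperature before normalising sharpens the distribution at low temperatures and flattens it at high ones.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical next-token scores

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.2 almost all of the probability mass sits on the top token; at 2.0 the lower-ranked tokens get a real chance of being sampled.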

But even with a high temperature, the model isn't generating true randomness — it’s still drawing from learned patterns in its training data. So if people often use the number 42 when asked for a random number (thanks, Hitchhiker’s Guide to the Galaxy), the model learns that. It’s “random” in the way a human might be: still biased, still influenced by context, just slightly more unpredictable.

So, when we ask an LLM to act randomly, we’re not really asking it to roll an imaginary die — we’re asking it to simulate the kind of randomness it has seen before (which is not random).

The temperature used for these experiments was the stock temperature used by OpenAI, which is fairly low.

If we ask the model to predict 100 numbers at once, however, we get a much better response, possibly because OpenAI has built this capability directly into the model.

Now, let’s look at ChatGPT’s ability to produce random words. I ran the same two experiments. First, I asked the model to give just one random word at a time. As expected, the results were similar to the numbers example: the model kept repeating the same words. It should also be noted that the words the model decides are random are very niche. This plot shows all word counts above 10.

Next, I asked the LLM to produce 100 random words at a time. This revealed a strange property: the model seemed to give its ‘random’ words in alphabetical order. The range of words was more diverse, but the same 26 words, roughly one per letter of the alphabet, came up again and again. This plot shows only the words that appeared more than 35 times in the data.

To push the randomness further, I added a selection step:

  1. Ask the LLM to generate 100 random words.

  2. Ask it for a random number between 1 and 100.

  3. Use that number to select one of the 100 words.

This added an extra layer of indirection, simulating a kind of model-generated dice roll.
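The three steps above can be sketched like this; both `ask_llm_*` functions are stand-ins (here backed by Python’s `random` module) for the actual model calls:

```python
import random

VOCABULARY = ["pomegranate", "fur", "pathway", "xerus", "dance",
              "quilt", "lantern", "mica", "orchid", "vellum"]

def ask_llm_for_words(n: int) -> list[str]:
    """Stand-in for an LLM call that returns n 'random' words."""
    return [random.choice(VOCABULARY) for _ in range(n)]

def ask_llm_for_number(lo: int, hi: int) -> int:
    """Stand-in for asking the LLM for a random number between lo and hi."""
    return random.randint(lo, hi)

def pick_word() -> str:
    """The indirection step: generate 100 words, then index with a second call."""
    words = ask_llm_for_words(100)
    index = ask_llm_for_number(1, 100)
    return words[index - 1]  # the number is 1-based

print(pick_word())
```

The point of the indirection is that even if both calls are biased, the word that comes out depends on two biased draws rather than one, which spreads the selection around.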

Finally, I selected three words for the horizontal axis and three for the vertical, and used the vowel/consonant rule to determine how to connect points in the grid. The result was a generated hitomezashi pattern based on phrases like:

  • "pomegranate fur pathway"

  • "xerus dance quilt"

And this is what that looks like.
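For readers who want to reproduce a pattern like this, here is a minimal ASCII sketch, assuming the standard hitomezashi construction: each row and column takes a 0/1 offset from the vowel/consonant rule, and stitches alternate on and off starting from that offset.

```python
def bits(phrase: str) -> list[int]:
    """Vowel -> 1, consonant -> 0; non-letters are skipped."""
    return [1 if ch.lower() in "aeiou" else 0 for ch in phrase if ch.isalpha()]

def hitomezashi(row_bits: list[int], col_bits: list[int]) -> str:
    """ASCII render: '+' grid points, '--' horizontal and '|' vertical stitches."""
    rows, cols = len(row_bits), len(col_bits)
    lines = []
    for i in range(rows):
        # Horizontal stitches along row i, alternating from the row's offset.
        line = ""
        for j in range(cols):
            line += "+"
            if j < cols - 1:
                line += "--" if (j + row_bits[i]) % 2 == 0 else "  "
        lines.append(line)
        # Vertical stitches between row i and row i + 1.
        if i < rows - 1:
            line = ""
            for j in range(cols):
                line += "|" if (i + col_bits[j]) % 2 == 0 else " "
                if j < cols - 1:
                    line += "  "
            lines.append(line)
    return "\n".join(lines)

print(hitomezashi(bits("pomegranate fur pathway"), bits("xerus dance quilt")))
```

Swapping in different phrases changes which stitches appear, which is where all of the visual variety comes from.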

These generated patterns hold this weird tension — they feel like they’re almost symmetrical. From a distance, there’s order. But up close, the randomness breaks through, and that illusion of structure unravels. It’s exactly what I find so compelling about hitomezashi: even rules pulled from random, machine-generated words can lead to something strangely satisfying.

Whether or not an LLM can truly be random is still up for debate. But it turns out, it doesn’t need to be. It just needs to be random enough — and then, with the right rules, art starts to emerge.
