Claude Shannon – Serenity Through Sweat

Prompt

Thanks for joining me for another edition of the SerenityThroughSweat blog.

AI has been in the news quite a bit recently with the continuing advancement of ChatGPT and the drama surrounding its upper management.

I came across some of the grassroots origins of AI, in the form of computational linguistics, while continuing research on my communications project.

I am far from a subject matter expert on AI, language, or communication, but here is my two cents nonetheless. And, you should take it, we are due for a recession anyway.

Computational linguistics really began as a field before it ever had a chance. By that I mean the right tools for the job hadn’t even been invented yet.

ChatGPT and other Large Language Models (LLMs) require enormous datasets and computing power. Before the internet and the personal computer, this meant manual entry and analysis of all those words.

The LLMs function less by looking at the “rules of language”, and more by analyzing the likelihood of what the answer should be based on existing information.

From the analysis on computational linguistics: “Members of the IBM research team flaunted their ignorance of linguistics as if to taunt the other researchers. Fred Jelinek is famously quoted as saying, ‘Every time I fire a linguist from our project, the performance of our system gets better.’”

How fast can you squeeze out the toothpaste? Now, how fast can you put it back in? Our words work the same way

I think the easiest way to think about these LLMs is as probability engines. This work was pioneered by Claude Shannon (whose work I have covered in quite a few other posts).

The LLM absorbs and analyzes a huge amount of data. An unimaginable amount of data. Think about reading the entire contents of the internet. Every tweet, every news article, every blog. Then statistically analyzing all those words to look for patterns.

From a previous post covering the work of Shannon: “As Shannon showed, this model also describes the behavior of messages and languages. Whenever we communicate, rules everywhere restrict our freedom to choose the next letter and the next pineapple*” “Because you’re completely aware of those rules, you’ve already recognized that ‘pineapple’ is a transmission error. Given the way the paragraph and the sentence were developing, practically the only word possible in that location was ‘word’”

A beautiful day for a run on a Montreal layover

When Shannon completed his mathematical theory of communication, the internet wasn’t even a pipe dream, and he did a tremendous amount of work developing the earliest computers.

His theories and ideas, though, would pave the way for how these LLMs operate. They look for patterns by searching and analyzing all of the current written work on a topic. They then recombine words in a statistically viable way to answer questions.
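To make the "probability engine" idea a little more concrete, here is a toy sketch in Python. It is nowhere near how a real LLM is built. It just counts which word follows which in a tiny made-up sentence and turns those counts into odds. The function names and the sample sentence are my own inventions for illustration.

```python
from collections import Counter, defaultdict

def build_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the raw follower counts for one word into probabilities."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()} if total else {}

# Tiny made-up corpus, purely for illustration.
corpus = "the next letter and the next word and the next sentence"
counts = build_bigram_counts(corpus)

after_the = next_word_probabilities(counts, "the")    # "next" every time
after_next = next_word_probabilities(counts, "next")  # three equally likely options
print(after_the)
print(after_next)
```

In this little corpus "the" is always followed by "next", so the model is certain. After "next" it has three equally likely choices. Scale the corpus up to the whole internet and look at far more context than one word, and you get the general flavor.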

You can debate whether or not this constitutes learning, or understanding, or consciousness, but that’s not really the point. It is here now, in this current form, and it can be an extremely useful tool. It can also spit out unintelligible garbage. So how do you engage with LLMs in a way that is useful and productive?

I think the answer has already been covered in the cautionary AI action movie I, Robot: “My responses are limited. You must ask the right questions.”

In this light, the rise of ChatGPT and other LLMs has led to the creation of a new host of jobs, one of which is the prompt engineer.

I first heard about the prompt engineer on episode 556 of the Freakonomics podcast.

Prompt engineers discern what it is that their customer wants, and then find a way to effectively communicate that to the LLM.

Asking the right questions, adding the right context and constraints, make all the difference. If you think about it, the same concept applies to communicating with our kids. Or with other adults who may be operating outside their area of expertise.

If you want your five-year-old to do something, you need to set up some guardrails and provide clear expectations. If you want a coworker to complete a new task, you need to provide the context and desired outcome in order to get the finished product you want.

LLMs function much like a very intelligent five-year-old. You can be amazed at what they are able to produce if given the right prompt.

Sometimes, it is hard to know what exactly we want. It is even harder to find the right combination of words to effectively transmit that want to someone else. Asking the right questions, and setting the right context and guardrails, can help us in the endeavor. Finding the right prompt might just lead to some serenity.

Trying to keep the mileage up in my pseudo off-season

Thanks for joining me, stay safe and stay sweaty my friends.

Uncertainty

Thanks for joining me for another edition of the SerenityThroughSweat blog. This week, I want to revisit one of the main language characters we visited a few months back, Claude Shannon.

It’s somewhat odd calling Shannon a language character. I was first introduced to his world reading about lifespan and longevity. His work was used as an analogy to demonstrate a point, even though he did do some work in genetics.

Shannon was a mathematician, an engineer, a teacher, and a tinkerer. He is often considered a founder of the modern information age. He did work in World War II on code breaking, and on communication, but he was not a language person in the way we typically think about language.

Shannon was more concerned with transmitting and receiving messages than with actually constructing them (which is what language folks tend to obsess over).

Shannon’s breakthrough work was the mathematical theory of communication, which broke down sending information digitally. I’m not a mathematician. Most of the original work (which I purchased) is gibberish to me. But, I can understand the concept, and it is profound in its breakdown of communication to an elemental level.

I talked about one of the aspects of his world in a post from last November (Noise). But this week I wanted to talk about uncertainty.

Shannon starts with the idea of flipping a coin. The outcome is either heads or tails. This communicates to us a binary choice. The answer to the question, what happened in the coin flip, can be expressed as a binary digit or ‘bit’, one of two options. (Yes, the bit you are familiar with if you’ve used any computer technology in the last 40 years comes from Shannon’s work in the 1940s.)

Shannon quickly noted though, that the coin flip is perfectly random, unless the coin is weighted. In which case one outcome is more likely than another.
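For the curious, here is a small Python sketch of what "weighted" does to the numbers. It uses Shannon's entropy formula for a two-outcome event. The function name and the 90/10 coin are just my illustration, not something from his paper.

```python
import math

def coin_entropy(p_heads):
    """Shannon entropy, in bits, of a coin that lands heads with probability p_heads."""
    entropy = 0.0
    for p in (p_heads, 1.0 - p_heads):
        if p > 0:  # an outcome that never happens contributes nothing
            entropy -= p * math.log2(p)
    return entropy

fair = coin_entropy(0.5)      # a fair coin: exactly 1 bit per flip
weighted = coin_entropy(0.9)  # a heavily weighted coin: under half a bit
certain = coin_entropy(1.0)   # a two-headed coin: 0 bits, no surprise at all
print(fair, round(weighted, 3), certain)
```

The more lopsided the coin, the less each flip actually tells you, because you could have guessed the answer anyway.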

He then went on to show (all of this mathematically of course) that most of our communication is very heavily weighted. Because of our rules of grammar, syntax, phonology, and morphology, the next letter and the next word are highly dependent on the ones that precede them.

This was a highly useful realization and skill when Shannon was working in cryptography as a code breaker, but I think it means a lot to us as everyday communicators.

“for the vast bulk of messages, in fact, symbols do not behave like fair coins. The symbol that is sent now depends, in important and predictable ways, on the symbol that was just sent: one symbol has a pull on the next.”

“As Shannon showed, this model also describes the behavior of messages and languages. Whenever we communicate, rules everywhere restrict our freedom to choose the next letter and the next pineapple*” “Because you’re completely aware of those rules, you’ve already recognized that ‘pineapple’ is a transmission error. Given the way the paragraph and the sentence were developing, practically the only word possible in that location was ‘word’”

So much of what we say is predetermined, by custom, by ritual, by routine. When it is time to actually say something outside the norm, it is easy to falter. To struggle to find the right words.

As I mentioned earlier, Shannon was an engineer. He was concerned with designing a system to effectively and efficiently transmit messages. In pursuit of solving that problem, he taught us a valuable lesson about constructing messages.

“what does information really measure? It measures the uncertainty we overcome. It measures our chances of learning something we haven’t yet learned. Or, more specifically, the amount of information something carries reflects the reduction in uncertainty about the object”

“Why doesn’t anyone say XFOML RXKHRJDFJUJ? Investigating that question made clear that our “freedom of speech” is mostly an illusion: it comes from an impoverished understanding of freedom. Freer communicators than us, free of course in the sense of uncertainty and information, would say XFOML RXKHRJDFJUJ. But in reality, the vast bulk of possible messages have already been eliminated for us before we use a sentence or write a line.”
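Here is a rough Python sketch of why nobody says XFOML. If all 26 letters plus the space were equally likely, every character would carry the maximum possible information. Real writing uses some letters far more than others, so each character carries less. The sample sentence and function name are my own, and this only looks at single-letter frequencies. Accounting for how each letter depends on the ones before it would push the number lower still.

```python
import math
from collections import Counter

ALPHABET_SIZE = 27  # 26 letters plus the space

def entropy_bits(text):
    """Entropy per symbol, in bits, from the symbol frequencies observed in text."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# If every symbol were equally likely, each one would carry log2(27) bits.
uniform_bits = math.log2(ALPHABET_SIZE)

# A made-up sample sentence, standing in for "real" English.
sample = "the next letter and the next word depend on the one that came before"
observed_bits = entropy_bits(sample)

print(round(uniform_bits, 2), round(observed_bits, 2))
```

The gap between those two numbers is the freedom we have already given up before we open our mouths.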

If information reflects the reduction in uncertainty, that should be one of, if not the primary focus of our communications. Especially those novel ones that break from ritual and routine.

Think about 20 different people practicing basketball individually on a court. There are bound to be some collisions, some balls bouncing off each other at the rim, and maybe even some injuries. An aviation training area can be very similar. Multiple individuals, in a confined area, with different agendas.

In aviation, we make position reports both procedurally in certain airspace, and in high volume uncontrolled areas. Those reports need to resolve a lot of uncertainty in order to avoid disaster. A good formula is who you are, where you are, and what your intentions are.

If you know John is working on 3-pointers from the corner, and Phil is practicing layups, you can now decide how and where you want to practice, without disturbing, or being disturbed by, the fellow ballers. A tremendous amount of uncertainty has been resolved. That is valuable information. Much more concrete and actionable than, John and Phil are playing basketball.

Last few runs before returning to the Florida heat

This, of course, is a task much easier said than done. To make all, or even most, of our messages precise enough to overcome the maximum amount of uncertainty, requires a novel concept. Thinking before we speak.

What information do I have? What information does the receiver of the message need? What do they expect to hear? What uncertainty needs to be overcome?

There is no shortage of uncertainty in our world. Overcoming even a small amount of it will lead to happier humans. And I’m sure there is serenity to be found along the way.

Thanks for joining me, stay safe and stay sweaty my friends.

Binary

Thanks for joining me for another edition of the SerenityThroughSweat blog. While continuing my linguistics research I seem to have taken a fork in the road to information theory.

Sometimes you follow these paths to dead ends. But sometimes, the path leads to somewhere interesting even if it isn’t exactly where you thought you were heading, or needed to go in the first place.

Information theory was pioneered in the 1940s and 50s by Claude Shannon. We talked about him a little bit in the post on noise.

One of the ideas that helped kickstart Shannon’s theory came from the mathematician and logician George Boole.

George Boole, in The Laws of Thought, explains how any question of logic can be turned into math. This is done with the logical operators AND, OR, NOT, and IF, along with an evaluation of whether each statement is true (1) or false (0).

Imagine you want to find out how many people in your city are blonde women. The characteristic blonde can be represented by x and female by y. The statements will either be true 1, or false 0. AND would be represented by multiplication •, OR by addition +.

Each data point (person) can then be evaluated by the equations which can be translated easily back and forth between math and plain English.

1•1=1: blonde and female. 1•0=0: blonde and male, so not a blonde woman. If you widen the question to anyone who is blonde or female, 1+1=1 for the group of blonde women and 0+1=1 for the group of non-blonde women. (In Boolean algebra 1+1 is still 1, because a statement can’t be more than true.)
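The same bookkeeping can be sketched in a few lines of Python. The list of residents is made up, and capping addition at 1 is the modern Boolean algebra convention rather than anything specific to Boole's book.

```python
def AND(x, y):
    """Multiplication: 1 only when both statements are true."""
    return x * y

def OR(x, y):
    """Addition capped at 1, so that 1 + 1 stays 1."""
    return min(x + y, 1)

def NOT(x):
    """Flip a 1 to a 0 and a 0 to a 1."""
    return 1 - x

# Hypothetical residents as (blonde, female) pairs of 1s and 0s.
residents = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1)]

blonde_women = sum(AND(x, y) for x, y in residents)
blonde_or_female = sum(OR(x, y) for x, y in residents)
non_blonde_women = sum(AND(NOT(x), y) for x, y in residents)

print(blonde_women, blonde_or_female, non_blonde_women)
```

Each plain English question becomes a one-line formula, and the city becomes a column of 1s and 0s to add up.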

This foundation laid by Boole in the 19th century set the stage for Shannon and other inventors to build our modern computing era. Boolean algebra would work with electrical circuits laid out either in parallel or in series to evaluate the data.

Binary implies an either/or, true/false, 1 or 0. When setting code to evaluate these statements or questions, computation can be accomplished at lightning speeds.

This is why definitions are so important. As more and more of our world is driven by this binary code, true or false, statements can only be properly evaluated if we have agreed on the definitions.

This is a blessing for our modern information age. Tasks that would require huge amounts of human time and energy, and would be very error prone, can now be automated.

Does 2+2=4? Is this a picture of a stop sign? Are the letters in this scramble “grstl”? These can all be assigned yes or no values. True or false. And they are very simple examples. But as we move away from simple examples and into more complex questions, the binary coding becomes more challenging.

Writing code to evaluate human-defined terms is where I want to focus. The past few years have seen a rise in social media platforms restricting posts in one way or another.

Sometimes this is done by removing the posts entirely. Sometimes it is done by flagging the post, putting some sort of warning, or label, or explanation on it. Sometimes it is done by adjusting the post’s visibility.

Most of these restrictions are performed at least initially by a computer. A computer operating in binary. The post is true or false. It contains misinformation or it doesn’t. It contains banned content or it doesn’t.

This is not a blog post about censorship, those platforms’ policies, or one specific position over another. It is about the process. The mechanisms behind evaluating posted content.

If these posts are being flagged initially by an algorithm, that algorithm has to be programmed to observe certain characteristics or definitions.

As we saw from the outset, computers are faster and less error prone than humans at binary logic. When it comes to subjective rationalization, not so much.

If misinformation, or objectionable content, or hate speech is clearly defined, and we all agree on the definitions, then a binary logic calculation is magically fast and efficient.
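Here is a deliberately crude Python sketch of that dependence on definitions. I have no idea how any real platform writes its filters, so the function, the word lists, and the sample post are all hypothetical. The point is only that the same post gets a different true/false verdict depending on which definition is plugged in.

```python
def contains_banned_term(post, banned_terms):
    """Return True if any banned term appears as a whole word in the post."""
    words = {word.strip(".,!?\"'").lower() for word in post.split()}
    return any(term.lower() in words for term in banned_terms)

post = "This miracle cure worked for me!"

# Two hypothetical definitions of what counts as a problem post.
strict_definition = {"miracle", "cure"}
loose_definition = {"scam"}

flag_under_strict = contains_banned_term(post, strict_definition)
flag_under_loose = contains_banned_term(post, loose_definition)

print(flag_under_strict, flag_under_loose)
```

The check itself is instant and never gets tired. All of the judgment lives in whoever wrote the word list.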

However, if we go all the way back to 1964, to the court case Jacobellis v. Ohio, which ultimately ended up in the Supreme Court, we see the root of the problem.

A movie theater manager had been convicted for showing a movie with a sex scene. As the case moved its way up the legal system to higher and higher courts, each court was unable to successfully define obscenity and pornography.

The problem is summed up well by Justice Stewart in the popular legal quote “I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.”

If humans “perhaps can never succeed in intelligibly defining” such terms, how can we expect computer code, written by humans, to do so?

Yet this is to a large extent the situation we find ourselves in. Whoever controls the definition, and writes the code, establishes the binary. What is true and what is false.

I have said it before, and I will say it again, words are important. The way we collectively define them is important. Participating in conversations about those definitions is important and everyone has the right to a voice in that conversation.

Thanks for joining me, stay safe and stay sweaty my friends.

Noise

Thanks for joining me for another edition of the SerenityThroughSweat blog. This week I want to talk about noise.

Maybe not in the typical sense that we think of it. There are different types of noise, and they all play a part in disrupting not only effective communication, but our general happiness and even our health.

I found the idea of noise disrupting our health in the book Lifespan by Dr. David Sinclair. Dr. Sinclair’s message, condensed down to an elevator pitch, is that ageing is a disease that can be treated, halted, and even potentially reversed.

A significant part of ageing is noise in the communication between our genes and our cells. Minimizing that noise, and ensuring genes and cells effectively communicate, keeps cells healthy, operating properly, and young.

Dr. Sinclair goes on to cite Claude Shannon, one of the founding fathers of information theory, and his work from 1948.

Shannon’s noisy channel coding theorem says that “however contaminated with noise interference a communication channel may be, it is possible to communicate digital data error free up to a given maximum rate through the channel.” (A Mathematical Theory of Communication, 1948)
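To get a feel for that "given maximum rate", here is a small Python sketch using the simplest textbook case, a channel that randomly flips some fraction of the bits you send (usually called a binary symmetric channel). That model and the function names are my own choice of illustration, not something from Dr. Sinclair's book.

```python
import math

def binary_entropy(p):
    """Uncertainty, in bits, of an event that happens with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(flip_probability):
    """Max error-free rate, in bits per bit sent, when each bit flips with this probability."""
    return 1.0 - binary_entropy(flip_probability)

quiet = bsc_capacity(0.0)    # no noise: every bit sent is a bit received
noisy = bsc_capacity(0.1)    # 10% of bits flipped: roughly half a bit gets through
useless = bsc_capacity(0.5)  # coin-flip noise: nothing gets through
print(quiet, round(noisy, 3), useless)
```

Notice that even a channel that scrambles one bit in ten still has a usable rate. You just have to slow down and add redundancy to stay under it.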

Dr. Sinclair uses this theory of information transfer as an example for how our genes and cells communicate, as well as what we can do to minimize the noise, thus maximizing the error free data transfer (effective communication).

This got me thinking about the types of noise we experience in interpersonal communications, some of which I recognized without knowing they had their own specific domains. Physiological, physical, psychological, and semantic noise all play their own part in disruption.

Physiological noise refers to anything going on within our personal body that might hinder communication. This could be a headache, hunger, fatigue or other physiological conditions. Think those Snickers commercials. Why don’t you have a Snickers, you don’t listen so well when you’re hungry.

Physical noise refers to disruptions that are physical in nature but external to the receiver. Think headset/radio/phone malfunction, a crowded room, or even a bright and distracting light.

Psychological noise refers to disruptions that are internal to the receiver’s thought process. If you are preoccupied with another problem, or daydreaming instead of listening, that would be psychological noise.

Finally semantic noise is a misunderstanding of words between the sender and receiver. This could be due to lack of shared knowledge, language barrier, or cultural differences.

There is no shortage of barriers to effective communication. There is always some noise present, and often there is a lot of it. The constant noise we live with makes determining Shannon’s maximum error free data transfer rate a crucial piece of information to know and apply.

Staying at or below the applicable Shannon rate for a given exchange will ensure the message is transmitted effectively. If you have ever had a conversation at a loud concert, with a foreign speaker, a toddler, or someone with a bad hangover, you already understand self-limiting your rate of data transfer through the given channel. (If you’ve ever been the hungover one, this is greatly appreciated.)

Taking account of the noise around us, and the overall capacity of our channels of communication, is a demanding and ever-present task. One that helps pave the path to serenity.

Thanks for joining me, stay safe and stay sweaty my friends.