Merriam-Webster

Thanks for joining me for another edition of the SerenityThroughSweat blog. This week, I wanted to share another interesting story I found while researching my linguistics project.

The history of dictionaries may seem like a boring subject. You write down words, and you define them. How hard could it be? There are actually a lot of questions that must be answered when deciding how to make a dictionary.

“What is the relationship between words and phrases? How far should a dictionary go in recording nominal phrases? (Fire escape, forest fire)”

“How strictly should a dictionary confine its inventory to recorded usage? Can a spelling form be shared by more than one word (record as a noun and record as a verb)?”

“How much attention should be paid to etymology? (Weave intransitive vs transitive verb)” As an example, weave in and out of traffic and weave clothes on a loom come from different origin words.

Making a dictionary becomes a little more complex than just a book to check when you don’t trust your Scrabble opponent.

One of the most popular dictionaries in the US is the Merriam-Webster brand. Their story was featured in the chapter I was researching, on the history of lexicography.

“The Merriam dictionaries trace their history back to the American Dictionary of the English Language dutifully compiled by the polemical lexicographer Noah Webster in 1828. It contains no fewer than 70,000 entries.”

“Webster was an indefatigable collector of words with a rare gift for definition writing.”

“Unfortunately, his etymologies were influenced by his belief that modern languages, including English, are derived from something called Chaldean, which he believed was the language used by Adam and God for their conversations in the Garden of Eden and the immediate precursor to Hebrew.”

“After his death, his successors-including his son-in-law, Chauncey H. Goodrich, and the redoubtable Noah Porter, president of Yale College- quietly abandoned the Chaldaean hypothesis and brought the etymologies into line with the findings of Germanic and Indo-European scholarship.”

That is a lot to unpack for a book that has been mostly superseded by online reference checking. But recall that for generations, the Webster dictionary reigned supreme. It is eerie to think about how much power definition holds, and how that power was held by a religious fanatic.

I grew up Roman Catholic, and considered myself fairly devout until after high school. Even I had never heard of Chaldean before.

After some very preliminary research, it seems that the Chaldean people lived in southern Mesopotamia roughly 2,500 to 3,000 years ago and were assimilated into the Babylonians. You may recognize that name from its own biblical reference, the Tower of Babel.

Apparently there are multiple references to Chaldean knowledge, not only in the Bible but also from renowned ancient writers (Pliny the Elder and Cicero). Many of them point to Chaldean expertise in astronomy, astrology, vibrations, and numerology.

Some or all of that may be nonsense. I don’t know. And frankly, I don’t know how to know if any of it is real or not. Either way, it is fun to think about next time you have to check the dictionary when your five-year-old asks the difference between gunk and sludge.

We base our lives on definitions. How we identify ourselves, each other, and the occurrences of our day-to-day experiences all depend on agreed-upon definitions. The ability to set those definitions is a great power. And, as Uncle Ben would say, with great power comes great responsibility.

Thanks for joining me, stay safe and stay sweaty, my friends.

Binary

Thanks for joining me for another edition of the SerenityThroughSweat blog. While continuing my linguistics research I seem to have taken a fork in the road to information theory.

Sometimes you follow these paths to dead ends. But sometimes, the path leads to somewhere interesting even if it isn’t exactly where you thought you were heading, or needed to go in the first place.

Information theory was pioneered in the 1940s and ’50s by Claude Shannon. We talked about him a little bit in the post on noise.

One of the ideas that helped kickstart Shannon’s theory was that of the mathematician and logician George Boole.

In The Laws of Thought, George Boole explains how any question of logic can be turned into math. This is done with the logical operators AND, OR, NOT, and IF, along with an evaluation of whether each statement is true (1) or false (0).

Imagine you want to find out how many people in your city are blonde women. The characteristic blonde can be represented by x and female by y. Each statement is either true (1) or false (0). AND is represented by multiplication (•) and OR by addition (+).

Each data point (person) can then be evaluated by the equations which can be translated easily back and forth between math and plain English.

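1•1=1: blonde and female. 1•0=0: blonde and male, so not a blonde woman. OR works differently: x+y is 1 whenever at least one of the statements is true, so 1+1=1 for blonde women and 0+1=1 for non-blonde women, but also 1+0=1 for blonde men. If you decide you are only concerned with how many women there are, y on its own answers that.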
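To make the arithmetic concrete, here is a minimal sketch in Python. The function names and the five sample people are made up for illustration, and OR is written as addition capped at 1 so that 1+1=1, matching the example above.

```python
# Truth values are 1 (true) and 0 (false).

def bool_and(x, y):
    # AND as multiplication: 1 only when both are 1
    return x * y

def bool_or(x, y):
    # OR as addition capped at 1, so that 1 + 1 = 1
    return min(x + y, 1)

def bool_not(x):
    # NOT flips 1 to 0 and 0 to 1
    return 1 - x

# Hypothetical sample: each person is (x, y) = (blonde, female)
people = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1)]

blonde_women = sum(bool_and(x, y) for x, y in people)
blonde_or_female = sum(bool_or(x, y) for x, y in people)
non_blonde_women = sum(bool_and(bool_not(x), y) for x, y in people)

print(blonde_women, blonde_or_female, non_blonde_women)
```

Each person is evaluated by the same tiny equation, and adding up the results gives the head count for whichever group the equation describes.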

This foundation, laid by Boole in the 19th century, set the stage for Shannon and other inventors to build our modern computing era. Boolean algebra maps onto electrical circuits: switches wired in series behave like AND, and switches wired in parallel behave like OR.

Binary implies an either/or: true/false, 1 or 0. Once code is set up to evaluate these statements or questions, computation can be accomplished at lightning speed.

This is why definitions are so important. As more and more of our world is driven by this binary code, true-or-false statements can only be properly evaluated if we have agreed on the definitions.

This is a blessing for our modern information age. Tasks that would require huge amounts of human time and energy, and would be very error-prone, can now be automated.
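The link between circuits and logic can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a closed switch is 1, an open switch is 0, and the function names are my own.

```python
def series(a, b):
    # Two switches in series: current flows only if both are closed (AND)
    return a * b

def parallel(a, b):
    # Two switches in parallel: current flows if either is closed (OR)
    return min(a + b, 1)

# Print the full truth table: switch a, switch b, series result, parallel result
for a in (0, 1):
    for b in (0, 1):
        print(a, b, series(a, b), parallel(a, b))
```

The series column only lights up on the last row, where both switches are closed, while the parallel column lights up on every row except the first.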

Does 2+2=4? Is this a picture of a stop sign? Are the letters g, r, s, t, l in this scramble? These can all be assigned yes or no values. True or false. And they are very simple examples. But as we move away from simple examples and into more complex questions, the binary coding becomes more challenging.

Writing code to evaluate human-defined terms is where I want to focus. The past few years have seen a rise in social media platforms restricting posts in one way or another.

Sometimes this is done by removing the posts entirely. Sometimes it is done by flagging the post, putting some sort of warning, or label, or explanation on it. Sometimes it is done by adjusting the post’s visibility.

Most of these restrictions are performed, at least initially, by a computer. A computer operating in binary. The post is true or false. It contains misinformation or it doesn’t. It contains banned content or it doesn’t.

This is not a blog post about censorship, those platforms’ policies, or one specific position over another. It is about the process. The mechanisms behind evaluating posted content.

If these posts are being flagged initially by an algorithm, that algorithm has to be programmed to observe certain characteristics or definitions.

As we saw from the outset, computers are faster and less error-prone than humans at binary logic. When it comes to subjective judgment, not so much.

If misinformation, or objectionable content, or hate speech is clearly defined, and we all agree on the definitions, then a binary logic calculation is magically fast and efficient.
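As a toy illustration, here is a sketch of what a yes-or-no check might look like. It is not how any real platform works. The function, the sample post, and both word lists are hypothetical, and the only point is that the same post gets a different answer when the definition changes.

```python
def is_flagged(post, banned_terms):
    # Binary answer: True if any banned term appears as a word in the post
    words = set(post.lower().split())
    return not words.isdisjoint(banned_terms)

post = "This gunk is basically sludge"

# Two hypothetical definitions of "banned content"
definition_a = {"sludge"}
definition_b = {"slime"}

result_a = is_flagged(post, definition_a)
result_b = is_flagged(post, definition_b)

print(result_a, result_b)
```

The calculation itself is instant either way. All of the real decision-making happened earlier, when someone chose which words went on the list.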

However, if we go all the way back to 1964, to Jacobellis v. Ohio, a case that ultimately ended up in the Supreme Court, we see the root of the problem.

A movie theater manager was prosecuted for showing a movie with a sex scene. As the case moved its way up the legal system to higher and higher courts, each court was unable to successfully define obscenity and pornography.

The problem is summed up well by Justice Stewart in the popular legal quote “I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.”

If humans “perhaps can never succeed in intelligibly defining” such terms, how can we expect computer code, written by humans, to do so?

Yet this is to a large extent the situation we find ourselves in. Whoever controls the definition, and writes the code, establishes the binary. What is true and what is false.

I have said it before, and I will say it again: words are important. The way we collectively define them is important. Participating in conversations about those definitions is important, and everyone has the right to a voice in that conversation.

Thanks for joining me, stay safe and stay sweaty, my friends.