Monday, January 16, 2017

Optimizing rote memorization

My grandfather has been working at the DFW airport as a janitor for years. Recently, his coworkers all got a 75-cent raise, from $9.25 up to $10. There was one caveat: they had to pass what seems to be a written exam on airport security (the questions relate to TSA policy and the like). The only problem is, my grandpa doesn't speak English, so even new employees are currently being paid more than he is. What makes this situation worse is that the exam is taken on a computer -- my grandpa hardly knows how to move the mouse.

However, we have something we can work with: a list of handwritten questions and answers. My grandpa requested that I print it out in a bigger font so he can read it.
But I want to take it further: how can I create a study plan that guarantees his success?

We have the following facts:
1. He cannot understand written English.
2. He can, however, recognize English letters and the sounds they make.
3. We have a list of questions, paired with answers (TRUE or FALSE).
4. He hasn't been in school for a really long time.
5. He would have no trouble understanding this if it were in Korean.

Let us examine the list. An obvious first approach would be to memorize each question and its corresponding answer. But we immediately see that this isn't necessary: each question can be identified by less than its full length. The question now is: is there a minimal uniform "key length" -- a number of words to memorize from the start of each question that is enough to tell them all apart? For example, suppose we are given the following sentences:

· Bob is a chef.
· Bob is not a chef.
· Alice is a chef.
· Alice is not a chef.
Notice that memorizing only the first word of each question is not sufficient to distinguish the questions, since 2 of them start with "Bob". We quickly see that 2 is not enough either, but 3 works -- so 3 is our key length.
After we establish the key length, we need only to learn to associate the keys -- "Bob is a" is one in this example -- with the correct answer (in our case, TRUE or FALSE).
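
To make this concrete, here is a rough sketch of how that uniform key length could be computed. This is just an illustration in Python; the function name and the hard-coded question list are mine, not part of the actual study material.

```python
def minimal_uniform_key_length(sentences):
    """Smallest number of leading words that tells every sentence apart."""
    word_lists = [s.rstrip(".").split() for s in sentences]
    longest = max(len(words) for words in word_lists)
    for k in range(1, longest + 1):
        prefixes = {tuple(words[:k]) for words in word_lists}
        if len(prefixes) == len(word_lists):  # every k-word prefix is unique
            return k
    return longest  # duplicate sentences: no prefix length can separate them

questions = [
    "Bob is a chef.",
    "Bob is not a chef.",
    "Alice is a chef.",
    "Alice is not a chef.",
]
print(minimal_uniform_key_length(questions))  # -> 3
```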

The only problem is, the set of questions we are working with has a minimal key length nearly as long as some of the sentences themselves! So here is the situation:
· Bob is a chef.
· Bob is not a chef.
· Alice is a chef.
· Alice is not a chef.
· Carol is an absolutely fantastic chef who lives in New York.
· Carol is an absolutely fantastic chef who lives in New Jersey.
Notice that our minimal key length here would be 11, since the last two sentences differ only in their final word. This means we would be memorizing the first four sentences in their entirety.
But naturally, anyone would quickly realize that they should memorize the first 3 words of the first 4 sentences and only the last word of the 5th and 6th. In other words, the uniform key length restriction is not only inefficient, but also artificial.
Once we drop the restriction, we need a method to generate a key for each sentence. A key must be unique and effective: that is, there needs to be a fast, human-computable function that takes a sentence and yields a unique key. In effect, we have compressed our data.

If we restrict our range to "substrings starting from the beginning or the end", an example mapping is not hard to figure out:

·  Bob is a chef.  => Bob is a
·  Bob is not a chef. => Bob is not
·  Alice is a chef. => Alice is a
·  Alice is not a chef. => Alice is not
·  Carol is an absolutely fantastic chef who lives in New York. => York
·  Carol is an absolutely fantastic chef who lives in New Jersey. => Jersey

The corresponding method to compute this map would be to start from the beginning, try to recognize a substring, and if you fail, start from the end and try to recognize one.
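
One way to pin this down is the sketch below: take the shortest word-prefix that no other sentence shares, and fall back to the shortest unique word-suffix when that turns out shorter. This is just my reading of the "start from the front, then try the end" rule, not a definitive procedure.

```python
def shortest_unique_key(sentence, all_sentences):
    """Shortest unique word-prefix of `sentence`, or its shortest unique
    word-suffix if that is shorter (ties go to the prefix)."""
    words = sentence.rstrip(".").split()
    others = [o.rstrip(".").split() for o in all_sentences if o != sentence]

    def unique_length(take):  # take(ws, k) -> a k-word prefix or suffix
        for k in range(1, len(words) + 1):
            if all(take(words, k) != take(other, k) for other in others):
                return k
        return len(words)

    p = unique_length(lambda ws, k: tuple(ws[:k]))   # prefix length
    s = unique_length(lambda ws, k: tuple(ws[-k:]))  # suffix length
    return " ".join(words[:p]) if p <= s else " ".join(words[-s:])

sentences = [
    "Bob is a chef.",
    "Bob is not a chef.",
    "Alice is a chef.",
    "Alice is not a chef.",
    "Carol is an absolutely fantastic chef who lives in New York.",
    "Carol is an absolutely fantastic chef who lives in New Jersey.",
]
for q in sentences:
    print(q, "=>", shortest_unique_key(q, sentences))
# "Bob is a chef." => "Bob is a", ..., "Carol ... New York." => "York"
```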

The great benefit of using substrings as keys is that the key actually exists inside of the sentence already, so computing the function is reduced to a recognition task – once you spot the substring in the input, you output it.

[An aside: Human recall/recognition is what I like to call "partially fast" -- if it exists in our memory vault, we're usually pretty quick to recognize it, but if it doesn't, the process is a lot less reliable*. We can model this with an associative array, or hash table: lookup is usually really fast, but if the item doesn't exist we might have to search through countless locations to verify this. And that's exactly the thing: we're bad at searching, since we don't have location-addressable memory. We can see this effect with test taking: multiple choice is far easier than free-response tasks, since the latter requires searching through a swath of potentially relevant information. Whether we retrieve the proper information depends on whether or not the information has proper associative links (i.e., X reminds me of Y, which is related to Z, so I recall Z), and whether our mind has enough time, energy, or luck to traverse the right connections. On the other hand, autoassociative tasks are easy as long as the signal is sufficiently clear.]

But we have pigeonholed ourselves into only learning by syntactic methods. There are various other, more natural methods, including but not limited to: sounding out the sentences and mapping the sounds to the correct answer, or interpreting the sentences semantically. The benefit of sounding out the words is that my grandpa does know how to translate groups of letters into sounds, at least rudimentarily, so the function is already there. And sounds tend to be more familiar, so this might help with memorization.

My grandpa also suggested a method: he wants a translation of all the sentences into Korean, along with a sounded-out phonetic transliteration. There are two ways we can implement this. The first would be to take an entire sentence and map it to its meaning. To me, however, this is horribly inefficient – if you can map it to a meaning, why not map it directly to the binary value we need in the first place, TRUE or FALSE?
The second way makes a little more sense. You take some significant words of each sentence – in our example it might be “chef”, “Bob”, “Alice”, “fantastic”, etc. – and map these to the meaning. This is essentially dimensionality reduction – each sentence is considered as the sum of a few special “parts” – in this case a small set of important words which we will have him learn. The caveat is when the sentences don’t really share much of a common basis – then memorizing the words and their meaning becomes an extra layer of inefficiency. So this method is a little situational.
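
To sketch the second way in the same spirit as before: reduce each sentence to the handful of basis words it contains, and map that reduced form to the answer. The keyword list and the TRUE/FALSE values below are placeholders I made up for illustration, not the real sheet.

```python
# A hypothetical "basis" of significant words; the real one would come from
# reading through the actual question sheet.
KEYWORDS = ["Bob", "Alice", "Carol", "not", "York", "Jersey"]

def features(sentence):
    """Reduce a sentence to the tuple of basis words it contains."""
    words = set(sentence.rstrip(".").split())
    return tuple(w for w in KEYWORDS if w in words)

# Placeholder answers, just to show the shape of the mapping.
answer_by_features = {
    features("Bob is a chef."): True,
    features("Bob is not a chef."): False,
    features("Alice is a chef."): True,
    features("Alice is not a chef."): False,
}

def answer(sentence):
    return answer_by_features[features(sentence)]

print(answer("Bob is not a chef."))  # -> False
```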

When the student speaks of "understanding", dimensionality reduction may be what he/she is trying to do. Memorization of independent facts, or even just strings, is no intellectual challenge, but it is time-consuming. So the mind longs for a coherent system, one that is smaller than the initially presented, overwhelming batch of apparently true sentences. Surely there must be a simple framework from which all of this was derived, the brain thinks.
But perhaps it's not only about smallness; there's also the factor of familiarity. We are already familiar with a space of countless principles and ideas – so can we "embed" these new concepts as a subspace thereof?

In summary, when it comes to choosing a method for a memorization task, we have to keep two things in mind:
· Dimensionality: Are the objects we are memorizing (sounds, words) complex or simple? That is, how large is the smallest set of objects (a basis) that generates the original set by combination? For instance, our example with Alice, Bob, and Carol turned out to have fairly low dimensionality, since for our problem we only had to recognize a few key words.
· Familiarity: Are the objects we are memorizing familiar to us – i.e., do the objects, or related objects, already exist in memory to any extent?
For example, if we substitute every letter in an English sentence with some unique hieroglyph or code number, the dimensionality stays the same (we're just doing a one-to-one transformation), but familiarity is drastically reduced. It is probably easier for most people to memorize "fnefhew" than "5 4 1 5 9 1 8", so it might be easier to learn the encoding and decode each number back into a letter first, especially if you have many of these numbers. This is probably why sounding out the words works as a method, and why mnemonics actually help you memorize things despite often being longer than the original object. In the same way, memorizing sentences by sound seems advantageous.
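
Just to spell out the one-to-one transformation: the particular letter-to-digit assignment below is arbitrary, chosen only so the repeats line up the way they do in the example.

```python
# An arbitrary one-to-one letter <-> digit code, like "fnefhew" <-> "5 4 1 5 9 1 8".
encode = {"f": "5", "n": "4", "e": "1", "h": "9", "w": "8"}
decode = {digit: letter for letter, digit in encode.items()}

word = "fnefhew"
digits = " ".join(encode[c] for c in word)
print(digits)                                       # 5 4 1 5 9 1 8
print("".join(decode[d] for d in digits.split()))   # fnefhew
```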


*Although we seem quick to declare "I don't recognize this", this is not the same as saying "I have never seen it". One may sometimes not recognize a person's face until much later.