Saturday, September 23, 2017

Learning hyperplanes

You don't need very many points to learn a lot! Suppose you have a given "experience space".
Each point (i.e. experience) in it admits a radius of applicability: an experience lets you handle any test case that differs from it by at most some small radius e.

But say you're given two points far apart. Then you can probably interpolate between them to construct a "line" covering a large number of test cases. Each point on this line also has a radius of applicability, so we essentially have a thick line. We can also extrapolate -- extend the line past either of the two points.

Example: Suppose you have no idea what "taste" is like. You taste coffee, and you only understand things sufficiently "like" coffee. Then perhaps you taste hot chocolate. You interpolate between the two points, so you can now "recognize" things like frappuccinos and other sweet drinks.
But you can also extrapolate to the extremes: now that you understand that coffee is more "bitter" than hot chocolate, you can extend from coffee to even more bitter things, like espresso. You might go the opposite way from hot chocolate, to sweeter drinks. However, you can't yet imagine things like fruit juice -- that requires an additional dimension (sour).
Of course, it is a fallacy to assume that the natural-seeming taste basis -- sweet, sour, salty, bitter, spicy -- is the only one. Perhaps we could find another with some sort of principal components analysis. It might also be that our brains are structured so that, upon tasting coffee and hot chocolate, we choose this natural basis to compare the two, i.e. we break it down and say "coffee is more bitter than hot chocolate".
Either way, we can at least state that we cannot learn any basis vector onto which coffee and hot chocolate have the same projection. For example, since coffee is about as "sour" as hot chocolate, we can't learn "sourness" from the two of them. More precisely: fruit juice will come as a complete surprise to someone who has tasted only hot chocolate and coffee.
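
Here is a rough sketch of that last point in Python. Everything in it is an assumption made for illustration: the (sweet, sour, bitter) basis, the made-up coordinates for each drink, and the helper names `learnable_axes` and `surprising_axes`. It also takes the "natural" basis for granted, which, as noted above, is not the only choice.

```python
# Hypothetical coordinates in an assumed (sweet, sour, bitter) basis.
AXES = ("sweet", "sour", "bitter")
HOT_CHOCOLATE = (0.8, 0.1, 0.2)
COFFEE = (0.2, 0.1, 0.8)
FRUIT_JUICE = (0.7, 0.8, 0.0)


def learnable_axes(a, b, tolerance=1e-9):
    """Axes along which the two experiences differ, so a contrast can be learned."""
    return [axis for axis, x, y in zip(AXES, a, b) if abs(x - y) > tolerance]


def surprising_axes(new, a, b, tolerance=1e-9):
    """Axes on which a and b agree (nothing learned) but `new` differs from them."""
    learned = set(learnable_axes(a, b, tolerance))
    return [
        axis
        for axis, n, x in zip(AXES, new, a)
        if axis not in learned and abs(n - x) > tolerance
    ]


print(learnable_axes(HOT_CHOCOLATE, COFFEE))
print(surprising_axes(FRUIT_JUICE, HOT_CHOCOLATE, COFFEE))
```

With these made-up numbers, the two known drinks differ only in sweetness and bitterness, so sourness is exactly the direction along which fruit juice surprises us.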

It may seem like we're defining "learning" or "covering" as "not being surprised when we encounter it", but we can actually extend this kind of analysis to different examples of "learning". For example, with people: "learning some point (x,y,z)" in this case might mean "knowing how to act around a person with personality values (x,y,z)". 

Let us analyze more carefully how we can construct "lines". It's quite simple: we follow the same analysis as we do in geometry with vectors. Just as we take the vector defined by two points and add copies of it to one of those points, we might take "that quality that differentiates coffee from hot chocolate" and "add it to coffee" one or more times to get "espresso". The picture is kind of like:

HOTCHOC ---------------> COFFEE 
Then just copy/extend:
HOTCHOC ---------------> COFFEE -----------------> ESPRESSO
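Of course we're essentially computing x = COFFEE + t(v_0), where v_0 = COFFEE - HOTCHOC is the vector from hot chocolate to coffee. Taking t = 1 gives x = ESPRESSO (equivalently, t = 2 if we start from HOTCHOC), and if we let t vary over some continuum we get a line.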
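
As a minimal sketch of this copy/extend step, here is the same construction in Python. The two-dimensional (sweet, bitter) coordinates are invented for illustration, and `point_on_line` is a hypothetical helper name. I take v_0 to be the step from hot chocolate to coffee, so one copy added to COFFEE lands at the same place as two copies added to HOTCHOC.

```python
# Hypothetical (sweet, bitter) coordinates, chosen only for illustration.
HOT_CHOCOLATE = (6.0, 1.0)
COFFEE = (4.0, 3.0)


def point_on_line(base, direction, t):
    """Return base + t * direction, computed componentwise."""
    return tuple(b + t * d for b, d in zip(base, direction))


# v0 is "that quality that differentiates coffee from hot chocolate".
v0 = tuple(c - h for c, h in zip(COFFEE, HOT_CHOCOLATE))

# Extrapolate: one more copy of v0 past coffee.
espresso_guess = point_on_line(COFFEE, v0, 1.0)

# Interpolate: halfway between hot chocolate and coffee.
midpoint = point_on_line(HOT_CHOCOLATE, v0, 0.5)

print(espresso_guess, midpoint)
```

Negative values of t go the other way, from hot chocolate toward sweeter drinks.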

Again, it is important to note that we in fact have more than a line -- we have a "thick line", since around every point we have a radius of applicability. So it's more like a tube.
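
One way to make the tube concrete is to say a test case is covered when its distance to the line is at most the radius e. The sketch below assumes a Euclidean distance and the same radius everywhere along the line, neither of which the model above requires. The coordinates and the names `distance_to_line` and `in_tube` are made up for illustration.

```python
import math


def distance_to_line(p, a, b):
    """Euclidean distance from p to the infinite line through a and b (a != b)."""
    d = tuple(bi - ai for ai, bi in zip(a, b))
    ap = tuple(pi - ai for ai, pi in zip(a, p))
    # Parameter of the orthogonal projection of p onto the line a + t * d.
    t = sum(x * y for x, y in zip(ap, d)) / sum(x * x for x in d)
    closest = tuple(ai + t * di for ai, di in zip(a, d))
    return math.dist(p, closest)


def in_tube(p, a, b, radius):
    """True if p lies within `radius` of the line through a and b."""
    return distance_to_line(p, a, b) <= radius


# Hypothetical (sweet, bitter) coordinates and radius of applicability.
HOT_CHOCOLATE = (6.0, 1.0)
COFFEE = (4.0, 3.0)
E = 0.5

print(in_tube((3.0, 4.2), HOT_CHOCOLATE, COFFEE, E))  # slightly off the line
print(in_tube((6.0, 5.0), HOT_CHOCOLATE, COFFEE, E))  # far from the line
```

Using the infinite line rather than the segment is deliberate here, since the tube is meant to cover extrapolated points as well as interpolated ones.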

An aside: Obviously the sort of thinking wherein you consider events/objects/etc. as tuples is pretty common in fields such as machine learning, but I'd like to point out another way it illuminates a particular phenomenon.
Take music theory and learning: music was made perfectly well long before any real music theory was developed (I mentioned this in a previous post). If we model this in terms of a vector space, then when humans developed music, we were discovering, say, some particular subset of a space. When we developed music theory, we stepped back, looked at our subset, and performed some principal components analysis to break it down into simpler parts. Again it is tempting to claim that the eigenvectors given by such a method are somehow canonical or natural in the sense that they ARE why we developed music a certain way, but that's not necessarily so!
As an analogy, we have seen that there are several symmetric ways to look at an n-gon, and there are several bases for a vector space, some more convenient than others in particular cases. (In more mathematical detail: a reflection across one line of symmetry can be written as a conjugate of a reflection across another, i.e. as the original reflection together with a "change of coordinates", just as a suitable change of coordinates can put some linear transformations into a simple form such as a shear.) In the same way, there might be another nice basis for music that perhaps more clearly explains why we call certain things music (i.e. they belong to the subset, i.e. we "recognize" them) and other things not (i.e. they don't belong to the subset) -- i.e. it might be more natural. Or perhaps there is no real "basis" -- there's some other mechanism (whim?) that causes us to identify what's music and what's not. Nobody says that we identify music by projecting the item onto some vector subspace.
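
To illustrate the point about bases with the earlier taste example rather than music, here is a small sketch. The (sweet, bitter) coordinates, the radius, and the helper names `rotate` and `recognized` are all assumptions of mine. It shows that a change of basis (here just a rotation) changes the coordinates but not what gets recognized, and that in one particular rotated basis the coffee/hot chocolate contrast lies along a single axis.

```python
import math


def rotate(point, angle):
    """Coordinates of a 2D point in a basis rotated by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * y, -s * x + c * y)


def recognized(new, learned, radius):
    """True if `new` is within `radius` (Euclidean) of the learned point."""
    return math.dist(new, learned) <= radius


# Hypothetical (sweet, bitter) coordinates.
HOT_CHOCOLATE = (6.0, 1.0)
COFFEE = (4.0, 3.0)
NEW_DRINK = (4.2, 3.1)
RADIUS = 0.5

# The contrast between the two drinks uses both axes in the original basis...
v0 = (COFFEE[0] - HOT_CHOCOLATE[0], COFFEE[1] - HOT_CHOCOLATE[1])
# ...but only one axis in a basis whose first vector points along v0.
angle = math.atan2(v0[1], v0[0])
v0_rotated = rotate(v0, angle)

same_answer = recognized(NEW_DRINK, COFFEE, RADIUS) == recognized(
    rotate(NEW_DRINK, angle), rotate(COFFEE, angle), RADIUS
)
print(v0, v0_rotated, same_answer)
```

Neither basis is "the" right one here: both describe the same set of recognized drinks, and the rotated one just happens to make this particular contrast look simpler.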
Anyway, we can probably extend this beyond problems of "recognizing": many different skills (musical instruments, sports, etc.) consist of a family of individual micro-skills that are often distilled down, after the fact, to core skills and given structure. Mathematics itself is built the same way -- the clean organization and categorization of subfields didn't always exist, nor does it need to!

Appendix:
Define "recognize": You "recognize" something if encountering it (e.g. tasting it) feels "familiar" rather than new -- you can construct it as a point on the line.
Define "radius of applicability": If another point (i.e. experience) is sufficiently "similar" to a "learned" point that we may say the new point is also roughly "learned", then the new point is within the learned point's "radius of applicability". We could hypothetically also quantify "roughly learned" with some finer model, but we want to begin with a simple model.
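
As a rough sketch of both options: the simple model is a hard cutoff at the radius, and one possible finer model replaces the cutoff with a score that decays smoothly with distance. The Gaussian-style falloff and the names `is_learned` and `learnedness` are purely my assumptions for illustration. Nothing above commits to this particular form, or to Euclidean distance.

```python
import math


def is_learned(new, learned, radius):
    """Simple model: learned iff within the radius of applicability."""
    return math.dist(new, learned) <= radius


def learnedness(new, learned, radius):
    """Hypothetical finer model: a score in (0, 1] that decays with distance."""
    d = math.dist(new, learned)
    return math.exp(-((d / radius) ** 2))


# Hypothetical coordinates and radius.
COFFEE = (4.0, 3.0)
NEAR = (4.1, 3.0)
FAR = (6.0, 5.0)
E = 0.5

print(is_learned(NEAR, COFFEE, E), is_learned(FAR, COFFEE, E))
print(learnedness(NEAR, COFFEE, E) > learnedness(FAR, COFFEE, E))
```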