I guess we've started. Let me start with a couple of announcements. The first is I guess I should remind you that Homework 1 is due today at 5:00. Several people have asked what they should do with their homework, and I should have said this maybe on the first day, although I guess it's most relevant now. One thing you should not do is give it to me. I am highly unreliable. I will accept it, by the way; that's why I'm warning you. You'll give it to me. It'll be in my hand. By the time I get back to Packard, who knows where it will be? The TAs are more reliable. You can give your homework to a TA if you personally know them and trust them. But really, the correct method is that there is a box; it's actually a file cabinet; it's explained on the website. But it's near my office in Packard. That's the right place where you should submit the homework. Okay, the next is that we have gotten more than just a handful of requests or suggestions to do something like a linear algebra or matrix review session. That's for people who either never had a detailed course on this or have successfully repressed the memories of the course they may have taken. So for those people, the TAs and I have been talking about it, and we'll probably do something. It won't be a formal session. It might simply be one of the office hours. One block of office hours will simply be devoted to this topic. And we would of course announce that by email and on the website. So that would sort of be the idea. Just out of curiosity, by a show of hands, if such a session were to be put together, how many people would go to it? Okay, we'll have a session. That simple. That's what we'll do. Okay, any questions about last time? If not, we'll continue. So, last time we started looking at this idea of the null space of a matrix. Now the null space is a set of vectors; it's a subspace, in fact, as its name implies. And it is a subset of R^n, the input space. That's what it is. So it's the vectors that are mapped to zero. Now what it gives is exactly the ambiguity in solving Y = AX. So it tells you exactly what you know and don't know about X when I give you Y in Y = AX. In particular, anything in the null space is going to map to zero. So for example, in a sensor or estimation problem, an X in the null space corresponds to some pattern or input for which your sensors all read zero. So for all you know, it's either zero or it's something in the null space. If the only point of the null space is zero, then that means that's the only way that can happen. This is what it means. Now in a design problem or in a control problem, something in the null space is a very interesting thing. It's something that has, you might say, zero net effect. So in the case of the mass problem, it would be a sequence of forces that are non-zero, so the mass goes flying off to the left, to the right, goes all over the place. But the point is that N seconds later, it is back at the origin with zero velocity. So that means that basically nothing happened. Okay? So that's the idea. Now one case of great interest is when the null space -- I can't see if -- It's not me, but that's twisted at an angle, so maybe you guys can rotate your camera there a little bit. There we go. Too much. There we go. Actually, that's still too much. I'll let you guys figure it out back there. So you say that you have zero null space; that's the slang. That means, simply, not that the null space is zero.
The null space can't be zero. The null space -- the data type is, it's a set of vectors. So when you say it's zero, you mean it's a set whose only element is the zero vector. But the slang is: the null space is zero. And some people call this one-to-one, and what it means is this: it is the same as saying that X can always be uniquely determined from Y = AX. It basically says, when does the transformation A not lose information? And one-to-one, of course, means that different X's get mapped to different Y's. That's literally what one-to-one means. Otherwise you might call it many-to-one; that would be the other way. In fact, one-to-many is not a function, because the definition of a function is that only one thing comes out. Another way to say this is that the columns of A are independent. So when you say something like zero null space, you're really just saying that the columns of A are independent. So it's two different languages for exactly the same thing. And you can check that. Why does it mean that? Because when you say AX = 0 implies X = 0 -- one interpretation of AX is that it's a linear combination of the columns of A, where the X's give you the recipe, the coefficients. So basically it says that the only way to add up a linear combination of the columns and get zero -- the only way -- is if the recipe is trivial, if all the X's are zero. That's the same as saying the columns are independent. Okay, now this part, everything up here, is easy to check, or it's basically just a restatement -- I mean, there's nothing to it. The next one is not obvious, and a few things here are not going to be obvious. I'm not going to show them now. I'll state them as unproven fact; later we'll come back and you'll be able to show all these things. This is an unproven fact. It says A has zero null space if and only if it has a left inverse. Now a left inverse is a matrix B which, when it multiplies A on the left, gives you the identity. That's what BA = I is. Now you may or may not have seen this. You have probably seen the idea of a matrix inverse, but maybe not a left inverse. Actually, how many people have even heard of the concept of a left inverse for a non-square matrix? That's good, because it used to be basically no one. So a left inverse means you multiply on the left and you get I. By the way, the meaning of the left inverse is actually great: it's a decoder is what it is. Because if BA = I and Y = AX, then if you multiply Y by B, what you get is X. So X = BY. When you see a left inverse, it means it is a decoder, or in communications it's a perfect equalizer. In any kind of signal processing it's a perfect deconvolver or something like that. That's what a left inverse is. Now what's interesting about this is that it's, of course, a much stronger statement than the first one. The first one says, yeah, you can uniquely determine X, given Y = AX. This says, actually, not only can you determine it, but you can get it by applying a linear function. That's not obvious, right? It could have been that you could uniquely get X, but you had to jump through all sorts of hoops and do all sorts of signal processing or something like that to get X. It turns out no. In fact, you can decode with a linear decoder. And the next one is this. It turns out this one is relatively easy to show. It says the determinant of A transpose A is non-zero. At the moment, that means nothing to you, but it will soon.
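Here's a quick numerical sketch of that decoder idea -- the matrix and the numbers are made up purely for illustration. One standard left inverse, assuming the columns of A are independent, is B = (A^T A)^-1 A^T:

```python
import numpy as np

# A "tall" matrix with independent columns (numbers made up for illustration)
A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])

print(np.linalg.det(A.T @ A))        # non-zero, so the columns are independent

B = np.linalg.inv(A.T @ A) @ A.T     # one left inverse: B A = I
print(np.round(B @ A, 10))           # the 2 x 2 identity

x = np.array([4.0, -1.0])
y = A @ x                            # "encode": y = A x
print(B @ y)                         # "decode": B y recovers x exactly
```

The point is just that multiplying by B undoes multiplying by A, which is exactly the decoder / perfect equalizer interpretation above.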
And I think this is one of the ones that's -- this one, I have to be very careful here, because some of these are either extremely simple -- three lines -- or they're actually very difficult. And all of the difficult ones, by the way, are equivalent to each other. Shall I try showing this? What do you think? So I reserve the right to absolutely give up one minute into this, and to pretend that I never thought -- that I knew ahead of time this was actually one of the tough ones. Okay, so we're just gonna see. We'll see if this is one of the easy ones. What we want to show is this: A is one-to-one if and only if det(A transpose A) is non-zero. Okay, let's try that. All right, so let's try a few things here. Let's start by saying: assume A were not one-to-one. Okay? That means, by definition, that its null space is not just zero. That means that there is some X which is non-zero, but AX = 0. That's what it means. It means there's a non-zero element in the null space. Now if AX is zero, then surely A transpose A X is zero, because I could write it this way, A transpose times (AX), and that's A transpose times the zero vector, which is zero. But I could also re-associate and write it as (A transpose A) X. Ah, but now we have a problem. We have a square matrix here multiplying a non-zero vector and we get zero. And that implies that the determinant of A transpose A is zero. Okay? So we just showed that if the first condition fails to hold, the second fails to hold. And now we do the opposite -- that's one way. Let's assume now that det(A transpose A) is zero. Assume that's zero. That means the matrix A transpose A is singular. That means there's some non-zero vector which gets mapped to zero, like that: A transpose A X = 0, where X is non-zero. Now let me ask you a question here. If I had scalars, and I wrote down -- actually, can I just cancel the A transpose off of here? That's an excellent question. Can I write A transpose A X = 0 as A transpose times (AX) = 0, and now say A transpose is non-zero, so I can just remove it and conclude AX = 0? Can I do that? Can you cancel matrices? No, in general, you absolutely cannot. It's entirely possible to have the product of two matrices, both non-zero, and get zero. So you can't cancel matrices. Okay? That's not going to help, but watch this. Here I'll show you a trick -- you may have to pan up here. It's going to set the pace for the class. I can move it up. All right, what happens is this. We know A transpose A X = 0, and here's the trick. If this is true, I can certainly multiply on the left by X transpose and get X transpose A transpose A X = 0, because X transpose times a vector that's zero is surely zero. That's the case. Now I'm going to rewrite this in a different way. I'm going to write it as (AX) transpose times (AX). And that's just one of the rules for transposition: you transpose a product by transposing each thing and reversing the order. Okay? Now you may know -- well, I guess it's part of the prerequisite; I think actually we're going to cover it in a few slides anyway -- that the transpose of a vector times itself is actually the square of its norm. That's what this is. This is nothing more than the sum of the squares of the entries: the sum over i of (AX)_i squared. Now when a sum of squares of numbers is zero, every single number is zero, period. And in fact, what this says is, therefore, AX is zero.
That's just what we wanted, because now I have an X in the null space of A, and it's non-zero, and therefore A is not one-to-one. Okay? By the way, I won't do this very often. I would actually encourage you to see how much of this you can do yourself -- most of the statements in the notes are things you should check. These things get boring after a while, but you should actually try to check as much as you can. Most of it is quite straightforward. Some of this stuff, for now, is not. Okay, so let me say something about the interpretations of the null space. Well, suppose Y = AX describes a measurement setup or something like that. Then what it says is that something in the null space is undetectable. It is a pattern that is undetectable from the sensors. Because when you have that pattern, all the sensors just give you a flat zero reading. And you don't know if maybe what you're looking at is zero. In fact, it gives you the exact same signature: zero. It also says this: X, and X plus anything in the null space, are completely indistinguishable according to your measurements -- totally indistinguishable, because they give you the same readings, the same Y, the same AX. So in fact the null space characterizes the ambiguity in the measurement. That's what it does. Now if Y = AX represents something like an output resulting from an input or action X, then basically it says that something in the null space is an action -- a non-zero action -- that has no net result. That's what it means, and in fact in this case the null space characterizes the freedom of input choice for a given result. So actually having a big null space in a measurement problem is, generally speaking, bad, because it means there's lots of ambiguity, if what you're trying to do is recover X. If, on the other hand, X represents an action that you can take, and Y is an outcome, having a big null space is great, because it says there are lots of X's. And you have an organized way, in fact, of finding X's that will satisfy the requirements. So that's good, and it means that you can drop back and optimize something else. You can drop back and find a small X or whatever you like. Now the range of a matrix is another one of these fundamental subspaces. It's denoted with a calligraphic R or a script R or something like that: range of A. This is also a subspace. It is a subspace of the output space. So it is actually a subset of R^m, and it is simply the set of all vectors you can hit: the set of all vectors that have the form AX, where X ranges over all of R^n. So this is a subset of R^m. Okay, unless your matrix is square, the null space and the range are sets of vectors of completely different dimensions. Now you can interpret it this way: it's the set of vectors that can be hit by the linear mapping Y = AX. That's what it means. It's the set of all things you can achieve, or something like that. But in terms of the columns of A, it's also very simple: it's the span of the columns. So null space zero means the columns are independent; the range is simply the span of the columns. Okay? And you can also think of it this way: it's the set of right-hand sides Y for which AX = Y has a solution. So that's the idea. Now here it has all sorts of interesting interpretations and applications in different areas. So for example, suppose Y = AX represents a design problem where X is an action and Y is an outcome.
The range of A is the set of all things you can actually achieve. So if someone says, "I would like this outcome," but that outcome is not in the range, it means, "Sorry, I can't do it. I can't get that outcome. Maybe I can get close" -- we'll talk about that -- "but I can't get that outcome." In the case of a sensor reading, it is also very interesting. For Y = AX, being in the range means it is a possible sensor measurement. What would be a practical use for the range in a case like that? [Student:][Inaudible] Got it, that's it precisely. So what would happen is this: if you believe the system is modeled by Y = AX, where Y is a vector of measurements, then when a Y comes in, the first thing you do with Y is check: is it in the range? If the answer is yes, no problem; you can then proceed to whatever you're doing. If the answer is no, you want to throw an exception. You want to throw an exception to whoever passed you the Y, and basically say: sorry, that Y could not possibly have been something of the form AX. By the way, that might mean one of your sensors is off. So this could be a sensor-failure detection routine or something like that. And you could do all sorts of interesting stuff there. How would you check? Suppose you knew that one sensor was bad, what do you do? What's the algorithm? How do you check it? That can be a homework problem, by the way. Want to make a note of that? That's simple. So let's talk about it. How do you do that? If I give you Y = AX, X is in R^10, Y is in R^20. Roughly speaking, you've got about a 2:1 redundancy of sensors over parameters you're estimating. But there's a trick. One of the sensors has failed. How would you find it? So I want you to tell me how to find out which sensor. I want you to come back and say, "Sensor 13 is no good." How do you do it? [Student:]You put in two inputs. You know what, actually it's passive. You don't have -- you can't do that. You can't control X; you're just given readings, that's all. You had a suggestion. [Student:]Take the [inaudible] of Y [inaudible]. I don't think you can take the inner product of Y with the rows of the sensor matrix, because they have different dimensions. The rows of the sensor matrix live in R^n, the input space -- that was R^10 in our example -- and the output is in R^20. But you're close. What would you do? [Crosstalk] Find the null space of what? [Student:] [Inaudible] Let me ask you this. Suppose I have a sensor suite, and I say, "Please take out sensor 13." What does that do to the matrix? How do you express that in terms of what you do with Y = AX? Y used to be 20 long; now it's 19. How do you get that Y, and what do you do with A? What do you do? [Student:]You delete the 13th row. You delete the 13th row of A. Okay, so that's what you do. So basically, if everything is fine, it's Y = AX -- that's what it's been for a while -- and I come along and say, "Nuh uh, sensor 13 is going out for maintenance." Then you have a reduced A -- A_red -- where you remove the 13th row. That's what it means. Does everybody have this? So now you have a 19 x 10 matrix. Okay. You can calculate the range of that 19 x 10 matrix. Let's say you can do that -- you will, within one lecture. You can calculate the range of that matrix, and you can check if the reduced measurement is in the range. If it's in the range, what does it mean after I remove the 13th row? You have to be careful. What does it mean? It's kind of complicated.
[Student:][Inaudible] I'm going to turn it around and say it this way: if you remove a row, and you look at the reduced Y, and it's in the range of the reduced matrix, what it means is this: the 13th sensor might not have been bad. Sorry, it's a strange statement, but that's exactly what it means. What if I remove the 13th row, I look at the range of that reduced matrix, and actually the reduced Y is not in the range? It means you still have a bad sensor. It means you're still wrong. It means the 13th was not the problem. I'm assuming there's only one sensor failure. I have a feeling that I'm just making things more confused. This is what we call a homework problem. You will be seeing this shortly. Trust me. Okay, so. Or you can go back and see if any of this makes sense later. All right? Now, there's a special case, which is when you can hit all vectors in the output space. That's when the range is R^m. That means you can hit everything. That says you can solve AX = Y for any Y. That has all sorts of obvious interpretations; we'll get to that. Another way to say it is that the columns of A span R^m. That's what it means. Now in this case, this one is not obvious: it says there is a right inverse of A. So the first statement says, for any Y you give me, I can find an X for which AX = Y. This third bullet here is much more specific. It says, not only that, but in fact I can construct the X as a linear function of Y. That's what it means. So here, if in fact there's a B with AB = I, that's a right inverse. What it says is this: it says that A(BX) is X -- well, of course, because IX is X. On the other hand -- you know what, I shouldn't have used X there. Sorry. There we go. It was perfectly okay, just horrible notation. What this says is -- and this is quite specific -- "Oh, you want an X for which AX = Y? No problem. Take your Y and multiply it by B." So B actually is an automated design procedure. That's what it is. If this is a design problem, it's a design procedure. It says, "You want this velocity and position? Multiply by B; that gives you a force program that does it." Okay, so that's what it is. This is the same as saying that the rows of A are independent. Again, you can check. And now it gets tricky: this is the same as saying the null space of A transpose is zero. Now, some of these look complicated. Some actually are, most are not, but I am going to defer for a lecture, until we show all of the details. So this says the null space of A transpose is zero. And these two are actually basically the same thing, because to say that the null space of a matrix is just zero is basically saying that its columns are independent. But the columns of A transpose are the rows of A. And so this one and this one are simply the English equivalent of that statement. Okay? And this one would probably follow from one on the previous page or something like that. I want to warn you, some of these are not obvious. But by the way, you should sit down and see how many of these you can actually show. You won't be able to show some of them, like that one. Maybe that one, I'm not sure, right away. But within a lecture you will be able to. Okay, so let's look at various interpretations of range. Suppose a vector V is in the range of a matrix, and a vector W is not in the range. We'll see what this means. Suppose Y = AX represents a measurement.
Then if you observe V, that means it's a possible -- or maybe a better name is a consistent -- sensor signal. We just talked about that a few minutes ago, when I confused all of you, or most of you anyway. If, however, you get a sensor measurement which is not in the range, that basically says it's impossible, if you really believe Y = AX, or it means it's inconsistent. That's what it means. Now if Y = AX represents the output resulting from an action or input or something like that, then it says: if you're in the range, it's something that's possible; it's achievable. However, something not in the range is something that simply cannot be achieved. That's what it means. So in this case, the range of A characterizes possible results or achievable outputs. That's what it does. Okay, now so far the size of A has been arbitrary; it doesn't have to be square. And in fact, you've now found my greatest complaint with first linear algebra classes -- aside from the fact that they are profoundly boring -- which is that they focus on square matrices too much, which in engineering has essentially no use. We'll see why. In fact, most times in any kind of engineering application, when you have a square matrix -- we'll see some examples later -- but in general it means something is deeply wrong. We'll see why; you'll know why later, why in fact matrices should either be fat or skinny or something like that. A square matrix kind of means that you set things up so there are just exactly enough knobs to turn to do what you need to do. That's silly. The whole point is you want to exploit redundancy or something like that. There are cases where you want exactly enough: if you want to measure six things, and sensors are super duper expensive, and for some reason you can only afford six, okay, fine. But in general, this makes absolutely no sense. It's not robust either. In fact, if you have seven sensors to measure six things, at least then you can add a nice layer of software that will detect and remove a sensor that fails, automatically -- using the method which I deeply confused all of you with, and which you will see on the homework shortly. Hey, what do you think? Do you think we're allowed to, even though we've posted Homework 2? I think we can do it, we'll see. All right. We just have to make -- it may have to wait until Homework 3, we'll see. Now we're going to look at square matrices. We say a square matrix is invertible, or non-singular, if its determinant is non-zero. And this is equivalent to many, many things. Things like: the columns of A are a basis for R^n. That means they span R^n -- that means the range is everything -- and it means that they're independent, which in this case is the other half of being a basis. It means the null space is zero. That's the slang; the null space is the set consisting of the zero vector. I don't think -- I quit. I'm going to start saying the null space is zero. It also says the rows of A are a basis for R^n. It says Y = AX has a unique solution X for every Y in R^n. It means there's a matrix that is both its left and its right inverse, and it's the only one. So if you're a left inverse of a square matrix, you're also a right inverse. Okay? It's the same as saying the null space of A is just zero and the range is everything. And it's the same as saying that det(A transpose A) and det(A A transpose) are non-zero.
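Here's a rough numerical sketch of that sensor-isolation recipe from a few minutes ago -- the 20 x 10 matrix, the index of the bad sensor, and the tolerance are all made up for illustration, and the check "is the reduced Y in the range of the reduced A?" is done here by looking at a least-squares residual:

```python
import numpy as np

# y = A x with A 20 x 10; we suspect exactly one of the 20 sensors is bad
rng = np.random.default_rng(3)
A = rng.standard_normal((20, 10))
x_true = rng.standard_normal(10)
y = A @ x_true
y[12] += 5.0                      # sensor 13 (index 12) reports garbage

def consistent(A_red, y_red, tol=1e-8):
    # Is y_red in the range of A_red?  Check the least-squares residual.
    x_hat, *_ = np.linalg.lstsq(A_red, y_red, rcond=None)
    return np.linalg.norm(A_red @ x_hat - y_red) < tol

print(consistent(A, y))           # False: the full y is not consistent with y = A x

for k in range(20):               # try deleting each row in turn
    A_red = np.delete(A, k, axis=0)
    y_red = np.delete(y, k)
    if consistent(A_red, y_red):
        print("sensor", k + 1, "might be the bad one")   # prints sensor 13
```

With only one bad sensor, deleting the right row makes the remaining readings consistent again, which is exactly the "might not have been bad" statement from above.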
By the way, for a square matrix, this last one is easy to show, because here I can write det(A transpose A) as det(A transpose) times det(A), right? That I can do. Can I do that in the earlier, non-square case? If I wrote det(A) times det(A transpose) there, what would you say to me? [Student:]Syntax error. You'd say syntax error, exactly. Why? Because det takes as its argument only a square matrix, and in this problem there is M and there is N and they need not be the same. So this is a syntax error, which is a terrible, terrible thing, right? Because it means the most casual parsing would reveal the error. We'll see later that there are some deeper semantic errors. That's where the syntax works out but the meaning is wrong. Strangely, a semantic error is actually less of a crime. Okay, but in this case they're square, so we can write these out, and these are very straightforward. So what's the interpretation of an inverse? Well, I mean, this is kind of obvious. It basically undoes the mapping associated with A. Actually, the cool part is it's both -- it works equally well as a post-equalizer or a pre-equalizer or a prefilter, and I'm using some of the language of communications here. In other words, you can send something through a channel and then apply B to reconstruct what happened. That's a traditional equalizer. Or, before you send the signal, you can process it through B, and that has lots of names, like a pulse shaper or pre-equalizer or something like that, in communications. So that's the idea. Okay, so that's that. That you've seen. I'll mention one other thing involving that. One interpretation of the inverse -- in fact, you can also use it for things like left inverses and right inverses, but let's take B = A^-1 -- let's look at the rows of the inverse of a matrix. Well, let's write this: Y = AX. If you write it out column by column, that's Y = X_1 A_1 + ... + X_n A_n, and of course X = BY. The first one is Y = AX, and the second is just X = BY, because B is the inverse here. Now, I want to write Y out column by column, this one, and I'm going to write X out row by row. That's X_i = B_i transpose Y, where B_i transpose is the i-th row of B, because when you multiply a matrix B by a vector Y, the row interpretation says you take the inner product of each row of B with Y. Did I get that right? I did. Hey, that says, look: these X_i's I can plug in here and rewrite it this way. And now you see the most beautiful thing. You see what the rows of the inverse of a matrix are. They are things that extract the coefficients -- so this basically says Y is a linear combination of the A_i's. Now if you want to know what linear combination -- how do you get the mix, what's the recipe? -- it says you take the inner products with the rows of the inverse matrix, and these give you exactly the coefficients in the expansion of Y in the basis A_1 through A_n. Okay? Sometimes people call A_1 through A_n and B_1 through B_n dual bases. That's actually kind of common now in some areas of signal processing. This goes back a long way, and it's interesting, actually, because we're going to look at some other stuff later. But let me ask you something. What can you say about B_i transpose A_j? What is that equal to? What is it? [Student:][Inaudible] Yep, that's one if i = j, and it's zero if i ≠ j. Okay? By the way, very soon we're going to talk about things like orthogonality. So some people refer to these as biorthogonal sets or something like that.
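As a small sketch of that dual-basis fact (the 2 x 2 matrix here is made up): the rows of A^-1 extract the coefficients of Y in the basis given by the columns of A.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # invertible; its columns a_1, a_2 are a basis for R^2
B = np.linalg.inv(A)                # rows b_1^T, b_2^T form the dual basis

print(np.round(B @ A, 10))          # entry (i, j) is b_i^T a_j: 1 if i == j, else 0

y = np.array([5.0, -2.0])
x = B @ y                           # x_i = b_i^T y: coefficients of y in the basis {a_1, a_2}
print(np.allclose(A @ x, y))        # True: y = x_1 a_1 + x_2 a_2
```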
In other words, it's kind of weird: it says that each B_i is orthogonal to all the other A_j's. But what do you know about this -- what can you say about A_i transpose A_j? Not much. There is only one thing I can say something intelligent about: A_i transpose A_i. What can I say intelligent about that? It's what? [Student:]It's non-zero. How do you know it's non-zero? [Student:][Inaudible] So I would have a column of A that's zero -- what's wrong with that? I could remove it. So, remove it. [Student:][Inaudible] Thank you. It's all the same thing. Yeah, A could not be invertible if it had a column that was zero. Right. Okay. All right. All of these things are very abstract now, but we'll see them in applications and it will -- yeah? [Student:][Inaudible] No, they're not. Ah, but a basis doesn't mean that they're orthogonal. We'll get to that. Yeah. We'll get to that. By the way, if your background is physics, or something like applied physics, a lot of the matrices you encounter will be not only square but symmetric. And a lot of the operators will be self-adjoint. And that means a lot of things, because that's what you've always seen. If that's the case -- how's my diagnosis? That's a diagnosis there. Okay, so in that case this class is great for you. You're going to see all sorts of stuff, and you're going to learn all the things that are almost always true -- like eigenvalues are real, for you, right? Yeah. Got it. Good. You'll be fine, everything will be great. Actually, it's going to come up in physics, too. So we'll see. There will be cool cases where it's not symmetric. And then weird things will happen, and all your fellow physicists will be deeply and profoundly confused, because for them everything is symmetric, all bases are orthogonal, and so on. Rank of a matrix. The rank of a matrix is simply the dimension of its range. And here are some facts; these are not obvious. Things like: the rank of A is the rank of A transpose. Another way to say this is that the rank of A is the maximum number of independent columns, or rows, of A you can have. And that means immediately that the rank of a matrix cannot be any more than the minimum of its input and output dimensions -- the number of columns or rows. I actually said those backwards, but that's fine. That's what it means. If you have a 3 x 5 matrix, its rank could not possibly be more than three. It is at most three. Here's a very basic fact. Some people actually make a big deal about this; they circle it, and it's in some fancy box and everything, and it's sometimes called the fundamental theorem of linear algebra or something like that. They make a big deal. They have a theme song, and anyway -- this is sometimes called the fundamental theorem of linear algebra. It's beautiful, but it's not that big a deal. Actually, you'll find out it's much cooler to be able to do stuff with linear algebra than to celebrate pretty formulas. You know, it's like e^(i pi) + 1 = 0; you get over that and everything's fine. What this says is very interesting. It says that the rank of the matrix -- that's the dimension of the set of things you can hit -- plus the dimension of the null space -- that's the dimension of, sort of, the ambiguity set, the set of things that get crunched to zero -- is equal to N, which is the input dimension. So rank(A) + dim N(A) = n. My interpretation of this -- well, I'm sure lots of people interpret it this way -- is that this is something like conservation of dimension, or, if you like, something like degrees of freedom.
I mean, maybe that's a better way to say it. Don't take any of this too seriously; it's just to give you the idea. So the rank of a matrix is the dimension of the set it hits, but let's make this very specific. Let's take A in R^(20 x 10) -- well, we just had something that was 20 x 10. By the way, the technical name for that is a skinny matrix, because it's got 20 rows and ten columns. I'll use that terminology; it's going to come up a lot. So I have a skinny matrix, 20 x 10, and I'm going to tell you the rank of A -- by the way, could it be 11? No -- and I'm going to make it eight. Okay? So there it is: a 20 x 10 matrix whose rank is eight. Now what does this mean? It means that if you ask what you can hit -- what's the set of things that have the form AX -- the answer is this. First of all, that's a subspace of R^20, so you have twenty-long vectors and there's a subspace. It could have a dimension up to ten, but in fact it only has a dimension of eight. That's what it says. It's got a dimension of eight. So basically it says, if X is an action and Y is a result, the set of things I can effect or do is actually eight-dimensional. That's what I can do. I can do eight dimensions' worth of stuff. But wait a minute: in this case I had ten input knobs, so roughly speaking, you would look at that and say, if what you can do is only eight-dimensional and you have ten input knobs, you have two redundant knobs. Guess what -- that's the dimension of the null space. The dimension of the null space of A must be two, because they have to add up to ten. That's the idea here. It basically says the total number of degrees of freedom going in, minus the degree of ambiguity, is equal to the total dimension of what you can affect. That's kind of what it says. It's a good idea to have a rough idea of what this is, and it will get much more specific in specific cases. This one is very loose: each dimension of input is either crushed to zero or ends up in the output. That's -- boy, I wouldn't put that in anything but notes, because, in some sense, I don't even know what "a dimension of input" is. What I'm saying is that if any of you said this, boy, we'd come down hard on you -- unless it was in casual conversation -- but in anything other than that you would not want to say it. On the other hand, as a vague idea it kind of explains what's going on. Now there's another interpretation of rank, which is actually very, very nice. It's the idea of coding, and it comes up in things like transmission, and actually it comes up in compression, all sorts of things. So here it is: if you have a product of matrices, then the rank of the product is at most the minimum of the ranks of the things that you multiply. This sort of makes sense if you look over here and think about what you can do -- this one up here is actually very, very easy to show. But now, what this says is that if I write a matrix A in factored form -- if I write it as B, an M x R matrix, multiplied by C, an R x N matrix -- you can think of R as an intermediary dimension, the intermediate dimension. It basically says: I have something that looks like this, that's A, but I'm going to write it this way, I'm going to factor it, and that means basically I'm going to write it as a product of two things. Actually, if you like, instead of one piece of code, it's gonna call two, one after the other. Okay?
And what this says is that, in this case, R is the intermediary dimension, the intermediate dimension here. That's what it is. Now if you can factor something like this, the rank is at most R; and in fact, if the rank is R, you can factor it this way. So I can write it this way. Now this doesn't look like it has any -- it's interesting, it's actually extremely interesting, and tons of super practical stuff is based on just this idea. But we'll just look at one now. Actually, later in the class we're going to do something even cooler. Very soon you'll know how to take a matrix and make a factorization like this. This has lots of applications immediately. Like, suppose I want to transmit Y, given X, but I pay for the number of real numbers I have to transfer between them, right? If N is 100, and M is 200, but R is 3, this makes a lot of sense, because what happens is: first I do this processing on the transmit side, I transmit three numbers -- or whatever I said the rank was -- and then Y is reconstructed over here. Already this should suggest ideas of efficient transmission, compression -- all that kind of stuff is very close to this. Let's just look at one application. Let's suppose I need to calculate a matrix-vector product Y = AX. Now you would think that can't be that big a deal. Why would that be a big deal? And it would not be a big deal if A were, for example, 5,000 x 5,000, right? Or, depending on what we're talking about here, A could be 20,000 x 20,000, and you could do this using MPI and a whole bunch of processes, or something like that. Or A could be 100 x 100 -- actually, 40 x 40 -- and you could do this shockingly fast. Okay? I'm just multiplying: Y = AX. What are you doing when you're multiplying by A? This could be an equalizer. It could be a precoder. It could be some estimation you're doing; it could be anything. So we're not worried about that. Now, suppose you knew that A could be factored -- and the interesting case is when the rank is much smaller than either M or N. But let's look at this. Now, if I compute Y = AX directly, that's MN operations, because I calculate the inner product of each row of A with X, and each of those requires N multiplies and N - 1 adds -- you know, roughly. Okay? That would be MN operations. With a typical sort of gigaflop machine -- well, my laptop comes out faster than that -- so it's pretty fast. That's okay, because as fast as you can do this, there'll be some application where you want to do it faster, or you want to handle bigger things. That would be MN operations. Now suppose you do it this way: you write A as BC, and you associate it this way. You first operate by C, and then by B. Well, if you multiply by C first, it's going to cost you RN operations, then it costs you MR, and that's (M + N) x R. Okay? Just to plug in some numbers here: let's make both M and N equal to 1,000, and let's take R = 10. So it's a 1,000 x 1,000 matrix and its rank is 10. The saving now is pretty obvious. In the first case it's 10^6 and in the second one it's 2 x 10^4, so it's a million versus 20K. Okay? That's a 50:1 savings in multiplying AX. Has everybody got this? This is the idea. By the way, later in the class we're going to look at very good methods for not just -- very soon you'll know how to factor a matrix this way. If it is low rank, you'll factor it. Later in the class, we'll do some things that are very cool.
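Here's a quick sketch of that 50:1 saving with made-up numbers (the rank-10 matrix is built by multiplying two random factors, just so there's something to work with):

```python
import numpy as np

m, n, r = 1000, 1000, 10
rng = np.random.default_rng(0)
B = rng.standard_normal((m, r))
C = rng.standard_normal((r, n))
A = B @ C                        # a 1000 x 1000 matrix of rank (at most) 10

x = rng.standard_normal(n)
y_direct   = A @ x               # ~ m*n = 10^6 multiply-adds
y_factored = B @ (C @ x)         # ~ (m + n)*r = 2*10^4 multiply-adds
print(np.allclose(y_direct, y_factored))   # True: same result, about 50x less work
```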
We'll actually look at how to do approximate factorizations. That's even cooler. You will be given a matrix A, and you will know how to find, for example, an approximation of A that's rank ten. Okay? This has tons of uses all over the place. Okay, that's what it means. If you see a matrix that's low rank, it says you can operate with it quickly, or it means it's got something to do with compression or something like that. [Student:]How do you know if a matrix is low rank in the first place? How do you know if a matrix is low rank in the first place? You will know the exact answer -- if you want exact rank -- in one lecture. The much more interesting, much more practical question is this: given a matrix, how do you know if it's approximately rank 10? You'll know that in about five weeks. By the way, did you notice that I didn't answer your question at all? Okay. But I did it forcefully. So that's okay. We put your question on a stack; it will be popped. Okay? One last topic is change of coordinates. Now, the standard basis vectors are these e_i's, and you obviously have the expansion X = X_1 e_1 + ... + X_n e_n. I mean, that's kind of silly; it's a pedantic way of writing it out. But you would say in this case that the X_i are the coordinates of X in the standard basis. Normally, you just say the X_i are the coordinates, or the entries, or something like that. That's how you write it. Now, if you have another basis of R^n -- if T_1 through T_n is another basis -- you can expand a vector X in that basis, and we'll call the coordinates ~X_1 up to ~X_n. Okay? And of course, the fact that it's a basis tells us two things. It says that the T's are independent, which means there's only one way to write it this way. And it also means that any X on the left-hand side here can be written in this form. Okay? It turns out we can get these ~X_i very easily, by a couple of matrix operations. So here the ~X_i are the coordinates, and what we're going to do is write this out in matrix form: we'll write it as X = T ~X, where T is the matrix whose columns are T_1 through T_n. This is basically a compact representation of that expansion. But now, immediately, we get the answer: T is invertible, and therefore ~X is simply T^-1 X, and indeed it turns out that the inner product with the i-th row of T^-1 extracts the i-th coordinate in the T basis. Okay? Where would this come up all the time? It comes up in things like vision and robotics. And actually, it comes up in navigation as well. The rest of us just use a fixed coordinate system and seem to go through life just fine, but for some reason, in things like dynamics and navigation, everyone has to have their own personal coordinate system, and so you have to transform things from one to the other. For example, in robotics, every link of the robot arm has its own personal coordinate system, and you have to transform between them. I don't know why people do this, but they do. This is the kind of thing that would allow you to do that. Now you know what it is: you take the inverse of the matrix, and its rows extract the coordinates, and things like that. Now, you can also look at what happens to a matrix. If you have Y = AX, this is all in the standard coordinate system, and now suppose you write X in the T coordinates.
And then we're going to write Y also in the T coordinates. So the T coordinates will be ~X and ~Y, and then, just by working it out, you get ~Y = (T^-1 A T) ~X. That's what it looks like. So this thing is something that's going to come up several times, and it should light up a light in your brain. First of all, it's got a name: it's called a similarity transformation. For the moment this is all very abstract, but you should think of it as representing Y = AX with both X and Y represented in a different basis, and you get something that looks like that. So we'll get back to that later. Now, for some reason, at the end of this lecture there's some stuff that's more elementary than all of this and should have been at the beginning. But that's okay, we'll do it now. It has to do with a quick review of norms and inner products and things like that. Now, the Euclidean norm of a vector is the square root of the sum of the squares. The sum of the squares you can write as X transpose X -- that's the inner product of a vector with itself -- and the norm is the square root of that. And it's supposed to measure the length of a vector. And indeed it does: when N = 1, it's the absolute value. When N = 2, it's the Euclidean distance in the plane, and for N = 3, it's the Euclidean distance in R^3. But it makes sense to talk about the norm in R^500. It makes perfect sense, and people talk about it all the time. Now there are various things related to this, but I'll mention some of the basic ideas. So, it's supposed to measure length. It's supposed to be a generalization of absolute value. By the way, the double bars are there to distinguish it from the absolute value. But presumably, if you were super, super cool, and kind of just grew up from a child onward using vectors and matrices, you would probably write it with single bars, and there are people who have reached that level of linear algebraic development that they write it that way. The same way we've gotten to the level where we don't mark zero; we don't put a doodad on it. We don't put an arrow on it, like our friends in physics. We don't make it bold or something like that. So for zero we're fully developed. Zero is totally overloaded for us: it's the same symbol for the number, the complex number, matrices, vectors, everything. This is a holdover. Actually, it's 150 years old, maybe more, something like that. There's only one minor catch here. Unfortunately, there are a bunch of fields where it's useful to interpret the single bars as the element-wise absolute value. That's the hitch there, but it's supposed to be a generalization of this. So the norm satisfies a bunch of properties; these are all easy to show. This one, I believe you did show, or will be showing. Is that right? No, it's close to something you will show or will be showing, which is the Cauchy-Schwarz inequality. Is that right? On this homework? I'm always a couple of homeworks ahead, so I can't remember. Okay, this is the triangle inequality. It's kind of obvious. It says, basically: if that's X and that's Y, that's X + Y, and the length of X + Y is no bigger than the length of X plus the length of Y. The picture makes it totally obvious when they would be equal, and that's when X and Y are aligned. This is called definiteness. It says if the norm is zero, the vector is zero.
That's easy to show: if the square root of a sum of squares is zero, the sum of the squares is zero. But if a sum of non-negative numbers is zero, each number is zero. That means each X_i^2 is zero, and therefore each X_i is zero. Now, there are some things related to the norm. For example, it's extremely common in a lot of applications where N varies. By the way, if N doesn't vary in an application -- if you're doing control or something like that, where N is fixed, like six, and it's the three positions and the three momenta -- you don't divide by N or something like that. But in other areas, like signal processing, you get vectors of different lengths. For example, an audio clip is a vector, but N depends on the length of the audio segment. And you want to talk about different things like that. Or it could be any other signal. It's extremely common to normalize the norm by the square root of N. The sum of the squares of the elements divided by N is called the mean square value of the vector X, and the square root of that is the root mean square value -- the RMS value of a vector. And it just gives you a rough idea of the typical size the entries would be. There are actually some things you can say about it. So if the RMS value of a vector is one, or 1.3, it means that basically, if you look at a random entry, it should be on the order of 1.3. By the way, if a vector has an RMS value of 1.3, how big could the entries be? Could they be 1,000? Could you have an entry that's 1,000? [Student:]It depends on [inaudible]. I'll simplify your answer. Ready? Yes. That's his answer. That's correct. You can have a vector whose RMS value is 1.3 and have an entry in it that's 1,000. That only works if you have a really big vector. You would have a lot of zeros and then one really huge entry. You couldn't have a four-long vector with an RMS value of 1.3 and an entry of 1,000. There are things you can say about that, but that's the rough idea. Now, the norm is the length from zero. If you take the norm of a difference, that gives you a distance. This is very important. It allows you to talk about the distance between two things. This is what allows you to talk about the distance between, for example, two audio signals, or two images. And in fact here you might even give an RMS value, and in both cases it would be very interesting. What would you do? You might have the original and some encoded and subsequently decoded signal, and you would say, "I was within 1 percent." The RMS error is 1 percent: that means the norm of the difference divided by the norm of, say, the original signal is about 0.01. That's what that would mean. Now, the inner product -- of course we've been using it all the time. There are lots of ways to write it, but one way is this: it's just X transpose Y, nothing else. In fact, we won't write it any other way. Actually, it's kind of cool; there are other ways people write this, and I'll show you some of them. Here's one that you see -- again, we're going toward physics now; this would be the notation there. There are others. You could write (X, Y); that's one way. And I've seen this. You could also write X dot Y, and depending on what subfield they're in, the dot is either little or a big thick thing. These are different notations for the inner product.
Does anyone know any others? Because there are probably some others, too. All right, so for the inner product, you can work out these properties; it's quite straightforward. In fact, I don't think I'll say much more about that; I'll zoom on, because these are kind of obvious. The Cauchy-Schwarz inequality is very important, and it says this: the inner product of two vectors, in absolute value, is less than or equal to the product of the norms. From this you can immediately derive, for example, the triangle inequality. But that's what this says. And this is something I guess you show on the homework, or have shown, or will show in the next half day or something like that. The angle between two vectors is simply the arc cosine of the inner product divided by the product of the norms. Now, this number here is between plus and minus one. And the arc cosine -- let's draw the cosine -- we usually take between zero and 180, or something like that. So we don't distinguish between minus 10 degrees and plus 10 degrees. It's the unsigned angle, is the way to say it. So it's the angle between things. And this allows you to do all sorts of cool stuff: to talk about two images, and say that they have an angle of 3.7 degrees with respect to each other. It makes perfect sense. Or an image can come in, and you have a database of 10,000 images, and you could ask which of the 10,000 images has the largest inner product with the one that just came in, which would mean the smallest angle. And this would make sense, and you can already start guessing these things have applications. It's kind of weird to talk about the angle between two vectors in, for example, R^1,000,000, because that's what you're doing if you're looking at 1K by 1K images. After a while -- don't try to visualize R^1,000,000, by the way. Or at least work your way up there, because you could be in big -- yeah. Start with R^4; when you're comfortable at R^4, maybe R^5 or R^8, something like that. There also might be some drugs -- some pharmacological things -- needed to really, really understand it. Actually, Tom Cover gives a wonderful talk basically showing, beyond a shadow of a doubt -- it makes it completely clear. You know, we pretend. So people ask, "What are you doing here?" when we're talking about the angle between two images. "There's no such thing as the angle between two photos." And I go, "Yeah, sure, it's 3 degrees." Then I'd say, "You don't understand, we do this kind of applied math stuff, and the way it works is we talk about distances and angles, but in huge-dimensional spaces." And they say, "Can you really visualize that?" At that point you have the choice of either lying and saying, "Oh yeah," or what you would really say is something like this: "Well, no, not really." The truth is not even close, right? But you'd say, "Well, not really, but we use our pictures that we draw on two-dimensional paper, or in our offices -- you get a grad student, you say hold your hand up here, and then you get in there and you poke around this way. We use that intuition from R^3 to guide us in making image compression algorithms in R^1,000,000, right?" So you might ask, how useful is your intuition from R^3? Tom Cover has a series of examples showing that you know nothing. He'll show you all sorts of things. They'll be true for N = 3, 4, 5; from six on they're so false it's amazing. Things are like, wow -- I should collect some of these from him because they're really good, and there are about four of them.
One is: if you pick a point at random in the unit ball in R^20 -- I think they call it sphere hardening -- there's something like a 99.99999 percent chance that you are within 1 percent of the boundary of the sphere. Just totally insane things. But okay, just to let you know, so that at least we were honest about it. It's very important to understand what it means to have a positive inner product or a negative inner product or something like that. I should say this: if X and Y are aligned, that means they point in the same direction, and that says that the inner product is the product of the norms. If they're opposed, that means the inner product is as small as it can be. If they're orthogonal, it means there's a 90-degree angle between them; it means X transpose Y is zero. And it's very important to understand what it means to have a positive inner product or a negative inner product. So a positive inner product basically means that the angle between them is between zero and 90. If you have two vectors and you twist them apart, the inner product starts out quite positive -- it's as positive as it can be when they're aligned -- and as they twist apart like this, when they get to 90 degrees the inner product is zero, and then the inner product starts getting negative, like that. So a very, very crude thing to say would be something like this: if the inner product between two vectors is positive, they point roughly in the same direction. And I mean really roughly. I mean they're not fighting each other. If you add them -- that's actually one of the characterizations -- the norm of X + Y is bigger than the minimum of the two norms. You make progress. If you have a negative inner product, it means the angle is between 90 and 180. This allows us to define things like a half space. If you take the set of X's that have a negative inner product with a given vector Y, that's this. You draw the vector Y, and the vectors that have a zero inner product with Y -- those that are orthogonal to Y -- form the hyperplane with Y as a normal vector. That's here. This shaded region -- these are the vectors whose inner product with Y is negative. So they have an angle of more than 90 degrees with Y; that's sort of the picture. This is called a half space, and these things come up in all sorts of things. A half space is the solution set of a linear inequality. I'll start in on the material of the next lecture, which you've probably seen in one or two other contexts, maybe in the context of Fourier or somewhere. So we'll talk about orthonormal sets of vectors, the Gram-Schmidt procedure -- or, when you write it out as a matrix, in matrix terminology it's called the QR factorization -- and then we'll talk about various things we can show with this. We start with the idea of an orthonormal set of vectors. A set of vectors is called normalized if the norm of each U_i is one. So normalized just means the length is one. And by the way, some people call those unit vectors; I don't know why. Other people call them direction vectors -- vectors whose norm is one. So that's very common terminology for it. Unit vectors is a bit weird, because for most people unit vectors means e_1, e_3, e_5, but still. A set of vectors is orthogonal if different vectors in the set -- another way to say this is mutually orthogonal -- it says that any two different vectors in the set have a 90-degree angle between them. Okay? And it's orthonormal if both.
Now, watch out, because people will simply say -- you will say, on the streets -- U_1 through U_k are orthonormal vectors. That is basically completely wrong. By the way, there's nothing wrong with saying U_1 through U_k are normalized vectors. That's cool, because being normalized is an attribute of a vector alone. Being orthogonal is not -- if it made sense for a single vector, then you could have your set of orthogonal vectors and mine, and we could throw them together: if you have five and I have seven, now we have twelve orthogonal vectors. It doesn't work that way. Normalized vectors works, no problem: we can combine your normalized vectors with mine, and now we have twelve normalized vectors. Orthogonality refers to a set of vectors. It is not an attribute of a single vector. Okay. How do you write that in matrix notation? Very simple: if you have some vectors U_1 through U_k, you put them in a matrix -- you simply make them the columns of a matrix, call that U. And to say the set is orthonormal is simply to say that U transpose U = I, period. That's it. Why? Because U transpose is the matrix with rows U_1 transpose up to U_k transpose, and then you multiply by the matrix with columns U_1 through U_k. And you can see now that the i,i entry of this matrix is U_i transpose U_i -- that's one -- and the off-diagonal entries are U_i transpose U_j, which are zero. What would be the matrix way of saying that a set of vectors is orthogonal, but not necessarily normalized? How would you say it? [Student:][Inaudible] U transpose U is diagonal. That means it's a mutually orthogonal set. How do you say -- what's the matrix way of saying a set of vectors is normalized? How do you say it? [Student:][Inaudible] In terms of U transpose U, it would be that the diagonal entries of U transpose U are all ones. How about the off-diagonal entries? Who knows -- that depends on the angles between them. That's what it would say. Now, you can quickly show that orthonormal vectors are independent. How? Well, if you write a linear combination of them equal to zero, and you multiply on the left by U_i transpose, you will get that the i-th coefficient is zero. And that says that if you have an orthonormal set of vectors U_1 through U_k, they're a basis for their span. You have to be very, very careful here, because if I say that I have an orthonormal set of vectors, that's exactly the same as saying U transpose U = I -- exactly the same; this is the matrix way to say it. Watch out, because it is not the same as saying U U transpose = I, although that will hold in some special cases. Okay? In general, it is not true. And actually, there's a quick way to remember it. If you have a bunch of vectors and you stack them up -- so here are three vectors in R^10, I write them as the columns, and so U transpose U is a 3 x 3 matrix. Right? Like that. That's U transpose U = I. Okay? Now, suppose I got confused, switched them around, and wrote U U transpose = I. By the way, there's no syntax crime here. None, because this would be a 10 x 10 matrix: a 10 x 3 matrix times a 3 x 10 matrix, and there is no syntax crime. A casual parsing doesn't reveal any problem with this; the parser goes right through it. Okay? But here's a quick way to know that you're in deep, deep trouble: when you multiply two matrices together, the rank of the product is at most the minimum of the ranks. The rank of this matrix is three, the rank of that matrix is three, and the rank of the product, therefore, is at most three. And what's the rank of the identity matrix in R^(10 x 10)? It's ten. So we have a problem here. This is a semantic error if you do this. It's a semantic error.
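A quick numerical sketch of that semantic error, with three orthonormal vectors in R^10 (generated here via a QR factorization of a random matrix, just to have an example):

```python
import numpy as np

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((10, 3)))   # U is 10 x 3 with orthonormal columns

print(np.allclose(U.T @ U, np.eye(3)))     # True:  U^T U = I (3 x 3)
print(np.allclose(U @ U.T, np.eye(10)))    # False: U U^T is 10 x 10 but cannot be I
print(np.linalg.matrix_rank(U @ U.T))      # 3, not 10
```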
By the way, we're going to find out that that 10 x 10 matrix is not the identity. What it is is really interesting, and we're going to see it later. But for the moment, just remember: it is not the identity. Okay, I'll mention the geometric properties, which are actually very interesting. Suppose you have an orthonormal set of vectors -- which is to say, a matrix whose columns are orthonormal. Ooh, that was slang, did you hear that? But I'm trusting you that you're cool with that and aren't going to make semantic and other errors. That was slang. That's how, by the way, people would say it. Otherwise you'd sound like a pedant, right? If you're like, "Excuse me, don't you mean the set of its columns is orthonormal?" -- you shouldn't talk to people like that. You should avoid them. So suppose the columns of U are orthonormal. It says the following: if you take W = UZ, then the norm of W is the norm of Z. Now note that W is possibly a different size from Z, so what this says is that when you multiply by a matrix whose columns are orthonormal, you don't change the norm. That's what it means. Another way to say it -- people say it this way -- is that W = UZ is isometric. "Iso" means the same, and "metric" means it has to do with the metric, or distance, properties. And it basically says: if you have two points like this, and the distance between them is three, and they get mapped by U -- if they're X and Y, say -- then the distance between UX and UY, in a possibly higher-dimensional space, will also be three. So it preserves these things. It's very easy to show this; that's just a quick calculation. I think maybe this is a good time to quit for this lecture.
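One last quick sketch of that norm-preservation claim (again with orthonormal columns produced by a QR factorization of a random matrix, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.standard_normal((10, 3)))   # 10 x 3, orthonormal columns

z = rng.standard_normal(3)
print(np.linalg.norm(U @ z), np.linalg.norm(z))      # equal: multiplying by U preserves the norm

z2 = rng.standard_normal(3)
print(np.linalg.norm(U @ z - U @ z2),                # distances are preserved too,
      np.linalg.norm(z - z2))                        # even though U @ z lives in R^10
```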