Some announcements. Actually, you can turn off the amplification in here. Thanks. Let me make a couple of announcements. The first is that today's lecture is going to finish a few minutes early because I have to dash over to give a talk at 11:00 a.m. in CISX. In fact, you're all welcome to come if you want. I don't know why you would, but it's a talk on circuit design and optimization, in the CISX auditorium. So remind me: if I'm not out the door and walking toward CISX at 10:45 a.m., you can wave your hands or something like that. As a result, I'll be canceling my office hours today. A little bit of late notice, but I'm moving them to Thursday. That doesn't work very well with, for example, Homework 7, but I'll also be around on and off this afternoon, so if there's something you really needed to see me about today, you'd probably be able to find me sometime in the afternoon. Or you can send an email. One other announcement: I got one inquiry, I think from someone planning to go away over the next week, asking could I possibly be so cruel as to assign Homework 8 this Thursday. What do you think the answer is to that? I'm just curious. Anyway, I don't even have to answer it. Of course we're going to have a Homework 8. That was never in question, so we will indeed assign Homework 8 on Thursday, and it'll be due maybe the Tuesday after Thanksgiving week, something like that. I saw someone -- are you okay? [Student:] I'll live. You'll live? Okay. Her head kind of listlessly fell backwards. That's okay. All right. Any other questions about this? No? I guess the people coming in now will be shocked when I get up and leave at 10:40 a.m. Okay, any questions about last time's material? Otherwise, we'll finish that up and then, later today, we'll get to essentially the final topic in the class. We're going to spend a good deal of time on it, but it will in fact be the last actual topic. All right. We're studying linear dynamical systems with inputs and outputs, so systems of the form x dot = Ax + Bu, y = Cx + Du. The transfer matrix is H(s) = C(sI - A)^{-1}B + D, and if you plug in s = 0 you get H(0) = -CA^{-1}B + D. It's a famous matrix, one you'll see arising often. The more descriptive term for it would be the static gain matrix, but a rather cool retro term is the DC gain matrix. That's direct current, so the name goes back to, I don't know, 1910 or something like that. I use it, but mostly just to irritate people, because it's so retro. I'll say in a minute what happens if A happens not to be invertible. What this matrix describes is how the inputs relate to the outputs under static conditions. Static conditions means u, y, and x are all constant. Then of course x dot is zero, since x is constant, so 0 = Ax + Bu and y = Cx + Du. If you eliminate x from these equations by solving for x = -A^{-1}Bu and plugging that into the second one, you get y = (-CA^{-1}B + D)u, which is exactly H(0)u. So this is assuming A is invertible, and that's what the DC gain matrix describes.
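Just to see that formula in code: here's a minimal sketch, with a toy system of my own (not the lecture's example), that computes H(0) = -CA^{-1}B + D with NumPy and checks that it agrees with the equilibrium output for a constant input.

```python
import numpy as np

# A minimal sketch (toy system, not from the lecture): a stable 3-state,
# 2-input, 2-output system, just to check the static (DC) gain formula.
rng = np.random.default_rng(0)
n, m, p = 3, 2, 2
A = rng.standard_normal((n, n)) - 3 * np.eye(n)   # shifted to make A stable/invertible
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, m))

# DC gain matrix: H(0) = -C A^{-1} B + D
H0 = -C @ np.linalg.solve(A, B) + D

# Check: under static conditions 0 = A x + B u, so x = -A^{-1} B u and y = C x + D u.
u = np.array([1.0, -2.0])            # any constant input
x_eq = np.linalg.solve(A, -B @ u)    # equilibrium state
y_eq = C @ x_eq + D @ u              # equilibrium output

print(np.allclose(y_eq, H0 @ u))     # True: y_eq = H(0) u
```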
Now, if A is not invertible, it says there are inputs for which you cannot solve that equilibrium equation, so there actually can be u's for which there is no static equilibrium. You don't have to worry about that, but that's the meaning. Now, if the system is stable, H(0) is the integral of the impulse response, and that just follows from the Laplace transform: H(s) is the integral from 0 to infinity of e^{-st} h(t) dt, so you plug in s = 0 and it's the integral of h. That, of course, requires the integral to make sense, which is the case if h(t) decays. It's also the same as the limit as t goes to infinity of the step response matrix, and the step response matrix is, of course, the integral from 0 to t of the impulse response; that again follows from the definition of the integral. And if you want to know what the static or DC transfer matrix tells you, it's basically this: if the input vector converges to some constant u_infinity -- the inputs can wiggle around, they can do whatever they like, but if they converge to a constant value -- then the output will also wiggle around, but it will converge to a constant value, and that constant value is obtained by multiplying by H(0). So H(0) is very important. It's the thing that tells you, roughly, how the input affects the output once all the transients have settled out. We can work out some examples, and they should make perfect sense. For the mass-spring system -- our mass-spring system is right here -- the DC gain matrix maps the two tensions you apply into the displacements of the three masses. So it's a 3 x 2 matrix, and the first column, for example, is obtained by pulling one Newton on the first tension, letting the whole thing come to equilibrium, and then recording the offsets of the three masses. And it's kind of obvious what happens: if you pull one Newton here, this mass displaces to the right, this one displaces to the left, and this one displaces to the left a bit less -- probably this one a bit more to the left than that one goes to the right. I'm making that up, and I will change my story if we look at the actual numbers and it's different. Let's see -- did I get my story right? Yes, I did. So this says that if you apply one Newton, the left mass, that's the first position, moves a quarter of a meter; the next one moves minus a half, twice the displacement; and the other one moves minus a quarter. And you get a similar thing in the second column; there's a symmetry. You can work this out -- it's horrible, and I don't recommend doing it by hand -- but if you took this A and C and computed -CA^{-1}B, you would indeed get this matrix. Now for the circuit, the DC gain matrix is actually quite simple. Again, you can work it out, but let's see what it means. I have to find it somewhere. Here we go. So for the circuit, the DC gain matrix has the following interpretation.
It basically says: apply one volt here, wait for all the transients to settle, and then look at the mapping from the voltage here to the voltages at all the nodes -- and of course, it's one, right? Because if you put a volt here and everything is static, no voltages are changing anywhere, no current is flowing, and this is an equipotential. All the nodes have to be at the exact same voltage. So that's the DC transfer matrix, and in this case it literally is the DC transfer matrix from that input to those outputs. Now, these are silly cases. Obviously, if you have something more complicated, it's immediately interesting: it tells you right away how something works under static conditions. We're going to cover a couple more topics on systems with inputs and outputs. The first is discretization with piecewise constant input. Here you have your system x dot = Ax + Bu, y = Cx + Du, and I put in a sequence I'm going to call u_d -- d is for discrete, or something like that. The continuous-time input, the input to this system, is going to equal u_d(k) -- I index into the sequence -- over the time interval from kh to (k+1)h. By the way, there are all sorts of names for this; zero-order hold is one you'll hear. It just means the input is piecewise constant. There are probably some other names I'm just not thinking of at the moment, but that's basically what it is. Okay, and now we'll define sequences: x_d, that's d for discrete, is the sequence x_d(k) = x(kh). So x is a function from R_+ into R^n; x_d is a sequence, a function from Z_+ into R^n. That's what x_d is, and it came by sampling, so you could call it perfect sampling or something like that. Here you'd refer to h as the sample interval -- actually, that's for x and y. Up here, for the input, it would be very common to call it the update interval, so a different name for it. And in fact, it turns out that in many systems the update and sample intervals are different; we'll get to that in a minute. So, for example, you'd say the update frequency is 50 hertz and the sample frequency is 400 hertz -- you get the idea. All right, so you can just think of these as sampled versions of x and y. Let's work out how they're related. Well, x_d(k+1) is x((k+1)h), and we can get that because we can propagate forward: this is what the state was at kh, and e^{hA} propagates it forward h seconds. So that term tells you how the state would propagate if there were no input over that interval. Then the convolution term tells you the effect of the input, but only over that interval -- here u only refers to the time between kh and (k+1)h. Now, over that period u is a constant and therefore comes out of the integral. These are matrices, so I obviously can't pull it out on the left; I have to pull it out on the right -- it's the only correct place to put it -- and it goes there. B is also a constant, so that comes out too, and you get this. I shouldn't say you get something like this -- you get exactly this. The terms make perfect sense: this one is what the state would be if there were no input applied.
This term -- that's more complicated, but this thing here is a matrix, and it just multiplies the input's constant value over that interval, and it's an update. In fact, if you look at it closely, you can write it this way: it's a discrete-time dynamical system. It says x_d(k+1) = A_d x_d(k) + B_d u_d(k), and everything works this way; D is the same, C is the same -- those don't change -- but A_d and B_d are given by A_d = e^{hA} and B_d = (integral from 0 to h of e^{tau A} d tau) B. Some people call this the discrete-time system. This is simple enough. The integral, by the way, can be evaluated numerically, but in fact there are lots of cases where you can give an explicit formula for it, and I'll give a case like that in a minute. If A is invertible -- which, by the way, need not be the case; there are plenty of interesting cases where A is not invertible -- you can express that integral as A^{-1}(e^{hA} - I). That makes perfect sense: if someone walks up to you on the street and A, for example, were a number, you would know that the integral of e^{ta} from 0 to h is (1/a)(e^{ha} - 1). So you would guess that a formula like this should hold. Now, that's still just a guess. That's how these overloadings should work: you figure out what it would be if everything were scalar, and then you have to put things in the right place. Actually, the scalar formula doesn't tell you whether the A^{-1} should go on the left or the right here -- that it doesn't tell you. That's the first thing. And then there's the bigger question of whether it's true at all, because arguing by analogy from a scalar formula will often give you the right answer, but in a large number of cases it will give you something that's completely wrong. Here I can tell you why you're okay -- or actually, maybe someone can tell me why. What is it about the terms here that makes this safe? By the way, safe means you should still go and check it, but what makes this safe? Everything commutes here. You imagine this as a power series in A; everything commutes. A commutes with A^{-1}, obviously, because A A^{-1} = A^{-1} A = I. When everything commutes, that's the safe time -- that's exactly when your scalar formulas are going to lead you to correct results. So that's how that works. And if you wanted to show this, it wouldn't be hard: you'd write out the power series for e^{tau A}, integrate term by term, look at the new power series you have and say, hey, what do you know, that's the power series for this. You might not want to do it quite that way because of the A^{-1} there, but still. Anyway, that would be that. Now, interestingly, we can point out something quite interesting here: stability is preserved by discretization. If the eigenvalues of A -- that's the A in x dot = Ax + Bu -- are lambda_1 through lambda_n, then the eigenvalues of A_d are e^{h lambda_1} up to e^{h lambda_n}. Did we put that on some additional homework problem? I have a vague memory we made a homework problem on this -- the spectral mapping theorem. Well, I have a vague memory of it. A few people nodded. You weren't just being polite, right? You actually saw the spectral mapping theorem in this course sometime? Okay, good. All right.
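Here's a minimal sketch of that discretization (toy numbers of my own): it computes A_d = e^{hA}, gets B_d both from the closed form A^{-1}(e^{hA} - I)B and by numerically integrating e^{tau A}, and also checks the eigenvalue mapping.

```python
import numpy as np
from scipy.linalg import expm

# Zero-order-hold discretization: A_d = e^{hA},
# B_d = (integral_0^h e^{tau A} dtau) B = A^{-1}(e^{hA} - I) B  when A is invertible.
rng = np.random.default_rng(1)
n, m = 4, 2
A = rng.standard_normal((n, n)) - 2 * np.eye(n)   # stable (and invertible) for this toy example
B = rng.standard_normal((n, m))
h = 0.1

Ad = expm(h * A)
Bd_closed = np.linalg.solve(A, Ad - np.eye(n)) @ B          # closed form, needs A invertible

# Same integral by midpoint-rule quadrature (works even if A were singular)
N_q = 2000
dtau = h / N_q
Bd_quad = sum(expm(t * A) * dtau
              for t in np.linspace(dtau / 2, h - dtau / 2, N_q)) @ B

print(np.allclose(Bd_closed, Bd_quad))                      # True

# Spectral mapping: eigenvalues of A_d are e^{h * (eigenvalues of A)}
lam = np.linalg.eigvals(A)
lam_d = np.linalg.eigvals(Ad)
print(all(np.min(np.abs(lam_d - z)) < 1e-6 for z in np.exp(h * lam)))   # True
```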
So here you would know that the eigenvalues of A_d, which is given by the exponential e^{hA}, are e^{h lambda_1} up to e^{h lambda_n}. And that's interesting, because the real part of a complex number is less than 0 if, and only if, the magnitude of its exponential is less than 1. The reason is very simple: the magnitude of e^u, if u is a complex number, is exactly e to the real part of u. That's almost by definition. So if h lambda has negative real part, the magnitude of e^{h lambda} is less than one. Now, this is just the simplest setup, but there are all sorts of horrible little variations. They're mostly horrible in bookkeeping, but you could, in principle, work out all of them now. Here are the kinds of things that can happen. You could have offsets: we measure the state at times which are offset. They'd give you some horrible timing diagram and say the whole thing operates at 50 hertz, but the state sample and the update are offset by 8 milliseconds, or something insane like that. It's not hard; you just plug in the right e^{(8 ms)A} in the right places and you can work out what happens. It's not fun, but it can be done. Very, very common is multi-rate, where different inputs are updated at different times and different outputs are sampled at different intervals, usually multiples of a common interval. That's extremely common. So, for example, a jet engine controller on an airplane might run at 400 hertz, and then something wrapped around that might update at only 100 hertz, but the sampling might be 50 hertz. Your navigation update might be running at 10 hertz, your radar altimeter at 2 hertz, and all that kind of stuff. It's just a lot of horrible exponentials flying around, and it's not fun, but somebody has to do it, and I claim that, in principle, you could. Okay, the next topic, dual systems, I'm actually going to skip over, because I'm feeling slightly pressed for time -- but only just slightly, so don't worry. You'll know when I get panicked. I control it by the number of espressos I have before coming here. I'm at two right now, but I can go up to four if I need to. We don't need to yet. Next topic: causality. I'll just say a little bit about this. One interpretation of the solution formula x(t) = e^{tA}x(0) + integral from 0 to t of e^{(t-tau)A}B u(tau) d tau is this: the first term is the initial state propagated forward t seconds. That's what you get if you do nothing, so some people call it the zero-input term. The second term is the zero-initial-state term; it's the effect of the input. Now notice that the state at time t depends on the initial state and on the input only over the interval from zero to t. So you would say something like this: the current state and the current output depend only on the past input. Well, that's what causality is. Causality says that the things you do now affect the future, not the other way around. Now, it's a bit strange, because you look at x dot = Ax + Bu, y = Cx + Du, and I don't see any asymmetry in time in these equations. I don't see anything that would make things running backward in time look any different from running forward in time. I don't see any asymmetry here.
Actually, the asymmetry is right here, in fact. That's the asymmetry, so it's a subtle point, and probably worthwhile to mention. The asymmetry doesn't come from these equations. These equations are very happy to run backwards in time or forwards in time -- no problem. It's actually our considering an initial value problem -- a problem in which the initial state is fixed -- that makes the system appear causal to us. For example, suppose you fixed the final state, x(T). Well, then I can write another formula that looks like this. It says the current state is equal to the final state propagated backwards in time to the current time, plus an integral that has to do only with the future. You can just check this formula; it's absolutely correct. So in this case, if you fix the final state, the system is not only not causal, it's anti-causal, or something like that. It had to be. So these are both related to the concept of state, which we, so far, have used only to mean x(t) in x dot = Ax + Bu. But in fact, when you say state, it ties into a much bigger idea, and let me say what that is. First, let me just say what state is abstractly. How many people have seen the concept of state, like in a computer science course, some very abstract computer science course or something like that? Actually, [inaudible] theory, where you have to know that, right? You have to know the state of a processor. The state of a processor -- now that's not abstract at all. It's very simple: it's the values of all the registers and all that stuff. It's basically whatever you would need to store so that, if you restored the state and ran the processor forward in time from there, its behavior would be absolutely identical. That's what the state is. So, for example, when you call a function, you push the state on a stack or something, and when the function returns, you pop the state, and the idea is that, aside from side effects that happen during the function call, execution continues absolutely as if nothing happened. By the way, if you forget to restore a register or something like that, you're in deep, deep trouble, and that's because you failed to restore the state. There's also an abstract idea of state, and it's something like this -- this is worth thinking about and understanding; it's going to be abstract, so you don't have to worry about it, it's just for fun. The state is what you need to know about a dynamical system at a given time so that, if you knew the inputs in the future, you could perfectly predict the future behavior. That's what it means. Another way to think of it is that it is a sufficient statistic for what has happened in the past, for the purposes of predicting the future. That's what state is. So, for example, if you have a model of how prices are dynamically changing -- they depend on certain other things, like interest rates -- you would say that the state in that process is everything you need to know so that, moving forward, you can predict those prices correctly, given the future inputs. Have a question? [Student:] So would this only work for a time-invariant system? No, no -- it works perfectly well. The question was: does it work for time-varying systems.
It works perfectly well for time-varying systems -- the same concept -- and, by the way, for discrete-time systems as well. So that's the idea, and there are lots of ways to think about it. You'd say something like this: the future output depends only on the current state and the future input. Or: the future output depends on the past input only through the current state. Or: the state summarizes the effect of the past inputs on the future. These are all ways to say the same concept. Another way to say it is that the state is the link, or the bridge, between the past and the future. In machine learning you'd say something like this: the past and the future are conditionally independent given the state. Again, if you don't know what I'm talking about, that's fine. These are vague ideas, but that's okay; it's a very useful one to know about. Okay, so that's the concept of state, just to point out what it means. Another topic is change of coordinates. We've already done this for an autonomous system. For a system with inputs and outputs, a non-autonomous system, the same sort of thing happens, except you'll see that some things don't transform. If we change the state to x~, we write x = T x~, so x~ gives the coefficients of the state in the expansion in the columns of T. Then you get x~ dot = T^{-1}AT x~ + T^{-1}B u. You get the familiar similarity transform T^{-1}AT, and the T^{-1}B makes perfect sense, because Bu was the effect of the input on the state derivative in the original coordinates, and T^{-1}Bu is what it is in the new coordinates. So the linear dynamical system looks like this: x~ dot = A~ x~ + B~ u, y = C~ x~ + D~ u, where A~ = T^{-1}AT is the familiar similarity transform, B~ = T^{-1}B, C~ = CT -- C gets transformed on the right -- and D doesn't change at all. These make absolutely perfect sense, and the reason is this: you're only changing the state coordinates, so there'd be no reason for D to change, because D maps inputs directly to outputs. C is the read-out map, and C~ = CT says: T transforms the new coordinates to the old state coordinates, and then C reads out from the old coordinates, and so on. Now, when you do this, the transfer function is the same. In fact, the input and output haven't changed at all, and you can just work out what happens: if you form C~(sI - A~)^{-1}B~ + D~, you get C(sI - A)^{-1}B + D. By the way, this also means the impulse response is the same: C e^{tA}B + delta(t) D is identical whether or not I put tildes all over it, because u and y haven't changed. You do have to work this out -- it's not immediate. You have to throw T's and T inverses around, and the usual tricks help; for example, you write I as T T^{-1}, pull things out, mess with a few inverses, and then there's a lot of smoke, a lot of cancellations, and in the end you get the same thing. Okay, so this allows you to have standard forms for linear dynamical systems: you can change coordinates to put A in various forms, like diagonal, real modal, or Jordan form. I'll say a little bit about the use of this.
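First, though, a quick numerical check that a change of state coordinates really does leave the input-output behavior alone -- a minimal sketch with toy matrices of my own (not the lecture's example):

```python
import numpy as np

# Check that x = T x~ leaves the transfer function H(s) = C(sI - A)^{-1}B + D unchanged.
rng = np.random.default_rng(2)
n, m, p = 3, 2, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, m))
T = rng.standard_normal((n, n))          # any invertible T (a generic random matrix)

At = np.linalg.solve(T, A @ T)           # A~ = T^{-1} A T
Bt = np.linalg.solve(T, B)               # B~ = T^{-1} B
Ct = C @ T                               # C~ = C T
Dt = D                                   # D does not change

def H(s, A, B, C, D):
    """Transfer matrix C (sI - A)^{-1} B + D at a complex frequency s."""
    return C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B) + D

s = 0.7 + 2.0j                           # any point that isn't an eigenvalue of A
print(np.allclose(H(s, A, B, C, D), H(s, At, Bt, Ct, Dt)))   # True
```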
It actually has a huge use, which I'll get to in a minute. If you put A in certain forms, you get very interesting block diagrams. For example, if A is diagonalized, you get something like this, and that is the so-called diagonal form of a linear dynamical system -- just as an example. Now, you might ask why you would want to change coordinates like this. Well, you might change coordinates of a real system just to understand how it works. For example, the middle here would be a modal expansion, and if one of these numbers were really small, you'd say something like: the input doesn't really strongly drive the third mode. Or you'd look at a number over here -- it might be small, or if it's a matrix, it's small -- and you'd say the output doesn't see a particularly large contribution from the second mode. Those are the kinds of things you'd say. Now, there's another real use of changes of coordinates, and this one is real, and for this purpose you absolutely do not change coordinates to things like Jordan canonical form or diagonal form. It's this: if this represents some signal processing system you need to implement -- for example, your equalizer in a communication system -- this change of coordinates is your friend. It's a degree of freedom; basically it's a knob. It says: I can choose any invertible T I like and I will have implemented the same signal processing system -- it has the same input/output properties -- so I can change coordinates any way I like. I could do this for lots of reasons. I could change coordinates to get an A, for example, that has a lot of zeros in it. That might be very useful, because it means the system is much simpler to implement, and there actually are forms you would use for that. Or, if this were an analog implementation, you might change coordinates to make the resulting system less sensitive to perturbations in the parameters, just for example. Those are the kinds of things you would do. Okay, that was just for fun, but it's important to know that the change of coordinates has real uses beyond merely understanding how a system works, which is the most obvious application. Okay. So we'll finish up with discrete-time systems. They're pretty simple because there isn't even a differential equation; they're just linear iterations: x(t+1) = Ax(t) + Bu(t). The block diagram looks like this. This block is sometimes called a register, or a memory bank, or a single-sample delay. Labeling it 1/z, or z^{-1}, would be typical. That means this is the output sequence, and what goes into it is the sequence advanced in time by one sample, because if you're looking through a delay at a signal, what went in is the signal advanced in time by one unit. So that's the block diagram. You'll notice it's exactly the same as the continuous-time one, except the 1/s is replaced with 1/z, so there's really nothing new here.
Oh, I should also mention: if you know about digital circuits, you can imagine there's a latch here, so this thing doesn't race around and come back -- you do two-phase clocking, or there's a small delay, or something like that. Basically there's a one-sample delay there. Now, the analysis here could have been done on day one; it requires nothing more than knowledge of matrix multiplication. So x(1) is this, x(2) is that; you multiply it out and you see the pattern, which is very simple: the state is x(t) = A^t x(0) plus a term involving the inputs. So for a discrete-time system, A^t is the time propagator operator: it propagates the initial state forward t steps. And the input term is just discrete-time convolution -- nothing more. You can write the output this way: y(t) = C A^t x(0), that's the contribution from the initial state, plus h convolved with u, a discrete-time convolution, where the impulse response in this case is h(0) = D and h(t) = C A^{t-1} B for t positive. So the impulse response is D, CB, CAB, CA^2 B, and so on. Okay. Now, suppose you have a sequence -- we'll just cover the z-transform very quickly; there's not much here, it's mostly review. We'll make it matrix-valued to cover all cases right off the bat, and we'll make it a one-sided sequence, defined on the indices 0, 1, 2, and so on, not a double-sided sequence. Then the z-transform is simply W(z) = sum over t >= 0 of z^{-t} w(t). This makes sense depending on how violently w diverges: if w diverges no faster than an exponential, then this is guaranteed to make sense for complex z large enough in magnitude, because if |z| is large enough, the terms fall off quickly. That set is referred to as the domain of W and, as with Laplace transforms, it's not a big deal; you don't really need it -- all that can happen there is that things get complicated. Now, if you define a new signal which is time-advanced -- it's the same as another signal, but one sample later -- you can work out its z-transform. It's baby algebra: you multiply things out, shift the summation index from t to t + 1, write it so the z comes out, and you get zW(z) - z w(0). It looks like the derivative formula for the Laplace transform; the only difference is that extra z right there. Otherwise it doesn't matter. Okay. The discrete-time transfer function is simple: you just take the z-transform. I should say there's a big difference here -- we don't need this to get the solution. We already know the solution; we just worked it out by multiplying matrices, so this is, in a sense, not particularly needed. I just want to show you how it works. There actually are people who are more comfortable in the frequency domain -- just from their cultural upbringing, personality type, things like that -- and that's fine, no problem. So this is addressed to them.
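And for the matrix-multiplication camp, here's a minimal sketch (toy matrices of my own) that simulates the recursion directly and checks the closed-form solution with the impulse response D, CB, CAB, ...:

```python
import numpy as np

# Simulate x(t+1) = A x(t) + B u(t), y(t) = C x(t) + D u(t) and check the closed form
# y(t) = C A^t x(0) + sum_{tau=0}^{t-1} C A^{t-1-tau} B u(tau) + D u(t).
rng = np.random.default_rng(3)
n, m, p, N = 3, 2, 2, 10
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, m))
x0 = rng.standard_normal(n)
u = rng.standard_normal((N, m))

# Straight simulation by the recursion
x, y_sim = x0.copy(), []
for t in range(N):
    y_sim.append(C @ x + D @ u[t])
    x = A @ x + B @ u[t]
y_sim = np.array(y_sim)

# Closed form: impulse response h(0) = D, h(k) = C A^{k-1} B for k >= 1
y_formula = []
for t in range(N):
    y_t = C @ np.linalg.matrix_power(A, t) @ x0 + D @ u[t]
    for tau in range(t):
        y_t = y_t + C @ np.linalg.matrix_power(A, t - 1 - tau) @ B @ u[tau]
    y_formula.append(y_t)

print(np.allclose(y_sim, np.array(y_formula)))   # True
```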
By the way, if you're perfectly comfortable with matrix multiplication, then everything we did over here already describes exactly what happens in a discrete-time system. But anyway -- you take the z-transform here and you get the usual thing. This is the analog of sX(s) - x(0), except we have that extra z in there: zX(z) - z x(0). You solve for the z-transform of the state to get X(z) = (zI - A)^{-1}(z x(0) + B U(z)), and the only difference from the continuous case is the extra z. The z-transform of the output then looks like Y(z) = H(z)U(z), where H(z) = C(zI - A)^{-1}B + D. That's the discrete-time transfer function. Got a question? [Student:] Yes, [inaudible] interpretation of that extra z? That's a good question. I think the answer is yes. I can defend its existence at least this way -- or I can't argue for it, but I can tell you why you shouldn't be shocked to see it there. If you sample something faster and faster -- where is it? I lost it. Here it is. Okay, that one. This basically says you're off by one sample in calculating the effect of the initial state, for example, on the output. If you sample faster and faster, the z has no effect, because you're moving something by a smaller and smaller time interval. It would be like saying: no problem, I'm just using x at the end of the interval as opposed to the beginning, and as the interval gets small, the effect goes away. So that's not an argument for why it should be there; it's an argument for why it shouldn't bother us that it's there. And the main argument for why it shouldn't bother you is that it's correct, which is a strong argument -- not always completely persuasive, but there it is. Okay, so this finishes up a section of the course, and we're now going to enter the last section. In fact, we're going to do essentially one more topic and then the course is over. It's going to take a while; we're going to do a lot of stuff with it. It's very useful, really cool stuff. It has to do with the singular value decomposition. You may have heard of this somewhere. Actually, how many people have heard about things like the singular value decomposition? That's very cool. Where did you hear about it? [Student:] Linear algebra. A linear algebra class, so it's gotten there. It's funny -- it's only 100 years old; traditionally, material hasn't gotten into -- that was taught in the math department? [Student:] No, actually, [inaudible], but under them. Oh, taught in the math department? [Student:] No, it's EE. Oh, okay, sorry. It was taught in an EE department. Okay. So, you know, I think it's about time. It's been around for about 80 years now, so it's about time you might see it appear in math linear algebra courses. Did anyone here actually hear about it in a linear algebra course taught in a math department? Cool. Where? [Student:] At the University of Maryland. Aha. Cool. So that was, by the way, one hand in a sea of -- for those watching this later or something. Okay, all right, fine. So we'll look it up. How about in statistics? Anyone hear about principal component analysis? There's a couple, okay. What's that? [Student:] We used it in machine learning. So in machine learning, you know about it through PCA. But other than that, I guess people in, like, CS have never heard of this. Okay. Cool. I'm trying to think of some other areas where it's used.
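One last check on the discrete-time transfer function before the new topic: expanding H(z) = C(zI - A)^{-1}B + D in powers of z^{-1} should reproduce the impulse response D, CB, CAB, ... -- a minimal sketch with toy matrices of my own:

```python
import numpy as np

# H(z) = C (zI - A)^{-1} B + D should equal the z-transform of the impulse response,
# sum_{t>=0} z^{-t} h(t) with h(0) = D and h(t) = C A^{t-1} B for t >= 1.
rng = np.random.default_rng(4)
n, m, p = 3, 2, 2
A = 0.4 * rng.standard_normal((n, n))     # kept small so the series converges quickly
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, m))

z = 2.0 + 1.0j                            # any z with |z| above the spectral radius of A
H_direct = C @ np.linalg.solve(z * np.eye(n) - A, B) + D

H_series = D.astype(complex)
for t in range(1, 200):                   # truncated series; terms decay geometrically
    H_series = H_series + z ** (-t) * (C @ np.linalg.matrix_power(A, t - 1) @ B)

print(np.allclose(H_direct, H_series))    # True (up to truncation error)
```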
Okay, all right. So we'll do the last topic. We'll start with some basic stuff -- it's actually quite basic, and I'll explain that in just a minute. We're going to look first at the special case of the eigenvectors of symmetric matrices and what you can say about them, and then we'll get to quadratic forms. That's actually a bit different: so far, a matrix has always meant the same thing to you, and here it's going to mean something different. We'll see that soon. We'll look at inequalities -- we're going to overload inequalities to symmetric matrices -- and we're going to overload norm to matrices. These are not going to be simple overloadings: some things you'd guess are true will be true, and a bunch of things you'd guess are true will be false. They're not what you think they are, and they're really useful. And this will culminate in the singular value decomposition, or principal component analysis, depending on your background. Okay. Let's start with the eigenvalues of symmetric matrices. Suppose you have a symmetric matrix; obviously it's got to be square. Here's the first fact: the eigenvalues of a symmetric matrix are real. Oh, by the way, there are whole groups of people -- for example, if you do physics, depending on what kind of physics -- for whom all the matrices you see are symmetric, or self-adjoint is what you'd call the other case, and it means that all the eigenvalues you'd ever encounter are real. Or sometimes there's an i in front, in which case all the eigenvalues are purely imaginary, or something like that. So if you're in one of these fields, what happens is, after a while, you get the idea that all matrices are symmetric or self-adjoint. Even people who have done this for years get very confused when they go back to matrices that are non-symmetric, or they've completely suppressed it and forgotten it. Okay, but for you -- I should say this -- for symmetric matrices, very special things obtain, in terms of the eigenvalues, the eigenvectors, all of that, and it's very useful to know. Just don't spend all your time dealing with these; make sure you also know what happens when matrices are non-symmetric. But anyway, let's see how this works. Suppose you have Av = lambda v, with v nonzero, so v is an eigenvector, and v may be complex here. I'm going to look at v conjugate transpose. By the way, that's an extremely common thing; people write it as v^H or v*, and I should mention that in Matlab, if you type v prime and v is complex, you will get this -- you will get the conjugate transpose. Go slowly with these arguments, because one little mistake and -- in fact, when you first look at these arguments, they look like they can't be right, like you missed something. They look like magic. But let's take a look. We're going to take v conjugate transpose A v and parse it as v^H times (Av). But Av we know is lambda v, so I'm going to write this as -- the lambda comes outside.
I'm going to write lambda v^H v, and v^H v is the sum of the squared magnitudes of the components of v, because a-bar times a is |a|^2 for a complex number a. Now we're going to do the same calculation, but a little differently: I'm going to take v^H A and replace it with (Av) conjugate transpose, and that's fair -- let's see why. You can do the transpose first: (Av) conjugate transpose is v conjugate transpose times A conjugate transpose. A conjugate transpose is A transpose, because A is real, and A transpose is A, because A is symmetric, so I get the same expression. But Av is lambda v, and when I plug that in, there's still the conjugate on top, so it comes out as lambda bar times v^H v. Now these two are equal, so lambda v^H v equals lambda bar v^H v. v^H v is a positive number, and the only conclusion is lambda equals lambda bar. You can go over this yourself and check that I am not pulling a fast one on you -- when you first do this calculation, you assume you just lost a conjugate somewhere, but no, these are two valid derivations. So lambda equals lambda bar, which is the same as saying lambda is real. Now we get to a basic fact about symmetric matrices, and it's important to understand the logic of it; it's quite subtle. It says this: there is a set of n orthonormal eigenvectors of A. That's what it says. Now, in slang, you would say the eigenvectors of A are orthonormal. That's how you'd say it informally, but that is wrong. This is the correct statement, so as with many other things, you might want to practice thinking and saying the correct statement for a while, and after a while, when you realize people are looking at you weirdly, wondering why you talk like that, when it's actually causing social troubles -- people start thinking you're a [inaudible] -- then you switch to the slang, which is: the eigenvectors of a symmetric matrix are orthonormal. That's wrong in so many ways if you parse it; it's sad. We'll go over the ways in which it's wrong in a minute. So let's see what the fact says. It says I can choose eigenvectors q_1 through q_n, which are eigenvectors of A with eigenvalues lambda_i, such that q_i^T q_j = delta_ij. That's the same as saying there's an orthogonal matrix Q for which Q^{-1} A Q, which is the same as Q^T A Q, equals Lambda. So another way to say it: you can diagonalize A with an orthogonal matrix if A is symmetric. Okay, now that says you can express A as Q Lambda Q^{-1}, and Q^{-1} is Q^T. I can write this lots of ways, but here's one: the dyadic expansion, A = sum of lambda_i q_i q_i^T. Let's look at this -- it's a beautiful thing. These q_i q_i^T are n-by-n matrices; some people call them outer products, or dyads, and so this is sometimes called a dyadic expansion of A, because you write A as a linear combination of a bunch of these matrices. Now, we have seen that matrix before: q_i q_i^T is projection onto the line through q_i. So you express A as a sum of projections -- in fact, orthogonal projections.
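Here's a minimal sketch of that decomposition (a random symmetric matrix of my own), using numpy.linalg.eigh, which returns real eigenvalues and an orthonormal set of eigenvectors:

```python
import numpy as np

# Spectral decomposition of a symmetric matrix: A = Q Lambda Q^T = sum_i lambda_i q_i q_i^T.
rng = np.random.default_rng(5)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                       # symmetrize

lam, Q = np.linalg.eigh(A)              # real eigenvalues, orthonormal eigenvector columns
print(np.allclose(Q.T @ Q, np.eye(n)))  # Q is orthogonal
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))

# Dyadic expansion: sum of lambda_i times the projector q_i q_i^T
A_dyadic = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(n))
print(np.allclose(A_dyadic, A))         # True
```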
And I think -- I have another vague memory of a homework problem on this or something like that. Maybe not; some of my vague memories are wrong. Okay. So these are projections, and there can be a lot of interpretations of what this means. Before we go on, though, let's talk about the slang statement. Here's the slang statement, what you would say: the eigenvectors of a symmetric matrix are orthonormal. Among friends, at a casual get-together, this is what you would say. Just fooling around doing research, no one's looking, this is what you'd say. Actually, you could even say this at a conference -- there'd be some social cues, though. When people like me hear someone say this, we get a little bit on edge and we listen for a little while to figure out whether they have no idea what they're talking about, or they know exactly what they're talking about and are choosing to use informal language. It's usually clear very quickly. Okay. This statement doesn't make any sense, and any sense you could assign to it is completely wrong. First of all, you can't talk about the eigenvectors of a matrix, even though everyone does, because it doesn't make sense -- there are zillions of eigenvectors. Take any eigenvector, multiply it by any nonzero number, and that's an eigenvector. So you don't talk about the eigenvectors; let's start with that. Okay. So if I have a matrix -- here's one: I, the identity. That matrix is quite symmetric. What are its eigenvectors? [Student:] All the non-zero vectors. All nonzero vectors are eigenvectors of I. So let's make it two by two, and I could say: okay, here's my choice of eigenvectors for I: (1, 1) and (1, 2). There, that's my choice; that's v_1 and that's v_2. It is now false that these eigenvectors are normalized -- they don't have norm 1; one has norm square root of 2, the other square root of 5. So that's false: they're not normalized, and they're most certainly not orthogonal by any means. And if someone said get me eigenvectors of I, and I returned these two things, no one can complain. These are eigenvectors of I, period. It is absolutely the case that I times this is 1 times this, and I times that is 1 times that. So that's fine. Now, there is actually a situation in which you can say something close to this -- let's forget the normalized part, because that's silly. Can you say the eigenvectors of a symmetric matrix are orthogonal? This case shows the answer is no, because it's not true: here's a symmetric matrix, that's an eigenvector, that's an eigenvector, they're independent, and they're by no means orthogonal. I think that's enough on critiquing the slang. The right statement is that you can choose the eigenvectors to be orthonormal, and that statement is certainly true for I, because, for example, I could choose (1, 0) and (0, 1). For that matter, I could choose (1, 1) divided by the square root of 2 and (1, -1) divided by the square root of 2, and that would also be an orthonormal set of eigenvectors for I. Okay. Let's interpret the formula A = Q Lambda Q^T. Remember, Q^T is Q^{-1}. So let's look at some interpretations. The first is to look at the three matrices separately, and it says that if you want to operate on a vector by A, here's what you do.
The first thing you do is multiply by Q^T, so x comes in and you form Q^T x. And we know exactly what Q^T x does -- Q^T is essentially Q^{-1} -- it resolves x into the q_i coordinates. This vector is the coordinates of x in the expansion in the q_i. Then you multiply by a diagonal matrix. That's very simple to visualize: you're just scaling each coordinate, and, by the way, if an eigenvalue is negative, you're flipping -- switching the orientation of -- that coordinate. Then, when you multiply by Q, we know exactly what that is: it's reconstituting the output. So, if you like, you can think of this as a coding, a scaling, and a reconstruction. Your question? [Student:] Yeah, sorry, I still don't see [inaudible] orthogonal eigenvectors. Why is it that for a symmetric matrix you can find orthogonal eigenvectors, but if a matrix is not symmetric, you can't necessarily find them? It's a great question. I haven't answered it yet, but I'm going to -- I think I am; it might be Thursday, but I'm going to get to it. So we're going to push that on the stack and pop it later. The question was why, and I haven't said. So far I've stated it as a fact, and we're looking at what it means and what the implications are; then we'll come back and see why it's true. By the way, I have shown why the eigenvalues should be real; I have not shown that you can choose the eigenvectors to be orthonormal. Oh, and one implication of this: it says that for a symmetric matrix the Jordan form is really simple -- it's always diagonal. You cannot have a non-trivial Jordan form for a symmetric matrix. We'll get to that later, I hope. Now, this is actually a very interesting interpretation, and it's worthwhile knowing because it comes up all the time. Roughly this exact operation is carried out, for example, in the current standard for DSL. It's also done in JPEG: in JPEG you do a DCT transform on an 8 x 8 block of pixels. You don't have to know this; I'm just saying these are not abstract block diagrams. This type of thing is done all the time, in all sorts of stuff all around you. In JPEG, at least in one of the older standards, you take 8 x 8 blocks of pixels, you form a DCT transformation, and then in fact you don't just multiply in the middle -- you quantize in the middle -- and then you transmit those coefficients, and the image is reconstituted when you decode. So pictures like this actually are all around you, widely used: it comes up in signal processing, compression, communications -- I mentioned one in communications -- it comes up all over the place. So that's the picture: you resolve into the q_i coordinates, you scale, flipping if lambda_i is negative, and you reconstitute. Now, geometrically, there's a beautiful interpretation, because we know the geometric interpretation of an orthogonal matrix: it's an isometric mapping, something that preserves lengths, angles, and distances. Now, it can flip.
For example, you can have a reflection through a plane, so, roughly speaking, you should think of these as rotations or reflections. So this basically says -- I'm going to call it a rotation even if it's a reflection -- rotate the input, for example around some axis by 30 degrees; scale in that new coordinate system; and then undo it, which means rotate around the same axis 30 degrees in the other direction. That's the idea. We've already mentioned this. Oh, by the way, when you scale a vector by a diagonal matrix, there are lots of ways to say it. I have found both dilation and dilatation, so somehow there are two. I thought for a while dilation was the only correct one, but no, it turns out dilatation is also English, and I tried to blame it on some weird people in some field -- I couldn't identify the field that committed this crime. Or country of origin; I also tried to pin it on the British or something like that. That seemed promising -- that extra syllable -- but I couldn't chase it down. So you'll see dilatation too. And by the way, on a couple of occasions I have had students and people say that these actually mean different things, and one or two of them tried to explain it to me. The distinction seemed very clear to them, but it never sunk in with me. There may be a difference; if there is, I for one haven't got it -- that's probably me. There could be some field where they say no, no, totally different things. Anyway, this is where you multiply each component of the vector by lambda_i. So that's the picture. Now, the decomposition like this we've already talked about -- the dyadic expansion. By the way, some people call it simply the spectral expansion of A; that's what this thing over here is, and the q_i q_i^T are called projectors. In fact, a very common way to see this written would be like this, but in a lot of fields there would be an infinity up there -- you'd see the same thing with an integral and all sorts of stuff, but it would look just like that -- and they'd call that the spectral expansion of the operator A, depending on what field you're in. Okay. So let's look at an example. This is just a silly example, but just to see that something happens. Here's a silly matrix; you clearly don't need any of this machinery to figure out what it does to a vector. But as usual with these examples, you'd never bother for a 2 x 2 matrix -- do this for a 30 x 30 matrix, or for that matter a 3,000 x 3,000 symmetric matrix, where it is by no means obvious what the matrix does. So here you work it out. The eigenvectors turn out to be (1, 1), (1, -1) -- oh, did you hear that? That was slang, big-time slang. So let me wind back and say it again without slang. Then I'll stop after this lecture and go back to slang. Okay, so I'll say it precisely: for this matrix, I chose the eigenvectors (1, 1) divided by the square root of 2 and (1, -1) divided by the square root of 2, which are orthonormal. Actually, even that involved a small bit of slang, because I shouldn't say the vectors are orthonormal.
I should say: I chose the set of two eigenvectors consisting of, first eigenvector, (1, 1) divided by the square root of 2, and second eigenvector, (1, -1) divided by the square root of 2, end of set; and that set of eigenvectors is orthonormal. There -- that was formal. That's why people don't talk this way, and why, if you see people who do talk this way, it's weird. But you should think that way. You can speak casually -- maybe you don't always know when the right time is -- but you should never think casually. That's called sloppy or something. Or you can do it, but just in private; you shouldn't write it down, and you shouldn't mistake it for clear thinking. So here's the picture. There's q_1 and q_2, and here's some x that comes in. The first thing you do is resolve it: you project it onto the q_1 line and the q_2 line -- those are orthogonal projections, these and these. Then you scale these by the eigenvalues. Let's see what happens. The first component: nothing happens. The second one gets multiplied by two and flipped in orientation, so this one gets flipped over here and doubled, and you get something like that, and that's the output. Now, you certainly did not need a spectral decomposition to find out what this 2 x 2 matrix does to a vector, but it's just to show an example. Okay, now we're going to show what I said I was going to show, except we're not going to do the full case; we're going to do the case where the lambda_i are distinct. In this case, when the eigenvalues are distinct, there is a statement that can be made somewhere in between the nonsense slang and the correct formal statement -- an actually true statement that lies in between. So we're going to do the case of distinct lambda_i, and in this case I can say the following -- all right, let me see if I can get it right. I have to concentrate. Here it goes. If the eigenvalues are distinct for a symmetric matrix, then any set of eigenvectors is -- independent eigenvectors -- oh, I was close. Let me try it again. If a matrix is real and symmetric with distinct eigenvalues, then any set of n independent eigenvectors is orthogonal. Yes, it came out right; I think that's correct. That's a different statement from the original one I made -- the quantifiers are in different places. Notice that in this case you don't have to choose; there is no choice. The slang reading would be -- in abnormal speech you would say -- the eigenvectors of a real symmetric matrix with distinct eigenvalues are orthogonal, "and therefore can be normalized to be orthonormal," or something like that. So: A v_i = lambda_i v_i, with the norm of v_i equal to 1. Notice I didn't have to choose the v's in any special way; I just chose them to be eigenvectors, period, and asked that they be normalized. That seems entirely reasonable: eigenvectors are nonzero, so you can divide them by their norm and get something which is still an eigenvector and has norm 1. So we'll work out v_i^T A v_j, and we'll do it two ways. First I'll parse it by associating A with v_j; that's going to leave you lambda_j times v_i^T v_j. But I'm also going to rewrite it as (A v_i)^T v_j. Now you have to check.
(A v_i) quantity transpose is v_i^T A^T, and that's where I use the fact that A = A^T, so it's the same as v_i^T A v_j again, and this time the eigenvalue that comes out is lambda_i. Now, when you do things like this, you have to be very careful -- it's just like the calculation we did before; the most likely explanation is that you've made a mistake -- but you can check: there is no mistake, and you see the following, which is actually quite interesting. Setting the two expressions equal gives (lambda_i - lambda_j) v_i^T v_j = 0. Now there are only two possibilities. If i equals j, this is a non-statement -- it says 0 = 0. If i is not equal to j, then lambda_i - lambda_j is a nonzero number -- that's our assumption that the eigenvalues are distinct -- so v_i^T v_j has to be zero, and we're done. So in this case you can actually say, with a small bit of slang, that the eigenvectors of a symmetric matrix with distinct eigenvalues are orthogonal, and therefore can easily be normalized to be orthonormal -- something like that. Now, in the general case, the distinction is this: you have to say the eigenvectors can be chosen to be orthogonal. An example would be the identity matrix where, of course, you could choose an orthonormal set of eigenvectors, but if you're just weird and perverse, you could choose any set of independent vectors and say: what? Those are my eigenvectors. Don't like them? They're independent; what's your problem? So that's that. Okay. Let's look at some examples of this. The first example is an RC circuit. Here I have a resistive circuit, and I've pulled the capacitors out on the left like this. The state of this dynamical system is going to be the capacitor voltages -- I could just as well take the charges -- and the dynamics are: C_k v_k is the charge on capacitor k, so C_k times v_k dot is the current flowing into the capacitor, which is the negative of the current flowing into the resistive circuit. And I have i = Gv; G is the conductance matrix of the circuit, and it maps the voltages appearing at these ports to the currents that flow into the ports. So this describes the dynamics of the system, and you can write it as v dot = -C^{-1} G v, so we have an autonomous dynamical system with A = -C^{-1}G. Now, by the way, G is symmetric for a resistive circuit. That's actually a basic principle from physics: if this is an arbitrarily complicated thing made of resistors, the conductance matrix seen at the terminals is symmetric. You get similar things in mechanics and other areas, and in economics, too. However, the matrix -C^{-1}G is not symmetric. But we're going to change coordinates to make it symmetric, and to do that we use a rather strange state: the square root of the capacitance times the voltage, x = C^{1/2} v. Now that's kind of weird, because the voltage is a reasonable choice of state -- that's measured in volts -- and the charge is an entirely reasonable choice of state -- that's C_i v_i, in coulombs -- but you're using something that's halfway in between: voltage is C_i^0 v_i, charge is C_i^1 v_i, and you're using C_i^{1/2} v_i. It doesn't look right. I forget the physical units of this -- it's really quite bizarre.
I guess it's volts times square root farads, which you could write all sorts of other ways. Whatever it is, it's weird; it seems odd. But when you change coordinates like this -- it's just a scaling -- you end up with x dot = -C^{-1/2} G C^{-1/2} x, and that is now a symmetric matrix. So we can conclude that the eigenvalues of this matrix, which is similar to -C^{-1}G, are real. So we recover something you probably know from undergraduate days, or from intuition: an RC circuit has real eigenvalues, period, so you can't have oscillations. Cannot. For example, in an interconnect circuit, if it's well modeled by resistances and capacitances, you cannot have oscillations, period. You have to have inductance to have oscillations. Okay, that's that. I'll say a little bit more about this -- I'm quitting in a minute, literally, and I told you why at the beginning. I should say, if you really do want to know why you would use coordinates like this, there is actually something interesting here, and I'll tell you what it is. What is the norm of x squared for this choice of state? Well, it's the sum of (square root of C_i times v_i) squared, which is the sum of C_i v_i^2, and somebody in EE tell me what that is. It's twice the electrostatically stored energy in the capacitors. Thank you -- I was just hanging on that last bit. So these coordinates might look sick, and indeed they are, I think. However, these are coordinates in which the norm corresponds exactly to energy, so if someone said defend that choice, you could say, well, it's quite natural from the energy point of view. I have to quit a few minutes early today, as I said at the very beginning, because I have to rush over and give a talk at the CISX auditorium, so we'll quit here.
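As a quick numerical postscript to that last argument (toy component values of my own, not from the lecture): the matrix -C^{-1}G is not symmetric, but it is similar to the symmetric matrix -C^{-1/2}GC^{-1/2}, so its eigenvalues come out real.

```python
import numpy as np

# v_dot = -C^{-1} G v is not symmetric, but x = C^{1/2} v gives
# x_dot = -C^{-1/2} G C^{-1/2} x, which is symmetric, so the eigenvalues are real.
rng = np.random.default_rng(6)
n = 5
M = rng.standard_normal((n, n))
G = M @ M.T + n * np.eye(n)              # a symmetric positive definite "conductance matrix"
c = rng.uniform(1.0, 3.0, n)             # capacitances
C_half_inv = np.diag(1.0 / np.sqrt(c))

A = -np.diag(1.0 / c) @ G                # -C^{-1} G: not symmetric in general
A_sym = -C_half_inv @ G @ C_half_inv     # -C^{-1/2} G C^{-1/2}: symmetric, similar to A

print(np.allclose(A_sym, A_sym.T))                          # True
print(np.max(np.abs(np.linalg.eigvals(A).imag)) < 1e-8)     # eigenvalues of A are real
```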