Just a quick poll: I just want to make sure -- how many people have tried the image interpolation problem? And how many actually had trouble with it? One, two -- okay. Unspecified trouble, I think is what they're -- [Student:] [Inaudible]. [Student:] Yeah. I think it was -- had something wrong with the norm function. Okay. No, there's nothing wrong with the norm function. [Student:] That was an exaggerated one. Oh. [Student:] Like, I was trying to do -- [Student:] Yeah, it gets a slightly different response than if you just do the sum of squares. [Student:] Right. Oh, you do? [Student:] Yeah. [Student:] Yeah. How different? [Student:] [Inaudible]. [Crosstalk] [Student:] [Inaudible]. Really? [Student:] Yeah. I mean, norm looks good, but it's not -- Wow. Okay. You know what you should do, if you don't mind? [Student:] What? Send your code and your build number to the staff. I mean, the full code and the CVX build number -- which reminds me, I should probably mention that you're supposed to vote early and often, or whatever: you may want to re-download this CVX stuff every couple of weeks. Not that anything has really changed. No, no -- it's things like a comment in the user guide that was wrong, or something minor -- tiny, tiny things. But just every two weeks or so -- you can think of it when you clear the cache on your browser or something -- download another CVX, just for fun. So I -- [Student:] How do I know when to do it? How do you what? [Student:] Know when to do it? It's easy enough. You just look and see what the build number is. It's fine. And you don't have to, right? I mean, most of these builds are just, like, fixing commas. Okay? So it's not -- [Student:] Could we just rerun a [inaudible]? You'll figure it out. Yeah. It'll work. It'll work fine. Okay. But actually, please, if you do find what you believe to be an error -- and some people have found a couple, actually mostly in the documentation -- e-mail us. And if you're finding inconsistencies in the numbers coming out, we actually want to know about it. So don't just passively go, "Okay, now it's working," and move on. Please let us know. Some can be explained, or there's nothing anyone can do about them; others probably need to be addressed. So -- okay. Back to the numerical linear algebra whirlwind tour. Last time, we got as far as the LU factorization, which is the same as Gaussian elimination. So this is Gaussian elimination, and it basically says this: any non-singular matrix you can factor as a permutation times a lower triangular times an upper triangular matrix. This, of course, is highly non-unique; there are zillions of factorizations. But for the moment, we're not even going to cover why you'd pick one over another -- that's a topic you'd go into in detail in whole classes on this subject. The point is that once you've carried out this factorization, you solve Ax = b by, essentially, factoring and then doing forward and backward substitutions. And a very important point: the cost of the factorization -- if the matrix is dense and you're not going to exploit any structure -- is N cubed, while the back solves are N squared. And now you know something that's not obvious at first sight.
I mean, you just have to know how this is all done to really know this, because it's kind of shocking. It says: basically, the cost of solving two sets of linear equations with the same matrix is equal -- more or less; for all practical purposes, identical -- to the cost of solving one. Okay? And that's not obvious, I don't think, because if someone walked up to you on the street and said, "How much effort does it take to solve two sets of linear equations compared to solving one set?" I think a normal person would probably answer, "Twice." That's wrong. The answer is: the same. Okay? And that's with a factor-solve method -- there are other methods where the number is twice. It couldn't be any worse than twice, I suppose. Right? But the point is, it's the same. [Student:] Could you just put the two in one system? Is that what you're saying? Like, just stack the -- Nope, nope. That would make it much worse. If you stacked it like that, you'd have a 2N by 2N -- [Student:] Yeah. And 2N cubed -- you'd be off by a factor of eight. So you'd be eight times more. [Student:] So what was your argument for it being two -- less than two? That's all: one factor, two back solves.
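(In Matlab terms, the factor-solve pattern looks roughly like this -- a minimal sketch with a made-up dense matrix; lu with three outputs returns P*A = L*U, and backslash on a triangular matrix does a back solve.)

    % Factor-solve: one O(n^3) factorization, then each additional solve is O(n^2).
    n  = 2000;
    A  = randn(n);                 % dense, (almost surely) non-singular test matrix
    b1 = randn(n,1); b2 = randn(n,1);
    [L, U, P] = lu(A);             % the expensive step: P*A = L*U
    x1 = U \ (L \ (P*b1));         % two triangular solves, O(n^2)
    x2 = U \ (L \ (P*b2));         % second right-hand side reuses the factorization, ~free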
Okay. Now we need to get to something very important. A lot of you probably haven't seen it. It's one of the most important topics, and I believe it's basically not covered because it falls between the cracks -- it's covered somewhere deep in some class on the horrible fine details of numerical computing, I guess. I don't think it's well enough covered; at least among the people I hang out with, not enough of them know about it. And it has to do with exploiting sparsity in numerical linear algebra. So if a matrix A is sparse, you can factor it as P1 L U P2. Here you're doing both row and column permutations, and you might ask why -- you don't need the P2 for the existence of the factorization. All you need, in fact, is P1. You need P1 because there are matrices that don't have an LU factorization. The P2 here really means that there are lots and lots of possible factorizations. The question is: why would you choose one over another? And I'll mention just two reasons -- again, I'm compressing six weeks of a class into 60 seconds. One reason is this: you choose the row and column permutations in your factorization so that when you actually do the factorization, it's less sensitive to numerical round-off errors, because you're doing all this calculation in floating-point arithmetic. Right? Actually, by the way, if you're doing this symbolically, then this first issue doesn't arise -- but of course, 99.99999 percent of all numerical computing is in doubles, floating point. Okay. Now, there's another reason, and this is very, very important. It turns out that when you change the row and column permutations, the sparsity of L and U is affected -- and in fact, it can be dramatically affected. So if you get the row and column orderings right -- that's these two permutations -- the L and U you get will have a small number of non-zeros. By the way, I should add something here: once P1 and P2 are fixed, and you normalize L or U -- by saying, for example, that the diagonal entries of one of them are one -- then the factorization becomes unique. Otherwise, there are some stupid things you could do, like multiply L by two and divide U by two. But once you normalize either L or U, they become a canonical form: they're fixed, and there's only one choice for L and U once you pick these permutations. So in fact, the sparsity pattern is determined by P1 and P2. And if these are chosen correctly, the number of non-zeros in L and U will be small. If you choose them poorly, it'll just be enormous. And this is the key to solving, for example, a 100,000-variable, 100,000-equation problem Ax = b. Obviously, you can't remotely do that if it's dense. You can't even store such a matrix. Okay? Period. And if you could store it -- well, theoretically you could, but it's a big deal, and you'd have to use a whole lot of your friends' machines and all sorts of other stuff. It's a big, big deal to do something like that. However, if that 100,000 by 100,000 matrix is sparse -- let's say, for example, it has ten million non-zero entries, which means, roughly -- this is all very rough -- if you have 100,000 by 100,000 and ten million non-zeros, you have, on average, 100 non-zeros per row and 100 non-zeros per column. Just roughly -- well, I guess that's not rough; that's exact, when you say average. So what that means is something like this: each of your 100,000 variables enters, on average, into 100 equations, and each equation involves, on average, 100 variables. And by the way, if you think that's an easy problem to solve, it's not. However, it can be solved extremely fast if P1 and P2 can be found so that the factors have a reasonable number of non-zeros. If you start with ten million non-zeros, then if you're lucky, L and U will have on the order of ten million -- 20 million on a bad day. Thirty million would be -- still no problem, not even close to a problem. If you can't find P1 and P2 that end up with sparse L and U, then it's all over. But in many, many cases, there are very simple -- shockingly simple and naive -- algorithms that will find permutations that give you a good sparse factorization. Yeah? [Student:] How do you know, when you're finding P1 and P2, whether it can be made less sparse? You don't. You don't know. So what you do is you run -- it depends on what you're doing. You'll actually attempt to solve it, and then you'll be told -- I mean, it depends, but actually, it'll be really simple. If it just doesn't respond, and you can look at your memory allocation and watch it go up and up and up, then you'll know it's not working. But it's easy enough to know ahead of time. There are actually two passes -- I'll talk about that in a minute.
The first pass is called a symbolic factorization. In the symbolic factorization, you only look at the sparsity pattern of A, and you attempt to find P1 and P2. In real systems, what would happen is this: the symbolic factorization runs, and by the time it's gotten to some upper limit -- like 100 million or a billion entries, say; it depends, of course, on what machine you're doing this on -- it might quit and just say, "There's no point in moving forward, because the L and U I'm finding already have 100 million entries." Something like that. So that's how you do it in the real world: it'd be two phases. The way that would work in Matlab -- which, again, let me emphasize, Matlab has nothing to do with the numerics; it's all done by stuff written by other people -- is that it would just, I think, stall. You just write A\b, and it just wouldn't respond. And that would be its way of telling you that it's not finding a very good permutation. I'll say more about this in a minute. One thing that's going to come up a lot for us is the Cholesky factorization, not surprisingly. That's when you factor a positive-definite matrix. If you factor a positive-definite matrix, you can write it as L L transpose -- obviously, what this means is that it's an LU factorization. Two things about it: you do not need a permutation -- every positive-definite matrix has an LU factorization -- and in fact, it's got an LU factorization with U equal to L transpose. By the way, just to warn you, there are different conventions for this. This one is lower triangular; other people do it with an upper triangular factor. There are different ones, so I guess you just have to check. In fact, who has used it recently -- what's the chol in LAPACK? That would be the standard. What does it return? [Student:] U. The upper triangular one? Oh, okay. So sorry -- we're backwards from the world standard. The world standard would be upper triangular, but everything works this way. So this is the Cholesky factorization. It's unique. And by the way, there's no mystery to these factorizations. I'm not going to talk about how they're done, but there's absolutely no mystery: you basically just write out the equations and equate coefficients. I don't want you to think there's some mysterious thing and you have to take a ten-week course to learn what a Cholesky factorization is -- it's really quite trivial. The formula for it is extremely simple. What you should know, after our discussion last time about multiplying two matrices, is that the little formula we write down is not how it's actually done. It's done blocked and hierarchically and all that kind of stuff, which exploits memory and data flow and things like that. So let me just show you what I mean here. If you write A11, A12 -- you know what, that's good enough -- A22, something like that, you simply write out that this is [L11, 0; L21, L22] times the transpose of that. Now, this is the Cholesky factorization, and you just want to find out what these entries are. Well, okay, let's start with the 1,1 entry. We notice that this times that gives L11 squared equals A11. Everybody following this? So obviously, L11 is the square root of A11. Okay? That's the first one. Now that you know what L11 is, you just do the same thing.
You back substitute, and you look at the next entry: this one times this one gives L11 times L21 transpose equals A12, and now you divide out. Do I have to do this? I don't think I have to -- this is totally elementary for you. You just write it all out. And so the existence of, and formulas for, the Cholesky factorization -- in fact, lots of formulas, obviously all equivalent -- you can derive this way. And then you could write, by the way, a little chol in C, and it would be about five lines. But again, like matrix multiply, your chol would be beaten like crazy by the real ones, which block it up into little blocks optimized for your register sizes and caches and things like that.
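(Here's the little five-line chol just mentioned, as an unblocked Matlab sketch of the equate-coefficients recipe -- purely illustrative; the built-in chol is blocked and will be far faster.)

    function L = naive_chol(A)
    % Unblocked Cholesky: returns lower triangular L with A = L*L',
    % assuming A is symmetric positive definite.
    n = size(A,1);
    L = zeros(n);
    for j = 1:n
        % diagonal entry: what's left of A(j,j) after the columns already done
        L(j,j) = sqrt(A(j,j) - L(j,1:j-1)*L(j,1:j-1)');
        for i = j+1:n
            % entries below the diagonal in column j
            L(i,j) = (A(i,j) - L(i,1:j-1)*L(j,1:j-1)') / L(j,j);
        end
    end
    end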
Okay. All right. So that's the Cholesky factorization. For the cost, you get one-third N cubed -- and this is kind of as predicted, though the factor of two is totally irrelevant in flop counts. As I mentioned last time, you can be off by a factor of 100 comparing real computation time against flop-count time. Two things with an equal number of flops -- for example, your amateur chol or your amateur matrix multiply -- could be beaten by a factor of 100 by one that does the right thing, even though they have exactly the same number of flops. Actually, it could be more than 100: you could arrange it very nastily, choosing N just big enough to cause a lot of cache misses and things like that, and we could make that number almost as big as we like. Okay. So when you do a Cholesky factorization, A is symmetric, so A basically has about half the number of entries of a non-symmetric matrix, and the fact that the work is half is hardly surprising. But again, the factor of two is totally irrelevant -- nothing could be less relevant. And there's one thing I should say about this, which is good to know: when you do a non-symmetric LU factorization, the P is chosen primarily so that numerical round-off does not end up killing you as you carry out the factorization -- otherwise, you end up factoring something, but the product of the factors isn't equal to the original matrix, and your answer is totally wrong. So that P is often actually chosen at solve time: you have to look at the data in the matrix A to know what to do. That's called pivoting, by the way. You don't have to follow everything I'm saying, but it's important to know some of these things. And pivoting slows a code down, right? Because it's got branches and things like that, and if it actually has to pivot, you have data movement, and on a modern system that can be very expensive. Okay? So -- [Student:] Why did you say there are fewer elements in the positive-definite case? It seems like they're the same size. Yeah, they're the same size, but the LU case is a general matrix -- it takes N squared entries to specify it -- while this one is symmetric, so you really only have about half the entries. [Student:] [Inaudible]. Okay. So there, you choose the P dynamically, when you factor a specific matrix. Here, you don't need it, and in fact, it's well known that straight Cholesky factorization is quite fast, and it's stable numerically -- in other words, it's known that this is not going to lead to any horrible errors. This means, for example, that if the matrix is small, like 100 by 100, you get something that's very, very fast. And it's got a bounded run time -- it will just be very fast, because there are no conditionals in it, for example. Okay. Now let's talk about sparse Cholesky factorization. That looks like this. Here, of course, you preserve sparsity by permuting rows and columns. Any permutation you pick -- well, I can write it this way: P transpose A P equals L L transpose. So the other way to think of it is this: you take A, you carry out a permutation of its rows and columns -- the same one -- and then you carry out a Cholesky factorization. By the way, the Cholesky factorization is absolutely unique, provided you normalize by saying the diagonal entries are positive; otherwise there's the same stupid ambiguity. Question? Okay. So since I can permute the rows and columns of A any way I like and then carry out a Cholesky factorization, when A is sparse this choice of P is absolutely critical. If you give me a sparse matrix and I pick one P, there's just one Cholesky factorization, period, and that Cholesky factorization will have some number of non-zeros in L. In fact, if you choose P randomly, L will basically be dense for all practical purposes. Okay? And then it's all over -- especially if your matrix is 10,000 by 10,000, it's pretty much over at that point. Yeah? [Student:] Could you [inaudible] solve the equation using Cholesky? You need to know the matrix is positive definite, right? That's correct. Right. So -- [Student:] [Inaudible]. [Crosstalk] I can tell you exactly how that works. In some cases, it's positive definite because, for example, it's a set of normal equations -- that's very common. Or it's something we'll talk about later today, maybe, if we get to it: Newton equations. So a lot of times, it's known. Now, the question is: what happens if it's not? Obviously, a positive-definite matrix has a Cholesky factorization. What happens if it's not positive definite? That's a good question. I didn't get very far in my Cholesky factorization here, but in fact, what would happen is this: at each step of a Cholesky factorization, there's a square root. Right? We really did the first one. The next one would be the square root of -- well, basically, the determinant. What happens is this: if the arguments passed to all those square roots are positive, then you get a Cholesky factorization. If the matrix is not positive definite, then at some point you're going to take the square root of a negative number. Okay? And then there are several ways to handle it -- there are actually different, modified Cholesky factorizations. I forget the name, but there's a type of Cholesky factorization you might want, and it works like this: right before it takes a square root, it has a test that says, if this thing is negative, throw an exception.
By the way, it can even be arranged to actually calculate a vector Z for which Z transpose A Z is negative -- or less than or equal to zero, whatever -- which of course is a certificate proving A is not positive definite. So it would do that. That's one way. Unfortunately, a lot of the other ones -- because these are built for speed -- will just throw some horrible exception from the square root or something like that. So there's a modified version. In fact, Cholesky is the best way to check whether a matrix is positive definite. You don't calculate the eigenvalues or anything like that -- you just run a Cholesky. You don't want a basic Cholesky, which will dump core, and then that will be your -- anyway, no, I'm joking. The way it would work is this: you'd run this modified Cholesky that, when it encounters the square root of a negative number, does the graceful thing, which is to return telling you, "Cholesky didn't succeed," which is basically saying, "Not positive definite." So that's the best way to check. Okay. Let's go back to this permutation. Here's the way this actually works, and I'll explain how. The permutation here is generally not chosen -- what I'm about to say is not quite true, but it's very close, and as a zeroth-order statement it's good enough -- you don't choose the permutation in a sparse Cholesky factorization to enhance its numerical properties. Technically that's a lie, but to first order it's pretty close to the truth. You choose the P to get a sparse L. That's what you do. Okay? And what happens then is this: there are simple methods, developed in the '70s, for choosing these Ps, with names like approximate minimum degree. Then they get more exotic -- there are ones like reverse Cuthill-McKee ordering -- and then you get to very exotic orderings. So that's all fine. But now I'll tell you, since most of you will be consumers of sparse matrix methods -- in fact, you don't know it, but you have already been very extensive consumers of sparse matrix methods. Okay? Because when you take your little baby problem with 300 variables and CVX expands it -- I don't know if you've ever looked, but if you did, you'd find that some innocent little problem you're solving with ten assets or something all of a sudden says, "Solving problem with 4400 variables." Well, trust me: when it says 27 iterations, I promise you, equations of size 4400 by 4400 were solved at each iteration. Okay? You would have waited a whole long time if this weren't being done by sparse matrix methods. So you're already consumers -- big-time consumers -- of sparse matrix methods. And what I want to say about this is that even very simple orderings work. By the way, all the things you did -- although it's inside the solver, whose details I don't know -- used very simple orderings, things like approximate minimum degree. Really simple. And for most sparsity patterns, these things obviously work fine. So that's what this is.
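(Going back to the positive-definiteness check for a moment: in Matlab, the graceful version of this corresponds to chol's optional second output, which reports where the factorization broke down instead of throwing an error. A tiny illustration with a made-up test matrix:)

    % Using Cholesky as a positive-definiteness test.
    A = randn(5); A = A'*A + eye(5);   % positive definite by construction
    [R, p] = chol(A);                  % p == 0 means A was positive definite
    B = A; B(1,1) = -1;                % break definiteness
    [R2, q] = chol(B);                 % q > 0: chol stopped early; B is not positive definite
    fprintf('p = %d, q = %d\n', p, q);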
Now, the exact cost really depends in a complicated way on the size of the problem, the number of non-zeros, and the sparsity pattern. So actually, it's interesting to distinguish the dense versus the sparse case, so let's talk about that. And I'll just talk about a little laptop or PC, something like that. Dense is, like, no problem. You can go up to, like, 3,000 -- just no problem. It'll scale something like N cubed; you already know that. And it'll just work. It'll start allocating more memory and start taking a macroscopic amount of time, but it's going to work. There's essentially no variance in the compute time; it's extremely reliable. It's going to fail if the matrix you pass in is singular or nearly singular, but then you shouldn't have been solving that equation anyway, so you can hardly blame the algorithm. So for all practical purposes, the time taken to solve a 3,000 by 3,000 system with 3,000 variables is just deterministic. By the way, at the microscopic level that's a big lie, but it's good enough. And the reason it's a lie is very simple: depending on the data in that matrix, different pivoting will happen -- with no pivoting it might take less time, with a lot of pivoting it might take more -- but it's absolutely second order, just not there. Okay? So that's the deal: there's no stress in solving a 3,000 by 3,000 system. Zero. Absolutely zero. It just works. You can say exactly how long it's going to take ahead of time, and that's how long it's going to take. Period. Now you go to sparse. Now -- well, it depends. There are actually two passes, and it depends on the sparsity pattern. So let's go to a typical example: a 50,000 by 50,000 matrix. You want to solve Ax = b, 50,000 by 50,000. Now, you can make a small offering to the god of sparse matrix orderings -- some people say that's effective. But the point is, there's no reason to believe you can always solve 50,000 by 50,000 equations. Okay? And in fact, there are ones you can't. It just depends on whatever sparse solver -- whatever ordering method -- is being used, typically on your behalf. I mean, unless you go into large problems: if you do large problems, you have to know all of this material, every detail. Okay? If you don't work on huge problems, then this is stuff you just have to be vaguely aware of on the periphery. Okay. So in this case, 50,000 by 50,000, there's a little bit of stress. You type A\b, and it might come back just like that. You'll see examples soon where it comes back like that. Or it might just grind to a halt, and you can go and watch the memory being allocated and allocated and allocated, and at some point you'll put it out of its misery. However, the interesting thing is this: if you do it in two steps -- if you do a symbolic factorization first -- you're going to know. You'll know. And after that, for matrices of that sparsity pattern, the cost is absolutely fixed. Right? So this has a lot of implications in applications. If you write general-purpose software that has to solve Ax = b where A is sparse, you don't know; basically, you deal with each one fresh. If I pass you an A matrix, the first thing you do is a symbolic factorization. You look at it and you go, "Well, it's sparse. All right, let's see what we can do with ordering." And then you do the ordering.
You do the symbolic factorization, and you know how many non-zeros there are going to be. At that point, if you wanted to, you could announce how long it's going to take to solve the problem, because once you know the sparsity pattern of L, it's all over -- you know everything. In fact, just the number of non-zeros is a good -- question? [Student:] [Inaudible] symbolic factorization? Well, first of all, that's in the appendix, which you should read. So let's start with that. [Student:] Appendix B? Well, yes. Okay, some people are saying it's Appendix B. It's one of the appendices; there aren't too many, so a linear search through the appendices is feasible, I think. So the appendix is where this is covered in more detail, but you have to understand that even that is like condensing a two-quarter sequence into 15 pages or something. Okay. So the interesting thing is, if you're going to solve a set of equations lots of times -- suppose you want to do medical imaging, or control, or estimation, or some machine learning problem, but you have to keep solving the same problem with the same sparsity structure -- at that point, you could invest a giant pile of time into getting a good ordering. There are whole communities that do nothing but calculate orderings. You could calculate your ordering for three days, and -- by the way, if you get an ordering that works and gives a sparse Cholesky factorization, it's no one's business how you got it. Right? Like a lot of things, it's just fine. So you could do that, and then it can be fast. Okay. All right. Let me just summarize this. For dense matrices, it's totally predictable how much time it takes. For sparse, it depends on the sparsity pattern and on the method that you -- or whatever is solving the equation on your behalf -- use to do the ordering. By the way, if it fails in Matlab or something like that, because Matlab is using the simple approximate minimum degree, symmetric minimum degree, blah, blah, blah ordering, it could easily be that a more sophisticated ordering would actually solve it, no problem. Right? So -- okay. On the one hand, it is too bad that the sparsity pattern plays a role in these methods. On the other hand, hey -- you want to solve 100,000 by 100,000 equations? You can't be too picky. You can't say, "Wow, that's really irritating. Sometimes I can do it, and sometimes I can't." So anyway. All right, enough on that. And finally, one more factorization we'll look at is LDL transpose. This is for a general symmetric matrix. Here, the factorization is: there's a permutation, an L and an L transpose -- they can have ones on the diagonal -- and D is block diagonal, where the blocks are either one by one or two by two. Such a D is easily inverted, by the way: if I give you a block-diagonal matrix where the blocks are one by one or two by two, can you solve Dz = y? Well, of course you can, because you're either dividing or solving a two by two system. Okay. And it's sort of the same story, so I think I won't go into that. Now, we'll look at something else -- everything we've looked at so far is just so you can get a very crude understanding of what happens. There's nothing for you to do so far. Right?
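(To make the ordering and fill-in story concrete, here's a small experiment you can run; gallery('poisson', ...) just provides a convenient sparse positive-definite test matrix, and symamd is one of the simple fill-reducing orderings mentioned above.)

    % Fill-in of sparse Cholesky with and without a fill-reducing ordering.
    A = gallery('poisson', 100);      % 10,000 x 10,000 sparse SPD (2-D grid Laplacian)
    p = symamd(A);                    % approximate minimum degree ordering
    L0 = chol(A, 'lower');            % natural ordering
    L1 = chol(A(p,p), 'lower');       % permuted: A(p,p) = L1*L1'
    [nnz(A), nnz(L0), nnz(L1)]        % compare the fill-in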
So in fact, most of this stuff is done for you -- in most systems, this is done for you. You call a sparse solver. You can do that in Matlab just by making sure A is sparse and typing A\b, for example. Or the real thing would be to directly call something like UMFPACK or SPOOLES -- there are all sorts of open-source packages for doing this. And you're probably using them whether you know it or not. You're certainly using them in Matlab; if you use R, you're using them. Actually, I don't know -- does anyone know what the sparse solver is in R? [Student:] It doesn't have one. No, come on. [Student:] It's a separate package. Okay, that's fine, but it has one. Come on. [Student:] Yeah, but it's not, like, built into the language. Well, that's okay. That's fine. I mean, come on, it's an open-source thing, so it doesn't matter then, right? So that's fine. Okay. Now we're going to talk about something that is going to involve your action. Everything so far has been me explaining what is done on your behalf by some of these systems. Now we're going to talk about something you actually need to know, because you're going to need to do it. These are things, by the way, that will not be done for you. They cannot be done for you; you'll have to do them yourself. Up till now, it's just been "here's how things work." Now you'd better listen, because now there's stuff you have to understand and will actually have to do. So we'll start with blocking out a set of equations -- really dumb. We just divide it up like this. You can divide it up any way you like, and we're just going to do elimination. It works like this: take the first block row. It says A11 x1 plus A12 x2 equals b1. I solve that equation by multiplying on the left by A11 inverse, which I'm assuming exists. Obviously, a leading sub-block of a non-singular matrix does not have to be non-singular. Right? That's obvious: the matrix [0 1; 1 1] is obviously non-singular, and its leading one-by-one block, which is zero, certainly is not. But assuming A11 is invertible, you can eliminate x1, and you just get this: that's x1. You substitute x1 into the second equation, and in the second equation you get A21 x1 -- that's A21 times A11 inverse times this junk over here -- and then that plus A22 x2 equals b2, and that equation looks like this. Now, this guy you have seen before, although you probably don't remember it. Or maybe you do. That's the Schur complement. The Schur complement comes up all the time, so you might as well get used to it. By the way, if you have an operator that forms the Schur complement, the Cholesky factorization is, like, two lines, because you just keep calculating the Schur complement of what's below you, or something like that. So that's the Schur complement times x2 equals this right-hand side. Now, this suggests a method for solving this set of equations, and you do it this way: you first form A11 inverse A12 and A11 inverse b1. And by the way, I have to say something about this, because this is already high-level language. When you say "form this," how do you do it? Do you actually calculate the inverse of A11? It depends on what A11 is. What this really means is the following: you factor A11 once, and you back solve for b1 and then for each column of A12. Okay?
So it's understood, when you see this in pseudo-code, that you know what you're doing and that you don't form this inverse and all that kind of stuff. It's just understood -- that's what it means. [Student:] [Inaudible] in Matlab, because we can wait to do that? Yeah. [Student:] Can Matlab store that [inaudible]? Yeah. So let me -- this is coming up on your homework, which we'll put on the web server -- that is, if it's up. Sorry about the AFS outage yesterday. I don't know how many people noticed it. Anyone notice it yesterday? I think it raises our blood pressure a lot more than yours. I've got to learn to treat it like a power outage and just deal with it, but it makes you very nervous. That's my, like, number-six worst nightmare: final exam, AFS outage. Or something like that. I don't know. Okay, what were we talking about? Factor-solve. Yeah, yeah. Matlab. All right. Yes. Of course. Okay. Let's suppose we had to do a Cholesky factorization -- okay. So the way you do it is this: you're going to factor a matrix A, and here's what you do -- and this will work. First, I have to tell you a little bit about what backslash does, which you could figure out anyway. It does a lot of things; I'll just give you what's essential for what we're talking about now. Here's what it does. First of all, it checks whether A is sparse or something. But even if A is full, it looks to see if A is upper or lower triangular. Okay? If A is upper or lower triangular -- actually, more than that: if it can be permuted to upper or lower triangular -- but let's just say, if it's triangular, it will actually do a back solve. Okay? So for example, that means if A is 3,000 by 3,000 -- that's a big enough matrix that you will see a macroscopic difference -- and I type A\b with A full, it's going to take order N cubed. I don't know how long that'll take -- 20 seconds, 15, 12, I don't know, something like that; it obviously depends on the machine you're on. But it's going to take a macroscopic amount of time: you'll hit carriage return, and you will wait a little bit before you get the answer. Okay? If A is 3,000 by 3,000 and lower triangular, what happens when I type A\b? [Student:] N squared? Yes, it's N squared, and I won't even see it. I'll hit carriage return, and it'll take more time to interpret it and to instruct X, or whatever is drawing your window, to type into that terminal -- that's going to take way more time than solving it. So it'll go from, like, ten seconds to nothing. In fact, we can guess how much faster it'll be: it's going to be on the order of 3,000 times faster -- that's the order; the constants differ by a factor of two, so it'll be about 1,000 times faster. That's my guess. Okay. So this is the first thing you need to know. Now, suppose you want to solve A xi = bi in Matlab, where you have a bunch of these, and you do them one at a time, in a loop. Okay? This would be intensely stupid. I mean, it would work, but anyway -- if you did this, what do you think it'll do? It'll look at A fresh each time and go, "Great -- let's factor that guy." Right? It'll even recalculate the permutations fresh every time. Okay? So this is not the way to do it.
If you do this, though -- A backslash a matrix of right-hand sides -- it will actually do the right thing. This is nothing but a batch solve. That's all it is, because the columns of A inverse B, where B is the matrix of right-hand sides, are nothing but A inverse b1, A inverse b2, and so on. So you can call it a batch solve or whatever you want. If you write A\B, it'll do the right thing: it'll factorize A once, then go through the columns and back solve. Actually, to tell you the truth, even that is not what it does -- it again blocks it up and calls an LAPACK code -- but these are fine details; let's pretend that's what it does. This will do the right thing here. Okay? So for example, what is the run time of this versus this -- the batch solve versus a single solve? [Student:] [Inaudible]. Identical. Basically identical. It's not exactly identical, but it's the same for all practical purposes. Okay? So that's an important thing to know. Now we can put all these together. Suppose you need to factor A once, but you don't have all the things you want to back solve against at the same time. Then you can't do this trick -- it's not a trick, I mean -- okay, here's what you do. You would do something like this; I'm just going to do this approximately. By the way, I think this gives you the upper triangular version of L, but it hardly matters. So you do this once -- and the cost of this is N cubed; is that the comment? I think it is. Okay, there. That's an N cubed thing. So for 3,000 by 3,000, there will be a macroscopically visible delay when you do this. Okay? Then -- well, let's see how I can do this. You have L L transpose equals A, so A inverse b is L transpose inverse times L inverse times b. Is that right? Yeah. So -- ready for my code now? If I do this once and store L, then anywhere else from now on, if I want to solve Ax = b, I can do it super fast, like this. Let's see if I can get this right. There. There you go. I can execute this anywhere I like with different b's, and it will execute super fast -- N squared. Do you see why? I mean, there's a lot of interpretive overhead, so the whole thing -- basically, it looks at this, and it pokes around for a while, and after a while it says, "Say, L is lower triangular. Oh, okay, I'll call the back solve instead of factoring it -- it's already factored," or something like that. So it'll do this. Then this will be passed to the outer solve, and it'll analyze that matrix as if it's never seen it before and go, "Hey, wait a minute -- that thing is upper triangular. Oh, okay, I'll just go ahead and do a back solve." By the way, all of this thinking takes way longer than actually doing it, unless the matrix is a couple thousand by a couple thousand. Right? But the point is that this will now execute very fast. So I think I just answered your question.
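(Spelled out, the pattern he's describing looks roughly like this -- a sketch assuming the default chol, which returns an upper triangular R with A = R'*R, and a made-up positive definite matrix:)

    n = 3000;
    A = randn(n); A = A'*A + n*eye(n);   % some positive definite test matrix
    b = randn(n,1);
    R = chol(A);                         % factor once: A = R'*R, about n^3/3 flops
    x = R \ (R' \ b);                    % each later solve: two triangular back solves, O(n^2)
    norm(A*x - b)                        % sanity check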
Okay. All right. So let's get back to block elimination, which is what this is called. What you do is you form A11 inverse A12 and A11 inverse b1 -- and the comment I just made is very important, because it's about to come up right now. These two, you would compute by factorizing A11 once and then back solving against A12 concatenated with b1. So this line would translate into something very compact. Okay. Then you form the Schur complement. To form the Schur complement, the point is that you've already calculated these columns, so this is just a matrix multiply and a subtraction. You form the Schur complement, and you form b tilde -- that's this right-hand-side guy here. No A11 inverse appears here anymore, because you've already calculated those subexpressions, so this is fine. Then you determine x2 by solving this equation -- the Schur complement times x2 equals b tilde -- and then you get x1: once you have x2, you go back and plug it into wherever our formula was -- here it is, right here, this guy. Okay? Now, the interesting thing is that you have to solve A11 x1 equals this. But guess what: A11 you already factored in step one, so this last step -- step four -- is actually a back solve. It is not a factorization; it's a back solve. Right? These are very important things. [Student:] I have a question. Yeah. [Student:] Why does the Schur complement arise? Like, what's the common thread? Oh. [Student:] Is it just coincidence? Or is there, like [inaudible]. [Crosstalk] No, it's not coincidence at all -- there's a physical reason for it. It has lots of interpretations; I think you did a homework problem on it, anyway. It comes up when you have a block set of equations and you solve for one set of variables in terms of the other. That's basically where it comes up. And it comes up in a lot of physical things, too, like in networks -- it just comes up all the time in all sorts of things. PDEs -- it comes up everywhere, basically. Whenever you have some linear system and you impose boundary conditions on some of the variables, a Schur complement is going to come along. It comes up in statistics, too: this happens to be the error covariance of estimating x1 given x2, or the other way around -- do you remember which it is? One or the other. If I have jointly Gaussian random variables and I look at the conditional expectation of x1 given x2, and then at the conditional covariance, it's this thing. So it just comes up everywhere. Okay. So if we work out this block elimination and see what happens -- you have a big drum roll, you work it out, and you think it's all very clever. When you total it up, you get something like this, in terms of factorizations and back solves: you get something that looks like that. Okay? And we're going to assume, by the way, that the smaller system -- the Schur complement one -- we do by dense methods, so you pay for the dense method: it's going to be n2 cubed. You get this. Now, if you just plug in a general matrix A, it comes out -- not just to leading order but, as it has to be, down to the flop -- identical to what we had before. So there's no savings; there's absolutely no reason to do this. Well, sorry -- in terms of flop count, for a dense matrix.
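(Here's the block elimination recipe written out as a Matlab sketch, with made-up dense blocks just so it runs; the whole point in practice is that the A11 solve in step 1 exploits whatever structure A11 has.)

    % Block elimination for [A11 A12; A21 A22]*[x1; x2] = [b1; b2].
    n1 = 1500; n2 = 300;
    A11 = randn(n1);    A12 = randn(n1,n2);
    A21 = randn(n2,n1); A22 = randn(n2);
    b1  = randn(n1,1);  b2  = randn(n2,1);
    % 1. factor A11 once; back solve against [A12, b1]
    [L, U, P] = lu(A11);
    Z = U \ (L \ (P*[A12, b1]));          % Z = A11^{-1} * [A12, b1]
    A11iA12 = Z(:,1:n2);  A11ib1 = Z(:,end);
    % 2. form the Schur complement S and the reduced right-hand side
    S    = A22 - A21*A11iA12;
    btil = b2  - A21*A11ib1;
    % 3. solve the small system for x2;  4. back-substitute for x1 (no new factorization)
    x2 = S \ btil;
    x1 = A11ib1 - A11iA12*x2;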
Now, the truth is, this is exactly how it's done when you do blocking and stuff like that. This is how you get fast: you arrange for the Schur complement block size to be just right for your registers and your L1 cache and all that kind of stuff, and if you get that size just right, this thing will give you a big speed-up. But that's a secondary point. Block elimination is useful exactly when A11 has extra structure that can be exploited. Okay? When it does, this method is going to speed you up like crazy. So let's look at some of those. Here's an example -- we'll do a couple of examples, actually, because they're very, very important. Suppose A11 is diagonal. That means your matrix looks like this -- I'm just going to draw a pattern like that. Okay? If it looks like that, then you're going to do unbelievably well, and here's why. Let's put some numbers on it: let's put 10,000 here and 1,000 here, something like that. You're going to do really, really well. So I have 11,000 equations and 11,000 variables, but it's blocked in such a way that A11 is diagonal. And by the way, this will have meaning in any application in which it arises. Basically, there are common variables, and everybody depends on them: if you look at every variable, they all depend on those 1,000 common variables, but then there are 10,000 variables that don't depend on each other. All I'm doing is providing the English description of this sparsity pattern. Okay? So from this class onward, if you see this, some big green lights should light up. This is what you want to see. If you see something like this, you will know it will scale to unimaginably large systems, and it'll scale very gracefully. Now, the reason is this. You just go back and look at what we're going to save -- and you can actually even determine -- I lost my -- oh, here it is, right here. All right. So let's actually see what happens in this case. A11 is diagonal. Well, this step is, like, super cheap right here -- again, a non-issue; don't even think about it. And by the way, here's the joke: forming S is now the dominating thing. In fact, if you were to profile the code that solves this problem, you would find that most of the time it's doing matrix multiplication -- it's forming this. That's going to dominate completely, and it's going to be a 1,000 by 10K times a 10K by 1,000 matrix multiply. That's it. In the end, you'll have to solve a 1,000 by 1,000 system, but that's going to be about 1,000 cubed, and forming S is 1,000 squared times 10,000 -- ten times more. So we know exactly the number of flops it's going to take: it's basically multiplying this matrix by that one. That's it. And it will be seriously fast. If, instead, you were to take the matrix, stored dense, and just write backslash on the whole thing, it's not going to work. Basically, it'll just sit there -- actually, it'll very happily start factoring this, very happily. This structure will not be recognized, and it'll start factoring things, and most of the time, of course, it's multiplying zero by zero, adding it to zero, storing it very carefully, and then accessing it again later -- pulling it back in. But it's very happy to do so. No problem. It'll just do it, and it just will never finish.
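(A sketch of exploiting this structure directly -- it's just the block elimination above with a diagonal A11, so the A11 solves are elementwise divisions. Sizes and matrices are made up; spdiags builds the sparse diagonal block.)

    % A11 diagonal: n1 "private" variables, dense coupling to n2 common variables.
    n1 = 10000; n2 = 500;
    d   = rand(n1,1) + 1;                     % A11 = diag(d)
    A12 = randn(n1,n2); A21 = randn(n2,n1);
    A22 = randn(n2);
    b1  = randn(n1,1);  b2  = randn(n2,1);
    D = spdiags(d, 0, n1, n1);
    A11iA12 = D \ A12;                        % A11^{-1}*A12: just n1*n2 divisions
    A11ib1  = D \ b1;
    S  = A22 - A21*A11iA12;                   % forming S dominates: ~n1*n2^2 flops
    x2 = S \ (b2 - A21*A11ib1);               % small n2-by-n2 dense solve
    x1 = A11ib1 - A11iA12*x2;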
Okay? So, by the way, this is called arrow structure. And it's a good thing. So from now on, arrow -- let's do another example. Suppose, in fact, this 10K block were ten 1K by 1K blocks. Okay? I can't fit ten in there, but you get the idea. By the way, there's a story behind this. Basically, it says, "I have 11 groups of 1,000 variables each." One of those groups connects to everybody -- that's this last group. The other ten only connect to that last guy. In fact, you should visualize the following graph -- it looks like that, except there are ten of these. Each node on the graph is a group of 1,000 variables. This group does not depend on that one, but one of them depends on all of them. How do you solve this? You do block elimination, and it will be way, way fast, because -- how do you do A11 inverse? A11 is 10,000 by 10,000. Well, you don't calculate A11 inverse as a matrix -- sorry -- how do you calculate A11 inverse A12? How do you form this expression? What's the cost to factorize a block diagonal matrix? Yeah -- you factorize each block. So this is ten times 1,000 cubed divided by three. That beats 10,000 cubed over three like crazy -- by a long shot. Okay? So that's very important. Okay. [Student:] So is the moral of the story don't use A\b, and just go through this algorithm in your own script? No -- now I'll tell you a little bit about that. Okay. Having said all that, for this example, here's what would happen. It would probably work. Let me explain a few things. If you had a matrix whose sparsity pattern looked like this, and you just did A\b with it stored dense -- too bad. It'll never work, and it's 11,000 by 11,000 -- never, never. Okay? However, if you gave, say, Matlab or some other system the hint that this is sparse -- and if you say it's sparse, all you're saying is that there are a bunch of zeros; you see this giant triangle here and this giant triangle here, those are all zeros, which is a lot of zeros, though there are a lot of non-zeros over here too -- then it is overwhelmingly likely that it won't do a bad job, because the ordering method will actually get all of these guys first. It might not get the ordering exactly right, but it's likely to do pretty well, because this arrow structure is one that a normal sparsity-handling system is probably going to recognize. Okay? So in this case: declare it as sparse, do backslash. You can do your own experiments, but they're not really very interesting experiments; they're just experiments on how somebody happened to implement these methods. It's not that interesting. The important part is that if you do real problems like this, you're calling these things directly, and then, of course, it's your job to do this. But now, let me mention some other cases where it's not obvious. Okay? Let's do one. Here's one. Let's suppose you do medical imaging, for example -- suppose you do MRI or something like that -- and you have to solve an equation that looks like this. Okay? Let's make this one million variables, and let's make this 1,000 here.
By the way, I have no idea what this particular problem is, but let's just imagine that such a thing comes up. And this one million by one million block here -- if it were, like, diagonal, you'd know what to do; we just did it. I'm going to make it, though, a DFT. Okay? So it's a discrete Fourier transform matrix. That's a dense matrix -- completely dense. So you could attempt to form your million-plus-1,000 by million-plus-1,000 matrix and actually attempt to solve it. It's completely dense. Everyone agree? Totally dense. So sparsity is not going to help you out here -- the gods of sparsity -- you are on your own on this one. Okay. However, how fast can you solve this system? You can't even store this, right? All you have to store is this part. Even that is pushing it, but that's your fault for going into medical imaging -- you do medical imaging, you're not going to do this on something with a couple of gigs of RAM, right? Obviously, we could work out what it takes. But this part is perfectly storable otherwise; you don't store the DFT block, obviously. So how do you solve this? [Student:] [Inaudible]. Yeah. Of course. So you look at this, and you go back, and you say, "Well, what do I have to do?" And you go, "Well, I have to apply A11 inverse a couple of times." By the way, you do not solve a discrete Fourier transform equation by -- if it's a million by a million, you don't even form a factorization. It doesn't matter. You actually calculate it with the inverse DFT -- an inverse FFT. It's N log N, so it's on the order of 20 times a million flops. It's way, way fast. Okay? Everybody got this? By the way, these are not obvious things. They're known by lots of people, but there are probably an order of magnitude more people who should know these things and don't. So that's why I'm telling you. So this is very important to know. In fact, I cringe whenever I hear people talking about scaling in very unsophisticated ways -- it's like, "Well, I tried that. It's way too slow." Well, anyway. By the way, the difference between this and just typing A\b -- the technical names are: that's called stupid or naive linear algebra, and this is called smart linear algebra. Smart linear algebra means you kind of know what you're doing, and you're exploiting structure at this level. That's what it means. So -- [Student:] How do you [inaudible] in English? How do you communicate it? Well, it might just end up being sparse. I mean, do you mean in Matlab, how do you do it? There's a very bad way, and then there's a good way. You make it sparse in the beginning, and then it'll be preserved as you do the right operations on it. You have to be very careful in Matlab, because one false move -- one little character here -- and boom, you have a dense matrix. You add the identity to it. Now, adding the identity to a sparse matrix should give you a sparse matrix, right? Not in Matlab, it doesn't. So you have to know: oh, you shouldn't add eye, you have to add speye -- that would be the sparse identity. [Student:] [Inaudible]. Right. Now, of course, that makes you wonder under what circumstances the identity is not sparse, and I don't know of any. Maybe the two by two case, because then you could say, "Well, only 50 percent of the entries are non-zero." I don't know; it's not my -- but anyway. Okay. So I think that's it. So basically, you want to look for block things.
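(A tiny sketch of the DFT example, with made-up coupling blocks: you never form the huge DFT matrix. In Matlab's convention, the unnormalized DFT matrix F satisfies F*x = fft(x), so applying A11 inverse is just ifft, and block elimination reduces everything to a handful of FFTs plus a small dense system.)

    % Block elimination when A11 is the (unnormalized) DFT matrix F.
    n1 = 2^16; n2 = 50;                  % scale n1 up as memory allows
    A12 = randn(n1,n2); A21 = randn(n2,n1); A22 = randn(n2);
    b1  = randn(n1,1);  b2  = randn(n2,1);
    A11iA12 = ifft(A12);                 % A11^{-1}*A12, column by column: n2 FFTs
    A11ib1  = ifft(b1);
    S  = A22 - A21*A11iA12;              % small n2-by-n2 Schur complement (complex)
    x2 = S \ (b2 - A21*A11ib1);
    x1 = A11ib1 - A11iA12*x2;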
If you look at a block and you see a block that's easily solved, exploit it. By the way, if what's easy about it has to do with its sparsity, you can just be lazy: make your offering to the gods of sparse matrix factorization orderings -- the heuristics for the orderings -- and hope that it works. But if it's dense or something like that, you're going to have to do the work yourself. That comes up in a bunch of cases, one of which we're about to look at right now. But that's the idea. Let's quickly talk, for fun, about which systems you can solve fast. We've already looked at some obvious ones: diagonal, lower triangular -- and we just talked about block diagonal a few minutes ago. How about block lower triangular? Can you do that fast? Yeah, of course you can, because if it's block lower triangular, you can do back substitution at the block level, and you'll have solves to do on the blocks. Okay? We talked about another one: the FFT -- sorry, that's the algorithm. The DFT, the discrete Fourier transform, is the operation; the FFT is an algorithm that does it fast. Others would be things like the DCT -- the discrete cosine transform you'd use in imaging. Anybody know others that are fast? [Student:] Toeplitz? Which one? [Student:] Toeplitz. Toeplitz, yeah, that's a good one. There are fast algorithms for Toeplitz matrices, and they all have names -- usually people are tortured by having to learn them operation by operation; Levinson-Durbin is one of them. Did they teach this in statistics? Probably -- no? Okay, good. It doesn't matter. Hankel matrices are another one; it goes like this. [Student:] Circulant. Circulant -- yeah, circulant. How fast can you solve a circulant system? [Student:] N log N. N log N, right, because you take an FFT, multiply, and FFT back. And circulant, of course, is the matrix of circular convolution. Ordinary convolution -- well, that's Toeplitz; we've already covered that. There are a bunch of others, and it's fun to poke around and learn about them -- there's one called the fast Gauss transform that comes up in machine learning; that's way fast. So there are a bunch of them that are, like, order N. And there are whole fields that can be described as fast methods of solving Ax = b. For example, if you solve an elliptic PDE, you are solving a very specific Ax = b. Okay? And those can be solved insanely fast. It's not easy to do -- it is not trivial.
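(Of the examples just listed, the circulant one is easy to write down concretely: a circulant matrix C with first column c acts by circular convolution, C*x = ifft(fft(c).*fft(x)), so solving C*x = b costs a couple of FFTs. A sketch, with a made-up c:)

    % Solve C*x = b where C is circulant with first column c, without ever forming C.
    n = 2^20;
    c = randn(n,1);                          % generically, all eigenvalues fft(c) are nonzero
    b = randn(n,1);
    x = ifft(fft(b) ./ fft(c));              % O(n log n) instead of O(n^3)
    x = real(x);                             % c, b real, so x is real up to roundoff
    norm(ifft(fft(c).*fft(x)) - b)           % residual check: computes C*x by circular convolution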
Actually, in each step of that algorithm -- the 20 or so steps -- you will solve, I promise you, a million by million equation. Definitely. It's super duper sparse. Right? Because each row and column will only have, like, two or three entries -- because you're connected to yourself. You're connected -- four entries. Whatever. You're connected to your neighbor above, below, left, right. Right? So super sparse. By the way, sparse orderings won't do so well here. It does okay for your homework -- or it should, anyway. It does. But it won't scale to bigger numbers very gracefully. However, if you look at it carefully, you'll realize that, in fact, solving that set of equations -- that big A11 block for that problem -- you know what it is? Solving an elliptic PDE on a uniform grid. It's like a Poisson equation or something like that. So you go to your friends who know about PDEs, and you say, "How do you solve that?" And they fall down laughing. And they're like, "Are you kidding? 1K by 1K? We precondition. Blah, blah, blah -- Fourier transform -- multi-level." I mean, whatever they do -- there are probably some people here who do this. Okay? It's not easy. It's a whole field. Right? But the point is, they know how to do it. And then, after they calm down, you say, "Great. Could you just give me the code that solves that?" And then wherever you have to apply A11 inverse in the back substitution, you put that there. Now you're done. So, okay. Okay. That's enough on that. Let's look at a consequence of this. It's the idea of a structured matrix plus a low rank one. This is actually very important. This is also extremely good to know. And this is also -- this is something that you have to do. In other words, you'd better recognize this, because no automatic system can do this. Period. So you have to do it. So here it is. Suppose you need to solve (A + BC)x = b. Here, B and C are small -- you're supposed to think of BC as low rank. This is, like, rank P. B is a skinny matrix, C is fat, and A is something that's quickly invertible. A could be the identity or diagonal -- block diagonal, lower triangular. You take your pick. It could be solving a Poisson equation. It doesn't matter. It's something you can solve fast -- circulant -- who cares? Okay. And you wanna solve this equation. Now, here's the problem. Generally speaking, whatever structure was good about A is almost certainly destroyed by adding -- in fact, just add a dyad. Just add a little B times C to it, and it'll destroy anything. It'll destroy a DFT matrix, a DCT matrix; Toeplitz structure is totally blown out of the way. Diagonal -- forget it. You take an outer product, it'll spray non-zeros everywhere. Block diagonal -- ruined. So that's kinda the idea. Okay. By the way, once someone calculates this and hands you this matrix, it's all over. If they don't tell you it's A plus a low rank term -- once they multiply this out and hand it to you, it's all over. You'll never recover it. You'll never get any speed out of it or anything. You're stuck at N by N now. Okay. So let's kinda do this. What you do is you do what we did in duality, which is you uneliminate. You introduce new variables. It's strange. So here's what you do. You uneliminate. You write this as Ax plus B times Cx, and you introduce a new variable called y, which is Cx. You uneliminate. Then you write this as Ax + By = b, and Cx - y = 0. That second equation just says y = Cx.
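Going back to the grid for a second: just to make the "super sparse" claim concrete, here is a hedged sketch -- generic, not the homework code -- of the kind of 1K-by-1K grid matrix being described, a 5-point Laplacian built out of sparse Kronecker products, with about five nonzeros per row out of a million columns:

    n = 1000;                                      % 1K by 1K grid, a million variables
    e = ones(n,1);
    T = spdiags([-e 2*e -e], -1:1, n, n);          % 1-D second-difference matrix
    A11 = kron(speye(n), T) + kron(T, speye(n));   % million by million, ~5 nonzeros per row
    nnz(A11) / numel(A11)                          % density on the order of 5e-6

Storing it is trivial; it's the fill-in during a generic sparse factorization that stops scaling gracefully, which is exactly why you'd hand this block to a PDE solver instead.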
So -- these two are completely equivalent. If you solve one, you can solve the other. Okay? Now when you look at this, you should have the following urge: you should look at a matrix like that, and your eye -- don't ignore this -- your eye should gravitate towards the minus I. I mean -- well, yes. That's right. Okay. Sorry. Your eye should gravitate towards this minus identity. And it should look at that, and you should have an overwhelming urge to carry out block elimination, because that matrix is easy to invert. They don't come simpler than I. Right? So you should have an overwhelming urge to eliminate this. By the way, if you eliminate this I -- if you eliminate this thing, what will happen? You'll take the shortcut, and you'll go, "Oh my God. This is great. This is fantastic. That's an I. I now know that if I see an easily inverted matrix appearing as a block on the diagonal, I know what to do." You do it, and you form the Schur complement. That's the Schur complement. Okay? So by the way, if the I were huge and A were small, great. But of course, this is [inaudible]. So if you eliminate this block first, you're actually back where you started. So now the trick is -- you have to actually hold off on the urge to eliminate an easily inverted block. You must hold off. This is very important. And instead, you eliminate A. Okay? So that's the idea. And actually, let me explain the reason. The reason is this: in this application, I is small. Let's say A is a million by a million, but B and C are a million by ten and ten by a million. In that case, this I is ten by ten. So it doesn't really help you to eliminate this. It's this million by million guy you wanna go after. And A, for example, could be a DFT -- a discrete Fourier transform -- or something like that. That's the one you wanna go after. If you eliminate that, you get this: (I + C A inverse B) y equals C A inverse b. And this, we can actually handle efficiently. Why? Because you put A inverse B here. That's fast because, somehow, you have a fast method for applying A inverse. You form the smaller system, and then you solve that. And this is sometimes written -- this has got lots of names. It's called the matrix inversion lemma, but it's got lots of other names. If you're British, it's called Sherman-Morrison-Woodbury, and then any subset of those, depending on whether you went to, like, Oxford or Cambridge or something like that. Okay. So that's one. And it's also called -- let's see. Matrix inversion lemma, Sherman-Morrison-Woodbury -- there's one more. I think if you came from MIT, there's some special name you have for it. It doesn't matter. You'll hear another name for it. So -- what's it called? Anyway -- I don't know. I'll think of it. It hardly matters. The point is, you'll hear lots of people talk about it, and sometimes it's just written out as the formula: (A + BC) inverse equals A inverse minus A inverse B (I + C A inverse B) inverse C A inverse, assuming all the inverses here exist -- that is, A is invertible and A + BC is invertible. And you would see this -- I mean, here are the types of things you should know. It's a staple in signal processing. In fact, essentially, most signal processing relies on this. This is basically the one non-trivial fact of all signal processing up to about 1975, I'd say. By the way, this is also the trick for optimization. So let me explain what it is.
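Here is a hedged Matlab sketch of that elimination -- the matrices are stand-ins (a sparse tridiagonal A plays the role of whatever fast-to-solve structured matrix you actually have); the point is that the only dense system you ever form is the small p-by-p one:

    n = 1e5;  p = 10;
    e = ones(n,1);
    A = spdiags([-e 3*e -e], -1:1, n, n);     % stand-in for a structured, fast-to-solve A
    B = randn(n, p);  C = randn(p, n);  b = randn(n, 1);
    z = A \ b;                                % one "fast" solve with A
    W = A \ B;                                % p more solves, same structure
    x = z - W * ((eye(p) + C*W) \ (C*z));     % small p-by-p Schur complement system

So the cost is p+1 structured solves with A plus an order p-cubed solve, instead of factoring a dense n-by-n matrix.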
In that case, B and C -- I mean, the most classic case is something like this: diagonal plus low rank. So here -- well, I can't use B again, so P Q transpose -- there you go. So, suppose you need to solve that -- something like that. Diagonal plus low rank is a beautiful example of a matrix that, if you know what you're doing, you can solve equations with so fast it's scary. But if you don't know what you're doing, it's a total disaster. So this one, you can solve. In fact, what is the order of solving that? Order N. It's just order N. There's nothing here to do. It's just order N. So if someone says, "Don't you understand? I have to solve a million by million equation," and you go, "Yeah. I understand." And they go, "It's totally dense. Dense! A million by a million! That's, like, 10 to the 12 entries in my matrix." And you go, "Yeah, I know. But you can solve it in about half a second. Maybe less." Okay? But you have to know what you're doing to do this. Okay? Nothing will recognize this. By the way, you might ask, "Is it easy to recognize a matrix which is diagonal plus low rank?" I guess you know the answer, by the way -- I'm asking. But if it were easy to recognize, you could imagine some preprocessing or some higher-level thing that would look at the matrix -- it could think a while. If it's a big matrix, this might be worth the thinking. So it could look at it for a little while and go, "Look at that. You're lucky. It's diagonal plus low rank." So what do you think? Is it easy to determine diagonal plus low rank? No. It's a famous problem in statistics. It comes up in lots of other areas. It's impossible. Okay? Let me just ask this, just to see if we're on the same page here. How about this: how about alpha I plus P Q transpose? Is that right? Am I asking the right thing? I think I am. All right. By the way, I'm not sure I know the answer to this one, but let's try it anyway. What I'm about to say -- my example might be just for the symmetric case, but let's just try it anyway. How about a multiple of the identity plus rank one? Can you detect that in a matrix? How? [Student:][Inaudible]. [Crosstalk] [Inaudible]. The eigenvalues will do it. Thank you. Thank you. What are the eigenvalues? [Student:][Inaudible]. You got it. Okay. And that's if and only if. Right. So in fact, this we can determine. This we can determine. Okay? Because one eigenvalue will be different, and all the others will be equal to alpha. Okay? [Student:][Inaudible]. Correct. Right. So then we say, fantastic. We have a method of recognizing a multiple of the identity plus low rank. Okay? And all we have to do is calculate the eigenvalues. And what's the cost of that? N cubed. Actually, with the multiplier, it's about five or ten times more than just solving the stupid equation. Okay. So -- all right. Well, has anybody had this? I mean, the main thing here is to think. But diagonal plus low rank? Forget it. So -- okay. So -- all right. So that's the picture. And these are just sort of stunning -- you get stunning speed-ups if you know this material. By the way, I should say I went over this kind of fast, and a slightly less fast description is in the appendix, which you should definitely read. And we're going to assume, moving forward, that you've read it and got it. A lot of this, by the way, you can experiment with. I mean, you can experiment with it in Matlab.
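For instance, here is a hedged sketch of the diagonal-plus-rank-one case -- d is the diagonal stored as a vector, p and q are the rank-one factors, all made-up data -- where everything is elementwise or an inner product, so the whole solve really is order N:

    n = 1e6;
    d = 1 + rand(n,1);                           % diagonal entries (positive, so diag(d) is invertible)
    p = randn(n,1);  q = randn(n,1);  b = randn(n,1);
    % solve (diag(d) + p*q') * x = b in O(n), assuming 1 + q'*(p./d) is nonzero
    x0 = b ./ d;                                 % diag(d) \ b
    w  = p ./ d;                                 % diag(d) \ p
    x  = x0 - w * ((q' * x0) / (1 + q' * w));    % rank-one correction

For rank P instead of rank one, the scalar denominator just becomes a small P-by-P system, exactly as in the earlier sketch.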
You have to be careful, though, because you're kind of experimenting half with this method and half with what Matlab actually does under the hood, and stuff like that. So -- okay. So I guess we'll quit right here.