All right. Now I guess Ifll take that as the ? as a sign that wefre on. So actually, first of all, embarrassingly, I guess I have to introduce myself even though itfs the second week. And I have to apologize for not being here last week. Ifve ? actually, for the record, Ifve never done that before. Ifve never missed an entire first week, and I promise I wonft miss any more classes, maybe, mostly, itfs ? anyway, Ifll make a very ? if I miss anymore classes, itfll be for a really, really good reason. Anyway, you were in very good hands with Jacob, so thatfs okay. A couple of announcements [inaudible], Ifll start with a bunch of administrative things. The first is that we have two more TAfs. This is Argerio Zimnus, who is sleeping, as most senior PhD students would be at this hour, and Cwong Mo Co is over here. What wefll be doing is adding more office hours both for them. We now have ? since we now have essentially like an army of TAfs, wefll ? we can have ? we were joking about just having office hours like 24/7. I donft think we can pull that off yet, but wefre going to do something like that, and in fact, I do have some questions. Also Ifm going to add office hours for myself, and we donft know ? wefre trying to figure out what ? I donft know, if people have opinions about where we should distribute the office hours, so I donft know. We have our own conjectures, but if anyone has any other? For example, in my case I might add them Tuesday and/or Thursday afternoons. What do you think? Yeah? Oh, that was a yes, oh, okay. Okay, all right, fine. Then Ifll do that. Ifll add office hours for myself, and for the TAfs. Now this week, of course, is chaos week, otherwise known as the equalsf week. If youfre not in this department youfll ? you actually ? you cannot even imagine what this week is like, but just look around you and youfll get the idea. If youfre in this department, you know full well. So those office hours, at least in my case will kick in ? have to kick in next week, but look to the website for that. Letfs see, a couple of more administrative things. Herefs one, and I ? it has to do with the homework. What wefre going to do is we actually did a little bit of planning for the homework, and the way itfs going to work, and actually itfs a good excuse for me to say a little bit about how the course is gonna go for a while, and this is actually substantive. Homework 1 and Homework 2, Homework 1, which youfve already started ? yes? Okay, so Homework 1 and Homework 2 are going to be just these big long slogs through ? I donft know, itfs ? youfll have nightmares about it, and people coming up to you on the street and saying, is this set convex, is this function convex, and things like that for years. But the nightmares will subside eventually. Theyfll always be there to some extent, so therefll be little triggers later in your life that will set it off again. Someone will say something like, which of the following, and youfll ? something will remind you of it and youfll get all on edge. But seriously, what wefre going to do is this; wefre trying to go fast over the first material, which is actually a little bit, I mean, it is interesting. Wefre going to use all of it eventually, but itfs sorta the mathematical basis of what wefre gonna do, and so the idea is to actually not cover it as well as we should. Meaning, wefre going to go fast. Faster than you ? therefs no way you could kind of absorb every subtlety. So what wefd like you to do is read the book. I mean, that goes without saying. Read the book, youfll do the homework, listen to the lectures, and youfll get maybe 75 percent of it, something like that. Actually, thatfs enough because whatfs going to happen then is in ? by Homework 3, itfll actually get interesting and wefll be doing a lot of applications from then on, kinda mixed in with new material, and at that point it gets interesting. So the idea is to avoid having something which for four weeks, non-stop, is essentially just a math course, so thatfs sorta the rational behind this. On your part what it means is this; if you think wefre going fast, youfre right, we are. But it also means donft worry about getting absolutely everything. All the things wefre gonna to see youfre gonna get lots more chances, and itfll be more interesting later because itfs gonna be in an applied context. Having said that though, this is your time to kinda get down some of the basics. Okay, so thatfs our plan for how the course is gonna go. I should also say that there was one change. I donft know if it was announced yet, and that is that the homework wefre gonna try to have graded a little more seriously. So the ? then for example 263; in 263, the homeworkfs were graded on a scale of 0 to 4, very crudely. Thatfs nominally two bits. The actual amount of information from the entropy was on the ? of .2 bits. We havenft worked out actually what the entropy is, but it was something like that. So wefre going to try to do ? at least give you a little more feedback and make this 0 to 10. Itfs going to be imperfect, but wefre going to attempt that. So itfs going to look something like that. The idea is if itfs perfect, thatfs a 10, if you ? anyway, wefll ? when we ? wefll get into that later. Another admin ? maybe just two more admin things and then wefll move on to real material. The other is that we posted another reading assignment, and so right now you should be reading Chapter Two, and when you finish Chapter Two, keep reading, just go right on to Chapter Three, and four for that matter after that. And then letfs see, one more admin announcement. This is a bit specific, but itfs a good excuse to mention something. We got several questions about what do inf and sup mean, and thatfs a good excuse for me to say a little something. So, you are welcome ? if you donft know what these mean, youfre welcome to treat these as min and max, okay? So, two ? one thing is actually for questions like that. When we get more than 10 or 15 questions, wefre going to start a FAQ, on the website, so please let ? what? Unknown Speaker: Itfs already there. Sorry, itfs already there. Okay, it wasnft there two minutes ago, but anyway. So therefs a ? therefs going to be a FAQ section in the website. Already pushed over? Okay. So there is a FAQ section in the website. And so please check there if you have some sort of basic question first. And this is anything ranging from, I canft get this software to work, to what do inf and sup mean, or when is Homework 3 due, or I donft ? I guess thatfs posted on the website, but that kind of stuff. Wefll just post stuff there. And if you have suggestions, actually, for us to add stuff there, just send us an email. Okay. And on this topic, I should say a little bit; if you have had analysis, of course on mathematical analysis, then you know what infimum and supremum mean. If you havenft, I would simply substitute min and max. And the way the course is going to work is something like this; the book, by the way, is not ? I mean, there are many more mathematical books, so if thatfs your interest on your website as pointers to other books, you can go insane. And it can be as formal as you like. By the way, outside the lectures youfre happy to hunt me down. Ifll tell you what I know, if this is your interest. Frankly thatfs not really the whole point of this class, but Ifm happy to answer any question about that. And Ifm talking about things like detailed conditions, is this open, is this closed, what happens if the boundary ? all these kinds of things. You know, what are the exact conditions under which this holds and things like that. In lecture Ifll be very caviler about these things. So boundary conditions donft hold, everything I say is modulo, some technical conditions. In the book we, generally speaking, donft lie. If you look in the book therefs just a few things that are wrong. Well, you can treat it as true, the book. But in lecture Ifll just be happy to just go broad brush over these things. And I think the class, by the way, is ? just from past experience, if youfve had ? if you have a deeper understanding of all the analysis and stuff like that, great. But I havenft noticed that it makes any difference in the experience. You donft have to know it; you can work with an intuitive idea of these things. Not really worry too much about boundary issues and things like that and youfll be just fine and totally effective when ? at using these methods. Okay, so thatfs all my admin stuff, any other questions? Otherwise, wefll jump in. So today wefll cover convex functions. And let me say a little bit about how this is going to work. If you go down to the pad here, that would be great. There we go. So wefre going to look at convex functions. First of all, just define what one is, and then look at various aspects of it. And let me say operationally why you need to know this; because one of the things youfre going to need to do is actually determine if a function is convex because to use this material, thatfs like the most basic skill. So thatfs why we want to ? thatfs why you want to look at all this and see how this works. Okay. So letfs just jump in. Well a function is convex if its domain is convex. Thatfs the first requirement. And the second is that it satisfies this inequality for theta between zero and one. So this says that if you evaluate F at ? this point you saw in the convex sets lecture. This is a convex mixture, a convex combination of X and Y. So geometrically itfs a point on the line segment between X and Y. This says if you evaluate a function at a point on a line segment between X and Y, the result is actually less than the same mixture of the values of the end points. Or in terms of the graph, it says that if you take two points on the graph of the function and then draw the straight line that connects them, and I guess an old word for that is the cord, or something like this. Right, so thatfs a very old word, is the cord, and it says ? this says the cord lies above the graph. So thatfs what ? thatfs what this inequality says. And actually youfll get a very good idea of what this means eventually. Another way to say it just very roughly is upward curvature. So it just means curves up, thatfs all. And by the way, of course, a convex function can look like that. It can be monotonic decreasing. Nevertheless, the curvature is upward for a function like that. It doesnft have to be bowl shaped, but it should have positive curvature. Okay, now you say a function in concave is ?F is convex. That means negative curvature, downward curvature, something like that. And itfs strictly convex if itfs convex. And not only that, but this inequality here holds with strict inequality provided data is strictly between zero and one. So thatfs strict convexity. And you have strict concavity too. So letfs look at some examples; well ? and these are just on R. So these are just functions you can draw. So the first is just an affine function, so thatfs linear plus a constant. Thatfs ? it has zero curvature, so itfs convex. And in fact what happens for a function that is affine is the following; is that in effect you have equality here. For always, for an affine function thatfs exactly what it means. It means that if you do a linear interpolation between two points, you actually get the exact function value. So thatfs essentially the boundary of a set of convex functions. Exponential, doesnft matter what the coefficient is in here, this is convex. So if A is positive, itfs increasing, but the curvaturefs upwards. If itfs negative, itfs a decreasing function, but itfs convex. Powers separate out. It depends on the values of the exponent. If the exponentfs one or bigger, or if itfs negative, or ? well zero [inaudible] thatfs just a constant one, in that case itfs convex. I mean, these are things you can just draw and see. So I think the question is to whether or not a function on R is convex, is a non-issue. Herefs how you check; you draw it, and you use your eyeball to see if it curves up. So therefs really no issue there, okay? So these ? wefll find formal ways to verify all these, but I think in terms of a function on R, there is no issue. You just ? you draw, does it curve upward or not, thatfs the question. And you have things like power of absolute value would be another one. Negative entropy is X log X. Thatfs something that goes like this; itfs got this infinite slope here and then goes like that. Something like that, and ? but at point as it curves upwards, thatfs negative entropy. So entropy, which is minus X log X is going to be concave. Okay? Now in concave functions examples would be like an affine function, and in fact, of course, if a function is both convex and concave, then itfs affine. And thatfs clear because it says basically; this inequality holds one way and the other. That means its equality. This in fact implies the function is affine. Okay? This powerfs in the range between zero and one, you know, like square root for example, you just draw this, and itfs clearly concave. Log rhythm, itfs another famous example. Okay. A little more interesting example on RN and RM by N, thatfs the set of M by N real matrices. So herefs ? of course you have an affine function on RN, thatfs [inaudible] plus X plus B. Thatfs a general linear function plus a constant. So this is the form of a general linear function, affine function on RN. And thatfs going to be convex, itfs also concave. Norms; so any norm is convex. That follows actually from triangle and equality, or ? I mean, thatfs part of the definition of being a norm. And examples are things like this; the so-called p-norms, which is the sum of the absolute value of XI and then to the one over P. Now for P equals one, thatfs the sum of the absolute values for P equals two. Itfs usually Euclidean norm, but for example, for P equals three, itfs the three norm, which is the cube root of the sum of the cubes of the absolute values of a vector. Yes? [Student:]Is there a half norm? Thatfs a very good question, is there a half norm? So some people would ? let me answer it first ? well let me just first give the answer. The answer is no because the square of the sum of the square roots is actually not a convex function, and therefore it canft be a norm because norms have to be ? all norms are convex. Itfs actually concave. Now having said that, it is currently very popular in statistics and a bunch of other areas to use a ? as a eristic for finding a sparse solution that involves ? wefll see this later, by the way, using things like the so called p-norm for P less than one, but itfs not a norm, so itfs weird. [Student:]Why do all the norms have to be convex? That actually follows from the definitions of ? norm has to satisfy a triangle in equality and a homogeneity property. And then from those two you can establish it has to be convex. By the way, I should mention something here. Itfs not easy to show ? if you put ? just for general P, just take the three norm. Itfs not easy to show thatfs a convex function. Itfs not easy at all. So itfs ? I mean, itfs not hard, but you have to know exactly just the right five or six steps to do it. So this is maybe the first thing Ifve said that wasnft trivial, and itfs not. Okay, letfs look on matrices; actually, every now and then wefre going to ? there are going to be problems where the variables are matrices. Sometimes square, sometimes theyfll be rectangular, but letfs look at a couple right now. What is an affine function on matrices? Well the general form looks like this. A trace of A transpose X plus B ? by the way, when you see this you should read this as follows; this ? by the way, some people write this as the inner product of A and X plus B. Thatfs the standard inner product on matrices, is trace A transpose X. If you work what it is entry by entry, itfs just this. So itfs as if ? well itfs this, let me write it another way, letfs see if I can do this. If ? thatfs not totally ? thatfs mixing weird and ? that mixing notation, but the point is this says that if you string out A as a vector, string out X as a vector and then calculate the ordinary inner product, well you would just get this, like that. Okay? And thatfs the same as this thing. So this notation, you might as well get used it. When you see trace A transpose B ? A transpose X, that really means you can think of it this way. And, by the way, another notation for this is A big dot X or something like that, so youfll see that. Okay. Herefs an interesting function, which is the norm of a matrix. So thatfs ? and Ifm talking about here, the spectral norm or the maximum singular value, or you can call it many other things, and therefs not that many other ? actually therefs probably just a couple other names for it, the L2 induced norm and maybe the ? now Ifm running out of obvious ones. So this is the square root of the largest [inaudible] X transpose X. Now I want to point something out, that is very ? thatfs a very complicated function of a matrix. So thatfs ? it is not a simple ? to take the matrix form X transpose X, find the largest [inaudible] value of it then take the square root of that. Thatfs a chain of quite complicated operations. So thatfs a function which is not simple, but itfs a norm and itfs convex, okay? Now herefs one extremely useful property for convex functions, is this; a function is convex if and only if itfs convex when itfs restricted to any line. Thatfs very, very useful. And in fact, itfs one of the best ways to actually establish convexity of a function. And essentially it means the following; I already said that convexity of a function on R is a non-issue. Plot and use your eyeball. This says in principle, convexity of any function is a non-issue. Now the only hitch is you have to unfortunately plot and use your eyeball on all possible lines that pass through the set of which there are [inaudible] number. Okay, so thatfs the only ? but conceptually it means that therefs nothing subtle about convexity. Itfs basically if you take some complicated horrible function, multiple dimensions, and you take a line, then ? and then view that function on that line, you should view something that looks like that. And if that happens no matter what line you choose, itfs convex. Thatfs what it is. So this is not too hard to show, and may indeed be coming up on a homework or something for you. I mean, to establish it, this is the case. And letfs look at an example; so herefs an interesting function; if the log of the determinant of a positive definite matrix ? by the way, thatfs a complicated function right there. For example, if X were 10 by 10 and I wrote this out, it wouldnft ? it would take like a book to write out what that is because youfd have 10 factorial terms in log det X. So this is a log of a polynomial of these 100 by ? well okay, itfs 10 by 10 so therefs only 55 entries because the bottom is the same as the top or whatever. So therefs 55 entries, you know, free variable. So this is the log of a polynomial in 55 variables, and that polynomial probably would take a book. Ifm just making a wide guess. By the way, it could be a whole lot bigger than a book too. I could be off a bit. But ? so thatfs ? this is really a fantastically complicated function. Actually, this function is concave. Thatfs going to be very important. Itfs going to play all sorts of roles later. So letfs actually show that, that this is concave using this technique. This is ? well itfs the only technique you know except for applying the definition. Now ? by the way, what does it mean to say its concave? Basically it says that if you have two positive definite matrices, and you evaluate the log and determinance of them, and then you form a blend of the two, letfs say the average. It says, the log of the determinant of that average is bigger than or equal to the average of the log of the determinants of the end points. Does everybody follow that? So thatfs ? and this is not an obvious fact. I mean, once you know it itfs obvious, but ? well itfs actually not obvious, letfs just say that. Even when ? once you know it itfs obvious only because you know it. Okay, so letfs see how this works; so to establish this we have ? what we have to do is we have to pick an arbitrary line in symmetric matrix space, okay? So what does a line look like in symmetric matrix space? Well it looks like this ? and itfs one that passes through the positive definite cone. So itfs going to look like this; without lost generality it looks like a point. X, which is positive definite, plus T times a direction V. Now this direction V is a symmetric matrix, but it does not have to be positive semi-definite, right, because thatfs a line in a direction ? therefs no reason the direction has to be positive ? it does have to be symmetric. So this thing describes a function of a single variable T. And we have to check that this is in fact a concave function of T, so thatfs what we have to do. By the way, if youfre looking at some function, you have absolutely no idea if itfs convex or concave. First thing to do when youfre near a computer, sit down, generate arbitrary line and plot and look. Look at 10 or something like that. And by the way, if you find one that doesnft curve up then you destroy all evidence that you did this and then comfortably say, obviously thatfs not convex, and then erase all the files that ? itfs three lines, but the point is then you erase the evidence. If they keep coming up like this, then after 50 times itfs like, well okay, here it goes. And then you roll up your sleeves and you move in to prove it. And it may not help you, but itfs actually quite useful to do this. Ifve actually failed to follow this advice on several occasions, jumped in 45 minutes later, wandered ? got tired, wandered over, typed in a few, quickly turned up, found a point where it had the wrong curvature and then realized I was just ? I had just been completely wasting my time, so ? okay, I just mentioned that. Donft tell people where you learned that ? where you heard that. All right, so letfs work out what this is; well a right X ? Ifm going to write this ? therefs lots of ways to do it, but X is positive definite so it has a square root. So Ifll take half out on each side, and this will look like this, T, itfs gonna look like that, times X half. Thatfs what ? this matrix is this. But the determinant of a triple product is the product of the determinance. And you take logs and it adds and all that, so you get this because the log of X half plus log ? sorry, log det X half plus log det X half is log det X. So you get this thing here. Itfs still not too obvious, but what wefre going to do now is this; Ifm going to write the ? this thing is a symmetric, but not necessarily positive ? itfs certainly not positive semi-definite or anything like ? you donft know. Matrix, wefll take its igon expansion. Wefll write this as whatever, Q lambda Q transpose, and you get this with a T there like this. What Ifll do now is Ifll do the trick of writing I as QQ transpose then pull it out on either side and Ifll get det Q. It doesnft ? det Q is either plus or minus one, but it doesnft matter. And then Ifll end up with det I plus T lambda. Thatfs a diagonal matrix. I know what the determinant of a diagonal matrix is, itfs a product of the entries, and so I get this thing, okay? So I went over this a bit fast, but I promised I would go fast, so ?yes? [Student:]How did you choose the directional matrix V? Itfs extremely important that itfs completely arbitrary. So the technical answer to your question, how did I choose the direction V is, I didnft. Or arbitrarily I suppose is the actual correct adverb or whatever, okay? Thatfs ? because if V is ? if I chose V in any special way then my final conclusion is not gonna hold, itfs not right. To be convex has to be every line. It has to be convex when restricted to any line through the point. Okay. Now wefre in pretty good shape, because I know what log 1 plus T lambda looks like. For any real number of lambda ? if lambdafs positive it looks like one thing and it goes like this, or sorry, it goes like that. Nope, sorry, that was from my point of view. I did it right the first time, it goes like this. If lambda has the other sign it goes like that; either way curvature is negative, and so this is concave. Okay? By the way, this is gonna turn out to have sorts of implications. If youfre in statistics, if youfve taken information theory, communications, a lot of other things like that, actually, if youfve done any computational geometry, it turns out some things you know are actually related to concavity of log det X. Things involving like entropies of ram ? of galcie invariables and things like that. So actually throughout the class, when we get ? when wefve actually covered some material youfll see all sorts of things you know from other classes are going to start to connect to various things here. Next topic is ? this is just a bookkeeping thing. Itfs quite eloquent, but it ? and itfs actually something good to know about. It works like this; when you have a convex function, it turns out you can encode the ? itfs convenient to encode the domain of the function by simply assigning the value plus infinity when youfre outside the domain. So if you have a function F with some domain, then we simply ? we define an extended valued extension as follows; itfs gonna agree with the function if youfre in the domain and wefll assign it infinity outside the domain. And, I mean, letfs not worry about this here, but technically therefs a different between F and F tilde. And the difference occurs when you call F at a point outside the domain. If you call F at a point outside the domain, you get the dreaded ? whatfs returned as the dreaded NID token, otherwise known as enot in domainf. Of course, what would actually happen is some exception would be thrown at you or something like that, or a NAN would come back or something like that, but thatfs what that is. Whereas F tilde evaluated at a point outside the domain is of F, simply returns the token plus infinity. Okay? Now what happens is then everything kinda works. So if you have a function, sort of it looks like that, and its domain is from here to here, thatfs a convex function. What we simply do is we simply assign it the value sort of plus infinity outside; here, thatfs plus infinity, okay? So it looks like that, so you make it plus infinity. Actually everything works, including this inequality if you say ? if you take this point and this point, everything works. If you draw ? everything will work because this line will now have a slope going straight up and it all just works. So this is just something to know about, itfs absolutely standard in convex analysis. So itfs ? you should know about it. Now for concave functions itfs the opposite. You encode the domain ? or I should say, you encode the not domain as minus infinity in the function value, okay? Now we get to first order of condition. First of all, just to remind you, a function is differentiable if its domain is open. I mean you could also talk about being differentiable at a point, and youfd say that itfs differentiable to point if the point is in the interior of the domain, but wefll just talk about being differentiable period. So a point is differentiable if its domain is open and the gradient exists at each X in [inaudible]. Actually, this is a bit circular and what you really want to say ? you really want to say this is not ? this is informal so Ifll just leave it that way. So herefs the first order condition for convexity; this is very important, and actually itfs a hint as to why convex opposition actually works very, very well. Here it is, it says the following; form that is the Taylor approximation. The first order Taylor approximation of F at X. Okay, that says as a function of Y. Thatfs the Taylor approximation. So therefs F, here is this Taylor approximation, of course this is drawn in R, but in general this is the Taylor approximation. And of course what the Taylor approximation is this; it ? at the point in which you expanded, itfs perfect. In other words, it coincides with the function. Nearby, so near X, F ? the Taylor expansion is very near, by which I mean to say, formally near squared. So itfs small ? the error is small squared. So what the Newton tells you or calculus tells you is that these two functions, one is this affine function and one is this one, is that the are very near as long as Y is near X. For a convex function this thing is actually always an underestimator. So thatfs the important part. It says that the Taylor expan ? the first order of Taylor expansion is a global underestimator of the function. Thatfs what it means. Now, what ? let me ? I mean, this is actually the key to everything. I mean itfs ? once you know all these itfs kinda trivial, but this is the key to everything because let me explain what happens; later in the class wefll formulate all sorts of crazy problems as convex problems. And wefll come back and wefll say, Ifve solved it, this is the solution. It will be some horrible complicate problem with hideous no non-linearityfs, things that are non-differentiable, god knows whatfs in it. And someone will say, well how do you know thatfs the global solution? That problem is highly non-linear, itfs got all ? I mean, thatfs ridiculous. I mean maybe itfs a local solution, Ifd believe that, but how could you possibly assert that in all over our 100 therefs no point that does better than your point. Maybe therefs some sick little region you havenft examined yet where the function miracously does better. Everybody see what Ifm saying? And itfs really a preposterous statement that youfre making. I mean after a while youfll get used to it, but itfs preposterous. And then you said, oh no, no, because this is a convex problem, I assure you this is the absolute best there is. And someone will say, how can you say that, itfs ridiculous. This is it, and the reason is this; from local information, thatfs a gradient, a gradient is local information. It says, from local information you get global conclusion, which is this inequality. So just ? although the inequality is not a big deal, the fact that it says something that you evaluate a gradient somewhere, thatfs completely local. To evaluate a gradient all you need to do is wiggle X around near that point and see how the function varies locally. You donft have to do any big exploration far away. What is says if that function is convex then from that little local exploration you can make global bounds on that function that hold arbitrarily far away. Everybody see what Ifm saying? So not a big deal, but this is ? if you ever ? whatfs going to happen is three weeks into the class things that are just preposterous will be asserted, youfll be asserting them in fact. And if you ever wanted to go back and say, well what ? you know, but you know how it always is, right, therefs just a string of little trivialities and the next thing you know youfve said something quite deep, and then you always want to go back and say, where exactly did that happen? Anyway, I say it happened here. Thatfs what I say. Okay, second order of condition; I guess youfve seen this. This youfve probably seen in a calculus class or something like that. Usually it has to do with sort of this part, the suficion conditions or checking if somethingfs a minimum or maximum or something like that. And it basically says that the ? it says, if itfs twice differentiable, it says itfs a hessian, which is the makers of partial derivatives. If that is ? it says the following; it says itfs convex if and only if that matrix is positive-semidefinite, okay? So thatfs the condition. And then therefs a gap in characterizing strict convexity. And the gap says something like this; it says, certainly if the hessian is positive-definite everywhere then the matrix ? Ifm sorry, and then the function is strictly convex. Actually the converse is false and an example would be S to the fourth on R. Thatfs a strictly convex function; X to the fourth, but its second derivative at zero is zero, okay? Thatfs a case where the second derivative fails to be positive at the origin. So this is the condition. So this is useful sometimes. Actually, this is less useful, in fact, and also when ? although you will have to ? I mean, wefll see to it, let me put it that way, that youfll have to use this method to establish convexity of something. But as youfll see later, this is to be avoided because generally itfs a point to show something as positive-semidefinite, unless itfs really simple, the function. Now we can knock off some examples real quick. Letfs characterize all quadratic functions right now. So a quadratic function looks like this; itfs a quadratic form, itfs a p-asymmetric here, a liner function and a constant, okay? So the gradient of this function is PX plus Q, and the hessian is P. So thatfs the ? itfs constant, itfs got a constant hessian. So a quadratic function is convex if and only if P is bigger than or equal to zero, if itfs a positive-semidefinite matrix. And an example would be least squares objective. That just ? immediately. So here the hessian is A transpose A and thatfs going to be positive-semidefinite so thatfs always convex. Herefs a function that is not obvious, itfs one ? this is probably one of the first functions you encounter that is not obvious, itfs not obvious that itfs convex. Maybe other than minus log det X, but herefs one; X squared over Y just two variables. So itfs ? this is convex in X and Y provided Y is positive. And letfs take a look at that ? first of all, letfs just check a few things. If you fix Y itfs convex in X thatfs clear. If you fix X itfs convex and Y because itfs a one over Y in that case, okay? Now by the way, Ifm using there this idea that these are essentially lines. I mean thatfs a line, one is aligned with the X axis, one with the Y axis. So if a function of many variables is convex, it better be convex in each variable separately. The converse of course if completely false; you can have a function convex in each variable separately, but not jointly convex. And the answer in ? I mean, the distinction comes completely from the Hessian, right? So you have a, for example, a function of X and YQ variables, you have a two by two hessian that is required to be positive-semidefinite. To say that itfs convex in X and Y, it says the diagonals of that hessian are non-negative. Thatfs what it means. To ? that tells me nothing about the off-diagonals, so therefs a further condition linking the two and the cross condition. Question? [Student:]Is there any way to prove convexity or show convexity from [inaudible] projecting it on to one dimension [inaudible] number of times and ? Be careful because wefre not projecting onto a dimension, wefre restricting to one dimension. And the answer is, yes. In principle if you restrict it to any line, any sort of one dimensional set that passes [inaudible] and the result is always convex, then itfs convex, but it has to be understood thatfs in principle unless youfre prepared to do something symbolic with every line like we just did with log det. Okay. So letfs take a look at this function, and letfs show that itfs con ? by the way, herefs a plot of it, so it looks ? I donft know, some people have told me it looks like a boat or something like that, the bow of a boat or ? I donft know, anyway, so and if you look in ? I guess, one slice itfs quadratic and itfs ? I guess itfs ? I donft know what it is anyway so, if you slice it different directions you get obviously convex things. So lets work out the hessian, well I worked it out, and Ifm not going to calculate in front of you all the partial derivatives of the things here, but you calculate partial squared F, partial X, partial Y, and the diagonals, and you fill it out and you get a matrix which indeed is rank one, and can ? and positive-semidefinite because you write it this way. And here wefre using the fact that Y is positive here, to ensure that this is positive-semidefinite. So thatfs convex, so thatfs your ? maybe one of your first non-obvious convex functions. Herefs another on; the so called log sum X function. Thatfs the log of a sum of exponentials of variables. Actually, before we go on let me just say a little bit about this function; itfs ? first of all itfs sort of like a smooth max, or in fact some people call ? there are fields where this is simply referred to as soft max. So I know whole fields where this is universally called the soft max. The reason itfs soft max is this; if you take a bunch of variables, X1 through XN, then X increases of course very quickly. So the biggest X is a lot bigger. If therefs a good gap between X ? the largest X and the next one, X will accentuate the spread. Okay? If you add these up and then take the log ? if for example one of the Xfs is kind of isolated and far away from the second ? if the biggest is away from the second biggest then you can actually quickly see that basically this sum X is basically X of the largest. You take the log of that and you get the largest. So this is sort of a smooth ? itfs a soft max some people call it. I should also mention this is ? if youfre in electrical engineering, this is the DB combining formula. So this is how you combine powers that add incoherently. So if you have for example, if you have a bunch of powers adding, and the powers come in at, minus 15 DPM, minus 22, minus 37 and minus 3DBM, everyone in here knows what the power of the result is, except I forgot the numbers I just listed, but I think itfs minus three with the largest one. The answer is itfs minus three. So, whatever that is, minus three DB reference to a millawatt. And let me explain what that is; X convert from decibels to power. This does incoherent power addition and log converts back to decibels. By the way, obviously if youfre not in electrical engineering you donft have any idea what Ifm talking about, and thatfs just fine. So Ifm just saying this is a function, youfve seen it before and it comes up in statistics as well in exponential families, this comes up all the time. Okay. So, in a little statistical mechanics too, itfs the denominate ? a fancy version of this is some normalization constant or whatever. Ifm sure people here know a lot more about this than I do. So thatfs a convex function, and the argument would go something like this; you take the hessian ? by the way, if youfre curious, no one can look at this and write this formula out for the hessian. Just, you know, so if youfre looking at this and saying like, oh, that looks pretty easy, anyway, itfs not. You have to sit down and first you try to find some secret rules for ? chain rules for calculating the hessian of some function, thatfs the first thing you do. Then that gives you a headache. Then I usually just basically ? at some point you give up and you just go in there ? you just calculate partial squared F, partial XI, partial XJ, therefs just no way around it. You get this huge horrible mess then you have to go from your mess, your index by index representation to a matrix representation and then youfre off and running. Then you go back and erase all that and then write things like this. So ? just like this and then thatfs the idea, but lets take a look at it and see what it says; now when you do this you look at it and you see well therefs this first term, Z here is this X of X, so itfs obviously ? itfs non-negative. This think here, itfs interesting. Thatfs a non ? thatfs a positive diagonal matrix so this is positive definite ? positive-semidefinite ? well itfs positive-definite. And of course what youfre hoping for is to add that to something positive definite at which point you ? itfs easy because you just say the sum of two positive-semidefinite matrices is positive-semidefinite, done. And therefs a little minor problem, which is that this sign goes the wrong way, and that could mean two things; it could mean either this is not convex, or you have to work harder to show this thing is positive-seimdefinite, and itfs the latter here. [Student:]Is Z a vector here? Yep, Zfs a vector. [Student:]What is diag of [inaudible]? What is diag of? [Student:][Inaudible]? Of Z here? [Student:]Yeah? What is diag of Z? Oh, diag of Z, you take a diag ? this is actually entering ? of course its mat lab slang or something. But itfs entering ? itfs sort of entering mainstream mathematical notation. [Student:][Inaudible] Exactly. Yeah. So diag of a vector imbeds the vector into the diagonal elements of a diagonal matrix. What department are you in? [Student:]Double A. What? [Student:]Double A. Double A, okay. Now wait a minute here, have you never used [inaudible]? [Student:]A long time ago [inaudible]. It was a long time ago. [Student:]Maybe. Maybe, okay. [Student:]Last year, I think. Okay, last year. Okay, fine. Okay, thatfs fine, just checking. I thought you were going to say math or something like that. [Student:][Inaudible] Okay, all right. So it was in a CS class last year, which ? the memory of which ? you didnft remember what class it was or ? what was it? [Student:]220. 220? [Student:]Eight. Eight, okay, I figured. Okay. So back to our thread here, itfs not over yet, you still have to show this is positive-semidefinite. How do you show a matrix positive-semidefinite? You simply show that the associate quadratic form in always non-negative. So you put a V on the left to be transposed, a V on the right. You plug that in here and see what happens and you get this thing. And this turns out to be greater than or equal to zero using the Cauchy-Schwartz inequality applied here, okay? These are not ? this is not obvious. Okay? This is not something that you just type in. This is like 30 minutes of thinking and breaking pencils and things like that. Oh, you know, I just remembered something, speaking of that. I donft know that itfs been announced yet, but I ? itfs just a suggestion. A suggestion actually itfs not a bad thing to work on homework in possibly small groups of people depending on who you are. I mean people already probably have the sense to do this anyway, but youfre being officially encouraged to work in small groups like that. In which I case I could say so after 30 minutes of working this, breaking pencils, throwing pencils at the other people in the group and things like that. Actually working in a group is very good because itfs easy to think you understand something, very easy, when youfre by yourself. And it turns out if you have to address like two or three other people and try to explain it ? actually, you know while youfre explaining it youfll realize that so your mouth is going like this, right, and therefll be something in the back of your head, I donft know what part of it is, will actually come out and say, by the way ? also just looking at the others in the group you realize, this makes no sense whatsoever. I mean, what youfre saying makes no sense. So itfs a very good [inaudible]. Whereas a lot of things you know by yourself can be totally obvious. Totally obvious, then you try to explain it to someone. Also, if you try to explain it and itfs this long complicated ? you go, look, its super simple, itfs so easy. You just ? okay, you do this, but then its, well hang on, I forgot to say this. And then also, by the way, do you remember this and then you put that there ? this is also a hint that this is way to complicated, your description of it, so youfre encouraged to work in small groups. All right, letfs move on. So a geometric [inaudible] is concave, so this is the product of a bunch of variables, these have to be positive, although actually this works for non-negative. So itfs a bunch of variables and then the enth root of that, so thatfs concave. And itfs the same type of argument as for log sum exptH, same type of thing. Okay, some other connections, the idea of an epigraph and a sublevel set. So if you have any function then the sublevel ? the alpha sublevel set is the set of points with F value less than alpha. By the way, later in the course F will be an objective or it will be some type of objective, or something like that. And for example, if F is a design, then if F represents the power dissipated by some circuit design or something like this, this will be the set of designs that meet an alpha spec. Thatfs what this is going to mean, okay? Thatfs what a sublevel set will often mean. By the way, if itfs an estimation problem, an F is a measure of implausibility, like negative log likelihood in a statistical estimation problem. This will be the set of points, which are at least alpha plausible values, okay? So something like that. All right, now if you have a convex function then the sublevel sets are convex. By the way, the converse of that is false, but itfs a very important thing and wefll have a name for it very soon, a function whose sublevel sets is convex. The real correct connection between convex function, convex set, because wefve overloaded the word convex now to mean two things; it applies to sets and it applies to functions. The real connection is through something called the epigraph. So I guess epi means above, and so epigraph means everything about the graph. The graph of a function of course, is the set of pairs, XY. So itfs an RN plus one, itfs the actual graph. The epigraph is everything above it. So if I have a function, the epigraph is this shaded region like this. And herefs the real connection between convex set and functions. A function is convex if and only if its epigraph is a convex set. So ? and in fact, it turns out you should just be thinking about convex functions in terms of their epigraphs just always. So, for example, when I say herefs a function like that, in fact letfs put a domain restriction on it from there to there, you should just immediately visualize this set drawn in, and itfs a convex set. Actually this is going to be important because wefre going to do a lot of calculus on ? with convex functions and you want to think about what does it mean in terms of the sets? Thatfs what you want to be doing. So this you should just be thinking of at all times, the epigraph. Okay. Letfs look at Jensenfs inequality; so our basic inequality for convexity is this; it says if you take a point between zero and one, if you take two points and take some kind of weighted average of the two, thatfs what this is, and evaluate it, itfs ? so this says that F of the weighted average is less than the weighted average of F. I said it right. I always forget this. Ifll show you a pneumonic in a minute, which you can sneak off and do if you have something to draw on. You can even do it if you canft draw, but it takes me longer. Now therefs tons of extensions of this, for example, instead of just two points you could have a finet number of points and a bunch on fada Ifs that add up to one and are non-negative. Thatfs a convex combination. And the same inequality would be true thinking of an accountable infinite number of points, some combination that way, and then you can have an infinite one. The most general is this; if F is convex, then F of an expected value is less than or equal to expected value of F of X. Now thatfs called ? thatfs Jensenfs inequality. And this is where Z is a random variable, however, which is in the domain of F almost surely. So thatfs this, and this is Jensenfs inequality. And this basic case is nothing by this; itfs a distribution on Z, extremely simple. It takes only two values, X and Y with probability theta and one minus theta, and you recover this thing. So this is Jensenfs inequality. I never remember Jensenfs inequality, especially because whenever it comes up usually F is half the time convex, half concave. Of course if itfs concave it goes the other way. And so I actually never remember it and I usually have to just draw a picture and remember this thing. Therefs another way to say it though, I know what that is; so if you want a general pneumonic about how it works, it basically says that for a convex function dithering actually hurts. And let me explain what that means; it means that if I have a function ? a point here, and herefs a convex function, like that, okay? Letfs imagine, thatfs actually my target point in some process, but now, actually when I manufacture it, I actually get a distribution of values like that, okay? Whose mean is this point, everybody cool on this? So thatfs the ? this happens, so this is ? and then this tells you the cost, okay? And this could be the power, I donft really care, something like that. Or the speed of some ? lets make it the power. So this is the power, and it basically says, you know, look, these points, that are where the manufacturing [inaudible] they were on your side. The threshold voltage went up or whatever, something happened, and it actually dissipates less power than your nominal design. Everybody see that? And this is where it worked out badly. And now I ask, whatfs the expected value of the power? What is it? Well the first thing you would do is youfd say, well look, itfs around here, because if I approximated this by an affine function and I propagated this distribution through an affine function, then its mean would be the same. So the first quick answer is something like this; well itfs the same, if this is like one watt, and you have these manufacturing variations, sometimes youfre less than a watt, sometimes youfre more. And it says that the average is going to be on the order of watt, you know, like a watt. Everybody got that? But the fact is its worse ? the average is worse than a watt. Itfs like 1.05, it canft be better. And the reason is this; itfs true that when you go up and down here, The first order approximation, in that case sometimes youfre less than a watt, sometimes more, but because the curvature is upward you get ? letfs see, you pay more when itfs bad then you recover when itfs good. Did that make any sense? And thatfs entirely due to the curvature. So that my pneumonic for Jensenfs inequality. It basically says manufacturing variation generally isnft good. Yes? [Student:]What [inaudible]? Thatfs your question? So this is a good time for me to announce this. There will be periodically times, Ifd say multiple times per lecture where Ifll go in something and Ifll make no sense whatsoever. Its best ? if this starts happening like three or four times a lecture then you can let me know, but thatfs good feedback. Itfs best at least for now in the spirit of just rushing forward, wefll just move on. Did anyone understand what I said? A handful of people, theyfre just probably just being polite. Okay. Youfll learn to just move on. Now, let me ? actually, let me ? this is a good point to kind of explain ? give a sign post and say where we are and how this works. What youfre gonna have to do is youfre gonna have to look at functions and figure out if theyfre convex or not, thatfs what you have to do. So the methods I just looked at involving lines and all that, if you have to resort to that, if you actually write out a hessian, I mean this is to be avoided to be honest with you. This is ? you do this only in a last resort. Of course every now and then you have to; wefll arrange that you will have to because everyone should have to do this once or twice or something. But the point is the right way to do it is using a convex calculus. And so the right way to do it is this; whatfs gonna ? this method three, just sort of just general method is gonna work like this, youfll learn a bunch of atoms. Youfve already seen a bunch, affine function, powers, log sum exp, X squared over Y, norms, quadratic functions, thing like that, where once you know it, minus log det X you know itfs convex, okay? Then wefre going to look at a calculus. And a calculus meaning methods to combine these and rules for showing itfs convex. And you saw this with convex sets last lecture. But here therefs gonna be a bunch ? there are gonna be some simple ones here. Now in this rule set, these divide into what I would call sort of the really obvious basic ones and then therefs sort of that intermediate tier, and then you get into the advanced ones and the ultra advanced ones and things like that, therefs sort of no limit on these things. So I will tell you when we move into different levels of esoteric on this, but ? okay. So these are extremely basic sense of following. If you have a convex function and you scale it by a non-negative scalar, itfs convex, thatfs totally obvious. If you add two functions that are convex, itfs convex, okay? And that extends to adding five functions and it goes to an integral or even an expected value of convex functions would be convex. Composition with an affine function; so if you pre-compose with an affine function, so in other words if you apply an affine function then a convex function, you get something thatfs convex, and itfs ? many ways to check this. Actually, just directly is simple enough, just with the old theta in there. Okay? Very simple to show these things. Letfs look at some examples; we can make some examples now. F of X is minus log sum VI minus AI transpose X, and this of course is defined on the region where these are strictly positive. Thatfs the interior of a polyhedron. So I have a polyhedron defined by AI transpose X less than VI. The interiors where thatfs strictly less, and thatfs where this makes sense. This is by the way called the log barrier for this polyhedron, and you have a minus sign here. So you know, thatfs kind of a very complicated highly non-linear function of X. You could work out the gradient; you could work out the hessian. This one wouldnft be too bad because if you work out the hessian, which I would not recommend doing by just getting in there and slogging it out, calculating partial squared F, partial squared XI XJ, but using some of the rules for calculating hessians, some of these are in the ? these are in the appendix by the way. It would work, but the easiest thing by far is to say this, herefs my function, ready, Ifm going to apply ? here, Ifm going to take this function which is fi of Z is sum minus log ZI, okay? So thatfs it, it takes a bunch of variables, takes their logs, takes minus sum. Thatfs a convex function. Why? Well each of these is a convex function and a sum is convex. Thatfs a convex function. Again, nothing stunning yet. Now wefre going to simply pre-compose this function with this affine mapping. Ifll call it B minus AX, thatfs an affine mapping. And that gives you this function and then here, I just applied this, this composition rule. Herefs another one; is the norm of any affine function, so norm AX plus V is gonna be affine ? convex, sorry. Itfs going to be a convex function of X. Okay. Herefs one that maybe not totally obvious; so herefs one convex function and herefs another one, like that. The point wise maximum is this function here; it looks like that, okay? So at each point itfs the maximum of the two ? of the functions, okay? It looks like that. By the way, in terms of epigraphs, what does this correspond to? Precisely. So calculating point wise maximum of functions, in fact you can even write some silly formula for it, you know something like this; epi of max over I FI is equal to the intersection over I of epi FI, something like that, okay? So thatfs the correspondence here. That preserves convexity, okay? And you know, it means for example, here this function which is the max of a bunch of affine functions as a Piecewise linear function, thatfs convex. Obviously not all Piecewise linear functions are, but any Piecewise linear function expressed in this way is convex. Herefs one, this is again, not obvious. The sum of the R largest components of a vector. So take a vector in R50 and the sum of the top three elements. Thatfs a very complicated Piecewise linear function, itfs convex. Why is it convex, because itfs the maximum of A transpose X or A ? this is for the sum of the top three, is any vector with three onefs and 47 zeros. Now therefs a giant pile of those, therefs 50, choose three. But the point is, itfs the maximum of 50 choose three linear functions done. Itfs convex. So you have to watch out here because you slowly sneak up on this and youfll find out after a while youfve actually done something and some of these things are not obvious. Proving this by another method would really ? could be very, very painful. I mean, this one, hessian doesnft even exist, itfs not even differentiable, which would save you the horrible trouble of getting in there by hand and working out a hessian. But just proving that directly would be a real pain. Now this maximum business extends, and it extends to an infinite max or a point wise supremum. So herefs the statement; if you have a function of two variables, X and Y, and suppose itfs convex in X for each Y in some set, you donft even know what the set is, totally irrelevant what the set is. Finide, infinite, makes no difference whatsoever. Then it says that if you take ? again, you can leave this as max if you havenft seen this before, this is simply ? if you simply take the maximum over this possibly infinite collections of functions point wise you get a new function, thatfs going to be convex. And herefs some quick example; lets look at a couple of ? letfs start with this one, letfs take the maximum eigen value of asymmetric matrix. Now we discussed that before, that is a really complicated function. For a matrix that is six by six or bigger, there is no formula for this, none, because therefs no formula for the roots of a sixth degree and order and higher order polynomial. I mean, you donft need to know that, but itfs a good thing to know. Well itfs not useful; itfs just a cool thing to know. This is really a very complicated function, the maximum eigen value of a matrix, but watch this. If the supremum of Y transpose XY over all Y that have ? over the units sphere. The units sphere is the set of all points whose norm is one. Now letfs check me and see how the argument goes. Look at Y transpose XY. Now by the way, when you look at that you are ? at this point you are trained and wired, all by the quadratic part of your brain should be lighting up. This has been proved in FMRI studies, okay? But the point is actually here the variable is capital X. What kind of function of capital X is this? Remember, you have to suppress the flashing quadratic neurons. Itfs linear, thatfs a linear function of X. This is sum, XIJ YI YJ, itfs linear in X. Okay? So for each Y thatfs a linear function. This is a supremum of an infinite collection of linear functions. In fact, therefs one for every point on the units sphere in our end. Supremum of a bunch of linear functions, linear functions are convex. Supremum over these things is convex, thatfs a convex function. And now you know something thatfs not totally obvious. I mean, itfs not that unobvious either, but itfs ? this would be ? since thatfs like a two line proof or something like that, thatfs not bad. And of course itfs a disaster if you actually try to write out what lambda max is. I mean, just ? you write down square root of ? or thatfs not lambda max, but if you write down the largest of the lambda Ifs for which the character [inaudible] vanishes, itfs all over, youfll never recover, this is an example. Let me hit the next one, is composition. So under certain circumstances composition preserved convexity. So letfs see what that is; if you have H of G of X, -- so the rule goes like this, and Ifll ? the famous one is this, it says that a convex increasing function of a convex function is convex. Okay? So if ? this thing will be convex if you have a convex ? if the outer function is convex and increasing, which ? I mean non-decreasing, okay? So thatfs the condition. And the way to derive these as other ones, for example, itfs convex if this thing is concave and H is convex and non ? and decreasing, roughly is what it is. Now the way to check these is simply to write out the chain rules. So if you take ? you imagine that these functions are differentiable and of one variable and you work out the second derivative and you get something like this. And from things like this, this is how you read off the rules, by the way, these things can ? unless youfre doing this all day long these rules, you can easily forget them. What you should remember is there are composition rules, and you have to go back and [inaudible]. Whenever Ifm somewhere and I have to figure this out I actually quietly write this out and then figure out the rules myself. By the way, the rules hold even when these are non-differentiable. You donft need any derivatives. This is just to check. And lets actually ? maybe we can make up a new rule, letfs make one up for fun, ready? All right, letfs see, letfs try here, lets say that G ? Ifm not gonna remember what Ifm saying, so Ifm gonna write it down. Suppose I told you ? I have to get this right ? suppose I told you that G prime is positive. So G is going to be increasing, and I told you ? do I even need G prime? No, sorry. I take that back. Ifm going to tell you that G is concave. So that means this here, and lets make ? lets see if I can get it right, and lets ? let me ask, what are the conditions on H to make F concave? Does that make sense? So what we want is ? wefre gonna say that thatfs ? wefre gonna assume that thatfs less or equal to zero, and I need thing to be less than or equal to zero. Well that means this has to be positive. So I have to have H prime positive. So H has to be increasing. And then I look over here, thatfs positive no matter what, so I need this to be less than or equal to zero, and I just made up my very own new composition rule, and itfs this ? I hope I get this right. Okay, let me go very slowly, if H is concave and increasing and G is concave ? okay, ready, so a concave increasing function of a concave function is concave. Isnft that right? Anyway if I got it wrong, I donft care, you get the principle. Okay, itfs a 300 level class, I can mess up minus signs all I like, thatfs your job to fix them and stuff like that. So I think I said that right. So letfs look at some example ? by the way, the only one I actually remember, to be honest with you is a convex increasing function of a convex function is convex. Then I have ? I remember one other thing, there are composition rules and then I have a note attached to that, therefs lots of them and they get very confusing, although there are people, Ifve noticed who just know them immediately. Theyfll say, oh yeah, not thatfs a concave increasing function ? no, decreasing function of a convex function, thatfs con ? something or another and Ifm like, really? I donft know. Then they have some internal pneumonic or something for doing this, but I donft know them just to tell you the truth. Letfs look at some examples; the exptH of a convex function is convex. And thatfs by the way is very interesting. It says that the exponential actually can only ? it preserves positive of a curvature. If a function curves up, [inaudible] also curves up. Thatfs what itfs saying geometrically. Inverse is interesting. It says the inverse of ? itfs not the inverse, itfs the reciprocal, there you go, thatfs the English word, the reciprocal of a positive concave function is convex. So, for example, letfs see if I can get an example of that. One over the square root is a ? that would be a convex function, and indeed it is, itfs X minus one half, and it goes kinda like that ? sorry, for you, like that. Actually, by the way, thatfs fine because flipping the axis doesnft change the curvature property, so I donft have to draw it for you. If it looks convex for me it looks convex for you. So thatfs the composition one. There are then vector compositions ones and Ifll say ? Ifll give some examples here. So here you have vector composition, so here I have a function of ? a multi-argument function of a bunch of other functions. And now, of course, the possibilities just explode because you have horrible things where you have a function thatfs sort of increasing in some components, decreasing in others, the arguments themselves are either convex, concave, and it gets very complicated. But here you have something like this; X is convex if all of the functions are convex, thatfs the easy case, and H is convex and decreasing in each argument ? sorry, non-decreasing, roughly speaking increasing. By the way, therefs one subtlety here that I want to point out, there a tilde here. Tilde is the extended value extension. And Ifm not going to spend time in lecture going over this, but you want to read that part of the book at some point about this tilde because thatfs not a typo, itfs very important. Let me give a couple of examples here. By the way, what this shows is that actually the earlier rules that you learned are actually ? they can be derived from this. Let me give an example; how about the sum of convex functions? So herefs a function H, H of Z is one transpose Z, itfs the sum of the Zfs. Okay? This function is ? well letfs work it out. It is certainly convex, right, in Z? Itfs also increasing in each argument, do you agree with that, because it just sums the I. So itfs obviously increasing in each argument. Therefore, by this composition rule, it says that if I compose this with a bunch of functions, each of which is convex, the result is convex. So I ? so from this rule Ifve rederived the simpler rule which is the sum of convex functions is convex, everybody see that? Letfs try one more. Lets try H of Z is the max of Z. Well the max function is itself convex. Itfs also ? itfs increasing in each argument, I mean thatfs clear, right? If you increase any element of a vector the max doesnft go down, so itfs non-decreasing. Therefore, this subsumes actually several of the earlier ones. Actually it turns out therefs only two rules. Therefs this rule ? in fact there are only two rules, therefs this rule and therefs the affine pre-composition. But itfs good for a human being at least to think of them as eight or 10 rules or something. Quick question? [Student:]Increasing [inaudible] do you mean also that when you ? that they increase from vector to vector, or just that ? [Crosstalk] Nope, I do not mean that. I mean this ? I mean, hold all element fixed except one. Increase one the function cannot go down. Thatfs what non-decreasing in each argument means. Okay, so I think wefll quick ? this covers, these are the basic rules. By the way, this was very fast, [inaudible] wefll cement this, donft worry, and wefll continue next time. And then maybe even by next week itfll get interesting. Duration: 77 minutes