Here we go. Hey, everyone, welcome. I actually have two handouts for you today. They're posted online, and we'll distribute them during lecture, just right now as I'm starting. One of them is tomorrow's section handout. Its focus is pretty much on the assembly code generation I was talking about last Monday, Wednesday, and Friday. The second of the two handouts is Assignment 5. I was gonna hand it out on Wednesday. Then I said, "You know what? I'll hand it out a little bit earlier," because I just want to afford everyone the flexibility to work on these problems when they have time. You don't have to hand anything in for Assignment 5. It is a written problem set. There are no programming exercises whatsoever. It's just lots and lots of practice with this code generation stuff that we're doing in section tomorrow, but you'll also certainly see C code generation on the midterm next Wednesday evening. So the only deadline I'm really imposing on Assignment 5 is that you actually do the problems and make sure your answers are consistent with mine. And I say it that way because it doesn't have to be exact on an instruction-by-instruction basis, but you have to make sure that your code dereferences things the right number of times and loads things the right number of times in order to feel comfortable with that material, because it definitely will appear on the midterm I give next Wednesday evening at 7:00 p.m., okay.
I don't know where the midterm is gonna be yet. I'll probably figure that out in the next two days. This Wednesday I will certainly give out a practice midterm and a practice solution so that you have some fodder to play with over the course of the next week. But all of the section handout problems, if there were 20 problems on those section handouts, 19 of them came from old practice midterms. So definitely make sure you understand those.
You're welcome to bring in any lecture notes, any of your assignment printouts, whatever you want to bring in. You can bring in textbooks. I don't see the value of it, since everything that you're really responsible for has been covered either in lecture or in handouts. So that's that. I know Assignment 4 is due this Thursday. I think people have started it, and they are true believers when I say that it is probably the most difficult of the four you've seen all quarter. So start that soon, even if for no other reason than just doing a small component of it tonight so you know what you're up against with this Thursday deadline, okay.
When I left you last time, I had just started talking about the C preprocessor. I want to talk about preprocessing versus compilation versus linking. You're used to, from at least CS106 memories, it all being the same thing. You pressed a command key combination or you did a drop-down and clicked Build, and all of a sudden this double-clickable app was created. That's because it does these three things in sequence behind the scenes, and it doesn't very clearly advertise whether something in preprocessing or compilation or linking broke down. You don't necessarily know the difference. So I want to focus on the differences and tell you what each phase is responsible for.
And when I left you last time, I had just introduced the notion of a #define, and I advertised it quite clearly as something that was no more sophisticated than glorified search and replace of this token with that text right there. So if I do this, #define kWidth 40, and kHeight, whoops, and I say that this is 80, then anywhere kWidth and kHeight appear beyond these two lines, it actually substitutes 40 for kWidth and 80 for kHeight. The only exception is it won't do that in string constants, but it'll even do it in future #defines.
So if I were to do something like this, kPerimeter, and I equated it with 2 * (kWidth + kHeight), then this would not only substitute anything down here, but it really would replace that kWidth with a 40, and this kHeight right there would be replaced with an 80. So by the time you got around to the definition of kPerimeter, it would see this not as this token stream, but as two times open paren, 40 + 80, close paren, okay. It doesn't evaluate it. It doesn't even recognize that they're integers. It just looks at it as plain text, but the substitution of this there and this there is exactly what you do want. #defines are really nothing more than glorified search and replace. We use them in C, pure C, to consolidate what would otherwise be magic numbers and magic string constants, and attach meaningful names to them.
What you may not know about #defines is that there is an extension to the #define, and you can actually pass arguments to #defines as if they're functions. They're not called functions. They're called macros, so I could do something like this: #define MAX(a, b). As long as there's no space between that open paren and the final character of the token right there, it's clearly understood to be a little bit more than just a #define constant. It's a #define expression that's parameterized on whatever a and b adopt in context when they're used later on, okay. So if you want some quick and dirty way to find the larger of two numbers, you could substitute it with this: (((a) > (b)) ? (a) : (b)). And just to be clear about order of operations and evaluation of everything, we usually see an intense number of parentheses put around these things, just so that there's absolutely no ambiguity as to how things should be evaluated if this thing is just plopped in context somewhere later on, okay. Order of operations might otherwise confuse things. You'll actually see an example of that in a second.
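The constant substitution described above can be sketched in a few lines. The names kWidth, kHeight, and kPerimeter follow the lecture; the two helper functions, perimeter and label, are hypothetical and exist only so the result of the substitution can be observed.

```c
#define kWidth 40
#define kHeight 80
/* Expands textually to 2 * (40 + 80); the preprocessor never evaluates it. */
#define kPerimeter 2 * (kWidth + kHeight)

/* Hypothetical helper: after preprocessing, the body reads
   `return 2 * (40 + 80);` and the compiler does the arithmetic. */
int perimeter(void) {
    return kPerimeter;
}

/* Hypothetical helper: tokens inside string constants are left alone,
   so this still returns the text kWidth, not 40. */
const char *label(void) {
    return "kWidth";
}
```

Note that kPerimeter is defined before any code uses it, but the substitution of kWidth and kHeight inside it only happens when kPerimeter itself is expanded at the point of use.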
Anywhere you see this later on, you wouldn't type it in this way, but just pretend that you did. If I, for whatever reason, needed it to tell me that 40 was, in fact, greater than 10, when I see MAX(10, 40) in code later on, it really will go and find the MAX symbol, and every place that there was an a, it will place a 10. Every place there was a b, it will place a 40. So this would, during preprocessing, be replaced by 10 greater than 40, if true, 10, otherwise 40. And even though that's an obtuse way of identifying that 40 is greater than 10, that is the textual substitution you would get in response to that, okay. So it's like a #define. It's this quick and dirty way to inline functionality that's otherwise complicated with something that's a little bit more readable.
You could, of course, go with a function, but you already know from the assembly code you saw last week regarding function call and return that a lot of time is spent setting up parameters, writing the parameters there, jumping to the function, and then after it's all over, jumping back and cleaning up the parameters. It's not that much work. It may be ten assembly code instructions, but this is the type of thing that would expand to like three or four assembly code instructions. So the entire effort of determining a maximum using just traditional function call and return would spend 70 percent of its time, or something about that percentage, just calling and returning from the function. Do you understand what I mean when I say that? Okay, using this #define thing, this is this very efficient way of jamming in an expansion of this every place MAX with two parameters is actually used.
Now it doesn't actually require that a and b be integers. I mean, of course, we know to look at them that they should be integers, but if I were to do this, get rid of that. If I were to be senseless and do something like this, MAX(40.2, "hello"), this would eventually cause problems.
But as far as preprocessing is concerned, all it would do here is templatized search and replace. Every place there's an a there, you'd see a 40.2, and every place there's a b, you'd see "hello" as a string constant. And only during compilation, when it reads the expansion of this as if we typed it in that way, will it say, "You know what? I don't like you comparing doubles to char *s using a greater-than sign." Okay, so you would get an error eventually, but you wouldn't get it via the preprocessor. Do you understand what I mean when I say that? Yep.
[Student:]Is there a good or bad style to do something like that?
Well, in something like this?
[Student:]Yeah.
I actually don't see the problem with this as long as you have been doing it for more than a few days. I'll show you an example of two #define macros that we used in Assignment 3, one of which you didn't even know you were using, and the other one is in my solution, okay. This is obviously a hack just to introduce a point, that preprocessing is still just text search and replace, and that the problems it leads to later on might be tracked and flagged in compilation, or they might be flagged when you get a (inaudible) at 4:00 in the morning. Okay, you just don't know. There was a question right here.
[Student:]Do you receive from this (inaudible)?
The question is do you receive anything. It doesn't receive in the sense of a return value, but this is an expression that evaluates to either the result of evaluating a or evaluating b. So this one, before I crossed it out, this would evaluate to the number 40. So if I wanted to, I could do this: int max equals MAX, all caps, of, let's say, fibonacci of 100 and factorial of 4,000, and I'm curious as to which one's bigger. There's actually a problem with that that I'll outline in a second, but that would really bind max to the larger of those two values, okay.
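Here is a small sketch of the MAX macro discussed above, used as an expression with two different operand types. MAX mirrors the lecture's macro; UNSAFE_SQUARE, SAFE_SQUARE, and the four wrapper functions are hypothetical additions that illustrate why the parentheses matter.

```c
/* The lecture's macro: an expression, not a function, so it has no
   parameter types of its own. */
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Hypothetical pair showing what the parentheses protect against. */
#define UNSAFE_SQUARE(x) x * x
#define SAFE_SQUARE(x) ((x) * (x))

/* The same macro text serves ints and doubles, because type checking
   happens only when the compiler sees the expanded expression. */
int larger_int(int a, int b) { return MAX(a, b); }
double larger_double(double a, double b) { return MAX(a, b); }

/* Expands to 1 + 2 * 1 + 2, which the compiler reads as 1 + (2 * 1) + 2. */
int unsafe_demo(void) { return UNSAFE_SQUARE(1 + 2); }

/* Expands to ((1 + 2) * (1 + 2)). */
int safe_demo(void) { return SAFE_SQUARE(1 + 2); }
```

Something like MAX(40.2, "hello") would pass through the preprocessor just as happily; the complaint would come from the compiler when it sees a double compared against a char *.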
It's interesting, there's something about that call that I don't like, but I'll explain that in a second. Let me just show you some reasonable uses of #defines. We'll be more sensible. Do you know how in Assignment 3 there were some situations where you wanted the assert condition to be greater than or equal to zero and less than the logical length, and in other ones you wanted it to be less than or equal to the logical length? And depending on how aggressively you reused and called VectorNth yourself, there may have been situations where you were blocked out by the assert statement that sat at the top of the implementation of VectorNth. For VectorAppend or VectorInsert, the logical length is a completely reasonable parameter to accept, but if you called VectorNth using that value and you had the right assert statement inside, it would actually block you out and error out and end the program. Do you understand what I mean when I say that?
Well, what I did, rather than writing a function that computed the address of the nth element in a blob of memory, I wrote it as a #define macro. I just did this. #define, I called it NthElemAddr, and I framed it in terms of base and elemSize and index, okay. And I equated that, not all on the same line. You can actually do that, and it'll allow you to continue the definition on the next line. I equated it with this, like that, okay. I could have written it as a function. The reason I wrote it as a separate thing altogether is because I wanted something that did the pointer arithmetic for me without the asserts. I wanted the control to go and get the millionth element, even if I were dealing with an array of length two, but I would actually call this from within VectorNth after I've done the asserts. Does that make sense to people? And so this way I had this quick and dirty way of actually doing this type of pointer arithmetic just once, studying it and saying, "Okay.
This needs to be careful code because it's the type of code that can go wrong if you're not careful about it." Make sure that this is doing exactly what I want it to do, and then call this everywhere. I see a lot of people do the pointer arithmetic like seven or eight times in vector.c, okay. And if you're cutting and pasting it, that's not great, okay. If you're cutting and pasting and you got it right the first time, it's probably okay, but I'd much rather see people consolidate this into either a helper function or, now that we know it, a little macro that jams this calculation into the code for me even though it looks like a function, okay. Does that make sense? There's no asserting going on here whatsoever, so I can get the asserts right, and rather than calling VectorNth everywhere, I can just call NthElemAddr, okay, wherever I would otherwise call VectorNth internally. So I never have to worry about whether the off-by-one nature of what VectorNth allows in terms of incoming values will block me out accidentally, okay. Make sense?
Now the thing about this is that it looks like a function call. There is really no type checking done on these things right here, so this only works post-preprocessor time if this gets specified to be a pointer, and these are things that can be multiplied together and ultimately be treated as an integer offset, okay. You usually do get that right, but it's not as good as a true function in that regard, because the preprocessor doesn't do error checking at all. It does push the expansion to the compilation phase, where error checking is done, okay. You usually don't like the separation of code generation, or I'm sorry, let's say C code generation, from the actual type checking, but you just deal with it with #defines. Question over there?
[Student:]Well, with the (inaudible) equal to or as integers?
You could certainly. When I used this, I implemented void * VectorNth, which took a vector * v and an int.
I think it was called position. And as it turns out, it was two lines long. I had the assert position greater than or equal to zero. I had the assert, I actually had these on one line, but I'm just making it clear, position is less than v->logLength, spell it right, logLength. And then right at the end, I said return NthElemAddr, where I passed in v->elems, v->elemSize, and position. So in response to this macro call right there, it's not really a call. That's kind of the wrong word. It's just the placement of a macro so that it expands during preprocessor time to that, as if we typed it in ourselves that way, okay. And as an expression, it evaluates presumably to the right address, so that's what gets returned, okay. That make sense? Okay.
There are some drawbacks to this. It's quite clear that that's a macro because I put it right there. What you may not have known is that these asserts right there are also macros, okay, and I'll show you what they look like in a second. They're a little weirder, but nonetheless, they are in fact macros, and that's how they can be stripped out using some compilation flags so that they're not present in the final executable that you ship as a product. Somebody had a question, yeah.
[Student:]Quickly, that's returning a void *, but the top was char * (inaudible)?
Actually, in pure C it's not a problem. Actually, in neither language is it a problem, because remember void * is like the all-accepting pointer, so that's what you're doing when you take something of type char *, which is what this becomes, and you return and funnel it through a void *. You're doing what's called upcasting. You're just going from a more specifically typed pointer to something more generic, and it just knows that there's no danger in that direction.
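The macro and the two-line VectorNth described above might look roughly like the following. This is a sketch rather than the lecturer's actual solution: the struct layout and field names (elems, elemSize, logLength) are assumptions pieced together from the spoken description, and the macro body is a reconstruction of the usual byte-offset arithmetic.

```c
#include <assert.h>

/* Assumed vector layout; the real Assignment 3 struct may differ. */
typedef struct {
    void *elems;     /* base address of the element storage */
    int elemSize;    /* size of one element, in bytes */
    int logLength;   /* number of elements currently in use */
} vector;

/* Pointer arithmetic with no bounds checking. The trailing backslash
   continues the definition onto the next line. */
#define NthElemAddr(base, elemSize, index) \
    ((char *)(base) + (index) * (elemSize))

/* The asserts live here, so internal code that legitimately needs the
   address one past the end can use NthElemAddr directly instead. */
void *VectorNth(const vector *v, int position) {
    assert(position >= 0);
    assert(position < v->logLength);
    return NthElemAddr(v->elems, v->elemSize, position);
}
```

The macro expands to a char * expression, and returning it through a void * needs no cast, which is the upcasting point raised in the question above.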
It's when you downcast and you say, "I have this generic pointer, but now I'm claiming that it was really this very specific pointer all along," that you really do need a cast in many situations, certainly if you have references involved, okay. Other questions? Okay. So the problem with this that I did outline is that you don't get type checking at all during the preprocessing phase. There are other problems associated with this, but let me talk about what assert looks like.
I imagine 80 to 90 percent of you have actually seen an assert statement fail, and you've seen what happens when the condition that's passed to assert isn't met. The assert.h file defines assert. It doesn't define it as a function. It looks like a function call, but it really is this. #define assert, and I'll just put cond, and it's equated with this. It actually evaluates cond, okay. And if the condition is true, you know that it just returns in the functional sense, although it really is not a function call. When this passes, it just basically evaluates to a no-op and doesn't do anything, and just continues to the line after the assert. It needs to have at least one expression in the, sort of, if region of this ternary thing. So it just casts zero to be a void, just to say, "Okay, don't do anything with this zero. Don't allow it to be assigned. It just has to be present to sit in between the question mark and the colon." This right here is some elaborate thing. That prints out, to standard error, some string that involves the filename and the line number of this assert in the original file, followed by an exit. Actually, it doesn't have this right there, okay. So you may not understand the syntax and how everything is exactly relevant to the implementation of assert, but you know that this looks harmless and this looks pretty drastic, okay.
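The ternary shape described above can be sketched like this. It is not the text of any real assert.h: MY_ASSERT is a hypothetical name, the message format is made up, and the failing branch here simply prints and calls exit, as in the spoken description. checked_index is a hypothetical caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* If cond holds, the whole thing is the do-nothing expression (void)0.
   Otherwise it reports the failing condition, file, and line to
   standard error and ends the program. #cond turns the condition's
   text into a string; __FILE__ and __LINE__ are standard macros. */
#define MY_ASSERT(cond) \
    ((cond) ? (void)0 \
            : (fprintf(stderr, "Assertion failed: %s, file %s, line %d\n", \
                       #cond, __FILE__, __LINE__), \
               exit(1)))

/* Hypothetical caller: both checks expand in place during preprocessing. */
int checked_index(int position, int length) {
    MY_ASSERT(position >= 0);
    MY_ASSERT(position < length);
    return position;
}
```

Because the check is a macro, __LINE__ reports the line where MY_ASSERT was written in the caller, which a function-based check could not do without extra parameters.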
So whenever you put assert position is greater than or equal to zero in your code, what you're really asking the preprocessor to do for you is say, "Yeah, take this assert position greater than or equal to zero, and replace it with: position greater than or equal to zero? Oh, awesome. Don't do anything. Otherwise end my program and tell me what line this thing failed at." Does that make sense? Okay.
The actual full definition is this. #ifdef NDEBUG, that's kind of like a #define, but it's an if question about the presence of a #define. If that's the case, then #define assert of cond to just be a no-op. Otherwise, whoops, #else rather, we'll do this. So this is the thing you're using in Assignment 3, and this is the way it's really turned on. If you define a #define constant prior to this called NDEBUG, for no debugging, then it replaces all of your assert calls with this harmless statement right there. Okay, so it technically is one statement, and this zero compiles to just one line of assembly that's optimized down to zero lines of assembly. But that's how the asserts go away when you compile it a different way, so that there's no danger of asserts actually failing on your behalf in production code. Does that make sense to people? Okay.
There are some problems with the definition of assert, well, not really. I actually want to go back and revisit this line right here, and in particular that particular use of MAX, and start to show you the drawbacks of the preprocessor. And this is actually related to why I prefer static const globals as opposed to #define constants, because I'm trying to get you away from the preprocessor to the degree you can. This right here, it is so literal about a textual search and replace that it will call one of these things once, and the other one, the larger of the two, which might quite arguably be the more time-consuming one, it will call twice.
Why is that the case? Because of that #define definition for MAX over there, this right here expands to: max is equal to, is it the case that fibonacci of 100 is greater than factorial of 4,000? If so, then return fibonacci of 100, else return factorial of 4,000. Okay, and then there would be that right there. Okay, that's how literal the text search and replace is. And so you actually get two imprints of what is a very time-consuming function. Factorial isn't quite as bad because it's a linear recursion. But if it turns out that fibonacci of 100 is greater than this right here, it's gonna take not only a long time, but twice as long as a long time, okay, because of the second call right here. Make sense? So it doesn't actually cache results internally. It's not clever at all. It assumes that you really meant to type it this way because of the way you framed the definition of the #define, okay.
Even so, even if it's kind of stupid from an efficiency standpoint, at least it's correct. Clever C programmers at one point go through this phase where they try to do as much in a single statement as possible, and so they might want to figure out the larger of two variables and simultaneously increment the two variables, so they'll do something like this: MAX(m++, n++). Oh, yeah. I want to know the larger between m and n, but I also want to increment both of them at the same time, okay. It will actually commit to a ++ on the smaller one just once, and will commit to a ++ on the larger one twice, because of the way it expands it. This would be replaced at preprocessor time with this. Is m++ greater than n++? Oh, and by the way, increment them. Whoops, oh, it is, okay. Well, then return the value and then increment it, otherwise return the other element and increment it. So you certainly see that ++ is being levied a total of three times, okay. That make sense?
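Both drawbacks described above can be observed directly. In this sketch, MAX is the lecture's macro; costly, count_calls, and increment_demo are hypothetical helpers, with costly standing in for an expensive function such as fibonacci.

```c
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

static int calls = 0;

/* Hypothetical stand-in for an expensive function; it counts how many
   times it has been invoked. */
int costly(int x) {
    calls++;
    return x;
}

/* MAX(costly(3), costly(7)) expands to
   (((costly(3)) > (costly(7))) ? (costly(3)) : (costly(7))),
   so the winning argument is evaluated a second time. */
int count_calls(void) {
    int larger;
    calls = 0;
    larger = MAX(costly(3), costly(7));
    (void)larger;
    return calls;
}

/* MAX(m++, n++) expands to (((m++) > (n++)) ? (m++) : (n++)). Starting
   from m = 5 and n = 10: the comparison bumps both, then the chosen
   branch bumps n again and yields its already-incremented value. */
void increment_demo(int *m_out, int *n_out, int *result_out) {
    int m = 5, n = 10;
    int result = MAX(m++, n++);
    *m_out = m;
    *n_out = n;
    *result_out = result;
}
```

With these starting values, count_calls reports three invocations rather than two, and increment_demo leaves m at 6, n at 12, and the result at 11, one more than the true larger value. A plain function such as `int max(int a, int b)` would evaluate each argument exactly once.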
It'll return one more than the true larger value, and it'll also promote the larger value twice as opposed to once, okay. Now you could argue that these are moronic examples because people wouldn't do this in practice, but you could also argue that the language should be sophisticated enough that it just doesn't allow people to do these types of things, because if it does happen, maybe it happens one day out of 300, once a year, but you could very easily spend four to eight hours just trying to figure out why this one little line isn't working properly, okay. When those types of things are allowed to happen, you have to somewhat blame the language. You certainly can blame the language as opposed to the programmer if other languages wouldn't have allowed something like this to happen, okay. So as we get to be better programmers, we'll start to be more opinionated about how good the languages themselves are, and how they allow us to quickly get to a final product, making it as easy in the process as possible.
Okay, C is really working against you in a lot of ways, okay. It was invented in like the late 60s, early 70s. It came into fashion. The spirit of programming then was let me do whatever I want, man, and so you can get down in the hardware. And it wasn't as problematic then, because think about how small code bases were in 1965. You can't even think about that, because I wasn't even born yet, much less you. But you're dealing with programs, except for operating systems. Unix was being written in the late 60s and early 70s, maybe a little bit earlier than that, but most programs were like Pong and maybe like miniature golf with the most ridiculous paddle, club and ball, that you can imagine, just really, really simple programs that had to fit in 64K of memory or 16K of memory. There just couldn't be that many lines of code. That just means programs were more manageable then. Now you're dealing with code bases.
I can't even imagine how many lines of code exist behind Google walls, behind Microsoft walls. We're talking millions, probably tens of millions of lines of code, probably more than that. I have no idea, okay, but we're at an order of magnitude where the exponent is six or seven, okay, very, very large. If you're wading through that much code, you don't want to have to say, "God, this," you don't want to have to look for a problem like that and do a binary search on 10 million files to figure out what the problem is. You want it to be very, very likely that you'd get something right the very first time you type it, and that is unlikely in C++. You're all learning that right now, okay. Does this make sense to people? Okay.
There are other aspects of the preprocessor I should talk about. I think I've hit on everything with regard to #define. There's also the #include. When you do this, #include, I'll do assert.h. I'll do one above it, #include, let's do stdio.h. That's for printf and scanf and things like that. You know about assert.h, and then you also do this, #include "vector.h" in double quotes, and you saw things like how to include genlib.h and simpio.h in CS106. I don't know whether anyone ever answered the angle bracket versus the double quotes thing, or whether you just say, "Oh, I have no idea, but I'll just do it because it works if I do it that way." Whenever you use angle brackets, the less-than and greater-than signs, to delineate the name of the .h file, it's taken by the preprocessor to mean, oh, that's a system header file, so that should ship with the compiler, so I should look in one place by default for those files. But when it's in double quotes, it assumes that it is a client-written .h file, so it looks in the current working directory by default.
There are flags you can pass to gcc via the Makefile to tell it other places where the #include files might live, but by default the angle brackets mean /usr/local/include and /usr/include, which you've never looked at before, but they exist. The double quotes mean, at least in our world, just looking at the current working directory where you're compiling, and that's probably where they are, okay. Make sense?
Another thing you might not know about these things is that, just like #defines, in many ways these are instructions to search and replace this line with something else. This one's easier to deal with because you have a sense of what vector.h looks like. What this does, when the preprocessor folds over that line, is say, "Oh, #include vector.h in double quotes. Let me go find it. Oh, I found it." It removes that line right there, and it replaces it with the full contents of the vector.h file. Does that make sense to people? And so the stream of text that it builds for you as part of preprocessing, the output of preprocessing, is what's called a translation unit, where all the #defines and all the #includes have been stripped out. It creates the text that's actually fed to the compiler. On behalf of this line right here, it would replace it with the contents of vector.h as if you'd typed it in by hand there, okay. Does that make sense?
Now you say, "Well, why don't I just type in all the prototypes every single time at the top?" You want to consolidate all the prototypes in one file so that everyone agrees consistently on how all those functions should be called. But if you wanted to, you could just get rid of this, and if you're only gonna use one or two of the functions, you can manually prototype them right there. And as long as it's consistent with the real prototypes that exist in the .h file, it wouldn't cause any problems, okay. The #include process is recursive.
So if you #include a file that itself has #includes, it will keep on doing it until it just bottoms out, okay. It does basically this recursive depth-first search. It's like the random sentence generator without any random numbers, okay, where it builds a full stream of text out of all the #include files until it just has one stream of non-#include and non-#define oriented text that gets fed to the compiler, okay. Does that make sense? Okay. So there's that.
If you want to experiment and you want to see what the product of just preprocessing is, what happens when just the #includes and the #defines are stripped out, go create like a three-line file with two #define constants, and just #include a .h file that you write yourself. Don't #include any system headers, because then the output is really, really long. But if you want to do this, gcc, you're used to seeing something like gcc -c and the name of a file, like let's say vector.c or something like that. You haven't typed that in yourself, but you see that published to the screen every time you type make, for Assignments 3 and 4. Well, -c means compile, but don't try to build an executable. There's actually something a little more drastic, -E. What that means is run the preprocessor and output the result of preprocessing, but don't go further than that. So that means if you look at this file, you'll have some sense as to what it should look like. You certainly know what it looks like before preprocessing. All of the components that make up this file and vector.h and anything that vector.h #includes will be spliced in sequence to build one big translation unit, okay, with all the prototypes that are in vector.h and all the implementations that are in vector.c, and that's what goes to the compiler itself, okay. Make sense?
Okay, what happens if vector.h #includes, oh, I'm sorry. You know hashset.h #includes vector.h.
Suppose I were airheaded and I said, "Oh, you know, I think that vector.h should also #include hashset.h." If the preprocessor weren't very smart, and you also didn't have the power to prevent this, you could get circular inclusions. Oh, I better include that. Well, I have to include that. Oh, I better include that. It just could go back and forth forever. The preprocessor people solved this problem a while ago. We're not the first people to accidentally do that, but you've also seen things like this. #ifndef, something like _vector_h_, then go ahead and #define it, and then list all the prototypes that come in vector.h, and then mark the end of the region with #endif.
The very first time that vector.h gets #included, and presumably this is the contents of that vector.h file, as the preprocessor folds over it, it looks at this and goes, "Oh, have I not seen this little token before?" And if it hasn't, it's like, "Okay, well, then I guess this is safe to do." It'll come down here and define exactly the same thing. You don't have to associate anything with this key right here. It's just basically like a valueless key in a hashset behind the scenes. But as long as it's defined, then if for whatever reason this #includes either itself directly or something that would #include vector.h, the second time the preprocessor tries to digest it as part of the generation of the translation unit, it'll come here and say, "Oh, is this not defined?" No, actually it is defined, for reasons that may not be clear to me, but I defined it earlier apparently, so it'll circumvent all this and put an end to the vicious cycle, okay. Make sense? Question over there?
[Student:]Yeah, just a question. The reason why you don't want to include CPP files, is it for that very reason?
No, actually that's a slightly different reason.
All the .h files, they declare prototypes, but nothing in .h files ever has any code emitted on its behalf. Like you declare structs, but it doesn't actually generate code in response to that. You're not supposed to declare storage for anything in .h files, except occasionally as a very clever way of declaring a shared global variable, okay. But the .c files and the .cc files, they actually define global variables and global functions and class methods and things like that, things that really do translate to zeros and ones in the form of machine code, but we view them as like M[R1] = R3 + 12 or something like that, okay. But .h files are supposed to be just about definitions that have no code generation associated with them, so that you can read them multiple times.
Like how many files are there for Assignment 4, and every single one of them probably #includes vector.h, right? If they all #included vector.c, then they would all be defining VectorNew and VectorDispose, and so when the time came to build RSS news search as an executable, you'd have like three or four implementations of the same function. Does that make sense? Declaring the prototype for a function is very different from actually defining the function. One has code emission associated with it; the compilation actually generates code on behalf of the implementation. It doesn't do anything on behalf of the prototypes, okay.
[Student:](Inaudible).
Yeah, absolutely. You're not required to do this. You just try to choose tokens that are very, very, very unlikely to come up anywhere else, okay. I mean this might be what you choose every time you have a vector.h file, but presumably you only have one vector.h file, which means you'd only have one token defined like this. And when you use normal #defines, you just avoid the leading underscores and the trailing underscores, okay. Does that all make sense? Okay.
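The guard pattern and the declaration-versus-definition distinction above can be seen in a single file by pasting a header's contents in twice, which is what repeated #includes amount to after preprocessing. Everything here is hypothetical: the header point.h, the guard token POINT_H_SKETCH, and the function point_sum are invented for illustration.

```c
/* ---- first copy: what `#include "point.h"` would paste in ---- */
#ifndef POINT_H_SKETCH
#define POINT_H_SKETCH                  /* a valueless marker token */
struct point { int x; int y; };         /* declaration: emits no code */
int point_sum(const struct point *p);   /* prototype: emits no code */
#endif

/* ---- second copy: the same header arriving again indirectly ---- */
#ifndef POINT_H_SKETCH                  /* already defined, so skip to #endif */
#define POINT_H_SKETCH
struct point { int x; int y; };         /* would be a redefinition error */
int point_sum(const struct point *p);
#endif

/* ---- the .c file's own contribution: a definition, which does emit
        code and must therefore appear in exactly one place ---- */
int point_sum(const struct point *p) {
    return p->x + p->y;
}
```

Deleting the two guard lines from the second copy makes the compiler reject the repeated struct definition, while the repeated prototype alone would have been harmless.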
So if you get a chance, it takes you all of 15 seconds to do this. Just type in by hand GCC space dash capital E, and then the name of some dot C file in the directory where you happen to be, okay. You'll see tons and tons of stuff, but toward the end you'll see familiar code. You'll see the vector dot C code you wrote at the end of it, but at the top, all the prototypes and the stuff inside the dot H files that happen to be pound included by vector dot H, okay, and also by vector dot C for that matter. Question in the back? [Student:] Yes. So you said that that's the way they handle circular inclusion? That's one of the ways. That's the ANSI standard way of doing so, yes. [Student:] So my question was, if that was not included, what did you say? Most preprocessors are smart enough that they don't commit to circular recursion just because you're not telling them not to. Most of them are very smart and they just keep track of it. And I think by protocol it understands that there's no value in ever pound including something twice, but earlier implementations of preprocessors weren't interested in solving every single problem that might come up. I don't want to say it's an edge case. It's probably a very common case, but in theory you don't want to just assume that the preprocessor does the right thing, so you want to make sure it couldn't possibly fail you or infinitely recurse and loop forever, even if you're using some dummy implementation of the preprocessor, okay. Some compilers have their own versions of this. Ten years ago I saw a preprocessor directive called pragma, and it had this optional word over here called once. That was just a more condensed version of trying to do exactly the same thing here without having to invent these names. This doesn't exist everywhere, and it's certainly not ANSI standard. It used to exist in CodeWarrior, and I don't even see it in CodeWarrior anymore. 
But different preprocessors can do whatever they want to extend the standard preprocessor directives. You should just concern yourself with pound define, and if you want, the if not defines and if defines and the elses, but really just worry about pound define and pound include. And if you know what those are doing at preprocessor time, then you're certainly walking away with a good amount of information, okay. So there's that. Let me draw some pictures so you'll have something to write down. So this is vector dot C, and it has this as a code base in it. And it has this file, this file, and this file pound included at the front of it. Let's just say that this is A dot H, and B dot H, and C dot H. I know that's small, but you can name them anything you want to, okay. You know that it'll go and find the contents of A dot H and B dot H and C dot H, and as part of preprocessing, if the contents of A dot H happen to be that, and the contents of B dot H happen to be this, and the contents of C dot H happen to be this, it really will build a stream of text that's consistent with all these stacked emoticons. This is the stream of text it would build in memory, and the noseless smiley face would be at the bottom. And that stream of text would be passed on to the true compilation phase, okay. Everything that resides in here is still supposed to be legal C. It was just spread among multiple files at this level so that things like prototypes and struct definitions and class definitions and pound define macros and constants could all be consolidated in one place. You're familiar with that concept, right, written once, used from everywhere, okay. Well, if you let it go further, it will now compile, okay, where it will take this stream of text as if you had typed it in character by character this way, compile it, and emit assembly code on your behalf, and as long as there are no errors, it'll build the dot O file, okay. 
As soon as it finds one error, it'll say, "Oop, an error." And you probably remember the C++ compilers from Xcode and from Visual Studio C++. When one gives you an error, it gives you a lot of them, and it goes on for pages and pages and pages. You can suppress that. You can tell it to stop after one error if you want to, but assuming that everything compiles cleanly, this by default would generate a vector dot O file, okay. And you've seen these dot O files pop up in your directories. This would have all these assembly code statements. If it were compiling to CS107 assembly, you might see things like M of R 1 is equal to SP, things like that, the things that actually emulate the implementations of all of the functions that happen to exist in this translation unit, okay. Does that make sense, everybody? Okay. So what I want to do is talk about compilation and linking kind of simultaneously. And I'm just gonna go through one example. I don't want to say it's an easy example. It's actually quite sophisticated, but it's a short program and I can just talk about what happens, and then talk about what happens when you start to remove pound include statements, okay. Now I am being GCC specific in my discussion of compilation. I'm just doing so because GCC will probably become the most important compiler to you, at least at Stanford, if you're programming in C++, okay. Let me just give you a sense as to what the dot O file would look like in response to this dot C file. Let me just write this file called main dot C. It's gonna be a full program. It's not gonna do anything, but it's gonna be legal C code, and it's gonna call some functions. I am going to pound include STDIO dot H. The only thing that's relevant is that it declares the print F function, okay. I'm also gonna pound include STDLIB dot H, with the L right there. This is gonna declare malloc and free. It also declares realloc, but I'm not gonna call realloc. 
And I'm also gonna pound include assert dot H (that's an H, not an N), and this is the program. Int main, int arg C, char star arg V, it's an array. And I'm just gonna do this. It's like four or five lines. Void star memory is equal to malloc of 400. I'm going to assert that memory is not equal to null. I'm going to print F (inaudible) because if I've gotten this far, then I know that I got real memory, and I'm gonna celebrate by printing it. So this is in place just to demonstrate exactly what compilation does. Now pretend we're in a world where there are no other architectures beyond the mock CS107 architecture we discussed last week, okay. So we're on the CS107 chip, and I feed this to GCC in accordance with the way that the make files you're dealing with would actually call it. It's going to run it through the preprocessor. You know that these three things would be recursively replaced, to whatever extent is needed, to build one big stream of text, which at the end has this right here, okay. This right here corresponds to that in the emoticon drawing over here, okay. I don't have to generate the full assembly code for this, but the interesting parts are gonna be this. This is the full dot O file that's generated as the compiler digests the expansion of this to a translation unit. Preprocessing takes this and builds a long stream of text without pound includes and pound defines, and that's fed to the GCC compiler that actually generates that dot O code for you. You certainly should expect there to be a call to malloc, okay. You would actually see some lines right here like SP is equal to SP minus four, M of SP is equal to 400. Those things should be familiar to you based on what we talked about last week. I'll move over to the right, okay. You would expect to see a call to print F. You would expect to see a call to free. You would expect to see RV is equal to 0. You would expect to see a return at the end. 
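A minimal sketch of main dot C as described in this passage. The exact string passed to print F is inaudible in the recording, so "Yay!" is an assumption. The function is also named demo_main (a hypothetical name) rather than main, so the file can be compiled with GCC dash C and linked next to another program's main.

```c
#include <stdio.h>   /* declares printf */
#include <stdlib.h>  /* declares malloc and free (and realloc, unused here) */
#include <assert.h>  /* defines the assert macro */

/* Stand-in for the lecture's main; renamed so it can coexist with a real main. */
int demo_main(int argc, char *argv[]) {
    (void)argc;  /* unused in this sketch */
    (void)argv;

    void *memory = malloc(400);  /* compiles to a call to malloc */
    assert(memory != NULL);      /* macro: expanded before compilation */
    printf("Yay!\n");            /* assumed text; compiles to a call to printf */
    free(memory);                /* compiles to a call to free */
    return 0;                    /* RV = 0, then return */
}
```

Renaming demo_main to main should give the full program the lecture describes.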
Those are gestures to the interesting parts of this program from a compilation standpoint, okay. Why isn't there a call to the assert function? Because I included preprocessing in the discussion, and that right there doesn't define an assert function. It declares or defines a way to take this right here and replace it with an expression that doesn't involve an assert function, okay. Based on the way I wrote it before, and I didn't preserve it, remember how I called F print F before? That's the file star version, or basically the file stream version, of print F. There would be a call to F print F in here as well because of the way I defined assert. Does that make sense? Okay. So there's that. This is a clean working program. It's not very interesting. It does lots of busywork and has the weirdest way of deciding whether to print yay or not, but nonetheless, it would compile and it would run. It doesn't even (inaudible) from memory because I'm very careful to free it down here, okay. Compilation generates this dot O file. If I don't include a flag inside the make file or with the GCC call, if I just use GCC right here, it'll actually try to continue and build an executable, by default named A dot out. If I want to suppress linking and the creation of an executable and just stop at the creation of a dot O file, I wouldn't call GCC, I would call GCC dash C. It means stop after compilation, okay. And you've seen the dash Cs fly by with all the GCC calls that are generated from make. Make sense? Okay. If I don't include this, then it will try to build an executable. By default, it would create something called A dot out unless you actually use the dash O flag to specify what name should be given to the product. And if I say my prog for my program, then it won't use A dot out, its default (inaudible). It'll actually name it my prog, okay. 
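To illustrate why no call to an assert function shows up in the dot O file, here is a sketch of a macro that expands in place to a test plus a call to fprintf. The names demo_assert and checked_allocation are hypothetical, and the message format and the exit status are assumptions. The real assert dot H on any given system will differ in detail.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for assert: pure text replacement, no function named
   demo_assert ever exists, so the compiler only sees the if and the fprintf. */
#define demo_assert(cond)                                               \
    do {                                                                \
        if (!(cond)) {                                                  \
            fprintf(stderr, "Assertion failed: %s, file %s, line %d\n", \
                    #cond, __FILE__, __LINE__);                         \
            exit(1); /* assumed exit status for this sketch */          \
        }                                                               \
    } while (0)

/* Allocates and frees a block, checking the allocation with the macro. */
int checked_allocation(size_t bytes) {
    void *memory = malloc(bytes);
    demo_assert(memory != NULL);  /* expands to the if/fprintf/exit above */
    free(memory);
    return 0;
}
```

Running GCC dash capital E over a file that uses demo_assert shows the expanded if statement and the fprintf call in place of the macro name.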
Past compilation, and this is compilation, the generation of this, when it tries to create an executable you're technically in what's called the link phase, where it tries to bundle all the dot O files that are relevant to each other. In this case there's only one dot O file, at least exposed to us, and it tries to build an executable. The only requirements you really need are these: there has to be a main function so it knows how to enter the program, you have to have a definition for every single function that could potentially be called from anywhere, okay, and each function can only be defined once, okay. There aren't many link errors that can happen when you're trying to create an executable, okay. Does that make sense? Now by default it actually links against some libraries that are held behind the scenes and that provide the implementations of print F and F print F and malloc and free and realloc and all of those, okay. Does that make sense? Okay. So there's that. This is compilation. This is linking right here. I'll let you say so. And if I type in dot slash my prog, it'll run this thing, print yay, and we'll have a working program here. What I want to do, and I only have a minute, so I'll just give you a little teaser as to what we'll see on Wednesday: I want to tinker with what happens if I forget to pound include STDIO dot H. That just confuses matters a little bit with regard to the declaration of print F. Does that make sense? Okay. Then I'll ask what happens if I forget to pound include STDLIB dot H and I don't have explicit prototypes for malloc and free visible. They're not included in the translation unit, so they're not around during compilation, okay. What kind of impact does that have on the ability to build A dot out or my prog? 
And the most interesting of the three is what happens if I accidentally exclude the definition of the assert macro, so that it's not visible during compilation. Does that make sense? Okay. Well, I have negative ten seconds, so I'll let you go. I will talk about those three things. I'll reproduce this on Wednesday and we'll spend the first half an hour talking about it, okay.