Hello. Itfs the mad rush. Welcome to Wednesday. As you can see by the growing mound of paper up here on the front of my desk, Assignment 1 is coming in today, the first of the sequence of tasks that Ifm gonna set you to doing this quarter. And Assignment 2, the handout -- I think Jason brought it -- it is in the back for you to pick up on the way out. The files actually arenft up for Assignment 2 yet, so you may wanna hold off jumping right on that until a little bit later today. Therefs been a little bit of a -- my householdfs in a little bit of upheaval because my younger son got strep this weekend, which he thought the best thing to do with would be to give to me, so right now therefs a little bit of strep. So you probably wanna stay away from me because I donft think you want it any more than I want it. And it also means that Ifm gonna need to cancel my office hours today. I donft feel that good, and I need to get home to my kids, too. Okay, so letfs just ask a little question about Assignment 1. I always like to do this when the assignments come in because itfs good for me to kinda know what the experience was like for you, as well as kinda you to see in context how everything went for everybody. So let me just ask how many people think they managed to do the whole thing, start to finish, five problems in less than ten hours? Less than ten? Okay, so thatfs actually like a good chunk of you. Thatfs good. Thatfs less than I wouldfve expected. Therefs certainly some hurdles to kinda getting everything worked out in C++ that in general I would -- the people who raised their hands, Ifm assuming what youfre telling me there is that the coding was very straightforward, like if you had to write that in Java, you probably couldfve done it in half the time, and probably most of the thing that got you tripped up was how do I say this in C++. What header file do I need? What is the compiler telling me? How many people think it took them between 10 and 15 hours? Okay, another equally large group. So thatfs kind of my target zone. So you guys are kinda right where Ifm thinking it should be. Let me ask about 15 to 20? Okay, a little bit smaller group here. And then letfs say more than 20. Anybody wanna admit to that? You can tell me later if youfd rather not. So that seems pretty good. I do expect that the assignments -- at the beginning here, our first three assignments are a little bit like problem sets where we give you some small things to work on that give you some skill drilling. The first big kind of comprehensive program is the one that goes out in the fourth week, but they do build up a little bit. We get some more mastery, and we kinda keep moving. And then Assignment 4 will be kind of a heftier thing, so kind of in terms of planning, I think, we believe that the range is probably 10 to 20 hours, but the early weeks tend to be a little bit more on the light side, and then they do kinda pick up steam. Any comments on the assignment that you wanna share that I should hear about? [Student:][Inaudible]. Somebody said they spent three hours, just getting this burnt and that, and whatnot. Was that the Mac or the PC that was so -- the PC. Wefll try to negotiate with Microsoft for an easier install pack. Theyfve been trying to help us, but it turns out itfs just not really set up to make it easy, and then the fact that therefs a lot of different OSes that it needs to coordinate with, itfs kind of like most PC installs. Itfs not actually trivial. [Student:]One thing I was thinking is Ifm sure a lot us made a text file to test the prompt in Problem 5. Maybe it wouldfve been nice to have just a standard one rather than testing. I think thatfs a good point, although I will say that in general I think learning to be able to make your own test data is actually really valuable, and the idea that knowing exactly what you put in it is almost sort of better than having me give you a test file which then you have to go look at, and figure out whatfs in it, and figure out what the results would be. If you stack the deck with 10, 20, 30, 40, 50, youfll know what to expect. And so in some ways, it actually kind of gives you I think better control. I think sometimes what I found out when I give students test data, they donft tend to make up any other test data. If I give you no test data, then youfll have to make up some, and then that gets you thinking about how to create good test cases. But that said, it certainly would be trivial to do so. Thatfs probably not where you spent your most time, though, I bet. How do you feel about C++ now after the assignment relative to kinda before you got into it? Are you feeling okay about your skills transferring? The compiler being a little bit of a change for you, the language. Hopefully, you feel like -- one thing Ifd like to hear you saying is that youfre confident about your programming skills coming with you, that youfre not finding yourself having to relearn things that you knew how to do before. Youfre just learning how to express it a little bit in different syntax, and knowing how to divide things into functions, and test them, and write loops, and use variables should all seem very familiar, so hopefully thatfs not coming out too surprising. All right. So let me keep moving forward. Wefre gonna talk about the map and the set classes which are the remaining classes of the class library. Hopefully, Ifll do most of that today. If not, what we wonft finish today, wefll do on Wednesday. And then wefre gonna pick up and talk about recursion, and at that point wefre gonna move back to the text, and wefre gonna cover Chapters 4, 5, 6, and 7 just pretty much straight in order, so if youfre looking to read ahead. I do think this is actually some of the best parts of the reader is the chapters -- especially on the recursion chapters, 4, 5 and 6. So Eric Roberts, who wrote the original text that I edited and sort of worked on, I did the least editing in 4, 5, and 6, so they in some sense are most true to Ericfs original work, and I think they are some of the strongest conceptual help. And recursion being kind of a tough thing to get your head around, I think youfre gonna want second sources, so this is a good time to read ahead in the text, come to lecture having seen a little bit of it from that perspective, and ready to hear it from my perspective, and hopefully together wefll move toward mastery of that topic. So wefve seen vector, grid, stack, and queue, which is what we were doing on Friday, and these four are the group we call the sequential containers where theyfre storing and retrieving things on the basis of sequence. You place things LIFO into the stack, last in first out. You put things in the vector by index, retrieved by index. Therefs a sequence thatfs being maintained behind the scenes to store and retrieve the data. All of these containers have the property that they donft really examine your elements. They just store them. So you can put things into a stack, but it never really looks at them. It never looks at the strings you put in or the students that you put in the vector. It just holds onto it, and then when asked to give you something back on N queue -- I mean a D queue operation or a get at or something like that, it retrieves what you earlier placed there. These are not very fancy. They do things that are very useful, but the bang for the buck is mostly in terms of just modeling a particular behavior like the stack and the queue, and then allowing you to kind of use it easily without error. The associative containers, which is what map and set belong to, are the ones where youfre really getting a lot of power out of them, where they do something that actually is hard, or inconvenient, or inefficient for you to do manually. Theyfre not about sequencing. Theyfre about relationships. The set is a little bit like a vector in the sense that itfs a collection of things without duplication, but it has fast operations for checking membership, adding things and removing things, checking to see if somethingfs in the set in a way that the vector youfd end up kind of trudging through it to see do I already have this value in here. Therefs no easy way to do that with a vector as it stands. And so set gives you this kind of access for is this in the set, is it not, as well as a bunch of operations that allow you to do high level combinations like take this set and union it with another, or intersect it, or subtract this one from that, and get the mathematical properties of sets expressed in code. The map, even fancier in some sense, is a key value pairing that you want to be able to do some kind of look up by key. I wanna have a license plate number, and I wanna know what car thatfs assigned to, and what its make, and model, and owner is. The map is exactly the thing you want for that where you have some sort of key, or some sort of tag, or identifying ID that you wanna associate some larger thing with, or some other thing with for that matter, and be able to look that up, and make that pairing, and retrieving that pairing information quickly. Both of these are designed for very efficient access. Itfs gonna be possible to do these lookups, and additions, and combinations in much faster than you would be able to do if you were doing it manually. Wefre gonna peek under the hood in a couple weeks and see how that implementation works out and why, but for now whatfs pretty cool as a client is you just get to do neat things with it, stick stuff into it, check things, build things on top of it, taking advantage of all the power thatfs there through the kind of simple interface without having to learn what crazy internals behind it are. Itfs not something we have to worry about in the first pass. So wefll talk about map. As I said, itfs a collection of key value pairs sort of modeling a dictionary, a word and its definition. It could be that itfs the student ID and itfs their transcript. It could be that itfs an index which tells you what a word -- and what page as it appears in this text. So therefs a key. Therefs a value. You have operations that once you create a map where you can add a new pair, so you add a key and a value together and stick them in there. If you attempt to add a different value for an existing key, it will actually overwrite. So at any given time, therefs exactly one value associated per key. You can access the elements that you stored earlier saying Ifve got a key, give me the associated value. I can check to see whether a key is contained in the map, just if I wanna know is there some entry already there. And a couple of the fancy features wefll get to in a second have some shorthand operators, and some access that allow you browsing. Some examples of things that maps are really good for -- so maps turns out to be just ubiquitous across computer science problem solving. You probably saw the hashmap in your 106A class, and you got a little bit of work out of that toward the end of the quarter. This guy, therefs just a thousand uses for which a map is a great tool. A dictionary is an obvious one, a thesaurus where you have a word mapped to synonyms, the DMV which has license plate tags that tell you the three cars Ifve owned in my life and what their license plate numbers were -- any sort of thing that looks like that fits in the mapfs goal. Letfs take a look at what the interface actually looks like. This is a little bit simplified. I removed a couple of the advance features Ifm gonna get to in a second. The class name is map. Itfs templatized on the value type. If you look at the add operation, it takes a string type for the key and a value type for the value, and that means that the map is templatized on only one of the half of the pair, that the pair is always a key which is a string mapped to some other thing, whether itfs a student structure, or a vector of synonyms, or a string which represents a definition. That value type can vary based on what the client chooses to fill in the template parameter with, but the keys are always strings. Ifll kinda get to why that is a little bit later but just to note it while wefre here. So things like remove, and contains key, and get value that all operate on a key always take a string, and then what they return or access is templatized by this value type. So therefs one little thing to need to know a little about how the interface works. When you create a map, it is empty. It has no pairs. If you ask it for the size, it tells you the number of pairs in it. If you ever add something to a key that was already present -- so if earlier Ifd said Juliefs phone number is this, and then I later went in and tried to add under Julie another phone number, it will replace, so there will not be entries for Julie. Therefs always exactly one. So add sometimes is incrementing the size, and other times the size will be unchanged but will have overwritten some previous pair. Remove, if there was a matching value will decrement the size. If actually there wasnft a matching entry, it makes no changes. Contains key can tell you whether somethingfs in there. And then get value has a little bit of a quirk in its interface that youfll need to be aware of. If you ask it to retrieve a key that doesnft exist, itfs got a little bit of a problem there. If you say give me the phone number for Julie, itfs supposed to return to you the previously associated value, something of value type. Well, if you ask it to get a value for something and there was no pair that matched that, it has a little bit of a conundrum about what to return. For example, if this table were storing numbers -- maybe you were storing the latitude and longitude of a city. Well, when if you asked it to return the value for a city it doesnft know about, what is it supposed to return? What is the default -- some sort of sentinel value that says no latitude, no longitude. It says, gWell, what is that?h If you were getting strings -- if you had definitions, you might return the empty string. If you were returning numbers, it might be that what you wanna return was zero or negative one. There is not for all types of values some clear identifiable sentinel that would be appropriate to return. So what happens actually in get value is if you ask it to retrieve a value for something that doesnft exist, it throws an error message. There is no other sensible return value it knows of, and so it treats it actually as a drastic situation. That means if you are probing the table for something that you traditionally wanna be checking contains key if youfre not positive that itfs already in there. If you do know, itfs fine to go ahead and get value through, but if you are querying it, itfs a two-step process to verify itfs there before you go get it. So I got a little bit of code. Ifm gonna go actually just type this code rather than show it to you prewritten because I think itfs easier to learn something from when I actually type it, so Ifm gonna go ahead and just show you. So what I have here is a little bit of code I wrote before you got here which is just designed to take a particular input text file and break it apart using stream extraction into words, in this case tokens that are separated by white space using the default stream extraction. So it has a while true loop that keeps reading until it fails, and in this case it just prints them back out. What Ifm gonna do is Ifm actually gonna gather those words, and Ifm gonna put them into a map, so thisfll be my plan. So letfs build a map thatfs gonna map words to counts, so the number of times a word occurs in a document will be stored under that key. So the first time I see gthe,h Ifll put a one in, and the next time I see a gthe,h Ifll update that one into a two, and so on. And when Ifm done, I should have a complete table of all the words that occurred in the document with the counts of the number of times they occurred. So let me go ahead and Ifm gonna pass it up here by reference so I can fill it in with something, and then what Ifm gonna do here -- Ifm gonna say if M contains key word -- so this means that there was an existing entry. Ifm gonna add to it under the word -- letfs take a look at what we did here. If it was already there, then Ifm gonna get the value that was previously stored underneath it, add one to it, and then stick it back into the table, so that add is gonna overwrite the previous value. If it was four, itfs gonna retrieve the four, add one to it, and then overwrite it with a five. In the case where it didnft contain that key, so no occurrence has been seen yet, we go ahead and establish a new entry in the able using m.add of the word with the count one. Yeah? [Student:]Could you try the remove number function if therefs no value there to develop the [inaudible]? It does not, actually. So remove can actually -- partly it has to do with a little bit about can you do something reasonable in that case. Remove in this case can say therefs nothing to remove. Get value just canft do anything reasonable. Itfs supposed to return you something. Itfs like therefs nothing to return. I canft make it up, and thatfs why thatfs considered a more drastic situation. Question? [Student:]You passed the map on the wrong line. Oh, youfre right. I totally did. Thank you very much. That was gonna give me quite an error when I got that. There we go. Okay, fixed that. Thank you very much. And so then maybe when Ifm done I could say num unique words, and then m.size. So if we let this guy go, itfs gonna complain all over the place because it says whatfs this map of which you speak. Then wefll tell it herefs a header file youfd like to derive. Okay. So there were 512 unique words in this document. Itfs the text of some handout we had earlier that I just quickly put in a text file. So this shows sort of the basic usage pattern will typically be Ifm sticking stuff in there, and intentionally updating and changing, and then I in this case was using a count at the end. Ifm gonna note one funky little thing about the map while Ifm here because map has a shorthand access that allows you to retrieve the value associated with a key using a syntax that looks a little bit bizarre, but itfs kinda becoming common -- is to instead of saying m.get value word, that you say m of square brackets of word. So that looks a little bit like an array access, and the first time I saw this I have to say I was just shocked, and I thought, gOh my god. What an abuse of the notational system,h but it has become so ubiquitous in languages now that I actually donft consider it so frightening anymore. Itfs basically saying reach into the table, and instead of using an index to identify which element youfre doing, youfre using a key and saying reach into the bucket thatfs tagged with this index and get the value back out. So this is effectively an m.get value of word but using this syntax. Therefs something thatfs a little bit wacky bout this syntax though. Itfs not exactly equivalent to get value in that it doesnft mind if you try to access something that doesnft exist. And what it will do is its strategy is if you access a word m of some word where this is not an existing key, it will create a pair, it will tag it with the key you asked for, and then it will kind of leave the associated value as to whatever the default value for that type is. So in this case itfs almost like you declared an int variable on the stack, what do you get? It turns out youfll get garbage. Youfll get kind of an uninitialized number which is of very little value, but the side effect of then having set this pair was up that we could immediately go in, and write on top of it, and make the assignment into the one. And so whatever junk condits were there were only temporary, and then I immediately overwrote it with that. In effect, that also means I can do things like this where I do m.word plus equals one, and now Ifm using it both for the add and the get value part. So the m bracket form kind of handles both the things you think of as being add -- set adding something into there, overwriting or adding a new value, as well as just retrieving it when you wanna read it like a get value. So youfll see both of these used, but they do have a little bit of a quirky behavior that using the get value without the existing key throws an error. Using it this way actually kinda sets up an empty error that the assumption is that youfre using it in this context where youfre trying to immediately create it and overwrite it. Question? [Student:]Does the case not matter for map? If itfs -- Case does matter. If I had the word capital T gtheh and lowercase gthe,h those are different words, so it really does -- basically what you think of a string equals comparisons at the base level. If I wanted it to be case insensitive, then I would need to actually go out of my way to convert the keys to uniform case to guarantee that they would match other forms of the word. So let me go back -- and thatfs a little bit of code there. So let me just review a little bit about where wefre at. That shorthand operator that we have -- I said I would explain why is it that keys have to be string type. That if wefre trying really hard to build these general purpose containers, it seems like it would be extra convenient if we could say the key is an integer, and the mapping is -- we could use our student ID numbers, and itfs a studentfs record thatfs there. Why does it have to be string type? Well, it turns out that it really does use the known structure of strings to store them efficiently. Itfs capitalizing on certain properties that strings and strings only have to decide how to store those things so that it can actually do this very fast lookup that does not generally apply to other types of things. I couldnft say the key is a student structure and you have to match student structures. Itfs like therefs not easy efficient ways to match any kind of value type, but there are certain things you can do with strings and strings only that itfs capitalizing on. So in the case where you have something and youfre trying to use a key that isnft already in a string type, the recommended way to get around that is you have to figure out a way to convert it to a string. So if you have an int, a student ID number that you wanna use, or a phone number, something like that, you just convert it into a string form. You take the integer; you make it into a string. If you have a first name and a last name, and you wanna use them both, then you concatenate them together, and use that string as the key -- just ways of building a string out of what you have. Itfs a little bit of a hack, but it solves the problem without too much trouble in most situations. The other question that comes up is what if I have more than one value for a key? The thesaurus example I gave earlier, the key is the word. The thing I want to map it to is the synonyms, that list of words. If I did add a word with each synonym, each subsequent synonym would just replace any previous pairing Ifd put in. If what I really want is a one to many relationship, one word many options, then what I can do is just use vector as the value, so build a map containing vectors containing strings. So Ifll get a nested template out of that, and then when I wanna add a new synonym to the map, itfs a matter of pulling out the existing map thatfs there and adding it to the end of that vector, so it actually is kind of a two step add to get it into the vector and get that vector into the table. So the last thing thatfs missing is one that Ifll go back to my code and we can do together. Once Ifve put the information in the table, I hope you believe me that I can get it back fast. I can check to see if therefs an existing entry and do the lookup by key, put new entries in, replace it, overwrite -- all of that in the interface so far seems just fine. But letfs say I just wanna see that whole table. I wanna print my class list. I wanna see the thesaurus or the dictionary just spewed out, or just actually walk through it and look for new words. As it stands, the interface doesnft give you that access. Theyfre not indexed. Theyfre not in there under Slot 0, Slot 1, or Slot 2. Itfs not like a stack or a queue where you can just dequeue them back out to see all the things you put in there. They kinda go in and so far itfs the roach motel. You can put them in, and if you know who youfre looking for, you can get them back, but therefs no way to kind of scan the contents. What wefre gonna see is that map provides access to the elements using a tool called an iterator. This is the same tool that Java uses, so if youfve seen that, youfre already ahead of the game. Where you would like to see the entries in it, and rather than giving you a whole vector or some other kind of random access version of it, it provides you with a little intermediary. In this case, this class is called iterator. You create an iterator on the class. Here, let me just go write the code. Itfs better to see that. So if I got here and I said I wanna see what they are, what I do is I create an iterator. The mapfs iterator name is a little bit wacky. There are iterators on both the map and the set, and to distinguish them because theyfre similar but not exactly the same thing, the iterator type is actually declared within the map class itself. So itfs actually map of the type in angle brackets colon colon iterator is the name of the iterator for an iterator that walks over a map containing integers. We get one of those from the map by asking for it with the iterator member function. So that sets you up an iterator thatfs positioned to kind of work its way through it. The map iterator makes no guarantee about what order it will visit them, but it will visit them all. It uses a syntax that looks like this. And I say iter.hasNext -- are there more things to retrieve? So the iteratorfs kind of keeping track of where we are as wefre working our way through the map. If there are more keys we havenft yet visited, havenft yet retrieved from the iterator, it will return true in which case the call to iter.next will retrieve the next key to be visited, and then advance the iter, so it will have moved past that and kind of moved that to the already visited side, and now it will continue to work on the things that are unvisited. As I keep doing that, iter.next is both retrieving it and advancing it, so you typically wanna call that once inside the loop to get the value, and then check that hasNext again to see whether therefs more and keep going. It gives you just the key. It doesnft give you the pair. The pair is just an easy call away, though. So I can say key equals and then M of key. And so using the shorthand syntax or the M -- the get value, either way once I have a key, itfs very easy to look up the value. In fact, itfs so efficient, it means therefs actually really no harm in just getting the key and going back in to get the value separately. Itfs still works out just fine. So when I do this, this should actually iterate through, show me every entry in the map, and the count for it. So one thing I will note is the sequence by which itfs traveling through seems to be effectively random. You see my reasonable experience few following somewhere using the numbers 1, 4, 2, so there is no pattern. There is no rhyme or reason. It turns out it is actually internally youfre doing something kinda sensible, but it does not guarantee as for the iterator anything predictable to you about which sequence it will visit them in. It says I will get them all before itfs all done and said, but donft count on seeing them in alphabetical order, reverse alphabetical order, in order of increasing value or anything clever like that. It will get to them all. Thatfs all you can be sure of. That is the [inaudible] thatfs up here. Question? [Student:]On the previous page, on the previous slide, you mentioned something about [inaudible] finding the [inaudible], you would have to use a vector for that? Yeah. So if I had a one to many in that case, so maybe it was synonym to vector of strings, then I could use the iterator to go through and say how big is this vector compared to the other vectors Ifve seen so far. So I keep track of letfs say the [inaudible] so far. So youfd run an iterator that -- so letfs just say I wanna find the most frequent word. Thatfs about the same operation. Let me come back over here. If I just wanna see the most frequent, I could do this. I could say string max, and Ifll say int max count equals zero. So the max is empty here. When I get this thing, I say if M of key is greater than my max count, then I want my max to be the key, and my max count to be M of the key. So Ifve started right with no assumptions about what the max is, and any time I see something thatfs greater than the max I have so far, I update that. So if this were a vector, I would be checking to see that the size of a value is bigger than the size of the previous vector. When Ifm done, I can say max equals max count. It turns out gthe.h The most common word in there? What a surprise, right? It shows up 42 times. So the same sort of [inaudible] would be used if there was some other thing I wanna do. So what the iterator is just giving us is a general-purpose way of walking through all the values. And then what you wanna do with them when you get there is your business. Do you wanna print them? Do you wanna put them in a file? Do you wanna compare them to find the top two, or the smallest, or the biggest, or whatever is up to you. So itfs just giving you access to that data that you stuck in there and get it back out. [Student:]So what does iter.next do? Iter.next is -- think of it as an abstraction. The idea is that itfs almost like itfs got a little pointer. Itfs kind of like if I were in charge of walking through this class and returning each student, it is a pointer that given a certain student will return you the student Ifm currently pointing at, and then move to another random student. And at any given point when Ifm done them all, that hasNext returns false. And so next does two things, which is return to you the next key that you havenft yet seen, but advances kind of the iterator as a side effect so you can now test for are there any more, and if so, the next call will return a different one. So every time you call, it returns a new unvisited key in the map until theyfre all gone. [Student:][Inaudible] or the next one? Well, it returns the next one in the iteration. You can imagine that itfs almost like a caret thatfs between characters at any given point. It will return the next character, but if you call it again, it will return a new one. So it never returns the same value unless there was the same thing being iterated over. So it advances and returns in one step. That is key. There are some iterators that donft have that design. They tend to return the current one, and then you side effect, the STL iterators for example. In this one is kind of more of a Java style iterator. It does both. So if you needed to use the value a couple times now, youfd probably store it in a local variable, and then use it throughout that thing, and then only call iter.next once in each loop. There is an alternate way of doing something to every pair in a map. Ifm actually just not gonna show it to you because I think actually itfs something thatfs overkill. So there is something you can read about in Handout 14 and look at which is called the mapping function. Ifm gonna actually just skip it because I think it confuses the issue more than it helps. When you know how to do iteration, itfs a great way to get access to all things in a map, and since the other function is actually just a more complicated way to get at the same functionality, I wonft bore you with it. Then let me tell you about set. Once you have set, youfve got it all. So map I think is the heavy lifter of the class library. The set is kind of a close second best in terms of just everyday utility value. What set is is a little bit like a vector in some ways, but it has the added features that there is no sequencing implied or maintained by it. If you have the set that youfve added 3 and 5 to, if you add 5 and 3 in the other order, you end up with the same set when youfre done. Those two sets are equal. They have the same contents, so it doesnft consider the order of insertion as important at all. Itfs just about membership. Does it contain these values? There are never any duplicate elements. If you have a set containing 3 and 5, and you attempt to add 3 again, you donft get the set 3, 3, 5. You just get 3, 5. So kind of like the mathematical property that it derives its name from, there I just existence. Is the number in or not? And adding it again does not change its status if it was already there. So your typical usage is you make an empty set. You can use add, remove, and contains to operate on individual elements. Add this one. Remove that one. Contains -- remove has the same behavior that it did on the map which is it kind of silently fails if you ask it to remove something that wasnft there. If you ask add to add something that was already there, it silently doesnft change anything, so both of those operations just kind of quietly deal with the cases that might be a little bit unusual. And then contains will give you information about does this number exist in the set. All of those are very efficient, can be done many, many times a microsecond without taking any processor time at all, so that means itfs very handy for just throwing things in a set and then later wanting to know if itfs there, as opposed to walking your way through a vector piecemeal to see if something was there. So for example, a vector doesnft have anything like this. If you wanna know if Julie is in your list of students, and you have them in a vector, the only way to know is to walk through it Sub 1, Sub 2, Sub 3 until you find it or you run out of entries to check. The contains operation does a blinding fast sort of check and can almost immediately tell you is Julie in there or not, whether your entries are a thousand, a million, a billion, pretty much immediate access to dig it out. It also offers these high level operations, and these actually are where sets are particularly valuable is this idea of unioning, intersection, subtracting, checking to see whether two sets are equal, so whether they have all the same numbers, or whether one has a subset of another -- that allow you to model -- a lot of times the thing you want to take your code to do is something that in the real world needs a subset or an intersect kind of operation. Youfd like to know if the courses you have taken meet the requirements you need to graduate. Well, if the requirements to graduate are a subset of what you take, whether theyfre a proper subset or not, if theyfre all in there, then you can graduate. If you wanna know if you and I are taking any classes together, or have any requirements that we both need to do, we can take the requirements, see what we have, subtract whatfs left, intersect that to find out what we have left to see if therefs any classes we could take together and both satisfy requirements. Any kind of compound Boolean queries like when youfre trying to do searches on an index, you wanna see pages that have this word, and this word, or that word, and not that word are very easily modeled in terms of sets. You have the set that have A and the set that have B. If you intersect them, it will tell you about which things have both. If you wanna see the gor,h you can use the union and find out which things have either. Sometimes you use sets simply for this idea of coalescing duplicates. If I just wanted to see the set of words that were in my file, and I donft care about how many times a word occurs, I could just dump them all into a set, and then when Ifm done, I will know what the 512 unique words are, and I will have not had to chase down did I already have a copy of that word because the setfs add just automatically coalesces with any existing entry that matches. So let me show you its interface. This is probably the biggest of all our classes in terms of number of member functions there, and features of them. Ifm not gonna talk about the constructor just yet because it has something scary in it that wefre gonna come back to which is a little bit fancy in terms of how it works. Therefs a default argument over there. Ifll note that, so that means that actually if you donft specify, in some situations the set doesnft need that argument and can kind of figure out for itself what to do. So we can create a set. We can ask for the size. Is empty -- just checking to see if the size is zero. The things that operate on a single element here, adding, removing, containing -- and then ones that operate on the set as a whole comparing it to another set. That set there is one of the few places where youfre allowed to use the name set without the qualification. Thatfs because wefre in the template, and in this situation, this set without qualification is assumed to be whatever set is currently being built. So if I have a set of int, the equals method for set of int expects another set of int as the argument. Same for is subset, and union, and intersect, so it actually kinda matches it out correctly all the way through, so you can only union sets of integers with other sets of integers. That should make sense to you. These operations return the Boolean. These guys actually destructively modify the receiver. So you have a set that youfre a messaging with a union with. You give it another set. It will join into the receiver set, so the receiver set will be enlarged by whatever new members are contributed by the other set. Intersect could potentially shrink this set down to just those elements that are in common with this other one. And then subtract takes all the elements that are in there that are present in here and removes them. So you can think of those as just modeling what the [inaudible] formula for those operations are. It also uses an iterator. Like the map, once you stick them in there, theyfre not indexed. Theyfre not by number. Therefs no way to say give me the nth element. Therefs no sequencing to them, so you use an iterator if you wanna walk through a set and see whatfs in there. So let me write you a little set code. Any questions about its basic interface? [Student:]Why would somebody use a vector over a set? Sometimes you actually do care about ordering, or you do want duplicates. You might have a bunch of names where maybe some people are gonna have the same names, or the ordering really is -- that you wanna know whofs in what room in a dorm, you might be using the index to say itfs in Room 100. Thatfs at what index? So definitely when you care about that index, and where you also want that random access that I know a particular number, and I wanna see whatfs in that box. There really isnft that same feature in a set. Therefs no way to get the nth member. If you just wanted to pick a random member out of a set, the only way to do it would be to open up the iterator and take a random number of steps forward, whereas in a vector you can say just pick from zero to N minus one. So there are definitely things that vector can do that set doesnft, and it just depends on the task which you have. Do you need duplicates? Youfre definitely stuck with vector. Do you care about random access? Do you use the index in some interesting way? Do you care about recording the sequence, like knowing who was first or last that you added? Set doesnft track those things in the same way. So let me write a little code that uses a set, and letfs write this. Ifm gonna write something that tests the random number generator. Every time I do this it alarms me, but I think itfs good to know. Ifm gonna keep a list of the numbers Ifve seen, and Ifm going to write a for loop that then -- I donft know. Letfs just run it until we see a duplicate. Ifm going to generate a number between 1 and 100. If seen.contains that number, then Ifm gonna break. Let me do a count. No, theyfll be fine. Otherwise, Ifm going to add it. And then Ifm gonna print found seen.size or repeat. So what you might be hoping is that if random number generator was still truly random but still a little bit predictable -- in this case, if I asked it for the numbers between 1 and 100, youfd like to think it would take about 100 iterations before it would get back to a number thatfs seen. But certainly given itfs random, itfs not actually picking and choosing without replacement. Theyfre just kinda bopping around the space 1 to 100, and it may very well light back on a number itfs already seen before it would get around to some of the other numbers. So when you run this, youfre kinda curious to see what is it like. Letfs find out. Every time I do this I always think, gWow. The random number generator really not that random,h but wefll see what today says. Ifm including random to get access to random integer. And Ifm going to put my set here. And Ifm gonna add one called randomize here at the top. That randomize initialized the library so that we have a new random sequence for this particular run which will allow us to have different results from run to run. It said 24 numbers before repeat. Okay. Letfs run that again. Seven before repeat. Letfs try it again. 20 before repeat. So showing you a little bit about the -- seven again. Six. Ten. Like is it ever going to get close to 100? No. So it just tells you that it bops around, but it actually does not -- it is pseudorandom, which is one of the words that computer scientists use to say, gYeah, donft count on it being really random.h I would not wanna run my casino on the basis of numbers being generated by your average C++ random library. There you go. So in this case, just using it for containment so that each time through I can add that new number and then check the contains. Both those operations act operating very quickly. If I did this instead with a vector, I could accomplish the same thing, but itfd just be more manual. Ifd stick it on the end, and if I wanted to see if it was there, contains is not an operation vector supports, so Ifd have to walk through the vector from zero to the length minus one, and try to do the matching myself, which gets slower and slower as the vector gets longer. Itfs just a hassle to write that code, so having contains around saved us a little bit. If I want to print my sets, why donft go ahead and just at the bottom show that the iterator gets used on the set, has the similar kind of formed name as the map, so the iterator -- in this case capital I is a nested class declared within the set. I ask the set to give me its iterator while there are things left to pull out, then I will pull out it and print it. And so therefs the sequence of letfs say 20 or so numbers. So this highlights something thatfs a little bit distinctive about the iterator as it applies to the set is that those numbers -- 15, 16, 22, 24, 37 -- are coming out in increasing order all the way down to the bottom. And if I run it again, I will get the same effect, but perhaps on a shorter list. That the sets iterator is not as unpredictable as the map iterator was -- that the set iterator -- that in fact part of how the set is able to supply its operations so efficiently is that internally it is using some notion of sorting. It is keeping track of things by ordering. In this case, the increasing order is the default strategy for how it lines things up -- and that the iterator takes advantage of that. It is internally storing in sorted order. It might as well just use the iterator to walk them in sorted order, and that tends to be convenient for somebody who wants to process this set. So as a result, you can count on that, that when youfre using a set, that the iterator will put out the values from kinda smallest to largest. So for example, for strings it would be the lexicographic ordering, that kind of slightly alphabetical thing based on ASCII codes that it would produce them in. In numbers, theyfll go from smallest to largest as a convenience, and it happens to be kinda nice for just browsing it in alphabetical order. Often itfs something thatfs handy to be able to do. We head back over here. And so the code we just wrote, printing the set, doing the random test, I can apparently reproduce my own code on demand. And then here it just shows a little bit of some of the fancier features you can start going once you have sets in play. So if I were keeping track of for the friends that I know who they consider to be their friends, and who they consider to be their enemies -- of course, those are very small lists Ifm sure for you indeed, but if I were putting together a party, what I might wanna do is say letfs invite everybody friends with and everybody Ifm friends with, but then letfs make sure we donft invite anybody that one of us doesnft like, thatfs on our enemy list. And so by starting with a copy of the friends that one has, unioning that with the twofs friends, right now Ifve got the enlarged set which includes -- note that I did a set string result equals one friends. That actually does a complete deep copy, so the map and the set just like our other containers know how to copy themselves and produce new things. So I got a new set that was initialized with the contents of onefs friends. I took that set and destructively modified it to add twofs friends, so now that result has an enlarged circle that -- and then Ifm further destructively modifying it to remove anyone who was on my enemy list and your enemy list to get us down to kind of the happy set of folks that either of us likes and neither of us dislikes to do that. So things that you can imagine if you had to do this with vectors would start to get pretty ugly. Walking down your list and my list to see whofs matching, to see who you have, I donft, to make sure that everybody got one and only one invitation. The set is automatically handling all the coalescing of duplicates for us. And then allowing you this kind of high level expression of your actions is very powerful. If in the end what youfre trying to model is this kind of rearrangement, then being able to union, subtract, intersect really helps the clarity of your code. Somebody can just read it and say, gOh, I see what theyfre doing here,h doing set operations to get to where we wanna be. Question? [Student:][Inaudible] the same enemies, so when you subtract onefs enemies, and then you [inaudible] again with two itfs not there. It turns out subtract will say remove these things if theyfre present. So if you and I have some of the same enemies, after Ifve removed them from your list, Ifll remove them from my list. It turns out they wonft be there, but it doesnft complain. Like the remove operation, if you ask it to subtract some things, it doesnft get upset about it. It just skips over them basically. The set though is a little bit more quirky than the others in one kind of peculiar way. Ifm gonna get you -- foreshadow this. Ifm actually gonna do more serious work on this on Friday -- is that the other containers kinda store and retrieve things. Theyfre putting them in buckets, or in slabs, or boxes, or whatever, and returning them back to you. But set is actually really has a much more intimate relationship with the data that itfs storing that it really is examining the things that you give it to make sure for example that any existing copy is coalesced with this one, that when you ask it to go find something it has to do some kind of matching operation to say do I have something that looks like this. Itfs doing this sorting internally, as I said. Itfs keeping them ordered. Well, it needs to know something about how to do that ordering, what it means for two elements to be the same, what it means for one element to precede another or follow another in some ordering. Okay. But set is written as a template. Set takes any kind of elem type thing. How is it you compare two things generically? That actually is not an easy problem. Ifll show you a little bit off this code, and then wefll talk about this on Friday more -- is that one way you could do it is you could assume that if I take the two elements, I could just use equals equals and less than. If I just plug them into the built-in operators and say, gTell me. Are they equal? What does equal equal say? Is one of them less than the other? Tell me that.h Using the built-in operators will work for some types of things you would store. Itfll work for ints. Itfll work doubles. Itfll work for characters. So all the primitive types have relational operator behavior. Even things like string, which is a sort of fancier type also has behavior for equals equals and less than. But you start throwing things like student structures into a set, and I say take two students and say if theyfre equal equal, wefre gonna run into some trouble. So Ifll show you that error message, and wefll talk more about what we can do about it when I see you again on Friday. If you have something you need to talk to me about, now would be a good time before I run away, so if you wanted to see me today at office hours.