Okay, hi, everyone. So today I'm going to go through in some detail some of the ideas that I started on last week of how you can actually go about building a compositional semantic representation, to provide some kind of handle on how you can start with the meanings of words and some syntactic structure and actually put together a meaning for a complex sentence. And so most of what's going to be talked about here is how we can have enough mathematical machinery, and essentially what we're going to be using is the lambda calculus to glue things together, with first-order predicate logic as the output form, so that you can actually have a complex meaning being built compositionally out of the parts. And so that'll be today, and then for the last two lectures next week: on Monday we'll talk about how we can learn about and represent the meanings of words, so that'll be the lexical semantics part, and then in the last lecture I'll talk about some applications like question answering and textual inference. [Inaudible] say a sentence about quizzes. So again, we have precisely one more quiz, and, sort of shifting the schedule between having had the holiday on this Monday and lectures ending next Wednesday, our goal is to hand out the final quiz by the end of this week, Friday, and then for it to be due on the Wednesday at midnight, the day of the last lecture.

Okay. Okay, so first of all, for the first couple of slides, essentially I'm going to recap in a slightly more formal way the kind of material that we talked about last time, and then go on and talk about how we can get further than this. So this should look familiar from last time. And so we have two people, "Kathy" and "Fong," we have a verb "respects" which we're representing with lambda calculus as taking two arguments, and an intransitive verb "run." And then here is our grammar, and I've stopped writing definite clause grammars, so our grammars are effectively written as a context-free grammar rule before the colon, and after the colon we write how meanings are combined to represent the whole sentence. And the form of what happens with the semantics in our grammar rules is always going to be incredibly simple, so what we're doing here is we're saying our composition rule is, "Take the meaning of both halves and do function application." Do function application, [inaudible] when there's only one thing, as when a verb phrase rewrites as a verb, you just pass the meaning up. So essentially, we're always going to be either just doing function application or passing the meaning up. And so the action then comes in at: well, how can we have lexical representations of meanings that will allow the right kind of combination?

Okay. So having this is then enough that we can do the same kind of example that I showed last time, so we have, "Kathy respects Fong." We write down the meanings of the words next to their base categories, and then build a verb phrase where we do function application in the lambda calculus just like this. So we've got the meaning of "respects" and the meaning of "Fong." When we do function application we say, "Okay, 'Fong' substitutes in for the value of y, and therefore the return value is lambda x, respects x Fong." And then we do the same again at the top node.
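Just to make that function application step concrete, here is a minimal sketch in Python, rather than the typed lambda calculus on the slides, of the same composition; representing the logical form as a tuple is just illustrative.

    # Word meanings as curried functions, following the slides: "respects"
    # takes its object first, then its subject, and returns a logical form.
    respects = lambda y: lambda x: ("respects", x, y)
    fong = "Fong"
    kathy = "Kathy"

    vp = respects(fong)      # the VP "respects Fong": a property of individuals
    s = vp(kathy)            # apply the VP meaning to the subject at the top node
    print(s)                 # ('respects', 'Kathy', 'Fong')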
We say, "Okay, here's the meaning of the verb phrase." We apply it to the argument of the subject "Kathy," and so we get out, "Kathy respects Fong." And so note in particular we kind of do that swapping here, so that internally the verb is taking its object first. So the outside argument here is representing the object argument, not the subject argument, and then it's kind of swapped around inside. So we do get out the right meaning of "Kathy respects Fong," and not the other way around, that Fong respects Kathy.

Okay, so the general scenario that I've used as my main motivating example at the moment is that essentially what we're going to be doing is building a database interface. There are other things that you could imagine. You could imagine that these are meant to be commands that your pet robot understands or something like that, but as a kind of concrete way to have a semantics that we can map onto, I'm going to assume that we've got databases. Now, I'm not actually going to really spend any time talking precisely about the mapping onto SQL, I'm just going to show bits of SQL that you could imagine mapping onto and take that as motivational. So I'm really going to show the mapping to first-order logic, but for those of you who've seen databases, following the kind of Datalog tradition of databases, it should be fairly straightforward to see how, if we can represent things in first-order logic, we can then do a further deterministic transformation into an SQL query.

Okay, so in particular what I'm going to assume essentially is that we've got database tables that represent our predicates of interest, so in that previous example we had a respect relation between people. And so when we have a respect relation in our predicate logic, I'm going to assume that we have a respects table, and the respects table is going to have two columns. It's going to have the IDs of people who respect other people: the respecter column and the respected column. So if we have Kathy and Fong as our two IDs for people in the database, then if we take "Kathy respects Fong" as a new information statement, we can treat that as, "Okay, what we want to do is insert into our respects table a pair of respecter/respected, where Kathy is the respecter and Fong is the respected." Alternatively, and perhaps slightly more interestingly, we could also look at the goal of answering questions, so we might want to be able to answer the question, "Does Kathy respect Fong?" And so then we'll want to do a query of the database and work out whether it holds a particular row. So at this point we use a cute little SQL trick where we say, "Select 'yes' from respects where respects dot respecter equals k, and respects dot respected equals f." So we're wanting to ask yes/no questions like, "Does Kathy respect Fong?" And we fake that up in SQL by saying, "Select 'yes'." And so then if this relation is in the database, SQL will return "yes," and if it isn't in the database, SQL will return "no rows returned," and we interpret "no rows returned" as just meaning "no."

Okay. Okay, so that'll be our goal, to be able to build that kind of interface. And one other note I should make: in the examples I show there, and the semantics I build, I'm assuming that for every predicate in the semantic form, we have exactly one relation in the database, a database table, that exactly matches that relation.
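As a concrete version of that database picture, here is a minimal sketch using SQLite in Python; the table and column names follow the example above, and the single-letter IDs are just illustrative.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE respects (respecter TEXT, respected TEXT)")

    # "Kathy respects Fong" as a new-information statement: insert a row,
    # using k and f as the two people's database IDs.
    conn.execute("INSERT INTO respects VALUES ('k', 'f')")

    # "Does Kathy respect Fong?" as a yes/no question: the select-'yes' trick.
    rows = conn.execute(
        "SELECT 'yes' FROM respects "
        "WHERE respects.respecter = 'k' AND respects.respected = 'f'"
    ).fetchall()
    print("yes" if rows else "no")   # no rows returned is interpreted as "no"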
That kind of makes it convenient and easy. In practice that may not be true, and the example that you could think of there is the example from last time with Chat-80: Chat-80 actually only has one database table for countries. It has all the information about countries, but what it has is predicates, which you could think of as equivalent to database views, where it has a capital predicate, which takes just two columns out of the countries table, the country name and the capital name. And so I'm assuming in the same sort of way that I've got database views that map precisely onto the predicates of interest.

Okay, so let me go through a little bit more formally the kind of representational base that we're going to use for this inference. So the model I'm going to use here is the higher-order lambda calculus, where higher order means that you can have variables that aren't only individuals; the variables themselves can be functions, like last time. And in particular the version of things that I'm going to present assumes that we've got a typed lambda calculus. So traditionally, when you're presented a lot of things in either the lambda calculus world or the logic world, things are completely untyped. You just have variables x and y, and do things with them, but there's absolutely no reason that that has to be the case. And just like most people have decided that programming languages tend to work better when you have types on variables, you can also put types on variables in logical systems. And I think there are good reasons, of both convenience and foundations, why once you're working with higher-order logics it can be very convenient to have typed logics, and I'm not going to go on about that for a long time, but we're going to assume that all of our variables have types. So we're going to have Booleans, zero and one; we're going to have individuals, who are people and things in our model. And then we're going to have functional types where we can talk about properties of individuals. So the basic property is something that takes an individual and says whether something is true or false about them, and then we have more interesting ones like binary relations, or ternary relations.

So in terms of the kind of model we have in mind, this picture that I put on the blackboard is the kind of picture of our model. So we have our world, which is the outer circle, and inside the world we have various individuals, which I've denoted with x's, and some of them have names, and here's Kathy and here's Fong, and here are some other individuals in our world. And particular individuals can be objects; they don't have to be people. Okay, so then if we have properties of an individual, properties can be represented as subsets effectively: the subset where the property is true. So this is the car property, and the things inside it are cars, so A and B are cars, and the other four things in my world aren't cars. Similarly, this thing here is in Palo Alto, and that thing there is in Palo Alto, so I've got this [inaudible] in Palo Alto property. Red things [inaudible], so those are my various properties I can have in my model, effectively my model of the world that I can do inference with in a model-theoretic way. It's a little bit more complex when you then have predicates that have more than one argument, because then I can't draw the same convenient circles in the same way.
But you can think of a two-place predicate like "likes" or "respects" like this. So it takes a first argument, which I've only partially represented here so that you can read it. So the first argument should be able to be any individual, but I've only shown two of them. And what it then does is return another function, and the function it returns is then a property. So the property is then saying whether the "likes Kathy" property holds of these... I'm sorry, you know what? I drew this the wrong way around. The whole point was that it was meant to take the object first, and then take the people second, which is what I just illustrated with my previous example. Let me just quickly change this. A, B, C, [inaudible]. Okay, sorry. Okay. This is one [inaudible]. So the first argument it takes is the object, so if you have the predicate "likes" and its object is A, then what's returned is a property, a function that takes a single individual argument and returns a Boolean. So this property here is the set of people that like object A. And so that set there is "Kathy" as I've represented it here. And here's the property of individuals that like object B, and the set for which that is true is "Kathy and Fong." Okay, and you can apply that same general idea if you want to have functions that take three arguments: you build these higher-order functions that return lower-order functions until they ground out in either individuals or Booleans.

Okay. Okay. Something to note that I will use sometimes, though it tends to confuse people, so I won't use it all of the time: the conventional way of writing untyped lambda calculus is things like this, lambda y, lambda x, respect x y, where this shows that it's something that takes two arguments. Now once you have a typed lambda calculus, all functions inherently have a type. So it's just like Java: if you have a method, it has a particular type that is its signature, and you can know that a function has a particular type. So you actually don't have to write all these lambda arguments; you can just write "respect," because it's just a fact about respect that it is a function that takes two individual arguments and returns a Boolean. That's just the nature of the function. And so because of that we can write things in the shorter form, which I will just show. So this is referred to as the long form, where you're showing lambda abstractions of all the arguments and then how they're being applied, but the short form is just to write the function without excessive lambdas.

And so I mentioned very briefly, and I'll come back to in a minute, the idea of having quantifiers. So when you have a quantifier like "every," it's a relationship between two properties, so this is "Every kid runs." And so you're evaluating the property of being a kid and the property of things that run, and what you're asking is whether, for everything of which this property holds, this property also holds. So this is writing out a generalized quantifier in the long form. But actually, we can just write it out in the short form like this, because "kid" is a property, a function from individuals to Booleans, "run" is a function from individuals to Booleans, and the generalized determiner "every" is then a function that takes two properties and returns a Boolean as to whether it holds.
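To make the two-place predicate and the generalized quantifier concrete, here is a minimal sketch over a tiny made-up domain; the particular individuals and facts are just illustrative.

    # A tiny model: properties are functions from individuals to Booleans,
    # and "likes" is curried, taking its object first.
    domain = {"Kathy", "Fong", "a", "b"}
    likes = lambda y: lambda x: (x, y) in {("Kathy", "a"), ("Kathy", "b"), ("Fong", "b")}

    likes_a = likes("a")          # the property "likes object a"
    print(likes_a("Kathy"))       # True
    print(likes_a("Fong"))        # False

    # "every" as a relation between two properties: "Every kid runs."
    kid = lambda x: x in {"Kathy", "Fong"}
    run = lambda x: x in {"Kathy", "Fong", "b"}
    every = lambda p: lambda q: all(q(x) for x in domain if p(x))
    print(every(kid)(run))        # True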
And so we can write it out in the shorter form. And then there are these general rules for the lambda calculus, which I'm not going to spend a lot of detailed time on the fine points of, but the general things are that you can do lambda abstraction, so you can always pull out lambdas like this if it's useful to. You can do function application, which gets called beta reduction, so you can learn from this that Alonzo Church was really fond of Greek letters and named absolutely everything with a different Greek letter just to keep things complex. Then there's eta reduction, which just lets you do the reverse of lambda abstraction, and then alpha conversion is used for renaming variables, so you don't get chance collisions between variable names.

Okay. And so the central idea is that we can effectively come up with good semantic types that correspond to syntactic categories, or at least common pieces of syntactic categories. We'll see that there are various complications in various places. So in general, nouns and verb phrases will be properties. Something like being a car is a property, so that there's a certain set of things that are cars. But a verb phrase, like "liking A," is also a property, because then you're picking out the set of subjects for which liking A is true. The simplest case of noun phrases is individuals, so when we have things like the "Kathy respects Fong" example, Kathy and Fong we both represent as individuals. That very quickly stops working when you want to deal with more complex cases like "Every car is red," and so I'll come back to how people deal with that in a moment.

What about if you want to build a meaning for "red car"? Well, it seems fine to say that red is a property at the end of the day, the set of things that are red, but we have this idea that we want to be able to build up the meanings of words by doing function application. And so if we simply have the meaning of car as a property and the meaning of red as a property, well, we kind of can't combine them by function application any more. And at this point there's actually a choice as to how you go about designing your system. One way you could go about designing your system is to say, "Yep, plain function application isn't going to work," so what instead we're going to do is say, "Every time we write a syntax rule down, we're then going to write special rules as to how to generate the meaning according to that syntax rule." So for example I could say that car's a property and red's a property, and if I have the syntax rule "noun phrase goes to adjective noun," what I say is the way you translate that is: if the adjective meaning is the property p, and the noun meaning is the property q, you translate that as lambda z, p of z and q of z. So I'm building the translation rule into my syntax rule. And that's referred to in logic as a syncategorematic translation, where you're introducing extra stuff in your syntax rule. The general trend that people have used when doing computational semantics has been to really avoid doing that, because you get a lot more simplicity and power if you can just have automatic translation by function application. But the cost of that is you then have to encode the extra complexity down in the lexicon, and so that's what's then shown on this slide.
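Before getting to that lexical version, here is a minimal sketch of the syncategorematic option just described, with the combination written into the rule itself; the little sets standing in for the properties are just illustrative.

    # NP -> Adj N, with the translation lambda z . p(z) and q(z) built into the rule.
    def adj_noun_rule(p, q):
        return lambda z: p(z) and q(z)

    red_prime = lambda x: x in {"b", "c"}     # the things that are red
    car = lambda x: x in {"a", "b"}           # the things that are cars
    red_car = adj_noun_rule(red_prime, car)
    print([x for x in ["a", "b", "c"] if red_car(x)])   # ['b']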
So if we just want to be able to make a meaning for "red car" by function application, then what we have to do is say that one word is a function that takes the other one as an argument. And so we say it's the adjective that does that. So we say that, well, red takes the meaning of car as an argument and returns something. Well, what does that mean that red should be? Well, the meaning of car is a property, something from individuals to Booleans, and when we have a red car, that's also a property, so this set here is the set of red cars, and that's also a property. And so therefore the meaning of red should be something that takes a property and returns a property, and so its type is what you see here. It takes a property, individual to Boolean, and returns another property.

Okay, and once you start going down that route, you just keep on heading down that route and you get into higher and higher order functions. So if you want to have an adverb like "very" in "very happy," you say, "Well, what does 'very' do?" It takes an adjective like "happy" and it returns something that has a meaning that's kind of like an adjective, right? "Very happy" is sort of like an adjective meaning, apart from it describing a more extreme adjective meaning. Okay, so that means the type that we're going to give to adverbs that modify adjectives, at least, is that it takes as its argument the adjective's type, and it returns something that is of an adjective type. So it's (individual to Boolean, to individual to Boolean), to (individual to Boolean, to individual to Boolean), which is just about incomprehensible, but in practice these things are fairly straightforward if you just think of them in terms of higher-level notions. So once you think that these are properties, an adjective is something that maps from one property to another, and this is something that maps from one adjective meaning to another, then things sort of fall into place in a straightforward enough way.

Okay, so here is my grammar fragment that I'm going to start off doing some examples with, and I'll add to it a bit more so I can answer questions. So this is a very straightforward PCFG grammar, I should say CFG grammar. The only way it's different from the kind of ones that we've seen in the Penn Treebank is, if you look at the noun phrase structure, I'm actually producing a binary noun phrase structure rather than a flat noun phrase structure. If you want to build up the semantics of noun phrases, it really helps to have a binary noun phrase structure, a different reason than making parsing go fast. So a noun phrase goes to a determiner and then an N-bar. An N-bar can have any number of adjectival and PP modifiers and then goes to a noun. But apart from that, note that the semantics is in all cases trivial, so the semantics of our rules is simply function application: you have something be the head, and it's going to be the head [inaudible], and if there are other things, they just become arguments of it, as I'm denoting there. And that means that all the interesting stuff then happens in the lexicon. And so this shows some of the lexicon, and I'll go through some of this with examples, but let me just go through a couple of things at the start. So we have "Kathy," "Fong," and "Palo Alto," and they're things that are individuals. I guess when I did this I didn't actually put "Palo Alto" in as an individual, but we could have somewhere here "Palo Alto" also represented as an individual.
Okay, then you have things like "car," where "car" is a property, so it's something from individuals to Booleans, just like here. For "red," red is a little bit more complex, since what I've represented here as a property is actually "red-prime." So "red-prime" is the property of things that are red. The red here is then this red that's of the (individual to Boolean) to (individual to Boolean) type, and what it does is take a property p, so this is the higher-order part of our lambda calculus. And inside its lexical entry, it does the meaning combination. So it says, "Okay, if you're going to make up something like 'red car,' I take as an argument the property, and then I build, as my lexical entry, the property of things for which p is true and the red property is true." And so we do the clever bits of meaning combination down in the lexical entries.

Okay. And then, maybe more straightforwardly, down at the bottom we have the verbs, which are much like before. So "run" is taking an individual and returning a Boolean. "Respects" is taking two individuals and then saying whether it's true or not, and similarly for "likes." So "likes" is just like here, and an example like "runs" is effectively another property here, so there are the things that run, and so maybe this runs or that car doesn't run.

Okay. Okay, so these are the ones in large print; when I get on to the more complex sentences, the print gets even smaller. And I think to some extent, to work through some of the details of this, you have to look more carefully and see that it all makes sense, more than I can do in lecture, but nevertheless I want to try and point out some of the high-level points of what's going on. So what we want to do is make a meaning for "red car in Palo Alto." And first of all, to think about it conceptually, well, what should the meaning of "red car in Palo Alto" be? Well, it seems like that should still be a property. It should still be something that picks out a set of individuals that satisfy the property "red car in Palo Alto." In particular, in terms of my model that I have right here, what I want it to do is pick out precisely this set here, as this is the "red car in Palo Alto" set of individuals. So the end result of the semantics that we make should be that we're getting out a property. So how do we actually get that out? This is the bit that I've talked about already. "Car" is a property from individuals to Booleans, and "red" has this more complex meaning where it takes a property and returns a bigger property. So we can combine in here "car" as p, and what we get back is lambda x, car x and red-prime x. So that's already created a property, which corresponds to the things that are cars and red, so that's this sort of bigger property up here. Okay, and then we want to combine in "in Palo Alto." And that's the kind of scary lexical entry I've written next to it, which I didn't stop to dwell on at the time. But in principle you can reason through from first principles what kinds of types you have to put on words to get things to work, and then essentially build those types. So if you think about "in Palo Alto": "in Palo Alto" should itself be a property, right? It seems like it's something that can be true or false of individuals; at least at a certain point in time, they're either in Palo Alto or they're not in Palo Alto.
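Before going on with "in Palo Alto," here is a minimal sketch of the "red car" combination we just did, with the clever bit down in the lexical entry for "red"; the toy sets are just illustrative.

    # 'red' takes a property and returns a property: (Ind -> Bool) -> (Ind -> Bool).
    red_prime = lambda x: x in {"b", "c"}          # the things that are red
    car = lambda x: x in {"a", "b"}                # the things that are cars
    red = lambda p: (lambda x: p(x) and red_prime(x))

    red_car = red(car)        # plain function application; nothing special in the rule
    print([x for x in ["a", "b", "c"] if red_car(x)])   # ['b']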
So here I have this property of things that are inside Palo Alto. Okay, so that means that when we build this PP up here of "in Palo Alto," its meaning should be a property. Well, I should complexify that once more. "In Palo Alto" should be a property, but just like an adjective, we're then going to want to have the PP modify the meaning of "red car," just like an adjective does, so I'll explain that in a moment. Okay, so we want to build up a property meaning, but what we actually have is an individual, "Palo Alto," and we have the word "in." So that suggests that we want something that takes an individual and returns something. So the preposition's meaning starts off by taking an individual y. And so, well, what's it going to return? It's going to return, for "in Palo Alto," something that's then an adjective meaning, something that will take a property and build a more complex property where it satisfies the previous property and "in Palo Alto." And so that means, at the end of the day, that what you get for a word like "in" is that it takes an individual, which is the object of the preposition, and then it returns one of these adjective meanings. You can have "in" as having that kind of type. Okay, so it takes "Palo Alto" as an argument, so "Palo Alto" becomes y and substitutes into "in" to give this meaning, okay. And so now this is something that works just like an adjective, in that it'll take a property as an argument and return a more complex property. And so here it takes as its argument the things that are "red cars," and all of this substitutes in for p, and then you get lambda x, car of x, and red-prime of x, and in-prime Palo Alto x, which is then the meaning of the whole thing.

Does that sort of make sense? I know there's a lot of notation stuff, but the basic idea is that we can take these things of various functional types, we can write them so that they take arguments of the appropriate functional type, and they return a result of the appropriate functional type. And so in this sort of way, if we model all noun phrase modifiers as things that map from properties to properties, we can have successive noun phrase modifiers combine together, and the resulting meaning will still be a property. So by extending this we can have "the large leather green couch in my bedroom, which my uncle sold me," and we can have each of those modifiers be something that maps from a property to another property, and we can combine it all together and make up the meaning of the whole.

It's worth noting finally, about the way I've written the grammar, if I go back a few slides, that when you're at this N-bar level you can just take modifiers. So you could take adjectives on the left, or you could take prepositional phrases on the right, and if I had other things like relative clauses, you could take them on the right as well, so you can just take any number of modifiers. And so that actually means that the grammar has a syntactic ambiguity where you can take the modifiers in either order. So here I built "red car" first, and here I built "car in Palo Alto" first. And interestingly, that's what people call a spurious ambiguity, where there are things that are licensed by the syntax that don't actually make any difference to the meaning that's generated for the sentence. So regardless of which one I combine with first, I get the same meaning of lambda x, car x, in Palo Alto x, and red x. Okay.
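Here is a minimal sketch of the whole "red car in Palo Alto" composition, including that spurious ambiguity point: combining the modifiers in either order gives the same property. The toy facts are just illustrative.

    car = lambda x: x in {"a", "b", "d"}
    red_prime = lambda x: x in {"b", "c"}
    in_prime = lambda y, x: (x, y) in {("b", "PaloAlto"), ("d", "PaloAlto")}

    red = lambda p: (lambda x: p(x) and red_prime(x))
    # 'in' takes an individual (the preposition's object) and returns an adjective-type meaning.
    in_ = lambda y: (lambda p: (lambda x: p(x) and in_prime(y, x)))

    in_palo_alto = in_("PaloAlto")
    m1 = in_palo_alto(red(car))      # build "red car" first, then add "in Palo Alto"
    m2 = red(in_palo_alto(car))      # build "car in Palo Alto" first, then add "red"
    domain = ["a", "b", "c", "d"]
    print([x for x in domain if m1(x)])   # ['b']
    print([x for x in domain if m2(x)])   # ['b']: same meaning either way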
And precisely the reason why I get an ambiguity that's spurious there is that the meanings of the different modifiers I'm putting on are just turning into extra conjoined terms. So things that just turn into extra conjoined terms are referred to as intersective modifiers in linguistics, and that's exactly what I was doing here. I had the set of cars, I intersected it with the set of red things, and then I'm intersecting that with the set of in-Palo-Alto things, and I'm coming down to this intersection, and that gives me intersective semantics. And lots of the things that we think of as modifiers have intersective semantics. So "green leather couch": you have the set of couches, you have the set of green things, you have the set of leather things, you can intersect them, and you have the set of green leather couches. And providing we're in that world of intersective semantics, it's very easy to see how to do this with database queries, because effectively you can have a table for each one of these properties, and then you can just join the tables and find the individuals, the database IDs, which are in all three of these tables. So that's the easy case, and the case that we can actually practically work with most of the time.

It's important to note, though, that there actually are, and this is something that's talked about quite a bit in linguistics, non-intersective modifiers, where you get a meaning for the whole thing which you can't represent as an intersection of terms. The most famous example is words like "alleged." So if you say that someone is an alleged murderer, that doesn't mean that they satisfy the property of being a murderer and that they satisfy the property of being alleged. It means that they satisfy the property of being an alleged murderer, where alleged murderer is somehow a property meaning that you have to put together out of the meanings of the parts. And so somehow, if you want to be able to deal with "alleged murderer," you want to be able to deal with that, but it's kind of not so straightforward how to do that, because effectively you want to be working out a new functional form out of an old functional form. Here's the other example of this that I had, which isn't quite so convincing, but I think illustrates the same point. So this example here is "overpriced house in Palo Alto." And although it's not quite as dramatic as "alleged," I think it's reasonable to say that overpriced isn't a term that has absolute reference. So you can have one semantics where you take an overpriced house and it's in Palo Alto, and that's a kind of global scale of what's an overpriced house, and using those semantics, well, every house in Palo Alto qualifies as an overpriced house by global standards. But if you make the modification the other way, you then have the set of houses that are in Palo Alto, and then you're saying overpriced out of the houses that are in Palo Alto, and that seems to have a different semantics: houses that are overpriced even relative to the standards of Palo Alto. Okay, so you can get tricky things with non-intersective adjectives, but I'm not going to deal with that further today; I'm just going to stick with the intersective adjectives. Okay. So all's good until now, but unfortunately natural languages just get a bit more complex than what we've done so far. Okay.
So I don't know if this ever troubled any of you guys when you did a baby logic class, but when you think about how people translate English into a baby logic class, something fundamentally weird happens, right? If you want to translate "Kathy runs," you translate it into "runs(Kathy)." That's easy. But if someone then tells you to translate into predicate logic "no kid runs," well then, you have to know to do this really funny thing where you write down "not there exists x, kid x and run x." So here the noun phrase subject was an argument of the verb, just like it seems to be in the syntax of the sentence, "Kathy runs": Kathy is an argument of running. Syntactically this example doesn't seem any different. "No kid" is now the subject of "runs." Runs, a predicate, has taken a subject, that's its argument, but the translation is completely different. We've done this really funny thing where we've wrapped some weird logical structure around the outside of the whole thing, and "runs" is buried down here somewhere. So that looks kind of awkward, and so the question is how we can produce those kinds of translations in our system.

Well, here's another example of how things go wrong. So we have, "Nothing is better than a life of peace and prosperity. A cold egg-salad sandwich is better than nothing," and if you don't look too closely, it seems that you should be able to deduce from that that a cold egg-salad sandwich is better than a life of peace and prosperity. Why is that? Well, it seems like "is better than" should be a transitive relationship, right? If a Blackberry Pearl is better than a Motorola Razr, and an iPhone is better than a Blackberry Pearl, you should be able to conclude that an iPhone is better than a Motorola Razr. "Is better than" is a transitive relation. And so that's what I'm attempting to do here, I'm trying to do inference using the fact that "is better than" is a transitive relation. But with sentences like this, it just goes totally wrong, and well, why does it go wrong? The reason is, again, that "nothing" is a quantifier. It's just like this "no kid runs": as soon as you have things that are quantifiers, it just doesn't work if you're treating them as arguments to a relation, and you seem to have to do something special and different.

Okay, so somehow we need to be able to model in our system how we represent the meaning of these quantifiers and deal with them. And so, well, how can we do that? Well, we've done pretty well with things that are properties. We have "red car" and "in Palo Alto," and things like that. So we can build everything about noun phrases as complex properties, except when they have a determiner out the front of them like "the," or "every," or "no," and things like that. Once we see that, it seems like we have to do something very special. So what are we going to do for the meaning of "the red car in Palo Alto," or "every red car in Palo Alto"? Well, first of all, let me mention what we're going to do with "the." For "the," the easiest thing to do is to say, well, we want to map down to an individual. Just like we had Kathy and Fong as individuals, "the red car in Palo Alto," that should be an individual.
It should be this guy right here; that's the meaning of "the red car in Palo Alto." And so a possible semantics for "the," and this is actually precisely the semantics for "the" that was proposed by Bertrand Russell, is to say that the meaning of "the" can be represented as the iota function. So iota is the semantics of "the," which was back in the lexicon if you look back ten slides. The meaning of "the" is a partial function. If there's precisely one thing in the model for which p of x is true, then the function returns that thing. So if I ask for "the red car in Palo Alto," what's returned is the individual b. If there are multiple things for which a certain property is true, for example if I just say "the car," then since this is only a partial function, the return value is undefined. So if the function doesn't apply, it's got no return value. So that's our meaning of "the." It will return an individual if one is uniquely specified, and if one isn't uniquely specified, you don't get any value in return.

Now, that's kind of true. In practice, though, it's not quite that simple. Basically we use "the" when we're talking about things that we can identify. So we talk about "the president of Stanford University" because there's only one of them, and so it can be uniquely identified, but we don't say "the department chair of Stanford University" because that sort of feels wrong, because it doesn't identify an individual, and that's Bertrand Russell's core insight. In practice it's a little bit more complex than that, because people will interpret the meaning of "the" relative to some situationally defined context, so if you say something like, "Pass me the phone," you're not making a claim that there's only one telephone in the world. What you're claiming is that in this situationally defined context of the room that we're sitting in, there's one phone, and that's the one that you're meant to pass me.

Okay, so that gives us a semantics for "the" nevertheless, so now we could actually work out a semantics for a sentence like, "Kim likes the red car in Palo Alto," because the meaning of "the red car in Palo Alto" would just be this individual b, and then we can look across and say, okay, "likes b" returns this function, and if we evaluate it with respect to "Kim," we get "true" as the answer, and so we can evaluate that kind of sentence. Okay. And so we can then translate that kind of sentence into our SQL database form, so "red car in Palo Alto" is a property where we make a query of the database and return the set of objects that satisfy it, just like in a regular database query. So this will be "select cars dot obj," where for properties I'm just assuming it's a one-column table with a field called obj. So: select cars dot obj from cars, locations, red, so I've joined my tables, where cars dot obj equals locations dot obj, and locations dot place equals Palo Alto, and cars dot obj equals red dot obj.
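Here is a minimal sketch of that Russellian iota as a partial function, and of using its result as an ordinary individual in "Kim likes the red car in Palo Alto"; the toy model is just illustrative.

    # iota: if exactly one individual satisfies the property, return it;
    # otherwise the result is undefined (here, None).
    def iota(p, domain):
        matches = [x for x in domain if p(x)]
        return matches[0] if len(matches) == 1 else None

    domain = ["a", "b", "c", "d"]
    red_car_in_palo_alto = lambda x: x == "b"          # stands in for the composed property
    the_red_car = iota(red_car_in_palo_alto, domain)   # the individual b
    print(the_red_car)                                 # 'b'

    likes = lambda y: lambda x: (x, y) in {("Kim", "b")}
    print(likes(the_red_car)("Kim"))                   # True: "Kim likes the red car in Palo Alto"

    # With a property true of several things, e.g. just "car",
    # iota has no unique referent and the value is undefined.
    car = lambda x: x in {"a", "b"}
    print(iota(car, domain))                           # None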
Okay, so then if I want to do "the red car in Palo Alto," what I want to say is that the return set from that database query is a single row, and I can actually write that in my SQL by being a little bit clever and writing a having clause, and I can say, "Select all of the same stuff, having count equals one." And so this having clause will precisely implement my Bertrand Russell semantics of "the." If there's a single thing that satisfies it, I will get returned that one row of the database, and otherwise I don't get a return.

Okay. Well, that's good up to there. If I just say "the red car in Palo Alto," well, it turns into this individual, and we know how to evaluate individuals with respect to something like "likes," but what if I want to say "every red car in Palo Alto"? Well, then it seems like we have to fundamentally change things around, where with words like "every," these determiners, they have this quantificational force where they take other things as their arguments, and so that's the thing that we're going to represent. So what we're going to take as the type of a generalized determiner, a word like "every" and "no," is to say, "This is something that takes one property." So for "every car" it'll take the property of being a car. For "every red car in Palo Alto" it'll take the property of "red car in Palo Alto," and then what it's going to return is something that maps from properties to truth-values. Okay, and so I can have semantics for generalized determiners in terms of what I'll call generalized quantifiers that take a single argument. So for "some kid runs," you take the two properties as arguments, you can join them together, and then the "some" predicate says: is there at least one individual for which this property is true? And the "every" predicate says: is it the case that this is true for all individuals? Okay. Just what's expressed there.

Okay, so now I'm just about in business, in the following sense. Okay, here's how I want to do "every student likes the red car." So I have "car," I build up the bigger property of "red car," and then I combine it with the determiner to get "the red car," so this is this iota operator here. And when we evaluate the iota operator of "the red car," its value is just an individual. It's the individual b in our model over here. Okay, so because its value is an individual, I can apply the predicate "likes" to it, and then I get something that is a property of "liking the red car." In particular, the property that I get out is precisely this property here of "liking the red car." Okay. And so then what I want to say is that, "Every student likes the red car." And so, well, I'd really be in business at this point if I could do the following trick. Up until now I've treated the verb as the head of the sentence and said, "Do function application." That's how I handled "Kathy likes Fong." "Likes" took two arguments, "Kathy" and "Fong," and it works, but it seems like that's not going to work for me for "every student." But what if I just changed my mind and said, "Well, let's take 'every student' as the head in the semantics, and have it take 'likes the red car' as an argument"? How would things work out then? Well, "every" is something that takes two properties and then returns a truth-value, so it combines with one property here to give us "every student." And well, look, the meaning of the verb phrase is a property.
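Here is a minimal sketch of those generalized determiners, and of this trick of letting the subject noun phrase be the head and take the verb phrase property as its argument; the toy facts are just illustrative.

    # Generalized determiners take one property, then another, and return a Boolean.
    domain = {"Kathy", "Fong", "b", "c"}
    some = lambda p: lambda q: any(p(x) and q(x) for x in domain)
    every = lambda p: lambda q: all(q(x) for x in domain if p(x))

    student = lambda x: x in {"Kathy", "Fong"}
    likes_the_red_car = lambda x: x in {"Kathy", "Fong"}   # the VP property

    # "Every student likes the red car": the subject NP applies to the VP property.
    print(every(student)(likes_the_red_car))   # True
    print(some(student)(likes_the_red_car))    # True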
We've made precisely this property that I put a box around on the board, so "every student" could take that as an argument, and then we'd precisely get the "every" predicate evaluating those two properties. And so it would be saying, "Is this property true of every individual?" And at least for the partial representation I've shown over here, it is true of every individual, and so I'd get as my answer "yes." Okay, that looks really cool, apart from the one problem that I played this mucky trick: whereas before the verb phrase was the head of the sentence, I've now made the subject noun phrase the head of the sentence, and that might then get you worried as to, well, what happens after that. Like, what if I put a quantifier in object position? What if I said, "Every student likes every car"? Well then, I'm in worse trouble, because then if the meaning of "every car" is something that takes a property and returns a truth-value, it sort of seems like I can't combine that with "likes" at all. So I need to build some more technology to be able to deal with quantifiers.

So the person who initiated this enterprise was Richard Montague. Richard Montague was a philosopher in the late 60s and early 1970s. He actually committed suicide, which is a long story I will not digress into in too much detail at the moment. But the central idea of his work was to notice that modern logic had developed all of these rather nice tools but had completely abandoned the goal of actually trying to provide semantic representations for human languages, whereas the original goal of logic, as I mentioned before, was actually to help assess human arguments. The ancient Greeks actually believed that logic was a tool, so that when you listened to George Bush say something on television you could evaluate whether what he was saying was sensible or not. So Richard Montague had the idea that maybe, now that there are all of these good formal logical tools, they actually could be applied to natural language if we have the right technology for doing it. And so he introduced these kinds of techniques of using higher-order logics and lambda calculus.

And the idea that Montague came up with to solve the problem of how to incorporate quantifiers was to say, "Well, it just doesn't work to say that noun phrases refer to individuals, because we want to be able to say things like 'Every student likes every car,' or 'Every student likes some car,' or something like that. And these quantifiers we have to represent differently." So in his model all noun phrases got turned into these higher-order things. Any noun phrase meaning got turned into something that took a property and then returned a Boolean. And so the meaning of "Kathy" is then no longer an individual Kathy. The meaning of it is this thing that takes a property and returns that property evaluated with respect to Kathy. Slightly confusing, but it sort of makes sense, and once you have this, if you want the semantics of "Kathy runs," you've got the property of running things, you stick that in as p, and you're evaluating whether it's true of Kathy. So you've fundamentally changed around which thing is the function and which is the argument. A lot of modern work has seen that as rather ugly and unnecessary, and so has instead gone in this direction that essentially follows Montague but says you want to allow flexible shifting between types.
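Here is a minimal sketch of that Montague-style lifting of an individual into something that takes a property; the toy "run" property is just illustrative.

    # Lift an individual into a generalized-quantifier-style meaning:
    # something that takes a property and returns that property applied to the individual.
    lift = lambda ind: (lambda p: p(ind))

    run = lambda x: x in {"Kathy", "Fong"}
    kathy_lifted = lift("Kathy")
    print(kathy_lifted(run))    # True: "Kathy runs", with the NP meaning doing the applying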
So Kathy is just an individual, but if you want to, for semantic combination, you can certainly turn it into something that takes a property and evaluates whether it's true of that individual. We had properties like "man" and "car," and, well, it proved handy for "the" to have this iota operator, which could map from a property to an individual that satisfies it, as a partial function, which is what those dots are meant to show. And so noun phrases are commonly these days represented by allowing these kinds of flexible type-shifting operations.

Okay. Now at this point time is getting short, and I'm going to skip a little bit, but earlier on we talked about how there are quantifier scope ambiguities, and I will assure you that if you read the handout that goes with this class really carefully, it explains a beautiful way in which you can model quantifier scope ambiguities in terms of doing different amounts of type shifting, but I'm not going to go through it right now. But believe me, here are the two different representations for "some kid broke every toy." And if you look very carefully as I flip them back and forth, you see that in one the "some" quantifier is on the outside, and in the other the "every" quantifier is on the outside.

Okay, let me just for a couple of minutes show a couple more examples of how you can then start to do questions like this. Okay, so we want to be able to do yes/no questions, so we want to check on a database something like "Is Kathy running? What things are true of the world?" And perhaps more interestingly, we might want to do content questions like, "Who likes Kathy?" What the right semantics for questions is, is something people still write papers about in philosophy and linguistics. The model I've got here is that a question is just going to itself be a property. So a question is, as I say, a property like "red" and "in Palo Alto": the meaning of the question is just that property, and that's going to be interpreted as, what is the set of things for which that property is true, and that's the answer to the question. Okay. So this is how I then extend my grammar to deal with questions. And there's one tricky bit here, which is that I use this idea of gap threading for questions. The complex thing about questions, in English, is that they have these long-distance dependencies. So if you have something like, "Who did Bill report that John said Sue likes?" the "who" at the front is actually the object of the liking, and somehow you have to connect those two positions together. And there are various ways that it could be done, but a common way in some formal grammars, and the method that I use here, is this idea of gap threading, where you can introduce a variable as an empty node, and then shift it up through the syntax until you discharge it. So here I will have an empty introduced with the meaning of a variable z, and then I will allow rules to pass up that z until I have my whole sentence, and then I will discharge the z as being an argument of that predicate. I will show an example. And then here are my syntax and semantics for question words. For a question word like "what," it kind of doesn't have any meaning in this model; it just takes a property and returns that property. So if we have something like "what is running," we have the property of "is running," and "what" will take it as an argument and just return it; it does nothing to it.
But some of the other wh- words do more interesting things, so for "who is running," it takes the same property as an argument, but it makes a more complex property, which says the argument property has to be true, and the thing has to be human. And if I ask, "How many birds are running?" then I'm taking a property of being a bird, and then I'm taking a property of "is running," and what I'm returning is the count of the individuals for which that combined property is true. Okay. So if I take that and put it all together, I can actually now make quite interesting queries and translate them into something that can be evaluated over a database. And again, I'll probably go through this a little quickly, but you can peer closely at the examples and see that it really does work.

So here's an easy case, which I'll use to illustrate this gap threading idea: "What does Kathy like?" The idea is we actually have empties in our grammar now, so there's a missing object of "like," which is introduced with this variable z, an individual meaning, so this has a meaning of kind of "like z," but we don't actually know who z is yet. So here's "like z" as the meaning, then we get "Kathy likes z," which has the meaning of "like z, Kathy." And then what we can do is pass up the gap as these slash categories, so this is a verb phrase which has inside it an empty noun phrase whose meaning was z. And so now here we have a sentence which has inside it an empty noun phrase whose meaning was z. So at any point we want to, we can then discharge our gap-threaded empty and say, "Okay, this sentence's meaning is actually lambda z, likes z, Kathy." So we've turned back into a property the thing that wasn't actually present, that was missing. This is kind of like natural deduction in predicate calculus, where you assume something and then you later discharge the thing that you assumed. Okay, so now I have a property of "lambda z, likes z, Kathy," and for "What does Kathy like," "what" introduces no new semantics of its own, so the meaning of the whole thing is just "lambda z, likes z, Kathy," i.e. here's a property, and the set of rows that are returned from the database for that property is the answer to the question. So I get: select liked from likes, where likes dot liker equals Kathy. Cool.

Okay, but we can go on from that and do much more complex examples. "Who does Kathy like?" works essentially the same, apart from "who" having this more complex semantic form, so that the result is that you've got this property of "Kathy likes the thing and the thing is human." So now we have: select liked from likes, humans, where likes dot liker equals Kathy, and humans dot obj equals likes dot liked, so we're restricting it to being a human thing. Okay, but we can put in more complex question words like, "Which cars did Kathy like?" So now for "which cars," "which" is taking the property of being a car, and it's taking the property of "Kathy likes," and asking that both of them be true of the thing, which gives us this database translation. And at that point we can just keep on using all the stuff that I've tried to present today, and it really does all fit together and work right, so I can ask, "Which cars did every student like?" And so we have the same empty z. We can make a meaning for "every student likes." So now we have this property, the set of z such that every student likes z.
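Here is a minimal sketch of treating a question as a property and reading the answer off a tiny model, including the "who" restriction to humans and the "how many" count mentioned above; the facts are all just illustrative.

    likes = lambda y: lambda x: (x, y) in {("Kathy", "a"), ("Kathy", "b"), ("Kathy", "Fong")}
    domain = ["a", "b", "c", "Fong", "Kathy"]
    human = lambda x: x in {"Kathy", "Fong"}

    # "What does Kathy like?": after discharging the gap, the question is a property.
    what_q = lambda z: likes(z)("Kathy")
    print([z for z in domain if what_q(z)])          # ['a', 'b', 'Fong']

    # "Who does Kathy like?": the same property, plus a humanness restriction.
    who_q = lambda z: likes(z)("Kathy") and human(z)
    print([z for z in domain if who_q(z)])           # ['Fong']

    # "How many ... does Kathy like?": return a count instead of a set.
    car = lambda z: z in {"a", "b"}
    how_many = lambda p: lambda q: sum(1 for z in domain if p(z) and q(z))
    print(how_many(car)(what_q))                     # 2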
And then we can combine that with something like "which cars," and make a much more complex meaning. And we can keep on going through examples like this. "How many red cars in Palo Alto does Kathy like?" Or, "Did Kathy see the red car in Palo Alto?" So at this point I'm starting to be in the space where the trees that you have to build are too large to fit in my narrow margin, so you'll just have to believe me that the proofs, or rather the derivations, work out okay, but they do. So this is, "How many red cars in Palo Alto does Kathy like?" So we make up the "does Kathy like" part here. And then for the "how many red cars in Palo Alto," we make this property of red cars in Palo Alto just like before, and then "how many" takes first this property, and then it takes the "Kathy likes" property, and so it's returning the count of how many things satisfy this big composite property, and then we can translate that into our piece of SQL where we're doing a count operation on the things that are satisfied. Now, [inaudible].

So I thought, just for the last few minutes, I'd mention a little bit about some work that's been happening recently that could actually connect these things together. I think it's fair to say that until about 2005 I guess, 2004, 2005, there was really a complete disconnect between work that was using statistical, probabilistic, machine learning methods in NLP, and logically oriented, deeper meaning representation work in NLP. The latter was something that there'd been a lot of work on in the 70s and 80s, and there were still occasional little bits of work going on in the 1990s, but effectively the two didn't have anything to do with each other. By and large, the people doing probabilistic machine learning stuff were doing more surface work, so they were doing things like building part-of-speech [inaudible] recognizers and statistical parsers, all the things that you guys have been doing, but really were doing nothing significant in the way of full sentence meaning representations. They were doing some meaning tasks, but low-level meaning tasks like named entity recognition, or extracting particular semantic relations. But just in the last couple of years, there have started to be some nice threads of work where people are actually starting to return to how you can learn and use much deeper semantics while still using probabilistic and machine learning ideas, and I thought that I'd just mention a couple of things of this sort.

So one question is, "Well, could we possibly learn representations of this sort?" And that's something that a couple of groups have been working on recently. So Luke Zettlemoyer and Michael Collins at MIT have done a couple of papers on that, and then also Ray Mooney and students at Texas have been doing this sort of work. And I can't really go through the details here, but this is, in one very short slide, the outline of the Zettlemoyer and Collins approach. So they assume that they have some stuff at the beginning. They assume that they have an initial lexicon. The initial lexicon is mainly knowledge about some function words, so for things like prepositions you have a meaning for what the preposition means. They also, though, assume that they know a few specific words, the domain-specific words, so they might assume that they know the meaning of "Texas" for example.
They then start off with category templates, and category templates are effectively possible semantic forms that words could have. And so that's kind of like what I was saying, that there are these dominant patterns where we have properties and two-argument things and things like that. They have a set of category templates. So they start off with a little bit of stuff, not nothing, but primarily, beyond that, what they assume they have is pairs of sentences and meaning representations. That may seem like a fair bit, and in some sense it is a fair bit, but a prominent belief about how human language acquisition works is that humans don't actually acquire stuff about sentences until they understand what they mean, right? Basically people are working out the meaning of things from situational inference, so that after the first ten days when mommy is holding the orange juice and saying, "Do you want juice?" you start to understand what that means, and then you map between the words and the semantics.

So you have a set of examples like this, and the goal of their papers is to do combined learning of both the lexicon and syntactic structure. So although you have a kind of a bootstrap for the lexicon, for most words you don't know what they mean, and you have no information whatsoever about syntactic structure and semantic combination rules. So this isn't like the Penn Treebank where you're given all of the trees; you've been given no trees whatsoever, and no basis on which to put together compositional semantics. And so what they do is build it as a kind of iterative bootstrapping process where you're postulating possible lexical forms, which is where you use these category templates, and you're trying to use them in a parser, and the parser is a maxent parsing model. And effectively you're then learning lexical representations for words that seem to work, in the sense that they're able to combine together with words that are in your initial lexicon and produce the final semantics for sentences that you'd like to see, and effectively through each iteration of the bootstrap, what you hope to do is figure out a successful semantic representation for some more words, and then keep building up your lexicon so it grows over time. But it is a kind of a cool joint learning task, because at the start there's absolutely no parsing model whatsoever, so they're simultaneously trying to postulate possible lexical entries and optimize the parameters of a maxent parser to try to parse things in a particular way as it goes along.

There's then also been work on reasoning with such representations. Now, in particular domains, one possibility is just to go with logical reasoning. And I think that can actually be quite successful in certain domains of business rules, laws, and things like that, where you can do logical reasoning. But I think in general it is the case, just as the argument for probabilistic methods in general goes, that knowledge of the world is so incomplete and uncertain that we just can't get very far with strict logical inference. In particular, a lot of the time things only go through given various plausible assumptions. All the time we're assuming that sort of plausible things will happen in the world, and that allows our inferences to go through where they're not actually guaranteed to hold.
Most of the time we just assume that a really big earthquake won't happen in the next ten seconds, and so therefore it's reasonable to make assumptions about what the world will be like in ten seconds' time. So that then leads us into the work on probabilistic models of knowledge representation and reasoning. And that's also been an area that's been actively pursued. It is the case that all the current work on probabilistic inference still assumes something more restricted than first-order logic. First-order logic is in general very difficult to deal with because you have this open domain of individuals; the problem [inaudible] in first-order logic is that you have an infinite space of individuals and you can quantify arbitrarily over them, whereas all of the existing models for probabilistic knowledge representation and inference in some way use a somewhat more restricted model, where you can still do certain kinds of quantification, but it's not quite as open [inaudible] as in first-order logic. And so one thread of that is definitely Daphne Koller's work here on probabilistic relational models. Another thread of work that's been going on recently is that Pedro Domingos and colleagues have been doing Markov logic, and they differ in various ways. Part of how they differ is that Daphne's model was a Bayes net, a directed graphical model, whereas this is an undirected Markov network style model. Actually, just in the last week, someone who's been doing undergrad honors with me has been trying to apply some of the Markov logic ideas to doing mappings of sentences into Markov logic and then doing inference on that, with at least reasonable success on a range of sentences, though it's not a full-coverage system. So I think that this is an area where it's now possible, and there are starting to be a lot of exciting ideas, where you can start to combine machine learning and statistical methods with using these richer representations and then doing richer semantic inference. Okay. And that's the end for today.