NLP - Lecture 2 - Elements of Linguistics
Linguistics
When we study linguistics, we do not consider only one language, because there are obviously many languages, distributed across the regions of the world.
Linguistics is the scientific study of language, and in particular the relationship between language form and language meaning.
Besides form and meaning, another important subject of study for linguistics is how language is used in context.


Phonetics
The human vocal tract can produce a wide range of sounds. But only certain sounds are selected as significant for communication. To identify and describe those sounds, we focus on each individual sound segment within a stream of speech.
The general study of the characteristics of speech sounds is called phonetics.
- Articulatory phonetics: how speech sounds are made or articulated;
- Acoustic phonetics: physical properties of speech as sound waves;
- Auditory phonetics: perception, via the ear, of speech sounds.
We exploit an already established framework for the study of speech segments known as IPA (International Phonetic Alphabet).
Consonants
When we describe the articulation of a consonant, the focus is on three features:
- the voice/voiceless distinction;
- the place of articulation;
- the manner of articulation.
Voiced and Voiceless Sounds - How to make a consonant sound
Air is pushed out by the lungs up through the trachea to the larynx. Inside the larynx, the vocal cords take two basic positions:
- Vocal cords spread apart: there is no obstruction for the air passing through, so these are voiceless sounds;
- Vocal cords drawn together: the air repeatedly pushes them apart as it passes through, creating a vibration, so these produce voiced sounds.
To feel the distinction, try to place a fingertip gently on top of your Adam’s apple and produce:
- Z-Z-Z-Z or V-V-V-V voiced sounds -> vibration
- S-S-S-S or F-F-F-F voiceless sounds -> no vibration
Place of Articulation
After the larynx, the air enters the vocal tract via the pharynx. It is then pushed out through the mouth and/or the nose. Most consonant sounds are produced by using the tongue and other parts of the mouth.
The terms used to describe many sounds denote the place of articulation of the sound, i.e., the location inside the mouth at which the constriction takes place.
To describe the place of articulation of most consonant sounds, we can start at the front of the mouth and work back.
- We consider the voiced-voiceless distinction and use the symbols of the IPA for specific sounds.
- The symbols are enclosed within [ ].

Familiar symbols
- Bilabials (produced with both lips): [p] is the voiceless consonant in pop, while [b] in Bob, [m] in mom and [w] in wet are voiced.
- [f] and [v] are labiodentals, i.e., they are produced with the upper front teeth and the lower lip, as at the beginning of fat and vat. In five, the voiceless [f] is at the beginning and the voiced [v] is at the end.
- Alveolars (produced with the front of the tongue raised to the alveolar ridge) include [t] in tot, [d] in dad, [n] in nun and so on.
There are some unfamiliar symbols.
- For example, [θ] (called "theta") refers to the voiceless version, as in thin, three, wrath and so on.
- We use [ð], called "eth", for the voiced version, as in thus, then and so on. These sounds are called dentals because the teeth are involved.
Transcribing sounds
Written English is a poor guide to pronunciation.
Bang and tongue both end with [ŋ] (called "angma"): there is no [g] sound, despite the spelling.
Some single sounds are represented in spelling by two letters: in ship, sh is pronounced as the single sound [ʃ].
Similar sounds can have very different spellings: for example, photo and enough both contain the sound [f].
Some words contain letters that are not pronounced at all: for example, write and right are both pronounced [rait].
Some letters suggest one sound but are pronounced as another, as in face vs phase and race vs raise ("ce" is pronounced [s] and "se" is pronounced [z]).
Manner of articulation
With respect to the place of articulation, [t] and [s] are similar (voiceless alveolars).
However, they are different sounds, since they differ in their manner of articulation, i.e., in how the airflow is shaped as they are pronounced:
- [t] is a stop consonant:
- it is produced by blocking the airflow very briefly, then letting it go abruptly.
- [s] is a fricative consonant:
- it is produced by almost blocking the airflow, then letting the air escape through a narrow gap, creating friction.
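The three features used above (voicing, place and manner of articulation) can be sketched as a small lookup table. This is a minimal Python sketch: the table `CONSONANTS` and the helpers `describe` and `differences` are hypothetical names, and only consonants discussed in these notes are included.

```python
# Hypothetical mini consonant chart: IPA symbol -> (voicing, place, manner).
CONSONANTS = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "f": ("voiceless", "labiodental", "fricative"),
    "v": ("voiced", "labiodental", "fricative"),
    "θ": ("voiceless", "dental", "fricative"),
    "ð": ("voiced", "dental", "fricative"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
}

FEATURE_NAMES = ("voicing", "place", "manner")


def describe(symbol):
    """Return a description such as 'voiceless alveolar stop'."""
    return " ".join(CONSONANTS[symbol])


def differences(a, b):
    """List the feature names on which two consonants differ."""
    return [
        name
        for name, x, y in zip(FEATURE_NAMES, CONSONANTS[a], CONSONANTS[b])
        if x != y
    ]


print(describe("t"))          # voiceless alveolar stop
print(differences("t", "s"))  # ['manner']
print(differences("s", "z"))  # ['voicing']
```

Pairs such as [s]/[z] or [f]/[v] differ only in voicing, which is what the fingertip test on the Adam's apple detects.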
Vowels
Vowel sounds are produced with a relatively free flow of air and are typically voiced.
They are described by the position of the tongue in the mouth: front or back, high or low.
For example:
- heat and hit contain "high, front" vowels, because the sound is made with the front part of the tongue in a raised position;
- hot contains a "low, back" vowel, because the sound is made with the back part of the tongue in a lowered position.

Diphthongs
A diphthong combines two vowel sounds.
For example, in hi and bye we move from a low vowel towards a high front one.
Alternatively, we can move from low to high back, combining [a] and [u] to produce [au].
The vowels [e], [a] and [o] are used as single sounds in other languages and by speakers of different varieties of English; in American English, they appear as the first sounds of diphthongs. The pronunciation of some diphthongs in Southern British English is different from North American English.

Phonology
Phonology studies the systems and patterns of speech sounds in a language, focusing on their underlying structure rather than their physical variation.
It ignores individual differences in pronunciation (e.g., due to vocal tract shape, tiredness, or a cold) and instead looks at the abstract blueprint of sounds and how they function and interact within a language.
For example, when we say that the [t] sounds in the words tar, star, writer, butter and eighth are "the same", we mean that they would be represented in the same way in the phonology of English.
In actual speech, these [t] sounds are all potentially very different from each other, because they can be pronounced in very different ways depending on the sounds around them.
However, all these differences in articulation are less important to us than the distinction between the [t] sounds in general and the [k], [f] or [b] sounds, because using one rather than another has meaningful consequences.
These must be distinct meaningful sounds, regardless of which individual vocal tract is used to pronounce them, because the words tar, car, far and bar are meaningfully distinct.
Phonemes
Each meaning-distinguishing sound in a language is described as a phoneme.
The phoneme /t/ is described as a sound type, of which all the different spoken versions of [t] are tokens.
N.B.
The slash marks conventionally denote a phoneme, /t/, i.e., an abstract segment, whereas the square brackets, [t], denote each physically produced segment, i.e., an actual realization of the sound.
A phoneme functions contrastively: two sounds are distinct phonemes if replacing one with the other changes the meaning of the word.
- For example, /f/ and /v/ are two phonemes because fat and vat, or fine and vine, are distinct words.
- In general, if we change one sound in a word and the meaning changes, the two sounds are distinct phonemes.
The descriptive terms we use to talk about sounds can be considered features that distinguish each phoneme from the next.
If the feature is present, we mark it with a (+) sign and if it is not present, we use a (-) sign.
Natural Classes
- /p/ is [-voice, +bilabial, +stop]
- /k/ is [-voice, +velar, +stop]
- /p/ and /k/ share some features ([-voice, +stop]), so they are members of a natural class of phonemes (voiceless stops)
- Members of a natural class tend to behave phonologically in similar ways
- /v/ is [+voice, +labiodental, +fricative], so it does not belong to the same class as /p/ and /k/
That's why words beginning with /pl-/ and /kl-/ are common in English, but words beginning with /vl-/ or /nl-/ are not.
In this way, we describe not only individual phonemes but also the possible sequences of phonemes in a language.
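The (+)/(-) notation maps naturally onto boolean feature bundles, and a natural class is then just the set of phonemes that agree on some feature values. A minimal Python sketch, assuming a hypothetical table `PHONEMES` with only a few binary features (True for "+", False for "-"); a real feature system uses many more.

```python
# Hypothetical binary feature bundles (True = "+", False = "-").
PHONEMES = {
    "p": {"voice": False, "bilabial": True, "velar": False,
          "labiodental": False, "stop": True, "fricative": False},
    "k": {"voice": False, "bilabial": False, "velar": True,
          "labiodental": False, "stop": True, "fricative": False},
    "v": {"voice": True, "bilabial": False, "velar": False,
          "labiodental": True, "stop": False, "fricative": True},
}


def shared_features(a, b):
    """Return the feature values that phonemes a and b have in common."""
    fa, fb = PHONEMES[a], PHONEMES[b]
    return {name: value for name, value in fa.items() if fb[name] == value}


def same_class(phonemes, features):
    """True if every phoneme carries all the given feature values."""
    return all(
        all(PHONEMES[p][name] == value for name, value in features.items())
        for p in phonemes
    )


voiceless_stop = {"voice": False, "stop": True}
print(shared_features("p", "k"))
print(same_class(["p", "k"], voiceless_stop))       # True
print(same_class(["p", "k", "v"], voiceless_stop))  # False
```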
Distinctive features of some English phonemes:

Allophones
A phoneme is the abstract unit or sound type, and there are many different versions of that sound type produced in actual speech.
The different versions of the same phoneme are called its allophones.

NB: the difference between phoneme and allophones is that substituting one phoneme for another will result in a different meaning (and pronunciation). But substituting allophones only results in a different (and perhaps unusual) pronunciation of the same word.
Minimal pair and sets
When two words, e.g., fan and van are identical in form except for a contrast in one phoneme occurring in the same position, the two words are described as a minimal pair.
When a group of words can be differentiated, each one from the others, by changing one phoneme (always in the same position in the word), they are described as a minimal set.
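The two definitions can be checked directly once words are written as sequences of phonemes. A minimal Python sketch, assuming hypothetical broad transcriptions stored as tuples (one phoneme per element); the helper names are illustrative.

```python
from itertools import combinations


def is_minimal_pair(a, b):
    """Same length and exactly one position with a different phoneme."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def is_minimal_set(words):
    """Every pair of words in the group is a minimal pair."""
    return all(is_minimal_pair(a, b) for a, b in combinations(words, 2))


fan, van, fat = ("f", "æ", "n"), ("v", "æ", "n"), ("f", "æ", "t")
big, pig, fig, wig = ("b", "ɪ", "g"), ("p", "ɪ", "g"), ("f", "ɪ", "g"), ("w", "ɪ", "g")

print(is_minimal_pair(fan, van))            # True
print(is_minimal_set([big, pig, fig, wig])) # True
print(is_minimal_set([fan, van, fat]))      # False: van and fat differ twice
```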

Morphology
Morphology deals with the identification of the elementary units of meaning in words.
In many languages, what appear to be single word forms actually turn out to contain many "word-like" elements.
Example: In Swahili (East-Africa), nitakupenda is something like I will love you.
This kind of investigation on the structure of words is the subject of morphology.
Morpheme
Just as phonology has the phoneme, morphology has the morpheme: a basic element that we can find in words.
Example:
- talks, talker, talked and talking consist of one element, talk, plus four other elements, -s, -er, -ed and -ing. All five elements are morphemes (talk, -s, -er, -ed, -ing).
A morpheme is a minimal unit of meaning or grammatical function. Units of grammatical function include the forms used to indicate past tense or plural, for example the -ed in waited or the -s in cats.

Free and Bound Morphemes
Two type of morphemes:
- free morphemes: can stand by themselves as single words, e.g., new, tour
- bound morphemes: cannot stand alone and are attached to another form, e.g., re-, -ist, -ed, -s (known as affixes)
All affixes (prefixes and suffixes) in English, are bound morphemes.
Free morphemes can generally be identified as a set of separate English word forms such as nouns, verbs, adjectives and adverbs.
When free morphemes are used with bound morphemes attached, the basic word forms are known as stems (e.g., dress in undressed, care in carelessness).

Categories of Free Morphemes: Lexical and Functional Morphemes
Lexical morphemes: the set of ordinary nouns (girl, house), verbs (break, sit), adjectives (long, sad) and adverbs (never, quickly).
These are the words that carry the content of the message we convey. We can easily add new lexical morphemes to the language, so they are described as an open class of words.
Functional morphemes: articles (a, the), conjunctions (and, because), prepositions (on, near) and pronouns (it, me).
We very rarely add new functional morphemes to the language, so they are described as a closed class of words.
Categories of Bound Morphemes: Derivational morphemes and Inflectional Morphemes
The set of affixes making up the bound morpheme class is divided into derivational and inflectional morphemes.
Derivational morphemes are bound forms used to make new words, or to make words of a different grammatical category from the stem.
Adding the derivational morpheme -ment changes the verb encourage into the noun encouragement.
- The noun class can become the verb classify by adding the derivational morpheme -ify.
- Derivational morphemes can also be prefixes, for instance re-, pre-, ex-, mis-, co-, un-.
Inflectional morphemes indicate the grammatical function of a word: they show whether a word is plural or singular, whether it is in the past tense or not, and whether it is a comparative or possessive form.
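English has only eight inflectional morphemes, all suffixes:
- nouns: -'s (possessive), -s (plural)
- verbs: -s (3rd person singular present), -ing (present participle), -ed (past tense), -en (past participle)
- adjectives: -er (comparative), -est (superlative)
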
Morphological Description
An inflectional morpheme never changes the grammatical category of a word
- Old and older are both adjectives (-er simply creates a different version of the adjective)
A derivational morpheme can change the grammatical category of a word
- Teach (verb) becomes Teacher (noun) if we add the derivational morpheme –er
- The suffix –er in Modern English can be an inflectional morpheme (as part of an adjective) and also a distinct derivational morpheme (as part of a noun)
If derivational and inflectional suffixes are used together, they always appear in that order. Example: First derivational (-er) is attached to teach, then the inflectional (-s) is added to produce teachers.
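The ordering constraint can be checked mechanically once each morpheme has been tagged by hand. A minimal Python sketch under that assumption: the tags and the helper names (`RANK`, `suffixes_well_ordered`, `surface`) are hypothetical, prefixes are ignored, and the tagging must be explicit because the same form -er can be derivational or inflectional.

```python
# Order in which morpheme types may appear (prefixes are not modelled).
RANK = {"stem": 0, "derivational": 1, "inflectional": 2}


def suffixes_well_ordered(morphemes):
    """True if the single stem comes first and derivational suffixes precede inflectional ones."""
    ranks = [RANK[kind] for _, kind in morphemes]
    return ranks == sorted(ranks) and ranks.count(0) == 1


def surface(morphemes):
    """Glue the morphemes back into a written word (ignoring spelling changes)."""
    return "".join(form.strip("-") for form, _ in morphemes)


teachers = [("teach", "stem"), ("-er", "derivational"), ("-s", "inflectional")]
wrong = [("teach", "stem"), ("-s", "inflectional"), ("-er", "derivational")]
older = [("old", "stem"), ("-er", "inflectional")]  # same form -er, different type

print(surface(teachers), suffixes_well_ordered(teachers))  # teachers True
print(surface(wrong), suffixes_well_ordered(wrong))        # teachser False
print(surface(older), suffixes_well_ordered(older))        # older True
```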

Grammar
With phonetics/phonology and morphology, we have covered two levels of description in the study of language.
We can describe linguistic expressions as sequences of sounds that can be represented in the phonetic alphabet and described in terms of their features.
- We can identify a voiced fricative, a voiceless stop and a diphthong as segments in the transcription of a phrase such as the lucky boys [ðə lʌki bɔɪz].
- We can take the same expression and describe it as a sequence of morphemes: the + luck + -y + boy + -s.
Hence we could characterize all the words and phrases of a language in terms of phonology and morphology.
The words in the lucky boys, however, can only be combined in a particular sequence. For example, the following two phrases are not well-formed:
- boys the lucky
- lucky boys the
We need to follow strict rules for combining words into phrases: the article must go before the adjective, which must go before the noun. Therefore the grammatical rule is: article + adjective + noun.
The process of describing the structure of phrases and sentences in such a way that we account for all the grammatical sequences in a language (and rule out the ungrammatical ones) defines the grammar of that language.
Traditional Grammar
The terms article, adjective and noun for grammatical categories come from traditional grammar, which originated in the description of languages such as Latin and Greek and was then used as a model for the grammars of other languages.
Several terms inherited from that model are used to describe the basic grammatical components, the "parts of speech", and how they connect to each other in terms of "agreement".
Parts of speech and agreement

Describing the Structure of a Language
However, a prescriptive approach, which takes the rules of Latin grammar as the model for how a language should be, is not suited to languages whose structure differs from Latin.
An alternative is the descriptive approach: linguists collect samples of the target language in order to describe the regular structures of that language as it is actually used.
There are two types of descriptive approaches:
- Structural analysis: investigates the distribution of forms in language, adopts “test-frames” i.e., could be sentences with empty slots
- Constituent analysis: it is designed to show how small constituents (or components) go together to form larger constituents.
Structural Anaylsis
Some fillers are not single nouns but longer expressions that act like nouns (e.g., "the professor with the northern accent"). Forms like "it", "the big dog" or "an old car" cannot fill the first test-frame, because the result doesn't make sense. So the frame must be modified slightly, as in the purple test-frame, which is well suited to them.
These are called noun phrase (NP): they can be just a pronoun or a whole descriptive phrase that functions as a noun.
Constituent Analysis
One basic step is determining how words go together to form phrases.
Example:
- The old woman brought a large snake from Brazil (9 constituents)
- The phrase-like constituents are combinations of the following types: "The old woman", "a large snake" and "Brazil" (noun phrases), "from Brazil" (a prepositional phrase) and "brought" (a verb)
The analysis of the constituent structure of the sentence can be represented as:
This way, we can determine the types of forms that can be substituted for each other at different levels of constituent structure.

Subjects and Objects
The term noun phrase is used to describe the form of the expression (i.e., it has a noun or pronoun in it).
From the previous analysis, we can also understand the different grammatical functions of constituent phrases, i.e., what each phrase does in a sentence.
A noun phrase can function as a subject or an object. In English, word order shows these functions:
- Subject: the first noun phrase, before the verb;
- Object: the noun phrase after the verb.
The other phrase at the end of our example sentence is an adjunct (often a prepositional phrase)

Word Order
The basic linear order of constituents in English is Noun Phrase-Verb-Noun Phrase, which we can represent as (NP V NP), and their typical grammatical functions are Subject-Verb-Object (SVO).
But this is not true for every language, as shown in this table:

Syntax
We humans tend to interpret sentences according to an expected structure. When we arrive at an implausible interpretation, we go back and try a different structure. Recognizing the underlying structure of a sentence is how we make sense of it.
“Time flies like an arrow; Fruit flies like a banana”
When we concentrate on the structure and ordering of components within a sentence, we’re studying the syntax of a language.
The goal is to have a small and finite set of rules capable of producing a large and potentially infinite number of well-formed structures (generative grammar)
Syntactic Analysis
In syntactic analysis, some conventional abbreviations for the part of speech and for phrases are used:
- POS: N (= noun), Art (= article), Adj (= adjective), V (= verb), PN (= proper noun), Pro (= pronoun)
- Phrases: NP (= noun phrase), VP (= verb phrase), PP (= prepositional phrase)
A verb phrase (VP) consists of the verb (V) plus the following noun phrase (NP).
Turning to a more dynamic format, we can represent the concept "consists of" with an arrow (→), read as "rewrites as".
A set of rules to create English noun phrases could be:
- NP → Art (Adj) N
- NP → Pro
- NP → PN
Or, in a more compact way: NP → {Art (Adj) N, Pro, PN}, where parentheses mark an optional constituent and braces mark a choice of exactly one alternative.
The structure of a phrase of a specific type will consist of one or more constituents in a particular order.
Let's consider a set of simple (yet incomplete) phrase structure rules:
- S → NP VP: a sentence (S) rewrites as a noun phrase (NP) and a verb phrase (VP)
- NP → {Art (Adj) N, Pro, PN}: a noun phrase rewrites as an article with an optional adjective and a noun, or as a pronoun, or as a proper noun
- VP → V NP: a verb phrase rewrites as a verb plus a noun phrase

Lexical Rules
Phrase structure rules generate structures; to turn these structures into recognizable English, we need lexical rules.
Lexical rules specify which words can be used when we rewrite constituents such as PN.
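Phrase structure rules plus lexical rules are enough to build a tiny sentence generator. A minimal Python sketch, assuming the rules S → NP VP, NP → {Art (Adj) N, Pro, PN} and VP → V NP, with the optional (Adj) expanded into two alternatives; the word lists in `LEXICON` are hypothetical examples, not the lexical rules from the slides.

```python
import itertools

# Phrase structure rules; "Art (Adj) N" is split into two alternatives.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Art", "N"], ["Art", "Adj", "N"], ["Pro"], ["PN"]],
    "VP": [["V", "NP"]],
}

# Hypothetical lexical rules: which words can rewrite each category.
LEXICON = {
    "Art": ["the", "a"],
    "Adj": ["lucky"],
    "N": ["boy", "dog"],
    "Pro": ["it"],
    "PN": ["Mary"],
    "V": ["saw"],
}


def expand(symbol):
    """Yield every word sequence (as a list) that `symbol` can rewrite to."""
    if symbol in LEXICON:
        for word in LEXICON[symbol]:
            yield [word]
        return
    for rhs in RULES[symbol]:
        # Expand each constituent, then combine the alternatives in order.
        for parts in itertools.product(*(list(expand(s)) for s in rhs)):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))  # 100
print(sentences[0])    # the boy saw the boy
```

Because these rules are not recursive, the output is finite; adding a recursive rule (for example NP → NP PP with PP → P NP) would make the set of sentences infinite, and this naive generator would then need a depth limit.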


Tree Diagrams
A tree diagram is a way to create a visual representation of underlying syntactic structure.
It shows that there are different levels in the analysis:
- A level at which a constituent (e.g., NP) is represented;
- A different lower level at which a constituent such as N is represented;


The problem is that there could be syntactic ambiguity:

- In the first case, the PP is attached to the NP (modifier-of-object reading): the strawberries are the ones that have chocolate on/with them.
- In the second case, the PP is attached to the VP (instrument reading): Alice uses chocolate when eating them, e.g., she dips them in it.
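The two attachments can be counted with a chart parser. A minimal Python sketch using the CKY algorithm (a standard parsing technique, not one introduced in these notes) over a toy grammar in binary form; the grammar, the lexicon and the sentence "Alice ate strawberries with chocolate" are assumptions chosen to mirror the two readings, not the slide's original example.

```python
from collections import defaultdict

# Toy binary grammar: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attached to the verb phrase: instrument reading
    ("NP", "NP", "PP"),  # PP attached to the noun phrase: modifier-of-object reading
    ("PP", "P", "NP"),
]
LEXICAL = {"Alice": ["NP"], "ate": ["V"], "strawberries": ["NP"],
           "with": ["P"], "chocolate": ["NP"]}


def count_parses(words, start="S"):
    """Count the distinct trees rooted in `start` that cover all the words."""
    n = len(words)
    # chart[i][j][label] = number of trees with root `label` over words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label in LEXICAL[word]:
            chart[i][i + 1][label] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


print(count_parses("Alice ate strawberries with chocolate".split()))  # 2
print(count_parses("Alice ate strawberries".split()))                 # 1
```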
Still on syntactic ambiguity: in "Time flies like an arrow; fruit flies like a banana", the different structures depend on lexical ambiguity (a single word has multiple possible parts of speech or senses):
- flies is a verb in the first part and a noun in the second part;
- like is a preposition in the first part and a verb in the second part.

Semantics
Semantics is the study of meaning in language.
We can distinguish between:
- Referential meaning: Components of meaning that are conveyed by the literal use of a word. Think of dictionary definitions of words.
- Associative or emotive meaning: Feelings or reaction to words that may be found among some individuals or groups but not others.
Example: the word needle:
- some basic components might include “thin, sharp, steel instrument”.
- however, different people might have different associations attached to the word needle “pain”, “illness”, “blood”, “drugs”, “thread”, “knitting”, “hard to find”.
Meaning
Referential meaning may help to account for the “oddness” when one reads some sentences.
- The hamburger ate the boy
- The table listens to the radio
- The horse is reading the newspaper
All the sentences are syntactically good, but semantically odd. The kind of noun used with ate must denote a living or “animate” entity that is capable of eating.
Semantic Features
The meaning of a word can be analyzed in terms of its semantic features.
- For example, boy has the features [+animate] and [+human], while horse has [+animate] and [-human]
We can characterize which semantic feature is required in a noun to appear as the subject of a particular verb.
That is, predicting which nouns (boy, hamburger, horse) would fit in a sentence appropriately and which would be odd.
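A minimal Python sketch of this prediction, assuming each noun is a bundle of binary semantic features and each verb lists the feature values it requires of its subject. The feature values and the verb requirements (`NOUN_FEATURES`, `SUBJECT_REQUIREMENTS`) are hypothetical and deliberately coarse.

```python
# Hypothetical semantic features (True = "+", False = "-").
NOUN_FEATURES = {
    "boy": {"animate": True, "human": True},
    "horse": {"animate": True, "human": False},
    "hamburger": {"animate": False, "human": False},
    "table": {"animate": False, "human": False},
}

# Hypothetical requirements that each verb places on its subject.
SUBJECT_REQUIREMENTS = {
    "ate": {"animate": True},
    "listens to": {"animate": True},
    "is reading": {"human": True},
}


def subject_fits(noun, verb):
    """True if the noun has every feature value the verb requires of its subject."""
    features = NOUN_FEATURES[noun]
    required = SUBJECT_REQUIREMENTS[verb]
    return all(features.get(name) == value for name, value in required.items())


for noun, verb in [("boy", "ate"), ("hamburger", "ate"),
                   ("table", "listens to"), ("horse", "is reading")]:
    status = "fine" if subject_fits(noun, verb) else "semantically odd"
    print(f"The {noun} {verb} ...: {status}")
```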

Componential Analysis
Features such as [+human] or [+adult] can be treated as basic elements or components of meaning in the approach called componential analysis.

This approach is not without problems:
- For many words it may not be easy to determine the components of meaning. Think of nouns like advice, threat and warning.
- This is because the approach treats words as a sort of container that carries meaning components.
Semantic Roles
Instead of viewing words as containers of meaning, we can look at the roles they play within the situation described by a sentence.
- Let’s consider a simple event as in “The boy kicked the ball”
- The verb describes an action (kick)
- The noun phrases describe the roles of entities (people and things) involved in the action.
- We can identify a small number of semantic roles for these noun phrases.
- The NP the boy is the entity that performs the action (agent)
- The NP the ball has another role: it is the entity that is involved in or affected by the action (theme)
- The theme can also be an entity that is simply being described (e.g., The ball was red)
Agents can also be non-human (e.g., the wind blew the ball away). The theme is typically non-human, but can be human (e.g., the dog chased the boy).
If an agent uses another entity to perform an action, the latter fills the role of instrument:
- the boy cut the rope with an old razor.
- he drew the picture with a crayon (in English, the preposition with is a clue for identifying instruments).
When a noun phrase is used to designate an entity as the person who has a feeling, perception, or state, it plays the role of experiencer. If we feel, know, hear, or enjoy something, we’re not performing an action (hence we are not agents)
- The woman feels sad
- Did you hear that noise?
A few semantic roles designate where an entity is in the description of an event:
- where an entity is (on the table, in the room) plays the role of location
- We drove from Chicago to New Orleans
- where the entity moves from is the source (from Chicago)
- where it moves to is the goal (to New Orleans)

Lexical Relations
Words can also have “relationships” with each other:
- To explain conceal, we might say “It’s the same as hide”
- Shallow could be explained as ”the opposite of deep”
- Pine is a ”kind of tree”
This approach is used for the semantic description of a language and is known as the analysis of lexical relations.
The lexical relations above are:
- synonymy (conceal/hide): Two or more words with very closely related meanings
- antonymy (shallow/deep): Two forms with opposite meanings
- hyponymy (pine/tree): When the meaning of one form is included in the meaning of another.
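Since the meaning of one form is included in the meaning of another, hyponymy can be followed upwards through several levels (a pine is a kind of tree, and a tree is a kind of plant). A minimal Python sketch, assuming a hypothetical hand-built taxonomy `HYPERNYM` in which each word points to its immediate superordinate term.

```python
# Hypothetical taxonomy fragment: word -> immediate superordinate term.
HYPERNYM = {
    "pine": "tree",
    "oak": "tree",
    "tree": "plant",
    "plant": "living thing",
    "dog": "animal",
    "horse": "animal",
    "animal": "living thing",
}


def is_hyponym(word, other):
    """True if `word` is a direct or indirect hyponym of `other`."""
    current = HYPERNYM.get(word)
    while current is not None:
        if current == other:
            return True
        current = HYPERNYM.get(current)
    return False


print(is_hyponym("pine", "tree"))   # True
print(is_hyponym("pine", "plant"))  # True (via tree)
print(is_hyponym("tree", "pine"))   # False: the relation is not symmetric
```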
Hyponymy:

Homophones
- When two or more different (written) forms have the same pronunciation
- To/too/two, right/write, flour/flower
Homonyms
- When one form (written or spoken) has two unrelated meanings
- Bat (flying creature) – bat (used in sports); race (contest of speed) – race (ethnic group)
Polysemy: two or more words with the same form and related meanings
- Head (object on top of your body, froth on top of a glass of beer, person at the top of a company or department)
- Foot (of a person, of a bed, of a mountain)
Collocation
A mature speaker of a language knows which words tend to occur with other words:
- If we ask people what they think of when we say hammer, most will say nail; similarly, table makes us think of chair, needle elicits thread, and salt elicits pepper
In other words, one way to organize our knowledge of words is simply based on collocation, or frequently occurring together.
In recent years, corpus linguistics has received more attention: it focuses on the study of which words occur together and how frequently they co-occur.
A corpus is a large collection of texts, spoken or written, used to find out how often specific words or phrases occur and which types of collocation are most common.
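Counting which words occur near a given word is the simplest way to look for collocations in a corpus. A minimal Python sketch, assuming a whitespace-tokenized text and a fixed window of two words on each side; the helper `collocates` and the toy text are made up for illustration, and a real corpus study would use far more text and usually filter out function words such as and.

```python
from collections import Counter


def collocates(tokens, key, window=2):
    """Count the words appearing within `window` tokens of each occurrence of `key`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == key:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


text = "salt and pepper . pass the salt and pepper please . a pinch of salt"
counts = collocates(text.split(), "salt")
print(counts["pepper"])  # 2
print(counts["pinch"])   # 1
```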
A concordance is a listing of each occurrence of a word (or phrase) in a corpus, along with the words surrounding it.
The word being studied is described as the “key word in context”.
Example: key word= “sarcastic”

This type of research provides more evidence that our understanding of what words and phrases mean is tied to the context in which they are typically used.
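A concordance of this kind can be produced by listing each occurrence of the key word with a few words of context on each side. A minimal Python sketch of a key-word-in-context (KWIC) listing, assuming a whitespace-tokenized text; the helper `kwic` and the sample sentence are hypothetical.

```python
def kwic(tokens, key, width=3):
    """Return one concordance line per occurrence of `key`, with `width` words of context."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the key words line up in a column.
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


text = "she gave a sarcastic reply and he made a sarcastic comment about the weather"
for line in kwic(text.split(), "sarcastic"):
    print(line)
```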
Pragmatics
Pragmatics refers to the way in which we use words and sentences to convey a specific meaning. The idea here is to investigate the intention of a speaker or writer when communicating something.
In some sense, it is the study of invisible meaning, or of how we recognize what is meant even when it is not said or written.
This means that speakers (or writers) must be able to depend on a lot of shared assumptions and expectations when they try to communicate. Investigating these gives us some insight into how we understand more than just the linguistic content of utterances.
The meaning of the text is not in the words alone, but in what we think the writer intended to communicate in that context.
Let’s consider the advertisement in the figure:
We may think that someone is announcing the sale of some very young children. But we refuse that interpretation and assume that it is clothes for those children that are on sale. Yet the word clothes is nowhere in the message.
The influence of the context is crucial. Here it is the physical context, i.e., the location where we encounter words and phrases. In the linguistic context, instead, the surrounding words, also known as co-text, help us understand what is meant.
- For example if we see “bank” in front of a building we are sure that bank is the financial institution and not the bank of a river.
Reference
Reference is an act by which a speaker (or writer) uses language to enable a listener (or reader) to identify something. Words themselves don’t refer to anything, people refer.
To this aim, we can use proper nouns (Chomsky, Jennifer, Ciro), other nouns in phrases (a writer, my friend, the cat) or pronouns (he, she, it). These words do not identify someone or something uniquely; rather, each word or phrase has a "range of reference".
- Jennifer, or friend or she can be used to refer to many entities in the world.
An expression such as The war doesn’t directly identify anything by itself, because its reference depends on who is using it.
We can also refer to things when we are not sure what to call them: “The blue thing, that icky stuff”.
A successful act of reference also depends on the listener/reader's ability to recognize what the speaker/writer means. The key process here is called inference. Examples:
- Where's the spinach salad sitting? (asks a waiter), He's sitting by the door (replies a second waiter)
- Can I look at your Chomsky? Sure, it's on the shelf over there!
- Jennifer is wearing Calvin Klein!
An inference is additional information used by the listener to create a connection between what is said and what must be meant.
We usually make a distinction between how we introduce new referents (a puppy) and how we refer back to them (the puppy, it)

The second (or subsequent) referring expression is an example of anaphora ("referring back"), and the first mention is called the antecedent.
Anaphora: a subsequent reference to an already introduced entity.
The connection between antecedents and anaphoric expressions is often based on inference:

In some cases the antecedent can be a verb: “The victim was shot twice, but the gun was never recovered”. Any “shooting” events must involve a gun.
Presuppositions
When we talk about an assumption made by the speaker (or writer), we talk about presupposition.
In general, we design our linguistic messages based on large-scale assumptions about what our listeners already know. What a speaker (or writer) assumes is true or known by a listener (or reader) is a presupposition.
Example:
- “Hey, your brother is looking for you.” -> There is a presupposition that you have a brother
- “When did you stop smoking?” -> Two presuppositions: (1) you used to smoke; (2) you no longer do so
Pragmatic Markers
Pragmatic markers are used to mark a speaker’s attitude to the listener or to what is being said.
They are short forms such as you know, well, I mean and I don't know:
- You know -> used to indicate that the knowledge is being treated as shared
- I mean -> used to self-correct or to mark an attempt to clarify something

- I don’t know has evolved to become a marker of hesitation or uncertainty when a speaker is about to say something potentially in disagreement with another speaker

The speaker can signal a desire not to challenge the other speaker by appearing hesitant about disagreeing.
Speech Act
The term speech act describes an action that involves language such as “requesting”, “commanding”, “questioning”, or “informing”.
Example: I’ll be there at six
- Saying this performs the speech act of promising
Speech act: the action performed by a speaker with an utterance.
To understand how utterances can be used to perform actions we need to visualize a relationship between the structure of an utterance and the normal function of that utterance
