Artificial Intelligence Notes
Unit-3
Lecture-1
Natural Language Processing
The Japanese Fifth Generation Computer Report explicitly stated that input to the Fifth Generation Computers will be via voice, pictures and natural language dialogues. Present-day systems expect people to learn and master a computer language in order to communicate with the machine; in other words, humans have to rise to the level of the machine. To bridge this gap between humans and machines, computer scientists and linguists are vigorously working on Natural Language Processing (NLP).
“NLP is a subfield of AI which deals with the methods of communicating with a computer in one's own natural language.”
Natural Language Processing removes many of the hardships a person faces while communicating with a computer:
· One need not be computer literate to communicate with the machine.
· One can dispense with the special query languages that humans presently use to access information from databases.
· Information generated worldwide daily from various sources and presented in newspapers, magazines and other written material can be condensed and presented to the user in capsule form.
· There is a human touch in an NLP system; one feels at home directly communicating with the machine.
· An NLP system coupled with a speech recognition and synthesis system is certain to give humans a shot in the arm.
Fundamental Problems in NLP
Problem 1: Words used by one set of people can have a different meaning for another set of people.
e.g. Given the sentence “This flat is terrible”, an Englishman would take “flat” to mean a dwelling, while an American would take it to mean a puncture.
Problem 2: The ‘functional structure’ of the sentence itself can give rise to ambiguities.
e.g. In the sentence “I saw Taj Mahal flying over Agra”, who is flying: the Taj Mahal or the person who spoke the sentence?
Problem 3: Extensive use of pronouns increases ambiguities.
Consider the following sentences: “Ravi went to the supermarket. He found his favorite brand of coffee powder in rack five. He paid for it and left.” The question is: to what object does the pronoun “it” refer: the supermarket, the coffee powder or rack five?
Problem 4: Conjunctions used in natural language to avoid repetition of phrases also cause NLP problems.
e.g. Consider the sentences: “Ram and Shyam went to a restaurant. Ram had a cup of coffee and Shyam had tea.” Here the phrase “a cup of” is suppressed for Shyam, yet the meaning is well understood by humans, while it might be difficult for a machine.
Problem 5: Ellipsis is a major problem which NLP systems find difficult to manage. In ellipsis, one does not state some words explicitly but leaves it to the audience to fill them in.
An example of this is: “What is the length of river Ganges? Of river Cauvery?”
Because of these five major problems, at present we find it difficult to build NLP systems.
Some Commercial NLP Systems Available
· Clout by Microrim.
· INTELLECT by Artificial Intelligence Corporation.
· NaturalLink by Texas Instruments.
· Paradox by Ansa Corporation.
· Q&A by Symantec Corporation.
· SAVVY by Apple Computer Systems.
· Themis by Frey Associates.
Where will NLP Systems be Available?
· In the inquiry centers of airports, railway reservation counters, share markets etc.
· In natural language interfaces for database querying systems.
· In language teaching.
· In text understanding and generation.
· In conferences.
Artificial Intelligence Notes
Unit-3
Lecture-2
Phases of Natural Language Processing
· Phonological Analysis: Phonology is the part of natural language analysis that deals with spoken language; it therefore covers speech recognition and generation. The core task of a speech recognition and generation system is to take an acoustic waveform as input and produce a string of words as output. The area of computational linguistics that deals with speech analysis is computational phonology.
· Morphological Analysis: This is the most elementary phase of NLP. It deals with word formation. In this phase, individual words are analyzed into their components, called “morphemes”, and non-word tokens such as punctuation are separated from the words. A morpheme is the basic grammatical building block from which words are made.
The study of word structure is referred to as morphology, and the task of breaking a word into its morphemes is called morphological parsing. A morpheme is defined as the minimal meaningful unit in a language, one which cannot be broken into smaller units.
For example, the word “fox” consists of a single morpheme, as it cannot be resolved into smaller units, whereas the word “cats” consists of two morphemes: the morpheme “cat” and the morpheme “-s” indicating plurality.
Morphemes are traditionally divided into two types:
1) Free Morphemes: those able to act as words in isolation (e.g. think, permanent, local).
2) Bound Morphemes: those that can operate only as part of other words (e.g. “-s”, “-ing”).
The morpheme which forms the central part of a word is also called the “stem”. In English, a word can be made up of one or more morphemes.
E.g.
Word “think” => stem “think”
Word “localize” => stem “local”, suffix “ize”
Word “denationalize” => prefix “de”, stem “nation”, suffixes “al”, “ize”
The standard computational tool for morphological parsing is the finite state transducer, which performs the parse by mapping between two sets of symbols. A transducer can be used in four ways: as a recognizer, a generator, a translator and a relator. The output of the transducer is a set of morphemes.
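A full finite state transducer is beyond the scope of these notes, but the idea of morphological parsing can be illustrated with a simplified affix-stripping sketch. The tiny lexicon and affix lists below are illustrative assumptions, not a real morphological database:

```python
# Simplified morphological parsing: split a word into its morphemes.
# A real system uses a finite state transducer; this recursive affix-stripping
# sketch only illustrates the idea. Lexicon and affix lists are illustrative.
LEXICON = {"fox", "cat", "think", "local", "nation"}
SUFFIXES = ["ize", "al", "s"]
PREFIXES = ["de"]

def parse(word):
    """Return the list of morphemes making up `word`, or None if unparsable."""
    if word in LEXICON:
        return [word]                      # a single free morpheme, e.g. "fox"
    for s in SUFFIXES:
        if word.endswith(s):
            rest = parse(word[:-len(s)])   # recursively strip the suffix
            if rest:
                return rest + [s]
    for p in PREFIXES:
        if word.startswith(p):
            rest = parse(word[len(p):])    # recursively strip the prefix
            if rest:
                return [p] + rest
    return None

print(parse("cats"))            # ['cat', 's']
print(parse("denationalize"))   # ['de', 'nation', 'al', 'ize']
```

Note how “denationalize” decomposes exactly as in the example above: prefix “de”, stem “nation”, suffixes “al” and “ize”.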
· Lexical Analysis: In this phase of natural language analysis, the validity of words according to the lexicon is checked. A lexicon is a dictionary: a collection of all possible valid words of the language along with their meanings. In NLP, the first stage of processing input text is to scan each word in the sentence and compute (or look up) all the relevant linguistic information about the word. The lexicon provides the necessary rules and data for carrying out this first-stage analysis.
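The lookup stage described above can be sketched as follows. The lexicon entries here are a small illustrative assumption; a real lexicon would hold far richer linguistic information:

```python
# Lexical analysis sketch: scan each word of the input and look up its
# linguistic information in the lexicon, flagging words not found there.
# The entries below are illustrative, not a real dictionary.
LEXICON = {
    "ravi":   {"category": "noun", "meaning": "proper name"},
    "went":   {"category": "verb", "meaning": "past tense of go"},
    "to":     {"category": "preposition", "meaning": "in the direction of"},
    "the":    {"category": "determiner", "meaning": "definite article"},
    "market": {"category": "noun", "meaning": "place of trade"},
}

def lexical_analysis(sentence):
    """Return (word, category) pairs; unknown words are marked UNKNOWN."""
    result = []
    for word in sentence.lower().split():
        entry = LEXICON.get(word)
        if entry is None:
            result.append((word, "UNKNOWN"))   # not a valid word per lexicon
        else:
            result.append((word, entry["category"]))
    return result

print(lexical_analysis("Ravi went to the market"))
```

The output of this stage, a sequence of words tagged with their lexical categories, is exactly what the syntactic analysis phase consumes next.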
· Syntactic Analysis: Syntax refers to the study of formal relationships between the words of sentences. In this phase the validity of a sentence according to grammar rules is checked. Syntactic analysis requires knowledge of the grammar and a parsing technique. The grammar is a formal specification of the rules allowable in the language, and parsing is a method of analyzing a sentence to determine its structure according to the grammar. The most common grammars for natural languages are context free grammar (CFG), also called phrase structure grammar, and definite clause grammar. Two basic parsing techniques are top-down parsing and bottom-up parsing.
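Top-down parsing with a context free grammar can be sketched as below. The toy grammar and word lists are illustrative assumptions; real natural language grammars are enormously larger:

```python
# Top-down parsing with a tiny context free grammar.
# S -> NP VP; NP -> Det N | Name; VP -> V NP. Grammar and words illustrative.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"]],
}
WORDS = {
    "Det":  {"the", "a"},
    "N":    {"dog", "ball"},
    "Name": {"ravi"},
    "V":    {"chased", "saw"},
}

def expand(symbol, words, pos):
    """Try to derive `symbol` starting at words[pos]; yield end positions."""
    if symbol in WORDS:                    # terminal category: match one word
        if pos < len(words) and words[pos] in WORDS[symbol]:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:     # nonterminal: try each rule
        ends = [pos]
        for part in production:
            ends = [e2 for e in ends for e2 in expand(part, words, e)]
        yield from ends

def is_valid(sentence):
    """A sentence is grammatical if some derivation of S covers every word."""
    words = sentence.lower().split()
    return len(words) in expand("S", words, 0)

print(is_valid("the dog chased a ball"))   # True
print(is_valid("dog the chased"))          # False
```

Starting from the start symbol S and expanding downward toward the words is precisely the top-down strategy; a bottom-up parser would instead start from the words and combine them upward into phrases.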
· Semantic Analysis: Semantics deals with the meaning of natural language sentences. In this phase, the meaning of the sentence is understood. If computers are to communicate by means of natural language, a computational representation of sentences is required to capture their meaning.
Natural language semantics involves two major questions:
(i) How to represent meanings in a way that allows all the necessary manipulations.
(ii) How to relate these representations to the part of the linguistic model which deals with structure (the grammar or syntax).
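One common computational representation of meaning is predicate-argument structure, where the verb becomes a predicate applied to its subject and object. The sketch below is a deliberately minimal illustration that assumes three-word subject-verb-object sentences, an assumption no real system can make:

```python
# Semantic analysis sketch: map a simple subject-verb-object sentence to a
# predicate-argument meaning representation, e.g. likes(ravi, coffee).
# Assumes the sentence has already been syntactically analysed as S-V-O.
def meaning(sentence):
    """Build a predicate-logic style representation of an SVO sentence."""
    subject, verb, obj = sentence.lower().split()
    return f"{verb}({subject}, {obj})"

print(meaning("Ravi likes coffee"))   # likes(ravi, coffee)
```

Representations of this kind answer question (i): they are formal objects a program can manipulate, e.g. to match against facts in a database when answering a query.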
· Pragmatic and Discourse Analysis: Pragmatics is the study of the relation between language and its context of use. Context of use includes such things as the identities of the people and objects involved; hence pragmatics includes analysis of how language is used to refer to people and things. In this phase of NLP, the main intention of the speaker behind a message is understood.
A discourse is a collection of sentences, but an arbitrary collection of sentences does not make a discourse: the sentences must be coherent. A collection of well-formed and individually interpretable sentences often forms a coherent discourse.
Artificial Intelligence Notes
Unit-3
Lecture-3
Learning
Learning
Human beings are blessed with several
unique characteristics. Learning is also one out of many such types of
characteristics that enables humans to acquire new situations. Learning is an
incessant process. It is done by viewing, listening, interacting with others,
studying with others etc. and also by experience. Learning helps us in not only
acquiring new knowledge but also in refining and updating the knowledge already
possessed. It helps us in correcting our mistakes once committed and makes us
capable not only to repeat our performance but also to enhance the level of
perfection in our performance.
Language understanding and learning are two of the most vital human skills and among the most difficult to computerize. A machine cannot be called intelligent if it does not have the power of learning. Hence an intelligent machine should be able to learn new things and adapt to new situations, rather than simply doing things as it is told to do them.
Herbert Simon has defined learning as:
“Any change in a system that allows it to perform better the second time on repetition of the same task, or on another task drawn from the same population.”
The salient features of learning are:
· Skill Refinement: The term skill refinement refers to improving a skill by performing the same task again and again. Human beings become accustomed to performing a task more efficiently and more perfectly by handling it multiple times. The same should be applicable to machines: if machines are able to improve their skills as they repeatedly handle a task, they can be said to have the skill of learning.
· Knowledge Acquisition: The improvement in skill arises because we tend to remember the experience, or gain some knowledge, while handling the task. This is known as knowledge acquisition. Any machine claimed to possess learning capabilities should have the tendency and ability to acquire knowledge. The process of knowledge acquisition proceeds in two steps. In the first step, the initial knowledge base is constructed: initial facts regarding some task are fed in. The second step is refinement of knowledge: additional knowledge that refines the knowledge base is fed in.
Types of Learning:
Ø Rote learning (learning through memorization)
Ø Learning by taking advice
Ø Learning by induction
Ø Explanation based learning
1. Rote Learning: This is the most elementary form of learning. In rote learning, the data of a particular task is simply stored (or memorized).
e.g. If a game is played, the winning sequence of moves is simply remembered by the players involved, and next time, instead of selecting a move arbitrarily from all possible moves, they will select a move from among the “winning moves”. Hence, by repeatedly remembering winning situations, the choice of move improves.
It is similar to learning by studying in the case of human beings. From a computational aspect, whenever a computer executes some program it can store the computed values for further use; this saves computation time, so later runs execute faster than previous ones. This is called data caching. When programs involve extensive computations, recomputing everything from scratch becomes expensive in terms of time, and data caching saves a significant amount of it. Caching is used in many AI programs to produce some surprising performance improvements.
One good example of this is Samuel's (1963) checkers program. It was designed to store the checkers moves played against its creator and thus to learn the game. It learned to play checkers so well that ultimately it was able to beat its own creator.
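The caching idea behind rote learning can be sketched with memoization, where computed results are stored so that repeat requests are answered by lookup instead of recomputation:

```python
import functools

# Rote learning as data caching: results of an expensive recursive
# computation are memorized, so each subproblem is solved only once.
calls = 0

@functools.lru_cache(maxsize=None)
def fib(n):
    """Naive recursive Fibonacci, made fast by caching stored results."""
    global calls
    calls += 1                       # count how many real computations happen
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))    # 832040
print(calls)      # 31: one computation per subproblem, instead of millions
```

Without the cache, the same subproblems would be recomputed over a million times; with it, the program "remembers" each answer exactly as the rote learner remembers a winning move.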
2. Learning by Taking Advice: Among the many ways to learn, learning by taking advice is one. In a normal life span, human beings use this method quite often; indeed, we learn most things in our early life by this method alone. From our parents and relatives to our teachers when we start going to school, others give us valuable advice as and when required, and we acquire almost all of our early knowledge by taking advice from others.
We all know that computers have nothing of their own; they function on the basis of the programs fed into them. When a programmer writes a computer program, he or she gives the computer many instructions to follow, the same way a teacher gives instructions to students. The computer follows the instructions given by the programmer. Hence, a kind of learning takes place when a computer runs a particular program, taking advice from the creator of the program.
3. Learning by Induction: This is similarity based learning. A large number of examples are given, and the machine learns to perform similar actions in similar situations. Human beings also use this form of learning frequently. When we are children, our teachers tell us many things by giving examples. Suppose there are two fruits, a green apple and a pear. As an adult it is easy to tell the difference; for a child, it might not be easy to differentiate between the two. In such situations, various examples of both fruits are given to teach the difference.
Similarly, in our daily life we see many examples of birds flying, and we observe that when there are clouds in the sky it rains. Based on these examples we formulate rules like “all birds can fly” and “clouds bring rain”.
When we formulate such rules and use them to draw conclusions in given situations, we are learning by induction.
Induction means “the inferring of general laws from particular instances”. Thus, inductive learning means generalization of knowledge gathered from real-world examples, and use of the same for solving similar problems.
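Generalizing a rule from particular examples can be sketched in the style of the classic Find-S procedure: start from the first positive example and generalize each attribute that a later example contradicts. The fruit attributes below are an illustrative assumption:

```python
# Inductive learning sketch (Find-S style): infer the most specific rule
# that covers all positive examples. "?" means "any value is acceptable".
# The attribute tuples (colour, shape, taste) are illustrative.
def generalize(examples):
    """Generalize a hypothesis from a list of positive example tuples."""
    hypothesis = list(examples[0])         # start with the first example
    for example in examples[1:]:
        for i, value in enumerate(example):
            if hypothesis[i] != value:
                hypothesis[i] = "?"        # generalize conflicting attribute
    return hypothesis

# Fruits labelled as apples, described as (colour, shape, taste):
positives = [
    ("green", "round", "sweet"),
    ("red",   "round", "sweet"),
]
print(generalize(positives))   # ['?', 'round', 'sweet']
```

From the two examples the program infers the general rule "an apple is round and sweet, of any colour": a general law drawn from particular instances, exactly as the definition of induction above states.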
4. Explanation Based Learning: There are primarily two approaches to learning: the data intensive approach and the knowledge intensive approach. In the data intensive approach (like rote learning), the main emphasis is on data: a large number of positive and negative examples are given to the machine, and the machine learns from the outcomes of these examples. Explanation based learning is a knowledge intensive approach, also called situation based learning, in which more emphasis is given to knowledge. There are certain applications where, instead of data, knowledge plays the more important role, e.g. chess playing and medical diagnosis.
In such applications, a critical situation is presented and the machine is given an example of how to handle that situation. This is explanation based learning.
In general, we can say that in explanation based learning the emphasis is on learning more from a few critical examples rather than from having more examples. In this approach, a single positive training example is given to the machine in order to perform the learning.