Artificial Intelligence Notes
Unit-3
Lecture-1
Natural Language Processing
The Japanese Fifth Generation Computer Report explicitly stated that input to the Fifth Generation Computers will be via voice, pictures and natural language dialogues. Present-day systems expect people to learn and master a computer language in order to communicate with the machine; in other words, humans have to rise to the level of the machine. To bridge this gap between humans and machines, computer scientists and linguists are vigorously working on Natural Language Processing (NLP).
“NLP is a subfield of AI which deals with the methods of communicating with a computer in one's own natural language.”
Natural Language Processing removes many of the hardships a person faces while communicating with a computer:
· One need not be computer literate to communicate with the machine.
· One can dispense with the special query languages that humans presently use to access information from databases.
· Information generated worldwide daily from various sources and presented in newspapers, magazines and other written material can be condensed and presented to the user in capsule form.
· There is a human touch in an NLP system; one feels at home directly communicating with the machine.
· An NLP system coupled with a speech recognition and synthesis system is certain to give humans a shot in the arm.
Fundamental Problems in NLP
Problem 1: Words used by one set of people can have a different meaning for another set of people.
e.g. Given the sentence “This flat is terrible”, an Englishman would take “flat” to mean a dwelling, while an American would take it to mean a puncture.
Problem 2: The ‘functional structure’ of the sentence itself can give rise to ambiguities.
e.g. In the sentence “I saw Taj Mahal flying over Agra”, who is flying: the Taj Mahal or the person who spoke the sentence?
Problem 3: Extensive use of pronouns increases ambiguities.
Consider the following sentences: “Ravi went to the supermarket. He found his favorite brand of coffee powder in rack five. He paid for it and left.” The question is: to what object does the pronoun “it” refer: the supermarket, the coffee powder or rack five?
Problem 4: Conjunctions used in natural language to avoid repetition of phrases also cause NLP problems.
e.g. Consider the sentences: “Ram and Shyam went to a restaurant. Ram had a cup of coffee and Shyam had tea.” Here the phrase “a cup of” is suppressed for Shyam, yet the meaning is well understood by humans, while it might be difficult for a machine.
Problem 5: Ellipsis is a major problem which NLP systems find difficult to manage. In ellipsis, one does not state some words explicitly but leaves it to the audience to fill them in.
An example of this is: “What is the length of river Ganges? Of river Cauvery?”
Because of these five major problems, at present we find it difficult to build NLP systems.
Some Commercial NLP Systems Available
· Clout by Microrim.
· INTELLECT by Artificial Intelligence Corporation.
· NaturalLink by Texas Instruments.
· Paradox by Ansa Corporation.
· Q&A by Symantec Corporation.
· SAVVY by Apple Computer Systems.
· Themis by Frey Associates.
Where will NLP Systems be Available?
· In the inquiry centers of airports, railway reservation counters, share markets etc.
· In natural language interfaces for database querying systems.
· In language teaching.
· In text understanding and generation.
· In conferences.
Artificial Intelligence Notes
Unit-3
Lecture-2
Phases of Natural Language Processing
· Phonological Analysis: Phonology is the part of natural language analysis that deals with spoken language; it therefore covers speech recognition and generation. The core task of a speech recognition and generation system is to take an acoustic waveform as input and produce a string of words as output. The area of computational linguistics that deals with speech analysis is computational phonology.
· Morphological Analysis: This is the most elementary phase of NLP. It deals with word formation. In this phase, individual words are analyzed into their components, called “morphemes”, and non-word tokens such as punctuation are separated from the words. A morpheme is the basic grammatical building block from which words are made.
The study of word structure is referred to as morphology, and the task of breaking a word into its morphemes is called morphological parsing. A morpheme is defined as the minimal meaningful unit in a language, one which cannot be broken into smaller units.
For example, the word “fox” consists of a single morpheme, as it cannot be resolved into smaller units, whereas the word “cats” consists of two morphemes: the morpheme “cat” and the morpheme “-s” indicating plurality.
Morphemes are traditionally divided into two types:
1) Free Morphemes: those able to act as words in isolation (e.g. think, permanent, local).
2) Bound Morphemes: those that can operate only as part of other words (e.g. “-s”, “-ing”).
The morpheme which forms the central part of a word is also called the “stem”. In English, a word can be made up of one or more morphemes.
E.g.
Word “think” => stem “think”
Word “localize” => stem “local”, suffix “ize”
Word “denationalize” => prefix “de”, stem “nation”, suffixes “al”, “ize”
The standard computational tool for morphological parsing is the finite state transducer, which performs the parse by mapping between two sets of symbols. A transducer can be used in four ways: as a recognizer, a generator, a translator and a relator. The output of the transducer is a set of morphemes.
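A full finite state transducer is beyond the scope of these notes, but the idea of morphological parsing can be illustrated with a simplified affix-stripping sketch. The tiny lexicon and affix lists below are illustrative assumptions, not a real morphological database:

```python
# Simplified morphological parsing: split a word into its morphemes.
# A real system uses a finite state transducer; this recursive affix-stripping
# sketch only illustrates the idea. Lexicon and affix lists are illustrative.
LEXICON = {"fox", "cat", "think", "local", "nation"}
SUFFIXES = ["ize", "al", "s"]
PREFIXES = ["de"]

def parse(word):
    """Return the list of morphemes making up `word`, or None if unparsable."""
    if word in LEXICON:
        return [word]                      # a single free morpheme, e.g. "fox"
    for s in SUFFIXES:
        if word.endswith(s):
            rest = parse(word[:-len(s)])   # recursively strip the suffix
            if rest:
                return rest + [s]
    for p in PREFIXES:
        if word.startswith(p):
            rest = parse(word[len(p):])    # recursively strip the prefix
            if rest:
                return [p] + rest
    return None

print(parse("cats"))            # ['cat', 's']
print(parse("denationalize"))   # ['de', 'nation', 'al', 'ize']
```

Note how “denationalize” decomposes exactly as in the example above: prefix “de”, stem “nation”, suffixes “al” and “ize”.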
· Lexical Analysis: In this phase of natural language analysis, the validity of words according to the lexicon is checked. A lexicon is a dictionary: a collection of all possible valid words of the language along with their meanings. In NLP, the first stage of processing input text is to scan each word in the sentence and compute (or look up) all the relevant linguistic information about the word. The lexicon provides the necessary rules and data for carrying out this first-stage analysis.
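The lookup stage described above can be sketched as follows. The lexicon entries here are a small illustrative assumption; a real lexicon would hold far richer linguistic information:

```python
# Lexical analysis sketch: scan each word of the input and look up its
# linguistic information in the lexicon, flagging words not found there.
# The entries below are illustrative, not a real dictionary.
LEXICON = {
    "ravi":   {"category": "noun", "meaning": "proper name"},
    "went":   {"category": "verb", "meaning": "past tense of go"},
    "to":     {"category": "preposition", "meaning": "in the direction of"},
    "the":    {"category": "determiner", "meaning": "definite article"},
    "market": {"category": "noun", "meaning": "place of trade"},
}

def lexical_analysis(sentence):
    """Return (word, category) pairs; unknown words are marked UNKNOWN."""
    result = []
    for word in sentence.lower().split():
        entry = LEXICON.get(word)
        if entry is None:
            result.append((word, "UNKNOWN"))   # not a valid word per lexicon
        else:
            result.append((word, entry["category"]))
    return result

print(lexical_analysis("Ravi went to the market"))
```

The output of this stage, a sequence of words tagged with their lexical categories, is exactly what the syntactic analysis phase consumes next.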
· Syntactic Analysis: Syntax refers to the study of formal relationships between the words of sentences. In this phase the validity of a sentence according to grammar rules is checked. Syntactic analysis requires knowledge of the grammar and a parsing technique. The grammar is a formal specification of the rules allowable in the language, and parsing is a method of analyzing a sentence to determine its structure according to the grammar. The most common grammars for natural languages are context free grammar (CFG), also called phrase structure grammar, and definite clause grammar. Two basic parsing techniques are top-down parsing and bottom-up parsing.
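Top-down parsing with a context free grammar can be sketched as below. The toy grammar and word lists are illustrative assumptions; real natural language grammars are enormously larger:

```python
# Top-down parsing with a tiny context free grammar.
# S -> NP VP; NP -> Det N | Name; VP -> V NP. Grammar and words illustrative.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"]],
}
WORDS = {
    "Det":  {"the", "a"},
    "N":    {"dog", "ball"},
    "Name": {"ravi"},
    "V":    {"chased", "saw"},
}

def expand(symbol, words, pos):
    """Try to derive `symbol` starting at words[pos]; yield end positions."""
    if symbol in WORDS:                    # terminal category: match one word
        if pos < len(words) and words[pos] in WORDS[symbol]:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:     # nonterminal: try each rule
        ends = [pos]
        for part in production:
            ends = [e2 for e in ends for e2 in expand(part, words, e)]
        yield from ends

def is_valid(sentence):
    """A sentence is grammatical if some derivation of S covers every word."""
    words = sentence.lower().split()
    return len(words) in expand("S", words, 0)

print(is_valid("the dog chased a ball"))   # True
print(is_valid("dog the chased"))          # False
```

Starting from the start symbol S and expanding downward toward the words is precisely the top-down strategy; a bottom-up parser would instead start from the words and combine them upward into phrases.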
· Semantic Analysis: Semantics deals with the meaning of natural language sentences. In this phase, the meaning of the sentence is understood. If computers are to communicate by means of natural language, a computational representation of sentences is required to capture their meaning.
Natural language semantics involves two major questions:
(i) How to represent meanings in a way that allows all the necessary manipulations.
(ii) How to relate these representations to the part of the linguistic model which deals with structure (the grammar or syntax).
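One common computational representation of meaning is predicate-argument structure, where the verb becomes a predicate applied to its subject and object. The sketch below is a deliberately minimal illustration that assumes three-word subject-verb-object sentences, an assumption no real system can make:

```python
# Semantic analysis sketch: map a simple subject-verb-object sentence to a
# predicate-argument meaning representation, e.g. likes(ravi, coffee).
# Assumes the sentence has already been syntactically analysed as S-V-O.
def meaning(sentence):
    """Build a predicate-logic style representation of an SVO sentence."""
    subject, verb, obj = sentence.lower().split()
    return f"{verb}({subject}, {obj})"

print(meaning("Ravi likes coffee"))   # likes(ravi, coffee)
```

Representations of this kind answer question (i): they are formal objects a program can manipulate, e.g. to match against facts in a database when answering a query.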
· Pragmatic and Discourse Analysis: Pragmatics is the study of the relation between language and its context of use. Context of use includes such things as the identities of the people and objects involved; hence pragmatics includes analysis of how language is used to refer to people and things. In this phase of NLP, the main intention of the speaker behind a message is understood.
A discourse is a collection of sentences, but an arbitrary collection of sentences does not make a discourse: the sentences must be coherent. A collection of well-formed and individually interpretable sentences often forms a coherent discourse.
Artificial Intelligence Notes
Unit-3
Lecture-3
Learning
Learning
Human beings are blessed with several
unique characteristics. Learning is also one out of many such types of
characteristics that enables humans to acquire new situations. Learning is an
incessant process. It is done by viewing, listening, interacting with others,
studying with others etc. and also by experience. Learning helps us in not only
acquiring new knowledge but also in refining and updating the knowledge already
possessed. It helps us in correcting our mistakes once committed and makes us
capable not only to repeat our performance but also to enhance the level of
perfection in our performance.
Language understanding and learning are two of the most vital human skills and among the most difficult to computerize. A machine cannot be called intelligent if it does not have the power of learning. Hence an intelligent machine should be able to learn new things and adapt to new situations, rather than simply doing things as it is told to do them.
Herbert Simon has defined learning as:
“Any change in a system that allows it to perform better the second time on repetition of the same task, or on another task drawn from the same population.”
The salient features of learning are:
· Skill Refinement: The term skill refinement refers to improving a skill by performing the same task again and again. Human beings become accustomed to performing a task more efficiently and more perfectly by handling it multiple times. The same should be applicable to machines: if machines are able to improve their skills as they repeatedly handle a task, they can be said to have the skill of learning.
· Knowledge Acquisition: The improvement in skill arises because we tend to remember the experience, or gain some knowledge, while handling the task. This is known as knowledge acquisition. Any machine claimed to possess learning capabilities should have the tendency and ability to acquire knowledge. The process of knowledge acquisition proceeds in two steps. In the first step, the initial knowledge base is constructed: initial facts regarding some task are fed in. The second step is refinement of knowledge: additional knowledge that refines the knowledge base is fed in.
Types of Learning:
Ø Rote learning (learning through memorization)
Ø Learning by taking advice
Ø Learning by induction
Ø Explanation based learning
1. Rote Learning: This is the most elementary form of learning. In rote learning, the data of a particular task is simply stored (or memorized).
e.g. If a game is played, the winning sequence of moves is simply remembered by the players involved, and next time, instead of selecting a move arbitrarily from all possible moves, they will select a move from among the “winning moves”. Hence, by repeatedly remembering winning situations, the choice of move improves.
It is similar to learning by studying in the case of human beings. From a computational aspect, whenever a computer executes some program it can store the computed values for further use; this saves computation time, so later runs execute faster than previous ones. This is called data caching. When programs involve extensive computations, recomputing everything from scratch becomes expensive in terms of time, and data caching saves a significant amount of it. Caching is used in many AI programs to produce some surprising performance improvements.
One good example of this is Samuel's (1963) checkers program. It was designed to store the checkers moves played against its creator and thus to learn the game. It learned to play checkers so well that ultimately it was able to beat its own creator.
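The caching idea behind rote learning can be sketched with memoization, where computed results are stored so that repeat requests are answered by lookup instead of recomputation:

```python
import functools

# Rote learning as data caching: results of an expensive recursive
# computation are memorized, so each subproblem is solved only once.
calls = 0

@functools.lru_cache(maxsize=None)
def fib(n):
    """Naive recursive Fibonacci, made fast by caching stored results."""
    global calls
    calls += 1                       # count how many real computations happen
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))    # 832040
print(calls)      # 31: one computation per subproblem, instead of millions
```

Without the cache, the same subproblems would be recomputed over a million times; with it, the program "remembers" each answer exactly as the rote learner remembers a winning move.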
2. Learning by Taking Advice: Among the many ways to learn, learning by taking advice is one. In a normal life span, human beings use this method quite often; indeed, we learn most things in our early life by this method alone. From our parents and relatives to our teachers when we start going to school, others give us valuable advice as and when required, and we acquire almost all of our early knowledge by taking advice from others.
We all know that computers have nothing of their own; they function on the basis of the programs fed into them. When a programmer writes a computer program, he or she gives the computer many instructions to follow, the same way a teacher gives instructions to students. The computer follows the instructions given by the programmer. Hence, a kind of learning takes place when a computer runs a particular program, taking advice from the creator of the program.
3. Learning by Induction: This is similarity based learning. A large number of examples are given, and the machine learns to perform similar actions in similar situations. Human beings also use this form of learning frequently. When we are children, our teachers tell us many things by giving examples. Suppose there are two fruits, a green apple and a pear. As an adult it is easy to tell the difference; for a child, it might not be easy to differentiate between the two. In such situations, various examples of both fruits are given to teach the difference.
Similarly, in our daily life we see many examples of birds flying, and we observe that when there are clouds in the sky it rains. Based on these examples we formulate rules like “all birds can fly” and “clouds bring rain”.
When we formulate such rules and use them to draw conclusions in given situations, we are learning by induction.
Induction means “the inferring of general laws from particular instances”. Thus, inductive learning means generalization of knowledge gathered from real-world examples, and use of the same for solving similar problems.
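Generalizing a rule from particular examples can be sketched in the style of the classic Find-S procedure: start from the first positive example and generalize each attribute that a later example contradicts. The fruit attributes below are an illustrative assumption:

```python
# Inductive learning sketch (Find-S style): infer the most specific rule
# that covers all positive examples. "?" means "any value is acceptable".
# The attribute tuples (colour, shape, taste) are illustrative.
def generalize(examples):
    """Generalize a hypothesis from a list of positive example tuples."""
    hypothesis = list(examples[0])         # start with the first example
    for example in examples[1:]:
        for i, value in enumerate(example):
            if hypothesis[i] != value:
                hypothesis[i] = "?"        # generalize conflicting attribute
    return hypothesis

# Fruits labelled as apples, described as (colour, shape, taste):
positives = [
    ("green", "round", "sweet"),
    ("red",   "round", "sweet"),
]
print(generalize(positives))   # ['?', 'round', 'sweet']
```

From the two examples the program infers the general rule "an apple is round and sweet, of any colour": a general law drawn from particular instances, exactly as the definition of induction above states.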
4. Explanation Based Learning: There are primarily two approaches to learning: the data intensive approach and the knowledge intensive approach. In the data intensive approach (like rote learning), the main emphasis is on data: a large number of positive and negative examples are given to the machine, and the machine learns from the outcomes of these examples. Explanation based learning is a knowledge intensive approach, also called situation based learning, in which more emphasis is given to knowledge. There are certain applications where, instead of data, knowledge plays the more important role, e.g. chess playing and medical diagnosis.
In such applications, a critical situation is presented and the machine is given an example of how to handle that situation. This is explanation based learning.
In general, we can say that in explanation based learning the emphasis is on learning more from a few critical examples rather than from having more examples. In this approach, a single positive training example is given to the machine in order to perform the learning.