Wednesday, June 5, 2019
Linguistic Automatic Generation Natural Language
1. Introduction
1.1. The Problem Statement
This thesis deals with the problem of automatic generation of a UML model from natural language software requirement specifications. It describes Auto Modeler, an Automated Software Engineering tool that takes natural language software system requirement specifications as input, performs an automated OO analysis and tries to produce a UML model as output (a partial one in its present state, i.e. static class diagrams only). The basis for Auto Modeler is described in [2, 3].
1.2. Motivation
We conducted a short survey of the software industry in Islamabad in order to find out what sorts of Automated Software Engineering tools were required by the software houses. The result of the survey (see Appendix-I for the survey report) indicated that there is demand for a tool such as Auto Modeler. Tools of this kind that have already been developed, i.e. [2, 3], are either not available in the market or are very expensive, and thus out of the reach of most software houses. Therefore we decided to build our own tool that can be used by the software industry to become more productive and competitive. At present Auto Modeler is not ready for commercial use, but it is hoped that future versions will be able to cater to the needs of the software houses.
1.3. Background
1.3.1. The need for Automated Software Engineering Tools
In this era of Information Technology great demands are placed on software systems and on all those involved in the SDLC. The developed software should not only be of high quality but should also be developed in a minimal amount of time.
When it comes to software quality, the software must be highly reliable, meet the client's needs and satisfy the customer's expectations. Automated Software Engineering tools can assist software engineers and software developers in producing high quality software in a minimal amount of time.
1.3.2. Requirements Engineering
Requirements engineering consists of the following tasks [6]:
* Requirements Elicitation
* Requirements Analysis
* Requirements Specification
* Requirements Validation / Verification
* Requirements Management
Requirements engineering is recognized as a critical task, since many software failures originate from inconsistent, incomplete or simply incorrect system requirements specifications.
1.3.3. Natural Language Requirement Specifications
Formal methods have been successfully applied to express requirements specifications, but often the customer cannot understand them and therefore cannot validate them [4]. Natural language is the only common medium understood by both the customer and the analyst [4], so system requirements specifications are often written in natural language.
1.3.4. Object Oriented Analysis
The system analyst must manually process the natural language requirements specification document, perform an OO analysis and produce the results in the form of a UML model, which has become a standard in the software industry. The manual process is laborious, time consuming and prone to errors. Some specified requirements might be left out. If there are problems or errors in the original requirements specifications, they may not be discovered in the manual process.
OOA applies the OO paradigm to models of proposed systems by defining classes, objects and the relationships between them. Classes are the most important building block of an OO system and from these we instantiate objects.
Once an individual object is created it inherits the same operations, relationships, semantics and attributes identified in the class. Attributes of classes, and hence objects, hold values of properties. Operations, also called methods, describe what can be done to an object/class [1].
A relationship between classes/objects can show various attributes such as aggregation, composition, generalization and dependency. Attributes and operations represent the semantics of the class, while relationships represent the semantics of the model [1]. The KRB seven-step method, introduced by Kapur, Ravindra and Brown, proposes how to find classes and objects manually [1]:
1. Identify candidate classes (nouns in NL).
2. Define classes (look for instantiations of classes).
3. Establish associations (capturing verbs to create an association for each pair of classes in 1 and 2).
4. Expand many-to-many associations.
5. Identify class attributes.
6. Normalize attributes so that they are associated with the class of objects that they truly describe.
7. Identify class operations.
From this process we can see that one goal of OOA is to identify NL concepts that can be transformed into OO concepts, which can then be used to form system models in particular notations. Here we shall concentrate on UML [1].
1.3.5. Natural Language Processing (Human Language Technology)
If an automatic analysis of the NL requirements document is carried out, then it is not only possible to quickly find errors in the specifications, but with the right methods we can also quickly generate a UML model from the requirements.
Natural language is inherently ambiguous, imprecise and incomplete; a natural language document is often redundant, and several classes of terminological problems (e.g., jargon or specialist terms) can arise to make communication difficult [2]. Although it has been proven that natural language processing with holistic objectives is a very complex task, it is possible to extract sufficient meaning from NL sentences to produce reliable models. Complexities of language range from simple synonyms and antonyms to such complex issues as idioms, anaphoric relations or metaphors. Efforts in this particular area have had some success in generating static object models from some complex NL requirements.
1.3.5.1. Linguistic analysis
Linguistic analysis studies NL text at different linguistic levels, i.e. words, sentence and meaning [1].
(i) Word-tagging analyses how a word is used in a sentence. In particular, the role of a word can change from one sentence to another depending on context (e.g. "light" can be used as noun, verb, adjective and adverb, and "while" can be used as preposition, conjunction, verb and noun). Tagging techniques are used to specify the word-form of each single word in a sentence, and each word is tagged with a Part Of Speech (POS), e.g. a NN1 tag would denote a singular noun, while VBB would signify the base form of a verb [1].
(ii) Syntactic analysis applies phrase marker, or labeled bracketing, techniques to segment NL into phrases, clauses and sentences, so that the NL is delineated by syntactic/grammatical annotations. Hence we can show how words are grouped and connected to each other in a sentence [1].
(iii) Semantic analysis is the study of meaning.
It uses discourse annotation techniques to analyze open-class or content words and closed-class words (i.e. prepositions, conjunctions, pronouns). The POS tags and syntactic elements mentioned previously can be linked in the NL text to create relationships.
Applying these linguistic analysis techniques, NLP tools can carry out morphological processing, syntactic processing and semantic processing. The processing of NL text can be supported by a Semantic Network (SN) and corpora that provide a knowledge base for text analysis.
The difficulty of OOA is not just due to the ambiguity and complexity of NL itself, but also to the gap in meaning between NL concepts and OO concepts [1].
1.3.6. From NLP to UML Model Creation
After NLP the sentences are simplified in order to make the identification of UML model elements from NL elements easy. Simple heuristics are used to identify UML model elements from natural language text (see Chapter 7):
* Nouns indicate a class
* Verbs indicate an operation
* Possessive relationships and verbs like "to have", "identify" and "denote" indicate attributes
* Determiners are used to identify the multiplicity of roles in associations
1.5. Plan of the thesis
In Chapter 2 we present a brief survey of previous work and of work similar to ours. Chapters 3, 4, 5, 6 and 7 describe the theoretical basis for Auto Modeler. Chapter 8 describes the architecture of Auto Modeler. In Chapter 9 we describe Auto Modeler in action with a case study. In Chapter 10 we present conclusions.
2. Literature Survey
The first relevant published technique attempting a systematic procedure to produce models from NL requirements was Abbott's. Abbott (1983) proposes a linguistic-based method for analyzing software requirements, expressed in English, to derive basic data types and operations [1].
This approach was further developed by Booch (1986).
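The noun and verb heuristics that recur in this line of work (nouns suggest classes, verbs suggest operations) can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used by Auto Modeler or by any of the surveyed tools: the sentences, the tag names `DET`, `NOUN` and `VERB`, and the function `extract_candidates` are all assumptions made for this example, and the input is tagged by hand rather than by a real POS tagger.

```python
from collections import defaultdict

# Hypothetical, hand-tagged toy input: (word, part-of-speech) pairs.
# A real system would obtain these tags from a POS tagger.
TAGGED_SENTENCES = [
    [("the", "DET"), ("librarian", "NOUN"), ("issues", "VERB"),
     ("a", "DET"), ("book", "NOUN")],
    [("a", "DET"), ("member", "NOUN"), ("returns", "VERB"),
     ("the", "DET"), ("book", "NOUN")],
]


def extract_candidates(tagged_sentences):
    """Apply the heuristics: noun -> candidate class, verb -> candidate operation.

    Each verb is attached to the nearest preceding noun in the same
    sentence, which is a crude stand-in for finding the subject.
    """
    model = defaultdict(set)
    for sentence in tagged_sentences:
        current_class = None
        for word, tag in sentence:
            if tag == "NOUN":
                current_class = word.capitalize()
                model[current_class]  # register the class even if it has no operations
            elif tag == "VERB" and current_class is not None:
                model[current_class].add(word)
    return {name: sorted(operations) for name, operations in model.items()}


candidates = extract_candidates(TAGGED_SENTENCES)
for class_name, operations in candidates.items():
    print(class_name, operations)
```

The sketch deliberately ignores lemmatization (it keeps "issues" rather than "issue"), attributes and multiplicities; its output is a list of candidates that an analyst would still need to review.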
Booch describes an Object-Oriented Design method where nouns in the problem description suggest objects and classes of objects, and verbs suggest operations [1].
Saeki et al. (1987) describe a process of incrementally constructing software modules from object-oriented specifications obtained from informal natural language requirements. Their system analyses the informal requirements one sentence at a time. Nouns and verbs are automatically extracted from the informal requirements, but the system cannot determine which words are relevant for the construction of the formal specification. Hence an important role is played by the human analyst, who reviews and refines the system results manually after each sentence is processed [1].
Dunn and Orlowska (1990) describe a natural language interpreter for the construction of NIAM (Nijssen's, or Natural-language, Information Analysis Method) conceptual schemas. The construction of conceptual schemas involves allocating surface objects to entity types (semantic classes) and the identification of elementary fact types. The system accepts declarative sentences only and uses grammar rules and a dictionary for type allocation and the identification of elementary fact types [1].
Meziane (1994) implemented a system for the identification of VDM data types and simple operations from natural language software requirements. The system first generates an Entity-Relationship Model (ERM) from the input text and then generates VDM data types from the ERM [1].
Mich and Garigliano (1994) and Mich (1996) describe an NL-based prototype system, NL-OOPS, that is aimed at the generation of object-oriented analysis models from natural language specifications. This system demonstrated how a large scale NLP system called LOLITA can be used to support the OO analysis stage [1].
V. Ambriola and V. Gervasi [4] have developed CIRCE, an environment for the analysis of natural language requirements.
It is based on the concept of successive transformations that are applied to the requirements, in order to obtain concrete (i.e., rendered) views of models extracted from the requirements. CIRCE uses CICO, a domain-based, fuzzy-matching parser, which parses the requirements document and transforms it into an abstract parse tree. This parse tree is encoded as tuples and stored in a repository by CICO. A group of related tuples constitutes a T-Model. CIRCE uses internal tools to refine the encoded tuples, called extensional knowledge, and uses knowledge about the basic behavior of software systems, called intensional knowledge and derived from modelers, to further enrich the tuple space. When a specific concrete view on the requirements is desired, a projector is called to build an abstract view of the data from the tuple space. A translator then converts the abstract view to a concrete view. In [5] V. Ambriola and V. Gervasi describe their experience of automatic synthesis of UML diagrams from natural language requirement specifications using their CIRCE environment.
Delisle et al., in their project DIPETT-HAIKU, capture candidate objects, linguistically differentiating between Subjects (S) and Objects (O), and processes, Verbs (V), using the syntactic S-V-O sentence structure. This work also suggests that candidate attributes can be found in the noun modifier in compound nouns, e.g. "reserved" is the value of an attribute of "reserved book" [1].
Harmain and Gaizauskas developed an NLP-based CASE tool, CM-Builder [2, 3], which automatically constructs an initial class model from NL text. It captures candidate classes, rather than candidate objects.
Börstler constructs an object model automatically based on pre-specified key words in a use case description. The verbs in the key words are transformed to behaviors and nouns are transformed to objects [1].
Overmyer and Rambow developed an NLP system to construct UML class diagrams from NL descriptions.
Both these efforts require user interaction to identify OO concepts [1].
The prototype tool developed by Perez-Gonzalez and Kalita supports automatic OO modeling from NL problem descriptions into UML notations, and produces both static and dynamic views. The underlying methodology includes theta roles and semi-natural language [1].
3. Software Requirements Engineering
Software requirements engineering is the science and discipline concerned with establishing and documenting software requirements [6]. It consists of:
* Software requirements elicitation: the process through which the customers (buyers and/or users) and the developer (contractor) of a software system discover, review, articulate, and understand the users' needs and the constraints on the software and the development activity.
* Software requirements analysis: the process of analyzing the customers' and users' needs to arrive at a definition of software requirements.
* Software requirements specification: the development of a document that clearly and precisely records each of the requirements of the software system.
* Software requirements verification: the process of ensuring that the software requirements specification is in compliance with the system requirements, conforms to document standards of the requirements phase, and is an adequate basis for the architectural (preliminary) design phase.
* Software requirements management: the planning and controlling of the requirements elicitation, specification, analysis, and verification activities.
In turn, system requirements engineering is the science and discipline concerned with analyzing and documenting system requirements.
It involves transforming an operational need into a system description, system performance parameters, and a system configuration. This is accomplished through the use of an iterative process of analysis, design, trade-off studies, and prototyping.
Software requirements engineering has a similar definition as the science and discipline concerned with analyzing and documenting software requirements. It involves partitioning system requirements into major subsystems and tasks, then allocating those subsystems or tasks to software. It also transforms allocated system requirements into a description of software requirements and performance parameters through the use of an iterative process of analysis, design, trade-off studies, and prototyping. A system can be considered a collection of hardware, software, data, people, facilities, and procedures organized to accomplish some common objectives. In software engineering, a system is a set of software programs that provide the cohesiveness and control of data that enables the system to solve the problem [6].
The major difference between system requirements engineering and software requirements engineering is that the origin of system requirements lies in user needs, while the origin of software requirements lies in the system requirements and/or specifications. Therefore, the system requirements engineer works with users and customers, eliciting their needs, schedules, and available resources, and must produce documents comprehensible to them as well as to management, software requirements engineers, and other system requirements engineers.
The software requirements engineer works with the system requirements documents and engineers, translating system documentation into software requirements which must be understandable by management and software designers as well as by software and system requirements engineers.
Accurate and timely communication must be ensured all along this chain if the software designers are to begin with a valid set of requirements [6].
4. Automated Software Engineering Tools
Software engineering is concerned with the analysis, design, implementation, testing, and maintenance of large software systems. Automated software engineering focuses on how to automate or partially automate these tasks to achieve significant improvements in quality and productivity.
Automated software engineering applies computation to software engineering activities. The goal is to partially or fully automate these activities, thereby significantly increasing both quality and productivity. This includes the study of techniques for constructing, understanding, adapting and modeling both software artifacts and processes. Automatic and collaborative systems are both important areas of automated software engineering, as are computational models of human software engineering activities. Knowledge representations and artificial intelligence techniques applicable in this field are of particular interest, as are formal techniques that support or provide theoretical foundations [7].
Automated software engineering approaches have been applied in many areas of software engineering. These include requirements definition, specification, architecture, design and synthesis, implementation, modeling, testing and quality assurance, verification and validation, maintenance and evolution, configuration management, deployment, reengineering, reuse and visualization.
Automated software engineering techniques have also been used in a broad range of domains and application areas including industrial software, embedded and real-time systems, aerospace, automotive and medical systems, Web-based systems and computer games [7].
Research into Automated Software Engineering includes the following areas:
* Automated reasoning techniques
* Component-based systems
* Computer-supported cooperative work
* Configuration management
* Domain modeling and meta-modeling
* Human-computer interaction
* Knowledge acquisition and management
* Maintenance and evolution
* Model-based software development
* Modeling language semantics
* Ontologies and methodologies
* Open systems development
* Product line architectures
* Program understanding
* Program synthesis
* Program transformation
* Re-engineering
* Requirements engineering
* Specification languages
* Software architecture and design
* Software visualization
* Testing, verification, and validation
* Tutoring, help, and documentation systems
5. Natural Language Processing
Natural language processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.
5.1. Language Processing
Language processing can be divided into two tasks [11]:
* Processing written text, using lexical, syntactic, and semantic knowledge of the language as well as any required real world information [11].
* Processing spoken language, using all the information needed above, plus additional knowledge about phonology as well as enough additional information to handle the further ambiguities that arise in speech [11].
5.2. Uses for NLP
5.2.1. User interfaces. Better than obscure command languages.
It would be nice if you could just tell the computer what you want it to do. Of course we are talking about a textual interface, not speech [10].
5.2.2. Knowledge acquisition. Programs that could read books and manuals or the newspaper, so you don't have to explicitly encode all of the knowledge they need to solve problems or do whatever they do [10].
5.2.3. Information retrieval. Find articles about a given topic. The program has to be able somehow to determine whether the articles match a given query [10].
5.2.4. Translation. It sure would be nice if machines could automatically translate from one language to another. This was one of the first tasks computers were applied to. It is very hard [10].
5.3. Linguistic levels of Analysis
Language obeys regularities and exhibits useful properties at a number of somewhat separable levels [10].
Think of language as transfer of information. It is a great deal more than that, but that is a good place to start. Suppose that the speaker has some meaning that they wish to convey to some hearer [10].
Speech (or gesture) imposes a linearity on the signal. All you can play with is the properties of a sequence of tokens. Actually, why tokens? Well, for one thing that makes it possible to learn [10].
So the other thing to play with is the order in which the tokens can occur.
So somehow, a meaning gets encoded as a sequence of tokens, each of which has some set of distinguishable properties, and is then interpreted by figuring out what meaning corresponds to those tokens in that order [10].
Another way to think about it is that the properties of the tokens and their sequence somehow elicit an understanding of the meaning. Language is a set of resources to enable us to share meanings, but isn't best thought of as a means for *encoding* meanings.
This is a sort of philosophical issue perhaps, but if this point of view is true, it makes much of the AI approach to NLP somewhat suspect, as it is really based on the "encoded meanings" view of language [10].
The lowest level is the actual properties of the signal stream:
* phonology: speech sounds and how we make them
* morphology: the structure of words
* syntax: how the sequences are structured
* semantics: meanings of the strings
There are important interfaces among all of these levels. For example, sometimes the meaning of sentences can determine how individual words are pronounced [10].
This many levels are obviously needed. But language turns out to be more clever than this. For example, language can be more efficient by not having to say the same thing twice, so we have pronouns and other ways of making use of what has already been said:
A bear went into the woods. It found a tree.
Also, since language is most often used among people who are in the same situation, it can make use of features of the situation:
* this/that
* you/me/they
* here/there
* now/then
The study of how language makes use of features of the context, whether it is the context created by a sequence of sentences or the actual context where the speech happens, is called pragmatics [10].
Another issue has to do with the fact that the simple model of language as information transfer is clearly not right. For one thing, we know there are at least the following three types of sentences:
* statements
* imperatives
* questions
And each of them can be used to do a different kind of thing. The first *might* be called information transfer. But what about imperatives? What about questions? To some degree the analysis of such sentences can involve the idea of a basic notion of meaning: speech acts [10].
There are other, higher levels of structuring that language exhibits. For example there is conversational structure, where people know when they get to talk in a conversation, and what constitutes a valid contribution.
There is narrative structure, whereby stories are put together in ways that make sense and are interesting. There is expository structure, which involves the way that informative texts (like encyclopedias) are arranged so as to usefully convey information. These issues blend off from linguistics into literature and library science, among other things [10].
Of course with hypertext and multi-media and virtual reality, these higher levels of structure are being explored in new ways [10].
5.4. Steps in Natural Language Understanding
The steps in the process of natural language understanding are [11]:
5.4.1. Morphological analysis. Individual words are analyzed into their components, and non-word tokens (such as punctuation) are separated from the words. For example, in the phrase "Bill's house" the proper noun "Bill" is separated from the possessive suffix "'s" [11].
5.4.2. Syntactic analysis. Linear sequences of words are transformed into structures that show how the words relate to one another. This parsing step converts the flat list of words of the sentence into a structure that defines the units represented by that list. Constraints imposed include word order ("manager the key" is an illegal constituent in the sentence "I gave the manager the key"), number agreement and case agreement [11].
5.4.3. Semantic analysis. The structures created by the syntactic analyzer are assigned meanings. In most universes, the sentence "Colorless green ideas sleep furiously" [Chomsky, 1957] would be rejected as semantically anomalous. This step must map individual words into appropriate objects in the knowledge base, and must create the correct structures to correspond to the way the meanings of the individual words combine with each other [11].
5.4.4. Discourse integration. The meaning of an individual sentence may depend on the sentences that precede it and may influence the sentences yet to come.
The entities involved in the sentence must either have been introduced explicitly or they must be related to entities that were. The overall discourse must be coherent [11].
5.4.5. Pragmatic analysis. The structure representing what was said is reinterpreted to determine what was actually meant [11].
5.5. Syntactic Processing
Syntactic parsing determines the structure of the sentence being analyzed. Syntactic analysis involves parsing the sentence to extract whatever information the word order contains. Syntactic parsing is computationally less expensive than semantic processing [10].
A grammar is a declarative representation that defines the syntactic facts of a language. The most common way to represent grammars is as a set of production rules, and the simplest structure for them to build is a parse tree, which records the rules and how they are matched [10].
Sometimes backtracking is required (e.g., "The horse raced past the barn fell"), and sometimes multiple interpretations may exist for the beginning of a sentence (e.g., "Have the students who missed the exam") [10].
Example: syntactic processing interprets the difference between "John hit Mary" and "Mary hit John".
5.6. Semantic Analysis
After (or sometimes in conjunction with) syntactic processing, we must still produce a representation of the meaning of a sentence, based upon the meanings of the words in it. The following steps are usually taken to do this [10]:
5.6.1. Lexical processing. Look up the individual words in a dictionary. It may not be possible to choose a single correct meaning, since there may be more than one. The process of determining the correct meaning of individual words is called word sense disambiguation or lexical disambiguation. For example, "I'll meet you at the diamond" can be understood since "at" requires either a time or a location. This usually leads to preference semantics when it is not clear which definition we should prefer [10].
5.6.2. Sentence-level processing.
There are several approaches to sentence-level processing. These include semantic grammars, case grammars, and conceptual dependencies [10].
Example: semantic processing determines the differences between such sentences as "The pig is in the pen" and "The ink is in the pen".
5.6.3. Discourse and Pragmatic Processing. To understand most sentences, it is necessary to know the discourse and pragmatic context in which they were uttered. In general, for a program to participate intelligently in a dialog, it must be able to represent its own beliefs about the world, as well as the beliefs of others (and their beliefs about its beliefs, and so on) [10].
The context of goals and plans can be used to aid understanding. Plan recognition has served as the basis for many understanding programs; PAM is an early example [10].
5.7. Issues in Syntax
For various reasons, a lot of attention in computational linguistics has been paid to syntax. Partly this has to do with the fact that real linguists have spent a lot of work on it, and partly because it needs to be done before just about anything else can be done. I won't talk much about morphology. We will assume that words can be associated with a set of features or properties. For example the word "dog" is a noun, it is singular, and its meaning involves a kind of animal. The word "dogs" is related, obviously, but has the property of being plural. The word "eat" is a verb, it is in what we might call the base form, and it denotes a particular kind of action. The word "ate" is related; it is in the past tense form. You can imagine, I'm sure, that the techniques of knowledge representation that we have looked at can be applied to the problem of representing facts about the properties and relations among words.
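As a rough illustration of how such facts about words might be represented, the following Python sketch stores the four example words as lexical entries with a category, a lemma and a small feature set. The class `LexicalEntry`, the `LEXICON` table, the feature names and the helper `related_forms` are hypothetical names chosen for this example; they are one possible encoding, not a representation taken from the source material.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """One word form with the kinds of properties discussed in the text."""
    form: str       # the surface word, e.g. "dogs"
    lemma: str      # the base word it is related to, e.g. "dog"
    category: str   # part of speech, e.g. "noun"
    features: dict = field(default_factory=dict)


# Hypothetical mini-lexicon holding only the four example words.
LEXICON = {
    entry.form: entry
    for entry in [
        LexicalEntry("dog", "dog", "noun", {"number": "singular"}),
        LexicalEntry("dogs", "dog", "noun", {"number": "plural"}),
        LexicalEntry("eat", "eat", "verb", {"form": "base"}),
        LexicalEntry("ate", "eat", "verb", {"tense": "past"}),
    ]
}


def related_forms(word):
    """Return the other word forms in the lexicon that share this word's lemma."""
    lemma = LEXICON[word].lemma
    return sorted(
        form for form, entry in LEXICON.items()
        if entry.lemma == lemma and form != word
    )


print(LEXICON["dogs"].category, LEXICON["dogs"].features)
print(related_forms("dogs"))
print(related_forms("eat"))
```

Linking forms through a shared lemma is only one way to record that "dogs" is related to "dog"; a fuller lexicon would also need the meaning of each word, which this sketch leaves out.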
The key observation in the theory of syntax is that the words in a sentence can be more or less naturally grouped into what are called phrases, and those phrases can often be treated as a unit [11].
So in the sentence "The dog chased the bear", the sequence "the dog" forms a natural unit. The sequence "chased the bear" is a natural unit, as is "the bear" [11].
Why do I say that "the dog" is a natural unit? Well, one thing is that I can replace it by another sequence that has the same referent, or a related referent. For example I could replace it by [11]:
* Snoopy (a name)
* It (a pronoun)
* My brother's favorite pet (a more complex description)
What about "chased the bear"? Again, I could replace it by:
* died (a single word)
* was hit by a truck (a more complex event)
This basic structure, in English, is sometimes called the subject-predicate structure. The subject is a nominal, something that can refer to an object or thing; the predicate is a verb phrase, which describes an action or event. Of course, as in the example, the verb phrase can also contain other constituents, for example another nominal [11].
These phrases also have structure. For example a noun phrase (a kind of nominal) can have a determiner, zero or more adjectives, and a noun, maybe followed by another phrase, like:
the big dog that ate my homework
Verb phrases can have complicated verb groups like:
will not be eaten
Syntactic theories try to predict and explain what patterns are used in a language. Sometimes this involves figuring out what patterns just don't work. For example the following sentences have something wrong with them [11]:
* the dogs runs home
* he died the book
* she saw himself in the mirror
* they told it to she
Figuring out exactly what is wrong with such sentences allows linguists to create theories that help understand the way that sentences