What is Parsing?

Parsing is the process of converting formatted text into a data structure. A data structure type can be any suitable representation of the information engraved in the source text. Tree type is a common and standard choice for XML parsing, HTML parsing, JSON parsing, and any programming language parsing. The output tree is called Parse Tree or Abstract Syntax Tree (AST). In HTML context, it is called Document Object Model (DOM). A CSV file parsing can result in a List of List of values or aList of Record objects. Graph Type is a choice for natural language parsing. A piece of program that does parsing is called Parser.

Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information (p-values).[citation needed] Some parsing algorithms may generate a parse forest or list of parse trees for a syntactically ambiguous input.[2] The term is also used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) "in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc."[1] This term is especially common when discussing which linguistic cues help speakers interpret garden-path sentences. Within computer science, the term is used in the analysis of computer languages, referring to the syntactic analysis of the input code into its component parts in order to facilitate the writing of compilers and interpreters. The term may also be used to describe a split or separation.

Types of Parsers

LALR(1) Parser

In computer science, an LALR parser[a] or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, rightmost derivation.) The LALR parser was invented by Frank DeRemer in his 1969 PhD dissertation, Practical Translators for LR(k) languages,[1] in his treatment of the practical difficulties at that time of implementing LR(1) parsers. He showed that the LALR parser has more language recognition power than the LR(0) parser, while requiring the same number of states as the LR(0) parser for a language that can be recognized by both parsers. This makes the LALR parser a memory-efficient alternative to the LR(1) parser for languages that are LALR. It was also proven that there exist LR(1) languages that are not LALR. Despite this weakness, the power of the LALR parser is sufficient for many mainstream computer languages,[2] including Java,[3] though the reference grammars for many languages fail to be LALR due to being ambiguous.[2] The original dissertation gave no algorithm for constructing such a parser given a formal grammar. The first algorithms for LALR parser generation were published in 1973.[4] In 1982, DeRemer and Tom Pennello published an algorithm that generated highly memory-efficient LALR parsers.[5] LALR parsers can be automatically generated from a grammar by an LALR parser generator such as Yacc or GNU Bison. The automatically generated code may be augmented by hand-written code to augment the power of the resulting parser.

Parser Visualizer

What is Parsing?

Types of Parsers

LALR(1) Parser

SLR(1) Parser

Try The Visualizer!

Team Members

Swarga Raj Dutta

Ritika Saxena