. . A Sanskrit Compound Processor . . . . . Amba Kulkarni Anil Kumar Department of Sanskrit Studies University of Hyderabad Hyderabad June 21, 2012 . . . . . . 1 / 48
Sanskrit is very rich in compound formation. vedaved¯ a˙ ngatatvaj˜ na pravaramukut .aman .imaric¯ ıma˜ njar¯ ıcayacarcitacaran .ayugula jal¯ adivy¯ ıtv¯ abh¯ apakapr .thiv¯ avapratiyogipr .thiv¯ ıtvavat¯ ı . . . . . . 2 / 48
Sanskrit Compounds . . It is a single word (ekapadam). It has a single case suffix (ekavibhaktikam) with an exception of aluk compounds such as yudhis .t .irah ., where there is no deletion of case suffix of the first component. It has a single accent(ekasvarah .). The order of components in a compound is fixed. No words can be inserted in between the compounds. The compound formation is binary with an exception of dvandva and bahupada bahuvr¯ ıhi. Euphonic change (sandhi) is a must in a compound formation. Constituents of a compound may require special gender or number different from their default gender and number. e.g. p¯ an .ip¯ adam, p¯ acik¯ abh¯ aryah ., etc. . . . . . . . . . . . 3 / 48
Syntactic classification sup¯ a ˙ m sup¯ a ti˙ n¯ a n¯ amn¯ a dh¯ atun¯ atha ti˙ n¯ a ˙ m ti˙ n¯ a | subanteti vij˜ neyah . sam¯ asah . s .ad .vidhoh . budheh . || Subanta (noun) + Subanta (noun) ( r¯ ajapurus . ah . ) Subanta (noun) + Tinanta (verb) ( paryyabh¯ us . ayat ) Subanta (noun) + n¯ ama (nominal base) ( kumbhak¯ arah . ) Subanta (noun) + Dh¯ atu (verbal root) ( kat .apr¯ u ) . ¯ Tinanta (verb) + Subanta (noun) ( kr .ntavicaks . an a ) Tinanta (verb) + Tinanta (verb) ( kh¯ adata -modata) . . . . . . 4 / 48
Semantic classification The Sanskrit compounds are classified semantically into four major types: . Tatpurus .ah . . . . (Endocentric with head typically to the right) . . . . . . Bahuvr¯ ıhih . . . . (Exocentric) . . . . . . Dvandvah . . . . (Copulative) . . . . . . ıbh¯ Avyay¯ avah . . . . (Endocentric with head typically to the left and behaves as an indeclinable) . . . . . . . . . . . 5 / 48
Compound processor . . . Segmentation (Sam¯ asapadacchedah .) 1 . . . Constituency Parsing (S¯ amarthyanirdh¯ aran .am) 2 . . . Type Identification (Sam¯ asabhedanirdh¯ aran .am) 3 . . . Paraphrase generation (Vigrahav¯ akyanirm¯ an .am) 4 . . . . . . 6 / 48
. . . . . . . . . . . . . . . . . . . . . . . . . . . . Segmentation (Sam¯ asapadacchedah .) . . . Split a compound into its constituents. tapassv¯ adhy¯ ayaniratam is segmented as . tapas-sv¯ adhy¯ aya-niratam . . . . . . . . . . 7 / 48
. . . . . . . . . . . . . . . . . . . Segmentation (Sam¯ asapadacchedah .) . . . Split a compound into its constituents. tapassv¯ adhy¯ ayaniratam is segmented as . tapas-sv¯ adhy¯ aya-niratam . . . . . Constituency Parsing (S¯ amarthyaniradh¯ aran .am) . . . This module parses the segmented compound syntactically by pairing up the constituents in a certain order two at a time. tapas-sv¯ adhy¯ aya-niratam is parsed as . << tapas-sv¯ adhy¯ aya > -niratam > . . . . . . . . . . 7 / 48
. . . . . . . . . . Segmentation (Sam¯ asapadacchedah .) . . . Split a compound into its constituents. tapassv¯ adhy¯ ayaniratam is segmented as . tapas-sv¯ adhy¯ aya-niratam . . . . . Constituency Parsing (S¯ amarthyaniradh¯ aran .am) . . . This module parses the segmented compound syntactically by pairing up the constituents in a certain order two at a time. tapas-sv¯ adhy¯ aya-niratam is parsed as . << tapas-sv¯ adhy¯ aya > -niratam > . . . . . Type Identification (Sam¯ asabhedanirdh¯ aran .am) . . . Decide the type of a compound at each node of composition. . << tapas-sv¯ adhy¯ aya > Di-niratam > T7 . . . . . . . . . . 7 / 48
. Segmentation (Sam¯ asapadacchedah .) . . . Split a compound into its constituents. tapassv¯ adhy¯ ayaniratam is segmented as . tapas-sv¯ adhy¯ aya-niratam . . . . . Constituency Parsing (S¯ amarthyaniradh¯ aran .am) . . . This module parses the segmented compound syntactically by pairing up the constituents in a certain order two at a time. tapas-sv¯ adhy¯ aya-niratam is parsed as . << tapas-sv¯ adhy¯ aya > -niratam > . . . . . Type Identification (Sam¯ asabhedanirdh¯ aran .am) . . . Decide the type of a compound at each node of composition. . << tapas-sv¯ adhy¯ aya > Di-niratam > T7 . . . . . Paraphrase generation (Vigrahav¯ akyanirm¯ an .am) . . . tapah . ca sv¯ adhy¯ ayah . ca = tapassv¯ adhy¯ ayah . (= tat1) gloss: penance and self-study tasmin niratah . = tapassv¯ adhy¯ ayaniratah . . gloss: who is constantly engaged in penance and self-study . . . . . . . . . . 7 / 48
Compound processor . . . Segmentation (Sam¯ asapadacchedah . ) 1 . . . Constituency Parsing (S¯ amarthyaniradh¯ aran .am) 2 . . . Type Identification (Sam¯ asabhedanirdh¯ aran .am) 3 . . . Paraphrase generation (Vigrahav¯ akyanirm¯ an .am) 4 . . . . . . 8 / 48
Compound Segmenter The task of a segmenter is to split a given sequence of phonemes into a sequence of morphologically valid segments. . . . . . . 9 / 48
Compound Segmenter The task of a segmenter is to split a given sequence of phonemes into a sequence of morphologically valid segments. The compound formation involves a mandatory sandhi. . . . . . . 9 / 48
Compound Segmenter The task of a segmenter is to split a given sequence of phonemes into a sequence of morphologically valid segments. The compound formation involves a mandatory sandhi. Each sandhi rule is a triple (x, y, z) where y is the last letter of the first primitive, z is the first letter of the second primitive, and x is the letter sequence resulting from the euphonic combination. . . . . . . 9 / 48
Compound Segmenter The task of a segmenter is to split a given sequence of phonemes into a sequence of morphologically valid segments. The compound formation involves a mandatory sandhi. Each sandhi rule is a triple (x, y, z) where y is the last letter of the first primitive, z is the first letter of the second primitive, and x is the letter sequence resulting from the euphonic combination. For analysis, we reverse these rules of sandhi and produce y + z corresponding to a x . . . . . . . 9 / 48
Compound Segmenter The task of a segmenter is to split a given sequence of phonemes into a sequence of morphologically valid segments. The compound formation involves a mandatory sandhi. Each sandhi rule is a triple (x, y, z) where y is the last letter of the first primitive, z is the first letter of the second primitive, and x is the letter sequence resulting from the euphonic combination. For analysis, we reverse these rules of sandhi and produce y + z corresponding to a x . Only the sequences that are morphologically valid are selected. . . . . . . 9 / 48
Compound Segmenter The task of a segmenter is to split a given sequence of phonemes into a sequence of morphologically valid segments. The compound formation involves a mandatory sandhi. Each sandhi rule is a triple (x, y, z) where y is the last letter of the first primitive, z is the first letter of the second primitive, and x is the letter sequence resulting from the euphonic combination. For analysis, we reverse these rules of sandhi and produce y + z corresponding to a x . Only the sequences that are morphologically valid are selected. We follow GENerate-CONstrain-EVALuate paradigm attributed to the Optimality Theory for segmentation. . . . . . . 9 / 48
Compound Segmenter The task of a segmenter is to split a given sequence of phonemes into a sequence of morphologically valid segments. The compound formation involves a mandatory sandhi. Each sandhi rule is a triple (x, y, z) where y is the last letter of the first primitive, z is the first letter of the second primitive, and x is the letter sequence resulting from the euphonic combination. For analysis, we reverse these rules of sandhi and produce y + z corresponding to a x . Only the sequences that are morphologically valid are selected. We follow GENerate-CONstrain-EVALuate paradigm attributed to the Optimality Theory for segmentation. The Optimality Theory basically addresses the issue of generation. . . . . . . 9 / 48
. . . . . . . . . . . . . . . Flow-chart represetation of Compound Segmentation . . . . . . 10 / 48
. . . . . . . . . . . . Flow-chart represetation of Compound Segmentation The basic outline of the algorithm is: . . . 1 Recursively break a word at every possible position applying a sandhi rule and generate all possible candidates for the input. (17 segments) . . . . . . 10 / 48
. . . . . . . . . Flow-chart represetation of Compound Segmentation The basic outline of the algorithm is: . . . 1 Recursively break a word at every possible position applying a sandhi rule and generate all possible candidates for the input. (17 segments) . . . 2 Pass the constituents of all the candidates through the morph analyser. . . . . . . 10 / 48
Recommend
More recommend