https://ir.dila.edu.tw//handle/123456789/975
Title: | 《大正藏》經文中大範圍的文字重用現象之偵測與分析 Detection and Analysis of Textual Reuse in Taishō Tripiṭaka |
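Author: | 韓東霖 | Keywords: | Textual Reuse (文字重用); Taishō Tripiṭaka (大正新脩大藏經); Local Alignment; Quantitative Analysis (量化分析); Digital Humanities (數位人文) | Date issued: | July 2020 | Abstract: | One of the major changes in humanities research in recent years has been the surge of interest in digital humanities. Digital humanities methods exploit the computer's ability to perform large volumes of computation and comparison at high speed, together with the large body of content digitized in recent years, to address large-scale questions that traditional humanities research finds difficult to handle. Among these questions, textual reuse, a phenomenon in which texts quote one another, has attracted attention in recent years. Textual reuse takes the form of allusion, paraphrase, or even verbatim quotation, and occurs when an author borrows or reuses the writing of an earlier or contemporary author. By tracing textual reuse, we can uncover implicit citations between scriptures and their historical transmission. Textual reuse is very frequent in Buddhist texts, yet studies of it across a large range of scriptures are rare. In this study we therefore propose an algorithm that effectively detects textual reuse and apply it to Buddhist scriptures, which have rarely been analyzed at large scale, with the aim of finding in a single pass the reuse that is distinctive or intentional between scriptures. We present the comparison results statistically and quantitatively, classify and filter the repeated passages, compute the reuse ratio between texts, and compare and verify the results against the existing literature. The data source is the Taishō Shinshū Daizōkyō (《大正新脩大藏經》) as included in the electronic Buddhist text collection produced and distributed by the Chinese Buddhist Electronic Text Association (中華電子佛典協會, CBETA). Sentences are compared one by one with the Local Alignment algorithm to find repeated passages between texts whose reused length is sufficiently long. We then examine the results along two dimensions, reuse length and reuse frequency, classify and filter them, and aggregate them upward to obtain the reuse ratio formed by passages of distinctive or intentional reuse between texts. The results show that the scriptures contain many strikingly long repeated passages, most of them from scripture catalogues and Buddha-name sutras; the large number of highly similar repeated passages, by contrast, are closer to the technical terms and idiomatic expressions of Buddhist texts, and are built up from many different similar short phrases stacked one after another. After removing these highly similar repeated passages, the reuse ratios between texts show that distinctive or intentional reuse comes mostly from different translations of the same text; we also find many mantra and ritual-manual texts and commentarial texts with high reuse ratios. There are also many text pairs that belong to different categories yet have very high reuse ratios, which merit deeper study. The introduction of Buddhism into China prompted a large number of translations of Buddhist scriptures. These texts were later collected into Buddhist canons. Today the Taishō Tripiṭaka is the main source for modern scholars of Buddhism. This large body of texts raises many issues worth studying. One research topic widely discussed in recent years is textual reuse between texts. By analyzing textual reuse, we can discover implicit citations and historical transmission between scriptures. However, because of the immense size of the Taishō Tripiṭaka, studies of textual reuse across the whole canon are rare. In recent years, with the development of information technology, Digital Humanities has become an emerging topic in the traditional humanities research community. 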
The digital methods use the computer’s high-speed computation and precise comparison capabilities to handle large-scale tasks that are difficult to complete with traditional humanities research methods. In this study, we propose an effective algorithm that detects and analyzes textual reuse in the Taishō Tripiṭaka and calculates the textual reuse ratio between texts. We then compare the results of our algorithm with those of existing studies. Our method is as follows: (1) take the XML files of the whole Taishō Tripiṭaka as the main material for the study; (2) split the texts into sentences; (3) pair sentences for a preliminary pairwise comparison, and rule out sentence pairs that have fewer than a preset number of characters in common; (4) run the Local Alignment algorithm, which is commonly used to align long DNA sequences, to identify repeated passages between sentences in the Taishō Tripiṭaka. The statistical analysis shows that: (1) extremely long repeated passages between texts often occur in texts related to Tripiṭaka catalogues and lists of “Buddhas’ Names”; (2) the huge number of similar patterns between texts can be understood as idioms or common usages in Buddhist texts; (3) in terms of the proportion of reused paragraphs between texts, different translations of the same text tend to produce higher textual reuse percentages. However, we have also found many texts that are categorized differently but share many similar paragraphs, some of which have not been identified in previous studies. These results provide interesting clues for future research. |
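Steps (3) and (4) of the method described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The function names `shared_char_count`, `local_align`, and `reuse_candidates`, the threshold `MIN_SHARED`, and the scoring values (match 2, mismatch -1, gap -1) are all hypothetical, since the abstract does not give the values actually used. Local alignment is written here as the standard Smith-Waterman dynamic program, which the abstract does not name explicitly.

```python
from collections import Counter

MIN_SHARED = 4  # assumed prefilter threshold; the thesis's preset value is not stated


def shared_char_count(a: str, b: str) -> int:
    """Count characters two sentences have in common (multiset overlap)."""
    return sum((Counter(a) & Counter(b)).values())


def local_align(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Smith-Waterman local alignment.

    Returns (best_score, aligned_a, aligned_b); gaps are shown as '-'.
    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; cells never drop below zero, which is what
    # makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def reuse_candidates(sents_a, sents_b, min_shared: int = MIN_SHARED):
    """Prefilter sentence pairs by shared characters, then align the survivors."""
    for sa in sents_a:
        for sb in sents_b:
            if shared_char_count(sa, sb) >= min_shared:
                yield sa, sb, local_align(sa, sb)
```

For example, `local_align("一時佛在舍衛國祇樹給孤獨園", "爾時佛在舍衛國祇樹給孤獨園")` aligns the twelve characters the two sentences share from 時 onward. The cheap character-overlap prefilter matters because the alignment itself costs time proportional to the product of the two sentence lengths, which adds up quickly over every sentence pair in a canon-sized corpus.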
Description: | Master's thesis | URI: | http://172.27.2.131/handle/123456789/975 |
Appears in: | Department of Buddhist Studies |