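Quantitative Analysis of Translation Styles and Automatic Similar Phrases Identification of the Madhyama-āgama and the Ekottarika-āgama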

董惠珠

DC Field	Value	Language
dc.contributor.advisor	洪振洲	en_US
dc.contributor.author	董惠珠	en_US
dc.date.accessioned	2020-04-07T01:11:00Z	-
dc.date.available	2020-04-07T01:11:00Z	-
dc.date.issued	2018-01	-
dc.identifier.uri	http://172.27.2.131/handle/123456789/798	-
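dc.description.abstract	In the Taishō canon (《大正藏》), the translator of both T 26, the Madhyama-āgama (《中阿含經》), and T 125, the Ekottarika-āgama (《增壹阿含經》), is recorded as Saṅghadeva (僧伽提婆). No scholar has yet disputed the recorded translator of the extant Madhyama-āgama, but views on the recorded translator of the extant Ekottarika-āgama differ, and no consensus has been reached. This study uses statistical, quantitative analysis to compare the translation styles of the two texts and thereby examine whether the extant Madhyama-āgama and Ekottarika-āgama are the work of the same translator. The method is as follows: the texts are segmented with the variable length n-gram (VL n-gram) method; style-feature terms are selected with a suitable filtering threshold; and principal components analysis (PCA) is then applied to observe whether the translation styles of the two texts are consistent. The analysis shows that the translation styles of the two texts differ significantly. The study also manually compared the many style-feature terms already identified, looking for similar phrases (相似斷詞), that is, words or phrases that are worded differently in the two texts but are similar in meaning. This manual examination yielded many examples showing that the stylistic differences between the two texts are influenced by the translators' habits of word choice. The results indicate that the extant Chinese translations of the Madhyama-āgama and the Ekottarika-āgama are very likely not the work of the same translator. Because manual comparison demands a large number of working hours, the study also sought a method for automatically identifying similar phrases, both to improve research efficiency and to meet future needs for comparing very large sets of phrases. We use the longest common subsequence (LCS) to measure the similarity of each pair of phrases. The experiments show that, although the effectiveness of this measure is not pronounced, it remains a usable method for comparing large numbers of phrases, and its output may contain key clues that scholars can use in further research.	en_US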
dc.description.abstract	In the Taishō Tripiṭaka, both the Madhyama-āgama (T 26) and the Ekottarika-āgama (T 125) are attributed to the same translator, Gautama Saṅghadeva. So far no one has disputed that Gautama Saṅghadeva translated the Madhyama-āgama, but scholars disagree about the translator of the Ekottarika-āgama. This study analyzes the translation styles of the Madhyama-āgama and the Ekottarika-āgama with quantitative methods and discusses whether the two collections are the work of the same translator. The research method is as follows: (1) the variable length n-gram (VL n-gram) method is used to split the texts of T 26 and T 125 into shorter segments, called grams; (2) grams that appear in at least a chosen minimum number of fascicles (documents) are adopted as "style features"; and (3) principal components analysis (PCA) is applied to the frequencies of the style features in T 26 and T 125 to assess whether the translation styles of the two collections are consistent. The statistical analysis shows that the translation styles of the two collections differ significantly. To strengthen this result, we manually examined the style features of the two collections for phrases that are worded differently but share similar meanings. This comparison yielded many examples indicating that the differences in translation style between the two collections are indeed shaped by the translators' word choices. These results further support the conclusion that the Madhyama-āgama and the Ekottarika-āgama are probably not the work of the same translator. Because manual comparison requires a great many man-hours, this study also attempts to identify similar phrases automatically, in order to reduce the man-hours needed and improve research efficiency.
We use the longest common subsequence (LCS) to measure the degree of similarity between two phrases. The experimental results show that although LCS does not perform strongly, it is still a useful method for comparing large sets of phrases, and some of its computed results may suggest clues for further scholarly research.	en_US
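dc.description.tableofcontents	Abstract (Chinese); Abstract (English); Acknowledgements; Contents; List of Tables; List of Figures; Chapter 1 Introduction; Chapter 2 Literature Review; Chapter 3 Research Methods for Textual Translation Style: (1) Text Sources, (2) Corpus Processing, (3) Feature Selection, (4) Running Principal Components Analysis and Plotting the Results; Chapter 4 Experimental Analysis of Textual Translation Style: (1) PCA Results with the Minimum Fascicle Threshold Set at 20, (2) PCA Results with the Minimum Fascicle Threshold Set at 40 and 60, (3) PCA Results with the Minimum Fascicle Threshold Set at 80, 100, and 111, (4) Summary of the PCA Results, (5) Gram Analysis of the PCA Results; Chapter 5 Identifying Similar Phrases: (1) What Are "Similar Phrases"?, (2) Phrase Groups for Comparison, (3) Manual Comparison, (4) Interpretation, (5) Text Style Analysis Using "Similar Phrases"; Chapter 6 Research Methods for Automatic Extraction of Similar Phrases: (1) Longest Common Subsequence (LCS), (2) Precision, Recall, and F1-Measure, (3) K-Fold Cross-Validation, (4) Synonym Corpus; Chapter 7 Experimental Analysis of Automatic Extraction of Similar Phrases: (1) Training the Optimal LCS Similarity Score, (2) Performance Evaluation and Analysis, (3) Adding Synonyms, (4) Performance Evaluation and Analysis after Adding Synonyms, (5) Summary of the Automatic Extraction Experiments; Chapter 8 Conclusion; References: 1. Buddhist Canonical Texts and Primary Sources (ordered by text number), 2. Chinese and Japanese Books, Papers, and Online Resources, 3. Western-Language Books, Papers, and Online Resources; Appendix 1 Compilation Dates and Abbreviations of the Historical Scripture Catalogues; Appendix 2 LCS Similarity Scores of Each Group of "Similar Phrases" before Adding Synonyms; Appendix 3 LCS Similarity Scores of Each Group of "Similar Phrases" after Adding Synonyms	en_US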
dc.language.iso	zh	en_US
dc.subject	Madhyama-āgama (中阿含經)	en_US
dc.subject	Ekottarika-āgama (增壹阿含經)	en_US
dc.subject	translation style	en_US
dc.subject	quantitative analysis	en_US
dc.subject	variable length n-gram	en_US
dc.subject	principal components analysis	en_US
dc.subject	longest common subsequence	en_US
dc.title	《中阿含經》與《增壹阿含經》之文本翻譯風格量化分析與相似斷詞自動化擷取	en_US
dc.title	Quantitative Analysis of Translation Styles and Automatic Similar Phrases Identification of the Madhyama-āgama and the Ekottarika-āgama	en_US
dc.type	thesis	en_US
item.grantfulltext	none	-
item.fulltext	no fulltext	-
item.languageiso639-1	other	-
Appears in Collections:	Department of Buddhist Studies (佛教學系)
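The abstract's feature-selection and PCA pipeline can be illustrated with a minimal, standard-library-only Python sketch. It is not the thesis's implementation: it assumes each fascicle (卷) is one document, substitutes plain character n-grams of length 1 to `max_n` for the variable-length n-gram segmentation, uses relative frequencies as feature values, and finds the first two principal components by power iteration. The function names, parameters, and toy strings are hypothetical.

```python
import math
import random
from collections import Counter

def char_ngrams(text, max_n):
    """Yield every contiguous character n-gram of length 1..max_n.
    Assumption: a simplified stand-in for variable-length n-gram segmentation."""
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]

def style_features(fascicles, max_n=3, min_fascicles=2):
    """Keep grams found in at least `min_fascicles` fascicles; return the sorted
    feature list and a relative-frequency matrix (one row per fascicle)."""
    counts = [Counter(char_ngrams(text, max_n)) for text in fascicles]
    # Iterating a Counter yields each distinct gram once, so this counts fascicles.
    doc_freq = Counter(gram for c in counts for gram in c)
    features = sorted(g for g, df in doc_freq.items() if df >= min_fascicles)
    matrix = []
    for c in counts:
        total = sum(c.values()) or 1
        matrix.append([c[g] / total for g in features])
    return features, matrix

def pca_2d(matrix, iters=500, seed=0):
    """Project each row (at least two rows) onto the first two principal
    components, found by power iteration with deflation on the sample covariance."""
    rows, cols = len(matrix), len(matrix[0])
    means = [sum(r[j] for r in matrix) / rows for j in range(cols)]
    centered = [[r[j] - means[j] for j in range(cols)] for r in matrix]
    cov = [[sum(r[a] * r[b] for r in centered) / (rows - 1) for b in range(cols)]
           for a in range(cols)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.random() for _ in range(cols)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(cols)) for a in range(cols)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(cols)) for a in range(cols))
        # Remove this component so the next pass converges to the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(cols)] for a in range(cols)]
        components.append(v)
    return [[sum(r[j] * comp[j] for j in range(cols)) for comp in components]
            for r in centered]

# Hypothetical toy "fascicles", invented for illustration only.
toy = ["我聞如是一時佛", "我聞如是佛告比丘", "聞如是一時佛", "聞如是佛告比丘"]
features, matrix = style_features(toy, max_n=2, min_fascicles=2)
coords = pca_2d(matrix)
```

In practice a library PCA (for example scikit-learn's `PCA`) would replace the hand-written power iteration; plotting each fascicle's two coordinates, coloured by collection, gives the kind of plot the study inspects for separation.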

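The LCS measure named in the abstract can be sketched as below. The dynamic program computes the standard longest-common-subsequence length; turning that length into a similarity score with 2·LCS/(|a|+|b|) and pairing grams across collections by a fixed threshold are assumptions made for illustration, since the record does not state the thesis's normalization. The helper names and example phrases are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (characters need not
    be contiguous), via the standard O(len(a) * len(b)) dynamic program."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization: 2 * LCS / (len(a) + len(b)), in [0, 1]."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))

def similar_pairs(grams_a, grams_b, threshold):
    """Hypothetical helper: return (gram_a, gram_b, score) for every
    cross-collection pair whose score reaches the threshold."""
    pairs = []
    for x in grams_a:
        for y in grams_b:
            score = lcs_similarity(x, y)
            if score >= threshold:
                pairs.append((x, y, score))
    return pairs

print(lcs_length("ABCBDAB", "BDCABA"))             # 4
print(round(lcs_similarity("聞如是", "我聞如是"), 3))  # 0.857
```

Because LCS tolerates gaps, it rewards phrases that share characters in the same order even when extra characters are inserted, which suits near-synonymous wordings of different lengths; it ignores meaning, which is one reason a pure LCS score can miss similar phrases that share few characters.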

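The evaluation outline in Chapters 6 and 7 (precision, recall, F1-measure, K-fold cross-validation, training the best LCS similarity score) suggests a threshold-tuning loop like the following sketch. It assumes each labelled example is a precomputed LCS similarity score paired with a manual judgement of whether the two phrases are similar, and that the threshold is chosen from a fixed grid by maximizing F1 on the training folds; the grid, fold assignment, function names, and toy data are hypothetical.

```python
import random

def prf1(examples, threshold):
    """Precision, recall and F1 when pairs scoring >= threshold are predicted
    similar. `examples` is a list of (score, is_similar) tuples."""
    tp = sum(1 for s, y in examples if s >= threshold and y)
    fp = sum(1 for s, y in examples if s >= threshold and not y)
    fn = sum(1 for s, y in examples if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def k_fold_thresholds(examples, k=5, grid=None, seed=0):
    """For each of k folds, pick the grid threshold with the best F1 on the other
    k-1 folds, then report (threshold, (P, R, F1)) on the held-out fold."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    grid = grid or [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    folds = [data[i::k] for i in range(k)]
    results = []
    for i, held_out in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        best = max(grid, key=lambda t: prf1(train, t)[2])  # first best on ties
        results.append((best, prf1(held_out, best)))
    return results

# Hypothetical labelled (LCS score, judged similar?) pairs, for illustration only.
labelled = [(0.9, True), (0.85, True), (0.8, True), (0.75, True), (0.7, True),
            (0.35, False), (0.3, False), (0.25, False), (0.2, False), (0.1, False)]
for threshold, (p, r, f1) in k_fold_thresholds(labelled, k=5):
    print(f"threshold={threshold:.2f} P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Averaging the held-out scores across folds gives a less optimistic estimate than scoring the threshold on the same pairs it was tuned on.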