SEANLP: Southeast Asia Natural Language Processing

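Southeast Asian language information processing

SEANLP currently provides the following features:

  • Thai:

    • TCC (Thai Character Cluster) segmentation
    • Syllable segmentation
    • Cascaded conditional random field (CRF) word segmentation
    • Single-layer CRF word segmentation
    • Fast dictionary-based word segmentation
    • Dictionary forward longest-match word segmentation
    • Dictionary backward longest-match word segmentation
    • Dictionary forward shortest-match word segmentation
    • Dictionary backward shortest-match word segmentation
    • Part-of-speech tagging
    • Sentence similarity
    • Keyword extraction
    • Automatic summarization
  • Vietnamese:

    • CRF word segmentation
    • Fast dictionary-based word segmentation
    • Dictionary forward longest-match word segmentation
    • Dictionary backward longest-match word segmentation
    • Dictionary forward shortest-match word segmentation
    • Dictionary backward shortest-match word segmentation
    • Part-of-speech tagging
    • Sentence similarity
    • Keyword extraction
    • Automatic summarization
  • Cambodian (Khmer):

    • KCC segmentation
    • CRF word segmentation
    • Fast dictionary-based word segmentation
    • Dictionary forward longest-match word segmentation
    • Dictionary backward longest-match word segmentation
    • Dictionary forward shortest-match word segmentation
    • Dictionary backward shortest-match word segmentation
    • Part-of-speech tagging
    • Sentence similarity
    • Keyword extraction
    • Automatic summarization
  • Lao:

    • Fast dictionary-based word segmentation
    • Dictionary forward longest-match word segmentation
    • Dictionary backward longest-match word segmentation
    • Dictionary forward shortest-match word segmentation
    • Dictionary backward shortest-match word segmentation
    • Part-of-speech tagging
    • Sentence similarity
    • Keyword extraction
    • Automatic summarization
  • Burmese:

    • Syllable segmentation
    • CRF word segmentation
    • Fast dictionary-based word segmentation
    • Dictionary forward longest-match word segmentation
    • Dictionary backward longest-match word segmentation
    • Dictionary forward shortest-match word segmentation
    • Dictionary backward shortest-match word segmentation
    • Sentence similarity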
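
The dictionary forward longest-match and shortest-match segmenters in the feature list can be illustrated with a minimal sketch. The class MatchingSketch, the method forwardMatch, and the toy Latin-letter dictionary are hypothetical and are not taken from SEANLP's source; the backward variants would scan from the end of the text instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy dictionary segmenter for illustration; not SEANLP's implementation. */
final class MatchingSketch {

    /**
     * Forward matching: at each position take the longest (or shortest)
     * dictionary word that starts there. If no word matches, emit a
     * single character and move on.
     */
    static List<String> forwardMatch(String text, Set<String> dict,
                                     int maxLen, boolean longest) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int limit = Math.min(maxLen, text.length() - i);
            int chosen = 1; // fallback: one character
            if (longest) {
                for (int len = limit; len >= 1; len--) {
                    if (dict.contains(text.substring(i, i + len))) {
                        chosen = len;
                        break;
                    }
                }
            } else {
                for (int len = 1; len <= limit; len++) {
                    if (dict.contains(text.substring(i, i + len))) {
                        chosen = len;
                        break;
                    }
                }
            }
            out.add(text.substring(i, i + chosen));
            i += chosen;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary; Latin letters stand in for real words.
        Set<String> dict = new HashSet<>(Arrays.asList("ab", "abc", "cd", "d"));
        System.out.println(forwardMatch("abcd", dict, 3, true));  // [abc, d]
        System.out.println(forwardMatch("abcd", dict, 3, false)); // [ab, cd]
    }
}
```

The two calls segment the same input differently, which is why the choice of matching strategy affects segmentation quality.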

Notes

  1. What is a TCC (Thai Character Cluster)? Borrowing the explanation from Wittawat Jitkrittum's alternative TCC segmentation implementation: a TCC, "proposed in Character Cluster Based Thai Information Retrieval, is a group of inseparable Thai characters. This inseparability derives from Thai writing system which is independent of any context. As a result, TCC can be determined by a simple list of rules describing e.g., what characters need to follow/precede other characters."

  2. Thai TCC and Khmer KCC segmentation are implemented with rules plus regular expressions and are relatively slow; for Thai TCC segmentation, see Wittawat Jitkrittum's alternative implementation.

  3. The Thai single-layer CRF segmentation model is much larger than the cascaded CRF segmentation model and needs a lot of memory to run (-Xmx > 2G).

  4. In Burmese syllable segmentation, the syllable dictionary mixes several encodings and fonts, and these differ in character ordering, so Burmese syllable segmentation is currently almost unusable.

  5. Burmese has no part-of-speech tagging, so Burmese keyword extraction is also unreliable.

  6. Among the word segmenters, the cascaded CRF gives the best results and shortest-match segmentation gives the worst.

  7. The stop-word lists are incomplete; they mainly cover Thai and Vietnamese.
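
To make the "rules plus regular expressions" idea behind TCC segmentation concrete, here is a deliberately simplified sketch. The single rule it uses (optional leading vowel, one consonant, then any dependent vowels or tone marks) is an assumption chosen for illustration; real TCC rule sets are longer, and TccSketch is a hypothetical class, not SEANLP's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified character-cluster splitter; NOT SEANLP's actual rule set. */
final class TccSketch {
    // Assumed rule: [leading vowel]? consonant [dependent vowel or mark]*
    //   U+0E40..U+0E44  vowels written before the consonant
    //   U+0E01..U+0E2E  consonants
    //   U+0E30..U+0E3A, U+0E47..U+0E4E  vowels and marks that follow it
    // Any character that does not start a cluster is emitted on its own.
    private static final Pattern CLUSTER = Pattern.compile(
            "[\\u0E40-\\u0E44]?[\\u0E01-\\u0E2E][\\u0E30-\\u0E3A\\u0E47-\\u0E4E]*|.",
            Pattern.DOTALL);

    static List<String> clusters(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = CLUSTER.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // KO KAI + SARA I + NO NU: the vowel sign stays with its consonant,
        // so three characters become two clusters.
        String word = "\u0E01\u0E34\u0E19";
        System.out.println(clusters(word).size());
    }
}
```

Because every cluster is found by a regex scan over the input, this style of implementation is simple to write but, as note 2 says of the real one, not especially fast.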

Download

Method 1: download the source and build it

You can download the project source and package it yourself:

wget https://github.com/zhaoshiyu/SEANLP/archive/master.zip
unzip master.zip
cd SEANLP-master
mvn clean package -Dmaven.test.skip=true

Or git clone the project:

git clone https://github.com/ZhaoShiyu/SEANLP.git
cd SEANLP
mvn clean package -Dmaven.test.skip=true

Note: the source downloaded this way does not include the Thai single-layer CRF segmentation model; if you need the full model, use Method 2.

Method 2: download the jar

Download SEANLP-1.1.0.jar, or use the models in seanlp-1.1.0-sources.jar.

Usage

All SEANLP functionality can be called through the utility class SEANLP. Calls take the form SEANLP.Language.function.

Memory requirements

The single-layer CRF Thai segmentation model is large and requires -Xmx > 2G.
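
As a rough guard, a program can check the JVM's maximum heap before it calls the single-layer CRF segmenter. The sketch below uses the standard Runtime.getRuntime().maxMemory() call; the class HeapCheck and the 2 GiB threshold constant are hypothetical names that mirror the -Xmx figure above. The reported value can be slightly lower than the -Xmx setting, so treat the check as approximate.

```java
/** Hypothetical helper: warns when the JVM heap looks too small. */
final class HeapCheck {
    /** 2 GiB, mirroring the -Xmx > 2G requirement stated above. */
    static final long TWO_GIB = 2L * 1024 * 1024 * 1024;

    /** True when the available maximum heap meets the requirement. */
    static boolean enoughFor(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    public static void main(String[] args) {
        // maxMemory() returns the most memory the JVM will try to use, in bytes.
        long max = Runtime.getRuntime().maxMemory();
        if (!enoughFor(max, TWO_GIB)) {
            System.err.println("Max heap is " + (max >> 20)
                    + " MiB; restart the JVM with a larger -Xmx value.");
        } else {
            System.out.println("Heap looks large enough: " + (max >> 20) + " MiB");
        }
    }
}
```

The heap size itself is set on the command line when the JVM starts, for example with -Xmx3g (the exact value is an assumption; anything above 2G matches the requirement).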

Demo

Demos can be found at the following locations:
1. Word segmentation and part-of-speech tagging
2. Sentence similarity
3. Keyword extraction and automatic summarization

1. Word segmentation and part-of-speech tagging:

package cn.edu.kmust.seanlp.demo;

import cn.edu.kmust.seanlp.SEANLP;

/**
 * Word segmentation demo
 * @author Zhao Shiyu
 *
 */
public class SegmentDemo {
	public static void main(String[] args) {
		// Thai word segmentation
		String thText = "ความสัมพันธ์ในทางเศรษฐกิจกับระบบความสัมพันธ์ทางกฎหมาย";
		System.out.println(SEANLP.Thai.syllableSegment(thText));
		System.out.println(SEANLP.Thai.dCRFSegment(thText));
		//System.out.println(SEANLP.Thai.gCRFSegment(thText));
		System.out.println(SEANLP.Thai.datSegment(thText));
		System.out.println(SEANLP.Thai.maxSegment(thText));
		System.out.println(SEANLP.Thai.minSegment(thText));
		System.out.println(SEANLP.Thai.reMaxSegment(thText));
		System.out.println(SEANLP.Thai.reMinSegment(thText));
		
		// Vietnamese word segmentation
		String viText = "Hệ thống tín dụng - ngân hàng cũng tăng trưởng khá, ngày càng giữ vai trò quan trọng trong cơ cấu kinh tế Thủ đô.";
		System.out.println(SEANLP.Vietnamese.crfSegment(viText));
		System.out.println(SEANLP.Vietnamese.datSegment(viText));
		System.out.println(SEANLP.Vietnamese.maxSegment(viText));
		System.out.println(SEANLP.Vietnamese.minSegment(viText));
		System.out.println(SEANLP.Vietnamese.reMaxSegment(viText));
		System.out.println(SEANLP.Vietnamese.reMinSegment(viText));
		
		// Cambodian (Khmer) word segmentation
		String khText = "ធážļតážģពិតនិងកážļរបន្ដគំរážļមកំហែងមកលើážĸ្នកកážļរពážļរសិទ្ធិមនážģស្សនៅកម្ពážģជážļ។របážļយកážļរណ៍នេះផ្ážĸែកលើកážļរស៊ើបážĸង្កេតតែ";
		System.out.println(SEANLP.Khmer.crfSegment(khText));
		System.out.println(SEANLP.Khmer.datSegment(khText));
		System.out.println(SEANLP.Khmer.maxSegment(khText));
		System.out.println(SEANLP.Khmer.minSegment(khText));
		System.out.println(SEANLP.Khmer.reMaxSegment(khText));
		System.out.println(SEANLP.Khmer.reMinSegment(khText));
		
		// Lao word segmentation
		String loText = "āē—āģˆāē˛āē™āē§āē´āē™āģ€āē„āēąāē™āģ€āē›āēąāē™āē›āē°āē—āē˛āē™āēšāģāēĨāē´āēĒāēąāē”āē­āēŊāē§āģ€āēŠāēĩāē§āēĩāģ€āē­āēĩāģāē­āē™.āē§āēĩ.āēāē¸āģˆāēĄāēšāģāēĨāē´āēĒāēąāē”āēāē˛āē™āēžāē´āēĄāē‚āē­āē‡āēŠāē˛āē§āē”āēąāē”.";
		System.out.println(SEANLP.Lao.datSegment(loText));
		System.out.println(SEANLP.Lao.maxSegment(loText));
		System.out.println(SEANLP.Lao.minSegment(loText));
		System.out.println(SEANLP.Lao.reMaxSegment(loText));
		System.out.println(SEANLP.Lao.reMinSegment(loText));
		
		// Burmese word segmentation
		String buText = "ကá€ļဆိုးကá€ļဇá€Ŧတá€Ŧကá€ļထိုကá€ēကá€ļနခိုကá€ļá€”á€žá€­á€¯á€¸á€†á€ąá€Ŧá€ē";
		System.out.println(SEANLP.Burmese.datSegment(buText));
		System.out.println(SEANLP.Burmese.maxSegment(buText));
		System.out.println(SEANLP.Burmese.minSegment(buText));
		System.out.println(SEANLP.Burmese.reMaxSegment(buText));
		System.out.println(SEANLP.Burmese.reMinSegment(buText));
		System.out.println(SEANLP.Burmese.syllableSegment(buText));
	}
}

2. Sentence similarity:

package cn.edu.kmust.seanlp.demo;

import cn.edu.kmust.seanlp.SEANLP;

/**
 * Sentence similarity demo
 * @author Zhao Shiyu
 *
 */
public class SimilarityDemo {
	public static void main(String[] args) {
		String thText = "ความสัมพันธ์ในทางเศรษฐกิจกับระบบความสัมพันธ์ทางกฎหมาย";
		String viText = "Hệ thống tín dụng - ngân hàng cũng tăng trưởng khá, ngày càng giữ vai trò quan trọng trong cơ cấu kinh tế Thủ đô.";
		String khText = "ធážļតážģពិតនិងកážļរបន្ដគំរážļមកំហែងមកលើážĸ្នកកážļរពážļរសិទ្ធិមនážģស្សនៅកម្ពážģជážļ។របážļយកážļរណ៍នេះផ្ážĸែកលើកážļរស៊ើបážĸង្កេតតែ";
		String loText = "āē—āģˆāē˛āē™āē§āē´āē™āģ€āē„āēąāē™āģ€āē›āēąāē™āē›āē°āē—āē˛āē™āēšāģāēĨāē´āēĒāēąāē”āē­āēŊāē§āģ€āēŠāēĩāē§āēĩāģ€āē­āēĩāģāē­āē™.āē§āēĩ.āēāē¸āģˆāēĄāēšāģāēĨāē´āēĒāēąāē”āēāē˛āē™āēžāē´āēĄāē‚āē­āē‡āēŠāē˛āē§āē”āēąāē”.";
		String buText = "ကá€ļဆိုးကá€ļဇá€Ŧတá€Ŧကá€ļထိုကá€ēကá€ļနခိုကá€ļá€”á€žá€­á€¯á€¸á€†á€ąá€Ŧá€ē";
		System.out.println(SEANLP.Thai.sentenceSimilarity(thText, thText));
		System.out.println(SEANLP.Vietnamese.sentenceSimilarity(viText, viText));
		System.out.println(SEANLP.Khmer.sentenceSimilarity(khText, khText));
		System.out.println(SEANLP.Lao.sentenceSimilarity(loText, loText));
		System.out.println(SEANLP.Burmese.sentenceSimilarity(buText, buText));
	}
}
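
This README does not say how sentenceSimilarity is computed. Purely as an illustration, one common approach is cosine similarity over token counts, sketched below. CosineSketch and its methods are hypothetical, the technique is an assumption rather than SEANLP's documented method, and the token lists are assumed to come from a word segmenter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative bag-of-words cosine similarity; not SEANLP's algorithm. */
final class CosineSketch {

    /** Counts how often each token occurs. */
    static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> m = new HashMap<>();
        for (String t : tokens) {
            m.merge(t, 1, Integer::sum);
        }
        return m;
    }

    /** Cosine of the angle between the two token-count vectors. */
    static double cosine(List<String> a, List<String> b) {
        Map<String, Integer> ca = counts(a);
        Map<String, Integer> cb = counts(b);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : ca.entrySet()) {
            int v = e.getValue();
            na += (double) v * v;
            Integer other = cb.get(e.getKey());
            if (other != null) {
                dot += (double) v * other;
            }
        }
        for (int v : cb.values()) {
            nb += (double) v * v;
        }
        if (na == 0 || nb == 0) {
            return 0.0; // an empty sentence is similar to nothing
        }
        return dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        // Placeholder tokens; real input would be segmented words.
        List<String> s1 = Arrays.asList("a", "b");
        List<String> s2 = Arrays.asList("a", "c");
        System.out.println(cosine(s1, s1));
        System.out.println(cosine(s1, s2));
    }
}
```

Under this measure a sentence compared with itself scores at or very near the maximum, which matches how the demo above compares each text with itself.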

3. Keyword extraction and automatic summarization:

package cn.edu.kmust.seanlp.demo;

import cn.edu.kmust.seanlp.SEANLP;

/**
 * Keyword extraction and automatic summarization demo
 * @author Zhao Shiyu
 *
 */
public class ExtractDemo {
	public static void main(String[] args) {
		String thDocument = "ā¸ā¸ĩāšˆā¸›ā¸¸āšˆā¸™ā¸„ā¸§ā¸Ŗā¸Ŗā¸°ā¸Ąā¸ąā¸”ā¸Ŗā¸°ā¸§ā¸ąā¸‡ā¸„ā¸ŗā¸žā¸šā¸”āšā¸Ĩā¸°ā¸žā¸¤ā¸•ā¸´ā¸ā¸Ŗā¸Ŗā¸Ąāš€ā¸ā¸ĩāšˆā¸ĸā¸§ā¸ā¸ąā¸šā¸›ā¸ąā¸ā¸Ģā¸˛ā¸—ā¸°āš€ā¸Ĩā¸ˆā¸ĩā¸™āšƒā¸•āš‰ \n"
				+ "ā¸Ēā¸ŗā¸™ā¸ąā¸ā¸‚āšˆā¸˛ā¸§āšā¸Ģāšˆā¸‡ā¸›ā¸Ŗā¸°āš€ā¸—ā¸¨ā¸ˆā¸ĩā¸™ā¸Ŗā¸˛ā¸ĸā¸‡ā¸˛ā¸™ā¸§āšˆā¸˛ ā¸™ā¸˛ā¸ĸā¸Ģā¸‡ āš€ā¸Ģā¸Ĩāšˆā¸ĸ āš‚ā¸†ā¸Šā¸ā¸ā¸Ŗā¸°ā¸—ā¸Ŗā¸§ā¸‡ā¸ā¸˛ā¸Ŗā¸•āšˆā¸˛ā¸‡ā¸›ā¸Ŗā¸°āš€ā¸—ā¸¨ā¸ˆā¸ĩā¸™ā¸ā¸Ĩāšˆā¸˛ā¸§āš€ā¸Ąā¸ˇāšˆā¸­ā¸§ā¸ąā¸™ā¸—ā¸ĩāšˆ 19 ā¸Ąā¸ā¸Ŗā¸˛ā¸„ā¸Ąā¸§āšˆā¸˛ ā¸ā¸ĩāšˆā¸›ā¸¸āšˆā¸™ā¸„ā¸§ā¸Ŗā¸ˆā¸”ā¸ˆā¸ŗā¸›ā¸Ŗā¸°ā¸§ā¸ąā¸•ā¸´ā¸¨ā¸˛ā¸Ēā¸•ā¸ŖāšŒā¸ā¸˛ā¸Ŗā¸Ŗā¸¸ā¸ā¸Ŗā¸˛ā¸™āšƒā¸Ģāš‰āšā¸Ąāšˆā¸™ā¸ĸā¸ŗ ā¸Ēā¸ŗā¸™ā¸ļā¸ā¸œā¸´ā¸”ā¸­ā¸ĸāšˆā¸˛ā¸‡ā¸ĸā¸´āšˆā¸‡ āšā¸Ĩā¸°ā¸Ŗā¸°ā¸Ąā¸ąā¸”ā¸Ŗā¸°ā¸§ā¸ąā¸‡ā¸„ā¸ŗā¸žā¸šā¸”āšā¸Ĩā¸°ā¸žā¸¤ā¸•ā¸´ā¸ā¸Ŗā¸Ŗā¸Ąāš€ā¸ā¸ĩāšˆā¸ĸā¸§ā¸ā¸ąā¸šā¸›ā¸ąā¸ā¸Ģā¸˛ā¸—ā¸°āš€ā¸Ĩā¸ˆā¸ĩā¸™āšƒā¸•āš‰ \n"
				+ "ā¸™ā¸˛ā¸ĸā¸Šā¸´ā¸™āš‚ā¸‹ ā¸­ā¸˛āš€ā¸šā¸° ā¸™ā¸˛ā¸ĸā¸ā¸Ŗā¸ąā¸ā¸Ąā¸™ā¸•ā¸Ŗā¸ĩā¸ā¸ĩāšˆā¸›ā¸¸āšˆā¸™ā¸ā¸Ĩāšˆā¸˛ā¸§āš€ā¸Ąā¸ˇāšˆā¸­ā¸§ā¸ąā¸™ā¸—ā¸ĩāšˆ 18 ā¸Ąā¸ā¸Ŗā¸˛ā¸„ā¸Ąā¸§āšˆā¸˛ ā¸ā¸ĩāšˆā¸›ā¸¸āšˆā¸™ā¸Ēāšƒā¸Ēāšˆāšƒā¸ˆā¸­ā¸ĸāšˆā¸˛ā¸‡ā¸ĸā¸´āšˆā¸‡ā¸•āšˆā¸­ā¸ā¸˛ā¸Ŗā¸—ā¸ĩāšˆā¸ˆā¸ĩā¸™ā¸Ēā¸Ŗāš‰ā¸˛ā¸‡āš€ā¸ā¸˛ā¸°āš€ā¸—ā¸ĩā¸ĸā¸Ąā¸ā¸Ĩā¸˛ā¸‡ā¸—ā¸°āš€ā¸Ĩā¸ˆā¸ĩā¸™āšƒā¸•āš‰ āšā¸Ĩā¸°ā¸—ā¸”ā¸Ĩā¸­ā¸‡ā¸šā¸¸ā¸āš€ā¸šā¸´ā¸ā¸—ā¸Ŗā¸ąā¸žā¸ĸā¸˛ā¸ā¸Ŗā¸—ā¸ąāš‰ā¸‡ā¸™āš‰ā¸ŗā¸Ąā¸ąā¸™āšā¸Ĩā¸°āšā¸āšŠā¸Ēā¸˜ā¸Ŗā¸Ŗā¸Ąā¸Šā¸˛ā¸•ā¸´āšƒā¸™ā¸—ā¸°āš€ā¸Ĩā¸ˆā¸ĩā¸™ā¸•ā¸°ā¸§ā¸ąā¸™ā¸­ā¸­ā¸ āš€ā¸Ŗā¸ĩā¸ĸā¸ā¸Ŗāš‰ā¸­ā¸‡ā¸›ā¸Ŗā¸°ā¸Šā¸˛ā¸„ā¸Ąāš‚ā¸Ĩā¸āšā¸Ēā¸”ā¸‡ā¸„ā¸§ā¸˛ā¸Ąāš€ā¸Ģāš‡ā¸™āš€ā¸ā¸ĩāšˆā¸ĸā¸§ā¸ā¸ąā¸šāš€ā¸Ŗā¸ˇāšˆā¸­ā¸‡ā¸™ā¸ĩāš‰ā¸Ąā¸˛ā¸ā¸‚ā¸ļāš‰ā¸™ \n"
				+ "ā¸™ā¸˛ā¸ĸā¸Ģā¸‡ āš€ā¸Ģā¸Ĩāšˆā¸ĸā¸ā¸Ĩāšˆā¸˛ā¸§ā¸•āšˆā¸­ā¸ā¸˛ā¸Ŗā¸™ā¸ĩāš‰ā¸§āšˆā¸˛ ā¸ā¸˛ā¸Ŗā¸šā¸¸ā¸āš€ā¸šā¸´ā¸ā¸™āš‰ā¸ŗā¸Ąā¸ąā¸™āšā¸Ĩā¸°āšā¸āšŠā¸Ēā¸˜ā¸Ŗā¸Ŗā¸Ąā¸Šā¸˛ā¸•ā¸´ā¸‚ā¸­ā¸‡ā¸ˆā¸ĩā¸™ ā¸Ĩāš‰ā¸§ā¸™ā¸ā¸Ŗā¸°ā¸—ā¸ŗāšƒā¸™ā¸™āšˆā¸˛ā¸™ā¸™āš‰ā¸ŗā¸—ā¸°āš€ā¸Ĩā¸—ā¸ĩāšˆā¸­ā¸ĸā¸šāšˆā¸ ā¸˛ā¸ĸāšƒā¸•āš‰ā¸ā¸˛ā¸Ŗā¸„ā¸§ā¸šā¸„ā¸¸ā¸Ąā¸‚ā¸­ā¸‡ā¸ˆā¸ĩā¸™āš€ā¸­ā¸‡āš‚ā¸”ā¸ĸā¸›ā¸Ŗā¸˛ā¸¨ā¸ˆā¸˛ā¸ā¸‚āš‰ā¸­ā¸ā¸ąā¸‡ā¸‚ā¸˛ ā¸—ā¸¸ā¸ā¸Ēā¸´āšˆā¸‡ā¸—ā¸¸ā¸ā¸­ā¸ĸāšˆā¸˛ā¸‡ā¸­ā¸ĸā¸šāšˆāšƒā¸™ā¸ā¸Ŗā¸­ā¸šā¸­ā¸˜ā¸´ā¸›āš„ā¸•ā¸ĸā¸‚ā¸­ā¸‡ā¸ˆā¸ĩā¸™āš€ā¸­ā¸‡ ā¸­ā¸™ā¸ļāšˆā¸‡ ā¸ˆā¸ĩā¸™ā¸„ā¸Ŗā¸­ā¸‡ā¸­ā¸˜ā¸´ā¸›āš„ā¸•ā¸ĸāš€ā¸Ģā¸™ā¸ˇā¸­ā¸Ģā¸Ąā¸šāšˆāš€ā¸ā¸˛ā¸°ā¸Ģā¸™ā¸˛ā¸™ā¸‹ā¸˛āšā¸Ĩā¸°ā¸™āšˆā¸˛ā¸™ā¸™āš‰ā¸ŗā¸—ā¸°āš€ā¸Ĩāš‚ā¸”ā¸ĸā¸Ŗā¸­ā¸šā¸­ā¸ĸāšˆā¸˛ā¸‡ā¸Ąā¸´ā¸­ā¸˛ā¸ˆāš‚ā¸•āš‰āšā¸ĸāš‰ā¸‡āš„ā¸”āš‰";

		String viDocument = "ĐáēĄi háģ™i láē§n tháģŠ XII cáģ§a ĐáēŖng háģp phiÃĒn trÚ báģ‹"
				+ "NDĐT- SÃĄng 20-1, ĐáēĄi háģ™i đáēĄi biáģƒu toàn quáģ‘c láē§n tháģŠ XII cáģ§a ĐáēŖng háģp phiÃĒn trÚ báģ‹, hoàn táēĨt công tÃĄc chuáēŠn báģ‹ cáē§n thiáēŋt cho phiÃĒn khai máēĄc sáēŊ diáģ…n ra vào 8 giáģ sÃĄng 21-1."
				+ "MáģŸ đáē§u phiÃĒn háģp trÚ báģ‹, đáģ“ng chí LÃĒ Háģ“ng Anh, áģĻy viÃĒn Báģ™ Chính tráģ‹, ThÆ°áģng tráģąc Ban Bí thÆ° Trung Æ°ÆĄng ĐáēŖng tuyÃĒn báģ‘ lÃŊ do."
				+ "Đáģ“ng chí TrÆ°ÆĄng TáēĨn Sang, áģĻy viÃĒn Báģ™ Chính tráģ‹, Cháģ§ táģ‹ch nÆ°áģ›c điáģu khiáģƒn phiÃĒn háģp. Tiáēŋp đÃŗ, Cháģ§ táģ‹ch nÆ°áģ›c TrÆ°ÆĄng TáēĨn Sang xin ÃŊ kiáēŋn ĐáēĄi háģ™i thông qua chÆ°ÆĄng trÃŦnh phiÃĒn háģp trÚ báģ‹, thông qua Quy cháēŋ làm viáģ‡c cáģ§a ĐáēĄi háģ™i."
				+ "ĐáēĄi háģ™i đÃŖ hoàn thành cÃĄc pháē§n viáģ‡c quan tráģng gáģ“m: báē§u Đoàn Cháģ§ táģ‹ch, Đoàn thÆ° kÃŊ, Ban tháēŠm tra tÆ° cÃĄch đáēĄi biáģƒu, thông qua chÆ°ÆĄng trÃŦnh làm viáģ‡c cáģ§a ĐáēĄi háģ™i, thông qua Quy cháēŋ báē§u cáģ­ cáģ§a ĐáēĄi háģ™i và thông qua BÃĄo cÃĄo tháēŠm tra tÆ° cÃĄch đáēĄi biáģƒu."
				+ "Buáģ•i chiáģu, cÃĄc đáēĄi biáģƒu nghiÃĒn cáģŠu tài liáģ‡u táēĄi đoàn."
				+ "Ngày mai 21-1, ĐáēĄi háģ™i đáēĄi biáģƒu toàn quáģ‘c láē§n tháģŠ XII cáģ§a ĐáēŖng khai máēĄc táēĄi Trung tÃĸm Háģ™i ngháģ‹ quáģ‘c gia, Hà Náģ™i. ĐáēĄi háģ™i tiáēŋn hành táģĢ ngày 21 đáēŋn 28-1-2016, cÃŗ nhiáģ‡m váģĨ Ä‘ÃĄnh giÃĄ viáģ‡c tháģąc hiáģ‡n Ngháģ‹ quyáēŋt ĐáēĄi háģ™i XI cáģ§a ĐáēŖng và nhÃŦn láēĄi cháēˇng đưáģng 30 năm đáģ•i máģ›i đáēĨt nÆ°áģ›c; tháēŖo luáē­n, thông qua BÃĄo cÃĄo Chính tráģ‹ cáģ§a Ban CháēĨp hành Trung Æ°ÆĄng khÃŗa XI; cÃĄc bÃĄo cÃĄo: Ä‘ÃĄnh giÃĄ káēŋt quáēŖ tháģąc hiáģ‡n nhiáģ‡m váģĨ phÃĄt triáģƒn kinh táēŋ- xÃŖ háģ™i năm năm 2011-2015 và phÆ°ÆĄng hÆ°áģ›ng nhiáģ‡m váģĨ phÃĄt triáģƒn kinh táēŋ- xÃŖ háģ™i năm năm 2016- 2020; kiáģƒm điáģƒm sáģą lÃŖnh đáēĄo, cháģ‰ đáēĄo cáģ§a Ban CháēĨp hành Trung Æ°ÆĄng khÃŗa XI; táģ•ng káēŋt thi hành Điáģu láģ‡ ĐáēŖng khÃŗa XI và đáģ xuáēĨt báģ• sung, sáģ­a đáģ•i (náēŋu cÃŗ); viáģ‡c tháģąc hiáģ‡n Ngháģ‹ quyáēŋt T.Ư 4 khÃŗa XI váģ xÃĸy dáģąng ĐáēŖng. ĐáēĄi háģ™i báē§u Ban CháēĨp hành Trung Æ°ÆĄng khÃŗa XII. Cháģ§ đáģ cáģ§a ĐáēĄi háģ™i là Tăng cÆ°áģng xÃĸy dáģąng ĐáēŖng trong sáēĄch, váģ¯ng máēĄnh; phÃĄt huy sáģŠc máēĄnh toàn dÃĸn táģ™c và dÃĸn cháģ§ xÃŖ háģ™i cháģ§ nghÄŠa; đáēŠy máēĄnh toàn diáģ‡n, đáģ“ng báģ™ công cuáģ™c đáģ•i máģ›i; báēŖo váģ‡ váģ¯ng cháē¯c Táģ• quáģ‘c, giáģ¯ váģ¯ng môi trÆ°áģng hÃ˛a bÃŦnh, áģ•n đáģ‹nh; pháēĨn đáēĨu sáģ›m đưa nÆ°áģ›c ta cÆĄ báēŖn tráģŸ thành nÆ°áģ›c công nghiáģ‡p theo hÆ°áģ›ng hiáģ‡n đáēĄi."
				+ "Tham dáģą ĐáēĄi háģ™i XII cÃŗ 1510 đáēĄi biáģƒu, đáēĄi diáģ‡n cho hÆĄn 4,5 triáģ‡u đáēŖng viÃĒn, trong đÃŗ đáēĄi biáģƒu Ä‘Æ°ÆĄng nhiÃĒn cÃŗ 197 đáģ“ng chí là áģĻy viÃĒn Trung Æ°ÆĄng chính tháģŠc và dáģą khuyáēŋt khÃŗa XI; 1300 đáēĄi biáģƒu đưáģŖc báē§u táēĄi cÃĄc đáēĄi háģ™i ĐáēŖng báģ™ tráģąc thuáģ™c Trung Æ°ÆĄng; 13 đáēĄi biáģƒu cháģ‰ đáģ‹nh. Công tÃĄc chuáēŠn báģ‹ ĐáēĄi háģ™i đÃŖ đưáģŖc Ban CháēĨp hành Trung Æ°ÆĄng, tráģąc tiáēŋp là Báģ™ Chính tráģ‹, Ban Bí thÆ° cháģ‰ đáēĄo cháēˇt cháēŊ, đáēŋn nay đÃŖ hoàn táēĨt.";
		
		String loDocument = "āē›āē°āē—āē˛āē™â€‹āē›āē°â€‹āģ€āē—āē”​āēˆāēĩāē™â€‹āģ€āēĨāēĩāģˆâ€‹āēĄāēĸāģ‰āēŊāēĄāēĸāē˛āēĄâ€‹āēŠāē˛â€‹āē­āē¸â€‹āē”āē´āē”āē­āē˛āēŖāēąāēšāēšāēĩ \n"
				+ "āģ€āē§āēĨāē˛ 13:35 ​āģ‚āēĄāē‡â€‹āē‚āē­āē‡â€‹āē§āēąāē™â€‹āē—āēĩ 19 āēĄāēąāē‡āēāē­āē™â€‹āē™āēĩāģ‰â€‹āē•āē˛āēĄâ€‹āģ€āē§āēĨāē˛â€‹āē—āģ‰āē­āē‡â€‹āē–āē´āģˆāē™, āē—āģˆāē˛āē™â€‹ āēĒāēĩ​āēˆāēĩāģ‰â€‹āē™āēœāē´â€‹āē‡ āē›āē°āē—āē˛āē™â€‹āē›āē°â€‹āģ€āē—āē”​āēˆāēĩāē™â€‹āģ„āē”āģ‰â€‹āģ€āē”āēĩāē™āē—āē˛āē‡â€‹āģ„āē›â€‹āēŽāē­āē”​āēĒāē°āģœāē˛āēĄâ€‹āēšāē´āē™â€‹āēĒāē˛āēāēģāē™āēāē°āēĒāēąāē”​āē„āē˛â€‹āģ€āēĨāēąāē”​āē—āēĩāģˆâ€‹āē™āē°āē„āē­āē™āēĢāēŧāē§āē‡āēĨāēĩ​āē­āēąāē”​āē”āģ‰āē§āēâ€‹āēāēģāē™â€‹āēžāē´â€‹āģ€āēĒāē” ​āģ€āēžāēˇāģˆāē­â€‹āēĸāģ‰āēŊāēĄāēĸāē˛āēĄâ€‹āēŠāē˛â€‹āē­āē¸â€‹āē”āē´āē”āē­āē˛āēŖāēąāēšāēšāēĩāē—āē˛â€‹āē‡āēĨāēąāē”āē–āē°āēāē´āē”. \n"
				+ "āē—āģˆāē˛āē™ āēĒāēĩ​āēˆāē´āģ‰āē™āēœāē´â€‹āē‡ āēŠāēĩāģ‰â€‹āē­āē­āēâ€‹āē§āģˆāē˛, āēŠāē˛â€‹āē­āē¸â€‹āē”āē´āē”āē­āē˛āēŖāēąāēšāēšāēĩ​āģāēĄāģˆāē™â€‹āē›āē°â€‹āģ€āē—āē”​āē­āē˛āēŖāēąāēšâ€‹āģāēĨāē°â€‹āē­āē´āēĒāēĨāē˛āēĄâ€‹āē—āēĩāģˆâ€‹āģƒāēĢāēāģˆ, ​āģāēĨāē°â€‹āēāģâ€‹āģāēĄāģˆāē™â€‹āēĒāē°āēĄāē˛â€‹āēŠāē´āēāē—āēĩāģˆāēĒāēŗāē„āēąāē™â€‹āē‚āē­āē‡â€‹āēāē¸āģˆāēĄ 20 āē›āē°â€‹āģ€āē—āē”. āē™āēąāēšâ€‹āģāē•āģˆâ€‹āēˆāēĩāē™â€‹āēāēąāēšâ€‹āēŠāē˛â€‹āē­āē¸â€‹āē”āē´āē”āē­āē˛āēŖāēąāēšāēšāēĩ​āēĒāģ‰āē˛āē‡āēĒāē˛â€‹āēāēžāēģāē§āēžāēąāē™â€‹āēāē˛āē™â€‹āē—āēšāē”​āē™āēŗ​āēāēąāē™â€‹āģ€āē›āēąāē™â€‹āģ€āē§āēĨāē˛ 26 āē›āēĩ​āēĄāē˛â€‹āē™āēĩāģ‰, āēāē˛āē™â€‹āēžāēģāē§āēžāēąāē™â€‹āēĨāē°āēĢāē§āģˆāē˛āē‡â€‹āēĒāē­āē‡â€‹āēāģˆāē˛āēâ€‹āģ„āē”āģ‰â€‹āēŽāēąāēšâ€‹āēāē˛āē™â€‹āēžāēąāē”āē—āē°āē™āē˛â€‹āģāēšāēšâ€‹āēāģ‰āē˛āē§â€‹āēāē°â€‹āģ‚āē”āē” ​āģ‚āē”āēâ€‹āēĄāēĩ​āē„āē§āē˛āēĄâ€‹āģ„āē§āģ‰â€‹āģ€āē™āēˇāģ‰āē­â€‹āģ€āēŠāēˇāģˆāē­â€‹āģƒāēˆâ€‹āēāēąāē™â€‹āē”āģ‰āē˛āē™â€‹āēāē˛āē™â€‹āģ€āēĄāēˇāē­āē‡â€‹āē™āēąāēšâ€‹āēĄāēˇāģ‰â€‹āē™āēąāēšâ€‹āģ€āēĨāē´āēâ€‹āģ€āēŠāē´āģˆāē‡, āēāē˛āē™â€‹āēŽāģˆāē§āēĄâ€‹āēĄāēˇâ€‹āģƒāē™â€‹āē—āē¸āēâ€‹āē‚āēģāē‡â€‹āģ€āē‚āē”​​āģ„āē”āģ‰â€‹āēŽāēąāēšâ€‹āģāē˛āēāēœāēģāē™â€‹āē—āēĩāģˆâ€‹āē­āē¸āē”āēģāēĄāēĒāēģāēĄāēšāēšāē™ ​āē­āēąāē™āģ„āē”āģ‰â€‹āē™āēŗāēĄāē˛â€‹āģ€āēŠāē´āģˆāē‡āē„āē§āē˛āēĄāēœāē˛āēĒāē¸āēāē—āēĩāģˆāģƒāēĢāēāģˆāēĢāēŧāē§āē‡â€‹āģāēāģˆâ€‹āē›āē°āēŠāē˛āēŠāēģāē™â€‹āēĒāē­āē‡â€‹āē›āē°â€‹āģ€āē—āē”. 
​āģƒāē™â€‹āģ„āēĨāēāē°â€‹āēĸāģ‰āēŊāēĄāēĸāē˛āēĄâ€‹āē„āēąāģ‰āē‡āē™āēĩāģ‰, āē‚āģ‰āē˛āēžāē°â€‹āģ€āēˆāēģāģ‰āē˛āēˆāē°â€‹āēŽāģˆāē§āēĄâ€‹āēāēąāēšâ€‹āēĒāēģāēĄâ€‹āģ€āē”āēąāē”​ ​āģ‚āēĄâ€‹āēŽāē˛āēĄâ€‹āģ€āēĄāēąāē” āēšāē´āē™ āēŠāē˛â€‹â€‹āģ€āēĨāēĩāēĄāē˛āē™ ​āģ€āēžāēˇāģˆāē­â€‹āģāēĨāēāē›āģˆāēŊāē™â€‹āē„āē§āē˛â€‹āēĄāē„āē´āē”​āģ€āēĢāēąāē™â€‹āēāģˆāēŊāē§â€‹āēāēąāēšâ€‹āēāē˛āē™â€‹āēžāēģāē§āēžāēąāē™â€‹āēĒāē­āē‡â€‹āēāģˆāē˛āēâ€‹āēžāģ‰āē­āēĄâ€‹āē”āģ‰āē§āēâ€‹āēšāēąāē™āēĢāē˛â€‹āēĒāē˛āēāēģāē™â€‹āģāēĨāē°â€‹āēžāē˛āēâ€‹āēžāēˇāģ‰āē™â€‹āē—āēĩāģˆâ€‹āēĒāēģāē™â€‹āģƒāēˆāēŽāģˆāē§āēĄâ€‹āēāēąāē™, ​āģāēĨāē°â€‹āģ€āēžāēˇāģˆāē­â€‹āēŠāē¸āēāēāēšāģ‰â€‹āēĒāē˛āēâ€‹āēžāēģāē§āēžāēąāē™â€‹āēĄāē´â€‹āē”āē•āē°āēžāē˛āēšâ€‹āģāēĨāē°â€‹āēāē˛āē™â€‹āēŽāģˆāē§āēĄâ€‹āēĄāēˇâ€‹āēĨāē°āēĢāē§āģˆāē˛āē‡â€‹āēˆāēĩāē™â€‹-​āēŠāē˛â€‹āē­āē¸â€‹āē”āē´āē”āē­āē˛āēŖāēąāēšāēšāēĩ​āģƒāēĢāģ‰â€‹āēžāēąāē”āē—āē°āē™āē˛â€‹āē§āģˆāē­āē‡āģ„āē§â€‹āģāēĨāē°â€‹āģƒāēĢāēāģˆāēĢāēŧāē§āē‡â€‹āēāē§āģˆāē˛â€‹āģ€āēāēģāģˆāē˛. āē‚āģ‰āē˛āēžāē°â€‹āģ€āēˆāēģāģ‰āē˛â€‹āģ€āēŠāēˇāģˆāē­â€‹āģāēąāģ‰āē™āē§āģˆāē˛, āēāē˛āē™â€‹āēĸāģ‰āēŊāēĄāēĸāē˛āēĄâ€‹āģ€āē—āēˇāģˆāē­â€‹āē™āēĩāģ‰ āēˆāē°â€‹â€‹āģ€āē•āēąāēĄâ€‹āģ„āē›â€‹āē”āģ‰āē§āēâ€‹â€‹āģ„āēĄāē•āēĩ​āēˆāē´āē”​āēĄāē´āē”āē•āē°āēžāē˛āēšâ€‹āģāēĨāē°â€‹āģāē˛āēāēœāēģāē™â€‹āē—āēĩāģˆâ€‹āē­āē¸āē”āēģāēĄāēĒāēģāēĄāēšāēšāē™ ​āģ€āēŠāē´āģˆāē‡āēˆāē°â€‹āēŠāģˆāē§āēâ€‹āēŠāē¸āēāēāēšāģ‰â€‹āēāē˛āē™â€‹āēŽāģˆāē§āēĄâ€‹āēĄāēˇâ€‹āēĨāē°āēĢāē§āģˆāē˛āē‡â€‹āēĒāē­āē‡â€‹āēāģˆāē˛āēâ€‹āģƒāē™â€‹āē—āē¸āēâ€‹āē‚āēģ​āē‡â€‹āģ€āē‚āē”​āē‚āēļāģ‰āē™āēĒāēšāģˆâ€‹āēĨāē°āē”āēąāēšâ€‹āģƒāģāģˆ āē—āēąāē‡â€‹āēˆāē°â€‹āēĄāēĩ​āēœāēģāē™â€‹āē”āēĩ​āē•āģāģˆâ€‹āēāē˛āē™â€‹āēāēģāēâ€‹āēĨāē°āē”āēąāēšâ€‹āēāē˛āē™â€‹āēŽāģˆāē§āēĄâ€‹āēĄāēˇâ€‹āēĨāē°āēĢāē§āģˆāē˛āē‡â€‹āēˆāēĩāē™â€‹āēāēąāēšāē›āē°â€‹āģ€āē—āē”​āēĒāē°āēĄāē˛āēŠāē´āēâ€‹āģƒāē™â€‹āēĒāē°āēžāē˛â€‹āēŽāģˆāē§āēĄāēĄāēˇâ€‹āē­āģˆāē˛āē§āģ€āē›āēĩ​āģ€āēŠāēâ€‹āģƒāēĢāģ‰â€‹āēĒāēšāē‡â€‹āē‚āēļāģ‰āē™. \n"
				+ "āēĢāēŧāēąāē‡āēˆāē˛āēâ€‹āēĒāē´āģ‰āē™āēĒāē¸āē”​āēāē˛āē™â€‹āēĸāģ‰āēŊāēĄāēĸāē˛āēĄâ€‹āēŠāē˛â€‹āē­āē¸â€‹āē”āē´āē”āē­āē˛āēŖāēąāēšāēšāēĩ​āģāēĨāģ‰āē§, āē—āģˆāē˛āē™ āēĒāēĩ​āēˆāē´āģ‰āē™āēœāē´â€‹āē‡ āēāēąāē‡â€‹āēˆāē°â€‹āģ€āē”āēĩāē™â€‹āē—āē˛āē‡â€‹āģ„āē›â€‹āēĸāģ‰āēŊāēĄāēĸāē˛āēĄâ€‹â€‹āģ€āē­āēĸāē´āēšâ€‹āģāēĨāē°â€‹āē­āēĩāēŖāē˛āē™āē—āē˛āē‡â€‹āēĨāēąāē”āē–āē°āēāē´āē”āē•āēˇāģˆāēĄâ€‹āē­āēĩāē. ";
		
		String kmDocument = "ážĸប់រំ​ចំណេះ​ទážŧទៅ \n "
				+ "កំឡážģង​ឆ្នážļំ​ áŸĸ០០៩-áŸĸ០១áŸŖ ​សកម្មភážļព​គោលនយោបážļយ​មážŊយចំនážŊន​ត្រážŧវ​បážļន​រៀបចំ​ដážŧចជážļ​ ផែនកážļរ​គោល​ស្តីពី​កážļរážĸភិវឌ្ឍ​មធ្យមសិក្សážļ​ និង​សៀវភៅ​ប្រតិបត្តិ​សម្រážļប់​ដំណើរកážļរ​មជ្ឈមណ្ឌល​ធនធážļន​សម្រážļប់​មធ្យម​ សិក្សážļ​ គោលនយោបážļយ​ស្តីពី​សážļលážļ​កážģមážļរ​មេត្រី​នៅ​មធ្យម​សិក្សážļ​ និង​កážļរ​កែលម្ážĸ​កម្មវិធីសិក្សážļ​។​ប្រព័ន្ធ​វិក្រឹតកážļរ​គ្រážŧបង្រៀន​ លើមážģខវិជ្ជážļ​គណិតវិទ្យážļ​និង​វិទ្យážļសážļស្ត្រ​ត្រážŧវ​បážļន​រៀបចំ​។ ​កម្មវិធី​បំណិន​ជីវិត​បច្ចេកវិទ្យážļ​ ព័ត៌មážļន​ និង​ទេសចរណ៍​ត្រážŧវ​បážļន​ážĸនážģម័ត​ និង​ស្តង់ដážļរ​បណ្ណážļល័យ​នៅ​មធ្យមសិក្សážļ​កំពážģង​រៀបចំ​ជážļ​សេចក្តី​ព្រážļង។ \n"
				+ "កážļរចážŧលរៀន ​និង​គážģណភážļព​នៅ​កម្រិត​នេះ​មážļន​កážļរប្រែប្រážŊល​តិចតážŊច​។ ​ážĸត្រážļ​ត្រážŊត​ថ្នážļក់​បážļន​ថយចážģះ​ តិចតážŊច ​ប៉ážģន្តែ​ážĸត្រážļ​បោះបង់​កážļរសិក្សážļ​មិនមážļន​ប្រែប្រážŊលទេ​។ ​សិស្ស​ភážļគច្រើ​នបážļន​ជ្រើសរើស​យក​មážģខវិជ្ជážļ​ វិទ្យážļសážļស្ត្រ​ពិត​។ ​ទោះយ៉ážļងណážļ​ក៏ដោយ​ គážģណភážļព​នៅ​កម្រិត​នេះ​មិន​ទážļន់​ážĸážļច​វážļស់វែង​បážļន​នៅ​ឡើយ​ ដោយសážļរ​ពážģំទážļន់​បážļន​ធ្វើ​តេស្ត​វážļយ​តម្លៃ​ថ្នážļក់​ជážļតិ​នៅ​ថ្នážļក់ទី ១áŸĸ​។ ​ážĸážļហážļរážŧបករណ៍​បážļន​ផ្តល់​ជážļ​រៀងរážļល់​ឆ្នážļំ​។ សិស្ស​បážļន​ទទážŊល​មេដážļយ​លើ​មážģខវិជ្ជážļ​គណិតវិទ្យážļ​និង​វិទ្យážļសážļស្ត្រ ​ពី​កម្មវិធី​ប្រកážŊត​ស៊ីមេ​ážĸážŧážĸážŧឡážļំព្យážļដ​ និង​កម្មវិធី​ប្រកážŊត​ážĸន្តរជážļតិ​ផ្សេងៗ​ទៀត​។ \n"
				+ "ប្រព័ន្ធ​វážļយតម្លៃ​ ថ្នážļក់ជážļតិ​ត្រážŧវ​បážļន​ដážļក់ឱ្យ​ážĸនážģវត្ត​និង​មážļន​ថវិកážļ​សម្រážļប់​ដំណើរកážļរ​។ កážļរប្រឡង​ថ្នážļក់ជážļតិ​នៅ​ថ្នážļក់​ទី​ ៩ ​និង​ទី ​១áŸĸ​ ត្រážŧវ​បážļន​ážĸនážģវត្ត​ជážļ​ទៀងទážļត់​។ \n"
				+ "ចំនážŊន​ážĸនážģវិទ្យážļល័យ​ និង​វិទ្យážļល័យ​បážļន​កើនឡើង​។ ​សážļលážļ​មធ្យម​សិក្សážļ​បឋមភážŧមិ​ áŸĨ០ ​ភážļគរយ​ ​បážļន​ážĸភិវឌ្ឍ​ទៅ​ជážļ​សážļលážļ​មធ្យម​សិក្សážļ​ទážģតិយភážŧមិ​។ ​មជ្ឈមណ្ឌល​ធនធážļន​នៅ​មធ្យម​សិក្សážļ​ត្រážŧវ​បážļន​ក៏សážļង​នៅ​គ្រប់​រážļជធážļនី ​ខេត្ត​។ ​សážļលážļ​មធ្យម​សិក្សážļ​បឋម​ភážŧមិ​ចំនážŊន​ ១៤១ ​ក្នážģង​ខេត្ត​ ៨​ មážļន​បន្ទប់​កážģំព្យážŧទ័រ​។ ប្រព័ន្ធ​នៃ​កážļរបណ្តážģះបណ្តážļល​និង​វិក្រឹតកážļរ​គ្រážŧបង្រៀន​ ជážļពិសេស​គ្រážŧបង្រៀន​កម្រិត​ážĸប់រំមážŧលដ្ឋážļន​នៅ​មជ្ឈមណ្ឌល​គរážģកោសល្យ​ភážŧមិភážļគ​ និង​វិទ្យážļស្ថážļន​ជážļតិ​ážĸប់រំ​កំពážģង​ត្រážŧវ​បážļន​ពង្រីក​។ កážļរងážļរ​វិក្រឹតកážļរ​លើ​មážģខវិជ្ជážļ​គណិតវិទ្យážļ​និង​វិទ្យážļសážļស្ត្រ ​ក៏​កំពážģង​ពង្រីក​ផងដែរ​។ ​នážļយក​សážļលážļ​មធ្យម​សិក្សážļ​ទážģតិយភážŧមិ​ទážļំងážĸស់​ និង​នážļយក​សážļលážļ​មធ្យម​សិក្សážļ​បឋម​ភážŧមិ​មážŊយ​ចំនážŊន​បážļន​ទទážŊល​កážļរ​បំប៉ន​ស្តីពី​ កážļរគ្រប់គ្រង​និង​ដឹកនážļំ​។ ប្រធážļន​ក្រážģម​បច្ចេកទេស​នៃ​មជ្ឈមណ្ឌល​ធនធážļន​នៅ​មធ្យម​សិក្សážļ​និង​បណ្តážļញ​ ទážļំងážĸស់​បážļន​ទទážŊល​កážļរបំប៉ន​ ស្តីពី​ស្តង់ដážļរ​កម្មវិធីសិក្សážļ​។ គោលនយោបážļយសážļលážļកážģមážļរមេត្រីត្រážŧវបážļនážĸនážģវត្តនៅសážļលážļចំនážŊន ៨áŸĸáŸŖ (áŸĨ០,៧៤ ភážļគរយ​នៃ​សážļលážļ​មធ្យម​សិក្សážļ​បឋមភážŧមិ​)។ \n"
				+ "បញ្ហážļ​ប្រឈម​ពេល​ខážļងមážģខ​គážē​ កážļរបង្កើន​សមធម៌​ក្នážģងកážļរ​ចážŧលរៀន​នៅ​មធ្យមសិក្សážļ​ តážļមរយៈ​បង្កើន​ចំនážŊន​សážļលážļ​មធ្យម​សិក្សážļ​បឋម​ភážŧមិ​ឱ្យបážļន​គ្រប់ឃážģំ​ សង្កážļត់​និងវិទ្យážļល័យ​នៅ​គ្រប់ស្រážģក​ ខណ្ឌ​។ គážģណភážļព​របស់​សិស្ស​បញ្ចប់​ថ្នážļក់ទី ​១áŸĸ​ ត្រážŧវ​លើក​កម្ពស់​ និង​ផ្តល់​នážŧវ​ចំណេះដឹង​ពážļក់ព័ន្ធ​ដទៃទៀត​ សម្រážļប់​កážļរážĸប់រំ​បច្ចេកទេស​ វិជ្ជážļជីវៈ ​និង​ឧត្តមសិក្សážļ​។ ​សážļលážļ​មធ្យម​សិក្សážļ​ភážļគច្រើន​ ជážļពិសេស​នៅ​តំបន់​ជនបទ​ខ្វះខážļត​ធážļតážģ​ចážŧល​ដែល​មážļនគážģណភážļព​ដážŧចជážļ​ គ្រážŧបង្រៀន​តážļមមážģខវិជ្ជážļ ​សម្ភážļរៈ​បង្រៀន​និង​គ្រážŋង​បរិក្ខážļរ​ សៀវភៅ​សិក្សážļគោល​ បន្ទប់ពិសោធ​វិទ្យážļសážļស្ត្រ​ បន្ទប់​កážģំព្យážŧទ័រ​និង​ភážļសážļ​ និង​បរិក្ខážļរ​បណ្ណážļល័យ​។ វិធីសážļស្ត្រ​ក្នážģងកážļរ​បង្រៀន​ក្នážģង​ពេលបច្ចážģប្បន្ន​ážĸនážģវត្ត​តážļមរបៀប​ ជážļមេរៀន​ ចម្លង​តážļម​ ដកស្រង់​ និង​កážļរចងចážļំ​។​ វិធីសážļស្ត្រ​ទážļំងនេះ​គážŊរ​ត្រážŧវ​បញ្ចážŧល​នážŧវ​កážļរគិត​ និង​ជំនážļញវិភážļគ​។ ​ស្តង់ដážļរ​គ្រážŧបង្រៀន​គážŊរ​ត្រážŧវ​ពិនិត្យ​តážļមដážļន វážļយតម្លៃ​ និង​ážĸភិវឌ្ឍ​ជážļប្រចážļំ​។ \n"
				+ "កážļរážĸប់រំ​បច្ចេកទេស​ជážļ​មážģខងážļរ​ថ្មី​មážŊយ​ក្នážģង​ ក្រសážŊង​ និង​ទើបតែ​បážļន​ážĸនážģម័ត​គោលនយោបážļយ​ស្តីពី​ កážļរážĸប់រំ​បច្ចេកទេស​។ ​ក្នážģង​គោលនយោបážļយ​នេះ​បážļន​លើក​ឡើង​ពី ​កážļរបង្កើត​វិទ្យážļល័យ​ចំណេះ​ទážŧទៅ​និង​ បច្ចេកទេស​នៅ​គ្រប់​រážļជធážļនី ​ខេត្ត​។ នážļពេលបច្ចážģប្បន្ន​ សážļលážļ​មធ្យម​សិក្សážļ​ចំណេះទážŧទៅ​និង​បច្ចេកទេស​ចំនážŊនពីរ​កំពážģង​ដំណើរកážļរ ​និង​ផ្តល់​នážŧវ​មážģខវិជ្ជážļ​សំខážļន់​បážŊន​។​ សិស្ស​ដែល​បញ្ចប់​កážļរសិក្សážļ​នៅ​សážļលážļ​ទážļំងនេះ​ ážĸážļច​រកកážļរងážļរ​បážļន​។ \n"
				+ "កម្មវិធី ​ážĸប់រំ​បច្ចេកទេស​ដែល​កំពážģង​ážĸនážģវត្ត​នážļពេលបច្ចážģប្បន្ន ​មážļន​កង្វះខážļត​ទážļំង​ក្របខណ្ឌ​គážģណភážļព​ដ៏រឹងមážļំ​ និង​ធážļតážģ​ចážŧល​ដែល​មážļនគážģណភážļព​។ ប្រព័ន្ធ​ទទážŊលស្គážļល់​និង​ប្រព័ន្ធធážļនážļគážģណភážļព​សážļលážļរៀន​មិន​ទážļន់​បážļន​បង្កើត​ ព្រម​ទážļំង​ទំនážļក់ទំនង​រវážļង​ប្រព័ន្ធážĸប់រំ​និង​ទីផ្សážļរ​កážļរងážļរ​នៅ​មážļនកម្រិត​ ។ ​ប្រព័ន្ធ​គ្រប់គ្រង រដ្ឋបážļល​និង​ហិរញ្ញវត្ថážģ​សážļលážļរៀន​មិន​ទážļន់​បង្កើត​។ កážļរងážļរ​ážĸប់រំ​បច្ចេកទេស​ត្រážŧវ​ដកបទពិសោធ​ពីប្រទេស​ដទៃ​។ ដៃគážŧ​ážĸភិវឌ្ឍ​ជážļច្រើន​បážļន​ចážļប់ážĸážļរម្មណ៍​គážļំទ្រ​ដល់​កម្មវិធីនេះ​។";
		
		String buDocument = "ဂá€ēá€Ŧကá€Ŧတá€Ŧ-ဘန္ေဒá€Ģင္း á€ģမန္နá€Ŋုုနယးá€ģမင္႔ရထá€Ŧးလမ္း ေဖá€Ģကယလုပယမညယ\n"
				+ "တရုတယနုိငယငá€ļေတá€Ŧ္ေကá€Ŧင္စီဝင္ ဝမယယုá€ļသည္ တရုတယ-á€Ąá€„á€šá€’á€¯á€­á€”á€Žá€¸á€›á€Ŋá€Ŧး ပူးေပá€Ģင္း ေဆá€Ŧá€€á€šá€œá€¯á€•á€šá€ąá€žá€Ŧ ဂá€ēá€Ŧကá€Ŧတá€Ŧ-ဘန္ေဒá€Ģင္း á€ģမန္နá€Ŋုနယးá€ģမင္႔ရထá€Ŧး စတင္ ေဖá€Ģá€€á€šá€œá€¯á€•á€šá€ąá€›á€¸ အလမ္းအနá€Ŧးသုိ႔ တက္ေရá€Ŧက္ရန္ ၂၀- ရက္ေန႔မá€Ŋ ၂၂ ရက္ေန႔အထိ á€Ąá€„á€šá€’á€¯á€­á€”á€Žá€¸á€›á€Ŋá€Ŧးသုိ႔ ခရီးထá€ŧက္သá€ŧá€Ŧးမည္ á€ģဖစ္ေၾကá€Ŧင္း၊ á€Ąá€„á€šá€’á€¯á€­á€”á€Žá€¸á€›á€Ŋá€Ŧး သမáŧတ ကá€ēိá€ŗကုိသညယလညယး အခမ္းအနá€Ŧးသုိ႔ တက္ေရá€Ŧက္မည္ á€ģဖစ္ေၾကá€Ŧင္း၊ ယင္းရထá€Ŧးလမ္းသည္ á€Ąá€„á€šá€’á€¯á€­á€”á€Žá€¸á€›á€Ŋá€Ŧး၌ ပထမá€Ĩဎးဆုá€ļး á€ģမန္နá€Ŋုနယးá€ģမင္႔ရထá€Ŧးလမ္း á€ģဖစ္á€ģပီး á€Ąá€„á€šá€’á€¯á€­á€”á€Žá€¸á€›á€Ŋá€Ŧး၏ အေá€ģခခá€ļ အေဆá€Ŧá€€á€šá€Ąá€Ąá€¯á€ļ ေကá€Ŧင္းမá€ŧန္ ေစေရးနá€Ŋင္႔ အá€ģပန္အလá€Ŋန္ ဆက္သá€ŧယ္မá€Ŋု အဆင္႔အတန္း တုိးá€ģမá€Ŋင္႔ေရးအတá€ŧက္ အကá€ēိá€ŗးရá€Ŋိမည္ á€ģဖစ္ေၾကá€Ŧင္း တရုတယá€ģပည္သူ႔ေန႔စá€Ĩ္သတင္းစá€Ŧမá€Ŋ သတင္းအရ သိရပá€Ģသည္။"
				+ "ဂá€ēá€Ŧကá€Ŧတá€Ŧ-ဘန္ေဒá€Ģင္း á€ģမန္နá€Ŋုုနယးá€ģမင္႔ရထá€Ŧးလမ္းသည္ á€…á€¯á€…á€¯á€ąá€•á€Ģင္း အရá€Ŋည္ ကဎလုိမဎတá€Ŧ ၁၅၀ ရá€Ŋိá€ģပီး တစ္နá€Ŧရီလá€Ŋá€ēင္ အá€ģမနယဆုá€ļး ကဎလုိမဎတá€Ŧ ၃၀၀ á€á€¯á€á€šá€ąá€™á€Ŧငယးနုိငယမညယ á€ģဖစ္ကá€Ŧ ရထá€Ŧးလမ္း ေဖá€Ģကယလုပယá€ģပီးေနá€Ŧက္ ဂá€ēá€Ŧကá€Ŧတá€Ŧမá€Ŋ ဘန္ေဒá€Ģငယးသုိ႔ ေပá€Ģက္ေရá€Ŧက္ရန္ အခá€ēိန္မá€Ŋá€Ŧ လက္ရá€Ŋိ သုá€ļးနá€Ŧရီမá€Ŋေန၍ အá€Ŧနဂတ္ မိနစ္ ၄၀ မေကá€ēá€Ŧ္ေအá€Ŧင္ á€ģဖစ္သá€ŧá€Ŧးမည္ á€ģဖစ္ေၾကá€Ŧင္း၊ ဂá€ēá€Ŧကá€Ŧတá€Ŧ-ဘန္ေဒá€Ģင္း á€ģမန္နá€Ŋုုနယးá€ģမင္႔ရထá€Ŧးလမ္းသည္ á€Ąá€„á€šá€’á€¯á€­á€”á€Žá€¸á€›á€Ŋá€Ŧးနုိငယငá€ļသá€Ŧ မက အေရá€Ŋ႔ေတá€Ŧင္အá€Ŧရá€Ŋေဒသတá€ŧင္ ပထမá€Ĩဎးဆုá€ļး á€ģမန္နá€Ŋုနယးá€ģမင္႔ရထá€Ŧးလည္း á€ģဖစ္မည္ á€ģဖစ္ေၾကá€Ŧင္း၊ á€ģမန္နá€Ŋုနယးá€ģမင္႔ရထá€Ŧးလမ္း ေဖá€Ģကယလုပယá€ģခင္းသည္ á€Ąá€„á€šá€’á€¯á€­á€”á€Žá€¸á€›á€Ŋá€Ŧးနုိငယငá€ļအတá€ŧက္ နá€Ŋစ္စá€Ĩ္ á€Ąá€œá€¯á€•á€šá€Ąá€€á€¯á€­á€„á€šá€ąá€”á€›á€Ŧ ၄ ေသá€Ŧင္းေကá€ēá€Ŧယကုိ အသစ္ ဖန္တီးေပးမည္ á€ģဖစ္á€ģပီး ရထá€Ŧးလမ္း တေလá€Ŋá€ēá€Ŧက္က ေဒသမá€ēá€Ŧးတá€ŧင္ စီးပá€ŧá€Ģးေရး တဆငယ႔တုိး ဖá€ŧá€ļ႔á€ģဖိá€ŗးသá€ŧá€Ŧးမည္ á€ģဖစ္ေၾကá€Ŧင္း သိရပá€Ģသည္။"
				+ "ဂá€ēá€Ŧကá€Ŧတá€Ŧ-ဘန္ေဒá€Ģင္း á€ģမန္နá€Ŋုုနယးá€ģမင္႔ရထá€Ŧးလမ္းသည္ တရုတယá€ģပည္၏ နုိငယငá€ļá€ģခá€Ŧး၌ á€’á€Žá€‡á€¯á€­á€„á€šá€¸á€ąá€›á€¸á€†á€˛á€ŧá€ģခင္း၊ ေဆá€Ŧကယလုပယá€ģခင္း၊ လည္ပတ္á€ģခင္းနá€Ŋင္႔ စီမá€ļခန္႔ခဲá€ŧá€ģခင္း á€ģဖစ္စá€Ĩ္ တစယရပယလုá€ļးတá€ŧင္ ပá€Ģဝင္ေဆá€Ŧင္ရá€ŧက္ေသá€Ŧ ပထမဆုá€ļးေသá€Ŧ á€ģမန္နá€Ŋုနယးá€ģမင္႔ရထá€Ŧးလမ္း á€ģဖစ္ေၾကá€Ŧင္း၊ တရုတယနုိငယငá€ļတá€ŧင္ ကမáģá€Ŧေပၚတá€ŧင္ á€Ąá€á€¯á€­á€„á€šá€¸á€Ąá€á€Ŧ အၾကီးမá€Ŧးဆုá€ļး၊ á€ģမန္နá€Ŋုနယး အá€ģမငယ႔ဆုá€ļး၊ á€Ąá€ąá€á€á€šá€™á€Žá€†á€¯á€ļး၊ စီမá€ļခန္႔ခဲá€ŧမá€Ŋု အေတá€ŧ႔အၾကá€ļá€ŗ အရင္႔ကá€ēကယဆုá€ļးေသá€Ŧ á€ģမန္နá€Ŋုနယးá€ģမင္႔ရထá€Ŧးလမ္း ကá€ŧန္ရက္ ရá€Ŋိá€ģပီး တရုတယá€ģပည္၏ á€ģမန္နá€Ŋုနယးá€ģမင္႔ရထá€Ŧးလမ္းမá€ēá€Ŧးသည္ နုိငယငá€ļတကá€Ŧ စá€ļခá€ēိန္စá€ļနá€Ŋုနယး á€€á€¯á€­á€€á€šá€™á€Žá€ąá€›á€¸á€Ąá€–á€˛á€ŧ႔၊ နုိငယငá€ļတကá€Ŧ မီးရထá€Ŧးလုပယငနယး အဖဲá€ŧ႔ခá€ēá€ŗပယတုိ႔၏ နည္းပညá€Ŧ စá€ļခá€ēိန္စá€ļနá€Ŋုနယးနá€Ŋင္႔ လည္း á€€á€¯á€­á€€á€šá€Šá€Žá€ąážá€€á€Ŧင္း á€ģပည္သူ႔ေန႔စá€Ĩ္သတင္းစá€Ŧတá€ŧင္ ေရးသá€Ŧးထá€Ŧးသည္။";
		
		// Keyword extraction
		System.out.println(SEANLP.Thai.extractKeyword(thDocument, 6));
		// Automatic summarization
		System.out.println(SEANLP.Thai.extractSummary(thDocument, 3));
		
		System.out.println(SEANLP.Lao.extractKeyword(loDocument, 6));
		System.out.println(SEANLP.Lao.extractSummary(loDocument, 3));
		
		System.out.println(SEANLP.Khmer.extractKeyword(kmDocument, 6));
		System.out.println(SEANLP.Khmer.extractSummary(kmDocument, 3));
		
		System.out.println(SEANLP.Vietnamese.extractKeyword(viDocument, 6));
		System.out.println(SEANLP.Vietnamese.extractSummary(viDocument, 3));
		
		System.out.println(SEANLP.Burmese.extractKeyword(buDocument, 6));
		System.out.println(SEANLP.Burmese.extractSummary(buDocument, 3));
	}

}

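Copyright

Acknowledgements

This project draws on and borrows from the excellent open-source project HanLP. Many thanks!

Thanks to the teachers of the Key Laboratory of Intelligent Information Processing at Kunming University of Science and Technology for their guidance, and thanks to everyone who has helped me!

Author: @Zhao Shiyu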
