```julia
julia> train = ["khan","is","my","good", "friend","and","He","is","my","brother"]
# voc and train are used to train vocabulary and model respectively

julia> model = MLE(voc)
MLE(Vocabulary(Dict("khan"=>1,"name"=>1,"<unk>"=>1,"salman"=>1,"is"=>2,"Khan"=>1,"my"=>1,"he"=>1,"shahrukh"=>1,"and"=>1), 1, "<unk>", ["my", "name", "is", "salman", "khan", "and", "he", "is", "shahrukh", "Khan", "<unk>"]))

julia> voc
11-element Array{String,1}:
"my"
"name"
"is"
"salman"
"khan"
"and"
"he"
"is"
"my"
"name"
"is"
"salman"
"khan"
"and"
"he"
"is"
"shahrukh"
"Khan"
"<unk>"
"Khan"
"<unk>"

# you can see the "<unk>" token has been added to voc
julia> fit = model(train, 2, 2) # considering only bigrams

julia> unmaskedscore = score(model, fit, "is", "<unk>") # score outputs P(word | context) without replacing the context word with "<unk>"
0.3333333333333333

julia> masked_score = maskedscore(model, fit, "is", "alien")
0.3333333333333333
# as expected, masked_score is equivalent to unmaskedscore with the context word replaced by "<unk>"
```

### `score`

Used to evaluate the probability of a word given its context: *P(word | context)*

```julia
score(m::gammamodel, temp_lm::DefaultDict, word::AbstractString, context::AbstractString)
```

Arguments:

1. `m`: instance of the `Langmodel` struct.
2. `temp_lm`: output of a function call on the `Langmodel` instance.
3. `word`: the word to be scored.
4. `context`: the context of the given word.

- For `Lidstone` and `Laplace` models, it applies smoothing.

- For interpolated language models, `Kneserney` and `WittenBell` smoothing are provided; see the sketch below.
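
A minimal sketch, reusing `model` and `fit` from the MLE example above; the particular word/context pair is an illustrative assumption:

```julia
# a sketch, reusing `model` and `fit` from the MLE example above;
# the word/context pair here is an illustrative assumption
p = score(model, fit, "my", "is")  # P("my" | "is") under the fitted bigram counts
```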

### `maskedscore`

- It is used to evaluate the *score* with out-of-vocabulary words masked as "<unk>"

- The arguments are the same as for `score`; see the sketch below.
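
A quick check of the masking behaviour, reusing `model` and `fit` from the example above:

```julia
# "alien" is out of vocabulary, so maskedscore should replace it with "<unk>"
# before scoring, per the equivalence noted in the example above
maskedscore(model, fit, "is", "alien") == score(model, fit, "is", "<unk>")  # expected: true
```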

### `logscore`

- Evaluates the log score of the word in the given context.

- The arguments are the same as for `score` and `maskedscore`; see the sketch below.
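
A sketch relating it to `maskedscore`, assuming the log is taken base 2 (an assumption; base 2 is conventional for entropy and perplexity):

```julia
# assuming a base-2 logarithm, this should equal
# log2(maskedscore(model, fit, "is", "alien"))
lp = logscore(model, fit, "is", "alien")
```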

### `entropy`
```julia
entropy(m::Langmodel, lm::DefaultDict, text_ngram::Vector{T}) where {T <: AbstractString}
```

- Calculates the *cross-entropy* of the model for the given evaluation text.

- The input text must be an `Array` of ngrams of the same order; see the sketch below.
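
A minimal sketch, reusing the bigram `model` and `fit` from the example above; the evaluation bigrams themselves are illustrative assumptions:

```julia
# the evaluation ngrams must match the order of `fit` (bigrams here);
# these particular bigrams are illustrative assumptions
text_ngram = ["khan is", "is my"]
H = entropy(model, fit, text_ngram)
```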

### `perplexity`

- Calculates the perplexity of the given text.

- This is simply 2^cross-entropy (see `entropy`) of the text, so the arguments are the same as for `entropy`; see the sketch below.
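
A sketch mirroring the `entropy` example above, reusing its (assumed) `text_ngram`:

```julia
# perplexity is 2^entropy, so the call mirrors `entropy`
ppl = perplexity(model, fit, text_ngram)
```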

## Preprocessing

The following functions are provided for preprocessing:

1. `everygram`: returns all possible ngrams generated from a sequence of items, as an `Array{String,1}`:

```julia
julia> seq = ["To","be","or","not"]

julia> a = everygram(seq, min_len=1, max_len=-1)
10-element Array{Any,1}:
 "or"
 "not"
 "To"
 "be"
 "or not"
 "be or"
 "To be"
 "be or not"
 "To be or"
 "To be or not"
```

2. `padding_ngrams`: pads both the left and right of a sentence and outputs the ngrams of order n.

It also pads the original input `Array` of strings:

```julia
julia> example = ["1","2","3","4","5"]

julia> padding_ngrams(example, 2, pad_left=true, pad_right=true)
6-element Array{Any,1}:
 "<s> 1"
 "1 2"
 "2 3"
 "3 4"
 "4 5"
 "5 </s>"
```

## Vocabulary

Struct to store a language model's vocabulary.

It also adds a special "unknown" token which unseen words are mapped to.

```julia
julia> words = ["a", "c", "-", "d", "c", "a", "b", "r", "a", "c", "d"]

julia> vocabulary = Vocabulary(words, 2)
Vocabulary(Dict("<unk>"=>1,"c"=>3,"a"=>3,"d"=>2), 2, "<unk>")

# look up a sequence of words in the vocabulary
julia> word = ["a", "-", "d", "c", "a"]

julia> lookup(vocabulary ,word)
5-element Array{Any,1}:
"a"
"a"
"<unk>"
"d"
"c"
"d"
"c"
"a"
```
