Merge pull request #51 from rssdev10/feature/initial_docs
Feature/initial docs
aviks authored Jul 29, 2023
2 parents 7523996 + 99446b5 commit cbbf00e
Showing 11 changed files with 167 additions and 10 deletions.
44 changes: 35 additions & 9 deletions .github/workflows/CompatHelper.yml
@@ -1,19 +1,45 @@
name: CompatHelper

on:
schedule:
- cron: '55 00 * * *'

- cron: 0 0 * * *
workflow_dispatch:
permissions:
contents: write
pull-requests: write
jobs:
CompatHelper:
runs-on: ubuntu-latest
steps:
- uses: julia-actions/setup-julia@latest
- name: Check if Julia is already available in the PATH
id: julia_in_path
run: which julia
continue-on-error: true
- name: Install Julia, but only if it is not already available in the PATH
uses: julia-actions/setup-julia@v1
with:
version: 1.3
- name: Pkg.add("CompatHelper")
run: julia -e 'using Pkg; Pkg.add("CompatHelper")'
- name: CompatHelper.main()
version: '1'
arch: ${{ runner.arch }}
if: steps.julia_in_path.outcome != 'success'
- name: "Add the General registry via Git"
run: |
import Pkg
ENV["JULIA_PKG_SERVER"] = ""
Pkg.Registry.add("General")
shell: julia --color=yes {0}
- name: "Install CompatHelper"
run: |
import Pkg
name = "CompatHelper"
uuid = "aa819f21-2bde-4658-8897-bab36330d9b7"
version = "3"
Pkg.add(; name, uuid, version)
shell: julia --color=yes {0}
- name: "Run CompatHelper"
run: |
import CompatHelper
CompatHelper.main()
shell: julia --color=yes {0}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: julia -e 'using CompatHelper; CompatHelper.main()'
COMPATHELPER_PRIV: ${{ secrets.DOCUMENTER_KEY }}
# COMPATHELPER_PRIV: ${{ secrets.COMPATHELPER_PRIV }}
14 changes: 14 additions & 0 deletions .github/workflows/TagBot.yml
@@ -8,7 +8,18 @@ on:
lookback:
default: 3
permissions:
actions: read
checks: read
contents: write
deployments: read
issues: read
discussions: read
packages: read
pages: read
pull-requests: read
repository-projects: read
security-events: read
statuses: read
jobs:
TagBot:
if: github.event_name == 'workflow_dispatch' || github.actor == 'JuliaTagBot'
@@ -17,3 +28,6 @@ jobs:
- uses: JuliaRegistries/TagBot@v1
with:
token: ${{ secrets.GITHUB_TOKEN }}
# Edit the following line to reflect the actual name of the GitHub Secret containing your private key
ssh: ${{ secrets.DOCUMENTER_KEY }}
# ssh: ${{ secrets.NAME_OF_MY_SSH_PRIVATE_KEY_SECRET }}
3 changes: 3 additions & 0 deletions README.md
@@ -6,6 +6,9 @@ Languages.jl
[![pkgeval](https://juliahub.com/docs/Languages/pkgeval.svg)](https://juliahub.com/ui/Packages/Languages/w1H1r)


[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliatext.github.io/Languages.jl) [![](https://img.shields.io/badge/docs-dev-blue.svg)](https://juliatext.github.io/Languages.jl/dev)


## Introduction

Languages.jl is a Julia package for working with human languages. It provides:
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,2 @@
build/
site/
2 changes: 2 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,2 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
16 changes: 16 additions & 0 deletions docs/make.jl
@@ -0,0 +1,16 @@
using Documenter
using Languages

makedocs(
    sitename = "Languages",
    format = Documenter.HTML(),
    modules = [Languages],
    pages = [
        "Home" => "index.md",
        "API" => "api.md"
    ]
)

deploydocs(
    repo = "github.com/JuliaText/Languages.jl.git"
)
4 changes: 4 additions & 0 deletions docs/src/api.md
@@ -0,0 +1,4 @@
```@autodocs
Modules = [Languages]
Private = false
```
57 changes: 57 additions & 0 deletions docs/src/index.md
@@ -0,0 +1,57 @@
# Languages.jl

Languages.jl is a Julia package for working with human languages. It provides:

* Lists of words from each language for basic categories:
    * Articles
        * Indefinite Articles
        * Definite Articles
    * Prepositions
    * Pronouns
    * Stopwords

These methods are currently supported only for English and German.

This package also detects the script and language for written text in a wide variety of languages.

## Usage

    using Languages

    articles(Languages.English())
    stopwords(Languages.English())

All word lists are returned as vectors of UTF-8 strings.

## Script detection

The script detection model works by checking which Unicode character ranges are
present in the input text.

    Languages.detect_script("To be or not to be") # => Languages.LatinScript()
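
To make the idea of range checking concrete, here is a minimal, self-contained sketch. It is not the package's implementation: the function name `guess_script`, the `SCRIPT_RANGES` table and the three ranges it lists are assumptions chosen for illustration, and the real detector covers many more scripts.

```julia
# Hypothetical sketch: the names and ranges below are illustrative only
# and are not taken from Languages.jl.
const SCRIPT_RANGES = [
    "Latin"    => ['a':'z', 'A':'Z', '\u00c0':'\u024f'],
    "Cyrillic" => ['\u0400':'\u04ff'],
    "Greek"    => ['\u0370':'\u03ff'],
]

# Count how many characters of `text` fall into each script's ranges and
# return the name of the script with the most hits, or `nothing` if no
# character matched any range.
function guess_script(text::AbstractString)
    counts = Dict(name => 0 for (name, _) in SCRIPT_RANGES)
    for c in text
        for (name, ranges) in SCRIPT_RANGES
            if any(r -> c in r, ranges)
                counts[name] += 1
                break
            end
        end
    end
    best, n = "", 0
    for (name, _) in SCRIPT_RANGES  # fixed order keeps ties deterministic
        if counts[name] > n
            best, n = name, counts[name]
        end
    end
    return n == 0 ? nothing : best
end
```

Characters outside every listed range, such as spaces, digits and punctuation, are simply ignored, so only letters influence the result.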

## Language Detection

A trigram-based model is used to detect the language of the text. The candidate
models are filtered by the detected script.

The detector covers 84 of the most common languages spoken around the world,
which includes most languages with more than 10 million native speakers.

    detector = LanguageDetector()
    detector("To be or not to be") # => (Languages.English(), Languages.LatinScript(), 1.0)
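
As a rough illustration of trigram matching, the sketch below counts the character trigrams of a text and scores them against a per-language trigram list. It is a simplification under stated assumptions, not the package's scoring method: `trigram_counts` and `overlap_score` are hypothetical names, and the only detail borrowed from the package is that a profile is a list of trigrams obtained by splitting a `|`-separated string.

```julia
# Hypothetical sketch of trigram profiling; not the model used by Languages.jl.

# Count every run of three consecutive characters in the lowercased text.
function trigram_counts(text::AbstractString)
    chars = collect(lowercase(text))
    counts = Dict{String,Int}()
    for i in 1:length(chars)-2
        tri = String(chars[i:i+2])
        counts[tri] = get(counts, tri, 0) + 1
    end
    return counts
end

# Score a text against a profile by summing how often each profile trigram
# occurs in the text. Higher scores mean a closer match.
function overlap_score(text::AbstractString, profile::AbstractVector{<:AbstractString})
    counts = trigram_counts(text)
    return sum(get(counts, tri, 0) for tri in profile; init = 0)
end

# Toy profiles, invented for this example (not real model data).
english_like = split("the|he |and| th|to ", '|')
german_like = split("der|die|und|ch |ein", '|')
```

With these toy profiles, `overlap_score("the cat and the hat", english_like)` is larger than the score for `german_like`, so a detector built this way would prefer the English-like profile for that sentence.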

## List All Supported Languages
You can use `list_languages()` to get all supported languages.

The `LanguageDetector` model returns the language, the script, and the confidence when applied to a string.

The language and script detection code in this package is heavily inspired by the Rust package [whatlang-rs](https://github.com/greyblake/whatlang-rs). That package is in turn derived from [franc](https://github.com/wooorm/franc). See `LICENSE.whatlang-rs` for details.

## Deprecations

The API of this package was recently reworked. If you used an earlier version,
please be aware of these changes.

* Language names have been shortened: use `English` instead of `EnglishLanguage`. The names are also no longer exported, so refer to them with the package prefix, as in `Languages.English`.
* Every language is a type. However, all functions now accept and return instances of these types rather than the types themselves.
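
The "types versus instances" convention can be sketched in a few lines. This is a standalone miniature written for illustration, not a copy of the package source: it redefines `Language`, `Esperanto`, `isocode`, `english_name` and `from_code` locally so that it runs without Languages.jl installed, and its lookup table holds a single entry.

```julia
# Hypothetical miniature of the pattern; defined locally, not imported
# from Languages.jl.
abstract type Language end
struct Esperanto <: Language end

# Properties are defined once per language type...
isocode(::Type{Esperanto}) = "epo"
english_name(::Type{Esperanto}) = "Esperanto"

# ...and methods that take an instance forward to the type-level definition.
isocode(lang::T) where {T<:Language} = isocode(T)
english_name(lang::T) where {T<:Language} = english_name(T)

# Lookups return instances rather than types.
const code_to_lang = Dict{String,Language}("epo" => Esperanto())
from_code(code::String) = get(code_to_lang, lowercase(code), nothing)
```

Callers therefore write `isocode(Esperanto())` with an instance, and `from_code("epo")` hands back an instance that can be passed straight to the other functions.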
2 changes: 1 addition & 1 deletion src/Languages.jl
@@ -26,7 +26,7 @@ using RelocatableFolders
global trigram_models = Dict{String, Dict}()
for (script, langs) in trigram_models_json
for (lang, trigrams) in langs #store only supported langs
if from_code(lang) != nothing
if !isnothing(from_code(lang))
get!(trigram_models, script, Dict{String, Vector{String}}())[lang] = split(trigrams, '|')
end
end
22 changes: 22 additions & 0 deletions src/types.jl
@@ -5,8 +5,25 @@ abstract type Language; end
# Portuguese, Romanian, Russian, Spanish, Swedish, Turkish

# These are ISO 639-2T alpha-3 and ISO 639-3 codes
"""
isocode(lang::T) where {T<:Language}
Returns the ISO code of `lang`.
"""
isocode(lang::T) where {T<:Language} = isocode(T)

"""
name(lang::T) where {T<:Language}
Returns the self-name of the language `lang`.
"""
name(lang::T) where {T<:Language} = name(T)

"""
english_name(lang::T) where {T<:Language}
Returns the name of the language `lang` in English.
"""
english_name(lang::T) where {T<:Language} = english_name(T)

struct Esperanto <: Language; end; english_name(::Type{Esperanto}) = "Esperanto"; name(::Type{Esperanto}) = "Esperanto"; isocode(::Type{Esperanto}) = "epo";
@@ -182,6 +199,11 @@ global const code_to_lang = Dict{String, Language}(
"uig" => Uyghur(),
)

"""
from_code(code::String)
Returns the language object for the ISO `code`.
"""
function from_code(code::String)
return get(code_to_lang, lowercase(code), nothing)
end
11 changes: 11 additions & 0 deletions src/whatlang.jl
@@ -9,6 +9,12 @@

const RELIABLE_CONFIDENCE_THRESHOLD = 0.8;

"""
detect_script(text::AbstractString)
Detect a script for the given `text`.
Returns either `Script` or a tuple `(Script, probability)`.
"""
function detect_script(text::AbstractString)
script_counters = [
[LatinScript() , 0],
@@ -384,6 +390,11 @@ Base.@deprecate detect(text::AbstractString, options=default_options()) Language
mutable struct LanguageDetector
end

"""
(detector::LanguageDetector)(text::AbstractString, options=default_options())
Returns a tuple `(Language, Script, confidence)` for the given `text`.
"""
function(m::LanguageDetector)(text::AbstractString, options=default_options())
if text==""; throw(ArgumentError("Cannot detect language for empty text")); end
script = detect_script(text)