This Python project analyzes national AI and defense strategy documents using zero-shot text classification. The project focuses on Southeast Asia and nearby countries, specifically Australia, Indonesia, Malaysia, Singapore, Thailand, and Vietnam.
python main.py
import os
from textanalysis import analysis
path = os.path.join(os.getcwd(), 'data', 'policies', 'australia_defense.pdf')
temp = analysis.extract_pdfs(path)
df, fig = analysis.analyze_corpus(temp)
The result of `analyze_corpus` is a dataframe of classified text (by topic and sentiment) and an interactive plot of the topic and sentiment by text chunk.
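
The analysis works on text chunks, so a small illustration of chunking may help. The following is a minimal sketch, assuming fixed-size chunks measured in words; the function name `chunk_text` and the `max_words` parameter are hypothetical and are not taken from the `textanalysis` package, which may split text differently.

```python
def chunk_text(text: str, max_words: int = 100) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper for illustration; not part of this project's API.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    # Step through the word list in strides of max_words.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


sample = "one two three four five six seven"
print(chunk_text(sample, max_words=3))
# ['one two three', 'four five six', 'seven']
```

Each chunk could then be classified independently, which is what makes a per-chunk plot of topic and sentiment possible.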
This code uses the `facebook/bart-large-mnli` model from Hugging Face. This is a MultiNLI-tuned model based on BART (large) and is used here for zero-shot text classification.
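
As a rough illustration of zero-shot classification with this model, the sketch below uses the `transformers` `pipeline` API. The helper names and the candidate topic labels are assumptions made for this example and may not match what the project uses. The classifier is passed in as an argument so the selection logic can be read (and tried) without downloading the model.

```python
def build_zero_shot_classifier():
    """Load the zero-shot pipeline (downloads the model on first use)."""
    # Imported lazily so the rest of this sketch runs without `transformers`.
    from transformers import pipeline
    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


# Hypothetical topic labels, chosen only for illustration.
CANDIDATE_TOPICS = ["economy", "security", "ethics"]


def classify_chunk(text, labels, classifier):
    """Return the (label, score) pair with the highest score for one chunk.

    `classifier` is any callable that accepts `candidate_labels` and returns
    a dict with parallel "labels" and "scores" lists, as the Hugging Face
    zero-shot pipeline does.
    """
    result = classifier(text, candidate_labels=labels)
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
```

With `transformers` installed, `classify_chunk(chunk, CANDIDATE_TOPICS, build_zero_shot_classifier())` would run the real model on a single chunk of text.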
This code also uses the `distilbert-base-uncased-finetuned-sst-2-english` model from Hugging Face. This is a fine-tuned model based on DistilBERT and is used here for sentiment classification.
The `distilbert-base-uncased-finetuned-sst-2-english` model reports strong evaluation results in terms of accuracy and precision. However, it is also subject to risks, limitations, and biases.
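
The sentiment step can be sketched in the same way. The example below assumes the `transformers` sentiment pipeline, which returns one `{"label", "score"}` dict per input text. Converting that output to a signed score in [-1, 1] is an illustrative choice for plotting, not necessarily what this project does, and the function names are hypothetical.

```python
def build_sentiment_classifier():
    """Load the sentiment pipeline (downloads the model on first use)."""
    # Imported lazily so the rest of this sketch runs without `transformers`.
    from transformers import pipeline
    return pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )


def signed_sentiment(prediction):
    """Map a {'label', 'score'} prediction to a value in [-1, 1]."""
    sign = 1.0 if prediction["label"].upper() == "POSITIVE" else -1.0
    return sign * float(prediction["score"])


def score_chunks(chunks, classifier):
    """Return one signed sentiment score per text chunk."""
    predictions = classifier(list(chunks))
    return [signed_sentiment(p) for p in predictions]
```

With `transformers` installed, `score_chunks(chunks, build_sentiment_classifier())` would score a list of text chunks. Because the model labels every input as either positive or negative, neutral policy language still receives one of the two labels, which is one of the limitations to keep in mind when reading the results.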
The national-level AI strategies or policies for GPAI and each country under consideration are included as `.pdf` files in the `data/policies` directory. Text-only versions of those policies are included as `.txt` files in the `data/texts` directory.
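
To check which policies already have a text-only version, something like the following sketch could be used. It assumes that each `.txt` file shares its file stem with the corresponding `.pdf` (for example `australia_defense.pdf` and `australia_defense.txt`); that naming convention and the function name are assumptions, not something this README confirms.

```python
from pathlib import Path
from typing import Optional


def pair_policies_with_texts(data_dir: Path) -> dict[str, Optional[Path]]:
    """Map each PDF stem in data/policies to its .txt in data/texts, or None.

    Assumes matching file stems between the two directories (hypothetical).
    """
    texts_dir = data_dir / "texts"
    pairs: dict[str, Optional[Path]] = {}
    for pdf in sorted((data_dir / "policies").glob("*.pdf")):
        txt = texts_dir / f"{pdf.stem}.txt"
        # Record None when no text-only version exists for this policy.
        pairs[pdf.stem] = txt if txt.is_file() else None
    return pairs
```

Calling `pair_policies_with_texts(Path("data"))` from the repository root would list every policy and flag those without a text-only counterpart.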
The membership assessment metrics for the Global Partnership on Artificial Intelligence (GPAI) are included in the `data/metrics` directory. This directory includes the source documents and consolidated metrics for the countries under consideration. The metrics are defined in the 2021 GPAI *Frame for letter of intent and reference metrics to support the assessment of GPAI Membership* (also available in the same directory). The datasets are organized with the following identifiers:
Identifier | Dataset |
---|---|
aidv | AI and Democratic Values Index |
aigs | AI Global Surveillance Index |
aii | Stanford AI Index |
cri | Commitment to Reducing Inequality Index |
di | Democracy Index |
gai | Global AI Index |
gair | Government AI Readiness Index |
gfs | Global Freedom Score |
libdem | V-Dem Liberal Democracy Index |
odi | Open Data Index |
ttaip | Total number of 10% top-cited AI scientific publications, fractional counts |
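
When working with the consolidated metrics in code, it can be convenient to keep the identifier table as a lookup. The mapping below is copied from the table above; the `dataset_name` helper is a hypothetical convenience and is not part of this project.

```python
# Identifier -> dataset name, as listed in the table above.
METRIC_DATASETS = {
    "aidv": "AI and Democratic Values Index",
    "aigs": "AI Global Surveillance Index",
    "aii": "Stanford AI Index",
    "cri": "Commitment to Reducing Inequality Index",
    "di": "Democracy Index",
    "gai": "Global AI Index",
    "gair": "Government AI Readiness Index",
    "gfs": "Global Freedom Score",
    "libdem": "V-Dem Liberal Democracy Index",
    "odi": "Open Data Index",
    "ttaip": "Total number of 10% top-cited AI scientific publications, fractional counts",
}


def dataset_name(identifier: str) -> str:
    """Return the dataset name for an identifier, ignoring case and spaces."""
    key = identifier.strip().lower()
    if key not in METRIC_DATASETS:
        raise KeyError(f"Unknown metric identifier: {identifier!r}")
    return METRIC_DATASETS[key]


print(dataset_name("gair"))  # Government AI Readiness Index
```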
Intermediate data files and output figures and tables are included in the `data/output` directory.
Exploratory analysis suggests that the approach is feasible. The following figure shows the sentiment and topic classification across the text of Singapore's National AI Strategy.
If this work is useful to you, please cite the following paper: Keith, A.J. (2024) Governance of artificial intelligence in Southeast Asia. Global Policy, 00, 1–18. Available from: https://doi.org/10.1111/1758-5899.13458.
@article{https://doi.org/10.1111/1758-5899.13458,
author = {Keith, Andrew J.},
title = {Governance of artificial intelligence in Southeast Asia},
journal = {Global Policy},
volume = {n/a},
number = {n/a},
pages = {},
doi = {https://doi.org/10.1111/1758-5899.13458},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/1758-5899.13458},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1111/1758-5899.13458},
}