🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
These models can be applied on:
- 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
- 🖼️ Images, for tasks like image classification, object detection, and segmentation.
- 🗣️ Audio, for tasks like speech recognition and audio classification.
Transformer models can also perform tasks that combine several modalities, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
🤗 Transformers is backed by the three most popular deep learning libraries (Jax, PyTorch and TensorFlow) with seamless integration between them. It is straightforward to train your models with one before loading them for inference with another.
You can test most of our models directly on their pages from the model hub. We also offer private model hosting, versioning, and an inference API for public and private models.
Here are a few examples:
In Natural Language Processing:
- Masked word completion with BERT
- Named Entity Recognition with Electra
- Text generation with GPT-2
- Natural Language Inference with RoBERTa
- Summarization with BART
- Question answering with DistilBERT
- Translation with T5
In Computer Vision:
- Image classification with ViT
- Object Detection with DETR
- Semantic Segmentation with SegFormer
- Panoptic Segmentation with DETR
In Audio:
In Multimodal tasks:
Write With Transformer, built by the Hugging Face team, is the official demo of this repo's text generation capabilities.
To immediately use a model on a given input (text, image, audio, ...), we provide the pipeline API. A pipeline groups together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:
>>> from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.97%.
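To make the output format concrete, the sketch below post-processes the result shown above without running a model. The `result` list is copied from the example output; `describe` is a hypothetical helper introduced only for illustration and is not part of the library.

```python
# Output copied from the sentiment-analysis example above.
result = [{'label': 'POSITIVE', 'score': 0.9996980428695679}]


def describe(prediction):
    """Format one prediction dict as 'LABEL (xx.xx%)' (hypothetical helper)."""
    return f"{prediction['label']} ({prediction['score']:.2%})"


# A pipeline returns one dict per input, so we format each entry in turn.
summaries = [describe(p) for p in result]
print(summaries[0])
```

Each entry is a plain dictionary with a `label` and a `score`, so ordinary Python is enough to consume it.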
Many tasks have a pretrained pipeline ready to go, in NLP but also in computer vision and speech. For example, we can easily extract detected objects in an image:
>>> import requests
>>> from PIL import Image
>>> from transformers import pipeline
# Download an image with cute cats
>>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png"
>>> image_data = requests.get(url, stream=True).raw
>>> image = Image.open(image_data)
# Allocate a pipeline for object detection
>>> object_detector = pipeline('object-detection')
>>> object_detector(image)
[{'score': 0.9982201457023621,
'label': 'remote',
'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
{'score': 0.9960021376609802,
'label': 'remote',
'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}},
{'score': 0.9954745173454285,
'label': 'couch',
'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
{'score': 0.9988006353378296,
'label': 'cat',
'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
{'score': 0.9986783862113953,
'label': 'cat',
'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}]
Here we get a list of objects detected in the image, with a box surrounding each object and a confidence score. In the accompanying figure, the original image is on the left and the predictions are displayed on the right.
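As a rough illustration of how the detection list can be consumed, the sketch below filters it by label and score and finds the largest box. The `detections` data is copied from the output above; `box_area` and `filter_detections` are hypothetical helpers, not library functions.

```python
# Detections copied from the object-detection example above.
detections = [
    {'score': 0.9982201457023621, 'label': 'remote',
     'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
    {'score': 0.9960021376609802, 'label': 'remote',
     'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}},
    {'score': 0.9954745173454285, 'label': 'couch',
     'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
    {'score': 0.9988006353378296, 'label': 'cat',
     'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
    {'score': 0.9986783862113953, 'label': 'cat',
     'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}},
]


def box_area(box):
    """Area of a bounding box in square pixels (hypothetical helper)."""
    return (box['xmax'] - box['xmin']) * (box['ymax'] - box['ymin'])


def filter_detections(items, label=None, min_score=0.0):
    """Keep detections at or above min_score, optionally for a single label."""
    return [
        d for d in items
        if d['score'] >= min_score and (label is None or d['label'] == label)
    ]


cats = filter_detections(detections, label='cat', min_score=0.99)
largest = max(detections, key=lambda d: box_area(d['box']))
print(len(cats), largest['label'])
```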
You can learn more about the tasks supported by the pipeline API in this tutorial.
In addition to pipeline, all it takes to download and use any of the pretrained models on your given task is three lines of code. Here is the PyTorch version:
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
And here is the equivalent code for TensorFlow:
>>> from transformers import AutoTokenizer, TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single string (as in the examples above) or on a list. It outputs a dictionary that you can use in downstream code or simply pass directly to your model using the ** argument unpacking operator.
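The ** unpacking step can be sketched in plain Python, with no library installed. Everything here is a stand-in: `inputs` mimics the shape of a tokenizer's output with illustrative values, and `toy_model` is a hypothetical function, not a real model.

```python
# Stand-in for the dictionary a tokenizer returns; the values are illustrative.
inputs = {
    "input_ids": [[101, 7592, 2088, 999, 102]],
    "attention_mask": [[1, 1, 1, 1, 1]],
}


def toy_model(input_ids, attention_mask=None):
    """Hypothetical stand-in for a model call: counts attended tokens per row."""
    if attention_mask is None:
        attention_mask = [[1] * len(row) for row in input_ids]
    return [sum(mask) for mask in attention_mask]


# ** spreads the dictionary's keys into keyword arguments,
# so these two calls are equivalent.
unpacked = toy_model(**inputs)
explicit = toy_model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
)
print(unpacked)
```

This is why the tokenizer's key names matter: they have to match the parameter names the model accepts.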
The model itself is a regular PyTorch nn.Module or a TensorFlow tf.keras.Model (depending on your backend), which you can use as usual. This tutorial explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our Trainer API to quickly fine-tune on a new dataset.
- Easy-to-use state-of-the-art models:
  - High performance on natural language understanding and generation, computer vision, and audio tasks.
  - Low barrier to entry for educators and practitioners.
  - Few user-facing abstractions, with just three classes to learn.
  - A unified API for using all our pretrained models.
- Lower compute costs, smaller carbon footprint:
  - Researchers can share trained models instead of always retraining.
  - Practitioners can reduce compute time and production costs.
  - Dozens of architectures with over 60,000 pretrained models across all modalities.
- Choose the right framework for every part of a model's lifetime:
  - Train state-of-the-art models in 3 lines of code.
  - Move a single model between TF2.0/PyTorch/JAX frameworks at will.
  - Seamlessly pick the right framework for training, evaluation and production.
- Easily customize a model or an example to your needs:
  - We provide examples for each architecture to reproduce the results published by its original authors.
  - Model internals are exposed as consistently as possible.
  - Model files can be used independently of the library for quick experiments.
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is deliberately not refactored with additional abstractions, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library (possibly Accelerate).
- While we strive to present as many use cases as possible, the scripts in our examples folder are just that: examples. It is expected that they won't work out of the box on your specific problem and that you will need to change a few lines of code to adapt them to your needs.
This repository is tested on Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+ and TensorFlow 2.3+.
You should install 🤗 Transformers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.
First, create a virtual environment with the version of Python you're going to use and activate it.
Then, you will need to install at least one of Flax, PyTorch or TensorFlow. Please refer to the TensorFlow installation page, the PyTorch installation page and/or the Flax and Jax installation pages for the specific installation command for your platform.
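Before installing, it can help to check which backends are already importable in the active environment. The following is a minimal sketch using only the standard library; `installed_backends` is a hypothetical helper, and the import names (`flax`, `torch`, `tensorflow`) are assumed to be the usual top-level module names for these packages.

```python
import importlib.util
import sys

# Display name -> assumed top-level import name of each backend.
BACKENDS = {"Flax": "flax", "PyTorch": "torch", "TensorFlow": "tensorflow"}


def installed_backends():
    """Return the display names of backends that can be found (hypothetical helper)."""
    found = []
    for name, module in BACKENDS.items():
        try:
            spec = importlib.util.find_spec(module)
        except (ImportError, ValueError):
            spec = None  # treat a broken install as "not available"
        if spec is not None:
            found.append(name)
    return found


# The README states the repository is tested on Python 3.6+.
python_ok = sys.version_info >= (3, 6)
available = installed_backends()
print("Python OK:", python_ok)
print("Backends found:", available or "none")
```

`find_spec` only locates a module without importing it, so the check stays fast even when a heavy framework is installed.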
When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:
pip install transformers
If you'd like to play with the examples, or need the bleeding edge of the code and can't wait for a new release, you must install the library from source.
Since Transformers version 4.0.0, we now have a conda channel: huggingface.
🤗 Transformers can be installed using conda as follows:
conda install -c huggingface transformers
Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.
NOTE: On Windows, you may be prompted to activate Developer Mode in order to benefit from caching. If this is not an option for you, please let us know in this issue.
All the model checkpoints provided by 🤗 Transformers are seamlessly integrated from the huggingface.co model hub, where they are uploaded directly by users and organizations.
🤗 Transformers currently provides the following architectures (see here for a high-level summary of each of them):
- ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
- ALIGN (from Google Research) released with the paper Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
- AltCLIP (from BAAI) released with the paper AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
- Audio Spectrogram Transformer (from MIT) released with the paper AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass.
- BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
- BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
- BARTpho (from VinAI Research) released with the paper BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
- BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
- BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
- BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
- BERTweet (from VinAI Research) released with the paper BERTweet: A pre-trained language model for English Tweets by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
- BigBird-Pegasus (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
- BigBird-RoBERTa (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
- BioGpt (from Microsoft Research AI4Science) released with the paper BioGPT: generative pre-trained transformer for biomedical text generation and mining by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
- BiT (from Google AI) released with the paper Big Transfer (BiT) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby.
- Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
- BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
- BLIP (from Salesforce) released with the paper BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
- BLIP-2 (from Salesforce) released with the paper BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.
- BLOOM (from BigScience workshop) released by the BigScience Workshop.
- BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
- BridgeTower (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
- ByT5 (from Google Research) released with the paper ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
- CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
- CANINE (from Google Research) released with the paper CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
- Chinese-CLIP (from OFA-Sys) released with the paper Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
- CLAP (from LAION-AI) released with the paper Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation (https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
- CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
- CLIPSeg (from University of Göttingen) released with the paper Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker.
- CodeGen (from Salesforce) released with the paper A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
- Conditional DETR (from Microsoft Research Asia) released with the paper Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
- ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
- ConvNeXT (from Facebook AI) released with the paper A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
- ConvNeXTV2 (from Facebook AI) released with the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
- CPM (from Tsinghua University) released with the paper CPM: A Large-scale Generative Chinese Pre-trained Language Model by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
- CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
- CvT (from Microsoft) released with the paper CvT: Introducing Convolutions to Vision Transformers by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
- Data2Vec (from Facebook) released with the paper Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
- DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
- DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
- Decision Transformer (from Berkeley/Facebook/Google) released with the paper Decision Transformer: Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
- Deformable DETR (from SenseTime Research) released with the paper Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
- DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
- DETA (from The University of Texas at Austin) released with the paper NMS Strikes Back by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
- DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
- DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
- DiNAT (from SHI Labs) released with the paper Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.
- DistilBERT (from HuggingFace) released with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2, RoBERTa and Multilingual BERT; the compressed models are named DistilGPT2, DistilRoBERTa and DistilmBERT respectively.
- DiT (from Microsoft Research) released with the paper DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
- Donut (from NAVER) released with the paper OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
- DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
- DPT (from Intel Labs) released with the paper Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
- EfficientFormer (from Snap Research) released with the paper EfficientFormer: Vision Transformers at MobileNetSpeed by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
- EfficientNet (from Google Brain) released with the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks by Mingxing Tan, Quoc V. Le.
- ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
- EncoderDecoder (from Google Research) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
- ERNIE (from Baidu) released with the paper ERNIE: Enhanced Representation through Knowledge Integration by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
- ErnieM (from Baidu) released with the paper ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
- ESM (from Meta AI) are transformer protein language models. ESM-1b was released with the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. ESM-1v was released with the paper Language models enable zero-shot prediction of the effects of mutations on protein function by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. ESM-2 and ESMFold were released with the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
- FLAN-T5 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.
- FLAN-UL2 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
- FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
- FLAVA (from Facebook AI) released with the paper FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
- FNet (from Google Research) released with the paper FNet: Mixing Tokens with Fourier Transforms by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
- Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
- GIT (from Microsoft Research) released with the paper GIT: A Generative Image-to-text Transformer for Vision and Language by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
- GLPN (from KAIST) released with the paper Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
- GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
- GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
- GPT NeoX (from EleutherAI) released with the paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach.
- GPT NeoX Japanese (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori.
- GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
- GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
- GPT-Sw3 (from AI-Sweden) released with the paper Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
- GPTSAN-japanese released in the repository tanreinama/GPTSAN by Toshiyuki Sakamoto (tanreinama).
- Graphormer (from Microsoft) released with the paper Do Transformers Really Perform Bad for Graph Representation? by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
- GroupViT (from UCSD, NVIDIA) released with the paper GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
- Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
- I-BERT (from Berkeley) released with the paper I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
- ImageGPT (from OpenAI) released with the paper Generative Pretraining from Pixels by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
- Informer (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
- Jukebox (from OpenAI) released with the paper Jukebox: A Generative Model for Music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
- LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
- LayoutLMv2 (from Microsoft Research Asia) released with the paper LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
- LayoutLMv3 (from Microsoft Research Asia) released with the paper LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
- LayoutXLM (from Microsoft Research Asia) released with the paper LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
- LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
- LeViT (from Meta AI) released with the paper LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
- LiLT (from South China University of Technology) released with the paper LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.
- LLaMA (from The FAIR team of Meta AI) released with the paper LLaMA: Open and Efficient Foundation Language Models by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
- Longformer (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
- LongT5 (from Google AI) released with the paper LongT5: Efficient Text-To-Text Transformer for Long Sequences by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
- LUKE (from Studio Ousia) released with the paper LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
- LXMERT (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
- M-CTC-T (from Facebook) released with the paper Pseudo-Labeling For Massively Multilingual Speech Recognition by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
- M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
- MarianMT: machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
- MarkupLM (from Microsoft Research Asia) released with the paper MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
- Mask2Former (from FAIR and UIUC) released with the paper Masked-attention Mask Transformer for Universal Image Segmentation by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
- MaskFormer (from Meta and UIUC) released with the paper Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
- mBART (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
- mBART-50 (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
- MEGA (from Facebook) released with the paper Mega: Moving Average Equipped Gated Attention by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.
- Megatron-BERT (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
- Megatron-GPT2 (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
- MGP-STR (from Alibaba Research) released with the paper Multi-Granularity Prediction for Scene Text Recognition by Peng Wang, Cheng Da, and Cong Yao.
- mLUKE (from Studio Ousia) released with the paper mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
- MobileBERT (from CMU/Google Brain) released with the paper MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
- MobileNetV1 (from Google Inc.) released with the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
- MobileNetV2 (from Google Inc.) released with the paper MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
- MobileViT (from Apple) released with the paper MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari.
- MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
- MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
- MVP (from RUC AI Box) released with the paper MVP: Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
- NAT (from SHI Labs) released with the paper Neighborhood Attention Transformer by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
- Nezha (from Huawei Noah's Ark Lab) released with the paper NEZHA: Neural Contextualized Representation for Chinese Language Understanding by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
- NLLB (from Meta) released with the paper No Language Left Behind: Scaling Human-Centered Machine Translation by the NLLB team.
- NLLB-MOE (from Meta) released with the paper No Language Left Behind: Scaling Human-Centered Machine Translation by the NLLB team.
- Nyströmformer (from the University of Wisconsin - Madison) released with the paper Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
- OneFormer (from SHI Labs) released with the paper OneFormer: One Transformer to Rule Universal Image Segmentation by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
- OPT (from Meta AI) released with the paper OPT: Open Pre-trained Transformer Language Models by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
- OWL-ViT (from Google AI) released with the paper Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
- Pegasus (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
- PEGASUS-X (from Google) released with the paper Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao, and Peter J. Liu.
- Perceiver IO (from DeepMind) released with the paper Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
- PhoBERT (from VinAI Research) released with the paper PhoBERT: Pre-trained language models for Vietnamese by Dat Quoc Nguyen and Anh Tuan Nguyen.
- Pix2Struct (from Google) released with the paper Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
- PLBart (from UCLA NLP) released with the paper Unified Pre-training for Program Understanding and Generation by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
- PoolFormer (from Sea AI Labs) released with the paper MetaFormer is Actually What You Need for Vision by Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan.
- ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
- QDQBert (from NVIDIA) released with the paper Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
- RAG (from Facebook) released with the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
- REALM (from Google Research) released with the paper REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
- Reformer (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
- RegNet (from META Platforms) released with the paper Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
- RemBERT (from Google Research) released with the paper Rethinking embedding coupling in pre-trained language models by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
- ResNet (from Microsoft Research) released with the paper Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
- RoBERTa (from Facebook) released with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
- RoBERTa-PreLayerNorm (from Facebook) released with the paper fairseq: A Fast, Extensible Toolkit for Sequence Modeling by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli.
- RoCBert (from WeChatAI) released with the paper RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou.
- RoFormer (from ZhuiyiTechnology) released with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen and Yunfeng Liu.
- SegFormer (from NVIDIA) released with the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
- SEW (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
- SEW-D (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
- SpeechT5 (from Microsoft Research) released with the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
- SpeechToTextTransformer (from Facebook) released with the paper fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
- SpeechToTextTransformer2 (from Facebook) released with the paper Large-Scale Self- and Semi-Supervised Learning for Speech Translation by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
- Splinter (from Tel Aviv University) released with the paper Few-Shot Question Answering by Pretraining Span Selection by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
- SqueezeBERT (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
- Swin Transformer (from Microsoft) released with the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
- Swin Transformer V2 (from Microsoft) released with the paper Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
- Swin2SR (from University of Würzburg) released with the paper Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
- SwitchTransformers (from Google) released with the paper Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by William Fedus, Barret Zoph, Noam Shazeer.
- T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.
- T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.
- Table Transformer (from Microsoft Research) released with the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Brandon Smock, Rohith Pesala, Robin Abraham.
- TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
- TAPEX (from Microsoft Research) released with the paper TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
- Time Series Transformer (from HuggingFace).
- TimeSformer (from Facebook) released with the paper Is Space-Time Attention All You Need for Video Understanding? by Gedas Bertasius, Heng Wang, Lorenzo Torresani.
- Trajectory Transformer (from the University of California at Berkeley) released with the paper Offline Reinforcement Learning as One Big Sequence Modeling Problem by Michael Janner, Qiyang Li, Sergey Levine.
- Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
- TrOCR (from Microsoft) released with the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
- TVLT (from UNC Chapel Hill) released with the paper TVLT: Textless Vision-Language Transformer by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
- UL2 (from Google Research) released with the paper Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler.
- UniSpeech (from Microsoft Research) released with the paper UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
- UniSpeechSat (from Microsoft Research) released with the paper UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
- UPerNet (from Peking University) released with the paper Unified Perceptual Parsing for Scene Understanding by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.
- VAN (from Tsinghua University and Nankai University) released with the paper Visual Attention Network by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
- VideoMAE (from Multimedia Computing Group, Nanjing University) released with the paper VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
- ViLT (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision by Wonjae Kim, Bokyung Son, Ildoo Kim.
- Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
- VisualBERT (from UCLA NLP) released with the paper VisualBERT: A Simple and Performant Baseline for Vision and Language by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
- ViT Hybrid (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
- ViTMAE (from Meta AI) released with the paper Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
- ViTMSN (from Meta AI) released with the paper Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.
- Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
- Wav2Vec2-Conformer (from Facebook AI) released with the paper FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
- Wav2Vec2Phoneme (from Facebook AI) released with the paper Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli.
- WavLM (from Microsoft Research) released with the paper WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
- Whisper (from OpenAI) released with the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
- X-CLIP (from Microsoft Research) released with the paper Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
- X-MOD (from Meta AI) released with the paper Lifting the Curse of Multilinguality by Pre-training Modular Transformers by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe.
- XGLM (from Facebook AI) released with the paper Few-shot Learning with Multilingual Language Models by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
- XLM (from Facebook) released with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
- XLM-ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
- XLM-RoBERTa (from Facebook AI) released with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
- XLM-RoBERTa-XL (from Facebook AI) released with the paper Larger-Scale Transformers for Multilingual Masked Language Modeling by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
- XLM-V (from Meta AI) released with the paper XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa.
- XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
- XLS-R (from Facebook AI) released with the paper XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
- XLSR-Wav2Vec2 (from Facebook AI) released with the paper Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
- YOLOS (from Huazhong University of Science & Technology) released with the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
- YOSO (from the University of Wisconsin - Madison) released with the paper You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
- Want to contribute a new model? We have added a detailed guide and templates to walk you through adding a new model. You can find them in the templates folder of the repository. Before starting your PR, be sure to check the contributing guidelines and either contact the maintainers or open an issue to collect feedback.
To check whether each model has an implementation in Flax, PyTorch, or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to this table.
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.
Section | Description |
---|---|
Documentation | Full API documentation and tutorials |
Task summary | Tasks supported by 🤗 Transformers |
Preprocessing tutorial | Using the Tokenizer class to prepare data for the models |
Training and fine-tuning | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the Trainer API |
Quick tour: Fine-tuning/usage scripts | Example scripts for fine-tuning models on a wide range of tasks |
Model sharing and uploading | Upload your fine-tuned models and share them with the community |
Migration | Migrate to 🤗 Transformers from pytorch-transformers or pytorch-pretrained-bert |
We now have a paper you can cite for the 🤗 Transformers library:
@inproceedings{wolf-etal-2020-transformers,
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = oct,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
pages = "38--45"
}