Downloading Models¶
Polyglot requires a model for each task and language. These models are essential for the library to function. Given the large size of some of the models, we distribute the models through a download manager separately. The download manager has several modes of operation.
Modes of Operation¶
Command Line Mode¶
The subcommand download
takes a package or more as an argument and
download the specified packages in the polyglot_data
directory.
!polyglot download --help
usage: polyglot download [-h] [--dir DIR] [--quiet] [--force] [--exit-on-error] [--url SERVER_INDEX_URL] [packages [packages ...]]
positional arguments:
packages packages to be downloaded
optional arguments:
-h, --help show this help message and exit
--dir DIR download package to directory DIR
--quiet work quietly
--force download even if already installed
--exit-on-error exit if an error occurs
--url SERVER_INDEX_URL
download server index url
!polyglot download morph2.en
[polyglot_data] Downloading package morph2.en to
[polyglot_data] /home/rmyeid/polyglot_data...
[polyglot_data] Package morph2.en is already up-to-date!
Interactive Mode¶
You can reach this mode by not supplying any arguments to the command line.
!polyglot download
Polyglot Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader>
Library Interface¶
from polyglot.downloader import downloader
downloader.download("embeddings2.en")
Collections¶
You noticed, by now, that we can install a specific model by specifying its name and the target language.
Package name format is task_name.language_code
Packages are grouped by language. For example, if we want to download
all the models that are specific to Arabic, the arabic collection of
models name is LANG: followed by the language code of Arabic which
is ar
.
Therefore, we can just run:
!polyglot download LANG:ar
[polyglot_data] Downloading collection u'LANG:ar'
[polyglot_data] |
[polyglot_data] | Downloading package tsne2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package tsne2.ar is already up-to-date!
[polyglot_data] | Downloading package transliteration2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package transliteration2.ar is already up-to-
[polyglot_data] | date!
[polyglot_data] | Downloading package morph2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package morph2.ar is already up-to-date!
[polyglot_data] | Downloading package counts2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package counts2.ar is already up-to-date!
[polyglot_data] | Downloading package sentiment2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package sentiment2.ar is already up-to-date!
[polyglot_data] | Downloading package embeddings2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package embeddings2.ar is already up-to-date!
[polyglot_data] | Downloading package ner2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package ner2.ar is already up-to-date!
[polyglot_data] |
[polyglot_data] Done downloading collection LANG:ar
Packages are grouped by task. For example, if we want to download all the models that perform transliteration. The collection name is TASK: followed by the task name.
Therefore, we can just run:
downloader.download("TASK:transliteration2", quiet=True)
True
Langauge & Task Support¶
We can query our download manager for which tasks are supported by polyglot, as the following:
downloader.supported_tasks(lang="en")
[u'embeddings2',
u'counts2',
u'pos2',
u'ner2',
u'sentiment2',
u'morph2',
u'tsne2']
We can query our download manager for which languages are supported by polyglot named entity recognition subsystem, as the following:
print(downloader.supported_languages_table(task="ner2"))
1. Polish 2. Turkish 3. Russian
4. Indonesian 5. Czech 6. Arabic
7. Korean 8. Catalan; Valencian 9. Italian
10. Thai 11. Romanian, Moldavian, ... 12. Tagalog
13. Danish 14. Finnish 15. German
16. Persian 17. Dutch 18. Chinese
19. French 20. Portuguese 21. Slovak
22. Hebrew (modern) 23. Malay 24. Slovene
25. Bulgarian 26. Hindi 27. Japanese
28. Hungarian 29. Croatian 30. Ukrainian
31. Serbian 32. Lithuanian 33. Norwegian
34. Latvian 35. Swedish 36. English
37. Greek, Modern 38. Spanish; Castilian 39. Vietnamese
40. Estonian
You can view all the available and/or installed collections or packages through the list function
downloader.list(show_packages=False)
Using default data directory (/home/rmyeid/polyglot_data)
=========================================
Data server index for <polyglot-models>
=========================================
Collections:
[ ] LANG:af............. Afrikaans packages and models
[ ] LANG:als............ als packages and models
[ ] LANG:am............. Amharic packages and models
[ ] LANG:an............. Aragonese packages and models
[ ] LANG:ar............. Arabic packages and models
[ ] LANG:arz............ arz packages and models
[ ] LANG:as............. Assamese packages and models
[ ] LANG:ast............ Asturian packages and models
[ ] LANG:az............. Azerbaijani packages and models
[ ] LANG:ba............. Bashkir packages and models
[ ] LANG:bar............ bar packages and models
[ ] LANG:be............. Belarusian packages and models
[ ] LANG:bg............. Bulgarian packages and models
[ ] LANG:bn............. Bengali packages and models
[ ] LANG:bo............. Tibetan packages and models
[ ] LANG:bpy............ bpy packages and models
[ ] LANG:br............. Breton packages and models
[ ] LANG:bs............. Bosnian packages and models
[ ] LANG:ca............. Catalan packages and models
[ ] LANG:ce............. Chechen packages and models
[ ] LANG:ceb............ Cebuano packages and models
[ ] LANG:cs............. Czech packages and models
[ ] LANG:cv............. Chuvash packages and models
[ ] LANG:cy............. Welsh packages and models
[ ] LANG:da............. Danish packages and models
[ ] LANG:de............. German packages and models
[ ] LANG:diq............ diq packages and models
[ ] LANG:dv............. Divehi packages and models
[ ] LANG:el............. Greek packages and models
[P] LANG:en............. English packages and models
[ ] LANG:eo............. Esperanto packages and models
[ ] LANG:es............. Spanish packages and models
[ ] LANG:et............. Estonian packages and models
[ ] LANG:eu............. Basque packages and models
[ ] LANG:fa............. Persian packages and models
[ ] LANG:fi............. Finnish packages and models
[ ] LANG:fo............. Faroese packages and models
[ ] LANG:fr............. French packages and models
[ ] LANG:fy............. Western Frisian packages and models
[ ] LANG:ga............. Irish packages and models
[ ] LANG:gan............ gan packages and models
[ ] LANG:gd............. Scottish Gaelic packages and models
[ ] LANG:gl............. Galician packages and models
[ ] LANG:gu............. Gujarati packages and models
[ ] LANG:gv............. Manx packages and models
[ ] LANG:he............. Hebrew packages and models
[ ] LANG:hi............. Hindi packages and models
[ ] LANG:hif............ hif packages and models
[ ] LANG:hr............. Croatian packages and models
[ ] LANG:hsb............ Upper Sorbian packages and models
[ ] LANG:ht............. Haitian packages and models
[ ] LANG:hu............. Hungarian packages and models
[ ] LANG:hy............. Armenian packages and models
[ ] LANG:ia............. Interlingua packages and models
[ ] LANG:id............. Indonesian packages and models
[ ] LANG:ilo............ Iloko packages and models
[ ] LANG:io............. Ido packages and models
[ ] LANG:is............. Icelandic packages and models
[ ] LANG:it............. Italian packages and models
[ ] LANG:ja............. Japanese packages and models
[ ] LANG:jv............. Javanese packages and models
[ ] LANG:ka............. Georgian packages and models
[ ] LANG:kk............. Kazakh packages and models
[ ] LANG:km............. Khmer packages and models
[ ] LANG:kn............. Kannada packages and models
[ ] LANG:ko............. Korean packages and models
[ ] LANG:ku............. Kurdish packages and models
[ ] LANG:ky............. Kyrgyz packages and models
[ ] LANG:la............. Latin packages and models
[ ] LANG:lb............. Luxembourgish packages and models
[ ] LANG:li............. Limburgish packages and models
[ ] LANG:lmo............ lmo packages and models
[ ] LANG:lt............. Lithuanian packages and models
[ ] LANG:lv............. Latvian packages and models
[ ] LANG:mg............. Malagasy packages and models
[ ] LANG:mk............. Macedonian packages and models
[ ] LANG:ml............. Malayalam packages and models
[ ] LANG:mn............. Mongolian packages and models
[ ] LANG:mr............. Marathi packages and models
[ ] LANG:ms............. Malay packages and models
[ ] LANG:mt............. Maltese packages and models
[ ] LANG:my............. Burmese packages and models
[ ] LANG:ne............. Nepali packages and models
[ ] LANG:nl............. Dutch packages and models
[ ] LANG:nn............. Norwegian Nynorsk packages and models
[ ] LANG:no............. Norwegian packages and models
[ ] LANG:oc............. Occitan packages and models
[ ] LANG:or............. Oriya packages and models
[ ] LANG:os............. Ossetic packages and models
[ ] LANG:pa............. Punjabi packages and models
[ ] LANG:pam............ Pampanga packages and models
[ ] LANG:pl............. Polish packages and models
[ ] LANG:pms............ pms packages and models
[ ] LANG:ps............. Pashto packages and models
[ ] LANG:pt............. Portuguese packages and models
[ ] LANG:qu............. Quechua packages and models
[ ] LANG:rm............. Romansh packages and models
[ ] LANG:ro............. Romanian packages and models
[ ] LANG:ru............. Russian packages and models
[ ] LANG:sa............. Sanskrit packages and models
[ ] LANG:sah............ Sakha packages and models
[ ] LANG:scn............ Sicilian packages and models
[ ] LANG:sco............ Scots packages and models
[ ] LANG:se............. Northern Sami packages and models
[ ] LANG:sh............. Serbo-Croatian packages and models
[ ] LANG:si............. Sinhala packages and models
[ ] LANG:sk............. Slovak packages and models
[ ] LANG:sl............. Slovenian packages and models
[ ] LANG:sq............. Albanian packages and models
[ ] LANG:sr............. Serbian packages and models
[ ] LANG:su............. Sundanese packages and models
[ ] LANG:sv............. Swedish packages and models
[ ] LANG:sw............. Swahili packages and models
[ ] LANG:szl............ szl packages and models
[ ] LANG:ta............. Tamil packages and models
[ ] LANG:te............. Telugu packages and models
[ ] LANG:tg............. Tajik packages and models
[ ] LANG:th............. Thai packages and models
[ ] LANG:tk............. Turkmen packages and models
[ ] LANG:tl............. Tagalog packages and models
[ ] LANG:tr............. Turkish packages and models
[ ] LANG:tt............. Tatar packages and models
[ ] LANG:ug............. Uyghur packages and models
[ ] LANG:uk............. Ukrainian packages and models
[ ] LANG:ur............. Urdu packages and models
[ ] LANG:uz............. Uzbek packages and models
[ ] LANG:vec............ vec packages and models
[ ] LANG:vi............. Vietnamese packages and models
[ ] LANG:vls............ vls packages and models
[ ] LANG:vo............. Volapük packages and models
[ ] LANG:wa............. Walloon packages and models
[ ] LANG:war............ Waray packages and models
[ ] LANG:yi............. Yiddish packages and models
[ ] LANG:yo............. Yoruba packages and models
[ ] LANG:zh............. Chinese packages and models
[ ] LANG:zhc............ Chinese Character packages and models
[ ] LANG:zhw............ zhw packages and models
[ ] TASK:counts2........ counts2
[ ] TASK:embeddings2.... embeddings2
[ ] TASK:ner2........... ner2
[P] TASK:sentiment2..... sentiment2
[ ] TASK:tsne2.......... tsne2
([*] marks installed packages; [P] marks partially installed collections)