Discussion #229
Comments
I think that list is a valid requirement for some people, but not everyone. I'd add that as a new list within this repo. |
Just add another file in src called "recognized-un-country.json" with a 1 or 0 value. This will keep the existing structure and push the responsibility to the person(s) creating their application. Hope this helps. |
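A minimal sketch of the flag-file idea above. The `recognized` key name and the two sample rows are assumptions for illustration; the thread only says the file would hold "a 1 or 0 value" per country.

```python
import json

# Hypothetical contents of src/recognized-un-country.json (key name is an assumption).
recognized_json = """[
  {"country": "France", "recognized": 1},
  {"country": "Taiwan", "recognized": 0}
]"""

# Illustrative rows in the {"country": ..., "<field>": ...} shape used by the src files.
capitals_json = """[
  {"country": "France", "city": "Paris"},
  {"country": "Taiwan", "city": "Taipei"}
]"""


def filter_recognized(rows, flags):
    """Keep only the rows whose country carries a flag value of 1."""
    allowed = {flag["country"] for flag in flags if flag["recognized"] == 1}
    return [row for row in rows if row["country"] in allowed]


flags = json.loads(recognized_json)
capitals = json.loads(capitals_json)
print(filter_recognized(capitals, flags))
```

With this layout the existing files stay untouched, and each application decides whether to apply the filter.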
Thanks, but I don't think it would be nice to keep the existing structure. Some src files have more entries than others. The idea is for all files to contain all 193 countries in the same order, so if you want to get multiple data of one country from all files, it would be very convenient. |
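To illustrate why identical ordering across files is convenient, here is a rough sketch that merges several lists position by position. The function name `merge_aligned` and the sample rows are hypothetical; the only assumption about the data is that every row has a `country` key.

```python
def merge_aligned(*datasets):
    """Merge lists that share the same country order into one record per country."""
    if len({len(rows) for rows in datasets}) > 1:
        raise ValueError("all files must contain the same number of countries")

    merged = []
    for rows in zip(*datasets):
        names = {row["country"] for row in rows}
        if len(names) != 1:
            # The files disagree at this position, so they are not aligned.
            raise ValueError(f"files are not in the same order: {sorted(names)}")
        record = {}
        for row in rows:
            record.update(row)
        merged.append(record)
    return merged


# Illustrative rows only, not the repo's actual data.
capitals = [
    {"country": "France", "city": "Paris"},
    {"country": "Japan", "city": "Tokyo"},
]
continents = [
    {"country": "France", "continent": "Europe"},
    {"country": "Japan", "continent": "Asia"},
]

print(merge_aligned(capitals, continents))
```

If the files had different lengths or orders, the same merge would need a lookup by country name instead of a simple `zip`.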
Is this data currently scraped from Wikipedia? If yes, is it automated? |
Yes, it's mostly scraped from Wikipedia. Automating the process has been the goal for a long time, but I can't find much time; that's why it's not implemented |
I can implement the automation but I still don't understand the Wikipedia data. https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population On this Wikipedia page I don't understand the difference between a numbered country and |
So numbered entries are officially recognised countries; unnumbered ones are disputed, like Taiwan for example. This repo should focus only on recognised countries |
@samayo should this repo include the two observer states? |
Yes I think that would be ok |
Proposed ChangesUpdate
Removed
Added
Changed
Source
Note
|
Great point, thanks for all the help so far. You are making this easy even if I want to implement it. Some notes:
Let's leave the above as is for now because I don't see a reason to change that. I'm OK with removing alphabet letters but not country names; remember that there are many websites and games that need to display just country names for some reason. Other than those remarks, everything else is a great idea |
Btw, I was recently thinking of giving ChatGPT the Wikipedia section that contains the data and asking it to generate the Python code to convert the data from HTML to JSON, then using that script every month to look for updates. The script would be written in Python with Scrapy; I have an unfinished version of it locally. Once ChatGPT creates the script and it works, we upload it to a server and run it every month with a cron job to scrape and send a PR. That's what I thought initially; feel free to build on the idea or provide your own |
I have never used Now I mostly use Instead of vps we can use github action instead. For Can we move this repo to new organization so I can add sdk If I can. |
Where did you find the flag svg @samayo? |
Wherever they are, they 100% need to go through SVGO or a similar compressor. |
I can't find the SVG source to scrape. The image HTML is always unstructured. |
It's from Wikipedia. Check each country's flag page, it will have SVG format |
@samayo https://en.wikipedia.org/wiki/File:Flag_of_the_United_States_(DoS_ECA_Color_Standard).svg Does this still need to go through SVGO? |
I don't understand your question. You will find a .svg file on every Wikipedia page and that must be converted to base64 format. We store in this repo a base64 representation of the svg |
Isn't SVGO for optimizing SVG? |
You still need to right-click on the flag and select "open image in new tab...". Then you will see this URL. That is the SVG; the one you linked is an HTML page |
What is svgo for? |
That one is actually okay, but some flags are massive files. SVGOMG is just a GUI for SVGO. |
Ok so wikipedia -> svgo optimize -> base64? |
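The last step of that pipeline can be sketched as below. This assumes the SVG has already been downloaded and run through SVGO separately (SVGO is a Node.js tool and is not invoked here). The sample SVG is a placeholder, and the data-URI prefix is shown only as one possible storage form; the thread itself only says the repo stores "a base64 representation of the svg".

```python
import base64


def svg_to_base64(svg_text: str) -> str:
    """Encode SVG markup as an ASCII base64 string."""
    return base64.b64encode(svg_text.encode("utf-8")).decode("ascii")


# Placeholder markup standing in for an optimized flag file.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 3 2">'
    '<rect width="3" height="2" fill="#fff"/></svg>'
)

encoded = svg_to_base64(svg)

# Optional: wrap the payload as a data URI so it can be used directly in an <img> tag.
data_uri = "data:image/svg+xml;base64," + encoded
print(data_uri)
```

Base64 output is roughly a third larger than its input, which is why optimizing the SVG first matters for the larger flags.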
@samayo Can I make the scrapper with |
I highly suggest python so I can contribute also but you decide. Where do you plan to host the script? Here or at your own GitHub page? |
@samayo If you want to use |
I can create new repo so I can use typescript instead. If you don't want @samayo. |
I think I will give it a shot, and you can also go ahead and try; we can use one or the other or both. I am happy to get a regular PR from anyone |
Ok i will create a pr later with typescript |
@samayo any update for my previous question? |
all from wikipedia |
Yes, we should definitely add the country; we can use null, none, false or 0 |
Can you add the link or the wikipedia page? I can't find it. |
It seems I was wrong, it is not from wikipedia and the way the data is represented is not entirely optimal. Can you use this instead? https://developers.google.com/public-data/docs/canonical/countries_csv You can use other source. |
That data is different from this https://github.com/samayo/country-json/blob/0c522ea1e7ae88e9a2dd979322fbf8c2814b0de6/src/country-by-geo-coordinates.json The data in google doesn't have |
It's fine, we can use whatever is close enough and if there is a need to improve it then we can do that later |
@samayo I have a problem: the same country has different names on different Wikipedia pages. For example
We can compare the URLs, but I don't know whether they will still differ. But I'll try. Edit 1
Edit 2
|
I don't know about the redirect issue, but if you found the solution then it's good. |
If we do it like that, the automation will break when the Wikipedia page is updated, and there are so many aliases |
That's unlikely. I don't know what you are planning to use, but with Python and pandas you can find the table you are looking for very easily. Take a look at this https://medium.com/analytics-vidhya/web-scraping-a-wikipedia-table-into-a-dataframe-c52617e1f451 from step 5: it is very easy to get all tables on the page and target the table you need. So, as far as I can tell, it's unlikely any changes will break it. |
I can resolve the redirect issue using this https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bredirects api. |
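A rough sketch of how the redirects module might be used to map aliases onto one canonical title. The helper names, the User-Agent string, and the hand-written sample response are assumptions for illustration; title normalization and batch-size limits are not handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_redirect_query(titles):
    """Build an action=query URL that asks the API to resolve redirects."""
    params = {
        "action": "query",
        "titles": "|".join(titles),
        "redirects": 1,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def canonical_titles(titles, response):
    """Map each requested title to its redirect target, or to itself if none."""
    redirects = {
        entry["from"]: entry["to"]
        for entry in response.get("query", {}).get("redirects", [])
    }
    return {title: redirects.get(title, title) for title in titles}


def fetch_canonical_titles(titles):
    """Network call; the User-Agent value is a placeholder, not a real client name."""
    request = Request(
        build_redirect_query(titles),
        headers={"User-Agent": "country-json-example/0.1"},
    )
    with urlopen(request) as resp:
        return canonical_titles(titles, json.load(resp))


# Hand-written sample in the shape of a redirects response, so no network is needed.
sample_response = {"query": {"redirects": [{"from": "USA", "to": "United States"}]}}
print(canonical_titles(["USA", "France"], sample_response))
```

Comparing canonical titles rather than displayed names would let the scraper treat differently spelled links as the same country.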
@samayo On this Wikipedia page https://en.wikipedia.org/wiki/List_of_country_calling_codes some countries have 2 or more codes. Which code should we include? In the old JSON it's just concat the {
"country": "Puerto Rico",
"calling_code": 1939
}, |
@kennarddh |
We can use something like this {
"country": "example",
"data": [1787, 1938]
} |
@kennarddh looks good for me |
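The agreed shape can be produced from scraped `(country, code)` pairs along these lines. The function name and input format are hypothetical; only the output keys `country` and `data` come from the proposal above.

```python
def group_calling_codes(pairs):
    """Collect every code for a country into one {"country", "data"} entry."""
    grouped = {}
    for country, code in pairs:
        grouped.setdefault(country, []).append(code)
    # Dicts preserve insertion order, so countries keep their first-seen order.
    return [{"country": name, "data": codes} for name, codes in grouped.items()]


# Illustrative input: one row per code, as a scraper might emit them.
scraped = [
    ("Puerto Rico", 1787),
    ("Puerto Rico", 1939),
    ("France", 33),
]

print(group_calling_codes(scraped))
```

Using a list even for single-code countries keeps every entry the same type, so consumers never have to check whether `data` is a number or a list.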
@samayo Russia has something like a range of codes? {
"country": "russia",
"data": [71, 72, 73, 74, 75, 78, 79]
} Is this right? |
I think it's better to use 7 (007) as the others seem to be extensions |
Kazakhstan Also uses And also this |
Yes the two first countries are not officially recognised. Let's use 7 for Russia until something changes |
1242 The Bahamas is an independent country but uses the US calling code +1 and extension 242 So we should use the full calling code 242 |
It's not consistent. @samayo how should we parse it if it's not consistent? |
In that case let's use the value as seen on Wikipedia, just the way it is written
This is harder than I thought, but this is the right thing to do imo. Another question is whether we should include 00 or + or just nothing as the prefix. I say include 001, so a JSON entry for the US should be 001 |
So use string instead of number @samayo? And should we pad the x and y with And for russia should we leave it like |
Good question. Yes, we should use strings, but we should not pad x, y with 00 because then it would mean 00100424 |
Shouldn't it be 00x if x is 1 char, 0x if x is 2 chars, and x if it's more than 2 chars? And russia will be
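For reference, the padding rule proposed in the comment above amounts to left-padding with zeros to a width of three. This is only a sketch of that proposal; the thread does not settle on it, and the previous comment argues against padding. The function name is hypothetical.

```python
def pad_calling_code(code: str) -> str:
    """Left-pad with zeros to three characters; longer codes are left unchanged."""
    return code.zfill(3)


for raw in ["1", "44", "242", "1242"]:
    print(raw, "->", pad_calling_code(raw))
```

Storing codes as strings is what makes this possible at all, since leading zeros are lost in a JSON number.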
closing as no update needed |
Hello, this is to discuss a new major change to the repo.
I am trying to remove most countries not recognized by the UN.
Currently, there are 248 countries in this repo, but the UN recognizes only 193 of them, so this will be a big change.
Other than that, I will fill in all data for each country (so, no null or empty values). All data will also be automated (to be updated each week whenever something changes in the source, like Wikipedia).
Let me know if you would like this repo to keep only UN-recognized countries