diff --git a/.github/ISSUE_TEMPLATE/revisao_retroativa.yaml b/.github/ISSUE_TEMPLATE/revisao_retroativa.yaml
index 07ccc6e41..c8469def7 100644
--- a/.github/ISSUE_TEMPLATE/revisao_retroativa.yaml
+++ b/.github/ISSUE_TEMPLATE/revisao_retroativa.yaml
@@ -1,6 +1,6 @@
 name: Revisão retroativa
 description: Dar manutenção em código legado de raspadores no repositório
-title: "[Revisão retroativa]: Raspador de "
+title: "[Revisão retroativa]: "
 labels: ["refactor"]
 body:
   - type: dropdown
@@ -10,7 +10,7 @@ body:
       description: Selecione a opção abaixo
       multiple: false
       options:
-        - Neste repositório, há muitos códigos de raspadores que foram desenvolvidos no começo do projeto e não estão sendo usados. Para passar a usar o raspador deste município, é necessário testar para verificar se segue funcionando e revisá-lo caso não esteja.
+        - Neste repositório, há muitos códigos de raspadores que foram desenvolvidos no começo do projeto e não estão sendo usados. Para passar a usar o raspador deste município, é necessário testar para verificar se segue funcionando e revisá-lo caso não esteja. Consulte a documentação para te ajudar.
     validations:
       required: true
   - type: input
@@ -39,10 +39,3 @@ body:
       placeholder: ex. mês/ano até atualmente; de 2016 à 2020
     validations:
       required: true
-  - type: textarea
-    id: test-list
-    attributes:
-      label: Lista de testes
-      description: "Utilize a lista a seguir de referência para teste. O raspador precisa atender todos os itens para estar pronto para ser usado. \n 1. [ ] Você executou uma extração completa do spider localmente e os dados retornados estavam corretos.\n 2. [ ] Você executou uma extração por período (start_date e end_date definidos) ao menos uma vez e os dados retornados estavam corretos. \n 3. [ ] Você verificou que não existe nenhum erro nos logs (log/ERROR igual a zero).\n 4. [ ] Você definiu o atributo de classe start_date no seu spider com a data do Diário Oficial mais antigo disponível na página da cidade.\n 5. [ ] Você garantiu que todos os campos que poderiam ser extraídos foram extraídos de acordo com a documentação. \n \n Por favor, inclua qualquer informação relevante para o desenvolvimento."
-    validations:
-      required: false
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index 8f3f6c88c..f810e0b59 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -1,12 +1,27 @@
-**AO ABRIR** um Pull Request de um novo raspador (spider), marque com um `X` cada um dos items do checklist
-abaixo. **NÃO ABRA** um novo Pull Request antes de completar todos os items abaixo.
-
-#### Checklist - Novo spider
-- [ ] Você executou uma extração completa do spider localmente e os dados retornados estavam corretos.
-- [ ] Você executou uma extração por período (`start_date` e `end_date` definidos) ao menos uma vez e os dados retornados estavam corretos.
-- [ ] Você verificou que não existe nenhum erro nos logs (`log_count/ERROR` igual a zero).
-- [ ] Você definiu o atributo de classe `start_date` no seu spider com a data do Diário Oficial mais antigo disponível na página da cidade.
-- [ ] Você garantiu que todos os campos que poderiam ser extraídos foram extraídos [de acordo com a documentação](https://docs.queridodiario.ok.org.br/pt/latest/escrevendo-um-novo-spider.html#definicao-de-campos).
+**AO ABRIR** uma *Pull Request* de um novo raspador (*spider*), marque com um `X` cada um dos itens da checklist abaixo. Caso algum item não seja marcado, JUSTIFIQUE o motivo.
+
+#### Layout do site publicador de diários oficiais
+Marque apenas um dos itens a seguir:
+- [ ] O *layout* não se parece com nenhum caso [da lista de *layouts* padrão](https://docs.queridodiario.ok.org.br/pt-br/latest/contribuindo/lista-sistemas-replicaveis.html)
+- [ ] É um *layout* padrão e esta PR adiciona a spider base do padrão ao projeto junto com alguns municípios que fazem parte do padrão.
+- [ ] É um *layout* padrão e todos os municípios adicionados usam a [classe de spider base](https://github.com/okfn-brasil/querido-diario/tree/main/data_collection/gazette/spiders/base) adequada para o padrão.
+
+#### Código da(s) spider(s)
+- [ ] O(s) raspador(es) adicionado(s) têm os [atributos de classe exigidos](https://docs.queridodiario.ok.org.br/pt-br/latest/contribuindo/raspadores.html#UFMunicipioSpider).
+- [ ] O(s) raspador(es) adicionado(s) cria(m) objetos do tipo Gazette coletando todos [os metadados necessários](https://docs.queridodiario.ok.org.br/pt-br/latest/contribuindo/raspadores.html#Gazette).
+- [ ] O atributo de classe [start_date](https://docs.queridodiario.ok.org.br/pt-br/latest/contribuindo/raspadores.html#UFMunicipioSpider.start_date) foi preenchido com a data da edição de diário oficial mais antiga disponível no site.
+- [ ] Explicitar o atributo de classe [end_date](https://docs.queridodiario.ok.org.br/pt-br/latest/contribuindo/raspadores.html#UFMunicipioSpider.end_date) não se fez necessário.
+- [ ] Não utilizo `custom_settings` em meu raspador.
+
+#### Testes
+- [ ] Uma coleta-teste **da última edição** foi feita. O arquivo de `.log` deste teste está anexado na PR.
+- [ ] Uma coleta-teste **por intervalo arbitrário** foi feita. Os arquivos de `.log` e `.csv` deste teste estão anexados na PR.
+- [ ] Uma coleta-teste **completa** foi feita. Os arquivos de `.log` e `.csv` deste teste estão anexados na PR.
+
+#### Verificações
+- [ ] Eu experimentei abrir alguns arquivos de diários oficiais coletados pelo meu raspador e os verifiquei [conforme a documentação](https://docs.queridodiario.ok.org.br/pt-br/latest/contribuindo/raspadores.html#diarios-oficiais-coletados), não encontrando problemas.
+- [ ] Eu verifiquei os arquivos `.csv` gerados pela minha coleta [conforme a documentação](https://docs.queridodiario.ok.org.br/pt-br/latest/contribuindo/raspadores.html#arquivos-auxiliares), não encontrando problemas.
+- [ ] Eu verifiquei os arquivos de `.log` gerados pela minha coleta [conforme a documentação](https://docs.queridodiario.ok.org.br/pt-br/latest/contribuindo/raspadores.html#arquivos-auxiliares), não encontrando problemas.
 
 #### Descrição
 
diff --git a/.github/workflows/periodic_crawl.yaml b/.github/workflows/daily_crawl.yaml
similarity index 79%
rename from .github/workflows/periodic_crawl.yaml
rename to .github/workflows/daily_crawl.yaml
index f541726bc..6b1bcf8e1 100644
--- a/.github/workflows/periodic_crawl.yaml
+++ b/.github/workflows/daily_crawl.yaml
@@ -1,10 +1,10 @@
-name: Daily execution of Spiders
+name: Daily Crawl of Enabled Spiders
 
 on:
   schedule:
-    # Execute twice a day at 8AM/6PM (BRT)
-    - cron: "0 11 * * *"
+    # Execute once a day at 6PM (BRT)
     - cron: "0 21 * * *"
+  workflow_dispatch:
 
 jobs:
   schedule-jobs:
@@ -29,6 +29,8 @@
       - name: Prepare environment
         run: |
           python -m pip install --upgrade pip
-          pip install click python-decouple scrapinghub
+          pip install click python-decouple scrapinghub SQLAlchemy psycopg2
       - name: Schedule jobs
-        run: python scripts/scheduler.py schedule-enabled-spiders
+        run: |
+          cd data_collection/
+          python scheduler.py schedule-enabled-spiders
diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
index c11b946bd..bf52b77fb 100644
--- a/.github/workflows/deploy.yml
+++ b/.github/workflows/deploy.yml
@@ -3,6 +3,7 @@ name: Deploy to Scrapy Cloud
 on:
   push:
     branches: [main]
+  workflow_dispatch:
 
 jobs:
   deploy_to_scrapy_cloud:
@@ -21,4 +22,4 @@ jobs:
         SHUB_APIKEY: ${{ secrets.SHUB_APIKEY }}
       run: |
         cd data_collection/
-        shub deploy
+        shub deploy ${{ secrets.SCRAPY_CLOUD_PROJECT_ID }}
diff --git a/.github/workflows/monthly_crawl.yaml b/.github/workflows/monthly_crawl.yaml
index 11e9849e4..23b2c2c0f 100644
--- a/.github/workflows/monthly_crawl.yaml
+++ b/.github/workflows/monthly_crawl.yaml
@@ -1,9 +1,10 @@
-name: Weekly execution of Spiders
+name: Monthly Crawl of Enabled Spiders
 
 on:
   schedule:
     # Execute once a month at 8PM (BRT)
     - cron: "0 23 1 * *"
+  workflow_dispatch:
 
 jobs:
   schedule-jobs:
@@ -28,6 +29,8 @@
       - name: Prepare environment
         run: |
           python -m pip install --upgrade pip
-          pip install click python-decouple scrapinghub
+          pip install click python-decouple scrapinghub SQLAlchemy psycopg2
       - name: Schedule jobs
-        run: python scripts/scheduler.py last-month-schedule-enabled-spiders
+        run: |
+          cd data_collection/
+          python scheduler.py last-month-schedule-enabled-spiders
diff --git a/.github/workflows/schedule_spider.yaml b/.github/workflows/schedule_spider.yaml
index 0f1d76a04..b57cf5477 100644
--- a/.github/workflows/schedule_spider.yaml
+++ b/.github/workflows/schedule_spider.yaml
@@ -32,14 +32,18 @@ jobs:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
-          python-version: "3.10"
+          python-version: '3.10'
      - name: Prepare environment
        run: |
          python -m pip install --upgrade pip
-          pip install click python-decouple scrapinghub
+          pip install click python-decouple scrapinghub SQLAlchemy psycopg2
      - name: Schedule full crawl
        if: ${{ !github.event.inputs.start_date }}
-        run: python scripts/scheduler.py schedule-spider --spider_name=${{ github.event.inputs.spider_name }}
+        run: |
+          cd data_collection/
+          python scheduler.py schedule-spider --spider_name=${{ github.event.inputs.spider_name }}
      - name: Schedule partial crawl
        if: ${{ github.event.inputs.start_date }}
-        run: python scripts/scheduler.py schedule-spider --spider_name=${{ github.event.inputs.spider_name }} --start_date=${{ github.event.inputs.start_date }}
+        run: |
+          cd data_collection/
+          python scheduler.py schedule-spider --spider_name=${{ github.event.inputs.spider_name }} --start_date=${{ github.event.inputs.start_date }} --end_date=${{ github.event.inputs.end_date }}
diff --git a/.github/workflows/schedule_spider_by_date.yaml b/.github/workflows/schedule_spider_by_date.yaml
index eb85f0d08..22383a6d8 100644
--- a/.github/workflows/schedule_spider_by_date.yaml
+++ b/.github/workflows/schedule_spider_by_date.yaml
@@ -30,7 +30,9 @@ jobs:
      - name: Prepare environment
        run: |
          python -m pip install --upgrade pip
-          pip install click python-decouple scrapinghub
+          pip install click python-decouple scrapinghub SQLAlchemy psycopg2
      - name: Schedule partial crawl
        if: ${{ github.event.inputs.start_date }}
-        run: python scripts/scheduler.py schedule-all-spiders-by-date --start_date ${{ github.event.inputs.start_date }}
+        run: |
+          cd data_collection/
+          python scheduler.py schedule-all-spiders-by-date --start_date ${{ github.event.inputs.start_date }}
diff --git a/.github/workflows/schedule_spider_date_range.yaml b/.github/workflows/schedule_spider_date_range.yaml
deleted file mode 100644
index 4a1c8a0b4..000000000
--- a/.github/workflows/schedule_spider_date_range.yaml
+++ /dev/null
@@ -1,40 +0,0 @@
-name: Schedule Spider Crawl - Date Range
-
-on:
-  workflow_dispatch:
-    inputs:
-      spider_name:
-        description: 'Spider to be scheduled'
-        required: true
-      start_date:
-        description: 'Start date (YYYY-MM-DD)'
-        required: true
-      end_date:
-        description: 'End date (YYYY-MM-DD)'
-        required: true
-
-jobs:
-  schedule:
-    runs-on: ubuntu-latest
-    env:
-      SHUB_APIKEY: ${{ secrets.SHUB_APIKEY }}
-      SCRAPY_CLOUD_PROJECT_ID: ${{ secrets.SCRAPY_CLOUD_PROJECT_ID }}
-      FILES_STORE: ${{ secrets.FILES_STORE }}
-      QUERIDODIARIO_DATABASE_URL: ${{ secrets.QUERIDODIARIO_DATABASE_URL }}
-      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
-      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
-      AWS_ENDPOINT_URL: ${{ secrets.AWS_ENDPOINT_URL }}
-      AWS_REGION_NAME: ${{ secrets.AWS_REGION_NAME }}
-      SPIDERMON_DISCORD_FAKE: ${{ secrets.SPIDERMON_DISCORD_FAKE }}
-      SPIDERMON_DISCORD_WEBHOOK_URL: ${{ secrets.SPIDERMON_DISCORD_WEBHOOK_URL }}
-      ZYTE_SMARTPROXY_APIKEY: ${{ secrets.ZYTE_SMARTPROXY_APIKEY }}
-    steps:
-      - uses: actions/checkout@v2
-      - uses: actions/setup-python@v2
-        with:
-          python-version: "3.10"
-      - name: Prepare environment
-        run: |
-          python -m pip install --upgrade pip
-          pip install click python-decouple scrapinghub
-          python scripts/scheduler.py schedule-spider --spider_name=${{ github.event.inputs.spider_name }} --start_date=${{ github.event.inputs.start_date }} --end_date=${{ github.event.inputs.end_date }}
diff --git a/.github/workflows/update_spider_status.yaml b/.github/workflows/update_spider_status.yaml
new file mode 100644
index 000000000..362c78efa
--- /dev/null
+++ b/.github/workflows/update_spider_status.yaml
@@ -0,0 +1,40 @@
+name: Update spider status on production
+
+on:
+  workflow_dispatch:
+    inputs:
+      spider_name:
+        description: 'Spider name'
+        required: true
+      status:
+        type: choice
+        description: 'New Spider status in production'
+        options:
+          - enabled
+          - disabled
+        required: true
+
+jobs:
+  update_status:
+    runs-on: ubuntu-latest
+    env:
+      QUERIDODIARIO_DATABASE_URL: ${{ secrets.QUERIDODIARIO_DATABASE_URL }}
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-python@v2
+        with:
+          python-version: '3.10'
+      - name: Prepare environment
+        run: |
+          python -m pip install --upgrade pip
+          pip install click python-decouple scrapinghub SQLAlchemy psycopg2
+      - name: Enable spider in production
+        if: ${{ github.event.inputs.status == 'enabled' }}
+        run: |
+          cd data_collection/
+          python scheduler.py enable-spider --spider_name=${{ github.event.inputs.spider_name }}
+      - name: Disable spider in production
+        if: ${{ github.event.inputs.status == 'disabled' }}
+        run: |
+          cd data_collection/
+          python scheduler.py disable-spider --spider_name=${{ github.event.inputs.spider_name }}
\ No newline at end of file
diff --git a/data_collection/.local.env b/data_collection/.local.env
new file mode 100644
index 000000000..176ecc305
--- /dev/null
+++ b/data_collection/.local.env
@@ -0,0 +1,6 @@
+AWS_ACCESS_KEY_ID=minio-access-key
+AWS_SECRET_ACCESS_KEY=minio-secret-key
+AWS_ENDPOINT_URL=http://localhost:9000
+AWS_REGION_NAME=nyc3
+FILES_STORE=s3://queridodiariobucket/
+QUERIDODIARIO_DATABASE_URL=postgresql://queridodiario:queridodiario@localhost:5432/queridodiariodb
diff --git a/data_collection/gazette/commands/__init__.py b/data_collection/gazette/commands/__init__.py
new file mode 100644
index 000000000..e69de29bb
diff --git a/data_collection/gazette/commands/qd-list-enabled.py b/data_collection/gazette/commands/qd-list-enabled.py
new file mode 100644
index 000000000..72e37f83e
--- /dev/null
+++ b/data_collection/gazette/commands/qd-list-enabled.py
@@ -0,0 +1,53 @@
+import datetime
+
+from scrapy.commands import ScrapyCommand
+from scrapy.exceptions import UsageError
+
+from gazette.utils import get_enabled_spiders
+
+
+class Command(ScrapyCommand):
+    requires_project = True
+
+    def add_options(self, parser):
+        ScrapyCommand.add_options(self, parser)
+        parser.add_argument(
+            "--start_date",
+            dest="start_date",
+            default=None,
+            metavar="VALUE",
+            help="List spiders enabled from date (format: YYYY-MM-DD)",
+        )
+        parser.add_argument(
+            "--end_date",
+            dest="end_date",
+            default=None,
+            metavar="VALUE",
+            help="List spiders enabled until date (format: YYYY-MM-DD)",
+        )
+
+    def short_desc(self):
+        return "List production enabled spiders"
+
+    def run(self, args, opts):
+        start_date, end_date = None, None
+
+        if opts.start_date is not None:
+            try:
+                start_date = datetime.datetime.strptime(opts.start_date, "%Y-%m-%d")
+            except ValueError:
+                raise UsageError("'start_date' must match YYYY-MM-DD format")
+
+        if opts.end_date is not None:
+            try:
+                end_date = datetime.datetime.strptime(opts.end_date, "%Y-%m-%d")
+            except ValueError:
+                raise UsageError("'end_date' must match YYYY-MM-DD format")
+
+        print("\nEnabled spiders\n===============")
+        for spider_name in get_enabled_spiders(
+            database_url=self.settings["QUERIDODIARIO_DATABASE_URL"],
+            start_date=start_date,
+            end_date=end_date,
+        ):
+            print(spider_name)
diff --git a/data_collection/gazette/database/models.py b/data_collection/gazette/database/models.py
index 8503076c3..7bd9806a7 100644
--- a/data_collection/gazette/database/models.py
+++ b/data_collection/gazette/database/models.py
@@ -49,46 +49,51 @@ def load_territories(engine):
     logger.info("Populating 'territories' table - Done!")
 
 
+def get_new_spiders(session, territory_spider_map):
+    registered_spiders = session.query(QueridoDiarioSpider).all()
+    registered_spiders_set = {
+        (spider.spider_name, territory.id, spider.date_from)
+        for spider in registered_spiders
+        for territory in spider.territories
+    }
+    only_new_spiders = [
+        spider_info
+        for spider_info in territory_spider_map
+        if spider_info not in registered_spiders_set
+    ]
+    return only_new_spiders
+
+
 def load_spiders(engine, territory_spider_map):
     Session = sessionmaker(bind=engine)
     session = Session()
 
-    if session.query(QueridoDiarioSpider).count() > 0:
-        return
+    table_is_populated = session.query(QueridoDiarioSpider).count() > 0
+    new_spiders = (
+        get_new_spiders(session, territory_spider_map)
+        if table_is_populated
+        else territory_spider_map
+    )
 
     logger.info("Populating 'querido_diario_spider' table - Please wait!")
 
-    spiders = []
-    territory_ids = set()
-    for info in territory_spider_map:
-        spider_name, territory_id, date_from = info
-        spiders.append(
-            QueridoDiarioSpider(spider_name=spider_name, date_from=date_from)
-        )
-        territory_ids.add(territory_id)
-
-    session.add_all(spiders)
-    session.commit()
-
-    spiders = (
-        session.query(QueridoDiarioSpider)
-        .filter(
-            QueridoDiarioSpider.spider_name.in_(set(s[0] for s in territory_spider_map))
-        )
-        .all()
-    )
-    spider_map = {spider.spider_name: spider for spider in spiders}
-
-    territories = session.query(Territory).filter(Territory.id.in_(territory_ids)).all()
+    territories = session.query(Territory).all()
     territory_map = {t.id: t for t in territories}
 
-    for info in territory_spider_map:
-        spider_name, territory_id, _ = info
-        spider = spider_map.get(spider_name)
+    spiders = []
+    for info in new_spiders:
+        spider_name, territory_id, date_from = info
         territory = territory_map.get(territory_id)
-        if spider is not None and territory is not None:
-            spider.territories.append(territory)
+        if territory is not None:
+            spiders.append(
+                QueridoDiarioSpider(
+                    spider_name=spider_name,
+                    date_from=date_from,
+                    territories=[territory],
+                )
+            )
 
+    session.add_all(spiders)
     session.commit()
 
     logger.info("Populating 'querido_diario_spider' table - Done!")
diff --git a/data_collection/gazette/monitors.py b/data_collection/gazette/monitors.py
index c61ddd919..388b510bb 100644
--- a/data_collection/gazette/monitors.py
+++ b/data_collection/gazette/monitors.py
@@ -27,7 +27,7 @@ def test_requests_items_ratio(self):
         ratio = n_requests_count / n_scraped_items
         percent = round(ratio * 100, 2)
         allowed_percent = round(max_ratio * 100, 2)
-        self.assertLess(
+        self.assertLessEqual(
             ratio,
             max_ratio,
             msg=f"""{percent}% is greater than the allowed {allowed_percent}%
diff --git a/data_collection/gazette/resources/territories.csv b/data_collection/gazette/resources/territories.csv
index 327c9f7e5..e909e1554 100644
--- a/data_collection/gazette/resources/territories.csv
+++ b/data_collection/gazette/resources/territories.csv
@@ -27,6 +27,7 @@ id,name,state_code,state
 5100102,Acorizal,MT,Mato Grosso
 1200013,Acrelândia,AC,Acre
 5200134,Acreúna,GO,Goiás
+2400208,Açu,RN,Rio Grande do Norte
 3100500,Açucena,MG,Minas Gerais
 3500105,Adamantina,SP,São Paulo
 5200159,Adelândia,GO,Goiás
@@ -47,8 +48,8 @@ id,name,state_code,state
 3100609,Água Boa,MG,Minas Gerais
 5100201,Água Boa,MT,Mato Grosso
 2700102,Água Branca,AL,Alagoas
-2200202,Água Branca,PI,Piauí
 2500106,Água Branca,PB,Paraíba
+2200202,Água Branca,PI,Piauí
 5000203,Água Clara,MS,Mato Grosso do Sul
 3100708,Água Comprida,MG,Minas Gerais
 4200408,Água Doce,SC,Santa Catarina
@@ -119,6 +120,7 @@ id,name,state_code,state
 1700350,Aliança do Tocantins,TO,Tocantins
 2900900,Almadina,BA,Bahia
 1700400,Almas,TO,Tocantins
+1500503,Almeirim,PA,Pará
 3101706,Almenara,MG,Minas Gerais
 2400604,Almino Afonso,RN,Rio Grande do Norte
 4100400,Almirante Tamandaré,PR,Paraná
@@ -137,8 +139,8 @@ id,name,state_code,state
 3102001,Alterosa,MG,Minas Gerais
 2600807,Altinho,PE,Pernambuco
 3501004,Altinópolis,SP,São Paulo
-4300554,Alto Alegre,RS,Rio Grande do Sul
 1400050,Alto Alegre,RR,Roraima
+4300554,Alto Alegre,RS,Rio Grande do Sul
 3501103,Alto Alegre,SP,São Paulo
 2100436,Alto Alegre do Maranhão,MA,Maranhão
 2100477,Alto Alegre do Pindaré,MA,Maranhão
@@ -154,6 +156,7 @@ id,name,state_code,state
 3153509,Alto Jequitibá,MG,Minas Gerais
 2200301,Alto Longá,PI,Piauí
 5100508,Alto Paraguai,MT,Mato Grosso
+4128625,Alto Paraíso,PR,Paraná
 1100403,Alto Paraíso,RO,Rondônia
 5200605,Alto Paraíso de Goiás,GO,Goiás
 4100608,Alto Paraná,PR,Paraná
@@ -255,8 +258,8 @@ id,name,state_code,state
 2300804,Antonina do Norte,CE,Ceará
 2200806,Antônio Almeida,PI,Piauí
 2901700,Antônio Cardoso,BA,Bahia
-4201208,Antônio Carlos,SC,Santa Catarina
 3102902,Antônio Carlos,MG,Minas Gerais
+4201208,Antônio Carlos,SC,Santa Catarina
 3103009,Antônio Dias,MG,Minas Gerais
 2901809,Antônio Gonçalves,BA,Bahia
 5000906,Antônio João,MS,Mato Grosso do Sul
@@ -264,8 +267,8 @@ id,name,state_code,state
 4101309,Antônio Olinto,PR,Paraná
 4300802,Antônio Prado,RS,Rio Grande do Sul
 3103108,Antônio Prado de Minas,MG,Minas Gerais
-3502507,Aparecida,SP,São Paulo
 2500775,Aparecida,PB,Paraíba
+3502507,Aparecida,SP,São Paulo
 3502606,Aparecida d'Oeste,SP,São Paulo
 5201405,Aparecida de Goiânia,GO,Goiás
 5201454,Aparecida do Rio Doce,GO,Goiás
@@ -298,8 +301,8 @@ id,name,state_code,state
 3502804,Araçatuba,SP,São Paulo
 2902104,Araci,BA,Bahia
 3103306,Aracitaba,MG,Minas Gerais
-2601052,Araçoiaba,PE,Pernambuco
 2301208,Aracoiaba,CE,Ceará
+2601052,Araçoiaba,PE,Pernambuco
 3502903,Araçoiaba da Serra,SP,São Paulo
 3200607,Aracruz,ES,Espírito Santo
 5201603,Araçu,GO,Goiás
@@ -332,8 +335,8 @@ id,name,state_code,state
 4101507,Arapongas,PR,Paraná
 3103751,Araporã,MG,Minas Gerais
 4101606,Arapoti,PR,Paraná
-4101655,Arapuã,PR,Paraná
 3103801,Arapuá,MG,Minas Gerais
+4101655,Arapuã,PR,Paraná
 5101258,Araputanga,MT,Mato Grosso
 4201307,Araquari,SC,Santa Catarina
 2500908,Arara,PB,Paraíba
@@ -385,6 +388,7 @@ id,name,state_code,state
 2301505,Arneiroz,CE,Ceará
 2200905,Aroazes,PI,Piauí
 2501302,Aroeiras,PB,Paraíba
+2200954,Aroeiras do Itaim,PI,Piauí
 2201002,Arraial,PI,Piauí
 3300258,Arraial do Cabo,RJ,Rio de Janeiro
 1702406,Arraias,TO,Tocantins
@@ -419,8 +423,10 @@ id,name,state_code,state
 3504107,Atibaia,SP,São Paulo
 3200706,Atilio Vivacqua,ES,Espírito Santo
 1702554,Augustinópolis,TO,Tocantins
+1500909,Augusto Corrêa,PA,Pará
 3104809,Augusto de Lima,MG,Minas Gerais
 4301503,Augusto Pestana,RS,Rio Grande do Sul
+2401305,Augusto Severo,RN,Rio Grande do Norte
 4301552,Áurea,RS,Rio Grande do Sul
 2902401,Aurelino Leal,BA,Bahia
 3504206,Auriflama,SP,São Paulo
@@ -462,7 +468,9 @@ id,name,state_code,state
 4202057,Balneário Barra do Sul,SC,Santa Catarina
 4202008,Balneário Camboriú,SC,Santa Catarina
 4202073,Balneário Gaivota,SC,Santa Catarina
+4212809,Balneário Piçarras,SC,Santa Catarina
 4301636,Balneário Pinhal,RS,Rio Grande do Sul
+4220000,Balneário Rincão,SC,Santa Catarina
 4102307,Balsa Nova,PR,Paraná
 3504800,Bálsamo,SP,São Paulo
 2101400,Balsas,MA,Maranhão
@@ -486,8 +494,8 @@ id,name,state_code,state
 5101605,Barão de Melgaço,MT,Mato Grosso
 3105509,Barão de Monte Alto,MG,Minas Gerais
 4301750,Barão do Triunfo,RS,Rio Grande do Sul
-2401453,Baraúna,RN,Rio Grande do Norte
 2501534,Baraúna,PB,Paraíba
+2401453,Baraúna,RN,Rio Grande do Norte
 3105608,Barbacena,MG,Minas Gerais
 2301901,Barbalha,CE,Ceará
 3505104,Barbosa,SP,São Paulo
@@ -497,8 +505,8 @@ id,name,state_code,state
 1300409,Barcelos,AM,Amazonas
 3505203,Bariri,SP,São Paulo
 2902708,Barra,BA,Bahia
-3505302,Barra Bonita,SP,São Paulo
 4202099,Barra Bonita,SC,Santa Catarina
+3505302,Barra Bonita,SP,São Paulo
 2201176,Barra D'Alcântara,PI,Piauí
 2902807,Barra da Estiva,BA,Bahia
 2601300,Barra de Guabiraba,PE,Pernambuco
@@ -528,8 +536,8 @@ id,name,state_code,state
 3105707,Barra Longa,MG,Minas Gerais
 3300407,Barra Mansa,RJ,Rio de Janeiro
 4202107,Barra Velha,SC,Santa Catarina
-4301800,Barracão,RS,Rio Grande do Sul
 4102604,Barracão,PR,Paraná
+4301800,Barracão,RS,Rio Grande do Sul
 2201200,Barras,PI,Piauí
 2301950,Barreira,CE,Ceará
 2903201,Barreiras,BA,Bahia
@@ -540,8 +548,8 @@ id,name,state_code,state
 3505500,Barretos,SP,São Paulo
 3505609,Barrinha,SP,São Paulo
 2302008,Barro,CE,Ceará
-5203203,Barro Alto,GO,Goiás
 2903235,Barro Alto,BA,Bahia
+5203203,Barro Alto,GO,Goiás
 2201408,Barro Duro,PI,Piauí
 2903300,Barro Preto,BA,Bahia
 2903276,Barrocas,BA,Bahia
@@ -575,13 +583,13 @@ id,name,state_code,state
 1501402,Belém,PA,Pará
 2501906,Belém,PB,Paraíba
 2601508,Belém de Maria,PE,Pernambuco
-2601607,Belém do São Francisco,PE,Pernambuco
 2502003,Belém do Brejo do Cruz,PB,Paraíba
 2201572,Belém do Piauí,PI,Piauí
+2601607,Belém do São Francisco,PE,Pernambuco
 3300456,Belford Roxo,RJ,Rio de Janeiro
 3106101,Belmiro Braga,MG,Minas Gerais
-4202156,Belmonte,SC,Santa Catarina
 2903409,Belmonte,BA,Bahia
+4202156,Belmonte,SC,Santa Catarina
 2903508,Belo Campo,BA,Bahia
 3106200,Belo Horizonte,MG,Minas Gerais
 2601706,Belo Jardim,PE,Pernambuco
@@ -633,8 +641,8 @@ id,name,state_code,state
 2502102,Boa Ventura,PB,Paraíba
 4103040,Boa Ventura de São Roque,PR,Paraná
 2302404,Boa Viagem,CE,Ceará
-1400100,Boa Vista,RR,Roraima
 2502151,Boa Vista,PB,Paraíba
+1400100,Boa Vista,RR,Roraima
 4103057,Boa Vista da Aparecida,PR,Paraná
 4302154,Boa Vista das Missões,RS,Rio Grande do Sul
 4302204,Boa Vista do Buricá,RS,Rio Grande do Sul
@@ -646,8 +654,8 @@ id,name,state_code,state
 2903805,Boa Vista do Tupim,BA,Bahia
 2701001,Boca da Mata,AL,Alagoas
 1300706,Boca do Acre,AM,Amazonas
-3506805,Bocaina,SP,São Paulo
 2201804,Bocaina,PI,Piauí
+3506805,Bocaina,SP,São Paulo
 3107208,Bocaina de Minas,MG,Minas Gerais
 4202438,Bocaina do Sul,SC,Santa Catarina
 3107307,Bocaiúva,MG,Minas Gerais
@@ -659,17 +667,17 @@ id,name,state_code,state
 3507001,Boituva,SP,São Paulo
 2602100,Bom Conselho,PE,Pernambuco
 3107406,Bom Despacho,MG,Minas Gerais
-3300506,Bom Jardim,RJ,Rio de Janeiro
 2102002,Bom Jardim,MA,Maranhão
 2602209,Bom Jardim,PE,Pernambuco
+3300506,Bom Jardim,RJ,Rio de Janeiro
 4202503,Bom Jardim da Serra,SC,Santa Catarina
 5203401,Bom Jardim de Goiás,GO,Goiás
 3107505,Bom Jardim de Minas,MG,Minas Gerais
+2502201,Bom Jesus,PB,Paraíba
+2201903,Bom Jesus,PI,Piauí
 2401701,Bom Jesus,RN,Rio Grande do Norte
 4302303,Bom Jesus,RS,Rio Grande do Sul
-2201903,Bom Jesus,PI,Piauí
 4202537,Bom Jesus,SC,Santa Catarina
-2502201,Bom Jesus,PB,Paraíba
 2903904,Bom Jesus da Lapa,BA,Bahia
 3107604,Bom Jesus da Penha,MG,Minas Gerais
 2903953,Bom Jesus da Serra,BA,Bahia
@@ -682,8 +690,8 @@ id,name,state_code,state
 3201100,Bom Jesus do Norte,ES,Espírito Santo
 4202578,Bom Jesus do Oeste,SC,Santa Catarina
 4103156,Bom Jesus do Sul,PR,Paraná
-1703305,Bom Jesus do Tocantins,TO,Tocantins
 1501576,Bom Jesus do Tocantins,PA,Pará
+1703305,Bom Jesus do Tocantins,TO,Tocantins
 3507100,Bom Jesus dos Perdões,SP,São Paulo
 2102077,Bom Lugar,MA,Maranhão
 4302352,Bom Princípio,RS,Rio Grande do Sul
@@ -692,9 +700,9 @@ id,name,state_code,state
 3107901,Bom Repouso,MG,Minas Gerais
 4202602,Bom Retiro,SC,Santa Catarina
 4302402,Bom Retiro do Sul,RS,Rio Grande do Sul
+3108008,Bom Sucesso,MG,Minas Gerais
 2502300,Bom Sucesso,PB,Paraíba
 4103206,Bom Sucesso,PR,Paraná
-3108008,Bom Sucesso,MG,Minas Gerais
 3507159,Bom Sucesso de Itararé,SP,São Paulo
 4103222,Bom Sucesso do Sul,PR,Paraná
 4202453,Bombinhas,SC,Santa Catarina
@@ -705,9 +713,9 @@ id,name,state_code,state
 3108206,Bonfinópolis de Minas,MG,Minas Gerais
 2904001,Boninal,BA,Bahia
 2904050,Bonito,BA,Bahia
+5002209,Bonito,MS,Mato Grosso do Sul
 1501600,Bonito,PA,Pará
 2602308,Bonito,PE,Pernambuco
-5002209,Bonito,MS,Mato Grosso do Sul
 3108255,Bonito de Minas,MG,Minas Gerais
 2502409,Bonito de Santa Fé,PB,Paraíba
 5203575,Bonópolis,GO,Goiás
@@ -752,10 +760,11 @@ id,name,state_code,state
 3507704,Braúna,SP,São Paulo
 3108800,Braúnas,MG,Minas Gerais
 5203609,Brazabrantes,GO,Goiás
+3108909,Brazópolis,MG,Minas Gerais
 2602407,Brejão,PE,Pernambuco
 3201159,Brejetuba,ES,Espírito Santo
-2401800,Brejinho,RN,Rio Grande do Norte
 2602506,Brejinho,PE,Pernambuco
+2401800,Brejinho,RN,Rio Grande do Norte
 1703701,Brejinho de Nazaré,TO,Tocantins
 2102101,Brejo,MA,Maranhão
 3507753,Brejo Alegre,SP,São Paulo
@@ -801,8 +810,8 @@ id,name,state_code,state
 5203962,Buritinópolis,GO,Goiás
 2904753,Buritirama,BA,Bahia
 2102358,Buritirana,MA,Maranhão
-1100452,Buritis,RO,Rondônia
 3109303,Buritis,MG,Minas Gerais
+1100452,Buritis,RO,Rondônia
 3508207,Buritizal,SP,São Paulo
 3109402,Buritizeiro,MG,Minas Gerais
 4302709,Butiá,RS,Rio Grande do Sul
@@ -845,8 +854,8 @@ id,name,state_code,state
 3508603,Cachoeira Paulista,SP,São Paulo
 3300803,Cachoeiras de Macacu,RJ,Rio de Janeiro
 2603108,Cachoeirinha,PE,Pernambuco
-1703826,Cachoeirinha,TO,Tocantins
 4303103,Cachoeirinha,RS,Rio Grande do Sul
+1703826,Cachoeirinha,TO,Tocantins
 3201209,Cachoeiro de Itapemirim,ES,Espírito Santo
 2503407,Cacimba de Areia,PB,Paraíba
 2503506,Cacimba de Dentro,PB,Paraíba
@@ -865,16 +874,16 @@ id,name,state_code,state
 2905206,Caetité,BA,Bahia
 2905305,Cafarnaum,BA,Bahia
 4103404,Cafeara,PR,Paraná
-3508801,Cafelândia,SP,São Paulo
 4103453,Cafelândia,PR,Paraná
+3508801,Cafelândia,SP,São Paulo
 4103479,Cafezal do Sul,PR,Paraná
 3508900,Caiabu,SP,São Paulo
 3110103,Caiana,MG,Minas Gerais
 5204409,Caiapônia,GO,Goiás
 4303301,Caibaté,RS,Rio Grande do Sul
 4203105,Caibi,SC,Santa Catarina
-4303400,Caiçara,RS,Rio Grande do Sul
 2503605,Caiçara,PB,Paraíba
+4303400,Caiçara,RS,Rio Grande do Sul
 2401859,Caiçara do Norte,RN,Rio Grande do Norte
 2401909,Caiçara do Rio do Vento,RN,Rio Grande do Norte
 2402006,Caicó,RN,Rio Grande do Norte
@@ -927,8 +936,8 @@ id,name,state_code,state
 2603504,Camocim de São Félix,PE,Pernambuco
 3110806,Campanário,MG,Minas Gerais
 3110905,Campanha,MG,Minas Gerais
-3111002,Campestre,MG,Minas Gerais
 2701357,Campestre,AL,Alagoas
+3111002,Campestre,MG,Minas Gerais
 4303673,Campestre da Serra,RS,Rio Grande do Sul
 5204607,Campestre de Goiás,GO,Goiás
 2102556,Campestre do Maranhão,MA,Maranhão
@@ -945,8 +954,8 @@ id,name,state_code,state
 2202109,Campinas do Piauí,PI,Piauí
 4303806,Campinas do Sul,RS,Rio Grande do Sul
 5204706,Campinorte,GO,Goiás
-4203303,Campo Alegre,SC,Santa Catarina
 2701407,Campo Alegre,AL,Alagoas
+4203303,Campo Alegre,SC,Santa Catarina
 5204805,Campo Alegre de Goiás,GO,Goiás
 2905909,Campo Alegre de Lourdes,BA,Bahia
 2202117,Campo Alegre do Fidalgo,PI,Piauí
@@ -955,7 +964,6 @@ id,name,state_code,state
 4203402,Campo Belo do Sul,SC,Santa Catarina
 4303905,Campo Bom,RS,Rio Grande do Sul
 4104055,Campo Bonito,PR,Paraná
-2516409,Tacima,PB,Paraíba
 2801009,Campo do Brito,SE,Sergipe
 3111309,Campo do Meio,MG,Minas Gerais
 4104105,Campo do Tenente,PR,Paraná
@@ -967,6 +975,7 @@ id,name,state_code,state
 2202133,Campo Grande do Piauí,PI,Piauí
 4104204,Campo Largo,PR,Paraná
 2202174,Campo Largo do Piauí,PI,Piauí
+5204854,Campo Limpo de Goiás,GO,Goiás
 3509601,Campo Limpo Paulista,SP,São Paulo
 4104253,Campo Magro,PR,Paraná
 2202208,Campo Maior,PI,Piauí
@@ -995,16 +1004,16 @@ id,name,state_code,state
 5102694,Canabrava do Norte,MT,Mato Grosso
 3509908,Cananéia,SP,São Paulo
 2701605,Canapi,AL,Alagoas
-3111804,Canápolis,MG,Minas Gerais
 2906105,Canápolis,BA,Bahia
-5102702,Canarana,MT,Mato Grosso
+3111804,Canápolis,MG,Minas Gerais
 2906204,Canarana,BA,Bahia
+5102702,Canarana,MT,Mato Grosso
 3509957,Canas,SP,São Paulo
 2202251,Canavieira,PI,Piauí
 2906303,Canavieiras,BA,Bahia
 2906402,Candeal,BA,Bahia
-3112000,Candeias,MG,Minas Gerais
 2906501,Candeias,BA,Bahia
+3112000,Candeias,MG,Minas Gerais
 1100809,Candeias do Jamari,RO,Rondônia
 4304200,Candelária,RS,Rio Grande do Sul
 2906600,Candiba,BA,Bahia
@@ -1029,9 +1038,9 @@ id,name,state_code,state
 4203808,Canoinhas,SC,Santa Catarina
 2906808,Cansanção,BA,Bahia
 1400175,Cantá,RR,Roraima
-3301108,Cantagalo,RJ,Rio de Janeiro
 3112059,Cantagalo,MG,Minas Gerais
 4104451,Cantagalo,PR,Paraná
+3301108,Cantagalo,RJ,Rio de Janeiro
 2102705,Cantanhede,MA,Maranhão
 2202307,Canto do Buriti,PI,Piauí
 2906824,Canudos,BA,Bahia
@@ -1046,8 +1055,8 @@ id,name,state_code,state
 4304655,Capão do Cipó,RS,Rio Grande do Sul
 4304663,Capão do Leão,RS,Rio Grande do Sul
 3112109,Caparaó,MG,Minas Gerais
-2801306,Capela,SE,Sergipe
 2701704,Capela,AL,Alagoas
+2801306,Capela,SE,Sergipe
 4304689,Capela de Santana,RS,Rio Grande do Sul
 3510302,Capela do Alto,SP,São Paulo
 2906857,Capela do Alto Alegre,BA,Bahia
@@ -1066,6 +1075,7 @@ id,name,state_code,state
 2202406,Capitão de Campos,PI,Piauí
 3112703,Capitão Enéas,MG,Minas Gerais
 2202455,Capitão Gervásio Oliveira,PI,Piauí
+4104600,Capitão Leônidas Marques,PR,Paraná
 1502301,Capitão Poço,PA,Pará
 3112802,Capitólio,MG,Minas Gerais
 3510401,Capivari,SP,São Paulo
@@ -1089,8 +1099,8 @@ id,name,state_code,state
 3510609,Carapicuíba,SP,São Paulo
 3113404,Caratinga,MG,Minas Gerais
 1301001,Carauari,AM,Amazonas
-2402303,Caraúbas,RN,Rio Grande do Norte
 2504074,Caraúbas,PB,Paraíba
+2402303,Caraúbas,RN,Rio Grande do Norte
 2202539,Caraúbas do Piauí,PI,Piauí
 2906907,Caravelas,BA,Bahia
 4304705,Carazinho,RS,Rio Grande do Sul
@@ -1100,6 +1110,7 @@ id,name,state_code,state
 3301157,Cardoso Moreira,RJ,Rio de Janeiro
 3113602,Careaçu,MG,Minas Gerais
 1301100,Careiro,AM,Amazonas
+1301159,Careiro da Várzea,AM,Amazonas
 3201308,Cariacica,ES,Espírito Santo
 2303006,Caridade,CE,Ceará
 2202554,Caridade do Piauí,PI,Piauí
@@ -1148,8 +1159,8 @@ id,name,state_code,state
 2907202,Casa Nova,BA,Bahia
 4304903,Casca,RS,Rio Grande do Sul
 3115003,Cascalho Rico,MG,Minas Gerais
-4104808,Cascavel,PR,Paraná
 2303501,Cascavel,CE,Ceará
+4104808,Cascavel,PR,Paraná
 1703909,Caseara,TO,Tocantins
 4304952,Caseiros,RS,Rio Grande do Sul
 3301306,Casimiro de Abreu,RJ,Rio de Janeiro
@@ -1195,10 +1206,11 @@ id,name,state_code,state
 2103000,Caxias,MA,Maranhão
 4305108,Caxias do Sul,RS,Rio Grande do Sul
 2202653,Caxingó,PI,Piauí
-3511300,Cedral,SP,São Paulo
+2402600,Ceará-Mirim,RN,Rio Grande do Norte
 2103109,Cedral,MA,Maranhão
-2604304,Cedro,PE,Pernambuco
+3511300,Cedral,SP,São Paulo
 2303808,Cedro,CE,Ceará
+2604304,Cedro,PE,Pernambuco
2801603,Cedro de São João,SE,Sergipe 3115607,Cedro do Abaeté,MG,Minas Gerais 4204152,Celso Ramos,SC,Santa Catarina @@ -1216,6 +1228,7 @@ id,name,state_code,state 3511409,Cerqueira César,SP,São Paulo 3511508,Cerquilho,SP,São Paulo 4305124,Cerrito,RS,Rio Grande do Sul +4105201,Cerro Azul,PR,Paraná 4305132,Cerro Branco,RS,Rio Grande do Sul 2402709,Cerro Corá,RN,Rio Grande do Norte 4305157,Cerro Grande,RS,Rio Grande do Sul @@ -1290,8 +1303,8 @@ id,name,state_code,state 3201506,Colatina,ES,Espírito Santo 5103205,Colíder,MT,Mato Grosso 3512001,Colina,SP,São Paulo -4305587,Colinas,RS,Rio Grande do Sul 2103505,Colinas,MA,Maranhão +4305587,Colinas,RS,Rio Grande do Sul 5205521,Colinas do Sul,GO,Goiás 1705508,Colinas do Tocantins,TO,Tocantins 1716703,Colméia,TO,Tocantins @@ -1335,8 +1348,8 @@ id,name,state_code,state 3512308,Conchas,SP,São Paulo 4204301,Concórdia,SC,Santa Catarina 1502756,Concórdia do Pará,PA,Pará -2604601,Condado,PE,Pernambuco 2504504,Condado,PB,Paraíba +2604601,Condado,PE,Pernambuco 2908606,Conde,BA,Bahia 2504603,Conde,PB,Paraíba 2908705,Condeúba,BA,Bahia @@ -1350,7 +1363,9 @@ id,name,state_code,state 3118106,Congonhas do Norte,MG,Minas Gerais 4106001,Congonhinhas,PR,Paraná 3118205,Conquista,MG,Minas Gerais +5103361,Conquista D'Oeste,MT,Mato Grosso 3118304,Conselheiro Lafaiete,MG,Minas Gerais +4106100,Conselheiro Mairinck,PR,Paraná 3118403,Conselheiro Pena,MG,Minas Gerais 3118502,Consolação,MG,Minas Gerais 4305801,Constantina,RS,Rio Grande do Sul @@ -1424,6 +1439,7 @@ id,name,state_code,state 4305959,Cotiporã,RS,Rio Grande do Sul 5103379,Cotriguaçu,MT,Mato Grosso 3120102,Couto de Magalhães de Minas,MG,Minas Gerais +1706001,Couto Magalhães,TO,Tocantins 4305975,Coxilha,RS,Rio Grande do Sul 5003306,Coxim,MS,Mato Grosso do Sul 2504850,Coxixola,PB,Paraíba @@ -1508,6 +1524,7 @@ id,name,state_code,state 5103437,Curvelândia,MT,Mato Grosso 3120904,Curvelo,MG,Minas Gerais 2605103,Custódia,PE,Pernambuco +1600212,Cutias,AP,Amapá 5206701,Damianópolis,GO,Goiás 
2505352,Damião,PB,Paraíba 5206800,Damolândia,GO,Goiás @@ -1541,6 +1558,7 @@ id,name,state_code,state 3121605,Diamantina,MG,Minas Gerais 5103502,Diamantino,MT,Mato Grosso 1707009,Dianópolis,TO,Tocantins +2910057,Dias d'Ávila,BA,Bahia 4306379,Dilermando de Aguiar,RS,Rio Grande do Sul 3121704,Diogo de Vasconcelos,MG,Minas Gerais 3121803,Dionísio,MG,Minas Gerais @@ -1575,6 +1593,7 @@ id,name,state_code,state 2910107,Dom Basílio,BA,Bahia 3122470,Dom Bosco,MG,Minas Gerais 3122504,Dom Cavati,MG,Minas Gerais +1502939,Dom Eliseu,PA,Pará 2203404,Dom Expedito Lopes,PI,Piauí 4306502,Dom Feliciano,RS,Rio Grande do Sul 2203453,Dom Inocêncio,PI,Piauí @@ -1624,10 +1643,10 @@ id,name,state_code,state 5207352,Edealina,GO,Goiás 5207402,Edéia,GO,Goiás 1301407,Eirunepé,AM,Amazonas -3514809,Eldorado,SP,São Paulo 5003751,Eldorado,MS,Mato Grosso do Sul -4306767,Eldorado do Sul,RS,Rio Grande do Sul +3514809,Eldorado,SP,São Paulo 1502954,Eldorado do Carajás,PA,Pará +4306767,Eldorado do Sul,RS,Rio Grande do Sul 2203503,Elesbão Veloso,PI,Piauí 3514908,Elias Fausto,SP,São Paulo 2203602,Eliseu Martins,PI,Piauí @@ -1651,8 +1670,8 @@ id,name,state_code,state 3301801,Engenheiro Paulo de Frontin,RJ,Rio de Janeiro 4306924,Engenho Velho,RS,Rio Grande do Sul 3123858,Entre Folhas,MG,Minas Gerais -4205175,Entre Rios,SC,Santa Catarina 2910503,Entre Rios,BA,Bahia +4205175,Entre Rios,SC,Santa Catarina 3123908,Entre Rios de Minas,MG,Minas Gerais 4107538,Entre Rios do Oeste,PR,Paraná 4306957,Entre Rios do Sul,RS,Rio Grande do Sul @@ -1740,8 +1759,8 @@ id,name,state_code,state 2910776,Feira da Mata,BA,Bahia 2910800,Feira de Santana,BA,Bahia 2702603,Feira Grande,AL,Alagoas -2802205,Feira Nova,SE,Sergipe 2605400,Feira Nova,PE,Pernambuco +2802205,Feira Nova,SE,Sergipe 2104073,Feira Nova do Maranhão,MA,Maranhão 3125408,Felício dos Santos,MG,Minas Gerais 2403707,Felipe Guerra,RN,Rio Grande do Norte @@ -1759,12 +1778,15 @@ id,name,state_code,state 3515608,Fernando Prestes,SP,São Paulo 3515509,Fernandópolis,SP,São 
Paulo 3515657,Fernão,SP,São Paulo +3515707,Ferraz de Vasconcelos,SP,São Paulo 1600238,Ferreira Gomes,AP,Amapá 2605509,Ferreiros,PE,Pernambuco 3125903,Ferros,MG,Minas Gerais 3125952,Fervedouro,MG,Minas Gerais 4107751,Figueira,PR,Paraná +5003900,Figueirão,MS,Mato Grosso do Sul 1707652,Figueirópolis,TO,Tocantins +5103809,Figueirópolis D'Oeste,MT,Mato Grosso 2910859,Filadélfia,BA,Bahia 1707702,Filadélfia,TO,Tocantins 2910909,Firmino Alves,BA,Bahia @@ -1792,6 +1814,7 @@ id,name,state_code,state 4205407,Florianópolis,SC,Santa Catarina 4108106,Flórida,PR,Paraná 3516002,Flórida Paulista,SP,São Paulo +3516101,Florínia,SP,São Paulo 1301605,Fonte Boa,AM,Amazonas 4308300,Fontoura Xavier,RS,Rio Grande do Sul 3126109,Formiga,MG,Minas Gerais @@ -1856,6 +1879,7 @@ id,name,state_code,state 2404101,Galinhos,RN,Rio Grande do Norte 4205605,Galvão,SC,Santa Catarina 2605905,Gameleira,PE,Pernambuco +5208152,Gameleira de Goiás,GO,Goiás 3127339,Gameleiras,MG,Minas Gerais 2911204,Gandu,BA,Bahia 2606002,Garanhuns,PE,Pernambuco @@ -1889,6 +1913,7 @@ id,name,state_code,state 3127354,Glaucilândia,MG,Minas Gerais 3517109,Glicério,SP,São Paulo 2911402,Glória,BA,Bahia +5103957,Glória D'Oeste,MT,Mato Grosso 5004007,Glória de Dourados,MS,Mato Grosso do Sul 2606101,Glória do Goitá,PE,Pernambuco 4309050,Glorinha,RS,Rio Grande do Sul @@ -1922,6 +1947,7 @@ id,name,state_code,state 2104552,Governador Edison Lobão,MA,Maranhão 2104602,Governador Eugênio Barros,MA,Maranhão 1101005,Governador Jorge Teixeira,RO,Rondônia +3202256,Governador Lindenberg,ES,Espírito Santo 2104628,Governador Luiz Rocha,MA,Maranhão 2911600,Governador Mangabeira,BA,Bahia 2104651,Governador Newton Bello,MA,Maranhão @@ -1953,8 +1979,8 @@ id,name,state_code,state 4309308,Guaíba,RS,Rio Grande do Sul 3517208,Guaiçara,SP,São Paulo 3517307,Guaimbê,SP,São Paulo -3517406,Guaíra,SP,São Paulo 4108809,Guaíra,PR,Paraná +3517406,Guaíra,SP,São Paulo 4108908,Guairaçá,PR,Paraná 2304954,Guaiúba,CE,Ceará 1301654,Guajará,AM,Amazonas @@ -1975,8 +2001,8 
@@ id,name,state_code,state 3517703,Guará,SP,São Paulo 2506301,Guarabira,PB,Paraíba 3517802,Guaraçaí,SP,São Paulo -3517901,Guaraci,SP,São Paulo 4109203,Guaraci,PR,Paraná +3517901,Guaraci,SP,São Paulo 3128204,Guaraciaba,MG,Minas Gerais 4206405,Guaraciaba,SC,Santa Catarina 2305001,Guaraciaba do Norte,CE,Ceará @@ -2043,8 +2069,8 @@ id,name,state_code,state 3519071,Hortolândia,SP,São Paulo 2204600,Hugo Napoleão,PI,Piauí 4309654,Hulha Negra,RS,Rio Grande do Sul -4309704,Humaitá,RS,Rio Grande do Sul 1301704,Humaitá,AM,Amazonas +4309704,Humaitá,RS,Rio Grande do Sul 2105005,Humberto de Campos,MA,Maranhão 3519105,Iacanga,SP,São Paulo 5209903,Iaciara,GO,Goiás @@ -2059,6 +2085,7 @@ id,name,state_code,state 3519303,Ibaté,SP,São Paulo 2703007,Ibateguara,AL,Alagoas 3202454,Ibatiba,ES,Espírito Santo +4109757,Ibema,PR,Paraná 3129400,Ibertioga,MG,Minas Gerais 3129509,Ibiá,MG,Minas Gerais 4309803,Ibiaçá,RS,Rio Grande do Sul @@ -2118,6 +2145,8 @@ id,name,state_code,state 3130101,Igarapé,MG,Minas Gerais 2105153,Igarapé do Meio,MA,Maranhão 2105203,Igarapé Grande,MA,Maranhão +1503200,Igarapé-Açu,PA,Pará +1503309,Igarapé-Miri,PA,Pará 2606804,Igarassu,PE,Pernambuco 3520202,Igaratá,SP,São Paulo 3130200,Igaratinga,MG,Minas Gerais @@ -2127,12 +2156,12 @@ id,name,state_code,state 3301876,Iguaba Grande,RJ,Rio de Janeiro 2913507,Iguaí,BA,Bahia 3520301,Iguape,SP,São Paulo -2606903,Iguaracy,PE,Pernambuco 4110003,Iguaraçu,PR,Paraná +2606903,Iguaracy,PE,Pernambuco 3130309,Iguatama,MG,Minas Gerais 5004304,Iguatemi,MS,Mato Grosso do Sul -4110052,Iguatu,PR,Paraná 2305506,Iguatu,CE,Ceará +4110052,Iguatu,PR,Paraná 3130408,Ijaci,MG,Minas Gerais 4310207,Ijuí,RS,Rio Grande do Sul 3520426,Ilha Comprida,SP,São Paulo @@ -2163,11 +2192,11 @@ id,name,state_code,state 3130655,Indaiabira,MG,Minas Gerais 4207502,Indaial,SC,Santa Catarina 3520509,Indaiatuba,SP,São Paulo -4310405,Independência,RS,Rio Grande do Sul 2305605,Independência,CE,Ceará +4310405,Independência,RS,Rio Grande do Sul 3520608,Indiana,SP,São Paulo 
-4110409,Indianópolis,PR,Paraná 3130705,Indianópolis,MG,Minas Gerais +4110409,Indianópolis,PR,Paraná 3520707,Indiaporã,SP,São Paulo 5209952,Indiara,GO,Goiás 2802809,Indiaroba,SE,Sergipe @@ -2205,6 +2234,8 @@ id,name,state_code,state 4207601,Ipira,SC,Santa Catarina 2914000,Ipirá,BA,Bahia 4110508,Ipiranga,PR,Paraná +5210158,Ipiranga de Goiás,GO,Goiás +5104526,Ipiranga do Norte,MT,Mato Grosso 2204808,Ipiranga do Piauí,PI,Piauí 4310462,Ipiranga do Sul,RS,Rio Grande do Sul 1301803,Ipixuna,AM,Amazonas @@ -2216,6 +2247,7 @@ id,name,state_code,state 3521200,Iporanga,SP,São Paulo 2305803,Ipu,CE,Ceará 3521309,Ipuã,SP,São Paulo +4207684,Ipuaçu,SC,Santa Catarina 2607307,Ipubi,PE,Pernambuco 2404804,Ipueira,RN,Rio Grande do Norte 2305902,Ipueiras,CE,Ceará @@ -2238,8 +2270,8 @@ id,name,state_code,state 3521606,Irapuru,SP,São Paulo 2914406,Iraquara,BA,Bahia 2914505,Irará,BA,Bahia -4207858,Irati,SC,Santa Catarina 4110706,Irati,PR,Paraná +4207858,Irati,SC,Santa Catarina 2306108,Irauçuba,CE,Ceará 2914604,Irecê,BA,Bahia 4110805,Iretama,PR,Paraná @@ -2250,8 +2282,8 @@ id,name,state_code,state 5210307,Israelândia,GO,Goiás 4208005,Itá,SC,Santa Catarina 4310538,Itaara,RS,Rio Grande do Sul -2802908,Itabaiana,SE,Sergipe 2506905,Itabaiana,PB,Paraíba +2802908,Itabaiana,SE,Sergipe 2803005,Itabaianinha,SE,Sergipe 2914653,Itabela,BA,Bahia 3521705,Itaberá,SP,São Paulo @@ -2259,6 +2291,7 @@ id,name,state_code,state 5210406,Itaberaí,GO,Goiás 2803104,Itabi,SE,Sergipe 3131703,Itabira,MG,Minas Gerais +3131802,Itabirinha,MG,Minas Gerais 3131901,Itabirito,MG,Minas Gerais 3301900,Itaboraí,RJ,Rio de Janeiro 2914802,Itabuna,BA,Bahia @@ -2276,6 +2309,7 @@ id,name,state_code,state 3202702,Itaguaçu,ES,Espírito Santo 2915353,Itaguaçu da Bahia,BA,Bahia 3302007,Itaguaí,RJ,Rio de Janeiro +4110904,Itaguajé,PR,Paraná 3132206,Itaguara,MG,Minas Gerais 5210562,Itaguari,GO,Goiás 5210604,Itaguaru,GO,Goiás @@ -2301,6 +2335,7 @@ id,name,state_code,state 3302056,Italva,RJ,Rio de Janeiro 2915601,Itamaraju,BA,Bahia 
3132503,Itamarandiba,MG,Minas Gerais +1301951,Itamarati,AM,Amazonas 3132602,Itamarati de Minas,MG,Minas Gerais 2915700,Itamari,BA,Bahia 3132701,Itambacuri,MG,Minas Gerais @@ -2314,20 +2349,22 @@ id,name,state_code,state 2915908,Itanagra,BA,Bahia 3522109,Itanhaém,SP,São Paulo 3133105,Itanhandu,MG,Minas Gerais +5104542,Itanhangá,MT,Mato Grosso 2916005,Itanhém,BA,Bahia 3133204,Itanhomi,MG,Minas Gerais 3133303,Itaobim,MG,Minas Gerais 3522158,Itaóca,SP,São Paulo 3302106,Itaocara,RJ,Rio de Janeiro 5210901,Itapaci,GO,Goiás -2306306,Itapajé,CE,Ceará 3133402,Itapagipe,MG,Minas Gerais +2306306,Itapajé,CE,Ceará 2916104,Itaparica,BA,Bahia 2916203,Itapé,BA,Bahia 2916302,Itapebi,BA,Bahia 3133501,Itapecerica,MG,Minas Gerais 3522208,Itapecerica da Serra,SP,São Paulo 2105401,Itapecuru Mirim,MA,Maranhão +4111209,Itapejara d'Oeste,PR,Paraná 4208302,Itapema,SC,Santa Catarina 3202801,Itapemirim,ES,Espírito Santo 4111258,Itaperuçu,PR,Paraná @@ -2335,8 +2372,8 @@ id,name,state_code,state 2607703,Itapetim,PE,Pernambuco 2916401,Itapetinga,BA,Bahia 3522307,Itapetininga,SP,São Paulo -3522406,Itapeva,SP,São Paulo 3133600,Itapeva,MG,Minas Gerais +3522406,Itapeva,SP,São Paulo 3522505,Itapevi,SP,São Paulo 2916500,Itapicuru,BA,Bahia 2306405,Itapipoca,CE,Ceará @@ -2353,8 +2390,8 @@ id,name,state_code,state 3522703,Itápolis,SP,São Paulo 5004502,Itaporã,MS,Mato Grosso do Sul 1711100,Itaporã do Tocantins,TO,Tocantins -3522802,Itaporanga,SP,São Paulo 2507002,Itaporanga,PB,Paraíba +3522802,Itaporanga,SP,São Paulo 2803203,Itaporanga d'Ajuda,SE,Sergipe 2507101,Itapororoca,PB,Paraíba 1101104,Itapuã do Oeste,RO,Rondônia @@ -2385,6 +2422,7 @@ id,name,state_code,state 2404903,Itaú,RN,Rio Grande do Norte 3133758,Itaú de Minas,MG,Minas Gerais 5104559,Itaúba,MT,Mato Grosso +1600253,Itaubal,AP,Amapá 5211404,Itauçu,GO,Goiás 2205102,Itaueira,PI,Piauí 3133808,Itaúna,MG,Minas Gerais @@ -2448,8 +2486,8 @@ id,name,state_code,state 4310876,Jacuizinho,RS,Rio Grande do Sul 1503804,Jacundá,PA,Pará 
3524600,Jacupiranga,SP,São Paulo -4310900,Jacutinga,RS,Rio Grande do Sul 3134905,Jacutinga,MG,Minas Gerais +4310900,Jacutinga,RS,Rio Grande do Sul 4111902,Jaguapitã,PR,Paraná 2917607,Jaguaquara,BA,Bahia 3135001,Jaguaraçu,MG,Minas Gerais @@ -2480,6 +2518,7 @@ id,name,state_code,state 5104906,Jangada,MT,Mato Grosso 4112207,Janiópolis,PR,Paraná 3135209,Januária,MG,Minas Gerais +2405306,Januário Cicco,RN,Rio Grande do Norte 3135308,Japaraíba,MG,Minas Gerais 2703601,Japaratinga,AL,Alagoas 2803302,Japaratuba,SE,Sergipe @@ -2505,8 +2544,8 @@ id,name,state_code,state 2205250,Jardim do Mulato,PI,Piauí 2405702,Jardim do Seridó,RN,Rio Grande do Norte 4112603,Jardim Olinda,PR,Paraná -3525102,Jardinópolis,SP,São Paulo 4208955,Jardinópolis,SC,Santa Catarina +3525102,Jardinópolis,SP,São Paulo 4311130,Jari,RS,Rio Grande do Sul 3525201,Jarinu,SP,São Paulo 1100114,Jaru,RO,Rondônia @@ -2563,6 +2602,7 @@ id,name,state_code,state 2608206,Joaquim Nabuco,PE,Pernambuco 2205409,Joaquim Pires,PI,Piauí 4112801,Joaquim Távora,PR,Paraná +2513653,Joca Claudino,PB,Paraíba 2205458,Joca Marques,PI,Piauí 4311155,Jóia,RS,Rio Grande do Sul 4209102,Joinville,SC,Santa Catarina @@ -2619,8 +2659,8 @@ id,name,state_code,state 1503903,Juruti,PA,Pará 5105200,Juscimeira,MT,Mato Grosso 2918506,Jussara,BA,Bahia -4113007,Jussara,PR,Paraná 5212204,Jussara,GO,Goiás +4113007,Jussara,PR,Paraná 2918555,Jussari,BA,Bahia 2918605,Jussiape,BA,Bahia 1302306,Jutaí,AM,Amazonas @@ -2647,12 +2687,12 @@ id,name,state_code,state 1711902,Lagoa da Confusão,TO,Tocantins 3137205,Lagoa da Prata,MG,Minas Gerais 2508208,Lagoa de Dentro,PB,Paraíba +2608503,Lagoa de Itaenga,PE,Pernambuco 2406304,Lagoa de Pedras,RN,Rio Grande do Norte 2205573,Lagoa de São Francisco,PI,Piauí 2406403,Lagoa de Velhos,RN,Rio Grande do Norte 2205565,Lagoa do Barro do Piauí,PI,Piauí 2608453,Lagoa do Carro,PE,Pernambuco -2608503,Lagoa de Itaenga,PE,Pernambuco 2105922,Lagoa do Mato,MA,Maranhão 2608602,Lagoa do Ouro,PE,Pernambuco 2205581,Lagoa do Piauí,PI,Piauí 
@@ -2667,8 +2707,9 @@ id,name,state_code,state 2608750,Lagoa Grande,PE,Pernambuco 2105963,Lagoa Grande do Maranhão,MA,Maranhão 2406502,Lagoa Nova,RN,Rio Grande do Norte -2918803,Lagoa Real,BA,Bahia +2918753,Lagoa Real,BA,Bahia 2406601,Lagoa Salgada,RN,Rio Grande do Norte +5212253,Lagoa Santa,GO,Goiás 3137601,Lagoa Santa,MG,Minas Gerais 2508307,Lagoa Seca,PB,Paraíba 4311304,Lagoa Vermelha,RS,Rio Grande do Sul @@ -2677,22 +2718,23 @@ id,name,state_code,state 2205540,Lagoinha do Piauí,PI,Piauí 4209409,Laguna,SC,Santa Catarina 5005251,Laguna Carapã,MS,Mato Grosso do Sul -2918902,Laje,BA,Bahia +2918803,Laje,BA,Bahia 3302304,Laje do Muriaé,RJ,Rio de Janeiro 4311403,Lajeado,RS,Rio Grande do Sul 1712009,Lajeado,TO,Tocantins 4311429,Lajeado do Bugre,RS,Rio Grande do Sul 4209458,Lajeado Grande,SC,Santa Catarina 2105989,Lajeado Novo,MA,Maranhão -2919009,Lajedão,BA,Bahia -2919058,Lajedinho,BA,Bahia +2918902,Lajedão,BA,Bahia +2919009,Lajedinho,BA,Bahia 2608800,Lajedo,PE,Pernambuco -2918753,Lajedo do Tabocal,BA,Bahia +2919058,Lajedo do Tabocal,BA,Bahia 2406700,Lajes,RN,Rio Grande do Norte 2406809,Lajes Pintadas,RN,Rio Grande do Norte 3137700,Lajinha,MG,Minas Gerais 2919108,Lamarão,BA,Bahia 3137809,Lambari,MG,Minas Gerais +5105234,Lambari D'Oeste,MT,Mato Grosso 3137908,Lamim,MG,Minas Gerais 2205607,Landri Sales,PI,Piauí 4113205,Lapa,PR,Paraná @@ -2763,6 +2805,7 @@ id,name,state_code,state 3527504,Lucianópolis,SP,São Paulo 5105309,Luciara,MT,Mato Grosso 2406908,Lucrécia,RN,Rio Grande do Norte +3527603,Luís Antônio,SP,São Paulo 2205706,Luís Correia,PI,Piauí 2106201,Luís Domingues,MA,Maranhão 2919553,Luís Eduardo Magalhães,BA,Bahia @@ -2847,6 +2890,7 @@ id,name,state_code,state 4114203,Mandaguari,PR,Paraná 4114302,Mandirituba,PR,Paraná 3528601,Manduri,SP,São Paulo +4114351,Manfrinópolis,PR,Paraná 3139300,Manga,MG,Minas Gerais 3302601,Mangaratiba,RJ,Rio de Janeiro 4114401,Mangueirinha,PR,Paraná @@ -2885,10 +2929,10 @@ id,name,state_code,state 3528858,Marapoama,SP,São Paulo 
4311791,Maratá,RS,Rio Grande do Sul 3203320,Marataízes,ES,Espírito Santo -2920700,Maraú,BA,Bahia 4311809,Marau,RS,Rio Grande do Sul -4210506,Maravilha,SC,Santa Catarina +2920700,Maraú,BA,Bahia 2704609,Maravilha,AL,Alagoas +4210506,Maravilha,SC,Santa Catarina 3139706,Maravilhas,MG,Minas Gerais 2509057,Marcação,PB,Paraíba 5105580,Marcelândia,MT,Mato Grosso @@ -2898,6 +2942,7 @@ id,name,state_code,state 2307809,Marco,CE,Ceará 2205953,Marcolândia,PI,Piauí 2206001,Marcos Parente,PI,Piauí +4114609,Marechal Cândido Rondon,PR,Paraná 2704708,Marechal Deodoro,AL,Alagoas 3203346,Marechal Floriano,ES,Espírito Santo 1200351,Marechal Thaumaturgo,AC,Acre @@ -2922,6 +2967,7 @@ id,name,state_code,state 4115200,Maringá,PR,Paraná 3529104,Marinópolis,SP,São Paulo 3140159,Mário Campos,MG,Minas Gerais +4115309,Mariópolis,PR,Paraná 4115358,Maripá,PR,Paraná 3140209,Maripá de Minas,MG,Minas Gerais 1504422,Marituba,PA,Pará @@ -2942,8 +2988,8 @@ id,name,state_code,state 2920908,Mascote,BA,Bahia 2308005,Massapê,CE,Ceará 2206050,Massapê do Piauí,PI,Piauí -4210605,Massaranduba,SC,Santa Catarina 2509206,Massaranduba,PB,Paraíba +4210605,Massaranduba,SC,Santa Catarina 4312104,Mata,RS,Rio Grande do Sul 2921005,Mata de São João,BA,Bahia 2705002,Mata Grande,AL,Alagoas @@ -3003,8 +3049,8 @@ id,name,state_code,state 3529609,Meridiano,SP,São Paulo 2308203,Meruoca,CE,Ceará 3529658,Mesópolis,SP,São Paulo -3302858,Mesquita,RJ,Rio de Janeiro 3141702,Mesquita,MG,Minas Gerais +3302858,Mesquita,RJ,Rio de Janeiro 2705200,Messias,AL,Alagoas 2407609,Messias Targino,RN,Rio Grande do Norte 2206209,Miguel Alves,PI,Piauí @@ -3049,6 +3095,7 @@ id,name,state_code,state 3530201,Mirante do Paranapanema,SP,São Paulo 4116000,Miraselva,PR,Paraná 3530300,Mirassol,SP,São Paulo +5105622,Mirassol d'Oeste,MT,Mato Grosso 3530409,Mirassolândia,SP,São Paulo 3142254,Miravânia,MG,Minas Gerais 4210852,Mirim Doce,SC,Santa Catarina @@ -3067,6 +3114,7 @@ id,name,state_code,state 5213400,Moiporá,GO,Goiás 2804102,Moita Bonita,SE,Sergipe 
1504703,Moju,PA,Pará +1504752,Mojuí dos Campos,PA,Pará 2308500,Mombaça,CE,Ceará 3530904,Mombuca,SP,São Paulo 2106904,Monção,MA,Maranhão @@ -3099,8 +3147,8 @@ id,name,state_code,state 4312385,Monte Belo do Sul,RS,Rio Grande do Sul 4211058,Monte Carlo,SC,Santa Catarina 3143104,Monte Carmelo,MG,Minas Gerais -3531605,Monte Castelo,SP,São Paulo 4211108,Monte Castelo,SC,Santa Catarina +3531605,Monte Castelo,SP,São Paulo 2407906,Monte das Gameleiras,RN,Rio Grande do Norte 1713601,Monte do Carmo,TO,Tocantins 3143153,Monte Formoso,MG,Minas Gerais @@ -3125,6 +3173,7 @@ id,name,state_code,state 3143500,Morada Nova de Minas,MG,Minas Gerais 2308807,Moraújo,CE,Ceará 2614303,Moreilândia,PE,Pernambuco +4116109,Moreira Sales,PR,Paraná 2609402,Moreno,PE,Pernambuco 4312427,Mormaço,RS,Rio Grande do Sul 2921609,Morpará,BA,Bahia @@ -3167,8 +3216,10 @@ id,name,state_code,state 5214051,Mundo Novo,GO,Goiás 5005681,Mundo Novo,MS,Mato Grosso do Sul 3143807,Munhoz,MG,Minas Gerais +4116307,Munhoz de Melo,PR,Paraná 2922201,Muniz Ferreira,BA,Bahia 3203700,Muniz Freire,ES,Espírito Santo +2922250,Muquém de São Francisco,BA,Bahia 3203809,Muqui,ES,Espírito Santo 3143906,Muriaé,MG,Minas Gerais 2804300,Muribeca,SE,Sergipe @@ -3203,6 +3254,7 @@ id,name,state_code,state 3532405,Nazaré Paulista,SP,São Paulo 3144508,Nazareno,MG,Minas Gerais 2510006,Nazarezinho,PB,Paraíba +2206720,Nazária,PI,Piauí 5214408,Nazário,GO,Goiás 2804409,Neópolis,SE,Sergipe 3144607,Nepomuceno,MG,Minas Gerais @@ -3242,8 +3294,9 @@ id,name,state_code,state 4116604,Nova América da Colina,PR,Paraná 5006200,Nova Andradina,MS,Mato Grosso do Sul 4312807,Nova Araçá,RS,Rio Grande do Sul -4116703,Nova Aurora,PR,Paraná 5214804,Nova Aurora,GO,Goiás +4116703,Nova Aurora,PR,Paraná +5106158,Nova Bandeirantes,MT,Mato Grosso 4312906,Nova Bassano,RS,Rio Grande do Sul 3144672,Nova Belém,MG,Minas Gerais 4312955,Nova Boa Vista,RS,Rio Grande do Sul @@ -3264,10 +3317,11 @@ id,name,state_code,state 4211405,Nova Erechim,SC,Santa Catarina 4116901,Nova 
Esperança,PR,Paraná 1504950,Nova Esperança do Piriá,PA,Pará +4116950,Nova Esperança do Sudoeste,PR,Paraná 4313037,Nova Esperança do Sul,RS,Rio Grande do Sul 3532900,Nova Europa,SP,São Paulo -4117008,Nova Fátima,PR,Paraná 2922730,Nova Fátima,BA,Bahia +4117008,Nova Fátima,PR,Paraná 2510105,Nova Floresta,PB,Paraíba 3303401,Nova Friburgo,RJ,Rio de Janeiro 5214861,Nova Glória,GO,Goiás @@ -3296,10 +3350,10 @@ id,name,state_code,state 5106224,Nova Mutum,MT,Mato Grosso 5106174,Nova Nazaré,MT,Mato Grosso 3533403,Nova Odessa,SP,São Paulo -4117206,Nova Olímpia,PR,Paraná 5106232,Nova Olímpia,MT,Mato Grosso -2510204,Nova Olinda,PB,Paraíba +4117206,Nova Olímpia,PR,Paraná 2309201,Nova Olinda,CE,Ceará +2510204,Nova Olinda,PB,Paraíba 1714880,Nova Olinda,TO,Tocantins 2107357,Nova Olinda do Maranhão,MA,Maranhão 1303106,Nova Olinda do Norte,AM,Amazonas @@ -3332,8 +3386,8 @@ id,name,state_code,state 3136603,Nova União,MG,Minas Gerais 1101435,Nova União,RO,Rondônia 3203908,Nova Venécia,ES,Espírito Santo -4211603,Nova Veneza,SC,Santa Catarina 5215009,Nova Veneza,GO,Goiás +4211603,Nova Veneza,SC,Santa Catarina 2923001,Nova Viçosa,BA,Bahia 5106257,Nova Xavantina,MT,Mato Grosso 3533254,Novais,SP,São Paulo @@ -3347,9 +3401,9 @@ id,name,state_code,state 3145307,Novo Cruzeiro,MG,Minas Gerais 5215231,Novo Gama,GO,Goiás 4313409,Novo Hamburgo,RS,Rio Grande do Sul -3533502,Novo Horizonte,SP,São Paulo -4211652,Novo Horizonte,SC,Santa Catarina 2923035,Novo Horizonte,BA,Bahia +4211652,Novo Horizonte,SC,Santa Catarina +3533502,Novo Horizonte,SP,São Paulo 5106273,Novo Horizonte do Norte,MT,Mato Grosso 1100502,Novo Horizonte do Oeste,RO,Rondônia 5006259,Novo Horizonte do Sul,MS,Mato Grosso do Sul @@ -3387,6 +3441,7 @@ id,name,state_code,state 2207108,Olho D'Água do Piauí,PI,Piauí 2705903,Olho d'Água Grande,AL,Alagoas 2408409,Olho-d'Água do Borges,RN,Rio Grande do Norte +3145455,Olhos d'Água,MG,Minas Gerais 3533908,Olímpia,SP,São Paulo 3145505,Olímpio Noronha,MG,Minas Gerais 2609600,Olinda,PE,Pernambuco 
@@ -3440,10 +3495,11 @@ id,name,state_code,state 2923357,Ourolândia,BA,Bahia 5215504,Ouvidor,GO,Goiás 3534906,Pacaembu,SP,São Paulo +1505486,Pacajá,PA,Pará 2309607,Pacajus,CE,Ceará 1400456,Pacaraima,RR,Roraima -2804904,Pacatuba,SE,Sergipe 2309706,Pacatuba,CE,Ceará +2804904,Pacatuba,SE,Sergipe 2107506,Paço do Lumiar,MA,Maranhão 2309805,Pacoti,CE,Ceará 2309904,Pacujá,CE,Ceará @@ -3473,11 +3529,11 @@ id,name,state_code,state 2610004,Palmares,PE,Pernambuco 4313656,Palmares do Sul,RS,Rio Grande do Sul 3535101,Palmares Paulista,SP,São Paulo -1721000,Palmas,TO,Tocantins 4117602,Palmas,PR,Paraná +1721000,Palmas,TO,Tocantins 2923407,Palmas de Monte Alto,BA,Bahia -4212056,Palmeira,SC,Santa Catarina 4117701,Palmeira,PR,Paraná +4212056,Palmeira,SC,Santa Catarina 3535200,Palmeira d'Oeste,SP,São Paulo 4313706,Palmeira das Missões,RS,Rio Grande do Sul 2207405,Palmeira do Piauí,PI,Piauí @@ -3522,6 +3578,7 @@ id,name,state_code,state 2310258,Paraipaba,CE,Ceará 4212239,Paraíso,SC,Santa Catarina 3535705,Paraíso,SP,São Paulo +5006275,Paraíso das Águas,MS,Mato Grosso do Sul 4118006,Paraíso do Norte,PR,Paraná 4314027,Paraíso do Sul,RS,Rio Grande do Sul 1716109,Paraíso do Tocantins,TO,Tocantins @@ -3529,8 +3586,8 @@ id,name,state_code,state 2310308,Parambu,CE,Ceará 2923605,Paramirim,BA,Bahia 2310407,Paramoti,CE,Ceará -1716208,Paranã,TO,Tocantins 2408607,Paraná,RN,Rio Grande do Norte +1716208,Paranã,TO,Tocantins 4118105,Paranacity,PR,Paraná 4118204,Paranaguá,PR,Paraná 5006309,Paranaíba,MS,Mato Grosso do Sul @@ -3547,6 +3604,7 @@ id,name,state_code,state 3536000,Parapuã,SP,São Paulo 2510659,Parari,PB,Paraíba 2923704,Paratinga,BA,Bahia +3303807,Paraty,RJ,Rio de Janeiro 2408706,Paraú,RN,Rio Grande do Norte 1505536,Parauapebas,PA,Pará 5216403,Paraúna,GO,Goiás @@ -3563,14 +3621,15 @@ id,name,state_code,state 3536257,Parisi,SP,São Paulo 2207603,Parnaguá,PI,Piauí 2207702,Parnaíba,PI,Piauí -2403251,Parnamirim,RN,Rio Grande do Norte 2610400,Parnamirim,PE,Pernambuco +2403251,Parnamirim,RN,Rio 
Grande do Norte 2107803,Parnarama,MA,Maranhão 4314050,Parobé,RS,Rio Grande do Sul 2409100,Passa e Fica,RN,Rio Grande do Norte 3147600,Passa Quatro,MG,Minas Gerais 4314068,Passa Sete,RS,Rio Grande do Sul 3147709,Passa Tempo,MG,Minas Gerais +3147808,Passa-Vinte,MG,Minas Gerais 3147501,Passabém,MG,Minas Gerais 2510709,Passagem,PB,Paraíba 2409209,Passagem,RN,Rio Grande do Norte @@ -3585,6 +3644,7 @@ id,name,state_code,state 4212270,Passos Maia,SC,Santa Catarina 2108009,Pastos Bons,MA,Maranhão 3147956,Patis,MG,Minas Gerais +4118451,Pato Bragado,PR,Paraná 4118501,Pato Branco,PR,Paraná 2510808,Patos,PB,Paraíba 3148004,Patos de Minas,MG,Minas Gerais @@ -3595,6 +3655,7 @@ id,name,state_code,state 2409308,Patu,RN,Rio Grande do Norte 3303856,Paty do Alferes,RJ,Rio de Janeiro 2923902,Pau Brasil,BA,Bahia +1505551,Pau d'Arco,PA,Pará 1716307,Pau D'Arco,TO,Tocantins 2207793,Pau D'Arco do Piauí,PI,Piauí 2409407,Pau dos Ferros,RN,Rio Grande do Norte @@ -3628,8 +3689,8 @@ id,name,state_code,state 3148707,Pedra Azul,MG,Minas Gerais 3536802,Pedra Bela,SP,São Paulo 3148756,Pedra Bonita,MG,Minas Gerais -2511004,Pedra Branca,PB,Paraíba 2310506,Pedra Branca,CE,Ceará +2511004,Pedra Branca,PB,Paraíba 1600154,Pedra Branca do Amapari,AP,Amapá 3148806,Pedra do Anta,MG,Minas Gerais 3148905,Pedra do Indaiá,MG,Minas Gerais @@ -3667,6 +3728,7 @@ id,name,state_code,state 3149408,Pedro Teixeira,MG,Minas Gerais 2409803,Pedro Velho,RN,Rio Grande do Norte 1716604,Peixe,TO,Tocantins +1505601,Peixe-Boi,PA,Pará 5106422,Peixoto de Azevedo,MT,Mato Grosso 4314308,Pejuçara,RS,Rio Grande do Sul 4314407,Pelotas,RS,Rio Grande do Sul @@ -3692,9 +3754,11 @@ id,name,state_code,state 2108454,Peritoró,MA,Maranhão 4118857,Perobal,PR,Paraná 4118907,Pérola,PR,Paraná +4119004,Pérola d'Oeste,PR,Paraná 5216452,Perolândia,GO,Goiás 3537602,Peruíbe,SP,São Paulo 3150000,Pescador,MG,Minas Gerais +4212650,Pescaria Brava,SC,Santa Catarina 2610905,Pesqueira,PE,Pernambuco 2611002,Petrolândia,PE,Pernambuco 
4212700,Petrolândia,SC,Santa Catarina @@ -3721,8 +3785,8 @@ id,name,state_code,state 2511509,Pilar,PB,Paraíba 5216908,Pilar de Goiás,GO,Goiás 3537909,Pilar do Sul,SP,São Paulo -2410009,Pilões,RN,Rio Grande do Norte 2511608,Pilões,PB,Paraíba +2410009,Pilões,RN,Rio Grande do Norte 2511707,Pilõezinhos,PB,Paraíba 3150505,Pimenta,MG,Minas Gerais 1100189,Pimenta Bueno,RO,Rondônia @@ -3736,16 +3800,17 @@ id,name,state_code,state 3538105,Pindorama,SP,São Paulo 1717008,Pindorama do Tocantins,TO,Tocantins 2310852,Pindoretama,CE,Ceará +3150539,Pingo-d'Água,MG,Minas Gerais 4119152,Pinhais,PR,Paraná 4314456,Pinhal,RS,Rio Grande do Sul 4314464,Pinhal da Serra,RS,Rio Grande do Sul 4119251,Pinhal de São Bento,PR,Paraná 4314472,Pinhal Grande,RS,Rio Grande do Sul 4119202,Pinhalão,PR,Paraná -3538204,Pinhalzinho,SP,São Paulo 4212908,Pinhalzinho,SC,Santa Catarina -2805208,Pinhão,SE,Sergipe +3538204,Pinhalzinho,SP,São Paulo 4119301,Pinhão,PR,Paraná +2805208,Pinhão,SE,Sergipe 3303955,Pinheiral,RJ,Rio de Janeiro 4314498,Pinheirinho do Vale,RS,Rio Grande do Sul 2108603,Pinheiro,MA,Maranhão @@ -3753,6 +3818,7 @@ id,name,state_code,state 4213005,Pinheiro Preto,SC,Santa Catarina 3204104,Pinheiros,ES,Espírito Santo 2924652,Pintadas,BA,Bahia +4314548,Pinto Bandeira,RS,Rio Grande do Sul 3150570,Pintópolis,MG,Minas Gerais 2208205,Pio IX,PI,Piauí 2108702,Pio XII,MA,Maranhão @@ -3798,8 +3864,8 @@ id,name,state_code,state 2924801,Piritiba,BA,Bahia 2511806,Pirpirituba,PB,Paraíba 4119608,Pitanga,PR,Paraná -3539509,Pitangueiras,SP,São Paulo 4119657,Pitangueiras,PR,Paraná +3539509,Pitangueiras,SP,São Paulo 3151404,Pitangui,MG,Minas Gerais 2511905,Pitimbu,PB,Paraíba 1717503,Pium,TO,Tocantins @@ -3810,10 +3876,10 @@ id,name,state_code,state 5217609,Planaltina,GO,Goiás 4119707,Planaltina do Paraná,PR,Paraná 2924900,Planaltino,BA,Bahia +2925006,Planalto,BA,Bahia +4119806,Planalto,PR,Paraná 4314704,Planalto,RS,Rio Grande do Sul 3539608,Planalto,SP,São Paulo -4119806,Planalto,PR,Paraná 
-2925006,Planalto,BA,Bahia 4213153,Planalto Alegre,SC,Santa Catarina 5106455,Planalto da Serra,MT,Mato Grosso 3151602,Planura,MG,Minas Gerais @@ -3926,6 +3992,7 @@ id,name,state_code,state 3152600,Pouso Alto,MG,Minas Gerais 4315131,Pouso Novo,RS,Rio Grande do Sul 4213708,Pouso Redondo,SC,Santa Catarina +5107008,Poxoréu,MT,Mato Grosso 3540853,Pracinha,SP,São Paulo 1600550,Pracuúba,AP,Amapá 2925501,Prado,BA,Bahia @@ -3936,8 +4003,9 @@ id,name,state_code,state 3541000,Praia Grande,SP,São Paulo 1718303,Praia Norte,TO,Tocantins 1506005,Prainha,PA,Pará -2512200,Prata,PB,Paraíba +4120358,Pranchita,PR,Paraná 3152808,Prata,MG,Minas Gerais +2512200,Prata,PB,Paraíba 2208601,Prata do Piauí,PI,Piauí 3541059,Pratânia,SP,São Paulo 3152907,Pratápolis,MG,Minas Gerais @@ -3945,6 +4013,7 @@ id,name,state_code,state 3541109,Presidente Alves,SP,São Paulo 3153103,Presidente Bernardes,MG,Minas Gerais 3541208,Presidente Bernardes,SP,São Paulo +4213906,Presidente Castello Branco,SC,Santa Catarina 4120408,Presidente Castelo Branco,PR,Paraná 2925600,Presidente Dutra,BA,Bahia 2109106,Presidente Dutra,MA,Maranhão @@ -3952,14 +4021,14 @@ id,name,state_code,state 1303536,Presidente Figueiredo,AM,Amazonas 4214003,Presidente Getúlio,SC,Santa Catarina 2925709,Presidente Jânio Quadros,BA,Bahia -3153202,Presidente Juscelino,MG,Minas Gerais 2109205,Presidente Juscelino,MA,Maranhão +3153202,Presidente Juscelino,MG,Minas Gerais 3204302,Presidente Kennedy,ES,Espírito Santo 1718402,Presidente Kennedy,TO,Tocantins 3153301,Presidente Kubitschek,MG,Minas Gerais 4315149,Presidente Lucena,RS,Rio Grande do Sul -1100254,Presidente Médici,RO,Rondônia 2109239,Presidente Médici,MA,Maranhão +1100254,Presidente Médici,RO,Rondônia 4214102,Presidente Nereu,SC,Santa Catarina 3153400,Presidente Olegário,MG,Minas Gerais 3541406,Presidente Prudente,SP,São Paulo @@ -3967,8 +4036,8 @@ id,name,state_code,state 2925758,Presidente Tancredo Neves,BA,Bahia 2109304,Presidente Vargas,MA,Maranhão 3541505,Presidente Venceslau,SP,São 
Paulo -2611408,Primavera,PE,Pernambuco 1506104,Primavera,PA,Pará +2611408,Primavera,PE,Pernambuco 1101476,Primavera de Rondônia,RO,Rondônia 5107040,Primavera do Leste,MT,Mato Grosso 2109403,Primeira Cruz,MA,Maranhão @@ -4033,6 +4102,7 @@ id,name,state_code,state 4121257,Ramilândia,PR,Paraná 3542206,Rancharia,SP,São Paulo 4121307,Rancho Alegre,PR,Paraná +4121356,Rancho Alegre D'Oeste,PR,Paraná 4214300,Rancho Queimado,SC,Santa Catarina 2109452,Raposa,MA,Maranhão 3153905,Raposos,MG,Minas Gerais @@ -4078,8 +4148,8 @@ id,name,state_code,state 1718550,Riachinho,TO,Tocantins 2410702,Riacho da Cruz,RN,Rio Grande do Norte 2611705,Riacho das Almas,PE,Pernambuco -2410801,Riacho de Santana,RN,Rio Grande do Norte 2926400,Riacho de Santana,BA,Bahia +2410801,Riacho de Santana,RN,Rio Grande do Norte 2512788,Riacho de Santo Antônio,PB,Paraíba 2512804,Riacho dos Cavalos,PB,Paraíba 3154507,Riacho dos Machados,MG,Minas Gerais @@ -4127,8 +4197,8 @@ id,name,state_code,state 4122206,Rio Branco do Sul,PR,Paraná 5007208,Rio Brilhante,MS,Mato Grosso do Sul 3154903,Rio Casca,MG,Minas Gerais -3543907,Rio Claro,SP,São Paulo 3304409,Rio Claro,RJ,Rio de Janeiro +3543907,Rio Claro,SP,São Paulo 1100262,Rio Crespo,RO,Rondônia 1718659,Rio da Conceição,TO,Tocantins 4214409,Rio das Antas,SC,Santa Catarina @@ -4158,8 +4228,8 @@ id,name,state_code,state 3155306,Rio Manso,MG,Minas Gerais 1506161,Rio Maria,PA,Pará 4215000,Rio Negrinho,SC,Santa Catarina -4122305,Rio Negro,PR,Paraná 5007307,Rio Negro,MS,Mato Grosso do Sul +4122305,Rio Negro,PR,Paraná 3155405,Rio Novo,MG,Minas Gerais 3204401,Rio Novo do Sul,ES,Espírito Santo 3155504,Rio Paranaíba,MG,Minas Gerais @@ -4222,8 +4292,8 @@ id,name,state_code,state 3544509,Rubinéia,SP,São Paulo 1506195,Rurópolis,PA,Pará 2311801,Russas,CE,Ceará -2411106,Ruy Barbosa,RN,Rio Grande do Norte 2927200,Ruy Barbosa,BA,Bahia +2411106,Ruy Barbosa,RN,Rio Grande do Norte 3156700,Sabará,MG,Minas Gerais 4122701,Sabáudia,PR,Paraná 3544608,Sabino,SP,São Paulo @@ -4250,8 +4320,8 @@ 
id,name,state_code,state 2311959,Salitre,CE,Ceará 3545100,Salmourão,SP,São Paulo 2612307,Saloá,PE,Pernambuco -3545159,Saltinho,SP,São Paulo 4215356,Saltinho,SC,Santa Catarina +3545159,Saltinho,SP,São Paulo 3545209,Salto,SP,São Paulo 3157104,Salto da Divisa,MG,Minas Gerais 3545308,Salto de Pirapora,SP,São Paulo @@ -4273,15 +4343,17 @@ id,name,state_code,state 3545506,Sandovalina,SP,São Paulo 4215455,Sangão,SC,Santa Catarina 2612406,Sanharó,PE,Pernambuco +4317103,Sant'Ana do Livramento,RS,Rio Grande do Sul 3545605,Santa Adélia,SP,São Paulo 3545704,Santa Albertina,SP,São Paulo 4123105,Santa Amélia,PR,Paraná -3157203,Santa Bárbara,MG,Minas Gerais 2927507,Santa Bárbara,BA,Bahia +3157203,Santa Bárbara,MG,Minas Gerais 3545803,Santa Bárbara d'Oeste,SP,São Paulo 5219100,Santa Bárbara de Goiás,GO,Goiás 3157252,Santa Bárbara do Leste,MG,Minas Gerais 3157278,Santa Bárbara do Monte Verde,MG,Minas Gerais +1506351,Santa Bárbara do Pará,PA,Pará 4316709,Santa Bárbara do Sul,RS,Rio Grande do Sul 3157302,Santa Bárbara do Tugúrio,MG,Minas Gerais 3546009,Santa Branca,SP,São Paulo @@ -4294,8 +4366,8 @@ id,name,state_code,state 3546108,Santa Clara d'Oeste,SP,São Paulo 4316758,Santa Clara do Sul,RS,Rio Grande do Sul 2513208,Santa Cruz,PB,Paraíba -2411205,Santa Cruz,RN,Rio Grande do Norte 2612455,Santa Cruz,PE,Pernambuco +2411205,Santa Cruz,RN,Rio Grande do Norte 2927705,Santa Cruz Cabrália,BA,Bahia 2612471,Santa Cruz da Baixa Verde,PE,Pernambuco 3546207,Santa Cruz da Conceição,SP,São Paulo @@ -4304,6 +4376,7 @@ id,name,state_code,state 3546306,Santa Cruz das Palmeiras,SP,São Paulo 5219209,Santa Cruz de Goiás,GO,Goiás 3157336,Santa Cruz de Minas,MG,Minas Gerais +4123303,Santa Cruz de Monte Castelo,PR,Paraná 3157377,Santa Cruz de Salinas,MG,Minas Gerais 1506401,Santa Cruz do Arari,PA,Pará 2612505,Santa Cruz do Capibaribe,PE,Pernambuco @@ -4320,34 +4393,35 @@ id,name,state_code,state 3157609,Santa Fé de Minas,MG,Minas Gerais 1718865,Santa Fé do Araguaia,TO,Tocantins 3546603,Santa Fé do 
Sul,SP,São Paulo -2209203,Santa Filomena,PI,Piauí 2612554,Santa Filomena,PE,Pernambuco +2209203,Santa Filomena,PI,Piauí 2109759,Santa Filomena do Maranhão,MA,Maranhão 3546702,Santa Gertrudes,SP,São Paulo +2109809,Santa Helena,MA,Maranhão 2513307,Santa Helena,PB,Paraíba 4123501,Santa Helena,PR,Paraná 4215554,Santa Helena,SC,Santa Catarina -2109809,Santa Helena,MA,Maranhão 5219308,Santa Helena de Goiás,GO,Goiás 3157658,Santa Helena de Minas,MG,Minas Gerais 2927903,Santa Inês,BA,Bahia +2109908,Santa Inês,MA,Maranhão 2513356,Santa Inês,PB,Paraíba 4123600,Santa Inês,PR,Paraná -2109908,Santa Inês,MA,Maranhão 5219357,Santa Isabel,GO,Goiás 3546801,Santa Isabel,SP,São Paulo 4123709,Santa Isabel do Ivaí,PR,Paraná 1303601,Santa Isabel do Rio Negro,AM,Amazonas 4123808,Santa Izabel do Oeste,PR,Paraná +1506500,Santa Izabel do Pará,PA,Pará 3157708,Santa Juliana,MG,Minas Gerais 3204500,Santa Leopoldina,ES,Espírito Santo -3546900,Santa Lúcia,SP,São Paulo 4123824,Santa Lúcia,PR,Paraná +3546900,Santa Lúcia,SP,São Paulo 2209302,Santa Luz,PI,Piauí 2928059,Santa Luzia,BA,Bahia -2513406,Santa Luzia,PB,Paraíba -3157807,Santa Luzia,MG,Minas Gerais 2110005,Santa Luzia,MA,Maranhão +3157807,Santa Luzia,MG,Minas Gerais +2513406,Santa Luzia,PB,Paraíba 1100296,Santa Luzia D'Oeste,RO,Rondônia 2806305,Santa Luzia do Itanhy,SE,Sergipe 2707909,Santa Luzia do Norte,AL,Alagoas @@ -4376,8 +4450,8 @@ id,name,state_code,state 4123956,Santa Mônica,PR,Paraná 2312205,Santa Quitéria,CE,Ceará 2110104,Santa Quitéria do Maranhão,MA,Maranhão -2513703,Santa Rita,PB,Paraíba 2110203,Santa Rita,MA,Maranhão +2513703,Santa Rita,PB,Paraíba 3547403,Santa Rita d'Oeste,SP,São Paulo 3159209,Santa Rita de Caldas,MG,Minas Gerais 2928406,Santa Rita de Cássia,BA,Bahia @@ -4395,8 +4469,8 @@ id,name,state_code,state 4317202,Santa Rosa,RS,Rio Grande do Sul 3159704,Santa Rosa da Serra,MG,Minas Gerais 5219506,Santa Rosa de Goiás,GO,Goiás -2806503,Santa Rosa de Lima,SE,Sergipe 4215604,Santa Rosa de Lima,SC,Santa Catarina 
+2806503,Santa Rosa de Lima,SE,Sergipe 3547601,Santa Rosa de Viterbo,SP,São Paulo 2209377,Santa Rosa do Piauí,PI,Piauí 1200435,Santa Rosa do Purus,AC,Acre @@ -4404,6 +4478,7 @@ id,name,state_code,state 1718907,Santa Rosa do Tocantins,TO,Tocantins 3547650,Santa Salete,SP,São Paulo 3204609,Santa Teresa,ES,Espírito Santo +2928505,Santa Teresinha,BA,Bahia 2513802,Santa Teresinha,PB,Paraíba 4317251,Santa Tereza,RS,Rio Grande do Sul 5219605,Santa Tereza de Goiás,GO,Goiás @@ -4413,18 +4488,20 @@ id,name,state_code,state 2612802,Santa Terezinha,PE,Pernambuco 4215679,Santa Terezinha,SC,Santa Catarina 5219704,Santa Terezinha de Goiás,GO,Goiás -4215695,Santa Terezinha do Progresso,SC,Santa Catarina +4124053,Santa Terezinha de Itaipu,PR,Paraná +4215687,Santa Terezinha do Progresso,SC,Santa Catarina 1720002,Santa Terezinha do Tocantins,TO,Tocantins 3159803,Santa Vitória,MG,Minas Gerais 4317301,Santa Vitória do Palmar,RS,Rio Grande do Sul 2928000,Santaluz,BA,Bahia -2928208,Santana,BA,Bahia 1600600,Santana,AP,Amapá +2928208,Santana,BA,Bahia 4317004,Santana da Boa Vista,RS,Rio Grande do Sul 3547205,Santana da Ponte Pensa,SP,São Paulo 3158300,Santana da Vargem,MG,Minas Gerais 3158409,Santana de Cataguases,MG,Minas Gerais 2513505,Santana de Mangueira,PB,Paraíba +3547304,Santana de Parnaíba,SP,São Paulo 3158508,Santana de Pirapama,MG,Minas Gerais 2312007,Santana do Acaraú,CE,Ceará 1506708,Santana do Araguaia,PA,Pará @@ -4447,10 +4524,9 @@ id,name,state_code,state 3159100,Santana dos Montes,MG,Minas Gerais 2928307,Santanópolis,BA,Bahia 1506807,Santarém,PA,Pará -2513653,Joca Claudino,PB,Paraíba 1506906,Santarém Novo,PA,Pará 4317400,Santiago,RS,Rio Grande do Sul -4215687,Santiago do Sul,SC,Santa Catarina +4215695,Santiago do Sul,SC,Santa Catarina 5107263,Santo Afonso,MT,Mato Grosso 2928604,Santo Amaro,BA,Bahia 4215703,Santo Amaro da Imperatriz,SC,Santa Catarina @@ -4481,6 +4557,8 @@ id,name,state_code,state 3160207,Santo Antônio do Itambé,MG,Minas Gerais 3160306,Santo Antônio do 
Jacinto,MG,Minas Gerais 3548104,Santo Antônio do Jardim,SP,São Paulo +5107792,Santo Antônio do Leste,MT,Mato Grosso +5107800,Santo Antônio do Leverger,MT,Mato Grosso 3160405,Santo Antônio do Monte,MG,Minas Gerais 4317558,Santo Antônio do Palma,RS,Rio Grande do Sul 4124301,Santo Antônio do Paraíso,PR,Paraná @@ -4521,18 +4599,22 @@ id,name,state_code,state 3548708,São Bernardo do Campo,SP,São Paulo 4215901,São Bonifácio,SC,Santa Catarina 4318002,São Borja,RS,Rio Grande do Sul +2708204,São Brás,AL,Alagoas 3160900,São Brás do Suaçuí,MG,Minas Gerais 2209559,São Braz do Piauí,PI,Piauí 1507102,São Caetano de Odivelas,PA,Pará 3548807,São Caetano do Sul,SP,São Paulo +2613107,São Caitano,PE,Pernambuco 4216008,São Carlos,SC,Santa Catarina 3548906,São Carlos,SP,São Paulo 4124608,São Carlos do Ivaí,PR,Paraná 2806701,São Cristóvão,SE,Sergipe 4216057,São Cristovão do Sul,SC,Santa Catarina 2928901,São Desidério,BA,Bahia -4216107,São Domingos,SC,Santa Catarina 2928950,São Domingos,BA,Bahia +5219803,São Domingos,GO,Goiás +2513968,São Domingos,PB,Paraíba +4216107,São Domingos,SC,Santa Catarina 2806800,São Domingos,SE,Sergipe 3160959,São Domingos das Dores,MG,Minas Gerais 1507151,São Domingos do Araguaia,PA,Pará @@ -4555,15 +4637,16 @@ id,name,state_code,state 1507300,São Félix do Xingu,PA,Pará 2411809,São Fernando,RN,Rio Grande do Norte 3304805,São Fidélis,RJ,Rio de Janeiro -3549003,São Francisco,SP,São Paulo +3161106,São Francisco,MG,Minas Gerais 2513984,São Francisco,PB,Paraíba 2806909,São Francisco,SE,Sergipe -3161106,São Francisco,MG,Minas Gerais +3549003,São Francisco,SP,São Paulo 4318101,São Francisco de Assis,RS,Rio Grande do Sul +2209658,São Francisco de Assis do Piauí,PI,Piauí 5219902,São Francisco de Goiás,GO,Goiás 3304755,São Francisco de Itabapoana,RJ,Rio de Janeiro -4318200,São Francisco de Paula,RS,Rio Grande do Sul 3161205,São Francisco de Paula,MG,Minas Gerais +4318200,São Francisco de Paula,RS,Rio Grande do Sul 3161304,São Francisco de Sales,MG,Minas Gerais 
2110856,São Francisco do Brejão,MA,Maranhão 2929206,São Francisco do Conde,BA,Bahia @@ -4574,8 +4657,8 @@ id,name,state_code,state 1507409,São Francisco do Pará,PA,Pará 2209708,São Francisco do Piauí,PI,Piauí 4216206,São Francisco do Sul,SC,Santa Catarina -4318309,São Gabriel,RS,Rio Grande do Sul 2929255,São Gabriel,BA,Bahia +4318309,São Gabriel,RS,Rio Grande do Sul 1303809,São Gabriel da Cachoeira,AM,Amazonas 3204708,São Gabriel da Palha,ES,Espírito Santo 5007695,São Gabriel do Oeste,MS,Mato Grosso do Sul @@ -4597,10 +4680,10 @@ id,name,state_code,state 3162104,São Gotardo,MG,Minas Gerais 4318408,São Jerônimo,RS,Rio Grande do Sul 4124707,São Jerônimo da Serra,PR,Paraná -4124806,São João,PR,Paraná 2613206,São João,PE,Pernambuco -4216305,São João Batista,SC,Santa Catarina +4124806,São João,PR,Paraná 2111003,São João Batista,MA,Maranhão +4216305,São João Batista,SC,Santa Catarina 3162203,São João Batista do Glória,MG,Minas Gerais 5220009,São João d'Aliança,GO,Goiás 1400506,São João da Baliza,RR,Roraima @@ -4654,6 +4737,7 @@ id,name,state_code,state 3162922,São Joaquim de Bicas,MG,Minas Gerais 2613305,São Joaquim do Monte,PE,Pernambuco 4318440,São Jorge,RS,Rio Grande do Sul +4125209,São Jorge d'Oeste,PR,Paraná 4125308,São Jorge do Ivaí,PR,Paraná 4125357,São Jorge do Patrocínio,PR,Paraná 4216602,São José,SC,Santa Catarina @@ -4721,6 +4805,7 @@ id,name,state_code,state 3163706,São Lourenço,MG,Minas Gerais 2613701,São Lourenço da Mata,PE,Pernambuco 3549953,São Lourenço da Serra,SP,São Paulo +4216909,São Lourenço do Oeste,SC,Santa Catarina 2210359,São Lourenço do Piauí,PI,Piauí 4318804,São Lourenço do Sul,RS,Rio Grande do Sul 4217006,São Ludgero,SC,Santa Catarina @@ -4732,13 +4817,14 @@ id,name,state_code,state 2111409,São Luís Gonzaga do Maranhão,MA,Maranhão 1400605,São Luiz,RR,Roraima 5220157,São Luiz do Norte,GO,Goiás +3550001,São Luiz do Paraitinga,SP,São Paulo 4318903,São Luiz Gonzaga,RS,Rio Grande do Sul 2514909,São Mamede,PB,Paraíba 4125555,São Manoel do 
Paraná,PR,Paraná 3550100,São Manuel,SP,São Paulo 4319000,São Marcos,RS,Rio Grande do Sul -4217105,São Martinho,SC,Santa Catarina 4319109,São Martinho,RS,Rio Grande do Sul +4217105,São Martinho,SC,Santa Catarina 4319125,São Martinho da Serra,RS,Rio Grande do Sul 3204906,São Mateus,ES,Espírito Santo 2111508,São Mateus do Maranhão,MA,Maranhão @@ -4754,6 +4840,7 @@ id,name,state_code,state 3163805,São Miguel do Anta,MG,Minas Gerais 5220207,São Miguel do Araguaia,GO,Goiás 2210391,São Miguel do Fidalgo,PI,Piauí +2412559,São Miguel do Gostoso,RN,Rio Grande do Norte 1507607,São Miguel do Guamá,PA,Pará 1100320,São Miguel do Guaporé,RO,Rondônia 4125704,São Miguel do Iguaçu,PR,Paraná @@ -4769,8 +4856,8 @@ id,name,state_code,state 4319307,São Paulo das Missões,RS,Rio Grande do Sul 1303908,São Paulo de Olivença,AM,Amazonas 2412609,São Paulo do Potengi,RN,Rio Grande do Norte -3550407,São Pedro,SP,São Paulo 2412708,São Pedro,RN,Rio Grande do Norte +3550407,São Pedro,SP,São Paulo 2111532,São Pedro da Água Branca,MA,Maranhão 3305208,São Pedro da Aldeia,RJ,Rio de Janeiro 5107404,São Pedro da Cipa,MT,Mato Grosso @@ -4804,6 +4891,7 @@ id,name,state_code,state 3164407,São Sebastião da Bela Vista,MG,Minas Gerais 1507706,São Sebastião da Boa Vista,PA,Pará 3550803,São Sebastião da Grama,SP,São Paulo +3164431,São Sebastião da Vargem Alegre,MG,Minas Gerais 2515104,São Sebastião de Lagoa de Roça,PB,Paraíba 3305307,São Sebastião do Alto,RJ,Rio de Janeiro 3164472,São Sebastião do Anta,MG,Minas Gerais @@ -4820,32 +4908,35 @@ id,name,state_code,state 4319604,São Sepé,RS,Rio Grande do Sul 5220405,São Simão,GO,Goiás 3550902,São Simão,SP,São Paulo +3165206,São Thomé das Letras,MG,Minas Gerais 3165008,São Tiago,MG,Minas Gerais 3165107,São Tomás de Aquino,MG,Minas Gerais -2412906,São Tomé,RN,Rio Grande do Norte 4126108,São Tomé,PR,Paraná +2412906,São Tomé,RN,Rio Grande do Norte 4319703,São Valentim,RS,Rio Grande do Sul 4319711,São Valentim do Sul,RS,Rio Grande do Sul +1720499,São Valério,TO,Tocantins 
4319737,São Valério do Sul,RS,Rio Grande do Sul 4319752,São Vendelino,RS,Rio Grande do Sul -3551009,São Vicente,SP,São Paulo 2413003,São Vicente,RN,Rio Grande do Norte +3551009,São Vicente,SP,São Paulo 3165305,São Vicente de Minas,MG,Minas Gerais +2515401,São Vicente do Seridó,PB,Paraíba 4319802,São Vicente do Sul,RS,Rio Grande do Sul -2613800,São Vicente Ferrer,PE,Pernambuco 2111706,São Vicente Ferrer,MA,Maranhão +2613800,São Vicente Ferrer,PE,Pernambuco 2515302,Sapé,PB,Paraíba 2929602,Sapeaçu,BA,Bahia 5107875,Sapezal,MT,Mato Grosso 4319901,Sapiranga,RS,Rio Grande do Sul 4126207,Sapopema,PR,Paraná 3165404,Sapucaí-Mirim,MG,Minas Gerais -3305406,Sapucaia,RJ,Rio de Janeiro 1507755,Sapucaia,PA,Pará +3305406,Sapucaia,RJ,Rio de Janeiro 4320008,Sapucaia do Sul,RS,Rio Grande do Sul 3305505,Saquarema,RJ,Rio de Janeiro -4320107,Sarandi,RS,Rio Grande do Sul 4126256,Sarandi,PR,Paraná +4320107,Sarandi,RS,Rio Grande do Sul 3551108,Sarapuí,SP,São Paulo 3165503,Sardoá,MG,Minas Gerais 3551207,Sarutaiá,SP,São Paulo @@ -4870,6 +4961,7 @@ id,name,state_code,state 4320305,Selbach,RS,Rio Grande do Sul 5007802,Selvíria,MS,Mato Grosso do Sul 3165560,Sem-Peixe,MG,Minas Gerais +1200500,Sena Madureira,AC,Acre 2111748,Senador Alexandre Costa,MA,Maranhão 3165578,Senador Amaral,MG,Minas Gerais 5220454,Senador Canedo,GO,Goiás @@ -4879,6 +4971,7 @@ id,name,state_code,state 2413201,Senador Georgino Avelino,RN,Rio Grande do Norte 1200450,Senador Guiomard,AC,Acre 3165800,Senador José Bento,MG,Minas Gerais +1507805,Senador José Porfírio,PA,Pará 2111763,Senador La Rocque,MA,Maranhão 3165909,Senador Modestino Gonçalves,MG,Minas Gerais 2312700,Senador Pompeu,CE,Ceará @@ -4894,7 +4987,6 @@ id,name,state_code,state 2930204,Sento Sé,BA,Bahia 4320404,Serafina Corrêa,RS,Rio Grande do Sul 3166303,Sericita,MG,Minas Gerais -2515401,São Vicente do Seridó,PB,Paraíba 1101500,Seringueiras,RO,Rondônia 4320453,Sério,RS,Rio Grande do Sul 3166402,Seritinga,MG,Minas Gerais @@ -4940,8 +5032,8 @@ id,name,state_code,state 
4126504,Sertanópolis,PR,Paraná 4320503,Sertão,RS,Rio Grande do Sul 4320552,Sertão Santana,RS,Rio Grande do Sul -3551702,Sertãozinho,SP,São Paulo 2515930,Sertãozinho,PB,Paraíba +3551702,Sertãozinho,SP,São Paulo 3551801,Sete Barras,SP,São Paulo 4320578,Sete de Setembro,RS,Rio Grande do Sul 3167202,Sete Lagoas,MG,Minas Gerais @@ -4976,8 +5068,8 @@ id,name,state_code,state 5220702,Sítio d'Abadia,GO,Goiás 2930758,Sítio do Mato,BA,Bahia 2930766,Sítio do Quinto,BA,Bahia -2413706,Sítio Novo,RN,Rio Grande do Norte 2111805,Sítio Novo,MA,Maranhão +2413706,Sítio Novo,RN,Rio Grande do Norte 1720804,Sítio Novo do Tocantins,TO,Tocantins 2930774,Sobradinho,BA,Bahia 4320701,Sobradinho,RS,Rio Grande do Sul @@ -5017,8 +5109,8 @@ id,name,state_code,state 4320859,Tabaí,RS,Rio Grande do Sul 5107941,Tabaporã,MT,Mato Grosso 3552601,Tabapuã,SP,São Paulo -3552700,Tabatinga,SP,São Paulo 1304062,Tabatinga,AM,Amazonas +3552700,Tabatinga,SP,São Paulo 2614600,Tabira,PE,Pernambuco 3552809,Taboão da Serra,SP,São Paulo 2930907,Tabocas do Brejo Velho,BA,Bahia @@ -5028,6 +5120,7 @@ id,name,state_code,state 2614709,Tacaimbó,PE,Pernambuco 2614808,Tacaratu,PE,Pernambuco 3552908,Taciba,SP,São Paulo +2516409,Tacima,PB,Paraíba 5007950,Tacuru,MS,Mato Grosso do Sul 3553005,Taguaí,SP,São Paulo 1720903,Taguatinga,TO,Tocantins @@ -5057,8 +5150,8 @@ id,name,state_code,state 2931103,Tanquinho,BA,Bahia 3168051,Taparuba,MG,Minas Gerais 1304104,Tapauá,AM,Amazonas -4320909,Tapejara,RS,Rio Grande do Sul 4126801,Tapejara,PR,Paraná +4320909,Tapejara,RS,Rio Grande do Sul 4321006,Tapera,RS,Rio Grande do Sul 2931202,Taperoá,BA,Bahia 2516508,Taperoá,PB,Paraíba @@ -5108,8 +5201,8 @@ id,name,state_code,state 2414159,Tenente Laurentino Cruz,RN,Rio Grande do Norte 4321402,Tenente Portela,RS,Rio Grande do Sul 2516755,Tenório,PB,Paraíba -3554300,Teodoro Sampaio,SP,São Paulo 2931400,Teodoro Sampaio,BA,Bahia +3554300,Teodoro Sampaio,SP,São Paulo 2931509,Teofilândia,BA,Bahia 3168606,Teófilo Otoni,MG,Minas Gerais 
2931608,Teolândia,BA,Bahia @@ -5123,8 +5216,8 @@ id,name,state_code,state 1507961,Terra Alta,PA,Pará 4127205,Terra Boa,PR,Paraná 4321436,Terra de Areia,RS,Rio Grande do Sul -2615201,Terra Nova,PE,Pernambuco 2931707,Terra Nova,BA,Bahia +2615201,Terra Nova,PE,Pernambuco 5108055,Terra Nova do Norte,MT,Mato Grosso 4127304,Terra Rica,PR,Paraná 4127403,Terra Roxa,PR,Paraná @@ -5164,6 +5257,7 @@ id,name,state_code,state 2807501,Tomar do Geru,SE,Sergipe 4127809,Tomazina,PR,Paraná 3169208,Tombos,MG,Minas Gerais +1508001,Tomé-Açu,PA,Pará 1304237,Tonantins,AM,Amazonas 2615409,Toritama,PE,Pernambuco 5108204,Torixoréu,MT,Mato Grosso @@ -5173,10 +5267,12 @@ id,name,state_code,state 3554706,Torrinha,SP,São Paulo 2414407,Touros,RN,Rio Grande do Norte 3554755,Trabiju,SP,São Paulo +1508035,Tracuateua,PA,Pará 2615508,Tracunhaém,PE,Pernambuco 2709202,Traipu,AL,Alagoas 1508050,Trairão,PA,Pará 2313500,Trairi,CE,Ceará +3305901,Trajano de Moraes,RJ,Rio de Janeiro 4321600,Tramandaí,RS,Rio Grande do Sul 4321626,Travesseiro,RS,Rio Grande do Sul 2931806,Tremedal,BA,Bahia @@ -5203,9 +5299,9 @@ id,name,state_code,state 5221403,Trindade,GO,Goiás 2615607,Trindade,PE,Pernambuco 4321956,Trindade do Sul,RS,Rio Grande do Sul -4322004,Triunfo,RS,Rio Grande do Sul -2615706,Triunfo,PE,Pernambuco 2516805,Triunfo,PB,Paraíba +2615706,Triunfo,PE,Pernambuco +4322004,Triunfo,RS,Rio Grande do Sul 2414456,Triunfo Potiguar,RN,Rio Grande do Norte 2112233,Trizidela do Vale,MA,Maranhão 5221452,Trombas,GO,Goiás @@ -5244,8 +5340,8 @@ id,name,state_code,state 2313559,Tururu,CE,Ceará 5221502,Turvânia,GO,Goiás 5221551,Turvelândia,GO,Goiás -4218806,Turvo,SC,Santa Catarina 4127965,Turvo,PR,Paraná +4218806,Turvo,SC,Santa Catarina 3169802,Turvolândia,MG,Minas Gerais 2112506,Tutóia,MA,Maranhão 1304260,Uarini,AM,Amazonas @@ -5337,8 +5433,8 @@ id,name,state_code,state 3556354,Vargem,SP,São Paulo 3170578,Vargem Alegre,MG,Minas Gerais 3205036,Vargem Alta,ES,Espírito Santo -4219176,Vargem Bonita,SC,Santa Catarina 3170602,Vargem 
Bonita,MG,Minas Gerais +4219176,Vargem Bonita,SC,Santa Catarina 2112704,Vargem Grande,MA,Maranhão 3170651,Vargem Grande do Rio Pardo,MG,Minas Gerais 3556404,Vargem Grande do Sul,SP,São Paulo @@ -5348,15 +5444,15 @@ id,name,state_code,state 3170750,Varjão de Minas,MG,Minas Gerais 2313955,Varjota,CE,Ceará 3306156,Varre-Sai,RJ,Rio de Janeiro -2414704,Várzea,RN,Rio Grande do Norte 2517100,Várzea,PB,Paraíba +2414704,Várzea,RN,Rio Grande do Norte 2314003,Várzea Alegre,CE,Ceará 2211357,Várzea Branca,PI,Piauí 3170800,Várzea da Palma,MG,Minas Gerais 2933059,Várzea da Roça,BA,Bahia 2933109,Várzea do Poço,BA,Bahia -2211407,Várzea Grande,PI,Piauí 5108402,Várzea Grande,MT,Mato Grosso +2211407,Várzea Grande,PI,Piauí 2933158,Várzea Nova,BA,Bahia 3556503,Várzea Paulista,SP,São Paulo 2933174,Varzedo,BA,Bahia @@ -5365,13 +5461,14 @@ id,name,state_code,state 3171006,Vazante,MG,Minas Gerais 4322608,Venâncio Aires,RS,Rio Grande do Sul 3205069,Venda Nova do Imigrante,ES,Espírito Santo +2414753,Venha-Ver,RN,Rio Grande do Norte 4128534,Ventania,PR,Paraná 2616001,Venturosa,PE,Pernambuco 5108501,Vera,MT,Mato Grosso -3556602,Vera Cruz,SP,São Paulo -4322707,Vera Cruz,RS,Rio Grande do Sul 2933208,Vera Cruz,BA,Bahia 2414803,Vera Cruz,RN,Rio Grande do Norte +4322707,Vera Cruz,RS,Rio Grande do Sul +3556602,Vera Cruz,SP,São Paulo 4128559,Vera Cruz do Oeste,PR,Paraná 2211506,Vera Mendes,PI,Piauí 4322806,Veranópolis,RS,Rio Grande do Sul @@ -5388,15 +5485,15 @@ id,name,state_code,state 4322855,Vespasiano Corrêa,RS,Rio Grande do Sul 4322905,Viadutos,RS,Rio Grande do Sul 4323002,Viamão,RS,Rio Grande do Sul -2112803,Viana,MA,Maranhão 3205101,Viana,ES,Espírito Santo +2112803,Viana,MA,Maranhão 5222005,Vianópolis,GO,Goiás 2616308,Vicência,PE,Pernambuco 4323101,Vicente Dutra,RS,Rio Grande do Sul 5008404,Vicentina,MS,Mato Grosso do Sul 5222054,Vicentinópolis,GO,Goiás -3171303,Viçosa,MG,Minas Gerais 2709400,Viçosa,AL,Alagoas +3171303,Viçosa,MG,Minas Gerais 2414902,Viçosa,RN,Rio Grande do Norte 2314102,Viçosa 
do Ceará,CE,Ceará 4323200,Victor Graeff,RS,Rio Grande do Sul @@ -5405,6 +5502,7 @@ id,name,state_code,state 3171402,Vieiras,MG,Minas Gerais 2517209,Vieirópolis,PB,Paraíba 1508209,Vigia,PA,Pará +5105507,Vila Bela da Santíssima Trindade,MT,Mato Grosso 5222203,Vila Boa,GO,Goiás 2415008,Vila Flor,RN,Rio Grande do Norte 4323309,Vila Flores,RS,Rio Grande do Sul @@ -5471,101 +5569,4 @@ id,name,state_code,state 3557154,Zacarias,SP,São Paulo 2114007,Zé Doca,MA,Maranhão 4219853,Zortéa,SC,Santa Catarina -1200500,Sena Madureira,AC,Acre -1301159,Careiro da Várzea,AM,Amazonas -1301951,Itamarati,AM,Amazonas -2708204,São Brás,AL,Alagoas -1600253,Itaubal,AP,Amapá -1600212,Cutias,AP,Amapá -5210158,Ipiranga de Goiás,GO,Goiás -5204854,Campo Limpo de Goiás,GO,Goiás -5212253,Lagoa Santa,GO,Goiás -5219803,São Domingos,GO,Goiás -5208152,Gameleira de Goiás,GO,Goiás -3202256,Governador Lindenberg,ES,Espírito Santo -4114351,Manfrinópolis,PR,Paraná -4116950,Nova Esperança do Sudoeste,PR,Paraná -4109757,Ibema,PR,Paraná -4115309,Mariópolis,PR,Paraná -4125209,São Jorge d'Oeste,PR,Paraná -4110904,Itaguajé,PR,Paraná -4116307,Munhoz de Melo,PR,Paraná -4114609,Marechal Cândido Rondon,PR,Paraná -4128625,Alto Paraíso,PR,Paraná -4104600,Capitão Leônidas Marques,PR,Paraná -4120358,Pranchita,PR,Paraná -4106100,Conselheiro Mairinck,PR,Paraná -4123303,Santa Cruz de Monte Castelo,PR,Paraná -4118451,Pato Bragado,PR,Paraná -4119004,Pérola d'Oeste,PR,Paraná -4105201,Cerro Azul,PR,Paraná -4124053,Santa Terezinha de Itaipu,PR,Paraná -4116109,Moreira Sales,PR,Paraná -4111209,Itapejara d'Oeste,PR,Paraná -4121356,Rancho Alegre D'Oeste,PR,Paraná -2513968,São Domingos,PB,Paraíba -1720499,São Valério,TO,Tocantins -1706001,Couto Magalhães,TO,Tocantins -1506500,Santa Izabel do Pará,PA,Pará -1503200,Igarapé-Açu,PA,Pará -1505551,Pau d'Arco,PA,Pará -1507805,Senador José Porfírio,PA,Pará -1502939,Dom Eliseu,PA,Pará -1500909,Augusto Corrêa,PA,Pará -1508001,Tomé-Açu,PA,Pará -1508035,Tracuateua,PA,Pará 
-1505601,Peixe-Boi,PA,Pará -1503309,Igarapé-Miri,PA,Pará -1500503,Almeirim,PA,Pará -1505486,Pacajá,PA,Pará -1506351,Santa Bárbara do Pará,PA,Pará -2400208,Açu,RN,Rio Grande do Norte -2402600,Ceará-Mirim,RN,Rio Grande do Norte -2405306,Januário Cicco (Boa Saúde),RN,Rio Grande do Norte -2401305,Augusto Severo (Campo Grande),RN,Rio Grande do Norte -2412559,São Miguel do Gostoso,RN,Rio Grande do Norte -2414753,Venha-Ver,RN,Rio Grande do Norte -3303807,Paraty,RJ,Rio de Janeiro -3305901,Trajano de Moraes,RJ,Rio de Janeiro -2910057,Dias d'Ávila,BA,Bahia -2922250,Muquém de São Francisco,BA,Bahia -2928505,Santa Teresinha,BA,Bahia -3164431,São Sebastião da Vargem Alegre,MG,Minas Gerais -3147808,Passa-Vinte,MG,Minas Gerais -3150539,Pingo-d'Água,MG,Minas Gerais -3108909,Brazópolis,MG,Minas Gerais -3131802,Itabirinha,MG,Minas Gerais -3145455,Olhos d'Água,MG,Minas Gerais -3165206,São Thomé das Letras,MG,Minas Gerais -2200954,Aroeiras do Itaim,PI,Piauí -2206720,Nazária,PI,Piauí -2209658,São Francisco de Assis do Piauí,PI,Piauí -3547304,Santana de Parnaíba,SP,São Paulo -3550001,São Luiz do Paraitinga,SP,São Paulo -3515707,Ferraz de Vasconcelos,SP,São Paulo -3527603,Luís Antônio,SP,São Paulo -3516101,Florínia,SP,São Paulo -5106158,Nova Bandeirantes,MT,Mato Grosso -5107800,Santo Antônio do Leverger,MT,Mato Grosso -5105622,Mirassol d'Oeste,MT,Mato Grosso -5105234,Lambari D'Oeste,MT,Mato Grosso -5107008,Poxoréu,MT,Mato Grosso -5104526,Ipiranga do Norte,MT,Mato Grosso -5103957,Glória D'Oeste,MT,Mato Grosso -5104542,Itanhangá,MT,Mato Grosso -5103809,Figueirópolis D'Oeste,MT,Mato Grosso -5105507,Vila Bela da Santíssima Trindade,MT,Mato Grosso -5103361,Conquista D'Oeste,MT,Mato Grosso -5003900,Figueirão,MS,Mato Grosso do Sul -4212809,Balneário Piçarras,SC,Santa Catarina -4216909,São Lourenço do Oeste,SC,Santa Catarina -4207684,Ipuaçu,SC,Santa Catarina -4213906,Presidente Castello Branco,SC,Santa Catarina -2613107,São Caetano,PE,Pernambuco -4317103,Sant'Ana do Livramento,RS,Rio Grande do 
Sul -5107792,Santo Antônio do Leste,MT,Mato Grosso -4314548,Pinto Bandeira,RS,Rio Grande do Sul -4220000,Balneário Rincão,SC,Santa Catarina -4212650,Pescaria Brava,SC,Santa Catarina -1504752,Mojuí dos Campos,PA,Pará -5006275,Paraíso das Águas,MS,Mato Grosso do Sul +2700000,Associação dos Municípios Alagoanos,AL,Alagoas diff --git a/data_collection/gazette/settings.py b/data_collection/gazette/settings.py index 63bed5f19..f6aa1be9d 100644 --- a/data_collection/gazette/settings.py +++ b/data_collection/gazette/settings.py @@ -17,6 +17,8 @@ "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0" ) +TEMPLATES_DIR = "templates" + DOWNLOAD_TIMEOUT = 360 FILES_STORE = config("FILES_STORE", default="s3://queridodiariobucket/") @@ -58,3 +60,5 @@ DOWNLOADER_MIDDLEWARES = {"scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware": 610} ZYTE_SMARTPROXY_APIKEY = "" + +COMMANDS_MODULE = "gazette.commands" diff --git a/data_collection/gazette/spiders/al/al_associacao_municipios.py b/data_collection/gazette/spiders/al/al_associacao_municipios.py index 39e2d0c0a..799033aab 100644 --- a/data_collection/gazette/spiders/al/al_associacao_municipios.py +++ b/data_collection/gazette/spiders/al/al_associacao_municipios.py @@ -1,7 +1,15 @@ +import datetime + from gazette.spiders.base.sigpub import SigpubGazetteSpider class AlAssociacaoMunicipiosSpider(SigpubGazetteSpider): name = "al_associacao_municipios" TERRITORY_ID = "2700000" - CALENDAR_URL = "https://www.diariomunicipal.com.br/ama" + CALENDAR_URL = "https://www.diariomunicipal.com.br/ama/" + start_date = datetime.date(2014, 4, 10) + + custom_settings = { + "DOWNLOAD_DELAY": 0.5, + "CONCURRENT_REQUESTS_PER_DOMAIN": 4, + } diff --git a/data_collection/gazette/spiders/al/al_igaci.py b/data_collection/gazette/spiders/al/al_igaci.py new file mode 100644 index 000000000..e3b2ecec2 --- /dev/null +++ b/data_collection/gazette/spiders/al/al_igaci.py @@ -0,0 +1,11 @@ +import datetime as dt + +from gazette.spiders.base.sai 
import SaiGazetteSpider + + +class AlIgaciSpider(SaiGazetteSpider): + TERRITORY_ID = "2703106" + name = "al_igaci" + start_date = dt.date(2015, 7, 17) + allowed_domains = ["igaci.al.gov.br"] + base_url = "https://www.igaci.al.gov.br" diff --git a/data_collection/gazette/spiders/al/al_maceio.py b/data_collection/gazette/spiders/al/al_maceio.py index 483da7569..ec5d236e1 100644 --- a/data_collection/gazette/spiders/al/al_maceio.py +++ b/data_collection/gazette/spiders/al/al_maceio.py @@ -1,3 +1,5 @@ +from datetime import date + from gazette.spiders.base.sigpub import SigpubGazetteSpider @@ -5,3 +7,4 @@ class AlMaceioSpider(SigpubGazetteSpider): name = "al_maceio" TERRITORY_ID = "2704302" CALENDAR_URL = "https://www.diariomunicipal.com.br/maceio" + start_date = date(2018, 8, 9) diff --git a/data_collection/gazette/spiders/ba/ba_alcobaca.py b/data_collection/gazette/spiders/ba/ba_alcobaca.py index 763b834e5..e38ebb5f5 100644 --- a/data_collection/gazette/spiders/ba/ba_alcobaca.py +++ b/data_collection/gazette/spiders/ba/ba_alcobaca.py @@ -1,3 +1,5 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider @@ -5,3 +7,4 @@ class BaAlcobacaSpider(DoemGazetteSpider): TERRITORY_ID = "2900801" name = "ba_alcobaca" state_city_url_part = "ba/alcobaca" + start_date = date(2017, 3, 3) diff --git a/data_collection/gazette/spiders/ba/ba_angical.py b/data_collection/gazette/spiders/ba/ba_angical.py new file mode 100644 index 000000000..fdf7efd76 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_angical.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaAngicalSpider(DoemGazetteSpider): + TERRITORY_ID = "2901403" + name = "ba_angical" + start_date = date(2021, 1, 4) + state_city_url_part = "ba/angical" diff --git a/data_collection/gazette/spiders/ba/ba_antonio_cardoso.py b/data_collection/gazette/spiders/ba/ba_antonio_cardoso_2017.py similarity index 65% rename from 
data_collection/gazette/spiders/ba/ba_antonio_cardoso.py rename to data_collection/gazette/spiders/ba/ba_antonio_cardoso_2017.py index bc9dc05f9..590760d59 100644 --- a/data_collection/gazette/spiders/ba/ba_antonio_cardoso.py +++ b/data_collection/gazette/spiders/ba/ba_antonio_cardoso_2017.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaAntonioCardosoSpider(DoemGazetteSpider): TERRITORY_ID = "2901700" - name = "ba_antonio_cardoso" + name = "ba_antonio_cardoso_2017" state_city_url_part = "ba/antoniocardoso" + start_date = date(2017, 1, 2) diff --git a/data_collection/gazette/spiders/ba/ba_banzae.py b/data_collection/gazette/spiders/ba/ba_banzae.py index 99412624c..b93609cea 100644 --- a/data_collection/gazette/spiders/ba/ba_banzae.py +++ b/data_collection/gazette/spiders/ba/ba_banzae.py @@ -1,3 +1,5 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider @@ -5,3 +7,4 @@ class BaBanzaeSpider(DoemGazetteSpider): TERRITORY_ID = "2902658" name = "ba_banzae" state_city_url_part = "ba/banzae" + start_date = date(2017, 2, 2) diff --git a/data_collection/gazette/spiders/ba/ba_barra_do_choca.py b/data_collection/gazette/spiders/ba/ba_barra_do_choca_2017.py similarity index 64% rename from data_collection/gazette/spiders/ba/ba_barra_do_choca.py rename to data_collection/gazette/spiders/ba/ba_barra_do_choca_2017.py index 5595fcf91..87eaa5d71 100644 --- a/data_collection/gazette/spiders/ba/ba_barra_do_choca.py +++ b/data_collection/gazette/spiders/ba/ba_barra_do_choca_2017.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaBarraDoChocaSpider(DoemGazetteSpider): TERRITORY_ID = "2902906" - name = "ba_barra_do_choca" + name = "ba_barra_do_choca_2017" state_city_url_part = "ba/barradochoca" + start_date = date(2017, 6, 9) diff --git a/data_collection/gazette/spiders/ba/ba_barrocas.py 
b/data_collection/gazette/spiders/ba/ba_barrocas_2017.py similarity index 65% rename from data_collection/gazette/spiders/ba/ba_barrocas.py rename to data_collection/gazette/spiders/ba/ba_barrocas_2017.py index 0e39290c3..ced0c8562 100644 --- a/data_collection/gazette/spiders/ba/ba_barrocas.py +++ b/data_collection/gazette/spiders/ba/ba_barrocas_2017.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaBarrocasSpider(DoemGazetteSpider): TERRITORY_ID = "2903276" - name = "ba_barrocas" + name = "ba_barrocas_2017" state_city_url_part = "ba/barrocas" + start_date = date(2017, 1, 2) diff --git a/data_collection/gazette/spiders/ba/ba_brotas_de_macaubas.py b/data_collection/gazette/spiders/ba/ba_brotas_de_macaubas.py index a727e0004..9efdb8a69 100644 --- a/data_collection/gazette/spiders/ba/ba_brotas_de_macaubas.py +++ b/data_collection/gazette/spiders/ba/ba_brotas_de_macaubas.py @@ -1,3 +1,5 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider @@ -5,3 +7,4 @@ class BaBrotasDeMacaubasSpider(DoemGazetteSpider): TERRITORY_ID = "2904506" name = "ba_brotas_de_macaubas" state_city_url_part = "ba/brotasdemacaubas" + start_date = date(2019, 8, 13) diff --git a/data_collection/gazette/spiders/ba/ba_cachoeira.py b/data_collection/gazette/spiders/ba/ba_cachoeira_2017.py similarity index 65% rename from data_collection/gazette/spiders/ba/ba_cachoeira.py rename to data_collection/gazette/spiders/ba/ba_cachoeira_2017.py index 735600d39..d785ed9dc 100644 --- a/data_collection/gazette/spiders/ba/ba_cachoeira.py +++ b/data_collection/gazette/spiders/ba/ba_cachoeira_2017.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaCachoeiraSpider(DoemGazetteSpider): TERRITORY_ID = "2904902" - name = "ba_cachoeira" + name = "ba_cachoeira_2017" state_city_url_part = "ba/cachoeira" + start_date = date(2017, 1, 3) diff --git 
a/data_collection/gazette/spiders/ba/ba_cacule.py b/data_collection/gazette/spiders/ba/ba_cacule_2014.py similarity index 65% rename from data_collection/gazette/spiders/ba/ba_cacule.py rename to data_collection/gazette/spiders/ba/ba_cacule_2014.py index ba2645106..ca5a8e870 100644 --- a/data_collection/gazette/spiders/ba/ba_cacule.py +++ b/data_collection/gazette/spiders/ba/ba_cacule_2014.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaCaculeSpider(DoemGazetteSpider): TERRITORY_ID = "2905008" - name = "ba_cacule" + name = "ba_cacule_2014" state_city_url_part = "ba/cacule" + start_date = date(2014, 1, 2) diff --git a/data_collection/gazette/spiders/ba/ba_caetite.py b/data_collection/gazette/spiders/ba/ba_caetite.py new file mode 100644 index 000000000..abe940903 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_caetite.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaCaetiteSpider(DoemGazetteSpider): + TERRITORY_ID = "2905206" + name = "ba_caetite" + start_date = date(2021, 4, 27) + state_city_url_part = "ba/caetite" diff --git a/data_collection/gazette/spiders/ba/ba_camamu.py b/data_collection/gazette/spiders/ba/ba_camamu_2017.py similarity index 65% rename from data_collection/gazette/spiders/ba/ba_camamu.py rename to data_collection/gazette/spiders/ba/ba_camamu_2017.py index 7847d5f84..dfdb025a7 100644 --- a/data_collection/gazette/spiders/ba/ba_camamu.py +++ b/data_collection/gazette/spiders/ba/ba_camamu_2017.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaCamamuSpider(DoemGazetteSpider): TERRITORY_ID = "2905800" - name = "ba_camamu" + name = "ba_camamu_2017" state_city_url_part = "ba/camamu" + start_date = date(2017, 1, 3) diff --git a/data_collection/gazette/spiders/ba/ba_campo_alegre_de_lourdes.py 
b/data_collection/gazette/spiders/ba/ba_campo_alegre_de_lourdes.py new file mode 100644 index 000000000..ad118562d --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_campo_alegre_de_lourdes.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaCampoAlegreDeLourdesSpider(DoemGazetteSpider): + TERRITORY_ID = "2905909" + name = "ba_campo_alegre_de_lourdes" + start_date = date(2020, 11, 30) # First edition published on 30/11/2020 + state_city_url_part = "ba/campoalegredelourdes" diff --git a/data_collection/gazette/spiders/ba/ba_catolandia.py b/data_collection/gazette/spiders/ba/ba_catolandia_2015.py similarity index 65% rename from data_collection/gazette/spiders/ba/ba_catolandia.py rename to data_collection/gazette/spiders/ba/ba_catolandia_2015.py index b65227b92..4599f2956 100644 --- a/data_collection/gazette/spiders/ba/ba_catolandia.py +++ b/data_collection/gazette/spiders/ba/ba_catolandia_2015.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaCatolandiaSpider(DoemGazetteSpider): TERRITORY_ID = "2907400" - name = "ba_catolandia" + name = "ba_catolandia_2015" state_city_url_part = "ba/catolandia" + start_date = date(2015, 5, 6) diff --git a/data_collection/gazette/spiders/ba/ba_catu.py b/data_collection/gazette/spiders/ba/ba_catu_2014.py similarity index 64% rename from data_collection/gazette/spiders/ba/ba_catu.py rename to data_collection/gazette/spiders/ba/ba_catu_2014.py index 7d4fe0817..8b8e97ac1 100644 --- a/data_collection/gazette/spiders/ba/ba_catu.py +++ b/data_collection/gazette/spiders/ba/ba_catu_2014.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaCatuSpider(DoemGazetteSpider): TERRITORY_ID = "2907509" - name = "ba_catu" + name = "ba_catu_2014" state_city_url_part = "ba/catu" + start_date = date(2014, 7, 17) diff --git
a/data_collection/gazette/spiders/ba/ba_cicero_dantas.py b/data_collection/gazette/spiders/ba/ba_cicero_dantas.py index 73da561d5..27b1091ea 100644 --- a/data_collection/gazette/spiders/ba/ba_cicero_dantas.py +++ b/data_collection/gazette/spiders/ba/ba_cicero_dantas.py @@ -1,3 +1,5 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider @@ -5,3 +7,4 @@ class BaCiceroDantasSpider(DoemGazetteSpider): TERRITORY_ID = "2907806" name = "ba_cicero_dantas" state_city_url_part = "ba/cicerodantas" + start_date = date(2012, 1, 3) diff --git a/data_collection/gazette/spiders/ba/ba_cipo.py b/data_collection/gazette/spiders/ba/ba_cipo.py new file mode 100644 index 000000000..9146e6e79 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_cipo.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaCipoSpider(DoemGazetteSpider): + TERRITORY_ID = "2907905" + name = "ba_cipo" + start_date = date(2021, 1, 4) + state_city_url_part = "ba/cipo" diff --git a/data_collection/gazette/spiders/ba/ba_correntina.py b/data_collection/gazette/spiders/ba/ba_correntina.py new file mode 100644 index 000000000..2fee40988 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_correntina.py @@ -0,0 +1,11 @@ +import datetime as dt + +from gazette.spiders.base.sai import SaiGazetteSpider + + +class BaCorrentinaSpider(SaiGazetteSpider): + TERRITORY_ID = "2909307" + name = "ba_correntina" + start_date = dt.date(2007, 11, 30) + allowed_domains = ["sai.io.org.br"] + base_url = "https://sai.io.org.br/ba/correntina" diff --git a/data_collection/gazette/spiders/ba/ba_cotegipe.py b/data_collection/gazette/spiders/ba/ba_cotegipe.py new file mode 100644 index 000000000..5c6bdf803 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_cotegipe.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaCotegipeSpider(DoemGazetteSpider): + TERRITORY_ID 
= "2909406" + name = "ba_cotegipe" + start_date = date(2023, 1, 5) + state_city_url_part = "ba/cotegipe" diff --git a/data_collection/gazette/spiders/ba/ba_cristopolis.py b/data_collection/gazette/spiders/ba/ba_cristopolis.py new file mode 100644 index 000000000..f9496c898 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_cristopolis.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaCristopolisSpider(DoemGazetteSpider): + TERRITORY_ID = "2909703" + name = "ba_cristopolis" + start_date = date(2021, 1, 12) + state_city_url_part = "ba/cristopolis" diff --git a/data_collection/gazette/spiders/ba/ba_cruz_das_almas.py b/data_collection/gazette/spiders/ba/ba_cruz_das_almas.py new file mode 100644 index 000000000..40f4297ea --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_cruz_das_almas.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaCruzDasAlmasSpider(DoemGazetteSpider): + TERRITORY_ID = "2909802" + name = "ba_cruz_das_almas" + start_date = date(2021, 4, 1) + state_city_url_part = "ba/cruzdasalmas" diff --git a/data_collection/gazette/spiders/ba/ba_esplanada.py b/data_collection/gazette/spiders/ba/ba_esplanada.py new file mode 100644 index 000000000..5c0089396 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_esplanada.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaEsplanadaSpider(DoemGazetteSpider): + TERRITORY_ID = "2910602" + name = "ba_esplanada" + start_date = date(2021, 1, 4) + state_city_url_part = "ba/esplanada" diff --git a/data_collection/gazette/spiders/ba/ba_floresta_azul.py b/data_collection/gazette/spiders/ba/ba_floresta_azul_2017.py similarity index 64% rename from data_collection/gazette/spiders/ba/ba_floresta_azul.py rename to data_collection/gazette/spiders/ba/ba_floresta_azul_2017.py index addd37ab9..ecb507849 100644 --- 
a/data_collection/gazette/spiders/ba/ba_floresta_azul.py +++ b/data_collection/gazette/spiders/ba/ba_floresta_azul_2017.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaFlorestaAzulSpider(DoemGazetteSpider): TERRITORY_ID = "2911006" - name = "ba_floresta_azul" + name = "ba_floresta_azul_2017" state_city_url_part = "ba/florestaazul" + start_date = date(2017, 1, 2) diff --git a/data_collection/gazette/spiders/ba/ba_formosa_do_rio_preto.py b/data_collection/gazette/spiders/ba/ba_formosa_do_rio_preto.py new file mode 100644 index 000000000..d303a1ac7 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_formosa_do_rio_preto.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaFormosaDoRioPretoSpider(DoemGazetteSpider): + TERRITORY_ID = "2911105" + name = "ba_formosa_do_rio_preto" + start_date = date(2021, 1, 4) + state_city_url_part = "ba/formosadoriopreto" diff --git a/data_collection/gazette/spiders/ba/ba_inhambupe.py b/data_collection/gazette/spiders/ba/ba_inhambupe.py index 70ef21aee..93d4a350c 100644 --- a/data_collection/gazette/spiders/ba/ba_inhambupe.py +++ b/data_collection/gazette/spiders/ba/ba_inhambupe.py @@ -1,3 +1,5 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider @@ -5,3 +7,4 @@ class BaInhambupeSpider(DoemGazetteSpider): TERRITORY_ID = "2913705" name = "ba_inhambupe" state_city_url_part = "ba/inhambupe" + start_date = date(2013, 1, 2) diff --git a/data_collection/gazette/spiders/ba/ba_ipiau.py b/data_collection/gazette/spiders/ba/ba_ipiau.py index d1cbbedef..8e0739cd9 100644 --- a/data_collection/gazette/spiders/ba/ba_ipiau.py +++ b/data_collection/gazette/spiders/ba/ba_ipiau.py @@ -1,3 +1,5 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider @@ -5,3 +7,4 @@ class BaIpiauSpider(DoemGazetteSpider): TERRITORY_ID = "2913903" name = "ba_ipiau" 
state_city_url_part = "ba/ipiau" + start_date = date(2016, 5, 9) diff --git a/data_collection/gazette/spiders/ba/ba_irara.py b/data_collection/gazette/spiders/ba/ba_irara.py index 9582752c0..df93e118f 100644 --- a/data_collection/gazette/spiders/ba/ba_irara.py +++ b/data_collection/gazette/spiders/ba/ba_irara.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaIraraSpider(DoemGazetteSpider): TERRITORY_ID = "2914505" name = "ba_irara" + start_date = date(2018, 1, 3) state_city_url_part = "ba/irara" diff --git a/data_collection/gazette/spiders/ba/ba_itaberaba.py b/data_collection/gazette/spiders/ba/ba_itaberaba.py new file mode 100644 index 000000000..f0208e088 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_itaberaba.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaItaberabaSpider(DoemGazetteSpider): + TERRITORY_ID = "2914703" + name = "ba_itaberaba" + start_date = date(2022, 7, 4) + state_city_url_part = "ba/itaberaba" diff --git a/data_collection/gazette/spiders/ba/ba_itamaraju.py b/data_collection/gazette/spiders/ba/ba_itamaraju.py new file mode 100644 index 000000000..a4b486838 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_itamaraju.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaItamarajuSpider(DoemGazetteSpider): + TERRITORY_ID = "2915601" + name = "ba_itamaraju" + start_date = date(2008, 3, 28) + state_city_url_part = "ba/itamaraju" diff --git a/data_collection/gazette/spiders/ba/ba_itapicuru.py b/data_collection/gazette/spiders/ba/ba_itapicuru.py new file mode 100644 index 000000000..0afd01517 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_itapicuru.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaItapicuruSpider(DoemGazetteSpider): + TERRITORY_ID = "2916500" 
+ name = "ba_itapicuru" + start_date = date(2021, 1, 4) + state_city_url_part = "ba/itapicuru" diff --git a/data_collection/gazette/spiders/ba/ba_ituacu.py b/data_collection/gazette/spiders/ba/ba_ituacu.py index 5cd86429c..5e97055d8 100644 --- a/data_collection/gazette/spiders/ba/ba_ituacu.py +++ b/data_collection/gazette/spiders/ba/ba_ituacu.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaItuacuSpider(DoemGazetteSpider): TERRITORY_ID = "2917201" name = "ba_ituacu" + start_date = date(2018, 1, 2) state_city_url_part = "ba/ituacu" diff --git a/data_collection/gazette/spiders/ba/ba_jaborandi.py b/data_collection/gazette/spiders/ba/ba_jaborandi.py new file mode 100644 index 000000000..ebd6985a9 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_jaborandi.py @@ -0,0 +1,11 @@ +import datetime as dt + +from gazette.spiders.base.sai import SaiGazetteSpider + + +class BaJaborandiSpider(SaiGazetteSpider): + TERRITORY_ID = "2917359" + name = "ba_jaborandi" + start_date = dt.date(2022, 3, 4) + allowed_domains = ["sai.io.org.br"] + base_url = "https://sai.io.org.br/ba/jaborandi" diff --git a/data_collection/gazette/spiders/ba/ba_jaguaquara.py b/data_collection/gazette/spiders/ba/ba_jaguaquara.py new file mode 100644 index 000000000..1bcf98aca --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_jaguaquara.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaJaguaquaraSpider(DoemGazetteSpider): + TERRITORY_ID = "2917607" + name = "ba_jaguaquara" + start_date = date(2021, 4, 5) + state_city_url_part = "ba/jaguaquara" diff --git a/data_collection/gazette/spiders/ba/ba_jeremoabo.py b/data_collection/gazette/spiders/ba/ba_jeremoabo.py new file mode 100644 index 000000000..036ac6bef --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_jeremoabo.py @@ -0,0 +1,11 @@ +import datetime as dt + +from gazette.spiders.base.sai import 
SaiGazetteSpider + + +class BaJeremoaboSpider(SaiGazetteSpider): + TERRITORY_ID = "2918100" + name = "ba_jeremoabo" + start_date = dt.date(2016, 4, 28) + allowed_domains = ["jeremoabo.ba.gov.br"] + base_url = "https://www.jeremoabo.ba.gov.br" diff --git a/data_collection/gazette/spiders/ba/ba_laje.py b/data_collection/gazette/spiders/ba/ba_laje.py index fcce7bdc8..6d1bf3d83 100644 --- a/data_collection/gazette/spiders/ba/ba_laje.py +++ b/data_collection/gazette/spiders/ba/ba_laje.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaLajeSpider(DoemGazetteSpider): - TERRITORY_ID = "2918902" + TERRITORY_ID = "2918803" name = "ba_laje" + start_date = date(2020, 1, 8) state_city_url_part = "ba/laje" diff --git a/data_collection/gazette/spiders/ba/ba_lajedao.py b/data_collection/gazette/spiders/ba/ba_lajedao.py new file mode 100644 index 000000000..9ad0409f9 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_lajedao.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaLajedaoSpider(DoemGazetteSpider): + TERRITORY_ID = "2918902" + name = "ba_lajedao" + start_date = date(2021, 4, 14) + state_city_url_part = "ba/lajedao" diff --git a/data_collection/gazette/spiders/ba/ba_lauro_de_freitas.py b/data_collection/gazette/spiders/ba/ba_lauro_de_freitas.py new file mode 100644 index 000000000..f4752ed43 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_lauro_de_freitas.py @@ -0,0 +1,11 @@ +import datetime as dt + +from gazette.spiders.base.sai import SaiGazetteSpider + + +class BaLauroDeFreitasSpider(SaiGazetteSpider): + TERRITORY_ID = "2919207" + name = "ba_lauro_de_freitas" + start_date = dt.date(2013, 7, 31) + allowed_domains = ["sai.io.org.br"] + base_url = "https://sai.io.org.br/ba/laurodefreitas" diff --git a/data_collection/gazette/spiders/ba/ba_luis_eduardo_magalhaes.py b/data_collection/gazette/spiders/ba/ba_luis_eduardo_magalhaes.py 
new file mode 100644 index 000000000..d01a081a4 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_luis_eduardo_magalhaes.py @@ -0,0 +1,11 @@ +import datetime as dt + +from gazette.spiders.base.sai import SaiGazetteSpider + + +class BaLuisEduardoMagalhaesSpider(SaiGazetteSpider): + TERRITORY_ID = "2919553" + name = "ba_luis_eduardo_magalhaes" + start_date = dt.date(2017, 1, 4) + allowed_domains = ["sai.io.org.br"] + base_url = "https://sai.io.org.br/ba/luiseduardomagalhaes" diff --git a/data_collection/gazette/spiders/ba/ba_macajuba.py b/data_collection/gazette/spiders/ba/ba_macajuba.py index f76381a31..2f6a33024 100644 --- a/data_collection/gazette/spiders/ba/ba_macajuba.py +++ b/data_collection/gazette/spiders/ba/ba_macajuba.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaMacajubaSpider(DoemGazetteSpider): TERRITORY_ID = "2919603" name = "ba_macajuba" + start_date = date(2014, 3, 17) state_city_url_part = "ba/macajuba" diff --git a/data_collection/gazette/spiders/ba/ba_maragogipe.py b/data_collection/gazette/spiders/ba/ba_maragogipe.py new file mode 100644 index 000000000..396f4a832 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_maragogipe.py @@ -0,0 +1,11 @@ +import datetime as dt + +from gazette.spiders.base.sai import SaiGazetteSpider + + +class BaMaragogipeSpider(SaiGazetteSpider): + TERRITORY_ID = "2704500" + name = "ba_maragogipe" + start_date = dt.date(2011, 2, 2) + allowed_domains = ["sai.io.org.br"] + base_url = "https://sai.io.org.br/ba/maragojipe" diff --git a/data_collection/gazette/spiders/ba/ba_medeiros_neto.py b/data_collection/gazette/spiders/ba/ba_medeiros_neto_2018.py similarity index 64% rename from data_collection/gazette/spiders/ba/ba_medeiros_neto.py rename to data_collection/gazette/spiders/ba/ba_medeiros_neto_2018.py index f842f706f..63c823338 100644 --- a/data_collection/gazette/spiders/ba/ba_medeiros_neto.py +++ 
b/data_collection/gazette/spiders/ba/ba_medeiros_neto_2018.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaMedeirosNetoSpider(DoemGazetteSpider): TERRITORY_ID = "2921104" - name = "ba_medeiros_neto" + name = "ba_medeiros_neto_2018" + start_date = date(2018, 1, 9) state_city_url_part = "ba/medeirosneto" diff --git a/data_collection/gazette/spiders/ba/ba_monte_santo.py b/data_collection/gazette/spiders/ba/ba_monte_santo.py new file mode 100644 index 000000000..5725be1d2 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_monte_santo.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaMonteSantoSpider(DoemGazetteSpider): + TERRITORY_ID = "2921500" + name = "ba_monte_santo" + start_date = date(2021, 1, 2) + state_city_url_part = "ba/montesanto" diff --git a/data_collection/gazette/spiders/ba/ba_morro_do_chapeu.py b/data_collection/gazette/spiders/ba/ba_morro_do_chapeu.py new file mode 100644 index 000000000..e0803c95b --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_morro_do_chapeu.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaMorroDoChapeuSpider(DoemGazetteSpider): + TERRITORY_ID = "2921708" + name = "ba_morro_do_chapeu" + start_date = date(2021, 1, 6) + state_city_url_part = "ba/morrodochapeu" diff --git a/data_collection/gazette/spiders/ba/ba_mucuri.py b/data_collection/gazette/spiders/ba/ba_mucuri.py index 69f0da683..22decdefa 100644 --- a/data_collection/gazette/spiders/ba/ba_mucuri.py +++ b/data_collection/gazette/spiders/ba/ba_mucuri.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaMucuriSpider(DoemGazetteSpider): TERRITORY_ID = "2922003" name = "ba_mucuri" + start_date = date(2018, 1, 3) state_city_url_part = "ba/mucuri" diff --git 
a/data_collection/gazette/spiders/ba/ba_riachao_das_neves.py b/data_collection/gazette/spiders/ba/ba_riachao_das_neves.py new file mode 100644 index 000000000..953e64c17 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_riachao_das_neves.py @@ -0,0 +1,11 @@ +import datetime as dt + +from gazette.spiders.base.sai import SaiGazetteSpider + + +class BaRiachaoDasNevesSpider(SaiGazetteSpider): + TERRITORY_ID = "2926202" + name = "ba_riachao_das_neves" + start_date = dt.date(2010, 2, 4) + allowed_domains = ["sai.io.org.br"] + base_url = "https://sai.io.org.br/ba/riachaodasneves" diff --git a/data_collection/gazette/spiders/ba/ba_ribeira_do_pombal.py b/data_collection/gazette/spiders/ba/ba_ribeira_do_pombal_2014.py similarity index 64% rename from data_collection/gazette/spiders/ba/ba_ribeira_do_pombal.py rename to data_collection/gazette/spiders/ba/ba_ribeira_do_pombal_2014.py index 32696b0aa..0fa97c8ce 100644 --- a/data_collection/gazette/spiders/ba/ba_ribeira_do_pombal.py +++ b/data_collection/gazette/spiders/ba/ba_ribeira_do_pombal_2014.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaRibeiraDoPombalSpider(DoemGazetteSpider): TERRITORY_ID = "2926608" - name = "ba_ribeira_do_pombal" + name = "ba_ribeira_do_pombal_2014" state_city_url_part = "ba/ribeiradopombal" + start_date = date(2014, 1, 16) diff --git a/data_collection/gazette/spiders/ba/ba_salvador.py b/data_collection/gazette/spiders/ba/ba_salvador.py index 8d3b3eb25..856b0d551 100644 --- a/data_collection/gazette/spiders/ba/ba_salvador.py +++ b/data_collection/gazette/spiders/ba/ba_salvador.py @@ -8,6 +8,8 @@ class BaSalvadorSpider(BaseGazetteSpider): + zyte_smartproxy_enabled = True + TERRITORY_ID = "2927408" name = "ba_salvador" allowed_domains = ["salvador.ba.gov.br"] diff --git a/data_collection/gazette/spiders/ba/ba_santa_cruz_cabralia.py b/data_collection/gazette/spiders/ba/ba_santa_cruz_cabralia.py index 0146fc51d..223f8b838 100644 --- 
a/data_collection/gazette/spiders/ba/ba_santa_cruz_cabralia.py +++ b/data_collection/gazette/spiders/ba/ba_santa_cruz_cabralia.py @@ -1,3 +1,5 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider @@ -5,3 +7,4 @@ class BaSantaCruzCabraliaSpider(DoemGazetteSpider): TERRITORY_ID = "2927705" name = "ba_santa_cruz_cabralia" state_city_url_part = "ba/santacruzcabralia" + start_date = date(2017, 1, 9) diff --git a/data_collection/gazette/spiders/ba/ba_santa_luzia.py b/data_collection/gazette/spiders/ba/ba_santa_luzia.py new file mode 100644 index 000000000..9ea5304f1 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_santa_luzia.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaSantaLuziaSpider(DoemGazetteSpider): + TERRITORY_ID = "2928059" + name = "ba_santa_luzia" + start_date = date(2021, 1, 4) + state_city_url_part = "ba/santaluzia" diff --git a/data_collection/gazette/spiders/ba/ba_santa_rita_de_cassia.py b/data_collection/gazette/spiders/ba/ba_santa_rita_de_cassia.py new file mode 100644 index 000000000..3cca849d9 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_santa_rita_de_cassia.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaSantaRitaDeCassiaSpider(DoemGazetteSpider): + TERRITORY_ID = "2928406" + name = "ba_santa_rita_de_cassia" + start_date = date(2021, 1, 4) + state_city_url_part = "ba/santaritadecassia" diff --git a/data_collection/gazette/spiders/ba/ba_santo_amaro.py b/data_collection/gazette/spiders/ba/ba_santo_amaro_2012.py similarity index 64% rename from data_collection/gazette/spiders/ba/ba_santo_amaro.py rename to data_collection/gazette/spiders/ba/ba_santo_amaro_2012.py index 27f8faa81..8ea872097 100644 --- a/data_collection/gazette/spiders/ba/ba_santo_amaro.py +++ b/data_collection/gazette/spiders/ba/ba_santo_amaro_2012.py @@ -1,7 +1,10 @@ +from datetime import 
date + from gazette.spiders.base.doem import DoemGazetteSpider class BaSantoAmaroSpider(DoemGazetteSpider): TERRITORY_ID = "2928604" - name = "ba_santo_amaro" + name = "ba_santo_amaro_2012" state_city_url_part = "ba/santoamaro" + start_date = date(2012, 12, 6) diff --git a/data_collection/gazette/spiders/ba/ba_satiro_dias.py b/data_collection/gazette/spiders/ba/ba_satiro_dias.py new file mode 100644 index 000000000..1d0dd0704 --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_satiro_dias.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaSatiroDiasSpider(DoemGazetteSpider): + TERRITORY_ID = "2929701" + name = "ba_satiro_dias" + start_date = date(2021, 3, 30) + state_city_url_part = "ba/satirodias" diff --git a/data_collection/gazette/spiders/ba/ba_sento_se.py b/data_collection/gazette/spiders/ba/ba_sento_se.py index c9cbbff5e..a5f017a1e 100644 --- a/data_collection/gazette/spiders/ba/ba_sento_se.py +++ b/data_collection/gazette/spiders/ba/ba_sento_se.py @@ -1,3 +1,5 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider @@ -5,3 +7,4 @@ class BaSentoSeSpider(DoemGazetteSpider): TERRITORY_ID = "2930204" name = "ba_sento_se" state_city_url_part = "ba/sentose" + start_date = date(2017, 1, 2) diff --git a/data_collection/gazette/spiders/ba/ba_tabocas_do_brejo_velho.py b/data_collection/gazette/spiders/ba/ba_tabocas_do_brejo_velho_2013.py similarity index 64% rename from data_collection/gazette/spiders/ba/ba_tabocas_do_brejo_velho.py rename to data_collection/gazette/spiders/ba/ba_tabocas_do_brejo_velho_2013.py index ba21030b2..bf01325a4 100644 --- a/data_collection/gazette/spiders/ba/ba_tabocas_do_brejo_velho.py +++ b/data_collection/gazette/spiders/ba/ba_tabocas_do_brejo_velho_2013.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaTabocasDoBrejoVelhoSpider(DoemGazetteSpider): TERRITORY_ID = "2930907" 
- name = "ba_tabocas_do_brejo_velho" + name = "ba_tabocas_do_brejo_velho_2013" state_city_url_part = "ba/tabocasdobrejovelho" + start_date = date(2013, 1, 4) diff --git a/data_collection/gazette/spiders/ba/ba_tapiramuta.py b/data_collection/gazette/spiders/ba/ba_tapiramuta.py new file mode 100644 index 000000000..91f26c42b --- /dev/null +++ b/data_collection/gazette/spiders/ba/ba_tapiramuta.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class BaTapiramutaSpider(DoemGazetteSpider): + TERRITORY_ID = "2931301" + name = "ba_tapiramuta" + start_date = date(2021, 1, 4) + state_city_url_part = "ba/tapiramuta" diff --git a/data_collection/gazette/spiders/ba/ba_teixeira_de_freitas.py b/data_collection/gazette/spiders/ba/ba_teixeira_de_freitas_2021.py similarity index 64% rename from data_collection/gazette/spiders/ba/ba_teixeira_de_freitas.py rename to data_collection/gazette/spiders/ba/ba_teixeira_de_freitas_2021.py index 28dfb297a..bc58de2f2 100644 --- a/data_collection/gazette/spiders/ba/ba_teixeira_de_freitas.py +++ b/data_collection/gazette/spiders/ba/ba_teixeira_de_freitas_2021.py @@ -1,7 +1,10 @@ +from datetime import date + from gazette.spiders.base.doem import DoemGazetteSpider class BaTeixeiraDeFreitasSpider(DoemGazetteSpider): TERRITORY_ID = "2931350" - name = "ba_teixeira_de_freitas" + name = "ba_teixeira_de_freitas_2021" state_city_url_part = "ba/teixeiradefreitas" + start_date = date(2021, 3, 2) diff --git a/data_collection/gazette/spiders/base/__init__.py b/data_collection/gazette/spiders/base/__init__.py index 0667b3aab..4363cb282 100644 --- a/data_collection/gazette/spiders/base/__init__.py +++ b/data_collection/gazette/spiders/base/__init__.py @@ -14,35 +14,29 @@ class BaseGazetteSpider(scrapy.Spider): # being blocked based on our location. 
zyte_smartproxy_enabled = False - def __init__(self, start_date=None, end_date=None, *args, **kwargs): + def __init__(self, start_date="", end_date="", *args, **kwargs): super(BaseGazetteSpider, self).__init__(*args, **kwargs) if not hasattr(self, "TERRITORY_ID"): raise NotConfigured("Please set a value for `TERRITORY_ID`") - if start_date is not None: + if start_date: try: self.start_date = datetime.strptime(start_date, "%Y-%m-%d").date() - self.logger.info(f"Collecting gazettes from {self.start_date}") except ValueError: self.logger.exception( f"Unable to parse {start_date}. Use %Y-%m-%d date format." ) raise - else: - self.logger.info("Collecting all gazettes available from the beginning") - if end_date is not None: + self.end_date = datetime.today().date() + if end_date: try: self.end_date = datetime.strptime(end_date, "%Y-%m-%d").date() - self.logger.info(f"Collecting gazettes until {self.end_date}") except ValueError: self.logger.exception( f"Unable to parse {end_date}. Use %Y-%m-%d date format."
) raise - elif hasattr(self, "end_date"): - self.logger.info(f"Collecting gazettes until {self.end_date}") - else: - self.end_date = datetime.today().date() - self.logger.info("Collecting all gazettes available until today") + + self.logger.info(f"Collecting data from {self.start_date} to {self.end_date}.") diff --git a/data_collection/gazette/spiders/base/diariooficialbr.py b/data_collection/gazette/spiders/base/diariooficialbr.py new file mode 100644 index 000000000..f9a2a4aa9 --- /dev/null +++ b/data_collection/gazette/spiders/base/diariooficialbr.py @@ -0,0 +1,42 @@ +import dateparser +import scrapy + +from gazette.items import Gazette +from gazette.spiders.base import BaseGazetteSpider + + +class BaseDiarioOficialBRSpider(BaseGazetteSpider): + def start_requests(self): + url = f"{self.BASE_URL}/pesquisa/search?initDate={self.start_date}&endDate={self.end_date}" + yield scrapy.Request(url) + + def parse(self, response): + editions_list = response.xpath('//div[contains(@class, "card-downloads")]') + for edition in editions_list: + edition_date_selector = edition.xpath( + './/div[contains(text(), "Publicado")]/text()' + ).get() + edition_date = dateparser.parse( + edition_date_selector.split("dia")[-1], languages=["pt"] + ).date() + + edition_number_raw = edition.xpath( + './/span[contains(text(), "Edição")]/text()' + ) + edition_number = edition_number_raw.re_first(r"nº\s+(\d+)") + is_extra_edition = "extra" in edition_number_raw.get().lower() + edition_url = edition.xpath( + './/a[contains(@href, "/download")]/@href' + ).get() + + yield Gazette( + date=edition_date, + edition_number=edition_number, + file_urls=[edition_url], + is_extra_edition=is_extra_edition, + power="executive", + ) + + next_page_url = response.xpath('//a[@aria-label="pagination.next"]/@href').get() + if next_page_url: + yield scrapy.Request(next_page_url) diff --git a/data_collection/gazette/spiders/base/dionet.py b/data_collection/gazette/spiders/base/dionet.py new file mode 100644 index
000000000..35e5c55db --- /dev/null +++ b/data_collection/gazette/spiders/base/dionet.py @@ -0,0 +1,48 @@ +from dateutil.rrule import DAILY, rrule +from scrapy.http import Request + +from gazette.items import Gazette +from gazette.spiders.base import BaseGazetteSpider + + +class DionetGazetteSpider(BaseGazetteSpider): + """ + Base Spider for all cities using IONEWS' DIONET product + + In addition to the normal spider attributes, for this base spider + one other class attribute may be necessary: + - url_subtheme + """ + + url_subtheme = "" + + def start_requests(self): + for date_ in rrule(freq=DAILY, dtstart=self.start_date, until=self.end_date): + day = str(date_.day).zfill(2) + month = str(date_.month).zfill(2) + + api_path = f"/apifront/portal/edicoes/edicoes_from_data/{date_.year}-{month}-{day}.json" + url = "".join([self.BASE_URL, api_path, self.url_subtheme]) + + yield Request(url=url, cb_kwargs={"gazette_date": date_.date()}) + + def parse(self, response, gazette_date): + gazette_data = response.json() + if gazette_data["erro"]: + return + + items = gazette_data.get("itens", []) + for item in items: + gazette_id = item["id"] + gazette_url = f"{self.BASE_URL}/portal/edicoes/download/{gazette_id}" + + is_extra_edition = item["suplemento"] == 1 + edition_number = item["numero"] + + yield Gazette( + date=gazette_date, + file_urls=[gazette_url], + is_extra_edition=is_extra_edition, + edition_number=edition_number, + power="executive", + ) diff --git a/data_collection/gazette/spiders/base/municipioonline.py b/data_collection/gazette/spiders/base/municipioonline.py new file mode 100644 index 000000000..82209cb74 --- /dev/null +++ b/data_collection/gazette/spiders/base/municipioonline.py @@ -0,0 +1,85 @@ +from collections import deque +from datetime import datetime, timedelta +from itertools import islice + +import scrapy +from dateutil.rrule import YEARLY, rrule + +from gazette.items import Gazette +from gazette.spiders.base import BaseGazetteSpider + + +class
BaseMunicipioOnlineSpider(BaseGazetteSpider): + custom_settings = { + "DOWNLOAD_DELAY": 1, + "CONCURRENT_REQUESTS_PER_DOMAIN": 4, + } + + allowed_domains = ["municipioonline.com.br"] + + def start_requests(self): + url = f"https://www.municipioonline.com.br/{self.url_uf}/prefeitura/{self.url_city}/cidadao/diariooficial" + yield scrapy.Request(url, callback=self.date_filter_request) + + def date_filter_request(self, response): + """ + Creates the requests for the date filter. + + In theory, the system allows a single request for any start_date and + end_date. However, some municipalities with longer historical coverage + return a response with status code 500 when the time interval is too large. + + Therefore, in this implementation, the maximum time interval for a single + request is one year. Beyond that, more than one request is made. + """ + dates_of_interest = [ + dt + for dt in rrule(freq=YEARLY, dtstart=self.start_date, until=self.end_date) + ] + + if self.end_date not in dates_of_interest: + dates_of_interest.append(self.end_date) + + for filter_start, filter_end in self._sliding_window(dates_of_interest, 2): + if dates_of_interest[-1] != filter_end: + filter_end -= timedelta(days=1) + + filter_start = filter_start.strftime("%d/%m/%Y") + filter_end = filter_end.strftime("%d/%m/%Y") + + formdata = { + "__EVENTTARGET": "ctl00$body$btnBuscaPalavrachave", + "ctl00$body$txtDtPeriodo": f"{filter_start}-{filter_end}", + } + + yield scrapy.FormRequest.from_response(response, formdata=formdata) + + def _sliding_window(self, iterable, n): + it = iter(iterable) + window = deque(islice(it, n - 1), maxlen=n) + for x in it: + window.append(x) + yield tuple(window) + + def parse(self, response): + editions_list = response.css("div.panel") + + for edition in editions_list: + metadata = edition.css("div.panel-title ::text") + edition_number = metadata.re_first(r"(\d+)/") + raw_date = metadata.re_first(r"\d{2}/\d{2}/\d{4}") + edition_date =
datetime.strptime(raw_date, "%d/%m/%Y").date() + + url_path = edition.xpath(".//a[@onclick]").re_first(r"l=(.+)'") + + gazette_url = response.urljoin( + f"diariooficial/diario?n=diario.pdf&l={url_path}" + ) + + yield Gazette( + date=edition_date, + edition_number=edition_number, + file_urls=[gazette_url], + is_extra_edition=False, + power="executive", + ) diff --git a/data_collection/gazette/spiders/base/sai.py b/data_collection/gazette/spiders/base/sai.py new file mode 100644 index 000000000..f7bd0b6f2 --- /dev/null +++ b/data_collection/gazette/spiders/base/sai.py @@ -0,0 +1,68 @@ +import scrapy +from dateutil.parser import parse as dt_parse + +from gazette.items import Gazette +from gazette.spiders.base import BaseGazetteSpider + + +class SaiGazetteSpider(BaseGazetteSpider): + """ + Base Spider for all cases which use SAI (Serviço de Acesso a Informação) + Read more at https://imap.org.br/sistemas/sai/ + + Attributes + ---------- + base_url : str + It must be defined in child classes.
+ If the domain is sai.io.org.br, you must add the subpath; otherwise, use the domain only + e.g.: + - sai domain: https://sai.io.org.br/ba/maragojipe + - other domain: https://www.igaci.al.gov.br/site/diariooficial + + start_date : datetime.date + Must be set to the date of the oldest gazette available on the website + """ + + base_url = None + start_date = None + + @property + def _site_url(self): + return f"{self.base_url}/Site/DiarioOficial" + + def start_requests(self): + yield scrapy.Request(url=self._site_url, callback=self._pagination_requests) + + def _pagination_requests(self, response): + client_id = response.xpath("//select[@id='cod_cliente']/option[2]/@value").get() + + for year in range(self.start_date.year, self.end_date.year + 1): + formdata = { + "URL": "/Site/GetSubGrupoDiarioOficial", + "diarioOficial.cod_cliente": f"{client_id}", + "diarioOficial.tipoFormato": "1", + "diarioOficial.ano": f"{year}", + "diarioOficial.dataInicial": self.start_date.strftime("%Y-%m-%d"), + "diarioOficial.dataFinal": self.end_date.strftime("%Y-%m-%d"), + } + + yield scrapy.FormRequest( + url=self._site_url, + formdata=formdata, + callback=self.parse_item, + cb_kwargs={"client_id": client_id}, + ) + + def parse_item(self, response, client_id): + gazette_list = response.json() + for gazette_item in gazette_list: + edition_number = gazette_item["cod_documento"] + date = dt_parse(gazette_item["dat_criacao"]).date() + file_url = f"https://sai.io.org.br/Handler.ashx?f=diario&query={edition_number}&c={client_id}&m=0" + yield Gazette( + date=date, + file_urls=[file_url], + edition_number=edition_number, + is_extra_edition=False, + power="executive_legislative", + ) diff --git a/data_collection/gazette/spiders/es/es_associacao_municipios.py b/data_collection/gazette/spiders/es/es_associacao_municipios.py index 95ef414ed..065d809f9 100644 --- a/data_collection/gazette/spiders/es/es_associacao_municipios.py +++ b/data_collection/gazette/spiders/es/es_associacao_municipios.py @@ -1,28 +1,13 @@ -from dateparser
import parse +from datetime import date -from gazette.items import Gazette -from gazette.spiders.base import BaseGazetteSpider +from gazette.spiders.base.dionet import DionetGazetteSpider -class EsAssociacaoMunicipiosSpider(BaseGazetteSpider): +class EsAssociacaoMunicipiosSpider(DionetGazetteSpider): TERRITORY_ID = "3200000" name = "es_associacao_municipios" - allowed_domains = ["diariomunicipales.org.br"] - start_urls = ["https://diariomunicipales.org.br/?r=site/edicoes&Edicao_page=1"] + allowed_domains = ["ioes.dio.es.gov.br"] + start_date = date(2021, 3, 1) - def parse(self, response): - for gazette_node in response.css(".items tbody tr"): - url = gazette_node.css("[download]::attr(href)").extract_first() - date = gazette_node.css("td::text")[1].extract() - date = parse(date, languages=["pt"]).date() - yield Gazette( - date=date, - file_urls=[url], - is_extra_edition=False, - power="executive", - ) - - css_path = ".pagination .next:not(.disabled) a::attr(href)" - next_page_url = response.css(css_path).extract_first() - if next_page_url: - yield response.follow(next_page_url) + BASE_URL = "https://ioes.dio.es.gov.br" + url_subtheme = "?subtheme=dom" diff --git a/data_collection/gazette/spiders/es/es_serra.py b/data_collection/gazette/spiders/es/es_serra.py index d6d22dbbe..de0982019 100644 --- a/data_collection/gazette/spiders/es/es_serra.py +++ b/data_collection/gazette/spiders/es/es_serra.py @@ -1,42 +1,13 @@ -import datetime +from datetime import date -import scrapy -from dateutil.rrule import DAILY, rrule +from gazette.spiders.base.dionet import DionetGazetteSpider -from gazette.items import Gazette -from gazette.spiders.base import BaseGazetteSpider - -class EsSerraSpider(BaseGazetteSpider): +class EsSerraSpider(DionetGazetteSpider): TERRITORY_ID = "3205002" name = "es_serra" allowed_domains = ["ioes.dio.es.gov.br"] + start_date = date(2021, 1, 1) - start_date = datetime.date(2021, 1, 1) - - def start_requests(self): - for date in rrule(freq=DAILY, 
dtstart=self.start_date, until=self.end_date): - day = str(date.day).zfill(2) - month = str(date.month).zfill(2) - url = f"https://ioes.dio.es.gov.br/apifront/portal/edicoes/edicoes_from_data/{date.year}-{month}-{day}.json?subtheme=diariodaserra" - yield scrapy.Request(url=url, cb_kwargs={"gazette_date": date.date()}) - - def parse(self, response, gazette_date): - gazette_data = response.json() - if gazette_data["erro"]: - return - - items = gazette_data.get("itens", []) - for item in items: - gazette_id = item["id"] - gazette_url = ( - f"https://ioes.dio.es.gov.br/portal/edicoes/download/{gazette_id}" - ) - is_extra_edition = item["suplemento"] == 1 - yield Gazette( - date=gazette_date, - edition_number=item["numero"], - file_urls=[gazette_url], - is_extra_edition=is_extra_edition, - power="executive", - ) + BASE_URL = "https://ioes.dio.es.gov.br" + url_subtheme = "?subtheme=diariodaserra" diff --git a/data_collection/gazette/spiders/mg/mg_belo_horizonte.py b/data_collection/gazette/spiders/mg/mg_belo_horizonte.py index 3c06d7a05..520897270 100644 --- a/data_collection/gazette/spiders/mg/mg_belo_horizonte.py +++ b/data_collection/gazette/spiders/mg/mg_belo_horizonte.py @@ -2,6 +2,7 @@ from urllib.parse import urlencode import scrapy +import w3lib.url from dateutil.rrule import DAILY, rrule from gazette.items import Gazette @@ -33,12 +34,17 @@ def parse(self, response, gazette_date): gazettes = data["data"] for gazette in gazettes: is_extra_edition = gazette["tipo_edicao"] != "P" - gazette_hash = gazette["documento_jornal"]["nome_minio"] gazette_url = ( f"https://api-dom.pbh.gov.br/api/v1/documentos/{gazette_hash}/download" ) + prefix = gazette["documento_jornal"]["prefix"] + if prefix is not None: + gazette_url = w3lib.url.add_or_replace_parameter( + gazette_url, "prefix", prefix + ) + yield Gazette( date=gazette_date, edition_number=gazette["numero_edicao"], diff --git a/data_collection/gazette/spiders/mg/mg_januaria.py 
b/data_collection/gazette/spiders/mg/mg_januaria.py new file mode 100644 index 000000000..c2b421900 --- /dev/null +++ b/data_collection/gazette/spiders/mg/mg_januaria.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class MgJanuariaSpider(BaseInstarSpider): + TERRITORY_ID = "3135209" + name = "mg_januaria" + allowed_domains = ["januaria.mg.gov.br"] + base_url = "https://www.januaria.mg.gov.br/portal/diario-oficial" + start_date = date(2022, 4, 29) diff --git a/data_collection/gazette/spiders/mg/mg_uberlandia.py b/data_collection/gazette/spiders/mg/mg_uberlandia.py new file mode 100644 index 000000000..471b2d68d --- /dev/null +++ b/data_collection/gazette/spiders/mg/mg_uberlandia.py @@ -0,0 +1,66 @@ +import datetime + +import dateparser +import scrapy +import w3lib.url +from dateutil.rrule import MONTHLY, rrule + +from gazette.items import Gazette +from gazette.spiders.base import BaseGazetteSpider + + +class MgUberlandiaSpider(BaseGazetteSpider): + TERRITORY_ID = "3170206" + name = "mg_uberlandia" + start_date = datetime.date(2005, 1, 3) + + def start_requests(self): + first_day_of_start_date_month = datetime.date( + self.start_date.year, self.start_date.month, 1 + ) + months_of_interest = rrule( + MONTHLY, dtstart=first_day_of_start_date_month, until=self.end_date + ) + for month_date in months_of_interest: + yield scrapy.Request( + f"https://www.uberlandia.mg.gov.br/{month_date.year}/{month_date.month}/?post_type=diariooficial", + errback=self.on_error, + ) + + def on_error(self, failure): + # month/year URLs have two different valid query parameters: + # post_type=diario_oficial or post_type=diariooficial + # so if the first is not found, it will retry with the second type + if failure.value.response.status == 404: + alternative_url = w3lib.url.add_or_replace_parameter( + failure.value.response.url, "post_type", "diario_oficial" + ) + yield scrapy.Request(alternative_url) + + def parse(self, response): 
gazettes = response.css("article.elementor-post") + for gazette in gazettes: + gazette_date = dateparser.parse( + gazette.css( + ".elementor-post-date::text, .ee-post__metas__date::text" + ).get() + ).date() + if gazette_date < self.start_date or gazette_date > self.end_date: + continue + + edition = gazette.css("h3 a::text, h5::text") + edition_number = edition.re_first(r"(\d+)") + is_extra_edition = bool(edition.re(r"\d+.*?([A-Za-z]+)")) + + gazette_url = gazette.css("a::attr(href)").get() + + yield Gazette( + date=gazette_date, + edition_number=edition_number, + is_extra_edition=is_extra_edition, + file_urls=[gazette_url], + power="executive", + ) + + for page_url in response.css("nav a.page-numbers::attr(href)").getall(): + yield scrapy.Request(page_url) diff --git a/data_collection/gazette/spiders/ms/ms_bela_vista.py b/data_collection/gazette/spiders/ms/ms_bela_vista.py new file mode 100644 index 000000000..889627b87 --- /dev/null +++ b/data_collection/gazette/spiders/ms/ms_bela_vista.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class MsBelaVistaSpider(BaseInstarSpider): + TERRITORY_ID = "5002100" + name = "ms_bela_vista" + allowed_domains = ["belavista.ms.gov.br"] + base_url = "https://www.belavista.ms.gov.br/portal/diario-oficial" + start_date = date(2011, 11, 16) diff --git a/data_collection/gazette/spiders/ms/ms_corumba.py b/data_collection/gazette/spiders/ms/ms_corumba.py new file mode 100644 index 000000000..8eafcb2d9 --- /dev/null +++ b/data_collection/gazette/spiders/ms/ms_corumba.py @@ -0,0 +1,12 @@ +from datetime import date + +from gazette.spiders.base.dionet import DionetGazetteSpider + + +class MsCorumba(DionetGazetteSpider): + TERRITORY_ID = "5003207" + name = "ms_corumba" + allowed_domains = ["do.corumba.ms.gov.br"] + start_date = date(2012, 6, 26) + + BASE_URL = "https://do.corumba.ms.gov.br" diff --git a/data_collection/gazette/spiders/ms/ms_costa_rica.py 
b/data_collection/gazette/spiders/ms/ms_costa_rica.py new file mode 100644 index 000000000..46701f379 --- /dev/null +++ b/data_collection/gazette/spiders/ms/ms_costa_rica.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class MsCostaRicaSpider(BaseInstarSpider): + TERRITORY_ID = "5003256" + name = "ms_costa_rica" + allowed_domains = ["costarica.ms.gov.br"] + base_url = "https://www.costarica.ms.gov.br/portal/diario-oficial" + start_date = date(2005, 1, 3) diff --git a/data_collection/gazette/spiders/pr/pr_antonio_olinto.py b/data_collection/gazette/spiders/pr/pr_antonio_olinto.py new file mode 100644 index 000000000..39e979323 --- /dev/null +++ b/data_collection/gazette/spiders/pr/pr_antonio_olinto.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class PrAntonioOlintoSpider(BaseInstarSpider): + TERRITORY_ID = "4101309" + name = "pr_antonio_olinto" + allowed_domains = ["antonioolinto.pr.gov.br"] + base_url = "https://antonioolinto.pr.gov.br/portal/diario-oficial" + start_date = date(2016, 6, 14) diff --git a/data_collection/gazette/spiders/pr/pr_primeiro_de_maio.py b/data_collection/gazette/spiders/pr/pr_primeiro_de_maio.py new file mode 100644 index 000000000..adad821e8 --- /dev/null +++ b/data_collection/gazette/spiders/pr/pr_primeiro_de_maio.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class PrPrimeiroDeMaioSpider(BaseInstarSpider): + TERRITORY_ID = "4120507" + name = "pr_primeiro_de_maio" + allowed_domains = ["primeirodemaio.pr.gov.br"] + base_url = "https://www.primeirodemaio.pr.gov.br/portal/diario-oficial" + start_date = date(2022, 4, 14) diff --git a/data_collection/gazette/spiders/pr/pr_tamboara.py b/data_collection/gazette/spiders/pr/pr_tamboara.py new file mode 100644 index 000000000..830216d36 --- /dev/null +++ b/data_collection/gazette/spiders/pr/pr_tamboara.py @@ 
-0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.doem import DoemGazetteSpider + + +class PrTamboaraSpider(DoemGazetteSpider): + TERRITORY_ID = "4126702" + name = "pr_tamboara" + start_date = date(2022, 8, 22) + state_city_url_part = "pr/tamboara" diff --git a/data_collection/gazette/spiders/rj/rj_niteroi.py b/data_collection/gazette/spiders/rj/rj_niteroi.py index 9ae0b0614..b55b5e947 100644 --- a/data_collection/gazette/spiders/rj/rj_niteroi.py +++ b/data_collection/gazette/spiders/rj/rj_niteroi.py @@ -11,8 +11,9 @@ class RjNiteroiSpider(BaseGazetteSpider): name = "rj_niteroi" allowed_domains = ["niteroi.rj.gov.br"] start_urls = ["http://www.niteroi.rj.gov.br"] - download_url = "http://pgm.niteroi.rj.gov.br/downloads/do/{}/{}/{:02d}.pdf" + download_url = "http://www.niteroi.rj.gov.br/wp-content/uploads/do/{}/{}/{:02d}.pdf" start_date = dt.date(2003, 7, 1) + end_date = dt.date.today() month_names = [ "01_Jan", @@ -30,7 +31,8 @@ class RjNiteroiSpider(BaseGazetteSpider): ] def parse(self, response): - parsing_date = dt.date.today() + parsing_date = self.end_date + while parsing_date >= self.start_date: month = self.month_names[parsing_date.month - 1] url = self.download_url.format(parsing_date.year, month, parsing_date.day) diff --git a/data_collection/gazette/spiders/rj/rj_rio_de_janeiro.py b/data_collection/gazette/spiders/rj/rj_rio_de_janeiro.py index d32990f20..5f87d2b4d 100644 --- a/data_collection/gazette/spiders/rj/rj_rio_de_janeiro.py +++ b/data_collection/gazette/spiders/rj/rj_rio_de_janeiro.py @@ -1,41 +1,12 @@ -import datetime +from datetime import date -import scrapy -from dateutil.rrule import DAILY, rrule +from gazette.spiders.base.dionet import DionetGazetteSpider -from gazette.items import Gazette -from gazette.spiders.base import BaseGazetteSpider - -class RjRioDeJaneiroSpider(BaseGazetteSpider): +class RjRioDeJaneiroSpider(DionetGazetteSpider): TERRITORY_ID = "3304557" name = "rj_rio_de_janeiro" allowed_domains = 
["doweb.rio.rj.gov.br"] + start_date = date(2006, 3, 16) - start_date = datetime.date(2006, 3, 16) - - def start_requests(self): - for date in rrule(freq=DAILY, dtstart=self.start_date, until=self.end_date): - day = str(date.day).zfill(2) - month = str(date.month).zfill(2) - url = f"https://doweb.rio.rj.gov.br/apifront/portal/edicoes/edicoes_from_data/{date.year}-{month}-{day}.json" - yield scrapy.Request(url=url, cb_kwargs={"gazette_date": date.date()}) - - def parse(self, response, gazette_date): - gazette_data = response.json() - if gazette_data["erro"]: - return - - items = gazette_data.get("itens", []) - for item in items: - gazette_id = item["id"] - gazette_url = ( - f"https://doweb.rio.rj.gov.br/portal/edicoes/download/{gazette_id}" - ) - is_extra_edition = item["suplemento"] == 1 - yield Gazette( - date=gazette_date, - file_urls=[gazette_url], - is_extra_edition=is_extra_edition, - power="executive", - ) + BASE_URL = "https://doweb.rio.rj.gov.br" diff --git a/data_collection/gazette/spiders/rj/rj_sao_joao_de_meriti.py b/data_collection/gazette/spiders/rj/rj_sao_joao_de_meriti.py new file mode 100644 index 000000000..d901fda34 --- /dev/null +++ b/data_collection/gazette/spiders/rj/rj_sao_joao_de_meriti.py @@ -0,0 +1,36 @@ +import datetime as dt + +from gazette.items import Gazette +from gazette.spiders.base import BaseGazetteSpider + + +class RjSaoJoaoDeMeritiSpider(BaseGazetteSpider): + TERRITORY_ID = "3305109" + name = "rj_sao_joao_de_meriti" + allowed_domains = ["transparencia.meriti.rj.gov.br"] + start_urls = ["https://transparencia.meriti.rj.gov.br/diario_oficial_get.php"] + BASE_URL = "https://transparencia.meriti.rj.gov.br/ver20230623/WEB-ObterAnexo.rule?sys=LAI&codigo=" + start_date = dt.date(2017, 1, 1) + custom_settings = {"DOWNLOAD_DELAY": 0.5, "RANDOMIZE_DOWNLOAD_DELAY": True} + + def parse(self, response): + for gazette_data in response.json(): + raw_gazette_date = gazette_data["Data_Formatada"] + gazette_date = 
dt.datetime.strptime(raw_gazette_date, "%d/%m/%Y").date() + + if not self.start_date <= gazette_date <= self.end_date: + continue + gazette_code = gazette_data["Codigo_ANEXO"] + # broken links on the transparency portal + if gazette_code == 1: + continue + gazette_edition_number = gazette_data["ANEXO"] + gazette_url = f"{self.BASE_URL}{gazette_code}" + + yield Gazette( + date=gazette_date, + edition_number=gazette_edition_number, + file_urls=[gazette_url], + is_extra_edition=False, + power="executive_legislative", + ) diff --git a/data_collection/gazette/spiders/rn/rn_mossoro.py b/data_collection/gazette/spiders/rn/rn_mossoro.py deleted file mode 100644 index f6f6976da..000000000 --- a/data_collection/gazette/spiders/rn/rn_mossoro.py +++ /dev/null @@ -1,44 +0,0 @@ -import datetime as dt -import re - -import scrapy - -from gazette.items import Gazette -from gazette.spiders.base import BaseGazetteSpider - - -class RnMossoroSpider(BaseGazetteSpider): - TERRITORY_ID = "2408003" - name = "rn_mossoro" - start_date = dt.date(2023, 1, 1) - allowed_domains = ["dom.mossoro.rn.gov.br"] - start_urls = ["https://www.dom.mossoro.rn.gov.br/dom/edicoes"] - - def parse(self, response): - for edition in response.css("div.edicoes-list div.col-md-3"): - url = edition.css("a::attr(href)").get() - raw_date = edition.css("div.card-content p::text").get().strip() - date = dt.datetime.strptime(raw_date, "%d/%m/%Y").date() - raw_edition_number = edition.css("div.card-content h4::text").get().strip() - edition_number = re.findall(r"DOM N. 
(\d+)", raw_edition_number) - - if date > self.end_date: - continue - elif date < self.start_date: - return - - yield Gazette( - date=date, - edition_number=edition_number, - file_urls=[f"https://www.dom.mossoro.rn.gov.br{url}"], - is_extra_edition=False, - power="executive_legislative", - ) - - next_page_url = response.xpath( - "//a[contains(text(), 'PRÓXIMA PÁGINA')]/@href" - ).get() - if next_page_url: - yield scrapy.Request( - f"https://www.dom.mossoro.rn.gov.br{next_page_url}", callback=self.parse - ) diff --git a/data_collection/gazette/spiders/rn/rn_mossoro_2008_2022.py b/data_collection/gazette/spiders/rn/rn_mossoro_2008.py similarity index 98% rename from data_collection/gazette/spiders/rn/rn_mossoro_2008_2022.py rename to data_collection/gazette/spiders/rn/rn_mossoro_2008.py index 8e5d87702..5a081d39b 100644 --- a/data_collection/gazette/spiders/rn/rn_mossoro_2008_2022.py +++ b/data_collection/gazette/spiders/rn/rn_mossoro_2008.py @@ -10,7 +10,7 @@ class RnMossoroSpider(BaseGazetteSpider): TERRITORY_ID = "2408003" - name = "rn_mossoro_2008_2022" + name = "rn_mossoro_2008" allowed_domains = ["jom.mossoro.rn.gov.br"] start_date = dt.date(2008, 1, 1) diff --git a/data_collection/gazette/spiders/rn/rn_mossoro_2023.py b/data_collection/gazette/spiders/rn/rn_mossoro_2023.py new file mode 100644 index 000000000..072e02fd8 --- /dev/null +++ b/data_collection/gazette/spiders/rn/rn_mossoro_2023.py @@ -0,0 +1,58 @@ +import datetime as dt + +import scrapy + +from gazette.items import Gazette +from gazette.spiders.base import BaseGazetteSpider + + +class RnMossoroSpider(BaseGazetteSpider): + TERRITORY_ID = "2408003" + name = "rn_mossoro_2023" + start_date = dt.date(2023, 1, 2) + allowed_domains = ["dom.mossoro.rn.gov.br"] + start_urls = ["https://www.dom.mossoro.rn.gov.br/dom/edicoes"] + + def parse(self, response): + for edition in response.css("div.edicoes-list div.col-md-3"): + raw_date = edition.css("div.card-content p::text").get().strip() + gazette_date = 
dt.datetime.strptime(raw_date, "%d/%m/%Y").date() + + if gazette_date > self.end_date: + continue + if self.start_date > gazette_date: + return + + intermediary_page = edition.css("a::attr(href)").get() + yield scrapy.Request( + response.urljoin(intermediary_page), + callback=self.parse_gazette_page, + cb_kwargs={"gazette_date": gazette_date}, + ) + + next_page_url = response.xpath( + "//a[contains(text(), 'PRÓXIMA PÁGINA')]/@href" + ).get() + if next_page_url: + yield scrapy.Request(response.urljoin(next_page_url), callback=self.parse) + + def extra_edition_check(self, edition_number): + if edition_number.isdigit(): + return False + return True + + def parse_gazette_page(self, response, gazette_date): + edition_number = response.xpath( + "//div[@id='main-content']//li[3]//strong/text()" + ).get() + file_link = response.xpath( + "//div[@id='main-content']//a[contains(@href, '/pmm/uploads/publicacao/pdf')]/@href" + ).get() + + yield Gazette( + date=gazette_date, + edition_number=edition_number, + file_urls=[response.urljoin(file_link)], + is_extra_edition=self.extra_edition_check(edition_number), + power="executive", + ) diff --git a/data_collection/gazette/spiders/ro/ro_jaru.py b/data_collection/gazette/spiders/ro/ro_jaru.py new file mode 100644 index 000000000..aa96476cc --- /dev/null +++ b/data_collection/gazette/spiders/ro/ro_jaru.py @@ -0,0 +1,14 @@ +from datetime import date + +from gazette.spiders.base.dionet import DionetGazetteSpider + + +class RoJaruSpider(DionetGazetteSpider): + zyte_smartproxy_enabled = True + + TERRITORY_ID = "1100114" + name = "ro_jaru" + allowed_domains = ["doe.jaru.ro.gov.br"] + start_date = date(2022, 1, 1) + + BASE_URL = "https://doe.jaru.ro.gov.br" diff --git a/data_collection/gazette/spiders/rs/rs_cachoeira_do_sul.py b/data_collection/gazette/spiders/rs/rs_cachoeira_do_sul.py new file mode 100644 index 000000000..009d5192a --- /dev/null +++ b/data_collection/gazette/spiders/rs/rs_cachoeira_do_sul.py @@ -0,0 +1,11 @@ +from 
datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class RsCachoeiraDoSulSpider(BaseInstarSpider): + TERRITORY_ID = "4303004" + name = "rs_cachoeira_do_sul" + allowed_domains = ["cachoeiradosul.rs.gov.br"] + base_url = "https://www.cachoeiradosul.rs.gov.br/portal/diario-oficial" + start_date = date(2021, 10, 27) diff --git a/data_collection/gazette/spiders/sc/sc_florianopolis.py b/data_collection/gazette/spiders/sc/sc_florianopolis.py index a47cf56f3..6dd188553 100644 --- a/data_collection/gazette/spiders/sc/sc_florianopolis.py +++ b/data_collection/gazette/spiders/sc/sc_florianopolis.py @@ -12,7 +12,6 @@ class ScFlorianopolisSpider(BaseGazetteSpider): name = "sc_florianopolis" TERRITORY_ID = "4205407" - start_date = date(2009, 6, 1) def start_requests(self): @@ -25,7 +24,7 @@ def start_requests(self): for year, month in periods_of_interest: data = dict(ano=str(year), mes=str(month), passo="1", enviar="") yield FormRequest( - "http://www.pmf.sc.gov.br/governo/index.php?pagina=govdiariooficial", + "https://www.pmf.sc.gov.br/governo/index.php?pagina=govdiariooficial", formdata=data, ) @@ -42,14 +41,16 @@ def parse(self, response): yield Gazette( date=gazette_date, edition_number=gazette_edition_number, - file_urls=(url,), + file_urls=[ + url, + ], is_extra_edition=self.is_extra(link), power="executive_legislative", ) @staticmethod def get_pdf_url(response, link): - relative_url = link.css("::attr(href)").extract_first() + relative_url = link.css("::attr(href)").get() if not relative_url.lower().endswith(".pdf"): return None @@ -57,7 +58,7 @@ def get_pdf_url(response, link): @staticmethod def get_date(link): - text = " ".join(link.css("::text").extract()) + text = " ".join(link.css("::text").getall()) pattern = r"\d{1,2}\s+de\s+\w+\s+de\s+\d{4}" match = re.search(pattern, text) if not match: @@ -67,5 +68,5 @@ def get_date(link): @staticmethod def is_extra(link): - text = " ".join(link.css("::text").extract()) + text = " 
".join(link.css("::text").getall()) return "extra" in text.lower() diff --git a/data_collection/gazette/spiders/se/se_aquidaba.py b/data_collection/gazette/spiders/se/se_aquidaba.py new file mode 100644 index 000000000..a050bd379 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_aquidaba.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeAquidabaSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2800209" + name = "se_aquidaba" + start_date = date(2017, 2, 16) + url_uf = "se" + url_city = "aquidaba" diff --git a/data_collection/gazette/spiders/se/se_areia_branca.py b/data_collection/gazette/spiders/se/se_areia_branca.py new file mode 100644 index 000000000..337db6b55 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_areia_branca.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeAreiaBrancaSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2800506" + name = "se_areia_branca" + start_date = date(2017, 1, 2) + url_uf = "se" + url_city = "areiabranca" diff --git a/data_collection/gazette/spiders/se/se_barra_dos_coqueiros.py b/data_collection/gazette/spiders/se/se_barra_dos_coqueiros.py new file mode 100644 index 000000000..fbac9946d --- /dev/null +++ b/data_collection/gazette/spiders/se/se_barra_dos_coqueiros.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeBarraDosCoqueirosSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2800605" + name = "se_barra_dos_coqueiros" + start_date = date(2022, 9, 30) + url_uf = "se" + url_city = "barradoscoqueiros" diff --git a/data_collection/gazette/spiders/se/se_campo_do_brito.py b/data_collection/gazette/spiders/se/se_campo_do_brito.py new file mode 100644 index 000000000..60948ee9f --- /dev/null +++ 
b/data_collection/gazette/spiders/se/se_campo_do_brito.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeCampoDoBritoSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2801009" + name = "se_campo_do_brito" + start_date = date(2021, 1, 5) + url_uf = "se" + url_city = "campodobrito" diff --git a/data_collection/gazette/spiders/se/se_canhoba.py b/data_collection/gazette/spiders/se/se_canhoba.py new file mode 100644 index 000000000..efd55c54b --- /dev/null +++ b/data_collection/gazette/spiders/se/se_canhoba.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeCanhobaSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2801108" + name = "se_canhoba" + start_date = date(2020, 1, 16) + url_uf = "se" + url_city = "canhoba" diff --git a/data_collection/gazette/spiders/se/se_caninde_de_sao_francisco.py b/data_collection/gazette/spiders/se/se_caninde_de_sao_francisco.py new file mode 100644 index 000000000..a3f5758cc --- /dev/null +++ b/data_collection/gazette/spiders/se/se_caninde_de_sao_francisco.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeCanindeDeSaoFranciscoSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2801207" + name = "se_caninde_de_sao_francisco" + start_date = date(2017, 1, 2) + url_uf = "se" + url_city = "canindedesaofrancisco" diff --git a/data_collection/gazette/spiders/se/se_capela.py b/data_collection/gazette/spiders/se/se_capela.py new file mode 100644 index 000000000..463144d0b --- /dev/null +++ b/data_collection/gazette/spiders/se/se_capela.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeCapelaSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2801306" + name = "se_capela" + start_date = date(2021, 
2, 4) + url_uf = "se" + url_city = "capela" diff --git a/data_collection/gazette/spiders/se/se_divina_pastora.py b/data_collection/gazette/spiders/se/se_divina_pastora.py new file mode 100644 index 000000000..7495767fe --- /dev/null +++ b/data_collection/gazette/spiders/se/se_divina_pastora.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeDivinaPastoraSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2802007" + name = "se_divina_pastora" + start_date = date(2019, 1, 10) + url_uf = "se" + url_city = "divinapastora" diff --git a/data_collection/gazette/spiders/se/se_estancia.py b/data_collection/gazette/spiders/se/se_estancia.py new file mode 100644 index 000000000..910fd3f68 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_estancia.py @@ -0,0 +1,11 @@ +import datetime as dt + +from gazette.spiders.base.sai import SaiGazetteSpider + + +class SeEstanciaSpider(SaiGazetteSpider): + TERRITORY_ID = "2802106" + name = "se_estancia" + start_date = dt.date(2016, 4, 28) + allowed_domains = ["estancia.se.gov.br"] + base_url = "https://www.estancia.se.gov.br" diff --git a/data_collection/gazette/spiders/se/se_frei_paulo.py b/data_collection/gazette/spiders/se/se_frei_paulo.py new file mode 100644 index 000000000..35b5d4cec --- /dev/null +++ b/data_collection/gazette/spiders/se/se_frei_paulo.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeFreiPaulo(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2802304" + name = "se_frei_paulo" + start_date = date(2022, 1, 14) + url_uf = "se" + url_city = "freipaulo" diff --git a/data_collection/gazette/spiders/se/se_ilha_das_flores.py b/data_collection/gazette/spiders/se/se_ilha_das_flores.py new file mode 100644 index 000000000..f522ef0a8 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_ilha_das_flores.py @@ -0,0 +1,11 @@ +from datetime import date 
+ +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeIlhaDasFloresSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2802700" + name = "se_ilha_das_flores" + start_date = date(2017, 1, 11) + url_uf = "se" + url_city = "ilhadasflores" diff --git a/data_collection/gazette/spiders/se/se_itabaiana.py b/data_collection/gazette/spiders/se/se_itabaiana.py new file mode 100644 index 000000000..b0ab104b2 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_itabaiana.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeItabaianaSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2802908" + name = "se_itabaiana" + start_date = date(2023, 1, 2) + url_uf = "se" + url_city = "itabaiana" diff --git a/data_collection/gazette/spiders/se/se_itabaianinha.py b/data_collection/gazette/spiders/se/se_itabaianinha.py new file mode 100644 index 000000000..a918825a0 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_itabaianinha.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeItabaianinhaSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2803005" + name = "se_itabaianinha" + start_date = date(2017, 1, 2) + url_uf = "se" + url_city = "itabaianinha" diff --git a/data_collection/gazette/spiders/se/se_japaratuba.py b/data_collection/gazette/spiders/se/se_japaratuba.py new file mode 100644 index 000000000..71ef40087 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_japaratuba.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeJaparatubaSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2803302" + name = "se_japaratuba" + start_date = date(2017, 3, 22) + url_uf = "se" + url_city = "japaratuba" diff --git a/data_collection/gazette/spiders/se/se_moita_bonita.py 
b/data_collection/gazette/spiders/se/se_moita_bonita.py new file mode 100644 index 000000000..9d4c40cde --- /dev/null +++ b/data_collection/gazette/spiders/se/se_moita_bonita.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeMoitaBonitaSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2804102" + name = "se_moita_bonita" + start_date = date(2022, 2, 17) + url_uf = "se" + url_city = "moitabonita" diff --git a/data_collection/gazette/spiders/se/se_muribeca.py b/data_collection/gazette/spiders/se/se_muribeca.py new file mode 100644 index 000000000..46d77d791 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_muribeca.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeMuribecaSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2804300" + name = "se_muribeca" + start_date = date(2019, 5, 20) + url_uf = "se" + url_city = "muribeca" diff --git a/data_collection/gazette/spiders/se/se_nossa_senhora_das_dores.py b/data_collection/gazette/spiders/se/se_nossa_senhora_das_dores.py new file mode 100644 index 000000000..116757b7c --- /dev/null +++ b/data_collection/gazette/spiders/se/se_nossa_senhora_das_dores.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeNossaSenhoraDasDoresSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2804607" + name = "se_nossa_senhora_das_dores" + start_date = date(2017, 1, 2) + url_uf = "se" + url_city = "nossasenhoradasdores" diff --git a/data_collection/gazette/spiders/se/se_nossa_senhora_de_lourdes.py b/data_collection/gazette/spiders/se/se_nossa_senhora_de_lourdes.py new file mode 100644 index 000000000..bcf086b9e --- /dev/null +++ b/data_collection/gazette/spiders/se/se_nossa_senhora_de_lourdes.py @@ -0,0 +1,11 @@ +from datetime import date + +from 
gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeNossaSenhoraDeLourdesSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2804706" + name = "se_nossa_senhora_de_lourdes" + start_date = date(2017, 1, 12) + url_uf = "se" + url_city = "nossasenhoradelourdes" diff --git a/data_collection/gazette/spiders/se/se_pedra_mole.py b/data_collection/gazette/spiders/se/se_pedra_mole.py new file mode 100644 index 000000000..5af6d92fb --- /dev/null +++ b/data_collection/gazette/spiders/se/se_pedra_mole.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SePedraMole(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2805000" + name = "se_pedra_mole" + start_date = date(2017, 1, 26) + url_uf = "se" + url_city = "pedramole" diff --git a/data_collection/gazette/spiders/se/se_pirambu.py b/data_collection/gazette/spiders/se/se_pirambu.py new file mode 100644 index 000000000..0a8a0bcff --- /dev/null +++ b/data_collection/gazette/spiders/se/se_pirambu.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SePirambuSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2805307" + name = "se_pirambu" + start_date = date(2017, 3, 9) + url_uf = "se" + url_city = "pirambu" diff --git a/data_collection/gazette/spiders/se/se_poco_verde.py b/data_collection/gazette/spiders/se/se_poco_verde.py new file mode 100644 index 000000000..0796a248f --- /dev/null +++ b/data_collection/gazette/spiders/se/se_poco_verde.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SePocoVerdeSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2805505" + name = "se_poco_verde" + start_date = date(2023, 1, 2) + url_uf = "se" + url_city = "pocoverde" diff --git a/data_collection/gazette/spiders/se/se_riachao_do_dantas.py 
b/data_collection/gazette/spiders/se/se_riachao_do_dantas.py new file mode 100644 index 000000000..2f18266ee --- /dev/null +++ b/data_collection/gazette/spiders/se/se_riachao_do_dantas.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeRiachaoDoDantasSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2805802" + name = "se_riachao_do_dantas" + start_date = date(2018, 11, 14) + url_uf = "se" + url_city = "riachaododantas" diff --git a/data_collection/gazette/spiders/se/se_rosario_do_catete.py b/data_collection/gazette/spiders/se/se_rosario_do_catete.py new file mode 100644 index 000000000..9dea4e75e --- /dev/null +++ b/data_collection/gazette/spiders/se/se_rosario_do_catete.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeRosarioDoCateteSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2806107" + name = "se_rosario_do_catete" + start_date = date(2021, 1, 12) + url_uf = "se" + url_city = "rosariodocatete" diff --git a/data_collection/gazette/spiders/se/se_sao_domingos.py b/data_collection/gazette/spiders/se/se_sao_domingos.py new file mode 100644 index 000000000..c97a759b3 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_sao_domingos.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeSaoDomingosSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2806800" + name = "se_sao_domingos" + start_date = date(2021, 1, 12) + url_uf = "se" + url_city = "saodomingos" diff --git a/data_collection/gazette/spiders/se/se_simao_dias.py b/data_collection/gazette/spiders/se/se_simao_dias.py new file mode 100644 index 000000000..aefaa7229 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_simao_dias.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import
BaseMunicipioOnlineSpider + + +class SeSimaoDiasSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2807105" + name = "se_simao_dias" + start_date = date(2021, 1, 4) + url_uf = "se" + url_city = "simaodias" diff --git a/data_collection/gazette/spiders/se/se_siriri.py b/data_collection/gazette/spiders/se/se_siriri.py new file mode 100644 index 000000000..6ba3f003e --- /dev/null +++ b/data_collection/gazette/spiders/se/se_siriri.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeSiririSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2807204" + name = "se_siriri" + start_date = date(2017, 1, 3) + url_uf = "se" + url_city = "siriri" diff --git a/data_collection/gazette/spiders/se/se_telha.py b/data_collection/gazette/spiders/se/se_telha.py new file mode 100644 index 000000000..3edc3eaf8 --- /dev/null +++ b/data_collection/gazette/spiders/se/se_telha.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.municipioonline import BaseMunicipioOnlineSpider + + +class SeTelhaSpider(BaseMunicipioOnlineSpider): + TERRITORY_ID = "2807303" + name = "se_telha" + start_date = date(2017, 1, 20) + url_uf = "se" + url_city = "telha" diff --git a/data_collection/gazette/spiders/sp/sp_aguas_de_sao_pedro.py b/data_collection/gazette/spiders/sp/sp_aguas_de_sao_pedro.py new file mode 100644 index 000000000..73f15d91f --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_aguas_de_sao_pedro.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpAguasDeSaoPedroSpider(BaseInstarSpider): + TERRITORY_ID = "3500600" + name = "sp_aguas_de_sao_pedro" + allowed_domains = ["aguasdesaopedro.sp.gov.br"] + base_url = "https://www.aguasdesaopedro.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 5, 18) diff --git a/data_collection/gazette/spiders/sp/sp_andradina.py 
b/data_collection/gazette/spiders/sp/sp_andradina.py new file mode 100644 index 000000000..b9a4bc74d --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_andradina.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpAndradinaSpider(BaseInstarSpider): + TERRITORY_ID = "3502101" + name = "sp_andradina" + allowed_domains = ["andradina.sp.gov.br"] + base_url = "https://www.andradina.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 3, 3) diff --git a/data_collection/gazette/spiders/sp/sp_aparecida.py b/data_collection/gazette/spiders/sp/sp_aparecida.py new file mode 100644 index 000000000..8dfc961f8 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_aparecida.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpAparecidaSpider(BaseInstarSpider): + TERRITORY_ID = "3502507" + name = "sp_aparecida" + allowed_domains = ["aparecida.sp.gov.br"] + base_url = "https://www.aparecida.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 8, 23) diff --git a/data_collection/gazette/spiders/sp/sp_arapei.py b/data_collection/gazette/spiders/sp/sp_arapei.py new file mode 100644 index 000000000..73a63bf05 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_arapei.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpArapeiSpider(BaseInstarSpider): + TERRITORY_ID = "3503158" + name = "sp_arapei" + allowed_domains = ["arapei.sp.gov.br"] + base_url = "https://www.arapei.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 5, 27) diff --git a/data_collection/gazette/spiders/sp/sp_avanhandava.py b/data_collection/gazette/spiders/sp/sp_avanhandava.py new file mode 100644 index 000000000..11b4d5d0e --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_avanhandava.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import
BaseInstarSpider + + +class SpAvanhandavaSpider(BaseInstarSpider): + TERRITORY_ID = "3504404" + name = "sp_avanhandava" + allowed_domains = ["avanhandava.sp.gov.br"] + base_url = "https://www.avanhandava.sp.gov.br/portal/diario-oficial" + start_date = date(2022, 1, 7) diff --git a/data_collection/gazette/spiders/sp/sp_barbosa.py b/data_collection/gazette/spiders/sp/sp_barbosa.py new file mode 100644 index 000000000..9e420bb26 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_barbosa.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpBarbosaSpider(BaseInstarSpider): + TERRITORY_ID = "3505104" + name = "sp_barbosa" + allowed_domains = ["barbosa.sp.gov.br"] + base_url = "https://www.barbosa.sp.gov.br/portal/diario-oficial" + start_date = date(2017, 1, 5) diff --git a/data_collection/gazette/spiders/sp/sp_botucatu.py b/data_collection/gazette/spiders/sp/sp_botucatu.py new file mode 100644 index 000000000..811d16b2f --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_botucatu.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpBotucatuSpider(BaseInstarSpider): + TERRITORY_ID = "3507506" + name = "sp_botucatu" + allowed_domains = ["botucatu.sp.gov.br"] + base_url = "https://www.botucatu.sp.gov.br/portal/diario-oficial" + start_date = date(2000, 1, 6) diff --git a/data_collection/gazette/spiders/sp/sp_brejo_alegre.py b/data_collection/gazette/spiders/sp/sp_brejo_alegre.py new file mode 100644 index 000000000..a5d9f6257 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_brejo_alegre.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpBrejoAlegreSpider(BaseInstarSpider): + TERRITORY_ID = "3507753" + name = "sp_brejo_alegre" + allowed_domains = ["brejoalegre.sp.gov.br"] + base_url = "https://www.brejoalegre.sp.gov.br/portal/diario-oficial" + start_date 
= date(2021, 10, 21) diff --git a/data_collection/gazette/spiders/sp/sp_charqueada.py b/data_collection/gazette/spiders/sp/sp_charqueada.py new file mode 100644 index 000000000..6fc78eef1 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_charqueada.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpCharqueadaSpider(BaseInstarSpider): + TERRITORY_ID = "3511706" + name = "sp_charqueada" + allowed_domains = ["charqueada.sp.gov.br"] + base_url = "https://www.charqueada.sp.gov.br/portal/diario-oficial" + start_date = date(2009, 1, 9) diff --git a/data_collection/gazette/spiders/sp/sp_cunha.py b/data_collection/gazette/spiders/sp/sp_cunha.py new file mode 100644 index 000000000..b41ba8f69 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_cunha.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.dosp import DospGazetteSpider + + +class SpCunhaSpider(DospGazetteSpider): + TERRITORY_ID = "3513603" + name = "sp_cunha" + code = 4800 + start_date = date(2021, 10, 19) diff --git a/data_collection/gazette/spiders/sp/sp_dirce_reis.py b/data_collection/gazette/spiders/sp/sp_dirce_reis.py new file mode 100644 index 000000000..a3e9f5918 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_dirce_reis.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpDirceReisSpider(BaseInstarSpider): + TERRITORY_ID = "3513850" + name = "sp_dirce_reis" + allowed_domains = ["dircereis.sp.gov.br"] + base_url = "https://www.dircereis.sp.gov.br/portal/diario-oficial" + start_date = date(2019, 10, 7) diff --git a/data_collection/gazette/spiders/sp/sp_dracena.py b/data_collection/gazette/spiders/sp/sp_dracena.py new file mode 100644 index 000000000..637423816 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_dracena.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import 
BaseInstarSpider + + +class SpDracenaSpider(BaseInstarSpider): + TERRITORY_ID = "3514403" + name = "sp_dracena" + allowed_domains = ["dracena.sp.gov.br"] + base_url = "https://www.dracena.sp.gov.br/portal/diario-oficial" + start_date = date(2020, 11, 24) diff --git a/data_collection/gazette/spiders/sp/sp_eldorado.py b/data_collection/gazette/spiders/sp/sp_eldorado.py new file mode 100644 index 000000000..ee518df4a --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_eldorado.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpEldoradoSpider(BaseInstarSpider): + TERRITORY_ID = "3514809" + name = "sp_eldorado" + allowed_domains = ["eldorado.sp.gov.br"] + base_url = "https://www.eldorado.sp.gov.br/portal/diario-oficial" + start_date = date(2018, 12, 4) diff --git a/data_collection/gazette/spiders/sp/sp_floreal.py b/data_collection/gazette/spiders/sp/sp_floreal.py new file mode 100644 index 000000000..d1aeb6986 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_floreal.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpFlorealSpider(BaseInstarSpider): + TERRITORY_ID = "3515905" + name = "sp_floreal" + allowed_domains = ["floreal.sp.gov.br"] + base_url = "https://www.floreal.sp.gov.br/portal/diario-oficial" + start_date = date(2022, 11, 9) diff --git a/data_collection/gazette/spiders/sp/sp_general_salgado.py b/data_collection/gazette/spiders/sp/sp_general_salgado.py new file mode 100644 index 000000000..84dc8df8f --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_general_salgado.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpGeneralSalgadoSpider(BaseInstarSpider): + TERRITORY_ID = "3516903" + name = "sp_general_salgado" + allowed_domains = ["generalsalgado.sp.gov.br"] + base_url = "https://www.generalsalgado.sp.gov.br/portal/diario-oficial" + 
start_date = date(2022, 1, 13) diff --git a/data_collection/gazette/spiders/sp/sp_iepe.py b/data_collection/gazette/spiders/sp/sp_iepe.py new file mode 100644 index 000000000..f77a17bbe --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_iepe.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpIepeSpider(BaseInstarSpider): + TERRITORY_ID = "3519907" + name = "sp_iepe" + allowed_domains = ["iepe.sp.gov.br"] + base_url = "https://www.iepe.sp.gov.br/portal/diario-oficial" + start_date = date(2019, 10, 16) diff --git a/data_collection/gazette/spiders/sp/sp_igaracu_do_tiete.py b/data_collection/gazette/spiders/sp/sp_igaracu_do_tiete.py new file mode 100644 index 000000000..f94b0b187 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_igaracu_do_tiete.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpIgaracuDoTieteSpider(BaseInstarSpider): + TERRITORY_ID = "3520004" + name = "sp_igaracu_do_tiete" + allowed_domains = ["igaracudotiete.sp.gov.br"] + base_url = "https://www.igaracudotiete.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 1, 1) diff --git a/data_collection/gazette/spiders/sp/sp_irapuru.py b/data_collection/gazette/spiders/sp/sp_irapuru.py new file mode 100644 index 000000000..734abcfd2 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_irapuru.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpIrapuruSpider(BaseInstarSpider): + TERRITORY_ID = "3521606" + name = "sp_irapuru" + allowed_domains = ["irapuru.sp.gov.br"] + base_url = "https://www.irapuru.sp.gov.br/portal/diario-oficial" + start_date = date(2023, 3, 10) diff --git a/data_collection/gazette/spiders/sp/sp_itapolis.py b/data_collection/gazette/spiders/sp/sp_itapolis.py new file mode 100644 index 000000000..388ce0b46 --- /dev/null +++ 
b/data_collection/gazette/spiders/sp/sp_itapolis.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpItapolisSpider(BaseInstarSpider): + TERRITORY_ID = "3522703" + name = "sp_itapolis" + allowed_domains = ["itapolis.sp.gov.br"] + base_url = "https://www.itapolis.sp.gov.br/portal/diario-oficial" + start_date = date(2017, 1, 11) diff --git a/data_collection/gazette/spiders/sp/sp_itapui.py b/data_collection/gazette/spiders/sp/sp_itapui.py new file mode 100644 index 000000000..5e40e69bc --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_itapui.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpItapuiSpider(BaseInstarSpider): + TERRITORY_ID = "3522901" + name = "sp_itapui" + allowed_domains = ["itapui.sp.gov.br"] + base_url = "https://www.itapui.sp.gov.br/portal/diario-oficial" + start_date = date(2017, 12, 15) diff --git a/data_collection/gazette/spiders/sp/sp_itariri.py b/data_collection/gazette/spiders/sp/sp_itariri.py new file mode 100644 index 000000000..087852ea0 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_itariri.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpItaririSpider(BaseInstarSpider): + TERRITORY_ID = "3523305" + name = "sp_itariri" + allowed_domains = ["itariri.sp.gov.br"] + base_url = "https://www.itariri.sp.gov.br/portal/diario-oficial" + start_date = date(2023, 2, 24) diff --git a/data_collection/gazette/spiders/sp/sp_itobi.py b/data_collection/gazette/spiders/sp/sp_itobi.py new file mode 100644 index 000000000..54f2dadc0 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_itobi.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpItobiSpider(BaseInstarSpider): + TERRITORY_ID = "3523800" + name = "sp_itobi" + allowed_domains = ["itobi.sp.gov.br"] + 
base_url = "https://www.itobi.sp.gov.br/portal/diario-oficial" + start_date = date(2011, 4, 1) diff --git a/data_collection/gazette/spiders/sp/sp_joanopolis.py b/data_collection/gazette/spiders/sp/sp_joanopolis.py new file mode 100644 index 000000000..e6819a4f9 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_joanopolis.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpJoanopolisSpider(BaseInstarSpider): + TERRITORY_ID = "3525508" + name = "sp_joanopolis" + allowed_domains = ["joanopolis.sp.gov.br"] + base_url = "https://www.joanopolis.sp.gov.br/portal/diario-oficial" + start_date = date(2013, 1, 30) diff --git a/data_collection/gazette/spiders/sp/sp_joao_ramalho.py b/data_collection/gazette/spiders/sp/sp_joao_ramalho.py new file mode 100644 index 000000000..6cc9576f9 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_joao_ramalho.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpJoaoRamalhoSpider(BaseInstarSpider): + TERRITORY_ID = "3525607" + name = "sp_joao_ramalho" + allowed_domains = ["joaoramalho.sp.gov.br"] + base_url = "https://www.joaoramalho.sp.gov.br/portal/diario-oficial" + start_date = date(2020, 3, 24) diff --git a/data_collection/gazette/spiders/sp/sp_junqueiropolis.py b/data_collection/gazette/spiders/sp/sp_junqueiropolis.py new file mode 100644 index 000000000..13de71a4c --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_junqueiropolis.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpJunqueiropolisSpider(BaseInstarSpider): + TERRITORY_ID = "3526001" + name = "sp_junqueiropolis" + allowed_domains = ["junqueiropolis.sp.gov.br"] + base_url = "https://www.junqueiropolis.sp.gov.br/portal/diario-oficial" + start_date = date(2020, 10, 24) diff --git a/data_collection/gazette/spiders/sp/sp_lagoinha.py 
b/data_collection/gazette/spiders/sp/sp_lagoinha.py new file mode 100644 index 000000000..4a942c186 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_lagoinha.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpLagoinhaSpider(BaseInstarSpider): + TERRITORY_ID = "3526308" + name = "sp_lagoinha" + allowed_domains = ["lagoinha.sp.gov.br"] + base_url = "https://www.lagoinha.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 11, 25) diff --git a/data_collection/gazette/spiders/sp/sp_luiziania.py b/data_collection/gazette/spiders/sp/sp_luiziania.py new file mode 100644 index 000000000..ddb406a52 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_luiziania.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpLuizianiaSpider(BaseInstarSpider): + TERRITORY_ID = "3527702" + name = "sp_luiziania" + allowed_domains = ["luiziania.sp.gov.br"] + base_url = "https://www.luiziania.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 4, 6) diff --git a/data_collection/gazette/spiders/sp/sp_macatuba.py b/data_collection/gazette/spiders/sp/sp_macatuba.py index 98ac857c4..ae9e15bb0 100644 --- a/data_collection/gazette/spiders/sp/sp_macatuba.py +++ b/data_collection/gazette/spiders/sp/sp_macatuba.py @@ -1,7 +1,10 @@ +import datetime + from gazette.spiders.base.sigpub import SigpubGazetteSpider class SpMacatubaSpider(SigpubGazetteSpider): name = "sp_macatuba" TERRITORY_ID = "3528007" - CALENDAR_URL = "https://www.diariomunicipal.com.br/macatuba" + CALENDAR_URL = "https://www.diariomunicipal.com.br/macatuba/" + start_date = datetime.date(2018, 4, 25) diff --git a/data_collection/gazette/spiders/sp/sp_macaubal.py b/data_collection/gazette/spiders/sp/sp_macaubal.py new file mode 100644 index 000000000..50f163abc --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_macaubal.py @@ -0,0 +1,11 @@ +from datetime import date + 
+from gazette.spiders.base.instar import BaseInstarSpider + + +class SpMacaubalSpider(BaseInstarSpider): + TERRITORY_ID = "3528106" + name = "sp_macaubal" + allowed_domains = ["macaubal.sp.gov.br"] + base_url = "https://macaubal.sp.gov.br/portal/diario-oficial" + start_date = date(2023, 6, 21) diff --git a/data_collection/gazette/spiders/sp/sp_mira_estrela.py b/data_collection/gazette/spiders/sp/sp_mira_estrela.py new file mode 100644 index 000000000..ad926422b --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_mira_estrela.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpMiraEstrelaSpider(BaseInstarSpider): + TERRITORY_ID = "3530003" + name = "sp_mira_estrela" + allowed_domains = ["miraestrela.sp.gov.br"] + base_url = "https://www.miraestrela.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 2, 16) diff --git a/data_collection/gazette/spiders/sp/sp_mirante_do_paranapanema.py b/data_collection/gazette/spiders/sp/sp_mirante_do_paranapanema.py new file mode 100644 index 000000000..49f753a14 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_mirante_do_paranapanema.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpMiranteDoParanapanemaSpider(BaseInstarSpider): + TERRITORY_ID = "3530201" + name = "sp_mirante_do_paranapanema" + allowed_domains = ["mirantedoparanapanema.sp.gov.br"] + base_url = "https://www.mirantedoparanapanema.sp.gov.br/portal/diario-oficial" + start_date = date(2019, 5, 7) diff --git a/data_collection/gazette/spiders/sp/sp_monte_mor.py b/data_collection/gazette/spiders/sp/sp_monte_mor.py new file mode 100644 index 000000000..56cc506dd --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_monte_mor.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpMonteMorSpider(BaseInstarSpider): + TERRITORY_ID = "3531803" + name = 
"sp_monte_mor" + allowed_domains = ["montemor.sp.gov.br"] + base_url = "https://www.montemor.sp.gov.br/portal/diario-oficial" + start_date = date(2019, 9, 20) diff --git a/data_collection/gazette/spiders/sp/sp_monteiro_lobato.py b/data_collection/gazette/spiders/sp/sp_monteiro_lobato.py new file mode 100644 index 000000000..99fabca21 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_monteiro_lobato.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.dosp import DospGazetteSpider + + +class SpMonteiroLobatoSpider(DospGazetteSpider): + TERRITORY_ID = "3531704" + name = "sp_monteiro_lobato" + code = 5006 + start_date = date(2020, 11, 26) diff --git a/data_collection/gazette/spiders/sp/sp_nhandeara.py b/data_collection/gazette/spiders/sp/sp_nhandeara.py new file mode 100644 index 000000000..238e4b3fb --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_nhandeara.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpNhandearaSpider(BaseInstarSpider): + TERRITORY_ID = "3532603" + name = "sp_nhandeara" + allowed_domains = ["nhandeara.sp.gov.br"] + base_url = "https://www.nhandeara.sp.gov.br/portal/diario-oficial" + start_date = date(2023, 8, 17) diff --git a/data_collection/gazette/spiders/sp/sp_nova_castilho.py b/data_collection/gazette/spiders/sp/sp_nova_castilho.py new file mode 100644 index 000000000..ac41c8d13 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_nova_castilho.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpNovaCastilhoSpider(BaseInstarSpider): + TERRITORY_ID = "3532868" + name = "sp_nova_castilho" + allowed_domains = ["novacastilho.sp.gov.br"] + base_url = "https://www.novacastilho.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 1, 29) diff --git a/data_collection/gazette/spiders/sp/sp_nova_luzitania.py b/data_collection/gazette/spiders/sp/sp_nova_luzitania.py 
new file mode 100644 index 000000000..b5e5ff0d2 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_nova_luzitania.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpNovaLuzitaniaSpider(BaseInstarSpider): + TERRITORY_ID = "3533304" + name = "sp_nova_luzitania" + allowed_domains = ["novaluzitania.sp.gov.br"] + base_url = "https://www.novaluzitania.sp.gov.br/portal/diario-oficial" + start_date = date(2022, 10, 31) diff --git a/data_collection/gazette/spiders/sp/sp_ourinhos.py b/data_collection/gazette/spiders/sp/sp_ourinhos.py new file mode 100644 index 000000000..2badcb6c1 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_ourinhos.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpOurinhosSpider(BaseInstarSpider): + TERRITORY_ID = "3534708" + name = "sp_ourinhos" + allowed_domains = ["ourinhos.sp.gov.br"] + base_url = "https://www.ourinhos.sp.gov.br/portal/diario-oficial" + start_date = date(2005, 1, 20) diff --git a/data_collection/gazette/spiders/sp/sp_palmital.py b/data_collection/gazette/spiders/sp/sp_palmital.py new file mode 100644 index 000000000..a79094e95 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_palmital.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpPalmitalSpider(BaseInstarSpider): + TERRITORY_ID = "3535309" + name = "sp_palmital" + allowed_domains = ["palmital.sp.gov.br"] + base_url = "https://www.palmital.sp.gov.br/portal/diario-oficial" + start_date = date(2005, 6, 11) diff --git a/data_collection/gazette/spiders/sp/sp_pindorama.py b/data_collection/gazette/spiders/sp/sp_pindorama.py new file mode 100644 index 000000000..182544759 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_pindorama.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class 
SpPindoramaSpider(BaseInstarSpider): + TERRITORY_ID = "3538105" + name = "sp_pindorama" + allowed_domains = ["pindorama.sp.gov.br"] + base_url = "https://www.pindorama.sp.gov.br/portal/diario-oficial" + start_date = date(2022, 4, 29) diff --git a/data_collection/gazette/spiders/sp/sp_planalto.py b/data_collection/gazette/spiders/sp/sp_planalto.py new file mode 100644 index 000000000..1bd2714f6 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_planalto.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpPlanaltoSpider(BaseInstarSpider): + TERRITORY_ID = "3539608" + name = "sp_planalto" + allowed_domains = ["planalto.sp.gov.br"] + base_url = "https://www.planalto.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 10, 10) diff --git a/data_collection/gazette/spiders/sp/sp_poloni.py b/data_collection/gazette/spiders/sp/sp_poloni.py new file mode 100644 index 000000000..a0ec7ccc7 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_poloni.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpPoloniSpider(BaseInstarSpider): + TERRITORY_ID = "3539905" + name = "sp_poloni" + allowed_domains = ["poloni.sp.gov.br"] + base_url = "https://www.poloni.sp.gov.br/portal/diario-oficial" + start_date = date(2022, 11, 4) diff --git a/data_collection/gazette/spiders/sp/sp_pontes_gestal.py b/data_collection/gazette/spiders/sp/sp_pontes_gestal.py new file mode 100644 index 000000000..63c5cdd63 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_pontes_gestal.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpPontesGestalSpider(BaseInstarSpider): + TERRITORY_ID = "3540309" + name = "sp_pontes_gestal" + allowed_domains = ["pontesgestal.sp.gov.br"] + base_url = "https://www.pontesgestal.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 7, 14) diff --git 
a/data_collection/gazette/spiders/sp/sp_porangaba.py b/data_collection/gazette/spiders/sp/sp_porangaba.py new file mode 100644 index 000000000..cfd05c698 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_porangaba.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpPorangabaSpider(BaseInstarSpider): + TERRITORY_ID = "3540507" + name = "sp_porangaba" + allowed_domains = ["porangaba.sp.gov.br"] + base_url = "https://www.porangaba.sp.gov.br/portal/diario-oficial" + start_date = date(2020, 10, 6) diff --git a/data_collection/gazette/spiders/sp/sp_presidente_epitacio.py b/data_collection/gazette/spiders/sp/sp_presidente_epitacio.py new file mode 100644 index 000000000..298ca6031 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_presidente_epitacio.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpPresidenteEpitacioSpider(BaseInstarSpider): + TERRITORY_ID = "3541307" + name = "sp_presidente_epitacio" + allowed_domains = ["presidenteepitacio.sp.gov.br"] + base_url = "https://www.presidenteepitacio.sp.gov.br/portal/diario-oficial" + start_date = date(2019, 10, 18) diff --git a/data_collection/gazette/spiders/sp/sp_santa_maria_da_serra.py b/data_collection/gazette/spiders/sp/sp_santa_maria_da_serra.py new file mode 100644 index 000000000..1445f9785 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_santa_maria_da_serra.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpSantaMariaDaSerraSpider(BaseInstarSpider): + TERRITORY_ID = "3547007" + name = "sp_santa_maria_da_serra" + allowed_domains = ["santamariadaserra.sp.gov.br"] + base_url = "https://www.santamariadaserra.sp.gov.br/portal/diario-oficial" + start_date = date(2022, 3, 16) diff --git a/data_collection/gazette/spiders/sp/sp_sao_jose_dos_campos.py 
b/data_collection/gazette/spiders/sp/sp_sao_jose_dos_campos.py index 0ee9e160b..b9efc2be3 100644 --- a/data_collection/gazette/spiders/sp/sp_sao_jose_dos_campos.py +++ b/data_collection/gazette/spiders/sp/sp_sao_jose_dos_campos.py @@ -1,57 +1,12 @@ -import dateparser -from scrapy import FormRequest +from datetime import date -from gazette.items import Gazette -from gazette.spiders.base import BaseGazetteSpider +from gazette.spiders.base.dionet import DionetGazetteSpider -class SpSaoJoseDosCamposSpider(BaseGazetteSpider): +class SpSaoJoseDosCamposSpider(DionetGazetteSpider): TERRITORY_ID = "3549904" - - GAZETTE_NAME_CSS = "td:last-child a::text" - GAZETTE_URL_CSS = "td:last-child a::attr(href)" - GAZETTE_DATE_CSS = "td:nth-child(2)::text" - NEXT_PAGE_LINK_CSS = ".paginador_anterior_proxima a" - JAVASCRIPT_POSTBACK_REGEX = r"javascript:__doPostBack\('(.*)',''\)" - - allowed_domains = ["servicos2.sjc.sp.gov.br"] name = "sp_sao_jose_dos_campos" - start_urls = [ - "http://servicos2.sjc.sp.gov.br/servicos/portal_da_transparencia/boletim_municipio.aspx" - ] - - def parse(self, response): - for element in response.css("#corpo table tr"): - if element.css("th").extract(): - continue - - date = element.css(self.GAZETTE_DATE_CSS).extract_first() - date = dateparser.parse(date, languages=["pt"]).date() - url = element.css(self.GAZETTE_URL_CSS).extract_first() - gazette_title = element.css(self.GAZETTE_NAME_CSS).extract_first() - is_extra = "Extra" in gazette_title - - yield Gazette( - date=date, - file_urls=[url], - is_extra_edition=is_extra, - power="executive_legislative", - ) - - for element in response.css(self.NEXT_PAGE_LINK_CSS): - if not element.css("a::text").extract_first() == "Próxima": - continue - - event_target = element.css("a::attr(href)") - event_target = event_target.re(self.JAVASCRIPT_POSTBACK_REGEX).pop() + allowed_domains = ["diariodomunicipio.sjc.sp.gov.br"] + start_date = date(2015, 8, 7) - yield FormRequest.from_response( - response, - 
callback=self.parse, - formname="aspnetForm", - formxpath="//form[@id='aspnetForm']", - formdata={"__EVENTARGUMENT": "", "__EVENTTARGET": event_target}, - dont_click=True, - dont_filter=True, - method="POST", - ) + BASE_URL = "https://diariodomunicipio.sjc.sp.gov.br" diff --git a/data_collection/gazette/spiders/sp/sp_sao_pedro.py b/data_collection/gazette/spiders/sp/sp_sao_pedro.py new file mode 100644 index 000000000..0a195167f --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_sao_pedro.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpSaoPedroSpider(BaseInstarSpider): + TERRITORY_ID = "3550407" + name = "sp_sao_pedro" + allowed_domains = ["saopedro.sp.gov.br"] + base_url = "https://www.saopedro.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 5, 25) diff --git a/data_collection/gazette/spiders/sp/sp_sebastianopolis_do_sul.py b/data_collection/gazette/spiders/sp/sp_sebastianopolis_do_sul.py new file mode 100644 index 000000000..3b2d166c4 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_sebastianopolis_do_sul.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpSebastianopolisDoSulSpider(BaseInstarSpider): + TERRITORY_ID = "3551306" + name = "sp_sebastianopolis_do_sul" + allowed_domains = ["sebastianopolisdosul.sp.gov.br"] + base_url = "https://www.sebastianopolisdosul.sp.gov.br/portal/diario-oficial" + start_date = date(2020, 5, 26) diff --git a/data_collection/gazette/spiders/sp/sp_tapirai.py b/data_collection/gazette/spiders/sp/sp_tapirai.py new file mode 100644 index 000000000..30b8b4d59 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_tapirai.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpTapiraiSpider(BaseInstarSpider): + TERRITORY_ID = "3553500" + name = "sp_tapirai" + allowed_domains = ["tapirai.sp.gov.br"] + 
base_url = "https://www.tapirai.sp.gov.br/portal/diario-oficial" + start_date = date(2019, 9, 3) diff --git a/data_collection/gazette/spiders/sp/sp_taquaral.py b/data_collection/gazette/spiders/sp/sp_taquaral.py new file mode 100644 index 000000000..e5dbbad77 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_taquaral.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpTaquaralSpider(BaseInstarSpider): + TERRITORY_ID = "3553658" + name = "sp_taquaral" + allowed_domains = ["taquaral.sp.gov.br"] + base_url = "https://www.taquaral.sp.gov.br/portal/diario-oficial" + start_date = date(2017, 1, 3) diff --git a/data_collection/gazette/spiders/sp/sp_terra_roxa.py b/data_collection/gazette/spiders/sp/sp_terra_roxa.py new file mode 100644 index 000000000..614333bb2 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_terra_roxa.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpTerraRoxaSpider(BaseInstarSpider): + TERRITORY_ID = "3554409" + name = "sp_terra_roxa" + allowed_domains = ["terraroxa.sp.gov.br"] + base_url = "https://www.terraroxa.sp.gov.br/portal/diario-oficial" + start_date = date(2022, 6, 29) diff --git a/data_collection/gazette/spiders/sp/sp_tremembe.py b/data_collection/gazette/spiders/sp/sp_tremembe.py new file mode 100644 index 000000000..f69feb5b9 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_tremembe.py @@ -0,0 +1,10 @@ +from datetime import date + +from gazette.spiders.base.dosp import DospGazetteSpider + + +class SpTremembeSpider(DospGazetteSpider): + TERRITORY_ID = "3554805" + name = "sp_tremembe" + code = 5264 + start_date = date(2016, 5, 11) diff --git a/data_collection/gazette/spiders/sp/sp_turiuba.py b/data_collection/gazette/spiders/sp/sp_turiuba.py new file mode 100644 index 000000000..2766ea227 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_turiuba.py @@ -0,0 +1,11 @@ +from 
datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpTuriubaSpider(BaseInstarSpider): + TERRITORY_ID = "3555208" + name = "sp_turiuba" + allowed_domains = ["turiuba.sp.gov.br"] + base_url = "https://www.turiuba.sp.gov.br/portal/diario-oficial" + start_date = date(2020, 6, 19) diff --git a/data_collection/gazette/spiders/sp/sp_uniao_paulista.py b/data_collection/gazette/spiders/sp/sp_uniao_paulista.py new file mode 100644 index 000000000..37cd0ed85 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_uniao_paulista.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpUniaoPaulistaSpider(BaseInstarSpider): + TERRITORY_ID = "3555703" + name = "sp_uniao_paulista" + allowed_domains = ["uniaopaulista.sp.gov.br"] + base_url = "https://www.uniaopaulista.sp.gov.br/portal/diario-oficial" + start_date = date(2023, 1, 11) diff --git a/data_collection/gazette/spiders/sp/sp_valparaiso.py b/data_collection/gazette/spiders/sp/sp_valparaiso.py new file mode 100644 index 000000000..7b767a212 --- /dev/null +++ b/data_collection/gazette/spiders/sp/sp_valparaiso.py @@ -0,0 +1,11 @@ +from datetime import date + +from gazette.spiders.base.instar import BaseInstarSpider + + +class SpValparaisoSpider(BaseInstarSpider): + TERRITORY_ID = "3556305" + name = "sp_valparaiso" + allowed_domains = ["valparaiso.sp.gov.br"] + base_url = "https://www.valparaiso.sp.gov.br/portal/diario-oficial" + start_date = date(2021, 11, 19) diff --git a/data_collection/gazette/spiders/to/to_aguiarnopolis.py b/data_collection/gazette/spiders/to/to_aguiarnopolis.py new file mode 100644 index 000000000..fb07a28b5 --- /dev/null +++ b/data_collection/gazette/spiders/to/to_aguiarnopolis.py @@ -0,0 +1,11 @@ +import datetime + +from gazette.spiders.base.diariooficialbr import BaseDiarioOficialBRSpider + + +class ToAguiarnopolisSpider(BaseDiarioOficialBRSpider): + TERRITORY_ID = "1700301" + name = 
"to_aguiarnopolis" + allowed_domains = ["diariooficial.aguiarnopolis.to.gov.br"] + start_date = datetime.date(2020, 1, 23) + BASE_URL = "https://diariooficial.aguiarnopolis.to.gov.br" diff --git a/data_collection/gazette/spiders/to/to_camposlindos.py b/data_collection/gazette/spiders/to/to_camposlindos.py new file mode 100644 index 000000000..17413ae7e --- /dev/null +++ b/data_collection/gazette/spiders/to/to_camposlindos.py @@ -0,0 +1,11 @@ +import datetime + +from gazette.spiders.base.diariooficialbr import BaseDiarioOficialBRSpider + + +class ToCamposLindosSpider(BaseDiarioOficialBRSpider): + TERRITORY_ID = "1703842" + name = "to_campos_lindos" + allowed_domains = ["camposlindos.diariooficialbr.com.br"] + start_date = datetime.date(2021, 4, 30) + BASE_URL = "https://camposlindos.diariooficialbr.com.br" diff --git a/data_collection/gazette/spiders/to/to_goiatins.py b/data_collection/gazette/spiders/to/to_goiatins.py new file mode 100644 index 000000000..abe921ccf --- /dev/null +++ b/data_collection/gazette/spiders/to/to_goiatins.py @@ -0,0 +1,11 @@ +import datetime + +from gazette.spiders.base.diariooficialbr import BaseDiarioOficialBRSpider + + +class ToGoiatinsSpider(BaseDiarioOficialBRSpider): + TERRITORY_ID = "1709005" + name = "to_goiatins" + allowed_domains = ["goiatins.diariooficialbr.com.br"] + start_date = datetime.date(2021, 1, 14) + BASE_URL = "https://goiatins.diariooficialbr.com.br" diff --git a/data_collection/gazette/spiders/to/to_peixe.py b/data_collection/gazette/spiders/to/to_peixe.py new file mode 100644 index 000000000..39b4a4af3 --- /dev/null +++ b/data_collection/gazette/spiders/to/to_peixe.py @@ -0,0 +1,11 @@ +import datetime + +from gazette.spiders.base.diariooficialbr import BaseDiarioOficialBRSpider + + +class ToPeixeSpider(BaseDiarioOficialBRSpider): + TERRITORY_ID = "1716604" + name = "to_peixe" + allowed_domains = ["peixe.diariooficialbr.com.br"] + start_date = datetime.date(2022, 3, 30) + BASE_URL = 
"https://peixe.diariooficialbr.com.br" diff --git a/data_collection/gazette/spiders/to/to_sampaio.py b/data_collection/gazette/spiders/to/to_sampaio.py new file mode 100644 index 000000000..7ad317d13 --- /dev/null +++ b/data_collection/gazette/spiders/to/to_sampaio.py @@ -0,0 +1,54 @@ +import datetime as dt +from urllib.parse import urlencode + +import scrapy + +from gazette.items import Gazette +from gazette.spiders.base import BaseGazetteSpider + + +class ToSampaioSpider(BaseGazetteSpider): + TERRITORY_ID = "1718808" + name = "to_sampaio" + allowed_domains = ["sampaio.to.gov.br"] + start_date = dt.date(2020, 5, 4) + + def get_url(self, page=1): + url_params = { + "pagina": page, + "data_inicial": self.start_date.strftime("%d/%m/%Y"), + "data_final": self.end_date.strftime("%d/%m/%Y"), + } + + return ( + f"https://diariooficial.sampaio.to.gov.br/pesquisar?{urlencode(url_params)}" + ) + + def start_requests(self): + yield scrapy.Request(url=self.get_url()) + + def parse(self, response, current_page=1): + editions = response.css("#resultados tr.tr_table_list_loop") + for edition in editions: + raw_date = edition.xpath("./td[3]/text()").get() + date = dt.datetime.strptime(raw_date, "%d/%m/%Y").date() + + gazette_url = edition.css("a::attr(href)").get() + title = edition.xpath("./td[1]/a/div//text()") + is_extra_edition = "extra" in " ".join(title.getall()).lower() + edition_number = title.re_first(r"\d+") + yield Gazette( + date=date, + file_urls=[gazette_url], + edition_number=edition_number, + is_extra_edition=is_extra_edition, + power="executive_legislative", + ) + + pagination = response.css("#paginacao_area a:contains('Próxima')").get() + if pagination: + next_page = current_page + 1 + yield scrapy.Request( + url=self.get_url(page=next_page), + cb_kwargs={"current_page": next_page}, + ) diff --git a/data_collection/gazette/spiders/to/to_tocantinia.py b/data_collection/gazette/spiders/to/to_tocantinia.py new file mode 100644 index 000000000..a962df17c --- 
/dev/null +++ b/data_collection/gazette/spiders/to/to_tocantinia.py @@ -0,0 +1,11 @@ +import datetime + +from gazette.spiders.base.diariooficialbr import BaseDiarioOficialBRSpider + + +class ToTocantiniaSpider(BaseDiarioOficialBRSpider): + TERRITORY_ID = "1721109" + name = "to_tocantinia" + allowed_domains = ["tocantinia.diariooficialbr.com.br"] + start_date = datetime.date(2017, 3, 21) + BASE_URL = "https://tocantinia.diariooficialbr.com.br" diff --git a/data_collection/gazette/utils.py b/data_collection/gazette/utils.py new file mode 100644 index 000000000..7cdf715c4 --- /dev/null +++ b/data_collection/gazette/utils.py @@ -0,0 +1,24 @@ +from sqlalchemy import create_engine, select +from sqlalchemy.orm import sessionmaker + +from gazette.database.models import QueridoDiarioSpider + + +def get_enabled_spiders(*, database_url, start_date=None, end_date=None): + """Return list of all currently enabled spiders within date period. + If start_date and/or end_date are provided, it will return only + the enabled spiders that are within the requested date period. 
+ """ + engine = create_engine(database_url) + Session = sessionmaker(bind=engine) + session = Session() + + stmt = select(QueridoDiarioSpider).where(QueridoDiarioSpider.enabled.is_(True)) + if start_date is not None: + stmt = stmt.where(QueridoDiarioSpider.date_from <= start_date) + if end_date is not None: + stmt = stmt.where(QueridoDiarioSpider.date_to >= end_date) + + result = session.execute(stmt) + for spider in result.scalars(): + yield spider.spider_name diff --git a/data_collection/requirements.in b/data_collection/requirements.in index 48dda1918..41cd69666 100644 --- a/data_collection/requirements.in +++ b/data_collection/requirements.in @@ -12,4 +12,5 @@ python-decouple scrapy scrapy-zyte-smartproxy SQLAlchemy -spidermon \ No newline at end of file +spidermon +w3lib \ No newline at end of file diff --git a/scripts/scheduler.py b/data_collection/scheduler.py similarity index 71% rename from scripts/scheduler.py rename to data_collection/scheduler.py index 772dcf987..d79ad82c0 100644 --- a/scripts/scheduler.py +++ b/data_collection/scheduler.py @@ -1,9 +1,13 @@ import datetime import click -import enabled_spiders from decouple import config from scrapinghub import ScrapinghubClient +from sqlalchemy import create_engine, update +from sqlalchemy.orm import sessionmaker + +from gazette.database.models import QueridoDiarioSpider +from gazette.utils import get_enabled_spiders YESTERDAY = datetime.date.today() - datetime.timedelta(days=1) @@ -73,9 +77,9 @@ def schedule_spider(spider_name, start_date, end_date): } job_args = {} - if start_date is not None: + if start_date: job_args["start_date"] = start_date - if end_date is not None: + if end_date: job_args["end_date"] = end_date spider = project.spiders.get(spider_name) @@ -85,6 +89,48 @@ def schedule_spider(spider_name, start_date, end_date): ) +@cli.command() +@click.option( + "--spider_name", + required=True, + help="Spider name", +) +def enable_spider(spider_name): + engine = 
create_engine(config("QUERIDODIARIO_DATABASE_URL")) + Session = sessionmaker(bind=engine) + session = Session() + + stmt = ( + update(QueridoDiarioSpider) + .where(QueridoDiarioSpider.spider_name == spider_name) + .values(enabled=True) + ) + + session.execute(stmt) + session.commit() + + +@cli.command() +@click.option( + "--spider_name", + required=True, + help="Spider name", +) +def disable_spider(spider_name): + engine = create_engine(config("QUERIDODIARIO_DATABASE_URL")) + Session = sessionmaker(bind=engine) + session = Session() + + stmt = ( + update(QueridoDiarioSpider) + .where(QueridoDiarioSpider.spider_name == spider_name) + .values(enabled=False) + ) + + session.execute(stmt) + session.commit() + + @cli.command() @click.option( "--start_date", @@ -106,7 +152,9 @@ def schedule_job(start_date, full, spider_name): @cli.command() def schedule_enabled_spiders(): - for spider_name in enabled_spiders.SPIDERS: + for spider_name in get_enabled_spiders( + database_url=config("QUERIDODIARIO_DATABASE_URL"), start_date=YESTERDAY + ): _schedule_job(start_date=YESTERDAY, full=False, spider_name=spider_name) @@ -116,7 +164,9 @@ def last_month_schedule_enabled_spiders(): # day as the physical one (sometimes it take more than two days and other weeks) # so running this command will ensure that we get the data of the latest month start_date = datetime.date.today() - datetime.timedelta(days=31) - for spider_name in enabled_spiders.SPIDERS: + for spider_name in get_enabled_spiders( + database_url=config("QUERIDODIARIO_DATABASE_URL"), start_date=start_date + ): _schedule_job(start_date=start_date, full=False, spider_name=spider_name) @@ -126,7 +176,9 @@ def last_month_schedule_enabled_spiders(): ) @cli.command() def schedule_all_spiders_by_date(start_date): - for spider_name in enabled_spiders.SPIDERS: + for spider_name in get_enabled_spiders( + database_url=config("QUERIDODIARIO_DATABASE_URL"), start_date=start_date + ): _schedule_job(start_date, full=False, 
spider_name=spider_name) diff --git a/data_collection/templates/spiders/qdtemplate.tmpl b/data_collection/templates/spiders/qdtemplate.tmpl new file mode 100644 index 000000000..c32b4a132 --- /dev/null +++ b/data_collection/templates/spiders/qdtemplate.tmpl @@ -0,0 +1,29 @@ +from datetime import date + +from gazette.items import Gazette +from gazette.spiders.base import BaseGazetteSpider + +class UFMunicipioSpider(BaseGazetteSpider): + name = "$name" + TERRITORY_ID = "" + allowed_domains = ["$domain"] + start_urls = ["$url"] + start_date = date() + + def parse(self, response): + # Lógica de extração de metadados + + # partindo de response ... + # + # ... o que deve ser feito para coletar DATA DO DIÁRIO? + # ... o que deve ser feito para coletar NÚMERO DA EDIÇÃO? + # ... o que deve ser feito para coletar se a EDIÇÃO É EXTRA? + # ... o que deve ser feito para coletar a URL DE DOWNLOAD do arquivo? + + yield Gazette( + date = date(), + edition_number = "", + is_extra_edition = False, + file_urls = [""], + power = "executive", + ) \ No newline at end of file diff --git a/docs/CONTRIBUTING-en-US.md b/docs/CONTRIBUTING-en-US.md index 32225356d..eee2227de 100644 --- a/docs/CONTRIBUTING-en-US.md +++ b/docs/CONTRIBUTING-en-US.md @@ -10,11 +10,12 @@ Already read? So let's go to the specific information of this repository: - [Windows](#windows) - [Automated code formatting](#automated-code-formatting) - [Maintaining](#maintaining) + - [Scraper code review](#scraper-code-review) ## Challenges The main challenge of this repository is to have more and more scrapers from websites that publish official gazettes, aiming to reach the 5570 Brazilian municipalities. We use the [City Expansion Board](https://github.com/orgs/okfn-brasil/projects/12/views/13) to organize this challenge progress. Consult it to find relevant tasks you can contribute to. 
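The spider template added in this diff (`data_collection/templates/spiders/qdtemplate.tmpl`) lists, as comments, the metadata every spider must extract: gazette date, edition number, extra-edition flag and file URL. A minimal, framework-free sketch of that normalization step is below; the function name and the date/title formats are illustrative assumptions, not code from this PR:

```python
import re
from datetime import datetime


def parse_gazette_row(raw_date, title, file_url):
    """Turn one scraped table row into the metadata fields the
    template's Gazette item expects (names mirror the item fields)."""
    match = re.search(r"\d+", title)  # edition number, if the title has one
    return {
        # gazette sites commonly publish dates as dd/mm/yyyy
        "date": datetime.strptime(raw_date, "%d/%m/%Y").date(),
        "edition_number": match.group() if match else None,
        "is_extra_edition": "extra" in title.lower(),
        "file_urls": [file_url],
    }
```

In a real spider the same fields would be yielded as a `Gazette` item from `parse`, as the template shows.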
-To help you develop, use the guidelines on the page about [how to write a new scraper](https://docs.queridodiario.ok.org.br/pt/latest/escrevendo-um-novo-spider.html) available at [Querido Diario's technical documentation](https://docs.queridodiario.ok.org.br/pt/latest/). +To help you develop, use the guidelines on the page about [how to write a new scraper](https://docs.queridodiario.ok.org.br/en/latest/writing-a-new-spider.html) available at [Querido Diario's technical documentation](https://docs.queridodiario.ok.org.br/en/latest/). ## How to setup the development environment Scrapers are developed using [Python](https://docs.python.org/3/) and [Scrapy](https://scrapy.org) framework. You can check [how to install Python](https://www.python.org/downloads/) on your operating system and learn more about Scrapy [in this tutorial](https://docs.scrapy.org/en/latest/intro/tutorial.html). With Python on your computer, follow the development environment setup step-by-step: @@ -46,14 +47,25 @@ pre-commit install _Attention:_ These steps need to be executed only the first time you interact with the project during the environment setup. After that, just activate the virtual environment (step 3) every time you use or contribute to the repository. ### Windows -The following instructions were tried on Windows 10. -1. [Install Microsoft Visual Build Tools](https://visualstudio.microsoft.com/downloads/). When starting the installation, you need to select `C++ build tools` in the loading tab and also `Windows 10 SDK` and `MSVC v142 - VS 2019 C++ x64/x86 build tools` in the individual components tab. +#### Using Windows terminal +The following instructions were tried on Windows 10 and 11. Remember that if you want to integrate with [querido-diario-data-processing](https://github.com/okfn-brasil/querido-diario-data-processing), it is preferable to set up your environment [using WSL](CONTRIBUTING.md#using-wsl). + +1. 
Install [Visual Studio Community](https://visualstudio.microsoft.com/pt-br/downloads/). Before the installation, you need to select in the **Individual Components** tab "Windows 10 SDK" or "11" (depending on your system) and "MSVC v143 build tools - VS 2022 C++ x64/x86 (v14.32-17.4)". Note that the Windows 10 SDK and MSVC v142 - VS 2019 C++ x64/x86 build tools versions will often be updated, so look for similar items under Individual Components to perform the installation (i.e. newer and compatible with your system). Under **Workloads**, select “Desktop development with C++”. Install the updates, close the application and follow the next steps. + 2. Follow all [steps used in Linux](#linux), except for item 3. In it, the command should be: ```console .venv/Scripts/activate.bat ``` _Note_: In Windows commands, the direction of the slash (`/` or `\`) may vary depending on the use of [WSL](https://learn.microsoft.com/en-us/windows/wsl/about). +#### Using WSL + +Open a new Ubuntu terminal and clone the forked [querido-diario](https://github.com/okfn-brasil/querido-diario) repository. + +Follow the instructions for installation on [Linux](#linux). + +[This tutorial](https://github.com/Luisa-Coelho/qd-data-processing/blob/readme_update/wsl_windows.md) will help you install and configure WSL on your Windows machine. + ## Automated code formatting Project uses [Black](https://github.com/psf/black) as an automated tool to format and check code style and [isort](https://github.com/pycqa/isort) to sort the imports. CI will fail if your code is not correctly formatted according to these tools. @@ -61,3 +73,14 @@ If you followed the setup instructions, installing pre-commit hooks, it is possi # Maintaining Maintainers must follow the guidelines in Querido Diário's [Guide for Maintainers](https://github.com/okfn-brasil/querido-diario-comunidade/blob/main/.github/CONTRIBUTING-en-US.md#maintaining). 
+ +## Scraper code review + +Every time a PR for scrapers is opened, the [validation list](https://github.com/okfn-brasil/querido-diario/blob/main/.github/pull_request_template.md) is triggered. The contributing person is expected to carry out all the checks contained in the checklist, but it is also the reviewer's responsibility to verify them. + +The checklist already covers more objective aspects such as the code model, mandatory fields and test collection files. However, other aspects must be taken into consideration during the review. Examples: + +- Python code standard regarding the use of double quotes (`"example"` / `"example='text'"`) +- Good practices in using XPath or selectors, avoiding unnecessary detours +- Readability: if you had difficulty understanding a section, check whether that code can be improved +- Think of the review as part of the contributor's growth in the project, giving *feedback* as comments on the relevant lines and pointing out general issues or reinforcing specific ones. diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index b1644f9f9..958cafc03 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -5,16 +5,30 @@ O Querido Diário possui um [Guia para Contribuição](https://github.com/okfn-b Já leu? 
Então vamos às informações específicas deste repositório: - [Desafios](#desafios) +- [Como configurar o ambiente de desenvolvimento](#como-configurar-o-ambiente-de-desenvolvimento) +- [Desafios](#desafios) + - [Labels](#labels) + - [Metas do Repositório](#metas-do-repositório) - [Como configurar o ambiente de desenvolvimento](#como-configurar-o-ambiente-de-desenvolvimento) - [Em Linux](#em-linux) - [Em Windows](#em-windows) -- [Formatação automática de código](#formação-automática-de-código) + - [Formatação automática de código](#formação-automática-de-código) - [Mantendo](#mantendo) + - [Revisão de raspadores](#revisão-de-raspadores) ## Desafios -O principal desafio deste repositório é ter cada vez mais raspadores de sites publicadores de diários oficiais, visando atingir os 5570 municípios brasileiros. Utilizamos o [Quadro de Expansão de Cidades](https://github.com/orgs/okfn-brasil/projects/12/views/13) para organizar o progresso do desafio. Consulte-o para localizar tarefas relevantes com as quais você pode contribuir. +O principal desafio deste repositório é ter cada vez mais raspadores de sites publicadores de diários oficiais, visando atingir os 5570 municípios brasileiros. Utilizamos o [Quadro de Expansão de Cidades](https://github.com/orgs/okfn-brasil/projects/12/views/13) para organizar o progresso do desafio. Consulte-o para localizar tarefas relevantes com as quais você pode contribuir. + +Para te ajudar a desenvolver, utilize as orientações da página ["contribuindo com raspadores"](https://docs.queridodiario.ok.org.br/pt-br/latest/contribuindo/raspadores.html#contribuindo-com-raspadores) disponível na [documentação técnica do Querido Diário](https://docs.queridodiario.ok.org.br/). + + +### Labels As issues são marcadas com etiquetas, um recurso que serve para classificar issues de mesmo tipo, sinalizar se há algum empecilho ou direcionar a comunidade para tarefas mais do perfil delas. 
No geral, adotamos *labels* comuns a outros projetos de código aberto como "docs", "bug", "dependencies", mas também temos algumas específicas. Veja quais são na seção de [labels](https://github.com/okfn-brasil/querido-diario/labels) + + +### Metas do Repositório +Para garantir que nossos esforços estejam alinhados e focados em objetivos claros, definimos metas para o desenvolvimento e expansão do projeto. Estas metas são revisadas e atualizadas regularmente, refletindo as prioridades e os desafios que enfrentamos. Convidamos as pessoas contribuidoras a se familiarizarem com estas metas, disponíveis em nosso [Quadro de Metas](https://github.com/okfn-brasil/querido-diario/milestones). Sua contribuição pode ser ainda mais valiosa quando alinhada com estas direções. -Para te ajudar a desenvolver, utilize as orientações da página sobre [como escrever um novo raspador](https://docs.queridodiario.ok.org.br/pt/latest/escrevendo-um-novo-spider.html) disponível na [documentação técnica do Querido Diário](https://docs.queridodiario.ok.org.br/pt/latest/). ## Como configurar o ambiente de desenvolvimento Os raspadores são desenvolvidos usando [Python](https://docs.python.org/3/) e o framework [Scrapy](https://scrapy.org). Você pode conferir [como instalar Python](https://www.python.org/downloads/) em seu sistema operacional e conhecer mais sobre o Scrapy [neste tutorial](https://docs.scrapy.org/en/latest/intro/tutorial.html). Com Python em seu computador, siga o passo-a-passo da configuração do ambiente de desenvolvimento: @@ -46,20 +60,44 @@ pre-commit install _Atenção:_ Estas etapas precisam ser executadas apenas na primeira vez que interagir com o projeto durante a preparação do ambiente. Depois disso, basta ativar o ambiente virtual (passo 3) cada vez que for utilizar ou contribuir com o repositório. ### Em Windows -As instruções a seguir foram experimentadas em Windows 10. -1. [Instale o Microsoft Visual Build Tools](https://visualstudio.microsoft.com/downloads/). 
Ao iniciar a instalação, você precisa selecionar `C++ build tools` na aba de carregamento e também `Windows 10 SDK` e `MSVC v142 - VS 2019 C++ x64/x86 build tools` na aba de componentes individuais. + +#### Pelo terminal do Windows +As instruções a seguir foram experimentadas em Windows 10 e 11. Lembre-se que, caso deseje realizar uma integração com o repositório [querido-diario-data-processing](https://github.com/okfn-brasil/querido-diario-data-processing), é preferível que a sua configuração de ambiente seja feita [utilizando WSL](CONTRIBUTING.md#utilizando-wsl). + +1. Instale o [Visual Studio Comunidade](https://visualstudio.microsoft.com/pt-br/downloads/). Ao abrir o terminal do instalador do Visual Studio, antes de instalar, você precisa selecionar na aba de **Componentes Individuais** "SDK do Windows 10" ou "11" (a depender do seu sistema) e "Ferramentas de build do MSVC v143 - VS 2022 C++ x64/x86 (v14.32-17.4)". Note que muitas vezes as versões Windows 10 SDK e MSVC v142 - VS 2019 C++ x64/x86 build tools serão atualizadas, portanto procure por itens similares em Componentes Individuais para realizar a instalação (ou seja, mais novos e compatíveis com o seu sistema). Em **Cargas de Trabalho**, selecione “Desenvolvimento para desktop com C++”. Instale as atualizações, feche o aplicativo e siga os próximos passos. + 2. Siga todos os [passos usados no Linux](#em-linux), com exceção do item 3. Nele, o comando deve ser: ```console .venv/Scripts/activate.bat ``` _Observação_: Nos comandos em Windows, o sentido da barra (`/` ou `\`) pode variar a depender da utilização de [WSL](https://learn.microsoft.com/pt-br/windows/wsl/about). +#### Utilizando WSL + +Abra um novo terminal do Ubuntu e faça o clone do repositório forked do [querido-diario](https://github.com/okfn-brasil/querido-diario). + +Siga as instruções referentes à instalação utilizando [Linux](CONTRIBUTING.md#em-linux). 
+ +[Este tutorial](https://github.com/Luisa-Coelho/qd-data-processing/blob/readme_update/wsl_windows.md) vai te ajudar na instalação e configuração do WSL na sua máquina Windows. + + ## Formação automática de código -O projeto usa [Black](https://github.com/psf/black) como ferramenta de automação para formatar e verificar o estilo do código e usa [isort](https://github.com/pycqa/isort) para organizar as importações. A integração contínua (CI) falhará se seu código não estiver adequadamente formatado. +O projeto usa [Black](https://github.com/psf/black) como ferramenta de automação para formatar e verificar o estilo do código e usa [isort](https://github.com/pycqa/isort) para organizar as importações. A integração contínua (CI) falhará se seu código não estiver adequadamente formatado. Mas, se você seguiu as orientações para configurar o ambiente de desenvolvimento corretamente, especialmente instalando o `pre-commit`, é possível que você nunca precise corrigir a formatação manualmente. O `pre-commit` fará isso por você, já que executa antes de cada `commit`. Ainda, caso queira verificar todos os arquivos no projeto, use `make format` para evocar as ferramentas. _Observação_: `make` não é disponibilizado nativamente em Windows, sendo necessário instalá-lo para a utilização sugerida. # Mantendo -As pessoas mantenedoras devem seguir as diretrizes do [Guia para Mantenedoras](https://github.com/okfn-brasil/querido-diario-comunidade/blob/main/.github/CONTRIBUTING.md#mantendo) do Querido Diário. \ No newline at end of file +As pessoas mantenedoras devem seguir as diretrizes do [Guia para Mantenedoras](https://github.com/okfn-brasil/querido-diario-comunidade/blob/main/.github/CONTRIBUTING.md#mantendo) do Querido Diário. + +## Revisão de raspadores + +Toda vez que uma PR para raspadores é aberta, a [lista de validações](https://github.com/okfn-brasil/querido-diario/blob/main/.github/pull_request_template.md) é acionada. 
É esperado que a pessoa contribuidora faça todas as verificações contidas na checklist, mas também é responsabilidade da pessoa revisora conferir os itens. + +A checklist já cobre aspectos mais objetivos como o modelo do código, os campos obrigatórios e os arquivos de coleta-teste. Entretanto, outros aspectos devem ser levados em consideração na interação de revisão. Exemplos: + +- Padrão de código Python quanto ao uso de aspas duplas (`"exemplo"` / `"exemplo='texto'"`) +- Boas práticas no uso do XPath ou seletores evitando "voltas" desnecessárias +- Legibilidade: se você teve dificuldade para entender algum trecho, verifique se este código pode ser melhorado +- Pense a interação de revisão como uma progressão da evolução da pessoa contribuidora junto ao projeto, dando *feedbacks* como comentários nas linhas necessárias e apontando questões gerais ou reforçando questões pontuais. diff --git a/docs/README-en-US.md b/docs/README-en-US.md index aceb17922..84910fa16 100644 --- a/docs/README-en-US.md +++ b/docs/README-en-US.md @@ -1,118 +1,118 @@ -**English (US)** | [Português (BR)](/docs/README.md) - -

- Querido Diário - -

- -# Querido Diário -Within the [Querido Diário ecosystem](https://github.com/okfn-brasil/querido-diario-comunidade/blob/main/.github/CONTRIBUTING-en-US.md#ecosystem), this repository is responsible for **scraping official gazettes publishing sites** - -Find out more about [technologies](https://queridodiario.ok.org.br/tecnologia) and [history](https://queridodiario.ok.org.br/sobre) of the project on the [Querido Diário website](https://queridodiario.ok.org.br) - -# Summary -- [How to contribute](#how-to-contribute) -- [Development Environment](#development-environment) -- [How to run](#how-to-run) -- [Troubleshooting](#troubleshooting) -- [Support](#support) -- [Thanks](#thanks) -- [Open Knowledge Brazil](#open-knowledge-brazil) -- [License](#license) - -# How to contribute -

- - catarse - -

- -Thank you for considering contributing to Querido Diário! :tada: - -You can find how to do it at [CONTRIBUTING-en-US.md](/docs/CONTRIBUTING-en-US.md)! - -Also, check the [Querido Diário documentation](https://docs.queridodiario.ok.org.br/pt/latest/index.html) to help you. - -# Development Environment -You need to have [Python](https://docs.python.org/3/) (+3.0) and [Scrapy](https://scrapy.org) framework installed. - -The commands below set it up in Linux operating system. They consist of creating a [virtual Python environment](https://docs.python.org/3/library/venv.html), installing the requirements listed in `requirements-dev` and the code standardization tool `pre-commit`. - -``` console -python3 -m venv .venv -source .venv/bin/activate -pip install -r data_collection/requirements-dev.txt -pre-commit install -``` - -> Configuration on other operating systems is available at ["how to setup the development environment"](/docs/CONTRIBUTING-en-US.md#how-to-setup-the-development-environment), including more details for those who want to contribute to the repository. - -# How to run -To try running a scraper already integrated into the project or to test what you are developing, follow the commands: - -1. If you haven't already done so, activate the virtual environment in the `/querido-diario` directory: -``` console -source .venv/bin/activate -``` -2. Go to the `data_collection` directory: -```console -cd data_collection -``` -3. Check the available scrapers list: -```console -scrapy list -``` -4. Run a listed scraper: -```console -scrapy crawl //example: scrapy crawl ba_acajutiba -``` -5. The official gazettes collected from scraping will be saved in the `data_collection/data` folder - -6. When executing item 4, the scraper will collect all official gazettes from the publishing site of that municipality since the first digital edition. For smaller runs, use flags in the run command: - -- `start_date=YYYY-MM-DD`: will set the collecting start date. 
-```console -scrapy crawl -a start_date= -``` -- `end_date=YYYY-MM-DD`: will set the collecting end date. If omitted, it will assume the date of the day it is being executed. -```console -scrapy crawl -a end_date= -``` - -# Troubleshooting -Check out the [troubleshooting](/docs/TROUBLESHOOTING-en-US.md) file to resolve the most common issues with project environment setup. - -# Support -

- - Discord Invite - -

- -Join our [community server](https://go.ok.org.br/discord) for exchanges about projects, questions, requests for help with contributions and talk about civic innovation in general. - -# Thanks -This project is maintained by Open Knowledge Brazil and made possible thanks to the technical communities, the [Ambassadors of Civic Innovation](https://embaixadoras.ok.org.br/), volunteers and financial donors, in addition to partner universities, companies supporters and funders. - -Meet [who supports Querido Diario](https://queridodiario.ok.org.br/apoie#quem-apoia). - -# Open Knowledge Brazil -

- - Twitter Follow - - - Instagram Follow - - - LinkedIn Follow - -

- -[Open Knowledge Brazil](https://ok.org.br/) is a non-profit civil society organization whose mission is to use and develop civic tools, projects, public policy analysis and data journalism to promote free knowledge in the various fields of society. - -All work produced by OKBR is openly and freely available. - -# License - -Code licensed under the [MIT License](/LICENSE.md). \ No newline at end of file +**English (US)** | [Português (BR)](/docs/README.md) + +

+ Querido Diário + +

+ +# Querido Diário +Within the [Querido Diário ecosystem](https://github.com/okfn-brasil/querido-diario-comunidade/blob/main/.github/CONTRIBUTING-en-US.md#ecosystem), this repository is responsible for **scraping the sites that publish official gazettes**. + +Learn more about the project's [technologies](https://queridodiario.ok.org.br/tecnologia) and [history](https://queridodiario.ok.org.br/sobre) on the [Querido Diário website](https://queridodiario.ok.org.br). + +# Summary +- [How to contribute](#how-to-contribute) +- [Development Environment](#development-environment) +- [How to run](#how-to-run) +- [Troubleshooting](#troubleshooting) +- [Support](#support) +- [Thanks](#thanks) +- [Open Knowledge Brazil](#open-knowledge-brazil) +- [License](#license) + +# How to contribute +

+ + catarse + +

+ +Thank you for considering contributing to Querido Diário! :tada: + +You can find how to do it at [CONTRIBUTING-en-US.md](/docs/CONTRIBUTING-en-US.md)! + +Also, check the [Querido Diário documentation](https://docs.queridodiario.ok.org.br/en/latest/) to help you. + +# Development Environment +You need to have [Python](https://docs.python.org/3/) (3.0+) and the [Scrapy](https://scrapy.org) framework installed. + +The commands below set it up on Linux. They create a [virtual Python environment](https://docs.python.org/3/library/venv.html), install the requirements listed in `requirements-dev`, and install the code standardization tool `pre-commit`. + +```console +python3 -m venv .venv +source .venv/bin/activate +pip install -r data_collection/requirements-dev.txt +pre-commit install +``` + +> Configuration on other operating systems is available at ["how to setup the development environment"](/docs/CONTRIBUTING-en-US.md#how-to-setup-the-development-environment), including more details for those who want to contribute to the repository. + +# How to run +To try running a scraper already integrated into the project, or to test the one you are developing, follow these commands: + +1. If you haven't already done so, activate the virtual environment in the `/querido-diario` directory: +```console +source .venv/bin/activate +``` +2. Go to the `data_collection` directory: +```console +cd data_collection +``` +3. Check the list of available scrapers: +```console +scrapy list +``` +4. Run a listed scraper: +```console +scrapy crawl <spider_name>  # example: scrapy crawl ba_acajutiba +``` +5. The official gazettes collected by the scraper will be saved in the `data_collection/data` folder. + +6. When executing step 4, the scraper collects all official gazettes from that municipality's publishing site since its first digital edition. For smaller runs, use flags in the run command: + +- `start_date=YYYY-MM-DD`: sets the collection start date.
+```console +scrapy crawl <spider_name> -a start_date=YYYY-MM-DD +``` +- `end_date=YYYY-MM-DD`: sets the collection end date. If omitted, it defaults to the date of execution. +```console +scrapy crawl <spider_name> -a end_date=YYYY-MM-DD +``` + +# Troubleshooting +Check out the [troubleshooting](/docs/TROUBLESHOOTING-en-US.md) file to resolve the most common issues with project environment setup. + +# Support +

+ + Discord Invite + +

+ +Join our [community server](https://go.ok.org.br/discord) to discuss projects, ask questions, request help with contributions, and talk about civic innovation in general. + +# Thanks +This project is maintained by Open Knowledge Brazil and made possible by technical communities, the [Ambassadors of Civic Innovation](https://embaixadoras.ok.org.br/), volunteers, and financial donors, as well as partner universities, supporting companies, and funders. + +See [who supports Querido Diário](https://queridodiario.ok.org.br/apoie#quem-apoia). + +# Open Knowledge Brazil +

+ + Twitter Follow + + + Instagram Follow + + + LinkedIn Follow + +

+ +[Open Knowledge Brazil](https://ok.org.br/) is a non-profit civil society organization whose mission is to use and develop civic tools, projects, public policy analysis and data journalism to promote free knowledge in the various fields of society. + +All work produced by OKBR is openly and freely available. + +# License + +Code licensed under the [MIT License](/LICENSE.md). diff --git a/docs/README.md b/docs/README.md index d8db36e80..cffd41a02 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,4 +1,4 @@ -**Português (BR)** | [English (US)](/docs/README-en-US.md) +**Português (BR)** | [English (US)](/docs/README-en-US.md)

Querido Diário @@ -13,6 +13,7 @@ Conheça mais sobre as [tecnologias](https://queridodiario.ok.org.br/tecnologia) # Sumário - [Como contribuir](#como-contribuir) - [Ambiente de desenvolvimento](#ambiente-de-desenvolvimento) +- [Template para raspadores](#template-para-raspadores) - [Como executar](#como-executar) - [Dicas de execução](#dicas-de-execução) - [Solução de problemas](#solução-de-problemas) @@ -32,7 +33,7 @@ Agradecemos por considerar contribuir com o Querido Diário! :tada: Você encontra como fazê-lo no [CONTRIBUTING.md](/docs/CONTRIBUTING.md)! -Além disso, consulte a [documentação do Querido Diário](https://docs.queridodiario.ok.org.br/pt/latest/index.html) para te ajudar. +Além disso, consulte a [documentação do Querido Diário](https://docs.queridodiario.ok.org.br/pt-br/latest/) para te ajudar. # Ambiente de desenvolvimento Você precisa ter [Python](https://docs.python.org/3/) (+3.0) e o framework [Scrapy](https://scrapy.org) instalados. @@ -48,6 +49,21 @@ pre-commit install > A configuração em outros sistemas operacionais está disponível em ["como configurar o ambiente de desenvolvimento"](/docs/CONTRIBUTING.md#como-configurar-o-ambiente-de-desenvolvimento), incluindo mais detalhes para quem deseja contribuir com o desenvolvimento do repositório. +# Template para raspadores + +Em vez de começar um arquivo de raspador do zero, você pode inicializar um arquivo de código de raspador já no padrão do Querido Diário a partir de um template. Para isso, faça: + +1. Vá para o diretório `data_collection`: +```console +cd data_collection +``` +2. Acione o template: +```console +scrapy genspider -t qdtemplate <uf_nome_do_municipio> <url_do_site> +``` + +Um arquivo `uf_nome_do_municipio.py` será criado no diretório `spiders`, com alguns campos já preenchidos. Como o diretório é organizado por UF, lembre-se de mover o arquivo para o subdiretório adequado.
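Os raspadores do projeto trabalham com um intervalo de datas (`start_date`/`end_date`) para delimitar a coleta. A título de ilustração, segue um esboço hipotético dessa lógica de conversão e recorte de datas, usando apenas a biblioteca padrão (os nomes `converte_data` e `no_intervalo` são ilustrativos e não fazem parte do projeto):

```python
from datetime import date, datetime


def converte_data(valor: str) -> date:
    # Converte o argumento no formato "AAAA-MM-DD" (ex.: -a start_date=2020-01-01) em date
    return datetime.strptime(valor, "%Y-%m-%d").date()


def no_intervalo(edicao: date, inicio: date, fim: date) -> bool:
    # O raspador só coleta edições do diário dentro do intervalo pedido
    return inicio <= edicao <= fim


inicio = converte_data("2020-01-01")
fim = converte_data("2020-12-31")
print(no_intervalo(date(2020, 6, 15), inicio, fim))  # True
print(no_intervalo(date(2021, 1, 1), inicio, fim))  # False
```

O atributo de classe `start_date` de cada raspador funciona como o limite inferior padrão desse recorte.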
+ # Como executar Para experimentar a execução de um raspador já integrado ao projeto ou testar o que esteja desenvolvendo, siga os comandos: @@ -86,6 +102,7 @@ scrapy crawl <nome_do_raspador> -a end_date= * **Arquivo de log** É possível enviar o log da raspagem para um arquivo ao invés de deixá-lo no terminal. Isto é particularmente útil quando se desenvolve um raspador que apresenta problemas e você quer enviar o arquivo de log no seu PR para obter ajuda. Para isso, use a flag de configuração `-s` seguida de: + `LOG_FILE=log_<nome_do_raspador>.txt`: definirá o arquivo para armazenar as mensagens de log. ```console scrapy crawl <nome_do_raspador> -s LOG_FILE=log_<nome_do_raspador>.txt @@ -133,4 +150,4 @@ Todo o trabalho produzido pela OKBR está disponível livremente. # Licença -Código licenciado sob a [Licença MIT](LICENSE.md). \ No newline at end of file +Código licenciado sob a [Licença MIT](LICENSE.md). diff --git a/docs/TROUBLESHOOTING-en-US.md b/docs/TROUBLESHOOTING-en-US.md index 803cedc2c..0aa8c0a52 100644 --- a/docs/TROUBLESHOOTING-en-US.md +++ b/docs/TROUBLESHOOTING-en-US.md @@ -14,4 +14,28 @@ module.c:1:10: fatal error: Python.h: No such file or directory error: command 'x86_64-linux-gnu-gcc' failed with exit status 1 ``` -Please try to install `python3-dev`. E.g. via `apt install python3-dev`, if you are using a Debian-like distro, or use your distro manager package. Make sure that you use the correct version (e.g. `python3.6-dev` or `python3.7-dev`). You can check your version via `python3 --version`. \ No newline at end of file +Please try to install `python3-dev`, e.g. via `apt install python3-dev` if you are using a Debian-like distro, or use your distro's package manager. Make sure that you use the correct version (e.g. `python3.6-dev` or `python3.7-dev`). You can check your version via `python3 --version`. + +## Error `pinned with ==` + +While installing the requirements with `pip`, an inexact pinning error may appear.
If this happens, use the `--no-deps` flag when installing: + +~~~console +pip install -r data_collection/requirements-dev.txt --no-deps +~~~ + +## Error `legacy-install` + +While installing packages (in a WSL terminal, for example), a `legacy-install failure` error like the one below may occur. + +``` +error: legacy-install failure +error: command 'x86_64-linux-gnu-gcc' failed: No such file or directory +``` + +To fix it, upgrade `pip` and install some essential Linux build libraries: + +~~~console +python3 -m pip install --upgrade pip +sudo apt-get install build-essential libssl-dev libffi-dev python3-dev +~~~ diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md index 36e965741..8450b83d3 100644 --- a/docs/TROUBLESHOOTING.md +++ b/docs/TROUBLESHOOTING.md @@ -13,4 +13,28 @@ module.c:1:10: fatal error: Python.h: No such file or directory compilation terminated. error: command 'x86_64-linux-gnu-gcc' failed with exit status 1 ``` -Tente instalar `python3-dev`. Por exemplo, via `apt install python3-dev`, se você está usando uma distro Debian, ou utilize o gerenciamento de pacotes da sua distro (por exemplo, `python3.6-dev` or `python3.7-dev`). Você pode saber qual é a sua versão via `python3 --version`. \ No newline at end of file +Tente instalar `python3-dev`. Por exemplo, via `apt install python3-dev`, se você está usando uma distro Debian, ou utilize o gerenciador de pacotes da sua distro (por exemplo, `python3.6-dev` ou `python3.7-dev`). Você pode saber qual é a sua versão via `python3 --version`.
+ +## Erro `pinned with ==` + +Ao instalar os requirements com `pip`, pode ocorrer um erro de fixação inexata; nesse caso, utilize a flag `--no-deps` ao instalar: + +~~~console +pip install -r data_collection/requirements-dev.txt --no-deps +~~~ + +## Erro `legacy-install` + +Ao instalar bibliotecas, pode ocorrer o seguinte erro no seu terminal WSL: + +``` +error: legacy-install failure +error: command 'x86_64-linux-gnu-gcc' failed: No such file or directory +``` + +Então faça o upgrade do pip e instale algumas bibliotecas essenciais do Linux: + +~~~console +python3 -m pip install --upgrade pip +sudo apt-get install build-essential libssl-dev libffi-dev python3-dev +~~~ diff --git a/scripts/enabled_spiders.py b/scripts/enabled_spiders.py deleted file mode 100644 index 1da06e38f..000000000 --- a/scripts/enabled_spiders.py +++ /dev/null @@ -1,152 +0,0 @@ -# List of Spiders that are enabled to be executed -# automatically in production -SPIDERS = [ - "al_maceio", - "am_manaus", - "ap_macapa", - "ap_santana", - "ba_acajutiba", - "ba_alagoinhas", - "ba_barreiras", - "ba_campo_formoso", - "ba_canudos", - "ba_feira_de_santana", - "ba_itapetinga", - "ba_juazeiro", - "ba_mascote", - "ba_prado", - "ba_salvador", - "ba_santo_estevao", - "ba_senhor_do_bonfim", - "ba_teolandia", - "ba_tucano", - "ce_horizonte", - "ce_sobral", - "df_brasilia", - "es_serra", - "es_vila_velha", - "go_aparecida_de_goiania", - "go_goiania", - "ma_afonso_cunha", - "ma_aldeias_altas", - "ma_axixa", - "ma_bacuri", - "ma_bacurituba", - "ma_boa_vista_do_gurupi", - "ma_caxias", - "ma_centro_do_guilherme", - "ma_codo", - "ma_coroata", - "ma_duque_bacelar", - "ma_feira_nova_do_maranhao", - "ma_maranhaozinho", - "ma_milagres_do_maranhao", - "ma_nina_rodrigues", - "ma_santo_antonio_dos_lopes", - "ma_sao_jose_dos_basilios", - "ma_sao_vicente_ferrer", - "ma_viana", - "ma_ze_doca", - "mg_belo_horizonte", - "mg_betim", - "mg_campo_belo", - "mg_candeias", - "mg_carmo_da_cachoeira", - "mg_contagem", - "mg_crucilandia", -
"mg_itajuba", - "mg_itauna", - "mg_nova_serrana", - "mg_piranguinho", - "mg_salinas", - "mg_taiobeiras", - "mg_uberaba", - "mg_varzea_da_palma", - "ms_campo_grande", - "ms_inocencia", - "ms_maracaju", - "mt_cuiaba", - "mt_rondonopolis", - "pa_belem", - "pa_santana_do_araguaia", - "pb_joao_pessoa", - "pe_cabrobo", - "pe_jaboatao_dos_guararapes", - "pe_petrolina", - "pe_recife_2020", - "pi_teresina", - "pr_cafelandia", - "pr_curitiba", - "pr_jaboti", - "pr_londrina", - "pr_maringa", - "pr_sao_mateus_do_sul", - "rj_arraial_do_cabo", - "rj_belford_roxo", - "rj_nova_iguacu", - "rj_rio_de_janeiro", - "rn_natal", - "rr_boa_vista", - "rs_caxias_do_sul", - "rs_cerrito", - "rs_porto_alegre", - "rs_vera_cruz", - "sc_florianopolis", - "sc_joinville", - "se_nossa_senhora_do_socorro", - "sp_adolfo", - "sp_alto_alegre", - "sp_aracariguama", - "sp_aracatuba", - "sp_avare", - "sp_barao_de_antonina", - "sp_birigui", - "sp_braganca_paulista", - "sp_campinas", - "sp_campo_limpo_paulista", - "sp_catanduva", - "sp_coronel_macedo", - "sp_glicerio", - "sp_guaracai", - "sp_guarulhos", - "sp_ibitinga", - "sp_itapeva", - "sp_itapevi", - "sp_itapirapua_paulista", - "sp_jaboticabal", - "sp_jandira", - "sp_itu", - "sp_jaboticabal", - "sp_jau_2023", - "sp_jundiai", - "sp_lavinia", - "sp_marilia", - "sp_monte_alto_2017", - "sp_mogi_guacu", - "sp_osasco", - "sp_parisi", - "sp_patrocinio_paulista", - "sp_paulinia", - "sp_penapolis", - "sp_piedade", - "sp_pratania", - "sp_rio_claro", - "sp_santa_ernestina", - "sp_salto", - "sp_santo_andre", - "sp_santos", - "sp_sao_bernardo_do_campo", - "sp_sao_manuel", - "sp_sao_roque", - "sp_sarutaia", - "sp_sertaozinho", - "sp_sumare", - "sp_valinhos", - "sp_vera_cruz", - "sp_vinhedo", - "sp_votorantim", - "sp_votuporanga", - "to_araguaina", - "to_gurupi", - "to_palmas", -]
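The removed `SPIDERS` allow-list above was maintained by hand, and hand-kept lists drift: note that it contains `sp_jaboticabal` twice. A small stdlib sketch (illustrative only, not part of the project) of a duplicate check for such a list:

```python
# Report names that appear more than once in a spider allow-list
from collections import Counter


def duplicated_spiders(spiders):
    return sorted(name for name, count in Counter(spiders).items() if count > 1)


# Excerpt from the removed list above, which repeats "sp_jaboticabal"
excerpt = ["sp_jaboticabal", "sp_jandira", "sp_itu", "sp_jaboticabal", "sp_jau_2023"]
print(duplicated_spiders(excerpt))  # ['sp_jaboticabal']
```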