Identify *novusagenda.com subdomains #55

Open
3 tasks
DiPierro opened this issue Dec 8, 2020 · 0 comments
Labels: help wanted, research

DiPierro commented Dec 8, 2020

A number of local governments in the Bay Area and in other parts of the country post their meeting minutes, agendas, and related documents on subdomains of novusagenda.com (*.novusagenda.com). These websites typically look something like this or this and follow the web address convention PLACE.novusagenda.com/agendapublic, where PLACE is a custom field that varies from site to site.
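To make the address convention concrete, here is a minimal Python sketch that builds the agenda URL for a given PLACE and recovers PLACE from a URL. The function names are hypothetical, and the `https` scheme is an assumption; the issue only specifies the host and path pattern.

```python
from typing import Optional
from urllib.parse import urlparse

BASE_DOMAIN = "novusagenda.com"


def agenda_url(place: str) -> str:
    """Build the public agenda URL for a PLACE label (https is assumed)."""
    return f"https://{place.strip().lower()}.{BASE_DOMAIN}/agendapublic"


def place_from_url(url: str) -> Optional[str]:
    """Return the PLACE part of a *.novusagenda.com URL, or None if it does not match."""
    # urlparse only fills in the hostname when a scheme or a leading "//" is present.
    if "//" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    return host[: -len(suffix)] or None
```

For example, `place_from_url("example.novusagenda.com/agendapublic")` returns `"example"`, where `example` is a made-up PLACE value rather than a known site.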

Your task is to compile a list of as many *novusagenda.com subdomains as you can find. This will allow us to evaluate how many government agencies are using this website format, which, in turn, will help us to decide which scrapers to build next.

In the past, we have found that this subdomain-enumerating search engine is the easiest and most comprehensive way to compile lists of subdomains. (Note that you may need to set up an account to unlock all of the search features on this website.) However, there are many other ways to find subdomains, including advanced Google searches and certain pen-testing Python libraries and command-line utilities (see nmmapper.com for a few examples), and we encourage you to be creative.
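Whichever tool you use, its output usually ends up as raw text (search results, exported reports, command-line logs). As one possible way to process it, the sketch below pulls every *.novusagenda.com hostname out of a block of text with a regular expression. The function name and the pattern are illustrative assumptions, not part of any existing tool.

```python
import re

# One or more DNS labels followed by ".novusagenda.com". The lookarounds
# reject lookalikes such as "notnovusagenda.com" or "x.novusagenda.com.evil.org".
SUBDOMAIN_RE = re.compile(
    r"(?<![A-Za-z0-9.-])"
    r"((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+novusagenda\.com)"
    r"(?![A-Za-z0-9-])(?!\.[A-Za-z0-9])",
    re.IGNORECASE,
)


def extract_subdomains(text: str) -> list:
    """Return the sorted, lowercased, de-duplicated subdomains found in text."""
    return sorted({match.group(1).lower() for match in SUBDOMAIN_RE.finditer(text)})
```

The bare domain and the wildcard form `*.novusagenda.com` are skipped on purpose, since neither names a specific site.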

To complete this task, please do the following:

  • Create a Google Sheet with a single column, where each row is a unique *novusagenda.com subdomain. Be sure to change the sharing settings so that this sheet is public to anyone with the link.
  • Paste a link to your *novusagenda.com Google Sheet into the sites_sheet field of this spreadsheet for the row where the short_name is "novusagenda".
  • Write a brief reply to this issue documenting the process you used to identify subdomains so that we can continue to develop best practices.
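If you gather candidates from several sources, the single-column, unique-rows requirement can be met with a small clean-up step before the list goes into the sheet. The following is a sketch under assumptions: the function name is hypothetical, no header row is written, and the result is plain CSV text that can then be imported into a Google Sheet.

```python
import csv
import io


def to_single_column_csv(hosts) -> str:
    """Normalize hostnames, drop blanks and duplicates, and return one-column CSV text."""
    # Lowercase, trim whitespace, and strip any trailing dot so duplicates collapse.
    unique = sorted({h.strip().lower().rstrip(".") for h in hosts if h.strip()})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for host in unique:
        writer.writerow([host])  # one subdomain per row, no header
    return buffer.getvalue()
```

Sorting the output is a design choice that keeps the sheet stable between runs, which makes it easier to see what changed when new subdomains are added.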