-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Geoip processor update #11854
Geoip processor update #11854
Conversation
…e datatabase types/providers--sans IP validation
The back-end support is largely complete, though needs some polishing up. I discussed the GUI changes with @mikedklein, and we were thinking of something as in the following illustration: The Database Providers would be MaxMind and IPInfo, which correspond with the prefixes in the DatabaseType enum set. The caveat with this approach, which makes the workflow a bit unnatural (at the code level) is that, the city_db_type and asn_db_type fields in the config object would have to be updated dynamically when a database provider selection is made (e.g. When MaxMind is selected the city_db_type and asn_db_type will get values MAXMIND_CITY and MAXMIND_ASN). When updating the model, current selection for the Database Provider based on the *_db_type fields in the config object. Alternatively, we can remove the 'db_type' concept and instead create a database_provider field. However, this will require a substantially disruptive change set, including database migrations (which may be just fine if you think it is worth it). As is, I can switch from MaxMind to IPInfo directly via the endpoint, and the process works as expected (the processor uses the respective City/ASN databases). |
…dividual db types
…abase_vendor_type' for GeoIpResolverConfig. Updated GeoIpResolverFactory to be non-singleton.
graylog2-server/src/main/java/org/graylog/plugins/map/geoip/GeoIpResolverFactory.java
Outdated
Show resolved
Hide resolved
graylog2-server/src/main/java/org/graylog/plugins/map/geoip/MaxMindIpResolver.java
Show resolved
Hide resolved
… provided (i.e. this is optional). Updated GeoIpResolverEngine to check if IP address is reserved
… confirm it is a valid db file).
…o do query validation for the selected db/resolver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few questions and suggestions below.
In your comment you mentioned the management of the database reader.
How is lifecycle management handled in general for the objects that read from a database file?
I don't see any code that "restarts" the objects once a user changes the configuration in the web interface. So it looks like any change to the resolver config currently requires a restart of the server.
Everything that is configured in the web interface shouldn't require a restart to get applied.
Lookup tables, for example, are services with lifecycle management that will be restarted once their config gets updated. (that includes closing file handles to databases, etc. - see: MaxmindDataAdapter)
To make the runtime changes work, we would need to implement something similar for the geo IP resolver engine.
And this raises the question of what use case or problem we need to solve triggered these changes in the first place? 😅
If we add lifecycle handling to the geo IP resolver engine, we basically reimplement parts of lookup tables.
So I think we should take a step back and talk about the problem we want to solve. /cc @waab76 @bud1979
graylog2-server/src/main/java/org/graylog/plugins/map/geoip/GeoIpResolverEngine.java
Show resolved
Hide resolved
graylog2-server/src/main/java/org/graylog2/rest/resources/system/ClusterConfigResource.java
Outdated
Show resolved
Hide resolved
graylog2-server/src/main/java/org/graylog2/rest/resources/system/ClusterConfigResource.java
Outdated
Show resolved
Hide resolved
graylog2-server/src/main/java/org/graylog/plugins/map/geoip/GeoIpResolverEngine.java
Outdated
Show resolved
Hide resolved
...rver/src/main/java/org/graylog2/migrations/V20211221144300_GeoIpResolverConfigMigration.java
Outdated
Show resolved
Hide resolved
…aylog Schema--if true, IP address fields to search will be limited to the preset list.
graylog2-web-interface/src/components/maps/configurations/GeoIpResolverConfig.jsx
Outdated
Show resolved
Hide resolved
…set default vendor type after renaming old column
…set default vendor after having renamed field in separate update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionality looks good and previously enabled geolocation processing continues to work after upgrading.
I am reviewing this now, and should be done today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roberto-graylog I am still reviewing, but I noticed one issue I wanted to share:
It looks like there is an order-of-operation issue that only shows itself on the first startup. The org.graylog.plugins.map.geoip.processor.GeoIpProcessor
starts up before the migration runs to migrate the Cluster config data.
The result is the following error only on the first startup (not sure if this is a race condition that only I am seeing) and the GeoProcessor won't run if it was previously enabled halting GeoProcessing. Subsequent restarts of the GeoIpProcessor are successful because the migration has run by then. I think we will need to find a way for the processor to wait on startup until after the migrations have run. I believe @mpfz0r recently added some preflight checks to Graylog when it starts, I am not sure if something there could help. Any suggestions @mpfz0r?
Old DB config that is present when the GeoIpProcessor starts up the first time:
Final and correct config after the migration runs:
{
"_id" : ObjectId("61eaae5b871ccb599da8d859"),
"type" : "org.graylog.plugins.map.config.GeoIpResolverConfig",
"payload" : {
"enabled" : true,
"asn_db_path" : "",
"city_db_path" : "/Users/danieltorrey/maxmind/GeoLite2-City.mmdb",
"db_vendor_type" : "MAXMIND",
"enforce_graylog_schema" : false
},
"last_updated" : ISODate("2022-01-21T13:00:11.966Z"),
"last_updated_by" : "e065896b-8a9a-4f45-83f2-e740525ed035"
}
A bit more info: The graylog2-server/graylog2-server/src/main/java/org/graylog2/bootstrap/CmdLineTool.java Line 294 in ec53025
And later, migrations are run during the graylog2-server/graylog2-server/src/main/java/org/graylog2/bootstrap/CmdLineTool.java Line 311 in ec53025
graylog2-server/graylog2-server/src/main/java/org/graylog2/bootstrap/ServerBootstrap.java Line 186 in ec53025
The root of the problem is that message processing objects are initialized at the time when injection is set up. There appears to be no state control that waits for Graylog startup operations to be complete first. Other than a larger reworking, can anyone think of any solutions or workarounds for this PR? @mpfz0r @bernd @kingzacko1? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roberto-graylog I've finished reviewing the rest of the code. Excellent work! I really like the implementation, structure, and all of the error handling. Seems really solid to me.
I only left a few very small comments here. My only remaining concern with this PR is the initial start error and failure of the GeoLocation engine to start. Once a solution is found for that, I think this is ready for merging.
graylog2-server/src/main/java/org/graylog/plugins/map/config/GeoIpResolverConfig.java
Show resolved
Hide resolved
graylog2-server/src/main/java/org/graylog/plugins/map/config/GeoIpResolverConfig.java
Show resolved
Hide resolved
...g2-server/src/main/java/org/graylog2/rest/resources/system/GeoIpResolverConfigValidator.java
Show resolved
Hide resolved
graylog2-server/src/main/java/org/graylog2/rest/resources/system/ClusterConfigResource.java
Show resolved
Hide resolved
Thanks @danotorrey , this was a great catch--I completely missed it. The migration runner requires the injector, not I cannot see at the moment how we an run it before the processor gets injected. I will continue to look and let you know if I find something. I attempted to run migrations on beforeInjectorCreation with the pre-flight injector, but that just seems to have angered it--perhaps i am just missing module configurations. |
@roberto-graylog I just thought of one possible solution that might be simple. While the One lazy init option is to use the Apache Commons Lazy Initializer. Moving the UPDATE |
@danotorrey @roberto-graylog How about doing something like the following? We can wait with the engine's initialization until the server is in the diff --git graylog2-server/src/main/java/org/graylog/plugins/map/geoip/processor/GeoIpProcessor.java graylog2-server/src/main/java/org/graylog/plugins/map/geoip/processor/GeoIpProcessor.java
index cacad4bf6d..72e9109cd3 100644
--- graylog2-server/src/main/java/org/graylog/plugins/map/geoip/processor/GeoIpProcessor.java
+++ graylog2-server/src/main/java/org/graylog/plugins/map/geoip/processor/GeoIpProcessor.java
@@ -25,7 +25,9 @@ import org.graylog.plugins.map.geoip.GeoIpVendorResolverService;
import org.graylog2.cluster.ClusterConfigChangedEvent;
import org.graylog2.plugin.Message;
import org.graylog2.plugin.Messages;
+import org.graylog2.plugin.ServerStatus;
import org.graylog2.plugin.cluster.ClusterConfigService;
+import org.graylog2.plugin.lifecycles.Lifecycle;
import org.graylog2.plugin.messageprocessors.MessageProcessor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -55,32 +57,38 @@ public class GeoIpProcessor implements MessageProcessor {
private final ScheduledExecutorService scheduler;
private final MetricRegistry metricRegistry;
private final GeoIpVendorResolverService geoIpVendorResolverService;
+ private final ServerStatus serverStatus;
- private final AtomicReference<GeoIpResolverConfig> config;
- private final AtomicReference<GeoIpResolverEngine> filterEngine;
+ private final AtomicReference<GeoIpResolverEngine> filterEngine = new AtomicReference<>(null);
@Inject
public GeoIpProcessor(ClusterConfigService clusterConfigService,
@Named("daemonScheduler") ScheduledExecutorService scheduler,
EventBus eventBus,
MetricRegistry metricRegistry,
- GeoIpVendorResolverService geoIpVendorResolverService) {
+ GeoIpVendorResolverService geoIpVendorResolverService,
+ ServerStatus serverStatus) {
this.clusterConfigService = clusterConfigService;
this.scheduler = scheduler;
this.metricRegistry = metricRegistry;
this.geoIpVendorResolverService = geoIpVendorResolverService;
-
- final GeoIpResolverConfig config = clusterConfigService.getOrDefault(GeoIpResolverConfig.class,
- GeoIpResolverConfig.defaultConfig());
-
- this.config = new AtomicReference<>(config);
- this.filterEngine = new AtomicReference<>(new GeoIpResolverEngine(geoIpVendorResolverService, config, metricRegistry));
+ this.serverStatus = serverStatus;
eventBus.register(this);
}
@Override
public Messages process(Messages messages) {
+ // Lazily initialize the filter engine because we might not get the correct resolver config until the
+ // server is running. (until migrations are done)
+ if (filterEngine.get() == null) {
+ if (serverStatus.getLifecycle() == Lifecycle.RUNNING) {
+ reload();
+ } else {
+ return messages;
+ }
+ }
+
for (Message message : messages) {
filterEngine.get().filter(message);
}
@@ -103,7 +111,6 @@ public class GeoIpProcessor implements MessageProcessor {
GeoIpResolverConfig.defaultConfig());
LOG.info("Updating GeoIP resolver engine - {}", newConfig);
- config.set(newConfig);
filterEngine.set(new GeoIpResolverEngine(geoIpVendorResolverService, newConfig, metricRegistry));
}
} |
@bernd I like the thinking here. Great idea to use the server status to identify the running server state. How about using
|
@danotorrey Good point! 👍 I was a bit reluctant to use |
…ime the process(...) message is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @roberto-graylog. The latest update fixed the error on first start up 🎉. Well done!
Do you think the info
log messages when the processor engine starts up and restarts can be reduced to debug
level? currently, the following messages are logging for every processor (4 by default) and could be even more for customer configurations. Perhaps adding started/starting info
level messages to GeoIpProcessor
instead would be better, so that they only display once time.
2022-01-26 09:23:12,104 INFO : org.graylog.plugins.map.geoip.processor.GeoIpProcessor - Updating GeoIP resolver engine - GeoIpResolverConfig{enabled=true, enforceGraylogSchema=false, databaseVendorType=MAXMIND, cityDbPath=/Users/danieltorrey/maxmind/GeoLite2-City.mmdb, asnDbPath=/Users/danieltorrey/maxmind/GeoLite2-ASN.mmdb}
2022-01-26 09:23:12,104 INFO : org.graylog.plugins.map.geoip.processor.GeoIpProcessor - Updating GeoIP resolver engine - GeoIpResolverConfig{enabled=true, enforceGraylogSchema=false, databaseVendorType=MAXMIND, cityDbPath=/Users/danieltorrey/maxmind/GeoLite2-City.mmdb, asnDbPath=/Users/danieltorrey/maxmind/GeoLite2-ASN.mmdb}
2022-01-26 09:23:12,104 INFO : org.graylog.plugins.map.geoip.processor.GeoIpProcessor - Updating GeoIP resolver engine - GeoIpResolverConfig{enabled=true, enforceGraylogSchema=false, databaseVendorType=MAXMIND, cityDbPath=/Users/danieltorrey/maxmind/GeoLite2-City.mmdb, asnDbPath=/Users/danieltorrey/maxmind/GeoLite2-ASN.mmdb}
2022-01-26 09:23:12,104 INFO : org.graylog.plugins.map.geoip.processor.GeoIpProcessor - Updating GeoIP resolver engine - GeoIpResolverConfig{enabled=true, enforceGraylogSchema=false, databaseVendorType=MAXMIND, cityDbPath=/Users/danieltorrey/maxmind/GeoLite2-City.mmdb, asnDbPath=/Users/danieltorrey/maxmind/GeoLite2-ASN.mmdb}
2022-01-26 09:23:12,104 INFO : org.graylog.plugins.map.geoip.processor.GeoIpProcessor - Updating GeoIP resolver engine - GeoIpResolverConfig{enabled=true, enforceGraylogSchema=false, databaseVendorType=MAXMIND, cityDbPath=/Users/danieltorrey/maxmind/GeoLite2-City.mmdb, asnDbPath=/Users/danieltorrey/maxmind/GeoLite2-ASN.mmdb}
2022-01-26 09:23:12,105 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - Created Geo IP Resolvers for 'MAXMIND'
2022-01-26 09:23:12,105 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - 'MaxMindIpLocationResolver' Status Enabled: true
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - Created Geo IP Resolvers for 'MAXMIND'
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - 'MaxMindIpAsnResolver' Status Enabled: true
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - Created Geo IP Resolvers for 'MAXMIND'
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - Created Geo IP Resolvers for 'MAXMIND'
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - Created Geo IP Resolvers for 'MAXMIND'
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - 'MaxMindIpLocationResolver' Status Enabled: true
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - 'MaxMindIpLocationResolver' Status Enabled: true
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - 'MaxMindIpLocationResolver' Status Enabled: true
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - 'MaxMindIpLocationResolver' Status Enabled: true
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - 'MaxMindIpAsnResolver' Status Enabled: true
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - 'MaxMindIpAsnResolver' Status Enabled: true
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - 'MaxMindIpAsnResolver' Status Enabled: true
2022-01-26 09:23:12,106 INFO : org.graylog.plugins.map.geoip.GeoIpResolverEngine - 'MaxMindIpAsnResolver' Status Enabled: true
Otherwise, this PR LGTM!
Thanks @danotorrey . I have made the updates from INFO to DEBUG for those entries. |
Description
This PR addresses issue GeoIP Processor Update
The code changes include:
Motivation and Context
The ability to switch vendors at runtime gives users flexibility, depending on which data subscription they have.
How Has This Been Tested?
Testing includes a unit test to confirm the factory can successfully query the current configuration and create the appropriate location and ASN resolver to get the appropriate location and ASN data. To avoid embedding vendor data, all further testing was performed manually on a local installation.
Screenshots (if appropriate):
N/A
Testing Steps During Review:
The DB migration should the cluster_config collection for type 'org.graylog.plusins.map.config.GeoIpResovlerConfig' to
{ "enabled": true, "enforce_graylog_schema": false, "db_vendor_type": "MAXMIND", "city_db_path": "/etc/graylog/server/GeoLite2-City.mmdb", "asn_db_path": "/" }
Test the Geo-Location Processor updates by navigating to System/Configuration. Test that all the fields are updated correctly AND that when enabled=true the City database is validated (file location as well as database type) and that when an ASN database path is provided, it is also validated. Also test that the vendor type is switched--the server logs indicate that it switched successfully as well as which new type is configured.
With the configuration enabled and enforce_graylog_schema=true , submit a message with fields source_ip,host_ip, and destination_ip, and confirm that the IP addresses in these fields (and only these fields) is handled correctly. Include reserved, public IPv4 addresses as well as an IPv6 address to confirm it is handled correctly. public addresses should produce additional fields with the geo data. Reserved addresses should produce a new field called _reserved_ip with value true. IPv6 addresses will also produced a log entry in the server log. Include IP addresses in any other than the aforementioned fields to confirm only the preset fields were queried.
Update the configuration to enforce_graylog_schema=false and submit a message with IP addresses in fields other than the preset fields--confirm these addresses were handled.
Types of changes
Checklist: