bugs
bug: versionstore is not hooked up properly. See nodestore.nextVersion or something - it needs to come from the versionstore.
queries for topics/producers that are not stored produce 500 errors, including multi-topic requests when some topics are present but not others
duplicate uploads don't duplicate. If I upload the same file twice, I don't get duplicate messages in GetMessages. On one hand this is desired, on the other hand I don't remember implementing it... (Add write-ahead log #5)
output MCAP writer is misassociating schemas when merging across different files, resulting in nil-valued schemas for one of the files (Fixes bug in mcap merge coordinator #6)
Tree merge is erroneously merging areas of the tree that did not change (3e16a2e)
Statistics mishandles NaN and probably infinite-valued floats. Probably the thing to do is to skip these for sum/min/max/mean accounting and maintain NaN/Inf counts as a separate statistic; see the sketch at the end of this section. (1883231)
Tree iterator is building a full list of leaves up front, which is extremely slow on huge result sets. Needs to be incremental. (60685f5)
nondeterministic import failure (nondeterministic failure when querying concurrently with big import #11)
Ensure inner node children in merge are fully cloned (#18)
service does not currently crash on a port conflict (c0440db)
semicolon termination should be in the grammar, not enforced in the client - to enable batched queries (cfc95d8)
local disk storage implementation should write to a tmpfile + rename (Ensure local disk directory store does atomic put #14)
statistics on inner nodes are currently stored per-schema, but some kinds of statistics are difficult for us to get per-schema, such as the compressed size of leaf data. We need to restructure the statistics representation a bit to handle this.
the executor defers initialization of the output writer until a message is successfully pulled, to allow schema conflicts to be surfaced as 400 errors. It needs to write an empty file if the resultset is simply empty. (82dbf9c)
tables response format needs structural cleanup
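For the NaN/Inf statistics item above, a minimal sketch of the accumulator behavior we probably want - the NumStat type and its fields are hypothetical, not the current implementation:

```go
package stats

import "math"

// NumStat accumulates summary statistics for a float field, skipping
// NaN and infinite values in sum/min/max/mean and counting them separately.
type NumStat struct {
	Count    uint64 // finite values observed
	NaNCount uint64 // NaN values observed
	InfCount uint64 // +Inf/-Inf values observed
	Sum      float64
	Min      float64
	Max      float64
}

// Observe folds one value into the accumulator.
func (s *NumStat) Observe(v float64) {
	switch {
	case math.IsNaN(v):
		s.NaNCount++
	case math.IsInf(v, 0):
		s.InfCount++
	default:
		if s.Count == 0 || v < s.Min {
			s.Min = v
		}
		if s.Count == 0 || v > s.Max {
			s.Max = v
		}
		s.Sum += v
		s.Count++
	}
}

// Mean returns the mean of the finite values observed, NaN if none.
func (s *NumStat) Mean() float64 {
	if s.Count == 0 {
		return math.NaN()
	}
	return s.Sum / float64(s.Count)
}
```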
tree methods
delete
return message diff between versions
get statistics since (version) maybe? does that make sense?
testing
service-level integration tests. Test that service restarts don't corrupt the database.
tests for concurrent inserts into tree
duplicate data must be deduplicated (on timestamp and message bytes; see the sketch at the end of this section)
testing for storage with minio
test a huge rootmap
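For the dedup test above, the property to assert is roughly that inserts collapse on a key like this (a hypothetical helper; the real insert path may key differently):

```go
package dedup

import (
	"crypto/sha256"
	"encoding/binary"
)

// key identifies a message by log time and payload bytes; uploading
// the same (timestamp, bytes) pair twice must yield one stored message.
func key(logTime uint64, data []byte) [32]byte {
	buf := make([]byte, 8+len(data))
	binary.BigEndian.PutUint64(buf[:8], logTime)
	copy(buf[8:], data)
	return sha256.Sum256(buf)
}
```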
performance analysis
establish metrics of interest: memory usage during querying and ingestion, tree node sizes, records per second @ various input file dimensions, read throughput, time to first record. These need to be easy to observe through prometheus or something (see the sketch at the end of this section). When we get to evaluation it should focus on minio-backed deployments and be run against both local and remote minio deployments.
effectiveness of write batching
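A minimal sketch of how those metrics might be registered using the standard prometheus Go client (the metric names and prefix are hypothetical):

```go
package metrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// records/sec falls out of a counter observed over time via rate().
	recordsIngested = promauto.NewCounter(prometheus.CounterOpts{
		Name: "db_records_ingested_total",
		Help: "Total records ingested.",
	})
	nodeSizeBytes = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "db_tree_node_size_bytes",
		Help:    "Serialized tree node sizes.",
		Buckets: prometheus.ExponentialBuckets(1024, 4, 10),
	})
	timeToFirstRecord = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "db_query_time_to_first_record_seconds",
		Help:    "Latency from query start to first record out.",
		Buckets: prometheus.DefBuckets,
	})
)

// Serve exposes the default registry on /metrics.
func Serve(addr string) error {
	http.Handle("/metrics", promhttp.Handler())
	return http.ListenAndServe(addr, nil)
}
```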
design questions
leaf nodes are sized by time, not byte size. There is probably an ideal write size for our storage writes. If we get a sample of messages we can attempt to size a tree to hit that, but if the sample is bad we will produce sub-ideal writes, through no fault of the user. We need to implement that sampling mechanism (see the sketch at the end of this section) and also think harder about the general problem. It would be ideal if user write patterns influenced physical storage layout as little as possible (it won't be totally avoidable).
We are starting with ros1msg support because there are a lot of bag files available on the internet that we can use to test with, but ros1msg is not the only recording format used in ros2, or possibly even the most common. To support ros2 we will need to add parsing/statistics support for protobuf, CDR, and flatbuffers. We will need to survey the mcap community to figure out what's highest priority - my guess is CDR will be.
Multiple schemas may be used for a single topic name, particularly over long periods of time as schemas evolve. Is it OK to have multiple schemas in one tree, or do we really need to make trees unique by schema? Nothing in playback breaks due to multiple schemas, but search/statistics features could be complicated (they will be complicated whether there are multiple trees or one). (70905e5)
bidirectional playback protocol - currently the user makes a request and gets a dump of MCAP response data back. If frontend tools could define a contract, would they choose this or something with finer-grained bidirectional controls? If a spec were defined, would FE tooling implement support? This issue seems worth a read/consideration: Allow play/pause/seek control when streaming recorded data through web socket (foxglove/ws-protocol#261)
It would similarly be useful to spike out a variant with parquet (or a columnar format of some kind) in the leaves. We would need to transcode both on the way in and out, which would be a pain, but having some sense of how much would be lost or gained in row-oriented throughput would be useful.
If we want to support dumping messages on a topic irrespective of producer, the current architecture will have a problem doing an in-memory merge join when the number of producers is very large. Very large numbers of producers can happen easily in simulation, if each run ID is treated as a different producer. For heavy analytics use cases we have an answer (use spark or something), but for use cases like viewing logs in a webpage that doesn't work. To support this we will probably need to spill to disk in the executor. I think as long as we can see a path we can defer this for a while.
we use data files numbered with unpadded decimal numbers, but padded numbers could be helpful for lexicographical sorting. Maybe we should be padding our object IDs.
Today our data files are a concatenation of node serializations from leaf to root. If this could itself be packaged as a single MCAP file (maybe with attachments for the inner nodes) this would be a big usability win.
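For the leaf-sizing question above, the sampling mechanism might reduce to arithmetic like this - a sketch, where the target write size and the power-of-two rounding are assumptions rather than settled design:

```go
package sizing

import "time"

// leafSpan picks a leaf time width aiming at a target object size for
// storage writes, given a sample of messages. If the sample is
// unrepresentative the result will be off - hence the open question.
func leafSpan(sampleBytes int, sampleSpan time.Duration, targetBytes int) time.Duration {
	if sampleBytes == 0 || sampleSpan <= 0 {
		return time.Minute // arbitrary fallback width
	}
	bytesPerSecond := float64(sampleBytes) / sampleSpan.Seconds()
	span := time.Duration(float64(targetBytes) / bytesPerSecond * float64(time.Second))
	// Round down to a power-of-two number of seconds; assumes tree
	// dimensions are easier to manage when leaf widths nest evenly.
	return roundToPowerOfTwoSeconds(span)
}

func roundToPowerOfTwoSeconds(d time.Duration) time.Duration {
	secs := int64(d.Seconds())
	if secs < 1 {
		return time.Second
	}
	p := int64(1)
	for p*2 <= secs {
		p *= 2
	}
	return time.Duration(p) * time.Second
}
```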
features needed
Currently we flush WAL synchronously with inserts. WAL flushing needs to move to a background thread that intelligently flushes after periods of inactivity on a topic or when size limits are reached. (Add write-ahead log #5)
can we ditch the nodestore staging map if inserts flush to WAL? (Add write-ahead log #5)
data files should be segregated by tree in storage, with a meaningful name (20c80fe)
statrange command should not require start/end (Don't require start/end for statrange command #7)
export command should not require topics. When no topics are supplied it should return all topics. (be61a9b)
inner node serialization should change from JSON to compact binary format. Waiting on full statistics support to gather a better picture of what we will need.
inner nodes should be cached in serialized form, not deserialized
statrange queries currently have a minimum granularity of 60s (I think), below which they produce a 400. We need to extend them to actually look at the message data and produce a correct result.
Switch to 64-bit offsets and lengths in IDs. This costs 8 bytes per ID but will insulate us against gigantic messages (see the ID-layout sketch at the end of this section). (d6ad718)
WAL doesn't garbage collect yet (Add WAL garbage collection #8)
dump command: it should be possible to dump the database to a hive-partitioned directory of MCAP files.
export command shouldn't require a producer. Should be able to return data across all producers.
statistics must be extended to variable-length arrays
currently we route topics to ingestion workers based on a hash of producer/topic to ensure two workers don't process the same topic, but this causes underutilization when a worker gets assigned multiple slow-to-process topics. We should switch to a semaphore-based strategy so other workers can pick up the extra work in this case.
track original import request ID through WAL, and log completion
need API for looking up message definitions by hash, which implies storing them somewhere by hash.
multiple database support. It should be possible to have sim and real-world data segregated on one instance. (54707ef)
Die immediately on second sigint (3e35173)
Statistics should degrade gracefully for encoding formats we don't understand, i.e. still keep message count and byte count, just not field-level stats. This will let non-ros1msg MCAP users get some benefit before we implement full parsers.
can we detect if a CLI user is in vim mode and have vim mode in the client?
make number of concurrent wal files configurable. Today we use just one. We don't want one per producer/topic. But one probably isn't optimal.
playback needs to support a mode where the first message returned on each merged stream is the last one prior to the requested time, within an adjustable time bound, so visualizations can avoid data gradually filtering in.
it should be possible for a database to span different storage buckets. This enables the user to configure retention policies at the bucket level instead of based on an object prefix, which is usually frowned on. It will not be possible for individual trees to span buckets without storing a bit more state in the node IDs (one byte for 128 allowed buckets within a database seems like it would be sufficient and leave us plenty of range for length).
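Tying together the 64-bit offset/length item and the bucket byte from the last item, a node ID might lay out like this - a sketch, with field widths and ordering as assumptions:

```go
package nodestore

import "encoding/binary"

// NodeID locates a serialized node: which bucket and object hold it,
// where it starts, and how long it is. 64-bit offset/length insulate
// against gigantic messages; the bucket byte would let a database span
// multiple storage buckets.
type NodeID struct {
	Bucket uint8  // index into the database's bucket list (assumed field)
	Object uint64 // data file number
	Offset uint64 // byte offset of the node within the object
	Length uint64 // serialized length of the node
}

// MarshalBinary encodes the ID as 25 big-endian bytes.
func (id NodeID) MarshalBinary() []byte {
	buf := make([]byte, 25)
	buf[0] = id.Bucket
	binary.BigEndian.PutUint64(buf[1:9], id.Object)
	binary.BigEndian.PutUint64(buf[9:17], id.Offset)
	binary.BigEndian.PutUint64(buf[17:25], id.Length)
	return buf
}

// UnmarshalNodeID decodes a 25-byte ID.
func UnmarshalNodeID(buf []byte) NodeID {
	return NodeID{
		Bucket: buf[0],
		Object: binary.BigEndian.Uint64(buf[1:9]),
		Offset: binary.BigEndian.Uint64(buf[9:17]),
		Length: binary.BigEndian.Uint64(buf[17:25]),
	}
}
```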
catalog introspection
from within the client:
what producers do I have?
what topics exist for a producer?
what are the message-level stats for each table's root nodes?
what previous versions of a table do I have, dated and numbered?
what schema(s) are associated with a topic?
what fields on a topic can be queried?
eventually - what databases do I have?
community
present @ foxglove community meetup
present @ foxglove community meetup 1 mo followup
project logo
performance evaluation
is there MCAP or bag data at Berkeley we could load up for an evaluation at the end?
establish benchmark metrics
client
interfaces - stick with REST? Use grpc? Yes: grpc. Maybe keep REST, but if the CLI tool is good we don't need REST.
switch to string-format time params in APIs, or JS clients will struggle (64-bit integer timestamps don't fit in JS numbers)
fun CLI features. Like psql "session" interface, plotting of statistical ranges, displaying images? playing video? (f688894)
web interface - just to display functionality. Maybe coverage (ranges of data coverage at a given granularity).
autocomplete based on producer/table listing
autocomplete grammar
MCAP library in Java or Scala
Python and Java iterator implementations that access a root directly.
clustering
versionstore, WAL, rootmap are currently sqlite-based. Both versionstore and rootmap need to move out of sqlite because multiple nodes need to hit them. WAL can stay sqlite for now. Let's go with postgres.
storage needs an S3-compatible implementation. Use minio libraries.
inserts need to shard across replicas based on producer + topic. What manages the shards? Probably goes in postgres.
on the read side, it would be best if we could merge reads with WAL. The "problem" is that this would require distributed WAL storage IF we also want any node to be able to serve reads. We can solve it that way, but that's more complicated and slower.
I think the way retention will work is to store a retention policy on the root's record in the rootmap, and guard readers against reading data older than the policy dictates. Once that is in place retention can be managed with regular object lifecycle policies supported by the cloud provider.
Targeted exemption from GC is still outstanding
We will probably need to stick insertion times on inner nodes (in the children, probably?) in order to implement the guard (sketched below).
monitoring
pprof debugging endpoint
deployment
retention policies
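A sketch of the read guard described under clustering above - types are hypothetical, and it assumes child pointers carry an insertion time:

```go
package retention

import "time"

// ChildRef is an inner-node child pointer; per the note above, it
// would carry the insertion time of the data beneath it (assumed shape).
type ChildRef struct {
	ID         []byte
	InsertedAt time.Time
}

// Policy is a retention window stored on the root's rootmap record.
type Policy struct {
	MaxAge time.Duration // zero means keep forever
}

// Readable reports whether a reader may descend into the child: data
// older than the policy allows is masked even if the object still
// exists, so cloud lifecycle rules can delete it lazily.
func (p Policy) Readable(c ChildRef, now time.Time) bool {
	if p.MaxAge == 0 {
		return true
	}
	return now.Sub(c.InsertedAt) <= p.MaxAge
}
```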
search & query language
statistics: field-level
SQL or not SQL?
SQL: better 3rd party compatibility, maybe chatgpt can answer queries for us
Not SQL: SQL is crappy for expressing complex as-of joins, which are a common kind of query. Maybe we can do a lot better. Ideally end users would be able to express queries themselves. Queries might be something like "show me all times in the last 6 months when it was raining and we were taking an unprotected left and there were dogs in the intersection". That is hard to write in SQL if you aren't a SQL expert, and we don't want customers to need to hire teams of SQL experts to translate. Also, chatgpt is far from writing good English-to-SQL for arbitrary business contexts - not clear it will ever work. (Expanded in query language #9.)
we should be able to accelerate as-of joins using the MCAP message index. Prior to decompressing anything, consult the indexes to see if messages on the relevant topics are within the threshold of each other (see the sketch at the end of this section).
Once we have UDF support, it would be really useful to have materialized views
"neighbors" remains unimplemented in the query language
statistics acceleration for scans (Leverage statistics in querying #25)
statistics acceleration can also be applied at a higher level than the scan level, so that scans on one table can restrict otherwise unrestricted scans on other tables by time. This would help performance in join scenarios but will require more sophistication.
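For the as-of join acceleration item above, the index-level pruning might look like this - a sketch against a hypothetical chunk-index summary, not the real mcap library API. Only chunk pairs that survive this test need to be decompressed and merged:

```go
package asof

// chunkRange summarizes one chunk-index entry for a topic: the
// log-time range of its messages (hypothetical type).
type chunkRange struct {
	start, end uint64 // nanoseconds
}

// mayMatch reports whether any message in left could be within
// threshold of a message in right, judged from index ranges alone:
// the ranges, expanded by threshold, must overlap.
func mayMatch(left, right chunkRange, threshold uint64) bool {
	if right.start > left.end && right.start-left.end > threshold {
		return false
	}
	if left.start > right.end && left.start-right.end > threshold {
		return false
	}
	return true
}
```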
maintenance
Custom golangci-lint rule enforcing capitalization of log lines
Custom assert/require lib with better pretty-printing and representation of unsigned numerics
tree pretty printer for better test diffs
weirdnesses
versions are assigned unnecessarily while staging writes to WAL. Each write to WAL gets a version, then we merge them and create one big commit with a final version. I think the version assignment can just be deferred until the big commit. (Add write-ahead log #5)
tree insert over existing data currently clones all nodes down to the leaf. Pretty sure it only needs to clone the root for tree dimensions, and then all the other copying happens at time of merge from WAL. No indication so far that this is a bottleneck but it probably will be if it isn't yet. (Add write-ahead log #5)
cgo sqlite stuff is hard to inspect with pprof. Need a solution, or perhaps switch to a golang embedded db. (Add write-ahead log #5)
Usage of the word granularity is weird and we may want to revise. Our granularity is an interval in seconds that the stats bucket width must be at least as small as, but this means low "granularity" is "highly granular". Maybe we are misusing the word or should pick a better one.
beta release blockers
document versioning strategy
document versioning strategy for physical tree nodes
graceful statistics degradation for non-ros1msg format messages
document data deletion strategy (based on object lifecycle policies) and implement feature support in the server to mask deleted data.
whole-tree delete command
swagger API docs
spent some time hacking on the meaningful names in storage, namely paths including topic and producer name. It makes the API that merges messages from a list of tree roots inconvenient, since a list of prefixes must also be specified. I think we should stash that idea and maybe think about solving the problem with better introspection APIs in the database. Ideally users don't care about the data file layout. (20c80fe)