Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Backfill should gracefully handle the case where no one has the complete state for a given event #10764

Open
MadLittleMods opened this issue Sep 4, 2021 · 5 comments · Fixed by #11114
Labels
A-Federation A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements.

Comments

@MadLittleMods
Copy link
Contributor

MadLittleMods commented Sep 4, 2021

This issue is spawning from #10566 (comment) and summarizes the problem/conversation there. Thanks @erikjohnston and @richvdh for the insights on this!


We initially used a fake_prev_event_id for MSC2716 to make a chain of events float outside of the normal DAG but have since opted to use empty prev_events=[]. This issue still valid for the general case where no servers have the state for a given event though.

The refactor in #10645 changed the way we handle fetching state for backfills. Before we'd just get the state at the last event, whereas now we try and get the state for each prev_event (including the fake event), which obviously fails. This means that federated homeservers no longer accept the insertion event because it can't fetch the fake_prev_event_id state (Error attempting to resolve state at missing prev_events, ERROR 403: We can't get valid state history.).

The goal of the code that's rejecting the fake event is protecting against an attack where a remote server could arbitrarily replace the servers view of the current state of the room. But the mitigation we use is a sledgehammer and we should replace it with something more intelligent that both mitigates the attack and gracefully handles missing events. (related to #8451)

Potential solutions

To be determined (TBD).

Reproduction case

  1. Create an event which references a made up prev_event_id
  2. Try to backfill it from federated homeserver
2021-09-01 20:19:21,560 - synapse.handlers.federation_event - 746 - WARNING - GET-5-$hbHXgulNKlmg9mkGQiBH5SSgq2NL4GO70sIIfHCa-YY - Error attempting to resolve state at missing prev_events
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation_event.py", line 711, in _resolve_state_at_missing_prevs
    remote_state = await self._get_state_after_missing_prev_event(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation_event.py", line 777, in _get_state_after_missing_prev_event
    ) = await self.federation_client.get_room_state_ids(
  File "/usr/local/lib/python3.8/site-packages/synapse/federation/federation_client.py", line 370, in get_room_state_ids
    result = await self.transport_layer.get_room_state_ids(
  File "/usr/local/lib/python3.8/site-packages/synapse/federation/transport/client.py", line 71, in get_room_state_ids
    return await self.client.get_json(
  File "/usr/local/lib/python3.8/site-packages/synapse/http/matrixfederationclient.py", line 1001, in get_json
    response = await self._send_request_with_optional_trailing_slash(
  File "/usr/local/lib/python3.8/site-packages/synapse/http/matrixfederationclient.py", line 385, in _send_request_with_optional_trailing_slash
    response = await self._send_request(request, **send_request_args)
  File "/usr/local/lib/python3.8/site-packages/synapse/http/matrixfederationclient.py", line 631, in _send_request
    raise exc
synapse.api.errors.HttpResponseException: 404: Not Found

@MadLittleMods MadLittleMods changed the title Backfill should gracefully handle the case where no one has the complete state for a given event Backfill should gracefully handle the case where no one has the complete state for a given event (fake_prev_event_id) Sep 4, 2021
@MadLittleMods MadLittleMods added the T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. label Sep 4, 2021
@MadLittleMods MadLittleMods self-assigned this Feb 4, 2022
@TheNamelessWonderer
Copy link

TheNamelessWonderer commented Feb 6, 2022

I have the same issue:

Error attempting to resolve state at missing prev_events
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation_event.py", line 768, in _resolve_state_at_missing_prevs
    remote_state = await self._get_state_after_missing_prev_event(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation_event.py", line 834, in _get_state_after_missing_prev_event
    ) = await self._federation_client.get_room_state_ids(
  File "/usr/local/lib/python3.8/site-packages/synapse/federation/federation_client.py", line 402, in get_room_state_ids
    result = await self.transport_layer.get_room_state_ids(
  File "/usr/local/lib/python3.8/site-packages/synapse/federation/transport/client.py", line 82, in get_room_state_ids
    return await self.client.get_json(
  File "/usr/local/lib/python3.8/site-packages/synapse/http/matrixfederationclient.py", line 1016, in get_json
    response = await self._send_request_with_optional_trailing_slash(
  File "/usr/local/lib/python3.8/site-packages/synapse/http/matrixfederationclient.py", line 390, in _send_request_with_optional_trailing_slash
    response = await self._send_request(request, **send_request_args)
  File "/usr/local/lib/python3.8/site-packages/synapse/http/matrixfederationclient.py", line 646, in _send_request
    raise exc

Is there a workaround or should I wait for a fix?

@MadLittleMods MadLittleMods changed the title Backfill should gracefully handle the case where no one has the complete state for a given event (fake_prev_event_id) Backfill should gracefully handle the case where no one has the complete state for a given event Feb 7, 2022
@MadLittleMods
Copy link
Contributor Author

@TheNamelessWonderer There isn't a fix yet.

We've side-stepped the issue for MSC2716 by no longer using a fake_prev_event_id. But I've updated the issue description to describe the general case where this could happen and is still relevant.

@MadLittleMods MadLittleMods reopened this Feb 7, 2022
@haslersn
Copy link
Contributor

I observed the following log. Is it due to this issue right here, or should I create a new issue? Synapse version 1.57.1 from the official docker image.

2022-04-27 13:44:46,934 - synapse.handlers.federation_event - 829 - WARNING - _process_incoming_pdus_in_room_inner-22-$kO5wSAMej0j1AWAZijTogWi_LSxUpXkRNjkqI8gC3e4-$JdFaG_kRij43v6svPuqhORqMn9MguAKVUa4srN41QWc - Error attempting to resolve state at missing prev_events
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/[defer.py](http://defer.py/)", line 1660, in _inlineCallbacks
    result = current_[context.run](http://context.run/)(gen.send, result)
StopIteration: DestinationRetryTimings(failure_ts=1651067082744, retry_last_ts=1651067082744, retry_interval=600000)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/[defer.py](http://defer.py/)", line 1660, in _inlineCallbacks
    result = current_[context.run](http://context.run/)(gen.send, result)
StopIteration: DestinationRetryTimings(failure_ts=1651067082744, retry_last_ts=1651067082744, retry_interval=600000)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_[event.py](http://event.py/)", line 794, in _resolve_state_at_missing_prevs
    remote_state = await self._get_state_after_missing_prev_event(
  File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_[event.py](http://event.py/)", line 860, in _get_state_after_missing_prev_event
    ) = await self._federation_client.get_room_state_ids(
  File "/usr/local/lib/python3.9/site-packages/synapse/federation/federation_[client.py](http://client.py/)", line 409, in get_room_state_ids
    result = await self.transport_layer.get_room_state_ids(
  File "/usr/local/lib/python3.9/site-packages/synapse/federation/transport/[client.py](http://client.py/)", line 83, in get_room_state_ids
    return await self.client.get_json(
  File "/usr/local/lib/python3.9/site-packages/synapse/http/[matrixfederationclient.py](http://matrixfederationclient.py/)", line 1043, in get_json
    response = await self._send_request_with_optional_trailing_slash(
  File "/usr/local/lib/python3.9/site-packages/synapse/http/[matrixfederationclient.py](http://matrixfederationclient.py/)", line 382, in _send_request_with_optional_trailing_slash
    response = await self._send_request(request, **send_request_args)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/[matrixfederationclient.py](http://matrixfederationclient.py/)", line 471, in _send_request
    limiter = await synapse.util.retryutils.get_retry_limiter(
  File "/usr/local/lib/python3.9/site-packages/synapse/util/[retryutils.py](http://retryutils.py/)", line 102, in get_retry_limiter
    raise NotRetryingDestination(
synapse.util.retryutils.NotRetryingDestination: Not retrying server [hispagatos.org](http://hispagatos.org/).

@haslersn
Copy link
Contributor

haslersn commented May 3, 2022

Why is this closed by a commit on a synapse fork?

@richvdh
Copy link
Member

richvdh commented May 3, 2022

good question. I'm assuming it wasn't deliberate, @babolivier ?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Federation A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements.
Projects
None yet
4 participants