Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid recursion in the document loader. (#127) #1

Merged
merged 1 commit into from
Aug 2, 2022

Conversation

ligurio
Copy link
Member

@ligurio ligurio commented Jul 18, 2022

NOTE:


The document loading API (yaml_parser_load) was susseptable to a
stack overflow issue when given input data which opened many
mappings and/or sequences without closing them.

This was due to the use of recurion in the implementation.

With this change, we avoid recursion, and maintain our own loader
context stack on the heap.

The loader context contains a stack of document node indexes.
Each time a sequence or mapping start event is encountered,
the node index corrasponding to the event is pushed to the
stack. Each time a sequence or mapping end event is encountered,
the corrasponding node's index is popped from the stack.

The yaml_parser_load_nodes() function sits on the event stream,
issuing events to the appropriate handlers by type.

When an event handler function constructs a node, it needs to
connect the new node to its parent (unless it's the root node).
This is where the loader context stack is used to find the
parent node. The way that the new node is added to the tree
depends on whether the parent node is a mapping (with a
yaml_node_pair_t to fill), or a sequence (with a yaml_node_item_t).

Fixes: yaml#107

The document loading API (yaml_parser_load) was susseptable to a
stack overflow issue when given input data which opened many
mappings and/or sequences without closing them.

This was due to the use of recurion in the implementation.

With this change, we avoid recursion, and maintain our own loader
context stack on the heap.

The loader context contains a stack of document node indexes.
Each time a sequence or mapping start event is encountered,
the node index corrasponding to the event is pushed to the
stack.  Each time a sequence or mapping end event is encountered,
the corrasponding node's index is popped from the stack.

The yaml_parser_load_nodes() function sits on the event stream,
issuing events to the appropriate handlers by type.

When an event handler function constructs a node, it needs to
connect the new node to its parent (unless it's the root node).
This is where the loader context stack is used to find the
parent node.  The way that the new node is added to the tree
depends on whether the parent node is a mapping (with a
yaml_node_pair_t to fill), or a sequence (with a yaml_node_item_t).

Fixes: yaml#107
@Totktonada
Copy link
Member

A small wish: use git cherry-pick -x <refspec> or add the original commit hash into the commit message manually. This way it will be easier to look over the history commits in the future: at least it'll make obvious that this commit is a backport.

Tarantool does not use the load-all-at-once yaml_parser_load() API, it uses the stream-oriented API (yaml_parser_parse()). So tarantool is not affected by the problem, which is fixed there.

Anyway, I don't mind the backport of the fix as the extra protection measure: it is good to don't have a code that may lead to the stack overflow even if it is not used at the moment.

This fix seems to work, I run the loader this way:

$ cmake .
$ make
$ yes '{' | head -n 102400 | tr -d '\n' > x.yaml
$ ./run-loader x.yaml
[1] Loading 'x.yaml': FAILURE (0 documents)

Before the fix:

$ ./run-loader x.yaml 
[1] Loading 'x.yaml': Segmentation fault

However on large input it cycles or works very slow. I reported it in yaml#251.

@igormunkin igormunkin merged commit 0f655b5 into master Aug 2, 2022
@igormunkin igormunkin deleted the ligurio/fix-libyaml-gh-107 branch August 2, 2022 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stack Overflow (72997432, 73012922)
4 participants