Heap corruption during authoritative resolution #5

Open
chjj opened this issue Apr 14, 2018 · 3 comments

chjj commented Apr 14, 2018

Spotted this today after clicking an obfuscated Twitter link. The daemon had been running locally on my laptop for a few days.

rs: query
rs:   id=27745
rs:   labels=2
rs:   name=t.co.
rs:   type=1
rs:   class=1
rs:   edns=0
rs:   dnssec=0
rs:   tld=co
rs:   addr=127.0.0.1:51557
rs: udp nodata
ns: query
ns:   id=14389
ns:   labels=2
ns:   name=t.co.
ns:   type=1
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=co
ns:   addr=127.0.0.1:52340
ns: udp nodata
corrupted double-linked list (not small)
Aborted

I've been unable to reproduce it, so there's no chance of using valgrind to track this down. The "ns: udp nodata" log line implies that hsk_ns_onrecv() executed successfully; otherwise I would suspect the authoritative cache that was just added. It's also not sending a message in response, so there's no cache hit there.

My best guess is something funky happened in the P2P pool which corrupted the heap. I'm starting to add more debug logs so we can narrow this down when it happens again.

chjj added the bug label Apr 14, 2018
chjj self-assigned this Apr 14, 2018

chjj commented Feb 7, 2019

Update: Finally a lead after so many months of head scratching...

While implementing the unbound node module, I noticed a similar heap corruption in Node.js. It only appeared when the unbound context was set to async mode. It was consistent and reproducible: it seemed to cause unrecoverable memory corruption in roughly 1 out of every 20 runs of the bns test suite. My guess is that the same issue is affecting hnsd.

I hesitate to call this a bug in libunbound. It's possible that libuv and libevent don't play well with each other for some reason (?).

I think the solution for now would be to call unbound's resolver synchronously in the uv thread pool (the same fix used in the unbound node module).
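To make the shape of that fix concrete, here is a minimal sketch of the pattern: the blocking resolve runs on a worker thread and the result is only read once the worker has finished. It assumes a lot. A plain POSIX thread stands in for the uv thread pool, blocking_resolve() is a stub standing in for the real blocking resolver call, and every name in it (resolve_job_t, resolve_job_start, resolve_job_finish) is hypothetical rather than taken from hnsd. In hnsd the corresponding pieces would presumably be uv_queue_work() and libunbound's blocking ub_resolve().

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical job record; not an hnsd or libunbound type. */
typedef struct {
  char name[256];  /* query name, copied so the worker owns its input */
  char answer[64]; /* filled in by the worker */
  int status;      /* 0 on success, non-zero on failure */
  pthread_t tid;   /* worker running this job */
} resolve_job_t;

/* Stub for a blocking resolver call. It only knows one name and
 * returns a documentation address (192.0.2.1), not a real answer. */
static int blocking_resolve(const char *name, char *out, size_t out_len) {
  if (strcmp(name, "t.co.") != 0)
    return -1;
  snprintf(out, out_len, "%s", "192.0.2.1");
  return 0;
}

/* Runs on the worker thread. This is the only place the resolver is
 * called, so it never shares a thread with the event loop. */
static void *resolve_worker(void *arg) {
  resolve_job_t *job = arg;
  job->status = blocking_resolve(job->name, job->answer, sizeof(job->answer));
  return NULL;
}

/* Called on the loop thread: hands the job to a worker and returns
 * immediately. Returns 0 if the worker was started. */
int resolve_job_start(resolve_job_t *job, const char *name) {
  assert(job != NULL && name != NULL);
  snprintf(job->name, sizeof(job->name), "%s", name);
  job->answer[0] = '\0';
  job->status = -1;
  return pthread_create(&job->tid, NULL, resolve_worker, job) == 0 ? 0 : -1;
}

/* Called on the loop thread once the work is due. With a real thread
 * pool this would be the completion callback; here it simply joins
 * the worker, which also makes the worker's writes visible. */
int resolve_job_finish(resolve_job_t *job) {
  assert(job != NULL);
  if (pthread_join(job->tid, NULL) != 0)
    return -1;
  return job->status;
}
```

The job struct is written by the caller before the worker starts and read only after it has been joined, so the two threads never touch it at the same time.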

We can leave this open to investigate the causes of this more thoroughly in the future.

chjj mentioned this issue May 13, 2019
pinheadmz commented

This has come up again in a branch where hnsd can discover peers and open more connections: #38 (comment)

pinheadmz commented

Got a stack trace of this:

peer 714 (64.227.15.172:12038): sending verack
peer 714 (64.227.15.172:12038): sending sendheaders
peer 714 (64.227.15.172:12038): sending getaddr
peer 714 (64.227.15.172:12038): sending getheaders
corrupted double-linked list (not small)

Thread 1 "hnsd" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7cd5859 in __GI_abort () at abort.c:79
#2  0x00007ffff7d403ee in __libc_message (action=action@entry=do_abort,
    fmt=fmt@entry=0x7ffff7e6a285 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff7d4847c in malloc_printerr (
    str=str@entry=0x7ffff7e6c248 "corrupted double-linked list (not small)") at malloc.c:5347
#4  0x00007ffff7d48af7 in unlink_chunk (p=p@entry=0x5555563441c0, av=0x7ffff7e9bb80 <main_arena>)
    at malloc.c:1468
#5  0x00007ffff7d4b773 in _int_malloc (av=av@entry=0x7ffff7e9bb80 <main_arena>,
    bytes=bytes@entry=8224) at malloc.c:4041
#6  0x00007ffff7d4d419 in __GI___libc_malloc (bytes=8224) at malloc.c:3066
#7  0x00005555555732a0 in hsk_dns_msg_alloc () at src/dns.c:71
#8  0x000055555557618c in hsk_dns_msg_decode (data=<optimized out>,
    data@entry=0x5555557812f1 "\244\306\001 ", data_len=<optimized out>, data_len@entry=50,
    msg=msg@entry=0x7fffffffa6c0) at src/dns.c:86
#9  0x0000555555582bbb in hsk_dns_req_create (data=0x5555557812f1 "\244\306\001 ", data_len=50,
    addr=0x7fffffffa7a0) at src/req.c:72
#10 0x000055555556ee78 in after_recv ()
#11 0x00005555555b13c5 in uv__udp_recvmsg (handle=0x5555559e48f0) at src/unix/udp.c:205
#12 uv__udp_io (loop=<optimized out>, w=0x5555559e4970, revents=1) at src/unix/udp.c:142
#13 0x00005555555b3238 in uv__io_poll (loop=loop@entry=0x55555563f420 <default_loop_struct>, 
    timeout=2979) at src/unix/linux-core.c:400
#14 0x00005555555a855c in uv_run (loop=0x55555563f420 <default_loop_struct>, mode=UV_RUN_DEFAULT)
    at src/unix/core.c:368
#15 0x000055555556bef0 in main () at src/unix/core.c:820
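
Note that frames #4 to #7 only show where glibc noticed the damage: malloc(), called from hsk_dns_msg_alloc(), tripped over an already broken free list. The write that broke it happened earlier and somewhere else. One way to catch an overrun closer to its source is to wrap suspect allocations in guard bytes and check them at known points. A minimal sketch of that idea follows. It assumes nothing about how hnsd actually allocates memory, and guard_malloc, guard_check and guard_free are hypothetical names, not hnsd functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout of one guarded block (all sizes are illustrative):
 *   [size_t size][front guard ...][user data][tail guard]
 *   |<-------- HEAD_SIZE -------->|          |<-TAIL_SIZE->|
 * HEAD_SIZE is a multiple of 16 so the user pointer keeps the
 * alignment a typical 64-bit malloc() provides. */
#define HEAD_SIZE 32
#define TAIL_SIZE 16
#define GUARD_BYTE 0xA5

/* Allocate `size` usable bytes surrounded by guard bytes. */
void *guard_malloc(size_t size) {
  if (size > SIZE_MAX - HEAD_SIZE - TAIL_SIZE)
    return NULL;
  unsigned char *raw = malloc(HEAD_SIZE + size + TAIL_SIZE);
  if (raw == NULL)
    return NULL;
  memcpy(raw, &size, sizeof(size));
  memset(raw + sizeof(size), GUARD_BYTE, HEAD_SIZE - sizeof(size));
  memset(raw + HEAD_SIZE + size, GUARD_BYTE, TAIL_SIZE);
  return raw + HEAD_SIZE;
}

/* Return 1 if every byte in [p, p + len) still holds GUARD_BYTE. */
static int guard_is_intact(const unsigned char *p, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (p[i] != GUARD_BYTE)
      return 0;
  }
  return 1;
}

/* Return 1 if both guards around `user` are untouched, 0 otherwise.
 * If the stored size itself was overwritten, the tail guard is looked
 * up in the wrong place; the front guard check still reports that. */
int guard_check(const void *user) {
  assert(user != NULL);
  const unsigned char *raw = (const unsigned char *)user - HEAD_SIZE;
  size_t size;
  memcpy(&size, raw, sizeof(size));
  if (!guard_is_intact(raw + sizeof(size), HEAD_SIZE - sizeof(size)))
    return 0;
  return guard_is_intact(raw + HEAD_SIZE + size, TAIL_SIZE);
}

/* Release a block obtained from guard_malloc(). */
void guard_free(void *user) {
  if (user == NULL)
    return;
  free((unsigned char *)user - HEAD_SIZE);
}
```

Calling guard_check() after each message is handled and logging when it returns 0 would point at the buffer that was overrun, rather than at whichever later malloc() happens to trip over the damage.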
