#6 closed defect (fixed)
ntrip server connection reset with ntripcaster
Reported by: | Owned by: | stuerze | |
---|---|---|---|
Priority: | normal | Component: | Professional Caster |
Version: | ntripcaster 2.0.9 | Keywords: | recv() in source.c; http chunk left=0"-condition |
Cc: | Jun.Shih@… |
Description
Bug report from Jun Shih (Jun.Shih@…)
=========== first message ===================
I have seen the following ntrip server reset condition in my log:
[02/Nov/2010:21:26:43] [22:Source Thread] add_chunk: hex [], read=21
[02/Nov/2010:21:26:43] [22:Source Thread] add_chunk: http chunk left=0
[02/Nov/2010:21:26:43] [22:Source Thread] DEBUG: Source received 0 bytes in try
2, total 0, errno: 0
[02/Nov/2010:21:26:43] [22:Source Thread] Didn't receive data from source after
800 milliseconds, assuming it died...
A bit debugging found that this caused by the "http chunk left=0" condition, the recv will return 0 because recv length parameter is zero (con->http_chunk->left - read_bytes =0) in source.c:663
len = recv(con->sock, con->food.source->chunk[con->food.source->cid].data + read_bytes, con->http_chunk->left - read_bytes, 0);
when len=0, subsequently maxread=0 as well. These cause the while loop to terminate on line 737 or 744. I believe this code is the same as 2.0.14.
=========== second message ===================
But in my case connection is not shutdown (I can see from wireshark). Note that recv returns 0 if the length parameter is zero (which is not undocumented in the manual) and it does not return -1 in this case. I have attached a sample program to demonstrate that, to test “ telnet localhost 5000” and type some chars and a return from another terminal.
[jshih@ntriptest tcp_server]$ make tcp_recv
cc tcp_recv.c -o tcp_recv
[jshih@ntriptest tcp_server]$ ./tcp_recv
TCPServer Waiting for client on port 5000
got a connection from (127.0.0.1 , 43891)bytes_recieved 0 errno 0
bytes_recieved 0 errno 0
bytes_recieved 0 errno 0
I know my server is continuously sending but with inserted delay test. Further I changed the code to demonstrate that, there are data in the pipe in my case:
if (con->http_chunk->left < SOURCE_READSIZE) {
len = recv(con->sock, con->food.source->chunk[con->food.source->cid].data + read_bytes, con->http_chunk->left - read_bytes, 0);
xa_debug (4, "Recv LEN1: recv left=%d read=%d len:%d errno:%d", con->http_chunk->left, read_bytes,len, errno);
if (len == 0)
{
len = recv(con->sock, con->food.source->chunk[con->food.source->cid].data + read_bytes, con->http_chunk->left - read_bytes+1, 0);
xa_debug (4, "Recv LEN2: recv left=%d read=%d len:%d errno:%d", con->http_chunk->left, read_bytes,len, errno);
len = 0;
}
maxread = (int)(0.5 * con->http_chunk->left);
Below is the log capture:
[03/Nov/2010:15:53:08] [6:Source Thread] add_chunk: hex [], read=21
[03/Nov/2010:15:53:08] [6:Source Thread] add_chunk: http chunk left=0
[03/Nov/2010:15:53:08] [6:Source Thread] Recv LEN1: recv left=0 read=0 len:0 errno:0
[03/Nov/2010:15:53:08] [6:Source Thread] Recv LEN2: recv left=0 read=0 len:1 errno:0
As you can see if I immediately give a length parameter=1, I successfully recv’ed 1 byte. So there are data.
Also I made a change to source.c (see attached). I run for hours without connection drop (previously was dropping every a few minutes), please review the changes to see if this is a right solution.
changes in source.c =
736c736
< } else if (len == 0) {
---
} else if (len == 0 && con->http_chunk->left) {
744c744
< } while ((read_bytes < maxread) && (tries < source_read_tries));
---
} while ((con->http_chunk->left==0 read_bytes < maxread) && (tries < source_read_tries));
======================
Attachments (2)
Change History (17)
by , 14 years ago
Attachment: | tcp_recv.c added |
---|
follow-up: 2 comment:1 by , 14 years ago
comment:2 by , 14 years ago
Please ignore my previous comment. After further looking at the code and debugging, I fully understood the problem. The issue that sometimes, reading trailing bytes failed (returned -1 i.e., EAGAIN). So as the trailing bytes are not read out, it is interpreted as the next chunk and have length of 0. Below is my add debug code, and logs so you can see the problem, I will try a few things to see how to fix it and will follow next.
--- debug code change ----
[jshih@ntriptest ntripcaster-2.0.9]$ diff src/source.c.orig src/source.c 647c647 < xa_debug (5, "add_chunk: hex [%s], read=%d", con->http_chunk->buf, len); --- > xa_debug (5, "add_chunk: hex [%s], read=%d off %d", con->http_chunk->buf, len, con->http_chunk->off); 649a650,654 > int i; > for (i=0; i <20; i++) > { > xa_debug (4, "%2i: %x %c", i, con->http_chunk->buf[i]&0xff, con->http_chunk->buf[i]); > } 652d656 < 654a659,660 > //if (con->http_chunk->left <=0) continue; > 746a753 > write_log(LOG_DEFAULT, "left %d len %d maxread %d read_bytes %d", con->http_chunk->left, len, maxread, read_bytes); 773c780 < xa_debug (4, "read trailing [%s]", con->http_chunk->buf); --- > xa_debug (4, "read trailing [%s] %d", con->http_chunk->buf, len);
------------------- logs shows the issues, see highlitted -------------------------
[04/Nov/2010:18:33:40] [6:Source Thread] -------add_chunk: Chunk 24 was [100] bytes
[04/Nov/2010:18:33:40] [6:Source Thread] http_chunk left now=86
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: In function kick_dead_clients. Will run 2 laps
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: leaving function kick_dead_clients, 2 laps left
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: Source received 86 bytes in try 0, total 0, errno: 0
[04/Nov/2010:18:33:40] [6:Source Thread] -------add_chunk: Chunk 25 was [86] bytes
[04/Nov/2010:18:33:40] [6:Source Thread] read trailing [] -1
[04/Nov/2010:18:33:40] [6:Source Thread] http_chunk left now=0
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: In function kick_dead_clients. Will run 2 laps
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: leaving function kick_dead_clients, 2 laps left
[04/Nov/2010:18:33:40] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: Source received -1 bytes in try 0, total 0, errno: 11
[04/Nov/2010:18:33:40] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: Source received -1 bytes in try 1, total 0, errno: 11
[04/Nov/2010:18:33:40] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: Source received -1 bytes in try 2, total 0, errno: 11
[04/Nov/2010:18:33:41] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:41] [6:Source Thread] DEBUG: Source received -1 bytes in try 3, total 0, errno: 11
[04/Nov/2010:18:33:41] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:41] [6:Source Thread] DEBUG: Source received -1 bytes in try 4, total 0, errno: 11
[04/Nov/2010:18:33:42] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:42] [6:Source Thread] DEBUG: Source received -1 bytes in try 5, total 0, errno: 11
[04/Nov/2010:18:33:42] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:42] [6:Source Thread] DEBUG: Source received -1 bytes in try 6, total 0, errno: 11
[04/Nov/2010:18:33:42] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:42] [6:Source Thread] DEBUG: Source received -1 bytes in try 7, total 0, errno: 11
[04/Nov/2010:18:33:43] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:43] [6:Source Thread] DEBUG: Source received -1 bytes in try 8, total 0, errno: 11
[04/Nov/2010:18:33:43] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:43] [6:Source Thread] DEBUG: Source received -1 bytes in try 9, total 0, errno: 11
[04/Nov/2010:18:33:44] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:44] [6:Source Thread] DEBUG: Source received -1 bytes in try 10, total 0, errno: 11
[04/Nov/2010:18:33:44] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:44] [6:Source Thread] DEBUG: Source received -1 bytes in try 11, total 0, errno: 11
[04/Nov/2010:18:33:44] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:44] [6:Source Thread] DEBUG: Source received -1 bytes in try 12, total 0, errno: 11
[04/Nov/2010:18:33:45] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:45] [6:Source Thread] DEBUG: Source received -1 bytes in try 13, total 0, errno: 11
[04/Nov/2010:18:33:45] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:45] [6:Source Thread] DEBUG: Source received -1 bytes in try 14, total 0, errno: 11
[04/Nov/2010:18:33:46] [6:Source Thread] sock read d
[04/Nov/2010:18:33:46] [6:Source Thread] sock read a
[04/Nov/2010:18:33:46] [6:Source Thread] add_chunk: hex [], read=21 off 0
[04/Nov/2010:18:33:46] [6:Source Thread] 0: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 1: 61 a
[04/Nov/2010:18:33:46] [6:Source Thread] 2: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 3: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 4: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 5: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 6: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 7: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 8: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 9: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 10: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 11: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 12: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 13: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 14: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 15: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 16: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 17: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 18: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 19: 0
[04/Nov/2010:18:33:46] [6:Source Thread] add_chunk: http chunk left=0
[04/Nov/2010:18:33:46] [6:Source Thread] DEBUG: Source received 0 bytes in try 15, total 0, errno: 0
[04/Nov/2010:18:33:46] [6:Source Thread] left 0 len 0 maxread 0 read_bytes 0
[04/Nov/2010:18:33:46] [6:Source Thread] Didn't receive data from source after 6000 milliseconds, assuming it died...
[04/Nov/2010:18:33:46] [6:Source Thread] Kicking source 0 [10.4.44.132] [Source died] [http source], connected for 32 seconds on
comment:3 by , 14 years ago
The following change seems to work fine and should be low risk.
diff -c src/source.c.orig src/source.c *** src/source.c.orig 2010-11-03 10:05:51.000000000 -0600 --- src/source.c 2010-11-04 16:56:24.000000000 -0600 *************** *** 652,657 **** --- 652,659 ---- xa_debug (4, "add_chunk: http chunk left=%d", con->http_chunk->left); + if (con->http_chunk->left <=0) continue; // this will clean up if trailing bytes were not read out. + } else if (len > 0) { con->http_chunk->off += len; len = -1;
comment:4 by , 14 years ago
Cc: | added |
---|
Sorry for the long delay.
It is not really clear, what your last fix should do. The condition "con->http_chunk->left <=0" should never occur, as this means either a 0 chunk or a negative one is found. This would be an error in data.
If it occurs, then something must be wrong with the previous strtol call.
What are the chunk sizes you have? The code expects 20 bytes for one chunk, whereas the chunk header may have only 3 or 4, so it reads too much, but that should not cause a failure.
Maybe your sender sends chunks of 0 bytes?
comment:5 by , 14 years ago
The problem is caused by the trailing bytes (\r\n) not being read out sometimes (I have a wireless unit so that there are delays in data and sometimes the trailing bytes are not read out). So see line 772, this sometimes returns -1 (EAGAIN).
len = sock_read_line(con->sock, con->http_chunk->buf, 20); // read trailing /n. ajd
Note there is no error checking here so it is fails then the trailing bytes are then interpreted as start of next chunk size.
The right size would be to make sure the trailing bytes are read out. But it is not as easy the way it is structured. The change I made would detect the problem and handle it. I thought it is an easier fix and lower risk.
comment:6 by , 14 years ago
Well I rather would fix the parsing completely :-) Thanks for that additional info. It makes the problem much clearer.
comment:7 by , 14 years ago
Hmm, I'm still fail with this issue.
len = sock_read_line(con->sock, con->http_chunk->buf + con->http_chunk->off, 20 - con->http_chunk->off); xa_debug (5, "add_chunk: hex [%s], read=%d", con->http_chunk->buf, len); if (len > (20 - con->http_chunk->off)) { // a whole line could be read. ajd con->http_chunk->left = strtol(con->http_chunk->buf, (char **)NULL, 16); con->http_chunk->off = 0; xa_debug (4, "add_chunk: http chunk left=%d", con->http_chunk->left); } else if (len > 0) { con->http_chunk->off += len; len = -1; break; } else break;
Len can return 3 values:
- -1 (no data, error)
- 0 to 20-offset (data, but not enough)
- 20-offsets+1 (return found)
Only in the third case the first if clause should be used. But your statements seems to indicate, that (-1 > 20-0), which can't be. Also all values are int, so there should be no conversion issues.
Also should the offset never be >20, so the check can never get negative.
Would it be possible to setup a ntripcaster on one of our servers and you stream data to it, so I can verify this myself?
comment:8 by , 14 years ago
The problem is not with the read here. The problem is at line source.c:772
len = sock_read_line(con->sock, con->http_chunk->buf, 20); // read trailing /n. ajd
If this line fails (to read the "\r\n") due to EAGAIN, then the stream would look like "\r\n\r\n8e\r\ndata....\r\n" The code can not handle that.
comment:9 by , 14 years ago
Ok. Now I think I can reproduce. For me this piece of code looks actually totally misplaced. It should be part of the above case (storing a variable somewhere which tells whether a \r\n\ or a length is to be expected). I will see what I can do.
comment:11 by , 14 years ago
Owner: | changed from | to
---|---|
Status: | new → needinfo |
Did you have time to test it?
comment:12 by , 14 years ago
Reporter: | changed from | to
---|
comment:14 by , 14 years ago
I tried and it works for my tests. Please let me know what version it will be released under and when. Thanks.
Found another condition which will cause the connection to prematurely drop, see log below:
[03/Nov/2010:19:08:44] [11:Source Thread] DEBUG: Source received -1 bytes in try 5, total 0, errno: 11
[03/Nov/2010:19:08:44] [11:Source Thread] add_chunk: hex [ab], read=21
[03/Nov/2010:19:08:44] [11:Source Thread] add_chunk: http chunk left=171
[03/Nov/2010:19:08:44] [11:Source Thread] DEBUG: Source received -1 bytes in try 6, total 0, errno: 11
[03/Nov/2010:19:08:45] [11:Source Thread] left 171 len -1 maxread 0 read_bytes 0
[03/Nov/2010:19:08:45] [11:Source Thread] Didn't receive data from source after 2800 milliseconds, assuming it died...
maxread is not updated if left is > SOURCE_READSIZE in the else statement.