Modify

Opened 13 years ago

Closed 13 years ago

Last modified 3 years ago

#6 closed defect (fixed)

ntrip server connection reset with ntripcaster

Reported by: jshih@… Owned by: stuerze
Priority: normal Component: Professional Caster
Version: ntripcaster 2.0.9 Keywords: recv() in source.c; http chunk left=0"-condition
Cc: Jun.Shih@…

Description

Bug report from Jun Shih (Jun.Shih@…)

=========== first message ===================
I have seen the following ntrip server reset condition in my log:
[02/Nov/2010:21:26:43] [22:Source Thread] add_chunk: hex [], read=21
[02/Nov/2010:21:26:43] [22:Source Thread] add_chunk: http chunk left=0
[02/Nov/2010:21:26:43] [22:Source Thread] DEBUG: Source received 0 bytes in try
2, total 0, errno: 0
[02/Nov/2010:21:26:43] [22:Source Thread] Didn't receive data from source after
800 milliseconds, assuming it died...

A bit debugging found that this caused by the "http chunk left=0" condition, the recv will return 0 because recv length parameter is zero (con->http_chunk->left - read_bytes =0) in source.c:663

len = recv(con->sock, con->food.source->chunk[con->food.source->cid].data + read_bytes, con->http_chunk->left - read_bytes, 0);

when len=0, subsequently maxread=0 as well. These cause the while loop to terminate on line 737 or 744. I believe this code is the same as 2.0.14.

=========== second message ===================
But in my case connection is not shutdown (I can see from wireshark). Note that recv returns 0 if the length parameter is zero (which is not undocumented in the manual) and it does not return -1 in this case. I have attached a sample program to demonstrate that, to test “ telnet localhost 5000” and type some chars and a return from another terminal.

[jshih@ntriptest tcp_server]$ make tcp_recv
cc tcp_recv.c -o tcp_recv
[jshih@ntriptest tcp_server]$ ./tcp_recv

TCPServer Waiting for client on port 5000
got a connection from (127.0.0.1 , 43891)bytes_recieved 0 errno 0
bytes_recieved 0 errno 0
bytes_recieved 0 errno 0

I know my server is continuously sending but with inserted delay test. Further I changed the code to demonstrate that, there are data in the pipe in my case:

if (con->http_chunk->left < SOURCE_READSIZE) {

len = recv(con->sock, con->food.source->chunk[con->food.source->cid].data + read_bytes, con->http_chunk->left - read_bytes, 0);
xa_debug (4, "Recv LEN1: recv left=%d read=%d len:%d errno:%d", con->http_chunk->left, read_bytes,len, errno);

if (len == 0)

{

len = recv(con->sock, con->food.source->chunk[con->food.source->cid].data + read_bytes, con->http_chunk->left - read_bytes+1, 0);
xa_debug (4, "Recv LEN2: recv left=%d read=%d len:%d errno:%d", con->http_chunk->left, read_bytes,len, errno);


len = 0;

}

maxread = (int)(0.5 * con->http_chunk->left);

Below is the log capture:

[03/Nov/2010:15:53:08] [6:Source Thread] add_chunk: hex [], read=21

[03/Nov/2010:15:53:08] [6:Source Thread] add_chunk: http chunk left=0
[03/Nov/2010:15:53:08] [6:Source Thread] Recv LEN1: recv left=0 read=0 len:0 errno:0
[03/Nov/2010:15:53:08] [6:Source Thread] Recv LEN2: recv left=0 read=0 len:1 errno:0

As you can see if I immediately give a length parameter=1, I successfully recv’ed 1 byte. So there are data.

Also I made a change to source.c (see attached). I run for hours without connection drop (previously was dropping every a few minutes), please review the changes to see if this is a right solution.

changes in source.c =

736c736
< } else if (len == 0) {
---

} else if (len == 0 && con->http_chunk->left) {

744c744
< } while ((read_bytes < maxread) && (tries < source_read_tries));
---

} while ((con->http_chunk->left==0
read_bytes < maxread) && (tries < source_read_tries));

======================

Attachments (2)

tcp_recv.c (1.8 KB ) - added by anonymous 13 years ago.
fix.patch (2.2 KB ) - added by stoecker 13 years ago.
Fix the issue hopefully correctly

Download all attachments as: .zip

Change History (17)

by anonymous, 13 years ago

Attachment: tcp_recv.c added

comment:1 by jshih@…, 13 years ago

Found another condition which will cause the connection to prematurely drop, see log below:

[03/Nov/2010:19:08:44] [11:Source Thread] DEBUG: Source received -1 bytes in try 5, total 0, errno: 11
[03/Nov/2010:19:08:44] [11:Source Thread] add_chunk: hex [ab], read=21
[03/Nov/2010:19:08:44] [11:Source Thread] add_chunk: http chunk left=171
[03/Nov/2010:19:08:44] [11:Source Thread] DEBUG: Source received -1 bytes in try 6, total 0, errno: 11
[03/Nov/2010:19:08:45] [11:Source Thread] left 171 len -1 maxread 0 read_bytes 0
[03/Nov/2010:19:08:45] [11:Source Thread] Didn't receive data from source after 2800 milliseconds, assuming it died...


maxread is not updated if left is > SOURCE_READSIZE in the else statement.

in reply to:  1 comment:2 by jshih@…, 13 years ago

Please ignore my previous comment. After further looking at the code and debugging, I fully understood the problem. The issue that sometimes, reading trailing bytes failed (returned -1 i.e., EAGAIN). So as the trailing bytes are not read out, it is interpreted as the next chunk and have length of 0. Below is my add debug code, and logs so you can see the problem, I will try a few things to see how to fix it and will follow next.

--- debug code change ----

[jshih@ntriptest ntripcaster-2.0.9]$ diff src/source.c.orig src/source.c
647c647
< 						xa_debug (5, "add_chunk: hex [%s], read=%d", con->http_chunk->buf, len);
---
> 						xa_debug (5, "add_chunk: hex [%s], read=%d off %d", con->http_chunk->buf, len, con->http_chunk->off);
649a650,654
>               int i;
>               for (i=0; i <20; i++)
>               {
>                 xa_debug (4, "%2i: %x %c", i, con->http_chunk->buf[i]&0xff, con->http_chunk->buf[i]);
>               }
652d656
< 
654a659,660
>               //if (con->http_chunk->left <=0) continue;
> 
746a753
>     write_log(LOG_DEFAULT, "left %d len %d maxread %d read_bytes %d", con->http_chunk->left, len, maxread, read_bytes);
773c780
< 			xa_debug (4, "read trailing [%s]", con->http_chunk->buf);
---
> 			xa_debug (4, "read trailing [%s] %d", con->http_chunk->buf, len);


------------------- logs shows the issues, see highlitted -------------------------
[04/Nov/2010:18:33:40] [6:Source Thread] -------add_chunk: Chunk 24 was [100] bytes
[04/Nov/2010:18:33:40] [6:Source Thread] http_chunk left now=86
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: In function kick_dead_clients. Will run 2 laps
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: leaving function kick_dead_clients, 2 laps left
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: Source received 86 bytes in try 0, total 0, errno: 0
[04/Nov/2010:18:33:40] [6:Source Thread] -------add_chunk: Chunk 25 was [86] bytes
[04/Nov/2010:18:33:40] [6:Source Thread] read trailing [] -1
[04/Nov/2010:18:33:40] [6:Source Thread] http_chunk left now=0
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: In function kick_dead_clients. Will run 2 laps
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: leaving function kick_dead_clients, 2 laps left
[04/Nov/2010:18:33:40] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: Source received -1 bytes in try 0, total 0, errno: 11
[04/Nov/2010:18:33:40] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: Source received -1 bytes in try 1, total 0, errno: 11
[04/Nov/2010:18:33:40] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:40] [6:Source Thread] DEBUG: Source received -1 bytes in try 2, total 0, errno: 11
[04/Nov/2010:18:33:41] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:41] [6:Source Thread] DEBUG: Source received -1 bytes in try 3, total 0, errno: 11
[04/Nov/2010:18:33:41] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:41] [6:Source Thread] DEBUG: Source received -1 bytes in try 4, total 0, errno: 11
[04/Nov/2010:18:33:42] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:42] [6:Source Thread] DEBUG: Source received -1 bytes in try 5, total 0, errno: 11
[04/Nov/2010:18:33:42] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:42] [6:Source Thread] DEBUG: Source received -1 bytes in try 6, total 0, errno: 11
[04/Nov/2010:18:33:42] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:42] [6:Source Thread] DEBUG: Source received -1 bytes in try 7, total 0, errno: 11
[04/Nov/2010:18:33:43] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:43] [6:Source Thread] DEBUG: Source received -1 bytes in try 8, total 0, errno: 11
[04/Nov/2010:18:33:43] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:43] [6:Source Thread] DEBUG: Source received -1 bytes in try 9, total 0, errno: 11
[04/Nov/2010:18:33:44] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:44] [6:Source Thread] DEBUG: Source received -1 bytes in try 10, total 0, errno: 11
[04/Nov/2010:18:33:44] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:44] [6:Source Thread] DEBUG: Source received -1 bytes in try 11, total 0, errno: 11
[04/Nov/2010:18:33:44] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:44] [6:Source Thread] DEBUG: Source received -1 bytes in try 12, total 0, errno: 11
[04/Nov/2010:18:33:45] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:45] [6:Source Thread] DEBUG: Source received -1 bytes in try 13, total 0, errno: 11
[04/Nov/2010:18:33:45] [6:Source Thread] add_chunk: hex [], read=-1 off 0
[04/Nov/2010:18:33:45] [6:Source Thread] DEBUG: Source received -1 bytes in try 14, total 0, errno: 11
[04/Nov/2010:18:33:46] [6:Source Thread] sock read d

[04/Nov/2010:18:33:46] [6:Source Thread] sock read a

[04/Nov/2010:18:33:46] [6:Source Thread] add_chunk: hex [], read=21 off 0
[04/Nov/2010:18:33:46] [6:Source Thread] 0: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 1: 61 a
[04/Nov/2010:18:33:46] [6:Source Thread] 2: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 3: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 4: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 5: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 6: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 7: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 8: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 9: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 10: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 11: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 12: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 13: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 14: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 15: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 16: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 17: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 18: 0
[04/Nov/2010:18:33:46] [6:Source Thread] 19: 0
[04/Nov/2010:18:33:46] [6:Source Thread] add_chunk: http chunk left=0
[04/Nov/2010:18:33:46] [6:Source Thread] DEBUG: Source received 0 bytes in try 15, total 0, errno: 0
[04/Nov/2010:18:33:46] [6:Source Thread] left 0 len 0 maxread 0 read_bytes 0
[04/Nov/2010:18:33:46] [6:Source Thread] Didn't receive data from source after 6000 milliseconds, assuming it died...
[04/Nov/2010:18:33:46] [6:Source Thread] Kicking source 0 [10.4.44.132] [Source died] [http source], connected for 32 seconds on

comment:3 by anonymous, 13 years ago

The following change seems to work fine and should be low risk.

diff -c src/source.c.orig src/source.c
*** src/source.c.orig	2010-11-03 10:05:51.000000000 -0600
--- src/source.c	2010-11-04 16:56:24.000000000 -0600
***************

*** 652,657 ****
--- 652,659 ----
  
  							xa_debug (4, "add_chunk: http chunk left=%d", con->http_chunk->left);
  
+               if (con->http_chunk->left <=0) continue;  // this will clean up if trailing bytes were not read out.
+ 
  						} else if (len > 0) {
  							con->http_chunk->off += len;
  							len = -1;

comment:4 by stoecker, 13 years ago

Cc: Jun.Shih@… added

Sorry for the long delay.

It is not really clear, what your last fix should do. The condition "con->http_chunk->left <=0" should never occur, as this means either a 0 chunk or a negative one is found. This would be an error in data.

If it occurs, then something must be wrong with the previous strtol call.

What are the chunk sizes you have? The code expects 20 bytes for one chunk, whereas the chunk header may have only 3 or 4, so it reads too much, but that should not cause a failure.

Maybe your sender sends chunks of 0 bytes?

comment:5 by jshih@…, 13 years ago

The problem is caused by the trailing bytes (\r\n) not being read out sometimes (I have a wireless unit so that there are delays in data and sometimes the trailing bytes are not read out). So see line 772, this sometimes returns -1 (EAGAIN).

len = sock_read_line(con->sock, con->http_chunk->buf, 20); // read trailing /n. ajd

Note there is no error checking here so it is fails then the trailing bytes are then interpreted as start of next chunk size.

The right size would be to make sure the trailing bytes are read out. But it is not as easy the way it is structured. The change I made would detect the problem and handle it. I thought it is an easier fix and lower risk.

comment:6 by stoecker, 13 years ago

Well I rather would fix the parsing completely :-) Thanks for that additional info. It makes the problem much clearer.

comment:7 by stoecker, 13 years ago

Hmm, I'm still fail with this issue.

len = sock_read_line(con->sock, con->http_chunk->buf + con->http_chunk->off, 20 - con->http_chunk->off);

xa_debug (5, "add_chunk: hex [%s], read=%d", con->http_chunk->buf, len);

if (len > (20 - con->http_chunk->off)) { // a whole line could be read. ajd
	con->http_chunk->left = strtol(con->http_chunk->buf, (char **)NULL, 16);
	con->http_chunk->off = 0;

	xa_debug (4, "add_chunk: http chunk left=%d", con->http_chunk->left);
} else if (len > 0) {
	con->http_chunk->off += len;
	len = -1;
	break;
} else break;

Len can return 3 values:

  • -1 (no data, error)
  • 0 to 20-offset (data, but not enough)
  • 20-offsets+1 (return found)

Only in the third case the first if clause should be used. But your statements seems to indicate, that (-1 > 20-0), which can't be. Also all values are int, so there should be no conversion issues.

Also should the offset never be >20, so the check can never get negative.

Would it be possible to setup a ntripcaster on one of our servers and you stream data to it, so I can verify this myself?

comment:8 by jshih@…, 13 years ago

The problem is not with the read here. The problem is at line source.c:772

			len = sock_read_line(con->sock, con->http_chunk->buf, 20); // read trailing /n. ajd

If this line fails (to read the "\r\n") due to EAGAIN, then the stream would look like "\r\n\r\n8e\r\ndata....\r\n" The code can not handle that.

comment:9 by stoecker, 13 years ago

Ok. Now I think I can reproduce. For me this piece of code looks actually totally misplaced. It should be part of the above case (storing a variable somewhere which tells whether a \r\n\ or a length is to be expected). I will see what I can do.

by stoecker, 13 years ago

Attachment: fix.patch added

Fix the issue hopefully correctly

comment:10 by stoecker, 13 years ago

Please try attached patch and see if it fixes the issue correctly.

comment:11 by stoecker, 13 years ago

Owner: changed from stoecker to stuerze
Status: newneedinfo

Did you have time to test it?

comment:12 by stoecker, 13 years ago

Reporter: changed from stuerze to jshih@…

comment:13 by jshih@…, 13 years ago

sorry I have been busy and will try in this week or next.

comment:14 by jshih@…, 13 years ago

I tried and it works for my tests. Please let me know what version it will be released under and when. Thanks.

comment:15 by stoecker, 13 years ago

Resolution: fixed
Status: needinfoclosed

In [3033/ntrip]:

fix #6 - issue with partial chunk receiving - new release 2.0.15

Modify Ticket

Change Properties
Action
as closed The owner will remain stuerze.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.