Opened 5 years ago
Closed 11 months ago
#119 closed defect (worksforme)
Pull relays do not reconnect, recover on 2.0.36
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | major | Component: | Professional Caster |
Version: | Keywords: | ||
Cc: |
Description
We have run into an issue with relays reconnecting to sources that go away and then come back. Our relevant configuration is:
relay pull -i u:p -m /RTCM3EPH products.igs-ip.net:2101/RTCM3EPH
max_clients 10000
max_clients_per_source 1000
max_sources 40
max_admins 2
throttle 0
max_ip_connections 1000
I'm going through the code trying to understand how timeouts are applied to read connections like this one. Is there some configuration we're missing that could help us recover quickly?
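To illustrate the kind of read timeout being asked about, here is a minimal sketch in Python. It is not the caster's code. The function name `pull_with_reconnect`, its parameters, and the injected `connect` and `on_data` callables are all hypothetical. It assumes that a stream which delivers no bytes for `read_timeout_s` seconds should be treated as dead and reopened.

```python
import socket
import time


def pull_with_reconnect(connect, on_data, read_timeout_s=30.0,
                        retry_delay_s=5.0, max_attempts=None,
                        sleep=time.sleep):
    """Hypothetical pull loop: reconnect when the source closes or goes silent.

    connect  -- callable returning a connected socket (assumption: it raises
                OSError while the source is unreachable)
    on_data  -- callable invoked with each received chunk of bytes
    Returns the number of connection attempts made.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            sock = connect()
        except OSError:
            # Source is still down: wait, then try again.
            sleep(retry_delay_s)
            continue
        try:
            # recv() raises socket.timeout (an OSError subclass) if no
            # data arrives within read_timeout_s seconds.
            sock.settimeout(read_timeout_s)
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break  # orderly close by the source
                on_data(chunk)
        except OSError:
            pass  # timeout or connection error: fall through and reconnect
        finally:
            sock.close()
        sleep(retry_delay_s)
    return attempts
```

With `max_attempts=None` the loop runs until the process is stopped. The read timeout should comfortably exceed the longest normal gap between messages on the stream, otherwise a healthy but quiet source would be dropped.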
We recently had an outage in which our BKG relay did not recover after a relay source went down for 15 minutes. The BKG relay stayed down for 2 hours, while our other caster (a SNIP) recovered the stream after 15 minutes.
Any advice on debugging this would be much appreciated. I'm thinking about adding a setsockopt call with SOL_TCP and TCP_USER_TIMEOUT on the sockets to improve timeout behavior.
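The setsockopt idea can be sketched as follows, assuming Linux and Python rather than the caster's own source. The helper name `apply_dead_peer_timeouts` and the timeout values are illustrative assumptions, not settings taken from the caster. Because a pull relay mostly reads, it may have no unacknowledged outgoing data for TCP_USER_TIMEOUT to act on, so the sketch also enables keepalive probes. According to the Linux tcp(7) manual page, TCP_USER_TIMEOUT then determines when a connection is closed after keepalive failure.

```python
import socket

# Linux value of TCP_USER_TIMEOUT from <linux/tcp.h>; used as a fallback
# for Python builds that do not export the constant.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def apply_dead_peer_timeouts(sock, user_timeout_ms=30000,
                             idle_s=10, interval_s=5, probes=3):
    """Hypothetical helper: make an unreachable peer fail the socket sooner.

    All default values are illustrative assumptions.
    """
    # Send keepalive probes so an idle, read-only connection still
    # generates traffic that can go unacknowledged.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    # Maximum time in milliseconds that transmitted data may remain
    # unacknowledged before the kernel closes the connection.
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, user_timeout_ms)
    return sock
```

A caller would apply this to the relay's outgoing socket, for example `apply_dead_peer_timeouts(socket.create_connection((host, port)))`, where `host` and `port` stand for the source caster. Once the kernel gives up on the connection, a blocked read returns an error, and the relay can use that as its cue to reconnect.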
Attachments (0)
Change History (3)
comment:1 by , 12 months ago
Status: | new → assigned |
---|
comment:2 by , 12 months ago
Owner: | changed from | to
---|---|
Status: | assigned → needinfo |
comment:3 by , 11 months ago
Resolution: | → worksforme |
---|---|
Status: | needinfo → closed |
Somehow this got overlooked. Is this still an issue? Can you reproduce it, or describe what is required for it to happen? On the instances here it works as expected.