#119 closed defect (worksforme)

Pull relays do not reconnect, recover on 2.0.36

Reported by:	mark@…	Owned by:	mark@…
Priority:	major	Component:	Professional Caster
Version:		Keywords:
Cc:

Description

We have run into an issue with pull relays not reconnecting to sources that go away and then come back. Our relevant configuration is:

relay pull -i u:p -m /RTCM3EPH products.igs-ip.net:2101/RTCM3EPH

max_clients 10000
max_clients_per_source 1000
max_sources 40
max_admins 2
throttle 0

max_ip_connections 1000

I'm going through the code to understand how timeouts are applied to read connections like this one. Is there a configuration option we're missing that would help the relay recover quickly?
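For illustration, one common way to bound a read at the application level is to set a receive timeout on the socket and treat a silent period as a stalled stream that should trigger a reconnect. This is a minimal Python sketch of that idea, not a description of what the caster does; `StreamStalled`, `recv_or_stall`, and the timeout value are hypothetical.

```python
import socket


class StreamStalled(Exception):
    """Raised when no data arrives within the stall timeout (hypothetical name)."""


def recv_or_stall(sock, stall_timeout_s=30.0, bufsize=4096):
    """Read one chunk, or raise if the stream is silent or closed.

    stall_timeout_s is an assumed value; a real relay would choose it
    from the expected message rate of the source.
    """
    sock.settimeout(stall_timeout_s)
    try:
        data = sock.recv(bufsize)
    except socket.timeout:
        # No bytes within the window: the source may be gone without a FIN/RST.
        raise StreamStalled("no data for %.1f s" % stall_timeout_s) from None
    if not data:
        # An empty read means the peer closed the connection cleanly.
        raise ConnectionError("peer closed the connection")
    return data
```

A caller would close the socket and schedule a reconnect on either `StreamStalled` or `ConnectionError`, so a source that disappears silently is retried instead of being held open indefinitely.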

We had a recent outage where our BKG relay did not recover after a relay source went down for 15 minutes - the BKG relay stayed down for 2 hours, while our other caster (a SNIP) recovered the stream after 15 minutes.

Any advice on debugging this would be much appreciated. I'm thinking about adding a setsockopt call with SOL_TCP and TCP_USER_TIMEOUT on the sockets so that dead connections time out sooner.
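A minimal sketch of the socket options in question, written in Python for brevity rather than in the caster's own code. The helper name `enable_dead_peer_detection` and all timeout values are hypothetical. One caveat, as far as I understand the Linux behaviour: TCP_USER_TIMEOUT on its own bounds how long sent data may remain unacknowledged, so on a connection that mostly reads (as a pull relay does) it probably needs to be combined with SO_KEEPALIVE so that there are probes to time out.

```python
import socket


def enable_dead_peer_detection(sock, user_timeout_ms=30000,
                               idle_s=10, interval_s=5, probes=3):
    """Ask the kernel to notice a silently dead peer (hypothetical helper).

    Returns the names of the options that were actually applied.
    All default values are illustrative assumptions.
    """
    applied = []

    # Keepalive makes the kernel probe an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied.append("SO_KEEPALIVE")

    # These options are platform-specific, so look them up defensively.
    # IPPROTO_TCP has the same value as SOL_TCP on Linux.
    tuning = [
        ("TCP_KEEPIDLE", idle_s),               # seconds idle before first probe
        ("TCP_KEEPINTVL", interval_s),          # seconds between probes
        ("TCP_KEEPCNT", probes),                # unanswered probes before giving up
        ("TCP_USER_TIMEOUT", user_timeout_ms),  # milliseconds
    ]
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # option not available on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            continue  # kernel rejected the option; skip it
        applied.append(name)
    return applied
```

With these set, a read blocked on a dead peer should eventually fail with an error instead of hanging, which gives the relay a point at which to reconnect.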

Attachments (0)

Change History (3)

comment:1 by stuerze, 2 years ago

Status:	new → assigned

comment:2 by stoecker, 2 years ago

Owner:	changed from stoecker to mark@…
Status:	assigned → needinfo

Somehow that got overlooked. Is this still an issue? Can you reproduce it, or describe what is required for it to happen? On the instances here it works as expected.

comment:3 by stoecker, 23 months ago

Resolution:	→ worksforme
Status:	needinfo → closed
