Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#12046 closed Patch - Bug Fix (Fixed)

Fix random SBE PlaybackSock timeout in MBE

Reported by: Cédric Schieli <cschieli@…>
Owned by: JYA
Priority: major
Milestone: unknown
Component: MythTV - General
Version: 0.27-fixes
Severity: medium
Keywords: SBE PlaybackSock timeout
Cc: Stuart Auchterlonie
Ticket locked: no

Description

SBE PlaybackSocks in the master suffer from random disconnections, occurring after a 7000 ms timeout. It often happens when a frontend asks for the thumbnail of a program currently being recorded on a slave backend.

I was able to identify two problems causing this:

First, there is a race between MainServer::ProcessRequestWork and PlaybackSock::SendReceiveStringList. Even though callbacks are disabled while SendReceiveStringList is executing, a ProcessRequestWork may already be running and can swallow the reply, leading to the timeout in SendReceiveStringList.

The second problem is that an invocation of ProcessRequestWork is fired for each block of data arriving on the socket (for example when a reply is long enough to be fragmented, e.g. GENERATED_PIXMAP), but that data is consumed all at once by a single worker, leaving the other workers with nothing to read. This also leads to the timeout in ReadStringList.

This patch fixes the first problem by ensuring that no worker reads from the socket while a SendReceiveStringList is running, and the second by aborting a worker if there is no more data to read, a check made only once the lock has been acquired.
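For illustration, here is a minimal sketch of that approach, not the actual patch code: PlaybackConnection, BytesAvailable, HandleRequest and the stubbed socket helpers are hypothetical stand-ins. The send/receive path holds a read lock for the whole round trip, and each worker only reads after acquiring that same lock, bailing out if nothing is left in the buffer.

    // Illustrative sketch only (hypothetical names, not the actual patch):
    // SendReceive() holds the read lock across the whole round trip, and
    // each worker re-checks for data only once it owns the same lock.
    #include <QMutex>
    #include <QMutexLocker>
    #include <QStringList>

    class PlaybackConnection
    {
      public:
        bool SendReceive(const QStringList &request, QStringList &reply)
        {
            QMutexLocker locker(&m_readLock);  // no worker can steal the reply
            if (!WriteStringList(request))
                return false;
            return ReadStringList(reply);
        }

        // Fired once per incoming block of data. Several invocations may be
        // queued for a single fragmented reply; only the first one that gets
        // the lock will find data, the others simply return.
        void ProcessRequestWork()
        {
            QMutexLocker locker(&m_readLock);
            if (BytesAvailable() == 0)
                return;                        // no more data: abort this worker
            QStringList request;
            if (ReadStringList(request))
                HandleRequest(request);
        }

      private:
        QMutex m_readLock;

        // Stubs standing in for the real socket/dispatch code.
        bool WriteStringList(const QStringList &) { return true; }
        bool ReadStringList(QStringList &)        { return true; }
        qint64 BytesAvailable() const             { return 0; }
        void HandleRequest(const QStringList &)   {}
    };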

Attachments (1)

0001-Fix-random-SBE-PlaybackSock-timeout-in-MBE.patch (4.4 KB ) - added by Cédric Schieli <cschieli@…> 12 years ago.
Fix random SBE PlaybackSock timeout in MBE


Change History (11)

by Cédric Schieli <cschieli@…>, 12 years ago

Fix random SBE PlaybackSock timeout in MBE

comment:1 by Cédric Schieli <cschieli@…>, 12 years ago

comment:2 by stuartm, 12 years ago

Owner: set to stuartm
Status: new → accepted

comment:3 by Stuart Auchterlonie, 12 years ago

Cc: Stuart Auchterlonie added

comment:4 by JYA, 12 years ago

It looks to me like the reference count on sock is incremented too many times:

    sock->IncrRef(); 
    ReferenceLocker rlocker(sock); 

This will increment the sock's reference counter twice, leaving it higher by one once the function exits.

Ultimately, the socket will never be destroyed and will leak.

comment:5 by JYA, 12 years ago

Owner: changed from stuartm to JYA
Status: accepted → assigned

comment:6 by JYA, 12 years ago

Not a consequence of this patch, but a still-existing problem.

It seems to me that, while the socket now won't be closed unnecessarily, a process request whose data has been consumed by another worker will find the request dismissed and ultimately ignored.

There should be a way to queue the request, or to have ReadStringList not consume more data than it should (re-injecting discarded data into the socket).

comment:7 by JYA, 12 years ago

Actually, forget that last comment... Looking at MythSocket::ReadStringList, it only consumes the data it needs, not everything that is in the buffer.

So the comment "data is consumed all at once by one worker" is not actually correct. However, there could indeed be a race there...

comment:8 by JYA, 12 years ago

Resolution: Fixed
Status: assigned → closed

Fixed: commit c9395d7c96c06cc6f508b5cbbf87979ea2c5de0b
Author: Bradley Baetz <bbaetz@…>
Date: Thu May 1 09:43:24 2014 +1000

Fix random SBE PlaybackSock timeout in MBE.

comment:9 by JYA, 12 years ago

Apologies, I misread how the ReferenceLocker class worked, which introduced a massive regression.

When I re-applied the patch, I forgot to put back the original author...
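For context, a minimal sketch of the RAII pattern in question, assuming (as the correction above implies) that the locker's constructor does not itself take a reference and its destructor releases the one taken manually; RefCounted and ScopedRefLocker are illustrative stand-ins, not the MythTV classes:

    // Illustrative stand-ins only; assumes the destructor is what releases
    // the reference, so the IncrRef()/locker pair is balanced, not a leak.
    class RefCounted
    {
      public:
        void IncrRef() { ++m_refs; }
        void DecrRef() { if (--m_refs == 0) delete this; }
      protected:
        virtual ~RefCounted() = default;
      private:
        int m_refs {1};
    };

    class ScopedRefLocker
    {
      public:
        explicit ScopedRefLocker(RefCounted *obj) : m_obj(obj) {}
        ~ScopedRefLocker() { if (m_obj) m_obj->DecrRef(); }  // release on scope exit
      private:
        RefCounted *m_obj;
    };

    void UseSocket(RefCounted *sock)
    {
        sock->IncrRef();               // +1: keep the socket alive in this scope
        ScopedRefLocker rlocker(sock); // -1 when the function exits; net change is 0
        // ... work with sock ...
    }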

comment:10 by Cédric Schieli <cschieli@…>, 12 years ago

Thanks a lot Jean-Yves for applying this patch.

I still think that all the data of one (fragmented) reply is read all at once by a single MythSocket::ReadStringList invocation. It will loop until the announced size of the reply (i.e. the total size of all the fragments) has arrived in the buffer and then consume it all. The other workers fired on each subsequent TCP data packet arrival for the same fragmented reply will never see that data.
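A minimal sketch of the behaviour being described (the size header and "[]:[]" separator are assumptions about the protocol, and ReadAnnouncedReply is a hypothetical helper, not the real MythSocket code):

    // One call consumes the whole announced reply, looping across TCP
    // fragments; workers fired for later fragments find the buffer empty.
    #include <QTcpSocket>
    #include <QStringList>

    bool ReadAnnouncedReply(QTcpSocket &sock, QStringList &reply)
    {
        if (sock.bytesAvailable() < 8 && !sock.waitForReadyRead(7000))
            return false;
        // The reply announces its total size up front (header format assumed).
        const qint64 announced = QString::fromLatin1(sock.read(8)).trimmed().toLongLong();

        QByteArray payload;
        while (payload.size() < announced)       // loop until all fragments arrive
        {
            if (sock.bytesAvailable() == 0 && !sock.waitForReadyRead(7000))
                return false;                    // the 7000 ms timeout seen in the ticket
            payload += sock.read(announced - payload.size());
        }

        // The entire fragmented reply has now been consumed by this one call.
        reply = QString::fromUtf8(payload).split("[]:[]");
        return true;
    }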
