Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#10870 closed Bug Report - Hang/Deadlock (fixed)

Race condition between StreamHandler::RemoveListener and StreamHandler::AddListener

Reported by: roger@… Owned by: danielk
Priority: critical Milestone: 0.26
Component: MythTV - General Version: 0.25-fixes
Severity: medium Keywords: AddListener RemoveListener
Cc: Ticket locked: no

Description

There is a race condition when a thread calls StreamHandler::AddListener when a parallel executing thread has already called AddListener::RemoveListener. This results in both threads being deadlocked and never returning from these calls. The error message "Programmer Error: Start called before Stop finished" is also logged. The symptoms of this problem are recordings appearing in the completed recordings list which either have no file associated with them or a file with only a few bytes (usually 326 I think) in it.

I have attached a log showing this problem. This log is from mythbackend running on a quad core machine.

The backend is attempting to record two programs with the same start time on the same DTV multiplex. The capture cards are configured to allow multiple simultaneous recordings. The salient log lines are at

2012-06-26 21:00:02.354383 I [2501/2511] TVRecEvent streamhandler.cpp:97 (RemoveListener) - SH(/dev/dvb/adapter201/frontend0): RemoveListener(0xffffffffa1031e80) -- begin

The thread 2511 will end up blocked in StreamHandler::Stop and never return, and

2012-06-26 21:00:02.369020 I [2501/2706] RecThread streamhandler.cpp:45 (AddListener) - SH(/dev/dvb/adapter201/frontend0): AddListener(0x972e448) -- begin

The thread 2706 will end up blocked in StreamHandler:Start and never return.

Attachments (4)

mythbackendversion.txt (672 bytes ) - added by roger@… 13 years ago.
mythbackend version info
badrec.txt (54.0 KB ) - added by roger@… 13 years ago.
mythbackend log
gdb.txt (63.4 KB ) - added by stuartm 13 years ago.
Partial BT, first dozen threads are missing including one stuck in StreamHandler::RemoveListener()
mythbackend-backtrace.txt (191.9 KB ) - added by pedro@… 13 years ago.
backtrace of deadlocked backend

Download all attachments as: .zip

Change History (19)

by roger@…, 13 years ago

Attachment: mythbackendversion.txt added

mythbackend version info

by roger@…, 13 years ago

Attachment: badrec.txt added

mythbackend log

comment:1 by Roger James <roger@…>, 13 years ago

The patch below is solely intended as a work around, and I have put it here for information only. I have regression tested it and done some very basic testing against the issue I reported. It appears to work. I don't like the idea of introducing extra locks to fix a bug, but it is the least intrusive change I could think of. A better fix would possibly be the implementation of a proper finite state machine with queued event handling.

Roger

$ git diff streamhandler.cpp
diff --git a/mythtv/libs/libmythtv/streamhandler.cpp b/mythtv/libs/libmythtv/str
index 2b0a7b6..28de368 100644
--- a/mythtv/libs/libmythtv/streamhandler.cpp
+++ b/mythtv/libs/libmythtv/streamhandler.cpp
@@ -133,7 +133,8 @@ void StreamHandler::RemoveListener(MPEGStreamData *data)

 void StreamHandler::Start(void)
 {
-    QMutexLocker locker(&_start_stop_lock);
+    QMutexLocker start_stop_procedure_locker(&_start_stop_procedure_lock);
+    QMutexLocker start_stop_locker(&_start_stop_lock);

     if (_running)
     {
@@ -173,7 +174,8 @@ void StreamHandler::Start(void)

 void StreamHandler::Stop(void)
 {
-    QMutexLocker locker(&_start_stop_lock);
+    QMutexLocker start_stop_procedure_locker(&_start_stop_procedure_lock);
+    QMutexLocker start_stop_locker(&_start_stop_lock);

     do
     {
$ git diff streamhandler.h
diff --git a/mythtv/libs/libmythtv/streamhandler.h b/mythtv/libs/libmythtv/strea
index 1db0584..ae0bc80 100644
--- a/mythtv/libs/libmythtv/streamhandler.h
+++ b/mythtv/libs/libmythtv/streamhandler.h
@@ -45,7 +45,7 @@ class PIDInfo
 typedef QMap<uint,PIDInfo*> PIDInfoMap;

 // locking order
-// _pid_lock -> _listener_lock -> _start_stop_lock
+// _pid_lock -> _listener_lock -> _start_stop_procedure_lock -> _start_stop_lock

 class StreamHandler : protected MThread, public DeviceReaderCB
 {
@@ -93,7 +93,7 @@ class StreamHandler : protected MThread, public DeviceReaderCB
     virtual void RemoveNamedOutputFile(const QString &filename) {}
     /// At minimum this sets _running_desired, this may also send
     /// signals to anything that might be blocking the run() loop.
-    /// \note: The _start_stop_lock must be held when this is called.
+    /// \note: The _start_stop_procedure_lock and the _start_stop_lock must be
     virtual void SetRunningDesired(bool desired) { _running_desired = desired;

   protected:
@@ -101,6 +101,7 @@ class StreamHandler : protected MThread, public DeviceReader
     bool              _needs_buffering;
     bool              _allow_section_reader;

+    mutable QMutex    _start_stop_procedure_lock; /// Serialises calls to Start
     mutable QMutex    _start_stop_lock;
     volatile bool     _running_desired;
     volatile bool     _error;
$

comment:2 by beirdo, 13 years ago

Owner: set to danielk
Status: newassigned

comment:3 by danielk, 13 years ago

Milestone: unknown0.26
Priority: minormajor
Status: assignedaccepted

comment:4 by Daniel Thor Kristjansson <danielk@…>, 13 years ago

Resolution: fixed
Status: acceptedclosed

In 1b5864ddcf6d37a5801e2ceccfd1ed9ba6886f35/mythtv:

Error: Processor CommitTicketReference failed
GIT backend not available

comment:5 by Daniel Thor Kristjansson <danielk@…>, 13 years ago

In 5f04f275d50d69f81dbaa05a0b37acd0b68fc9ec/mythtv:

Error: Processor CommitTicketReference failed
GIT backend not available

comment:6 by Daniel Thor Kristjansson <danielk@…>, 13 years ago

In ab52f19a201d5238e21e068cae1bc4215c62ee07/mythtv:

Error: Processor CommitTicketReference failed
GIT backend not available

comment:7 by danielk, 13 years ago

Resolution: fixed
Status: closednew

comment:8 by Daniel Thor Kristjansson <dkristjansson@…>, 13 years ago

Resolution: fixed
Status: newclosed

In f612576017d1cdd0d2f815dce162cf0fd5310ab0/mythtv:

Error: Processor CommitTicketReference failed
GIT backend not available

comment:9 by Daniel Thor Kristjansson <dkristjansson@…>, 13 years ago

In 445faaa47005601c8be21bfcfd1e5b0e51d111ec/mythtv:

Error: Processor CommitTicketReference failed
GIT backend not available

comment:10 by stuartm, 13 years ago

Priority: majorcritical
Resolution: fixed
Status: closednew

This doesn't appear to be fixed, just experienced the deadlock in StreamHandler::RemoveListener(), BT attached.

Edit: Backtrace not attached, forgot to enable logging in gdb ...

Last edited 13 years ago by stuartm (previous) (diff)

by stuartm, 13 years ago

Attachment: gdb.txt added

Partial BT, first dozen threads are missing including one stuck in StreamHandler::RemoveListener()

comment:11 by Daniel Thor Kristjansson <dkristjansson@…>, 13 years ago

Resolution: fixed
Status: newclosed

In 9c199a42258f2a19dd1c8d3b934e6eb04f1757ce/mythtv:

Error: Processor CommitTicketReference failed
GIT backend not available

comment:12 by danielk, 13 years ago

stuartm, unless we have a backtrace with the deadlocked threads we can't really do anything. For now, I've backported the current fix to 0.25-fixes.

by pedro@…, 13 years ago

Attachment: mythbackend-backtrace.txt added

backtrace of deadlocked backend

comment:13 by pedro@…, 13 years ago

This problem doesn't seem fixed. I'm running the ubuntu packages of mythtv version 0.25.2+fixes.20120802.46cab93-0ubuntu1 that judging by the date should include the July 24th backport and yet the backend gets completely deadlocked when trying to record more than one show at a time from the same DVB-T card where all channels are available on the same frequency in a single stream.

To get the server back up I have to kill -9 the backend process and then restart it. I've attached a backtrace that seems to show both calls to StreamHandler::RemoveListener and StreamHandler::AddListener as this bug describes. I've setup recordings of 3 news shows at the same times twice a day and this seems to trip the bug every time, so if any other information is needed let me know.

comment:14 by danielk, 13 years ago

Perdo, please test the 0.26 beta.

comment:15 by Pedro Côrte-Real <pedro@…>, 13 years ago

I tested the packages from https://launchpad.net/~mythbuntu/+archive/0.26 version 0.26.0~master.20120816.7203f17-0ubuntu0mythbuntu2. With those I was able to record 3 shows at a time without it failing.

Note: See TracTickets for help on using tickets.