Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#13607 closed Bug Report - General (fixed)

Program description not being extracted where title expands across additional dvb field

Reported by: bib1963
Owned by: Klaas de Waal
Priority: minor
Milestone: 32.0
Component: MythTV - General
Version: Master Head
Severity: medium
Keywords:
Cc:
Ticket locked: no

Description

A program, "The League of Extraordinary Gentlemen", is being transmitted here in the UK.

The program description is not being extracted.

From the db:
MariaDB [mythtvdb]> select starttime,title,subtitle,description from program where title like "%gentlemen%" limit 1;
+---------------------+---------------------------------------+----------+-------------+
| starttime           | title                                 | subtitle | description |
+---------------------+---------------------------------------+----------+-------------+
| 2020-04-12 17:50:00 | The League of Extraordinary Gentlemen |          |             |
+---------------------+---------------------------------------+----------+-------------+
1 row in set (0.05 sec)

And from dvbsnoop, the relevant extract:

            DVB-DescriptorTag: 77 (0x4d)  [= short_event_descriptor]
            descriptor_length: 230 (0xe6)
              ISO639_2_language_code:  eng
            event_name_length: 30 (0x1e)
            event_name: "The League of Extraordinary..."  -- Charset: Latin alphabet
            text_length: 195 (0xc3)
            text_char: "...Gentlemen: (2003) Fantasy with Sean Connery. In an alternative Victorian age, Allan Quatermain, Dorian Gray, Captain Nemo, Mina Harker and the Invisible Man stop a world war. Violence.  [AD,S]"  -- Charset: Latin alphabet

I assume it breaks when it hits that colon at the end of "Gentlemen".
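
A minimal sketch of the situation described above, in Python for illustration only; it is not MythTV's actual EIT fixup code. The helper name `join_split_title` is hypothetical. It assumes the broadcaster marks a continued title with a trailing "..." in event_name and a leading "..." in text_char, and that the first colon after the join ends the title.

```python
def join_split_title(event_name: str, text: str):
    """Rejoin a title split across event_name and text (hypothetical helper).

    Returns (title, description). If no '...' continuation marker is
    present, or no colon follows the join, the inputs are returned unchanged.
    """
    marker = "..."
    if not (event_name.endswith(marker) and text.startswith(marker)):
        return event_name, text
    # Drop both markers and join the two halves with a single space.
    full = event_name[: -len(marker)].rstrip() + " " + text[len(marker):].lstrip()
    # Assumption: the first colon terminates the title.
    title, sep, description = full.partition(":")
    if not sep:
        return event_name, text
    return title.strip(), description.strip()


# Text shortened from the dvbsnoop extract above.
title, description = join_split_title(
    "The League of Extraordinary...",
    "...Gentlemen: (2003) Fantasy with Sean Connery. Violence.  [AD,S]",
)
print(title)
print(description)
```

Splitting at the first colon is a simplification: a title that itself contains a colon would be cut short by this rule.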

Change History (10)

comment:1 by Klaas de Waal, 5 years ago

Which channel is it? Is this on DVB-T/T2 (Freeview) or on Astra 28.2E satellite (Freesat)? If it is on Freesat I might be able to reproduce this.

comment:2 by bib1963, 5 years ago

That particular extraction was on DVB-T2, but I am sure I have also seen it on satellite.

comment:3 by bib1963, 5 years ago

Here are some more which seem to be missing descriptions:

2020-04-20 13:30:00 | Beyond Stardom                                     
2020-04-20 20:00:00 | Harbour Lights                                     
2020-04-21 00:10:00 | House                                              
2020-04-22 18:30:00 | Lawmen of the Old West                             
2020-04-18 09:45:00 | Tad the Lost Explorer and the Secret of King Midas 
2020-04-20 19:30:00 | Tales of the Unexpected                            
2020-04-18 18:50:00 | The League of Extraordinary Gentlemen              
2020-04-18 20:00:00 | World Without End  

I'm not sure all of them can be explained by data spanning multiple fields. "House" is very short, and its entries appear to be either corrupted or not plain ASCII, yet it's the same entries. Here are the dvbsnoop details:

        DVB-DescriptorTag: 77 (0x4d)  [= short_event_descriptor]
            descriptor_length: 93 (0x5d)
              ISO639_2_language_code:  eng
            event_name_length: 5 (0x05)
            event_name: "House"  -- Charset: Latin alphabet
            text_length: 83 (0x53)
            text_char: "..215264.363l.313j_]351.M263376342ޛ222235336.333).8251246277251v202214303314327.347307341/363p327z.@210Ip.[.330E272351352246355356242e276<270256C.273`.3323s342.M257@"  -- Charset: reserved
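
The length fields in both dvbsnoop extracts are consistent with the short_event_descriptor layout in ETSI EN 300 468 (3-byte language code, 1-byte event_name_length, name, 1-byte text_length, text): 3 + 1 + 30 + 1 + 195 = 230 and 3 + 1 + 5 + 1 + 83 = 93. So the "House" descriptor looks structurally intact, and only the text bytes are unreadable. A minimal Python sketch of that check follows; the function name `parse_short_event_descriptor` is hypothetical, the text is returned as raw bytes, and no character-set decoding is attempted.

```python
def parse_short_event_descriptor(buf: bytes):
    """Split a short_event_descriptor (tag 0x4D) into its raw fields.

    Hypothetical helper for illustration; returns (language, name, text)
    with name and text as undecoded bytes.
    """
    tag, length = buf[0], buf[1]
    if tag != 0x4D:
        raise ValueError("not a short_event_descriptor")
    body = buf[2 : 2 + length]
    if len(body) != length:
        raise ValueError("descriptor truncated")
    language = body[0:3].decode("ascii")
    name_len = body[3]
    name = body[4 : 4 + name_len]
    text_len = body[4 + name_len]
    text = body[5 + name_len : 5 + name_len + text_len]
    # The fixed fields plus both strings must account for the whole body.
    if 3 + 1 + name_len + 1 + text_len != length:
        raise ValueError("length fields are inconsistent")
    return language, name, text


# Synthetic descriptor with the same lengths as the "House" extract;
# the 83 text bytes are placeholders, not the broadcast data.
sample = bytes([0x4D, 93]) + b"eng" + bytes([5]) + b"House" + bytes([83]) + bytes(83)
language, name, text = parse_short_event_descriptor(sample)
print(language, name, len(text))
```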

comment:4 by Stuart Auchterlonie, 5 years ago

That looks suspiciously like it's been encoded as some of the Freesat stuff is

comment:5 by Klaas de Waal, 5 years ago

Owner: set to Klaas de Waal
Status: new → assigned

comment:6 by Klaas de Waal, 5 years ago

Status: assigned → infoneeded

The issue with "The League of Extraordinary Gentlemen" has been reproduced on channel 300, Film4, on Astra-2 at 28.2E. A fix for this issue has been applied in master in commit c1fb397f7f6ad25845f6fe7cde0cead07e11c932.

Please give feedback on this, in particular on whether it fixes the "League" issue and whether it causes unwanted effects, i.e. regressions, on other programs.

comment:7 by Klaas de Waal <kdewaal@…>, 5 years ago

In c1fb397f7/mythtv:


comment:8 by Klaas de Waal, 5 years ago

With additional debug code running for 24 hours while receiving EIT from Astra-2, there were four occasions, involving two different programs, where the description would have been discarded because there was a year in the concatenated string, as shown here:

2020-04-21 02:01:59.505022 I  KdW UK EIT fixup fix #13607
    position1 m_ukYear 108
    strFull 'Hollywood's Brightest Bombshell: The Hedy Lamarr Story. Documentary about Hollywood wild-child Hedy Lamarr. [2017]'
    kdwfix t,s,d 'Hollywood's Brightest Bombshell' '' 'The Hedy Lamarr Story. Documentary about Hollywood wild-child Hedy Lamarr. [2017]'
    no_fix t,s,d 'Hollywood's Brightest Bombshell' '' ''
--
2020-04-21 02:13:04.503598 I  KdW UK EIT fixup fix #13607
    position1 m_ukYear 50
    strFull 'Teenage Mutant Ninja Turtles: Out of the Shadows: (2016) Part-animated superhero adventure. The quartet of crime-fighting friends try to stop their enemy Shredder from helping the alien Krang from conquering Earth.'
    kdwfix t,s,d 'Teenage Mutant Ninja Turtles: Out of the Shadows' '' '(2016) Part-animated superhero adventure. The quartet of crime-fighting friends try to stop their enemy Shredder from helping the alien Krang from conquering Earth.'
    no_fix t,s,d 'Teenage Mutant Ninja Turtles: Out of the Shadows' '' ''

The "no_fix" string is the title, subtitle and description produced by the original code, which discards the description because there is a year in the concatenated string.

The "kdwfix" string is the title, subtitle, description with the fix applied. Note that the year in the description is removed by later processing so you do not see that in the guide.
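
The difference between the two log lines can be sketched in Python. This is only an illustration that reproduces the logged "kdwfix" output for the two examples above; it is not the code from the commit. The year pattern (four digits in round or square brackets) and the rule of splitting at the last colon before the year are assumptions, and `split_keep_description` is a hypothetical name.

```python
import re

# Assumption: a year is four digits in brackets, e.g. "(2016)" or "[2017]".
YEAR = re.compile(r"[\[(]\d{4}[)\]]")


def split_keep_description(full: str):
    """Split a concatenated EIT string into (title, subtitle, description).

    Hypothetical rule: the title ends at the last colon before the year,
    and everything after that colon is kept as the description.
    Returns None when there is no year or no colon before it.
    """
    match = YEAR.search(full)
    if match is None:
        return None
    colon = full.rfind(":", 0, match.start())
    if colon < 0:
        return None
    return full[:colon].strip(), "", full[colon + 1 :].strip()


full = ("Hollywood's Brightest Bombshell: The Hedy Lamarr Story. "
        "Documentary about Hollywood wild-child Hedy Lamarr. [2017]")
print(YEAR.search(full).start())  # offset of the year, as in "position1 m_ukYear"
print(split_keep_description(full))
```

With this rule the year stays in the description, matching the log; as noted above, it is removed by later processing.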

comment:9 by Klaas de Waal <kdewaal@…>, 5 years ago

Resolution: fixed
Status: infoneeded → closed

In 96a8372d11/mythtv:


comment:10 by Stuart Auchterlonie, 5 years ago

Milestone: needs_triage → 32.0
Version: Unspecified → Master Head