CCExtractor Changelog

What's new in CCExtractor 0.94

Dec 16, 2021
  • BOM is no longer enabled by default on Windows platforms
  • CEA-708: Rust decoder is now default instead of C decoder
  • CEA-708 subs are now extracted by default
  • New: Add check for minimum supported Rust version (MSRV) (#1387)
  • Fix: Fix CEA-708 Carriage Return command implementation
  • Fix: Fix bug with startat/endat parameter (#1396)
  • Fix: Mac Build processes (#1390)
  • Fix: Fix bug with negative delay parameter (#1365)
  • Pin Rust to 1.56.0 due to bug in 1.57.0
  • Changes in release artifacts:
  • Reintroduction of a minimal CCExtractor source package for Linux (omits the windows and git folder)
  • Add a portable version for Windows

New in CCExtractor 0.93 (Aug 17, 2021)

  • Minor Rust updates (format, typos, docs)
  • Updated GUI

New in CCExtractor 0.92 (Aug 10, 2021)

  • Rust updates: Added srt writer
  • Rust updates: Added writers for transcripts and SAMI
  • Added missing DLL to Windows installer
  • Updated Windows GUI

New in CCExtractor 0.91 (Jul 26, 2021)

  • More Rust in the 708 decoder (Add Pen Presets and timing functions)
  • Updated GUI

New in CCExtractor 0.90 (Jul 14, 2021)

  • New installer
  • New GUI (flutter based)
  • More Rust (the 708 decoder is being rewritten)

New in CCExtractor 0.88 (May 22, 2019)

  • More tapping points for debug image in ccextractor.
  • Add support for tesseract 4.0
  • Remove multiple RGB to grey conversion in OCR.
  • Update UTF8Proc to 2.2.0
  • Update LibPNG to 1.6.35
  • Update Protobuf-c to 1.3.1
  • Warn instead of fatal when a 0xFF marker is missing
  • Segfault in general_loop.c due to null pointer dereference (case of no encoder)
  • Enable printing hdtv stats to console.
  • Many typos in comments and output messages
  • Ignore Visual Studio temporary project files
  • Add support for non-Latin characters in stdout
  • Check whether stream is empty
  • Add support for EIA-608 inside .mkv
  • Add support for DVB inside .mkv
  • Added -latrusmap: Map Latin symbols to Cyrillic ones in special cases of Russian Teletext files (issue #1086)
  • Fix: Several OCR crashes

New in CCExtractor 0.87 (Jan 22, 2019)

  • New: Upgrade libGPAC to 0.7.1.
  • New: mp4 tx3g & multitrack subtitles.
  • New: Guide to update dependencies (docs/Updating_Dependencies.txt).
  • New: Add LICENSE File (#959).
  • New: Display quantisation mode in info box (#954).
  • New: Add instruction required to build ccextractor with HARDSUBX support (#946).
  • New: Added version no. of libraries to --version.
  • New: Added -quant (OCR quantization function).
  • New: Python API now compatible with Python 3.
  • Fix: linux/builddebug: Added non-local directories to the include search path so we don't require a locally compiled tesseract or leptonica.
  • Fix: Correct -HARDSUBX Bug In CMake, allow build with hardsubx using cmake (#966).
  • Fix: possible segfaults in hardsubx_classifier.c due to strdup (#963).
  • Fix: Improve the start and end timestamps of extracted burned in captions (#962).
  • Fix: Update COMPILATION.md (#960).
  • Fix: Fixed crash with "-out=report" and "-out=null".
  • Fix: -nocf not working with OCR'ing (#958).
  • Fix: segfault in add_cc_sub_text and initialize to NULL in init_encoder (#950).
  • Fix: ccx_decoders_common.c: Copy data type when creating a copy of the subtitle structure.
  • Fix: Implicit declaration of these functions throws warning during build (#948).
  • Fix: ccx_decoders_common.c: Properly release allocated resources on free_subtitle().
  • Fix: Added a datatype member to struct cc_subtitle - needed so we can properly free all memory when void *data points to a structure that has its own pointers.
  • Fix: dvb_subtitle_decoder.c: When combining image regions verify that the offset is never negative.
  • Fix: Updated travis.yml to fix osx build (#947).
  • Fix: Add utf8proc src file to cmake, updated header file (#944).
  • Fix: Added required pointers on freep() calls.
  • Fix: Removed dvb_debug_traces_to_stdout and used the usual dbg_print instead.
  • Fix: Additional debug traces for DVB.
  • Fix: Fix minor memory leak in ocr.c.
  • Fix: Fix issue with displaying utf8proc version.
  • Fix: Fix failing cmake due to liblept/tesseract header files.
  • Fix: Added missing \n in params.c.
  • Fix: builddebug: Use -fsanitize=address -fno-omit-frame-pointer.
  • Fix: ccx_decoders_common.c: Removed trivial memory leak.
  • Fix: ccx_encoders_srt.c: Made sure a pointer is non-NULL before dereferencing.
  • Fix: dvb_subtitle_decoder.c: Initialize pointer members to NULL when creating a structure.
  • Fix: lib_ccx.c: Initialize (memset 0) structure cc_subtitle after memory allocation.
  • Fix: Added verboseness to error/warnings in dvb_subtitle_decoder.c.
  • Fix: dvb_subtitle_decoder.c: Work on passing invalid streams errors upstream (plus some warning messages) so we can eventually recover from this situation instead of crashing.
  • Fix: telxcc.c: Currently setting a colour doesn't necessarily add a space even though the specifications mandate it. (#930).
  • Fix: dvb_subtitle_decoder.c: Fix null pointer dereference when region==NULL in write_dvb_sub.
  • Fix: DVB Teletext subtitle incomplete.
  • Fix: replace all 0xA characters within startbox with 0x20.
  • Fix: DVB Teletext subtitle incomplete (#922).
  • Fix: Add missing return value to one of the returns in process_tx3g().
  • Fix: Typos and other minor bugs.
  • Fix: Tidy CMakeLists & vcxproj (#920).
  • Fix: Added m2ts and -mxf to help screen.
  • Fix: Added MKV to demuxer_print_cfg.
  • Fix: Added MXF to demuxer_print_cfg.
  • Fix: "Out of order packets" error had wrong print() parameters.
  • Fix: Updated Python documentation.
  • Fix: Fix incorrect path in XML (#904).
  • Fix: linux build script (non-debug): Don't hide warnings from compiler.
  • Fix: linux build script (debug): Display which step of the build script we're in.
  • Fix: Make the build reproducible (#976).
  • Fix: Remove instance of o1 and o2 from help.
  • Fix: Colors of DVB subtitles with depth 2 broken due to a missing break.
  • Fix: CEA-708: Caption loss due to CW command (#991).
  • Fix: CEA-708: Update patch for windows priority with functions (#990).

New in CCExtractor 0.75 (Apr 25, 2016)

  • Fixed issue with teletext output to formats other than srt
  • CCExtractor can be used as library if compiled using cmake
  • By default the Windows version adds BOM to generated UTF files (this is because it's needed to open the files correctly) while all other builds don't add it (because it messes with text processing tools)
  • You can use -bom and -nobom to change the behaviour
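To illustrate why the BOM matters to text processing tools, here is a minimal Python sketch. It is not part of CCExtractor, and the sample subtitle content is made up. It shows the three bytes involved and one way a consumer can tolerate them.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
BOM = codecs.BOM_UTF8


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    return data[len(BOM):] if data.startswith(BOM) else data


# Hypothetical file contents (not real CCExtractor output), with a BOM.
with_bom = BOM + "1\n00:00:01,000 --> 00:00:02,000\nHello\n".encode("utf-8")
without_bom = strip_bom(with_bom)

# A plain UTF-8 decode keeps the BOM as U+FEFF at the start of the text,
# which is what trips up tools expecting the first line to be exactly "1".
naive = with_bom.decode("utf-8")

# The "utf-8-sig" codec drops a leading BOM and is harmless when none exists.
tolerant = with_bom.decode("utf-8-sig")
```

Reading with "utf-8-sig" lets the same script handle files produced with either -bom or -nobom.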

New in CCExtractor 0.74 (Apr 25, 2016)

  • Fixed issue with -o1 -o2 and -12 parameters (where it would write output only in the o2 file)
  • Fixed UCLA parameter issue. Now the UCLA parameter settings can't be overwritten anymore by later parameters that affect the custom transcript
  • Switched order around for TLT and TT page number in custom transcript to match UCLA settings
  • Added nobom parameter, for when files are processed by tools that can't handle the BOM. If using this, files might not be readable under Windows.
  • Segfault fix when no input files were given
  • No more bin output when sending to server + possibility to send TT to server for processing
  • Windows: Added the Microsoft redistributable MSVCR120.DLL to both the installation package and the application zip.

New in CCExtractor 0.73 (Apr 25, 2016)

  • Added support of BIN format for Teletext
  • Added start of librarisation. This will allow in the future for other programs to use encoder/decoder functions and more

New in CCExtractor 0.72 (Apr 25, 2016)

  • Fix for WTV files with incorrect timing
  • Added support for fps change using data from AVC video track in a H264 TS file
  • Added FFmpeg support, enabling all encapsulators and decoders provided by FFmpeg

New in CCExtractor 0.71 (Apr 25, 2016)

  • Added feature to receive captions in BIN format according to CCExtractor's own protocol over TCP (-tcp port [-tcppassword password])
  • Added ability to send captions to the server described above or to the online repository (-sendto host[:port])
  • Added -stdin parameter for reading input stream from standard input
  • Compilation in Cygwin using linux/Makefile
  • Fix for .bin files when not using latin1 charset
  • Corrected mp4 timing when one timestamp gives the timing of two atoms

New in CCExtractor 0.70 (Apr 25, 2016)

  • Added a huge dictionary.
  • Added DVB subtitles decoder, spupng in output
  • Added support for cdt2 media atoms in QT video files. Now multiple atoms in a single sample sequence are supported.
  • Changed Makefile.
  • Fixed some bugs.
  • Added feature to print info about file's subtitles and streams (-out=report).
  • Support Long PMT.
  • Support Configuration file.
  • There is a sample configuration file in the doc/ folder named ccextractor.cnf.sample
  • For now, only a file named ccextractor.cnf placed beside the ccextractor executable is supported
  • For details of which options can be set using the configuration file, please look at the sample file.
  • Added options for custom transcript output:
  • new parameter (-customtxt format), where the format must be like this: 1100100 (7 digits).
  • The digits indicate, in order, whether each of the following should be displayed or used in the (timed) transcript:
  • Display start time
  • Display end time
  • Display caption mode
  • Display caption channel
  • Use a relative timestamp (relative to the sample)
  • Display XDS info
  • Use colors
  • Make sure you use this parameter after others that might affect these settings (-out, -ucla, -xds, -txt, -ttxt, ...)
  • Fixed Negative timing Bug
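The 7-digit -customtxt format described above can be read as one on/off flag per listed item. The following Python sketch is only an illustration of that reading, not CCExtractor's own parser; the field names are hypothetical labels chosen here for the seven positions.

```python
from typing import Dict

# Hypothetical names for the seven positions, in the order listed above.
FIELDS = (
    "show_start_time",
    "show_end_time",
    "show_caption_mode",
    "show_caption_channel",
    "relative_timestamp",
    "show_xds",
    "use_colors",
)


def parse_customtxt(fmt: str) -> Dict[str, bool]:
    """Decode a 7-character string of 0s and 1s into named booleans."""
    if len(fmt) != len(FIELDS) or any(c not in "01" for c in fmt):
        raise ValueError("format must be exactly 7 digits, each 0 or 1")
    return {name: c == "1" for name, c in zip(FIELDS, fmt)}


# The example value from the changelog entry.
settings = parse_customtxt("1100100")
```

Under this reading, 1100100 turns on the start time, the end time and the relative timestamp, and leaves the other four items off.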

New in CCExtractor 0.69 (Apr 25, 2016)

  • A few patches including proper support for multiple multicast clients listening on the same port
  • GUI: Fixed teletext preview
  • GUI: Added a small indicator of data being received when reading from UDP
  • GUI: Added UTF-8 support to preview Window (used for teletext)
  • Fixes in Makefile and build script, compilation in linux and OSX failed if another libpng was found in the system
  • WTV support directly in CCExtractor (no need for wtvccdump any more)
  • Started refactoring and clean-up
  • Fix: MPEG clock rollover (happens every 26 hours) caused a time discontinuity
  • Windows GUI: Started work on HDHomeRun support. For now it just looks for HDHomeRun devices. Lots of other things will arrive in the next versions
  • Windows GUI: Some code refactoring, since the HDHomeRun support makes the code large enough to require more than one source file
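The 26-hour figure in the MPEG clock rollover fix above matches a 33-bit timestamp counting at 90 kHz, which is how MPEG presentation timestamps are defined. The Python sketch below assumes that is the clock meant here. It shows the arithmetic and one generic way to unwrap such a counter; it is not CCExtractor's actual fix, and the class name is hypothetical.

```python
from typing import Optional

PTS_BITS = 33            # width of an MPEG presentation timestamp
PTS_HZ = 90_000          # timestamp clock rate
PTS_WRAP = 1 << PTS_BITS


def rollover_period_hours() -> float:
    """Hours until a 33-bit, 90 kHz counter wraps back to zero."""
    return PTS_WRAP / PTS_HZ / 3600


class PtsUnwrapper:
    """Turn wrapping 33-bit timestamps into an ever-increasing count."""

    def __init__(self) -> None:
        self.last: Optional[int] = None
        self.offset = 0

    def feed(self, pts: int) -> int:
        # A backwards jump of more than half the range is treated as a wrap.
        if self.last is not None and self.last - pts > PTS_WRAP // 2:
            self.offset += PTS_WRAP
        self.last = pts
        return pts + self.offset


unwrapper = PtsUnwrapper()
before = unwrapper.feed(PTS_WRAP - PTS_HZ)  # one second before the wrap
after = unwrapper.feed(0)                   # counter has wrapped to zero
```

Without the unwrapping step, the second timestamp would appear to be about 26.5 hours earlier than the first, which is the kind of time discontinuity the entry describes.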

New in CCExtractor 0.68 (Apr 25, 2016)

  • A couple of shared variables between 608 decoders were causing problems when both fields were processed at the same time with -12, fixed
  • Added BOM for UTF-8 files
  • Corrected a few extended characters in the UTF-8 encoding. They are probably never used in real-world captioning, but we got a good test sample file
  • Color and fonts in PAC commands were ignored, fixed
  • Added a new output format, spupng. It consists of one .png file for each subtitle frame and one .xml with all the timing
  • Some fixes

New in CCExtractor 0.67 (Apr 25, 2016)

  • Padding bytes were being discarded early in the process in 0.66, which is convenient for debugging, but it messes with timing in raw, which depends on padding. Fixed.
  • MythTV's branch had a fixed-size buffer that was sometimes too small. Made dynamic.
  • Better support for PAT changing mid stream.
  • Removed quotes in Start in .smi (format fix).
  • Added multicast support
  • Added ability to select IP address to bind in UDP
  • Fixes in -unixts and -delay for teletext.
  • Added -autodash : When two people are talking, add a dash as needed (this is based on subtitle position). Only in .srt and with -trim. Quite experimental, feedback appreciated.
  • Added -latin1 to select Latin 1 as encoding. Default is now UTF-8 (-utf8 still exists but it's not needed).
  • Added -ru1, which emulates a (non-existing in real life) 1 line roll-up mode.

New in CCExtractor 0.66 (Apr 25, 2016)

  • Fixed bug in auto detection code that triggered a message about file being out of sync.
  • Added -investigate_packets
  • The PMT is used to select the most promising elementary stream to get captions from. Sometimes captions are where you least expect them, so -datapid allows you to select an elementary stream manually, in case the CC location is not obvious from the PMT contents. To assist in looking for the right stream, the parameter "-investigate_packets" will have CCExtractor look inside each stream, looking for CC markers, and report the streams that are likely to contain CC data even if it can't be determined from their PMT entry.
  • Added -datastreamtype to manually select a stream based on its type instead of its PID. Useful if your recording program always hides the caption under the same stream type.
  • Added -streamtype so if an elementary stream is selected manually for processing the streamtype can be selected too. This can be needed if you process for example a stream that is declared as "private MPEG" in the PMT, so CCExtractor can't tell what it is.
  • Usually you'll want -streamtype 2 (MPEG video) or -streamtype 6 (MPEG private data).
  • PMT content listing improved, it now shows the stream type for more types.
  • Fixes in roll-up, cursor was being moved to column 1 if a RU2, RU3 or RU4 was received even if already in roll-up mode.
  • Added -autoprogram. If a multiprogram TS is processed and autoprogram is used CCExtractor will analyze all PMTs and use the first program that has a suitable data stream.
  • Timed transcript (ttxt) now also exports the caption mode (roll-up, paint-on, etc.) next to each line, as it's useful to detect things like commercials.
  • Content Advisory information from XDS is now decoded if it's transmitted in "US TV parental guidelines" or "MPA".
  • Other encoding such as Canada's are not supported yet due to lack of samples.
  • Copy Management information from XDS is now decoded.
  • Added -xds. If present (and export format is timed transcript only), XDS information will be saved to file (same file as the transcript, with XDS being clearly marked). Note that for now all XDS data is exported even if it doesn't change, so the transcript file will be significantly larger.
  • Added some PaintOn support, at least enough to prevent it from breaking things when the other modes are used.
  • Removed afd_data() warning. AFD doesn't carry any caption related data. AFD still detected in code in case we want to do something with it later anyway.
  • Ported last changes from Petr Kutalek's telxcc. Current version is 2.4.4.
  • In teletext mode when exporting to transcript (not .srt), an effort is made to detect and merge line duplicates. This is done by using the Levenshtein distance, which is the number of changes required to convert one string to another. To simplify things, strings are compared up to the length of the shortest one.
  • There are 3 parameters that can be used to tweak the thresholds:
  • deblev: Enable debug so the calculated distance for each two strings is displayed. The output includes both strings, the calculated distance, the maximum allowed distance, and whether the strings are ultimately considered equivalent or not, i.e. whether the calculated distance is less than or equal to the maximum allowed.
  • levdistmincnt value: Minimum distance we always allow regardless of the length of the strings. Default 2. This means that if the calculated distance is 0, 1 or 2, we consider the strings to be equivalent.
  • levdistmaxpct value: Maximum distance we allow, as a percentage of the shortest string length. Default 10%. For example, consider a comparison of one string of 30 characters and one of 60 characters. We want to determine whether the first 30 characters of the longer string are more or less the same as the shortest string, i.e. whether the longest string is the shortest one plus new characters and maybe some corrections. Since the shortest string is 30 characters and the default percentage is 10%, we would allow a distance of up to 3 between the first 30 characters.
  • Added -lf : Use UNIX line terminator (LF) instead of Windows (CRLF).
  • Added -noautotimeref: Prevent UTC reference from being auto set from the stream data.
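The duplicate-line check described above for teletext transcripts can be sketched in Python as follows. This is an illustration under stated assumptions, not the code CCExtractor uses: the function names are hypothetical, and the way the two thresholds are combined (taking the larger of the fixed minimum and the percentage-based limit) is an assumption inferred from the parameter descriptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def lines_equivalent(s1: str, s2: str, min_cnt: int = 2, max_pct: int = 10) -> bool:
    """Compare both strings up to the length of the shorter one.

    min_cnt and max_pct mirror the documented defaults (2 and 10%).
    Assumption: the allowed distance is the larger of the two limits.
    """
    n = min(len(s1), len(s2))
    distance = levenshtein(s1[:n], s2[:n])
    allowed = max(min_cnt, n * max_pct // 100)
    return distance <= allowed


# The 30 versus 60 character case from the text: up to 3 changes are allowed.
short = "a" * 30
longer_ok = "bbb" + "a" * 27 + "x" * 30    # 3 changes in the first 30 chars
longer_bad = "bbbb" + "a" * 26 + "x" * 30  # 4 changes in the first 30 chars
```

With the defaults, lines_equivalent(short, longer_ok) accepts the pair and lines_equivalent(short, longer_bad) rejects it, matching the worked example in the entry.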

New in CCExtractor 0.65 (Apr 25, 2016)

  • Minor GUI changes for teletext
  • Added end timestamps in timed transcripts
  • Added support for SMPTE
  • Initial support for MPEG2 video tracks inside MP4 files
  • Improved MP4 auto detection
  • Support for PCR if PTS is not available (needed for some teletext samples, and probably useful for everything else).
  • Support for UDP streaming - finally. Use "-udp $port" to have CCExtractor listen for a stream. I've only been able to test it with a European HDHomeRun, but it should work fine with any other tuner.
  • Refactored PMT / PAT processing in transport streams, now allows to display their contents (-parsePAT and -parsePMT) which makes troubleshooting easier.

New in CCExtractor 0.64 (Oct 29, 2012)

  • Changed Windows GUI size (larger).
  • Added Teletext options to GUI.
  • Added -teletext to force teletext mode even if not detected
  • Added -noteletext to disable teletext detection. This can be needed for streams that have both 608 data and teletext packets if you need to process the 608 data (if teletext is detected it will take precedence otherwise).
  • Added -datapid to force a specific elementary stream to be used for data (bypassing detections).
  • Added -ru2 and -ru3 to limit the number of visible lines in roll-up captions (bypassing whatever the broadcast says).
  • Added support for a .hex (hexadecimal) dump of data.
  • Added support for wtv in Windows. This is done by using a new program (wtvccdump.exe) and a new DirectShow filter (CCExtractorDump.dll) that process the .wtv using DirectShow's filters and export the line 21 data to a .hex file. The GUI calls wtvccdump.exe as needed.
  • Added --nogoptime to force PTS timing even when CCExtractor would use GOP timing otherwise.

New in CCExtractor 0.63 (Aug 17, 2012)

  • Teletext support added. Integration is still quite basic (there's equivalent code from both CCExtractor and telxcc) and some clean up is needed, but it works. Some bug fixes, as usual.

New in CCExtractor 0.59 (Oct 7, 2011)

  • More AVC/H.264 work. pic_order_cnt_type != 0 will be processed now.
  • Fix: Rollup captions with interruptions for Text (with ResumeTextDisplay in the middle of the caption data) were missing complete lines.
  • Added a timed text transcript output format, probably only useful for rollup captions. Use timedtranscript or ttxt.
  • Output is like this:
  • 00:01:25,485 | HOST: LAST NIGHT THE REPUBLICAN
  • 00:01:29,522 | HOPEFULS INTRODUCE THEMSELVES TO
  • 00:01:30,623 | PRIMARY VOTERS.
  • XDS parser. Not complete (no point in dealing with VChip stuff for example), but enough to extract program and station information.
  • Input streams can now come from standard input using - (just a hyphen) as parameter.
  • Added a new output format called 'null' (use null or out=null). This format means "Don't produce any file", and is useful to have CCExtractor process the stream (for XDS messages, debugging, etc) without actually generating anything.
  • Updated Windows GUI.
  • Added quiet => If used, CCExtractor will not write any message.
  • Added stdout => If used, the captions will be sent to stdout (console) instead of file. Combined with the hyphen parameter for standard input, CCExtractor can work as a filter in a larger process, receiving the stream from stdin and sending the captions to stdout.
  • Some code clean up, minor refactoring.
  • Teletext detection (not yet processing).
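The timed transcript sample shown above (a timestamp, a " | " separator, then the caption text) is easy to consume from a script. The Python sketch below parses lines of exactly that shape. It is an assumption-laden illustration: the function name is hypothetical, and it only covers the layout shown in this 0.59 entry, not any fields added to the format in later versions.

```python
import re
from typing import Tuple

# Matches lines such as "00:01:25,485 | HOST: LAST NIGHT THE REPUBLICAN".
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) \| (.*)$")


def parse_ttxt_line(line: str) -> Tuple[int, str]:
    """Return (time in milliseconds, caption text) for one transcript line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("not a timed transcript line: %r" % line)
    hours, minutes, seconds, millis = (int(g) for g in match.groups()[:4])
    total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    return total_ms, match.group(5)


sample = [
    "00:01:25,485 | HOST: LAST NIGHT THE REPUBLICAN",
    "00:01:29,522 | HOPEFULS INTRODUCE THEMSELVES TO",
    "00:01:30,623 | PRIMARY VOTERS.",
]
parsed = [parse_ttxt_line(line) for line in sample]
```

Converting the timestamps to milliseconds makes it simple to compute gaps between lines, for example when looking for pauses in roll-up captions.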

New in CCExtractor 0.54 (Apr 17, 2009)

  • Add -nosync and -fullbin switches for debugging purposes.
  • Remove -lg (--largegops) switch.
  • Improve synchronization of captions for source files with jumps in their time information or gaps in the caption information.
  • [R. Abarca] Changed Mac script, it now compiles/links everything from the /src directory.
  • It's now possible to have CCExtractor add credits automatically.
  • Added a feature to add start and end messages (for credits).

New in CCExtractor 0.53 (Feb 24, 2009)

  • Force generated RCWT files to have the same length as source file.
  • Fix documentation for -startat / -endat switches.
  • Make -startat / -endat work with all output formats.
  • Fix sync check for raw/rcwt files.
  • Improve timing of dvr-ms NTSC captions.
  • Add -in=bin switch to read CCExtractor's own binary format.
  • Fix problem with short input files (smaller than 1 MB).
  • Clean up regular and debug output.
  • Add --no_progress_bar switch to help readability of redirected output.
  • Add -out=bin switch to write RCWT data.
  • Remove -bo/--bufferoutput switch and functionality.
  • [Volker] Added new generic binary format (RCWT for Raw Captions With Time). This new format allows one file to contain all the available closed caption data instead of just one stream.
  • Added --no_progress_bar to disable status information (mostly used when debugging, as the progress information is annoying in the middle of debug logs).
  • The Windows GUI was reported to freeze in some conditions. Fixed.
  • The Windows GUI is now targeted for .NET 2.0 instead of 3.5. This allows Windows 2000 to run it (there's no .NET 3.5 for Windows 2000), as requested by a couple of key users.

New in CCExtractor 0.52 (Dec 22, 2008)

  • Removed -autopad and -goppad, no longer needed.
  • In preparation to a new binary format we have renamed the current .bin to .raw. Raw files have only CC data (with no header, timing, etc).
  • The input file format (when forced) is now specified with -in=format such as -in=ts, -in=raw, -in=ps ...
  • The old switches (-ts, -ps, etc) still work.
  • The only exception is -bin which has been removed (reserved for the new binary format). Use -in=raw to process a raw file.
  • Removed -d, which used a DVD format when producing a raw file. This has been merged into a new output type "dvdraw". So now instead of using -raw -d as before, use -out=dvdraw if you need this.
  • Removed --noff
  • Added gui_mode_reports for frontend communications, see related file.
  • Windows GUI rewritten. Source code now included, too.
  • [Volker] Dish Network clean-up

New in CCExtractor 0.50 (Dec 13, 2008)

  • Changes:
  • [Volker] Fix in DVR-MS NTSC timing
  • [Volker] More clean-up
  • Minor fixes

New in CCExtractor 0.49 (Dec 10, 2008)

  • Changes:
  • [Volker] Major MPEG parser rework. Code much cleaner now.
  • Some stations transmit broken roll-up captions, and for some reason don't send CRs but RUs... Added work-around code to make captions readable.
  • Started work on EIA-708 (DTV). Right now you can add -debug-708 to get a dump of the 708 data. An actually useful decoder will come soon.
  • Some of the changes MIGHT HAVE BROKEN MythTV's code. I don't use MythTV myself so I rely on other people's samples and reports. If MythTV is broken please let me know.
  • Added new debug options.
  • Other minor bugfixes and changes.

New in CCExtractor 0.46 (Nov 25, 2008)

  • Changes:
  • Added support for live streaming, ccextractor can now process files that are being recorded at the same time.
  • [Volker] Added a new DVR-MS loop - this is completely new, DVR-MS specific code, so we no longer use the generic MPEG code for DVR-MS. DVR-MS should (or will be eventually at least) be as reliable as TS.
  • Note: For now, it's only ATSC recordings, not NTSC (analog) recordings.

New in CCExtractor 0.45 (Nov 17, 2008)

  • Changes:
  • Added autodetection of DVR-MS files.
  • Added -asf to force DVR-MS mode.
  • Added some specific support for DVR-MS files. This format used to work correctly in 0.34 (pure luck) but the MPEG code rework broke it. It should work as it used to.
  • Updated Windows GUI to support the new options.
  • Added -lg (--largegops). From the help screen: Each Group-of-Picture comes with timing information. When this info is too separate (for example because there are a lot of frames in a GOP) ccextractor may prefer not to use GOP timing. Use this option if you need ccextractor to use GOP timing in large GOPs.

New in CCExtractor 0.44 (Sep 10, 2008)

  • Added an option to the GUI to process individual files in batch, i.e. call ccextractor once per file. Use it if you want to process several unrelated files in one go.
  • Added an option to prevent duplicate lines in roll-up captions.
  • Several minor bugfixes.
  • Updated the GUI to add the new options.

New in CCExtractor 0.41 (Jun 16, 2008)

  • A semi-decent MPEG parser (instead of a pattern scanner).
  • A transcript mode. No time codes, no repeated lines in roll-up caption.
  • Tivo support.
  • Several bugfixes.

New in CCExtractor 0.40 (May 21, 2008)

  • A semi-decent MPEG parser (instead of a pattern scanner).
  • A transcript mode. No time codes, no repeated lines in roll-up caption.
  • Tivo support.
  • Several bugfixes.

New in CCExtractor 0.30 (May 25, 2007)

  • Fix in extended char decoding, I wasn't replacing the previous char.
  • When a sequence code was found before having a PTS, reported time was undefined.