# Goal of the script

Remove ads from TV recordings using optimal cuts with single-frame precision.

When I record a movie from TV, I sometimes want to archive it on my NAS.
In that case, I want to remove (at least) the beginning and the end of the recording
if the movie was broadcast on an advertisement-free channel; worse, I may
have to split it into numerous parts to remove the advertisements.
I do not want to re-encode the entire movie, since that is a really slow process (on my NAS)
and the movie is already broadcast in H.264 (DVB-T).

Doing this by hand is really painful, because most tools, such as ffmpeg or mkvmerge, can only
cut a movie (without re-encoding) at a boundary in the video stream that corresponds
to a reference frame (a so-called I-frame). These frames occur only roughly every 10-20 frames,
which corresponds to a fairly long duration (on the order of a second).

I really want to cut the movie with better precision. So I have written a Python script
that leverages _ffmpeg_, _ffprobe_, _mkvmerge_, _mkvextract_ and _vobsubocr_ to do the job with the required precision.
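
To get a feel for how sparse I-frames are in a given recording, they can be listed with _ffprobe_. The snippet below is only a minimal sketch and not necessarily how the script does it: the function names are hypothetical, and it assumes _ffprobe_ is on the `PATH` and that the first video stream (`v:0`) is the one of interest.

```python
import json
import subprocess


def ffprobe_frames_cmd(path):
    """Build an ffprobe command reporting the type and timestamp of each video frame."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",                    # assumption: first video stream
        "-show_entries", "frame=pts_time,pict_type",
        "-of", "json",
        path,
    ]


def iframe_timestamps(ffprobe_json):
    """Return the sorted timestamps (in seconds) of the I-frames found in ffprobe's JSON output."""
    frames = json.loads(ffprobe_json).get("frames", [])
    return sorted(
        float(frame["pts_time"])
        for frame in frames
        # Frames without a timestamp are skipped.
        if frame.get("pict_type") == "I" and "pts_time" in frame
    )


def list_iframes(path):
    """Run ffprobe on `path` and return the I-frame timestamps."""
    result = subprocess.run(
        ffprobe_frames_cmd(path), check=True, capture_output=True, text=True
    )
    return iframe_timestamps(result.stdout)
```

Calling `list_iframes("movie.mkv")` on a whole movie inspects every frame, so it can take a while on a slow machine.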

# Parameters

# How does it work?

The processing follows a fairly long pipeline:

1. The original .ts file is first transformed into an .mp4 file using _ffmpeg_ to correct timestamps:

2. The .mp4 is then transformed into a Matroska container (which is the default container) still using _ffmpeg_:

3. The movie is then cut according to the indications passed as parameters. Any number of parts can be given.

Each part is processed with the same algorithm.
Find the timestamp of the I-frame closest to (but not before) the start of the part.
Find the timestamp of the I-frame closest to (but not after) the end of the part.
This gives:

    start  -----    frame  --------- frame  ---------  end
    'B/P'  'B/P'*   'I'               'I'   'B/P'*   'B/P'

If the start frame is already an I-frame, there is nothing to do (likewise for the end).
Otherwise, the B- or P-frames are extracted from the start up to, but not including, the I-frame.

4. The parts obtained in the previous step are then merged using _mkvmerge_:

5. The subtitles (image based) are then extracted using _mkvextract_:

6. These images are then processed using _vobsubocr_ to create SRT files:

7. The SRT files are then remuxed inside the Matroska container using _mkvmerge_:
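
The I-frame search of step 3 can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: `plan_part` and the `head` / `middle` / `tail` names are hypothetical, times are in seconds, and `iframes` is assumed to be a sorted list of I-frame timestamps. Which range owns the boundary I-frame, and how the head and tail are processed afterwards, is not specified above and is left open here.

```python
from bisect import bisect_left, bisect_right


def plan_part(start, end, iframes):
    """Split the part [start, end) into ranges delimited by I-frames.

    Returns a dict with three entries, each either a (begin, end) tuple or None:
      head   - frames before the first I-frame of the part
      middle - from the first to the last I-frame (can be cut on I-frame boundaries)
      tail   - from the last I-frame to the end of the part
    """
    i = bisect_left(iframes, start)      # index of first I-frame at or after `start`
    if i >= len(iframes) or iframes[i] > end:
        # No I-frame inside the part: there is no I-frame-aligned range at all.
        return {"head": (start, end), "middle": None, "tail": None}

    j = bisect_right(iframes, end) - 1   # index of last I-frame at or before `end`
    first_i, last_i = iframes[i], iframes[j]
    return {
        # Nothing to do at a boundary that already falls on an I-frame.
        "head": (start, first_i) if first_i > start else None,
        "middle": (first_i, last_i) if last_i > first_i else None,
        "tail": (last_i, end) if end > last_i else None,
    }
```

For example, with I-frames every 0.48 s, `plan_part(0.2, 1.7, [0.0, 0.48, 0.96, 1.44, 1.92])` yields a head from 0.2 to 0.48, a middle from 0.48 to 1.44 and a tail from 1.44 to 1.7.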

# How to determine where to cut

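Use `mpv --osd-fractions --osd-level=3 ./movie.ts` to play the recording with the current position displayed down to the millisecond, and note the timestamps at which each part to keep starts and ends.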
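
If the timestamps read from the mpv display need to be turned into seconds, a small helper such as the one below can do it. This is a hedged sketch: `timestamp_to_seconds` is a hypothetical name, and the `HH:MM:SS.mmm` format is an assumption about what was noted down, not a description of the format the script expects.

```python
def timestamp_to_seconds(timestamp):
    """Convert 'HH:MM:SS.mmm' (or 'MM:SS.mmm') into a number of seconds."""
    seconds = 0.0
    # Each field to the left is worth 60 times the one to its right.
    for field in timestamp.strip().split(":"):
        seconds = seconds * 60 + float(field)
    return seconds
```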