1 (edited by JDiG 2017-02-14 17:34:54)

Topic: Some results from QTGMC & HEVC encoding tests

I've been tooling around with Hybrid, trying to squeeze maximum performance out of it for QTGMC deinterlacing + bobbing, followed by x265 encoding.

Based on my test runs today, Hybrid's latest release (2017.02.12.2) is stable on my encoding rig. More importantly, a processing bottleneck that used to slow down QTGMC and multithreaded Avisynth has been removed from the pipeline.

The bottleneck was evident in an earlier version of Hybrid when I tried QTGMC deinterlacing + bobbing with NVEncC encoding on a GTX 1060: the encoding speed hovered around 2 fps. With a newer development version of Hybrid, the speed suddenly jumped to 13.50 fps, which confirms that the bottleneck is gone.

Since the GTX 1060's HEVC encoder leaves quite a bit of room for improvement in visual quality, I wanted to see what encoding performance I could get when running QTGMC and the x265 software encoder simultaneously. So instead of QTGMC running on the CPU and encoding on the GPU, both QTGMC processing and encoding now run on the CPU.

First, some details on the rig and the Hybrid settings used.

Rig:
• Dual Xeon X5660 Westmere (MMX, SSE-SSE4.2)
• 12GB of RAM
• Windows 7 64-bit

Some Hybrid settings on the Filtering > Avisynth > Misc page:
• "distributor()" enabled
• Memory max 1536 with "double" enabled
• "enforce 8bit" enabled

Encoding source:
• 1080i 29.97fps TFF, MPEG-2 .ts, video bitrate 16.8 Mbps, 6580 frames in length

Encoding output:
• 1080p59.94, H.265 .mkv, video bitrate < 10 Mbps
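For reference, bobbing a 29.97 fps interlaced source emits one progressive frame per field, so the 6580-frame source becomes 13160 frames at 59.94 fps with the same running time. A minimal Python sketch of that arithmetic plus a rough wall-clock estimate; the helper names and the 3.5 fps encoding speed are illustrative assumptions, not Hybrid output:

```python
from fractions import Fraction

# NTSC-style rates expressed exactly as fractions
SOURCE_FPS = Fraction(30000, 1001)   # 1080i 29.97 (two fields per frame)
OUTPUT_FPS = 2 * SOURCE_FPS          # bobbed: one frame per field -> 59.94

def bobbed_frame_count(source_frames: int) -> int:
    """A bob deinterlacer turns every field into its own progressive frame."""
    return 2 * source_frames

def duration_seconds(frames: int, fps: Fraction) -> float:
    return float(frames / fps)

def encode_time_minutes(frames: int, encode_fps: float) -> float:
    """Rough wall-clock time if the whole pipeline sustains encode_fps."""
    return frames / encode_fps / 60

src_frames = 6580
out_frames = bobbed_frame_count(src_frames)
print(out_frames)                                        # 13160
print(round(duration_seconds(src_frames, SOURCE_FPS), 2))  # 219.55
print(round(duration_seconds(out_frames, OUTPUT_FPS), 2))  # 219.55
print(round(encode_time_minutes(out_frames, 3.5), 1))      # 62.7 (hypothetical 3.5 fps)
```

Doubling the frame count is why an encoder speed that looks reasonable for 29.97 fps material takes roughly twice as long on the bobbed output.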

Results:
https://s2.postimg.org/4lq39ejft/x265_enc_results.png

The first thing I noticed was that an x265 executable compiled with MSVC was slightly faster on my rig than one compiled with GCC. Hence all of the following tests were done with the MSVC-compiled x265.

The results seem to show that raising the thread counts for Edi and MT does not make encoding faster. That's at least partly because extra filter threads take CPU cycles away from the x265 encoder, which is otherwise happy to use every available cycle. Higher thread counts certainly show up in the Avisynth benchmark as more threads, higher CPU usage and even a higher average fps, but those gains do not translate to real-world encoding.

I'll do additional encoding tests using GPU acceleration instead of x265 to see how Edi & MT thread counts affect performance when encoding is not done by the CPU.

2

Re: Some results from QTGMC & HEVC encoding tests

Since your Avisynth script runs ~3 times faster than the average encoding speed, Avisynth isn't your bottleneck; x265 is.
So there is no reason to expect faster script processing to speed up encoding. -> As long as your Avisynth script isn't the bottleneck, throwing more CPU power at it is the wrong move.
Ideally you want to spend as little CPU power as possible on script processing while still staying above the encoding speed. ;)
Since you've got lots of CPU cores, you might want to use a higher slice count. (For me, maximum performance was reached at coreCount/2 slices; be aware that using slices lowers compression efficiency.)
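The bottleneck argument above can be stated as a simple throughput model: when the frameserver and the encoder run as a pipe, the pipeline can't go faster than its slowest stage, so spare speed in the script stage is wasted. A minimal Python sketch, assuming the stages overlap perfectly and ignoring that both stages compete for the same CPU cores (which is exactly why extra script threads can slow x265 down); the stage speeds and helper names are hypothetical:

```python
def pipeline_fps(stage_fps: dict[str, float]) -> tuple[str, float]:
    """Return (slowest stage, its fps): the best-case speed of a pipe."""
    name = min(stage_fps, key=stage_fps.get)
    return name, stage_fps[name]

def script_headroom(script_fps: float, encoder_fps: float) -> float:
    """A ratio above 1 means the script outruns the encoder; extra CPU
    spent on the script beyond that point cannot raise the pipe's speed."""
    return script_fps / encoder_fps

# Hypothetical numbers matching the "~3 times faster" observation
stages = {"avisynth": 9.0, "x265": 3.0}
print(pipeline_fps(stages))                                  # ('x265', 3.0)
print(script_headroom(stages["avisynth"], stages["x265"]))   # 3.0
```

In practice the goal is to keep the headroom just above 1 and hand every remaining core to the encoder.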

Cu Selur

3

Re: Some results from QTGMC & HEVC encoding tests

Thank you Selur! I'll try raising the slice count.

By the way, I noticed that even when I've ticked the "Adaptive quantization: Spatial" box in NVEnc > Main, it looks like AQ is set to Off during encoding. Also, ticking and unticking the box makes no changes to the NVEnc command line string shown at the bottom of the Main screen.

4

Re: Some results from QTGMC & HEVC encoding tests

By the way, I noticed that even when I've ticked the "Adaptive quantization: Spatial" box in NVEnc > Main, it looks like AQ is set to Off during encoding. Also, ticking and unticking the box makes no changes to the NVEnc command line string shown at the bottom of the Main screen.

Only if you use a constant quantizer, which contradicts adaptive quantization. (I merely forgot to gray it out ;))

5

Re: Some results from QTGMC & HEVC encoding tests

^ Heh, OK :D

I changed Slices from 1 to 0 and ran a job, but x265 does nothing. The Jobs window says "STARTED encoding video", but nothing actually happens: the CPUs are idling. Task Manager reports that ffmpeg_32.exe uses 1.7GB and x265 uses 484MB of RAM.

I also tried setting Pools to 2 (since the rig has 2 CPUs) and Slices to 8, but the same thing happens as above.

6

Re: Some results from QTGMC & HEVC encoding tests

That's strange. It worked fine the last time I used the option, and as always, without a debug output I have no clue. :)

7 (edited by JDiG 2017-02-14 19:27:57)

Re: Some results from QTGMC & HEVC encoding tests

Here's the DebugOutput.

I saw in Task Manager that x265.exe's CPU usage climbed to 20% for the blink of an eye and then fell to zero. The DebugOutput shows that x265 actually did something for a brief moment.

8

Re: Some results from QTGMC & HEVC encoding tests

Just tested it here; no problem with the threading settings at all.

9

Re: Some results from QTGMC & HEVC encoding tests

I swapped the MSVC-compiled x265 for the GCC-compiled version, but the same thing happens. If I leave Slices at 1 but set Pools to 2, x265 starts encoding as expected.

10

Re: Some results from QTGMC & HEVC encoding tests

Encoding call:

"C:\PROGRA~1\Hybrid\ffmpeg_32.exe" -y -threads 16 -i "F:\HYBRID_TEMP\encodingTempSynthSkript_19_23_00_7410.avs" -an -sn  -vsync 0  -pix_fmt yuv420p  -f yuv4mpegpipe - | "C:\PROGRA~1\Hybrid\x265.exe" --preset slow --pme --input - --y4m --profile main --no-high-tier --level-idc 5.1 --bframes 8 --ref 6 --slices 12 --crf 25.00 --limit-refs 1 --dynamic-rd 0.00 --vbv-maxrate 40000 --vbv-bufsize 40000 --aud --deblock=-2:-2 --repeat-headers --range limited --colorprim bt709 --transfer bt709 --colormatrix bt709 --output "F:\HYBRID~1\19_23_00_7410_06.265"

looks fine, and the general output of ffmpeg looks fine too:

ffmpeg version N-83486-g25d9cb4621 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 6.3.0 (Rev1, Built by MSYS2 project)
  configuration:  --enable-avisynth --enable-gmp --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx --enable-libx264 --disable-w32threads --enable-fontconfig --enable-frei0r --enable-gnutls --enable-libass --enable-libbluray --enable-libbs2b --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libwavpack --enable-libwebp --enable-libxavs --enable-libxvid --enable-libzimg --enable-libsnappy --enable-gpl --enable-nvenc --enable-version3 --enable-filter=frei0r --disable-debug
  libavutil      55. 46.100 / 55. 46.100
  libavcodec     57. 78.100 / 57. 78.100
  libavformat    57. 66.102 / 57. 66.102
  libavdevice    57.  2.100 / 57.  2.100
  libavfilter     6. 73.100 /  6. 73.100
  libswscale      4.  3.101 /  4.  3.101
  libswresample   2.  4.100 /  2.  4.100
  libpostproc    54.  2.100 / 54.  2.100

Input #0, avisynth, from 'F:\HYBRID_TEMP\encodingTempSynthSkript_19_23_00_7410.avs':
  Duration: 00:00:29.66, start: 0.000000, bitrate: 0 kb/s
    Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p, 1920x1080, 59.94 fps, 59.94 tbr, 59.94 tbn, 59.94 tbc
Output #0, yuv4mpegpipe, to 'pipe:':
  Metadata:
    encoder         : Lavf57.66.102
    Stream #0:0: Video: wrapped_avframe, yuv420p, 1920x1080, q=2-31, 200 kb/s, 59.94 fps, 59.94 tbn, 59.94 tbc
    Metadata:
      encoder         : Lavc57.78.100 wrapped_avframe
Stream mapping:
  Stream #0:0 -> #0:0 (rawvideo (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help

frame=    9 fps=0.0 q=-0.0 size=   27338kB time=00:00:00.15 bitrate=1491506.9kbits/s speed=0.287x    
frame=   16 fps= 15 q=-0.0 size=   48600kB time=00:00:00.26 bitrate=1491507.2kbits/s speed=0.257x    
frame=   25 fps= 15 q=-0.0 size=   75938kB time=00:00:00.41 bitrate=1491505.8kbits/s speed=0.248x    
frame=   32 fps= 14 q=-0.0 size=   97200kB time=00:00:00.53 bitrate=1491503.4kbits/s speed=0.236x    
frame=   40 fps= 14 q=-0.0 size=  121500kB time=00:00:00.66 bitrate=1491504.9kbits/s speed=0.239x    

doesn't show any problem.
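Note that the size and bitrate columns in this log describe the uncompressed YUV4MPEG stream going into the pipe, not the HEVC output. A minimal Python sketch, assuming 8-bit 4:2:0 frames and a 6-byte `FRAME\n` marker per frame (the small stream header is ignored), reproduces the logged figures:

```python
def y4m_frame_bytes(width: int, height: int) -> int:
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # U and V planes at 4:2:0
    return luma + chroma + len(b"FRAME\n")     # per-frame marker

def pipe_kib(frames: int, width: int = 1920, height: int = 1080) -> float:
    # ffmpeg's "size=" counts kB as 1024 bytes
    return frames * y4m_frame_bytes(width, height) / 1024

def pipe_kbps(width: int = 1920, height: int = 1080,
              fps: float = 60000 / 1001) -> float:
    # ffmpeg's "bitrate=" uses 1000 bits per kbit
    return y4m_frame_bytes(width, height) * 8 * fps / 1000

print(round(pipe_kib(40)))      # 121500, matching "size=  121500kB"
print(round(pipe_kbps(), 1))    # 1491503.4, i.e. ~1.49 Gbit/s of raw video
```

The running bitrate in the log wobbles slightly around this value because ffmpeg divides by a rounded timestamp.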

the output of x265:

 y4m  [info]: 1920x1080 fps 60000/1001 i420p8 unknown frame count
 raw  [info]: output file: F:\HYBRID~1\19_23_00_7410_06.265
 x265 [info]: HEVC encoder version 2.2+35-fe2f2dd96f8c
 x265 [info]: build info [Windows][MSVC 1910][64 bit] 8bit+10bit+12bit
 x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
 x265 [info]: Main profile, Level-5.1 (Main tier)
 x265 [info]: Thread pool 0 using 12 threads on numa nodes 0
 x265 [info]: Thread pool 1 using 12 threads on numa nodes 1
 x265 [info]: Slices                              : 12
 x265 [info]: frame threads / pool features       : 5 / wpp(17 rows)+pme
 x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
 x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
 x265 [info]: ME / range / subpel / merge         : star / 57 / 3 / 3
 x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
 x265 [info]: Lookahead / bframes / badapt        : 25 / 8 / 2
 x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
 x265 [info]: References / ref-limit  cu / depth  : 6 / off / on
 x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.0 / 32 / 1
 x265 [info]: Rate Control / qCompress            : CRF-25.0 / 0.60
 x265 [info]: VBV/HRD buffer / max-rate / init    : 40000 / 40000 / 0.900
 x265 [info]: tools: rect limit-modes rd=4 psy-rd=2.00 rdoq=2 psy-rdoq=1.00
 x265 [info]: tools: rskip signhide tmvp strong-intra-smoothing lslices=4
 x265 [info]: tools: slices=12 deblock(tC=-2:B=-2) sao

also doesn't indicate any problem.

=> try not using pools and slices at the same time and/or lowering the slice count
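For a sense of scale: with x265's 64x64 CTUs, a 1080-line frame has 17 CTU rows (the `wpp(17 rows)` in the log), so 12 slices leave most slices only one or two rows tall. A minimal Python sketch of that arithmetic, assuming rows are shared out as evenly as possible (x265's exact slice-boundary placement may differ); it does not identify the cause of the stall, it only shows how thin the slices get at these settings:

```python
import math

def ctu_rows(height: int, ctu_size: int = 64) -> int:
    # A partial CTU row at the bottom still counts as a row
    return math.ceil(height / ctu_size)

def rows_per_slice(height: int, slices: int, ctu_size: int = 64) -> list[int]:
    """Even split of CTU rows across slices (illustrative only)."""
    rows = ctu_rows(height, ctu_size)
    base, extra = divmod(rows, slices)
    return [base + 1 if i < extra else base for i in range(slices)]

print(ctu_rows(1080))              # 17
print(rows_per_slice(1080, 12))    # [2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1]
print(rows_per_slice(1080, 4))     # [5, 4, 4, 4]
```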

11

Re: Some results from QTGMC & HEVC encoding tests

Selur wrote:

=> try not using pools and slices at the same time and/or lowering the slice count

My first run was with Pools=1 and Slices=0, which is why Hybrid set Slices to 12 automatically. It failed the same way as the other jobs, i.e. x265 processed ~40 frames and then gave up without actually crashing.
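The automatic 12 is consistent with "Slices = 0" meaning CoreCount/2, with CoreCount counted as logical processors: two 6-core Xeon X5660s with Hyper-Threading expose 24 threads (the x265 log above shows two pools of 12 threads), and Selur's two 4-core E5640s expose 16, which matches the `--slices 8` his own Slices=0 test produced. A minimal Python sketch of that rule; the function name is hypothetical, and treating CoreCount as the logical-processor count is an inference from these numbers, not confirmed Hybrid behaviour:

```python
import os

def auto_slices(logical_cpus: int) -> int:
    """Hypothetical model of Hybrid's 'Slices = 0': CoreCount / 2."""
    return max(1, logical_cpus // 2)

# 2 sockets x 6 cores x 2 threads (Xeon X5660 with Hyper-Threading)
print(auto_slices(2 * 6 * 2))   # 12
# 2 sockets x 4 cores x 2 threads (Xeon E5640 with Hyper-Threading)
print(auto_slices(2 * 4 * 2))   # 8
# os.cpu_count() reports logical CPUs (it may return None)
print(auto_slices(os.cpu_count() or 1))
```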

Selur wrote:

general output of ffmpeg looks fine

You saw this in the DebugOutput:

frame=    9 fps=0.0 q=-0.0 size=   27338kB time=00:00:00.15 bitrate=1491506.9kbits/s speed=0.287x   
frame=   16 fps= 15 q=-0.0 size=   48600kB time=00:00:00.26 bitrate=1491507.2kbits/s speed=0.257x   
frame=   25 fps= 15 q=-0.0 size=   75938kB time=00:00:00.41 bitrate=1491505.8kbits/s speed=0.248x   
frame=   32 fps= 14 q=-0.0 size=   97200kB time=00:00:00.53 bitrate=1491503.4kbits/s speed=0.236x   
frame=   40 fps= 14 q=-0.0 size=  121500kB time=00:00:00.66 bitrate=1491504.9kbits/s speed=0.239x

That's how much x265 did before throwing in the towel. Neither x265 nor Hybrid crashed; x265 just stopped working. I waited approx. 90 seconds while keeping an eye on Task Manager, and when x265's CPU use remained at 0%, I killed the job.

12

Re: Some results from QTGMC & HEVC encoding tests

It seems that, for some reason, the decoder isn't sending any more data (or the encoder isn't requesting any more).
(Maybe some strange combination with QTGMC is triggering this.)
Since 0 means Hybrid uses CoreCount/2 slices, try setting Pools/Threads to 1/0 and slices to 8 (which is what I use on my dual Xeon system).
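This hypothesis follows from how an OS pipe behaves: a pipe only buffers a small amount of data, so if x265 stops reading stdin, ffmpeg soon blocks in its write, and if ffmpeg stops writing, x265 blocks in its read; either way both processes can sit at 0% CPU. A minimal Python sketch of the same ffmpeg | x265 wiring used in the encoding call earlier in the thread, handy for re-running a job outside Hybrid; the executable names come from that call, while the reduced flag set, the paths and the helper names are hypothetical:

```python
import subprocess

def build_commands(avs: str, out: str, slices: int) -> tuple[list[str], list[str]]:
    # Frameserver side: AviSynth script -> YUV4MPEG on stdout
    dec = ["ffmpeg_32.exe", "-y", "-i", avs, "-an", "-sn", "-vsync", "0",
           "-pix_fmt", "yuv420p", "-f", "yuv4mpegpipe", "-"]
    # Encoder side: YUV4MPEG on stdin -> raw HEVC bitstream
    enc = ["x265.exe", "--preset", "slow", "--input", "-", "--y4m",
           "--crf", "25.00", "--slices", str(slices), "--output", out]
    return dec, enc

def run_pipe(avs: str, out: str, slices: int) -> tuple[int, int]:
    dec_cmd, enc_cmd = build_commands(avs, out, slices)
    dec = subprocess.Popen(dec_cmd, stdout=subprocess.PIPE)
    enc = subprocess.Popen(enc_cmd, stdin=dec.stdout)
    # Close the parent's copy of the read end so that, if x265 exits,
    # ffmpeg gets a broken-pipe error instead of blocking on a full pipe.
    dec.stdout.close()
    # If x265 stops reading without exiting, this wait blocks forever,
    # which would look like the idle-CPU stall described in the thread.
    enc_rc = enc.wait()
    dec_rc = dec.wait()
    return dec_rc, enc_rc

# Usage (hypothetical paths): run_pipe(r"F:\temp\job.avs", r"F:\temp\job.265", 4)
```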

13 (edited by JDiG 2017-02-14 20:08:14)

Re: Some results from QTGMC & HEVC encoding tests

Selur wrote:

It seems that, for some reason, the decoder isn't sending any more data (or the encoder isn't requesting any more).
(Maybe some strange combination with QTGMC is triggering this.)
Since 0 means Hybrid uses CoreCount/2 slices, try setting Pools/Threads to 1/0 and slices to 8 (which is what I use on my dual Xeon system).

This is odd. I'm now running a job with Pools/Threads set to 1/0 and Slices set to 2... and x265 is chugging along at 1.75fps.

The odd thing is that I tried this exact same configuration earlier, and that time x265 gave up after working for a very short while.

Update: Pools/Threads at 1/0 and Slices at 8 again results in a very short burst of activity, after which x265 goes on vacation.

14

Re: Some results from QTGMC & HEVC encoding tests

Odd indeed.
Hmm, try whether anything changes when you use 'Yadif (Avisynth)' instead of QTGMC as the deinterlacer.

15

Re: Some results from QTGMC & HEVC encoding tests

Also try whether anything changes if you don't enable Distributed motion estimation (pme).
I never use it, since it caused problems on my system that I couldn't reproduce reliably enough for the x265 developers to reproduce them too.

I tried to reproduce the issue with HDV2.mpeg downloaded from https://archive.org/download/InterlacedVsProgressive, but here no random stopping occurred.
I even tried with --frame-threads 4 --pmode --pme --pools 4,4 and --slices 8.

=> Can you reproduce the issue with a short sample? If you can, could you upload it somewhere and share it with me? (That way I can check whether I can reproduce the issue with your sample; maybe the source triggers some bug.)

16

Re: Some results from QTGMC & HEVC encoding tests

^ Link to test clip sent.

I'll run more tests using the settings you recommended. It could very well be that the problem is my older Xeons, but who knows...

17

Re: Some results from QTGMC & HEVC encoding tests

I don't think it's their age, since my Xeon system consists of two Xeon E5640s (64-bit Win10 Pro; 96GB RAM; 500GB SSD), which aren't that new and crispy either. :)

  • started Hybrid

  • reset the x265 settings to make sure nothing interferes

  • set slices to 0
    which gave me: 'x265 --input - --y4m --limit-modes --no-open-gop --slices 8 --lookahead-slices 0 --crf 18.00 --cbqpoffs -2 --crqpoffs -2 --qpfile GENERATED_QP_FILE --psy-rd 2.50 --rdoq-level 2 --psy-rdoq 15.00 --aq-mode 2 --no-cutree --range limited --colormatrix bt709 --output OUTPUTFILE'

  • added the job to the queue and started it; after a few seconds, encoding showed up at ~2.7fps with ~80% CPU usage. Then, after a while, a bunch of ffmpeg crash pop-ups occurred when ffmpeg used more than 2GB of RAM

  • next I tried with Avisynth 'add distributor()' disabled: speed was ~2.7fps, and ffmpeg seemed to stay at 1711MB (encoding ran through without a problem here)

=> How much RAM does ffmpeg use when the encoding stops for you?

18

Re: Some results from QTGMC & HEVC encoding tests

Assuming Avisynth is the issue here, try the following:
a. download the 'Vapoursynth Addon' from the download section of my homepage
b. extract its content into your Hybrid folder
c. restart Hybrid
d. load source, configure x265, set Deinterlacer to 'QTGMC (Vapoursynth)'
e. add job to queue and start encoding
-> Here the encoding runs fine at ~70% CPU usage and 4fps (I only set slices to 0); I guess faster speeds are possible too by tweaking the QTGMC and x265 settings.
(To remove the Vapoursynth support and switch back to Avisynth, either edit the misc.ini file and set vapoursynth to "false", or delete all the content you added with the 'Vapoursynth Addon'.)

=> With Vapoursynth, the bottleneck here is x265: I tested the speed with x265 preset ultrafast + slices 8 and got 80% CPU usage and ~20fps, so if one aims for more speed, the x265 settings need to be adjusted.

Cu Selur

P.S.: I'll also play around with your source a bit to see whether I can get Avisynth to use less RAM. :)

19

Re: Some results from QTGMC & HEVC encoding tests

Selur wrote:

I don't think it's their age, since my Xeon system consists of two Xeon E5640s (64-bit Win10 Pro; 96GB RAM; 500GB SSD), which aren't that new and crispy either. :)

96GB of RAM?! Holy sh....

=> How much RAM does ffmpeg use when the encoding stops for you?

A few megabytes short of 1700 MB.

20

Re: Some results from QTGMC & HEVC encoding tests

96GB of RAM?! Holy sh....

That allows me to:
a. use 64GB as a RAM drive, which I normally use as Hybrid's temp folder ;)
or
b. run multiple VMs with decent memory ;)

A few megabytes short of 1700 MB.

That would still be okay. Still, please check whether the problem also occurs when you use Vapoursynth.

Cu Selur

21

Re: Some results from QTGMC & HEVC encoding tests

Another thing: when using Avisynth, set Filtering->Avisynth->Misc->Memory max to '2000' (this stopped the crash on my system).

Cu Selur

22 (edited by JDiG 2017-02-14 22:58:01)

Re: Some results from QTGMC & HEVC encoding tests

Okay, first few tests with Vapoursynth. QTGMC settings as before.

#1
x265 set to Pools/Threads 1/0, Slices set to 0 (so Hybrid sets it to 12). Pmode disabled, pme enabled.
Result: short blip of action from x265.exe, followed by throwing in the towel.
Notes: Hybrid runs mencoder_64.exe instead of FFmpeg now.

#2
x265 set to Pools/Threads 1/0, Slices set to 1. Pmode disabled, pme enabled.
Result: x265 takes a few moments to wake up, then grabs 700MB of RAM and gets to work. Task Manager says it's taking up 48-53% of CPU.
Average encoding fps: 3.55 fps

#3
x265 set to Pools/Threads 1/0, Slices set to 4. Pmode disabled, pme enabled.
Result: x265 wakes up for a moment, grabs 700MB of RAM, and falls back asleep again.

#4
x265 set to Pools/Threads 1/0, Slices set to 0. Pmode and pme disabled.
Result: x265 is prodded with a pitchfork until it wakes up from slumber, tries to grab some RAM and crashes.

#5
x265 set to Pools/Threads 1/0, Slices set to 4. Pmode and pme disabled.
Result: x265 grabs 700MB of RAM and gets to work.... for a couple of seconds, then CPU use drops to 0%.

23

Re: Some results from QTGMC & HEVC encoding tests

Argh, something is off: Vapoursynth isn't being used. (That's a bug; otherwise vspipe would be used instead of mencoder.)
-> looking into it

24 (edited by JDiG 2017-02-14 23:07:30)

Re: Some results from QTGMC & HEVC encoding tests

Wanted to see how Vapoursynth QTGMC plays with NVEncC. It doesn't.

The job finishes as if nothing's wrong, and the encoding speed is close to 60 fps, which is fantastic. Unfortunately, the output plays back incorrectly: playback speed varies at first, then settles into sped-up motion, like 30fps content played at 60fps. The output also looks like it hasn't been deinterlaced; there are combing artefacts everywhere.

edit: re-ran the NVEncC job and created a DebugOutput (attached).

25

Re: Some results from QTGMC & HEVC encoding tests

No surprise, since Vapoursynth isn't being used (like I posted before).