Topic: x264 performance increasing the --threads setting

Hybrid uses CPU cores*1.5 for the --threads setting of x264 by default.
On my AMD Phenom II X4 quad-core system and my AMD FX-8350 octa-core system, CPU cores*2.5 is always the fastest setting.
Compared to the default setting at 1080p resolution, it boosts performance by up to 8% on the 8-core system (with and without OpenCL) and by up to 2.5% on the quad-core system.

Since speed increases most between cores*1.5 and cores*2.25, this will be my new default setting.
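
To make the multipliers concrete, here is a minimal Python sketch of how the --threads value scales with the core count. The function name `x264_threads` is made up for illustration, and truncating the product to an integer is an assumption about how the result is rounded.

```python
def x264_threads(cores: int, multiplier: float) -> int:
    """Return a --threads value for a core count and multiplier (truncated)."""
    return int(cores * multiplier)


# Compare the default multiplier (1.5) with the two faster candidates.
for cores in (4, 8):
    for multiplier in (1.5, 2.25, 2.5):
        threads = x264_threads(cores, multiplier)
        print(f"{cores} cores * {multiplier} -> --threads {threads}")
```

For the 8-core system this gives 12 threads at the default multiplier, 18 at 2.25 and 20 at 2.5.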


Re: x264 performance increasing the --threads setting

Okay, do that. :)

3 (edited by mogobime 2017-04-25 23:27:07)

Re: x264 performance increasing the --threads setting

When using OpenCL, setting --lookahead-threads to half the value calculated by Hybrid tends to be the fastest setting when using 12-22 --threads.


Re: x264 performance increasing the --threads setting

Wouldn't it be simpler to disable the 'auto' option?

Here's how Hybrid (and x264) calculate the lookahead threads when set to auto:

    if( h->param.i_lookahead_threads == X264_THREADS_AUTO )
    {
        if( h->param.b_sliced_threads )
            h->param.i_lookahead_threads = h->param.i_threads;
        else
        {
            /* If we're using much slower lookahead settings than encoding settings, it helps a lot to use
             * more lookahead threads.  This typically happens in the first pass of a two-pass encode, so
             * try to guess at this sort of case.
             *
             * Tuned by a little bit of real encoding with the various presets. */
            int badapt = h->param.i_bframe_adaptive == X264_B_ADAPT_TRELLIS;
            int subme = X264_MIN( h->param.analyse.i_subpel_refine / 3, 3 ) + (h->param.analyse.i_subpel_refine > 1);
            int bframes = X264_MIN( (h->param.i_bframe - 1) / 3, 3 );
            /* [b-adapt 0/1 vs 2][quantized subme][quantized bframes] */
            static const uint8_t lookahead_thread_div[2][5][4] =
            {{{6,6,6,6}, {3,3,3,3}, {4,4,4,4}, {6,6,6,6}, {12,12,12,12}},
             {{3,2,1,1}, {2,1,1,1}, {4,3,2,1}, {6,4,3,2}, {12, 9, 6, 4}}};
            h->param.i_lookahead_threads = h->param.i_threads / lookahead_thread_div[badapt][subme][bframes];
            /* Since too many lookahead threads significantly degrades lookahead accuracy, limit auto
             * lookahead threads to about 8 macroblock rows high each at worst.  This number is chosen
             * pretty much arbitrarily. */
            h->param.i_lookahead_threads = X264_MIN( h->param.i_lookahead_threads, h->param.i_height / 128 );
        }
    }

from the x264 source code.
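
To see what the excerpt yields for concrete settings, here is a rough Python port of just the lines quoted above. The function name `auto_lookahead_threads` and its parameter names are made up for illustration, any clamping x264 applies outside the quoted lines is not modelled, and the example inputs (24 threads, subme 8, 3 B-frames, b-adapt 2) are assumed values, not settings taken from this thread.

```python
# Divisor table copied from the x264 excerpt:
# [b-adapt 0/1 vs 2][quantized subme][quantized bframes]
LOOKAHEAD_THREAD_DIV = (
    ((6, 6, 6, 6), (3, 3, 3, 3), (4, 4, 4, 4), (6, 6, 6, 6), (12, 12, 12, 12)),
    ((3, 2, 1, 1), (2, 1, 1, 1), (4, 3, 2, 1), (6, 4, 3, 2), (12, 9, 6, 4)),
)


def auto_lookahead_threads(threads, height, b_adapt, subme, bframes,
                           sliced_threads=False):
    """Mirror the quoted 'auto' branch; later clamping in x264 is not modelled."""
    if sliced_threads:
        return threads
    badapt_idx = 1 if b_adapt == 2 else 0  # X264_B_ADAPT_TRELLIS is b-adapt 2
    subme_idx = min(subme // 3, 3) + (1 if subme > 1 else 0)
    # int() truncates toward zero like C integer division (matters for bframes = 0)
    bframes_idx = min(int((bframes - 1) / 3), 3)
    divisor = LOOKAHEAD_THREAD_DIV[badapt_idx][subme_idx][bframes_idx]
    # Cap at roughly 8 macroblock rows (128 pixels) per lookahead thread.
    return min(threads // divisor, height // 128)


# Assumed example: 24 threads, 1080p, b-adapt 2, subme 8, 3 B-frames.
print(auto_lookahead_threads(24, 1080, b_adapt=2, subme=8, bframes=3))
```

With these assumed inputs the divisor is 6, so 24 threads give 4 lookahead threads, well below the height cap of 1080 / 128 = 8. Raising --threads therefore also raises the auto lookahead thread count, which is why the two settings cannot be judged in isolation.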

-> Just changing the x264 'threads' setting on its own, without taking the other factors into account, won't give you anything reliable. ;)
See: https://forum.doom9.org/showthread.php?t=163901

Cu Selur


Re: x264 performance increasing the --threads setting

=> Just getting a higher speed shouldn't be the goal; you normally also want to preserve accuracy. ;)

6 (edited by mogobime 2017-04-26 14:03:46)

Re: x264 performance increasing the --threads setting

As mentioned in my PM, using up to 18 threads doesn't change quality noticeably, if Dark Shikari's statement is to be believed.

What I just remembered:
Hybrid doesn't allow reducing sync-lookahead below the calculated minimum value, but the tooltip says:

Reducing it saves some memory and latency at the cost of throughput, since lower values allow less leeway in when threads are scheduled.

Reducing this might be useful when using OpenCL with a reduced number of normal lookahead-threads (which tends to be faster for me), but it isn't possible to set this in Hybrid.
(Increasing sync-lookahead by only +1 already reduces performance for me when using OpenCL with lookahead-threads set to auto.)


Re: x264 performance increasing the --threads setting

a. That statement was from before all the other threads were implemented, wasn't it?
b. I wanted to tell you that using more lookahead-threads might not be such a good idea. :)

Regarding sync-lookahead-threads, I will look at the code once I am back home. Not sure how I implemented that.

Will report back. (I also wanted to look into the avisynth-filter-reset thing... so much to do, so little time.)

Cu Selur


Re: x264 performance increasing the --threads setting

Checked the sync-lookahead code:

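int threads = model->intValue(MYSTRING("threads"));
if (threads == 0) {
  // threads == 0 means 'auto': fall back to Hybrid's default of CPU count * 1.5
  threads = model->intValue(CPUCOUNT) * 1.5;
}
// minimum sync-lookahead = thread count + maximum number of B-frames
threads = threads + model->intValue(MYSTRING("maxBFrames"));
this->doSet(model, MYSTRING("syncLookahead"), MYSTRING("minimum"), threads);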

0 is always possible by disabling syncthreads; other than that, the minimum is always threads + B-frame count, which makes sense to me. :) (see: https://forum.doom9.org/showthread.php? … st1321285)
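
As a worked example of the minimum described above, here is a minimal Python sketch of the same arithmetic. The function name `sync_lookahead_minimum` is made up for illustration, and the inputs (16 logical cores, 3 B-frames, 8 cores with 5 B-frames) are assumed example values.

```python
def sync_lookahead_minimum(threads: int, cpu_count: int, max_bframes: int) -> int:
    """Return threads + B-frame count; threads == 0 means 'auto' (cpu_count * 1.5)."""
    if threads == 0:
        # Truncation assumed, as happens when a C++ int is assigned a double.
        threads = int(cpu_count * 1.5)
    return threads + max_bframes


# Assumed examples: auto threads on 16 logical cores, then 40 manual threads.
print(sync_lookahead_minimum(0, cpu_count=16, max_bframes=3))   # 24 + 3
print(sync_lookahead_minimum(40, cpu_count=16, max_bframes=3))  # 40 + 3
print(sync_lookahead_minimum(12, cpu_count=8, max_bframes=5))   # 12 + 5
```

These inputs give minimums of 27, 43 and 17.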

Cu Selur

9 (edited by mogobime 2017-04-27 05:10:20)

Re: x264 performance increasing the --threads setting

0 is always possible by disabling syncthreads, other than that minimum is always: Threads + B-Frame count, which makes sense to me.

I didn't want to claim that the Hybrid calculation doesn't make sense; I just asked myself why the manual setting blocks values below the calculated minimum.
As mentioned above, raising it by only +1 with OpenCL already reduces fps on my system, which made me guess that reducing it might be useful in some setups. After reading the linked thread, I tried via the command line to set sync-lookahead values below the ones Hybrid sets when sync-lookahead is set to "minimize".
-> Useless; x264 really does seem to map them to the Hybrid value (no noticeable changes in fps/bitrate).

I already knew that I could disable it, but never had before. Now I did, and performance stayed the same or increased slightly without OpenCL and dropped by 5% with OpenCL enabled. :/ I'd never have expected that...
Maybe creating this frame buffer is just useless extra work for some modern CPUs with large and fast L3/L2 caches. Doing lookahead on the GPU seems to be a different thing.


Re: x264 performance increasing the --threads setting

OpenCL support in x264 was never that 'good'. None of the tests I did showed a real speed gain here; for some sources I got a few percent higher speed, but iirc it hurt SSIM values...
-> Personally, I don't use it. :)

11 (edited by mogobime 2017-04-27 05:45:07)

Re: x264 performance increasing the --threads setting

Maybe you misunderstood me. I disabled sync-lookahead with and without OpenCL. OpenCL seems to need this buffer, but for encoding without OpenCL, sync-lookahead seems to be useless nowadays or even decreases performance (very slightly), which makes me think that it's just useless extra work for a modern CPU with big L3/L2 caches...

For me, OpenCL lookahead is useful at HD and Full HD resolutions, but in some cases it might become useless when using DGDecNV + AviSynth GPU filters + OpenCL lookahead. But I tend to disable DGDecNV then...


Re: x264 performance increasing the --threads setting

That a lookahead will reduce the speed seems kind of obvious since it is more work, but it should help with the rate control / bit rate distribution. :)
Speed should change depending on the ..-threads settings. :)

13 (edited by mogobime 2017-04-27 05:47:46)

Re: x264 performance increasing the --threads setting

As mentioned above, I'm talking about disabling the sync-lookahead / input-lookahead buffer. :)
Quoting akupenguin:

Sync-lookahead is the number of frames in limbo in the queues connecting lookahead to the rest of the encoder. It has no effect on compression quality.


Re: x264 performance increasing the --threads setting

Disabling it hurts speed with OpenCL, and using a higher value hurts the speed too... so the conclusion so far is: don't mess with it?


Re: x264 performance increasing the --threads setting

Did a few runs:

  • slow preset (47.68 fps, 5942.16 kb/s)

  • slow preset + opencl + no-sync-lookahead  (46.72 fps, 5938.07 kb/s)

  • slow preset + opencl (sync-lookahead is 27 by default) (49.63 fps, 5908.57 kb/s)

  • slow preset + opencl + sync-lookahead 30 (49.21 fps, 5909.13 kb/s)

  • slow preset + opencl + sync-lookahead 40 (48.97 fps, 5908.01 kb/s)

  • slow preset + opencl + sync-lookahead 80 (49.52 fps, 5909.99 kb/s)

  • slow preset + opencl + sync-lookahead 160 (49.46 fps, 5908.03 kb/s)

  • slow preset + opencl + sync-lookahead 240 (49.42 fps, 5909.11 kb/s)

About my system: Ryzen 7 1800x, 64GB RAM, GeForce GTX 980ti
Source is 1920x1080@24fps with 14315 frames (~10min).
For decoding I used ffmpeg:

ffmpeg -y -loglevel fatal -threads 8 -r 24 -analyzeduration 100M -probesize 100M -i "F:\TestClips&Co\Blu-ray\x264 BR-Demo\BDMV\STREAM\00002.m2ts" -map 0:0 -an -sn  -vsync 0 -pix_fmt yuv420p  -f rawvideo - 

-> So far, my conclusion regarding OpenCL + sync-lookahead is: sticking with the default seems to be fine.
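
To put the runs above in relation to each other, here is a small Python sketch that computes the relative speed change of each OpenCL run against the plain slow-preset run. The fps figures are copied from the list above; the helper name `percent_change` and the run labels are made up for illustration.

```python
BASELINE_FPS = 47.68  # slow preset without OpenCL

# fps figures copied from the OpenCL runs listed above
opencl_runs = {
    "no sync-lookahead": 46.72,
    "sync-lookahead 27 (default)": 49.63,
    "sync-lookahead 30": 49.21,
    "sync-lookahead 40": 48.97,
    "sync-lookahead 80": 49.52,
    "sync-lookahead 160": 49.46,
    "sync-lookahead 240": 49.42,
}


def percent_change(value: float, reference: float) -> float:
    """Relative difference of value against reference, in percent."""
    return (value - reference) / reference * 100.0


for label, fps in opencl_runs.items():
    print(f"{label}: {percent_change(fps, BASELINE_FPS):+.2f}% vs. baseline")

fastest = max(opencl_runs, key=opencl_runs.get)
print("fastest OpenCL run:", fastest)
```

The default value comes out about 4% faster than the baseline, disabling sync-lookahead about 2% slower, and none of the larger values beats the default.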

  • medium preset + threads 40 (since you mentioned that 2.5*logical core count might be good) + opencl (sync-lookahead is 43 by default) (53.51 fps, 5939.06 kb/s)

-> you were right, even on my system using a higher thread count does increase the speed

Not sure how these findings hold up when additional filtering through VapourSynth, Avisynth, ffmpeg or mplayer is done...

Cu Selur


Re: x264 performance increasing the --threads setting

Now I used the following Avisynth script:

SetMemoryMax(3000) # <- this really is necessary
SetMTMode(6,16) # change MT mode
# loading source: F:\TestClips&Co\Blu-ray\x264 BR-Demo\BDMV\STREAM\00002.m2ts
#  input luminance scale tv
# current resolution: 1920x1080
SetMTMode(2) # change MT mode
# adjusting frame rate
return last

I chose SalFPS3(48.0) since it a. increases the frame count and b. causes the script to eat up some CPU time.
When benching the script with AVSMeter, adding distributor() only slowed down the decoding, and it crashes the encoding, since ffmpeg will use too much memory for a 32-bit application. ;)

  • salfps 48 + slow preset (30.04 fps, 7728.63 kb/s)

  • salfps 48 + slow preset + opencl + threads 40 (30.72 fps, 7665.28 kb/s)

Noteworthy effects of using the Avisynth script:
a. CPU usage dropped to less than half of what it was in the first benchmark round (now 25-50%).
b. Using more threads doesn't help.
-> Seems like the Avisynth script is the bottleneck here. :) (VapourSynth or Avisynth+ 64 would probably produce faster results.)

Cu Selur

17 (edited by mogobime 2017-04-29 22:35:41)

Re: x264 performance increasing the --threads setting

While I was writing in another window, a reboot prompt from my firewall popped up and I accidentally confirmed it by pressing Enter. Now all the benchmark results I got with modified sync-lookahead + lookahead-threads are lost. :(

All I can say now is I used a 1920x1080p source and a modified "very slow" preset (ref 5, bframes 5, merange 16, keyint 500, no-fast-pskip, no-dct-decimate, aq-mode 2).

If the regular x264 threads were set to the default (12), so that CPU usage wasn't at 100%, increasing sync-lookahead by +2 increased fps by at most 2-3%.
Setting higher values (like you did) didn't help in most cases or even reduced speed.

Increasing lookahead-threads might also help a little.
You can try +1 or -1 lookahead thread, but it will never help unless you also adjust sync-lookahead to at least the default value that Hybrid would use with that lookahead-threads value. (To find it, increase or decrease the regular x264 threads until Hybrid changes lookahead-threads by +1/-1. Note the values, then set them manually after setting the regular threads back to the number of threads you want to use.)
For example (using 12 x264 threads, which is the default on my system):
lookahead-threads 1/sync-lookahead 17-19
lookahead-threads 2/sync-lookahead 18-20
lookahead-threads 3/sync-lookahead 27-29

No setting helped to squeeze out more than 2-3%. So leaving lookahead-threads at the default or setting it to +1 / +2, and raising sync-lookahead by +2, might be the best and safest choice to get roughly 2% more fps at 1080p.

Raising sync-lookahead seems to help more when you haven't optimized your regular x264 threads to the maximum. When not using OpenCL and having threads optimized to the max (CPU cores*2.25 or 2.5), disabling sync-lookahead might even be faster than using it with the default values.
I'm currently repeating a few of the tests I did to show the effects. Will post them later.