Speed it up, a bit

I have posted here several modified versions of libc functions, optimised to run faster on modern processors (see the Algorithms and Technologies category).

All of these functions are implemented in DataparkSearch Engine. Their performance relative to the standard implementations on a given platform depends on the microprocessor and on the compiler optimisation level used, so I’ve added a performance test to the configure stage that selects only those implementations which run faster on the target platform. That gives DataparkSearch the best performance available from these functions on every platform.
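
The selection step described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual DataparkSearch configure test: the function candidate_strlen, the helper names, and the choice of strlen as the function under test are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical candidate: a plain byte-by-byte strlen replacement.
   It stands in for an optimised routine; it is not libdp's code. */
static size_t candidate_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0') p++;
    return (size_t)(p - s);
}

/* Wrapper so the libc version can be called through the same pointer type. */
static size_t libc_strlen(const char *s) { return strlen(s); }

typedef size_t (*strlen_fn)(const char *);

/* CPU seconds spent calling fn on s for the given number of rounds. */
static double time_strlen(strlen_fn fn, const char *s, long rounds) {
    volatile size_t sink = 0;          /* keeps the calls from being dropped */
    clock_t start = clock();
    for (long i = 0; i < rounds; i++) sink += fn(s);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Returns 1 if the candidate measured faster than libc, 0 otherwise. */
static int candidate_is_faster(const char *s, long rounds) {
    /* A replacement is only worth timing if it gives the same answer. */
    assert(candidate_strlen(s) == libc_strlen(s));
    return time_strlen(candidate_strlen, s, rounds)
         < time_strlen(libc_strlen, s, rounds);
}
```

A configure script could compile and run a small program built around candidate_is_faster and define a preprocessor symbol only when it returns 1. A single short run is noisy, so a real test would probably repeat the measurement and use inputs of several sizes.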

I’ve also spun these implementations off into a separate library, libdp, which can be installed on its own and used to speed up any dynamically linked application (via the LD_PRELOAD environment variable). For example:


LD_PRELOAD=/usr/local/dpsearch/lib/libdp-4.so perl programm.pl
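
For readers unfamiliar with the mechanism, here is a minimal sketch of how a preloaded library can replace a libc function. It is not taken from libdp: the post does not list which functions libdp provides, so strcmp is an arbitrary choice, and the file names in the comment are hypothetical.

```c
/* Illustrative replacement for libc's strcmp (NOT libdp's implementation).
   Because the exported name matches the libc symbol, the dynamic linker
   resolves the application's strcmp calls to this definition when the
   shared object is listed in LD_PRELOAD. */
int strcmp(const char *a, const char *b) {
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    /* Advance while both strings agree and the end is not reached. */
    while (*p != '\0' && *p == *q) { p++; q++; }
    /* Negative, zero, or positive, as strcmp requires. */
    return (int)*p - (int)*q;
}

/* Build and use (hypothetical file names):
     gcc -O2 -fPIC -shared -o libmystrcmp.so mystrcmp.c
     LD_PRELOAD=./libmystrcmp.so perl programm.pl */
```

Only calls that go through the dynamic linker are affected. Calls that the compiler has expanded inline, and statically linked programs, keep their original code.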

Here is a PerlBench performance comparison for the Perl interpreter running on an Intel Celeron M processor under Ubuntu Linux 10.04. Configuration A is the system Perl invoked as /usr/bin/perl, Configuration B is the same interpreter invoked as /usr/bin/perl5.10.1, and Configuration C is the system Perl invoked with libdp preloaded via the LD_PRELOAD environment variable. Performance differences of 1-2% can be treated as random fluctuations, and the corresponding tests as running at the same speed.

And here is an rjsh-pybench performance comparison for the system Python interpreter running on the same laptop, without (benchmark p0) and with (benchmark p1) libdp preloaded.


PYBENCH 1.4

Benchmark: p1 (rounds=20, warp=1)

Comparing with: p0 (rounds=20, warp=1)
Tests:                              min run     cmp run     avg run      diff
-----------------------------------------------------------------------------
          BuiltinFunctionCalls:   169.50 ms   159.50 ms   173.80 ms    +6.27%
           BuiltinMethodLookup:   179.00 ms   179.00 ms   190.12 ms    -0.00%
                 CompareFloats:   289.00 ms   289.00 ms   300.25 ms    +0.00%
         CompareFloatsIntegers:   239.00 ms   219.50 ms   245.63 ms    +8.88%
               CompareIntegers:   219.00 ms   218.50 ms   228.90 ms    +0.23%
        CompareInternedStrings:   217.00 ms   217.00 ms   226.40 ms    +0.00%
                  CompareLongs:   209.00 ms   208.50 ms   213.35 ms    +0.24%
                CompareStrings:   277.50 ms   277.50 ms   285.00 ms    +0.00%
                CompareUnicode:   218.50 ms   218.50 ms   223.83 ms    +0.00%
                 ConcatStrings:   139.50 ms   159.50 ms   156.62 ms   -12.54%
                 ConcatUnicode:   169.50 ms   189.50 ms   185.28 ms   -10.55%
               CreateInstances:   189.00 ms   189.00 ms   193.50 ms    +0.00%
            CreateNewInstances:  1117.00 ms  1127.00 ms  1141.15 ms    -0.89%
       CreateStringsWithConcat:   198.50 ms   208.50 ms   201.85 ms    -4.80%
       CreateUnicodeWithConcat:   179.50 ms   209.50 ms   196.13 ms   -14.32%
                  DictCreation:   179.00 ms   179.00 ms   186.83 ms    +0.00%
             DictWithFloatKeys:   199.00 ms   199.00 ms   206.60 ms    -0.00%
           DictWithIntegerKeys:   218.00 ms   218.00 ms   224.75 ms    +0.00%
            DictWithStringKeys:   317.00 ms   317.00 ms   326.80 ms    +0.00%
                      ForLoops:   179.50 ms   189.50 ms   195.80 ms    -5.28%
                    IfThenElse:   258.00 ms   258.00 ms   271.63 ms    +0.00%
                   ListSlicing:   289.00 ms   289.00 ms   297.30 ms    +0.00%
                NestedForLoops:   449.50 ms   449.50 ms   467.98 ms    +0.00%
          NormalClassAttribute:   248.00 ms   249.00 ms   254.55 ms    -0.40%
       NormalInstanceAttribute:   248.50 ms   248.50 ms   253.50 ms    +0.00%
           PythonFunctionCalls:   289.00 ms   289.00 ms   294.23 ms    -0.00%
             PythonMethodCalls:   279.50 ms   279.50 ms   291.72 ms    +0.00%
                     Recursion:   179.50 ms   179.50 ms   192.52 ms    +0.00%
                  SecondImport:   868.00 ms   888.50 ms   884.02 ms    -2.31%
           SecondPackageImport:   119.50 ms   119.50 ms   127.45 ms    +0.00%
         SecondSubmoduleImport:   129.50 ms   129.50 ms   138.88 ms    +0.00%
       SimpleComplexArithmetic:   189.50 ms   199.00 ms   203.45 ms    -4.77%
        SimpleDictManipulation:   308.50 ms   308.50 ms   322.00 ms    +0.00%
         SimpleFloatArithmetic:   348.00 ms   348.50 ms   358.18 ms    -0.14%
      SimpleIntFloatArithmetic:   268.00 ms   268.00 ms   277.42 ms    +0.00%
       SimpleIntegerArithmetic:   188.50 ms   188.50 ms   195.37 ms    +0.00%
        SimpleListManipulation:   268.50 ms   258.50 ms   275.75 ms    +3.87%
          SimpleLongArithmetic:   189.50 ms   179.00 ms   198.60 ms    +5.87%
                    SmallLists:   148.50 ms   148.50 ms   153.90 ms    -0.00%
                   SmallTuples:   179.00 ms   169.50 ms   183.95 ms    +5.60%
         SpecialClassAttribute:   199.00 ms   199.00 ms   202.32 ms    +0.00%
      SpecialInstanceAttribute:   279.00 ms   279.00 ms   285.30 ms    -0.00%
                StringMappings:   259.00 ms   259.00 ms   266.23 ms    -0.00%
              StringPredicates:   235.50 ms   235.50 ms   246.42 ms    -0.00%
                 StringSlicing:   128.50 ms   139.00 ms   138.25 ms    -7.55%
                     TryExcept:   198.00 ms   198.50 ms   202.45 ms    -0.25%
                TryRaiseExcept:   129.00 ms   129.00 ms   135.25 ms    +0.00%
                  TupleSlicing:   289.50 ms   299.00 ms   305.12 ms    -3.18%
               UnicodeMappings:   229.00 ms   239.00 ms   247.88 ms    -4.18%
             UnicodePredicates:   244.50 ms   244.50 ms   253.53 ms    -0.00%
             UnicodeProperties:   186.50 ms   186.00 ms   194.32 ms    +0.27%
                UnicodeSlicing:   109.50 ms   129.00 ms   122.97 ms   -15.12%
-----------------------------------------------------------------------------
   Notional minimum round time: 13036.00 ms 13156.50 ms                -0.92%

Small differences of less than 1% can be considered fluctuations as well.
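
To read the diff column: its values match (min run / cmp run - 1) × 100, where "min run" is the p1 run (with libdp) and "cmp run" is the p0 run (without), so a negative value means the libdp run took less time. The small sketch below reproduces that arithmetic. The function names are made up for the example, and the fluctuation threshold is left as a parameter for the reader to choose.

```c
#include <math.h>

/* Relative difference in percent between this run's minimum time and the
   comparison run's minimum time. Negative means this run was faster. */
static double diff_percent(double min_run_ms, double cmp_run_ms) {
    return (min_run_ms / cmp_run_ms - 1.0) * 100.0;
}

/* Returns 1 if the difference is small enough to be treated as noise. */
static int is_fluctuation(double diff, double threshold_percent) {
    return fabs(diff) < threshold_percent;
}
```

For the ConcatStrings row, diff_percent(139.50, 159.50) gives about -12.54, and for the last row, diff_percent(13036.00, 13156.50) gives about -0.92, matching the table.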

2 Responses to “Speed it up, a bit”

  1. Maxime Says:

    The PYBENCH benchmark results were replaced by the results of the same test run at warp factor 1 instead of the previous 10, which gives more precise measurements.

  2. Old trick speedups RaspberryPi | Founds Says:

    […] The main aim is to test DataparkSearch Engine on a new architecture. And the first step was to test libdp, an auxiliary library implementing some libc functions in a more efficient way (see Speed it up, a bit). […]
