Pokefan531's Posts

I post about tech and my series, Yuki's Story.
  • Legacy AMD APU Llano Laptop for Emulation - Part 4

    Radeon HD 6520g GPU Emulation tests


    Since I covered the CPU for emulation, let’s test the GPU: the Radeon HD 6520g. It is based on TeraScale 2, like the lower-end HD 6000 cards, and supports DX11 and OpenGL 4.5. Like many TeraScale 2 cards, it doesn’t support FP64 natively, so FP64 has to be emulated. That means advanced tessellation performance is poor, and the Mesa drivers don’t implement that emulation yet, so they’re stuck at OpenGL 3.3 for now. You can force a higher OpenGL version for games and apps that don’t need FP64; it’s best to try OpenGL 4.3 or 4.4 from the command line.
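    On Mesa, forcing a version from the command line is normally done with the MESA_GL_VERSION_OVERRIDE and MESA_GLSL_VERSION_OVERRIDE environment variables. Here is a minimal Python sketch, assuming you launch the emulator from a script; the helper names are my own, and the override only changes the version Mesa reports, so anything that truly needs FP64 can still misbehave.

```python
import os
import subprocess


def mesa_override_env(gl_version="4.3", glsl_version="430"):
    """Copy the current environment and add Mesa's GL/GLSL version overrides.

    The overrides change the version the driver reports; they do not add
    hardware features such as native FP64.
    """
    env = dict(os.environ)
    env["MESA_GL_VERSION_OVERRIDE"] = gl_version      # e.g. "4.3" or "4.4"
    env["MESA_GLSL_VERSION_OVERRIDE"] = glsl_version  # matching GLSL, e.g. "430"
    return env


def run_with_override(cmd, gl_version="4.3", glsl_version="430"):
    """Run a program (argv list) with the overrides applied; return its exit code."""
    result = subprocess.run(cmd, env=mesa_override_env(gl_version, glsl_version))
    return result.returncode
```

    For example, run_with_override(["retroarch"], "4.4", "440") launches RetroArch reporting OpenGL 4.4. From a shell, prefixing the command with the same two variables does the same thing.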

    We’re going to test emulators on graphics scaling, shaders, and effects. There are a few ways to avoid a CPU bottleneck. The first is to run a light game that only uses software rendering in the emulator, like SNES or Genesis, with a less accurate emulator or setting. The second is to find a scene with a lot on screen that doesn’t slow the emulator down on the CPU side. The third is to always overclock to reduce or remove any CPU bottleneck. Effects like SSAO, per-pixel rendering, or traditional anti-aliasing will rarely be usable, since this is a first-gen APU. I can’t estimate the performance of the A4s since I don’t have any APU lower than the A6-3420m. I expect the A8s and newer-generation APUs to be faster than the results below in most situations. My laptop has a single DDR3 memory slot, which is unlikely to be a bottleneck.

    Shaders: I’m testing shaders by loading Sonic 3 & Knuckles. I’m using a 320x224 game so there are enough pixels compared with lower-resolution games. The screen is upscaled 3x since I’m using a 768p monitor with integer scaling, which many shaders look better with. A + after the fps means the speed is dynamic: faster with simpler detail and slower with a lot of detail on screen. I let each shader play out and record the lowest fps. Tested in RetroArch with vsync and audio sync off and Threaded Video on.
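    The choice of 3x here, 4x at 1080p, and full 5x on a 1200p tablet all fall out of the same integer-fit arithmetic. A minimal Python sketch, assuming the laptop panel is 1366x768 and the image is centered:

```python
def integer_fit(src_w, src_h, disp_w, disp_h):
    """Return (scale, (out_w, out_h), (border_x, border_y)) for the largest
    whole-number scale that fits the display without cropping."""
    scale = min(disp_w // src_w, disp_h // src_h)
    if scale < 1:
        raise ValueError("source is larger than the display")
    out_w, out_h = src_w * scale, src_h * scale
    # Borders assume the image is centered on screen.
    return scale, (out_w, out_h), ((disp_w - out_w) // 2, (disp_h - out_h) // 2)


screens = {"768p laptop": (1366, 768), "1080p": (1920, 1080), "1200p tablet": (1920, 1200)}
for name, (w, h) in screens.items():
    print(name, integer_fit(320, 224, w, h))
# 768p laptop (3, (960, 672), (203, 48))
# 1080p (4, (1280, 896), (320, 92))
# 1200p tablet (5, (1600, 1120), (160, 40))
```

    Height is the limiting side in every case: 224 lines only fit 3 times into 768, 4 times into 1080, and 5 times into 1200.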

    720p or 768p:
    None: 400.0+
    Bicubic: 330.0+
    xBR-lv2: 174.0
    xBR-lv3: 160.0
    xBR-lv3-multipass: 268.0
    xBR-mlv4: 158.0
    Super-xbr-2P: 158.0
    Super-xbr-3P: 126.0
    xbrz-freescale: 75.0+
    xbrz-freescale-multipass: 170.0
    4x-xbrz: 50.0+
    fxaa: 10.0 (set at 1x: 47.0+)
    smaa: 170.0+
    ddt: 290.0+
    crt-geom: 167.0
    crt-royale: 64.0
    mame-ntsc: 54.0
    gtu: 100.0
    lcd-grid-v2: 200.0
    pixellate: 290.0+
    motionblur: 174.0
    nedi: 81.0
    nnedi3-2x-luma: 70.0
    nnedi3-2x-rgb: 35.0
    omniscale-legacy: 24.0
    scalefx: 193.0
    scalefx-hybrid: 157.0
    scalefx-rAA: 63.0
    4xScaleHQ: 155.0
    hq4x: 234.0
    adaptive-sharpen: 11.0 (set at 1x: 85.0)

    At HD Ready resolutions, a lot of the shaders play well with 320x224 at 3x. Scaling shaders like Bicubic run well. The xBR series runs pretty well, and the multipass presets are preferred for faster results. The fixed-scale 4x-xBRz is not at fullspeed. FXAA should be set to 1x since it only postprocesses the image and doesn’t scale it higher. Somehow, it’s still slower than SMAA; this could come down to how the shader was ported. Many CRT shaders work great. Even standard crt-royale runs over 60fps, but some CRT shaders may not look good at 3x or 720p. Hyllian, geom, and cgwg look fine. The MAME one (mame-ntsc) is the most demanding. LCD shaders play well, and sharp non-integer upscalers like Pixellate also run fast. NNEDI3 upscaling runs at fullspeed with luma only, but don’t use anything heavier like RGB, more neurons, or 4x. Omniscale is pretty slow, and the legacy version ran at less than half speed. ScaleFX works pretty well. The hybrid and rAA variants look similar, but hybrid is preferred for prerendered games for performance. Adaptive-sharpen should be set to 1x. I’m not sure why a few of the presets above don’t default to 1x for pure postprocessing shaders that do no upscaling. Overall, many shaders run well at 720p or 768p, and they look really good.

    1080p:
    Bicubic: 179.0
    xBR-lv2: 85.0
    xBR-lv3: 78.0
    xBR-lv3-multipass: 147.0
    xBR-mlv4: 95.0
    super-xbr-3p: 79.0
    super-xbr-6p: 36.0
    xbrz-freescale: 41.0+
    xbrz-freescale-multipass: 92.0
    ddt: 221.0+
    crt-royale: 37.0
    crt-royale-fake-bloom-intel: 48.0
    gtu: 54.0
    lcd-grid-v2: 100.0
    pixellate: 199.0
    nedi: 57.0
    nnedi3-2x-luma: 52.0
    scalefx: 137.0
    scalefx-hybrid: 115.0
    4xscaleHQ: 115.0
    hq4x: 162.0
    jinc2-sharper: 125.0

    At 1080p, many of the shaders hold up well going from 720p (3x) to 1080p (4x). The xBR family plays at fullspeed, especially the multipass presets. The Super-xBR presets work great at 2x with 3 passes, but going to 4x with the six-pass preset drops to almost half fullspeed. crt-royale is a heavy shader, though a great one. I tried the fake-bloom preset made for Intel’s Windows driver, but it’s still below fullspeed, and royale is mostly made for 4K anyway. GTU is also below fullspeed, but the rest of the CRT shaders should perform fine, and some that don’t look as good at 720p should look better at 1080p. NEDI and NNEDI3 also drop below 60fps. ScaleFX performs pretty well; default ScaleFX scales at 3x. Generally, many shaders above still run at fullspeed. For fast forwarding, I would choose a faster alternative, like the multipass preset of a given shader. It’s amazing how well the 6520g performs with shaders. It was a bit slower when I used GLSL shaders, before RetroArch had slang shaders and the DX11 and glcore video drivers. Moving from the gl driver with GLSL and Cg shaders to glcore (Linux) and DX11 (Windows) with slang shaders is a big improvement.

    Since I want to compare against my Tegra K1 tablet, I’ll force 5x on a 1080p display. At 5x, parts of the top and bottom get cropped, much like how most CRTs crop the image.
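    How much gets cropped is simple arithmetic. A small Python sketch, assuming the scaled image is centered so the overflow splits evenly between the top and bottom edges:

```python
def overscan_crop(src_w, src_h, scale, disp_w, disp_h):
    """Source pixels lost per edge when a forced integer scale overflows the display.

    Assumes the scaled image is centered, so the overflow is split evenly
    between left/right and top/bottom.
    """
    out_w, out_h = src_w * scale, src_h * scale
    lost_x = max(0, out_w - disp_w) / scale / 2  # source columns cut per side
    lost_y = max(0, out_h - disp_h) / scale / 2  # source lines cut per edge
    return lost_x, lost_y


print(overscan_crop(320, 224, 5, 1920, 1080))  # (0.0, 4.0): 1120 lines on a 1080-line screen
print(overscan_crop(320, 224, 5, 1920, 1200))  # (0.0, 0.0): the full image fits at 1200p
```

    So 5x on 1080p loses only about four source lines at the top and four at the bottom, while the width (1600 pixels) still fits.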

    5x at 1080p (overscan):
    None: 237.0
    Bicubic: 151.0
    xBR-lv2: 62.0
    xBR-lv3: 57.0
    xBR-lv3-multipass: 119.0
    xBR-mlv4: 76.0
    super-xbr-3p: 65.0
    xbrz-freescale-multipass: 69.0
    ddt: 210.0+
    crt-geom: 59.0
    lcd-grid-v2: 71.0
    pixellate: 173.0
    scalefx: 137.0
    scalefx-hybrid: 115.0
    4xscaleHQ: 114.0
    hq4x: 162.0
    jinc2-sharper: 96.0

    First of all, take a look at a few Framemeister videos about 5x scaling, scanlines, and integer scaling, like the ones from Phonedork, who explains 5x integer scaling for 1080p. My tablet is 1200p, which shows the full 5x image. Anyway, Bicubic and Jinc run pretty well. xBR multipass and xBRz give the sharpest antialiasing of these filters, and the multipass versions are faster. I recommend lv3-multipass. super-xbr-3p is above 60fps, but other filters are preferred. DDT performs very fast and can look a bit better than bilinear for upscaling. crt-geom barely misses fullspeed (59fps), so I suggest cgwg or Hyllian for CRT shaders. LCD shaders from 240p to 5x still perform well. Presets from the interpolation folder, like pixellate, are still very efficient. AANN is very heavy, though it’s meant for non-integer nearest-neighbor scaling that keeps the pixels visible. scalefx and hq4x didn’t change from 4x, but 9x scalefx is too heavy. I’m surprised so many of these shaders perform fine at 5x. However, the 6520g has no 4K support, and 1440p may be pushing it with 512MB of VRAM.

    Let’s test a 480p image. In RetroArch you will mostly play 240p games, but certain PS1, Saturn, and N64 games and many Dreamcast games run at 640x480. A generic 480p image is used for these GPU measurements.

    480p to 1080p:
    None: 189.0
    Super-xbr-2p: 47.0 (without jinc: 62.0)
    Super-xbr-3p: 37.0 (without jinc: 45.0)
    ddt: 180.0+
    nedi: 22.5 (without jinc: 25.0)
    nnedi3-2x-luma: 19.3 (without jinc: 21.0)
    reverse-aa: 126.0
    xbr-lv3-multipass: 90.5
    jinc2-sharper: 94.0

    I tested fewer shaders since upscaling a 480p image to 1080p is more demanding, and I only used a simple image. The Super-xBR shaders perform all right, but by default they run below 60fps from 480p. I removed jinc from the last pass and let bilinear scale from 960p to 1080p, and the 2-pass preset barely makes it (62fps). The three-pass preset is still under 60fps. These results may differ from Super-xBR in media players such as madVR or mpv. madVR uses the 2-pass version, while Super-xBR hasn’t been pursued for mpv, where RAVU outperforms it. I would love to test RAVU in RetroArch, but it isn’t available there. In media players, Super-xBR performs more like the default 2p preset here, even without jinc. Back to shaders: DDT performs very well, and it does smooth some edge lines. NEDI and NNEDI3 are below 30fps, and based on my tests they aren’t useful for media players. Reverse-AA performs pretty well and can make digitized 480p games look good. However, pixelated areas stay pixelated, so you get a mixed look with reverse-aa, though it can look decent. xBR multipass helps prerendered 480p images that have little to no antialiasing, and it can look great on early 480p games. Upscaling from 480p to 1080p is not an integer scale, so use 2x xBR and let bilinear do the rest. The jinc2 shaders are fast enough to deliver higher-quality scaling than bilinear, bicubic, and lanczos. Overall, the shaders above can make 480p games look great, depending on the content. You would typically use super-xbr, nnedi3, and jinc to upscale 480p content, but on the 6520g, go with jinc2 or reverse-aa.
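    The "2x shader, then bilinear" approach splits a non-integer factor into an integer part and a small leftover. A Python sketch of that split (the function name is just for illustration):

```python
from fractions import Fraction


def scale_plan(src_h, target_h, shader_factor):
    """Split an upscale into an integer shader pass plus a bilinear remainder."""
    direct = Fraction(target_h, src_h)          # overall factor the screen needs
    shader_out = src_h * shader_factor          # lines produced by the shader pass
    remainder = Fraction(target_h, shader_out)  # what bilinear has to cover afterwards
    return direct, shader_out, remainder


direct, shader_out, remainder = scale_plan(480, 1080, 2)
print(float(direct), direct.denominator == 1)  # 2.25 False -> not an integer scale
print(shader_out, float(remainder))            # 960 1.125
```

    Bilinear only has to stretch the 960-line shader output by 1.125x, which hides its softness far better than doing the whole 2.25x with bilinear alone.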

    Now that we’ve covered RetroArch shaders, let’s move on to internal resolution in emulators.

    PSX:
    PCSXR-PGXP performs really well on a 1080p screen, and PGXP makes the polygons look even better. Shaders in Pete’s OpenGL2 2.9 and the Tweak version work very well. When installing the Tweak version, find the ini file in the ini folder of the PCSXR directory and change the scaling from 8x to 4x. 8x heavily tanks performance due to the 512MB VRAM limit, but 4x is fine in every situation.

    N64:
    Using GlideN64 4.0 (Linux) at 1080p with default settings, except 3-point filtering on, depth buffer to RDRAM set to 0 (off) for CPU performance, and no MSAA or FXAA, I tested the very far and wide view of the icy mountains in Hailfire Peaks from Banjo-Tooie. With those settings, it runs at 36fps (22) on that scene. I know it should look less demanding, but with more accurate framebuffer effects, it runs into GPU limits. Turning on Legacy Blending and setting framebuffer mode to 1 (VI origin) brings it up to 47fps (28), which is still not fullspeed. 3-point filtering has no effect. However, lowering the resolution to 3x native (720p), or playing on my internal 768p screen, gets me around 54fps on that scene with the same setting changes. At 480p, or native resolution like you get in Angrylion, it reaches 60fps as long as FBE is at least set to VI origin. Don’t use MSAA, because it is much more demanding than just raising the resolution a few times native. It should play fine overall: many games run fine at 1080p, but demanding parts of big late-era games can get tricky. Drop to 3x or 2x if you have to.
    Project64 with Jabo’s D3D8 won’t hit any GPU bottleneck given its age. The same goes for Glide64, if you can ignore the CPU bottleneck caused by the Glide wrappers.

    Saturn: I have tested Yaba Sanshiro a little, but the APU’s tessellation performance is not great, so I’d use perspective correction. At 1x and 2x, performance is identical. At 3x (720p), it’s a little slower, so it will be slower still at 4x or higher. RBG is left at default for performance reasons. GPU tessellation is usually the fastest, but on a GPU without native FP64 (or one that emulates FP64 with two FP32 values), it either performs slower than CPU tessellation or doesn’t show any polygons (quads) on screen.

    Dreamcast:
    The old, deprecated NullDC runs at higher resolutions just fine. Reicast’s fork, Flycast, also runs fine at higher resolutions, but don’t use Per-Pixel since it’s too much for the 6520g. If the game you’re playing runs at fullspeed on the APU, you can raise the resolution in standalone Flycast. Redream only allows higher resolutions in the paid version, but it runs fine. I haven’t encountered any GPU bottleneck for Dreamcast yet.

    PS2:
    PCSX2 can push GPU usage high, even at 1x. Usually the CPU is the main bottleneck, so even at native resolution or with hack presets, the GPU bottleneck rarely shows. However, when a game is light on the CPU, the GPU becomes the bottleneck, even in some 2D scenes. If the game is pure 2D, software rendering may help more. I tried lowering the GSdx settings, with no improvement.

    Gamecube/Wii:
    I haven’t tested much on the laptop, but I’d advise using 1x resolution and not using ubershaders. The 6520g isn’t strong enough for them, so use skip drawing to avoid shader compilation stutter. For light games such as Animal Crossing and Sonic Gems Collection, you can use a higher resolution.

    PSP:
    PPSSPP allows increasing the internal resolution. God of War is not only CPU-demanding but also GPU-demanding on my laptop: its title screen is slow at higher resolutions, even at 2x. Frameskipping may sound ideal, but it can produce glitches and input lag, so it isn’t for general use. GoW runs fine at 1x, but then we’re left with the CPU: even overclocked, with default graphics settings, there is small slowdown when fighting several enemies on screen at the beginning of the game. The rest of the PSP games run fine at 2x, which is fullspeed in most games. 3x runs fine in some games, but others drop below fullspeed. 4x is mostly slower and shouldn’t be used. 2x is the safe resolution for most PSP games.
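    The PSP’s native screen is 480x272, so each resolution step multiplies the pixel count quadratically. A short Python sketch of the render sizes, assuming shading cost roughly tracks pixel count (a simplification, since geometry and CPU work don’t grow the same way):

```python
PSP_NATIVE = (480, 272)  # PSP screen resolution


def psp_render_size(multiplier):
    """Internal render size and pixel count relative to native for a given multiplier."""
    w, h = PSP_NATIVE
    return (w * multiplier, h * multiplier), multiplier ** 2


for m in range(1, 5):
    print(f"{m}x", *psp_render_size(m))
# 1x (480, 272) 1
# 2x (960, 544) 4
# 3x (1440, 816) 9
# 4x (1920, 1088) 16
```

    That jump from 4 to 9 to 16 times the native pixels lines up with 2x being the safe choice and 4x mostly being too slow.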
    As for shaders, the last three on the list are the only demanding ones I found for the 6520g: 5xBR-lv2 runs around 41fps, Video Smoothing around 21fps, and Supersampling AA is really low at around 11fps. The rest run fine at 1080p. This was only tested at native resolution, so results can change at higher resolutions, especially for scaling shaders like 5xBR and 4xHQ.

    NDS:
    As mentioned in the A6-3420m CPU tests, software rendering is faster than OpenGL, even at native resolution and with the software renderer at high settings. On any Desmume build, 2x or higher becomes slower. It’s not worth testing Desmume for GPU performance since it all comes down to the CPU bottleneck.
    MelonDS does a good job increasing performance with the OpenGL renderer, but I haven’t tried increasing the resolution yet. The emulator is still bottlenecked by the CPU. However, raising the internal resolution in 3D games gives almost the same performance, without the slowdown that Desmume suffers from.

    Citra: Since it is CPU-demanding on the laptop, it won’t hit a GPU bottleneck until you choose 2x or 3x; mostly it shows up at 2x, even with hardware shaders on and the two accuracy options turned off.

    Overall, the 6520g is a decent iGPU for retro gaming, and very good for RetroArch shaders. These benchmarks are for testing purposes, to show how the first-gen APU performs in graphics. I didn’t test old Windows games since I don’t have many of them installed on the laptop. Here’s a summary of the settings I recommend:

    PCSXR-PGXP: 4x IR + PGXP
    Jabo’s D3D8 (Windows): Any Resolution
    Glide64 (Deprecated): Any Resolution
    GlideN64 (Linux): 3x IR + 3-point Bilinear + Legacy Blending
    Yaba Sanshiro: 2x
    Dreamcast: 1080p (No Per-Pixel)
    Dolphin: 1x (Except very light games), async skip
    PPSSPP: 1x (GoW), 2x

    As a bonus, here are averages for a few Windows games, on both native Windows and Wine with Gallium Nine (DX9).
    Crysis GPU Demo: 38-40 Average (Low 800x600)
    Fortnite: 20-30 Average (Very Low 360p)*

    *Although partially playable, 3.5GB of RAM is not enough for Fortnite to avoid constant lag spikes.

    Another bonus: video players like MPC-HC with madVR, or mpv, can use shaders to make video look better. I’ll test mpv since I have Linux on the laptop as of this writing. I don’t know when I’ll test Windows with madVR; getting it to work on Wine seems challenging.

    MPV upscale from 640x360 to 1080p:
    Bilinear: 96.3
    Lanczos: 71.8
    Spline36: 71.8
    Spline64: 63.9
    Ewa_Robidoux: 14.3
    Ewa_Lanczos: 8.0
    Ravu-lite-r2: 52.0
    Ravu-lite-r3: 52.2
    Ravu-lite-r4: 50.0
    Ravu-r2: 45.9
    Ravu-r3: 45.0
    Ravu-r4: 41.2
    nnedi3-nns16-win8x4: 36.0
    FSRCNN_x2_r2-8-0-2: 26.5
    FSRCNNX_x2-8-0-4-1: 28.5

    I tested common scalers and image doubler shaders. All tests use the gather .hook files, with bilinear scaling on luma and chroma by default. Gather shaders perform better than compute on the laptop. (Note: you must force OpenGL and GLSL version 4.3 at least for gather to work.) No temporal scaling (motion blending) is used. I checked fps with the Gallium HUD, since mpv’s OSD bar or stats display can cost a few frames; this doesn’t happen with D3D11. I was surprised that bilinear is over 90fps. Bilinear, Lanczos, and Spline perform pretty well. The EWA-based scalers perform the worst, with very low fps. Every RAVU variant runs below 60fps. RAVU-lite r2 and r3 perform identically, and there isn’t much difference between r2 and r4. Standard RAVU is slower, but it’s good enough at 360p to use r4 on videos that are at most 30fps. NNEDI3-16 performs a bit better than I expected, just below RAVU-r4. Next up are the FSRCNN shaders. They’re not useful on this GPU since they’re below 30fps, and they’ll take a bigger hit on higher-resolution videos. However, FSRCNNX-8 performs slightly better and has higher quality than FSRCNN-8.
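    A test run like this can be scripted. Below is a Python sketch that builds the environment and mpv command line for one run; --vo, --gpu-api, --scale, --cscale, --interpolation, and --glsl-shaders are mpv options and GALLIUM_HUD is a Mesa variable, but the video and .hook file names are hypothetical placeholders, and this is not necessarily how my runs were launched.

```python
import os


def mpv_benchmark(video, shader, scaler="bilinear"):
    """Build the environment and argv for one mpv upscaling test run."""
    env = dict(os.environ)
    env["GALLIUM_HUD"] = "fps"               # Mesa's on-screen fps graph
    env["MESA_GL_VERSION_OVERRIDE"] = "4.3"  # gather hooks need GL/GLSL 4.3+ on this driver
    env["MESA_GLSL_VERSION_OVERRIDE"] = "430"
    args = [
        "mpv",
        "--vo=gpu",
        "--gpu-api=opengl",
        f"--scale={scaler}",     # main (luma) upscaler for whatever the shader leaves
        f"--cscale={scaler}",    # chroma upscaler
        "--interpolation=no",    # no temporal scaling / motion blending
        f"--glsl-shaders={shader}",
        video,
    ]
    return env, args


# Hypothetical file names, for illustration only.
env, args = mpv_benchmark("sample-360p.mkv", "shaders/ravu-lite-r3-gather.hook")
```

    Passing the result to subprocess.run(args, env=env) starts playback with the fps counter overlaid, so nothing from mpv’s own OSD has to be on screen.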

    MPV upscale from 720x480 DVD to 1080p:
    Bilinear: 86.3
    Lanczos: 59.9
    Spline36: 59.9
    Spline64: 54.0
    Ravu-lite-r2: 40.7
    Ravu-lite-r3: 40.3
    Ravu-lite-r4: 39.3
    Ravu-r2: 35.5
    Ravu-r3: 34.4
    Ravu-r4: 31.4
    nnedi3-nns16-win8x4: 26.3
    FSRCNN_x2_r2-8-0-2: 19.1
    FSRCNNX_x2-8-0-4-1: 20.4

    From 480p to 1080p, the differences grow. Lanczos and Spline36 are identical and borderline at 60fps, while bilinear should stay stable without a frame drop every few seconds. The RAVU shaders are appropriate for videos that are at most 30fps. RAVU-r4 is nearly down to 30fps, but r3 seems safe; r4 is better suited to 360p. NNEDI3 performs lower, and none of the FSRCNN shaders give enough performance for general playback. FSRCNNX again performs slightly better than standard FSRCNN.

    720p videos won’t play well with these image doublers, so you’re left with bilinear for 60fps playback, and lanczos or spline for 30fps or lower.
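    The "safe for 30fps" judgment can be written as a headroom check against the DVD results. A Python sketch, assuming a 10% margin over the video frame rate (my own rule of thumb, not something measured):

```python
# Lowest fps measured for 720x480 DVD -> 1080p in mpv (from the list above).
DVD_TO_1080P = {
    "bilinear": 86.3, "lanczos": 59.9, "spline36": 59.9, "spline64": 54.0,
    "ravu-lite-r2": 40.7, "ravu-lite-r3": 40.3, "ravu-lite-r4": 39.3,
    "ravu-r2": 35.5, "ravu-r3": 34.4, "ravu-r4": 31.4,
    "nnedi3-nns16-win8x4": 26.3,
    "FSRCNN_x2_r2-8-0-2": 19.1, "FSRCNNX_x2-8-0-4-1": 20.4,
}


def usable_scalers(measured, video_fps, margin=0.10):
    """Scalers whose measured fps clears the video frame rate by `margin`,
    heaviest (slowest) first."""
    needed = video_fps * (1 + margin)
    fits = [name for name, fps in measured.items() if fps >= needed]
    return sorted(fits, key=lambda name: measured[name])


print(usable_scalers(DVD_TO_1080P, 30)[:2])  # ['ravu-r3', 'ravu-r2']
print(usable_scalers(DVD_TO_1080P, 60))      # ['bilinear']
```

    With that margin, RAVU-r3 is the heaviest option for 30fps DVDs while r4 just misses, and only bilinear clears 60fps, which matches the picks above.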

    Overall, the laptop plays fifth-gen home consoles and portables up to the PSP really well. It’s a decent APU for emulation, with more room for tweaks on Linux. As an HTPC, it handles video up to 1080p. Playback is fine, but the GPU’s scaling performance is basic and limits upscaling quality. As for games, it should play DirectX 9-era games really well, and early DirectX 11 ones too.


  • Legacy AMD APU Llano Laptop for Emulation Tests - Part 3

    A6-3420m CPU Emulation performance



    I wanted to test my laptop’s APU for emulation performance. To recap, it’s AMD’s first-gen APU, with a CPU based on the Phenom K10 design plus boost. It is unlocked, so you can overclock it with software. By default, the A6-3420m is a quad-core 1.5GHz CPU that boosts to 2.4GHz on one core. Boost was new at the time, so it helps a little. Overclocking gives some programs a significant jump, and going from a weak CPU to a decent one for emulation is an interesting story. The first-gen Llano APUs are all unlocked, and they are one of the few cases where you can overclock a laptop without real risk. They came out in 2011, and seeing them in action should be surprising: the CPUs were weak from the start, but they offer decent GPU performance. I’m including both stock and overclocked benchmarks for each emulator.

    Benchmarks: A6-3420m at stock 1.5GHz-2.4GHz, and overclocked A6-3420m at 2.3GHz-2.8GHz (overclocked results in parentheses)
    All tests record the lowest non-stutter FPS on the exact same scene over a while, to see how it performs and how to avoid sound stuttering for a smooth experience. RetroArch is used in some of the benchmarks, with DX11 as the main driver on Windows and OpenGL on Linux and for hardware rendering. Standalone builds of hardware-rendering emulators are preferred (e.g. standalone Flycast over libretro Flycast, standalone Mupen64Plus over libretro Mupen64Plus). Testing a 3D emulator is best with DirectX on Windows most of the time, and OpenGL otherwise or on Linux. The Mesa drivers are the fastest and offer better compatibility. The GPU isn’t a bottleneck here since I use native resolution without any shaders or anti-aliasing. The lowest FPS of a heavy game shows which emulator you could use in general. Note that if an emulator doesn’t reach fullspeed on a system’s single most demanding game, that doesn’t mean you can’t use it with good general performance. For example, bsnes’s most CPU-demanding games are three rare ST018 games. You may never play the one demanding game, so it’s more useful to see how many popular titles perform. Some emulators may not run a demanding game at all due to compatibility or development issues. Still, it’s a good way to see what performance you’d get in general use. Staying over 60fps gives a smooth experience and lets you throw most games at it without problems.
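    One way to turn "lowest non-stutter FPS over a while" into a number is the worst rolling-average frame rate. The Python sketch below is my own approximation of that idea, assuming you have per-frame times in milliseconds and a 30-frame window; it is not the exact procedure behind these results, which were read off by watching the fps counter.

```python
def lowest_sustained_fps(frame_times_ms, window=30):
    """Lowest average fps over any `window` consecutive frames.

    Averaging over a window dilutes one-off hitches, while a slow stretch
    that lasts a while still pulls the number down.
    """
    if window < 1 or len(frame_times_ms) < window:
        raise ValueError("need at least one full window of frame times")
    window_sum = sum(frame_times_ms[:window])
    worst_sum = window_sum
    for i in range(window, len(frame_times_ms)):
        # Slide the window one frame: add the newest frame, drop the oldest.
        window_sum += frame_times_ms[i] - frame_times_ms[i - window]
        worst_sum = max(worst_sum, window_sum)
    return 1000.0 * window / worst_sum


# A light stretch at 10 ms per frame (100fps), then a heavy one at 20 ms (50fps).
times = [10.0] * 30 + [20.0] * 30
print(lowest_sustained_fps(times))  # 50.0
```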

    NES:
    Mesen-Stock:       Megaman 2 Intro
                           82.0 (100.0)
    Mesen-Very-High-OC:Megaman 2 Intro
                           48.0 (59.5)
    Nestopia UE works very well and is very light. Mesen performs fine with default settings at stock clocks. With Mesen’s virtual overclocking, only the overclocked CPU barely keeps up (59.5fps). For those features, Nestopia UE is the better choice, along with the Runahead feature for lower input latency.

    SMS/GG/Genesis/CD/32x:
    Genesis-GX-Nuked:  Virtua Racing Demos (MAME OPN2 / Nuked OPN2)
                           118.0 (154.0) / 75.0 (93.0)
    The Genesis Plus GX core is too efficient to find any issue; it’s currently the most accurate, and it was originally made for the GC and Wii. Virtua Racing is the only demanding title since it uses the SVP chip for 3D rendering. The Nuked OPN2 audio core was added for more accurate sound, and it performs great too, but I suggest the MAME OPN2 option for fast forwarding, especially with Runahead. 32X Virtua Racing runs at around four times fullspeed on PicoDrive. I haven’t tested it on Fusion yet, but I assume it will run at fullspeed.

    SNES:
    SNES9x:            Super Mario RPG
                           116.0 (163.0)
    Bsnes-v110 fast:   Super Mario RPG
                           50.0 (63.0)
                      ST018 Game
                           36.0 (47.0)
    Higan:             ST018 Game
                           21.0 (29.0)
    Bsnes-HD-Mode7:    Super Mario Kart
    1x (2x)
    The new bsnes and bsnes-hd cores perform really well. Games without enhancement chips run at fullspeed out of the box. Games with the Super FX 2 or SA-1 chip are a bit demanding and fall below fullspeed with the CPU at stock. With overclocking, they are barely above 60fps. Super Mario RPG uses the SA-1 chip; it stays smooth and may not hit small slowdowns. The most demanding games are the ones that use the ST018 chip. Only three Japanese games use it, so they aren’t common, but they won’t run at fullspeed regardless.
    Standalone bsnes can stand in for Higan if you turn off all of its special speed features. Generally, it’s best to use bsnes since Higan’s performance just isn’t there on this CPU. I also suggest the newest standalone bsnes or the bsnes-hd core over any bsnes forks you find in RetroArch.
    I haven’t tested the Super FX overclocking feature.
    I recommend mainline Snes9x if you want to fast forward and use RetroArch’s Runahead for lower latency, especially paired with overclocking for SA-1 games.
    I also tested the HD side of bsnes using the Super Mario Kart demos; the game has a DSP-1 chip. In any Mode 7 game, 2x is not fullspeed with the CPU at stock. Overclocked, it generally runs smoothly in most Mode 7 games. Super Mario Kart’s extra chip makes it slightly more demanding, so it comes close to dropping below fullspeed: over a long test, the lowest I saw was 59fps, but it generally plays at fullspeed. 2x with an overclocked APU should be fine, as long as you don’t use 2x on games with more demanding chips than the DSP ones.
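    The stock and overclocked pairs make it easy to check how much of the clock increase turns into frame rate. A small Python sketch using the clocks from the benchmark header and the SNES results above:

```python
def pct_gain(stock, overclocked):
    """Percentage increase from the stock value to the overclocked value."""
    return (overclocked - stock) / stock * 100


# Clock speeds from the benchmark header (GHz) and SNES results above (fps).
clocks = {"base clock": (1.5, 2.3), "boost clock": (2.4, 2.8)}
snes = {
    "Snes9x, Super Mario RPG": (116.0, 163.0),
    "bsnes v110 fast, Super Mario RPG": (50.0, 63.0),
    "bsnes v110 fast, ST018 game": (36.0, 47.0),
    "Higan, ST018 game": (21.0, 29.0),
}

for name, (stock, oc) in {**clocks, **snes}.items():
    print(f"{name}: +{pct_gain(stock, oc):.1f}%")
```

    The SNES gains (+26.0% to +40.5%) all land between the +16.7% boost-clock and +53.3% base-clock increases.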

    Virtual Boy: Perfect performance, regardless of hard sync.

    Sega Saturn:
    Yaba Sanshiro is the best emulator to use on this APU. You can enable frameskip to get as much performance as possible. Some parts of games may dip a bit below fullspeed, but the audio is async, so it may not be very noticeable, as long as the CPU is overclocked.

    PlayStation:
    Beetle-PSX Core:   Crash Team Racing (Interpreter / Max Performance 1024 DMA)
                           36.0 (45.0) / 47.0 (54.0)
    Mednafen:          Crash Team Racing
                           41.0 (57.0)
    PCSX-Rearmed:      Crash Team Racing (Interpreter / Dynamic)
                           57.0 (71.0) / 61.0 (81.0)
    PCSX-R PGXP:       Crash Team Racing (Vanilla / PGXP MEMORY + CPU 1.5x)
                           ~85.0 (~115.0) / ~60.0 (~85.0)
    These are the four emulators I tested on the laptop, and each has its own story.
    The Beetle PSX core in RetroArch is based on Mednafen. I’m testing the new dynamic recompiler in max-performance mode, which most games should work with. While it’s noticeably faster than the standard interpreter, it only becomes playable with barely any lag on an overclocked CPU, at least with software rendering. Hardware rendering is quite a bit slower on this laptop. I don’t know exactly why it’s slower than software, even on Linux with the Mesa drivers, but the interpreter and dynarec still hit very similar speeds there. If you want hardware rendering with higher resolution and PGXP, use the PCSX-R fork. I tested the Crash Team Racing intro and the ice bear scene, which is the slowest point I found. Even so, the dynarec at max performance with software rendering and an overclocked host CPU gives the best results. Oddly, Beetle’s interpreter is slower than Mednafen’s, even though Beetle is a fork of it.
    Mednafen is a multi-system emulator, and I used its PSX module, which is the most accurate. Without frameskip, for full measurements, Mednafen is faster than the Beetle core. Overclocking the CPU brings its performance up dramatically: it’s close to 60fps in a few spots of the CTR demos and fullspeed in a lot of areas. It’s surprising that standalone Mednafen is faster than the RetroArch core, so you may want to use it for faithful emulation. You can turn on frameskip for full emulation speed, but I recommend leaving it off for good responsiveness. Strangely, Mednafen doesn’t use the CPU boost clock for me, yet it’s still faster than the Beetle core.
    Another RetroArch core is PCSX-ReARMed, which has been available for x86 and x64 PCs for the last few years. It uses a less accurate interpreter and Pete’s software renderer for performance. On a stock CPU it reaches fullspeed most of the time; you can encounter minor slowdown, but not far below. Overclocked, it reaches fullspeed in all tested areas. Like Mednafen, it renders at 1x. Recently it gained a dynamic recompiler for x86, x64, and ARM64, which lets PCSX-ReARMed run at fullspeed without overclocking. At 1x resolution, this emulator is preferred over the other two for performance.
    PCSX-R PGXP is a really good emulator and performs excellently. You can use Pete’s OpenGL on Linux and OpenGL2 2.9 Tweak on Windows. Pete’s OpenGL 1.78 on Linux is more reliable than the Windows version and just as fast as OpenGL2 2.9 Tweak with full framebuffer settings. The only difference is that OpenGL2 2.9 allows shaders and xBR texture upscaling. Both Pete’s OpenGL 1.78 and OpenGL2 2.9 Tweak support PGXP, so you get very good polygon rendering. Of the PGXP CPU modes, only Memory runs at fullspeed. Combining PGXP Memory with the internal CPU overclock at 1.5x gets you slightly above fullspeed, and overclocking the host CPU adds more headroom in any game. Although the Linux drivers generally perform better than AMD’s official OpenGL drivers, performance here is the same. The only downside of the r600g driver at the moment, with any video plugin, is the lighting in some areas of Spyro, but it’s minor, not severe. Regardless, you should have a great experience with PCSX-R PGXP. However, neither build supports .CHD images. I also tested Windows PCSX-R PGXP under Wine; I could use OGL2 Tweak with the same performance as on Windows, but I had problems with the audio plugins and the XAudio2 driver. I recommend the PGXP Linux build for an easier setup; it’s available as a PPA and an AUR package.

    N64:
    Angrylion Plus with Project64 in internal LLE mode mostly runs at half speed or lower.
    This will be a long explanation about this laptop’s hardware and drivers. In short, you can play many N64 games with pretty great accuracy without Angrylion. However, the Windows side is a mess. I’ve tested many video plugins, and Windows 10 updates seem to make things a bit slower. The Rice plugins are all over the place, and many of them have problems. GLN64 is not as good. Jabo’s D3D8 1.6.1 is the fastest you’ll get. Glide64 and GlideN64 are bottlenecked by AMD’s OpenGL drivers, so they’re slower. Glide64’s performance is mediocre. I tried nGlide, and while it helps a bit, it still doesn’t solve the lag in some games, mainly the Quake 2 demos, which I use to check whether the lag is present. Jabo’s is the fastest and only has minor lag because of Windows 10 updates. GlideN64 is really slow, even with framebuffer emulation off at 240p. It’s a driver issue, and overclocking the CPU didn’t help much; the Quake 2 demo still lagged by a few frames per second. I would’ve tested Windows 7 since the laptop was made for it, but I haven’t had it since 2016. Mupen64Plus is slightly slower, since all its plugins use OpenGL.
    Let’s jump into Linux. This is unbelievable! With the Mesa drivers, I downloaded Mupen64Plus with GlideN64 4.0 and tested the Quake 2 demos, and by default it’s much faster than almost every plugin I tried on Windows. I overclocked the CPU and turned off Depth Buffer to RDRAM with no noticeable regression, and it went from minor lag to none! I bumped up the resolution and still no lag. I do, however, set framebuffer mode to VI origin to reduce GPU usage at high resolutions. GlideN64 is really fast on Linux on this laptop, and even 3-point filtering finally works. I recommend standalone Mupen64Plus on Linux since it’s faster; in RetroArch’s Mupen64Plus-Next, I still get minor lag with the same settings. For the easiest way to get Mupen64Plus with GlideN64 bundled, search for M64p.

    Dreamcast: Redream is the fastest emulator for this CPU and works fine at stock clocks. Reicast’s fork, Flycast, is more compatible but more demanding; even with the CPU overclocked and a few accuracy settings turned off, it is a bit below fullspeed. With my drivers, I do get a sprite glitch in Marvel vs. Capcom on Redream. This was tested on Linux; on Windows, performance may be worse due to AMD’s dated, poor OpenGL drivers.

    GBA:
    mGBA:              Mermaid Melody PPPP Menu
                           141.0
    VBA-Next:          Mermaid Melody PPPP Menu
                           126.0
    VBA-M:             Mermaid Melody PPPP Menu
                           127.0
    GBA plays very well. mGBA is newer and more accurate than the VBA emulators. VBA-M is generally the slowest, and VBA-Next is sometimes close to mGBA’s speed and sometimes close to VBA-M’s. Even with the BIOS enabled and idle loop removal disabled, as in this test, mGBA performs best.

    NDS:
    Desmume 0.9.11+:   Pokemon Black2/White2 Title Screen (No Frameskip / Frameskip 9)
                           33.0 (40.0) / 60.0+ (80.0+)
    MelonDS 0.83:      Pokemon Black2/White2 Title Screen (OpenGL 1x / Jit Recompiler)
                           20.0 (29.0) / 00.0 (42.0)
    I’m testing two emulators here. On the Desmume Linux build, I enable the JIT with a command-line option for full performance. The Windows build has an OpenGL renderer, but software rendering is the fastest, so that’s what I use in Desmume. I’m testing a demanding area in Pokemon B2W2. Without frameskip, speed drops close to half of full speed, and overclocking adds a bit more performance. With frameskip at maximum, I get full speed, although I suggest a lower frameskip, like one or two, since many games don’t need that much. Desmume performs fine in other games with less demanding scenes, and an overclocked CPU helps because it lets you lower frameskip by one.
    I also tested MelonDS myself, since it has an OpenGL renderer. At stock clocks, it runs below half speed; overclocked, it reaches about half speed, so the interpreter CPU core is the bottleneck. With the beta JIT recompiler on default settings (a pre-0.9 build), I do see an improvement: it slightly beats Desmume without frameskip, and some games run near or at full speed. I haven’t done much testing at high internal resolutions or with other games.
    Your last option for better performance in games from the first two-thirds of the DS’s life cycle is No$GBA. It is fast and offers both the Nocash and OpenGL renderers, although under Wine the OpenGL renderer crashes. The Nocash renderer is faster anyway. No$GBA is the fastest option but also by far the least accurate: a couple of games have noisy audio, and it has problems running the Gen 5 Pokemon games.
    (Note!) I’ve heard that DraStic DS will go open source once its AArch64 (ARMv8) dynamic recompiler is implemented. It is faster than Desmume, fast enough to run at full speed even inside an Android emulator. It may not be easy to set up, since it’s payware and needs an Android emulator, but it does perform well. It has slight input lag, but that’s acceptable for DraStic running under emulation. I haven’t tested it myself yet. The developer is working on x86-64 and x86 builds, which will be released once the emulator goes free.

    GameCube/Wii:
    Dolphin x64:       Soulcalibur II
                           36.0 (45.0)
    Soulcalibur II runs fine, though you can hit a little slowdown in some parts. During the big effects in fights, I get about ¾ of full speed with the overclock. Some lighter games may play fine, at least with overclocking, but you won’t be playing any heavier titles. Make sure you run at 1x internal resolution with asynchronous shader compilation rather than ubershaders. You can also experiment with the virtual overclock option (Dolphin’s emulated CPU clock override) and set it to half or a quarter for some games.
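    The speed figures in these results are just measured fps divided by the game’s target frame rate; for Soulcalibur II, the overclocked 45 fps over a 60 fps target is the ¾ mentioned above. A minimal sketch of that arithmetic, assuming a 60 fps target by default (the function name is hypothetical; pass about 59.73 for GBA, or 30 for games that target 30 fps):

```python
# Convert a measured frame rate into "percent of full speed".
# Assumption: the game targets 60 fps unless another target is passed in.

def percent_of_full_speed(measured_fps: float, target_fps: float = 60.0) -> float:
    """Return emulation speed as a percentage of the game's target frame rate."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 100.0 * measured_fps / target_fps


if __name__ == "__main__":
    # Soulcalibur II in Dolphin: stock and overclocked results from the table.
    for label, fps in (("stock", 36.0), ("overclocked", 45.0)):
        print(f"{label}: {percent_of_full_speed(fps):.0f}% of full speed")
```

    The same division explains the NDS numbers: 33 fps in Desmume without frameskip is a bit over half speed.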

    PS2: PCSX2 does run, but even with overclocking, a lot of games run too slowly, so I don’t recommend this laptop for PS2 emulation. If you try anyway, stick with the DX11 renderer on Windows or OpenGL on Linux. Setting the speedhacks to Very Aggressive may be appropriate for certain lighter titles that can run decently or at nearly full speed.

    PSP:
    PPSSPP:            God of War
                           37.0 (48.6)
    PSP games run completely fine; the only demanding one is God of War. You can hit slowdown in certain parts of that game, which you can fix by setting the CPU clock option to 222MHz for God of War only. The game isn’t slow most of the time; it just slows down occasionally and runs at full speed otherwise. Since God of War only slows down with many enemies on screen, at the speeds shown above, you shouldn’t run into slowdown in the rest of the PSP library.

    3DS: Even with overclocking, I generally couldn’t get the Pokemon games to run at full speed when it mattered. They drop below full speed in battles, run somewhat below it in the overworld, and fall a bit under half speed in double battles and Battle Royals. A lot of 3DS games generally run slowly and barely reach full speed, even with the CPU overclocked. Citra just isn’t fast enough on this system.

    Dosbox: With any of the Dosbox builds described on the previous page, the dynamic recompiler runs fine. It commonly reaches around 24000 cycles, close to late-486 performance. With the overclock, it goes up to around 36000 cycles, equivalent to a 486DX4-100MHz, and some 486-to-Pentium era games can use even more cycles without slowing down the emulator. With the interpreter, it runs around 12000 cycles, equivalent to a 486DX-33MHz; with the overclock, it reaches around 18000, equivalent to a 486DX2-50MHz. I recommend Dosbox ECE, or another patched Dosbox build, compiled as 32-bit, since the 32-bit dynamic recompiler is robust.
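    If you want to pin Dosbox to the speeds measured above instead of leaving cycles on auto, the [cpu] section of dosbox.conf is where that goes. The sketch below writes that section with Python’s configparser. The cycle values are this laptop’s measurements, the helper name is hypothetical, and it assumes your build (ECE, Dosbox-X) accepts the standard core and cycles keys, so check your fork’s documentation.

```python
import configparser
import io

# Cycle counts measured on this laptop; keys are (core, overclocked?).
# These are one machine's results, not Dosbox defaults.
MEASURED_CYCLES = {
    ("dynamic", False): 24000,  # roughly late-486 performance
    ("dynamic", True): 36000,   # about a 486DX4-100
    ("normal", False): 12000,   # interpreter, about a 486DX-33
    ("normal", True): 18000,    # interpreter, about a 486DX2-50
}


def dosbox_cpu_section(core: str, overclocked: bool) -> str:
    """Render a [cpu] section for dosbox.conf with a fixed cycle count."""
    cycles = MEASURED_CYCLES[(core, overclocked)]
    config = configparser.ConfigParser()
    config["cpu"] = {"core": core, "cycles": f"fixed {cycles}"}
    buffer = io.StringIO()
    # The stock dosbox.conf writes key=value without spaces around '='.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(dosbox_cpu_section("dynamic", overclocked=False))
```

    Paste the printed section over the [cpu] section of your config, and lower the cycle count if a game starts slowing the emulator down.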

    PCEM: It can run any 386 processor, and any 486SX runs pretty well. The 486DX needs more explanation. A 486DX-25MHz runs fine at stock clocks with the interpreter, which seems to keep a more consistent speed than the dynamic recompiler. With the overclock, a DX-33MHz runs pretty well on the interpreter. The dynamic recompiler is the way to get good performance from faster emulated CPUs, but in places like Windows 95, idling or loading things can drop performance a bit below what you’d expect. It can go beyond what the interpreter handles, but on this laptop the dynamic recompiler is better used in DOS. At stock, it can go up to a 486DX-40, and with the overclock, up to a 486DX2-50. I set the Sound Blaster to DBOPL to free up a little more CPU time. The laptop can’t go any higher, so Pentium CPUs are out of reach. I haven’t tested 3DFX Voodoo emulation, but if you try it, I recommend 2 render threads, since the host CPU has four cores.

    Recommended Emulators:
    NES: Nestopia UE
    SMS/GG/Genesis: Genesis Plus GX
    SNES: Snes9x
    PSX: PCSX-R PGXP, PCSX-Rearmed
    N64: Mupen64Plus (GlideN64, Linux), Project64 (Jabo’s, Windows)
    Saturn: Yaba Sanshiro
    Dreamcast: ReDream
    GC/Wii: Dolphin
    GB/GBC: SameBoy
    GBA: mGBA
    NDS: Desmume 0.9.11+
    PSP: PPSSPP
    PCEM: 486DX 25MHz/40MHz
    DOS: Dosbox ECE

    The emulators listed are the ones I consider usable. If a system or emulator isn’t listed, it’s either too slow to be playable, not yet in a playable state, or so light that it runs at full speed anyway (Stella, for the Atari 2600). The listed emulators are recommended for general use, with stock settings on most of them. Lighter games will also run faster, leaving room for extra settings like Runahead.

    If you know which games are the most demanding on GBA, Saturn, Dreamcast, or DOS, let me know in the comments.

    Using AMD cards on OpenGL Emulators:
    On Windows, you can only use AMD’s official drivers. They run DirectX software pretty well, but a lot of OpenGL programs run slower and are sometimes broken. The OpenGL drivers are not well optimized, and since Terascale GPUs haven’t been supported for at least four years as of this writing, you may not be able to use newer OpenGL emulators or updates, even though the hardware feels like it should be more capable than it performs. Even worse, AMD’s driver support for the first few generations of APUs was short-lived, and Windows 10 can make things a bit slower than its first release or Windows 7. And again, Terascale GPUs will never get Vulkan support on any driver.

    Using Linux with Mesa Drivers, r600g:
    I tested OpenGL emulators on a few distros with Mesa drivers, and they perform almost as well as Nvidia’s OpenGL drivers.
    With GlideN64, all of the Quake 2 slowdowns are gone; I don’t have that problem on Linux. The Mesa drivers are much more reliable: even with the few errors I explained above, they are very stable and efficient. Trust me, it’s far better than Windows.

    Now that we’ve covered CPU performance for emulators, we’ll test the GPU performance of the Radeon HD 6520g on the next page.

    Next Page on GPU emulation performance.

    Previous Page on software and emulators use.

    • #emulation
    • #laptop
    • #apu
  • Legacy AMD APU Llano Laptop for Emulation tests - Part 2

    Software tools and Emulators used.

    I’ve covered my laptop and its history. I switched from Windows 10 to Manjaro XFCE 19 and installed the software needed for good performance, plus emulators that suit the laptop’s hardware. I find the AUR easy again because I can find plugins and the standalone emulators I need for testing. AUR builds work almost perfectly for me aside from some build time; as long as they install the program or extension, I’m good.

    To overclock the CPU, I installed TPC (TurionPowerControl). Since it’s a laptop, the BIOS has no option for a permanent overclock. However, the CPU ships with generous stock voltages, which leaves plenty of room for overclocking with only slightly higher power draw. At stock, it runs at a 1.5GHz base clock at 1.1625V and a 2.4GHz boost at 1.415V. It can easily be undervolted to cool the laptop a bit; the temperature limit is 85C, and the CPU throttles around that point. My undervolt runs 1.5GHz at 1.0625V and the 2.4GHz boost at 1.200V, which lowers temperatures by about seven percent and adds a little battery life. My overclock reaches 2.3GHz at 1.175V and 2.8GHz at 1.400V, only a small voltage change for each, and it still runs stable. Note that silicon quality varies between CPUs, so yours may need lower or higher voltages to overclock or undervolt. On average, the overclock gives my laptop’s CPU about 33% more performance. Llano APUs are one of the exceptions where you can overclock a laptop without much worry.
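    To put the “slightly higher power draw” and the cooler undervolt into rough numbers, the usual first-order model for CMOS switching power, P ∝ C·V²·f, can be applied to the voltage and clock pairs above. This is an approximation I’m adding, not something TPC reports: it ignores leakage, and on an APU the GPU shares the same package and thermal budget, so treat the ratios as ballpark figures. The function and profile names are hypothetical.

```python
# First-order CMOS dynamic-power model: P_dyn is proportional to C * V^2 * f.
# C cancels when comparing two states of the same chip, so only voltage and
# frequency matter here. Leakage power is ignored.

def relative_dynamic_power(v_new: float, f_new: float,
                           v_ref: float, f_ref: float) -> float:
    """Dynamic power of (v_new, f_new) relative to (v_ref, f_ref)."""
    return (v_new / v_ref) ** 2 * (f_new / f_ref)


# (frequency in GHz, voltage in V) pairs from the TPC settings above.
STOCK = {"base": (1.5, 1.1625), "boost": (2.4, 1.415)}
UNDERVOLT = {"base": (1.5, 1.0625), "boost": (2.4, 1.200)}
OVERCLOCK = {"base": (2.3, 1.175), "boost": (2.8, 1.400)}

if __name__ == "__main__":
    for name, profile in (("undervolt", UNDERVOLT), ("overclock", OVERCLOCK)):
        for state, (f, v) in profile.items():
            f0, v0 = STOCK[state]
            ratio = relative_dynamic_power(v, f, v0, f0)
            print(f"{name:9} {state:5}: {ratio:.2f}x stock dynamic power")
```

    With these values, the overclocked boost state comes out at roughly 1.14x stock dynamic power, because the slightly lower voltage offsets part of the 2.8GHz clock, while the undervolted boost state lands near 0.72x.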

    I installed GameMode for Linux, which I explained on the last page. It’s useful for switching both the CPU and GPU to performance mode, so they run at their highest clock speeds as much as possible. If you can’t use GameMode for whatever reason, Radeon-Profile is a Linux app that can force the GPU’s power mode while a program runs. On Windows, you can force performance mode in the Catalyst driver and set Windows to its high performance power plan. As of this writing, I’m using the Mesa r600g driver, version 19.3, on Linux.
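    If you want to confirm that GameMode or Radeon-Profile actually switched the power modes, the current state is visible in sysfs. The sketch below reads each core’s CPU frequency governor and, for the radeon kernel driver that r600g runs on top of, the power_dpm_state and power_dpm_force_performance_level files. It assumes the APU’s GPU is card0 and that radeon DPM is active; paths can differ on other setups, whether GameMode touches the GPU at all depends on its configuration, and the helper names are hypothetical.

```python
from pathlib import Path


def _read(path: Path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_power_state(sysfs: Path = Path("/sys"), card: str = "card0") -> dict:
    """Collect CPU governors and radeon DPM settings from sysfs."""
    cpu_root = sysfs / "devices/system/cpu"
    governor_files = cpu_root.glob("cpu[0-9]*/cpufreq/scaling_governor")
    governors = sorted({g for g in (_read(p) for p in governor_files) if g})
    gpu = sysfs / "class/drm" / card / "device"
    return {
        "cpu_governors": governors,
        "gpu_dpm_state": _read(gpu / "power_dpm_state"),
        "gpu_dpm_level": _read(gpu / "power_dpm_force_performance_level"),
    }


if __name__ == "__main__":
    for key, value in read_power_state().items():
        print(f"{key}: {value}")
```

    Run it once normally and once through gamemoderun to compare; with GameMode active, you would expect the performance governor to show up, and a high DPM level if GameMode’s GPU options are enabled.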

    Now let’s get to the list of emulators I’ll be using for each system. Some of them run in Retroarch.

    NES: I’m using the Nestopia and Mesen cores in Retroarch. Both are quite accurate, with Mesen being the most accurate. Nestopia is the fastest option, but Mesen runs pretty smoothly too with default settings. Nestopia has much more headroom for Runahead and NES CPU overclocking. I use FirebrandX’s digital palettes.

    SNES: I use the Snes9x mainline core in Retroarch, and the new standalone Bsnes v110 for testing. Snes9x runs very well and can use both features mentioned above. Bsnes is mostly for testing; I’ll explain more about it later, but for now I can say it runs a lot of games at full speed. In my opinion, there’s no need for Libretro’s many old Bsnes cores, and Snes9x mainline is suitable in general. Bsnes AUR builds are available.

    N64: On Windows, I used Project64 with Jabo’s D3D8 1.6.1 plugin, but after switching to Linux, Mupen64Plus with GlideN64 on the Mesa drivers is the better option. Since February 2020, m64p is free once again, so you don’t have to build each plugin yourself to get Mupen64Plus with a good GUI. Just go to this website: https://github.com/loganmc10/m64p/releases . If it stops being free in the future, you would have to build the plugins from AUR files and find a GUI yourself. There is Mupen64Plus-Next for Retroarch, but a standalone build is the fastest and most reliable option.

    Gamecube/Wii: I use Dolphin beta builds for testing. I’ll explain more about that later on.

    Sega SMS/Genesis/Game Gear/32X: I use Genesis Plus GX in Retroarch, and it runs pretty great. For the 32X, I could use Kega Fusion on Windows or the PicoDrive core, but I don’t have a 32X game to test.

    Sega Saturn: I use Yaba Sanshiro, and it’s the fastest emulator you can get for the laptop. Yaba Sanshiro has some great options.

    Sega Dreamcast: I installed Redream. It is the fastest Dreamcast emulator available and more accurate than NullDC. I also use Flycast, a fork of Reicast, for testing: https://flyinghead.github.io/flycast-builds/ (the Ubuntu build is just a generic Linux build). I recommend the standalone Flycast builds over the Libretro core, since standalone hardware-rendering emulators often run faster than they do in Retroarch.

    Playstation 1: I use both PCSX-R PGXP and the PCSX-Rearmed core. PCSX-R PGXP has a great perspective-correction option with much less polygon jitter, while being faster than Libretro’s Beetle PSX core and standalone Mednafen. I’ll also go over those four emulators, plus the Windows PCSX-R with Pete’s OpenGL2 Tweak plugin.

    Playstation 2: I use PCSX2.

    GB/GBC: SameBoy, which is more accurate than Gambatte and VBA.

    GBA: mGBA, the fastest and most accurate GBA emulator I’ve used.

    NDS: Despite the development drama, I use standalone Desmume mainline. It’s pretty good for this laptop since it has a frameskip option. MelonDS with JIT will also be included.

    3DS: Citra Canary is the best option.

    PSP: PPSSPP.

    Dos: Dosbox ECE or Dosbox-X are the best options. Installing Dosbox ECE is difficult, since you need to build it yourself, and I wanted a 32-bit build because the dynamic core is pretty robust in 32-bit. On Linux, though, 32-bit versions of mainline Dosbox and its forks are hard to find, and the 64-bit dynamic recompiler is slower or buggy. In the end, I use Dosbox-X, which performs the same as standard Dosbox with the normal core. For the dynamic core, you need a 32-bit version to get the same performance on either Windows or Linux.

    Win9x: PCEM v15, using 486 CPUs.

    Wine: Used on Linux to run Windows programs. You can use stable or staging builds, and Lutris and Proton work too. However, since my laptop’s APU lacks Vulkan support, which DXVK requires, DX11 games are much harder to run and barely work. The APU isn’t really strong enough for many DX11 games anyway. DX9 works through Wine’s default OpenGL wrapper, but since we’re using an AMD GPU with a Mesa Gallium driver, we can run the native DX9 API in Wine with Gallium Nine Standalone. Just install the dependencies that provide Mesa’s D3D9 files, then enable native DX9 in the Gallium Nine config. DX10 is the same story as DX11. DX8 and lower work pretty well for the most part. Note that Wine is not an emulator, but a compatibility layer.

    That’s the software I’ll use for performance testing on the next page.

    Next Page on CPU emulation tests.

    Previous Page on the laptop overview.

    • #emulation
    • #laptop
    • #apu
© 2016–2021 Pokefan531's Posts