Rikomagic MK 802 III Review

In the past few months, numerous simple computers have appeared, usually built around a single board or a SoC and running Android or Linux. Compared to regular PCs the performance is rather poor, but so is the power consumption (usually just a few watts). The best-known example is probably the Raspberry Pi, but there are many alternatives and the Rikomagic MK802 III is one of them.

I've repeatedly needed a simple "something" that could be connected to a TV and would let me do things like Skype or e-mail. And as the MK802 III has a dual-core CPU (unlike the previous MK802 models), it seemed like a good choice. So let's see what my experiences are after a few days of using it ...

rikomagic mk802 android

Sysbench, memory bandwidth and gcc optimizations

If you're testing basic hardware performance, it's quite probable you're using sysbench, a widely used benchmarking toolkit for testing CPU, I/O and RAM. I've been using it quite intensively recently, and it seems that when it's compiled with gcc optimizations (which is the default in most distributions), some of the tests report meaningless values, because gcc optimizes out important parts of the code.

Memory operations type: read
Memory scope type: global
Threads started!
Done.

Operations performed: 10240 (4236544.83 ops/sec)

102400.00 MB transferred (42365448.33 MB/sec)

Well, I wouldn't object to hardware with several TB/s of memory bandwidth, but I wouldn't expect such numbers when testing current x86 hardware ...
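To illustrate the effect (a minimal standalone sketch, not sysbench's actual code - the buffer size and timing are made up), consider a "read bandwidth" loop whose result is never used. With -O2, gcc is free to eliminate the loop completely, so the measured time (and the derived MB/sec) becomes meaningless:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NELEM (64 * 1024 * 1024 / sizeof(long))

int main(void)
{
    long   *buf = calloc(NELEM, sizeof(long));
    long    sum = 0;
    size_t  i;
    struct timespec t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* "read bandwidth" loop - sum is never used afterwards, so at -O2
       the compiler may throw the whole loop away */
    for (i = 0; i < NELEM; i++)
        sum += buf[i];

    clock_gettime(CLOCK_MONOTONIC, &t2);

    printf("%.6f s\n", (t2.tv_sec - t1.tv_sec) + (t2.tv_nsec - t1.tv_nsec) / 1e9);

    /* using the result (e.g. printing sum) forces the compiler to keep
       the reads, and the timing becomes realistic again */

    free(buf);
    return 0;
}

Compiled with plain gcc (add -lrt on older glibc) the loop takes a sensible amount of time; with -O2 the reported time drops close to zero, which is exactly the kind of "several TB/s" nonsense shown above.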

This tool is often recommended, so I wonder how many people have used these crazy results as a basis for important decisions in the past. Sadly, I'm one of them. Let this be a proof that writing good benchmarking tools is quite tricky, and a reminder of how important it is not to take the numbers for granted but to verify them.

sysbench ram bandwidth gcc optimizations postgresql

sysbench failed to read file

I've been doing a lot of HW testing recently, and when testing I/O using sysbench I've encountered strange test failures with this message

FATAL: Failed to read file! file: 73 pos: 705503232 errno = 0 ()

Obviously the values (position and file number) may change depending on the parameters you've run the test with. I've found many questions about this issue on the web but no solution. However, I think I've figured it out - it's caused by running multiple tests on the same set of files, when some of the tests delete the files and the subsequent tests then fail ...
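One plausible reading of the confusing "errno = 0 ()" part (this is just a sketch of the system-call behaviour, not sysbench's code, and the file name and offset are merely examples): a read past the end of a file - e.g. one that a previous test deleted and recreated smaller - returns 0 without setting errno, so the benchmark sees a failed read but no error code:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char    buf[16384];
    ssize_t r;

    /* example file name - sysbench's fileio test uses files like test_file.0 */
    int fd = open("test_file.0", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* read at an offset beyond the end of the (now smaller) file */
    r = pread(fd, buf, sizeof(buf), 705503232);

    /* pread() returns 0 at EOF and does not touch errno, hence "errno = 0 ()" */
    printf("read %zd bytes, errno = %d (%s)\n", r, errno, strerror(errno));

    close(fd);
    return 0;
}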

sysbench failed file read fatal

Seagate BlackArmor NAS 220

A few days ago I finally took the plunge and bought a small NAS, mainly to improve the backup strategy at home (particularly to back up the data to a separate device). I decided to buy the 2TB model of the Seagate BlackArmor NAS 220, basically a tiny computer with two 1TB 3.5" drives (in my case ST31000528AS). So far I'm quite happy with it, so let me briefly summarize my experiences with this device.

seagate blackarmor nas 220 review

Invitation to pgconf.eu 2012

As Magnus already announced a few hours ago, this year's pgconf.eu conference is going to be held in Prague on October 23-26. That's pretty exciting, and while there's a lot of work to make it happen, I believe it's going to be a great conference. We had a local PUG meeting last week and, while drinking beer, we prepared this short unofficial "strip" invitation

It's probably the right time to start thinking about the topic of your talk ...

postgresql pgconf 2012 invitation

Fulltext with dictionaries in shared memory

If you're using the fulltext search built into PostgreSQL and you can't use a solution based on snowball due to the nature of the language (it works great for English, but there's nothing similar with reasonable accuracy for Czech and probably never will be), you're more or less forced to use ispell-based dictionaries. In that case you've probably noticed two annoying features.

For each connection (backend), the dictionaries are loaded into private memory, i.e. each connection spends a lot of CPU time parsing the dictionary files on the first query, and it needs a lot of memory to store the same information. If the parsed dictionary needs 25MB (and that's not exceptional) and you have 20 concurrent connections using the fulltext, you've suddenly lost 500 MB of RAM. That may seem like a negligible amount of RAM nowadays, but there are environments where this is significant (e.g. VPS or some cloud instances).

There are workarounds - e.g. using a connection pool with already initialized connections (you skip the initialization time but you're still wasting memory) or keeping a small number of persistent connections just for fulltext queries (but that's inconvenient to work with). There are probably other solutions, but none of them is perfect ...

Recent issues with the fulltext forced me to write an extension that allows storing the dictionaries in shared memory. Even this solution is not perfect (more on that later), but it's definitely a step in the right direction. So how does it work and what does it do?
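The extension itself is not shown here, but the basic mechanism PostgreSQL offers for this is the standard add-in shared memory hooks. A rough sketch of how an extension can reserve a segment for the parsed dictionary (the SharedIspellDict struct, its contents and the 32MB size are made up; only the hook/shmem calls are the real API):

#include "postgres.h"
#include "fmgr.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

/* hypothetical structure holding the parsed dictionary */
typedef struct SharedIspellDict
{
    int     nwords;     /* number of parsed words */
    /* ... parsed dictionary data would follow here ... */
} SharedIspellDict;

#define DICT_SEGMENT_SIZE (32 * 1024 * 1024)    /* 32MB, an arbitrary guess */

static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
static SharedIspellDict *shared_dict = NULL;

static void dict_shmem_startup(void);

void _PG_init(void);

void
_PG_init(void)
{
    /* reserve the space - this only works from shared_preload_libraries */
    RequestAddinShmemSpace(DICT_SEGMENT_SIZE);

    prev_shmem_startup_hook = shmem_startup_hook;
    shmem_startup_hook = dict_shmem_startup;
}

static void
dict_shmem_startup(void)
{
    bool    found;

    if (prev_shmem_startup_hook)
        prev_shmem_startup_hook();

    LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);

    /* attach to (or create) the segment - the first backend would parse the
       dictionary files and store the result here, all the others just use it */
    shared_dict = ShmemInitStruct("shared ispell dictionary",
                                  DICT_SEGMENT_SIZE, &found);
    if (!found)
        shared_dict->nwords = 0;    /* the parsing itself is omitted */

    LWLockRelease(AddinShmemInitLock);
}

Note that this mechanism requires the library to be loaded via shared_preload_libraries, and the size of the segment is fixed at server start.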

postgresql fulltext ispell shared memory dictionary

Connection limits - a proof of concept

From time to time someone on the mailing list asks if there's a way to limit the number of connections for a given IP address, database or user. No, currently there's nothing like that built into the core, although it'd be a pretty useful feature, especially on clusters shared by multiple clients (applications, customers, ...). Sometimes such quotas can be reasonably implemented using a connection pooler like pgbouncer or things like netfilter connlimit, but both solutions have disadvantages.

But hey! We do have little thingies called "extensions", "hooks" and "shared libraries." Last week I hacked up a simple "proof of concept" extension that uses the "client auth hook" and allows you to limit the number of connections by various criteria. It's not perfect (more on that later), but it works quite well. It's available at github and once I fix the issues I'm aware of, I'll publish it on pgxn.
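Just to illustrate the mechanism (a simplified sketch, not the actual extension - the per-database limit and the count_connections_for() helper are made up, only the hook itself is the real PostgreSQL API), the client auth hook is installed from _PG_init() and can reject a connection with ereport(FATAL, ...):

#include "postgres.h"
#include "fmgr.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"

PG_MODULE_MAGIC;

/* hypothetical fixed limit - a real extension would make this configurable */
#define MAX_CONNECTIONS_PER_DB 10

static ClientAuthentication_hook_type prev_client_auth_hook = NULL;

static void limit_client_auth(Port *port, int status);
static int  count_connections_for(const char *dbname);

void _PG_init(void);

void
_PG_init(void)
{
    /* install the hook (the library has to be in shared_preload_libraries) */
    prev_client_auth_hook = ClientAuthentication_hook;
    ClientAuthentication_hook = limit_client_auth;
}

static void
limit_client_auth(Port *port, int status)
{
    if (prev_client_auth_hook)
        prev_client_auth_hook(port, status);

    /* only check connections that actually passed authentication */
    if (status != STATUS_OK)
        return;

    if (count_connections_for(port->database_name) >= MAX_CONNECTIONS_PER_DB)
        ereport(FATAL,
                (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                 errmsg("too many connections for database \"%s\"",
                        port->database_name)));
}

/* hypothetical helper - a real implementation would keep per-database
   counters in shared memory; this stub just allows everything */
static int
count_connections_for(const char *dbname)
{
    return 0;
}

Rejecting with FATAL terminates only that one backend, which is why the hook is a convenient place for this kind of check.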

PS: Thanks to Magnus for pointing out that the existing "client auth hook" might be useful to implement this, and to TL for pointing out what an unbelievably ugly kludge the original version of the extension was.

postgresql connection limit quota extension user database ip address

Calculating quantiles and trimmed aggregates

If you've ever needed to compute a median or other quantiles, you probably already know there are no built-in aggregates for that in PostgreSQL. There are several solutions available, with various levels of flexibility and performance - let me name at least Joe Conway's solution based on PL/R and depesz's PL/pgSQL solution. I personally wasn't happy with any of those, but as I'm really lazy, I hadn't written anything of my own ... until last week. The extension (already available on pgxn) is based on a disgusting trick (more on that later), but it seems to work fine, especially when it comes to performance.

When I wrote that, I realized I could use the same basic idea to implement trimmed averages - I'm not sure that's the right term in English, but it means something like "remove 3% of the lowest and 2% of the highest values and compute the AVG/VARIANCE of the remaining data." Basically it's a neat way to get rid of outliers. This is available on pgxn too.
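To make the idea concrete, here is the arithmetic in plain C (just a sketch of the concept, not the extension's code - the 10%/10% trim and the sample data are arbitrary): sort the collected values, drop the requested fractions at both ends and average the rest.

#include <stdio.h>
#include <stdlib.h>

static int
cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;
    return (x > y) - (x < y);
}

/* trimmed average: drop low_frac of the lowest and high_frac of the highest
   values, then average the rest (the input array gets sorted in place) */
static double
trimmed_avg(double *values, int n, double low_frac, double high_frac)
{
    int     from = (int) (n * low_frac);        /* first index to keep */
    int     to = n - (int) (n * high_frac);     /* one past the last index */
    int     i;
    double  sum = 0;

    qsort(values, n, sizeof(double), cmp_double);

    for (i = from; i < to; i++)
        sum += values[i];

    return sum / (to - from);
}

int main(void)
{
    /* 1000 is an obvious outlier */
    double data[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 1000};

    /* drop 10% of the lowest and 10% of the highest values */
    printf("trimmed avg = %f\n", trimmed_avg(data, 10, 0.10, 0.10));
    return 0;
}

The plain average of that sample is 104.5; the trimmed version returns 5.5, i.e. the outlier no longer dominates the result.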

So let's see how that works, what the performance looks like, etc.

postgresql aggregate median quantile average variance stddev

Collecting query histogram

If you're using PostgreSQL and need to track performance anomalies, you probably know the log_min_duration_statement option and maybe even the auto_explain contrib module, which allow you to react to queries that are slower than a defined limit. That's a great thing, but there's a catch - how do you set the threshold, especially when there are queries with very different execution times?

If you set it too low, you'll get huge log files with a lot of queries. If you set it too high, you'll probably miss changes in the very short queries. What if you set the limit to 200 ms and queries that used to run in 50 ms suddenly slow down to 150 ms? The performance has effectively dropped to a third, yet nothing gets logged ...
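That's where a histogram of query durations helps - instead of a single threshold, you count queries per duration bucket and watch how the distribution shifts. A toy sketch of the bucketing (not the extension's actual code; the bin width and count are arbitrary):

#include <stdio.h>

#define BIN_WIDTH_MS 10     /* width of one histogram bin in milliseconds */
#define BIN_COUNT    100    /* everything above 1000 ms ends up in the last bin */

static long histogram[BIN_COUNT];

/* record one query duration (in milliseconds) in the histogram */
static void
histogram_add(double duration_ms)
{
    int bin = (int) (duration_ms / BIN_WIDTH_MS);

    if (bin >= BIN_COUNT)
        bin = BIN_COUNT - 1;

    histogram[bin]++;
}

int main(void)
{
    int i;

    /* a 50 ms -> 150 ms regression moves queries from bin 5 to bin 15,
       which is clearly visible even though no single query exceeds 200 ms */
    histogram_add(50.0);
    histogram_add(150.0);

    for (i = 0; i < BIN_COUNT; i++)
        if (histogram[i] > 0)
            printf("%4d-%4d ms: %ld\n",
                   i * BIN_WIDTH_MS, (i + 1) * BIN_WIDTH_MS, histogram[i]);

    return 0;
}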

postgresql histogram queries monitoring extension scalability

SSD benchmark results - TPC-H

If you compare the SSD with a traditional drive in a TPC-H benchmark, you'll notice that the performance gain is even smaller than in the case of the read-write pgbench workload. For example when comparing m-time (i.e. time spent evaluating queries with identical query plans; a more detailed definition is available in the article about the TPC-H benchmark with a traditional drive), an SSD with an XFS file system performs like this

XFS performance (m-time) with an SSD drive

while the traditional drive performs like this

XFS performance (m-time) with a traditional drive

i.e. the SSD drive is about 1.5x faster than the traditional drive (which is significantly worse than with the read-only/read-write pgbench, where the SSD was about 25x/13x faster). But even for an SSD drive it holds that the best performance is achieved with large database blocks and that the size of the file system blocks does not matter.

postgresql tpch ssd benchmark ext2 ext3 ext4 xfs jfs reiserfs nilfs2 btrfs