No, this is not a real Sinatra error :).
This is the personal homepage of Elad Meidar, a web developer and entrepreneur specializing in Ruby on Rails. I hang around in Israel, and I am currently having the best time of my life over at Fiverr.com.
I am a proud member of RailsBridge, helping new Rails developers get into our world, and I have also contributed a few patches to the Ruby on Rails core.
438 million, 218 thousand and 363 rows.
The current count of indexes on the table, on the other hand, is 0.
I imagine you're all wondering how long it takes to perform a `select (*)` on it; well, I stopped waiting after about 4 minutes.
This peculiar situation exists in one of our clients' projects: the table fills up from a daemon that listens to some kind of stream, currently at a rate of around 4 million rows per day. All we are storing is a simple integer and a foreign key (a “sample”).
Crazy, I know.
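For reference, here's a minimal sketch of what such a table might look like as a Rails migration. The column names (`value`, `user_id`) are my assumptions based on the description above, not the actual schema:

```ruby
# Hypothetical migration approximating the samples table described above.
# Column names are assumptions: the post only says "a simple integer and
# a foreign key".
class CreateSamples < ActiveRecord::Migration
  def self.up
    create_table :samples do |t|
      t.integer  :user_id    # the foreign key the queries filter on
      t.integer  :value      # the sampled integer
      t.datetime :created_at
    end
    # Note: no indexes at all, exactly as described in the post.
  end

  def self.down
    drop_table :samples
  end
end
```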
This table (the “samples” table) should allow the app to access any subset of the data, mostly based on a `WHERE user_id = xxx` clause, so I can't offload “old” rows away into oblivion (or an archive).
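In ActiveRecord terms the access pattern boils down to something like this (the `Sample` model name is assumed for the example):

```ruby
# Assumed model name; with no index on user_id, this is a full table scan
# across roughly 438 million rows.
Sample.where(:user_id => some_user_id)
# => SELECT * FROM samples WHERE user_id = xxx
```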
After a little research, I settled on the following options:
The amount of data is huge, so I was initially looking for information regarding data size limitations on the various NoSQL stores.
What I am planning to do is create some kind of sampling and keep only the most recent data in a NoSQL storage engine.
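Purely as an illustration of the "keep only the most recent data" idea, and not a committed implementation, here is what that could look like with Redis and a capped per-user list; the key layout and the retention count are made up for the example:

```ruby
require 'redis'

# Hypothetical: push each incoming sample onto a per-user list and cap it,
# so only the most recent N samples live in the fast NoSQL store.
RECENT_SAMPLES = 10_000

def store_sample(redis, user_id, value)
  key = "samples:#{user_id}"
  redis.lpush(key, value)
  redis.ltrim(key, 0, RECENT_SAMPLES - 1) # drop anything older than the last N
end

redis = Redis.new
store_sample(redis, 42, 17)
```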
Partitioning seems like a reasonable RDBMS-level solution, but on MySQL it's limited to about 1,000 partitions, and partitions are also not very dynamic (I can't set things up so that new partitions get created automatically as data streams in).
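For context, here is roughly what date-based RANGE partitioning looks like from a Rails migration with raw SQL. Every partition has to be declared up front, which is exactly the "not very dynamic" part; the table, column, and partition names here are assumptions for the sketch:

```ruby
# Hypothetical sketch of MySQL RANGE partitioning from a Rails migration.
# MySQL will not add new partitions as time moves forward, so a scheme like
# this needs manual (or scripted) maintenance. Also note that MySQL requires
# the partitioning column to be part of every unique key, so the default
# Rails `id` primary key would have to be adjusted first.
class PartitionSamples < ActiveRecord::Migration
  def self.up
    execute <<-SQL
      ALTER TABLE samples
      PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p201101 VALUES LESS THAN (TO_DAYS('2011-02-01')),
        PARTITION p201102 VALUES LESS THAN (TO_DAYS('2011-03-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
      )
    SQL
  end
end
```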
We decided to try the following flow:
We will create a cron task that runs every hour, processes all the samples from the last hour, averages them, and stores the result in a statistics table with the hourly average as the sample value.
Another task will do the same rollup from hours to days, and then from days to weeks, which will be our lowest resolution.
This method cuts our row count by tens of millions of rows in the places where we can afford a decrease in data resolution.
This process is still under development, so if anyone has a better idea and cares to enlighten us, please do.
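Here is a rough sketch of what the hourly rollup could look like; the `Statistic` model, its columns, and the exact windowing are assumptions for the example, and the real job will likely differ:

```ruby
# Hypothetical hourly rollup, meant to be run from cron (e.g. via a rake
# task or script/runner). Model and column names are assumptions.
class HourlyRollup
  def self.run(now = Time.now)
    window_end   = now.change(:min => 0, :sec => 0) # top of the current hour
    window_start = window_end - 1.hour

    # Average each user's samples from the previous hour...
    averages = Sample.where(:created_at => window_start...window_end).
                      group(:user_id).average(:value)

    # ...store one statistics row per user for that hour...
    averages.each do |user_id, avg|
      Statistic.create!(:user_id    => user_id,
                        :resolution => 'hour',
                        :period     => window_start,
                        :value      => avg)
    end

    # ...and drop the raw rows that were just summarized.
    Sample.where(:created_at => window_start...window_end).delete_all
  end
end
```

The day and week rollups would follow the same pattern, reading from the statistics table instead of the raw samples.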
You're seeing this error because I think it is funny.