Distribution of table in time
I have a MySQL table with approximately 3000 rows per user. One of the columns is a datetime field, which is mutable, so the rows aren't in chronological order.
I'd like to visualize the time distribution in a chart, so I need a number of individual datapoints. 20 datapoints would be enough.
I could do this:
select timefield from entries where uid = ? order by timefield;
and look at every 150th row.
Or I could do 20 separate queries and use
limit 1 and
But there must be a more efficient solution...
Michal Sznajder almost had it, but you can't use column aliases in a WHERE clause in SQL. So you have to wrap it as a derived table. I tried this and it returns 20 rows:
SELECT * FROM ( SELECT @rownum:[email protected]+1 AS rownum, e.* FROM (SELECT @rownum := 0) r, entries e) AS e2 WHERE uid = ? AND rownum % 150 = 0;
Read more... Read less...
As far as visualization, I know this is not the periodic sampling you are talking about, but I would look at all the rows for a user and choose an interval bucket, SUM within the buckets and show on a bar graph or similar. This would show a real "distribution", since many occurrences within a time frame may be significant.
SELECT DATEADD(day, DATEDIFF(day, 0, timefield), 0) AS bucket -- choose an appropriate granularity (days used here) ,COUNT(*) FROM entries WHERE uid = ? GROUP BY DATEADD(day, DATEDIFF(day, 0, timefield), 0) ORDER BY DATEADD(day, DATEDIFF(day, 0, timefield), 0)
Or if you don't like the way you have to repeat yourself - or if you are playing with different buckets and want to analyze across many users in 3-D (measure in Z against x, y uid, bucket):
SELECT uid ,bucket ,COUNT(*) AS measure FROM ( SELECT uid ,DATEADD(day, DATEDIFF(day, 0, timefield), 0) AS bucket FROM entries ) AS buckets GROUP BY uid ,bucket ORDER BY uid ,bucket
If I wanted to plot in 3-D, I would probably determine a way to order users according to some meaningful overall metric for the user.
For whatever reason, your example only works when the where @recnum uses a less than operator. I think when the where filters out a row, the rownum doesn't get incremented, and it can't match anything else.
If the original table has an auto incremented id column, and rows were inserted in chronological order, then this should work:
select timefield from entries where uid = ? and id % 150 = 0 order by timefield;
Of course that doesn't work if there is no correlation between the id and the timefield, unless you don't actually care about getting evenly spaced timefields, just 20 random ones.
select timefield from entries where rand() = .01 --will return 1% of rows adjust as needed.
Not a mysql expert so I'm not sure how rand() operates in this environment.