IRC log for #storm on 20090304

00:00.19*** join/#storm wallflower (n=wallflow@ip-205-246-113-216.pool.grokthis.net)
00:27.01shaunmhow does transaction management with sqlite work exactly?
00:27.32shaunmaccording to the sqlite documentation, sqlite works in "autocommit mode" unless the "BEGIN TRANSACTION" command is issued
00:28.00shaunmand COMMIT reverts it back to autocommit, so you need another BEGIN TRANSACTION
00:28.16shaunmdoes storm automatically issue BEGIN TRANSACTION commands that I'm not seeing?
00:50.39jkakarshaunm: Yes, it does.
00:51.11jkakarshaunm: You have store.commit() and store.rollback() to decide how the transaction ends.  It begins automatically.
00:51.48shaunmok
00:52.08shaunmjust surprised it doesn't show up with a debug tracer on
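
[Editor's sketch] The log does not say why the BEGIN is missing from the debug trace. One thing that can be checked independently is that a Python driver may open the transaction itself, without the caller sending BEGIN. The snippet below uses only the stdlib sqlite3 module with its default settings; it is an illustration of implicit transaction start, not Storm's code, and the table name `t` is made up.

```python
import sqlite3

# Plain stdlib sqlite3 with default settings; Storm is not involved here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

before = conn.in_transaction   # nothing has been modified yet
conn.execute("INSERT INTO t VALUES (1)")
during = conn.in_transaction   # the driver opened a transaction on our behalf
conn.commit()
after = conn.in_transaction    # the commit ended it again

print(before, during, after)
```

The caller never writes BEGIN, yet a transaction is open between the INSERT and the commit, which matches the "it begins automatically" behaviour described above.
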
01:13.49*** join/#storm artista-frustrad (n=artista_@201-40-95-59.ctame704.dsl.brasiltelecom.net.br)
01:29.30*** join/#storm jamesh (n=james@canonical/launchpad/jamesh)
02:47.01*** join/#storm goschtl (n=goschtl@p5B0BBA94.dip.t-dialin.net)
03:16.55*** join/#storm sidnei (n=sidnei@plone/dreamcatcher)
05:00.40*** join/#storm artista-frustrad (n=artista_@201-15-218-10.ctame704.dsl.brasiltelecom.net.br)
05:11.35*** join/#storm bigdog1 (n=scmikes@72-197-8-8-arpa.cust.cinci.current.net)
05:19.39*** join/#storm bigdog1 (n=scmikes@72-197-8-8-arpa.cust.cinci.current.net)
05:27.05*** join/#storm thumper (n=tim@canonical/launchpad/thumper)
05:50.46*** join/#storm jukart (n=jukart@d91-128-122-97.cust.tele2.at)
06:44.47*** join/#storm jukart (i=lovely@81.189.156.94)
09:50.41*** join/#storm goschtl (n=goschtl@p5B0BBA94.dip.t-dialin.net)
12:11.20*** join/#storm uzzed (n=alexandr@200.139.120.198.dynamic.adsl.gvt.net.br)
12:25.44*** join/#storm niemeyer (n=niemeyer@200-103-244-201.ctame705.dsl.brasiltelecom.net.br)
12:41.09*** join/#storm andrea-bs (n=andrea-b@ubuntu/member/beeseek.developer.andrea-bs)
12:51.52*** join/#storm gord (n=gord@5ac7ea62.bb.sky.com)
12:57.04*** join/#storm sidnei (n=sidnei@plone/dreamcatcher)
13:31.38*** join/#storm thumper (n=tim@canonical/launchpad/thumper)
13:38.27*** join/#storm vvinet (n=vince@132.210.76.200)
14:06.34*** join/#storm jamesh (n=james@canonical/launchpad/jamesh)
14:14.05*** join/#storm thumper (n=tim@125-236-193-95.adsl.xtra.co.nz)
14:16.15grahameanyone about?
14:16.40grahameI have some ideas to speed up storm for some uses (iterating over large numbers of results), and wondering if they'd be merged
14:18.04thervegrahame: ideas don't get merged, code does :)
14:18.09grahamewell, yeah
14:18.21grahameI'm iterating over lots of rows, say about 3 million
14:18.30thervethat's a lot
14:18.31grahamein that case, caching makes pretty much no sense
14:18.56grahameso I was thinking I'd modify things to notice when a particular iterator has yielded "lots" (say the cache size / 2) of rows
14:19.00grahameand then stop caching the results
14:19.40grahameit'd have the advantage of speeding up queries returning a lot of rows, and also mean that the cache doesn't get wiped out by large numbers of results you'll never look at again
14:20.06grahamea bit of experimentation suggests it'd make things about as fast as the django ORM for the case of big result sets
14:20.35jameshgrahame: niemeyer merged a more efficient cache implementation recently, although it is not selected by default
14:21.29grahameah, GenerationalCache
14:21.32grahameI'll have a look, thanks
14:22.25jameshgrahame: the default cache uses a straight LRU cache, which is pretty inefficient.  The GenerationalCache essentially manages two caches, clearing one then rotating when the active one fills up
14:22.53jameshso doesn't have to manage an LRU list
14:25.17jameshgrahame: it'll probably be made the default cache implementation in future
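
[Editor's sketch] The two-cache rotation jamesh describes can be approximated as follows. This is a hypothetical toy class written for this log, not Storm's GenerationalCache; the class name, method names, and the promote-on-read rule are assumptions.

```python
class TwoGenerationCache:
    """Toy two-generation cache (hypothetical; not Storm's implementation)."""

    def __init__(self, size):
        self._size = size
        self._new = {}   # active generation
        self._old = {}   # previous generation, dropped at the next rotation

    def add(self, key, value):
        if key not in self._new and len(self._new) >= self._size:
            # Active generation is full: drop the old one and rotate.
            self._old = self._new
            self._new = {}
        self._new[key] = value
        self._old.pop(key, None)

    def get(self, key, default=None):
        if key in self._new:
            return self._new[key]
        if key in self._old:
            # Promote entries still in use so they survive the next rotation.
            value = self._old[key]
            self.add(key, value)
            return value
        return default

    def __len__(self):
        return len(self._new) + len(self._old)
```

No per-access ordering is maintained, which is the point made above about not having to manage an LRU list; the cost is that eviction is coarse, a whole generation at a time.
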
14:25.45grahamejamesh: there doesn't seem to be much performance difference for my test
14:26.01grahamejamesh: it's kind of a worst case for any cache though; it seems sensible to notice and stop caching
14:26.27grahamejamesh: the code in Store._load_object to calculate the cache key seems fairly expensive too
14:26.58grahame(it might just be I'm using the wrong tool, and should just not use storm in this way)
14:28.01jameshgrahame: yeah.  The cache key bit would be a good thing to look at optimising
14:28.40grahamejamesh: do you think a patch to notice the current query is swamping the cache and stop caching would make it in? then we skip all that code
14:28.55jameshwe used to have something faster in there but it caused other problems (previously the Variable instances were hashable and being used directly)
14:29.42jameshgrahame: we'd still need to push things through the store._alive weak dict
14:30.05grahamejamesh: yeah, I was wondering about that
14:30.51grahameI might just sneak in behind storm and use it to build me a query, then run it directly
14:31.06grahameI'm using storm to build a database from google transit feeds
14:31.14grahamethen doing silly things like drawing maps from the data
14:31.29jameshbecause it is easier than getting things from transperth? :)
14:31.35grahameyeah, pretty much
14:31.46grahame./armada.py stop_timetable sqlite:perth.db 17586
14:31.53grahamegives me the timetable for the stop near my house
14:31.56grahamea lot easier!
14:32.26jameshis that the bus stop number or some other key?
14:32.33grahamethat's the bus stop number
14:32.50grahamethe numbers on the transperth stop signs actually match the bus stop numbers in the feed, too
14:33.07jameshcool.
14:33.11grahameI've got a pretty map of all the routes too
14:33.33grahamejust trying to make it usefully fast now, the mapping stuff spends most of its time in storm
14:33.53*** join/#storm gord (n=gord@5ac7ea62.bb.sky.com)
14:34.39jameshif you are after just a few numbers, result_set.values(columns) might help.
14:37.39grahamewow, that's exactly what I needed
14:37.49grahameI definitely owe you a beer :-)
14:38.05grahamethat's stupidly fast too
14:41.14jameshwe really need better documentation :(
14:41.52grahamewell, I don't mind writing something about this
14:42.20grahameshrugs
14:53.27jameshthere are a bunch of methods that let you work with a result set as a whole rather than retrieving each value
14:53.38jameshand there are more we could add.
14:54.16grahameall that's needed is a little addition to the tutorial discussing them
15:04.58jameshwell, I think we probably need something more than just a tutorial.
15:06.01jamesha tutorial that includes absolutely everything is probably not that good at teaching new users
15:06.14jameshand tutorials are not the best form of reference for existing users
15:28.19*** join/#storm shaunm (n=shaunm@proxyserver.wolfram.com)
15:31.36*** join/#storm uzzed1 (n=alexandr@189.115.81.82)
15:34.11grahameyou could perhaps call it a cookbook
15:34.39grahamejust a list of problems and suggested solutions
15:35.12*** part/#storm sidnei (n=sidnei@plone/dreamcatcher)
16:03.10*** join/#storm deryck (n=deryck@samba/team/deryck)
16:25.26mupstorm/result-set-in-subselects r299 committed by jkakar@kakar.ca
16:25.27mup- New ResultSet.select method returns a Select expression based on
16:25.27mup<PROTECTED>
16:25.27mup<PROTECTED>
16:27.20*** join/#storm andrea-bs (n=andrea-b@ubuntu/member/beeseek.developer.andrea-bs)
16:35.30mupstorm/result-set-in-subselects r300 committed by jkakar@kakar.ca
16:35.30mup- EmptyResultSet has an implementation of the new select method.
16:43.06jkakarIf anyone has spare review cycles I've put a small Storm branch in review: bug #337494
16:43.07mupBug #337494: Use ResultSets in subselects <review> <Storm:In Progress by jkakar> <https://launchpad.net/bugs/337494>
16:43.14shaunmis there any way storm would be trying to do multiple transactions where there is one process with one thread using one Store object?
16:44.04jkakarshaunm: Not unless you have more than one Store in use.
16:44.47shaunmI am completely baffled as to how I'm getting this "database table is locked" error
16:45.23radixshaunm: do you have a simple script that can reproduce it?
16:46.33shaunmunfortunately, no
16:47.10shaunmand I'm able to get through the same code paths successfully for some other objects I process
16:47.48shaunmhmm, I could try to put something together
16:48.37jkakarCrap, there's a failing MySQL test in my branch.
16:50.39*** part/#storm philn (n=phil@o.bcn.fluendo.net)
16:53.26jameshjkakar: if you propose it for merging, it'll show on the active reviews pages
16:53.34jameshand LP will generate diffs
16:53.45*** topic/#storm by jamesh -> The Storm Python ORM - http://storm.canonical.com/ - 0.14 released! || Review branches: https://code.launchpad.net/storm/+activereviews
16:54.08jkakarjamesh: Oh right, I completely forgot about that step for some reason.  Thanks. :)
16:55.00shaunmooh
16:55.12*** join/#storm deryck_ (n=deryck@24-179-42-225.dhcp.leds.al.charter.com)
16:55.14shaunmhow does storm actually do iterators over ResultSet objects?
16:55.23mupstorm/result-set-in-subselects r301 committed by jkakar@kakar.ca
16:55.23mup- Don't return a real Select expression from EmptyResultSet.select
16:55.23mup<PROTECTED>
16:55.23mup<PROTECTED>
16:55.45jameshshaunm: you can iterate over a result set, if that's what you're asking.
16:56.15shaunmjamesh: I know I can.  I'm asking what it actually does with the database when I'm doing that
16:56.40shaunmbecause I only get the locking error after I enter this iterator
16:58.06jameshshaunm: it will execute the query and read the results in blocks with fetchmany()
16:58.12shaunmyup, that's the problem.  if I wrap the ResultSet with list() before iterating, error goes away
16:58.41shaunmjamesh: hence leaving an open query on the database, causing UPDATEs to fail
16:58.44jameshI don't think SQLite likes you doing extra queries while keeping a previous result set open
16:58.53shaunmyeah
16:59.26mupstorm/result-set-in-subselects r302 committed by jkakar@kakar.ca
16:59.26mup- Remove unnecessary untested code.
17:01.48shaunmjamesh: so by forcing this into a list, am I substantially hurting performance for a problem that will only manifest with sqlite?
17:03.00shaunmin this particular case, the upper bound for how many rows might be returned is basically the number of languages gnome is translated into
17:03.03jameshshaunm: if you're always going to use all items in the result set, and it isn't ever going to be overly large, it won't hurt performance much.
17:03.18shaunmok
17:03.58shaunmI'll leave it in, with a comment, and just be mindful of things like this
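
[Editor's sketch] The workaround discussed above (drain the result set with list() before issuing UPDATEs) looks roughly like this with the stdlib sqlite3 module. The helper `iter_in_blocks`, the table `lang`, and the block size are made up for the example; the log only states that Storm reads results in blocks with fetchmany(), not how.

```python
import sqlite3


def iter_in_blocks(cursor, block_size=2):
    """Yield rows lazily, reading them from the cursor with fetchmany()."""
    while True:
        rows = cursor.fetchmany(block_size)
        if not rows:
            return
        yield from rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lang (code TEXT PRIMARY KEY, done INTEGER)")
conn.executemany("INSERT INTO lang VALUES (?, 0)", [("de",), ("fr",), ("pt_BR",)])
conn.commit()

cursor = conn.execute("SELECT code FROM lang ORDER BY code")
# list() drains the cursor up front, so no SELECT is still pending
# when the UPDATE statements below run.
codes = [code for (code,) in list(iter_in_blocks(cursor))]
cursor.close()

for code in codes:
    conn.execute("UPDATE lang SET done = 1 WHERE code = ?", (code,))
conn.commit()

done = conn.execute("SELECT COUNT(*) FROM lang WHERE done = 1").fetchone()[0]
```

As noted in the conversation, this trades memory for safety, so it suits small, bounded result sets such as a list of languages.
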
17:07.37shaunmI need to do some refactoring to avoid excessive UPDATE commands
17:17.32shaunmfive crawler modules down, six to go
17:20.14*** part/#storm goschtl (n=goschtl@p5B0BBA94.dip.t-dialin.net)
17:25.42jameshjkakar: looks like an interesting branch.  I wonder if turning Foo.bar.is_in(result_set) to Foo.bar.is_in(result_set.select(Foo.bar)) would be appropriate?
17:26.55jkakarjamesh: That was actually what I'd originally planned, but it means s.expr.Comparable.is_in needs to be special-cased to know about ResultSet, which feels dirty.
17:27.23jameshjkakar: well, Column.is_in could special case it ...
17:27.26jkakarjamesh: I thought about using an interface, with adaptation or something, to make is_in more generically able to transform its inputs, but decided that was more than I cared about.
17:28.01jameshbtw, there's a few of my branches still waiting for review :)
17:28.01jkakarjamesh: Sure, Column is probably a better place for it even, but it still feels a bit leaky.  OTOH, I totally agree with you that it's a nicer API.
17:28.22jkakarjamesh: I'll make some time for them today. :)
17:28.29jameshthanks
17:43.04jkakarjamesh: Thanks for your review comments, all make sense.
17:43.22jkakarjamesh: I'm not happy with the EmptyResultSet.select behaviour right now, either.
17:43.45jameshjkakar: the other reason for going with a real Select() in EmptyResultSet is that the Select will have the right number of columns
17:46.44jkakarjamesh: Yeah.
17:47.35*** join/#storm sidnei (n=sidnei@plone/dreamcatcher)
17:48.49jkakarjamesh: for the "test on a set expression result set" what do you mean exactly?
17:49.18jameshjkakar: something like result1.union(result2).select()
17:49.19mupstorm/result-set-in-subselects r303 committed by jkakar@kakar.ca
17:49.19mup- EmptyResultSet.select returns a Select expression instead of an
17:49.20mup<PROTECTED>
17:49.35jkakarjamesh: Ah, I see, okay, thanks.
17:49.37jameshprobably best to use unions for the example, since I think that's the only one mysql supports
17:49.56jameshjkakar: feel free to raise an exception in such cases if it looks too hard to handle
17:55.58jkakarjamesh: That's what I'm thinking.  There's actually a similar issue with values.  result1.union(result2).values(Foo.bar) blows up.
18:00.57mupstorm/result-set-in-subselects r304 committed by jkakar@kakar.ca
18:00.58mup- ResultSet.select and ResultSet.values both raise Feature error if
18:00.58mup<PROTECTED>
18:00.58mup<PROTECTED>
18:02.19jameshjkakar: there is some code in the ResultSet.__contains__() implementation that could probably be used to handle set expressions
18:02.37jameshusing the replace_columns() helper function
18:03.33jkakarjamesh: Ah, good eye.  I think that looks workable.
18:07.05jameshI think I wrote that code after radix complained about my __contains__() implementation breaking union result sets
18:16.06jameshhaving LP generate diffs for the reviews is pretty cool
18:44.12*** join/#storm sidnei_ (n=sidnei@189.30.217.33)
18:45.34*** join/#storm sidnei__ (n=sidnei@189.30.217.33)
18:51.45mupstorm/result-set-in-subselects r305 committed by jkakar@kakar.ca
18:51.45mup- Merged trunk.
18:54.33mupstorm/result-set-in-subselects r306 committed by jkakar@kakar.ca
18:54.33mup- Updated NEWS file.
18:57.30shaunmhmm, another locking issue
18:57.52shaunmissuing a bunch of CREATE TABLE statements, then I get a database-locked error on COMMIT
18:58.57shaunmCREATE statements done with store.execute(cmd, noresult=True)
18:59.25shaunmnot a single SELECT or other statement being printed out by the debugger
19:13.56shaunmok, if I commit after each one, I get the error after the first command that actually creates a table
19:14.04shaunm(they're all "IF NOT EXISTS")
19:14.50shaunmso does "CREATE TABLE IF NOT EXISTS" leave something open with sqlite when the table does exist?
19:14.58jkakarshaunm: What happens if you call store.flush() between CREATE TABLE statements?
19:15.16jkakarshaunm: Actually, scratch that, that was crack, sorry.
19:15.16shaunmsame
19:24.39shaunmsuppose I could just do that one table creation manually and move on with things
19:25.26shaunmnot much point in spending time on an issue that only happens when I add a table and only with a database that I only use for testing
19:27.00shaunmoh
19:27.18shaunmmy database is just flat-out locked.  not a storm thing at all
19:31.29shaunmprobably a stale lock from me killing the crawler when it was doing something stupid.  anybody happen to know how to get this lock off?
19:34.47shaunmson of a bitch
19:35.19shaunmI had a python session running where I was futzing around with stuff
19:35.28shaunmsorry for the stupidity
19:37.25radixhehe
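
[Editor's sketch] The situation shaunm hit, a second session silently holding the lock, can be reproduced with two stdlib sqlite3 connections to the same file. The names `holder` and `worker` and the table `t` are made up; `timeout=0` is chosen so the failure is immediate rather than after the default wait.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# First connection plays the forgotten interactive session: it takes
# the write lock and never commits.
holder = sqlite3.connect(path, isolation_level=None)
holder.execute("BEGIN IMMEDIATE")

# Second connection plays the crawler; timeout=0 means fail at once
# instead of waiting for the lock to clear.
worker = sqlite3.connect(path, timeout=0)
try:
    worker.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Ending the other session's transaction releases the lock.
holder.execute("ROLLBACK")
worker.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
worker.commit()
created = worker.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 't'"
).fetchone()[0]
```

The lock belongs to a live connection, so there is nothing to clean up on disk; closing or rolling back the other session is what clears it.
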
19:47.10jamesheventually you'll want a real database
19:57.32shaunmjamesh: of course.  sqlite is just for testing.  I've run pulse off of mysql before
19:57.45shaunmusing django.  haven't tried it yet with storm
19:58.04shaunmbut obviously, the goal for production use is mysql or postgres
19:58.53jameshshaunm: you might be better off testing on your target database
19:59.03jkakarAye
20:07.28shaunmthat makes it difficult to run my test instance on gnome.org
20:07.44shaunmwith sqlite, I can just upload the database file
20:11.44shaunmplus, I'm not sure I have a concrete target database.  I only have experience with mysql, but there are gnome folks who would prefer we did stuff with postgres
20:16.10jameshfwiw, the PostgreSQL backend to storm probably gets more attention than the MySQL backend
20:16.27jamesh(that said, most database-related tests get run for all three current backends)
20:18.06*** join/#storm oubiwann (n=oubiwann@97-119-85-2.omah.qwest.net)
20:20.51shaunmI don't really have the level of database expertise to make an informed decision between the two
20:48.57*** join/#storm artista_frustrad (n=artista_@201-25-170-30.ctame704.dsl.brasiltelecom.net.br)
21:10.00*** join/#storm artista_frustrad (n=artista_@201-40-94-71.ctame704.dsl.brasiltelecom.net.br)
22:12.17*** join/#storm cody-somerville (n=cody-som@ubuntu/member/somerville32)
22:13.04cody-somervilleHow well does storm work with threads?
22:38.04jkakarjml: https://code.edge.launchpad.net/~jkakar/storm/result-set-in-subselects/+merge/4147
22:38.23jmljkakar: thanks
22:38.37jmllooks
22:38.51jmljkakar: do you need another review?
22:38.58jkakarjml: It was fun.  Also, FYI, for some reason the diff on that merge page is several revisions and several hours out of date.
22:39.12jkakarjml: Yes please!  Once I have a second review taken care of I can merge it.
22:39.44jmljkakar: it's the revision created when you originally submit the merge proposal
22:39.52jmljkakar: that's a feature, not a bug.
22:40.41jmljkakar: the "diff against target" feature should show the current diff.
22:41.19jkakarjml: It's confusing because it says "Review Diff", but isn't what should be reviewed.
22:42.07jkakarjml: Am I doing something wrong?
22:43.01jmljkakar: well...
22:43.04jmljkakar: no, you aren't.
22:43.33jmljkakar: the idea is that when you propose a branch for merging, the diff you want reviewed is the current head of that branch
22:43.46jkakarjml: Right, that makes sense.
22:43.55jmljkakar: this is how bzr send works, for example.
22:44.07jmljkakar: there should *probably* be a way to update the diff
22:44.26jmljkakar: but there's some internal contention on this.
22:45.10jkakarjml: So, the thing that's weird for me is that I've made changes based on review feedback from jamesh, but the diff doesn't reflect that and is misleading for users that don't know they should bzr branch $branch and look at the real diff.
22:46.00jkakarjml: I mean, I guess no one should review a branch without really getting it and running its tests, etc., but from a convenience point of view being able to re-review minor changes just by looking at the diff on the web would be nice.
22:46.15jmljkakar: yeah.
22:46.22jmljkakar: I agree 100% with that.
22:46.23jkakarjml: Would it help if I filed a bug?
22:46.29thumperhello?
22:46.33jkakarthumper: Hey!
22:46.37jmljkakar: yes please, but wait a moment or two for thumper to catch up :)
22:46.41jkakarHeh
22:46.44thumperreads
22:47.28thumperI have a plan to work with beuno on this
22:47.34jkakarCool.
22:47.38thumperto have the most obvious diff be shown
22:47.45thumperso if there is a more up to date preview diff
22:47.49thumperwe can show that more
22:47.55thumperhowever we'd also have the review diff
22:48.00thumperbut it would be "closed"
22:48.01jkakarI really think the diff adds a lot of value to the page, btw.  Thanks for adding it. :)
22:48.08thumperand load with ajax
22:48.18thumperjkakar: abentley did a lot of it
22:48.23thumperjkakar: I just made it look nice :)
22:50.37jkakarthumper: Is there any value in me filing a bug about this?
22:54.01thumperjkakar: always
23:00.38jameshthumper: I also noticed that jkakar's branch is the only one showing a diff inline.  Is that just because the older proposals were from before the Launchpad update?
23:01.23thumperjamesh: probably
23:01.49thumperjamesh: I'm running lp:mad for storm, so the preview diff should be up to date
23:02.43jameshthumper: I was comparing https://code.edge.launchpad.net/~jkakar/storm/result-set-in-subselects/+merge/4147 with https://code.edge.launchpad.net/~therve/storm/twisted-integration/+merge/3733
23:02.46thumpercan we use an order by on a result set
23:02.53thumperthat already has an order by clause?
23:02.58thumperlooks
23:03.22jkakarthumper: result.order_by(Foo.bar, Foo.baz, ...)?
23:03.23thumperjamesh: the code to auto generate the review diff landed around merge 4000
23:03.41thumperjkakar: we have a function that already provides an order_by
23:03.47thumperjkakar: and we want to override the ordering
23:04.08jmlrs.order_by(...).order_by(...), where the second makes the first irrelevant
23:04.46jkakarthumper: Just call .order_by again.
23:05.49jkakarOn this topic, you guys know about __storm_order__, right?
23:05.51jameshthumper: the librarian file diffs look okay.  I was just noticing that jkakar's branch had a pretty printed diff on the proposal page while the others did not.
23:06.04thumperjamesh: that's on my todo list
23:06.21thumperjamesh: almost certainly, they'll start looking nice in the next week or so on edge
23:06.21jameshthumper: this is really cool, by the way :)
23:06.29thumperI'm glad you like it
23:07.11jkakarthumper: bug #338002
23:07.12mupBug #338002: 'Review Diff' on merge proposal page can be out-of-date <Launchpad Bazaar Integration:New> <https://launchpad.net/bugs/338002>
23:07.40thumperjkakar: thanks I'll triage later
23:12.01jameshjkakar: so, do you think having ResultSet.find() and ResultSet.select() would be confusing?
23:12.20jamesh[not that we have ResultSet.find() yet]
23:13.15jkakarjamesh: I was wondering about it and didn't have a clear feeling about it.
23:13.41jameshI think jml asked about ResultSet.find() once
23:13.44jkakarjamesh: One thought was to call it ResultSet.get_select or get_select_expr, but I kind of like verbs.
23:13.59jmldid I?
23:14.15jmloh yeah, I did.
23:14.26jameshjml: I think it was you.  A way to narrow down an existing result set
23:14.29jmlyeah
23:14.38jmlactually, I've got some code that you guys might want to look at
23:14.42jkakarlike to add extra where query parameters?
23:15.45jameshjkakar: yeah.
23:15.56jkakarI'm not sure 'find' is the right name for that.
23:16.11jkakarMaybe ResultSet.extend?
23:16.48jameshjkakar: we've got a find() method on bound reference sets that does essentially the same thing
23:16.57jmlor restrict()
23:17.00jmlor filter()
23:17.05jml(except yay python)
23:18.07jkakarjamesh: It does essentially the same thing yes, but "find me things in this reference set" and "mutate this result set with some custom bits" are quite different things.
23:18.42jameshjkakar: I wasn't thinking of a method that mutates the result set
23:18.53jameshhave it return a new one
23:18.55jkakarThen again, maybe consistency is better... I guess I don't have a strong opinion about it.  I think I'm fine with 'select' unless someone has a better idea.
23:19.19jkakarjamesh: Right, that's what I was thinking, in fact.  It's still different from "find me things in this reference set".
23:19.32jameshreally?
23:19.44jameshit is "find me things in this result set"
23:20.06jkakarHmm.  I guess, okay.
23:21.07jmlhttps://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/annotate/head%3A/lib/canonical/launchpad/interfaces//branchcollection.py
23:21.12jmlhttps://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/annotate/head%3A/lib/canonical/launchpad/database//branchcollection.py
23:22.38jkakarUnauthorized. :(
23:22.43jmljkakar: sorry :(
23:23.01jmljkakar: you'll be able to see it in July :)
23:23.40jkakarHeh
