rt.cpan.org updated - thank you bestpractical

so it seems that we got a nice present for christmas from bestpractical: rt.cpan.org has been updated to the latest version, 3.8

thank you, and keep up the good work!


helping perl packagers package perl modules (for real this time)

chromatic posted a long rant (who would have guessed? :-) ) about perl modules shipped by linux distributions. however, he doesn't have all the answers... nor the experience needed for this rant. since i'm a perl author and the mandriva packager for perl and lots of perl modules, i think i have a better-informed view of this topic.

first, let me state that using system perl is fine, but i really discourage it for your enterprise application. the perl version will change from time to time, ditto for the perl modules you are relying upon. so if you want to be in control of the software foundation for your app (and you should) - compile your perl and your modules yourself.

second thing: i also discourage mixing perl modules installed by your package system with modules installed by running cpan as root. you'll end up with a mix of files in /usr/perl5, some belonging to a system package and some not, which sucks. installing modules in a local lib of yours is fine, and local::lib makes that quite easy nowadays.

if you're comfortable with those rules, then you're welcome to use the system perl and the modules shipped by your distribution. after all, we packagers are going through this work in the hope of being useful to others - that is, you!

but back to chromatic's post. if you want to install cpan modules as system packages, there's already a tool that does it for you: it's called cpan2dist, and is part of cpanplus. it works as long as a backend for your distribution exists. there are currently backends for debian, mandriva (that i wrote), fedora and gentoo. a backend is not that difficult to write, and it allows you to write:
# cpan2dist --format CPANPLUS::Dist::Mdv --install Foo::Bar
this will automatically download foo::bar, check its dependencies (and recursively build & install them if needed), build the module as a mandriva package and install it. what else exactly do you want/need? (as a cpanplus backend writer, i do have some things that i'd like cpan2dist to have, but none as a regular user). and if you're a packager and want to integrate cpan2dist with your linux distribution build system, cpan2pkg can help you (even if it's currently stalled due to something missing in cpanplus).

however, if you want to help perl packagers package your modules for a distribution, here's a list of things that you should (and should not) do. this is a list of real, practical things to do as a module author - not some generic hand-waving towards the perl community out there. this comes from my experience as packager of more than 400 modules for mandriva, and each item makes me curse the module author every time i encounter one of those problems...

  • test your dist before shipping. really, i'm not kidding. lots of dists just fail their tests. and not just on linux, on all the platforms. so if you make an update that "just can't fail" (yeah, right) to your dist just before shipping, please run your test suite nevertheless. just in case, you know, it might fail.
  • if you're shipping pod tests that are skipped depending on the presence of test::pod and test::pod::coverage, make sure you have those modules installed, so you are running those tests, too. even better: skip those tests unless RELEASE_TESTING or AUTHOR_TESTING is set. after all, it's nice for you to know you still have some documentation work to do, but i don't care as a packager... and, you know, it's now the standard & recommended way of shipping those tests.
  • those 2 items lead me to another easy thing for module authors to do to help us: check the cpantesters status of your dist. investigate all the fails that you have. if you see a fail that is your fault, fix it and upload a new version. it helps us because this prevents us from having to report a bug against your dist. i generally wait 3 or 4 days before reporting a bug on a dist that has some failure reports, hoping (what a fool) that the author will notice by herself that something is going wrong.
  • speaking of bug reports, if we take the time to open a report for your dist (very often with a patch attached)... please read it. and act. or at least answer us. either apply the patch, or explain why you don't want to apply it like that... and ship a new version of your dist, with the fix included.
  • but of course, before reporting a bug, we should find the bug tracker. so, by using rt.cpan.org, you really help us to have a single unified point of contact. i know that rt is kind of slow, not very intuitive, has some problems and could be cleaned out a bit... but it is here, bestpractical is providing & administering it for us for free, and has this nice feature of having a queue for every perl dist on cpan. if you don't want to use it, there are some more polite ways of saying it... and giving the url of your tracker helps, too. oh, and if i took the pain to play by your rules and report a bug to your non-standard bug tracker, i would greatly appreciate that you act on my ticket. or at least, you know, just acknowledge the fact that you received the report.
  • if you want to really piss off a packager, a simple but effective way is to change your versioning scheme every now and then, by (ab-)using your knowledge of perl's way of understanding versions. in the same major version, of course. going from version 1.470 to 1.50 is not funny: perl sees 1.47 followed by 1.5, but a package manager comparing the fields after the dot sees 470 followed by 50. if you want to change your versioning scheme, you can change the major number too. after all, i'm pretty sure that you're not paying any extra money per major number used in your dist. this is what caused us to mangle the version of perl modules shipped in mandriva.
  • speaking of regular changes, it's irritating to have to follow you through your use of makefile.pl to build.pl to makefile.pl to build.pl to... well, you understand what i mean. first this shiny replacement that is module::build, then this oh-so-marvelous module::install, oh no, finally module::build's way of working fits me better in retrospect... it's ok for you to change from time to time, but changing at every version of your dist? just make up your mind dammit!
  • speaking of it, i hate module::install. and especially its feature that prompts and tries to handle the deps itself coz-it's-so-cool-it-can-do-it-for-real. sorry, but that's not your job in the tool chain. just report that you miss some deps. i know that there's a flag to make this feature go away while launching makefile.pl. but i don't want to bother and would rather expect that the whole stuff has sane defaults...
  • oh, and in case you're wondering - every prompting in the configure phase (makefile.pl or build.pl) sucks and should be banned.
  • having clear and up-to-date dependencies would be fine, too. i know it's not always easy to get them right, but you can change your tools and adopt one that extracts your prereqs for you.
  • try to avoid dependency on modules that are known to fail. even if it works in your setup, trust cpantesters if they tell you that it fails 95% of its reports: it might not be a good idea to depend on it.
  • trying to support old perl versions and old releases of modules is fine, but update your modules and see if your code works. some functions may become deprecated, or you were relying on a buggy behaviour, or whatever. we update the perl packages as we see new versions, not only your modules. so a linux dist will usually have the latest & greatest version of all the modules - you'd better be sure that your code works with them, hmm?
  • finally, if you're developing under macosx, make sure that you don't ship resource files, or textmate temp files. having ._Foo.pm in the dist is not fun: automatic compile tests will fail, unless that's your manicheck or signature check. and even if everything in the perl dist is fine, things may bork in the repackaging of the system package due to a file not listed. so, be extra careful when shipping your dist - or change platform and burn your shiny toy that calls itself a computer (careful & clever readers may have guessed from the previous sentence that i don't like macosx - but that's not a reason to ditch this post and not to follow the advice i'm reporting).
there, i think that's a pretty good start. i've encountered each and every item of this list at least once (i stopped counting exactly how many times a long time ago). don't take it personally if you made some of those mistakes - i have made almost all of them myself as a module author (except of course using module::install and using a mac, but you could have guessed at least the last part). what's important is to realize that those behaviours are annoying for packagers, and that changing those habits is quite easy to do (except for the burning your mac part, because i agree that it's not very environmentally friendly - see, i'm not that stubborn! ;-) ).
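
to make the versioning item more concrete, here is a small sketch of why going from 1.470 to 1.50 hurts. the two helper names are made up for this post, and the field-by-field comparison is only a simplified stand-in for what a package manager does (real rpm or dpkg rules also deal with letters, epochs and so on):

```perl
use strict;
use warnings;

# perl's view: a version is a plain decimal number (1.470 is 1.47)
sub cmp_as_decimal {
    my ( $old, $new ) = @_;
    return $old <=> $new;
}

# simplified packaging view (an assumption, not the real rpm algorithm):
# compare each dot-separated field as an integer, missing fields count as 0
sub cmp_as_fields {
    my ( $old, $new ) = @_;
    my @old = split /\./, $old;
    my @new = split /\./, $new;
    while ( @old || @new ) {
        my $o = shift(@old) // 0;
        my $n = shift(@new) // 0;
        my $c = $o <=> $n;
        return $c if $c;
    }
    return 0;
}

# -1 means "the new version is newer", 1 means "the new version is older"
printf "decimal: %d, fields: %d\n",
    cmp_as_decimal( '1.470', '1.50' ),
    cmp_as_fields( '1.470', '1.50' );
# prints "decimal: -1, fields: 1"
```

for perl the new release is an upgrade, while for the field-based comparison it looks like a downgrade. bumping the major number when changing scheme keeps both views in agreement.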

if you follow all this advice, packagers of your modules will love you. (or at least, not hate you - which is still a win :-) ). i know i will...
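
for the macosx item, a pre-release check can be as small as the sketch below. it is only an illustration: the sub names are made up, the manifest parsing is simplified (it ignores quoted file names), and the two patterns (a ._ prefix and .DS_Store) are my guess at the usual suspects, not an exhaustive list:

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# return the paths that look like macosx leftovers
# (assumed patterns: resource-fork files "._*" and finder's ".DS_Store")
sub stray_mac_files {
    my (@paths) = @_;
    return grep {
        my $name = basename($_);
        $name =~ /^\._/ || $name eq '.DS_Store';
    } @paths;
}

# read the file names listed in a MANIFEST (simplified: first field of each line)
sub manifest_paths {
    my ($manifest) = @_;
    open my $fh, '<', $manifest or die "cannot read $manifest: $!";
    my @paths;
    while ( my $line = <$fh> ) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ($path) = split ' ', $line;     # drop the optional trailing comment
        push @paths, $path;
    }
    close $fh;
    return @paths;
}

# when run from the root of a dist, complain about anything suspicious
if ( -e 'MANIFEST' ) {
    print "stray file: $_\n" for stray_mac_files( manifest_paths('MANIFEST') );
}
```

running it from the dist directory just before uploading is enough to catch a ._Foo.pm that sneaked into the manifest.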


scraping web pages with embedded javascript in perl

recently i had to download various pdf files from a vendor website, which lists them according to different criteria. since i'm lazy, i wanted to write a script that would download all those files for me.

that would be quite easy, but the list was generated by some ajax whenever i changed the criteria... so, what to do to work around this problem?

there are some solutions out there for the perl programmer.

although www::mechanize's author stated that he does not want to bother with integrating javascript support, someone wrote a javascript plugin for mechanize. it's still experimental for now, though.

the same author, leveraging his knowledge, also wrote www::scripter that's supposed to achieve the same result.

but before diving into those complex beasts, it may be worth trying to understand what the javascript is doing, and whether the hidden url that gives the wanted list of files is easy to guess. to do that, you can go the hard way and read the javascript... or the easy way and wiretap the network!

enter http::proxy, a configurable proxy written in perl by book. among the various examples, it comes with a logger.pl script that displays all the urls accessed by your browser (with cookies). run it as is, configure your browser to use a proxy on localhost port 3128, and check the output of logger.pl while tinkering with your ajaxy application.

chances are that you'll see a url such as http://www.example.com/path/to/script/list/?crit1=foo&crit2=bar&limit=10

well, that was the case for me, and i could just download the list according to my criteria, parse it with the excellent html::tree and download each of the files separately... \o/
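
the download step could look roughly like the sketch below. this is not the script i actually used: the sub names are made up, and it assumes that the list is plain html where each file is a link ending in .pdf, which may not hold for your vendor. it needs html::tree, uri and libwww-perl installed:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use URI;

# return the absolute urls of all links ending in .pdf found in $html
# (assumption: the files are plain <a href="...pdf"> links)
sub pdf_links {
    my ( $html, $base ) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @urls;
    for my $link ( $tree->look_down( _tag => 'a', href => qr/\.pdf$/i ) ) {
        push @urls, URI->new_abs( $link->attr('href'), $base )->as_string;
    }
    $tree->delete;    # free the parse tree
    return @urls;
}

# fetch the list url spotted in the proxy log, then save each pdf locally
sub fetch_all {
    my ($list_url) = @_;
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get($list_url);
    die "cannot fetch $list_url: ", $res->status_line, "\n"
        unless $res->is_success;
    for my $url ( pdf_links( $res->decoded_content, $list_url ) ) {
        my ($file) = $url =~ m{([^/]+)$};    # last path segment as file name
        $ua->mirror( $url, $file );
    }
    return;
}

# usage: perl fetch-pdfs.pl 'http://www.example.com/path/to/script/list/?crit1=foo'
fetch_all( $ARGV[0] ) if @ARGV;
```

if the site relies on cookies (logger.pl shows them), the user agent would need a cookie jar too; that part is left out here.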

so the title of this blog post is wrong: no javascript web scraping (although i gave some hints on how to do it), but a cheap trick to achieve the same result! :-)


parrot 1.9.0 available in mandriva

since parrot 1.9.0 is officially out, it landed quite quickly in mandriva cooker. the next rakudo version will soon follow.


rakudo available in mandriva

after some chat with parrot-porters, i finally understood that what gets installed in /usr/src/parrot (and that i was trimming during installation) is not the full parrot source, but intermediate forms of pmcs, needed if high-level languages want to subclass them. they are thus now shipped in the parrot-src package - and this is enough for rakudo to be compiled.

therefore, rakudo 2009-11 is now (finally) available in mandriva...


mojomojo now available in mandriva

after struggling for quite some time with its dependencies, and then with some other unmentioned prereqs deep in the dependency chain, i'm happy to report that mojomojo, the catalyst-powered wiki, is now available in mandriva.


migration to moose - step 3

continuing [0] to port mpd modules to moose - this time it's poe::component::client::mpd's turn. before doing that, i had to fix the tests (using the shiny new test::corpus::audio::mpd) to be sure not to introduce regressions.

migration was quite easy, but a bit long. especially since i used a lazy_builder attribute for the connection, leading to commands being sent before the actual connection took place - and thus the tests failed with a cryptic message from poe::component::client::tcp [1]

the excellent moosex::poe was used too, but not in the connection, since it reuses various components having a more classic way of defining their events.

and now that the base code is moose-ified, i think i'll be able to gain more clarity while leveraging some moose levers.

[0] see this previous post and this one
[1] saying that {server} doesn't have a put method... as opposed to what's documented. but that's because at that time it has not (yet) morphed into a poe::wheel::readwrite. even if i'm at fault for not waiting for the connected event, couldn't it provide a better error message?