Most people would generally prefer a climate where it’s bright and warm most of the time. But for Canadians and others who live where it’s not, there are compensations, and one is the experience of spring. I have a picture.
After all the months of 50° North Latitude winter—icy-sharp in most of Canada, wet and dark here in Vancouver—the soul, the spirit, and the libido all spring to life when the sun comes back. We’ve had a solid year of crappy weather, but this last Saturday through Monday were solidly summery, bright and warm; and in this season the days are already long and each gets longer so fast you can feel it.
On the back porch, our pear tree’s branches were silhouetted against the neighbors’ big wild old cherry; the cherry yields no edible fruit but who cares, it’s a beautiful tree any time of year.
I’ve been watching our internal leadership conference and spending quite a bit of time talking in the virtual hallways, and I’ve been surprised at the intensity of feeling about Mr. McNealy. Yes, there are those here saying “About bloody time, now we can make some progress” but there’s a much bigger group that is genuinely emotional about this transition. Maybe it’s a function of seniority: I never met nor corresponded with Scott, and he hasn’t been much of a presence in the company’s conversation in the time I’ve been here. But there are a lot of smart, seasoned, unsentimental people making it clear that he’s been a major force in their lives, at a more personal level than I’m used to hearing when people speak about executives. I guess also that to a lot of people, Sun’s vision, for which Scott gets some of the credit, was a radical and wonderful thing. I first used Unix in 1979 and quit a nice big-company job to become a VAX-bsd sysadmin in 1983, so I’ve always kind of lived inside that vision. But I’ll tell you one thing, what I’ve been hearing the last couple of days makes me really regret that I didn’t get to know Scott.
Jane Jacobs died; the city I live in, Vancouver, is pretty solidly Jacobsian both in its current shape and its planning dogma. By choosing to live here I’m empirically a fan. Oddly, few have remarked how great Jacobs looked; her face commanded the eye. Which leads me to Alex Waterhouse-Hayward’s wonderful Jane Jacobs & Viveca Lindfors; surprising portraits and thoughts on decoration. W-H’s blog has become one of only two or three that I stab at excitedly whenever I see something new. For example, see Sex Crimes, Homicide and Drugs and yes, that’s what it’s about. Staying with the death-and-betrayal theme, and apparently (but not really) shifting back 2½ millennia, see John Cowan’s The War (after Simonides), being careful to look closely at the links. I’ve written about those same wars.
At that Rails conference, when I was talking to Obie Fernandez, he asked, more or less “How can Sun love us? We’re not Java” and I said, more or less, “Hey, you’re programmers, you write software and there have to be computers to run it, we sell computers, why wouldn’t we love you?” Anyhow, we touched on parallelism a bit and I talked up the T1; Obie took that ball and ran with it, saying all sorts of positive things about synergy between Rails’ shared-nothing architecture and our multicore systems. Yeah, well, good in theory, but I’m too old to make that kind of prediction without running some tests. Hah, it turns out that Joyent has been doing that, and has 76 PDF slides on the subject. If you care about big-system scaling issues, read the whole thing; a little long, but amusing and with hardly any bullet lists. If you’re a Sun shareholder looking for a pick-me-up, check out slides 40-41, 49, and 52-74. Oh, I gather that the T1, Solaris, and ZFS are OK for Java too. [Update: The title was just “SAMR”, as in LAMP with two new letters. Enough people didn’t get it that I was forced to think about it, and MARS works better anyhow.]
I got email late yesterday from David Berlind: “Hey, can I call you for a minute?” He wanted commentary on a story he was writing that I think is about the potential for intellectual-property lock-ins on RSS and Atom extensions. I say “I think is about” because the headline is “Will or could RSS get forked?”. After a few minutes’ chat, David asked if he could record for a podcast, and even though I only had a cellphone, the audio came out OK. The conversation was rhythmic: David brought up a succession of potential issues and I answered each along the lines of “Yes, it’s reasonable to worry about that, but in this case I don’t see any particular problems.” Plus I emitted a mercifully-brief rant on the difference between protocols, data, and software. On the one hand, I thought David could have been a little clearer that I was pushing back against the thrust of his story, but on the other hand he included the whole conversation right there in the piece, so anyone who actually cares can listen and find out what I actually said, not what I think I said nor what David reported I said. I find this raw barely-intermediated journalism (we talk on the phone this afternoon, it’s on the Web in hours) a little shocking still. On balance, it’s better than the way we used to do things.
It’s not that complicated, really. Bloggers are taking over the world. Resistance is futile; you will be assimilated.
I’m not really a Bob Dylan fan. A voice like that, and a tunesmithing talent like that, come along only a few times per century, but he’s still kind of irritating. That aside, the song One More Cup of Coffee, from the 1976 album Desire, can’t be ignored; wonderful tune, wonderful orchestration, wonderful performance. (“5✭♫” series introduction here; with an explanation of why the title may look broken.)
Nothing I can possibly write will add any wisdom to the millions of words, some 90% of them in excess of needs, written on the subject of this particular person.
A personal statement: Bob Dylan has long irritated me for, during the first thirty years or so of his career, never having given a straight answer to a straight question, and for writing songs with dozens of boring verses. But they’ll still be listening to lots of his performances long after I’m dead, and in recent years he’s become a better, more direct, interview.
My taste in Dylan is a little unusual: once you get past One More Cup of Coffee, my favorites would be Baby Let Me Follow You Down (from the Last Waltz soundtrack) and Crash on the Levee (Down in the Flood) from The Basement Tapes.
Desire, the record, is hit and miss. Joey, a glorification of the life of some mafioso, is flawed in concept and unlistenable in execution. As for Hurricane, whatever you think about Mr. Carter, that song rocks; and Isis hits pretty hard too.
Is there anything in One More Cup of Coffee that’s not perfect? Well yes, in the verses, the lyrics on occasion drag (“He oversees his kingdom / So no stranger does intrude / His voice it trembles as he calls out / For another plate of food”). But apart from that, the sentiment is compelling, Scarlet Rivera’s violin is beautifully scored and played, the tune is to die for, and the backing vocals are by Emmylou Harris, who you can bet is going to be here in the 5-✭ series one of these days. And while there’s not much middle ground on the subject of Dylan’s singing, if you like it, you’ll really like this song.
Listen to the choruses: Bob and Emmylou veer wildly around the rhythm, then coalesce on the beat when it matters, and they’re making it up as they go along, they’re wholly inhabiting the moment, and it’s quite, quite perfect.
Oh yeah, it’s out there. And there’s a live version too; but the smart thing would be to go buy the un-compressed un-DRM’ed shiny round silver version of Desire; it’s a keeper.
First of all, implementors of anything Atom-related need to spend some time chez Jacques Distler; in particular, the conversation that plays out in the comments. Second, there’s this piece of software called Planet Planet that allows you to make an aggregate web page by reading lots of feeds; for example, see Planet Apache or Planet Sun. Sam Ruby decided that its Atom support needed some work, so he did it. Now, here’s the exciting part: he pinged me over the weekend and said “Hey, look at this” wanting to show me his cleverly-Atomized Planet Intertwingly feed. I looked at it in NetNewsWire and was puzzled for a moment; some but not all of the things in the feed were highlighted as unread, even though this was the first time I’d seen it. Then the light went on. This is Atom doing exactly what we went to all that trouble to make it do. NetNewsWire has good Atom support and, because Atom entries all have unique IDs and timestamps, it can tell that it’s seen lots of those entries before in other feeds that I subscribe to. That’s how I found Jacques’ piece. This is huge; anyone who uses synthetic or aggregated feeds knows that dupes are a big problem, showing up all over the place. No longer, Atom makes that problem go away.
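The de-duplication trick described above can be sketched in a few lines of Python. This is only an illustration of the idea, not NetNewsWire's actual logic: the `merge_entries` function, the dictionary layout, and the sample IDs are all made up for the example, and it assumes each entry has already been parsed down to its atom:id and atom:updated values.

```python
from datetime import datetime, timezone

def merge_entries(seen, feed_entries):
    """Return the entries that are new, or newer than the copy already seen.

    `seen` maps an entry's atom:id to the latest atom:updated timestamp
    observed for it across every subscribed feed; it is updated in place.
    """
    unread = []
    for entry in feed_entries:
        entry_id, updated = entry["id"], entry["updated"]
        previous = seen.get(entry_id)
        if previous is None or updated > previous:
            seen[entry_id] = updated
            unread.append(entry)
    return unread

# Hypothetical sample data: one entry appears both in its home feed
# and in an aggregated "planet" feed.
t1 = datetime(2006, 4, 20, 12, 0, tzinfo=timezone.utc)
t2 = datetime(2006, 4, 22, 9, 30, tzinfo=timezone.utc)

home_feed = [{"id": "tag:example.org,2006:post-1", "updated": t1, "title": "A post"}]
planet_feed = [
    {"id": "tag:example.org,2006:post-1", "updated": t1, "title": "A post"},
    {"id": "tag:example.net,2006:other", "updated": t2, "title": "Another post"},
]

seen = {}
first = merge_entries(seen, home_feed)     # everything is new
second = merge_entries(seen, planet_feed)  # only the entry not seen before
print([e["title"] for e in second])
```

The point is that the key is the entry's own ID, not the URL of the feed it arrived in, so the same entry showing up in an aggregate is recognized as already read.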
Check out Dave Hyatt’s excellent write-up on designing and rendering Web pages so they take advantage of the higher-resolution screens that may be coming our way. I emphasize “may” because I’ve seen how slowly we’ve picked up pixels over the years. The first really substantial screen I ever worked on was a 1988-vintage Sun workstation with about a million pixels. The Mac on my lap right now, which has 125 times as much memory as that workstation, has only 1.38 million pixels. Anyhow, Hyatt has some smart things to say on the issues, which are trickier than you might think. I suspect that sometime in a couple of years, if I still care about ongoing, I’m going to have to go back and reprocess all the images so that higher-res versions are available for those who have the screens and don’t mind downloading bigger files. Anyhow, Dave’s piece may be slightly misleading in that he talks about SVG as though it’s something coming in the future. Not so, check out this nifty SVG Atom logo, which works fine in all the Mozilla browsers I have here. Load it up, resize the window, and watch what happens. Then do a “view source”. [Update: Jeff Schiller writes to tell me that Opera 9 does SVG (and Opera 8 “SVG Tiny”) too.] [Dave Walker writes: Though the shipping version of Safari doesn’t support SVG, the nightlies do.] [Dave Lemen points to JPEG 2000 as possibly useful in a high-res context.]
My brother Rob is really taking to this blogging medium. Check out his recent Credo, and also the only instance I’ve seen of Anglo-Saxon alliterative poetry applied to a mini-van.
Almost every Sunday I grab the week’s ongoing logfiles and update my numbers. I find it interesting and maybe others will too, so this entry is now the charts’ permanent home. I’ll update it most weeks, probably. [Updated: 2006/04/23.]
The notes on usage and source code will return in coming weeks when I get the cycles to rewrite this whole article.
I recently updated the ongoing software (but haven’t updated the Colophon I see, oops). Anyhow, the XMLHttpRequest now issued by each page seems to be a pretty reliable counter of the number of actual browsers with humans behind them reading the pages. I checked against Google Analytics and the numbers agreed to within a dozen or two on days with 5,000 to 10,000 page views; interestingly, Google Analytics was always 10 or 20 views higher.
Anyhow, do not conclude that now I know how many people are reading whatever it is I write here; because I publish lots of short pieces that are all there in my RSS feed, and anyone reading my Atom feed gets the full content of everything. And I have no #&*!$ idea how many people look at my feeds.
By the way, this was the first time in weeks and weeks that I’d looked at the Analytics numbers, and they showed almost exactly zero change from the report linked above. So I’m going to turn them off; they’re a little too intrusive and I think may be slowing page loads.
Anyhow, I ran some detailed statistics on the traffic for Wednesday, February 8th, 2006.
Total connections to the server | 180,428 |
Total successful GET transactions | 155,507 |
Total fetches of the RSS and Atom feeds | 88,450 |
Total GET transactions that actually fetched data (i.e. status code 200 as opposed to 304) | 87,271 |
Total GETs of actual ongoing pages (i.e. not CSS, js, or images) | 18,444 |
Actual human page-views | 6,348 |
So, there you have it. Doing a bit of rounding, if you take the 180K transactions and subtract the 90K feed fetches and the 6000 actual human page views, you’re left with 84,000 or so “Web overhead” transactions, mostly stylesheets and graphics and so on. For every human who viewed a page, it was fetched almost twice again by various kinds of robots and non-browser automated agents.
It’s amazing that the whole thing works at all.
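For anyone who wants to check the rounding, here is the arithmetic as a small Python sketch. The figures are copied from the table above; the variable names are mine, and running the same subtraction on the unrounded numbers is an extra step not in the original.

```python
# Figures from the table for Wednesday, February 8th, 2006.
connections = 180_428   # total connections to the server
feed_fetches = 88_450   # RSS and Atom feed fetches
page_gets = 18_444      # GETs of actual ongoing pages
human_views = 6_348     # actual human page-views

# The rounded version used in the text: 180K - 90K - 6K.
rounded_overhead = 180_000 - 90_000 - 6_000

# The same subtraction on the unrounded figures.
exact_overhead = connections - feed_fetches - human_views

# Page fetches that were not humans, per human page-view.
non_human_per_view = (page_gets - human_views) / human_views

print(rounded_overhead)              # 84000
print(exact_overhead)                # 85630
print(round(non_human_per_view, 2))  # 1.91
```

So "almost twice again" holds up: about 1.9 robot fetches of a page for every human one.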
In December of 1996 I released a piece of software called Lark, which was the world’s first XML Processor (as the term is defined in the XML Specification). It was successful, but I stopped maintaining it in 1998 because lots of other smart people, and some big companies like Microsoft, were shipping perfectly good processors. I never quite open-sourced it, holding back one clever bit in the moronic idea that I could make money out of Lark somehow. The magic sauce is a finite state machine that can be used to parse XML 1.0. Recently, someone out there needed one of those, so I thought I’d publish it, with some commentary on Lark’s construction and an amusing anecdote about the name. I doubt there are more than twelve people on the planet who care about this kind of parsing arcana. [Rick Jelliffe has upgraded the machine].
Lauren and I went to Australia in late 1996 to visit her mother and to get married, which we did on November 30th. Forty-eight hours later, Lauren twisted her knee badly enough that she was pretty well confined to a sofa for the rest of our Australian vacation.
So I broke out my computer and finished the work I’d already started on my XML processor, and decided to call it Lark for Lauren’s Right Knee.
Lark was a pure deterministic finite automaton (DFA) parser, with a little teeny state stack. Some of its transitions were labeled with named “events” that would provoke the parser to do something if, for example, it had just recognized a start tag or whatever.
DFA-driven parsers are a common enough design pattern, although I think Lark is the only example in the XML space. There are well-known parser generators such as yacc, GNU bison, and javacc, usually used in combination with lexical scanners such as flex so that you can write your grammar in terms of tokens not characters. Also, they handle LALR languages, so the parsing technique is quite a bit richer than a pure state machine.
I thought I had a better idea. The grammar of XML is simple enough, and the syntax characters few enough, that I thought I could just write down the state machine by hand. So that’s what I did, inventing a special-purpose DFA-description language for the purpose.
Then I had a file called Lark.jin which was really a Java program that used the state machine to parse XML. The transition “events” in the machine were mapped to case labels in a huge switch construct. Then there was a horrible, horrible Perl program that read the Lark.jin and the automaton, generated the DFA tables in Java syntax, inserted them into the code and produced Lark.java, which you actually compiled to make the parser.
So while Java doesn’t have a preprocessor, Lark did, which made quite a few things easier.
There were a lot of tricks; some of the state transitions weren’t on characters, they were on XML character classes such as NameChar and so on. This made the automaton easier to write, and in fact, to keep the class files small, the character-class transitions persisted into the Java form, and the real DFA was built at startup time. These days, quick startup might be more important than .class file size.
It was damn fast. James Clark managed to hand-craft a Java-language XML parser called XP that was a little faster than Lark, but he did that by clever I/O buffering, and I was determined to leapfrog him by improving my I/O.
This was before the time of standardized XML APIs, but Lark had a stream API that influenced SAX, and a DOM-like tree API; both worked just fine. Lark is one of very few parsers ever to have survived the billion laughs attack.
Lark was put into production in quite a few deployments, and the flow of bug reports slowed to a trickle. Then in 1998 I noticed that IBM and Microsoft and BEA and everyone else were building XML Processors, so I decided that it wasn’t worthwhile maintaining mine.
I never got around to teaching it namespaces, which means it wouldn’t be real useful today.
It had one serious bug that would have been real work to fix and since nobody ever encountered it in practice, I kept putting it off and never did. If you had an internal parsed entity reference in an attribute value and the replacement text included the attribute delimiter (' or "), it would scream and claim you had a busted XML document.
What happened was, Rick Jelliffe, who is a Good Person, was looking for a FSM for XML and I eventually noticed, and so I sent him mine.
There’s no reason whatsoever to keep it a secret: here it is. Be warned: it’s ugly.
Fortunately, there were only 227 states and 8732 transitions, so the state number fit into a byte; that and the associated event index pack into a short. To make things even tighter, the transitions were only keyed by characters up to 127, as in 7-bit ASCII. Characters higher than that can’t be XML syntax characters, so we’re only interested in whether they fall into classes like NameChar and NameStartChar and so on. A 64K byte[] array takes care of that, each byte having a class bitmask. As a result of all this jiggery-pokery, the DFA ends up, believe it or not, constituting a short[227][128].
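To make the packing concrete, here is a rough Python sketch of the layout just described. It is a guess at the shape, not Lark's code: which half of the 16 bits holds the state and which holds the event index, the bit values chosen for the class masks, and the sample state and event numbers are all assumptions made for illustration.

```python
STATE_BITS = 8   # 227 states fit in 8 bits
N_STATES = 227
N_ASCII = 128    # transitions keyed only by 7-bit characters

def pack(next_state, event_index):
    """Pack a next-state number and an event index into one 16-bit value.

    Assumed layout: state in the low byte, event index in the high byte.
    """
    if not (0 <= next_state < 256 and 0 <= event_index < 256):
        raise ValueError("state and event index must each fit in 8 bits")
    return (event_index << STATE_BITS) | next_state

def unpack(cell):
    """Split a 16-bit table cell back into (next_state, event_index)."""
    return cell & 0xFF, cell >> STATE_BITS

# Same shape as short[227][128]: one row per state, one column per ASCII char.
table = [[0] * N_ASCII for _ in range(N_STATES)]
table[5][ord(">")] = pack(12, 3)   # hypothetical state and event numbers

# One byte of class bits per 16-bit character, like the 64K byte[] array.
NAME_START = 0x01   # assumed bit assignments
NAME_CHAR = 0x02
char_class = bytearray(0x10000)
for code in range(ord("a"), ord("z") + 1):
    char_class[code] = NAME_START | NAME_CHAR
char_class[ord("-")] = NAME_CHAR               # allowed in a name, but not first
char_class[0x00E9] = NAME_START | NAME_CHAR    # é, a letter beyond ASCII

print(unpack(table[5][ord(">")]))
print(bool(char_class[ord("-")] & NAME_START))
```

A lookup is then two array indexes and a mask, which is a big part of why this kind of parser is quick.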
Here’s a typical chunk of the automaton:
1. # in Start tag GI
2. State StagGI BustedMarkup {in element type}
3. T $NameC StagGI
4. T $S InStag !EndGI
5. T > InDoc !EndGI !ReportSTag
6. T / EmptyClose !EndGI
This state, called StagGI, is the state where we’re actually reading the name of a tag, we got here by seeing a < followed by a NameStart character. Line 1 is a comment. In line 2 we name the state, and support error reporting, providing the name of another state to fall back into in case of error, and in the curly braces, some text to help build an error message. Line 3 says that if we see a valid XML Name character, we just stay in this state. Line 4 says that if we see an XML space character, we move to state InStag and process an EndGI event, which would stash the characters in the start tag. And so on.
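Here is a toy Python rendering of that one state, to show how the transitions and events fit together. It is a sketch under loud assumptions, not a port of Lark: the character classes are ASCII-only stand-ins for $NameC and $S, the function names are invented, and the caller is assumed to hand over the text that follows the opening <.

```python
import string

# ASCII-only stand-ins for the XML character classes $NameC and $S.
NAME_CHARS = set(string.ascii_letters + string.digits + ".-_:")
SPACES = set(" \t\r\n")

# The StagGI state from the chunk above: (test, next state, events fired).
STAG_GI = [
    (lambda c: c in NAME_CHARS, "StagGI", []),
    (lambda c: c in SPACES, "InStag", ["EndGI"]),
    (lambda c: c == ">", "InDoc", ["EndGI", "ReportSTag"]),
    (lambda c: c == "/", "EmptyClose", ["EndGI"]),
]
FALLBACK = "BustedMarkup"   # the error state named on line 2

def step(char):
    """Return (next_state, events) for one character seen in StagGI."""
    for matches, next_state, events in STAG_GI:
        if matches(char):
            return next_state, events
    return FALLBACK, []

def read_tag_name(text):
    """Consume name characters until the machine leaves StagGI.

    Returns (name, state reached, events fired by the final transition).
    """
    name = []
    for char in text:
        state, events = step(char)
        if state != "StagGI":
            return "".join(name), state, events
        name.append(char)
    return "".join(name), "StagGI", []

print(read_tag_name("para>"))
print(read_tag_name("br/>"))
```

In the real thing the tests are table lookups rather than lambdas, and the events are case labels in the big switch, but the control flow has the same outline.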
An early cut of Lark used String and StringBuffer objects to hold all the bits and pieces of the XML. This might be a viable strategy today, but in 1996’s Java it was painfully slow. So the code goes to heroic lengths to live in the land of character arrays at all times, making Strings only when a client program asks for one through the API. The performance difference was mind-boggling.
If you look at the automaton, and the Lark code, at least half—I’d bet three quarters—is there to deal with parsing the DTD and then dealing with entity wrangling. A whole bunch more is there to support DOM-building and walking.
I bet if I went through and simply removed support for anything coming out of the <!DOCTYPE>, including all entity processing, then discarded the DOM stuff, then added namespace support and SAX and StAX APIs, it would be less than half its current size. Then if I reworked the I/O, knowing what I know now and stealing some tricks that James Clark uses in expat, I bet it would be the fastest Java XML parser on the planet for XML docs without a DOCTYPE; by a wide margin. It’s hard to beat a DFA.
And it would still be fully XML 1.0 compliant. Because (snicker) this is Java, and your basic core Java now includes an XML parser, so I could simply instrument Larkette to buffer the prologue and if it saw a DOCTYPE with an internal subset, defer to Java’s built-in parser.
I’ll probably never do it. But the thought brings a smile to my face.
Last weekend, Lauren felt like cooking up home-made Easter eggs, so the shopping list included “chocolate chips (large bag)”. I was heading down the bulk-foods aisle and realized one of the vertical acrylic bins was full of them. Someone had been sloppy, and there was a little heap of chocolate chips on the shelf underneath it. For a second, I flashed into pure eight-year-old mode, thinking “Holy cow, there’s a whole bin full of chocolate chips, and more just lying there!” I popped a few in my mouth and they were excellent; semi-sweet, dark, strong, and firm. I was still in the state that Buddhists don’t mean when they say “Child’s Mind”, thinking “I can get as many as I want!” The list did say “large bag” after all, so I put a bag under the spout and gleefully jammed the lever all the way over. At home, Lauren said “You went overboard, a bit, didn’t you?” and now we have a plastic canister-full in the pantry which should last us into 2007. It’s a good feeling.
That would be my wife Lauren. After I b0rked our Win2K gamebox, I tried re-installing the OS and eventually reduced it to complete brick-ness, it recognized neither the video adapter nor the network card. So Lauren brushed me aside and started wrestling with the problem, and to make a long story short, it almost completely works again. At one point she seemed nearly infinite in her capabilities, sitting in front of the computer wrangling software updates while knitting baby stuff and looking up words in a German dictionary for the kid’s homework. Some of the German nouns and muttered curses at the Windows install sounded remarkably like each other. Why would anyone not marry a geek? The only problem is that Win2K won’t auto-switch resolutions to play games any more, it gets the frequency wrong and the LCD goes pear-shaped, you have to hand-select the frequency and switch into the right resolution first. LazyWeb?
Herewith two hideously ugly little shell scripts for use when Spotlight refuses to search your mail. Spotlight is a flawed v1.0 implementation of a really good idea and will, I’m sure, be debugged in a near-future release. [Update: The LazyWeb is educating me... these are moving targets.]
My problem is that whereas Mail.app will search my To/From/Subject lines (slowly, and with a really irritating GUI), the “Entire Message” option just doesn’t work, it returns instantly with no results. Yes, I’ve read the hints about making Spotlight re-index, but it just flatly refuses to work for me. Mind you, I have a lot of email, but still, it should at least try.
It turns out I had never really figured out the -print0 and -0 idioms that a lot of the shell-command stalwarts now have. Thanks to Malcolm Tredinnick for raising my consciousness.
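Why the NUL-separated idiom matters is easy to show outside the shell. The Python sketch below only mimics what the two ends of the pipe do with file names; the sample paths are made up, and the whitespace split is a simplification of how xargs reads its input without -0.

```python
# Made-up file names of the awkward kind that turn up in mail folders.
names = ["Inbox/1.emlx", "Sent Messages/2.emlx", "odd\nname/3.emlx"]

# What find -print0 emits: each name followed by a NUL byte.
nul_stream = "".join(name + "\0" for name in names)

# What plain find -print emits: each name followed by a newline.
newline_stream = "".join(name + "\n" for name in names)

# xargs -0 splits on NUL only, so every name survives intact.
via_nul = [name for name in nul_stream.split("\0") if name]

# Without -0, input is split on blanks and newlines, so names break apart.
via_whitespace = newline_stream.split()

print(via_nul == names)
print(via_whitespace)
```

Since NUL is the one character that can't appear in a path, it is the only safe separator.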
This lives in $HOME/bin under the name mailgrep:
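#!/bin/sh
# Case-insensitive fixed-string search of every IMAP message file.
# Quoting "$@" keeps a multi-word search string in one piece.
find "$HOME"/Library/Mail/IMAP* -name '*.emlx' -print0 | \
xargs -0 fgrep -i "$@"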
Isn’t xargs a funny command? I’ve discovered that it’s nearly impossible to describe what it does, and then why what it does is necessary, but there are just a whole bunch of places where you’d be lost without it.
This lives in $HOME/bin/mailview:
#!/bin/sh
# List the matching files NUL-separated (-l -Z, as in GNU grep), then open them.
find "$HOME"/Library/Mail/IMAP* -name '*.emlx' -print0 | \
xargs -0 fgrep -i -l -Z "$@" | \
xargs -0 open
The first cut of this dodged xargs and used an incredibly-inefficient and slow chain of -exec arguments to open the files one at a time with view (aka vim), to work around a well-known vim misfeature; it complained about the input not being a terminal and left my Terminal.app keystrokes borked. But Malcolm, confirming my belief in the broken-ness of vim, said “Oh, *that* ‘view’. I thought it was some sexy Mac ‘view my email’ app”. D’oh, of course; the magic OS X open command does just the right thing.
Erm, you might want to run mailgrep before you run mailview; I’m not sure what would happen if you asked OS X to open three or four thousand email messages at once.
Friday Slide Scan #28 is two Eighties florals, one interior, one exterior. With a confession.
First some spring flowers fallen from a tree, just as now in our front yard, at dusk.
I’m not sure what these are, but look at the light in the center. Rewards enlarging.
Here’s the confession. Sometimes on Fridays when I’m feeling kinda burned-out, I knock off work and do these slide scans in the office, because this is where I have the big screen. Blowing these pictures up to mega-huge, picking away at the old-slide crud and scanning artifacts, tinkering with the colour balance, and listening; I never play music while I’m writing or coding seriously, but I play it real loud while photo-editing. It’s all pretty well pure pleasure; you just can’t imagine how good that second one above looks at near-native size. It reconstitutes the part of my mind that I earn my living with; that’s my story and I’m sticking to it.
Images in the Friday Slide Scans are from 35mm slides taken between 1953 and 2003 by (in rough chronological order) Bill Bray, Jean Bray, Tim Bray, Cath Bray, and Lauren Wood; when I know exactly who took one, I’ll say; in this case, at least one is by Cath Bray. Most but not all of the slides were on Kodachrome; they were digitized using a Nikon CoolScan 4000 ED scanner and cleaned up by a combination of the Nikon scanning software and PhotoShop Elements.
Three pictures around Vancouver; one of a fresh green springtime tree, two of rotten old buildings being torn down.
There’s nothing quite as fresh as just-sprouted deciduous leaves; another few weeks and this tree will be just a tree.
I have a thing about demolition. The first is a rotten dingy old one-story on Main Street near 23rd, the second is an unlovely grey mid-rise being torn down to build still more condos at Homer and Helmcken.
Michael J. Totten is a journalist and blogger who’s back and forth to the Middle East and writes about it, quite well in my opinion; he supports this by freelancing and with his blog’s tip jar. He gets lots of link love from the right-wing blogosphere, which is puzzling because Totten is balanced and clear-eyed and doesn’t seem to have any particular axe to grind. Recently, he and a friend were having fun in Istanbul and, on a random drive out into the country, decided on impulse to keep going, all the way across Turkey and into Iraq; into the Kurdish mini-state in Iraq’s north, to be precise. It makes a heck of a story, with lots of pictures, in six parts: I, II, III, IV, V, and VI.
James Governor relays a question that sounds important but I think is actively dangerous: do AJAX apps present more of a server-side load? The question is dangerous because it’s meaningless and unanswerable. Your typical Web page will, in the process of loading, call back to the server for a bunch of stylesheets and graphics and scripts and so on: for example, this ongoing page calls out to three different graphics, one stylesheet, and one JavaScript file. It also has one “AJAXy” XMLHttpRequest call. From the server’s point of view, those are all just requests to dereference one URI or another. In the case of ongoing, the AJAX request is for a static file less than 200 bytes in size (i.e. cheap). On the other hand, it could have been for something that required a complex outer join on two ten-million-row tables (i.e. very expensive). And one of the virtues of the Web Architecture is that it hides those differences, the “U” in URI stands for “Uniform”, it’s a Uniform interface to a resource on the Web that could be, well, anything. So saying “AJAX is expensive” (or that it’s cheap) is like saying “A mountain bike is slower than a battle tank” (or that it’s faster). The truth depends on what you’re doing with it. In the case of web sites, it depends on how many fetches you do and where you have to go to get the data to satisfy them. ongoing is a pretty quick web site, even though it runs on a fairly modest server, but that has nothing to do with AJAX-or-not; it’s because of the particular way I’ve set up the Web resources that make the pages here. I’ve argued elsewhere that AJAX can be a performance win, system-wide; but that argument too is contingent on context, lots of context.
Graham McMynn is a teenager who was kidnapped in Vancouver on April 4th and freed, in a large, noisy, and newsworthy police operation, on April 12th. Hao Wu is a Chinese film-maker and blogger who was kidnapped in Beijing on February 22nd in a small, quiet police operation not intended to be newsworthy, and who has not been freed. Read about it here, here, and here. Making noise about it might influence the government of China to moderate its actions against Mr. Wu, and can’t do any harm. Mr. McMynn’s kidnappers were a gaggle of small-time hoodlums, one of whom was out on bail while awaiting trial for another kidnapping (!). Mr. Wu’s were police. In a civilized country, the function of the police force is to deter such people and arrest them. A nation where they are the same people? Nobody could call it “civilized”.