Roadmap
- Move all the fetchers etc. into pixman-image to make pixman-compose.c
less intimidating.
DONE
- Make combiners for unified alpha take a mask argument. That way
we won't need two separate paths for unified vs component in the
general compositing code.
DONE, except that the Altivec code needs to be updated. Luca is
looking into that.
- Delete separate 'unified alpha' path
DONE
- Split images into their own files
DONE
- Split the gradient walker code out into its own file
DONE
- Add scanline getters per image
DONE
- Generic 64 bit fetcher
DONE
- Split fast path tables into their respective architecture-dependent
files.
See "Render Algorithm" below for rationale
Images will eventually have these virtual functions (a rough C sketch follows the list):
get_scanline()
get_scanline_wide()
get_pixel()
get_pixel_wide()
get_untransformed_pixel()
get_untransformed_pixel_wide()
get_unfiltered_pixel()
get_unfiltered_pixel_wide()
store_scanline()
store_scanline_wide()
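Something like this, with illustrative names and signatures only (not a final API):

    #include <stdint.h>

    typedef struct image image_t;

    typedef void     (* get_scanline_t)      (image_t *image, int x, int y,
                                              int width, uint32_t *buffer);
    typedef void     (* get_scanline_wide_t) (image_t *image, int x, int y,
                                              int width, uint64_t *buffer);
    typedef uint32_t (* get_pixel_t)         (image_t *image, int x, int y);
    typedef void     (* store_scanline_t)    (image_t *image, int x, int y,
                                              int width, const uint32_t *values);

    struct image
    {
        /* common fields: type, transform, repeat, filter, clips, ... */
        get_scanline_t       get_scanline;
        get_scanline_wide_t  get_scanline_wide;
        get_pixel_t          get_pixel;
        store_scanline_t     store_scanline;
        /* plus the unfiltered/untransformed/wide variants and
         * store_scanline_wide(), following the list above
         */
    };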
1.
Initially we will just have get_scanline() and get_scanline_wide();
these will be based on the ones in pixman-compose. Hopefully this will
reduce the complexity in pixman_composite_rect_general().
Note that there are access considerations - the compose function is
being compiled twice.
2.
Split image types into their own source files. Export a no-op virtual
reinit() call. Call this whenever a property of the image changes.
3.
Split the get_scanline() call into smaller functions that are
initialized by the reinit() call.
The Render Algorithm:
(first repeat, then filter, then transform, then clip)
Starting from a destination pixel (x, y), do
1 x = x - xDst + xSrc
y = y - yDst + ySrc
2 reject pixel that is outside the clip
This treats clipping as something that happens after
transformation, which I think is correct for client clips. For
hierarchy clips it is wrong, but who really cares? Without
GraphicsExposes hierarchy clips are basically irrelevant. Yes,
you could imagine cases where the pixels of a subwindow of a
redirected, transformed window should be treated as
transparent. I don't really care.
Basically, I think the render spec should say that pixels that
are unavailable due to the hierarchy have undefined content,
and that GraphicsExposes are not generated. Ie., basically
that using non-redirected windows as sources is broken. This is
at least consistent with the current implementation and we can
update the spec later if someone makes it work.
The implication for render is that it should stop passing the
hierarchy clip to pixman. In pixman, if a source image has a
clip it should be used in computing the composite region and
nowhere else, regardless of what "has_client_clip" says. The
default should be to have no clip at all.
I would really like to get rid of the client clip as well for
source images, but unfortunately there is at least one
application in the wild that uses them.
3 Transform pixel: (x, y) = T(x, y)
4 Call p = GetUntransformedPixel (x, y)
5 If the image has an alpha map, then
Call GetUntransformedPixel (x, y) on the alpha map
add resulting alpha channel to p
return p
Where GetUntransformedPixel is:
6 switch (filter)
{
case NEAREST:
return GetUnfilteredPixel (x, y);
break;
case BILINEAR:
return GetUnfilteredPixel (...) // 4 times
break;
case CONVOLUTION:
return GetUnfilteredPixel (...) // as many times as necessary.
break;
}
Where GetUnfilteredPixel (x, y) is
7 switch (repeat)
{
case REPEAT_NORMAL:
case REPEAT_PAD:
case REPEAT_REFLECT:
// adjust x, y as appropriate
break;
case REPEAT_NONE:
if (x, y) is outside image bounds
return 0;
break;
}
return GetRawPixel(x, y)
Where GetRawPixel (x, y) is
8 Compute the pixel in question, depending on image type.
For gradients, repeat has a totally different meaning, so
UnfilteredPixel() and RawPixel() must be the same function so that
gradients can do their own repeat algorithm.
So, GetRawPixel
for bits must deal with repeats
for gradients must deal with repeats (differently)
for solids, should ignore repeats.
for polygons, when we add them, either ignore repeats or do
something similar to bits (in which case, we may want an extra
layer of indirection to modify the coordinates).
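As an illustration of the repeat handling in step 7, a bits image could remap a coordinate roughly like this (a sketch only; the real code also has to work in fixed point and per axis):

    /* Remap one coordinate into [0, size) according to the repeat mode.
     * For REPEAT_NONE the caller checks bounds and returns 0 instead.
     * (Enum names are shorthand for the pixman repeat values.)
     */
    typedef enum { REPEAT_NONE, REPEAT_NORMAL, REPEAT_PAD, REPEAT_REFLECT } repeat_t;

    static int
    repeat_coord (int coord, int size, repeat_t repeat)
    {
        switch (repeat)
        {
        case REPEAT_NORMAL:
            coord %= size;
            if (coord < 0)
                coord += size;
            break;
        case REPEAT_PAD:
            if (coord < 0)
                coord = 0;
            else if (coord >= size)
                coord = size - 1;
            break;
        case REPEAT_REFLECT:
            coord %= 2 * size;
            if (coord < 0)
                coord += 2 * size;
            if (coord >= size)
                coord = 2 * size - coord - 1;
            break;
        case REPEAT_NONE:
        default:
            break;
        }
        return coord;
    }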
It is then possible to build things like "get scanline" or "get tile" on
top of this. In the simplest case, just repeatedly calling GetPixel()
would work, but specialized get_scanline()s or get_tile()s could be
plugged in for common cases.
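For example, the generic fallback could be nothing more than a loop over the pixel fetcher (using the image_t/get_pixel sketch from above):

    /* Simplest possible scanline fetcher: one virtual call per pixel.
     * Specialized versions would be plugged in for common cases.
     */
    static void
    generic_get_scanline (image_t *image, int x, int y,
                          int width, uint32_t *buffer)
    {
        int i;

        for (i = 0; i < width; ++i)
            buffer[i] = image->get_pixel (image, x + i, y);
    }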
By not plugging anything in for images with access functions, we only
have to compile the pixel functions twice, not the scanline functions.
And we can get rid of fetchers for the bizarre formats that no one
uses, such as b2g3r3, etc. r1g2b1? Seriously? It is also worth
considering a generic, format-based pixel fetcher for these edge cases.
Since the actual routines depend on the image attributes, the images
must be notified when those change and update their function pointers
appropriately. So there should probably be a virtual function called
(* reinit) or something like that.
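A minimal sketch of that notification pattern, reusing the image struct sketched earlier; the field and helper names here (reinit, filter, FILTER_NEAREST, bits_get_scanline_untransformed, transform_t) are made up for illustration:

    /* Property setters funnel through the per-type reinit(), which
     * re-selects the cheapest fetchers for the current attributes.
     */
    static void
    bits_image_reinit (image_t *image)
    {
        if (!image->transform && image->filter == FILTER_NEAREST)
            image->get_scanline = bits_get_scanline_untransformed;
        else
            image->get_scanline = generic_get_scanline;
    }

    void
    image_set_transform (image_t *image, const transform_t *transform)
    {
        image->transform = transform;
        image->reinit (image);      /* virtual, one per image type */
    }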
There will also be wide fetchers for both pixels and lines. The line
fetcher will just call the wide pixel fetcher. The wide pixel fetcher
will just call expand, except for 10 bit formats.
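For 8-bit-per-channel formats that expansion is just byte replication into both halves of a 16-bit channel; a sketch (illustrative, not the actual pixman code):

    #include <stdint.h>

    /* Expand one a8r8g8b8 pixel to 16 bits per channel by replicating
     * each byte, so 0x00 -> 0x0000 and 0xff -> 0xffff.
     */
    static uint64_t
    expand_8888 (uint32_t p)
    {
        uint64_t a = (p >> 24) & 0xff;
        uint64_t r = (p >> 16) & 0xff;
        uint64_t g = (p >>  8) & 0xff;
        uint64_t b =  p        & 0xff;

        a |= a << 8;  r |= r << 8;  g |= g << 8;  b |= b << 8;

        return (a << 48) | (r << 32) | (g << 16) | b;
    }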
Rendering pipeline:
Drawable:
0. if (picture has alpha map)
0.1. Position alpha map according to the alpha_x/alpha_y
0.2. Where the two drawables intersect, replace the alpha
channel of the source with the one from the alpha map.
Replacement only takes place in the intersection of the
two drawables' geometries.
1. Repeat the drawable according to the repeat attribute
2. Reconstruct a continuous image according to the filter
3. Transform according to the transform attribute
4. Position image such that src_x, src_y is over dst_x, dst_y
5. Sample once per destination pixel
6. Clip. If a pixel is not within the source clip, then no
compositing takes place at that pixel. (Ie., it's *not*
treated as 0).
Sampling a drawable:
- If the image does not have an alpha channel, the pixels in it
are treated as opaque.
Note on reconstruction:
- The top left pixel has coordinates (0.5, 0.5) and pixels are
spaced 1 apart.
Gradient:
1. Unless gradient type is conical, repeat the underlying (0, 1)
gradient according to the repeat attribute
2. Integrate the gradient across the plane according to type.
3. Transform according to transform attribute
4. Position gradient
5. Sample once per destination pixel.
6. Clip
Solid Fill:
1. Repeat has no effect
2. Image is already continuous and defined for the entire plane
3. Transform has no effect
4. Positioning has no effect
5. Sample once per destination pixel.
6. Clip
Polygon:
1. Repeat has no effect
2. Image is already continuous and defined on the whole plane
3. Transform according to transform attribute
4. Position image
5. Supersample 15x17 per destination pixel.
6. Clip
Possibly interesting additions:
- More general transformations, such as warping, or general
shading.
- Shader image where a function is called to generate the
pixel (ie., uploading assembly code).
- Resampling kernels
In principle the polygon image uses a 15x17 box filter for
resampling. If we allow general resampling filters, then we
get all the various antialiasing types for free.
Bilinear downsampling looks terrible and could be much
improved by a resampling filter. NEAREST reconstruction
combined with a box resampling filter is what GdkPixbuf
does, I believe.
Useful for high frequency gradients as well.
(Note that the difference between a reconstruction and a
resampling filter is mainly where in the pipeline they
occur. High quality resampling should use a correctly
oriented kernel so it should happen after transformation.
An implementation can transform the resampling kernel and
convolve it with the reconstruction if it so desires, but it
will need to deal with the fact that the resampling kernel
will not necessarily be pixel aligned.)
"Output kernels"
One could imagine doing the resampling after compositing,
ie., for each destination pixel sample each source image 16
times, then composite those subpixels individually, then
finally apply a kernel.
However, this is effectively the same as full screen
antialiasing, which is a simpler way to think about it. So
resampling kernels may make sense for individual images, but
not as a post-compositing step.
Fullscreen AA is inefficient without chained compositing
though. Consider an (image scaled up to oversample size IN
some polygon) scaled down to screen size. With the current
implementation, there will be a huge temporary. With chained
compositing, the whole thing ends up being equivalent to the
output kernel from above.
- Color space conversion
The complete model here is that each surface has a color
space associated with it and that the compositing operation
also has one associated with it. Note also that gradients
should have associated colorspaces.
- Dithering
If people dither something that is already dithered, it will
look terrible, but don't do that, then. (Dithering happens
after resampling if at all - what is the relationship
with color spaces? Presumably dithering should happen in linear
intensity space).
- Floating point surfaces, 16, 32 and possibly 64 bit per
channel.
Maybe crack:
- Glyph polygons
If glyphs could be given as polygons, they could be
positioned and rasterized more accurately. The glyph
structure would need subpixel positioning though.
- Luminance vs. coverage for the alpha channel
Whether the alpha channel should be interpreted as luminance
modulation or as coverage (intensity modulation). This is a
bit of a departure from the rendering model though. It could
also be considered whether it should be possible to have
both channels in the same drawable.
- Alternative for component alpha
- Set component-alpha on the output image.
- This means each of the components are sampled
independently and composited in the corresponding
channel only.
- Have 3 x oversampled mask
- Scale it down by 3 horizontally, with [ 1/3, 1/3, 1/3 ]
resampling filter.
Is this equivalent to just using a component alpha mask?
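For reference, the horizontal downscale step in that scheme is just a box filter over groups of three mask samples; a sketch, assuming an a8 mask three times wider than the destination:

    #include <stdint.h>

    /* Collapse a 3x horizontally oversampled a8 mask row with the
     * [ 1/3, 1/3, 1/3 ] filter.  Sketch only.
     */
    static void
    downscale_mask_row_3x (const uint8_t *src, uint8_t *dst, int dst_width)
    {
        int i;

        for (i = 0; i < dst_width; ++i)
        {
            int sum = src[3 * i] + src[3 * i + 1] + src[3 * i + 2];

            dst[i] = (uint8_t) (sum / 3);
        }
    }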
Incompatible changes:
- Gradients could be specified with premultiplied colors. (You
can use a mask to get things like gradients from solid red to
transparent red.)
Refactoring pixman
The pixman code is not particularly nice, to put it mildly. Among the
issues are
- inconsistent naming style (fb vs Fb, camelCase vs
underscore_naming). Sometimes there is even inconsistency *within*
one name.
fetchProc32 ACCESS(pixman_fetchProcForPicture32)
may be one of the ugliest names ever created.
- coding style:
use the one from cairo except that pixman uses this brace style:
while (blah)
{
}
Format do/while like this:
do
{
}
while (...);
- PIXMAN_COMPOSITE_RECT_GENERAL() is horribly complex
- switch/case logic in pixman-access.c
Instead it would be better to just store function pointers in the
image objects themselves:
get_pixel()
get_scanline()
- Much of the scanline fetching code is for formats that no one
ever uses. a2r2g2b2 anyone?
It would probably be worthwhile having a generic fetcher for any
pixman format whatsoever.
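Such a fetcher could be driven entirely by per-channel shift/width values derived from the format code; a sketch, with a made-up descriptor struct:

    #include <stdint.h>

    /* Hypothetical per-format descriptor: shift and bit width of each
     * channel.  In practice this could be derived from the PIXMAN_FORMAT
     * code itself.
     */
    typedef struct
    {
        int a_shift, a_bits;
        int r_shift, r_bits;
        int g_shift, g_bits;
        int b_shift, b_bits;
    } format_info_t;

    /* Widen an n-bit channel value to 8 bits by bit replication. */
    static uint32_t
    expand_channel (uint32_t v, int n)
    {
        uint32_t result;
        int      filled;

        if (n == 0)
            return 0;

        result = v << (8 - n);
        for (filled = n; filled < 8; filled *= 2)
            result |= result >> filled;

        return result;
    }

    /* Convert one raw pixel of an arbitrary format to a8r8g8b8. */
    static uint32_t
    fetch_generic (uint32_t pixel, const format_info_t *f)
    {
        uint32_t a = expand_channel ((pixel >> f->a_shift) & ((1u << f->a_bits) - 1), f->a_bits);
        uint32_t r = expand_channel ((pixel >> f->r_shift) & ((1u << f->r_bits) - 1), f->r_bits);
        uint32_t g = expand_channel ((pixel >> f->g_shift) & ((1u << f->g_bits) - 1), f->g_bits);
        uint32_t b = expand_channel ((pixel >> f->b_shift) & ((1u << f->b_bits) - 1), f->b_bits);

        if (f->a_bits == 0)
            a = 0xff;           /* formats without alpha are opaque */

        return (a << 24) | (r << 16) | (g << 8) | b;
    }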
- Code related to particular image types should be split into individual
files.
pixman-bits-image.c
pixman-linear-gradient-image.c
pixman-radial-gradient-image.c
pixman-solid-image.c
- Fast path code should be split into files based on architecture:
pixman-mmx-fastpath.c
pixman-sse2-fastpath.c
pixman-c-fastpath.c
etc.
Each of these files should then export a fastpath table, which would
be declared in pixman-private.h. This should allow us to get rid
of the pixman-mmx.h files.
The fast path table should describe each fast path. Ie., there should
be bitfields indicating what things the fast path can handle, rather
than, as now, only allowing one format per src/mask/dest. Ie.,
{
FAST_a8r8g8b8 | FAST_x8r8g8b8,
FAST_null,
FAST_x8r8g8b8,
FAST_repeat_normal | FAST_repeat_none,
the_fast_path
}
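One possible shape for such an entry, with capability flag words instead of single format codes (the flag names and fields below are illustrative, not an agreed-on API):

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical capability flags; one bit per supported format/repeat. */
    #define FAST_null           (1u << 0)
    #define FAST_a8r8g8b8       (1u << 1)
    #define FAST_x8r8g8b8       (1u << 2)
    #define FAST_repeat_none    (1u << 16)
    #define FAST_repeat_normal  (1u << 17)

    typedef struct
    {
        int       op;               /* composite operator */
        uint32_t  src_flags;
        uint32_t  mask_flags;
        uint32_t  dest_flags;
        uint32_t  repeat_flags;
        void    (* func) ();        /* the fast path itself */
    } fast_path_t;

    /* Matching becomes a flag test per operand rather than an exact
     * format comparison, so one entry can cover several cases.
     */
    static const fast_path_t *
    lookup_fast_path (const fast_path_t *table, int n_paths, int op,
                      uint32_t src, uint32_t mask, uint32_t dest,
                      uint32_t repeat)
    {
        int i;

        for (i = 0; i < n_paths; ++i)
        {
            const fast_path_t *p = &table[i];

            if (p->op == op                &&
                (p->src_flags    & src)    &&
                (p->mask_flags   & mask)   &&
                (p->dest_flags   & dest)   &&
                (p->repeat_flags & repeat))
                return p;
        }
        return NULL;                /* fall back to the general path */
    }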
There should then be *one* file that implements pixman_image_composite().
It should do this:
optimize_operator();
convert 1x1 repeat to solid (actually this should be done at
image creation time).
is there a useful fastpath?
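In outline, and building on the table/lookup sketch above (all helper names here - optimize_operator, maybe_convert_to_solid, run_fast_path, composite_general, fast_path_table - are made up):

    /* Sketch of the single pixman_image_composite() entry point. */
    void
    pixman_image_composite (int op,
                            image_t *src, image_t *mask, image_t *dest,
                            int16_t src_x, int16_t src_y,
                            int16_t mask_x, int16_t mask_y,
                            int16_t dest_x, int16_t dest_y,
                            uint16_t width, uint16_t height)
    {
        const fast_path_t *path;

        op = optimize_operator (op, src, mask, dest);

        /* 1x1 NORMAL-repeat bits images behave like solids; ideally this
         * conversion happens once, at image creation time.
         */
        src = maybe_convert_to_solid (src);

        path = lookup_fast_path (fast_path_table, n_fast_paths, op,
                                 src->flags,
                                 mask ? mask->flags : FAST_null,
                                 dest->flags, src->repeat_flags);

        if (path)
            run_fast_path (path, op, src, mask, dest,
                           src_x, src_y, mask_x, mask_y,
                           dest_x, dest_y, width, height);
        else
            composite_general (op, src, mask, dest,
                               src_x, src_y, mask_x, mask_y,
                               dest_x, dest_y, width, height);
    }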
There should be a file called pixman-cpu.c that contains all the
architecture specific stuff to detect what CPU features we have.
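On x86 with GCC or Clang, the detection itself can be a few lines (a sketch; real code also needs paths for other compilers and for ARM/PPC):

    typedef enum
    {
        IMPL_C,
        IMPL_MMX,
        IMPL_SSE2
    } impl_t;

    static impl_t
    detect_cpu_features (void)
    {
    #if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
        /* __builtin_cpu_supports() is available in GCC 4.8+ and Clang. */
        __builtin_cpu_init ();
        if (__builtin_cpu_supports ("sse2"))
            return IMPL_SSE2;
        if (__builtin_cpu_supports ("mmx"))
            return IMPL_MMX;
    #endif
        return IMPL_C;
    }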
Issues that must be kept in mind:
- we need accessor code to be preserved
- maybe there should be a "store_scanline" too?
Is this sufficient?
We should preserve the optimization where the
compositing happens directly in the destination
whenever possible.
- It should be possible to create GPU samplers from the
images.
The "horizontal" classification should be a bit in the image, the
"vertical" classification should just happen inside the gradient
file. Note though that
(a) these will change if the transformation/repeat changes.
(b) at the moment the optimization for linear gradients
takes the source rectangle into account. Presumably
this is to also optimize the case where the gradient
is close enough to horizontal?
Who is responsible for repeats? In principle it should be the scanline
fetch. Right now NORMAL repeats are handled by walk_composite_region()
while other repeats are handled by the scanline code.
(Random note on filtering: do you filter before or after
transformation? Hardware is going to filter after transformation;
this is also what pixman does currently.) It's not completely clear
what filtering *after* transformation means. One thing that might look
good would be to do *supersampling*, ie., compute multiple subpixels
per destination pixel, then average them together.
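A sketch of that idea, averaging an n x n grid of subpixel samples per destination pixel (sample_source() is a made-up stand-in for the post-transform source lookup):

    #include <stdint.h>

    extern uint32_t sample_source (double x, double y);

    /* Average n x n subsamples inside destination pixel (x, y),
     * per channel, for a8r8g8b8 values.
     */
    static uint32_t
    supersample_pixel (int x, int y, int n)
    {
        unsigned a = 0, r = 0, g = 0, b = 0;
        int      i, j, count = n * n;

        for (j = 0; j < n; ++j)
        {
            for (i = 0; i < n; ++i)
            {
                double   sx = x + (i + 0.5) / n;
                double   sy = y + (j + 0.5) / n;
                uint32_t p  = sample_source (sx, sy);

                a += (p >> 24) & 0xff;
                r += (p >> 16) & 0xff;
                g += (p >>  8) & 0xff;
                b +=  p        & 0xff;
            }
        }
        return ((a / count) << 24) | ((r / count) << 16) |
               ((g / count) <<  8) |  (b / count);
    }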