May 12, 2008
David Miller: gdb bugs...
So I spent the weekend fishing out some gdb bugs on sparc. Every time I think I understand and know how all of this stuff works I get thrown some new surprises. This time was no exception.
The kernel has all of these neat features, via ptrace()'s PTRACE_SETOPTIONS, that allow a debugger to get notifications when a process forks, vforks, does an exec, etc. When these events occur in the inferior process, it does a ptrace_notify() which sends a SIGTRAP to the process with the event encoded into the siginfo exit code. As a result, when the process tries to return back into userspace, it'll do signal processing. As part of that, it will invoke ptrace_stop() which sleeps the task and wakes up the debugger parent so it can examine the event and process state.
The debugger has several choices of what to do. For many cases it will do a ptrace() PTRACE_CONT with a code indicating how the process should continue. Another thing the debugger can do is decide that it's no longer interested in tracing this task, and therefore it does a ptrace() PTRACE_DETACH. This is part of the first bug.
A long time ago we picked the values for PTRACE_this and PTRACE_that on sparc. Some of them mirrored the SunOS values. One of those was PTRACE_DETACH. We unconditionally recognized the SunOS PTRACE_DETACH value, even for Linux processes. Unfortunately, this is the value that also ended up in the sys/ptrace.h Linux header file. So that's the value every Linux application ended up using too. I yanked out the SunOS PTRACE_foo call support long ago, and it's amazing how much works without a properly functioning PTRACE_DETACH. Putting in a compat translation from the SunOS to the intended Linux value for PTRACE_DETACH in the Sparc ptrace code solved this bug.
Which brings us to the next bug... What brought me down this path in the first place was examining why running gdb under itself didn't work properly.
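As a rough illustration of the debugger side of the event notifications described above, here is a minimal sketch, assuming a Linux system with glibc's sys/ptrace.h. The helper names enable_fork_exec_events and ptrace_event_from_status are made up for this example and are not taken from gdb. The status decoding follows the convention documented in ptrace(2): an event stop is reported to waitpid() as a SIGTRAP stop with the event number stored in the bits above the signal.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: ask the kernel to report fork, vfork and exec
 * events for an already-traced, stopped task. Returns ptrace()'s result. */
long enable_fork_exec_events(pid_t pid)
{
    long opts = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEEXEC;

    assert(pid > 0);
    return ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)opts);
}

/* Hypothetical helper: given a waitpid() status, return the PTRACE_EVENT_*
 * code it carries, or 0 if this is not a ptrace event stop. */
int ptrace_event_from_status(int status)
{
    if (!WIFSTOPPED(status))
        return 0;               /* exited or killed, not stopped */
    if (WSTOPSIG(status) != SIGTRAP)
        return 0;               /* stopped by some other signal */
    return (status >> 16) & 0xffff;
}
```

A debugger loop would call waitpid() on the inferior, pass the status to ptrace_event_from_status(), and compare the result against PTRACE_EVENT_FORK, PTRACE_EVENT_VFORK or PTRACE_EVENT_EXEC before deciding between PTRACE_CONT and PTRACE_DETACH.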
Running gdb under gdb is the kind of game that is always fun:
bash$ ./gdb ./gdb
(top-gdb) set args test
(top-gdb) run
(gdb) run
Hello World!
...
This would hang when the inner-gdb tries to run the test program, and I had to figure out why. After lots of tracing I found that the inner-gdb was hanging in a sigpause() call. GDB uses sigpause() to wait for SIGCHLD events when it is simply waiting for running inferiors to ptrace_stop() or take some other kind of signal.
In comes the issue of system call restart. Handling this wrt. debuggers is not easy. One nice feature of debuggers like GDB is that you can ask them at any point in time to call some function in the program they are running.
(gdb) p printf("hi\n")
hi
$1 = 2
(gdb)
and after calling the function it completely restores the process state to
what it was before the call. How does it implement this?
First it saves the process state, mostly this comprises the registers. Next it allocates some stack space for the call, pushes the arguments for the function call onto the stack and in registers. Next, it makes the function return address point to a breakpoint it can uniquely recognize. Finally, it sets the program counter to point at the function to call. When the function returns it hits the breakpoint, and the debugger restores all of the saved state it stored away before the special call. Now, back to system call restart. When a signal interrupts a system call, it can return immediately to process the signal. Internally to the kernel system calls return special error codes to let the signal dispatch know what to do with the system call that was in progress. It can say that -EINTR should be returned and the system call completed. But it can also say that the program counter should be rewound to the system call trap, the argument registers re-setup with their original values, and the system call thus replayed. In the example above, imagine what happens if the debugger calls an inferior function at the time that the signal is dispatched. Somehow, in order to restore the state properly after the call, this "we're in a syscall and should do syscall restarting" state has to get saved and restored too. Long ago I had this clever idea wherein I tried to solve this problem entirely inside of the kernel to shield debuggers from having to deal with it. The idea was that we'd modify the process state and perform the system call restart operations before we stopped the program to let the debugger see the state. Although great in theory, in practice it's an unworkable solution. We don't know what the debugger is going to do with the process. As I stated earlier it can pass along the signal to the process, or it can cancel the signal delivery altogether. 
The debugger's choice influences whether we should do system call restart or not, but we had already pre-committed that state and let the debugger see it. We can't know what the debugger is going to do ahead of time, therefore it is impossible to do the right thing. This is what was causing the inner debugger to hang.
The inner gdb is receiving a SIGCHLD because the 'test' program it is debugging has hit ptrace_stop(). The top-level gdb looks at this and says "ok, let's just let the inner gdb see the signal, PTRACE_CONT." But my funny in-kernel ptrace syscall restart code had already set things up for a syscall restart of sigpause(), when what should have happened was a return of -EINTR. The inner gdb is now wedged forever: it missed the SIGCHLD, and the debugged process is sleeping, waiting to be woken up by the inner gdb.
So I ripped out all of this silly code, and ended up doing what powerpc, x86, and other platforms do. I added a piece of software binary state (an unused bit in one of the processor state registers) that gdb can control. It is the "we're in a system call" state. When the debugger does an inferior call, it clears this bit when changing the program counter register. This forces the kernel to not do system call restart processing when the task wakes out of ptrace_stop(). But later, when gdb restores all of the register state after the call, that special bit will be restored, and we'll do the right thing as we deliver the original signal and subsequently do syscall restart processing as needed.
It's always fun to find land mines like these ones.
May 08, 2008
Harald Welte: Victory: Skype withdraws appeals case, judgement from lower court accepted
The court hearing in the "Welte vs. Skype Technologies SA" case went pretty well. Initially the court again suggested that the two parties might reach some form of amicable agreement. We indicated that this had been discussed before and that we're not interested in settling for anything less than full GPL compliance.
The various arguments by Skype supporting their claim that the GPL is violating German anti-trust legislation, as well as further claims aiming at the GPL being invalid or incompatible with German legislation, were not further analyzed by the court. The court stated that there were not enough arguments and material brought forward by Skype to support such a claim. And even if there was some truth to that, then Skype would not be able to still claim usage rights under that very same license.
The lawyer representing Skype still continued to argue for a bit in that direction, which resulted in one of the judges making up an interesting analogy, something like: "If a publisher wants to publish a book of an author that wants his book only to be published in a green envelope, then that might seem odd to you, but still you will have to do it as long as you want to publish the book and have no other agreement in place".
In the end, the court hinted twice that if it were to rule on the case, Skype would not have very good chances. After a short break, Skype decided to revoke their appeals case and accept the previous judgement of the lower court (Landgericht Muenchen I, the decision was in my favor) as the final judgement.
This means that the previous court decision is legally binding on Skype, and we have successfully won what has probably been the most lengthy and time consuming case so far.
May 07, 2008
Harald Welte: Back from the trip to Taiwan
It's been some time since my last blog post, mainly because I've been quite busy in Taiwan. First there was the conference, then there were a number of meetings with various companies to educate them about GPL licensing and how to interoperate with the FOSS community for better hardware/driver support.
The other part was actual spare time. I spent many months in Taipei during my work for OpenMoko, but I never really had much time to explore the city, or even other parts of the country. This time I explored quite a bit of the Taipei nightlife, visiting places like Luxy, Lava, Room18, Barcode, ageha, and even the so-called "meat market" of Carnegies and Tavern. I've also had time to try one of the many hot spas of Taipei in Beitou, as well as a really great motorbike trip to the national forest in the Wulai mountain region. Unfortunately the weather wasn't that great, so I had to postpone my plans to visit the northeastern and the eastern coast to some future trip.
And the most interesting part is: I actually made contact with Taiwanese people who are not in any way related to work :)
Further Taipei exploration brought me to the Wufenpu fashion wholesale area, as well as Ximending. Most impressive is also the "Taipei underworld", i.e. the various underground shopping malls near Taipei Main Station, such as the Taipei City Mall, Station Front Mall and ZhongShen Mall I and II. You can literally walk for many kilometers underground...
Now I am in Frankfurt for one day, tomorrow in Munich for one day, Friday half a day at home, and then there will be four days of music festival at WGT 2008.
Harald Welte: Tomorrow: Court hearing in Welte vs. Skype GPL case
Tomorrow at 10:30am at the Oberlandesgericht Muenchen (higher regional court of Munich) there will be an oral hearing in the "Welte vs. Skype Technologies SA" case. The hearing is to be held in room E.06.
This case is about a GPL violation by Skype, related to their sales of Wifi Skype phones based on the Linux operating system kernel. I've been fighting, as part of the gpl-violations.org project, to enforce the GPL against Skype since February 2007. Initially Skype didn't respond, so we applied for a preliminary injunction. That injunction was granted by the court in June 2007, but Skype chose to file an appeals case against it. The court hearing tomorrow is to debate exactly this appeal.
Interestingly, Skype is arguing against the validity of the GPL as a whole, asserting that it is violating anti-trust regulation and similarly strange claims.
Patrick McHardy: Summer office
The weather has been great the past days, so I set up my summer workplace :)
Working outside is really pleasant after a month of almost constantly grey sky. Below the balcony there's a small stream, and hundreds of birds sit in the trees and sing, which makes an amazing scenery.
I sent out a first batch of HIFN fixes today to avoid causing too many conflicts in the series in case something turns up during review. Caught a good time during which both Evgeniy and Herbert were responsive and it only took about an hour to get all patches reviewed, fix a minor bug and get them merged. The remaining ones are hopefully in shape by tomorrow, the descriptor accounting still needs a bit more work. Herbert also merged some patches from Loc Ho today for async hashing support, which is cool because I already started adding hashing support to the HIFN driver until I noticed the CryptoAPI doesn't support it asynchronously yet :)
Also sent out a few netfilter patches and fixed a slightly embarrassing bug in the macvlan driver. It would crash the kernel on module unload because cleanup was performed incorrectly, causing the kernel to jump to a NULL function pointer when receiving the next packet on the underlying device. I wonder why I've never noticed this.
May 06, 2008
Patrick McHardy: Fighting the HIFN driver
What I hoped initially to be just a simple fix for a few arithmetic errors in the driver for the HIFN 795x crypto accelerator cards turned into a week long struggle, accompanied by at least a hundred crashes and reboots. The initial bug manifested itself by going into an endless loop when the CryptoAPI issued a request for less data than the full scatterlist, caused by an integer underflow while calculating the remaining amount of data to be processed. The fix was straight-forward: only use the minimum of the scatterlist size and the crypto request size. While at it, I also fixed some endian bugs, missing error propagation for errors that shouldn't happen, but did because of the underflow, and some overly strict data alignment checks. Testing looked good, no more crashes, but surprisingly the testcases of the tcrypt module using algorithms provided by HIFN randomly failed. This turned out to be caused by an incorrect return value indicating synchronous processing to the CryptoAPI, while the request was in fact processed asynchronously. So when the result was not already available when returning from the driver, testcases failed. After fixing the tcrypt failures, next was some real-life testing using IPsec. The first attempt resulted in an immediate crash in crypto_authenc_genivc(). This one was fixed fairly quickly, the asynchronous completion handler interpreted a pointer as an incorrect structure. The second attempt looked more promising, no crashes, packets went through and looked like IPsec. The remote side failed to parse them however, closer looking revealed that they were incorrectly constructed and had 16 bytes of garbage at the end. From my last attempt to fix the driver I remembered that this was most likely caused by missing initialization vector size initialisation of the CBC modes. Naively, I changed the driver to properly initialize the ivsize. To my surprise, attempting to add SAs using cbc(aes) now failed with -ENOENT. 
Figuring out the reason took me almost an entire day. When the ivsize is already initialized, the CryptoAPI attempts to spawn a new instance of the algorithm. Algorithms are identified by name, possibly combined with modes, like cbc(aes). When spawning new algorithms, the driver name is used for the lookup however, which in the case of HIFN was "hifn-aes" for all AES modes, causing the lookup to return the ofb(aes) algorithm instead of cbc(aes). Using unique driver names for the different algorithm modes fixed this problem. While chasing this bug, I noticed some DMA memory corruption issues in the HIFN driver. When a request contains more than a single scatterlist element, the driver programmed the hardware to perform one crypt operation per scatterlist element, but for the full request size, corrupting the memory after its tail. The fix for this was a bit more involved since using the correct length also requires to perform only a single operation for all scatterlist elements since the source and destination descriptors don't necessarily have identical lengths. This complicates keeping track of free descriptor entries. Previously, each operation needed exactly one command, source, destination and result descriptor. With only a single operation, it needs one command and result descriptor and a varying amount of source and destination descriptors. On the upside, this reduces the number of interrupts per request to exactly one instead of one per scatterlist element and gets rid of some atomic operations. Additionally tcrypt can now detect destination buffer corruption for cipher tests. Continuing testing with IPsec, things now looked better, packets were properly sized and the receive side worked properly. Outgoing packets were still dropped by the receiver however. Looking more closely at the packets showed that they contained what looked like a block of unencrypted data at the end. Additionally there still were some rare random crashes in the CryptoAPI. 
The crashes were caused by a missing check for end-of-scatterlist in one of the CryptoAPI scatterlist helpers, the unencrypted block of data by an off-by-one in the eseqiv sequence number generator. Both problems were fixed by Herbert Xu. The first victory - IPsec now worked properly using ping. TCP connections stalled after a short period however.
Half a day later, I also figured out the reason for the stalls. The HIFN driver needs to keep some context for each request since it processes them asynchronously. The driver used the global per-transform storage for this context instead of the per-request storage, corrupting existing contexts when more than one request was outstanding. Even in flood mode, ping exhibits ping-pong behaviour, waiting for a reply before sending the next request, which is why it wasn't affected by this problem.
With this also fixed, IPsec seemed to be working properly, at least on the HIFN side. There still appears to be some corruption of the XFRM CB with asynchronous processing, causing outgoing tunnel mode packets to be sent without IP_DF, but that should be easily fixed.
Next was testing with dm-crypt, for which I actually purchased the card. Testing worked fine while debugging was enabled; without debugging it reproducibly crashed in the device mapper code. This was fairly nasty to debug since enabling debugging stopped the bug from happening. After following lots of dead ends and some suggestions from Evgeniy, I found the cause: when no descriptors are currently available, the request is queued and processed once enough descriptors are available again. The queue length is limited (in the case of HIFN to 1); when the limit is reached the behaviour depends on the flags specified by the caller. When using CRYPTO_TFM_REQ_MAY_SLEEP, the caller goes to sleep and waits for notification from the driver when it's ready to accept more requests.
When dequeuing the crypto queue, asynchronous crypto drivers need to check for backlogged clients and wake them before continuing processing. This was missing from the HIFN driver, causing it to call the dm-crypt completion handler for a request that wasn't fully initialized.
With this bug also fixed, dm-crypt survived a 24 hour stress test. I'm a bit reluctant at this point to use it for real data though, all those bugs didn't exactly instill confidence.
The patches are in an almost upstream-submittable state, just the descriptor accounting needs some minor cleanup. I hope to get this done today or tomorrow and then attend to the huge backlog in my inbox that has grown over the past week.
On the netfilter front, nothing too exciting has happened during the last two weeks. 2.6.25 appears to have gone pretty well, netfilter-wise, except for one nasty hashing regression on ARM, fixed by Philip Craig. The number of patches merged during the 2.6.26 merge window was smaller than usual, the highlights are:
I'm particularly happy about finally managing to merge the SIP helper patches, which I had queued for almost 9 months. If you've tried using it and it didn't work, now is a good time to try again and submit bug reports :)
Patrick McHardy: Overcoming laziness
I decided to give blogging another try. My last attempt failed after just one or two entries because of me being too lazy to actually write something, but since I enjoy reading other people's blogs, I hope I can keep the motivation up a bit longer this time :)
May 05, 2008
Jeremy Kerr: linux.conf.au hackfest: the solution, part two
In the last article we finished with an SPE-based fractal renderer, but with a limited maximum fractal size of 64 × 64 pixels:
We'd like to generate full-size fractals, but the DMAs (which we use to transfer the fractal image out of the SPE) have a maximum size of 16kB. The solution is to perform multiple DMAs each containing a subset of the image's rows. Each invocation of
We just need to modify the spe-fractal code (
render_fractal(&args.fractal);
mfc_put(args.fractal.imgbuf, ppe_buf,
args.fractal.rows * args.fractal.cols * sizeof(struct pixel),
0, 0, 0);
/* Wait for the DMA to complete */
mfc_write_tag_mask(1 << 0);
mfc_read_tag_status_all();
First, we need to modify our
static void render_fractal(struct fractal_params *params, int start_row, int n_rows)
In the SPE program's
bytes_per_row = sizeof(*buf) * args.fractal.cols;
rows_per_dma = sizeof(buf) / bytes_per_row;
And do the rendering and DMAs in a loop:
for (row = 0; row < args.fractal.rows; row += rows_per_dma) {
    render_fractal(&args.fractal, row, rows_per_dma);
    mfc_put(buf, ppe_buf + row * bytes_per_row,
            rows_per_dma * bytes_per_row,
            0, 0, 0);
    /* Wait for the DMA to complete */
    mfc_write_tag_mask(1 << 0);
    mfc_read_tag_status_all();
}
This loop will render as many image rows as will fit into a single DMA, then DMA the rendered data back to main memory. Now, we're able to render fractals larger than 64 × 64 pixels:
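The chunking arithmetic can be checked off-target with plain C. The following is a sketch rather than the hackfest code: struct dma_chunk, rows_per_dma_for and plan_chunk are hypothetical names, and clamping the final chunk is an addition for images whose row count is not a multiple of the rows that fit in one DMA (the loop above appears to assume it is).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one DMA transfer of whole image rows. */
struct dma_chunk {
    size_t start_row;   /* first image row in this chunk */
    size_t n_rows;      /* number of rows rendered and transferred */
    size_t byte_offset; /* offset of the chunk in the destination image */
    size_t n_bytes;     /* size of the transfer */
};

/* How many whole rows fit into a local buffer of buf_bytes. */
size_t rows_per_dma_for(size_t buf_bytes, size_t bytes_per_row)
{
    assert(bytes_per_row > 0 && bytes_per_row <= buf_bytes);
    return buf_bytes / bytes_per_row;
}

/* Fill in the chunk starting at start_row and return the row at which the
 * next chunk begins. The last chunk is clamped to the rows remaining. */
size_t plan_chunk(size_t start_row, size_t total_rows, size_t rows_per_dma,
                  size_t bytes_per_row, struct dma_chunk *out)
{
    size_t left;

    assert(out != NULL && rows_per_dma > 0 && start_row < total_rows);
    left = total_rows - start_row;

    out->start_row = start_row;
    out->n_rows = left < rows_per_dma ? left : rows_per_dma;
    out->byte_offset = start_row * bytes_per_row;
    out->n_bytes = out->n_rows * bytes_per_row;
    return start_row + out->n_rows;
}
```

In the SPE loop, n_rows would be passed to render_fractal(), and byte_offset and n_bytes would become the offset and size arguments of mfc_put().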
The source for the updated fractal renderer is available in fractal.2.tar.gz.
performance
Now that we can generate full-size fractals, we can compare the running times with the PPE-based fractal renderer. The following table shows running times with a standard fractal (using these fractal parameters).
So, we get a 27% speedup by moving the fractal generation code to run on an SPE. We're still a way behind the optimal performance though, and benchmarking on other systems gives better times (for example, generating the same fractal on an Intel Core 2 Duo @ 2.4GHz takes 13.8 seconds). We can improve the Cell performance by a large amount - stay tuned for the next article to see how.
April 30, 2008
David Miller: Effective GIT bisecting...
I've had to do a lot of this lately, and the most efficient attempts take on a certain pattern.
So you have a bug, and you can readily reproduce it. Also, the bug appeared in the last pull you made from Linus's tree. Perfect. At this point you know that 'master' has the bug and that 'ORIG_HEAD' lacks the bug.
You could just blindly bisect the whole thing, but you can save yourself some time (and also learn a bit about the nature of the bug) by using some clues and some quick tests to narrow things down a lot from the beginning.
This determination can be easy. Your goal is to first find a spot which you think works. You'd like it to be something a bit further than ORIG_HEAD, as you're trying to narrow things down. The easy case is some driver breaks or similar, or you see some error message and it's clear what subsystem that came from. Take that information and use it to scan over the changesets you got from your pull:
gitk ORIG_HEAD..
Note that when you select a changeset in gitk, the SHA ID of that commit becomes the current X selection. You can use that to do things more quickly below.
So you've found a sequence of commits that look suspicious. Pick the changeset before the first commit in the suspicious set, and check out a test tree with it as the tip into a test branch:
git checkout -b test $(SHA_ID)
Build that kernel, and make sure the bug doesn't happen. Let's assume that this kernel passes your test. You have a few options on how to proceed. The easiest thing to do is to just bisect using the information you now have:
git bisect start
git bisect bad master
git bisect good test
and so on. Build, test boot, and if it shows the bug:
git bisect bad
else if it succeeds:
git bisect good
and repeat the process until GIT shows you the guilty commit.
The other option is to try and figure out an approximate, more optimal end point for bisection.
Take the set of "suspicious" changesets you determined above, pick the commit after the last one, and go:
git checkout -b test2 $(SHA_ID)
If this shows the bug, you're in business:
git bisect start
git bisect bad test2
git bisect good test
and continue as detailed above. When you're done with all of this:
git bisect reset
and report your results to the mailing list.
April 26, 2008
Harald Welte: First ASUS day of OpenTechSummit Taipei
As I might have indicated before, I have the pleasure of being invited to the OpenTechSummit 2008 in Taiwan. Two days ago, I was at the opening dinner. The problem with that dinner was the lack of attendees. There were loads of delicious (free, sponsored) food, but not even remotely enough people to eat it.
Today I had a bit of a problem finding the ASUS venue, since it was said to be at "exit 2" of the MRT station. Unfortunately it had two exits of that name, one on each side of the station :)
One presentation there I found particularly embarrassing was the one about the eePC SDK. First of all, I will ignore my thoughts about why you actually need such an SDK if it really is nothing more than a customized Debian Linux with Eclipse. But even then, why fly in a foreign speaker to do a click-by-click walk-through on how to create a 'hello world' Qt program using Eclipse?
My favourite of the day was definitely the presentation on the OpenPattern router board.
April 24, 2008
Harald Welte: Back to Taipei
After a break of almost six months, I'm back in Taipei. Obviously I now see everything from a quite different angle: I no longer work for OpenMoko, Inc., thus I actually have spare time here and can explore both the capital city as well as the country much better than before with that ever-growing OpenMoko workload.
However, the first day wasn't quite as relaxing as it should have been. First, the apartment key that was supposed to be with the guard of the apartment building accidentally was mixed up with some other key and got sent to the landlord.
A couple of hours later I discovered that my Yamaha TW225 motorbike doesn't work anymore. First diagnosis: Battery is empty (not surprisingly). I tried for like 15 minutes to kickstart it, to no avail. Not even a single explosion in the engine. Then I tried to push it, and got it to a couple of explosions after which it died again. Further push-starting was prevented by the way-too-smooth floor of the parking garage, where the wheel just slides as soon as you release the clutch :(
Some disassembly revealed where the battery is (I don't know this bike at all, unlike my F650ST in Germany). The battery was severely short of acid/fluid, maybe somebody pushed the bike over and it leaked. Obtaining battery additive and refilling resulted in only 800mA charge current. I think it's dead. Now I'm in the process of ordering a new battery.
Let's hope the next couple of days are better than the start of this trip...
April 21, 2008
Harald Welte: Review of DORS/CLUC 2008 in Zagreb, Croatia
I've spent the last five days in beautiful Croatia - most of the time in its capital Zagreb. The local conference DORS/CLUC has been around for a couple of years, and in fact I was at a previous incarnation three years ago. It's a nice, small but great event. And in fact, for invited speakers such as myself it feels more like an all-inclusive holiday than a conference. The organizers went out of their way to make us feel at home, including a trip to the waterfalls of Plitvice national park (photos will be available shortly at my public photo album).
It was also great to spend some time with Alan Cox again, who to my surprise was also attending the event with two lectures. Hope his luggage didn't get lost again on his way home...
April 12, 2008
Harald Welte: Further studying of Abis protocols, moving towards implementation
The first quarter of 2008 is already gone, and I still haven't found all the time that I wanted to find to play with my BS11 base station[s]. However, I've spent quite a bit of time over the last couple of days further studying the GSM/3GPP 08.5x documents, as well as reading thoroughly through the mISDN source code.
GSM/3GPP 08.5x describe the layer1, 2 and 3 protocols of the Abis link between BSC (Base Station Controller) and BTS (Base Transceiver Station) in a GSM network. It's modelled on top of an E1 link in PCM30C configuration, i.e. TS0 is for CRC4 and synchronization, TS16 is used for the layer2+layer3 protocols, whereas the other time slots are used for transfer of the actual voice call data.
After looking at the various different driver options on Linux, I have determined that mISDN is the most promising and flexible architecture available. mISDN also has a layer0 + layer1 driver for the NT mode of the HFC-E1 card that I'm using. mISDN is great in the way that every layer is strictly separated from the other layers, and that at any layer parts of the stack can be implemented in userspace using a library API. Thus, I've started to write some mISDNuser based code to attach to the kernel-side hardware and lower-layer drivers.
I'm not yet sure if the Q.921 (ISDN Layer2, also called LAPD) of the mISDN kernel side can be reused for Abis or not. The differences between standard Q.921 used on European ISDN and the Abis Layer2 are fairly small, so I hope to get it working with the existing LAPD code.
Unfortunately, I have paid work to take care of, so I will once again be distracted from this most interesting of my toy projects.
Harald Welte: Report from FSFE FTF Licensing and Legal workshop
I'm on a seven-hour train ride back from Amsterdam, where I've been attending the first Licensing and Legal workshop of the Freedom Task Force (FTF) of the Free Software Foundation Europe (FSFE). While having a somewhat lengthy name, the FTF has been doing great work on bringing together a large group of legal and technical experts in the field of Free Software licensing. So far this was all 'virtual', happening on mailing lists.
The meeting in Amsterdam was the first of its kind, and was a huge success. By the nature of the FSFE, most of the people were from Europe, though there were attendees from the US and even Australia, too. There were many interesting and surprisingly interactive workshops.
It was also a good opportunity to meet Armijn (the second half of gpl-violations.org) and Shane (full-time manager of the FSFE FTF), as well as many lawyers, both corporate legal counsel and from law firms. Armijn's presentation about gpl-violations.org and Till Jaeger's overview of the legal cases we've handled over the years in Germany were very well received, and there were more interest and questions than the short time permitted.
What was really good for me to see is that large consumer electronics companies in Europe and the US are now implementing internal business processes to ensure GPL and other FOSS license compliance. They're also increasingly using very clear contractual language throughout their supply chain to minimize the potential risk of any "hidden" GPL surprises in products they source from OEM/ODM companies.
April 11, 2008
Harald Welte: We don't do Advertisement on the netfilter.org homepage
For some reason, the number of inquiries from companies who want to put ads on netfilter.org has significantly increased. Since the content of that site has not really changed much in the last (at least) four years, this sudden interest is somewhat surprising to me.
However, we are absolutely not interested in advertisements. I personally hate any form of advertisement, whether in print media, radio, TV, WWW or on billboards. In fact, advertisements are the reason I have not watched or listened to any privately owned TV or radio stations for at least eight years.
So to all the advertising companies out there: Only over my dead body will there be any kind of banner ads on any of the websites of the projects in which I have anything to say.
April 07, 2008
Rusty Russell: C inline functions not in headers
I just appreciated an interesting side-effect of slapping "inline" on static functions within .c files. You don't get a warning when they become unused. This breaks my normal method for code cleanup (in this case, the tun driver). So unless you have evidence otherwise, please trust the compiler to inline static functions appropriately and don't label them inline. (And remember: inline is the register keyword for the 21st century.)
Jeremy Kerr: linux.conf.au hackfest: the solution, part one
During linux.conf.au 2008, a bunch of us ozlabbers ran the hackfest - a programming competition for conference attendees. This year's task was to optimise a fractal generation program to run on the Cell Broadband Engine - the hackfest task description is still available if you want to take a squiz.
The next few articles here will take you through a solution to the hackfest task. This is only one approach, and there may be many others. If you have any comments or questions, feel free to mail me. (If you're viewing this through a feed reader or planet, you may want to check out the original article, where you get much nicer code formatting.)
optimising
The task is a matter of optimising an existing program. We should take a leaf out of Knuth's book here:
We'll start out with something simple, and work our way up from there.
starting out
As a starting point, it'd be a good idea to check out the simple-fractal example, to find out what sort of problem we're tackling here. While we're at it, we can do a bit of profiling on the sample fractal generator to find out where the hot paths of the program are. A quick way to do this is to run the simple-fractal program under callgrind:
[jk@pokey simple-fractal]$ callgrind --simulate-cache=yes --dump-instr=yes \
./simple-fractal fractal.data
Looking at the callgraph output (using kcachegrind), we can get a list of the functions taking the largest amount of CPU time:
The 'Self' column gives the estimated percentage of cycles spent in each
function. We can see that 99.2% of the CPU time is spent in
Now that we know what we need to optimise, we can work on offloading this
to the SPEs on the Cell Processor. Because the majority of the running time is
due to
cell version
We can get a fractal generator working on the Cell pretty quickly, by using the simple-fractal sample code for the fractal side of things, along with the data-transfer example for a framework for getting code running on the SPEs. To me, the most logical approach is to move the
This will require a few changes:
If immediate gratification is more your style, here's one I prepared earlier. After these changes (plus some general plumbing), you should have a working SPE-based fractal renderer!
However, we still have a few limitations:
So, nothing too exciting yet. However, in the next part of this series, we'll be working on optimising our new program to use some of the neat features of the Cell architecture, and get around each of these limitations. Stay tuned! April 05, 2008Rusty Russell: Hard To Misuse Commentry
Since my blogfu doesn't extend to comments, I recommend the thoughtful comments found on my recent 'Hard to Misuse' posts at LWN: firstly 'How Do I Make This Hard to Misuse?' commentry and then 'What If I Don't Actually Like My Users?' commentry. April 01, 2008Rusty Russell: What If I Don't Actually Like My Users?
Here begins our descent into hell; if an interface manages to achieve negative scores on the Hard To Misuse List, your users may detect the dull red glow of malignancy rather than incompetence.
That's everything I know about interface design. Now, go and make your own mistakes so you can have wise things to say about it! March 30, 2008Rusty Russell: How Do I Make This Hard to Misuse?
It's useful to arm ourselves with a pithy phrase should we ever have to face an "it'll be easier to use!" argument. But once we've pointed to it, it's still not clear how to improve the difficulty of interface misuse. So I've created a "best" to "worst" list: my hope is that by putting "hard to misuse" on one axis in our mental graphs, we can at least make informed decisions about tradeoffs like "hard to misuse" vs "optimal". The Hard To Misuse Positive Score List
March 27, 2008Harald Welte: Schiphol airport uses active millimeter wave screening
I was quite surprised that Amsterdam airport is beginning to introduce active millimeter wave screening instead of the good old metal detectors. The specific device seen in operation at one of the queues between the international and the Schengen area of the airport was L3 Communications ProVision(TM). While doing some research about this subject on the net, I discovered cargo X-ray solutions such as those described in this article. You can mount a mobile unit onto a track and then go as deep as 200mm of steel to x-ray through the metal plating of a cargo container. This is really scary stuff... March 26, 2008Harald Welte: I don't work for Google - no matter what the rumors say
A number of people have recently independently approached me about rumours that I'm now working for Google/Android, after having left OpenMoko, Inc. in November 2007. According to one source, some friend who visited Android was told by Android that I would be now working for them. There is no truth to this. Please put an end to those rumours. I'm not working with or for either Google or Android. There also are no plans to do so, and there have never been any negotiations, aside from the usual Google headhunters that approach anyone visible in the FOSS world every so often - which I always decline, indicating that I am not interested in a dependent employment position, no matter for which company. I will continue to be doing freelance contract work on projects that are interesting to me and within my fields of expertise. Should anyone choose to approach me with an interesting technical Android system-level and/or hardware related project, that would certainly potentially be interesting. But I'd look at it like any other inquiry. Harald Welte: KLM also using Linux in their Entertainment System
The intercontinental KLM flight from Sao Paulo to Amsterdam was using a fairly new (05/2007) Boeing 777-300, and it was equipped with something like an 8" wide screen entertainment system, not unlike the one that I saw some months back in a Shanghai Airlines flight. This time I had the luck to see the Linux based system boot twice. The boot time is horrible (on the order of 4 minutes) and you can see many hardware details. It's using a Geode type CPU and a realmagic GPU, has a natsemi Ethernet chip and the credit card reader is actually a USB HID device. All over the place they have fairly low-level debug code spit out to the console, this really looks like an "it worked on one developer board, ship it to the airline" product. You can see mistakes in shell scripts ("ls: no such file or directory" and similar stuff from init scripts), as well as debug code from their UI applications. It would really be interesting to get my hands onto an Ethernet link in that in-plane network. Guess one could have quite a bit of fun with that :) I've taken a series of snapshots throughout the boot process. Will post them once I'm back home and find time to wade through the holiday pics. Harald Welte: Back from holidays
I'm currently sitting at Amsterdam Schiphol Airport, waiting for the last connection in my Recife - Sao Paulo - Amsterdam - Berlin return trip. I'll be wading through the several thousand emails over much of the next couple of days, so please give me some time to get back to you. March 25, 2008Harald Welte: Receiving the 2007 FSF Award for Advancement of Free Software
The news has already made it to the net during my (offline) holidays, so this entry in my journal will hardly come as a surprise to you: The Free Software Foundation Award for the Advancement of Free Software 2007 has been granted to me :) I am deeply honored to be the recipient of the award, joining the list of (much more distinguished) recipients of the award. At the same time I'm sorry not to have been able to personally attend the awards ceremony. I've outlined the three key reasons for this in the statement that I prepared to be read at the ceremony. March 19, 2008David Miller: Vger recovering...
VGER suffered a major disk failure and it was not easy to bring the machine back up cleanly. Thanks to the incredible efforts of Matthew Galgoci, Matti Aarnio, and others, it is back up now and slowly but surely the mail queue is running through what should be quite an impressive backlog :-) David Miller: Back from Japan...
I just returned from a wonderful 10 day trip to Japan, during which I gave a presentation on multi-queue networking for the Linux Foundation Japan Symposium. Most of my time was spent in Tokyo, but I was able to make 3 exciting trips outside the city. On Sunday March 9th, Kazunori Miyazawa gave us a tour of the Kamakura region and Enoshima. On Friday March 14th we took the hikari shinkansen to Osaka, during which I was able to experience "kuidaore" (literally "to ruin oneself by extravagance in food") in the Dotonbori district of the city. That evening we travelled to Kyoto and slept at a ryokan at which we had a traditional Japanese breakfast the next morning. Finally we spent that Saturday exploring Kyoto, mostly the eastern section. Finally we hopped back onto the shinkansen in the evening to get back to Tokyo. On Sunday I was able to meet up with other members of the USAGI Project for dinner, several of whom were unfortunately travelling and thus outside Japan during most of my stay. All in all it was a great trip, and I already have a list of places and things I want to explore next time! March 18, 2008Rusty Russell: APIs: "Easy to Use" vs "Hard to Misuse"
It's an elementary goal of API design to make something easy to use: easy for yourself, easy for yourself next year, easy for others. Let's take that as a given. Many goals will conflict with "easy to use", but the subtlest is the requirement that an API be hard to misuse. Ease of use attracts users, but difficulty of misuse keeps them alive. To make this concept crisp, I have two real life examples. The first is the safety catch on a gun. Hard to misuse beats easy to use. The second example is the Linux kernel's kmalloc dynamic memory allocation function. It takes two arguments: a size and a flag. The most commonly used flag arguments are GFP_KERNEL and GFP_ATOMIC: I'll ignore the others for this example. This flag indicates what the allocator should do when no memory is immediately available: should it wait (sleep) while memory is freed or swapped out (GFP_KERNEL), or should it return NULL immediately (GFP_ATOMIC). And this flag is entirely redundant: kmalloc() itself can figure out whether it is able to sleep or not. Implementing malloc() would be a no-brainer, and kernel coders generally like ease of use. So why don't we? [Correction:Jon Corbet points out that it's not entirely redundant in some configurations; we'd need to do a few lines extra work.] Because atomic allocations should be avoided: they're drawing from a limited pool and more likely to fail or make other atomic allocations fail. By placing the burden of specifying this onto the author, we make atomic allocations easier to spot and thus harder to abuse. And if we want to make our APIs harder to misuse we need to measure how an API scores, and that'll be the topic of the next post. March 12, 2008Rusty Russell: Bricklayer, not cathedral builder.
I'm always a little uncomfortable with "fuzzy" programming topics; much better to judge between two specific pieces of code. The big issues are important but it's hard to say something new on that topic which will help people code better. Most useful stuff has been said already. Nonetheless, for my OLS keynote years ago I did have a point which I felt was underappreciated, and managed to rope it down to actual guidelines so the idea was of practical use. I'm going to revisit that topic in my next few blog posts, because unfortunately my OLS keynote was not recorded anywhere for me to simply point to, and there has been some maturing of these ideas since then. March 11, 2008Harald Welte: Update from first week of holidays
For those of you who're curious: The first week of holidays went just fine, spending something like three days in Sao Paulo and three days in Curitiba. In Curitiba, I had a rental car and went to Vila Velha, as well as driving the serpentines of the Rua Graciosa through Morretes to the beach. Oh, and obviously in Curitiba I had to go to Homem Pizza and Happy Burger, the two restaurants that I frequented the most while working at Conectiva 7 years ago. The biggest problem so far was the malfunction of the in-room safe of the hotel in Curitiba, resulting in not being able to access any of my cash reserves, credit/debit cards, passport or laptop for two days. They actually had to physically break the safe open since the lock mechanism was stalled/clogged in a way that it no longer moved. Now I've just arrived in Recife, where after two days, the journey will continue towards Porto de Galinhas. February 28, 2008Harald Welte: Almost offline for holidays
I'm hereby announcing that I'll be offline most of the time between March 3rd and March 26. This is the longest time that I've been offline for quite some time - and it's a much deserved holiday after the intense work of the last year. I'll be doing quite a bit of travel in Brazil through those more than 3 weeks, meeting some old friends and ex-colleagues from my time in 2001 at Conectiva. I'll also be spending some time at the beach, plus exploring a bit of Parana and Pernambuco by [rental] car. This also means that I'll likely end up being forced to use my horrible Brazilian Portuguese again. But well, at least for me, unless forced to speak a certain language, I won't speak it at all. So this must be a good thing, then. Please don't expect any reaction to e-mails, snail mail, phone calls, faxes or any of the like during that period of time. I won't even have my German GSM phone online to avoid roaming charges killing me. February 24, 2008Harald Welte: Thoughts on FOSDEM 2008
I really have been disappointed quite a bit with my visit to FOSDEM this year. In fact, many of my observations might actually apply to Brussels as a whole, I really don't know. It all started with arriving at Bruxelles Central station on Friday, where the entire station was so crowded it took me ages to fight my way through the crowds. Then something like only the fourth idle cab driver was willing to actually take us to the hotel. The others for whatever reason didn't want to earn those 15 EUR. Aren't there some regulations forcing them to transport paying passengers? Then, let's talk about the social event on Friday. How can you hold such an event in a place that's about one third of the required size, and which has a music volume level that effectively prevents any form of communication? I left after about 10 minutes there, since there just was no point at all. One wonders what happens if there is a fire. Aren't there some kind of regulations on the maximum number of people you are allowed to cram into tiny places like that pub? At the conference venue the problem seemed to re-occur. All the rooms are significantly too small. Combined with the lack of ventilation and the lack of a PA system it was not possible to stand more than a single talk in the X.org devroom, before I had to get out to get fresh air. Getting in and out of the DevRooms is also a challenge by itself, since the hallways are over-crowded and full of noisy and loud conversations. Opening the door for even a small amount of time is nearly impossible, since that would expose the talk on the inside to the enormous noise levels on the hallway. Especially since the DevRooms don't have any PA system, it's already quite a challenge to understand the speaker inside the room. Somebody opening the door just completely kills the communication flow. The entire idea of putting up all the projects with tables in the hallways seems questionable to me. 
They do nothing but block the path for other people (also blocking emergency escape paths). Furthermore, cold air gets in all the time since many people have to use the doors in order to walk between the different buildings. It would make much more sense to keep the hallways for what they are: Ways where people walk between rooms. The project tables should be inside rooms. Those rooms would self-contain the noise generated by the tables, be more comfortable (warm, no wind) and keep the hallways free for people to walk on. The same problem exists for the "BAR" where you get food and drinks. It's too small, too crowded, and absolutely not comfortable at all (cold wind coming in through the permanently open doors, ...) And then consider the public transport "performance" on weekends. It took me regularly more than an hour for something that was a 2.6km distance between hotel and venue. That's quite ridiculous. Given how crammed those few trams are that actually run, it doesn't seem to be a shortage of passengers that makes them operate so few trains per hour. All in all, I could not do anything else but to attribute FOSDEM 2008 as something like "the most inefficient event", i.e. where I wasted a lot of time for reasons stated above, rather than actually attending lectures. February 22, 2008Harald Welte: Flying from Berlin to Brussels without showing any ID
It was really surprising to see that there was absolutely zero control of any ID on the flight between Berlin and Brussels. I'm well aware of the marvels (and data protection nightmares) associated with the Schengen agreement. However, zero form of identification on air travel was really a big surprise to me. Not even my flights inside Germany had this 'feature'. How did this work? First of all, I booked the tickets through a travel agent quite some time in advance. No form of ID required (though he has my banking details). Next, I did a Lufthansa online check-in from my home, printed the boarding pass. At the airport, used the self-service luggage drop-off counter. Then directly went to the security check, and then to the gate. During the entire time, nobody asked for any form of ID. So if I did buy the tickets with cash rather than with bank transfer, it would actually still be possible to travel under a false name and thus anonymously. Amazing. Am I missing something? February 20, 2008Harald Welte: flu provides opportunity to watch linux.conf.au video recordings
A quite serious flu hit me four days ago. While this prevented me from getting any serious work done (my doctor actually explicitly asked me to refrain even from mental work), it provided me with ample opportunity to watch through all the exciting video recordings of linux.conf.au 2008. The various technical X.org driver side related talks were really good to hear, and I'm happy that there is so much innovation and development happening there now. The most hilarious talk according to my scale of humor was Matthew Garrett's presentation on suspend to disk. I had to watch it twice, just because it's so entertaining. Rusty: Even you'll have a hard time competing against that level of entertainment :) February 19, 2008Jeremy Kerr: spufs git tree on kernel.org
After going through the magical approval process, I now have a spufs git tree published on kernel.org. If you're looking to try out the latest work on spufs, just do a:
[jk@pokey ~]$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jk/spufs.git
As with other git repositories on kernel.org, there's a gitweb interface to browse the tree. Of course, if you have any bug reports/requests/comments about the code in the spufs.git tree, feel free to email me at jk@ozlabs.org, or the Cell/B.E. open source development list at cbe-oss-dev@ozlabs.org. February 13, 2008Harald Welte: Something is wrong if your mail client is using 13.0GB of memory
On my fairly new quad-core 4GB RAM system I noticed that suddenly closing tabs in the web browser resulted in tons of disk accesses, which I [correctly] attributed to swap usage. This is quite a big surprise, since I don't use any integrated desktop and generally only run lots of uxterms in ion3 (over two 1600x1200 heads with 8 virtual desktops on each head) plus firefox. As it turns out, earlier today I started thunderbird (Debian calls it icedove) in order to do some cleanup (moving folders around) on my IMAP server. After about half a day, I was looking at the following line in top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3474 laforge 20 0 13.1g 3.1g 10m D 1 81.7 47:49.91 icedove-bin
This is ridiculous. 13GB virtual, 3.1GB resident set size. And all that with something like roughly 3 million e-mails spread over about 200 IMAP folders. Who is supposed to use those programs? What do they use for testing? People with 10 mails in their inbox? Also, if you actually download the headers of a new folder, or headers of new mails in a folder, it takes _ages_. It looks actually like they individually request the headers of each email, without using the tagged command features of IMAP, thereby removing all the pipelining effects and being bound to one complete thunderbird-through-kernel-through-network-through-imap-server roundtrip per message. I haven't actually looked at the code, but just from observing the application, this seems to be the case. Also, every time I use the 'search messages' feature for any header that the IMAP server does not have an index for, thunderbird refuses to wait long enough until the server responds. So far I always thought mutt's memory usage of 40-80MB is already excessive, considering all it does is displaying a bit of plain-text emails. Well, for once I've been happy again that I'm not a regular user of that kind of bloated GUI programs. firefox somehow being the sole exception to that. 
It's barely useable on my 1.06GHz / 512MB laptop, where you already notice quite considerable lag in the responsiveness of the UI. :/ Guess next time I have to move folders, I'll probably revert to something like cyradm (that's a minimalistic imap client with command shell, not unlike the old 'ftp' command for FTP). February 06, 2008Rusty Russell: lca2008 Projector Pong with Wiimote and Linux: Pong Hero!
Once the teething problems were out, and with much assistance from various people, we had fun at linux.conf.au's Open Day playing a pong variant using IR pens and a Wiimote. I've finally put all the information up on a typically-ugly web page, including a link to the source code. February 04, 2008Jeremy Kerr: linux.conf.au 2008 hackfest
As a few people have asked about the original task, I've put up an archive of the linux.conf.au 2008 hackfest site. I'll also put up a bit of a post-mortem for those who are interested in our suggested solution to the problem. This should be in the next week or so. Congratulations to Anthony Towns and Jaymz Julian for taking the first and second place prizes (respectively), and thanks to the entrants, who did some great work in a limited amount of time! February 03, 2008Harald Welte: Working on ISO15693 support for librfid
It's really been bugging me for a long time that librfid was lacking support for the ISO15693 protocol. The supported reader hardware ASIC can do it, but librfid always was lacking the respective code. It has been on my agenda even three years ago, but there were always higher priority items to pre-empt it. In December 2007, Bjoern Riemer submitted a long patch to add partial ISO15693 support to librfid. The size of the patch reflected the huge amount of work that must have gone into that code. So I couldn't really afford to let all that work bit-rot. I went through several iterations of code cleanup, starting with cosmetic issues, and digging deeper and deeper. I think it now doesn't really look all that similar to what Bjoern originally did, but at least now we have a working and fairly well-organized ISO15693 anti-collision implementation in librfid. However, ISO15693 has many different options with regard to speed, modulation, coding, etc. All those combinations have to be carefully tested. What's also missing is a way to iteratively cycle through all available ISO15693 tags within range, similar to what we do with ISO14443A and B. Once that work has been finished, the actual higher-level standard commands, as well as the nxp I*Code2 and TI Tag-it vendor-specific extensions can be implemented on top. This can probably be done in one or two more days of additional work. Stay tuned... February 02, 2008Harald Welte: Meeting between gpl-violations.org and FSFE FTF
The last two days, I enjoyed a meeting between gpl-violations.org and the FSF Europe Freedom Task Force. Participating were Armijn Hemel (whom I have to thank for ensuring gpl-violations.org didn't die while I was in Taiwan for OpenMoko), Shane Coughlan (who is doing an excellent job coordinating the FTF) and myself. For a couple of hours we've also been joined by Till Jaeger, who has handled all the legal cases of gpl-violations.org so far. This meeting has been over-due, mostly because I basically dropped off the planet for way too long. We've discussed all the current matters regarding strategies for license enforcement, current cases, progress of the FTF legal and technical networks, as well as future plans for incorporating the gpl-violations.org project. Yes, you have read correctly. I've been planning to do this for quite some time, and I'm confident that 2008 will finally be the year in which this happens. It's too early to talk about any details, but this is the logical step to ensure both financial and legal independence of the project from my person, as well as scalability. As you might know, we have a couple of hundred reported violations and can only cherry-pick those we consider particularly important. In any case, it was a very productive meeting. I seriously believe it has helped to make all of us work together in a coherent manner, i.e. increased productivity and effectiveness for a long-term strategy to increase the amount of free software license compliance in the industry. January 28, 2008Harald Welte: Disrespect for election observers in Hessen
My fellow friends from the CCC have tried their best to observe the elections in Hessen (Germany) yesterday. The amount of resistance they've met is more than shocking. If you want to read more about this (in German), I'd suggest reading Frank's blog entry, Holger's blog entry and the official CCC release on this subject. In fact, in some of the municipalities the election supervisors have received official statements warning them about the CCC's intention to disturb the elections. What nonsense is this ?!? Having been part of a CCC election observer team in the past, I can only state that this is beyond anything that we've seen before. Why would there be any resistance against quiet and peaceful observation of the elections? The CCC election observers have absolutely zero history of ever having disturbed an election in any possible way. I'm sure you can ask about any municipality that has had first-hand contact about this. We know the laws and regulations very well, and want to do nothing else but to _observe_ the January 20, 2008Harald Welte: Learning about NAS chipsets
For gpl-violations.org, I've been analyzing a number of NAS devices recently. While most of them are based on some kind of more or less general purpose CPU (Intel StrongARM based IOP or e.g. VIA's embedded x86) plus standard peripherals, there appear to be more and more special purpose SoC's for this purpose. To some extent, this is only a logical development. NAS appliances seem to be a growing market, and the desire to achieve higher integration by e.g. moving the SATA/IDE controllers into the SoC make development easier and reduce BOM cost. It's quite amazing how much effort some companies actually go through. One series of chips that particularly caught my attention is the Stormlink Gemini series of NAS CPU's, e.g. the SL-3516. Looking at the public data sheets is particularly boring since they only have two pages. Instead of that, I'd recommend looking through the kernel sources that their downstream appliance vendors publish. They actually have hardware crypto, hardware IPsec acceleration, TSO (TCP segmentation offloading), hardware NAT, ... As if that wasn't enough already, they also now have a dual core variant, which has two ARM920 cores next to the hardware crypto and pimped-up Ethernet controller! While reading through the code, I made a slightly cleaned up diff against vanilla 2.6.15. It reveals a number of things that I'd like to point out:
As a summary: Kudos to those who have designed the product, and actually implemented all its features, in purely GPL licensed code. It's just such a pity that none of the code, not even the most generic and clean bits have been merged mainline. Harald Welte: Securitization
A friend of mine (who has studied political science) recently told me about the process of securitization. Finally I know a word for the process that seems so commonplace in today's politics: Framing something that is actually a minor problem with some criminals into a question of essential survival, thus eliminating any rational debate about it. January 17, 2008Jeremy Kerr: how could this possibly go wrong?
Japanese robot 'wired to monkey's brain'
Japanese and US researchers say they have created a humanoid robot that acts according to the brain activity of a monkey from all the way across the Pacific. January 11, 2008Jeremy Kerr: petitboot v0.2
The next version of petitboot - the graphical bootloader for the PlayStation 3 - is now out. Some notable changes in the v0.2 build:
See the petitboot project page for more details and downloads. I've also built an OtherOS image with remote access support, so it's now possible to ssh to your bootloader. January 03, 2008Harald Welte: Repairing VIA EPIA-ME6000 busted capacitors
Just before Christmas, my vdr powered diskless Linux-based digital video recorder went bust. A bit of testing revealed that the VIA EPIA-ME6000 main board itself must be dead. I immediately ordered a replacement mini-ITX board without further investigating the broken one, expecting that the replacement might actually arrive before the Christmas holidays. Unfortunately this didn't happen. While replacing the board, I discovered that six of the 1000uF electrolytic capacitors went bust. So today I finally found a bit of time (it's great to be able to find time to do things again) to try and replace the broken capacitors. Despite the new ones being slightly larger, the board now works again like a charm. And that at a total cost of 1.62 EUR. So now I have two working mini-ITX boards. Guess I have to either find some use for it, or sell the new one again... January 01, 2008Harald Welte: My personal favourite from 24C3: Xbox 360 hacking
I've seen quite a number of presentations live at 24C3 as well as recorded ones in the days following the event. While many of them cover important subjects, there is one lecture that is outstanding: "Deconstructing Xbox 360 Security". The level of technicality of this presentation was just right. Finally something that went deep down into the technical details, explaining what kind of flaws they found in the disassembled PowerPC object code. I definitely want to see more lectures/presentations like this. Don't be afraid to overload the audience with technical details. Just go ahead with it :) Also, this presentation has shown how far advanced the game console hacking is compared to mobile phone hacking (at least from what I've seen in the HTC (xda-developers) and Motorola hacker communities). The problems are similar: Completely undocumented hardware, cryptographic authentication of code by the boot loader (sometimes down to mask ROM), ... So I hope that the mobile phone hacker community will grow and more people with this skill set, attitude and time will join. Free your phones! December 31, 2007David Miller: TCP retransmit queue overhead...
There is this one cpu cycle killer we've been trying to find ways to overcome, and the more I think about it the situation wouldn't even exist if the SACK standard had been given just a smidgen more thought. SACK has been specified as a lossy indicator of out of order packet reception. This is the core problem. It's a hint, and you cannot actually "act" upon the information. So if you get some SACK blocks you can't free those marked packets from your retransmit queue. SACK is specified such that the receiver can free up those packets and stop advertising them in the SACK blocks. This is profoundly stupid, just look at the implications. What this means is that during a loss recovery event, we have to hold onto a whole extra window of data in the retransmit queue for no reason at all. Even more brilliant is that when the hole is filled and we get this HUGE ACK back acknowledging two windows worth of data we have to purge all of those packets from the retransmit queue in one go. For large windows, we're talking 4,000 packets or more. I don't care what datastructure you use to manage your retransmit queue, that's going to be expensive and show up in profiles. If, instead, the SACK specification authors spent just a little bit more time thinking about the implications of SACK being a "soft" indicator I think they would have changed their minds. If SACK were a "hard" indication and we could thus free up packets so marked there, the processing would be spread out throughout the recovery event instead of being batched up into one huge wallop of packet freeing. Supposedly the reason for marking SACK a "soft" indication is so that "low memory" embedded systems could free up the data. But this is incredibly stupid. If you can handle one window of data, you can also handle holding onto two full windows of data as well. We aren't talking about PDP-11's with 16K of RAM or anything like that. 
Furthermore this is inconsistent because the sender has to hold all of the data in such loss recovery events, so why treat the receiver specially? It makes absolutely no sense, because if such small embedded systems ever have to send data, they have to commit to the same amount of RAM resources in such situations. In fact, a "hard SACK" would help those small devices when they act as senders, because during a loss event they could liberate packets indicated in the SACK blocks they get back. The SACK standard folks made a horrible tradeoff. They created a significant future performance hindrance for 99.9999% of systems in order to cater to some theoretical memory usage issue on 0.00000001% of machines in the world. Every TCP implementation out there is going to have to come up with a workaround for this issue, especially as huge windows become more prevalent. And all of that engineering effort would be totally unnecessary (read as: we could work on much more important stuff) if SACK had been specified as a hard indicator. I have the urge to propose some TCP option that allows two TCP implementations to negotiate "hard" SACK blocks but that won't work. We'd still support "soft" SACKs for eternity and furthermore that's what anyone wishing to exploit this high CPU cost code path in the TCP stack would use. To be honest, nobody drops data in the way that SACK allows. It'd probably be pragmatic and reasonable just to start enforcing the "hard" semantics. And if they are violated (packets disappear from SACK blocks) we just RST the connection. We could add a ton of logging if such a thing happens, and we could investigate such cases. It's quite a mess and somebody has to mop it up. David Miller: Holidays...
I have to say that Christmas holidays are the best time to hack on things. Most of the yahoos are away on vacation and therefore the constant stream of distractions and really dumb emails in your inbox just aren't there. In short you can work on the things you always want to work on but never can because of time constraints. There really is an information overload problem. Today I want to talk about multiple return values. In short, don't do it :-) It really is a sign that things need to be redesigned, and on top of it multiple return values result in quite inefficient code. The most common case with C is when you need to allocate some memory and return it to caller, but if something goes wrong you want to give some error status too. So you end up with absolute crap like this:
    int create_foo(int flags, struct foo **foop)
or, even worse:
    struct foo *create_foo(int flags, int *errp)
I mean, that just doesn't deserve to live. First of all, the compiler has to allocate stack space for that "by reference" turd you had to add to the arguments to pass that second piece of information back. And that's slow, even on register starved cpus like x86 where stack accesses have been heavily optimized inside of the cpu to make up for that. Secondly, the semantics are not entirely clear. If an error happens for the second API above, will the function return value always be NULL? In the first API above, when an error happens will the "*foop" always be NULL'd out for me or is the caller expected to do that? Likewise, for the second API above, if there is no error and non-NULL is returned, can I depend upon "*errp" being set to zero? You don't know, because when you look at that interface definition you simply cannot tell. It's one big ambiguous ugly interface. For this particular case in the kernel we've settled on a set of macros that allow a pointer and an error to be returned in a single function return value.
Basically, it takes advantage of the fact that the range of negative error codes cannot ever be legitimate pointers. So the interface above looks like:
    struct foo *create_foo(int flags)
and the caller first checks:
    p = create_foo(flags);
    if (IS_ERR(p)) {
        err = PTR_ERR(p);
        return err;
    }
And if "IS_ERR" is not true, it is a non-NULL pointer and
no error happened.
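For readers outside the kernel, the pattern can be sketched in plain userspace C. This is a minimal illustration, not the kernel's actual header: the helper names mirror the kernel's `ERR_PTR`, `PTR_ERR` and `IS_ERR`, but the definitions below, the `MAX_ERRNO` value of 4095, and the body of `create_foo` are assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed bound: error codes occupy the top 4095 values of the address range. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the negative errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the range reserved for error codes. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct foo {
    int flags;
};

/* Hypothetical allocator: one return value carries pointer or error. */
struct foo *create_foo(int flags)
{
    struct foo *p;

    if (flags < 0)
        return ERR_PTR(-EINVAL);   /* bad argument */

    p = malloc(sizeof(*p));
    if (!p)
        return ERR_PTR(-ENOMEM);   /* allocation failed */

    p->flags = flags;
    return p;
}
```

A caller then follows exactly the check shown above: test `IS_ERR(p)`, pull the code out with `PTR_ERR(p)` on failure, and otherwise use `p` as an ordinary non-NULL pointer.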
It is completely unambiguous what the return values mean, and how they will be presented to the caller. Yes, it's true that looking at the function definition you can't "see" this. But once people are exposed to this pattern enough they pick it up, and it allows us to eliminate ambiguity and unclear semantics for these kinds of cases. Have a happy new year everyone. December 30, 2007Harald Welte: proprietary MiFARE [in]security finally falling
At a presentation entitled "Mifare - Little security, despite obscurity" at the 24C3, Henryk Ploetz and Karsten Nohl presented their findings on the proprietary Philips MiFARE classic RFID system. As everyone in the IT industry suspected, the level of security provided by such a cheap, low-gate and completely undisclosed system is in fact very limited. I'm particularly proud that this security research is exactly what Milosch and I originally wanted to enable by creating the OpenPCD and OpenPICC projects. We wanted to provide more accessible, open and documented tools for low-level access to 13.56MHz RFID systems. With a bit of luck, at some point in 2008, it should once again become clear that security by obscurity doesn't work. This lesson seems to be well-understood in the Internet world, but apparently has little penetration into the RFID sphere so far. There are still many proprietary systems whose security relies solely on secrecy. Once a single person reveals that secret, the system is broken. I can hardly imagine the amount of economic damage caused by the perpetrators of such systems. There are a couple of hundred million MiFARE classic tags on this planet, particularly in public transport ticketing and access control. The vendors of such systems should be blamed for their false claims. And anyone who bought it should be blamed for their blind belief in the claims of profit-oriented corporations without any independent validation or verification. Harald Welte: Dependency of essential Linux bluetooth features on dbus
Apparently I'm not the only one with outspoken criticism of the BlueZ dependencies on dbus. I do not want to debate the merits of a message bus system on any system (desktop or non-desktop) and neither do I want to start a debate on how efficiently dbus solves that problem. However, what I'm fundamentally opposed to is when basic interaction in a network or between a computing device and its peripherals depends on extensive userspace dependencies. Now you might argue that ipsec needs a userspace keying daemon, that routing protocols need a routing daemon, and 802.1x or WPA need a userspace daemon, too. This is not the point. There are very valid technical reasons for doing so, and nobody really proposes that such things should move into the kernel. Also, none of the above-mentioned programs have requirements on other userspace components aside from glibc or maybe some netlink specific library. Bluetooth however now requires dbus. At least it is almost impossible to do without. I have tried for endless hours and didn't make it work. Others apparently have similar problems. If people want to [d]bus-enable their kernel-related tools, let them do it. But please make it optional and don't depend on it. This is just not how things have been done in the Linux kernel world until now, and I don't think there has been any debate on whether we really want such a paradigm change yet. December 29, 2007Harald Welte: Personal reflection on the 24th annual Chaos Communication Congress
It's great to be at 24C3, the 24th incarnation of the Chaos Computer Club's annual congress in Berlin. In fact, this is my 10th anniversary at this congress, i.e. the first one I visited was 15C3. I ended up at 15C3 as somewhat of a coincidence by just following a fellow Linux hacker from the Linux User Group Nuernberg with whom I've since lost all contact. What's actually worth mentioning is that this is the first CCC congress that I visit as a pure guest. I have no lecture, and I am not actively involved with any of the things I have been involved with before, such as the video recording/streaming team or the Sputnik RFID location system. Interestingly, I found the first day much more tiring than usual, despite having slept more than in any of the previous years. Apparently the lack of constant adrenaline caused by last-minute-problem-solving has its impact. The congress is a lot of fun, I've been talking to many old friends, colleagues and fellow hackers from all over the world, involved in all of the projects and/or companies that I've had even remote contact with throughout that ten year time period. It's a very nice feeling. I doubt there is any other event or occasion where I would feel more at home than at this annual congress. This is my culture. This is where I belong. Here are people who understand, or rather: understood. December 14, 2007Harald Welte: HTC TyTN II / Kaiser doesn't look like a GPL violation!
There have been numerous rumors floating around the net that the HTC TyTN II (aka Kaiser) might be a GPL violation due to a number of strings in the firmware image referring to Linux and vmlinux. I've done some analysis on this subject, and posted my preliminary results in this posting to lkml earlier today. So as indicated, I do not see any reason to believe there is a GPL violation with regard to the Linux kernel in the MSM7200 modem side as used in the abovementioned device. So please stop those rumors now. I'm obviously not opposed to people being watchful and reporting/investigating potential GPL violations. But before you call it an actual violation, please make sure that you have some evidence! December 13, 2007Harald Welte: Final cleanup of OpenMoko Neo1973 kernel patches
I'm doing one final review+cleanup iteration for the OpenMoko Neo1973 GTA01 related kernel patches before pushing them for review later tonight or at some point tomorrow. The cleanups are mostly dead code removal, avoiding compile-time warnings as well as cosmetic cleanups such as adding MODULE_DESCRIPTION to all modules, and using consistent naming for files and driver names. GTA02 will have to wait a bit more. On the one hand, changes that the kernel developers want me to do on PCF50606 will likely appear in the PCF50633 driver, too. On the other hand, the entire Smedia Glamo driver core has not been polished yet. Harald Welte: Playing around with the HTC TyTN II / Kaiser
For reasons that I cannot yet disclose, I have obtained an HTC TyTN II (aka Kaiser). This is my first (and hopefully last) Windows Mobile based device. So far I've taken the device fully apart, removed all the shielding covers and taken high-resolution photographs of each and every part of the phone. As a result, I'm now aware of all the major components in the device, and I've started to do some data mining on those components. As everyone knows, HTC used a Qualcomm MSM7200 based chipset in this device. The MSM integrates both the GSM baseband (DSP+ARM9) as well as the application processor (ARM11) and many other things. What's less known is the further peripheral configuration.
For those interested, I'll go through my PCB photographs and will edit and publish them soon. I am now digging through all the various XDA/WM6 hacker information out there and trying to understand the various tools that can be used for further taking apart the software side. I've already managed to get into the bootloader, which apparently offers a standard USB serial emulation that can be accessed even from a Linux PC. Unfortunately the MSM7200 is a highly proprietary/closed chipset, and there is very limited public information available. I already ran into this while evaluating potential hardware for OpenMoko at some point in the past. I became curious about this MSM7xxx chipset family when it was first added to the ARM-Linux machine type registry many months ago. Anyway, meanwhile Google seems to be doing a lot using this chipset, as they have recently announced the availability of a linux-msm.git tree. The source code should document many things such as GPIO assignments and IRQs, and contain drivers for most of the hardware (on the application processor side). Now if some of you ask yourselves if I have turned my back on OpenEZX and OpenMoko: No, that's not true. I'm just looking at this for a very pecul |