You know how sometimes in life you start to do one thing only to realize that you have to do something else first? Most of the time, it’s not even stuff we bother to acknowledge. But every once in a while you start to feel like you’ve built up quite a queue of tasks and may even lose sight of what started you on that queue. This is one of those times for me.
I got a new Dell E6410 laptop and wanted to install MacOS on it. (Don’t bother asking why, you’ll just have to accept that I wanted it on the laptop.) As you may have read here previously, I’ve been able to install the MacOS on two prior Dell desktops but this would be my first laptop attempt and it promised to be a little rocky because the new laptop includes some shiny new hardware that the hacintosh community hasn’t had time to build drivers for yet. The following is a transliteration of the mental queue for this process.
I tried running the MacOS install disc in the E6410. As expected, it failed.
I tried using a generic installer disc to preload as I did with my Dell E520. As expected, that failed too.
I found a preload image that included a kernel set up for the i5 CPU in my laptop and that did boot. Then the install disc ran fine right up to the end when it gave the failure notice, as happened to me with the Dell Dim 4700 install. And like that install, this one seemed to work fine after rebooting.
Right away, the big issue with the resulting MacOS on the Latitude E6410 was lack of networking. Both the wireless and the wired connections were not working. I briefly played around with trying to get the wireless to work but realized that I was pretty far away from that (and that better minds are on that case), so I switched to trying to get the wired working. My Intel 82577LM integrated networking is pretty similar to the Intel 82566MM for which there is already a kext so I tried using that. Nothing.
Next, I tried tweaking that kext. I discovered that my device ID 10EA was not in the Info.plist file so I simply added it. The result was that the device was discovered and SystemProfiler showed that I did have an Ethernet card and I even got a note in the Network Settings that I had a new Ethernet card. But the activity light on the port had gone dark and the Network Settings couldn’t communicate over the port. Something seemed wrong with the kext that was causing the computer to recognize it but fail to use it.
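As a rough sketch of why adding the ID made the device show up: a PCI device announces itself with a 16-bit vendor ID and a 16-bit device ID, and a driver is only offered the hardware when the IDs it declares include that pair. The C below is plain userland code that illustrates the idea; it is not the kext or the real matching machinery. The function names `pci_primary_id` and `driver_matches` and the `0x1234` placeholder device ID are made up for the example. Only Intel's vendor ID (0x8086) and the 10EA device ID are real values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack the two IDs the way the first PCI configuration register holds
 * them: device ID in the high 16 bits, vendor ID in the low 16 bits. */
uint32_t pci_primary_id(uint16_t vendor, uint16_t device)
{
    return ((uint32_t)device << 16) | vendor;
}

/* Hypothetical stand-in for "does the driver's list of supported IDs
 * contain the device that was found on the bus?" */
bool driver_matches(const uint32_t *supported, size_t count, uint32_t found)
{
    assert(supported != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (supported[i] == found)
            return true;   /* the driver gets a chance to start */
    }
    return false;          /* the driver is never asked to start */
}
```

With a list that only holds the placeholder ID, `driver_matches` returns false for `pci_primary_id(0x8086, 0x10EA)`; once 10EA is added to the list it returns true. Matching only decides whether the driver gets to try, though. It says nothing about whether the driver code can actually talk to the card, which fits what I saw: recognized, but not working.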
I fiddled around with the DSDT for a while, figuring that maybe I had some sort of hardware conflict going on. Unfortunately, I don’t know much about DSDT fiddling, and besides, the fact that the activity light went out made me think that device communication was active, if wrong.
So I figured it must be a problem with the network driver – something about that old kext isn’t working with my new network card – a reasonable expectation. So I wondered what about the kext might be failing. The kext is pretty simple, though, and besides the Info.plist, there’s really just the driver binary. But there were a number of different Intel82566MM.kext versions floating out there. I tried them all just to see if any of them did anything different. Nope.
Interestingly, this driver was open source and was still available on googlecode. I downloaded the driver code to investigate. Yep, it’s C code. That’s a problem for me since I’ve never bonded with C. (I like telling the computer what to do but I think it is the computer’s responsibility to manage memory, not the programmer’s.) But I can read enough of it to get a sense for what was going on. And most interestingly, a lot of the code was from Intel.
I poked around on Intel’s web site and found where I could download drivers. No surprise that there were no downloads for Macs. (Most Mac drivers come directly from Apple since most of the hardware is sold by Apple, even if it was designed by Intel originally.) But in addition to drivers for Windows, Intel also offers drivers for Linux. And Linux and MacOS are both Unix-like systems. So I could then see how the author of the original Intel82566MM kext had used the Linux download to build a Mac kext.
I had to go deep to find the right Intel driver for my 82577LM since Intel’s site didn’t have everything connected properly, but I did eventually find the Linux code for download. (And funny that a week after I downloaded the code, Intel came up with a new version of the driver code and fixed the connection to the 82577LM page!) Since the time of the 82566MM driver, the code had branched to cover a couple of different categories of Intel E1000 drivers. The base E1000 cards go with one driver and the extended E1000e cards now go with another driver (as well as a third driver for a few special case cards). Interestingly, the 82566MM and 82577LM are both in the E1000e category, so maybe I could download the new Linux code, port it to a kext as had been done previously, and cover all of the current E1000e cards? Worth a shot.
Things were off to a rocky start right away. I was disappointed that the open source on googlecode didn’t include an Xcode project that I could pick up and use. So I decided to stick with the command line setup that it was ready for. I was able to download the googlecode project and run make without any problem. So I re-made the existing kext – not what I was going for nor that exciting, but at least a baby step. Quick test and yep, same result as the kexts I had downloaded. It would have been nice if making my own kext worked, but at least I was able to confirm that I was making the kext the right way and duplicating the results.
Next, I looked at the new Intel driver code. Whoa. Lots of differences: file names, structures, etc. My aim was to repeat the same process that the previous port had followed. So, first, the ethtool, netdev, and kcompat code was tossed. Next, I had to replace all the type definitions of the sort “s32” with the Mac equivalents like “SInt32”. Then came the hard part of reconciling the changes by looking at the kext source, looking at the original Intel source that was used for that kext (another download from Intel to get the older version), and looking at the new Intel source. It was quite a puzzle to put together. And I spent days watching the compiler errors gradually go down from thousands to a handful, then back up to hundreds once I moved on to the next level of compilation. Just getting the code to compile at all was a huge effort but I remained hopeful that the compilation was the hard part and that once deployed, the driver would just work.
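For what it’s worth, the type renaming can also be handled with a small alias header instead of search and replace. That is a different technique from what the earlier port (and I) actually did, so treat the C below as a sketch of the alternative, not as the kext’s code. The `SInt32`-style typedefs here are stand-ins built on `<stdint.h>` so the example compiles anywhere; in a real kext those names come from Apple’s headers. `combine_words` is a hypothetical helper that exists only to show Linux-style code compiling unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: stand-ins for the Mac type names so this builds outside
 * of a kext project. Apple's headers provide the real definitions. */
typedef int32_t  SInt32;
typedef uint32_t UInt32;
typedef uint16_t UInt16;
typedef uint8_t  UInt8;

/* Linux-style names mapped onto the Mac names, so the Intel source can
 * keep using s32/u32/u16/u8 without being edited line by line. */
typedef SInt32 s32;
typedef UInt32 u32;
typedef UInt16 u16;
typedef UInt8  u8;

/* Fail the build right away if a width is not what the driver expects. */
static_assert(sizeof(s32) == 4, "s32 must be 32 bits wide");
static_assert(sizeof(u16) == 2, "u16 must be 16 bits wide");
static_assert(sizeof(u8) == 1, "u8 must be 8 bits wide");

/* Hypothetical helper written with the Linux names. */
u32 combine_words(u16 high, u16 low)
{
    return ((u32)high << 16) | low;
}
```

The trade-off is that the aliases keep the ported files closer to Intel’s originals, which should make it easier to compare against a newer Intel release later, at the cost of one more header to maintain.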
Yes, I know that was naive. The driver did not just work. In fact, it caused a kernel panic. Considering that I had been able to compile the old kext and deploy it and get the same results as the kexts I downloaded, I felt pretty certain I had the kext creation part of the task figured out. That meant it was the code content itself where I was having the problem. And as I wrote above, that’s a problem since I’m not a C guy and I am most definitely not an expert on writing drivers for networking hardware. But I do know how to put debug statements in code so that’s what I did. I found that things worked pretty well right up to the kernel panic (duh!) and I found the last line of code that was called before the crash. I determined that the crash was caused by trying to execute a method on a null object (Java-speak, but probably translates to C). The object shouldn’t be null so the real question is why is the object null. Well, I have no way of figuring that out since I can’t quite figure out what about the code defines any of the objects.
It seemed like a debugger would be a good thing but how could I debug on a computer that was definitely going to crash? And of course, I’d been doing my development on a different system (where I now have 8 “Spaces” filled up with windows that represent my various steps in this queue!) that I couldn’t jeopardize crashing. Besides, any debugging would be better done on a real Mac and not a Hac. Lucky for me I do have a real Mac, even if it is old and dusty. I pulled the G4 out of the closet (proving again that holding on to old hardware can be beneficial!) and set it up.
Not wanting to make a mess of the system drive that runs 10.4 on that system, I unplugged that drive and plugged in the original smaller drive that came with the G4 (which I had been using as sort of an “extra space” when the G4 was last my main system). I put the MacOS 10.6 disc in the DVD drive and booted up looking forward to doing a Snow Leopard install on an actual Mac, even though it was old. The disc couldn’t be read and I just got a “?” icon on the screen. I tried it in my external DVD drive but same result. Hmm. I put the OS 10.4 disc in and it worked – that’s the way I had installed 10.4 on that system before so it was just confirmation that I wasn’t losing my head.
I did some googling and discovered that the partition format of the new 10.6 install DVD is GPT which old Mac hardware cannot read. Further research confirmed that it’s because 10.6 has dropped support for PPC hardware, including my G4. That means that 10.6 cannot run on my G4.
But 10.5 can run on it and, being the good hacintosher that I am, I had bought the $129 OS retail install disc. I tried using the 10.5 install disc. The computer did read the disc and did start the installer. And then it quickly told me that the hardware was not supported. What? More googling turned up that the 10.5 installer requires a processor speed of 867 MHz, and even though I had two 450 MHz processors, the installer rejects it. Granted, you can’t just multiply processor count by the speed to get an effective speed, but still, this seemed like an arbitrarily high bar to set for requirements.
And it turns out I am not alone in thinking that. Many people thought that requirement was silly and came up with ways of hacking the install to remove the requirement. But even more clever is that somebody came up with a way to change the NVRAM to convince the computer that the processor speed was higher. And that trick turned into the LeopardAssist application. Only catch is that to run that application, you need to have a working 10.4 install. And since my drive was empty, I’d need to start with a new install of 10.4. When 10.4 finished installing, I installed LeopardAssist.
Now, I was able to install 10.5 on the G4. When 10.5 finished, I removed LeopardAssist (a feature of the LA installer on the Options panel of the first page of the installer) so that the settings would match that of the real hardware. The last thing I need here is to introduce some new level of confusion to my process!
Okay, so back to basics. Let’s create a kext as Apple tells me to and see if I can do the remote debugging like they want. Now using Xcode, I created the kext with 10.5 as a target and was sure to include PPC as an architecture. I was able to build the sample kext according to Apple’s guide. The next step is to confirm that you can do remote debugging so I followed Apple’s guide on that. Everything went okay up to the point where it had me deliberately cause a kernel panic on the target computer. It wouldn’t work! The instructions said to add code to cause the panic, but the kext was not doing a match such that the kext that contained the code to panic would be called. The kext just loaded nicely on my G4 and that was it.
I found Apple’s code for InstantPanic which was an even simpler kext to cause a panic and that did do what it was supposed to. So I know I can make the G4 panic, but why not with my kext? I managed to modify the Info.plist properties to get the kext to start matching. The problem turned out to be in following instructions for the debugging that were set up for 10.6 but testing on 10.5. Apparently, you do matching somewhat differently on 10.5 than on 10.6. I read through the same Apple guide for 10.5 which I was only able to find in a Google Docs cache. Rather than using IOMatchCategory as was described for 10.6, I found this code worked for 10.5 (Apple doc on PCI matching):
Then the matching started and then the kext loaded and then the panic was called. I also discovered that rather than following the “sabotage the kernel extension” instructions, it was much easier to just add a call to the function “panic()”, which is much clearer. Don’t know why the guide was so complex there.
Anyway, now that I finally had a successful panic, I could call gdb from the development computer and connect to the now panicking target computer and I could indeed navigate the stack. Neat.
Back to my real code. The first step was to stuff the Intel driver code into a new Xcode project. As mentioned above, the googlecode download did not include a project. So I created a new one and added code to it. I started over with new copies of the Intel code to make sure I didn’t carry along any problems from my previous failed attempt. Besides the same issues I had initially (“s32” changed to “SInt32”, etc.), I had all kinds of build errors. Digging through the Xcode build options bit by bit (no pun intended), I finally got the build settings to match the way they were defined in the original googlecode Makefile. And yes, I finally got it to compile without errors.
Deploying that to the G4, it worked fine. No kernel panic at all. So maybe I have it fixed? I tried putting the kext on the E6410 and it kernel panicked right away. Why? Ah, I see. Since the G4 doesn’t have that network card, the matching didn’t start on the G4; but since the E6410 does have a match, the kext started and caused the panic. While I could monkey with the match parameters in the Info.plist to force matching, that’d mean that the driver would try to run with hardware that doesn’t exist and that could cause different problems that result in a kernel panic making it impossible to know what is a “real” problem with my code and what is just a result of forcing it on the G4. In other words, since the G4 does not contain the subject network card (and it can’t just be moved since it is a laptop network card, not a PCI card) the G4 will not be of any use to development of this kext. Which means the effort of setting up remote debugging was a waste as was figuring out why I couldn’t cause a panic. And it means that debugging would need to be done without any kind of network.
Or maybe I could set up a network using Bluetooth or FireWire? First, I tried FireWire by hooking together my development Mac and my target E6410. The dev Mac recognized the FireWire connection as an IP connection and gave me a new network connection. The target E6410 did not, however. I don’t know why but it didn’t even recognize that a connection had been made. Could have something to do with the way the FireWire port was being recognized by MacOS on the Dell hardware. Or it could be that the hardware just wouldn’t support it. Either way, it wasn’t going to work without adding some huge additional effort to my queue. I’m already deep enough in the queue that I run the risk of the queue toppling so I had to move on.
On to the Bluetooth. I was able to get the integrated Bluetooth in the E6410 to borrow the network connection from the development computer and I was even able to ping the E6410 target from the dev computer. Cool! Then I loaded the kext, caused the panic, and boom, the Bluetooth shut off. I guess since Bluetooth networking shuts down during a kernel panic, it isn’t an option for remote debugging either. That means no remote debugging on this laptop.
Which means I’m back to the brute force debugging. And I’ll need to be able to copy my code from my dev Mac to my test machine without a network. To help with this, I found a handy trick for mounting a USB drive from single user mode.
Now, what about the code was causing the kernel panic? Going way back above, you’ll recall that I suspected an object wasn’t being defined, but why not, and how would I fix it? I stared at the code for hours and eventually figured it out: the new version of the Intel code had added a “switch” based on MAC type, and with no default case in the switch and no matching MAC type, nothing was being defined. But why no MAC type? Because the Intel-provided netdev.c had been updated to include the new card definitions, but I had not ported that data from netdev.c to the kext’s replacement cpp code. When I did that, I was able to get further. I came across another similar issue and fixed that the same way. Then the driver loaded with no kernel panic. And even more surprising, the kext actually worked!
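The failure pattern is easier to see in miniature. The C below is a simplified sketch of it, not the Intel code: the `mac_type` values, the `mac_ops` struct, and the function names are all hypothetical. It shows a switch that fills in a function pointer per MAC type, plus the two guards that would have turned my kernel panic into an ordinary error: a default case, and a null check before the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical MAC types; the real driver has its own, longer list. */
enum mac_type { MAC_OLD_CHIP, MAC_NEW_CHIP, MAC_UNKNOWN };

/* Hypothetical table of per-chip operations. */
struct mac_ops {
    int (*init_hw)(void);
};

static int init_old(void) { return 0; }
static int init_new(void) { return 0; }

/* Fill in the operations for this MAC type.
 * Returns 0 on success, -1 if the type is not recognized. */
int setup_mac_ops(enum mac_type type, struct mac_ops *ops)
{
    assert(ops != NULL);
    ops->init_hw = NULL;
    switch (type) {
    case MAC_OLD_CHIP:
        ops->init_hw = init_old;
        break;
    case MAC_NEW_CHIP:
        ops->init_hw = init_new;
        break;
    default:
        /* Without this branch an unrecognized type falls out of the
         * switch with init_hw still NULL and nobody notices. */
        return -1;
    }
    return 0;
}

/* Refuse to start rather than call through a null pointer. */
int start_driver(enum mac_type type)
{
    struct mac_ops ops;
    if (setup_mac_ops(type, &ops) != 0 || ops.init_hw == NULL)
        return -1;
    return ops.init_hw();
}
```

Here `start_driver(MAC_NEW_CHIP)` returns 0 and `start_driver(MAC_UNKNOWN)` returns -1. In userland, calling a null function pointer is a crash; in a kext it takes the whole machine down, which is why a loud failure at setup time is so much kinder than the silent one I was chasing.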
Fortunately, the process of resolving the kernel panic was all that was needed to get the card working. Actually, the card could only get a 10 Mbit connection initially but that was an easy fix to bring the card up to 1000 Mbit – more code needed to be ported from netdev. And then I had my new working kext.
The only thing left to do was write about it here and publish it to the hacintosh community.
Note that in this queue, I had taken a huge detour about kext debugging and setting up the G4. In writing about it now, it sure seems like a waste of time. But at the time, I was pretty well convinced that remote debugging was going to be the key to it all working since I didn’t think I stood a chance of figuring it out on my own (wrong!) and since I was confident other problems would follow (right) that I also wouldn’t be able to fix (wrong!). Therefore it made sense to attempt to get remote debugging working. And even though I knew I wouldn’t get the driver running properly on a computer that didn’t include the network card, I did not realize how the kext matching works so I didn’t know that I’d never get the kext loaded in a valid way on the G4 if it didn’t have the right hardware. In other words, while setting up the G4 for remote debugging didn’t directly help me get the kext working, it did help me understand the matching mechanism, which did prove useful later in finalizing the Info.plist. And the process of setting up debugging helped me to understand the build architecture concepts. Yes, I could have taken a more direct path to those lessons but that’s the nature of a task queue like this and the reason for the expression: what doesn’t kill you makes you stronger!