Computer Corner II #31

The Computer Corner Take II (#31) by Bill Kibler


Building your own distro, step by step

The Why

It seems like I am always wondering if I should stop using my current distro and build my own instead. I have used "linux from scratch", "gentoo", "arch", and many others over the past 15 years or so, and I always run into the same problem: what to include. Just because I am unhappy with some new change of direction in a distro doesn't mean I dislike most of what they have done. So I usually just ignore the changes and keep doing what works. However, this time I have a clear project - generating a distro for the ARRL handbook - and as such can see just what needs to be removed or included. This step-by-step guide is what I did to generate the ARRL handbook distro.

This is a follow-on project from writing article #30 about why the ARRL needs to do their own linux distribution for inclusion with their handbook publication. In writing that article, several concerns cropped up that only the ARRL can solve by creating their own distro. They currently support only windows platforms, no Mac and no linux at all. The handbook and the articles in their magazines rarely use linux, and when they do, there is no clear direction about what to use. So my position is that the ARRL needs to select one set of linux tools and one distribution to enable both staff and readers to be successful in using linux.

In article #30 I covered some of the problems I had with several distros and why I felt they didn't work for the target group of users. You may have your own list of reasons why one distro is better than another, but the handbook's long production cycle requires that the distro not change more than once every two or three years. Its base repository of programs also needs to be large enough to support the projects. Lastly, you need a tool set to generate your distro. For me those concerns all point to building a "Debian Stable" distro. The following steps follow my actions in building the ARRL Debian Distro.

Step 1 - read the manual!

Over the years I have earned a reputation for solving problems that others have been unable to solve. Although I do have a better than average skill set, it is being pragmatic that allows me to succeed where others failed. My first step in being pragmatic is getting as much data as possible about the task or problem facing me. For this task it meant going to the web, tracking down the Debian live web page, and finding the live-build tool and its manual. I have used many of the Debian manuals and help pages and find them to be some of the best. The live-build manual is informative and provides good step by step instructions for generating your own distro. I suggest you read the whole manual first, as I did, before trying any of the tutorials.

As part of the first step of being pragmatic, I like to make sure I know my tools and that they all work correctly. That means following at least one of the tutorial examples from the manual and testing it very well. For me that means using QEMU and Virtual Box as well. Since I am going to be building many test versions and need to run them to help me make changes, I will need a good emulation setup that can use the ISO file directly without any CDRom burning. If you do not understand how to set up the emulation, then get the emulators' manuals and follow the examples.
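A minimal sketch of a helper for booting a freshly built ISO in QEMU straight from the file, with no disc burned. The function name boot_iso, the DRY_RUN switch and the 1024MB memory size are assumptions made for illustration; adjust them to your own setup.

```shell
#!/bin/sh
# boot_iso: start a live ISO in QEMU directly from the image file.
# Hypothetical helper; set DRY_RUN=1 to print the command instead of running it.
boot_iso() {
    iso="$1"
    mem="${2:-1024}"    # guest RAM in MB (assumed default)
    # Build the QEMU command line in the function's own argument list.
    set -- qemu-system-x86_64 -m "$mem" -boot d -cdrom "$iso"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Calling "DRY_RUN=1 boot_iso binary.hybrid.iso" only prints the command, which is a cheap way to check the options before starting the emulator. On a host with working KVM you would probably add "-enable-kvm" for speed.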

At this point I need to make it very clear that what we are about to do is for linux users who have a very good command line skill set. If you have just started using linux and haven't become a true linux convert yet, I suggest you wait until you have a better grasp of how linux works and what makes it different from windows. My article #3 - Linux 101 - points out some subtle differences between windows and linux that you might consider before starting here. Let me say that if you believe that you must re-boot the system after adding a new program to linux, you do not have an appropriate skill set for this task. For those lacking, I suggest you add to your skills by using as many linux distributions as you can get your hands on. Follow that by testing those distros using an emulator, both on your desktop and with vnc and ssh connections. There is a pretty good chance that things will go wrong and you will need the ability to troubleshoot linux without any desktop - learn how to switch to a terminal session and become root user. Know where the logs are kept and how to access them. Know where apt-get keeps its files. You will be using all these and more non-basic user activities as we move forward.

Step 1 ends when you are sure you know what you're going to do, that all your tools are working well, and that you have some ideas about testing and checking the results in such a way that you can see more than just major problems with the release. You need to have enough testing steps to be able to find minor issues with the built releases. Generating a release is not enough; you need to be able to test every aspect of the product as well.

Step 2 - The first test build

At this point I have read the manual once, found some concepts I need to review later when I start tweaking the package lists, and stopped at the first tutorial. I now have a good idea of how all the pieces work together and a weak idea of the changes I will need to make later. What I really don't know yet is what sizes my images will be, and just how long a build is going to take. I am not on a fast internet connection; in fact it is rather slow, but reliable. So my next step is doing tutorial 1 to see what is produced and how long it takes.

I created a "debian" directory to house everything in my "work" subdirectory. Over time my "work" directory has moved from its own hard drive, to an nfs mount point, and now back to a 1TB drive. I like separating my "work" from my "home" directory as it allows me to back up and manage the two paths differently. I feel that client data and project information is too important to put with my personal home data and thus created the "work" space. There have been times where I backed up the work data every evening if not more often. Clients can get pretty unhappy when you tell them your hard drive crashed and took all their work with it - so back up anything important and do it often. Keep in mind too that we will be creating lots of data every time we do a different image build. I feel this project might create 10 or 20GB of data very quickly.

So the first step goes like this:

> su
Password: xxxxxx
root> mkdir debian
root> chown kibler:users debian
root> exit
> cd debian
> mkdir tutorial1 ; cd tutorial1 ; lb config
[2014-01-24 16:39:08] lb config 
P: Creating config tree for a debian/wheezy/amd64 system
> su
Password: xxxxxx
root> lb build 2>&1 | tee build.log
[2014-01-24 16:06:37] lb build 
[2014-01-24 16:06:37] lb bootstrap 
P: Setting up cleanup function
[2014-01-24 16:06:37] lb bootstrap_cache restore
P: Restoring bootstrap stage from cache...
[2014-01-24 16:06:37] lb bootstrap_cdebootstrap 
[2014-01-24 16:06:38] lb bootstrap_debootstrap 
P: Begin bootstrapping system...
....
....
....
[2014-01-24 17:25:18] lb chroot_devpts remove
[2014-01-24 17:25:18] lb testroot 
P: Begin unmounting /dev/pts...
P: Begin unmounting filesystems...
P: Saving caches...
Reading package lists...
Building dependency tree...
Reading state information...
[2014-01-24 17:25:19] lb source

>

And a little over an hour later it is all done - your time will vary depending on internet speed. It created 23186 files in the "tutorial1" directory, and disk usage went from 108KB to 1.4GB. The cache topped out at 627MB (what got downloaded from the net), while the chroot space was 489MB. The hybrid ISO is 136MB, which is not bad for a command line only live distro. Let's see what tutorial 2 does, when we add the LXDE desktop and iceweasel.
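Numbers like these are easy to collect after every build. The following is a sketch only: build_report is a made-up name, and it assumes the cache and chroot directories and the ISO sit at the top of the build directory, as they did for me. The chroot is owned by root, so run it as root to get a complete count.

```shell
#!/bin/sh
# build_report: print the file count and disk usage of a live-build directory.
# Hypothetical helper; pass the build directory, default is the current one.
build_report() {
    dir="${1:-.}"
    echo "files: $(find "$dir" -type f | wc -l | tr -d ' ')"
    # Disk usage of the two big working areas, when present.
    for sub in cache chroot; do
        [ -d "$dir/$sub" ] && du -sh "$dir/$sub"
    done
    # Size of any ISO image produced by the build.
    for iso in "$dir"/*.iso; do
        [ -f "$iso" ] && ls -lh "$iso"
    done
    return 0
}
```

Running "build_report tutorial1" after each build, and saving the output next to build.log, gives a simple record for comparing one build against the next.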


> mkdir tutorial2
> cd tutorial2
> lb config
[2014-01-24 18:26:29] lb config 
P: Creating config tree for a debian/wheezy/amd64 system

> echo "task-lxde-desktop iceweasel" >> config/package-lists/my.list.chroot

> sudo lb build 2>&1 | tee build.log
[2014-01-24 18:27:39] lb build 
[2014-01-24 18:27:39] lb bootstrap 
P: Setting up cleanup function
[2014-01-24 18:27:40] lb bootstrap_cache restore
P: Restoring bootstrap stage from cache...
[2014-01-24 18:27:40] lb bootstrap_cdebootstrap 
[2014-01-24 18:27:40] lb bootstrap_debootstrap 
P: Begin bootstrapping system...
....
....
....
[2014-01-25 01:02:31] lb testroot 
P: Begin unmounting /dev/pts...
P: Begin unmounting filesystems...
P: Saving caches...
Reading package lists...
Building dependency tree...
Reading state information...
[2014-01-25 01:02:31] lb source

I watched the build and at one point it said 815 newly installed packages, and this was after a rather long list of already loaded packages. This is a result of loading the "task" package, which is not just one package but a collection of packages and metapackages that various groups have decided should go together to make the whole, in this case, LXDE experience or tool set. With more to download, the build took about 7 hours and put 60957 files in the tutorial2 directory structure. The directory grew to 5.8GB with a surprisingly usable image of 649MB, a chroot of 2.1GB and a cache of 1.1GB. It is still larger than I would like, but you can burn CDRoms using this image and have a little room to grow.
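If you kept the build log with "tee", you don't have to watch the build to catch that number. This sketch assumes apt's usual summary wording ("N newly installed") ends up in the log; new_packages is a name I made up.

```shell
#!/bin/sh
# new_packages: list every "N newly installed" count apt wrote to a build log.
# Hypothetical helper; pass the log file saved with tee.
new_packages() {
    grep -o '[0-9][0-9]* newly installed' "$1"
}
```

A build runs apt more than once, so "new_packages build.log" may print several lines; the large one is the package list being installed into the chroot.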

I now have two ISO images to test and compare. I also have on my build system, both QEMU and Virtual Box installed. For the first image, I used the command
> kvm -cdrom binary.hybrid.iso
from the tutorial1 directory. It came up and displayed a menu, at which I just hit return, and I was presented with a command prompt after a very short time. I ran a few commands and checked the number of packages installed - it was 167. I didn't see ssh installed, but I didn't notice any other important tools missing. It looked like a great minimum system, running the latest patched kernel and updated utilities - in this case release 7.3.
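One way to get a package count like that on the running live system is to count the lines of "dpkg -l" output that carry the "ii" (installed) status. A small sketch, with count_installed as an assumed name:

```shell
#!/bin/sh
# count_installed: count fully installed packages in "dpkg -l" output on stdin.
# Lines for installed packages start with the status flags "ii".
count_installed() {
    grep -c '^ii '
}
# On the live system:  dpkg -l | count_installed
```

Reading from stdin keeps the helper usable on a saved listing as well, which is handy when comparing two images.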

Since the only difference between the first and second tutorial build is adding lxde and iceweasel, we can assume for now that installation onto a hard drive will work the same for both images. So this next test is using the second build and Virtual Box. This requires more work to test, but all the steps are covered in the live-build manual, in section 4.6.2. I pointed one of my virtual box test sessions at the generated image and started it. The image was not as fast to boot as the base one, but still quick, and the lxde desktop was nicely laid out with a considerable number of usable menu items. I liked it enough that I copied the image to where I store the ISO images I download for testing, with the idea that I might burn a CDRom later for use on one of my other systems.

We now have a basic understanding of how the live-build system works, have tested two build outputs, and have some facts about size and programs selected. Tutorial 2, however, added too many unwanted packages, and before we start doing the real item, we need to learn how to manually adjust the package list. The next step needs to be re-doing tutorial 2 with a smaller selection of packages. I noticed when checking the task-lxde-desktop list of packages that only 4 of them were "depends", or must have. So for this test we need to remove the "task-" line and replace it with the four "depends" items.
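The "depends" entries of a task package can be pulled out on any Debian system with apt-cache. The filter below is a sketch that keeps only the hard dependencies and drops the Recommends and Suggests lines; hard_depends is a name chosen for illustration.

```shell
#!/bin/sh
# hard_depends: print only the "Depends:" entries from "apt-cache depends"
# output read on stdin, skipping Recommends and Suggests.
hard_depends() {
    # Alternatives are shown with a leading "|", so accept both spellings.
    awk '$1 == "Depends:" || $1 == "|Depends:" { print $2 }'
}
# On a Debian system:  apt-cache depends task-lxde-desktop | hard_depends
```

Keep in mind that each package it prints has dependencies of its own, so a short list here does not guarantee a small image.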

Section 6.1 of the manual explains how to use the auto update command set which we will use in step 3 and 4, but for now we will just edit the file created for tutorial 2, to add our own packages to the image. That file was "config/package-lists/my.list.chroot" and contained only the line "task-lxde-desktop iceweasel". Using vi, I edited the file to look like this "lightdm lxde task-desktop tasksel iceweasel". Not sure what the task-desktop will add, but hopefully enough to make it usable. So we then do "lb clean" and the "lb build 2>&1 | tee build_2.log" command line - note the "_2" added to the build log, so I can compare this build against the previous log if needed.

Not exactly what I thought would happen, but it worked. I expected to only get the LXDE desktop and instead I got both LXDE and GNOME. The download count went from the 800 range to the 1300 range, and the build took only 3hrs, as there were only 500 new packages between the two builds. The cache went from 1.1GB to 1.5GB, while the image grew to just over 1GB from the 118111 files in the 6.4GB directory. I tested the image in virtual box and the gnome desktop started, but with error messages. LibreOffice was loaded, which makes me think we hit some sort of recursive loop where one package included another package that included even more packages. Not good, but then you should expect a few failures or odd results as we learn the finer points of live-build and debian repositories.

Unwilling to leave things alone, I changed the config file to "lightdm lxde menu xorg xserver-xorg-input-all xserver-xorg-video-all librsvg2-common tasksel iceweasel" and re-did the build. It took 45 mins, with very little network downloading - mostly checking for updates, and produced a 345MB image. The directory shrank to 3.2GB, while the cache and chroot remained the same. The best part is when I tested it, everything worked fine with a good selection of menu items, nice desktop layout, a real keeper.

So this ends step 2 as we should know enough about the toolset to make better choices and selections. We have returned to the manual once or twice and thus know where to find the commands we need to use for the next step. Time to do the base package for real.

Step 3 - Selecting our base package list

At this point we have built several test images, tested them and have a good idea of our problems. The main problem was using the "task" selection of packages, as it produced way too many packages we don't want or need. We will need to be considerably more selective on what packages are included in our base selection. We must have a "desktop" and a few tools associated with it. However, a full office setup is not needed, but some programming tools will be needed - tcl/TK, perl, python? The controlling factor here is size against needs - it can't be too large or not have the tools we must have.

There are two ways to go for selecting the base list of packages - gradually adding one package at a time, or finding some Debian based distro and using their package list. While I was checking out puppy, I found that their download site had package lists associated with each release. You could download those and check them over for problems before doing a build. The problem to look for in any other distro's list is special packages they created to replace a normal debian package. Typically that is how these special distros come about: they take a plain Debian build and replace some packages with their own toolset. Finding those new or different packages is probably less work than building your list one package at a time. Keep in mind that you can test the source distro before using their package list and know that you like their selection. Trying it first will certainly cut down on what needs to be done to build your own distro.

In selecting the desktop, I feel it is important to keep things as close to a normal debian install as possible. We might not have all the packages, but if the user wants a fuller install, running apt and adding more packages is the solution. We don't want to try and have every package possible; let the user decide that later. Our objective is a select list of programs that run on a desktop that should be close to what they might load themselves. The Debian teams are considering changing from the gnome default to xfce. Both gnome and KDE have become leading edge and just too much for the average user. I use xfce over LXDE; both are good for small systems, but LXDE is slightly smaller. With a good final build of LXDE in hand, for this "how-to" I have decided to just add to the LXDE base already built and working instead of switching to xfce. The changes are minor, so you could do xfce on your own build with about the same results.

At this point we want to start using the auto build tools, as we will be redoing the build after every one or two package changes. The auto rebuild is designed for just these steps. So start by deciding what packages are missing. You will need ssh client and server, and vnc for remote desktop access. It seems perl and python are included, but not tcl/tk. For my ham projects I use ser2net, setserial, usbip, and gforth - I will skip the gforth for now, but the others need to be added. Most documents are in PDF format, so xpdf is a good option - there are other small packages to try. Remember that these packages are to support the user's skill set and as such synaptic package manager is probably a must have.
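When packages are added one or two at a time, it helps to keep the list file tidy. The sketch below appends names to the package list only if they are not already there. It assumes one package name per line, and add_pkg and the LIST variable are my own inventions; the package names in the usage note are my best guess at the Debian names, so check them with apt-cache before building.

```shell
#!/bin/sh
# add_pkg: append package names to a live-build package list, skipping any
# name that is already listed. Assumes one package name per line.
LIST="${LIST:-config/package-lists/my.list.chroot}"
add_pkg() {
    mkdir -p "$(dirname "$LIST")"
    touch "$LIST"
    for pkg in "$@"; do
        # -x matches the whole line, -F treats the name as plain text.
        grep -qxF -- "$pkg" "$LIST" || echo "$pkg" >> "$LIST"
    done
}
```

For example, "add_pkg openssh-client openssh-server ser2net setserial usbip xpdf synaptic" run from the build directory, followed by a rebuild. Running it twice leaves the file unchanged.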

We also need to consider how old a system we want to support. There may be a few 486 systems still running, but most of the recent work is from 686 and newer. In checking the information on linux-image-686-pae, it said the kernel supports, "Intel Pentium Pro/II/III/4/4M/D, Xeon, Core and Atom; AMD Geode NX, Athlon (K7), Duron, Opteron, Sempron, Turion or Phenom; Transmeta Efficeon; VIA C7; and some other processors". I did some checking and there are plenty of CPU types that might still be in use that are not 686 level. So support for older systems is "--architectures i386", and you can set the CPU level to "--linux-flavours 486 686-pae", which will load both kernel types for selection at boot or install time. By default my builds were amd64. I followed the manual's settings for changing to i386 and got error messages because I didn't use the purge command - see note below!

Note: if changing from the default amd64 to i386, you must use "sudo lb clean --purge" to remove all the amd64 packages and replace them with the new i386 packages. This will remove the cache directory and create a new one, so it is better to start a new directory than to change "flavours" in a major way.
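Putting those two options into an auto/config script means every "lb config" run picks them up. This is a sketch, assuming the "lb config noauto" form that the live-build manual describes for auto scripts; BUILD_DIR is a variable I introduced and it defaults to the current directory.

```shell
#!/bin/sh
# Create auto/config so every "lb config" run reuses the same options.
# BUILD_DIR is an assumed variable; it defaults to the current directory.
BUILD_DIR="${BUILD_DIR:-.}"
mkdir -p "$BUILD_DIR/auto"
# The quoted heredoc keeps "${@}" literal in the generated script.
cat > "$BUILD_DIR/auto/config" <<'EOF'
#!/bin/sh
lb config noauto \
    --architectures i386 \
    --linux-flavours "486 686-pae" \
    "${@}"
EOF
chmod +x "$BUILD_DIR/auto/config"
```

Each option line ends with a "\" so the command continues onto the next line. After this, running "lb config" in the build directory should apply the options, and the "sudo lb clean --purge" from the note is still needed when the directory already holds amd64 packages.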

After you start adding packages and testing the changes, you just keep doing it over and over again until you feel that you have a proper base system to work with. Expect that you will find missing packages as you do step 4, so don't fuss too much over finding every package you need for the base system. However it is important that you have enough packages to properly load and test the results.

Step 4 - Selecting the ARRL toolset

In step 3 we selected the packages that make up our base system and made sure the overall size would fit on a CDRom, with room left for the main reason for the disk, the ARRL toolset. At this point we could cause the overall size to exceed our maximum; if so, we will need to review the base packages to see what can be removed. Since I am not on the staff of ARRL, I don't know enough about their current tool usage to make any real guess as to which of the many Debian packages would work best for them. I need some other method of selecting packages, which is also a good process to use for anyone trying to select tools for their own distro. My first selection criterion will be support. Packages that are no longer supported, or that show no signs of recent updates, should not be selected. Keep in mind that you want a distro that has someone other than you keeping the tools current and secure.

We start by checking each of the tools for a given task in the repository index and following their links till we find either nothing or current support information. Try and find the latest "release" text file or log. There really is no quick way to do this, as some links might look great, only to lead to an update log whose latest entry is ten years old. Just because there is a Debian maintainer assigned to the package doesn't mean they fix program bugs - they may only make sure the source code compiles correctly on the current release and nothing more. You will need to dig deeper and find the actual state of the package and who, if anyone, fixes bugs or security leaks.
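A first rough check can be made from the command line before chasing links. The sketch below pulls the date of the newest entry out of a Debian changelog; last_upload_date is an assumed name. As noted above, this date only tells you when the Debian package was last uploaded, not whether anyone upstream is still fixing bugs.

```shell
#!/bin/sh
# last_upload_date: print the date from the newest entry of a Debian changelog
# read on stdin. Entry trailers look like " -- Name <mail>  Day, DD Mon YYYY ...".
last_upload_date() {
    # Strip everything up to the two spaces after the address, keep the date.
    sed -n 's/^ -- .*>  //p' | head -n 1
}
# On a Debian system:  apt-get changelog kicad | last_upload_date
```

A package whose newest entry is several years old is a good candidate for the deeper digging described above.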

Now if I were a staff member of the ARRL, or for that matter any group wanting to create their own distro, I would simply build a distro with all the possible tools and pass it around to other staffers for testing. Their feedback would then reduce the selection list, and I might redo the release process again till only one tool for each task is on the image. Another option would be to set up a linux server with all the choices and a common data storage location where examples of project data can go. Depending on how the server was set up, I would give each staff person a login and their own home location so they could do the testing on the server. I like the server idea, because as root, you can control the setup and what people see. If you have vmware or such, it is possible to use the image you created to provide complete personal systems on one server with control over the image that each person gets. These options are great for people who have windows only machines, as you add vnc or such and they log in from the comforts of old and funky windows.

I went through several add, test, subtract, re-do loops and came up with my list of packages. I discovered that several of the electronic apps were duplicating tools and as such ended up just using kicad. I figured a word processor would be needed to generate articles and thus selected abiword. I added midnight commander, as I have a friend that uses it all the time. Lastly I went through the menu and removed any programs that didn't appear on it. I figured if it isn't on one of the menu lists, the user will never see it or know it is there. The end size came out at exactly 500MB, which is more than my target of 400MB. However I feel the set of tools is about right, and with synaptic the user can add more programs from the many tabs to choose from. Clearly not ideal, but then without ARRL input and testing, it is not possible to be more inclusive.
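Checking each build against a size target is easy to script. This is a sketch with made-up names (iso_size_mb, fits_target); the 700MB default stands for a nominal CD capacity and is an assumption, so pass your own limit, such as my 400MB target.

```shell
#!/bin/sh
# iso_size_mb: print the size of a file in whole megabytes (1MB = 1048576 bytes).
iso_size_mb() {
    bytes=$(wc -c < "$1")
    echo $(( bytes / 1048576 ))
}
# fits_target: succeed when the image is at or under a size limit in MB.
# The 700MB default is a nominal CD capacity and is an assumption here.
fits_target() {
    [ "$(iso_size_mb "$1")" -le "${2:-700}" ]
}
```

For example: fits_target binary.hybrid.iso 400 || echo "over the 400MB target". Put at the end of a build script, this flags the build that pushed the image past the limit.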

You will find a link on the bottom of the page to this image and the two config files that controlled the build. Let us now talk about testing.

Step 5 - Testing the final product

At this point we now have our base package set and the ARRL toolset, and can build the image such that it will fit on a CDRom. The next step is making sure it works as we expect. If you plan on being able to install the whole package on old systems, you will need to find a few old systems to test it on. Most of my testing to this point has been using emulation, but now comes actual CDRom burning and firing up some old systems I have sitting around the place. We need to find some data to enter and test the results produced. Since we provided grig and fldigi, we will need serial cables and sound cables to check connections to ham rigs. Don't forget to test normal ssh and regular linux tasks - can you read a pdf file?

The testing might go great with no problems, or you might find you missed some tools. In either case, you just test till you're happy and can move on to the next step - other testers.

Step 6 - Finishing off the project

It all seems to work, so ready to ship? Well yes, but not to your intended user group. Along with documenting and backing up all the work you did to make the images, you need to find some friends to do more testing. You have a skill set that most likely is more advanced than the average user. Find some average users and get them to test your release; I guarantee they will find faults that you need to consider and address. It is pretty much like my days as editor and publisher of TCJ: no matter how much time I spent checking articles for errors, readers always managed to find some mistake or problem that slipped past my many reviews. Expect the same with this release - it has problems, you just haven't found them all yet. However, your users will.

Another area that needs work will be documenting what has been done. No live image can stand by itself - it needs documenting. You may have what you think is enough support information on the image, but you will still need on-line information that users can access before loading the disk. For a start you need to state the system level that the kernel supports and any special limits on what will work. How many languages and their special fonts are covered - if none, explain how to add more. There can never be too much documentation.

Only when you feel you have done all you can and your other testers haven't found any problems after the last update, try a small sample of intended users. Get some volunteers and have them go at the disk and tell you what went wrong. I have seen projects like this in which the actual users rejected the entire project. Your part of the project went great, it is just somewhere along the line the project drifted too far in the wrong direction. Take their comments, replot your course and try again, knowing this time what steps are needed and that they can achieve a goal without too much effort. For the ARRL project, if the staff isn't part of the initial setup, chances of it failing will be very high. The staff is a must have part of the early testing if it has any chance of working out as planned.

Final Thoughts

This project was done for several reasons: to produce an ARRL handbook disk; to prove to myself that it can be done and without great effort; to guide and show others that the tools are out there and almost anyone can master them. I did say you should be a good command line junkie, and that still stands if you want to produce a great special disk. As you saw using the tutorial steps, anyone could create a disk by following their steps. Would it be good or just a selected release? That really depends on how much experience you have using linux and understanding some of its finer points.

There were several steps and options I didn't show or use that can enhance the overall appeal of your release - they include launching an app when it boots up. If this is a special release, don't use the default desktop images - create your own. For the ARRL handbook, you would need to have all the handbook data in the user directory. I didn't show that either, but it would need to happen. So it seems what I did should be just enough for you to get a good start, and I hope it is as much fun for you as it was for me.

Update

A quick update - I changed the lxde options to xfce and 45 minutes later had a working xfce version of the arrl disk. The image is below with the config settings as well. While adding this update I realized I hadn't covered how long it took me to learn and build these releases. I started on the 24th and completed the lxde on the 27th. So we are talking about 4 days with 4 to 6 hours of work each day. Keep in mind that a lot of the time was spent writing the articles while my slow internet connection downloaded the packages and updates. If you just skimmed the manual and followed the steps for tutorial2, I suspect you could have an image in a few hours - depending on your internet download speed. Simply put - it is simple....

1/29/14 - discovered the install filesystem on the disk is empty and attempting to install Debian doesn't work. Since I started using the auto/config, I simply had to add this to the file - "--debian-installer live" and don't forget the "\" line extender. Do a "sudo lb clean" and "sudo lb build" - expect it to take much longer as it downloads all the "udeb" packages for doing the install. I re-did the images and included them below. The default is "not" to have any install system in the image. While testing the install, I created two virtual box hard drive images that are included below. These 1.8GB vbi files have an "arrl" login user, with both users - "arrl" and "root" - using "arrl1pwd" as the password. Just create a new virtual box setup and point it at the vbi file. It will work on all platforms running virtual box, which means windows and mac as well.

Ubuntu vs Debian

I have made some remarks about Ubuntu and indicated that I strongly think you should be using Debian stable instead of Ubuntu, even their LTS releases. Why so? First off, the use of "testing" by Ubuntu means that they started out with a version that most likely has some bugs in it. In the snapshot of the testing branch from which they build their distro, I know there will be problems. Over the last 6 months I have loaded the Debian testing branch and gone back to stable, as I just had too many problems with testing to keep on using it. I saw some nice new features, but there were just too many items still not right. So when you use Ubuntu, you are hoping that all those "just not right" problems have been caught and fixed. The real issue is not knowing whether they found and fixed them all or not.

As to finding the "issues", I must say I don't like the release cycle they use. I feel they are rushing to meet deadlines and as such end up ignoring problems that should be addressed. When Debian releases their stable release, you know that it works, is stable on 12 different hardware platforms, comes in CD and DVD sizes, with some special images for network installs, live disks, USB sticks and tiny CDRoms. This is after at least 2 years of testing and upgrading on all those hardware platforms. I suspect that testing on some of those odd platforms exposes bugs that got past the x86 folks and thus provides a slightly higher level of stability than just fixing a few bugs found in testing. I also found that I had considerable problems keeping track of which releases are which. The short development cycle creates so many versions that a new user can easily get confused. Keep in mind too that "LTS" only means they will keep trying to find the bugs; it doesn't mean it is really stable, more of a "trying to keep the un-stable release from crashing".

Another reason to do Debian over Ubuntu is the desktop versions. Originally ubuntu images would fit on CDRoms and thus you had a separate CDRom for each possible desktop. Ubuntu has long since become a DVD only release, and yet there are separate and, as it turns out, totally independent keepers of the other desktops. I went to the xubuntu site to check on the xfce version, only to find that their releases are not in sync with regular ubuntu; it looked like they were two to three months behind. This says to me that they have to change lots of code once the regular ubuntu is released. Their support life cycle is considerably shorter than that of regular ubuntu. With Debian, stable means that all versions of the debian desktops are stable and released as part of the stable images. I especially like their CDRom set, as the first disk contains the desktop of choice. The next two CDRoms contain what is considered to be about the same as what you would get if you had used the DVD version. So you have two options in the installation process: use CDRom #1 for a network assisted install - where during the install it will download all the other packages for a full install - or use CDRom disks #2 and #3 as well for a complete non-network installation. For older and slower network connections I find these options very important to have. By the way, with Debian you can load all their desktop packages and switch between them when logging in - a great way to find out which desktop really fits your work style.

So to conclude, I recommend Debian stable over ubuntu or any of its derivatives for their absolute insistence on making stable truly stable. They have a simpler development cycle and fewer variations to confuse users. Their on-line resources are probably the largest and most comprehensive available, covering a wide selection of hardware platforms and supported packages. Don't get me wrong, ubuntu has done a great job of getting people to use linux and should be commended for doing that. However I feel strongly that people should know what they are getting and be using Debian stable and not a variation of Debian testing.

Links of interest...

My config/package-lists/my.list.chroot for LXDE.
My config/package-lists/my.list.chroot for XFCE.
My auto/config - no install.
My auto/config with install.
An example of ARRL LXDE i386 500MB image - no install.
An example of ARRL XFCE i386 515MB image - no install.
An example of ARRL LXDE i386 660MB image - with install.
An example of ARRL XFCE i386 670MB image - with install.
An example of ARRL LXDE i386 1.8GB virtual box disk image.
An example of ARRL XFCE i386 1.8GB virtual box disk image.
List of sections in "wheezy" repositories.
Debian's Live CD WIKI page - great place to start!
Debian Live systems main page - docs and image builder.
Debian Live-build manual page - all you need to know about live images!