A bit more online reading and I found more configuration settings that look good when working with containers. So have now compiled the 4.4.45 kernel with these turned on.
Typing from my handwritten notes, here they are:
Device Drivers ->
[*] Support for multiple instances of devpts CONFIG_DEVPTS_MULTIPLE_INSTANCES
[*] Network device support ->
-*- Network core driver support ->
<*> Virtual ethernet pair device CONFIG_VETH
<*> MAC-VLAN support CONFIG_MACVLAN
In my notes, I have written that for the multiple devpts, mount with "mount -o newinstance ..."
CONFIG_VETH supports a local ethernet tunnel.
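A quick way to confirm that a kernel build really has these three options turned on is to grep its config file. This is only a sketch: the function names are made up for illustration, and it assumes you have a plain-text config to hand (the .config from the kernel source tree, or /proc/config.gz unpacked with zcat, which only exists if the kernel was built with CONFIG_IKCONFIG_PROC).

```shell
#!/bin/sh
# kconfig_enabled OPTION FILE: succeeds if OPTION is built in (=y) or a module (=m).
# (Hypothetical helper name, not part of any existing tool.)
kconfig_enabled() {
    grep -q "^$1=[ym]\$" "$2"
}

# check_container_config FILE: report the three options from the notes above.
check_container_config() {
    cfg=$1
    for opt in CONFIG_DEVPTS_MULTIPLE_INSTANCES CONFIG_VETH CONFIG_MACVLAN; do
        if kconfig_enabled "$opt" "$cfg"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
        fi
    done
}
```

For example, "zcat /proc/config.gz > /tmp/config" followed by "check_container_config /tmp/config".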
A progress note: my "next generation" Quirky, codenamed "Easy", boots to the desktop. Details to follow.
I will post this while it is fresh in my mind!
For years I have been providing Quirky as a ready-made image for an 8GB (or greater) USB Flash stick or SD-card.
To cater for the fact that "8GB" sticks actually have quite different amounts of memory, I create two partitions a bit smaller than the capacity of the drive. The first is a 512MB fat32 partition and the second is a f2fs or ext4 partition that does not quite fill the drive.
After having written Quirky to the stick, I then copy it back as an image file, only copying to the end of the second partition.
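The copy-back step comes down to one piece of arithmetic: sectors are numbered from zero, so copying "to the end of the second partition" means copying the partition's end sector plus one. A minimal sketch, where the function name, the end-sector value and the output filename are all just illustrative:

```shell
#!/bin/sh
# image_sectors END_SECTOR: how many sectors to copy so that the image stops
# right after the last sector of the second partition (sectors count from 0).
image_sectors() {
    echo $(( $1 + 1 ))
}

# Hypothetical usage, assuming 512-byte sectors, the stick is /dev/sdb and
# fdisk reports that the second partition ends at sector 15000000:
#   dd if=/dev/sdb of=quirky.img bs=512 count="$(image_sectors 15000000)"
```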
Now for the first problem. I am using a GUID partition table, and the way they work is there is a primary GPT at the start of the drive, and a secondary (backup) GPT at the very physical end of the drive.
In my scenario, I am creating an image file with the secondary GPT missing.
Now, a user downloads my image file and writes it to an 8GB stick. If it is a new stick, or one that has been wiped, no problem, Linux will see only the primary GPT and use that.
The problem arises if the stick has been used before, for Quirky or some other Linux distro, in which case it will have a GPT at the end of the drive, or rather most likely will. Note, GPT is usually required for booting on computers with UEFI-firmware.
Linux, and indeed the 'fdisk' utility, both get confused here. This is where it gets murky. Today I discovered that if the image file is written to a drive that has larger capacity than the one I created the image from, all is well. Linux, and fdisk, determine that the secondary GPT is faulty, and use the first one. Fine, that is what we want.
The murkiness comes in when I write the image file to a drive that is smaller, maybe still a nominal 8GB but with less capacity than mine.
When I did this, and replugged the drive, no partition icon showed up on the desktop. Hmmm, I looked in /sys/block/sdb (my flash stick was sdb) and there was a sdb1 but it was reported as not having a filesystem, and the size was completely wrong.
"fdisk -l /dev/sdb" reported that the primary GPT is faulty, and it is using the secondary GPT!
Why? The primary GPT has a pointer to where the secondary GPT is supposed to be. If that pointer is somewhere within the drive, that is case A. If that pointer is beyond the physical end of the drive, that is case B. In the latter case, the Linux kernel and fdisk conclude that the primary GPT is invalid.
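To see which case applies, the pointer can be read straight out of the primary GPT header. In the GPT layout the header sits at LBA 1, and the 8-byte little-endian "backup LBA" field is at offset 32 within it. The sketch below assumes 512-byte sectors, GNU od, and a little-endian host such as x86; the function names are mine, for illustration only.

```shell
#!/bin/sh
# gpt_backup_lba IMAGE_OR_DEVICE: print the backup-header LBA recorded in the
# primary GPT header. Byte offset = 512 (LBA 1) + 32 (field offset) = 544.
# Assumes 512-byte sectors and a little-endian host (od uses host byte order).
gpt_backup_lba() {
    od -A n -t u8 -j 544 -N 8 "$1" | tr -d ' '
}

# gpt_case IMAGE_OR_DEVICE TOTAL_SECTORS: print A if the pointer lands inside
# a drive of TOTAL_SECTORS sectors, B if it points beyond the end.
gpt_case() {
    backup=$(gpt_backup_lba "$1")
    if [ "$backup" -lt "$2" ]; then
        echo A
    else
        echo B
    fi
}
```

For a real stick, the total sector count could come from "blockdev --getsz /dev/sdb", so "gpt_case quirky.img $(blockdev --getsz /dev/sdb)" would tell you in advance whether writing that image to that stick is likely to hit the problem.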
This is incredible, but does seem to be the situation.
The new Easy Linux that I am developing, has only one 519MB fat32 partition, total image size of 520MB. If I create this on a flash stick that is smaller than what any user will have, all will be well. I could create it on a 1GB drive, if I had one. I have to start asking around, see if someone has an old one, that is not yet broken.
Easy Linux is actually intended to run on a Flash stick as small as 2GB (linuxcbon will be happy!), though 4GB or more is more useful.
The way I am designing it, at first bootup it will create a ext4 partition to fill the drive, and at the same time create a correct secondary GPT. This will happen automatically at first bootup.
That's the plan anyway.
I might as well give this effort a name, calling it "Easy Containers". I am having a go at creating containers for Quirky, using grassroots utilities, rather than the heavy-duty packaged techniques such as LXC or Docker.
Thanks to jamesbond and heaps of posts on the Internet, the basics of Easy Containers are looking good. Well, it is fun anyway.
The container is to be a layered filesystem, using aufs or overlayfs, with the operating system SFS file underneath, and the rw layer either a folder, tmpfs or a partition. The SFS would be your Quirky, Puppy, or Fatdog .sfs, straight from the live-CD.
Layered root filesystem
Setting up the layers is straightforward, for example using overlayfs:
# mount -t squashfs -o loop,noatime q.sfs q_ro/ro1
# mount -t overlay -o lowerdir=q_ro/ro1,upperdir=q_rw/rw1,workdir=q_rw/work1 overlay q_top1
Where q_ro/ro1, q_rw/rw1, q_rw/work1 and q_top1 are just folders.
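The folders have to exist before the two mounts are run. A small sketch for creating them (the function name is hypothetical). One thing worth knowing is that overlayfs requires upperdir and workdir to be on the same filesystem, which is why both live under q_rw here.

```shell
#!/bin/sh
# ec_make_dirs [BASE]: create the four folders used by the mounts above.
# q_rw/rw1 (upperdir) and q_rw/work1 (workdir) must share a filesystem.
ec_make_dirs() {
    base=${1:-.}
    mkdir -p "$base/q_ro/ro1" "$base/q_rw/rw1" "$base/q_rw/work1" "$base/q_top1"
}
```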
After doing this, I made some changes on top, that is, in ./q_top1:
With Quirky, /sys and /run do not exist, so created them.
I want to avoid any "mount --bind ..." or "mount --rbind ...", so instead of doing that for /dev, I created static device nodes in q_top1/dev -- you can copy these out of initrd-tree/dev in Woof.
Don't copy /dev/* from the host system, it has all sorts of stuff mounted. I also manually created folders /dev/pts and /dev/shm.
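If Woof is not to hand, an alternative to copying the nodes (not what I did above) is to create them with mknod. The sketch below only prints the commands, so it can be inspected first and then piped to sh as root. The choice of nodes is my guess at a bare minimum; the major and minor numbers are the standard Linux ones. The function name is made up.

```shell
#!/bin/sh
# ec_dev_nodes [DEVDIR]: print (do not run) commands that would create a
# minimal static /dev. The node list is an assumed minimum, adjust to taste.
ec_dev_nodes() {
    dev=${1:-./q_top1/dev}
    while read -r name major minor mode; do
        echo "mknod -m $mode $dev/$name c $major $minor"
    done <<EOF
null 1 3 666
zero 1 5 666
full 1 7 666
random 1 8 666
urandom 1 9 666
tty 5 0 666
console 5 1 600
EOF
    echo "mkdir -p $dev/pts $dev/shm"
}
```

Review the output of "ec_dev_nodes ./q_top1/dev", then run "ec_dev_nodes ./q_top1/dev | sh" as root to actually create the nodes.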
The online advice is to execute "mount --rbind /dev ./q_top1/dev" prior to the chroot, and I don't know what the downside is with just having static device nodes. A device node that is in-use in the host, what of that? -- having a copy of the node in the rootfs, is that unusable? In what situation would this matter?
chroot into rootfs
To chroot into q_top1, just do this:
# cp -f /etc/resolv.conf ./q_top1/etc/resolv.conf
# unshare -piumU --fork --mount-proc --map-root-user --setgroups=deny env -i TERM=xterm DISPLAY=:0 /bin/busybox chroot ./q_top1 /bin/sh
Then inside the chroot, do this:
# mount -t proc proc /proc
# mount -t tmpfs shmfs /dev/shm
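The "-piumU" is rather cryptic, so here is the same unshare invocation spelled out with long options, wrapped in a function. This is just a sketch: the function name and the EC_DRY_RUN variable are my own inventions for illustration. Note that the network namespace (-n, --net) is not in the list.

```shell
#!/bin/sh
# ec_enter [ROOTFS]: same as the unshare line above, with long option names.
#   -p --pid   -i --ipc   -u --uts   -m --mount   -U --user
# If EC_DRY_RUN is set (hypothetical variable), print the command instead
# of running it.
ec_enter() {
    rootfs=${1:-./q_top1}
    ${EC_DRY_RUN:+echo} unshare --pid --ipc --uts --mount --user \
        --fork --mount-proc --map-root-user --setgroups=deny \
        env -i TERM=xterm DISPLAY=:0 \
        /bin/busybox chroot "$rootfs" /bin/sh
}
```

Running "EC_DRY_RUN=1; ec_enter ./q_top1" shows what would be executed without entering the container.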
You can then run leafpad, geany, rox, seamonkey, etc. However, there is a bug, the first time an X app is started, it crashes with this error message:
"BadShmSeg (invalid shared segment parameter)"
Thereafter, X apps start OK.
The CLI full-screen 'mp' text editor, that uses ncurses, works. As it is not an X app, it does not have that crashing problem.
It is interesting that ":0" works inside the chroot, some online docs state that won't work, and you need to use other methods, such as the following...
This is probably a better way to do it, as I aim to make the chroot environment more isolated. It is odd, when I tested this, the above-mentioned "BadShmSeg" crash did not occur, yet just now did it the same way, and that error is back. So, there is a mystery here.
I read if you remove the "-nolisten tcp" on the Xorg invocation line in /usr/bin/xwin, and run "xhost +localhost", you can then set DISPLAY=localhost:0 inside the chroot filesystem.
Two problems with this. Firstly, it is considered to be a security risk, and xhost is considered to not be a very secure safeguard. Second, I could not get it to work, the value of DISPLAY is reported as invalid inside the chroot environment.
However, there is another, secure way of doing it. You need the 'socat' utility, which is available as a DEB if you are running an Ubuntu- or Debian-based puplet. I found out how to do it here:
Create a little script, I have called it 'ec-chroot':
#!/bin/sh
# Start the TCP-to-X11-socket bridge only if socat is not already running.
if ! pidof socat >/dev/null; then
 # -ly causes log to syslog.
 socat -ly -d -d TCP-LISTEN:6000,fork,bind=localhost UNIX-CONNECT:/tmp/.X11-unix/X0 &
fi
#needed for internet access...
cp -f /etc/resolv.conf ./q_top1/etc/resolv.conf
#--map-root-user needed if have -U, otherwise env will fail.
unshare -piumU --fork --mount-proc --map-root-user --setgroups=deny env -i TERM=xterm DISPLAY=localhost:0 /bin/busybox chroot ./q_top1 /bin/sh
Once inside the chroot, run these:
# mount -t proc proc /proc
# mount -t tmpfs shmfs /dev/shm
...and leafpad starts without crashing, or rather, it did first time I tested this, not today!
SeaMonkey works too, and accesses the Internet, except that it churns out dbus errors. I tested with starting udevd beforehand, but no good. SM still runs. I could compile SM without dbus support, to avoid the problem. Or, could figure out why dbus doesn't work.
I have tested running two containers simultaneously, both with the underlying q.sfs. Got them running right now, one is running geany, the other leafpad, no problem with these windows.
Just an observation, not quantified, but I think there is a slight startup delay of the X app when using the DISPLAY=localhost:0 method.
What to do about that "BadShmSeg"? This question has come up many times on the Internet, but haven't found a definitive solution. My wild guess is that there is some shared-memory that Xorg is using in the host, that is not valid in the chroot environment, and Xorg detects that when the first X app is run. There are many online posts of a fix for Qt-based apps, using "export QT_X11_NO_MITSHM=1", however, that is not a fix, it is just avoiding the problem.
A less-than-satisfactory solution would be to run a little do-nothing X app on first entry to the chroot environment.
It is most pertinent to now ask, how secure is this? I have used 'unshare' to unshare the PID, IPC, UTS, mount and user namespaces (but not the network namespace, which is why Internet access still works), have not run "mount --bind" or "mount --rbind", and "env -i" has removed most environment variables. What more can we do to improve security?
Here is something interesting to think about. Inside the container:
sh-4.3# mkdir /mnt/sda2
sh-4.3# busybox mount -t ext4 /dev/sda2 /mnt/sda2
mount: permission denied (are you root?)
...which is good from a security viewpoint. But why doesn't it work, and can it be got around?
This is such an interesting topic, so I will start a Forum thread for feedback.
Ha ha, I thought that jamesbond, one of the main guys behind Fatdog, would have already studied containers and simple ways to implement them!
I have already conducted simple experiments on "grassroots", or "build-your-own" containers, using 'env', 'unshare' and 'chroot', haven't posted to the blog yet, as need to do more investigation.
However, jamesbond has already done all the work on a "grassroots" implementation. This is an email he sent me today:
RE: Container - you may want to see how Fatdog supports containers, here: http://distro.ibiblio.org/fatdog/web/faqs/sandbox.html
I have merged the standard sandbox into Woof-CE, so newer puppies should have sandbox.sh built-in.
As you are probably aware by now, "container" is full of hype and stuff. For Windows users these kind of things are probably "new" and "interesting", but for Linux they're re-packaged old stuff. A container is just "chroot on steroid" and there are many ways to achieve it.
The most basic way you can just use "unshare" command from recent core-utils (in fact, sandbox.sh from Fatdog uses this if it's available instead of just standard "chroot"). If you want to run a process inside an existing namespace, you can use "nsenter" (also from core-utils).
Of course, the basic tools for Linux container is LXC, and this is what I use for sandbox-lxc.sh/rw-sandbox-lxc.sh.
This script and all other Fatdog scripts are available here: http://distro.ibiblio.org/fatdog/packages/710/fatdog-scripts-710.0-noarch-1.txz
For reference, there is a webpage that has info on a grassroots build-your-own approach, using 'unshare', 'env' and 'chroot':
Note, 'unshare', 'env' and 'chroot' are all busybox applets, though "full" versions are available elsewhere.
My venture into containers continues. Yesterday I posted some preliminary notes, and a quick look at Firejail:
LXC is, as far as I can determine, the officially supported Linux kernel mechanism for containers. The website for LXC is here:
None of my previous kernels for Quirky have namespaces and cgroups support. For my experiments with namespaces and cgroups, I have compiled the 4.4.44 kernel with these settings:
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_MEMCG is not set
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CHECKPOINT_RESTORE is not set
Note that for cgroups, I only enabled CONFIG_CGROUP_DEVICE, as that is the only one that interests me, for now. Was that a bad decision? Maybe should have enabled more, but can do so in future.
The Ubuntu DEB has a huge number of dependencies, so I compiled lxc version 2.0.6 from source, using this very cutdown config:
# ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --build=x86_64-pc-linux-gnu --with-distro=slackware --disable-lua --enable-bash --disable-python --disable-selinux --disable-apparmor
This is the report:
- compiler: gcc
- distribution: slackware
- init script type(s):
- rpath: no
- GnuTLS: yes
- Bash integration: yes
- Apparmor: no
- Linux capabilities: yes
- seccomp: no
- SELinux: no
- cgmanager: no
- lua: no
- python3: no
- examples: yes
- API documentation: no
- user documentation: no
- tests: no
- mutex debugging: no
- Logs in configpath: no
After installation of LXC, I ran "lxc-checkconfig", and it reported:
--- Namespaces ---
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled
Multiple /dev/pts instances: missing
--- Control groups ---
Cgroup namespace: required
Cgroup device: enabled
Cgroup sched: missing
Cgroup cpu account: missing
Cgroup memory controller: missing
Cgroup cpuset: missing
--- Misc ---
Veth pair device: missing
Advanced netfilter: enabled
FUSE (for use with lxcfs): enabled
--- Checkpoint/Restore ---
checkpoint restore: missing
File capabilities: enabled
What is not shown above, is that the line "Cgroup namespace: required" has the "required" in red colour text, indicating that something is amiss.
After googling around, I couldn't determine the exact cause of this "required", except for lots of people asking the same question, but did find this statement "This one should be fine to ignore", here:
Anyway, a quick little test, to see if can create a basic container. It has to be created from files from the host system, and for this there are templates:
# ls /usr/share/lxc/templates
lxc-alpine lxc-centos lxc-fedora lxc-oracle lxc-sshd
lxc-altlinux lxc-cirros lxc-gentoo lxc-plamo lxc-ubuntu
lxc-archlinux lxc-debian lxc-openmandriva lxc-slackware lxc-ubuntu-cloud
lxc-busybox lxc-download lxc-opensuse lxc-sparclinux
I will give the busybox template a go:
# lxc-create -n mycontainer -t busybox
lxc-create: lxccontainer.c: do_create_container_dir: 972 No such file or directory - failed to create container path /var/lib/lxc/mycontainer
lxc-create: tools/lxc_create.c: main: 318 Error creating container mycontainer
...hmmm. I manually created /var/lib/lxc, then tried again:
# lxc-create -n mycontainer -t busybox
setting root password to "root"
'dropbear' ssh utility installed
Yes, works, and it is even chrootable:
# chroot /var/lib/lxc/mycontainer/rootfs /bin/sh
bin etc lib mnt root selinux tmp var
dev home lib64 proc sbin sys usr
However, there are a lot of lxc-* utilities, so can use those to get into my container.
Using this webpage as a getting-started guide:
It seems that we have to "start" a container first, before can log into it:
# lxc-start -n mycontainer -d
lxc-start: tools/lxc_start.c: main: 360 The container failed to start.
lxc-start: tools/lxc_start.c: main: 362 To get more details, run the container in foreground mode.
lxc-start: tools/lxc_start.c: main: 364 Additional information can be obtained by setting the --logfile and --logpriority options.
Using the foreground option:
# lxc-start -n mycontainer -F
lxc-start: cgroups/cgfs.c: cgfs_init: 2359 cgroupfs failed to detect cgroup metadata
lxc-start: start.c: lxc_spawn: 1093 Failed initializing cgroup support.
...cgroups. I am at the bottom of the learning curve with cgroups, so had better readup on that next. Obviously, there is something I have to initialize.
Well, at the bottom of the learning curve with namespaces and containers also!
But, one step at a time, will get there.
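As a first step in that reading-up, here is a sketch of two checks that seem relevant to the "failed to detect cgroup metadata" message: which cgroup (v1) controllers the kernel reports as enabled, and whether any cgroup filesystem is mounted at all. The function names are mine, and whether a missing mount is actually the cause of the lxc-start failure is only a guess at this stage.

```shell
#!/bin/sh
# cgroup_controllers [FILE]: list controllers the kernel reports as enabled.
# /proc/cgroups columns: subsys_name hierarchy num_cgroups enabled
cgroup_controllers() {
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}

# cgroup_mounted [FILE]: succeeds if a filesystem of type "cgroup" is mounted.
cgroup_mounted() {
    awk '$3 == "cgroup" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# If nothing is mounted, a v1 "devices" hierarchy could be mounted by hand,
# as root (untested guess at what LXC is looking for):
#   mount -t tmpfs cgroup_root /sys/fs/cgroup
#   mkdir -p /sys/fs/cgroup/devices
#   mount -t cgroup -o devices cgroup /sys/fs/cgroup/devices
```

With only CONFIG_CGROUP_DEVICE enabled, I would expect "cgroup_controllers" to print just "devices".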