Kerrighed Trunk 4977 Installation on Debian GNU/Linux

Basic research on cluster computing with Kerrighed involves five steps: installing the system and applications, building the cluster, testing it, profiling it, and comparing the results.

Requirements

Hardware

  • At least two x86 machines connected to a network
  • Storage media with enough space to be shared with the cluster

Software

  • Debian GNU/Linux 6.0 (squeeze)
  • Kerrighed from Subversion trunk revision 4977. The Linux kernel 2.6.20 and Kerrighed sources are included.
  • Access to a Debian package repository, local or over the network/Internet
  • Root access or sudo on the system, for package installation and system modification
  • Additional and development packages: build-essential, bzip2, rsync, xmlto, automake, libtool, pkg-config, lsb-release, libncurses5-dev 1)
  • kernel-package, if you want to package the kernel as a Debian package

Compilation and Installation Steps

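  1. Verify the checksum of the source tarball
    sha256sum -c krg-20090203.tar.bz2.sha256
  2. Change directory to /usr/src/
    cd /usr/src/
  3. Extract the source tarball
    tar jxf /home/stwn/krg-20090203.tar.bz2
  4. Enter the source directory
    cd krg-20090203
  5. Generate the configure script
    ./autogen.sh
  6. Configure the source for your system. Avoid --enable-tests, it is broken (the command and output shown here were captured with tests enabled)
    ./configure --enable-tests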
    ********************************************************************
      Kerrighed tests configuration is now complete
    ********************************************************************
    
        - apps                               : yes
        - proc                               : yes
        - ktp                                : yes
        - benchmark                          : yes
    ********************************************************************
      Kerrighed configuration is now complete
    ********************************************************************
    
        - Sources dir            : /usr/src/krg-20090203
        - Kernel source dir      : /usr/src/krg-20090203/kernel
        - Kernel version         : 2.6.20-krg (already patched)
        - Kernel configuration   : none. Run manually 'make *config' in kernel source tree
        - Kerrighed module       : yes
        - libkerrighed           : yes
        - Kerrighed tools        : yes
        - Kerrighed tests        : yes
    
    ********************************************************************
  7. Save the default configuration for Kerrighed
    make defconfig
  8. Configure the kernel if you need to add network drivers used in your cluster
    cd kernel
    make menuconfig

    I built the drivers for the network cards I use in the cluster, such as sis900 [*] and e1000e [*], into the kernel, and enabled as modules [M] the other network cards available in 2.6.20 that are not selected by default. The options are under Device Drivers - Network device support - Ethernet (10 or 100Mbit) | Ethernet (1000 Mbit). FIXME

  9. Compile the kernel
    cd ..
    make kernel

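    or, to build Debian packages with make-kpkg instead (run from the kernel directory, without the cd ..):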

    sudo vim /etc/kernel-pkg.conf
    make-kpkg --initrd --revision=2.6.20-svn-022009-1 kernel_image kernel_headers
  10. Install kernel
    make kernel-install

    or, if you built the Debian package with make-kpkg:

    cd ..
    sudo dpkg -i linux-image-2.6.20-krg_2.6.20-svn-022009-1_i386.deb
  11. Compile the Kerrighed modules, tools and libkerrighed
    make

    NOTE: the Kerrighed modules are compiled for kernel version 2.6.20-krg. If you append a version string with make-kpkg (--append-to-version), change the version in kerrighed-2.2.1 to match.

  12. Install the Kerrighed tools and libkerrighed
    make install
  13. Remove the default Debian kernel, if you want to :)
    dpkg -P linux-image-2.6.26-1-686 linux-image-2.6-686
  14. Reboot
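
After the reboot it can help to confirm that the machine really came up on the Kerrighed kernel. The following is a minimal sketch, not part of Kerrighed: is_krg_kernel is a hypothetical helper name, and it assumes the kernel release string carries the -krg suffix shown in the configure output above (2.6.20-krg).

```shell
# Hypothetical helper: succeed if a kernel release string looks like a
# Kerrighed kernel, i.e. it contains "-krg" (assumption based on the
# "2.6.20-krg" version shown in the configure output).
is_krg_kernel() {
    case "$1" in
        *-krg*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Check the running kernel and report the result.
release=$(uname -r)
if is_krg_kernel "$release"; then
    echo "running Kerrighed kernel: $release"
else
    echo "not a Kerrighed kernel: $release"
fi
```

If the second message appears, the boot loader probably still starts the stock Debian kernel.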

Configuration

One computer stores the system and application images, and also hosts the storage media and other things such as the repository, wiki, demos, and results.

Head Node

Kerrighed

  • Configure network card, if you haven't configured it yet
    vim /etc/network/interfaces
    auto lo eth1
    iface lo inet loopback
    
    iface eth1 inet static
            address 192.168.0.11
            netmask 255.255.255.0
  • vim /etc/default/kerrighed
    ENABLE=true
    # If kerrighed has this feature
    LEGACY_SCHED=true
  • vim /etc/fstab
    configfs        /config         configfs defaults       0       0
  • mkdir /config
  • /etc/kerrighed_nodes
    session=1
    nbmin=2
    192.168.0.11:11:eth1
    192.168.0.12:12:eth0
    192.168.0.13:13:eth0
    192.168.0.14:14:eth0
    192.168.0.15:15:eth0
  • Set boot parameter in /boot/grub/menu.lst, add “session_id=1 node_id=1”
    title           Debian GNU/Linux, kernel 2.6.20-krg
    root            (hd0,3)
    kernel          /boot/vmlinuz-2.6.20-krg root=/dev/sda4 ro session_id=1 node_id=1
    initrd          /boot/initrd.img-2.6.20-krg
  • Set up /etc/hosts. NFS and MPI resolve host names to IP addresses, so make sure it is set correctly.
  • Reboot
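
The /etc/kerrighed_nodes listing above follows a regular pattern, so it can be generated instead of typed. This is only a sketch under stated assumptions: gen_krg_nodes is a hypothetical helper, and it assumes the node id is the last octet of the IP address, as in the listing (192.168.0.11 gets id 11). It prints to standard output so the result can be reviewed before it is installed.

```shell
# Hypothetical helper: print a kerrighed_nodes file in the layout used above.
# Usage: gen_krg_nodes SESSION NBMIN IP:IFACE [IP:IFACE ...]
# Assumption: node id = last octet of the IP address.
gen_krg_nodes() {
    session=$1
    nbmin=$2
    shift 2
    echo "session=$session"
    echo "nbmin=$nbmin"
    for entry in "$@"; do
        ip=${entry%%:*}       # text before the first ':'
        iface=${entry#*:}     # text after the first ':'
        node_id=${ip##*.}     # last octet of the address
        echo "$ip:$node_id:$iface"
    done
}

# Example: head node on eth1, one compute node on eth0.
gen_krg_nodes 1 2 192.168.0.11:eth1 192.168.0.12:eth0
```

Redirecting the output to a scratch file and comparing it with the existing /etc/kerrighed_nodes is a safe way to try it.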

TFTP & PXE Server

  • Install atftpd and syslinux 2)
    apt-get install atftpd syslinux
  • mkdir /srv/tftpboot
  • cp /usr/lib/syslinux/pxelinux.0 /srv/tftpboot/
  • cd /srv/tftpboot/
  • mkdir pxelinux.cfg
  • vim pxelinux.cfg/default
    timeout 5
    # prompt 1
    
    default kerrighed
    
    label kerrighed
    kernel vmlinuz
    append initrd=initrd root=/dev/nfs nfsroot=192.168.0.11:/NFSROOT/kerrighed,rsize=4096,wsize=4096 ip=dhcp ro session_id=1
    
    label local
    localboot 0

    Remove the initrd option if a node hangs with the error message “Wait for root filesystem”. The message comes from the initramfs; I do not know whether it is a bug or something else.

  • ln -s /boot/vmlinuz-2.6.20-krg /srv/tftpboot/vmlinuz
  • sudo ln -s /boot/initrd.img-2.6.20-krg /srv/tftpboot/initrd
  • vim /etc/default/atftpd
    USE_INETD=true
    OPTIONS="--tftpd-timeout 300 --retry-timeout 5 --mcast-port 1758 --mcast-addr 239.239.239.0-255 --mcast-ttl 1 --maxthread 100 --verbose=5 /srv/tftpboot"
  • vim /etc/inetd.conf
    tftp            dgram   udp4    wait    nobody /usr/sbin/tcpd /usr/sbin/in.tftpd --tftpd-timeout 300 --retry-timeout 5 --mcast-port 1758 --mcast-addr 239.239.239.0-255 --mcast-ttl 1 --maxthread 100 --verbose=5 /srv/tftpboot
  • /etc/init.d/openbsd-inetd restart
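
Before booting a compute node, it may be worth checking that the TFTP root contains every file created in the steps above. The sketch below is an illustration only: check_tftproot is a hypothetical helper, and the file list (pxelinux.0, pxelinux.cfg/default, vmlinuz, initrd) is taken from this section.

```shell
# Hypothetical helper: report which of the files set up above are missing
# from a TFTP root. Prints one line per missing file and returns non-zero
# if anything is missing.
check_tftproot() {
    root=$1
    missing=0
    for f in pxelinux.0 pxelinux.cfg/default vmlinuz initrd; do
        # -e follows symlinks, so a dangling vmlinuz or initrd link is
        # reported as missing too.
        if [ ! -e "$root/$f" ]; then
            echo "missing: $root/$f"
            missing=1
        fi
    done
    return $missing
}

# Example: check the directory used in this section.
check_tftproot /srv/tftpboot || echo "TFTP root is incomplete"
```

If you removed the initrd option from pxelinux.cfg/default, a missing initrd is expected and can be ignored.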

DHCP Server

  • apt-get install dhcp3-server
  • vim /etc/dhcp3/dhcpd.conf
    ddns-update-style none;
    option domain-name "lskk.ee.itb.ac.id";
    default-lease-time 600;
    max-lease-time 7200;
    
    log-facility local7;
    
    option dhcp-max-message-size 2048;
    # use-host-decl-names on;
    # deny unknown-clients;
    deny bootp;
    
    next-server 192.168.0.11;
    
    subnet 192.168.0.0 netmask 255.255.255.0 {
            range 192.168.0.12 192.168.0.15;
            filename "/srv/tftpboot/pxelinux.0";
            option root-path "192.168.0.11:/NFSROOT/kerrighed";
    }
  • /etc/init.d/dhcp3-server start

NFS Server & NFSROOT

  • apt-get install unfs3, or alternatively apt-get install nfs-kernel-server (if unfs3 is already running, stop it first with /etc/init.d/unfs3 stop)
  • apt-get install debootstrap
  • mkdir /NFSROOT
  • debootstrap lenny /NFSROOT/kerrighed http://192.168.0.10/stable
  • chroot /NFSROOT/kerrighed/
  • passwd
  • mount -t proc none /proc/
  • apt-get install dhcp3-common nfs-common nfsbooted 3)
  • vim /etc/fstab
    /dev/hda        none   swap   sw            0 0
    none            /proc           proc     defaults      0 0
    none            /var/run        tmpfs    defaults      0 0
    none            /var/lock       tmpfs    defaults      0 0
    none            /var/log        tmpfs    defaults      0 0
    none            /tmp            tmpfs    defaults      0 0
    none            /dev/pts        devpts   defaults      0 0
    configfs        /config         configfs defaults      0 0
    192.168.0.11:/media/storage   /media/storage   nfs     rw,hard,nolock      0 0
    192.168.0.11:/NFSROOT/home    /home            nfs     rw,hard,nolock     0 0
  • mkdir /config
  • mkdir /media/storage
  • sudo chown -R stwn /media/storage/ (or change to group render?)
  • vim /etc/hosts
    127.0.0.1       localhost
    192.168.0.11    krg-01
    192.168.0.12    krg-02
    192.168.0.13    krg-03
    192.168.0.14    krg-04
    192.168.0.15    krg-05
  • sudo cp -r /usr/src/* /NFSROOT/kerrighed/usr/src/
  • chroot /NFSROOT/kerrighed/
  • apt-get install busybox initramfs-tools klibc-utils libklibc libvolume-id0 udev
  • apt-get install automake make build-essential
  • cd /usr/src/krg-20090203/
  • make install
  • dpkg -i linux-image-2.6.20-krg_2.6.20-1_i386.deb
  • adduser stwn
  • sudo cp /etc/kerrighed_nodes /NFSROOT/kerrighed/etc/
    session=1
    nbmin=2
    192.168.0.11:11:eth1
    192.168.0.12:12:eth0
    192.168.0.13:13:eth0
    192.168.0.14:14:eth0
    192.168.0.15:15:eth0
  • vim /etc/exports
    /NFSROOT/kerrighed 192.168.0.0/24(ro,async,no_root_squash,no_subtree_check)
    /media/storage 192.168.0.0/24(rw,async,wdelay,no_root_squash,no_subtree_check)
    /NFSROOT/home 192.168.0.0/24(rw,async,wdelay,no_root_squash,no_subtree_check)
  • mkdir /media/storage
  • mkdir /NFSROOT/home
  • sudo /etc/init.d/nfs-kernel-server restart
NFS tries to resolve IP addresses to host names when /etc/hosts has entries for them, for example:
    /krg-system/ krg*(rw,no_root_squash,no_subtree_check,sync,fsid=1)
Whatever you do on the network boot server system, do it on the NFSROOT as well.
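
Because the same /etc/hosts entries are needed on the head node and inside the NFSROOT, generating them once avoids typos. The following is a rough sketch: gen_hosts is a hypothetical helper, and the naming scheme (prefix plus a two-digit counter starting at 01) is an assumption taken from the krg-01 to krg-05 listing above.

```shell
# Hypothetical helper: print hosts entries for a run of consecutive addresses.
# Usage: gen_hosts NETWORK FIRST_OCTET LAST_OCTET PREFIX
# Assumption: hosts are named PREFIX-01, PREFIX-02, ... in address order.
gen_hosts() {
    net=$1
    first=$2
    last=$3
    prefix=$4
    echo "127.0.0.1       localhost"
    n=1
    octet=$first
    while [ "$octet" -le "$last" ]; do
        printf '%s.%s    %s-%02d\n' "$net" "$octet" "$prefix" "$n"
        octet=$((octet + 1))
        n=$((n + 1))
    done
}

# Example: the five nodes used in this document.
gen_hosts 192.168.0 11 15 krg
```

The output goes to standard output; review it before copying it to /etc/hosts and /NFSROOT/kerrighed/etc/hosts.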

Compute Node

Boot with PXE or gPXE, whichever suits your network card. Download a compiled gPXE/Etherboot image from rom-o-matic.net.
Network card drivers: tg3, sis900, rtl8139

Testing

Kerrighed

groupadd nobody
fork-test

Cpuburn

apt-get install cpuburn
chroot /NFSROOT/kerrighed/
apt-get install cpuburn
burnMMX & # 3-4 times :-)

Blender

  • apt-get install blender
  • chroot /NFSROOT/kerrighed
  • apt-get install blender
  • render script
  • example scene model
  • open the model file, set the output directory for the render results
Blender+OpenMP
copy install/ blender to $HOME
apt-get install libjpeg62
mencoder -mf on:w=640:h=480:fps=12 -ovc copy -o output.avi \*.jpg

MPI

  • Setting up ssh in head-node and system inside NFSROOT
    ssh-keygen
    cp .ssh/id_rsa.pub /NFSROOT/kerrighed/home/stwn/.ssh/authorized_keys
    chroot /NFSROOT/kerrighed/
    su stwn
    ssh-keygen
    <Ctrl-D>
    cp /NFSROOT/kerrighed/home/stwn/.ssh/id_rsa.pub /home/stwn/.ssh/authorized_keys
  • Make sure /etc/hosts lists the hosts and their IP addresses; host names are resolved during mpirun
  • Set variable P4_RSHCOMMAND to ssh
    P4_RSHCOMMAND=ssh
  • Create a machine list file
    krg-node0
    krg-node1
  • apt-get install mpich-bin libmpich1.0-dev
  • mpicc mm-mpi.c -o mm-mpi
  • Run MPI program with mpirun
    mpirun -np 4 ./Pi
  • cp id_rsa.pub authorized_keys
  • sudo cp /home/stwn/.ssh/authorized_keys /NFSROOT/home/stwn/.ssh/id_rsa.pub
  • sudo cp /home/stwn/.ssh/id_rsa.pub /NFSROOT/kerrighed/home/stwn/.ssh/authorized_keys
  • cp /home/stwn/.ssh/id_rsa.pub /home/stwn/.ssh/authorized_keys
  • chroot /NFSROOT/kerrighed
  • ssh-keygen
  • mkdir /home/stwn/.ssh
  • sudo apt-get install mpich-bin libmpich1.0-dev
  • mkdir /media/storage/demo
  • vim mm-mpi.c
  • vim machinefile
  • mpicc mm-mpi.c -o mm-mpi
  • krgcapset -d +CAN_MIGRATE
  • export P4_RSHCOMMAND=ssh
  • mpirun -machinefile machinefile -np 16 ./mm-mpi
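
The machine list file passed to mpirun above can be derived from /etc/hosts rather than typed by hand. This is a minimal sketch, assuming the cluster host names share a common prefix such as krg- (as in the /etc/hosts listing earlier); hosts_to_machinefile is a hypothetical helper name.

```shell
# Hypothetical helper: print every host name in a hosts file that starts
# with PREFIX, one per line, skipping comment lines.
# Usage: hosts_to_machinefile HOSTS_FILE PREFIX
hosts_to_machinefile() {
    hosts_file=$1
    prefix=$2
    awk -v p="$prefix" '
        /^[[:space:]]*#/ { next }                 # skip comment lines
        { for (i = 2; i <= NF; i++)               # field 1 is the IP address
              if (index($i, p) == 1) print $i }   # keep names starting with p
    ' "$hosts_file"
}

# Example: list the cluster hosts known to this machine, if any.
if [ -r /etc/hosts ]; then
    hosts_to_machinefile /etc/hosts krg-
fi
```

Redirect the output to the machinefile once the list looks right.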

vim cpi.c

gcc cpi.c
vim Pi.c 
mpicc Pi.c
./a.out 
mpirun -np 24 ./a.out 
mpirun -np 2 ./a.out 
mpirun -np 4 ./a.out 

OpenMP

OpenMP support in Kerrighed is unclear. People from NCHC said that Kerrighed has no support for OpenMP (based on an email reply from Renaud Lottiaux).
  • apt-get install gcc-4.2
  • vim mm-openmp.c
  • Compile it
    gcc-4.2 -fopenmp mm-openmp.c -o mm-openmp
  • export OMP_NUM_THREADS=10
  • Run mm-openmp
    time ./mm-openmp

Loop

This simple loop program will test the process migration feature of Kerrighed with kernel 2.6.20.

  • First, login to your console
  • Set Kerrighed capability to CAN_MIGRATE
    krgcapset -d +CAN_MIGRATE
  • Create a simple program containing an infinite loop and compile it
  • Copy the same program to the other nodes, at the same absolute path
  • Run loop program in one node
    loop &
    loop &
    loop &
    loop &
  • A system message will appear
    send_kerrighed_signal: 8 (events/0) -> 820741 (loop)
  • To migrate a process manually, use the command
    migrate [process-id] [node]
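
Manual migration needs the process ids of the running copies, so it is convenient to record them when the copies are started. The sketch below is an assumption-laden illustration: start_copies is a hypothetical helper, and sleep stands in for the compiled loop program so the example is harmless to run anywhere.

```shell
# Hypothetical helper: start COUNT background copies of a command and print
# one PID per line. Output of the jobs is discarded so that only the PIDs
# reach the caller.
start_copies() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" >/dev/null 2>&1 &
        echo "$!"
        i=$((i + 1))
    done
}

# Example with a stand-in command; on the cluster this would be the loop
# program, e.g. start_copies 4 /path/to/loop
start_copies 2 sleep 1
```

Each printed PID can then be given to the migrate command shown above.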

Kerrighed Commands List

Status

krgadm nodes status

Start-Stop

krgadm cluster start
krgadm cluster reboot/poweroff
krgadm nodes poweroff -n 13

Miscellaneous

top ps free 'cat /proc/*'

kerrighed_nodes
kerrighed_session

Problems

  • I ran 4 Blender processes on a node with the CAN_MIGRATE capability set, but none of the 4 processes migrated. They seemed to hang on something, and some messages appeared:
    Null mapping count, non null mapping address : [mem-addr]

    Blender uses relatively large data and processes. Is this why the Blender processes could not be migrated to other nodes? Are they strongly connected?

  • A program called cpuburn, which does FPU calculations and checks the results, ran well, but it still gave some messages
    Null mapping count, non null mapping address : [mem-addr]
  • The message comes from shm_memory_linker.c. Does this deal with Kerrighed's containers?
  • Programs that run on the head node could not be migrated to another node
  • A message appears on the node during network booting
    Gave up waiting for root device. Common problems: | Waiting for root filesystem
    Boot args
  • Be careful about package conflicts, in both versions and dependencies

Tips

  • Use cat /proc/cmdline to see the Linux boot parameters
  • Run
    mkinitramfs -k `uname -r` -o initrd-2.6.20-krg

    if you want to generate the initramfs manually
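
Reading /proc/cmdline by eye works, but picking out a single parameter such as session_id or node_id is easy to script. This is a small sketch, not a Kerrighed tool: boot_param is a hypothetical helper that only handles space-separated name=value words.

```shell
# Hypothetical helper: print the value of boot parameter NAME found in a
# kernel command line string; returns non-zero if NAME is not present.
# Usage: boot_param NAME "CMDLINE"
boot_param() {
    name=$1
    cmdline=$2
    # Unquoted on purpose: split the command line into words.
    for word in $cmdline; do
        case "$word" in
            "$name"=*) echo "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Example: show the Kerrighed-related parameters of the running system.
if [ -r /proc/cmdline ]; then
    for p in session_id node_id; do
        if value=$(boot_param "$p" "$(cat /proc/cmdline)"); then
            echo "$p=$value"
        else
            echo "$p is not set"
        fi
    done
fi
```

On a node booted with the menu.lst entry shown earlier, both parameters should be reported as set.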

Reading List

1) if you want to configure the kernel with make menuconfig
2) dosfstools mtools syslinux syslinux-common
3) dhcp-client libevent1 libgssglue1 libkeyutils1 libkrb53 libldap-2.4-2 libnfsidmap2 librpcsecgss3 nfs-common nfsbooted portmap ucf
 
doc/kerrighed/4977.txt · Last modified: 2013/02/28 06:34 by stwn