====== Kerrighed Trunk 4977 Installation on Debian GNU/Linux ======

Basic research on cluster computing with Kerrighed takes five steps: system and application installation, building the cluster, testing it, profiling it, and comparing the results.

===== Requirements =====

==== Hardware ====

  * At least two x86 machines connected to a network
  * Storage media with enough space, to be shared with the cluster

==== Software ====

  * Debian GNU/Linux 6.0 (squeeze)
  * Kerrighed from Subversion trunk revision 4977. The Linux 2.6.20 kernel and the Kerrighed sources are included.
  * Access to a Debian package repository, either local or over the network/Internet
  * Root access or sudo on the system, for package installation and system modification
  * Additional and development packages: build-essential, bzip2, rsync, xmlto, automake, libtool, pkg-config, lsb-release, libncurses5-dev((if you want to configure the kernel with make menuconfig)), and kernel-package if you want to package the kernel as a Debian package

===== Compilation and Installation Steps =====

  - sha256sum -c krg-20090203.tar.bz2.sha256
  - Change directory to /usr/src/<code>cd /usr/src/</code>
  - tar jxf /home/stwn/krg-20090203.tar.bz2
  - cd krg-20090203
  - ./autogen.sh
  - Configure the source for your system. Do not pass --enable-tests, it is broken (the command and output recorded below come from a run that had it enabled).<code>./configure --enable-tests</code><code>
********************************************************************
Kerrighed tests configuration is now complete
********************************************************************
- apps      : yes
- proc      : yes
- ktp       : yes
- benchmark : yes
********************************************************************
Kerrighed configuration is now complete
********************************************************************
- Sources dir          : /usr/src/krg-20090203
- Kernel source dir    : /usr/src/krg-20090203/kernel
- Kernel version       : 2.6.20-krg (already patched)
- Kernel configuration : none. Run manually 'make *config' in kernel source tree
- Kerrighed module     : yes
- libkerrighed         : yes
- Kerrighed tools      : yes
- Kerrighed tests      : yes
********************************************************************
</code>
  - Save the default configuration for Kerrighed<code>make defconfig</code>
  - Configure the kernel if you need to add network drivers used in your cluster<code>
# cd kernel
# make menuconfig
</code>I added support, as modules, for many of the network cards available in 2.6.20, and built the drivers for the cards I use in the cluster into the kernel: sis900 [*] and e1000e [*], with [M] for the other types that are not selected by default. They are under [ Device Drivers - Network device support - Ethernet (10 or 100Mbit) | Ethernet (1000 Mbit) ]. FIXME
  - Compile the kernel<code>
cd ..
make kernel
</code>or<code>
sudo vim /etc/kernel-pkg.conf
make-kpkg --initrd --revision=2.6.20-svn-022009-1 kernel_image kernel_headers
</code>
  - Install the kernel<code>make kernel-install</code>or<code>
cd ..
sudo dpkg -i linux-image-2.6.20-krg_2.6.20-svn-022009-1_i386.deb
</code>
  - Compile the Kerrighed modules, tools, and libkerrighed<code>make</code>NOTE: the Kerrighed modules are compiled against version 2.6.20-krg. If you want to append a version with make-kpkg, change the version in kerrighed-2.2.1 as well.
  - Install the Kerrighed tools and libkerrighed<code>make install</code>
  - Remove the default Debian kernel, if you want to :)<code>dpkg -P linux-image-2.6.26-1-686 linux-image-2.6-686</code>
  - Reboot

===== Configuration =====

One computer stores the system and application image, and also provides the storage media and other services such as the repository, wiki, demos, and results.

==== Head Node ====

=== Kerrighed ===

  * Configure the network card, if you have not configured it yet<code>vim /etc/network/interfaces</code><code>
auto lo eth1
iface lo inet loopback
iface eth1 inet static
    address 192.168.0.11
    netmask 255.255.255.0
</code>
  * vim /etc/default/kerrighed<code>
ENABLE=true
# If kerrighed has this feature
LEGACY_SCHED=true
</code>
  * vim /etc/fstab<code>
configfs /config configfs defaults 0 0
</code>
  * mkdir /config
  * /etc/kerrighed_nodes<code>
session=1
nbmin=2
192.168.0.11:11:eth1
192.168.0.12:12:eth0
192.168.0.13:13:eth0
192.168.0.14:14:eth0
192.168.0.15:15:eth0
</code>
  * Set the boot parameters in /boot/grub/menu.lst by adding "session_id=1 node_id=1"<code>
title  Debian GNU/Linux, kernel 2.6.20-krg
root   (hd0,3)
kernel /boot/vmlinuz-2.6.20-krg root=/dev/sda4 ro session_id=1 node_id=1
initrd /boot/initrd.img-2.6.20-krg
</code>
  * Set up /etc/hosts. NFS and MPI resolve host names to IP addresses, so make sure it is set correctly.
  * Reboot

=== TFTP & PXE Server ===

  * Install atftpd and syslinux((dosfstools mtools syslinux syslinux-common))<code>apt-get install atftpd syslinux</code>
  * mkdir /srv/tftpboot
  * cp /usr/lib/syslinux/pxelinux.0 /srv/tftpboot/
  * cd /srv/tftpboot/
  * mkdir pxelinux.cfg
  * vim pxelinux.cfg/default<code>
timeout 5
# prompt 1
default kerrighed

label kerrighed
    kernel vmlinuz
    append initrd=initrd root=/dev/nfs nfsroot=192.168.0.11:/NFSROOT/kerrighed,rsize=4096,wsize=4096 ip=dhcp ro session_id=1

label local
    localboot 0
</code>Remove initrd if the boot hangs with the error message "Waiting for root filesystem". The problem is in the initramfs; I do not know whether it is a bug or something else.
  * ln -s /boot/vmlinuz-2.6.20-krg /srv/tftpboot/vmlinuz
  * sudo ln -s /boot/initrd.img-2.6.20-krg /srv/tftpboot/initrd
  * vim /etc/default/atftpd<code>
USE_INETD=true
OPTIONS="--tftpd-timeout 300 --retry-timeout 5 --mcast-port 1758 --mcast-addr 239.239.239.0-255 --mcast-ttl 1 --maxthread 100 --verbose=5 /srv/tftpboot"
</code>
  * vim /etc/inetd.conf<code>
tftp dgram udp4 wait nobody /usr/sbin/tcpd /usr/sbin/in.tftpd --tftpd-timeout 300 --retry-timeout 5 --mcast-port 1758 --mcast-addr 239.239.239.0-255 --mcast-ttl 1 --maxthread 100 --verbose=5 /srv/tftpboot
</code>
  * /etc/init.d/openbsd-inetd restart

=== DHCP Server ===

  * apt-get install dhcp3-server
  * vim /etc/dhcp3/dhcpd.conf<code>
ddns-update-style none;
option domain-name "lskk.ee.itb.ac.id";
default-lease-time 600;
max-lease-time 7200;
log-facility local7;
option dhcp-max-message-size 2048;
# use-host-decl-names on;
# deny unknown-clients;
deny bootp;
next-server 192.168.0.11;

subnet 192.168.0.0 netmask 255.255.255.0 {
    range 192.168.0.12 192.168.0.15;
    filename "/srv/tftpboot/pxelinux.0";
    option root-path "192.168.0.11:/NFSROOT/kerrighed";
}
</code>
  * /etc/init.d/dhcp3-server start

=== NFS Server & NFSROOT ===

  * apt-get install unfs3, or apt-get install nfs-kernel-server (/etc/init.d/unfs3 stop)
  * apt-get install debootstrap
  * mkdir /NFSROOT
  * debootstrap lenny /NFSROOT/kerrighed http://192.168.0.10/stable
  * chroot /NFSROOT/kerrighed/
  * passwd
  * mount -t proc none /proc/
  * apt-get install dhcp3-common nfs-common nfsbooted((dhcp-client libevent1 libgssglue1 libkeyutils1 libkrb53 libldap-2.4-2 libnfsidmap2 librpcsecgss3 nfs-common nfsbooted portmap ucf))
  * vim /etc/fstab<code>
/dev/hda                     none            swap      sw              0 0
none                         /proc           proc      defaults        0 0
none                         /var/run        tmpfs     defaults        0 0
none                         /var/lock       tmpfs     defaults        0 0
none                         /var/log        tmpfs     defaults        0 0
none                         /tmp            tmpfs     defaults        0 0
none                         /dev/pts        tmpfs     defaults        0 0
configfs                     /config         configfs  defaults        0 0
192.168.0.11:/media/storage  /media/storage  nfs       rw,hard,nolock  0 0
192.168.0.11:/NFSROOT/home   /home           nfs       rw,hard,nolock  0 0
</code>
  * mkdir /config
  * mkdir /media/storage
  * sudo chown -R stwn /media/storage/ (or change to group render?)
  * vim /etc/hosts<code>
127.0.0.1 localhost
192.168.0.11 krg-01
192.168.0.12 krg-02
192.168.0.13 krg-03
192.168.0.14 krg-04
192.168.0.15 krg-05
</code>
  * sudo cp -r /usr/src/* /NFSROOT/kerrighed/usr/src/
  * chroot /NFSROOT/kerrighed/
  * **apt-get install busybox initramfs-tools klibc-utils libklibc libvolume-id0 udev**
  * **apt-get install automake make build-essential**
  * **cd /usr/src/krg-20090203/**
  * **make install**
  * dpkg -i linux-image-2.6.20-krg_2.6.20-1_i386.deb
  * adduser stwn
  * sudo cp /etc/kerrighed_nodes /NFSROOT/kerrighed/etc/<code>
session=1
nbmin=2
192.168.0.11:11:eth1
192.168.0.12:12:eth0
192.168.0.13:13:eth0
192.168.0.14:14:eth0
192.168.0.15:15:eth0
</code>
  * vim /etc/exports<code>
/NFSROOT/kerrighed  192.168.0.0/24(ro,async,no_root_squash,no_subtree_check)
/media/storage      192.168.0.0/24(rw,async,wdelay,no_root_squash,no_subtree_check)
/NFSROOT/home       192.168.0.0/24(rw,async,wdelay,no_root_squash,no_subtree_check)
</code>
  * mkdir /media/storage
  * mkdir /NFSROOT/home
  * sudo /etc/init.d/nfs-kernel-server restart

NFS will try to resolve IP addresses to domain names if there are entries in /etc/hosts<code>/krg-system/ krg*(rw,no_root_squash,no_subtree_check,sync,fsid=1)</code>

Whatever you do on the network-boot server system, do in the NFSROOT as well.

==== Compute Node ====

Boot with PXE or gPXE, whichever suits your network card.
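Back on the head node, /etc/kerrighed_nodes and /etc/hosts follow the same address pattern, so they can be generated together instead of being typed twice. The following is only a sketch under stated assumptions: gen_kerrighed_nodes and gen_hosts are hypothetical helper names (not part of Kerrighed), the node id is taken to be the last octet of the address as in the listings on this page, and the first address is assumed to be the head node with its own interface name.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Kerrighed: print the contents of
# /etc/kerrighed_nodes and /etc/hosts for a contiguous address range.

# usage: gen_kerrighed_nodes SESSION NBMIN PREFIX FIRST LAST HEAD_IF NODE_IF
gen_kerrighed_nodes() {
    echo "session=$1"
    echo "nbmin=$2"
    i=$4
    while [ "$i" -le "$5" ]; do
        # Assumption: the first address is the head node, which uses HEAD_IF.
        if [ "$i" -eq "$4" ]; then iface=$6; else iface=$7; fi
        # Format used on this page: address:node-id:interface
        echo "$3.$i:$i:$iface"
        i=$((i + 1))
    done
}

# usage: gen_hosts PREFIX FIRST LAST
gen_hosts() {
    echo "127.0.0.1 localhost"
    i=$2
    n=1
    while [ "$i" -le "$3" ]; do
        # Host names krg-01, krg-02, ... as in the /etc/hosts listing
        printf '%s.%s krg-%02d\n' "$1" "$i" "$n"
        i=$((i + 1))
        n=$((n + 1))
    done
}

# Print both files for the five-node cluster described on this page.
gen_kerrighed_nodes 1 2 192.168.0 11 15 eth1 eth0
gen_hosts 192.168.0 11 15
```

Redirect each function's output into a scratch file and review it before copying it over the real /etc/kerrighed_nodes or /etc/hosts, on the head node and inside the NFSROOT.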
Download a compiled gPXE/Etherboot image from rom-o-matic.net for your network card (tg3, sis900, rtl8139).

===== Testing =====

==== Kerrighed ====

<code>
groupadd nobody
fork-test
</code>

==== Cpuburn ====

<code>
apt-get install cpuburn
chroot /NFSROOT/kerrighed/
apt-get install cpuburn
burnMMX & # 3-4 times :-)
</code>

==== Blender ====

  * apt-get install blender
  * chroot /NFSROOT/kerrighed
  * apt-get install blender
  * render script
  * example scene model
  * open the model file and set the output directory for the render results

Blender+OpenMP

Copy install/ blender to $HOME

<code>
apt-get install libjpeg62
mencoder -mf on:w=640:h=480:fps=12 -ovc copy -o output.avi \*.jpg
</code>

==== MPI ====

  * Set up SSH on the head node and on the system inside the NFSROOT<code>
ssh-keygen
cp .ssh/id_rsa.pub /NFSROOT/kerrighed/home/stwn/.ssh/authorized_keys
chroot /NFSROOT/kerrighed/
su stwn
ssh-keygen
cp /NFSROOT/kerrighed/home/stwn/.ssh/id_rsa.pub /home/stwn/.ssh/authorized_keys
</code>
  * Make sure /etc/hosts lists the hosts and their IP addresses, because host names are resolved during mpirun
  * Set the variable P4_RSHCOMMAND to ssh<code>P4_RSHCOMMAND=ssh</code>
  * Create a machine list file<code>
krg-node0
krg-node1
</code>
  * apt-get install mpich-bin libmpich1.0-dev
  * mpicc mm-mpi.c -o mm-mpi
  * Run the MPI program with mpirun<code>mpirun -np 4 ./Pi</code>

  * cp id_rsa.pub authorized_keys
  * sudo cp /home/stwn/.ssh/authorized_keys /NFSROOT/home/stwn/.ssh/id_rsa.pub
  * sudo cp /home/stwn/.ssh/id_rsa.pub /NFSROOT/kerrighed/home/stwn/.ssh/authorized_keys
  * cp /home/stwn/.ssh/id_rsa.pub /home/stwn/.ssh/authorized_keys
  * chroot /NFSROOT/kerrighed
  * ssh-keygen
  * mkdir /home/stwn/.ssh
  * sudo apt-get install mpich-bin libmpich1.0-dev
  * mkdir /media/storage/demo
  * vim mm-mpi.c
  * vim machinefile
  * mpicc mm-mpi.c -o mm-mpi
  * krgcapset -d +CAN_MIGRATE
  * export P4_RSHCOMMAND=ssh
  * mpirun -machinefile machinefile -np 16 ./mm-mpi

<code>
vim cpi.c
gcc cpi.c
vim Pi.c
mpicc Pi.c
./a.out
mpirun -np 24 ./a.out
mpirun -np 2 ./a.out
mpirun -np 4 ./a.out
</code>

==== OpenMP ====

Whether Kerrighed supports OpenMP is unknown. People from NCHC said there is no OpenMP support in Kerrighed, based on an email reply from Renaud Lottiaux.

  * apt-get install gcc-4.2
  * vim mm-openmp.c
  * Compile it<code>gcc-4.2 -fopenmp mm-openmp.c -o mm-openmp</code>
  * export OMP_NUM_THREADS=10
  * Run mm-openmp<code>time ./mm-openmp</code>

==== Loop ====

This simple loop program tests the process migration feature of Kerrighed with kernel 2.6.20.

  * First, log in to your console
  * Set the Kerrighed capability to CAN_MIGRATE<code>krgcapset -d +CAN_MIGRATE</code>
  * Create a simple program containing an infinite loop and compile it
  * Copy the same program to the other nodes, at the same absolute path
  * Run the loop program on one node<code>loop & loop & loop & loop &</code>
  * A system message will appear<code>send_kerrighed_signal: 8 (events/0) -> 820741 (loop)</code>
  * To migrate a process manually, use the command<code>migrate [process-id] [node]</code>

===== Kerrighed Commands List =====

==== Status ====

<code>
krgadm nodes status
</code>

==== Start-Stop ====

<code>
krgadm cluster start
krgadm cluster reboot/poweroff
krgadm nodes poweroff -n 13
</code>

==== Miscellaneous ====

  * top
  * ps
  * free
  * 'cat /proc/*'
  * kerrighed_nodes
  * kerrighed_session

===== Problems =====

  * I ran 4 Blender processes on a node with the capability set to CAN_MIGRATE, but none of the 4 processes migrated. They seemed to hang on something, and some messages appeared:<code>Null mapping count, non null mapping address : [mem-addr]</code>Blender uses relatively large data and processes; is this the reason why the Blender processes could not be migrated to other nodes? Strongly connected?
  * A program called cpuburn, which performs FPU calculations and checks the results, ran well, but it still gave us some messages<code>Null mapping count, non null mapping address : [mem-addr]</code>
  * The message comes from shm_memory_linker.c; does this deal with Kerrighed's containers?
  * Programs that run on the head node could not be migrated to another node
  * A message appears on the nodes during network booting<code>Gave up waiting for root device. Common problems: | Waiting for root filesystem Boot args</code>
  * Watch out for package conflicts, both in versions and in their dependencies

===== Tips =====

  * cat /proc/cmdline shows the Linux boot parameters
  * Run<code>mkinitramfs -k `uname -r` -o initrd-2.6.20-krg</code>if you want to generate the initramfs manually

===== Reading List =====

  * [[http://www.kerrighed.org/wiki/index.php/SchedConfig|Configurable scheduler framework]]
  * [[http://source.ggy.bris.ac.uk/wiki/Configure_ssh_for_MPI|Configure SSH for MPI]]
  * [[http://www.etherboot.org/wiki/usermanual#testing_etherboot|Etherboot User Manual]]
  * [[http://kerrighed.org/wiki/index.php/Installing_Kerrighed_2.3.0|Installing Kerrighed 2.3.0]]
  * [[http://kerrighed.org/wiki/index.php/V2.1.0_User_Manual|Kerrighed User Manual]]
  * [[http://www.mcs.anl.gov/research/projects/mpi/mpich1/docs/faq.htm|MPICH Frequently Asked Questions]]
  * [[http://bioinformatics.rri.sari.ac.uk/drupal/?q=wiki/tutorial_kerrighed|Tutorial: Kerrighed]]
  * [[http://spot.river-styx.com/viewarticle.php?id=12|Using Blender with openMosix]]
  * man debootstrap
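
Following the tip about /proc/cmdline, one quick way to confirm that a node booted with the Kerrighed parameters from its boot configuration is to pull session_id and node_id out of the kernel command line. This is a minimal sketch, not a Kerrighed tool: get_boot_param is a hypothetical helper name, and it assumes the parameters are plain space-separated name=value words.

```shell
#!/bin/sh
# Disable filename expansion so words in the command line are split only.
set -f

# Hypothetical helper: print the value of one name=value kernel parameter.
# usage: get_boot_param NAME "COMMAND LINE"
get_boot_param() {
    for word in $2; do
        case $word in
            "$1"=*)
                # Strip everything up to and including the first '='
                echo "${word#*=}"
                return 0
                ;;
        esac
    done
    # Parameter not present
    return 1
}

# Check the running kernel (Linux only; empty string if unreadable).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)
for name in session_id node_id; do
    if value=$(get_boot_param "$name" "$cmdline"); then
        echo "$name=$value"
    else
        echo "$name is not set on the kernel command line"
    fi
done
```

The same helper can be pointed at a line copied from menu.lst or pxelinux.cfg/default to check a configuration before rebooting.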