= Commentary on [http://osource.org/openmosix/patches/series1/openmosix-git.patch openmosix-git.patch] =
{{{
hunk class file patched description noteworthy references

001 i386,config arch/i386/Kconfig this patch adds hpc/Kconfig to the build process.

002 i386,remote,syscall arch/i386/kernel/asm-offsets.c this patch defines offsets to members inside of structures, and the defines used by the assembly code in arch/i386/kernel/entry.S. first, we generate the offset to the om member of the task structure, from the beginning of the structure. we then generate an offset to the dflags member, inside of task.om, which is of type openmosix_task. finally, we define DDEPUTY and DREMOTE constants, setting them to the DDEPUTY and DREMOTE values defined in hpc/task.h. (note: insert a BLANK(); before our code.) (note: DDEPUTY and DREMOTE should have a following _asm, indicating they're the versions used by assembly code.)

003 i386,remote,remotefork,local,syscall arch/i386/kernel/entry.S we modify this file to add two new entry points to the kernel, utilize the syscall mapping table in arch/i386/kernel/omasm.h (in both the normal int 80h syscall path, and the sysenter path), make the two syscall exit points store a pointer to the thread_info of this process, and insert a call to openmosix_pre_usermode in the userspace return path. (note: how do we make these changes conditional? #ifdefs?) ret_from_deputy_fork is entered by a process when its eip is set to this function by the code we add in the function copy_thread in arch/i386/kernel/process.c. ret_from_deputy_fork is an identical copy of ret_from_fork. (note: if this is identical, why use it?) ret_from_kickstart is called from arch_kickstart in hpc/arch-i386.c. it calls GET_THREAD_INFO(%ebp), and jmps to syscall_exit, returning to userspace for the 'first' time on a remote node. (note: isn't this GET_THREAD_INFO redundant?) we then modify the resume_userspace entry point, to call openmosix_pre_usermode between doing work_pending and restore_all.
in the next two hunks, we add code to select which syscall table to use based on whether the current task is marked DREMOTE or not. this is added once in ENTRY(sysenter_entry), and again in ENTRY(system_call). we also modify syscall_exit and sysenter_exit to store the result of GET_THREAD_INFO into %ebp, cleaning up after our own code clobbers the register.

004 i386,i387 arch/i386/kernel/i387.c fxsr support is support for fast saving of the i387's floating point/sse/sse2/etc state to a 512 byte block. it's a new feature, not present in earlier i387 style floating point processors. this patch changes the declarations of the fxsr<->387 conversion functions from static to OM_NSTATIC, and adds a function for finding out at run time whether support exists.

005 i386,remote creates arch/i386/kernel/omasm.h this file contains the syscall table called by processes which are DREMOTE. it contains a mapping of whether a syscall is to be passed to the home node, or handled locally. (note: what's the rule for whether to process locally or back home?) (note: #define self out if !CONFIG_OPENMOSIX.)

006 userthread,remotefork arch/i386/kernel/process.c in this patch, we add an entry for user_thread_helper in the kernel_thread_helper execution path, add a function for creating an in-kernel user thread, and re-direct the entry point ret_from_fork to ret_from_deputy_fork for processes that are DDEPUTY in copy_thread(). our user_thread_helper entry point merely subtracts 60h from the stack pointer for this task, reserving space for the user registers on the stack, allowing execution to continue into the kernel thread helper. user_thread is called by openmosix_mig_daemon, in hpc/migrecv.c, to start a 'kernel thread', in the user segment, to handle an incoming migration request. to do this, we set up the user registers in a pt_regs structure, so that we can call do_fork, and have it create the thread for us. first we zero the structure.
then we assign the pointer to the function we want to start in to ebx, set edx to the function's argument, set xds and xes to allow the process access to __USER_DS (the usermode data segment), and set xcs to allow the process to execute code in __KERNEL_CS (the kernel's code segment). we set the orig_eax 'register' to -1, and set the eip to point to our user_thread_helper above, so that execution starts there, and set our eflags so that when this process is running, hardware interrupts are enabled, and so that the sign and parity bits are turned on. (ref: http://x86.org/intel.doc/386manuals.htm) we then call do_fork, adding flags indicating that we don't want this process to receive SIGCHLD, and that it cannot be ptraced. we return the result of the do_fork call. finally, we modify copy_thread so that for processes marked DDEPUTY, instead of the parent process returning to userspace and immediately entering the kernel at ret_from_fork, we set it to enter the kernel at ret_from_deputy_fork.

007 i386,ksocket arch/i386/kernel/signal.c this patch changes the do_signal function from static to OM_NSTATIC.

008 i386,remote arch/i386/kernel/sys_i386.c this patch modifies sys_mmap2 so that processes marked DREMOTE mapping memory without MAP_ANONYMOUS get forwarded to remote_do_mmap.

009 i386,local,remote arch/i386/kernel/vm86.c this patch changes the save_v86_state and return_to_32bit functions so that they clear the DSTAY_86 flag before they exit. it also changes both sys_vm86 and sys_vm86old so that they return a process to its home node if it attempts to enter vm86 mode. the code we add to save_v86_state and return_to_32bit simply task_lock()s the current task, uses task_clear_stay() to clear the DSTAY_86 flag, and task_unlock()s the current task. the code we add to sys_vm86 and sys_vm86old simply calls task_go_home_for_reason(), specifying DSTAY_86. if task_go_home_for_reason returns non-zero (note: better error message?), we force the
function we're in to return -ENOMEM, as we must be on remote, and migration must have failed.

010 i386,remotemem arch/i386/lib/usercopy.c this patch changes strlen_user to redirect to deputy_strlen_user if openmosix_memory_away().

011 ppc,config arch/ppc/Kconfig this patch adds hpc/Kconfig to the build process.

012 ppc,remote,syscall arch/ppc/kernel/asm-offsets.c this patch defines offsets to members inside of structures, and the defines used by the assembly code in arch/ppc/kernel/entry.S. first, we generate the offset to the om member of the task structure, from the beginning of the structure. we then generate an offset to the dflags member, inside of task.om, which is of type openmosix_task. finally, we define DDEPUTY and DREMOTE constants, setting them to the DDEPUTY and DREMOTE values defined in hpc/task.h. (note: insert a BLANK(); before our code.) (note: DDEPUTY and DREMOTE should have a following _asm, indicating they're the versions used by assembly code.)

013 ppc,remote,syscall arch/ppc/kernel/entry.S we modify this file to add a new entry point to the kernel, utilize the syscall mapping table in arch/ppc/kernel/misc.h, and insert a call to openmosix_pre_usermode in our userspace return path. first we modify syscall_dotrace_cont, selecting which syscall table to use based on whether the current task is marked DREMOTE or not. ret_from_kickstart is called from arch_kickstart. ret_from_kickstart branches directly to ret_from_syscall, returning to userspace for the 'first' time on a remote node. our last hunk branches directly to openmosix_pre_usermode in the restore_user path.

014 ppc,userthread arch/ppc/kernel/misc.S in this patch, we add an assembly function to create usermode threads, and create the remote syscall table. first we define SIGCHLD. our next hunk creates a user_thread function similar to the one in arch/i386/kernel/process.c, except hand written in assembly. (note: rewrite in C, if possible!) (note: move syscall table to omasm.h!) finally,
we create the syscall table used by processes which are DREMOTE. it contains a mapping of whether a syscall is to be passed to the home node, or handled locally.

015 x86_64,config arch/x86_64/Kconfig this patch adds hpc/Kconfig to the build process.

016 x86_64,remote,syscall arch/x86_64/kernel/asm-offsets.c this patch defines offsets to members inside of structures, and the defines used by the assembly code in arch/x86_64/kernel/entry.S. (note: remove ifdef around header. redundant.) first, we generate the offset to the om member of the task structure, from the beginning of the structure. (note: move this define with the others.) next we define an entry for task. (note: add a define around task! is task used?) in our last hunk, we generate an offset to the dflags member, inside of task.om. (note: insert a BLANK(); before our code.) we then define DDEPUTY and DREMOTE, setting them to the DDEPUTY and DREMOTE values defined in hpc/task.h. (note: DDEPUTY and DREMOTE should have a following asm!)

017 x86_64,remote,syscall arch/x86_64/kernel/entry.S we modify this file to add a new entry point for returning from kickstart, utilize the syscall mapping table in arch/x86_64/kernel/omasm.h, modify the PTREGSCALL macro to create om_ entries for each of the 6 functions that take a PTREGS argument, insert an om_ptregscall_common entry point, insert an om_stub_execve entry point, insert a call to openmosix_pre_usermode, and insert a user_thread function written in assembly. (note: don't define out omasm.h.) (note: re-write user_thread in C!) ret_from_kickstart restores the state of the registers to how they were before it was called, and returns to userspace for the first time, on a remote node. in the system_call entry point, we check for DREMOTE in task.om.dflags. if we find it, we jump over a stack frame, call into our remote_sys_call_table, step back under the stack frame, and jmp to ret_from_syscall. (note: why the fake frame? CFI_ADJUST! UNFAKE_STACK_FRAME?) otherwise, we pass through to the normal syscall handler.
next, we re-define the PTREGSCALL macro so that when it's used, it creates two entries instead of one: one 'normal' entry, and one entry prefixed with om_, that loads the address of an om_ version of the function being declared, and calls our om_ptregscall_common entry to dispatch. om_ptregscall_common is our version of ptregscall_common. the only functional difference is that ours begins by jumping over a stack frame, does the same work as ptregscall_common to call the C function pointed to in %rax, and peels back to before the stack frame we jumped over. (note: non-functional differences?) the om_stub_execve entry is similar to stub_execve, only we call remote_do_execve instead of sys_execve, and we save the contents of r11 (eflags) in r15 during this call, restoring it afterwards. our next hunk modifies the common_interrupt entry to call our openmosix_pre_usermode function during the return to userspace. finally, we create our user_thread entry, which is responsible for creating our 'user thread', similar to the two previous user_thread functions.

018 x86_64,remote creates arch/x86_64/kernel/omasm.h this is a table containing mappings that are used to dispatch syscall requests made by processes that are guests. it contains mappings of whether a syscall is to be passed to the home node, or processed locally. (note: #define self out if !CONFIG_OPENMOSIX.) (note: what about sys_ni_syscall?) the entries are referenced by code generated by the PTREGSCALL macro in arch/x86_64/kernel/entry.S. each mapping stores the address of one of om_sys_local, om_sys_remote, or sys_ni_syscall. this address is loaded into %rax by PTREGSCALL, and PTREGSCALL calls om_ptregscall_common to dispatch to the function retrieved from this table.

019 x86_64,remote arch/x86_64/kernel/sys_x86_64.c this patch redirects sys_mmap2 so that remote processes mapping memory without MAP_ANONYMOUS get forwarded to remote_do_mmap.
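
the forwarding rule described for sys_mmap2 (hunk 008 on i386, hunk 019 on x86_64) can be modelled in user space as below. this is a minimal sketch under stated assumptions, not the patch's code: struct task_model, the MODEL_* flag values, and choose_mmap_path are hypothetical names invented for illustration. only the rule itself (a guest process mapping without MAP_ANONYMOUS is forwarded) comes from the commentary.

```c
/* hypothetical stand-ins: the real DREMOTE value is defined in hpc/task.h,
 * and the real MAP_ANONYMOUS comes from the kernel's mman headers */
#define MODEL_DREMOTE        0x1UL
#define MODEL_MAP_ANONYMOUS  0x20UL

/* models only the piece of task.om that the rule looks at */
struct task_model {
    unsigned long dflags;
};

enum mmap_path {
    MMAP_LOCAL,      /* handled by the normal mmap path on this node */
    MMAP_FORWARDED   /* would be handed to remote_do_mmap */
};

/* file-backed mappings made by a guest (DREMOTE) process are forwarded;
 * anonymous mappings, and all mappings made by non-guest processes,
 * stay on the local path */
enum mmap_path choose_mmap_path(const struct task_model *t,
                                unsigned long map_flags)
{
    if ((t->dflags & MODEL_DREMOTE) && !(map_flags & MODEL_MAP_ANONYMOUS))
        return MMAP_FORWARDED;
    return MMAP_LOCAL;
}
```

under these assumptions, a guest task asking for a file-backed mapping gets MMAP_FORWARDED, while the same request with MODEL_MAP_ANONYMOUS set, or any request from a task without MODEL_DREMOTE, gets MMAP_LOCAL.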
020 x86_64,local,rmem arch/x86_64/lib/copy_user.S this patch redirects copy_to_user and copy_from_user so that when the kernel on the home node is accessing memory in the process's userspace, it gets redirected to functions accessing memory on the remote node. our first hunk modifies copy_to_user, checking to see if the process is marked DDEPUTY, and if so re-directing to deputy_copy_to_user. the second hunk accomplishes the same task, re-directing copy_from_user to deputy_copy_from_user if the task is marked DDEPUTY. (note: better label than 2901!)

021 x86_64,local,rmem arch/x86_64/lib/usercopy.c this patch forwards __strncpy_from_user and strncpy_from_user so that when the kernel on the home node is accessing memory in a deputy process's userspace, we use deputy_strncpy_from_user. we also forward __strlen_user and strlen_user to deputy_strlen_user for the same reason. (note: missing comments on #endifs.)

022 local,rmem fs/namei.c modify getname to use deputy_strncpy_from_user to get the filename requested from userspace from the remote node when openmosix_memory_away(). (note: BUG! this function is supposed to canonicalize the filename passed in, not just return it!) (note: missing comment on #endif.)

023 local,procfs fs/proc/base.c this patch adds "files" named where, stay, and debug in a directory named "hpc" in the /proc/$PID/ directory of each process on the local node. this makes it the 'root' of our procfs handling code. it actually passes off the work involved to hpc/proc.c. (note: take out #ifdef around header include.) first, we include the hpc/hpc.h header. in the next two hunks we add entries for PROC_TGID_OPENMOSIX, PROC_TGID_OPENMOSIX_WHERE, PROC_TGID_OPENMOSIX_STAY, PROC_TGID_OPENMOSIX_DEBUG, PROC_TID_OPENMOSIX, PROC_TID_OPENMOSIX_WHERE, PROC_TID_OPENMOSIX_STAY, and PROC_TID_OPENMOSIX_DEBUG into enum pid_directory_inos. this enum sets the inode number for each of our "files" in /proc to unique values.
next we create the "om" entry with inode number PROC_TGID_OPENMOSIX in the tgid_base_stuff structure. we then do the same thing again, creating an "om" entry with inode number PROC_TID_OPENMOSIX in tid_base_stuff. next we create a pair of structures (tgid_openmosix_stuff and tid_openmosix_stuff), containing entries for PROC_TGID_OPENMOSIX_WHERE/"where", PROC_TID_OPENMOSIX_WHERE/"where", PROC_TGID_OPENMOSIX_STAY/"stay", PROC_TID_OPENMOSIX_STAY/"stay", PROC_TGID_OPENMOSIX_DEBUG/"debug", and PROC_TID_OPENMOSIX_DEBUG/"debug". these entries declare the contents of our /proc/$PID/om/ directory. in our next hunk, we add proc_pid_openmosix_read and proc_pid_openmosix_write functions, and a file_operations structure named proc_pid_openmosix_operations mapping .read and .write to the proc_pid_openmosix_read and proc_pid_openmosix_write functions we just declared. in proc_pid_openmosix_read, we first truncate a read request to PAGE_SIZE, and request a free page with GFP_KERNEL. if __get_free_page fails, we return -ENOMEM. otherwise, we call openmosix_proc_pid_getattr from hpc/proc.c, which does the work of dispatching our read request to the right function. it fills in the page we allocated, and returns the number of characters written to the page. if the length is less than zero, this indicates an error. we respond to this error by freeing our page, and returning the error value. assuming no error occurred, we check to see if the user requested data beyond the end of what we 'read'. if they did, we free our requested page, and return 0. otherwise, we take the seek value (ppos), and apply it to our page. we then copy_to_user the contents of our page (past the seek) to the passed in userspace buf, free our page, and return the number of bytes copy_to_user'd. in proc_pid_openmosix_write, we first truncate our write request to PAGE_SIZE, then check to see if the user requested a 'partial write'. if they did, we return -EINVAL. (note: reverse the order of these. fail first!)
we then get a free page with GFP_USER. if that fails, we return -ENOMEM. otherwise, we copy the data in the passed in buf into our new page. if our copy_from_user fails, we free our page, and return -EFAULT. otherwise, we call openmosix_proc_pid_setattr with our page, free said page, and return the length returned (even if it's a negative value, e.g. an error code from openmosix_proc_pid_setattr). we define a file_operations structure (proc_pid_openmosix_operations), pointing its .read and .write members to the two functions we just declared. we then forward declare the proc_tid_openmosix_operations, proc_tid_openmosix_inode_operations, proc_tgid_openmosix_operations, and proc_tgid_openmosix_inode_operations structures, which we'll define later in this patch. the next hunk adds a pair of cases to the large switch in proc_pident_lookup that map all eight of our unique identifiers defined at the beginning of this file to their respective file_operations structures, and inode operations structures in the case of the containing directories. this allows the proc system to find the structures containing the function pointers to handle our requests. in our last hunk, we provide implementations for the functions proc_tgid_openmosix_readdir and proc_tid_openmosix_readdir that return the result of calling proc_pident_readdir against our tgid_openmosix_stuff and tid_openmosix_stuff structures. we then define the proc_tgid_openmosix_operations and proc_tid_openmosix_operations structures, mapping .read to generic_read_dir and .readdir to our proc_tgid_openmosix_readdir or proc_tid_openmosix_readdir declared above. we then define the functions proc_tgid_openmosix_lookup and proc_tid_openmosix_lookup, which return the result of calling proc_pident_lookup with our tgid_openmosix_stuff or tid_openmosix_stuff structures.
finally, we define our proc_tgid_openmosix_inode_operations and proc_tid_openmosix_inode_operations structures, mapping .lookup to proc_tgid_openmosix_lookup or proc_tid_openmosix_lookup.

024 local,procfs fs/proc/root.c this patch changes proc_root_init to call openmosix_proc_init. first, we include our hpc/hpc.h header. (note: remove #ifdef around include.) then, we add our call to openmosix_proc_init (from hpc/proc.c) into proc_root_init.

025 ksocket,local,remote fs/select.c this patch exports the do_select function, so that hpc/kcomd.c can use it to check for data on our pile of incoming sockets. this function is only exported if CONFIG_KCOMD is selected.

026 i386,local,remote,archmig,syscall creates hpc/arch-i386.c this file contains functions converting between two different but compatible x87 floating point state formats, functions for sending and receiving architecture specific sections of a given task's state, a function for starting a new guest process, and support functions for handling syscall requests from entry.S. (ref: http://arch.ece.uic.edu/~yxshi/param/web/homepage/research/doc/reference/vc130.htm) (note: this file should be broken up.) first, we forward declare the functions twd_fxsr_to_i387 and twd_i387_to_fxsr from arch/i386/kernel/i387.c. we utilize them in creating the fxsave_to_fsave and fsave_to_fxsave functions. (note: make the order of operations in these two functions identical.) fxsave_to_fsave is called by the later declared arch_mig_receive_fp to convert from fxsave to fsave format. we start by copying the contents of the cwd, swd, fip, fcs, foo, and fos fields of the 'from' union to the 'to' union. we then use twd_fxsr_to_i387() to fill in our twd member. we then save padding[0] to our fop member, and padding[1] to our mxcsr member. next we perform a memcpy loop to copy and convert the st_space member. this member contains fields that are 16 bytes long in fxsave format, and 10 bytes long in fsave format.
we loop through the fields, copying only the first 10 bytes of each to our to's st_space. finally, we memcpy the xmm_space member. fsave_to_fxsave is also called by arch_mig_receive_fp, to convert from fsave to fxsave format. we start by copying the contents of the cwd, swd, fip, fcs, foo, and fos fields to the 'to' union, from the 'from' union. we use twd_i387_to_fxsr() to fill in our twd member, save padding[0] to our fop member, and save padding[1] to our mxcsr member. after that, we enter a loop, memcpying our 10 byte long members of st_space to 16 byte spaces. finally, we memcpy the xmm_space member. the next three functions are for receiving architecture specific state information. arch_mig_receive_specific is called by mig_do_receive in hpc/migrecv.c. its purpose is to receive the architecture specific part of a process. the one in this file has code to warn us that we're not setting up the LDT correctly, and still returns success. if it's asked to set up anything else, we return -1. (note: BROKEN! does not handle setting up LDT entries!) arch_mig_receive_proc_context is called at the top of mig_do_receive_proc_context, from hpc/migrecv.c. its function is to set up the CPU state from the passed in omp_mig_task structure. we start by getting the pt_regs structure of the task we're setting up with ARCH_TASK_GET_USER_REGS. we then overwrite it with omp_mig_task's regs member. we overwrite our task's thread.debugreg with arch.debugreg from omp_mig_task, as well as overwriting thread.fs and thread.gs with arch.fs and arch.gs (setting up our segmentation registers). we then copy the contents of the tls_array structure, which contains the 'thread local storage' segment offsets. (ref: http://lwn.net/Articles/5851/) this function always returns 0. (note: check failure in this function!) arch_mig_receive_fp is called by mig_do_receive_fp, from hpc/migrecv.c. its function is to set up the FPU state from the passed in omp_mig_fp structure.
we start by calling unlazy_fpu, to initialize the FPU, then we check whether the current CPU has the fxsr instruction, and whether the remote CPU has the fxsr instruction. if they both do, or if they both don't, that means the floating point save is in the same format, so we just memcpy the state from the omp_mig_fp struct to the task's thread.i387 structure. otherwise, we call one of the above two conversion functions (fxsave_to_fsave, or fsave_to_fxsave) to perform the copy, while translating the formats. the next two functions are called by mig_do_send in hpc/migsend.c, before and after doing the actual work of sending a task to another node (home or remote). arch_mig_send_pre clears the LDT if there is one set for this process, and arch_mig_send_post loads the LDT back up, if there is one. the next three functions are the send side, to match the three arch_mig_receive functions earlier. all three of these functions are called from mig_do_send, in hpc/migsend.c. arch_mig_send_specific is a stub that looks like it was supposed to send the LDT, but instead prints a warning if an LDT is being used. (note: STUB!) arch_mig_send_fp is called to send the FPU state. we call unlazy_fpu, then fill in the fp state (along with the fxsr flag). arch_mig_send_proc_context is called to send the CPU context of a task. in it, we store the user registers, segmentation registers (FS and GS), the thread local storage entries, and the debugreg registers to the passed in struct omp_mig_task. in addition, if this task is marked DDEPUTY (meaning we're on the home node), we also send the features of the boot CPU. arch_kickstart is the function called to start up a newly "created" task. in it, we set up debug registers 0-3, 6, and 7 with set_debugreg(). we intentionally omit registers 4 and 5 due to them being just aliases for 6 and 7. we use load_TLS to set up the thread local storage entries, and use loadsegment to load our FS and GS registers. (note: do we need to flush pending signals?)
we set CS to __USER_CS, flush pending signals, and execute an assembly fragment that causes us to immediately jump to the ret_from_kickstart entry point in entry.S. at this point there's a break in the file, like this section used to be another file. (note: split this back off.) we include some headers, then define three functions that are part of our syscall handling subsystem. arch_exec_syscall is called by deputy_do_syscall, to call a requested syscall on behalf of a remote process. we use OMDEBUG_SYS to print a tracing message, look up the requested syscall in the sys_call_table, and return the result of calling it (through a function pointer) with the passed in arguments. the next two functions are called via the remote_sys_call_table in arch/i386/kernel/omasm.h, by guest processes. (note: these functions belong in the same place as user_thread!) om_sys_fork is called by a guest process, trying to fork. we wrap remote_do_fork, passing it a clone_flag of SIGCHLD, and null arguments for parent and child thread pointers. om_sys_clone performs similarly, first checking for a new stack pointer in ecx. if there isn't one, we re-use the current task's stack pointer. we accept the clone_flags in register ebx, the parent tidptr in edx, and the child tidptr in edi. we pass all of this to the same remote_do_fork as the previous function.

027 ppc,local,remote,arch_mig,syscall creates hpc/arch-ppc.c this patch is very similar to the previous patch, but cleaner. arch_mig_receive_specific just returns 0. it's called by mig_do_receive from hpc/migrecv.c. its purpose is to receive the architecture specific part of a process, which apparently the PPC doesn't have. arch_mig_receive_proc_context is called at the top of mig_do_receive_proc_context, from hpc/migrecv.c. its purpose is to set the user registers of the current task to the contents of the passed in omp_mig_task structure. (note: check return of memcpy!) in it, we simply use ARCH_TASK_GET_USER_REGS to
retrieve the registers in question, then memcpy over them from our passed in structure. we always return 0. arch_mig_receive_fp is called by mig_do_receive_fp, from hpc/migrecv.c. its function is to set up the current task's FPU state to the one passed in the omp_mig_fp structure. in it, we memcpy the floating point registers from the passed in structure over the task->thread->fpr, and copy the fpscr and fpscr_pad as well. (note: FIXME: fpscr_pad not needed?) arch_mig_send_pre and arch_mig_send_post are void no-ops. their purpose is to make a process "ready to be migrated" while we're pulling the process apart, which apparently doesn't need to be done on PPC. they're called at the beginning and end of mig_do_send in hpc/migsend.c, respectively. arch_mig_send_specific is also a no-op, as the PPC has no architecture specific "parts" of a process. in it, we just return 0. arch_mig_send_fp is called by mig_do_send to fill the passed in omp_mig_fp structure with the floating point state of the current task. in it, we memcpy the task->thread->fpr structure into the omp_mig_fp, and set the fpscr and fpscr_pad members as well. we always return 0. (note: check this memcpy!) (note: FIXME: fpscr_pad not needed?) arch_mig_send_proc_context is called by mig_do_send to fill in the passed in omp_mig_task structure with the CPU state of the current task. in it, we use ARCH_TASK_GET_USER_REGS to get the pt_regs structure, and just memcpy it into our omp_mig_task structure. we return 0. (note: check this memcpy!) arch_kickstart is called by mig_handle_migration to start a guest process for the first time. to accomplish this, we get the user registers, and branch to ret_from_kickstart, passing our user registers as input. (note: what are we doing with mr 1, or the user registers?) arch_exec_syscall is called by deputy_do_syscall to call a requested syscall on behalf of a remote process, returning its result. we look up the requested syscall in the sys_call_table, and return the result of calling it.
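
the table lookup that arch_exec_syscall performs can be sketched in portable C as follows. this is an illustrative model, not the code from hpc/arch-ppc.c: the table contents, the three-argument signature, and every model_ name are hypothetical, and the bounds check is our own addition (the commentary does not say whether the real function validates the syscall number).

```c
#include <errno.h>
#include <stddef.h>

/* hypothetical handler signature; real syscalls take up to six arguments */
typedef long (*model_syscall_fn)(long, long, long);

/* two toy "syscalls" standing in for real kernel entry points */
long model_sys_add(long a, long b, long c) { (void)c; return a + b; }
long model_sys_neg(long a, long b, long c) { (void)b; (void)c; return -a; }

/* models sys_call_table: the syscall number is the index into the array */
const model_syscall_fn model_sys_call_table[] = {
    model_sys_add,   /* number 0 */
    model_sys_neg,   /* number 1 */
};

#define MODEL_NR_SYSCALLS \
    (sizeof(model_sys_call_table) / sizeof(model_sys_call_table[0]))

/* look up the requested syscall and return whatever the handler returns */
long model_exec_syscall(size_t nr, long a0, long a1, long a2)
{
    if (nr >= MODEL_NR_SYSCALLS)
        return -ENOSYS;   /* assumption: reject out-of-range numbers */
    return model_sys_call_table[nr](a0, a1, a2);
}
```

calling model_exec_syscall(0, 2, 3, 0) goes through the function pointer for model_sys_add, and an out-of-range number falls into the -ENOSYS branch rather than indexing past the table.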
028 x86_64 creates hpc/arch-x86_64.c this patch is very similar to the previous two patches. arch_mig_receive_specific is called by mig_do_receive from hpc/migrecv.c, to receive the architecture specific parts of a process. this function is a stub, returning 0. (note: STUB! ldt handling?) arch_mig_receive_proc_context is called at the top of hpc/migrecv.c's mig_do_receive_proc_context. its function is to set up the CPU state, using the passed in omp_mig_task structure as input. in it, we start by using ARCH_TASK_GET_USER_REGS to retrieve our task's pt_regs structure, which we overwrite with the regs member from omp_mig_task, using memcpy. we then copy our segment pointers (ds, es, fs, gs), our index registers (fsindex, gsindex), and our userspace stack pointer (userrsp) from omp_mig_task's arch member to our task structure's thread member. we call write_pda to associate our new stack pointer with our task, and return 0. (note: this is way different from x86. check differences!) arch_mig_receive_fp is called by hpc/migrecv.c's mig_do_receive_fp. its function is to set up the FPU state, using the passed in omp_mig_fp structure as input. in it, we call unlazy_fpu, then memcpy our omp_mig_fp's data member over the task's thread.i387 structure. (note: bad comment. not all amd64 is opteron.) arch_mig_send_pre is called at the top of hpc/migsend.c's mig_do_send. its function is to prepare a task for migration. in it, we clear the LDT, if one is present. arch_mig_send_post is called at the bottom of hpc/migsend.c's mig_do_send, after migration of a process is complete. it restores the LDT, if we cleared it earlier. arch_mig_send_specific is called by hpc/migsend.c's mig_do_send, to send the architecture specific section of a process. it should be sending the LDT, but instead is a stub, returning 0. (note: FIXME: send the LDT.) (note: FIXME: ordering. CPU, FPU, all else.) arch_mig_send_fp is called by hpc/migsend.c's mig_do_send to store the passed in task's FPU state in the passed in omp_mig_fp structure.
in this function, we unlazy_fpu, then memcpy the task's thread.i387 structure into the omp_mig_fp. we return 0. (note: check this memcpy.) arch_mig_send_proc_context is called by hpc/migsend.c's mig_do_send to store the passed in task's CPU state in the passed in omp_mig_task. we acquire a pointer to the user registers using ARCH_TASK_GET_USER_REGS, and memcpy them into the regs member of the passed in omp_mig_task structure. (note: check this memcpy.) we then copy all the thread local storage entries to the omp_mig_task's arch.tls_array. (note: no thread locals in recv function!) we copy the segmentation and index registers (ds, es, fs, gs, fsindex, and gsindex), then retrieve our stack pointer using read_pda(oldrsp), copying it as well. this function returns 0, indicating success. arch_kickstart is called by mig_handle_migration, in hpc/migrecv.c. its function is to jump from our kernel code to the user space code of a guest process for the first time. to accomplish this, first we use set_debugreg to load the debugging registers (0-3, 6, and 7). (note: load_TLS is commented out! bug!) then, we load the segmentation registers, using loadsegment for ds, es, and fs, and use load_gs_index for the gs segment register. we set the task's cs to __USER_CS, and its ds to __USER_DS, and set_fs(USER_DS). we then flush pending signals, and jmp to ret_from_kickstart. (note: should we be flushing?) this function by definition never returns. arch_exec_syscall is called from deputy_do_syscall in hpc/deputy.c. its function is to call a given syscall, returning the results the syscall returned. in this implementation, we create inline assembly functions to perform the call. om_sys_fork is called via the remote process table, by a guest process trying to fork. we wrap remote_do_fork from hpc/remote.c, passing it a clone_flag of SIGCHLD, and null arguments for parent and child thread pointers. finally, we define five functions as stubs, which printk a 'not implemented' message and return. (note: printk!)
each of these functions is called by the remote_syscall_table. they are: om_sys_iopl, om_sys_vfork, om_sys_clone, om_sys_rt_sigsuspend, and om_sys_signalstack. they return -1. (note: not implemented? #warning.)

029 kcom,local,remote creates hpc/comm.c this is the kernel-to-kernel communication system. in it, we use TCP/IP sockets to pass migration related information between kernels. first we define POLLIN_SET to be the set of events we want poll to tell us about on a given socket, during comm_peek and comm_wait's invocation of comm_poll. we then define three timeout variables (conn_remote_timeo, comm_connect_timeo, and comm_reconn_timeo), which are initialized from values #defined in hpc/comm.h, and are used nowhere. (note: dead code!) comm_shutdown is called in case of an error on recv()ing data in comm_recv. it's a wrapper, which safely calls sock->ops->shutdown, making sure sock and sock->ops are non-null. comm_getname is a wrapper to safely call sock->ops->getname. we return -1 in case sock->ops is null, sock->ops->getname is null, or if getname returns null. otherwise, we return the size of the passed in sockaddr. comm_data_ready is a wrapper which calls wake_up_interruptible to wake all the tasks in the passed in socket's sleeping task queue. it is called by no-one, and is in fact dead code. (note: dead code!) comm_setup_tcp is called by the later defined comm_accept, to set up the options on the passed in connection. (note: who else does this?) first we save our current user space address space selector, and set ourselves to use the kernel address space. (ref: http://mail.nl.linux.org/kernelnewbies/2001-11/msg00204.html) then, we use sock_setsockopt to set SO_KEEPALIVE on the passed in socket, then use sock->ops->setsockopt to set TCP_KEEPINTVL, TCP_KEEPCNT, TCP_KEEPIDLE, and TCP_NODELAY. (note: FIXME? old code.) (ref: http://www-128.ibm.com/developerworks/linux/library/l-hisock.html) finally, we restore our original address space limit, and exit with 0 if everything was successful.
if any of our setsockopt functions return non-zero, we restore our original address space limit, and return the error value in question.

comm_socket is called by comm_setup_listen, and comm_setup_connect. its function is to be a wrapper around sock_create, returning NULL in case of an error, instead of the error sock_create would normally return. [note: this function is useless.]

comm_bind is called by comm_setup_listen, to bind our socket to a given address and port. it is a wrapper around sock->ops->bind. in the event of an -EADDRINUSE error, we log a message using printk. [note: printk!] in any event, we return the value sock->ops->bind returned to us.

comm_listen is called by comm_setup_listen to start listening on a passed in socket. it's a wrapper around sock->ops->listen, returning the result given to us.

comm_connect is called by comm_setup_connect to connect to a remote kernel at the passed in address, via the passed socket. in it, we start by creating a waitqueue entry for the current process. we then check to see if we were passed a timeout value. if we were, we use that timeout when trying to connect() to a remote machine, with an asynchronous request. otherwise, we use MAX_SCHEDULE_TIMEOUT. we then insert our waitqueue entry into the passed in socket's sk_sleep waitqueue. we enter a while loop, waiting for sock->state to be SS_CONNECTED. in this loop, we mark ourselves TASK_INTERRUPTIBLE, then request another asynchronous connect. if we get an error that's not -EALREADY, we break out of our loop, as it means something went wrong with the socket. otherwise, we schedule_timeout for a maximum of the remainder of our timeo, storing the remainder that schedule_timeout returns in timeo. if timeo runs out, we set error to -EAGAIN, and break out. at the top of each pass, the while condition checks whether the socket has connected, and the loop ends once it has. after the loop, we first remove ourself from the socket's waitqueue, and set our state to TASK_RUNNING.
we then check to see if error is non-zero. if it is, then either we had a problem with the socket, or we ran out of time. we handle this by OMBUGing a message indicating connection failed, and returning the error value in error. we then do another check on the socket, to make sure it doesn't have an error on it already. if it does, we OMBUG about it, and return the error. finally, since no errors occurred, and the socket is connected, we return 0 indicating success.

comm_close is a wrapper around sock_release. it is called by comm_setup_listen, comm_setup_connect, and hpc/migctrl.c's task_local_send in cases of failure, by hpc/migctrl.c's task_remote_expel when we're done expelling a process, by task_local_bring when we're done receiving a process back home, and by hpc/migrecv.c's openmosix_mig_daemon when we fail to accept an incoming process, or our user_thread() returns.

comm_peek is called by hpc/kernel.c's remote_pre_usermode to see if data is waiting on the passed in socket. it returns 1 for data waiting, 0 if not.

comm_poll is called by comm_wait and comm_accept to wait on an "event" to occur on a passed in socket. in it, the first thing we do is create an empty file pointer, for later use while polling. we then create a waitqueue entry for the current process. we check to see if a timeout value was passed in, and if one was, we use it. otherwise, we use MAX_SCHEDULE_TIMEOUT. we add our wait_queue entry into the passed in socket's waitqueue, and enter a loop, with no termination clause. in this loop, we first set our current state to TASK_INTERRUPTIBLE, then get the result of polling our socket, supplying sighfile as a placeholder. [note: sighfile is used here, but earlier we used NULL. which is correct?] we check to see if any of the poll events in our poll mask are true, and if interruptible is set, we check to see if we have a pending signal, or a pending openmosix data request. if any of these things have occurred, we exit our loop.
otherwise, we use schedule_timeout to give up the CPU until we're interrupted, or our timeout has expired. when schedule_timeout returns, we check to see if our timeout has expired, and if it has, we exit our loop. otherwise, we loop back to the top of our loop. after the loop, we remove ourself from the socket's waitqueue, and set our state to TASK_RUNNING. if the last poll result we received from poll has an event that is in our passed in poll mask, we return 1. otherwise, we return 0.

comm_wait is called by hpc/deputy.c's deputy_main_loop, to check for communication on a given socket. it's a wrapper around comm_poll, filling in some default parameters. it asks comm_poll to check against POLLIN_SET as a poll mask, look for interruptible events, and use the default timeout value. for a return value, we return what comm_poll returns to us. [note: comm_wait should be a define?]

comm_accept is called by hpc/migrecv.c's openmosix_mig_daemon, to accept an incoming connection on a given socket. in it, we receive a passed in socket that's been connected to, create a new socket, and set its type and ops members the same as the socket passed in. we then make sure there's no data to be read on the passed in socket with comm_poll. [note: what error are we looking for here?] after that, we call sock->ops->accept to accept the connection attempting to attach to the passed in socket, on the newly created socket. we then use comm_setup_tcp to set the connection options on the new socket. if that succeeds, we set the passed in target socket pointer to our newly created socket, and return 0. if comm_poll finds data on our passed in socket, we return -EAGAIN, set our passed in target socket pointer to NULL, and destroy our newly created socket. if either accept() or comm_setup_tcp() fail, we return the error they return to us, set our passed in target socket pointer to NULL, and destroy our newly created socket.
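the return convention described above for comm_poll, comm_peek, and comm_wait (1 when an event in the mask fired, 0 otherwise) can be illustrated with a small userspace analogue. this is only a sketch under assumptions: it uses the POSIX poll() call on a file descriptor instead of a kernel socket and waitqueue, the sketch_* names and the contents of SKETCH_POLLIN_SET are invented for illustration, and unlike comm_poll it reports errors (including an interrupting signal) as a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* stand-in for POLLIN_SET; the real set of events is an assumption here */
#define SKETCH_POLLIN_SET (POLLIN | POLLPRI)

/*
 * wait up to timeout_ms (0 = do not block, -1 = wait forever) for one of
 * the events in mask to occur on fd.
 * returns 1 if an event in mask fired, 0 if not, -errno on error.
 */
static int sketch_poll(int fd, short mask, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = mask;
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -errno;          /* includes -EINTR when a signal arrives */
    return (pfd.revents & mask) ? 1 : 0;
}

/* analogue of comm_peek: is data waiting right now? */
static int sketch_peek(int fd)
{
    return sketch_poll(fd, SKETCH_POLLIN_SET, 0);
}

/* analogue of comm_wait: block with default parameters until data arrives */
static int sketch_wait(int fd)
{
    return sketch_poll(fd, SKETCH_POLLIN_SET, -1);
}
```

with a connected socket pair, sketch_peek reports 0 until the peer writes a byte, after which both sketch_wait and sketch_peek report 1.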
comm_dorecv is called by comm_recv to do the actual work of reading a given amount of data from the passed in socket. [note: s/lenght/length] we start by entering a do...while loop, waiting for the count of bytes left to be received to hit zero. we initialize that count with the passed in length of data to be received. in this loop, we call sock_recvmsg, to receive a number of bytes from the passed in socket. we check the value returned to see if it's an error, or a count of bytes returned. if it's an error, first we check to see if the error was -EFAULT. if it is, we update the passed in message structure's msg_iovlen and msg_iov members, indicating how much of the request was filled, and return the number of bytes received. if the error was any other error, we return the error value in question. if sock_recvmsg returns 0, we return -EPIPE. otherwise, sock_recvmsg was successful, and returns a number of bytes received. we update our variable containing the number of bytes still left to be read, and check to see if we have read the entire amount of bytes we are supposed to read. if we have, we update the passed in message structure's msg_iov and msg_iovlen members, and return the number of bytes read. [note: this if inside the loop should be outside! as well, the for and while in the loop should be made similar in structure.]

comm_recv is called to receive a given amount of data from comm_dorecv. we BUG_ON() if the socket pointed to is null, or if the amount of data requested is greater than the size of a single page of memory. we create a msghdr structure to use in the request, and set its iov_base member to point to the passed in write buffer pointer, and set the iov_len to the length of data we're requesting. we then use get_fs and set_fs to allow us to call a system call from the kernel. [ref: http://kerneltrap.org/mailarchive/linux-kernel-newbies/2007/10/25/355049] we call comm_dorecv to actually talk to the socket in question, then check its result.
if comm_dorecv doesn't return the right amount of data read, we OMBUG() about it, then call comm_shutdown(link), returning -EFAULT after restoring our fs. otherwise, we restore our fs, and return the length of the data read.

comm_send uses the same fs manipulation trick as above to allow us to call syscalls from kernel space, then wraps sock_sendmsg(), printk'ing a message if we fail to send the proper amount of data. we return the number of bytes transmitted.

next is an "openmosix specifics start here" marker in the comments.

set_our_addr is called by hpc/migrecv.c's openmosix_mig_daemon to set the passed in sockaddr structure's sin_family, sin_addr, and sin_port members. for IPv4 sockets, we set family to AF_INET, address to INADDR_ANY, and port to the passed in port. for IPv6 sockets, we do nothing. [note: no IPV6 setup?]

comm_setup_listen is called by hpc/migrecv.c's openmosix_mig_daemon, and hpc/remote.c's remote_do_fork. its purpose is to set up a listening socket, ready for a remote kernel to connect to. in it, we call comm_socket to create a socket of the same type as the socket passed in, then call comm_bind to bind the socket to the requested address and port. finally, we call comm_listen to start listening on that socket, and return the socket. if comm_socket fails, we return NULL. otherwise, if comm_bind or comm_listen fail, we destroy our socket, and return the error returned to us by the function that failed.

comm_setup_connect is called by hpc/deputy.c's deputy_do_fork, and hpc/migctrl.c's task_local_send. its purpose is to open a socket to a remote kernel. in it, we use comm_socket to create a socket, then use comm_connect to connect to a listening socket on the target machine. we return the connected socket. if comm_socket fails, we return NULL. if comm_connect fails, we destroy the socket, and return NULL. [note: return comm_connect's error?]

comm_send_hd is called to send a data segment with an omp_req header attached.
in the header, we set the type and dlen to the passed in type and length, respectively. we then use comm_send to send the header, then the data. we return 0 on success, and if either invocation of comm_send returns an error, we return -1. [note: return comm_send's errors?]

comm_send_req sends an omp_req structure, containing only the passed in type, no data. it returns the value returned by comm_send.

030 rmem, local, remote creates hpc/copyuser.c
this file contains routines for moving chunks of memory over an established connection. it's broken into two parts, deputy_* functions, and remote_* functions. deputy_ functions are run on the home node, and remote_ functions are run on the node a process has been migrated to.

deputy_copy_from_user is called by include/asm-i386/uaccess.h's __copy_from_user_inatomic, include/asm-x86_64/uaccess.h's __copy_from_user, and arch/x86_64/lib/copy_user.S's copy_from_user. its purpose is to read a given memory segment from the remote node. in it, we use in_atomic to detect if we're in an atomic section of the kernel, and if we are, we return the length of the memory area requested as an error. [note: BUG! use of in_atomic is flatout wrong!] [ref: http://lwn.net/Articles/274695/] otherwise, we fill in an omp_usercopy_req structure with the passed in from address and passed in length, then we use OMDEBUG_CPYUSER to log a message about the source, destination, and length. [note: OMDEBUG_CPYUSER() is being used in the deputy code to printk with a unique format?] we use comm_send_hd to send said omp_usercopy_req to the remote node, then use comm_recv to receive the results directly to the passed in destination. we return 0 indicating success. if comm_send_hd or comm_recv return an error, we OMBUG the error value, and return -1. [note: separate OMBUG invocations!] we export a symbol pointing to this function using EXPORT_SYMBOL().

[note: make this function and the previous have similar orders of operation.] deputy_strncpy_from_user is called by arch/x86_64/lib/usercopy.c's
__strncpy_from_user and strncpy_from_user, as well as fs/namei.c's getname. its function is virtually identical to the previous function, and reads a given memory segment from the remote node. in it, we use OMDEBUG_CPYUSER to log a message containing the source, destination, and length of the requested segment. we then fill in an omp_usercopy_req structure with the passed in source address, and passed in length. we use comm_send_hd to send the omp_usercopy_req to the remote node, then use comm_recv to receive the results directly into the passed in destination. we return 0 indicating success. if comm_send_hd or comm_recv return an error, we OMBUG the error value, and return -1. [note: separate OMBUG invocations!]

deputy_copy_to_user is called by include/asm-i386/uaccess.h's __copy_to_user_inatomic, include/asm-x86_64/uaccess.h's __copy_to_user, and arch/x86_64/lib/copy_user.S's copy_to_user. [note: make this function and the previous two have similar orders of operation.] its purpose is to write a passed in memory segment to the remote node. in this function, we first use in_atomic to check if we are in an atomic state, and if we are, we return the number of bytes requested to be written, indicating failure. [note: use of in_atomic is incorrect!] if we're not in atomic state, we use OMDEBUG_CPYUSER to log a message about the source, destination, and length. then we fill in an omp_usercopy_req structure with the passed in target address, and passed in length. we use comm_send_hd to send the omp_usercopy_req structure to the remote node, then use comm_send to send the actual data to be written. we return 0 indicating success. if comm_send_hd or comm_send return an error, we OMBUG the error value, and return -1. [note: separate OMBUG invocations!] deputy_copy_to_user is exported as a symbol with EXPORT_SYMBOL().

[note: make this function and the previous three have similar orders of operation.] deputy_strnlen_user is called by arch/i386/lib/usercopy.c's
strnlen_user, and arch/x86_64/lib/usercopy.c's __strnlen_user and strlen_user. its purpose is to strnlen a string on the remote node. to accomplish this, we first OMDEBUG_CPYUSER a message containing the address and maximum length of the request. we then fill in an omp_usercopy_req structure containing the passed in address and length. we send this structure to the remote node using comm_send_hd, then use comm_recv to get the result into a temporary variable, then return the result. if comm_send_hd or comm_recv return an error, we OMBUG the value returned, and return 0, indicating failure. [note: separate OMBUG invocations!] [note: BUG: 0 is a valid return result!] its symbol is EXPORT_SYMBOL'd.

deputy_put_userX is called by the below deputy_put_user and deputy_put_user64 functions. its purpose is to write a value of 64 bits or less to a given location on the remote node. to accomplish this, we start by OMDEBUG_CPYUSER'ing a message saying what value we're placing in what position, and of what length. then, we fill in an omp_usercopy_emb structure with the passed in address, length, and value. we then use comm_send_hd to send the value to the remote node, and return 0 in case of success. if comm_send_hd fails, we return -EFAULT.

deputy_put_user is called by include/asm-i386/uaccess.h's put_user, deputy_put_user64_helper, __put_user_size, and include/asm-x86_64/uaccess.h's put_user_size. its function is to put a value to a remote node. in it, we just check to make sure the size of the value passed in is not greater than sizeof(long). if it is, we call BUG_ON. otherwise, we call deputy_put_userX with our arguments, returning the value it returns. deputy_put_user's symbol is EXPORT_SYMBOL'd.

deputy_put_user64 is called by the deputy_put_user64_helper inline assembly function, from include/asm-i386/uaccess.h. [note: bad name in comment.] this function is only created if BITS_PER_LONG < 64, AKA, we're on a 32 bit architecture.
in deputy_put_user64, we use deputy_put_userX to put a value of type s64 to the remote node, returning the result returned by deputy_put_userX. if declared, this function is EXPORT_SYMBOL'd.

deputy_get_userX is called by the following deputy_get_user and deputy_get_user64 functions. its purpose is to get a value up to 64 bits in length from the remote node. in it, we first use OMDEBUG_CPYUSER to log the requested address and size. we then place these values in an omp_usercopy_req structure, and use comm_send_hd to send it to the remote node. we comm_recv() the result, then place the result in the passed in pointer, being careful to use the right size. we return 0 indicating success. if comm_recv or comm_send_hd fail, we OMBUG the error returned, and return -EFAULT. [note: separate OMBUG invocations!]

deputy_get_user is called by include/asm-i386/uaccess.h's get_user and get_user_size macros, and include/asm-x86_64/uaccess.h's get_user macro. its purpose is to get a value up to sizeof(long) from the remote node. in it, we BUG_ON if the value requested is larger than sizeof(long), and call deputy_get_userX to actually do the work. this function is EXPORT_SYMBOL'd.

deputy_get_user64 is created if BITS_PER_LONG < 64. it's not called from anywhere. [note: BUG: shouldnt we be calling this on i386?] its purpose is to get a 64 bit value from the remote node. in it, we simply call the above deputy_get_userX function. its symbol is EXPORT_SYMBOL'd.

at this point, we start into code intended to run on the remote node, responding to the above functions.

remote_copy_user is called from the below remote_handle_user, to handle DEP_COPY_FROM_USER and DEP_COPY_TO_USER packets, sent from deputy_copy_from_user and deputy_copy_to_user on the home node. in it, we first receive an omp_usercopy_req structure from the home node, then allocate the amount of space indicated by the omp_usercopy_req's len member, to store the information being copied.
we then check the passed in request, to see if we're copying to or from userspace. if we're copying from userspace, we invoke copy_from_user to copy the data requested into our temporary buffer, then use comm_send to send the data to the home node. if we're copying to userspace, we use comm_recv to receive the data into our temporary buffer, then copy_to_user to copy the data into the requested location.

remote_strncpy_from_user performs strncpy_from_user on behalf of deputy_strncpy_from_user. it uses comm_recv to get its target, and comm_send to return the results. its symbol is not exported.

remote_strnlen_from_user performs strnlen_user or strlen_user on behalf of deputy_strnlen_user. it functions similarly to remote_strncpy_from_user. its symbol is not exported.

remote_put_user uses put_user on behalf of either deputy_put_user or deputy_put_user64, for values up to 64 bits in size. it's missing BITS_PER_LONG logic that should be like the following function. its symbol is not exported.

remote_get_user is structured similarly. it has BITS_PER_LONG==64 logic. its symbol is not exported.

finally, we have remote_handle_user, which is the function that dispatches to the above remote_ functions. it calls comm_recv looking for a req structure. other than that, it's a large switch statement. we return from it when we receive an endtype packet, returning 0. if there's an unrecognized packet, we call remote_disappear to die.

031 omctrlfs, local creates hpc/ctrlfs.c
omctrlfs is the future filesystem for performing migration and remote process state monitoring. [ref: http://osdir.com/ml/linux.cluster.openmosix.devel/2006-01/msg00028.html] this file is a stub, implementing this filesystem type as simply as possible. we start by defining CTRLFS_MAGIC, which is the magic string the filesystem layer will use to recognize this FS type. we then create a vfsmount structure, which is normally used to track our filesystem state, but is not used in this module. [note: dead code.]
next, we declare our mount count, which is used to track how many times this filesystem has been mounted. this is only used when de-registering the filesystem.

ctrlfs_fill_super is called as a callback function by our invocation of get_sb_single in ctrlfs_get_sb. ctrlfs_fill_super wraps simple_fill_super(), passing it our CTRLFS_MAGIC, and our empty list of files.

ctrlfs_get_sb is called by the kernel's filesystem layer via the ctrlfs_type variable we send to register_filesystem. in it, we wrap get_sb_single(), telling it to use ctrlfs_fill_super to generate our filesystem's superblock. we then declare a file_system_type structure, mapping .get_sb to our ctrlfs_get_sb above, and .kill_sb to a generic cleanup function.

om_ctrlfs_init is called from the kernel to initialize the module. in it, we just call register_filesystem() with the previously defined file_system_type structure.

om_ctrlfs_exit is called prior to removing the module. in it, we call simple_release_fs(), then unregister_filesystem(). we then use module_init() and module_exit() to define the init and exit points for this module, then register the license and the author.

032 debug creates hpc/debug.c
this file contains debugging assisting code. it starts with debug_mlink, which is a wrapper which printks the address of a socket. debug_page creates a checksum of a 4096 byte page of memory, and printks the results. [note: check incoming pointers!] debug_vmas dumps the starting address and ending address of each vma belonging to a given mm_struct. debug_signals is a stub, not printking anything of value.

033 debugfs,local creates hpc/debugfs.c
this file contains our debugfs module. this module creates entries in the debugging filesystem allowing userspace to see openmosix specific debugging values. we start by defining a dentry structure, used to contain the om/ directory we create in the debugfs.
we then define four file entries, pointing the migration, syscall, rinode, and copyuser files to entries in the om_opts structure (defined in hpc/kernel.c), and an array of dentry structures to contain the four files. [note: move the declaration of om_opts to this module? we don't seem to be using these debug values anywhere else, what is the use of this code?]

om_debugfs_init is called by the kernel to initialize this module. in it, we call debugfs_create_dir to create the om debugfs directory, then debugfs_create_u8 to create entries for our four files in this directory.

om_debugfs_exit is called by the kernel prior to unloading this module. in it, we use debugfs_remove to destroy the entries for the four files, and the directory itself. we then add code defining the entry and exit points of the module, the license, and the author.

034 i386,arch-debug creates hpc/debug-i386.c
this file contains architecture specific debugging code for the i386 architecture. [note: remove one uaccess.h include. remove one ptrace.h include.] none of the functions in this file should be called in the normal functioning of openmosix.

om_debug_regs printks the user register set of the passed in or current process, otherwise known as the pt_regs structure. [note: printk!] if no pt_regs structure is passed in, we use ARCH_TASK_GET_USER_REGS to retrieve the pt_regs structure of the current process, and printk it.

debug_thread printks the contents of the passed in thread_struct register.

show_user_registers is shamelessly stolen according to the comments, and does a fuller job of dumping the state of a user process than the previously defined om_debug_regs. in it, we printk the CPUNO, EIP and EFLAGS. we then printk the EAX, EBX, ECX, EDX, ESI, EDI, EBP, DS, ES, and SS registers, and the process's pid, thread_info pointer, and task pointer. we call show_stack to dump the stack, then we dump the hex values of the 20 instructions before EIP, and the 20 instructions after EIP.
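the final step of show_user_registers, dumping hex values on either side of EIP, can be sketched in plain userspace C. this is a minimal illustration under assumptions, not the kernel code: it works on an ordinary byte buffer instead of faulting in user memory, it treats the "before" and "after" counts as bytes, it marks the byte at the position of interest with angle brackets, and sketch_dump_around is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * format the bytes surrounding offset pos in buf as hex into out:
 * up to `before` bytes ahead of pos, the byte at pos itself (shown as <xx>),
 * and up to `after` bytes following it. the window is clamped to the buffer.
 * returns the number of characters written, or -1 if pos is out of range
 * or out is too small.
 */
static int sketch_dump_around(const unsigned char *buf, size_t len, size_t pos,
                              size_t before, size_t after,
                              char *out, size_t outlen)
{
    size_t start, end, i, used = 0;

    assert(buf != NULL && out != NULL);
    if (pos >= len || outlen == 0)
        return -1;

    start = pos > before ? pos - before : 0;
    end = (len - 1 - pos > after) ? pos + after + 1 : len;

    out[0] = '\0';
    for (i = start; i < end; i++) {
        int n = snprintf(out + used, outlen - used,
                         i == pos ? "<%02x> " : "%02x ", buf[i]);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;          /* output buffer exhausted */
        used += (size_t)n;
    }
    return (int)used;
}
```

clamping the window keeps the dump from reading outside the buffer when the position of interest sits near either end.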
035 ppc,arch-debug creates hpc/debug-ppc.c
this file contains architecture specific debugging code for the ppc architecture. [note: remove one uaccess.h, and one ptrace.h.] none of the functions in this file should be called in the normal functions of openmosix. om_debug_regs printks the contents of the passed in pt_regs structure, or if NULL is passed in, the pt_regs structure of the current process. [note: printk!] debug_thread and show_user_registers are stubs, doing nothing and returning nothing.

036 x86_64, arch-debug creates hpc/debug-x86_64.c
this file contains architecture specific debugging code for the x86_64 architecture. [note: remove one uaccess.h, and one ptrace.h.] none of the functions in this file should be called in the normal functions of openmosix. om_debug_regs printks the contents of the passed in pt_regs structure, or if NULL is passed in, the pt_regs structure of the current process. [note: printk!] debug_thread and show_user_registers are stubs, debug_thread only printking a single line, and both returning nothing.

037 omremote creates hpc/deputy.c
this file contains functions for servicing requests from a remote process, AKA, communication to the home node, from a process that is a guest on a remote node.

first, we define deputy_die_on_communication, which in spite of its name is called by deputy_process_communication to kill the deputy when communication with the remote node containing the remote half of the process fails. [note: rename deputy_die_on_communication to deputy_die.] it printk's a message, then calls do_exit(SIGKILL). [note: printk!] it's defined NORET_TYPE, indicating it will not be returning.

deputy_do_syscall is called by the later defined deputy_process_communication when a syscall request from a remote process is received. in it, we use comm_recv() to receive the syscall number and address of its arguments, then OMDEBUG_SYS a debugging message.
we execute the syscall requested using arch_exec_syscall, OMDEBUG_SYS another debugging message, then return the result of the syscall.

deputy_do_fork is called by the later defined deputy_process_communication to perform a fork on behalf of the remote process. in it, we use comm_recv to receive the fork request. we then use comm_setup_connect to open up a connection to a listening socket on the remote node, call do_fork, then use task_set_comm to associate the new child process with the newly created connection. we call comm_send_hd to return the results of the fork to both the parent and child on the remote node.

deputy_do_readpage is called by the later defined deputy_process_communication when we receive a page read request. in it, we use comm_recv to get the file pointer and offset of the content requested. we then use task_heldfiles_find to get a read handle on the file. we use an empty vma page to hold the results of a page in request from the file, and send the page to the remote end. we unmap the page, then return the result of the comm_send.

deputy_do_mmap_pgoff is called by do_mmap_pgoff in mm/mmap.c to perform the same function as do_mmap_pgoff's lower half, taking into account the differences required by deputy processes. [note: merge this into do_mmap_pgoff?] to do this, we allocate memory for a vma structure from SLAB_KERNEL and zero it. we set up a vma structure in this space to contain the definition of the memory area we've been requested to occupy. we pass this vma to our passed file *'s f_op->mmap handler. we then add this file to our held files for this process by calling task_heldfiles_add. there's a comment here indicating that we're supposed to insert the vma into our current->?? (current->mm->mmap list?), but the code for that isn't written. [note: fill in missing code!]

deputy_do_mmap is called from the later defined deputy_process_communication, when a mmap request is received from a remote process.
in it, we use do_mmap_pgoff from mm/mmap.c to mmap a file into the deputy, then return the mmapped region's contents to the remote host.

bprm_drop is used by the later declared __deputy_do_execve to destroy a linux_binprm structure, which is a structure for containing an executable program, its arguments, pages, security context, and mm structure. [ref: http://www.kernel-api.org/docs/online/1.0/da/d1e/structlinux__binprm.html] bprm_drop calls the appropriate destructors for the members of our binprm, and calls fput() on all the process's still open writable files, before destroying the binprm structure itself.

__deputy_do_execve is called by deputy_do_execve, to do the work of performing an execve when an execve request is received from a remote process. we use search_binary_handler to perform the execve on the home node. if it was successful, we have a FIXME indicating we should be freeing the pages containing our arguments. [FIXME: free the pages our arguments are contained in.] we then free our binprm's security context, call acct_update_integrals (to tell the accounting system about the new process), free the bprm structure, and "return" to the new process. otherwise, we use the previously defined bprm_drop to clean up the failed execve attempt.

deputy_setup_bprm is called by the later defined deputy_do_execve to set up a bprm structure suitable for execution by __deputy_do_execve. in it, we allocate space for our bprm structure from GFP_KERNEL. we then call open_exec to attempt to open our executable. if that succeeds, we fill in the bprm's file, filename, interp, and mm members, using mm_alloc to fill in mm. we use the kernel's init_new_context() to perform architecture specific mm setup. on x86, init_new_context copies the local descriptor table of the current process to the new process. we then copy argc and envc, making sure neither is less than zero. we allocate a security context, then use prepare_binprm to fill in the rest of the bprm structure.
we use copy_strings_kernel to copy our filename, argv, and envp arrays into kernel pages, instead of user space memory. if any of the above fails, we use bprm_drop to clean up.

deputy_do_execve is called by the later defined deputy_process_communication when an execve request from a remote node is received. in it, we call comm_recv to receive the requested file, argv, and envp, deputy_setup_bprm to get a bprm structure ready to execute, then __deputy_do_execve to perform the work. we then use comm_send_hd to send back an empty reply. [FIXME: empty reply? whats with this?] if any of the above fails, we destroy the space used to store the request, and return the error value in question. [FIXME: multiple unique error paths!]

deputy_do_sigpending is a wrapper around do_signal, called by deputy_process_misc. it has code for doing more, but it's dead/unused code. [FIXME: dead code!]

deputy_process_misc is called by deputy_main_loop to check for pending dreqs, and dispatch them to task_do_request. it also checks for pending signals, and dispatches them to deputy_do_sigpending.

deputy_process_communication is called by deputy_main_loop if a process has communication. it contains a large switch that dispatches requests to the aforementioned callees. it calls deputy_die_on_communication if comm_recv returns an error, if the type member of the req received is zero, or if one of the functions we call returns negative.

deputy_main_loop is the userspace loop that is executed as the main thread of a deputy process. in it, we immediately enter a large while loop, waiting for the process to not be DDEPUTY. in the while loop, we call deputy_process_communication when comm_wait returns true, then call deputy_process_misc to dispatch pending dreqs and signals. it's my belief that this function never returns naturally.

deputy_startup is called by hpc/migctrl.c's task_local_send to exit the old task, and enter deputy state.
in deputy_startup, we use task_set_dflags to mark this task as deputy, flush a signal that pops up for an unknown reason, according to a fixme, and call exit_mm(). [FIXME: hunt down the unknown reason.]

038 omremotefile, omremotedentry creates hpc/files.c
This file contains routines for handling file and dentry requests on behalf of remote processes. [BREAKUP: should create hpc/dentry.c] It starts with two structure declarations. remote_aops is an address_space_operations structure, mapping .readpage to hpc/remote.c's remote_readpage, not touching any other mappings. it's used by the later defined rdentry_create_entry(). [OPT: move remote_aops inside of rdentry_create_entry] The second structure defined is remote_file_operations, mapping .mmap to hpc/remote.c's remote_file_mmap, not touching any other mappings. remote_file_operations is used by rdentry_create_entry() and rdentry_create_file().

task_heldfiles_add is called by deputy_do_mmap_pgoff in hpc/deputy.c to create and insert an om_held_file structure representing the passed in file into the list of files held by the given process. in it, we allocate an om_held_file struct from GFP_KERNEL, use get_file to increment the file's usage counter, fill in the om_held_file's file and nb entries with the passed in file pointer, fill in rfile->nopage with nopage from the passed in vm_operations_struct, and insert our om_held_file struct into task->om.rfiles with list_add. [OPT: remove nb member?] [ref: http://www.faqs.org/docs/kernel_2_4/lki-3.html] we then return 0, since no conditions in this function return errors. [OPT: return void?]

task_heldfiles_clear is called by openmosix_task_exit to destroy the contents of the passed in held files linked list. for each file in the list, we call fput to decrement the file usage counter, then free the om_held_file structure.

task_heldfiles_find searches the list of files held by the passed in task for an om_held_file whose file member matches the passed in file pointer.
in it, we use list_for_each_entry to iterate through the list. if we find a match, we return the heldfile, otherwise, we printk() an error message, and return NULL.
next we have a structure declaration that has been commented out via a #if 0. it was to declare a backing_dev_info structure. [DEAD: Dead code.] after that, there's a break in the file. [SPACE: break in the file?]
we define the om_remote_dentry structure, a spinlock, and a list_head structure for containing the om_remote_dentrys.
rdentry_delete acquires the remote_dentries spinlock, and removes the first entry in the remote_dentries list with a dentry member that matches the passed in dentry. If we don't find a matching entry, we call BUG(), and return -ENOENT.
rdentry_iput is called via a pointer stored in the later defined remote_dentry_ops structure. Its function is to free a given inode's generic_ip member (containing our rfile_inode_data structure), then call iput to both push the inode's unsaved contents to disk, and decrement its usage counter.
the remote_dentry_ops structure maps its .d_delete and .d_iput entries to the previous two functions, but is not used anywhere in the code. [DEAD: Dead code.]
Next, we declare a super_operations structure, containing only default members. We then use this structure to fill the .s_op member when declaring a super_block structure, filling the super_block's .s_inodes member with a new LIST_HEAD.
struct remote_file_vfsmnt is a vfsmount structure, containing five required list heads, and a mount count. it is declared as its own parent. [OPT: move remote_file_vfsmnt inside of rdentry_create_file.]
rdentry_add_entry is called by the later defined rdentry_create_dentry to create an om_remote_dentry structure containing the passed in dentry, and insert it into the list of remote dentries.
in it, we allocate a new om_remote_dentry from GFP_KERNEL, set the dentry member to the passed in dentry, acquire the remote_dentries spinlock, add the om_remote_dentry to the remote_dentries list, then release the spinlock. If the kmalloc fails, we return -ENOMEM, otherwise we return 0, indicating success.
rdentry_create_dentry is called by rdentry_create_file to create a new dentry pointing to the passed in rfile_inode_data, and register it with rdentry_add_entry. first, we create a new inode, backed by rfiles_dummy_block. we create a duplicate of the passed in rfile_inode_data allocated from GFP_KERNEL, and set our new inode's u.generic_ip member (the inode's private data space) to point to our copy. our new inode's file and address space operations are pointed to the earlier defined stubs, remote_file_operations and remote_aops. We allocate a dentry using d_alloc, set its inode member to our new inode, and make it its own parent. to accomplish this, when we call d_alloc, we temporarily create a qstr structure stating that this entry is for a file named "/", with a length of 1. [FIXME: expand gccism?] [FIXME: real file name and length?] we pass this qstr as our argument to d_alloc. We use rdentry_add_entry to add our newly acquired dentry to our remote_dentries list, and return the new dentry. the error handling in this function seems VERY broken. [FIXME: error handling?] If either of our alloc calls fails (kmalloc or d_alloc), we free our passed in data(!), call iput on our allocated inode, and return NULL.
rfile_inode_get_data is a wrapper called by rfiles_inode_get_file. it returns the contents of the given inode's u.generic_ip (the private data space set by rdentry_create_dentry).
rfiles_inode_get_file is a wrapper called by hpc/remote.c's remote_readpage, returning rfile_inode_get_data(inode)->file.
rfiles_inode_compare is a wrapper called by rdentry_find and task_rfiles_get. it memcmps the passed inode's private data space against the passed in rfile.
it returns the result of that comparison.
rdentry_find is called by rdentry_create_file to look up an rdentry corresponding to the passed in inode. in it, we grab the remote_dentries spinlock, and use list_for_each_entry to cycle through all of the rdentries, comparing each rdentry->dentry->d_inode against it. if we find a match, we break out, unlock the spinlock, and return the dentry of the matching rdentry. otherwise, we unlock the spinlock and return NULL, due to the last dentry being NULL. [FIXME: verify this works.]
rdentry_create_file is called by the later defined task_rfiles_get to create a file pointer matching the supplied rfile_inode_data. in it, we use get_empty_filp to create an empty file pointer, then call dget(rdentry_find(data)) to get a dentry pointing to the passed rfile_inode_data (if one exists). if dget fails, we call rdentry_create_entry to create a dentry pointing to our passed rfile_inode_data. if our rdentry_create_entry call fails, we call put_filp to close our file pointer, and return NULL. we use the remote_file_operations and remote_file_vfsmnt structures to set our new file pointer's f_op and f_vfsmnt members, set its f_dentry to our dentry, and mark its mode FMODE_READ. we then return the file pointer. if get_empty_filp fails, we return NULL.
task_rfiles_get is called by mig_do_receive_vma and remote_do_mmap to search through the process's vma pages, and check to see if any of them have a particular file associated with them. to do this, we construct an rfile_inode_data structure containing our passed in origfile, node, and isize. We then compare it against our list of rdentry files, using rfiles_inode_compare. If rfiles_inode_compare returns true, we return the file pointer associated with the inode in question. if not, we call rdentry_create_file to create a new rdentry containing the passed in file, and return the file pointer returned from rdentry_create_file.
039 kcomd creates hpc/kcomd.c
This file contains the kernel-to-kernel socket communication code.
This file is set up to create a kcomd.ko kernel module. We start by defining three socket related functions.
socket_listen is called via the later defined socket_listen_ipv4 and socket_listen_ipv6, by the later defined kcomd_thread function, to set up the listening IPv4 and IPv6 sockets. In it, we create a socket, call sock_map_fd to associate an fd with the socket, bind the socket using its sock->ops->bind(), start listening on the socket using its sock->ops->listen(), set the passed in pointer res to point to the newly created socket, and return the file descriptor of the newly created listening socket. If sock_create fails, we return -1. If sock_map_fd fails, we release our socket, assign NULL to the address passed via res, and return -1. If either our bind or listen fails, we close our fd, release our socket, assign NULL to res, and return -1. [FIXME: set res to NULL in every fail case. unique return values!]
socket_listen_ipv4 and socket_listen_ipv6 are called by kcomd_thread to set up the correct type of listening socket. both are wrappers: each sets up its appropriate type of sockaddr structure, then calls the above socket_listen function, returning its result.
next we define three kcom related data structures. [BREAKUP: move these structures to a private header.] struct kcom_pkt is designed to contain a packet destined to a remote kernel. struct kcom_node is designed to contain a socket, and the information regarding the node it points to. kcom_task is the structure that contains kcomd's knowledge about a migrated process. it contains the pid of the process in question, a kcom_node structure defining what node a process is on, a list of processes communicating with this node(?) [OPT: is this list used?], a list for containing outgoing packets, and space for one incoming packet. we then define a spinlock and a list_head for containing these kcom_nodes.
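the listen-socket setup at the start of this hunk (create, bind, listen, undo on failure) can be sketched in userspace roughly as follows. this is an analogue under assumptions, not the patch's code: listen_on and listen_ipv4 are made-up names, plain socket()/bind()/listen() stand in for sock_create, sock_map_fd and the sock->ops calls, and the res out-parameter is left out.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Userspace analogue of the socket_listen flow described above.
 * listen_on() is a hypothetical name. Returns the listening fd,
 * or -1 after undoing whatever had been set up so far. */
static int listen_on(const struct sockaddr *addr, socklen_t len, int backlog)
{
    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;               /* nothing to clean up yet */

    if (bind(fd, addr, len) < 0 || listen(fd, backlog) < 0) {
        close(fd);               /* release the half-configured socket */
        return -1;
    }
    return fd;
}

/* IPv4 wrapper: fill in the address, then defer to listen_on(),
 * mirroring what the text says the ipv4 wrapper does. */
static int listen_ipv4(unsigned short port)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);
    return listen_on((const struct sockaddr *)&sin, sizeof(sin), 8);
}
```

every failure path here returns the same -1, which is exactly the habit the FIXME notes complain about; returning distinct negative errno values would address that.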
next, we define sockets_fds as a fd_set_bits structure. this structure is a more scalable version of a fd_set, used by do_select. We then declare sockets_fds_bitmap and maxfds, which are set and used by the next function, alloc_fd_bitmap, to hold a dynamically grown array of fds.
alloc_fd_bitmap checks the passed in fdcount against the amount of fds the current sockets_fds_bitmap was created to hold, and if it's greater, frees sockets_fds_bitmap (and its contents), and allocates a new one. if kmalloc fails, we return ENOMEM. otherwise, we set the in, out, ex, res_in, res_out, and res_ex members of the sockets_fds structure to offsets of our sockets_fds_bitmap structure, and return 0.
kcom_pkt_create is called by the later defined kcom_task_send to create a new kcom_pkt structure with the len, type, and data members initialized to the passed in values. if kzalloc fails, we return NULL. Otherwise, we return the properly initialized kcom_pkt structure.
__kcom_node_find is called by the later defined kcom_node_find to do the work of finding a node in our kcom_nodes list that uses the passed in sockaddr to communicate. We use list_for_each_entry and memcmp to compare the address of our sock with the address of our node(!). [FIXME: note the fixme regarding memcmp] this function will return NULL if it fails. [FIXME: doublecheck this return]
kcom_node_find is not called by anything. [DEAD: dead code.] it wraps __kcom_node_find, grabbing the kcom_nodes_lock before entry, and releasing it afterward.
kcom_node_add is called by accept_connection to create a new kcom_node struct, and add it to the kcom_nodes list. there is code commented out regarding finding out if the node is already in the list, but it's incomplete. [DEAD: dead code.]
kcom_node_del is not called by anything. its purpose is to remove a node from the kcom_nodes list that uses the passed in sockaddr. in it, we acquire the kcom_nodes spinlock, then use __kcom_node_find to find a node structure to be deleted.
if we don't find one, we release the kcom_nodes spinlock, and return -ENOENT. otherwise, we call list_del to remove the node from our node list, release the spinlock, close the communicating fd, release the socket, free the node structure's memory, and return 0.
comm_simple is a stub that returns 0, and is not called elsewhere in the code. [DEAD: dead code.] we then forward declare comm_ack, comm_iovec, and comm_iovec_ack functions, which are not defined or called anywhere else. [DEAD: dead code.]
accept_connection is called by the later declared kcomd_thread to accept an incoming connection on a passed in socket. in it, we start by allocating a new socket, and calling the accept() operation of the passed in socket to accept a connection from the passed in socket, on our new socket. there's a block of commented out code, for checking if a node is already in our node_list, but it is unused/incomplete. [FIXME: this dead code needs to live!] we then use sock_map_fd to get a file descriptor to this socket, add the node this socket is communicating with to our node_list, and return our file descriptor. If our socket allocation returns null, we return -1. If our accept or sock_map_fd have problems, we release our socket, and return -1. if kcom_node_add fails, we close our fd, release our socket, then return -1. [FIXME: unique error paths!]
we then create data_read, data_write, and dispatch stubs that only return 0. data_read and data_write are called by kcomd_thread. dispatch is never called. [DEAD: dead code.]
kcom_task_create is not called from anything. its purpose is to create a kcom_task structure for a given kcom_node and PID, initializing the pid, node, and list members. in it, we use kzalloc to allocate the memory from GFP_KERNEL. if the kzalloc returns NULL, we return NULL. otherwise, we initialize the fields of our new structure, and return it. [DEAD: dead code.]
kcom_task_delete is not currently called from anywhere.
its purpose is to delete the first entry in the nodes list matching the given PID.
__kcom_task_find and kcom_task_find are constructed like the above node find code, but without spinlocks. [FIXME: task manipulation functions are missing spinlocks like the node code uses. find out why.] __kcom_task_find uses list_for_each_entry to cycle through our list of nodes, using list_for_each_entry on each node's list of tasks to find a task with the same pid as the passed in PID, and return it, or NULL if one is not found. kcom_task_find is a wrapper around __kcom_task_find, passing it the PID passed in, and returning the returned result.
kcom_task_send isn't called by anything. it uses kcom_pkt_create to add a packet to the task structure belonging to the pid passed in. it has comments regarding sleeping and replying, but instead returns 0. [FIXME: incomplete code!]
kcomd_thread is the function executed in kernel space, as a kernel thread. in it, we first call daemonize to create a "kcomd" process. we then wait for a connection on an ipv4 and an ipv6 socket. when we receive a connection, we enter a large while loop. [FIXME: should this while loop exit?] In this loop, we first call alloc_fd_bitmap to make sure our fd bitmap is big enough to hold maxfds number of fds. we then zero the in, out, and ex fd sets, add our two listening sockets to the in set, add the listening fds of each node in our node_list to the in set, add each fd in our node list that we have packets to send on to the out set, zero the res_in, res_out, and res_ex sets of fds, and call select. if select returns -1, we return to the top of our loop. otherwise, we test to see if our v4 or v6 listening socket received a connection. if so, we call accept_connection. if not, we examine each fd belonging to our list of nodes, and if they have data to read, call data_read (a NOP!), or if they have data to be written, call data_write (also a NOP!). when done, we return to the top of our never-ending while loop.
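the set-building step at the top of each pass through the kcomd_thread loop can be sketched with the ordinary userspace fd_set macros. this is only an analogue: the patch uses the kernel's fd_set_bits, and struct node_sketch, its fields, and build_fd_sets are hypothetical names invented here.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical, simplified stand-in for a kcom_node: one fd per node
 * and a flag saying whether packets are queued for sending. */
struct node_sketch {
    int fd;
    int has_outgoing;
};

/* Rebuild the read and write sets for one pass of the loop.
 * Returns the highest fd placed in either set (the first argument
 * to select() would be this value plus one), or -1 if none. */
static int build_fd_sets(const int *listeners, size_t nlisteners,
                         const struct node_sketch *nodes, size_t nnodes,
                         fd_set *in, fd_set *out)
{
    int maxfd = -1;
    size_t i;

    FD_ZERO(in);
    FD_ZERO(out);

    /* listening sockets are only ever watched for readability */
    for (i = 0; i < nlisteners; i++) {
        FD_SET(listeners[i], in);
        if (listeners[i] > maxfd)
            maxfd = listeners[i];
    }
    /* every node is watched for input; only nodes with queued
     * packets are watched for writability */
    for (i = 0; i < nnodes; i++) {
        FD_SET(nodes[i].fd, in);
        if (nodes[i].has_outgoing)
            FD_SET(nodes[i].fd, out);
        if (nodes[i].fd > maxfd)
            maxfd = nodes[i].fd;
    }
    return maxfd;
}
```

after select() returns, the same two loops would be walked again with FD_ISSET on the result sets to decide between accepting, reading, and writing.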
kcomd_init calls kernel_thread to start the aforementioned kcomd_thread function as a kernel thread. kcomd_exit is an empty function. we register kcomd_init to run when the module is loaded, kcomd_exit to run upon unload, and call two macros, the first licensing this file under the "GPL", and the second attributing Vincent Hanquez as the author.
040 config creates hpc/Kconfig
this file defines our openmosix menu options in the kernel's configuration system (menuconfig). we declare a top level menu titled "HPC Options". our configuration options all exist under this entry. [FIXME: separate this into PMIGUEST support, and PMIREMOTE support.]
first, we create an entry defining KCOMD as a tristate, or an item that can be either on (in the kernel), off, or a module (loadable and unloadable while the kernel is running). [BUG: shouldn't KCOMD depend on OPENMOSIX, not the other way round?]
next we create an entry defining OPENMOSIX as bool (in kernel, or not). this turns on or off the parts of openmosix that must be in-kernel for openmosix to function.
bool OPENMOSIX_VERBOSE is supposed to make openmosix more verbose, but just serves to make OPENMOSIX_MIGRATION_VERBOSE and OPENMOSIX_DEBUG_FS visible. [DEAD: dead code.]
bool OPENMOSIX_MIGRATION_VERBOSE enables debugging messages of the form OM_VERBOSE_MIG(...) in include/hpc/prototype.h.
bool OPENMOSIX_DEBUG accomplishes many things. first, it enables compilation and inclusion of hpc/debug.c and an architecture specific hpc/debug-$(ARCH).c, both of which contain functions for printing the state of various structures, processor registers, and other associated values. then, it enables debugging messages of the form OMDEBUG(...) in include/hpc/debug.h. it enables the tracking of the contents of the structure openmosix_options in include/hpc/hpc.h, and makes OPENMOSIX_MIGRATION_DEBUG and OPENMOSIX_DEBUG_FS visible.
bool OPENMOSIX_MIGRATION_DEBUG does not do anything, and can be safely removed. [DEAD: dead code.]
bool OPENMOSIX_DEBUG_FS enables the compilation and inclusion of the contents of hpc/debugfs.c, creating the om/ directory and its contents under the debugfs.
bool OPENMOSIX_CTRL_FS enables the compilation and inclusion of hpc/ctrlfs.c, which is a filesystem stub, intended to be the next migration control filesystem.
041 ominterface creates hpc/kernel.c
This file is the primary interface for the kernel to the process migration system. It contains mostly functions meant to be called by hooks we place in the kernel. First, we export the openmosix_options structure, which contains four constants used as "ceilings" for the OMDEBUG_* debugging macros. The values in this data structure are settable through debugfs entries, created by hpc/debugfs.c. [OPT: shouldn't the existence of this structure depend on debugfs?]
openmosix_pre_clone is called by code added to kernel/fork.c's do_fork before the kernel starts processing the fork request. In this function, we check whether the current process has requested a shared memory space between the two clones, and if it has, we mark the process as un-migratable for the DSTAY_CLONE reason, and increase the usage count on its mm structure. Note that both processes resulting from the fork will be marked DSTAY_CLONE, and both will inherit mm usage counts increased by +1 on their mm structures.
openmosix_post_clone is called by code inserted at the end of do_fork in kernel/fork.c. It is called on the thread of both the parent and the child process after the work in do_fork has completed. In it, we first check to see if the process is marked VM_CLONE. if it is, we immediately return. otherwise, we check the mm_realusers counter. if it's just 1, then the process's mm structure is only being used once, so we clear the DSTAY_CLONE flag previously assigned by openmosix_pre_clone. [is this supposed to happen when a child dies, or otherwise drops the shared mm?]
task_maps_inode is called
by the next function, openmosix_no_longer_monkey, to check whether a given task maps a given inode, but is just a stub, returning 0. [FIXME: stub!]
openmosix_no_longer_monkey is called from __remove_shared_vm_struct to check every process on the machine and see whether it's using the passed in inode. if it is, we set the process's DREQ_CHECKSTAY flag, as this inode is about to be removed from service by __remove_shared_vm_struct, and doing so may make this process migratable. setting this flag indicates to hpc/task.c's task_do_request that this process needs to be reexamined during the reexamine sweep. to accomplish this, we first acquire the tasklist_lock, then invoke for_each_process() against the list of all processes. We use the previous function task_maps_inode to check if each process is using the passed in inode. Since the previous function is a stub, this function does nothing. At the end of the function, we release the tasklist_lock.
stay_me_and_my_clones is called by code added to sys_mlock and sys_mlockall in mm/mlock.c, as well as do_mmap_pgoff in mm/mmap.c. it applies a given bitmask of reasons not to migrate to the current task, and all tasks that share its mm structure. in it, we first use task_lock to lock the current process. we then set its stay reason, then task_unlock. if the number of mm_realusers is greater than one (indicating that some other process uses this process's mm structure), we grab the tasklist_lock, use for_each_process to search for processes with the same mm pointer (that aren't the current process), and use task_lock, task_set_stay, and task_unlock to add our stay reasons to each such process. we always return 0. [OPT: void, not int?]
obtain_mm is called by mig_handle_migration() in hpc/migrecv.c and task_local_bring() in hpc/migctrl.c to allocate a new mm structure, initialize it, and make it the context of the current process. [FIXME: should there be a mm, should we be DDEPUTY?] We start by checking to see if
there's a mm structure already associated with the passed in task. If there is, we check to make sure the process is not marked DDEPUTY. If it is, we call panic() to print a debugging message. at this point is some commented out code that responds to there being an mm on a deputy process by calling exit_mm on it. [FIXME: what was the logic of the commented out code?] Either way, we then mm_alloc() a new mm [FIXME: mm leak?], initialize it to hold our given task with init_new_context(), acquire the mmlist_lock, initialize our new mm's mmlist member with the mmlist of process zero, and release the mmlist_lock. We then assign this mm to our process by first acquiring the task_lock(), saving our current active mm, setting the task's active mm and mm to our newly created mm, and task_unlock()ing. we call activate_mm with our original and new mm, then mmdrop the old active_mm. if our mm_alloc() fails, we return -ENOMEM. if init_new_context() fails, we destroy our allocated mm, and return the error init_new_context() failed with. otherwise, we return 0 for success.
unstay_mm is called by code added to sys_munlock and sys_munlockall in mm/mlock.c to set the DREQ_CHECKSTAY flag on the current process, and all processes that share its mm structure. for the common case of just one task using a given mm structure, we just call task_set_dreqs(current, DREQ_CHECKSTAY). [premature optimization? looks good tho.] [BUG: no locking!] otherwise, we use for_each_process() with a read_lock held on the tasklist_lock to iterate through each process on the machine, checking if it's using our passed mm, and if so, we call task_set_dreqs(p, DREQ_CHECKSTAY) on it.
remote_pre_usermode is called by the later defined openmosix_pre_usermode to check for communication events before entering userspace. in it, we call comm_peek() to see if there's pending input, and if there is, call remote_do_comm() to process the communication in question. remote_pre_usermode always returns 0 for success. [OPT: void, not int?]
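the "flag everyone sharing this mm" pattern used by unstay_mm (and, with stay reasons instead of dreqs, by stay_me_and_my_clones) can be sketched without any kernel machinery. everything named here is hypothetical: struct task_sketch, flag_mm_users, and the placeholder flag value are not from the patch, a plain array stands in for for_each_process(), and no locking is modelled.

```c
#include <stddef.h>

#define DREQ_CHECKSTAY_SKETCH 0x1u   /* placeholder value, not the real flag */

/* Hypothetical miniature of a task: only the fields this sketch needs. */
struct task_sketch {
    const void *mm;      /* identity of the address space */
    unsigned int dreqs;  /* pending request bits */
};

/* Flag every task that shares `mm` so it gets re-examined later.
 * `users` plays the role of mm_realusers: with a single user only the
 * current task is flagged and the table scan is skipped. Returns the
 * number of tasks flagged. */
static size_t flag_mm_users(struct task_sketch *tasks, size_t ntasks,
                            struct task_sketch *current, const void *mm,
                            unsigned int users)
{
    size_t i, flagged = 0;

    if (users == 1) {
        current->dreqs |= DREQ_CHECKSTAY_SKETCH;
        return 1;
    }
    for (i = 0; i < ntasks; i++) {
        if (tasks[i].mm == mm) {
            tasks[i].dreqs |= DREQ_CHECKSTAY_SKETCH;
            flagged++;
        }
    }
    return flagged;
}
```

the single-user shortcut is the part the side note calls a possible premature optimization: it avoids walking the whole process list when nobody else can be sharing the mm.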
deputy_pre_usermode is also called by openmosix_pre_usermode, before jumping to userspace while handling a process in deputy state. in this function, we just jump into deputy_main_loop, instead of going to any real usermode code. if deputy_main_loop() returns, this function returns 0.
openmosix_pre_usermode is called by assembly code inserted into arch/$ARCH/kernel/entry.S, when switching from kernel space to user space. we first check for pending dreqs, and if we find one, we save our current irq mask, call task_do_request, and restore our irq mask once task_do_request returns. after dispatching possible dreqs, we call one of the previous two functions depending on whether the process is in DDEPUTY or DREMOTE state. like before, we save our irq mask before calling remote_pre_usermode or deputy_pre_usermode, then restore it once we return. this function always returns 0 for success. [OPT: return void?]
openmosix_init is called upon subsystem load. it starts the openmosix_mig_daemon kernel thread to receive incoming processes, and returns 0. the last line in this file tells the kernel to call openmosix_init upon initializing this kernel component.
042 config creates hpc/Makefile
This Makefile contains the make fragments that tell the kernel what targets to build in the hpc/ directory. this code defines five targets, obj-$(CONFIG_KCOMD), obj-$(CONFIG_OPENMOSIX), obj-$(CONFIG_OPENMOSIX_CTRL_FS), obj-$(CONFIG_OPENMOSIX_DEBUG), and obj-$(CONFIG_OPENMOSIX_DEBUG_FS). each of these targets corresponds with one of the configuration variables defined by hpc/Kconfig. when an option has been enabled in the menus, the items listed by each of these targets will be built.
obj-$(CONFIG_KCOMD) says to build kcomd.o.
obj-$(CONFIG_OPENMOSIX) says to build kernel.o, task.o, comm.o, remote.o, deputy.o, copyuser.o, files.o, syscalls.o, migrecv.o, migsend.o, migctrl.o, service.o, proc.o, and an arch-$(ARCH).o file containing architecture specific functionality.
proc.o has a comment noting that it's "legacy code". [isn't this the code we use?]
obj-$(CONFIG_OPENMOSIX_CTRL_FS) says to build ctrlfs.o.
obj-$(CONFIG_OPENMOSIX_DEBUG) says to include debug.o, and an architecture specific debug-$(ARCH).o.
finally, obj-$(CONFIG_OPENMOSIX_DEBUG_FS) says to include debugfs.o.
043 core,migration_cntrl creates hpc/migctrl.c
This file contains functions for moving processes via the migration infrastructure.
task_remote_expel is called by the next function task_remote_wait_expel and hpc/remote.c's remote_do_comm to send a remote process back to its original node, merging it with its deputy. [FIXME: shouldn't remote_do_comm use task_remote_wait_expel?] First we check to make sure the task we've been passed is in DREMOTE state, and use BUG_ON to panic the system if it's not. [ref: http://kerneltrap.org/node/7204] We then use hpc/migsend.c's mig_send_hshake to request a return migration from the home node. If this succeeds, we call hpc/migsend.c's mig_do_send to actually perform the migration. After that, we destroy our link to the home node by using hpc/task.c's task_set_comm to associate our link to null, then calling hpc/comm.c's comm_close() against our old link (returned by task_set_comm). We then call do_exit(SIGKILL) to end the current process. [FIXME: Who gets this result?] In case either of our mig_send_hshake or mig_do_send calls fails, we OMBUG("failed\n"), and return -1. [FIXME: unique errors!]
task_remote_wait_expel is called by the later defined __task_move_to_node to return a remote task to its home node. This function wraps the previous function, first requesting permission to return home by sending a REM_BRING_HOME req, then waiting on a DEP_COMING_HOME reply. If comm_recv fails, or we recv something other than a DEP_COMING_HOME, we return -1. otherwise, we call task_remote_expel, returning its result.
task_local_send is called by the later defined __task_move_to_node to send a local task to a remote host.
In it, we first check to make sure the task is not in DDEPUTY state. If it is, we return 0, as this process is already running on a remote node, and does not belong to the local machine to begin with. [FIXME: returning success in case of error!] Otherwise, we open a new connection using hpc/service.c's sockaddr_setup_port and hpc/comm.c's comm_setup_connect, then attach our new connection to the current process with task_set_comm. We set the current process into DDEPUTY state, and ask permission to send by sending a HSHAKE_MIG_REQUEST using mig_send_hshake. [FIXME: why do we use hshake here, and req above?] If our handshake is successful, we call mig_do_send to perform migration to the remote node. When mig_do_send returns successfully, the process has been sent to the remote node, and the local process is now a deputy. We call deputy_startup, and return 0. If any of comm_setup_connect, mig_send_hshake, or mig_do_send returns failure, we remove our DDEPUTY flag, destroy our link to the remote node (if applicable), and return 0. [FIXME: return errors! make them unique!]
task_local_bring is called by the later defined __task_move_to_node to return a remote process to the current node, re-merging it with its deputy. in it, we first check to make sure the current task is in DDEPUTY state. If it's not, we return 0, as this process is already running on the local node. [FIXME: returning success in case of error!] Otherwise, we use obtain_mm to get a new mm struct. we then make a DEP_COMING_HOME request to the remote end, and use mig_recv_hshake to receive our reply. Assuming success, we call hpc/migrecv.c's mig_do_receive to receive the process back, clear our DDEPUTY flag, and use task_set_comm/comm_close to destroy our link. If obtain_mm, mig_recv_hshake, or mig_do_receive returns failure, we OMBUG("failed\n"), and return -1. [FIXME: unique ombug message for each failure! and unique return codes!]
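as a map of how the movement helpers in this hunk fit together: __task_move_to_node picks one of four helpers from two facts, whether the task is DREMOTE and whether a destination node was supplied. a minimal sketch of that decision table follows. pick_move and the enum names are made up for illustration; the DPASSING flag handling and the actual helper calls are left out.

```c
/* Which helper __task_move_to_node would call, per the description. */
enum move_kind {
    MOVE_REMOTE2REMOTE,   /* task_move_remote2remote (a stub) */
    MOVE_REMOTE_EXPEL,    /* task_remote_wait_expel */
    MOVE_LOCAL_SEND,      /* task_local_send */
    MOVE_LOCAL_BRING      /* task_local_bring */
};

/* pick_move() is a hypothetical helper for illustration only.
 * is_remote: the task carries the DREMOTE flag.
 * has_dest:  the caller supplied a destination node. */
static enum move_kind pick_move(int is_remote, int has_dest)
{
    if (is_remote)
        return has_dest ? MOVE_REMOTE2REMOTE : MOVE_REMOTE_EXPEL;
    return has_dest ? MOVE_LOCAL_SEND : MOVE_LOCAL_BRING;
}
```

callers that want "send this task home" simply pass no destination, which lands on the expel path when running remotely and on the bring path when running as the deputy.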
task_move_remote2remote is called by the later defined __task_move_to_node to handle moving a task between two remote hosts, as in moving it from one machine to another machine, without ever returning the process to its home node. This function is a stub. It just calls OMBUG(), and returns 0. [FIXME: STUB! doesn't return fail?]
__task_move_to_node is called by the later defined task_move_to_node, task_go_home, and task_go_home_for_reason to move a task to a given node, using the appropriate function from above. First, we set flag DPASSING on the given task, indicating we're going to try and transfer it somewhere. Then, we check to see if it has a DREMOTE flag. If it does, and we were given a node to send it to, we call task_move_remote2remote, accomplishing nothing since that function is a stub. if DREMOTE is set, but we were not given a node to send it to, we call task_remote_wait_expel. if DREMOTE is not set, and we were given a node to send to, we call task_local_send. otherwise, DREMOTE is not set, and we were given no destination to send a process to, so we call task_local_bring. after we've called one of these four functions, we clear the DPASSING flag, and return the error passed to us by the function we called.
task_move_to_node is called by hpc/task.c's task_request_move, to move a process to the given node. this function wraps __task_move_to_node, first checking for a stay reason. If there is a stay reason, we printk() an error and return -1. [FIXME: printk!] otherwise, we call __task_move_to_node, and return 0. [FIXME: pass on __task_move_to_node's return value!]
task_go_home is called by hpc/deputy.c's deputy_process_communication, to send a task home when we have received a REM_BRING_HOME request. First, we check to make sure the given process is DMIGRATED. If it's not, we printk() a warning, and return -1. [FIXME: PRINTK!] otherwise, we call __task_move_to_node, supplying only a task as an argument, thereby requesting a move to the home node.
If the process is still marked DMIGRATED after calling __task_move_to_node, we printk() a warning. regardless of outcome, we return 0. [FIXME: check __task_move_to_node's return value! printk! returning 0 in case of error?]
task_go_home_for_reason is called by code added to arch/i386/kernel/vm86.c's sys_vm86 and sys_vm86old to send the given task home, supplying a reason why (DSTAY_86), which is sent to the home node. [FIXME: what about other arches?] first, we check whether the reason flag given is already marked on this process. if it is, we printk() a warning. [FIXME: printk!] we then set the given reason flag on the given process, and test whether the process is DREMOTE or not. if it's not, we return 0. [returning 0 for error!] otherwise, we call __task_move_to_node the same way as the previous function, to send the task home. we check __task_move_to_node's return, and if it's not 0, we clear the stay reason we just set. Finally, we return the value returned by __task_move_to_node.
044 omrecv creates hpc/migrecv.c
this file contains functions for receiving parts of processes, and filling in the appropriate data structures. it also contains the openmosix mig_daemon.
mig_recv_hshake is called by mig_handle_migration, as well as task_local_bring in hpc/migctrl.c. it receives an omp_mig_handshake, and sends a reply, with our OPENMOSIX_VERSION in it. we return -1 if either the send or recv fails, along with invoking OMBUG with a short description of what happened. in case of success, we return 0. [bad comment!]
mig_do_receive_mm is called by mig_do_receive to set the passed task's mm structure to the values stored in the passed in omp_mig_mm structure, starting with start_code and ending at env_end. it starts by OMDEBUG_MIG'ing a trace message, then uses memcpy to push the values from the passed in omp_mig_mm to the given process's mm structure. [bad comment!]
mig_do_receive_vma is called by mig_handle_migration to set up a vm area in the current process matching the passed in omp_mig_vma structure's definition. we start by OMDEBUG_MIG'ing a trace message, including the start address of the vm, and its size. we then have commented out code, that is supposed to use task_rfiles_get to retrieve the file pointer that was initially associated to this page, in the case of a page mmap'd from a file. [broken!] next, we call do_mmap_pgoff to create the mapping in the mm structure. we supply it with a NULL file argument, the vm_start, vm_size, and vm_pgoff from the passed in omp_mig_vma. for the prot argument, we create a long containing the VM_(READ|WRITE|EXEC) protection flags from the vm_flags in the omp_mig_vma. for the flags argument, we create a long containing the VM_(GROWSDOWN|DENYWRITE|EXECUTABLE) behavior flags from vm_flags in the omp_mig_vma, adding MAP_FIXED and MAP_PRIVATE. we check the result of do_mmap_pgoff with IS_ERR(result), and if we have an error, we return PTR_ERR(result). otherwise, we check vm_flags for the VM_READHINTMASK flag, and if it's present, we use sys_madvise directly to give the kernel either MADV_RANDOM or MADV_SEQUENTIAL memory access hints for this page, depending on whether vm_flags is marked VM_SEQ_READ or not. assuming the IS_ERR earlier didn't cause us to return, we now return 0 indicating success.
mig_do_receive_page is called by mig_do_receive to receive a page of memory, and map it into the given task at the appropriate location. like the previous functions, first we OMDEBUG_MIG a trace message, this time including the address of the page we're creating. we then use find_vma to find the vm area that should own this page. if there's no VMA for this page, we OMBUG, then return -1. otherwise, we allocate memory for the page in userspace using alloc_page(GFP_HIGHUSER). [what about different size pages?] if that fails, we OMBUG, then return -ENOMEM.
we kmap the page into kernel space so we can fill it, then receive a page's worth of data using comm_recv. after comm_recv, we kunmap the page. to add the page in the task at the correct spot, we alloc a pte entry pointing to the address we're mapping our page to (and optionally entries in the pmd, and pud), then check to make sure the entry in question has no page already mapped to it. if it does, we OMBUG() about it. either way, we use set_pte to point this pte to our page, after applying the containing vma's vm_page_prot page protection flags while converting the address to a pte with mk_pte, and marking the resulting pte "structure" dirty using pte_mkdirty. [noteworthy: use set_pte_atomic?] [reference: http://kernel.lupaworld.com/downloads/The_Linux_Kernel_Memory_API.pdf] we then unmap the pte entry from kernel space using pte_unmap. page_dup_rmap marks the page as in-use by a pte, and inc_mm_counter marks the mm structure as owning one more page. in case either comm_recv, pud_alloc, pmd_alloc, or pte_alloc_map return an error, we free the page we allocated earlier with __free_page, and return -1.

mig_do_receive_fp is called by mig_do_receive to set up the floating point state of a given task. it first calls OMDEBUG_MIG to print a tracing message, then uses set_used_math() to mark the current(!) task as one that uses floating point math. [noteworthy: current task, or passed task?] we then call the architecture specific function for setting up a floating point state.

mig_do_receive_proc_context is called by mig_do_receive to set up the processor context of the passed in task to the values stored in the passed in omp_mig_task structure. first, it calls the architecture specific handler to set up things like processor registers and TLS entries, then handles copying the cross-arch items, like the pid and tgid, the user credentials (uid, euid, suid, fsuid, gid, egid, sgid, and fsgid), and various signal related members (blocked, real_blocked, sas_ss_sp, and sas_ss_size).
we also copy the signal handler's 'action'. we have a note about copying an rlimit here, but no code to go with it. we copy the task's comm and personality members, then call arch_pick_mmap_layout to set the task's mm structure up.

mig_do_receive is called by task_local_bring in hpc/migctrl.c to receive a process from a remote node, back into its deputy. to begin, we use __get_free_page to get a GFP_KERNEL page, then set the passed in task's state flag to DINCOMING. we clear the used_math flag, and go into the receive loop. in this loop, we first receive a req structure. we examine the dlen member of the received structure to see if it is over a page in size, and if it is, we BUG_ON, panicking the system. after that check, we decode req.type, and dispatch the data received, and the task to operate on, to the appropriate function. the loop ends when we receive an ABORT, hit the default case, or reach the MIG_TASK case, which is the last stage in migration. MIG_TASK's case sends a req back along the socket indicating that migration is complete, clears our state of DINCOMING, flushes the tlb for this process's mm structure, and returns 0. in case of failure in our __get_free_page call, either of the comm_recv calls, or any of the mig_do_receive_* functions that can return failure, we clear the DINCOMING flag, free our data page, OMBUG a failure message, and return -1. [noteworthy: if __get_free_page fails, is it right to free its result?]

mig_handle_migration is the function that a newly spawned task, ready to be filled with state, is kicked into by openmosix_mig_daemon, in order for the task to receive the contents of a task being migrated to this node. first we OM_VERBOSE_MIG a trace message, then we use task_set_comm() to set up our link back to the home node. we use obtain_mm to get a new mm structure for this task, and call mig_recv_hshake to inform the home node that we're ready to start receiving data.
we call mig_do_receive to receive all the process data, re-parent ourselves to init, then run arch_kickstart to jump into the process (now in a "runnable" state).

openmosix_mig_daemon is the migration daemon itself. first we daemonize ourselves with om_daemonize as "omkmigd". we set a flag marking ourselves DREMOTEDAEMON, and then initialize our socket/socketaddr. we use set_our_addr to initialize the socketaddr, then comm_setup_listen to open our listening socket. if comm_setup_listen returns null, we printk a warning, flush our signals, mark ourselves TASK_INTERRUPTIBLE, schedule_timeout(HZ), and loop back up to just before we called comm_setup_listen, thereby entering a loop, waiting for comm_setup_listen to work. [noteworthy: printk()!] once we have a socket, we enter the listening loop. in this loop, we run comm_accept to attempt to get a channel from a remote kernel. if we get EINTR, ERESTART, EAGAIN, or ERESTARTSYS, we check for a pending SIGCHLD. if we get one, we printk a debugging message. either way, we flush our signals, and re-start the loop. if the error returned by comm_accept wasn't one of those four errors, and wasn't NULL, we OMBUG a failure message, close our link, and return to the spot just before we call comm_setup_listen. if the error is NULL, then we've got a connection. we then call user_thread, sending the socket as the argument to the new process. if spawning the new process returns an error, we close the socket ourselves. either way, when user_thread returns, we return to the top of the loop, and wait for a new connection.

045 omsend creates hpc/migsend.c
migsend is the mirror image of migrecv. it's responsible for tearing down a process, and sending it to a remote node over a socket.

mig_send_hshake is called by task_remote_expel and task_local_send in hpc/migctrl.c to 'handshake' with the remote end, asking permission to migrate a task.
we send a handshake containing the passed in type, the OPENMOSIX_VERSION, and the task's personality flags, and see if the hshake the remote end replies with is a reply, and has a type that matches the one we sent. [noteworthy: personality should be checked and translated, to allow ia32 machines to send to AMD64.] if it doesn't, we OMBUG about it, and return -1. if either the comm_send or comm_recv fail, we return -1. otherwise, we return 0 for success.

mig_send_fp is called by mig_do_send to send the floating point state of a given task to the remote end, if applicable. we call used_math() to check whether the current process uses floating point, then if it does, we call arch_mig_send_fp to store the floating point state into an omp_mig_fp structure, and return the result of comm_send_hd'ing the omp_mig_fp structure to the remote end. if the process doesn't use floating point math, we return 0.

mig_send_mm is called by mig_do_send to send part of the mm structure of the passed task to the remote end. we copy into an omp_mig_mm structure the part of the mm of the given task starting at mm->start_code, and ending at &mm->start_code+sizeof(struct omp_mig_mm), or mm->env_end. we then return the result of comm_send_hd'ing the omp_mig_mm structure.

mig_send_vma_file is called by mig_send_vmas to add file related data to a passed in omp_mig_vma struct, describing the file associated with the passed in vm_area_struct. first, we set the vm_pgoff member to the vm_pgoff of the passed in vma. then, we set the i_size member to the inode size of the file's dentry. [noteworthy: does the file have a valid dentry when remote?] we then check whether we are running on the remote node, or not. if we are, we set m->vm_file from inode->u.generic_ip. otherwise, we set m->vm_file from vma->vm_file, and set m->f_dentry from vma->vm_file->f_dentry.

mig_send_vmas is called by mig_do_send to send the vmas of the passed process to the remote end.
it loops through the vmas, copying start, flags, and files to an omp_mig_vma struct. we then set size to vma.end - vma.start. we set vm_pgoff to 0, and call mig_send_vma_file if there is a file associated with this vma. we send the omp_mig_vma struct to the remote end, and return to the top of our loop (if there are more VMAs to send). if our comm_send_hd fails, we break out of the loop, and return the result. otherwise, we naturally exit the loop when all VMAs have been sent, returning 0 (the result of the last successful comm_send_hd).

mig_send_pages is called by mig_do_send to send all the readable pages of a given task to the remote node. to accomplish this, we iterate over all the vmas in the task (starting at task->mm->mmap), and if the vma is marked VM_READ, we loop over each page in the vma, sending first its address via comm_send_hd, then the page data itself using comm_send. if the vma wasn't marked VM_READ, we just skip it, as it's not being used by the 'running code' of the process. if either comm_send_hd or comm_send fail, we OMBUG a message including the address we were trying to send, and return -1. otherwise, once all pages have been sent, we return 0. [noteworthy: slightly different order? synchronise!]

mig_send_proc_context is called by mig_do_send to send the process' current "state" to the remote end. we fill in an omp_mig_task structure, first pulling in members from the passed in task. first ptrace, then IDs (pid and tgid). [noteworthy: ptrace is not used by the remote!] we copy the user credentials (uid, euid, suid, and fsuid), and group credentials (gid, egid, sgid, and fsgid), then copy the current signal state (blocked, real_blocked, sas_ss_sp, and sas_ss_size) along with memcpy()ing the signal handler's action struct (task->sighand->action, a struct k_sigaction). [noteworthy: groups?] we copy the niceness of the process, and its posix.1e capabilities. we store its capabilities in task->om.remote_caps, and copy the task's personality. [noteworthy: nice, caps..]
we then copy our comm structure with memcpy, and use arch_mig_send_proc_context() to save the architecture specific parts of the process' state (CPU registers, etc) to the omp_mig_task structure. we then send the omp_mig_task state structure via comm_send_hd, then wait for a reply, and return 0 if we get one. [noteworthy: check comm_recv's error value!] if comm_send_hd fails, we OMBUG about it, and return -1.

mig_do_send is the wrapper that calls all the above in the proper order. it's called by either task_remote_expel or task_local_send in hpc/migctrl.c. first we call arch_mig_send_pre to do any architecture specific work that needs to be done before migration. then, we send the task's components in the following order: MM, vmas, pages, floating point state, architecture specific components (via arch_mig_send_specific), and the processor context. we call arch_mig_send_post to clean up architecture specific tweaks done by arch_mig_send_pre, and return 0 on success. if any of the mig_send functions or arch_mig_send_specific fail, we OMBUG about it, and send a MIG_ABORT req to the remote end. we then return -1 for failure. [noteworthy: shouldn't we call arch_mig_send_post?]

046 prochpc creates hpc/proc.c
this file contains the code that adds a hpc directory to every process on the machine's proc/$PID/ directory (for controlling process migration), as well as the code adding the /proc/hpc/admin directory to control global aspects of openmosix. [noteworthy: this file is way out of order. reorg. separate into 'hpc' and 'admin']

proc_pid_set_where is called by openmosix_proc_pid_setattr when a user writes a destination to /proc/$PID/hpc/where. first we check for the string "home", and if so, we printk that we found the "HOME" string, and call task_register_migration to migrate the process to its home node. [noteworthy: printk, and bad formatting. check return of task_register_migration] otherwise, we see if we can decode an ip address by calling string_to_sockaddr. [noteworthy: check return of task_register_migration]
if we find one, we call task_register_migration to migrate the process to the ip we found. if we don't find "home" or an ip, we do nothing. [noteworthy: doing nothing is wrong.] either way, we return the size of the string passed in.

proc_pid_get_where is called by openmosix_proc_pid_getattr when a user requests the node a given process is running on, by reading /proc/$PID/hpc/where. it writes to a supplied char * the address of the node the process is running on, or "home" for the home node. first we check to see if the process is DREMOTE. if it is, we use comm_getname to get the address of the node the process is running on, use sockaddr_to_string to write it to the passed in char *, then add a '\n'. if the process is not DREMOTE, we place the string "home\n" in the passed in char *. either way, we return the length of the location, including the \n at the end.

the array stayreason_string contains the short strings matching the reasons a process might be confined to its home node, defined in hpc/task.h. [noteworthy: stayreason_string belongs either in a header, or in the function that uses it] "monkey" means a process is using a writeable memory mapped file. "mmap_dev" means a process has a character device mmapped. "VM86_mode" means a process is running in VM86 mode. "priv_inst" is supposed to mean a process is using the IN/OUT assembly instructions, but is not yet implemented. "mem_lock" means one of the VMAs or the MM structure belonging to this process is marked VM_LOCKED. "clone_vm" means either this task does not have a mm structure, or its mm structure is in use by more than one process. "rt_sched" means the process has a realtime scheduling priority set. "direct_io" is meant to mean a process has permissions to access I/O space, but it is not yet implemented. "system" means the process is either the init process (pid 1), or one of our openmosix daemons created with om_daemonize.
"extern_1", "extern_2", "extern_3", and "extern_4" are reasons that are meant to be settable by a userspace program, along with "user_lock", which indicates that the user has requested that this process not be migrated. this array is used exclusively by proc_pid_get_stay.

proc_pid_get_stay is called by openmosix_proc_pid_getattr. it writes to a passed in char * a string (taken from stayreason_string) describing each stay reason set in the stay reason mask, separated by \n, terminated by \n, and returns the length. we accomplish this by looping through the 32 possibilities, and testing for each of them with task_test_stay().

proc_pid_get_debug is called by openmosix_proc_pid_getattr. it writes to the passed in char * the hex value of the passed in task's om.dflags member. we return the length of the string we wrote, including its trailing \n. [noteworthy: clean up this string.]

proc_admin_set_bring and proc_admin_set_expel are called by openmosix_proc_admin_setattr when the user writes to /proc/hpc/admin/bring and /proc/hpc/admin/expel, respectively. they just printk a message, and return the size value passed in. [noteworthy: these two functions are STUBs! printk()! clean the messages. clean up this string.]

proc_admin_get_version is called by openmosix_proc_admin_getattr. it prints into the passed in char * a string reading "openMosix version: " and then the OPENMOSIX_VERSION_TUPPLE.

to create the entries in our proc_om_entry_admin structure (of type om_proc_entry) we create a temporary #define. proc_om_entry_admin contains entries that define the 'files' found in the /proc/hpc/admin/ directory, along with what function should be called to dispatch data being read/written to that file. we create an entry for "bring", mapping writes to proc_admin_set_bring, and reads to proc_admin_get_0. we create an "expel" entry, mapping writes to proc_admin_set_expel, and reads to proc_admin_get_0. we also add a "version" entry, mapping writes to proc_admin_set_0, and reads to proc_admin_get_version.
in effect, this means bring and expel can only be written to, and version can only be read. anything else will be routed to proc_admin_set_0 or proc_admin_get_0, which are later defined to return -EINVAL. [noteworthy: should return -EACCES!]

proc_om_entry_pid holds the entries that define the 'files' found in the /proc/$PID/hpc/ directories, along with what function should be called to dispatch data being read/written to that file. we create a "where" entry, mapping its write to proc_pid_set_where, and its read to proc_pid_get_where. we create a "stay" entry, mapping write to proc_pid_set_0, and read to proc_pid_get_stay. finally, we create a "debug" entry, mapping write to proc_pid_set_0, and read to proc_pid_get_debug. this means "where" can be read and written to, whereas "stay" and "debug" can only be read. anything else will be routed to proc_pid_set_0 or proc_pid_get_0, which return -EINVAL.

openmosix_proc_pid_getattr dispatches requests to the above functions. [noteworthy: where is this called from?] it uses proc_om_entry_pid, which contains function pointers to the appropriate dispatching function. to accomplish this, we iterate over the members of proc_om_entry_pid, checking if the name of the file passed in matches the name on this specific entry. if it does, we call the function for 'get' in the entry. we return the length returned by the 'get' function, or -EINVAL if we don't find a matching entry.

openmosix_proc_pid_setattr works in much the same way, calling the set member of the appropriate entry in the array proc_om_entry_pid to dispatch. [noteworthy: where is this called from?] it also returns the number of characters successfully written, or -EINVAL if we don't have an entry for the file in question.

proc_callback_read is called by the later defined proc_om_read_admin. it calls the appropriate handler for the file it's passed (placing its results in a kernel page), and copies the results from the page passed to the handler, into the passed in userspace buffer.
we start by using __get_free_page to get a GFP_KERNEL page. if this fails, we return -ENOMEM. otherwise, we check what filename we were requested to give results for. we find the handler by iterating through the passed in "entry" array, looking for a handler based on the name of the file passed in, and if we find one, we call the pointer to the handler stored in the entry. [noteworthy: does not this iteration method require a NULL entry to be last in the array?] if we don't find an entry matching the file in question, or if the handler returns an error, we free the page in question, and return either the error returned, or -EINVAL. next we check to see if the user requested a segment that was beyond the length of the data being returned (with fseek?). if they requested an offset greater than or equal to the length, we free our kernel page, and return 0. finally, we copy the contents of our kernel page into the passed in buf, taking into account the offset requested, using copy_to_user. if our copy_to_user returns an error, we free our kernel page, and return -EFAULT. otherwise, we free our kernel page, and return the number of bytes copied to the passed in buf.

proc_callback_write is the counterpart to the previous function. it's called by the later defined proc_om_write_admin, to dispatch writes to the /proc/hpc/admin/ directory. first, we truncate the count of data being written to PAGE_SIZE. [noteworthy: shouldn't we return an error while truncating? reject first!] then, we reject writes which have an offset, indicating they are partial writes. we use __get_free_page to get a GFP_USER page. if this fails, we return -ENOMEM. we copy the data to be written using copy_from_user. if this fails, we free our kernel page, and return -EFAULT. we iterate through the passed in om_proc_entry_t structure, looking for an entry matching the file we have been requested to write to. [noteworthy: again, isn't this iteration method broken?]
we use the same pointer tricks as above to dispatch the call to the right function. if we don't find an entry matching the given file, we free our page, and return -EFAULT. otherwise, we free the page we requested, and return the length of data written.

we define the admin subsystem's read and write calls with some #defines, creating wrapper functions proc_om_read_admin and proc_om_write_admin, wrapping the previous two functions. proc_om_admin_operations is a structure mapping .read and .write to proc_om_read_admin and proc_om_write_admin.

openmosix_proc_create_entry is called by openmosix_proc_init to register the files in /proc/hpc/admin with the /proc filesystem. we accomplish this by iterating through the passed in om_proc_entry_t structure, passing each name and mode to create_proc_entry, along with the passed in proc_dir_entry pointer. in cases where create_proc_entry fails, we OMBUG() about it.

openmosix_proc_init creates /proc/hpc/, then creates /proc/hpc/admin (using proc_mkdir). [noteworthy: where is this called from?] finally, it calls openmosix_proc_create_entry to create the 'files' in /proc/hpc/admin/. if either proc_mkdir call fails, we OMBUG about it, and return.

047 omremote creates linux/hpc/remote.c
this file holds routines for the remote node to request work done on the home node.

remote_disappear is called by both remote_do_syscall in this file, and remote_handle_user in hpc/copyuser.c, when a syscall fails. its purpose is to kill the process on the remote end. it just calls do_exit(SIGKILL), and never returns.

remote_inode_map is a vm_operations_struct mapping just the .nopage member to the normal filemap_nopage function.

remote_file_mmap is used by the rdentry code in hpc/files.c, as a member of the remote_file_operations structure contained in a dentry on the remote node. it points the provided vma's vm_ops structure to use the previously defined remote_inode_map.
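
looking back at hunk 046: the name-keyed dispatch used by the /proc handlers in hpc/proc.c, and the question of whether the iteration needs a NULL entry last in the array, can be sketched in user space as below. this is a sketch under assumptions, not the patch's code: mock_proc_entry, mock_admin_entries, mock_get_version, mock_get_0, and mock_proc_getattr are all hypothetical names, and the sentinel-terminated loop is one way the iteration could be made safe.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical 'get' handler type: fill buf, return length or -errno. */
typedef int (*mock_get_fn)(char *buf, size_t size);

struct mock_proc_entry {
    const char *name;   /* file name; NULL marks the end of the table */
    mock_get_fn get;    /* handler called on read */
};

/* Stand-in for a real reader such as proc_admin_get_version. */
static int mock_get_version(char *buf, size_t size)
{
    return snprintf(buf, size, "mock version\n");
}

/* Stand-in for the catch-all reader on write-only files. */
static int mock_get_0(char *buf, size_t size)
{
    (void)buf;
    (void)size;
    return -EINVAL;
}

const struct mock_proc_entry mock_admin_entries[] = {
    { "bring",   mock_get_0 },
    { "expel",   mock_get_0 },
    { "version", mock_get_version },
    { NULL,      NULL }              /* sentinel: the loop stops here */
};

/* Walk the table until the sentinel, dispatching on an exact name match. */
int mock_proc_getattr(const struct mock_proc_entry *table, const char *name,
                      char *buf, size_t size)
{
    const struct mock_proc_entry *e;

    for (e = table; e->name != NULL; e++) {
        if (strcmp(e->name, name) == 0)
            return e->get(buf, size);
    }
    return -EINVAL;                  /* no entry for this file */
}
```

without the final { NULL, NULL } entry the loop would walk off the end of the array, which is the concern raised in the notes on proc_callback_read and proc_callback_write.
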
remote_readpage is also used by the rdentry code in hpc/files.c, as a member of the remote_aops structure also contained in a dentry on the remote node. it's called to read a page from the home node, on behalf of the remote process. first, we map the passed page into kernel space (so we can write to it). we then get the file * originally associated with this page on the home node using rfiles_inode_get_file, and calculate the offset, placing both of these in an omp_page_req structure. we send this structure to the home node with comm_send_hd, get a reply, and write the reply directly into the page passed in. we set the page as being up to date, unmap the page, and return 0 indicating success. if our comm_recv or comm_send_hd return an error, we OMBUG() about it, mark the page as NOT up to date, as well as marking it as having an error, and return the error comm_recv or comm_send_hd gave us.

remote_do_mmap is called by both arch/i386/kernel/sys_i386.c's sys_mmap2, and arch/x86_64/kernel/sys_x86_64.c's sys_mmap2. [noteworthy: ppc needs to do this as well!] its purpose is to perform a mmap on the home node, on behalf of the remote process. we place all our arguments into an omp_mmap_req structure, and send them to the home node with comm_send_hd. we receive an omp_mmap_ret structure containing file and isize members. we pass these members to task_rfiles_get(), and receive back a file pointer. [noteworthy: check the result of task_rfiles_get?] we down_write the mmap_sem semaphore belonging to the mm structure of this process, and do our own do_mmap_pgoff() call. in this way, the mmap is done on both the remote and home nodes. we up_write the earlier semaphore, and return the result from do_mmap_pgoff. if either our comm_recv or comm_send_hd fail, we return the error.

remote_wait is called by remote_do_fork, as well as remote_do_execve. it waits for a req struct from the home node, and checks its type against the passed in value.
it also checks against the passed in len, to make sure the expected amount of data was sent. if either of these checks fail, we OMBUG about it, and return -1.

remote_do_signal is called by remote_do_comm when the deputy stub on the home node sends a signal. we comm_recv an omp_signal structure, printk a trace message, grab the lock for the current process' signal handler, and call __group_send_sig_info() to call the signal handler. we then drop the signal handler lock, and return 0. [noteworthy: printk! check __group_send_sig_info's error!]

remote_do_comm handles incoming communication from the home node, on behalf of the remote node. it's set up to comm_recv a req packet, then dispatch DEP_SIGNAL and DEP_COMING_HOME by calling remote_do_signal or task_remote_expel, as appropriate. [noteworthy: check error returned by remote_do_signal and task_remote_expel!] we return 0 after dispatching. if comm_recv fails, or if we get a req that isn't DEP_SIGNAL or DEP_COMING_HOME, we OMBUG() "failed", call do_exit(-1), and return -1. [noteworthy: be specific on type of error! shouldn't do_exit kill the process? why are we returning even? why aren't we using the earlier remote_disappear function?]

remote_do_syscall is called by om_sys_remote in hpc/syscalls.c to send a syscall request from the remote to the home node. we start by OMDEBUG_SYS()ing a trace message. then we pack the syscall requested, along with its arguments, into an omp_syscall_req structure. we comm_send_hd this structure, and OMDEBUG_SYS() a trace message. we call remote_handle_user to handle memory requests from the home node, on behalf of the syscall handler on the home node. once the home node tells remote_handle_user it's done issuing requests, remote_handle_user returns. we then comm_recv an omp_syscall_ret structure. after that, we OMDEBUG_SYS() another trace message, and return the return value packed into the omp_syscall_ret structure. if comm_send_hd, remote_handle_user, or comm_recv return an error, we immediately call remote_disappear, which kills the process, and never returns. after calling remote_disappear, we return -1. [noteworthy: never returns! why do we return after remote_disappear?]

remote_do_fork performs a fork call on both the home node and the remote node, connecting the new processes together. first, we printk() a trace message, and use sockaddr_inherit to make the sockaddr for our child have the same type (ipv4 vs ipv6) and address as our parent's sockaddr. [noteworthy: printk!] we stuff our passed in clone_flags, stack_start, stack_size, and pt_regs into an omp_fork_req structure, then open a listening socket using our child's sockaddr. when comm_setup_listen returns, we use comm_getname to get the address of the node on the other end of our child socket (in a sockaddr structure), then add our sockaddr structure to our omp_fork_req structure. [reference: http://www.linuxjournal.com/article/7660] we use comm_send_hd to send our omp_fork_req to the home node (over our parent's connection). we use remote_wait to wait for a reply, saving the reply in an omp_fork_ret structure. we then call do_fork ourselves, and use find_task_by_pid() to locate the child. we set the child's socket to our newly created socket with task_set_comm, and return the child's PID. if comm_setup_listen, comm_getname, comm_send_hd, or remote_wait fails, we OMBUG() about it, and return -1. [noteworthy: specific error messages!] if find_task_by_pid fails, we printk() about it, and return -1. [noteworthy: printk!]

count_len is just a cut-and-paste from fs/exec.c. it appears to count the length of argv members, but aborts if there are more argv members than requested. we return a count of argv members found, and set a pointer passed in to the length of the argv entries in total.

remote_do_execve performs an execve system call while on the remote node. first we get the length of the name of the file we're execve'ing from userspace with strlen_user(), and place it in an omp_execve_req structure. [noteworthy: check strlen_user's return!] then, we add argc to our omp_execve_req, by calling count_len.
we tell count_len to abort only if the number of arguments is greater than the amount of void * that can fit in (PAGE_SIZE * MAX_ARG_PAGES - sizeof(void *))/sizeof(void *). we then do the same thing to fill in the envc member. we copy the passed in pt_regs structure into our omp_execve_req, then allocate enough GFP_KERNEL space to hold our filename, argv, and envp, including newlines. we fill this space by using copy_from_user to grab the filename, argv, and envp, terminating each with a newline. we use comm_send_hd to send our omp_execve_req structure, and comm_send to send the space containing our filename, argv, and envp. we then free this space. we call remote_wait to get our response from the home node, and return 0. if either of our count_len invocations, comm_send_hd, comm_send, or remote_wait return an error, we immediately return with that error. [noteworthy: what about freeing data?] if our kmalloc fails, we return -ENOMEM. if our copy_from_user invocations fail, we return -EFAULT.

48 creates linux/hpc/service.c
this file contains the om_daemonize function, and four sockaddr related conversion functions. feels like it should be broken up.

om_daemonize creates a kernel thread, and optionally sets a high priority mode. first we call daemonize(). then, we zero the euid, suid, and gid of current. we alloc a new group_info with groups_alloc(0). we grab current->sighand->siglock while emptying the blocked signal set with sigemptyset(). we lock our task structure, then set a priority. if we were requested to use a high priority, we use SCHED_FIFO, and set an rt_priority of 0. we set task_set_stay(DSTAY_RT). otherwise, we set SCHED_NORMAL, task_clear_stay(DSTAY_RT), and set_user_nice(0). either way, we set task_set_stay(DSTAY_SYSTEM), then unlock our task structure.

sockaddr_to_string and string_to_sockaddr perform as they sound, for ipv4 or ipv6. sockaddr_setup_port is a wrapper around the proper inet_setup_port for ipv4 or ipv6.
sockaddr_inherit we saw used not long ago, to set a sockaddr structure up the same as the sockaddr used to create our current link. [noteworthy: does IPV6 not have INADDR_ANY?]

49 creates linux/hpc/syscalls.c
this file holds the callback points: when the currently running process calls a syscall, it gets directed here.

om_sys_local dispatches a syscall request to the local kernel (on the remote node). om_sys_remote calls remote_do_syscall to dispatch a syscall back to the home node. om_sys_gettid and om_sys_getpid are broken, but should return the relevant values from current->om.tgid and pid, respectively. finally, om_sys_execve dispatches a request for the execve syscall to remote_do_execve.

50 creates linux/hpc/task.c
this file contains functions that operate on tasks. that delineation isn't clear to me, couldn't all this stuff be defined as operating on tasks?

task_set_comm sets the link associated with a task to the passed in value, returns the old value, and if the new link is flagged SOCK_OOB_IN, we task_set_dreqs(DREQ_URGENT). so far as i can tell, both these flags are never used anywhere else. [noteworthy: ghosts?]

task_file_check_stay checks a given vma to make sure its file mappings don't prevent it from being migrated. it checks to see if the VM_NONLINEAR flag is set, indicating this vm area has a non-linear file mapping in it. if so, we check to see if prio_tree_empty indicates the inode's i_mmap has no regions. if there is a single region of the file mmapped in, we add DSTAY_MONKEY to the stay flags, because we're using a mmapped file access. next, we check vma->shared.vm_set.list hoping that it's empty. if it has contents, it indicates that we're using a shared memory mapping, and we add DSTAY_MONKEY to the stay flags. next, we check vma->vm_file->f_dentry->d_inode->i_mode to see if it is S_ISCHR, S_ISFIFO, or S_ISSOCK. if it is, we add DSTAY_DEV to the flags, as this is a fifo, socket, or character device file. we then return the stay flags.
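
the three checks in task_file_check_stay can be sketched in user space as follows. this is one reading of the description above, not the patch's code: mock_vma and its fields are hypothetical stand-ins for the kernel structures, the flag values are made up, and only the S_ISCHR/S_ISFIFO/S_ISSOCK macros are the real POSIX ones.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical flag values; the real ones live in hpc/task.h. */
#define MOCK_DSTAY_MONKEY 0x1u
#define MOCK_DSTAY_DEV    0x2u

/* Hypothetical, flattened view of what the function inspects. */
struct mock_vma {
    int nonlinear;            /* stands in for VM_NONLINEAR in vm_flags */
    int inode_has_mappings;   /* stands in for !prio_tree_empty(i_mmap) */
    int shared_list_nonempty; /* stands in for vma->shared.vm_set.list */
    mode_t i_mode;            /* mode of the inode backing the mapping */
};

/* Return the stay reasons this mapping would add to the task. */
unsigned int mock_file_check_stay(const struct mock_vma *vma)
{
    unsigned int stay = 0;

    /* non-linear mapping of a file that has regions mapped in */
    if (vma->nonlinear && vma->inode_has_mappings)
        stay |= MOCK_DSTAY_MONKEY;

    /* shared memory mapping */
    if (vma->shared_list_nonempty)
        stay |= MOCK_DSTAY_MONKEY;

    /* character device, fifo, or socket behind the mapping */
    if (S_ISCHR(vma->i_mode) || S_ISFIFO(vma->i_mode) || S_ISSOCK(vma->i_mode))
        stay |= MOCK_DSTAY_DEV;

    return stay;
}
```

a private mapping of a regular file yields 0 here, so it would not pin the task to its home node.
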
task_request_checkstay dispatches requests for re-evaluation of a process with regard to its stay flags. we clear DREQ_CHECKSTAY, printk a log message, then check to see if there's a reason we can clear (DSTAY_PER_MM | DSTAY_CLONE). [noteworthy: shouldn't this have DSTAY_MLOCK?] assuming there is, we lock the task, clear its flags, and start the process of re-checking for these two flags. if task->mm is null, we set DSTAY_CLONE. if mm_realusers is greater than 1, it means multiple processes are using this mm struct. in that case, we set stay reason DSTAY_CLONE. if mm->def_flags matches VM_LOCKED, it means that someone has locked a memory page. we set stay reason DSTAY_MLOCK. if we marked any flags, we propagate these marks back to the task, unlock the task, and return.

task_request_move dispatches DREQ_MOVE move requests. it clears om->whereto, then calls task_move_to_node with that value. it frees the previous om->whereto string.

openmosix_task_init initializes all openmosix related members of a given task. if it's pid 1 (init), set DSTAY_SYSTEM. if its parent is DREMOTEDAEMON, set DREMOTE. if its parent is DDEPUTY, set DDEPUTY. init the head of list task->om.files, then return 0.

openmosix_task_exit exits the current task. if it's not DDEPUTY or DREMOTE, it's not our process, so return 0. if it's our process, call task_heldfiles_clear. if we're connected, call comm_close to disconnect. assuming all went well, return 0.

task_wait_contact creates a WAITQUEUE, sets the task to TASK_UNINTERRUPTIBLE, and calls schedule. it stays in this loop until om.contact has contents, then removes the WAITQUEUE entry, sets the task's current state to TASK_RUNNING, and ends.

task_register_migration sets a process up for migration to a given node. we convert from the sockaddr passed to a string, and store this string as where to send the process. we set the DREQ_MOVE flag, call wake_up_process(), mark the process as in need of rescheduling, and return 0.
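
the hand-off between task_register_migration (which stores the destination string and raises DREQ_MOVE) and task_request_move (which takes the string, moves, and frees it) can be sketched in user space as below. this is a sketch under assumptions: mock_task, mock_register_migration, and mock_request_move are hypothetical, the flag value is made up, clearing the flag inside the handler is an assumption the text does not confirm, and the "move" is replaced by copying the destination into an output buffer so it can be inspected.

```c
#include <stdlib.h>
#include <string.h>

#define MOCK_DREQ_MOVE 0x1u   /* hypothetical value */

/* Hypothetical slice of the per-task openmosix state. */
struct mock_task {
    unsigned int dreqs;  /* pending request flags */
    char *whereto;       /* heap-allocated destination string, or NULL */
};

/* Record the destination and raise the move request. */
int mock_register_migration(struct mock_task *t, const char *dest)
{
    char *copy = malloc(strlen(dest) + 1);

    if (copy == NULL)
        return -1;
    strcpy(copy, dest);
    free(t->whereto);            /* drop any earlier, unserviced request */
    t->whereto = copy;
    t->dreqs |= MOCK_DREQ_MOVE;
    return 0;
}

/* Service a pending move. Returns 1 if a move was handled, 0 otherwise. */
int mock_request_move(struct mock_task *t, char *out, size_t outlen)
{
    char *dest;

    if (!(t->dreqs & MOCK_DREQ_MOVE) || t->whereto == NULL)
        return 0;

    t->dreqs &= ~MOCK_DREQ_MOVE; /* assumption: the handler clears the flag */
    dest = t->whereto;           /* take ownership of the string */
    t->whereto = NULL;

    if (outlen > 0) {            /* stand-in for task_move_to_node(dest) */
        strncpy(out, dest, outlen - 1);
        out[outlen - 1] = '\0';
    }
    free(dest);
    return 1;
}
```

detaching whereto before acting on it means a second request registered during the move is not lost or freed twice.
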
task_do_request checks for DREQ_MOVE or DREQ_CHECKSTAY, and dispatches them to task_request_(move|checkstay).

51 i386 creates linux/include/asm-i386/om.h
this file defines ia32 specific cpu feature detection functions, a function returning argument N passed to a syscall, a function returning the number of the syscall (arch_get_sys_nb reads regs->eax, which on i386 carries the syscall number rather than a count of arguments), the NR_MAX_SYSCALL_ARG, and a define for getting the user registers from the task struct. cpu_has_feature_fxsr was declared earlier, and tells us which i387 register format to use. arch_get_sys_arg is simple, since our arguments are stored in order in our pt_regs structure. arch_get_sys_nb just returns regs->eax. NR_MAX_SYSCALL_ARG is set to 6. ARCH_TASK_GET_USER_REGS uses offsets to get the registers out of the task->thread_info structure.

52 i386 creates linux/include/asm-i386/om-protocol.h
this file contains definitions for the architecture specific structures used when transferring a process between ia32 machines. we define MIG_ARCH_I386_LDT, which is a flag for arch-i386 telling it to transfer the local descriptor table (which is not commonly changed by a process) along with migrated processes. the LDT is changed by programs such as wine and qemu. omp_mig_fp contains the floating point context of a process. it includes a flag for which format of register save function was used. omp_mig_arch is supposed to be for architecture specific process functionality, but such functionality is still a stub. omp_mig_arch_task is the structure containing architecture specific members of the process context.

53 i386 remote-memory linux/include/asm-i386/uaccess.h [refs: hpc/uaccess.h]
this header handles simple reads and writes to/from userspace. we modify it so that we use openmosix's deputy_put and deputy_get functions. the first hunk just includes our header. the second hunk modifies get_user calls so that deputy_get_user is called if openmosix_memory_away(). [note: cleanup line endings, make code blend in better] the third hunk modifies the first of the put_user definitions so that deputy_put_user is called if openmosix_memory_away(). both the second and third hunks use the trick of just putting an if{}else in front of the current switch statement. the fourth hunk just modifies the second put_user definition so that we just return the result of deputy_put_user if openmosix_memory_away. the fifth hunk defines deputy_put_user64_helper, which is a wrapper for calling deputy_put_user64 from the following assembly. we then add logic to this definition of __put_user_size to call deputy_put_user, or deputy_put_user64_helper in cases where we want to copy 8 bytes. we use the if{}else switch method discussed earlier. the next hunk modifies the second declaration of __put_user_size, with the same if{}else trick, only with an if following. the seventh modifies __get_user_size. the eighth hunk adds code surrounded in #ifdef CONFIG_OPENMOSIX. it returns the result of deputy_copy_to_user. [note: #ifdef CONFIG_OPENMOSIX elsewhere? CONFIG_OPENMOSIX policy?] the ninth adds a deputy_copy_from_user call surrounded by #ifdef CONFIG_OPENMOSIX. [note: reorg this code.] the last hunk is surrounded by #ifdef CONFIG_OPENMOSIX, and re-defines strlen_user to check openmosix_memory_away, and call deputy_strlen_user when necessary.

54 ppc creates include/asm-ppc/om.h
this file contains architecture specific code for returning a single passed in argument in a syscall, finding the number of the syscall (arch_get_sys_nb reads gpr[0], and r0 carries the syscall number on ppc), the maximum number of arguments passable to a syscall, and a define for retrieving the pt_regs structure associated with this execution thread. arch_get_sys_arg checks to make sure you're not requesting an argument greater than 31, then returns gpr[1..32]. arch_get_sys_nb returns gpr[0]>>2. we define NR_MAX_SYSCALL_ARG 7, and use an index method to get the pt_regs structure out of task->thread_info.
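the if{}else-in-front-of-the-switch trick described for the i386 uaccess.h hunks looks roughly like this when modelled in userspace. this is a sketch, not the patch: sk_memory_away and sk_deputy_put_user are mocks standing in for openmosix_memory_away and deputy_put_user (whose real signatures are not shown in this commentary), and memcpy stands in for the inline assembly the real switch cases use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mock state: "true" models a deputy whose user memory lives on a remote node. */
static bool sk_away = false;
static int sk_deputy_calls = 0;

static bool sk_memory_away(void)
{
    return sk_away;
}

/* Mock deputy path; the real one would ship the value to the remote node. */
static int sk_deputy_put_user(const void *src, void *dst, size_t size)
{
    assert(src != NULL && dst != NULL);
    sk_deputy_calls++;
    memcpy(dst, src, size);
    return 0;
}

/*
 * Before the patch the macro body is just the switch. The patch puts
 * "if (away) ...; else" in front of it, so the local path is untouched
 * and no extra braces or labels are needed.
 */
#define sk_put_user_size(x, ptr, size, retval)                      \
do {                                                                \
    (retval) = 0;                                                   \
    if (sk_memory_away())                                           \
        (retval) = sk_deputy_put_user(&(x), (ptr), (size));         \
    else switch (size) {                                            \
    case 1: case 2: case 4:                                         \
        memcpy((ptr), &(x), (size));   /* local store */            \
        break;                                                      \
    default:                                                        \
        (retval) = -1;                 /* unsupported width */      \
    }                                                               \
} while (0)
```

the appeal of the trick is that the diff stays tiny: one line goes in ahead of the existing switch and nothing inside the switch has to be re-indented.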
55 ppc creates include/asm-ppc/om-protocol.h
this file contains process state structures that are architecture specific. the first structure here is the floating point state, which is properly filled out. the omp_mig_arch and omp_mig_arch_task structures are, however, empty.

56 x86-64 creates include/asm-x86_64/om.h
this file contains architecture specific functions for returning arguments to the syscall invocation right before we were called and returning the number of the syscall (arch_get_sys_nb reads rax, which carries the syscall number on x86-64), a define for the maximum number of syscall arguments allowed on this arch, and a define returning the address of the pt_regs structure in the thread_info structure for this process. arch_get_sys_arg uses a switch statement to select the right register, instead of an index based method like the other two arches. arch_get_sys_nb returns the bottom half of register rax (32 bits). we define NR_MAX_SYSCALL_ARG to 6, and use an index based method to get struct pt_regs, which is the last item in the thread_info struct.

57 x86-64 creates include/asm-x86_64/om-protocol.h
this file contains structures for process state that are architecture specific. omp_mig_fp holds the floating point context. omp_mig_arch is a stub as usual, and omp_mig_arch_task holds the segmentation registers, and the TLS entries associated with this task.

58 x86-64 creates include/asm-x86_64/uaccess.h [refs: hpc/uaccess.h]
this file contains a simplified version of the modifications in i386's uaccess.h, due to this being a 64 bit architecture, which makes operations on 8 bytes at a time much easier. the first hunk includes hpc/uaccess.h. the second invokes deputy_get_user inside of get_user. the third invokes deputy_put_user inside of put_user. the fourth puts deputy_copy_from_user inside __copy_from_user, using a define of CONFIG_OPENMOSIX to decide whether to compile it in or not. the fifth puts deputy_copy_to_user inside __copy_to_user, using the same define as the last hunk.
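to make the x86-64 "switch instead of index" point concrete, here is a small userspace sketch. the register order (rdi, rsi, rdx, r10, r8, r9) is the standard linux x86-64 syscall convention; the struct, the sk_ function names, their argument order and the out-of-range behaviour are assumptions for illustration, not copied from include/asm-x86_64/om.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NR_MAX_SYSCALL_ARG 6

/* Cut-down stand-in for struct pt_regs: only the registers we need. */
struct sk_pt_regs {
    uint64_t rdi, rsi, rdx, r10, r8, r9;  /* syscall arguments 0..5 */
    uint64_t rax;                          /* syscall number on entry */
};

/*
 * The argument registers are not laid out contiguously in argument
 * order in the real pt_regs, so a switch picks the right one.
 */
uint64_t sk_get_sys_arg(unsigned int n, const struct sk_pt_regs *regs)
{
    assert(regs != NULL);
    switch (n) {
    case 0: return regs->rdi;
    case 1: return regs->rsi;
    case 2: return regs->rdx;
    case 3: return regs->r10;
    case 4: return regs->r8;
    case 5: return regs->r9;
    default: return 0;  /* out of range; behaviour here is made up */
    }
}

/* Bottom 32 bits of rax, as the commentary describes. */
uint32_t sk_get_sys_nb(const struct sk_pt_regs *regs)
{
    assert(regs != NULL);
    return (uint32_t)(regs->rax & 0xffffffffu);
}
```

on i386 and ppc an index works because the argument registers sit next to each other in pt_regs; that is the contrast the entry for hunk 56 is drawing.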
59 om-core creates include/hpc/arch.h [refs: hpc/protocol.h, asm/om.h, hpc/syscalls.h]
this file prototypes the architecture specific functions for sending and receiving process pieces, dispatching a local syscall (after we've intercepted it), and starting a recently assembled process. worth noting is that we include protocol.h, define almost everything, include om.h and syscalls.h, THEN define arch_exec_syscall, so we have the syscall_parameter_t * to pass in.

60 comm creates include/hpc/comm.h
this file declares all of the communications subsystem, both the part that is a wrapper of the sockets interface, and a separate section that wraps OM datastructure sending routines. [note: dead code] we start by defining SOCK_INTER_OPENMOSIX, which is unused. then we define SOCK_OOB_IN, which is tested for, never set, and causes another bit to be set, which is never tested for. the next nine defines are timeout definitions used just once, in comm_setup_tcp, or not at all. MIG_DAEMON_PORT is a duplicate of REMOTE_DAEMON_PORT. the first grouping of functions is a sockets wrapper set. the second is openmosix specific shortcut functions: one to set up, one to listen, one to connect, one for sending large data chunks, and one for sending messages. after that is a define for task_set_comm that belongs in hpc/task.h.

61 creates include/hpc/comm-ipv4.h
this header contains string_to_inet and inet_to_string functions, and inet_setup_port, which is a wrapper that just sets sa_in->sin_port to the passed port.

62 creates include/hpc/comm-ipv6.h
this header is similar to the last: it contains string_to_inet6 and inet6_to_string, and inet6_setup_port, which wraps sa_in6->sin6_port.
63 creates include/hpc/debug.h
this header prototypes a series of debugging functions matching the regex proc_debug_get_(loadinfo|admin|lfree_mem|pkeep_free|nodes) (which don't exist in our sourcebase), defines arch specific om_debug_regs, defines debug_mlink, debug_page, and debug_vmas, defines debug_regs (which doesn't exist), and defines some macros for printk. OMDEBUG_* are defined as macros of OMDEBUG.

64 creates include/hpc/hpc.h
this file contains the kernel's API to openmosix. it prototypes all of the higher level functions of openmosix. user_thread is a function for creating a user thread. info_startup does not exist. openmosix_proc_* are part of proc.c. the next 6 functions are part of kernel.c. i might mention that, unlike most headers, this header has comments saying where each item is located. the rest of this file belongs to task.c.

65 creates include/hpc/mig.h
this file prototypes the migration daemon, defines the port we listen on, and prototypes six migration related functions. the first two, mig_do_receive and mig_do_send, are entry points into their respective files. mig_send_hshake and mig_recv_hshake are for establishing a connection. task_move_to_node and task_expel perform the actual task of migration.

66 creates include/hpc/omtask.h
this file contains the openmosix_task structure. so far as i can tell, the features[] declared at the end, and thus the first define, is unused. the rest of this file seems to be well commented.

67 creates include/hpc/proc.h
header exclusively used by proc.c. the first four functions are not directly called by name, but instead are defined because the E() definition in proc.c will direct operations we've filled in 0 for to these functions. om_proc_entry is defined to hold an entry in the /proc/hpc/admin/ directory. it holds the set and get function pointers, along with the name, mode, length, and type. om_proc_pid_entry is defined to hold an entry in the /proc/$PID/om/ directory.
it holds set and get function pointers, along with the name, mode, length, and type.

68 creates include/hpc/protocol.h
this file defines all the flags, bitmasks, and datastructures associated with the architecture independent part of the openmosix inter-kernel wire protocol. [note: spacing issues] it's reasonably well commented. note that DEP_FLG, MIG_FLG, and REM_FLG are not used outside of this header. REPLY is used all over the place. the next 21 defines use their own 8 bit value, ored with _FLG's 8 bit value (in the 16 bit place), to make a constant 16 bit value (flag+type).
omp_mig_task contains the complete context of a task, both architecture independent parts, and an architecture dependent structure (with a definition inherited from our arch/ code). omp_mig_mm contains values defining the process memory layout out of the mm struct. omp_mig_vma contains values out of a given vm_area_struct required to reconstruct that vma on the remote end. it's worth noting that file and dentry don't mean much when remote. [note: use define for constant 7]
omp_syscall_req and omp_syscall_ret are used to perform remote syscalls. omp_fork_req and omp_fork_ret are used to perform a fork on home, from remote. omp_usercopy_req and omp_usercopy_emb are part of the deputy kernel to remote process memory access API. omp_page_req is a structure containing a request for a page of a file that's open on the home node, from the remote node. omp_mmap_req holds a request to mmap a chunk of a file on the home node, from the remote node. [note: out of order?] omp_execve_req passes the arguments for an execve request to the home node, from the remote node. omp_execve_ret holds the return value from the home node. omp_mmap_ret returns the results of a mmap request. omp_signal is used for passing signals both directions.
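one way to read the "8 bit type ored with an 8 bit flag to make a 16 bit constant" scheme from hunk 68 is sketched below. the layout (flag in the upper byte, type in the lower byte) is an assumption, and so are all of the SK_ names and values; the real DEP_FLG, MIG_FLG, REM_FLG and the 21 message defines are in hpc/protocol.h and may be arranged differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bytes standing in for DEP_FLG, MIG_FLG and REM_FLG. */
#define SK_DEP_FLG 0x01u
#define SK_MIG_FLG 0x02u
#define SK_REM_FLG 0x04u

/* Assumed layout: flag byte in bits 8..15, type byte in bits 0..7. */
#define SK_MSG(flg, type) \
    ((uint16_t)((((flg) & 0xffu) << 8) | ((type) & 0xffu)))

/* A made-up message constant, built the way the 21 real defines might be. */
#define SK_MIG_EXAMPLE SK_MSG(SK_MIG_FLG, 0x03u)

/* Runtime variant that rejects values that do not fit in one byte. */
uint16_t sk_msg_make(unsigned int flg, unsigned int type)
{
    assert(flg <= 0xffu && type <= 0xffu);
    return SK_MSG(flg, type);
}

/* Split a combined value back into its two halves. */
unsigned int sk_msg_flag(uint16_t msg) { return (unsigned int)(msg >> 8); }
unsigned int sk_msg_type(uint16_t msg) { return (unsigned int)(msg & 0xffu); }
```

with a layout like this a receiver can tell which subsystem a packet belongs to by looking at one byte, and which request it is by looking at the other.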
69 creates include/hpc/prototype.h
this file contains debugging related defines, prototypes for the three top level functions called as a deputy, the definition of the rfiles API and structures, and the definition of the communications system on the remote node. the first ifdef block handles turning on and off whether OM_VERBOSE_MIG does something or not, based on whether CONFIG_OPENMOSIX_MIGRATION_VERBOSE is defined. we then define OMBUG as a macro of printk().
deputy_die_on_communication is called when the deputy process gets an error recv()ing data from the remote. deputy_main_loop is basically main() for the deputy process. deputy_startup initializes the environment for deputy_main_loop, so that the next time this process is scheduled, openmosix_pre_usermode kicks us into deputy_main_loop.
om_held_file is the structure representing a file to the deputy process. list contains a pointer to the rfiles entry, which is never used. rfile_inode_data holds a file on remote, with data for sending requests home. task_heldfiles_add is called in deputy_do_mmap_pgoff to add a file to our list of managed file pointers on the deputy. task_heldfiles_clear is called by openmosix_task_exit on process exit, via a later hook in kernel/exit.c. its role is to safely call fput() on om_held_file->file. task_heldfiles_find is called by deputy_do_readpage to find the local om_held_file corresponding to a remote file *. task_rfiles_get is called on the remote by mig_do_receive_vma to get a file * to a file created by rdentry_create_file(). rfiles_inode_get_file is called by remote_readpage to get a file * from a file->f_dentry->d_inode passed in. it's just a wrapper.
at this point there's a break in the file, and we start prototyping remote_ functions. remote_disappear is in remote.c. remote_mmap does not exist. remote_do_syscall, remote_do_comm, remote_do_fork, remote_do_mmap, remote_file_mmap, and remote_readpage are in remote.c.

70 creates include/hpc/service.h
this is the header for service.c.
it prototypes all of the functions in service.c. [note: naming of variables in prototypes] sockaddr_to_string and string_to_sockaddr are just conversion functions. sockaddr_setup_port calls the right inet6?_setup_port, based on whether we are using ipv4 or ipv6. sockaddr_inherit fills sa with the same kind of sockaddr as was used in the creation of the passed in socket * mlink. om_daemonize creates an OM daemon. it's used to create omkmigd.

71 creates include/hpc/syscalls.h
this is the header for syscalls.c. all of the definitions here are used in syscalls.c, and only syscalls.c.

72 creates include/hpc/task.h
this file contains some init related datastructures, dflags, dreqs and dstay related constants, task_ related function definitions and prototypes, and two task_ related datastructures. this entire file is wrapped in CONFIG_OPENMOSIX; if that isn't defined, OPENMOSIX_INIT_TASK and OPENMOSIX_INIT_MM are defined as comments. otherwise, OPENMOSIX_INIT_TASK is defined to a structure for initializing the .om members of the task structure, and OPENMOSIX_INIT_MM is defined to init the .mm_realusers member of the mm structure.
the dflags related defines seem well commented to me, except for DREMOTEDAEMON. the openmosix migration daemon is marked DREMOTEDAEMON. when it spawns a child, that child process is automatically marked DREMOTE by the code in openmosix_task_init. DREQ_URGENT is used once, in code that is inherited from the 2.4 branch, and does nothing, ATM. DSTAY_PER_MM is used along with DSTAY_CLONE in task_request_checkstay, as a list of reasons to check, and flag. task_(clear|set|test)_* could have been created as a macro. it's worth noting that dreqs related wrappers use atomic_* functions, where the others just use bitmasks on values directly. task_add_balance_reason does not exist. task_check_stay does not exist.

73 creates include/hpc/uaccess.h
this header is included from include/$ARCH/uaccess.h. it prototypes the API the kernel uses to access remote memory.
it's wrapped in CONFIG_OPENMOSIX; if that isn't defined, openmosix_memory_away, deputy_put_user and deputy_get_user are defined as no-ops. [note: what about user64?] otherwise, we prototype our deputy_ functions for doing memory access from the home kernel to a remote process, then we define openmosix_memory_away. the first if in openmosix_memory_away checks if we're not IN a process. the second checks if we're in a deputy process.

74 creates include/hpc/version.h
defines to create version tuples, all of which are set to 0. we'll worry about versions once we have A working version.

75 include/linux/compiler.h
here we define OM_NSTATIC and KCOMD_NSTATIC, which plainly mean "if KCOMD or OPENMOSIX are defined, don't make these static". otherwise, we just define them to nothing.

76 creates include/linux/hpc.h
this header just includes two other headers, and defines OM_MM(task), which we only use in mm/mmap.c. if we're not CONFIG_OPENMOSIX, OM_MM evaluates to 1 (true).

77 include/linux/init_task.h
here we add code to call the earlier OPENMOSIX_INIT_TASK and OPENMOSIX_INIT_MM defines, to initialize openmosix specific members of the task and mm structures. [note: trash in patch]

78 include/linux/net.h
this is the last fragment of the code to make sock_alloc an exported kernel symbol. all we're doing is adding a prototype for it in this header.

79 include/linux/sched.h [refs: hpc/omtask.h]
here we actually add our members to the task and mm structures. one of our fragments is commented, one isn't.

80 DROP include/linux/signal.h
name some function parameters, to make it easier to read.

81 kernel-kcom kernel/exit.c [refs: linux/hpc.h]
this file includes modifications to export exit_mm and reparent_to_init so that our openmosix code can call them, and code to modify task destruction. [note: trash in patch] the first hunk is bogus. the second hunk includes our header, and prototypes exit_mm as OM_NSTATIC. the third declares reparent_to_init as OM_NSTATIC.
the fourth declares exit_mm as OM_NSTATIC and delays the call of mm_release until later. the fifth calls openmosix_task_exit, which completes the mm_release call that was delayed earlier.

82 kernel/fork.c [refs: linux/hpc.h]
this is where we modify mm_init, dup_mm, copy_mm, and copy_process. the first hunk includes our header. in the second hunk, we change mm_init to also initialize the mm_realusers member of the mm structure. in hunk three, we modify dup_mm so that on success of creating a new mm struct for the current task, we remove its DSTAY_CLONE flag. in the fourth hunk, we modify copy_mm so that if we were asked for CLONE_VM, we mark the old vm as having another user before using it for the new process. in the fifth hunk, we modify copy_process to call openmosix_task_init to init the openmosix members of the newly created task structure. [note: whitespace cleanups] the next hunk adds our call to openmosix_pre_clone at the top of do_fork; the one after that adds our call to openmosix_post_clone at the end of a successful run through do_fork, before the return.

83 kernel-kcom kernel/sched.h
make task_rq_lock and task_rq_unlock OM_NSTATIC.

84 MAINTAINERS
add Vincent to the MAINTAINERS file.

85 config Makefile
three changes. one adds -om to the end of a kernel name in a debian kernel-package incompatible way. the second adds a rule for running an unsparse program, which i believe is a remnant of 2.4, and can be removed. the third adds hpc to the list of core-y directories to build.

86 mm/mlock.c
we modify mlock.c for updating stay flags. sys_mlock marks a process DSTAY_MLOCK, and sys_munlock unmarks it; sys_mlockall marks a process DSTAY_MLOCK, and sys_munlockall unmarks it.

87 mm/mmap.c [refs: linux/hpc.h]
this file gets updated to update the stay reasons when mmap related events occur. the first hunk includes our header.
the second hunk causes __remove_shared_vm_struct to notify openmosix when removing the last node in a shared memory segment, or when we are both parent and head of the vm list (there are no other shared users). the third adds the stay_reason variable. the fourth, fifth, and sixth make sure there is a MM struct associated to this task before accessing it in do_mmap_pgoff. the seventh marks us DSTAY_MONKEY if the mmap is writable, DSTAY_MONKEY if i_mmap_writable, and DSTAY_DEV if S_ISCHR. the eighth calls deputy_do_mmap_pgoff if all the above tests cleared. the ninth passes stay_reason to stay_me_and_my_clones. the last one makes it so that get_unmapped_area always returns a PAGE_ALIGN'ed result. [note: whitespace?]

88 net/socket.c
the last of the code exporting sock_alloc so kcom can use it.
}}}
= Commentary on [http://osource.org/openmosix/patches/series1/openmosix-documentation.patch openmosix-documentation.patch] =
{{{
1 hpc/copyuser.c
documentation updates. the first hunk adds docs for three arguments to deputy_copy_from_user. the second adds docs for three args to deputy_strncpy_from_user. the third adds docs for three args to deputy_copy_to_user, but the third argument is named wrong.

2 hpc/kcomd.c
documentation updates. the first hunk adds a comment labeling and describing socket_listen. the second and third fix spacing issues; someone was using tabs. the fourth labels and fully describes accept_connection. the fifth documents data_write, which is a stub. the sixth and seventh are more tab updates. the eighth starts with a spacing update, then adds a label and description for kcomd_thread. the ninth is another spacing update.

3 hpc/migsend.c
the changes in this patch document the mig_send_* and mig_do_send functions. the first hunk adds a label and description to mig_send_mm. it mentions that mig_send_mm waits for acknowledgement. the second hunk adds a label and description to mig_send_vmas. no acknowledgement label is present.
the third hunk adds a label and description to mig_send_pages, again with no acknowledgement label. the fourth hunk adds a label and one line description to mig_send_proc_context. the fifth adds a well formatted one line description and label to mig_do_send.
}}}
= openmosix-cleanup.patch =
{{{
1 kcomd hpc/kcomd.c
the first hunk corrects error handling in socket_listen so that received datastructures are properly destroyed. the second hunk corrects error handling, and adds error messages in each error condition. [note: one missing error] it also adds code to inherit the ops and type of our message passed socket into the sock dedicated to this connection, and code to retrieve the address of the peer we're talking to. we're still not checking the peer's address.

2 om-rmem include/hpc/hpc.h
prototype remote_handle_user, declared in copyuser.c.

3 kcore include/hpc/mig.h
prototype reparent_to_init, which is part of kernel/exit.c.

4 om-remote include/hpc/migrecv.c
prevent gcc warning.

5 kcore include/net/socket.h
prototype sock_alloc so we can use it elsewhere.

6 kcore-DROP include/linux/compiler.h
include linux/config.h. what was the purpose of this?

7 kcore linux/net/socket.c
change sock_alloc's declaration so it is no longer static, period.
}}}
= openmosix-kcomd-base-functions.patch =
{{{
001 hpc/kcomd.c
this patch is the first in a series of patches rebuilding kcomd. we start by adding three headers, and declaring a global variable indicating whether kcomd is running. the next hunk removes code that shouldn't be in a .c file, namely the kcom_pkt, kcom_node, and kcom_task structures, the kcom_nodes list and its lock, socket_fds, socket_fds_bitmaps, and maxfds. it also removes the helper function alloc_fd_bitmap, kcom_pkt_create, all of the *kcom*node* list functions, the comm_simple stub, and the prototypes for the non-existent functions comm_ack, comm_iovec, and comm_iovec_ack.
the third and fourth hunks change the kcom_node_add call in accept_connection to return a node pointer, and store the address of the remote end in the node pointer. the next hunk creates the data_send, data_exception, append_in_packs, and pkt_read functions, destroying the data_read, dispatch, kcom_task_create, kcom_task_delete, __kcom_task_find, kcom_task_find, and kcom_task_send functions. we also flesh out data_write.
data_send first marks down the time it starts, then uses sock_sendmsg to send the passed in kcom_pkt structure and its size to the remote end. [note: similar to comm_send?] we then check the size of the data member pointed to by the kcom_pkt, and if it's less than 32, copy it into a 32 byte buffer, and send that buffer and a length of 32 to the remote end. otherwise, we send kcom_pkt->data and its length to the remote end. [note: bad error message formats] our while loops for sending are wrapped to use KERNEL_DS, and restore fs to its saved state upon exiting the while. after exiting the while loop that sends the data, we mark down the time. notice that we don't do anything with our time measurements. we return the amount of data written on success (not including the kcom_pkt wrapping it).
data_exception is supposed to clean up in case of a dropped connection. according to the comment, it's broken, and its free calls are commented out.
append_in_packs places a passed kcom_pkt into the queue belonging to the task the packet is marked as destined for. it examines the passed kcom_pkt->type to determine whether this packet was created on behalf of a deputy process or a remote process, and places the packet in the queue belonging to rpid or hpid, respectively. it then wakes up the process in question. [note: typos!]
pkt_read is called to read a packet and either place it in a queue to a destined process (with append_in_packs), or dispatch it immediately due to it being a migration related request (go home, come home, init). [note: error handling.]
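the payload handling described for data_send (anything shorter than 32 bytes gets copied into a 32 byte buffer and sent as 32 bytes) can be sketched like this. the function and constant names are invented, the socket send itself is left out, and zero filling the unused tail of the buffer is an assumption; the commentary doesn't say what the real code leaves there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_MIN_PAYLOAD 32  /* the 32 byte threshold from the description */

/*
 * Decide what to hand to the send loop.
 * Returns the number of bytes to transmit and stores the buffer to
 * send from in *out. Short payloads are staged in the caller's scratch
 * buffer; long ones are sent straight from the packet's data.
 */
size_t sk_pick_payload(const void *data, size_t len,
                       unsigned char scratch[SK_MIN_PAYLOAD],
                       const void **out)
{
    assert(data != NULL && scratch != NULL && out != NULL);

    if (len < SK_MIN_PAYLOAD) {
        memset(scratch, 0, SK_MIN_PAYLOAD);  /* zero fill: an assumption */
        memcpy(scratch, data, len);
        *out = scratch;
        return SK_MIN_PAYLOAD;
    }

    *out = data;
    return len;
}
```

one consequence of this scheme is that the receiver can always read at least 32 bytes of payload after the kcom_pkt header without first checking the length field.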
data_write iterates through each task that has a process on the passed in node, and uses data_send to send pending packets. after sending, packets have their memory freed. [note: spacing.] there is no error checking in this function.
the next two hunks perform a major overhaul on the kcomd_thread function, add two flags to the kernel_thread invocation that creates the kcomd_thread, and flesh out kcomd_exit. the changes to kcomd_thread start by utilizing kmem_cache_create to create several caches that are never used (but are properly destroyed later). we block all signals to the current process right after calling daemonize, add code to alloc_fd_bitmap just once outside the while loop, and move some variables out of the while loop, into the top of the function. inside of the while loop, we've disabled our locking functions around kcom_nodes_lock, and we've inserted a lot of debugging code that's commented out. [note: way too much commented out code] we're measuring the time the while loop takes to complete, and starting the do_select part of the function. we set up to measure the time used, and we insert a new method of using do_select. we enable SIGHUP, and sleep until we get it from kcom_send. we insert much better error handling code, and dynamically allocate a fd pointing to the socket for the node. our bit testing section has been completely re-written, and we actually clean up on exit of kcomd. we add CLONE_FS and CLONE_FILES to the kernel_thread call in kcomd_init, and flesh out kcomd_exit by setting a global variable and sending SIGHUP if we can find the kcomd task.

002 config hpc/Makefile
add kcom.o to our list of object files.

003 omcore include/hpc/protocol.h
our first hunk just corrects a spacing issue. the second hunk re-defines how we set our constant flags. in general, it's a nice cleanup, but could use more docs. the third hunk is noise. drop.
004 omcore include/hpc/prototype.h
we add a whole bunch of declarations for functions we don't have, and some we do, and some we just added. no surprise, since this patch is part of a set.

005 ommig include/hpc/task.h
change the prototype of task_register_migration, so that we no longer require a sockaddr, just a task.

006 kcore net/socket.c
EXPORT_SYMBOL_GPL(sock_alloc) if CONFIG_KCOMD or CONFIG_KCOMD_MODULE.

007 kcore fs/select.c
EXPORT_SYMBOL_GPL(do_select) if CONFIG_KCOMD_MODULE.

008 kcore include/linux/compiler.h
don't make functions declared KCOMD_NSTATIC static if CONFIG_KCOMD or CONFIG_KCOMD_MODULE is set.
}}}
= openmosix-kcomd-migsend-to-kcomd.patch =
{{{
001 linux/hpc/migsend.c
our first hunk just adds some headers. the second is spacing related noise, drop. the third changes mig_send_fp to use kcom_send_with_ack instead of comm_send_hd. the fourth hunk re-writes mig_send_mm to use kcom_send_with_ack, only it also stops using an omp_mig_mm structure, and instead just relies on sizeof(omp_mig_mm). the fifth hunk changes mig_send_vmas to preserve vm_pgoff during transmission, and use kcom_send_with_ack instead of comm_send_hd. the sixth changes mig_send_pages to allocate a page of memory, copy our data there, and send from that buffer using kcom_send_with_ack. [note: no error checking.] the next two hunks swap out comm_recv with kcom_send_with_ack inside of mig_send_proc_context. hunk nine uses kcom_send_with_ack at the top of mig_do_send to request permission to migrate a process, before jumping in to sending. the final hunk printk's when a process migrates successfully, and changes the fail_mig routine to print an error, and not to send anything to the remote end if we fail to migrate.
}}}
= openmosix-kcomd-proc-to-kcomd.patch =
{{{
001 omproc hpc/proc.c
the first hunk adds three headers: two for inet related functions, one for kcom. the rest of the patch changes both the proc_pid_set_where and proc_pid_get_where functions.
in proc_pid_set_where, the first real difference is that instead of just printing home detected, and reacting, we print "HOME detected - on deputy node" and react based on whether we are the deputy or the remote as to how we get migration accomplished. home on deputy sends MIG_COME_HOME to tsk on remote. home on remote node calls task_register_migration. IP on deputy is broken, IP on remote calls task_register_migration. proc_pid_get_where is modified so that instead of using comm_getname and sockaddr_to_string, we use variable assignments, and bitmask tricks with sprintf. [note: examine portability.]
}}}
= openmosix-kcomd-remote-to-kcomd.patch =
{{{
001 remote hpc/remote.c
the first hunk includes two header files. the second hunk revamps remote_do_signal, making it accept a packet in its parameters, instead of trying to comm_recv a packet off the queue. [note: trash] we also transmit an acknowledgement packet, waking up kcomd with SIGHUP to do so. except that we're not looking for kcomd, and don't declare the variable kcomd's pid is stored in. [note: we break things!] therefore, -EBROKENCODE. the next hunk disables calling remote_do_signal inside of remote_do_comm. the last two hunks change remote_do_syscall so that we transmit our packet by calling kcom_send_with_ack, and setting ourself to TASK_INTERRUPTIBLE. we call remote_handle_user to dispatch memory requests from the home node (unless this syscall is exit, in which case we just exit). when it returns, it returns with the result of our syscall. we return this result.
}}}
= openmosix-kcomd-task-to-kcomd.patch =
{{{
001 om hpc/task.c
our first hunk includes the kcom.h header. the second hunk changes our task_move_to_node invocation in task_request_move to not clear om.whereto or free its memory. [note: use omdebug() instead of printk] the third hunk adds a debugging printk at the top of openmosix_task_init. [note: spacing] the fourth hunk is just a spacing fix, drop.
the fifth hunk starts with a spacing fix in openmosix_task_exit, but continues on to change from just clearing heldfiles and closing the connection, to dumping stack, calling kcom_task_delete(), clearing heldfiles, and freeing task->om.whereto. the final hunk changes task_register_migration so that it no longer accepts a destination as a parameter, doesn't mess with task->om.whereto, and is exported via EXPORT_SYMBOL_GPL.
}}}
= openmosix-kcomd-migctl-to-kcomd.patch =
{{{
001 omrecv hpc/migrecv.c
our first and second hunk just include three headers for us. the third hunk adds the mig_do_receive_home and mig_do_receive_init functions, using EXPORT_SYMBOL_GPL to export them. [note: ;;]
we create mig_do_receive_home, which is a function for achieving a move from a remote node back to the home node. we are given a packet via a passed in argument, and check if it is marked PKT_NEW_MSG. if it is, we assume this is the home node being migrated to, send a PKT_ACK packet, call task_register_migration and return 0. [note: spacing] if the packet passed in was not marked PKT_NEW_MSG, we assume this is the remote node being migrated from, and call wake_up_process on the task in question. we return 0 for success, -1 for failure. [note: spacing]
the mig_do_receive_init function is called by kcomd with a MIG_INIT packet, to do the "work" of setting up a process on the current (remote) node on behalf of another node. first, we check to see if the packet passed in as an argument was marked PKT_NEW_MSG. [note: error handling!] if it isn't, we just return 0. if it is, we begin constructing our response packet, and check whether this is migration via loopback, defined as 127.0.0.1. [note: why treat loopback differently? why loopback migrate at all? what if loopback isn't 127.0.0.1? ipv6?] if it is, we use kcom_home_task_find to find the kcom_task structure associated with the original process.
otherwise, we use kcom_task_create to make a new task, and return its kcom_task structure, and we copy the PID of the task on the home node from our MIG_INIT packet to kcom_task->hpid. [noteworthy: many comments indicating this code needs help!] we delete the packet we were called with, and call user_thread to handle migration (via mig_handle_migration), and wait for it to set a variable we're spinning on. once that variable is set to non-zero, if it's greater than zero, it's the PID of our new process, after migration has completed. if it's negative, something went wrong, and we send a NACK flag, indicating failure, and return -1. assuming the PID was positive, we set the rpid member of our kcom_task, and send an ACK packet back to the home node, indicating success, and telling it what the PID of the new process to talk to is. we then return 0.

mig_do_receive_mm gets a bit of a facelift, using a passed in packet, wrapping the actual mm modification in down_write and up_write, and sending a response with kcom_send_ack. it also gets EXPORT_SYMBOL_GPL'd. [noteworthy: when should we down_write?]

mig_do_receive_mm_area gets renamed to mig_do_receive_vma, and a facelift. [noteworthy: spacing issues in patch!] the first obvious changes are that we now use a passed in packet, and instead of using the given vm_flags, we mark pages RWX. we've added code to check the response from sys_madvise, and if it returns nonzero, we kcom_send_nack, and return the result. otherwise, we kcom_send_ack, and return 0. this function is also EXPORT_SYMBOL_GPL'd.

mig_do_receive_page is adjusted so that it accepts a passed in packet, sends a kcom_send_nack in case of failure, uses alloc_zeroed_user_highpage instead of alloc_page, and uses kcom_send_ack in case of success. this function is not EXPORT_SYMBOL_GPL'd.

mig_do_receive_fp gets a similar treatment, receiving a passed packet, sending acknowledgement with kcom_send_ack, and returning 0. it's also not EXPORT_SYMBOL_GPL'd.
mig_do_receive_proc_context is modified to receive a passed in packet, use the sys_set family of functions to set members of the task_t structure related to id/credentials, use set_personality instead of touching p->personality, send an ack using kcom_send_ack, and return 0 in case of success. [noteworthy: set_personality gets p from where?]

mig_do_receive is completely re-written, starting off by sitting and spinning, waiting on kcomd to fill in the mytsk pointer for this structure (which is never done!). the rest of the function now initializes om.whereto if DREMOTE, sets us to TASK_INTERRUPTIBLE, and enters a while(1) loop. in this loop, we look for incoming packets and dispatch, just like the old version of this function. at the end of the loop, we print a message, and reschedule, so that kcomd can run (and thus feed us packets).

the last function in this patch is mig_handle_migration. this function is started by the user_thread call in mig_do_receive_init, and is the "top" of the newly created process. we start by re-parenting to init, calling obtain_mm, setting ourselves to DREMOTE, then telling mig_do_receive_init our pid. after that, we jump into the mig_do_receive function to receive all our process state. we set ourselves to TASK_RUNNING, call schedule, print a message saying we're starting the new process, clear_thread_flag(TIF_SIGPENDING), and call arch_kickstart to jump into the new process. we add some test code just in case arch_kickstart returns, and call do_exit(SIGKILL) if we run into errors.
}}}
= openmosix-kcomd-remote-preuser-to-kcomd-api.patch =
{{{
001 om hpc/kernel.c

the first hunk just includes two headers.

the second changes remote_pre_usermode to dispatch packets containing requests for signals to remote_do_signal and delete them before returning 0. [noteworthy: remote_do_signal returns ?]

the last hunk removes the code kicking off the openmosix_mig_daemon, making openmosix_init into a stub returning 0.
}}}
= openmosix-kcomd-move-copy-to-user-to-kcomd-api.patch =
{{{
001 rmem hpc/copyuser.c

our first hunk just includes two headers.

the second and third hunks change deputy_copy_from_user to use kcom_send_with_response, instead of using comm_send_hd and then comm_recv to get a response. [noteworthy: error handling? the sizeof in the kzalloc looks funny.]

the fourth hunk changes deputy_strncpy_from_user to use kcom_send_with_response to send its request, and receive the data from the remote end. [noteworthy: no free of u? error handling?]

the next hunk changes deputy_copy_to_user so that instead of sending two packets containing our request to the remote end (one with comm_send_hd, the other with comm_send), and not getting a response, we now send one large packet with kcom_send_with_ack, and get an acknowledgement from the remote end. [noteworthy: error handling!]

hunk six changes deputy_strnlen_user to use kcom_send_with_response, instead of using comm_send_hd and comm_recv. [noteworthy: error handling!]

hunks seven and eight change deputy_put_userX to use kcom_send_with_ack instead of just comm_send_hd. [noteworthy: error handling!]

hunk nine changes deputy_get_userX to use kcom_send_with_response instead of comm_send_hd and comm_recv. [noteworthy: error handling! bad comment.]

remote_copy_user gets broken into two functions, remote_copy_from_user and remote_copy_to_user. remote_copy_from_user creates a buffer allocated via kmalloc(GFP_KERNEL), uses copy_from_user to fill it, and replies with the contents via kcom_send_resp. we return the result of the copy_from_user call. [noteworthy: kfree()?] remote_copy_to_user just calls copy_to_user and sends an ack with kcom_send_ack, returning the result of the copy_to_user call.

remote_strncpy_from_user is changed to accept a passed in packet, send its reply with kcom_send_resp, and return the result from strncpy_from_user.

remote_strnlen_user has been modified to accept a passed in packet, create a buffer, send that buffer with kcom_send_resp, and return 0.
remote_put_user is changed to accept a passed in packet, and send an acknowledgement using kcom_send_ack.

remote_get_user is changed to accept a passed in packet, create a buffer, fill that buffer with get_user, send a reply with kcom_send_resp, and always return 0. [noteworthy: spacing!]

remote_handle_user gets a re-write, basically performing like the previous rendition, except for accepting a passed packet, setting ourself TASK_INTERRUPTIBLE, and completely new code for handling a SYSCALL_DONE packet, which is our exit path out of this loop. it wakes up kcomd after attaching a newly created packet to our out packets. it then deletes our passed packet, sets us to TASK_RUNNING, calls schedule, and returns the result of the syscall (given in the passed in packet).
}}}
= openmosix-kcomd-move-deputy-to-kcomd-api.patch =
{{{
001 omhome hpc/deputy.c

the first hunk just adds our kcom header.

the second changes deputy_do_syscall to accept a passed in packet, and immediately reply with an acknowledgement packet. [noteworthy: why not kcom_send_ack?] we use kcom_send_with_ack to send our response (the result of the syscall) to the remote node, instead of comm_send_hd, and remove a debugging message, making our debugging slightly less verbose. [noteworthy: compare and contrast this manual ack creation with kcom_send_ack.]

the third hunk tries to expand deputy_do_sigpending, but fails: it is still a stub that calls do_signal, only now the stub also prints a message, and de-queues all pending signals.

the next hunk completely comments out the deputy_process_communication function.

the last hunk changes deputy_main_loop so that instead of just checking comm_wait, and dispatching to deputy_process_communication, we spin on our incoming packet queue, and when a packet arrives, we check if it's a syscall, and dispatch it if so. while we're spinning on packets, we call deputy_process_misc before rescheduling ourself.
}}}
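the deputy_main_loop change described above (spin on the incoming packet queue, dispatch syscall packets, call deputy_process_misc while waiting) can be sketched in plain userspace C. this is a minimal sketch under stated assumptions, not the patch's code: the packet layout, PKT_SYSCALL/PKT_OTHER, the _stub functions, the skipped counter and the bounded poll count are all hypothetical stand-ins, and the real loop reschedules instead of counting polls.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical packet types; the real constants live in the kcom headers */
enum pkt_type { PKT_SYSCALL, PKT_OTHER };

struct packet {
    enum pkt_type type;
    int payload;              /* stand-in for the syscall request */
    struct packet *next;
};

/* hypothetical per-deputy state, reduced to a queue and some counters */
struct deputy {
    struct packet *in_queue;  /* incoming packets, oldest first */
    int syscalls_handled;
    int misc_calls;
    int skipped;              /* non-syscall packets we did not dispatch */
};

/* append a packet to the tail of the incoming queue */
static void enqueue(struct deputy *d, struct packet *p)
{
    assert(d != NULL && p != NULL);
    p->next = NULL;
    struct packet **tail = &d->in_queue;
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = p;
}

/* remove and return the oldest packet, or NULL if the queue is empty */
static struct packet *dequeue(struct deputy *d)
{
    struct packet *p = d->in_queue;
    if (p != NULL)
        d->in_queue = p->next;
    return p;
}

/* stand-in for deputy_do_syscall: only counts the dispatch */
static void deputy_do_syscall_stub(struct deputy *d, struct packet *p)
{
    (void)p;
    d->syscalls_handled++;
}

/* stand-in for deputy_process_misc: only counts the call */
static void deputy_process_misc_stub(struct deputy *d)
{
    d->misc_calls++;
}

/*
 * poll the queue max_polls times. an empty poll runs the misc stub
 * (assumption: this is where the kernel code would then reschedule);
 * a syscall packet is dispatched; anything else is counted as skipped.
 * returns the number of syscall packets dispatched.
 */
static int deputy_main_loop_sketch(struct deputy *d, int max_polls)
{
    int dispatched = 0;

    for (int i = 0; i < max_polls; i++) {
        struct packet *p = dequeue(d);

        if (p == NULL) {
            deputy_process_misc_stub(d);
            continue;
        }
        if (p->type == PKT_SYSCALL) {
            deputy_do_syscall_stub(d, p);
            dispatched++;
        } else {
            d->skipped++;
        }
    }
    return dispatched;
}
```

queueing a syscall packet, a non-syscall packet and another syscall packet, then polling five times, dispatches two syscalls, skips one packet, and runs the misc stub on the two empty polls.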