hunk class file patched description noteworthy references
001 i386,config arch/i386/Kconfig this patch adds hpc/Kconfig to the build process
002 i386,remote,syscall arch/i386/kernel/asm-offsets.c this patch defines offsets to members inside of structures, and defines
used by the assembly code in arch/i386/kernel/entry.S. first, we
generate the offset to the om member of the task structure, from the
beginning of the structure. we then generate an offset to the dflags
member, inside of task.om, which is of type openmosix_task. finally,
we define DDEPUTY and DREMOTE constants, setting them to the DDEPUTY
and DREMOTE values defined in hpc/task.h.
noteworthy: DDEPUTY and DREMOTE should have a following _asm, indicating they're
the versions used by assembly code.
003 i386,remote arch/i386/kernel/entry.S we modify this file to add two new entry points to the kernel, utilize
remotefork how do we make these changes the syscall mapping table in arch/i386/kernel/omasm.h(in both the
local conditional? #ifdefs? normal int 80h syscall path, and the sysenter path), make the two
syscall syscall exit points store a pointer to the thread_info of this process,
and insert a call to openmosix_pre_usermode in the userspace return
path. ret_from_deputy_fork is entered by a process, when its eip is set
to this function by the code we add in the function copy_thread in
if this is identical, why use it? arch/i386/kernel/process.c. ret_from_deputy_fork is an identical copy
of ret_from_fork. ret_from_kickstart is called from arch_kickstart in
isn't this GET_THREAD_INFO redundant? hpc/arch-i386.c. it calls GET_THREAD_INFO(%ebp), and jmps to
syscall_exit, returning to userspace for the 'first' time on a remote
node. we then modify the resume_userspace entry point, to call
openmosix_pre_usermode between doing work_pending, and restore_all'ing.
in the next two hunks, we add code to select which syscall table to use
based on whether the current task is marked DREMOTE or not. this is
added once in ENTRY(sysenter_entry), and again in ENTRY(system_call).
we also modify syscall_exit and sysenter_exit to store the result of
GET_THREAD_INFO into %ebp, cleaning up after our own code clobbers the
register.
004 i386,i387 arch/i386/kernel/i387.c fxsr support is support for fast saving of the i387's floating
point/sse/sse2/etc state to a 512 byte block. it's a new feature,
not present in earlier i387 style floating point processors. this patch
changes from declaring the conversion functions for fxsr<->387 from
static to OM_NSTATIC, and adds a function for finding out whether
support exists during run time.
005 i386,remote creates arch/i386/kernel/omasm.h this file contains the syscall table called by processes which are
what's the rule for whether to process DREMOTE. it contains a mapping of whether a syscall is to be passed to
locally or back home? the home node, or handled locally.
#define self out if !CONFIG_OPENMOSIX
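the table selection described for entry.S and omasm.h can be sketched in plain C. this is a userspace illustration only, not the patch's assembly: the flag values, the demo_task structure, and the stub handlers are hypothetical stand-ins, and the real DDEPUTY/DREMOTE values come from hpc/task.h.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical flag values; the real ones are defined in hpc/task.h */
#define DDEPUTY 0x1u
#define DREMOTE 0x2u

typedef long (*syscall_fn)(long arg);

/* stand-ins: one handler that runs locally, one that would forward home */
static long sys_local_stub(long arg)  { return arg; }
static long sys_remote_stub(long arg) { return -arg; }

#define NR_DEMO_SYSCALLS 2

/* the normal table, and the table used by tasks marked DREMOTE */
static const syscall_fn sys_call_table[NR_DEMO_SYSCALLS] = {
    sys_local_stub, sys_local_stub
};
static const syscall_fn remote_sys_call_table[NR_DEMO_SYSCALLS] = {
    sys_local_stub,  /* handled on the node we are running on */
    sys_remote_stub  /* passed back to the home node */
};

/* hypothetical miniature of task.om.dflags */
struct demo_task { unsigned int dflags; };

/* pick the table by the DREMOTE flag, then call through it */
static long dispatch_syscall(const struct demo_task *t, size_t nr, long arg)
{
    assert(t != NULL && nr < NR_DEMO_SYSCALLS);
    const syscall_fn *table =
        (t->dflags & DREMOTE) ? remote_sys_call_table : sys_call_table;
    return table[nr](arg);
}
```

a task with dflags of 0 always goes through sys_call_table, while a task with DREMOTE set reaches whichever handler the remote table maps that syscall number to.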
006 userthread arch/i386/kernel/process.c in this patch, we add an entry for user_thread_helper in the
remotefork kernel_thread_helper execution path, add a function for creating an
in-kernel user thread, and re-direct the entry point ret_from_fork to
ret_from_deputy_fork for processes that are DDEPUTY in copy_thread().
our user_thread_helper entry point merely subtracts 60h from the stack
pointer for this task, reserving space for the user registers on the
stack, allowing execution to continue into the kernel thread helper.
user_thread is called by openmosix_mig_daemon, in hpc/migrecv.c, to
start a 'kernel thread', in the user segment, to handle an incoming
migration request. to do this, we set up the user registers in a
pt_regs structure, so that we can call do_fork, and have it create the
thread for us. first we zero the structure. then we assign the
function pointer to the function we want to start in to ebx, set edx to
the function's argument, set xds and xes to allow the process access to
__USER_DS (the usermode dataspace), and set xcs to allow the process
to execute code in __KERNEL_CS (the kernel's codespace). we set the
orig_eax 'register' to -1, and set the eip to point to our
user_thread_helper above, so that execution starts there, and set our
eflags so that when this process is running, hardware interrupts are http://x86.org/intel.doc/386manuals.htm
enabled, and so that the sign and parity bits are turned on. we then
call do_fork, adding flags indicating that we don't want this process
to receive SIGCHLD, and that it cannot be ptraced. we return the
result of the do_fork call. finally, we modify copy_thread so that for
processes marked DDEPUTY, instead of the parent process returning to
userspace and immediately entering the kernel at ret_from_fork, we set
it to enter the kernel at ret_from_deputy_fork.
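the register setup done by user_thread before calling do_fork can be sketched as follows. this is a userspace illustration under assumptions: demo_pt_regs is a hypothetical cut-down structure, the selector values are placeholders, and including the always-set reserved bit 1 of eflags is our assumption rather than something the patch description states.

```c
#include <assert.h>
#include <string.h>

/* architectural x86 EFLAGS bits */
#define EFLAGS_RESERVED 0x002ul /* bit 1 always reads as 1 */
#define EFLAGS_PF       0x004ul /* parity */
#define EFLAGS_SF       0x080ul /* sign */
#define EFLAGS_IF       0x200ul /* hardware interrupts enabled */

/* placeholder selector values, for illustration only */
#define DEMO_USER_DS   0x7bul
#define DEMO_KERNEL_CS 0x60ul

/* hypothetical cut-down register frame; not the kernel's struct pt_regs */
struct demo_pt_regs {
    unsigned long ebx, edx, xds, xes, orig_eax, eip, xcs, eflags;
};

/* fill the frame in the order the description walks through it */
static void demo_setup_user_thread_regs(struct demo_pt_regs *regs,
                                        unsigned long fn,
                                        unsigned long arg,
                                        unsigned long helper)
{
    assert(regs != NULL);
    memset(regs, 0, sizeof(*regs));     /* first we zero the structure */
    regs->ebx = fn;                     /* function to start in */
    regs->edx = arg;                    /* its argument */
    regs->xds = DEMO_USER_DS;           /* data segments: user dataspace */
    regs->xes = DEMO_USER_DS;
    regs->xcs = DEMO_KERNEL_CS;         /* code segment: kernel codespace */
    regs->orig_eax = (unsigned long)-1; /* not inside a syscall */
    regs->eip = helper;                 /* start at the helper entry point */
    regs->eflags = EFLAGS_IF | EFLAGS_SF | EFLAGS_PF | EFLAGS_RESERVED;
}
```

with these bit values the resulting eflags word is 0x286.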
007 i386,ksocket arch/i386/kernel/signal.c this patch changes the do_signal function from static to OM_NSTATIC
008 i386,remote arch/i386/kernel/sys_i386.c this patch modifies sys_mmap2 so that processes marked DREMOTE mapping
memory without MAP_ANONYMOUS get forwarded to remote_do_mmap.
009 i386,local arch/i386/kernel/vm86.c this patch changes the save_v86_state and return_to_32bit functions
remote so that they clear the DSTAY_86 flag before they exit. it also changes
both sys_vm86 and sys_vm86old so that they return a process to its
home node if it attempts to enter vm86 mode. the code we add to
save_v86_state and return_to_32bit simply task_lock()s the current
task, uses task_clear_stay() to clear the DSTAY_86 flag, and
task_unlock()s the current task. the code we add to sys_vm86 and
sys_vm86old simply calls task_go_home_for_reason(), specifying
DSTAY_86. if task_go_home_for_reason returns non-zero, we force the
better error message? function we're in to return -ENOMEM, as we must be on remote, and
migration must have failed.
010 i386,remotemem arch/i386/lib/usercopy.c this patch changes strlen_user to redirect to deputy_strlen_user if
openmosix_memory_away().
011 ppc,config arch/ppc/Kconfig this patch adds hpc/Kconfig to the build process
012 ppc,remote,syscall arch/ppc/kernel/asm-offsets.c this patch defines offsets to members inside of structures, and defines
used by the assembly code in arch/ppc/kernel/entry.S. first, we
generate the offset to the om member of the task structure, from the
beginning of the structure. we then generate an offset to the dflags
member, inside of task.om, which is of type openmosix_task. finally,
we define DDEPUTY and DREMOTE constants, setting them to the DDEPUTY
and DREMOTE values defined in hpc/task.h.
noteworthy: DDEPUTY and DREMOTE should have a following _asm, indicating they're
the versions used by assembly code.
013 ppc,remote arch/ppc/kernel/entry.S we modify this file to add a new entry point to the kernel, utilize the
syscall syscall mapping table in arch/ppc/kernel/misc.h, and insert a call to
openmosix_pre_usermode in our userspace return path. first we modify
syscall_dotrace_cont, selecting which syscall table to use based on
whether the current task is marked DREMOTE or not. ret_from_kickstart
is called from arch_kickstart. ret_from_kickstart branches directly to
ret_from_syscall, returning to userspace for the 'first' time on a
remote node. our last hunk branches directly to openmosix_pre_usermode
in the restore_user path.
014 ppc,userthread arch/ppc/kernel/misc.S in this patch, we add an assembly function to create usermode threads,
and create the remote syscall table. first we define SIGCHLD. our next
hunk creates a user_thread function similar to the one in
rewrite in C, if possible! arch/i386/kernel/process.c, except hand written in assembly. finally,
move syscall table to omasm.h! we create the syscall table used by processes which are DREMOTE. it
contains a mapping of whether a syscall is to be passed to the home
node, or handled locally.
015 x86_64,config arch/x86_64/Kconfig this patch adds hpc/Kconfig to the build process
016 x86_64,remote,syscall arch/x86_64/kernel/asm-offsets.c this patch defines offsets to members inside of structures, and defines
used by the assembly code in arch/x86_64/kernel/entry.S. first, we
generate the offset to the om member of the task structure, from the
beginning of the structure. next we define an entry for task. in our
last hunk, we generate an offset to the dflags member, inside of
task.om. we then define DDEPUTY and DREMOTE, setting them to the
DDEPUTY and DREMOTE values defined in hpc/task.h.
noteworthy: remove ifdef around header. redundant. move this define with the others.
add a define around task! is task used? DDEPUTY and DREMOTE should have a
following asm!
017 x86_64,remote arch/x86_64/kernel/entry.S we modify this file to add a new entry point for returning from
syscall don't define out omasm.h kickstart, utilize the syscall mapping table in
arch/x86_64/kernel/omasm.h, modify the PTREGSCALL macro to create om_
entries for each of the 6 functions that take a PTREGS argument, insert
an om_ptregscall_common entry point, insert an om_stub_execve entry
point, insert a call to openmosix_pre_usermode, and insert a
re-write user_thread in C! user_thread function written in assembly. ret_from_kickstart restores
the state of the registers to how they were before it was called, and
returns to userspace for the first time, on a remote node. in the
system_call entry point, we check for DREMOTE in task.om.dflags.
why the fake frame? CFI_ADJUST! if we find it, we jump over a stack frame, call into our
UNFAKE_STACK_FRAME? remote_sys_call_table, step back under the stack frame, and jmp to
ret_from_syscall. otherwise, we pass through to the normal syscall
handler. next, we re-define the PTREGSCALL macro so that when it's used,
it creates two entries instead of one. one 'normal' entry, and one
entry prepended with om_, that loads the address of an om_ version of
the function being declared, and calls our om_ptregscall_common
entry to dispatch. om_ptregscall_common is our version of
non-functional differences? ptregscall_common. the only functional difference is ours begins by
jumping over a stack frame, does the same work as ptregscall_common to
call the C function pointed to in rax, and peel back to before the
stack frame we jumped over. entry om_stub_execve is similar to
stub_execve, only we call remote_do_execve instead of sys_execve, and we
save the contents of r11(eflags) in r15 during this call, restoring
afterwards. our next hunk modifies the common_interrupt entry to call
our openmosix_pre_usermode function during the return to userspace.
move to C? finally, we create our user_thread entry, which is responsible for
creating our 'user thread', similar to the two previous user_thread
functions.
018 x86_64,remote creates arch/x86_64/kernel/omasm.h this is a table containing mappings that are used to dispatch syscall
#define self out if !CONFIG_OPENMOSIX requests made by processes that are guests. it contains mappings of
whether a syscall is to be passed to the home node, or processed
what about sys_ni_syscall? locally. the entries are referenced by code generated by the
PTREGSCALL macro in arch/x86_64/kernel/entry.S. each mapping stores the
address of one of om_sys_local, om_sys_remote, or sys_ni_syscall.
this address is loaded into %rax by PTREGSCALL, and PTREGSCALL calls
om_ptregscall_common to dispatch to the function retrieved from this
table.
019 x86_64,remote arch/x86_64/kernel/sys_x86_64.c this patch redirects sys_mmap2 so that remote processes mapping memory
without MAP_ANONYMOUS get forwarded to remote_do_mmap.
020 x86_64,local arch/x86_64/lib/copy_user.S this patch redirects copy_to_user and copy_from_user so when the kernel
rmem on the home node is accessing memory in the process's userspace, it
gets redirected to functions accessing memory on the remote node. our
first hunk modifies copy_to_user, checking to see if the process is
marked DDEPUTY, and if so re-directing to deputy_copy_to_user. the
second hunk accomplishes the same task, re-directing copy_from_user to
better label than 2901! deputy_copy_from_user if the task is marked DDEPUTY.
021 x86_64,local arch/x86_64/lib/usercopy.c this patch forwards __strncpy_from_user and strncpy_from_user so when
rmem missing comments on #endifs the kernel on the home node is accessing memory in a deputy process's
userspace, we use deputy_strncpy_from_user. we also forward
__strlen_user and strlen_user to deputy_strlen_user for the same
reason.
022 local,rmem fs/namei.c modify getname to use deputy_strncpy_from_user to get the filename
BUG! this function is supposed to requested from userspace from the remote node when
canonicalize the filename passed in, openmosix_memory_away().
not just return it!
missing comment on #endif
023 local,procfs fs/proc/base.c this patch adds "files" named where, stay, and debug in a directory
named "hpc" in the /proc/$PID/ directory of each process on the
take out #ifdef around header include. local node. first, we include the hpc/hpc.h header. in the next two
hunks we add entries for PROC_TGID_OPENMOSIX,
PROC_TGID_OPENMOSIX_WHERE, PROC_TGID_OPENMOSIX_STAY,
PROC_TGID_OPENMOSIX_DEBUG, PROC_TID_OPENMOSIX,
PROC_TID_OPENMOSIX_WHERE, PROC_TID_OPENMOSIX_STAY, and
PROC_TID_OPENMOSIX_DEBUG into enum pid_directory_inos. this enum sets
the inode number for each of our "files" in /proc to unique values.
next we create the "om" entry with inode number PROC_TGID_OPENMOSIX
in the tgid_base_stuff structure. we then do the same thing again,
creating a "om" entry with inode number PROC_TID_OPENMOSIX in
tid_base_stuff. next we create a pair of structures
(tgid_openmosix_stuff and tid_openmosix_stuff), containing entries for
PROC_TGID_OPENMOSIX_WHERE/"where", PROC_TID_OPENMOSIX_WHERE/"where",
PROC_TGID_OPENMOSIX_STAY/"stay", PROC_TID_OPENMOSIX_STAY/"stay",
PROC_TGID_OPENMOSIX_DEBUG/"debug", and
PROC_TID_OPENMOSIX_DEBUG/"debug". these entries declare the contents of
our /proc/$PID/om/ directory. in our next hunk, we add
proc_pid_openmosix_read and proc_pid_openmosix_write functions, and a
file_operations structure named proc_pid_openmosix_operations
mapping .read and .write functions to the proc_pid_openmosix_read and
proc_pid_openmosix_write we just declared. in proc_pid_openmosix_read,
we first truncate a read request to PAGE_SIZE, and request a free page
in GFP_KERNEL. if __get_free_page fails, we return -ENOMEM. otherwise,
we then call openmosix_proc_pid_getattr from hpc/proc.c, which does the
work of dispatching our read request to the right function. it fills in
the page we allocated, and returns the number of characters written to
the page. if the length is less than zero, this indicates an error. we
respond to this error by freeing our page, and returning the error
value. assuming no error occurred, we check to see if the user requested
data beyond the end of what we 'read'. if they did, we free our
requested page, and return 0. otherwise, we take the seek value (ppos),
and apply it to our page. we then copy_to_user the contents of our page
(past the seek) to the passed in userspace buf, free our page, and
return the number of bytes copy_to_user'd. in proc_pid_openmosix_write,
reverse the order of these. fail first! we first truncate our write request to PAGE_SIZE, then check to see if
the user requested a 'partial write'. if they did, we return -EINVAL.
we then get a free page in GFP_USER. if that fails, we return -ENOMEM.
otherwise, we copy the data in the passed in buf into our new page. if
our copy_from_user fails, we free our page, and return -EFAULT.
otherwise, we call openmosix_proc_pid_setattr with our page, free said
page, and return the length returned (even if it's a negative value, e.g.
an error). we then define a file_operations structure (proc_pid_openmosix_operations), pointing its
.read and .write members to the two functions we just declared.
we then forward declare proc_tid_openmosix_operations,
proc_tid_openmosix_inode_operations, proc_tgid_openmosix_operations,
and proc_tgid_openmosix_inode_operations structures, which we'll define
later in this patch. next, we add a pair of cases to the large switch in
proc_pident_lookup that map all eight of our unique identifiers defined
at the beginning of this file to their respective file_operations
structures, and inode operations structures in the case of the
containing directories. this allows the proc system to find the
structures containing the function pointers to handle our requests.
in our last hunk, we provide implementations for the functions
proc_tgid_openmosix_readdir and proc_tid_openmosix_readdir
that return the result of calling proc_pident_readdir against our
tgid_openmosix_stuff and tid_openmosix_stuff structures.
we then define the proc_tgid_openmosix_operations and
proc_tid_openmosix_operations structures, mapping .read to
generic_read_dir and .readdir to our proc_tgid_openmosix_readdir or
proc_tid_openmosix_readdir declared above. we then define the functions
proc_tgid_openmosix_lookup and proc_tid_openmosix_lookup, which return
the result of calling proc_pident_lookup with our tgid_openmosix_stuff
or tid_openmosix_stuff structures. finally, we define our
proc_tgid_openmosix_inode_operations and
proc_tid_openmosix_inode_operations structures, mapping .lookup to
proc_tgid_openmosix_lookup or proc_tid_openmosix_lookup.
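the read path of proc_pid_openmosix_read described in this entry can be sketched in userspace C. this is an illustration under assumptions, not the patch's code: the attribute text is passed in directly instead of being produced by openmosix_proc_pid_getattr, memcpy stands in for copy_to_user, DEMO_PAGE_SIZE is a placeholder, and advancing ppos after the copy is our assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096 /* placeholder for PAGE_SIZE */

/*
 * attr/attr_len stand in for the page filled by the getattr step and the
 * length it returned. returns bytes copied, 0 at end of data, or the
 * negative error handed to us in attr_len.
 */
static long demo_proc_read(const char *attr, long attr_len,
                           char *buf, size_t count, long *ppos)
{
    assert(attr != NULL && buf != NULL && ppos != NULL && *ppos >= 0);

    if (count > DEMO_PAGE_SIZE)  /* truncate the request to one page */
        count = DEMO_PAGE_SIZE;
    if (attr_len < 0)            /* getattr reported an error */
        return attr_len;
    if (*ppos >= attr_len)       /* seek is past what we 'read' */
        return 0;

    size_t avail = (size_t)(attr_len - *ppos);
    if (count > avail)
        count = avail;
    memcpy(buf, attr + *ppos, count); /* copy_to_user in the kernel */
    *ppos += (long)count;             /* assumption: position advances */
    return (long)count;
}
```

repeated calls with the same ppos walk through the attribute text and then return 0, which is what lets a plain cat of the file terminate.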
024 local,procfs fs/proc/root.c this patch changes proc_root_init to call openmosix_proc_init. first,
remove #ifdef around include we include our hpc/hpc.h header. then, we add our call to
openmosix_proc_init (from hpc/proc.c) into proc_root_init.
025 ksocket,local, fs/select.c this patch exports the do_select function, so that hpc/kcomd.c can use
remote it to check for data on our pile of incoming sockets. this function is
only exported if CONFIG_KCOMD is selected.
026 i386, local creates hpc/arch-i386.c this file contains functions converting between two different but http://arch.ece.uic.edu/~yxshi/param/web/homepage/research/doc/reference/vc130.htm
remote, syscall this file should be broken up. compatible x87 floating point state formats, for sending and receiving
archmig architecture specific sections of a given task's state, a function for
starting a new guest process, and support functions for handling
syscall requests from entry.S. first, we forward declare the functions
twd_fxsr_to_i387 and twd_i387_to_fxsr from arch/i386/kernel/i387.c.
we utilize them in creating fxsave_to_fsave and fsave_to_fxsave
make the order of operations in these functions. fxsave_to_fsave is called by the later declared
two functions identical. arch_mig_receive_fp to convert from fxsave to fsave format. we start
by copying the contents of the cwd, swd, fip, fcs, foo, and fos fields
of the from union to the to union. we then use twd_fxsr_to_i387() to
fill in our twd member. we then save padding[0] to our fop member, and
padding[1] to our mxcsr member. next we perform a memcopy loop to copy
and convert the st_space member. this member contains fields that are
16 bytes long in fxsave format, and 10 bytes long in fsave format. we
loop through the fields, copying only the first 10 bytes to our to's
st_space. finally, we memcopy the xmm_space member. fsave_to_fxsave is
also called by arch_mig_receive_fp, to convert from fsave to fxsave
format. we start by copying the contents of the cwd, swd, fip, fcs,
foo, and fos to the 'to' union, from the 'from' union. we use
twd_i387_to_fxsr() to fill in our twd member, save padding[0] to our
fop member, and save padding[1] to our mxcsr member. after that, we
enter a loop, memcopying our 10 byte long members of st_space to 16
byte spaces. finally, we memcopy the xmm_space member. the next three
functions are for receiving architecture specific state information.
BROKEN! does not handle setting up arch_mig_receive_specific is called by mig_do_receive in
LDT entries! hpc/migrecv.c. its purpose is to receive the architecture specific part
of a process. the one in this file has code to warn us that we're not
setting up the LDT correctly, and still returns success. if it's asked
to setup anything else, we return -1. arch_mig_receive_proc_context is
called at the top of mig_do_receive_proc_context, from hpc/migrecv.c.
its function is to set up the CPU state from the passed in omp_mig_task
structure. we start by getting the pt_regs structure of the task we're
check failure in this function! setting up with ARCH_TASK_GET_USER_REGS. we then overwrite it with
omp_mig_task's regs member. we overwrite our task's thread.debugreg
with arch.debugreg from omp_mig_task, as well as overwriting thread.fs
and thread.gs with arch.fs and arch.gs (setting up our segmentation
registers). we then copy the contents of the tls_array structure, which http://lwn.net/Articles/5851/
contains the 'thread local storage' segment offsets. this function always
returns 0. arch_mig_receive_fp is called by mig_do_receive_fp, from
hpc/migrecv.c. its function is to set up the FPU state from the passed
in omp_mig_fp structure. we start by calling unlazy_fpu, to initialize
the FPU, then we check whether the current CPU has the fxsr instruction,
and whether the remote CPU has the fxsr instruction. if they both do,
or if they both don't, that means the floating point save is in the
same format, so we just memcpy the state from the omp_mig_fp struct to
the task's thread.i387 structure. otherwise, we call one of the above
two conversion functions (fxsave_to_fsave, or fsave_to_fxsave) to
perform the copy, while translating the formats. the next two functions
are called by mig_do_send in hpc/migsend.c, before and after doing the
actual work of sending a task to another node (home or remote).
arch_mig_send_pre clears the LDT if there is one set for this process,
and arch_mig_send_post loads the LDT back up, if there is one.
the next three functions are the send side, to match the three
arch_mig_receive functions earlier. all three of these functions are
called from mig_do_send, in hpc/migsend.c. arch_mig_send_specific is a
STUB! stub that looks like it was supposed to send the LDT, but instead
prints a warning if an LDT is being used. arch_mig_send_fp is called
to send the FPU state. we call unlazy_fpu, then fill in the fp
state(along with the fxsr flag). arch_mig_send_proc_context is called
to send the CPU context of a task. in it, we store the user registers,
segmentation registers (FS and GS), the thread local storage entries, and
the debugreg registers to the passed in struct omp_mig_task. in
addition, if this task is marked DDEPUTY (meaning we're on the home
node), we also send the features of the boot CPU. arch_kickstart is the
function called to start up a newly "created" task. in it, we set up
debug registers 0-3, 6, and 7 with set_debugreg(). we intentionally
omit registers 4 and 5 due to them being just aliases for 6 and 7. we
use load_TLS to set up the thread local storage, and use loadsegment to
do we need to flush pending signals? load our FS and GS registers. we set CS to __USER_CS, flush pending
signals, and execute an assembly fragment that causes us to immediately
jump to the ret_from_kickstart entry point in entry.S. at this point
split this back off. there's a break in the file, like this section used to be another file.
we include some headers, then define three functions that are part of
our syscall handling subsystem. arch_exec_syscall is called by
deputy_do_syscall, to call a requested syscall on behalf of a remote
process. we use OMDEBUG_SYS to print a tracing message, look up the
requested syscall in the sys_call_table, and return the result of
calling it (through a function pointer) with the passed in arguments.
these functions belong in the same the next two functions are called via the remote_sys_call_table in
place as user_thread! arch/i386/kernel/omasm.h, by guest processes. om_sys_fork is called by
a guest process, trying to fork. we just wrap remote_do_fork, passing
it a clone_flag of SIGCHLD, and null arguments for parent and child
thread pointers. om_sys_clone performs similarly, first checking for a
new stack pointer in ecx. if there isn't one, we re-use the current
task's stack pointer. we accept the clone_flags in register ebx, the
parent tidptr in edx, and the child tidptr in edi. we pass all of this
to the same remote_do_fork as the previous function.
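the st_space conversion and the "same format or not" decision in arch_mig_receive_fp can be sketched in userspace C. this is an illustration under assumptions: the function names are hypothetical, the buffers are raw byte arrays rather than the kernel's i387 unions, and zeroing the 6 padding bytes of each 16 byte slot is our assumption. the 10 and 16 byte sizes come from the description above, and 8 is the number of x87 stack registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_ST_REGS 8  /* x87 register stack st(0)..st(7) */
#define FXSAVE_SLOT 16 /* bytes per register in fxsave format */
#define FSAVE_SLOT  10 /* bytes per register in fsave format */

/* keep only the first 10 bytes of every 16 byte slot */
static void demo_st_fxsave_to_fsave(const uint8_t *from, uint8_t *to)
{
    assert(from != NULL && to != NULL);
    for (int i = 0; i < NUM_ST_REGS; i++)
        memcpy(to + i * FSAVE_SLOT, from + i * FXSAVE_SLOT, FSAVE_SLOT);
}

/* spread 10 byte registers out into 16 byte slots */
static void demo_st_fsave_to_fxsave(const uint8_t *from, uint8_t *to)
{
    assert(from != NULL && to != NULL);
    memset(to, 0, NUM_ST_REGS * FXSAVE_SLOT); /* assumption: zero padding */
    for (int i = 0; i < NUM_ST_REGS; i++)
        memcpy(to + i * FXSAVE_SLOT, from + i * FSAVE_SLOT, FSAVE_SLOT);
}

/* both ends with fxsr, or both without, share a format: plain memcpy */
static int demo_needs_conversion(int local_has_fxsr, int remote_has_fxsr)
{
    return (!local_has_fxsr) != (!remote_has_fxsr);
}
```

a round trip from fsave layout to fxsave layout and back should reproduce the original 80 bytes, which makes a cheap sanity check for the loop bounds.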
027 ppc, local, creates hpc/arch-ppc.c this patch is very similar to the previous patch, but cleaner.
remote, arch_mig_receive_specific just returns 0. it's called by mig_do_receive
arch_mig, from hpc/migrecv.c. its purpose is to receive the architecture specific
syscall part of a process, which apparently the PPC doesn't have.
arch_mig_receive_proc_context is called at the top of
mig_do_receive_proc_context, from hpc/migrecv.c. its purpose is to set
the user registers of the current task to the contents of the passed in
omp_mig_task structure. in it, we simply use ARCH_TASK_GET_USER_REGS to
check return of memcpy! retrieve the registers in question, then memcpy over them from our
passed in structure. we always return 0. arch_mig_receive_fp is called
by mig_do_receive_fp, from hpc/migrecv.c. its function is to set up the
current task's FPU state to the one passed in the omp_mig_fp structure.
in it, we memcopy the floating point registers from the passed in
FIXME: fpscr_pad not needed? structure over the task->thread->fpr, and copy the fpscr and fpscr_pad
as well. arch_mig_send_pre and arch_mig_send_post are void no-ops.
their purpose is to make a process "ready to be migrated" while we're
pulling the process apart, which apparently doesn't need to be done on PPC.
they're called at the beginning and end of mig_do_send in hpc/migsend.c,
respectively. arch_mig_send_specific is also a no-op, as the PPC has no
architecture specific "parts" of a process. in it, we just return 0.
arch_mig_send_fp is called by mig_do_send to fill the passed in
omp_mig_fp structure with the floating point state of the current
check this memcpy! task. in it, we memcopy the task->thread->fpr structure into the
FIXME: fpscr_pad not needed? omp_mig_fp, and set the fpscr and fpscr_pad members as well. we always
return 0. arch_mig_send_proc_context is called by mig_do_send to fill
in the passed in omp_mig_task structure with the CPU state of the
current task. in it, we use ARCH_TASK_GET_USER_REGS to get the pt_regs
check this memcpy! structure, and just memcpy it into our omp_mig_task structure. we
return 0. arch_kickstart is called by mig_handle_migration to start a
guest process for the first time. to accomplish this, we get the user
what are we doing with mr 1, or the registers, and branch to ret_from_kickstart, passing our user registers
user registers? as input. arch_exec_syscall is called by deputy_do_syscall to call a
requested syscall on behalf of a remote process, returning its result.
we look up the requested syscall in the sys_call_table, and return the
result of calling it.
028 x86_64 creates hpc/arch-x86_64.c similar to the previous file. <asm/uaccess.h>
arch_mig_receive_specific is a stub, returning 0. <linux/kernel.h>
arch_mig_receive_proc_context copies from our omp_mig_task structure to our <linux/kallsyms.h>
none of these functions return failure. task structure the user registers, ds, es, fs, gs, fsindex, <linux/sched.h>
why do they not return void? gsindex, then uses write_pda to set gs to point to the per-processor <hpc/debug.h>
datastructure. arch_mig_receive_fp calls unlazy_fpu, then memcopies the <asm/ptrace.h>
thread.i387 datastructure. arch_mig_send_pre clears the LDT, <asm/desc.h>
arch_mig_send_post loads the LDT (if we have one in our context). <asm/i387.h>
arch_mig_send_specific is a stub, returning 0. <hpc/protocol.h>
arch_mig_send_proc_context copies from our task structure to <hpc/arch.h>
our omp_mig_task structure the user registers, ds, es, fs, gs, fsindex, <hpc/task.h>
gsindex, then uses read_pda to get the pointer to our per-processor <hpc/syscalls.h>
datastructure out of the gs register. arch_kickstart sets debugging <hpc/prototype.h>
registers 0-3,6,7, loads up the segmentation registers, flushes
pending signals, and jmps to ret_from_kickstart. arch_exec_syscall just
calls a given syscall returning the results the syscall returned.
asmlinkage om_sys_fork calls remote_do_fork. om_sys_iopl, om_sys_vfork,
om_sys_clone, om_sys_rt_sigsuspend, and om_sys_sigaltstack are declared
as unimplemented functions, printing an error and returning -1 when
called.
029 kcom creates hpc/comm.c this is the kernel-to-kernel communication system. we use tcp/ip <linux/sched.h>
sockets to pass information back and forth between kernels. <linux/socket.h>
sock and sk need sync'd first, we define three timeout variables (conn_remote_timeo, <linux/in.h>
comm_connect_timeo, and comm_reconn_timeo), which are initialized from <linux/in6.h>
bad comments values #defined elsewhere. comm_shutdown is a wrapper to safely call <linux/net.h>
sock->ops->shutdown. comm_getname is a wrapper to safely call <hpc/mig.h>
sock->ops->getname. it returns -1 if something's null that shouldn't be, <hpc/debug.h>
or if getname returns null. comm_data_ready is a wrapper which calls <hpc/comm.h>
wake_up_interruptible to wake up task(s) in the socket's sleeping task <hpc/task.h>
who else needs to do this? queue. comm_setup_tcp first saves our current address space limit, http://mail.nl.linux.org/kernelnewbies/2001-11/msg00204.html
turns on kernel address space, uses sock_setsockopt
should this be comm_wrappers.c? to set SO_KEEPALIVE, then uses sock->ops->setsockopt to set
TCP_KEEPINTVL, TCP_KEEPCNT, TCP_KEEPIDLE, and TCP_NODELAY. it restores
our original address space limit, and exits. comm_socket is a wrapper
around sock_create, returning NULL on error. comm_bind is a wrapper
missing checks! around sock->ops->bind that logs an error via printk. comm_listen is a
missing checks! wrapper around sock->ops->listen. comm_connect connects to a remote
missing checks! kernel via the passed socket, to the passed address. it adds the
current process to the socket's sleeping task queue, and asynchronously
asks for the connection to be established. we enter a loop, marking
the current process TASK_INTERRUPTIBLE, requesting connection
establishment asynchronously, then using schedule_timeout to sleep.
when the connection succeeds, we leave the loop, mark the current
process TASK_RUNNING, and return 0. comm_close is a wrapper around
sock_release. comm_peek returns whether a socket has data pending.
sighfile needs more docs. comm_poll waits on an "event" to occur on a socket via poll(), until
the passed timeout period, or MAX_SCHEDULE_TIMEOUT. it uses a similar
method as the earlier comm_connect, only we use poll() to see if there
is any data waiting for us on the socket. if there is, return 1,
comm_wait should be a define? otherwise we return 0 when we hit our timeout period. comm_wait is a
wrapper around comm_poll, filling in some default parameters. comm_accept
receives a passed socket that's been connected to, creates a new socket,
and uses it to accept a connection from a remote kernel. once comm_poll
indicates there's data on the passed socket, we use comm_setup_tcp
to set the connection options on the new socket. if that succeeds, we
return 0. otherwise, we destroy our newly created socket, and return
the relevant error. comm_dorecv wraps the sock_recvmsg api to
s/lenght/length read a given amount of data from a socket. comm_recv wraps
comm_dorecv, but also uses the address space change trick of earlier to
jump into KERNEL_DS, and in case of short read we OMBUG(), then call
comm_shutdown(link), returning the error from comm_dorecv. comm_send
when should we printk, uses the address space change, then wraps sock_sendmsg(). in case of
when should we OMBUG()? short send, it printks and just returns the error. next is a
"openmosix specifics start here" marker in the comments. hpc/protocol.h
set_our_addr sets up the passed sockaddr structure with its default
family, INADDR_ANY, and the passed port. comm_setup_listen uses
comm_socket, comm_bind, and comm_listen to set up a listening socket.
comm_setup_connect opens a connection to a target
kernel using comm_socket, then comm_connect. comm_send_hd sends a data
segment, with an omp_req header, then the data itself.
finally, comm_send_req sends an omp_req structure, containing only the
type, no data.
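the header-then-payload framing described for comm_send_hd and comm_send_req can be sketched in userspace C roughly as follows. this is only an illustration: req_sketch, link_sketch, and the sketch_* functions are hypothetical names, the two-field header layout is an assumption (the real omp_req lives in hpc/protocol.h), and an in-memory buffer stands in for the socket.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical stand-in for struct omp_req; the real layout is not shown here */
struct req_sketch {
    int type;   /* request type */
    int dlen;   /* number of payload bytes following the header */
};

/* in-memory stand-in for a connected socket */
struct link_sketch {
    unsigned char buf[256];
    size_t used;
};

/* append len bytes to the link; returns 0 on success, -1 if they do not fit */
int sketch_send(struct link_sketch *l, const void *data, size_t len)
{
    if (len > sizeof(l->buf) - l->used)
        return -1;
    if (len > 0)
        memcpy(l->buf + l->used, data, len);
    l->used += len;
    return 0;
}

/* header first, then the payload, as described for comm_send_hd */
int sketch_send_hd(struct link_sketch *l, int type, const void *data, int dlen)
{
    struct req_sketch req = { type, dlen };

    if (dlen < 0)
        return -1;
    if (sketch_send(l, &req, sizeof(req)) < 0)
        return -1;
    return sketch_send(l, data, (size_t)dlen);
}

/* header only, no payload, as described for comm_send_req */
int sketch_send_req(struct link_sketch *l, int type)
{
    return sketch_send_hd(l, type, NULL, 0);
}
```

a receiver in this model would read sizeof(struct req_sketch) bytes first, then use dlen to know how much payload to expect.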
030 rmem creates hpc/copyuser.c this file contains routines for moving chunks of memory over an <linux/sched.h>
established connection. it's broken into two parts, deputy_* functions, <hpc/protocol.h>
and remote_* functions. deputy_ functions are run on the home node, and <hpc/debug.h>
remote_ functions run on the node a process has been migrated to. <hpc/prototype.h>
deputy_copy_from_user requests a given memory segment from the remote <hpc/hpc.h>
node. it uses comm_send_hd to send the address to read and the size to
OMDEBUG_CPYUSER() is being used in the read to the remote host. it then uses comm_recv to recv the results
deputy code to printk with a unique directly to the passed destination. its symbol is exported via
format? EXPORT_SYMBOL(). deputy_strncpy_from_user requests a given memory
segment from the remote node, and should be merged with the previous
function. it uses comm_send_hd to send the address to read and the size
to read to the remote host. it then uses comm_recv to recv the results
directly to the passed destination. its symbol is not exported.
deputy_copy_to_user functions similarly, using comm_send_hd to send the
address to write and the size, then comm_send to send the data to be
written to the remote node. its symbol is EXPORT_SYMBOL'd.
deputy_strnlen_user sends the address and length via comm_send_hd, then
uses comm_recv to get the result from the remote node. its symbol is
EXPORT_SYMBOL'd. deputy_put_userX writes a value of 64bits or less
using a single call to comm_send_hd. its symbol is not exported.
deputy_put_user puts a long to remote by calling deputy_put_userX.
its symbol is EXPORT_SYMBOL'd. if BITS_PER_LONG < 64, we create a
deputy_put_user64 that uses deputy_put_userX to put an up to 64 bit
value to remote, and EXPORT_SYMBOL it. deputy_get_userX gets a 64 bit
or less value from remote using comm_send_hd, then comm_recv. its
symbol is not exported. deputy_get_user wraps deputy_get_userX,
warning us if it's asked for something greater than sizeof(long). its
symbol is EXPORT_SYMBOL'd. if BITS_PER_LONG < 64, we create a
deputy_get_user64 that uses deputy_get_userX to get a 64 bit value
from the remote node. its symbol is EXPORT_SYMBOL'd. at this point, we
start into code running on the remote node, responding to the above
sections of code. remote_copy_user handles requests from
deputy_copy_to_user and deputy_copy_from_user. its symbol is not
exported. remote_strncpy_from_user performs strncpy_from_user on
behalf of deputy_strncpy_from_user. it uses comm_recv to get its
target, and comm_send to return the results. its symbol is not
exported. remote_strnlen_from_user performs strnlen_user or strlen_user
on behalf of deputy_strnlen_user. it works similarly to
remote_strncpy_from_user. its symbol is not exported. remote_put_user
will use put_user on behalf of the home node in up to a 64bit size. it's
missing BITS_PER_LONG logic that should be like the following function.
its symbol is not exported. remote_get_user is structured similarly.
it's got BITS_PER_LONG==64 logic. its symbol is not exported. finally,
we have remote_handle_user, which is the function that dispatches to the
above remote_ functions. it calls comm_recv looking for a req structure.
other than that, it's a large switch case. we return from it when we
receive an endtype packet, returning 0. if there's an unrecognised
packet, we call remote_disappear to die.
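the dispatch pattern described for remote_handle_user (receive a request, switch on its type, stop on the end packet, die on anything unrecognised) can be sketched as below. this is a userspace illustration only: the enum names are hypothetical (the real request constants are in hpc/protocol.h), an array stands in for successive comm_recv calls, and returning -1 stands in for calling remote_disappear.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical request types, one per remote_ handler named in the text */
enum req_type_sketch {
    REQ_COPY_USER = 1,
    REQ_STRNCPY_FROM_USER,
    REQ_STRNLEN_USER,
    REQ_PUT_USER,
    REQ_GET_USER,
    REQ_END
};

/*
 * walk the queued requests; *handled counts requests passed to a handler.
 * returns 0 when the end packet is seen, -1 on an unknown packet or if
 * the queue runs dry before the end packet arrives.
 */
int handle_user_sketch(const int *reqs, size_t n, int *handled)
{
    size_t i;

    *handled = 0;
    for (i = 0; i < n; i++) {
        switch (reqs[i]) {
        case REQ_COPY_USER:
        case REQ_STRNCPY_FROM_USER:
        case REQ_STRNLEN_USER:
        case REQ_PUT_USER:
        case REQ_GET_USER:
            (*handled)++;   /* the matching remote_ function would run here */
            break;
        case REQ_END:
            return 0;       /* end packet: normal return */
        default:
            return -1;      /* unrecognised: stands in for remote_disappear */
        }
    }
    return -1;
}
```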
031 omctrlfs creates hpc/ctrlfs.c omctrlfs is the future filesystem for performing migration and http://osdir.com/ml/linux.cluster.openmosix.devel/2006-01/msg00028.html
remote process state monitoring. this file is a stub of support for <linux/config.h>
this filesystem type. CTRLFS_MAGIC is the magic string at the beginning <linux/module.h>
of the FS for the filesystem layer to recognise this FS type. <linux/fs.h>
ctrlfs_fill_super wraps simple_fill_super(), passing it our <linux/mount.h>
CTRLFS_MAGIC, and our empty list of files. ctrlfs_get_sb wraps
get_sb_single(), telling it to use ctrlfs_fill_super to generate our
filesystem's superblock. we then have a file_system_type structure,
mapping .get_sb to our ctrlfs_get_sb, and .kill_sb to a generic cleanup
function. om_ctrlfs_init is called from the kernel to init the
module. it calls register_filesystem() with the previously defined
file_system_type structure. om_ctrlfs_exit is called prior to
removing the module. it calls simple_release_fs(), then
unregister_filesystem(). we then define the init and exit points for
the module, register the license and the author.
032 debug creates hpc/debug.c this file contains debugging assisting code. it starts with debug_mlink <asm/uaccess.h>
which is a wrapper which printks the address of a socket. <linux/kallsyms.h>
debug_page creates a checksum of a 4096 byte page of memory, and <linux/sched.h>
check incoming pointers! printks the results. debug_vmas dumps the starting address and ending <linux/config.h>
address of each vma belonging to a given mm_struct. debug_signals is a <hpc/debug.h> <hpc/protocol.h> <hpc/comm.h>
stub, not printking anything of value.
033 debugfs creates hpc/debugfs.c this file contains the debugfs module. it starts with a dentry <hpc/hpc.h>
structure for the om/ debugfs directory itself, then we define four
file entries, pointing the migration, syscall, rinode, and copyuser
move om_opts here? files to entries in the om_opts structure (defined in hpc/kernel.c),
we don't seem to be using these and an array of dentry structures for the four files. om_debugfs_init
debug values anywhere else, what is is called to initialize the module. it calls debugfs_create_dir to
the use of this code? create the om debugfs directory, then debugfs_create_u8 to create
entries to our four files in the directory. om_debugfs_exit is called
prior to removing this module. it uses debugfs_remove to destroy the
entries for the four files, then the directory itself. we then have
code defining the entry and exit points of the module, the license,
and the author.
034 i386,arch-debug creates hpc/debug-i386.c architecture specific debugging code, i386 version. om_debug_regs dumps <asm/uaccess.h> <linux/kallsyms.h> <linux/sched.h>
the user register set, otherwise known as the pt_regs structure, of <hpc/debug.h> <asm/ptrace.h> <asm/desc.h>
remove one uaccess.h include. the passed in or current process. if no pt_regs structure is passed in, <asm/i387.h> <asm/uaccess.h> <asm/ptrace.h>
remove one ptrace.h include. we use ARCH_TASK_GET_USER_REGS to retrieve the structure from the <hpc/protocol.h> <hpc/arch.h> <hpc/task.h>
current process. debug_thread dumps thread related registers.
show_user_registers is shamelessly stolen according to the comments,
and does a much better job of dumping the full state of a user process
than om_debug_regs, including code pointer, stack pointer.. lots of
debugging.
035 ppc,arch-debug creates hpc/debug-ppc.c architecture specific debugging code, ppc version. om_debug_regs dumps <asm/uaccess.h> <linux/kallsyms.h> <linux/sched.h>
the pt_regs structure passed in, or if NULL is passed in, the pt_regs <hpc/debug.h> <asm/ptrace.h> <asm/uaccess.h>
remove one uaccess.h, one ptrace.h structure of the current process. debug_thread and show_user_registers <asm/ptrace.h> <asm/processor.h> <hpc/protocol.h>
are stubs, doing nothing and returning nothing. <hpc/arch.h>
036 x86_64, creates hpc/debug-x86_64.c architecture specific debugging code, x86_64 version. om_debug_regs <asm/uaccess.h> <linux/kallsyms.h> <linux/sched.h>
arch-debug dumps the pt_regs structure passed in, or if NULL, of the current <hpc/debug.h> <asm/ptrace.h> <asm/desc.h>
process. debug_thread and show_user_registers are stubs, doing nothing <asm/i387.h> <asm/uaccess.h> <asm/ptrace.h>
and returning nothing. <hpc/protocol.h> <hpc/arch.h> <hpc/task.h>
037 omremote creates hpc/deputy.c deputy.c contains functions for servicing requests from a remote <linux/sched.h>
process, AKA, communication to the home node, from a process that is a <linux/signal.h>
guest on a remote node. first, there's deputy_die_on_communication, <linux/file.h>
rename deputy_die_on_communication to which in spite of its name is called by deputy_process_communication to <linux/mount.h>
deputy_die kill the deputy when communication with the remote node containing the <linux/acct.h>
remote half of the process fails. it printk's a message, then calls <asm/mmu_context.h>
do_exit(SIGKILL). deputy_do_syscall receives a syscall request from the <hpc/comm.h>
remote process and executes it, returning the result. deputy_do_fork <hpc/task.h>
processes a fork on behalf of the remote process. it opens up a new <hpc/mig.h>
connection to the remote node, calls do_fork, then uses task_set_comm <hpc/arch.h>
to make the child process communicate over the newly created <hpc/syscalls.h>
connection. deputy_do_readpage uses task_heldfiles_find to find a <hpc/debug.h>
given file owned by the current deputy process, maps a single page into <hpc/prototype.h>
memory, sends the contents to the remote node, then unmaps the page.
deputy_do_mmap_pgoff is called by do_mmap_pgoff in mm/mmap.c to perform
the same function as do_mmap_pgoff's lower half, with differences for
deputy processes. to accomplish this, we allocate memory for a vma
structure from SLAB_KERNEL and zero it. we set up a vma structure in
this memory corresponding to the memory area we've been requested to
occupy, and pass it to our passed file *'s mmap f_op handler. we then
add this file to our held files for this process by calling
task_heldfiles_add. there's a comment here indicating that we're
fill in missing code! supposed to insert the vma into our current->??, but the code for that
isn't yet written. deputy_do_mmap is called from
deputy_process_communication below. it uses do_mmap_pgoff in mm/mmap.c
to mmap a file into the deputy, and returns the mmapped region's
contents to the remote host. bprm_drop is used by the later declared
__deputy_do_execve to destroy a linux_binprm structure, which is an
executable program and arguments, destroying its pages, its security http://www.kernel-api.org/docs/online/1.0/da/d1e/structlinux_\
_binprm.html
context, mm structure, and calling fput() on all its writable files.
__deputy_do_execve uses search_binary_handler to attempt execve on
the home node. if it was successful, we have a FIXME indicating we
should be freeing the pages containing our arguments.
we then free our bprm's security context, call acct_update_integrals
(to tell the accounting system about the new process), free the
bprm structure, and "return" to the new process. otherwise, we use the
above bprm_drop to clean up the failed execve attempt.
deputy_setup_bprm is used by the below deputy_do_execve to setup a bprm
structure suitable for execution by __deputy_do_execve. we allocate
space for our bprm structure from GFP_KERNEL. we use open_exec to
attempt to open our executable. if that succeeds, we fill in the bprm's
file, filename, interp, and mm members, using mm_alloc to fill in
mm. we use init_new_context to accomplish any architecture specific
requirements. on x86, this function copies the local descriptor table
of the current process to the new process, assuming it has been
customized. we copy argc and envc, making sure neither is less than
zero. we allocate a security context, then use prepare_binprm to fill
in the rest of the bprm structure. we use copy_strings_kernel to
safely copy our filename, argv, and envp array
into kernel pages, instead of user space memory. if any of the above
fails, we use bprm_drop to clean up in case of error. deputy_do_execve
processes an execve request from the remote process to execve a new
executable. it calls comm_recv to receive the requested file, argv, and
envp, deputy_setup_bprm to get a bprm structure ready to execute, then
__deputy_do_execve to perform the work. we then use comm_send_hd to
send back an empty reply. if any of the above fails, we call bprm_drop
to destroy our bprm structure. deputy_do_sigpending is a wrapper around
do_signal. it has code for doing more, but it's dead/unused code.
deputy_process_misc checks for pending dreqs, and dispatches them to
task_do_request. it then checks for pending signals, and dispatches
them to deputy_do_sigpending. it's called by deputy_main_loop between
communication events. deputy_process_communication contains the switch
case that calls the aforementioned functions. it calls
deputy_die_on_communication if comm_recv returns an error, if the type
member of the req received is zero, or if one of the functions we call
returns negative. deputy_main_loop is the userspace loop that is
executed on the home node when a process has gone remote. it calls
deputy_process_communication when comm_wait returns true. it then calls
deputy_process_misc to accomplish dispatching of events. deputy_startup
uses task_set_dflags to mark this task as deputy, flushes a signal that
pops up for an unknown reason, according to a fixme, and calls exit_mm,
which is a forward declare from kernel/exit.c.
038 omremotefile creates hpc/files.c this file contains routines for handling file access on the home node <linux/fs.h>
for processes that are running on a remote node. it starts by declaring <linux/list.h>
move remote_aops inside of two structures. remote_aops is an address_space_operations structure, <linux/sched.h>
rdentry_create_entry mapping .readpage to remote_readpage, and not touching any other <linux/file.h>
mappings. the second structure is remote_file_operations, mapping .mmap <linux/pagemap.h>
to remote_file_mmap, and not touching any other mappings. <linux/mm.h>
task_heldfiles_add is called by deputy_do_mmap_pgoff in hpc/deputy.c, <hpc/comm.h>
to create and insert an om_held_file structure representing a file into <hpc/prototype.h>
our linked list of held files. it allocates the om_held_file struct <hpc/debug.h>
from GFP_KERNEL, uses get_file to increment the file usage counter, http://www.faqs.org/docs/kernel_2_4/lki-3.html
remove nb member? fills in the om_held_file's file and nb entries with our passed file
pointer, fills in rfile->nopage with nopage from the
vm_operations_struct passed in, and inserts our om_held_file struct
into task->om.rfiles with list_add. we then return 0, since get_file
and list_add can't return errors. task_heldfiles_clear is called by
openmosix_task_exit to destroy the linked list containing all the files
held by the process. for each file in the list, it calls fput to
decrement the file usage counter, then frees the om_held_file
structure. task_heldfiles_find searches the list of heldfiles for an
om_held_file whose file member matches the passed in file pointer. it
uses list_for_each_entry to iterate over items. if it finds a match,
we return the heldfile, otherwise, we printk() an error, and return
NULL. next we have a structure declaration that has been commented out
with a #if 0 block. it was to declare a backing_dev_info structure.
after that, there's a break in the file, indicating the rest of the file
why is rfiles in the task structure, is different from the above. this section starts by defining the
and dentries are stored globally? om_remote_dentry structure, then defining a spinlock, and a list_head
for containing remote dentries. rdentry_delete acquires the
remove dead code. remote_dentries spinlock, and removes the first entry in the list with
a dentry member that matches the passed in dentry. if it doesn't find a
matching entry, it calls BUG(), and returns -ENOENT. rdentry_iput frees
a passed inode's generic_ip member (which contains our rfile_inode_data
structure), then calls iput to both push an inode's contents to disk,
and decrement its usage counter. the struct remote_dentry_ops maps
its .d_delete and .d_iput entries to the previous two functions. the
previous two functions, and this structure are not used anywhere in
the code. we declare a super_operations structure, containing no
operations, then we use this structure to fill the .s_op member when
declaring a super_block structure, also filling the .s_inodes member
move remote_file_vfsmnt inside of with a new LIST_HEAD. struct remote_file_vfsmnt is an "empty"
rdentry_create_file. vfsmount structure, containing five list heads, and a mount count. it is
declared to be its own parent. rdentry_add_entry creates an
om_remote_dentry structure to contain a passed in dentry. it
allocates the om_remote_dentry from GFP_KERNEL, sets the dentry member
to the passed dentry, acquires the remote_dentries spinlock, adds the
om_remote_dentry to the remote_dentries list, and releases the spinlock.
if the kmalloc fails, we return -ENOMEM, otherwise we return 0.
rdentry_create_dentry is called by rdentry_create_file to create a new
dentry corresponding to the passed in rfile_inode_data. along the way,
it also registers the dentry with rdentry_add_entry. first, we create
a new inode, backed by our dummy rfiles_dummy_block. we create a
duplicate of the passed in rfile_inode_data allocated from GFP_KERNEL,
and set inode->u.generic_ip (the inode's private data space) to point to
the new copy. the inode's file and address space operations are pointed
to our earlier stubs remote_file_operations, and remote_aops. we
allocate a dentry using d_alloc, set its inode to this new inode, set
its .name to be "/", and make it its own parent. we use
rdentry_add_entry to add this to our remote_dentries list, and return
the new dentry. the error handling in this function seems VERY broken.
if either of our alloc calls fails (kmalloc or d_alloc), we free our
passed in data(!), call iput on our allocated inode, and return NULL.
rfile_inode_get_data is a wrapper returning inode->u.generic_ip.
rfiles_inode_get_file is a wrapper returning
rfile_inode_get_data(inode)->file. rfiles_inode_compare is a wrapper
that memcmps the passed inode's private data space against a supplied
rfile, returning the result. rdentry_find finds an rdentry whose dentry's
inode matches the passed in inode. it grabs the remote_dentries
spinlock, and uses list_for_each_entry to cycle through all of the
rdentries, comparing to rdentry->dentry->d_inode. if it finds a match,
it breaks out, unlocks the spinlock and returns the dentry of the
rdentry structure that was a match. otherwise, it unlocks the spinlock
verify this works. and returns NULL, due to the last dentry being NULL.
rdentry_create_file creates a file pointer matching the supplied
rfile_inode_data. it uses get_empty_filp to create an empty file
pointer, then uses dget(rdentry_find(data)) to get a dentry pointing to
the passed rfile_inode_data. if dget fails, we call
rdentry_create_entry to create a dentry pointing to our passed
rfile_inode_data. if our rdentry_create_entry call fails, we call
put_filp to close our file pointer, and return NULL. otherwise, we use
the remote_file_operations and remote_file_vfsmnt structures to set the
file pointer's f_op and f_vfsmnt members, set f_dentry to our dentry,
and mark the file pointer FMODE_READ. we then return the file pointer.
task_rfiles_get is called by mig_do_receive_vma and remote_do_mmap to
search through the process's vma pages, and check to see if any of
them have a particular file associated with them. first, we construct a
rfile_inode_data containing our passed in origfile, node, and isize.
we then compare it against our list of rdentry files, using
rfiles_inode_compare. if rfiles_inode_compare returns true,
task_rfiles_get returns the file pointer associated to the inode in
question. if not, it calls rdentry_create_file to create a
new rdentry containing the passed in file, and returns the file
pointer returned from rdentry_create_file.
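the add/find/clear life cycle described for the held files list can be modelled in userspace C as below. this is a rough sketch under stated assumptions: file_sketch and held_file_sketch are hypothetical stand-ins for struct file and om_held_file, a plain singly linked list replaces the kernel list_head, incrementing and decrementing refcount stands in for get_file and fput, and the -1 return on allocation failure is an addition of this sketch (the text above says task_heldfiles_add returns 0).

```c
#include <assert.h>
#include <stdlib.h>

/* hypothetical stand-in for struct file: only the usage counter matters here */
struct file_sketch {
    int refcount;
};

/* hypothetical stand-in for om_held_file */
struct held_file_sketch {
    struct file_sketch *file;
    struct held_file_sketch *next;
};

/* take a reference on f and push a new entry onto the list */
int heldfiles_add(struct held_file_sketch **head, struct file_sketch *f)
{
    struct held_file_sketch *h = malloc(sizeof(*h));

    if (h == NULL)
        return -1;      /* sketch-only error path */
    f->refcount++;      /* models get_file */
    h->file = f;
    h->next = *head;
    *head = h;
    return 0;
}

/* return the entry whose file pointer matches f, or NULL if none does */
struct held_file_sketch *heldfiles_find(struct held_file_sketch *head,
                                        const struct file_sketch *f)
{
    for (; head != NULL; head = head->next)
        if (head->file == f)
            return head;
    return NULL;
}

/* drop one reference per entry and free the whole list */
void heldfiles_clear(struct held_file_sketch **head)
{
    while (*head != NULL) {
        struct held_file_sketch *h = *head;

        *head = h->next;
        h->file->refcount--;    /* models fput */
        free(h);
    }
}
```

every add is balanced by exactly one reference drop in clear, which is the invariant the real get_file/fput pairing has to keep.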
039 kcomd creates hpc/kcomd.c kernel-to-kernel socket communication code. this file is set up to <linux/sched.h>
create a kcomd.ko kernel module. it starts with three socket_ <linux/socket.h>
functions. socket_listen creates a socket, calls sock_map_fd to <linux/in.h>
associate an fd to the socket, binds to it using its sock->ops->bind(), <linux/in6.h>
starts listening using its sock->ops->listen(), sets the passed in <linux/net.h>
pointer res to point to the newly created socket, and returns the file <linux/syscalls.h>
descriptor to the now established stream. if sock_create fails, we <net/sock.h>
return -1. if sock_map_fd fails, we release our sock, assign NULL to <net/tcp.h>
the address passed via res, and return -1. if either our bind or listen
fails, we close our fd, release our sock, assign NULL to res, and
return -1. socket_listen_ipv4 and socket_listen_ipv6 are called by
kcomd_thread to set up the correct type of listening socket. both
these functions are wrappers of the above socket_listen function.
they set up their appropriate type of sockaddr structure, and call
move these structures to a private socket_listen. struct kcom_pkt is designed to contain a packet destined
header. to a remote kernel. struct kcom_node is a container for a socket, and
the information regarding the node it points to. kcom_task is the
structure that contains kcomd's knowledge about a migrated process. it
contains the pid of the process in question, a kcom_node structure
defining what node a process is on, a list of processes communicating
with this node(?), a list containing outgoing packets, and a space for
one incoming packet. we define a spinlock and a list_head for
containing kcom_nodes. we then define sockets_fds as a fd_set_bits
structure. this structure is a more scalable version of a fd_set, used
by do_select. we then declare sockets_fds_bitmap and maxfds, which are
set and used by the next function, alloc_fd_bitmap, to hold a
dynamically grown array of fds. alloc_fd_bitmap takes the passed in fd
count, and if it's greater than what the current sockets_fds_bitmap was
created to hold, frees sockets_fds_bitmap (and its contents), and
allocates a new one. if kmalloc fails, we return ENOMEM. otherwise, we
set the in, out, ex, res_in, res_out, and res_ex members of the
sockets_fds structure to offsets of our sockets_fds_bitmap structure,
and return 0. kcom_pkt_create creates a new kcom_pkt structure with the
len, type, and data members initialized to the passed in values. if
kzalloc fails, we return NULL. __kcom_node_find is called by the later
defined kcom_node_find to do the work of finding a node in our
kcom_nodes list that uses the passed sockaddr to communicate. we use
doublecheck this return list_for_each_entry and memcmp to compare the address of our sock with
BUG: note the fixme regarding memcmp the address of our node(!). this function will return NULL if it fails.
kcom_node_find wraps __kcom_node_find, grabbing the kcom_nodes_lock
before entry, and releasing it afterward. kcom_node_add is called by
accept_connection to create a new kcom_node struct, and adds it to the
kcom_nodes list. there is code commented out regarding finding out if
the node is already in the list, but it's incomplete. kcom_node_del
removes a node from the kcom_nodes list that uses the passed in
sockaddr. we acquire the kcom_nodes spinlock, then use __kcom_node_find
to find the node structure to be deleted. if we don't find one, we
release the kcom_nodes spinlock, and return -ENOENT. otherwise, we call
list_del to remove the node from our node list, release the spinlock,
close its fd, release its socket, free the node structure's memory, and
pull dead code. return 0. comm_simple is a stub that returns 0, and is not used
elsewhere in the code. we then declare comm_ack, comm_iovec, and
comm_iovec_ack, which also are not used anywhere else.
accept_connection is called by kcomd_thread (declared later), to accept
an incoming connection on a passed in socket. it starts by allocating
a new socket, and calling the accept() operation of the passed in
socket to accept a connection from the passed in socket, on our new
socket. there's a block of commented out code, for checking if a node
is already in our node_list, but it's unused/incomplete. we then use
sock_map_fd to get a file descriptor to this socket, add the node this
socket is communicating to to our node_list, and return our file
descriptor. if our socket allocation returns null, we return -1. if our
accept or sock_map_fd have problems, we release our socket, and return
-1. if our kcom_node_add fails, we close our fd, release our socket,
then return -1. data_read, data_write, and dispatch are all stubs that
return 0. data_read and data_write are called by kcomd_thread.
kcom_task_create creates a kcom_task structure allocated from
GFP_KERNEL for a given kcom_node and PID, initializing the pid, node,
and list members. if the kzalloc returns NULL, we return NULL.
kcom_task_delete deletes the first entry in the nodes list that matches
the given PID. these task list manipulation functions are missing the
spinlock code that the above node_list manipulation code has.
__kcom_task_find and kcom_task_find are formed like the above node find
code, but without its spinlock code. kcom_task_send uses
kcom_pkt_create to add a packet to the task structure belonging to the
pid passed in. it has comments regarding sleeping and replying, but
instead it returns 0. kcomd_thread is the function executed in kernel
space, as a kernel thread. first, we call daemonize to create a "kcomd"
process. we then wait for a connection on an ipv4 and an ipv6 socket.
when we receive a connection, we enter a large while loop (which we
never exit?). in this loop, we first call alloc_fd_bitmap to make sure
our fd bitmap is big enough to hold maxfds number of fds. we then zero
the in, out, and ex fd sets, add our two listening sockets to the in
set, add the listening fds of each node in our node_list to the in set,
add each fd in our node list that we have packets to send on to the out
set, zero the res_in, res_out, and res_ex set of fds, and call select.
if select returns -1, we return to the top of our loop. otherwise, we
test whether our v4 or v6 listening socket received a connection. if so,
we call accept_connection. we then test each fd belonging to our list
of nodes, and if they have data to read, call data_read (a NOP!), or
if they have data to be written call data_write(also a NOP!).
at this point, we return to the top of our never-ending while loop.
kcom_init calls kernel_thread to start the aforementioned kcomd_thread
function. the rest of the file is just module glue for creating a kcomd
module, licensing it GPL, and attributing Vincent Hanquez as the
author.
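the grow-on-demand behaviour described for alloc_fd_bitmap, with six fd sets carved out of one allocation, might look roughly like this in userspace C. this is a sketch, not the kcomd code: fd_sets_sketch stands in for fd_set_bits, calloc/free replace the kernel allocator, the zero-filled fresh bitmap and the -1 error value are choices of this sketch, and all names here are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* hypothetical stand-in for fd_set_bits: six bitmaps, three in, three out */
struct fd_sets_sketch {
    unsigned long *in, *out, *ex;
    unsigned long *res_in, *res_out, *res_ex;
};

struct fd_sets_sketch sets;     /* models sockets_fds */
unsigned long *bitmap;          /* models sockets_fds_bitmap */
int bitmap_fds;                 /* fd count the bitmap was sized for */
size_t bitmap_words;            /* words per individual set */

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* make sure the bitmap can hold fds descriptors; returns 0, or -1 on failure */
int alloc_fd_bitmap_sketch(int fds)
{
    if (fds <= 0)
        return -1;

    if (fds > bitmap_fds) {
        size_t words = ((size_t)fds + WORD_BITS - 1) / WORD_BITS;
        unsigned long *fresh = calloc(6 * words, sizeof(*fresh));

        if (fresh == NULL)
            return -1;
        free(bitmap);           /* old contents are discarded, not copied */
        bitmap = fresh;
        bitmap_fds = fds;
        bitmap_words = words;
    }

    /* each set is a fixed offset into the single allocation */
    sets.in      = bitmap;
    sets.out     = bitmap + 1 * bitmap_words;
    sets.ex      = bitmap + 2 * bitmap_words;
    sets.res_in  = bitmap + 3 * bitmap_words;
    sets.res_out = bitmap + 4 * bitmap_words;
    sets.res_ex  = bitmap + 5 * bitmap_words;
    return 0;
}
```

a call with a smaller or equal fd count leaves the existing allocation untouched, so the loop in kcomd_thread can call it on every iteration cheaply.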
040 config creates hpc/Kconfig this file defines our openmosix menu options in the kernel's
configuration system (menuconfig). we declare a top level menu titled
"HPC Options". our configuration options all exist under this entry.
first, we create an entry defining KCOMD as a tristate, or an item that
can be either on (in the kernel), off, or a module (loadable and
unloadable while the kernel is running). next we create an entry
defining OPENMOSIX as bool (in kernel, or not). this turns on or off
the parts of openmosix that have to be in-kernel for openmosix to
function. bool OPENMOSIX_VERBOSE is supposed to make openmosix more
verbose, but just serves to make OPENMOSIX_MIGRATION_VERBOSE and
OPENMOSIX_DEBUG_FS visible. bool OPENMOSIX_MIGRATION_VERBOSE enables
debugging messages of the form OM_VERBOSE_MIG(...) in
include/hpc/prototype.h. bool OPENMOSIX_DEBUG accomplishes many things.
first, it enables compilation and inclusion of hpc/debug.c, and an
architecture specific hpc/debug-$(ARCH).c, both of which contain
functions for printing the state of various structures, processor
registers, and other associated values. then, it enables debugging
messages of the form OMDEBUG(...) in include/hpc/debug.h. it enables
the tracking of the contents of the structure openmosix_options in
include/hpc/hpc.h, and makes OPENMOSIX_MIGRATION_DEBUG and
remove dead code. OPENMOSIX_DEBUG_FS visible. bool OPENMOSIX_MIGRATION_DEBUG
doesn't do anything, and can be safely removed. bool OPENMOSIX_DEBUG_FS
enables the compilation and inclusion of the contents of hpc/debugfs.c,
creating the om/ directory and its contents under the debugfs. bool
OPENMOSIX_CTRL_FS enables the compilation and inclusion of
hpc/ctrlfs.c, which is the control filesystem used to tell the kernel
to migrate processes, as well as where to tell what node a process is
running on.
041 openmosix creates hpc/kernel.c this file is the kernel's interface to the openmosix system. it
contains only functions that are meant to be called by the kernel.
first, we export our openmosix_options datastructure, which contains
four constants that are used as "ceilings" for the OMDEBUG_* debugging
macros, settable through the debugfs. openmosix_pre_clone is called
when a process requests the clone syscall, before the kernel starts
processing it. in this function, we check whether the current process
has requested a shared memory space between the two clones, and if it
has, we mark the process as un-migratable for that reason, and increase
the usage count on its mm structure. note that as a result, both
processes will be marked DSTAY_CLONE, and both will have a usage count
+1 on the mm structure. processes are started with a usage count of 1.
openmosix_post_clone is called by the clone syscall, on the thread of
the parent, not the child, after the clone is completed. it checks the
magic! mm_realusers counter. if it's just 1, then somehow the process magically
is this supposed to happen when a decreased its usage flag, and we clear the DSTAY_CLONE flag given to it
child dies, or otherwise drops the by openmosix_pre_clone. task_maps_inode is supposed to check whether a
shared mm? stub! given task maps a given inode, but is just a stub.
monkey? openmosix_no_longer_monkey is called from __remove_shared_vm_struct
to check every process on the machine and see whether it's using the
passed in inode. if it is, we set the DREQ_CHECKSTAY flag, as this
inode is about to be removed from service, and doing such may make this
process migratable. we acquire the tasklist_lock around our invocation
of for_each_process(). since the previous function is a stub, this
function does nothing. stay_me_and_my_clones is called by sys_mlock and
sys_mlockall in mm/mlock.c, as well as do_mmap_pgoff in mm/mmap.c. it
applies a given bitmask of reasons to the current task, and all tasks
that share its mm structure. first, it uses task_lock to lock the
current process, sets its stay reason, and task_unlocks. if the number
of mm_realusers is greater than one (some other process uses this
process's mm structure), we grab the tasklist_lock, use
for_each_process to search for processes with the same mm pointer, that
aren't the current process, and use task_lock/task_set_stay/task_unlock
to add our stay reasons to the found processes. obtain_mm is called by
mig_handle_migration() in hpc/migrecv.c and task_local_bring() in
hpc/migctrl.c to allocate a new mm structure, initialize it, and make
it the context of the current process. we start by checking to see if
there is currently a mm structure associated with the passed in task.
if there is, we call panic() to print a debugging message. we then
mm_alloc() a new mm, initialize it to hold our given task with
init_new_context(), acquire the mmlist_lock, initialize our new mm's
mmlist member with the mmlist of process zero, and release the
mmlist_lock. we then assign this mm to our process by first acquiring the
task_lock(), saving our current active mm, setting the task's active mm
and mm to our newly created mm, and task_unlock()ing. we call
activate_mm with our original and new mm, then mmdrop the old
active_mm. if our mm_alloc() fails, we return -ENOMEM. if
init_new_context() fails, we destroy our allocated mm, and return the
error init_new_context() failed with. otherwise, we return 0 for
success. unstay_mm is called by sys_munlock and sys_munlockall in
mm/mlock.c to request a re-evaluation of the stayability of the
current process, and all processes that share its mm structure. for the
premature optimization? looks good tho. common case of just one task using a given mm structure, we just call
task_set_dreqs(current, DREQ_CHECKSTAY). otherwise, we use
for_each_process() with a read_lock held on the tasklist_lock to
iterate through each process on the machine, checking if it's using our
passed mm, and if so, we call task_set_dreqs(p, DREQ_CHECKSTAY) on it.
remote_pre_usermode is called by the later defined
openmosix_pre_usermode to check for communication events before
entering userspace. it calls comm_peek() to see if there's pending
input, and if there is, it calls remote_do_comm() to process the
communication in question. remote_pre_usermode always returns 0 for
success. deputy_pre_usermode is also called by openmosix_pre_usermode,
before jumping to userspace while handling a process in deputy state.
in this function, we just jump into deputy_main_loop, instead of going
to any real usermode code. when deputy_main_loop() returns, we return
0. openmosix_pre_usermode is called by assembly code in
arch/$ARCH/kernel/entry, when switching from kernel space to user
space. we first check for pending dreqs, and if we find one, we save
our current irq mask, call task_do_request, and restore our irq mask
once task_do_request returns. after dispatching dreqs, we call one of
the previous two functions depending on whether the process is in
DDEPUTY or DREMOTE state. like before, we save our irq mask before
calling remote_pre_usermode or deputy_pre_usermode, then restore it
once the called function returns. this function always returns 0 for
success. openmosix_init is called on subsystem load. it starts the
mig_daemon kernel thread, and returns 0. the last line in this file
tells the subsystem init system to call openmosix_init when initializing
this kernel component.
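
the openmosix_pre_usermode flow described above can be sketched in plain user-space C. this is only an illustration under stated assumptions, not the code in hpc/kernel.c: the struct, its fields, the flag bit values, the sketch_* function names, and the order in which DREMOTE and DDEPUTY are tested are all hypothetical, and the irq mask save/restore is left out because it has no user-space equivalent.

```c
#include <assert.h>

/* assumed bit values; the real constants are defined in hpc/task.h */
#define DDEPUTY 0x1
#define DREMOTE 0x2

/* hypothetical stand-in for the openmosix part of a task */
struct sketch_task {
    unsigned int dflags;   /* DDEPUTY / DREMOTE state bits */
    unsigned int dreqs;    /* pending request bits */
    int requests_run;      /* counters, only here to observe the flow */
    int remote_calls;
    int deputy_calls;
};

/* stands in for task_do_request(): handle and clear pending dreqs */
static void sketch_do_request(struct sketch_task *t)
{
    t->requests_run++;
    t->dreqs = 0;
}

/* stands in for remote_pre_usermode(): always reports success */
static int sketch_remote_pre_usermode(struct sketch_task *t)
{
    t->remote_calls++;
    return 0;
}

/* stands in for deputy_pre_usermode(): always reports success */
static int sketch_deputy_pre_usermode(struct sketch_task *t)
{
    t->deputy_calls++;
    return 0;
}

/* dispatch pending dreqs first, then run the state-specific hook */
int sketch_pre_usermode(struct sketch_task *t)
{
    if (t->dreqs)
        sketch_do_request(t);

    if (t->dflags & DREMOTE)
        sketch_remote_pre_usermode(t);
    else if (t->dflags & DDEPUTY)
        sketch_deputy_pre_usermode(t);

    return 0;   /* like the described function, always success */
}
```

a task with neither flag set falls straight through, which matches the description: an ordinary local process only pays for the dreqs check.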
042 config creates hpc/Makefile this Makefile contains the make fragments that tell the kernel build
what targets to build in the hpc/ directory. this code contains
five targets, obj-$(CONFIG_KCOMD) obj-$(CONFIG_OPENMOSIX)
obj-$(CONFIG_OPENMOSIX_CTRL_FS) obj-$(CONFIG_OPENMOSIX_DEBUG)
and obj-$(CONFIG_OPENMOSIX_DEBUG_FS). each of these targets corresponds
to a configuration variable defined by hpc/Kconfig.
obj-$(CONFIG_KCOMD) says to compile kcomd.o. obj-$(CONFIG_OPENMOSIX)
says to compile kernel.o, task.o, comm.o, remote.o, deputy.o,
copyuser.o, files.o, syscalls.o, migrecv.o, migsend.o, migctrl.o,
service.o, proc.o, and an arch-$(ARCH).o file containing
architecture-specific functionality. proc.o has a comment noting that it's "legacy
code". obj-$(CONFIG_OPENMOSIX_CTRL_FS) says to create ctrlfs.o.
obj-$(CONFIG_OPENMOSIX_DEBUG) says to include debug.o, and
an architecture-specific debug-$(ARCH).o. finally,
obj-$(CONFIG_OPENMOSIX_DEBUG_FS) says to include debugfs.o.
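
for orientation, the five targets described above would look roughly like this in kbuild syntax. this is a reconstruction from the description, not the verbatim hpc/Makefile: the object names and their order come from the text, while the use of `+=`, the line continuations, and the comment wording are assumptions.

```make
# sketch reconstructed from the hunk description, not the original file
obj-$(CONFIG_KCOMD)              += kcomd.o

# proc.o is the object the original marks as legacy code
obj-$(CONFIG_OPENMOSIX)          += kernel.o task.o comm.o remote.o deputy.o \
                                    copyuser.o files.o syscalls.o migrecv.o \
                                    migsend.o migctrl.o service.o proc.o \
                                    arch-$(ARCH).o

obj-$(CONFIG_OPENMOSIX_CTRL_FS)  += ctrlfs.o
obj-$(CONFIG_OPENMOSIX_DEBUG)    += debug.o debug-$(ARCH).o
obj-$(CONFIG_OPENMOSIX_DEBUG_FS) += debugfs.o
```

each `obj-$(CONFIG_...)` line expands to obj-y, obj-m, or obj- depending on how the variable is set in the kernel configuration, which is how kbuild decides whether the listed objects are built in, built as modules, or skipped.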
043 ominterface creates hpc/migctrl.c this file contains functions for moving processes via openmosix.
task_remote_expel is called either by task_remote_wait_expel or
remote_do_comm() in hpc/remote.c, to send a DREMOTE process back to
its original node, merging it with its deputy. first we check to make
sure the task we've been passed is in DREMOTE state, and use BUG_ON
if it's not. then, we use mig_send_hshake to request a migration back
from the home node. if this succeeds, we call mig_do_send to actually
perform the migration. after that, we destroy our link to the home node
by using task_set_comm to set our link to null, then calling
comm_close() against our old link (returned by task_set_comm). we then
who gets this result? call do_exit(SIGKILL) to end the process. in case either of our
mig_send_hshake or mig_do_send calls fails, we OMBUG("failed\n"), and
return -1. task_remote_wait_expel is called by __task_move_to_node, to
return a task to the home node. it wraps the previous function, first
requesting permission to return home by sending a REM_BRING_HOME req,
then waiting on a DEP_COMING_HOME reply. if comm_recv fails, or we recv
something other than a DEP_COMING_HOME, we return -1. otherwise, we
call task_remote_expel. task_local_send is called by
__task_move_to_node to send a local task to a remote host. first, we
check to make sure the task is not in DDEPUTY state. if it is, we
returning success in case of 'error'? return 0, as this process is already running on a remote node.
otherwise, we open a new connection using sockaddr_setup_port and
comm_setup_connect, then attach it to this process with
task_set_comm. we set the current process into DDEPUTY state, and
why use hshake here, and req above? ask permission to send by sending a HSHAKE_MIG_REQUEST using
mig_send_hshake. if that succeeds, we call mig_do_send to actually send
the process to the remote node. when mig_do_send returns successfully,
the process has been sent to the remote node, and the local process is
now a deputy. we call deputy_startup, and return 0. if any of
comm_setup_connect, mig_send_hshake, or mig_do_send returns failure, we
remove our DDEPUTY flag, destroy our link to the remote node (if
applicable), and return 0. task_local_bring is called by
__task_move_to_node to return a remote process to the current node,
re-merging it with its deputy. first, we check to make sure the current
returning success in case of error! task is in DDEPUTY state. if it's not, we return 0. we then use
obtain_mm to get a new mm struct. we then make a DEP_COMING_HOME
request to the remote end, and use mig_recv_hshake to receive