hunk class file patched description noteworthy references
001 i386,config arch/i386/Kconfig this patch adds hpc/Kconfig to the build process
002 i386,remote arch/i386/kernel/asm-offsets.c this patch defines offsets to members inside of structures, and defines
syscall insert a BLANK(); before our code used by the assembly code in arch/i386/kernel/entry.S. first, we
generate the offset to the om member of the task structure, from the
beginning of the structure. we then generate an offset to the dflags
DDEPUTY and DREMOTE should have a member, inside of task.om, which is of type openmosix_task. finally,
following _asm, indicating they're we define DDEPUTY and DREMOTE constants, setting them to the DDEPUTY
the versions used by assembly code. and DREMOTE values defined in hpc/task.h.
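a minimal sketch of the offset idea described in this hunk, assuming a made-up layout: the struct members other than om and dflags, the flag values, and all three helper names are hypothetical stand-ins, not the contents of hpc/task.h. it shows how assembly code can reach task.om.dflags with nothing but two constant offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real layouts live in the kernel headers. */
struct openmosix_task {
    unsigned long dflags;   /* DDEPUTY / DREMOTE bits are kept here */
    void *placeholder;      /* invented member, for illustration only */
};

struct task_struct {
    long state;             /* invented member, for illustration only */
    struct openmosix_task om;
};

/* Placeholder bit values; the real ones are defined in hpc/task.h. */
#define DDEPUTY 0x1UL
#define DREMOTE 0x2UL

/* Offset of om from the beginning of the task structure. */
static size_t task_om_offset(void)
{
    return offsetof(struct task_struct, om);
}

/* Offset of dflags inside om. */
static size_t om_dflags_offset(void)
{
    return offsetof(struct openmosix_task, dflags);
}

/* What the assembly effectively does: base + offset + offset, then test a bit. */
static int task_is_remote(const struct task_struct *t)
{
    const char *base = (const char *)t;
    const unsigned long *flags =
        (const unsigned long *)(base + task_om_offset() + om_dflags_offset());
    return (*flags & DREMOTE) != 0;
}
```

the two offsets are compile-time constants, which is why they can be emitted into a generated header and used from entry.S without knowing the C types.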
003 i386,remote arch/i386/kernel/entry.S we modify this file to add two new entry points to the kernel, utilize
remotefork how do we make these changes the syscall mapping table in arch/i386/kernel/omasm.h(in both the
local conditional? #ifdefs? normal int 80h syscall path, and the sysenter path), make the two
syscall syscall exit points store a pointer to the thread_info of this process,
and insert a call to openmosix_pre_usermode in the userspace return
path. ret_from_deputy_fork is entered by a process, when its eip is set
to this function by the code we add in the function copy_thread in
if this is identical, why use it? arch/i386/kernel/process.c. ret_from_deputy_fork is an identical copy
of ret_from_fork. ret_from_kickstart is called from arch_kickstart in
isn't this GET_THREAD_INFO redundant? hpc/arch-i386.c. it calls GET_THREAD_INFO(%ebp), and jmps to
syscall_exit, returning to userspace for the 'first' time on a remote
node. we then modify the resume_userspace entry point, to call
openmosix_pre_usermode between doing work_pending, and restore_all'ing.
in the next two hunks, we add code to select which syscall table to use
based on whether the current task is marked DREMOTE or not. this is
added once in ENTRY(sysenter_entry), and again in ENTRY(system_call).
we also modify syscall_exit and sysenter_exit to store the result of
GET_THREAD_INFO into %ebp, cleaning up after our own code clobbers the
register.
004 i386,i387 arch/i386/kernel/i387.c fxsr support is support for fast saving of the i387's floating
point/sse/sse2/etc state to a 512 byte block. it's a new feature,
not present in earlier i387 style floating point processors. this patch
changes from declaring the conversion functions for fxsr<->387 from
static to OM_NSTATIC, and adds a function for finding out whether
support exists during run time.
005 i386,remote creates arch/i386/kernel/omasm.h this file contains the syscall table called by processes which are
whats the rule for whether to process DREMOTE. it contains a mapping of whether a syscall is to be passed to
locally or back home? the home node, or handled locally.
#define self out if !CONFIG_OPENMOSIX
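a minimal sketch of the table selection described for hunks 003 and 005, written in C rather than assembly. the three handler bodies, the table size, and the DREMOTE value are invented for illustration; only the idea (pick a table by flag, then index it by syscall number) comes from the text above.

```c
#include <assert.h>
#include <errno.h>

typedef long (*syscall_fn)(long arg);

#define NR_DEMO_SYSCALLS 3
#define DREMOTE 0x2UL   /* placeholder value */

/* Invented handlers: a normal syscall, a guest syscall handled locally,
 * and a guest syscall that would be forwarded to the home node. */
static long demo_sys_normal(long arg)  { return arg + 1; }
static long demo_om_local(long arg)    { return arg + 1; }
static long demo_om_forward(long arg)  { return -arg; }

/* The ordinary table, used by every task that is not DREMOTE. */
static const syscall_fn demo_sys_call_table[NR_DEMO_SYSCALLS] = {
    demo_sys_normal, demo_sys_normal, demo_sys_normal,
};

/* The remote table: each slot says "handle here" or "send home". */
static const syscall_fn demo_remote_sys_call_table[NR_DEMO_SYSCALLS] = {
    demo_om_local, demo_om_forward, demo_om_forward,
};

/* Select the table from the task's dflags, then dispatch by number. */
static long demo_dispatch(unsigned long dflags, unsigned int nr, long arg)
{
    const syscall_fn *table = (dflags & DREMOTE)
        ? demo_remote_sys_call_table
        : demo_sys_call_table;

    if (nr >= NR_DEMO_SYSCALLS)
        return -ENOSYS;
    return table[nr](arg);
}
```

keeping the local/remote decision in a table means the entry path only needs one extra flag test per syscall.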
006 userthread arch/i386/kernel/process.c in this patch, we add an entry for user_thread_helper in the
remotefork kernel_thread_helper execution path, add a function for creating an
in-kernel user thread, and re-direct the entry point ret_from_fork to
ret_from_deputy_fork for processes that are DDEPUTY in copy_thread().
our user_thread_helper entry point merely subtracts 60h from the stack
pointer for this task, reserving space for the user registers on the
stack, allowing execution to continue into the kernel thread helper.
user_thread is called by openmosix_mig_daemon, in hpc/migrecv.c, to
start a 'kernel thread', in the user segment, to handle an incoming
migration request. to do this, we set up the user registers in a
pt_regs structure, so that we can call do_fork, and have it create the
thread for us. first we zero the structure. then we assign the
function pointer to the function we want to start in to ebx, set edx to
the function's argument, set xds and xes to allow the process access to
__USER_DS (the usermode dataspace), and set xcs to allow the process
to execute code in __KERNEL_CS (the kernel's codespace). we set the
orig_eax 'register' to -1, and set the eip to point to our
user_thread_helper above, so that execution starts there, and set our
eflags so that when this process is running, hardware interrupts are http://x86.org/intel.doc/386manuals.htm
enabled, and so that the sign and parity bits are turned on. we then
call do_fork, adding flags indicating that we don't want this process
to receive SIGCHLD, and that it cannot be ptraced. we return the
result of the do_fork call. finally, we modify copy_thread so that for
processes marked DDEPUTY, instead of the parent process returning to
userspace and immediately entering the kernel at ret_from_fork, we set
it to enter the kernel at ret_from_deputy_fork.
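a minimal sketch of the register setup that user_thread performs before calling do_fork, assuming a cut-down register structure. struct demo_pt_regs, the function name, and the two selector values are hypothetical; the EFLAGS bit positions (parity, sign, interrupt enable) are the architectural ones. addresses are passed as plain integers to keep the sketch portable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, cut-down register frame; not the kernel's pt_regs. */
struct demo_pt_regs {
    unsigned long ebx, edx;
    unsigned long xds, xes, xcs;
    long orig_eax;
    unsigned long eip, eflags;
};

/* Placeholder selector values, for illustration only. */
#define DEMO_USER_DS   0x7bUL
#define DEMO_KERNEL_CS 0x60UL

/* Architectural EFLAGS bits. */
#define DEMO_EFLAGS_PF 0x004UL  /* parity flag */
#define DEMO_EFLAGS_SF 0x080UL  /* sign flag */
#define DEMO_EFLAGS_IF 0x200UL  /* hardware interrupts enabled */

static void demo_fill_user_thread_regs(struct demo_pt_regs *regs,
                                       unsigned long fn,
                                       unsigned long arg,
                                       unsigned long helper)
{
    memset(regs, 0, sizeof(*regs));   /* first we zero the structure */
    regs->ebx = fn;                   /* function to start in */
    regs->edx = arg;                  /* its argument */
    regs->xds = DEMO_USER_DS;         /* data access through the user segment */
    regs->xes = DEMO_USER_DS;
    regs->xcs = DEMO_KERNEL_CS;       /* code runs from the kernel segment */
    regs->orig_eax = -1;              /* not inside a syscall */
    regs->eip = helper;               /* execution starts at the helper */
    regs->eflags = DEMO_EFLAGS_IF | DEMO_EFLAGS_SF | DEMO_EFLAGS_PF;
}
```

the filled frame is what the new thread "returns" into, so every field must be valid before the fork call.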
007 i386,ksocket arch/i386/kernel/signal.c this patch changes the do_signal function from static to OM_NSTATIC
008 i386,remote arch/i386/kernel/sys_i386.c this patch modifies sys_mmap2 so that processes marked DREMOTE mapping
memory without MAP_ANONYMOUS get forwarded to remote_do_mmap.
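a minimal sketch of the routing decision this hunk adds to sys_mmap2. the enum, the function name, and the DREMOTE value are hypothetical; DEMO_MAP_ANONYMOUS is a local stand-in for MAP_ANONYMOUS so the sketch does not depend on system headers.

```c
#include <assert.h>

#define DREMOTE            0x2UL   /* placeholder value */
#define DEMO_MAP_ANONYMOUS 0x20UL  /* stand-in for MAP_ANONYMOUS */

enum demo_mmap_route { DEMO_MMAP_LOCAL, DEMO_MMAP_REMOTE };

/* Only file-backed mappings made by a DREMOTE task are forwarded;
 * anonymous mappings need no file from the home node. */
static enum demo_mmap_route demo_mmap_route(unsigned long dflags,
                                            unsigned long map_flags)
{
    if ((dflags & DREMOTE) && !(map_flags & DEMO_MAP_ANONYMOUS))
        return DEMO_MMAP_REMOTE;   /* would call remote_do_mmap */
    return DEMO_MMAP_LOCAL;        /* normal mmap path */
}
```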
009 i386,local arch/i386/kernel/vm86.c this patch changes the save_v86_state and return_to_32bit functions
remote so that they clear the DSTAY_86 flag before they exit. it also changes
both sys_vm86 and sys_vm86old so that they return a process to its
home node if it attempts to enter vm86 mode. the code we add to
save_v86_state and return_to_32bit simply task_lock()s the current
task, uses task_clear_stay() to clear the DSTAY_86 flag, and
task_unlock()s the current task. the code we add to sys_vm86 and
sys_vm86old simply calls task_go_home_for_reason(), specifying
DSTAY_86. if task_go_home_for_reason returns non-zero, we force the
better error message? function we're in to return -ENOMEM, as we must be on remote, and
migration must have failed.
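a minimal sketch of the vm86 flow described in this hunk, modeled in plain C. struct demo_task, every demo_ function, and the flag value are hypothetical. the sketch assumes that going home records the stay reason and returns non-zero when migration fails, which the text implies but does not spell out.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_DSTAY_86 0x1UL   /* placeholder value */

struct demo_task {
    unsigned long stay;   /* reasons pinning the task to its home node */
    int at_home;          /* 1 when running on the home node */
};

/* Stand-in for task_go_home_for_reason(): assumed behaviour only. */
static int demo_go_home_for_reason(struct demo_task *t, unsigned long reason,
                                   int migration_works)
{
    t->stay |= reason;
    if (t->at_home)
        return 0;
    if (!migration_works)
        return -1;
    t->at_home = 1;
    return 0;
}

/* Entry to vm86 mode: go home first, fail with -ENOMEM if that fails. */
static int demo_sys_vm86(struct demo_task *t, int migration_works)
{
    if (demo_go_home_for_reason(t, DEMO_DSTAY_86, migration_works))
        return -ENOMEM;
    return 0;   /* the real syscall would now enter vm86 mode */
}

/* Exit from vm86 mode: the real code takes task_lock() around this. */
static void demo_leave_vm86(struct demo_task *t)
{
    t->stay &= ~DEMO_DSTAY_86;
}
```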
010 i386,remotemem arch/i386/lib/usercopy.c this patch changes strlen_user to redirect to deputy_strlen_user if
openmosix_memory_away().
011 ppc,config arch/ppc/Kconfig this patch adds hpc/Kconfig to the build process
012 ppc,remote arch/ppc/kernel/asm-offsets.c this patch defines offsets to members inside of structures, and defines
syscall insert a BLANK(); before our code. used by the assembly code in arch/ppc/kernel/entry.S. first, we
generate the offset to the om member of the task structure, from the
beginning of the structure. we then generate an offset to the dflags
DDEPUTY and DREMOTE should have a member, inside of task.om, which is of type openmosix_task. finally,
following _asm, indicating they're we define DDEPUTY and DREMOTE constants, setting them to the DDEPUTY
the versions used by assembly code. and DREMOTE values defined in hpc/task.h.
013 ppc,remote arch/ppc/kernel/entry.S we modify this file to add a new entry point to the kernel, utilize the
syscall syscall mapping table in arch/ppc/kernel/misc.h, and insert a call to
openmosix_pre_usermode in our userspace return path. first we modify
syscall_dotrace_cont, selecting which syscall table to use based on
whether the current task is marked DREMOTE or not. ret_from_kickstart
is called from arch_kickstart. ret_from_kickstart branches directly to
ret_from_syscall, returning to userspace for the 'first' time on a
remote node. our last hunk branches directly to openmosix_pre_usermode
in the restore_user path.
014 ppc,userthread arch/ppc/kernel/misc.S in this patch, we add an assembly function to create usermode threads,
and create the remote syscall table. first we define SIGCHLD. our next
hunk creates a user_thread function similar to the one in
rewrite in C, if possible! arch/i386/kernel/process.c, except hand written in assembly. finally,
move syscall table to omasm.h! we create the syscall table used by processes which are DREMOTE. it
contains a mapping of whether a syscall is to be passed to the home
node, or handled locally.
015 x86_64,config arch/x86_64/Kconfig this patch adds hpc/Kconfig to the build process
016 x86_64,remote arch/x86_64/kernel/asm-offsets.c this patch defines offsets to members inside of structures, and defines
syscall remove ifdef around header. redundant. used by the assembly code in arch/x86_64/kernel/entry.S. first, we
move this define with the others. generate the offset to the om member of the task structure, from the
add a define around task! is task used? beginning of the structure. next we define an entry for task. in our
insert a BLANK(); before our code. last hunk, we generate an offset to the dflags member, inside of
DDEPUTY and DREMOTE should have a task.om. we then define DDEPUTY and DREMOTE, setting them to the
following asm! DDEPUTY and DREMOTE values defined in hpc/task.h.
017 x86_64,remote arch/x86_64/kernel/entry.S we modify this file to add a new entry point for returning from
syscall dont define out omasm.h kickstart, utilize the syscall mapping table in
arch/x86_64/kernel/omasm.h, modify the PTREGSCALL macro to create om_
entries for each of the 6 functions that take a PTREGS argument, insert
an om_ptregscall_common entry point, insert an om_stub_execve entry
point, insert a call to openmosix_pre_usermode, and insert a
re-write user_thread in C! user_thread function written in assembly. ret_from_kickstart restores
the state of the registers to how they were before it was called, and
returns to userspace for the first time, on a remote node. in the
system_call entry point, we check for DREMOTE in task.om.dflags.
why the fake frame? CFI_ADJUST! if we find it, we jump over a stack frame, call into our
UNFAKE_STACK_FRAME? remote_sys_call_table, step back under the stack frame, and jmp to
ret_from_syscall. otherwise, we pass through to the normal syscall
handler. next, we re-define the PTREGSCALL macro so that when it's used,
it creates two entries instead of one. one 'normal' entry, and one
entry prepended with om_, that loads the address of an om_ version of
the function being declared, and calls our om_ptregscall_common
entry to dispatch. om_ptregscall_common is our version of
non-functional differences? ptregscall_common. the only functional difference is ours begins by
jumping over a stack frame, does the same work as ptregscall_common to
call the C function pointed to in %rax, and peel back to before the
stack frame we jumped over. entry om_stub_execve is similar to
stub_execve, only we call remote_do_execve instead of sys_execve, and we
save the contents of r11(eflags) in r15 during this call, restoring
afterwards. our next hunk modifies the common_interrupt entry to call
our openmosix_pre_usermode function during the return to userspace.
finally, we create our user_thread entry, which is responsible for
creating our 'user thread', similar to the two previous user_thread
functions.
018 x86_64,remote creates arch/x86_64/kernel/omasm.h this is a table containing mappings that are used to dispatch syscall
#define self out if !CONFIG_OPENMOSIX requests made by processes that are guests. it contains mappings of
whether a syscall is to be passed to the home node, or processed
what about sys_ni_syscall? locally. the entries are referenced by code generated by the
PTREGSCALL macro in arch/x86_64/kernel/entry.S. each mapping stores the
address of one of om_sys_local, om_sys_remote, or sys_ni_syscall.
this address is loaded into %rax by PTREGSCALL, and PTREGSCALL calls
om_ptregscall_common to dispatch to the function retrieved from this
table.
019 x86_64,remote arch/x86_64/kernel/sys_x86_64.c this patch redirects sys_mmap2 so that remote processes mapping memory
without MAP_ANONYMOUS get forwarded to remote_do_mmap.
020 x86_64,local arch/x86_64/lib/copy_user.S this patch redirects copy_to_user and copy_from_user so when the kernel
rmem on the home node is accessing memory in the process's userspace, it
gets redirected to functions accessing memory on the remote node. our
first hunk modifies copy_to_user, checking to see if the process is
marked DDEPUTY, and if so re-directing to deputy_copy_to_user. the
second hunk accomplishes the same task, re-directing copy_from_user to
better label than 2901! deputy_copy_from_user if the task is marked DDEPUTY.
021 x86_64,local arch/x86_64/lib/usercopy.c this patch forwards __strncpy_from_user and strncpy_from_user so when
rmem missing comments on #endifs the kernel on the home node is accessing memory in a deputy process's
userspace, we use deputy_strncpy_from_user. we also forward
__strlen_user and strlen_user to deputy_strlen_user for the same
reason.
022 local,rmem fs/namei.c modify getname to use deputy_strncpy_from_user to get the filename
BUG! this function is supposed to requested from userspace from the remote node when
canonicalize the filename passed in, openmosix_memory_away().
not just return it!
missing comment on #endif
023 local,procfs fs/proc/base.c this patch adds "files" named where, stay, and debug in a directory
named "hpc" in the /proc/$PID/ directory of each process on the
take out #ifdef around header include. local node. this makes it the 'root' of our procfs handling code. it
actually passes off the work involved to hpc/proc.c. first, we include
the hpc/hpc.h header. in the next two hunks we add entries for
PROC_TGID_OPENMOSIX, PROC_TGID_OPENMOSIX_WHERE,
PROC_TGID_OPENMOSIX_STAY, PROC_TGID_OPENMOSIX_DEBUG,
PROC_TID_OPENMOSIX, PROC_TID_OPENMOSIX_WHERE, PROC_TID_OPENMOSIX_STAY,
and PROC_TID_OPENMOSIX_DEBUG into enum pid_directory_inos. this enum
sets the inode number for each of our "files" in /proc to unique
values. next we create the "om" entry with inode number
PROC_TGID_OPENMOSIX in the tgid_base_stuff structure. we then do the
same thing again, creating an "om" entry with inode number
PROC_TID_OPENMOSIX in tid_base_stuff. next we create a pair of
structures (tgid_openmosix_stuff and tid_openmosix_stuff), containing
entries for PROC_TGID_OPENMOSIX_WHERE/"where",
PROC_TID_OPENMOSIX_WHERE/"where", PROC_TGID_OPENMOSIX_STAY/"stay",
PROC_TID_OPENMOSIX_STAY/"stay", PROC_TGID_OPENMOSIX_DEBUG/"debug", and
PROC_TID_OPENMOSIX_DEBUG/"debug". these entries declare the contents of
our /proc/$PID/om/ directory. in our next hunk, we add
proc_pid_openmosix_read and proc_pid_openmosix_write functions, and a
file_operations structure named proc_pid_openmosix_operations
mapping .read and .write functions to the proc_pid_openmosix_read and
proc_pid_openmosix_write we just declared. in proc_pid_openmosix_read,
we first truncate a read request to PAGE_SIZE, and request a free page
in GFP_KERNEL. if __get_free_page fails, we return -ENOMEM. otherwise,
we then call openmosix_proc_pid_getattr from hpc/proc.c, which does the
work of dispatching our read request to the right function. it fills in
the page we allocated, and returns the number of characters written to
the page. if the length is less than zero, this indicates an error. we
respond to this error by freeing our page, and returning the error
value. assuming no error occurred, we check to see if the user requested
data beyond the end of what we 'read'. if they did, we free our
requested page, and return 0. otherwise, we take the seek value (ppos),
and apply it to our page. we then copy_to_user the contents of our page
(past the seek) to the passed in userspace buf, free our page, and
return the number of bytes copy_to_user'd. in proc_pid_openmosix_write,
reverse the order of these. fail first! we first truncate our write request to PAGE_SIZE, then check to see if
the user requested a 'partial write'. if they did, we return -EINVAL.
we then get a free page in GFP_USER. if that fails, we return -ENOMEM.
otherwise, we copy the data in the passed in buf into our new page. if
our copy_from_user fails, we free our page, and return -EFAULT.
otherwise, we call openmosix_proc_pid_setattr with our page, free said
page, and return the length returned (even if it's a negative value, e.g.
an error message from openmosix_proc_pid_setattr). we define a
file_operations structure (proc_pid_openmosix_operations), pointing its
.read and .write members to the two functions we just declared.
we then forward declare proc_tid_openmosix_operations,
proc_tid_openmosix_inode_operations, proc_tgid_openmosix_operations,
and proc_tgid_openmosix_inode_operations structures, which we'll define
later in this patch. next we add a pair of cases to the large switch in
proc_pident_lookup that map all eight of our unique identifiers defined
at the beginning of this file to their respective file_operations
structures, and inode operations structures in the case of the
containing directories. this allows the proc system to find the
structures containing the function pointers to handle our requests.
in our last hunk, we provide implementations for the functions
proc_tgid_openmosix_readdir and proc_tid_openmosix_readdir
that return the result of calling proc_pident_readdir against our
tgid_openmosix_stuff and tid_openmosix_stuff structures.
we then define the proc_tgid_openmosix_operations and
proc_tid_openmosix_operations structures, mapping .read to
generic_read_dir and .readdir to our proc_tgid_openmosix_readdir or
proc_tid_openmosix_readdir declared above. we then define the functions
proc_tgid_openmosix_lookup and proc_tid_openmosix_lookup, which return
the result of calling proc_pident_lookup with our tgid_openmosix_stuff
or tid_openmosix_stuff structures. finally, we define our
proc_tgid_openmosix_inode_operations and
proc_tid_openmosix_inode_operations structures, mapping .lookup to
proc_tgid_openmosix_lookup or proc_tid_openmosix_lookup.
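a minimal userspace sketch of the read path described for proc_pid_openmosix_read in this hunk. malloc stands in for __get_free_page, memcpy for copy_to_user, and demo_getattr for openmosix_proc_pid_getattr; all demo_ names and the "home\n" contents are invented. clamping the copy to the remaining data and advancing the position afterwards are assumptions, since the text does not state either.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/* Invented attribute source: fills the page, returns length or -errno. */
static long demo_getattr(char *page, size_t size)
{
    const char *text = "home\n";
    size_t len = strlen(text);

    if (len > size)
        return -EINVAL;
    memcpy(page, text, len);
    return (long)len;
}

static long demo_proc_read(char *buf, size_t count, long *ppos)
{
    char *page;
    long length;

    if (count > DEMO_PAGE_SIZE)          /* truncate the request */
        count = DEMO_PAGE_SIZE;

    page = malloc(DEMO_PAGE_SIZE);       /* stands in for __get_free_page */
    if (!page)
        return -ENOMEM;

    length = demo_getattr(page, DEMO_PAGE_SIZE);
    if (length < 0) {                    /* error: free and pass it up */
        free(page);
        return length;
    }
    if (*ppos >= length) {               /* reading past the end */
        free(page);
        return 0;
    }
    if ((size_t)(length - *ppos) < count)   /* assumed: clamp to what is left */
        count = (size_t)(length - *ppos);

    memcpy(buf, page + *ppos, count);    /* stands in for copy_to_user */
    free(page);
    *ppos += (long)count;                /* assumed: advance the position */
    return (long)count;
}
```

every exit path frees the page exactly once, which is the property the description above is careful to spell out.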
024 local,procfs fs/proc/root.c this patch changes proc_root_init to call openmosix_proc_init. first,
remove #ifdef around include we include our hpc/hpc.h header. then, we add our call to
openmosix_proc_init (from hpc/proc.c) into proc_root_init.
025 ksocket,local, fs/select.c this patch exports the do_select function, so that hpc/kcomd.c can use
remote it to check for data on our pile of incoming sockets. this function is
only exported if CONFIG_KCOMD is selected.
026 i386, local, creates hpc/arch-i386.c this file contains functions converting between two different but http://arch.ece.uic.edu/~yxshi/param/web/homepage/research/doc/reference/vc130.htm
remote, this file should be broken up. compatible x87 floating point state formats, for sending and receiving
archmig, architecture specific sections of a given task's state, a function for
syscall starting a new guest process, and support functions for handling
syscall requests from entry.S. first, we forward declare the functions
twd_fxsr_to_i387 and twd_i387_to_fxsr from arch/i386/kernel/i387.c.
we utilize them in creating fxsave_to_fsave and fsave_to_fxsave
make the order of operations in these functions. fxsave_to_fsave is called by the later declared
two functions identical. arch_mig_receive_fp to convert from fxsave to fsave format. we start
by copying the contents of the cwd, swd, fip, fcs, foo, and fos fields
of the from union to the to union. we then use twd_fxsr_to_i387() to
fill in our twd member. we then save padding[0] to our fop member, and
padding[1] to our mxcsr member. next we perform a memcopy loop to copy
and convert the st_space member. this member contains fields that are
16 bytes long in fxsave format, and 10 bytes long in fsave format. we
loop through the fields, copying only the first 10 bytes to our to's
st_space. finally, we memcopy the xmm_space member. fsave_to_fxsave is
also called by arch_mig_receive_fp, to convert from fsave to fxsave
format. we start by copying the contents of the cwd, swd, fip, fcs,
foo, and fos to the 'to' union, from the 'from' union. we use
twd_i387_to_fxsr() to fill in our twd member, save padding[0] to our
fop member, and save padding[1] to our mxcsr member. after that, we
enter a loop, memcopying our 10 byte long members of st_space to 16
byte spaces. finally, we memcopy the xmm_space member. the next three
functions are for receiving architecture specific state information.
BROKEN! does not handle setting up arch_mig_receive_specific is called by mig_do_receive in
LDT entries! hpc/migrecv.c. its purpose is to receive the architecture specific part
of a process. the one in this file has code to warn us that we're not
setting up the LDT correctly, and still returns success. if it's asked
to setup anything else, we return -1. arch_mig_receive_proc_context is
called at the top of mig_do_receive_proc_context, from hpc/migrecv.c.
its function is to set up the CPU state from the passed in omp_mig_task
structure. we start by getting the pt_regs structure of the task we're
check failure in this function! setting up with ARCH_TASK_GET_USER_REGS. we then overwrite it with
omp_mig_task's regs member. we overwrite our task's thread.debugreg
with arch.debugreg from omp_mig_task, as well as overwriting thread.fs
and thread.gs with arch.fs and arch.gs (setting up our segmentation
registers). we then copy the contents of the tls_array structure, which http://lwn.net/Articles/5851/
contains the 'thread local space' segment offsets. this function always
returns 0. arch_mig_receive_fp is called by mig_do_receive_fp, from
hpc/migrecv.c. its function is to set up the FPU state from the passed
in omp_mig_fp structure. we start by calling unlazy_fpu, to initialize
the FPU, then we check whether the current CPU has fxsr support,
and whether the remote CPU has fxsr support. if they both do,
or if they both don't, that means the floating point save is in the
same format, so we just memcpy the state from the omp_mig_fp struct to
the task's thread.i387 structure. otherwise, we call one of the above
two conversion functions (fxsave_to_fsave, or fsave_to_fxsave) to
perform the copy, while translating the formats. the next two functions
are called by mig_do_send in hpc/migsend.c, before and after doing the
actual work of sending a task to another node (home or remote).
arch_mig_send_pre clears the LDT if there is one set for this process,
and arch_mig_send_post loads the LDT back up, if there is one.
the next three functions are the send side, to match the three
arch_mig_receive functions earlier. all three of these functions are
called from mig_do_send, in hpc/migsend.c. arch_mig_send_specific is a
STUB! stub that looks like it was supposed to send the LDT, but instead
prints a warning if an LDT is being used. arch_mig_send_fp is called
to send the FPU state. we call unlazy_fpu, then fill in the fp
state(along with the fxsr flag). arch_mig_send_proc_context is called
to send the CPU context of a task. in it, we store the user registers,
segmentation registers (FS and GS), the thread local space entries, and
the debugreg registers to the passed in struct omp_mig_task. in
addition, if this task is marked DDEPUTY (meaning we're on the home
node), we also send the features of the boot CPU. arch_kickstart is the
function called to start up a newly "created" task. in it, we set up
debug registers 0-3, 6, and 7 with set_debugreg(). we intentionally
omit registers 4 and 5 due to them being just aliases for 6 and 7. we
use load_TLS to set up the thread local spaces, and use loadsegment to
do we need to flush pending signals? load our FS and GS registers. we set CS to __USER_CS, flush pending
signals, and execute an assembly fragment that causes us to immediately
jump to the ret_from_kickstart entry point in entry.S. at this point
split this back off. there's a break in the file, like this section used to be another file.
we include some headers, then define three functions that are part of
our syscall handling subsystem. arch_exec_syscall is called by
deputy_do_syscall, to call a requested syscall on behalf of a remote
process. we use OMDEBUG_SYS to print a tracing message, look up the
requested syscall in the sys_call_table, and return the result of
calling it (through a function pointer) with the passed in arguments.
these functions belong in the same the next two functions are called via the remote_sys_call_table in
place as user_thread! arch/i386/kernel/omasm.h, by guest processes. om_sys_fork is called by
a guest process, trying to fork. we wrap remote_do_fork, passing it a
clone_flag of SIGCHLD, and null arguments for parent and child thread
pointers. om_sys_clone performs similarly, first checking for a new
stack pointer in ecx. if there isn't one, we re-use the current task's
stack pointer. we accept the clone_flags in register ebx, the parent
tidptr in edx, and the child tidptr in edi. we pass all of this to the
same remote_do_fork as the previous function.
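a minimal sketch of the st_space conversion loops described for fxsave_to_fsave and fsave_to_fxsave in this hunk, plus the "same format or not" test from arch_mig_receive_fp. the function names and constants are hypothetical, the buffers are plain byte arrays rather than the kernel's unions, and zeroing the 6 padding bytes in the 16 byte slots is an assumption the text does not state.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NUM_ST_REGS 8    /* eight x87 stack registers */
#define DEMO_FXSAVE_SLOT 16   /* bytes per register in fxsave format */
#define DEMO_FSAVE_SLOT  10   /* bytes per register in fsave format */

/* 16 byte slots -> 10 byte slots: keep only the first 10 bytes of each. */
static void demo_st_fxsave_to_fsave(const unsigned char *from,
                                    unsigned char *to)
{
    for (int i = 0; i < DEMO_NUM_ST_REGS; i++)
        memcpy(to + i * DEMO_FSAVE_SLOT,
               from + i * DEMO_FXSAVE_SLOT,
               DEMO_FSAVE_SLOT);
}

/* 10 byte slots -> 16 byte slots: copy 10 bytes, zero the padding. */
static void demo_st_fsave_to_fxsave(const unsigned char *from,
                                    unsigned char *to)
{
    for (int i = 0; i < DEMO_NUM_ST_REGS; i++) {
        unsigned char *slot = to + i * DEMO_FXSAVE_SLOT;

        memcpy(slot, from + i * DEMO_FSAVE_SLOT, DEMO_FSAVE_SLOT);
        memset(slot + DEMO_FSAVE_SLOT, 0,
               DEMO_FXSAVE_SLOT - DEMO_FSAVE_SLOT);
    }
}

/* Conversion is needed only when exactly one side has fxsr support. */
static int demo_needs_conversion(int local_has_fxsr, int remote_has_fxsr)
{
    return !local_has_fxsr != !remote_has_fxsr;
}
```

a round trip from fsave to fxsave and back is lossless, since the wider slots only add padding.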
027 ppc, local, creates hpc/arch-ppc.c this patch is very similar to the previous patch, but cleaner.
remote, arch_mig_receive_specific just returns 0. its called by mig_do_receive
arch_mig, from hpc/migrecv.c. its purpose is to receive the architecture specific
syscall part of a process, which apparently the PPC doesn't have.
arch_mig_receive_proc_context is called at the top of
mig_do_receive_proc_context, from hpc/migrecv.c. its purpose is to set
the user registers of the current task to the contents of the passed in
omp_mig_task structure. in it, we simply use ARCH_TASK_GET_USER_REGS to
check return of memcpy! retrieve the registers in question, then memcpy over them from our
passed in structure. we always return 0. arch_mig_receive_fp is called
by mig_do_receive_fp, from hpc/migrecv.c. its function is to set up the
current task's FPU state to the one passed in the omp_mig_fp structure.
in it, we memcopy the floating point registers from the passed in
FIXME: fpscr_pad not needed? structure over the task->thread->fpr, and copy the fpscr and fpscr_pad
as well. arch_mig_send_pre and arch_mig_send_post are void no-ops.
their purpose is to make a process "ready to be migrated" while we're
pulling the process apart, which apparently doesn't need to be done on PPC.
they're called at the beginning and end of mig_do_send in hpc/migsend.c,
respectively. arch_mig_send_specific is also a no-op, as the PPC has no
architecture specific "parts" of a process. in it, we just return 0.
arch_mig_send_fp is called by mig_do_send to fill the passed in
omp_mig_fp structure with the floating point state of the current
check this memcpy! task. in it, we memcopy the task->thread->fpr structure into the
FIXME: fpscr_pad not needed? omp_mig_fp, and set the fpscr and fpscr_pad members as well. we always
return 0. arch_mig_send_proc_context is called by mig_do_send to fill
in the passed in omp_mig_task structure with the CPU state of the
current task. in it, we use ARCH_TASK_GET_USER_REGS to get the pt_regs
check this memcpy! structure, and just memcpy it into our omp_mig_task structure. we
return 0. arch_kickstart is called by mig_handle_migration to start a
guest process for the first time. to accomplish this, we get the user
what are we doing with mr 1, or the registers, and branch to ret_from_kickstart, passing our user registers
user registers? as input. arch_exec_syscall is called by deputy_do_syscall to call a
requested syscall on behalf of a remote process, returning its result.
we look up the requested syscall in the sys_call_table, and return the
result of calling it.
028 x86_64 creates hpc/arch-x86_64.c this patch is very similar to the previous two patches.
arch_mig_receive_specific is called by mig_do_receive from
hpc/migrecv.c, to receive the architecture specific parts of a process.
STUB! ldt handling? this function is a stub, returning 0. arch_mig_receive_proc_context is
called at the top of hpc/migrecv.c's mig_do_receive_proc_context. its
function is to set up the CPU state, using the passed in omp_mig_task
structure as input. in it, we start by using ARCH_TASK_GET_USER_REGS to
retrieve our task's pt_regs structure, which we overwrite with the regs
member from omp_mig_task, using memcpy. we then copy our segment
pointers (ds, es, fs, gs), our index registers (fsindex, gsindex), and
our userspace stack pointer (userrsp) from omp_mig_task's arch member
this is way different from x86. to our task structure's thread member. we call write_pda to associate
check differences! our new stack pointer with our task, and return 0. arch_mig_receive_fp
is called by hpc/migrecv.c's mig_do_receive_fp. its function is to
set up the FPU state, using the passed in omp_mig_fp structure as
bad comment. not all amd64 is opteron. input. in it, we call unlazy_fpu, then memcopy our omp_mig_fp's data
member over the task's thread.i387 structure. arch_mig_send_pre
is called at the top of hpc/migsend.c's mig_do_send. its function is to
prepare a task for migration. in it, we clear the LDT, if one is
present. arch_mig_send_post is called at the bottom of hpc/migsend.c's
mig_do_send, after migration of a process is complete. it restores the
LDT, if we cleared it earlier. arch_mig_send_specific is called by
hpc/migsend.c's mig_do_send, to send the architecture specific section
FIXME: send the LDT of a process. it should be sending the LDT, but instead is a stub,
FIXME: ordering. CPU, FPU, all else. returning 0. arch_mig_send_fp is called by hpc/migsend.c's mig_do_send
to store the passed in task's FPU state in the passed in omp_mig_fp
structure. in this function, we unlazy_fpu, then memcopy the task's
check this memcpy. thread.i387 structure into the omp_mig_fp. we return 0.
arch_mig_send_proc_context is called by hpc/migsend.c's mig_do_send to
store the passed in task's CPU state in the passed in omp_mig_task.
we acquire a pointer to the user registers using
check this memcpy. ARCH_TASK_GET_USER_REGS, and memcopy them in to the regs member of the
no thread locals in recv function! passed in omp_mig_task structure. we then copy all the thread local
spaces to the omp_mig_task's arch.tls_array. we copy the segmentation
and index registers (ds, es, fs, gs, fsindex, and gsindex), then
retrieve our stack pointer using read_pda(oldrsp), copying it as well.
this function returns 0, indicating success.
arch_kickstart is called by mig_handle_migration, in hpc/migrecv.c.
its function is to jump from our kernel code to the user space code of
a guest process for the first time. to accomplish this, first we use
load_TLS is commented out! bug! set_debugreg to load the debugging registers (0-3, 6, and 7). then,
we load the segmentation registers, using loadsegment for ds, es, and
fs, and use load_gs_index for the gs segment register. we set the
task's cs to __USER_CS, and its ds to __USER_DS, and set_fs(USER_DS).
should we be flushing? we then flush pending signals, and jmp to ret_from_kickstart. this
function by definition never returns. arch_exec_syscall is called from
deputy_do_syscall in hpc/deputy.c. its function is to call a given
syscall, returning the results the syscall returned. in this
implementation, we create inline assembly functions to perform the
call. om_sys_fork is called via the remote process table, by a guest
process trying to fork. we wrap remote_do_fork from hpc/remote.c,
passing it a clone_flag of SIGCHLD, and null arguments for parent and
child thread pointers. finally, we define five functions as stubs,
printk! which printk a 'not implemented' message and return. each of these
not implemented? #warning. functions is called by the remote_syscall_table. they are: om_sys_iopl,
om_sys_vfork, om_sys_clone, om_sys_rt_sigsuspend, and
om_sys_signalstack. they return -1.
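a minimal sketch of the debug register loading that arch_kickstart performs, modeled with plain arrays instead of set_debugreg(). the function name and the array model are hypothetical; the register list (0-3, 6, and 7, skipping 4 and 5) is the one given in the descriptions of hunks 026 and 028.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_DEBUGREGS 8

/* Copy saved debug registers into the "hardware" array, skipping
 * registers 4 and 5. Returns how many registers were loaded. */
static int demo_load_debugregs(const unsigned long *saved, unsigned long *hw)
{
    static const int loaded_regs[] = { 0, 1, 2, 3, 6, 7 };
    size_t n = sizeof(loaded_regs) / sizeof(loaded_regs[0]);

    for (size_t i = 0; i < n; i++)
        hw[loaded_regs[i]] = saved[loaded_regs[i]];
    return (int)n;
}
```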
029 kcom, local, creates hpc/comm.c this is the kernel-to-kernel communication system. in it, we use TCP/IP
remote sockets to pass migration related information between kernels.
first we define POLLIN_SET to be the set of events we want poll to tell
us about on a given socket, during comm_peek and comm_wait's invocation
of comm_poll. we then define three timeout variables
dead code! (conn_remote_timeo, comm_connect_timeo, and comm_reconn_timeo), which
are initialized from values #defined in hpc/comm.h, and are used
nowhere. comm_shutdown is called in case of an error on recv()ing data
in comm_recv. its a wrapper, which safely calls sock->ops->shutdown,
making sure sock and sock->ops are non-null. comm_getname is a wrapper
to safely call sock->ops->getname. we return -1 in case sock->ops is
null, sock->ops->getname is null, or if getname returns null.
otherwise, we return the size of the passed in sockaddr.
dead code! comm_data_ready is a wrapper which calls wake_up_interruptible to wake
all the tasks in the passed in socket's sleeping task queue. it is
called by no-one, and is in fact dead code. comm_setup_tcp is called by
the later defined comm_accept, to set up the options on the passed in
who else does this? connection. first we save our current user space address space
selector, and set ourselves to use the kernel address space. then, we http://mail.nl.linux.org/kernelnewbies/2001-11/msg00204.html
use sock_setsockopt to set SO_KEEPALIVE on the passed in socket, then
FIXME? old code. use sock->ops->setsockopt to set TCP_KEEPINTVL, TCP_KEEPCNT,
TCP_KEEPIDLE, and TCP_NODELAY. finally, we restore our original http://www-128.ibm.com/developerworks/linux/library/l-hisock.html
address space limit, and exit with 0 if everything was successful. if
any of our setsockopt functions return non-zero, we restore our
original address space limit, and return the error value in question.
this function is useless. comm_socket is called by comm_setup_listen, and comm_setup_connect.
its function is to be a wrapper around sock_create, returning NULL in
case of an error, instead of the error sock_create would normally
return. comm_bind is called by comm_setup_listen, to bind our socket to
a given address and port. it is a wrapper around sock->ops->bind. in
printk! the event of an -EADDRINUSE error, we log a message using printk. in
any event, we return the value sock->ops->bind returned to us.
comm_listen is called by comm_setup_listen to start listening to a
passed in socket. it's a wrapper around sock->ops->listen, returning the
result given to us. comm_connect is called by comm_setup_connect to
connect to a remote kernel at the passed in address, via the passed
socket. in it, we start by creating a waitqueue entry for the current
process. we then check to see if we were passed a timeout value. if we
were, we use that timeout when trying to connect() to a remote machine,
with an asynchronous request. otherwise, we use MAX_SCHEDULE_TIMEOUT. we
then insert our waitqueue entry into the passed in socket's sk_sleep
waitqueue. we enter a while loop, waiting for sock->state to be
SS_CONNECTED. in this loop, we mark ourselves TASK_INTERRUPTIBLE, then
request another asynchronous connect. if we get an error that's not
-EALREADY, we break out of our loop, as it means something went wrong
with the socket. otherwise, we then schedule_timeout for a maximum of
the remainder of our timeo, storing the remainder that schedule_timeout
returns in timeo. if timeo runs out, we set error to -EAGAIN, and break
out. at the bottom of the loop, the while checks to see if the socket
connected, and continues on if it did. after the loop, we first remove
ourselves from the socket's waitqueue, and set our state to TASK_RUNNING.
we then check to see if error is non-zero. if it is, then either we had
a problem with the socket, or we ran out of time. we handle this by
OMBUGing a message indicating connection failed, and returning the
error value in error. we then do another check on the socket, to make
sure it doesn't have an error on it already. if it does, we OMBUG about
it, and return the error. finally, since no errors occurred, and the
socket is connected, we return 0 indicating success. comm_close is a
wrapper around sock_release, called by comm_setup_listen,
comm_setup_connect, and hpc/migctrl.c's task_local_send in cases of
failure, hpc/migctrl.c's task_remote_expel when we're done expelling a
process, task_local_bring when we're done receiving a process back
home, and hpc/migrecv.c's openmosix_mig_daemon when we fail to accept
an incoming process, or our user_thread() returns. comm_peek is called
by hpc/kernel.c's remote_pre_usermode to see if data is waiting on the
passed in socket. it returns 1 for data waiting, 0 if not. comm_poll is
sighfile is used here, but earlier we called by comm_wait and comm_accept to wait on an "event" to occur on a
used NULL. which is correct? passed in socket. in it, the first thing we do is create an empty file
pointer, for later use while polling. we then create a waitqueue entry
for the current process. we check to see if a timeout value was passed
in, and if one was, we use it. otherwise, we use MAX_SCHEDULE_TIMEOUT.
we add our wait_queue entry into the passed in socket's waitqueue, and
enter a loop, with no termination clause. in this loop, we first set our
current state to TASK_INTERRUPTIBLE, then get the result of polling our
socket, supplying sighfile as a placeholder. we check to see if any of
the poll events in our poll mask are true, and if interruptable is set,
we check to see if we have a pending signal, or a pending openmosix
data request. if any of these things have occurred, we exit our loop.
otherwise, we use schedule_timeout to give up the CPU until we're
interrupted, or our timeout has expired. when schedule_timeout returns,
we check to see if our timeout has expired, and if it has, we exit our
loop. otherwise, we loop back to the top of our loop. after the loop,
we remove ourselves from the socket's waitqueue, and set our state to
TASK_RUNNING. if the last poll result we received from poll has an
event that is in our passed in poll mask, we return 1. otherwise, we
comm_wait should be a define? return 0. comm_wait is called by hpc/deputy.c's deputy_main_loop, to
check for communication on a given socket. it's a wrapper around
comm_poll, filling in some default parameters. it asks comm_poll to
check against POLLIN_SET as a poll mask, look for interruptible events,
and use the default timeout value. for a return value, we return what
comm_poll returns to us. comm_accept is called by hpc/migrecv.c's
openmosix_mig_daemon, to accept an incoming connection on a given
socket. in it, we receive a passed in socket that's been connected to,
create a new socket, and set its type and ops members the same as the
what error are we looking for here? socket passed in. we then make sure there's no data to be read on the
passed in socket with comm_poll. after that, we call sock->ops->accept
to accept the connection attempting to attach to the passed in socket,
on the newly created socket. we then use comm_setup_tcp to set the
connection options on the new socket. if that succeeds, we set the
passed in target socket pointer to our newly created socket, and return
0. if comm_poll finds data on our passed in socket, we return -EAGAIN,
set our passed in target socket pointer to NULL, and destroy our newly
created socket. if either accept() or comm_setup_tcp() fails, we return
the error they return to us, set our passed in target socket pointer to
NULL, and destroy our newly created socket. comm_dorecv is called by
s/lenght/length comm_recv to do the actual work of reading a given amount of data from
the passed in socket. we start by entering a do...while loop, waiting
for the count of bytes left to be received to hit zero. we initialize
that count with the passed in length of data to be received. in this
loop, we call sock_recvmsg, to receive a number of bytes from the
passed in socket. we check the value returned to see if it's an error,
or a count of bytes returned. if it's an error, first we check to see if
the error was -EFAULT. if it is, we update the passed in message
structure's msg_iovlen and msg_iov members, indicating how much of the
request was filled, and return the number of bytes received. if the
error was any other error, we return the error value in question. if
sock_recvmsg returns 0, we return -EPIPE. otherwise, sock_recvmsg was
successful, and returns a number of bytes received. we update our
variable containing the number of bytes still left to be read, and
check to see if we have read the entire number of bytes we are
supposed to read. if we have, we update the passed in message
this if inside the loop should be structure's msg_iov and msg_iovlen members, and return the number of
outside! as well, the for and while in bytes read. comm_recv is called to receive a given amount of data from
the loop should be made similar in comm_dorecv. we BUG_ON() if the socket pointed to is null, or if the
structure. amount of data requested is greater than the size of a single page of
memory. we create a msghdr structure to use in the request, and set its
iov_base member to point to the passed in write buffer pointer, and set
the iov_len to the length of data we're requesting. we then use get_fs http://kerneltrap.org/mailarchive/linux-kernel-newbies/2007/10/25/355049
and set_fs to allow us to call a system call from the kernel. we call
comm_dorecv to actually talk to the socket in question, then check its
result. if comm_dorecv doesn't return the right amount of data read, we
OMBUG() about it, then call comm_shutdown(link), returning -EFAULT
after restoring our fs. otherwise, we restore our fs, and return the
length of the data read. comm_send uses the same fs manipulation trick
as above to allow us to call syscalls from kernel space, then wraps
sock_sendmsg(), printk'ing a message if we fail to send the proper
amount of data. we return the number of bytes transmitted. next is an
"openmosix specifics start here" marker in the comments. set_our_addr
is called by hpc/migrecv.c's openmosix_mig_daemon to set the passed in
sockaddr structure's sin_family, sin_addr, and sin_port members. for
IPV4 sockets, we set family to AF_INET, address to INADDR_ANY, and port
no IPV6 setup? to the passed in port. for IPV6 sockets, we do nothing.
comm_setup_listen is called by hpc/migrecv.c's openmosix_mig_daemon,
and hpc/remote.c's remote_do_fork. its purpose is to set up a listening
socket, ready for a remote kernel to connect to. in it, we call
comm_socket to create a socket of the same type as the socket passed
in, then call comm_bind to bind the socket to the requested address and
port. finally, we call comm_listen to start listening on that socket, and
return the socket. if comm_socket fails, we return NULL. otherwise, if
comm_bind or comm_listen fails, we destroy our socket, and return the
error returned to us by the function that failed. comm_setup_connect is
called by hpc/deputy.c's deputy_do_fork, and hpc/migctrl.c's
task_local_send. its purpose is to open a socket to a remote kernel. in
it, we use comm_socket to create a socket, then use comm_connect to
connect to a listening socket on the target machine. we return the
connected socket. if comm_socket fails, we return NULL. if comm_connect
return comm_connect's error? fails, we destroy the socket, and return NULL. comm_send_hd is called
to send a data segment with an omp_req header attached. in the header,
we set the type and dlen to the passed in type and length,
respectively. we then use comm_send to send the header, then the data.
we return 0 on success, and if either invocation of comm_send returns an
return comm_send's errors? error, we return -1. comm_send_req sends an omp_req structure,
containing only the passed in type, no data. it returns the value
returned by comm_send.
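The receive loop described for comm_dorecv and the header-then-payload framing described for comm_send_hd can be sketched in userspace C. This is only a sketch under stated assumptions: struct sketch_req, recv_exact, send_all, and send_hd are hypothetical names, the header layout is a guess at the shape of omp_req (a type and a data length), and ordinary recv()/send() stand in for the kernel's sock_recvmsg/sock_sendmsg. Unlike the described comm_send_hd, which returns -1, this version passes the underlying error back, as the margin note suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for omp_req: a request type plus payload length. */
struct sketch_req {
    uint32_t type;
    uint32_t dlen;
};

/* Read exactly len bytes. A read of 0 bytes (peer closed) becomes -EPIPE,
 * matching the behaviour described for comm_dorecv. */
static ssize_t recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t left = len;

    assert(buf != NULL);            /* plays the role of BUG_ON() */
    while (left > 0) {
        ssize_t n = recv(fd, p, left, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before any data: retry */
            return -errno;
        }
        if (n == 0)
            return -EPIPE;
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}

/* Write exactly len bytes, looping over short writes. */
static ssize_t send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = send(fd, p, left, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}

/* Send the header, then the payload. Returns 0 on success, or the
 * negative errno of whichever send failed. */
static int send_hd(int fd, uint32_t type, const void *data, uint32_t dlen)
{
    struct sketch_req hd = { .type = type, .dlen = dlen };
    ssize_t ret = send_all(fd, &hd, sizeof(hd));

    if (ret < 0)
        return (int)ret;
    if (dlen > 0) {
        ret = send_all(fd, data, dlen);
        if (ret < 0)
            return (int)ret;
    }
    return 0;
}
```

A receiver would call recv_exact once for the header, then once more for hd.dlen bytes of payload. Keeping the "exactly N bytes or an error" contract in one helper lets callers treat a short read as a failure, which is what comm_recv does when it calls comm_shutdown.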
030 rmem, local, creates hpc/copyuser.c this file contains routines for moving chunks of memory over an
remote established connection. it's broken into two parts, deputy_* functions,
and remote_* functions. deputy_ functions are run on the home node, and
remote_ functions are run on the node a process has been migrated to.
deputy_copy_from_user is called by include/asm-i386/uaccess.h's
__copy_from_user_inatomic, include/asm-x86_64/uaccess.h's
__copy_from_user, and arch/x86_64/lib/copy_user.S's copy_from_user. its
purpose is to read a given memory segment from the remote node. in it,
BUG! use of in_atomic is flatout wrong! we use in_atomic to detect if we're in an atomic section of the kernel, http://lwn.net/Articles/274695/
and if we are, we return the length of the memory area requested as an
error. otherwise, we fill in an omp_usercopy_req structure with the
passed in from address and passed in length, then we use
OMDEBUG_CPYUSER() is being used in the OMDEBUG_CPYUSER to log a message about the source, destination, and
deputy code to printk with a unique length. we use comm_send_hd to send said omp_usercopy_req to the remote
format? node, then use comm_recv to receive the results directly to the passed
in destination. we return 0 indicating success. if comm_send_hd or
separate OMBUG invocations! comm_recv returns an error, we OMBUG the error value, and return -1. we
export a symbol pointing to this function using EXPORT_SYMBOL().
make this function and the previous deputy_strncpy_from_user is called by arch/x86_64/lib/usercopy.c's
have similar orders of operation. __strncpy_from_user and strncpy_from_user, as well as fs/namei.c's
getname. its function is virtually identical to the previous function,
and reads a given memory segment from the remote node. in it, we use
OMDEBUG_CPYUSER to log a message containing the source, destination, and
length of the requested segment. we then fill in an omp_usercopy_req
structure with the passed in source address, and passed in length. we
use comm_send_hd to send the omp_usercopy_req to the remote node, then
use comm_recv to receive the results directly into the passed in
destination. we return 0 indicating success. if comm_send_hd or
separate OMBUG invocations! comm_recv returns an error, we OMBUG the error value, and return -1.
make this function and the previous deputy_copy_to_user is called by include/asm-i386/uaccess.h's
two have similar orders of operation. __copy_to_user_inatomic, include/asm-x86_64/uaccess.h's __copy_to_user,
and arch/x86_64/lib/copy_user.S's copy_to_user. its purpose is to
write a passed in memory segment to the remote node. in this function,
use of in_atomic is incorrect! we first use in_atomic to check if we are in an atomic state, and if we
are, we return the number of bytes requested to be written, indicating
failure. if we're not in atomic state, we use OMDEBUG_CPYUSER to log a
message about the source, destination, and length. then we fill in an
omp_usercopy_req structure with the passed in target address, and
passed in length. we use comm_send_hd to send the omp_usercopy_req
structure to the remote node, then use comm_send to send the actual
data to be written. we return 0 indicating success. if comm_send_hd or
separate OMBUG invocations! comm_send returns an error, we OMBUG the error value, and return -1.
deputy_copy_to_user is exported as a symbol with EXPORT_SYMBOL().
make this function and the previous deputy_strnlen_user is called by arch/i386/lib/usercopy.c's
three have similar orders of operation. strnlen_user, and arch/x86_64/lib/usercopy.c's __strnlen_user and
strlen_user. its purpose is to strnlen a string on the remote node. to
accomplish this, we first OMDEBUG_CPYUSER a message containing the
address and maximum length of the request. we then fill in an
omp_usercopy_req structure containing the passed in address and length.
we send this structure to the remote node using comm_send_hd, then
use comm_recv to get the result into a temporary variable, then return
separate OMBUG invocations! the result. if comm_send_hd or comm_recv returns an error, we OMBUG the
BUG: 0 is a valid return result! value returned, and return 0, indicating failure. its symbol is
EXPORT_SYMBOL'd. deputy_put_userX is called by the below
deputy_put_user and deputy_put_user64 functions. its purpose is to
write a value of 64bits or less to a given location on the remote node.
to accomplish this, we start by OMDEBUG_CPYUSER'ing saying what value
we're placing in what position of what length. then, we fill in an
omp_usercopy_emb structure with the passed in address, length, and
value. we then use comm_send_hd to send the value to the remote node,
and return 0 in case of success. if comm_send_hd fails, we return
-EFAULT. deputy_put_user is called by include/asm-i386/uaccess.h's
put_user, deputy_put_user64_helper, __put_user_size, and
include/asm-x86_64/uaccess.h's put_user_size. its function is to put a
value to a remote node. in it, we just check to make sure the size of
the value passed in is not greater than sizeof(long). if it is, we
call BUG_ON. otherwise, we call deputy_put_userX with our arguments,
returning the value it returns. deputy_put_user's symbol is
EXPORT_SYMBOL'd. deputy_put_user64 is called by the
bad name in comment. deputy_put_user64_helper inline assembly function, from
include/asm-i386/uaccess.h. this function is only created if
BITS_PER_LONG < 64, AKA, we're on a 32 bit architecture.
in deputy_put_user64, we use deputy_put_userX to put a value of type
s64 to the remote node, returning the result returned by
deputy_put_userX. if declared, this function is EXPORT_SYMBOL'd.
deputy_get_userX is called by the following deputy_get_user and
deputy_get_user64 functions. its purpose is to get a value up to 64
bits in length from the remote node. in it, we first use
OMDEBUG_CPYUSER to log the requested address and size. we then place
these values in an omp_usercopy_req structure, and use comm_send_hd to
send it to the remote node. we comm_recv() the result, then place the
result in the passed in pointer, being careful to use the right size.
we return 0 indicating success. if comm_recv or comm_send_hd fails, we
separate OMBUG invocations! OMBUG the error returned, and return -EFAULT. deputy_get_user is called
by include/asm-i386/uaccess.h's get_user and get_user_size macros, and
include/asm-x86_64/uaccess.h's get_user macro. its purpose is to get a
value up to sizeof(long) from the remote node. in it, we BUG_ON if the
value requested is larger than sizeof(long), and call deputy_get_userX
to actually do the work. this function is EXPORT_SYMBOL'd.
BUG: shouldn't we be calling this on deputy_get_user64 is created if BITS_PER_LONG < 64. it's not called from
i386? anywhere. its purpose is to get a 64bit value from the remote node. in
it, we simply call the above deputy_get_userX function. its symbol is
EXPORT_SYMBOL'd. at this point, we start into code intended to run on
the remote node, responding to the above functions. remote_copy_user is
called from the below remote_handle_user, to handle DEP_COPY_FROM_USER
and DEP_COPY_TO_USER packets, sent from deputy_copy_from_user and
deputy_copy_to_user on the home node. in it, we first receive an
omp_usercopy_req structure from the home node, then allocate the
amount of space indicated by the omp_usercopy_req's len member, to
store the information being copied. we then check the passed in
request, to see if we're copying to or from userspace. if
we're copying from userspace, we invoke copy_from_user to copy the data
requested into our temporary buffer, then use comm_send to send the
data to the home node. if we're copying to userspace, we use comm_recv
to receive the data into our temporary buffer, then copy_to_user to
copy the data into the requested location.
remote_strncpy_from_user performs strncpy_from_user on
behalf of deputy_strncpy_from_user. it uses comm_recv to get its
target, and comm_send to return the results. its symbol is not
exported. remote_strnlen_from_user performs strnlen_user or strlen_user
on behalf of deputy_strnlen_user. it functions similarly to
remote_strncpy_from_user. its symbol is not exported. remote_put_user
will use put_user on behalf of either deputy_put_user or
deputy_put_user64, for values up to 64 bits in size. it's missing
BITS_PER_LONG logic like that in the following function. its symbol is
not exported. remote_get_user is structured similarly. it has
BITS_PER_LONG==64 logic. its symbol is not exported. finally,
we have remote_handle_user, which is the function that dispatches to the
above remote_ functions. it calls comm_recv looking for a req structure.
other than that, it's a large switch statement. we return from it when we
receive an endtype packet, returning 0. if there's an unrecognized
packet, we call remote_disappear to die.
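The "being careful to use the right size" step described for deputy_get_userX can be sketched as a store that switches on the requested width. This is a minimal sketch, not the patch's code: store_sized is a hypothetical name, the value is assumed to arrive as a 64-bit integer, and returning -EFAULT for an unsupported size is an assumption borrowed from the error value the deputy_ functions use elsewhere.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store `value` through `dst` using exactly `size` bytes (1, 2, 4 or 8).
 * Writing through a fixed-width temporary guarantees that a 1-byte
 * request never touches the neighbouring bytes of the destination. */
static int store_sized(void *dst, int64_t value, size_t size)
{
    assert(dst != NULL);            /* plays the role of BUG_ON() */

    switch (size) {
    case 1: {
        uint8_t v = (uint8_t)value;
        memcpy(dst, &v, sizeof(v));
        break;
    }
    case 2: {
        uint16_t v = (uint16_t)value;
        memcpy(dst, &v, sizeof(v));
        break;
    }
    case 4: {
        uint32_t v = (uint32_t)value;
        memcpy(dst, &v, sizeof(v));
        break;
    }
    case 8: {
        uint64_t v = (uint64_t)value;
        memcpy(dst, &v, sizeof(v));
        break;
    }
    default:
        return -EFAULT;             /* assumed error for an unsupported width */
    }
    return 0;
}
```

The same shape would apply on the remote side, where the missing BITS_PER_LONG logic noted for remote_put_user decides whether the 8-byte case is handled directly or split.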
031 omctrlfs, local creates hpc/ctrlfs.c omctrlfs is the future filesystem for performing migration and http://osdir.com/ml/linux.cluster.openmosix.devel/2006-01/msg00028.html
remote process state monitoring. this file is a stub, implementing this
filesystem type as simply as possible. we start by defining
CTRLFS_MAGIC, which is the magic string the filesystem layer will use
dead code. to recognize this FS type. we then create a vfsmount structure, which
is normally used to track our filesystem state, but is not used in this
module. next, we declare our mount count, which is used to track how
many times this filesystem has been mounted. this is only used when
de-registering the filesystem. ctrlfs_fill_super is called as a
callback function by our invocation of get_sb_single in ctrlfs_get_sb.
ctrlfs_fill_super wraps simple_fill_super(), passing it our
CTRLFS_MAGIC, and our empty list of files. ctrlfs_get_sb is called by
the kernel's filesystem layer via the ctrlfs_type variable we send to
register_filesystem. in it, we wrap get_sb_single(), telling it to use
ctrlfs_fill_super to generate our filesystem's superblock. we then
declare a file_system_type structure, mapping .get_sb to our
ctrlfs_get_sb above, and .kill_sb to a generic cleanup function.
om_ctrlfs_init is called from the kernel to initialize the module. in
it, we just call register_filesystem() with the previously defined
file_system_type structure. om_ctrlfs_exit is called prior to removing
the module. in it, we call simple_release_fs(), then
unregister_filesystem(). we then use module_init() and module_exit() to
define the init and exit points for this module, then register the
license and the author.
032 debug creates hpc/debug.c this file contains debugging assisting code. it starts with debug_mlink
which is a wrapper which printks the address of a socket.
debug_page creates a checksum of a 4096 byte page of memory, and
check incoming pointers! printks the results. debug_vmas dumps the starting address and ending
address of each vma belonging to a given mm_struct. debug_signals is a
stub, not printking anything of value.
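A page checksum in the spirit of debug_page, with the pointer check the margin note asks for, might look like the following. This is a sketch only: the text does not say which checksum debug_page computes, so the plain byte sum here is an assumption, and page_checksum and SKETCH_PAGE_SIZE are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* the 4096-byte page size named in the text */

/* Sum every byte of one page into *sum.
 * Returns 0 on success, or -1 if either pointer is NULL. */
static int page_checksum(const unsigned char *page, uint32_t *sum)
{
    uint32_t acc = 0;
    size_t i;

    if (page == NULL || sum == NULL)
        return -1;

    for (i = 0; i < SKETCH_PAGE_SIZE; i++)
        acc += page[i];

    *sum = acc;
    return 0;
}
```

Comparing the value computed before a page is sent with the value computed after it is received is enough to spot corruption in transit, which is the kind of check a debugging helper like this is useful for.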
033 debugfs,local creates hpc/debugfs.c this file contains our debugfs module. this module creates entries in
the debugging filesystem allowing userspace to see openmosix specific
debugging values. we start by defining a dentry structure, used to
contain the om/ directory we create in the debugfs. we then define four
file entries, pointing the migration, syscall, rinode, and copyuser
files to entries in the om_opts structure (defined in hpc/kernel.c),
move the declaration of om_opts to this and an array of dentry structures to contain the four files.
module? we don't seem to be using these om_debugfs_init is called by the kernel to initialize this module. In
debug values anywhere else, what is it, we call debugfs_create_dir to create the om debugfs directory, then
the use of this code? debugfs_create_u8 to create entries for our four files in this
directory. om_debugfs_exit is called by the kernel prior to unloading
this module. in it, we use debugfs_remove to destroy the entries for
the four files, and the directory itself. we then add code defining the
entry and exit points of the module, the license, and the author.
034 i386,arch-debug creates hpc/debug-i386.c this file contains architecture specific debugging code for the i386
remove one uaccess.h include. architecture. none of the functions in this file should be called in
remove one ptrace.h include. the normal functioning of openmosix. om_debug_regs printks the user
printk! register set of the passed in or current process, otherwise known as
the pt_regs structure. if no pt_regs structure is passed in, we use
ARCH_TASK_GET_USER_REGS to retrieve the pt_regs structure of the
current process, and printk it. debug_thread printks the contents of
the passed in thread_struct register. show_user_registers is
shamelessly stolen according to the comments, and does a fuller job
of dumping the state of a user process than the previously defined
om_debug_regs. in it, we printk the CPUNO, EIP and EFLAGS. we then
printk the EAX, EBX, ECX, EDX, ESI, EDI, EBP, DS, ES, and SS registers,
and the process's pid, thread_info pointer, and task pointer. we call
show_stack to dump the stack, then we dump the hex values of the 20
instructions before EIP, and the 20 instructions after EIP.
035 ppc,arch-debug creates hpc/debug-ppc.c this file contains architecture specific debugging code for the ppc
remove one uaccess.h, and one ptrace.h. architecture. none of the functions in this file should be called in
printk! the normal functions of openmosix. om_debug_regs printks the contents
of the passed in pt_regs structure, or if NULL is passed in, the
pt_regs structure of the current process. debug_thread and
show_user_registers are stubs, doing nothing and returning nothing.
036 x86_64, creates hpc/debug-x86_64.c this file contains architecture specific debugging code for the x86_64
arch-debug remove one uaccess.h, and one ptrace.h. architecture. none of the functions in this file should be called in
printk! the normal functions of openmosix. om_debug_regs printks the contents
of the passed in pt_regs structure, or if NULL is passed in, the
pt_regs structure of the current process. debug_thread and
show_user_registers are stubs, debug_thread only printking a single
line, and both returning nothing.
037 omremote creates hpc/deputy.c this file contains functions for servicing requests from a remote
process, AKA, communication to the home node, from a process that is a
guest on a remote node. first, we define deputy_die_on_communication,
rename deputy_die_on_communication to which in spite of its name is called by deputy_process_communication to
deputy_die. kill the deputy when communication with the remote node containing the
printk! remote half of the process fails. it printk's a message, then calls
do_exit(SIGKILL). It's defined NORET_TYPE indicating it will not be
returning. deputy_do_syscall is called by the later defined
deputy_process_communication when a syscall request from a remote
process is received. in it, we use comm_recv() to receive the syscall
number and address of its arguments, then OMDEBUG_SYS a debugging
message. we execute the syscall requested using arch_exec_syscall,
OMDEBUG_SYS another debugging message, then return the result of the
syscall. deputy_do_fork is called by the later defined
deputy_process_communication to perform a fork on behalf of the
remote process. in it, we use comm_recv to receive the fork request.
we then use comm_setup_connect to open up a connection to a listening
socket on the remote node, call do_fork, then use task_set_comm to
associate the new child process with the newly created connection.
we call comm_send_hd to return the results of the fork to both the
parent and child on the remote node. deputy_do_readpage is called by
the later defined deputy_process_communication when we receive a page
read request. in it, we use comm_recv to get the file pointer and
offset of the content requested. we then use task_heldfiles_find to get a
read handle on the file. we use an empty vma page to hold the results
of a page in request from the file, and send the page to the remote
end. we unmap the page, then return the result of the comm_send.
merge this into do_mmap_pgoff? deputy_do_mmap_pgoff is called by do_mmap_pgoff in mm/mmap.c to
perform the same function as do_mmap_pgoff's lower half, taking into
account the differences required by deputy processes. to do this, we
allocate memory for a vma structure from SLAB_KERNEL and zero it. we
set up a vma structure in this space to contain the definition of the
memory area we've been requested to occupy. we pass this vma to our
passed file *'s mmap f_op handler. we then add this file to our held
files for this process by calling task_heldfiles_add. there's a comment
fill in missing code! here indicating that we're supposed to insert the vma into our
current->?? (current->mm->mmap list?), but the code for that isn't
written. deputy_do_mmap is called from the later defined
deputy_process_communication, when a mmap request is received from a
remote process. in it, we use do_mmap_pgoff from mm/mmap.c to mmap a
file into the deputy, then return the mmapped region's contents to the
remote host. bprm_drop is used by the later declared __deputy_do_execve
to destroy a linux_binprm structure, which is a structure for
containing an executable program, its arguments, pages, security http://www.kernel-api.org/docs/online/1.0/da/d1e/structlinux__binprm.html
context, and mm structure. bprm_drop calls the appropriate destructors
for the members of our binprm, and calls fput() on all of the process's
still open writable files, before destroying the binprm structure
itself. __deputy_do_execve is called by deputy_do_execve, to do the
work of performing an execve when an execve request is received from a
remote process. we use search_binary_handler to perform the execve on
FIXME: free the pages our arguments are the home node. if it was successful, we have a FIXME indicating we
contained in. should be freeing the pages containing our arguments. we then free our
binprm's security context, call acct_update_integrals (to tell the
accounting system about the new process), free the bprm structure, and
"return" to the new process. otherwise, we use the previously defined
bprm_drop to clean up the failed execve attempt. deputy_setup_bprm is
called by the later defined deputy_do_execve to setup a bprm structure
suitable for execution by __deputy_do_execve. in it, we allocate
space for our bprm structure from GFP_KERNEL. we then call open_exec to
attempt to open our executable. if that succeeds, we fill in the bprm's
file, filename, interp, and mm members, using mm_alloc to fill in
mm. we use the kernel's init_new_context() to perform architecture
specific mm setup. on x86, init_new_context copies the local descriptor
table of the current process to the new process. we then copy argc and
envc, making sure neither is less than zero. we allocate a security
context, then use prepare_binprm to fill in the rest of the bprm
structure. we use copy_strings_kernel to copy our filename, argv, and
envp arrays into kernel pages, instead of user space memory. if any of
the above fails, we use bprm_drop to clean up. deputy_do_execve is
called by the later defined deputy_process_communication when an execve
request from a remote node is received. in it, we call comm_recv to
receive the requested file, argv, and envp, deputy_setup_bprm to get a
bprm structure ready to execute, then __deputy_do_execve to perform the
FIXME: empty reply? whats with this? work. we then use comm_send_hd to send back an empty reply. if any of
FIXME: multiple unique error paths! the above fails, we destroy the space used to store the request, and
return the error value in question. deputy_do_sigpending is a wrapper
around do_signal, called by deputy_process_misc. it has code for doing
FIXME: dead code! more, but it's dead/unused code. deputy_process_misc is called by
deputy_main_loop to check for pending dreqs, and dispatch them to
task_do_request. it also checks for pending signals, and dispatches
them to deputy_do_sigpending. deputy_process_communication is called by
deputy_main_loop if a process has communication. it contains a large
switch that dispatches requests to the aforementioned callees. it calls
deputy_die_on_communication if comm_recv returns an error, if the type
member of the req received is zero, or if one of the functions we call
returns negative. deputy_main_loop is the userspace loop that is
executed as the main thread of a deputy process. in it, we immediately
enter a large while loop, waiting for the process to not be DDEPUTY.
in the while loop, we call deputy_process_communication when comm_wait
returns true, then call deputy_process_misc to dispatch pending dreqs
and signals. it's my belief that this function never returns naturally.
deputy_startup is called by hpc/migctrl.c's task_local_send to exit the
old task, and enter deputy state. in deputy_startup, we use
task_set_dflags to mark this task as deputy, flush a signal that pops
FIXME: hunt down the unknown reason. up for an unknown reason, according to a fixme, and call exit_mm().
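
The control flow described for deputy_main_loop, deputy_process_communication, and deputy_process_misc can be sketched in plain userspace C. This is a minimal model, not the code from hpc/deputy.c: every sim_* name and the request type values are hypothetical, and the only details taken from the description are that a request of type zero is fatal, that the loop runs while the task is flagged as deputy, and that the misc pass follows each communication pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request types; only "0 is invalid" comes from the text. */
enum sim_req { SIM_REQ_INVALID = 0, SIM_REQ_SYSCALL = 1, SIM_REQ_BRING_HOME = 2 };

struct sim_deputy {
    bool is_deputy;   /* models the DDEPUTY flag */
    bool died;        /* set where deputy_die_on_communication would run */
    int  handled;     /* requests dispatched successfully */
    int  misc_passes; /* times the dreq/signal pass ran */
};

/* Models deputy_process_communication: one switch over the request type. */
static int sim_process_communication(struct sim_deputy *d, int type)
{
    switch (type) {
    case SIM_REQ_SYSCALL:
        d->handled++;
        return 0;
    case SIM_REQ_BRING_HOME:
        d->is_deputy = false;   /* the task stops being a deputy */
        return 0;
    default:
        return -1;              /* type 0 or unknown: caller must bail out */
    }
}

/* Models deputy_process_misc: pending dreqs and signals are handled here. */
static void sim_process_misc(struct sim_deputy *d)
{
    d->misc_passes++;
}

/* Models deputy_main_loop; returns how many requests were consumed. */
static int sim_deputy_main_loop(struct sim_deputy *d, const int *reqs, int nreqs)
{
    int next = 0;

    while (d->is_deputy) {
        if (next >= nreqs)
            break;  /* simulation only: the real loop would keep waiting */
        if (sim_process_communication(d, reqs[next++]) < 0) {
            d->died = true;
            d->is_deputy = false;
        }
        sim_process_misc(d);
    }
    return next;
}
```

Feeding the loop the sequence syscall, syscall, bring-home, syscall consumes three requests and leaves the fourth untouched, which matches the idea that the loop only ends once the task stops being a deputy.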
038 omremotefile creates hpc/files.c This file contains routines for handling file and dentry requests on
omremotedentry BREAKUP: should create hpc/dentry.c behalf of remote processes. It starts with two structure declarations.
OPT: move remote_aops inside of remote_aops is an address_space_operations structure, mapping .readpage
rdentry_create_entry to hpc/remote.c's remote_readpage, not touching any other mappings. it's
used by the later defined rdentry_create_entry(). The second structure
defined is remote_file_operations, mapping .mmap to hpc/remote.c's
remote_file_mmap, not touching any other mappings.
remote_file_operations is used by rdentry_create_entry() and
OPT: return void? rdentry_create_file(). task_heldfiles_add is called by
deputy_do_mmap_pgoff in hpc/deputy.c to create and insert an
om_held_file structure representing the passed in file into the list of
files held by the given process. in it, we allocate an om_held_file
struct with GFP_KERNEL, use get_file to increment the file's usage
OPT: remove nb member? counter, fill in the om_held_file's file and nb entries with the passed http://www.faqs.org/docs/kernel_2_4/lki-3.html
in file pointer, fill in rfile->nopage with nopage from the passed in
vm_operations_struct, and insert our om_held_file struct into
task->om.rfiles with list_add. we then return 0, since no conditions in
this function return errors. task_heldfiles_clear is called by
openmosix_task_exit to destroy the contents of the passed in held files
linked list. for each file in the list, we call fput to decrement the
file usage counter, then free the om_held_file structure.
task_heldfiles_find searches the list of files held by the passed
in task for an om_held_file whose file member matches the passed in
file pointer. in it, we use list_for_each_entry to iterate through the
list. if we find a match, we return the heldfile, otherwise, we
DEAD: Dead code. printk() an error message, and return NULL. next we have a structure
declaration that has been commented out via a #if 0. it was to declare
SPACE: break in the file? a backing_dev_info structure. after that, there's a break in the file.
we define the om_remote_dentry structure, a spinlock, and a list_head
structure for containing the om_remote_dentrys. rdentry_delete
acquires the remote_dentries spinlock, and removes the first entry in
the remote_dentries list with a dentry member that matches the passed
in dentry. If we don't find a matching entry, we call BUG(), and
return -ENOENT. rdentry_iput is called via a pointer stored in the
later defined remote_dentry_ops structure. Its function is to free a
given inode's generic_ip member (containing our rfile_inode_data
structure), then call iput to both push the inode's unsaved contents
to disk, and decrement its usage counter. the remote_dentry_ops
structure maps its .d_delete and .d_iput entries to the previous two
DEAD: Dead code. functions, but is not used anywhere in the code. Next, we declare a
super_operations structure, containing only default members. We then
use this structure to fill the .s_op member when declaring a
super_block structure, filling the super_block's .s_inodes member with
OPT: move remote_file_vfsmnt inside of a new LIST_HEAD. struct remote_file_vfsmnt is a vfsmount structure,
rdentry_create_file. containing five required list heads, and a mount count. it is declared
as its own parent. rdentry_add_entry is called by the later defined
rdentry_create_dentry to create an om_remote_dentry structure containing
the passed in dentry, and insert it into the list of remote dentries.
in it, we allocate a new om_remote_dentry with GFP_KERNEL, set the
dentry member to the passed in dentry, acquire the remote_dentries
spinlock, add the om_remote_dentry to the remote_dentries list, then
release the spinlock. If the kmalloc fails, we return -ENOMEM,
otherwise we return 0, indicating success. rdentry_create_dentry is
called by rdentry_create_file to create a new dentry pointing to the
passed in rfile_inode_data, and register it with rdentry_add_entry.
first, we create a new inode, backed by rfiles_dummy_block. we create a
duplicate of the passed in rfile_inode_data allocated with GFP_KERNEL,
and set our new inode's u.generic_ip member (the inode's private data
space) to point to our copy. our new inode's file and address
space operations are pointed to the earlier defined stubs,
remote_file_operations and remote_aops. We allocate a dentry using
d_alloc, set its inode member to our new inode, and make it its own
parent. to accomplish this, when we call d_alloc, we temporarily create
FIXME: expand gccism? a qstr structure stating that this entry is for a file named "/",
FIXME: real file name and length? with a length of 1. we pass this qstr as our argument to d_alloc. We
use rdentry_add_entry to add our newly acquired dentry to our
FIXME: error handling? remote_dentries list, and return the new dentry. the error handling in
this function seems VERY broken. If either of our alloc calls fails
(kmalloc or d_alloc), we free our passed in data(!), call iput on our
allocated inode, and return NULL. rfile_inode_get_data is a wrapper
called by rfiles_inode_get_file. it returns the given inode's
u.generic_ip (the private data space set by rdentry_create_dentry)
contents. rfiles_inode_get_file is a wrapper called by hpc/remote.c's
remote_readpage returning rfile_inode_get_data(inode)->file.
rfiles_inode_compare is a wrapper called by rdentry_find and
task_rfiles_get. it memcmps the passed inode's private data space
against the passed in rfile, returning the result. rdentry_find is
called by rdentry_create_file to look up an rdentry corresponding to the
passed in inode. in it, we grab the remote_dentries spinlock, and use
list_for_each_entry to cycle through all of the rdentries, comparing
each to rdentry->dentry->d_inode. if we find a match, we break out,
unlock the spinlock and return the dentry of the matching rdentry.
otherwise, we unlock the spinlock and return NULL, due to the last
FIXME: verify this works. dentry being NULL. rdentry_create_file is called by the later defined
task_rfiles_get to create a file pointer matching the supplied
rfile_inode_data. in it, we use get_empty_filp to create an empty file
pointer, then call dget(rdentry_find(data)) to get a dentry pointing to
the passed rfile_inode_data (if one exists). if dget fails, we call
rdentry_create_entry to create a dentry pointing to our passed
rfile_inode_data. if our rdentry_create_entry call fails, we call
put_filp to close our file pointer, and return NULL. we use
remote_file_operations and remote_file_vfsmnt structures to set our new
file pointer's f_op and f_vfsmnt members, set its f_dentry to our
dentry, and mark its mode FMODE_READ. we then return the file pointer.
if get_empty_filp fails, we return NULL. task_rfiles_get is called by
mig_do_receive_vma and remote_do_mmap to search through the process's
vma pages, and check to see if any of them have a particular file
associated with them. to do this, we construct an rfile_inode_data
structure containing our passed in origfile, node, and isize. We then
compare it against our list of rdentry files, using
rfiles_inode_compare. If rfiles_inode_compare returns true, we return
the file pointer associated to the inode in question. if not, we call
rdentry_create_file to create a new rdentry containing the passed in
file, and return the file pointer returned from rdentry_create_file.
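
The held-file bookkeeping described for task_heldfiles_add, task_heldfiles_find, and task_heldfiles_clear amounts to a reference-counted list. A minimal userspace sketch follows, assuming a singly linked list in place of the kernel's list_head and a plain integer in place of the file usage counter that get_file and fput adjust. All sim_* names are hypothetical. Unlike the function described above, this sketch reports allocation failure instead of always returning 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct sim_file { int refcount; };        /* stands in for struct file */

struct sim_held_file {                    /* stands in for om_held_file */
    struct sim_file *file;
    struct sim_held_file *next;
};

struct sim_task { struct sim_held_file *rfiles; };

/* Take a reference on f and remember it on the task's held-file list. */
static int sim_heldfiles_add(struct sim_task *t, struct sim_file *f)
{
    struct sim_held_file *h = malloc(sizeof(*h));

    if (h == NULL)
        return -ENOMEM;   /* deviation: the described function never fails */
    f->refcount++;        /* what get_file does in the kernel */
    h->file = f;
    h->next = t->rfiles;
    t->rfiles = h;
    return 0;
}

/* Return the entry holding f, or NULL when the task does not hold it. */
static struct sim_held_file *sim_heldfiles_find(struct sim_task *t,
                                                const struct sim_file *f)
{
    struct sim_held_file *h;

    for (h = t->rfiles; h != NULL; h = h->next)
        if (h->file == f)
            return h;
    return NULL;
}

/* Drop every held reference and free the list entries. */
static void sim_heldfiles_clear(struct sim_task *t)
{
    while (t->rfiles != NULL) {
        struct sim_held_file *h = t->rfiles;

        t->rfiles = h->next;
        h->file->refcount--;   /* what fput does in the kernel */
        free(h);
    }
}
```

The point of the pairing is that every increment made on add is undone on clear, so the file's usage count returns to its starting value once the task exits.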
039 kcomd creates hpc/kcomd.c This file contains the kernel-to-kernel socket communication code. This
file is set up to create a kcomd.ko kernel module. We start by defining
three socket related functions. socket_listen is called via the later
defined socket_listen_ipv4 and socket_listen_ipv6, by the later defined
kcomd_thread function, to set up the listening IPv4 and IPv6 sockets.
In it, we create a socket, call sock_map_fd to associate an fd to the
socket, bind to the socket using its sock->ops->bind(), start listening
to the socket using its sock->ops->listen(), set the passed in pointer
res to point to the newly created socket, and return the file
FIXME: set res to NULL in every fail descriptor to the newly created listening socket. If sock_create fails,
case. unique return values! we return -1. If sock_map_fd fails, we release our socket, assign NULL
to the address passed via res, and return -1. If either our bind or
listen fails, we close our fd, release our socket, assign NULL to res,
and return -1. socket_listen_ipv4 and socket_listen_ipv6 are called by
kcomd_thread to set up the correct type of listening socket. both
of these functions are thin wrappers: they each
set up their appropriate type of sockaddr structure, and call
BREAKUP: move these structures to a socket_listen, returning the result. next we define three kcom related
private header. data structures. struct kcom_pkt is designed to contain a packet
destined to a remote kernel. struct kcom_node is designed to contain a
socket, and the information regarding the node it points to. kcom_task
is the structure that contains kcomd's knowledge about a migrated
process. it contains the pid of the process in question, a kcom_node
OPT: is this list used? structure defining what node a process is on, a list of processes
communicating with this node(?), a list for containing outgoing
packets, and space for one incoming packet. we then define a spinlock
and a list_head for containing these kcom_nodes. next, we define
sockets_fds as a fd_set_bits structure. this structure is a more
scalable version of a fd_set, used by do_select. We then declare
sockets_fds_bitmap and maxfds, which are set and used by the next
function, alloc_fd_bitmap, to hold a dynamically grown array of fds.
alloc_fd_bitmap checks the passed in fdcount against the number of fds the
current sockets_fds_bitmap was created to hold, and if it's greater,
frees sockets_fds_bitmap (and its contents), and allocates a new one.
if kmalloc fails, we return ENOMEM. otherwise, we set the in, out, ex,
res_in, res_out, and res_ex members of the sockets_fds structure to
offsets of our sockets_fds_bitmap structure, and return 0.
kcom_pkt_create is called by the later defined kcom_task_send to create
a new kcom_pkt structure with the len, type, and data members
initialized to the passed in values. if kzalloc fails, we return NULL.
Otherwise, we return the properly initialized kcom_pkt structure.
__kcom_node_find is called by the later defined kcom_node_find to do
the work of finding a node in our kcom_nodes list that uses the passed
in sockaddr to communicate. We use list_for_each_entry and memcmp to
FIXME: doublecheck this return compare the address of our sock with the address of our node(!). this
FIXME: note the fixme regarding memcmp function will return NULL if it fails. kcom_node_find is not called by
DEAD: dead code. anything. it wraps __kcom_node_find, grabbing the kcom_nodes_lock
before entry, and releasing it afterward. kcom_node_add is called by
accept_connection to create a new kcom_node struct, and add it to the
DEAD: dead code. kcom_nodes list. there is code commented out regarding finding out if
the node is already in the list, but it's incomplete. kcom_node_del is
not called by anything. its purpose is to remove a node from the kcom_nodes
list that uses the passed in sockaddr. in it, we acquire the kcom_nodes
spinlock, then use __kcom_node_find to find a node structure to be
deleted. if we don't find one, we release the kcom_nodes spinlock,
and return -ENOENT. otherwise, we call list_del to remove the node from
our node list, release the spinlock, close the communicating fd,
release the socket, free the node structure's memory, and return 0.
DEAD: dead code. comm_simple is a stub that returns 0, and is not called elsewhere in
DEAD: dead code. the code. we then forward declare comm_ack, comm_iovec, and
comm_iovec_ack functions, which are not defined or called anywhere
else. accept_connection is called by the later declared kcomd_thread to
accept an incoming connection on a passed in socket. in it, we start by
allocating a new socket, and calling the accept() operation of the
passed in socket to accept a connection from the passed in socket, on
FIXME: this dead code needs to live! our new socket. there's a block of commented out code, for checking if a
node is already in our node_list, but it is unused/incomplete. we then
use sock_map_fd to get a file descriptor to this socket, add the node
the socket is communicating with to our node_list, and return our file
descriptor. If our socket allocation returns null, we return -1. If our
FIXME: unique error paths! accept or sock_map_fd have problems, we release our socket, and return
-1. if kcom_node_add fails, we close our fd, release our socket, then
return -1. we then create data_read, data_write, and dispatch stubs
that only return 0. data_read and data_write are called by
DEAD: dead code. kcomd_thread. dispatch is never called. kcom_task_create is not called
from anything. its purpose is to create a kcom_task structure for a given
kcom_node and PID, initializing the pid, node, and list members. in it,
we use kzalloc to allocate the memory with GFP_KERNEL. if the kzalloc
returns NULL, we return NULL. otherwise, we initialize the fields of
DEAD: dead code. our new structure, and return it. kcom_task_delete is not currently
called from anywhere. its purpose is to delete the first entry in the nodes
FIXME: task manipulation functions are list matching the given PID. __kcom_task_find and kcom_task_find are
missing spinlocks like the node code constructed like the above node find code, but without spinlocks.
uses. find out why. __kcom_task_find uses list_for_each_entry to cycle through our list
of nodes, using list_for_each_entry on each node's list of tasks to
find a task with the same pid as the passed in PID, and return it, or
NULL if one is not found. kcom_task_find is a wrapper around
__kcom_task_find, passing it the PID passed in, and returning the
returned result. kcom_task_send isn't called by anything. it uses
kcom_pkt_create to add a packet to the task structure belonging to the
FIXME: incomplete code! pid passed in. it has comments regarding sleeping and replying, but
instead returns 0. kcomd_thread is the function executed in kernel
space, as a kernel thread. in it, we first call daemonize to create a
"kcomd" process. we then wait for a connection on an ipv4 and an ipv6
FIXME: should this while loop exit? socket. when we receive a connection, we enter a large while loop.
In this loop, we first call alloc_fd_bitmap to make sure our fd bitmap
is big enough to hold maxfds number of fds. we then zero the in, out,
and ex fd sets, add our two listening sockets to the in set, add the
listening fds of each node in our node_list to the in set, add each fd
in our node list that we have packets to send on to the out set, zero
the res_in, res_out, and res_ex set of fds, and call select. if select
returns -1, we return to the top of our loop. otherwise, we test to see
if our v4 or v6 listening socket received a connection. if so, we call
accept_connection. if not, we examine each fd belonging to our list
of nodes, and if they have data to read, call data_read (a NOP!), or
if they have data to be written, call data_write (also a NOP!).
when done, we return to the top of our never-ending while loop.
kcomd_init calls kernel_thread to start the aforementioned kcomd_thread
function as a kernel thread. kcomd_exit is an empty function. we
register kcomd_init to run when the module is loaded, kcomd_exit to run
upon unload, and call two macros, the first licensing this file under
the "GPL", and the second attributing Vincent Hanquez as the author.
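
The grow-only bitmap handling described for alloc_fd_bitmap can be sketched in userspace C as one allocation carved into six equal regions. This is an illustration under stated assumptions, not the code from hpc/kcomd.c: the sim_* names are hypothetical, calloc stands in for kmalloc, and the sketch returns a negative errno on failure. It also allocates the new block before freeing the old one, so a failed allocation leaves the previous bitmap usable; the description above suggests the original frees first.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define SIM_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Stands in for the kernel's fd_set_bits: six bitmaps used by select. */
struct sim_fd_set_bits {
    unsigned long *in, *out, *ex;
    unsigned long *res_in, *res_out, *res_ex;
};

static struct sim_fd_set_bits sim_sockets_fds;
static unsigned long *sim_bitmap;   /* backing store for all six sets */
static int sim_maxfds;              /* capacity of the current bitmap */

/* Number of longs needed to hold one bit per fd. */
static size_t sim_longs_for(int fdcount)
{
    return ((size_t)fdcount + SIM_BITS_PER_LONG - 1) / SIM_BITS_PER_LONG;
}

/* Grow the bitmap if fdcount exceeds what it was created to hold. */
static int sim_alloc_fd_bitmap(int fdcount)
{
    size_t n;
    unsigned long *fresh;

    if (fdcount <= sim_maxfds)
        return 0;                   /* current bitmap is large enough */

    n = sim_longs_for(fdcount);
    fresh = calloc(6 * n, sizeof(*fresh));
    if (fresh == NULL)
        return -ENOMEM;

    free(sim_bitmap);
    sim_bitmap = fresh;
    sim_maxfds = fdcount;

    /* Each set is an offset into the single allocation. */
    sim_sockets_fds.in      = fresh;
    sim_sockets_fds.out     = fresh + n;
    sim_sockets_fds.ex      = fresh + 2 * n;
    sim_sockets_fds.res_in  = fresh + 3 * n;
    sim_sockets_fds.res_out = fresh + 4 * n;
    sim_sockets_fds.res_ex  = fresh + 5 * n;
    return 0;
}
```

Because the contents are discarded on every grow, the caller has to repopulate the in, out, and ex sets on each pass of its loop, which is what the kcomd_thread description does before calling select.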
040 config creates hpc/Kconfig this file defines our openmosix menu options in the kernel's
configuration system (menuconfig). we declare a top level menu titled
"HPC Options". our configuration options all exist under this entry.
FIXME: separate this into PMIGUEST first, we create an entry defining KCOMD as a tristate, or an item that
support, and PMIREMOTE support. can be either on (in the kernel), off, or a module (loadable and
BUG: shouldn't KCOMD depend on unloadable while the kernel is running). next we create an entry
OPENMOSIX, not the other way round? defining OPENMOSIX as bool (in kernel, or not). this turns on or off
the parts of openmosix that must be in-kernel for openmosix to
DEAD: dead code. function. bool OPENMOSIX_VERBOSE is supposed to make openmosix more
verbose, but just serves to make OPENMOSIX_MIGRATION_VERBOSE and
OPENMOSIX_DEBUG_FS visible. bool OPENMOSIX_MIGRATION_VERBOSE enables
debugging messages of the form OM_VERBOSE_MIG(...) in
include/hpc/prototype.h. bool OPENMOSIX_DEBUG accomplishes many things.
first, it enables compilation and inclusion of hpc/debug.c and an
architecture specific hpc/debug-$(ARCH).c, both of which contain
functions for printing the state of various structures, processor
registers, and other associated values. then, it enables debugging
messages of the form OMDEBUG(...) in include/hpc/debug.h. it enables
the tracking of the contents of the structure openmosix_options in
include/hpc/hpc.h, and makes OPENMOSIX_MIGRATION_DEBUG and
DEAD: dead code. OPENMOSIX_DEBUG_FS visible. bool OPENMOSIX_MIGRATION_DEBUG does not do
anything, and can be safely removed. bool OPENMOSIX_DEBUG_FS enables
the compilation and inclusion of the contents of hpc/debugfs.c,
creating the om/ directory and its contents under the debugfs. bool
OPENMOSIX_CTRL_FS enables the compilation and inclusion of
hpc/ctrlfs.c, which is a filesystem stub, intended to be the next
migration control filesystem.
041 ominterface creates hpc/kernel.c This file is the primary interface for the kernel to the process
migration system. It contains mostly functions meant to be called by
hooks we place in the kernel. First, we export the openmosix_options
OPT: shouldn't the existence of this structure, which contains four constants used as "ceilings" for the
structure depend on debugfs? OMDEBUG_* debugging macros. The values in this datastructure are
settable through debugfs entries, created by hpc/debugfs.c.
openmosix_pre_clone is called by code added to kernel/fork.c's do_fork
before the kernel starts processing the fork request. In this function,
we check whether the current process has requested a shared memory space
between the two clones, and if it has, we mark the process as
un-migratable for the DSTAY_CLONE reason, and increase the usage count
on its mm structure. Note that both processes resulting from the fork
will be marked DSTAY_CLONE, and both will inherit mm usage counts
increased by +1 on their mm structures. openmosix_post_clone is called
by code inserted at the end of do_fork in kernel/fork.c. It is called
on the thread of both the parent and the child process after the work
in do_fork has completed. In it, we first check to see if the process
is marked VM_CLONE. if it is, we immediately return. otherwise, we
check the mm_realusers counter. if it's just 1, then the process's mm
is this supposed to happen when a structure is only being used once, so we clear the DSTAY_CLONE flag
child dies, or otherwise drops the previously assigned by openmosix_pre_clone. task_maps_inode is called
shared mm? by the next function, openmosix_no_longer_monkey, to check whether a
FIXME: stub! given task maps a given inode, but is just a stub, returning 0.
openmosix_no_longer_monkey is called from __remove_shared_vm_struct
to check every process on the machine and see whether it's using the
passed in inode. if it is, we set the process's DREQ_CHECKSTAY flag,
as this inode is about to be removed from service by
__remove_shared_vm_struct, and doing so may make this process
migratable. setting this flag indicates to hpc/task.c's task_do_request
that this process needs to be reexamined during the reexamine sweep. to
accomplish this, we first acquire the tasklist_lock, then
invoke for_each_process() against the list of all processes. We use the
previous function task_maps_inode to check if each process is using
the passed in inode. Since the previous function is a stub, this
function does nothing. At the end of the function, we release the
tasklist_lock. stay_me_and_my_clones is called by code added to
sys_mlock and sys_mlockall in mm/mlock.c, as well as do_mmap_pgoff in
mm/mmap.c. it applies a given bitmask of reasons not to migrate to the
current task, and all tasks that share its mm structure. in it, we
first use task_lock to lock the current process. we then set its stay
reason, then task_unlock. if the number of mm_realusers is greater than
one (indicating that some other process uses this process's mm
structure), we grab the tasklist_lock, use for_each_process to search
for processes with the same mm pointer (that aren't the current
process), and use task_lock, task_set_stay, and task_unlock to add our
OPT: void, not int? stay reasons to each of them. we always return 0. obtain_mm is called by
mig_handle_migration() in hpc/migrecv.c and task_local_bring() in
hpc/migctrl.c to allocate a new mm structure, initialize it, and make
FIXME: should there be a mm, should we it the context of the current process. We start by checking to see if
be DDEPUTY? there's an mm structure already associated with the passed in task. If
there is, we check to make sure the process is not marked DDEPUTY. If
FIXME: what was the logic of the it is, we call panic() to print a debugging message. at this point is
commented out code? some commented out code that responds to there being an mm on a deputy
process by calling exit_mm on it. Either way, we then mm_alloc() a new
FIXME: mm leak? mm, initialize it to hold our given task with init_new_context(),
acquire the mmlist_lock, initialize our new mm's mmlist member with the
mmlist of process zero, and release the mmlist_lock. We then assign this
mm to our process by first acquiring the task_lock(), saving our current
active mm, setting the task's active mm and mm to our newly created mm,
and task_unlock()ing. we call activate_mm with our original and new
mm, then mmdrop the old active_mm. if our mm_alloc() fails, we return
-ENOMEM. if init_new_context() fails, we destroy our allocated mm, and
return the error init_new_context() failed with. otherwise, we return 0
for success. unstay_mm is called by code added to sys_munlock and
sys_munlockall in mm/mlock.c to set the DREQ_CHECKSTAY flag on the
current process, and all processes that share its mm structure. for the
premature optimization? looks good tho. common case of just one task using a given mm structure, we just call
BUG: no locking! task_set_dreqs(current, DREQ_CHECKSTAY). otherwise, we use
for_each_process() with a read_lock held on the tasklist_lock to
iterate through each process on the machine, checking if it's using our
passed mm, and if so, we call task_set_dreqs(p, DREQ_CHECKSTAY) on it.
remote_pre_usermode is called by the later defined
openmosix_pre_usermode to check for communication events before
entering userspace. in it, we call comm_peek() to see if there's pending
input, and if there is, call remote_do_comm() to process the
OPT: void, not int? communication in question. remote_pre_usermode always returns 0 for
success. deputy_pre_usermode is also called by openmosix_pre_usermode,
before jumping to userspace while handling a process in deputy state.
in this function, we just jump into deputy_main_loop, instead of going
to any real usermode code. if deputy_main_loop() returns, this function
returns 0. openmosix_pre_usermode is called by assembly code inserted
into arch/$ARCH/kernel/entry, when switching from kernel space to user
space. we first check for pending dreqs, and if we find one, we save
our current irq mask, call task_do_request, and restore our irq mask
once task_do_request returns. after dispatching possible dreqs, we call
one of the previous two functions depending on whether the process is in
DDEPUTY or DREMOTE state. like before, we save our irq mask before
calling remote_pre_usermode or deputy_pre_usermode, then restore it
OPT: return void? once we return from userspace. this function always returns 0 for
success. openmosix_init is called upon subsystem load. it starts the
openmosix_mig_daemon kernel thread to receive incoming processes, and
returns 0. the last line in this file tells the subsystem system to
call openmosix_init upon initializing this kernel component.
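
The way stay_me_and_my_clones spreads a stay reason across every task sharing an mm can be sketched in userspace C. This is a minimal model under stated assumptions: an array walk replaces for_each_process, no locking is shown, and the sim_* names and bit values are hypothetical (only the DSTAY_CLONE name appears in the description above).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values for reasons a task must not migrate. */
#define SIM_DSTAY_CLONE 0x1u
#define SIM_DSTAY_MLOCK 0x2u

struct sim_mm { int realusers; };      /* models the mm_realusers counter */

struct sim_task {
    struct sim_mm *mm;
    unsigned int stay;                 /* bitmask of stay reasons */
};

/* Apply reasons to cur and to every other task sharing cur's mm. */
static void sim_stay_me_and_my_clones(struct sim_task *tasks, size_t ntasks,
                                      struct sim_task *cur, unsigned int reasons)
{
    size_t i;

    cur->stay |= reasons;
    if (cur->mm->realusers <= 1)
        return;                        /* common case: nobody shares this mm */

    for (i = 0; i < ntasks; i++) {
        if (&tasks[i] != cur && tasks[i].mm == cur->mm)
            tasks[i].stay |= reasons;
    }
}

/* A task is migratable only while no stay reason is set. */
static int sim_can_migrate(const struct sim_task *t)
{
    return t->stay == 0;
}
```

The early return mirrors the shortcut in the description: the full task walk is only paid for when the mm really has more than one user.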
042 config creates hpc/Makefile This Makefile contains the make fragments that tell the kernel what
targets to build in the hpc/ directory. this code defines five targets,
obj-$(CONFIG_KCOMD), obj-$(CONFIG_OPENMOSIX),
obj-$(CONFIG_OPENMOSIX_CTRL_FS), obj-$(CONFIG_OPENMOSIX_DEBUG)
and obj-$(CONFIG_OPENMOSIX_DEBUG_FS). each of these targets corresponds
with one of the configuration variables defined by hpc/Kconfig. when an
option has been enabled in the menus, the items listed by each of these
targets will be built. obj-$(CONFIG_KCOMD) says to build kcomd.o.
obj-$(CONFIG_OPENMOSIX) says to build kernel.o, task.o, comm.o,
remote.o, deputy.o, copyuser.o, files.o, syscalls.o, migrecv.o,
migsend.o, migctrl.o, service.o, proc.o, and an arch-$(ARCH).o file
containing architecture specific functionality. proc.o has a comment
isn't this the code we use? noting that it's "legacy code". obj-$(CONFIG_OPENMOSIX_CTRL_FS) says to
build ctrlfs.o. obj-$(CONFIG_OPENMOSIX_DEBUG) says to include debug.o,
and an architecture specific debug-$(ARCH).o. finally,
obj-$(CONFIG_OPENMOSIX_DEBUG_FS) says to include debugfs.o.
043 core creates hpc/migctrl.c This file contains functions for moving processes via the migration
migration_cntrl infrastructure. task_remote_expel is called by the next function
FIXME: shouldn't remote_do_comm use task_remote_wait_expel and hpc/remote.c's remote_do_comm to send a
task_remote_wait_expel? remote process back to its original node, merging it with its deputy.
First we check to make sure the task we've been passed is in DREMOTE
state, and use BUG_ON to panic the system if it's not. We then use http://kerneltrap.org/node/7204
hpc/migsend.c's mig_send_hshake to request a return migration from the
home node. If this succeeds, we call hpc/migsend.c's mig_do_send to
actually perform the migration. After that, we destroy our link to the
home node by using hpc/task.c's task_set_comm to associate our link to
null, then calling hpc/comm.c's comm_close() against our old link
(returned by task_set_comm). We then call do_exit(SIGKILL) to end the
FIXME: Who gets this result? current process. In case either of our mig_send_hshake or mig_do_send
FIXME: unique errors! calls fail, we OMBUG("failed\n"), and return -1. task_remote_wait_expel
is called by the later defined __task_move_to_node to return a remote
task to its home node. This function wraps the previous function, first
requesting permission to return home by sending a REM_BRING_HOME req,
then waiting on a DEP_COMING_HOME reply. If comm_recv fails, or we recv
something other than a DEP_COMING_HOME, we return -1. otherwise, we
call task_remote_expel, returning its result. task_local_send is called
by the later defined __task_move_to_node to send a local task to a
remote host. In it, we first check to make sure the task is not in
FIXME: returning success in case of DDEPUTY state. If it is, we return 0, as this process is already
error! running on a remote node, and does not belong to the local machine to
begin with. Otherwise, we open a new connection using hpc/service.c's
sockaddr_setup_port and hpc/comm.c's comm_setup_connect, then attach
our new connection to the current process with task_set_comm. We set
the current process into DDEPUTY state, and ask permission to send by
FIXME: why do we use hshake here, and sending a HSHAKE_MIG_REQUEST using mig_send_hshake. If our handshake is
req above? successful, we call mig_do_send to perform migration to the remote
node. When mig_do_send returns successfully, the process has been sent
to the remote node, and the local process is now a deputy. We call
FIXME: return errors! make them unique! deputy_startup, and return 0. If any of comm_setup_connect,
mig_send_hshake, or mig_do_send returns failure, we remove our DDEPUTY
flag, destroy our link to the remote node (if applicable), and return
0. task_local_bring is called by the later defined __task_move_to_node
to return a remote process to the current node, re-merging it with its
deputy. in it, we first check to make sure the current task is in
FIXME: returning success in case of DDEPUTY state. If it's not, we return 0, as this process is already
error! running on the local node. Otherwise, we use obtain_mm to get a new mm
struct. we then make a DEP_COMING_HOME request to the remote end, and
use mig_recv_hshake to receive our reply. Assuming success, we
call hpc/migrecv.c's mig_do_receive to receive the process back,
clear our DDEPUTY flag, and use task_set_comm/comm_close to destroy our
FIXME: unique ombug message for each link. If obtain_mm, mig_recv_hshake, or mig_do_receive return failure,
failure! and unique return codes! we OMBUG("failed\n"), and return -1. task_move_remote2remote is called
by the later defined __task_move_to_node to handle moving a task
between two remote hosts as in moving it from one machine, to another
machine, without ever returning the process to its home node. This
FIXME: STUB! doesn't return fail? function is a stub. It just calls OMBUG(), and returns 0.
__task_move_to_node is called by the later defined task_move_to_node,
task_go_home, and task_go_home_for_reason to move a task to a given
node, using the appropriate function from above. First, we set flag
DPASSING on given task, indicating we're going to try and transfer it
somewhere. Then, we check to see if it has a DREMOTE flag. If it does,
and we were given a node to send it to, we call
task_move_remote2remote, accomplishing nothing since that function is a
stub. if DREMOTE is set, but we were not given a node to send it to, we
call task_remote_wait_expel. if DREMOTE is not set, and we were given a
node to send to, we call task_local_send. otherwise, DREMOTE is not
set, and we were given no destination to send a process to, so we call
task_local_bring. after we've called one of these four functions, we
clear the DPASSING flag, and return the error passed to us by the
function we called. task_move_to_node is called by hpc/task.c's
task_request_move, to move a process to the given node. this function
wraps __task_move_to_node, first checking for a stay reason. If there
FIXME: printk! is a stay reason, we printk() an error and return -1. otherwise, we
FIXME: pass on __task_move_to_node's call __task_move_to_node, and return 0. task_go_home is called by
return value! hpc/deputy.c's deputy_process_communication, to send a task home when
we have received a REM_BRING_HOME request. First, we check to make sure
FIXME: PRINTK! the given process is DMIGRATED. If it's not, we printk() a warning, and
return -1. otherwise, we call __task_move_to_node, supplying only a
task as an argument, thereby requesting a move to the home node. If the
FIXME: check __task_move_to_node's process is still marked DMIGRATED after calling __task_move_to_node, we
return value! printk! returning 0 in printk() a warning. regardless of outcome, we return 0.
case of error? task_go_home_for_reason is called by code added to
FIXME: what about other arches? arch/i386/kernel/vm86.c's sys_vm86 and sys_vm86old to send the given
task home, supplying a reason why (DSTAY_86), which is sent to the home
node. first, we check whether the reason flag given is already marked on
FIXME: printk! this process. if it is, we printk() a warning. we then set the given
returning 0 for error! reason flag on the given process, and test whether the process is
DREMOTE or not. if it's not, we return 0. otherwise, we call
__task_move_to_node the same way as the previous function, to send the
task home. we check __task_move_to_node's return, and if it's not 0, we
clear the stay reason we just set. Finally, we return the value
returned by __task_move_to_node.
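The four-way choice made by __task_move_to_node can be sketched as a small userspace function. This is only an illustration of the decision described above: the sk_ names, the struct, and the SK_DREMOTE bit value are hypothetical and are not taken from hpc/migctrl.c or hpc/task.h.

```c
#include <assert.h>
#include <stddef.h>

#define SK_DREMOTE 0x1u   /* hypothetical bit value, not taken from hpc/task.h */

struct sk_task {
    unsigned int dflags;  /* stands in for task->om.dflags */
};

enum sk_move {
    SK_REMOTE2REMOTE,     /* task_move_remote2remote (currently a stub) */
    SK_REMOTE_WAIT_EXPEL, /* task_remote_wait_expel */
    SK_LOCAL_SEND,        /* task_local_send */
    SK_LOCAL_BRING        /* task_local_bring */
};

/* dest == NULL means "no destination node was given" */
static enum sk_move sk_pick_move(const struct sk_task *t, const void *dest)
{
    assert(t != NULL);
    if (t->dflags & SK_DREMOTE)
        return dest ? SK_REMOTE2REMOTE : SK_REMOTE_WAIT_EXPEL;
    return dest ? SK_LOCAL_SEND : SK_LOCAL_BRING;
}
```

In the real function the chosen handler is called, DPASSING is cleared afterwards, and the handler's error is returned; the sketch only covers which handler is picked.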
044 omrecv creates hpc/migrecv.c this file contains functions for receiving parts of processes, and filling
in the appropriate data structures. it also contains the openmosix
mig_daemon. mig_recv_hshake is called by mig_handle_migration, as well
as task_local_bring in hpc/migctrl.c. it receives an omp_mig_handshake,
and sends a reply, with our OPENMOSIX_VERSION in it. we return -1 if
either the send or recv fail, along with invoking OMBUG with a short
description of what happened. in case of success, we return 0.
bad comment! mig_do_receive_mm is called by mig_do_receive to set the passed task's
mm structure to the values stored in the passed in omp_mig_mm
structure, starting with start_code and ending at env_end. it starts
by OMDEBUG_MIG'ing a trace message, then uses memcpy to push the values
from the passed in omp_mig_mm to the given process's mm structure.
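Copying a contiguous run of members (start_code through env_end) with a single memcpy only works if the wire structure and the target structure agree on that range. A minimal userspace sketch of the idea follows; struct sk_mm and struct sk_mig_mm are hypothetical, trimmed-down stand-ins, not the real mm_struct or omp_mig_mm layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* wire format: only the fields that travel with the process */
struct sk_mig_mm {
    unsigned long start_code, end_code, start_data, end_data;
    unsigned long start_brk, brk, start_stack;
    unsigned long arg_start, arg_end, env_start, env_end;
};

/* trimmed-down stand-in for mm_struct; the real layout has many more fields */
struct sk_mm {
    int map_count;                 /* local only, must survive the copy */
    unsigned long start_code, end_code, start_data, end_data;
    unsigned long start_brk, brk, start_stack;
    unsigned long arg_start, arg_end, env_start, env_end;
    int other;                     /* local only, must survive the copy */
};

#define SK_RANGE_START offsetof(struct sk_mm, start_code)
#define SK_RANGE_END   (offsetof(struct sk_mm, env_end) + sizeof(unsigned long))

static void sk_receive_mm(struct sk_mm *mm, const struct sk_mig_mm *m)
{
    assert(mm != NULL && m != NULL);
    /* the copy is only safe if both layouts agree on the range size */
    assert(SK_RANGE_END - SK_RANGE_START == sizeof(*m));
    memcpy((char *)mm + SK_RANGE_START, m, sizeof(*m));
}
```

The size check is an addition of this sketch; it guards against the two structures drifting apart, which a bare memcpy would silently turn into memory corruption.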
bad comment! mig_do_receive_vma is called by mig_handle_migration to set up a vm
area in the current process matching the passed in omp_mig_vma
structure's definition. we start by OMDEBUG_MIG'ing a trace message,
including the start address of the vm, and its size. we then have
broken! commented out code, that is supposed to use task_rfiles_get to retrieve
the file pointer that was initially associated to this page, in the
case of a page mmap'd from a file. next, we call do_mmap_pgoff to
create the mapping in the mm structure. we supply it with a NULL file
argument, the vm_start, vm_size, and vm_pgoff from the passed in
omp_mig_vma. for the prot argument, we create a long containing the
VM_(READ|WRITE|EXEC) protection flags from the vm_flags in the
omp_mig_vma. for the flags argument, we create a long containing the
VM_(GROWSDOWN|DENYWRITE|EXECUTABLE) behavior flags from vm_flags in the
omp_mig_vma, adding MAP_FIXED and MAP_PRIVATE. we check the result
of do_mmap_pgoff with IS_ERR(result), and if we have an error, we
return PTR_ERR(result). otherwise, we check vm_flags for the
VM_READHINTMASK flag, and if it's present, we use sys_madvise directly
to give the kernel either MADV_RANDOM or MADV_SEQUENTIAL memory access
hints for this page, depending on whether vm_flags is marked VM_SEQ_READ
or not. assuming the IS_ERR earlier didn't cause us to return, we now
return 0 indicating success. mig_do_receive_page is called by
mig_do_receive to receive a page of memory, and map it into the given
task at the appropriate location. like the previous functions, first
we OMDEBUG_MIG a trace message, this time including the address of the
page we're creating. we then use find_vma to find the vm area that
should own this page. if there's no VMA for this page, we OMBUG, then
return -1. otherwise, we allocate memory for the page in userspace
using alloc_page(GFP_HIGHUSER). if that fails, we OMBUG, then return
what about different size pages? -ENOMEM. we kmap the page into kernel space so we can fill it, then
receive a page's worth of data using comm_recv. after comm_recv, we
kunmap the page. to add the page in the task at the correct spot, we
alloc a pte entry pointing to the address we're mapping our page to
(and optionally entries in the pmd, and pud), then check to make sure
the entry in question has no page already mapped to it. if it does,
we OMBUG() about it. either way, we use set_pte to point this pte to http://kernel.lupaworld.com/downloads/The_Linux_Kernel_Memory_API.pdf
our page, after applying the containing vma's vm_page_prot page
protection flags while converting the address to a pte with mk_pte,
and marking the resulting pte "structure" dirty using pte_mkdirty.
use set_pte_atomic? we then unmap the pte entry from kernel space using pte_unmap.
page_dup_rmap marks the page as in-use by a pte, and inc_mm_counter
marks the mm structure as owning one more page. in case either
comm_recv, pud_alloc, pmd_alloc, or pte_alloc_map returns an error, we
free the page we allocated earlier with __free_page, and return -1.
mig_do_receive_fp is called by mig_do_receive to set up the floating
point state of a given task. it first calls OMDEBUG_MIG to print a
current task, or passed task? tracing message, then uses set_used_math() to mark the current(!)
task as one that uses floating point math. we then call the
architecture specific function for setting up a floating point state.
mig_do_receive_proc_context is called by mig_do_receive to set up
the processor context of the passed in task to the values stored in
the passed in omp_mig_task structure. first, it calls the
architecture specific handler to set up things like processor
registers and TLS entries, then handles copying the cross-arch items,
like the pid and tgid, the user credentials (uid, euid, suid, fsuid,
gid, egid, sgid, and fsgid), and various signal related members
(blocked, real_blocked, sas_ss_sp, and sas_ss_size). we also copy the
signal handler's 'action'. we have a note about copying an rlimit here,
but no code to go with it. we copy the task's comm and personality
members, then call arch_pick_mmap_layout to set the task's mm structure
up. mig_do_receive is called by task_local_bring in hpc/migctrl.c to
receive a process from a remote node, back into its deputy. to
begin, we use __get_free_page to allocate a GFP_KERNEL page,
then set the passed in task's state flag to DINCOMING. we clear the
used_math flag, and go into the receive loop. in this loop, we first
receive a req structure. we examine the dlen member of the received
structure to see if it is larger than a page, and if it is, we BUG_ON,
panicking the system. after that check, we decode req.type, and dispatch
the data received, and the task to operate on, to the appropriate
function. the loop ends when we receive an ABORT, hit the default case, or
reach the MIG_TASK case, which is the last stage in migration.
MIG_TASK's case sends a req back along the socket indicating that
migration is complete, clears our state of DINCOMING, flushes the tlb
for this process's mm structure, and returns 0. in case of failure in
our __get_free_page call, either of the comm_recv calls, or any of the
mig_do_receive_* functions that can return failure, we clear the
if __get_free_page fails, is it right DINCOMING flag, free our data page, OMBUG a failure message, and return
to free its result? -1. mig_handle_migration is the function that a newly spawned, ready to
be filled with state task is kicked into by openmosix_mig_daemon, in
order for the task to receive the contents of a task being migrated to
this node. first we OM_VERBOSE_MIG a trace message, then we use
task_set_comm() to setup our link back to the home node. we use
obtain_mm to get a new mm structure for this task, and call
mig_recv_hshake to inform the home that we're ready to start receiving
data. we call mig_do_receive to receive all the process data, re-parent
ourself to init, then run arch_kickstart to jump into the process
(now in a "runnable" state). openmosix_mig_daemon is the migration
daemon itself. first we daemonize ourself with om_daemonize as
"omkmigd". we set a flag marking ourself DREMOTEDAEMON, and
then initialize our socket/socketaddr. we use set_our_addr to
initialize the socketaddr, then comm_setup_listen to open our listening
printk()! socket. if comm_setup_listen returns null, we printk a warning, flush
our signals, mark ourselves TASK_INTERRUPTIBLE, schedule_timeout(HZ),
and loop up to just before we called comm_setup_listen, thereby
entering a loop, waiting for comm_setup_listen to work. once we have a
socket, we enter the listening loop. in this loop, we run comm_accept
to attempt to get a channel from a remote kernel. if we get EINTR,
ERESTART, EAGAIN, or ERESTARTSYS, we check for a pending SIGCHLD. if
we get one, we printk a debugging message. either way, we flush our
signals, and re-start the loop. if the error returned by comm_accept
wasn't one of those four errors, and wasn't NULL, we OMBUG a failure
message, close our link, and return to the spot just before we call
comm_setup_listen. if the error is NULL, then we've got a connection.
we then call user_thread, sending the socket as the argument to the
new process. if spawning the new process returns an error, we
close the socket ourselves. either way, when user_thread returns,
we return to the top of loop, and wait for a new connection.
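The way openmosix_mig_daemon reacts to the result of comm_accept boils down to a three-way classification. The sketch below shows only that classification, in userspace C. The sk_ names are hypothetical, the error is taken as a positive errno-style code with 0 for success (the daemon's own convention may differ), and the fallback values for ERESTART and ERESTARTSYS are assumptions for platforms whose errno.h does not expose them.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal codes are not always exposed to userspace; the fallback
 * values here are assumptions for the sake of the sketch. */
#ifndef ERESTART
#define ERESTART 85
#endif
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

enum sk_accept_step {
    SK_SPAWN,     /* got a connection: hand it to user_thread */
    SK_RETRY,     /* interrupted: flush signals, call comm_accept again */
    SK_RELISTEN   /* real failure: close the link, redo comm_setup_listen */
};

/* err is a positive errno-style code, 0 meaning success */
static enum sk_accept_step sk_classify_accept(int err)
{
    assert(err >= 0);
    switch (err) {
    case 0:
        return SK_SPAWN;
    case EINTR:
    case ERESTART:
    case EAGAIN:
    case ERESTARTSYS:
        return SK_RETRY;
    default:
        return SK_RELISTEN;
    }
}
```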
045 omsend creates hpc/migsend.c migsend is the mirror image of migrecv. it's responsible for tearing
down a process, and sending it to a remote node over a socket.
mig_send_hshake is called by task_remote_expel and task_local_send in
hpc/migctrl.c to 'handshake' with the remote end, asking permission to
migrate a task. we send a handshake containing the passed in type, the
personality should be checked and OPENMOSIX_VERSION, and the task's personality flags, and see if the
translated, to allow ia32 machines to hshake the remote end replies with is a reply, and has a type that
send to AMD64. matches the one we sent. if it doesn't, we OMBUG about it, and return
-1. if either the comm_send or comm_recv fail, we return -1. otherwise,
we return 0 for success. mig_send_fp is called by mig_do_send to send
the floating point state of a given task to the remote end, if
applicable. we call used_math() to check whether the current process
uses floating point, then if it does, we call arch_mig_send_fp to store
the floating point state into an omp_mig_fp structure, and return the
result of comm_send_hd'ing the omp_mig_fp structure to the remote end.
if the process doesn't use floating point math, we return 0. mig_send_mm
is called by mig_do_send to send part of the mm structure of the passed
task to the remote end. we copy into an omp_mig_mm structure the part
of the mm of the given task starting at mm->start_code, and ending at
&mm->start_code+sizeof(struct omp_mig_mm), or mm->env_end. we then
return the result of comm_send_hd'ing the omp_mig_mm structure.
mig_send_vma_file is called by mig_send_vmas to add file related
data to a passed in omp_mig_vma struct, describing the file associated
with the passed in vm_area_struct. first, we set the vm_pgoff member to
the vm_pgoff of the passed in vma. then, we set the i_size
does the file have a valid dentry when member to the inode size of the file's dentry. we then check whether
remote? we are running on the remote node, or not. if we are, we set m->vm_file
from inode->u.generic_ip. otherwise, we set m->vm_file from
vma->vm_file, and set m->f_dentry from vma->vm_file->f_dentry.
mig_send_vmas is called by mig_do_send to send the vmas of the passed
process to the remote end. it loops through the vmas, copying start,
flags, and files to an omp_mig_vma struct. we then set size to vma.end -
vma.start. we set vm_pgoff to 0, and call mig_send_vma_file if there
is a file associated with this vm. we send the omp_mig_vma struct to
the remote end, and return to the top of our loop (if there are more
VMAs to send). if our comm_send_hd fails, we break out of the loop, and
return the result. otherwise, we naturally exit the loop when all VMAs
have been sent, returning 0 (the result of the last successful
comm_send_hd). mig_send_pages is called by mig_do_send to send all the
readable pages of a given task to the remote node. to accomplish this,
we iterate over all the vma's in the task (starting at task->mm->mmap),
and if we can VM_READ the vma, we loop over each page in the vma,
sending first its address via comm_send_hd, then the page data itself
using comm_send. if the vma wasn't marked VM_READ, we just skip it,
as it's not being used by the 'running code' of the process. if either
comm_send_hd or comm_send fails, we OMBUG a message including the
address we were trying to send, and return -1. otherwise, once all
pages have been sent, we return 0.
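The iteration in mig_send_pages can be sketched in userspace as a walk over a linked list of VMAs with a per-page callback. Everything prefixed sk_ or SK_ is hypothetical: the struct is a stand-in for vm_area_struct, the page size is fixed at 4096 for the example, and the callback stands in for the comm_send_hd of the address followed by the comm_send of the page data.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096UL
#define SK_VM_READ   0x1UL    /* assumed flag value for the sketch */

struct sk_vma {
    unsigned long start, end;   /* [start, end), page aligned */
    unsigned long flags;
    struct sk_vma *next;
};

/* stands in for comm_send_hd(address) followed by comm_send(page data) */
typedef int (*sk_send_page_fn)(unsigned long addr, void *ctx);

static int sk_send_pages(const struct sk_vma *mmap, sk_send_page_fn send,
                         void *ctx)
{
    const struct sk_vma *vma;
    unsigned long addr;

    assert(send != NULL);
    for (vma = mmap; vma; vma = vma->next) {
        if (!(vma->flags & SK_VM_READ))
            continue;                      /* unreadable vma: skip it */
        for (addr = vma->start; addr < vma->end; addr += SK_PAGE_SIZE)
            if (send(addr, ctx) < 0)
                return -1;                 /* first failure ends the walk */
    }
    return 0;
}

/* example callback: just count the pages that would be sent */
static int sk_count_page(unsigned long addr, void *ctx)
{
    (void)addr;
    ++*(unsigned long *)ctx;
    return 0;
}
```

With one readable three-page vma followed by an unreadable two-page vma, the counting callback is invoked three times.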
slightly different order? synchronise! mig_send_proc_context is called by mig_do_send to send the process'
current "state" to the remote end. we fill in an omp_mig_task structure,
ptrace is not used by the remote! first pulling in members from the passed in task. first ptrace, then
IDs (pid and tgid). we copy the user credentials (uid, euid, suid, and
groups? fsuid), and group credentials (gid, egid, sgid, and fsgid), then copy
the current signal state (blocked, real_blocked, sas_ss_sp, and
sas_ss_size) along with memcpy()ing the signal handler's action struct
nice, caps.. (task->sighand->action, a struct k_sigaction). we copy the niceness of
the process, and its posix.1e capabilities. we store its capabilities
in task->om.remote_caps, and copy the task's personality. we then copy
our comm structure with memcpy, and use arch_mig_send_proc_context() to
save the architecture specific parts of the process' state (CPU
registers, etc) to the omp_mig_task structure. we then send the
omp_mig_task state structure via comm_send_hd, then wait for a reply,
check comm_recv's error value! and return 0 if we get one. if comm_send_hd fails, we OMBUG about it,
and return -1. mig_do_send is the wrapper that calls all the above
in the proper order. it's called by either task_remote_expel or
task_local_send in hpc/migctrl.c. first we call arch_mig_send_pre to
do any architecture specific work that needs to be done before migration.
then, we send the task's components in the following order: MM, vmas,
pages, floating point state, architecture specific components (via
arch_mig_send_specific), and the processor context. we call
arch_mig_send_post to clean up architecture specific tweaks done by
arch_mig_send_pre, and return 0 on success. if any of the mig_send
functions or arch_mig_send_specific fails, we OMBUG about it, and send
shouldn't we call arch_mig_send_post? a MIG_ABORT req to the remote end. we then return -1 for failure.
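The control flow of mig_do_send (pre hook, ordered stages, post hook, abort on the first failure) can be sketched with a table of function pointers. This is a userspace illustration under stated assumptions: all sk_ names are hypothetical, the toy steps only append a letter to a log string so the order is visible, and the failure path deliberately mirrors the description above by not running the post hook.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*sk_step_fn)(void *task);   /* 0 on success, <0 on failure */

struct sk_send_ops {
    sk_step_fn pre;            /* arch_mig_send_pre */
    const sk_step_fn *stages;  /* mm, vmas, pages, fp, arch specific, context */
    size_t nstages;
    sk_step_fn post;           /* arch_mig_send_post */
    sk_step_fn abort_remote;   /* sends the MIG_ABORT req */
};

static int sk_do_send(void *task, const struct sk_send_ops *ops)
{
    size_t i;

    assert(ops != NULL);
    ops->pre(task);
    for (i = 0; i < ops->nstages; i++) {
        if (ops->stages[i](task) < 0) {
            /* as described above, the failure path does not run post */
            ops->abort_remote(task);
            return -1;
        }
    }
    ops->post(task);
    return 0;
}

/* toy steps: the "task" is a log string, each step appends one letter */
static int sk_log(void *task, char c)
{
    char *s = task;
    size_t n = strlen(s);

    s[n] = c;
    s[n + 1] = '\0';
    return 0;
}
static int sk_pre(void *t)   { return sk_log(t, '<'); }
static int sk_mm(void *t)    { return sk_log(t, 'm'); }
static int sk_vmas(void *t)  { return sk_log(t, 'v'); }
static int sk_fail(void *t)  { sk_log(t, 'x'); return -1; }
static int sk_post(void *t)  { return sk_log(t, '>'); }
static int sk_abort(void *t) { return sk_log(t, 'A'); }
```

Running the stages mm, vmas leaves the log as "<mv>"; running mm, fail, vmas leaves it as "<mxA", showing that later stages and the post hook are skipped once a stage fails.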
046 prochpc creates hpc/proc.c this file contains the code that adds a hpc directory to every process
this file is way out of order. reorg. on the machine's proc/$PID/ directory (for controlling process
separate into 'hpc' and 'admin' migration), as well as the code adding the /proc/hpc/admin directory
to control global aspects of openmosix. proc_pid_set_where is called
by openmosix_proc_pid_setattr when a user writes a destination to
/proc/$PID/hpc/where. first we check for the string "home", and if so,
printk, and bad formatting. we printk that we found the "HOME" string, and call
check return of task_register_migration task_register_migration to migrate the process to its home node.
otherwise, we see if we can decode an ipaddress by calling
check return of task_register_migration string_to_sockaddr. if we find one, we call task_register_migration to
migrate the process to the ip we found. if we don't find "home" or an
doing nothing is wrong. ip, we do nothing. either way, we return the size of the string passed
in. proc_pid_get_where is called by openmosix_proc_pid_getattr when a
user requests the node a given process is running on, by reading
/proc/$PID/hpc/where. it writes to a supplied char * the address of the
node the process is running on, or "home" for the home node. first we
check to see if the process is DREMOTE. if it is, we use comm_getname
to get the address of the node a process is running on, use
sockaddr_to_string to write it to the passed in char *, then add a
'\n'. if the process is not DREMOTE, we place the string "home\n" in
the passed in char *. either way, we return the length of the location,
stayreason_string belongs either in a including the \n at the end. the array stayreason_string contains the
header, or in the function that uses it short strings matching the reasons a process might be confined to its
home node, defined in hpc/task.h. "monkey" means a process is using a
writeable memory mapped file. "mmap_dev" means a process has a
character device mmapped. "VM86_mode" means a process is running in
VM86 mode. "priv_inst" is supposed to mean a process is using the
IN/OUT assembly instructions, but is not yet implemented. "mem_lock"
means one of the VMAs or the MM structure belonging to this process is
marked VM_LOCKED. "clone_vm" means either this task does not have a mm
structure, or its mm structure is in use by more than one process.
"rt_sched" means the process has a realtime scheduling priority set.
"direct_io" is meant to mean a process has permissions to access I/O
space, but it is not yet implemented. "system" means the process is
either the init process (pid 1), or one of our openmosix daemons
created with om_daemonize. "extern_1" "extern_2" "extern_3" and
"extern_4" are reasons that are meant to be settable by a userspace
program, along with "user_lock", which indicates that the user has
requested for this process not to be migrated. this array is used
exclusively by proc_pid_get_stay. proc_pid_get_stay is called
by openmosix_proc_pid_getattr. it writes to a passed in char * a string
(taken from stayreason_string) describing each stay reason set in the stay
reason mask, separated by \n, terminated by \n, and returns the length.
we accomplish this by looping through the 32 possibilities, and testing
for each of them with task_test_stay(). proc_pid_get_debug is called by
openmosix_proc_pid_getattr. it writes to the passed in char * the hex
value of the passed in task's om.dflags member. we return the length of
clean up this string. the string we wrote, including its trailing \n. proc_admin_set_bring
these two functions are STUBs! and proc_admin_set_expel are called by openmosix_proc_admin_setattr
when the user writes to /proc/hpc/admin/bring and /proc/hpc/admin/expel,
printk()! clean the messages. respectively. they just printk a message, and return the size value
clean up this string. passed in. proc_admin_get_version is called by
openmosix_proc_admin_getattr. it prints into the passed in char * a
string reading "openMosix version: " and then the
OPENMOSIX_VERSION_TUPPLE. to create the entries in our
proc_om_entry_admin structure (of type om_proc_entry) we create a
temporary #define. proc_om_entry_admin contains entries that define the
'files' found in the /proc/hpc/admin/ directory, along with what
function should be called to dispatch data being read/written to that
file. we create an entry for "bring", mapping writes to
proc_admin_set_bring, and reads to proc_admin_get_0. we create an
"expel" entry, mapping writes to proc_admin_set_expel, reads to
proc_admin_get_0. we also add a "version" entry, mapping writes to
proc_admin_set_0, and reads to proc_admin_get_version. in effect, this
makes it so that bring and expel can only be written to, and version can
only be read. anything else will be routed to proc_admin_set_0 or
should return -EACCES! proc_admin_get_0, which are later defined to return -EINVAL.
proc_om_entry_pid holds the entries that define the 'files' found in
the /proc/$PID/hpc/ directories, along with what function should be
called to dispatch data being read/written to that file. we create a
"where" entry, mapping its write to proc_pid_set_where, and its read to
proc_pid_get_where. we create a "stay" entry, mapping write to
proc_pid_set_0, and read to proc_pid_get_stay. finally, we create a
"debug" entry, mapping write to proc_pid_set_0, and read to
proc_pid_get_debug. this makes it so that "where" can be read and
written to, whereas "stay" and "debug" can only be read. anything else
will be routed to proc_pid_set_0 or proc_pid_get_0, which return
where is this called from? -EINVAL. openmosix_proc_pid_getattr dispatches requests to the above
functions. it uses proc_om_entry_pid, which contains function pointers
to the appropriate dispatching function. to accomplish this, we iterate
over the members of proc_om_entry_pid, checking if the name of the file
passed in matches the name on this specific entry. if it does, we call
the function for 'get' in the entry. we return the length returned by
the 'get' function, or -EINVAL if we don't find a matching entry.
where is this called from? openmosix_proc_pid_setattr works in much the same way, calling the set
member of the appropriate entry in the array proc_om_entry_pid to
dispatch. it also returns the number of characters successfully
written, or -EINVAL if we don't have an entry for the file in question.
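The name-to-handler lookup shared by openmosix_proc_pid_getattr and openmosix_proc_pid_setattr can be sketched as a table walk. This is a userspace illustration only: the struct, the handlers, and the two entries are hypothetical stand-ins for om_proc_entry and proc_om_entry_pid, and the NULL-name terminator is an assumption of the sketch (the notes above question whether the real array has one).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct sk_entry {
    const char *name;
    int (*set)(const char *buf, size_t size);
    int (*get)(char *buf);
};

static unsigned int sk_dflags;   /* stands in for task->om.dflags */

static int sk_set_0(const char *buf, size_t size)
{
    (void)buf; (void)size;
    return -EINVAL;              /* "not writable" placeholder */
}
static int sk_set_where(const char *buf, size_t size)
{
    (void)buf;
    return (int)size;            /* pretend the request was accepted */
}
static int sk_get_where(char *buf) { return sprintf(buf, "home\n"); }
static int sk_get_debug(char *buf) { return sprintf(buf, "0x%08x\n", sk_dflags); }

static const struct sk_entry sk_pid_entries[] = {
    { "where", sk_set_where, sk_get_where },
    { "debug", sk_set_0,     sk_get_debug },
    { NULL,    NULL,         NULL }          /* terminator assumed by the loop */
};

/* returns the handler's length, or -EINVAL if no entry matches the name */
static int sk_getattr(const struct sk_entry *e, const char *name, char *buf)
{
    assert(e != NULL && name != NULL && buf != NULL);
    for (; e->name; e++)
        if (strcmp(e->name, name) == 0)
            return e->get(buf);
    return -EINVAL;
}
```

The setattr side would walk the same table and call e->set instead; routing unsupported directions to a placeholder such as sk_set_0 keeps every slot callable without a NULL check.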
proc_callback_read is called by the later defined proc_om_read_admin.
it calls the appropriate handler for the file it's passed (placing its
results in a kernel page), and copies the results from the page passed
to the handler, into the passed in userspace buffer. we start by
using __get_free_page to get a GFP_KERNEL page. if this fails, we
return -ENOMEM. otherwise, we check what filename we were requested to
does not this iteration method require give results for. we find the handler by iterating through the passed
a NULL entry to be last in the array? in "entry" array, looking for a handler based on the name of the file
passed in, and if we find one, we call the pointer to the handler
stored in the entry. if we don't find an entry matching the file in
question, or if the handler returns an error, we free the page in
question, and return either the error returned, or -EINVAL. next we
check to see if the user requested a segment that was beyond the length
of the data being returned (with fseek?). if they requested an offset
greater than or equal to the length, we free our kernel page, and
return 0. finally, we copy the contents of our kernel page into the
passed in buf, taking into account the offset requested, using
copy_to_user. if our copy_to_user returns an error, we free our kernel
page, and return -EFAULT. otherwise, we free our kernel page, and
return the number of bytes copied to the passed in buf.
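The offset handling at the end of proc_callback_read can be sketched as follows. This is a userspace approximation: memcpy stands in for copy_to_user, the sk_ name is hypothetical, and clamping count to the bytes remaining after the offset is an assumption about what "taking into account the offset" means.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * page/length: what the handler produced; buf/count: the reader's buffer;
 * offset: where in the produced data the reader wants to start.
 */
static long sk_callback_read(const char *page, size_t length,
                             char *buf, size_t count, size_t offset)
{
    assert(page != NULL && buf != NULL);
    if (offset >= length)
        return 0;                         /* nothing left: report end of file */
    if (count > length - offset)
        count = length - offset;          /* never copy past the data */
    memcpy(buf, page + offset, count);    /* copy_to_user in the kernel */
    return (long)count;
}
```

Returning 0 once the offset reaches the length is what lets a reader such as cat terminate: its second read starts at the end of the data and sees end of file.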
proc_callback_write is the counterpart to the previous function. it's
called by the later defined proc_om_write_admin, to dispatch writes to
shouldn't we return an error while the /proc/hpc/admin/ directory. first, we truncate the count of data
truncating? reject first! being written in to PAGE_SIZE. then, we reject writes which have
an offset, indicating they are partial writes. we use __get_free_page
to get a GFP_USER page. if this fails, we return -ENOMEM. we copy the
data to be written using copy_from_user. if this fails, we free our
again, isn't this iteration method kernel page, and return -EFAULT. we iterate through the passed in
broken? om_proc_entry_t structure, looking for an entry matching the file we
have been requested to write to. we use the same pointer tricks as
above to dispatch the call to the right function. if we don't find an
entry matching the given file, we free our page, and return -EFAULT.
otherwise, we free the page we requested, and return the length of
data written. we define an admin subsystem's read and write calls with
some #defines, creating wrapper functions proc_om_read_admin and
proc_om_write_admin, wrapping the previous two functions.
proc_om_admin_operations is a structure mapping .read and .write to
proc_om_read_admin and proc_om_write_admin.
openmosix_proc_create_entry is called by openmosix_proc_init to
register the files in /proc/hpc/admin with the /proc filesystem.
we accomplish this by iterating through the passed in om_proc_entry_t
structure, passing each name and mode to create_proc_entry, along with
the passed in proc_dir_entry pointer. in cases where create_proc_entry
where is this called from? fails, we OMBUG() about it. openmosix_proc_init creates /proc/hpc/,
then creates /proc/hpc/admin (using proc_mkdir).
finally, it calls openmosix_proc_create_entry to create the 'files' in
/proc/hpc/admin/. if either proc_mkdir call fails, we OMBUG about it,
and return.
047 omremote creates linux/hpc/remote.c this file is routines for the remote node to request work done on the
home node. remote_disappear is called by both remote_do_syscall in
this file, and remote_handle_user in hpc/copyuser.c when a syscall
fails. its purpose is to kill the process on the remote end. it just
calls do_exit(SIGKILL), and never returns. remote_inode_map is a
vm_operations_struct mapping just the .nopage member to the normal
filemap_nopage function. remote_file_mmap is used by the rdentry code
in hpc/files.c, as a member of the remote_file_operations structure
contained in a dentry on the remote node. it points the provided
vma's vm_ops structure to use the previously defined remote_inode_map.
remote_readpage is also used by the rdentry code in hpc/files.c, as a
member of the remote_aops structure also contained in a dentry on the
remote node. it's called to read a page from the home node, on behalf
of the remote process. first, we map the passed page into kernel space
(so we can write to it). we then get the file * originally associated
with this page on the home node using rfiles_inode_get_file, and
calculate the offset, placing both these in an omp_page_req structure.
we send this structure to the home node with comm_send_hd, get a reply,
and write the reply directly into the page passed in. we set the page
as being up to date, unmap the page, and return 0 indicating success.
if our comm_recv or comm_send_hd return an error, we OMBUG() about it,
mark the page as NOT up to date, as well as marking it as having an
error, and return the error comm_recv or comm_send_hd gave us.
ppc needs to do this as well! remote_do_mmap is called by both arch/i386/kernel/sys_i386.c's
sys_mmap2, and arch/x86_64/kernel/sys_x86_64.c's sys_mmap2. its purpose
is to perform a mmap on the home node, on behalf of the remote process.
we place all our arguments into an omp_mmap_req structure, and send them
to the home node with comm_send_hd. we receive an omp_mmap_ret structure
containing file and isize members. we pass these members to
check the result of task_rfiles_get? task_rfiles_get(), and receive back a file pointer. we down_write the
mmap_sem semaphore belonging to the mm structure of this process, and
do our own do_mmap_pgoff() call. in this way, the mmap is done on
both the remote and home nodes. we up_write the earlier semaphore,
and return the result from do_mmap_pgoff. if either our comm_recv or
comm_send_hd fail, we return the error. remote_wait is called by
remote_do_fork, as well as remote_do_execve. it waits for a req struct
from the home node, and checks its type against the passed in value. it
also checks against the passed in len, to make sure the expected amount
of data was sent. if either of these checks fails, we OMBUG about it,
and return -1. remote_do_signal is called by remote_do_comm when the
deputy stub on the home node sends a signal. we comm_recv an omp_signal
printk! structure, printk a trace message, grab the lock for the current
check __group_send_sig_info's error! process' signal handler, and call __group_send_sig_info() to call the
signal handler. we then drop the signal handler lock, and return 0.
remote_do_comm handles incoming communication from the home node, on
behalf of the remote node. it's set up to comm_recv a req packet, then
check error returned by dispatch DEP_SIGNAL and DEP_COMING_HOME by calling remote_do_signal or
remote_do_signal and task_remote_expel! task_remote_expel, appropriately. we return 0 after dispatching. if
comm_recv fails, or if we get a req that isn't DEP_SIGNAL or
be specific on type of error! DEP_COMING_HOME, we OMBUG() "failed", call do_exit(-1), and return -1.
shouldnt do_exit kill the process? remote_do_syscall is called by om_sys_remote in hpc/syscalls.c to send
why are we returning even? a syscall request from the remote to the home node. we start by
why aren't we using the earlier OMDEBUG_SYS()ing a trace message. then we pack the syscall requested,
remote_disappear function? along with its arguments into an omp_syscall_req structure. we
comm_send_hd this structure, and OMDEBUG_SYS() a trace message. we
call remote_handle_user to handle memory requests from the home node,
on behalf of the syscall handler on the home node. once the home node
tells remote_handle_user it's done issuing requests, remote_handle_user
returns. we then comm_recv an omp_syscall_ret structure. after that, we
OMDEBUG_SYS() another trace message, and return the return value packed
into the omp_syscall_ret structure. if comm_send_hd,
remote_handle_user, or comm_recv return an error, we immediately call
never returns! why do we return after remote_disappear, which kills the process, and never returns. after
remote_disappear? calling remote_disappear, we return -1. remote_do_fork performs a fork
call on both the home node and the remote node, connecting the new
printk! processes together. first, we printk() a trace message, and use
sockaddr_inherit to make the sockaddr for our child have the same type
(ipv4 vs ipv6) and address as our parent's sockaddr. we stuff our
passed in clone_flags, stack_start, stack_size, and pt_regs into an
omp_fork_req structure, then open a listening socket using our child's
sockaddr. when comm_setup_listen returns, we use comm_getname to get http://www.linuxjournal.com/article/7660
the address of the node on the other end of our child socket (in a
sockaddr structure), then add our sockaddr structure to our
omp_fork_req structure. we use comm_send_hd to send our omp_fork_req
to the home node (over our parent's connection). we use remote_wait to
wait for a reply, saving the reply in an omp_fork_ret structure. we then
call do_fork ourselves, and use find_task_by_pid() to locate the child.
we set the child's socket to our newly created socket with
task_set_comm, and return the child's PID. if comm_setup_listen,
specific error messages! comm_getname, comm_send_hd, or remote_wait fails, we OMBUG() about it,
printk! and return -1. if find_task_by_pid fails, we printk() about it, and
return -1. count_len is just a cut-and-paste from fs/exec.c. it appears
to count the length of argv members, but abort if there are more argv
members than requested. we return a count of argv members found, and
set a pointer passed in to the length of the argv entries in total.
remote_do_execve performs an execve system call while on the remote
node. first we get the length of the name of the file we're execve'ing
check strlen_user's return! from userspace with strlen_user(), and place it in an omp_execve_req
structure. then, we add argc to our omp_execve_req, by calling
count_len. we tell count_len not to abort if the number of arguments is
greater than the amount of void * that can fit in (PAGE_SIZE *
MAX_ARG_PAGES - sizeof(void *))/sizeof(void *). we then do the same
thing to fill in the envc member. we copy the passed in pt_regs
structure into our omp_execve_req, then allocate enough space from
GFP_KERNEL to hold our filename, argv, and envp, including newlines.
we fill this space by using copy_from_user to grab the filename, argv, and
envp, terminating each with a newline. we use comm_send_hd to send our
omp_execve_req structure, and comm_send to send the space containing
our filename, argv, and envp. we then free this space. we call
remote_wait to get our response from the home node, and return 0. if
either of our count_len invocations, comm_send_hd, comm_send, or
what about freeing data? remote_wait return an error, we immediately return with that error.
if our kmalloc fails, we return -ENOMEM. if our copy_from_user
invocations fail, we return -EFAULT.
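The buffer that remote_do_execve sends after its omp_execve_req can be pictured as the filename, then every argv string, then every envp string, each followed by a terminator byte. The userspace sketch below shows that packing with malloc and memcpy in place of kmalloc and copy_from_user. The sk_ names are hypothetical, and the terminator is a newline only because the description above says so; the real code may well use a NUL byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_TERM '\n'   /* terminator byte; the notes above say a newline */

/* total bytes needed for a NULL-terminated vector of strings */
static size_t sk_vec_len(char *const v[])
{
    size_t len = 0;

    for (; *v; v++)
        len += strlen(*v) + 1;   /* string plus its terminator */
    return len;
}

/* append s and a terminator at p, return the next free position */
static char *sk_put(char *p, const char *s)
{
    size_t n = strlen(s);

    memcpy(p, s, n);
    p[n] = SK_TERM;
    return p + n + 1;
}

/* returns a malloc'd buffer (caller frees) or NULL; *out_len gets its size */
static char *sk_pack_execve(const char *filename, char *const argv[],
                            char *const envp[], size_t *out_len)
{
    size_t len;
    char *buf, *p;

    assert(filename && argv && envp && out_len);
    len = strlen(filename) + 1 + sk_vec_len(argv) + sk_vec_len(envp);
    buf = malloc(len);
    if (!buf)
        return NULL;             /* -ENOMEM in the kernel version */
    p = sk_put(buf, filename);
    for (; *argv; argv++)
        p = sk_put(p, *argv);
    for (; *envp; envp++)
        p = sk_put(p, *envp);
    *out_len = len;
    return buf;
}
```

Because the lengths are computed before allocating, a single exit path that frees the buffer is enough; that addresses the "what about freeing data?" concern for any failure after the allocation.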
048 creates linux/hpc/service.c this file contains the om_daemonize function, and four sockaddr related
conversion functions. feels like it should be broken up. om_daemonize
creates a kernel thread, and optionally sets a high priority mode.
first we call daemonize(). then, we zero the euid, suid, and gid of
current. we alloc a new group_info with groups_alloc(0). we grab
current->sighand->siglock while emptying the blocked signal set with
sigemptyset(). we lock our task structure, then set a priority. if we
were requested to use a high priority, we use SCHED_FIFO, and set a
rt->priority of 0. we set task_set_stay(DSTAY_RT). otherwise, we set
SCHED_NORMAL, task_clear_stay(DSTAY_RT), and set_user_nice(0).
either way, we set task_set_stay(DSTAY_SYSTEM), then unlock our task
structure. sockaddr_to_string and string_to_sockaddr perform as they
sound, ipv4 or ipv6. sockaddr_setup_port is a wrapper around the
proper inet_setup_port for ipv4 or ipv6. sockaddr_inherit we saw used
not long ago, to set a sockaddr structure up the same as the sockaddr
does IPV6 not have INADDR_ANY? used to create our current link.
49 creates linux/hpc/syscalls.c this file holds the callback points: when the currently running process
calls a syscall, it gets directed here. om_sys_local dispatches a
syscall request to the local kernel (on the remote node).
om_sys_remote calls remote_do_syscall to dispatch a syscall back to the
home node. om_sys_gettid and om_sys_getpid are broken, but should
return the relevant values from current->om.tgid and pid, respectively.
finally, om_sys_execve dispatches a request for the execve syscall to
remote_do_execve.
50 creates linux/hpc/task.c this file contains functions that operate on tasks. that delineation
isn't clear to me, couldn't all this stuff be defined as operating on
tasks? task_set_comm sets the link associated with a task to the passed
in value, returns the old value, and if the new link is flagged
ghosts? SOCK_OOB_IN, we task_set_dreqs(DREQ_URGENT). so far as i can tell, both
these flags are never used anywhere else. task_file_check_stay checks
a given vma to make sure its file mappings don't prevent it from being
migrated. it checks to see if the VM_NONLINEAR flag is set indicating
this vm page has a non-linear file mapping in it. if so, we check to
see if prio_tree_empty indicates the inode's i_mmap has no regions. if
there is a single region of the file mmapped in, we add DSTAY_MONKEY
to the stay flags, because we're using a mmapped file access. next, we
check vma->shared.vm_set.list hoping that it's empty. if it has
contents, it indicates that we're using a shared memory mapping, and we
add DSTAY_MONKEY to the stay flags. next, we check
vma->vm_file->f_dentry->d_inode->i_mode to see if it is S_ISCHR,
S_ISFIFO, or S_ISSOCK. if it is, we add DSTAY_DEV to the flags, as this
is a fifo, socket, or character device file. we then return the stay
flags. task_request_checkstay dispatches requests for re-evaluation of
a process with regard to its stay flags. we clear DREQ_CHECKSTAY,
printk a log message, then check to see if there's a reason we can
shouldn't this have DSTAY_MLOCK? clear (DSTAY_PER_MM | DSTAY_CLONE). assuming there is, we lock the
task, clear its flags, and start the process of re-checking for these
two flags. if task->mm is null, we set DSTAY_CLONE. if mm_realusers is
greater than 1, it means multiple processes are using this mm struct.
in that case, we set stay reason DSTAY_CLONE. if mm->def_flags matches
VM_LOCKED, it means that someone has locked a memory page. we set stay
reason DSTAY_MLOCK. if we marked any flags, we propagate these marks
back to the task, unlock the task, and return. task_request_move
dispatches DREQ_MOVE move requests. it clears om->whereto, then calls
task_move_to_node with that value. it frees the previous om->whereto
string. openmosix_task_init initializes all openmosix related members
of a given task. if it's pid 1 (init), set DSTAY_SYSTEM. if its parent
is DREMOTEDAEMON, set DREMOTE. if its parent is DDEPUTY, set DDEPUTY.
init the head of list task->om.files, then return 0.
openmosix_task_exit exits the current task. if it's not DDEPUTY or
DREMOTE, it's not our process, so return 0. if it's our process, call
task_heldfiles_clear. if we're connected, call comm_close to
disconnect. assuming all went well, return 0. task_wait_contact creates
a WAITQUEUE, sets the task to TASK_UNINTERRUPTIBLE, and calls schedule.
it stays in this loop until om.contact has contents, then removes the
WAITQUEUE entry, sets the task's current state to TASK_RUNNING, and
ends. task_register_migration sets a process up for migration to a
given node. we convert the sockaddr passed in to a string, and store
this string as where to send the process. we set the DREQ_MOVE flag,
call wake_up_process(), mark the process as in need of rescheduling,
and return 0. task_do_request checks for DREQ_MOVE, or DREQ_CHECKSTAY,
and dispatches them to task_request_(move|checkstay).
51 i386 creates this file defines ia32 specific cpu feature detection functions, a
linux/include/asm-i386/om.h function returning argument N passed to a syscall, a function returning
the number of arguments passed to a syscall, the NR_MAX_SYSCALL_ARG,
and a define for getting the user registers from the task struct.
cpu_has_feature_fxsr was declared earlier, and tells us which i387
register format to use. arch_get_sys_arg is simple, since our arguments
are stored in order in our pt_regs structure. arch_get_sys_nb just
returns regs->eax. NR_MAX_SYSCALL_ARG is set to 6.
ARCH_TASK_GET_USER_REGS uses offsets to get the registers out of the
task->thread_info structure.
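a sketch of why the i386 lookup is simple, assuming the argument registers are the first six longs of the register frame, in order. mock_pt_regs only mocks the leading members, and mock_get_sys_arg and mock_get_sys_nb are hypothetical stand-ins for arch_get_sys_arg and arch_get_sys_nb.

```c
#include <assert.h>
#include <string.h>

#define NR_MAX_SYSCALL_ARG 6

/* mock of the leading members of the register frame */
struct mock_pt_regs {
    long ebx, ecx, edx, esi, edi, ebp;  /* syscall arguments 0..5 */
    long eax;                           /* what arch_get_sys_nb returns */
};

/* argument n is just the n-th long from the start of the structure */
static long mock_get_sys_arg(unsigned int n, const struct mock_pt_regs *regs)
{
    long val;

    assert(n < NR_MAX_SYSCALL_ARG);
    /* memcpy keeps the indexed read well defined in portable C */
    memcpy(&val, (const char *)regs + n * sizeof(long), sizeof(val));
    return val;
}

static long mock_get_sys_nb(const struct mock_pt_regs *regs)
{
    return regs->eax;
}
```

an arch whose argument registers are not contiguous in the frame cannot use this and needs a switch on n instead.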
52 i386 creates this file contains definitions for the architecture specific structures
linux/include/asm-i386/om-protocol.h used when transferring a process between ia32 machines. we define
MIG_ARCH_I386_LDT, which is a flag for arch-i386 telling it to transfer
the local descriptor table (which is not commonly changed by a process)
along with migrated processes. the LDT is changed by programs such as
wine and qemu. omp_mig_fp contains the floating point context of a
process. it includes a flag for which format of register save function
was used. omp_mig_arch is supposed to be for architecture specific
process functionality, but such functionality is still a stub.
omp_mig_arch_task is the structure containing architecture specific
members of the process context.
53 i386 linux/include/asm-i386/uaccess.h this header handles simple reads and writes to/from userspace. we hpc/uaccess.h
remote-memory modify it so that we use openmosix's deputy_put and deputy_get
functions. the first hunk just includes our header. the second hunk
modifies get_user calls so that deputy_get_user is called if
cleanup line endings, make code blend openmosix_memory_away(). the third hunk modifies the first of the
in better put_user definitions so that deputy_put_user is called if
openmosix_memory_away(). both the second and third hunks use the trick
of just putting an if{}else in front of the current switch statement.
the fourth hunk just modifies the second put_user definition so that
we just return the result of deputy_put_user if openmosix_memory_away.
the fifth hunk defines deputy_put_user64_helper, which is a wrapper for
calling deputy_put_user64 from the following assembly. we then add
logic to this definition of __put_user_size to call deputy_put_user or
deputy_put_user64_helper in cases where we want to copy 8 bytes. we use
the if{}else switch method discussed earlier. the next patch modifies
the second declaration of __put_user_size, with the same if{}else trick
only with an if following. the seventh modifies __get_user_size. the
#ifdef CONFIG_OPENMOSIX elsewhere? eighth patch adds code surrounded in #ifdef CONFIG_OPENMOSIX. it returns
CONFIG_OPENMOSIX policy? the result of deputy_copy_to_user. the ninth adds a
deputy_copy_from_user call surrounded by #ifdef CONFIG_OPENMOSIX.
reorg this code. the last hunk is surrounded by #ifdef CONFIG_OPENMOSIX, and re-defines
strlen_user to check openmosix_memory_away, and call deputy_strlen_user
when necessary.
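the if{}else-in-front-of-the-switch trick mentioned for the get_user and put_user hunks looks roughly like the userspace sketch below. memory_away, remote_mem, and mock_deputy_get_user are mocks standing in for openmosix_memory_away() and deputy_get_user, and get_value is a hypothetical name; the real macros differ.

```c
#include <assert.h>
#include <string.h>

static int memory_away;              /* mock for openmosix_memory_away() */
static unsigned char remote_mem[16]; /* mock of the remote process memory */

/* mock for deputy_get_user: fetch size bytes from the "remote" side */
static int mock_deputy_get_user(void *dst, size_t off, size_t size)
{
    if (off > sizeof(remote_mem) || size > sizeof(remote_mem) - off)
        return -1;                   /* stands in for -EFAULT */
    memcpy(dst, remote_mem + off, size);
    return 0;
}

static int get_value(void *dst, const void *local_src, size_t remote_off,
                     size_t size)
{
    int err = 0;

    assert(dst != NULL);
    if (memory_away) {
        err = mock_deputy_get_user(dst, remote_off, size);
    } else switch (size) {           /* the pre-existing switch, untouched */
    case 1:
    case 2:
    case 4:
        memcpy(dst, local_src, size);
        break;
    default:
        err = -1;                    /* unsupported access size */
    }
    return err;
}
```

the appeal of the trick is that the diff against the original header stays tiny: the switch body is not re-indented or edited, it just gains an if in front of it.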
54 ppc creates include/asm-ppc/om.h this file contains architecture specific code for returning a single
passed in argument in a syscall, or finding the number of arguments
passed to a syscall, the maximum number of arguments passable to a
syscall, and a define for retrieving the pt_regs structure associated
with this execution thread. arch_get_sys_arg checks to make sure you're
not requesting an argument greater than 31, then returns gpr[1..32].
arch_get_sys_nb returns gpr[0]>>2. we define NR_MAX_SYSCALL_ARG 7,
and use an index method to get the pt_regs structure out of
task->thread_info.
55 ppc creates include/asm-ppc/om-protocol.h this file contains process state structures that are architecture
specific. the first structure here is the floating point state, which
is properly filled out. the omp_mig_arch and omp_mig_arch_task
structures are, however, empty.
56 x86-64 creates include/asm-x86_64/om.h this file contains architecture specific functions for returning
arguments to the syscall invocation right before we were called,
returning the number of arguments that were passed to the syscall,
a define for the maximum number of syscall arguments allowed on this arch,
and a define returning the address of the pt_regs structure in the
thread_info structure for this process. arch_get_sys_arg uses a switch
statement to select the right register, instead of an index based
method like the other two arches. arch_get_sys_nb returns the bottom
half of register rax (32 bits). we define NR_MAX_SYSCALL_ARG to 6, and
use an index based method to get struct pt_regs, which is the last item
in the thread_info struct.
57 x86-64 creates this file contains structures for process state that are architecture
include/asm-x86_64/om-protocol.h specific. omp_mig_fp holds the floating point context. omp_mig_arch is
a stub as usual, and omp_mig_arch_task holds the segmentation
registers, and the TLS entries associated with this task.
58 x86-64 creates this file contains a simplified version of the modifications in i386's hpc/uaccess.h
include/asm-x86_64/uaccess.h uaccess.h, due to this being a 64 bit architecture, which makes
operations on 8 bytes at a time much easier. the first hunk includes
hpc/uaccess.h. the second invokes deputy_get_user inside of get_user.
the third invokes deputy_put_user inside of put_user. the fourth
puts deputy_copy_from_user inside __copy_from_user, using a define of
CONFIG_OPENMOSIX to decide whether to compile it in or not. the fifth
puts deputy_copy_to_user inside __copy_to_user, using the same define
as the last hunk.
59 om-core creates include/hpc/arch.h this file prototypes the architecture specific functions for sending hpc/protocol.h
and receiving process pieces, dispatching a local syscall (after we've asm/om.h
intercepted it), and starting a recently assembled process. worth noting hpc/syscalls.h
is that we include protocol.h, define almost everything, include om.h
and syscalls.h, THEN define arch_exec_syscall, so we have the
syscall_parameter_t * to pass in.
60 comm creates include/hpc/comm.h this file declares all of the communications subsystem, both the part
that is a wrapper of the sockets interface, and a separate section
dead code that wraps OM datastructure sending routines. we start by defining
SOCK_INTER_OPENMOSIX, which is unused. then we define SOCK_OOB_IN,
which is tested for, never set, and causes another bit to be set,
which is never tested for. the next nine defines are timeout
definitions used just once, in comm_setup_tcp, or not at all.
MIG_DAEMON_PORT is a duplicate of REMOTE_DAEMON_PORT. the first
grouping of functions is a sockets wrapper set. the second is openmosix
specific shortcut functions. one to setup, one to listen, one to
connect, one for sending large data chunks, and one for sending
messages. after that is a define that belongs in hpc/task.h for
task_set_comm.
61 creates include/hpc/comm-ipv4.h this header contains string_to_inet and inet_to_string functions,
and inet_setup_port, which is a wrapper that just sets sa_in->port
to the passed port.
62 creates include/hpc/comm-ipv6.h this header is similar to the last, it contains string_to_inet6 and
inet6_to_string, and inet6_setup_port, which wraps sa_in6->sin6_port.
63 creates include/hpc/debug.h this header prototypes a series of debugging functions matching the
regex proc_debug_get_(loadinfo|admin|lfree_mem|pkeep_free|nodes)
(which don't exist in our sourcebase), defines arch specific
om_debug_regs, defines debug_mlink, debug_page, and debug_vmas, defines
debug_regs (which doesn't exist), and defines some macros for printk.
OMDEBUG_* are defined as macros of OMDEBUG.
64 creates include/hpc/hpc.h this file contains the kernel's API to openmosix. it prototypes all of
the higher level functions of openmosix. user_thread is a function for
creating a user thread. info_startup does not exist. openmosix_proc_*
are part of proc.c. the next 6 functions are part of kernel.c. i might
mention that unlike most headers, this header has comments noting where
each item is located. the rest of this file belongs to task.c.
65 creates include/hpc/mig.h this file prototypes the migration daemon, defines the port we listen
on, and prototypes six migration related functions. the first two,
mig_do_receive and mig_do_send, are entry points into their respective
files. mig_send_hshake and mig_recv_hshake are for establishing a
connection. task_move_to_node and task_expel perform the actual task
of migration.
66 creates include/hpc/omtask.h this file contains the openmosix_task structure. so far as i can tell,
the features[] declared at the end, and thus the first define, is
unused. the rest of this file seems to be well commented.
67 creates include/hpc/proc.h header exclusively used by proc.c. the first four functions are not
directly called by name, but instead are defined, because the E()
definition in proc.c will direct operations we've filled in 0 for to
these functions. om_proc_entry is defined to hold an entry in the
/proc/hpc/admin/ directory. it holds the set and get function pointers,
along with the name, mode, length, and type. om_proc_pid_entry is
defined to hold an entry in the /proc/$PID/om/ directory. it holds set
and get function pointers, along with the name, mode, length, and type.
68 creates include/hpc/protocol.h this file defines all the flags, bitmasks, and datastructures
spacing issues associated with the architecture independent part of the openmosix
inter-kernel wire protocol. it's reasonably well commented. note that
DEP_FLG, MIG_FLG, and REM_FLG are not used outside of this header.
REPLY is used all over the place. the next 21 defines use their own 8
bit value, ored with _FLG's 8 bit value (in the 16 bit place), to make
a constant 16 bit value (flag+type). omp_mig_task contains the complete
context of a task, both architecture independent parts, and an
architecture dependent structure (with a definition inherited from our
arch/ code). omp_mig_mm contains values defining the process memory
layout out of the mm struct. omp_mig_vma contains values out of a given
vm_area_struct required to reconstruct that vma on the remote end. it's
worth noting that file and dentry don't mean much when remote.
use define for constant 7 omp_syscall_req and omp_syscall_ret are used to perform remote
syscalls. omp_fork_req and omp_fork_ret are used to perform a fork
on home, from remote. omp_usercopy_req and omp_usercopy_emb are part of
the deputy kernel to remote process memory access API. omp_page_req is
a structure containing a request for a page of a file that's open on the
home node, from the remote node. omp_mmap_req holds a request to mmap
out of order? a chunk of a file on the home node, from the remote node.
omp_execve_req passes the arguments for an execve request to the home
node, from the remote node. omp_execve_ret holds the return value from
the home node. omp_mmap_ret returns the results of a mmap request.
omp_signal is used for passing signals both directions.
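the flag+type composition described for the 21 defines can be sketched as follows, assuming the _FLG value occupies the high byte and the per-message value the low byte. all EX_* names and values are placeholders, not the real constants from include/hpc/protocol.h.

```c
#include <assert.h>
#include <stdint.h>

/* placeholder flag values, standing in for DEP_FLG / MIG_FLG / REM_FLG */
#define EX_DEP_FLG 0x01u
#define EX_MIG_FLG 0x02u
#define EX_REM_FLG 0x04u

/* build a 16 bit constant: flag in the high byte, type in the low byte */
#define EX_MAKE_TYPE(flg, type) \
    ((uint16_t)((((flg) & 0xffu) << 8) | ((type) & 0xffu)))

/* split a 16 bit constant back into its two halves */
#define EX_TYPE_FLG(v) ((uint8_t)(((v) >> 8) & 0xffu))
#define EX_TYPE_ID(v)  ((uint8_t)((v) & 0xffu))

/* hypothetical example message type */
#define EX_MIG_EXAMPLE EX_MAKE_TYPE(EX_MIG_FLG, 0x05u)
```

with this layout a receiver can route on the high byte alone (deputy, migration, or remote traffic) before looking at the individual message type.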
69 creates include/hpc/prototype.h this file contains debugging related defines, prototypes for the three
top level functions called as a deputy, the definition of the rfiles
API and structures, and the definition of the communications system on
the remote node. the first ifdef block handles turning on and off
whether OM_VERBOSE_MIG does something or not, based on if
CONFIG_OPENMOSIX_MIGRATION_VERBOSE is defined. we then define OMBUG as
a macro of printk(). deputy_die_on_communication is called when the
deputy process gets an error recv()ing data from the remote.
deputy_main_loop is basically main() for the deputy process.
deputy_startup initializes the environment for deputy_main_loop, so
that the next time this process is scheduled, openmosix_pre_usermode
kicks us into deputy_main_loop. om_held_file is the structure
representing a file to the deputy process. list contains a pointer
to the rfiles entry, which is never used. rfile_inode_data holds a
file on remote, with data for sending requests home.
task_heldfiles_add is called in deputy_do_mmap_pgoff to add a file
to our list of managed file pointers on the deputy.
task_heldfiles_clear is called by openmosix_task_exit on process exit,
via a later hook in kernel/exit.c. its role is to safely call fput()
on om_held_file->file. task_heldfiles_find is called by
deputy_do_readpage to find the local om_held_file corresponding to a
remote file *. task_rfiles_get is called on the remote by
mig_do_receive_vma to get a file * to a file created by
rdentry_create_file(). rfiles_inode_get_file is called by
remote_readpage to get a file * from a file->f_dentry->d_inode passed
in. it's just a wrapper. at this point there's a break in the file, and
we start prototyping remote_ functions. remote_disappear is in
remote.c. remote_mmap does not exist. remote_do_syscall,
remote_do_comm, remote_do_fork, remote_do_mmap, remote_file_mmap, and
remote_readpage are in remote.c.
70 creates include/hpc/service.h this is the header for service.c. it prototypes all of the functions in
naming of variables in prototypes service.c. sockaddr_to_string and string_to_sockaddr are just
conversion functions. sockaddr_setup_port calls the right
inet6?_setup_port, based on whether we are using ipv4 or ipv6.
sockaddr_inherit fills sa with the same kind of sockaddr as was used in
the creation of passed in socket * mlink. om_daemonize creates an OM
daemon. it's used to create omkmigd.
71 creates include/hpc/syscalls.h this is the header for syscalls.c. all of the definitions here are used
in syscalls.c, and only syscalls.c.
72 creates include/hpc/task.h this file contains some init related datastructures, dflags dreqs and
dstay related constants, task_ related function definitions and
prototypes, and two task_ related datastructures. this entire file
is wrapped in CONFIG_OPENMOSIX, and if that is not defined, defines
OPENMOSIX_INIT_TASK and OPENMOSIX_INIT_MM as comments. otherwise,
OPENMOSIX_INIT_TASK is defined to a structure for initializing the .om
members of the task structure, and OPENMOSIX_INIT_MM is defined to
init the .mm_realusers member of the mm structure. the dflags related
defines seem well commented to me, except for DREMOTEDAEMON. the
openmosix migration daemon is marked DREMOTEDAEMON. when it spawns a
child, that child process is automatically marked DREMOTE by the code
in openmosix_task_init. DREQ_URGENT is used once, in code that is
inherited from the 2.4 branch, and does nothing, ATM. DSTAY_PER_MM is
used along with DSTAY_CLONE in task_request_checkstay, as a list of
reasons to check, and flag. task_(clear|set|test)_* could have been
created as a macro. it's worth noting that dreqs related wrappers use
atomic_* functions, where the others just use bitmasks on values
directly. task_add_balance_reason does not exist. task_check_stay does
not exist.
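to illustrate the remark that task_(clear|set|test)_* could have been generated by a macro, here is a userspace sketch. mock_om_task and every mock_* name are hypothetical, and C11 stdatomic stands in for the kernel's atomic_* functions on the dreqs member.

```c
#include <assert.h>
#include <stdatomic.h>

struct mock_om_task {
    unsigned int dflags;  /* plain bitmask */
    unsigned int stay;    /* plain bitmask */
    atomic_uint dreqs;    /* requests, updated atomically */
};

/* generate set/clear/test helpers for one plain bitmask member */
#define DEFINE_PLAIN_FLAG_OPS(name)                                          \
static inline void mock_set_##name(struct mock_om_task *t, unsigned int m)   \
{ t->name |= m; }                                                            \
static inline void mock_clear_##name(struct mock_om_task *t, unsigned int m) \
{ t->name &= ~m; }                                                           \
static inline unsigned int mock_test_##name(struct mock_om_task *t,          \
                                            unsigned int m)                  \
{ return t->name & m; }

DEFINE_PLAIN_FLAG_OPS(dflags)
DEFINE_PLAIN_FLAG_OPS(stay)

/* the dreqs wrappers differ: they go through atomic operations */
static inline void mock_set_dreqs(struct mock_om_task *t, unsigned int m)
{ atomic_fetch_or(&t->dreqs, m); }
static inline void mock_clear_dreqs(struct mock_om_task *t, unsigned int m)
{ atomic_fetch_and(&t->dreqs, ~m); }
static inline unsigned int mock_test_dreqs(struct mock_om_task *t,
                                           unsigned int m)
{ return atomic_load(&t->dreqs) & m; }
```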
73 creates include/hpc/uaccess.h this header is included from include/$ARCH/uaccess.h. it prototypes the
API the kernel uses to access remote memory. it's wrapped in
what about user64? CONFIG_OPENMOSIX, and if that is not defined, defines openmosix_memory_away,
deputy_put_user and deputy_get_user as no-ops. otherwise, we
prototype our deputy_ functions for doing memory access from the home
kernel to a remote process, then we define openmosix_memory_away. the
first if in openmosix_memory_away checks if we're not IN a process. the
second checks if we're in a deputy process.
74 creates include/hpc/version.h defines to create version tuples, all of which are set to 0. we'll worry
about versions once we have A working version.
75 include/linux/compiler.h here we define OM_NSTATIC and KCOMD_NSTATIC, which plainly mean, "if
KCOMD or OPENMOSIX are defined, don't make these static.". otherwise,
we just define them to nothing.
76 creates include/linux/hpc.h this header just includes two other headers, and defines OM_MM(task),
which we only use in mm/mmap.c. if we're not CONFIG_OPENMOSIX, OM_MM
evaluates to 1(true).
77 include/linux/init_task.h here we add code to call the earlier OPENMOSIX_INIT_TASK and
OPENMOSIX_INIT_MM defines, to initialize openmosix specific members of
trash in patch the task and mm structures.
78 include/linux/net.h this is the last fragment of the code to make sock_alloc an exported
kernel symbol. all we're doing is adding a prototype for it in this
header.
79 include/linux/sched.h here we actually add our members to the task and mm structures. one of hpc/omtask.h
our fragments is commented, one isn't.
80 DROP include/linux/signal.h name some function parameters, to make it easier to read.
81 kernel-kcom kernel/exit.c this file includes modifications to export exit_mm and linux/hpc.h
reparent_to_init so that our openmosix code can call them, and code
trash in patch to modify task destruction. the first hunk is bogus. the second hunk
includes our header, and prototypes exit_mm as OM_NSTATIC. the third
declares reparent_to_init as OM_NSTATIC. the fourth declares exit_mm
as OM_NSTATIC and delays the call of mm_release until later. the
fifth calls openmosix_task_exit, which completes the delayed call of
mm_release earlier.
82 kernel/fork.c this is where we modify mm_init, dup_mm, copy_mm, copy_process. the linux/hpc.h
first hunk includes our header. in the second hunk, we change mm_init
to also initialize the mm_realusers member of the mm structure. in hunk
three, we modify dup_mm so that on success of creating a new mm struct
for the current task, we remove its DSTAY_CLONE flag. in the fourth
hunk, we modify copy_mm so that if we were asked for CLONE_VM, we mark
the old vm as having another user before using it for the new process.
in the fifth hunk, we modify copy_process to call openmosix_task_init
to init the openmosix members of the newly created task structure. the
whitespace cleanups next patch adds our call to openmosix_pre_clone at the top of do_fork,
the one after that adds our call to openmosix_post_clone at the end of
a successful run through do_fork, before the return.
83 kernel-kcom kernel/sched.h make task_rq_lock and task_rq_unlock OM_NSTATIC.
84 MAINTAINERS add Vincent to the MAINTAINERS file.
85 config Makefile three changes. one adds -om to the end of a kernel name in a debian
kernel-package incompatible way. the second adds a rule for running an
unsparse program, that i believe is a remnant of 2.4, and can be removed.
the third adds hpc to the list of core-y directories to build.
86 mm/mlock.c we modify mlock.c for updating stay flags. sys_mlock marks a process
DSTAY_MLOCK, and sys_munlock unmarks it, sys_mlockall marks a process
DSTAY_MLOCK, and sys_munlockall unmarks it.
87 mm/mmap.c this file gets updated to update the stay reasons when mmap related linux/hpc.h
events occur. the first hunk includes our header. the second hunk
causes __remove_shared_vm_struct to notify openmosix when removing the
last node in a shared memory segment, or when we are both parent and
head of the vm list (there are no other shared users). the third adds
the stay_reason variable. the fourth, fifth, and sixth make sure
there is a MM struct associated to this task before accessing it in
do_mmap_pgoff. the seventh marks us DSTAY_MONKEY if the mmap is
writable, DSTAY_MONKEY if i_mmap_writable, and DSTAY_DEV if S_ISCHR.
the eighth calls deputy_do_mmap_pgoff if all the above tests cleared.
the ninth passes stay_reason to stay_me_and_my_clones. the last one
whitespace? makes it so that get_unmapped_area always returns PAGE_ALIGN pages.
88 net/socket.c the last of the code exporting sock_alloc so kcom can use it.
1 hpc/copyuser.c documentation updates. the first hunk adds docs for three arguments to
deputy_copy_from_user. the second adds docs for three args to
deputy_strncpy_from_user. the third adds docs for three args to
deputy_copy_to_user, but the third argument is named wrong.
2 hpc/kcomd.c documentation updates. the first hunk adds a comment labeling and
describing socket_listen. the second and third fix spacing issues,
someone was using tabs. the fourth labels and fully describes
accept_connection. the fifth documents data_write, which is a stub.
the sixth and seventh are more tab updates. the eighth starts with a
spacing update, then adds a label and description for kcomd_thread.
the ninth is another spacing update.
3 hpc/migsend.c the changes in this patch document the mig_send_* and mig_do_send
functions. the first hunk adds a label and description to mig_send_mm.
it mentions that mig_send_mm waits for acknowledgement. the second
hunk adds a label and description to mig_send_vmas. no acknowledgement
label is present. the third hunk adds a label and description to
mig_send_pages, again with no acknowledgement label. the fourth hunk
adds a label and one line description to mig_send_proc_context. the
fifth adds a well formatted one line description and label to
mig_do_send.
openmosix-cleanup.patch
1 kcomd hpc/kcomd.c the first hunk corrects error handling in socket_listen so that
received datastructures are properly destroyed. the second hunk
corrects error handling, and adds error messages in each error
one missing error condition. it also adds code to inherit the ops and type of our
message passed socket into the sock dedicated to this connection, and
code to retrieve the address of the peer we're talking to.
we're still not checking the peer's address.
2 om-rmem include/hpc/hpc.h prototype remote_handle_user, declared in copyuser.c.
3 kcore include/hpc/mig.h prototype reparent_to_init, which is part of kernel/exit.c
4 om-remote include/hpc/migrecv.c prevent gcc warning.
5 kcore include/net/socket.h prototype sock_alloc so we can use it elsewhere.
6 kcore-DROP include/linux/compiler.h include linux/config.h. what was the purpose of this?
7 kcore linux/net/socket.c change sock_alloc's declaration so it is no longer static,
period.
openmosix-kcomd-base-functions.patch
001 hpc/kcomd.c this patch is the first in a series of patches rebuilding kcomd. we <linux/inet.h> <hpc/kcom.h> <hpc/prototype.h>
start by adding three headers, and declaring a global variable,
indicating whether kcomd is running. the next hunk removes code that
shouldn't be in a .c file, namely the kcom_pkt, kcom_node, and kcom_task
structures, the kcom_nodes list and its lock, socket_fds,
socket_fds_bitmaps, and maxfds. it also removes the helper functions
alloc_fd_bitmap and kcom_pkt_create, all of the *kcom*node*
list functions, the comm_simple stub, and the prototypes for the
non-existent functions comm_ack, comm_iovec, and comm_iovec_ack.
the third and fourth hunks change the kcom_node_add call in
accept_connection to return a node pointer, and store the address of
the remote end in the node pointer. the next hunk creates the
data_send, data_exception, append_in_packs, and pkt_read functions,
destroying data_read, dispatch, kcom_task_create, kcom_task_delete,
__kcom_task_find, kcom_task_find, and kcom_task_send functions. we
also flesh out data_write. data_send first marks down the time it
similar to comm_send? starts, then uses sock_sendmsg to send the passed in kcom_pkt structure
and its size to the remote end. we then check the size of the data
member pointed to by the kcom_pkt, and if it's less than 32, copy it
into a 32 byte buffer, and send that buffer and a length of 32 to the
remote end. otherwise, we send kcom_pkt->data and its length to the
bad error message formats remote end. our while loops for sending are wrapped to use KERNEL_DS,
and restore fs to its saved state upon exiting the while. after exiting
the while loop that sends the data, we mark down the time. notice that
we don't do anything with our time measurements. we return the amount
of data written on success (not including the kcom_pkt wrapping it).
data_exception is supposed to clean up in case of dropped connection.
according to the comment, it's broken, and its free calls are commented
out. append_in_packs places a passed kcom_pkt into the queue belonging
to the task the packet is marked as destined for. it examines the
passed kcom_pkt->type to determine whether this packet was created on
behalf of a deputy process or a remote process, and places the packet
in the queue belonging to rpid or hpid, respectively. it then wakes
typos! up the process in question. pkt_read is called to read a packet and
either place it in a queue to a destined process
(with append_in_packs), or dispatch it immediately due to it being a
error handling. migration related request (go home, come home, init). data_write
iterates through each task that has a process on the passed in node,
and uses data_send to send pending packets. after send, packets have
spacing. their memory freed. there is no error checking in this function.
the next two hunks perform a major overhaul on the kcomd_thread
function, add two flags to the kernel_thread invocation that creates
the kcomd_thread, and flesh out kcomd_exit. the changes to
kcomd_thread start by utilizing kmem_cache_create to create several
caches that are never used (but are properly destroyed later). we block
all signals to the current process right after calling daemonize,
add code to alloc_fd_bitmap just once outside the while loop, and move
some variables out of the while loop, into the top of the function.
inside of the while loop, we've disabled our locking functions around
way too much commented out code kcom_nodes_lock, and we've inserted a lot of debugging code that's
commented out. we're measuring the time the while loop takes to
complete, and starting the do_select part of the function. we set up
to measure the time used, and we insert a new method of using
do_select. we enable SIGHUP, and sleep until we get it from kcom_send.
we insert much better error handling code, and dynamically allocate a
fd pointing to the socket for the node. our bit testing section has
been completely re-written, and we actually clean up on exit of kcomd.
we add CLONE_FS and CLONE_FILES to the kernel_thread call in
kcomd_init, and flesh out kcomd_exit by setting a global variable and
sending SIGHUP if we can find the kcomd task.
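the small-payload rule described for data_send can be sketched like this. pick_payload and EX_MIN_PAYLOAD are hypothetical names, and zero-filling the unused tail of the 32 byte buffer is an assumption; the description only says the data is copied into a 32 byte buffer.

```c
#include <assert.h>
#include <string.h>

#define EX_MIN_PAYLOAD 32

/* hypothetical helper: payloads shorter than 32 bytes are copied into
 * a 32 byte scratch buffer and sent at that length; anything else is
 * sent directly. returns the pointer to transmit and stores the length
 * to transmit in *out_len. */
static const void *pick_payload(const void *data, size_t len,
                                unsigned char scratch[EX_MIN_PAYLOAD],
                                size_t *out_len)
{
    assert(out_len != NULL);
    if (len < EX_MIN_PAYLOAD) {
        memset(scratch, 0, EX_MIN_PAYLOAD);  /* assumption: zero padding */
        if (len)
            memcpy(scratch, data, len);
        *out_len = EX_MIN_PAYLOAD;
        return scratch;
    }
    *out_len = len;
    return data;
}
```

the receiver then always reads at least 32 bytes of payload, so it presumably relies on the length recorded in the kcom_pkt header to know how much of that is real data.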
002 config hpc/Makefile add kcom.o to our list of object files.
003 omcore include/hpc/protocol.h our first hunk just corrects a spacing issue. the second hunk re-defines
how we set our constant flags. in general, it's a nice cleanup, but
could use more docs. the third hunk is noise. drop.
004 omcore include/hpc/prototype.h we add a whole bunch of declarations to functions we don't have, and
some we do, and some we just added. no surprise, since this patch is
part of a set.
005 ommig include/hpc/task.h change the prototype to task_register_migration, so that we no longer
require a sockaddr, just a task.
006 kcore net/socket.c EXPORT_GPL(sock_alloc) if CONFIG_KCOMD or CONFIG_KCOMD_MODULE
007 kcore fs/select.c EXPORT_GPL(do_select) if CONFIG_KCOMD_MODULE
008 kcore include/linux/compiler.h don't make functions declared KCOMD_NSTATIC static if CONFIG_KCOMD or
CONFIG_KCOMD_MODULE is defined.
openmosix-kcomd-migsend-to-kcomd.patch
001 linux/hpc/migsend.c our first hunk just adds some headers, the second is spacing related
noise, drop. the third changes mig_send_fp to use kcom_send_with_ack
instead of comm_send_hd. the fourth hunk re-writes mig_send_mm to use
kcom_send_with_ack, only it also stops using an omp_mig_mm structure,
and instead just relies on sizeof(omp_mig_mm). the fifth hunk changes
mig_send_vmas to preserve vm_pgoff during transmission, and use
kcom_send_with_ack instead of comm_send_hd. the sixth changes
mig_send_pages to allocate a page of memory, copy our data there, and
no error checking. send from that buffer using kcom_send_with_ack. the next two hunks swap
out comm_recv with kcom_send_with_ack inside of mig_send_proc_context.
hunk nine uses kcom_send_with_ack at the top of mig_do_send to request
permission to migrate a process, before jumping in to sending. the
final hunk printk's when a process migrates successfully, and changes
the fail_mig routine to print an error, and not to send anything to the
remote end if we fail to migrate.
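the margin note on mig_send_pages flags missing error checking. a minimal userspace sketch of the allocate, copy, send pattern with every step checked might look like the following. malloc stands in for the kernel allocator, and send_page_checked, send_with_ack_fn, SKETCH_PAGE_SIZE and the two fake transports are hypothetical names; the real kcom_send_with_ack signature is not given in these notes and is not assumed here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096  /* stand-in for the kernel's PAGE_SIZE */

/* hypothetical transport hook: returns 0 once the peer has acknowledged. */
typedef int (*send_with_ack_fn)(const void *buf, size_t len);

/* copy one page into a bounce buffer and send it, checking every step. */
int send_page_checked(const void *page, send_with_ack_fn send)
{
    void *buf;
    int err;

    if (page == NULL || send == NULL)
        return -1;                 /* bad arguments */

    buf = malloc(SKETCH_PAGE_SIZE);
    if (buf == NULL)
        return -2;                 /* allocation failed; nothing was sent */

    memcpy(buf, page, SKETCH_PAGE_SIZE);
    err = send(buf, SKETCH_PAGE_SIZE);
    free(buf);                     /* released on success and on failure */

    return err;                    /* 0 on ack, the transport's error otherwise */
}

/* two trivial transports for exercising the function. */
int fake_send_ok(const void *buf, size_t len)
{
    (void)buf;
    return len == SKETCH_PAGE_SIZE ? 0 : -3;
}

int fake_send_fail(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -5;
}
```

the point of the sketch is only that the buffer is freed on both paths and that the caller sees the transport's error instead of an unconditional success.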
openmosix-kcomd-proc-to-kcomd.patch
001 omproc hpc/proc.c the first hunk adds three headers. two for inet related functions, one <linux/inet.h>
for kcom. the rest of the file changes both proc_pid_set_where, and <linux/in.h>
proc_pid_get_where functions. in proc_pid_set_where, the first real
difference is that instead of just printing home detected and
reacting, we print "HOME detected - on deputy node", and how we
accomplish the migration depends on whether we are the deputy or the
remote. home on deputy sends MIG_COME_HOME to tsk on remote. home
on remote node calls task_register_migration. IP on deputy is broken,
IP on remote calls task_register_migration. proc_pid_get_where is
modified so that instead of using comm_getname and sockaddr_to_string,
examine portability. we use variable assignments, and bitmask tricks with sprintf.
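on the "examine portability" note for proc_pid_get_where: one endian-independent way to do the bitmask-and-sprintf formatting is to shift and mask a host-order value, as sketched below. format_ipv4 is a hypothetical helper, not the code in the patch, and it assumes the caller has already converted the address from network order (for example with ntohl).

```c
#include <stdint.h>
#include <stdio.h>

/*
 * format an IPv4 address held as a host-order 32-bit value.
 * shifting and masking a host-order integer gives the same text on
 * little- and big-endian machines. returns what snprintf returns.
 */
int format_ipv4(uint32_t host_addr, char *buf, size_t len)
{
    return snprintf(buf, len, "%u.%u.%u.%u",
                    (unsigned)((host_addr >> 24) & 0xff),
                    (unsigned)((host_addr >> 16) & 0xff),
                    (unsigned)((host_addr >> 8) & 0xff),
                    (unsigned)(host_addr & 0xff));
}
```

masking a raw network-order value directly, without the conversion step, is the variant that produces reversed octets on one byte order or the other.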
openmosix-kcomd-remote-to-kcomd.patch
001 remote hpc/remote.c the first hunk includes two header files. the second hunk revamps <hpc/kcom.h>
remote_do_signal, making it accept a packet in its parameters, instead
trash of trying to comm_recv a packet off the queue. we also transmit an
acknowledgement packet, waking up kcomd with SIGHUP to do so. except
that we're not looking for kcomd, and don't declare the variable
we break things! kcomd's pid is stored in. therefore, -EBROKENCODE. the next hunk
disables calling remote_do_signal inside of remote_do_comm. the last
two hunks change remote_do_syscall so that we transmit our packet by
calling kcom_send_with_ack, and setting ourselves to TASK_INTERRUPTIBLE.
we call remote_handle_user to dispatch memory requests from the home
node (unless this syscall is exit, in which case we just exit). when it
returns, it returns with the result of our syscall. we return this
result.
openmosix-kcomd-task-to-kcomd.patch
001 om hpc/task.c our first hunk includes the kcom.h header. the second hunk changes our <hpc/kcom.h>
task_move_to_node invocation in task_request_move to not clear
use omdebug() instead of printk om.whereto or free its memory. the third hunk adds a debugging printk
spacing at the top of openmosix_task_init. the fourth hunk is just a spacing
fix, drop. the fifth hunk starts with a spacing fix in
openmosix_task_exit, but continues on to change from just clearing
heldfiles and closing connection, to dumping stack, calling
kcom_task_delete(), clearing heldfiles, and freeing task->om.whereto.
the final hunk changes task_register_migration so that it no longer
accepts a destination as a parameter, doesn't mess with
task->om.whereto, and is exported via EXPORT_SYMBOL_GPL.
openmosix-kcomd-migctl-to-kcomd.patch
001 omrecv hpc/migrecv.c our first and second hunks just include three headers for us. the third
hunk adds the mig_do_receive_home, and mig_do_receive_init functions,
;; using EXPORT_SYMBOL_GPL to export them. we create mig_do_receive_home,
which is a function for achieving a move from a remote node back to
the home node. we are given a packet via a passed in argument, and
check if it is marked PKT_NEW_MSG. if it is, we assume this is the
spacing home node being migrated to, send a PKT_ACK packet, call
task_register_migration and return 0. if the packet passed in was not
marked PKT_NEW_MSG, we assume this is the remote node being migrated
from, and call wake_up_process on the task in question. we return
spacing 0 for success, -1 for failure. the mig_do_receive_init function is
called by kcomd with a MIG_INIT packet, to do the "work" of setting
up a process on the current (remote) node on behalf of a remote node.
first, we check to see if the packet passed in as an argument was
error handling! marked PKT_NEW_MSG. if it isn't, we just return 0. if it is, we begin
constructing our response packet, and check whether this is migration
why treat loopback differently? via loopback, defined as 127.0.0.1. if it is, we use
why loopback migrate at all? kcom_home_task_find to find the kcom_task structure associated with
what if loopback isn't 127.0.0.1? ipv6? the original process. otherwise, we use kcom_task_create to make a
new task, and return its kcom_task structure, and we copy the PID of
the task on the home node from our MIG_INIT packet to kcom_task->hpid.
many comments indicating this code we delete the packet we were called with, and call user_thread to
needs help! handle migration (via mig_handle_migration), and wait for it to set
a variable we're spinning on. once that variable is set to non-zero,
if it's greater than zero, it's the PID of our new process, after
migration has completed. if it's negative, something went wrong, and we
send a NACK flag, indicating failure, and return -1. assuming PID was
positive, we set the rpid member of our kcom_task, and send an ACK
packet back to the home node, indicating success, and telling it what
the PID of the new process to talk to is. we then return 0.
mig_do_receive_mm gets a bit of a facelift, using a passed in packet,
when should we down_write? wrapping the actual mm modification in down_write and up_write, and
sending a response with kcom_send_ack. it also gets
EXPORT_SYMBOL_GPL'd. mig_do_receive_mm_area gets renamed to
spacing issues in patch! mig_do_receive_vma, and a facelift. the first obvious changes are that
we now use a passed in packet, and instead of using the given vm_flags,
we mark pages RWX. we've added code to check the response from
sys_madvise, and if it returns nonzero, we kcom_send_nack, and return
the result. otherwise, we kcom_send_ack, and return 0. this function is
also EXPORT_SYMBOL_GPL'd. mig_do_receive_page is adjusted so that it
accepts a passed in packet, sends a kcom_send_nack in case of failure,
uses alloc_zeroed_user_highpage instead of alloc_page, and so that we
use kcom_send_ack in case of success. this function is not
EXPORT_SYMBOL_GPL'd. mig_do_receive_fp gets a similar treatment,
receiving a passed packet, sending acknowledgement with
kcom_send_ack, and returning 0. it's also not EXPORT_SYMBOL_GPL'd.
mig_do_receive_proc_context is modified to receive a passed in packet,
use the sys_set family of functions to set members of the task_t
set_personality gets p from where? structure related to id/credentials, use set_personality instead of
touching p->personality, send an ack using kcom_send_ack, and return 0
in case of success. mig_do_receive is completely re-written, starting
off by sitting and spinning, waiting on kcomd to fill in the mytsk
pointer for this structure (which is never done!). the rest of the
function now initializes om.whereto if DREMOTE, sets us to
TASK_INTERRUPTIBLE, and enters a while(1) loop. in this loop, we
look for incoming packets and dispatch, just like the old version of
this function. at the end of the loop, we print a message, and
reschedule, so that kcomd can run (and thus feed us packets). the last
function in this patch is mig_handle_migration. this function is
started by the user_thread call in mig_do_receive_init, and is the
"top" of the newly created process. we start by re-parenting to init,
calling obtain_mm, setting ourselves to DREMOTE, then telling
mig_do_receive_init our pid. after that, we jump into the
mig_do_receive function to receive all our process state. we set
ourselves to TASK_RUNNING, call schedule, print a message saying we're
starting the new process, clear_thread_flag (TIF_SIGPENDING), and call
arch_kickstart to jump into the new process. we add some test code just
in case arch_kickstart returns, and call do_exit(SIGKILL) if we run
into errors.
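the margin notes on mig_do_receive_init ask what happens when loopback is not exactly 127.0.0.1. as a sketch only: a check that covers the whole 127.0.0.0/8 block, instead of the single address, could be written as below. is_ipv4_loopback is a hypothetical helper that takes a host-order address; it does not answer the ipv6 question, and nothing here is taken from the patch itself.

```c
#include <stdint.h>

/* true if a host-order IPv4 address lies anywhere in 127.0.0.0/8. */
int is_ipv4_loopback(uint32_t host_addr)
{
    return (host_addr >> 24) == 127;
}
```

whether loopback migration deserves a separate code path at all is a design question the notes leave open; this only removes the dependence on one literal address.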
openmosix-kcomd-remote-preuser-to-kcomd-api.patch
001 om hpc/kernel.c the first hunk just includes two headers. the second changes <hpc/kcom.h>
remote_pre_usermode to dispatch packets containing requests for signals
remote_do_signal returns ? to remote_do_signal and delete them before returning 0. the last hunk
removes the code kicking off the openmosix_mig_daemon, making
openmosix_init into a stub returning 0.
openmosix-kcomd-move-copy-to-user-to-kcomd-api.patch
001 rmem hpc/copyuser.c our first hunk just includes two headers. the second and third hunks <linux/in.h>
change deputy_copy_from_user to use kcom_send_with_response, instead of <hpc/kcom.h>
error handling? using comm_send_hd and then comm_recv to get a response. the fourth
the sizeof in the kzalloc looks funny. hunk changes deputy_strncpy_from_user to use kcom_send_with_response to
no free of u? error handling? send its request, and receive the data from the remote end. the next
hunk changes deputy_copy_to_user so that instead of sending two packets
containing our request to the remote end (one with comm_send_hd, the
other with comm_send), and not getting a response, we now send one
error handling! large packet with kcom_send_with_ack, and get an acknowledgement from
the remote end. hunk six changes deputy_strnlen_user to use
error handling! kcom_send_with_response, instead of using comm_send_hd, and comm_recv.
error handling! hunks seven and eight change deputy_put_userX to use kcom_send_with_ack
instead of just comm_send_hd. hunk nine changes deputy_get_userX to use
error handling! kcom_send_with_response instead of comm_send_hd and comm_recv.
bad comment. remote_copy_user gets broken into two functions, remote_copy_from_user
and remote_copy_to_user. remote_copy_from_user creates a buffer
kfree()? allocated via kmalloc(GFP_KERNEL), uses copy_from_user to fill it, and
replies with the contents via kcom_send_resp. we return the result of
the copy_from_user call. remote_copy_to_user just calls copy_to_user
and sends an ack with kcom_send_ack, returning the result of the
copy_to_user call. remote_strncpy_from_user is changed to accept a
passed in packet, send its reply with kcom_send_resp, and return the
result from strncpy_from_user. remote_strnlen_user has been modified to
accept a passed in packet, create a buffer, send that buffer with
kcom_send_resp, and return 0. remote_put_user is changed to accept a
passed in packet, and send an acknowledgement using kcom_send_ack.
remote_get_user is changed to accept a passed in packet, create a
buffer, fill that buffer with get_user, send a reply with
spacing! kcom_send_resp, and always return 0. remote_handle_user gets a
re-write, basically performing like the previous rendition, except for
accepting a passed packet, setting ourselves to TASK_INTERRUPTIBLE,
and completely new code for handling a SYSCALL_DONE packet which is our
exit path out of this loop. it wakes up kcomd after attaching a newly
created packet to our out packets. it then deletes our passed packet,
sets us to TASK_RUNNING, calls schedule, and returns the result of the
syscall (given in the passed in packet).
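the remote_handle_user loop described above (service memory requests until a SYSCALL_DONE packet arrives, then return the syscall result) can be modelled in userspace roughly as follows. only the SYSCALL_DONE name comes from the notes; the sketch_ types, the other packet kinds and the array-based "queue" are assumptions made to keep the example self-contained, and the -1 return for a missing SYSCALL_DONE is a simplification that a real syscall result could collide with.

```c
#include <stddef.h>

/* packet kinds; only SYSCALL_DONE is named in the notes, the rest are placeholders. */
enum sketch_pkt_type {
    SKETCH_COPY_FROM_USER,
    SKETCH_COPY_TO_USER,
    SKETCH_SYSCALL_DONE
};

struct sketch_pkt {
    enum sketch_pkt_type type;
    long value;     /* for SKETCH_SYSCALL_DONE: the syscall result */
};

/*
 * walk the packets in arrival order, counting the memory requests seen,
 * and stop at SYSCALL_DONE. returns the syscall result, or -1 if the
 * input runs out before SYSCALL_DONE shows up.
 */
long sketch_handle_user(const struct sketch_pkt *pkts, size_t n, int *handled)
{
    size_t i;

    *handled = 0;
    for (i = 0; i < n; i++) {
        switch (pkts[i].type) {
        case SKETCH_COPY_FROM_USER:
        case SKETCH_COPY_TO_USER:
            (*handled)++;            /* a real handler would service the request */
            break;
        case SKETCH_SYSCALL_DONE:
            return pkts[i].value;    /* the exit path out of the loop */
        }
    }
    return -1;
}
```

the kernel version blocks and reschedules while waiting for packets instead of walking a fixed array, so this shows the dispatch order only, not the sleeping behaviour.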
openmosix-kcomd-move-deputy-to-kcomd-api.patch
001 omhome hpc/deputy.c the first hunk just adds our kcom header. the second changes
deputy_do_syscall to accept a passed in packet, and immediately reply
why not kcom_send_ack? with an acknowledgement packet. we use kcom_send_with_ack to send our
compare and contrast this manual ack response (the result of the syscall) to the remote node, instead of
creation with kcom_send_ack. comm_send_hd, and remove a debugging message, making our debugging
slightly less verbose. the third hunk tries to expand
deputy_do_sigpending, but it is still a stub that calls do_signal,
only now the stub also prints a message, and de-queues all pending
signals. the next hunk completely comments out the
deputy_process_communication function. the last hunk changes
deputy_main_loop so that instead of just checking comm_wait, and
dispatching to deputy_process_communication, we spin on our incoming
packet queue, and when a packet arrives, we check if it's a syscall,
and dispatch it if so. while we're spinning on packets, we call
deputy_process_misc before rescheduling ourselves.