The goal of the mmview extensions is to provide means to create additional address spaces, called views in the userspace. The views are selectable on the thread level and are generally kept in sync with the original address space. Users of the extension can designate specific regions as copy-on-write, allowing them to diverge between views or threads.
long mmview(MMVIEW_CREATE)
Creates an additional address space view.
Returns id on success. On failure, -1 is returned and errno is set appropriately:
ENOMEM
Out of memory. long mmview(MMVIEW_DELETE, long id)
Marks the address space view referenced by
id
for deletion. Its resources will be freed after the last thread stops using it.Returns 0 on success. On failure, -1 is returned and errno is set appropriately:
EPERM
Address space view referenced by id
is the base view.EINVAL
Address space view referenced by id
does not exist.long mmview(MMVIEW_CURRENT)
Returns id of the current address space. No errors.
long mmview(MMVIEW_MIGRATE, long id)
Migrates current thread to the address space view referenced by
id
.Returns id of the previous address space view on success. On failure, -1 is returned and errno is set appropriately:
EINVAL
Address space view referenced by id
does not exist.long mmview(MMVIEW_UNSHARE, void *addr, size_t len)
long mmview(MMVIEW_SHARE, void *addr, size_t len)
Designates memory pages containing any part of the address range in the interval
[addr, addr+len-1]
as shared or unshared. Must only be called with single view present.addr
must be aligned to a page boundary.Returns 0 on success. On failure, -1 is returned and errno is set appropriately:
EPERM
Calling process has additional views. EINVAL
An invalid address range has been specified. EACCES
Memory contained in the interval cannot be unshared. long mmview(MMVIEW_SWITCH_BASE)
Appoints the current view to be the new base view of the process. This can be used to optimize memory accesses, or to free the memory belonging to the previous base view by allowing its deletion.
Returns id of the current view on success. On failure, -1 is returned and errno is set appropriately:
ENOMEM
Out of memory.
System call wrappers along with the test suite can be found in the mmview-tests repository.
Due to the system call interface getting out of control after introduction of new features, the mmview functionality is now re-exported under a single system call with the first parameter specifying the requested procedure.
After fork()
or deletion of other mmviews, it might be desired to redesignate
some areas as shared again. This is now possible.
The base mmview is the initial address space, used as the master copy for shared areas in other views. Changing the base view is transparent to the application, but enables some techniques, which were not possible before (e.g., deletion of the original address space). Another side effect is the reduced overhead of page faults in the base mmview.
Updating the kernel tree revealed a few bugs, some of which were previously non-reproducible:
- Page fault handler returning with
VM_FAULT_RETRY
drops themmap_lock
. In case the fault is initiated in base mm the lock has to be retaken. - Streamlined creation of new mms, ensuring that mms without
common
are not exposed to the outside world. - Fixed broken anonymous
VM_SHARED
mappings, where a separate file was created for each view inmmap_region
. - Zapping instead of refaulting after swap-in of shared mmview pages achieved a modest speed improvement during swapping.
- Fixed race condition where a concurrent fault could restore the old page after zapping.
- No more redundant zapping such as when reusing a page or managing software dirty/young bits.
- A new reference counter in
struct page
tracks the amount of page table entries referencing the page from a non-base view. This allows reliably detecting cases where a shared mmview page can be reused on write fault.
- Converted types to allow clean compilation on 32-bit architectures.
- Fixed invalid usage of high memory mappings in the pagefault handler.
- Modified pagefault handler to account for architectures with software emulated dirty/young bits.
copy_page_range
called during fork
or mmview_create
system calls now uses
pagewalk API to write-protect the views in the COW-case. This change has
allowed proper usage of the secondary mmu notifier, which is required when
modifying protection bits of the PTEs.
Structures facilitating coredumping need to be shared among threads which wasn’t the case if threads used different views.
Reverse mapping structures (anon_vma
) were missing for non-base views, which
prevented swap-out of pages referenced by these views.