UVM(9) Kernel Developer's Manual UVM(9)

uvm - virtual memory system external interface

#include <sys/param.h>
#include <uvm/uvm.h>

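The UVM virtual memory system manages access to the computer's memory resources. User processes and the kernel access these resources through UVM's external interface, which includes functions that initialise UVM subsystems, manage virtual address spaces, resolve page faults, memory-map files and devices, perform uio-based I/O to virtual memory, allocate and free kernel virtual memory, and allocate and free physical memory.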

In addition to exporting these services, UVM has two kernel-level processes: pagedaemon and swapper. The pagedaemon process sleeps until physical memory becomes scarce. When that happens, pagedaemon is awoken. It scans physical memory, paging out and freeing memory that has not been recently used. The swapper process swaps in runnable processes that are currently swapped out, if there is room.

There are also several miscellaneous functions.

void
uvm_init(void);

void
uvm_init_limits(struct proc *p);

void
uvm_setpagesize(void);

void
uvm_swap_init(void);

The uvm_init() function sets up the UVM system at system boot time, after the copyright has been printed. It initialises global state, the page, map, kernel virtual memory state, machine-dependent physical map, kernel memory allocator, pager and anonymous memory subsystems, and then enables paging of kernel objects. uvm_init() must be called after machine-dependent code has registered some free RAM with the uvm_page_physload() function.

The uvm_init_limits() function initialises process limits for the named process. This is for use by the system startup for process zero, before any other processes are created.

The uvm_setpagesize() function initialises the uvmexp members pagesize (if not already done by machine-dependent code), pageshift and pagemask. It should be called by machine-dependent code early in the pmap_init(9) call.
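As a rough illustration of how pageshift and pagemask relate to pagesize, the following userland sketch derives both from a power-of-two page size. The names sketch_uvmexp and sketch_setpagesize are hypothetical stand-ins rather than the kernel's own code, and the error return is an assumption made so that the sketch can be exercised outside the kernel.

```c
#include <assert.h>

/* Hypothetical stand-in for the three uvmexp page-size members. */
struct sketch_uvmexp {
	int pagesize;	/* size of a page: must be a power of 2 */
	int pagemask;	/* pagesize - 1 */
	int pageshift;	/* log2(pagesize) */
};

/*
 * Derive pagemask and pageshift from pagesize.
 * Returns 0 on success, -1 if pagesize is not a positive power of two
 * (the error return is an assumption of this sketch).
 */
static int
sketch_setpagesize(struct sketch_uvmexp *ue)
{
	int shift;

	if (ue->pagesize <= 0 || (ue->pagesize & (ue->pagesize - 1)) != 0)
		return -1;
	ue->pagemask = ue->pagesize - 1;
	for (shift = 0; (1 << shift) != ue->pagesize; shift++)
		continue;
	ue->pageshift = shift;
	return 0;
}
```

With a pagesize of 4096, this yields a pagemask of 0xfff and a pageshift of 12.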

The uvm_swap_init() function initialises the swap subsystem.

int
uvm_map(vm_map_t map, vaddr_t *startp, vsize_t size, struct uvm_object *uobj, voff_t uoffset, vsize_t alignment, uvm_flag_t flags);

int
uvm_map_pageable(vm_map_t map, vaddr_t start, vaddr_t end, boolean_t new_pageable, int lockflags);

int
uvm_map_pageable_all(vm_map_t map, int flags, vsize_t limit);

boolean_t
uvm_map_checkprot(vm_map_t map, vaddr_t start, vaddr_t end, vm_prot_t protection);

int
uvm_map_protect(vm_map_t map, vaddr_t start, vaddr_t end, vm_prot_t new_prot, boolean_t set_max);

int
uvm_deallocate(vm_map_t map, vaddr_t start, vsize_t size);

struct vmspace *
uvmspace_alloc(vaddr_t min, vaddr_t max, boolean_t pageable, boolean_t remove_holes);

void
uvmspace_exec(struct proc *p, vaddr_t start, vaddr_t end);

struct vmspace *
uvmspace_fork(struct vmspace *vm);

void
uvmspace_free(struct vmspace *vm1);

void
uvmspace_share(struct proc *p1, struct proc *p2);

int
UVM_MAPFLAG(vm_prot_t prot, vm_prot_t maxprot, vm_inherit_t inh, int advice, int flags);

The uvm_map() function establishes a valid mapping in map map, which must be unlocked. The new mapping has size size, which must be in PAGE_SIZE units. If alignment is non-zero, it describes the required alignment of the mapping, in power-of-two notation. The uobj and uoffset arguments can have four meanings. When uobj is NULL and uoffset is UVM_UNKNOWN_OFFSET, uvm_map() does not use the machine-dependent PMAP_PREFER function. If uoffset is any other value, it is used as the hint to PMAP_PREFER. When uobj is not NULL and uoffset is UVM_UNKNOWN_OFFSET, uvm_map() finds the offset based upon the virtual address, passed as startp. If uoffset is any other value, a normal mapping is made at this offset. The start address of the mapping is returned in startp.
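Because size must be a multiple of PAGE_SIZE, a caller will usually round a byte count up before requesting a mapping. A minimal sketch of that rounding, assuming a power-of-two page size; SKETCH_PAGE_SIZE and both helper names are hypothetical and chosen for illustration only:

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL			/* assumed page size */
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1)

/* Round a byte count up to the next multiple of the page size. */
static unsigned long
sketch_round_page(unsigned long x)
{
	return (x + SKETCH_PAGE_MASK) & ~SKETCH_PAGE_MASK;
}

/* Nonzero if size is already expressed in whole pages. */
static int
sketch_is_page_multiple(unsigned long size)
{
	return (size & SKETCH_PAGE_MASK) == 0;
}
```

For example, a request for 5000 bytes rounds up to 8192 under the assumed 4096-byte page size.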

The flags passed to uvm_map() are typically created using the UVM_MAPFLAG() macro, which uses the following values. The prot and maxprot arguments can take the following values:

#define UVM_PROT_MASK   0x07    /* protection mask */
#define UVM_PROT_NONE   0x00    /* protection none */
#define UVM_PROT_ALL    0x07    /* everything */
#define UVM_PROT_READ   0x01    /* read */
#define UVM_PROT_WRITE  0x02    /* write */
#define UVM_PROT_EXEC   0x04    /* exec */
#define UVM_PROT_R      0x01    /* read */
#define UVM_PROT_W      0x02    /* write */
#define UVM_PROT_RW     0x03    /* read-write */
#define UVM_PROT_X      0x04    /* exec */
#define UVM_PROT_RX     0x05    /* read-exec */
#define UVM_PROT_WX     0x06    /* write-exec */
#define UVM_PROT_RWX    0x07    /* read-write-exec */

The values that inh can take are:

#define UVM_INH_MASK    0x30    /* inherit mask */
#define UVM_INH_SHARE   0x00    /* "share" */
#define UVM_INH_COPY    0x10    /* "copy" */
#define UVM_INH_NONE    0x20    /* "none" */
#define UVM_INH_DONATE  0x30    /* "donate" << not used */

The values that advice can take are:

#define UVM_ADV_NORMAL  0x0     /* 'normal' */
#define UVM_ADV_RANDOM  0x1     /* 'random' */
#define UVM_ADV_SEQUENTIAL 0x2  /* 'sequential' */
#define UVM_ADV_MASK    0x7     /* mask */

The values that flags can take are:

#define UVM_FLAG_FIXED   0x010000 /* find space */
#define UVM_FLAG_OVERLAY 0x020000 /* establish overlay */
#define UVM_FLAG_NOMERGE 0x040000 /* don't merge map entries */
#define UVM_FLAG_COPYONW 0x080000 /* set copy_on_write flag */
#define UVM_FLAG_AMAPPAD 0x100000 /* bss: pad amap to reduce malloc() */
#define UVM_FLAG_TRYLOCK 0x200000 /* fail if we can not lock map */
#define UVM_FLAG_HOLE    0x400000 /* no backend */

The UVM_MAPFLAG macro arguments can be combined with an or operator. There are several special purpose macros for checking protection combinations, e.g., the UVM_PROT_WX macro. There are also some additional macros to extract bits from the flags. The UVM_PROTECTION, UVM_INHERIT, UVM_MAXPROTECTION and UVM_ADVICE macros return the protection, inheritance, maximum protection and advice, respectively. uvm_map() returns a standard errno.
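To show how the five UVM_MAPFLAG arguments can share a single integer, here is a userland sketch that packs and unpacks them. The constants are copied from the listings above, but the bit layout (in particular the shifts applied to maxprot and advice) is an assumption inferred from the mask values, and the sketch_* functions are hypothetical, not the kernel macros themselves.

```c
#include <assert.h>

/* Constants copied from the listings above. */
#define UVM_PROT_MASK	0x07
#define UVM_PROT_RW	0x03
#define UVM_PROT_RWX	0x07
#define UVM_INH_MASK	0x30
#define UVM_INH_COPY	0x10
#define UVM_ADV_MASK	0x7
#define UVM_ADV_RANDOM	0x1
#define UVM_FLAG_COPYONW 0x080000

/*
 * ASSUMED layout: prot in bits 0-2, inherit in bits 4-5,
 * maxprot in bits 8-10, advice in bits 12-14, flags from bit 16 up.
 */
static unsigned int
sketch_mapflag(unsigned int prot, unsigned int maxprot, unsigned int inh,
    unsigned int advice, unsigned int flags)
{
	return prot | inh | (maxprot << 8) | (advice << 12) | flags;
}

static unsigned int
sketch_protection(unsigned int f)
{
	return f & UVM_PROT_MASK;
}

/* Returns the inherit bits unshifted, so they compare against UVM_INH_*. */
static unsigned int
sketch_inherit(unsigned int f)
{
	return f & UVM_INH_MASK;
}

static unsigned int
sketch_maxprotection(unsigned int f)
{
	return (f >> 8) & UVM_PROT_MASK;
}

static unsigned int
sketch_advice(unsigned int f)
{
	return (f >> 12) & UVM_ADV_MASK;
}
```

Under this assumed layout, packing UVM_PROT_RW, UVM_PROT_RWX, UVM_INH_COPY, UVM_ADV_RANDOM and UVM_FLAG_COPYONW gives 0x081713, and each extractor recovers the value that was passed in.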

The uvm_map_pageable() function changes the pageability of the pages in the range from start to end in map map to new_pageable. The uvm_map_pageable_all() function changes the pageability of all mapped regions. If limit is non-zero and pmap_wired_count() is implemented, ENOMEM is returned if the number of wired pages exceeds limit. The map is locked on entry if lockflags contains UVM_LK_ENTER, and locked on exit if lockflags contains UVM_LK_EXIT. uvm_map_pageable() and uvm_map_pageable_all() return a standard errno.

The uvm_map_checkprot() function checks the protection of the range from start to end in map map against protection. It returns either TRUE or FALSE.
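The kind of check described above can be sketched in userland over a simplified list of map entries. This is only an illustration: struct sketch_entry and sketch_checkprot are hypothetical, and treating an unmapped gap inside the range as a failure is an assumption of the sketch rather than something this page states.

```c
#include <assert.h>
#include <stddef.h>

#define UVM_PROT_READ	0x01
#define UVM_PROT_WRITE	0x02

/* Hypothetical, simplified map entry covering [start, end). */
struct sketch_entry {
	unsigned long start;
	unsigned long end;
	int protection;
};

/*
 * Return 1 if every address in [start, end) is covered by the sorted,
 * non-overlapping entries and each covering entry grants all bits in
 * prot; otherwise return 0.
 */
static int
sketch_checkprot(const struct sketch_entry *e, size_t n,
    unsigned long start, unsigned long end, int prot)
{
	size_t i;

	for (i = 0; i < n && start < end; i++) {
		if (e[i].end <= start)
			continue;		/* entry lies before the range */
		if (e[i].start > start)
			return 0;		/* hole in the range (assumed failure) */
		if ((e[i].protection & prot) != prot)
			return 0;		/* a requested permission is missing */
		start = e[i].end;		/* advance past this entry */
	}
	return start >= end;
}
```

Given a read-write entry followed by a read-only entry, a read check across both succeeds while a write check across both fails.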

The uvm_map_protect() function changes the protection of the range from start to end in map map to new_prot, also setting the maximum protection of the region to new_prot if set_max is non-zero. This function returns a standard errno.

The uvm_deallocate() function deallocates kernel memory in map map from address start to start + size.

The uvmspace_alloc() function allocates and returns a new address space, with ranges from min to max, setting the pageability of the address space to pageable. If remove_holes is non-zero, hardware ‘holes’ in the virtual address space will be removed from the newly allocated address space.

The uvmspace_exec() function either reuses the address space of process p if there are no other references to it, or creates a new one with uvmspace_alloc(). The range of valid addresses in the address space is reset to start through end.

The uvmspace_fork() function creates and returns a new address space based upon the vm address space, typically used when allocating an address space for a child process.

The uvmspace_free() function lowers the reference count on the address space vm1, freeing the data structures if there are no other references.

The uvmspace_share() function causes process p2 to share the address space of p1.

int
uvm_fault(vm_map_t orig_map, vaddr_t vaddr, vm_fault_t fault_type, vm_prot_t access_type);

The uvm_fault() function is the main entry point for faults. It takes orig_map as the map the fault originated in, vaddr as the address within that map at which the fault occurred, fault_type describing the type of fault, and access_type describing the type of access requested. uvm_fault() returns a standard errno.

struct uvm_object *
uvn_attach(void *arg, vm_prot_t accessprot);

void
uvm_vnp_setsize(struct vnode *vp, voff_t newsize);

void
uvm_vnp_sync(struct mount *mp);

void
uvm_vnp_terminate(struct vnode *vp);

boolean_t
uvm_vnp_uncache(struct vnode *vp);

The uvn_attach() function attaches a UVM object to vnode arg, creating the object if necessary. The object is returned.

The uvm_vnp_setsize() function sets the size of vnode vp to newsize. The caller must hold a reference to the vnode. If the vnode shrinks, pages no longer used are discarded. This function will be removed when the file system and VM buffer caches are merged.

The uvm_vnp_sync() function flushes dirty vnodes from either the mount point passed in mp, or all dirty vnodes if mp is NULL. This function will be removed when the file system and VM buffer caches are merged.

The uvm_vnp_terminate() function frees all VM resources allocated to vnode vp. If the vnode still has references, it will not be destroyed; however, all future operations using this vnode will fail. This function will be removed when the file system and VM buffer caches are merged.

The uvm_vnp_uncache() function disables vnode vp from persisting when all references are freed. This function will be removed when the file system and UVM caches are unified. It returns TRUE if there is no active vnode.

int
uvm_io(vm_map_t map, struct uio *uio);

The uvm_io() function performs the I/O described in uio on the memory described in map.

vaddr_t
uvm_km_alloc(vm_map_t map, vsize_t size);

vaddr_t
uvm_km_zalloc(vm_map_t map, vsize_t size);

vaddr_t
uvm_km_alloc1(vm_map_t map, vsize_t size, vsize_t align, boolean_t zeroit);

vaddr_t
uvm_km_kmemalloc(vm_map_t map, struct uvm_object *obj, vsize_t size, int flags);

vaddr_t
uvm_km_valloc(vm_map_t map, vsize_t size);

vaddr_t
uvm_km_valloc_wait(vm_map_t map, vsize_t size);

struct vm_map *
uvm_km_suballoc(vm_map_t map, vaddr_t *min, vaddr_t *max, vsize_t size, int flags, boolean_t fixed, vm_map_t submap);

void
uvm_km_free(vm_map_t map, vaddr_t addr, vsize_t size);

void
uvm_km_free_wakeup(vm_map_t map, vaddr_t addr, vsize_t size);

The uvm_km_alloc() and uvm_km_zalloc() functions allocate size bytes of wired kernel memory in map map. In addition to allocation, uvm_km_zalloc() zeros the memory. Both of these functions are defined as macros in terms of uvm_km_alloc1(), and should almost always be used in preference to uvm_km_alloc1().

The uvm_km_alloc1() function allocates and returns size bytes of wired memory in the kernel map aligned to the align boundary, zeroing the memory if the zeroit argument is non-zero.
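The relationship between the two convenience macros and uvm_km_alloc1() can be pictured with a userland analogue. In this sketch malloc(3) stands in for the wired kernel allocation, the map argument is omitted, and the exact macro bodies (no alignment request, differing only in zeroit) are an assumption; all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userland stand-in for uvm_km_alloc1(): malloc() replaces the wired
 * kernel allocation. The align argument is accepted but ignored here.
 */
static void *
sketch_km_alloc1(size_t size, size_t align, int zeroit)
{
	void *p;

	(void)align;
	p = malloc(size);
	if (p != NULL && zeroit)
		memset(p, 0, size);
	return p;
}

/* ASSUMED wrapper shape: no alignment request, differing only in zeroit. */
#define sketch_km_alloc(size)	sketch_km_alloc1((size), 0, 0)
#define sketch_km_zalloc(size)	sketch_km_alloc1((size), 0, 1)
```

Keeping the zeroing decision in one underlying function means callers only choose between the two wrappers, which is one plausible reason the page recommends them over calling uvm_km_alloc1() directly.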

The uvm_km_kmemalloc() function allocates and returns size bytes of wired kernel memory into obj. The flags can be any of:

#define UVM_KMF_NOWAIT  0x1                     /* matches M_NOWAIT */
#define UVM_KMF_VALLOC  0x2                     /* allocate VA only */
#define UVM_KMF_TRYLOCK UVM_FLAG_TRYLOCK        /* try locking only */

The UVM_KMF_NOWAIT flag causes uvm_km_kmemalloc() to return immediately if no memory is available. UVM_KMF_VALLOC causes no pages to be allocated, only a virtual address. UVM_KMF_TRYLOCK causes uvm_km_kmemalloc() to only try to lock maps, failing if a lock cannot be obtained.

The uvm_km_valloc() and uvm_km_valloc_wait() functions return a newly allocated zero-filled address in the kernel map of size size. uvm_km_valloc_wait() will also wait for kernel memory to become available, if there is a memory shortage.

The uvm_km_suballoc() function allocates submap (with the specified flags, as described above) from map, creating a new map if submap is NULL. The addresses of the submap can be specified exactly by setting the fixed argument to non-zero, which causes the min argument to specify the beginning of the address in the submap. If fixed is zero, any address of size size will be allocated from map and the start and end addresses returned in min and max.

The uvm_km_free() and uvm_km_free_wakeup() functions free size bytes of memory in the kernel map, starting at address addr. uvm_km_free_wakeup() additionally wakes up any waiters on the map before unlocking it.

struct vm_page *
uvm_pagealloc(struct uvm_object *uobj, voff_t off, struct vm_anon *anon, int flags);

void
uvm_pagerealloc(struct vm_page *pg, struct uvm_object *newobj, voff_t newoff);

void
uvm_pagefree(struct vm_page *pg);

int
uvm_pglistalloc(psize_t size, paddr_t low, paddr_t high, paddr_t alignment, paddr_t boundary, struct pglist *rlist, int nsegs, int flags);

void
uvm_pglistfree(struct pglist *list);

void
uvm_page_physload(vaddr_t start, vaddr_t end, vaddr_t avail_start, vaddr_t avail_end, int free_list);

The uvm_pagealloc() function allocates a page of memory at virtual address off in either the object uobj or the anonymous memory anon, which must be locked by the caller. Only one of anon and uobj can be non-NULL. The flags can be any of:

#define UVM_PGA_USERESERVE      0x0001  /* ok to use reserve pages */
#define UVM_PGA_ZERO            0x0002  /* returned page must be zeroed */

The UVM_PGA_USERESERVE flag means to allocate a page even if that will result in the number of free pages being lower than uvmexp.reserve_pagedaemon (if the current thread is the pagedaemon) or uvmexp.reserve_kernel (if the current thread is not the pagedaemon). The UVM_PGA_ZERO flag causes the returned page to be filled with zeroes, either by allocating it from a pool of pre-zeroed pages or by zeroing it in-line as necessary.

The uvm_pagerealloc() function reallocates page pg to a new object newobj, at a new offset newoff, and returns NULL when no page can be found.

The uvm_pagefree() function frees the physical page pg.

The uvm_pglistalloc() function allocates a list of pages totalling size bytes under various constraints. low and high describe the lowest and highest addresses acceptable for the list. If alignment is non-zero, it describes the required alignment of the list, in power-of-two notation. If boundary is non-zero, no segment of the list may cross this power-of-two boundary, relative to zero. nsegs is the maximum number of physically contiguous segments. The allocated memory is returned in the rlist list. The flags can be any of:

#define UVM_PLA_WAITOK	0x0001	/* may sleep */
#define UVM_PLA_NOWAIT	0x0002	/* can't sleep */
#define UVM_PLA_ZERO	0x0004	/* zero all pages before returning */

The UVM_PLA_WAITOK flag means that the function may sleep while trying to allocate the list of pages (this is currently ignored). Conversely, the UVM_PLA_NOWAIT flag signifies that the function may not sleep while allocating. It is an error not to provide one of the above flags. Optionally, one may also specify the UVM_PLA_ZERO flag to receive zeroed memory in the page list.
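The alignment, boundary and flag constraints described above lend themselves to a small worked check. The sketch below is a userland illustration, not the kernel's validation code: the sketch_* helpers are hypothetical, and reading "power-of-two notation" as "the value itself is a power of two" is an assumption.

```c
#include <assert.h>

#define UVM_PLA_WAITOK	0x0001
#define UVM_PLA_NOWAIT	0x0002
#define UVM_PLA_ZERO	0x0004

/* Nonzero if x is zero or a power of two. */
static int
sketch_pow2_or_zero(unsigned long x)
{
	return (x & (x - 1)) == 0;
}

/*
 * Check the argument constraints: alignment and boundary must be zero
 * or a power of two (assumed reading), and one of UVM_PLA_WAITOK or
 * UVM_PLA_NOWAIT must be given. Returns 1 if acceptable, 0 otherwise.
 */
static int
sketch_pla_args_ok(unsigned long alignment, unsigned long boundary, int flags)
{
	if (!sketch_pow2_or_zero(alignment) || !sketch_pow2_or_zero(boundary))
		return 0;
	if ((flags & (UVM_PLA_WAITOK | UVM_PLA_NOWAIT)) == 0)
		return 0;
	return 1;
}

/*
 * Nonzero if the segment [start, start + size) crosses a multiple of
 * the power-of-two boundary; a zero boundary means no restriction.
 */
static int
sketch_crosses_boundary(unsigned long start, unsigned long size,
    unsigned long boundary)
{
	if (boundary == 0 || size == 0)
		return 0;
	return ((start ^ (start + size - 1)) & ~(boundary - 1)) != 0;
}
```

For instance, with a boundary of 0x10000, a segment of 0x2000 bytes starting at 0xF000 crosses the boundary, whereas the same segment starting at 0x10000 does not.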

The uvm_pglistfree() function frees the list of pages pointed to by list.

The uvm_page_physload() function loads physical memory segments into VM space on the specified free_list. uvm_page_physload() must be called at system boot time to set up physical memory management pages. The arguments describe the start and end of the physical addresses of the segment, and the available start and end addresses of pages not already in use.

void
uvm_pageout(void *arg);

void
uvm_scheduler(void);

void
uvm_swapin(struct proc *p);

The uvm_pageout() function is the main loop for the page daemon. The arg argument is ignored.

The uvm_scheduler() function is the process zero main loop, which is to be called after the system has finished starting other processes. uvm_scheduler() handles the swapping in of runnable, swapped out processes in priority order.

The uvm_swapin() function swaps in the named process.

struct uvm_object *
uao_create(vsize_t size, int flags);

void
uao_detach(struct uvm_object *uobj);

void
uao_reference(struct uvm_object *uobj);

boolean_t
uvm_chgkprot(caddr_t addr, size_t len, int rw);

void
uvm_kernacc(caddr_t addr, size_t len, int rw);

void
uvm_vslock(struct proc *p, caddr_t addr, size_t len, vm_prot_t access_type);

void
uvm_vsunlock(struct proc *p, caddr_t addr, size_t len);

void
uvm_meter(void);

int
uvm_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp, void *newp, size_t newlen, struct proc *p);

void
uvm_fork(struct proc *p1, struct proc *p2, boolean_t shared, void *stack, size_t stacksize, void (*func)(void *arg), void *arg);

int
uvm_grow(struct proc *p, vaddr_t sp);

int
uvm_coredump(struct proc *p, struct vnode *vp, struct ucred *cred, struct core *chdr);

The uao_create(), uao_detach() and uao_reference() functions operate on anonymous memory objects, such as those used to support System V shared memory. uao_create() returns an object of size size with flags:

#define UAO_FLAG_KERNOBJ        0x1     /* create kernel object */
#define UAO_FLAG_KERNSWAP       0x2     /* enable kernel swap */

which can only be used once each at system boot time. uao_reference() creates an additional reference to the named anonymous memory object. uao_detach() removes a reference from the named anonymous memory object, destroying it if removing the last reference.
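The reference and detach behaviour described above follows the usual reference-counting pattern, which can be sketched in userland as follows. struct sketch_obj and the sketch_* functions are hypothetical and are not the kernel's struct uvm_object or its routines; starting the count at one on creation is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct sketch_obj {
	int refs;
};

static int sketch_live_objects;	/* objects created and not yet destroyed */

static struct sketch_obj *
sketch_create(void)
{
	struct sketch_obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	o->refs = 1;		/* assumed: the creator holds the first reference */
	sketch_live_objects++;
	return o;
}

/* Take an additional reference. */
static void
sketch_reference(struct sketch_obj *o)
{
	o->refs++;
}

/* Drop a reference, destroying the object when the last one goes away. */
static void
sketch_detach(struct sketch_obj *o)
{
	if (--o->refs > 0)
		return;
	sketch_live_objects--;
	free(o);
}
```

After one create and one reference, the first detach leaves the object alive and only the second detach destroys it.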

The uvm_chgkprot() function changes the protection of kernel memory from addr to addr + len to the value of rw. This is primarily useful for debuggers, for setting breakpoints. This function is only available with options KGDB.

The uvm_kernacc() function checks the access at address addr to addr + len for rw access, in the kernel address space.

The uvm_vslock() and uvm_vsunlock() functions control the wiring and unwiring of pages for process p from addr to addr + len. The access_type argument of uvm_vslock() is passed to uvm_fault(). These functions are normally used to wire memory for I/O.

The uvm_meter() function calculates the load average and wakes up the swapper if necessary.

The uvm_sysctl() function provides support for the CTL_VM domain of the sysctl(3) hierarchy. uvm_sysctl() handles the VM_LOADAVG, VM_METER and VM_UVMEXP calls, which return the current load averages, calculate and return the current VM totals, and return the uvmexp structure, respectively. The load averages are accessed from userland using the getloadavg(3) function. The uvmexp structure holds all global state of the UVM system, and has the following members:

/* vm_page constants */
int pagesize;   /* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask;   /* page mask */
int pageshift;  /* page shift */

/* vm_page counters */
int npages;     /* number of pages we manage */
int free;       /* number of free pages */
int active;     /* number of active pages */
int inactive;   /* number of pages that we free'd but may want back */
int paging;	/* number of pages in the process of being paged out */
int wired;      /* number of wired pages */

int zeropages;		/* number of zero'd pages */
int reserve_pagedaemon; /* number of pages reserved for pagedaemon */
int reserve_kernel;	/* number of pages reserved for kernel */
int anonpages;		/* number of pages used by anon pagers */
int vnodepages;		/* number of pages used by vnode page cache */
int vtextpages;		/* number of pages used by vtext vnodes */

/* pageout params */
int freemin;    /* min number of free pages */
int freetarg;   /* target number of free pages */
int inactarg;   /* target number of inactive pages */
int wiredmax;   /* max number of wired pages */
int anonmin;	/* min threshold for anon pages */
int vtextmin;	/* min threshold for vtext pages */
int vnodemin;	/* min threshold for vnode pages */
int anonminpct;	/* min percent anon pages */
int vtextminpct;/* min percent vtext pages */
int vnodeminpct;/* min percent vnode pages */

/* swap */
int nswapdev;	/* number of configured swap devices in system */
int swpages;	/* number of PAGE_SIZE'ed swap pages */
int swpginuse;	/* number of swap pages in use */
int swpgonly;	/* number of swap pages in use, not also in RAM */
int nswget;	/* number of times fault calls uvm_swap_get() */
int nanon;	/* number total of anon's in system */
int nanonneeded;/* number of anons currently needed */
int nfreeanon;	/* number of free anon's */

/* stat counters */
int faults;		/* page fault count */
int traps;		/* trap count */
int intrs;		/* interrupt count */
int swtch;		/* context switch count */
int softs;		/* software interrupt count */
int syscalls;		/* system calls */
int pageins;		/* pagein operation count */
			/* pageouts are in pdpageouts below */
int swapins;		/* swapins */
int swapouts;		/* swapouts */
int pgswapin;		/* pages swapped in */
int pgswapout;		/* pages swapped out */
int forks;  		/* forks */
int forks_ppwait;	/* forks where parent waits */
int forks_sharevm;	/* forks where vmspace is shared */
int pga_zerohit;	/* pagealloc where zero wanted and zero
			   was available */
int pga_zeromiss;	/* pagealloc where zero wanted and zero
			   not available */
int zeroaborts;		/* number of times page zeroing was
			   aborted */

/* fault subcounters */
int fltnoram;	/* number of times fault was out of ram */
int fltnoanon;	/* number of times fault was out of anons */
int fltpgwait;	/* number of times fault had to wait on a page */
int fltpgrele;	/* number of times fault found a released page */
int fltrelck;	/* number of times fault relock called */
int fltrelckok;	/* number of times fault relock is a success */
int fltanget;	/* number of times fault gets anon page */
int fltanretry;	/* number of times fault retries an anon get */
int fltamcopy;	/* number of times fault clears "needs copy" */
int fltnamap;	/* number of times fault maps a neighbor anon page */
int fltnomap;	/* number of times fault maps a neighbor obj page */
int fltlget;	/* number of times fault does a locked pgo_get */
int fltget;	/* number of times fault does an unlocked get */
int flt_anon;	/* number of times fault anon (case 1a) */
int flt_acow;	/* number of times fault anon cow (case 1b) */
int flt_obj;	/* number of times fault is on object page (2a) */
int flt_prcopy;	/* number of times fault promotes with copy (2b) */
int flt_przero;	/* number of times fault promotes with zerofill (2b) */

/* daemon counters */
int pdwoke;	/* number of times daemon woke up */
int pdrevs;	/* number of times daemon rev'd clock hand */
int pdswout;	/* number of times daemon called for swapout */
int pdfreed;	/* number of pages daemon freed since boot */
int pdscans;	/* number of pages daemon scanned since boot */
int pdanscan;	/* number of anonymous pages scanned by daemon */
int pdobscan;	/* number of object pages scanned by daemon */
int pdreact;	/* number of pages daemon reactivated since boot */
int pdbusy;	/* number of times daemon found a busy page */
int pdpageouts;	/* number of times daemon started a pageout */
int pdpending;	/* number of times daemon got a pending pageout */
int pddeact;	/* number of pages daemon deactivates */
int pdreanon;	/* anon pages reactivated due to min threshold */
int pdrevnode;	/* vnode pages reactivated due to min threshold */
int pdrevtext;	/* vtext pages reactivated due to min threshold */

int fpswtch;	/* FPU context switches */
int kmapent;	/* number of kernel map entries */
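As noted above, the load averages behind VM_LOADAVG are read from userland with getloadavg(3). A minimal sketch of such a caller follows; the helper name sketch_print_loadavg is hypothetical, and the feature-test macro is only there for C libraries that hide getloadavg() by default.

```c
#define _DEFAULT_SOURCE		/* expose getloadavg() on glibc */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Print the 1, 5 and 15 minute load averages on one line.
 * Returns the number of samples retrieved, or -1 on failure.
 */
static int
sketch_print_loadavg(void)
{
	double avg[3];
	int i, n;

	n = getloadavg(avg, 3);
	for (i = 0; i < n; i++)
		printf("%.2f%s", avg[i], i == n - 1 ? "\n" : " ");
	return n;
}
```

The return value is passed through unchanged so that a caller can distinguish a failed query from a partial result.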

The uvm_fork() function forks a virtual address space for processes (old) p1 and (new) p2. If the shared argument is non-zero, p1 shares its address space with p2; otherwise a new address space is created. The stack, stacksize, func and arg arguments are passed to the machine-dependent cpu_fork() function. The uvm_fork() function currently has no return value, and thus cannot fail.

The uvm_grow() function increases the stack segment of process p to include sp.

The uvm_coredump() function generates a coredump on vnode vp for process p with credentials cred and core header description in chdr.

The structure and types whose names begin with “vm_” were named so UVM could coexist with BSD VM during the early development stages. They will be renamed to “uvm_”.

getloadavg(3), kvm(3), sysctl(3), ddb(4), options(4), pmap(9)

UVM is a new VM system developed at Washington University in St. Louis (Missouri). UVM's roots lie partly in the Mach-based 4.4BSD VM system, the FreeBSD VM system, and the SunOS4 VM system. UVM's basic structure is based on the 4.4BSD VM system. UVM's new anonymous memory system is based on the anonymous memory system found in the SunOS4 VM (as described in papers published by Sun Microsystems, Inc.). UVM also includes a number of features new to BSD including page loanout, map entry passing, simplified copy-on-write, and clustered anonymous memory pageout. UVM is also further documented in an August 1998 dissertation by Charles D. Cranor.

UVM appeared in OpenBSD 2.9.

Charles D. Cranor ⟨chuck@ccrc.wustl.edu⟩ designed and implemented UVM.

Matthew Green ⟨mrg@eterna.com.au⟩ wrote the swap-space management code.

Chuck Silvers ⟨chuq@chuq.com⟩ implemented the aobj pager, thus allowing UVM to support System V shared memory and process swapping.

Artur Grabowski ⟨art@openbsd.org⟩ handled the logistical issues involved with merging UVM into the OpenBSD source tree.

The uvm_fork() function should be able to fail in low memory conditions.

December 24, 2010 OpenBSD-5.1