Big Kernel Lock\n\nA global lock introduced during the 2.0 -> 2.2 transition.\nIt has some unusual properties: it is recursive (a process may acquire it several times), it may be held while sleeping (it is dropped when the holder is scheduled out and reacquired when it runs again), and it can only be used in process context.\nShould not be used in new code.\n\nThe implementation below uses a semaphore ({{code{kernel_sem}}}) rather than a spin lock. {{code{task->lock_depth}}} is -1 while the task does not hold the lock, so only the outermost {{code{lock_kernel()}}} / {{code{unlock_kernel()}}} pair actually touches the semaphore.\n\n<html><pre>/*\n * Getting the big kernel semaphore.\n */\nvoid __lockfunc lock_kernel(void)\n{\n    struct task_struct *task = current;\n    int depth = task->lock_depth + 1;\n\n    if (likely(!depth))\n        /*\n         * No recursion worries - we set up lock_depth _after_\n         */\n        down(&kernel_sem);\n\n    task->lock_depth = depth;\n}\n\nvoid __lockfunc unlock_kernel(void)\n{\n    struct task_struct *task = current;\n\n    BUG_ON(task->lock_depth < 0);\n\n    if (likely(--task->lock_depth < 0))\n        up(&kernel_sem);\n}</pre></html>\n\n[[LKDv2]] Ch9 "Kernel Synchronization Methods" Section "The Big Kernel Lock".
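The recursion handling in {{code{lock_kernel()}}} / {{code{unlock_kernel()}}} can be illustrated with a user-space model. This is a minimal sketch, not kernel code. A pthread mutex stands in for {{code{kernel_sem}}}, a C11 thread-local counter stands in for {{code{task->lock_depth}}}, and the names {{code{bkl_mutex}}}, {{code{bkl_depth}}} and {{code{sketch_*}}} are hypothetical. The sketch does not model the lock being dropped while the holder sleeps.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: one global mutex for kernel_sem and a
 * per-thread depth for task->lock_depth (-1 means "not held"). */
static pthread_mutex_t bkl_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local int bkl_depth = -1;

static void sketch_lock_kernel(void)
{
    int depth = bkl_depth + 1;

    if (depth == 0)                 /* outermost acquisition only */
        pthread_mutex_lock(&bkl_mutex);
    bkl_depth = depth;              /* set up only after the lock is held */
}

static void sketch_unlock_kernel(void)
{
    assert(bkl_depth >= 0);         /* mirrors BUG_ON(task->lock_depth < 0) */
    if (--bkl_depth < 0)            /* the outermost unlock releases it */
        pthread_mutex_unlock(&bkl_mutex);
}

static int sketch_kernel_locked(void)
{
    return bkl_depth >= 0;
}
```

Only the outermost lock and the matching outermost unlock touch the mutex. The nested calls just move the depth counter, which lets code that already holds the BKL call functions that take it again.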
Contains the items I am currently working on.
To get started with this blank TiddlyWiki, you'll need to modify the following tiddlers:\n* SiteTitle & SiteSubtitle: The title and subtitle of the site, as shown above (after saving, they will also appear in the browser title bar)\n* MainMenu: The menu (usually on the left)\n* DefaultTiddlers: Contains the names of the tiddlers that you want to appear when the TiddlyWiki is opened\n* StyleSheet: Contains additional CSS style\nYou'll also need to enter your username for signing your edits: <<option txtUserName>>
Linux Kernel Development Second Edition\nby Robert Love\n\nPublisher : Sams Publishing \nPub Date : January 12, 2005 \nISBN : 0-672-32720-1 \nPages : 432 \n
[[Welcome]]\n[[Tags]]\n[[Config|GettingStarted]]\n\n@@''Threads:''@@\n[[sys_open]]\n[[sys_close]]\n[[sys_mount]]\n\n@@''Meta links:''@@\n<<tag Current>>\n<<tag Question>>\n<<tag Stub>>\n@@''Index:''@@\n<<tag function>>\n<<tag datatype>>
Search for (?) instead
Ryan Gao (高远, aka 完美废人), an undergraduate student at the University of Canberra, Australia. Likes music (plays the piano), appreciates nature, and loves computers and Linux programming.\nHas done the Chinese translation of "Advanced Linux Programming", volume 1.\nCurrently working on this document ("Linux VFS Tour Unguided") and a FUSE module.\n\nContact:\nBlog: (Chinese)\nEmail: wolf0403$(AT)hotmail$(DOT)com (@#(&!)@spams-_-)
Security-Enhanced Linux: a set of security-checking hooks plugged into the Linux kernel that check operations against a set of rules (called "policies").\n\nThis is the main definition and explanation of these hooks.
with [[LXR|]] and [[TiddlyWiki|]]
Linux VFS Tour Unguided
/***\nPlace your custom CSS here\n***/\n/*{{{*/\npre { font-family: courier, monospace }\n.code { font-family: courier, monospace; background-color: #fe8 }\n.reffer { font-style: italic; background-color: #fe8 }\n/*}}}*/\n
<<tagCloud systemConfig Current Stub Question function>>
Due to a limitation of the wiki software, the annotation\n{{{__user}}}\nhas been replaced with\n{{{USER}}}\n\nThis annotation marks a pointer as pointing to a user-space address, so the kernel must not dereference it directly. The data has to be copied with helpers such as {{code{copy_from_user()}}}. The annotation is checked by the sparse static analyzer.
Welcome to the Linux VFS Tour Unguided. The intention of this... um, thing, is to record the steps I myself took through the VFS code. It is generally NOT STRUCTURED. The threads follow call chains in the code. Topics are categorized into different [[Tags]].\n\nCurrently I have walked (or am at least on the way through) the following paths:\n* [[sys_open]]\n* [[sys_close]]\n* [[sys_mount]]\n\n<<slider vfstuAuthor Ryan "About the Author" "A few lines about myself">>\n{{code{void fastcall {{{__}}}fput(struct file *file)}}}\n\nReleases the file's epoll state and cleans up locks. It then calls {{code{release}}} from {{code{file->f_op}}} if specified. Finally, it frees the {{code{file}}} object and releases the associated {{code{vfsmount}}} and {{code{dentry}}} references.\n<html><pre>#define __getname() kmem_cache_alloc(names_cachep, SLAB_KERNEL)</pre></html>\n{{code{int copy_mount_options(const void USER *data, unsigned long *where)}}}\n\n1. Allocates kernel memory by calling [[__get_free_page]] with GFP_KERNEL.\n2. Copies the options into the newly allocated memory and zeroes out the rest of the page.\n<html><pre>#define current get_current()</pre></html>in the same file:\n<html><pre>static inline struct task_struct * get_current(void)\n{\n    return current_thread_info()->task;\n}</pre></html>This globally accessible "variable" is architecture dependent. It is usually used in process context to find the currently running process. Even in interrupt context it points to a process: the one that was interrupted.
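The two steps of {{code{copy_mount_options}}} can be sketched in user space. This is a hypothetical model with several assumptions. It uses a 4096-byte page and {{code{malloc()}}} in place of {{code{__get_free_page()}}}. A {{code{readable}}} parameter stands in for how many bytes could be copied from user space before faulting. The error codes follow my reading of the 2.6 source and should be checked against it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed page size */

/*
 * Hypothetical user-space model of copy_mount_options().
 * `readable` stands in for the number of bytes that can be copied from
 * user space before hitting a fault.
 */
static int sketch_copy_mount_options(const void *data, size_t readable,
                                     unsigned long *where)
{
    char *page;
    size_t n;

    *where = 0;
    if (!data)                         /* no options given */
        return 0;

    page = malloc(SKETCH_PAGE_SIZE);   /* stands in for __get_free_page(GFP_KERNEL) */
    if (!page)
        return -ENOMEM;

    n = readable < SKETCH_PAGE_SIZE ? readable : SKETCH_PAGE_SIZE;
    if (n == 0) {                      /* nothing could be copied */
        free(page);
        return -EFAULT;
    }
    memcpy(page, data, n);                      /* copy what is readable */
    memset(page + n, 0, SKETCH_PAGE_SIZE - n);  /* zero out the rest */
    *where = (unsigned long)page;
    return 0;
}
```

Zeroing the tail means the filesystem always receives a fully initialized page. The options also arrive as a NUL-terminated string whenever they are shorter than a page.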
{{reffer{Despite this useful unification, the VFS often needs to perform directory-specific operations, such as path name lookup. Path name lookup involves translating each component of a path, ensuring it is valid, and following it to the next component.}}} \n...and\n{{reffer{A dentry is a specific component in a path. Using the previous example, /, bin, and vi are all dentry objects. The first two are directories and the last is a regular file. This is an important point: dentry objects are all components in a path, including files. }}}\n\n//Both from LKDv2 Ch12 "The Virtual Filesystem" Section "The Dentry Object"//\n\nRather than representing a physical object on the filesystem (or disk), a dentry represents a path component. The kernel uses dentries to manipulate paths. Each {{code{dentry}}} object represents a single element in a path and may or may not have a physical representation (an [[inode]]) associated with it.\n{{code{struct file *dentry_open(struct dentry *dentry, struct vfsmount *mnt, int flags)}}}\n\nThe {{code{dentry}}} and {{code{mnt}}} arguments were prepared by [[open_namei]] in [[filp_open]].\nThe {{code{flags}}} argument is the one passed to [[sys_open]].\n\n1. [[Allocates|get_emtry_filp]] a [[struct file]] object, associates it with the {{code{dentry}}} argument and sets up the access mode according to the {{code{flags}}} argument. \n[[file_move|]] adds the newly constructed object to the open-file list ({{code{s_files}}}) of the superblock that the [[inode]] (of the {{code{dentry}}} passed in as argument) belongs to.\n2. The open function from the inode's file operations ({{code{inode->i_fop}}}, now {{code{f->f_op->open}}}) is then called to actually open the file from the filesystem. 
This is the __filesystem-dependent code__.\n[[file_ra_state_init|]] initializes the readahead state.\n{{code{int do_add_mount(struct vfsmount *newmnt, struct nameidata *nd, int mnt_flags, struct list_head *fslist)}}}\n\n\n{{code{struct vfsmount * do_kern_mount(const char *fstype, int flags, const char *name, void *data)}}}\n\n#Maps {{code{fstype}}} to a [[struct file_system_type]] object with [[get_fs_type]], passing the name {{code{fstype}}} as argument.\n#Allocates a [[struct vfsmount]] object with [[alloc_vfsmnt]], passing the device name as argument.\n#Allocates a buffer with [[alloc_secdata]] and copies the filesystem-specific option string into it (SELinux related). The buffer is later passed to [[security_sb_kern_mount]] for the SELinux check.\n#Gets a [[struct super_block]] object from the [[struct file_system_type]] object by calling its {{code{get_sb}}} method. This is filesystem dependent.\n#Fills the [[struct vfsmount]] object with the relevant information from the [[struct super_block]] object.\n#Calls [[put_filesystem]] to drop the reference to the [[struct file_system_type]] (and its module) taken by [[get_fs_type]].\n#Returns the initialized [[struct vfsmount]] object.\n\n{{code{long do_mount(char * dev_name, char * dir_name, char *type_page,\n unsigned long flags, void *data_page)}}}\n\n#Discards the legacy magic number ({{code{MS_MGC_VAL}}}) in the high bits of {{code{flags}}} (see the comments in the code).\n#Performs basic sanity checks on the directory and device names: each must be NUL-terminated within a page (see [[memchr]]).\n#Looks up the mount point path with [[path_lookup]], filling in a [[struct nameidata]] object. This object is the first argument to each of the do_* mount functions below.\n#Performs the SELinux check with [[security_sb_mount]].\n#Dispatches the mount request to different functions:\n*[[do_remount]] for remount\n*[[do_loopback]] for bind mounts (mount --bind). 
Is mount -o loop also handled here? (?)\n*[[do_move_mount]] for moving a mount (mount --move)\n*[[do_new_mount]] for all the rest.\n{{code{static int do_new_mount(struct nameidata *nd, char *type, int flags, int mnt_flags, char *name, void *data)}}}\n\nThe first argument points to the mount point directory.\n\n#Calls [[do_kern_mount]] to get the [[struct vfsmount]] object representing the new mount.\n#Attaches the newly created mount to the existing filesystem namespace with [[do_add_mount]].
fs/open.c: L1082\n{{code{long do_sys_open(int dfd, const char [[USER]] *filename, int flags, int mode)}}}\n\nCalled from [[sys_open]] to perform the actual open.\nGenerally the same as [[sys_open in 2.6.11|sys_open (2.6.11)]].\n# Allocates a kernel buffer to temporarily hold the filename by calling [[getname]].\n# Allocates a new (unused) file descriptor from the current process's [[struct fdtable]].\n# Opens the file and gets a [[struct file]] object representing the opened file.\n# On failure, releases the descriptor with [[put_unused_fd]]. On success, sends [[inotify]] events by calling [[fsnotify_open]] on the {{code{f_dentry}}} field of the object returned in the previous step.\n# After [[fsnotify_open]], installs the file object into the file descriptor slot with [[fd_install]].\n# Frees the name buffer and returns.\n
File descriptor: a small integer used as a key by the file manipulation functions.\nA file descriptor is the representation of an open file that the kernel provides to user-space programs. System calls that operate on an already opened file identify it by its file descriptor.\n\n*''Allocating'': [[get_unused_fd]] uses [[find_next_zero_bit|]] to find the smallest available fd, starting the search at {{code{current->files->next_fd}}}.\n*''Releasing'': [[put_unused_fd]] clears the fd's bit in the open-fd bitmap. If {{code{current->files->next_fd}}} is bigger than the released fd, {{code{next_fd}}} is set to it, so it will be the next one allocated.\n<html><pre>void fastcall fd_install(unsigned int fd, struct file * file)\n{\n    struct files_struct *files = current->files;\n    spin_lock(&files->file_lock);\n    if (unlikely(files->fd[fd] != NULL))\n        BUG();\n    files->fd[fd] = file;\n    spin_unlock(&files->file_lock);\n}</pre></html>
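The bitmap bookkeeping behind {{code{get_unused_fd}}}, {{code{put_unused_fd}}} and {{code{fd_install}}} can be sketched as a simplified, single-threaded model. The sketch assumes a fixed table of 64 descriptors held in one {{code{uint64_t}}}. {{code{struct sketch_files}}} and the {{code{sketch_*}}} functions are hypothetical, and the locking done by the real code is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FDS 64   /* hypothetical table size: one 64-bit bitmap word */

/* Hypothetical, simplified stand-in for struct files_struct */
struct sketch_files {
    uint64_t open_fds;             /* bit n set => fd n is in use */
    unsigned int next_fd;          /* search hint, like files->next_fd */
    void *fd[SKETCH_MAX_FDS];      /* fd -> file object */
};

/* Like find_next_zero_bit(): first clear bit at or after `start`, else `size` */
static unsigned int sketch_find_next_zero_bit(uint64_t map, unsigned int size,
                                              unsigned int start)
{
    for (unsigned int i = start; i < size; i++)
        if (!((map >> i) & 1))
            return i;
    return size;
}

static int sketch_get_unused_fd(struct sketch_files *files)
{
    unsigned int fd = sketch_find_next_zero_bit(files->open_fds, SKETCH_MAX_FDS,
                                                files->next_fd);
    if (fd >= SKETCH_MAX_FDS)
        return -EMFILE;                      /* table full */
    files->open_fds |= (uint64_t)1 << fd;    /* mark the fd as used */
    files->next_fd = fd + 1;
    return (int)fd;
}

static void sketch_put_unused_fd(struct sketch_files *files, unsigned int fd)
{
    files->open_fds &= ~((uint64_t)1 << fd); /* mark the fd as free */
    if (fd < files->next_fd)                 /* make it the next candidate */
        files->next_fd = fd;
}

static void sketch_fd_install(struct sketch_files *files, unsigned int fd, void *file)
{
    assert(files->fd[fd] == NULL);           /* mirrors the BUG() in fd_install() */
    files->fd[fd] = file;
}
```

{{code{next_fd}}} is lowered whenever a smaller descriptor is released. As a result, the search never skips a free slot, and a new descriptor is always the lowest unused number, as POSIX requires for open().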
When processes manipulate files in the filesystem, they generally deal with a [[file descriptor|fd]]. In kernel space, the corresponding representation is a [[struct file]] object. When a file is opened with [[sys_open]], the kernel allocates an [[fd]], creates a [[struct file]] object and [[associates|fd_install]] the two.\n{{code{int filp_close(struct file *filp, fl_owner_t id)}}}\n\nThe [[file]] has already been removed from the [[current]] process's open file table, so [[filp_close]] focuses only on releasing the resources associated with the [[file]].\n\n1. Saves any pending error on the [[file]] and clears it.\n2. Checks whether the reference count of this [[file]] has already reached 0 (unexpected).\n3. If the [[file|struct file]] has a {{code{flush}}} operation (filesystem-dependent code), calls it.\n4. Removes the [[file]] from the [[current]] process's [[dnotify]] list with [[dnotify_flush]].\n5. Clears the POSIX locks associated with this [[file]] with [[locks_remove_posix]].\n6. Releases the {{code{filp}}} object with [[fput]].\nFinally, returns the pending error: the one saved in step 1 or, if there was none, the error returned by the {{code{flush}}} operation in step 3.\n\n{{code{struct file *filp_open(const char * filename, int flags, int mode)}}}\n\nOpens the file with the specified path, flags and mode.\n\n1. Calls [[open_namei]]. As the comment says, {{reffer{this is in fact almost the whole open-routine.}}} This function fills a {{code{struct nameidata}}} object {{code{nd}}} with the right {{code{dentry}}} for the intended file. [[open_namei]] doesn't take the //"open"// action itself but prepares the {{code{struct nameidata}}} object for the next step.\n2. Calls [[dentry_open]] with the dentry found in the previous step. 
This generates the {{code{struct file}}} object that is returned.\n<html><pre>void fastcall fput(struct file *file)\n{\n    if (atomic_dec_and_test(&file->f_count))\n        __fput(file);\n}</pre></html>\nDrops a reference to the file. When the last reference goes away, [[__fput]] does the actual cleanup.\n<html><pre>/* Find an unused file structure and return a pointer to it.\n * Returns NULL, if there are no more free file structures or\n * we run out of memory.\n */\nstruct file *get_empty_filp(void)</pre></html>\n{{code{char * getname(const char [[USER]] * filename)}}}\n\nCalls [[do_getname|]] to copy the file path from user space into kernel space.\n\n[[__getname]] and [[__putname|putname]] respectively allocate and free the memory for path names in the kernel, from the slab cache //names_cachep//.\n\n[[TASK_SIZE|]] specifies the upper bound of user-space addresses (3 GiB by default on 32-bit x86).\n\nSee [[putname]].
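The reference counting done by {{code{fput()}}} can be modelled with C11 atomics. In this hypothetical sketch, {{code{atomic_fetch_sub()}}} plays the role of {{code{atomic_dec_and_test()}}}, and {{code{sketch_final_fput()}}} stands in for [[__fput]]. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, stripped-down file object with only a reference count */
struct sketch_file {
    atomic_int f_count;
};

static int sketch_final_puts;   /* counts how often the final cleanup ran */

static struct sketch_file *sketch_alloc_file(void)
{
    struct sketch_file *file = malloc(sizeof *file);
    if (file)
        atomic_init(&file->f_count, 1);   /* the opener holds one reference */
    return file;
}

static void sketch_get_file(struct sketch_file *file)
{
    atomic_fetch_add(&file->f_count, 1);  /* take an extra reference */
}

/* Stands in for __fput(): release(), dput(), mntput() would happen here */
static void sketch_final_fput(struct sketch_file *file)
{
    sketch_final_puts++;
    free(file);
}

static void sketch_fput(struct sketch_file *file)
{
    /* atomic_fetch_sub() returns the old value, so 1 means we just dropped
       the last reference: the condition atomic_dec_and_test() reports. */
    if (atomic_fetch_sub(&file->f_count, 1) == 1)
        sketch_final_fput(file);
}
```

Whichever holder drops the last reference runs the cleanup, and it runs exactly once.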
A filesystem change notification system merged into the mainline kernel in version 2.6.13. Intended as a replacement for the older [[dnotify|]] system, inotify provides a much more usable API. Inotify can monitor directories as well as individual files for events such as access, modification, rename, open and close. No open file descriptor on the watched file is needed, so its filesystem can still be unmounted while it is being watched. This gives inotify an advantage over dnotify in today's computing environments, where removable hardware is used more and more.\n{{code{int fastcall link_path_walk(const char * name, struct nameidata *nd)}}}\n\n{{reffer{Name resolution.}}}\n\n<html><pre>691 while (*name=='/')\n692 name++;</pre></html>\nSkips all leading slashes (so "/path" becomes "path").\n\n<html><pre>713 = name;\n714 c = *(const unsigned char *)name;\n715 \n716 hash = init_name_hash();\n717 do {\n718 name++;\n719 hash = partial_name_hash(c, hash);\n720 c = *(const unsigned char *)name;\n721 } while (c && (c != '/'));\n722 this.len = name - (const char *);\n723 this.hash = end_name_hash(hash);</pre></html>\nConstructor for the [[struct qstr]] object {{code{this}}} ([[703|]]). After that, {{code{struct qstr this}}} refers to the first component left in {{code{name}}} (i.e. "usr" of "/usr/src/linux"). This code is inside a loop (line 701), so each iteration strips off the first component of {{code{name}}}. For example, "/usr/src/linux" yields "usr", "src" and "linux" in turn.\n\n<html><pre>725 /* remove trailing slashes? */\n726 if (!c)\n727 goto last_component;\n728 while (*++name == '/');\n729 if (!*name)\n730 goto last_with_slashes;</pre></html> Line [[726|]] jumps to the last-component handling if {{code{}}} is the last token in {{code{name}}} (the intended path). Runs of slashes ("/{1,}") are accepted as path delimiters. Both "{{code{last_with_slashes}}}" and "{{code{last_component}}}" share the same logic (line [[799|]]).
Otherwise (the end of {{code{name}}} has not been reached yet) the logic continues:\n\n<html><pre>737 if ([0] == '.') switch (this.len) {\n738 default:\n739 break;\n740 case 2: \n741 if ([1] != '.')\n742 break;\n743 follow_dotdot(&nd->mnt, &nd->dentry);\n744 inode = nd->dentry->d_inode;\n745 /* fallthrough */\n746 case 1:\n747 continue;\n748 }</pre></html>\nHandles the "." and ".." tokens. "." is simply skipped, while ".." first moves up to the parent directory with {{code{follow_dotdot}}}. Line 747 jumps back to the beginning of the loop at line 701.\n\nNext, [[do_lookup|]] fills the actual information (the vfsmount and dentry of the current path element) into the {{code{struct path next}}} declared at the beginning of [[link_path_walk]]. The current element is not the last one in {{code{name}}}, so it can only be a symbolic link or a directory. The remaining code up to line 796 enforces this constraint.\n\nThe procedure for the last element of the path is generally the same.\n\nIn conclusion, when walking a path, the VFS breaks it down into tokens, suppresses redundant path delimiters, handles the special elements . and .., and checks that each token is valid. For each token a {{code{dentry}}} is looked up (or created), and the {{code{vfsmount}}} information is checked.\n{{code{static void locks_delete_lock(struct file_lock **thisfl_p)}}}\n\nWakes up the processes blocked on this lock, then frees the lock.\n{{code{void locks_remove_posix(struct file *filp, fl_owner_t owner)}}}\n\n1. Checks whether the {{code{filp}}} has any locks associated. If not, simply returns.\n2. If there are locks and the underlying filesystem provides a {{code{lock}}} operation in the {{code{f_op}}} table, uses that operation to release the lock.\n3. If there are locks but no {{code{lock}}} operation, the kernel iterates through all locks on the file. With [[locks_delete_lock]], it deletes every lock owned by the current process (identified by {{code{owner}}}).\n4. Cleans up and returns.
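Step 3 of {{code{locks_remove_posix}}} walks the lock list with a pointer to the previous {{code{next}}} pointer, so a lock can be unlinked without special-casing the list head. The following minimal sketch of that pattern assumes a simplified, hypothetical {{code{struct sketch_lock}}} that keeps only an owner and a next pointer. It leaves out both the wake-up of blocked waiters and the lock-type check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified lock record */
struct sketch_lock {
    void *fl_owner;               /* owner id, like fl_owner_t */
    struct sketch_lock *fl_next;  /* next lock on the same inode */
};

/* Unlink and free the lock *thisfl_p points at (the wake-up of blocked
   waiters done by locks_delete_lock() is omitted here). */
static void sketch_delete_lock(struct sketch_lock **thisfl_p)
{
    struct sketch_lock *fl = *thisfl_p;

    *thisfl_p = fl->fl_next;
    free(fl);
}

/* Delete every lock in the list that belongs to `owner` */
static void sketch_remove_posix(struct sketch_lock **head, void *owner)
{
    struct sketch_lock **before = head;

    while (*before != NULL) {
        if ((*before)->fl_owner == owner)
            sketch_delete_lock(before);   /* *before now points at the next lock */
        else
            before = &(*before)->fl_next;
    }
}
```

After an entry is deleted, {{code{before}}} is not advanced, because {{code{*before}}} already points at the following lock.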
(?)\n{{code{int may_open(struct nameidata *nd, int acc_mode, int flag)}}}\n\nGenerally checks permissions for the intended open.
x86\n\nor generic\n\n{{code{void *memchr(const void *s, int c, size_t n)}}}\nSearches for {{code{c}}} in the memory area of {{code{n}}} bytes starting at {{code{s}}}.\nReturns the address of the first occurrence, or NULL if there is none.\n{{code{int open_namei(const char * pathname, int flag, int mode, struct nameidata *nd)}}}\n\n{{{ * namei for open - this is in fact almost the whole open-routine. }}}\n\n[[open_namei]] fills in the "" data indicating an open operation, then hands this {{code{nameidata}}} to [[path_lookup]].\nAfter [[path_lookup]] returns, {{code{open_namei}}} calls [[may_open]] with the {{code{nameidata}}} information found in the previous step.\n\nSee also: [[struct nameidata]]\n{{code{int fastcall path_lookup(const char *name, unsigned int flags, struct nameidata *nd)}}}\n\nAccepts the path {{code{name}}} and a [[struct nameidata]] object {{code{nd}}} as arguments. Finds the relevant starting mount point ({{code{nd->mnt}}}) and dentry ({{code{nd->dentry}}}) and fills them in.\nIf the path starts with '/' (an absolute path), {{code{nd->mnt}}} and {{code{nd->dentry}}} refer to the process's root directory (/). Otherwise, {{code{nd->dentry}}} refers to the current working directory and {{code{nd->mnt}}} to the mount containing it.\n\nThis information is later used as the starting point in [[link_path_walk]], which searches for the actual file that {{code{name}}} refers to.\n\n[[dget|]] and [[mntget|]] return a new reference to a dentry / vfsmount object respectively, by increasing its reference count.\n\nSee also [[path_release]].\n{{code{void path_release(struct nameidata *nd)}}}\nCalls {{code{dput}}} and {{code{mntput}}} on the dentry and mnt contained in the [[struct nameidata]].\n\nSee also [[path_lookup]].
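[[do_mount]] uses [[memchr]] for its sanity check on the copied names: a string that has no NUL byte within its page-sized buffer is rejected. The following minimal sketch of that check assumes a 4096-byte page and uses the hypothetical helper name {{code{sketch_check_mount_names()}}}.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size */

/* Each copied name must contain a NUL terminator within its page-sized
   buffer; dev_page may be NULL (no device name). Returns 0 or -EINVAL. */
static int sketch_check_mount_names(const char *dir_page, const char *dev_page)
{
    if (!memchr(dir_page, 0, SKETCH_PAGE_SIZE))
        return -EINVAL;
    if (dev_page && !memchr(dev_page, 0, SKETCH_PAGE_SIZE))
        return -EINVAL;
    return 0;
}
```

memchr() stops at the first match, so the check costs only as much as the string length when the name is well formed.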
from :\n<html><pre>#define __putname(name) kmem_cache_free(names_cachep, (void *)(name))\n#ifndef CONFIG_AUDITSYSCALL\n#define putname(name) __putname(name)\n#else\n#define putname(name) \s\n    do { \s\n        if (unlikely(current->audit_context)) \s\n            audit_putname(name); \s\n        else \s\n            __putname(name); \s\n    } while (0)\n#endif</pre></html>\nSee [[getname]]
References used while writing this document, both electronic and printed.\nCredits and many thanks to the original authors.\n\nWhile parsing paths, the kernel splits each path into separate elements. Each of these elements is represented by a [[struct dentry|]] object.\n\nWhen a file is opened with [[sys_open]], [[filp_open]] is called to map the path to a [[struct file]] object. The [[struct dentry]] object is looked up or initialized within [[open_namei]].
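The way [[link_path_walk]] cuts a path into [[struct qstr]] components can be sketched in user space. This illustration rests on several assumptions. The hash helpers are modelled on the 2.6-era definitions in {{code{include/linux/dcache.h}}} as I recall them. "." and ".." get no special treatment. {{code{split_path()}}} is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the struct qstr shown in this document */
struct qstr {
    unsigned int hash;
    unsigned int len;
    const unsigned char *name;
};

/* Hash helpers modelled on 2.6-era include/linux/dcache.h (assumed) */
static unsigned long init_name_hash(void) { return 0; }
static unsigned long partial_name_hash(unsigned long c, unsigned long prevhash)
{
    return (prevhash + (c << 4) + (c >> 4)) * 11;
}
static unsigned int end_name_hash(unsigned long hash) { return (unsigned int)hash; }

/* Split `name` into at most `max` components, skipping runs of '/'
   the way link_path_walk() does. Returns the number of components. */
static size_t split_path(const char *name, struct qstr *out, size_t max)
{
    size_t n = 0;

    while (*name == '/')              /* skip leading slashes */
        name++;
    while (*name && n < max) {
        struct qstr this;
        unsigned long hash = init_name_hash();
        unsigned int c = *(const unsigned char *)name;

        this.name = (const unsigned char *)name;
        do {                          /* hash one component */
            name++;
            hash = partial_name_hash(c, hash);
            c = *(const unsigned char *)name;
        } while (c && c != '/');
        this.len = (unsigned int)(name - (const char *)this.name);
        this.hash = end_name_hash(hash);
        out[n++] = this;
        while (*name == '/')          /* collapse duplicate and trailing slashes */
            name++;
    }
    return n;
}
```

For "/usr//src/linux/" this yields "usr", "src" and "linux". Repeated and trailing slashes produce no empty components.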
The in-kernel representation of a file opened by a process.\n{{reffer{The object (but not the physical file) is created in response to the open() system call and destroyed in response to the close() system call.}}} \n//from LKDv2 Ch12 "The Virtual Filesystem" Section "The File Object"//\n\n{{code{f_op}}} and many of the other members of this struct are initialized in [[dentry_open]]. {{code{f_op}}} comes from the file operations ({{code{i_fop}}}) of the [[inode]] behind the associated [[struct dentry]].\n\nAllocate: [[get_empty_filp|get_emtry_filp]]\nDeallocate: [[fput]]\n\nOpen: [[filp_open]]\nClose: [[filp_close]]\n\n{{code{struct nameidata}}} contains VFS namespace related information, including \n{{code{struct dentry *dentry;\nstruct vfsmount *mnt;}}} \nand a few other members.\n<html><pre>struct qstr {\n unsigned int hash;\n unsigned int len;\n const unsigned char *name;\n};</pre></html>\n\nConstructor (from [[link_path_walk]]):\n<html><pre>const char *name; // source string\nunsigned int c;\nunsigned long hash;\n\ = name;\nhash = init_name_hash ();\ndo\n {\n name++;\n hash = partial_name_hash (c, hash);\n c = *(const unsigned char *) name;\n }\nwhile (c && (c != '/'));\nthis.len = name - (const char *);\nthis.hash = end_name_hash (hash);\n</pre></html>\n\nRepresents a mount "relation" instance.\nInitialized in [[do_kern_mount]].\n{{code{asmlinkage long sys_close(unsigned int fd)}}}\n\nSystem call close(2).\n0. Locks the [[current]] process's open file table.\n1. Ensures the [[fd]] passed in is valid. \n2. Ensures the [[filp|struct file]] associated with {{code{fd}}} is valid.\nIf either of the above checks fails, sys_close unlocks the file table and returns -[[EBADF]].\n3. Removes the filp from the list of [[file|struct file]]s the [[current]] process holds open, and releases the {{code{fd}}}.\n4. Unlocks the open file table.\n5. 
Calls [[filp_close]] to close the [[file]].\n{{code{asmlinkage long sys_mount(char [[USER]] * dev_name, char [[USER]] * dir_name,\n char [[USER]] * type, unsigned long flags,\n void [[USER]] * data)}}}\n\nSystem call mount(2).\n\n1. Copies the arguments into kernel memory. The directory name is copied with [[getname]]; the device name, filesystem type and options are copied with [[copy_mount_options]].\n2. Locks the kernel ([[BKL]], [[lock_kernel|BKL]]) and calls [[do_mount]].\n3. Unlocks the kernel and frees the copied arguments.
fs/open.c: L1104\n{{code{asmlinkage long sys_open(const char [[USER]] *filename, int flags, int mode)}}}\n\n[[sys_open]] has changed since [[2.6.11|sys_open (2.6.11)]].\n\n# If {{code{long}}} is not 32 bits wide (i.e. on 64-bit systems, see {{code{force_o_largefile()}}}), sets the [[O_LARGEFILE]] flag automatically.\n# Hands everything over to [[do_sys_open]].\n# Returns.\n{{code{asmlinkage long sys_open(const char [[USER]] * filename, int flags, int mode) }}}\n\nSystem call open(2).\n\n1. Calls [[getname]] to copy the filename into kernel space.\n2. Opens the file with [[filp_open]] and [[installs|fd_install]] the opened file into a newly allocated [[fd]].\n3. Returns the newly allocated (associated) fd, or a negative error code on error.
version.extensions.tagCloud = {major: 1, minor: 0 , revision: 1, date: new Date(2005,8,16)};\n//Created by Clint Checketts, contributions by Jonny Leroy and Eric Shulman\n\nconfig.macros.tagCloud = {\n noTags: "No tag cloud created because there are no tags.",\n tooltip: "%1 tiddlers tagged with '%0'"\n};\n\nconfig.macros.tagCloud.handler = function(place,macroName,params) {\n \nvar t, p; // loop counters, declared locally to avoid implicit globals\nvar tagCloudWrapper = createTiddlyElement(place,"div",null,"tagCloud",null);\n\nvar tags = store.getTags();\nfor (t=0; t<tags.length; t++) {\n for (p=0;p<params.length; p++) if (tags[t][0] == params[p]) tags[t][0] = "";\n}\n\n if(tags.length == 0) \n createTiddlyElement(tagCloudWrapper,"span",null,null,this.noTags);\n //Find out the highest tag count\n var mostTags = 0;\n for (t=0; t<tags.length; t++) if (tags[t][0].length > 0){\n if (tags[t][1] > mostTags) mostTags = tags[t][1];\n }\n //divide the mostTags into 4 segments for the 4 different tagCloud sizes\n var tagSegment = mostTags / 4;\n\n for (t=0; t<tags.length; t++) if (tags[t][0].length > 0){\n var tagCloudElement = createTiddlyElement(tagCloudWrapper,"span",null,null,null);\n tagCloudWrapper.appendChild(document.createTextNode(" "));\n var theTag = createTiddlyButton(tagCloudElement,tags[t][0],this.tooltip.format(tags[t]),onClickTag,"tagCloudtag tagCloud" + (Math.round(tags[t][1]/tagSegment)+1));\n theTag.setAttribute("tag",tags[t][0]);\n }\n\n};\n\nsetStylesheet(".tagCloud span{height: 1.8em;margin: 3px;}.tagCloud1{font-size: 1.2em;}.tagCloud2{font-size: 1.4em;}.tagCloud3{font-size: 1.6em;}.tagCloud4{font-size: 1.8em;}.tagCloud5{font-size: 1.8em;font-weight: bold;}","tagCloudsStyles");