Internet search engines virtually always create a ranking of all pages and then select only those pages that contain the right words. A new approach that yields more relevant hits and faster search engines has now been designed.
The goal of all search engines is to return the most relevant responses as quickly as possible. When search engines calculate their search results, they are steered by an algorithm that assigns higher or lower values to features of Web pages. The most common search engines on the Net, such as Google, generate a gigantic single ranking based on a search of all pages available on the Net.
The new algorithm instead ranks pages for each relevant starting page, and includes pages that are directly or indirectly linked to by the starting page. Then a normalised mean value of the relevance of the various pages is calculated.
A page that has links to it from several different pages is therefore assigned a higher value than one that is found only once. In this way it is faster to find pages of interest. With ordinary standard algorithms it takes more than seven days to go through and rank the Web pages in a certain database. Using this algorithm, a researcher has managed to do this in 158 seconds.
What’s more, his algorithm has proven to yield the most relevant responses. The study compared the relevance of hits in the top ten lists for three different algorithms: the one that was developed and two variants of PageRank, the algorithm used by Google. A total of 100 different expressions in all the Nordic languages and English, including the expression "master of engineering science", were examined.
The top ten lists always overlapped to some extent between the different algorithms, but they were never completely identical.
Users were then asked to judge the relevance of the various hits, without knowing which search engines had generated the alternative responses.
The users in the study found that the newly developed search engine is better than the others in more than 60 percent of cases.
Besides search engines, the dissertation also covers methods for finding structures in huge masses of information, such as keywords, and methods for extracting free text, such as parts of the documentation, from source code.
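The ranking idea described earlier in this article (one ranking per relevant starting page over the pages it links to directly or indirectly, followed by a normalised mean) can be illustrated with a small sketch. This is not the researcher's actual algorithm, whose details the article does not give. The function names, the toy link graph, and the per-hop `decay` factor are all assumptions made for illustration.

```python
from collections import deque


def rank_from_start(links, start, decay=0.5):
    """Score every page reachable from `start`; fewer hops means a higher score.

    The per-hop `decay` is an assumption of this sketch, not a detail
    taken from the article.
    """
    scores = {start: 1.0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in scores:  # keep the score of the shortest path
                scores[target] = scores[page] * decay
                queue.append(target)
    total = sum(scores.values())
    # Normalise so that each starting page contributes a total weight of 1.
    return {page: score / total for page, score in scores.items()}


def combined_rank(links, starting_pages, decay=0.5):
    """Mean of the normalised rankings computed from each starting page."""
    if not starting_pages:
        return {}
    combined = {}
    for start in starting_pages:
        for page, score in rank_from_start(links, start, decay).items():
            combined[page] = combined.get(page, 0.0) + score
    return {page: s / len(starting_pages) for page, s in combined.items()}


# Hypothetical link graph: page -> pages it links to.
links = {
    "a": ["c", "d"],
    "b": ["c"],
    "c": ["e"],
}
ranking = combined_rank(links, ["a", "b"])
for page in sorted(ranking, key=ranking.get, reverse=True):
    print(page, round(ranking[page], 3))
```

In this toy graph, page `c` is reachable from both starting pages and ends up with a higher combined value than page `d`, which is reachable from only one, mirroring the behaviour the article describes.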
NTFS and FAT32 are the two file system formats offered when installing Windows XP, and that is about all a computer rookie knows about them. The story goes much deeper than that, and this article gives a general overview of both formats along with their advantages and disadvantages.
FAT gets its name from the use of a kind of database called a File Allocation Table that contains an entry for each cluster on the disk. The FAT system has been in use by Microsoft since before DOS 1 (the first version is generally credited to Bill Gates and Marc McDonald) and has undergone several revisions. There are versions called FAT12, FAT16, and FAT32. The numbers refer to the number of bits used for the cluster entries in the table. More recent PC users may find it hard to believe, but in 1987 the FAT system then in use (in DOS 3) was unable to read a hard drive (or more accurately, a volume) bigger than 32 MB. (That’s right, 32 megabytes.) By the time of DOS 6, the upper limit had been enlarged in several steps to 2 GB, but the ever-increasing size of hard disks made yet another revision necessary. With Windows 95B, FAT32 was introduced, increasing the upper limit to 2 terabytes (theoretically but not practically). These continual problems with disk size arose from several causes, including the fact that the number of entries in the FAT is limited by the finite number of bits used for describing the location of a cluster. For example, FAT16 can hold no more than 2^16 or 65,536 cluster entries (actually somewhat fewer). Another factor is that the number of sectors per cluster is also limited.
A further problem with bigger disks is the large amount of wasted space or “slack”. Since there is a fixed number of clusters available, larger disks mean that the cluster size has to be increased in order to fill the available space. However, this results in more and more unutilized disk space, since a typical file is rarely close to an exact multiple of the cluster size.
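The two size effects described above can be put into numbers: the width of a table entry caps the number of clusters, and rounding every file up to whole clusters creates slack. The following is a minimal sketch with illustrative function names. It ignores reserved table entries and the fact that FAT32 uses only 28 of its 32 bits for cluster numbers, and the 32 KB cluster size used for FAT16 is a value assumed for the demonstration.

```python
KB = 1024
GB = KB ** 3


def max_clusters(entry_bits):
    """Upper bound on addressable clusters for a given table entry width."""
    return 2 ** entry_bits


def max_volume_bytes(entry_bits, cluster_bytes):
    """Largest volume that this many clusters of this size can cover."""
    return max_clusters(entry_bits) * cluster_bytes


def allocated_bytes(file_bytes, cluster_bytes):
    """Disk space a file really occupies: always a whole number of clusters."""
    clusters = (file_bytes + cluster_bytes - 1) // cluster_bytes  # round up
    return clusters * cluster_bytes


def slack_bytes(file_bytes, cluster_bytes):
    """Allocated but unused space at the end of the file's last cluster."""
    return allocated_bytes(file_bytes, cluster_bytes) - file_bytes


print(max_clusters(16), "cluster entries at most with 16-bit entries")
print(max_volume_bytes(16, 32 * KB) // GB, "GB with assumed 32 KB clusters")
print(slack_bytes(5 * KB, 4 * KB) // KB, "KB of slack for a 5 KB file on 4 KB clusters")
```

With 16-bit entries and the assumed 32 KB clusters, the product comes to the 2 GB ceiling mentioned above, which shows why larger volumes forced either bigger clusters or wider table entries.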
For example, a FAT32 system uses 16 KB clusters for partition sizes between 16 and 32 GB. A 20 KB file would require two 16 KB clusters, actually occupying 32 KB of space. A mere 1 KB file still requires 16 KB of space. A typical large disk might have 30% or even 40% of its space wasted this way. Making smaller partitions alleviates slack, but with 200 GB disks now common, and ever-bigger ones on the way, partitioning is no longer a practical solution.
Another problem is file fragmentation. Although a file may require several clusters, the clusters need not be in close physical proximity on the disk. When a file is written to the disk, the operating system chooses unused clusters wherever it finds them. If many files consist of widely separated parts, the time required to retrieve them for program use inevitably slows the system (hence the need for defragging).
Actually, the FAT system has been enjoying something of a comeback. Thumb or flash drives have become very common, and these are of a size that makes the FAT system useful. The smaller sizes are even formatted in FAT16.
NTFS is much more flexible than FAT. Its system areas are almost all files instead of the fixed structures used in FAT. Since files are used, the system areas can be modified, enlarged, or moved as needed. An example of one of the several system files is the Master File Table (MFT). The MFT is a sort of relational database with a variety of information about all the files on the disk. If a file is small (1 KB or less), the MFT may even hold the file itself. For larger files NTFS uses clusters in assigning disk space, but in a way different from FAT. The cluster size will not normally exceed 4 KB. A type of individual file compression is built in so that the problems with slack do not arise.
Because it is intended for multi-user environments, NTFS has much more security built in.
For example, the XP Professional version (not the Home version) allows permissions and encryption to be applied to individual files. While much more secure, XP is accordingly much harder to tinker with. That makes troubleshooting and system tweaking more problematic. It also means that the user has to be very careful when setting up passwords and permissions on a system. Forgetting a password has much more serious consequences than it did in Windows 98.
The MFT and other system files occupy quite a bit of space, so NTFS is not intended for small disks. The amount of memory required is also substantial. These system overhead requirements, which formerly limited the use of Windows NT to larger computers, have largely disappeared as a factor with newer PCs and their much larger amounts of RAM and very large hard drives.
Forty years ago this summer, a programmer sat down and knocked out in one month what would become one of the most important pieces of software ever created.
In August 1969, Ken Thompson, a programmer at AT&T subsidiary Bell Laboratories, saw the month-long departure of his wife and young son as an opportunity to put his ideas for a new operating system into practice. He wrote the first version of Unix in assembly language for a wimpy Digital Equipment Corp. (DEC) PDP-7 minicomputer, spending one week each on the operating system, a shell, an editor and an assembler.
Thompson and a colleague, Dennis Ritchie, had been feeling adrift since Bell Labs had withdrawn earlier in the year from a troubled project to develop a time-sharing system called Multics (Multiplexed Information and Computing Service). They had no desire to stick with any of the batch operating systems that predominated at the time, nor did they want to reinvent Multics, which they saw as grotesque and unwieldy.
After batting around some ideas for a new system, Thompson wrote the first version of Unix, which the pair would continue to develop over the next several years with the help of colleagues Doug McIlroy, Joe Ossanna and Rudd Canaday. Some of the principles of Multics were carried over into their new operating system, but the beauty of Unix then (if not now) lay in its less-is-more philosophy.
"A powerful operating system for interactive use need not be expensive either in equipment or in human effort," Ritchie and Thompson would write five years later in the Communications of the ACM (CACM), the journal of the Association for Computing Machinery. "[We hope that] users of Unix will find that the most important characteristics of the system are its simplicity, elegance, and ease of use."
Apparently they did. Unix would go on to become a cornerstone of IT, widely deployed to run servers and workstations in universities, government facilities and corporations.
And its influence spread even farther than its actual deployments, as the ACM noted in 1983 when it gave Thompson and Ritchie its top prize, the A.M. Turing Award for contributions to IT: "The model of the Unix system has led a generation of software designers to new ways of thinking about programming."