Other types of indexes

4.6 Other types of indexes

This chapter only describes construction of non-positional indexes. Except for the much larger data volume we need to accommodate, the only differ- ence for positional indexes is that (termID, docID, (position1, position2, . . . ))

2. See, for example, ( Cormen et al. 1990 , ch. 19).

◮ Figure 4.8

A user-document matrix for access control lists. Element ( i ,j ) is 1 if user i has access to document j and 0 otherwise. During query processing, a user’s access postings list is intersected with the results list returned by the text part of the index.

triples instead of (termID, docID) pairs have to be processed and that tokens and postings contain positional information in addition to docIDs. With this change, the algorithms discussed here can all be applied to positional in- dexes.

In the indexes we have considered so far, postings lists are ordered with respect to docID. As we will see in the next chapter, this is advantageous for compression – instead of docIDs we can compress smaller gaps between IDs, thus reducing space requirements for the index. However, this struc-

RANKED RETRIEVAL

ture for the index is not optimal when we build ranked (Chapters 6 and 7 ) – as opposed to Boolean – retrieval systems. In ranked retrieval, postings are often ordered according to weight or impact, with the highest-weighted postings occurring first. With this organization, scanning of long postings lists during query processing can usually be terminated early when weights have become so small that any further documents can be predicted to be of

low similarity to the query (see Chapter 6 ). In a docID-sorted index, new documents are always inserted at the end of postings lists. In an impact- sorted index (Section 7.1.5 , page 140 ), the insertion can occur anywhere, thus complicating the update of the inverted index.

SECURITY

Security is an important consideration for retrieval systems in corporations.

A low-level employee should not be able to find the salary roster of the cor- poration, but authorized managers need to be able to search for it. Users’ results lists must not contain documents they are barred from opening since the very existence of a document can be sensitive information.

ACCESS CONTROL LISTS

User authorization is often mediated through access control lists or ACLs. ACLs can be dealt with in an inverted index data structure by representing each document as the set of users that can access them (Figure 4.8 ) and then inverting the resulting user-document matrix. The inverted ACL index has, for each user, a “postings list” of documents they can access – the user’s User authorization is often mediated through access control lists or ACLs. ACLs can be dealt with in an inverted index data structure by representing each document as the set of users that can access them (Figure 4.8 ) and then inverting the resulting user-document matrix. The inverted ACL index has, for each user, a “postings list” of documents they can access – the user’s

We discussed indexes for storing and retrieving terms (as opposed to doc-

uments) in Chapter 3 .

Can spelling correction compromise document-level security? Consider the case where ? a spelling correction is based on documents the user does not have access to.

Exercise 4.5

Dokumen yang terkait

IMPLEMENTASI METODE SPEED UP FEATURES DALAM MENDETEKSI WAJAH

0 0 6

Pengujian Aktivitas Anti Cendawan Sekresi Pertahanan Diri Rayap Tanah Coptotermes curvignathus Holmgren (Isoptera: Rhinotermitidae) Preliminary Detection of Antifungal Activity of Soldier Defensive Secretions from Subterranean Termites Coptotermes curvign

0 0 5

Sri Harti Widiastutik 2008 E PPROGRAM STUDI PENDIDIKAN BAHASA DAN SASTRA INDONESIA SEKOLAH TINGGI KEGURUAN DAN ILMU PENDIDIKAN PERSATUAN GURU REPUBLIK INDONESIA JOMBANG 2010 BAB I PENDAHULUAN - MAKALAH PROFESI KEPENDIDIKAN Antara Pekerjaan Dan Profesi

0 0 7

USING STRING BEADS TO SUPPORT STUDENTS’ UNDERSTANDING OF POSITIONING NUMBERS UP TO ONE HUNDRED Sri Imelda Edo, Kamaliyah, and Zetra Hainul Putra International Master Program on Mathematics Education (IMPoME) University of Sriwijaya, Indonesia Abstract - P

0 0 12

MATA PELAJARAN TEKNOLOGI INFORMASI DAN KOMUNIKASI (TIK) KELAS XI IPA SEMESTER GANJIL KODE : 10X2SMA 2008

0 0 22

YAYASAN SANDHYKARA PUTRA TELKOM SMK TELKOM SANDHY PUTRA MALANG 2008

0 0 16

PERUBAHAN TARIF PAJAK PENGHASILAN BADAN MENURUT UNDANG-UNDANG PAJAK PENGHASILAN No. 36 TAHUN 2008 DAN PRAKTIK EARNINGS MANAGEMENT

0 0 12

View of PENGUJIAN SUMUR HP-01 PADA RESERVOIR EP-B DENGAN MENGGUNAKAN METODE PRESSURE BUILD UP DI LAPANGAN PRABUMULIH PT. PERTAMINA EP ASSET 2

0 6 8

STUDI KELAYAKAN INVESTASI DAN TARIF UNTUK PEMBANGUNAN INFRASTRUKTUR UP STREAM DAN DOWN STREAM SPAM PEKANBARU 2015 – 2035

0 0 11

Analisa Performansi Safety Instrument System (SIS) pada HRSG PLTGU di PT. PJB UP Gresik

0 3 9