For a given state of the table formed by insertions from an empty table, the time for successful search for a given element is the time that it took for unsuccessful search for that element, as the authors built the table, plus one.
ly, a table is a mapping (function) from keys to values. Given a search key k, table search has to find the table entry (k,v) containing that key. The found entry may be retrieved, or removed (deleted) from the table, or its value, v, may be updated. If the table has no such entry, a new entry with key k may be created and inserted in the table. Operations on a table also initialize a table to the empty one or indicate that an entry with the given key is absent. Insertions and deletions modify the mapping of keys onto values specified by the table. Example 3.2. Table 3.1 presents a very popular (at least in textbooks on algorithms and data structures) table having three-letter identifiers of airports as keys and associated data such as airport locations, as values. Each identifier has a unique integer representation k = 26c0 + 26c1 + c2 where the ci; i = 0,1,2, are ordinal numbers of letters in the English alphabet (A corresponds to 0, B to 1, . . . , Z to 25). For example, AKL corresponds to 26 ·0+26 ·10+11 = 271. In total, there are 26 = 17576 possible different keys and entries. Table 3.1: A map between airport codes and locations. Key Associated value v Code k City Country State / Place AKL 271 Auckland New Zealand DCA 2080 Washington USA District of Columbia (D.C.) FRA 3822 Frankfurt Germany Hesse GLA 4342 Glasgow UK Scotland HKG 4998 Hong Kong China LAX 7459 Los Angeles USA California SDF 12251 Louisville USA Kentucky ORY 9930 Paris France As can be seen from this example, we may map the keys to integers. Chapter 3: Efficiency of Searching 71 We deal with both static (where the database is fixed in advance and no insertions, deletions or updates are done) and dynamic (where insertions, deletions or updates are allowed) implementations of the table ADT. In all our implementations of the table ADT, we may simplify the analysis as follows. We use lists and trees as our basic containers. We treat each query or update of a list element or tree node, or comparison of two of them, as an elementary operation. The following lemma summarizes some obvious relationships. Lemma 3.3. Suppose that a table is built up from empty by successive insertions, and we then search for a key k uniformly at random. Let Tss(k) (respectively Tus(k)) be the time to perform successful (respectively unsuccessful) search for k. Then • the time taken to retrieve, delete, or update an element with key k is at least Tss(k); • the time taken to insert an element with key k is at least Tus(k); • Tss(k)≤ Tus(k). In addition • the worst case value for Tss(k) equals the worst case value for Tus(k); • the average value of Tss(k) equals one plus the average of the times for the unsuccessful searches undertaken while building the table. Proof. To insert a new element, we first try to find where it would be if it were contained in the data structure, and then perform a single insert operation into the container. To delete an element, we first find it, and then perform a delete operation on the container. Analogous statements hold for updating and retrieval. Thus for a given state of the table formed by insertions from an empty table, the time for successful search for a given element is the time that it took for unsuccessful search for that element, as we built the table, plus one. This means that the time for unsuccessful search is always at least the time for successful search for a given element (the same in the worst case), and the average time for successful search for an element in a table is the average of all the times for unsuccessful searches plus one. If the data structure used to implement a table arranges the records in a list, the efficiency of searching depends on whether the list is sorted. In the case of the telephone book, we quickly find the desired phone number (data record) by name (key). But it is almost hopeless to search directly for a phone number unless we have a special reverse directory where the phone number serves as a key. We discuss unsorted lists in the Exercises below, and sorted lists in the next section. Chapter 3: Efficiency of Searching 72 Exercises Exercise 3.1.1. The sequential search algorithm simply starts at the head of a list and examines elements in order until it finds the desired key or reaches the end of the list. An array-based version is shown in Figure 3.1. algorithm sequentialSearch Input: array a[0..n−1]; key k begin for i← 0 while i < n step i← i+1 do if a[i] = k then return i end for return not found end Figure 3.1: A sequential search algorithm. Show that both successful and unsuccessful sequential search in a list of size n have worst-case and average-case time complexity Θ(n). Exercise 3.1.2. Show that sequential search is slightly more efficient for sorted lists than unsorted ones. What is the time complexity of successful and unsuccessful search? 3.2 Sorted lists and binary search A sorted list implementation allows for much better search method that uses the divide-and-conquer paradigm. The basic idea of binary search is simple. Let k be the desired key for which we want to search. • If the list is empty, return “not found”. Otherwise: • Choose the key a[m] of the middle element of the list. If a[m] = k, return its record; if a[m]> k, make a recursive call on the head sublist; if a[m]< k, make a recursive call on the tail sublist. Example 3.4. Figure 3.2 illustrates binary search for the key k = 42 in a list of size 16. At the first iteration, the search key 42 is compared to the key a[7] = 53 in the middle position m = ⌊(0+15)/2⌋= 7. Because 42 < 53, the second iteration explores the first half of the list with positions 0, . . . ,6. The search key is compared to the key a[3] = 33 in the middle position m = ⌊(0+ 6)/2⌋ = 3. Because 42 > 33, the third iteration explores the second half of the current sublist with positions 4, . . . ,6. The comparison to the key a[5] = 49 in the middle position m = ⌊(4+6)/2⌋= 5 reduces the search range to a single key, a[4] = 42, because now l = r = m = 4. Because the single entry is equal to the search key, the algorithm returns the answer 4. Chapter 3: Efficiency of Searching 73 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 7 14 27 33 42 49 51 53 67 70 77 81 89 94 95 99