https://leetcode.com/problems/median-of-two-sorted-arrays/
Median of Two Sorted Arrays
Total Accepted: 65901 Total Submissions: 385447 Difficulty: Hard
There are two sorted arrays nums1 and nums2 of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).
Let me first show the code and then show the idea.
import sys
class Solution(object):
def findMedianSortedArrays(self, nums1, nums2):
"""
:type nums1: List[int]
:type nums2: List[int]
:rtype: float
"""
if len(nums1) == 0 and len(nums2) == 0:
return None
if len(nums1) < len(nums2):
nums1, nums2 = nums2, nums1
if len(nums2) == 0:
return (nums1[(len(nums1)-1)/2] + nums1[len(nums1)/2]) / 2.0
jmin, jmax = 0, 2*len(nums2)
while jmin <= jmax:
jmid = (jmin + jmax) / 2
imid = len(nums1) + len(nums2) - jmid
i_left = nums1[(imid-1)/2] if imid else - sys.maxint
i_right = nums1[imid/2] if 2*len(nums1) - imid else sys.maxint
j_left = nums2[(jmid-1)/2] if jmid else - sys.maxint
j_right = nums2[jmid/2] if 2*len(nums2) - jmid else sys.maxint
if i_left > j_right:
jmin = jmid + 1
elif j_left > i_right:
jmax = jmid - 1
else: // i_left <= j_right and j_left <= i_right
return (max(i_left, j_left) + min(i_right, j_right)) / 2.0
Idea
The length of nums1 or nums2 can be odd or even. The total length of nums1 and nums2 can also be odd or even. To calculate median, if the list has N (N is odd) elements, you should return the very middle element list[N/2]. If N is even, you should return the average of the two middle elements: (list[(N-1)/2] + list[N/2]) / 2. All these considerations will make your brain twisted. So first of all we need to come up with a more general notation system to deal with either odd or even numbers.
The notation system can be better illustrated in the following two examples:
original array: 99, 100, 101
original index: 0, 1, 2
expanded array: #, 99, #, 100, #, 101, #
expanded index: 0, 1, 2, 3, 4, 5, 6
median cut original index: 1
median cut expanded index: 3
median: 100
original array: 99, 100, 101, 102
original index: 0, 1, 2, 3
expanded array: #, 99, #, 100, #, 101, #, 102, #
expanded index: 0, 1, 2, 3, 4, 5, 6, 7, 8
median cut original index: 2,3
median cut expanded index: 4
median: (100 + 101)/2
The two arrays are expanded by adding `#` between numbers as well as at the beginning and at the end of the array. You may find that ‘#’ always have odd index in the expanded array. Numbers always have even index in the expanded array. Every position in the expanded array denotes a possible cut. A cut at position `c` (0-based, c!=0 and c!=len(arr)) in the expanded array will return:
- If you cut at `#`, then the cut will return the average of two numbers on its left and right. i.e., (ori[(c-1)/2] + ori)/2
- If you cut at a number in the expanded array, you return that number directly. Or equivalently, you can imagine you cut on that number so the number is copied and splitted both on the cut’s left side and right side. So you still return the average of two (same) numbers on its left and right, i.e., (ori[(c-1)/2] + ori)/2, since ori[(c-1)/2] and ori always point to that number in this case.
The expanded array always has odd number length. If we know where we should cut in the expanded array to find the median of the original array (say position `c`), you can always return (ori[(c-1)/2] + ori)/2. You can think that the largest value on the left of cut `c` in the original array is always ori[(c-1)/2]. The smallest value on the right of cut `c` in the original array is always ori.
Now go back to our problem. We have nums1 and nums2. We make sure that nums1 always has no shorter length than nums2 in order to make us focus only on one case. (line 13-14.)
From now on we suppose that nums1 and nums2 have already been expanded. So the two expanded arrays have 2*N1+1 and 2*N2+1 length respectively. (N1 >= N2). The total length of nums1 and nums2 is thus 2*N1+2*N2+2, an even number. The whole algorithm’s idea is to find two cuts, cut_i and cut_j, in nums1 and nums2 so that the median of the two lists resides in the middle of the two cuts. Such two cuts, cut_i and cut_j, will have two following properties:
- There should be totally (N1+N2) elements on the left of cut_i in nums1 and on the left of cut_j in nums2. There should also be totally (N1+N2) elements on the right of cut_1 in nums1 and on the left of cut_j in nums2. So once cut_i’s or (cut_j’s) index is determined, the other is determined also.
- The median is between the two cuts’ values. So, the largest element in nums1 which is on the left of cut_i (nums1[(cut_i-1)/2]), should be no larger than the smallest element on the right of cut_j in nums2 (nums2[cut_j/2]). The largest element on the left of cut_j in nums2, nums2[(cut_j-1)/2], should be no larger than the smallest element on the right of cut_i in nums1 (nums1[cut_i/2]).
Using these two properties, you can start to binary search a cut in nums2, and consequently determine the cut in nums1. If either of the two properties doesn’t hold, you need to adjust the cut in nums2 in a binary search manner. However, we shouldn’t first start binary search a cut in nums1 (the longer list) and then determine the cut in nums2 (the shorter list). Because you may find a cut in nums1 on the left of which there already are more elements than N1+N2. In that case, you can’t even determine where is the cut in nums2.
The only corner cases you need to handle is when any cut is at index 0 or end index. But such cases can be handled intuitively: if you cut at index 0, your left value should be -inf. If you cut at end index, your right value should be inf.
Reference
https://leetcode.com/discuss/41621/very-concise-iterative-solution-with-detailed-explanation