Day 10: Dictionaries

back · home · slides · CMSC 201 (Fall 2024) @ UMBC · another mutable structure for us :) hashmaps!

CMSC 201: Dictionaries

Agenda:

  • HW5 due this Friday!
  • "Fun": projecteuler.net
  • Who cares? (AKA: what will we make)
  • Dictionaries!

    Goal: Learn about dictionaries, maybe have fun :)

    My email is: sdonahue@umbc.edu (Shane Donahue), office hours Tu/Th ITE 373 2-3PM.

  • Midterm #1

    Midterm #1 happened... it went well... mostly. I think. Will review the midterm once everything is graded with it.

    HW5

    HW5 is out and due Friday 2024-10-18 at 11:59:59 PM :) You know everything you need to know for it! Don't use dictionaries.

    Let's look at HW5!

    We'll talk about the project next time :) It is also currently out.

    "Fun": project euler

    Have you ever finished your homework, and thought, I wish I could solve more problems? With more math?

    Look no further: projecteuler.net! Many fun problems to tackle :) Use any language you'd like!

    Who cares (about dictionaries)?

    Dictionaries are a very useful tool (and new mutable datastructure!). They let us store data in a really flexible way. Let's use it to make a revolving sushi simulator :) Goals: serve some number of customers and keep track of total bill per group. https://i.ytimg.com/vi/dgs2Hxo25Cs/maxresdefault.jpg

    Dictionaries

    Here's a problem for you:

    Challenge: Using lists, take a block of space-delimited text. Return the number of times each word was used.

    What do we do?! We need to dynamically associate inputs with outputs (the words with their count). We could do some janky list manipulations (let's try that)!

    List solution
    bee_movie = """
    According to all known laws
    of aviation,
    there is no way a bee
    should be able to fly.
    Its wings are too small to get
    its fat little body off the ground.
    The bee, of course, flies anyway
    because bees don't care
    what humans think is impossible.
    Yellow, black. Yellow, black.
    Yellow, black. Yellow, black.
    Ooh, black and yellow!
    Let's shake it up a little.
    Barry! Breakfast is ready!
    """
    
    words = []
    word_count = []
    
    word_split = bee_movie_first.split() 
    
    for word in word_split:
        word = word.lower()
        if word not in words:
            words.append(word)
            word_count.append(1)
        else:
            index = 0
            curr_index = 0
            for word_search in words:
                if word == word_search:
                    index = curr_index 
                curr_index += 1
            word_count[index] += 1
    
    print(words)
    print(word_count)

    Wow that was kind of not great. There is a better way: dictionaries!

    Dictionaries (in other languages, usually called hash maps or hash tables) are a way to associate any "hashable" input with an arbitrary value. Think of it like a dictionary: each word is associated with a definition.

    my_favorite_dictionary = {"apple": 3,
                              "banana": 64,
                              "pear": 17}
    

    Curly brackets on either side. Colons separate the keys and values. You can also make empty dictionaries:

    empty_dict = {}
    also_empty_dict = dict()
    

    Retrieve values from the dictionary just like indexing a list, except instead of it being an index, it's the key you want.

    print(my_favorite_dictionary["pear"]) # == 17
    print(my_favorite_dictionary["banana"]) # == 64
    print(my_favorite_dictionary["cake"]) # KeyError
    

    Unlike lists, we can now add an item by assigning to an index.

    my_favorite_dictionary["dragonfruit"] = 51

    Dictionaries are mutable! They act like they are "passed by reference". You can change them and they change for everyone.

    def add_double_num(dict_input, key, value):
        dict_input[key] = value*2
    
    my_favorite_dictionary = {"apple": 3,
                              "banana": 64,
                              "pear": 17}
    
    add_double_num(my_favorite_dictionary, "papaya", 3)
    print(my_favorite_dictionary["papaya"]) # == 6
    

    If you want to take a copy, like lists, you can do dict(your_list), which will return a copy of your dictionary.

    Notice that these return something that looks like a list. You can iterate through a dictionary with a for loop. This will iterate through all the keys.

    my_favorite_dictionary = {"apple": 3,
                              "banana": 64,
                              "pear": 17}
    
    for fruit in my_favorite_dictionary:
        print(fruit)
    

    The in keyword can be used to check if a key is in the dictionary. (Just like lists or strings.)

    Challenge: Using dictionaries, take a block of space-delimited text. Return the number of times each word was used. (This is the same as the list one, but now with dictionaries.)
    Dictionaries wordcount solution
    bee_movie = """
    According to all known laws
    of aviation,
    there is no way a bee
    should be able to fly.
    Its wings are too small to get
    its fat little body off the ground.
    The bee, of course, flies anyway
    because bees don't care
    what humans think is impossible.
    Yellow, black. Yellow, black.
    Yellow, black. Yellow, black.
    Ooh, black and yellow!
    Let's shake it up a little.
    Barry! Breakfast is ready!
    """
    
    words_split = bee_movie_first.split() 
    words_count = {}
    
    for word in words_split: 
        if word in words_count:
            words_count[word] += 1
        else:
            words_count[word] = 1
    
    print(word_count)

    You can get all keys with the .keys() method. Get all values with .values()! As a reminder, keys are the (left hand side) inputs to the dictionary, values are the (right hand side) outputs.

    Like everything in Python, typing is not enforced. You can do something very cursed like this:

    my_favorite_dictionary = {14: 3,
                              "banana": 64,
                              0.1: "cool",
                              True: False}
    

    But, you can't use "unhashable" data types as keys.

    my_favorite_dictionary[[0, 1, 2]] = "nice_list"
    # TypeError: unhashable type: 'list'

    This is because the list can change. It's mutable. And if it changes, then Python won't be able to find it in the dictionary anymore, even if you consider it to be the "same" list.

    Challenge: Caching colors diary-- write a program that lets you write down your opinions on various colors. If you attempt to give an opinion on a color twice, print out the "cached" saved response from the first time.
    Color diary solution
    color_opinions = {}
    
    user_input = input("Welcome! What is the color? ")
    while user_input != "done":
        if user_input in color_opinions:
            print("No! You already gave an opinion on", \
    				user_input, "which was", color_opinions[user_input])
        else:
            new_opinion = input("What is your opinion: ")
            color_opinions[user_input] = new_opinion
        
        user_input = input("What is the color? ")
    
    print(color_opinions)

    How does a dictionary (or hash table, same thing) actually work? Your input is converted into a "hash", which is then stored in a big list. Instead of traversing the big list to find your element, you hash your element, and then jump straight to that index.

    https://upload.wikimedia.org/wikipedia/commons/7/7d/Hash_table_3_1_1_0_1_0_0_SP.svg Picture of hash tables from wikipedia

    What if two inputs create the same hash? That is known as a hash collision. This is a problem for dictionaries, since it prevents them from returning the correct value.

    Two common copes: open addressing-- bump all sequential items down by one. Separate chaining-- have a list of values that can exist at each hash.

    If this process is inefficient, it can cause serious problems. Most languages (including Python 3.2 and older) have had issues where the hashing algorithm was predictable, so someone could cause hashes to collide with carefully crafted inputs. This would take up so much computing power that it causes a denial of service (DoS).

    Let's make the revolving sushi simulator!

    Goals: serve some number of customers and keep track of total bill per group. Input should look like: "group_name order_item". Salmon is 2.50, inari is 2, tuna is 3, and california roll is 5. If it is a new group, welcome them.

    Sushi solution
    totals_dict = {}
    user_input = input(">> ")
    while user_input != "done":
        split_order = user_input.split()
        group_name = split_order[0]
        order = " ".join(split_order[1:])
    
        if group_name not in totals_dict:
            print("Welcome new group!")
            totals_dict[group_name] = 0
    
        if "salmon" in order:
            totals_dict[group_name] += 2.5
        elif "inari" in order:
            totals_dict[group_name] += 2
        elif "tuna" in order:
            totals_dict[group_name] += 3
        elif "california roll" in order:
            totals_dict[group_name] += 5
        else:
            print("Not a real thing! Banned!")
            totals_dict[group_name] += 1000
    
        user_input = input(">> ")
    
    print(totals_dict)