Day 11: File Input/Output

CMSC 201 (Fall 2024) @ UMBC

Agenda:

  • Project?! (and HW5 due this Friday!)
  • "Fun": hacker news
  • Who cares? (AKA: what will we make)
  • Files, opening
  • Files, reading/writing, newlines, closing
  • How does Linux actually work?
  • Files, "with"

    Goal: Learn about files, maybe have fun :)

    My email is: sdonahue@umbc.edu (Shane Donahue), office hours Tu/Th ITE 373 2-3PM.

    Project #1

    Project #1 is currently out! It's due Friday 2024-11-01 (Nov 1) at 11:59:59 PM! (So you have about three weeks.)

    Let's look at project 1!

    HW5 is still out and due Friday 2024-10-18 at 11:59:59 PM :)

    "Fun": hacker news

    It's like reddit, but harder to read! news.ycombinator.com

    Who cares (about file input/output)?

    Writing to files is really the only way we have to "save" information from a program so that it survives after the program ends or the computer turns off! (Files live in non-volatile storage, unlike variables in memory.) Let's use it to make a "database" for the same revolving sushi calculator, so that it works between runs! https://i.ytimg.com/vi/dgs2Hxo25Cs/maxresdefault.jpg

    Files

    We've been writing and reading files the whole time (when you emacs filename.py, you're editing a file). But we can use Python to automate the reading and writing of these files.

    Before using a file, you must open it!

    my_file = open("cool_file.txt")

    This returns a file "object" (an instance of a class). When we open a file like this in Python, the file needs to already exist. Otherwise, we get:

    FileNotFoundError: [Errno 2] No such file or directory: 'cool_file.txt'

    The default mode (when you don't pass one, like we didn't above) is "read" or "r". In order to write to a file (creating it if needed), we need to pass either the "write" ("w") or "append" ("a") mode:

    # Open file in write mode
    my_file = open("cool_file.txt", "w")

    The modes are:

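  • r: Read (read only; error if the file does not exist)
  • w: Write (creates the file if needed and erases any existing contents; can't read)
  • a: Append (creates the file if needed; adds to the end)
  • x: Create (creates a new file for writing; error if it already exists)

    # Open in append mode
    my_file = open("cool_file.txt", "a")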
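
    Here is a rough sketch of how "x", "a", and "w" differ in practice. It assumes a scratch file in a temporary directory (the name modes_demo.txt is made up for this example) so that no real files get overwritten.

```python
import os
import tempfile

# Work in a throwaway directory so we don't clobber any real files.
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "modes_demo.txt")  # hypothetical file name

# "x" creates the file, and fails if it already exists.
f = open(path, "x")
f.write("first\n")
f.close()

# "a" keeps the old contents and adds to the end.
f = open(path, "a")
f.write("second\n")
f.close()

f = open(path, "r")
after_append = f.read()
f.close()

# "w" erases whatever was there before writing.
f = open(path, "w")
f.write("third\n")
f.close()

f = open(path, "r")
after_write = f.read()
f.close()

# Trying "x" on a file that already exists raises FileExistsError.
try:
    open(path, "x")
    x_failed = False
except FileExistsError:
    x_failed = True

print(repr(after_append))  # both lines are there
print(repr(after_write))   # only the last write survived
print(x_failed)
```

    The thing to notice is that "w" silently throws away the old contents, so "a" is usually the safer choice when you want to keep what is already in the file.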

    Once we have this "object", we can use methods on it just like we have methods for strings and lists and dictionaries, except now they're file methods.

    Some useful reading methods are:

  • read(): read ALL the contents
  • readlines(): read ALL the lines into a list, split on newline, without removing the newline from each line...
  • readline(): same thing, but just the next line

    Files are a "stream" style of "object"... Reading moves your position forward in the file, so once you've read something, the next read picks up where the last one stopped. It's not "idempotent". You can run the same method twice and get a DIFFERENT result.

    my_file = open("cool_file.txt", "r")
    print(my_file.readlines()) # Everything!
    print(my_file.readlines()) # Nothing...
    
    Challenge: Get all newline-separated lines from a file into a big list, and remove the newlines (readlines w/o included newlines).
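
    One possible way to approach this challenge, as a sketch rather than the official solution. The helper name read_clean_lines and the demo file contents are made up for illustration.

```python
import os
import tempfile

def read_clean_lines(file_name):
    """Return every line of the file as a list, with trailing newlines removed."""
    my_file = open(file_name, "r")
    clean_lines = []
    for line in my_file.readlines():
        # rstrip("\n") only removes a newline if one is actually there.
        clean_lines.append(line.rstrip("\n"))
    my_file.close()
    return clean_lines

# Build a small demo file in a temporary directory (hypothetical contents).
demo_path = os.path.join(tempfile.mkdtemp(), "cool_file.txt")
my_file = open(demo_path, "w")
my_file.write("salmon\ninari\ntuna")  # note: no newline after the last line
my_file.close()

clean = read_clean_lines(demo_path)
print(clean)
```

    Using rstrip("\n") instead of slicing with [:-1] matters for the last line: if the file doesn't end with a newline, [:-1] would chop off a real character.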

    There's also another part of the mode, "b" for binary and "t" for text ("rb" would be read binary, "wb" would be write binary, etc). This deals with character encodings. It uses "t" by default, which turns the bytes in the file into strings. In "b" (binary) mode you get raw bytes back instead of strings, so any non-ASCII characters will not be shown properly, and similarly if you have any binary files (photos, music, etc) then they will not be processed properly in text mode. (But we don't really need to worry about this.)
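
    To make the text/binary difference a little more concrete, here is a small sketch. It assumes the file is written and read as UTF-8, and the file name encoding_demo.txt is invented for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "encoding_demo.txt")  # hypothetical name

# Write a word with one non-ASCII character (the e with an accent in "cafe").
f = open(path, "w", encoding="utf-8")
f.write("caf\u00e9\n")
f.close()

# Text mode: we get a str back, and the accented letter is one character.
f = open(path, "r", encoding="utf-8")
text_content = f.read()
f.close()

# Binary mode: we get raw bytes back, and the accented letter is two bytes.
f = open(path, "rb")
binary_content = f.read()
f.close()

print(type(text_content))
print(type(binary_content))
print(binary_content)  # the accented letter shows up as escaped bytes
```

    The same file holds more bytes than characters, because UTF-8 stores the accented letter in two bytes.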

    The writing equivalents:

  • write(content): write whatever you pass (returns number of characters written)
  • writelines(list_content): write list of strings (does not add newlines)

    Buffering... where did our content go? Why is it not in the file?

    Our output is "buffered" to the file. That means it won't actually be written until we close the file or we reach some threshold of output (the exact size depends on the system, but it is often 4096 or 8192 bytes).
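
    A small experiment that shows buffering in action, using the file's flush() method to push the buffer out early. This is a sketch: the file name buffer_demo.txt is made up, and the size right after the first write is only what you would typically see, not a guarantee.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffer_demo.txt")  # hypothetical name

out = open(path, "w")
out.write("hello")

# The text is probably still sitting in the buffer, so the file on disk
# is most likely still empty at this point.
size_before = os.path.getsize(path)

# flush() forces the buffered text out to the file without closing it.
out.flush()
size_after_flush = os.path.getsize(path)

out.write(" world")
out.close()  # closing also flushes whatever is left in the buffer
size_after_close = os.path.getsize(path)

print(size_before, size_after_flush, size_after_close)
```

    If you ever look at a file while your program is still running and the content is missing, buffering is the usual suspect.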

    File paths, how do they work?

    In Linux, / is the "root directory". This is kind of like C:\ on Windows, if you've seen that. All files live under /. For example, /etc/passwd is a file in the etc folder under /. If you have a file path that starts with a /, it's an absolute path. If you start with just the file name or ./ or ../, it's a relative path.
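
    Python's os.path module can tell these two kinds of paths apart, and can turn a relative path into an absolute one. A quick sketch (the results in the comments assume you are running on Linux):

```python
import os

# Starts with / so it is absolute (True on Linux).
etc_is_absolute = os.path.isabs("/etc/passwd")

# Just a file name, or something starting with ../, is relative.
name_is_absolute = os.path.isabs("cool_file.txt")
parent_is_absolute = os.path.isabs("../cool_file.txt")

# abspath() fills in the current working directory for us.
full_path = os.path.abspath("cool_file.txt")

print(etc_is_absolute, name_is_absolute, parent_is_absolute)
print(full_path)
```

    A relative path is always interpreted starting from the directory you ran the program in, which is why the same open("cool_file.txt") can work in one folder and fail in another.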

    Challenge: Save our fingers from typing every single line of a given input for homework! Automatically read in test input from a file.
    Automated HW test cases
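    HUMAN_INPUT = False  # set to True to type the input by hand
    test_input = open("hw5-1-1.txt")
    
    def get_input(prompt):
        if HUMAN_INPUT:
            return input(prompt)
        else:
            # rstrip("\n") is safer than [:-1]: the last line of the
            # file may not end with a newline.
            return test_input.readline().rstrip("\n")
    
    if __name__ == "__main__":
        order_list = []
        user_input = get_input("Order: ")
        # readline() returns "" at the end of the file, so we also stop on ""
        # (otherwise a test file without "place order" would loop forever).
        while user_input != "place order" and user_input != "":
            order_list.append(user_input)
            user_input = get_input("Order: ")
    
        test_input.close()
        print(order_list)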

    How does Linux actually work?

    Python has to interact with our hard drive. We do not trust Python to use our hard drive directly. It could break it or corrupt all of our files! Similarly, we don't want any random software on our computer to be able to use it directly...

    Enter: the kernel! Linux provides an interface of "system calls", a lot like functions in a programming language, to interact with files. When you use Python's "open" function, it will use Linux's "system call" to actually open the file. The modes we use loosely correspond to the modes for Linux's system call. The kernel hands back a "file descriptor": a small number that identifies the open file.
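
    You can peek at the file descriptor from Python. The sketch below uses fileno() on a normal file object, and then the lower-level os.open, os.read, and os.close functions, which are thin wrappers around the system calls. The file name fd_demo.txt is made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fd_demo.txt")  # hypothetical name

# A normal Python file object is sitting on top of a file descriptor.
f = open(path, "w")
fd = f.fileno()
print(fd)  # a small integer; the exact value depends on what else is open
f.write("hi\n")
f.close()

# The os module lets us talk to the kernel more directly.
raw_fd = os.open(path, os.O_RDONLY)  # returns the file descriptor itself
raw_bytes = os.read(raw_fd, 100)     # read up to 100 bytes
os.close(raw_fd)
print(raw_bytes)
```

    In everyday Python you would stick with open(); this is only meant to show what is happening underneath.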

    Anyway... When we're done with a file, we need to close it. If we forget to close it, we may never write the content (due to buffering) or we may prevent another program from opening it. Forgetting to close the "handle" is called a "leak" or "file descriptor leak".

    my_file.close()

    Challenge (word finder): take a word from the user and print which row and word number it can be found at.
    Word finder
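    # bee-movie.txt contains the IMDB page for the bee movie
    my_file = open("bee-movie.txt", "r")
    lines = my_file.readlines()
    my_file.close()  # everything we need is in "lines" now
    
    def search_word(word, lines):
        # Returns [row, column] of the first match, or None if not found.
        row = 0
        for line in lines:
            split_line = line.split()
            column = 0
            # Don't name this variable search_word: that would shadow the function.
            for current_word in split_line:
                if word == current_word:
                    return [row, column]
                column += 1
            row += 1
        return None
    
    if __name__ == "__main__":
        print(search_word("Jerry", lines))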

    There is a Python feature for helping you not forget to close the file. It is "with".

    with open(file_name, 'r') as f:
        print(f.readlines())
    # Automatically closed as soon as we reach this point!
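
    To convince yourself that "with" really does close the file, you can check the file object's closed attribute. A minimal sketch, with a made-up file name:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical name

with open(path, "w") as f:
    f.write("tuna\n")
    inside = f.closed   # still open while we are inside the block

outside = f.closed      # closed as soon as the block ends

print(inside, outside)

# Because the file was closed (and so flushed), the content is really there.
with open(path) as f:
    contents = f.read()
print(repr(contents))
```

    The file is closed even if something goes wrong inside the block, which is why "with" is usually preferred over calling close() yourself.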

    Let's save some sushi prices!

    Sushi permanence: Save all the entries from the totals dictionary to a file, and read it every time the program starts. (We are going to provide a custom "serialization" structure for the dictionary: one "group:total" entry per line.)
    Sushi database!
    def add_cost(order):
        if "salmon" in order:
            return 2.5
        elif "inari" in order:
            return 2
        elif "tuna" in order:
            return 3
        elif "california roll" in order:
            return 5
        print("What is that?!")
        return 0
    
    user_input = input("> ")
    
    totals = {}
    
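    # populate from disk db (on the very first run sushi.db doesn't exist yet,
    # so we just start with an empty dictionary)
    try:
        with open("sushi.db") as f:
            for line in f.readlines():
                line = line.strip()
                if line != "":
                    split_input = line.split(":")
                    totals[split_input[0]] = float(split_input[1])
    except FileNotFoundError:
        pass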
    
    while user_input != "done":
        split_order = user_input.split()
        # group_cool california roll
        group_name = split_order[0]
        order_name = " ".join(split_order[1:])
    
        if group_name in totals:
            totals[group_name] = totals[group_name] + add_cost(order_name)
        else:
            print("Welcome to sussssshiihi")
            totals[group_name] = add_cost(order_name)
    
        user_input = input("> ")
    
    print(totals)
    
    # write to saved db
    with open("sushi.db", "w") as f:
        for group in totals:
            f.write(group + ":" + str(totals[group]) + "\n")