Why are there like seventy different kinds of strings in Rust???

Disclaimer: Memory Management

Assumption: You need to allocate memory to handle dynamically-sized data

How It Usually Goes


                    let my_string = "Hello, World!";
                    

                    fn print_string(string: &str) {
                        println!("Printing string: {}", string);
                    }

                    let my_string = "Hello, World!";
                    print_string(my_string);
                    

                    struct MyStruct {
                        my_string: &str,
                    }
                    

                    error[E0106]: missing lifetime specifier
                     --> src/main.rs:2:16
                      |
                    2 |     my_string: &str,
                      |                ^ expected lifetime parameter
                    

                    struct MyStruct {
                        my_string: String,
                    }

                    impl MyStruct {
                        pub fn new(string: &str) -> MyStruct {
                            MyStruct {
                                my_string: string,
                            }
                        }
                    }
                    

                    error[E0308]: mismatched types
                     --> src/main.rs:8:24
                      |
                    8 |             my_string: string,
                      |                        ^^^^^^
                      |                        |
                      |                        expected struct `std::string::String`, found &str
                      |                        help: try using a conversion method: `string.to_string()`
                      |
                      = note: expected type `std::string::String`
                                 found type `&str`
                    

                    impl MyStruct {
                        pub fn new(string: String) -> MyStruct {
                            MyStruct {
                                my_string: string,
                            }
                        }
                    }

                    let my_struct = MyStruct::new("Hello, World!");
                    

                    error[E0308]: mismatched types
                      --> src/main.rs:14:31
                       |
                    14 | let my_struct = MyStruct::new("Hello, World!");
                       |                               ^^^^^^^^^^^^^^^
                       |                               |
                       |                               expected struct `std::string::String`, found reference
                       |                               help: try using a conversion method: `"Hello, World!".to_string()`
                       |
                       = note: expected type `std::string::String`
                                  found type `&'static str`
                    
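One way out of the loop above, sketched under the assumption that the struct should own its text: keep accepting a `&str` and copy it into a `String` inside the constructor, as the compiler's `to_string()` hint suggests.

```rust
struct MyStruct {
    my_string: String,
}

impl MyStruct {
    // Accept a borrowed &str and copy it into an owned String.
    pub fn new(string: &str) -> MyStruct {
        MyStruct {
            my_string: string.to_string(),
        }
    }
}

fn main() {
    // A string literal can now be passed directly.
    let my_struct = MyStruct::new("Hello, World!");
    println!("Stored string: {}", my_struct.my_string);
}
```

The copy happens once, in one visible place, and callers never have to convert anything themselves.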

String Types

  • String
  • &str
  • &'static str (???)
  • OsString
  • OsStr
  • CString
  • CStr
  • Path
  • PathBuf
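Most of the list above falls into owned/borrowed pairs. A minimal sketch of how each pair relates, using only standard library types (the variable names and the example path are illustrative):

```rust
use std::ffi::{CStr, CString, OsStr, OsString};
use std::path::{Path, PathBuf};

fn main() {
    // UTF-8 text: String owns the bytes, &str borrows them.
    let owned: String = String::from("hello");
    let borrowed: &str = owned.as_str();

    // Platform-native strings, which are not guaranteed to be valid UTF-8.
    let os_owned: OsString = OsString::from("hello");
    let os_borrowed: &OsStr = os_owned.as_os_str();

    // NUL-terminated strings for talking to C code.
    let c_owned: CString = CString::new("hello").expect("no interior NUL bytes");
    let c_borrowed: &CStr = c_owned.as_c_str();

    // File system paths, built on top of OsString/OsStr.
    let path_owned: PathBuf = PathBuf::from("/tmp/hello.txt");
    let path_borrowed: &Path = path_owned.as_path();

    println!(
        "{} {:?} {:?} {}",
        borrowed,
        os_borrowed,
        c_borrowed,
        path_borrowed.display()
    );
}
```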

Q: Why can't Rust make strings easy like they are in $LANG???

A: Because Rust has different goals than $LANG

Strings highlight a lot of the ways Rust differs from other programming languages

Two Major Differences

  • Memory management/ownership
  • Dealing with UTF-8

Memory and Ownership

Rust (the core language) only has one string type

&str

(pronounced "string slice")

Borrowed array of bytes containing valid UTF-8

Those bytes must exist somewhere in order to be borrowed
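To make "borrowed bytes containing valid UTF-8" concrete, here is a small sketch. The helper `bytes_as_str` is a hypothetical name; `std::str::from_utf8` is the standard library check it wraps.

```rust
use std::str;

// Hypothetical helper: view raw bytes as a &str, but only if they
// are valid UTF-8. No copy is made; the &str borrows `bytes`.
fn bytes_as_str(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    let valid: [u8; 2] = [0x48, 0x69]; // the bytes of "Hi"
    let invalid: [u8; 2] = [0xff, 0xfe]; // 0xff never appears in UTF-8

    println!("{:?}", bytes_as_str(&valid));
    println!("{:?}", bytes_as_str(&invalid));

    // Going the other way needs no check: a &str is already bytes.
    println!("{:?}", "Hi".as_bytes());
}
```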

String Literals


                    let my_string: &'static str = "Hello!";
                    

The bytes are baked into the application binary
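Because a literal's bytes live in the binary for the whole run of the program, a function can hand one out without allocating. A minimal sketch (`greeting` is a hypothetical name):

```rust
// Hypothetical helper: the returned slice points at bytes stored in
// the binary itself, so it is valid for the 'static lifetime.
fn greeting() -> &'static str {
    "Hello!"
}

fn main() {
    let my_string: &'static str = greeting();
    println!("{}", my_string);
}
```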

Dynamic Strings

std adds allocation


                    use std::io;
                    let mut input = String::new();
                    io::stdin().read_line(&mut input).unwrap();
                    println!("User input: {}", input);
                    

String

  • Allocates space in memory
  • Owns the allocated memory
  • Provides a safe API for filling that memory
  • Allows you to borrow a &str from that memory
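The four points above, walked through in order as a rough sketch. The helper `shout` is hypothetical and exists only to show a function that accepts a borrowed `&str`.

```rust
// Hypothetical helper that only needs a borrowed view of the text.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    // Allocates space in memory, and `owned` owns that allocation.
    let mut owned = String::with_capacity(16);

    // Safe API for filling that memory.
    owned.push_str("hello");
    owned.push('!');

    // Borrow a &str from that memory.
    let view: &str = &owned;
    println!("{} -> {}", view, shout(view));

    // `owned` goes out of scope at the end of main and frees its memory.
}
```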

Why do it this way?

  • Rust is performance-conscious
  • Rust is explicit
  • Rust gives you control
  • Rust emphasizes local reasoning

Example: Building Strings


                    assert_eq!("Hello, " + "World!", "Hello, World!");
                    

                    error[E0369]: binary operation `+` cannot be applied to type `&str`
                     --> src/main.rs:2:26
                      |
                    2 |     assert_eq!("Hello, " + "World!", "Hello, World!");
                      |                --------- ^ -------- &str
                      |                |         |
                      |                |         `+` cannot be used to concatenate two `&str` strings
                      |                &str
                    help: `to_owned()` can be used to create an owned `String` from a string reference. String concatenation appends the string on the right to the string on the left and may require reallocation. This requires ownership of the string on the left
                      |
                    2 |     assert_eq!("Hello, ".to_owned() + "World!", "Hello, World!");
                    

                    let mut result = String::new();
                    result.push_str("Hello, ");
                    result.push_str("World!");
                    

                    "Hello, ".to_owned() + "World!" + "some" + "more"
                    

                    let mut my_string = String::new();
                    for item in some_list {
                        my_string = my_string + ", " + item;
                    }
                    println!("{}", my_string);
                    

This is O(n^2) in most languages, but it's amortized O(n) in Rust, because `+` takes ownership of the left-hand String and appends to its existing buffer

Other languages have to introduce a separate string type to give you control when it matters:


                    var builder = new StringBuilder();
                    foreach (var item in someList)
                    {
                        builder.Append(", ").Append(item);
                    }
                    Console.WriteLine(builder);
                    
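For comparison, a sketch of the same loop in Rust, where `String` itself plays the builder role. `build_list` is a hypothetical name, and the leading ", " is kept on purpose to mirror the loop above.

```rust
// Hypothetical helper mirroring the StringBuilder loop: every item
// is appended in place to one growing buffer.
fn build_list(items: &[&str]) -> String {
    let mut builder = String::new();
    for item in items {
        builder.push_str(", ");
        builder.push_str(item);
    }
    builder
}

fn main() {
    let some_list = ["a", "b", "c"];
    println!("{}", build_list(&some_list));
}
```

If the leading separator is not wanted, `some_list.join(", ")` from the standard library is the usual shortcut.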

Copying strings


                    let my_string = String::from("Hello");
                    let another_string = my_string + ", World!";

                    assert_eq!(my_string, "Hello");
                    assert_eq!(another_string, "Hello, World!");
                    

                    error[E0382]: borrow of moved value: `my_string`
                     --> src/main.rs:5:5
                      |
                    2 |     let my_string = String::from("Hello");
                      |         --------- move occurs because `my_string` has type `std::string::String`, which does not implement the `Copy` trait
                    3 |     let another_string = my_string + ", World!";
                      |                          --------- value moved here
                    4 |
                    5 |     assert_eq!(my_string, "Hello");
                      |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ value borrowed here after move
                    

                    let my_string = String::from("Hello");
                    let another_string = my_string.clone() + ", World!";

                    assert_eq!(my_string, "Hello");
                    assert_eq!(another_string, "Hello, World!");
                    
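`clone()` is one fix. Another option, sketched here as an alternative rather than a recommendation, is to build the new string from a borrow with `format!`, which leaves the original untouched. `with_world` is a hypothetical name.

```rust
// Hypothetical helper: builds a new String from a borrowed prefix.
// format! only borrows its arguments, so nothing is moved.
fn with_world(prefix: &str) -> String {
    format!("{}, World!", prefix)
}

fn main() {
    let my_string = String::from("Hello");
    let another_string = with_world(&my_string);

    // Both values are still usable.
    assert_eq!(my_string, "Hello");
    assert_eq!(another_string, "Hello, World!");
}
```

Either way a second allocation happens; the difference is only in how visibly the code says so.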

Local reasoning


                    struct MyStruct
                    {
                        char *myString;
                    };
                    
  • Where does myString live?
  • Was it allocated?
  • If it was, who needs to free it?
  • Is the answer different in different cases?

                    struct MyStruct {
                        my_string: String,
                    }
                    

vs


                    struct MyStruct<'a> {
                        my_string: &'a str,
                    }
                    
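A sketch of how the two declarations answer the questions above. The names `Owned`, `Borrowed`, and `make_owned` are hypothetical stand-ins for the two versions of `MyStruct`.

```rust
// The struct owns its text: the bytes are freed when the struct is dropped.
struct Owned {
    my_string: String,
}

// The struct borrows its text: someone else owns the bytes, and the
// lifetime 'a guarantees they outlive the struct.
struct Borrowed<'a> {
    my_string: &'a str,
}

// Returning an Owned from a function is fine: the String moves out with it.
fn make_owned() -> Owned {
    let local = String::from("temporary");
    Owned { my_string: local }
}

fn main() {
    let owned = make_owned();
    // `borrowed` may not outlive `owned`; the compiler checks this.
    let borrowed = Borrowed { my_string: &owned.my_string };
    println!("{} / {}", owned.my_string, borrowed.my_string);
}
```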

UTF-8

oh god why


                    assert_eq!("🤦🏼‍♂️".len(), 17);
                    

Please read https://hsivonen.fi/string-length/
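A short sketch of where the 17 comes from. The emoji is spelled out as its five code points so the example does not depend on how an editor stores it; `counts` is a hypothetical helper.

```rust
// Hypothetical helper: three different notions of "length".
// Returns (UTF-8 bytes, UTF-16 code units, Unicode scalar values).
fn counts(s: &str) -> (usize, usize, usize) {
    (s.len(), s.encode_utf16().count(), s.chars().count())
}

fn main() {
    // Facepalm + skin tone + zero-width joiner + male sign + variation selector.
    let facepalm = "\u{1F926}\u{1F3FC}\u{200D}\u{2642}\u{FE0F}";
    let (bytes, utf16_units, scalars) = counts(facepalm);
    println!(
        "bytes: {}, UTF-16 units: {}, chars: {}",
        bytes, utf16_units, scalars
    );
}
```

`len()` reports bytes (4 + 4 + 3 + 3 + 3), which is the one number Rust can give in constant time.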

oh god it's so complicated


                    let my_string = "Hello";
                    let first_letter = my_string[0];
                    assert_eq!(first_letter, 'H');
                    

                    error[E0277]: the type `str` cannot be indexed by `{integer}`
                     --> src/main.rs:3:24
                      |
                    3 |     let first_letter = my_string[0];
                      |                        ^^^^^^^^^^^^ string indices are ranges of `usize`
                      |
                      = help: the trait `std::slice::SliceIndex` is not implemented for `{integer}`
                      = note: you can use `.chars().nth()` or `.bytes().nth()`
                              see chapter in The Book 
                      = note: required because of the requirements on the impl of `std::ops::Index<{integer}>` for `str`
                    

                    let my_string = "Hello";
                    let first_letter = &my_string[0..1];
                    assert_eq!(first_letter, 'H');
                    

                    error[E0277]: can't compare `&str` with `char`
                     --> src/main.rs:4:5
                      |
                    4 |     assert_eq!(first_letter, 'H');
                      |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no implementation for `&str == char`
                      |
                      = help: the trait `std::cmp::PartialEq` is not implemented for `&str`
                    

                    let my_string = "Hello";
                    let first_letter = my_string.chars().nth(0).unwrap();
                    assert_eq!(first_letter, 'H');
                    
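Why the range version is risky: ranges index bytes, not characters. A sketch with one non-ASCII letter, written as an escape so the byte layout is unambiguous (`first_char` is a hypothetical helper):

```rust
// Hypothetical helper: the first character, or None for an empty string.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

fn main() {
    let s = "h\u{e9}llo"; // "héllo": the é takes two bytes in UTF-8

    // Byte 2 falls in the middle of é, so this range is not a valid slice.
    // `get` returns None where `&s[0..2]` would panic.
    println!("{:?}", s.get(0..2));

    // Byte 3 is a character boundary, so this works.
    println!("{:?}", s.get(0..3));

    println!("{:?} {:?}", first_char(s), s.chars().nth(1));
}
```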

There are many, many examples of things that seem easy in other languages but are complicated in Rust

Here's the thing...

Strings are hard, actually

If you write your application assuming all input is ASCII, it will break

often in surprising and horrible ways
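One small illustration of such a break, assuming code that expects uppercasing to keep the length of a string unchanged. `shout` is a hypothetical helper and the German word is just sample input.

```rust
// Hypothetical helper: Unicode-aware uppercasing.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let word = "stra\u{df}e"; // "straße"
    let upper = shout(word);

    // The ß becomes "SS", so the character count grows from 6 to 7.
    println!("{} ({} chars)", word, word.chars().count());
    println!("{} ({} chars)", upper, upper.chars().count());

    // The ASCII-only variant leaves ß untouched instead.
    println!("{}", word.to_ascii_uppercase());
}
```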

  • Rust cares about correctness
  • Rust cares about consistency
  • Rust will make you think about edge cases early

Conclusion Goes Here