## file-io demo  

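Open and read a text file, read its first and last lines, and repeat the exercise in binary mode. Only binary mode allows seeking backward from the end of the file, which should be faster for large files because the whole file does not have to be read. To be tested on larger text files.  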

Extends the basic idea from <i>Python Workout</i>, by Reuven Lerner, © 2020 Manning, chapter 5  
Started the week of August 7, 2020;  
updated the week of September 3, 2020  

### 0. Check python version, active conda env  

In [15]:
!python --version

Python 3.8.2


In [17]:
!conda env list  
# * is next to active conda env.  

# conda environments:
#
base                     /home/jyoon/conda3
dlpy                  *  /home/jyoon/conda3/envs/dlpy
fluentpy                 /home/jyoon/conda3/envs/fluentpy



### 1. Start practice

In [3]:
f = open('demofile.txt', 'rt')
blob = f.read()  # read all 
print(blob)

Hello! Welcome to demofile.txt
This file is for testing purposes.
Good Luck!



In [4]:
print(f.closed)  # check if f is closed at this point.

False


In [5]:
f = open('demofile.txt')  # Open again to go back to file start point.  
line = f.readline()  # readline() (no 's') reads a single line, here the first
print(line)

Hello! Welcome to demofile.txt



In [6]:
f = open('demofile.txt')
lines = f.readlines()  # readlines() (with 's') reads all lines into a list
print(lines[-1])  # print last line
f.close()

Good Luck!



### 2. Binary format reads 'rb'  

In [7]:
# Try binary format reading.  
with open('demofile.txt', 'rb') as f:  # binary format
    first = f.readline()
    f.seek(-5, 2)  # Move to 5 bytes before the end of the file (whence=2)
    #last = f.read()  # read() would return only those final 5 bytes, not the whole last line.
    last = f.readline()  # readline() works too, but still returns only part of the last line.
    print(first, last)


b'Hello! Welcome to demofile.txt\n' b'uck!\n'


In [8]:
# Finding the start of the last line from the end of the file needs a loop.  
with open('demofile.txt', 'rb') as f:  # binary format
    first = f.readline()
    
    f.seek(-2, 2)  # Move to 2 bytes before the end (skips a trailing newline)
    while f.read(1) != b"\n":   # read one byte; stop once it is a newline
        f.seek(-2, 1)   # step back 2 bytes: read(1) advanced 1, so the net move is 1 byte back
    # after while loop, this is beginning of last line    
    last = f.read()  # read all bytes from here.   
    # last = f.readline() gives same output.  
    
    print(str(first)+"\n", str(last))

b'Hello! Welcome to demofile.txt\n'
 b'Good Luck!\n'
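
The loop above assumes the file has at least two lines; with a single-line file it would eventually try to seek before the start of the file. A minimal sketch of a reusable version follows, with that case and the empty-file case handled. The function name `read_last_line` and the temporary stand-in for `demofile.txt` are illustrative assumptions, not part of the original notebook.

```python
import os
import tempfile


def read_last_line(path):
    """Return the last line of a file as bytes, scanning backward from the end."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        if pos == 0:
            return b""  # empty file: nothing to read
        # Skip the final byte so a trailing newline is not mistaken for a line break.
        pos -= 1
        # Walk backward until the byte just before `pos` is a newline,
        # or until the start of the file is reached (single-line file).
        while pos > 0:
            f.seek(pos - 1)
            if f.read(1) == b"\n":
                break
            pos -= 1
        f.seek(pos)
        return f.readline()


# Stand-in for demofile.txt (assumed contents, copied from the output above).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demofile.txt")
    with open(path, "wb") as f:
        f.write(b"Hello! Welcome to demofile.txt\n"
                b"This file is for testing purposes.\n"
                b"Good Luck!\n")
    last = read_last_line(path)
    print(last)  # b'Good Luck!\n'

    single = os.path.join(tmp, "single.txt")
    with open(single, "wb") as f:
        f.write(b"only line")
    only = read_last_line(single)
    print(only)  # b'only line'
```

Reading one byte at a time is simple but slow on long last lines; reading backward in larger chunks is a common refinement.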


### 3. Good for short files: readlines() all at once, save the list, index [ ] into each line

In [9]:
# Good for short text files.  
with open('demofile.txt', 'rt') as f:  
    lines = f.readlines()
    first = lines[0]
    last = lines[-1]
    print(first, last)

print(f.closed)  # check file is closed after "with" block end.  

Hello! Welcome to demofile.txt
 Good Luck!

True
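
For files too large to hold in a list, one possible alternative is to iterate over the file and keep only the most recent line. The sketch below uses `collections.deque` with `maxlen=1`; the helper name `first_and_last` and the temporary file are assumptions for illustration. It still reads every line, so it saves memory rather than time.

```python
import os
import tempfile
from collections import deque


def first_and_last(path):
    """Return (first, last) lines while keeping only one line in memory at a time."""
    with open(path, "rt") as f:
        first = f.readline()
        # A deque with maxlen=1 consumes the remaining lines and retains only the final one.
        tail = deque(f, maxlen=1)
    # If there was only one line (or none), the last line is the first line.
    last = tail[0] if tail else first
    return first, last


# Stand-in for demofile.txt (assumed contents, copied from the output above).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demofile.txt")
    with open(path, "wb") as f:
        f.write(b"Hello! Welcome to demofile.txt\n"
                b"This file is for testing purposes.\n"
                b"Good Luck!\n")
    first, last = first_and_last(path)

print(repr(first), repr(last))
```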


### 4. seek(0) to go back to start, works with 'rt' 

In [10]:
f2 = open('demofile.txt')  
lines = f2.readlines()
print("lines: ", lines)

f2.seek(0)  # Goto beginning of file
blob= f2.read()
print("blob: ", blob)

f2.seek(0)
one = f2.readlines(1)  # the argument is a size hint, not a line count: reading stops once the hint is exceeded
print("one: ", one)

f2.seek(0)
two = f2.readlines()[1]  # Works!  prints 2nd item. nifty. 
print("two: ", two)

lines:  ['Hello! Welcome to demofile.txt\n', 'This file is for testing purposes.\n', 'Good Luck!\n']
blob:  Hello! Welcome to demofile.txt
This file is for testing purposes.
Good Luck!

one:  ['Hello! Welcome to demofile.txt\n']
two:  This file is for testing purposes.
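
The argument to `readlines()` is a size hint in characters, not a line count, which is why `readlines(1)` returned exactly one line here. The sketch below illustrates that, and shows `itertools.islice` as one way to fetch the second line without building the whole list. The temporary file is an assumed stand-in for `demofile.txt`.

```python
import os
import tempfile
from itertools import islice

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demofile.txt")
    with open(path, "wb") as f:
        f.write(b"Hello! Welcome to demofile.txt\n"      # 31 characters
                b"This file is for testing purposes.\n"  # 35 characters
                b"Good Luck!\n")                         # 11 characters

    with open(path, "rt") as f:
        one = f.readlines(1)        # first line alone already exceeds the hint
        f.seek(0)
        two_lines = f.readlines(40)  # 31 characters is under 40, so a second line is read
        f.seek(0)
        # islice(f, 1, 2) skips line index 0 and yields line index 1 only.
        second = next(islice(f, 1, 2), "")

print(len(one), len(two_lines), repr(second))
```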



### 5. Only binary read 'rb' can use relative indexing  

In [11]:
# Can seek() be used in text mode?  
# Yes, but only to offset 0 or to an offset previously returned by tell(), measured from the start. 
# If the file is opened in text mode (without 'b'), only offsets returned by tell() are legal;
# seek(0, 2) to the end of the file is the one exception. 
# Nonzero offsets relative to the current position or the end require binary mode ('rb').  

with open('demofile.txt', 'rb') as f:  # need to be in binary format
    first = f.readline()
    print(first)
    
    f.seek(0, 0)  
    last = f.readlines()[2]
    print(last)
 
    #f.seek(-2, 2)  # fails in text mode, where only offset 0 works with whence 1 or 2; fine in binary mode.  
    f.seek(0, 2)  # seek to end of file; this form also works in text mode.  
    print(f.tell())
    f.seek(-11, 1)
    print(f.tell())
    last = f.read()
    print(last) 


b'Hello! Welcome to demofile.txt\n'
b'Good Luck!\n'
77
66
b'Good Luck!\n'
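
To make the text-mode restriction concrete, the sketch below attempts the same end-relative seek in both modes. In text mode Python raises `io.UnsupportedOperation` for a nonzero end-relative seek. The temporary file is an assumed stand-in for `demofile.txt`.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demofile.txt")
    with open(path, "wb") as f:
        f.write(b"Hello! Welcome to demofile.txt\n"
                b"This file is for testing purposes.\n"
                b"Good Luck!\n")

    with open(path, "rt") as f:  # text mode
        try:
            f.seek(-11, os.SEEK_END)  # nonzero offset relative to the end
            text_result = "ok"
        except io.UnsupportedOperation:
            text_result = "UnsupportedOperation"
        f.seek(0, os.SEEK_END)        # offset 0 relative to the end is allowed
        end = f.tell()

    with open(path, "rb") as f:  # binary mode
        f.seek(-11, os.SEEK_END)      # same seek succeeds here
        tail = f.read()

print(text_result, end, tail)
```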


In [12]:
with open('demofile.txt') as f:  # text format again
    f.seek(0, 2)  # end of file  
    loc = f.tell(); print(loc)
    
    f.seek(0, 0)  # go back to beginning of file 
    print(f.tell())
    first = f.readline()
    print(first, f.tell())
    
    last = f.readlines()[-1]
    print(last, f.tell())
    
    f.seek(31, 0)
    two = f.readlines()
    print(two, f.tell())
    
    f.seek(31, 0)
    second = f.readline()
    print(second, f.tell())
    
    third = f.readline()
    print(third)

77
0
Hello! Welcome to demofile.txt
 31
Good Luck!
 77
['This file is for testing purposes.\n', 'Good Luck!\n'] 77
This file is for testing purposes.
 66
Good Luck!



### 6. Doesn't solve the large-file last-line problem, but another way to move around in 'rt' mode.

In [14]:
with open('demofile.txt') as f:  # text format again
    
    # num_ln = len(f.readlines()); print(num_ln)
    
    num_ch_all = len(f.read())  # character count; matches the byte offset only for single-byte text with '\n' endings
    print(num_ch_all)
    
    f.seek(0)  # back to beginning
    num_ch_last = len(f.readlines()[-1])
    print(num_ch_last)
    
    offset = num_ch_all - num_ch_last  # offset from file beginning
    f.seek(offset, 0)
    print(f.tell())
    print(f.read())

77
11
66
Good Luck!
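
The character arithmetic above works for this ASCII file, but character counts and byte counts diverge as soon as the text contains multi-byte characters. A hedged alternative is to record the value of `tell()` before each `readline()` and seek back to that. The helper name `last_line_via_tell`, the sample text, and the UTF-8 encoding are assumptions for illustration.

```python
import os
import tempfile


def last_line_via_tell(path, encoding="utf-8"):
    """Return (offset, last_line), where offset was returned by tell()."""
    offset = 0
    with open(path, "rt", encoding=encoding) as f:
        while True:
            pos = f.tell()       # legal seek target in text mode
            line = f.readline()  # readline(), not a for-loop: tell() is disabled during iteration
            if not line:
                break
            offset = pos         # remember where the most recent line started
        f.seek(offset)           # allowed because offset came from tell()
        return offset, f.read()


# Sample text with two non-ASCII characters (hypothetical content).
text = "na\u00efve caf\u00e9\nGood Luck!\n"
n_chars = len(text)
n_bytes = len(text.encode("utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "accents.txt")
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    offset, tail = last_line_via_tell(path)

print(n_chars, n_bytes, repr(tail))
```

Like the cell above, this still reads the whole file once, so it does not help with the large-file case.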



### 7. Read a slightly larger file, The Raven by Edgar Allan Poe  

In [2]:
with open('TheRaven-Poe.txt') as raven:  # read as default text format 
    lines = raven.readlines()
    first = lines[0]
    last = lines[-1]
    print("first: ", first, "\n", "last: ", last)
    

first:  The Raven
 
 last:                                       Shall be lifted--nevermore!


In [10]:
print(raven.closed)  # check whether it is closed; close() itself would return None
print(lines[0:10])

True
['The Raven\n', 'by\n', 'Edgar Allan Poe\n', '\n', '  Once upon a midnight dreary, while I pondered, weak and weary,\n', '  Over many a quaint and curious volume of forgotten lore--\n', '  While I nodded, nearly napping, suddenly there came a tapping,\n', '  As of some one gently rapping, rapping at my chamber door.\n', '  "\'Tis some visitor," I muttered, "tapping at my chamber door--\n', '                                     Only this and nothing more."\n']


In [11]:
print("Start of poem: ")
print(lines[4])

Start of poem: 
  Once upon a midnight dreary, while I pondered, weak and weary,



In [8]:
with open('TheRaven-Poe.txt') as raven:  # read again as text 
    blob = raven.read()  # Read whole thing  
    print(blob)

The Raven
by
Edgar Allan Poe

  Once upon a midnight dreary, while I pondered, weak and weary,
  Over many a quaint and curious volume of forgotten lore--
  While I nodded, nearly napping, suddenly there came a tapping,
  As of some one gently rapping, rapping at my chamber door.
  "'Tis some visitor," I muttered, "tapping at my chamber door--
                                     Only this and nothing more."

  Ah, distinctly I remember it was in the bleak December,
  And each separate dying ember wrought its ghost upon the floor.
  Eagerly I wished the morrow;--vainly I had sought to borrow
  From my books surcease of sorrow--sorrow for the lost Lenore--
  For the rare and radiant maiden whom the angels name Lenore--
                                     Nameless here for evermore.

  And the silken sad uncertain rustling of each purple curtain
  Thrilled me--filled me with fantastic terrors never felt before;
  So that now, to still the beating of my heart, I stood repeating
  "'Tis s

In [14]:
with open('TheRaven-Poe.txt', 'rb') as f:  # read as binary format 
    first = f.readline()
    print(first)
    
    for _ in range(3):  # Skip the next 3 lines (byline, author, blank line)
        f.readline()
    start = f.readline()  # Next line is start of poem  
    print(start)
    
    f.seek(-2, 2)  # Move to 2 bytes before the end of the file   
    while f.read(1) != b"\n":   # read one byte; stop once it is a newline
        f.seek(-2, 1)   # step back 2 bytes: read(1) advanced 1, so the net move is 1 byte back
    # after while loop, this is beginning of last line    
    last = f.read()  # read all bytes from here.   
    # last = f.readline() gives same output.  
    print(last)

b'The Raven\r\n'
b'  Once upon a midnight dreary, while I pondered, weak and weary,\r\n'
b'                                     Shall be lifted--nevermore!'
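
The binary output shows Windows-style `\r\n` line endings, which text mode would normally translate away. A small sketch of turning such bytes back into clean strings follows. The helper name `clean` is hypothetical, and UTF-8 is an assumed encoding for this file.

```python
def clean(raw, encoding="utf-8"):
    """Decode a bytes line and strip trailing CR/LF, keeping leading indentation."""
    return raw.decode(encoding).rstrip("\r\n")


# Byte strings copied from the output above.
first = clean(b"The Raven\r\n")
start = clean(b"  Once upon a midnight dreary, while I pondered, weak and weary,\r\n")
last = clean(b"                                     Shall be lifted--nevermore!")

print(repr(first))         # 'The Raven'
print(repr(last.strip()))  # 'Shall be lifted--nevermore!'
```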


### 8. Next: test speed using a 150-200 MB text file from the Google Ngram dataset.
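
A possible starting point for that comparison is sketched below. It times `readlines()[-1]` against the backward-seek loop using `time.perf_counter`. The synthetic file of a few megabytes is only an assumed stand-in for the Ngram data, and the function names are illustrative. No timing results are claimed here.

```python
import os
import tempfile
import time


def last_by_readlines(path):
    """Read every line into a list and return the last one."""
    with open(path, "rb") as f:
        return f.readlines()[-1]


def last_by_seek(path):
    """Scan backward from the end of the file to the start of the last line."""
    with open(path, "rb") as f:
        f.seek(-2, os.SEEK_END)
        while f.read(1) != b"\n":
            f.seek(-2, os.SEEK_CUR)  # net move of 1 byte back per iteration
        return f.read()


def timed(func, path):
    """Return (result, elapsed_seconds) for a single call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic.txt")
    with open(path, "wb") as f:
        for i in range(100_000):  # synthetic stand-in, far smaller than the real dataset
            f.write(b"line %d of synthetic test data\n" % i)
    r1, t1 = timed(last_by_readlines, path)
    r2, t2 = timed(last_by_seek, path)

print(f"readlines: {t1:.6f} s, seek: {t2:.6f} s, same result: {r1 == r2}")
```

A single run is noisy; repeating each measurement several times (for example with `timeit`) would give steadier numbers.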