11.10. Regexp Split

11.10.1. About

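  • re.split(pattern, string, maxsplit=0, flags=0)

  • Splits a string at every match of the pattern and returns a list of substrings

  • maxsplit limits the number of splits; the rest of the string becomes the last element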
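
As a minimal sketch of what splitting by pattern adds over str.split(): a pattern can describe several kinds of separators at once. The sample string and variable names below are illustrative only.

```python
import re

# Illustrative sample text with mixed separators
DATA = 'apple, banana;cherry  date'

# str.split() handles one fixed separator only
print(DATA.split(', '))
# ['apple', 'banana;cherry  date']

# re.split() accepts a pattern: a comma, semicolon or whitespace,
# followed by any amount of extra whitespace
parts = re.split(r'[,;\s]\s*', DATA)
print(parts)
# ['apple', 'banana', 'cherry', 'date']
```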

11.10.2. Examples

Listing 426. Usage of re.split()
import re

# A three-letter word surrounded by whitespace
PATTERN = r'\s[a-z]{3}\s'
INPUT = 'Baked Beans And Spam'

re.split(PATTERN, INPUT, flags=re.IGNORECASE)
# ['Baked Beans', 'Spam']
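
Two further behaviours of re.split() may be worth a quick sketch: a capturing group in the pattern keeps the matched separator in the result, and maxsplit stops splitting after the given number of splits. The variable names are illustrative.

```python
import re

INPUT = 'Baked Beans And Spam'

# A capturing group keeps the matched word in the result
with_separator = re.split(r'\s([a-z]{3})\s', INPUT, flags=re.IGNORECASE)
print(with_separator)
# ['Baked Beans', 'And', 'Spam']

# maxsplit=1 stops after the first split; the rest stays in one piece
first_word = re.split(r'\s', INPUT, maxsplit=1)
print(first_word)
# ['Baked', 'Beans And Spam']
```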
Listing 427. Making a Phonebook
import re


TEXT = """Jan Twardowski: 834.345.1254 Polish Space Agency

Mark Watney: 892.345.3428 Johnson Space Center
Matt Kowalski: 925.541.7625 Kennedy Space Center


Melissa Lewis: 548.326.4584 Bajkonur, Kazakhstan"""

# One or more newlines separate the entries
entries = re.split(r'\n+', TEXT)
print(entries)
# [
#   'Jan Twardowski: 834.345.1254 Polish Space Agency',
#   'Mark Watney: 892.345.3428 Johnson Space Center',
#   'Matt Kowalski: 925.541.7625 Kennedy Space Center',
#   'Melissa Lewis: 548.326.4584 Bajkonur, Kazakhstan'
# ]

# Split on whitespace preceded by an optional colon;
# maxsplit=3 keeps the address in one piece
output = [re.split(r':?\s', entry, maxsplit=3) for entry in entries]
print(output)
# [
#   ['Jan', 'Twardowski', '834.345.1254', 'Polish Space Agency'],
#   ['Mark', 'Watney', '892.345.3428', 'Johnson Space Center'],
#   ['Matt', 'Kowalski', '925.541.7625', 'Kennedy Space Center'],
#   ['Melissa', 'Lewis', '548.326.4584', 'Bajkonur, Kazakhstan']
# ]
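
The role of maxsplit=3 in the phonebook listing can be sketched on a single entry. The field names in FIELDS are hypothetical and chosen only for this illustration; they are not part of the original listing.

```python
import re

ENTRY = 'Jan Twardowski: 834.345.1254 Polish Space Agency'

# Hypothetical field names, chosen only for this illustration
FIELDS = ('firstname', 'lastname', 'phone', 'address')

# maxsplit=3 yields exactly four fields; the address stays in one piece
values = re.split(r':?\s', ENTRY, maxsplit=3)
record = dict(zip(FIELDS, values))
print(record)
# {'firstname': 'Jan', 'lastname': 'Twardowski', 'phone': '834.345.1254', 'address': 'Polish Space Agency'}

# Without maxsplit the address is split into separate words as well
unlimited = re.split(r':?\s', ENTRY)
print(unlimited)
# ['Jan', 'Twardowski', '834.345.1254', 'Polish', 'Space', 'Agency']
```

Pairing the values with names through dict(zip(...)) is one possible design; a tuple or a dataclass would work equally well.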

11.10.3. Assignments

11.10.3.1. Moon Speech (split)

  • Complexity level: easy

  • Lines of code to write: 5 lines

  • Estimated time of completion: 10 min

  • Solution: solution/split_moon_speech.py

  • References: "Moon Speech" by John F. Kennedy at Rice Stadium, Houston, TX on 1962-09-12 [re-1]

English
  1. Download the "Moon Speech" text from data/moon_speech.html

  2. Save it as moon_speech.html

  3. Use re.split() to split the text into paragraphs

  4. Print the paragraph starting with "We choose to go to the moon"
