Text is typed into the console from the keyboard. Find the most frequently used pairs of words in the entered text. (Python)

#!/usr/bin/python
# -*- coding: utf-8 -*-
 
import re
 
txt = """Regular expressions (called REs, or regexes, or regex patterns) are essentially a tiny, highly specialized programming language embedded inside Python and made available through the re module. Using this little language, you specify the rules for the set of possible strings that you want to match; this set might contain English sentences, or e-mail addresses, or TeX commands, or anything you like. You can then ask questions such as “Does this string match the pattern?”, or “Is there a match for the pattern anywhere in this string?”. You can also use REs to modify a string or to split it apart in various ways.
Regular expression patterns are compiled into a series of bytecodes which are then executed by a matching engine written in C. For advanced use, it may be necessary to pay careful attention to how the engine will execute a given RE, and write the RE in a certain way in order to produce bytecode that runs faster. Optimization isn’t covered in this document, because it requires that you have a good understanding of the matching engine’s internals.
The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions. There are also tasks that can be done with regular expressions, but the expressions turn out to be very complicated. In these cases, you may be better off writing Python code to do the processing; while Python code will be slower than an elaborate regular expression, it will also probably be more understandable.
"""
 
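# Lowercase the text, then drop every character that is not a digit,
# a Latin or Cyrillic letter, or whitespace. In Python 3 a plain raw string
# is already Unicode, so the Python 2 `ur` prefix is not used.
txt = txt.lower()
txt = re.sub(r'[^0-9а-яёa-z\s]', '', txt)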
 
# Split on any run of whitespace (spaces and newlines alike),
# then join every word with the word that follows it.
txt = txt.split()
dc = [a + ' ' + b for a, b in zip(txt, txt[1:])]
 
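# Count each pair in a single pass, then keep only pairs seen more than once.
counts = {}
for pair in dc:
    counts[pair] = counts.get(pair, 0) + 1
res = {pair: c for pair, c in counts.items() if c > 1}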
 
 
# Print the repeated pairs, most frequent first.
for pair in sorted(res, key=res.get, reverse=True):
    print(pair + ': ' + str(res[pair]))
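
The same idea can be sketched more compactly with `collections.Counter` from the standard library. This is only an illustrative variant, not the original author's code: the function name `top_pairs` and its `limit` parameter are made up for the example, and it applies the same cleanup rule as the script above (drop everything except digits, letters, and whitespace).

```python
import re
from collections import Counter


def top_pairs(text, limit=5):
    """Return up to `limit` most common adjacent word pairs with their counts.

    `top_pairs` and `limit` are hypothetical names chosen for this sketch.
    """
    # Lowercase, strip punctuation, and split on any whitespace.
    cleaned = re.sub(r'[^0-9a-zа-яё\s]', '', text.lower())
    words = cleaned.split()
    # zip(words, words[1:]) yields every pair of neighbouring words.
    pairs = Counter(zip(words, words[1:]))
    # most_common sorts by count, highest first.
    return pairs.most_common(limit)
```

To process text typed at the keyboard, as the task statement asks, one option is to pass `sys.stdin.read()` (after `import sys`) to `top_pairs` and print each returned pair with its count.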
