Advertisement
Advertisement


Remove specific characters from a string in Python


Question

I'm trying to remove specific characters from a string using Python. This is the code I'm using right now. Unfortunately it appears to do nothing to the string.

for char in line:
    if char in " ?.!/;:":
        line.replace(char,'')

How do I do this properly?

2017/10/25
1
562
10/25/2017 10:59:21 AM

Accepted Answer

Strings in Python are immutable (can't be changed). Because of this, the effect of line.replace(...) is just to create a new string, rather than changing the old one. You need to rebind (assign) it to line in order to have that variable take the new value, with those characters removed.

Also, the way you are doing it is going to be kind of slow, relatively. It's also likely to be a bit confusing to experienced pythonators, who will see a doubly-nested structure and think for a moment that something more complicated is going on.

Starting in Python 2.6 and newer Python 2.x versions *, you can instead use str.translate, (but read on for Python 3 differences):

line = line.translate(None, '[email protected]#$')

or regular expression replacement with re.sub

import re
line = re.sub('[[email protected]#$]', '', line)

The characters enclosed in brackets constitute a character class. Any characters in line which are in that class are replaced with the second parameter to sub: an empty string.

In Python 3, strings are Unicode. You'll have to translate a little differently. kevpie mentions this in a comment on one of the answers, and it's noted in the documentation for str.translate.

When calling the translate method of a Unicode string, you cannot pass the second parameter that we used above. You also can't pass None as the first parameter. Instead, you pass a translation table (usually a dictionary) as the only parameter. This table maps the ordinal values of characters (i.e. the result of calling ord on them) to the ordinal values of the characters which should replace them, or—usefully to us—None to indicate that they should be deleted.

So to do the above dance with a Unicode string you would call something like

translation_table = dict.fromkeys(map(ord, '[email protected]#$'), None)
unicode_line = unicode_line.translate(translation_table)

Here dict.fromkeys and map are used to succinctly generate a dictionary containing

{ord('!'): None, ord('@'): None, ...}

Even simpler, as another answer puts it, create the translation table in place:

unicode_line = unicode_line.translate({ord(c): None for c in '[email protected]#$'})

Or create the same translation table with str.maketrans:

unicode_line = unicode_line.translate(str.maketrans('', '', '[email protected]#$'))

* for compatibility with earlier Pythons, you can create a "null" translation table to pass in place of None:

import string
line = line.translate(string.maketrans('', ''), '[email protected]#$')

Here string.maketrans is used to create a translation table, which is just a string containing the characters with ordinal values 0 to 255.

2020/01/20
637
1/20/2020 11:08:47 AM

Am I missing the point here, or is it just the following:

string = "ab1cd1ef"
string = string.replace("1","") 

print string
# result: "abcdef"

Put it in a loop:

a = "[email protected]#d$"
b = "[email protected]#$"
for char in b:
    a = a.replace(char,"")

print a
# result: "abcd"
2020/01/20

>>> line = "abc#@!?efg12;:?"
>>> ''.join( c for c in line if  c not in '?:!/;' )
'abc#@efg12'
2010/10/15

Easy peasy with re.sub regular expression as of Python 3.5

re.sub('\ |\?|\.|\!|\/|\;|\:', '', line)

Example

>>> import re

>>> line = 'Q: Do I write ;/.??? No!!!'

>>> re.sub('\ |\?|\.|\!|\/|\;|\:', '', line)
'QDoIwriteNo'

Explanation

In regular expressions (regex), | is a logical OR and \ escapes spaces and special characters that might be actual regex commands. Whereas sub stands for substitution, in this case with the empty string ''.

2020/09/02

For the inverse requirement of only allowing certain characters in a string, you can use regular expressions with a set complement operator [^ABCabc]. For example, to remove everything except ascii letters, digits, and the hyphen:

>>> import string
>>> import re
>>>
>>> phrase = '  There were "nine" (9) chick-peas in my pocket!!!      '
>>> allow = string.letters + string.digits + '-'
>>> re.sub('[^%s]' % allow, '', phrase)

'Therewerenine9chick-peasinmypocket'

From the python regular expression documentation:

Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. ^ has no special meaning if it’s not the first character in the set.

2014/01/25

The asker almost had it. Like most things in Python, the answer is simpler than you think.

>>> line = "H E?.LL!/;O:: "  
>>> for char in ' ?.!/;:':  
...  line = line.replace(char,'')  
...
>>> print line
HELLO

You don't have to do the nested if/for loop thing, but you DO need to check each character individually.

2011/12/14

Source: https://stackoverflow.com/questions/3939361
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]