
Moderation Filter Bypass in support.mozilla.org

SUMMARY

Recently, a triager at Mozilla pointed out to me that when a reply to a question contains a link, the reply is marked as spam and held for moderation.

So I decided to attack this functionality and try to bypass it.

CODE REVIEW

When making a reply, the view uses the AnswerForm form:

def reply(request, question_id):
    """Post a new answer to a question."""
    question = get_object_or_404(Question, pk=question_id, is_spam=False)
    answer_preview = None

    if not question.allows_new_answer(request.user):
        raise PermissionDenied

    form = AnswerForm(request.POST, **{"user": request.user, "question": question}) # LOOK HERE

The AnswerForm class has a method called clean. This method is responsible for checking whether the reply is spam.

class AnswerForm(KitsuneBaseForumForm):
    """Form for replying to a question."""

    content = forms.CharField(
        label=_lazy("Content:"),
        min_length=5,
        max_length=10000,
        widget=forms.Textarea(attrs={"placeholder": REPLY_PLACEHOLDER}),
    )

    class Meta:
        model = Answer
        fields = ("content",)

    def clean(self, *args, **kwargs):
        """Override clean method to exempt question owner from spam filtering."""
        cdata = super(AnswerForm, self).clean(*args, **kwargs) # LOOK HERE
        # if there is a reply from the owner, remove the spam flag
        if self.user and self.question and self.user == self.question.creator:
            cdata.pop("is_spam", None)

        return cdata

Here you can see that clean simply calls the clean method of the parent class, KitsuneBaseForumForm. Here is that class, with a few unimportant parts removed:

class KitsuneBaseForumForm(forms.Form):
    def __init__(self, *args, **kwargs):
        # UNIMPORTANT SNIPPET

    def clean(self, *args, **kwargs):
        cdata = self.cleaned_data.get("content")
        
        # UNIMPORTANT SNIPPET

        if not (
            self.user.groups.filter(name__in=TRUSTED_GROUPS).exists()
            or self.user.has_perm("flagit.can_moderate")
            or self.user.has_perm("sumo.bypass_ratelimit")
        ) and check_for_spam_content(cdata): # LOOK HERE
            self.cleaned_data.update({"is_spam": True})

        return self.cleaned_data

Here you can see that, unless the user belongs to a trusted group or holds one of the listed permissions, it calls check_for_spam_content on our input. If that returns a truthy value, our reply is marked as spam and held for moderation.

def check_for_spam_content(data):
    digits = "".join(filter(type(data).isdigit, data))
    is_toll_free = settings.TOLL_FREE_REGEX.match(digits)

    is_nanp_number = match_regex_with_timeout(settings.NANP_REGEX, data)

    has_links = has_blocked_link(data) # INTERESTING PART

    return is_toll_free or is_nanp_number or has_links

In the check_for_spam_content function, you can see the function that checks our input for links: has_blocked_link.

def has_blocked_link(data):
    for match in POTENTIAL_LINK_REGEX.finditer(data):
        tld = match.group(1).upper()
        if tld in VALID_TLDS: # VULNERABLE PART OF THE CODE
            full_domain = match.group(0).lower()
            in_allowlist = False
            for allowed_domain in settings.ALLOW_LINKS_FROM:
                split = full_domain.rsplit(allowed_domain, 1)
                if len(split) != 2 or split[-1]:
                    continue
                if not split[0] or split[0][-1] == ".":
                    in_allowlist = True
                    break
            if not in_allowlist:
                return True
    # UNIMPORTANT SNIPPET
    return False
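
The allowlist test inside that loop is easy to misread, so here is a minimal standalone sketch of just that part. The helper name is_allowlisted and the sample allowed tuple are assumptions for illustration; the real values come from settings.ALLOW_LINKS_FROM, which is not shown in this post.

```python
def is_allowlisted(full_domain, allowed_domains):
    """Return True if full_domain is an allowed domain or a subdomain of one."""
    for allowed_domain in allowed_domains:
        split = full_domain.rsplit(allowed_domain, 1)
        # rsplit yields two parts only when allowed_domain occurs in the host;
        # the trailing part must be empty, meaning the match is a suffix.
        if len(split) != 2 or split[-1]:
            continue
        # Accept an exact match, or a prefix that ends on a label boundary.
        if not split[0] or split[0][-1] == ".":
            return True
    return False


allowed = ("mozilla.org",)  # hypothetical sample, not the real setting
print(is_allowlisted("support.mozilla.org", allowed))   # True
print(is_allowlisted("evilmozilla.org", allowed))       # False
print(is_allowlisted("mozilla.org.evil.com", allowed))  # False
```

So the allowlist logic itself rejects lookalike hosts. The weak spot is the TLD test that decides whether this logic runs at all.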

By default, has_blocked_link returns False, which is what we want. You can see that before anything else, it checks whether the TLD of each matched link is in VALID_TLDS; a link whose TLD is not in that set is skipped entirely. Let's see what VALID_TLDS is.

# downloaded from https://data.iana.org/TLD/tlds-alpha-by-domain.txt
path = os.path.join(os.path.dirname(__file__), "tlds-alpha-by-domain.txt")

with open(path) as f:
    VALID_TLDS = set(f.read().splitlines()[1:])

It is the tlds-alpha-by-domain.txt list from iana.org. Opening the copy of this file in the repository, I found that it is an outdated version of the list.

# Version 2020062200, Last Updated Mon Jun 22 07:07:01 2020 UTC
AAA
AARP
ABARTH
...

So I thought: what if new TLDs were registered between 2020 and today? I downloaded the newest TLD list from https://data.iana.org/TLD/tlds-alpha-by-domain.txt, compared the two, and extracted every TLD that appears in only one of the two files.

$ cat tlds-alpha-by-domain.txt* | sort | uniq -u
AFAMILYCOMPANY
AIGO
BUDAPEST
CASEIH
CEB
CSC
DUCK
FUJIXEROX
GLADE
INTEL
IVECO
JCP
LIXIL
LUPIN
METLIFE
MUSIC
NATIONWIDE
NEWHOLLAND
OFF
ONYOURSIDE
QVC
RAID
RIGHTATHOME
RMIT
SCJOHNSON
SHRIRAM
SPA
SPREADBETTING
SWIFTCOVER
SYMANTEC
# Version 2020062200, Last Updated Mon Jun 22 07:07:01 2020 UTC
# Version 2022040300, Last Updated Sun Apr  3 07:07:01 2022 UTC
XN--3OQ18VL8PN36A
XN--4DBRK0CE
XN--KPU716F
XN--PBT977C

Since that output mixes entries unique to either file, I wrote a simple Python script to see which of these TLDs are missing from Mozilla's old list.

import os

# TLDs that appear in only one of the two text files
lamaw = ["AFAMILYCOMPANY","AIGO","BUDAPEST","CASEIH","CEB","CSC","DUCK","FUJIXEROX","GLADE","INTEL","IVECO","JCP","LIXIL","LUPIN","METLIFE","MUSIC","NATIONWIDE","NEWHOLLAND","OFF","ONYOURSIDE","QVC","RAID","RIGHTATHOME","RMIT","SCJOHNSON","SHRIRAM","SPA","SPREADBETTING","SWIFTCOVER","SYMANTEC","XN--3OQ18VL8PN36A","XN--4DBRK0CE","XN--KPU716F","XN--PBT977C"]

path = os.path.join(os.path.dirname(__file__), "tlds-alpha-by-domain.txt")

# Read Mozilla's old list once, skipping the version header on the first line
with open(path) as f:
    VALID_TLDS = set(f.read().splitlines()[1:])

# Print every candidate TLD that the old list does not contain
for i in lamaw:
    if i not in VALID_TLDS:
        print(i)

Running it, I got all the valid TLDs that are not on Mozilla's list.
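
The shell comparison and the filtering script could also be collapsed into a single set difference. The sketch below is only an illustration: load_tlds is a hypothetical helper, and the two strings are tiny made-up excerpts standing in for the real 2020 and 2022 files.

```python
def load_tlds(text):
    """Parse the IANA list format: skip blank and '#' comment lines."""
    tlds = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            tlds.add(line.upper())
    return tlds


# Made-up excerpts standing in for the old and new list files.
old_list = "# Version 2020062200\nAAA\nAARP\nINTEL\n"
new_list = "# Version 2022040300\nAAA\nAARP\nMUSIC\nSPA\n"

old_tlds = load_tlds(old_list)
new_tlds = load_tlds(new_list)

# TLDs present in the new list but unknown to the stale one.
missing_from_old = sorted(new_tlds - old_tlds)
print(missing_from_old)  # ['MUSIC', 'SPA']
```

A one-sided difference (new minus old) drops entries that exist only in the old file, so no second filtering pass is needed.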

Now we can post a reply containing a URL with any of these TLDs, like https://evil.spa, and it will not be marked as spam, bypassing the filter.
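
To make the effect concrete, here is a rough standalone model of the TLD gate. It is a sketch under assumptions: LINK_RE is a simplified pattern I made up (the real POTENTIAL_LINK_REGEX is not shown in this post), is_flagged is a hypothetical name, the TLD sets are tiny samples, and the allowlist step is left out.

```python
import re

# Hypothetical, simplified host pattern; group 1 captures the last label (the TLD).
LINK_RE = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.([A-Za-z0-9-]{2,})")


def is_flagged(text, valid_tlds):
    """Flag text when any host-like token ends in a TLD found in valid_tlds."""
    return any(m.group(1).upper() in valid_tlds for m in LINK_RE.finditer(text))


old_tlds = {"COM", "ORG"}      # stale sample snapshot, lacks SPA
new_tlds = old_tlds | {"SPA"}  # refreshed sample snapshot

comment = "check this out https://evil.spa"
print(is_flagged(comment, old_tlds))  # False: the link is not recognised
print(is_flagged(comment, new_tlds))  # True: the link is recognised
```

With the stale set the link never reaches the blocking logic, which is why refreshing the TLD file is enough to close the gap in this model.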

Unfortunately, the only thing we can achieve with this bug is a spam filter bypass, which Mozilla considers out of scope. If this check had been used in any other functionality, this would have been a cool bug.

Thank you for reading.

This post is licensed under CC BY 4.0 by the author.