Skip to content

-r Recursive option#129

Merged
KevinHock merged 29 commits into
python-security:masterfrom
omergunal:patch-4
Jun 23, 2018
Merged

-r Recursive option#129
KevinHock merged 29 commits into
python-security:masterfrom
omergunal:patch-4

Conversation

@omergunal

@omergunal omergunal commented May 13, 2018

Copy link
Copy Markdown
Contributor

Issue: #127
There is a few steps for completing this PR. Now we can get all ".py" files in directory and exclude some files with "-x" option.

Comment thread pyt/__main__.py Outdated
excluded_files = args.excluded_paths
test = discover_files(directory_path, excluded_files)

print(test)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to see if it works

@KevinHock KevinHock self-requested a review June 6, 2018 04:22
@KevinHock

Copy link
Copy Markdown
Collaborator

Hi @omergunal, thanks for the soup! :D

Can you merge master into this branch and then resolve the merge conflicts? 👍

@omergunal

Copy link
Copy Markdown
Contributor Author

No problem :) , ok i will do it

@omergunal

Copy link
Copy Markdown
Contributor Author

in usage.py "filepath" is must required. should we do optional? because if we use "-r" we do not need it

@KevinHock

Copy link
Copy Markdown
Collaborator

Can you also add

    parser.add_argument(
        'targets', metavar='targets', type=str, nargs='*',
        help='source file(s) or directory(s) to be tested'
    )

@KevinHock

Copy link
Copy Markdown
Collaborator

Let's do this https://github.com/PyCQA/bandit/blob/master/bandit/cli/main.py#L153-L160 and then remove -f :)

@KevinHock

Copy link
Copy Markdown
Collaborator

I realize I'm totally changing my mind from what I said before, "This will enable a user to just give -r /path/to/files instead of -f file one at a time." but this seems cleaner.

@KevinHock

KevinHock commented Jun 7, 2018

Copy link
Copy Markdown
Collaborator

i.e. -r means we recursively search directories, and is a bool. Whereas the targets, (positional arguments) pyt foo.py controllers/will be the targets we analyze.

@omergunal

Copy link
Copy Markdown
Contributor Author

You mean, we will delete "-f" option and use "-r" for both single file scan and directory scan.And if user will not use any parameter, it will scan one file. Is that correct?

Comment thread pyt/usage.py Outdated
action='store',
default='',
help='Separate files with commas'
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

De-dent )

Comment thread pyt/usage.py
help='do not skip lines with # nosec comments'
)

optional_group.add_argument(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it

    parser.add_argument(
        '-r', '--recursive', dest='recursive',
        action='store_true', help='find and process files in subdirectories'
    )

Comment thread pyt/__main__.py Outdated


def discover_files(directory_path, excluded_files):
file_list = []

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We mostly use list() everywhere in assignments in the codebase, just for consistency.

Comment thread pyt/__main__.py Outdated
if os.path.splitext(fullpath)[1] == '.py' and fullpath.split("/")[-1] not in excluded_list:
file_list.append(fullpath)

return(file_list)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: just for consistency, you can do return included_files (and I guess rename files to included_files)

@KevinHock

KevinHock commented Jun 7, 2018

Copy link
Copy Markdown
Collaborator

re:"You mean, we will delete -f option and use -r for both single file scan and directory scan.And if user will not use any parameter, it will scan one file. Is that correct?"

Delete -f
Use targets for both file and directory scan.
Use -r for doing something like

def discover_files(targets, excluded_files, recursion):
    included_files = list()
    excluded_list = excluded_files.split(",")

    for target in targets:
        if target.endswith('.py'):
            included_files.append(target)
        else: 
            for root, dirs, files in os.walk(target):
                for f in files:
                    fullpath = os.path.join(root, f)
                    if os.path.splitext(fullpath)[1] == '.py' and fullpath.split("/")[-1] not in excluded_list:
                        included_files.append(fullpath)
                if not recursion:
                    break

    return included_files

@KevinHock

Copy link
Copy Markdown
Collaborator

(just updated the code, should be better now.)

@omergunal

Copy link
Copy Markdown
Contributor Author

it seems good about returning "included_list". also "-x " parameter is available.

Comment thread pyt/__main__.py Outdated



targets = args.targets

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more DRY if you did

files = discover_files(
  args.targets,
  args.excluded_paths,
  args.recursive
)

Comment thread pyt/usage.py Outdated
default='',
help='Separate files with commas'
)
optional_group.add_argument(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess targets will be part of _add_required_group b/c it's replacing -f files

@KevinHock KevinHock left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, looking good :D

Comment thread pyt/__main__.py
directory = os.path.dirname(path)
project_modules = get_modules(directory)
local_modules = get_directory_modules(directory)
for path in files:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this for loop, you can have a vulnerabilities = list(), and then do

vulnerabilities.append(find_vulnerabilities(
    cfg_list,
    ui_mode,
    args.blackbox_mapping_file,
    args.trigger_word_file,
    nosec_lines
))

@omergunal omergunal Jun 11, 2018

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it seems bad
selection_102

there is no problem at the moment. why we should change like this?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it find vulnerabilities in all the files, or just the last file (i.e. last iteration of the loop)? If I'm reading it write I think it might do e.g. vulnerabilites = kitmap_vulns...then finally vulnerabilities = a.py_vulns, and only report the last list.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(As an aside, it seems strange it's not printing out the vulnerability info, and just seems to print the object.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it taking last item on the list. have you idea for fix?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fix is having a list outside of the for loop and adding the vulnerabilities of each file to it. The code from my first comment should do it, although now that I think about it it's probably extend and not append.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^Ahh, this was it! 👍 Just change append to extend and it'll all work! :D

Comment thread pyt/__main__.py Outdated
nosec_lines
)

if args.baseline:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can de-dent this, if args.baseline: as only one call to get_vulnerabilities_not_in_baseline, with the vulnerabilities of every file will work.

Comment thread pyt/__main__.py Outdated
args.excluded_paths,
args.recursive
)
vulnerabilities = list()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i created list before loop as you said. the same problem about last item is still continue.
a

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, That's odd, I'll checkout/test your code later today to try and find the issue. 👍

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's really weird, I'll look more in-depth on Monday 👍

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked out your branch and it was append vs. extend

@KevinHock KevinHock left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super close, just the de-dent and append vs. extend and I think that's mostly it :)

Comment thread pyt/__main__.py
local_modules = get_directory_modules(directory)
tree = generate_ast(path)

if args.baseline:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can keep this, but just de-dent it so that we only trim once.

@KevinHock

Copy link
Copy Markdown
Collaborator

So I looked at the tests that were failing and the Mock stuff

You can do e.g.

 class MainTest(BaseTestCase):
+    @mock.patch('pyt.__main__.discover_files')
     @mock.patch('pyt.__main__.parse_args')
     @mock.patch('pyt.__main__.find_vulnerabilities')
     @mock.patch('pyt.__main__.text')
-    def test_text_output(self, mock_text, mock_find_vulnerabilities, mock_parse_args):
+    def test_text_output(self, mock_text, mock_find_vulnerabilities, mock_parse_args, mock_discover_files):
         mock_find_vulnerabilities.return_value = 'stuff'
         example_file = 'examples/vulnerable_code/inter_command_injection.py'
         output_file = 'mocked_outfile'
 
+        mock_discover_files.return_value = [example_file]
         mock_parse_args.return_value = mock.Mock(
             autospec=True,
-            filepath=example_file,
             project_root=None,
             baseline=None,
             json=None,

and the same for the other tests.

This makes it so that in the tests, we don't really ever call discover_files, but instead "mock" it, and just use the return value that we want to.

Comment thread pyt/__main__.py
)
initialize_constraint_table(cfg_list)
analyse(cfg_list)
vulnerabilities.extend(find_vulnerabilities(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

er

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to you?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there are no vulnerability in a.py b.py and c.py but it printing from xss.py

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't figure out the bug yet, gonna look more tomorrow 😁 This is harder than expected to track down

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i fixed it. its about vulnerabilities = list() location

@KevinHock KevinHock left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write tests for discover_files when you get a chance? 👍

Comment thread pyt/__main__.py Outdated
included_files.append(fullpath)
else:
if target not in excluded_list:
included_files.append(targets[0])

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if targets is a list of files, e.g. python -m pyt examples/vulnerable_code/command_injection.py examples/vulnerable_code/XSS.py, then discover_files will return the first file N times. (Where N is the len of targets.)

Comment thread pyt/__main__.py Outdated

for target in targets:
if os.path.isdir(target):
if recursive:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So having if recursive: here it will make it so that if you don't have -r then you won't search directories.

You can change it to:

def discover_files(targets, excluded_files, recursive=False):
    included_files = list()
    excluded_list = excluded_files.split(",")

    for target in targets:
        if os.path.isdir(target):
            for root, dirs, files in os.walk(target):
                for f in files:
                    fullpath = os.path.join(root, f)
                    if os.path.splitext(fullpath)[1] == '.py' and fullpath.split("/")[-1] not in excluded_list:
                        included_files.append(fullpath)
                if not recursive:
                    break
        else:
            if target not in excluded_list:
                included_files.append(target)
    return included_files

Comment thread pyt/__main__.py
args.recursive
)
for path in files:
vulnerabilities = list()

@omergunal omergunal Jun 20, 2018

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added the list inside to loop
s

Its ok now

@KevinHock KevinHock Jun 21, 2018

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the bug I found yesterday, or more accurately the thing I don't understand 😕 , is that find_vulnerabilities returns the vulnerabilities for all the files previously analyzed, as if find_vulnerabilities knows all the vulnerabilities found for the other files we've looked at, how does it know this? 😱

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as far as the PR, you can change it to what you had originally, i.e. vulnerabilities = find_vulnerabilities(..), sorry I misunderstood the code, I'll still look into the reason why the code does this though.

@KevinHock KevinHock Jun 21, 2018

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, figured it out, so I knew constraint_table etc. were global variables that keep state, and that they could be the culprit if they were used weirdly, however it is due to FrameworkAdaptor https://github.com/python-security/pyt/blob/master/pyt/web_frameworks/framework_adaptor.py#L88 adding all the past CFGs to the list :) I'll add a comment to __main__.py about it after we merge this PR

@KevinHock KevinHock left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there :)

Comment thread pyt/__main__.py Outdated
if os.path.isdir(target):
for root, dirs, files in os.walk(target):
for f in files:
if not recursive:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important for the if not recursive: to be after the

                        fullpath = os.path.join(root, f)
                        if os.path.splitext(fullpath)[1] == '.py' and fullpath.split("/")[-1] not in excluded_list:
                            included_files.append(fullpath)

this is so that we only iterate through the for f in files: once. i.e. just one-level of depth and not recursively.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, i did it

Comment thread pyt/__main__.py Outdated
vulnerabilities,
args.baseline
)
vulnerabilities = get_vulnerabilities_not_in_baseline(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for de-denting the if args.baseline: You can de-dent the vulnerabilities = get_vulnerabilities_not_in_baseline( too, 👍

Comment thread pyt/__main__.py Outdated

for target in targets:
if os.path.isdir(target):
for root, dirs, files in os.walk(target):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line, for root, dirs, files in os.walk(target): is indented one more level than it has to be.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will try to do better for returning "included_files"

@KevinHock KevinHock left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just make the tests pass and I'll merge 👍 (Feel free to write tests for discover_files if you'd like to though.)

Comment thread pyt/__main__.py Outdated
excluded_list = excluded_files.split(",")
for target in targets:
if os.path.isdir(target):
for root, dirs, files in os.walk(target):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You can de-dent from line 38 to line 44

Comment thread pyt/__main__.py Outdated
args.baseline
)


Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You can delete this newline

@omergunal

Copy link
Copy Markdown
Contributor Author

and its done. i will write test for discover_files later

@KevinHock KevinHock left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So happy to merge 🎊 🎈 🎉 🎂

@KevinHock KevinHock merged commit 853300b into python-security:master Jun 23, 2018
KevinHock added a commit that referenced this pull request Jun 23, 2018
…ll versions, add travis commands to tox so this does not happen again
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants