Log most expensive predicates and timings to query log #1349

Merged
angelapwen merged 5 commits into github:main from angelapwen:add-summary-to-query-log
May 20, 2022

Conversation

@angelapwen
Contributor

@angelapwen angelapwen commented May 16, 2022

Prior to our move to structured logging, the most expensive predicates and timing summary was available in the Query Log generated for each query run. When we moved to structured logging, this information was only surfaced in the Query Server console and the Structured Evaluator Log (Summary) views. We are now logging this information back to the Query Log for parity with the existing user experience.

The per-query query.log file now ends with:

[Screenshot: tail of the per-query query.log showing the evaluator log summary]

Checklist

  • CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
  • Issues have been created for any UI or other user-facing changes made by this pull request.
  • [Maintainers only] If this pull request makes user-facing changes that require documentation changes, open a corresponding docs pull request in the github/codeql repo and add the ready-for-doc-review label there.

@angelapwen angelapwen requested a review from a team as a code owner May 16, 2022 18:10
Comment on lines +210 to +212
// Write summary to Query Log as well, as this information was present here before structured logging.
void logger.log(' --- Evaluator Log Summary --- ', { additionalLogLocation: this.logPath });
void logger.log(buffer.toString(), { additionalLogLocation: this.logPath });
Contributor Author

Note that this will also log to the "CodeQL Extension Log" console view, so this information shows up in many different places (Query Server console, Structured Evaluator Log (Summary), and now Query Log and Extension Log). Does this seem too noisy?

Contributor

Hmmm...not sure. It seems like query information is best kept in the query log. But, if people are asking for it in the extension log as well, that's probably fine.

Contributor Author

Nobody is actually asking for it in the extension log, only in the query log. But I was under the impression that in order to log to the query log, I also had to log to the extension log, as above. logger.log() without additionalLogLocation only writes to the extension log, I believe?

Contributor

@adityasharad adityasharad May 17, 2022

Which log it writes to depends on the logger object at construction time. We have three global logger objects, one each for the extension log, the query server log, and the language server log. Where does this variable come from?

I also think it is best kept in the query server log.

Contributor Author

Aha, I see. I can just continue to use the query server logger object and log to the query log as an additional location (as requested).
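The pattern settled on above can be sketched roughly as follows. This is a hypothetical illustration, not the extension's actual Logger implementation: the logger is bound to one primary log at construction time (as described above, there is one such object each for the extension, query server, and language server logs), and a per-call additionalLogLocation mirrors a line into a second file.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical sketch: options accepted by each log() call.
interface LogOptions {
  additionalLogLocation?: string;
}

class Logger {
  // Which log this logger writes to is fixed at construction time.
  constructor(private readonly primaryLogPath: string) {}

  log(message: string, options: LogOptions = {}): void {
    const line = message + "\n";
    // Always write to the log this logger was constructed with...
    fs.appendFileSync(this.primaryLogPath, line);
    // ...and optionally mirror the same line to a second file.
    if (options.additionalLogLocation) {
      fs.appendFileSync(options.additionalLogLocation, line);
    }
  }
}

// Usage sketch: a "query server" logger that also mirrors the evaluator
// summary into a per-query query.log.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "log-demo-"));
const serverLog = path.join(dir, "query-server.log");
const queryLog = path.join(dir, "query.log");

const logger = new Logger(serverLog);
logger.log("starting evaluation"); // query server log only
logger.log(" --- Evaluator Log Summary --- ", { additionalLogLocation: queryLog });
```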

@angelapwen angelapwen enabled auto-merge (squash) May 20, 2022 20:10
Contributor

@aeisenberg aeisenberg left a comment

Looks good. Before merging, can you try running two queries at once to see if their results are intermingled in the logs?

In particular, try running one long query and, before it finishes, run a short query and see if all the log entries end up in the proper file.

My guess is no, but this isn't something you need to worry about now. I'm pretty sure this is a long-standing limitation of the query server, but I want to see how it behaves with structured logging.

@angelapwen angelapwen merged commit 2f9aca7 into github:main May 20, 2022
@angelapwen
Contributor Author

Eek, sorry, I had enabled auto-merge. Let me give that a try now.

@angelapwen
Contributor Author

Ok. On main, I ran Definitions.ql on the FreeCAD database and, after it began, ran tryfinally.ql (the short query). All the log entries ended up in the appropriate query.log and structured log files, but it seems that tryfinally did not actually begin running until Definitions finished. See screenshot (ignore the results shown; those were from a prior run).
[Screenshot: the two query runs described above]

Additionally, it seems like both queries are taking quite a while to display results because the extension is blocking on Definitions.ql's log summary file being generated. I find that odd because #1350 was supposed to prevent blocking on the log summary generation; a hypothesis would be that it is just the file I/O (reading in the log file to summarize) that is taking so long. (The reading is not async.) But it seems to have also blocked the log summary generation and completion of the faster query, which I didn't expect.
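One way to picture the hypothesis above, as a hedged sketch rather than the extension's actual code: a synchronous read of a large evaluator log blocks Node's single-threaded event loop, so every other pending task in the extension host stalls until the whole file is in memory, whereas an awaited read yields control while the I/O is in flight. The function names here are illustrative.

```typescript
import * as fs from "fs";
import { promises as fsp } from "fs";
import * as os from "os";
import * as path from "path";

// Blocking variant: readFileSync stalls the event loop for the whole read,
// so nothing else (other queries' completions included) can make progress.
function summarizeLogSync(logPath: string): number {
  const text = fs.readFileSync(logPath, "utf8"); // blocks the event loop
  return text.split("\n").length;
}

// Non-blocking variant: the read is awaited, so other tasks can run while
// the file is being read.
async function summarizeLogAsync(logPath: string): Promise<number> {
  const text = await fsp.readFile(logPath, "utf8");
  return text.split("\n").length;
}

// Usage sketch: a small temporary "evaluator log" with three lines.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "evallog-"));
const logFile = path.join(tmpDir, "evaluator.log");
fs.writeFileSync(logFile, "line 1\nline 2\nline 3");
```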

@aeisenberg
Contributor

Thanks for looking into this. Now that I remember, this is expected behaviour. Queries on the same database are run serially, and queries on separate databases are run in parallel. This is enough of an edge case that I think we are fine the way things are.

@angelapwen
Contributor Author

> Thanks for looking into this. Now that I remember, this is expected behaviour. Queries on the same database are run serially, and queries on separate databases are run in parallel. This is enough of an edge case that I think we are fine the way things are.

Ah, I see. I had thought about trying a query on another database but didn't. I guess that's rare enough that we haven't heard about this as a pain point much.

I'm still not sure why both query results blocked on the former's summary file being created. I guess the summary command is being run serially as well? 🤔 Not quite what I expected.

@aeisenberg
Contributor

Hmmm...not sure. Perhaps @edoardopirovano can weigh in here. I would think that true parallelism here would be a major change to the query server, but it would be nice to better understand the mechanisms going on.

@edoardopirovano
Contributor

(My apologies to any external contributors who may be following along with this conversation: the comment below may be hard to follow, as it contains many links to our private code base, which is where the underlying issue lives.)

Hmm, it looks like something is quite broken here. I don't think the fact that we run queries serially on a single DB is the expected behaviour: I think that's a recent regression, and some debugging shows that I unfortunately introduced it with a mistake in https://github.com/github/semmle-code/pull/41918. In particular, on this line I think I should have chosen withDBShared rather than withDBExclusive.

My intent was that we would continue to allow parallel running of queries (as we have before), but would rotate the structured logs so that everything ends up in the log of the query that started last (not ideal, I'll admit, but this is a rare enough occurrence that it seemed okay to avoid the extra complexity of keeping multiple file handles open and logging to all of them). However, what's happened by using withDBExclusive there is that when we make the call to start the log for the second query (which we await, since we expect it to return immediately), that call doesn't actually return until the first query has finished running, because it's waiting for an exclusive lock on the DB! The effect is that the logs do end up being sensible, with everything in the right place, but only because we've serialised things (which, again, I believe to be a regression). Additionally, the fact that we're awaiting that call means we block a thread in VS Code for a very long time, and other weirdness can ensue (like what Angela observed).

Note that using withDBShared is safe because there's already locking much lower down in StructuredLogger (here), so we don't need to lock the whole DB to rotate the log (this is very much deliberate because, as I said, I had intended to keep parallel queries possible). That fine-grained lock on just the logger is safe to wait for because we'll never need to wait long for it.
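The shared-versus-exclusive distinction being described can be sketched with a toy read-write lock. The names and behaviour here are illustrative stand-ins for the private code base's withDBShared/withDBExclusive, not its actual implementation: a shared acquire proceeds while other shared holders are active, whereas an exclusive acquire waits for every current holder to release.

```typescript
// Toy read-write lock: many concurrent shared holders, one exclusive holder.
class RwLock {
  private readers = 0;
  private writer = false;
  private waiters: Array<() => void> = [];

  private wake(): void {
    const ws = this.waiters;
    this.waiters = [];
    ws.forEach((w) => w());
  }
  private wait(): Promise<void> {
    return new Promise((res) => this.waiters.push(res));
  }

  async withShared<T>(fn: () => Promise<T>): Promise<T> {
    while (this.writer) await this.wait();
    this.readers++;
    try { return await fn(); } finally { this.readers--; this.wake(); }
  }

  async withExclusive<T>(fn: () => Promise<T>): Promise<T> {
    // Awaiting this while a long query holds the lock is exactly the
    // blocking behaviour described above.
    while (this.writer || this.readers > 0) await this.wait();
    this.writer = true;
    try { return await fn(); } finally { this.writer = false; this.wake(); }
  }
}

const lock = new RwLock();
const events: string[] = [];
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function demo(): Promise<void> {
  // Query 1 holds a shared DB lock while it runs.
  const q1 = lock.withShared(async () => {
    events.push("q1 start");
    await sleep(50);
    events.push("q1 end");
  });
  await sleep(10);
  // Rotating the log under a *shared* lock proceeds while q1 is running;
  // an *exclusive* acquire here would not return until q1 finished.
  await lock.withShared(async () => { events.push("rotate log (shared)"); });
  await q1;
}
```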

In summary, I think Angela has uncovered a bug and the solution is to change withDBExclusive to withDBShared in the startLog and endLog functions of EvaluationServer. @angelapwen / @aeisenberg: Let me know if you both agree with my assessment and I'll get a PR up (after doing a bit more testing to make sure that really fixes things).

@aeisenberg
Contributor

This all seems reasonable to me, but I am not entirely familiar with this area. Would you be able to create an issue for this in our internal repository and the rest of our team can comment?

@angelapwen angelapwen deleted the add-summary-to-query-log branch May 23, 2022 12:35

@Ganselo9 Ganselo9 left a comment

#1426

  • #@_
    Duplicate of #_
