Log most expensive predicates and timings to query log #1349

Merged
angelapwen merged 5 commits into github:main from angelapwen:add-summary-to-query-log
May 20, 2022

Conversation

@angelapwen
Contributor

@angelapwen angelapwen commented May 16, 2022

Prior to our move to structured logging, the most expensive predicates and timing summary was available in the Query Log generated for each query run. When we moved to structured logging, this information was only surfaced in the Query Server console and the Structured Evaluator Log (Summary) views. We are now logging this information back to the Query Log for parity with the existing user experience.

The per-query query.log file now ends with:

[Screenshot: tail of the per-query query.log showing the evaluator log summary]

Checklist

  • CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
  • Issues have been created for any UI or other user-facing changes made by this pull request.
  • [Maintainers only] If this pull request makes user-facing changes that require documentation changes, open a corresponding docs pull request in the github/codeql repo and add the ready-for-doc-review label there.

@angelapwen angelapwen requested a review from a team as a code owner May 16, 2022 18:10
Comment on lines +210 to +212
// Write summary to Query Log as well, as this information was present here before structured logging.
void logger.log(' --- Evaluator Log Summary --- ', { additionalLogLocation: this.logPath });
void logger.log(buffer.toString(), { additionalLogLocation: this.logPath });
Contributor Author

Note that this will also log to the "CodeQL Extension Log" console view, so this information shows up in many different places (Query Server console, Structured Evaluator Log (Summary), and now Query Log and Extension Log). Does this seem too noisy?

Contributor

Hmmm...not sure. It seems like query information is best kept in the query log. But, if people are asking for it in the extension log as well, that's probably fine.

Contributor Author

Nobody is actually asking for it in the extension log, only in the query log. But I was under the impression that in order to log to the query log, I also had to log to the extension log, as above. logger.log() without additionalLogLocation only writes to the extension log, I believe?

Contributor

@adityasharad adityasharad May 17, 2022

Which log it writes to depends on the logger object at construction time. We have three global logger objects, one each for the extension log, the query server log, and the language server log. Where does this variable come from?

I also think it is best kept in the query server log.

Contributor Author

Aha, I see. I can just continue to use the query server logger object and log to the query log as an additional location (as requested).
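The pattern settled on above can be sketched roughly as follows. This is a hypothetical illustration, not the extension's actual Logger implementation: the logger is bound to one primary log at construction time (as described above, there is one such object each for the extension, query server, and language server logs), and a per-call additionalLogLocation mirrors a line into a second file.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical sketch: options accepted by each log() call.
interface LogOptions {
  additionalLogLocation?: string;
}

class Logger {
  // Which log this logger writes to is fixed at construction time.
  constructor(private readonly primaryLogPath: string) {}

  log(message: string, options: LogOptions = {}): void {
    const line = message + "\n";
    // Always write to the log this logger was constructed with...
    fs.appendFileSync(this.primaryLogPath, line);
    // ...and optionally mirror the same line to a second file.
    if (options.additionalLogLocation) {
      fs.appendFileSync(options.additionalLogLocation, line);
    }
  }
}

// Usage sketch: a "query server" logger that also mirrors the evaluator
// summary into a per-query query.log.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "log-demo-"));
const serverLog = path.join(dir, "query-server.log");
const queryLog = path.join(dir, "query.log");

const logger = new Logger(serverLog);
logger.log("starting evaluation"); // query server log only
logger.log(" --- Evaluator Log Summary --- ", { additionalLogLocation: queryLog });
```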

@angelapwen angelapwen enabled auto-merge (squash) May 20, 2022 20:10
Contributor

@aeisenberg aeisenberg left a comment

Looks good. Before merging, can you try running two queries at once to see if their results are intermingled in the logs?

In particular, try running one long query and, before it finishes, run a short query and see if all the log entries end up in the proper file.

My guess is no, but this isn't something you need to worry about now. I'm pretty sure this is a long-standing limitation of the query server, but I want to see how it behaves with structured logging.

@angelapwen angelapwen merged commit 2f9aca7 into github:main May 20, 2022
@angelapwen
Contributor Author

Eek, sorry, I had enabled auto-merge. Let me give that a try now.

@angelapwen
Contributor Author

Ok. On main, I ran Definitions.ql on the FreeCAD database and, after it began, ran tryfinally.ql (the short query). All the log entries ended up in the appropriate query.log and structured log files, but it seems that tryfinally did not actually begin running until Definitions finished. See screenshot (ignore the results shown; those were from a prior run).
[Screenshot: the two query runs described above]

Additionally, it seems like both queries are taking quite a while to display results because the extension is blocking on Definitions.ql's log summary file being generated. I find that odd because #1350 was supposed to prevent blocking on the log summary generation; a hypothesis would be that it is just the file I/O (reading in the log file to summarize) that is taking so long. (The reading is not async.) But it seems to have also blocked the log summary generation and completion of the faster query, which I didn't expect.
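One way to picture the hypothesis above, as a hedged sketch rather than the extension's actual code: a synchronous read of a large evaluator log blocks Node's single-threaded event loop, so every other pending task in the extension host stalls until the whole file is in memory, whereas an awaited read yields control while the I/O is in flight. The function names here are illustrative.

```typescript
import * as fs from "fs";
import { promises as fsp } from "fs";
import * as os from "os";
import * as path from "path";

// Blocking variant: readFileSync stalls the event loop for the whole read,
// so nothing else (other queries' completions included) can make progress.
function summarizeLogSync(logPath: string): number {
  const text = fs.readFileSync(logPath, "utf8"); // blocks the event loop
  return text.split("\n").length;
}

// Non-blocking variant: the read is awaited, so other tasks can run while
// the file is being read.
async function summarizeLogAsync(logPath: string): Promise<number> {
  const text = await fsp.readFile(logPath, "utf8");
  return text.split("\n").length;
}

// Usage sketch: a small temporary "evaluator log" with three lines.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "evallog-"));
const logFile = path.join(tmpDir, "evaluator.log");
fs.writeFileSync(logFile, "line 1\nline 2\nline 3");
```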

@aeisenberg
Contributor

Thanks for looking into this. Now that I remember, this is expected behaviour. Queries on the same database are run serially, and queries on separate databases are run in parallel. This is enough of an edge case that I think we are fine the way things are.

@angelapwen
Contributor Author

> Thanks for looking into this. Now that I remember, this is expected behaviour. Queries on the same database are run serially, and queries on separate databases are run in parallel. This is enough of an edge case that I think we are fine the way things are.

Ah, I see. I had thought about trying a query on another database but didn't. I guess that's rare enough that we haven't heard about this as a pain point much.

I'm still not sure why both query results blocked on the former's summary file being created. I guess the summary command is being run serially as well? 🤔 Not quite what I expected.

@aeisenberg
Contributor

Hmmm...not sure. Perhaps @edoardopirovano can weigh in here. I would think that true parallelism here would be a major change to the query server, but it would be nice to better understand the mechanisms going on.

@edoardopirovano
Contributor

(My apologies to any external contributors who may be following along with this conversation: the comment below may be hard to follow, as it contains many links to our private code base, which is where the underlying issue lives.)

Hmm, it looks like something is quite broken here. I don't think the fact that we run queries serially on a single DB is the expected behaviour: I think that's a recent regression, and some debugging shows that I unfortunately introduced it with a mistake in https://github.com/github/semmle-code/pull/41918. In particular, on this line I think I should have chosen withDBShared rather than withDBExclusive.

My intent was that we would continue to allow parallel running of queries (as we have before), but would rotate the structured logs so that everything ends up in the log of the query that started last (not ideal, I'll admit, but this is a rare enough occurrence that it seemed okay to avoid the extra complexity of keeping multiple file handles open and logging to all of them). However, what's happened by using withDBExclusive there is that when we make the call to start the log for the second query (which we await, since we expect it to return immediately), that call doesn't actually return until the first query has finished running, because it's waiting for an exclusive lock on the DB! The effect is that the logs do end up being sensible, with everything in the right place, but only because we've serialised things (which, again, I believe to be a regression). Additionally, the fact that we're awaiting that call means we block a thread in VS Code for a very long time, and other weirdness can ensue (like what Angela observed).

Note that using withDBShared is safe because there's already locking much lower down in StructuredLogger (here), so we don't need to lock the whole DB to rotate the log (this is very much deliberate because, as I said, I had intended to keep parallel queries possible). That fine-grained lock on just the logger is safe to wait for because we'll never need to wait long for it.
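The shared-versus-exclusive distinction being described can be sketched with a toy read-write lock. The names and behaviour here are illustrative stand-ins for the private code base's withDBShared/withDBExclusive, not its actual implementation: a shared acquire proceeds while other shared holders are active, whereas an exclusive acquire waits for every current holder to release.

```typescript
// Toy read-write lock: many concurrent shared holders, one exclusive holder.
class RwLock {
  private readers = 0;
  private writer = false;
  private waiters: Array<() => void> = [];

  private wake(): void {
    const ws = this.waiters;
    this.waiters = [];
    ws.forEach((w) => w());
  }
  private wait(): Promise<void> {
    return new Promise((res) => this.waiters.push(res));
  }

  async withShared<T>(fn: () => Promise<T>): Promise<T> {
    while (this.writer) await this.wait();
    this.readers++;
    try { return await fn(); } finally { this.readers--; this.wake(); }
  }

  async withExclusive<T>(fn: () => Promise<T>): Promise<T> {
    // Awaiting this while a long query holds the lock is exactly the
    // blocking behaviour described above.
    while (this.writer || this.readers > 0) await this.wait();
    this.writer = true;
    try { return await fn(); } finally { this.writer = false; this.wake(); }
  }
}

const lock = new RwLock();
const events: string[] = [];
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function demo(): Promise<void> {
  // Query 1 holds a shared DB lock while it runs.
  const q1 = lock.withShared(async () => {
    events.push("q1 start");
    await sleep(50);
    events.push("q1 end");
  });
  await sleep(10);
  // Rotating the log under a *shared* lock proceeds while q1 is running;
  // an *exclusive* acquire here would not return until q1 finished.
  await lock.withShared(async () => { events.push("rotate log (shared)"); });
  await q1;
}
```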

In summary, I think Angela has uncovered a bug and the solution is to change withDBExclusive to withDBShared in the startLog and endLog functions of EvaluationServer. @angelapwen / @aeisenberg: Let me know if you both agree with my assessment and I'll get a PR up (after doing a bit more testing to make sure that really fixes things).

@aeisenberg
Contributor

This all seems reasonable to me, but I am not entirely familiar with this area. Would you be able to create an issue for this in our internal repository and the rest of our team can comment?

@angelapwen angelapwen deleted the add-summary-to-query-log branch May 23, 2022 12:35

@Ganselo9 Ganselo9 left a comment

#1426

  • #@_
    Duplicate of #_
