A few core platform concepts need to be in place before you can troubleshoot this issue effectively. The big one is the distinction between DML operations, trigger invocations, and transactions.
Your transaction is the unit you need to worry about. The transaction, if I can summarize ruthlessly, spans everything from your entry point (wherever execution started, be it a UI update, an Anonymous Apex invocation, a unit test being executed, or an API call) to the end of all synchronous functionality spawned as a consequence thereof.
In your scenario, the external API call marks the start of a transaction. All triggers that execute subsequent to that call are part of the same transaction. Any Asynchronous Apex (future methods, Queueable Apex, Scheduled Apex, and Batch Apex) takes place in a separate transaction.
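As a sketch (the class and method names here are hypothetical, not from your code), moving work into a future method gives it its own transaction, and therefore its own fresh set of governor limits:

```apex
// Hypothetical example: queries issued inside the @future method are charged
// to a new transaction, not to the transaction that enqueued it.
public class AccountEnrichment {
    @future
    public static void enrichAsync(Set<Id> accountIds) {
        // Runs later, in a separate transaction with fresh governor limits.
        List<Account> accounts = [
            SELECT Id, Industry FROM Account WHERE Id IN :accountIds
        ];
        // ... process the accounts ...
    }
}
```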
Experimenting with unit tests in a sandbox, I've noticed a few things. First, for each block of 200 records, I'm apparently consuming 14 queries (measured with Limits.getQueries()). If I update more than 200, they get automatically partitioned into chunks in the trigger, but the queries still count toward the SOQL limit.
This is how the SOQL limit (and many other limits as well) work. They are bounded by the transaction, not by the DML Operation or trigger invocation. Your trigger may be called upon to process more than one block of 200 records in any given transaction.
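To illustrate with a minimal, hypothetical trigger (not your actual code): even a properly bulkified trigger re-executes its queries once per 200-record chunk, so a 1,000-record update spends five queries from the transaction's shared pool despite the trigger body containing only one SOQL statement:

```apex
trigger ContactRollup on Contact (after update) {
    // Collect parent Ids across the whole chunk so we query once per invocation.
    Set<Id> accountIds = new Set<Id>();
    for (Contact c : Trigger.new) {
        if (c.AccountId != null) {
            accountIds.add(c.AccountId);
        }
    }
    // One query per invocation: an update of 1,000 Contacts fires this trigger
    // five times (5 chunks of 200), consuming five queries in one transaction.
    List<Account> parents = [SELECT Id FROM Account WHERE Id IN :accountIds];
    // ... recalculate roll-up values on parents ...
}
```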
Second, if I do two separate updates of 100 records each, the trigger processes each update as a separate invocation of 100 records, but the SOQL count still adds up across both (28 total SOQL queries).
The updates are still taking place in the same transaction. Each unit test takes place in one transaction.
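You can watch the counter climb across separate DML statements within a single test method. This is a sketch of the experiment you describe; it assumes some Contact automation in the org is consuming queries on update:

```apex
@isTest
private class QueryLimitProbe {
    @isTest
    static void countsAccumulateAcrossUpdates() {
        List<Contact> contacts = new List<Contact>();
        for (Integer i = 0; i < 200; i++) {
            contacts.add(new Contact(LastName = 'Test' + i));
        }
        insert contacts;

        // Split into two batches of 100, mirroring two separate updates.
        List<Contact> firstHalf = new List<Contact>();
        List<Contact> secondHalf = new List<Contact>();
        for (Integer i = 0; i < 100; i++) {
            firstHalf.add(contacts[i]);
            secondHalf.add(contacts[i + 100]);
        }

        Integer baseline = Limits.getQueries();
        update firstHalf;
        Integer afterFirst = Limits.getQueries();
        update secondHalf;
        Integer afterSecond = Limits.getQueries();

        // Both updates draw from the same per-transaction pool, so the
        // second delta stacks on top of the first rather than resetting.
        System.debug(new List<Integer>{ baseline, afterFirst, afterSecond });
    }
}
```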
Overall, unit tests are unlikely to be the most effective tool for analyzing this problem. They can help you understand the limits consumption of your code in specific scenarios, which is good, and help you guard against specific expensive mistakes. But what you need to do is understand the entire path of execution during your external-call-initiated sequence of operations, and where, specifically, SOQL is being consumed along that path, using real data or data representative of what your code sees in production.
If your real-world code is using an order of magnitude more queries than your unit tests, you should examine your logs to see whether you may have trigger recursion taking place, or a call stack much deeper than you think it is due to misshapen logic that your unit tests are not designed to catch.
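If recursion does turn out to be the culprit, a common defensive pattern is a static recursion guard (a sketch; the class and method names are hypothetical). It works because static variables persist for the life of the transaction, and so span repeated trigger invocations:

```apex
public class ContactTriggerGuard {
    // Static state lives for the whole transaction, so it spans every
    // invocation of the trigger within that transaction.
    private static Set<Id> processedIds = new Set<Id>();

    public static List<Contact> filterUnprocessed(List<Contact> records) {
        List<Contact> fresh = new List<Contact>();
        for (Contact c : records) {
            if (!processedIds.contains(c.Id)) {
                processedIds.add(c.Id);
                fresh.add(c);
            }
        }
        return fresh; // only records seen for the first time this transaction
    }
}
```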
All that said, ...
These aren't being done in a single call; the C# code that is calling the Salesforce API is already doing all upserts in chunks of at most 200.
Turning your batch size down is often a quick-and-dirty solution to this kind of issue.
Best Answer
Try running the query using the new query resource feedback parameter in both environments. There may be a difference in the query plan being used, based on the volume of data and the available indexes.
See Developing Selective Force.com Queries through the Query Resource Feedback Parameter Pilot
There should be an option to access this in the Developer Console in a Summer '14 org. That won't help you at the moment with production, though.
Depending on your query, you may need to look at: