SQL

From Craft of Testing - Wiki
Revision as of 21:48, 4 April 2025 by Gregpaskal (talk | contribs) (Created page with "{{: Header_Nav}} This section is all about SQL In Jan 2025 I took the following course on Google Cloud Skills Boost Introduction to SQL for BigQuery and Cloud SQL https://www.cloudskillsboost.google/course_templates/623/labs/503669 SQL = Structured Query Language {| class="wikitable" |+Structured dataset ! {| class="wikitable" | colspan="1" rowspan="1" |User | colspan="1" rowspan="1" |Price | colspan="1" rowspan="1" |Shipped |- | colspan="1" rowspan="1" |Sean | co...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

What can we help you find?

Language Syntax
XPath Examples

Random
Home Testing Automation AI IoT OS RF Data Development References Tools Wisdom Inspiration Investing History Fun POC Help

This section is all about SQL


In Jan 2025 I took the following course on Google Cloud Skills Boost


Introduction to SQL for BigQuery and Cloud SQL

https://www.cloudskillsboost.google/course_templates/623/labs/503669


SQL = Structured Query Language

Structured dataset
User Price Shipped
Sean $35 Yes
Rocky $50 No

SELECT and FROM

SQL is phonetic by nature and before running a query, it's always helpful to first figure out what question you want to ask your data (unless you're just exploring for fun.) SQL has predefined keywords which you use to translate your question into the pseudo-english SQL syntax so you can get the database engine to return the answer you want.

The most essential keywords are SELECT and FROM:

  • Use SELECT to specify what fields you want to pull from your dataset.
  • Use FROM to specify what table or tables you want to pull our data from.

An example may help understanding. Assume that you have the following table example_table, which has columns USER, PRICE, and SHIPPED:

example_table
USER PRICE SHIPPED
Sean $35 Yes
Rocky $50 No
Amanda $20 Yes


And say that you want to just pull the data that's found in the USER column. You can do this by running the following query that uses SELECT and FROM:

SELECT USER FROM example_table

If you executed the above command, you would select all the names from the USER column that are found in example_table.

You can also select multiple columns with the SQL SELECT keyword. Say that you want to pull the data that's found in the USER and SHIPPED columns. To do this, modify the previous query by adding another column value to our SELECT query (making sure it's separated by a comma!):

SELECT USER, SHIPPED FROM example_table

Running the above retrieves the USER and the SHIPPED data from memory:

And just like that you've covered two fundamental SQL keywords! Now to make things a bit more interesting.

WHERE

The WHERE keyword is another SQL command that filters tables for specific column values. Say that you want to pull the names from example_table whose packages were shipped. You can supplement the query with a WHERE, like the following:

SELECT USER FROM example_table WHERE SHIPPED='YES'

Running the above returns all USERs whose packages have been SHIPPED from memory:

Now that you have a baseline understanding of SQL's core keywords, apply what you've learned by running these types of queries in the BigQuery console.


SELECT * = returns all columns.

Task 3. More SQL Keywords: GROUP BY, COUNT, AS, and ORDER BY

GROUP BY

The GROUP BY keyword will aggregate result-set rows that share common criteria (e.g. a column value) and will return all of the unique entries found for such criteria.

This is a useful keyword for figuring out categorical information on tables.

  1. To get a better picture of what this keyword does, clear the query from the editor, then copy and paste the following command:
SELECT start_station_name FROM `bigquery-public-data.london_bicycles.cycle_hire` GROUP BY start_station_name;
  1. Click Run.

Your results are a list of unique (non-duplicate) column values.

Without the GROUP BY, the query would have returned the full 83,434,866 rows. GROUP BY will output the unique column values found in the table. You can see this for yourself by looking in the bottom right corner. You will see 954 rows, meaning there are 954 distinct London bikeshare starting points.

COUNT

The COUNT() function will return the number of rows that share the same criteria (e.g. column value). This can be very useful in tandem with a GROUP BY.

Add the COUNT function to our previous query to figure out how many rides begin at each starting point.

  • Clear the query from the editor, then copy and paste the following command and then click Run:
SELECT start_station_name, COUNT(*) FROM `bigquery-public-data.london_bicycles.cycle_hire` GROUP BY start_station_name;


Your output shows how many bikeshare rides begin at each starting location.

AS

SQL also has an AS keyword, which creates an alias of a table or column. An alias is a new name that's given to the returned column or table—whatever AS specifies.

  1. Add an AS keyword to the last query you ran to see this in action. Clear the query from the editor, then copy and paste the following command:
SELECT start_station_name, COUNT(*) AS num_starts FROM `bigquery-public-data.london_bicycles.cycle_hire` GROUP BY start_station_name;
  1. Click Run.

For Results, the right column name changed from COUNT(*) to num_starts.

As you see, the COUNT(*) column in the returned table is now set to the alias name num_starts. This is a handy keyword to use especially if you are dealing with large sets of data — forgetting that an ambiguous table or column name happens more often than you think!

ORDER BY

The ORDER BY keyword sorts the returned data from a query in ascending or descending order based on a specified criteria or column value. Add this keyword to our previous query to do the following:

  • Return a table that contains the number of bikeshare rides that begin at each starting station, organized alphabetically by the starting station.
  • Return a table that contains the number of bikeshare rides that begin at each starting station, organized numerically from lowest to highest.
  • Return a table that contains the number of bikeshare rides that begin at each starting station, organized numerically from highest to lowest.

Each of the commands below is a separate query. For each command:

  1. Clear the query Editor.
  2. Copy and paste the command into the query Editor.
  3. Click Run. Examine the results.
SELECT start_station_name, COUNT(*) AS num FROM `bigquery-public-data.london_bicycles.cycle_hire` GROUP BY start_station_name ORDER BY start_station_name;
SELECT start_station_name, COUNT(*) AS num FROM `bigquery-public-data.london_bicycles.cycle_hire` GROUP BY start_station_name ORDER BY num;
SELECT start_station_name, COUNT(*) AS num FROM `bigquery-public-data.london_bicycles.cycle_hire` GROUP BY start_station_name ORDER BY num DESC;

The results of the last query lists start locations by the number of starts from that location.

You see that "Hyde Park Corner, Hyde Park" has the highest number of starts. However, as a fraction of the total (671688/83434866), you see that < 1% of rides start from this station.

DELETE keyword

Here are a couple more SQL keywords that help us with data management. The first is the DELETE keyword.

  • Run the following commands in your MySQL session to delete the first row of the london1 and london2:
DELETE FROM london1 WHERE num=0;
DELETE FROM london2 WHERE num=0;

You should receive the following output after running both commands:

Query OK, 1 row affected (0.04 sec)

The rows deleted were the column headers from the CSV files. The DELETE keyword will not remove the first row of the file per se, but all rows of the table where the column name (in this case "num") contains a specified value (in this case "0"). If you run the SELECT * FROM london1; and SELECT * FROM london2; queries and scroll to the top of the table, you will see that those rows no longer exist.

INSERT INTO keyword

You can also insert values into tables with the INSERT INTO keyword.

  • Run the following command to insert a new row into london1, which sets start_station_name to "test destination" and num to "1":
INSERT INTO london1 (start_station_name, num) VALUES ("test destination", 1);

The INSERT INTO keyword requires a table (london1) and will create a new row with columns specified by the terms in the first parenthesis (in this case "start_station_name" and "num"). Whatever comes after the "VALUES" clause will be inserted as values in the new row.

You should receive the following output:

Query OK, 1 row affected (0.05 sec)

If you run the query SELECT * FROM london1; you will see an additional row added at the bottom of the "london1" table.

UNION keyword

The last SQL keyword that you'll learn about is UNION. This keyword combines the output of two or more SELECT queries into a result-set. You use UNION to combine subsets of the "london1" and "london2" tables.

The following chained query pulls specific data from both tables and combines them with the UNION operator.

  • Run the following command at the MySQL server prompt:
SELECT start_station_name AS top_stations, num FROM london1 WHERE num>100000
UNION
SELECT end_station_name, num FROM london2 WHERE num>100000
ORDER BY top_stations DESC;

The first SELECT query selects the two columns from the "london1" table and creates an alias for "start_station_name", which gets set to "top_stations". It uses the WHERE keyword to only pull rideshare station names where over 100,000 bikes start their journey.

The second SELECT query selects the two columns from the "london2" table and uses the WHERE keyword to only pull rideshare station names where over 100,000 bikes end their journey.


The UNION keyword in between combines the output of these queries by assimilating the "london2" data with "london1". Since "london1" is being unioned with "london2", the column values that take precedence are "top_stations" and "num".

ORDER BY will order the final, unioned table by the "top_stations" column value alphabetically and in descending order.

Example output (your results may differ) :


As you see, 13/14 stations share the top spots for rideshare starting and ending points. With some basic SQL keywords you were able to query a sizable dataset, which returned data points and answers to specific questions.